Re: Data import, HBase requirements, and cost savings ?

Donald Szeto Tue, 10 Apr 2018 13:53:00 -0700

Hey Cliff, how are you collecting your events? Is it through PIO's Event
Server, or generated somehow by another ETL process?


Regards,
Donald

On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <[email protected]> wrote:

> It depends on which templates you are using. For instance, the recommenders
> need to query the EventStore to get user history, so this will not work
> for them. Some templates do not require Spark to be running at scale except
> for the training phase (the Universal Recommender, for instance), so for that
> template it is much more cost-effective to stop Spark when not using it.
>
> Every template uses the PIO framework in different ways. Dropping the DB
> is not likely to work, especially if you are using it to store engine
> metadata.
>
> We’d need to know which templates you are using to advise on cost savings.
>
> From: Miller, Clifford <[email protected]>
> Reply: [email protected]
> Date: April 10, 2018 at 11:22:04 AM
> To: [email protected]
> Subject: Data import, HBase requirements, and cost savings ?
>
> I'm exploring cost-saving options for a customer who wants to use
> PredictionIO. We plan to run multiple engines/templates, all in AWS, and
> we hope to avoid having all data loaded for all templates at once. The
> hope is to:
>
>    1. Start up the HBase cluster.
>    2. Import the events.
>    3. Train the model.
>    4. Store the model in S3.
>    5. Shut down the HBase cluster.
>
> We have some general questions.
>
>    1. Is this approach even feasible?
>    2. Does PredictionIO require the Event Store (HBase) to be up and
>    running constantly, or can we turn it off when not training? If it requires
>    HBase constantly, can we do the training from a different HBase cluster and
>    then have separate PIO Event/Engine servers deploy the applications
>    using the model generated by the larger HBase cluster?
>    3. Can the events be stored in S3 and then imported (pio import)
>    when needed for training, or will we have to copy them out of S3 to our PIO
>    Event/Engine server?
>    4. Have any import benchmarks been done? Events per second or MB/GB
>    per second?
>
> Any assistance would be appreciated.
>
> --Cliff.
>
>
>
>
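For reference, the import-and-train cycle Cliff describes could be sketched
roughly as below. This is only an illustration under stated assumptions, not a
tested deployment script: the bucket, file paths, and app ID are made up, the
start_store/stop_store callbacks are hypothetical hooks for however the HBase
cluster is brought up and down, and the engine is assumed to be already built
(pio build) with the script run from the engine directory. Copying the trained
model to S3 is left out because where the model lives depends on how the PIO
model data repository is configured.

```python
import shlex
import subprocess


def build_commands(app_id, s3_events_uri, local_events_path):
    """Return the CLI steps for one import-and-train cycle, in order."""
    return [
        # Copy the exported events file out of S3 to local disk
        # (assumption: events were previously exported as a JSON file).
        ["aws", "s3", "cp", s3_events_uri, local_events_path],
        # Load the events into the Event Store for the given app.
        ["pio", "import", "--appid", str(app_id), "--input", local_events_path],
        # Train the engine (assumes `pio build` has already been run).
        ["pio", "train"],
    ]


def run_cycle(commands, start_store, stop_store, dry_run=True):
    """Start the store, run each command, and always stop the store after.

    start_store and stop_store are hypothetical callbacks; how the HBase
    cluster is actually started and stopped is outside this sketch.
    """
    executed = []
    start_store()
    try:
        for cmd in commands:
            if dry_run:
                # Only print what would be run.
                print(shlex.join(cmd))
            else:
                # Raise if any step fails so training never runs on bad data.
                subprocess.run(cmd, check=True)
            executed.append(cmd)
    finally:
        # Shut the store down even if import or training failed.
        stop_store()
    return executed


if __name__ == "__main__":
    cmds = build_commands(1, "s3://example-bucket/events.json", "/tmp/events.json")
    run_cycle(cmds, lambda: print("# start HBase"), lambda: print("# stop HBase"))
```

As Pat notes, stopping the Event Store after training only makes sense for
templates that do not query it at serving time, so the final stop step would
have to be dropped for the recommenders.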
