Hey Cliff, how are you collecting your events? Is it through PIO's Event Server, or generated somehow by another ETL process?
Regards,
Donald

On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <[email protected]> wrote:

> It depends on what templates you are using. For instance the recommenders
> require queries to the EventStore to get user history so this will not work
> for them. Some templates do not require Spark to be running at scale except
> for the training phase (the Universal Recommender for instance) so for that
> template it is much more cost-effective to stop Spark when not using it.
>
> Every template uses the PIO framework in different ways. Dropping the DB
> is not likely to work, especially if you are using it to store engine
> metadata.
>
> We’d need to know what templates you are using to advise cost savings.
>
> From: Miller, Clifford <[email protected]>
> Reply: [email protected] <[email protected]>
> Date: April 10, 2018 at 11:22:04 AM
> To: [email protected] <[email protected]>
> Subject: Data import, HBase requirements, and cost savings ?
>
> I'm exploring cost saving options for a customer that wants to
> utilize PredictionIO. We plan on running multiple engines/templates. We
> are planning on running everything in AWS and are hoping to not have all
> data loaded for all templates at once. The hope is to:
>
>    1. Start up the HBase cluster.
>    2. Import the events.
>    3. Train the model.
>    4. Store the model in S3.
>    5. Shut down the HBase cluster.
>
> We have some general questions.
>
>    1. Is this approach even feasible?
>    2. Does PredictionIO require the Event Store (HBase) to be up and
>    running constantly or can we turn it off when not training? If it requires
>    HBase constantly can we do the training from a different HBase cluster and
>    then have separate PIO Event/Engine servers to deploy the applications
>    using the model generated by the larger HBase cluster?
>    3. Can the events be stored in S3 and then imported (pio import)
>    when needed for training? Or will we have to copy them out of S3 to our
>    PIO Event/Engine server?
>    4. Have any import benchmarks been done? Events per second or MB/GB
>    per second?
>
> Any assistance would be appreciated.
>
> --Cliff.
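
For reference, the train-then-shut-down cycle Cliff lists could be sketched roughly as below. This is only a dry-run outline under stated assumptions, not a tested deployment: the two cluster scripts, the bucket URI and the app id are hypothetical placeholders, and it assumes the events are copied out of S3 to the PIO host before `pio import` (one of the two options asked about, not a confirmed answer). Where the trained model ends up depends on how PIO's model storage is configured, so step 4 is not shown. Per Pat's note, this only suits templates that do not query the EventStore at serving time.

```python
import shlex
from typing import List


def build_training_plan(
    app_id: int,
    s3_events_uri: str,
    local_events: str = "events.json",
) -> List[List[str]]:
    """Return the commands for one training cycle, in order, without running them."""
    return [
        # Hypothetical placeholder: however the HBase cluster is brought up.
        ["./start-hbase-cluster.sh"],
        # Copy the exported events out of S3 onto the PIO host.
        ["aws", "s3", "cp", s3_events_uri, local_events],
        # Import the events into the event store for this app.
        ["pio", "import", "--appid", str(app_id), "--input", local_events],
        # Train the engine; meant to be run from the engine's directory.
        ["pio", "train"],
        # Hypothetical placeholder: tear the cluster down again.
        ["./stop-hbase-cluster.sh"],
    ]


if __name__ == "__main__":
    # Print the plan for review instead of executing anything.
    for cmd in build_training_plan(1, "s3://example-bucket/events.json"):
        print(shlex.join(cmd))
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; each step could later be handed to `subprocess.run` once the placeholders are replaced with real cluster tooling.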
