Also, we are currently using the Similar Product recommender and the E-Commerce recommender.
-- Cliff.

On Tue, Apr 10, 2018, 18:12 Clifford Miller <[email protected]> wrote:

> Thanks for the responses. They are very helpful. We are currently using
> the Event Server for event ingest.
>
> --Cliff.
>
> On Tue, Apr 10, 2018, 16:52 Donald Szeto <[email protected]> wrote:
>
>> Hey Cliff, how are you collecting your events? Is it through PIO's Event
>> Server, or generated somehow by another ETL process?
>>
>> Regards,
>> Donald
>>
>> On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <[email protected]> wrote:
>>
>>> It depends on what templates you are using. For instance, the
>>> recommenders require queries to the EventStore to get user history, so
>>> this will not work for them. Some templates do not require Spark to be
>>> running at scale except for the training phase (the Universal
>>> Recommender, for instance), so for those templates it is much more
>>> cost-effective to stop Spark when not using it.
>>>
>>> Every template uses the PIO framework in different ways. Dropping the
>>> DB is not likely to work, especially if you are using it to store
>>> engine metadata.
>>>
>>> We'd need to know what templates you are using to advise on cost
>>> savings.
>>>
>>> From: Miller, Clifford <[email protected]>
>>> Reply: [email protected]
>>> Date: April 10, 2018 at 11:22:04 AM
>>> To: [email protected]
>>> Subject: Data import, HBase requirements, and cost savings?
>>>
>>> I'm exploring cost-saving options for a customer that wants to use
>>> PredictionIO. We plan on running multiple engines/templates, all in
>>> AWS, and we hope not to have data loaded for every template at once.
>>> The plan is to:
>>>
>>> 1. Start up the HBase cluster.
>>> 2. Import the events.
>>> 3. Train the model.
>>> 4. Store the model in S3.
>>> 5. Shut down the HBase cluster.
>>>
>>> We have some general questions:
>>>
>>> 1. Is this approach even feasible?
>>> 2. Does PredictionIO require the Event Store (HBase) to be up and
>>> running constantly, or can we turn it off when not training? If it
>>> requires HBase constantly, can we do the training on a different HBase
>>> cluster and then have separate PIO Event/Engine servers deploy the
>>> applications using the model generated by the larger HBase cluster?
>>> 3. Can the events be stored in S3 and then imported (pio import) when
>>> needed for training, or will we have to copy them out of S3 to our PIO
>>> Event/Engine server?
>>> 4. Have any import benchmarks been done? Events per second, or MB/GB
>>> per second?
>>>
>>> Any assistance would be appreciated.
>>>
>>> --Cliff.
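[Editor's sketch] The five-step plan discussed in the thread (bring up HBase, import, train, archive to S3, shut down) roughly corresponds to the command sequence below. This is a sketch only: the bucket name, app id, file names, and model path are placeholders, and it assumes `pio` and the AWS CLI are installed and configured on the machine running the batch.

```shell
# Sketch of the import -> train -> archive cycle; all names are placeholders.

# 1. With the HBase cluster up, pull the archived events down from S3
#    (pio import reads from the local filesystem or HDFS, not S3 directly)
aws s3 cp s3://my-bucket/events/events.json ./events.json

# 2. Import the newline-delimited event JSON into the app's event store
pio import --appid 1 --input ./events.json

# 3. Train the engine (run from the engine template's directory)
pio train

# 4. Archive the trained model artifacts to S3; the actual location of
#    the model depends on the template's model persistence settings
aws s3 cp --recursive ./model s3://my-bucket/models/

# 5. Shut down the HBase cluster out-of-band (e.g., terminate the EMR cluster)
```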
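[Editor's sketch] Since the thread settles on the Event Server for ingest, here is a minimal sketch of the JSON one such ingest call carries. The `buy` event, user/item ids, host, and access key are placeholder assumptions; the Event Server's default port is 7070 and the endpoint is `/events.json`.

```python
import json
from datetime import datetime, timezone

# Placeholders for a real deployment: the access key is issued per app
# by `pio app new`, and the Event Server host is wherever it is deployed.
ACCESS_KEY = "YOUR_ACCESS_KEY"
EVENTS_URL = f"http://localhost:7070/events.json?accessKey={ACCESS_KEY}"


def build_event(user_id: str, item_id: str) -> dict:
    """Build one event in the Event Server's JSON format."""
    return {
        "event": "buy",                      # hypothetical event name
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        # eventTime must be ISO 8601 with a timezone offset
        "eventTime": datetime.now(timezone.utc).isoformat(),
    }


# The payload would be POSTed to EVENTS_URL with Content-Type
# application/json; here we just build and show it.
payload = json.dumps(build_event("u1", "i42"))
print(payload)
```

The same JSON lines format is what `pio import` consumes, so events archived to S3 in this shape can be re-imported for a training run without going back through the Event Server.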
