Thanks for the responses. They are very helpful. We are currently using the Event server for event ingest.
--Cliff.

On Tue, Apr 10, 2018, 16:52 Donald Szeto <don...@apache.org> wrote:

> Hey Cliff, how are you collecting your events? Is it through PIO's Event
> Server, or generated somehow by another ETL process?
>
> Regards,
> Donald
>
> On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> It depends on what templates you are using. For instance the recommenders
>> require queries to the EventStore to get user history so this will not work
>> for them. Some templates do not require Spark to be running at scale except
>> for the training phase (The Universal Recommender for instance) so for that
>> template it is much more cost-effective to stop Spark when not using it.
>>
>> Every template uses the PIO framework in different ways. Dropping the DB
>> is not likely to work, especially if you are using it to store engine
>> metadata.
>>
>> We’d need to know what templates you are using to advise cost savings.
>>
>> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
>> Reply: firstname.lastname@example.org <email@example.com>
>> Date: April 10, 2018 at 11:22:04 AM
>> To: email@example.com <firstname.lastname@example.org>
>> Subject: Data import, HBase requirements, and cost savings ?
>>
>> I'm exploring cost saving options for a customer that is wanting to
>> utilize PredictionIO. We plan on running multiple engines/templates. We
>> are planning on running everything in AWS and are hoping to not have all
>> data loaded for all templates at once. The hope is to:
>>
>> 1. start up the HBase cluster.
>> 2. Import the events.
>> 3. Train the model
>> 4. then store the model in S3.
>> 5. Then shutdown HBase cluster
>>
>> We have some general questions.
>>
>> 1. Is this approach even feasible?
>> 2. Does PredictionIO require the Event Store (HBase) to be up and
>> running constantly or can we turn it off when not training? If it
>> requires HBase constantly can we do the training from a different HBase
>> cluster and then have separate PIO Event/Engine servers to deploy the
>> applications using the model generated by the larger Hbase cluster?
>> 3. Can the events be stored in S3 and then imported in (pio import)
>> when needed for training? or will we have to copy them out of S3 to our
>> PIO Event/Engine server?
>> 4. Has any import benchmarks been done? Events per second or MB/GB
>> per second?
>>
>> Any assistance would be appreciated.
>>
>> --Cliff.
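
For reference, the import-and-train cycle proposed in the original message could be scripted roughly as below. This is only a sketch under several assumptions: the helper names (plan_training_run, run_plan), the bucket, and the paths are hypothetical; it takes the conservative route of copying the exported events out of S3 before running pio import, rather than assuming pio import can read from S3 directly; and it leaves starting and stopping the HBase cluster out entirely, since that depends on how the cluster is provisioned. Where the trained model ends up depends on the PIO storage configuration, which is not shown. As noted in the thread, templates that query the EventStore at serving time would still need HBase running after training.

```python
import shlex
import subprocess
from typing import List


def plan_training_run(app_id: int, events_s3_uri: str, local_events: str) -> List[List[str]]:
    """Return the ordered commands for one import-and-train cycle.

    Assumes the HBase cluster has already been started by some other means.
    """
    return [
        # Copy the exported events out of S3 onto the PIO host.
        ["aws", "s3", "cp", events_s3_uri, local_events],
        # Load the events into the Event Store (HBase must be up at this point).
        ["pio", "import", "--appid", str(app_id), "--input", local_events],
        # Train the engine; this must run from the engine directory.
        ["pio", "train"],
    ]


def run_plan(plan: List[List[str]], engine_dir: str, dry_run: bool = True) -> List[str]:
    """Print each command and, unless dry_run is set, execute it in engine_dir."""
    printed = []
    for cmd in plan:
        line = " ".join(shlex.quote(part) for part in cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the cycle as soon as any step fails.
            subprocess.run(cmd, cwd=engine_dir, check=True)
    return printed
```

With dry_run left at its default, run_plan(plan_training_run(1, "s3://example-bucket/events.json", "/tmp/events.json"), "/path/to/engine") only prints the three commands, which makes it easy to review the sequence before pointing it at a live cluster.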