Thanks for the responses.  They are very helpful.  We are currently using
the Event Server for event ingest.

--Cliff.

On Tue, Apr 10, 2018, 16:52 Donald Szeto <don...@apache.org> wrote:

> Hey Cliff, how are you collecting your events? Is it through PIO's Event
> Server, or generated somehow by another ETL process?
>
> Regards,
> Donald
>
> On Tue, Apr 10, 2018 at 1:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> It depends on what templates you are using. For instance, the recommenders
>> query the EventStore to get user history, so this will not work for them.
>> Some templates (the Universal Recommender, for instance) do not require
>> Spark to be running at scale except for the training phase, so for those
>> templates it is much more cost-effective to stop Spark when not using it.
>>
>> Every template uses the PIO framework in different ways. Dropping the DB
>> is not likely to work, especially if you are using it to store engine
>> metadata.
>>
>> We’d need to know what templates you are using to advise on cost savings.
>>
>> From: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com>
>> Reply: user@predictionio.apache.org <user@predictionio.apache.org>
>> Date: April 10, 2018 at 11:22:04 AM
>> To: user@predictionio.apache.org <user@predictionio.apache.org>
>> Subject:  Data import, HBase requirements, and cost savings ?
>>
>> I'm exploring cost-saving options for a customer who wants to use
>> PredictionIO.  We plan on running multiple engines/templates.  We are
>> planning on running everything in AWS and are hoping not to have all
>> data loaded for all templates at once.  The hope is to:
>>
>>    1. Start up the HBase cluster.
>>    2. Import the events.
>>    3. Train the model.
>>    4. Store the model in S3.
>>    5. Shut down the HBase cluster.
>>
>> We have some general questions.
>>
>>    1. Is this approach even feasible?
>>    2. Does PredictionIO require the Event Store (HBase) to be up and
>>    running constantly, or can we turn it off when not training?  If it
>>    requires HBase constantly, can we do the training from a different
>>    HBase cluster and then have separate PIO Event/Engine servers to deploy
>>    the applications using the model generated by the larger HBase cluster?
>>    3. Can the events be stored in S3 and then imported (pio import)
>>    when needed for training, or will we have to copy them out of S3 to our
>>    PIO Event/Engine server?
>>    4. Have any import benchmarks been done?  Events per second or MB/GB
>>    per second?
>>
>> Any assistance would be appreciated.
>>
>> --Cliff.
>>
>
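
For reference, the import-then-train cycle from the original question could be
sketched roughly as below. This is only a sketch under assumptions, not a tested
deployment: `pio import --appid ... --input ...` and `pio train` are the PIO CLI
commands, but `start_store` and `stop_store` are hypothetical callables standing
in for whatever brings the HBase cluster up and down in your AWS setup, and where
the trained model ends up depends on your PIO storage configuration, which is not
shown. As Pat notes, templates that query the EventStore while serving would
still need it running after training.

```python
import subprocess
from typing import Callable, List


def build_training_plan(app_id: int, events_path: str) -> List[List[str]]:
    """Return the PIO commands for one import-then-train cycle, in order."""
    return [
        # Batch-import events from a file of JSON events into the given app.
        ["pio", "import", "--appid", str(app_id), "--input", events_path],
        # Train the engine; run this from the engine template directory.
        ["pio", "train"],
    ]


def run_cycle(
    app_id: int,
    events_path: str,
    start_store: Callable[[], None],  # hypothetical: start the HBase cluster
    stop_store: Callable[[], None],   # hypothetical: shut the HBase cluster down
    run=subprocess.run,
) -> None:
    """Start the event store, import and train, then always stop the store."""
    start_store()
    try:
        for cmd in build_training_plan(app_id, events_path):
            run(cmd, check=True)  # raises if a pio command fails
    finally:
        stop_store()  # runs even when import or training fails
```

The `try`/`finally` is there so a failed import or training run does not leave
the cluster running and billing.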
