Data import, HBase requirements, and cost savings?

2018-04-10 Miller, Clifford
I'm exploring cost-saving options for a customer who wants to use
PredictionIO. We plan to run multiple engines/templates. We plan to run
everything in AWS and hope not to have all data loaded for all templates
at once. The hope is to:

   1. Start up the HBase cluster.
   2. Import the events.
   3. Train the model.
   4. Store the model in S3.
   5. Shut down the HBase cluster.
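The five steps above could be driven by a small script. The sketch below only builds the command lines (it does not run them) so they can be reviewed first. It assumes the AWS CLI and the `pio` CLI are installed on the host running it, that `pio-env.sh` on that host already points at the HBase cluster being started, and that `pio train` is run from the engine's directory. The bucket URIs, local paths, and EMR settings are hypothetical placeholders. In particular, where PredictionIO writes model data depends on the MODELDATA repository configured in `pio-env.sh`, so the `aws s3 sync` step is an assumption, not how PIO itself stores models.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingPlan:
    """Hypothetical settings for one start -> import -> train -> store -> stop cycle."""
    app_id: int             # PredictionIO app ID that owns this template's events
    events_s3_uri: str      # hypothetical, e.g. "s3://my-bucket/events/app1.json"
    local_events_path: str  # where the events file is copied on the PIO host
    local_model_dir: str    # hypothetical: depends on the MODELDATA repository in pio-env.sh
    model_s3_uri: str       # hypothetical destination for the trained model
    emr_release_label: str  # an EMR release that includes HBase
    instance_type: str = "m4.large"
    instance_count: int = 3


def start_cluster_command(plan: TrainingPlan) -> List[str]:
    # Step 1: start a short-lived EMR cluster with HBase installed.
    return [
        "aws", "emr", "create-cluster",
        "--name", "pio-training",
        "--release-label", plan.emr_release_label,
        "--applications", "Name=HBase",
        "--instance-type", plan.instance_type,
        "--instance-count", str(plan.instance_count),
        "--use-default-roles",
    ]


def train_and_stop_commands(plan: TrainingPlan, cluster_id: str) -> List[List[str]]:
    # Assumes pio-env.sh already points at the new cluster's HBase.
    return [
        # Copy the stored events from S3 to the PIO host.
        ["aws", "s3", "cp", plan.events_s3_uri, plan.local_events_path],
        # Step 2: batch-import the events into the Event Store.
        ["pio", "import", "--appid", str(plan.app_id), "--input", plan.local_events_path],
        # Step 3: train; must be run from the engine's directory.
        ["pio", "train"],
        # Step 4: copy the model output to S3 (hypothetical location, see lead-in).
        ["aws", "s3", "sync", plan.local_model_dir, plan.model_s3_uri],
        # Step 5: shut the HBase cluster down.
        ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id],
    ]


def to_shell(cmd: List[str]) -> str:
    """Render one command as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    plan = TrainingPlan(
        app_id=1,
        events_s3_uri="s3://my-bucket/events/app1.json",
        local_events_path="/tmp/app1-events.json",
        local_model_dir="/tmp/pio-models",
        model_s3_uri="s3://my-bucket/models/app1/",
        emr_release_label="emr-5.13.0",  # example value; use the release you actually run
    )
    print(to_shell(start_cluster_command(plan)))
    for cmd in train_and_stop_commands(plan, cluster_id="j-EXAMPLE"):
        print(to_shell(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` once reviewed; the cluster ID for the last step is the `ClusterId` returned by `aws emr create-cluster`. Keeping the plan as data rather than running commands directly makes it easy to reuse per template.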

We have some general questions.

   1. Is this approach even feasible?
   2. Does PredictionIO require the Event Store (HBase) to be up and
   running constantly, or can we turn it off when we are not training? If it
   requires HBase constantly, can we do the training on a different HBase
   cluster and then have separate PIO Event/Engine servers deploy the
   applications using the model generated by the larger HBase cluster?
   3. Can the events be stored in S3 and then imported (pio import) when
   needed for training, or will we have to copy them out of S3 to our PIO
   Event/Engine server first?
   4. Have any import benchmarks been done, in events per second or MB/GB
   per second?
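On question 4: absent published numbers, one way to get figures for a specific data set is to time a `pio import` run and derive events/s and MB/s from it. This sketch assumes the import file holds one JSON event per line (the batch format `pio import` reads) and takes the import itself as a callable, so it can wrap `subprocess.run` or any other mechanism. It is a hypothetical helper, not an official PredictionIO benchmark tool.

```python
import os
import time
from typing import Callable, Dict


def count_events(path: str) -> int:
    """Count non-blank lines, assuming one JSON event per line."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def measure_import(path: str, run_import: Callable[[str], None]) -> Dict[str, float]:
    """Time one import of `path` and report throughput figures."""
    n_events = count_events(path)
    size_mb = os.path.getsize(path) / (1024 * 1024)
    start = time.perf_counter()
    run_import(path)  # e.g. a wrapper around `pio import --appid ... --input path`
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return {
        "events": n_events,
        "megabytes": size_mb,
        "seconds": elapsed,
        "events_per_sec": n_events / elapsed,
        "mb_per_sec": size_mb / elapsed,
    }
```

A hypothetical call might be `measure_import("/tmp/app1-events.json", lambda p: subprocess.run(["pio", "import", "--appid", "1", "--input", p], check=True))`, where the app ID and path are placeholders. Results will vary with cluster size and Spark settings, so numbers from one setup may not transfer to another.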

Any assistance would be appreciated.

--Cliff.

