Data import, HBase requirements, and cost savings?
I'm exploring cost-saving options for a customer who wants to use PredictionIO. We plan to run multiple engines/templates, all in AWS, and we hope to avoid keeping the data for every template loaded at once.

The hope is to:
1. Start up the HBase cluster.
2. Import the events.
3. Train the model.
4. Store the model in S3.
5. Shut down the HBase cluster.

We have some general questions:
1. Is this approach even feasible?
2. Does PredictionIO require the Event Store (HBase) to be up and running constantly, or can we turn it off when not training? If it requires HBase constantly, can we do the training from a different HBase cluster and then have separate PIO Event/Engine servers deploy the applications using the model generated by the larger HBase cluster?
3. Can the events be stored in S3 and then imported (pio import) when needed for training, or will we have to copy them out of S3 to our PIO Event/Engine server first?
4. Have any import benchmarks been done? Events per second, or MB/GB per second?

Any assistance would be appreciated.

--Cliff.
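For concreteness, the five-step cycle described above could be sketched roughly as follows. This is only an illustration of the intended sequence, not a tested deployment: the function name `build_cycle`, the bucket URI, and the local path are hypothetical, the cluster start/stop and model-upload steps are left as placeholders because they depend on the environment, and the sketch assumes the events are copied from S3 to local disk before `pio import` runs (which is exactly what question 3 is asking about).

```python
import shlex


def build_cycle(app_id, s3_events_uri, local_events_path):
    """Return the ordered commands for one import-and-train cycle.

    Nothing is executed here; the function only builds the command list
    so the plan can be reviewed or printed as a dry run.
    """
    return [
        # Step 1: start the HBase cluster (environment-specific placeholder).
        ["echo", "TODO: start HBase cluster"],
        # Assumption: events are first copied out of S3 to local disk.
        ["aws", "s3", "cp", s3_events_uri, local_events_path],
        # Step 2: import the events into the app's event store.
        ["pio", "import", "--appid", str(app_id), "--input", local_events_path],
        # Step 3: train the model (run from the engine's directory).
        ["pio", "train"],
        # Step 4: store the model in S3 (placeholder; depends on model storage setup).
        ["echo", "TODO: store trained model in S3"],
        # Step 5: shut down the HBase cluster (environment-specific placeholder).
        ["echo", "TODO: shut down HBase cluster"],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_cycle(1, "s3://example-bucket/events.json", "/tmp/events.json"):
        print(shlex.join(cmd))
```

Whether the cluster can actually be shut down after step 4 depends on the answer to question 2.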