The command line for any `pio` command launched on Spark can specify the master, so you can train on one cluster and deploy on another. This is typical with the ALS recommenders, which train on a big cluster but deploy with `pio deploy -- --master local[2]`, which uses a local Spark context to load and serve the model. Beware of memory use: whatever machine runs the `pio` command also runs the Spark driver, which can need as much memory as the executors that run on the cluster. If you run two contexts on the same machine, one with a local master and one with a cluster master, you will have two drivers and possibly local executors as well.
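As a minimal sketch of that split, assuming a standalone Spark master at `spark://master-host:7077` (a placeholder URL) and memory settings you would tune to your own data; everything after `--` is passed through to `spark-submit`:

```bash
# Train against the cluster; the driver still runs on the machine
# where `pio train` is invoked, so give it memory comparable to the executors.
pio train -- --master spark://master-host:7077 \
  --driver-memory 8g --executor-memory 8g

# Deploy with a small local Spark context, used only to load and serve the model.
pio deploy -- --master local[2] --driver-memory 4g
```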
YARN allows you to run the driver itself on a cluster machine (cluster deploy mode) but is somewhat complicated to set up.

On Oct 21, 2016, at 4:53 AM, Georg Heiler <[email protected]> wrote:

> Hi,
>
> I am curious whether PredictionIO supports different environments, e.g. is it possible to define a separate Spark context for training and serving of the model in engine.json? The idea is that a trained model, e.g. xgboost, could be evaluated very quickly outside of a cluster environment (no YARN, ... involved, only PredictionIO in Docker with a database + model in the file system).
>
> Cheers,
> Georg
