Dear Pat, Thanks for the detailed guide. It is nice to know it is possible. But I am not sure I understand it correctly, so could you please point out any misunderstandings in the following (if there are any)?
====
Let's say I have 3 machines.

Machine [EventServer and data store]: runs the ES, HBase+HDFS (or Postgres, but not recommended). The other 2 machines will both connect to this machine. It is permanent.

Machine [TrainingServer]: will run `pio build` and `pio train`. This step pulls training data from [EventServer] and then stores the model and metadata back. It is not permanent.

Machine [PredictionServer]: gets a copy of the template from machine [TrainingServer] (only need to do this once), then runs `pio deploy`. It is not a Spark driver or executor for training. Write a cron job for `pio deploy`. It is permanent.
====

Thanks

Brian

On Wed, Sep 20, 2017 at 11:16 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> Yes, this is the recommended config (Postgres is not, but more on that later). Spark is
> only needed during training, but the `pio train` process creates the driver and
> executors in Spark. The driver will be the `pio train` machine, so you must
> install pio on it. You should have at least 2 Spark machines, because the
> driver and executors need roughly the same memory; more executors will train
> faster.
>
> You will have to spread the pio “workflow” out over a permanent
> deploy+eventserver machine. I usually call this a combo PredictionServer and
> EventServer. These are 2 JVM processes that take events and respond to queries
> and so must be available all the time. You will run `pio eventserver` and
> `pio deploy` on this machine. The Spark driver machine will run `pio train`.
> Since no state is stored in PIO this will work, because the machines get
> state from the DBs (HBase is recommended, and Elasticsearch). Install pio
> and the UR in the same location on all machines, because the path to the UR
> is used by PIO to give an id to the engine (not ideal, but oh well).
>
> Once set up:
>
> Run `pio eventserver` on the permanent PS/ES machine and input your data
> into the EventServer.
> Run `pio build` on the “driver” machine and `pio train` on the same machine.
> This builds the UR, puts metadata about the instance in PIO, and creates the
> Spark driver, which can use a separate machine or 3 as Spark executors.
> Then copy the UR directory to the PS/ES machine and do `pio deploy` from the
> copied directory.
> Shut down the driver machine and Spark executors. For AWS, “stopping” them
> means the config is saved, so you only pay for EBS storage. You will start them
> again before the next train.
>
> From then on there is no need to copy the UR directory; just spin up the
> driver and any other Spark machines, do `pio train`, and you are done. The
> new model is automatically hot-swapped with the old one, with no downtime and no
> need to re-deploy.
>
> This will only work in this order if you want to take advantage of a
> temporary Spark. PIO is installed on the PS/ES machine and the “driver”
> machine in exactly the same way, connecting to the same stores.
>
> Hmm, I should write a How-to for this...
>
>
> On Sep 20, 2017, at 3:23 AM, Brian Chiu <br...@snaptee.co> wrote:
>
> Hi,
>
> I would like to be able to train and run the model on different machines.
> The reason is that, on my dataset, training takes around 16GB of memory and
> deploying only needs 8GB. In order to save money, it would be better
> if only an 8GB machine were used in production, and a
> 16GB one were started perhaps weekly, only for training. Is this possible with
> PredictionIO + Universal Recommender?
>
> I have done some searching and found a related guide here:
> https://github.com/actionml/docs.actionml.com/blob/master/pio_load_balancing.md
> which copies the whole template directory and then runs `pio deploy`. But
> in their case an HBase and Elasticsearch cluster is used. In my case
> only a single machine is used, with Elasticsearch and PostgreSQL. Will
> this work?
> (I am flexible about using PostgreSQL or localfs or HBase,
> but I cannot afford a cluster.)
>
> Perhaps another solution is to make the 16GB machine a Spark slave,
> start it before training starts, and have the 8GB machine connect to
> it, then call `pio train; pio deploy` on the 8GB machine, and finally
> shut down the 16GB machine. But I have no idea if this can work. And if
> yes, is there any documentation I can look into?
>
> Any other method is welcome! Zero downtime is preferred but not necessary.
>
> Thanks in advance.
>
>
> Best Regards,
> Brian
>
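P.S. To check my understanding of the weekly cycle, here is a rough sketch of the script I would put behind cron on the [TrainingServer] machine. The path, and the AWS instance id are made up; I am assuming PIO and the UR template sit at the same path on every machine, as you recommended, and that its config points at the shared EventServer/data-store machine:

```shell
#!/bin/sh
# Hypothetical weekly training script for the [TrainingServer] machine.
# Start this machine (and any extra Spark executors) before cron fires.
set -e

UR_DIR=/opt/pio/universal-recommender   # assumed path, same on all machines

cd "$UR_DIR"

pio build    # only strictly needed again after engine/template changes
pio train    # Spark driver runs here; pulls data from the EventServer,
             # writes the model and metadata back to the shared stores

# The already-deployed PS/ES machine hot-swaps the new model, so no
# re-deploy is needed. This machine can now be stopped, e.g. on AWS:
# aws ec2 stop-instances --instance-ids i-0123456789abcdef0  # made-up id
```

Is that the right shape, or does anything else have to run on the training machine each week?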