Depending on the template you are using, driver and executor memory requirements will grow as your data grows. Spark gets its speed advantage over something like Hadoop MapReduce by keeping data in memory instead of in temp files. This yields orders-of-magnitude speedups, but it also means that with big data PIO, and Spark more specifically, is a memory hog by design. The memory requirements will be far larger than you are used to with databases or other services. The good news is that Spark can spread the data over the members of a cluster, so if you need a 100GB data structure in memory you can put, say, 10GB on each of ten executors. Keep in mind that the in-memory data structures may be only loosely related to the size of your input.
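As a sketch of how this is set in practice (the sizes and the master URL below are placeholders to tune for your own data, not recommendations), pio passes anything after a bare -- straight through to spark-submit, so driver and executor memory for training can be set like this:

    # give the Spark driver 8GB and each executor 4GB for pio train
    pio train -- --driver-memory 8g --executor-memory 4g

    # on a standalone cluster, also point at the Spark master so the data is spread across workers
    pio train -- --master spark://your-spark-master:7077 --driver-memory 8g --executor-memory 4g

The flags after -- are ordinary spark-submit options, so anything spark-submit accepts can be tuned the same way.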
TL;DR: experiment to find the driver and executor memory required to run train and deploy for your template. For instance, the Universal Recommender needs a lot of memory for train but almost none for deploy, because it does not use Spark at deploy time. Other templates may need more memory for deploy. Unfortunately the template and algorithm greatly affect these numbers, and in general the only way to determine them is to experiment.

From: George Yarish <[email protected]>
Reply: [email protected] <[email protected]>
Date: July 26, 2018 at 5:51:44 AM
To: [email protected] <[email protected]>
Subject: Re: Increase heap size for pio deploy

ok solved by --driver-memory 10g

Sorry for bothering,
George

On Thu, Jul 26, 2018 at 3:25 PM, George Yarish <[email protected]> wrote:

Hi!

Can someone please advise me how to set up Java heap size properties for the pio deploy process? My current issue is "[ERROR] [LocalFSModels] Java heap space" during pio deploy. My model takes ~350mb on localfs in the model store. I was trying something like "JAVA_OPTS=-Xmx4g pio deploy", which doesn't work for me.

Thanks,
George
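The fix from the thread above follows the same pattern for deploy (10g is simply the value that worked for this particular model; treat it as a starting point to tune):

    # raise the driver heap for the deployed prediction server
    pio deploy -- --driver-memory 10g

Setting JAVA_OPTS did not help here because pio deploy launches its server through spark-submit, so the heap for the process that loads the model has to be set with Spark's own memory flags rather than a plain JVM option.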
