Depending on the template you are using, the driver and executor memory requirements will grow as your data grows. Spark keeps data in memory to get its speed advantage over something like Hadoop MapReduce, which uses temp files instead. This yields order-of-magnitude speedups but does mean that with big data, PIO and Spark (more specifically) are memory hogs by design. The memory requirements will be far larger than you are used to with DBs or other services. The good thing about Spark is that the data can be spread over the members of a cluster, so if you need a 100g data structure in memory you can put 10g on each of ten executors. Also note that the in-memory data structures may be only loosely related to the size of your input.
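
For concreteness, both values can be passed through on the pio command line; everything after the bare "--" is handed to spark-submit. The heap sizes below are only illustrative and need to be tuned to your data:

    pio train -- --driver-memory 8g --executor-memory 10g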

TLDR; Experiment to find the driver and executor memory required to run train and deploy for your template. For instance, the Universal Recommender needs a lot of memory for train but almost none for deploy, because it does not use Spark at deploy time. Other templates may need more memory for deploy. Unfortunately the template and algorithm greatly affect these numbers, and there is generally no way to determine them other than experiment.
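
As George found below, a Spark flag is also how you set deploy memory, since pio deploy is likewise launched through spark-submit. Assuming the same "--" pass-through as pio train, the fix from this thread looks roughly like:

    pio deploy -- --driver-memory 10g

The 10g figure is just what worked here for a ~350mb model; a template that does no Spark work at deploy time may get by with far less.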


From: George Yarish <[email protected]>
Reply: [email protected] <[email protected]>
Date: July 26, 2018 at 5:51:44 AM
To: [email protected] <[email protected]>
Subject:  Re: Increase heap size for pio deploy  

ok solved by --driver-memory 10g

Sorry for bothering,
George


On Thu, Jul 26, 2018 at 3:25 PM, George Yarish <[email protected]> wrote:
Hi!

Can someone please advise me how to set up the Java heap size properties for the pio 
deploy process?

My current issue is "[ERROR] [LocalFSModels]  Java heap space" during pio 
deploy.
My model takes ~350mb on localfs in model store. 

I was trying something like "JAVA_OPTS=-Xmx4g pio deploy" but it doesn't work for me. 

Thanks,
George
