Java Heap Space Error

2018-02-16 Thread Vinay Muttineni
Hello, I am trying to debug a PySpark program and quite frankly, I am stumped. I see the following error in the logs. I verified the input parameters - all appear to be in order. Driver and executors appear to be proper - about 3MB of 7GB being used on each node. I do see that the DAG plan that i

OOM error with GMMs on 4GB dataset

2015-05-04 Thread Vinay Muttineni
Hi, I am training a GMM with 10 gaussians on a 4 GB dataset(720,000 * 760). The spark (1.3.1) job is allocated 120 executors with 6GB each and the driver also has 6GB. Spark Config Params: .set("spark.hadoop.validateOutputSpecs", "false").set("spark.dynamicAllocation.enabled", "false").set("spark.