Re: Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-29 Thread mmdenny
Hi Supun, Did you look at https://spark.apache.org/docs/latest/tuning.html? In addition to the info there: if you're partitioning by some key with heavy data skew, a single task's memory requirement can exceed the RAM of a given executor, even though the rest of the tasks may be fine.
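
A common mitigation for that kind of skew is two-phase aggregation with a salted key: append a random salt so the hot key is split across many tasks, aggregate per (key, salt), then combine the partials. A minimal sketch follows; the object name, column names, toy data, and bucket count are all illustrative assumptions, not anything from this thread.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltedAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("salted-aggregation")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data: the key "hot" dominates, simulating skew.
    val events = spark.sparkContext
      .parallelize(Seq.fill(100000)(("hot", 1)) ++ Seq.fill(100)(("cold", 1)))
      .toDF("key", "value")

    val saltBuckets = 16

    // Phase 1: aggregate on (key, salt) so the hot key is spread
    // across up to 16 tasks instead of landing on one executor.
    val partial = events
      .withColumn("salt", (rand() * saltBuckets).cast("int"))
      .groupBy($"key", $"salt")
      .agg(sum($"value").as("partialSum"))

    // Phase 2: combine the per-salt partials into per-key totals.
    val totals = partial
      .groupBy($"key")
      .agg(sum($"partialSum").as("total"))

    totals.show()
    spark.stop()
  }
}

The partial aggregates are small, so the second groupBy no longer concentrates the hot key's raw rows on a single task.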

Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-27 Thread Supun Nakandala
Hi all, I am trying to do an image-analytics-type workload using Spark. The images are read in JPEG format and then converted to raw format in map functions, which causes the size of the partitions to grow by roughly an order of magnitude. In addition to this, I am caching some of the data because my …
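
A minimal sketch of the decode-then-cache pattern described above, assuming JPEGs on HDFS and Java's ImageIO for decoding (the path, the decode step, and the repartition factor are illustrative assumptions, not from the original post). It repartitions before the decode so the inflated partitions stay small, and it caches in serialized, disk-spillable form, which is typically easier on the GC than the default MEMORY_ONLY cache.

import java.io.ByteArrayInputStream
import javax.imageio.ImageIO
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object ImageDecode {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("image-decode")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // binaryFiles yields (path, bytes) pairs of the compressed JPEGs.
    val jpegs = sc.binaryFiles("hdfs:///data/images/*.jpg") // illustrative path

    // Decoding inflates each record roughly 10x, so repartition the
    // compressed data first to keep post-decode partitions within
    // executor memory.
    val rawPixels = jpegs
      .repartition(jpegs.getNumPartitions * 10)
      .mapValues { stream =>
        val img = ImageIO.read(new ByteArrayInputStream(stream.toArray()))
        // Flatten to raw ARGB ints; this is where the size blow-up happens.
        img.getRGB(0, 0, img.getWidth, img.getHeight, null, 0, img.getWidth)
      }

    // Serialized + disk-spillable caching trades some CPU for far less
    // GC pressure than caching deserialized objects on the heap.
    rawPixels.persist(StorageLevel.MEMORY_AND_DISK_SER)

    println(rawPixels.count())
    spark.stop()
  }
}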