1) As I am limited to 12G and I was doing a broadcast join (collect the data
and then publish it to the executors), it was throwing an OOM. The data size
was 25G and the limit was 12G, hence I reverted to a regular join.
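To make the memory constraint above concrete, here is a toy illustration in plain Python (NOT the Spark API; the 12G and 25G figures come from the thread, everything else is made up): a broadcast join materializes the entire broadcast side as an in-memory map on every executor, so it only works when that side fits within the per-executor memory limit.

```python
# Toy illustration (plain Python, NOT the Spark API) of why a broadcast
# join needs the whole broadcast side in memory on every executor.

MEMORY_LIMIT_GB = 12    # per-executor limit from the thread
BROADCAST_SIDE_GB = 25  # size of the collected data from the thread

def can_broadcast(side_gb, limit_gb):
    """A broadcast join ships the entire broadcast side to each executor."""
    return side_gb <= limit_gb

def broadcast_join(big_rows, small_rows, key):
    """Map-side join: build a hash map of the small side, then probe it."""
    lookup = {row[key]: row for row in small_rows}  # whole side in memory
    return [{**b, **lookup[b[key]]} for b in big_rows if b[key] in lookup]

# 25G does not fit in the 12G limit, so the broadcast join OOMs and the
# regular shuffle join (both sides partitioned by key) is the fallback.
print(can_broadcast(BROADCAST_SIDE_GB, MEMORY_LIMIT_GB))  # False
```

The shuffle join avoids this by never holding a whole table on one node: each task only sees the rows for its own partition of the join key.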
2) I started using repartitioning; I began with 100 partitions and am now
trying 200. At the beginning it looked promising.
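A rough way to sanity-check a partition count like 100 or 200 is to divide the data size by a target per-partition size. As a sketch (the 128MB target is an assumption on my part, a commonly used HDFS-block-sized figure, not something stated in the thread):

```python
# Rough partition-count arithmetic for the ~25G dataset in the thread.
# The 128MB-per-partition target is an assumed rule of thumb.

DATA_SIZE_MB = 25 * 1024     # ~25G of data
TARGET_PARTITION_MB = 128    # assumed comfortable partition size

partitions = -(-DATA_SIZE_MB // TARGET_PARTITION_MB)  # ceiling division
print(partitions)  # 200
```

By this estimate, 200 partitions lands right around the 128MB-per-partition mark, while 100 partitions would mean ~256MB each, which puts more pressure on each task's memory.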
Hi Deepak,
I wrote a couple of posts with a bunch of different information about how to
tune Spark jobs. The second one might be helpful for thinking about how to
tune the number of partitions and resources. What kind of OOMEs are you
hitting?
http://blog.cloudera.com/blog/2015/03/how-to-tune-your
Really not an expert here, but try the following ideas:
1) I assume you are using YARN; this blog post is very good on resource
tuning:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
2) If 12G is a hard limit in this case, then you have no option but to lower your
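For the resource-tuning point in 1), a `spark-submit` sketch along the lines of the Cloudera post — all values here are illustrative assumptions for a 12G-per-node budget, not the poster's actual settings, and `your-app.jar` is a placeholder:

```shell
# Illustrative only: fitting an executor inside a ~12G YARN container.
# Executor memory plus spark.yarn.executor.memoryOverhead (roughly 7-10%
# of executor memory) must fit in the container, so leave headroom.
spark-submit \
  --master yarn \
  --executor-memory 10g \
  --conf spark.yarn.executor.memoryOverhead=1536 \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=200 \
  your-app.jar
```

The key idea from the post is that the memory YARN grants is executor memory plus overhead, so asking for a full 12g executor on a 12G node will be rejected or killed by the container limit.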