I have a 40-node CDH 5.1 cluster and am attempting to run a simple Spark app that
processes about 10-15 GB of raw data, but I keep running into this error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Each node has 8 cores and 2 GB of memory. I notice that the heap size on the
executors is set to
(- incubator list, + user list)
(Answer copied from original posting at
http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-app-throwing-java-lang-OutOfMemoryError-GC-overhead-limit/m-p/16396#U16396
-- let's follow up in one place. If it's not specific to CDH, this is a
good place