Hi, I have a Spark job whose executors hit OOM after some time; the job then hangs, followed by a couple of IOExceptions, "RPC client disassociated", "shuffle not found", etc.
I have tried almost everything and still don't know how to solve this OOM issue; please guide me, I am fed up now. Here is what I tried, but nothing worked:

- 60 executors, each with 12 GB / 2 cores
- 30 executors, each with 20 GB / 2 cores
- 40 executors, each with 30 GB / 6 cores (I also tried 7 and 8 cores)
- setting spark.storage.memoryFraction to 0.2 to solve the OOM issue; I also tried setting it to 0.0
- setting spark.shuffle.memoryFraction to 0.4, since I need more shuffle memory
- setting spark.default.parallelism to 500, 1000, and 1500, but it did not help avoid the OOM; what is the ideal value for it?
- setting spark.sql.shuffle.partitions to 500, but it did not help; it just creates 500 output part files

(A sketch of these configuration combinations is below.) Please help me understand the difference between spark.default.parallelism and spark.sql.shuffle.partitions.

My data is skewed, but it is not that large, so I don't understand why it is hitting OOM. I don't cache anything; I just have four group-by queries that I call using hivecontext.sql(). I spawn around 1000 threads from the driver, and each thread executes these four queries (also sketched below).
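For concreteness, here is a rough sketch of one of the configurations I tried, expressed as a SparkConf (Spark 1.x-era settings; the values are from the attempts listed above, and the app name is just a placeholder):

    import org.apache.spark.SparkConf

    // One of the combinations I tried (Spark 1.x settings on YARN);
    // the exact values varied across runs as listed above.
    val conf = new SparkConf()
      .setAppName("myGroupByJob")                  // placeholder name
      .set("spark.executor.instances", "30")       // tried 30, 40, 60 executors
      .set("spark.executor.memory", "20g")         // tried 12g, 20g, 30g
      .set("spark.executor.cores", "2")            // tried 2, 6, 7, 8
      .set("spark.storage.memoryFraction", "0.2")  // also tried 0.0 (I cache nothing)
      .set("spark.shuffle.memoryFraction", "0.4")  // hoping for more shuffle memory
      .set("spark.default.parallelism", "1000")    // tried 500, 1000, 1500
      .set("spark.sql.shuffle.partitions", "500")  // just produced 500 part files

I am not sure which of the last two settings actually governs the shuffles in my group-by queries, which is why I am asking about the difference between them.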
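And this is roughly what my driver-side code does (the SQL strings, table names, and the count() action are stand-ins for my real queries and output logic):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}
    import org.apache.spark.sql.hive.HiveContext

    // Roughly my driver pattern: ~1000 threads, each running the same
    // four group-by queries through HiveContext.
    def runAll(hiveContext: HiveContext): Unit = {
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1000))

      // Stand-ins for my four group-by queries
      val queries = Seq(
        "SELECT key1, count(*) FROM tbl1 GROUP BY key1",
        "SELECT key2, sum(v)   FROM tbl2 GROUP BY key2",
        "SELECT key3, max(v)   FROM tbl3 GROUP BY key3",
        "SELECT key4, avg(v)   FROM tbl4 GROUP BY key4")

      // Each thread submits all four queries concurrently
      (1 to 1000).foreach { _ =>
        Future { queries.foreach(q => hiveContext.sql(q).count()) }
      }
    }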