Re: AppMaster OOME on YARN
This is all that I see related to spark.MapOutputTrackerMaster in the master logs after the OOME:

14/08/21 13:24:45 ERROR ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-27] shutting down ActorSystem [spark]
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-59" org.apache.spark.SparkException: Error communicating with MapOutputTracker
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:108)
        at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:114)
        at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:319)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:82)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:984)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:449)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://spark/user/MapOutputTracker#112553370]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:104)

> 2. Every executor will process 10+TB / 2000 = 5G of data. reduceByKey will
> build a hash table of the unique lines from this 5G of data and keep it in
> memory. That may exceed 16G.

So you mean the master gets that information from the individual nodes and keeps it in memory?

On Aug 21, 2014, at 8:18 PM, Nieyuan wrote:

> 1. At the beginning of the reduce task, the master will deliver the map
> output info to every executor. You can check stderr to find the size of
> the map output info. It should be:
> "spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is
> xxx bytes"
>
> 2. Every executor will process 10+TB / 2000 = 5G of data. reduceByKey will
> build a hash table of the unique lines from this 5G of data and keep it in
> memory. That may exceed 16G.
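For a sense of scale, the map output info the tracker holds on the master grows with (number of map tasks) x (number of reducers). Below is a rough back-of-the-envelope estimate in Scala, assuming one map task per input file (~90k, per the original post) and the 2000 reduce partitions from the reduceByKey calls; the one byte per shuffle-block size matches Spark's compressed MapStatus encoding, so treat this as a sketch, not a measurement:

object MapStatusEstimate {
  def main(args: Array[String]): Unit = {
    val numMapTasks = 90000L   // assumption: ~90k input files => ~90k map tasks
    val numReducers = 2000L    // from reduceByKey(_ + _, 2000)
    val bytesPerSize = 1L      // compressed shuffle-block size: ~1 byte per entry

    val entries = numMapTasks * numReducers
    val mb = entries * bytesPerSize / (1024.0 * 1024)
    println(f"status entries: $entries%,d, roughly $mb%.0f MB per serialized copy")
    // ~180,000,000 entries, ~172 MB per copy. The in-memory map, its serialized
    // form, and send buffers for executors requesting it can each hold a copy
    // at once, which adds up quickly against a 20G master heap.
  }
}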
Re: AppMaster OOME on YARN
1. At the beginning of the reduce task, the master will deliver the map output info to every executor. You can check stderr to find the size of the map output info. It should be:
"spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is xxx bytes"

2. Every executor will process 10+TB / 2000 = 5G of data. reduceByKey will build a hash table of the unique lines from this 5G of data and keep it in memory. That may exceed 16G.
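A simplified Scala sketch of the reduce-side aggregation described in point 2: every (key, value) record is folded into an in-memory hash table, so with ~5G per reducer of mostly-unique lines the table itself is the memory hog. Spark's actual implementation (ExternalAppendOnlyMap) can spill to disk, but the idea is the same; the names here are illustrative, not Spark internals:

import scala.collection.mutable

// Fold each (key, value) record into a hash table, combining values for
// duplicate keys -- the per-partition structure reduceByKey builds.
def reduceSide[K, V](records: Iterator[(K, V)], combine: (V, V) => V): Iterator[(K, V)] = {
  val table = mutable.HashMap.empty[K, V]
  for ((k, v) <- records)
    table(k) = table.get(k).map(old => combine(old, v)).getOrElse(v)
  table.iterator
}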
AppMaster OOME on YARN
Hi,

I'm running Spark on YARN, carrying out a simple reduceByKey followed by another reduceByKey after some transformations. After completing the first stage, my master runs out of memory. I have 20G assigned to the master, 145 executors (12G each + 4G overhead), around 90k input files, 10+TB of data, 2000 reducers, and no caching.

Below are the two reduceByKey calls:

val myrdd = field1And2.map(x => (x, 1)).reduceByKey(_ + _, 2000)

The second one feeds off of the first one:

val countHistogram = myrdd.map(x => (x._2, 1)).reduceByKey(_ + _, 2000)

Any idea why the master is gorging on so much data and filling up its heap? There's no collect-style call that would bring data back to the master.

Thanks,
Vipul
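For reference, a minimal self-contained version of the two-stage job described above, runnable locally on toy data. The input path and the field1And2 extraction are placeholders for the real 90k-file, 10+TB dataset:

import org.apache.spark.{SparkConf, SparkContext}

object HistogramJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("HistogramJob").setMaster("local[*]"))

    // Placeholder input and field extraction (hypothetical).
    val field1And2 = sc.textFile("data/*.txt")
      .map(line => line.split('\t').take(2).mkString("\t"))

    // Stage 1: count occurrences of each (field1, field2) line.
    val myrdd = field1And2.map(x => (x, 1)).reduceByKey(_ + _, 2000)

    // Stage 2: histogram of those counts (how many keys occur n times).
    val countHistogram = myrdd.map(x => (x._2, 1)).reduceByKey(_ + _, 2000)

    // Write out instead of collecting -- nothing is pulled back to the driver.
    countHistogram.saveAsTextFile("out/histogram")
    sc.stop()
  }
}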