Re: AppMaster OOME on YARN

2014-08-22 Thread Vipul Pandey
This is all I see related to spark.MapOutputTrackerMaster in the master logs 
after the OOME:


14/08/21 13:24:45 ERROR ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-27] shutting down ActorSystem [spark]
java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-59" org.apache.spark.SparkException: Error communicating with MapOutputTracker
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:108)
        at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:114)
        at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:319)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:82)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:984)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:449)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://spark/user/MapOutputTracker#112553370]] had already been terminated.
        at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
        at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:104)
 


> 2. Every executor will process roughly 10+ TB / 2000 = ~5 GB of data per
> reduce task. reduceByKey builds a hash table of the unique lines from this
> 5 GB of data and keeps it in memory; that may exceed 16 GB.

So you mean the master gets that information from individual nodes and keeps it 
in memory? 


 
On Aug 21, 2014, at 8:18 PM, Nieyuan  wrote:

> 1. At the beginning of the reduce stage, the master delivers the map output
> info to every executor. You can check stderr for the size of that map output
> info; it should look like:
>   "spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is
> xxx bytes"
> 
> 2. Every executor will process roughly 10+ TB / 2000 = ~5 GB of data per
> reduce task. reduceByKey builds a hash table of the unique lines from this
> 5 GB of data and keeps it in memory; that may exceed 16 GB.





Re: AppMaster OOME on YARN

2014-08-21 Thread Nieyuan
1. At the beginning of the reduce stage, the master delivers the map output
info to every executor. You can check stderr for the size of that map output
info; it should look like:
   "spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is
xxx bytes"
(A rough size estimate for this particular job is sketched below, after
point 2.)

2. Every executor will process roughly 10+ TB / 2000 = ~5 GB of data per
reduce task. reduceByKey builds a hash table of the unique lines from this
5 GB of data and keeps it in memory; that may exceed 16 GB.
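
To put a rough number on point 1 for this job (assuming roughly one map task
per input file and on the order of one byte per map-task/reduce-partition
size entry; this is only a back-of-the-envelope estimate, not a measurement):

   ~90,000 map tasks x 2,000 reduce partitions = ~180 million size entries
   at ~1 byte each  =>  ~180 MB of map output statuses for a single shuffle,
   before JVM object overhead and before the serialized copy that gets
   shipped out to the executors

And a minimal sketch of what point 2 suggests, using the numbers from this
thread (the 8000 partition count is purely illustrative, not a tested
recommendation):

   // Same aggregation as in the original post, but with more reduce
   // partitions so each reduce task builds a smaller hash table
   // (~10 TB / 8000 = ~1.25 GB per task instead of ~5 GB).
   val counts = field1And2.map(x => (x, 1)).reduceByKey(_ + _, 8000)
   // Caveat: more reduce partitions also make the map output statuses
   // the master tracks proportionally larger, so this trades executor
   // memory for master/AppMaster memory.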






AppMaster OOME on YARN

2014-08-21 Thread Vipul Pandey
Hi,

I'm running Spark on YARN, carrying out a simple reduceByKey followed by another 
reduceByKey after some transformations. After completing the first stage, my 
master runs out of memory.
I have 20 GB assigned to the master, 145 executors (12 GB each + 4 GB overhead), 
around 90k input files, 10+ TB of data, 2000 reducers, and no caching.

Below are the two reduceByKey calls:

 val myrdd = field1And2.map(x => (x, 1)).reduceByKey(_ + _, 2000)

The second one feeds off of the first one 

 val countHistogram = myrdd.map(x => (x._2, 1)).reduceByKey(_ + _, 2000)
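
For readability, the same two calls can be chained into a single expression 
(this is just a restatement of the code above, not a change in behavior):

 val countHistogram = field1And2
   .map(x => (x, 1))
   .reduceByKey(_ + _, 2000)   // stage 1: count occurrences of each value
   .map(x => (x._2, 1))
   .reduceByKey(_ + _, 2000)   // stage 2: histogram of those counts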


Any idea what the master is doing that makes it gorge on so much data and fill 
up its heap? There's no collect-style call that would pull data back to the 
master.


Thanks,
Vipul