Hi Leo,

Akka is used to transfer task results back to the driver, and there is a setting in Akka for the maximum message size, which defaults to 10 MB. You can find it in core/src/main/scala/org/apache/spark/util/AkkaUtils.scala.

So just increase spark.akka.frameSize to a larger number.
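For example, a minimal sketch in the Spark 0.8.x style, where configuration is read from Java system properties before the SparkContext is created; the master URL, input path, and the value 128 are placeholders, not values from your job:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // brings in reduceByKey via implicits

    object WordCount {
      def main(args: Array[String]) {
        // Must be set before the SparkContext is created, since 0.8.x
        // reads configuration from Java system properties at startup.
        // The value is in MB; 128 is only an illustrative choice.
        System.setProperty("spark.akka.frameSize", "128")

        // Placeholder master URL and input path.
        val sc = new SparkContext("spark://master:7077", "WordCount")
        val counts = sc.textFile("hdfs:///path/to/input")
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)

        // collect() ships each task's result to the driver over Akka,
        // which is where the frame-size limit is enforced.
        counts.collect().foreach(println)
      }
    }

The same property can also be set on the JVM command line, e.g. -Dspark.akka.frameSize=128 via SPARK_JAVA_OPTS in conf/spark-env.sh.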
On Wed, Dec 18, 2013 at 4:49 PM, [email protected] <[email protected]> wrote:

> Hi, everyone
>
> I have a problem when I run the WordCount example. I read 6 GB of data
> from HDFS, and when I run collect(), the executor dies.
> Here is the exception:
>
> 13/12/18 13:19:39 INFO ClusterTaskSetManager: Lost TID 55 (task 0.0:3)
> 13/12/18 13:19:39 INFO ClusterTaskSetManager: Loss was due to task 55
> result exceeding Akka frame size; aborting job
> 13/12/18 13:19:39 INFO ClusterScheduler: Remove TaskSet 0.0 from pool
> 13/12/18 13:19:39 INFO DAGScheduler: Failed to run collect at
> JavaWordCount.java:60
> Exception in thread "main" org.apache.spark.SparkException: Job failed:
> Task 55 result exceeded Akka frame size
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>
> I saw there are some issues about this on GitHub; it seems that if an
> intermediate result set is larger than the Akka frame size, the job
> will fail.
> I want to know whether I can change some parameters to solve the problem.
>
> Thanks
>
> Leo
