Thank you!
[email protected]

From: Azuryy Yu
Date: 2013-12-18 17:29
To: user
Subject: Re: resultset exceed Akka frame size

Hi Leo,

Akka is used to transfer the data back to the master, and there is a setting in Akka for the maximum message size, which defaults to 10 MB. You can find it in:

core/src/main/scala/org/apache/spark/util/AkkaUtils.scala

So just increase spark.akka.frameSize to a larger number.

On Wed, Dec 18, 2013 at 4:49 PM, [email protected] <[email protected]> wrote:

Hi, everyone,

I have a problem when I run the WordCount example. I read 6 GB of data from HDFS, and when I run collect(), the executor dies. Here is the exception:

13/12/18 13:19:39 INFO ClusterTaskSetManager: Lost TID 55 (task 0.0:3)
13/12/18 13:19:39 INFO ClusterTaskSetManager: Loss was due to task 55 result exceeding Akka frame size; aborting job
13/12/18 13:19:39 INFO ClusterScheduler: Remove TaskSet 0.0 from pool
13/12/18 13:19:39 INFO DAGScheduler: Failed to run collect at JavaWordCount.java:60
Exception in thread "main" org.apache.spark.SparkException: Job failed: Task 55 result exceeded Akka frame size
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
    at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)

I saw there are some issues about this question on GitHub; it seems that if the intermediate result set is larger than the Akka frame size, the job will fail.
I want to know whether I can change some parameters to solve the problem?

Thanks,
Leo
[email protected]
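The suggested fix can be sketched as follows. This is a minimal illustration assuming a 2013-era Spark version that reads its configuration from Java system properties; FrameSizeConfig is a hypothetical class name, and 128 (in MB) is an illustrative value — pick one larger than your biggest single task result.

```java
public class FrameSizeConfig {
    public static void main(String[] args) {
        // spark.akka.frameSize is specified in MB. It must be set before the
        // SparkContext (or JavaSparkContext) is created, because the Akka
        // actor system reads the value once at startup.
        System.setProperty("spark.akka.frameSize", "128");

        // Verify the property took effect.
        System.out.println(System.getProperty("spark.akka.frameSize"));
    }
}
```

Depending on how the job is launched, passing -Dspark.akka.frameSize=128 via the JVM options for both the driver and the executors may also work; the key point is that all Akka endpoints must agree on the frame size.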
