Hi everyone,
I have a problem when I run the WordCount example. I read 6 GB of data from HDFS,
and when I call collect(), the executor dies.
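Here is a simplified version of my code. It is essentially the bundled
JavaWordCount example; the master URL and HDFS paths are placeholders:

import java.util.Arrays;
import java.util.List;

import scala.Tuple2;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

public class JavaWordCount {
  public static void main(String[] args) {
    JavaSparkContext ctx = new JavaSparkContext("spark://master:7077", "JavaWordCount");

    // ~6 GB of text read from HDFS (placeholder path)
    JavaRDD<String> lines = ctx.textFile("hdfs://namenode:9000/input");

    JavaPairRDD<String, Integer> counts = lines
        .flatMap(new FlatMapFunction<String, String>() {
          public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
        })
        .map(new PairFunction<String, String, Integer>() {
          public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
        })
        .reduceByKey(new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer a, Integer b) { return a + b; }
        });

    // This is the call that fails (JavaWordCount.java:60): the full result of
    // each task is sent back to the driver through Akka.
    List<Tuple2<String, Integer>> output = counts.collect();
    System.out.println(output.size() + " distinct words");
  }
}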
Here is the exception:
13/12/18 13:19:39 INFO ClusterTaskSetManager: Lost TID 55 (task 0.0:3)
13/12/18 13:19:39 INFO ClusterTaskSetManager: Loss was due to task 55 result exceeding Akka frame size; aborting job
13/12/18 13:19:39 INFO ClusterScheduler: Remove TaskSet 0.0 from pool
13/12/18 13:19:39 INFO DAGScheduler: Failed to run collect at JavaWordCount.java:60
Exception in thread "main" org.apache.spark.SparkException: Job failed: Task 55 result exceeded Akka frame size
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
I saw some issues about this on GitHub; it seems that if an intermediate
result set is larger than the Akka frame size, the job fails.
Is there a parameter I can change to solve this problem?
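If it helps, here is roughly what I had in mind. I am not sure whether
spark.akka.frameSize is the right property to tune or what a sensible value
would be, and the output path below is just a placeholder:

// Option 1 (untested): raise the Akka frame size. I believe the value is in
// MB and defaults to 10. It has to be set before the SparkContext is created,
// e.g. via a system property:
System.setProperty("spark.akka.frameSize", "64");  // 64 is an arbitrary guess
JavaSparkContext ctx = new JavaSparkContext("spark://master:7077", "JavaWordCount");

// Option 2 (untested): avoid collect() altogether and write the counts
// straight back to HDFS, so no large result is sent to the driver:
counts.saveAsTextFile("hdfs://namenode:9000/wordcount-output");

Would either of these be the right approach?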
Thanks
Leo
[email protected]