Hi everyone,

I have a problem when I run the WordCount example. I read 6 GB of data from HDFS,
and when I call collect(), the executor dies.
Here is the exception:
13/12/18 13:19:39 INFO ClusterTaskSetManager: Lost TID 55 (task 0.0:3)
13/12/18 13:19:39 INFO ClusterTaskSetManager: Loss was due to task 55 result exceeding Akka frame size; aborting job
13/12/18 13:19:39 INFO ClusterScheduler: Remove TaskSet 0.0 from pool
13/12/18 13:19:39 INFO DAGScheduler: Failed to run collect at JavaWordCount.java:60
Exception in thread "main" org.apache.spark.SparkException: Job failed: Task 55 result exceeded Akka frame size
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)

I saw some issues about this question on GitHub; it seems that if an intermediate
result set is larger than the Akka frame size, the job will fail.
I want to know if there is a parameter I can change to work around this.
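From those GitHub issues it looks like spark.akka.frameSize (in MB) might be the relevant setting. This is what I was planning to try before creating the context; I am not sure the property name and the value 128 are right for my version, so please correct me if this is the wrong knob:

```java
public class FrameSizeConfig {
    public static void main(String[] args) {
        // spark.akka.frameSize limits the serialized task-result size, in MB.
        // 128 is an arbitrary larger value I picked for testing; the default
        // seems to be 10, which my collect() result apparently exceeds.
        System.setProperty("spark.akka.frameSize", "128");

        // ...then create the context as usual, e.g.:
        // JavaSparkContext sc = new JavaSparkContext(master, "JavaWordCount");

        // Print the property back so I can confirm it is set.
        System.out.println(System.getProperty("spark.akka.frameSize"));
    }
}
```

Would that be enough, or does the property have to be set on the workers as well?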

Thanks 

Leo
