Spark 0.8.1 (https://spark.incubator.apache.org/releases/spark-release-0-8-1.html), which was released yesterday, adds support for fetching large result sets without needing to tune frame sizes (for large results, it collects them via the BlockManager instead of sending them in Akka messages).
On Wed, Dec 18, 2013 at 1:42 AM, [email protected] <[email protected]> wrote:

> Thank you!
>
> ------------------------------
> [email protected]
>
> *From:* Azuryy Yu <[email protected]>
> *Date:* 2013-12-18 17:29
> *To:* user <[email protected]>
> *Subject:* Re: resultset exceed Akka frame size
>
> Hi Leo,
>
> Akka is used to transfer the data back to the master, and there is a
> setting in Akka for the maximum message size, which defaults to 10 MB.
> You can find it at:
> core/src/main/scala/org/apache/spark/util/AkkaUtils.scala
>
> So just increase spark.akka.frameSize to a larger number.
>
>
> On Wed, Dec 18, 2013 at 4:49 PM, [email protected]
> <[email protected]> wrote:
>
>> Hi, everyone
>>
>> I have a problem when I run the WordCount example. I read 6 GB of data
>> from HDFS, and when I run collect(), the executor dies.
>> Here is the exception:
>>
>> 13/12/18 13:19:39 INFO ClusterTaskSetManager: Lost TID 55 (task 0.0:3)
>> 13/12/18 13:19:39 INFO ClusterTaskSetManager: Loss was due to task 55
>> result exceeding Akka frame size; aborting job
>> 13/12/18 13:19:39 INFO ClusterScheduler: Remove TaskSet 0.0 from pool
>> 13/12/18 13:19:39 INFO DAGScheduler: Failed to run collect at
>> JavaWordCount.java:60
>> Exception in thread "main" org.apache.spark.SparkException: Job failed:
>> Task 55 result exceeded Akka frame size
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>>
>> I saw some issues about this on GitHub; it seems that if an intermediate
>> result set is larger than the Akka frame size, the job will fail.
>> Can I change some parameters to solve this problem?
>>
>> Thanks
>>
>> Leo
>>
>> ------------------------------
>> [email protected]
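To make the fix concrete, here is a minimal sketch of raising the frame size in the Spark 0.8-era style, where configuration was passed as JVM system properties read before the SparkContext (and its Akka actor system) started. The object name `FrameSizeConfig` is illustrative, not part of Spark; the property name and its MB unit follow the `AkkaUtils.scala` reference in the thread.

```scala
// Sketch (assumes Spark 0.8.x conventions): bump spark.akka.frameSize
// by setting the system property *before* any SparkContext is created.
object FrameSizeConfig {
  def main(args: Array[String]): Unit = {
    // Value is in MB; the default in this Spark version is 10.
    // 64 MB should comfortably hold the failing task results.
    System.setProperty("spark.akka.frameSize", "64")

    // AkkaUtils reads the property back roughly like this (MB -> bytes):
    val frameSizeMb = System.getProperty("spark.akka.frameSize", "10").toInt
    val frameSizeBytes = frameSizeMb * 1024 * 1024
    println(s"Akka frame size: $frameSizeBytes bytes")
  }
}
```

Note that any single task result larger than this limit still fails, so for genuinely large outputs the 0.8.1 BlockManager path above (or writing with saveAsTextFile instead of collect()) is the more robust choice.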
