Dear all,

I developed an application in which the messages exchanged between nodes
are sometimes larger than 10 MB.
It works fine for smaller datasets, but fails for larger ones.
Please see the error message below.

I searched online, and many people said the problem can be solved
by setting the properties spark.akka.frameSize
and spark.reducer.maxMbInFlight.
My configuration looks like this:

        val conf = new SparkConf()
            .setMaster(master)
            .setAppName("SparkLR")
            .setSparkHome("/home/user/spark-0.9.0-incubating-bin-hadoop2")
            .setJars(List(jarPath))
            .set("spark.akka.frameSize", "100")
            .set("spark.reducer.maxMbInFlight", "100")
        val sc = new SparkContext(conf)

However, the task still fails with the same error message.
The messages being communicated are the weight vectors of each sub-problem,
which can be larger than 10 MB for higher-dimensional datasets.
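
To give a sense of scale, here is a minimal sketch of the size arithmetic. It assumes each weight vector is a dense array of Double values (8 bytes per entry) and ignores serialization overhead; the object name, the helper names, and the feature count of 2,000,000 are hypothetical and only for illustration. Under that assumption, a vector exceeds 10 MB once it has more than about 1.31 million dimensions.

```scala
// Rough size estimate for a dense weight vector.
// All names and the example dimension count are illustrative assumptions.
object WeightVectorSize {
  val BytesPerDouble = 8L            // a JVM Double occupies 8 bytes
  val BytesPerMB = 1024L * 1024L

  // Raw payload in bytes for `dims` Double entries (serialization overhead ignored).
  def payloadBytes(dims: Long): Long = dims * BytesPerDouble

  // Payload in whole megabytes, rounded down.
  def payloadMB(dims: Long): Long = payloadBytes(dims) / BytesPerMB

  def main(args: Array[String]): Unit = {
    val dims = 2000000L              // hypothetical feature count
    println(s"$dims dims -> ${payloadBytes(dims)} bytes, about ${payloadMB(dims)} MB")
  }
}
```

Since spark.akka.frameSize is specified in MB, an estimate like this can be compared directly against the configured value.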

Can anybody help me?
Thanks a lot.

====
[error] (run-main) org.apache.spark.SparkException: Job aborted: Exception while deserializing and fetching task: java.lang.OutOfMemoryError: Java heap space
org.apache.spark.SparkException: Job aborted: Exception while deserializing and fetching task: java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
====

Chieh-Yen
