Hi all,

Configuration: Standalone 0.9.1-cdh4 cluster, 7 workers per node, 32 GB per
worker

I'm running a job on a Spark cluster and running into some strange
behavior. After a while, the Akka frame sizes exceed 10 MB, and then the
whole job seizes up. I set "spark.akka.frameSize" to 128 in the SparkConf
used to create the SparkContext (and also set it as a Java system property
on the driver, for good measure). After this, the program no longer hung,
but failed immediately and logged error messages like the following:
  (on the master):
    14/05/20 21:49:50 INFO SparkDeploySchedulerBackend: Executor 1
disconnected, so removing it
    14/05/20 21:49:50 ERROR TaskSchedulerImpl: Lost executor 1 on [...]:
remote Akka client disassociated
  (on the workers):
    14/05/20 21:50:25 WARN SparkDeploySchedulerBackend: Disconnected from
Spark cluster! Waiting for reconnection...
    14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Shutting down all
executors
    14/05/20 21:50:25 INFO SparkDeploySchedulerBackend: Asking each executor
to shut down
    14/05/20 21:50:25 INFO AppClient: Stop request to Master timed out; it
may already be shut down.
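
For reference, the driver-side setup I described above looked roughly like
this (a sketch, not the exact code; the master URL and app name are just
placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // placeholder standalone master URL
      .setAppName("my-job")                    // placeholder app name
      .set("spark.akka.frameSize", "128")      // max Akka frame size, in MB
    // Also set it as a system property on the driver, for good measure:
    System.setProperty("spark.akka.frameSize", "128")
    val sc = new SparkContext(conf)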

After lots of fumbling around, I ended up adding
"-Dspark.akka.frameSize=128" to SPARK_JAVA_OPTS in spark-env.sh, under the
theory that the workers couldn't read the larger Akka messages. This /seems/
to have made things work, but I'm still a little scared. Is this the
standard way to set the max Akka frame size, or is there a way to set it
from the driver and have it propagate to the workers?
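
For concreteness, the line I added to spark-env.sh on each worker was along
these lines (assuming nothing else needs to go into SPARK_JAVA_OPTS):

    export SPARK_JAVA_OPTS="-Dspark.akka.frameSize=128"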

Thanks,
Matt
