I have run into the same problem as you. I have a cluster of 20 machines, and I just ran the broadcast example, changing only the data size to 400M, which is really a small amount of data, yet I hit the same problem. *So I wonder whether broadcast is a weak point in Spark?*
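For context, what I ran is essentially the stock broadcast example with a larger array. A minimal sketch of that setup, assuming a pre-1.0 Spark API; the master URL and array size are illustrative, not the exact values from my run:

```scala
import org.apache.spark.SparkContext

object BroadcastSketch {
  def main(args: Array[String]) {
    // Master URL is a placeholder; substitute your own cluster.
    val sc = new SparkContext("spark://master:7077", "BroadcastSketch")

    // ~400 MB of ints (100M ints * 4 bytes) -- an assumption about
    // what "400M" means in the example I modified.
    val data = new Array[Int](100 * 1000 * 1000)
    val bc = sc.broadcast(data)

    // One task per worker, each touching the broadcast value so it
    // actually gets shipped to every node.
    val sizes = sc.parallelize(1 to 20, 20).map(_ => bc.value.length).collect()
    println(sizes.mkString(","))

    sc.stop()
  }
}
```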
Here is my config:

*SPARK_MEM=12g*
*SPARK_MASTER_WEBUI_PORT=12306*
*SPARK_WORKER_MEMORY=12g*
*SPARK_JAVA_OPTS+="-Dspark.executor.memory=8g -Dspark.akka.timeout=600
-Dspark.local.dir=/disk3/lee/tmp -Dspark.worker.timeout=600
-Dspark.akka.frameSize=10000 -Dspark.akka.askTimeout=300
-Dspark.storage.blockManagerTimeoutIntervalMs=100000
-Dspark.akka.retry.wait=600 -Dspark.blockManagerHeartBeatMs=80000
-Xms15G -Xmx15G -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"*

On Sat, Jan 11, 2014 at 8:27 AM, Khanderao kand <[email protected]> wrote:

> If your object size is > 10MB, you may need to change spark.akka.frameSize.
>
> What is your spark.akka.timeout?
>
> Did you change spark.akka.heartbeat.interval?
>
> BTW, given the large size being broadcast across 25 nodes, you may want
> to consider the frequency of such transfers and evaluate alternative
> patterns.
>
>
> On Tue, Jan 7, 2014 at 12:55 AM, Sebastian Schelter <[email protected]> wrote:
>
>> Spark repeatedly fails to broadcast a large object on a cluster of 25
>> machines for me.
>>
>> I get log messages like this:
>>
>> [spark-akka.actor.default-dispatcher-4] WARN
>> org.apache.spark.storage.BlockManagerMasterActor - Removing BlockManager
>> BlockManagerId(3, cloud-33.dima.tu-berlin.de, 42185, 0) with no recent
>> heart beats: 134689ms exceeds 45000ms
>>
>> Is there something wrong with my config? Do I have to increase some
>> timeout?
>>
>> Thx,
>> Sebastian
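For what it's worth, in pre-1.0 Spark the knobs mentioned above (spark.akka.frameSize, the block manager timeouts, etc.) can also be set as plain Java system properties before the SparkContext is created, instead of through SPARK_JAVA_OPTS. A minimal sketch; the specific values are illustrative, not recommendations:

```scala
import org.apache.spark.SparkContext

object ConfigSketch {
  def main(args: Array[String]) {
    // Pre-1.0 Spark reads these as system properties at context creation,
    // so they must be set before "new SparkContext". Values are examples.
    System.setProperty("spark.akka.frameSize", "100")  // MB; default was 10
    System.setProperty("spark.akka.timeout", "600")    // seconds
    System.setProperty("spark.storage.blockManagerTimeoutIntervalMs", "100000")

    // Master URL is a placeholder; substitute your own cluster.
    val sc = new SparkContext("spark://master:7077", "ConfigSketch")

    // ... job code ...

    sc.stop()
  }
}
```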
