I have encountered the same problem as you.
I have a cluster of 20 machines, and I just ran the broadcast example; the only
thing I changed is the data size in the example, to 400 MB, which is really a
small amount of data, but I hit the same problem as you.
*So I wonder whether broadcast capacity is a weak point in the Spark system?*
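For reference, here is roughly what my modified test boils down to (a minimal
sketch, not the exact example code; the master URL is a placeholder for my
setup):

import org.apache.spark.SparkContext

object BroadcastTest {
  def main(args: Array[String]) {
    // Placeholder master URL for my standalone cluster.
    val sc = new SparkContext("spark://master:7077", "BroadcastTest")

    // ~400 MB of data: 100 million Ints at 4 bytes each.
    val arr = new Array[Int](100 * 1000 * 1000)
    for (i <- 0 until arr.length) {
      arr(i) = i
    }

    // Broadcast the array and force every partition to fetch it.
    val barr = sc.broadcast(arr)
    sc.parallelize(1 to 20, 20).foreach { _ =>
      println("read broadcast of length " + barr.value.length)
    }
    sc.stop()
  }
}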


Here is my config:

*SPARK_MEM=12g*
*SPARK_MASTER_WEBUI_PORT=12306*
*SPARK_WORKER_MEMORY=12g*
*SPARK_JAVA_OPTS+="-Dspark.executor.memory=8g -Dspark.akka.timeout=600
-Dspark.local.dir=/disk3/lee/tmp -Dspark.worker.timeout=600
-Dspark.akka.frameSize=10000 -Dspark.akka.askTimeout=300
-Dspark.storage.blockManagerTimeoutIntervalMs=100000
-Dspark.akka.retry.wait=600 -Dspark.blockManagerHeartBeatMs=80000
-Xms15G -Xmx15G -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"*
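The same properties can also be set programmatically before the SparkContext
is created, via Java system properties (the 0.8/0.9-era configuration
mechanism); below is a sketch mirroring the options above, again with a
placeholder master URL:

import org.apache.spark.SparkContext

object ConfiguredBroadcastTest {
  def main(args: Array[String]) {
    // Mirrors the -D options in SPARK_JAVA_OPTS above.
    // spark.akka.frameSize is in MB (default 10).
    System.setProperty("spark.executor.memory", "8g")
    System.setProperty("spark.akka.timeout", "600")
    System.setProperty("spark.akka.frameSize", "10000")
    System.setProperty("spark.akka.askTimeout", "300")
    System.setProperty("spark.storage.blockManagerTimeoutIntervalMs", "100000")
    System.setProperty("spark.local.dir", "/disk3/lee/tmp")

    val sc = new SparkContext("spark://master:7077", "BroadcastTest")
    // ... job code ...
    sc.stop()
  }
}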

On Sat, Jan 11, 2014 at 8:27 AM, Khanderao kand <[email protected]> wrote:

> If your object size is > 10 MB, you may need to change spark.akka.frameSize.
>
> What is your spark.akka.timeout set to?
>
> Did you change spark.akka.heartbeat.interval?
>
> BTW, given that a large object is being broadcast across 25 nodes, you may want
> to consider the frequency of such transfers and evaluate alternative patterns.
>
> On Tue, Jan 7, 2014 at 12:55 AM, Sebastian Schelter <[email protected]> wrote:
>
>> Spark repeatedly fails to broadcast a large object on a cluster of 25
>> machines for me.
>>
>> I get log messages like this:
>>
>> [spark-akka.actor.default-dispatcher-4] WARN
>> org.apache.spark.storage.BlockManagerMasterActor - Removing BlockManager
>> BlockManagerId(3, cloud-33.dima.tu-berlin.de, 42185, 0) with no recent
>> heart beats: 134689ms exceeds 45000ms
>>
>> Is there something wrong with my config? Do I have to increase some
>> timeout?
>>
>> Thx,
>> Sebastian
>>
>
>
