From the documentation, this is what I understood:

1. spark.worker.timeout: Number of seconds after which the standalone
deploy master considers a worker lost if it receives no heartbeats.
default: 60

I increased it to 600.

It was pointed out before that if there is GC overload and the worker takes
time to respond, the master thinks the worker JVM died.

I have seen this issue as well several times.

2. spark.akka.timeout: Communication timeout between Spark nodes, in
seconds.
default: 100

I increased it to 200, as was suggested before, but I don't understand when
the communication timeout is triggered. Some explanation of this setting
would be very helpful.

3. spark.storage.blockManagerSlaveTimeoutMs: I could not find documentation
for it but, as Patrick said, the 45000 number comes from this setting.

How is this related to spark.worker.timeout?

I bumped it up to 300s, but the JVM should only spend a long time in GC if
there is memory pressure on it, right? Maybe I need to do a YourKit run to
understand the memory usage in more detail. Any suggestions on how to set up
YourKit for memory analysis?
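A lighter-weight check before a full profiler run might be GC logging, to see
whether long pauses line up with the lost heartbeats. The following is only a
sketch: it assumes a HotSpot JVM of the Java 6-8 era (newer JVMs use different
logging flags), and GC_LOG_OPTS is a hypothetical helper variable, not a Spark
setting.

```shell
# Assumption: HotSpot JVM from the Java 6-8 era; these flags changed in later JVMs.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

# Append to SPARK_JAVA_OPTS, keeping whatever it already holds (if anything).
SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:+$SPARK_JAVA_OPTS }$GC_LOG_OPTS"
export SPARK_JAVA_OPTS

# Print the final value so it can be checked before restarting anything.
echo "$SPARK_JAVA_OPTS"
```

The GC lines should then show up in the standard output of the JVMs that pick
up these options.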

I set these using the following options in spark-env.sh:

export SPARK_JAVA_OPTS="-Dspark.local.dir=/app/spark/tmp
-Dspark.storage.blockManagerSlaveTimeoutMs=300000
-Dspark.worker.timeout=600 -Dspark.akka.timeout=200"



Is this the correct way to specify spark.storage.blockManagerSlaveTimeoutMs?
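Since two of these settings are in seconds and one is in milliseconds, here is
a sketch of how the same options could be assembled with the conversion made
explicit. The *_TIMEOUT_S and *_TIMEOUT_MS variable names are hypothetical
helpers of mine; the property names and values are the ones above.

```shell
# Hypothetical helper variables; values are the ones used above, in seconds.
WORKER_TIMEOUT_S=600
AKKA_TIMEOUT_S=200
BLOCK_MANAGER_TIMEOUT_S=300

# spark.storage.blockManagerSlaveTimeoutMs is in milliseconds, so convert.
BLOCK_MANAGER_TIMEOUT_MS=$((BLOCK_MANAGER_TIMEOUT_S * 1000))

# Build the option string one property at a time to avoid line-wrap mistakes.
SPARK_JAVA_OPTS="-Dspark.local.dir=/app/spark/tmp"
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.storage.blockManagerSlaveTimeoutMs=$BLOCK_MANAGER_TIMEOUT_MS"
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.worker.timeout=$WORKER_TIMEOUT_S"
SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.akka.timeout=$AKKA_TIMEOUT_S"
export SPARK_JAVA_OPTS

# Print the final value so it can be checked before restarting anything.
echo "$SPARK_JAVA_OPTS"
```

Building the string in pieces keeps everything on one logical line, so there
are no stray newlines inside the quoted value.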


On Sat, Apr 5, 2014 at 4:00 AM, azurecoder <rich...@elastacloud.com> wrote:

> Interested in a resolution to this. I'm building a large triangular matrix
> so doing similar to ALS - lots of work on the worker nodes and keep timing
> out.
>
> Tried a few updates to akka frame sizes, timeouts and blockmanager but
> unable to complete. Will try the blockmanagerslaves property now and let
> you know the effect. That property doesn't appear to be documented on the
> site though.
>
> Cheers!
>
> Richard
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Heartbeat-exceeds-tp3798p3809.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
