Definitely, thanks. I usually just played around with timeouts before, but this helps.
On Thu, Jan 23, 2014 at 11:56 AM, Guillaume Pitel <[email protected]> wrote:

> Hi sparkers,
>
> So I had this problem where my workers were dying or disappearing (and I
> had to manually kill -9 their processes) often. Sometimes during a
> computation, sometimes when I Ctrl-C'd the driver, sometimes right at the
> end of an application execution.
>
> It seems that these tunings have solved the problem (in spark-env.sh):
>
>     export SPARK_DAEMON_JAVA_OPTS="-Dspark.worker.timeout=600 -Dspark.akka.timeout=200 -Dspark.shuffle.consolidateFiles=true"
>     export SPARK_JAVA_OPTS="-Dspark.worker.timeout=600 -Dspark.akka.timeout=200 -Dspark.shuffle.consolidateFiles=true"
>
> Explanation: I increased the timeouts because the master was missing a
> heartbeat, removing the worker, and then complaining afterwards that an
> unknown worker was sending heartbeats. I also set the consolidateFiles
> option, because I noticed that deleting the shuffle files in
> /tmp/spark-local* was taking forever because of the many files my job
> created.
>
> I also added this to all my programs right after the creation of the
> SparkContext (sc = sparkContext) so they shut down cleanly when a job is
> cancelled:
>
>     sys.addShutdownHook( { sc.stop() } )
>
> Hope this can be useful to someone
>
> Guillaume
> --
> Guillaume PITEL, Président
> +33(0)6 25 48 86 80
>
> eXenSa S.A.S. <http://www.exensa.com/>
> 41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
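For anyone else reading along, here is a minimal sketch of where the shutdown hook from the thread would sit in a driver program. The app name, master URL, and object name are placeholders of mine, not from Guillaume's message, and the SparkConf-based setup assumes a Spark version that supports it:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master URL -- adjust for your cluster.
    val conf = new SparkConf().setAppName("my-app").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Register the hook right after creating the context, as suggested in
    // the thread, so sc.stop() runs even when the driver is Ctrl-C'd.
    sys.addShutdownHook { sc.stop() }

    // ... job logic ...
  }
}
```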
