Re: Lost executor on YARN ALS iterations

2014-09-10 Thread Sandy Ryza
That's right. On Tue, Sep 9, 2014 at 2:04 PM, Debasish Das debasish.da...@gmail.com wrote: Last time it did not show up on the Environment tab, but I will give it another shot... The expected behavior is that this property will show up there, right? On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
Hi Sandy, Is there any resolution for the YARN failures? It's a blocker for running Spark on top of YARN. Thanks. Deb On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote: Hi Deb, I think this may be the same issue as described in https://issues.apache.org/jira/browse/SPARK-2121 . We

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Sandy Ryza
Hi Deb, The current state of the art is to increase spark.yarn.executor.memoryOverhead until the job stops failing. We do have plans to try to automatically scale this based on the amount of memory requested, but it will still just be a heuristic. -Sandy On Tue, Sep 9, 2014 at 7:32 AM,

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
Hmm... I did try increasing it to a few GB but have not had a successful run yet. Any idea, if I am using say 40 executors with 16 GB each, what the typical spark.yarn.executor.memoryOverhead is for, say, 100M x 10M matrices with a few billion ratings? On Tue, Sep 9, 2014 at 10:49 AM,

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Sandy Ryza
I would expect 2 GB would be enough or more than enough for 16 GB executors (unless ALS is using a bunch of off-heap memory?). You mentioned earlier in this thread that the property wasn't showing up in the Environment tab. Are you sure it's making it in? -Sandy On Tue, Sep 9, 2014 at 11:58

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
Last time it did not show up on the Environment tab, but I will give it another shot... The expected behavior is that this property will show up there, right? On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza sandy.r...@cloudera.com wrote: I would expect 2 GB would be enough or more than enough for 16 GB

Re: Lost executor on YARN ALS iterations

2014-08-21 Thread Debasish Das
Sandy, I put spark.yarn.executor.memoryOverhead 1024 in spark-defaults.conf, but I don't see it among the Spark properties on the web UI's Environment tab. Does it need to go in spark-env.sh? Thanks. Deb On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Xiangrui Meng
Hi Deb, I think this may be the same issue as described in https://issues.apache.org/jira/browse/SPARK-2121 . We know that the container got killed by YARN because it used much more memory than it requested. But we haven't figured out the root cause yet. +Sandy Best, Xiangrui On Tue, Aug 19,

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Debasish Das
I could reproduce the issue in both 1.0 and 1.1 using YARN, so this is definitely a YARN-related problem. At least for me, the only possible deployment option right now is standalone. On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote: Hi Deb, I think this may be the

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Sandy Ryza
Hi Debasish, The fix is to raise spark.yarn.executor.memoryOverhead until this goes away. This controls the buffer between the JVM heap size and the amount of memory requested from YARN (JVMs can take up memory beyond their heap size). You should also make sure that, in the YARN NodeManager
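To make the relationship described above concrete, here is a minimal sketch of the arithmetic: the memory requested from YARN for each executor is the JVM heap plus spark.yarn.executor.memoryOverhead. The function name is hypothetical, and the 16 GB heap, 2 GB overhead, and 40 executors are the figures mentioned elsewhere in this thread, used only for illustration. YARN may additionally round each request up to its own allocation increment, which this sketch does not model.

```python
def yarn_container_request_mb(executor_heap_mb, memory_overhead_mb):
    """Return the per-executor memory request: JVM heap plus overhead buffer.

    Hypothetical helper for illustration only; it is not part of Spark.
    Both arguments are in megabytes, matching the unit used when
    spark.yarn.executor.memoryOverhead is set to a value such as 1024.
    """
    if executor_heap_mb <= 0 or memory_overhead_mb < 0:
        raise ValueError("heap must be positive and overhead non-negative")
    return executor_heap_mb + memory_overhead_mb


# Figures taken from this thread (illustrative, not a recommendation).
heap_mb = 16 * 1024       # 16 GB executor heap
overhead_mb = 2 * 1024    # 2 GB overhead buffer
num_executors = 40

per_container_mb = yarn_container_request_mb(heap_mb, overhead_mb)
total_mb = per_container_mb * num_executors

print("per container (MB):", per_container_mb)
print("all executors (MB):", total_mb)
```

If the executor process grows beyond the per-container figure, YARN can kill the container, which is why raising the overhead widens the margin without changing the heap size.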

Lost executor on YARN ALS iterations

2014-08-19 Thread Debasish Das
Hi, During the 4th ALS iteration, I am noticing that one of the executors gets disconnected:
14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found
14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5 disconnected, so removing it