Re: ALS.trainImplicit running out of mem when using higher rank
The problem is clearly to do with the executor exceeding its YARN allocation, so this can't be local mode. He said at the outset that this was running on YARN.

On Mon, Jan 19, 2015 at 2:27 AM, Raghavendra Pandey raghavendra.pan...@gmail.com wrote: [...]
Re: ALS.trainImplicit running out of mem when using higher rank
If you are running Spark in local mode, executor parameters are not used because there is no separate executor. You should set the corresponding driver parameters to make them take effect.

On Mon, Jan 19, 2015, 00:21 Sean Owen so...@cloudera.com wrote: [...]
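To illustrate the point, a minimal PySpark sketch for local mode (hypothetical app name and value; note that spark.driver.memory generally must be set before the driver JVM starts, e.g. in spark-defaults.conf or via the --driver-memory flag, so setting it in SparkConf at runtime may have no effect in an already-launched shell):

from pyspark import SparkConf, SparkContext

# In local mode there are no executors, so size the driver instead of
# SPARK_EXECUTOR_MEMORY. Caveat: this only takes effect if the driver JVM
# has not started yet; from a running shell, put it in spark-defaults.conf
# or pass --driver-memory 24g instead.
conf = (SparkConf()
        .setMaster("local[*]")
        .setAppName("als-local")           # hypothetical
        .set("spark.driver.memory", "24g"))
sc = SparkContext(conf=conf)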
Re: ALS.trainImplicit running out of mem when using higher rank
OK. Are you sure the executor actually has the memory you think — is -Xmx24g on its command line? It may be that for some reason your job is reserving an exceptionally large amount of non-heap memory, though I am not sure that's to be expected of the ALS job. Even if the settings work, consider using the explicit command-line configuration.

On Sat, Jan 17, 2015 at 12:49 PM, Antony Mayi antonym...@yahoo.com wrote: [...]
Re: ALS.trainImplicit running out of mem when using higher rank
I'm not sure how you are setting these values, though. Where is spark.yarn.executor.memoryOverhead=6144 set? Environment variables aren't the best way to set configuration, either. Again, have a look at http://spark.apache.org/docs/latest/running-on-yarn.html

... --executor-memory 22g --conf spark.yarn.executor.memoryOverhead=2g ...

should do it, off the top of my head. That should reserve 24g from YARN.

On Sat, Jan 17, 2015 at 5:29 AM, Antony Mayi antonym...@yahoo.com wrote: [...]
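The same sizing expressed in PySpark, as a sketch (in Spark 1.x the overhead is parsed as an integer number of megabytes, so 2048 is the safer spelling of 2g):

from pyspark import SparkConf, SparkContext

# Equivalent of --executor-memory 22g --conf spark.yarn.executor.memoryOverhead=2048:
# a 22g JVM heap plus 2g of non-heap headroom, i.e. roughly the 24g that
# YARN is asked to reserve per executor.
conf = (SparkConf()
        .set("spark.executor.memory", "22g")
        .set("spark.yarn.executor.memoryOverhead", "2048"))
sc = SparkContext(conf=conf)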
Re: ALS.trainImplicit running out of mem when using higher rank
The values are for sure applied as expected; I confirmed it on the Spark UI environment page. They come from my defaults: 'spark.yarn.executor.memoryOverhead=8192' (yes, now increased even further) in /etc/spark/conf/spark-defaults.conf and 'export SPARK_EXECUTOR_MEMORY=24G' in /etc/spark/conf/spark-env.sh.

Antony.

On Saturday, 17 January 2015, 11:32, Sean Owen so...@cloudera.com wrote: [...]
Re: ALS.trainImplicit running out of mem when using higher rank
Although this helped to improve things significantly, I still run into the problem despite increasing spark.yarn.executor.memoryOverhead vastly:

export SPARK_EXECUTOR_MEMORY=24G
spark.yarn.executor.memoryOverhead=6144

yet I am still getting this:

2015-01-17 04:47:40,389 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=30211,containerID=container_1421451766649_0002_01_115969] is running beyond physical memory limits. Current usage: 30.1 GB of 30 GB physical memory used; 33.0 GB of 63.0 GB virtual memory used. Killing container.

Is there anything more I can do?

thanks,
Antony.

On Monday, 12 January 2015, 8:21, Antony Mayi antonym...@yahoo.com wrote: [...]
Re: ALS.trainImplicit running out of mem when using higher rank
The question really is whether it is expected that the memory requirements grow rapidly with the rank. I would have expected memory to be roughly constant in the rank, depending only on the size of the input data. If this growth is expected, is there any rough formula to determine the required memory from the ALS input and parameters?

thanks,
Antony.

On Saturday, 10 January 2015, 10:47, Antony Mayi antonym...@yahoo.com wrote: [...]
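As a back-of-envelope illustration of such a formula (an assumption-laden sketch, not anything from MLlib): the cached user and item factor RDDs hold one rank-length vector of doubles per user and per item, so their footprint grows linearly with rank:

def als_factor_memory_gb(num_users, num_items, rank):
    # Rough lower bound: one vector of `rank` doubles (8 bytes each) per
    # user and per item. Ignores JVM object overhead, serialization and
    # the in/out-link shuffle structures, which can multiply this several
    # times over in practice.
    return (num_users + num_items) * rank * 8 / 1024.0 ** 3

# Hypothetical scale: 10M users, 1M items, rank 50 -> about 4.1 GB of raw
# factor data spread across the cluster's cache.
print(als_factor_memory_gb(10000000, 1000000, 50))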
Re: ALS.trainImplicit running out of mem when using higher rank
I would expect the size of the user/item feature RDDs to grow linearly with the rank, of course. They are cached, so that would drive cache memory usage on the cluster. That wouldn't cause executors to fail for running out of memory, though.

In fact, your error does not show the task failing for lack of memory. What it shows is that YARN thinks the task is using a little more memory than it said it would, and killed it. This happens sometimes with JVM-based YARN jobs, since a JVM configured to use X heap ends up using a bit more than X physical memory once the heap reaches its maximum size. So there is a bit of headroom built in, controlled by spark.yarn.executor.memoryOverhead (http://spark.apache.org/docs/latest/running-on-yarn.html). You can try increasing it to a couple of GB.

On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi antonym...@yahoo.com.invalid wrote: [...]
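The container arithmetic behind the kill messages in this thread, as a small sketch (YARN additionally rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb):

def yarn_executor_request_mb(executor_memory_mb, overhead_mb=384):
    # Spark-on-YARN asks for the JVM heap (spark.executor.memory) plus the
    # non-heap headroom (spark.yarn.executor.memoryOverhead, which
    # defaulted to 384 MB in Spark 1.1).
    return executor_memory_mb + overhead_mb

# The later settings in this thread: 24 GB heap + 6144 MB overhead
# = 30720 MB, i.e. exactly the 30 GB limit in the "30.1 GB of 30 GB
# physical memory used" kill message.
print(yarn_executor_request_mb(24 * 1024, 6144))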
Re: ALS.trainImplicit running out of mem when using higher rank
This seems to have sorted it. Awesome, thanks for the great help.

Antony.

On Sunday, 11 January 2015, 13:02, Sean Owen so...@cloudera.com wrote: [...]
Re: ALS.trainImplicit running out of mem when using higher rank
The actual case looks like this:

* Spark 1.1.0 on YARN (CDH 5.2.1)
* ~8-10 executors, 36GB physical RAM per host
* input RDD is roughly 3GB, containing ~150-200M items (and this RDD is made persistent using .cache())
* using pyspark

YARN is configured with a yarn.nodemanager.resource.memory-mb limit of 33792 (33GB); Spark is set to:

SPARK_EXECUTOR_CORES=6
SPARK_EXECUTOR_INSTANCES=9
SPARK_EXECUTOR_MEMORY=30G

When using a higher rank (above 20) for ALS.trainImplicit, the executor runs over the YARN limit after some time (~an hour) of execution and gets killed:

2015-01-09 17:51:27,130 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=27125,containerID=container_1420871936411_0002_01_23] is running beyond physical memory limits. Current usage: 31.2 GB of 31 GB physical memory used; 34.7 GB of 65.1 GB virtual memory used. Killing container.

thanks for any ideas,
Antony.

On Saturday, 10 January 2015, 10:11, Antony Mayi antonym...@yahoo.com wrote:

the memory requirements seem to be growing rapidly when using a higher rank... I am unable to get over 20 without running out of memory. Is this expected?

thanks,
Antony.
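For reference, a minimal PySpark invocation of the API under discussion, as a sketch (hypothetical input path and parameter values; plain (user, item, count) tuples are used as the ratings):

from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS

sc = SparkContext(appName="als-implicit")

# Hypothetical input: "user,item,count" lines turned into numeric triples
# and cached, mirroring the ~3GB input RDD described above.
ratings = (sc.textFile("hdfs:///path/to/events.csv")   # hypothetical path
             .map(lambda line: line.split(","))
             .map(lambda f: (int(f[0]), int(f[1]), float(f[2])))
             .cache())

# rank above 20 is where the containers in this thread started being
# killed; alpha is the implicit-feedback confidence weight (value here is
# illustrative only).
model = ALS.trainImplicit(ratings, rank=25, iterations=10, alpha=40.0)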