Oh OK, you are saying you are requesting 25 executors and getting them,
got it. You could consider making fewer, bigger executors to pool memory
rather than splitting it up, but at some point that becomes
counter-productive. 32GB is a fine executor size.

So you have ~8GB available per task, which seems like plenty. Something
else is at work here. Is this error from your code's stages or from ALS?
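
(Back of the envelope, from the settings you posted:

    28G heap / 4 concurrent tasks (spark.executor.cores=4) = 7G of heap per task,
    plus a share of the 4G memoryOverhead, i.e. roughly 8G per task in total

so on paper each task has plenty of headroom.)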

On Thu, Feb 19, 2015 at 10:07 AM, Antony Mayi <antonym...@yahoo.com> wrote:
> Based on the Spark UI I am running 25 executors for sure. Why would you
> expect four? I submit the task with --num-executors 25 and get 6-7 executors
> running per host. Using more, smaller executors gives me better cluster
> utilization when running parallel Spark sessions (which is not the case for
> this reported issue - for now I am using the cluster exclusively).
>
> thx,
> Antony.
>
>
> On Thursday, 19 February 2015, 11:02, Sean Owen <so...@cloudera.com> wrote:
>
>
>
> This should result in 4 executors, not 25. They should be able to
> execute 4*4 = 16 tasks simultaneously. You have them grab 4*32 = 128GB
> of RAM, not 1TB.
>
> It still feels like this shouldn't be running out of memory, not by a
> long shot. I'm just pointing out potential differences between what you
> are expecting and what you are configuring.
>
> On Thu, Feb 19, 2015 at 9:56 AM, Antony Mayi
> <antonym...@yahoo.com.invalid> wrote:
>> Hi,
>>
>> I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running
>> Spark 1.2.0 in yarn-client mode with the following layout:
>>
>> spark.executor.cores=4
>> spark.executor.memory=28G
>> spark.yarn.executor.memoryOverhead=4096
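>>
>> For reference, the submit command is along these lines (the script name is
>> just a placeholder, and the properties could equally live in
>> spark-defaults.conf):
>>
>> spark-submit --master yarn-client --num-executors 25 \
>>     --executor-cores 4 --executor-memory 28G \
>>     --conf spark.yarn.executor.memoryOverhead=4096 \
>>     train_als.py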
>>
>> I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a
>> dataset with ~3 billion ratings using 25 executors. At some point an
>> executor crashes with:
>>
>> 15/02/19 05:41:06 WARN util.AkkaUtils: Error sending message in 1 attempts
>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>        at scala.concurrent.Await$.result(package.scala:107)
>>        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398)
>> 15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259)
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>        at java.lang.reflect.Array.newInstance(Array.java:75)
>>        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671)
>>        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
>>        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
>>        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
>>
>> So the GC overhead limit exceeded is pretty clear and suggests running out
>> of memory. Since I have 1TB of RAM available, this must rather be due to
>> some suboptimal configuration.
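>>
>> In case it helps, the core of the job boils down to something like the
>> sketch below (PySpark; the input path and parsing are simplified
>> placeholders - the real point is just the trainImplicit call with rank=100
>> and 15 iterations):
>>
>> from pyspark import SparkContext
>> from pyspark.mllib.recommendation import ALS, Rating
>>
>> sc = SparkContext(appName="als-implicit")
>>
>> # ~3 billion (user, item, rating) triples; placeholder path and format
>> ratings = sc.textFile("hdfs:///path/to/ratings") \
>>     .map(lambda line: line.split(",")) \
>>     .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2])))
>>
>> # implicit-feedback ALS with the parameters mentioned above
>> model = ALS.trainImplicit(ratings, rank=100, iterations=15)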
>>
>> Can anyone please point me in the right direction on how to tackle this?
>>
>> Thanks,
>> Antony.
>
>
