spark worker and yarn memory

2014-06-05 Thread Xu (Simon) Chen
I am slightly confused about the --executor-memory setting. My YARN
cluster has a maximum container memory of 8192 MB.

When I specify --executor-memory 8G in my spark-shell, no container can
be started at all. It only works when I lower the executor memory to 7G.
But then, on YARN, I see 2 containers per node, using 16G of memory.

Then the Spark UI shows that each worker has 4GB of memory, rather
than 7G.

Can someone explain the relationship among the numbers I see here?

Thanks.


Re: spark worker and yarn memory

2014-06-05 Thread Sandy Ryza
Hi Xu,

As crazy as it might sound, this all makes sense.

There are a few different quantities at play here:
* the heap size of the executor (controlled by --executor-memory)
* the amount of memory Spark requests from YARN (the heap size plus
384 MB to account for fixed memory costs outside of the heap)
* the amount of memory YARN grants to the container (YARN rounds up to
the nearest multiple of yarn.scheduler.minimum-allocation-mb or
yarn.scheduler.fair.increment-allocation-mb, depending on the
scheduler used)
* the amount of memory Spark uses for caching on each executor, which
is spark.storage.memoryFraction (default 0.6) of the executor heap
size
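
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch in Scala of how these quantities relate. It is not Spark's or
YARN's actual code: the 384 MB overhead is the figure quoted above, and
the 1024 MB rounding increment is just an assumed value of
yarn.scheduler.minimum-allocation-mb for illustration.

object ExecutorMemoryModel {
  val overheadMb  = 384    // fixed off-heap overhead Spark adds on top of the heap
  val incrementMb = 1024   // assumed YARN rounding increment (minimum-allocation-mb)

  // What Spark asks YARN for: the executor heap plus the fixed overhead.
  def requestedMb(executorHeapMb: Int): Int =
    executorHeapMb + overheadMb

  // What YARN grants: the request rounded up to the next increment.
  def allocatedMb(executorHeapMb: Int): Int = {
    val req = requestedMb(executorHeapMb)
    ((req + incrementMb - 1) / incrementMb) * incrementMb
  }

  // Heap reserved for caching: spark.storage.memoryFraction (default 0.6) of the heap.
  def cacheMb(executorHeapMb: Int, memoryFraction: Double = 0.6): Int =
    (executorHeapMb * memoryFraction).toInt
}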

So, with --executor-memory 8g, Spark requests 8g + 384 MB from YARN,
which doesn't fit within its container max. With --executor-memory 7g,
Spark requests 7g + 384 MB from YARN, which fits within the container
max and gets rounded up to 8g by the YARN scheduler. 7g is still used
as the executor heap size, and 0.6 of that is about 4g, which is what
shows up as the cache space in the Spark UI.
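
Plugging the thread's numbers into the sketch above (again assuming a
1024 MB increment), for example in spark-shell:

import ExecutorMemoryModel._

requestedMb(8 * 1024)   // 8576 MB -- already over the 8192 MB container max
requestedMb(7 * 1024)   // 7552 MB -- fits under the 8192 MB container max
allocatedMb(7 * 1024)   // 8192 MB -- rounded up to the next 1024 MB increment
cacheMb(7 * 1024)       // 4300 MB -- roughly the ~4 GB shown on the Spark UI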

-Sandy


Re: spark worker and yarn memory

2014-06-05 Thread Xu (Simon) Chen
Nice explanation... Thanks!

