Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped at
spark.memory.offHeap.size bytes. This is purposely specified as an absolute
size rather than as a percentage of the heap size in order to allow end
users to tune Spark so that its overall memory consumption stays within
container memory limits.
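
For concreteness, here is a minimal sketch (not from the thread above; the
value is purely illustrative) of setting that absolute cap through SparkConf:

  import org.apache.spark.SparkConf

  // Tungsten's explicitly-managed off-heap allocation is disabled by default;
  // the size is an absolute amount, never a fraction of the heap.
  val conf = new SparkConf()
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", (512L * 1024 * 1024).toString) // 512MB, in bytes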

To use your example of a 3GB YARN container, you could configure Spark so
that its maximum heap size plus spark.memory.offHeap.size is smaller than
3GB (minus some overhead fudge factor).
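
As a rough illustration (example numbers only, and assuming the
Spark-on-YARN behavior of this era, where the container request is
spark.executor.memory plus spark.yarn.executor.memoryOverhead, so the
overhead has to leave room for the off-heap cap as well as the JVM's own
non-heap usage):

  // Continuing the sketch above, sized for a 3GB YARN container:
  conf
    .set("spark.executor.memory", "2g")                // JVM heap (-Xmx)
    .set("spark.yarn.executor.memoryOverhead", "1024") // in MB
  // Container request ~= 2g heap + 1g overhead = 3g; the 512MB off-heap cap
  // plus the JVM's other off-heap usage must fit inside that 1g of headroom.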

On Thu, Sep 22, 2016 at 7:56 AM Sean Owen <so...@cloudera.com> wrote:

> It's looking at the whole process's memory usage, and doesn't care
> whether the memory is used by the heap or not within the JVM. Of
> course, allocating memory off-heap still counts against you at the OS
> level.
>
> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
> <msegel_had...@hotmail.com> wrote:
> > Thanks for the response Sean.
> >
> > But how does YARN know about the off-heap memory usage?
> > That’s the piece that I’m missing.
> >
> > Thx again,
> >
> > -Mike
> >
> >> On Sep 21, 2016, at 10:09 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> No, Xmx only controls the maximum size of on-heap allocated memory.
> >> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> >> when it can be released).
> >>
> >> The answer is that YARN will kill the process because it's using more
> >> memory than it asked for. A JVM is always going to use a little
> >> off-heap memory by itself, so setting a max heap size of 2GB means the
> >> JVM process may use a bit more than 2GB of memory. With an off-heap
> >> intensive app like Spark it can be a lot more.
> >>
> >> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> >> it will ask for 3.3GB from YARN. You can increase the overhead.
> >>
> >> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>> All off-heap memory is still managed by the JVM process. If you limit the
> >>> memory of this process then you limit the memory. I think the memory of the
> >>> JVM process could be limited via the Xms/Xmx parameters of the JVM. This can
> >>> be configured via Spark options for YARN (be aware that they are different
> >>> in cluster and client mode), but I recommend using the Spark options for
> >>> the off-heap maximum.
> >>>
> >>> https://spark.apache.org/docs/latest/running-on-yarn.html
> >>>
> >>>
> >>> On 21 Sep 2016, at 22:02, Michael Segel <msegel_had...@hotmail.com> wrote:
> >>>
> >>> I’ve asked this question a couple of times of a friend who didn’t know
> >>> the answer… so I thought I would try here.
> >>>
> >>>
> >>> Suppose we launch a job on a cluster (YARN) and we have set up the
> >>> containers to be 3GB in size.
> >>>
> >>>
> >>> What does that 3GB represent?
> >>>
> >>> I mean, what happens if we end up using 2-3GB of off-heap storage via
> >>> Tungsten?
> >>> What will Spark do?
> >>> Will it try to honor the container’s limits and throw an exception, or will
> >>> it allow my job to grab that amount of memory and exceed YARN’s
> >>> expectations since it’s off heap?
> >>>
> >>> Thx
> >>>
> >>> -Mike
> >>>
> >>>
> >
>
>
