Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
I would disagree. 

While you can tune the system to not oversubscribe, I would rather have it hit 
swap than fail. Especially on long-running jobs. 

If we look at oversubscription on Hadoop clusters which are not running HBase… 
they survive. It’s when you have things like HBase that don’t handle swap well… 
or you don’t allocate enough swap that things go boom. 

Also consider that you could move swap to something that is faster than 
spinning rust. 


> On Sep 22, 2016, at 12:44 PM, Sean Owen  wrote:
> 
> I don't think I'd enable swap on a cluster. You'd rather processes
> fail than grind everything to a halt. You'd buy more memory or
> optimize memory before trading it for I/O.
> 
> On Thu, Sep 22, 2016 at 6:29 PM, Michael Segel
>  wrote:
>> Ok… gotcha… wasn’t sure that YARN just looked at the heap size allocation 
>> and ignored the off heap.
>> 
>> WRT overall OS memory… this would be one reason why I’d keep a decent 
>> amount of swap around. (Maybe even putting it on a fast device like an M.2 
>> or PCIe flash drive….)



Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Jörn Franke
This is probably the best way to manage it.

On Thu, Sep 22, 2016 at 6:42 PM, Josh Rosen 
wrote:

> Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped
> at spark.memory.offHeap.size bytes. This is purposely specified as an
> absolute size rather than as a percentage of the heap size in order to
> allow end users to tune Spark so that its overall memory consumption stays
> within container memory limits.
>
> To use your example of a 3GB YARN container, you could configure Spark so
> that its maximum heap size plus spark.memory.offHeap.size is smaller than
> 3GB (minus some overhead fudge-factor).
>
> On Thu, Sep 22, 2016 at 7:56 AM Sean Owen  wrote:
>
>> It's looking at the whole process's memory usage, and doesn't care
>> whether the memory is used by the heap or not within the JVM. Of
>> course, allocating memory off-heap still counts against you at the OS
>> level.
>>
>> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
>>  wrote:
>> > Thanks for the response Sean.
>> >
>> > But how does YARN know about the off-heap memory usage?
>> > That’s the piece that I’m missing.
>> >
>> > Thx again,
>> >
>> > -Mike
>> >
>> >> On Sep 21, 2016, at 10:09 PM, Sean Owen  wrote:
>> >>
>> >> No, Xmx only controls the maximum size of on-heap allocated memory.
>> >> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
>> >> when it can be released).
>> >>
>> >> The answer is that YARN will kill the process because it's using more
>> >> memory than it asked for. A JVM is always going to use a little
>> >> off-heap memory by itself, so setting a max heap size of 2GB means the
>> >> JVM process may use a bit more than 2GB of memory. With an off-heap
>> >> intensive app like Spark it can be a lot more.
>> >>
>> >> There's a built-in 10% overhead, so that if you ask for a 3GB executor
>> >> it will ask for 3.3GB from YARN. You can increase the overhead.
>> >>
>> >>> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke wrote:
>> >>> All off-heap memory is still managed by the JVM process. If you limit the
>> >>> memory of this process then you limit the memory. I think the memory of the
>> >>> JVM process could be limited via the xms/xmx parameter of the JVM. This can
>> >>> be configured via spark options for yarn (be aware that they are different
>> >>> in cluster and client mode), but i recommend to use the spark options for
>> >>> the off heap maximum.
>> >>>
>> >>> https://spark.apache.org/docs/latest/running-on-yarn.html
>> >>>
>> >>>
>> >>> On 21 Sep 2016, at 22:02, Michael Segel wrote:
>> >>>
>> >>> I’ve asked this question a couple of times from a friend who didn’t know
>> >>> the answer… so I thought I would try here.
>> >>>
>> >>>
>> >>> Suppose we launch a job on a cluster (YARN) and we have set up the
>> >>> containers to be 3GB in size.
>> >>>
>> >>>
>> >>> What does that 3GB represent?
>> >>>
>> >>> I mean what happens if we end up using 2-3GB of off heap storage via
>> >>> tungsten?
>> >>> What will Spark do?
>> >>> Will it try to honor the container’s limits and throw an exception or will
>> >>> it allow my job to grab that amount of memory and exceed YARN’s
>> >>> expectations since its off heap?
>> >>>
>> >>> Thx
>> >>>
>> >>> -Mike
>> >>>


Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Sean Owen
I don't think I'd enable swap on a cluster. You'd rather processes
fail than grind everything to a halt. You'd buy more memory or
optimize memory before trading it for I/O.

On Thu, Sep 22, 2016 at 6:29 PM, Michael Segel
 wrote:
> Ok… gotcha… wasn’t sure that YARN just looked at the heap size allocation and 
> ignored the off heap.
>
> WRT overall OS memory… this would be one reason why I’d keep a decent amount 
> of swap around. (Maybe even putting it on a fast device like an M.2 or PCIe 
> flash drive….)




Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Jörn Franke
Well, off-heap memory will, from an OS perspective, be visible under the
JVM process (you see the memory consumption of the JVM process growing when
using off-heap memory). There is one exception: another process which has not
been started by the JVM and "lives" outside the JVM, but uses IPC to
communicate with the JVM. I do not assume this is the case for Spark.

@xms/xmx you are right here, this is just about heap memory. You may be
able to limit the memory of the JVM process (and thus, under the previously
described assumption, its off-heap usage as well) by using cgroups, though
whether this should be done needs some thought.

On Thu, Sep 22, 2016 at 5:09 AM, Sean Owen  wrote:

> No, Xmx only controls the maximum size of on-heap allocated memory.
> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> when it can be released).
>
> The answer is that YARN will kill the process because it's using more
> memory than it asked for. A JVM is always going to use a little
> off-heap memory by itself, so setting a max heap size of 2GB means the
> JVM process may use a bit more than 2GB of memory. With an off-heap
> intensive app like Spark it can be a lot more.
>
> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> it will ask for 3.3GB from YARN. You can increase the overhead.
>
> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke wrote:
> > All off-heap memory is still managed by the JVM process. If you limit the
> > memory of this process then you limit the memory. I think the memory of the
> > JVM process could be limited via the xms/xmx parameter of the JVM. This can
> > be configured via spark options for yarn (be aware that they are different
> > in cluster and client mode), but i recommend to use the spark options for
> > the off heap maximum.
> >
> > https://spark.apache.org/docs/latest/running-on-yarn.html
> >
> >
> > On 21 Sep 2016, at 22:02, Michael Segel wrote:
> >
> > I’ve asked this question a couple of times from a friend who didn’t know
> > the answer… so I thought I would try here.
> >
> >
> > Suppose we launch a job on a cluster (YARN) and we have set up the
> > containers to be 3GB in size.
> >
> >
> > What does that 3GB represent?
> >
> > I mean what happens if we end up using 2-3GB of off heap storage via
> > tungsten?
> > What will Spark do?
> > Will it try to honor the container’s limits and throw an exception or will
> > it allow my job to grab that amount of memory and exceed YARN’s
> > expectations since its off heap?
> >
> > Thx
> >
> > -Mike
> >


Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
Ok… gotcha… wasn’t sure that YARN just looked at the heap size allocation and 
ignored the off heap. 

WRT overall OS memory… this would be one reason why I’d keep a decent amount 
of swap around. (Maybe even putting it on a fast device like an M.2 or PCIe 
flash drive….)


> On Sep 22, 2016, at 9:56 AM, Sean Owen  wrote:
> 
> It's looking at the whole process's memory usage, and doesn't care
> whether the memory is used by the heap or not within the JVM. Of
> course, allocating memory off-heap still counts against you at the OS
> level.
> 
> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
>  wrote:
>> Thanks for the response Sean.
>> 
>> But how does YARN know about the off-heap memory usage?
>> That’s the piece that I’m missing.
>> 
>> Thx again,
>> 
>> -Mike
>> 
>>> On Sep 21, 2016, at 10:09 PM, Sean Owen  wrote:
>>> 
>>> No, Xmx only controls the maximum size of on-heap allocated memory.
>>> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
>>> when it can be released).
>>> 
>>> The answer is that YARN will kill the process because it's using more
>>> memory than it asked for. A JVM is always going to use a little
>>> off-heap memory by itself, so setting a max heap size of 2GB means the
>>> JVM process may use a bit more than 2GB of memory. With an off-heap
>>> intensive app like Spark it can be a lot more.
>>> 
>>> There's a built-in 10% overhead, so that if you ask for a 3GB executor
>>> it will ask for 3.3GB from YARN. You can increase the overhead.
>>> 
>>> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke  wrote:
 All off-heap memory is still managed by the JVM process. If you limit the
 memory of this process then you limit the memory. I think the memory of the
 JVM process could be limited via the xms/xmx parameter of the JVM. This can
 be configured via spark options for yarn (be aware that they are different
 in cluster and client mode), but i recommend to use the spark options for
 the off heap maximum.
 
 https://spark.apache.org/docs/latest/running-on-yarn.html
 
 
 On 21 Sep 2016, at 22:02, Michael Segel  wrote:
 
 I’ve asked this question a couple of times from a friend who didn’t know
 the answer… so I thought I would try here.
 
 
 Suppose we launch a job on a cluster (YARN) and we have set up the
 containers to be 3GB in size.
 
 
 What does that 3GB represent?
 
 I mean what happens if we end up using 2-3GB of off heap storage via
 tungsten?
 What will Spark do?
 Will it try to honor the container’s limits and throw an exception or will
 it allow my job to grab that amount of memory and exceed YARN’s
 expectations since its off heap?
 
 Thx
 
 -Mike
 
>> 





Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Josh Rosen
Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped at
spark.memory.offHeap.size bytes. This is purposely specified as an absolute
size rather than as a percentage of the heap size in order to allow end
users to tune Spark so that its overall memory consumption stays within
container memory limits.

To use your example of a 3GB YARN container, you could configure Spark so
that its maximum heap size plus spark.memory.offHeap.size is smaller than
3GB (minus some overhead fudge-factor).
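
For illustration, here is a minimal sizing sketch of the budgeting described
above, assuming a 3GB container with a hypothetical 2g heap / 512m off-heap
split; the sizes and the overhead allowance are placeholders, not values from
this thread:

  // Hypothetical executor sizing check for a 3 GB YARN container.
  object ExecutorSizing {
    def main(args: Array[String]): Unit = {
      val containerMb = 3 * 1024   // YARN container limit
      val heapMb      = 2 * 1024   // would correspond to spark.executor.memory=2g
      val offHeapMb   = 512        // would correspond to spark.memory.offHeap.size=512m
      val fudgeMb     = 384        // rough allowance for other JVM/native overhead

      val totalMb = heapMb + offHeapMb + fudgeMb
      println(s"heap + off-heap + overhead = $totalMb MB of $containerMb MB")
      require(totalMb <= containerMb, "executor would not fit in the container")
    }
  }

The point is simply that the heap and the Tungsten off-heap cap are budgeted
separately, and both (plus a fudge factor) have to fit under the container limit.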

On Thu, Sep 22, 2016 at 7:56 AM Sean Owen  wrote:

> It's looking at the whole process's memory usage, and doesn't care
> whether the memory is used by the heap or not within the JVM. Of
> course, allocating memory off-heap still counts against you at the OS
> level.
>
> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
>  wrote:
> > Thanks for the response Sean.
> >
> > But how does YARN know about the off-heap memory usage?
> > That’s the piece that I’m missing.
> >
> > Thx again,
> >
> > -Mike
> >
> >> On Sep 21, 2016, at 10:09 PM, Sean Owen  wrote:
> >>
> >> No, Xmx only controls the maximum size of on-heap allocated memory.
> >> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> >> when it can be released).
> >>
> >> The answer is that YARN will kill the process because it's using more
> >> memory than it asked for. A JVM is always going to use a little
> >> off-heap memory by itself, so setting a max heap size of 2GB means the
> >> JVM process may use a bit more than 2GB of memory. With an off-heap
> >> intensive app like Spark it can be a lot more.
> >>
> >> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> >> it will ask for 3.3GB from YARN. You can increase the overhead.
> >>
> >> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke wrote:
> >>> All off-heap memory is still managed by the JVM process. If you limit the
> >>> memory of this process then you limit the memory. I think the memory of the
> >>> JVM process could be limited via the xms/xmx parameter of the JVM. This can
> >>> be configured via spark options for yarn (be aware that they are different
> >>> in cluster and client mode), but i recommend to use the spark options for
> >>> the off heap maximum.
> >>>
> >>> https://spark.apache.org/docs/latest/running-on-yarn.html
> >>>
> >>>
> >>> On 21 Sep 2016, at 22:02, Michael Segel wrote:
> >>>
> >>> I’ve asked this question a couple of times from a friend who didn’t know
> >>> the answer… so I thought I would try here.
> >>>
> >>>
> >>> Suppose we launch a job on a cluster (YARN) and we have set up the
> >>> containers to be 3GB in size.
> >>>
> >>>
> >>> What does that 3GB represent?
> >>>
> >>> I mean what happens if we end up using 2-3GB of off heap storage via
> >>> tungsten?
> >>> What will Spark do?
> >>> Will it try to honor the container’s limits and throw an exception or will
> >>> it allow my job to grab that amount of memory and exceed YARN’s
> >>> expectations since its off heap?
> >>>
> >>> Thx
> >>>
> >>> -Mike
> >>>
> >>>


Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Sean Owen
It's looking at the whole process's memory usage, and doesn't care
whether the memory is used by the heap or not within the JVM. Of
course, allocating memory off-heap still counts against you at the OS
level.

On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel
 wrote:
> Thanks for the response Sean.
>
> But how does YARN know about the off-heap memory usage?
> That’s the piece that I’m missing.
>
> Thx again,
>
> -Mike
>
>> On Sep 21, 2016, at 10:09 PM, Sean Owen  wrote:
>>
>> No, Xmx only controls the maximum size of on-heap allocated memory.
>> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
>> when it can be released).
>>
>> The answer is that YARN will kill the process because it's using more
>> memory than it asked for. A JVM is always going to use a little
>> off-heap memory by itself, so setting a max heap size of 2GB means the
>> JVM process may use a bit more than 2GB of memory. With an off-heap
>> intensive app like Spark it can be a lot more.
>>
>> There's a built-in 10% overhead, so that if you ask for a 3GB executor
>> it will ask for 3.3GB from YARN. You can increase the overhead.
>>
>> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke  wrote:
>>> All off-heap memory is still managed by the JVM process. If you limit the
>>> memory of this process then you limit the memory. I think the memory of the
>>> JVM process could be limited via the xms/xmx parameter of the JVM. This can
>>> be configured via spark options for yarn (be aware that they are different
>>> in cluster and client mode), but i recommend to use the spark options for
>>> the off heap maximum.
>>>
>>> https://spark.apache.org/docs/latest/running-on-yarn.html
>>>
>>>
>>> On 21 Sep 2016, at 22:02, Michael Segel  wrote:
>>>
>>> I’ve asked this question a couple of times from a friend who didn’t know
>>> the answer… so I thought I would try here.
>>>
>>>
>>> Suppose we launch a job on a cluster (YARN) and we have set up the
>>> containers to be 3GB in size.
>>>
>>>
>>> What does that 3GB represent?
>>>
>>> I mean what happens if we end up using 2-3GB of off heap storage via
>>> tungsten?
>>> What will Spark do?
>>> Will it try to honor the container’s limits and throw an exception or will
>>> it allow my job to grab that amount of memory and exceed YARN’s
>>> expectations since its off heap?
>>>
>>> Thx
>>>
>>> -Mike
>>>
>




Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-22 Thread Michael Segel
Thanks for the response Sean. 

But how does YARN know about the off-heap memory usage? 
That’s the piece that I’m missing.

Thx again, 

-Mike

> On Sep 21, 2016, at 10:09 PM, Sean Owen  wrote:
> 
> No, Xmx only controls the maximum size of on-heap allocated memory.
> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> when it can be released).
> 
> The answer is that YARN will kill the process because it's using more
> memory than it asked for. A JVM is always going to use a little
> off-heap memory by itself, so setting a max heap size of 2GB means the
> JVM process may use a bit more than 2GB of memory. With an off-heap
> intensive app like Spark it can be a lot more.
> 
> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> it will ask for 3.3GB from YARN. You can increase the overhead.
> 
> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke  wrote:
>> All off-heap memory is still managed by the JVM process. If you limit the
>> memory of this process then you limit the memory. I think the memory of the
>> JVM process could be limited via the xms/xmx parameter of the JVM. This can
>> be configured via spark options for yarn (be aware that they are different
>> in cluster and client mode), but i recommend to use the spark options for
>> the off heap maximum.
>> 
>> https://spark.apache.org/docs/latest/running-on-yarn.html
>> 
>> 
>> On 21 Sep 2016, at 22:02, Michael Segel  wrote:
>> 
>> I’ve asked this question a couple of times from a friend who didn’t know
>> the answer… so I thought I would try here.
>> 
>> 
>> Suppose we launch a job on a cluster (YARN) and we have set up the
>> containers to be 3GB in size.
>> 
>> 
>> What does that 3GB represent?
>> 
>> I mean what happens if we end up using 2-3GB of off heap storage via
>> tungsten?
>> What will Spark do?
>> Will it try to honor the container’s limits and throw an exception or will
>> it allow my job to grab that amount of memory and exceed YARN’s
>> expectations since its off heap?
>> 
>> Thx
>> 
>> -Mike
>> 



Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-21 Thread Sean Owen
No, Xmx only controls the maximum size of on-heap allocated memory.
The JVM doesn't manage/limit off-heap (how could it? it doesn't know
when it can be released).

The answer is that YARN will kill the process because it's using more
memory than it asked for. A JVM is always going to use a little
off-heap memory by itself, so setting a max heap size of 2GB means the
JVM process may use a bit more than 2GB of memory. With an off-heap
intensive app like Spark it can be a lot more.

There's a built-in 10% overhead, so that if you ask for a 3GB executor
it will ask for 3.3GB from YARN. You can increase the overhead.
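
A rough sketch of that arithmetic; the default overhead formula used here
(10% of executor memory with a floor of roughly 384 MB, configurable via
spark.yarn.executor.memoryOverhead) is my reading of the Spark-on-YARN defaults
of this era, so treat the exact numbers as an assumption:

  // Rough estimate of what Spark asks YARN for, given an executor memory setting.
  object YarnRequestEstimate {
    // Assumed default: overhead = max(10% of executor memory, ~384 MB),
    // unless spark.yarn.executor.memoryOverhead overrides it.
    def yarnRequestMb(executorMemoryMb: Int, overheadOverrideMb: Option[Int] = None): Int = {
      val defaultOverheadMb = math.max((executorMemoryMb * 0.10).toInt, 384)
      executorMemoryMb + overheadOverrideMb.getOrElse(defaultOverheadMb)
    }

    def main(args: Array[String]): Unit = {
      println(yarnRequestMb(3072))             // a 3 GB executor asks YARN for a bit over 3 GB
      println(yarnRequestMb(3072, Some(1024))) // with the overhead raised explicitly to 1 GB
    }
  }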

On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke  wrote:
> All off-heap memory is still managed by the JVM process. If you limit the
> memory of this process then you limit the memory. I think the memory of the
> JVM process could be limited via the xms/xmx parameter of the JVM. This can
> be configured via spark options for yarn (be aware that they are different
> in cluster and client mode), but i recommend to use the spark options for
> the off heap maximum.
>
> https://spark.apache.org/docs/latest/running-on-yarn.html
>
>
> On 21 Sep 2016, at 22:02, Michael Segel  wrote:
>
> I’ve asked this question a couple of times from a friend who didn’t know
> the answer… so I thought I would try here.
>
>
> Suppose we launch a job on a cluster (YARN) and we have set up the
> containers to be 3GB in size.
>
>
> What does that 3GB represent?
>
> I mean what happens if we end up using 2-3GB of off heap storage via
> tungsten?
> What will Spark do?
> Will it try to honor the container’s limits and throw an exception or will
> it allow my job to grab that amount of memory and exceed YARN’s
> expectations since its off heap?
>
> Thx
>
> -Mike
>
> B‹CB• È
> [œÝXœØÜšX™H K[XZ[ ˆ \Ù\‹][œÝXœØÜšX™P Ü \šË˜\ XÚ K›Ü™ÃBƒ




Re: Off Heap (Tungsten) Memory Usage / Management ?

2016-09-21 Thread Jörn Franke
All off-heap memory is still managed by the JVM process. If you limit the 
memory of this process then you limit the memory. I think the memory of the JVM 
process could be limited via the Xms/Xmx parameters of the JVM. This can be 
configured via the Spark options for YARN (be aware that they are different in 
cluster and client mode), but I recommend using the Spark options for the 
off-heap maximum.

https://spark.apache.org/docs/latest/running-on-yarn.html
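
For example, a minimal sketch of setting both limits through Spark options
rather than raw JVM flags; the option names come from the docs linked above,
while the application code and the specific sizes are just placeholders:

  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  // Sketch: cap the executor heap via spark.executor.memory (which becomes -Xmx
  // on the executor JVM) and cap Tungsten's off-heap allocation explicitly.
  object OffHeapConfigSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("off-heap-config-sketch")
        .set("spark.executor.memory", "2g")           // executor heap
        .set("spark.memory.offHeap.enabled", "true")  // let Tungsten use off-heap memory
        .set("spark.memory.offHeap.size", "512m")     // hard cap on that off-heap memory

      val spark = SparkSession.builder().config(conf).getOrCreate()
      spark.range(0, 1000000L).selectExpr("sum(id)").show()
      spark.stop()
    }
  }

In practice these would more often be passed via spark-submit --conf flags or
spark-defaults.conf, so they are in place before the executors are requested.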


> On 21 Sep 2016, at 22:02, Michael Segel  wrote:
> 
> I’ve asked this question a couple of times from a friend who didn’t know 
> the answer… so I thought I would try here. 
> 
> 
> Suppose we launch a job on a cluster (YARN) and we have set up the containers 
> to be 3GB in size.
> 
> 
> What does that 3GB represent? 
> 
> I mean what happens if we end up using 2-3GB of off heap storage via 
> tungsten? 
> What will Spark do? 
> Will it try to honor the container’s limits and throw an exception or will 
> it allow my job to grab that amount of memory and exceed YARN’s 
> expectations since its off heap? 
> 
> Thx
> 
> -Mike
> 