Re: Off Heap (Tungsten) Memory Usage / Management ?
I would disagree. While you can tune the system not to oversubscribe, I would rather have it hit swap than fail, especially on long-running jobs. If we look at oversubscription on Hadoop clusters which are not running HBase… they survive. It's when you have things like HBase that don't handle swap well, or when you don't allocate enough swap, that things go boom. Also consider that you could move swap to something that is faster than spinning rust.

> On Sep 22, 2016, at 12:44 PM, Sean Owen wrote:
>
> I don't think I'd enable swap on a cluster. You'd rather processes
> fail than grind everything to a halt. You'd buy more memory or
> optimize memory before trading it for I/O.
>
> […]
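For what it's worth, the usual way to keep swap as a safety net without letting the kernel swap eagerly is the vm.swappiness knob. The following is only an illustrative sketch of the idea discussed above; the value chosen is an assumption, not something prescribed in this thread:

```shell
# Keep swap available but discourage the kernel from using it until
# memory pressure is real. Requires root; the value 1 is illustrative.
sysctl vm.swappiness=1

# Confirm which devices back swap (ideally a fast NVMe/M.2 partition
# rather than spinning rust).
swapon --show
```

To persist the setting across reboots, the same key would go in /etc/sysctl.conf.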
Re: Off Heap (Tungsten) Memory Usage / Management ?
This is probably the best way to manage it.

> On Thu, Sep 22, 2016 at 6:42 PM, Josh Rosen wrote:
>
> Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped
> at spark.memory.offHeap.size bytes. This is purposely specified as an
> absolute size rather than as a percentage of the heap size in order to
> allow end users to tune Spark so that its overall memory consumption stays
> within container memory limits.
>
> To use your example of a 3GB YARN container, you could configure Spark so
> that its maximum heap size plus spark.memory.offHeap.size is smaller than
> 3GB (minus some overhead fudge-factor).
>
> […]

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Off Heap (Tungsten) Memory Usage / Management ?
I don't think I'd enable swap on a cluster. You'd rather processes fail than grind everything to a halt. You'd buy more memory or optimize memory before trading it for I/O.

> On Thu, Sep 22, 2016 at 6:29 PM, Michael Segel wrote:
>
> Ok… gotcha… wasn't sure that YARN just looked at the heap size allocation
> and ignored the off heap.
>
> WRT overall OS memory… this would be one reason why I'd keep a decent
> amount of swap around. (Maybe even putting it on a fast device like an M.2
> or PCIe flash drive….)
Re: Off Heap (Tungsten) Memory Usage / Management ?
Well, from an OS perspective, off-heap memory will be visible under the JVM process (you see the memory consumption of the JVM process growing when it uses off-heap memory). There is one exception: a process which has not been started by the JVM and "lives" outside the JVM, but uses IPC to communicate with it. I do not assume this is the case for Spark.

Regarding Xms/Xmx, you are right: those only bound heap memory. You may be able to limit the memory of the JVM process (and thus, under the assumption described above, its off-heap usage too) by using cgroups, though whether this should be done needs some thought.

> On Thu, Sep 22, 2016 at 5:09 AM, Sean Owen wrote:
>
> No, Xmx only controls the maximum size of on-heap allocated memory.
> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> when it can be released).
>
> […]
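The cgroups idea above can be sketched as follows. This is only an illustrative sketch against the cgroup-v1 memory controller (typical of 2016-era kernels); the group name and pid are made up:

```shell
# Create a memory cgroup and cap it at 3 GB (cgroup v1; requires root).
mkdir -p /sys/fs/cgroup/memory/spark-executor
echo $((3 * 1024 * 1024 * 1024)) \
  > /sys/fs/cgroup/memory/spark-executor/memory.limit_in_bytes

# Move a running executor JVM into the group (pid 12345 is a placeholder).
# Heap, off-heap, and JVM overhead all count against the limit; the kernel
# OOM-kills the process if it exceeds the cap.
echo 12345 > /sys/fs/cgroup/memory/spark-executor/tasks
```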
Re: Off Heap (Tungsten) Memory Usage / Management ?
Ok… gotcha… wasn't sure that YARN just looked at the heap size allocation and ignored the off heap.

WRT overall OS memory… this would be one reason why I'd keep a decent amount of swap around. (Maybe even putting it on a fast device like an M.2 or PCIe flash drive….)

> On Sep 22, 2016, at 9:56 AM, Sean Owen wrote:
>
> It's looking at the whole process's memory usage, and doesn't care
> whether the memory is used by the heap or not within the JVM. Of
> course, allocating memory off-heap still counts against you at the OS
> level.
>
> […]
Re: Off Heap (Tungsten) Memory Usage / Management ?
Spark SQL / Tungsten's explicitly-managed off-heap memory will be capped at spark.memory.offHeap.size bytes. This is purposely specified as an absolute size rather than as a percentage of the heap size in order to allow end users to tune Spark so that its overall memory consumption stays within container memory limits.

To use your example of a 3GB YARN container, you could configure Spark so that its maximum heap size plus spark.memory.offHeap.size is smaller than 3GB (minus some overhead fudge-factor).

> On Thu, Sep 22, 2016 at 7:56 AM Sean Owen wrote:
>
> It's looking at the whole process's memory usage, and doesn't care
> whether the memory is used by the heap or not within the JVM. Of
> course, allocating memory off-heap still counts against you at the OS
> level.
>
> […]
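Josh's sizing rule can be sketched as a spark-submit invocation. The numbers and the jar name are illustrative only; the overhead option uses the spark.yarn.executor.memoryOverhead name current in 2016-era Spark:

```shell
# Illustrative sizing for a ~3 GB YARN container:
#   2g JVM heap + 512m Tungsten off-heap + 512 MB overhead fudge-factor.
spark-submit \
  --master yarn \
  --executor-memory 2g \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=512m \
  --conf spark.yarn.executor.memoryOverhead=512 \
  your-app.jar
```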
Re: Off Heap (Tungsten) Memory Usage / Management ?
It's looking at the whole process's memory usage, and doesn't care whether the memory is used by the heap or not within the JVM. Of course, allocating memory off-heap still counts against you at the OS level.

> On Thu, Sep 22, 2016 at 3:54 PM, Michael Segel wrote:
>
> Thanks for the response Sean.
>
> But how does YARN know about the off-heap memory usage?
> That's the piece that I'm missing.
>
> Thx again,
>
> -Mike
>
> […]
Re: Off Heap (Tungsten) Memory Usage / Management ?
Thanks for the response Sean.

But how does YARN know about the off-heap memory usage? That's the piece that I'm missing.

Thx again,

-Mike

> On Sep 21, 2016, at 10:09 PM, Sean Owen wrote:
>
> No, Xmx only controls the maximum size of on-heap allocated memory.
> The JVM doesn't manage/limit off-heap (how could it? it doesn't know
> when it can be released).
>
> The answer is that YARN will kill the process because it's using more
> memory than it asked for. A JVM is always going to use a little
> off-heap memory by itself, so setting a max heap size of 2GB means the
> JVM process may use a bit more than 2GB of memory. With an off-heap
> intensive app like Spark it can be a lot more.
>
> There's a built-in 10% overhead, so that if you ask for a 3GB executor
> it will ask for 3.3GB from YARN. You can increase the overhead.
>
> […]
Re: Off Heap (Tungsten) Memory Usage / Management ?
No, Xmx only controls the maximum size of on-heap allocated memory. The JVM doesn't manage/limit off-heap (how could it? it doesn't know when it can be released).

The answer is that YARN will kill the process because it's using more memory than it asked for. A JVM is always going to use a little off-heap memory by itself, so setting a max heap size of 2GB means the JVM process may use a bit more than 2GB of memory. With an off-heap intensive app like Spark it can be a lot more.

There's a built-in 10% overhead, so that if you ask for a 3GB executor it will ask for 3.3GB from YARN. You can increase the overhead.

> On Wed, Sep 21, 2016 at 11:41 PM, Jörn Franke wrote:
>
> All off-heap memory is still managed by the JVM process. If you limit the
> memory of this process then you limit the memory.
>
> […]
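The overhead arithmetic can be checked in a few lines of shell. One hedge: as the YARN docs of this era describe it, the default overhead is max(10% of executor memory, 384 MB), so a 3 GB request actually comes out a little above the round 3.3 GB figure:

```shell
# YARN request size for a 3 GB (3072 MB) executor, assuming the default
# overhead of max(10% of executor memory, 384 MB).
executor_mb=3072
overhead_mb=$(( executor_mb / 10 ))
if [ "$overhead_mb" -lt 384 ]; then overhead_mb=384; fi
total_mb=$(( executor_mb + overhead_mb ))
echo "YARN container request: ${total_mb} MB"
```

With these numbers, 10% of 3072 MB is 307 MB, the 384 MB floor kicks in, and the request is 3456 MB — roughly Sean's 3.3 GB.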
Re: Off Heap (Tungsten) Memory Usage / Management ?
All off-heap memory is still managed by the JVM process. If you limit the memory of this process then you limit the memory. I think the memory of the JVM process could be limited via the Xms/Xmx parameters of the JVM. These can be configured via Spark options for YARN (be aware that they are different in cluster and client mode), but I recommend using the Spark options for the off-heap maximum.

https://spark.apache.org/docs/latest/running-on-yarn.html

> On 21 Sep 2016, at 22:02, Michael Segel wrote:
>
> I've asked this question a couple of times from a friend who didn't know
> the answer… so I thought I would try here.
>
> Suppose we launch a job on a cluster (YARN) and we have set up the
> containers to be 3GB in size.
>
> What does that 3GB represent?
>
> I mean what happens if we end up using 2-3GB of off heap storage via
> tungsten?
> What will Spark do?
> Will it try to honor the container's limits and throw an exception, or will
> it allow my job to grab that amount of memory and exceed YARN's
> expectations since it's off heap?
>
> Thx
>
> -Mike
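Jörn's caveat about cluster vs. client mode is about where the driver JVM runs. A hedged sketch of the difference, with option names from the running-on-yarn page linked above and illustrative memory values:

```shell
# Cluster mode: the driver runs inside the YARN application master,
# so --driver-memory sizes a YARN container.
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 2g --executor-memory 2g your-app.jar

# Client mode: the driver runs in the local spark-submit process; the
# YARN application master is a separate, smaller process sized by
# spark.yarn.am.memory instead.
spark-submit --master yarn --deploy-mode client \
  --driver-memory 2g \
  --conf spark.yarn.am.memory=1g \
  your-app.jar
```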