Re: Memory / resource leak in 0.10.1.1 release
FWIW: I went through and removed all the 'custom' serdes from my code and replaced them with string serdes. The memory-leak problem went away. The code is a bit more cumbersome now, as it's constantly flipping back and forth between objects and JSON, but that seems to be what it takes to keep it running.

On Thu, Dec 29, 2016 at 9:42 PM, Guozhang Wang wrote:
> [quoted reply trimmed; Guozhang's full message appears below]
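For reference, a minimal sketch of the "string serde plus manual mapping" workaround described above. SumRecord is the POJO named in the thread, but its fields and the toJson/fromJson helpers here are placeholders, not the poster's actual code; a real application would use a JSON library such as Jackson instead of these hand-rolled helpers. With Kafka's built-in Serdes.String(), the topics carry plain JSON strings and the object conversion happens in application code:

```java
public class SumRecordJson {
    // Hypothetical stand-in for the thread's SumRecord (fields are guesses).
    public static class SumRecord {
        public final String key;
        public final int total;
        public SumRecord(String key, int total) { this.key = key; this.total = total; }
    }

    // Object -> JSON string; deliberately minimal (assumes key has no quotes).
    public static String toJson(SumRecord r) {
        return "{\"key\":\"" + r.key + "\",\"total\":" + r.total + "}";
    }

    // JSON string -> Object; naive parse, just enough for this sketch.
    public static SumRecord fromJson(String json) {
        String key = json.split("\"key\":\"")[1].split("\"")[0];
        int total = Integer.parseInt(json.split("\"total\":")[1].replace("}", "").trim());
        return new SumRecord(key, total);
    }

    public static void main(String[] args) {
        SumRecord rec = new SumRecord("txn-1", 42);
        String wire = toJson(rec);        // what actually flows through Kafka topics
        SumRecord back = fromJson(wire);  // "flipping back" on the consuming side
        System.out.println(wire + " -> total=" + back.total);
    }
}
```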
Re: Memory / resource leak in 0.10.1.1 release
Hello Jon,

It is hard to tell, since I cannot see how your Aggregate() function is implemented.

Note that the deserializer of transactionSerde is used in both `aggregate` and `KStreamBuilder.stream`, while the serializer of transactionSerde is only used in `aggregate`. So if you suspect the transactionSerde is the root cause, you can narrow it down by reducing the topology to:

KStream transactionKStream = kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);
transactionKStream.to(TOPIC-2);

where TOPIC-2 should be pre-created.

The above topology still exercises both the serializer and deserializer of the transactionSerde; if it also leads to a memory leak, then the leak is not related to your aggregate function.

Guozhang

On Sun, Dec 25, 2016 at 4:15 AM, Jon Yeargers wrote:
> [quoted reply trimmed; Jon's full message appears below]
Re: Memory / resource leak in 0.10.1.1 release
I narrowed this problem down to this part of the topology (and yes, it's 100% repro - for me):

KStream<String, SumRecord> transactionKStream =
    kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

KTable<Windowed<String>, SumRecordCollector> ktAgg =
    transactionKStream.groupByKey().aggregate(
        SumRecordCollector::new,
        new Aggregate(),
        TimeWindows.of(20 * 60 * 1000L),
        collectorSerde, "table_stream");

Given that this is a pretty trivial, well-traveled piece of Kafka, I can't imagine it has a memory leak. So I'm guessing that the serde I'm using is causing a problem somehow. The 'transactionSerde' just gets/sets JSON into the 'SumRecord' object. That object is just a bunch of String and int fields, so nothing interesting there either.

I'm attaching the two parts of the transactionSerde to see if anyone has suggestions on how to find / fix this.

On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers wrote:
> [quoted replies trimmed; the earlier messages appear in full below]
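For context on the aggregate step being discussed: in Kafka Streams 0.10.x the Aggregator contract is roughly `VA apply(K key, V value, VA aggregate)`; it must return the updated aggregate for the window. Below is a plain-Java sketch of the shape such an Aggregate() typically takes. The SumRecord/SumRecordCollector fields are guesses, not the poster's actual classes, and the Kafka interfaces are mirrored rather than imported so the sketch is self-contained. One classic way an aggregator can look like a memory leak is accumulating unbounded per-window state (e.g. retaining every record) instead of fixed-size counters like these:

```java
public class AggregateSketch {
    // Hypothetical stand-ins for the classes named in the thread.
    public static class SumRecord {
        public final int amount;
        public SumRecord(int amount) { this.amount = amount; }
    }

    public static class SumRecordCollector {
        public int count = 0;
        public long sum = 0;
    }

    // Mirrors Aggregator<String, SumRecord, SumRecordCollector>.apply(...):
    // fold one incoming record into the running window aggregate.
    public static SumRecordCollector apply(String key, SumRecord value, SumRecordCollector agg) {
        agg.count++;
        agg.sum += value.amount;
        return agg;  // fixed-size state per window key
    }

    public static void main(String[] args) {
        SumRecordCollector agg = new SumRecordCollector();
        apply("txn", new SumRecord(10), agg);
        apply("txn", new SumRecord(32), agg);
        System.out.println("count=" + agg.count + " sum=" + agg.sum);
    }
}
```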
Re: Memory / resource leak in 0.10.1.1 release
Yes - that's the one. It's 100% reproducible (for me).

On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy wrote:
> [quoted reply trimmed; Damian's full message appears below]
Re: Memory / resource leak in 0.10.1.1 release
Hi Jon,

Is this for the topology where you are doing something like:

topology: kStream -> groupByKey.aggregate(minute) -> foreach
                 \-> groupByKey.aggregate(hour) -> foreach

I'm trying to understand how I could reproduce your problem. I've not seen any such issues with 0.10.1.1, but then I'm not sure what you are doing.

Thanks,
Damian

On Thu, 22 Dec 2016 at 15:26 Jon Yeargers wrote:
> [quoted reply trimmed; the original report appears below]
Memory / resource leak in 0.10.1.1 release
I'm still hitting this leak with the released version of 0.10.1.1.

Process mem % grows over the course of 10-20 minutes and eventually the OS kills it.

Messages like this appear in /var/log/messages:

Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/ mems_allowed=0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID: 9550 Comm: java Tainted: GE 4.4.19-29.55.amzn1.x86_64 #1
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] 88071c517a70 812c958f 88071c517c58
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] 88071c517b00 811ce76d 8109db14
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] 810b2d91 0010 817d0fe9
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] dump_stack+0x63/0x84
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] dump_header+0x5e/0x1d8
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] ? set_next_entity+0xa4/0x710
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] oom_kill_process+0x205/0x3d0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] out_of_memory+0x431/0x480
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] __alloc_pages_nodemask+0x91e/0xa60
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] alloc_pages_current+0x88/0x120
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] __page_cache_alloc+0xb4/0xc0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] filemap_fault+0x188/0x3e0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] ext4_filemap_fault+0x36/0x50 [ext4]
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] __do_fault+0x3d/0x70
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] handle_mm_fault+0xf27/0x1870
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] __do_page_fault+0x183/0x3f0
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] do_page_fault+0x22/0x30
Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] [] page_fault+0x28/0x30
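One diagnostic note (not from the thread, offered as a suggestion): the oom-killer acts on total process RSS, while the JVM heap is capped by -Xmx. If RSS keeps climbing while logged heap usage stays flat and under the cap, the growth is off-heap/native memory (for Kafka Streams, RocksDB state stores are a common source) rather than a Java-object leak. A minimal sketch of periodic heap logging to tell the two apart:

```java
// Log JVM heap usage periodically; compare against process RSS (e.g. from
// `ps` or /proc/<pid>/status) to separate heap growth from native growth.
public class HeapLogger {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            System.out.printf("used heap: %d MiB (max %d MiB)%n",
                    usedHeapBytes() / (1024 * 1024),
                    Runtime.getRuntime().maxMemory() / (1024 * 1024));
            Thread.sleep(100);  // in production, every few seconds
        }
    }
}
```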