I've run into similar leaks with one of our topologies. Switching between
ZMQ and Netty didn't make any difference for us. We'd been looking at the
Netty-based HTTP client we use as a suspect, but maybe it's Storm itself.
8 workers, 1.5GB heap, CMS collector, Java 1.7.0_25-b15, Storm 0.9.0.1
What kinds of things do your topologies do?
One thing we've observed is a bump in direct buffer counts; it usually
starts around 100. Java's own accounting can't explain the memory being
used, but the size and count of the allocations shown by pmap are
suspicious:
...
00007f30ac1bc000 63760K ----- [ anon ]
00007f30b0000000 864K rw--- [ anon ]
00007f30b00d8000 64672K ----- [ anon ]
00007f30b4000000 620K rw--- [ anon ]
00007f30b409b000 64916K ----- [ anon ]
00007f30b8000000 1780K rw--- [ anon ]
00007f30b81bd000 63756K ----- [ anon ]
00007f30bc000000 1376K rw--- [ anon ]
00007f30bc158000 64160K ----- [ anon ]
00007f30c0000000 1320K rw--- [ anon ]
...
"buffers":{
"direct":{
"count":721,
"memoryUsed":16659150,
"totalCapacity":16659150
},
"mapped":{
"count":0,
"memoryUsed":0,
"totalCapacity":0
}
},
Do you have a similar bump in direct buffer counts?
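If you want to compare, those numbers (presumably the JVM's buffer-pool
MXBean stats) can be pulled straight from java.lang.management in-process,
or over JMX against a worker. A minimal sketch, assuming nothing beyond the
standard platform API; the class name BufferPoolCheck is just for
illustration:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class BufferPoolCheck {
    public static void main(String[] args) {
        // One bean per pool: "direct" tracks NIO direct ByteBuffers,
        // "mapped" tracks memory-mapped files.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d memoryUsed=%d totalCapacity=%d%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

Keep in mind these pools only see what was allocated through
ByteBuffer.allocateDirect / FileChannel.map; native allocations made
directly by JNI code or ZMQ won't show up here, which is part of why pmap
and the JVM's own accounting disagree.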
Michael
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]
On Tue, Jun 17, 2014 at 11:15 AM, Indra Nath Bardhan <
[email protected]> wrote:
> Hi All,
>
> We have a topology running on 16 workers with a 2 GB heap each.
>
> However, we see that the workers' RES memory usage keeps piling up: it
> starts at around 1.1 GB and keeps growing past the 2 GB mark until it
> overwhelms the entire node.
>
> This possibly indicates that
>
> 1) we have slow-consuming bolts and thus need to throttle the spouts, OR
> 2) there is a memory leak in the ZMQ buffer allocation or in some of the
> JNI code.
>
> Based on responses in some other discussions, we tried making our
> topology reliable and using MAX_SPOUT_PENDING to throttle the spouts.
> However, this did not help much: with values of 1000 and 100 we see the
> same growth in memory usage, although a bit slower in the latter case.
>
> We also ran pmap on the offending PIDs and did not see much memory used
> by the native lib*.so files.
>
> Is there any way to identify the source of this native leak, or to fix
> it? We need some urgent help on this.
>
> [NOTE: Using Storm - 0.9.0_wip21]
>
> Thanks,
> Indra
>
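For what it's worth, on the MAX_SPOUT_PENDING point above: it only takes
effect when the spout emits tuples with message IDs (i.e. the topology is
genuinely reliable/acked), since it caps the number of un-acked tuples in
flight per spout task. A minimal sketch of setting it, assuming the
standard backtype.storm.Config API from the 0.9.x line; the helper class
ThrottleConfigSketch is just for illustration:

import backtype.storm.Config;

public class ThrottleConfigSketch {
    // Builds a Config to pass to StormSubmitter.submitTopology(name, conf, topology).
    public static Config throttledConf(int numWorkers, int maxSpoutPending) {
        Config conf = new Config();
        conf.setNumWorkers(numWorkers);
        // Caps the number of un-acked tuples in flight per spout task;
        // it has no effect unless the spout emits with message IDs.
        conf.setMaxSpoutPending(maxSpoutPending);
        // Equivalent raw key: conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, maxSpoutPending);
        return conf;
    }
}

With that in place, a value of 100 means at most 100 tuples per spout task
are un-acked at any time, which bounds the queues in front of slow bolts
but won't by itself stop a genuine native leak from growing.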