I've run into similar leaks with one of our topologies. Switching between
ZMQ and Netty didn't make any difference for us. We'd been treating the
Netty-based HTTP client we use as the prime suspect, but maybe it's Storm
after all.

8 workers, 1.5GB heap, CMS collector, Java 1.7.0_25-b15, Storm 0.9.0.1

What kinds of things do your topologies do?

One thing we'd observed is a bump in direct buffers; the count usually
starts around 100. The JVM can't account for the memory used, but the size
and count of the anonymous allocations shown by pmap are suspicious.

...
00007f30ac1bc000  63760K -----    [ anon ]
00007f30b0000000    864K rw---    [ anon ]
00007f30b00d8000  64672K -----    [ anon ]
00007f30b4000000    620K rw---    [ anon ]
00007f30b409b000  64916K -----    [ anon ]
00007f30b8000000   1780K rw---    [ anon ]
00007f30b81bd000  63756K -----    [ anon ]
00007f30bc000000   1376K rw---    [ anon ]
00007f30bc158000  64160K -----    [ anon ]
00007f30c0000000   1320K rw---    [ anon ]
...

      "buffers":{
         "direct":{
            "count":721,
            "memoryUsed":16659150,
            "totalCapacity":16659150
         },
         "mapped":{
            "count":0,
            "memoryUsed":0,
            "totalCapacity":0
         }
      },

Do you have a similar bump in direct buffer counts?
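
If you want to compare, something like the following should dump the same
count / memoryUsed / totalCapacity figures from inside a JVM. It's just a
quick standalone sketch using the standard BufferPoolMXBean API (the class
name is purely for illustration); you could also wire the same calls into a
bolt or a metrics consumer.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Dumps the JVM's "direct" and "mapped" buffer pools -- the same numbers
// shown in the JSON snippet above.
public class BufferPoolDump {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d memoryUsed=%d totalCapacity=%d%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}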

Michael

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Tue, Jun 17, 2014 at 11:15 AM, Indra Nath Bardhan <
[email protected]> wrote:

> Hi All,
>
> We have a topology which is running on 16 workers with 2GB heap each.
>
> However, we see that the workers' RES memory usage keeps climbing: it
> starts at around 1.1 GB and grows past the 2 GB mark until it overwhelms
> the entire node.
>
> This possibly indicates that
>
> 1) we have slow-consuming bolts and need to throttle the spouts, or
> 2) there is a memory leak in the ZMQ buffer allocation or some of the JNI code.
>
> Based on responses in some other discussions, we tried making our
> topology reliable and using MAX_SPOUT_PENDING to throttle the spouts.
> However, this did not help much: with values of 1000 and 100 we see the
> same growth in memory usage, although a bit slower in the latter case.
>
> We also ran pmap on the offending PIDs and did not see much memory usage
> by the native lib*.so files.
>
> Is there any way to identify the source of this native leak, or to fix it?
> We need some urgent help on this.
>
> [NOTE: Using Storm - 0.9.0_wip21]
>
> Thanks,
> Indra
>
