As Michael mentioned, we see exactly the same thing in the pmap output:
a lot of anon blocks, indicating malloc or direct byte buffer allocations,
and they keep growing over time.
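
For reference, here is a minimal sketch of how the same direct/mapped
buffer counters can be read from inside a worker, assuming Java 7's
BufferPoolMXBean (the class name here is just illustrative, nothing
Storm-specific):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Prints the "direct" and "mapped" buffer pool counters, the same numbers
// that appear in the metrics snippet quoted below.
public class BufferPoolStats {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d memoryUsed=%d totalCapacity=%d%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}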

The topology is used for content ingestion; the bolts are responsible for
curating and enriching that content.

A couple of bolts do make native calls through JNI, which we suspected to
be the culprits, and we did find and fix an issue where we were not freeing
memory. However, even after the fix, memory usage still grows, though at a
much slower rate than before.
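
For illustration only (hypothetical code, not our actual fix; the bolt and
the commented-out native call are made-up names), the general class of bug
looks like this: a fresh direct ByteBuffer is allocated per tuple for the
JNI call, and the native memory behind it is only released once the tiny
ByteBuffer object is eventually garbage collected, which can lag far behind
the allocation rate:

import java.nio.ByteBuffer;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

// Hypothetical bolt showing the leak pattern: one direct allocation per tuple.
public class LeakyEnrichmentBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        byte[] payload = input.getBinary(0);

        // Leak: a new direct buffer for every tuple instead of a reused one.
        // Its native memory stays resident until the ByteBuffer object is GC'd.
        ByteBuffer buf = ByteBuffer.allocateDirect(payload.length);
        buf.put(payload);
        // nativeEnrich(buf);  // the JNI call would go here (omitted)
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output fields in this sketch
    }
}

One common remedy is to allocate a single buffer per executor in prepare()
and reuse it, and to release any native-side allocations explicitly in
cleanup().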

It would be interesting to know whether you found any other sources of
leaks we need to be aware of, and how you fixed them. Based on the current
state, we may have to do a daily rebalance or restart of the topologies to
remediate this buildup.
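
If we do go down the daily-rebalance route, a rough sketch of what we have
in mind is below, assuming the 0.9.x client classes
(backtype.storm.utils.NimbusClient, backtype.storm.generated.RebalanceOptions);
the topology name is a placeholder. It should be equivalent to running
"storm rebalance my-ingestion-topology -w 30" from the CLI, driven by cron:

import java.util.Map;

import backtype.storm.generated.RebalanceOptions;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

// Asks Nimbus to rebalance the topology; the resulting worker restarts
// release whatever native memory the old worker processes had accumulated.
public class DailyRebalance {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();   // reads storm.yaml (nimbus.host etc.)
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);

        RebalanceOptions opts = new RebalanceOptions();
        opts.set_wait_secs(30);               // drain time before workers come back

        nimbus.getClient().rebalance("my-ingestion-topology", opts);
    }
}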

Thanks
Indra
On 18 Jun 2014 05:33, "P. Taylor Goetz" <[email protected]> wrote:

> I've not seen this, (un)fortunately. :)
>
> Are there any other relevant details you might be able to provide?
>
> Or better yet can you distill it down to a bare bones topology that
> reproduces it and share the code?
>
> -Taylor
>
> On Jun 17, 2014, at 6:27 PM, Michael Rose <[email protected]> wrote:
>
> I've run into similar leaks with one of our topologies. ZMQ vs. Netty
> didn't make any difference for us. We'd been looking into the Netty-based
> HTTP client we're using as a suspect, but maybe it is Storm.
>
> 8 workers, 1.5GB heap, CMS collector, Java 1.7.0_25-b15, Storm 0.9.0.1
>
> What kinds of things do your topologies do?
>
> One thing we'd observed is a bump in direct buffers, usually starting around
> 100. Java can't account for the memory used, but the size & count of the
> allocations as shown by pmap are suspicious.
>
> ...
> 00007f30ac1bc000  63760K -----    [ anon ]
> 00007f30b0000000    864K rw---    [ anon ]
> 00007f30b00d8000  64672K -----    [ anon ]
> 00007f30b4000000    620K rw---    [ anon ]
> 00007f30b409b000  64916K -----    [ anon ]
> 00007f30b8000000   1780K rw---    [ anon ]
> 00007f30b81bd000  63756K -----    [ anon ]
> 00007f30bc000000   1376K rw---    [ anon ]
> 00007f30bc158000  64160K -----    [ anon ]
> 00007f30c0000000   1320K rw---    [ anon ]
> ...
>
>       "buffers":{
>          "direct":{
>             "count":721,
>             "memoryUsed":16659150,
>             "totalCapacity":16659150
>          },
>          "mapped":{
>             "count":0,
>             "memoryUsed":0,
>             "totalCapacity":0
>          }
>       },
>
> Do you have a similar bump in direct buffer counts?
>
> Michael
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Tue, Jun 17, 2014 at 11:15 AM, Indra Nath Bardhan <
> [email protected]> wrote:
>
>> Hi All,
>>
>> We have a topology which is running on 16 workers with 2GB heap each.
>>
>> However, we see that the topology worker RES memory usage keeps piling
>> up: it starts at around 1.1 GB and keeps growing past the 2 GB mark until
>> it overwhelms the entire node.
>>
>> This possibly indicates that
>>
>> 1) we have slow-consuming bolts and thus need throttling at the spout, or
>> 2) a memory leak in the ZMQ buffer allocation or some of the JNI code.
>>
>> Based on responses in certain other discussions, we tried making our
>> topology reliable and using MAX_SPOUT_PENDING to throttle the spouts.
>> However, this did not yield much value: with values of 1000 and 100 we
>> see the same growth in memory usage, although a bit slower in the latter
>> case.
>>
>> We also did a pmap of the offending pids and did not see much memory
>> attributed to the native lib*.so files.
>>
>> Is there any way to identify the source of this native leak or to fix
>> it? We need some urgent help on this.
>>
>> [NOTE: Using Storm - 0.9.0_wip21]
>>
>> Thanks,
>> Indra
>>
>>
>>
>>
>
