Thanks for sharing your experiences, guys. We will be heading back to 0mq as well. It's a shame, as we really got some nice throughput improvements with Netty.
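
For anyone making the same switch, here is a rough sketch of the relevant settings (assuming a 0.9.x storm.yaml, that the key names are unchanged in 0.9.0.1, and that the zeromq/jzmq native libraries are still installed on the cluster):

    # revert the worker messaging transport to the 0mq implementation
    storm.messaging.transport: "backtype.storm.messaging.zmq"

For anyone staying on Netty, the reconnect behaviour behind those timeout stack traces is controlled by the storm.messaging.netty.* settings; the values below are illustrative placeholders, not recommendations:

    storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    storm.messaging.netty.buffer_size: 5242880
    storm.messaging.netty.max_retries: 100
    storm.messaging.netty.max_wait_ms: 1000
    storm.messaging.netty.min_wait_ms: 100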
On Sun, Mar 2, 2014 at 5:18 PM, Michael Rose <[email protected]> wrote:

> Right now we're having slow, off-heap memory leaks, unknown if these are
> linked to Netty (yet). When the workers inevitably get OOMed, the topology
> will rarely recover gracefully with similar Netty timeouts. Sounds like
> we'll be heading back to 0mq.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Sun, Mar 2, 2014 at 5:44 PM, Sean Allen <[email protected]> wrote:
>
>> We have the same issue and after attempting a few fixes, we switched back
>> to using 0mq for now.
>>
>>
>> On Sun, Mar 2, 2014 at 2:46 PM, Drew Goya <[email protected]> wrote:
>>
>>> Hey All, I'm running a 0.9.0.1 storm topology in AWS EC2 and I
>>> occasionally run into a strange and pretty catastrophic error. One of my
>>> workers is either overloaded or stuck and gets killed and restarted. This
>>> usually works fine but once in a while the whole topology breaks down, all
>>> the workers are killed and restarted continually. Looking through the logs
>>> it looks like some netty errors on initialization kill the Async Loop. The
>>> topology is never able to recover, I have to kill it manually and relaunch
>>> it.
>>>
>>> Is this something anyone else has come across? Any tips? Config
>>> settings I could change?
>>>
>>> This is a pastebin of the errors: http://pastebin.com/XXZBsEj1
>>>
>>
>>
>>
>> --
>>
>> Ce n'est pas une signature
>>
>
