Thanks for sharing your experiences, guys. We will be heading back to 0mq as well. It's a shame, as we really got some nice throughput improvements with Netty.
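
For anyone making the same switch, here is a rough sketch of the relevant settings (assuming a 0.9.x storm.yaml, that the key names are unchanged in 0.9.0.1, and that the zeromq/jzmq native libraries are still installed on the cluster):

    # revert the worker messaging transport to the 0mq implementation
    storm.messaging.transport: "backtype.storm.messaging.zmq"

For anyone staying on Netty, the reconnect behaviour behind those timeout stack traces is controlled by the storm.messaging.netty.* settings; the values below are illustrative placeholders, not recommendations:

    storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    storm.messaging.netty.buffer_size: 5242880
    storm.messaging.netty.max_retries: 100
    storm.messaging.netty.max_wait_ms: 1000
    storm.messaging.netty.min_wait_ms: 100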
On Sun, Mar 2, 2014 at 5:18 PM, Michael Rose <[email protected]> wrote:

> Right now we're having slow, off-heap memory leaks, unknown if these are
> linked to Netty (yet). When the workers inevitably get OOMed, the topology
> will rarely recover gracefully with similar Netty timeouts. Sounds like
> we'll be heading back to 0mq.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Sun, Mar 2, 2014 at 5:44 PM, Sean Allen <[email protected]> wrote:
>
>> We have the same issue and after attempting a few fixes, we switched back
>> to using 0mq for now.
>>
>>
>> On Sun, Mar 2, 2014 at 2:46 PM, Drew Goya <[email protected]> wrote:
>>
>>> Hey All, I'm running a 0.9.0.1 storm topology in AWS EC2 and I
>>> occasionally run into a strange and pretty catastrophic error. One of my
>>> workers is either overloaded or stuck and gets killed and restarted. This
>>> usually works fine but once in a while the whole topology breaks down, all
>>> the workers are killed and restarted continually. Looking through the logs
>>> it looks like some netty errors on initialization kill the Async Loop. The
>>> topology is never able to recover, I have to kill it manually and relaunch
>>> it.
>>>
>>> Is this something anyone else has come across? Any tips? Config
>>> settings I could change?
>>>
>>> This is a pastebin of the errors: http://pastebin.com/XXZBsEj1
>>>
>>
>>
>>
>> --
>>
>> Ce n'est pas une signature
>>
>
