Right now we're seeing slow off-heap memory leaks; it's unknown yet whether they're linked to Netty. When the workers inevitably get OOM-killed, the topology rarely recovers gracefully and we see similar Netty timeouts. Sounds like we'll be heading back to 0mq.
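
For anyone following along, switching the transport back (or loosening the Netty reconnect behaviour) is just a config change. A rough, untested sketch using the standard 0.9.x config keys -- the same keys can also go in storm.yaml cluster-wide; adjust the values for your setup:

    import backtype.storm.Config;

    public class MessagingConfig {

        // Switch the worker-to-worker transport back to zeromq.
        public static Config zmqTransport() {
            Config conf = new Config();
            conf.put("storm.messaging.transport", "backtype.storm.messaging.zmq");
            return conf;
        }

        // Or stay on Netty but give reconnects more retries and longer backoff,
        // so a briefly-dead worker doesn't take the whole topology with it.
        public static Config patientNetty() {
            Config conf = new Config();
            conf.put("storm.messaging.transport", "backtype.storm.messaging.netty.Context");
            conf.put("storm.messaging.netty.max_retries", 100);
            conf.put("storm.messaging.netty.min_wait_ms", 1000);
            conf.put("storm.messaging.netty.max_wait_ms", 5000);
            return conf;
        }
    }

No idea yet whether the longer backoff actually avoids the cascading restarts Drew describes; for us the zeromq transport has been the reliable fallback.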
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]

On Sun, Mar 2, 2014 at 5:44 PM, Sean Allen <[email protected]> wrote:

> We have the same issue and after attempting a few fixes, we switched back
> to using 0mq for now.
>
>
> On Sun, Mar 2, 2014 at 2:46 PM, Drew Goya <[email protected]> wrote:
>
>> Hey All, I'm running a 0.9.0.1 storm topology in AWS EC2 and I
>> occasionally run into a strange and pretty catastrophic error. One of my
>> workers is either overloaded or stuck and gets killed and restarted. This
>> usually works fine but once in a while the whole topology breaks down, all
>> the workers are killed and restarted continually. Looking through the logs
>> it looks like some netty errors on initialization kill the Async Loop. The
>> topology is never able to recover, I have to kill it manually and relaunch
>> it.
>>
>> Is this something anyone else has come across? Any tips? Config settings
>> I could change?
>>
>> This is a pastebin of the errors: http://pastebin.com/XXZBsEj1
>>
>
>
>
> --
>
> Ce n'est pas une signature
