Hey All, I'm running a 0.9.0.1 Storm topology in AWS EC2 and I occasionally run into a strange and pretty catastrophic error. One of my workers is either overloaded or stuck and gets killed and restarted. That usually works fine, but once in a while the whole topology breaks down: all the workers are killed and restarted continually. Looking through the logs, it looks like some Netty errors during connection initialization kill the async loop. The topology is never able to recover; I have to kill it manually and relaunch it.
Has anyone else come across this? Any tips or config settings I could change? Here's a pastebin of the errors: http://pastebin.com/XXZBsEj1
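In case it helps others hit by the same thing, here's roughly what I've been experimenting with in storm.yaml. These are the Netty transport settings that ship with 0.9.x; the values below are just guesses on my part (more retries and a longer max wait so a restarting worker's peers don't give up and die before it comes back), not known-good numbers:

```yaml
# Netty transport (the 0.9.x default when configured this way)
storm.messaging.transport: "backtype.storm.messaging.netty.Context"

# Retry behavior when a client can't reach a peer worker --
# values below are my own guesses, tune for your environment
storm.messaging.netty.max_retries: 100
storm.messaging.netty.min_wait_ms: 1000
storm.messaging.netty.max_wait_ms: 10000
```

I've also seen it suggested that switching storm.messaging.transport back to the ZeroMQ implementation sidesteps the Netty reconnect issue entirely, though I haven't verified that myself.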
