This morning, the supervisor and nimbus connections to ZooKeeper started having problems (they are all inside the same subnet of an AWS VPC, and I am not sure of the root cause). From the supervisor log:
http://pastebin.com/CVMTrfuQ

The supervisor daemon died at this line:

2014-05-11 00:11:07 b.s.util [INFO] Halting process: ("Error when processing an event")

My Python supervisord automatically restarts the supervisor. After the restart, I observed the following in the log:

supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started
supervisor [INFO] 46b25fa5-b333-4985-9c1d-3f112d5c615a still hasn't started

Memory usage also increased substantially, and the topology was not running at all. Some time later, I killed all the Java processes and everything returned to normal. (I am using the Netty transport.)

Any ideas? Thanks.
