Hey Everyone,
This will be a blast from the past, but some problems are preventing us
from upgrading to 0.9.x at the moment.
The problem we're seeing is ZMQ trying to bind to an address that is
already in use:
2015-01-14 04:39:38 util [ERROR] Async loop died!
org.zeromq.ZMQException: Address already in use(0x62)
at org.zeromq.ZMQ$Socket.bind(Native Method)
at zilch.mq$bind.invoke(mq.clj:69)
at backtype.storm.messaging.zmq.ZMQContext.bind(zmq.clj:57)
at backtype.storm.messaging.loader$launch_receive_thread_BANG_$fn__1629.invoke(loader.clj:26)
at backtype.storm.util$async_loop$fn__465.invoke(util.clj:375)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
I googled around and couldn't find anything. It looks like this tends to
affect some workers on a machine, but not all. The affected workers are
functionally dead, and eventually Storm switches to using different
workers. However, as our machines fill up, this happens more often,
and the time it takes to rebalance usually means failed tuples.
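In case it helps anyone reproduce this, here is a minimal sketch of how one
could check which worker ports on a machine are already taken. It is not
part of Storm; the class name PortProbe and the method canBind are made up
for illustration. It assumes the workers bind the ports listed in
supervisor.slots.ports (6700-6703 unless overridden). A port held by a
healthy running worker will also show up as in use, so this is only
meaningful for slots that should be free.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical diagnostic helper, not part of Storm or ZeroMQ.
class PortProbe {

    // Returns true if a TCP listener can be bound to the port right now,
    // false if the bind fails (for example, address already in use).
    static boolean canBind(int port) throws IOException {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (BindException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default supervisor.slots.ports values.
        // Pass other ports as arguments to override.
        int[] ports = {6700, 6701, 6702, 6703};
        if (args.length > 0) {
            ports = new int[args.length];
            for (int i = 0; i < args.length; i++) {
                ports[i] = Integer.parseInt(args[i]);
            }
        }
        for (int port : ports) {
            System.out.println(port + (canBind(port) ? " free" : " in use"));
        }
    }
}
```

Running it on a supervisor machine just before a worker is assigned to a
slot should show whether something else is still holding that port.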
I know it's an old version, but any help would be greatly appreciated.
Thanks,
Keith.