Hi all,
How do you get bolts that take a ludicrously long time to load (we're
talking minutes here) to cooperate with Zookeeper?
I may not be understanding my problem properly, but on my test cluster
(**not** in local mode!) my bolt keeps getting restarted in the middle of
its prepare() method -- which may take up to two minutes to return.
The problem seems to be the "Client session timed out" message, but I
don't know Zookeeper well enough to know how to fix this.
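
A minimal sketch of the timeout settings that look relevant to a slow
prepare(), written as plain Java so it runs without Storm on the
classpath. The key names are the ones I believe Storm's defaults.yaml
uses; the class name, the helper, and every value are assumptions made
for illustration, and none of this is confirmed to stop the restarts.
The supervisor and nimbus keys are daemon-side settings, so they would
presumably belong in storm.yaml on the cluster rather than in the
topology config.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeoutSketch {
    // Hypothetical helper: gathers timeout overrides for a bolt whose
    // prepare() needs roughly prepareSecs seconds to return.
    public static Map<String, Object> slowStartOverrides(int prepareSecs) {
        Map<String, Object> conf = new LinkedHashMap<>();
        // Assumed headroom: allow twice the expected prepare() time.
        int startupSecs = prepareSecs * 2;
        // Zookeeper client session timeout, in milliseconds (guessed value).
        conf.put("storm.zookeeper.session.timeout", 60000);
        // How long the supervisor waits for a newly launched worker.
        conf.put("supervisor.worker.start.timeout.secs", startupSecs);
        // How long nimbus waits for a newly assigned task to report in.
        conf.put("nimbus.task.launch.secs", startupSecs);
        return conf;
    }

    public static void main(String[] args) {
        // Print the overrides as "key: value" lines, the shape storm.yaml uses.
        slowStartOverrides(120).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

The session timeout a client actually gets is negotiated with the
Zookeeper server, so raising it on the Storm side alone may have no
effect if the server's own limits are lower.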
Here's a portion of the logs from the affected supervisor. The STDIO
messages come from a poorly coded third-party library that I have to use.
2014-01-17 23:19:28 o.a.z.ClientCnxn [INFO] Client session timed out,
have not heard from server in 2747ms for sessionid 0x143a22eb4060078,
closing socket connection and attempting reconnect
2014-01-17 23:19:28 b.s.d.worker [DEBUG] Doing heartbeat
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000768,
:storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]},
:port 6702}
2014-01-17 23:19:28 b.s.d.worker [DEBUG] Doing heartbeat
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000768,
:storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]},
:port 6702}
2014-01-17 23:19:28 c.n.c.f.s.ConnectionStateManager [INFO] State
change: SUSPENDED
2014-01-17 23:19:28 c.n.c.f.s.ConnectionStateManager [WARN] There are
no ConnectionStateListeners registered.
2014-01-17 23:19:28 b.s.cluster [WARN] Received event
:disconnected::none: with disconnected Zookeeper.
2014-01-17 23:19:28 b.s.cluster [WARN] Received event
:disconnected::none: with disconnected Zookeeper.
2014-01-17 23:19:28 STDIO [ERROR] done [7.2 sec].
2014-01-17 23:19:28 STDIO [ERROR] Adding annotator lemma
2014-01-17 23:19:28 STDIO [ERROR] Adding annotator ner
2014-01-17 23:19:28 STDIO [ERROR] Loading classifier from
edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
2014-01-17 23:19:28 STDIO [ERROR] ...
2014-01-17 23:19:29 b.s.d.worker [DEBUG] Doing heartbeat
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000769,
:storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]},
:port 6702}
2014-01-17 23:19:29 b.s.d.worker [DEBUG] Doing heartbeat
#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000769,
:storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]},
:port 6702}
2014-01-17 23:19:30 o.a.z.ClientCnxn [INFO] Opening socket connection
to server zookeeper/192.168.50.3:2181
^-- This is where the bolt gets restarted in its initialization.
Thanks,
Eddie