Implementing both suggestions worked. Thanks!

- Eddie

On Tue, Feb 18, 2014 at 1:58 PM, Michael Rose <[email protected]> wrote:

> You may need to configure your cluster to give it more time to start up.
> Additionally, knowing how long it can take to load the Stanford NLP models,
> make sure you're only doing it in a single bolt instance (e.g. static
> initializer or double-check synch) and sharing it between all your bolt
> instances.
>
> supervisor.worker.start.timeout.secs 120
> supervisor.worker.timeout.secs 60
>
> I'd try tuning your worker start timeout here. Try setting it up to 300s
> and (again) ensuring your prepare method only initializes expensive
> resources once, then shares them between instances in the JVM.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Tue, Feb 18, 2014 at 1:45 PM, Eddie Santos <[email protected]> wrote:
>
>> Hi all,
>>
>> How do you get bolts that take a ludicrously long time to load (we're
>> talking minutes here) to cooperate with Zookeeper?
>>
>> I may not be understanding my problem properly, but on my test cluster
>> (**not** in local mode!) my bolt keeps getting restarted in the middle of
>> its prepare() method -- which may take up to two minutes to return.
>>
>> The problem seems to be the "Client session timed out", but I'm not
>> knowledgeable enough with Zookeeper to really know how to fix this.
>>
>> Here's a portion of logs from the supervisor affected. The STDIO messages
>> come from a poorly-coded third-party library that I have to use.
>>
>> 2014-01-17 23:19:28 o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 2747ms for sessionid 0x143a22eb4060078, closing socket connection and attempting reconnect
>> 2014-01-17 23:19:28 b.s.d.worker [DEBUG] Doing heartbeat #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000768, :storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]}, :port 6702}
>> 2014-01-17 23:19:28 b.s.d.worker [DEBUG] Doing heartbeat #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000768, :storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]}, :port 6702}
>> 2014-01-17 23:19:28 c.n.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
>> 2014-01-17 23:19:28 c.n.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered.
>> 2014-01-17 23:19:28 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
>> 2014-01-17 23:19:28 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
>> 2014-01-17 23:19:28 STDIO [ERROR] done [7.2 sec].
>> 2014-01-17 23:19:28 STDIO [ERROR] Adding annotator lemma
>> 2014-01-17 23:19:28 STDIO [ERROR] Adding annotator ner
>> 2014-01-17 23:19:28 STDIO [ERROR] Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
>> 2014-01-17 23:19:28 STDIO [ERROR] ...
>> 2014-01-17 23:19:29 b.s.d.worker [DEBUG] Doing heartbeat #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000769, :storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]}, :port 6702}
>> 2014-01-17 23:19:29 b.s.d.worker [DEBUG] Doing heartbeat #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1390000769, :storm-id "nlptools-test-1-1390000740", :executors #{[3 3] [6 6] [-1 -1]}, :port 6702}
>> 2014-01-17 23:19:30 o.a.z.ClientCnxn [INFO] Opening socket connection to server zookeeper/192.168.50.3:2181
>>
>> ^-- This is where the bolt gets restarted in its initialization.
>>
>> Thanks,
>> Eddie
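
Michael's advice to load the Stanford NLP models only once per worker JVM and share them between bolt instances can be sketched with double-checked locking. This is a minimal, self-contained sketch under stated assumptions, not the thread's actual code: `SharedModelBolt`, `ExpensiveModel`, and `getModel()` are hypothetical names; `ExpensiveModel` stands in for the real pipeline (for example a CoreNLP `StanfordCoreNLP` object); and Storm's real `prepare(...)` is replaced by a no-argument `prepare()` so the example runs without Storm on the classpath. It also assumes the shared object is safe to call from several executor threads at once, which needs checking for the real library.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: share one expensive resource among all bolt instances (executors)
 * in the same worker JVM, so the slow load happens only once.
 * All names here are hypothetical; nothing below is Storm or CoreNLP API.
 */
public class SharedModelBolt {

    /** Hypothetical stand-in for a slow-to-load model such as an NLP pipeline. */
    static final class ExpensiveModel {
        // Counts how many times the "load" actually ran, for demonstration.
        static final AtomicInteger LOADS = new AtomicInteger();

        ExpensiveModel() {
            LOADS.incrementAndGet();
            // A real model load (possibly minutes) would happen here.
        }

        String annotate(String text) {
            return "annotated:" + text;
        }
    }

    // volatile is required for double-checked locking to be correct
    // under the Java memory model.
    private static volatile ExpensiveModel sharedModel;

    /** Double-checked locking: only the first caller pays the load cost. */
    static ExpensiveModel getModel() {
        ExpensiveModel local = sharedModel;
        if (local == null) {                          // first check, no lock
            synchronized (SharedModelBolt.class) {
                local = sharedModel;
                if (local == null) {                  // second check, under lock
                    local = new ExpensiveModel();
                    sharedModel = local;
                }
            }
        }
        return local;
    }

    // transient: in a real bolt the instance is serialized and shipped to
    // workers, so the model is acquired in prepare() rather than serialized.
    private transient ExpensiveModel model;

    /** In a real bolt this call would sit inside prepare(...). */
    public void prepare() {
        this.model = getModel();
    }

    public String process(String text) {
        return model.annotate(text);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate several executors in one worker JVM preparing concurrently.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> new SharedModelBolt().prepare());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("model loads: " + ExpensiveModel.LOADS.get());
    }
}
```

Running `main` simulates eight executors preparing at the same time and prints `model loads: 1`. The static-initializer route Michael also mentions would replace `getModel()` with a nested holder class whose static field is initialized on first access; the JVM's class-initialization locking then guarantees a single load without explicit synchronization. Sharing the model reduces the total startup work per worker, but it does not by itself change the supervisor timeouts discussed above.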
