I am trying to troubleshoot an issue with our Storm cluster where a worker process on one of the machines in the cluster does not perform any work. All the counts (emitted/transferred/executed) for all executors in that worker are 0, as shown below. Even if I restart the worker, the Storm supervisor starts a new one, and that one does not process any work either.
From the Storm UI, the executor row for that worker shows all zeros:

    Id [120-120], Uptime 26m 17s, Host storm6-prod, Port 6702
    (log: http://watson-storm6-prod.lup1:8000/log?file=worker-6702.log)
    Emitted 0, Transferred 0, Capacity 0.000, Execute latency 0.000,
    Executed 0, Process latency 0.000, Acked 0, Failed 0

The supervisor log shows that the worker is started, and the worker log just has a bunch of zookeeper messages printed every minute:

    2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Refreshing partition manager connections
    2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Deleted partition managers: []
    2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] New partition managers: []
    2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Finished refreshing

I am looking for some debugging help and have the following questions. Any suggestions would be appreciated.

- From the Storm UI, it looks like the worker process is up and running and is assigned tasks from all bolts and spouts in the topology, but it does not get any messages to work on. Is there a way I can find out why the Storm infrastructure is not routing any messages to the bolts running in that process? For the spouts, since they are reading from Kafka, I can understand that there are no partitions left for this worker to read from, so they have nothing to read. But I would still expect tuples from Kafka spouts in other workers to be routed to the bolts in this worker process.

- Is there a way I can enable debug logging for Storm that can tell me why a particular worker process is not getting any messages/tuples to execute?

Thanks,
Girish.
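P.S. On the second question, the one knob I have found so far is Storm's per-topology debug flag, which (as I understand it) makes every spout and bolt log each tuple it emits and acks. I have not yet confirmed whether it also surfaces routing decisions. It can be set in the topology configuration:

    # topology config (or storm.yaml): per-tuple debug logging
    topology.debug: true

or, equivalently, in the topology submission code via Config.setDebug(true) before calling StormSubmitter.submitTopology. If there is a more targeted way to trace tuple routing to a specific worker, I would love to hear it.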
