I am trying to troubleshoot an issue with our Storm cluster where a worker
process on one of the machines in the cluster does not perform any work.
All the counts (emitted/transferred/executed) for all executors in that
worker are 0, as shown below. Even if I restart the worker, the Storm
supervisor starts a new one, and that one does not process any work either.

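Id         Uptime   Host         Port  Emitted  Transferred  Capacity  Execute latency (ms)  Executed  Process latency (ms)  Acked  Failed
[120-120]  26m 17s  storm6-prod  6702  0        0            0.000     0.000                 0         0.000                 0      0
(worker log: <http://watson-storm6-prod.lup1:8000/log?file=worker-6702.log>)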

The supervisor log shows that the worker is started, and the worker log just
has a bunch of ZooKeeper messages printed every minute:

2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Refreshing partition manager connections
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Deleted partition managers: []
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] New partition managers: []
2015-03-19 22:25:07 s.k.ZkCoordinator [INFO] Finished refreshing
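One way a Kafka spout task can end up with nothing to read is when the spout has more tasks than the topic has partitions. Below is a minimal Python sketch, assuming storm-kafka hands partitions to spout tasks in a round-robin stride (task i of N takes partitions i, i + N, i + 2N, ...). It models that behavior and is not Storm code. The function names and the 8-partition / 10-task numbers are hypothetical.

```python
from typing import List


def partitions_for_task(num_partitions: int, total_tasks: int,
                        task_index: int) -> List[int]:
    """Round-robin model: task i gets partitions i, i + N, i + 2N, ...

    Assumption: this mirrors the stride-based assignment storm-kafka is
    believed to use when a spout task refreshes its partitions.
    """
    if not 0 <= task_index < total_tasks:
        raise ValueError("task index must be in [0, total_tasks)")
    return list(range(task_index, num_partitions, total_tasks))


def idle_spout_tasks(num_partitions: int, total_tasks: int) -> List[int]:
    """Indexes of spout tasks that would be assigned no partitions at all."""
    return [t for t in range(total_tasks)
            if not partitions_for_task(num_partitions, total_tasks, t)]


if __name__ == "__main__":
    # Hypothetical numbers: 8 Kafka partitions, spout parallelism of 10.
    for task in range(10):
        print(task, partitions_for_task(8, 10, task))
    print("idle tasks:", idle_spout_tasks(8, 10))
```

With 8 partitions and 10 spout tasks, tasks 8 and 9 get empty lists, so the spout executors hosting them would have no partition managers to create.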

I am looking for some debugging help and have the following questions. If
you have any suggestions, I would appreciate them.

- From the Storm UI, it looks like the worker process is up and running and
is assigned tasks from all bolts and spouts in the topology, but it does
not get any messages to work on. Is there a way I can find out why the
Storm infrastructure is not routing any messages to any of the bolts
running in that process? For the spouts, since they read from Kafka, I can
understand that there may be no partitions left for this worker to read
from, so it has nothing to read. But I would expect messages from the Kafka
spouts in other workers to be routed to the bolts in this worker process.
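Which bolt tasks receive tuples depends on the stream grouping each bolt is declared with, which the post does not state. The sketch below is a simplified Python model (not Storm code) comparing three Storm groupings: shuffle, local-or-shuffle, and fields. The worker layout, task ids, the random-choice model of shuffle, and the use of `zlib.crc32` in place of Storm's own hash are all hypothetical. Under local-or-shuffle, a tuple stays inside the emitting worker whenever that worker hosts a task of the target bolt, so if the spouts in port 6702 emit nothing, the bolts there may receive nothing either. Under fields grouping, a heavily skewed key can leave tasks idle.

```python
import random
import zlib
from collections import Counter
from typing import Dict, Sequence

# Hypothetical layout: bolt tasks 0-5, two per worker (named by port).
# The spout in worker 6702 is assumed to have no Kafka partitions, so
# only the spouts in 6700 and 6701 emit tuples.
WORKER_OF: Dict[int, str] = {0: "6700", 1: "6700", 2: "6701",
                             3: "6701", 4: "6702", 5: "6702"}
EMITTING_WORKERS = ["6700", "6701"]


def shuffle_target(targets: Sequence[int], rng: random.Random) -> int:
    """Shuffle grouping, modeled as a random choice over all bolt tasks."""
    return rng.choice(list(targets))


def local_or_shuffle_target(src_worker: str, targets: Sequence[int],
                            rng: random.Random) -> int:
    """Local-or-shuffle: prefer tasks in the emitting worker, if it has any."""
    local = [t for t in targets if WORKER_OF[t] == src_worker]
    return rng.choice(local or list(targets))


def fields_target(key: str, targets: Sequence[int]) -> int:
    """Fields grouping: the same key always maps to the same task.
    zlib.crc32 stands in for Storm's actual hash (assumption)."""
    return targets[zlib.crc32(key.encode()) % len(targets)]


def simulate(grouping: str, n_tuples: int = 1000, seed: int = 0) -> Counter:
    """Count how many tuples each bolt task receives under one grouping."""
    rng = random.Random(seed)
    targets = sorted(WORKER_OF)
    received = Counter({t: 0 for t in targets})
    for i in range(n_tuples):
        src = EMITTING_WORKERS[i % len(EMITTING_WORKERS)]
        if grouping == "shuffle":
            task = shuffle_target(targets, rng)
        elif grouping == "local_or_shuffle":
            task = local_or_shuffle_target(src, targets, rng)
        elif grouping == "fields_one_key":
            # Every tuple carries the same key value: maximally skewed.
            task = fields_target("same-key", targets)
        else:
            raise ValueError(f"unknown grouping: {grouping}")
        received[task] += 1
    return received


if __name__ == "__main__":
    for g in ("shuffle", "local_or_shuffle", "fields_one_key"):
        counts = simulate(g)
        print(g, [counts[t] for t in sorted(counts)])
```

In the `local_or_shuffle` case, tasks 4 and 5 (the ones placed in 6702) receive 0 tuples, while shuffle spreads tuples across all six tasks. If the topology's bolts use local-or-shuffle grouping, that pattern would look like the all-zero row in the UI.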

- Is there a way I can enable debug logging in Storm that can tell me why
a particular worker process is not getting any messages/tuples to execute?

Thanks,

Girish.
