Hi,

For some reason, after a few hours of processing, my topology starts
hanging. In the UI's 'Topology Stats' the emitted and transferred counts
are both 0, and I can't see anything coming out of the topology
(it normally writes to a database).

I can't see anything unusual in the Storm worker logs, or in the Kafka and
ZooKeeper logs.
The ZkCoordinator keeps refreshing, but nothing happens:
2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] Deleted partition managers: []
2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] New partition managers: []
2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] Finished refreshing
2014-10-31 17:00:13 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{...

I don't really understand why this is hanging, or how I could fix it.


I'm using Storm 0.9.2-incubating with Kafka 0.8.1.1 and storm-kafka
0.9.2-incubating.

My topology pulls data from 4 different topics in Kafka and has 9
different bolts. Each bolt implements IBasicBolt. I'm not doing any acking
manually (Storm should take care of this for me, right?).
It takes a few seconds for a tuple to go through the entire topology.
I'm setting MaxSpoutPending to limit the number of pending tuples in the
topology.
My tuples shouldn't exceed the max size limit (left at the default on my
Kafka brokers and in my SpoutConfig; I think the default is rather high and
should easily handle a few lines of text).
Not every tuple goes through every bolt.
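
For readers unfamiliar with the setting, here is a toy model of what a pending-tuple cap such as MaxSpoutPending does: the spout may emit only while the number of tuples that are still un-acked is below the cap, so if tuples stop being acked or failed, emission drops to zero. This is plain Java, not Storm's implementation, and the class PendingCapSketch and its methods are hypothetical names chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a pending-tuple cap. NOT Storm code; all names are hypothetical.
class PendingCapSketch {
    private final int maxPending;                 // plays the role of the cap
    private final Set<Long> pending = new HashSet<>();
    private long nextId = 0;

    PendingCapSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    // Emits a new tuple id if below the cap; returns -1 when throttled.
    long tryEmit() {
        if (pending.size() >= maxPending) {
            return -1;
        }
        long id = nextId++;
        pending.add(id);
        return id;
    }

    // Called when a tuple is acked or failed; frees one slot.
    void complete(long id) {
        pending.remove(id);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With a cap of 2, the third tryEmit() call returns -1 until complete() is called for one of the first two ids. In this model, a topology whose tuples are never completed would show no new emissions once the cap is reached.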

I'm defining my spouts like this:
        ZkHosts zkHosts = new ZkHosts("zk1.example.com:2181", "zk2.example.com:2181"...);
        zkHosts.refreshFreqSecs = 120;

        SpoutConfig kafkaConfig = new SpoutConfig(brokerHosts(),
                "TOPIC_NAME",
                "/consumers",
                "CONSUMER_ID");
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(kafkaConfig);

I'm running this topology on 2 different workers, located on two different
supervisors. In total I'm using something like 160 executors.


I would greatly appreciate any help or hints on how to fix/investigate this
problem!

Thanks,
Maxime
