Hey guys, we have been using Storm in production for almost a year. It ran perfectly until recently, when some weird exceptions started showing up in our topologies.
We use storm-kafka to read messages from Kafka into Storm, alongside our business logic running in Storm topologies. Both Storm and storm-kafka are version 1.0.0. Under some circumstances the KafkaSpout gets killed and never starts up again, so the messages in this topic, or in some partitions of this topic, never find their way to our topologies.

The exception message in worker.log shows that the exceptions are thrown in the nextTuple() method of KafkaSpout. After reading the source code here: https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/org/apache/storm/kafka/KafkaSpout.java I found that line 145 (the ZooKeeper refresh in the catch clause) is the first place where this exception is raised.

Has anyone else seen this problem? And do you have any possible solutions? For now I have just rewritten the nextTuple method to wrap the original code in a new try-catch block whose catch clause does nothing, so that the exception no longer kills the KafkaSpout. Any suggestions? Thanks!
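P.S. For reference, the workaround has roughly the shape below. This is only a minimal sketch, not the actual KafkaSpout code: TupleSource and GuardedSource are made-up stand-ins so the snippet compiles without Storm on the classpath, and the failure counter and log line are additions for illustration (my real catch block is empty).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a spout; NOT the real Storm interface.
interface TupleSource {
    void nextTuple();
}

// Wraps a delegate so that a RuntimeException thrown by nextTuple()
// is recorded here instead of propagating to the caller.
final class GuardedSource implements TupleSource {
    private final TupleSource delegate;
    private final AtomicInteger failures = new AtomicInteger();

    GuardedSource(TupleSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public void nextTuple() {
        try {
            delegate.nextTuple();
        } catch (RuntimeException e) {
            // Assumption for illustration: count and log the failure
            // rather than swallowing it silently.
            failures.incrementAndGet();
            System.err.println("nextTuple failed, continuing: " + e);
        }
    }

    int failureCount() {
        return failures.get();
    }
}
```

The obvious downside is that this hides whatever is actually going wrong, which is why I am asking for a proper fix.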
