Hi,

We have been using Apache Storm for a couple of years, and everything was fine until now. For our spout we are using “storm-kafka-0.9.4.jar”.
Lately, the number of “Failed” events has increased dramatically, and currently almost 20% of our total events are marked as Failed. We investigated our topology logs but came up empty-handed. Our DB logs also gave no indication of heavy load on our system. Moreover, our spout complete latency is 25.996 ms, which rules out any timeouts that might occur. Lowering our max pending value did not help.

Since we are not using anchoring, at some point we considered adding it, but we saw that the KafkaSpout handles failures by replaying them, so we were not sure whether to add it or not.

It would be helpful if you could point us to where in the Storm logs we can find the reason for these failures, such as an uncaught exception or a timeout, since we are a bit blind at the moment. We would appreciate any help with that.

Thanks in advance,
Yovav
