We are on Storm 0.8.2 and have a fairly simple topology with one spout and
9 bolts on two worker nodes. The spout reads from a Azure storage queue via
the Azure API and the final bolt posts data to a REST API endpoint.


Things work great for a while but over time messages are being dropped
between the bolts meaning the difference between the “Emitted” count of one
bolt and the “Executed” count of the next bolt in the processing chain is
getting larger and larger to the point where no messages make it all the
way through the system. When we restart the topology everything is fine
again for a day or two. This would indicate a resource issue but the
servers look very healthy (memory/CPU never go over 50%).

We recently stopped acking messages but that did not make a difference. We
also tried to switch to Storm 0.9.0.1 (trying both Netty and 0MQ) but that
did not help either.

We are pretty much out of ideas. What are we missing? Any suggestions are
very much appreciated!


Thanks,

Markus

Reply via email to