Hi, I am developing a benchmark on Storm 0.9.4. The benchmark consists of three bolts which can be parallelized (i.e., by increasing the number of executors) without affecting the semantics of the application. Although the benchmark filters out part of its input data, its behavior is deterministic: I should get the same number of output events across multiple runs with the data set I use in the experiments. The bolts were developed following the reliability guidelines provided by the Storm developers [1] and extend BaseBasicBolt.

While a parallelism level of one worked as expected, with higher levels of parallelism (2, 4, and 8) I experienced event loss at the output. The event count was taken at the last bolt of the topology, based on a first punctuation and a last punctuation sent through the benchmark application (a sketch of the counting logic follows below). One reason for the event loss was that the last punctuation arrived at the output bolt before some of the other events had arrived, yielding a lower event count than I should get. This behavior is typical of any stream processing application. The solution was to delay sending the last punctuation until 20 seconds after all the events had been sent.

Furthermore, the final bolt of the topology contained a CPU-intensive text compression step, and this heavy bolt gave rise to additional event loss. Once the text compression step was removed, the benchmark's Storm topology ran as expected without any event loss, irrespective of the level of parallelism used (even at a parallelism of 32).
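For reference, here is a minimal sketch of the counting logic at the output bolt, assuming a single instance of the final bolt does the counting and that punctuations are marked by a hypothetical "type" field (the class and field names are made up for illustration; the backtype.storm package names are the ones Storm 0.9.x uses):

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class CountingSinkBolt extends BaseBasicBolt {
    private boolean started = false;
    private long count = 0;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // "type" is a hypothetical field marking punctuations vs. data events.
        String type = input.getStringByField("type");
        if ("FIRST_PUNCTUATION".equals(type)) {
            started = true;
            count = 0;
        } else if ("LAST_PUNCTUATION".equals(type)) {
            System.out.println("Events seen between punctuations: " + count);
        } else if (started) {
            count++;
        }
        // BaseBasicBolt acks the input tuple automatically when execute() returns.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Final bolt of the topology: nothing is emitted downstream.
    }
}

The 20-second delay itself needs nothing more elaborate than a Thread.sleep(20 * 1000) in the data sender before it emits the last punctuation.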
My question is: does Storm have some way of warning the user/developer of a streaming application that such event loss has occurred?

[1] https://storm.apache.org/documentation/Guaranteeing-message-processing.html

Thanks,
Miyuru
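P.S. To make my question concrete, here is a minimal sketch of the kind of detection I had in mind, assuming a spout that emits tuples with message IDs so that the acker framework tracks the tuple trees (the class and the input-reading method are hypothetical; only the Storm 0.9.x API calls are real):

import java.util.Map;
import java.util.UUID;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class BenchmarkSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String event = nextEvent(); // hypothetical source of benchmark events
        if (event != null) {
            // Emitting with a message ID makes the acker track the tuple tree.
            collector.emit(new Values(event), UUID.randomUUID().toString());
        }
    }

    @Override
    public void fail(Object msgId) {
        // Invoked when the tuple tree fails or exceeds the message timeout.
        // Is this the intended place to learn about lost events?
        System.err.println("Tuple failed or timed out: " + msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }

    // Placeholder; the real benchmark reads from its input data set.
    private String nextEvent() {
        return null;
    }
}

As far as I understand, fail() only fires when ackers are enabled (e.g., conf.setNumAckers(1)) and a tuple tree is not fully acked within Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS. Since BaseBasicBolt anchors and acks automatically, is fail() (or the "Failed" count in the Storm UI) the intended way to be warned about such losses, or is there something more direct?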
