> Can you somehow verify your output?

Do you mean the Kafka Streams output? It does show some missing values. I
attached a few hours of Kafka Streams output to the very first email of
this thread for reference.

Let me also summarise what we have done so far.

We took a dump of the raw data present in the source topic. We wrote a
script that reads this data and performs exactly the same aggregations that
we do in Kafka Streams. We then compared the output from Kafka Streams with
the output of our script.

The difference that we observed in the two outputs is that there were a few
rows (corresponding to some time windows) missing in the Streams output.
For the time windows for which the data was present, the aggregated numbers
matched exactly.
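
To make the comparison concrete, here is a minimal sketch of this kind of
check. It is not our actual script: the tumbling window size, the
(key, timestamp_ms, value) record layout, the sum aggregation and the
function names are all placeholders chosen for illustration.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # assumed tumbling window size; not the real setting


def aggregate(records, window_ms=WINDOW_MS):
    """Sum values per (key, window_start) over epoch-aligned tumbling windows."""
    totals = defaultdict(int)
    for key, timestamp_ms, value in records:
        # Align each record to the start of the window that contains it.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        totals[(key, window_start)] += value
    return dict(totals)


def compare(expected, actual):
    """Return (missing, mismatched) window keys, relative to `expected`."""
    missing = sorted(k for k in expected if k not in actual)
    mismatched = sorted(
        k for k in expected if k in actual and expected[k] != actual[k]
    )
    return missing, mismatched


# Toy data standing in for the source-topic dump (hypothetical values).
records = [("a", 1_000, 2), ("a", 59_000, 3), ("a", 61_000, 5), ("b", 125_000, 7)]
expected = aggregate(records)

# Toy stand-in for the Streams output, with one whole window absent.
streams_output = {("a", 0): 5, ("b", 120_000): 7}

missing, mismatched = compare(expected, streams_output)
print("missing windows:", missing)
print("mismatched windows:", mismatched)
```

With this toy data, one window is reported as missing and none as
mismatched, which is the same pattern we see in the real outputs.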

This means that either all the records for a particular time window are
being skipped, or none are. That is highly unlikely to happen by chance.
Maybe there is a bug somewhere in the RocksDB state stores? This is just
speculation; I am not sure. There could even be a bug in the reported
metric.
