Hi expert,

I don't understand a kafka behavior and I'm here to ask for explanation.

My processing task is pretty simple and it's quite similar to a change-log one.

The record value contains a key/value pair: if the new value is different respect the stored one, forward to the output topic and update the state store, otherwise do nothing. There is also a punctuate task that forwards all stored data to the output topic periodically (30 seconds).

The input topic has 6 partitions.

The observed behavior is that the punctuate task sends 6 times the same key/value pair. I figure out that there are 6 state store instances, one for each topic partition, and this produces the undesired behavior of having 6 times the same key/value pair, but I want only one.

I tried to use a single partition for the input topic and in this scenario I got the correct behavior: the punctuate task sends no pair copies.

The issue is that I don't want use input topic with a single partition because that topic collects data from a large number of producers.


Any better explanations?

Any comments or advices?


Thank a lot in advance,

Gioacchino



Reply via email to