Hi expert,
I don't understand a kafka behavior and I'm here to ask for explanation.
My processing task is pretty simple and it's quite similar to a
change-log one.
The record value contains a key/value pair: if the new value is
different respect the stored one, forward to the output topic and update
the state store, otherwise do nothing. There is also a punctuate task
that forwards all stored data to the output topic periodically (30 seconds).
The input topic has 6 partitions.
The observed behavior is that the punctuate task sends 6 times the same
key/value pair. I figure out that there are 6 state store instances, one
for each topic partition, and this produces the undesired behavior of
having 6 times the same key/value pair, but I want only one.
I tried to use a single partition for the input topic and in this
scenario I got the correct behavior: the punctuate task sends no pair
copies.
The issue is that I don't want use input topic with a single partition
because that topic collects data from a large number of producers.
Any better explanations?
Any comments or advices?
Thank a lot in advance,
Gioacchino