You need to partition the input data correctly, such that all records with the same key go to the same partition. In that case, all records with the same key will be processed by the same task, and thus each key is stored in one shard only.
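To illustrate the point, here is a minimal simulation in plain Python (not Kafka code) of one state store shard per input partition. The names `partition_for`, `build_stores` and `shards_holding`, and the keys `sensor-a` and `sensor-b`, are made up for this sketch. CRC32 is only a stand-in hash: Kafka's default partitioner uses a different hash function, but any deterministic hash of the key gives the same property.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # the input topic in the question has 6 partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (assumption for this sketch): any deterministic
    # hash of the key maps equal keys to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_stores(records, choose_partition):
    """Simulate one state store shard per input partition."""
    stores = defaultdict(dict)
    for i, (key, value) in enumerate(records):
        stores[choose_partition(i, key)][key] = value
    return stores


def shards_holding(stores, key):
    """Return the partitions whose shard contains the given key."""
    return sorted(p for p, store in stores.items() if key in store)


# Hypothetical input: two logical keys, 12 updates each.
records = [("sensor-a", n) for n in range(12)] + [("sensor-b", n) for n in range(12)]

# Case 1: records are spread over partitions without regard to the key.
spread = build_stores(records, lambda i, key: i % NUM_PARTITIONS)

# Case 2: records are partitioned by key.
keyed = build_stores(records, lambda i, key: partition_for(key))

print(len(shards_holding(spread, "sensor-a")))  # 6: every shard holds a copy
print(len(shards_holding(keyed, "sensor-a")))   # 1: exactly one shard holds the key
```

In the first case a punctuation that forwards the content of every shard would emit the same key six times, which matches the behavior described in the question. In the second case each key is emitted once.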
-Matthias

On 11/15/19 4:28 PM, Gioacchino Vino wrote:
> Hi expert,
>
>
> I don't understand a kafka behavior and I'm here to ask for explanation.
>
> My processing task is pretty simple and it's quite similar to a
> change-log one.
>
> The record value contains a key/value pair: if the new value is
> different respect the stored one, forward to the output topic and update
> the state store, otherwise do nothing. There is also a punctuate task
> that forwards all stored data to the output topic periodically (30
> seconds).
>
> The input topic has 6 partitions.
>
> The observed behavior is that the punctuate task sends 6 times the same
> key/value pair. I figure out that there are 6 state store instances, one
> for each topic partition, and this produces the undesired behavior of
> having 6 times the same key/value pair, but I want only one.
>
> I tried to use a single partition for the input topic and in this
> scenario I got the correct behavior: the punctuate task sends no pair
> copies.
>
> The issue is that I don't want use input topic with a single partition
> because that topic collects data from a large number of producers.
>
>
> Any better explanations?
>
> Any comments or advices?
>
>
> Thank a lot in advance,
>
> Gioacchino