Hi Sigalit,

If I understand correctly, what you want is to deduplicate the outputs coming from your GBK logic, is that correct? If so, you can use a stateful DoFn with a ValueState holding the last value seen per key in the global window. There is an implementation of this approach in Deduplicate.KeyedValues [1].
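
To make the per-key check concrete, here is a minimal plain-Java sketch of the logic only. It is not Beam code: the class name LastValueDeduplicator and the method shouldEmit are hypothetical, and a HashMap stands in for the per-key ValueState. In a real stateful DoFn the map lookup would roughly correspond to ValueState.read() and the update to ValueState.write(), with Beam scoping the state to each key for you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical plain-Java model of "remember the last value per key".
// Assumption: keys and values are Strings; this is not a Beam transform.
class LastValueDeduplicator {
    // Stand-in for Beam's per-key ValueState: last value sent for each key.
    private final Map<String, String> lastValueByKey = new HashMap<>();

    // Returns true when (key, value) should be sent, i.e. when the key is new
    // or the value differs from the last value sent for that key.
    boolean shouldEmit(String key, String value) {
        boolean seen = lastValueByKey.containsKey(key);   // roughly: state has a value
        String previous = lastValueByKey.get(key);        // roughly: ValueState.read()
        if (seen && Objects.equals(previous, value)) {
            return false;                                 // same as last time: skip
        }
        lastValueByKey.put(key, value);                   // roughly: ValueState.write(value)
        return true;
    }
}
```

With this model, sending key1/value1 twice in a row emits only the first one, while key1/value3 afterwards is emitted again. Note that it only remembers the last value per key, so key1/value1 arriving after key1/value3 would be sent again; if that is not what you want, the state would have to hold a set of values instead.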

 Jan

[1] https://beam.apache.org/releases/javadoc/2.38.0/org/apache/beam/sdk/transforms/Deduplicate.KeyedValues.html

On 4/27/22 13:36, Sigalit Eliazov wrote:
Hi all
I have the following scenario:
a. A pipeline that reads messages from Kafka and applies a session window with a 1 minute duration.
b. A GroupByKey in order to aggregate the data.
c. For each 'group' I do some calculation and build a new event to send to Kafka.

the output of this cycle is
key1 - value1
key2 - value2

If a new message arrives with the same key, I would like some logic that checks whether the current message is key1 - value1 again and, if so, does not send it (because it was already sent).
Currently we implement this using a DB (Postgres).
The performance of this implementation is not very good.
Is there any way to implement this without any external state?

thanks a lot
Sigalit
