Hi Sigalit,
if I understand correctly, what you want is a deduplication of outputs
coming from your GBK logic, is that correct? If do, you can use a
stateful DoFn and a ValueState holding the last value seen per key in
global window. There is an implementation of this approach in
Deduplicate.KeyedValues [1].
Jan
[1]
https://beam.apache.org/releases/javadoc/2.38.0/org/apache/beam/sdk/transforms/Deduplicate.KeyedValues.html
On 4/27/22 13:36, Sigalit Eliazov wrote:
Hi all
i have the following scenario:
a. a pipeline that reads messages from kafka and a session window with
1 minute duration.
b. groupbykey in order to aggregate the data
c. for each 'group' i do some calculation and build a new event to
send to kafka.
the output of this cycle is
key1 - value1
key2 - value2
If a new message arrives with the same key i would like to have a
logic that checks
if the current message is : key1-value1 don't send (because it was
already sent).
Currently we implemented this using DB (postgres).
the performance of this implementation is not very good.
Is there any way to implement this without any external state?
thanks a lot
Sigalit