Hello everyone,

We're currently using tumbling windows in Kafka Streams to process our data.


One thing we have noticed is that stream time advances based on all records
processed by the processor node, not on the records for each grouping key.


This means that if we have
a record with key01 and timestamp01, and
a record with key02 and timestamp02,
and both key01 and key02 are in the same partition, they are processed by the
same stream client and share the same stream time.


If the gap between timestamp01 and timestamp02 is bigger than the window size,
the record with the smaller timestamp is always dropped when it is processed
after the other record.
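
To make the scenario concrete, here is a minimal plain-Java sketch of the
behaviour as we understand it. It is not the actual KStreamWindowAggregate
code: the class SharedStreamTimeSketch, the method process, and the graceMs
parameter are hypothetical names made up for illustration, and timestamps are
assumed to be non-negative milliseconds.

```java
// Simplified model of what we observe; NOT the real Kafka Streams code.
// All names here are hypothetical and exist only for this illustration.
class SharedStreamTimeSketch {
    private final long windowSizeMs;
    private final long graceMs; // assumed grace period; 0 in the example below
    // A single stream time shared by every key handled by this processor.
    private long observedStreamTime = Long.MIN_VALUE;

    SharedStreamTimeSketch(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    /** Returns true if the record is accepted, false if it is dropped. */
    boolean process(String key, long timestampMs) {
        // Stream time only moves forward, regardless of which key the record has.
        observedStreamTime = Math.max(observedStreamTime, timestampMs);
        // Tumbling window that contains the record (non-negative timestamps assumed).
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        long windowEnd = windowStart + windowSizeMs;
        long windowCloseTime = observedStreamTime - graceMs;
        // The record is dropped once its window is already closed.
        return windowEnd > windowCloseTime;
    }

    public static void main(String[] args) {
        SharedStreamTimeSketch sketch = new SharedStreamTimeSketch(60_000L, 0L);
        System.out.println("key01 accepted: " + sketch.process("key01", 300_000L));
        System.out.println("key02 accepted: " + sketch.process("key02", 100_000L));
    }
}
```

In this sketch, key01 at 300000 ms moves the shared stream time forward, so the
later-arriving key02 at 100000 ms falls into a window that is already
considered closed, even though no newer record for key02 was ever seen.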


We checked the implementation: this is controlled by the class
KStreamWindowAggregate, located in the package
org.apache.kafka.streams.kstream.internals.


How can we keep all the logic of the class KStreamWindowAggregate and only
change the way it calculates the stream time? Instead of using the
instance-level variable observedStreamTime, could we use a map or a state
store to keep a separate stream time for each key?
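
As a rough illustration of the idea, here is a plain-Java sketch that tracks
stream time per key in a HashMap. The class PerKeyStreamTimeSketch and its
method process are hypothetical names, not part of Kafka Streams, and the
in-memory map is only a stand-in: it is not fault tolerant and grows with the
number of keys, which is why we are also asking about a state store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-key stream time; NOT part of Kafka Streams.
class PerKeyStreamTimeSketch {
    private final long windowSizeMs;
    // Highest timestamp seen so far for each key (in-memory stand-in for a store).
    private final Map<String, Long> streamTimeByKey = new HashMap<>();

    PerKeyStreamTimeSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Returns true if the record is accepted, false if it is dropped. */
    boolean process(String key, long timestampMs) {
        // Advance stream time for this key only; other keys are unaffected.
        long keyStreamTime = streamTimeByKey.merge(key, timestampMs, Long::max);
        // Tumbling window that contains the record (non-negative timestamps assumed).
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        long windowEnd = windowStart + windowSizeMs;
        // Drop only if this key has already moved past the record's window.
        return windowEnd > keyStreamTime;
    }

    public static void main(String[] args) {
        PerKeyStreamTimeSketch sketch = new PerKeyStreamTimeSketch(60_000L);
        System.out.println("key01 accepted: " + sketch.process("key01", 300_000L));
        System.out.println("key02 accepted: " + sketch.process("key02", 100_000L));
        System.out.println("late key01 accepted: " + sketch.process("key01", 100_000L));
    }
}
```

With per-key tracking, key02 at 100000 ms is accepted even after key01 at
300000 ms, while a genuinely late record for key01 is still dropped.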


Looking forward to your insights and feedback.


Best regards,
Chen 




