Hello Chen,

Thanks for reaching out.

If the gap between timestamp01 and timestamp02 is bigger than the window size, 
the one with the smaller timestamp is always dropped.
Well, that's not entirely correct. It depends on the grace period used. You can have many consecutive open windows for the same key if your grace period is set to a higher value.
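
To make the role of the grace period concrete, here is a rough plain-Java sketch of the drop rule as I understand it: a record is dropped once the end of its window plus the grace period is at or before the observed stream time. The class and method names (GraceSketch, windowStart, isDropped) are made up for illustration; this is a simulation, not the actual Kafka Streams code.

```java
// Illustrative simulation only; not Kafka Streams code.
final class GraceSketch {

    // Start of the tumbling window that contains the given timestamp.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Assumed rule: the window [start, start + size) keeps accepting records
    // until stream time reaches windowEnd + grace.
    static boolean isDropped(long recordTsMs, long streamTimeMs, long sizeMs, long graceMs) {
        long windowEnd = windowStart(recordTsMs, sizeMs) + sizeMs;
        return windowEnd + graceMs <= streamTimeMs;
    }

    public static void main(String[] args) {
        long size = 10_000L;
        // A record at t=5s arrives after stream time has advanced to t=25s.
        System.out.println(isDropped(5_000L, 25_000L, size, 0L));      // true
        System.out.println(isDropped(5_000L, 25_000L, size, 30_000L)); // false
    }
}
```

With no grace period the older record is dropped; with a 30 second grace period its window is still open, so it is accepted.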


Btw: there is a Jira for the exact thing you are asking about: https://issues.apache.org/jira/browse/KAFKA-8769

Right now, as a user of the DSL, you cannot do anything about it directly. Of course, you could patch the code yourself, and run a "fork" of Kafka Streams.

The other thing you could do is use the Processor API and build a custom Processor for the windowed aggregation.
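
As a rough idea of the core logic such a custom Processor might carry, here is a plain-Java sketch that tracks stream time per key instead of per task. Everything in it (PerKeyStreamTimeSketch, process, count) is hypothetical and uses in-memory maps only; in a real Processor the per-key time and the aggregates would presumably live in state stores, and a count stands in for whatever aggregation you actually run.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative simulation only; names are hypothetical, not Kafka Streams API.
final class PerKeyStreamTimeSketch {
    private final long sizeMs;
    private final long graceMs;
    // Highest timestamp seen so far, tracked separately for each key.
    private final Map<String, Long> streamTimePerKey = new HashMap<>();
    // Count per (key, windowStart), standing in for the windowed aggregate.
    private final Map<String, Long> counts = new HashMap<>();

    PerKeyStreamTimeSketch(long sizeMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    // Returns true if the record was aggregated, false if it was dropped as late.
    boolean process(String key, long timestampMs) {
        long previous = streamTimePerKey.getOrDefault(key, Long.MIN_VALUE);
        long streamTime = Math.max(previous, timestampMs);
        streamTimePerKey.put(key, streamTime);

        long windowStart = timestampMs - (timestampMs % sizeMs);
        long windowEnd = windowStart + sizeMs;
        if (windowEnd + graceMs <= streamTime) {
            return false; // window already closed for this key
        }
        counts.merge(key + "@" + windowStart, 1L, Long::sum);
        return true;
    }

    long count(String key, long windowStart) {
        return counts.getOrDefault(key + "@" + windowStart, 0L);
    }

    public static void main(String[] args) {
        PerKeyStreamTimeSketch p = new PerKeyStreamTimeSketch(10_000L, 0L);
        System.out.println(p.process("key02", 60_000L)); // true
        // key01 has its own clock, so key02's newer record does not close its window.
        System.out.println(p.process("key01", 5_000L));  // true
        System.out.println(p.process("key01", 60_000L)); // true
        // Now key01's own stream time is past the first window, so this is late.
        System.out.println(p.process("key01", 5_000L));  // false
    }
}
```

One trade-off to keep in mind with this approach: a window for a given key only closes once that same key sees a newer record, and the per-key time map grows with the number of keys.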


HTH,
  -Matthias


On 9/18/25 11:41 PM, 李晨 wrote:
Hello everyone,


We're currently using tumbling windows in Kafka Streams to process our data.


One thing we have noticed is that the stream time is advanced based on all 
records processed by the processor node, not per grouping key.


This means that if I have
a record with key01 and timestamp01
a record with key02 and timestamp02
and both key01 and key02 are in the same partition, they are processed by the 
same stream client, so the stream time for them is the same.


If the gap between timestamp01 and timestamp02 is bigger than the window size, 
the one with the smaller timestamp is always dropped.


We checked the implementation; this is controlled by the class 
KStreamWindowAggregate, located in org.apache.kafka.streams.kstream.internals.


How can we keep all the logic from the class KStreamWindowAggregate but only 
change the way it calculates the stream time? Instead of using the object-level 
variable observedStreamTime, can we have a map or use a state store to keep a 
separate stream time for each key?


Looking forward to your insights and feedback.


Best regards,
Chen





