Guozhang Wang created KAFKA-3514:
------------------------------------

             Summary: Stream timestamp computation needs some further thoughts
                 Key: KAFKA-3514
                 URL: https://issues.apache.org/jira/browse/KAFKA-3514
             Project: Kafka
          Issue Type: Bug
          Components: kafka streams
            Reporter: Guozhang Wang
             Fix For: 0.10.1.0


Our current stream task's timestamp is used for punctuate function as well as 
selecting which stream to process next (i.e. best effort stream 
synchronization). And it is defined as the smallest timestamp over all 
partitions in the task's partition group. This results in two unintuitive 
corner cases:

1) observing a late arrived record would keep that stream's timestamp low for a 
period of time, and hence keep being process until that late record. For 
example take two partitions within the same task annotated by their timestamps:

{code}
Stream A: 5, 6, 7, 8, 9, 1, 10
{code}

{code}
Stream B: 2, 3, 4, 5
{code}

The late arrived record with timestamp "1" will cause stream A to be selected 
continuously in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 
until the record itself is dequeued and processed, then stream B will be 
selected starting with timestamp 2.

2) an empty buffered partition will cause its timestamp to be not advanced, and 
hence the task timestamp as well since it is the smallest among all partitions. 
This may not be a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to