Guozhang Wang created KAFKA-10391:
-------------------------------------

             Summary: Streams should overwrite checkpoint excluding corrupted partitions
                 Key: KAFKA-10391
                 URL: https://issues.apache.org/jira/browse/KAFKA-10391
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: Guozhang Wang
            Assignee: Guozhang Wang


While working on https://issues.apache.org/jira/browse/KAFKA-9450 I discovered
another bug in Streams: when some partitions are corrupted because their offsets
are out of range, we treat the task as corrupted, close it as dirty, and then
revive it. However, we forgot to overwrite the checkpoint file so that it
excludes those out-of-range partitions, which would let them be re-bootstrapped
from the new log-start offset. Hence, when the task is revived, it still loads
the old offset, resumes from there, and hits the out-of-range exception again.
This may cause {{StreamsUpgradeTest.test_app_upgrade}} to be flaky.
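A minimal sketch of the intended fix, assuming the checkpoint can be modeled as
an in-memory map from a partition name to its checkpointed offset. The class
CheckpointSketch and its methods are hypothetical illustrations, not the Kafka
Streams API, and writing the map back to the actual checkpoint file is omitted.
Dropping the corrupted partitions from the checkpoint means a revived task finds
no stored offset for them and falls back to the log-start offset.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a task checkpoint; NOT the Kafka Streams API.
final class CheckpointSketch {

    // Returns a copy of the checkpointed offsets with every corrupted
    // (out-of-range) partition removed. This copy is what would be written
    // back as the new checkpoint before the task is revived.
    static Map<String, Long> excludeCorrupted(Map<String, Long> checkpointed,
                                              Set<String> corrupted) {
        Map<String, Long> result = new HashMap<>(checkpointed);
        result.keySet().removeAll(corrupted);
        return result;
    }

    // Hypothetical helper: the offset a revived task would restore from.
    // With no checkpointed offset, the partition is re-bootstrapped from
    // the log-start offset; otherwise the stale checkpointed offset is reused.
    static long restoreStartOffset(Map<String, Long> checkpoint,
                                   String partition,
                                   long logStartOffset) {
        Long offset = checkpoint.get(partition);
        return offset == null ? logStartOffset : offset;
    }
}
```

For example, if "store-changelog-0" is checkpointed at 100 but its log now starts
at 500, restoring from the unchanged checkpoint returns 100 (out of range again),
while restoring from excludeCorrupted(...) returns 500. Uncorrupted partitions
keep their checkpointed offsets either way.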

We do not see this often because in the past we always deleted the checkpoint
file after loading it, and we usually see the out-of-range exception only at
the beginning of restoration rather than during it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
