[ 
https://issues.apache.org/jira/browse/KAFKA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12256.
-----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

> auto commit causes delays due to retriable UNKNOWN_TOPIC_OR_PARTITION
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-12256
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12256
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 2.0.0
>            Reporter: Ryan Leslie
>            Priority: Minor
>              Labels: new-consumer-threading-should-fix
>             Fix For: 3.2.0
>
>
> In KAFKA-6829 a change was made to the consumer to internally retry commits 
> upon receiving UNKNOWN_TOPIC_OR_PARTITION.
> Though this helped mitigate issues around stale broker metadata, there were 
> some valid concerns around the negative effects for routine topic deletion:
> https://github.com/apache/kafka/pull/4948
> In particular, if a commit is issued for a deleted topic, retries can block 
> the consumer for up to max.poll.interval.ms. This is tunable of course, but 
> any amount of stalling in a consumer can lead to unnecessary lag.
> One of the assumptions while permitting the change was that in practice it 
> should be rare for commits to occur for deleted topics, since that would 
> imply messages were being read or published at the time of deletion. It's 
> fair to expect users to not delete topics that are actively published to. But 
> this assumption is false in cases where auto commit is enabled.
> With the current implementation of auto commit, the consumer will regularly 
> issue commits for all topics being fetched from, regardless of whether or not 
> messages were actually received. The fetch positions are simply flushed, even 
> when they are 0. This is simple and generally efficient, though it does mean 
> commits are often redundant. Besides the auto commit interval, commits are 
> also issued at the time of rebalance, which is often precisely at the time 
> topics are deleted.
> This means that in practice commits for deleted topics are not really rare. 
> This is particularly an issue when the consumer is subscribed to a multitude 
> of topics using a wildcard. For example, a consumer might subscribe to a 
> particular "flavor" of topic with the aim of auditing all such data, and 
> these topics might dynamically come and go. The consumer's metadata and 
> rebalance mechanisms are meant to handle this gracefully, but the end result 
> is that such groups are often blocked in a commit for several seconds or 
> minutes (max.poll.interval.ms defaults to 5 minutes) whenever a delete 
> occurs. This can sometimes result in significant lag.
> Besides having users abandon auto commit in the face of topic deletes, there 
> are probably multiple ways to deal with this, including reconsidering if 
> commits still truly need to be retried here, or if this behavior should be 
> more configurable; e.g. having a separate commit timeout or policy. In some 
> cases the loss of a commit and subsequent message duplication is still 
> preferred to processing delays. And having an artificially low 
> max.poll.interval.ms or rebalance.timeout.ms comes with its own set of 
> concerns.
> At the very least, the current behavior and pitfalls around deleting topics 
> with active consumers should be documented.
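
The workaround the description mentions, abandoning auto commit, can be sketched as a plain consumer configuration. The property keys are standard consumer settings; the class name, group id, and bootstrap address are placeholders assumed for illustration. With this configuration the application has to commit offsets itself.

```java
import java.util.Properties;

public class ManualCommitConfig {
    // Builds consumer properties with auto commit turned off.
    // "localhost:9092" and "audit-group" are placeholder values,
    // not taken from the report.
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "audit-group");
        // Disable periodic and rebalance-time auto commits.
        props.setProperty("enable.auto.commit", "false");
        // Kept at the 5-minute default the report refers to; lowering it
        // artificially has its own drawbacks, as the report notes.
        props.setProperty("max.poll.interval.ms", "300000");
        return props;
    }
}
```

This only removes the automatic commits. Whether a manual commit for a deleted topic still stalls depends on the client version in use.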

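The description notes that auto commit flushes the position of every fetched partition, whether or not records arrived. For contrast, here is a minimal sketch of committing only the positions that actually advanced. The helper `advancedOnly` and the string keys standing in for topic-partitions are hypothetical. This is not the consumer's implementation, nor the change that resolved this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitFilter {
    // Returns only the positions that moved past the last committed offset.
    // Keys are hypothetical "topic-partition" strings; a partition with no
    // recorded commit is treated as committed at offset 0.
    public static Map<String, Long> advancedOnly(Map<String, Long> positions,
                                                 Map<String, Long> committed) {
        Map<String, Long> toCommit = new HashMap<>();
        for (Map.Entry<String, Long> e : positions.entrySet()) {
            long last = committed.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > last) {
                toCommit.put(e.getKey(), e.getValue());
            }
        }
        return toCommit;
    }
}
```

In a real application the filtered map would be converted to the client's own partition and offset types before a manual commit. Idle partitions, including those of recently deleted topics that never delivered records, would then be left out of the request.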


--
This message was sent by Atlassian Jira
(v8.20.1#820001)