[ https://issues.apache.org/jira/browse/KAFKA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax resolved KAFKA-7655.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.2
                   2.1.1
                   2.2.0

> Metadata spamming requests from Kafka Streams under some circumstances, potential DOS
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7655
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.1
>            Reporter: Pasquale Vazzana
>            Assignee: Pasquale Vazzana
>            Priority: Major
>              Labels: performance, pull-request-available, security
>             Fix For: 2.2.0, 2.1.1, 2.0.2
>
>
> There is a bug in the InternalTopicManager that makes the client believe that
> a topic exists even though it doesn't. It occurs mostly in the few seconds
> between when a topic is marked for deletion and when it is actually deleted.
> In that window, the broker gives inconsistent information: it first hides
> the topic but then refuses to create a new one, so the client concludes
> that the topic already exists and starts polling for metadata.
> The consequence is that the client goes into a loop where it polls for topic
> metadata, and if many threads do this it can take down a small cluster or
> greatly degrade its performance.
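A minimal Python model of the race described above may help. It is not Kafka code: `FakeBroker`, `TopicExistsError` and `naive_ensure_topic` are hypothetical names, the broker is reduced to two sets (live topics and topics pending deletion), and `max_polls` exists only so the demo terminates, whereas the report describes an unbounded loop.

```python
class TopicExistsError(Exception):
    """Hypothetical stand-in for Kafka's TopicExistsException."""


class FakeBroker:
    """Hypothetical broker model: a topic marked for deletion is hidden from
    metadata responses but still blocks creation of a topic with that name."""

    def __init__(self):
        self.live = set()            # topics visible in metadata
        self.pending_delete = set()  # marked for deletion, not yet deleted
        self.metadata_requests = 0

    def describe(self, topic):
        # Metadata lookup: topics pending deletion are not reported.
        self.metadata_requests += 1
        return topic in self.live

    def create(self, topic):
        # Creation is refused while the old topic is still being deleted.
        if topic in self.live or topic in self.pending_delete:
            raise TopicExistsError(topic)
        self.live.add(topic)


def naive_ensure_topic(broker, topic, max_polls):
    """Buggy pattern: trust TopicExistsError and poll metadata until the
    topic shows up. max_polls is only a safety bound for this demo."""
    if broker.describe(topic):
        return True
    try:
        broker.create(topic)
        return True
    except TopicExistsError:
        pass  # the client concludes the topic "already exists"
    for _ in range(max_polls):
        if broker.describe(topic):
            return True
    return False


broker = FakeBroker()
broker.pending_delete.add("app-store-changelog")
print(naive_ensure_topic(broker, "app-store-changelog", max_polls=1000))  # False
print(broker.metadata_requests)  # 1001
```

While the topic sits in the pending-deletion set, every poll is another metadata request that cannot succeed, and each thread running such a loop adds its own stream of requests.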
> The real-life scenario is probably a reset gone wrong. Reproducing the issue
> is fairly simple; these are the steps:
>  * Stop a Kafka Streams application
>  * Delete one of its changelog topics and the local store
>  * Restart the application immediately after deleting the topic
>  * You will see the Kafka Streams application hang after the bootstrap,
> logging something like: INFO  Metadata - Cluster ID: xxxx
>  
> I am attaching a patch that fixes the issue on the client side, but in my
> opinion this should be tackled on the broker as well: metadata requests seem
> expensive, and it would be easy to craft a DDoS that could potentially take
> down an entire cluster in seconds just by flooding the brokers with metadata
> requests.
> The patch kicks in only when a topic that did not exist in the first call
> to getNumPartitions triggers a TopicExistsException. When this happens, it
> forces a re-validation of the topic and, if the topic still appears not to
> exist, schedules a retry after a delay to give the broker the time it needs
> to sort things out.
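The retry behaviour described above can be sketched against a similar toy model. This is an illustration under assumptions, not the actual patch: the broker finishes the deletion at a fixed time on a fake clock (`deleted_at`), and advancing `broker.now` by the hypothetical `retry_backoff` parameter stands in for sleeping between attempts. `FakeBroker`, `TopicExistsError` and `ensure_topic_with_backoff` are made-up names.

```python
class TopicExistsError(Exception):
    """Hypothetical stand-in for Kafka's TopicExistsException."""


class FakeBroker:
    """Hypothetical broker: a topic pending deletion is invisible to metadata
    but blocks creation until the fake clock reaches `deleted_at`."""

    def __init__(self, deleted_at):
        self.now = 0.0
        self.deleted_at = deleted_at
        self.live = set()
        self.pending_delete = set()
        self.metadata_requests = 0

    def _expire(self):
        # Deletion completes once the fake clock passes deleted_at.
        if self.now >= self.deleted_at:
            self.pending_delete.clear()

    def describe(self, topic):
        self.metadata_requests += 1
        self._expire()
        return topic in self.live

    def create(self, topic):
        self._expire()
        if topic in self.live or topic in self.pending_delete:
            raise TopicExistsError(topic)
        self.live.add(topic)


def ensure_topic_with_backoff(broker, topic, retry_backoff, max_attempts):
    """After a TopicExistsError, re-validate the topic; if it still looks
    missing, wait retry_backoff before the next attempt."""
    for _ in range(max_attempts):
        if broker.describe(topic):
            return True
        try:
            broker.create(topic)
            return True
        except TopicExistsError:
            # Re-validate instead of trusting the exception.
            if broker.describe(topic):
                return True
            broker.now += retry_backoff  # stands in for sleeping
    return False


broker = FakeBroker(deleted_at=3.0)
broker.pending_delete.add("app-store-changelog")
print(ensure_topic_with_backoff(broker, "app-store-changelog",
                                retry_backoff=1.0, max_attempts=10))  # True
print(broker.metadata_requests)  # 7
```

In this model the topic is created on the fourth attempt after 7 metadata requests, and `max_attempts` bounds the work if the broker never finishes the deletion.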
> I think this patch makes sense beyond the use case mentioned above, where a
> topic does not exist: even if the topic was actually created, the client
> should not blindly trust it and should still re-validate it by checking the
> number of partitions. For example, a topic can be created automatically by
> the first request, in which case it would have the default number of
> partitions rather than the expected one.
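Re-validating partition counts, even when creation appears to have succeeded, amounts to comparing the expected counts with what a metadata lookup reports. A hypothetical helper `validate_topics` might look like the sketch below; the topic names and counts are made up, and the single partition mimics a topic auto-created with the broker's default `num.partitions` (1 unless configured otherwise).

```python
def validate_topics(expected, actual):
    """Compare required partition counts with those reported by metadata.

    expected: dict of topic -> required number of partitions
    actual:   dict of topic -> partitions reported by a metadata lookup;
              topics absent from the dict are treated as not existing
    Returns (missing, mismatched): a sorted list of missing topics and a
    dict of topic -> (expected, actual) for partition-count mismatches.
    """
    missing = sorted(t for t in expected if t not in actual)
    mismatched = {
        t: (n, actual[t])
        for t, n in expected.items()
        if t in actual and actual[t] != n
    }
    return missing, mismatched


expected = {"app-store-changelog": 8, "app-repartition": 8}
# The changelog was auto-created with 1 partition; the repartition topic is absent.
actual = {"app-store-changelog": 1}
missing, mismatched = validate_topics(expected, actual)
print(missing)     # ['app-repartition']
print(mismatched)  # {'app-store-changelog': (8, 1)}
```

One reasonable design is to retry with backoff for missing topics but fail fast on mismatches, since waiting does not change the partition count of a topic that was auto-created with the wrong one.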



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
