Jason Gustafson created KAFKA-15828:
---------------------------------------

             Summary: Protect clients from broker hostname reuse
                 Key: KAFKA-15828
                 URL: https://issues.apache.org/jira/browse/KAFKA-15828
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson


In some environments such as k8s, brokers may be assigned to nodes dynamically 
from an available pool. When a cluster is rolling, it is possible for the 
client to see the same node advertised for different broker IDs in a short 
period of time. For example, kafka-1 might be initially assigned to node1. 
Before the client is able to establish a connection, it could be that kafka-3 
is now on node1 instead. Currently there is no protection in the client or in 
the protocol for this scenario. If the connection succeeds, the client will 
assume it has a good connection to kafka-1. Until something disrupts the 
connection, it will continue under this assumption even if the hostname for 
kafka-1 changes.

We have observed this scenario in practice. The client connected to the wrong 
broker through stale hostname information. It was unable to produce data 
because of persistent NOT_LEADER errors. The only way to recover in the end was 
by restarting the client to force a reconnection.

We have discussed a couple potential solutions to this problem:
 # Let the client be smarter managing the connection/hostname mapping. When it 
detects that a hostname has changed, it should force a disconnect to ensure it 
connects to the right node.
 # We can modify the protocol to verify that the client has connected to the 
intended broker. For example, we can add a field to ApiVersions to indicate the 
intended broker ID. The broker receiving the request can return an error if its 
ID does not match that in the request.

Are there alternatives? 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to