Thanks for the reply, Xi! The default value of 'controller.socket.timeout.ms'
is 30000, that is, 30 seconds. What we observed was that the controller
would not assign another replica as the leader, even if it failed to send
updated topic metadata information to the problematic broker for >30
Hi everyone,
Recently we had a cluster in which the controller failed to connect to
broker A for an extended period of time. I had expected that the
controller would identify the broker as a failed broker, and re-elect
another broker as the leader for partitions that were hosted on broker A.