Hello!

We are running Kafka 2.1.1. Yesterday we accidentally corrupted the DNS record
for our controller broker. As a result, the broker was not reachable from
outside, but from the inside it could still connect to the ZooKeeper cluster
(the ZooKeeper nodes were on remote servers) and to the other Kafka brokers
(also on remote servers). Because its ZooKeeper session stayed alive, ZooKeeper
did not delete the controller's ephemeral node, and controller re-election was
not triggered. Leadership of the partitions led by this broker was not
reassigned either, because the broker kept reporting itself as healthy.

To sum up, clients and replicas saw a lot of connection timeouts to some
partitions of the Kafka cluster (those whose leaders were on the affected
broker), and we effectively lost a broker for some time (it was inaccessible).
Yet the Kafka cluster did not react in any way and kept reporting a healthy
state.

I believe this is not a bug, but Kafka's behavior could be improved (for
example, with heartbeats from ZooKeeper to the Kafka brokers, or some kind of
acknowledgement that a broker is really accessible from the outside world). I
am interested in the community's opinion.
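To make the "accessible from the outside world" idea concrete, here is a
minimal sketch of an external probe, assuming it runs on a host in the client
network (not on the broker itself) and is pointed at the broker's advertised
hostname and port. The function name `probe_listener` and the example hostname
are hypothetical. The probe only checks DNS resolution and a TCP connect; it
does not speak the Kafka protocol and does not change how Kafka elects a
controller. It would merely surface the condition described above to alerting.

```python
import socket
from dataclasses import dataclass


@dataclass
class ProbeResult:
    host: str
    port: int
    resolved: bool   # did DNS resolution succeed?
    reachable: bool  # did a TCP connection succeed?
    error: str = ""


def probe_listener(host: str, port: int, timeout: float = 3.0) -> ProbeResult:
    """Hypothetical external check: does `host` resolve, and does `port` accept TCP?"""
    try:
        # Step 1: DNS resolution (the step that broke in the incident above).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ProbeResult(host, port, resolved=False, reachable=False, error=str(exc))

    last_error = ""
    # Step 2: try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return ProbeResult(host, port, resolved=True, reachable=True)
        except OSError as exc:
            last_error = str(exc)
    return ProbeResult(host, port, resolved=True, reachable=False, error=last_error)


if __name__ == "__main__":
    # Hypothetical broker hostname and port; replace with your advertised listener.
    print(probe_listener("broker1.example.com", 9092))
```

Separating the DNS step from the TCP step lets an alert say which of the two
failed; a fuller check could additionally send a metadata request with a real
Kafka client.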

Regards,
Tolya
