Hi Mark,

The election of a new leader for a topic partition happens once the controller detects that the current leader has crashed. This happens asynchronously via a ZooKeeper listener. Once a ZooKeeper listener fires, an object representing the event is put into a controller queue. The controller has a single thread that pulls events out of this queue and handles them one after another. I can't recall a config to tune this off the top of my head.

How much delay are you seeing in leadership change? Are there any controller socket timeouts in the log?

You might also want to take a look at KIP-291 (KAFKA-4453), which is meant to shorten the time it takes to handle controller events.
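To make the queueing behaviour described above concrete, here is a rough Python sketch of a single-threaded event loop. It is only a toy model, not Kafka's actual controller code: the class name, method names, and event names are all made up for illustration. The point it shows is that an event signalling a dead broker has to wait behind whatever was enqueued before it.

```python
import queue
import threading


class ToyControllerEventQueue:
    """Toy model of a single-threaded controller event loop.

    Hypothetical names throughout; this is not Kafka's implementation.
    """

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self):
        self._events = queue.Queue()  # FIFO queue of pending events
        self.handled = []             # events in the order they were processed
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_listener_fired(self, event_name):
        # Called when a (simulated) ZooKeeper listener fires:
        # the event is only enqueued here, never handled inline.
        self._events.put(event_name)

    def _run(self):
        # The single worker thread handles one event at a time, in order.
        while True:
            event = self._events.get()
            if event is self._STOP:
                break
            self.handled.append(event)

    def shutdown(self):
        # Enqueue the sentinel behind all pending events and wait for the worker.
        self._events.put(self._STOP)
        self._thread.join()


def demo():
    controller = ToyControllerEventQueue()
    # Illustrative event names; the broker-failure event is queued second.
    for event in ["topic-change", "broker-failure", "partition-reassignment"]:
        controller.on_listener_fired(event)
    controller.shutdown()
    return controller.handled


if __name__ == "__main__":
    print(demo())
```

In this model, the delay before a leadership change is the time the failure event spends waiting in the queue plus the time to handle it, which is why a busy controller can slow elections down.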
Thanks,
Mayuresh

On Thu, Dec 6, 2018 at 9:50 AM Harper Henn <harper.h...@datto.com> wrote:

> Hi Mark,
>
> If a broker fails and you want to elect a new leader as quickly as
> possible, you could tweak zookeeper.session.timeout.ms in the kafka broker
> configuration. According to the documentation: "If the consumer fails to
> heartbeat to ZooKeeper for this period of time it is considered dead and a
> rebalance will occur."
>
> https://kafka.apache.org/0101/documentation.html
>
> I think making zookeeper.session.timeout.ms smaller will result in faster
> detection of a dead node, but the downside is that a leader election might
> get triggered by network blips or other cases where your broker is not
> actually dead.
>
> Harper
>
> On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson <manderso...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm currently testing how Kafka reacts in cases of broker failure due to
> > process failure or network timeout.
> >
> > I'd like to have the election of a new leader for a topic partition
> > happen as quickly as possible but it is unclear from the documentation
> > or broker configuration what the key parameters are to tune to make
> > this possible.
> >
> > Does anyone have any pointers? Or are there any guides online?
> >
> > Thanks,
> > Mark

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125