Load balancer for Kafka brokers
Hi,

Has anyone used load balancers between publishers and Kafka brokers? I want to set up Kafka as active-passive across two datacenters. My question is: can I add a GSLB layer between these two Kafka clusters to configure automatic failover while publishing data?

Thanks,
LCassa
Re: Load balancer for Kafka brokers
We use loadbalancers for our producer configurations, but what you need to keep in mind is that that connection is only used for metadata requests. The producer queries the loadbalancer IP for metadata for the topic, then disconnects and reconnects directly to the Kafka brokers for producing messages. With the older producer lib, it periodically reconnects to the loadbalancer to refresh metadata. With the newer producer lib, it actually caches information about all the brokers locally and queries them directly for metadata refreshes moving forwards (and therefore does not use the loadbalancer again).

In your situation, it sounds like you want to put all the individual broker connections through the GSLB as well. In order to do this, you would have to:

- have an individual GSLB configuration for each broker, where that config has an active/passive setup with 1 broker from each DC (Not too bad)
- configure the announced hostnames for each broker to be the same in the active and passive DC (A little tricky)
- maintain the exact same partition to broker mapping, including leadership, in each DC (Virtually impossible)

In short, I don’t think this is a reasonable thing to do. You’re not going to be able to assure the exact partition mapping, especially not in the face of Zookeeper timeouts or hardware failures that will cause partition leadership to move around. This will result in partitions becoming unavailable as soon as one of the clusters shifts just a little bit.

A better way to approach this is probably to set up a front-end service, such as a REST endpoint for Kafka, which receives produce requests and publishes them to the local Kafka cluster. Then you can put that endpoint behind the GSLB, and you do not have to worry about the makeup of the Kafka clusters themselves. Your producers would all send their messages through the GSLB to that endpoint, rather than talking to Kafka directly.
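To make the front-end idea a bit more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not a production design: the `/topics/<name>` and `/health` paths, the status codes, and the `send(topic, payload)` callable are all made up for this example. In a real deployment `send` would wrap a producer from whatever Kafka client library you use, pointed at the cluster in the same DC.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TOPIC_PREFIX = "/topics/"  # hypothetical URL layout, not a Kafka convention


def handle_produce(path, payload, send):
    """Map one POST to a (status, body) pair, forwarding the payload via send().

    `send` is an assumed callable send(topic, payload_bytes) that publishes
    to the local Kafka cluster and raises on failure.
    """
    topic = path[len(TOPIC_PREFIX):] if path.startswith(TOPIC_PREFIX) else ""
    if not topic or "/" in topic:
        return 404, {"error": "expected POST /topics/<name>"}
    try:
        send(topic, payload)
    except Exception as exc:
        # 503 tells the caller the local cluster could not take the message.
        return 503, {"error": str(exc)}
    return 202, {"topic": topic, "bytes": len(payload)}


def make_handler(send):
    """Build an HTTP handler class bound to a particular send() callable."""

    class ProduceHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Probe target for the GSLB (the path is an assumption).
            if self.path == "/health":
                self._reply(200, {"status": "ok"})
            else:
                self._reply(404, {"error": "not found"})

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, body = handle_produce(self.path, self.rfile.read(length), send)
            self._reply(status, body)

        def _reply(self, status, body):
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ProduceHandler


def serve(send, host="0.0.0.0", port=8080):
    """Run the endpoint until interrupted."""
    HTTPServer((host, port), make_handler(send)).serve_forever()
```

Each DC would run something like `serve(send)` next to its own cluster, and the GSLB would fail over between the two endpoints. Note that the `/health` check here only proves the HTTP process is up; a real one would presumably also verify that the local cluster is reachable.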
-Todd

On Tue, Nov 3, 2015 at 10:15 AM, Cassa L wrote:

> Hi,
> Has anyone used load balancers between publishers and Kafka brokers? I
> want to do active-passive setup of Kafka in two datacenters. My question
> is can I add GSLB layer between these two Kafka clusters to configure
> automatic fail over while publishing data?
>
> Thanks,
> LCassa
Re: Load balancer for Kafka brokers
Thanks for the detailed answer.

Regards,
LCassa

On Tue, Nov 3, 2015 at 10:54 AM, Todd Palino wrote:

> [...]