Load balancer for Kafka brokers

2015-11-03 Thread Cassa L
Hi,
Has anyone used load balancers between publishers and Kafka brokers? I
want to do an active-passive setup of Kafka in two datacenters. My question
is: can I add a GSLB layer in front of these two Kafka clusters to configure
automatic failover while publishing data?

Thanks,
LCassa


Re: Load balancer for Kafka brokers

2015-11-03 Thread Todd Palino
We use load balancers for our producer configurations, but what you need to
keep in mind is that the load balancer connection is only used for metadata
requests. The producer queries the load balancer IP for metadata for the
topic, then disconnects and reconnects directly to the Kafka brokers for
producing messages. With the older producer lib, it periodically reconnects
to the load balancer to refresh metadata. With the newer producer lib, it
actually caches information about all the brokers locally and queries them
directly for metadata refreshes going forward (and therefore does not use
the load balancer again).
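
To make that connection pattern concrete, here is a toy model in Python. It
does not use any Kafka client library; `Cluster`, `ToyProducer`, and the
hostnames are all made up for illustration, and the refresh logic only
mimics the newer-producer behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a Kafka cluster: partition -> leader broker host."""
    leaders: dict

    def metadata(self):
        # A real metadata response lists brokers and per-partition leaders;
        # this sketch just returns a copy of the leader map.
        return dict(self.leaders)


@dataclass
class ToyProducer:
    bootstrap: str   # load balancer address (hypothetical)
    cluster: Cluster
    connections: list = field(default_factory=list)  # hosts contacted, in order
    leaders: dict = field(default_factory=dict)      # cached metadata

    def refresh_metadata(self):
        # First call goes to the load balancer; later calls go to a broker
        # already known from cached metadata (newer-producer behaviour).
        source = next(iter(self.leaders.values()), self.bootstrap)
        self.connections.append(source)
        self.leaders = self.cluster.metadata()

    def send(self, partition, message):
        if not self.leaders:
            self.refresh_metadata()
        host = self.leaders[partition]
        # Produce traffic goes straight to the partition leader,
        # never through the load balancer.
        self.connections.append(host)
        return host


producer = ToyProducer(
    bootstrap="kafka-vip.example.com:9092",
    cluster=Cluster({0: "broker1.example.com:9092",
                     1: "broker2.example.com:9092"}),
)
producer.send(0, b"a")
producer.send(1, b"b")
producer.refresh_metadata()
print(producer.connections)
```

The load balancer address shows up exactly once in `connections`; every
later entry is a broker hostname, which is why a load balancer on the
bootstrap address alone does nothing for failover of produce traffic.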

In your situation, it sounds like you want to put all the individual broker
connections through the GSLB as well. In order to do this, you would have
to:

- have an individual GSLB configuration for each broker, where that config
has an active/passive setup with 1 broker from each DC (Not too bad)
- configure the announced hostnames for each broker to be the same in the
active and passive DC (A little tricky)
- maintain the exact same partition to broker mapping, including
leadership, in each DC (Virtually impossible)

In short, I don’t think this is a reasonable thing to do. You’re not going
to be able to ensure the exact partition mapping, especially not in the
face of ZooKeeper timeouts or hardware failures that will cause partition
leadership to move around. This will result in partitions becoming
unavailable as soon as one of the clusters shifts just a little bit.
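
As a rough illustration of how little drift it takes, assume each cluster's
partition leadership is available as a plain dict of partition to announced
hostname (the helper name and the hostnames below are hypothetical). Any
partition where the two dicts disagree is one that would break when traffic
fails over:

```python
def mismatched_partitions(active, passive):
    """Return the partitions whose leader hostname differs between the DCs."""
    partitions = active.keys() | passive.keys()
    return sorted(p for p in partitions if active.get(p) != passive.get(p))


# Both DCs start out with identical mappings (hypothetical hostnames).
active_dc = {0: "kafka1", 1: "kafka2", 2: "kafka3"}
passive_dc = {0: "kafka1", 1: "kafka2", 2: "kafka3"}
print(mismatched_partitions(active_dc, passive_dc))

# A single leader election in the passive DC moves partition 1 to kafka3.
passive_dc[1] = "kafka3"
print(mismatched_partitions(active_dc, passive_dc))
```

One leadership change in either cluster is enough to make the second call
report a mismatch, and nothing in Kafka coordinates leader elections across
two independent clusters.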

A better way to approach this is probably to set up a front-end service,
such as a REST endpoint for Kafka, which receives produce requests and
publishes them to the local Kafka cluster. Then you can put that endpoint
behind the GSLB, and you do not have to worry about the makeup of the Kafka
clusters themselves. Your producers would all send their messages through
the GSLB to that endpoint, rather than talking to Kafka directly.
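
A minimal sketch of such a front end, using only the Python standard
library. The `/topics/<topic>` path, the 202 response, and the `publish`
callable are assumptions made for this example; in a real deployment
`publish` would wrap whatever Kafka producer client you use against the
local cluster.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_produce(path, body, publish):
    """Route POST /topics/<topic> to publish(topic, body); return (status, reply)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "topics":
        return 404, {"error": "expected /topics/<topic>"}
    publish(parts[1], body)
    return 202, {"status": "accepted"}


def make_handler(publish):
    """Build an HTTP handler class bound to the given publish callable."""

    class ProduceHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            status, reply = handle_produce(self.path, payload, publish)
            data = json.dumps(reply).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ProduceHandler


def serve(publish, host="0.0.0.0", port=8080):
    # Blocks forever; this listener is what would sit behind the GSLB.
    HTTPServer((host, port), make_handler(publish)).serve_forever()


# Stand-in for a real producer: collect messages in memory.
sent = []
print(handle_produce("/topics/events", b"hello", lambda t, m: sent.append((t, m))))
```

Run one instance per datacenter, each publishing to its local cluster, and
the GSLB only has to fail over a single HTTP endpoint. Partition leadership
stays an internal detail of each cluster.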

-Todd





Re: Load balancer for Kafka brokers

2015-11-03 Thread Cassa L
Thanks for the detailed answer.

Regards,
LCassa
