Hi, I have read the official documentation on Kafka geo-replication (cross-DC mirroring), but I don't think it can guarantee that data is always replicated to multiple DCs: when a DC-level failure happens (the whole DC becomes unavailable), the cluster in the other DC may not yet have all the data replicated. So we came up with the idea of running one Kafka cluster that spans multiple DCs.
Imagine we have 3 data centers in 2 different cities: 2 DCs in city A and 1 DC in city B. Our goal is for our application to tolerate any single-DC failure without data loss. The network latency between the two cities is about 30 ms. A Kafka cluster is set up with 5 brokers: 2 in DC-A1, 2 in DC-A2, and 1 in DC-B. Topics are created with 5 replicas, one on each broker. Since we want to maximize data availability, the producer is configured with acks=all, so a send does not succeed until the message has been written to all replicas. But the cost is also high, because the round trip to the replica in DC-B adds latency.

We know that acks=all means producer.send() has to wait for responses from all the replicas in the ISR set. If we could find a way to keep the replica in DC-B permanently out of the ISR set, while the replicas in DC-A1 and DC-A2 always stay in the ISR and replication keeps working, that would be great for reducing producer latency. So we found the broker-side config replica.lag.time.max.ms and tried setting it to a value below 100 ms, but the whole cluster became unstable: we could see the ISRs of topic partitions located in DC-A1 and DC-A2 shrinking and expanding frequently in server.log, even under very low throughput. If we raise the value, the DC-B replica is always in the ISR. This suggests that the replica fetching process and ISR updates are more complex than I thought. Is there a proper replica.lag.time.max.ms value for this setup? Does anyone have any suggestions? Thanks.
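For reference, here is a minimal sketch of the configuration described above (the broker hostnames are placeholders; the replica.lag.time.max.ms value shown is Kafka's default of 30000 ms, which is the setting we have been lowering):

```properties
# producer.properties
# acks=all: the leader waits for acknowledgement from every replica
# currently in the ISR before confirming the send.
bootstrap.servers=broker-a1-1:9092,broker-a2-1:9092,broker-b-1:9092
acks=all

# server.properties (on each broker)
# A follower that has not caught up to the leader's log end within this
# window is removed from the ISR. Default is 30000; we experimented with
# values under 100 to push the DC-B replica out of the ISR.
replica.lag.time.max.ms=30000
```

With 30 ms of cross-city latency, a threshold under 100 ms leaves very little headroom for the fetch round trip, which may explain why even the local DC-A replicas flap in and out of the ISR.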