Re: Increased latency when consuming from the closest ISR
Hi, Thanks very much for the information. In https://github.com/apache/kafka/pull/11942 it says the fetch request goes into the purgatory and waits for a timeout. Does that timeout refer to fetch.max.wait.ms? In our test we have configured that parameter to 100 ms. If so, do you think that part of the 400 ms is explained by that PR? Regards On Wed, May 11, 2022 at 9:45 AM Luke Chen wrote: > Hi, > > We have some improvements for the preferred-read-replica case. > Ex: > https://github.com/apache/kafka/pull/11942 > https://github.com/apache/kafka/pull/11965 > > I know one improvement will be included in the v3.2.0 release, which will > be released soon. > Maybe you can give it a try to see if it improves the throughput. > > Thank you. > Luke > > On Wed, May 11, 2022 at 2:56 PM benitocm wrote: > > > Hi, > > > > We are using the functionality provided by KIP-392 (a consumer can fetch > > the data from an ISR replica instead of the partition leader) in a Kafka > > cluster stretched between two very close DCs (average round-trip latency > > about 2 milliseconds). > > > > What we have seen is that, on average, when the consumer is in the same DC > > (configured by rack.id) as the partition leader (i.e. the consumer will > > consume from the leader), the time it takes a message to reach the > > consumer is close to 20 milliseconds. However, when the consumer is in a > > different DC than the partition leader (the consumer will consume from a > > replica in the same DC as the consumer), that latency goes to around > > 400 milliseconds. > > > > We have also checked that if we don't configure the rack.id in a consumer, > > to force it to consume from the leader even though the partition leader is in a > > different DC (i.e. the consumer is in DC1 and the partition leader is in > > DC2, so the consumer fetches across DCs), the latency is > > reduced to about 20 milliseconds. 
> > > > From those tests, we have concluded that consuming from an ISR replica > > implies higher latencies. > > > > Does anybody have any thoughts on this? > > > > Thanks in advance > > >
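For reference, the two consumer settings that govern how long a fetch can sit in the broker's purgatory are sketched below. The values are the ones from our test; whether this wait accounts for part of the 400 ms is exactly the open question above, not a confirmed fact:

```properties
# consumer.properties (values used in our test; a sketch, not a recommendation)
# The broker parks a consumer fetch in purgatory until either
# fetch.min.bytes of data is available or fetch.max.wait.ms elapses,
# whichever comes first.
fetch.max.wait.ms=100
fetch.min.bytes=1
```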
Increased latency when consuming from the closest ISR
Hi, We are using the functionality provided by KIP-392 (a consumer can fetch data from an ISR replica instead of the partition leader) in a Kafka cluster stretched between two very close DCs (average round-trip latency about 2 milliseconds). What we have seen is that, on average, when the consumer is in the same DC (configured by rack.id) as the partition leader (i.e. the consumer will consume from the leader), the time it takes a message to reach the consumer is close to 20 milliseconds. However, when the consumer is in a different DC than the partition leader (the consumer will consume from a replica in the same DC as the consumer), that latency goes to around 400 milliseconds. We have also checked that if we don't configure the rack.id in a consumer, to force it to consume from the leader even though the partition leader is in a different DC (i.e. the consumer is in DC1 and the partition leader is in DC2, so the consumer fetches across DCs), the latency is reduced to about 20 milliseconds. From those tests, we have concluded that consuming from an ISR replica implies higher latencies. Does anybody have any thoughts on this? Thanks in advance
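For anyone trying to reproduce this, the KIP-392 configuration we are describing looks roughly like the sketch below (the rack names are examples; note that the consumer-side setting is `client.rack`, while `broker.rack` and the replica selector go on the brokers):

```properties
# broker configuration (each broker tagged with its DC)
broker.rack=DC1
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# consumer configuration (fetch from a replica in the same DC when possible)
client.rack=DC1
```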
How to reduce the latency to interact with a topic?
Hi all, I am using a Kafka topic to handle invalidation events inside a system (System A) that consists of different nodes. When a node of System A detects a situation that requires invalidation, it produces an event to the invalidation topic. The rest of the nodes in System A consume that invalidation topic to be aware of those invalidations and process them. My question is: how can I configure the producers and consumers of that topic to minimize the end-to-end time of that scenario? I mean, I am interested in reducing the time it takes for an event to be written into Kafka and the time it takes for a consumer to read those events. Thanks in advance
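As a starting point, the usual knobs for end-to-end latency are sketched below. The exact values are assumptions to tune from, and `acks=1` trades durability for latency, so use `acks=all` if invalidation events must never be lost:

```properties
# producer: send immediately instead of waiting to batch
linger.ms=0
# lower produce latency; use acks=all if losing an invalidation is unacceptable
acks=1

# consumer: let the broker return a fetch as soon as any data is available
fetch.min.bytes=1
fetch.max.wait.ms=10
```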
Kafka Connect metrics as Yammer metrics
Hi, Currently I am using the library https://github.com/RTBHOUSE/kafka-graphite-reporter in the brokers and clients to extract metrics and publish them into Graphite. This works for the metrics that are available as Yammer metrics. The thing is that I wanted to do the same with the Connect metrics https://kafka.apache.org/documentation/#connect_monitoring but I have not been able to. Please, could anybody tell me whether Kafka Connect metrics are available as Yammer metrics? Thanks in advance
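As far as I know, Connect metrics are exposed through Kafka's own metrics API (org.apache.kafka.common.metrics), not through Yammer, which would explain why a Yammer-based reporter does not see them. The usual path is to plug a MetricsReporter implementation into the worker; a sketch of the worker config, where the reporter class name is hypothetical:

```properties
# connect-distributed.properties
# com.example.GraphiteMetricsReporter is a hypothetical class implementing
# org.apache.kafka.common.metrics.MetricsReporter and forwarding to Graphite
metric.reporters=com.example.GraphiteMetricsReporter
```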
Re: MM2 for DR
Thanks Ryanne. However, what I still don't clearly understand is: if the primary crashes, what do I need to do in the secondary? For example, the LB at the entrance will be switched to point at the secondary front-end, so all the data will be injected into the local topic "topic1" in the secondary, which will be consumed by the consumer instances now connected to the secondary. What happens with the remote topics in the secondary, e.g. "primary.topic1"? Once the "primary" comes back up, the replication process will sync the data, so events added to the local topic1 in the secondary will be copied to the remote topic "secondary.topic1" in the primary. Thanks again On Thu, Feb 13, 2020 at 12:42 AM Ryanne Dolan wrote: > > elaborate a bit more about the active-active > > Active/active in this context just means that both (or multiple) > clusters are used under normal operation, not just during an outage. > For this to work, you basically have isolated instances of your application > stack running in each DC, with MM2 keeping each DC in sync. If one DC is > unavailable, traffic is shifted to another DC. It's possible to set this up > s.t. failover/failback between DCs happens automatically and seamlessly, > e.g. with load balancers and health checks. It's more complicated to set up > than the active/standby approach, but DR sorta takes care of itself from > then on. I frequently demo this stuff, where I pull the plug on entire DCs > and apps keep running like nothing happened. > > On Wed, Feb 12, 2020 at 12:05 AM benitocm wrote: > > > Hi Ryanne, > > > > Please could you elaborate a bit more about the active-active > > recommendation? > > > > Thanks in advance > > > > On Mon, Feb 10, 2020 at 10:21 PM benitocm wrote: > > > > > Thanks very much for the response. > > > > > > Please could you elaborate a bit more about "I'd > > > arc in that direction. Instead of migrating A->B->C->D..., > active/active > > is > > > more like having one big cluster". 
> > > > > > Another thing that I would like to share is that currently my consumers > > > only consumer from one topic so the fact of introducing MM2 will impact > > > them. > > > Any suggestion in this regard would be greatly appreciated > > > > > > Thanks in advance again! > > > > > > > > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan > > > wrote: > > > > > >> Hello, sounds like you have this all figured out actually. A couple > > notes: > > >> > > >> > For now, we just need to handle DR requirements, i.e., we would not > > need > > >> active-active > > >> > > >> If your infrastructure is sufficiently advanced, active/active can be > a > > >> lot > > >> easier to manage than active/standby. If you are starting from scratch > > I'd > > >> arc in that direction. Instead of migrating A->B->C->D..., > active/active > > >> is > > >> more like having one big cluster. > > >> > > >> > secondary.primary.topic1 > > >> > > >> I'd recommend using regex subscriptions where possible, so that apps > > don't > > >> need to worry about these potentially complex topic names. > > >> > > >> > An additional question. If the topic is compacted, i.e.., the topic > > >> keeps > > >> > forever, does switchover operations would imply add an additional > path > > >> in > > >> > the topic name? > > >> > > >> I think that's right. You could always clean things up manually, but > > >> migrating between clusters a bunch of times would leave a trail of > > >> replication hops. > > >> > > >> Also, you might look into implementing a custom ReplicationPolicy. For > > >> example, you could squash "secondary.primary.topic1" into something > > >> shorter > > >> if you like. 
> > >> > > >> Ryanne > > >> > > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm wrote: > > >> > > >> > Hi, > > >> > > > >> > After having a look to the talk > > >> > > > >> > > > >> > > > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0 > > >> > and the > > >> > > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+Mirror
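The "squashing" Ryanne suggests above could follow naming logic like the sketch below, which a custom ReplicationPolicy might implement. The alias list and the stripping behavior are assumptions for illustration, not MM2's defaults:

```java
import java.util.List;

public class SquashNaming {
    // Hypothetical list of known cluster aliases in this deployment.
    static final List<String> ALIASES = List.of("primary", "secondary");

    // Strip any chain of known cluster-alias prefixes so that
    // "secondary.primary.topic1" and "topic1" map to the same name.
    static String squash(String topic) {
        String result = topic;
        boolean stripped = true;
        while (stripped) {
            stripped = false;
            for (String alias : ALIASES) {
                String prefix = alias + ".";
                if (result.startsWith(prefix)) {
                    result = result.substring(prefix.length());
                    stripped = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squash("secondary.primary.topic1")); // topic1
        System.out.println(squash("topic1"));                   // topic1
    }
}
```

A real implementation would plug this into MM2's ReplicationPolicy interface; the trade-off is that squashed names lose the provenance information that the default prefix chain encodes.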
Re: [External] Re: MM2 for DR
Hi, I knew about uReplicator but discarded it because it uses Helix. Anyway, I will have a look at the talk. Thanks very much On Wed, Feb 12, 2020 at 9:59 PM Brian Sang wrote: > Not sure if you saw this before, but you might be interested in some of the > work Uber has done for inter-cluster replication and federation. They of > course use their own tool, uReplicator ( > https://github.com/uber/uReplicator), > instead of mirror maker, but you should be able to draw the same insights > as them. > > This talk from kafka summit SF 2019 wasn't about ureplicator per se, but > goes over some problems they experienced and addressed w.r.t. inter-cluster > replication and federation: > > https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-cluster-federation-at-uber > > On Tue, Feb 11, 2020 at 10:05 PM benitocm wrote: > > > Hi Ryanne, > > > > Please could you elaborate a bit more about the active-active > > recommendation? > > > > Thanks in advance > > > > On Mon, Feb 10, 2020 at 10:21 PM benitocm wrote: > > > > > Thanks very much for the response. > > > > > > Please could you elaborate a bit more about "I'd > > > arc in that direction. Instead of migrating A->B->C->D..., > active/active > > is > > > more like having one big cluster". > > > > > > Another thing that I would like to share is that currently my consumers > > > only consumer from one topic so the fact of introducing MM2 will impact > > > them. > > > Any suggestion in this regard would be greatly appreciated > > > > > > Thanks in advance again! > > > > > > > > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan > > > wrote: > > > > > >> Hello, sounds like you have this all figured out actually. A couple > > notes: > > >> > > >> > For now, we just need to handle DR requirements, i.e., we would not > > need > > >> active-active > > >> > > >> If your infrastructure is sufficiently advanced, active/active can be > a > > >> lot > > >> easier to manage than active/standby. 
If you are starting from scratch > > I'd > > >> arc in that direction. Instead of migrating A->B->C->D..., > active/active > > >> is > > >> more like having one big cluster. > > >> > > >> > secondary.primary.topic1 > > >> > > >> I'd recommend using regex subscriptions where possible, so that apps > > don't > > >> need to worry about these potentially complex topic names. > > >> > > >> > An additional question. If the topic is compacted, i.e.., the topic > > >> keeps > > >> > forever, does switchover operations would imply add an additional > path > > >> in > > >> > the topic name? > > >> > > >> I think that's right. You could always clean things up manually, but > > >> migrating between clusters a bunch of times would leave a trail of > > >> replication hops. > > >> > > >> Also, you might look into implementing a custom ReplicationPolicy. For > > >> example, you could squash "secondary.primary.topic1" into something > > >> shorter > > >> if you like. > > >> > > >> Ryanne > > >> > > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm wrote: > > >> > > >> > Hi, > > >> > > > >> > After having a look to the talk > > >> > > > >> > > > >> > > > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0 > > >> > and the > > >> > > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382 > > >> > I am trying to understand how I would use it > > >> > in the setup that I have. For now, we just need to handle DR > > >> requirements, > > >> > i.e., we would not need active-active > > >> > > > >> > My requirements, more or less, are the following: > > >> > > > >> > 1) Currently, we have just one Kafka cluster "primary" where all the > > >> > producers are producing to and where all the consumers are consuming > > >> from. > > >> > 2) In case "primary" crashes, we would need to have other Kafka > > cluster > >
Re: MM2 for DR
Hi Ryanne, Please could you elaborate a bit more about the active-active recommendation? Thanks in advance On Mon, Feb 10, 2020 at 10:21 PM benitocm wrote: > Thanks very much for the response. > > Please could you elaborate a bit more about "I'd > arc in that direction. Instead of migrating A->B->C->D..., active/active is > more like having one big cluster". > > Another thing that I would like to share is that currently my consumers > only consumer from one topic so the fact of introducing MM2 will impact > them. > Any suggestion in this regard would be greatly appreciated > > Thanks in advance again! > > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan > wrote: > >> Hello, sounds like you have this all figured out actually. A couple notes: >> >> > For now, we just need to handle DR requirements, i.e., we would not need >> active-active >> >> If your infrastructure is sufficiently advanced, active/active can be a >> lot >> easier to manage than active/standby. If you are starting from scratch I'd >> arc in that direction. Instead of migrating A->B->C->D..., active/active >> is >> more like having one big cluster. >> >> > secondary.primary.topic1 >> >> I'd recommend using regex subscriptions where possible, so that apps don't >> need to worry about these potentially complex topic names. >> >> > An additional question. If the topic is compacted, i.e.., the topic >> keeps >> > forever, does switchover operations would imply add an additional path >> in >> > the topic name? >> >> I think that's right. You could always clean things up manually, but >> migrating between clusters a bunch of times would leave a trail of >> replication hops. >> >> Also, you might look into implementing a custom ReplicationPolicy. For >> example, you could squash "secondary.primary.topic1" into something >> shorter >> if you like. 
>> >> Ryanne >> >> On Mon, Feb 10, 2020 at 1:24 PM benitocm wrote: >> >> > Hi, >> > >> > After having a look to the talk >> > >> > >> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0 >> > and the >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382 >> > I am trying to understand how I would use it >> > in the setup that I have. For now, we just need to handle DR >> requirements, >> > i.e., we would not need active-active >> > >> > My requirements, more or less, are the following: >> > >> > 1) Currently, we have just one Kafka cluster "primary" where all the >> > producers are producing to and where all the consumers are consuming >> from. >> > 2) In case "primary" crashes, we would need to have other Kafka cluster >> > "secondary" where we will move all the producer and consumers and keep >> > working. >> > 3) Once "primary" is recovered, we would need to move to it again (as we >> > were in #1) >> > >> > To fullfill #2, I have thought to have a new Kafka cluster "secondary" >> and >> > setup a replication procedure using MM2. However, it is not clear to me >> how >> > to proceed. >> > >> > I would describe the high level details so you guys can point my >> > misconceptions: >> > >> > A) Initial situation. As in the example of the KIP-382, in the primary >> > cluster, we will have a local topic: "topic1" where the producers will >> > produce to and the consumers will consume from. MM2 will create in the >> > primary the remote topic "primary.topic1" where the local topic in the >> > primary will be replicated. In addition, the consumer group information >> of >> > primary will be also replicated. >> > >> > B) Kafka primary cluster is not available. Producers are moved to >> produce >> > into the topic1 that it was manually created. 
In addition, consumers >> need >> > to connect to >> > secondary to consume the local topic "topic1" where the producers are >> now >> > producing and from the remote topic "primary.topic1" where the >> producers >> > were producing before, i.e., consumers will need to aggregate.This is so >> > because some consumers could have lag so they will need to consume from >> > both. In this situation, local topic "topic1" in the secon
Re: MM2 for DR
Thanks very much for the response. Please could you elaborate a bit more about "I'd arc in that direction. Instead of migrating A->B->C->D..., active/active is more like having one big cluster"? Another thing that I would like to share is that currently my consumers only consume from one topic, so introducing MM2 will impact them. Any suggestion in this regard would be greatly appreciated. Thanks in advance again! On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan wrote: > Hello, sounds like you have this all figured out actually. A couple notes: > > > For now, we just need to handle DR requirements, i.e., we would not need > active-active > > If your infrastructure is sufficiently advanced, active/active can be a lot > easier to manage than active/standby. If you are starting from scratch I'd > arc in that direction. Instead of migrating A->B->C->D..., active/active is > more like having one big cluster. > > > secondary.primary.topic1 > > I'd recommend using regex subscriptions where possible, so that apps don't > need to worry about these potentially complex topic names. > > > An additional question. If the topic is compacted, i.e.., the topic keeps > > forever, does switchover operations would imply add an additional path in > > the topic name? > > I think that's right. You could always clean things up manually, but > migrating between clusters a bunch of times would leave a trail of > replication hops. > > Also, you might look into implementing a custom ReplicationPolicy. For > example, you could squash "secondary.primary.topic1" into something shorter > if you like. > > Ryanne > > On Mon, Feb 10, 2020 at 1:24 PM benitocm wrote: > > > Hi, > > > > After having a look to the talk > > > > > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0 > > and the > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382 > > I am trying to understand how I would use it > > in the setup that I have. 
For now, we just need to handle DR > requirements, > > i.e., we would not need active-active > > > > My requirements, more or less, are the following: > > > > 1) Currently, we have just one Kafka cluster "primary" where all the > > producers are producing to and where all the consumers are consuming > from. > > 2) In case "primary" crashes, we would need to have other Kafka cluster > > "secondary" where we will move all the producer and consumers and keep > > working. > > 3) Once "primary" is recovered, we would need to move to it again (as we > > were in #1) > > > > To fullfill #2, I have thought to have a new Kafka cluster "secondary" > and > > setup a replication procedure using MM2. However, it is not clear to me > how > > to proceed. > > > > I would describe the high level details so you guys can point my > > misconceptions: > > > > A) Initial situation. As in the example of the KIP-382, in the primary > > cluster, we will have a local topic: "topic1" where the producers will > > produce to and the consumers will consume from. MM2 will create in the > > primary the remote topic "primary.topic1" where the local topic in the > > primary will be replicated. In addition, the consumer group information > of > > primary will be also replicated. > > > > B) Kafka primary cluster is not available. Producers are moved to produce > > into the topic1 that it was manually created. In addition, consumers need > > to connect to > > secondary to consume the local topic "topic1" where the producers are now > > producing and from the remote topic "primary.topic1" where the producers > > were producing before, i.e., consumers will need to aggregate.This is so > > because some consumers could have lag so they will need to consume from > > both. 
In this situation, local topic "topic1" in the secondary will be > > modified with new messages and will be consumed (its consumption > > information will also change) but the remote topic "primary.topic1" will > > not receive new messages but it will be consumed (its consumption > > information will change) > > > > At this point, my conclusion is that consumers needs to consume from both > > topics (the new messages produced in the local topic and the old messages > > for consumers that had a lag) > > > > C) primary cluster is recovered (here is when the things get complicated > > for me). In the talk, the new
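The regex-subscription advice quoted above can be checked offline: a pattern like the one below matches the local topic plus any chain of remote prefixes, so consumers don't need to know which replication hops a topic has been through. The allowed alias character set is an assumption:

```java
import java.util.regex.Pattern;

public class TopicSubscription {
    // Matches "topic1" as well as "primary.topic1",
    // "secondary.primary.topic1", etc.
    static final Pattern TOPIC1_ANYWHERE =
        Pattern.compile("([A-Za-z0-9_-]+\\.)*topic1");

    public static void main(String[] args) {
        System.out.println(TOPIC1_ANYWHERE.matcher("topic1").matches());                   // true
        System.out.println(TOPIC1_ANYWHERE.matcher("secondary.primary.topic1").matches()); // true
        System.out.println(TOPIC1_ANYWHERE.matcher("topic2").matches());                   // false
    }
}
```

A consumer would then subscribe with this pattern via `KafkaConsumer.subscribe(Pattern, ...)` instead of a fixed topic name.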
MM2 for DR
Hi, After having a look at the talk https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0 and at https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382 I am trying to understand how I would use it in the setup that I have. For now, we just need to handle DR requirements, i.e., we would not need active-active. My requirements, more or less, are the following: 1) Currently, we have just one Kafka cluster, "primary", where all the producers are producing to and where all the consumers are consuming from. 2) In case "primary" crashes, we would need to have another Kafka cluster, "secondary", where we would move all the producers and consumers and keep working. 3) Once "primary" is recovered, we would need to move back to it (as we were in #1). To fulfill #2, I have thought of setting up a new Kafka cluster "secondary" and a replication procedure using MM2. However, it is not clear to me how to proceed. I will describe the high-level details so you can point out my misconceptions: A) Initial situation. As in the example of KIP-382, in the primary cluster we will have a local topic, "topic1", where the producers will produce to and the consumers will consume from. MM2 will create in the secondary the remote topic "primary.topic1", where the local topic of the primary will be replicated. In addition, the consumer group information of the primary will also be replicated. B) Kafka primary cluster is not available. Producers are moved to produce into the local "topic1" that was manually created in the secondary. In addition, consumers need to connect to the secondary to consume from the local topic "topic1", where the producers are now producing, and from the remote topic "primary.topic1", where the producers were producing before, i.e., consumers will need to aggregate both. This is because some consumers could have lag, so they will need to consume from both. 
In this situation, the local topic "topic1" in the secondary will receive new messages and will be consumed (its consumption information will also change), while the remote topic "primary.topic1" will not receive new messages but will still be consumed (its consumption information will change). At this point, my conclusion is that consumers need to consume from both topics (the new messages produced in the local topic and the old messages, for consumers that had lag). C) The primary cluster is recovered (here is where things get complicated for me). In the talk, the recovered primary is renamed primary-2 and MM2 is configured for active-active replication. The result is the following. The secondary cluster will end up with a new remote topic (primary-2.topic1) that will contain a replica of the new topic1 created in the primary-2 cluster. The primary-2 cluster will have 3 topics: "topic1", a new topic where in the near future producers will produce; "secondary.topic1", which contains the replica of the local topic "topic1" in the secondary; and "secondary.primary.topic1", which is "topic1" of the old primary (obtained through the secondary). D) Once all the replicas are in sync, producers and consumers will be moved to primary-2. Producers will produce to the local topic "topic1" of the primary-2 cluster. The consumers will connect to primary-2 to consume from "topic1" (new messages that come in), "secondary.topic1" (messages produced during the outage) and "secondary.primary.topic1" (old messages). If topics have a retention time, e.g. 7 days, we could remove "secondary.primary.topic1" after a few days, leaving the situation as at the beginning. However, if another problem happens in the middle, the number of topics could become a little difficult to handle. An additional question: if the topic is compacted, i.e., the data is kept forever, would switchover operations imply adding an additional prefix to the topic name? I would appreciate some guidance with this. Regards
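The topic-name accumulation described above follows MM2's default naming, where a replicated topic is prefixed with its source cluster alias. A small sketch of how the prefixes pile up across failovers:

```java
public class RemoteTopicNames {
    // MM2's DefaultReplicationPolicy names a replicated topic
    // "<sourceClusterAlias>.<topic>".
    static String remoteTopic(String sourceClusterAlias, String topic) {
        return sourceClusterAlias + "." + topic;
    }

    public static void main(String[] args) {
        String original = "topic1";
        // primary -> secondary replication:
        String onSecondary = remoteTopic("primary", original);
        // secondary -> primary-2 replication after failover:
        String onPrimary2 = remoteTopic("secondary", onSecondary);
        System.out.println(onSecondary); // primary.topic1
        System.out.println(onPrimary2);  // secondary.primary.topic1
    }
}
```

Each switchover adds one more prefix, which is why a compacted (never-expiring) topic would keep growing its name unless a custom ReplicationPolicy shortens it.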