Re: Increased latency when consuming from the closest ISR

2022-05-11 Thread benitocm
Hi,

Thanks very much for the information. In
https://github.com/apache/kafka/pull/11942, it is said that the fetch request
goes to purgatory and waits for the wait timeout. Does that refer to
fetch.max.wait.ms? In our test, we have configured that parameter to 100
ms. If that is the case, do you think that part of the 400 ms is explained by
that PR?
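
For reference, this is roughly how we set those fetch parameters on the
consumer (a minimal sketch; broker address, topic and group names are
placeholders, not our real ones). My understanding is that the broker holds a
fetch in purgatory until either fetch.min.bytes of data is available or
fetch.max.wait.ms elapses:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchWaitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-test");          // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The broker answers a fetch as soon as fetch.min.bytes of data is available,
        // or after fetch.max.wait.ms (100 ms here), whichever comes first.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "100");
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder
            consumer.poll(Duration.ofMillis(500)).forEach(r ->
                    System.out.printf("%s@%d%n", r.topic(), r.offset()));
        }
    }
}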

Regards

On Wed, May 11, 2022 at 9:45 AM Luke Chen  wrote:

> Hi,
>
> We have some improvement for the preferred read replica configured case.
> Ex:
> https://github.com/apache/kafka/pull/11942
> https://github.com/apache/kafka/pull/11965
>
> I know one improvement will be included in the v3.2.0 release, which will
> be released soon.
> Maybe you can give it a try to see if it improves the throughput.
>
> Thank you.
> Luke
>
> On Wed, May 11, 2022 at 2:56 PM benitocm  wrote:
>
> > Hi,
> >
> > We are using the functionality provided by KIP-392 (a consumer can fetch
> > the data from a ISR replica instead of the partition leader) in a Kafka
> > cluster stretched between two very close DCs (average round-trip latency
> > about 2 milliseconds).
> >
> > What we have seen is that, on average, when the consumer is in the same
> DC
> > (configured by rack.id) as the partition leader (i.e. the consumer will
> > consume from the leader), the time that takes the message to get to the
> > consumer is close to 20 milliseconds. However, when the consumer is in a
> > different DC than the partition leader (the consumer will consume from a
> > replica that is in the same DC as the consumer) that latency goes to
> around
> > 400 milliseconds.
> >
> > We have also checked that if we dont configure  the rack.id in a
> consumer
> > to force  it to consume from the leader although the partition leader is
> a
> > different DC (i.e. the consumer is in DC1 and the partition leader is in
> > DC2 so the consumer goes from a DC to the other DC) , the latency is
> > reduced to the 20 milliseconds.
> >
> > From those tests, we have concluded that consuming from a ISR replica
> > implies to have higher latencies.
> >
> > Please does anybody share any thoughts on this?
> >
> > Thanks in advance
> >
>


Increased latency when consuming from the closest ISR

2022-05-11 Thread benitocm
Hi,

We are using the functionality provided by KIP-392 (a consumer can fetch
data from an ISR replica instead of the partition leader) in a Kafka
cluster stretched between two very close DCs (average round-trip latency of
about 2 milliseconds).
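
For context, this is roughly how the rack awareness is configured on our
consumers (a sketch with placeholder addresses and names; on the broker side
we also set broker.rack per DC and
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RackAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-dc1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-test");             // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Must match the broker.rack of the brokers in the same DC as this consumer,
        // so the broker-side RackAwareReplicaSelector picks the closest ISR replica.
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "DC1"); // placeholder rack id

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder
            consumer.poll(Duration.ofMillis(100)).forEach(r ->
                    System.out.printf("%s@%d%n", r.topic(), r.offset()));
        }
    }
}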

What we have seen is that, on average, when the consumer is in the same DC
(configured via client.rack) as the partition leader (i.e. the consumer
consumes from the leader), the time it takes for a message to reach the
consumer is close to 20 milliseconds. However, when the consumer is in a
different DC than the partition leader (so the consumer consumes from a
replica in its own DC), that latency goes up to around 400 milliseconds.

We have also checked that if we don't configure client.rack on a consumer,
forcing it to consume from the leader even though the partition leader is in
a different DC (i.e. the consumer is in DC1 and the partition leader is in
DC2, so the consumer crosses from one DC to the other), the latency drops
back to around 20 milliseconds.

From those tests, we have concluded that consuming from an ISR replica
implies higher latencies.

Could anybody please share any thoughts on this?

Thanks in advance


How to reduce the latency to interact with a topic?

2021-05-24 Thread benitocm
Hi all,

I am using a Kafka topic to handle invalidation events inside a system A
that consists of different nodes. When a node of System A detects a
situation that requires invalidation, it produces an event to the
invalidation topic. The rest of the nodes in System A consume that
invalidation topic to be aware of those invalidations and process them.

My question is: how can I configure the producers and consumers of that
topic to minimize the end-to-end time of that scenario? I mean I am
interested in reducing both the time it takes for an event to be written
into Kafka and the time it takes for a consumer to read those events.
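
For concreteness, this is the kind of producer tuning I have in mind for the
invalidation events (a sketch; broker address, topic and key/value are
placeholders, and the values are only my starting assumptions, which is partly
why I am asking). On the consumer side I would set fetch.min.bytes=1 and a
small fetch.max.wait.ms so the broker returns new events as soon as they
arrive:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InvalidationProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.LINGER_MS_CONFIG, "0"); // send immediately, no batching delay
        props.put(ProducerConfig.ACKS_CONFIG, "1");      // lower latency, weaker durability than acks=all

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Placeholder invalidation event.
            producer.send(new ProducerRecord<>("invalidation-topic", "cache-key-42", "invalidate"));
            producer.flush();
        }
    }
}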

Thanks in advance


Kafka Connect metrics as Yammer metrics

2021-05-21 Thread benitocm
Hi,

Currently I am using the library
https://github.com/RTBHOUSE/kafka-graphite-reporter in the brokers and
clients to extract metrics and publish them into Graphite. This can be done
for the metrics that are available as Yammer metrics. The thing is that I
wanted to do the same with the Connect metrics
https://kafka.apache.org/documentation/#connect_monitoring but I have not
been able to. Please, could anybody tell me whether Kafka Connect metrics
are available as Yammer metrics?
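
In case it helps frame the question: if Connect only exposes these metrics
through Kafka's own metrics API rather than Yammer, I assume a reporter along
the lines of the rough sketch below (registered through the metric.reporters
worker property) would be the alternative. The Graphite part is just a
placeholder print; I have not tried this:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

public class GraphiteSketchReporter implements MetricsReporter {
    private final Map<MetricName, KafkaMetric> metrics = new ConcurrentHashMap<>();

    @Override
    public void configure(Map<String, ?> configs) {
        // Graphite host/port would be read from the worker config here (placeholder).
    }

    @Override
    public void init(List<KafkaMetric> initialMetrics) {
        initialMetrics.forEach(m -> metrics.put(m.metricName(), m));
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        metrics.put(metric.metricName(), metric);
        // Placeholder for "send to Graphite": just print group, name and current value.
        System.out.printf("%s.%s = %s%n",
                metric.metricName().group(), metric.metricName().name(), metric.metricValue());
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        metrics.remove(metric.metricName());
    }

    @Override
    public void close() {
        metrics.clear();
    }
}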

Thanks in advance


Re: MM2 for DR

2020-02-12 Thread benitocm
Thanks Ryanne.
However, what I don't understand clearly is: in the case the primary crashes,
what do I need to do in the secondary? For example, the LB at the entrance
will just be pointed at the secondary front-end, so all the data will be
injected into the local topic "topic1" in the secondary, which will be
consumed by the consumer instances that were connected to the "secondary".
What happens with the remote topic in the secondary, "primary.topic1"?
Once the "primary" comes back up, the replication process will sync the data,
so events added to the local topic "topic1" in the secondary will be copied
to the remote topic in the primary, "secondary.topic1".
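
To make the question concrete, this is a rough sketch of what I understand
the offset side of a failover to the secondary would look like, based on the
RemoteClusterUtils helper described in KIP-382 (I have not verified this;
addresses, aliases and group names are placeholders):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class FailoverOffsetsSketch {
    public static void main(String[] args) throws Exception {
        // Connection to the secondary cluster, where MM2 has been writing checkpoints.
        Map<String, Object> secondary = new HashMap<>();
        secondary.put("bootstrap.servers", "secondary-broker:9092"); // placeholder

        // Translate the group's committed offsets on "primary" topics into offsets
        // on the corresponding "primary.*" remote topics in the secondary.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(secondary, "primary",
                        "my-consumer-group", Duration.ofSeconds(30)); // placeholder group

        // These could then be committed (e.g. consumer.commitSync(translated))
        // before restarting the consumer group against the secondary.
        translated.forEach((tp, om) ->
                System.out.printf("%s -> offset %d%n", tp, om.offset()));
    }
}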
Thanks again


On Thu, Feb 13, 2020 at 12:42 AM Ryanne Dolan  wrote:

> > elaborate a bit more about the active-active
>
> Active/active in this context just means that both (or multiple)
> clusters are used under normal operation, not just during an outage.
> For this to work, you basically have isolated instances of your application
> stack running in each DC, with MM2 keeping each DC in sync. If one DC is
> unavailable, traffic is shifted to another DC. It's possible to set this up
> s.t. failover/failback between DCs happens automatically and seamlessly,
> e.g. with load balancers and health checks. It's more complicated to set up
> than the active/standby approach, but DR sorta takes care of itself from
> then on. I frequently demo this stuff, where I pull the plug on entire DCs
> and apps keep running like nothing happened.
>
> On Wed, Feb 12, 2020 at 12:05 AM benitocm  wrote:
>
> > Hi Ryanne,
> >
> > Please could you elaborate a bit more about the active-active
> > recommendation?
> >
> > Thanks in advance
> >
> > On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:
> >
> > > Thanks very much for the response.
> > >
> > > Please could you elaborate a bit more about  "I'd
> > > arc in that direction. Instead of migrating A->B->C->D...,
> active/active
> > is
> > > more like having one big cluster".
> > >
> > > Another thing that I would like to share is that currently my consumers
> > > only consumer from one topic so the fact of introducing MM2 will impact
> > > them.
> > > Any suggestion in this regard would be greatly appreciated
> > >
> > > Thanks in advance again!
> > >
> > >
> > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> > > wrote:
> > >
> > >> Hello, sounds like you have this all figured out actually. A couple
> > notes:
> > >>
> > >> > For now, we just need to handle DR requirements, i.e., we would not
> > need
> > >> active-active
> > >>
> > >> If your infrastructure is sufficiently advanced, active/active can be
> a
> > >> lot
> > >> easier to manage than active/standby. If you are starting from scratch
> > I'd
> > >> arc in that direction. Instead of migrating A->B->C->D...,
> active/active
> > >> is
> > >> more like having one big cluster.
> > >>
> > >> > secondary.primary.topic1
> > >>
> > >> I'd recommend using regex subscriptions where possible, so that apps
> > don't
> > >> need to worry about these potentially complex topic names.
> > >>
> > >> > An additional question. If the topic is compacted, i.e.., the topic
> > >> keeps
> > >> > forever, does switchover operations would imply add an additional
> path
> > >> in
> > >> > the topic name?
> > >>
> > >> I think that's right. You could always clean things up manually, but
> > >> migrating between clusters a bunch of times would leave a trail of
> > >> replication hops.
> > >>
> > >> Also, you might look into implementing a custom ReplicationPolicy. For
> > >> example, you could squash "secondary.primary.topic1" into something
> > >> shorter
> > >> if you like.
> > >>
> > >> Ryanne
> > >>
> > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > After having a look to the talk
> > >> >
> > >> >
> > >>
> >
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > >> > and the
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+Mirror

Re: [External] Re: MM2 for DR

2020-02-12 Thread benitocm
Hi,

I knew about uReplicator but I discarded it because it uses Helix. Anyway, I
will have a look at the talk.

Thanks very much

On Wed, Feb 12, 2020 at 9:59 PM Brian Sang  wrote:

> Not sure if you saw this before, but you might be interested in some of the
> work Uber has done for inter-cluster replication and federation. They of
> course use their own tool, uReplicator (
> https://github.com/uber/uReplicator),
> instead of mirror maker, but you should be able to draw the same insights
> as them.
>
> This talk from kafka summit SF 2019 wasn't about ureplicator per se, but
> goes over some problems they experienced and addressed w.r.t. inter-cluster
> replication and federation:
>
> https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-cluster-federation-at-uber
>
> On Tue, Feb 11, 2020 at 10:05 PM benitocm  wrote:
>
> > Hi Ryanne,
> >
> > Please could you elaborate a bit more about the active-active
> > recommendation?
> >
> > Thanks in advance
> >
> > On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:
> >
> > > Thanks very much for the response.
> > >
> > > Please could you elaborate a bit more about  "I'd
> > > arc in that direction. Instead of migrating A->B->C->D...,
> active/active
> > is
> > > more like having one big cluster".
> > >
> > > Another thing that I would like to share is that currently my consumers
> > > only consumer from one topic so the fact of introducing MM2 will impact
> > > them.
> > > Any suggestion in this regard would be greatly appreciated
> > >
> > > Thanks in advance again!
> > >
> > >
> > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> > > wrote:
> > >
> > >> Hello, sounds like you have this all figured out actually. A couple
> > notes:
> > >>
> > >> > For now, we just need to handle DR requirements, i.e., we would not
> > need
> > >> active-active
> > >>
> > >> If your infrastructure is sufficiently advanced, active/active can be
> a
> > >> lot
> > >> easier to manage than active/standby. If you are starting from scratch
> > I'd
> > >> arc in that direction. Instead of migrating A->B->C->D...,
> active/active
> > >> is
> > >> more like having one big cluster.
> > >>
> > >> > secondary.primary.topic1
> > >>
> > >> I'd recommend using regex subscriptions where possible, so that apps
> > don't
> > >> need to worry about these potentially complex topic names.
> > >>
> > >> > An additional question. If the topic is compacted, i.e.., the topic
> > >> keeps
> > >> > forever, does switchover operations would imply add an additional
> path
> > >> in
> > >> > the topic name?
> > >>
> > >> I think that's right. You could always clean things up manually, but
> > >> migrating between clusters a bunch of times would leave a trail of
> > >> replication hops.
> > >>
> > >> Also, you might look into implementing a custom ReplicationPolicy. For
> > >> example, you could squash "secondary.primary.topic1" into something
> > >> shorter
> > >> if you like.
> > >>
> > >> Ryanne
> > >>
> > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > After having a look to the talk
> > >> >
> > >> >
> > >>
> >
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > >> > and the
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> > >> > I am trying to understand how I would use it
> > >> > in the setup that I have. For now, we just need to handle DR
> > >> requirements,
> > >> > i.e., we would not need active-active
> > >> >
> > >> > My requirements, more or less, are the following:
> > >> >
> > >> > 1) Currently, we have just one Kafka cluster "primary" where all the
> > >> > producers are producing to and where all the consumers are consuming
> > >> from.
> > >> > 2) In case "primary" crashes, we would need to have other Kafka
> > cluster
> >

Re: MM2 for DR

2020-02-11 Thread benitocm
Hi Ryanne,

Please could you elaborate a bit more about the active-active
recommendation?

Thanks in advance

On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:

> Thanks very much for the response.
>
> Please could you elaborate a bit more about  "I'd
> arc in that direction. Instead of migrating A->B->C->D..., active/active is
> more like having one big cluster".
>
> Another thing that I would like to share is that currently my consumers
> only consumer from one topic so the fact of introducing MM2 will impact
> them.
> Any suggestion in this regard would be greatly appreciated
>
> Thanks in advance again!
>
>
> On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> wrote:
>
>> Hello, sounds like you have this all figured out actually. A couple notes:
>>
>> > For now, we just need to handle DR requirements, i.e., we would not need
>> active-active
>>
>> If your infrastructure is sufficiently advanced, active/active can be a
>> lot
>> easier to manage than active/standby. If you are starting from scratch I'd
>> arc in that direction. Instead of migrating A->B->C->D..., active/active
>> is
>> more like having one big cluster.
>>
>> > secondary.primary.topic1
>>
>> I'd recommend using regex subscriptions where possible, so that apps don't
>> need to worry about these potentially complex topic names.
>>
>> > An additional question. If the topic is compacted, i.e.., the topic
>> keeps
>> > forever, does switchover operations would imply add an additional path
>> in
>> > the topic name?
>>
>> I think that's right. You could always clean things up manually, but
>> migrating between clusters a bunch of times would leave a trail of
>> replication hops.
>>
>> Also, you might look into implementing a custom ReplicationPolicy. For
>> example, you could squash "secondary.primary.topic1" into something
>> shorter
>> if you like.
>>
>> Ryanne
>>
>> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
>>
>> > Hi,
>> >
>> > After having a look to the talk
>> >
>> >
>> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
>> > and the
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
>> > I am trying to understand how I would use it
>> > in the setup that I have. For now, we just need to handle DR
>> requirements,
>> > i.e., we would not need active-active
>> >
>> > My requirements, more or less, are the following:
>> >
>> > 1) Currently, we have just one Kafka cluster "primary" where all the
>> > producers are producing to and where all the consumers are consuming
>> from.
>> > 2) In case "primary" crashes, we would need to have other Kafka cluster
>> > "secondary" where we will move all the producer and consumers and keep
>> > working.
>> > 3) Once "primary" is recovered, we would need to move to it again (as we
>> > were in #1)
>> >
>> > To fullfill #2, I have thought to have a new Kafka cluster "secondary"
>> and
>> > setup a replication procedure using MM2. However, it is not clear to me
>> how
>> > to proceed.
>> >
>> > I would describe the high level details so you guys can point my
>> > misconceptions:
>> >
>> > A) Initial situation. As in the example of the KIP-382, in the primary
>> > cluster, we will have a local topic: "topic1" where the producers will
>> > produce to and the consumers will consume from. MM2 will create in  the
>> > primary the remote topic "primary.topic1" where the local topic in the
>> > primary will be replicated. In addition, the consumer group information
>> of
>> > primary will be also replicated.
>> >
>> > B) Kafka primary cluster is not available. Producers are moved to
>> produce
>> > into the topic1 that it was manually created. In addition, consumers
>> need
>> > to connect to
>> > secondary to consume the local topic "topic1" where the producers are
>> now
>> > producing and from the remote topic  "primary.topic1" where the
>> producers
>> > were producing before, i.e., consumers will need to aggregate.This is so
>> > because some consumers could have lag so they will need to consume from
>> > both. In this situation, local topic "topic1" in the secon

Re: MM2 for DR

2020-02-10 Thread benitocm
Thanks very much for the response.

Could you please elaborate a bit more on "I'd
arc in that direction. Instead of migrating A->B->C->D..., active/active is
more like having one big cluster"?

Another thing that I would like to share is that currently my consumers
only consume from one topic, so introducing MM2 will impact them.
Any suggestion in this regard would be greatly appreciated.

Thanks in advance again!


On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan  wrote:

> Hello, sounds like you have this all figured out actually. A couple notes:
>
> > For now, we just need to handle DR requirements, i.e., we would not need
> active-active
>
> If your infrastructure is sufficiently advanced, active/active can be a lot
> easier to manage than active/standby. If you are starting from scratch I'd
> arc in that direction. Instead of migrating A->B->C->D..., active/active is
> more like having one big cluster.
>
> > secondary.primary.topic1
>
> I'd recommend using regex subscriptions where possible, so that apps don't
> need to worry about these potentially complex topic names.
>
> > An additional question. If the topic is compacted, i.e.., the topic keeps
> > forever, does switchover operations would imply add an additional path in
> > the topic name?
>
> I think that's right. You could always clean things up manually, but
> migrating between clusters a bunch of times would leave a trail of
> replication hops.
>
> Also, you might look into implementing a custom ReplicationPolicy. For
> example, you could squash "secondary.primary.topic1" into something shorter
> if you like.
>
> Ryanne
>
> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
>
> > Hi,
> >
> > After having a look to the talk
> >
> >
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > and the
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> > I am trying to understand how I would use it
> > in the setup that I have. For now, we just need to handle DR
> requirements,
> > i.e., we would not need active-active
> >
> > My requirements, more or less, are the following:
> >
> > 1) Currently, we have just one Kafka cluster "primary" where all the
> > producers are producing to and where all the consumers are consuming
> from.
> > 2) In case "primary" crashes, we would need to have other Kafka cluster
> > "secondary" where we will move all the producer and consumers and keep
> > working.
> > 3) Once "primary" is recovered, we would need to move to it again (as we
> > were in #1)
> >
> > To fullfill #2, I have thought to have a new Kafka cluster "secondary"
> and
> > setup a replication procedure using MM2. However, it is not clear to me
> how
> > to proceed.
> >
> > I would describe the high level details so you guys can point my
> > misconceptions:
> >
> > A) Initial situation. As in the example of the KIP-382, in the primary
> > cluster, we will have a local topic: "topic1" where the producers will
> > produce to and the consumers will consume from. MM2 will create in  the
> > primary the remote topic "primary.topic1" where the local topic in the
> > primary will be replicated. In addition, the consumer group information
> of
> > primary will be also replicated.
> >
> > B) Kafka primary cluster is not available. Producers are moved to produce
> > into the topic1 that it was manually created. In addition, consumers need
> > to connect to
> > secondary to consume the local topic "topic1" where the producers are now
> > producing and from the remote topic  "primary.topic1" where the producers
> > were producing before, i.e., consumers will need to aggregate.This is so
> > because some consumers could have lag so they will need to consume from
> > both. In this situation, local topic "topic1" in the secondary will be
> > modified with new messages and will be consumed (its consumption
> > information will also change) but the remote topic "primary.topic1" will
> > not receive new messages but it will be consumed  (its consumption
> > information will change)
> >
> > At this point, my conclusion is that consumers needs to consume from both
> > topics (the new messages produced in the local topic and the old messages
> > for consumers that had a lag)
> >
> > C) primary cluster is recovered (here is when the things get complicated
> > for me). In the talk, the new 

MM2 for DR

2020-02-10 Thread benitocm
Hi,

After having a look at the talk
https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
and at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
I am trying to understand how I would use MM2 in the setup that I have. For
now, we just need to handle DR requirements, i.e., we would not need
active-active.

My requirements, more or less, are the following:

1) Currently, we have just one Kafka cluster "primary" where all the
producers are producing to and where all the consumers are consuming from.
2) In case "primary" crashes, we would need another Kafka cluster
"secondary" to which we would move all the producers and consumers and keep
working.
3) Once "primary" is recovered, we would need to move back to it (as we
were in #1).

To fulfill #2, I have thought of setting up a new Kafka cluster "secondary"
and a replication procedure using MM2. However, it is not clear to me how to
proceed.

I will describe the high-level details so you guys can point out my
misconceptions:

A) Initial situation. As in the KIP-382 example, in the primary cluster we
will have a local topic "topic1" that the producers produce to and the
consumers consume from. MM2 will create in the secondary the remote topic
"primary.topic1", into which the local topic in the primary will be
replicated. In addition, the consumer group information of the primary will
also be replicated.

B) The Kafka primary cluster is not available. Producers are moved to produce
into the local topic "topic1" in the secondary, which was created manually. In
addition, consumers need to connect to the secondary and consume both from the
local topic "topic1", where the producers are now producing, and from the
remote topic "primary.topic1", where the producers were producing before,
i.e., consumers will need to aggregate. This is because some consumers could
have lag, so they will need to consume from both. In this situation, the local
topic "topic1" in the secondary will receive new messages and will be consumed
(its consumption information will also change), while the remote topic
"primary.topic1" will not receive new messages but will still be consumed (its
consumption information will change).

At this point, my conclusion is that consumers need to consume from both
topics (the new messages produced in the local topic and, for consumers that
had lag, the old messages), e.g. via a regex subscription as sketched below.
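
For that aggregation, I assume something like a regex subscription would let
the consumers see both the local topic and the MM2 remote copies without
hard-coding the prefixed names (a sketch; broker address, group id and the
exact pattern are placeholders):

import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AggregatingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "secondary-broker:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "topic1-consumers");               // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Matches "topic1" and any "<cluster>.topic1" remote topic created by MM2.
            consumer.subscribe(Pattern.compile("([A-Za-z0-9_-]+\\.)*topic1"));
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(r ->
                        System.out.printf("%s[%d]@%d: %s%n",
                                r.topic(), r.partition(), r.offset(), r.value()));
            }
        }
    }
}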

C) The primary cluster is recovered (this is where things get complicated
for me). In the talk, the new primary is renamed primary-2 and MM2 is
configured for active-active replication.
The result is the following. The secondary cluster will end up with a new
remote topic (primary-2.topic1) that will contain a replica of the new
"topic1" created in the primary-2 cluster. The primary-2 cluster will have 3
topics: "topic1", a new topic where producers will produce in the near
future; "secondary.topic1", which contains the replica of the local topic
"topic1" in the secondary; and "secondary.primary.topic1", which is the
"topic1" of the old primary (obtained through the secondary).

D) Once all the replicas are in sync, producers and consumers will be moved
to primary-2. Producers will produce to the local topic "topic1" of the
primary-2 cluster. The consumers will connect to primary-2 to consume from
"topic1" (new messages that come in), from "secondary.topic1" (messages
produced during the outage) and from "secondary.primary.topic1" (old
messages).

If topics have a retention time, e.g. 7 days, we could remove
"secondary.primary.topic1" after a few days, leaving the situation as at the
beginning. However, if another problem happens in the middle, the number of
topics could become a little difficult to handle.

An additional question: if the topic is compacted, i.e., the topic keeps its
data forever, would each switchover operation imply adding an additional
prefix to the topic name?

I would appreciate some guidance with this.

Regards