Re: [External] Re: MM2 for DR

2020-02-12 Thread benitocm
Hi,

I knew about uReplicator but I discarded it because it uses Helix. Anyway, I
will have a look at the talk.

Thanks very much

On Wed, Feb 12, 2020 at 9:59 PM Brian Sang  wrote:

> Not sure if you saw this before, but you might be interested in some of the
> work Uber has done for inter-cluster replication and federation. They of
> course use their own tool, uReplicator (https://github.com/uber/uReplicator),
> instead of MirrorMaker, but you should be able to draw the same insights
> as them.
>
> This talk from Kafka Summit SF 2019 wasn't about uReplicator per se, but
> goes over some problems they experienced and addressed w.r.t. inter-cluster
> replication and federation:
>
> https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-cluster-federation-at-uber
>
> On Tue, Feb 11, 2020 at 10:05 PM benitocm  wrote:
>
> > Hi Ryanne,
> >
> > Please could you elaborate a bit more about the active-active
> > recommendation?
> >
> > Thanks in advance
> >
> > On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:
> >
> > > Thanks very much for the response.
> > >
> > > Please could you elaborate a bit more about "I'd arc in that
> > > direction. Instead of migrating A->B->C->D..., active/active is
> > > more like having one big cluster".
> > >
> > > Another thing that I would like to share is that currently my consumers
> > > only consume from one topic, so introducing MM2 will impact them.
> > > Any suggestion in this regard would be greatly appreciated.
> > >
> > > Thanks in advance again!
> > >
> > >
> > > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> > > wrote:
> > >
> > >> Hello, sounds like you have this all figured out actually. A couple
> > >> notes:
> > >>
> > >> > For now, we just need to handle DR requirements, i.e., we would not
> > >> > need active-active
> > >>
> > >> If your infrastructure is sufficiently advanced, active/active can be a
> > >> lot easier to manage than active/standby. If you are starting from
> > >> scratch I'd arc in that direction. Instead of migrating A->B->C->D...,
> > >> active/active is more like having one big cluster.
> > >>
> > >> > secondary.primary.topic1
> > >>
> > >> I'd recommend using regex subscriptions where possible, so that apps
> > >> don't need to worry about these potentially complex topic names.
> > >>
> > >> > An additional question. If the topic is compacted, i.e., the topic
> > >> > keeps its data forever, would switchover operations imply adding an
> > >> > additional path in the topic name?
> > >>
> > >> I think that's right. You could always clean things up manually, but
> > >> migrating between clusters a bunch of times would leave a trail of
> > >> replication hops.
> > >>
> > >> Also, you might look into implementing a custom ReplicationPolicy. For
> > >> example, you could squash "secondary.primary.topic1" into something
> > >> shorter if you like.
> > >>
> > >> Ryanne
> > >>
> > >> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > After having a look at the talk
> > >> > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> > >> > and the KIP
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> > >> > I am trying to understand how I would use it
> > >> > in the setup that I have. For now, we just need to handle DR
> > >> > requirements, i.e., we would not need active-active.
> > >> >
> > >> > My requirements, more or less, are the following:
> > >> >
> > >> > 1) Currently, we have just one Kafka cluster "primary" where all the
> > >> > producers are producing to and where all the consumers are consuming
> > >> > from.
> > >> > 2) In case "primary" crashes, we would need to have another Kafka
> > >> > cluster "secondary" where we will move all the producers and consumers
> > >> > and keep working.
> > >> > 3) Once "primary" is recovered, we would need to move back to it (as
> > >> > we were in #1).
> > >> >
> > >> > To fulfill #2, I have thought of having a new Kafka cluster "secondary"
> > >> > and setting up a replication procedure using MM2. However, it is not
> > >> > clear to me how to proceed.
> > >> >
> > >> > I will describe the high-level details so you guys can point out my
> > >> > misconceptions:
> > >> >
> > >> > A) Initial situation. As in the example of KIP-382, in the primary
> > >> > cluster, we will have a local topic "topic1" where the producers will
> > >> > produce to and the consumers will consume from. MM2 will create in the
> > >> > secondary the remote topic "primary.topic1" where the local topic in
> > >> > the primary will be replicated. In addition, the consumer group
> > >> > information of primary will also be replicated.
> > >> >
> > >> > B) Kafka primary cluster is not available. Producers are moved to
> > >> > produce into the topic1 that it 

Re: [External] Re: MM2 for DR

2020-02-12 Thread Brian Sang
Not sure if you saw this before, but you might be interested in some of the
work Uber has done for inter-cluster replication and federation. They of
course use their own tool, uReplicator (https://github.com/uber/uReplicator),
instead of MirrorMaker, but you should be able to draw the same insights
as them.

This talk from Kafka Summit SF 2019 wasn't about uReplicator per se, but
goes over some problems they experienced and addressed w.r.t. inter-cluster
replication and federation:
https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-cluster-federation-at-uber

On Tue, Feb 11, 2020 at 10:05 PM benitocm  wrote:

> Hi Ryanne,
>
> Please could you elaborate a bit more about the active-active
> recommendation?
>
> Thanks in advance
>
> On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:
>
> > Thanks very much for the response.
> >
> > Please could you elaborate a bit more about "I'd arc in that
> > direction. Instead of migrating A->B->C->D..., active/active is
> > more like having one big cluster".
> >
> > Another thing that I would like to share is that currently my consumers
> > only consume from one topic, so introducing MM2 will impact them.
> > Any suggestion in this regard would be greatly appreciated.
> >
> > Thanks in advance again!
> >
> >
> > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> > wrote:
> >
> >> Hello, sounds like you have this all figured out actually. A couple
> >> notes:
> >>
> >> > For now, we just need to handle DR requirements, i.e., we would not
> >> > need active-active
> >>
> >> If your infrastructure is sufficiently advanced, active/active can be a
> >> lot easier to manage than active/standby. If you are starting from
> >> scratch I'd arc in that direction. Instead of migrating A->B->C->D...,
> >> active/active is more like having one big cluster.
> >>
> >> > secondary.primary.topic1
> >>
> >> I'd recommend using regex subscriptions where possible, so that apps
> >> don't need to worry about these potentially complex topic names.
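
To make the regex subscription idea concrete, here is a minimal sketch in
Python that only models the matching; it does not talk to Kafka. The pattern
and the helper name matching_topics are illustrative assumptions, not part of
MM2. With the Java client, a compiled pattern of this kind would be passed to
KafkaConsumer.subscribe(Pattern).

```python
import re

# Hypothetical pattern: "topic1" preceded by any number of "<cluster alias>."
# prefixes, as produced by successive replication hops.
TOPIC1_PATTERN = re.compile(r"([A-Za-z0-9_-]+\.)*topic1")


def matching_topics(pattern, topics):
    """Return the topic names that a regex subscription would pick up."""
    return [t for t in topics if pattern.fullmatch(t)]


topics = [
    "topic1",                    # local topic
    "primary.topic1",            # replicated once
    "secondary.primary.topic1",  # replicated twice
    "topic10",                   # unrelated topic, must not match
    "primary.topic2",            # different base topic, must not match
]
print(matching_topics(TOPIC1_PATTERN, topics))
```

Note that the sketch assumes cluster aliases contain no dots; if they do, the
pattern would have to be adjusted.
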
> >>
> >> > An additional question. If the topic is compacted, i.e., the topic
> >> > keeps its data forever, would switchover operations imply adding an
> >> > additional path in the topic name?
> >>
> >> I think that's right. You could always clean things up manually, but
> >> migrating between clusters a bunch of times would leave a trail of
> >> replication hops.
> >>
> >> Also, you might look into implementing a custom ReplicationPolicy. For
> >> example, you could squash "secondary.primary.topic1" into something
> >> shorter if you like.
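
As a rough illustration of the "trail of replication hops" and of what
squashing could look like, here is a small Python model of the naming logic
only. It is not the MM2 ReplicationPolicy interface (that is a Java
interface), and the squashing rule and all function names below are
assumptions made for the example.

```python
SEPARATOR = "."  # separator between cluster alias and topic name


def default_remote_topic(source_alias: str, topic: str) -> str:
    """Default-style naming: prefix the topic with the source cluster alias."""
    return f"{source_alias}{SEPARATOR}{topic}"


def squashed_remote_topic(source_alias: str, topic: str) -> str:
    """Hypothetical squashing rule: keep only the most recent hop.

    Assumes base topic names contain no separator character.
    """
    base = topic.rsplit(SEPARATOR, 1)[-1]
    return f"{source_alias}{SEPARATOR}{base}"


def replicate(name_fn, topic: str, hops: list) -> str:
    """Apply a naming function once per replication hop, in order."""
    for alias in hops:
        topic = name_fn(alias, topic)
    return topic


hops = ["primary", "secondary"]
print(replicate(default_remote_topic, "topic1", hops))
print(replicate(squashed_remote_topic, "topic1", hops))
```

With the default-style rule every hop adds a prefix, while the squashed
variant keeps a single prefix. A real custom policy would also need to keep
enough information for MM2 to tell where a topic came from, which this sketch
ignores.
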
> >>
> >> Ryanne
> >>
> >> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
> >>
> >> > Hi,
> >> >
> >> > After having a look at the talk
> >> > https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> >> > and the KIP
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> >> > I am trying to understand how I would use it
> >> > in the setup that I have. For now, we just need to handle DR
> >> > requirements, i.e., we would not need active-active.
> >> >
> >> > My requirements, more or less, are the following:
> >> >
> >> > 1) Currently, we have just one Kafka cluster "primary" where all the
> >> > producers are producing to and where all the consumers are consuming
> >> > from.
> >> > 2) In case "primary" crashes, we would need to have another Kafka
> >> > cluster "secondary" where we will move all the producers and consumers
> >> > and keep working.
> >> > 3) Once "primary" is recovered, we would need to move back to it (as
> >> > we were in #1).
> >> >
> >> > To fulfill #2, I have thought of having a new Kafka cluster "secondary"
> >> > and setting up a replication procedure using MM2. However, it is not
> >> > clear to me how to proceed.
> >> >
> >> > I will describe the high-level details so you guys can point out my
> >> > misconceptions:
> >> >
> >> > A) Initial situation. As in the example of KIP-382, in the primary
> >> > cluster, we will have a local topic "topic1" where the producers will
> >> > produce to and the consumers will consume from. MM2 will create in the
> >> > secondary the remote topic "primary.topic1" where the local topic in
> >> > the primary will be replicated. In addition, the consumer group
> >> > information of primary will also be replicated.
> >> >
> >> > B) Kafka primary cluster is not available. Producers are moved to
> >> > produce into the topic1 that was manually created. In addition,
> >> > consumers need to connect to secondary to consume from the local topic
> >> > "topic1", where the producers are now producing, and from the remote
> >> > topic "primary.topic1", where the producers were producing before,
> >> > i.e., consumers will need to aggregate. This is so because some
> >> > consumers could have lag, so they will need to consume from both. In
> >> > this situation, local topic "topic1" in the