Re: Mirror Maker bidirectional offset sync

2024-01-11 Thread Ryanne Dolan
> > > > sets of the prefixed topic in the primary cluster.
> > > > In my example, the producer keeps producing in the primary cluster whereas
> > > > only the consumer fails over to the secondary cluster and, after some time,
> > > > fails back to the primary cluster. This consumer will then consume messages
> > > > from the prefixed topic in the secondary cluster, and I'd like to have
> > > > those offsets replicated back to the non-prefixed topic in the primary
> > > > cluster. If you like I can provide an illustration if that helps to
> > > > clarify this use case.
> > > >
> > > > To add some context on why I'd like to have this: it is to retain loose
> > > > coupling between producers and consumers, so we're able to test failovers
> > > > for individual applications without the need for all producers/consumers
> > > > to fail over and fail back at once.
> > > >
> > > > Digging through the Connect debug logs, I found that the offset syncs of
> > > > the prefixed topic not being pushed to mm2-offset-syncs.b.internal is
> > > > likely the reason the checkpoint connector doesn't replicate consumer
> > > > offsets:
> > > > DEBUG translateDownstream(replication,a.replicate-me-0,25): Skipped
> > > > (offset sync not found) (org.apache.kafka.connect.mirror.OffsetSyncStore)
> > > >
> > > > I wrote a small program to produce these offset syncs for the prefixed
> > > > topic, and this successfully triggers the Checkpoint connector to start
> > > > replicating the consumer offsets back to the primary cluster.
> > > > OffsetSync{topicPartition=replicate-me-0, upstreamOffset=28,
> > > > downstreamOffset=28}
> > > > OffsetSync{topicPartition=replicate-me-0, upstreamOffset=29,
> > > > downstreamOffset=29}
> > > > OffsetSync{topicPartition=a.replicate-me-0, upstreamOffset=29,
> > > > downstreamOffset=29} <-- the artificially generated offset sync
> > > >
> > > > At this point it goes a bit beyond my understanding of the MM2 internals
> > > > whether this is a wise thing to do and if it would have any negative side
> > > > effects. I'd need to spend some more time in the MM2 source, though I
> > > > welcome any feedback on this hack :-)
> > > >
> > > > On the two complications you're mentioning, Greg: the second one is
> > > > something we should figure out regardless, as any given consumer group
> > > > may not be active on both the primary and secondary clusters; that would
> > > > already block MM2 from replicating its offsets from the primary to the
> > > > cluster-prefixed topic on the secondary cluster. On the first point, I
> > > > think it would be good practice to only allow MM2 to produce to any
> > > > cluster-prefixed topic by using topic ACLs. In other words, the only
> > > > application producing to a cluster-prefixed (or downstream) topic would
> > > > be MirrorMaker, and I think that prevents this kind of message 'drift'.
> > > > In case a producer has to fail over, it starts producing to the
> > > > non-prefixed topic on the secondary cluster, whose offsets are subject
> > > > to a different Source/Checkpoint connector replication stream.
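
For reference, the "small program" described above could look roughly like the sketch below. Treat it as a sketch only: OffsetSync is an internal MM2 class whose recordKey()/recordValue() serializers are assumed here (they may change between releases), and the bootstrap address is hypothetical.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.connect.mirror.OffsetSync;

public class ArtificialOffsetSync {
    public static void main(String[] args) {
        Properties props = new Properties();
        // hypothetical: the cluster that hosts mm2-offset-syncs.b.internal
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        // map offset 29 of a.replicate-me-0 to offset 29, as in the example above
        OffsetSync sync = new OffsetSync(new TopicPartition("a.replicate-me", 0), 29L, 29L);

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // MM2 writes all offset syncs to partition 0 of the offset-syncs topic
            producer.send(new ProducerRecord<>("mm2-offset-syncs.b.internal", 0,
                    sync.recordKey(), sync.recordValue()));
            producer.flush();
        }
    }
}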


Re: Mirror Maker bidirectional offset sync

2024-01-08 Thread Ryanne Dolan
Jeroen, MirrorClient will correctly translate offsets for both failover and
failback, exactly as you describe. It's possible to automate failover and
failback using that logic. The integration tests automatically fail over
and fail back, for example. I've seen it done two ways: during startup
within the consumer itself, or in an external tool which writes offsets
directly. In either case MirrorClient will give you the correct offsets to
resume from.

MirrorCheckpointConnector will automatically write offsets, but only under
certain conditions, to avoid accidentally overwriting offsets. I'm not sure
whether you can failover and failback using just the automatic behavior. My
guess is it works, but you are tripping over one of the safety checks. You
might try deleting the consumer group on the source cluster prior to
failback.

Ryanne
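
For reference, the MirrorClient offset translation described above is exposed through RemoteClusterUtils; a minimal failback sketch (cluster aliases, group name, and bootstrap address are hypothetical):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class Failback {
    public static void main(String[] args) throws Exception {
        // point at the cluster being failed back to, which hosts the
        // checkpoints topic written by MirrorCheckpointConnector
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "primary-broker:9092");

        // translate the group's offsets committed while it ran on cluster "b"
        Map<TopicPartition, OffsetAndMetadata> offsets =
                RemoteClusterUtils.translateOffsets(props, "b", "my-group", Duration.ofSeconds(30));

        // seed the group before restarting consumers -- the "external tool
        // which writes offsets directly" approach described above
        try (Admin admin = Admin.create(props)) {
            admin.alterConsumerGroupOffsets("my-group", offsets).all().get();
        }
    }
}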

On Mon, Jan 8, 2024, 9:10 AM Jeroen Schutrup 
wrote:

> Hi all,
> I'm exploring using the MirrorSourceConnector and MirrorCheckpointConnector
> on Kafka Connect to set up active/active replication between two Kafka
> clusters. Using the DefaultReplicationPolicy replication policy class,
> messages originating from the source cluster get replicated as expected to
> the cluster-prefixed topic in the target cluster. Consumer group offsets
> from the source to the target cluster are replicated likewise. However, once
> the consumer group migrates from the source to the target cluster, its
> offsets are not replicated from the target back to the source cluster.
> For an active/active setup I'd want consumer group offsets for topic
> <cluster>.<topic> in the target cluster to be replicated back to <topic> in
> the source cluster. This would allow consumers to fail over & fail back
> between clusters with minimal duplicate message consumption.
>
> To clarify my setup a bit; I'm running two single-broker Kafka clusters in
> Docker (cluster A & B), along with a single Connect instance on which I've
> provisioned four source connectors:
> - A MirrorSourceConnector replicating topics from cluster A to cluster B
> - A MirrorSourceConnector replicating topics from cluster B to cluster A
> - A MirrorCheckpointConnector translating & replicating offsets from
> cluster A to cluster B
> - A MirrorCheckpointConnector translating & replicating offsets from
> cluster B to cluster A
>
> I'm not sure whether this is by design, or maybe I'm missing something.
> I've seen a similar question posted to KAFKA-9076 [1] without a resolution.
>
> Regards,
> Jeroen
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-9076?focusedCommentId=17268908&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17268908
>
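
For reference, the equivalent of the four-connector setup above, when running the dedicated connect-mirror-maker.sh driver instead, is a single mm2.properties; a minimal active/active sketch with hypothetical addresses:

clusters = A, B
A.bootstrap.servers = a-broker:9092
B.bootstrap.servers = b-broker:9092

# both directions enabled = active/active
# (the driver starts a Source and a Checkpoint connector per direction)
A->B.enabled = true
B->A.enabled = true

topics = .*
groups = .*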


Re: Delay issue when replicating topics creation with Kafka MM2 cluster

2023-02-20 Thread Ryanne Dolan
As far as I know, there is no way for MM2 to be notified when a new topic
is created, so it polls at a configurable interval. You can try adjusting
the refresh.topics.interval.seconds property. It sounds like it's
configured for 10 minutes.

Ryanne
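
For example, to poll for new source topics every minute instead of the 10-minute default (a minimal sketch):

refresh.topics.interval.seconds = 60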

On Mon, Feb 20, 2023, 11:35 AM Fares Oueslati 
wrote:

> Hello,
>
> I am currently using Kafka MirrorMaker 2 (v3.2.0) to replicate one Kafka
> 3.2.0 cluster to another Kafka 3.2.0 cluster. However, I am
> experiencing a delay issue where it takes around 10 minutes for a topic to
> be created in the destination cluster after it has been created in the
> source cluster. I am replicating all topics and groups with identity
> replication and have provided the configs of the mirrors below.
>
> When I check the status, all the tasks are running smoothly and the logs
> show no errors (I even increased the connect.root.logger.level to DEBUG).
> The CPU and memory consumption of the cluster are both below the request
> levels.
>
> Despite everything appearing to be fine, I am still experiencing a delay
> when I create a topic in the source cluster. I should note that once the
> topic is created, message replication is instantaneous, it is only the
> topic creation that is delayed by a few minutes.
>
> Here are the configs for the different connectors of the MM2 cluster:
>
> mirrors:
>   - sourceConnector:
>       config:
>         consumer.auto.offset.reset: latest
>         sync.topic.acls.enabled: "false"
>         replication.policy.separator: ""
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>     heartbeatConnector:
>       config:
>         heartbeats.topic.replication.factor: 1
>     checkpointConnector:
>       tasksMax: 6
>       config:
>         replication.policy.separator: ""
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>         checkpoints.topic.replication.factor: 1
>         sync.group.offsets.enabled: "true"
>     topicsPattern: ".*"
>     groupsPattern: ".*"
>
>
> Thank you for your help on this.
>


Re: Help with MM2 active/passive configuration

2022-10-19 Thread Ryanne Dolan
Hey Chris, check out the readme in connect/mirror for examples on how to
run mirror maker as a standalone process. It's similar to how Cloudera's
mirror maker is configured. If you've got a Connect cluster already, I
recommend using that and manually configuring the MirrorSourceConnector.

Ryanne

On Wed, Oct 19, 2022, 5:26 AM Chris Peart  wrote:

>
>
> Hi All,
>
> I'm new to MirrorMaker and have a production cluster running version
> 2.8.1 and have a development cluster running the same version.
>
> Our prod cluster has 4 brokers and our dev cluster has 3 brokers, both
> have a replication factor of 3.
>
> I would like to set up active/passive replication using MM2. We used to do
> this with Cloudera, but we have since decommissioned Cloudera and would
> like to know how to configure topic replication using MM2.
>
> I believe I need a mm2.properties file to achieve this. I do see what
> looks like a configuration in
> /opt/kafka_2.13-2.8.1/config/connect-mirror-maker.properties but I'm not
> sure if this is an active/passive configuration.
>
> An example file would be ideal, if possible.
>
> Many Thanks,
>
> Chris
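
For reference, a minimal active/passive mm2.properties along the lines of the shipped connect-mirror-maker.properties (cluster aliases and host names are hypothetical; run it with bin/connect-mirror-maker.sh mm2.properties):

clusters = prod, dev
prod.bootstrap.servers = prod-broker1:9092,prod-broker2:9092,prod-broker3:9092
dev.bootstrap.servers = dev-broker1:9092,dev-broker2:9092,dev-broker3:9092

# one direction only: prod is active, dev is passive
prod->dev.enabled = true
dev->prod.enabled = false

topics = .*
replication.factor = 3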


Re: Mirror maker disable auto topic creation

2022-09-15 Thread Ryanne Dolan
The link is correct. Connect and mm2 create those internal topics at
startup, whether or not auto topic creation is enabled.

On Wed, Sep 14, 2022, 11:15 PM Mcs Vemuri 
wrote:

> Hello,
> Is there any way to disable topic creation in MM2? I tried setting
> topic.creation.enable to false but the MM offsets/configs/status topics are
> still created (broker auto topic creation is also set to false).
> I found this- https://groups.google.com/g/confluent-platform/c/JRZmpCEZElo
> which seems to indicate that it isn’t possible - can anyone please advise?
> I’m on v2.7.0
>


Re: MM2 - mapping different replica number

2022-07-12 Thread Ryanne Dolan
Yes, you can configure replication.factor to match the target cluster.

Ryanne
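
For example, in mm2.properties (a minimal sketch):

# topics created on the 3-broker target cluster get replication factor 3,
# regardless of the 5 replicas used on the source cluster
replication.factor = 3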

On Tue, Jul 12, 2022, 12:44 PM An, Hongguo (CORP)
 wrote:

> Hi:
> My source has 5 brokers and all topic has 5 replica, but my target cluster
> has only 3 brokers and I can’t allocate 2 more, is it possible for MM2 to
> change all topic copied to have replica as 3?
>
> Thanks
> Andrew
>


Re: MirrorMaker 2 not replicating topic ACLs correctly

2022-04-15 Thread Ryanne Dolan
Alex, MM2 is very conservative wrt replicating ACLs. Usually only MM2 is
supposed to write to remote topics, so it usually doesn't make sense to
replicate WRITE for other identities. Currently we only sync READ to remote
topics, and we don't touch non-remote topics.

If your use-case requires replicating WRITE permission, you'll need to do
it manually.

Ryanne
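
For reference, the manual step can be a one-off kafka-acls.sh call against the new cluster, reusing the principal and topic from the example below (authentication options omitted):

bin/kafka-acls.sh --bootstrap-server new-cluster:9092 \
  --add --allow-principal User:CN=WGS_DEV \
  --operation Write \
  --topic gms-price-logic-detail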


On Fri, Apr 15, 2022, 2:40 PM Alex Zuroff  wrote:

> Hi,
>
> We are using MirrorMaker 2 (version 2.6.2 - to match our cluster version)
> to migrate applications from one cluster to another, and as such, need the
> topic ACLs to be the same on both clusters.  The ACLs are being replicated,
> but the operation is being set to "READ", even if the operation in the
> source cluster was "ALL" or "WRITE".
>
> Here's an example ACL:
>
> Old cluster -
> Current ACLs for resource `ResourcePattern(resourceType=TOPIC,
> name=gms-price-logic-detail, patternType=LITERAL)`:
> (principal=User:CN=MirrorMaker_DEV, host=*, operation=ALL,
> permissionType=ALLOW)
> (principal=User:CN=WGS_DEV, host=*, operation=ALL, permissionType=ALLOW)
>
> New cluster -
> Current ACLs for resource `ResourcePattern(resourceType=TOPIC,
> name=gms-price-logic-detail, patternType=LITERAL)`:
> (principal=User:CN=WGS_DEV, host=*, operation=READ, permissionType=ALLOW)
> (principal=User:CN=MirrorMaker_DEV, host=*, operation=READ,
> permissionType=ALLOW)
>
> If I manually add an ALL ACL to a topic on the new cluster, MM2 will
> sooner or later add another READ ACL to that same topic, even though the
> ACLs now match.
>
> Is there some hidden config value I'm missing?  I've verified in the logs
> that "sync.topic.acls.enabled" is set to true.
>
> Thanks,
> Alex
>


Re: Mirror Maker 2 - High Throughput Identity Mirroring

2022-03-02 Thread Ryanne Dolan
Henry Cai has a PR somewhere.

Ryanne

On Wed, Mar 2, 2022, 3:36 AM Antón Rodríguez Yuste 
wrote:

> Hi Ryanne,
>
> Is there a PR or code I could take a look at or just the KIP? "shallow
> mirroring" seems very interesting for our use cases, and I would like to
> evaluate it internally.
>
> Thanks,
>
> Antón
>
> On Thu, Jul 29, 2021 at 7:02 PM Ryanne Dolan 
> wrote:
>
> > Jamie, this would depend on KIP-712 (or similar) aka "shallow mirroring".
> > This is a work in progress, but I'm optimistic it'll happen at some
> point.
> >
> > ftr, "IdentityReplicationPolicy" has landed for the upcoming release, tho
> > "identity" in that context just means that topics aren't renamed.
> >
> > Ryanne
> >
> > On Thu, Jul 29, 2021, 11:37 AM Jamie 
> wrote:
> >
> > > Hi All,
> > > This blog post:
> > > https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/ mentions
> > > that "High Throughput Identity Mirroring" (when the compression is the
> > same
> > > in both the source and destination cluster) will soon be coming to MM2
> > > which would avoid the MM2 consumer decompressing the data only for the
> > MM2
> > > producer to then re-compress it again.
> > > Has this feature been implemented yet in MM2?
> > > Many Thanks,
> > > Jamie
> >
>


Re: MM2 setup using Kafka 2.4 compatibility with Kafka Broker built using 2.0.0 ver.

2022-02-07 Thread Ryanne Dolan
Ranga, yes that will work. You are correct that mm2's compatibility matrix
is essentially the same as the client's. I've tested 2.4 mm2 against 0.10.2
brokers and it works fine. Anything older than that may still generally
work, but you may see errors for unsupported apis.

Ryanne

On Mon, Feb 7, 2022, 9:58 AM Ranga  wrote:

> Hi Kafka Experts
>
> Trying to understand compatibility of MM2 libraries packaged as part of
> core kafka release 2.4 with Kafka Cluster running with Kafka core version
> 2.0.0.
>
> The major improvements to MM2 were implemented as part of the KIP-382
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> >
> and
> merged via this PR . In
> this PR I don’t see any changes being made to core kafka. The changes are
> only seen with Kafka-connect and mirrormaker related code.
>
> As I know Kafka Connect is a client application to kafka (Similar to any
> standard consumer and producer application). And MM2 built on top of Kafka
> Connect Architecture.
>
> Though the MirrorMaker 2 libraries are bundled and released with core Kafka
> (here, the 2.4 release), I think MM2 has no absolute dependency on the same
> version of the Kafka broker (i.e., one built with Kafka 2.4).
>
> The reason being the MM2 component is client side.
>
> Is it valid to assume that we can set MM2 using kafka core 2.4 release
> packages to work with Kafka Broker set up with Kafka 2.0.0?
>
> Any responses, comments or any materials in this regard are highly
> appreciated.
>
>
> Many thanks in advance!
>
>
> --
> Thanks & Regards
> Ranga
>


Re: Is it possible to run MirrorMaker in active/active/active?

2022-01-31 Thread Ryanne Dolan
Sorry, I just meant mesh in the generic sense. A topology where every node
is directly connected to every other node is sometimes called a
"fully-connected mesh topology". In this context, I mean that you can set
up a replication topology where every cluster is replicated directly to
every other cluster, which is what you're after, I think.

Ryanne
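
For reference, a fully-connected three-cluster mesh in mm2.properties would look roughly like this (cluster aliases hypothetical):

clusters = A, B, C
A->B.enabled = true
A->C.enabled = true
B->A.enabled = true
B->C.enabled = true
C->A.enabled = true
C->B.enabled = true

# DefaultReplicationPolicy's cluster prefixes (e.g. A.topic1 on clusters B and C)
# are what keep records from cycling back to their source cluster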


On Mon, Jan 31, 2022, 2:18 PM Doug Whitfield 
wrote:

> Hi Ryanne,
>
> I think you are probably correct, but just for clarity, you are talking
> about a data mesh, not a service mesh, correct?
>
> Best Regards,
> --
>
> Doug Whitfield | Enterprise Architect, OpenLogic | Perforce Software
>
> From: Ryanne Dolan 
> Date: Monday, January 31, 2022 at 1:12 PM
> To: Kafka Users 
> Subject: Re: Is it possible to run MirrorMaker in active/active/active?
> Doug, you can have any number of clusters with a fully-connected mesh
> topology, which I think is what you are looking for.
>
> Ryanne
>
> On Mon, Jan 31, 2022, 12:44 PM Doug Whitfield 
> wrote:
>
> > Hi folks,
> >
> > Every example I have seen uses two clusters in active/active and testing
> > suggests I can only get two clusters to run active/active.
> >
> > I think we will need to use a fan-in pattern if we want more than two
> > clusters. Is that correct?
> >
> > Best Regards,
> > --
> >
> > Doug Whitfield | Enterprise Architect, OpenLogic | Perforce Software

Re: Is it possible to run MirrorMaker in active/active/active?

2022-01-31 Thread Ryanne Dolan
Doug, you can have any number of clusters with a fully-connected mesh
topology, which I think is what you are looking for.

Ryanne

On Mon, Jan 31, 2022, 12:44 PM Doug Whitfield 
wrote:

> Hi folks,
>
> Every example I have seen uses two clusters in active/active and testing
> suggests I can only get two clusters to run active/active.
>
> I think we will need to use a fan-in pattern if we want more than two
> clusters. Is that correct?
>
> Best Regards,
> --
>
> Doug Whitfield | Enterprise Architect, OpenLogic | Perforce Software


Re: MirrorMaker2 fails to replicate - please urgent help needed

2021-10-25 Thread Ryanne Dolan
Hmm, sounds like something is wrong with advertised.listeners on the
brokers. Even if (as you mentioned) you can reach the brokers over tcp, the
brokers must be configured to advertise the right addresses and ports. It
sounds like mm can reach the bootstrap servers, but then these brokers are
providing the wrong listeners (in metadata responses). This would explain
why mm can create topics but not replicate.

Ryanne
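
For reference, the broker-side settings in question, in each broker's server.properties (host name hypothetical):

# what the broker binds to
listeners=SASL_PLAINTEXT://0.0.0.0:9092
# what is handed to clients -- including MM2 -- in metadata responses;
# this address must be resolvable and reachable from the MM2 hosts
advertised.listeners=SASL_PLAINTEXT://broker1.env-a.example.com:9092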

On Mon, Oct 25, 2021, 3:38 PM Rijo Roy  wrote:

>  Hi Ryanne,
>
> Thanks for responding!
>
> I am using only one node to run the mm atm. I found some luck in enabling
> the connect.log and now it is printing a lot of lines like below -
>
> [2021-10-25 20:13:24,477] WARN [Worker clientId=connect-2, groupId=A-mm2] 51 partitions have leader brokers without a matching listener, including [__consumer_offsets-0, __consumer_offsets-10, __consumer_offsets-20, __consumer_offsets-40, __consumer_offsets-30, __consumer_offsets-9, __consumer_offsets-39, __consumer_offsets-11, __consumer_offsets-31, __consumer_offsets-13] (org.apache.kafka.clients.NetworkClient:1070)
> [2021-10-25 20:13:24,512] INFO AbstractConfig values: (org.apache.kafka.common.config.AbstractConfig:347)
> [2021-10-25 20:13:24,515] INFO Connector B->A configured. (org.apache.kafka.connect.mirror.MirrorMaker:212)
> org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 1 error(s):
> Invalid value io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy for configuration replication.policy.class: Class io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy could not be found.
>
> but mostly this one -
> [2021-10-25 20:23:48,430] WARN [Worker clientId=connect-2, groupId=A-mm2] 51 partitions have leader brokers without a matching listener, including [__consumer_offsets-0, __consumer_offsets-10, __consumer_offsets-20, __consumer_offsets-40, __consumer_offsets-30, __consumer_offsets-9, __consumer_offsets-39, __consumer_offsets-11, __consumer_offsets-31, __consumer_offsets-13] (org.apache.kafka.clients.NetworkClient:1070)
>
> Could you help me understand the WARN. Now I have stopped it as this
> looked confusing [2021-10-25 20:13:24,515] INFO Connector B->A configured.
> (org.apache.kafka.connect.mirror.MirrorMaker:212) as my data is on A which
> needs to be mirrored to B and that is what I have set in my mm2.properties
> (pasting it again fyr)
> A->B.enabled = true
>
> B->A.enabled = false
>
> Thanks again for your help!
>
>
> ~
>

Re: MirrorMaker2 fails to replicate - please urgent help needed

2021-10-25 Thread Ryanne Dolan
Hello, how many mm nodes are you using? Try starting with one and adding
more after the first starts working. There is a known race that affects
some people trying to start multiple nodes at once.

Also check for any potential auth problems. The driver isn't very verbose
when it comes to auth problems, and it's possible for some nodes to have
the right certs while others don't (due to the distributed nature of
Connect). For example, I've often seen the leader node have the proper
configuration while the other Workers don't have the same creds in the same
place.

Ryanne

On Mon, Oct 25, 2021, 2:42 PM Rijo Roy  wrote:

> Hi,
>
> I was very much relieved to see MirrorMaker2 working in my sandbox for a
> major project where we are migrating out of cluster A to cluster B for a
> better hardware. To my surprise, it fails to migrate any user topics when I
> used it in one of our real pre-prod environments where we have real topics.
> Let me give a brief about my environment -
>
> ENV A
>
> OS: Ubuntu 18
> 3 node Kafka cluster
> 3 node ZooKeeper cluster
>
> ENV B
> OS: Ubuntu 18
> 3 node Kafka cluster
> 3 node ZooKeeper cluster
>
> Kafka version: Confluent kafka 5.5
> Zk version: 3.5.8
>
> here is the mm2.properties I used in my sandbox fyr where replication is
> working fine -
> # mm2.properties
> clusters = A, B
> A.bootstrap.servers = Ahost1:port, Ahost2:port, Ahost3:port
> B.bootstrap.servers = Bhost1:port, Bhost2:port, Bhost3:port
> A->B.enabled = true
> B->A.enabled = false
> replication.factor=3
> checkpoints.topic.replication.factor=3
> heartbeats.topic.replication.factor=3
> offset-syncs.topic.replication.factor=3
> replication.policy.separator=
> source.cluster.alias=
> target.cluster.alias=
> replication.policy.class=io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy
> topics = .*
> groups = .*
> emit.checkpoints.interval.seconds = 10
> security.protocol=SASL_PLAINTEXT
> sasl.mechanism=PLAIN
> sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
>   username="v" \
>   password="c";
>
> I used the same configuration in my pre-prod environment where I have to
> successfully test the cutover procedure in order to move to PROD within the
> speculated tight timeline but it fails to replicate any data. The only job
> it is successfully doing is creating these 3 below topics in both A and B
> clusters -
> mm2-configs.n.internal
> mm2-offsets.n.internal
> mm2-status.n.internal
> where n is A in source and B in target
> And the kafka logs does not have anything to help me with troubleshooting
> neither have any monitoring tools to track the sync progress if at all it
> is happening.
>
> Could you please help me here -
> 1. How to enable logging for mirrormaker
> 2. How can I successfully monitor it
> 3. How can I make it work
>
> Note: Its not a network traffic issue as I have verified telnet from
> A<-->B and it is working.
>
> Thanks!
>


Re: Mirror Maker 2: use prefix-less topic names?

2021-10-07 Thread Ryanne Dolan
Jake, in the most recent Kafka 3.0 release you will find
IdentityReplicationPolicy, which does what you want. Just be careful that
you don't try to replicate the same topics in a loop.

Ryanne
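
For example, in mm2.properties (a minimal sketch; cluster aliases hypothetical):

replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# with unchanged topic names, enable only one direction per topic set,
# otherwise the same records cycle between the clusters
A->B.enabled = true
B->A.enabled = false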

On Thu, Oct 7, 2021, 1:31 PM Jake Mayward  wrote:

> Hi,
>
> I have read a bit on MirrorMaker 2 via
> https://kafka.apache.org/documentation/#georeplication-overview, and I am
> curious whether I can enable replication with MirrorMaker 2 without having
> to prefix the topic with the cluster name. Reasoning behind this is that I
> would like the client to always use the same topic name, regardless of the
> cluster it is connected to.
>
> Note: I have seen the
> https://github.com/apache/kafka/blob/trunk/connect/mirror-client/src/main/java/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.java
> but it isn't obvious to me yet whether MM2 takes care of preventing a
> replication loop.
>
> Thanks for any hints!
> Jake
>


Re: question about mm2 on consumer group offset mirroring

2021-09-30 Thread Ryanne Dolan
Hey Calvin, the property you're looking for is
emit.checkpoints.interval.seconds. That's how often MM will write
checkpoints, which includes consumer group offsets.

Ryanne
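
For example (a minimal sketch; sync.group.offsets.interval.seconds additionally controls how often the translated offsets are applied to the target-side consumer group when sync.group.offsets.enabled is set):

emit.checkpoints.interval.seconds = 10
sync.group.offsets.interval.seconds = 10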

On Thu, Sep 30, 2021, 9:18 AM Calvin Chen  wrote:

> Hi all
>
> I have a question about the mirror make 2, on the consumer group offset
> mirroring, what is the duration for mm2 to detect consumer group offset
> change and mirror it to remote kafka consumer group?
>
> I have my mm2 config defined as below:
>
>
> {{ kafka01_name }}->{{ kafka02_name }}.sync.group.offsets.enabled = true
> {{ kafka02_name }}->{{ kafka01_name }}.sync.group.offsets.enabled = true
>
> refresh.topics.interval.seconds=10
> refresh.groups.interval.seconds=10
>
> so I would expect the consumer group offset mirroring to happen roughly
> every 10 seconds, but during testing I see that sometimes consumer group
> offset mirroring is quick and sometimes it takes minutes, so I would like
> to know how offsets are mirrored and why there is a time difference, thanks
>
> -Calvin
>


Re: Mirror Maker 2 with different Avro Schema Registries

2021-09-14 Thread Ryanne Dolan
Yes it's a consequence of Connect, which has its own serde model.

> mirror-maker's hard-coded ByteArraySerialization for consumers/producers


Re: Mirror Maker 2 with different Avro Schema Registries

2021-09-14 Thread Ryanne Dolan
Hey Anders, take a look at Connect's serdes and SMTs. MirrorMaker can be
configured to use them.

Ryanne

On Tue, Sep 14, 2021, 3:13 AM Anders Engström  wrote:

> Hi!
> I'm trying to replicate a few topics using Mirror Maker 2 (2.8).
>
> Both the source and the target cluster use Schema Registry (Karapace, on
> Aiven)
> and the Confluent Avro serialization format when publishing messages.
>
> This means that messages replicated from source->target are not readable
> with
> the Avro deserializer in the target cluster - since the schema-id either
> does
> not exist, or points to the wrong definition, in the target schema
> registry.
>
> I've tried to work around this issue by forcing the source consumer to read
> messages as Schema Registry Avro (using the source schema-registry), and
> configure the target producer to write messages using the target
> schema-registry:
>
> {
> "name": "mirror-maker",
>   "config": {
>   "connector.class":
> "org.apache.kafka.connect.mirror.MirrorSourceConnector",
>   "replication.factor": "1",
>   "heartbeats.topic.replication.factor": "1",
>   "checkpoints.topic.replication.factor": "1",
>   "offset-syncs.topic.replication.factor": "1",
>   "source.cluster.alias": "SOURCE",
>   "target.cluster.alias": "DEST",
>   "source.cluster.bootstrap.servers": "localhost:9092",
>   "target.cluster.bootstrap.servers": "localhost:9093",
>   "topics": "dev\\.pub\\..*",
>   "emit.checkpoints.interval.seconds": "1",
>   "emit.heartbeats.interval.seconds": "1",
>   "tasks.max": "3",
>   "sync.topic.configs.enabled": "false",
>   "sync.topic.acls.enabled": "false",
>   "sync.group.offsets.enabled": "false",
>   "refresh.topics.interval.seconds": "10",
>   "source.consumer.key.deserializer":
> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
>   "source.consumer.value.deserializer":
> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
>   "source.consumer.schema.registry.url": "http://localhost:8081",
>   "target.producer.key.serializer":
> "io.confluent.kafka.serializers.KafkaAvroSerializer",
>   "target.consumer.value.serializer":
> "io.confluent.kafka.serializers.KafkaAvroSerializer",
>   "target.producer.schema.registry.url": "http://localhost:8082" }
> }
>
> However, this does not seem to have any effect. It seems that the
> de/serializer
> for consumers/producers are not overridable in the mirror-maker connector.
> It
> always uses `ByteArray(De)Serializer`. I guess this is by design?
>
> So - I would really appreciate advice on how to handle this replication
> scenario. I'm guessing it's a pretty common setup.
>
> Best regards /Anders
>


Re: Mirror Maker 2 - High Throughput Identity Mirroring

2021-07-29 Thread Ryanne Dolan
Jamie, this would depend on KIP-712 (or similar) aka "shallow mirroring".
This is a work in progress, but I'm optimistic it'll happen at some point.

ftr, "IdentityReplicationPolicy" has landed for the upcoming release, tho
"identity" in that context just means that topics aren't renamed.

Ryanne

On Thu, Jul 29, 2021, 11:37 AM Jamie  wrote:

> Hi All,
> This blog post:
> https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/ mentions
> that "High Throughput Identity Mirroring" (when the compression is the same
> in both the source and destination cluster) will soon be coming to MM2
> which would avoid the MM2 consumer decompressing the data only for the MM2
> producer to then re-compress it again.
> Has this feature been implemented yet in MM2?
> Many Thanks,
> Jamie


Re: MirrorMaker 2.0 compliance question

2021-07-21 Thread Ryanne Dolan
yep!

On Wed, Jul 21, 2021, 3:18 AM Tomer Zeltzer 
wrote:

> Hi,
>
> Can I use MirrorMaker2.0 from Kafka 2.8.0 with Kafka version 2.4.0?
>
> Thanks,
> Tomer Zeltzer
>
>


Re: How to avoid storing password in clear text in server.properties file

2021-06-21 Thread Ryanne Dolan
Take a look at the ConfigProvider interface.
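
For example, the built-in FileConfigProvider (KIP-297; resolved in broker configs since around Kafka 2.3 via KIP-421) lets the properties file reference a separately-protected secrets file instead of holding the password inline. A minimal sketch with a hypothetical path; note this externalizes rather than encrypts the secret:

config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
ssl.keystore.password=${file:/etc/kafka/secrets.properties:ssl.keystore.password}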

On Mon, Jun 21, 2021, 8:26 AM Dhirendra Singh  wrote:

> Hi All,
> I am currently storing various passwords like "ssl.keystore.password",
> "ssl.truststore.password", SASL plain user password in cleartext in
> server.properties file.
> is there any way to store the password in encrypted text?
> I am using Kafka version 2.5.0.
>


Re: Help !! Apache-Kafka On-Premises Setup.

2021-06-18 Thread Ryanne Dolan
Nitin, I know this isn't a particularly helpful response, but you should
know using a load balancer like that is really hard to set up with kafka
due to the fact that it advertises broker addresses to clients. I've done
it a few times but it's something of a dark art :)

If you can get away without using an LB, you should do that. Generally they
don't help since Kafka will tell clients which brokers to connect to,
regardless of which broker they originally connected to.

That said, I'm aware of cases where LB and NAT and whatnot are required. In
those cases the usual approach is to have one public DNS entry for each
broker and configure Kafka to advertise those. For a static number of
brokers this works pretty well.

Ryanne
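
For reference, the one-public-DNS-entry-per-broker approach described above amounts to giving each broker its own advertised name (host names hypothetical):

# broker 0 server.properties
advertised.listeners=PLAINTEXT://kafka0.example.com:9092
# broker 1
advertised.listeners=PLAINTEXT://kafka1.example.com:9092
# broker 2
advertised.listeners=PLAINTEXT://kafka2.example.com:9092

The load balancer can still serve as the bootstrap address, but after the initial metadata fetch, clients connect to the advertised per-broker names directly.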

On Fri, Jun 18, 2021, 4:08 PM Nitin Sharma  wrote:

> Actually we are using Load balancer in this setup like as follows:-
>
> 1. Public user/producer hits the public IP x.x.x.x:9092 which is a static
> NAT with load balancer private IP y.y.y.y: 9092.
>
> 2. In load balancer we have a pool of servers which contains the 3 node
> clustered Kafka broker servers. Load balancer will distribute the
> connection requests among the Kafka servers.
>
> 3. Kafka broker server/ leader will talk to primary zookeeper(which is also
> 3 node cluster) and consumes & write the data on database servers.
>
>
> Please suggest if this is a feasible or recommended solution by Kafka as
> for on-premises setup. If yes then what are recommendations to make it
> working.
>
> Because we are facing issue in name resolution after request landed onto
> the Kafka broker server as it is not taking the same reverse path to
> fulfill the request and name resolution as well.
>
> Thanks
> Nitin Sharma
>
> On Fri, 18 Jun, 2021, 10:50 pm Mich Talebzadeh,  >
> wrote:
>
> > OK the three node Kafka cluster is classic.Pretty easy to set-up with
> three
> > brokers and three zookeepers on three nodes.
> >
> > You quoted scala version 2.13. Your kafka version should be something
> like
> > below
> >
> > *kafka_2.12-2.7.0*
> >
> >  For Kafka 2.7.0 with Scala 2.12
> >
> > For DNS refer to this SO thead
> >
> > How to Setup a Public Kafka Broker Using a Dynamic DNS?
> > <
> >
> https://stackoverflow.com/questions/53705573/how-to-setup-a-public-kafka-broker-using-a-dynamic-dns
> > >
> >
> > HTH
> >
> >
> >
> >view my Linkedin profile
> > 
> >
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >
> > On Fri, 18 Jun 2021 at 21:32, Nitin Sharma  wrote:
> >
> > > Hello Mich,
> > >
> > >
> > > Thanks for your response.
> > >
> > > :Please find the answers of your points :-
> > >
> > > 1. Have you set-up Apache Kafka on AWS as part of AWS offerings on
> > >Compute engines (VM) and if so how many nodes? - > *No it's in
> > > on-premises.*
> > >2. Are you using containers in the Cloud? -- > *No containers are
> > being
> > > used.*
> > >3.  Which version of Kafka -- > *2.13 version *
> > >4. Any third party offerings as a service for this set-up in AWS
> --->
> > > *No
> > > third party offering are being used*
> > >5. Assuming that you have a three node cluster in Cloud with Kafka
> and
> > >zookeeper, do you have the same number of nodes on-premises ( Data
> > >centre ) that you intend to utilise?  > *Yes 3 node cluster but
> > > on-premises*
> > >6. Finally what is not working in your Data centre -- >  *We want to
> > > connect Kafka cluster from public network with 3 node cluster. *
> > >
> > > *Note :- one more thing we want to know is , How important the DNS is
> in
> > > the Kafka setup while accessing it from open internet and what are the
> > must
> > > have requirements to achieve this setup to work in the On-premises
> > > environment.*
> > >
> > > Regards
> > > Nitin Sharma
> > >
> > > .
> > >
> > > On Fri, Jun 18, 2021 at 7:36 PM Mich Talebzadeh <
> > mich.talebza...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Vital pieces are missing. How has the cluster been set up in AWS?
> > > >
> > > >
> > > >1. Have you set-up Apache Kafka on AWS as part of AWS offerings on
> > > >Compute engines (VM) and if so how many nodes?
> > > >2. Are you using containers in the Cloud?
> > > >3.  Which version of Kafka
> > > >4. Any third party offerings as a service for this set-up in AWS
> > > >5. Assuming that you have a three node cluster in Cloud with Kafka
> > and
> > > >zookeeper, do you have the same number of nodes on-premises ( Data
> > > >centre ) that you intend to utilise?
> > > >6. Finally what is not working in your Data centre
> > > >
> > > >
> > > > HTH
> > > >

Re: kafka mirror with TLS and SASL enabled

2021-06-17 Thread Ryanne Dolan
Calvin, that's an interesting idea. The motivation behind the current
behavior is to only grant principals access to data they already have
access to. If a principal can access data in one cluster, there's no harm
in providing read access to the same data in another cluster. But you are
right that the resulting ACLs are far from completely synchronized.

With topic renaming, there's a clear distinction between source topics and
replicated topics. With consumer groups we can be smart about which
direction to replicate by looking at which consumer groups are active and
inactive. But it's not immediately clear how we'd replicate other types of
ACLs without introducing races. We'd need a way to reconcile differences
between corresponding ACLs in different clusters. Do they get unioned
together? Does the latest change win? etc.

I agree this would be a nice feature tho. You might want to bring it up on
the dev list.

Ryanne

On Thu, Jun 17, 2021, 10:24 AM Calvin Chen  wrote:

> Hi all
>
> I have a question: does Kafka MirrorMaker 2.0 mirror Kafka users (created by
> kafka-configs.sh dynamically) and Kafka ACLs (topic/group)?
>
> I set up the below fields in the mirror config file, and I think MirrorMaker
> 2.0 should mirror users and ACLs (topic/group) into the remote cluster, but
> I see only part of the ACLs mirrored, basically missing users and group info.
>
>
> topics=.*
> groups=.*
> sync.topic.acls.enabled = true
>
> I manually created kafka user and acl in remote kafka cluster and then
> mirror works on copying message, do I miss some configuration? I think
> kafka user and acls(topic/group) should be automatically mirrored...
>
> Thanks
> -Calvin
>


Re: Resend old messages, find partitions for a message

2021-06-14 Thread Ryanne Dolan
Andrew, an offset implies a partition -- an offset is only meaningful
within the context of a particular partition -- so if you are able to log
offsets you should also be able to log the corresponding partitions.

For example, the RecordMetadata object, which provides the offset of a
written record, also includes the corresponding TopicPartition.

Ryanne
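
For example, a minimal producer-side sketch (the producer instance is assumed to be in scope; topic name and payload are hypothetical):

ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", payload);
producer.send(record, (RecordMetadata metadata, Exception exception) -> {
    if (exception == null) {
        // an offset is only meaningful within its partition, so log both
        System.out.printf("sent %s-%d @ offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});

A later "resend" can then address the exact message with consumer.assign(...) followed by consumer.seek(new TopicPartition(topic, partition), offset), without needing a key scheme.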

On Mon, Jun 14, 2021, 11:14 AM Greer, Andrew C <
andrew.c.gr...@conocophillips.com> wrote:

> Hello all,
>
> I am looking to add a "resend" option to my program where a user can
> specify an older message they would like to produce through Kafka again,
> for whatever reason. I can get the topic and offset for each message from
> my logs after consuming a message, but I do not see a way to get which
> partition the message was in. Currently, I do not have a key associated
> with the messages.
>
> My thoughts right now lean towards adding a key system to know which
> partition each message will go to, but I was wondering if there are better
> alternatives to this.
>
>
> Thank you,
>
> Andrew Greer
>
>


Re: MirrorMaker 2 with SSL

2021-04-05 Thread Ryanne Dolan
Yes it's possible. The most common issue in my experience is the location
of the trust store and key store being different or absent on some hosts.
You need to make sure that these locations are consistent across all hosts
in your Connect cluster, or use a ConfigProvider to provide the location
dynamically. Otherwise, a task will get scheduled on some host and fail to
find these files.

Ryanne
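
For reference, the worker-side settings KIP-208 introduced for an HTTPS REST listener look roughly like this (paths are hypothetical and, per the note above, must exist at the same location on every worker):

listeners=https://0.0.0.0:8443
listeners.https.ssl.keystore.location=/var/private/ssl/connect.keystore.jks
listeners.https.ssl.keystore.password=changeit
listeners.https.ssl.truststore.location=/var/private/ssl/connect.truststore.jks
listeners.https.ssl.truststore.password=changeit

Note that curl -s (as used in the message below) suppresses error output: dropping -s, or adding -v and --cacert for the REST certificate, will show whether the TLS handshake is what's failing.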


On Wed, Mar 31, 2021, 8:22 PM Men Lim  wrote:

> Hello.  I was wondering if someone can help answer my question.  I'm trying
> to run MirrorMaker 2 in distributed mode using SSL.  I have the distributor
> running in SSL, but I can't get the curl REST API to do so. I saw that
> KIP-208 fixed this, but I can't seem to implement it.
>
> in my mm2-dist.prop file I have set:
> 
> listeners=https://localhost:8443
> security.protocol=SSL
>
> ssl.truststore.location=/home/ec2-user/kafka_2.13-2.7.0/cert/kafka.client.truststore.jks
> 
> my connector.json file look like this:
>
> 
> {
> "name": "mm2-connect-cluster",
> "config":{
> "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
> "connector.client.config.override.policy": "All",
> "name": "mm2-connect-cluster",
> "topics": "test.*",
> "tasks.max": "1",
> "source.cluster.alias": "source",
> "target.cluster.alias": "target",
> "source.cluster.bootstrap.servers": "source:9094",
> "target.cluster.bootstrap.servers": "target:9094",
> "source->target.enabled": "true",
> "target->source.enabled": "false",
> "offset-syncs.topic.replication.factor": "4",
> "topics.exclude": ".*[\\-\\.]internal, .*\\.replica,
> __consumer_offsets",
> "groups.blacklist": "console-consumer-.*, connect-.*, __.*",
> "topic.creation.enabled": "true",
> "topic.creation.default.replication.factor": "4",
> "topic.creation.default.partitions": "1",
> "key.converter": "org.apache.kafka.connect.json.JsonConverter",
> "value.converter": "org.apache.kafka.connect.json.JsonConverter",
> "security.protocol": "SSL",
> "ssl.truststore.password":
> "/home/ec2-user/kafka_2.13-2.7.0/cert/kafka.client.truststore.jks"
> }
> }
> 
>
> I would then start up the distributor and it launched fine.  So I try to
> run the CURl command
>
> 
> curl -s -X POST -H 'Content-Type: application/json' --data @connector.json
> https://localhost:8443/connectors
> 
> nada.  nothing.  no error.  no reasons for not starting.
>
> Is it possible to run MM2 with SSL?  If so, can someone point me to a
> working example?
>
> thanks.
>


Re: MirrorMaker 2 and Negative Lag

2021-03-25 Thread Ryanne Dolan
Thanks Alan for the investigation and bug report!

On Thu, Mar 25, 2021, 3:25 PM Alan Ning  wrote:

> Another update on this. I am pretty sure I have found a bug in
> MirrorSourceTask. The details are written in
> https://issues.apache.org/jira/browse/KAFKA-12558. I hope this helps
> others
> who have encountered this issue.
>
> ... Alan
>
>
>
> On Thu, Mar 25, 2021 at 9:40 AM Alan Ning  wrote:
>
> > Just an update on this. It has been an adventure.
> >
> > Sam, I think you are right that `consumer.auto.offset.reset:latest` does
> > not work. However, I was also seeing issues with the default behavior
> > (which is consumer.auto.offset.reset:earliest). After a lot of trial and
> > error, I have found that my MM2 setup works a lot better when I give it
> > more resources and tasks.
> >
> > I am mirroring about 1600 partitions, running on AWS M5.4xl with
> > tasks.max=8. With this setup, I will always get a lot of negative offset.
> > In a sense, it really just means the mirroring tasks have stalled
> silently
> > and can't make progress. There was no warning or errors in the logs.
> >
> > I bumped the system up to C5.16xl with tasks.max=180 and suddenly
> > everything runs a lot smoother. The CG offset will start at negative, and
> > will eventually catch up to 0 (or positive).
> >
> > It seems like there is a sweet spot for partitions per task, but I have
> > yet to find any documentation for it.
> >
> > ... Alan
> >
> >
> >
> >
> > On Wed, Mar 17, 2021 at 5:49 PM Alan Ning  wrote:
> >
> >> OK. I follow now. Let me try to re-test to see if it makes a difference.
> >>
> >> Thanks.
> >>
> >> ... Alan
> >>
> >> On Wed, Mar 17, 2021 at 5:46 PM Samuel Cantero 
> >> wrote:
> >>
> >>> I've found that bug the hard way. FWIW I've migrated several clusters
> >>> from
> >>> kafka 0.10 to kafka 2.x using mm2. So offsets sync work fine for kafka
> >>> 0.10.
> >>>
> >>> Best,
> >>>
> >>> On Wed, Mar 17, 2021 at 6:43 PM Samuel Cantero 
> >>> wrote:
> >>>
> >>> > No, what I meant is that offsets sync won't work if
> >>> > `consumer.auto.offset.reset:latest` (it was not talking about that
> >>> > particular bug). Try setting `consumer.auto.offset.reset:earliest`
> and
> >>> do
> >>> > verify if offsets are sync'd correctly.
> >>> >
> >>> > Best,
> >>> >
> >>> > On Wed, Mar 17, 2021 at 6:42 PM Alan Ning  wrote:
> >>> >
> >>> >> Hey Samuel,
> >>> >>
> >>> >> I am aware of that `consumer.auto.offset.reset:latest` problem. It
> was
> >>> >> because this PR
> >>> >> 
> >>> never
> >>> >> made it to trunk. I patched MM2 locally for 2.7 so that `latest`
> >>> offset
> >>> >> will work.
> >>> >>
> >>> >> ... Alan
> >>> >>
> >>> >> On Wed, Mar 17, 2021 at 4:50 PM Samuel Cantero  >
> >>> >> wrote:
> >>> >>
> >>> >> > I've seen this before. I've found that consumer offsets sync does
> >>> not
> >>> >> work
> >>> >> > with `consumer.auto.offset.reset:latest`. If you set this to
> >>> earliest,
> >>> >> then
> >>> >> > it should work. One way to workaround the need to start from
> >>> earliest
> >>> >> is by
> >>> >> > starting with latest and once mirroring is ongoing swap to
> earliest.
> >>> >> This
> >>> >> > won't affect mirroring as the mm2 consumers will resume from the
> >>> last
> >>> >> > committed offsets.
> >>> >> >
> >>> >> > Best,
> >>> >> >
> >>> >> > On Wed, Mar 17, 2021 at 5:27 PM Ning Zhang <
> ning2008w...@gmail.com>
> >>> >> wrote:
> >>> >> >
> >>> >> > > Hello Alan,
> >>> >> > >
> >>> >> > > I have probably seen a similar case. One quick validation that
> >>> >> > > could be run is to test with the source cluster on a higher Kafka
> >>> >> > > version. If that still does not work, please email me and I can
> >>> >> > > introduce you to someone who has seen a similar case before.
> >>> >> > >
> >>> >> > > On 2021/03/15 21:59:03, Alan Ning  wrote:
> >>> >> > > > I am running MirrorMaker 2 (Kafka 2.7), trying to migrate all
> >>> >> > > > topics from one cluster to another while preserving consumer
> >>> >> > > > group offsets through `sync.group.offsets.enabled=true`. My
> >>> >> > > > source cluster is running Kafka 0.10, while the target cluster
> >>> >> > > > is running 2.6.1.
> >>> >> > > >
> >>> >> > > > While I can see data being replicated, the data on the
> >>> replicated
> >>> >> > > Consumer
> >>> >> > > > Group in the target cluster looks wrong. The lag values of the
> >>> >> > replicated
> >>> >> > > > Consumer Group are large negative values, and the
> >>> LOG-END-OFFSET are
> >>> >> > > mostly
> >>> >> > > > 0. I determined this information from
> kafka-consumer-groups.sh.
> >>> >> > > >
> >>> >> > > > I checked the
> >>> >> kafka_consumer_consumer_fetch_manager_metrics_records_lag
> >>> >> > > JMX
> >>> >> > > > metrics in MM2 and the reported lag is zero for all
> partitions.
> >>> >> > > >
> >>> >> > > > By using `sync.group.offsets.enabled=true`, I envisioned that
> >>> MM2
> >>> >> will
> >>> >> > > 

Re: options for kafka cluster backup?

2021-03-07 Thread Ryanne Dolan
MirrorMaker v1 does not sync offsets, but MM2 does!

Ryanne

On Sun, Mar 7, 2021, 10:02 PM Pushkar Deole  wrote:

> Thanks you all!
>
> Blake, for your comment:
>
> It'll require having a HA cluster running in another region, of course.
> One other caveat is that it doesn't preserve the offsets of the records
>
> -> I believe I can't afford to keep another cluster running due to cost
> reasons. Can you elaborate on the offset part? If offsets are not preserved,
> then how does the backup cluster know where to start processing for each topic?
>
> For example, you could use a Kafka Connect s3 sink. You'd have to write
> some disaster-recovery code to restore lost data from s3 into Kafka.
>
> -> again here the same question: does s3 also store offsets for each topic
> as they are modified in kafka? If not, then when the backup is restored into
> the kafka cluster, how will it know where to process each topic from?
>
> On Sat, Mar 6, 2021 at 4:44 PM Himanshu Shukla <
> himanshushukla...@gmail.com>
> wrote:
>
> > Hi Pushkar,
> >
> > you could also look at the available Kafka-connect plugins. It provides
> > many connectors which could be leveraged to move the data in/out from
> > Kafka.
> >
> > On Sat, Mar 6, 2021 at 10:18 AM Blake Miller 
> > wrote:
> >
> > > MirrorMaker is one reasonable way to do this, certainly it can
> replicate
> > to
> > > another region, with most of the latency being the unavoidable kind, if
> > you
> > > give it enough resources.
> > >
> > > It'll require having a HA cluster running in another region, of course.
> > One
> > > other caveat is that it doesn't preserve the offsets of the records.
> > That's
> > > probably okay for your use-case, but you should be aware of it.
> > >
> > > Since what you want is a backup, there are many ways to do that which
> > might
> > > be cheaper than another Kafka cluster.
> > >
> > > For example, you could use a Kafka Connect s3 sink. You'd have to write
> > > some disaster-recovery code to restore lost data from s3 into Kafka.
> > >
> > > https://www.confluent.io/blog/apache-kafka-to-amazon-s3-exactly-once/
> > >
> > > There are many other sinks available, but s3 might be a reasonable
> choice
> > > for backup. It's inexpensive and reliable.
> > >
> > > On Fri, Mar 5, 2021, 2:48 AM Pushkar Deole 
> wrote:
> > >
> > > > Yes.. so the requirement for me is to have data backed up or
> replicated
> > > in
> > > > a different 'region' to cater for disaster scenarios and recover from
> > > them
> > > >
> > > > On Fri, Mar 5, 2021 at 3:01 PM Ran Lupovich 
> > > wrote:
> > > >
> > > > > I guess that to avoid data loss you would need to use 3 replicas
> > > > > with rack/site awareness. Confluent's Replicator and MirrorMaker
> > > > > are for copying data from one cluster to another, usually in
> > > > > different DCs / regions, if I am not mistaken
> > > > >
> > > > > On Fri, Mar 5, 2021, 11:21, Pushkar Deole <
> > > > > pdeole2...@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Thanks Luke... is the mirror maker asynchronous? What will be the
> > > > > > typical lag between the replicated cluster and the running cluster,
> > > > > > and in case of disaster, what are the chances of data loss?
> > > > > >
> > > > > > On Fri, Mar 5, 2021 at 11:37 AM Luke Chen 
> > wrote:
> > > > > >
> > > > > > > Hi Pushkar,
> > > > > > > MirrorMaker is what you're looking for.
> > > > > > > ref:
> > > > > https://kafka.apache.org/documentation/#georeplication-mirrormaker
> > > > > > >
> > > > > > > Thanks.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Fri, Mar 5, 2021 at 1:50 PM Pushkar Deole <
> > pdeole2...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I was looking for some options to backup a running kafka
> > cluster,
> > > > for
> > > > > > > > disaster recovery requirements. Can someone provide what are
> > the
> > > > > > > available
> > > > > > > > options to backup and restore a running cluster in case the
> > > entire
> > > > > > > cluster
> > > > > > > > goes down?
> > > > > > > >
> > > > > > > > Thanks..
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Himanshu Shukla
> >
>


Re: Mirror Maker 2 - Issues

2021-03-07 Thread Ryanne Dolan
You can see the replicated consumer group offsets using the
kafka-consumer-groups.sh tool. Make sure the consumer group is in the
groups allowlist.
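
For example, a quick check against the target cluster could look like this
(the broker address and group name below are placeholders):

bin/kafka-consumer-groups.sh --bootstrap-server target-kafka:9092 \
  --describe --group my-consumer-group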

Ryanne

On Sun, Mar 7, 2021, 11:09 PM Navneeth Krishnan 
wrote:

> Hi Ryanne,
>
> I commented out the alias config and set target to source as false. Now I
> don't see the error anymore and everything looks good. Thanks a lot for the
> feedback. One more question: how can I check if the consumer group offsets
> are replicated? If I switch over an application to this new cluster, the
> consumer should just read from where it left off, right?
>
> Also I will try the custom replication policy and let you know.
>
> Here is the new config.
>
> # Kafka datacenters.
> clusters = source, target
> source.bootstrap.servers = loc-kafka01.eu-prod.dnaspaces.io:9092,
> loc-kafka02.eu-prod.dnaspaces.io:9092
> target.bootstrap.servers = loc-msk-kafka-1.eu-prod.dnaspaces.io:9092,
> loc-msk-kafka-2.eu-prod.dnaspaces.io:9092,
> loc-msk-kafka-3.eu-prod.dnaspaces.io:9092
>
> # Source and target clusters configurations.
> source.config.storage.replication.factor = 2
> target.config.storage.replication.factor = 2
>
> source.offset.storage.replication.factor = 2
> target.offset.storage.replication.factor = 2
>
> source.status.storage.replication.factor = 2
> target.status.storage.replication.factor = 2
>
> source->target.enabled = true
> target->source.enabled = false
>
> # Mirror maker configurations.
> offset-syncs.topic.replication.factor = 2
> heartbeats.topic.replication.factor = 2
> checkpoints.topic.replication.factor = 2
>
> topics = .*
> groups = .*
>
> tasks.max = 3
> replication.factor = 2
> refresh.topics.enabled = true
> sync.topic.configs.enabled = true
> refresh.topics.interval.seconds = 10
>
> topics.blacklist = .*[\-\.]internal, .*\.replica, __consumer_offsets
> groups.blacklist = console-consumer-.*, connect-.*, __.*
>
> # Enable heartbeats and checkpoints.
> source->target.emit.heartbeats.enabled = true
> source->target.emit.checkpoints.enabled = true
>
> # customize as needed
> # replication.policy.separator = ""
> # source.cluster.alias: ""
> # target.cluster.alias: ""
> # sync.topic.acls.enabled = false
> # emit.heartbeats.interval.seconds = 5
>
> Thanks
>
> On Sun, Mar 7, 2021 at 2:17 PM Ryanne Dolan  wrote:
>
> > Navneeth, it looks like you are trying to override the cluster aliases to
> > the empty string. If you look closely, these properties are defined using
> > incorrect syntax. I'm not sure how the properties would be parsed, but
> you
> > should fix that to rule it out as a source of confusion.
> >
> > Then, be aware that overriding cluster aliases like that may not work as
> > you intend. Consider using a custom ReplicationPolicy if you are trying
> to
> > do "identity" replication. There are several implementations floating
> > around.
> >
> > Ryanne
> >
> > On Sun, Mar 7, 2021, 1:44 AM Navneeth Krishnan  >
> > wrote:
> >
> > > Hi All,
> > >
> > > I'm trying to use mirror maker 2 to replicate data to our new AWS MSK
> > kafka
> > > cluster and I have been running into so many issues and I couldn't find
> > > proper documentation. Need some help and it's very urgent. Thanks
> > >
> > > Also I don't see any of my topics created.
> > >
> > > Note: There are no consumers on the destination brokers
> > >
> > > *Run Command*
> > > bin/connect-mirror-maker.sh config/mm2.properties
> > >
> > > *Kafka Versions*
> > > Source: 2.3
> > > MSK: 2.6.1
> > > Mirror Maker Node: 2.7 (Using 2.7 to replicate group offsets)
> > >
> > > *Exception*
> > > [2021-03-07 07:32:15,145] ERROR Scheduler for MirrorCheckpointConnector
> > > caught exception in scheduled task: creating internal topics
> > > (org.apache.kafka.connect.mirror.Scheduler:102)
> > > org.apache.kafka.connect.errors.ConnectException: Error while
> attempting
> > to
> > > create/find topic(s) '"".checkpoints.internal'
> > > at
> > >
> >
> org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:321)
> > > at
> > >
> > >
> >
> org.apache.kafka.connect.mirror.MirrorUtils.createCompactedTopic(MirrorUtils.java:109)
> > > at
> > >
> > >
> >
> org.apache.kafka.connect.mirror.MirrorUtils.createSinglePartitionCompactedTopic(MirrorUtils.java:114)
> > > at
> > >
> > >
>

Re: Mirror Maker 2 - Issues

2021-03-07 Thread Ryanne Dolan
Navneeth, it looks like you are trying to override the cluster aliases to
the empty string. If you look closely, these properties are defined using
incorrect syntax. I'm not sure how the properties would be parsed, but you
should fix that to rule it out as a source of confusion.

Then, be aware that overriding cluster aliases like that may not work as
you intend. Consider using a custom ReplicationPolicy if you are trying to
do "identity" replication. There are several implementations floating
around.
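
For reference, a bare-bones sketch of such an "identity" policy, assuming the
org.apache.kafka.connect.mirror.ReplicationPolicy interface (the class name
here is illustrative, not one of the existing implementations):

import org.apache.kafka.connect.mirror.ReplicationPolicy;

public class IdentityReplicationPolicy implements ReplicationPolicy {

    // Never prefix remote topics with the source cluster alias.
    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return topic;
    }

    // Without a prefix, the source cluster cannot be inferred from the name.
    @Override
    public String topicSource(String topic) {
        return null;
    }

    // Likewise, there is no encoded upstream topic to recover.
    @Override
    public String upstreamTopic(String topic) {
        return null;
    }
}

Keep in mind that without prefixes MM2 can no longer use topic names to
detect replication cycles, so a policy like this is safest for
one-directional flows.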

Ryanne

On Sun, Mar 7, 2021, 1:44 AM Navneeth Krishnan 
wrote:

> Hi All,
>
> I'm trying to use mirror maker 2 to replicate data to our new AWS MSK kafka
> cluster and I have been running into so many issues and I couldn't find
> proper documentation. Need some help and it's very urgent. Thanks
>
> Also I don't see any of my topics created.
>
> Note: There are no consumers on the destination brokers
>
> *Run Command*
> bin/connect-mirror-maker.sh config/mm2.properties
>
> *Kafka Versions*
> Source: 2.3
> MSK: 2.6.1
> Mirror Maker Node: 2.7 (Using 2.7 to replicate group offsets)
>
> *Exception*
> [2021-03-07 07:32:15,145] ERROR Scheduler for MirrorCheckpointConnector
> caught exception in scheduled task: creating internal topics
> (org.apache.kafka.connect.mirror.Scheduler:102)
> org.apache.kafka.connect.errors.ConnectException: Error while attempting to
> create/find topic(s) '"".checkpoints.internal'
> at
> org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:321)
> at
>
> org.apache.kafka.connect.mirror.MirrorUtils.createCompactedTopic(MirrorUtils.java:109)
> at
>
> org.apache.kafka.connect.mirror.MirrorUtils.createSinglePartitionCompactedTopic(MirrorUtils.java:114)
> at
>
> org.apache.kafka.connect.mirror.MirrorCheckpointConnector.createInternalTopics(MirrorCheckpointConnector.java:163)
> at org.apache.kafka.connect.mirror.Scheduler.run(Scheduler.java:93)
> at
> org.apache.kafka.connect.mirror.Scheduler.executeThread(Scheduler.java:112)
> at
>
> org.apache.kafka.connect.mirror.Scheduler.lambda$execute$2(Scheduler.java:63)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> *mm2.properties*
> # Kafka brokers.
> clusters = source, target
> source.bootstrap.servers = <>
> target.bootstrap.servers = <>
>
> # Source and target clusters configurations.
> source.config.storage.replication.factor = 2
> target.config.storage.replication.factor = 2
>
> source.offset.storage.replication.factor = 2
> target.offset.storage.replication.factor = 2
>
> source.status.storage.replication.factor = 2
> target.status.storage.replication.factor = 2
>
> source->target.enabled = true
> target->source.enabled = true
>
> # Mirror maker configurations.
> offset-syncs.topic.replication.factor = 2
> heartbeats.topic.replication.factor = 2
> checkpoints.topic.replication.factor = 2
>
> topics = .*
> groups = .*
>
> tasks.max = 3
> replication.factor = 2
> refresh.topics.enabled = true
> sync.topic.configs.enabled = true
> refresh.topics.interval.seconds = 10
>
> topics.blacklist = .*[\-\.]internal, .*\.replica, __consumer_offsets
> groups.blacklist = console-consumer-.*, connect-.*, __.*
>
> # Enable heartbeats and checkpoints.
> source->target.emit.heartbeats.enabled = true
> source->target.emit.checkpoints.enabled = true
>
> # customize as needed
> replication.policy.separator = ""
> source.cluster.alias: ""
> target.cluster.alias: ""
> # sync.topic.acls.enabled = false
> # emit.heartbeats.interval.seconds = 5
>
> Thanks
>


Re: When using MM2, how should the consumer be switched from source to target?

2021-01-02 Thread Ryanne Dolan
Aki, that's right. Prior to 2.7, you can use the translateOffsets method
from within your client code, or you can write a little command-line tool
to do it. I've done the latter for Cloudera's srm-control tool, as
documented here:
https://docs.cloudera.com/csp/2.0.1/srm-using/topics/srm-migrating-consumer-groups.html
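
A bare-bones sketch of such a tool (not the srm-control tool itself; the
bootstrap address, cluster alias, and group name below are placeholders):

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class TranslateOffsets {
    public static void main(String[] args) throws Exception {
        // Client config for the cluster being failed over TO,
        // which is where the checkpoints topic lives.
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "target-kafka:9092");

        // Translate the group's committed offsets from the "source" cluster
        // alias into the target cluster's coordinates.
        Map<TopicPartition, OffsetAndMetadata> offsets =
            RemoteClusterUtils.translateOffsets(props, "source", "my-group",
                Duration.ofMinutes(1));

        // Commit these on the target (e.g. via Admin.alterConsumerGroupOffsets)
        // before restarting the consumer there.
        offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
    }
}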

Ryanne

On Thu, Dec 17, 2020, 5:23 PM Aki Yoshida  wrote:

> I'm replying to my own mail and would like to hear if this assumption
> is correct.
> It looks like I need 2.7.0 to have the automatic translation.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
> for earlier versions, there is no way to do this with the console client; one
> must use the java API RemoteClusterUtils.translateOffsets() to get the offsets
> and re-consume from the new topic.
> Is this correct?
>
> El jue, 17 dic 2020 a las 21:18, Aki Yoshida ()
> escribió:
> >
> > Hi,
> > I have a question regarding how to migrate a consumer when the
> > subscribing topic has been migrated to the target Kafka broker.
> > Suppose a consumer is consuming from topic1 at the source Kafka
> > broker. When MM2 mirrors this topic using options
> > source.cluster.alias="", replication.policy.separator="", topic named
> > topic1 shows up at the target Kafka broker.
> >
> > When the consumer instance simply switches the bootstrap server to
> > start consuming from this topic1 at the target broker, the consumer
> > seems to start subscribing from the latest offset. I thought the
> > consumer offsets were also migrated to the target and the consumer
> > could simply start subscribing from the mirrored topic from the
> > continued offset. Did I miss some configuration parameters or does the
> > consumer need to perform some actions to be able to consume the
> > records seamlessly?
> >
> > I appreciate for your help. Thanks.
> >
> > regards, aki
>


Re: Announcing #TwelveDaysOfSMT

2020-12-14 Thread Ryanne Dolan
Robin, I don't think this is an appropriate use of the user group.

Ryanne

On Mon, Dec 14, 2020 at 10:20 AM Robin Moffatt  wrote:

>  #TwelveDaysOfSMT  
>
> Day 5️⃣: MaskField
> https://rmoff.net/2020/12/14/twelve-days-of-smt-day-5-maskfield/
>
>
> --
>
> Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
>
>
> On Fri, 11 Dec 2020 at 17:14, Robin Moffatt  wrote:
>
> >  #TwelveDaysOfSMT  
> >
> > Day 4️⃣: RegexRouter
> > https://rmoff.net/2020/12/11/twelve-days-of-smt-day-4-regexrouter/
> >
> >
> > --
> >
> > Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
> >
> >
> > On Thu, 10 Dec 2020 at 17:56, Robin Moffatt  wrote:
> >
> >> Here's the third instalment of  #TwelveDaysOfSMT  
> >>
> >> Flatten :https://rmoff.net/2020/12/10/twelve-days-of-smt-day-3-flatten/
> >>
> >>
> >> --
> >>
> >> Robin Moffatt | Senior Developer Advocate | ro...@confluent.io | @rmoff
> >>
> >>
> >> On Thu, 10 Dec 2020 at 09:49, Robin Moffatt  wrote:
> >>
> >>> Here's the second instalment of  #TwelveDaysOfSMT  
> >>>
> >>> Day 2: ValueToKey and ExtractField :
> >>>
> https://rmoff.net/2020/12/09/twelve-days-of-smt-day-2-valuetokey-and-extractfield/
> >>>
> >>>
> >>> --
> >>>
> >>> Robin Moffatt | Senior Developer Advocate | ro...@confluent.io |
> @rmoff
> >>>
> >>>
> >>> On Wed, 9 Dec 2020 at 09:07, Robin Moffatt  wrote:
> >>>
>  ✨ Do you use Single Message Transforms in Kafka Connect? Or maybe
>  you've wondered what they are?
> 
>   ✍️I'm doing a series of videos and blogs about them during December
>  (#TwelveDaysOfSMT), with the first one out today:
> 
> https://rmoff.net/2020/12/08/twelve-days-of-smt-day-1-insertfield-timestamp/
> 
>  Reply here with any feedback, any particular SMT you'd like me to
>  explore, or scenarios to solve :)
> 
> 
>  --
> 
>  Robin Moffatt | Senior Developer Advocate | ro...@confluent.io |
> @rmoff
> 
> >>>
>


Re: MirrorMaker 2 Reload Configuration

2020-11-11 Thread Ryanne Dolan
Hey guys, this is because the configuration gets loaded into the internal
mm2-config topics, and these may get out of sync with the mm2.properties
file in some scenarios. I believe this occurs whenever an old/bad
configuration gets written to Kafka, which MM2 can read successfully but
which causes MM2 to get stuck before it can write any updates back to the
mm2-config topics. Just modifying the mm2.properties file does not resolve
the issue, since Workers read from the mm2-config topics, not the
mm2.properties file directly.

The fix is to truncate or delete the mm2-config and mm2-status topics. N.B.
do _not_ delete the mm2-offsets topics, as this would cause MM2 to
restart replication from offset 0.
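
For the dedicated driver, assuming the default internal topic names (the
"source" alias and broker address below are placeholders), that could look
roughly like:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete \
  --topic mm2-configs.source.internal
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete \
  --topic mm2-status.source.internal
# leave mm2-offsets.source.internal untouched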

I'm not sure why deleting these topics works, but it seems to cause Connect
to wait for the new configuration to be loaded from mm2.properties, rather
than reading the old configuration from mm2-config and getting stuck.

Can someone report the issue in jira?

Ryanne

On Wed, Nov 11, 2020 at 9:35 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I have a similar issue.  I changed the source cluster bootstrap address and
> MM2 picked it up only partially. Some parts of it still use the old
> address, some the new. The old and the new address lists are routed to the
> same cluster, same brokers, just on a different network path.
>
> So is there any way to force the configuration update?
>
> Cheers,
> Peter
>
> On Wed, 4 Nov 2020 at 18:39, Ning Zhang  wrote:
>
> > if your new topics are not named "topic1" or "topic2", maybe you want to
> > use a regex like .* to allow more topics to be considered by MM2
> >
> > # regex which defines which topics gets replicated. For eg "foo-.*"
> > src-cluster->dst-cluster.topics = topic1,topic2
> >
> > On 2020/10/30 01:48:00, "Devaki, Srinivas" 
> > wrote:
> > > Hi Folks,
> > >
> > > I'm running mirror maker as a dedicated cluster as given in the
> > > mirrormaker 2 doc. but for some reason when I add new topics and
> > > deploy the mirror maker it's not detecting the new topics at all, even
> > > the config dumps in the mirror maker startup logs don't show the newly
> > > added topics.
> > >
> > > I've attached the config that I'm using, initially I assumed that
> > > there might be some refresh configuration option either in connect or
> > > mirror maker, but the connect rest api doesn't seem to be working in
> > > this mode and also couldn't find any refresh configuration option.
> > >
> > > Any ideas on this? Thank you in advance
> > >
> > > ```
> > > clusters = src-cluster, dst-cluster
> > >
> > > # disable topic prefixes
> > > src-cluster.replication.policy.separator =
> > > dst-cluster.replication.policy.separator =
> > > replication.policy.separator =
> > > source.cluster.alias =
> > > target.cluster.alias =
> > >
> > >
> > > # enable idemptotence
> > > source.cluster.producer.enable.idempotence = true
> > > target.cluster.producer.enable.idempotence = true
> > >
> > > # connection information for each cluster
> > > # This is a comma separated host:port pairs for each cluster
> > > # for e.g. "A_host1:9092, A_host2:9092, A_host3:9092"
> > > src-cluster.bootstrap.servers =
> > >
> >
> sng-kfnode1.internal:9092,sng-kfnode1.internal:9092,sng-kfnode1.internal:9092
> > > dst-cluster.bootstrap.servers =
> > >
> >
> prod-online-v2-kafka-1.internal:9092,prod-online-v2-kafka-2.internal:9092,prod-online-v2-kafka-3.internal:9092,prod-online-v2-kafka-4.internal:9092,prod-online-v2-kafka-5.internal:9092
> > >
> > > # regex which defines which topics gets replicated. For eg "foo-.*"
> > > src-cluster->dst-cluster.topics = topic1,topic2
> > >
> > > # client-id
> > > src-cluster.client.id = prod-mm2-onlinev1-to-onlinev2-consumer-v0
> > > dst-cluster.client.id = prod-mm2-onlinev1-to-onlinev2-producer-v0
> > >
> > >
> > > # group.instance.id=_mirror_make_instance_1
> > > # consumer should periodically emit heartbeats
> > > src-cluster->dst-cluster.consumer.auto.offset.reset = earliest
> > > src-cluster->dst-cluster.consumer.overrides.auto.offset.reset =
> earliest
> > >
> > > # connector should periodically emit heartbeats
> > > src-cluster->dst-cluster.emit.heartbeats.enabled = true
> > >
> > > # frequency of heartbeats, default is 5 seconds
> > > src-cluster->dst-cluster.emit.heartbeats.interval.seconds = 10
> > >
> > > # connector should periodically emit consumer offset information
> > > src-cluster->dst-cluster.emit.checkpoints.enabled = true
> > >
> > > # frequency of checkpoints, default is 5 seconds
> > > src-cluster->dst-cluster.emit.checkpoints.interval.seconds = 10
> > >
> > > # whether to monitor source cluster ACLs for changes
> > > src-cluster->dst-cluster.sync.topic.acls.enabled = false
> > >
> > > # whether or not to monitor source cluster for configuration changes
> > > src-cluster->dst-cluster.sync.topic.configs.enabled = true
> > > # add retention.ms to the default list given in the
> > DefaultConfigPropertyFilter
> > > #
> >
> 

Re: Bidirectional XDCR by MirrorMaker2

2020-09-25 Thread Ryanne Dolan
Oleg, if you want bidirectional replication, you should have:

dc1->dc2.enabled = true
dc2->dc1.enabled = true

... in _both_  DCs. The replication topology should be consistent across
all DCs, generally. Otherwise DCs will disagree on what should get
replicated and you'll likely encounter confusing behavior.

So the topology is global, but you can use the --clusters argument to
localize a driver instance to a specific DC:

# in dc1:
$ connect-mirror-maker.sh --clusters dc1 ...

# in dc2:
$ connect-mirror-maker.sh --clusters dc2 ...

Ryanne

On Fri, Sep 25, 2020, 3:06 AM Oleg Osipov 
wrote:

> Hello!
> KIP-382 states the following
> "For cross-datacenter replication (XDCR), each datacenter should have a
> single Connect cluster which pulls records from the other data centers via
> source connectors. Replication may fan-out within each datacenter via sink
> connectors."
>
> It looks like, here we have unidirectional replication.
> Assume, we have two datacenters DC1 and DC2, so I may run MM2 with the
> configuration in  DC1:
> dc1->dc2.enabled = true
> dc2->dc1.enabled = true
> Is it correct? Will I have bidirectional replication?
>
>


Re: Two MirrorMakers 2 for two DCs

2020-09-21 Thread Ryanne Dolan
Oleg, yes you can run multiple MM2s for multiple DCs, and generally that's
what you want to do. Are you using Connect to run MM2, or the
connect-mirror-maker.sh driver?

Ryanne

On Mon, Sep 21, 2020, 3:38 PM Oleg Osipov 
wrote:

> I use the configuration for M2M for both datacentres
>   clusters:
> - {"name": "dc1", "bootstrapServers": ip1}
> - {"name": "dc2", "bootstrapServers": ip2}
>
> Do you mean I need use additional names besides 'dc1' and 'dc2'?
>
> On 2020/09/21 17:27:50, nitin agarwal  wrote:
> > Did you keep the cluster name the same? If yes, then it will cause
> > conflict in metadata stored in MM2 internal topics.
> >
> > Thanks,
> > Nitin
> >
> > On Mon, Sep 21, 2020 at 10:36 PM Oleg Osipov  >
> > wrote:
> >
> > > Hello!
> > >
> > > I have two datacenters DC1 and DC2. When I deploy M2M in DC1 or DC2 all
> > > things look correct. I can create a topic, and this topic will be
> > > synchronized with another datacenter. In this case, I have only one
> > > mirror maker. But I want to deploy M2M in each DC. After I did this,
> > > newly created topics aren't replicated anymore. That doesn't look like
> > > correct behavior. Am I wrong? Can I deploy M2M in two (or even more)
> > > datacenters?
> > >
> >
>


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-21 Thread Ryanne Dolan
Josh, make sure there is a consumer in cluster B subscribed to A.topic1.
Wait a few seconds for a checkpoint to appear upstream on cluster A, and
then translateOffsets() will give you the correct offsets.

By default MM2 will block consumers that look like kafka-console-consumer,
so make sure you specify a custom group ID when testing this.
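
For example (the broker address, topic, and group name are placeholders):

bin/kafka-console-consumer.sh --bootstrap-server b-kafka:9092 \
  --topic A.topic1 --group my-test-group --from-beginning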

Ryanne

On Thu, Aug 20, 2020, 11:21 AM Josh C  wrote:

> Thanks again Ryanne, I didn't realize that MM2 would handle that.
>
> However, I'm unable to mirror the remote topic back to the source cluster
> by adding it to the topic whitelist. I've also tried to update the topic
> blacklist and remove ".*\.replica" (since the blacklists take precedence
> over the whitelists), but that doesn't seem to be doing much either? Is
> there something else I should be aware of in the mm2.properties file?
>
> Appreciate all your help!
>
> Josh
>
> On Wed, Aug 19, 2020 at 12:55 PM Ryanne Dolan 
> wrote:
>
> > Josh, if you have two clusters with bidirectional replication, you only
> get
> > two copies of each record. MM2 won't replicate the data "upstream", cuz
> it
> > knows it's already there. In particular, MM2 knows not to create topics
> > like B.A.topic1 on cluster A, as this would be an unnecessary cycle.
> >
> > >  is there a reason for MM2 not emitting checkpoint data for the source
> > topic AND the remote topic
> >
> > No, not really! I think it would be surprising if one-directional flows
> > insisted on writing checkpoints both ways -- but it's also surprising
> that
> > you need to explicitly allow a remote topic to be checkpointed. I'd
> support
> > changing this, fwiw.
> >
> > Ryanne
> >
> > On Wed, Aug 19, 2020 at 2:30 PM Josh C  wrote:
> >
> > > Sorry, correction -- I am realizing now it would be 3 copies of the
> same
> > > topic data as A.topic1 has different data than B.topic1. However, that
> > > would still be 3 copies as opposed to just 2 with something like topic1
> > and
> > > A.topic1.
> > >
> > > As well, if I were to explicitly replicate the remote topic back to the
> > > source cluster by adding it to the topic whitelist, would I also need
> to
> > > update the topic blacklist and remove ".*\.replica" (since the
> blacklists
> > > take precedence over the whitelists)?
> > >
> > > Josh
> > >
> > > On Wed, Aug 19, 2020 at 11:46 AM Josh C 
> wrote:
> > >
> > > > Thanks for the clarification Ryanne. In the context of active/active
> > > > clusters, does this mean there would be 6 copies of the same topic
> > data?
> > > >
> > > > A topics:
> > > > - topic1
> > > > - B.topic1
> > > > - B.A.topic1
> > > >
> > > > B topics:
> > > > - topic1
> > > > - A.topic1
> > > > - A.B.topic1
> > > >
> > > > Out of curiosity, is there a reason for MM2 not emitting checkpoint
> > data
> > > > for the source topic AND the remote topic as a pair as opposed to
> > having
> > > to
> > > > explicitly replicate the remote topic back to the source cluster just
> > to
> > > > have the checkpoints emitted upstream?
> > > >
> > > > Josh
> > > >
> > > > On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan 
> > > > wrote:
> > > >
> > > >> Josh, yes it's possible to migrate the consumer group back to the
> > source
> > > >> topic, but you need to explicitly replicate the remote topic back to
> > the
> > > >> source cluster -- otherwise no checkpoints will flow "upstream":
> > > >>
> > > >> A->B.topics=test1
> > > >> B->A.topics=A.test1
> > > >>
> > > >> After the first checkpoint is emitted upstream,
> > > >> RemoteClusterUtils.translateOffsets() will translate B's A.test1
> > offsets
> > > >> into A's test1 offsets for you.
> > > >>
> > > >> Ryanne
> > > >>
> > > >> On Tue, Aug 18, 2020 at 5:56 PM Josh C 
> > wrote:
> > > >>
> > > >> > Hi there,
> > > >> >
> > > >> > I'm currently exploring MM2 and having some trouble with the
> > > >> > RemoteClusterUtils.translateOffsets() method. I have been
> successful
> > > in
> > > >> > migrating a consumer group from the source cluster to the target
> > > >> cluster,
> > > >> > but was wondering how I could migrate this consumer group back to
> > the
> > > >> > original source topic?
> > > >> >
> > > >> > It is my understanding that there isn't any checkpoint data being
> > > >> > emitted for this consumer group since it is consuming from a
> > mirrored
> > > >> topic
> > > >> > in the target cluster. I'm currently getting an empty map since
> > there
> > > >> isn't
> > > >> > any checkpoint data for 'target.checkpoints.internal' in the
> source
> > > >> > cluster. So, I was wondering how would I get these new translated
> > > >> offsets
> > > >> > to migrate the consumer group back to the source cluster?
> > > >> >
> > > >> > Please let me know if my question was unclear or if you require
> > > further
> > > >> > clarification! Appreciate the help.
> > > >> >
> > > >> > Thanks,
> > > >> > Josh
> > > >> >
> > > >>
> > > >
> > >
> >
>


Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ryanne Dolan
> Can we configure tasks.max for each of these connectors separately?

I don't believe that's currently possible. If you need fine-grained control
over each Connector like that, you might consider running MM2's Connectors
manually on a bunch of Connect clusters. This requires more effort to set
up, but enables you to control the configuration of each Connector using
the Connect REST API.
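
As a rough sketch (the host, connector name, and cluster details below are
placeholders), configuring one of MM2's connectors over the REST API looks
something like:

curl -X PUT -H "Content-Type: application/json" \
  http://connect-host:8083/connectors/mm2-source/config -d '{
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "A",
    "target.cluster.alias": "B",
    "source.cluster.bootstrap.servers": "a-kafka:9092",
    "target.cluster.bootstrap.servers": "b-kafka:9092",
    "topics": ".*",
    "tasks.max": "3"
  }'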

Ryanne

On Thu, Aug 20, 2020 at 12:30 PM Ananya Sen  wrote:

> Thanks, Ryanne. That answers my questions. I was actually missing this
> "tasks.max" property. Thanks for pointing that out.
>
> Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of
> connectors in a Mirror Maker Cluster:
>
>1. KafkaSourceConnector - focus on replicating topic partitions
>2. KafkaCheckpointConnector - focus on replicating consumer groups
>3. KafkaHeartbeatConnector - focus on checking cluster availability
>
> *Can we configure tasks.max for each of these connectors separately? That
> is, Can I have 3 tasks for KafkaSourceConnector, 5
> for KafkaCheckpointConnector, and 1 for KafkaHeartbeatConnector?*
>
>
>
> Regards
> Ananya Sen
>
> On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan 
> wrote:
>
> > Ananya, see responses below.
> >
> > > Can this number of workers be configured?
> >
> > The number of workers is not exactly configurable, but you can control it
> > by spinning up drivers and using the '--clusters' flag. A driver instance
> > without '--clusters' will run one worker for each A->B replication flow.
> So
> > e.g. if you've got two clusters being replicated bidirectionally, you'll
> > have an A->B worker and a B->A worker on each MM2 driver.
> >
> > You can use the '--clusters' flag to limit what clusters are targeted
> for a
> > given driver, which is useful in many ways, including to limit the number
> > of workers for a given driver. So e.g. if you've got 10 clusters all
> being
> > replicated in a full mesh you can run a driver with '--clusters A' and it
> > will have only 9 workers, one for each of the other clusters.
> >
> > Also note that there is a configuration property 'tasks.max' that
> controls
> > the number of tasks available to workers. Each A->B flow is replicated
> by a
> > Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> > default, 'tasks.max' is one, which means there will only be one task for
> > each Herd, regardless of how many drivers and workers you spin up. You
> > definitely want to change this property. You can tweak this for each A->B
> > replication flow independently to strike the right balance. If
> 'tasks.max'
> > is the same or more than the total number of topic-partitions being
> > replicated, it will mean each topic-partition is replicated in a
> dedicated
> > task, which is probably not an efficient use of resource overhead.
> >
> > > Does every topic partition given a new task?
> >
> > No, topic-partitions are spread out across tasks. Each topic's partitions
> > are divided round-robin among available tasks. However, keep in mind that
> > if 'tasks.max' is too high, you could end up with one topic-partition in
> > each task.
> >
> > > Does every consumer group - topic pair given a new task for replicating
> > offset?
> >
> > No, consumer-groups are also spread out across tasks. As with
> > topic-partitions, 'tasks.max' applies.
> >
> > > How can I scale up the mirror maker instance so that I can have very
> > little lag?
> >
> > Tweak 'tasks.max' and spin up more driver instances.
> >
> > Ryanne
> >
> > On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen 
> wrote:
> >
> > > Thank you Ryanne for the quick response.
> > > I further want to clarify a few points.
> > >
> > > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > > connect we have multiple workers and each worker has some assigned
> task.
> > To
> > > map this to Mirror Maker 2.0, A mirror Maker will driver have some
> > workers.
> > >
> > > 1) Can this number of workers be configured?
> > > 2) What is the default value of this worker configuration?
> > > 3) Does every topic partition given a new task?
> > > 4) Does every consumer group - topic pair given a new task for
> > replicating
> > > offset?
> > >
> > > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > each
> > > topic has a high amount of data + new data is being written at high
> > > throughput. Now I want to 

Re: Mirror Maker 2.0 Queries

2020-08-20 Thread Ryanne Dolan
Ananya, see responses below.

> Can this number of workers be configured?

The number of workers is not exactly configurable, but you can control it
by spinning up drivers and using the '--clusters' flag. A driver instance
without '--clusters' will run one worker for each A->B replication flow. So
e.g. if you've got two clusters being replicated bidirectionally, you'll
have an A->B worker and a B->A worker on each MM2 driver.

You can use the '--clusters' flag to limit what clusters are targeted for a
given driver, which is useful in many ways, including to limit the number
of workers for a given driver. So e.g. if you've got 10 clusters all being
replicated in a full mesh you can run a driver with '--clusters A' and it
will have only 9 workers, one for each of the other clusters.

Also note that there is a configuration property 'tasks.max' that controls
the number of tasks available to workers. Each A->B flow is replicated by a
Herd of Workers (in Connect terminology), and Herds work on Tasks. By
default, 'tasks.max' is one, which means there will only be one task for
each Herd, regardless of how many drivers and workers you spin up. You
definitely want to change this property. You can tweak this for each A->B
replication flow independently to strike the right balance. If 'tasks.max'
is the same or more than the total number of topic-partitions being
replicated, it will mean each topic-partition is replicated in a dedicated
task, which is probably not an efficient use of resource overhead.
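
In mm2.properties, that could look roughly like this (the aliases below are
placeholders), with a global default and a per-flow override:

tasks.max = 8
A->B.tasks.max = 16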

> Does every topic partition given a new task?

No, topic-partitions are spread out across tasks. Each topic's partitions
are divided round-robin among available tasks. However, keep in mind that
if 'tasks.max' is too high, you could end up with one topic-partition in
each task.

> Does every consumer group - topic pair given a new task for replicating
offset?

No, consumer-groups are also spread out across tasks. As with
topic-partitions, 'tasks.max' applies.

> How can I scale up the mirror maker instance so that I can have very
little lag?

Tweak 'tasks.max' and spin up more driver instances.

Ryanne

On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen  wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> connect we have multiple workers and each worker has some assigned task. To
> map this to Mirror Maker 2.0, A mirror Maker will driver have some workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Does every topic partition given a new task?
> 4) Does every consumer group - topic pair given a new task for replicating
> offset?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster and each
> topic has a high amount of data + new data is being written at high
> throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> replicate all the old data (which is retained in the topic) as well as the
> new incoming data in a backup cluster. How can I scale up the mirror maker
> instance so that I can have very little lag?
>
> On 2020/07/11 06:37:56, Ananya Sen  wrote:
> > Hi
> >
> > I was exploring the Mirror maker 2.0. I read through this
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > documentation
> > and I have  a few questions.
> >
> >1. For running mirror maker as a dedicated mirror maker cluster, the
> >documentation specifies a config file and a starter script. Is this
> mirror
> >maker process distributed ?
> >2. I could not find any port configuration for the above mirror maker
> >process, So can we configure mirror maker itself to run as a cluster
> i.e
> >running the process instance across multiple server to avoid downtime
> due
> >to server crash.
> >3. If we could somehow run the mirror maker as a distributed process
> >then does that mean that topic and consumer offset replication will be
> >shared among those mirror maker processes?
> >4. What is the default port of this mirror maker process and how can
> we
> >override it?
> >
> > Looking forward to your reply.
> >
> >
> > Thanks & Regards
> > Ananya Sen
> >
>


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Ryanne Dolan
Josh, if you have two clusters with bidirectional replication, you only get
two copies of each record. MM2 won't replicate the data "upstream", cuz it
knows it's already there. In particular, MM2 knows not to create topics
like B.A.topic1 on cluster A, as this would be an unnecessary cycle.

>  is there a reason for MM2 not emitting checkpoint data for the source
topic AND the remote topic

No, not really! I think it would be surprising if one-directional flows
insisted on writing checkpoints both ways -- but it's also surprising that
you need to explicitly allow a remote topic to be checkpointed. I'd support
changing this, fwiw.

Ryanne

On Wed, Aug 19, 2020 at 2:30 PM Josh C  wrote:

> Sorry, correction -- I am realizing now it would be 3 copies of the same
> topic data as A.topic1 has different data than B.topic1. However, that
> would still be 3 copies as opposed to just 2 with something like topic1 and
> A.topic1.
>
> As well, if I were to explicitly replicate the remote topic back to the
> source cluster by adding it to the topic whitelist, would I also need to
> update the topic blacklist and remove ".*\.replica" (since the blacklists
> take precedence over the whitelists)?
>
> Josh
>
> On Wed, Aug 19, 2020 at 11:46 AM Josh C  wrote:
>
> > Thanks for the clarification Ryanne. In the context of active/active
> > clusters, does this mean there would be 6 copies of the same topic data?
> >
> > A topics:
> > - topic1
> > - B.topic1
> > - B.A.topic1
> >
> > B topics:
> > - topic1
> > - A.topic1
> > - A.B.topic1
> >
> > Out of curiosity, is there a reason for MM2 not emitting checkpoint data
> > for the source topic AND the remote topic as a pair as opposed to having
> to
> > explicitly replicate the remote topic back to the source cluster just to
> > have the checkpoints emitted upstream?
> >
> > Josh
> >
> > On Wed, Aug 19, 2020 at 6:16 AM Ryanne Dolan 
> > wrote:
> >
> >> Josh, yes it's possible to migrate the consumer group back to the source
> >> topic, but you need to explicitly replicate the remote topic back to the
> >> source cluster -- otherwise no checkpoints will flow "upstream":
> >>
> >> A->B.topics=test1
> >> B->A.topics=A.test1
> >>
> >> After the first checkpoint is emitted upstream,
> >> RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
> >> into A's test1 offsets for you.
> >>
> >> Ryanne
> >>
> >> On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:
> >>
> >> > Hi there,
> >> >
> >> > I'm currently exploring MM2 and having some trouble with the
> >> > RemoteClusterUtils.translateOffsets() method. I have been successful
> in
> >> > migrating a consumer group from the source cluster to the target
> >> cluster,
> >> > but was wondering how I could migrate this consumer group back to the
> >> > original source topic?
> >> >
> >> > It is my understanding that there isn't any checkpoint data being
> >> > emitted for this consumer group since it is consuming from a mirrored
> >> topic
> >> > in the target cluster. I'm currently getting an empty map since there
> >> isn't
> >> > any checkpoint data for 'target.checkpoints.internal' in the source
> >> > cluster. So, I was wondering how would I get these new translated
> >> offsets
> >> > to migrate the consumer group back to the source cluster?
> >> >
> >> > Please let me know if my question was unclear or if you require
> further
> >> > clarification! Appreciate the help.
> >> >
> >> > Thanks,
> >> > Josh
> >> >
> >>
> >
>


Re: MirrorMaker 2.0 - Translating offsets for remote topics and consumer groups

2020-08-19 Thread Ryanne Dolan
Josh, yes it's possible to migrate the consumer group back to the source
topic, but you need to explicitly replicate the remote topic back to the
source cluster -- otherwise no checkpoints will flow "upstream":

A->B.topics=test1
B->A.topics=A.test1

After the first checkpoint is emitted upstream,
RemoteClusterUtils.translateOffsets() will translate B's A.test1 offsets
into A's test1 offsets for you.

Ryanne

On Tue, Aug 18, 2020 at 5:56 PM Josh C  wrote:

> Hi there,
>
> I'm currently exploring MM2 and having some trouble with the
> RemoteClusterUtils.translateOffsets() method. I have been successful in
> migrating a consumer group from the source cluster to the target cluster,
> but was wondering how I could migrate this consumer group back to the
> original source topic?
>
> It is my understanding that there isn't any checkpoint data being
> emitted for this consumer group since it is consuming from a mirrored topic
> in the target cluster. I'm currently getting an empty map since there isn't
> any checkpoint data for 'target.checkpoints.internal' in the source
> cluster. So, I was wondering how would I get these new translated offsets
> to migrate the consumer group back to the source cluster?
>
> Please let me know if my question was unclear or if you require further
> clarification! Appreciate the help.
>
> Thanks,
> Josh
>


Re: Kafka Mirror Maker 2: RemoteClusterUtils.translateOffsets() returning empty map

2020-08-04 Thread Ryanne Dolan
Sunny, check the groups and groups.blacklist properties. By default, MM2
won't replicate consumer groups from kafka-console-consumer, for example.
Sometimes that confuses people when testing MM2.
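
For example, explicitly allowing the group in mm2.properties (the group name
below is a placeholder):

groups = test-consumer
groups.blacklist = console-consumer-.*, connect-.*, __.*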

Also check the logs to see if there is any reason MirrorCheckpointConnector
might be failing to start.

Ryanne

On Tue, Aug 4, 2020 at 10:27 AM Sunny Lohani  wrote:

> Hi Ryanne,
>
> First of all, thanks for a quick revert. Actually, I have a consumer
> application consuming messages in cluster A and then I failover the
> consumer from A to cluster B. Also, the consumer is subscribed to both
> local and remote topics. Let's say the local topic name in both the
> clusters A and B is test-topic. Here is a sequence of steps that I am
> following before and after the failover:
>
> 1. Consumer application subscribes to topics test-topic and B.test-topic in
> cluster A with group.id "test-consumer"
> 2. It consumes some messages from both the topics.
> 3. Now, we stop the consumer application and restart it pointing to cluster
> B, subscribed to topics test-topic and A.test-topic, with group.id
> "test-consumer". I use the RemoteClusterUtils.translateOffsets() here.
> 4. The method returns an empty map as well as the checkpoints topic in both
> the clusters is empty.
>
> Let me know if you see anything wrong here.
>
> Thanks,
> Sunny
>
>
> On Tue, Aug 4, 2020 at 8:26 PM Ryanne Dolan  wrote:
>
> > Sunny, is it possible there are no consumer groups? There will be no
> > checkpoints, and thus nothing to use for offset translation, if there are
> > no upstream consumer groups.
> >
> > Ryanne
> >
> > On Tue, Aug 4, 2020, 9:28 AM Sunny Lohani 
> wrote:
> >
> > > Hi,
> > >
> > > I have 2 data centers, each having single node Zookeeper and Kafka
> > cluster.
> > > I have a topic (single partition) in both the data center kafka
> > clusters. I
> > > am using MM 2.0 as a dedicated cluster for bi-directional replication
> of
> > > the topic as well as using RemoteClusterUtils.translateOffsets() in my
> > > application for offset translation during failover. But the method is
> > > returning an empty map due to which the consumer is not resuming from
> > > proper offsets for local/remote topics.
> > >
> > > When I investigated further, I found that the checkpoint
> > > topics A.checkpoints.internal and B.checkpoints.internal in respective
> > > clusters do not have any kafka message. I don't see any errors in the
> > > mirror maker console logs. I searched everywhere on the internet but
> > could
> > > not get any help. Below is the mm.properties:
> > >
> > > clusters = A, B
> > > A.bootstrap.servers = 10.34.45.113:19092
> > > B.bootstrap.servers = 10.34.45.113:29092
> > >
> > > A->B.enabled = true
> > > A->B.topics = .*
> > > B->A.enabled = true
> > > B->A.topics = .*
> > >
> > > # Setting replication factor of newly created remote topics
> > > replication.factor=1
> > >
> > > checkpoints.topic.replication.factor=1
> > > heartbeats.topic.replication.factor=1
> > > offset-syncs.topic.replication.factor=1
> > >
> > > offset.storage.replication.factor=1
> > > status.storage.replication.factor=1
> > > config.storage.replication.factor=1
> > >
> > > sync.topic.acls.enabled = false
> > >
> > > emit.checkpoints.enabled = true
> > > emit.checkpoints.interval.seconds = 5
> > > 
> > >
> > > Need help on this urgently. Thanks in advance.
> > >
> > > Thanks & Regards,
> > > Sunny Kumar Lohani,
> > >
> >
>


Re: Kafka Mirror Maker 2: RemoteClusterUtils.translateOffsets() returning empty map

2020-08-04 Thread Ryanne Dolan
Sunny, is it possible there are no consumer groups? There will be no
checkpoints, and thus nothing to use for offset translation, if there are
no upstream consumer groups.

Ryanne

On Tue, Aug 4, 2020, 9:28 AM Sunny Lohani  wrote:

> Hi,
>
> I have 2 data centers, each having single node Zookeeper and Kafka cluster.
> I have a topic (single partition) in both the data center kafka clusters. I
> am using MM 2.0 as a dedicated cluster for bi-directional replication of
> the topic as well as using RemoteClusterUtils.translateOffsets() in my
> application for offset translation during failover. But the method is
> returning an empty map due to which the consumer is not resuming from
> proper offsets for local/remote topics.
>
> When I investigated further, I found that the checkpoint
> topics A.checkpoints.internal and B.checkpoints.internal in respective
> clusters do not have any kafka message. I don't see any errors in the
> mirror maker console logs. I searched everywhere on the internet but could
> not get any help. Below is the mm.properties:
>
> clusters = A, B
> A.bootstrap.servers = 10.34.45.113:19092
> B.bootstrap.servers = 10.34.45.113:29092
>
> A->B.enabled = true
> A->B.topics = .*
> B->A.enabled = true
> B->A.topics = .*
>
> # Setting replication factor of newly created remote topics
> replication.factor=1
>
> checkpoints.topic.replication.factor=1
> heartbeats.topic.replication.factor=1
> offset-syncs.topic.replication.factor=1
>
> offset.storage.replication.factor=1
> status.storage.replication.factor=1
> config.storage.replication.factor=1
>
> sync.topic.acls.enabled = false
>
> emit.checkpoints.enabled = true
> emit.checkpoints.interval.seconds = 5
> 
>
> Need help on this urgently. Thanks in advance.
>
> Thanks & Regards,
> Sunny Kumar Lohani,
>


Re: Mirrormaker 2 logs - WARN Catching up to assignment's config offset

2020-07-29 Thread Ryanne Dolan
Iftach, you can try deleting Connect's internal config and status topics.
The status topic records, among other things, the offsets within the config
topics iirc, so if you delete the configs without deleting the status,
you'll get messages such as those. Just don't delete the mm2-offsets
topics, as doing so would result in MM2 starting from the beginning of all
source partitions and re-replicating everything.

You can also check that there are enough partitions in the config and
status topics to account for all the Connect workers. It's possible that
you're in a rebalance loop from too many consumers to those internal topics.

Ryanne

On Wed, Jul 29, 2020 at 1:03 AM Iftach Ben-Yosef 
wrote:

> Hello
>
> I'm running a mirrormaker 2 cluster which copies from 3 source clusters
> into 1 destination. Yesterday I restarted the cluster and it took 1 of the
> mirrored topics a pretty long time to recover (~2 hours).
>
> Since the restart the mm2 cluster has been sending a lot of these warning
> messages from all 3 source clusters
>
>  WARN [Worker clientId=connect-4, groupId=local-ny-mm2] Catching up to
> assignment's config offset.
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)
>
> Here is a snippet of what the logs look like:
>
> TDOUT: [2020-07-29 05:44:17,143] INFO [Worker clientId=connect-2,
> groupId=local-chi-mm2] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:222)
> STDOUT: [2020-07-29 05:44:17,143] INFO [Worker clientId=connect-2,
> groupId=local-chi-mm2] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:552)
> STDOUT: [2020-07-29 05:44:17,144] INFO [Worker clientId=connect-2,
> groupId=local-chi-mm2] Successfully joined group with generation 9005
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:503)
> STDOUT: [2020-07-29 05:44:17,144] INFO [Worker clientId=connect-2,
> groupId=local-chi-mm2] Joined group at generation 9005 with protocol
> version 2 and got assignment: Assignment{error=0,
> leader='connect-1-cb7aa52c-a29a-4cf9-8f50-b691ba38aa3d',
> leaderUrl='NOTUSED/local-chi', offset=1178, connectorIds=[], taskIds=[],
> revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1549)
> STDOUT: [2020-07-29 05:44:17,144] WARN [Worker clientId=connect-2,
> groupId=local-chi-mm2] Catching up to assignment's config offset.
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)
> STDOUT: [2020-07-29 05:44:17,144] INFO [Worker clientId=connect-2,
> groupId=local-chi-mm2] Current config state offset -1 is behind group
> assignment 1178, reading to end of config log
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1081)
> STDOUT: [2020-07-29 05:44:17,183] INFO [Worker clientId=connect-6,
> groupId=local-chi-mm2] Successfully joined group with generation 9317
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:503)
> STDOUT: [2020-07-29 05:44:17,184] INFO [Worker clientId=connect-6,
> groupId=local-chi-mm2] Joined group at generation 9317 with protocol
> version 2 and got assignment: Assignment{error=0,
> leader='connect-5-9a82cf55-e113-4112-86bf-d3cabcd44f54',
> leaderUrl='NOTUSED/local-chi', offset=1401, connectorIds=[], taskIds=[],
> revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1549)
> STDOUT: [2020-07-29 05:44:17,184] WARN [Worker clientId=connect-6,
> groupId=local-chi-mm2] Catching up to assignment's config offset.
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)
> STDOUT: [2020-07-29 05:44:17,184] INFO [Worker clientId=connect-6,
> groupId=local-chi-mm2] Current config state offset -1 is behind group
> assignment 1401, reading to end of config log
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1081)
> STDOUT: [2020-07-29 05:44:17,239] INFO [Worker clientId=connect-8,
> groupId=local-sa-mm2] Finished reading to end of log and updated config
> snapshot, new config log offset: -1
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1085)
> STDOUT: [2020-07-29 05:44:17,239] INFO [Worker clientId=connect-8,
> groupId=local-sa-mm2] Current config state offset -1 does not match group
> assignment 1387. Forcing rebalance.
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1053)
> STDOUT: [2020-07-29 05:44:17,239] INFO [Worker clientId=connect-8,
> groupId=local-sa-mm2] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:222)
> STDOUT: [2020-07-29 05:44:17,239] INFO [Worker clientId=connect-8,
> groupId=local-sa-mm2] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:552)
> STDOUT: [2020-07-29 05:44:17,276] INFO [Worker clientId=connect-8,
> groupId=local-sa-mm2] Successfully joined group with generation 9077
> 

Re: mirrormaker 2 monitoring

2020-07-19 Thread Ryanne Dolan
Iftach, MM2 uses the same MetricsReporter interface that brokers use, though
it seems most people use a JMX exporter.
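
For example, attaching the Prometheus jmx_exporter java agent to the
dedicated driver (the jar path, port, and rules file below are placeholders):

export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=8080:/opt/mm2-rules.yml"
bin/connect-mirror-maker.sh config/mm2.properties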

Ryanne

On Sun, Jul 19, 2020, 7:12 AM Iftach Ben-Yosef 
wrote:

> Hello,
> I want to monitor our mirrormaker 2 cluster using prometheus. We are mostly
> interested in lag, incoming/outgoing traffic and java related metrics.
>
> Does anyone have experience with this? Should I use the kafka prometheus
> jmx exporter or some other exporter for this? (kafka connect exporter or
> something like that? it doesn't seem like mm has its own exporter, as far as
> I can tell)
>
> Any recommendations on how to set this up and which metrics you found
> useful would be appreciated!
>
> Thanks,
> Iftach
>


Re: Mirror Maker 2.0 Queries

2020-07-13 Thread Ryanne Dolan
Ananya, yes the driver is distributed, but each worker only communicates
via kafka. They do not listen on any ports.

Ryanne

On Sat, Jul 11, 2020, 11:28 AM Ananya Sen  wrote:

> Hi
>
> I was exploring the Mirror maker 2.0. I read through this
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> documentation
> and I have  a few questions.
>
>1. For running mirror maker as a dedicated mirror maker cluster, the
>documentation specifies a config file and a starter script. Is this
> mirror
>maker process distributed ?
>2. I could not find any port configuration for the above mirror maker
>process, so can we configure mirror maker itself to run as a cluster, i.e.
>running process instances across multiple servers to avoid downtime due
>to a server crash.
>3. If we could somehow run the mirror maker as a distributed process
>then does that mean that topic and consumer offset replication will be
>shared among those mirror maker processes?
>4. What is the default port of this mirror maker process and how can we
>override it?
>
> Looking forward to your reply.
>
>
> Thanks & Regards
> Ananya Sen
>


Re: destination topics in mm2 larger than source topic

2020-07-01 Thread Ryanne Dolan
Iftach, is it possible the source topic is compressed?

Ryanne

On Wed, Jul 1, 2020, 8:39 AM Iftach Ben-Yosef 
wrote:

> Hello everyone.
>
> I'm testing mm2 for our cross dc topic replication. We used to do it using
> mm1 but faced various issues.
>
> So far, mm2 is working well, but I have 1 issue which I can't really
> explain; the destination topic is larger than the source topic.
>
> For example, We have 1 topic which on the source cluster is around
> 2.8-2.9TB with retention.ms=8640
>
> I added to our mm2 cluster the "sync.topic.configs.enabled=false" config,
> and edited the retention.ms of the destination topic to be 5760. Other
> than that, I haven't touched the topic created by mm2 on the destination
> cluster.
>
> By logic I'd say that if I shortened the retention on the destination, the
> topic size should decrease, but in practice, I see that it is larger than
> the source topic (it's about 4.6TB).
> This same behaviour is seen on all 3 topics which I am currently mirroring
> (all 3 from different source clusters, into the same destination cluster).
>
> Does anyone have any idea as to why mm2 acts this way for me?
>
> Thanks,
> Iftach
>
>


Re: Should there be separate Kafka for producers and consumers

2020-05-23 Thread Ryanne Dolan
Hey Nitin, it's not uncommon to use separate clusters and MirrorMaker for
this purpose, e.g. separate ingest and ETL clusters. That way your ETL
cluster has only one producer (MM) and only one consumer (Camus).

It's also possible you just need more brokers and/or more partitions.

Ryanne

On Sat, May 23, 2020, 2:38 AM nitin agarwal  wrote:

> Hi All,
>
> We have Camus consumer which reads data from Kafka and writes to HDFS.
> Whenever this consumer runs, there is impact on producer latencies. We
> don't want producer latencies to be increased in anyway.
>
> I want to know which design pattern is more suited for supporting this use
> case.
> 1) Single Kafka cluster with efficient throttling limits for consumers.
> 2) Separate Kafka cluster for producers and consumers. Data is replicated
> using MirrorMaker.
>
> Thanks,
> Nitin
>


Re: MM2 Message Handlers

2020-05-18 Thread Ryanne Dolan
Jamie, MM2 uses Connect, so you can use Connect's SMTs for the same effect.

Ryanne

On Mon, May 18, 2020, 6:17 AM Jamie  wrote:

> Hi All,
> Does MM2 currently support message handlers, the same way MM1 did? If not,
> are there any plans to support this in future?
> Many Thanks,
> Jamie
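
To make "use Connect's SMTs" concrete, a hedged mm2.properties sketch: this
assumes flow-scoped keys are passed through to the underlying connector
config (worth verifying on your MM2 version), and the transform name and
field values are invented for illustration:

  # Tag every record replicated from A to B with its origin cluster.
  A->B.transforms = TagOrigin
  A->B.transforms.TagOrigin.type = org.apache.kafka.connect.transforms.InsertField$Value
  A->B.transforms.TagOrigin.static.field = origin
  A->B.transforms.TagOrigin.static.value = cluster-A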


Re: Identity Mirroring

2020-05-07 Thread Ryanne Dolan
Hey Henry, this was done with MM1 at LinkedIn at one point, but it requires
support for shallow iteration in KafkaConsumer, which was removed from
Apache Kafka a long time ago. Given recent talk of breaking changes in
Kafka 3.0, this might be a good time to revisit this.

Ryanne


On Thu, May 7, 2020, 2:46 AM Henry Cai  wrote:

> I saw this feature mentioned in the cloudera blog post:
> https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/
>
> High Throughput Identity Mirroring
> The batch does not need to be decompressed and compressed and deserialized
> and serialized if nothing had to be changed.  Identity mirroring can have a
> much higher throughput than the traditional approach. This is another
> feature that will be coming soon in MM2.
>
> Is this feature already implemented in MM2?  Is there a KIP or JIRA
> associated with the feature?
>


Re: Metrics in Kafka Connect

2020-04-30 Thread Ryanne Dolan
This is what I did inside MM2's Connectors and Tasks. It works pretty well,
but I'd certainly prefer it if Connect exposed its own Metrics API.

Ryanne

On Thu, Apr 30, 2020 at 2:57 PM Gérald Quintana 
wrote:

> Hello,
>
> We developed a custom Kafka Connect implementation for a specific need, and
> we would like to monitor its internals (request latency and rate, pool
> usage).
>
> Is it possible to publish custom metrics using the Kafka client metric
> framework (org.apache.kafka.common.metrics.*)? We would like to reuse the
> metric reporter defined at worker level and used for other connectors.
>
> Thanks,
> Gerald
>
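
A sketch of the approach described above: a standalone client-side Metrics
registry with its own JmxReporter, since Connect does not expose the worker's
registry. The JMX prefix, group, and metric names are examples, and the API
shown is the 2.x-era org.apache.kafka.common.metrics:

  import java.util.Collections;
  import org.apache.kafka.common.metrics.JmxReporter;
  import org.apache.kafka.common.metrics.MetricConfig;
  import org.apache.kafka.common.metrics.Metrics;
  import org.apache.kafka.common.metrics.Sensor;
  import org.apache.kafka.common.metrics.stats.Avg;
  import org.apache.kafka.common.metrics.stats.Rate;
  import org.apache.kafka.common.utils.Time;

  public class ConnectorMetrics {
      // A private registry; metrics show up in JMX under "my.connector".
      private final Metrics metrics = new Metrics(new MetricConfig(),
              Collections.singletonList(new JmxReporter("my.connector")),
              Time.SYSTEM);
      private final Sensor requests;

      public ConnectorMetrics() {
          requests = metrics.sensor("requests");
          requests.add(metrics.metricName("request-latency-avg", "my-task-metrics",
                  "Average request latency in ms"), new Avg());
          requests.add(metrics.metricName("request-rate", "my-task-metrics",
                  "Requests per second"), new Rate());
      }

      // Call from the task's poll()/put() loop with the measured latency.
      public void recordRequest(long latencyMs) {
          requests.record(latencyMs);
      }

      public void close() {
          metrics.close();
      }
  }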


Re: Kafka Mirror Maker 2

2020-04-29 Thread Ryanne Dolan
> 1. ... Do I need to manually create a topic TOPIC1 in cluster K2 and this
topic will be used for producing messages to when failover happens.

Correct. When you fail-over producers from K1 to K2, producers will send to
TOPIC1 in K2. You may need to manually create that topic and set up write
ACLs, etc. MM2 will automatically create and configure "remote topics",
including ACLs -- but it won't touch "normal" topics. I recommend you set
this up ahead of time.

> 2. .. how do I move back to my primary

The operation is the same, just in reverse. Any data produced to TOPIC1 in
K2 will get replicated to K2.TOPIC1 in K1. A consumer in K1 can then
consume from TOPIC1 _and_ K2.TOPIC1 in order to see records that were
produced to K2 during failover.

Ryanne

On Wed, Apr 29, 2020 at 3:05 PM Himanshu Tyagi 
wrote:

> Hey Team,
> I've a few doubts regarding how the producers work after failover in Mirror
> Maker 2
>
> Let's say that we have two clusters K1 and K2 and configured MM2
> replication for TOPIC1 (originally created in just K1).
>
> We configured the active-active replication:
>
> K1->K2.enabled = true
> K2->K1.enabled = true
> K1->K2.topics = .*
> K2->K1.topics = .*
>
> On starting mirror maker 2, I see that topics are replicated from cluster
> K1
> to K2 in the naming format K1.topic_name_here and vice versa for topics
> from cluster K2 to K1.
>
> I see that there was no topic TOPIC1 created in K2, only K1.TOPIC1 was
> created. I see this scenario working for consumers, as in the beginning the
> consumers consume TOPIC1 from cluster K1. When the cluster K1 out of
> service, fail over happens. Consumers start to consume K1.TOPIC1 from K2.
>
> My questions are as follows:
>
>1. For producers, they won't produce to the topic K1.TOPIC1 in cluster
>K2, my question is how do the producers go about producing data. Do I
> need
>to manually create a topic TOPIC1 in cluster K2 and this topic will be
>used for producing messages to when failover happens.
>2. If the above scenario is true, how do I move back to my primary
>cluster K1? As the topic TOPIC1 in cluster K2 has now diverged from
>the topic TOPIC1 in K1, how do we sync the messages in this scenario?
>
>
> --
> Best,
> Himanshu Tyagi
> Contact No.- +1 480 465 0625
>
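
To make the consumer side of failback concrete: a single regex subscription
covers both the normal topic and any remote copy, so the same code works
before, during, and after a migration. A minimal sketch; bootstrap servers
and group id are placeholders:

  import java.time.Duration;
  import java.util.Properties;
  import java.util.regex.Pattern;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class FailbackConsumer {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "k1-broker:9092"); // placeholder
          props.put("group.id", "my-app");                  // placeholder
          props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
          props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
          KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
          // Matches TOPIC1 itself and a single-hop remote copy like K2.TOPIC1.
          consumer.subscribe(Pattern.compile("([^.]+\\.)?TOPIC1"));
          while (true) {
              consumer.poll(Duration.ofMillis(500)).forEach(r ->
                      System.out.printf("%s-%d@%d%n",
                              r.topic(), r.partition(), r.offset()));
          }
      }
  }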


Re: MirrorMaker2 ordering guarantees

2020-04-27 Thread Ryanne Dolan
> Could the `max.in.flight.requests.per.connection=1` parameter help to
> prevent the "slightly out-of-order records"?

Yes that helps, but dupes are still possible when MM2 restarts or
rebalances, since it will restart at the latest commit. If you are
replicating something like CDC or changelogs, then dupes might be fine
(downstream state will be eventually consistent). That's a common pattern
with MM1 as well.

Ryanne

On Mon, Apr 27, 2020 at 4:47 AM Péter Sinóros-Szabó
 wrote:

> Hey Ryanne,
>
> Is there any documentation where I can read more about this "slightly
> out-of-order records"?
> It would help very much to see how we can use MM2 in our systems.
>
> Thanks,
> Peter
>
> On Thu, 23 Apr 2020 at 08:56, Péter Sinóros-Szabó <
> peter.sinoros-sz...@transferwise.com> wrote:
>
> > Hey Ryanna,
> >
> > Could the `max.in.flight.requests.per.connection=1` parameter help to
> > prevent the "slightly out-of-order records"?
> > Or is there any workaround for that? Duplicates are fine for me, but I'd
> > like to have the same order of messages too.
> > Can you please add some more detail about why those "slightly
> out-of-order
> > records" may happen?
> >
> > Thanks,
> > Peter
> >
> > On Wed, 22 Apr 2020 at 20:16, Ryanne Dolan 
> wrote:
> >
> >> Hey Peter, Connect will need to support transactions before we can
> >> guarantee the order of records in remote topics. We can guarantee that
> no
> >> records are dropped or skipped, even during consumer failover/migration
> >> etc, but we can still have duplicates and slightly out-of-order records
> in
> >> the downstream remote topics, for now.
> >>
> >> Ryanne
> >>
> >> On Wed, Apr 22, 2020 at 3:39 AM Péter Sinóros-Szabó
> >>  wrote:
> >>
> >> > Hey,
> >> >
> >> > so KIP-382 mentions that:
> >> > "Partitioning and order of records is preserved between source and
> >> remote
> >> > topics."
> >> > is the ordering of messages (I guess only in a partition) something
> >> that is
> >> > actually implemented in 2.4 (or in 2.5)?
> >> >
> >> > Or do I need to set `max.in.flight.requests.per.connection=1` ?
> >> >
> >> > Thanks,
> >> > Peter
> >> >
> >>
> >
>


Re: MM2 with older Kafka version

2020-04-23 Thread Ryanne Dolan
Yes :)

On Wed, Apr 22, 2020, 8:03 PM Henry Cai  wrote:

> Looks like MM2 ships with Kafka 2.4. If our brokers are still running on an
> older Kafka version (2.3), can MM2 running with 2.4 code work with
> brokers running 2.3 code?
>
> Thanks.
>
>


Re: MirrorMaker2 ordering guarantees

2020-04-22 Thread Ryanne Dolan
Hey Peter, Connect will need to support transactions before we can
guarantee the order of records in remote topics. We can guarantee that no
records are dropped or skipped, even during consumer failover/migration
etc, but we can still have duplicates and slightly out-of-order records in
the downstream remote topics, for now.

Ryanne

On Wed, Apr 22, 2020 at 3:39 AM Péter Sinóros-Szabó
 wrote:

> Hey,
>
> so KIP-382 mentions that:
> "Partitioning and order of records is preserved between source and remote
> topics."
> is the ordering of messages (I guess only in a partition) something that is
> actually implemented in 2.4 (or in 2.5)?
>
> Or do I need to set `max.in.flight.requests.per.connection=1` ?
>
> Thanks,
> Peter
>


Re: MirrorMaker2 - uneven loadbalancing

2020-03-23 Thread Ryanne Dolan
Thanks Peter for running this experiment. That looks sorta normal. It looks
like Connect is deciding to use 10 total tasks and doesn't care which ones
do what. Ideally you'd see the MirrorSourceConnector tasks evenly divided,
since they do the bulk of the work -- but that doesn't seem to be the case
with your selection of parameters.

I'd recommend bumping up the tasks.max a lot higher than 4 in order to
achieve finer-grained workloads and a more even balance.

Ryanne

On Mon, Mar 23, 2020 at 9:58 AM Péter Sinóros-Szabó
 wrote:

> so I made some tests with tasks.max = 4
>
> with 2 instances:
> - instance 1: 4 MirrorSourceConnector, 1 MirrorHeartbeatConnector tasks
> - instance 2: 4 MirrorCheckpointConnector, 1 MirrorHeartbeatConnector tasks
>
> with 3 instances:
> - instance 1: 3 MirrorCheckpointConnector tasks
> - instance 2: 3 MirrorSourceConnector tasks, 1 MirrorHeartbeatConnector
> task
> - instance 3: 1 MirrorSourceConnector, 1 MirrorCheckpointConnector task,
> 1 MirrorHeartbeatConnector task
>
> with 4 instances:
> - instance 1: 3 MirrorCheckpointConnector tasks
> - instance 2: 2 MirrorSourceConnector tasks, 1 MirrorHeartbeatConnector
> task
> - instance 3: 1 MirrorSourceConnector task, 1 MirrorCheckpointConnector
> task
> - instance 4: 1 MirrorSourceConnector task, 1 MirrorHeartbeatConnector task
>
> So it seems that it is not well balanced; it can be scaled somewhat, but
> it's not ideal.
> Is this how it should work?
>
> Peter
>
> On Fri, 20 Mar 2020 at 20:58, Ryanne Dolan  wrote:
>
> > Peter, what happens when you add an additional node? Usually Connect will
> > detect it and rebalance tasks accordingly. I'm wondering if that
> mechanism
> > isn't working for you.
> >
> > Ryanne
> >
> > On Fri, Mar 20, 2020 at 2:40 PM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Well, I don't know much about herders. If you can give some idea how to
> > > check it, I will try.
> > >
> > > Peter
> > >
> > > On Fri, 20 Mar 2020 at 17:47, Ryanne Dolan 
> > wrote:
> > >
> > > > Hmm, that's weird. I'd expect the type of tasks to be evenly
> > distributed
> > > as
> > > > well. Is it possible one of the internal topics is misconfigured
> s.t.
> > > the
> > > > Herders aren't functioning correctly?
> > > >
> > > > Ryanne
> > > >
> > > > On Fri, Mar 20, 2020 at 11:17 AM Péter Sinóros-Szabó
> > > >  wrote:
> > > >
> > > > > I use tasks.max = 4.
> > > > >
> > > > > I see 4 tasks of MirrorSourceConnectors on MM2 instances A.
> > > > > I see 4 tasks of MirrorCheckpointConnector and 1 task of
> > > > > MirrorHeartbeatConnector on MM2 instance B.
> > > > >
> > > > > The number of tasks are well distributed, but the type of tasks are
> > > not.
> > > > > According to Connect documentation I expected 1-3 or 2-2 tasks of
> > > > > the MirrorSourceConnectors on the two MM2 instances.
> > > > >
> > > > > So is this a bug or an expected behaviour?
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > > > On Fri, 20 Mar 2020 at 15:26, Ryanne Dolan 
> > > > wrote:
> > > > >
> > > > > > Peter, in Connect the Connectors are only run on the leader node.
> > > Most
> > > > of
> > > > > > the work is done in the Tasks, which should be divided across
> > nodes.
> > > > Make
> > > > > > sure you have tasks.max set to something higher than the default
> of
> > > 1.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > > On Fri, Mar 20, 2020, 8:53 AM Péter Sinóros-Szabó
> > > > > >  wrote:
> > > > > >
> > > > > > > Hey,
> > > > > > >
> > > > > > > I am using MM2 to mirror A cluster to B with tasks.max = 4.
> > > > > > >
> > > > > > > I started two instances of MM2 and noticed that all
> > > > > > MirrorSourceConnectors
> > > > > > > were running in one instance and the rest of the connectors in
> > the
> > > > > other.
> > > > > > >
> > > > > > > This results in a very uneven resource utilization and also it
> > did
> > > > not
> > > > > > > really spread the mirroring load between the two nodes.
> > > > > > >
> > > > > > > I assumed that MM2 will run 2-2 of those connectors in each
> > > instance.
> > > > > > >
> > > > > > > Is this current behaviour expected or did I miss something
> on
> > > how
> > > > to
> > > > > > > configure it better?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Peter
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
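
For reference, the knob in question lives in mm2.properties. A sketch; the
value is arbitrary, just comfortably above the number of MM2 instances:

  # More tasks = finer-grained units of work for Connect to balance
  # across instances.
  tasks.max = 16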


Re: MirrorMaker2 - uneven loadbalancing

2020-03-20 Thread Ryanne Dolan
Peter, what happens when you add an additional node? Usually Connect will
detect it and rebalance tasks accordingly. I'm wondering if that mechanism
isn't working for you.

Ryanne

On Fri, Mar 20, 2020 at 2:40 PM Péter Sinóros-Szabó
 wrote:

> Well, I don't know much about herders. If you can give some idea how to
> check it, I will try.
>
> Peter
>
> On Fri, 20 Mar 2020 at 17:47, Ryanne Dolan  wrote:
>
> > Hmm, that's weird. I'd expect the type of tasks to be evenly distributed
> as
> > well. Is it possible one of the internal topics is misconfigured s.t.
> the
> > Herders aren't functioning correctly?
> >
> > Ryanne
> >
> > On Fri, Mar 20, 2020 at 11:17 AM Péter Sinóros-Szabó
> >  wrote:
> >
> > > I use tasks.max = 4.
> > >
> > > I see 4 tasks of MirrorSourceConnectors on MM2 instances A.
> > > I see 4 tasks of MirrorCheckpointConnector and 1 task of
> > > MirrorHeartbeatConnector on MM2 instance B.
> > >
> > > The number of tasks are well distributed, but the type of tasks are
> not.
> > > According to Connect documentation I expected 1-3 or 2-2 tasks of
> > > the MirrorSourceConnectors on the two MM2 instances.
> > >
> > > So is this a bug or an expected behaviour?
> > >
> > > Thanks,
> > > Peter
> > >
> > > On Fri, 20 Mar 2020 at 15:26, Ryanne Dolan 
> > wrote:
> > >
> > > > Peter, in Connect the Connectors are only run on the leader node.
> Most
> > of
> > > > the work is done in the Tasks, which should be divided across nodes.
> > Make
> > > > sure you have tasks.max set to something higher than the default of
> 1.
> > > >
> > > > Ryanne
> > > >
> > > > On Fri, Mar 20, 2020, 8:53 AM Péter Sinóros-Szabó
> > > >  wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > I am using MM2 to mirror A cluster to B with tasks.max = 4.
> > > > >
> > > > > I started two instances of MM2 and noticed that all
> > > > MirrorSourceConnectors
> > > > > were running in one instance and the rest of the connectors in the
> > > other.
> > > > >
> > > > > This results in a very uneven resource utilization and also it did
> > not
> > > > > really spread the mirroring load between the two nodes.
> > > > >
> > > > > I assumed that MM2 will run 2-2 of those connectors in each
> instance.
> > > > >
> > > > > Is this current behaviour expected or did I miss something on
> how
> > to
> > > > > configure it better?
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > >
> > >
> >
>


Re: MirrorMaker2 not mirroring for 5 minutes when adding a topic

2020-03-20 Thread Ryanne Dolan
> Seeking to offset 180324042 for partition ...
>
> There aren't any logs with "x took too long (y ms)" or "timed out running
> task x"?
> Only some "x took y ms", but all of them during the startup and it does not
> seem to be a reason for the 5 minute delay:
>
>
> [2020-03-20 10:49:13,980] INFO Reflections took 1443 ms to scan 92 urls,
> producing 4486 keys and 18097 values [using 2 cores]
> (org.reflections.Reflections)
> [2020-03-20 10:49:14,986] INFO Reflections took 653 ms to scan 92 urls,
> producing 4486 keys and 18097 values [using 2 cores]
> (org.reflections.Reflections)
> [2020-03-20 10:49:16,059] INFO Reflections took 618 ms to scan 92 urls,
> producing 4486 keys and 18097 values [using 2 cores]
> (org.reflections.Reflections)
> [2020-03-20 10:49:19,929] INFO creating internal topics took 13 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:20,997] INFO creating internal topics took 10 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,637] INFO creating internal topics took 21 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,658] INFO creating upstream offset-syncs topic took 29
> ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,658] INFO creating internal topics took 33 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,680] INFO loading initial consumer groups took 21 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,953] INFO loading initial set of topic-partitions took
> 295 ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,964] INFO creating downstream topic-partitions took 10
> ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,980] INFO refreshing known target topics took 15 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:49:57,981] INFO Starting MirrorSourceConnector took 367 ms.
> (org.apache.kafka.connect.mirror.MirrorSourceConnector)
> [2020-03-20 10:49:58,442] INFO syncing topic configs took 461 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,316] INFO Stopping MirrorSourceConnector took 6051 ms.
> (org.apache.kafka.connect.mirror.MirrorSourceConnector)
> [2020-03-20 10:50:07,329] INFO creating upstream offset-syncs topic took 9
> ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,395] INFO loading initial set of topic-partitions took
> 66 ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,397] INFO creating downstream topic-partitions took 2
> ms (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,412] INFO refreshing known target topics took 15 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,412] INFO Starting MirrorSourceConnector took 95 ms.
> (org.apache.kafka.connect.mirror.MirrorSourceConnector)
> [2020-03-20 10:50:07,430] INFO creating internal topics took 13 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,448] INFO creating internal topics took 13 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,457] INFO loading initial consumer groups took 8 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:07,844] INFO syncing topic configs took 432 ms
> (org.apache.kafka.connect.mirror.Scheduler)
> [2020-03-20 10:50:09,755] INFO Stopping DistributedHerder-connect-1 took
> 922 ms. (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:10,749] INFO Stopping DistributedHerder-connect-1 took
> 994 ms. (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:10,755] INFO Stopping DistributedHerder-connect-1 took 6
> ms. (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:11,749] INFO Stopping DistributedHerder-connect-1 took
> 994 ms. (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:11,859] INFO Stopping
> task-thread-MirrorCheckpointConnector-3 took 3 ms.
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:11,861] INFO Stopping
> task-thread-MirrorCheckpointConnector-2 took 7 ms.
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:11,861] INFO Stopping
> task-thread-MirrorCheckpointConnector-0 took 5 ms.
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
> [2020-03-20 10:50:11,862] INFO Stopping
> task-thread-MirrorCheckpointConnector-1 took 6 ms.
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask)
>
>
>
>
> Peter
>
> On Wed, 18 Mar 2020 at 15:12, Ryanne Dolan  wrote:
>
> > Peter, can you share any log lines like "x took y ms" or "x took too long
> > (y ms)" or "timed out running task x"?
> >
>

Re: MirrorMaker2 - uneven loadbalancing

2020-03-20 Thread Ryanne Dolan
Hmm, that's weird. I'd expect the type of tasks to be evenly distributed as
well. Is it possible one of the internal topics is misconfigured s.t. the
Herders aren't functioning correctly?

Ryanne

On Fri, Mar 20, 2020 at 11:17 AM Péter Sinóros-Szabó
 wrote:

> I use tasks.max = 4.
>
> I see 4 tasks of MirrorSourceConnectors on MM2 instances A.
> I see 4 tasks of MirrorCheckpointConnector and 1 task of
> MirrorHeartbeatConnector on MM2 instance B.
>
> The number of tasks are well distributed, but the type of tasks are not.
> According to Connect documentation I expected 1-3 or 2-2 tasks of
> the MirrorSourceConnectors on the two MM2 instances.
>
> So is this a bug or an expected behaviour?
>
> Thanks,
> Peter
>
> On Fri, 20 Mar 2020 at 15:26, Ryanne Dolan  wrote:
>
> > Peter, in Connect the Connectors are only run on the leader node. Most of
> > the work is done in the Tasks, which should be divided across nodes. Make
> > sure you have tasks.max set to something higher than the default of 1.
> >
> > Ryanne
> >
> > On Fri, Mar 20, 2020, 8:53 AM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Hey,
> > >
> > > I am using MM2 to mirror A cluster to B with tasks.max = 4.
> > >
> > > I started two instances of MM2 and noticed that all
> > MirrorSourceConnectors
> > > were running in one instance and the rest of the connectors in the
> other.
> > >
> > > This results in a very uneven resource utilization and also it did not
> > > really spread the mirroring oad between the two nodes.
> > >
> > > I assumed that MM2 will run 2-2 of those connectors in each instance.
> > >
> > > Is this current behaviour expected or did I miss something on how to
> > > configure it better?
> > >
> > > Thanks,
> > > Peter
> > >
> >
>


Re: MirrorMaker2 - uneven loadbalancing

2020-03-20 Thread Ryanne Dolan
Peter, in Connect the Connectors are only run on the leader node. Most of
the work is done in the Tasks, which should be divided across nodes. Make
sure you have tasks.max set to something higher than the default of 1.

Ryanne

On Fri, Mar 20, 2020, 8:53 AM Péter Sinóros-Szabó
 wrote:

> Hey,
>
> I am using MM2 to mirror A cluster to B with tasks.max = 4.
>
> I started two instances of MM2 and noticed that all MirrorSourceConnectors
> were running in one instance and the rest of the connectors in the other.
>
> This results in a very uneven resource utilization and also it did not
> really spread the mirroring load between the two nodes.
>
> I assumed that MM2 will run 2-2 of those connectors in each instance.
>
> Is this current behaviour expected or did I miss something on how to
> configure it better?
>
> Thanks,
> Peter
>


Re: MirrorMaker2 not mirroring for 5 minutes when adding a topic

2020-03-18 Thread Ryanne Dolan
Peter, can you share any log lines like "x took y ms" or "x took too long
(y ms)" or "timed out running task x"?

Ryanne

On Tue, Mar 17, 2020 at 10:45 AM Péter Sinóros-Szabó
 wrote:

> Hey,
>
> Running a MM2 cluster to mirror from A->B clusters I noticed that when I
> add a new topic to A cluster, MM2 will notice it:
> [2020-03-17 13:14:05,477] INFO Found 2719 topic-partitions on main. 1 are
> new. 0 were removed. Previously had 2718.
> (org.apache.kafka.connect.mirror.MirrorSourceConnector)
>
> That's fine. It seems that MM2 connectors just simply restart to start up
> with the new configuration.
> My problem is that it takes about 5 minutes for MM2 to start to mirror
> messages again.
>
> What I see in the logs are:
> [2020-03-17 13:14:07,107] INFO Kafka startTimeMs: 1584450847106
> (org.apache.kafka.common.utils.AppInfoParser)
> [2020-03-17 13:14:07,204] INFO [Producer clientId=producer-11] Cluster ID:
> wif2mnkZTayzAEOb2VyoLA (org.apache.kafka.clients.Metadata)
>
> It seems that MM2 restarts the connectors in 2 seconds.
> But then I see the usual logs, but according to MM2's metrics, it is not
> mirroring any messages for about 5 minutes.
>
> [2020-03-17 13:14:10,485] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} Committing offsets
> (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2020-03-17 13:14:10,485] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} Committing offsets
> (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2020-03-17 13:14:10,485] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} flushing 0 outstanding
> messages for offset commit
> (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2020-03-17 13:14:10,485] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} flushing 0 outstanding
> messages for offset commit
> (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2020-03-17 13:14:10,637] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} Finished commitOffsets
> successfully in 152 ms (org.apache.kafka.connect.runtime.WorkerSourceTask)
> [2020-03-17 13:14:10,637] INFO
> WorkerSourceTask{id=MirrorCheckpointConnector-3} Finished commitOffsets
> successfully in 152 ms (org.apache.kafka.connect.runtime.WorkerSourceTask)
>
> So after ~5 minutes, I see that it is subscribing to the topics... and I
> also see in the metrics that that is the time when it starts to mirror
> messages.
>
> [2020-03-17 13:19:39,850] INFO [Consumer clientId=consumer-39,
> groupId=null] Subscribed to partition(s):  ...
> [2020-03-17 13:19:39,851] INFO Starting with 308 previously uncommitted
> partitions. (org.apache.kafka.connect.mirror.MirrorSourceTask)
> [2020-03-17 13:19:39,851] INFO [Consumer clientId=consumer-39,
> groupId=null] Seeking to offset 178508243 for partition
> twTasks.austrac-service.executeTask.default-0 (org.apache.kafka
>
>
> Is there any idea how to speed this up?
>
> Thanks,
> Peter
>


Re: MirrorMaker2 to mirror new topics?

2020-03-09 Thread Ryanne Dolan
Hey Peter, often the problem is the replication factor or min ISRs being
misconfigured, which can prevent MM2 (or anything else) from sending. Hope
that helps!

Ryanne

On Mon, Mar 9, 2020, 10:35 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> it is mirroring, but the problem was that one of the tasks failed with not
> being able to produce messages to the destination cluster and the task
> stopped there.
>
> Peter
>
> On Mon, 9 Mar 2020 at 11:20, Péter Sinóros-Szabó <
> peter.sinoros-sz...@transferwise.com> wrote:
>
> > Hi,
> >
> > Should MM2 automatically start to mirror new topics?
> > I see in the logs that MM2 created the remote topic, but it is not
> > mirroring the new messages.
> >
> > In the source cluster, the topic was created 3 days ago and there are about
> 200
> > million messages in it.
> > Just to test if the messages are there, I could consume 30.000 messages
> > from it (just didn't want to read more...).
> >
> > In the destination cluster, I see the topic is there, MM2 logs that it
> > created it, MM2 logs also that it is consuming from all 6 partitions of
> it
> > besides the roughly 600 other topics that it is mirroring.
> >
> > But on the destination cluster, I do not see new messages arriving and
> > also in total I only see about 22.000 messages.
> >
> > What can be the problem here?
> >
> > Thanks,
> > Peter
> >
>
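
A sketch of the settings worth double-checking for the failure mode Ryanne
describes. These keys appear in the sample connect-mirror-maker.properties;
the value 3 assumes a three-broker destination, and min.insync.replicas on
the destination topics should be checked separately on the broker side:

  # Remote topics created by MM2:
  replication.factor = 3
  # MM2 internal topics:
  checkpoints.topic.replication.factor = 3
  heartbeats.topic.replication.factor = 3
  offset-syncs.topic.replication.factor = 3
  # Connect internal topics:
  offset.storage.replication.factor = 3
  status.storage.replication.factor = 3
  config.storage.replication.factor = 3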


Re: How to produce in DR scenario with Mirror maker (automatic failover)

2020-03-07 Thread Ryanne Dolan
Hello, take a look at RemoteClusterUtils.translateOffsets, which will give
you the correct offsets _and topics_ to subscribe to. The method
automatically renames the topics according to the ReplicationPolicy. You
can leverage this method in your consumer itself or in external tooling.

Ryanne

On Sat, Mar 7, 2020, 12:27 PM Taha Aslam Qureshi  wrote:

> Hello,
>
> I am trying out Mirror Maker 2 on my Kafka 2.4 cluster for DR purposes. I
> have created a dedicated cluster for the DR. MM2 seems to be working fine,
> but I am not sure how I would be able to produce to a topic in a DR
> scenario.
>
> Current scenario: let's say I have a topic called "mytopic" in my primary
> cluster, and it will be replicated to the backup cluster with the prefix
> "primary", i.e. as "primary.mytopic".
>
> Primary Cluster1:
> mytopic
>
> Backup Cluster2
> primary.mytopic
>
> My app is bootstrapped to the primary cluster to produce to mytopic. In a
> DR scenario, is there an automatic way to switch the topic to
> primary.mytopic and bootstrap to the backup cluster?
>
> Thanks for the help!
>
>
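
A hedged sketch of the consumer-side failover using RemoteClusterUtils (the
producer side still needs external switching, e.g. DNS or config). Cluster
aliases, group id, and addresses are placeholders:

  import java.time.Duration;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;
  import org.apache.kafka.connect.mirror.RemoteClusterUtils;

  public class FailoverOffsets {
      public static void main(String[] args) throws Exception {
          // Client config for the *backup* cluster, where MM2 emits
          // checkpoints for groups it saw on "primary".
          Map<String, Object> backupConfig = new HashMap<>();
          backupConfig.put("bootstrap.servers", "backup-broker:9092"); // placeholder

          Map<TopicPartition, OffsetAndMetadata> offsets =
                  RemoteClusterUtils.translateOffsets(
                          backupConfig, "primary", "my-group", Duration.ofSeconds(30));

          // Each entry maps a partition of primary.mytopic on the backup
          // cluster to a translated offset for the group; seek a consumer
          // there (or commit these offsets) before resuming consumption.
          offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
      }
  }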


Re: Mirror Maker 2 MirrorClient

2020-02-27 Thread Ryanne Dolan
Hey Carl, that's what my team has done for our internal tooling, and I
designed MirrorClient with that in mind. Given a single mm2.properties file
you can create MirrorClients for each cluster and those in turn give you
Admin/Consumer/Producer clients if you need them. Our internal tooling
essentially looks for an mm2.properties file in a system-wide location and
lets you interrogate any given cluster by alias, which is nice when you
manage a very large number of Kafka clusters :)

Ryanne

On Thu, Feb 27, 2020, 1:21 PM Carl Graving  wrote:

> All:
>  I was tinkering around with the MirrorClient and was curious about the
> configs. I see that I can use the MirrorMaker config to pull in the same
> configs that were used to spin up the MM2 cluster. Seeing as MM2 is started
> with these configs in a property file passed in on the command line, I take
> it this implies that I should read the same file in my MirrorClient
> application and feed it into MirrorMakerConfig to then get the clientConfig,
> like in the code comments? Or am I reading too much into this, and should I
> just configure for the cluster I am setting up the client for?
>
> thanks,
> Carl
>
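
A sketch of the pattern described above: load the same mm2.properties used to
start the cluster, build a MirrorMakerConfig, and interrogate any cluster by
alias. The file location and aliases are placeholders:

  import java.io.FileInputStream;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Properties;
  import org.apache.kafka.connect.mirror.MirrorClient;
  import org.apache.kafka.connect.mirror.MirrorMakerConfig;

  public class Mm2Inspector {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.load(new FileInputStream("/etc/kafka/mm2.properties")); // placeholder
          Map<String, String> config = new HashMap<>();
          props.forEach((k, v) -> config.put(k.toString(), v.toString()));

          MirrorMakerConfig mmConfig = new MirrorMakerConfig(config);
          // clientConfig(alias) carries the connection details for that cluster.
          MirrorClient client = new MirrorClient(mmConfig.clientConfig("primary"));
          System.out.println("remote topics: " + client.remoteTopics());
          System.out.println("hops from secondary: " + client.replicationHops("secondary"));
          client.close();
      }
  }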


Re: Use a single consumer or create consumer per topic

2020-02-26 Thread Ryanne Dolan
On an older cluster like that, rebalances are stop-the-world and will kill your
throughput. Much better to have a bunch of consumer groups, one per topic,
so they can rebalance independently.

On Wed, Feb 26, 2020, 1:05 AM Mark Zang  wrote:

> Hi,
>
> I have a 20-broker Kafka cluster and there are about 50 topics to consume.
>
> Between creating a consumer for each topic and creating a consumer for all
> 50 topics, what are the pros and cons?
>
> What would be the suggested way if I enable auto commit every 10
> seconds?
>
> Kafka client version is 0.10.2
>
>
> Thanks!
>
> Mark
>
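
A minimal sketch of the one-group-per-topic layout, matching the 0.10.2-era
client mentioned above (hence the long-argument poll). Topic names, bootstrap
servers, and the group naming scheme are placeholders:

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;
  import java.util.Properties;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class PerTopicConsumers {
      public static void main(String[] args) {
          List<String> topics = Arrays.asList("topic-a", "topic-b"); // placeholders
          ExecutorService pool = Executors.newFixedThreadPool(topics.size());
          for (String topic : topics) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "broker:9092"); // placeholder
              // One group per topic: each group rebalances independently, so
              // a bounce of one consumer does not stall the other topics.
              props.put("group.id", "my-app-" + topic);
              props.put("enable.auto.commit", "true");
              props.put("auto.commit.interval.ms", "10000");
              props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
              props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.ByteArrayDeserializer");
              pool.submit(() -> {
                  // KafkaConsumer is not thread-safe; each thread owns its own.
                  KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
                  consumer.subscribe(Collections.singletonList(topic));
                  while (true) {
                      consumer.poll(100).forEach(record -> { /* process */ });
                  }
              });
          }
      }
  }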


Re: [External] Mirror Maker Questions

2020-02-20 Thread Ryanne Dolan
For 1), MM2 will work with older versions of Kafka. I've gotten it to work
with clusters as old as 0.10.2 but with some features disabled iirc.

Ryanne

On Thu, Feb 20, 2020, 2:49 AM Dean Barlan 
wrote:

> Hi everyone,
>
> I have a few small questions for you regarding MirrorMaker.
>
> 1.  I know that MirrorMaker 2.0 is only available starting with Kafka
> version 2.4.  Does that mean that if I was mirroring from cluster A to
> cluster B, that both clusters need to be running Kafka 2.4?
>
> 2.  For MirrorMaker 1.0, is there a way to change the name of the mirrored
> topic?  For testing, I have just one Kafka cluster and was trying to run MM
> replicating from a topic to the same cluster but with a different name.
>
> Thanks,
>


Re: Streams - multiple clusters support

2020-02-13 Thread Ryanne Dolan
Cyrille, I don't see why using MM1/2 would break your isolation
requirement. But if you can't mirror topics for some reason consider Flink
instead of Kafka Streams.

Ryanne

On Thu, Feb 13, 2020 at 10:52 AM Cyrille Karmann 
wrote:

> Hello,
>
> We are trying to create a streaming pipeline of data between different
> Kafka clusters. Our users send data to the input Kafka cluster, and we want
> to process this data and send the result to topics on another Kafka
> cluster.
>
> We have different reasons for this setup, but mainly it's for isolation:
> the two clusters don't have to have the same configuration and the first
> "input" Kafka cluster is critical: we want to be able to do maintenance on
> the second cluster without impacting the first one. Also we have more than
> a thousand topics on each side so managing them separately is easier.
>
> We are investigating different technologies for the processing part, and
> Kafka Streams looked promising, except it apparently does not support
> writing to a different cluster than the one it is reading from.
>
> I saw people on forums suggesting to write in the first cluster and use
> MirrorMaker to channel the data to the output cluster. This breaks our
> isolation requirements and adds more latency, so we don't want to do that.
>
> I have two questions:
>
> - Is there a reason behind the constraint that Kafka Streams can not
> produce to a different cluster? I see that Kafka Streams allows specifying
> different configuration for the producer, but it explicitly disallows it for
> ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, so it is definitely something the
> developers did not want to support (
>
> https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/StreamsConfig.html#getMainConsumerConfigs-java.lang.String-java.lang.String-
> )
> but I am not clear why it is so.
>
> - At the same time, there is the KafkaClientSupplier mechanism that allows
> to inject our own KafkaProducer. I was actually successful in injecting
> such a KafkaProducer that connects to a different cluster. The fact that I
> am able to do, using a not-very-documented API, something that other parts
> of the Kafka Streams library try to prevent me from doing makes me wonder if
> I am breaking something by doing this? In particular, one thing important
> to me is exactly-once processing so I want to be sure it would still work.
>
> Thanks,
> Cyrille
>
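
Since the thread hinges on the KafkaClientSupplier trick, a hedged sketch of
what that injection can look like. DefaultKafkaClientSupplier lives in an
internal package, so this is explicitly unsupported territory; and because
Streams EOS transactions are coordinated within a single cluster, exactly-once
should not be assumed to survive a cross-cluster producer:

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.Producer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.common.serialization.ByteArraySerializer;
  import org.apache.kafka.streams.processor.internals.DefaultKafkaClientSupplier;

  public class CrossClusterClientSupplier extends DefaultKafkaClientSupplier {
      private final String outputBootstrap;

      public CrossClusterClientSupplier(String outputBootstrap) {
          this.outputBootstrap = outputBootstrap;
      }

      @Override
      public Producer<byte[], byte[]> getProducer(Map<String, Object> config) {
          // Redirect only the producer; consumers stay on the input cluster.
          Map<String, Object> redirected = new HashMap<>(config);
          redirected.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, outputBootstrap);
          return new KafkaProducer<>(redirected,
                  new ByteArraySerializer(), new ByteArraySerializer());
      }
  }

  // Usage: new KafkaStreams(topology, props,
  //         new CrossClusterClientSupplier("output-cluster:9092"));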


Re: MM2 for DR

2020-02-12 Thread Ryanne Dolan
> elaborate a bit more about the active-active

Active/active in this context just means that both (or multiple)
clusters are used under normal operation, not just during an outage.
For this to work, you basically have isolated instances of your application
stack running in each DC, with MM2 keeping each DC in sync. If one DC is
unavailable, traffic is shifted to another DC. It's possible to set this up
s.t. failover/failback between DCs happens automatically and seamlessly,
e.g. with load balancers and health checks. It's more complicated to set up
than the active/standby approach, but DR sorta takes care of itself from
then on. I frequently demo this stuff, where I pull the plug on entire DCs
and apps keep running like nothing happened.

On Wed, Feb 12, 2020 at 12:05 AM benitocm  wrote:

> Hi Ryanne,
>
> Please could you elaborate a bit more about the active-active
> recommendation?
>
> Thanks in advance
>
> On Mon, Feb 10, 2020 at 10:21 PM benitocm  wrote:
>
> > Thanks very much for the response.
> >
> > Please could you elaborate a bit more about  "I'd
> > arc in that direction. Instead of migrating A->B->C->D..., active/active
> is
> > more like having one big cluster".
> >
> > Another thing that I would like to share is that currently my consumers
> > only consume from one topic, so introducing MM2 will impact
> > them.
> > Any suggestion in this regard would be greatly appreciated
> >
> > Thanks in advance again!
> >
> >
> > On Mon, Feb 10, 2020 at 9:40 PM Ryanne Dolan 
> > wrote:
> >
> >> Hello, sounds like you have this all figured out actually. A couple
> notes:
> >>
> >> > For now, we just need to handle DR requirements, i.e., we would not
> need
> >> active-active
> >>
> >> If your infrastructure is sufficiently advanced, active/active can be a
> >> lot
> >> easier to manage than active/standby. If you are starting from scratch
> I'd
> >> arc in that direction. Instead of migrating A->B->C->D..., active/active
> >> is
> >> more like having one big cluster.
> >>
> >> > secondary.primary.topic1
> >>
> >> I'd recommend using regex subscriptions where possible, so that apps
> don't
> >> need to worry about these potentially complex topic names.
> >>
> >> > An additional question. If the topic is compacted, i.e., the topic
> >> > keeps data forever, would switchover operations imply adding an
> >> > additional path in the topic name?
> >>
> >> I think that's right. You could always clean things up manually, but
> >> migrating between clusters a bunch of times would leave a trail of
> >> replication hops.
> >>
> >> Also, you might look into implementing a custom ReplicationPolicy. For
> >> example, you could squash "secondary.primary.topic1" into something
> >> shorter
> >> if you like.
> >>
> >> Ryanne
> >>
> >> On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:
> >>
> >> > Hi,
> >> >
> >> > After having a look to the talk
> >> >
> >> >
> >>
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> >> > and the
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> >> > I am trying to understand how I would use it
> >> > in the setup that I have. For now, we just need to handle DR
> >> requirements,
> >> > i.e., we would not need active-active
> >> >
> >> > My requirements, more or less, are the following:
> >> >
> >> > 1) Currently, we have just one Kafka cluster "primary" where all the
> >> > producers are producing to and where all the consumers are consuming
> >> from.
> >> > 2) In case "primary" crashes, we would need to have other Kafka
> cluster
> >> > "secondary" where we will move all the producer and consumers and keep
> >> > working.
> >> > 3) Once "primary" is recovered, we would need to move to it again (as
> we
> >> > were in #1)
> >> >
> >> > To fulfill #2, I have thought to have a new Kafka cluster "secondary"
> >> > and set up a replication procedure using MM2. However, it is not clear
> >> > to me how to proceed.
> >> >

Re: MM2 for DR

2020-02-10 Thread Ryanne Dolan
Hello, sounds like you have this all figured out actually. A couple notes:

> For now, we just need to handle DR requirements, i.e., we would not need
active-active

If your infrastructure is sufficiently advanced, active/active can be a lot
easier to manage than active/standby. If you are starting from scratch I'd
arc in that direction. Instead of migrating A->B->C->D..., active/active is
more like having one big cluster.

> secondary.primary.topic1

I'd recommend using regex subscriptions where possible, so that apps don't
need to worry about these potentially complex topic names.

> An additional question. If the topic is compacted, i.e., the topic keeps
> data forever, would switchover operations imply adding an additional path
> in the topic name?

I think that's right. You could always clean things up manually, but
migrating between clusters a bunch of times would leave a trail of
replication hops.

Also, you might look into implementing a custom ReplicationPolicy. For
example, you could squash "secondary.primary.topic1" into something shorter
if you like.

Ryanne

On Mon, Feb 10, 2020 at 1:24 PM benitocm  wrote:

> Hi,
>
> After having a look to the talk
>
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0
> and the
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382
> I am trying to understand how I would use it
> in the setup that I have. For now, we just need to handle DR requirements,
> i.e., we would not need active-active
>
> My requirements, more or less, are the following:
>
> 1) Currently, we have just one Kafka cluster "primary" where all the
> producers are producing to and where all the consumers are consuming from.
> 2) In case "primary" crashes, we would need to have other Kafka cluster
> "secondary" where we will move all the producer and consumers and keep
> working.
> 3) Once "primary" is recovered, we would need to move to it again (as we
> were in #1)
>
> To fulfill #2, I have thought to have a new Kafka cluster "secondary" and
> set up a replication procedure using MM2. However, it is not clear to me how
> to proceed.
>
> I would describe the high level details so you guys can point my
> misconceptions:
>
> A) Initial situation. As in the example of the KIP-382, in the primary
> cluster, we will have a local topic: "topic1" where the producers will
> produce to and the consumers will consume from. MM2 will create in  the
> primary the remote topic "primary.topic1" where the local topic in the
> primary will be replicated. In addition, the consumer group information of
> primary will be also replicated.
>
> B) Kafka primary cluster is not available. Producers are moved to produce
> into the topic1 that was manually created. In addition, consumers need
> to connect to
> secondary to consume the local topic "topic1" where the producers are now
> producing and from the remote topic  "primary.topic1" where the producers
> were producing before, i.e., consumers will need to aggregate.This is so
> because some consumers could have lag so they will need to consume from
> both. In this situation, local topic "topic1" in the secondary will be
> modified with new messages and will be consumed (its consumption
> information will also change) but the remote topic "primary.topic1" will
> not receive new messages but it will be consumed  (its consumption
> information will change)
>
> At this point, my conclusion is that consumers needs to consume from both
> topics (the new messages produced in the local topic and the old messages
> for consumers that had a lag)
>
> C) primary cluster is recovered (here is when the things get complicated
> for me). In the talk, the new primary is renamed to primary-2 and the MM2 is
> configured to active-active replication.
> The result is the following. The secondary cluster will end up with a new
> remote topic (primary-2.topic1) that will contain a replica of the new
> topic1 created in the primary-2 cluster. The primary-2 cluster will have 3
> topics. "topic1" will be a new topic where in the near future producers
> will produce, "secondary.topic1" contains the replica of the local topic
> "topic1" in the secondary and "secondary.primary.topic1" that is "topic1"
> of the old primary (got through the secondary).
>
> D) Once all the replicas are in sync, producers and consumers will be moved
> to the primary-2. Producers will produce to local topic "topic1" of
> primary-2 cluster. The consumers
> will connect to primary-2 to consume from "topic1" (new messages that come
> in), "secondary.topic1" (messages produced during the outage) and from
> "secondary.primary.topic1" (old messages)
>
> If topics have a retention time, e.g. 7 days, we could remove
> "secondary.primary.topic1" after a few days, leaving the situation as at
> the beginning. However, if another problem happens in the middle, the
> number of topics could be a little difficult to handle.
>
> An additional question. If the topic is compacted, i.e., the topic keeps
> data forever, would switchover operations imply adding an additional path
> in the topic name?
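
A hedged sketch of the "squash" idea Ryanne mentions above: a policy that
never stacks a second cluster prefix, so a topic that is already remote keeps
its name on further hops. Illustrative only; as noted in the thread, weakening
the renaming scheme also weakens MM2's cycle prevention:

  import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

  public class SquashingReplicationPolicy extends DefaultReplicationPolicy {
      @Override
      public String formatRemoteTopic(String sourceClusterAlias, String topic) {
          // "topic1" still becomes "primary.topic1", but "primary.topic1"
          // stays as-is instead of becoming "secondary.primary.topic1".
          return topicSource(topic) == null
                  ? super.formatRemoteTopic(sourceClusterAlias, topic)
                  : topic;
      }
  }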

Re: MirrorMaker 2 does not replicate all messages?

2020-01-23 Thread Ryanne Dolan
Nils, are those record counts or offsets?

Ryanne

On Thu, Jan 23, 2020, 2:46 AM Nils Hermansson 
wrote:

> Hello,
> I have setup MM2 and the config is similar to the example config.
>
> https://github.com/apache/kafka/blob/trunk/config/connect-mirror-maker.properties
> Primary has 3 brokers and secondary has 1 broker.
>
> I wrote a script to diff number of messages in topics between primary and
> secondary cluster.
>
> To get the number of messages in a topic I use:
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list
> localhost:9092 --topic mytopic --time -1 --offsets 1 | awk -F ":" '{sum +=
> $3} END {print sum}'
>
> Is there some reason not all messages get replicated?
>
> A first thought was that it may have been because I wiped the secondary
> cluster and started from beginning. So i switched the name of the secondary
> cluster and prefix in mm2 config. But I ended up with the same diff.
>
> Result:
> topic: topic1, primary: 236, secondary: 119, diff 117
> topic: topic2, primary: 2910, secondary: 1223, diff 1687
> topic: topic3, primary: 2248, secondary: 2248, diff 0
> topic: topic4, primary: 627, secondary: 91, diff 536
> topic: topic5, primary: 84, secondary: 77, diff 7
> topic: topic6, primary: 333, secondary: 156, diff 177
> topic: topic7, primary: 158, secondary: 45, diff 113
> topic: topic8, primary: 247, secondary: 48, diff 199
> ...
> ...
>
>
> BR
> Nils
>
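
One way to check Ryanne's question directly: the script above sums *end
offsets*, which count every record ever written to a partition, while the
records currently retained are closer to (latest - earliest). A sketch using
the same tool:

  # Latest (-1) and earliest (-2) offsets, summed across partitions.
  LATEST=$(bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list \
    localhost:9092 --topic mytopic --time -1 | awk -F ":" '{sum += $3} END {print sum}')
  EARLIEST=$(bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list \
    localhost:9092 --topic mytopic --time -2 | awk -F ":" '{sum += $3} END {print sum}')
  # Compare this between clusters instead of raw end offsets, which diverge
  # as soon as retention has deleted anything on either side.
  echo $((LATEST - EARLIEST))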


Re: MM2 offset sync vs checkpoints

2020-01-22 Thread Ryanne Dolan
Peter, the offset sync records are used internally to create the checkpoint
records. You are correct that the checkpoints include everything you need.

Ryanne

On Wed, Jan 22, 2020, 11:08 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> Why does MM2 need offset sync messages too? It seems to me that the
> checkpoint messages contain everything that's needed for offset
> translation. What am I missing here?
>
> Cheers,
> Peter
>


Re: Mirrormaker 2.0

2020-01-21 Thread Ryanne Dolan
Peter, the LegacyReplicationPolicy class is described in the existing
KIP-382 and is a requirement for the deprecation of MM1. I was planning to
implement it but would love the help if you're interested.

Ryanne

On Tue, Jan 21, 2020, 8:25 AM Péter Sinóros-Szabó
 wrote:

> Ryanne,
>
> I didn't do much work yet, just checked the interface to see if it is easy
> to implement or not.
>
> > The PR for LegacyReplicationPolicy should include any relevant fixes to
> get it to run without crashing
> Do you mean that there is already a PR for LegacyReplicationPolicy? If
> there is, please link it here, I could not find it.
>
> Thanks,
> Peter
>
>
>
>
> On Fri, 17 Jan 2020 at 20:58, Ryanne Dolan  wrote:
>
> > Peter, KIP-382 includes LegacyReplicationPolicy for this purpose, but no,
> > it has not been implemented yet. If you are interested in writing the PR,
> > it would not require a separate KIP before merging. Looks like you are
> > already doing the work :)
> >
> > It is possible, as you point out, that returning nulls like that will
> break
> > parts of MM2. The PR for LegacyReplicationPolicy should include any
> > relevant fixes to get it to run without crashing.
> >
> > Certainly some parts of MM2 will not work as intended when used with
> > LegacyReplicationPolicy, e.g. MM2 would not be able to prevent cycles.
> But
> > of course it should still _run_ and should work roughly the same as MM1.
> >
> > I'm happy to work with you on the PR if you are interested.
> >
> > Ryanne
> >
> > On Fri, Jan 17, 2020 at 9:34 AM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Hi Sebastian & Ryanne,
> > >
> > > do you maybe have an implementation of this, or just some ideas about
> how
> > to
> > > implement the policy that does not rename topics?
> > > I am checking the ReplicationPolicy interface and don't really know
> what
> > > the impact will be if I implement this:
> > >
> > > public String formatRemoteTopic(String sourceClusterAlias, String topic) {
> > >     return topic; }
> > > public String topicSource(String topic) { return null; // well, I do not
> > >     really know if this were a mirrored topic or not }
> > > public String upstreamTopic(String topic) { return topic; // well, I do
> > >     not really know if this were a mirrored topic or not }
> > > public String originalTopic(String topic) { return topic; }
> > >
> > > Thanks,
> > > Peter
> > >
> > > On Mon, 30 Dec 2019 at 06:57, Ryanne Dolan 
> > wrote:
> > >
> > > > Sebastian, you can drop in a custom jar in the "Connect plug-in path"
> > and
> > > > MM2 will be able to load it. That enables you to implement your own
> > > > ReplicationPolicy (and other pluggable interfaces) without compiling
> > > > everything.
> > > >
> > > > In an upcoming release we'll have a "LegacyReplicationPolicy" that
> does
> > > not
> > > > rename topics. It's possible "SimpleReplicationPolicy" is a better
> > name.
> > > >
> > > > Be advised that some features depend on correct ReplicationPolicy
> > > > semantics, which LegacyReplicationPolicy will explicitly break. For
> > > > example, MM2 cannot prevent cycles if topics are not renamed (or some
> > > other
> > > > similar mechanism is used).
> > > >
> > > > Ryanne
> > > >
> > > > On Sun, Dec 29, 2019, 7:41 PM Sebastian Schmitz <
> > > > sebastian.schm...@propellerhead.co.nz> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I found that it's using the DefaultReplicationPolicy that always
> > > returns
> > > > > "sourceClusterAlias + separator + topic" with only the separator
> > being
> > > > > configurable in the configuration-file with
> > > REPLICATION_POLICY_SEPARATOR.
> > > > >
> > > > > It seems like I need a different ReplicationPolicy, like a
> > > > > SimpleReplicationPolicy which always returns "topic" for the
> > > > > formatRemoteTopic, then. But that would mean that I can't download
> > the
> > > > > Binaries and have to build the whole thing myself after adding the
> > new
> > > > > Policy-file!?
> > > > > Or I could create a PR for a SimpleReplicationPolicy to be in some
> > > > > future build...
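
For reference, a cleaned-up sketch of the no-rename policy drafted in this
thread (roughly what KIP-382 calls LegacyReplicationPolicy). Returning null
from topicSource/upstreamTopic reflects that, without a prefix, provenance is
unknowable; per the caveats above, cycle prevention and other prefix-dependent
features will not work with it:

  import org.apache.kafka.connect.mirror.ReplicationPolicy;

  public class NoRenameReplicationPolicy implements ReplicationPolicy {
      @Override
      public String formatRemoteTopic(String sourceClusterAlias, String topic) {
          return topic; // remote topics keep their source names
      }

      @Override
      public String topicSource(String topic) {
          return null; // without a prefix, we can't tell where a topic came from
      }

      @Override
      public String upstreamTopic(String topic) {
          return null; // likewise unknowable
      }
  }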

Re: MirrorMaker 2

2020-01-18 Thread Ryanne Dolan
I think that's right. If there is no per-topic retention configured, Kafka
will use the cluster default.

Ryanne

On Sat, Jan 18, 2020, 10:21 AM Vishal Santoshi 
wrote:

> Last question: so if I set the global property on the
> destination cluster, would that be overridden by the per-topic retention?
> When I look at the ZK config for the specific topic, I do not see the
> specific overrides that I generally do, so I would assume it is honoring
> the destination's global retention?
>
> On Tue, Jan 14, 2020 at 10:11 PM Vishal Santoshi <
> vishal.santo...@gmail.com>
> wrote:
>
> > Thanks
> >
> > On Tue, Jan 14, 2020 at 8:47 AM Ryanne Dolan 
> > wrote:
> >
> >> Take a look at the DefaultConfigPropertyFilter class, which supports
> >> customizable blacklists via config.properties.blacklist.
> >>
> >> Ryanne
> >>
> >> On Tue, Jan 14, 2020, 6:05 AM Vishal Santoshi <
> vishal.santo...@gmail.com>
> >> wrote:
> >>
> >> > Thank you for the prompt reply, very much appreciated. I am not sure
> >> > disabling config sync is an option for us. How do I blacklist
> >> retention.ms
> >> > ?
> >> >
> >> > On Tue, Jan 14, 2020 at 12:47 AM Ryanne Dolan 
> >> > wrote:
> >> >
> >> > > Vishal, there is no support for overriding topic configuration like
> >> > > retention. Instead, remote topics will have the same configuration
> as
> >> > their
> >> > > source topics. You could disable config sync or blacklist
> >> retention.ms
> >> > to
> >> > > prevent that from happening, and then alter retention for remote
> >> topics
> >> > > manually if you need to.
> >> > >
> >> > > KIP-158 might help with this in the future by allowing you to
> control
> >> how
> >> > > Connect creates topics, to some extent.
> >> > >
> >> > > Ryanne
> >> > >
> >> > > On Mon, Jan 13, 2020, 9:55 PM Vishal Santoshi <
> >> vishal.santo...@gmail.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Can I override the retention on target topics through
> >> mm2.properties ?
> >> > > It
> >> > > > should be as simple as stating the retention.ms globally? I am also
> >> > > > curious whether it can be done at a single channel level?
> >> > > >
> >> > > > For example A->B, topic on B should have a retention of x and for
> >> B->A
> >> > > the
> >> > > > retention is y..
> >> > > >
> >> > > > Is that possible?
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > >
> >> >
> >>
> >
>
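
A sketch of the blacklist approach in mm2.properties. Note that setting
config.properties.blacklist replaces the default exclusion list, so in
practice you would likely append retention settings to the defaults rather
than use only the two entries shown:

  # Keep syncing topic configs, but never copy retention settings downstream.
  sync.topic.configs.enabled = true
  config.properties.blacklist = retention.ms, retention.bytes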


Re: Clustered MirrorMaker 2 configuration update

2020-01-17 Thread Ryanne Dolan
That's right. The leader(s) will apply the config by writing it to the
mm2-config topic(s), which the followers will pick up. If the entire
cluster is bounced, the new config will have been applied.

Ryanne

On Fri, Jan 17, 2020, 4:28 AM Péter Sinóros-Szabó
 wrote:

> Am I right that the configuration that is actually being used will come
> from the leader? Well, except for exceptions like the bootstrap list...
>
> Asking because I'd like to have a clear process to change the configuration
> in a way that at the end of the process I can be sure that all nodes use
> the new configuration.
>
> So assuming that I have a 2+ node MM2 cluster, all nodes have the same
> configuration and I'd like to change the configuration for the cluster, I
> plan to restart the nodes one after another with the new configuration. When
> I change configuration, I change it to the same on all nodes. Am I right
> that at the end I can be sure that the cluster uses the new configuration,
> because on the process I will bounce the leader for sure with the new
> config.
>
> I actually plan to run it on Kubernetes, so I am using a rolling restart on
> Kubernetes.
> I only have one mirroring path, but I guess this would work if I have more
> paths as well.
>
> Peter
>
> On Thu, 16 Jan 2020 at 16:36, Ryanne Dolan  wrote:
>
> > MM2 nodes only communicate via Kafka -- no connection is required between
> > them.
> >
> > To reconfigure, a rolling restart probably won't do what you expect,
> since
> > the configuration is always dictated by a single leader. Once the leader
> is
> > bounced, it will broadcast the new configuration via Kafka. If you
> bounce a
> > follower, it will still use the old configuration.
> >
> > It's slightly more complicated than that actually, since there are
> multiple
> > connectors in each cluster->cluster "herder". When there are multiple
> > clusters being replicated, there may be dozens of leaders. So a rolling
> > restart might be a good idea for larger deployments.
> >
> > And technically there are some properties that do affect followers, e.g.
> a
> > follower will read connection info like bootstrap.servers directly from
> > mm2.properties, not from the leader via Kafka. Obviously that would be a
> > chicken-egg problem otherwise! So a rolling restart would be prudent when
> > making such changes.
> >
> > So I guess, generally speaking, rolling restarts are a good idea -- just
> be
> > advised that it won't necessarily behave as you expect.
> >
> > Ryanne
> >
> > On Thu, Jan 16, 2020, 7:55 AM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Hi,
> > >
> > > I run two instances of MM2 with the command connect-mirror-maker.sh
> > >
> > > Q1., Is there any requirement to cluster MM2? Like a network connection
> > > between the nodes? How do MM2 coordinate the work between nodes?
> > >
> > > Q2., Assuming I run two instances and want to update the configuration,
> > > should it work if I rolling restart the nodes after each other? It
> seems
> > > that after the restart both nodes still run with the old configuration.
> > So
> > > what should be the correct way of reconfiguring a MM2 cluster?
> > >
> > > Cheers,
> > > Peter
> > >
> >
>


Re: Mirrormaker 2.0

2020-01-17 Thread Ryanne Dolan
Peter, KIP-382 includes LegacyReplicationPolicy for this purpose, but no,
it has not been implemented yet. If you are interested in writing the PR,
it would not require a separate KIP before merging. Looks like you are
already doing the work :)

It is possible, as you point out, that returning nulls like that will break
parts of MM2. The PR for LegacyReplicationPolicy should include any
relevant fixes to get it to run without crashing.

Certainly some parts of MM2 will not work as intended when used with
LegacyReplicationPolicy, e.g. MM2 would not be able to prevent cycles. But
of course it should still _run_ and should work roughly the same as MM1.

I'm happy to work with you on the PR if you are interested.

Ryanne

On Fri, Jan 17, 2020 at 9:34 AM Péter Sinóros-Szabó
 wrote:

> Hi Sebastian & Ryanne,
>
> do you maybe have an implementation of this, or just some ideas about how to
> implement the policy that does not rename topics?
> I am checking the ReplicationPolicy interface and don't really know what
> the impact will be if I implement this:
>
> public String formatRemoteTopic(String sourceClusterAlias, String topic) {
>     return topic;
> }
> public String topicSource(String topic) {
>     return null; // well, I do not really know if this were a mirrored topic or not
> }
> public String upstreamTopic(String topic) {
>     return topic; // well, I do not really know if this were a mirrored topic or not
> }
> public String originalTopic(String topic) {
>     return topic;
> }
>
> Thanks,
> Peter
>
> On Mon, 30 Dec 2019 at 06:57, Ryanne Dolan  wrote:
>
> > Sebastian, you can drop in a custom jar in the "Connect plug-in path" and
> > MM2 will be able to load it. That enables you to implement your own
> > ReplicationPolicy (and other pluggable interfaces) without compiling
> > everything.
> >
> > In an upcoming release we'll have a "LegacyReplicationPolicy" that does
> not
> > rename topics. It's possible "SimpleReplicationPolicy" is a better name.
> >
> > Be advised that some features depend on correct ReplicationPolicy
> > semantics, which LegacyReplicationPolicy will explicitly break. For
> > example, MM2 cannot prevent cycles if topics are not renamed (or some
> other
> > similar mechanism is used).
> >
> > Ryanne
> >
> > On Sun, Dec 29, 2019, 7:41 PM Sebastian Schmitz <
> > sebastian.schm...@propellerhead.co.nz> wrote:
> >
> > > Hello,
> > >
> > > I found that it's using the DefaultReplicationPolicy that always
> returns
> > > "sourceClusterAlias + separator + topic" with only the separator being
> > > configurable in the configuration-file with
> REPLICATION_POLICY_SEPARATOR.
> > >
> > > It seems like I need a different ReplicationPolicy, like a
> > > SimpleReplicationPolicy which always returns "topic" for the
> > > formatRemoteTopic, then. But that would mean that I can't download the
> > > Binaries and have to build the whole thing myself after adding the new
> > > Policy-file!?
> > > Or I could create a PR for a SimpleReplicationPolicy to be in some
> > > future build...
> > >
> > > Any suggestions for this?
> > >
> > > Thanks
> > >
> > > Sebastian
> > >
> > >
> > > On 30-Dec-19 1:39 PM, Sebastian Schmitz wrote:
> > > > Hello,
> > > >
> > > > another thing I found and didn't find any configuration in the KIP
> yet
> > > > was that if I have two clusters (source and target) and a topic
> > > > "replicateme" on the source-cluster it will get replicated to the
> > > > target-cluster as "source.replicateme".
> > > >
> > > > How can I stop it from adding the cluster-name in front of the
> > > > topic-name on target-cluster?
> > > >
> > > > Thanks
> > > >
> > > > Sebastian
> > > >
> > > > On 27-Dec-19 7:24 AM, Sebastian Schmitz wrote:
> > > >> Hello Ryanne,
> > > >>
> > > >> Is there a way to prevent that from happening? We have two separate
> > > >> clusters with some topics being replicated to the second one for
> > > >> reporting. If we replicate everything again that reporting would
> > > >> probably have some problems.
> > > >>
> > > >> Yes, I wondered when the Networking-guys would come and complain
> > > >> about me using too much bandwidth on the VPN-Link ;)
> > > >>
> > > >> Thanks
> > > >>
> > >

Re: Kafka Broker leader change without effect

2020-01-16 Thread Ryanne Dolan
That's right, thanks for the correction. I don't suppose the producer is
configured with acks=all in this case.
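For reference, a sketch of the combination under which losing a broker
actually stops producers (the values are illustrative, not from this thread):

# producer config
acks=all
# topic/broker config -- 3 is unusually strict here; 2 is typical for RF 3
min.insync.replicas=3

With those settings, a 3-replica partition reduced to 2 in-sync replicas
would reject produce requests with not-enough-replicas errors.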

Ryanne

On Thu, Jan 16, 2020, 11:05 AM JOHN, BIBIN  wrote:

> Producer request will not fail. Producer will fail based on acks and
> min.insync.replicas config parameters.
>
>
> -Original Message-
> From: Ryanne Dolan 
> Sent: Thursday, January 16, 2020 10:52 AM
> To: Kafka Users 
> Subject: Re: Kafka Broker leader change without effect
>
> Marco, the replication factor of 3 is not possible when you only have two
> brokers, thus the producer will fail to send records until the third broker
> is restored. You would need to change the topic replication factor to 2 for
> your experiment to work as you expect.
>
> Ryanne
>
> On Thu, Jan 16, 2020, 9:59 AM Marco Di Falco 
> wrote:
>
> > Hello guys!
> > i have a problem i wrote about stackoverflow here:
> > https://stackoverflow.com/questions/59772124/kafka-broker-leader-change-without-effect
> >
> > Can you help me?
> > thank you
> > Marco
> >
>


Re: Kafka Broker leader change without effect

2020-01-16 Thread Ryanne Dolan
Marco, the replication factor of 3 is not possible when you only have two
brokers, thus the producer will fail to send records until the third broker
is restored. You would need to change the topic replication factor to 2 for
your experiment to work as you expect.
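One hedged way to set that up, if the test topic can simply be recreated
(broker address and topic name are examples):

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic test --partitions 3 --replication-factor 2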

Ryanne

On Thu, Jan 16, 2020, 9:59 AM Marco Di Falco 
wrote:

> Hello guys!
> i have a problem i wrote about stackoverflow here:
> https://stackoverflow.com/questions/59772124/kafka-broker-leader-change-without-effect
>
> Can you help me?
> thank you
> Marco
>


Re: Clustered MirrorMaker 2 configuration update

2020-01-16 Thread Ryanne Dolan
MM2 nodes only communicate via Kafka -- no connection is required between
them.

To reconfigure, a rolling restart probably won't do what you expect, since
the configuration is always dictated by a single leader. Once the leader is
bounced, it will broadcast the new configuration via Kafka. If you bounce a
follower, it will still use the old configuration.

It's slightly more complicated than that actually, since there are multiple
connectors in each cluster->cluster "herder". When there are multiple
clusters being replicated, there may be dozens of leaders. So a rolling
restart might be a good idea for larger deployments.

And technically there are some properties that do affect followers, e.g. a
follower will read connection info like bootstrap.servers directly from
mm2.properties, not from the leader via Kafka. Obviously that would be a
chicken-egg problem otherwise! So a rolling restart would be prudent when
making such changes.

So I guess, generally speaking, rolling restarts are a good idea -- just be
advised that it won't necessarily behave as you expect.

Ryanne

On Thu, Jan 16, 2020, 7:55 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I run two instances of MM2 with the command connect-mirror-maker.sh
>
> Q1., Is there any requirement to cluster MM2? Like a network connection
> between the nodes? How do MM2 coordinate the work between nodes?
>
> Q2., Assuming I run two instances and want to update the configuration,
> should it work if I rolling restart the nodes after each other? It seems
> that after the restart both nodes still run with the old configuration. So
> what should be the correct way of reconfiguring a MM2 cluster?
>
> Cheers,
> Peter
>


Re: How to achieve "Effectively one big consumer group" in active/active clusters

2020-01-16 Thread Ryanne Dolan
In this case the consumers just subscribe to "topic1" like normal, and the
remote topics (primary.topic1, secondary.topic1) are just for DR. MM2 is
not required for things to work under normal circumstances, but if one
cluster goes down you can recover its data from the other.
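A minimal sketch of the normal-operation side of this (Java; broker address
and group name are assumptions):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "primary-broker:9092"); // whichever cluster is local
props.put("group.id", "big-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// normal operation: subscribe to the plain topic; after a failover the same
// subscription works against the other cluster, and secondary.topic1 holds
// the mirrored history if it is needed for recovery
consumer.subscribe(Collections.singletonList("topic1"));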

Ryanne

On Thu, Jan 16, 2020, 2:35 AM dixingxing  wrote:

> hello everyone:
>
> I've read this slide recently(
> https://www.confluent.io/kafka-summit-lon19/disaster-recovery-with-mirrormaker-2-0/),
> i'm not sure how to deploy active-active clusters like this.
>
>
> There is no detail about how to make an "Effectively one big consumer
> group". Since there is MM2 between the two clusters, the topics should
> look like this:
> *primary cluster:*
> topic1
> secondary.topic1
>
> *secondary cluster:*
> topic1
> primary.topic1
>
> I think the simplest way to make an "Effectively one big consumer group"
> is: each client creates 2 consumer instances, one subscribed to
> primary.topic1 and the other to secondary.topic1, so if one cluster
> crashes, the LB just needs to fail over to the other cluster, the
> consumer has to do nothing, and everything will work fine.
> But it seems there is no need for MM2, or MM2 is just for disaster
> recovery?
> Am I misunderstanding this slide?
>


Re: Mirror Maker 2 problems

2020-01-14 Thread Ryanne Dolan
Nils, a REST API for MM2 was discussed at length, but so far the community
hasn't settled on a plan. For now, if you need to monitor MM2's Connectors
over HTTP, you can use a bunch of Connect clusters and manually configure
them to run MirrorSourceConnector etc. This has some advantages, e.g. you
can use the Connect REST API to start, stop, and re-configure individual
connectors on-the-fly -- but it also means you have a lot of stuff to
configure and monitor yourself.
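A rough sketch of that manual wiring via the Connect REST API (the host,
port, connector name, and cluster aliases are all assumptions):

curl -X PUT http://connect-host:8083/connectors/mm2-source/config \
  -H 'Content-Type: application/json' -d '{
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "primary",
    "target.cluster.alias": "backup",
    "source.cluster.bootstrap.servers": "primary-broker:9092",
    "target.cluster.bootstrap.servers": "backup-broker:9092",
    "topics": ".*"
  }'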

I'd recommend monitoring MM2 via its JMX metrics and heartbeats. These
provide substantially more information than the Connect REST API.
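As a starting point, KIP-382 defines per-partition replication metrics under
an MBean pattern along these lines (verify the attribute names against your
build):

kafka.connect.mirror:type=MirrorSourceConnector,target=(alias),topic=(topic),partition=([0-9]+)
  attributes include: record-count, record-age-ms, replication-latency-ms, byte-rate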

Ryanne

On Tue, Jan 14, 2020 at 9:32 AM Nils Hermansson <
nils.hermans...@getdreams.com> wrote:

> Another question: is there any way to open up a REST port like the standard
> 8083 when running
>
> ./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties
>
> I have tried adding rest.port=8083 but it does not seem to work.
>
> Would like to be able to query the status of the connector via rest.
>
> On Tue, Jan 14, 2020 at 3:54 PM Nils Hermansson <
> nils.hermans...@getdreams.com> wrote:
>
> > Hello,
> > yes, that seems to do the trick.
> >
> > Another question: is there any documentation somewhere describing how the
> > offset translation topics MM2 creates actually work?
> > We need to write code ourselves until this PR is merged, I guess.
> > https://github.com/apache/kafka/pull/7577
> >
> >
> > Thanks for the help.
> >
> > On Tue, Jan 14, 2020 at 12:58 PM Karan Kumar 
> > wrote:
> >
> >> Hi Nils
> >>
> >> Can you try with the latest mm2 default config file found at
> >>
> >>
> https://github.com/apache/kafka/blob/trunk/config/connect-mirror-maker.properties
> >> .
> >> Please feel free to reach out again if you are still stuck.
> >>
> >> Thanks
> >> Karan
> >>
> >>
> >>
> >> On Tue, Jan 14, 2020 at 1:48 PM Nils Hermansson <
> >> nils.hermans...@getdreams.com> wrote:
> >>
> >> > Hello,
> >> > I am trying to setup a replication between 2 clusters with MM2. The
> >> goal is
> >> > to be able to disksnapshot the secondary cluster for backups hence we
> >> only
> >> > want one broker on the secondary cluster. It's very important for us
> >> that
> >> > we can replicate the offsets which MM2 should solve for us.
> >> >
> >> > Cluster:
> >> > A=primary
> >> > B=secondary
> >> >
> >> > Primary cluster have 3 brokers and secondary has one broker.
> >> >
> >> > My first attempt was as standalone worker.
> >> > This attempt yielded replication of the topics but no topics for
> offsets
> >> > were created.
> >> > MM2 configuration replication.factor = 1 was honored.
> >> >
> >> > Since I did not get any offsets in standalone mode I decided to test
> >> > cluster mode.
> >> >
> >> > Secondary attempt was running MM2 in cluster mode(still one broker in
> >> > secondary cluster) however the problem I get here is that MM2 tries to
> >> > write the messages with a replication of 3 to the secondary cluster.
> In
> >> > this case MM2 seem to create the offset topics but fail when trying to
> >> > create them.
> >> >
> >> > Giving:
> >> > org.apache.kafka.connect.errors.ConnectException: Error while
> >> attempting to
> >> > create/find topic(s) 'mm2-offsets.A.internal'
> >> > ...
> >> > ...
> >> > InvalidReplicationFactorException: Replication factor: 3 larger than
> >> > available brokers: 1
> >> > ...
> >> >
> >> > I have tried setting these config value in the properties config but
> >> they
> >> > do not seem to be honored:
> >> > replication.factor = 1
> >> >
> >> > A.replication.factor = 1
> >> > B.replication.factor = 1
> >> >
> >> > BR
> >> > Nils
> >> >
> >>
> >>
> >> --
> >> Thanks
> >> Karan
> >>
> >
>


Re: MirrorMaker 2

2020-01-14 Thread Ryanne Dolan
Take a look at the DefaultConfigPropertyFilter class, which supports
customizable blacklists via config.properties.blacklist.
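A minimal mm2.properties sketch of that (note this property replaces the
default blacklist, so you may want to append to it rather than overwrite it;
verify against your build):

config.properties.blacklist = retention.ms, retention.bytes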

Ryanne

On Tue, Jan 14, 2020, 6:05 AM Vishal Santoshi 
wrote:

> Thank you for the prompt reply, very much appreciated. I am not sure
> disabling config sync is an option for us. How do I blacklist retention.ms?
>
> On Tue, Jan 14, 2020 at 12:47 AM Ryanne Dolan 
> wrote:
>
> > Vishal, there is no support for overriding topic configuration like
> > retention. Instead, remote topics will have the same configuration as
> their
> > source topics. You could disable config sync or blacklist retention.ms
> to
> > prevent that from happening, and then alter retention for remote topics
> > manually if you need to.
> >
> > KIP-158 might help with this in the future by allowing you to control how
> > Connect creates topics, to some extent.
> >
> > Ryanne
> >
> > On Mon, Jan 13, 2020, 9:55 PM Vishal Santoshi  >
> > wrote:
> >
> > > Can I override the retention on target topics through mm2.properties ?
> > It
> > > should be as simple as stating the retention.ms globally? I am also
> > > curious whether it can be set at a single channel level?
> > >
> > > For example A->B, topic on B should have a retention of x and for B->A
> > the
> > > retention is y..
> > >
> > > Is that possible?
> > >
> > > Thanks.
> > >
> >
>


Re: MirrorMaker 2

2020-01-13 Thread Ryanne Dolan
Vishal, there is no support for overriding topic configuration like
retention. Instead, remote topics will have the same configuration as their
source topics. You could disable config sync or blacklist retention.ms to
prevent that from happening, and then alter retention for remote topics
manually if you need to.

KIP-158 might help with this in the future by allowing you to control how
Connect creates topics, to some extent.

Ryanne

On Mon, Jan 13, 2020, 9:55 PM Vishal Santoshi 
wrote:

> Can I override the retention on target topics through mm2.properties? It
> should be as simple as stating the retention.ms globally? I am also
> curious whether it can be set at a single channel level?
>
> For example A->B, topic on B should have a retention of x and for B->A the
> retention is y..
>
> Is that possible?
>
> Thanks.
>


Re: Where to run MM2? Source or destination DC/region?

2020-01-09 Thread Ryanne Dolan
+1, it is preferable to run MM2 at the target/destination cluster.
Basically, producer lag is more problematic than consumer lag, so you want
the producer as close to the target cluster as possible.

Also take a look at the "--clusters" option of the connect-mirror-maker.sh
command, which lets you specify which clusters are nearby each MM2 node. If
specified, the node will only produce to those specific clusters (and
consume from everywhere else).
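A hedged example of that invocation (the cluster alias is an assumption):

./bin/connect-mirror-maker.sh mm2.properties --clusters backup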

Ryanne

On Thu, Jan 9, 2020 at 10:32 AM Andrew Otto  wrote:

> Hi Peter,
>
> My understanding here comes from MirrorMaker 1, but I believe it holds for
> MM2 (someone correct me if I am wrong!)
> For the most part, if you have no latency or connectivity issues, running
> MM at the source will be fine.  However, the failure scenario is different
> if something goes wrong.
>
> When running at the destination, it is the kafka consumer that has to cross
> the network boundary.  If the consumer can't consume, it can always pick
> off from where it left off later.
>
> When running at the source, it is the kafka producer that has to cross the
> network boundary.  If the producer can't produce, it will eventually drop
> messages.
>
>
>
> On Thu, Jan 9, 2020 at 11:28 AM Péter Sinóros-Szabó
>  wrote:
>
> > Hey,
> >
> > I am thinking about where (well in which AWS region) should I run MM2.
> > I might be wrong, but as I know it is better to run it close to the
> > destination cluster.
> > But for other reasons, it would be much easier for me to run it at the
> > source.
> > So is it still advised to run MM2 at the destination?
> > Latency between source and destination is about 32ms.
> > What are the downsides if I run it at the source?
> >
> > Thanks,
> > Peter
> >
>


Re: Mirrormaker 2.0

2020-01-09 Thread Ryanne Dolan
Peter, that's right. So long as ReplicationPolicy is implemented with
proper semantics (i.e. the methods do what they say they should do) any
naming convention will work. You can also use something like double
underscore "__" as a separator with DefaultReplicationPolicy -- it doesn't
need to be a single character.
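For example (a sketch; DefaultReplicationPolicy would then produce remote
topic names like source__topic1):

replication.policy.separator = __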

Ryanne

On Thu, Jan 9, 2020, 7:24 AM Péter Sinóros-Szabó
 wrote:

> Hi Ryanne,
>
> Am I right that as far as I implement ReplicationPolicy properly, those
> features you just mentioned will work fine?
>
> Asking because we already use dot(.) underscore(_) and even hyphen(-)
> characters in not replicated topics :D , so it seems to be that we will
> need a more advanced renaming convention if we plan to use those mentioned
> features. Or we'll need to rename topics, but that's a huge task I'd like
> to avoid...
>
> Thanks,
> Peter
>


Re: MirrorMaker 2 throttling

2020-01-08 Thread Ryanne Dolan
Peter, have you tried overriding the client ID used by MM2's consumers?
Otherwise, the client IDs are dynamic, which would make it difficult to
throttle using quotas.
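A sketch of that approach, combining the per-cluster client.id override shown
elsewhere in this thread with a broker-side quota (the ID and the 10 MB/s
value are examples; check how your MM2 build derives the final client IDs):

# mm2.properties
source.client.id = mm2-replication

# on the source cluster
bin/kafka-configs.sh --zookeeper source-zk:2181 --alter \
  --add-config 'consumer_byte_rate=10485760' \
  --entity-type clients --entity-name mm2-replication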

Ryanne

On Wed, Jan 8, 2020, 10:12 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I'd like to throttle the mirroring process when I start Mirror Maker 2 for
> the first time, since it starts to pull all the messages that exist on the
> source cluster. I'd like to do this only to avoid putting too much traffic
> on the source cluster that may slow down existing production clients on it.
>
> I tried several quota setups on both the source and destination clusters,
> but none of them worked:
> - it either did not have any effect
> - or it slowed down the mirroring but also caused issues like
> ERROR WorkerSourceTask{id=MirrorHeartbeatConnector-0} Failed to flush,
> timed out while waiting for producer to flush outstanding 115 messages
>
> Is there a good practice on how to initialize/bootstrap a MirrorMaker
> cluster on an existing Kafka cluster?
>
> Cheers,
> Peter
>


Re: MirrorMaker 2 - Does it write anything to source cluster?

2020-01-08 Thread Ryanne Dolan
Peter, MM2 writes offset syncs upstream to the source cluster, which are
then used to emit checkpoints to the target cluster. There is no particular
reason why offset syncs are stored on the source cluster instead of the
target, and it's been suggested that we swap that around.
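To see those writes, one hedged check is to tail the offset-syncs topic on
the source cluster (the topic name follows MM2's
mm2-offset-syncs.<target alias>.internal convention; the records are binary,
so this mostly confirms traffic):

bin/kafka-console-consumer.sh --bootstrap-server source-broker:9092 \
  --topic mm2-offset-syncs.target.internal --from-beginning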

Ryanne

On Wed, Jan 8, 2020, 3:58 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> so I am planning to use MM2 and was wondering whether it has any impact on the
> source cluster when mirroring.
>
> Obviously it impacts the performance of the source cluster, so I plan to
> use quotas to solve that, but other than that,
>
> Does MM2 write anything back to the source cluster?
>
> As I understand documentation, it won't, but a clarification on this would
> be great.
>
> Thanks for creating MM2! :)
>
> best,
> Peter
>


Re: Question On KIP-500

2020-01-07 Thread Ryanne Dolan
Hello. The dev list might be a better place to ask this question. FWIW I
believe your interpretation is correct -- the proposal essentially uses two
separate clusters, comprising "controllers" and "brokers". N.B. that the
brokers cannot become controllers or vice versa.

You can find the discussion thread here:
https://lists.apache.org/thread.html/cce5313ebe72bde34bf0da3af5a1723db3ee871667b1fd8edf2ee7ab@%3Cdev.kafka.apache.org%3E

You will see I expressed concerns early on that "the proposal still
requires separate processes with separate configuration".

Ryanne

On Thu, Jan 2, 2020 at 9:45 AM M. Manna  wrote:

> Hello,
>
> Greetings of the New Year to everybody. Sorry for reviving this randomly,
> as I didn't have the original thread anymore.
>
> I was reading through this KIP and trying to following the current vs
> proposed diagrams. Once again, apologies for making mistakes in
> understanding this.
>
> Are we replacing the ZK cluster with another kafka cluster which will
> simply do what ZK is doing? Or, are we simply distributing the metadata
> management job to the existing kafka cluster? I believe it's the former as
> I read on the KIP that the metadata manager (in post-ZK world) would be
> separate from Kafka brokers. But it would be good if someone could correct me.
>
> Thanks and Best Regards,
>


Re: Kafka 2.4.0 & Mirror Maker 2.0 Error

2020-01-06 Thread Ryanne Dolan
I just downloaded the 2.4.0 release tarball and didn't run into any issues.
Peter, Jamie, can one of you file a jira ticket if you are still seeing
this? Thanks!

Ryanne

On Fri, Dec 27, 2019 at 12:04 PM Ryanne Dolan  wrote:

> Thanks Peter, I'll take a look.
>
> Ryanne
>
> On Fri, Dec 27, 2019, 7:48 AM Péter Sinóros-Szabó
>  wrote:
>
>> Hi,
>>
>> I see the same.
>> I just downloaded the Kafka zip and I run:
>>
>> ~/kafka-2.4.0-rc3$ ./bin/connect-mirror-maker.sh
>> config/connect-mirror-maker.properties
>>
>> Peter
>>
>> On Mon, 16 Dec 2019 at 17:14, Ryanne Dolan  wrote:
>>
>> > Hey Jamie, are you running the MM2 connectors on an existing Connect
>> > cluster, or with the connet-mirror-maker.sh driver? Given your question
>> > about plugin.path I'm guessing the former. Is the Connect cluster
>> running
>> > 2.4.0 as well? The jars should land in the Connect runtime without any
>> need
>> > to modify the plugin.path or copy jars around.
>> >
>> > Ryanne
>> >
>> > On Mon, Dec 16, 2019, 6:23 AM Jamie 
>> wrote:
>> >
>> > > Hi All,
>> > > I'm trying to set up mirror maker 2.0 with Kafka 2.4.0 however, I'm
>> > > receiving the following errors on startup:
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found.
>> > > Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not
>> > > found. Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not
>> > > found. Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > >
>> > > I've checked the jar file containing these class file is in the class
>> > > path.
>> > > Is there anything I need to add to plugin.path for the connect
>> properties
>> > > when running mirror maker?
>> > > Many Thanks,
>> > > Jamie
>> >
>>
>>
>> --
>>  - Sini
>>
>


Re: Mirrormaker 2.0

2019-12-29 Thread Ryanne Dolan
> Is there a way to prevent that from happening?

Unfortunately there is no tooling (yet?) to manipulate Connect's offsets,
so it's difficult to force MM2 to skip ahead, reset, etc.

One approach is to use Connect's Simple Message Transform feature. This
enables you to filter the messages being replicated, e.g. based on
timestamps, s.t. only recent messages are ever replicated. It's possible to
configure MM2 to use an existing SMT or you can write your own as a plug-in.
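To make that concrete, here is a minimal sketch of such a timestamp-based
filter (the class name and the cutoff.ms config key are hypothetical, not
part of Kafka; returning null from an SMT drops the record):

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.transforms.Transformation;

import java.util.Map;

public class DropOldRecords implements Transformation<SourceRecord> {
    private long cutoffMs;

    @Override
    public void configure(Map<String, ?> configs) {
        // hypothetical config key, supplied via the usual Connect transform wiring
        cutoffMs = Long.parseLong(String.valueOf(configs.get("cutoff.ms")));
    }

    @Override
    public SourceRecord apply(SourceRecord record) {
        // null tells the Connect framework to filter out this record
        Long ts = record.timestamp();
        return (ts != null && ts < cutoffMs) ? null : record;
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("cutoff.ms", ConfigDef.Type.LONG,
                ConfigDef.Importance.MEDIUM,
                "Drop records with timestamps before this epoch-millis value");
    }

    @Override
    public void close() {}
}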

Ryanne

On Thu, Dec 26, 2019, 12:25 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello Ryanne,
>
> Is there a way to prevent that from happening? We have two separate
> clusters with some topics being replicated to the second one for
> reporting. If we replicate everything again that reporting would
> probably have some problems.
>
> Yes, I wondered when the Networking-guys would come and complain about
> me using too much bandwidth on the VPN-Link ;)
>
> Thanks
>
> Sebastian
>
> On 24-Dec-19 1:11 PM, Ryanne Dolan wrote:
> > Glad to hear you are replicating now :)
> >
> >> it probably started mirroring the last seven days as there was no offset
> > for the new consumer-group.
> >
> > That's correct -- MM2 will replicate the entire topic, as far back as the
> > retention period. However, technically there are no consumer groups in
> MM2!
> >
> > 550MB/s in a test cluster sounds pretty good to me. Try increasing
> > "tasks.max" and adding additional nodes.
> >
> > Ryanne
> >
> >
> > On Mon, Dec 23, 2019 at 5:40 PM Sebastian Schmitz <
> > sebastian.schm...@propellerhead.co.nz> wrote:
> >
> >> Hello again!
> >>
> >> Some probably important configs I found out:
> >>
> >> We need this to enable mirroring as it seems to be disabled by default?
> >>
> >> source->target.enabled = true
> >> target->source.enabled = true
> >>
> >> Also, the Client-IDs can be configured using:
> >>
> >> source.client.id = my_cool_id
> >> target.client.id = my_cooler_id
> >>
> >> I configured them to include the ID of the server and the name of the
> >> environment to have separate IDs per mirror-node.
> >>
> >> After adding these two, it looks a bit better than before, but still not
> >> satisfied as it started to mirror from my prod to test with 550MB/s as
> >> it probably started mirroring the last seven days as there was no offset
> >> for the new consumer-group. That's next on my list to solve.
> >>
> >> Best regards
> >>
> >> Sebastian
> >>
> >> On 24-Dec-19 8:34 AM, Sebastian Schmitz wrote:
> >>> Hello,
> >>>
> >>> I tried running this connect-mirror-config:
> >>>
> >>> 
> >>> name = $MIRROR_NAME
> >>> clusters = source, target
> >>> source.bootstrap.servers = $SOURCE_SERVERS
> >>> target.bootstrap.servers = $TARGET_SERVERS
> >>> source->target.topics = $SOURCE_TARGET_TOPICS
> >>> target->source.topics = $TARGET_SOURCE_TOPICS
> >>> source->target.emit.heartbeats.enabled = true
> >>> target->source.emit.heartbeats.enabled = true
> >>> connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
> >>>
> >>> # disable some new features
> >>> refresh.topics.enabled = false
> >>> refresh.groups.enabled = false
> >>> emit.checkpoints.enables = true
> >>> emit.heartbeats.enabled = true
> >>> sync.topic.configs.enabled = false
> >>> sync.topic.acls.enabled = false
> >>> 
> >>>
> >>> SOURCE_SERVERS and TARGET_SERVERS are a comma-separated list of three
> >>> brokers with ports.
> >>> The TOPICS are |-separated lists of topics.
> >>>
> >>> I get these warning during startup which is a bit weird as I never
> >>> supplied any of those settings, but maybe I should?
> >>>
> >>> [2019-12-23 00:36:25,918] WARN The configuration
> >>> 'config.storage.topic' was supplied but isn't a known config.
> >>> (org.apache.kafka.clients.producer.ProducerConfig:355)
> >>> [2019-12-23 00:36:25,918] WARN The configuration
> >>> 'producer.bootstrap.servers' was supplied but isn't a known config.
> >>> (org.apache.kafka.clients.producer.ProducerConfig:355)
> >>> [2019-12-23 00:36:25,918] WARN The configuration 'group.id' was
> >>> supplied but isn't a known config.
> >>> (org.apache.kafka.clients.producer.ProducerConfig:355)

Re: Mirrormaker 2.0

2019-12-29 Thread Ryanne Dolan
Sebastian, you can drop in a custom jar in the "Connect plug-in path" and
MM2 will be able to load it. That enables you to implement your own
ReplicationPolicy (and other pluggable interfaces) without compiling
everything.

In an upcoming release we'll have a "LegacyReplicationPolicy" that does not
rename topics. It's possible "SimpleReplicationPolicy" is a better name.

Be advised that some features depend on correct ReplicationPolicy
semantics, which LegacyReplicationPolicy will explicitly break. For
example, MM2 cannot prevent cycles if topics are not renamed (or some other
similar mechanism is used).
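With that caveat, a minimal sketch of such a policy (the class name is an
example; wire it in with replication.policy.class and the plugin path above):

import org.apache.kafka.connect.mirror.ReplicationPolicy;

public class IdentityReplicationPolicy implements ReplicationPolicy {
    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return topic; // never rename
    }
    @Override
    public String topicSource(String topic) {
        return null; // cannot tell a remote topic from a local one
    }
    @Override
    public String upstreamTopic(String topic) {
        return null; // likewise unknowable without renaming
    }
}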

Ryanne

On Sun, Dec 29, 2019, 7:41 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello,
>
> I found that it's using the DefaultReplicationPolicy that always returns
> "sourceClusterAlias + separator + topic" with only the separator being
> configurable in the configuration-file with REPLICATION_POLICY_SEPARATOR.
>
> It seems like I need a different ReplicationPolicy, like a
> SimpleReplicationPolicy which always returns "topic" for the
> formatRemoteTopic, then. But that would mean that I can't download the
> Binaries and have to build the whole thing myself after adding the new
> Policy-file!?
> Or I could create a PR for a SimpleReplicationPolicy to be in some
> future build...
>
> Any suggestions for this?
>
> Thanks
>
> Sebastian
>
>
> On 30-Dec-19 1:39 PM, Sebastian Schmitz wrote:
> > Hello,
> >
> > another thing I found and didn't find any configuration in the KIP yet
> > was that if I have two clusters (source and target) and a topic
> > "replicateme" on the source-cluster it will get replicated to the
> > target-cluster as "source.replicateme".
> >
> > How can I stop it from adding the cluster-name in front of the
> > topic-name on target-cluster?
> >
> > Thanks
> >
> > Sebastian
> >
> > On 27-Dec-19 7:24 AM, Sebastian Schmitz wrote:
> >> Hello Ryanne,
> >>
> >> Is there a way to prevent that from happening? We have two separate
> >> clusters with some topics being replicated to the second one for
> >> reporting. If we replicate everything again that reporting would
> >> probably have some problems.
> >>
> >> Yes, I wondered when the Networking-guys would come and complain
> >> about me using too much bandwidth on the VPN-Link ;)
> >>
> >> Thanks
> >>
> >> Sebastian
> >>
> >> On 24-Dec-19 1:11 PM, Ryanne Dolan wrote:
> >>> Glad to hear you are replicating now :)
> >>>
> >>>> it probably started mirroring the last seven days as there was no
> >>>> offset
> >>> for the new consumer-group.
> >>>
> >>> That's correct -- MM2 will replicate the entire topic, as far back
> >>> as the
> >>> retention period. However, technically there are no consumer groups
> >>> in MM2!
> >>>
> >>> 550MB/s in a test cluster sounds pretty good to me. Try increasing
> >>> "tasks.max" and adding additional nodes.
> >>>
> >>> Ryanne
> >>>
> >>>
> >>> On Mon, Dec 23, 2019 at 5:40 PM Sebastian Schmitz <
> >>> sebastian.schm...@propellerhead.co.nz> wrote:
> >>>
> >>>> Hello again!
> >>>>
> >>>> Some probably important configs I found out:
> >>>>
> >>>> We need this to enable mirroring as it seems to be disabled by default?
> >>>>
> >>>> source->target.enabled = true
> >>>> target->source.enabled = true
> >>>>
> >>>> Also, the Client-IDs can be configured using:
> >>>>
> >>>> source.client.id = my_cool_id
> >>>> target.client.id = my_cooler_id
> >>>>
> >>>> I configured them to include the ID of the server and the name of the
> >>>> environment to have separate IDs per mirror-node.
> >>>>
> >>>> After adding these two, it looks a bit better than before, but
> >>>> still not
> >>>> satisfied as it started to mirror from my prod to test with 550MB/s as
> >>>> it probably started mirroring the last seven days as there was no
> >>>> offset
> >>>> for the new consumer-group. That's next on my list to solve.
> >>>>
> >>>> Best regards
> >>>>
> >>>> Sebastian
> >>>>

Re: MM2 startup delay

2019-12-27 Thread Ryanne Dolan
> ... Finished commitOffsets
> successfully in 192 ms
> (org.apache.kafka.connect.runtime.WorkerSourceTask:515)
> [2019-12-27 11:36:53,911] INFO WorkerSourceTask{id=MirrorSourceConnector-0}
> Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:416)
> [2019-12-27 11:36:53,911] INFO WorkerSourceTask{id=MirrorSourceConnector-0}
> flushing 0 outstanding messages for offset commit
> (org.apache.kafka.connect.runtime.WorkerSourceTask:433)
>
>
> then ...
>
> [2019-12-27 11:44:47,642] ERROR Scheduler for MirrorSourceConnector caught
> exception in scheduled task: syncing topic ACLs
> (org.apache.kafka.connect.mirror.Scheduler:102)
> java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.SecurityDisabledException: No Authorizer is
> configured on the broker
> at
>
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>
> and then the usual logs from WorkerSourceTask... for some minutes. And then
> another Security... exception:
>
> [2019-12-27 11:54:47,643] ERROR Scheduler for MirrorSourceConnector caught
> exception in scheduled task: syncing topic ACLs
> (org.apache.kafka.connect.mirror.Scheduler:102)
> java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.SecurityDisabledException: No Authorizer is
> configured on the broker
> at
>
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>
> and only after another minute, it starts to subscribe to the topics:
>
> [2019-12-27 11:55:11,298] INFO [Consumer clientId=consumer-7, groupId=null]
> Subscribed to partition(s): Ninja ... ... 
> [2019-12-27 11:55:11,318] INFO Starting with 2303 previously uncommitted
> partitions. (org.apache.kafka.connect.mirror.MirrorSourceTask:94)
> [2019-12-27 11:55:11,319] INFO [Consumer clientId=consumer-7, groupId=null]
> Seeking to offset 0 for partition Ninja...-0
> (org.apache.kafka.clients.consumer.KafkaConsumer:1564)
> ...
> [2019-12-27 11:55:11,719] INFO task-thread-MirrorSourceConnector-0
> replicating 2710 topic-partitions eucmain->euwbackup: [Ninja...
>
> And then I see some real traffic towards the destination cluster, I guess
> it is the time when it really starts the mirroring.
>
> Peter
>
> On Wed, 11 Dec 2019 at 20:26, Ryanne Dolan  wrote:
>
> > Hey Peter. Do you see any timeouts in the logs? The internal scheduler
> will
> > timeout each task after 60 seconds by default, which might not be long
> > enough to finish some of the bootstrap tasks in your case. My team has
> > observed that behavior in super-flaky environments, e.g. when
> connectivity
> > drops during bootstrapping, in which case MirrorSourceConnector can get
> > into a funky state. This resolves when it refreshes its state after a
> > while. The default refresh interval of 10 minutes seems to jibe with your
> > observations.
> >
> > My team patched our internal MM2 build months ago to force bootstrapping
> to
> > complete correctly. I can share the patch, and if it helps we can raise a
> > PR.
> >
> > Ryanne
> >
> > On Mon, Dec 9, 2019 at 5:28 AM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Hi,
> > >
> > > I am experimenting with Mirror Maker 2 in 2.4.0-rc3. It seems to start
> up
> > > fine, connects to both source and destination, creates new topics...
> > > But it does not start to actually mirror the messages until about 12
> > > minutes after MM2 was started. I would expect it to start mirroring
> > > within seconds of startup.
> > >
> > > Source cluster has about 2800 partitions, destination cluster is empty.
> > > Both clusters are in AWS but in different regions.
> > >
> > > What may cause the 12 minutes delay?
> > >
> > > Config is:
> > > ---
> > > clusters = eucmain, euwbackup
> > > eucmain.bootstrap.servers =
> > > test-kafka-main-fra01.xx:9092,test-kafka-main-fra02.xx:9092
> > > euwbackup.bootstrap.servers = 172.30.197.203:9092,172.30.213.104:9092
> > > eucmain->euwbackup.enabled = true
> > > eucmain->euwbackup.topics = .*
> > > eucmain->euwbackup.topics.blacklist = ^(kafka|kmf|__).*
> > > eucmain->euwbackup.rename.topics = false
> > > replication.policy.separator = __
> > > eucmain.client.id = mm2
> > >
> > > I do not see any serious errors in the logs that I would think of a
> cause
> > > of this.
> > >
> > > Thanks,
> > > Peter
> > >
> >
>
>
> --
>  - Sini
>


Re: Kafka 2.4.0 & Mirror Maker 2.0 Error

2019-12-27 Thread Ryanne Dolan
Thanks Peter, I'll take a look.

Ryanne

On Fri, Dec 27, 2019, 7:48 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I see the same.
> I just downloaded the Kafka zip and I run:
>
> ~/kafka-2.4.0-rc3$ ./bin/connect-mirror-maker.sh
> config/connect-mirror-maker.properties
>
> Peter
>
> On Mon, 16 Dec 2019 at 17:14, Ryanne Dolan  wrote:
>
> > Hey Jamie, are you running the MM2 connectors on an existing Connect
> > cluster, or with the connet-mirror-maker.sh driver? Given your question
> > about plugin.path I'm guessing the former. Is the Connect cluster running
> > 2.4.0 as well? The jars should land in the Connect runtime without any
> need
> > to modify the plugin.path or copy jars around.
> >
> > Ryanne
> >
> > On Mon, Dec 16, 2019, 6:23 AM Jamie  wrote:
> >
> > > Hi All,
> > > I'm trying to set up mirror maker 2.0 with Kafka 2.4.0 however, I'm
> > > receiving the following errors on startup:
> > > ERROR Plugin class loader for connector
> > > 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found.
> > > Returning:
> > >
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > > ERROR Plugin class loader for connector
> > > 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not
> > > found. Returning:
> > >
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > > ERROR Plugin class loader for connector
> > > 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not
> > > found. Returning:
> > >
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > >
> > > I've checked the jar file containing these class file is in the class
> > > path.
> > > Is there anything I need to add to plugin.path for the connect
> properties
> > > when running mirror maker?
> > > Many Thanks,
> > > Jamie
> >
>
>
> --
>  - Sini
>


Re: Simplifying standalone mm2-connect config

2019-12-24 Thread Ryanne Dolan
Hello Karan. I agree the initial experience could be a lot friendlier. Most
of the complexity there is inherited from Connect, but it's compounded when
multiple clusters are involved.

I don't think we want to change Connect's (or MM2's) defaults to assume a
single broker cluster -- it'd be too easy to overlook such defaults in a
prod environment, which could be dangerous. It might be nice to add a
high-level internal.replication.factor property that, if specified,
overrides the replication factor for all internal topics. I actually had a
prototype of this at one point but removed it, as it wasn't part of the KIP.

I agree we should modify the example config to work out-of-the-box with
single-broker clusters, and just call out that these should not be used in
a production environment. Lemme know if you want to create a PR or if you'd
like me to. It looks like you've already done most of the work :)

Ryanne


On Tue, Dec 24, 2019, 3:41 AM Karan Kumar  wrote:

> Hi
>
> One of the nice things about Kafka is that setting it up in a local
> environment is really simple. I was trying out the latest feature, i.e. MM2,
> and found it took me some time to get a minimal setup running.
> The default config provided assumes that there will already be 3 brokers
> running, due to the default replication factor of the admin topics the MM2
> connector creates.
>
> This got me thinking that most of the people would follow the same approach
> I followed.
> 1. Start a single broker cluster on 9092
> 2. Start another single cluster broker on, let's say, 10002
> 3. Start mm2 by "./bin/connect-mirror-maker.sh
> ./config/connect-mirror-maker.properties"
>
> What happened was I had to supply a lot more configs
>
> clusters = A, B
>
> # connection information for each cluster
> A.bootstrap.servers = localhost:9092
> B.bootstrap.servers = localhost:10092
>
> # enable and configure individual replication flows
> A->B.enabled = *
> A->B.topics = test
> B->A.enabled = true
> B->A.topics = *
>
>
> A.heartbeats.topic.replication.factor=1
> A.checkpoints.topic.replication.factor=1
> A.offset-syncs.topic.replication.factor=1
>
> B.heartbeats.topic.replication.factor=1
> B.checkpoints.topic.replication.factor=1
> B.offset-syncs.topic.replication.factor=1
>
>
> A.offset.storage.replication.factor=1
> B.offset.storage.replication.factor=1
>
>
>
> A.status.storage.replication.factor=1
> B.status.storage.replication.factor=1
>
>
> A.config.storage.replication.factor=1
> B.config.storage.replication.factor=1
>
>
> The server.properties has bunch of properties like
> "offsets.topic.replication.factor=1
> transaction.state.log.replication.factor=1"
> which make it easier for people to start a local single broker kafka
> cluster.
>
>
> Does it make sense to have a similar config as the default mirror maker
> config, so that it becomes easier for people to use the MM2 feature?
>
> --
> Thanks
> Karan
>


Re: Mirrormaker 2.0

2019-12-23 Thread Ryanne Dolan
> > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
> >
> > First I tried the config mentioned in the KIP for "MirrorMaker
> > Clusters" which didn't work and I found removing the "cluster." from
> > the bootstrap-servers made it work a bit more, at least it didn't
> > complain about not having any servers in the config.
> > So, I checked the "Running a dedicated MirrorMaker cluster"from the
> > KIP, which is basically more or less the same, but without the
> > "cluster." for the servers and it does at least start and it looks
> > like all the three MMs find each other, but no mirroring taking place.
> >
> > Running the legacy-config from the old MM is working fine though. I'll
> > try to do some more digging today, so if you need some of those very
> > verbose logs or something else just let me know. I am sure that I can
> > figure this out and just wanted to know if the documentation will get
> > extended as the new MM2 has a lot of features and is a bit more
> > complicated than the old one...
> >
> > Thanks
> >
> > Sebastian
> >
> > On 24-Dec-19 8:06 AM, Ryanne Dolan wrote:
> >> Hello Sebastian, please let us know what issues you are facing and we
> >> can
> >> probably help. Which config from the KIP are you referencing? Also check
> >> out the readme under ./connect/mirror for more examples.
> >>
> >> Ryanne
> >>
> >> On Mon, Dec 23, 2019, 12:58 PM Sebastian Schmitz <
> >> sebastian.schm...@propellerhead.co.nz> wrote:
> >>
> >>> Hello,
> >>>
> >>> I'm currently trying to implement the new Kafka 2.4.0 and the new MM2.
> >>>
> >>> However, it looks like the only documentation available is the KIP-382,
> >>> and the documentation
> >>> (https://kafka.apache.org/documentation/#basic_ops_mirror_maker) for
> >>> the
> >>> MM isn't yet updated, and the documentation in the KIP seems to be
> >>> missing some stuff as I get a lot of errors and warning when starting
> >>> the MM2 as connect-mirror, and it doesn't mirror, so I probably have
> >>> some mistakes in my configuration, but can't confirm this as it's the
> >>> same as in the KIP.
> >>>
> >>> Any plans when the documentation will be updated?
> >>>
> >>> Thanks
> >>>
> >>> Sebastian
> >>>
> >>>
> >>>
>
>


Re: Mirrormaker 2.0

2019-12-23 Thread Ryanne Dolan
Sebastian, there are multiple ways to run MM2. One way is to start the
individual Connectors (MirrorSourceConnector, MirrorCheckpointConnector,
and MirrorHeartbeatConnector) on an existing Connect cluster, if you have
one. Some of the configuration properties you've listed, e.g. "name" and
"connector.class" are only relevant when configuring individual Connectors
in this way.

The ./bin/connect-mirror-maker.sh script, on the other hand, has its own
high-level configuration file. This is where properties like "a->b.topics"
come into play. The high-level configuration file is used to generate the
low-level configurations for each internal Connector.

Generally you'd use the high-level approach, unless you already have a
bunch of Connect clusters you want to leverage. Keep this distinction in
mind when looking at example configurations, and hopefully things will be
clearer.

Ryanne

On Mon, Dec 23, 2019 at 1:34 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello,
>
> I tried running this connect-mirror-config:
>
> 
> name = $MIRROR_NAME
> clusters = source, target
> source.bootstrap.servers = $SOURCE_SERVERS
> target.bootstrap.servers = $TARGET_SERVERS
> source->target.topics = $SOURCE_TARGET_TOPICS
> target->source.topics = $TARGET_SOURCE_TOPICS
> source->target.emit.heartbeats.enabled = true
> target->source.emit.heartbeats.enabled = true
> connector.class = org.apache.kafka.connect.mirror.MirrorSourceConnector
>
> # disable some new features
> refresh.topics.enabled = false
> refresh.groups.enabled = false
> emit.checkpoints.enables = true
> emit.heartbeats.enabled = true
> sync.topic.configs.enabled = false
> sync.topic.acls.enabled = false
> 
>
> SOURCE_SERVERS and TARGET_SERVERS are a comma-separated list of three
> brokers with ports.
> The TOPICS are |-separated lists of topics.
>
> I get these warning during startup which is a bit weird as I never
> supplied any of those settings, but maybe I should?
>
> [2019-12-23 00:36:25,918] WARN The configuration 'config.storage.topic'
> was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,918] WARN The configuration
> 'producer.bootstrap.servers' was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,918] WARN The configuration 'group.id' was supplied
> but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration 'status.storage.topic'
> was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration 'header.converter' was
> supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration
> 'consumer.bootstrap.servers' was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration 'offset.storage.topic'
> was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration 'value.converter' was
> supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration 'key.converter' was
> supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
> [2019-12-23 00:36:25,919] WARN The configuration
> 'admin.bootstrap.servers' was supplied but isn't a known config.
> (org.apache.kafka.clients.producer.ProducerConfig:355)
>
> And this error:
>
> [2019-12-23 00:36:29,320] ERROR Plugin class loader for connector:
> 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found.
> Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@5c316230
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
>
> First I tried the config mentioned in the KIP for "MirrorMaker Clusters"
> which didn't work and I found removing the "cluster." from the
> bootstrap-servers made it work a bit more, at least it didn't complain
> about not having any servers in the config.
> So, I checked the "Running a dedicated MirrorMaker cluster"from the KIP,
> which is basically more or less the same, but without the "cluster." for
> the servers and it does at least start and it looks like all the three
> MMs find each other, but no mirroring taking place.
>
> Running the legacy-config from the old MM is working fine though. I'll
> try to do some more digging today, so if you need some of those very
> verbose logs or something else just let me know.

Re: Mirrormaker 2.0

2019-12-23 Thread Ryanne Dolan
Hello Sebastian, please let us know what issues you are facing and we can
probably help. Which config from the KIP are you referencing? Also check
out the readme under ./connect/mirror for more examples.

Ryanne

On Mon, Dec 23, 2019, 12:58 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello,
>
> I'm currently trying to implement the new Kafka 2.4.0 and the new MM2.
>
> However, it looks like the only documentation available is the KIP-382,
> and the documentation
> (https://kafka.apache.org/documentation/#basic_ops_mirror_maker) for the
> MM isn't yet updated, and the documentation in the KIP seems to be
> missing some stuff as I get a lot of errors and warning when starting
> the MM2 as connect-mirror, and it doesn't mirror, so I probably have
> some mistakes in my configuration, but can't confirm this as it's the
> same as in the KIP.
>
> Any plans when the documentation will be updated?
>
> Thanks
>
> Sebastian
>
>
>


Re: Kafka 2.4.0 & Mirror Maker 2.0 Error

2019-12-16 Thread Ryanne Dolan
Hey Jamie, are you running the MM2 connectors on an existing Connect
cluster, or with the connet-mirror-maker.sh driver? Given your question
about plugin.path I'm guessing the former. Is the Connect cluster running
2.4.0 as well? The jars should land in the Connect runtime without any need
to modify the plugin.path or copy jars around.

Ryanne

On Mon, Dec 16, 2019, 6:23 AM Jamie  wrote:

> Hi All,
> I'm trying to set up mirror maker 2.0 with Kafka 2.4.0 however, I'm
> receiving the following errors on startup:
> ERROR Plugin class loader for connector
> 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found.
> Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> ERROR Plugin class loader for connector
> 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not
> found. Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> ERROR Plugin class loader for connector
> 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not
> found. Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>
> I've checked the jar file containing these class file is in the class
> path.
> Is there anything I need to add to plugin.path for the connect properties
> when running mirror maker?
> Many Thanks,
> Jamie


Re: rename.topics setting missing from MirrorMaker 2 ?

2019-12-14 Thread Ryanne Dolan
Hey Alan, good catch. I've removed that property from the KIP. We'll have a
LegacyReplicationPolicy in a future release that will not rename topics.
You could implement your own in the meantime (see replication.policy.class).
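For example, in mm2.properties (the class name is hypothetical):

replication.policy.class = com.example.IdentityReplicationPolicy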

Ryanne

On Mon, Nov 25, 2019, 11:09 AM Alan  wrote:

> Hi,
>
> I've just downloaded and built 2.4-rc1 to test the MirrorMaker 2 feature
> and I cannot disable the rename.topics setting as detailed in the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-ConnectorConfigurationProperties
>
> I did a quick search through the source code and could find no mention of
> it. Am I missing something, or was it never implemented?
>
> Thanks,
>
> Alan


Re: MM2 startup delay

2019-12-11 Thread Ryanne Dolan
Hey Peter. Do you see any timeouts in the logs? The internal scheduler will
timeout each task after 60 seconds by default, which might not be long
enough to finish some of the bootstrap tasks in your case. My team has
observed that behavior in super-flaky environments, e.g. when connectivity
drops during bootstrapping, in which case MirrorSourceConnector can get
into a funky state. This resolves when it refreshes its state after a
while. The default refresh interval of 10 minutes seems to jibe with your
observations.
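If the 10-minute refresh is indeed the trigger, these MM2 properties control
it (600 seconds is the default per KIP-382):

refresh.topics.interval.seconds = 600
refresh.groups.interval.seconds = 600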

My team patched our internal MM2 build months ago to force bootstrapping to
complete correctly. I can share the patch, and if it helps we can raise a
PR.

Ryanne

On Mon, Dec 9, 2019 at 5:28 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I am experimenting with Mirror Maker 2 in 2.4.0-rc3. It seems to start up
> fine, connects to both source and destination, creates new topics...
> But it does not start to actually mirror the messages until about 12
> minutes after MM2 was started. I would expect it to start mirroring within
> seconds of startup.
>
> Source cluster has about 2800 partitions, destination cluster is empty.
> Both clusters are in AWS but in different regions.
>
> What may cause the 12 minutes delay?
>
> Config is:
> ---
> clusters = eucmain, euwbackup
> eucmain.bootstrap.servers =
> test-kafka-main-fra01.xx:9092,test-kafka-main-fra02.xx:9092
> euwbackup.bootstrap.servers = 172.30.197.203:9092,172.30.213.104:9092
> eucmain->euwbackup.enabled = true
> eucmain->euwbackup.topics = .*
> eucmain->euwbackup.topics.blacklist = ^(kafka|kmf|__).*
> eucmain->euwbackup.rename.topics = false
> replication.policy.separator = __
> eucmain.client.id = mm2
>
> I do not see any serious errors in the logs that I would think of a cause
> of this.
>
> Thanks,
> Peter
>


Re: Auto Scaling in Kafka

2019-12-04 Thread Ryanne Dolan
Akash, take a look at LinkedIn's Cruise Control project. It can
automatically rebalance partitions across brokers, etc.

Ryanne

On Wed, Dec 4, 2019, 12:10 AM Goel, Akash Deep
 wrote:

> Hi ,
>
> Is it possible to auto scale Kafka? If it is not directly supported, then
> is there an automated way of adding brokers and performing other tasks like
> partition rebalancing.
>
> Regards,
> Akash D Goel
> Digital & Business Integration Manager
>
>
> 
>
>


Re: Running Kafka Stream Application in YARN

2019-11-15 Thread Ryanne Dolan
> Why that? Just because there is explicit documentation?

Just that they target YARN.

Ryanne

On Thu, Nov 14, 2019, 1:59 AM Matthias J. Sax  wrote:

> Why that? Just because there is explicit documentation?
>
>
> @Debraj: Kafka Streams can be deployed as a regular Java application.
> Hence, and tutorial on how to run a Java application on YARN should help.
>
>
> -Matthias
>
> On 11/11/19 10:33 AM, Ryanne Dolan wrote:
> > Consider using Flink, Spark, or Samza instead.
> >
> > Ryanne
> >
> > On Fri, Nov 8, 2019, 4:27 AM Debraj Manna 
> wrote:
> >
> >> Hi
> >>
> >> Is there any documentation or link I can refer to for the steps for
> >> deploying the Kafka Streams application in YARN?
> >>
> >> Kafka Client - 0.11.0.3
> >> Kafka Broker - 2.2.1
> >> YARN - 2.6.0
> >>
> >
>
>


Re: MirrorMaker 2 Plugin class loader Error

2019-11-11 Thread Ryanne Dolan
Rajeev, the config errors are unavoidable at present and can be ignored or
silenced. The Plugin error is concerning, and was previously described by
Vishal. I suppose it's possible there is a dependency conflict in these
builds. Can you send me the hash that you're building from? I'll try to
reproduce.

Ryanne

On Fri, Nov 8, 2019, 7:31 PM Rajeev Chakrabarti 
wrote:

> Hi Folks,
> I'm trying to run MM 2 with the current trunk. Also, tried with the 2.4
> branch and I'm getting "ERROR Plugin class loader for connector:
> 'org.apache.kafka.connect.mirror.MirrorSourceConnector'" errors for all the
> connectors. It does not seem to be creating topics in the destination
> cluster but has created the internal topics at both the source and
> destination and has populated the heartbeats topic. But none of the source
> topics created or replicated. I'm also getting a bunch of "not known
> configs" like 'consumer.group.id' was supplied but isn't a known config.
> What am I doing wrong?
> Regards,Rajeev
>

