[jira] [Commented] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-24 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849391#comment-17849391
 ] 

Edoardo Comar commented on KAFKA-15905:
---

backported to 3.7.1

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.
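
As an illustration of the idea in that last paragraph, here is a minimal sketch (plain Java, not MM2's actual classes; the Checkpoint record, seed() and shouldEmit() below are simplified stand-ins) of seeding the per-group cache from checkpoints emitted by previous task generations and refusing to emit a checkpoint that would move a group backwards:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: seed the per-group checkpoint cache from previously emitted
    // checkpoints, then keep translation monotonic across task restarts.
    public class CheckpointCacheSketch {

        // Hypothetical simplified checkpoint: group + source partition -> offsets
        record Checkpoint(String group, String topicPartition,
                          long upstreamOffset, long downstreamOffset) {}

        private final Map<String, Checkpoint> latest = new HashMap<>();

        private static String key(Checkpoint c) {
            return c.group() + "/" + c.topicPartition();
        }

        // Called once on startup with checkpoints read back from the checkpoint
        // topic (emitted by previous generations of MirrorCheckpointTask).
        void seed(Iterable<Checkpoint> previouslyEmitted) {
            for (Checkpoint c : previouslyEmitted) {
                latest.merge(key(c), c,
                        (old, cur) -> cur.downstreamOffset() >= old.downstreamOffset() ? cur : old);
            }
        }

        // Returns true if the candidate checkpoint can be emitted without the
        // translated offset appearing to go backwards.
        boolean shouldEmit(Checkpoint candidate) {
            Checkpoint prev = latest.get(key(candidate));
            if (prev != null && candidate.downstreamOffset() < prev.downstreamOffset()) {
                return false; // would move the consumer group backwards
            }
            latest.put(key(candidate), candidate);
            return true;
        }
    }

The actual fix (PR 15910, linked further down this thread) works against MM2's internal classes; the sketch only illustrates the monotonicity rule described above.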



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-05-24 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849390#comment-17849390
 ] 

Edoardo Comar commented on KAFKA-16622:
---

backported to 3.7.1

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).
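
For reference, a minimal sketch of the slow consumer described above (the group id, topic name and bootstrap address are assumptions, not taken from the attached properties file):

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    // Sketch of the reproduction consumer: 50 records per poll, commit after
    // each poll, 1 second pause between polls.
    public class SlowConsumer {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed source cluster
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup1");                // assumed group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50");  // 50 records per poll
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("mytopic"));               // assumed topic name
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    System.out.printf("polled %d records%n", records.count());
                    consumer.commitSync();   // commit after each poll
                    Thread.sleep(1000L);     // pause 1 sec between polls
                }
            }
        }
    }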



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-23 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reopened KAFKA-15905:
---

reopening for backporting to 3.7.1, to be confirmed

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-23 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15905:
--
Fix Version/s: 3.7.1

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16641) MM2 offset translation should interpolate between sparse OffsetSyncs

2024-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848563#comment-17848563
 ] 

Edoardo Comar commented on KAFKA-16641:
---

[~gharris] what pathological scenarios are you thinking of?

frequent aborted transactions on a source topic?

source topic frequently deleted / recreated with the same name?

> MM2 offset translation should interpolate between sparse OffsetSyncs
> 
>
> Key: KAFKA-16641
> URL: https://issues.apache.org/jira/browse/KAFKA-16641
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Greg Harris
>Priority: Major
>
> Right now, the OffsetSyncStore keeps a sparse offset store, with exponential 
> spacing between syncs. This can leave large gaps in translation, where 
> offsets are translated much more conservatively than necessary.
> The dominant way to use MirrorMaker2 is in a "single writer" fashion, where 
> the target topic is only written to by a single mirror maker 2. When a topic 
> without gaps is replicated, contiguous blocks of offsets are preserved. For 
> example:
> Say that MM2 mirrors 100 records, and emits two syncs: 0:100 and 100:200. We 
> can detect when the gap between the upstream and downstream offsets is the 
> same using subtraction, and then assume that 50:150 is also a valid 
> translation. If the source topic has gaps, or goes through a restart, we 
> should expect a discontinuity in the offset syncs, like 0:100 and 100:250 or 
> 0:100 and 100:150.
> This may allow us to restore much of the offset translation precision that 
> was lost for simple contiguous topics, without additional memory usage, but 
> at the risk of mis-translating some pathological situations when the source 
> topic has gaps. This might be able to be enabled unconditionally, or enabled 
> via a configuration.
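
A minimal sketch of the interpolation described above, reusing the 0:100 / 100:200 example from the description; OffsetSync here is a simplified stand-in rather than MM2's internal class:

    import java.util.Optional;

    // Sketch: if two adjacent offset syncs have the same upstream-downstream
    // gap, translate offsets between them by preserving that gap; otherwise
    // fall back to the conservative older sync.
    public class InterpolatingTranslation {

        record OffsetSync(long upstreamOffset, long downstreamOffset) {}

        static Optional<Long> translate(OffsetSync older, OffsetSync newer, long upstreamOffset) {
            if (upstreamOffset < older.upstreamOffset() || upstreamOffset > newer.upstreamOffset()) {
                return Optional.empty(); // outside the window covered by these two syncs
            }
            long olderGap = older.downstreamOffset() - older.upstreamOffset();
            long newerGap = newer.downstreamOffset() - newer.upstreamOffset();
            if (olderGap == newerGap) {
                // contiguous replication: syncs 0:100 and 100:200 imply 50:150
                return Optional.of(upstreamOffset + olderGap);
            }
            // discontinuity (gaps, restarts): translate conservatively from the older sync
            return Optional.of(older.downstreamOffset());
        }

        public static void main(String[] args) {
            OffsetSync older = new OffsetSync(0, 100);
            OffsetSync newer = new OffsetSync(100, 200);
            System.out.println(translate(older, newer, 50)); // Optional[150]
        }
    }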



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-22 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15905:
--
Fix Version/s: 3.7.1
   3.8
   (was: 3.80)
   (was: 3.71)

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-05-22 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16622:
--
Fix Version/s: 3.7.1
   3.8

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.7.1, 3.8
>
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-22 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15905:
--
Fix Version/s: 3.80
   3.71

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.80, 3.71
>
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848520#comment-17848520
 ] 

Edoardo Comar commented on KAFKA-15905:
---

[https://github.com/apache/kafka/pull/15910]

fixes both this issue https://issues.apache.org/jira/browse/KAFKA-15905

and the related https://issues.apache.org/jira/browse/KAFKA-16622

 

we could backport the fix to 3.7.1

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848519#comment-17848519
 ] 

Edoardo Comar commented on KAFKA-16622:
---

[https://github.com/apache/kafka/pull/15910]

fixes both this https://issues.apache.org/jira/browse/KAFKA-16622

and https://issues.apache.org/jira/browse/KAFKA-15905

 

we could backport the fix to 3.7.1

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-05-15 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-16622:
-

Assignee: Edoardo Comar

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2024-05-09 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-15905:
-

Assignee: Edoardo Comar

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> 
>
> Key: KAFKA-15905
> URL: https://issues.apache.org/jira/browse/KAFKA-15905
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Edoardo Comar
>Priority: Major
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during its lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-29 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841927#comment-17841927
 ] 

Edoardo Comar commented on KAFKA-16622:
---

related issue https://issues.apache.org/jira/browse/KAFKA-16364

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-27 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841137#comment-17841137
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/27/24 10:55 AM:
-

If, after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.

But the problem here is that if the consumer never fully catches up once, we 
will never have a checkpoint.

If, as initial state, the OffsetSyncStore.offsetSyncs contained a distribution 
of OffsetSync entries rather than just multiple copies of the last OffsetSync, 
Checkpoints would be computed earlier, I think.
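
A minimal sketch of the contrast suggested above, with a simplified stand-in for the per-partition sync slots (slot count and offsets are illustrative only, not MM2's actual OffsetSyncStore):

    // Sketch: with every slot holding a copy of the latest sync, no committed
    // offset below that sync can be translated; with a spaced distribution of
    // earlier syncs, lower offsets translate (and checkpoint) immediately.
    public class SyncSeedingSketch {

        record OffsetSync(long upstream, long downstream) {}

        // Most precise sync whose upstream offset is <= the committed offset,
        // or null if none qualifies (i.e. no checkpoint can be emitted yet).
        static OffsetSync bestSyncFor(OffsetSync[] syncs, long committedUpstreamOffset) {
            OffsetSync best = null;
            for (OffsetSync s : syncs) {
                if (s.upstream() <= committedUpstreamOffset
                        && (best == null || s.upstream() > best.upstream())) {
                    best = s;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            OffsetSync latest = new OffsetSync(10_000, 10_000);

            // Initial state A: every slot is a copy of the latest sync.
            OffsetSync[] copies = {latest, latest, latest, latest};

            // Initial state B (hypothetical): a spaced distribution of earlier syncs.
            OffsetSync[] distribution = {
                    new OffsetSync(0, 0), new OffsetSync(2_500, 2_500),
                    new OffsetSync(5_000, 5_000), latest};

            long committed = 4_000; // consumer group still catching up
            System.out.println(bestSyncFor(copies, committed));       // null -> no checkpoint yet
            System.out.println(bestSyncFor(distribution, committed)); // OffsetSync[upstream=2500, downstream=2500]
        }
    }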


was (Author: ecomar):
if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.

but the problem here is that if the consumer never fully catches up once, we 
will never have a checkpoint.

If the OffsetSyncStore.offsetSyncs contained a distribution of OffsetSync 
rather than just multiple copies of the last OffsetSync, Checkpoints would be 
computed before

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841137#comment-17841137
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 6:04 PM:


if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.

but the problem here is that if the consumer never fully catches up once, we 
will never have a checkpoint.

If the OffsetSyncStore.offsetSyncs contained a distribution of OffsetSync 
rather than just multiple copies of the last OffsetSync, Checkpoints would be 
computed before


was (Author: ecomar):
if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.

but the problem here is that if the consumer never fully catches up once, we 
will never have a checkpoint

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841137#comment-17841137
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:40 AM:
-

if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.

but the problem here is that if the consumer never fully catches up once, we 
will never have a checkpoint


was (Author: ecomar):
if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.


> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841137#comment-17841137
 ] 

Edoardo Comar commented on KAFKA-16622:
---

if after the consumer reaches 10,000 and the 1st checkpoint is emitted, MM2 
restarts before the other 10,000 messages are produced,
then bug https://issues.apache.org/jira/browse/KAFKA-15905 hits and we end up 
with just two checkpoints, at 10,000 and 20,000.


> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup:
> Tested with a standalone single-process MirrorMaker2 mirroring between two 
> single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
> intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10):
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls, committing after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>   --from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (i.e. its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:18 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

clientProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
...
while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, consumer);
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

 
 
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

clientProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
...
while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = 

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:17 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, consumer);
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

 
 
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, consumer);
    consumer.commitSync(Duration.ofSeconds(10));

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:17 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

clientProps.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
...
while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, consumer);
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

 
 
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, 

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:16 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (System.currentTimeMillis() - now < 1200*1000L) {
    ConsumerRecords<String, String> crs1 = consumer.poll(Duration.ofMillis(1000L));
    polledCount = polledCount + print(crs1, consumer);
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

 
 
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have


Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:15 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
 - polls 100 messages
 - commitSync
 - waits 1 sec
e.g.

while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have


Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}

[^connect.log.2024-04-26-10.zip]


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.

while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}

 [^connect.log.2024-04-26-10.zip] 

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the 

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:13 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.

while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have

Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}

 [^connect.log.2024-04-26-10.zip] 


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.
```
while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}
```

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have
```
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}
```
 [^connect.log.2024-04-26-10.zip] 

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh 

[jira] [Comment Edited] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar edited comment on KAFKA-16622 at 4/26/24 10:12 AM:
-

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.
```
while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}
```

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have
```
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}
```
 [^connect.log.2024-04-26-10.zip] 


was (Author: ecomar):
Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.
```
while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}
```

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have
```
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}
```
 [^connect.log.2024-04-26-10.zip] 

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh 

[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841134#comment-17841134
 ] 

Edoardo Comar commented on KAFKA-16622:
---

Hi [~gharris1727]
The task did not restart. Here are the trace logs generated in the following 
scenario.
clean start with the properties file posted above.
create 'mytopic' 1 topic with 1 partition
produce 10,000 messages to source cluster
start a single consumer in source cluster that in a loop
- polls 100 messages
- commitSync
- waits 1 sec
e.g.
```
while (...) {
    ConsumerRecords crs1 = consumer.poll(Duration.ofMillis(1000L));
    // print(crs1, consumer); print last record of polled
    consumer.commitSync(Duration.ofSeconds(10));
    Thread.sleep(1000L);
}
```

the first checkpoint is only emitted when the consumer catches up fully at 
10,000.
Then another 10,000 messages are produced quickly and the consumer advances, and 
some checkpoints are emitted
so that overall we have
```
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=10000, downstreamOffset=10000, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=16700, downstreamOffset=16501, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=18200, downstreamOffset=18096, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19200, downstreamOffset=18965, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=19700, downstreamOffset=19636, metadata=}
Checkpoint{consumerGroupId=mygroup1, topicPartition=source.mytopic-0, 
upstreamOffset=20000, downstreamOffset=20000, metadata=}
```
 [^connect.log.2024-04-26-10.zip] 

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>--from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (ie its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-26 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16622:
--
Attachment: connect.log.2024-04-26-10.zip

> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: connect.log.2024-04-26-10.zip, 
> edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>--from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (ie its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-25 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840966#comment-17840966
 ] 

Edoardo Comar commented on KAFKA-16622:
---

Activating DEBUG logging 
```
[2024-04-25 21:20:10,856] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(mygroup1,mytopic-0,13805): Skipped 
(OffsetSync{topicPartition=mytopic-0, upstreamOffset=1, 
downstreamOffset=1} is ahead of upstream consumer group 13805) 
(org.apache.kafka.connect.mirror.OffsetSyncStore:125)
```
The checkpoint is not emitted because the topic-partition has been mirrored 
further than where the consumer group is,
so until the group catches up no checkpoints will be emitted.
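
For illustration, a minimal, hypothetical sketch of the skip condition described by the 
DEBUG line above (this is not the real OffsetSyncStore code; the class and the exact 
translation rule here are assumptions): a checkpoint can only be produced when the store 
holds an offset sync at or before the group's committed upstream offset.

```
// Hypothetical sketch only - not the actual org.apache.kafka.connect.mirror code.
import java.util.OptionalLong;

class OffsetTranslationSketch {
    private final long syncUpstreamOffset;    // latest known OffsetSync, upstream side
    private final long syncDownstreamOffset;  // latest known OffsetSync, downstream side

    OffsetTranslationSketch(long syncUpstreamOffset, long syncDownstreamOffset) {
        this.syncUpstreamOffset = syncUpstreamOffset;
        this.syncDownstreamOffset = syncDownstreamOffset;
    }

    // Empty result corresponds to the "Skipped (... is ahead of upstream consumer group ...)"
    // message in the log above.
    OptionalLong translateDownstream(long groupCommittedUpstreamOffset) {
        if (syncUpstreamOffset > groupCommittedUpstreamOffset) {
            return OptionalLong.empty();
        }
        // Conservative translation: one past the sync point when the group is beyond it.
        return OptionalLong.of(syncDownstreamOffset
                + (groupCommittedUpstreamOffset > syncUpstreamOffset ? 1 : 0));
    }
}
```

Under such a rule, a group lagging behind the only available sync gets no translation at 
all, which is exactly the situation the question below is about.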

Question for [~gregharris73]
would this behavior mean that any consumers in groups that are behind the log 
end, 
and that are switched from consuming from the source cluster to the target cluster, 
have to reprocess the entire partition? They would have access to no translated 
offsets.


> Mirromaker2 first Checkpoint not emitted until consumer group fully catches 
> up once
> ---
>
> Key: KAFKA-16622
> URL: https://issues.apache.org/jira/browse/KAFKA-16622
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.7.0, 3.6.2, 3.8.0
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> We observed an excessively delayed emission of the MM2 Checkpoint record.
> It only gets created when the source consumer reaches the end of a topic. 
> This does not seem reasonable.
> In a very simple setup :
> Tested with a standalone single process MirrorMaker2 mirroring between two 
> single-node kafka clusters(mirromaker config attached) with quick refresh 
> intervals (eg 5 sec) and a small offset.lag.max (eg 10)
> create a single topic in the source cluster
> produce data to it (e.g. 10,000 records)
> start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec 
> between polls which commits after each poll
> watch the Checkpoint topic in the target cluster
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
>   --topic source.checkpoints.internal \
>   --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
>--from-beginning
> -> no record appears in the checkpoint topic until the consumer reaches the 
> end of the topic (ie its consumer group lag gets down to 0).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16622) Mirromaker2 first Checkpoint not emitted until consumer group fully catches up once

2024-04-25 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-16622:
-

 Summary: Mirromaker2 first Checkpoint not emitted until consumer 
group fully catches up once
 Key: KAFKA-16622
 URL: https://issues.apache.org/jira/browse/KAFKA-16622
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 3.6.2, 3.7.0, 3.8.0
Reporter: Edoardo Comar
 Attachments: edo-connect-mirror-maker-sourcetarget.properties

We observed an excessively delayed emission of the MM2 Checkpoint record.
It only gets created when the source consumer reaches the end of a topic. This 
does not seem reasonable.

In a very simple setup :

Tested with a standalone single-process MirrorMaker2 mirroring between two 
single-node Kafka clusters (MirrorMaker config attached) with quick refresh 
intervals (e.g. 5 sec) and a small offset.lag.max (e.g. 10)

create a single topic in the source cluster
produce data to it (e.g. 10,000 records)
start a slow consumer - e.g. fetching 50 records/poll and pausing 1 sec between 
polls, which commits after each poll

watch the Checkpoint topic in the target cluster

bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 \
  --topic source.checkpoints.internal \
  --formatter org.apache.kafka.connect.mirror.formatters.CheckpointFormatter \
   --from-beginning

-> no record appears in the checkpoint topic until the consumer reaches the end 
of the topic (ie its consumer group lag gets down to 0).
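
For reference, a minimal sketch of such a slow consumer (the group id, topic name, 
bootstrap address and most properties are assumptions; only max.poll.records=50, the 
commit after every poll, and the 1 sec pause come from the description above):

```
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SlowConsumer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // source cluster (assumed)
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "mygroup1");                // assumed group id
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50");              // 50 records per poll
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mytopic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000L));
                consumer.commitSync(Duration.ofSeconds(10)); // commit after each poll
                Thread.sleep(1000L);                         // pause 1 sec between polls
            }
        }
    }
}
```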







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-18 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828067#comment-17828067
 ] 

Edoardo Comar commented on KAFKA-16369:
---

fix cherry picked to 3.6 and 3.7

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-18 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827976#comment-17827976
 ] 

Edoardo Comar commented on KAFKA-16369:
---

fix merged in trunk thanks [~showuon]

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-18 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16369:
--
Fix Version/s: 3.8.0

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.8.0
>
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-18 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16369:
--
Fix Version/s: 3.6.2
   3.7.1

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Fix For: 3.6.2, 3.8.0, 3.7.1
>
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-14 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827107#comment-17827107
 ] 

Edoardo Comar commented on KAFKA-16369:
---

PR is ready for review - can anyone take a look please ?

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-14 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16369:
--
Affects Version/s: 3.6.1
   3.7.0
   3.8.0

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0, 3.6.1, 3.8.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826827#comment-17826827
 ] 

Edoardo Comar commented on KAFKA-16369:
---

KafkaServer (ZKMode) needs to wait on the future returned by 
SocketServer.enableRequestProcessing

similarly to the BrokerServer (KRaft mode).

PR in progress
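
A minimal Java sketch of the idea (the broker code itself is Scala and the surrounding 
names here are illustrative assumptions; only the reliance on the future returned by 
SocketServer#enableRequestProcessing comes from the comment above):

```
import java.util.concurrent.CompletableFuture;

public class WaitForBindSketch {
    // Illustrative only - not the actual KafkaServer/SocketServer code.
    // SocketServer#enableRequestProcessing returns a future that completes once all
    // listeners are bound; ZK-mode startup should block on it so that a bind failure
    // aborts startup instead of being ignored.
    static void waitForRequestProcessing(CompletableFuture<Void> enableRequestProcessingFuture) {
        try {
            enableRequestProcessingFuture.get(); // "Address already in use" surfaces here
        } catch (Exception e) {
            // Rethrowing lets startup abort and run the normal shutdown path,
            // so the broker exits rather than continuing without a bound listener.
            throw new RuntimeException("Socket server failed to enable request processing", e);
        }
    }
}
```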

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-16369:
-

Assignee: Edoardo Comar

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-16369:
--
Attachment: server.log
kraft-server.log

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826811#comment-17826811
 ] 

Edoardo Comar commented on KAFKA-16369:
---

server logs as broker only (zk mode) and broker/controller (kraft mode) when 
port 9092 is already bound

[^server.log][^kraft-server.log]

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826811#comment-17826811
 ] 

Edoardo Comar edited comment on KAFKA-16369 at 3/13/24 5:50 PM:


server logs as broker only (zk mode) and broker/controller (kraft mode) when 
port 9092 is already bound

[^server.log]

[^kraft-server.log]


was (Author: ecomar):
server logs as broker only (zk mode) and broker/controller (kraft mode) when 
port 9092 is already bound

[^server.log][^kraft-server.log]

> Broker may not shut down when SocketServer fails to bind as Address already 
> in use
> --
>
> Key: KAFKA-16369
> URL: https://issues.apache.org/jira/browse/KAFKA-16369
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: kraft-server.log, server.log
>
>
> When in Zookeeper mode, if a port the broker should listen to is already bound
> the KafkaException: Socket server failed to bind to localhost:9092: Address 
> already in use.
> is thrown but the Broker continues to startup .
> It correctly shuts down when in KRaft mode.
> Easy to reproduce when in Zookeper mode with server.config set to listen to 
> localhost only
> listeners=PLAINTEXT://localhost:9092
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16369) Broker may not shut down when SocketServer fails to bind as Address already in use

2024-03-13 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-16369:
-

 Summary: Broker may not shut down when SocketServer fails to bind 
as Address already in use
 Key: KAFKA-16369
 URL: https://issues.apache.org/jira/browse/KAFKA-16369
 Project: Kafka
  Issue Type: Bug
Reporter: Edoardo Comar


When in Zookeeper mode, if a port the broker should listen to is already bound,

the KafkaException: Socket server failed to bind to localhost:9092: Address 
already in use.

is thrown but the Broker continues to start up.

It correctly shuts down when in KRaft mode.

Easy to reproduce when in Zookeeper mode with server.config set to listen to 
localhost only:
listeners=PLAINTEXT://localhost:9092
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-06 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740748#comment-17740748
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/6/23 8:56 PM:
---

I'd rather rewrite our test to expect the record to be found by consuming no more 
than offset.lag.max=100 records from the target cluster, starting from the last 
checkpoint


was (Author: ecomar):
I'd rather rewrite our test to expect the record to be found by consuming no more 
than offset.lag.max=100 records from the target cluster

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-06 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740748#comment-17740748
 ] 

Edoardo Comar commented on KAFKA-15144:
---

I'd rather rewrite our test to expect the record to be found by consuming no more 
than offset.lag.max=100 records from the target cluster

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-06 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740550#comment-17740550
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/6/23 11:03 AM:


Thanks [~gharris1727] 

I had a good look at the code and now it's all clear.

Except that we could not find very good docs on how to consume the Checkpoints 
... so in the past one of our tests was written expecting to find the exact 
match for upstream/downstream because that was the externally visible behavior 
when all was good.

Cheers


was (Author: ecomar):
Thanks [~gharris1727] 

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-06 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740550#comment-17740550
 ] 

Edoardo Comar commented on KAFKA-15144:
---

Thanks [~gharris1727] 

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-15144.
---
Resolution: Not A Bug

Closing as not a bug.

The "problem" arose as without config changes, by updating MM2 from 3.3.2 to a 
later release the observable content of the Checkpoint topic has changed 
considerably.

 

In 3.3.2 even without new records in the OffsetSync topic, the Checkpoint 
records were advancing often (and even contain many duplicates). 



Now gaps of up to offset.lag.max must be expected and more reprocessing of 
records downstream may occur
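
As a rough illustration (using the offset.lag.max=100 numbers reported in the comments 
further down): with offset syncs emitted at upstream offsets 0, 101 and 202, a group 
whose committed upstream offset is 118 is checkpointed to downstream offset 102, so a 
consumer that switches to the target cluster re-reads downstream offsets 102-117, i.e. 
up to roughly offset.lag.max records per partition in the worst case.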

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739883#comment-17739883
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 12:12 PM:


again, with offset.lag.max=100

producing at irregular rate with a console producer,

the checkpoints are unexpected to me

 

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=101, 
downstreamOffset=101}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=202, 
downstreamOffset=202}

 

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=118, 
downstreamOffset=102, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=227, 
downstreamOffset=203, metadata=}

 

so a consumer using Checkpoint switching from source to target is guaranteed 
not to miss any messages but may reprocess quite a few, correct ?


was (Author: ecomar):
again, with offset.lag.max=100

producing at irregular rate with a console producer,

the checkpoints are unexpected to me

 

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=101, 
downstreamOffset=101}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=202, 
downstreamOffset=202}

 

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=118, 
downstreamOffset=102, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=227, 
downstreamOffset=203, metadata=}

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739883#comment-17739883
 ] 

Edoardo Comar commented on KAFKA-15144:
---

again, with offset.lag.max=100

producing at an irregular rate with a console producer,

the checkpoints are unexpected to me

 

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=101, 
downstreamOffset=101}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=202, 
downstreamOffset=202}

 

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=118, 
downstreamOffset=102, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=227, 
downstreamOffset=203, metadata=}

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 10:32 AM:


Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=30, 
downstreamOffset=29{*}, metadata=}

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=57, 
downstreamOffset=57, metadata=}

 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running

 

the discrepancy observed above is not present if
offset.lag.max=0
but that comes at the cost of emitting far more offset syncs
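
As an aside, the sync spacing observed in these experiments (0, 2, 4, ... with 
offset.lag.max=1; 0, 101, 202 with offset.lag.max=100) is what a simple lag-based 
emission rule produces. The sketch below only illustrates that rule as I understand 
it; it is not the actual MirrorSourceTask code:

{code:java}
// Illustration of a lag-based offset-sync emission rule (assumed, simplified).
class OffsetSyncEmitter {
    private final long offsetLagMax;      // the offset.lag.max setting
    private long lastSyncedUpstream = -1; // upstream offset of the last emitted sync

    OffsetSyncEmitter(long offsetLagMax) {
        this.offsetLagMax = offsetLagMax;
    }

    // Called once per mirrored record; true means "publish an OffsetSync now".
    boolean shouldEmitSync(long upstreamOffset) {
        boolean emit = lastSyncedUpstream < 0
                || upstreamOffset - lastSyncedUpstream > offsetLagMax;
        if (emit) {
            lastSyncedUpstream = upstreamOffset;
        }
        return emit;
    }

    public static void main(String[] args) {
        OffsetSyncEmitter emitter = new OffsetSyncEmitter(1);
        for (long offset = 0; offset < 8; offset++) {
            if (emitter.shouldEmitSync(offset)) {
                System.out.println("sync at upstreamOffset=" + offset); // 0, 2, 4, 6
            }
        }
        // With offset.lag.max=0 every record triggers a sync, which removes the
        // discrepancy at the cost of a much larger offset-syncs topic.
    }
}
{code}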


was (Author: ecomar):
Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}

[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 10:32 AM:


Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=30, 
downstreamOffset=29{*}, metadata=}

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=57, 
downstreamOffset=57, metadata=}

 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running

 

the discrepancy observed above is not present if
offset.lag.max=0
but that comes at the cost of emitting far more offset syncs


was (Author: ecomar):
Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 

[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 9:51 AM:
---

Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=30, 
downstreamOffset=29{*}, metadata=}

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=57, 
downstreamOffset=57, metadata=}

 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running


was (Author: ecomar):
Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}

[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 9:51 AM:
---

Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=30, 
downstreamOffset=29{*}, metadata=}

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=57, 
downstreamOffset=57, metadata=}

 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running


was (Author: ecomar):
Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}

[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/4/23 9:50 AM:
---

Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=30, 
downstreamOffset=29{*}, metadata=}

Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=57, 
downstreamOffset=57, metadata=}

 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running


was (Author: ecomar):
Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}

[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739868#comment-17739868
 ] 

Edoardo Comar commented on KAFKA-15144:
---

Experimenting a bit, setting
offset.lag.max=1
and then running a console producer and consumer on the source,
I still see puzzling checkpoints despite what looks like a correct offset sync:
 
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=6, 
downstreamOffset=6}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=8, 
downstreamOffset=8}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=10, 
downstreamOffset=10}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=12, 
downstreamOffset=12}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=14, 
downstreamOffset=14}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=16, 
downstreamOffset=16}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=18, 
downstreamOffset=18}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=20, 
downstreamOffset=20}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=22, 
downstreamOffset=22}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=24, 
downstreamOffset=24}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=26, 
downstreamOffset=26}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=28, 
downstreamOffset=28}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=30, 
downstreamOffset=30}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=32, 
downstreamOffset=32}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=34, 
downstreamOffset=34}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=36, 
downstreamOffset=36}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=38, 
downstreamOffset=38}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=40, 
downstreamOffset=40}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=42, 
downstreamOffset=42}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=44, 
downstreamOffset=44}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=46, 
downstreamOffset=46}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=48, 
downstreamOffset=48}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=50, 
downstreamOffset=50}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=52, 
downstreamOffset=52}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=54, 
downstreamOffset=54}
OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=56, 
downstreamOffset=56}
 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=2, 
downstreamOffset=2, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=3, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=4, 
downstreamOffset=4, metadata=}
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=5, 
downstreamOffset=5, metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=30, 
downstreamOffset=29,* metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=52, 
downstreamOffset=51,* metadata=}
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, *upstreamOffset=57, 
downstreamOffset=57,* metadata=}
 
I'd expect the downstreamOffset to always be the same as the upstreamOffset:
there are no transactions, and the topic was created fresh with MM2 running

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start 

[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739867#comment-17739867
 ] 

Edoardo Comar commented on KAFKA-15144:
---

Thanks [~gharris1727] !

I had realized that the lack of offset sync records was the cause of the 
downstream checkpoint not advancing.
What I had missed is that the external behavior has changed even though the 
configuration was kept unchanged.
 

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739688#comment-17739688
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/3/23 4:52 PM:
---

producing 1 record at a time with the console producer, while a consumer is 
polling (on the source), the MM2 logs report:

{{[2023-07-03 17:44:49,479] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,0): Translated 0 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:49,479] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0, metadata=} (first for this partition) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:204)}}
{{...}}

{{[2023-07-03 17:44:54,510] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,1): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:54,511] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=} (downstream offset advanced) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:214)}}
{{...}}

{{[2023-07-03 17:45:04,547] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,2): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:45:04,548] TRACE [MirrorCheckpointConnector|task-0] *Skipping* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=2, 
downstreamOffset=1{*}, metadata=} (repeated checkpoint) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:220)}}
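
The "Translated 1 (relative to OffsetSync{..., upstreamOffset=0, 
downstreamOffset=0})" lines are consistent with a conservative translation rule: 
a committed offset that is ahead of the latest applicable sync is translated to 
at most one past that sync's downstream offset, which is why the third checkpoint 
is skipped as a repeat. A minimal sketch of that rule as I read the behaviour 
(not the actual OffsetSyncStore code):

{code:java}
// Sketch of conservative downstream translation (assumed behaviour, simplified).
final class TranslationSketch {
    record OffsetSync(long upstreamOffset, long downstreamOffset) {}

    // 'sync' is the latest OffsetSync with upstreamOffset <= the group's committed offset.
    static long translateDownstream(OffsetSync sync, long committedUpstreamOffset) {
        if (committedUpstreamOffset <= sync.upstreamOffset()) {
            return sync.downstreamOffset();   // committed exactly at the sync point
        }
        // Ahead of the sync: only the record at the sync point is known to be
        // mirrored, so translate to one past its downstream offset.
        return sync.downstreamOffset() + 1;
    }

    public static void main(String[] args) {
        OffsetSync sync = new OffsetSync(0, 0);
        System.out.println(translateDownstream(sync, 0)); // 0 -> first checkpoint emitted
        System.out.println(translateDownstream(sync, 1)); // 1 -> downstream advanced, emitted
        System.out.println(translateDownstream(sync, 2)); // 1 -> repeated checkpoint, skipped
    }
}
{code}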


was (Author: ecomar):
producing 1 record at a time with the console producer, while a consumer is 
polling (on the source), the MM2 logs report:

{{[2023-07-03 17:44:49,479] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,0): Translated 0 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:49,479] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0, metadata=} (first for this partition) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:204)}}
{{...}}

{{[2023-07-03 17:44:54,510] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,1): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:54,511] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=} (downstream offset advanced) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:214)}}
{{...}}

{{[2023-07-03 17:45:04,547] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,2): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:45:04,548] TRACE [MirrorCheckpointConnector|task-0] *Skipping* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=2, 
downstreamOffset=1{*}, metadata=} (repeated checkpoint) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:220)}}

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic 

[jira] [Commented] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739688#comment-17739688
 ] 

Edoardo Comar commented on KAFKA-15144:
---

producing 1 record at a time with the console producer, while a consumer is 
polling (on the source), the MM2 logs report:

{{[2023-07-03 17:44:49,479] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,0): Translated 0 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:49,479] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0, metadata=} (first for this partition) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:204)}}
{{...}}

{{[2023-07-03 17:44:54,510] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,1): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:54,511] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=} (downstream offset advanced) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:214)}}
{{...}}

{{[2023-07-03 17:45:04,547] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,2): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:45:04,548] TRACE [MirrorCheckpointConnector|task-0] *Skipping* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=2, 
downstreamOffset=1{*}, metadata=} (repeated checkpoint) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:220)}}

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739688#comment-17739688
 ] 

Edoardo Comar edited comment on KAFKA-15144 at 7/3/23 4:52 PM:
---

producing 1 record at a time with the console producer, while a consumer is 
polling (on the source), the MM2 logs report:

{{[2023-07-03 17:44:49,479] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,0): Translated 0 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:49,479] TRACE [MirrorCheckpointConnector|task-0] *Emitting* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=0, 
downstreamOffset=0{*}, metadata=} (first for this partition) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:204)}}
{{...}}

{{[2023-07-03 17:44:54,510] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,1): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:54,511] TRACE [MirrorCheckpointConnector|task-0] *Emitting* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=1, 
downstreamOffset=1{*}, metadata=} (downstream offset advanced) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:214)}}
{{...}}

{{[2023-07-03 17:45:04,547] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,2): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:45:04,548] TRACE [MirrorCheckpointConnector|task-0] *Skipping* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=2, 
downstreamOffset=1{*}, metadata=} (repeated checkpoint) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:220)}}


was (Author: ecomar):
producing 1 record at a time with the console producer, while a consumer is 
polling (on the source), the MM2 logs report:

{{[2023-07-03 17:44:49,479] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,0): Translated 0 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:49,479] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0, metadata=} (first for this partition) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:204)}}
{{...}}

{{[2023-07-03 17:44:54,510] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,1): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:44:54,511] TRACE [MirrorCheckpointConnector|task-0] Emitting 
Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=1, 
downstreamOffset=1, metadata=} (downstream offset advanced) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:214)}}
{{...}}

{{[2023-07-03 17:45:04,547] DEBUG [MirrorCheckpointConnector|task-0] 
translateDownstream(edogroup,vf-mirroring-test-edo-0,2): Translated 1 (relative 
to OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}) (org.apache.kafka.connect.mirror.OffsetSyncStore:161)}}
{{[2023-07-03 17:45:04,548] TRACE [MirrorCheckpointConnector|task-0] *Skipping* 
Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, {*}upstreamOffset=2, 
downstreamOffset=1{*}, metadata=} (repeated checkpoint) 
(org.apache.kafka.connect.mirror.MirrorCheckpointTask:220)}}

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the 

[jira] [Updated] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15144:
--
Description: 
Steps to reproduce :

1.Start the source cluster

2.Start the target cluster

3.Start connect-mirror-maker.sh using a config like the attached

4.Create a topic in source cluster

5.produce a few messages

6.consume them all with autocommit enabled

 

7. then dump the Checkpoint topic content e.g.

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}

{{{}Checkpoint{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
{*}downstreamOffset=1{*}, metadata={

 

the downstreamOffset remains at 1, while, in a fresh cluster pair like with the 
source topic created while MM2 is running, 
I'd expect the downstreamOffset to match the upstreamOffset.

Note that dumping the offset sync topic, shows matching initial offsets

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}

{{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0{
 
 
 

  was:
Steps to reproduce :

1.Start the source cluster

2.Start the target cluster

3.Start connect-mirror-maker.sh using a config like the attached

4.Create a topic in source cluster

5.produce a few messages

6.consume them all with autocommit enabled

 

7. then dump the Checkpoint topic content e.g.

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}

{{Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}}}

 

the downstreamOffset remains at 1, while, in a fresh cluster pair like with the 
source topic created while MM2 is running, 
I'd expect the downstreamOffset to match the upstreamOffset.


Note that dumping the offset sync topic, shows matching initial offsets

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}

{{OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}}}
 
 
 


> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{{}Checkpoint{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> {*}downstreamOffset=1{*}, metadata={
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{{}OffsetSync{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0{
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15144:
--
Description: 
Steps to reproduce :

1.Start the source cluster

2.Start the target cluster

3.Start connect-mirror-maker.sh using a config like the attached

4.Create a topic in source cluster

5.produce a few messages

6.consume them all with autocommit enabled

 

7. then dump the Checkpoint topic content e.g.

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}

{{Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}}}

 

the downstreamOffset remains at 1, while, in a fresh cluster pair like with the 
source topic created while MM2 is running, 
I'd expect the downstreamOffset to match the upstreamOffset.


Note that dumping the offset sync topic, shows matching initial offsets

{{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}

{{OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}}}
 
 
 

  was:
Steps to reproduce :

Start source cluster

Start target cluster

start connect-mirror-maker.sh using a config like the attached

 

create topic in source cluster

produce a few messages

consume them all with autocommit enabled

 

then dumping the Checkpoint topic content e.g.

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}


the downstreamOffset remains at 1, while, in a fresh cluster pair like with the 
source topic created while MM2 is running, I'd expect the downstreamOffset to 
match the upstreamOffset.

 

dumping the offset sync topic

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter

shows matching initial offsets

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
 
 
 


> Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{Checkpoint\{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> downstreamOffset=1, metadata=}}}
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0}}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15144) MM2 Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-15144:
--
Summary: MM2 Checkpoint downstreamOffset stuck to 1  (was: Checkpoint 
downstreamOffset stuck to 1)

> MM2 Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{Checkpoint\{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> downstreamOffset=1, metadata=}}}
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0}}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-15144:
-

Assignee: Edoardo Comar

> Checkpoint downstreamOffset stuck to 1
> --
>
> Key: KAFKA-15144
> URL: https://issues.apache.org/jira/browse/KAFKA-15144
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: edo-connect-mirror-maker-sourcetarget.properties
>
>
> Steps to reproduce :
> 1.Start the source cluster
> 2.Start the target cluster
> 3.Start connect-mirror-maker.sh using a config like the attached
> 4.Create a topic in source cluster
> 5.produce a few messages
> 6.consume them all with autocommit enabled
>  
> 7. then dump the Checkpoint topic content e.g.
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> source.checkpoints.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.CheckpointFormatter}}
> {{Checkpoint\{consumerGroupId=edogroup, 
> topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
> downstreamOffset=1, metadata=}}}
>  
> the downstreamOffset remains at 1, while, in a fresh cluster pair like with 
> the source topic created while MM2 is running, 
> I'd expect the downstreamOffset to match the upstreamOffset.
> Note that dumping the offset sync topic, shows matching initial offsets
> {{% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
> mm2-offset-syncs.source.internal --from-beginning --formatter 
> org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter}}
> {{OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
> downstreamOffset=0}}}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-15144:
-

 Summary: Checkpoint downstreamOffset stuck to 1
 Key: KAFKA-15144
 URL: https://issues.apache.org/jira/browse/KAFKA-15144
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Reporter: Edoardo Comar
 Attachments: edo-connect-mirror-maker-sourcetarget.properties

Steps to reproduce :

Start source cluster

Start target cluster

start connect-mirror-maker.sh using a config like the attached

 

create topic in source cluster

produce a few messages

consume them all with autocommit enabled

 

then dumping the Checkpoint topic content e.g.

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}


the downstreamOffset remains at 1, whereas in a fresh cluster pair, with the 
source topic created while MM2 is running, I'd expect the downstreamOffset to 
match the upstreamOffset.

 

dumping the offset sync topic

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter

shows matching initial offsets

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15133) RequestMetrics MessageConversionsTimeMs count is ticked even when no conversion occurs

2023-06-28 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738205#comment-17738205
 ] 

Edoardo Comar commented on KAFKA-15133:
---

[~rsivaram] - reading KIP 188

I am not sure whether the intent of the MessageConversionsTimeMs Histogram was also 
to record the number of times there was no conversion, so that if, say, only 1% of 
messages required conversion, we'd see that in the percentile distribution.

I found the issue while comparing this count with the one in BrokerTopicMetrics, 
where the mismatch was obvious.
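
As an illustration only (plain Java, not the actual broker code), this is a sketch of 
the guarded recording that would make the histogram count line up with the per-topic 
conversions meter:

{code:java}
// Self-contained sketch: record a conversion-time sample only when a conversion
// actually happened, instead of adding a 0 ms entry for every request.
import java.util.ArrayList;
import java.util.List;

public class ConversionTimeMetricSketch {
    private final List<Long> conversionTimeSamplesMs = new ArrayList<>(); // stand-in for the histogram
    private long conversionsCount = 0;                                    // stand-in for the meter

    void onRequestCompleted(boolean conversionOccurred, long conversionTimeMs) {
        if (conversionOccurred) {
            conversionTimeSamplesMs.add(conversionTimeMs); // histogram sample
            conversionsCount++;                            // meter tick
        }
        // no sample at all when no conversion happened
    }

    public static void main(String[] args) {
        ConversionTimeMetricSketch m = new ConversionTimeMetricSketch();
        m.onRequestCompleted(false, 0); // no conversion: nothing recorded
        m.onRequestCompleted(true, 3);  // conversion: one sample, one tick
        System.out.println(m.conversionTimeSamplesMs.size() + " samples, "
                + m.conversionsCount + " conversions");
    }
}
{code}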

> RequestMetrics MessageConversionsTimeMs count is ticked even when no 
> conversion occurs
> --
>
> Key: KAFKA-15133
> URL: https://issues.apache.org/jira/browse/KAFKA-15133
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Minor
>
> The Histogram {{RequestChannel.messageConversionsTimeHist}}
> is ticked even when a Produce/Fetch request incurred no conversion,
> because a new entry is added to the histogram distribution, with a 0 ms value.
>  
> It's confusing comparing the Histogram
> kafka.network RequestMetrics MessageConversionsTimeMs
> with the Meter
> kafka.server BrokerTopicMetrics ProduceMessageConversionsPerSec
> because for the latter, the metric is ticked only if a conversion actually 
> occurred



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15133) RequestMetrics MessageConversionsTimeMs count is ticked even when no conversion occurs

2023-06-28 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-15133:
-

 Summary: RequestMetrics MessageConversionsTimeMs count is ticked 
even when no conversion occurs
 Key: KAFKA-15133
 URL: https://issues.apache.org/jira/browse/KAFKA-15133
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.4.1, 3.5.0
Reporter: Edoardo Comar
Assignee: Edoardo Comar


The Histogram {{RequestChannel.messageConversionsTimeHist}}
is ticked even when a Produce/Fetch request incurred no conversion,
because a new entry is added to the histogram distribution, with a 0 ms value.
 
It's confusing comparing the Histogram
kafka.network RequestMetrics MessageConversionsTimeMs
with the Meter
kafka.server BrokerTopicMetrics ProduceMessageConversionsPerSec
because for the latter, the metric is ticked only if a conversion actually 
occurred



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) The KRaft controller should properly handle overly large user operations

2023-05-26 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726688#comment-17726688
 ] 

Edoardo Comar commented on KAFKA-14996:
---

Opened a PR to allow responding gracefully to clients with an INVALID_PARTITION 
error:

https://github.com/apache/kafka/pull/13766
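
For reference, a minimal admin-client sketch that drives the createTopics path past 
the controller's batch limit (the record count comes from the log in the description; 
the topic name and bootstrap address are assumptions):

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateHugeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            // enough partitions that the resulting metadata records exceed the controller's
            // batch limit, per the "Attempted to atomically commit 10001 records" log line
            NewTopic topic = new NewTopic("edo-huge-topic", 10001, (short) 1);
            try {
                admin.createTopics(Collections.singleton(topic)).all().get();
            } catch (ExecutionException e) {
                // today this surfaces as UnknownServerException; the PR above aims to
                // return the clearer INVALID_PARTITION error described in this comment
                System.out.println("createTopics failed: " + e.getCause());
            }
        }
    }
}
{code}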

> The KRaft controller should properly handle overly large user operations
> 
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 3.5.0
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Blocker
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-24 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725858#comment-17725858
 ] 

Edoardo Comar commented on KAFKA-14996:
---

I found another way to get into the error state.

3-node broker/controller cluster, all 3 nodes voters. If I shut down the 2 non-active 
quorum members, the remaining active controller enters a state where it logs 
`[2023-05-24 16:29:45,129] WARN [BrokerToControllerChannelManager id=1 
name=heartbeat] Received error UNKNOWN_SERVER_ERROR from node 1 when making an 
ApiVersionsRequest with correlation id 3945. Disconnecting. 
(org.apache.kafka.clients.NetworkClient)`

and correspondingly 
```
[2023-05-24 16:29:45,128] WARN [QuorumController id=1] getFinalizedFeatures: 
failed with unknown server exception RuntimeException in 222 us.  The 
controller is already in standby mode. 
(org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: No in-memory snapshot for epoch 159730. Snapshot 
epochs are:
    at 
org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)
    at 
org.apache.kafka.timeline.SnapshotRegistry.iterator(SnapshotRegistry.java:131)
    at org.apache.kafka.timeline.TimelineObject.get(TimelineObject.java:69)
    at 
org.apache.kafka.controller.FeatureControlManager.finalizedFeatures(FeatureControlManager.java:303)
    at 
org.apache.kafka.controller.QuorumController.lambda$finalizedFeatures$16(QuorumController.java:2016)
    at 
org.apache.kafka.controller.QuorumController$ControllerReadEvent.run(QuorumController.java:546)
```
in controller.log

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724946#comment-17724946
 ] 

Edoardo Comar edited comment on KAFKA-14996 at 5/22/23 2:15 PM:


The controller instability is not reproducible with 3.4 (at the git commit 
`2f13471181` so it must be a regression)

Also 3.5 `10189d6159` does not exhibit the controller bug


was (Author: ecomar):
The controller instability is not reproducible with 3.4 (at the git commit 
`2f13471181` so it must be a regression)

 

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724948#comment-17724948
 ] 

Edoardo Comar commented on KAFKA-14996:
---

Is the state the controller gets into similar to

https://issues.apache.org/jira/browse/KAFKA-14644

?

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724946#comment-17724946
 ] 

Edoardo Comar edited comment on KAFKA-14996 at 5/22/23 1:10 PM:


The controller instability is not reproducible with 3.4 (at the git commit 
`2f13471181` so it must be a regression)

 


was (Author: ecomar):
The controller instability is not reproducible with 3.4 (at the git commit 
`721a917b44` so it must be a regression)

 

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-22 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724946#comment-17724946
 ] 

Edoardo Comar commented on KAFKA-14996:
---

The controller instability is not reproducible with 3.4 (at the git commit 
`721a917b44` so it must be a regression)

 

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724307#comment-17724307
 ] 

Edoardo Comar edited comment on KAFKA-14996 at 5/19/23 2:55 PM:


given that this means a client request can cause a cluster to become 
unavailable, I'd raise the Priority to critical

 

this is a potential denial of service attack?

cc

[~mimaison] [~ijuma] [~rajinisiva...@gmail.com] 


was (Author: ecomar):
given that this means a client request can cause a cluster to become 
unavailable, I'd raise the Priority to critical

 

this is a potential denial of service attack

cc

[~mimaison] [~ijuma] [~rajinisiva...@gmail.com] 

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724307#comment-17724307
 ] 

Edoardo Comar edited comment on KAFKA-14996 at 5/19/23 2:55 PM:


given that this means a client request can cause a cluster to become 
unavailable, I'd raise the Priority to critical

 

this is a potential denial of service attack

cc

[~mimaison] [~ijuma] [~rajinisiva...@gmail.com] 


was (Author: ecomar):
given that this means a client request can cause a cluster to become 
unavailable, I'd raise the Priority to critical

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724308#comment-17724308
 ] 

Edoardo Comar commented on KAFKA-14996:
---

The controller.log files are full of 

{{[2023-05-19 15:50:18,834] WARN [QuorumController id=0] getFinalizedFeatures: 
failed with unknown server exception RuntimeException in 28 us.  The controller 
is already in standby mode. (org.apache.kafka.controller.QuorumController)}}
{{java.lang.RuntimeException: No in-memory snapshot for epoch 84310. Snapshot 
epochs are: 61900}}
{{    at 
org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:173)}}
{{    at 
org.apache.kafka.timeline.SnapshotRegistry.iterator(SnapshotRegistry.java:131)}}
{{    at org.apache.kafka.timeline.TimelineObject.get(TimelineObject.java:69)}}
{{    at 
org.apache.kafka.controller.FeatureControlManager.finalizedFeatures(FeatureControlManager.java:303)}}
{{    at 
org.apache.kafka.controller.QuorumController.lambda$finalizedFeatures$16(QuorumController.java:2016)}}
{{    at 
org.apache.kafka.controller.QuorumController$ControllerReadEvent.run(QuorumController.java:546)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
{{    at java.base/java.lang.Thread.run(Thread.java:829)}}

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724307#comment-17724307
 ] 

Edoardo Comar commented on KAFKA-14996:
---

given that this means a client request can cause a cluster to become 
unavailable, I'd raise the Priority to critical

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-14996:
--
Priority: Critical  (was: Major)

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Critical
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724300#comment-17724300
 ] 

Edoardo Comar edited comment on KAFKA-14996 at 5/19/23 2:48 PM:


A similar error is encountered when creating more partitions than 
QuorumController.MAX_RECORDS_PER_BATCH on an existing topic.

More worrying is that the cluster can become unstable after the error 
occurs.

Seen in a cluster with 6 nodes 0,1,2=broker,controller 3,4,5=broker

e.g. server.log for node 1 :

 

{{[2023-05-19 15:43:32,640] INFO [RaftManager id=1] Completed transition to 
CandidateState(localId=1, epoch=300, retries=86, voteStates=\{0=UNRECORDED, 
1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1145) 
from CandidateState(localId=1, epoch=299, retries=85, 
voteStates=\{0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, 
highWatermark=Optional.empty, electionTimeoutMs=1817) 
(org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:32,649] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4646. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:32,650] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4647. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,095] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4652. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,147] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4656. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,594] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4678. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,696] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4684. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,773] INFO [RaftManager id=1] Election has timed out, 
backing off for 1000ms before becoming a candidate again 
(org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,774] INFO [RaftManager id=1] Re-elect as candidate after 
election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,784] INFO [RaftManager id=1] Completed transition to 
CandidateState(localId=1, epoch=301, retries=87, voteStates=\{0=UNRECORDED, 
1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1022) 
from CandidateState(localId=1, epoch=300, retries=86, 
voteStates=\{0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, 
highWatermark=Optional.empty, electionTimeoutMs=1145) 
(org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:34,802] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4691. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:34,825] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4692. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}



In this state, client requests that should mutate the metadata (e.g. deleting a 
topic) always time out

{{% bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic 
edotest1}}
{{Error while executing topic command : Call(callName=deleteTopics, 
deadlineMs=1684507597582, tries=1, nextAllowedTryMs=1684507597698) timed out at 
1684507597598 after 1 attempt(s)}}
{{[2023-05-19 15:46:37,602] ERROR 
org.apache.kafka.common.errors.TimeoutException: Call(callName=deleteTopics, 
deadlineMs=1684507597582, tries=1, nextAllowedTryMs=1684507597698) timed out at 
1684507597598 after 1 attempt(s)}}
{{Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled 
deleteTopics request with correlation id 5 due to node 5 being disconnected}}
{{ (kafka.admin.TopicCommand$)}}

 

 


was (Author: ecomar):
A similar error is encountered when creating more partitions than 
QuorumController.MAX_RECORDS_PER_BATCH on an existing topic.

More worrying is that the cluster can become unstable after the error 
occurs.

Seen in a cluster with 6 nodes 0,1,2=broker,controller 3,4,5=broker

e.g. server.log for node 1 :

 

{{[2023-05-19 15:43:32,640] INFO [RaftManager id=1] Completed transition to 
CandidateState(localId=1, epoch=300, retries=86, voteStates=\{0=UNRECORDED, 
1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1145) 
from CandidateState(localId=1, epoch=299, retries=85, 

[jira] [Commented] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724300#comment-17724300
 ] 

Edoardo Comar commented on KAFKA-14996:
---

A similar error is encountered when creating more partitions than 
QuorumController.MAX_RECORDS_PER_BATCH on an existing topic.

More worrying is that the cluster can become unstable after the error 
occurs.

Seen in a cluster with 6 nodes 0,1,2=broker,controller 3,4,5=broker

e.g. server.log for node 1 :

 

{{[2023-05-19 15:43:32,640] INFO [RaftManager id=1] Completed transition to 
CandidateState(localId=1, epoch=300, retries=86, voteStates=\{0=UNRECORDED, 
1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1145) 
from CandidateState(localId=1, epoch=299, retries=85, 
voteStates=\{0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, 
highWatermark=Optional.empty, electionTimeoutMs=1817) 
(org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:32,649] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4646. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:32,650] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4647. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,095] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4652. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,147] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4656. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,594] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4678. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,696] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4684. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:33,773] INFO [RaftManager id=1] Election has timed out, 
backing off for 1000ms before becoming a candidate again 
(org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,774] INFO [RaftManager id=1] Re-elect as candidate after 
election backoff has completed (org.apache.kafka.raft.KafkaRaftClient)}}
{{[2023-05-19 15:43:34,784] INFO [RaftManager id=1] Completed transition to 
CandidateState(localId=1, epoch=301, retries=87, voteStates=\{0=UNRECORDED, 
1=GRANTED, 2=UNRECORDED}, highWatermark=Optional.empty, electionTimeoutMs=1022) 
from CandidateState(localId=1, epoch=300, retries=86, 
voteStates=\{0=UNRECORDED, 1=GRANTED, 2=UNRECORDED}, 
highWatermark=Optional.empty, electionTimeoutMs=1145) 
(org.apache.kafka.raft.QuorumState)}}
{{[2023-05-19 15:43:34,802] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 0 when making an ApiVersionsRequest with 
correlation id 4691. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{[2023-05-19 15:43:34,825] WARN [RaftManager id=1] Received error 
UNKNOWN_SERVER_ERROR from node 2 when making an ApiVersionsRequest with 
correlation id 4692. Disconnecting. (org.apache.kafka.clients.NetworkClient)}}
{{}}

 

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> 

[jira] [Commented] (KAFKA-13279) Implement CreateTopicsPolicy for KRaft

2023-05-19 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724293#comment-17724293
 ] 

Edoardo Comar commented on KAFKA-13279:
---

should this one not be marked as closed / resolved?

> Implement CreateTopicsPolicy for KRaft
> --
>
> Key: KAFKA-13279
> URL: https://issues.apache.org/jira/browse/KAFKA-13279
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>
> Implement CreateTopicsPolicy for KRaft



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-12 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-14996:
-

Assignee: Edoardo Comar

> CreateTopic falis with UnknownServerException if num partitions >= 
> QuorumController.MAX_RECORDS_PER_BATCH 
> --
>
> Key: KAFKA-14996
> URL: https://issues.apache.org/jira/browse/KAFKA-14996
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
>
> If an attempt is made to create a topic with
> num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)
> the client receives an UnknownServerException - it could rather receive a 
> better error.
> The controller logs
> {{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
> with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
> Renouncing leadership and reverting to the last committed offset 174. 
> (org.apache.kafka.controller.QuorumController)}}
> {{java.lang.IllegalStateException: Attempted to atomically commit 10001 
> records, but maxRecordsPerBatch is 1}}
> {{    at 
> org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
> {{    at 
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
> {{    at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
> {{    at java.base/java.lang.Thread.run(Thread.java:829)}}
> {{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14996) CreateTopic falis with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-12 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14996:
-

 Summary: CreateTopic falis with UnknownServerException if num 
partitions >= QuorumController.MAX_RECORDS_PER_BATCH 
 Key: KAFKA-14996
 URL: https://issues.apache.org/jira/browse/KAFKA-14996
 Project: Kafka
  Issue Type: Bug
  Components: controller
Reporter: Edoardo Comar


If an attempt is made to create a topic with

num partitions >= QuorumController.MAX_RECORDS_PER_BATCH  (1)

the client receives an UnknownServerException - it could rather receive a 
better error.

The controller logs

{{2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed 
with unknown server exception IllegalStateException at epoch 2 in 21956 us.  
Renouncing leadership and reverting to the last committed offset 174. 
(org.apache.kafka.controller.QuorumController)}}
{{java.lang.IllegalStateException: Attempted to atomically commit 10001 
records, but maxRecordsPerBatch is 1}}
{{    at 
org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)}}
{{    at 
org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)}}
{{    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)}}
{{    at java.base/java.lang.Thread.run(Thread.java:829)}}
{{[}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-28 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17681549#comment-17681549
 ] 

Edoardo Comar commented on KAFKA-14657:
---

Hi, mostly yes, the command appears to have failed. 
I have an idea of how to fix it, by accepting the concurrent-transaction 
exception as a possible good response. 
I'm more at a loss as to how to write an integration test for this.

Besides that, the behaviour of the fenced producer is slightly different in the 
two cases, as the exception it gets on producing is different.

However that looks to be a minor inconsistency that could be fixed with 
documentation.
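
For context, a minimal sketch of the failing scenario (this is not the attached 
snippet; the transactional.id and bootstrap address are assumptions, and the topic 
name follows the broker log in the description):

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FenceDuringOpenTransaction {
    public static void main(String[] args) throws Exception {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "edo-txn");         // assumed transactional.id
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        Properties ap = new Properties();
        ap.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
             Admin admin = Admin.create(ap)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic", "k", "v")).get(); // transaction now ongoing
            try {
                // per the report, this fails with ConcurrentTransactionsException,
                // yet the producer is left fenced and cannot commit or abort afterwards
                admin.fenceProducers(Collections.singleton("edo-txn")).all().get();
            } catch (Exception e) {
                System.out.println("fenceProducers failed: " + e.getCause());
            }
        }
    }
}
{code}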

 

> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException and the producer is not re-usable, 
> cannot abort/commit as it is fenced.
> An InvalidProducerEpochException is also logged as error on the broker
> [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition topic-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of 
> producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last 
> seen epoch
>  
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> see attached snippets
> As the caller of Admin.fenceProducers() is likely unaware of the producer's 
> state, the call should succeed regardless.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-28 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-14657:
-

Assignee: Edoardo Comar

> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException and the producer is not re-usable, 
> cannot abort/commit as it is fenced.
> An InvalidProducerEpochException is also logged as error on the broker
> [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition topic-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of 
> producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last 
> seen epoch
>  
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> see attached snippets
> As the caller of Admin.fenceProducers() is likely unaware of the producer's 
> state, the call should succeed regardless.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-27 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-14657:
--
Description: 
Admin.fenceProducers() 
fails with a ConcurrentTransactionsException if invoked when a Producer has a 
transaction ongoing.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable, 
cannot abort/commit as it is fenced.

An InvalidProducerEpochException is also logged as error on the broker

[2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
append operation on partition topic-0 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 
1062 at offset 84 in topic-0 is 0, which is smaller than the last seen epoch

 

Conversely, if Admin.fenceProducers() 
is invoked while there is no open transaction, the call succeeds and further 
attempts by that producer to produce fail with ProducerFenced.

see attached snippets

As the caller of Admin.fenceProducers() is likely unaware of the producer's 
state, the call should succeed regardless.

  was:
Admin.fenceProducers() 
fails with a ConcurrentTransactionsException if invoked when a Producer has a 
transaction ongoing.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable, 
cannot abort/commit as it is fenced.

Conversely, if Admin.fenceProducers() 
is invoked while there is no open transaction, the call succeeds and further 
attempts by that producer to produce fail with ProducerFenced.

see attached snippets

As the caller of Admin.fenceProducers() is likely unaware of the producer's 
state, the call should succeed regardless.


> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException and the producer is not re-usable: it 
> cannot abort/commit as it is fenced.
> An InvalidProducerEpochException is also logged as error on the broker
> [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition topic-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of 
> producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last 
> seen epoch
>  
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> see attached snippets
> As the caller of Admin.fenceProducers() is likely unaware of the producer's 
> state, the call should succeed regardless.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-27 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-14657:
--
Description: 
Admin.fenceProducers() 
fails with a ConcurrentTransactionsException if invoked when a Producer has a 
transaction ongoing.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable: it 
cannot abort/commit as it is fenced.

Conversely, if Admin.fenceProducers() 
is invoked while there is no open transaction, the call succeeds and further 
attempts by that producer to produce fail with ProducerFenced.

see attached snippets

As the caller of Admin.fenceProducers() is likely unaware of the producer's 
state, the call should succeed regardless.

  was:
{{Admin.fenceProducers() }}
fails with a ConcurrentTransactionsException if invoked when a Producer has a 
transaction ongoing.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable, 
cannot abort/commit as it is fenced.

Conversely, if 
{{Admin.fenceProducers() }}
is invoked while there is no open transaction, the call succeeds and further 
attempts by that producer to produce fail with ProducerFenced.

see attached snippets 

As the caller of {{Admin.fenceProducers() }} the call should succeed regardless 
of the state of the producer


> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException and the producer is not re-usable: it 
> cannot abort/commit as it is fenced.
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> see attached snippets
> As the caller of Admin.fenceProducers() is likely unaware of the producer's 
> state, the call should succeed regardless.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2023-01-27 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14657:
-

 Summary: Admin.fenceProducers fails when Producer has ongoing 
transaction - but Producer gets fenced
 Key: KAFKA-14657
 URL: https://issues.apache.org/jira/browse/KAFKA-14657
 Project: Kafka
  Issue Type: Bug
  Components: admin
Reporter: Edoardo Comar
 Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java

{{Admin.fenceProducers() }}
fails with a ConcurrentTransactionsException if invoked when a Producer has a 
transaction ongoing.
However, further attempts by that producer to produce fail with 
InvalidProducerEpochException and the producer is not re-usable: it 
cannot abort/commit as it is fenced.

Conversely, if 
{{Admin.fenceProducers() }}
is invoked while there is no open transaction, the call succeeds and further 
attempts by that producer to produce fail with ProducerFenced.

see attached snippets 

As the caller of {{Admin.fenceProducers()}} is likely unaware of the producer's 
state, the call should succeed regardless.
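
For illustration, here is a minimal sketch of the in-transaction scenario, not 
the attached FenceProducerDuringTx.java; the broker address, topic name and 
transactional id below are assumptions:

{code:java}
// Minimal sketch of the scenario above (not the attached reproducer).
// Assumptions: broker on localhost:9092, topic "topic", transactional id "demo-tx-id".
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FenceDuringTxSketch {
    public static void main(String[] args) throws Exception {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("transactional.id", "demo-tx-id");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "localhost:9092");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
             Admin admin = Admin.create(adminProps)) {

            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("topic", "key", "value")).get();

            // Fencing while the transaction is still open fails ...
            try {
                admin.fenceProducers(Collections.singleton("demo-tx-id")).all().get();
            } catch (ExecutionException e) {
                // ... with ConcurrentTransactionsException as the cause
                System.out.println("fenceProducers failed: " + e.getCause());
            }

            // ... yet the producer has been fenced anyway: this send (and any
            // commit/abort) now fails with InvalidProducerEpochException /
            // ProducerFencedException, so the producer is no longer usable.
            producer.send(new ProducerRecord<>("topic", "key2", "value2")).get();
            producer.commitTransaction();
        }
    }
}
{code}

The complementary case described above (fencing with no open transaction, where 
the call succeeds and later sends fail with ProducerFenced) is what the attached 
FenceProducerOutsideTx.java presumably covers.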



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-7666) KIP-391: Allow Producing with Offsets for Cluster Replication

2023-01-27 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-7666.
--
Resolution: Won't Fix

KIP has been retired

> KIP-391: Allow Producing with Offsets for Cluster Replication
> -
>
> Key: KAFKA-7666
> URL: https://issues.apache.org/jira/browse/KAFKA-7666
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
>
> Implementing KIP-391
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-391%3A+Allow+Producing+with+Offsets+for+Cluster+Replication



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-5238) BrokerTopicMetrics can be recreated after topic is deleted

2023-01-13 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676681#comment-17676681
 ] 

Edoardo Comar commented on KAFKA-5238:
--

Retrying with https://github.com/apache/kafka/pull/13113, many years after the 
old PR, as this patch has been adopted and used for more than 5 years in my 
organisation; before adopting it, we were able to detect metrics leaking in 
long-lived clusters.

> BrokerTopicMetrics can be recreated after topic is deleted
> --
>
> Key: KAFKA-5238
> URL: https://issues.apache.org/jira/browse/KAFKA-5238
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Edoardo Comar
>Priority: Major
>
> As part of KAFKA-3258, we added code to remove metrics during topic deletion. 
> This works fine as long as there are no fetch requests in the purgatory. If 
> there are, however, we'll recreate the metrics when we call 
> `ReplicaManager.appendToLocalLog`.
> This can be reproduced by updating 
> MetricsTest.testBrokerTopicMetricsUnregisteredAfterDeletingTopic() in the 
> following way:
> {code}
> @Test
>   def testBrokerTopicMetricsUnregisteredAfterDeletingTopic() {
> val topic = "test-broker-topic-metric"
> AdminUtils.createTopic(zkUtils, topic, 2, 1)
> // Produce a few messages and consume them to create the metrics
> TestUtils.produceMessages(servers, topic, nMessages)
> TestUtils.consumeTopicRecords(servers, topic, nMessages)
> assertTrue("Topic metrics don't exist", topicMetricGroups(topic).nonEmpty)
> assertNotNull(BrokerTopicStats.getBrokerTopicStats(topic))
> AdminUtils.deleteTopic(zkUtils, topic)
> TestUtils.verifyTopicDeletion(zkUtils, topic, 1, servers)
> Thread.sleep(1)
> assertEquals("Topic metrics exists after deleteTopic", Set.empty, 
> topicMetricGroups(topic))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes

2023-01-04 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654663#comment-17654663
 ] 

Edoardo Comar commented on KAFKA-14571:
---

Classified as minor because the Cluster object returned from 
MetadataCache.getClusterMetadata(...) is only passed to 
ClientQuotaCallback.updateClusterMetadata(...) and is not exposed via the 
describeCluster client API.
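
For illustration, a minimal sketch (not part of the fix) of a quota callback 
that would surface the missing rack info; the class is hypothetical and would be 
registered on the broker via the client.quota.callback.class property:

{code:java}
// Illustrative only: a no-op quota callback that logs node racks when the broker
// passes it the Cluster built by MetadataCache.getClusterMetadata(...).
// On affected ZK-mode versions, node.rack() comes back null even for racked brokers.
import java.util.Map;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.server.quota.ClientQuotaCallback;
import org.apache.kafka.server.quota.ClientQuotaEntity;
import org.apache.kafka.server.quota.ClientQuotaType;

public class RackLoggingQuotaCallback implements ClientQuotaCallback {

    @Override
    public boolean updateClusterMetadata(Cluster cluster) {
        for (Node node : cluster.nodes()) {
            // Expected: the broker's configured rack; observed on affected versions: null
            System.out.println("node " + node.id() + " rack=" + node.rack());
        }
        return false; // no quota changes triggered by this metadata update
    }

    // Remaining methods are no-op stubs, present only to satisfy the interface.
    @Override
    public Map<String, String> quotaMetricTags(ClientQuotaType quotaType, KafkaPrincipal principal,
                                               String clientId) {
        return Map.of();
    }

    @Override
    public Double quotaLimit(ClientQuotaType quotaType, Map<String, String> metricTags) {
        return null; // null means no quota defined by this callback
    }

    @Override
    public void updateQuota(ClientQuotaType quotaType, ClientQuotaEntity quotaEntity, double newValue) { }

    @Override
    public void removeQuota(ClientQuotaType quotaType, ClientQuotaEntity quotaEntity) { }

    @Override
    public boolean quotaResetRequired(ClientQuotaType quotaType) {
        return false;
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
{code}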

> ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes
> 
>
> Key: KAFKA-14571
> URL: https://issues.apache.org/jira/browse/KAFKA-14571
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.3
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Minor
> Fix For: 3.4.0, 3.3.3
>
>
> ZkMetadataCache.getClusterMetadata returns a Cluster object where the 
> aliveNodes are missing their rack info.
> When ZkMetadataCache updates the metadataSnapshot, it includes the rack in 
> `aliveBrokers` but not in `aliveNodes`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes

2023-01-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-14571:
--
Priority: Minor  (was: Major)

> ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes
> 
>
> Key: KAFKA-14571
> URL: https://issues.apache.org/jira/browse/KAFKA-14571
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.3
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Minor
> Fix For: 3.4.0, 3.3.3
>
>
> ZkMetadataCache.getClusterMetadata returns a Cluster object where the 
> aliveNodes are missing their rack info.
> When ZkMetadataCache updates the metadataSnapshot, it includes the rack in 
> `aliveBrokers` but not in `aliveNodes`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes

2023-01-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-14571:
--
Summary: ZkMetadataCache.getClusterMetadata is missing rack information in 
aliveNodes  (was: ZkMetadataCache.getClusterMetadata is missing rack 
information for aliveNodes)

> ZkMetadataCache.getClusterMetadata is missing rack information in aliveNodes
> 
>
> Key: KAFKA-14571
> URL: https://issues.apache.org/jira/browse/KAFKA-14571
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.3
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Minor
>
> ZkMetadataCache.getClusterMetadata returns a Cluster object where the 
> aliveNodes are missing their rack info.
> When ZkMetadataCache updates the metadataSnapshot, it includes the rack in 
> `aliveBrokers` but not in `aliveNodes`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14571) ZkMetadataCache.getClusterMetadata is missing rack information for aliveNodes

2023-01-04 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14571:
-

 Summary: ZkMetadataCache.getClusterMetadata is missing rack 
information for aliveNodes
 Key: KAFKA-14571
 URL: https://issues.apache.org/jira/browse/KAFKA-14571
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.3
Reporter: Edoardo Comar
Assignee: Edoardo Comar


ZkMetadataCache.getClusterMetadata returns a Cluster object where the 
aliveNodes are missing their rack info.

When ZkMetadataCache updates the metadataSnapshot, it includes the rack in 
`aliveBrokers` but not in `aliveNodes`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-13255) Mirrormaker config property config.properties.exclude is not working as expected

2022-04-29 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529986#comment-17529986
 ] 

Edoardo Comar edited comment on KAFKA-13255 at 4/29/22 1:18 PM:


Thanks Tom !!! [~tombentley]


was (Author: ecomar):
Thanks Tom !!!

> Mirrormaker config property config.properties.exclude is not working as 
> expected 
> -
>
> Key: KAFKA-13255
> URL: https://issues.apache.org/jira/browse/KAFKA-13255
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.8.0
>Reporter: Anamika Nadkarni
>Assignee: Ed Berezitsky
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
>
> Objective - Use MM2 (kafka connect in distributed cluster) for data migration 
> between cluster hosted in private data center and aws msk cluster.
> Steps performed -
>  # Started kafka-connect service.
>  # Created 3 MM2 connectors (i.e. source connector, checkpoint connector and 
> heartbeat connector). Curl commands used to create connectors are in the 
> attached file.  To exclude certain config properties while topic replication, 
> we are using the 'config.properties.exclude' property in the MM2 source 
> connector.
> Expected -
> Source topic 'dev.portlandDc.anamika.helloMsk' should be successfully created 
> in destination cluster.
> Actual -
> Creation of the source topic 'dev.portlandDc.anamika.helloMsk' in destination 
> cluster fails with an error. Error is
> {code:java}
> [2021-08-06 06:13:40,944] WARN [mm2-msc|worker] Could not create topic 
> dev.portlandDc.anamika.helloMsk. 
> (org.apache.kafka.connect.mirror.MirrorSourceConnector:371)
> org.apache.kafka.common.errors.InvalidConfigurationException: Unknown topic 
> config name: confluent.value.schema.validation{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13255) Mirrormaker config property config.properties.exclude is not working as expected

2022-04-29 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529986#comment-17529986
 ] 

Edoardo Comar commented on KAFKA-13255:
---

Thanks Tom !!!

> Mirrormaker config property config.properties.exclude is not working as 
> expected 
> -
>
> Key: KAFKA-13255
> URL: https://issues.apache.org/jira/browse/KAFKA-13255
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.8.0
>Reporter: Anamika Nadkarni
>Assignee: Ed Berezitsky
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
>
> Objective - Use MM2 (kafka connect in distributed cluster) for data migration 
> between cluster hosted in private data center and aws msk cluster.
> Steps performed -
>  # Started kafka-connect service.
>  # Created 3 MM2 connectors (i.e. source connector, checkpoint connector and 
> heartbeat connector). Curl commands used to create connectors are in the 
> attached file.  To exclude certain config properties while topic replication, 
> we are using the 'config.properties.exclude' property in the MM2 source 
> connector.
> Expected -
> Source topic 'dev.portlandDc.anamika.helloMsk' should be successfully created 
> in destination cluster.
> Actual -
> Creation of the source topic 'dev.portlandDc.anamika.helloMsk' in destination 
> cluster fails with an error. Error is
> {code:java}
> [2021-08-06 06:13:40,944] WARN [mm2-msc|worker] Could not create topic 
> dev.portlandDc.anamika.helloMsk. 
> (org.apache.kafka.connect.mirror.MirrorSourceConnector:371)
> org.apache.kafka.common.errors.InvalidConfigurationException: Unknown topic 
> config name: confluent.value.schema.validation{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13255) Mirrormaker config property config.properties.exclude is not working as expected

2022-04-28 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529541#comment-17529541
 ] 

Edoardo Comar commented on KAFKA-13255:
---

Would have been really nice to have this fix in 3.1 

> Mirrormaker config property config.properties.exclude is not working as 
> expected 
> -
>
> Key: KAFKA-13255
> URL: https://issues.apache.org/jira/browse/KAFKA-13255
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.8.0
>Reporter: Anamika Nadkarni
>Assignee: Ed Berezitsky
>Priority: Major
> Fix For: 3.2.0
>
>
> Objective - Use MM2 (kafka connect in distributed cluster) for data migration 
> between cluster hosted in private data center and aws msk cluster.
> Steps performed -
>  # Started kafka-connect service.
>  # Created 3 MM2 connectors (i.e. source connector, checkpoint connector and 
> heartbeat connector). Curl commands used to create connectors are in the 
> attached file.  To exclude certain config properties while topic replication, 
> we are using the 'config.properties.exclude' property in the MM2 source 
> connector.
> Expected -
> Source topic 'dev.portlandDc.anamika.helloMsk' should be successfully created 
> in destination cluster.
> Actual -
> Creation of the source topic 'dev.portlandDc.anamika.helloMsk' in destination 
> cluster fails with an error. Error is
> {code:java}
> [2021-08-06 06:13:40,944] WARN [mm2-msc|worker] Could not create topic 
> dev.portlandDc.anamika.helloMsk. 
> (org.apache.kafka.connect.mirror.MirrorSourceConnector:371)
> org.apache.kafka.common.errors.InvalidConfigurationException: Unknown topic 
> config name: confluent.value.schema.validation{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13407) Kafka controller out of service after ZK leader restart

2021-11-15 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443781#comment-17443781
 ] 

Edoardo Comar commented on KAFKA-13407:
---

[~vinsonZhang] are you suggesting loss of connection with ZK should not cause a 
controller to resign? 

> Kafka controller out of service after ZK leader restart
> ---
>
> Key: KAFKA-13407
> URL: https://issues.apache.org/jira/browse/KAFKA-13407
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
> Environment: Ubuntu 20.04
>Reporter: Daniel
>Priority: Critical
>
> When the Zookeeper leader disappears, a new instance becomes the leader, the 
> instances need to reconnect to Zookeeper, but the Kafka "Controller" gets 
> lost in limbo state after re-establishing connection.
> See below for how I manage to reproduce this over and over.
> *Prerequisites*
> Have a Kafka cluster with 3 instances running version 2.8.1. Figure out which 
> one is the Controller. I'm using Kafkacat 1.5.0 and get this info using the 
> `-L` flag.
> Zookeeper runs with 3 instances on version 3.5.9. Figure out which one is 
> leader by checking
>  
> {code:java}
> echo stat | nc -v localhost 2181
> {code}
>  
>  
> *Reproduce*
> 1. Stop the leader Zookeeper service.
> 2. Watch the logs of the Kafka Controller and ensure that it reconnects and 
> registers again.
>  
> {code:java}
> Oct 27 09:13:08 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:08,882] INFO 
> Unable to read additional data from server sessionid 0x1f2a12870003, likely 
> server has closed socket, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] WARN 
> SASL configuration failed: javax.security.auth.login.LoginException: No JAAS 
> configuration section named 'Client' was found in specified JAAS 
> configuration file: '/opt/kafka/config/kafka_server_jaas.conf'. Will continue 
> connection to Zookeeper server without SASL authentication, if Zookeeper 
> server allows it. (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] INFO 
> Opening socket connection to server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] ERROR 
> [ZooKeeperClient Kafka server] Auth failed. (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,549] INFO 
> Socket connection established, initiating session, client: 
> /10.10.85.215:39338, server: 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,569] INFO 
> Session establishment complete on server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181, 
> sessionid = 0x1f2a12870003, negotiated timeout = 18000 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,548] INFO 
> [ZooKeeperClient Kafka server] Reinitializing due to auth failure. 
> (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [PartitionStateMachine controllerId=1003] Stopped partition state machine 
> (kafka.controller.ZkPartitionStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [ReplicaStateMachine controllerId=1003] Stopped replica state machine 
> (kafka.controller.ZkReplicaStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,554] INFO 
> [RequestSendThread 

[jira] [Comment Edited] (KAFKA-13407) Kafka controller out of service after ZK leader restart

2021-11-15 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443780#comment-17443780
 ] 

Edoardo Comar edited comment on KAFKA-13407 at 11/15/21, 11:44 AM:
---

Hi [~vinsonZhang],
the ZooKeeper "Auth failed" error appears in the logs every time authentication 
is not configured for ZK ("SASL configuration failed"); the ZooKeeper client then 
falls back to an unauthenticated connection.
I don't see it as the cause of the resignation.


was (Author: ecomar):
[~vinsonZhang]the zookeeper "Auth failed" error in the logs appears every time 
authentication is not configured for ZK "SASL configuration failed" then the 
zookeeper clients falls back to unauthenticated connection 
I don't see it as the cause of resignation

> Kafka controller out of service after ZK leader restart
> ---
>
> Key: KAFKA-13407
> URL: https://issues.apache.org/jira/browse/KAFKA-13407
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
> Environment: Ubuntu 20.04
>Reporter: Daniel
>Priority: Critical
>
> When the Zookeeper leader disappears, a new instance becomes the leader, the 
> instances need to reconnect to Zookeeper, but the Kafka "Controller" gets 
> lost in limbo state after re-establishing connection.
> See below for how I manage to reproduce this over and over.
> *Prerequisites*
> Have a Kafka cluster with 3 instances running version 2.8.1. Figure out which 
> one is the Controller. I'm using Kafkacat 1.5.0 and get this info using the 
> `-L` flag.
> Zookeeper runs with 3 instances on version 3.5.9. Figure out which one is 
> leader by checking
>  
> {code:java}
> echo stat | nc -v localhost 2181
> {code}
>  
>  
> *Reproduce*
> 1. Stop the leader Zookeeper service.
> 2. Watch the logs of the Kafka Controller and ensure that it reconnects and 
> registers again.
>  
> {code:java}
> Oct 27 09:13:08 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:08,882] INFO 
> Unable to read additional data from server sessionid 0x1f2a12870003, likely 
> server has closed socket, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] WARN 
> SASL configuration failed: javax.security.auth.login.LoginException: No JAAS 
> configuration section named 'Client' was found in specified JAAS 
> configuration file: '/opt/kafka/config/kafka_server_jaas.conf'. Will continue 
> connection to Zookeeper server without SASL authentication, if Zookeeper 
> server allows it. (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] INFO 
> Opening socket connection to server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] ERROR 
> [ZooKeeperClient Kafka server] Auth failed. (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,549] INFO 
> Socket connection established, initiating session, client: 
> /10.10.85.215:39338, server: 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,569] INFO 
> Session establishment complete on server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181, 
> sessionid = 0x1f2a12870003, negotiated timeout = 18000 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,548] INFO 
> [ZooKeeperClient Kafka server] Reinitializing due to auth failure. 
> (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [PartitionStateMachine controllerId=1003] Stopped partition state machine 
> (kafka.controller.ZkPartitionStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [ReplicaStateMachine controllerId=1003] Stopped replica state machine 
> (kafka.controller.ZkReplicaStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> 

[jira] [Commented] (KAFKA-13407) Kafka controller out of service after ZK leader restart

2021-11-15 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17443780#comment-17443780
 ] 

Edoardo Comar commented on KAFKA-13407:
---

[~vinsonZhang] the ZooKeeper "Auth failed" error appears in the logs every time 
authentication is not configured for ZK ("SASL configuration failed"); the 
ZooKeeper client then falls back to an unauthenticated connection.
I don't see it as the cause of the resignation.

> Kafka controller out of service after ZK leader restart
> ---
>
> Key: KAFKA-13407
> URL: https://issues.apache.org/jira/browse/KAFKA-13407
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
> Environment: Ubuntu 20.04
>Reporter: Daniel
>Priority: Critical
>
> When the Zookeeper leader disappears, a new instance becomes the leader, the 
> instances need to reconnect to Zookeeper, but the Kafka "Controller" gets 
> lost in limbo state after re-establishing connection.
> See below for how I manage to reproduce this over and over.
> *Prerequisites*
> Have a Kafka cluster with 3 instances running version 2.8.1. Figure out which 
> one is the Controller. I'm using Kafkacat 1.5.0 and get this info using the 
> `-L` flag.
> Zookeeper runs with 3 instances on version 3.5.9. Figure out which one is 
> leader by checking
>  
> {code:java}
> echo stat | nc -v localhost 2181
> {code}
>  
>  
> *Reproduce*
> 1. Stop the leader Zookeeper service.
> 2. Watch the logs of the Kafka Controller and ensure that it reconnects and 
> registers again.
>  
> {code:java}
> Oct 27 09:13:08 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:08,882] INFO 
> Unable to read additional data from server sessionid 0x1f2a12870003, likely 
> server has closed socket, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] WARN 
> SASL configuration failed: javax.security.auth.login.LoginException: No JAAS 
> configuration section named 'Client' was found in specified JAAS 
> configuration file: '/opt/kafka/config/kafka_server_jaas.conf'. Will continue 
> connection to Zookeeper server without SASL authentication, if Zookeeper 
> server allows it. (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] INFO 
> Opening socket connection to server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] ERROR 
> [ZooKeeperClient Kafka server] Auth failed. (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,549] INFO 
> Socket connection established, initiating session, client: 
> /10.10.85.215:39338, server: 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,569] INFO 
> Session establishment complete on server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181, 
> sessionid = 0x1f2a12870003, negotiated timeout = 18000 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,548] INFO 
> [ZooKeeperClient Kafka server] Reinitializing due to auth failure. 
> (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [PartitionStateMachine controllerId=1003] Stopped partition state machine 
> (kafka.controller.ZkPartitionStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [ReplicaStateMachine controllerId=1003] Stopped replica state machine 
> (kafka.controller.ZkReplicaStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 

[jira] [Commented] (KAFKA-13191) Kafka 2.8 - simultaneous restarts of Kafka and zookeeper result in broken cluster

2021-11-09 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441106#comment-17441106
 ] 

Edoardo Comar commented on KAFKA-13191:
---

[~acldstkusr] can you take a look at 
https://issues.apache.org/jira/browse/KAFKA-13407 ?
It might be a symptom of the same underlying issue.

Would you be able to try to reproduce this issue with the fix we propose in 
https://issues.apache.org/jira/browse/KAFKA-13407 ?

> Kafka 2.8 - simultaneous restarts of Kafka and zookeeper result in broken 
> cluster
> -
>
> Key: KAFKA-13191
> URL: https://issues.apache.org/jira/browse/KAFKA-13191
> Project: Kafka
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 2.8.0, 3.0.0
>Reporter: CS User
>Priority: Major
>
> We're using confluent platform 6.2, running in a Kubernetes environment. The 
> cluster has been running for a couple of years with zero issues, starting 
> from version 1.1, 2.5 and now 2.8. 
> We've very recently upgraded to kafka 2.8 from kafka 2.5. 
> Since upgrading, we have seen issues when kafka and zookeeper pods restart 
> concurrently. 
> We can replicate the issue by restarting either the zookeeper statefulset 
> first or the kafka statefulset first, either way appears to result with the 
> same failure scenario. 
> We've attempted to mitigate by preventing the kafka pods from stopping if any 
> zookeeper pods are being restarted, or a rolling restart of the zookeeper 
> cluster is underway. 
> We've also added a check to stop the kafka pods from starting until all 
> zookeeper pods are ready, however under the following scenario we still see 
> the issue:
> In a 3 node kafka cluster with 5 zookeeper servers
>  # kafka-2 starts to terminate - all zookeeper pods are running, so it 
> proceeds
>  # zookeeper-4 terminates
>  # kafka-2 starts-up, and waits until the zookeeper rollout completes
>  # kafka-2 eventually fully starts, kafka comes up and we see the errors 
> below on other pods in the cluster. 
> Without mitigation and in the above scenario we see errors on pods kafka-0 
> (repeatedly spamming the log) :
> {noformat}
> [2021-08-11 11:45:57,625] WARN Broker had a stale broker epoch 
> (670014914375), retrying. (kafka.server.DefaultAlterIsrManager){noformat}
> Kafka-1 seems ok
> When kafka-2 starts, it has this log entry with regards to its own broker 
> epoch:
> {noformat}
> [2021-08-11 11:44:48,116] INFO Registered broker 2 at path /brokers/ids/2 
> with addresses: 
> INTERNAL://kafka-2.kafka.svc.cluster.local:9092,INTERNAL_SECURE://kafka-2.kafka.svc.cluster.local:9094,
>  czxid (broker epoch): 674309865493 (kafka.zk.KafkaZkClient) {noformat}
> This is despite kafka-2 appearing to start fine, this is what you see in 
> kafka-2's logs, nothing else seems to be added to the log, it just seems to 
> hang here:
> {noformat}
> [2021-08-11 11:44:48,911] INFO [SocketServer listenerType=ZK_BROKER, 
> nodeId=2] Started socket server acceptors and processors 
> (kafka.network.SocketServer)
> [2021-08-11 11:44:48,913] INFO Kafka version: 6.2.0-ccs 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,913] INFO Kafka commitId: 1a5755cf9401c84f 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,913] INFO Kafka startTimeMs: 1628682288911 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,914] INFO [KafkaServer id=2] started 
> (kafka.server.KafkaServer) {noformat}
> This never appears to recover. 
> If you then restart kafka-2, you'll see these errors:
> {noformat}
> org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
> factor: 3 larger than available brokers: 0. {noformat}
> This seems to completely break the cluster, partitions do not failover as 
> expected. 
>  
> Checking zookeeper, and getting the values of the brokers look fine 
> {noformat}
>  get /brokers/ids/0{noformat}
> etc, all looks fine there, each broker is present. 
>  
> This error message appears to have been added to kafka in the last 11 months 
> {noformat}
> Broker had a stale broker epoch {noformat}
> Via this PR:
> [https://github.com/apache/kafka/pull/9100]
> I see also this comment around the leader getting stuck:
> [https://github.com/apache/kafka/pull/9100/files#r494480847]
>  
> Recovery is possible by continuing to restart the remaining brokers in the 
> cluster. Once all have been restarted, everything looks fine.
> Has anyone else come across this? It seems very simple to replicate in our 
> environment, simply start a simultaneous rolling restart of both kafka and 
> zookeeper. 
> I appreciate that Zookeeper and Kafka would not normally be restarted 
> concurrently in this way. However there are going to be scenarios where this 
> can happen, such as if we had simultaneous 

[jira] [Commented] (KAFKA-13407) Kafka controller out of service after ZK leader restart

2021-11-08 Thread Edoardo Comar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17440593#comment-17440593
 ] 

Edoardo Comar commented on KAFKA-13407:
---

Hi, we're experiencing a similar issue. We think the following patch to 2.8 may 
solve it:

[https://github.com/apache/kafka/pull/11476]

We have, however, not yet been able to replicate the issue using the ducktape 
system tests.

Would you be able to test our patch in your setup, [~Olsson]?

> Kafka controller out of service after ZK leader restart
> ---
>
> Key: KAFKA-13407
> URL: https://issues.apache.org/jira/browse/KAFKA-13407
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
> Environment: Ubuntu 20.04
>Reporter: Daniel
>Priority: Critical
>
> When the Zookeeper leader disappears, a new instance becomes the leader, the 
> instances need to reconnect to Zookeeper, but the Kafka "Controller" gets 
> lost in limbo state after re-establishing connection.
> See below for how I manage to reproduce this over and over.
> *Prerequisites*
> Have a Kafka cluster with 3 instances running version 2.8.1. Figure out which 
> one is the Controller. I'm using Kafkacat 1.5.0 and get this info using the 
> `-L` flag.
> Zookeeper runs with 3 instances on version 3.5.9. Figure out which one is 
> leader by checking
>  
> {code:java}
> echo stat | nc -v localhost 2181
> {code}
>  
>  
> *Reproduce*
> 1. Stop the leader Zookeeper service.
> 2. Watch the logs of the Kafka Controller and ensure that it reconnects and 
> registers again.
>  
> {code:java}
> Oct 27 09:13:08 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:08,882] INFO 
> Unable to read additional data from server sessionid 0x1f2a12870003, likely 
> server has closed socket, closing socket connection and attempting reconnect 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] WARN 
> SASL configuration failed: javax.security.auth.login.LoginException: No JAAS 
> configuration section named 'Client' was found in specified JAAS 
> configuration file: '/opt/kafka/config/kafka_server_jaas.conf'. Will continue 
> connection to Zookeeper server without SASL authentication, if Zookeeper 
> server allows it. (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] INFO 
> Opening socket connection to server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,548] ERROR 
> [ZooKeeperClient Kafka server] Auth failed. (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,549] INFO 
> Socket connection established, initiating session, client: 
> /10.10.85.215:39338, server: 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:10 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:10,569] INFO 
> Session establishment complete on server 
> zookeeper-kafka.service.consul.lab.aws.blue.example.net/10.10.84.12:2181, 
> sessionid = 0x1f2a12870003, negotiated timeout = 18000 
> (org.apache.zookeeper.ClientCnxn)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,548] INFO 
> [ZooKeeperClient Kafka server] Reinitializing due to auth failure. 
> (kafka.zookeeper.ZooKeeperClient)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [PartitionStateMachine controllerId=1003] Stopped partition state machine 
> (kafka.controller.ZkPartitionStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,550] INFO 
> [ReplicaStateMachine controllerId=1003] Stopped replica state machine 
> (kafka.controller.ZkReplicaStateMachine)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,551] INFO 
> [RequestSendThread controllerId=1003] Shutdown completed 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Shutting down 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread controllerId=1003] Stopped 
> (kafka.controller.RequestSendThread)
> Oct 27 09:13:11 ip-10-10-85-215 kafka[62961]: [2021-10-27 09:13:11,552] INFO 
> [RequestSendThread 

[jira] [Updated] (KAFKA-13191) Kafka 2.8 - simultaneous restarts of Kafka and zookeeper result in broken cluster

2021-10-14 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-13191:
--
Affects Version/s: 3.0.0

> Kafka 2.8 - simultaneous restarts of Kafka and zookeeper result in broken 
> cluster
> -
>
> Key: KAFKA-13191
> URL: https://issues.apache.org/jira/browse/KAFKA-13191
> Project: Kafka
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 2.8.0, 3.0.0
>Reporter: CS User
>Priority: Major
>
> We're using confluent platform 6.2, running in a Kubernetes environment. The 
> cluster has been running for a couple of years with zero issues, starting 
> from version 1.1, 2.5 and now 2.8. 
> We've very recently upgraded to kafka 2.8 from kafka 2.5. 
> Since upgrading, we have seen issues when kafka and zookeeper pods restart 
> concurrently. 
> We can replicate the issue by restarting either the zookeeper statefulset 
> first or the kafka statefulset first, either way appears to result with the 
> same failure scenario. 
> We've attempted to mitigate by preventing the kafka pods from stopping if any 
> zookeeper pods are being restarted, or a rolling restart of the zookeeper 
> cluster is underway. 
> We've also added a check to stop the kafka pods from starting until all 
> zookeeper pods are ready, however under the following scenario we still see 
> the issue:
> In a 3 node kafka cluster with 5 zookeeper servers
>  # kafka-2 starts to terminate - all zookeeper pods are running, so it 
> proceeds
>  # zookeeper-4 terminates
>  # kafka-2 starts-up, and waits until the zookeeper rollout completes
>  # kafka-2 eventually fully starts, kafka comes up and we see the errors 
> below on other pods in the cluster. 
> Without mitigation and in the above scenario we see errors on pods kafka-0 
> (repeatedly spamming the log) :
> {noformat}
> [2021-08-11 11:45:57,625] WARN Broker had a stale broker epoch 
> (670014914375), retrying. (kafka.server.DefaultAlterIsrManager){noformat}
> Kafka-1 seems ok
> When kafka-2 starts, it has this log entry with regards to its own broker 
> epoch:
> {noformat}
> [2021-08-11 11:44:48,116] INFO Registered broker 2 at path /brokers/ids/2 
> with addresses: 
> INTERNAL://kafka-2.kafka.svc.cluster.local:9092,INTERNAL_SECURE://kafka-2.kafka.svc.cluster.local:9094,
>  czxid (broker epoch): 674309865493 (kafka.zk.KafkaZkClient) {noformat}
> This is despite kafka-2 appearing to start fine, this is what you see in 
> kafka-2's logs, nothing else seems to be added to the log, it just seems to 
> hang here:
> {noformat}
> [2021-08-11 11:44:48,911] INFO [SocketServer listenerType=ZK_BROKER, 
> nodeId=2] Started socket server acceptors and processors 
> (kafka.network.SocketServer)
> [2021-08-11 11:44:48,913] INFO Kafka version: 6.2.0-ccs 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,913] INFO Kafka commitId: 1a5755cf9401c84f 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,913] INFO Kafka startTimeMs: 1628682288911 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-08-11 11:44:48,914] INFO [KafkaServer id=2] started 
> (kafka.server.KafkaServer) {noformat}
> This never appears to recover. 
> If you then restart kafka-2, you'll see these errors:
> {noformat}
> org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
> factor: 3 larger than available brokers: 0. {noformat}
> This seems to completely break the cluster, partitions do not failover as 
> expected. 
>  
> Checking zookeeper, and getting the values of the brokers look fine 
> {noformat}
>  get /brokers/ids/0{noformat}
> etc, all looks fine there, each broker is present. 
>  
> This error message appears to have been added to kafka in the last 11 months 
> {noformat}
> Broker had a stale broker epoch {noformat}
> Via this PR:
> [https://github.com/apache/kafka/pull/9100]
> I see also this comment around the leader getting stuck:
> [https://github.com/apache/kafka/pull/9100/files#r494480847]
>  
> Recovery is possible by continuing to restart the remaining brokers in the 
> cluster. Once all have been restarted, everything looks fine.
> Has anyone else come across this? It seems very simple to replicate in our 
> environment, simply start a simultaneous rolling restart of both kafka and 
> zookeeper. 
> I appreciate that Zookeeper and Kafka would not normally be restarted 
> concurrently in this way. However there are going to be scenarios where this 
> can happen, such as if we had simultaneous Kubernetes node failures, 
> resulting in the loss of both a zookeeper and a kafka pod at the same time. 
> This could result in the issue above. 
> This is not something that we have seen previously with versions 1.1 or 2.5. 
> Just to be clear, rolling restarting only kafka or 
