Re: [DISCUSS] Adding non-committers as Github collaborators

2023-04-27 Thread Luke Chen
Hi David,

Thanks for the suggestion. I'm +1.
This is also a good way to show appreciation from the Kafka community to some
great contributors who haven't yet qualified to become committers.

Luke

On Fri, Apr 28, 2023 at 2:45 AM David Arthur
 wrote:

> Hey folks,
>
> I stumbled across this wiki page from the infra team that describes the
> various features supported in the ".asf.yaml" file:
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
>
> One section that looked particularly interesting was
>
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
>
> github:
>   collaborators:
>     - userA
>     - userB
>
> This would allow us to define non-committers as collaborators on the Github
> project. Concretely, this means they would receive the "triage" Github role
> (defined here
>
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role
> ).
> Practically, this means we could let non-committers do things like assign
> labels and reviewers on Pull Requests.
>
> I wanted to see what the committer group thought about this feature. I
> think it could be useful.
>
> Cheers,
> David
>


[DISCUSS] Recommendations for managing long-running projects

2023-04-27 Thread Kirk True
Hi all,

A handful of engineers are collaborating on a fairly sizable project to improve 
the Java consumer client [1]. We are using as many ASF tools as possible for 
the work (wiki, Jira, mailing list, and Slack thus far).

There are still two areas where we need recommendations:

1. Project management tools. What is the recommended tool for communicating 
project scheduling, milestones, etc.?

2. Shared code collaboration. Since none of the engineers on the project are 
committers, we can't collaborate by reviewing and merging our changes into 
trunk. Is there a recommended path to collaborate for non-committers?

Thanks,
Kirk

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+project+overview

[jira] [Resolved] (KAFKA-14942) CopyOnWriteMap implements ConcurrentMap but does not implement required default methods

2023-04-27 Thread Steven Schlansker (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Schlansker resolved KAFKA-14942.
---
Resolution: Invalid

> CopyOnWriteMap implements ConcurrentMap but does not implement required 
> default methods
> ---
>
> Key: KAFKA-14942
> URL: https://issues.apache.org/jira/browse/KAFKA-14942
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.4.0
>Reporter: Steven Schlansker
>Priority: Minor
>
> Hi Kafka team,
> I was reading through the kafka-clients CopyOnWriteMap while investigating a 
> problem in a different library, and I think it declares that it implements 
> ConcurrentMap but does not completely implement that interface.
> In particular, it inherits e.g. computeIfAbsent as a default method from Map, 
> which is documented to be a non-atomic implementation, and is not synchronized 
> in any way. I think this can lead to a reader observing a map whose contents 
> are not consistent with any serial execution of the write operations.
>  
> Consider a thread T1 which calls computeIfAbsent("a", _ -> "1"):
> T1's computeIfAbsent calls get("a"), observes null, and is then pre-empted.
> T2 calls put("a", "2"), which copies the (empty) backing map and stores 
> {"a": "2"}.
> T1 then wakes up, still thinking the value is null, and calls put("a", "1").
>  
> This leaves the map with the contents {"a": "1"}, while any serial execution 
> of these two operations should always finish with {"a": "2"}.
>  
> I think CopyOnWriteMap should re-implement all mutating default methods, at 
> least as synchronized. Alternatively, if this is a special internal map and we 
> know those methods will never be called, perhaps they should throw 
> UnsupportedOperationException, or the class should at least be documented as 
> not being a complete and proper implementation.
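> 
> For illustration, here is a minimal sketch (a hypothetical class, not Kafka's 
> actual code) of the suggested fix: synchronizing computeIfAbsent on the same 
> lock as put(), so no writer can interleave between the get() check and the 
> subsequent put():
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> import java.util.function.Function;
> 
> public class SynchronizedCopyOnWriteMap<K, V> {
>     private volatile Map<K, V> map = new HashMap<>();
> 
>     public V get(K key) {
>         return map.get(key); // lock-free read of the current snapshot
>     }
> 
>     public synchronized V put(K key, V value) {
>         Map<K, V> copy = new HashMap<>(map); // copy the backing map
>         V previous = copy.put(key, value);
>         map = copy; // publish the new snapshot
>         return previous;
>     }
> 
>     // Synchronized, unlike Map's default method, so the T1/T2 interleaving
>     // described above cannot happen.
>     public synchronized V computeIfAbsent(K key,
>                                           Function<? super K, ? extends V> fn) {
>         V existing = map.get(key);
>         if (existing != null) {
>             return existing;
>         }
>         V computed = fn.apply(key);
>         if (computed != null) {
>             put(key, computed); // reentrant: we already hold the lock
>         }
>         return computed;
>     }
> }
> {code}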
>  
> Thank you for your consideration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14950) Implement assign() and assignment()

2023-04-27 Thread Philip Nee (Jira)
Philip Nee created KAFKA-14950:
--

 Summary: Implement assign() and assignment()
 Key: KAFKA-14950
 URL: https://issues.apache.org/jira/browse/KAFKA-14950
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Philip Nee
Assignee: Philip Nee


Implement assign() and assignment()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14949) Add Streams upgrade tests from AK 3.4

2023-04-27 Thread Victoria Xia (Jira)
Victoria Xia created KAFKA-14949:


 Summary: Add Streams upgrade tests from AK 3.4
 Key: KAFKA-14949
 URL: https://issues.apache.org/jira/browse/KAFKA-14949
 Project: Kafka
  Issue Type: Task
  Components: streams
Reporter: Victoria Xia


Streams upgrade tests currently only test upgrading from 3.3 and earlier 
versions 
([link|https://github.com/apache/kafka/blob/056657d84d84e116ffc9460872945b4d2b479ff3/tests/kafkatest/tests/streams/streams_application_upgrade_test.py#L30]).
 We should add 3.4 as an "upgrade_from" version into these tests, in light of 
the upcoming 3.5 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1803

2023-04-27 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 4471 lines...]
[2023-04-27T20:15:26.598Z] > Task :jmh-benchmarks:compileTestJava NO-SOURCE
[2023-04-27T20:15:26.598Z] > Task :jmh-benchmarks:testClasses UP-TO-DATE
[2023-04-27T20:15:26.598Z] > Task :jmh-benchmarks:checkstyleTest NO-SOURCE
[2023-04-27T20:15:26.598Z] > Task :jmh-benchmarks:spotbugsTest SKIPPED
[2023-04-27T20:15:28.374Z] > Task :jmh-benchmarks:checkstyleMain
[2023-04-27T20:15:30.149Z] > Task :connect:runtime:compileTestJava
[2023-04-27T20:15:30.149Z] > Task :connect:runtime:testClasses
[2023-04-27T20:15:30.149Z] > Task :connect:runtime:spotbugsTest SKIPPED
[2023-04-27T20:15:31.923Z] > Task :connect:mirror:compileTestJava
[2023-04-27T20:15:31.923Z] > Task :connect:mirror:testClasses
[2023-04-27T20:15:31.923Z] > Task :connect:mirror:spotbugsTest SKIPPED
[2023-04-27T20:15:32.646Z] > Task :jmh-benchmarks:spotbugsMain
[2023-04-27T20:15:32.808Z] > Task :streams:compileTestJava
[2023-04-27T20:15:32.808Z] > Task :core:spotbugsMain
[2023-04-27T20:15:34.580Z] > Task :connect:mirror:checkstyleTest
[2023-04-27T20:15:34.580Z] > Task :connect:mirror:check
[2023-04-27T20:15:35.973Z] > Task :streams:checkstyleTest
[2023-04-27T20:15:35.973Z] > Task :streams:check
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] FAILURE: Build failed with an exception.
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] * What went wrong:
[2023-04-27T20:15:35.973Z] Execution failed for task ':rat'.
[2023-04-27T20:15:35.973Z] > A failure occurred while executing 
org.nosphere.apache.rat.RatWork
[2023-04-27T20:15:35.973Z]> Apache Rat audit failure - 1 unapproved license
[2023-04-27T20:15:35.973Z]  See 
file:///home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/build/rat/index.html
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] * Try:
[2023-04-27T20:15:35.973Z] > Run with --stacktrace option to get the stack 
trace.
[2023-04-27T20:15:35.973Z] > Run with --info or --debug option to get more log 
output.
[2023-04-27T20:15:35.973Z] > Run with --scan to get full insights.
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] * Get more help at https://help.gradle.org
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] Deprecated Gradle features were used in this build, 
making it incompatible with Gradle 9.0.
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] You can use '--warning-mode all' to show the 
individual deprecation warnings and determine if they come from your own 
scripts or plugins.
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] See 
https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] BUILD FAILED in 8m
[2023-04-27T20:15:35.973Z] 255 actionable tasks: 206 executed, 49 up-to-date
[2023-04-27T20:15:35.973Z] 
[2023-04-27T20:15:35.973Z] See the profiling report at: 
file:///home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk_2/build/reports/profile/profile-2023-04-27-20-07-38.html
[2023-04-27T20:15:35.973Z] A fine-grained performance profile is available: use 
the --scan option.
[2023-04-27T20:15:37.309Z] > Task :streams:testClasses
[2023-04-27T20:15:37.309Z] > Task :streams:streams-scala:compileTestJava 
NO-SOURCE
[2023-04-27T20:15:37.309Z] > Task :streams:spotbugsTest SKIPPED
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
Failed in branch JDK 8 and Scala 2.12
[2023-04-27T20:15:39.504Z] > Task :jmh-benchmarks:spotbugsMain
[2023-04-27T20:15:42.471Z] 
[2023-04-27T20:15:42.471Z] > Task :streams:streams-scala:compileTestScala
[2023-04-27T20:15:42.471Z] [Warn] 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/streams-scala/src/test/scala/org/apache/kafka/streams/scala/kstream/KStreamSplitTest.scala:19:41:
 imported `Named` is permanently hidden by definition of type Named in package 
kstream
[2023-04-27T20:15:42.471Z] [Warn] 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/streams-scala/src/test/scala/org/apache/kafka/streams/scala/kstream/KStreamTest.scala:24:3:
 imported `Named` is permanently hidden by definition of type Named in package 
kstream
[2023-04-27T20:15:43.419Z] [Warn] 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/streams-scala/src/test/scala/org/apache/kafka/streams/scala/kstream/KTableTest.scala:21:3:
 imported `Named` is permanently hidden by definition of type Named in package 
kstream
[2023-04-27T20:15:47.032Z] three warnings found
[2023-04-27T20:15:47.032Z] 
[2023-04-27T20:15:47.032Z] > Task :streams:streams-scala:testClasses
[2023-04-27T20:15:47.032Z] > Task :streams:streams-scala:checkstyleTest 
NO-SOURCE
[2023-04-27T20:15:47.032Z] > Task :streams:streams-scala:spotbugsTest SKIPPED

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1802

2023-04-27 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 504805 lines...]
[2023-04-27T15:19:52.854Z] 
[2023-04-27T15:19:52.854Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers PASSED
[2023-04-27T15:19:52.854Z] 
[2023-04-27T15:19:52.854Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount STARTED
[2023-04-27T15:20:05.533Z] 
[2023-04-27T15:20:05.533Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount PASSED
[2023-04-27T15:20:05.533Z] 
[2023-04-27T15:20:05.533Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient STARTED
[2023-04-27T15:20:05.533Z] 
[2023-04-27T15:20:05.533Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient PASSED
[2023-04-27T15:20:05.533Z] 
[2023-04-27T15:20:05.533Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys STARTED
[2023-04-27T15:20:13.999Z] 
[2023-04-27T15:20:13.999Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys PASSED
[2023-04-27T15:20:13.999Z] 
[2023-04-27T15:20:13.999Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThreadsPerClient STARTED
[2023-04-27T15:20:15.048Z] 
[2023-04-27T15:20:15.048Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThreadsPerClient PASSED
[2023-04-27T15:20:15.048Z] 
[2023-04-27T15:20:15.048Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyThreadsPerClient STARTED
[2023-04-27T15:20:16.098Z] 
[2023-04-27T15:20:16.098Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyThreadsPerClient PASSED
[2023-04-27T15:20:16.098Z] 
[2023-04-27T15:20:16.098Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargePartitionCount STARTED
[2023-04-27T15:20:46.148Z] 
[2023-04-27T15:20:46.148Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargePartitionCount PASSED
[2023-04-27T15:20:46.148Z] 
[2023-04-27T15:20:46.148Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargePartitionCount STARTED
[2023-04-27T15:21:12.835Z] 
[2023-04-27T15:21:12.835Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargePartitionCount PASSED
[2023-04-27T15:21:12.835Z] 
[2023-04-27T15:21:12.835Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys STARTED
[2023-04-27T15:21:16.848Z] 
[2023-04-27T15:21:16.848Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys PASSED
[2023-04-27T15:21:16.848Z] 
[2023-04-27T15:21:16.848Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys STARTED
[2023-04-27T15:21:35.213Z] 
[2023-04-27T15:21:35.213Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys PASSED
[2023-04-27T15:21:35.213Z] 
[2023-04-27T15:21:35.213Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers STARTED
[2023-04-27T15:21:37.414Z] 
[2023-04-27T15:21:37.414Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers PASSED
[2023-04-27T15:21:37.414Z] 
[2023-04-27T15:21:37.414Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargeNumConsumers STARTED
[2023-04-27T15:21:39.351Z] 
[2023-04-27T15:21:39.351Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 181 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargeNumConsumers PASSED

Re: [DISCUSS] Apache Kafka 3.4.1 release

2023-04-27 Thread Kenneth Eversole
+1, Thanks for leading the charge.


On Thu, Apr 27, 2023 at 2:54 PM Sophie Blee-Goldman 
wrote:

> +1, thanks Luke!
>
> On Wed, Apr 26, 2023 at 10:53 AM Bruno Cadonna  wrote:
>
> > Thanks a lot, Luke!
> >
> > +1
> >
> > Best,
> > Bruno
> >
> > On 26.04.23 16:11, Bill Bejeck wrote:
> > > Thanks for volunteering!
> > >
> > > +1
> > >
> > > On Wed, Apr 26, 2023 at 8:33 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > >> Thanks Luke
> > >> +1
> > >>
> > >> On Wed, Apr 26, 2023 at 2:12 PM Chia-Ping Tsai 
> > wrote:
> > >>>
> > >>> +1 to Luke!
> > >>>
> >  On Apr 26, 2023, at 6:44 PM, David Jacot  wrote:
> > 
> >  +1 Thanks, Luke!
> > 
> > > On Wed, Apr 26, 2023 at 12:37 PM, Luke Chen  wrote:
> > >
> > > Hi all,
> > >
> > > I'd like to volunteer as release manager for the Apache
> > > Kafka 3.4.1 release.
> > > If there are no objections, I'll start building a release plan in
> the
> > > wiki in the next couple of days.
> > >
> > > Thank you.
> > > Luke
> > >
> > >>
> > >
> >
>


Re: [DISCUSS] Apache Kafka 3.4.1 release

2023-04-27 Thread Sophie Blee-Goldman
+1, thanks Luke!

On Wed, Apr 26, 2023 at 10:53 AM Bruno Cadonna  wrote:

> Thanks a lot, Luke!
>
> +1
>
> Best,
> Bruno
>
> On 26.04.23 16:11, Bill Bejeck wrote:
> > Thanks for volunteering!
> >
> > +1
> >
> > On Wed, Apr 26, 2023 at 8:33 AM Mickael Maison  >
> > wrote:
> >
> >> Thanks Luke
> >> +1
> >>
> >> On Wed, Apr 26, 2023 at 2:12 PM Chia-Ping Tsai 
> wrote:
> >>>
> >>> +1 to Luke!
> >>>
>  On Apr 26, 2023, at 6:44 PM, David Jacot  wrote:
> 
>  +1 Thanks, Luke!
> 
> > On Wed, Apr 26, 2023 at 12:37 PM, Luke Chen  wrote:
> >
> > Hi all,
> >
> > I'd like to volunteer as release manager for the Apache
> > Kafka 3.4.1 release.
> > If there are no objections, I'll start building a release plan in the
> > wiki in the next couple of days.
> >
> > Thank you.
> > Luke
> >
> >>
> >
>


[GitHub] [kafka-site] vcrfxia commented on pull request #507: MINOR: update docs note about spurious stream-stream join results

2023-04-27 Thread via GitHub


vcrfxia commented on PR #507:
URL: https://github.com/apache/kafka-site/pull/507#issuecomment-1526201667

   @mjsax can you review? Looks like I don't have permissions to add reviewers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] vcrfxia opened a new pull request, #507: MINOR: update docs note about spurious stream-stream join results

2023-04-27 Thread via GitHub


vcrfxia opened a new pull request, #507:
URL: https://github.com/apache/kafka-site/pull/507

   Make changes from https://github.com/apache/kafka/pull/13642 live.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-14948) Broker fails to rejoin cluster when cluster is in dual write mode

2023-04-27 Thread Proven Provenzano (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Proven Provenzano resolved KAFKA-14948.
---
Resolution: Cannot Reproduce

I updated my code to the tip of master and cannot reproduce it anymore.

> Broker fails to rejoin cluster when cluster is in dual write mode
> -
>
> Key: KAFKA-14948
> URL: https://issues.apache.org/jira/browse/KAFKA-14948
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.5.0
>Reporter: Proven Provenzano
>Assignee: Proven Provenzano
>Priority: Major
>
> While testing the migration dual-write mode I came across this issue.
> Initial setup: a single ZK node and a single broker. Create a topic with some 
> data.
> Create a single controller and initiate the migration.
> Update the broker to start the migration.
> Wait until all records are migrated. The cluster should be in dual-write mode 
> at this point.
> Kill and restart the broker. Sometimes the broker will not rejoin the cluster, 
> and consuming records from a topic will fail to find the topic.
> Restarting the controller will fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-04-27 Thread David Arthur
Hey Mickael,

I have one major ZK migration improvement (KAFKA-14805) that landed in
trunk this week that I'd like to merge to 3.5 (once we fix some test
failures it introduced). After that, I have another PR for KAFKA-14840,
which fixes a significant bug in the ZK migration logic and needs to
land in 3.5.

https://issues.apache.org/jira/browse/KAFKA-14805 (done)
https://issues.apache.org/jira/browse/KAFKA-14840 (nearly done)

I just wanted to check with you before cherry-picking these to 3.5.

David


On Mon, Apr 24, 2023 at 1:18 PM Mickael Maison 
wrote:

> Hi Justine,
>
> That makes sense. Feel free to revert that commit in 3.5.
>
> Thanks,
> Mickael
>
> On Mon, Apr 24, 2023 at 7:16 PM Mickael Maison 
> wrote:
> >
> > Hi Josep,
> >
> > Thanks for letting me know!
> >
> > On Mon, Apr 24, 2023 at 6:58 PM Justine Olshan
>  wrote:
> > >
> > > Hey Mickael,
> > >
> > > I've just opened a blocker to revert KAFKA-14561 in 3.5. There are a
> few
> > > blocker bugs that I don't think I can fix before the code freeze, so I
> > > think for the quality of the release, we should just revert the commit.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Apr 21, 2023 at 1:23 PM Josep Prat  >
> > > wrote:
> > >
> > > > Hi Mickael,
> > > >
> > > > Greg Harris managed to fix a flaky test in
> > > > https://github.com/apache/kafka/pull/13575, I cherry-picked it to
> the 3.5
> > > > (and 3.4) branch. I updated the Jira to reflect that is now fixed on
> 3.5.0
> > > > as well as 3.6.0.
> > > > Let me know if I forgot anything.
> > > >
> > > > Best,
> > > >
> > > > On Fri, Apr 21, 2023 at 3:44 PM Mickael Maison <
> mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Just a quick reminder that code freeze is next week.
> > > > > We still have 27 JIRAs targeting 3.5 [0] including quite a few bugs
> > > > > and flaky test issues opened recently. If you have time, take one
> of
> > > > > these items or help with the reviews.
> > > > >
> > > > > I'll send another update once we've entered code freeze.
> > > > >
> > > > > 0:
> > > > >
> > > >
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.5.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > > > >
> > > > > On Thu, Apr 20, 2023 at 9:14 PM Mickael Maison <
> mickael.mai...@gmail.com
> > > > >
> > > > > wrote:
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Yes feel free to merge that fix. Thanks for letting me know!
> > > > > >
> > > > > > Mickael
> > > > > >
> > > > > > On Thu, Apr 20, 2023 at 8:15 PM Ron Dagostino  >
> > > > wrote:
> > > > > > >
> > > > > > > Hi Mickael.  I would like to merge
> > > > > > > https://github.com/apache/kafka/pull/13532 (KAFKA-14887: No
> shutdown
> > > > > > > for ZK session expiration in feature processing) to the 3.5
> branch.
> > > > > > > It is a very small and focused fix that can cause unexpected
> broker
> > > > > > > shutdowns when there is instability in the connectivity to
> ZooKeeper.
> > > > > > > The risk is very low.
> > > > > > >
> > > > > > > Ron
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 18, 2023 at 9:57 AM Mickael Maison <
> > > > > mickael.mai...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi David,
> > > > > > > >
> > > > > > > > Thanks for the update. I've marked KAFKA-14869 as fixed in
> 3.5.0, I
> > > > > > > > guess you'll only resolve this ticket once you merge the
> backports
> > > > to
> > > > > > > > earlier branches. The ticket will have to be resolved to run
> the
> > > > > > > > release but that should leave you enough time.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Mickael
> > > > > > > >
> > > > > > > > On Tue, Apr 18, 2023 at 3:42 PM David Jacot
> > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Mickael,
> > > > > > > > >
> > > > > > > > > FYI - I just merged the two PRs for KIP-915 to trunk/3.5.
> We are
> > > > > all good.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > David
> > > > > > > > >
> > > > > > > > > On Mon, Apr 17, 2023 at 5:10 PM Mickael Maison <
> > > > > mickael.mai...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Chris,
> > > > > > > > > >
> > > > > > > > > > I was looking at that just now! As you said, the PRs
> merged
> > > > > provide
> > > > > > > > > > some functionality so I think it's fine to deliver the
> KIP
> > > > > across 2
> > > > > > > > > > releases.
> > > > > > > > > > I left a comment in
> > > > > https://issues.apache.org/jira/browse/KAFKA-14876
> > > > > > > > > > to document what's in 3.5.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Mickael
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 17, 2023 at 5:05 PM Chris Egerton
> > > > > 
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Mickael,
> > > > > > > 

[DISCUSS] Adding non-committers as Github collaborators

2023-04-27 Thread David Arthur
Hey folks,

I stumbled across this wiki page from the infra team that describes the
various features supported in the ".asf.yaml" file:
https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features

One section that looked particularly interesting was
https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub

github:
  collaborators:
    - userA
    - userB

This would allow us to define non-committers as collaborators on the Github
project. Concretely, this means they would receive the "triage" Github role
(defined here
https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role).
Practically, this means we could let non-committers do things like assign
labels and reviewers on Pull Requests.

I wanted to see what the committer group thought about this feature. I
think it could be useful.

Cheers,
David


Re: Adding reviewers with Github actions

2023-04-27 Thread David Arthur
I just merged the "reviewers" script I wrote a while ago:
https://github.com/apache/kafka/pull/11096

It works by finding previous occurrences of "Reviewers: ...", so it only
works for people who have reviewed something before. I do suspect this is
largely the common case.

E.g., searching for "Ismael" gives:

Possible matches (in order of most recent):
[1] Ismael Juma ism...@juma.me.uk (1514)
[2] Ismael Juma ij...@apache.org (3)
[3] Ismael Juma mli...@juma.me.uk (4)
[4] Ismael Juma ism...@confluent.io (19)
[5] Ismael Juma git...@juma.me.uk (7)

It shows them in order of most recent occurrence, along with the number of
occurrences. Now that it's merged, it should be easier for folks to try it
out.

Cheers,
David

On Thu, Apr 20, 2023 at 1:02 PM Justine Olshan 
wrote:

> I've tried the script, but it's not quite complete.
> I've had issues finding folks -- if they haven't reviewed in Kafka, we
> cannot find an email for them. I also had some issues with finding folks who
> had reviewed before.
>
> Right now, my strategy is to use GitHub to search previous commits for
> folks' emails, but that isn't the most optimal solution -- especially if
> the reviewer has no public email.
> I do think it is useful to have in the commit though, so if anyone has some
> ideas on how to improve, I'd be happy to hear.
>
> Justine
>
> On Wed, Apr 19, 2023 at 6:53 AM Ismael Juma  wrote:
>
> > It's a lot more convenient to have it in the commit than having to follow
> > links, etc.
> >
> > David Arthur also wrote a script to help with this step, I believe.
> >
> > Ismael
> >
> > On Tue, Apr 18, 2023, 9:29 AM Divij Vaidya 
> > wrote:
> >
> > > Do we even need a manual attribution for a reviewer in the commit
> > message?
> > > GitHub automatically marks the folks as "reviewers" who have used the
> > > "review-changes" button on the top left corner and left feedback.
> GitHub
> > > also has searchability for such reviews done by a particular person
> using
> > > the following link:
> > >
> > > https://github.com/search?q=is%3Apr+reviewed-by%3A<username>+repo%3Aapache%2Fkafka+repo%3Aapache%2Fkafka-site&type=issues
> > >
> > > (replace <username> with the GitHub username)
> > > (replace  with the GitHub username)
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Tue, Apr 18, 2023 at 4:09 PM Viktor Somogyi-Vass
> > >  wrote:
> > >
> > > > I'm not that familiar with Actions either, it just seemed like a tool
> > for
> > > > this purpose. :)
> > > > I did some digging, and what I have in mind is that on pull request
> > review
> > > > it can trigger a workflow:
> > > >
> > > >
> > >
> >
> https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_review
> > > >
> > > > We could in theory use Github CLI to edit the description of the PR
> > when
> > > > someone gives a review (or we could perhaps enable this to simply
> > comment
> > > > too):
> > > >
> > > >
> > >
> >
> https://docs.github.com/en/actions/using-workflows/using-github-cli-in-workflows
> > > >
> > > > So the action definition would look something like this below. Note
> > that
> > > > the "run" part is very basic, it's just here for the idea. We'll
> > probably
> > > > need a shell script instead of that line to format it better. But the
> > > point
> > > > is that it edits the PR and adds the reviewer:
> > > >
> > > > name: Add reviewer
> > > > on:
> > > >   issues:
> > > >     types:
> > > >       - pull_request_review
> > > > jobs:
> > > >   comment:
> > > >     runs-on: ubuntu-latest
> > > >     steps:
> > > >       - run: gh pr edit $PR_ID --title "$PR_TITLE" --body "$PR_BODY\n\nReviewers: $SENDER"
> > > >         env:
> > > >           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
> > > >           PR_ID: ${{ github.event.pull_request.id }}
> > > >           PR_TITLE: ${{ github.event.pull_request.title }}
> > > >           PR_BODY: ${{ github.event.pull_request.body }}
> > > >           SENDER: ${{ github.event.sender }}
> > > >
> > > > I'll take a look if I can try this out on my fork and get back if it
> > > > leads to anything.
> > > >
> > > > Viktor
> > > >
> > > > On Tue, Apr 18, 2023 at 10:12 AM Josep Prat
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > > Unless I miss something, wouldn't this GitHub action either amend
> the
> > > > > commit (breaking signature if any) or directly do the commit itself
> > > > > (meaning the action would be the one squashing and merging and not
> > the
> > > > > maintainer anymore)?
> > > > >
> > > > > Let me know if I'm missing something or if there are some nice
> hidden
> > > > > tricks in GitHub that I didn't know :)
> > > > >
> > > > > Best,
> > > > > On Tue, Apr 18, 2023 at 9:48 AM Viktor Somogyi-Vass
> > > > >  wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Unfortunately I forgot to add myself as a reviewer *again* on a PR
> > > > > > when merging. Shame on me.
> > > > > > However I was thinking about looking into Github actions whether
> we
> > > can
> > > > > > automate this process or at least prevent 

Re: KIP-919: Allow AdminClient to Talk Directly with the KRaft Controller Quorum

2023-04-27 Thread Colin McCabe
On Wed, Apr 26, 2023, at 22:08, Luke Chen wrote:
> Hi Colin,
>
> Some comments:
> 1. I agree we should set "top-level" errors for metadata response
>
> 2. In the "brokers" field of metadata response from controller, it'll
> respond with "Controller endpoint information as given in
> controller.quorum.voters", instead of the "alive" controllers(voters). That
> will break the existing admin client because in admin client, we'll rely on
> the metadata response to build the "current alive brokers" list, and choose
> one from them to connect (either least load or other criteria). That means,
> if now, we return the value in `controller.quorum.voters`, but one of them
> is down. We might choose it to connect and get connection errors. Should we
> return the "alive" controllers(voters) to client?

Hi Luke,

Good question. When talking to the controllers directly, the AdminClient needs 
to always send its RPCs to the active controller. There is one exception: 
configuring ephemeral log4j settings with incrementalAlterConfigs must be done 
by sending them to the specified controller node.

I will add this to a section called "AdminClient Implementation Notes" so that 
it's captured in the KIP.

>
> 3. In the KIP, we list the command-line tools that will get a new
> --bootstrap-controllers argument, but without explaining why these tools
> need to talk to the controller directly. Could we add some explanation about
> them? I tried, but I can't work out why some of the tools are listed here:
> - kafka-acls.sh -> Allow clients to update ACLs via controller before
> brokers up
>
> - kafka-cluster.sh -> Reasonable to get/update cluster info via
> controller
>
> - kafka-configs.sh -> Allow clients to dynamically update
> configs/describe configs from controller. But in this script, client can
> still set quota for users/clients/topics... is client also able to update
> via controllers? Or we only allow partial actions in the script to talk to
> controllers?
>
> - kafka-delegation-tokens.sh -> Reasonable to update delegation-tokens
> via controllers
>
> - kafka-features.sh -> Reasonable
> - kafka-metadata-quorum.sh -> Reasonable
> - kafka-metadata-shell.sh -> Reasonable
>
> - kafka-reassign-partitions.sh -> Why should we allow clients to move
> metadata log partitions in controller nodes? What's the use-case?
>

Yes, the common thread here is that all of these shell commands perform 
operations that can be done without the broker. So it's reasonable to allow 
them to be done without going through the broker. I don't know if we need a 
separate note for each, since the rationale is really the same for all (is it 
reasonable? If so, allow it).

kafka-reassign-partitions.sh cannot be used to move the __cluster_metadata 
topic. However, it can be used to move partitions that reside on the brokers, 
even when using --bootstrap-controllers to talk directly to the quorum.
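
For illustration, with the proposed flag, checking the quorum status directly 
against a controller might look like this (the endpoint is hypothetical, and 
the flag is as proposed in this KIP, so it could still change during the 
discussion):

  bin/kafka-metadata-quorum.sh --bootstrap-controllers localhost:9093 describe --status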

Colin

>
> Thank you.
> Luke
>
> On Thu, Apr 27, 2023 at 8:04 AM Colin McCabe  wrote:
>
>> On Tue, Apr 25, 2023, at 04:59, Divij Vaidya wrote:
>> > Thank you for the KIP Colin.
>> >
>> > In general, I like the idea of having the ability to interact directly
>> with
>> > the controllers. I agree with your observation that it helps in
>> situations
>> > where you would want to get data directly from the controller instead of
>> > going via a broker. I have some general comments but the main concern I
>> > have is with the piggy-backing of error code with response of
>> > __cluster_metadata topic.
>> >
>> > 1. With this change, how are we guarding against the possibility of
>> > misbehaving client traffic from disrupting the controller (that you
>> > mentioned as a motivation of earlier behaviour)? One solution could be to
>> > have default values set for request throttling on the controller.
>>
>> Hi Divij,
>>
>> Thanks for the comments.
>>
>> Our guards against client misbehavior remain the same:
>> 1. our recommendation to put the clients on a separate network
>> 2. the fact that producers and consumers can't interact directly with the
>> controller
>> 3. the authorizer.
>>
>> Re: #3, I will spell out in the KIP that clients must have DESCRIBE on the
>> CLUSTER resource to send a METADATA request to the controller.
>>
>> > 2. This KIP also increases the network attack surface area. Prior to this
>> > KIP, it was possible to have firewall rules setup for controllers such
>> that
>> > only the brokers can talk to it. But now, the controller would require
>> > access from other endpoints other than brokers as well. Can we add a
>> > suggestion to the upgrade documentation and inform users
>>
>> There is no requirement for access from other endpoints. It is still
>> possible to set up firewall rules such that only the brokers can talk to
>> the controller. In fact I would even say this is desirable. Since this
>> faculty is intended for infrequent manual administrative operations,
>> needing to log into the broker to use it seems perfectly fine.
>>
>> > 3. In 

[jira] [Resolved] (KAFKA-7735) StateChangeLogMerger tool can not work due to incorrect topic regular matches

2023-04-27 Thread Federico Valeri (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Federico Valeri resolved KAFKA-7735.

Resolution: Won't Fix

> StateChangeLogMerger tool can not work due to incorrect topic regular matches
> -
>
> Key: KAFKA-7735
> URL: https://issues.apache.org/jira/browse/KAFKA-7735
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Fangbin Sun
>Assignee: Federico Valeri
>Priority: Major
>
> When the StateChangeLogMerger tool tries to obtain a topic's state-change 
> log, it returns nothing.
> {code:java}
> bin/kafka-run-class.sh com.cmss.kafka.api.StateChangeLogMerger --logs 
> state-change.log --topic test{code}
> This tool uses a topic partition regex as follows:
> {code:java}
> val topicPartitionRegex = new Regex("\\[(" + Topic.LEGAL_CHARS + "+),( 
> )*([0-9]+)\\]"){code}
> However, the state-change log no longer prints entries in the above format. 
> E.g., in 0.10.2.0 it printed some state-change logs via the case class 
> TopicAndPartition, whose toString was overridden as follows:
> {code:java}
> override def toString = "[%s,%d]".format(topic, partition){code}
> In a newer version (e.g. 1.0.0+) it prints most of state-change logs in the 
> form of "partition $topic-$partition", as a workaround one can modify the 
> topic partition regex like:
> {code:java}
> val topicPartitionRegex = new Regex("(partition " + Topic.LEGAL_CHARS + 
> "+)-([0-9]+)"){code}
> and extract the topic with "matcher.group(1).substring(10)"; however, some 
> output of state changes might be a little bit redundant.
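> 
> For illustration, a minimal sketch in Java of the adjusted regex (with 
> Topic.LEGAL_CHARS approximated as [a-zA-Z0-9._-] and a made-up log line):
> {code:java}
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
> 
> public class TopicPartitionExtract {
>     public static void main(String[] args) {
>         String line = "state changed for partition test-3 from A to B";
>         Pattern p = Pattern.compile("(partition [a-zA-Z0-9._-]+)-([0-9]+)");
>         Matcher m = p.matcher(line);
>         if (m.find()) {
>             // "partition " is 10 characters, hence the substring(10) above
>             String topic = m.group(1).substring(10); // "test"
>             String partition = m.group(2);           // "3"
>             System.out.println(topic + "-" + partition);
>         }
>     }
> }
> {code}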



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: KIP-922: Add the traffic metric of the partition dimension

2023-04-27 Thread Edoardo Comar
Hi hudeqi,

thanks for the KIP.

For the purpose of monitoring whether partitions of a topic are used "fairly",
the log end offset metric offers a good hint.
Of course, it only expresses a message count, not bytes, but I find it
sufficient and do not need the actual throughput per partition.
Perhaps the KIP could detail a use case for which monitoring the different LEO
rates is not good enough.

Edoardo

On Sun, 23 Apr 2023 at 04:54, hudeqi <16120...@bjtu.edu.cn> wrote:

> Hi all,
>
> I have written a small new KIP to add traffic metrics for the partition
> dimension. The motivation is:
>
> Currently, there are two metrics for measuring traffic at the topic level:
> MessagesInPerSec and BytesInPerSec, but there are two problems:
> 1. It is difficult to see topic partition traffic skew intuitively through
> these metrics; it is impossible to tell clearly which partition has the
> largest traffic or what the traffic of each partition looks like. Metrics at
> the partition dimension can solve this.
> 2. For a sudden increase in topic traffic (which can, for example, cause some
> followers to fall out of the ISR, and can be avoided by appropriately
> increasing the number of partitions), partition-dimension metrics can help
> guide whether to expand the number of partitions.
>
> On the whole, I think it is very meaningful to add traffic metrics at the
> partition dimension, especially for the issue of traffic skew.
>
> Please take a look at the details here:
> https://cwiki.apache.org/confluence/x/LQs0Dw
>
> best,
>
> hudeqi


[jira] [Created] (KAFKA-14948) Broker fails to rejoin cluster when cluster is in dual write mode

2023-04-27 Thread Proven Provenzano (Jira)
Proven Provenzano created KAFKA-14948:
-

 Summary: Broker fails to rejoin cluster when cluster is in dual 
write mode
 Key: KAFKA-14948
 URL: https://issues.apache.org/jira/browse/KAFKA-14948
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.5.0
Reporter: Proven Provenzano
Assignee: Proven Provenzano


While testing the migration dual-write mode I came across this issue.

Initial setup: a single ZK node and a single broker. Create a topic with some 
data.

Create a single controller and initiate the migration.

Update the broker to start the migration.

Wait until all records are migrated. The cluster should be in dual-write mode at 
this point.

Kill and restart the broker. Sometimes the broker will not rejoin the cluster 
and consuming records from a topic will fail to find the topic.

Restarting the controller will fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14929) Flaky KafkaStatusBackingStoreFormatTest#putTopicStateRetriableFailure

2023-04-27 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-14929.
-
Resolution: Fixed

> Flaky KafkaStatusBackingStoreFormatTest#putTopicStateRetriableFailure
> -
>
> Key: KAFKA-14929
> URL: https://issues.apache.org/jira/browse/KAFKA-14929
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Sagar Rao
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.5.0
>
>
> This test recently started flaky-failing with the following stack trace:
> {noformat}
> org.mockito.exceptions.verification.TooFewActualInvocations: 
> kafkaBasedLog.send(, , );
> Wanted 2 times:->
>  at org.apache.kafka.connect.util.KafkaBasedLog.send(KafkaBasedLog.java:376)
> But was 1 time:->
>  at 
> org.apache.kafka.connect.storage.KafkaStatusBackingStore.sendTopicStatus(KafkaStatusBackingStore.java:315)
>   at 
> app//org.apache.kafka.connect.util.KafkaBasedLog.send(KafkaBasedLog.java:376)
>   at 
> app//org.apache.kafka.connect.storage.KafkaStatusBackingStoreFormatTest.putTopicStateRetriableFailure(KafkaStatusBackingStoreFormatTest.java:219)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>  Method)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
> ...{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Requesting permissions to contribute to Apache Kafka

2023-04-27 Thread Mickael Maison
Hi,

I granted you permissions.

Thanks,
Mickael

On Thu, Apr 27, 2023 at 3:23 PM aaron ai  wrote:
>
> Hello,
>
> I'm following
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-GettingStarted
> and have reached step 4.
>
> wiki ID: aaronai
> Jira ID: aaronai


Requesting permissions to contribute to Apache Kafka

2023-04-27 Thread aaron ai
Hello,

I'm following
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-GettingStarted
and have reached step 4.

wiki ID: aaronai
Jira ID: aaronai


Re: Requesting permissions to contribute to Apache Kafka

2023-04-27 Thread Luke Chen
Hi Keith,

Your accounts are all set.

Thanks.
Luke

On Thu, Apr 27, 2023 at 8:44 PM Keith Wall  wrote:

> Hello,
>
> I'm following
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-GettingStarted
> and have reached step 4.
>
> wiki ID: kwall
> Jira ID: kwall
>
> Keith Wall.
>


Requesting permissions to contribute to Apache Kafka

2023-04-27 Thread Keith Wall
Hello,

I'm following 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-GettingStarted
and have reached step 4.

wiki ID: kwall
Jira ID: kwall

Keith Wall.


Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-04-27 Thread Nick Telford
Hi everyone,

I find myself (again) considering removing the offset management from
StateStores, and keeping the old checkpoint file system. The reason is that
the StreamPartitionAssignor directly reads checkpoint files in order to
determine which instance has the most up-to-date copy of the local state.
If we move offsets into the StateStore itself, then we will need to open,
initialize, read offsets and then close each StateStore (that is not
already assigned and open) for which we have *any* local state, on every
rebalance.

Generally, I don't think there are many "orphan" stores like this sitting
around on most instances, but even a few would introduce additional latency
to an already somewhat lengthy rebalance procedure.

I'm leaning towards Colt's (Slack) suggestion of just keeping things in the
checkpoint file(s) for now, and not worrying about the race. The downside
is that we wouldn't be able to remove the explicit RocksDB flush on-commit,
which likely hurts performance.

If anyone has any thoughts or ideas on this subject, I would appreciate it!
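
As an aside, for anyone following along: the atomic checkpointing referenced
below boils down to writing the store data and the corresponding changelog
offset in one atomic RocksDB write batch, so the two can never diverge after a
crash. A minimal standalone sketch of that idea (using the RocksDB Java API
directly; the path, column family name, and values are illustrative, and this
is not the KIP's actual implementation):

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.DBOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class AtomicOffsetCommitDemo {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        List<ColumnFamilyDescriptor> descriptors = Arrays.asList(
            new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY), // store data
            new ColumnFamilyDescriptor("offsets".getBytes(StandardCharsets.UTF_8)));
        List<ColumnFamilyHandle> handles = new ArrayList<>();
        try (DBOptions options = new DBOptions()
                 .setCreateIfMissing(true)
                 .setCreateMissingColumnFamilies(true);
             RocksDB db = RocksDB.open(options, "/tmp/atomic-offsets-demo",
                                       descriptors, handles)) {
            try (WriteBatch batch = new WriteBatch();
                 WriteOptions writeOptions = new WriteOptions()) {
                // Store data and the changelog offset it corresponds to...
                batch.put(handles.get(0), "key".getBytes(StandardCharsets.UTF_8),
                          "value".getBytes(StandardCharsets.UTF_8));
                batch.put(handles.get(1), "changelog-0".getBytes(StandardCharsets.UTF_8),
                          "42".getBytes(StandardCharsets.UTF_8));
                // ...land together: after a crash, the on-disk offset always
                // matches the on-disk data.
                db.write(writeOptions, batch);
            }
        }
    }
}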

Regards,
Nick

On Wed, 19 Apr 2023 at 15:05, Nick Telford  wrote:

> Hi Colt,
>
> The issue is that if there's a crash between 2 and 3, then you still end
> up with inconsistent data in RocksDB. The only way to guarantee that your
> checkpoint offsets and locally stored data are consistent with each other
> are to atomically commit them, which can be achieved by having the offsets
> stored in RocksDB.
>
> The offsets column family is likely to be extremely small (one
> per-changelog partition + one per Topology input partition for regular
> stores, one per input partition for global stores). So the overhead will be
> minimal.
>
> A major benefit of doing this is that we can remove the explicit calls to
> db.flush(), which forcibly flushes memtables to disk on-commit. It turns
> out, RocksDB memtable flushes are largely dictated by Kafka Streams
> commits, *not* RocksDB configuration, which could be a major source of
> confusion. Atomic checkpointing makes it safe to remove these explicit
> flushes, because it no longer matters exactly when RocksDB flushes data to
> disk; since the data and corresponding checkpoint offsets will always be
> flushed together, the local store is always in a consistent state, and
> on-restart, it can always safely resume restoration from the on-disk
> offsets, restoring the small amount of data that hadn't been flushed when
> the app exited/crashed.
>
> Regards,
> Nick
>
> On Wed, 19 Apr 2023 at 14:35, Colt McNealy  wrote:
>
>> Nick,
>>
>> Thanks for your reply. Ack to A) and B).
>>
>> For item C), I see what you're referring to. Your proposed solution will
>> work, so no need to change it. What I was suggesting was that it might be
>> possible to achieve this with only one column family. So long as:
>>
>>- No uncommitted records (i.e. not committed to the changelog) are
>>*committed* to the state store, AND
>>- The Checkpoint offset (which refers to the changelog topic) is less
>>than or equal to the last written changelog offset in rocksdb
>>
>> I don't see the need to do the full restoration from scratch. My
>> understanding was that prior to 844/892, full restorations were required
>> because there could be uncommitted records written to RocksDB; however,
>> given your use of RocksDB transactions, that can be avoided with the
>> pattern of 1) commit Kafka transaction, 2) commit RocksDB transaction, 3)
>> update offset in checkpoint file.
>>
>> Anyways, your proposed solution works equivalently and I don't believe
>> there is much overhead to an additional column family in RocksDB. Perhaps
>> it may even perform better than making separate writes to the checkpoint
>> file.
>>
>> Colt McNealy
>> *Founder, LittleHorse.io*
>>
>>
>> On Wed, Apr 19, 2023 at 5:53 AM Nick Telford 
>> wrote:
>>
>> > Hi Colt,
>> >
>> > A. I've done my best to de-couple the StateStore stuff from the rest of
>> the
>> > Streams engine. The fact that there will be only one ongoing (write)
>> > transaction at a time is not guaranteed by any API, and is just a
>> > consequence of the way Streams operates. To that end, I tried to ensure
>> the
>> > documentation and guarantees provided by the new APIs are independent of
>> > this incidental behaviour. In practice, you're right, this essentially
>> > refers to "interactive queries", which are technically "read
>> transactions",
>> > even if they don't actually use the transaction API to isolate
>> themselves.
>> >
>> > B. Yes, although not ideal. This is for backwards compatibility,
>> because:
>> > 1) Existing custom StateStore implementations will implement
>> flush(),
>> > and not commit(), but the Streams engine now calls commit(), so those
>> calls
>> > need to be forwarded to flush() for these legacy stores.
>> > 2) Existing StateStore *users*, i.e. outside of the Streams engine
>> > itself, may depend on explicitly calling flush(), so for these cases,
>> > flush() needs to 

Re: Consumer offset value -Apache kafka 3.2.3

2023-04-27 Thread ziming deng
Hello,
It seems kafka.tools.ConsumerOffsetChecker was deprecated in 0.9.0. In newer 
versions, please use kafka-consumer-groups.sh instead.
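
For example, to inspect a group's committed offsets (a typical invocation; 
adjust the bootstrap server to your cluster):

./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group your-consumer-group

This prints the current offset, log end offset, and lag for each partition the 
group consumes.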

--
Best,
Ziming

> On Apr 26, 2023, at 18:21, Kafka Life  wrote:
> 
> Dear Kafka Experts
> 
> How can we check for a particular offset number in Apache Kafka version
> 3.2.3? Could you please shed some light on this?
> The kafka_console_consumer tool is throwing a class-not-found error.
> 
> ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>--topic your-topic
>--group your-consumer-group
>--zookeeper localhost:2181



[jira] [Created] (KAFKA-14947) Duplicate records are getting created in the topic.

2023-04-27 Thread krishnendu Das (Jira)
krishnendu Das created KAFKA-14947:
--

 Summary: Duplicate records are getting created in the topic. 
 Key: KAFKA-14947
 URL: https://issues.apache.org/jira/browse/KAFKA-14947
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 3.1.1
Reporter: krishnendu Das


We are using the Kafka Connect API (version 2.3.0) and Kafka (version 3.1.1) for 
data ingestion purposes. Previously we were using Kafka (version 2.6.2) and the 
same Kafka Connect API (version 2.3.0), and the data ingestion was happening 
properly.

Recently we updated the Kafka version from 2.6.2 to 3.1.1.

Post-update, we are facing duplicate data issues from the source connector into 
the Kafka topic. After debugging the 3.1.1 code, we saw that one new function, 
updateCommittableOffsets(), was added and called inside 
WorkerSourceTask::execute() as part of the bug fix "KAFKA-12226: Commit 
source task offsets without blocking on batch delivery (#11323)".

 

Now, because of this function, we are observing this scenario:

1. Inside execute(), at the start of the flow, the call goes to 
updateCommittableOffsets() to check whether there is anything to commit. As the 
first poll has not yet happened, this function doesn't find anything to commit.
2. Then the Kafka Connect API poll() method is called from 
WorkerSourceTask::execute(). -> 1st poll
3. The Kafka Connect API (using the sleepy policy) reads one source file from 
the cloud source directory.
4. It reads the whole content of the file and sends the result set to the Kafka 
server to write to the Kafka topic.
5. During the 2nd poll, updateCommittableOffsets() finds some offsets to commit 
and updates a reference variable, committableOffsets, which will be used 
further by the WorkerSourceTask::commitOffsets() function to perform the actual 
offset commit.
6. Then the Kafka Connect API poll() method is called from 
WorkerSourceTask::execute(). -> 2nd poll
7. The Kafka Connect API (using the sleepy policy) reads the same source file 
again from the start, as offsetStorageReader::offset() didn't give the latest 
offset.
8. It reads the whole content of the file and sends the result set to the Kafka 
server to write to the Kafka topic. ---> This creates duplicate data in the 
topic.
9. WorkerSourceTask::commitOffsets() commits the offset.
10. Then the Kafka Connect API poll() method is called from 
WorkerSourceTask::execute(). -> 3rd poll
11. This time offsetStorageReader::offset() is able to give the latest offset.
12. The Kafka Connect API (using the sleepy policy) reads the same source file 
from the last read position.
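
For illustration, the resume-from-offset pattern described above looks roughly 
like this inside a source task (the connector code is hypothetical; 
offsetStorageReader() is the actual Connect API, while the 
"filename"/"position" keys are made up):

{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.connect.source.SourceTaskContext;

public class ResumeFromOffsetExample {
    // Called from a SourceTask's poll() to decide where to resume reading.
    static long resumePosition(SourceTaskContext context) {
        Map<String, Object> sourcePartition =
            Collections.singletonMap("filename", "input.csv");
        Map<String, Object> lastOffset =
            context.offsetStorageReader().offset(sourcePartition);
        // If the framework has not yet committed the offsets from the previous
        // poll, lastOffset is stale and the task re-reads already-produced data.
        return (lastOffset == null) ? 0L : (Long) lastOffset.get("position");
    }
}
{code}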



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14946) Kafka Node Shutting Down Automatically

2023-04-27 Thread Akshay Kumar (Jira)
Akshay Kumar created KAFKA-14946:


 Summary: Kafka Node Shutting Down Automatically
 Key: KAFKA-14946
 URL: https://issues.apache.org/jira/browse/KAFKA-14946
 Project: Kafka
  Issue Type: Bug
  Components: controller, kraft
Affects Versions: 3.3.1
Reporter: Akshay Kumar


* We are using ZooKeeper-less Kafka (Kafka KRaft).
 * The cluster has 3 nodes.
 * One of the nodes shuts down automatically at random times.
 * We checked the logs but didn't find the exact reason.
 * Kafka version - 3.3.1
 * Attaching the log files.
 * Time - 2023-04-21 16:28:23

*state-change.log -*
[https://drive.google.com/file/d/1eS-ShKlhGPsIJoybHndlhahJnucU8RWA/view?usp=share_link]
 
*server.log -*
[https://drive.google.com/file/d/1Ov5wrQIqx2AS4J7ppFeHJaDySsfsK588/view?usp=share_link]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-921 OpenJDK CRaC support

2023-04-27 Thread Radim Vansa
Thank you for those questions. As I've mentioned, my knowledge of Kafka 
is quite limited, so these are the things that need careful thinking! 
Comments inline.


On 26. 04. 23 16:28, Mickael Maison wrote:

Hi Radim,

Thanks for the KIP! CRaC is an interesting project and it could be a
useful feature in Kafka clients.

The KIP is pretty vague in terms of the expected behavior of clients
when checkpointing and restoring. For example:

1. A consumer may have pre-fetched records in memory. When it is
checkpointed, its group will rebalance and another consumer will
consume the same records. When the initial consumer is restored, will
it process its pre-fetched records and basically reprocess record
already handled by other consumers?



How would the broker (?) know what records were really consumed? I think 
that there must be some form of Two Generals Problem.


The checkpoint should keep as much of the application untouched as 
possible. Here, I would expect that the prefetched records would be 
consumed after the restore operation as if nothing happened. I can 
imagine this could cause some trouble if the data is dependent on the 
'external' world, e.g. other members of the cluster, but I wouldn't 
break the general guarantees Kafka provides if we can avoid it. We 
certainly have the option to do the checkpoint more gracefully, 
deregistering from the group (the checkpoint is effectively blocked by 
the notification handler).


If we're talking about using CRaC for boot speedup this is not that 
important - when the app is about to be checkpointed it will likely stop 
processing data anyway. For other use-cases (e.g. live migration) it 
might matter.





2. Producers may have records in-flight or in the producer buffer when
they are checkpointed. How do you propose to handle these cases?



If there's something in flight, we can wait for the acks. Alternatively, 
if the receiver guards against double receipt using unique IDs/sequence 
numbers, we could resend after restore. As for the data in the 
buffer, IMO that can wait until restore.





3. Clients may have loaded plugins such as serializers. These plugins
may establish network connections too. How are these expected to
automatically reconnect when the application is restored?



If there's an independent pool of connections, it's up to the plugin 
author to support CRaC; I doubt there's anything the generic code 
could do. Also, it's likely that the plugins won't need any extension to 
the SPI; they would register their handlers independently (if ordering 
matters, there are ways to prioritize one resource over another).


Cheers!

Radim Vansa




Thanks,
Mickael


On Wed, Apr 26, 2023 at 8:27 AM Radim Vansa  wrote:

Hi all,

I haven't seen much reactions on this proposal. Is there any general
policy regarding dependencies, or a prior decision that would hint on this?

Thanks!

Radim


On 21. 04. 23 10:10, Radim Vansa wrote:

Caution: This email originated from outside of the organization. Do
not click links or open attachments unless you recognize the sender
and know the content is safe.


Thank you,

now to be tracked as KIP-921:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-921%3A+OpenJDK+CRaC+support


Radim

On 20. 04. 23 15:26, Josep Prat wrote:

Hi Radim,
You should have now permissions to create a KIP.

Best,

On Thu, Apr 20, 2023 at 2:22 PM Radim Vansa 
wrote:


Hello,

upon filing a PR [1] with some initial support for OpenJDK CRaC
[2][3] I
was directed here to raise a KIP (I don't have the permissions in
wiki/JIRA to create the KIP page yet, though).

In a nutshell, CRaC intends to provide a way to checkpoint (snapshot)
and persist a running Java application and later restore it,
possibly on
a different computer. This can be used to significantly speed up the
boot process (from seconds or minutes to tens of milliseconds), live
replication or migration of the heated up application. This is not
entirely transparent to the application; the application can register
for notification when this is happening, and sometimes has to assist with 
that to prevent unexpected state after restore - e.g. closing network 
connections and files.

CRaC is not yet integrated into the mainline JDK; a JEP is being prepared, 
and users are welcome to try out our builds. However, even when this gets 
into the JDK we can't expect users to jump onto the latest release 
immediately;
therefore we provide a facade package org.crac [4] that delegates to
the
implementation, if it is present in the running JDK, or provides a
no-op
implementation.

With or without the implementation, the support for CRaC in the
application should be designed to have a minimal impact on performance
(few extra objects, some volatile reads...). On the other hand the
checkpoint operation itself can be non-trivial in this matter.
Therefore
the main consideration should be about the maintenance costs -
keeping a
small JAR in dependencies and some extra code in networking and
persistence.

The support