Re: [REVIEW REQUEST] Move ReassignPartitionsCommandArgsTest to java

2023-08-28 Thread Николай Ижиков
Hello, Mickael.

Thanks for the reply and for reviewing the previous patches for this feature :).

Just to let everyone know - I've got one approval from a non-committer.
So, if any committer has a spare cycle, please join this simple review; it will 
help rewrite the tools in Java.

https://github.com/apache/kafka/pull/14217


> On 25 Aug 2023, at 13:25, Mickael Maison wrote:
> 
> Hi Nikolay,
> 
> Thanks for working on this feature. Keep in mind that many people
> are on PTO at the moment and there are a lot of PRs to review.
> Not sure why fixVersion was set to 3.6.0. Apart from key features or
> fixes, it's usually best to only set the fixVersion once the code has
> been merged. I've cleared that field.
> 
> Thanks,
> Mickael
> 
> On Fri, Aug 25, 2023 at 9:26 AM Николай Ижиков  wrote:
>> 
>> Hello, Luke
>> 
>> Thanks for the reply.
>> 
>> Actually, KAFKA-14595 [1] has fixVersion = 3.6.0
>> And the PR is part of the ticket.
>> 
>> Several reviews are required if we want to include the Java version of 
>> ReassignPartitionsCommand in 3.6.0.
>> 
>> [1] https://issues.apache.org/jira/browse/KAFKA-14595
>> 
>> 
>>> On 23 Aug 2023, at 10:07, Luke Chen wrote:
>>> 
>>> Hi,
>>> 
>>> Sorry, we're mostly working on features for v3.6.0, which is expected
>>> to be released in the coming weeks.
>>> I'll review your PR after the release. (Please ping me then if I forget!)
>>> 
>>> Also, it'd be good if devs in the community could help with PR reviews when
>>> available.
>>> That would help a lot.
>>> Besides, PR review is also a kind of contribution, not just code
>>> commits.
>>> 
>>> Thanks.
>>> Luke
>>> 
>>> 
>>> 
>>> On Tue, Aug 22, 2023 at 7:15 PM Николай Ижиков  wrote:
>>> 
 Hello.
 
 Please join the simple review :)
 We have a few steps left to completely rewrite ReassignPartitionsCommand in
 Java.
 
> On 17 Aug 2023, at 17:16, Николай Ижиков wrote:
> 
> Hello.
> 
> I’m working on [1].
> The goal of the ticket is to rewrite `ReassignPartitionsCommand` in Java.
> 
> The PR that moves the whole command is pretty big, so it makes sense to split
 it.
> I prepared the PR [2] that moves a single test
 (ReassignPartitionsCommandArgsTest) to Java.
> 
> It is relatively small and simple (touches only 3 files):
> 
> To review - https://github.com/apache/kafka/pull/14217
> Big PR  - https://github.com/apache/kafka/pull/13247
> 
> Please, review.
> 
> [1] https://issues.apache.org/jira/browse/KAFKA-14595
> [2] https://github.com/apache/kafka/pull/14217
 
 
>> 



Re: Need Access to create KIP & Jira Tickets

2023-08-28 Thread Josep Prat
Hi Raghu,
Thanks for your interest in Apache Kafka.
As Justine rightly points out, for the change of the Jira ID you should
file a Jira issue under the INFRA project [1]. If you end up creating
another account, I would also recommend creating a Jira issue under INFRA
to merge the 2 accounts.

Best,
[1]: https://issues.apache.org/jira/projects/INFRA/issues

On Mon, Aug 28, 2023 at 10:29 PM Justine Olshan
 wrote:

> Hey Raghu,
>
> I've added your ID to give you permissions to the wiki.
>
> I'm not sure if committers can change your Jira ID. You may want to try to
> create a new account or file a ticket with Apache for that.
>
> Let me know if there are any issues.
>
> Justine
>
> On Mon, Aug 28, 2023 at 11:54 AM Raghu Baddam  wrote:
>
> > Hi Team,
> >
> > Please find my wiki ID and Jira ID below, and help me by granting access to
> > create KIPs and Jira tickets in the Apache Kafka space.
> >
> > wiki ID: rbaddam
> > Jira ID: raghu98...@gmail.com
> >
> > Also, if possible, I need help changing my Jira ID to match my wiki ID,
> > i.e. *rbaddam*.
> >
> > Thanks,
> > Raghu
> >
>


-- 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


[jira] [Created] (KAFKA-15414) remote logs get deleted after partition reassignment

2023-08-28 Thread Luke Chen (Jira)
Luke Chen created KAFKA-15414:
-

 Summary: remote logs get deleted after partition reassignment
 Key: KAFKA-15414
 URL: https://issues.apache.org/jira/browse/KAFKA-15414
 Project: Kafka
  Issue Type: Bug
Reporter: Luke Chen
 Attachments: image-2023-08-29-11-12-58-875.png

It seems I'm hitting that code path when running reassignments on my cluster: 
segments are deleted from the remote store despite a huge retention (the topic 
was created a few hours ago with 1000h retention).
It seems to happen consistently on some partitions when reassigning, but not all 
partitions.

My test:

 * I have a test topic with 30 partitions configured with 1000h global retention 
and 2 minutes local retention.
 * I have a load tester producing to all partitions evenly.
 * I have a consumer load tester consuming that topic.
 * I regularly reset offsets to earliest on my consumer to test backfilling from 
tiered storage.

My consumer was catching up on the backlog, and I wanted to upscale my cluster 
to speed up recovery: I scaled the cluster from 3 to 12 brokers and reassigned 
my test topic across all available brokers to get an even leader/follower count 
per broker.

When I triggered the reassignment, the consumer lag dropped on some of my topic 
partitions:
!image-2023-08-29-11-12-58-875.png|width=800,height=79!

Later I tried to reassign my topic back to 3 brokers and the issue happened 
again.

Both times, my logs contained a bunch of lines like:

[RemoteLogManager=10005 partition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17] Deleted 
remote log segment RemoteLogSegmentId

{topicIdPartition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17, 
id=Mk0chBQrTyKETTawIulQog}

due to leader epoch cache truncation. Current earliest epoch: 
EpochEntry(epoch=14, startOffset=46776780), segmentEndOffset: 46437796 and 
segmentEpochs: [10]
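
For what it's worth, the numbers in that log line already show the mismatch; the 
toy snippet below (not the broker's actual deletion logic) only spells out the 
comparison the message implies:

{code:java}
// Toy check using the values from the log line above (not Kafka's actual
// deletion code): the segment's end offset lies entirely before the start
// offset of the current earliest epoch entry, so its epochs are no longer
// covered by the leader epoch cache and the segment is treated as deletable.
public class EpochTruncationCheck {
    public static void main(String[] args) {
        long earliestEpochStartOffset = 46776780L; // EpochEntry(epoch=14, ...)
        long segmentEndOffset = 46437796L;         // from the deleted segment
        System.out.println("deletable due to epoch truncation: "
            + (segmentEndOffset < earliestEpochStartOffset));
    }
}
{code}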

Looking at my S3 bucket, the segments from before my reassignment have indeed 
been deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Need Access to create KIP & Jira Tickets

2023-08-28 Thread Justine Olshan
Hey Raghu,

I've added your ID to give you permissions to the wiki.

I'm not sure if committers can change your Jira ID. You may want to try to
create a new account or file a ticket with Apache for that.

Let me know if there are any issues.

Justine

On Mon, Aug 28, 2023 at 11:54 AM Raghu Baddam  wrote:

> Hi Team,
>
> Please find my wiki ID and Jira ID below, and help me by granting access to
> create KIPs and Jira tickets in the Apache Kafka space.
>
> wiki ID: rbaddam
> Jira ID: raghu98...@gmail.com
>
> Also, if possible, I need help changing my Jira ID to match my wiki ID,
> i.e. *rbaddam*.
>
> Thanks,
> Raghu
>


Need Access to create KIP & Jira Tickets

2023-08-28 Thread Raghu Baddam
Hi Team,

Please find my wiki ID and Jira ID below, and help me by granting access to
create KIPs and Jira tickets in the Apache Kafka space.

wiki ID: rbaddam
Jira ID: raghu98...@gmail.com

Also, if possible, I need help changing my Jira ID to match my wiki ID,
i.e. *rbaddam*.

Thanks,
Raghu


Re: Disabling Test: org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()

2023-08-28 Thread Sagar
Hey Greg,

Ah ok, I wasn't aware a JIRA already existed for this. I did see
your attempt to fix it, but the test still seems to be failing.

Sagar.

On Mon, Aug 28, 2023 at 10:30 PM Greg Harris 
wrote:

> Hey Sagar,
>
> The JIRA for this flaky test is here:
> https://issues.apache.org/jira/browse/KAFKA-8115
>
> Rather than disabling the test, I think we should look into the cause
> of the flakiness.
>
> Thanks!
> Greg
>
> On Mon, Aug 28, 2023 at 2:49 AM Sagar  wrote:
> >
> > Hi All,
> >
> > Should we disable this test:
> >
> org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()?
> >
> > I just did a quick search of my mailbox for this test and it has been
> > failing for a while. I will go ahead and create a ticket for fixing it.
> >
> > Let me know if disabling it doesn't sound like a good idea.
> >
> > Thanks!
> > Sagar.
>


Re: Disabling Test: org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()

2023-08-28 Thread Greg Harris
Hey Sagar,

The JIRA for this flaky test is here:
https://issues.apache.org/jira/browse/KAFKA-8115

Rather than disabling the test, I think we should look into the cause
of the flakiness.

Thanks!
Greg

On Mon, Aug 28, 2023 at 2:49 AM Sagar  wrote:
>
> Hi All,
>
> Should we disable this test:
> org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()?
>
> I just did a quick search of my mailbox for this test and it has been
> failing for a while. I will go ahead and create a ticket for fixing it.
>
> Let me know if disabling it doesn't sound like a good idea.
>
> Thanks!
> Sagar.


[jira] [Created] (KAFKA-15413) kafka-server-stop fails with COLUMNS environment variable on Ubuntu

2023-08-28 Thread Takashi Sakai (Jira)
Takashi Sakai created KAFKA-15413:
-

 Summary: kafka-server-stop fails with COLUMNS environment variable 
on Ubuntu
 Key: KAFKA-15413
 URL: https://issues.apache.org/jira/browse/KAFKA-15413
 Project: Kafka
  Issue Type: Bug
  Components: tools
 Environment: kafka: 3.5.1
Java: openjdk version "20.0.1" 2023-04-18
OS: Ubuntu 22.04.3 LTS on WSL2/Windows 11
Reporter: Takashi Sakai


The {{kafka-server-stop}} script does not work if the environment variable 
{{COLUMNS}} is set on Ubuntu.

*Steps to reproduce*:
kafka/zookeeper.properties
{noformat}
dataDir=/tmp/kafka-test-20230828-15217-1lop1tk/zookeeper
clientPort=34461
maxClientCnxns=0
admin.enableServer=false
{noformat}
kafka/server.properties
{noformat}
broker.id=0
listeners=PLAINTEXT://:46161
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-test-20230828-15217-1lop1tk/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.retention.check.interval.ms=30
zookeeper.connect=localhost:34461
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
{noformat}
{noformat}
$ zookeeper-server-start kafka/zookeeper.properties >/dev/null 2>&1 &
[1] 18593
$ kafka-server-start kafka/server.properties >/dev/null 2>&1 &
[2] 18982
$ COLUMNS=10 kafka-server-stop # This is unexpected
No kafka server to stop
$ kafka-server-stop
$ zookeeper-server-stop
[2]+  Exit 143                kafka-server-start kafka/server.properties
$ 
[1]+  Exit 143                zookeeper-server-start kafka/zookeeper.properties 
{noformat}
In the third command, I set the {{COLUMNS}} environment variable. It caused the 
{{kafka-server-stop}} script to fail to find the Kafka process.

*Cause*

The {{kafka-server-stop}} script uses {{ps ax}} to find the Kafka process.
{noformat}
OSNAME=$(uname -s)
if [[ "$OSNAME" == "OS/390" ]]; then
(snip)
elif [[ "$OSNAME" == "OS400" ]]; then
(snip)
else
PIDS=$(ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk 
'{print $1}')
fi
{noformat}
On Ubuntu, {{ps ax}} truncates its output if the environment variable {{COLUMNS}} 
is set.

(The [source code of ps|https://gitlab.com/procps-ng/procps/-/blob/675246119df143a5f8ced6e3313edac6ccc3e222/src/ps/global.c#L226-L230] 
shows that the COLUMNS environment variable overrides the result of {{isatty}}.)
{noformat}
$ ps ax | cat
  19912 pts/0Sl 0:03 
/home/linuxbrew/.linuxbrew/opt/openjdk/libexec/bin/java -Xmx1G -Xms1G -server 
-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 
-XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true 
-Xlog:gc*:file=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs/kafkaServer-gc.log:time,tags:filecount=10,filesize=100M
 -Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Dkafka.logs.dir=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs
 
-Dlog4j.configuration=file:/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../config/log4j.properties
 -cp 
/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/activation-1.1.1.jar:(snip):/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/zstd-jni-1.5.5-1.jar
 kafka.Kafka kafka/server.properties
$ COLUMNS=10 ps ax | cat
  19912 pts/0Sl 0:05 /home/linux
{noformat}
I tested this on WSL2 on Windows with openjdk installed via Homebrew, but it 
should occur in any environment with {{procps-ng}}.

*Problem*

This caused a CI failure in the Homebrew project 
(GitHub/Homebrew/homebrew-core#133887).

Homebrew's behavior of passing the {{COLUMNS}} environment variable seems like a 
bug. But the {{kafka-server-stop}} script is not expected to be affected by such 
an environment variable, so this also seems like a bug to me.

*Related issues*

This problem, KAFKA-4931, and KAFKA-4110 could all be fixed by introducing a 
PID file. But the three problems have different causes and can be considered 
separately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15412) Reading an unknown version of quorum-state-file should trigger an error

2023-08-28 Thread John Mannooparambil (Jira)
John Mannooparambil created KAFKA-15412:
---

 Summary: Reading an unknown version of quorum-state-file should 
trigger an error
 Key: KAFKA-15412
 URL: https://issues.apache.org/jira/browse/KAFKA-15412
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: John Mannooparambil


Reading an unknown version of the quorum-state file should trigger an error. 
Currently the only known version is 0; reading any other version should cause 
an error.
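
As a rough illustration of the desired behavior (not the actual KRaft code), the 
check would simply fail fast on anything other than the known version:

{code:java}
// Minimal sketch, assuming version 0 is the only supported quorum-state file
// version: any other value should raise an error instead of being ignored.
public class QuorumStateVersionCheck {
    private static final int SUPPORTED_VERSION = 0;

    public static void validate(int version) {
        if (version != SUPPORTED_VERSION) {
            throw new IllegalStateException("Unknown quorum-state file version "
                + version + "; only version " + SUPPORTED_VERSION + " is supported");
        }
    }

    public static void main(String[] args) {
        validate(0); // ok
        validate(1); // throws IllegalStateException
    }
}
{code}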



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15411) DelegationTokenEndToEndAuthorizationWithOwnerTest is Flaky

2023-08-28 Thread Proven Provenzano (Jira)
Proven Provenzano created KAFKA-15411:
-

 Summary: DelegationTokenEndToEndAuthorizationWithOwnerTest is 
Flaky 
 Key: KAFKA-15411
 URL: https://issues.apache.org/jira/browse/KAFKA-15411
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Proven Provenzano
Assignee: Proven Provenzano
 Fix For: 3.6.0


DelegationTokenEndToEndAuthorizationWithOwnerTest has become flaky since the 
merge of delegation token support for KRaft.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15410) Add basic functionality integration test with tiered storage

2023-08-28 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15410:


 Summary: Add basic functionality integration test with tiered 
storage
 Key: KAFKA-15410
 URL: https://issues.apache.org/jira/browse/KAFKA-15410
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Add the below basic functionality integration tests with tiered storage:
 # PartitionsExpandTest
 # DeleteTopicWithSecondaryStorageTest
 # DeleteSegmentsByRetentionSizeTest
 # DeleteSegmentsByRetentionTimeTest
 # DeleteSegmentsDueToLogStartOffsetBreachTest
 # EnableRemoteLogOnTopicTest
 # ListOffsetsTest
 # ReassignReplicaExpandTest
 # ReassignReplicaMoveTest
 # ReassignReplicaShrinkTest and
 # TransactionsTestWithTieredStore



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2147

2023-08-28 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 304755 lines...]

Gradle Test Run :streams:test > Gradle Test Executor 84 > 
RocksDBMetricsRecorderTest > 
shouldCorrectlyHandleAvgRecordingsWithZeroSumAndCount() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 84 > 
RocksDBMetricsRecorderTest > 
shouldThrowIfStatisticsToAddIsNullButExistingStatisticsAreNotNull() STARTED

Gradle Test Run :streams:test > Gradle Test Executor 84 > 
RocksDBMetricsRecorderTest > 
shouldThrowIfStatisticsToAddIsNullButExistingStatisticsAreNotNull() PASSED

Gradle Test Run :streams:test > Gradle Test Executor 84 > 
RocksDBMetricsRecorderTest > shouldNotAddItselfToRecordingTriggerWhenNotEmpty() 
STARTED

Gradle Test Run :streams:test > Gradle Test Executor 84 > 
RocksDBMetricsRecorderTest > shouldNotAddItselfToRecordingTriggerWhenNotEmpty() 
PASSED
streams-2: SMOKE-TEST-CLIENT-CLOSED
streams-2: SMOKE-TEST-CLIENT-CLOSED
streams-0: SMOKE-TEST-CLIENT-CLOSED
streams-5: SMOKE-TEST-CLIENT-CLOSED
streams-3: SMOKE-TEST-CLIENT-CLOSED
streams-4: SMOKE-TEST-CLIENT-CLOSED
streams-5: SMOKE-TEST-CLIENT-CLOSED
streams-6: SMOKE-TEST-CLIENT-CLOSED
streams-1: SMOKE-TEST-CLIENT-CLOSED
streams-1: SMOKE-TEST-CLIENT-CLOSED
streams-3: SMOKE-TEST-CLIENT-CLOSED
streams-4: SMOKE-TEST-CLIENT-CLOSED
streams-0: SMOKE-TEST-CLIENT-CLOSED

> Task :core:test

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testZooKeeperSessionStateMetric() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testExceptionInBeforeInitializingSession() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testExceptionInBeforeInitializingSession() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetChildrenExistingZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetChildrenExistingZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnection() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnection() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testZNodeChangeHandlerForCreation() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testZNodeChangeHandlerForCreation() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetAclExistingZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetAclExistingZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testSessionExpiryDuringClose() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testSessionExpiryDuringClose() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testReinitializeAfterAuthFailure() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testReinitializeAfterAuthFailure() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testSetAclNonExistentZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testSetAclNonExistentZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnectionLossRequestTermination() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnectionLossRequestTermination() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testExistsNonExistentZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testExistsNonExistentZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetDataNonExistentZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetDataNonExistentZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnectionTimeout() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testConnectionTimeout() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testBlockOnRequestCompletionFromStateChangeHandler() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testBlockOnRequestCompletionFromStateChangeHandler() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testUnresolvableConnectString() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testUnresolvableConnectString() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > ZooKeeperClientTest > 
testGetChildrenNonExistent

[jira] [Created] (KAFKA-15409) Distinguishing controller configs from broker configs in KRaft mode

2023-08-28 Thread Luke Chen (Jira)
Luke Chen created KAFKA-15409:
-

 Summary: Distinguishing controller configs from broker configs in 
KRaft mode
 Key: KAFKA-15409
 URL: https://issues.apache.org/jira/browse/KAFKA-15409
 Project: Kafka
  Issue Type: Improvement
  Components: kraft
Reporter: Luke Chen
Assignee: Luke Chen


In the doc, we categorize the configs by component. Currently, we have:

{code:java}

3. Configuration
3.1 Broker Configs
3.2 Topic Configs
3.3 Producer Configs
3.4 Consumer Configs
3.5 Kafka Connect Configs
Source Connector Configs
Sink Connector Configs 
3.6 Kafka Streams Configs
3.7 AdminClient Configs
3.8 System Properties 
{code}

Currently, the `3.1 Broker Configs` section contains:
1. controller-only configs
2. broker-only configs
3. configs that apply to both the controller and the broker

We should have a way to let users know which configs are for the controller, 
which are for the broker, and which are for both.


Created a 
[wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263427911]
 to list the configs for controller/broker.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: FYI - CI failures due to Apache Infra (Issue with creating launcher for agent)

2023-08-28 Thread Luke Chen
Thanks for the info, Divij!

Luke

On Mon, Aug 28, 2023 at 6:01 PM Divij Vaidya 
wrote:

> Hey folks
>
> During your CI runs, you may notice that some test pipelines fail to
> start with messages such as:
>
> "ERROR: Issue with creating launcher for agent builds38. The agent is
> being disconnected"
> "Remote call on builds38 failed"
>
> This occurs due to bad hosts in the Apache infrastructure CI. We have
> an ongoing ticket here -
>
> https://issues.apache.org/jira/browse/INFRA-24927?focusedCommentId=17759528&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17759528
>
> I will keep an eye on the ticket and reply to this thread when it is
> fixed. Meanwhile, the workaround is to restart the tests.
>
> Cheers!
>
> --
> Divij Vaidya
>


FYI - CI failures due to Apache Infra (Issue with creating launcher for agent)

2023-08-28 Thread Divij Vaidya
Hey folks

During your CI runs, you may notice that some test pipelines fail to
start with messages such as:

"ERROR: Issue with creating launcher for agent builds38. The agent is
being disconnected"
"Remote call on builds38 failed"

This occurs due to bad hosts in the Apache infrastructure CI. We have
an ongoing ticket here -
https://issues.apache.org/jira/browse/INFRA-24927?focusedCommentId=17759528&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17759528

I will keep an eye on the ticket and reply to this thread when it is
fixed. Meanwhile, the workaround is to restart the tests.

Cheers!

--
Divij Vaidya


Disabling Test: org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()

2023-08-28 Thread Sagar
Hi All,

Should we disable this test:
org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()?

I just did a quick search of my mailbox for this test and it has been
failing for a while. I will go ahead and create a ticket for fixing it.

Let me know if disabling it doesn't sound like a good idea.

Thanks!
Sagar.


[jira] [Created] (KAFKA-15408) Restart failed tasks in Kafka Connect up to a configurable max-tries

2023-08-28 Thread Patrick Pang (Jira)
Patrick Pang created KAFKA-15408:


 Summary: Restart failed tasks in Kafka Connect up to a 
configurable max-tries
 Key: KAFKA-15408
 URL: https://issues.apache.org/jira/browse/KAFKA-15408
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Patrick Pang


h2. Issue

Currently, Kafka Connect just reports failed tasks on the REST API, along with 
the error. Users are expected to monitor the status and restart individual 
connectors if there are transient errors. Unfortunately these are common for 
database connectors, e.g. transient connection errors, DNS flips, database 
downtime, etc. Kafka Connect silently failing in these scenarios leads to stale 
data downstream.
h2. Proposal

Kafka Connect should be able to restart failed tasks automatically, up to a 
configurable max-tries.
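
For context, the manual workaround today is the existing Connect REST restart 
endpoint; below is a rough sketch of automating it (the host and connector name 
are placeholders, and the onlyFailed/includeTasks parameters come from KIP-745, 
so they may not be available on older Connect versions):

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the manual workaround: ask the Connect REST API to restart a
// connector's failed tasks. The host and connector name are placeholders.
public class RestartFailedTasks {
    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8083/connectors/my-connector/restart"
                   + "?includeTasks=true&onlyFailed=true";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
{code}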



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-972: Add the metric of the current running version of kafka

2023-08-28 Thread Kamal Chandraprakash
Hi Hudeqi,

Kafka already emits the version metric. Can you check whether the below
metric satisfies your requirement?

kafka.server:type=app-info,id=0
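
If it helps, here's a small probe (not part of Kafka) for inspecting that MBean
from within a broker's JVM; the broker id 0 is just an example, and attribute
names can vary between versions, so it lists them instead of assuming any
particular one:

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Lists the attributes exposed by the app-info MBean of the local JVM.
// The broker id ("0") is an example; adjust it to your broker's id.
public class AppInfoProbe {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("kafka.server:type=app-info,id=0");
        for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
            if (attr.isReadable()) {
                System.out.println(attr.getName() + " = "
                    + server.getAttribute(name, attr.getName()));
            }
        }
    }
}
{code}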

--
Kamal

On Mon, Aug 28, 2023 at 2:29 PM hudeqi <16120...@bjtu.edu.cn> wrote:

> Hi all, I want to submit a minor KIP to add a metric that exposes the
> running Kafka server version; the wiki url is here
>
> Motivation
>
> At present, there is no way to tell from metrics which Kafka version a
> broker is running. If multiple Kafka versions are deployed in a cluster
> for various reasons, it is difficult to get an intuitive picture of the
> version distribution.
>
> So, I want to add a Kafka version metric indicating the version of the
> currently running Kafka server; it can help us see the mix of versions in
> a cluster and track the progress of version upgrades in real time.
>
> Proposed Changes
>
> When instantiating KafkaServer/BrokerServer, register a `KafkaVersion` gauge
> metric whose value is obtained from `VersionInfo.getVersion`, and remove all
> related metrics when KafkaServer/BrokerServer shuts down.
>
>
>
>
> best,
>
> hudeqi
>
>
>
>
>
>


[VOTE] KIP-965: Support disaster recovery between clusters by MirrorMaker

2023-08-28 Thread hudeqi
Hi all, this is a vote thread for KIP-965. Thanks.

best,
hudeqi


> -Original Message-
> From: hudeqi <16120...@bjtu.edu.cn>
> Sent: 2023-08-17 18:03:49 (Thursday)
> To: dev@kafka.apache.org
> Cc: 
> Subject: Re: [DISCUSSION] KIP-965: Support disaster recovery between clusters 
by MirrorMaker
> 


[DISCUSS] KIP-972: Add the metric of the current running version of kafka

2023-08-28 Thread hudeqi
Hi all, I want to submit a minor KIP to add a metric that exposes the running 
Kafka server version; the wiki url is here

Motivation

At present, there is no way to tell from metrics which Kafka version a broker is 
running. If multiple Kafka versions are deployed in a cluster for various 
reasons, it is difficult to get an intuitive picture of the version 
distribution.

So, I want to add a Kafka version metric indicating the version of the currently 
running Kafka server; it can help us see the mix of versions in a cluster and 
track the progress of version upgrades in real time.

Proposed Changes

When instantiating KafkaServer/BrokerServer, register a `KafkaVersion` gauge 
metric whose value is obtained from `VersionInfo.getVersion`, and remove all 
related metrics when KafkaServer/BrokerServer shuts down.
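
To make the shape of the change concrete, here is a rough sketch using only 
plain JMX; the class name, object name, and registration path are illustrative 
assumptions, and the actual implementation would go through the broker's 
existing metrics layer and `VersionInfo.getVersion`:

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative sketch only: expose the running version as a read-only JMX
// attribute so it can be scraped like any other gauge, and unregister it on
// shutdown. The object name below is an assumption, not the KIP's final name.
public class KafkaVersionGauge implements KafkaVersionGauge.VersionMXBean {

    /** Management interface; the MXBean suffix lets JMX accept it as-is. */
    public interface VersionMXBean {
        String getVersion();
    }

    private static final String OBJECT_NAME = "kafka.server:type=KafkaVersion";
    private final String version;

    public KafkaVersionGauge(String version) {
        this.version = version;
    }

    @Override
    public String getVersion() {
        return version;
    }

    /** Register the gauge when the server starts. */
    public static void register(String version) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new KafkaVersionGauge(version), new ObjectName(OBJECT_NAME));
    }

    /** Remove the gauge when the server shuts down. */
    public static void unregister() throws Exception {
        ManagementFactory.getPlatformMBeanServer()
            .unregisterMBean(new ObjectName(OBJECT_NAME));
    }
}
{code}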




best,

hudeqi







[jira] [Resolved] (KAFKA-15294) Make remote storage related configs as public (i.e. non-internal)

2023-08-28 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-15294.
--
Resolution: Fixed

> Make remote storage related configs as public (i.e. non-internal)
> -
>
> Key: KAFKA-15294
> URL: https://issues.apache.org/jira/browse/KAFKA-15294
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Gantigmaa Selenge
>Priority: Blocker
> Fix For: 3.6.0
>
>
> We should publish all the remote storage related configs in v3.6.0. It can be 
> verified by:
>  
> {code:java}
> ./gradlew releaseTarGz
> # The build output is stored in 
> ./core/build/distributions/kafka_2.13-3.x.x-site-docs.tgz. Untar the file and 
> verify it.{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-942: Add Power(ppc64le) support

2023-08-28 Thread Divij Vaidya
Hey Colin

I suggested running the tests on every merge to trunk because, on average, we
have 5-6 commits merged per day, as noted in the discuss thread
https://lists.apache.org/thread/4mfq46fc7nnsr96odqxxhcxyv24d8zn0.
Running this test suite 5 or 6 times a day won't be a burden on the CI
infrastructure. The advantage is that, unlike nightly builds,
which have a chance of being ignored, branch builds are actively
monitored by folks in the community. Hence, we will be able to add
this new suite without adding a new maintenance routine.

--
Divij Vaidya

On Fri, Aug 25, 2023 at 6:49 PM Colin McCabe  wrote:
>
> Thank you for continuing to work on this.
>
> One comment. When we discussed this in the DISCUSS thread, we all wanted to 
> run it nightly in branch builder (or possibly weekly). But looking at the 
> KIP, it doesn't seem to have been updated with the results of these 
> discussions.
>
> best,
> Colin
>
>
> On Mon, Aug 21, 2023, at 01:37, Mickael Maison wrote:
> > +1 (binding)
> > Thanks for the KIP!
> >
> > Mickael
> >
> > On Mon, Aug 14, 2023 at 1:40 PM Divij Vaidya  
> > wrote:
> >>
> >> +1 (binding)
> >>
> >> --
> >> Divij Vaidya
> >>
> >>
> >> On Wed, Jul 26, 2023 at 9:04 AM Vaibhav Nazare
> >>  wrote:
> >> >
> >> > I'd like to call a vote on KIP-942


Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-08-28 Thread Sagar
Hey Yash,

Thanks for your further comments. Here are my responses:

1) Deleting offsets via updateOffsets.

Hmm, I am not sure this is really necessary to be part of the KIP at this
point, and we can always add it later on if needed. I say this for the
following reasons:


   - The size of the offsets topic can be controlled by setting appropriate
   topic retention values, which is a standard practice in Kafka. Sure, it's
   not always possible to get the right values, but as I said it is a standard
   practice. For Connect specifically, there is also a KIP (KIP-943)
   which is trying to solve the problem of a large connect-offsets topic. So,
   if that is really the motivation, then it is being addressed separately
   anyway.
   - Deleting offsets is not something that should be done very frequently,
   and it should be handled with care. That is why KIP-875's mechanism of having
   users/cluster admins do this externally is the right thing to do. Agreed,
   this involves some toil, but it's not something that should be done on a
   very regular basis.
   - There is nothing stopping connector implementations from sending tombstone
   records as offsets, but in practice how many connectors actually do it?
   Maybe 1 or 2, from what we discussed.
   - The use cases you highlighted are edge cases at best. As I have been
   saying, if it is needed we can always add it in the future, but it doesn't
   look like a problem we need to solve upfront.

For these reasons, I don't think this is a point we need to stress so much.
The offsets topic's purging/cleanup can be handled either via standard Kafka
techniques (point #1 above) or via Connect runtime techniques (point #2 above).
IMO the problem we are trying to solve via this KIP has so far been solved by
connectors using techniques that carry a higher maintenance cost or a higher
cognitive load (i.e. a separate topic), and that is what needs to be addressed
upfront. And since you yourself termed it a nice-to-have feature, we can leave
it at that and take it up as future work. Hope that's ok with you and the other
community members.

2) Purpose of offsets parameter in updateOffsets

The main purpose is to give the task visibility into which partitions are
getting their offsets committed. A task won't necessarily choose to update
offsets every time it sees that a given source partition is missing from the
offsets about to be committed. Maybe it chooses to wait for some X iterations or
X amount of time and sends out an updated offset for a partition only when such
thresholds are breached. Even here we could argue that since the task is the one
sending the partitions/offsets it can do the tracking on its own, but IMO that
is too much work given that the information is already available via the offsets
to be committed.

Thanks!
Sagar.


[jira] [Created] (KAFKA-15407) Not able to connect to kafka from the Private NLB from outside the VPC account

2023-08-28 Thread Shivakumar (Jira)
Shivakumar created KAFKA-15407:
--

 Summary: Not able to connect to kafka from the Private NLB from 
outside the VPC account 
 Key: KAFKA-15407
 URL: https://issues.apache.org/jira/browse/KAFKA-15407
 Project: Kafka
  Issue Type: Bug
  Components: clients, connect, consumer, producer , protocol
 Environment: Staging, PROD
Reporter: Shivakumar
 Attachments: image-2023-08-28-12-37-33-100.png

!image-2023-08-28-12-37-33-100.png|width=768,height=223!

Problem statement:
We are trying to connect to Kafka from another account/VPC.
Our Kafka runs in an EKS cluster, and we have a service pointing to these pods 
for connections.

We tried to create a PrivateLink endpoint from Account B to connect to our NLB, 
which in turn connects to our Kafka in Account A.
We see connection resets from both the client and the target (Kafka) in the NLB 
monitoring tab of AWS.
We tried various combinations of listeners and advertised listeners, which did 
not help.

We assume we are missing some combination of listener and network-level configs 
with which this connection can be made.
Can you please guide us, as we are blocked on a major migration?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)