Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1846

2023-05-12 Thread Apache Jenkins Server
See 




Re: Query regarding implementation of KStreams with HBase

2023-05-12 Thread Matthias J. Sax
Kafka Streams is designed to read from and write to a Kafka broker cluster. It's
not designed to write data to a different system like HBase.


If you want to get data from Kafka to HBase, you should use Kafka Connect.

Of course, it's possible (but not recommended) to implement your own 
`Processor` and do whatever you want with the data inside Kafka Streams.
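For illustration only, here is a minimal Java sketch of that (not recommended)
route: a terminal `Processor` that writes every record to HBase via the standard
HBase client API. The table name, column family, and String key/value types are
assumptions for the example.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// Sketch only: a terminal processor that writes each record to HBase.
public class HBaseSinkProcessor implements Processor<String, String, Void, Void> {

    private Connection connection;
    private Table table;

    @Override
    public void init(ProcessorContext<Void, Void> context) {
        try {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            table = connection.getTable(TableName.valueOf("my-table")); // assumed table
        } catch (IOException e) {
            throw new RuntimeException("Failed to open HBase connection", e);
        }
    }

    @Override
    public void process(Record<String, String> record) {
        // Row key = Kafka record key; one cell per record (assumed schema).
        Put put = new Put(Bytes.toBytes(record.key()));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"),
                Bytes.toBytes(record.value()));
        try {
            table.put(put);
        } catch (IOException e) {
            throw new RuntimeException("HBase write failed", e);
        }
    }

    @Override
    public void close() {
        try {
            if (table != null) table.close();
            if (connection != null) connection.close();
        } catch (IOException e) {
            // best-effort cleanup
        }
    }
}

Wired up with e.g. builder.stream("input-topic").process(HBaseSinkProcessor::new).
Note this gives you none of the batching, retries, or delivery guarantees that a
proper Kafka Connect HBase sink connector would handle for you.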


HTH.

-Matthias

On 5/11/23 10:38 AM, Rohit M wrote:

Hi team,

There is a lot on the internet about KStreams reading data from a topic, performing
some transformation, and writing it back to a topic. But I wonder if writing to an
HBase table is possible with KStreams? What I mean by this is that we should
read data from a topic using KStreams, perform some operations like we do on
a dataframe, and then write it to an HBase table.
I didn't find any resources on the internet implementing KStreams with HBase. I
would be glad if I could get some help with a piece of code, in Scala
preferably, to read from a topic (or even an HBase table) using a KStreams
application, perform some transformation, and write it to an HBase table.


Regards
Rohit M



Query regarding implementation of KStreams with HBase

2023-05-12 Thread Rohit M
Hi team,

There is a lot on the internet about KStreams reading data from a topic, performing
some transformation, and writing it back to a topic. But I wonder if writing to an
HBase table is possible with KStreams? What I mean by this is that we should
read data from a topic using KStreams, perform some operations like we do on
a dataframe, and then write it to an HBase table.
I didn't find any resources on the internet implementing KStreams with HBase. I
would be glad if I could get some help with a piece of code, in Scala
preferably, to read from a topic (or even an HBase table) using a KStreams
application, perform some transformation, and write it to an HBase table.


Regards
Rohit M


Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread Luke Chen
Hi Kirk,

Yes, the pressure on the brokers comes from the message format down-conversion.

Luke


Kirk True wrote on Sat, May 13, 2023 at 1:30 AM:

> Hi David,
>
> For my own edification, when you refer to this change possibly putting
> "more pressure on the brokers," is that from the downconversion of the
> message format, specifically, or something else?
>
> Thanks,
> Kirk
>
> On Fri, May 12, 2023, at 1:59 AM, Luke Chen wrote:
> > Hi David,
> >
> > I know what you mean.
> > Let's hear others' thoughts about it. :)
> >
> > Luke
> >
> > On Fri, May 12, 2023 at 4:53 PM David Jacot  >
> > wrote:
> >
> > > Thanks, Luke.
> > >
> > > > But if the producers and consumers all exist in the same organization,
> > > > upgrading producers/consumers for the org's cost savings should be a
> > > > reasonable motivation.
> > >
> > > Yeah, that works in this case. However, Kafka is often used as a service
> > > (on premise or in the cloud) nowadays, and in this case the
> > > producers/consumers versions are completely out of our control, thus my
> > > concern.
> > >
> > > BR,
> > > David
> > >
> > > On Fri, May 12, 2023 at 10:47 AM Luke Chen  wrote:
> > >
> > > > Hi David,
> > > >
> > > > Yes, you're right. I've bumped the version of the record batch and
> > > > described that the down-conversion will happen, like what we do for
> > > > message format v1 today when old consumers consume records.
> > > >
> > > > > Overall, I wonder if the bandwidth saving is worth this change
> given
> > > that
> > > > it will put more pressure on the brokers.
> > > > Actually, I'm not 100% sure, so I'd also like to hear what the
> > > > community thinks about it.
> > > > But if the producers and consumers all exist in the same organization,
> > > > upgrading producers/consumers for the org's cost savings should be a
> > > > reasonable motivation.
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > >
> > > > On Fri, May 12, 2023 at 3:43 PM David Jacot
>  > > >
> > > > wrote:
> > > >
> > > > > Hi Luke,
> > > > >
> > > > > Thanks for the KIP.
> > > > >
> > > > > What do we do in the case where a batch is written with
> > > > > `ignoreMessageAttributes` set to 1, which means that messages won't
> > > have
> > > > > the `attributes`, and is consumed by a consumer which does not
> > > understand
> > > > > this new format? I suppose that we would need to introduce a new
> > > version
> > > > > for the message format (v3) and that we will have to downconvert
> > > records
> > > > > from the new format version to v2 in this case. This is not clear
> in
> > > the
> > > > > KIP. Could you elaborate a bit more on this? Overall, I wonder if
> the
> > > > > bandwidth saving is worth this change given that it will put more
> > > > pressure
> > > > > on the brokers.
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On Fri, May 12, 2023 at 9:04 AM Luke Chen 
> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start a discussion for KIP-931: Flag to ignore unused
> > > > > > message attribute field. This KIP is to add a flag in the batch
> > > > > > header to indicate whether messages inside the batch have the
> > > > > > attribute field or not, to reduce the message size and thus save
> > > > > > network traffic and storage size (and money, of course).
> > > > > >
> > > > > > Please check the link for more detail:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
> > > > > >
> > > > > > Any feedback is welcome.
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > >
> > > >
> > >
> >
>
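For context on the size saving being debated: in the v2 record format every
record carries a one-byte attributes field that is currently unused, so the
proposed flag saves roughly one byte per record. A back-of-the-envelope sketch
in Java (the workload number is invented):

// Rough arithmetic only; 10B records/day is an assumed workload.
public class AttributeByteSavings {
    public static void main(String[] args) {
        long recordsPerDay = 10_000_000_000L;
        long bytesSavedPerDay = recordsPerDay; // ~1 byte per record in v2
        System.out.printf("~%.1f GiB/day saved%n",
                bytesSavedPerDay / (1024.0 * 1024 * 1024));
    }
}

Whether that saving justifies a format version bump plus broker-side
down-conversion for old consumers is exactly the trade-off discussed above.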


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1845

2023-05-12 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-14996) CreateTopic fails with UnknownServerException if num partitions >= QuorumController.MAX_RECORDS_PER_BATCH

2023-05-12 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-14996:
-

 Summary: CreateTopic fails with UnknownServerException if num 
partitions >= QuorumController.MAX_RECORDS_PER_BATCH 
 Key: KAFKA-14996
 URL: https://issues.apache.org/jira/browse/KAFKA-14996
 Project: Kafka
  Issue Type: Bug
  Components: controller
Reporter: Edoardo Comar


If an attempt is made to create a topic with

num partitions >= QuorumController.MAX_RECORDS_PER_BATCH (10000)

the client receives an UnknownServerException - it could instead receive a
more descriptive error.
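A minimal reproduction sketch with the Java admin client (the broker address and
topic name are illustrative):

{code:java}
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed KRaft cluster
        try (Admin admin = Admin.create(props)) {
            // One topic record plus one record per partition exceeds the
            // controller's maxRecordsPerBatch, so this fails with
            // UnknownServerException instead of a descriptive error.
            admin.createTopics(List.of(new NewTopic("many-partitions", 10_000, (short) 1)))
                 .all()
                 .get();
        }
    }
}
{code}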

The controller logs

[2023-05-12 19:25:10,018] WARN [QuorumController id=1] createTopics: failed
with unknown server exception IllegalStateException at epoch 2 in 21956 us.
Renouncing leadership and reverting to the last committed offset 174.
(org.apache.kafka.controller.QuorumController)
java.lang.IllegalStateException: Attempted to atomically commit 10001
records, but maxRecordsPerBatch is 10000
    at org.apache.kafka.controller.QuorumController.appendRecords(QuorumController.java:812)
    at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.run(QuorumController.java:719)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:829)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Unable to access JIRA

2023-05-12 Thread Ramakrishnan Subramanian
Hi Luke,
  I did a self-service submission for a Kafka contributor account. Please check
it and guide me on the next steps.

Thanks & Regards
Ram

On Fri, May 12, 2023 at 7:01 AM Luke Chen  wrote:

> Hi Ram,
>
> Please submit an account creation request and we'll approve it.
> https://selfserve.apache.org/jira-account.html
>
> Thanks.
> Luke
>
> On Thu, May 11, 2023 at 9:50 PM Ramakrishnan Subramanian <
> ramkris.subraman...@gmail.com> wrote:
>
> > Hi Team,
> >
> > I am unable to access JIRA. Also, I would like to contribute to Kafka.
> > Please help me on this.
> >
> > JIRA ID ramkris.subramanian
> > mail id : ramkris.subraman...@gmail.com
> >
> >
> > Thanks & Regards
> > Ram
> >
>


Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread Kirk True
Hi David,

For my own edification, when you refer to this change possibly putting "more 
pressure on the brokers," is that from the downconversion of the message 
format, specifically, or something else?

Thanks,
Kirk

On Fri, May 12, 2023, at 1:59 AM, Luke Chen wrote:
> Hi David,
> 
> I know what you mean.
> Let's hear others' thoughts about it. :)
> 
> Luke
> 
> On Fri, May 12, 2023 at 4:53 PM David Jacot 
> wrote:
> 
> > Thanks, Luke.
> >
> > > But if the producers and consumers all exist in the same organization,
> > > upgrading producers/consumers for the org's cost savings should be a
> > > reasonable motivation.
> >
> > Yeah, that works in this case. However, Kafka is often used as a service
> > (on premise or in the cloud) nowadays, and in this case the producers/consumers
> > versions are completely out of our control, thus my concern.
> >
> > BR,
> > David
> >
> > On Fri, May 12, 2023 at 10:47 AM Luke Chen  wrote:
> >
> > > Hi David,
> > >
> > > Yes, you're right. I've bumped the version of the record batch and described
> > > that the down-conversion will happen, like what we do for message format v1
> > > today when old consumers consume records.
> > >
> > > > Overall, I wonder if the bandwidth saving is worth this change given
> > that
> > > it will put more pressure on the brokers.
> > > Actually, I'm not 100% sure, so I'd also like to hear what the community
> > > thinks about it.
> > > But if the producers and consumers all exist in the same organization,
> > > upgrading producers/consumers for the org's cost savings should be a
> > > reasonable motivation.
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > On Fri, May 12, 2023 at 3:43 PM David Jacot  > >
> > > wrote:
> > >
> > > > Hi Luke,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > What do we do in the case where a batch is written with
> > > > `ignoreMessageAttributes` set to 1, which means that messages won't
> > have
> > > > the `attributes`, and is consumed by a consumer which does not
> > understand
> > > > this new format? I suppose that we would need to introduce a new
> > version
> > > > for the message format (v3) and that we will have to downconvert
> > records
> > > > from the new format version to v2 in this case. This is not clear in
> > the
> > > > KIP. Could you elaborate a bit more on this? Overall, I wonder if the
> > > > bandwidth saving is worth this change given that it will put more
> > > pressure
> > > > on the brokers.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Fri, May 12, 2023 at 9:04 AM Luke Chen  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion for KIP-931: Flag to ignore unused
> > > > > message attribute field. This KIP is to add a flag in the batch header
> > > > > to indicate whether messages inside the batch have the attribute field
> > > > > or not, to reduce the message size and thus save network traffic and
> > > > > storage size (and money, of course).
> > > > >
> > > > > Please check the link for more detail:
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
> > > > >
> > > > > Any feedback is welcome.
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > >
> > >
> >
> 


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-12 Thread Yash Mayya
Hey folks,

Thanks for driving this initiative! I think the ability to assign reviewers
/ apply labels to PRs and re-trigger Jenkins builds is really useful and
will also allow us to help out the community a bit more.

Thanks,
Yash

On Fri, May 12, 2023 at 9:24 PM John Roesler  wrote:

> Thanks again for bringing this up, David!
>
> As an update to the community, the PMC has approved a process to make use
> of this feature.
>
> Here are the relevant updates:
>
> PR to add the policy: https://github.com/apache/kafka-site/pull/510
>
> PR to update the list: https://github.com/apache/kafka/pull/13713
>
> Ticket to automate this process. Contributions welcome :)
> https://issues.apache.org/jira/browse/KAFKA-14995
>
> And to make sure it doesn't fall through the cracks in the mean time,
> here's the release process step:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList
>
> Unfortunately, the "collaborator" feature only allows 20 usernames, so we
> have decided to simply take the top 20 non-committer authors from the past
> year (according to git shortlog). Congratulations to our new collaborators!
>
> Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya,
> Philip Nee, vamossagar12, Christo Lolov, Federico Valeri, andymg3,
> RivenSun, Kirk True, Matthew de Detrich, Akhilesh C, Alyssa Huang, Artem
> Livshits, Gantigmaa Selenge, Hao Li, Niket, and hudeqi
>
> Thanks,
> -John
>
> On 2023/04/27 18:45:09 David Arthur wrote:
> > Hey folks,
> >
> > I stumbled across this wiki page from the infra team that describes the
> > various features supported in the ".asf.yaml" file:
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> >
> > One section that looked particularly interesting was
> >
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> >
> > github:
> >   collaborators:
> > - userA
> > - userB
> >
> > This would allow us to define non-committers as collaborators on the
> Github
> > project. Concretely, this means they would receive the "triage" Github
> role
> > (defined here
> >
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role
> ).
> > Practically, this means we could let non-committers do things like assign
> > labels and reviewers on Pull Requests.
> >
> > I wanted to see what the committer group thought about this feature. I
> > think it could be useful.
> >
> > Cheers,
> > David
> >
>


[GitHub] [kafka-site] jlprat commented on a diff in pull request #510: MINOR: Add collaborator policy

2023-05-12 Thread via GitHub


jlprat commented on code in PR #510:
URL: https://github.com/apache/kafka-site/pull/510#discussion_r1192560960


##
contributing.html:
##
@@ -78,6 +78,20 @@ Becoming a Committer
    Demonstrated good understanding and exercised good technical judgement on
    at least one component of the codebase (e.g. core, clients, connect,
    streams, tests) from contribution activities in the above mentioned areas.

+   Collaborators
+
+   The Apache build infrastructure has provided two roles to make project
+   management easier. These roles allow non-committers to perform some
+   administrative actions like triaging issues or triggering builds. They are:

Review Comment:
   I think it's meant to say triaging pull requests, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] Adding non-committers as Github collaborators

2023-05-12 Thread John Roesler
Thanks again for bringing this up, David!

As an update to the community, the PMC has approved a process to make use of 
this feature.

Here are the relevant updates:

PR to add the policy: https://github.com/apache/kafka-site/pull/510

PR to update the list: https://github.com/apache/kafka/pull/13713

Ticket to automate this process. Contributions welcome :) 
https://issues.apache.org/jira/browse/KAFKA-14995

And to make sure it doesn't fall through the cracks in the mean time, here's 
the release process step: 
https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-UpdatetheCollaboratorsList

Unfortunately, the "collaborator" feature only allows 20 usernames, so we have 
decided to simply take the top 20 non-committer authors from the past year 
(according to git shortlog). Congratulations to our new collaborators!

Victoria Xia, Greg Harris, Divij Vaidya, Lucas Brutschy, Yash Mayya, Philip 
Nee, vamossagar12, Christo Lolov, Federico Valeri, andymg3, RivenSun, Kirk 
True, Matthew de Detrich, Akhilesh C, Alyssa Huang, Artem Livshits, Gantigmaa 
Selenge, Hao Li, Niket, and hudeqi

Thanks,
-John

On 2023/04/27 18:45:09 David Arthur wrote:
> Hey folks,
> 
> I stumbled across this wiki page from the infra team that describes the
> various features supported in the ".asf.yaml" file:
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features
> 
> One section that looked particularly interesting was
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub
> 
> github:
>   collaborators:
> - userA
> - userB
> 
> This would allow us to define non-committers as collaborators on the Github
> project. Concretely, this means they would receive the "triage" Github role
> (defined here
> https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role).
> Practically, this means we could let non-committers do things like assign
> labels and reviewers on Pull Requests.
> 
> I wanted to see what the committer group thought about this feature. I
> think it could be useful.
> 
> Cheers,
> David
> 


[jira] [Created] (KAFKA-14995) Automate asf.yaml collaborators refresh

2023-05-12 Thread John Roesler (Jira)
John Roesler created KAFKA-14995:


 Summary: Automate asf.yaml collaborators refresh
 Key: KAFKA-14995
 URL: https://issues.apache.org/jira/browse/KAFKA-14995
 Project: Kafka
  Issue Type: Improvement
Reporter: John Roesler


We have added a policy to use the asf.yaml Github Collaborators: 
[https://github.com/apache/kafka-site/pull/510]

The policy states that we set this list to be the top 20 commit authors who are 
not Kafka committers. Unfortunately, it's not trivial to compute this list.

Here is the process I followed to generate the list the first time (note that I
generated this list on 2023-04-28, so the lookback is one year):

1. List authors by commit volume in the last year:
{code}
$ git shortlog --email --numbered --summary --since=2022-04-28 | vim
{code}
2. manually filter out the authors who are committers, based on 
[https://kafka.apache.org/committers]

3. truncate the list to 20 authors

4. for each author

4a. Find a commit in the `git log` that they were the author on:
{code}
commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
Author: hudeqi <1217150...@qq.com>
Date:   Fri May 12 14:03:17 2023 +0800
...
{code}
4b. Look up that commit in Github: 
[https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8]

4c. Copy their Github username into .asf.yaml under both the PR whitelist and 
the Collaborators lists.

5. Send a PR to update .asf.yaml: [https://github.com/apache/kafka/pull/13713]

 

This is pretty time consuming and is very scriptable. Two complications:
 * To do the filtering, we need to map from the Git log "Author" to a documented
Kafka "Committer" that we can use to perform the filter. Suggestion: just
update the structure of the "Committers" page to include each committer's Git
"Author" name and email
(https://github.com/apache/kafka-site/blob/asf-site/committers.html)
 * To generate the YAML lists, we need to map from Git log "Author" to Github 
username. There's presumably some way to do this in the Github REST API (the 
mapping is based on the email, IIUC), or we could also just update the 
Committers page to also document each committer's Github username.

 

Ideally, we would write this script (to be stored in the Apache Kafka repo) and 
create a Github Action to run it every three months.
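As a starting point for that script, here is a hedged Java sketch of step 4b:
resolving a commit SHA to a GitHub username through the commit resource of the
GitHub REST API, whose top-level "author" object carries the "login" field. The
regex extraction is deliberately crude; a real script should use a JSON parser
and an auth token to avoid rate limiting.

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitToGithubLogin {
    public static void main(String[] args) throws Exception {
        String sha = "440bed2391338dc10fe4d36ab17dc104b61b85e8"; // example from above
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://api.github.com/repos/apache/kafka/commits/" + sha))
            .header("Accept", "application/vnd.github+json")
            .build();
        String body = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
        // Crude: the author's "login" is the first one in the payload because
        // the nested commit.author object only carries name/email/date.
        Matcher m = Pattern.compile("\"login\"\\s*:\\s*\"([^\"]+)\"").matcher(body);
        System.out.println(m.find() ? "GitHub user: " + m.group(1) : "no login found");
    }
}
{code}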

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [kafka-site] vvcephei commented on pull request #510: MINOR: Add collaborator policy

2023-05-12 Thread via GitHub


vvcephei commented on PR #510:
URL: https://github.com/apache/kafka-site/pull/510#issuecomment-1545918143

   cf https://github.com/apache/kafka/pull/13713


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] vvcephei commented on pull request #510: MINOR: Add collaborator policy

2023-05-12 Thread via GitHub


vvcephei commented on PR #510:
URL: https://github.com/apache/kafka-site/pull/510#issuecomment-1545905343

   For reference, this topic was initially proposed by @mumrah : 
https://lists.apache.org/thread/93lb6jhkjkmb9op9629xt6c6olwym28c


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (KAFKA-14994) jose4j is vulnerable to CVE- Improper Cryptographic Algorithm

2023-05-12 Thread Gaurav Jetly (Jira)
Gaurav Jetly created KAFKA-14994:


 Summary:  jose4j is vulnerable to CVE- Improper Cryptographic 
Algorithm
 Key: KAFKA-14994
 URL: https://issues.apache.org/jira/browse/KAFKA-14994
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Gaurav Jetly


jose4j has the following vulnerability with a high score of 7.1.
jose4j is vulnerable to an Improper Cryptographic Algorithm issue. The vulnerability
exists due to the way `RSA1_5` and `RSA_OAEP` are implemented, allowing an
attacker to decrypt `RSA1_5`- or `RSA_OAEP`-encrypted ciphertexts; in
addition, it may be feasible to sign with affected keys.

Please help upgrade the library to the latest version.
Current version in use: 0.7.9
Latest version with the fix: 0.9.3
CVE-
- Improper Cryptographic Algorithm
- Severity: HIGH
- CVSS: 7.1
- Disclosure Date: 07 Feb 2023 19:00 EST
- Vulnerability Info: 
https://sca.analysiscenter.veracode.com/vulnerability-database/vulnerabilities/40398



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1844

2023-05-12 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.4 #128

2023-05-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 526293 lines...]
[2023-05-12T12:25:16.963Z] 
[2023-05-12T12:25:16.963Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testJuteMaxBufffer() STARTED
[2023-05-12T12:25:16.963Z] 
[2023-05-12T12:25:16.963Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testJuteMaxBufffer() PASSED
[2023-05-12T12:25:16.963Z] 
[2023-05-12T12:25:16.963Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testCreateTokenChangeNotification() STARTED
[2023-05-12T12:25:17.936Z] 
[2023-05-12T12:25:17.936Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testCreateTokenChangeNotification() PASSED
[2023-05-12T12:25:17.936Z] 
[2023-05-12T12:25:17.936Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testGetTopicsAndPartitions() STARTED
[2023-05-12T12:25:17.936Z] 
[2023-05-12T12:25:17.936Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testGetTopicsAndPartitions() PASSED
[2023-05-12T12:25:17.936Z] 
[2023-05-12T12:25:17.936Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testChroot(boolean) > 
kafka.zk.KafkaZkClientTest.testChroot(boolean)[1] STARTED
[2023-05-12T12:25:18.888Z] 
[2023-05-12T12:25:18.888Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testChroot(boolean) > 
kafka.zk.KafkaZkClientTest.testChroot(boolean)[1] PASSED
[2023-05-12T12:25:18.888Z] 
[2023-05-12T12:25:18.888Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testChroot(boolean) > 
kafka.zk.KafkaZkClientTest.testChroot(boolean)[2] STARTED
[2023-05-12T12:25:18.888Z] 
[2023-05-12T12:25:18.888Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testChroot(boolean) > 
kafka.zk.KafkaZkClientTest.testChroot(boolean)[2] PASSED
[2023-05-12T12:25:18.888Z] 
[2023-05-12T12:25:18.888Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testRegisterBrokerInfo() STARTED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testRegisterBrokerInfo() PASSED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testRetryRegisterBrokerInfo() STARTED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testRetryRegisterBrokerInfo() PASSED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testConsumerOffsetPath() STARTED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testConsumerOffsetPath() PASSED
[2023-05-12T12:25:19.942Z] 
[2023-05-12T12:25:19.942Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() STARTED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() PASSED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testTopicAssignments() STARTED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testTopicAssignments() PASSED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testControllerManagementMethods() STARTED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testControllerManagementMethods() PASSED
[2023-05-12T12:25:20.895Z] 
[2023-05-12T12:25:20.895Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testTopicAssignmentMethods() STARTED
[2023-05-12T12:25:21.849Z] 
[2023-05-12T12:25:21.849Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testTopicAssignmentMethods() PASSED
[2023-05-12T12:25:21.849Z] 
[2023-05-12T12:25:21.849Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZkClientTest > testConnectionViaNettyClient() STARTED
[2023-05-12T12:25:23.223Z] 
[2023-05-12T12:25:23.223Z] Gradle Test Run :core:integrationTest > Gradle Test 
Executor 167 > KafkaZk

[jira] [Created] (KAFKA-14993) Improve TransactionIndex instance handling while copying to and fetching from RSM.

2023-05-12 Thread Satish Duggana (Jira)
Satish Duggana created KAFKA-14993:
--

 Summary: Improve TransactionIndex instance handling while copying 
to and fetching from RSM.
 Key: KAFKA-14993
 URL: https://issues.apache.org/jira/browse/KAFKA-14993
 Project: Kafka
  Issue Type: Sub-task
  Components: core
Reporter: Satish Duggana
Assignee: Kamal Chandraprakash






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1843

2023-05-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 474203 lines...]
[2023-05-12T09:56:35.584Z] 
[2023-05-12T09:56:35.584Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers STARTED
[2023-05-12T09:56:56.902Z] 
[2023-05-12T09:56:56.902Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers PASSED
[2023-05-12T09:56:56.902Z] 
[2023-05-12T09:56:56.902Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount STARTED
[2023-05-12T09:57:13.094Z] 
[2023-05-12T09:57:13.094Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount PASSED
[2023-05-12T09:57:13.094Z] 
[2023-05-12T09:57:13.094Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient STARTED
[2023-05-12T09:57:14.046Z] 
[2023-05-12T09:57:14.046Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient PASSED
[2023-05-12T09:57:14.046Z] 
[2023-05-12T09:57:14.046Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys STARTED
[2023-05-12T09:57:21.668Z] 
[2023-05-12T09:57:21.668Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys PASSED
[2023-05-12T09:57:21.668Z] 
[2023-05-12T09:57:21.668Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThreadsPerClient STARTED
[2023-05-12T09:57:22.721Z] 
[2023-05-12T09:57:22.722Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThreadsPerClient PASSED
[2023-05-12T09:57:22.722Z] 
[2023-05-12T09:57:22.722Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyThreadsPerClient STARTED
[2023-05-12T09:57:24.825Z] 
[2023-05-12T09:57:24.825Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyThreadsPerClient PASSED
[2023-05-12T09:57:24.825Z] 
[2023-05-12T09:57:24.825Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargePartitionCount STARTED
[2023-05-12T09:57:54.415Z] 
[2023-05-12T09:57:54.415Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargePartitionCount PASSED
[2023-05-12T09:57:54.415Z] 
[2023-05-12T09:57:54.415Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargePartitionCount STARTED
[2023-05-12T09:58:17.174Z] 
[2023-05-12T09:58:17.174Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargePartitionCount PASSED
[2023-05-12T09:58:17.174Z] 
[2023-05-12T09:58:17.174Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys STARTED
[2023-05-12T09:58:25.136Z] 
[2023-05-12T09:58:25.136Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys PASSED
[2023-05-12T09:58:25.136Z] 
[2023-05-12T09:58:25.136Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys STARTED
[2023-05-12T09:58:49.724Z] 
[2023-05-12T09:58:49.724Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys PASSED
[2023-05-12T09:58:49.724Z] 
[2023-05-12T09:58:49.724Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers STARTED
[2023-05-12T09:58:49.724Z] 
[2023-05-12T09:58:49.724Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers PASSED
[2023-05-12T09:58:49.724Z] 
[2023-05-12T09:58:49.724Z] Gradle Test Run :streams:integrationTest > Gradle 
Test Executor 179 > StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargeNumConsumers STARTED
[2023-05

Re: [DISCUSS] KIP-910: Update Source offsets for Source Connectors without producing records

2023-05-12 Thread Sagar
Hi All,

Thanks for the comments/reviews. I have updated the KIP
https://cwiki.apache.org/confluence/display/KAFKA/KIP-910%3A+Update+Source+offsets+for+Source+Connectors+without+producing+records
with a newer approach which shelves the need for an explicit topic.

Please review again and let me know what you think.

Thanks!
Sagar.


On Mon, Apr 24, 2023 at 3:35 PM Yash Mayya  wrote:

> Hi Sagar,
>
> Thanks for the KIP! I have a few questions and comments:
>
> 1) I agree with Chris' point about the separation of a connector heartbeat
> mechanism and allowing source connectors to generate offsets without
> producing data. What is the purpose of the heartbeat topic here and are
> there any concrete use cases for downstream consumers on this topic? Why
> can't we instead simply introduce a mechanism to retrieve a list of source
> partition / source offset pairs from the source tasks?
>
> 2) With the currently described mechanism, the new
> "SourceTask::produceHeartbeatRecords" method returns a "List"
> - what happens with the topic in each of these source records? Chris
> pointed this out above, but it doesn't seem to have been addressed? The
> "SourceRecord" class also has a bunch of other fields which will be
> irrelevant here (partition, key / value schema, key / value data,
> timestamp, headers). In fact, it seems like only the source partition and
> source offset are relevant here, so we should either introduce a new
> abstraction or simply use a data structure like a mapping from source
> partitions to source offsets (adds to the above point)?
>
> 3) I'm not sure I fully follow why the heartbeat timer / interval is
> needed? What are the downsides of
> calling "SourceTask::produceHeartbeatRecords" in every execution loop
> (similar to the existing "SourceTask::poll" method)? Is this only to
> prevent the generation of a lot of offset records? Since Connect's offsets
> topics are log compacted (and source partitions are used as keys for each
> source offset), I'm not sure if such concerns are valid and such a
> heartbeat timer / interval mechanism is required?
>
> 4) The first couple of rejected alternatives state that the use of a null
> topic / key / value are preferably avoided - but the current proposal would
> also likely require connectors to use such workarounds (null topic when the
> heartbeat topic is configured at a worker level and always for the key /
> value)?
>
> 5) The third rejected alternative talks about subclassing the
> "SourceRecord" class - this presumably means allowing connectors to pass
> special offset only records via the existing poll mechanism? Why was this
> considered a more invasive option? Was it because of the backward
> compatibility issues that would be introduced for plugins using the new
> public API class that still need to be deployed onto older Connect workers?
>
> Thanks,
> Yash
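Regarding point 2 above, a purely hypothetical Java shape for such a mapping.
The interface and method names are invented here and are not part of KIP-910;
Connect already models both source partitions and source offsets as maps.

import java.util.Map;

// Hypothetical sketch only; names are invented, not proposed API.
public interface OffsetReportingSourceTask {
    // Updated offsets as a source-partition -> source-offset mapping,
    // instead of offset-only SourceRecords.
    Map<Map<String, Object>, Map<String, Object>> updatedOffsets();
}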
>
> On Fri, Apr 14, 2023 at 6:45 PM Sagar  wrote:
>
> > One thing I forgot to mention in my previous email was that the reason I
> > chose to include the opt-in behaviour via configs was that the users of
> the
> > connector know their workload patterns. If the workload is such that the
> >  connector would receive regular valid updates then there’s ideally no
> need
> > for moving offsets since it would update automatically.
> >
> > This way they aren’t forced to use this feature and can use it only when
> > the workload is expected to be batchy or not frequent.
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Fri, 14 Apr 2023 at 5:32 PM, Sagar  wrote:
> >
> > > Hi Chris,
> > >
> > > Thanks for following up on the response. Sharing my thoughts further:
> > >
> > > If we want to add support for connectors to emit offsets without
> > >> accompanying source records, we could (and IMO should) do that without
> > >> requiring users to manually enable that feature by adjusting worker or
> > >> connector configurations.
> > >
> > >
> > > With the current KIP design, I have tried to implement this in an
> opt-in
> > > manner via configs. I guess what you are trying to say is that this
> > doesn't
> > > need a config of its own and instead could be part of the poll ->
> > > transform etc -> produce -> commit cycle. That way, the users don't
> need
> > to
> > > set any config and if the connector supports moving offsets w/o
> producing
> > > SourceRecords, it should happen automatically. Is that correct? If that
> > > is the concern, then I can think of not exposing a config and trying to
> > > make this process automatic. That should ease the load on connector users,
> > > but as for your point about cognitive load on connector developers, I am
> > > still not sure how to address that. The offsets are private to a connector,
> > > and the framework at best can provide hooks for the tasks to update their
> > > offsets.
> > > Connector developers would still have to consider all cases before
> > updating
> > > offsets.  And if I ignore the heartbeat topic and heartbeat interval ms
> > > configs, then what the K

Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread David Jacot
That's fair, so I leave it up to you, Luke.

Cheers,
David

On Fri, May 12, 2023 at 10:58 AM Luke Chen  wrote:

> Hi David,
>
> Thanks for the response.
> But I don't think the LEO-based leader election only benefits this case.
> Like in the unclean leader election case, we now randomly choose an
> out-of-sync replica to become the leader.
> This LEO-based leader election will help that case, too.
> Besides, not all producers use `acks=all`; thus, when using `acks=0` or
> `acks=1`, they can also benefit from LEO-based leader election.
>
> That's why I think this could be in a separate KIP, and after that's
> introduced, this KIP will be a further improvement based on that.
> Does that make sense?
>
> Actually, I don't strongly insist on doing this; if you still think they
> should be proposed together, I can update the KIP, too.
>
> Thank you.
> Luke
>
> On Fri, May 12, 2023 at 4:48 PM David Jacot 
> wrote:
>
> > Hi Luke,
> >
> > I disagree with this because we don't need the leader election change on
> > its own if we don't do this KIP. They have to go together or not at all
> in
> > my opinion. We need a KIP which designs the entire solution.
> >
> > Best,
> > David
> >
> > On Fri, May 12, 2023 at 10:33 AM Luke Chen  wrote:
> >
> > > Hi Alexandre,
> > >
> > > Thanks for the thoughts.
> > > I've thought about it, and think I would choose to have a new leader
> > > election method to fix the problem we encountered, not this
> "backup-only"
> > > replica solution.
> > > But this is still an interesting idea. Like what you've said, this
> > solution
> > > can bring many benefits.
> > > So maybe you can create a proposal for it?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, May 12, 2023 at 4:21 PM Luke Chen  wrote:
> > >
> > > > Hi Haruki,
> > > >
> > > > Yes, this scenario could happen.
> > > > I'm thinking we can fix it in step 6: when the controller tries to get
> > > > the LEO from the B and C replicas, those replicas should stop the
> > > > fetcher for this partition immediately, before returning the LEO.
> > > > As for whether we need a quorum-based approach or not, we can discuss
> > > > that in another KIP. I'm still thinking about it.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > >
> > > > On Fri, May 12, 2023 at 3:59 PM Luke Chen  wrote:
> > > >
> > > >> Hi David,
> > > >>
> > > >> > It can't be in another KIP as it is required for your proposal to
> > > work.
> > > >> This is also an important part to discuss as it requires the
> > controller
> > > to
> > > >> do more operations on leader changes.
> > > >>
> > > >> Yes, I know this is a requirement for this KIP to work, and it needs
> > > >> a lot of discussion.
> > > >> So that's why I think it'd be better to have a separate KIP to write
> > the
> > > >> content and discussion.
> > > >> I've put the status of this KIP as "pending" and added a note on the
> > top
> > > >> of this KIP:
> > > >>
> > > >> Note: This KIP requires leader election change, which will be
> proposed
> > > in
> > > >> another KIP.
> > > >>
> > > >> Thanks.
> > > >> Luke
> > > >>
> > > >> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
> > > >> alexandre.dupr...@gmail.com> wrote:
> > > >>
> > > >>> Hi, Luke,
> > > >>>
> > > >>> Thanks for your reply.
> > > >>>
> > > >>> 102. Whether such a replica could become a leader depends on what
> the
> > > >>> end-user wants to use it for and what tradeoffs they wish to make
> > down
> > > >>> the line.
> > > >>>
> > > >>> There are cases, for instance with heterogeneous or interregional
> > > >>> networks, where the difference in latency between subsets of
> brokers
> > > >>> can be high enough for the "slow replicas" to have a detrimental
> > > >>> impact on the ISR traffic they take part in. This can justify
> > > >>> permanently segregating them from ISR traffic by design. And, an
> > > >>> end-user could still prefer to have these "slow replicas" versus
> > > >>> alternative approaches such as mirroring for the benefits they can
> > > >>> bring, for instance: a) they belong to the same cluster with no
> added
> > > >>> admin and ops, b) benefit from a direct, simpler replication path,
> c)
> > > >>> require less infrastructure than a mirrored solution, d) could
> become
> > > >>> unclean leaders for failovers under disaster scenarios such as a
> > > >>> regional service outage.
> > > >>>
> > > >>> Thanks,
> > > >>> Alexandre
> > > >>>
> > > >>> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
> > > >>> >
> > > >>> > Hi, Luke.
> > > >>> >
> > > >>> > Though this proposal definitely looks interesting, as others
> > pointed
> > > >>> out,
> > > >>> > the leader election implementation would be the hard part.
> > > >>> >
> > > >>> > And I think even LEO-based-election is not safe, which could
> cause
> > > >>> silent
> > > >>> > committed-data loss easily.
> > > >>> >
> > > >>> > Let's say we have replicas A,B,C and A is the leader initially,
> and
> > > >>> > min.insync.replicas = 2.
> > > >>> >
> > > >>> > - 1. Initial
> >

Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread Luke Chen
Hi David,

I know what you mean.
Let's hear others' thoughts about it. :)

Luke

On Fri, May 12, 2023 at 4:53 PM David Jacot 
wrote:

> Thanks, Luke.
>
> > But if the producers and consumers all exist in the same organization,
> > upgrading producers/consumers for the org's cost savings should be a
> > reasonable motivation.
>
> Yeah, that works in this case. However, Kafka is often used as a service
> (on premise or in the cloud) nowadays, and in this case the producers/consumers
> versions are completely out of our control, thus my concern.
>
> BR,
> David
>
> On Fri, May 12, 2023 at 10:47 AM Luke Chen  wrote:
>
> > Hi David,
> >
> > Yes, you're right. I've bumped the version of the record batch and described
> > that the down-conversion will happen, like what we do for message format v1
> > today when old consumers consume records.
> >
> > > Overall, I wonder if the bandwidth saving is worth this change given
> that
> > it will put more pressure on the brokers.
> > Actually, I'm not 100% sure, so I'd also like to hear what the community
> > thinks about it.
> > But if the producers and consumers all exist in the same organization,
> > upgrading producers/consumers for the org's cost savings should be a
> > reasonable motivation.
> >
> > Thanks.
> > Luke
> >
> >
> > On Fri, May 12, 2023 at 3:43 PM David Jacot  >
> > wrote:
> >
> > > Hi Luke,
> > >
> > > Thanks for the KIP.
> > >
> > > What do we do in the case where a batch is written with
> > > `ignoreMessageAttributes` set to 1, which means that messages won't
> have
> > > the `attributes`, and is consumed by a consumer which does not
> understand
> > > this new format? I suppose that we would need to introduce a new
> version
> > > for the message format (v3) and that we will have to downconvert
> records
> > > from the new format version to v2 in this case. This is not clear in
> the
> > > KIP. Could you elaborate a bit more on this? Overall, I wonder if the
> > > bandwidth saving is worth this change given that it will put more
> > pressure
> > > on the brokers.
> > >
> > > Best,
> > > David
> > >
> > > On Fri, May 12, 2023 at 9:04 AM Luke Chen  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start a discussion for KIP-931: Flag to ignore unused
> > > > message attribute field. This KIP is to add a flag in the batch header
> > > > to indicate whether messages inside the batch have the attribute field
> > > > or not, to reduce the message size and thus save network traffic and
> > > > storage size (and money, of course).
> > > >
> > > > Please check the link for more detail:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
> > > >
> > > > Any feedback is welcome.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread Luke Chen
Hi David,

Thanks for the response.
But I don't think the LEO-based leader election only benefits this case.
Like in the unclean leader election case, we now randomly choose an
out-of-sync replica to become the leader.
This LEO-based leader election will help that case, too.
Besides, not all producers use `acks=all`; thus, when using `acks=0` or
`acks=1`, they can also benefit from LEO-based leader election.

That's why I think this could be in a separate KIP, and after that's
introduced, this KIP will be a further improvement based on that.
Does that make sense?

Actually, I don't strongly insist on doing this; if you still think they
should be proposed together, I can update the KIP, too.

Thank you.
Luke

On Fri, May 12, 2023 at 4:48 PM David Jacot 
wrote:

> Hi Luke,
>
> I disagree with this because we don't need the leader election change on
> its own if we don't do this KIP. They have to go together or not at all in
> my opinion. We need a KIP which designs the entire solution.
>
> Best,
> David
>
> On Fri, May 12, 2023 at 10:33 AM Luke Chen  wrote:
>
> > Hi Alexandre,
> >
> > Thanks for the thoughts.
> > I've thought about it, and think I would choose to have a new leader
> > election method to fix the problem we encountered, not this "backup-only"
> > replica solution.
> > But this is still an interesting idea. Like what you've said, this
> solution
> > can bring many benefits.
> > So maybe you can create a proposal for it?
> >
> > Thank you.
> > Luke
> >
> > On Fri, May 12, 2023 at 4:21 PM Luke Chen  wrote:
> >
> > > Hi Haruki,
> > >
> > > Yes, this scenario could happen.
> > > I'm thinking we can fix it in step 6: when the controller tries to get
> > > the LEO from the B and C replicas, those replicas should stop the
> > > fetcher for this partition immediately, before returning the LEO.
> > > As for whether we need a quorum-based approach or not, we can discuss
> > > that in another KIP. I'm still thinking about it.
> > >
> > > Thank you.
> > > Luke
> > >
> > >
> > > On Fri, May 12, 2023 at 3:59 PM Luke Chen  wrote:
> > >
> > >> Hi David,
> > >>
> > >> > It can't be in another KIP as it is required for your proposal to
> > work.
> > >> This is also an important part to discuss as it requires the
> controller
> > to
> > >> do more operations on leader changes.
> > >>
> > >> Yes, I know this is a requirement for this KIP to work, and it needs a
> > >> lot of discussion.
> > >> So that's why I think it'd be better to have a separate KIP to write
> the
> > >> content and discussion.
> > >> I've put the status of this KIP as "pending" and added a note on the
> top
> > >> of this KIP:
> > >>
> > >> Note: This KIP requires leader election change, which will be proposed
> > in
> > >> another KIP.
> > >>
> > >> Thanks.
> > >> Luke
> > >>
> > >> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
> > >> alexandre.dupr...@gmail.com> wrote:
> > >>
> > >>> Hi, Luke,
> > >>>
> > >>> Thanks for your reply.
> > >>>
> > >>> 102. Whether such a replica could become a leader depends on what the
> > >>> end-user wants to use it for and what tradeoffs they wish to make
> down
> > >>> the line.
> > >>>
> > >>> There are cases, for instance with heterogeneous or interregional
> > >>> networks, where the difference in latency between subsets of brokers
> > >>> can be high enough for the "slow replicas" to have a detrimental
> > >>> impact on the ISR traffic they take part in. This can justify
> > >>> permanently segregating them from ISR traffic by design. And, an
> > >>> end-user could still prefer to have these "slow replicas" versus
> > >>> alternative approaches such as mirroring for the benefits they can
> > >>> bring, for instance: a) they belong to the same cluster with no added
> > >>> admin and ops, b) benefit from a direct, simpler replication path, c)
> > >>> require less infrastructure than a mirrored solution, d) could become
> > >>> unclean leaders for failovers under disaster scenarios such as a
> > >>> regional service outage.
> > >>>
> > >>> Thanks,
> > >>> Alexandre
> > >>>
> > >>> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
> > >>> >
> > >>> > Hi, Luke.
> > >>> >
> > >>> > Though this proposal definitely looks interesting, as others
> pointed
> > >>> out,
> > >>> > the leader election implementation would be the hard part.
> > >>> >
> > >>> > And I think even LEO-based-election is not safe, which could cause
> > >>> silent
> > >>> > committed-data loss easily.
> > >>> >
> > >>> > Let's say we have replicas A,B,C and A is the leader initially, and
> > >>> > min.insync.replicas = 2.
> > >>> >
> > >>> > - 1. Initial
> > >>> > * A(leo=0), B(leo=0), C(leo=0)
> > >>> > - 2. Produce a message to A
> > >>> > * A(leo=1), B(leo=0), C(leo=0)
> > >>> > - 3. Another producer produces a message to A (i.e. as the
> different
> > >>> batch)
> > >>> > * A(leo=2), B(leo=0), C(leo=0)
> > >>> > - 4. C replicates the first batch. offset=1 is committed (by
> > >>> > acks=min.insync.replicas)
> > >>> > * A(leo=2), B(leo=0), 

Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread David Jacot
Thanks, Luke.

> But if the producers and consumers all exist in the same organization,
> upgrading producers/consumers for the org's cost savings should be a
> reasonable motivation.

Yeah, that works in this case. However, Kafka is often used as a service
(on premise or in the cloud) nowadays, and in this case the producers/consumers
versions are completely out of our control, thus my concern.

BR,
David

On Fri, May 12, 2023 at 10:47 AM Luke Chen  wrote:

> Hi David,
>
> Yes, you're right. I've bumped the version of the record batch and described
> that the down-conversion will happen, like what we do for message format v1
> today when old consumers consume records.
>
> > Overall, I wonder if the bandwidth saving is worth this change given that
> it will put more pressure on the brokers.
> Actually, I'm not 100% sure, so I'd also like to hear what the community
> thinks about it.
> But if the producers and consumers all exist in the same organization,
> upgrading producers/consumers for the org's cost savings should be a
> reasonable motivation.
>
> Thanks.
> Luke
>
>
> On Fri, May 12, 2023 at 3:43 PM David Jacot 
> wrote:
>
> > Hi Luke,
> >
> > Thanks for the KIP.
> >
> > What do we do in the case where a batch is written with
> > `ignoreMessageAttributes` set to 1, which means that messages won't have
> > the `attributes`, and is consumed by a consumer which does not understand
> > this new format? I suppose that we would need to introduce a new version
> > for the message format (v3) and that we will have to downconvert records
> > from the new format version to v2 in this case. This is not clear in the
> > KIP. Could you elaborate a bit more on this? Overall, I wonder if the
> > bandwidth saving is worth this change given that it will put more
> pressure
> > on the brokers.
> >
> > Best,
> > David
> >
> > On Fri, May 12, 2023 at 9:04 AM Luke Chen  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion for KIP-931: Flag to ignore unused
> > > message attribute field. This KIP is to add a flag in the batch header
> > > to indicate whether messages inside the batch have the attribute field
> > > or not, to reduce the message size and thus save network traffic and
> > > storage size (and money, of course).
> > >
> > > Please check the link for more detail:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
> > >
> > > Any feedback is welcome.
> > >
> > > Thank you.
> > > Luke
> > >
> >
>


Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread David Jacot
Hi Luke,

I disagree with this because we don't need the leader election change on
its own if we don't do this KIP. They have to go together or not at all in
my opinion. We need a KIP which designs the entire solution.

Best,
David

On Fri, May 12, 2023 at 10:33 AM Luke Chen  wrote:

> Hi Alexandre,
>
> Thanks for the thoughts.
> I've thought about it, and I think I would choose to have a new leader
> election method to fix the problem we encountered, not this "backup-only"
> replica solution.
> But this is still an interesting idea. Like you've said, this solution
> can bring many benefits.
> So maybe you can create a proposal for it?
>
> Thank you.
> Luke
>
> On Fri, May 12, 2023 at 4:21 PM Luke Chen  wrote:
>
> > Hi Haruki,
> >
> > Yes, this scenario could happen.
> > I'm thinking we can fix it in step 6: when the controller tries to get the
> > LEO from the B and C replicas, those replicas should stop the fetcher for
> > this partition immediately, before returning the LEO.
> > As for whether we need a quorum-based approach or not, we can discuss that
> > in another KIP. I'm still thinking about it.
> >
> > Thank you.
> > Luke
> >
> >
> > On Fri, May 12, 2023 at 3:59 PM Luke Chen  wrote:
> >
> >> Hi David,
> >>
> >> > It can't be in another KIP as it is required for your proposal to
> work.
> >> This is also an important part to discuss as it requires the controller
> to
> >> do more operations on leader changes.
> >>
> >> Yes, I know this is a requirement for this KIP to work, and it needs a
> >> lot of discussion.
> >> So that's why I think it'd be better to have a separate KIP to write the
> >> content and discussion.
> >> I've put the status of this KIP as "pending" and added a note on the top
> >> of this KIP:
> >>
> >> Note: This KIP requires leader election change, which will be proposed
> in
> >> another KIP.
> >>
> >> Thanks.
> >> Luke
> >>
> >> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
> >> alexandre.dupr...@gmail.com> wrote:
> >>
> >>> Hi, Luke,
> >>>
> >>> Thanks for your reply.
> >>>
> >>> 102. Whether such a replica could become a leader depends on what the
> >>> end-user wants to use it for and what tradeoffs they wish to make down
> >>> the line.
> >>>
> >>> There are cases, for instance with heterogeneous or interregional
> >>> networks, where the difference in latency between subsets of brokers
> >>> can be high enough for the "slow replicas" to have a detrimental
> >>> impact on the ISR traffic they take part in. This can justify
> >>> permanently segregating them from ISR traffic by design. And, an
> >>> end-user could still prefer to have these "slow replicas" versus
> >>> alternative approaches such as mirroring for the benefits they can
> >>> bring, for instance: a) they belong to the same cluster with no added
> >>> admin and ops, b) benefit from a direct, simpler replication path, c)
> >>> require less infrastructure than a mirrored solution, d) could become
> >>> unclean leaders for failovers under disaster scenarios such as
> >>> regional service outages.
> >>>
> >>> Thanks,
> >>> Alexandre
> >>>
> >>> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
> >>> >
> >>> > Hi, Luke.
> >>> >
> >>> > Though this proposal definitely looks interesting, as others pointed
> >>> > out, the leader election implementation would be the hard part.
> >>> >
> >>> > And I think even LEO-based election is not safe; it could easily
> >>> > cause silent loss of committed data.
> >>> >
> >>> > Let's say we have replicas A,B,C and A is the leader initially, and
> >>> > min.insync.replicas = 2.
> >>> >
> >>> > - 1. Initial
> >>> > * A(leo=0), B(leo=0), C(leo=0)
> >>> > - 2. Produce a message to A
> >>> > * A(leo=1), B(leo=0), C(leo=0)
> >>> > - 3. Another producer produces a message to A (i.e. as a different
> >>> > batch)
> >>> > * A(leo=2), B(leo=0), C(leo=0)
> >>> > - 4. C replicates the first batch. offset=1 is committed (by
> >>> > acks=min.insync.replicas)
> >>> > * A(leo=2), B(leo=0), C(leo=1)
> >>> > - 5. A loses its ZK session (or broker session timeout in KRaft)
> >>> > - 6. The controller (regardless of ZK/KRaft) doesn't store LEOs
> >>> > itself, so it needs to interact with each replica. It detects that C
> >>> > has the largest LEO and decides to elect C as the new leader
> >>> > - 7. Before the leader election is performed, B replicates offset=1,2
> >>> > from A. offset=2 is committed
> >>> > * This is possible because even if A lost its ZK session, A could
> >>> > still handle fetch requests for a while.
> >>> > - 8. The controller elects C as the new leader. B truncates its
> >>> > offset. offset=2 is lost silently.
> >>> >
> >>> > I have a feeling that we need quorum-based data replication, as
> >>> > Divij pointed out.
> >>> >
> >>> >
> >>> > On Thu, May 11, 2023 at 22:33, David Jacot wrote:
> >>> >
> >>> > > Hi Luke,
> >>> > >
> >>> > > > Yes, on second thought, I think the new leader election is
> >>> > > > required to work for this new acks option. I'll think about it
> >>> > > > and open another KIP for it.

Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread Luke Chen
Hi David,

Yes, you're right. I've bumped the version of the record batch, and described
that down-conversion will happen, like what we do for message format v1 today
when old consumers consume records.

> Overall, I wonder if the bandwidth saving is worth this change given that
it will put more pressure on the brokers.
Actually, I'm not 100% sure, so I'd also like to hear what the community
thinks about it.
But if the producers and consumers all exist in the same organization, then
upgrading the producers/consumers for the organization's cost savings should
be a reasonable motivation.
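(For a rough, purely illustrative sense of scale, with assumed numbers that
are not from the KIP: eliding one attributes byte per record at 1 million
records/s saves about 1 MB/s of producer traffic, roughly 86 GB/day of raw
log before replication, and the saving multiplies with the replication
factor and again on the consumer side.)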

Thanks.
Luke


On Fri, May 12, 2023 at 3:43 PM David Jacot 
wrote:

> Hi Luke,
>
> Thanks for the KIP.
>
> What do we do in the case where a batch is written with
> `ignoreMessageAttributes` set to 1, which means that messages won't have
> the `attributes`, and is consumed by a consumer which does not understand
> this new format? I suppose that we would need to introduce a new version
> for the message format (v3) and that we will have to downconvert records
> from the new format version to v2 in this case. This is not clear in the
> KIP. Could you elaborate a bit more on this? Overall, I wonder if the
> bandwidth saving is worth this change given that it will put more pressure
> on the brokers.
>
> Best,
> David
>
> On Fri, May 12, 2023 at 9:04 AM Luke Chen  wrote:
>
> > Hi all,
> >
> > I'd like to start a discussion for KIP-931: Flag to ignore unused
> > message attribute field. This KIP is to add a flag in the batch header to
> > indicate whether messages inside the batch have an attribute field or
> > not, to reduce the message size and thus save network traffic and storage
> > size (and money, of course).
> >
> > Please check the link for more detail:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
> >
> > Any feedback is welcome.
> >
> > Thank you.
> > Luke
> >
>


Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread Luke Chen
Hi Alexandre,

Thanks for the thoughts.
I've thought about it, and I think I would choose to have a new leader
election method to fix the problem we encountered, not this "backup-only"
replica solution.
But this is still an interesting idea. As you've said, this solution
can bring many benefits.
So maybe you can create a proposal for it?

Thank you.
Luke

On Fri, May 12, 2023 at 4:21 PM Luke Chen  wrote:

> Hi Haruki,
>
> Yes, this scenario could happen.
> I'm thinking we can fix it in step 6: when the controller tries to get the
> LEO from B and C, those replicas should stop their fetchers for this
> partition immediately, before returning the LEO.
> As for whether we need quorum-based replication or not, we can discuss that
> in another KIP. I'm still thinking about it.
>
> Thank you.
> Luke
>
>
> On Fri, May 12, 2023 at 3:59 PM Luke Chen  wrote:
>
>> Hi David,
>>
>> > It can't be in another KIP as it is required for your proposal to work.
>> > This is also an important part to discuss as it requires the controller
>> > to do more operations on leader changes.
>>
>> Yes, I know this is a requirement for this KIP to work, and it needs a lot
>> of discussion.
>> So that's why I think it'd be better to have a separate KIP to write the
>> content and discussion.
>> I've put the status of this KIP as "pending" and added a note on the top
>> of this KIP:
>>
>> Note: This KIP requires a leader election change, which will be proposed
>> in another KIP.
>>
>> Thanks.
>> Luke
>>
>> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
>> alexandre.dupr...@gmail.com> wrote:
>>
>>> Hi, Luke,
>>>
>>> Thanks for your reply.
>>>
>>> 102. Whether such a replica could become a leader depends on what the
>>> end-user wants to use it for and what tradeoffs they wish to make down
>>> the line.
>>>
>>> There are cases, for instance with heterogeneous or interregional
>>> networks, where the difference in latency between subsets of brokers
>>> can be high enough for the "slow replicas" to have a detrimental
>>> impact on the ISR traffic they take part in. This can justify
>>> permanently segregating them from ISR traffic by design. And, an
>>> end-user could still prefer to have these "slow replicas" versus
>>> alternative approaches such as mirroring for the benefits they can
>>> bring, for instance: a) they belong to the same cluster with no added
>>> admin and ops, b) benefit from a direct, simpler replication path, c)
>>> require less infrastructure than a mirrored solution, d) could become
>>> unclean leaders for failovers under disaster scenarios such as
>>> regional service outages.
>>>
>>> Thanks,
>>> Alexandre
>>>
>>> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
>>> >
>>> > Hi, Luke.
>>> >
>>> > Though this proposal definitely looks interesting, as others pointed
>>> > out, the leader election implementation would be the hard part.
>>> >
>>> > And I think even LEO-based election is not safe; it could easily cause
>>> > silent loss of committed data.
>>> >
>>> > Let's say we have replicas A,B,C and A is the leader initially, and
>>> > min.insync.replicas = 2.
>>> >
>>> > - 1. Initial
>>> > * A(leo=0), B(leo=0), C(leo=0)
>>> > - 2. Produce a message to A
>>> > * A(leo=1), B(leo=0), C(leo=0)
>>> > - 3. Another producer produces a message to A (i.e. as a different
>>> > batch)
>>> > * A(leo=2), B(leo=0), C(leo=0)
>>> > - 4. C replicates the first batch. offset=1 is committed (by
>>> > acks=min.insync.replicas)
>>> > * A(leo=2), B(leo=0), C(leo=1)
>>> > - 5. A loses its ZK session (or broker session timeout in KRaft)
>>> > - 6. The controller (regardless of ZK/KRaft) doesn't store LEOs itself,
>>> > so it needs to interact with each replica. It detects that C has the
>>> > largest LEO and decides to elect C as the new leader
>>> > - 7. Before the leader election is performed, B replicates offset=1,2
>>> > from A. offset=2 is committed
>>> > * This is possible because even if A lost its ZK session, A could
>>> > still handle fetch requests for a while.
>>> > - 8. The controller elects C as the new leader. B truncates its offset.
>>> > offset=2 is lost silently.
>>> >
>>> > I have a feeling that we need quorum-based data replication, as Divij
>>> > pointed out.
>>> >
>>> >
>>> > On Thu, May 11, 2023 at 22:33, David Jacot wrote:
>>> >
>>> > > Hi Luke,
>>> > >
>>> > > > Yes, on second thought, I think the new leader election is required
>>> > > > to work for this new acks option. I'll think about it and open
>>> > > > another KIP for it.
>>> > >
>>> > > It can't be in another KIP as it is required for your proposal to
>>> > > work. This is also an important part to discuss as it requires the
>>> > > controller to do more operations on leader changes.
>>> > >
>>> > > Cheers,
>>> > > David
>>> > >
>>> > > On Thu, May 11, 2023 at 2:44 PM Luke Chen  wrote:
>>> > >
>>> > > > Hi Ismael,
>>> > > > Yes, on second thought, I think the new leader election is required
>>> > > > to work for this new acks option. I'll think about it and open
>>> > > > another KIP for it.

Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread Luke Chen
Hi Haruki,

Yes, this scenario could happen.
I'm thinking we can fix it in step 6: when the controller tries to get the
LEO from B and C, those replicas should stop their fetchers for this
partition immediately, before returning the LEO.
As for whether we need quorum-based replication or not, we can discuss that
in another KIP. I'm still thinking about it.
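
To make the race concrete, here is a minimal, self-contained sketch (plain
Java with hypothetical names, not Kafka code) that replays the scenario
quoted below; the comment at step 7 marks where the proposed fix, stopping
the follower's fetcher before it reports its LEO, would close the window:

import java.util.HashMap;
import java.util.Map;

public class LeoElectionRaceSketch {
    public static void main(String[] args) {
        // LEO (log end offset) per replica; A is the initial leader.
        Map<String, Integer> leo = new HashMap<>();
        leo.put("A", 0); leo.put("B", 0); leo.put("C", 0);

        leo.put("A", 2); // steps 2-3: two single-record batches land on A
        leo.put("C", 1); // step 4: C fetches batch 1; offset 1 is committed
                         // under min.insync.replicas = 2

        // Steps 5-6: A's session expires; the controller snapshots the LEOs
        // of the remaining replicas and picks the one with the largest LEO.
        String newLeader = leo.get("B") >= leo.get("C") ? "B" : "C"; // -> C

        // Step 7 (the race): the election has not landed yet, A still serves
        // fetch requests, and B catches up, so offset 2 looks committed.
        // The fix above would freeze B's fetcher before B reports its LEO.
        leo.put("B", 2);

        // Step 8: C becomes leader with LEO 1, so B truncates to match it,
        // and the committed offset 2 disappears from every replica.
        leo.put("B", Math.min(leo.get("B"), leo.get(newLeader)));
        System.out.println("new leader = " + newLeader
                + ", B truncated to LEO " + leo.get("B"));
    }
}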

Thank you.
Luke


On Fri, May 12, 2023 at 3:59 PM Luke Chen  wrote:

> Hi David,
>
> > It can't be in another KIP as it is required for your proposal to work.
> > This is also an important part to discuss as it requires the controller to
> > do more operations on leader changes.
>
> Yes, I know this is a requirement for this KIP to work, and it needs a lot
> of discussion.
> So that's why I think it'd be better to have a separate KIP to write the
> content and discussion.
> I've put the status of this KIP as "pending" and added a note on the top
> of this KIP:
>
> Note: This KIP requires a leader election change, which will be proposed in
> another KIP.
>
> Thanks.
> Luke
>
> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
>> Hi, Luke,
>>
>> Thanks for your reply.
>>
>> 102. Whether such a replica could become a leader depends on what the
>> end-user wants to use it for and what tradeoffs they wish to make down
>> the line.
>>
>> There are cases, for instance with heterogeneous or interregional
>> networks, where the difference in latency between subsets of brokers
>> can be high enough for the "slow replicas" to have a detrimental
>> impact on the ISR traffic they take part in. This can justify
>> permanently segregating them from ISR traffic by design. And, an
>> end-user could still prefer to have these "slow replicas" versus
>> alternative approaches such as mirroring for the benefits they can
>> bring, for instance: a) they belong to the same cluster with no added
>> admin and ops, b) benefit from a direct, simpler replication path, c)
>> require less infrastructure than a mirrored solution, d) could become
>> unclean leaders for failovers under disaster scenarios such as
>> regional service outages.
>>
>> Thanks,
>> Alexandre
>>
>> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
>> >
>> > Hi, Luke.
>> >
>> > Though this proposal definitely looks interesting, as others pointed
>> > out, the leader election implementation would be the hard part.
>> >
>> > And I think even LEO-based election is not safe; it could easily cause
>> > silent loss of committed data.
>> >
>> > Let's say we have replicas A,B,C and A is the leader initially, and
>> > min.insync.replicas = 2.
>> >
>> > - 1. Initial
>> > * A(leo=0), B(leo=0), C(leo=0)
>> > - 2. Produce a message to A
>> > * A(leo=1), B(leo=0), C(leo=0)
>> > - 3. Another producer produces a message to A (i.e. as a different
>> > batch)
>> > * A(leo=2), B(leo=0), C(leo=0)
>> > - 4. C replicates the first batch. offset=1 is committed (by
>> > acks=min.insync.replicas)
>> > * A(leo=2), B(leo=0), C(leo=1)
>> > - 5. A loses its ZK session (or broker session timeout in KRaft)
>> > - 6. The controller (regardless of ZK/KRaft) doesn't store LEOs itself,
>> > so it needs to interact with each replica. It detects that C has the
>> > largest LEO and decides to elect C as the new leader
>> > - 7. Before the leader election is performed, B replicates offset=1,2
>> > from A. offset=2 is committed
>> > * This is possible because even if A lost its ZK session, A could
>> > still handle fetch requests for a while.
>> > - 8. The controller elects C as the new leader. B truncates its offset.
>> > offset=2 is lost silently.
>> >
>> > I have a feeling that we need quorum-based data replication, as Divij
>> > pointed out.
>> >
>> >
>> > On Thu, May 11, 2023 at 22:33, David Jacot wrote:
>> >
>> > > Hi Luke,
>> > >
>> > > > Yes, on second thought, I think the new leader election is required
>> > > > to work for this new acks option. I'll think about it and open
>> > > > another KIP for it.
>> > >
>> > > It can't be in another KIP as it is required for your proposal to
>> > > work. This is also an important part to discuss as it requires the
>> > > controller to do more operations on leader changes.
>> > >
>> > > Cheers,
>> > > David
>> > >
>> > > On Thu, May 11, 2023 at 2:44 PM Luke Chen  wrote:
>> > >
>> > > > Hi Ismael,
>> > > > Yes, on second thought, I think the new leader election is required
>> > > > to work for this new acks option. I'll think about it and open
>> > > > another KIP for it.
>> > > >
>> > > > Hi Divij,
>> > > > Yes, I agree with all of them. I'll think about it and let you know
>> > > > how we can work together.
>> > > >
>> > > > Hi Alexandre,
>> > > > > 100. The KIP makes one statement which may be considered critical:
>> > > > "Note that in acks=min.insync.replicas case, the slow follower might
>> > > > be easier to become out of sync than acks=all.". Would you have some
>> > > > data on that behaviour when using the new ack semantic? It would be
>> > > > interesting to analyse and especi

Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-12 Thread Luke Chen
Hi David,

> It can't be in another KIP as it is required for your proposal to work.
> This is also an important part to discuss as it requires the controller to
> do more operations on leader changes.

Yes, I know this is a requirement for this KIP to work, and it needs a lot of
discussion.
So that's why I think it'd be better to have a separate KIP to write the
content and discussion.
I've put the status of this KIP as "pending" and added a note on the top of
this KIP:

Note: This KIP requires a leader election change, which will be proposed in
another KIP.

Thanks.
Luke

On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi, Luke,
>
> Thanks for your reply.
>
> 102. Whether such a replica could become a leader depends on what the
> end-user wants to use it for and what tradeoffs they wish to make down
> the line.
>
> There are cases, for instance with heterogeneous or interregional
> networks, where the difference in latency between subsets of brokers
> can be high enough for the "slow replicas" to have a detrimental
> impact on the ISR traffic they take part in. This can justify
> permanently segregating them from ISR traffic by design. And, an
> end-user could still prefer to have these "slow replicas" versus
> alternative approaches such as mirroring for the benefits they can
> bring, for instance: a) they belong to the same cluster with no added
> admin and ops, b) benefit from a direct, simpler replication path, c)
> require less infrastructure than a mirrored solution, d) could become
> unclean leaders for failovers under disaster scenarios such as
> regional service outages.
>
> Thanks,
> Alexandre
>
> On Thu, May 11, 2023 at 14:57, Haruki Okada wrote:
> >
> > Hi, Luke.
> >
> > Though this proposal definitely looks interesting, as others pointed out,
> > the leader election implementation would be the hard part.
> >
> > And I think even LEO-based election is not safe; it could easily cause
> > silent loss of committed data.
> >
> > Let's say we have replicas A,B,C and A is the leader initially, and
> > min.insync.replicas = 2.
> >
> > - 1. Initial
> > * A(leo=0), B(leo=0), C(leo=0)
> > - 2. Produce a message to A
> > * A(leo=1), B(leo=0), C(leo=0)
> > - 3. Another producer produces a message to A (i.e. as a different batch)
> > * A(leo=2), B(leo=0), C(leo=0)
> > - 4. C replicates the first batch. offset=1 is committed (by
> > acks=min.insync.replicas)
> > * A(leo=2), B(leo=0), C(leo=1)
> > - 5. A loses its ZK session (or broker session timeout in KRaft)
> > - 6. The controller (regardless of ZK/KRaft) doesn't store LEOs itself,
> > so it needs to interact with each replica. It detects that C has the
> > largest LEO and decides to elect C as the new leader
> > - 7. Before the leader election is performed, B replicates offset=1,2
> > from A. offset=2 is committed
> > * This is possible because even if A lost its ZK session, A could still
> > handle fetch requests for a while.
> > - 8. The controller elects C as the new leader. B truncates its offset.
> > offset=2 is lost silently.
> >
> > I have a feeling that we need quorum-based data replication, as Divij
> > pointed out.
> >
> >
> > On Thu, May 11, 2023 at 22:33, David Jacot wrote:
> >
> > > Hi Luke,
> > >
> > > > Yes, on second thought, I think the new leader election is required
> > > > to work for this new acks option. I'll think about it and open
> > > > another KIP for it.
> > >
> > > It can't be in another KIP as it is required for your proposal to work.
> > > This is also an important part to discuss as it requires the controller
> > > to do more operations on leader changes.
> > >
> > > Cheers,
> > > David
> > >
> > > On Thu, May 11, 2023 at 2:44 PM Luke Chen  wrote:
> > >
> > > > Hi Ismael,
> > > > Yes, on second thought, I think the new leader election is required
> > > > to work for this new acks option. I'll think about it and open
> > > > another KIP for it.
> > > >
> > > > Hi Divij,
> > > > Yes, I agree with all of them. I'll think about it and let you know
> how
> > > we
> > > > can work together.
> > > >
> > > > Hi Alexandre,
> > > > > 100. The KIP makes one statement which may be considered critical:
> > > > "Note that in acks=min.insync.replicas case, the slow follower might
> > > > be easier to become out of sync than acks=all.". Would you have some
> > > > data on that behaviour when using the new ack semantic? It would be
> > > > interesting to analyse and especially look at the percentage of time
> > > > the number of replicas in ISR is reduced to the configured
> > > > min.insync.replicas.
> > > >
> > > > The comparison data would be interesting. I can have a test when
> > > available.
> > > > But this KIP will be deprioritized because there should be a
> > > pre-requisite
> > > > KIP for it.
> > > >
> > > > > A (perhaps naive) hypothesis would be that the
> > > > new ack semantic indeed provides better produce latency, but at the
> > > > cost of precipitating the slowest replica(s) out of the ISR?
> >

Re: [DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread David Jacot
Hi Luke,

Thanks for the KIP.

What do we do in the case where a batch is written with
`ignoreMessageAttributes` set to 1, which means that messages won't have
the `attributes`, and is consumed by a consumer which does not understand
this new format? I suppose that we would need to introduce a new version
for the message format (v3) and that we will have to downconvert records
from the new format version to v2 in this case. This is not clear in the
KIP. Could you elaborate a bit more on this? Overall, I wonder if the
bandwidth saving is worth this change given that it will put more pressure
on the brokers.
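
For illustration only, here is a minimal sketch of the broker-side check
this implies, using a toy wire format and hypothetical names rather than the
real Kafka record format:

import java.io.ByteArrayOutputStream;

public class DownConvertSketch {
    // Toy batch layout (an assumption for this sketch): [magic][flags],
    // followed by records of VALUE_SIZE bytes each when the attributes byte
    // is elided (v3), or 1 + VALUE_SIZE bytes each in v2.
    static final byte FLAG_NO_ATTRIBUTES = 0x01;
    static final int VALUE_SIZE = 8;

    static byte[] maybeDownConvert(byte[] batch, byte consumerMaxMagic) {
        byte magic = batch[0];
        if (magic <= consumerMaxMagic) {
            return batch; // fast path: ship the bytes as-is (zero-copy)
        }
        // Slow path: rebuild as v2 by re-inserting an empty attributes byte
        // per record; this per-fetch copy is the extra broker load at issue.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(2);                              // downconverted magic
        out.write(batch[1] & ~FLAG_NO_ATTRIBUTES); // clear the elision flag
        for (int i = 2; i < batch.length; i += VALUE_SIZE) {
            out.write(0);                          // empty attributes byte
            out.write(batch, i, VALUE_SIZE);
        }
        return out.toByteArray();
    }
}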

Best,
David

On Fri, May 12, 2023 at 9:04 AM Luke Chen  wrote:

> Hi all,
>
> I'd like to start a discussion for KIP-931: Flag to ignore unused
> message attribute field. This KIP is to add a flag in the batch header to
> indicate whether messages inside the batch have an attribute field or not,
> to reduce the message size and thus save network traffic and storage size
> (and money, of course).
>
> Please check the link for more detail:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field
>
> Any feedback is welcome.
>
> Thank you.
> Luke
>


[DISCUSS] KIP-931: Flag to ignore unused message attribute field

2023-05-12 Thread Luke Chen
Hi all,

I'd like to start a discussion for KIP-931: Flag to ignore unused
message attribute field. This KIP is to add a flag in the batch header to
indicate whether messages inside the batch have an attribute field or not,
to reduce the message size and thus save network traffic and storage size
(and money, of course).

Please check the link for more detail:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-931%3A+Flag+to+ignore+unused+message+attribute+field

Any feedback is welcome.

Thank you.
Luke
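
As a rough illustration of the mechanism (a sketch under a toy layout and
assumed names, not the actual format proposed in the KIP): since the
per-record attributes byte is currently always empty, a writer can record
that fact once in the batch header and skip the byte for every record.

import java.io.ByteArrayOutputStream;

public class AttributeElisionSketch {
    static final byte FLAG_NO_ATTRIBUTES = 0x01; // hypothetical header bit

    // Encodes a toy batch: when every record's attributes byte is empty,
    // set one flag in the batch header and omit the byte from each record,
    // saving one byte per record on the wire and on disk.
    static byte[] encodeBatch(byte[][] values, byte[] attrs) {
        boolean allEmpty = true;
        for (byte a : attrs) allEmpty &= (a == 0);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(allEmpty ? FLAG_NO_ATTRIBUTES : 0); // stored once per batch
        for (int i = 0; i < values.length; i++) {
            if (!allEmpty) out.write(attrs[i]); // only written when needed
            out.write(values[i], 0, values[i].length);
        }
        return out.toByteArray();
    }
}

For small records (say 100 bytes), that single byte is on the order of a 1%
saving on the batch payload before compression.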