Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Sagar
Hi Divij,

I think this proposal overall makes sense. My only nit, or rather a suggestion,
is that we also consider a label called newbie++[1] for flaky tests if
we are considering adding newbie as a label. I think some of the flaky
tests need familiarity with the codebase or the test setup, so they might be
difficult for a first-time contributor. newbie++ IMO covers that aspect.

[1]
https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22

Let me know what you think.

Thanks!
Sagar.

On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya wrote:

> >  Please, do it.
> We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
>
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков wrote:
>
> >
> > > To kickstart this effort, we can publish a list of such tickets in the
> > > community and assign one or more committers the role of a "shepherd" for
> > > each ticket.
> >
> > Please, do it.
> > We can use a specific label to effectively filter those tickets.
> >
> > > On 13 Nov 2023, at 15:16, Divij Vaidya wrote:
> > >
> > > Thanks for bringing this up David.
> > >
> > > > My primary concern revolves around the possibility that the currently
> > > > disabled tests may remain inactive indefinitely. We currently have
> > > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > > extended period. I am inclined to support the idea of disabling these
> > > > tests temporarily and merging changes only when the build is successful,
> > > > provided there is a clear plan for re-enabling them in the future.
> > > >
> > > > To address this issue, I propose the following measures:
> > > >
> > > > 1\ Foster a supportive environment for new contributors within the
> > > > community, encouraging them to take on tickets associated with flaky
> > > > tests. This initiative would require individuals familiar with the
> > > > relevant code to offer guidance to those undertaking these tasks.
> > > > Committers should prioritize reviewing and addressing these tickets
> > > > within their available bandwidth. To kickstart this effort, we can
> > > > publish a list of such tickets in the community and assign one or more
> > > > committers the role of a "shepherd" for each ticket.
> > > >
> > > > 2\ Implement a policy to block minor version releases until the Release
> > > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > > > in our testing coverage. The RM may rely on Subject Matter Experts (SMEs)
> > > > in the specific code areas to provide assurance before giving the green
> > > > light for a release.
> > > >
> > > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > > > Integration (CI) system. This goal should encompass projects such as
> > > > refining our test suite to eliminate flakiness and addressing
> > > > infrastructure issues if necessary. By publishing this goal, we create
> > > > a shared vision for the community in 2024, fostering alignment on our
> > > > objectives. This alignment will aid in prioritizing tasks for community
> > > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan wrote:
> > >
> > > >> I will say that I have also seen tests that seem to be more flaky
> > > >> intermittently. It may be ok for some time and suddenly the CI is
> > > >> overloaded and we see issues.
> > > >> I have also seen the CI struggling with running out of space recently,
> > > >> so I wonder if we can also try to improve things on that front.
> > > >>
> > > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > > >> week. I'm happy to try to get to green builds, but everyone needs to be
> > > >> on board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> https://issues.apache.org/jira/browse/KAFKA-14806
> > >> https://issues.apache.org/jira/browse/KAFKA-14249
> > >> https://issues.apache.org/jira/browse/KAFKA-15798
> > >> https://issues.apache.org/jira/browse/KAFKA-15797
> > >> 

[jira] [Created] (KAFKA-15821) Active topics for deleted connectors are not reset in standalone mode

2023-11-13 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-15821:
-

 Summary: Active topics for deleted connectors are not reset in 
standalone mode
 Key: KAFKA-15821
 URL: https://issues.apache.org/jira/browse/KAFKA-15821
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.5.1, 3.6.0, 3.4.1, 3.5.0, 3.3.2, 3.3.1, 3.2.3, 3.2.2, 
3.4.0, 3.2.1, 3.1.2, 3.0.2, 3.3.0, 3.1.1, 3.2.0, 2.8.2, 3.0.1, 3.0.0, 2.8.1, 
2.7.2, 2.6.3, 3.1.0, 2.6.2, 2.7.1, 2.8.0, 2.6.1, 2.7.0, 2.5.1, 2.6.0, 2.5.0, 
3.7.0
Reporter: Chris Egerton


In 
[KIP-558|https://cwiki.apache.org/confluence/display/KAFKA/KIP-558%3A+Track+the+set+of+actively+used+topics+by+connectors+in+Kafka+Connect],
 a new REST endpoint was added to report the set of active topics for a 
connector. The KIP specified that "Deleting a connector will reset this 
connector's set of active topics", and this logic was successfully implemented 
in distributed mode. However, in standalone mode, active topics for deleted 
connectors are not deleted, and if a connector is re-created, it will inherit 
the active topics of its predecessor(s) unless they were manually reset.
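
A minimal sketch of the expected behavior, in Java. The class and method names
below (ActiveTopicsSketch, recordActiveTopic, deleteConnector) are hypothetical
and are not Connect's actual classes; the sketch only illustrates that deleting
a connector should also clear its recorded active topics, so that a re-created
connector starts from an empty set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for per-connector active topic bookkeeping.
final class ActiveTopicsSketch {
    private final Map<String, Set<String>> activeTopics = new HashMap<>();

    // Record that a connector has used a topic.
    void recordActiveTopic(String connector, String topic) {
        activeTopics.computeIfAbsent(connector, k -> new HashSet<>()).add(topic);
    }

    // Deleting a connector also resets its set of active topics.
    void deleteConnector(String connector) {
        activeTopics.remove(connector);
    }

    // Returns an unmodifiable view; empty if nothing is recorded.
    Set<String> activeTopics(String connector) {
        return Collections.unmodifiableSet(
            activeTopics.getOrDefault(connector, Collections.emptySet()));
    }
}
```

With this shape, a connector re-created under the same name reports no active
topics until it actually uses some.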



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once

2023-11-13 Thread Jorge Esteban Quilcate Otoya
Divij, thanks for your prompt feedback!

1. Agree, caching at the plugin level was my initial idea as well; though,
keeping two caches for the same data both at the broker and at the plugin
seems wasteful. (added this as a rejected alternative in the meantime)

2. Not necessarily. The API allows requesting a set of indexes. In the
case of the `RemoteIndexCache`, as it's currently implemented, it would be
using: [offset, time, transaction] index types.

However, I see your point that there may be scenarios where only 1 of the 3
indexes is used:
- The Time index is used mostly once, when fetching sequentially by seeking
offset by time.
- Offset and Transaction indexes are probably the only ones that make sense
to cache, as they are used on every fetch.
Arguably, Transaction indexes are not as common, reducing the benefits of
the proposed approach:
from initially expecting to fetch 3 indexes at once, to potentially
fetching only 2 (offset, txn), but most probably fetching 1 (offset).

If there's value perceived from fetching Offset and Transaction together,
we can keep discussing this KIP. In the meantime, I will look into the
approach to lazily fetch indexes while waiting for additional feedback.
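
For illustration only, the lazy approach could look roughly like the Java
sketch below. Everything in it is an assumption made for the example: the
class name, the simplified IndexType enum, and the remoteFetcher callback are
hypothetical and are not the Tiered Storage API. Each index is fetched from
the remote tier only on first use and is served from the cache afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch: fetch each index on demand and cache the result.
final class LazyIndexCacheSketch {
    // Simplified, hypothetical set of index types.
    enum IndexType { OFFSET, TIMESTAMP, TRANSACTION }

    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Stand-in for the call that downloads one index from the remote tier.
    private final BiFunction<String, IndexType, byte[]> remoteFetcher;

    LazyIndexCacheSketch(BiFunction<String, IndexType, byte[]> remoteFetcher) {
        this.remoteFetcher = remoteFetcher;
    }

    // Only the requested index is fetched, and only the first time it is needed.
    byte[] get(String segmentId, IndexType type) {
        return cache.computeIfAbsent(segmentId + "/" + type,
            key -> remoteFetcher.apply(segmentId, type));
    }
}
```

A consumer that never reads with read_committed would never trigger a
TRANSACTION fetch in this model.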

Cheers,
Jorge.

On Mon, 13 Nov 2023 at 16:51, Divij Vaidya  wrote:

> Hi Jorge
>
> 1. I don't think we need a new API here because alternative solutions
> exist even with the current API. As an example, when the first index is
> fetched, the RSM plugin can choose to download all indexes and cache them
> locally. On the next call to fetch an index from the remote tier, we will
> hit the cache and retrieve the index from there.
>
> 2. The KIP assumes that all indexes are required at all times. However,
> indexes such as transaction indexes are only required for read_committed
> fetches and time index is only required when a fetch call wants to search
> offset by timestamp. As a future step in Tiered Storage, I would actually
> prefer to move towards a direction where we are lazily fetching indexes
> on-demand instead of fetching them together as proposed in the KIP.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I would like to start the discussion on a KIP for Tiered Storage. It's
> > about improving cross-segment latencies by reducing calls to fetch
> > indexes individually.
> > Have a look:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once
> >
> > Cheers,
> > Jorge
> >
>


Re: [PR] Update Satish added as a PMC member [kafka-site]

2023-11-13 Thread via GitHub


satishd merged PR #566:
URL: https://github.com/apache/kafka-site/pull/566


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update Satish added as a PMC member [kafka-site]

2023-11-13 Thread via GitHub


satishd commented on PR #566:
URL: https://github.com/apache/kafka-site/pull/566#issuecomment-1809575763

   Thanks @jlprat @showuon 





[jira] [Created] (KAFKA-15820) Add a metric to track the number of partitions under min ISR

2023-11-13 Thread Calvin Liu (Jira)
Calvin Liu created KAFKA-15820:
--

 Summary: Add a metric to track the number of partitions under min 
ISR
 Key: KAFKA-15820
 URL: https://issues.apache.org/jira/browse/KAFKA-15820
 Project: Kafka
  Issue Type: Sub-task
Reporter: Calvin Liu
Assignee: Calvin Liu








Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.6 #112

2023-11-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 310614 lines...]

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testConditionalUpdatePath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetAllTopicsInClusterTriggersWatch() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetAllTopicsInClusterTriggersWatch() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteTopicZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteTopicZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeletePath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeletePath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetBrokerMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetBrokerMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testCreateTokenChangeNotification() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testCreateTokenChangeNotification() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetTopicsAndPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetTopicsAndPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testConsumerOffsetPath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testConsumerOffsetPath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testTopicAssignments() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testTopicAssignments() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testControllerManagementMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testControllerManagementMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testTopicAssignmentMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testTopicAssignmentMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testConnectionViaNettyClient() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testConnectionViaNettyClient() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testPropagateIsrChanges() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testPropagateIsrChanges() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testControllerEpochMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testControllerEpochMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteRecursive() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testDeleteRecursive() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetTopicPartitionStates() STARTED

Gradle Test Run :core:test > Gradle Test Executor 93 > KafkaZkClientTest > 
testGetTopicPartitionStates() PASSED

Gradle Test Run :core:test > Gradle Test Executor 93 > 

Re: [VOTE] KIP-892: Transactional StateStores

2023-11-13 Thread Sophie Blee-Goldman
+1 (binding)

Thanks a lot for this KIP!

On Mon, Nov 13, 2023 at 8:39 AM Lucas Brutschy wrote:

> Hi Nick,
>
> really happy with the final KIP. Thanks a lot for the hard work!
>
> +1 (binding)
>
> Lucas
>
> On Mon, Nov 13, 2023 at 4:20 PM Colt McNealy  wrote:
> >
> > +1 (non-binding).
> >
> > Thank you, Nick, for making all of the changes (especially around the
> > `default.state.isolation.level` config).
> >
> > Colt McNealy
> >
> > *Founder, LittleHorse.dev*
> >
> >
> > On Mon, Nov 13, 2023 at 7:15 AM Nick Telford wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to call a vote for KIP-892: Transactional StateStores[1], which
> > > makes Kafka Streams StateStores transactional under EOS.
> > >
> > > Regards,
> > >
> > > Nick
> > >
> > > 1:
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-892%3A+Transactional+Semantics+for+StateStores
> > >
>


Re: [VOTE] KIP-979 Allow independently stop KRaft processes

2023-11-13 Thread Hailey Ni
Hi Colin,

Thank you for your review. I removed the "absolute path need to be
provided" line from the KIP, and will modify the code to get the absolute
path to the config files using some bash in the kafka-server-start file.
For your second question, I've added a line in the KIP: "If both parameters
are provided, the value for node-id parameter will take precedence, i.e,
the process with node id specified will be killed, no matter what's the
process role provided."
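
The precedence rule quoted above can be sketched as follows. This is only an
illustrative Java model, not the stop script itself: the Proc class, the
select method, and the "select everything when neither parameter is given"
fallback are assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the selection rule; not the actual stop script.
final class StopSelectionSketch {
    // Hypothetical description of one running Kafka process.
    static final class Proc {
        final int nodeId;
        final Set<String> roles;

        Proc(int nodeId, Set<String> roles) {
            this.nodeId = nodeId;
            this.roles = roles;
        }
    }

    // node-id wins whenever it is given; process-role is consulted only otherwise.
    static List<Proc> select(List<Proc> running, Optional<Integer> nodeId, Optional<String> role) {
        if (nodeId.isPresent()) {
            int id = nodeId.get();
            return running.stream().filter(p -> p.nodeId == id).collect(Collectors.toList());
        }
        if (role.isPresent()) {
            String r = role.get();
            return running.stream().filter(p -> p.roles.contains(r)).collect(Collectors.toList());
        }
        // Assumption: with neither parameter, every process is selected.
        return running;
    }
}
```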

What do you think?

Thanks,
Hailey

On Thu, Nov 9, 2023 at 4:03 PM Colin McCabe  wrote:

> Hi Hailey,
>
> Thanks for the KIP.
>
> It feels clunky to have to pass an absolute path to the configuration file
> when starting the broker or controller. I think we should consider one of
> two alternate options:
>
> 1. Use JMXtool to examine the running kafka.Kafka processes.
> I believe ID is available from kafka.server, type=app-info,id=1 (replace 1
> with the actual ID)
>
> Role can be deduced by the presence or absence of
> kafka.server,type=KafkaServer,name=BrokerState for brokers, or
> kafka.server,type=ControllerServer,name=ClusterId for controllers.
>
> 2. Alternately, we could inject the ID and role into the command line in
> kafka-server-start.sh. Basically add -Dkafka.node.id=1,
> -Dkafka.node.roles=broker. This would be helpful to people just examining
> the output of ps.
>
> Finally, you state that either command-line option can be given. What
> happens if both are given?
>
> best,
> Colin
>
>
> On Mon, Oct 23, 2023, at 12:20, Hailey Ni wrote:
> > Hi Ron,
> >
> > I've added the "Rejected Alternatives" section in the KIP. Thanks for the
> > comments and +1 vote!
> >
> > Thanks,
> > Hailey
> >
> > On Mon, Oct 23, 2023 at 6:33 AM Ron Dagostino  wrote:
> >
> >> Hi Hailey.  I'm +1 (binding), but could you add a "Rejected
> >> Alternatives" section to the KIP and mention the "--required-config "
> >> option that we decided against and the reason why we made the decision
> >> to reject it?  There were some other small things (dash instead of dot
> >> in the parameter names, --node-id instead of --broker-id), but
> >> cosmetic things like this don't warrant a mention, so I think there's
> >> just the one thing to document.
> >>
> >> Thanks for the KIP, and thanks for adjusting it along the way as the
> >> discussion moved forward.
> >>
> >> Ron
> >>
> >>
> >> Ron
> >>
> >> On Mon, Oct 23, 2023 at 4:00 AM Federico Valeri wrote:
> >> >
> >> > +1 (non binding)
> >> >
> >> > Thanks.
> >> >
> >> > On Mon, Oct 23, 2023 at 9:48 AM Kamal Chandraprakash wrote:
> >> > >
> >> > > +1 (non-binding). Thanks for the KIP!
> >> > >
> >> > > On Mon, Oct 23, 2023, 12:55 Hailey Ni wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > I'd like to call a vote on KIP-979 that will allow users to
> >> > > > independently stop KRaft processes.
> >> > > >
> >> > > >
> >> > > >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-979%3A+Allow+independently+stop+KRaft+processes
> >> > > >
> >> > > > Thanks,
> >> > > > Hailey
> >> > > >
> >>
>


[jira] [Created] (KAFKA-15819) KafkaServer leaks KafkaRaftManager when ZK migration enabled

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15819:
---

 Summary: KafkaServer leaks KafkaRaftManager when ZK migration 
enabled
 Key: KAFKA-15819
 URL: https://issues.apache.org/jira/browse/KAFKA-15819
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


In SharedServer, TestRaftServer, and MetadataShell, the KafkaRaftManager is 
maintained as an instance variable, and shut down when the outer instance is 
shut down. However, in the KafkaServer, the KafkaRaftManager is instantiated and 
started, but then the reference is lost.

[https://github.com/apache/kafka/blob/49d3122d425171b6a59a2b6f02d3fe63d3ac2397/core/src/main/scala/kafka/server/KafkaServer.scala#L416-L442]

Instead, the KafkaServer should behave like the other call-sites of 
KafkaRaftManager, and shut down the KafkaRaftManager during shutdown.
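
The general pattern can be sketched as below. This is a simplified Java model
with hypothetical names (ServerLifecycleSketch, Manager), not the actual
KafkaServer code: the started component is kept in a field so that the outer
shutdown can still reach it.

```java
// Hypothetical model of "keep the reference, shut it down with the owner".
final class ServerLifecycleSketch {
    // Hypothetical stand-in for a component that owns threads or sockets.
    static final class Manager {
        private boolean running;

        void startup() { running = true; }
        void shutdown() { running = false; }
        boolean isRunning() { return running; }
    }

    // Kept as a field so that shutdown() can still reach it.
    private Manager manager;

    void startup() {
        manager = new Manager();
        manager.startup();
    }

    void shutdown() {
        if (manager != null) {
            manager.shutdown();
            manager = null;
        }
    }

    // Exposed only so the example can be inspected.
    Manager manager() {
        return manager;
    }
}
```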





[jira] [Created] (KAFKA-15818) Implement max poll internval

2023-11-13 Thread Philip Nee (Jira)
Philip Nee created KAFKA-15818:
--

 Summary: Implement max poll internval
 Key: KAFKA-15818
 URL: https://issues.apache.org/jira/browse/KAFKA-15818
 Project: Kafka
  Issue Type: Task
  Components: consumer
Reporter: Philip Nee


In the network thread, we need a timer configured with MAX_POLL_INTERVAL_MAX. 
If the user doesn't poll the consumer within that interval, the member needs to 
leave the group.

Currently, we send an acknowledgement event to the network thread per poll. It 
needs to do two things:
 1. update the autocommit state
 2. update the max poll interval timer

 

The current logic looks like this:
{code:java}
if (heartbeat.pollTimeoutExpired(now)) {
    // the poll timeout has expired, which means that the foreground thread has stalled
    // in between calls to poll().
    log.warn("consumer poll timeout has expired. This means the time between subsequent calls to poll() " +
        "was longer than the configured max.poll.interval.ms, which typically implies that " +
        "the poll loop is spending too much time processing messages. You can address this " +
        "either by increasing max.poll.interval.ms or by reducing the maximum size of batches " +
        "returned in poll() with max.poll.records.");

    maybeLeaveGroup("consumer poll timeout has expired.");
}
{code}
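
A rough sketch of what such a timer could look like on the network thread.
The class and method names (PollIntervalTimerSketch, onPollAcknowledged,
isExpired) are hypothetical and chosen for this example only; the sketch
assumes the timer is reset by each poll acknowledgement event and checked on
every network thread iteration.

```java
// Hypothetical timer tracking the time since the last application poll.
final class PollIntervalTimerSketch {
    private final long maxPollIntervalMs;
    private long lastPollMs;

    PollIntervalTimerSketch(long maxPollIntervalMs, long nowMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = nowMs;
    }

    // Called when the network thread receives a poll acknowledgement event.
    void onPollAcknowledged(long nowMs) {
        lastPollMs = nowMs;
    }

    // True once the time since the last poll reaches the configured interval;
    // at that point the member would need to leave the group.
    boolean isExpired(long nowMs) {
        return nowMs - lastPollMs >= maxPollIntervalMs;
    }
}
```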





[jira] [Created] (KAFKA-15817) Avoid reconnecting to the same IP address if multiple addresses are available

2023-11-13 Thread Bob Barrett (Jira)
Bob Barrett created KAFKA-15817:
---

 Summary: Avoid reconnecting to the same IP address if multiple 
addresses are available
 Key: KAFKA-15817
 URL: https://issues.apache.org/jira/browse/KAFKA-15817
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.5.1, 3.6.0, 3.4.1, 3.3.2
Reporter: Bob Barrett


In https://issues.apache.org/jira/browse/KAFKA-12193, we changed the DNS 
resolution behavior for clients to re-resolve DNS after disconnecting from a 
broker, rather than wait until we iterated over all addresses from a given 
resolution. This is useful when the IP addresses have changed between the 
connection and disconnection.

However, with the behavior change, this does mean that clients could 
potentially reconnect immediately to the same IP they just disconnected from, 
if the IPs have not changed. In cases where the disconnection happened because 
that IP was unhealthy (such as a case where a load balancer has instances in 
multiple availability zones and one zone is unhealthy, or a case where an 
intermediate component in the network path is going through a rolling restart), 
this will delay the client successfully reconnecting. To address this, clients 
should remember the IP they just disconnected from and skip that IP when 
reconnecting, as long as the address resolved to multiple addresses.
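
One possible shape for this selection, as a hedged Java sketch. The class and
method names (AddressSelectionSketch, candidates, ip) are hypothetical and are
not the client's actual code; the sketch assumes the client keeps the address
it last disconnected from and filters it out only when another address remains.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: choose reconnect candidates from a fresh DNS resolution.
final class AddressSelectionSketch {
    // Skips the address we just disconnected from, unless it is the only option.
    static List<InetAddress> candidates(List<InetAddress> resolved, InetAddress lastDisconnected) {
        List<InetAddress> filtered = new ArrayList<>(resolved);
        if (lastDisconnected == null || resolved.size() <= 1) {
            return filtered;
        }
        filtered.removeIf(address -> address.equals(lastDisconnected));
        // If filtering removed everything, fall back to the full resolution.
        return filtered.isEmpty() ? new ArrayList<>(resolved) : filtered;
    }

    // Builds an IPv4 address from literal octets without a DNS lookup.
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

When the hostname resolves to a single address, that address is still
returned, so the client can always reconnect.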





Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.6 #111

2023-11-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 409483 lines...]

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetAllTopicsInClusterTriggersWatch() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteTopicZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteTopicZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeletePath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeletePath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetBrokerMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetBrokerMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateTokenChangeNotification() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateTokenChangeNotification() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicsAndPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicsAndPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [1] createChrootIfNecessary=true PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testChroot(boolean) > [2] createChrootIfNecessary=false PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testRetryRegisterBrokerInfo() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConsumerOffsetPath() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConsumerOffsetPath() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursiveWithControllerEpochVersionCheck() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignments() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignments() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerManagementMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerManagementMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignmentMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testTopicAssignmentMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConnectionViaNettyClient() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testConnectionViaNettyClient() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testPropagateIsrChanges() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testPropagateIsrChanges() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerEpochMethods() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testControllerEpochMethods() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursive() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testDeleteRecursive() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicPartitionStates() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testGetTopicPartitionStates() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateConfigChangeNotification() STARTED

Gradle Test Run :core:test > Gradle Test Executor 91 > KafkaZkClientTest > 
testCreateConfigChangeNotification() PASSED

Gradle Test Run :core:test > Gradle Test Executor 91 > 

[jira] [Resolved] (KAFKA-15532) ZkWriteBehindLag should not be reported by inactive controllers

2023-11-13 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-15532.
--
Resolution: Fixed

> ZkWriteBehindLag should not be reported by inactive controllers
> ---
>
> Key: KAFKA-15532
> URL: https://issues.apache.org/jira/browse/KAFKA-15532
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
>
> Since only the active controller is performing the dual-write to ZK during a 
> migration, it should be the only controller to report the ZkWriteBehindLag 
> metric. 
>  
> Currently, if the controller fails over during a migration, the previous 
> active controller will incorrectly report its last value for ZkWriteBehindLag 
> forever. Instead, it should report zero.





[jira] [Created] (KAFKA-15816) Typos in tests leak network sockets

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15816:
---

 Summary: Typos in tests leak network sockets
 Key: KAFKA-15816
 URL: https://issues.apache.org/jira/browse/KAFKA-15816
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients:
 * KafkaConsumerTest
 * KafkaProducerTest
 * ConfigResourceTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest

Core:
 * DescribeAuthorizedOperationsTest
 * SslGssapiSslEndToEndAuthorizationTest
 * SaslMultiMechanismConsumerTest
 * SaslPlaintextConsumerTest
 * SaslSslAdminIntegrationTest
 * SaslSslConsumerTest
 * MultipleListenersWithDefaultJaasContextTest
 * DescribeClusterRequestTest

Trogdor:
 * AgentTest

These can be addressed by just fixing the tests.





[jira] [Created] (KAFKA-15815) JsonRestServer leaks sockets via HttpURLConnection when keep-alive enabled

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15815:
---

 Summary: JsonRestServer leaks sockets via HttpURLConnection when 
keep-alive enabled
 Key: KAFKA-15815
 URL: https://issues.apache.org/jira/browse/KAFKA-15815
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Greg Harris


By default HttpURLConnection has keep-alive enabled, which allows a single 
HttpURLConnection to be left open in order to be re-used for later requests. 
This means that despite JsonRestServer calling `close()` on the relevant 
InputStream, and calling `disconnect()` on the connection itself, the 
HttpURLConnection does not call `close()` on the underlying socket.

This affects the Trogdor AgentTest and CoordinatorTest suites, where most of 
the methods make HTTP requests using the JsonRestServer. The effect is that ~32 
sockets are leaked per test run, all remaining in the CLOSE_WAIT state (half 
closed) after the test. This is because the JettyServer has correctly closed 
the connections, but the HttpURLConnection has not.

There does not appear to be a way to locally override the HttpURLConnection's 
behavior in this case, and only disabling keep-alive overall (via the system 
property `http.keepAlive=false`) seems to resolve the socket leaks.

To prevent the leaks, we can move JsonRestServer to an alternative HTTP 
implementation, perhaps the jetty-client that Connect uses, or disable 
keepAlive during tests.
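
A minimal sketch of the second option, assuming the property is set before any HttpURLConnection is created in the test JVM (to my knowledge the JDK reads `http.keepAlive` when its HTTP client classes are first initialized, so setting it later may have no effect). The class and method names are hypothetical; only the `http.keepAlive` system property comes from the description above.

```java
// Hypothetical sketch, not Kafka code: disables HTTP keep-alive for the
// whole JVM via the standard http.keepAlive system property.
public class KeepAliveSketch {

    static final String KEEP_ALIVE_PROPERTY = "http.keepAlive";

    // Call this early, e.g. from a static initializer or a "before all"
    // hook of the test suite, before the first HTTP request is made.
    static void disableKeepAlive() {
        System.setProperty(KEEP_ALIVE_PROPERTY, "false");
    }

    // Reports whether the property is currently set to "false".
    static boolean keepAliveDisabled() {
        return "false".equals(System.getProperty(KEEP_ALIVE_PROPERTY));
    }

    public static void main(String[] args) {
        disableKeepAlive();
        System.out.println(KEEP_ALIVE_PROPERTY + "=" + System.getProperty(KEEP_ALIVE_PROPERTY));
    }
}
```

Passing `-Dhttp.keepAlive=false` to the test JVM should have the same effect without touching the test code.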





Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Philip Nee
Hi David et al,

I agree with all the suggestions, but from what I've seen, flaky tests
tend to get ignored, and I'm afraid that disabling them would lead to them
being forgotten.  If the Jira tickets are accurate, we've got plenty of
tickets that have been open for > 2 years.
I do think Divij's call is a good initiative, but keep in mind that tackling
these flaky tests can take significant time and effort, outside of one's
full-time job.  I think the very least one can do is to ensure there is no
red build in the near term, as I have seen quite a few PRs get merged
with a broken build and break trunk.

P

On Mon, Nov 13, 2023 at 7:41 AM Divij Vaidya 
wrote:

> >  Please, do it.
> We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
>
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
> wrote:
>
> >
> > > To kickstart this effort, we can publish a list of such tickets in the
> > community and assign one or more committers the role of a "shepherd" for
> > each ticket.
> >
> > Please, do it.
> > We can use a specific label to effectively filter those tickets.
> >
> > > On 13 Nov 2023, at 15:16, Divij Vaidya
> > wrote:
> > >
> > > Thanks for bringing this up David.
> > >
> > > My primary concern revolves around the possibility that the currently
> > > disabled tests may remain inactive indefinitely. We currently have
> > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > extended period. I am inclined to support the idea of disabling these
> > tests
> > > temporarily and merging changes only when the build is successful,
> > provided
> > > there is a clear plan for re-enabling them in the future.
> > >
> > > To address this issue, I propose the following measures:
> > >
> > > 1\ Foster a supportive environment for new contributors within the
> > > community, encouraging them to take on tickets associated with flaky
> > tests.
> > > This initiative would require individuals familiar with the relevant
> code
> > > to offer guidance to those undertaking these tasks. Committers should
> > > prioritize reviewing and addressing these tickets within their
> available
> > > bandwidth. To kickstart this effort, we can publish a list of such
> > tickets
> > > in the community and assign one or more committers the role of a
> > "shepherd"
> > > for each ticket.
> > >
> > > 2\ Implement a policy to block minor version releases until the Release
> > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > in
> > > our testing coverage. The RM may rely on Subject Matter Experts (SMEs)
> in
> > > the specific code areas to provide assurance before giving the green
> > light
> > > for a release.
> > >
> > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > > Integration (CI) system. This goal should encompass projects such as
> > > refining our test suite to eliminate flakiness and addressing
> > > infrastructure issues if necessary. By publishing this goal, we create
> a
> > > shared vision for the community in 2024, fostering alignment on our
> > > objectives. This alignment will aid in prioritizing tasks for community
> > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > 
> > > wrote:
> > >
> > >> I will say that I have also seen tests that seem to be more flaky
> > >> intermittently. It may be ok for some time and suddenly the CI is
> > >> overloaded and we see issues.
> > >> I have also seen the CI struggling with running out of space recently,
> > so I
> > >> wonder if we can also try to improve things on that front.
> > >>
> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > week.
> > >> I'm happy to try to get to green builds, but everyone needs to be on
> > board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> 

[jira] [Created] (KAFKA-15814) SASL Kerberos authentication cannot be used with load balanced bootstrap

2023-11-13 Thread Piotr Smolinski (Jira)
Piotr Smolinski created KAFKA-15814:
---

 Summary: SASL Kerberos authentication cannot be used with load 
balanced bootstrap
 Key: KAFKA-15814
 URL: https://issues.apache.org/jira/browse/KAFKA-15814
 Project: Kafka
  Issue Type: Bug
  Components: core, security
Affects Versions: 3.6.0
Reporter: Piotr Smolinski


This is a very old problem that remains unresolved. When Kafka is accessed
through a load-balanced bootstrap address (as in Kubernetes, when there are too
many brokers to list them all in the bootstrap configuration, or when we want
to expose a single access address), the broker endpoint can be reached through
at least two addresses: one for the bootstrap connection (load balanced) and
another for the broker connection (direct). The problem is that the Kafka
Kerberos configuration forces JAAS to use only one SPN (like
kafka/b-0.kafka@DOMAIN). With weaker algorithms (like RC4), the same keytab
entry can be used for multiple server names. The problem arises with stronger
algorithms (like AES128 or AES256): the SPN is used to compute the messages, so
the keytab entries for kafka/b-0.kafka@DOMAIN and kafka/kafka@DOMAIN are not
compatible.

JAAS configuration for Kerberos can be specified in two ways, depending on
whether we are using it for a service client or a server:
{code:java}
com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/kafka/security/kafka.keytab"
  principal="kafka/node-0.kafka.home.arpa@LOCALDOMAIN"
; {code}
{code:java}
com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/kafka/security/kafka.keytab"
  principal="*"
  isInitiator=false
; {code}
While the former one can be used on both sides, it forces only one principal to 
be selected from the keytab. The latter form cannot be used on the client side, 
but it dynamically selects the correct SPN based on the client request.
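
To make the difference concrete, here is a small sketch that renders both forms as single-line entries of the kind used for Kafka's `sasl.jaas.config` property. The helper class and its method names are hypothetical; the option names and values are the ones shown above. Note that, as described in this ticket, Kafka does not currently accept the wildcard form, so this only builds the strings and does not claim that a broker will load them.

```java
// Hypothetical sketch: renders the two Krb5LoginModule entries discussed
// above as single-line JAAS configuration strings.
public class JaasEntrySketch {

    static final String KRB5_MODULE = "com.sun.security.auth.module.Krb5LoginModule";

    // Form usable on both client and server side: one fixed principal is
    // selected from the keytab.
    static String fixedPrincipal(String keyTab, String principal) {
        return String.format(
            "%s required useKeyTab=true storeKey=true keyTab=\"%s\" principal=\"%s\";",
            KRB5_MODULE, keyTab, principal);
    }

    // Server-only (acceptor) form: wildcard principal, never initiates,
    // so the SPN is chosen based on the client request.
    static String acceptorOnly(String keyTab) {
        return String.format(
            "%s required useKeyTab=true storeKey=true keyTab=\"%s\" principal=\"*\" isInitiator=false;",
            KRB5_MODULE, keyTab);
    }

    public static void main(String[] args) {
        String keyTab = "/etc/kafka/security/kafka.keytab";
        System.out.println(fixedPrincipal(keyTab, "kafka/node-0.kafka.home.arpa@LOCALDOMAIN"));
        System.out.println(acceptorOnly(keyTab));
    }
}
```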

The Kafka Kerberos implementation does not distinguish between client and
server properties. In particular, the same JAAS configuration entry is used
when the broker uses Kerberos for inter-broker communication.

Even if the listener property in the broker is known not to be used, the code
currently does not allow a wildcard principal to be specified.

Some time ago I created a patch that solves the problem while preserving the
current semantics, but I did not have time to write up the submission. This
ticket is a tracker for the pull request.





Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Mickael Maison
Hi,

Ok, I've put together a release plan:
https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.6.1

I'll start chasing the owners of the few open issues. If there are any
other issues you'd like to have in 3.6.1, please let me know.

Thanks,
Mickael

On Mon, Nov 13, 2023 at 4:26 PM Divij Vaidya  wrote:
>
> Thanks for volunteering Mickael. Please feel free to take over this thread.
>
> From a Tiered Storage perspective, there is a long list of known bugs in
> 3.6.0 [1] but we shouldn't wait on fixing them all for 3.6.1. This should
> be ok since this feature is in early access. We will do a best-effort to
> merge some of the critical ones by next week. I will nudge the contributors
> where things are pending for a while.
>
> [1] https://issues.apache.org/jira/browse/KAFKA-15420
>
> --
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:10 PM Mickael Maison 
> wrote:
>
> > Hi Divij,
> >
> > You beat me to it, I was about to propose doing a 3.6.1 release later this
> > week.
> > While there's only a dozen or so issues fixed since 3.6.0, as
> > mentioned there's a few important dependency upgrades that would be
> > good to release.
> >
> > I'm happy to volunteer to run the release if we agree to releasing
> > sooner than initially proposed.
> > There seems to only be a few unresolved Jiras targeting 3.6.1 [0] (all
> > have PRs with some of them even already merged!).
> >
> > 0:
> > https://issues.apache.org/jira/browse/KAFKA-15552?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC
> >
> > Thanks,
> > Mickael
> >
> > On Mon, Nov 13, 2023 at 3:57 PM Divij Vaidya 
> > wrote:
> > >
> > > Hi Ismael, I am all in favour of frequent releases. Sooner is always
> > > better. Unfortunately, I won't have bandwidth to volunteer for a release
> > in
> > > December. If someone else volunteers to be RM prior to this timeline, I
> > > would be happy to cede the RM role to them but in the worst case
> > scenario,
> > > my offer to volunteer for Jan release could be considered as a backup.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > I think we should be releasing 3.6.1 this year rather than next. There
> > are
> > > > some critical bugs in 3.6.0 and I don't think we should be waiting that
> > > > long to fix them. What do you think?
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> > > > wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > >
> > > > > I'd like to volunteer to be the release manager for a bug fix
> > release of
> > > > > the 3.6 line. This will be the first bug fix release of this line and
> > > > will
> > > > > be version 3.6.1. It would contain critical bug fixes for  features
> > such
> > > > as
> > > > > Transaction verification [1], will stabilize Tiered Storage early
> > access
> > > > > release [2] [3] and upgrade dependencies to fix CVEs such as Netty
> > [4]
> > > > and
> > > > > Zookeeper [5].
> > > > >
> > > > > If no one has any objections, I will send out a release plan latest
> > by
> > > > 23rd
> > > > > Dec 2023 with a tentative release in mid-Jan 2024. The release plan
> > will
> > > > > include a list of all of the fixes we are targeting for 3.6.1 along
> > with
> > > > > the detailed timeline.
> > > > >
> > > > > If anyone is interested in releasing this sooner, please feel free to
> > > > take
> > > > > over from me.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Regards,
> > > > > Divij Vaidya
> > > > > Apache Kafka Committer
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > > > > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > > > > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > > > > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > > > > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> > > > >
> > > >
> >


Re: [VOTE] KIP-892: Transactional StateStores

2023-11-13 Thread Lucas Brutschy
Hi Nick,

really happy with the final KIP. Thanks a lot for the hard work!

+1 (binding)

Lucas

On Mon, Nov 13, 2023 at 4:20 PM Colt McNealy  wrote:
>
> +1 (non-binding).
>
> Thank you, Nick, for making all of the changes (especially around the
> `default.state.isolation.level` config).
>
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Mon, Nov 13, 2023 at 7:15 AM Nick Telford  wrote:
>
> > Hi everyone,
> >
> > I'd like to call a vote for KIP-892: Transactional StateStores[1], which
> > makes Kafka Streams StateStores transactional under EOS.
> >
> > Regards,
> >
> > Nick
> >
> > 1:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-892%3A+Transactional+Semantics+for+StateStores
> >


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Divij Vaidya
>  Please, do it.
We can use specific labels to effectively filter those tickets.

We already have a label and a way to discover flaky tests. They are tagged
with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
for folks who are new to Apache Kafka code base.
My suggestion is to send a broader email to the community (since many will
miss details in this thread) and call for action for committers to
volunteer as "shepherds" for these tickets. I can send one out once we have
some consensus wrt next steps in this thread.


[1]
https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC


[2] https://kafka.apache.org/contributing -> Finding a project to work on


Divij Vaidya



On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков  wrote:

>
> > To kickstart this effort, we can publish a list of such tickets in the
> community and assign one or more committers the role of a "shepherd" for
> each ticket.
>
> Please, do it.
> We can use a specific label to effectively filter those tickets.
>
> > On 13 Nov 2023, at 15:16, Divij Vaidya
> wrote:
> >
> > Thanks for bringing this up David.
> >
> > My primary concern revolves around the possibility that the currently
> > disabled tests may remain inactive indefinitely. We currently have
> > unresolved JIRA tickets for flaky tests that have been pending for an
> > extended period. I am inclined to support the idea of disabling these
> tests
> > temporarily and merging changes only when the build is successful,
> provided
> > there is a clear plan for re-enabling them in the future.
> >
> > To address this issue, I propose the following measures:
> >
> > 1\ Foster a supportive environment for new contributors within the
> > community, encouraging them to take on tickets associated with flaky
> tests.
> > This initiative would require individuals familiar with the relevant code
> > to offer guidance to those undertaking these tasks. Committers should
> > prioritize reviewing and addressing these tickets within their available
> > bandwidth. To kickstart this effort, we can publish a list of such
> tickets
> > in the community and assign one or more committers the role of a
> "shepherd"
> > for each ticket.
> >
> > 2\ Implement a policy to block minor version releases until the Release
> > Manager (RM) is satisfied that the disabled tests do not result in gaps
> in
> > our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
> > the specific code areas to provide assurance before giving the green
> light
> > for a release.
> >
> > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > Integration (CI) system. This goal should encompass projects such as
> > refining our test suite to eliminate flakiness and addressing
> > infrastructure issues if necessary. By publishing this goal, we create a
> > shared vision for the community in 2024, fostering alignment on our
> > objectives. This alignment will aid in prioritizing tasks for community
> > members and guide reviewers in allocating their bandwidth effectively.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> 
> > wrote:
> >
> >> I will say that I have also seen tests that seem to be more flaky
> >> intermittently. It may be ok for some time and suddenly the CI is
> >> overloaded and we see issues.
> >> I have also seen the CI struggling with running out of space recently,
> so I
> >> wonder if we can also try to improve things on that front.
> >>
> >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> week.
> >> I'm happy to try to get to green builds, but everyone needs to be on
> board.
> >>
> >> https://issues.apache.org/jira/browse/KAFKA-15529
> >> https://issues.apache.org/jira/browse/KAFKA-14806
> >> https://issues.apache.org/jira/browse/KAFKA-14249
> >> https://issues.apache.org/jira/browse/KAFKA-15798
> >> https://issues.apache.org/jira/browse/KAFKA-15797
> >> https://issues.apache.org/jira/browse/KAFKA-15690
> >> https://issues.apache.org/jira/browse/KAFKA-15699
> >> https://issues.apache.org/jira/browse/KAFKA-15772
> >> https://issues.apache.org/jira/browse/KAFKA-15759
> >> https://issues.apache.org/jira/browse/KAFKA-15760
> >> https://issues.apache.org/jira/browse/KAFKA-15700
> >>
> >> I've also seen that kraft transactions tests often flakily see that the
> >> producer id is not allocated and times out.
> >> I can file a JIRA for that too.
> >>
> >> Hopefully this is a place we can start from.
> >>
> >> Justine
> >>
> >> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
> >>
> >>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
> >> wrote:
> >>>
> >>>> In other words, I’m biased to think that new flakiness indicates
> >>>> non-deterministic bugs more often 

Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Divij Vaidya
Thanks for volunteering Mickael. Please feel free to take over this thread.

From a Tiered Storage perspective, there is a long list of known bugs in
3.6.0 [1] but we shouldn't wait on fixing them all for 3.6.1. This should
be ok since this feature is in early access. We will do a best-effort to
merge some of the critical ones by next week. I will nudge the contributors
where things are pending for a while.

[1] https://issues.apache.org/jira/browse/KAFKA-15420

--
Divij Vaidya



On Mon, Nov 13, 2023 at 4:10 PM Mickael Maison 
wrote:

> Hi Divij,
>
> You beat me to it, I was about to propose doing a 3.6.1 release later this
> week.
> While there's only a dozen or so issues fixed since 3.6.0, as
> mentioned there's a few important dependency upgrades that would be
> good to release.
>
> I'm happy to volunteer to run the release if we agree to releasing
> sooner than initially proposed.
> There seems to only be a few unresolved Jiras targeting 3.6.1 [0] (all
> have PRs with some of them even already merged!).
>
> 0:
> https://issues.apache.org/jira/browse/KAFKA-15552?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC
>
> Thanks,
> Mickael
>
> On Mon, Nov 13, 2023 at 3:57 PM Divij Vaidya 
> wrote:
> >
> > Hi Ismael, I am all in favour of frequent releases. Sooner is always
> > better. Unfortunately, I won't have bandwidth to volunteer for a release
> in
> > December. If someone else volunteers to be RM prior to this timeline, I
> > would be happy to cede the RM role to them but in the worst case
> scenario,
> > my offer to volunteer for Jan release could be considered as a backup.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:
> >
> > > Hi Divij,
> > >
> > > I think we should be releasing 3.6.1 this year rather than next. There
> are
> > > some critical bugs in 3.6.0 and I don't think we should be waiting that
> > > long to fix them. What do you think?
> > >
> > > Ismael
> > >
> > > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > >
> > > > I'd like to volunteer to be the release manager for a bug fix
> release of
> > > > the 3.6 line. This will be the first bug fix release of this line and
> > > will
> > > > be version 3.6.1. It would contain critical bug fixes for  features
> such
> > > as
> > > > Transaction verification [1], will stabilize Tiered Storage early
> access
> > > > release [2] [3] and upgrade dependencies to fix CVEs such as Netty
> [4]
> > > and
> > > > Zookeeper [5].
> > > >
> > > > If no one has any objections, I will send out a release plan latest
> by
> > > 23rd
> > > > Dec 2023 with a tentative release in mid-Jan 2024. The release plan
> will
> > > > include a list of all of the fixes we are targeting for 3.6.1 along
> with
> > > > the detailed timeline.
> > > >
> > > > If anyone is interested in releasing this sooner, please feel free to
> > > take
> > > > over from me.
> > > >
> > > > Thanks!
> > > >
> > > > Regards,
> > > > Divij Vaidya
> > > > Apache Kafka Committer
> > > >
> > > > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > > > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > > > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > > > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > > > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> > > >
> > >
>


Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Ismael Juma
That would be awesome Mickael.

Ismael

On Mon, Nov 13, 2023 at 7:10 AM Mickael Maison 
wrote:

> Hi Divij,
>
> You beat me to it, I was about to propose doing a 3.6.1 release later this
> week.
> While there's only a dozen or so issues fixed since 3.6.0, as
> mentioned there's a few important dependency upgrades that would be
> good to release.
>
> I'm happy to volunteer to run the release if we agree to releasing
> sooner than initially proposed.
> There seems to only be a few unresolved Jiras targeting 3.6.1 [0] (all
> have PRs with some of them even already merged!).
>
> 0:
> https://issues.apache.org/jira/browse/KAFKA-15552?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC
>
> Thanks,
> Mickael
>
> On Mon, Nov 13, 2023 at 3:57 PM Divij Vaidya 
> wrote:
> >
> > Hi Ismael, I am all in favour of frequent releases. Sooner is always
> > better. Unfortunately, I won't have bandwidth to volunteer for a release
> in
> > December. If someone else volunteers to be RM prior to this timeline, I
> > would be happy to cede the RM role to them but in the worst case
> scenario,
> > my offer to volunteer for Jan release could be considered as a backup.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:
> >
> > > Hi Divij,
> > >
> > > I think we should be releasing 3.6.1 this year rather than next. There
> are
> > > some critical bugs in 3.6.0 and I don't think we should be waiting that
> > > long to fix them. What do you think?
> > >
> > > Ismael
> > >
> > > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > >
> > > > I'd like to volunteer to be the release manager for a bug fix
> release of
> > > > the 3.6 line. This will be the first bug fix release of this line and
> > > will
> > > > be version 3.6.1. It would contain critical bug fixes for  features
> such
> > > as
> > > > Transaction verification [1], will stabilize Tiered Storage early
> access
> > > > release [2] [3] and upgrade dependencies to fix CVEs such as Netty
> [4]
> > > and
> > > > Zookeeper [5].
> > > >
> > > > If no one has any objections, I will send out a release plan latest
> by
> > > 23rd
> > > > Dec 2023 with a tentative release in mid-Jan 2024. The release plan
> will
> > > > include a list of all of the fixes we are targeting for 3.6.1 along
> with
> > > > the detailed timeline.
> > > >
> > > > If anyone is interested in releasing this sooner, please feel free to
> > > take
> > > > over from me.
> > > >
> > > > Thanks!
> > > >
> > > > Regards,
> > > > Divij Vaidya
> > > > Apache Kafka Committer
> > > >
> > > > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > > > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > > > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > > > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > > > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> > > >
> > >
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Николай Ижиков


> To kickstart this effort, we can publish a list of such tickets in the
> community and assign one or more committers the role of a "shepherd" for each
> ticket.

Please, do it.
We can use a specific label to effectively filter those tickets.

> On 13 Nov 2023, at 15:16, Divij Vaidya wrote:
> 
> Thanks for bringing this up David.
> 
> My primary concern revolves around the possibility that the currently
> disabled tests may remain inactive indefinitely. We currently have
> unresolved JIRA tickets for flaky tests that have been pending for an
> extended period. I am inclined to support the idea of disabling these tests
> temporarily and merging changes only when the build is successful, provided
> there is a clear plan for re-enabling them in the future.
> 
> To address this issue, I propose the following measures:
> 
> 1\ Foster a supportive environment for new contributors within the
> community, encouraging them to take on tickets associated with flaky tests.
> This initiative would require individuals familiar with the relevant code
> to offer guidance to those undertaking these tasks. Committers should
> prioritize reviewing and addressing these tickets within their available
> bandwidth. To kickstart this effort, we can publish a list of such tickets
> in the community and assign one or more committers the role of a "shepherd"
> for each ticket.
> 
> 2\ Implement a policy to block minor version releases until the Release
> Manager (RM) is satisfied that the disabled tests do not result in gaps in
> our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
> the specific code areas to provide assurance before giving the green light
> for a release.
> 
> 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> Integration (CI) system. This goal should encompass projects such as
> refining our test suite to eliminate flakiness and addressing
> infrastructure issues if necessary. By publishing this goal, we create a
> shared vision for the community in 2024, fostering alignment on our
> objectives. This alignment will aid in prioritizing tasks for community
> members and guide reviewers in allocating their bandwidth effectively.
> 
> --
> Divij Vaidya
> 
> 
> 
> On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan 
> wrote:
> 
>> I will say that I have also seen tests that seem to be more flaky
>> intermittently. It may be ok for some time and suddenly the CI is
>> overloaded and we see issues.
>> I have also seen the CI struggling with running out of space recently, so I
>> wonder if we can also try to improve things on that front.
>> 
>> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
>> I'm happy to try to get to green builds, but everyone needs to be on board.
>> 
>> https://issues.apache.org/jira/browse/KAFKA-15529
>> https://issues.apache.org/jira/browse/KAFKA-14806
>> https://issues.apache.org/jira/browse/KAFKA-14249
>> https://issues.apache.org/jira/browse/KAFKA-15798
>> https://issues.apache.org/jira/browse/KAFKA-15797
>> https://issues.apache.org/jira/browse/KAFKA-15690
>> https://issues.apache.org/jira/browse/KAFKA-15699
>> https://issues.apache.org/jira/browse/KAFKA-15772
>> https://issues.apache.org/jira/browse/KAFKA-15759
>> https://issues.apache.org/jira/browse/KAFKA-15760
>> https://issues.apache.org/jira/browse/KAFKA-15700
>> 
>> I've also seen that kraft transactions tests often flakily see that the
>> producer id is not allocated and times out.
>> I can file a JIRA for that too.
>> 
>> Hopefully this is a place we can start from.
>> 
>> Justine
>> 
>> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
>> 
>>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
>> wrote:
>>> 
>>>> In other words, I’m biased to think that new flakiness indicates
>>>> non-deterministic bugs more often than it indicates a bad test.
>>>>
>>> 
>>> My experience is exactly the opposite. As someone who has tracked many of
>>> the flaky fixes, the vast majority of the time they are an issue with the
>>> test.
>>> 
>>> Ismael
>>> 
>> 



Re: [VOTE] KIP-892: Transactional StateStores

2023-11-13 Thread Colt McNealy
+1 (non-binding).

Thank you, Nick, for making all of the changes (especially around the
`default.state.isolation.level` config).

Colt McNealy

*Founder, LittleHorse.dev*


On Mon, Nov 13, 2023 at 7:15 AM Nick Telford  wrote:

> Hi everyone,
>
> I'd like to call a vote for KIP-892: Transactional StateStores[1], which
> makes Kafka Streams StateStores transactional under EOS.
>
> Regards,
>
> Nick
>
> 1:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-892%3A+Transactional+Semantics+for+StateStores
>


[VOTE] KIP-892: Transactional StateStores

2023-11-13 Thread Nick Telford
Hi everyone,

I'd like to call a vote for KIP-892: Transactional StateStores[1], which
makes Kafka Streams StateStores transactional under EOS.

Regards,

Nick

1:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-892%3A+Transactional+Semantics+for+StateStores


Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Mickael Maison
Hi Divij,

You beat me to it, I was about to propose doing a 3.6.1 release later this week.
While there's only a dozen or so issues fixed since 3.6.0, as
mentioned there's a few important dependency upgrades that would be
good to release.

I'm happy to volunteer to run the release if we agree to releasing
sooner than initially proposed.
There seems to only be a few unresolved Jiras targeting 3.6.1 [0] (all
have PRs with some of them even already merged!).

0: 
https://issues.apache.org/jira/browse/KAFKA-15552?jql=project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%203.6.1%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC

Thanks,
Mickael

On Mon, Nov 13, 2023 at 3:57 PM Divij Vaidya  wrote:
>
> Hi Ismael, I am all in favour of frequent releases. Sooner is always
> better. Unfortunately, I won't have bandwidth to volunteer for a release in
> December. If someone else volunteers to be RM prior to this timeline, I
> would be happy to cede the RM role to them but in the worst case scenario,
> my offer to volunteer for Jan release could be considered as a backup.
>
> --
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:
>
> > Hi Divij,
> >
> > I think we should be releasing 3.6.1 this year rather than next. There are
> > some critical bugs in 3.6.0 and I don't think we should be waiting that
> > long to fix them. What do you think?
> >
> > Ismael
> >
> > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> > wrote:
> >
> > > Hey folks,
> > >
> > >
> > > > I'd like to volunteer to be the release manager for a bug fix release of
> > > > the 3.6 line. This will be the first bug fix release of this line and will
> > > > be version 3.6.1. It would contain critical bug fixes for features such as
> > > > Transaction verification [1], stabilize the Tiered Storage early access
> > > > release [2] [3], and upgrade dependencies such as Netty [4] and
> > > > Zookeeper [5] to fix CVEs.
> > > >
> > > > If no one has any objections, I will send out a release plan by 23rd
> > > > Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The
> > > > release plan will include a list of all of the fixes we are targeting for
> > > > 3.6.1 along with the detailed timeline.
> > > >
> > > > If anyone is interested in releasing this sooner, please feel free to take
> > > > over from me.
> > >
> > > Thanks!
> > >
> > > Regards,
> > > Divij Vaidya
> > > Apache Kafka Committer
> > >
> > > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> > >
> >


Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Ismael Juma
Sounds good. Let's see if someone else volunteers for an earlier 3.6.1.
I'll ping some people at Confluent to check if anyone has time.

Ismael

On Mon, Nov 13, 2023 at 6:57 AM Divij Vaidya 
wrote:

> Hi Ismael, I am all in favour of frequent releases. Sooner is always
> better. Unfortunately, I won't have bandwidth to volunteer for a release in
> December. If someone else volunteers to be RM prior to this timeline, I
> would be happy to cede the RM role to them, but in the worst case scenario,
> my offer to volunteer for a Jan release could be considered as a backup.
>
> --
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:
>
> > Hi Divij,
> >
> > I think we should be releasing 3.6.1 this year rather than next. There are
> > some critical bugs in 3.6.0 and I don't think we should be waiting that
> > long to fix them. What do you think?
> >
> > Ismael
> >
> > On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> > wrote:
> >
> > > Hey folks,
> > >
> > >
> > > I'd like to volunteer to be the release manager for a bug fix release of
> > > the 3.6 line. This will be the first bug fix release of this line and will
> > > be version 3.6.1. It would contain critical bug fixes for features such as
> > > Transaction verification [1], stabilize the Tiered Storage early access
> > > release [2] [3], and upgrade dependencies such as Netty [4] and
> > > Zookeeper [5] to fix CVEs.
> > >
> > > If no one has any objections, I will send out a release plan by 23rd
> > > Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The
> > > release plan will include a list of all of the fixes we are targeting for
> > > 3.6.1 along with the detailed timeline.
> > >
> > > If anyone is interested in releasing this sooner, please feel free to take
> > > over from me.
> > >
> > > Thanks!
> > >
> > > Regards,
> > > Divij Vaidya
> > > Apache Kafka Committer
> > >
> > > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> > >
> >
>


Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Divij Vaidya
Hi Ismael, I am all in favour of frequent releases. Sooner is always
better. Unfortunately, I won't have bandwidth to volunteer for a release in
December. If someone else volunteers to be RM prior to this timeline, I
would be happy to cede the RM role to them, but in the worst case scenario,
my offer to volunteer for a Jan release could be considered as a backup.

--
Divij Vaidya



On Mon, Nov 13, 2023 at 3:40 PM Ismael Juma  wrote:

> Hi Divij,
>
> I think we should be releasing 3.6.1 this year rather than next. There are
> some critical bugs in 3.6.0 and I don't think we should be waiting that
> long to fix them. What do you think?
>
> Ismael
>
> On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
> wrote:
>
> > Hey folks,
> >
> >
> > I'd like to volunteer to be the release manager for a bug fix release of
> > the 3.6 line. This will be the first bug fix release of this line and will
> > be version 3.6.1. It would contain critical bug fixes for features such as
> > Transaction verification [1], stabilize the Tiered Storage early access
> > release [2] [3], and upgrade dependencies such as Netty [4] and
> > Zookeeper [5] to fix CVEs.
> >
> > If no one has any objections, I will send out a release plan by 23rd
> > Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The
> > release plan will include a list of all of the fixes we are targeting for
> > 3.6.1 along with the detailed timeline.
> >
> > If anyone is interested in releasing this sooner, please feel free to take
> > over from me.
> >
> > Thanks!
> >
> > Regards,
> > Divij Vaidya
> > Apache Kafka Committer
> >
> > [1] https://issues.apache.org/jira/browse/KAFKA-15653
> > [2] https://issues.apache.org/jira/browse/KAFKA-15481
> > [3] https://issues.apache.org/jira/browse/KAFKA-15695
> > [4] https://issues.apache.org/jira/browse/KAFKA-15644
> > [5] https://issues.apache.org/jira/browse/KAFKA-15596
> >
>


Re: [DISCUSS] KIP-1002: Fetch remote segment indexes at once

2023-11-13 Thread Divij Vaidya
Hi Jorge

1. I don't think we need a new API here because alternative solutions
exist even with the current API. As an example, when the first index is
fetched, the RSM plugin can choose to download all indexes and cache them
locally. On the next call to fetch an index from the remote tier, we will
hit the cache and retrieve the index from there.

2. The KIP assumes that all indexes are required at all times. However,
indexes such as transaction indexes are only required for read_committed
fetches, and the time index is only required when a fetch call wants to
search for an offset by timestamp. As a future step in Tiered Storage, I
would actually prefer to move in a direction where we lazily fetch indexes
on demand instead of fetching them together as proposed in the KIP.
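
To make point 1 concrete, here is a minimal sketch of a plugin-side cache
that downloads every index for a segment on the first request and serves
later requests locally. It deliberately does not use the real
RemoteStorageManager interface: IndexType, RemoteIndexStore and
PrefetchingIndexCache are hypothetical stand-ins invented for illustration,
and segments are identified by a plain string.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced set of index types (illustration only).
enum IndexType { OFFSET, TIMESTAMP, TRANSACTION }

// Hypothetical stand-in for the remote tier; one call = one remote download.
interface RemoteIndexStore {
    byte[] download(String segmentId, IndexType type);
}

// Sketch: on the first index request for a segment, download all of its
// indexes and keep them in memory; later requests hit the cache.
final class PrefetchingIndexCache {
    private final RemoteIndexStore store;
    private final Map<String, Map<IndexType, byte[]>> cache = new ConcurrentHashMap<>();

    PrefetchingIndexCache(RemoteIndexStore store) {
        this.store = store;
    }

    byte[] fetchIndex(String segmentId, IndexType type) {
        Map<IndexType, byte[]> indexes = cache.computeIfAbsent(segmentId, id -> {
            Map<IndexType, byte[]> all = new EnumMap<>(IndexType.class);
            for (IndexType t : IndexType.values()) {
                all.put(t, store.download(id, t));
            }
            return all;
        });
        return indexes.get(type);
    }
}
```

The lazy direction from point 2 would instead populate one cache entry per
(segment, index type) pair, only when that specific index is requested.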

--
Divij Vaidya



On Fri, Nov 10, 2023 at 4:00 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hello everyone,
>
> I would like to start the discussion on a KIP for Tiered Storage. It's
> about improving cross-segment latencies by reducing calls to fetch indexes
> individually.
> Have a look:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1002%3A+Fetch+remote+segment+indexes+at+once
>
> Cheers,
> Jorge
>


Re: Apache Kafka 3.6.1 release

2023-11-13 Thread Ismael Juma
Hi Divij,

I think we should be releasing 3.6.1 this year rather than next. There are
some critical bugs in 3.6.0 and I don't think we should be waiting that
long to fix them. What do you think?

Ismael

On Mon, Nov 13, 2023 at 6:32 AM Divij Vaidya 
wrote:

> Hey folks,
>
>
> I'd like to volunteer to be the release manager for a bug fix release of
> the 3.6 line. This will be the first bug fix release of this line and will
> be version 3.6.1. It would contain critical bug fixes for features such as
> Transaction verification [1], stabilize the Tiered Storage early access
> release [2] [3], and upgrade dependencies such as Netty [4] and
> Zookeeper [5] to fix CVEs.
>
> If no one has any objections, I will send out a release plan by 23rd
> Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The
> release plan will include a list of all of the fixes we are targeting for
> 3.6.1 along with the detailed timeline.
>
> If anyone is interested in releasing this sooner, please feel free to take
> over from me.
>
> Thanks!
>
> Regards,
> Divij Vaidya
> Apache Kafka Committer
>
> [1] https://issues.apache.org/jira/browse/KAFKA-15653
> [2] https://issues.apache.org/jira/browse/KAFKA-15481
> [3] https://issues.apache.org/jira/browse/KAFKA-15695
> [4] https://issues.apache.org/jira/browse/KAFKA-15644
> [5] https://issues.apache.org/jira/browse/KAFKA-15596
>


Apache Kafka 3.6.1 release

2023-11-13 Thread Divij Vaidya
Hey folks,


I'd like to volunteer to be the release manager for a bug fix release of
the 3.6 line. This will be the first bug fix release of this line and will
be version 3.6.1. It would contain critical bug fixes for features such as
Transaction verification [1], stabilize the Tiered Storage early access
release [2] [3], and upgrade dependencies such as Netty [4] and
Zookeeper [5] to fix CVEs.

If no one has any objections, I will send out a release plan by 23rd
Dec 2023 at the latest, with a tentative release in mid-Jan 2024. The
release plan will include a list of all of the fixes we are targeting for
3.6.1 along with the detailed timeline.

If anyone is interested in releasing this sooner, please feel free to take
over from me.

Thanks!

Regards,
Divij Vaidya
Apache Kafka Committer

[1] https://issues.apache.org/jira/browse/KAFKA-15653
[2] https://issues.apache.org/jira/browse/KAFKA-15481
[3] https://issues.apache.org/jira/browse/KAFKA-15695
[4] https://issues.apache.org/jira/browse/KAFKA-15644
[5] https://issues.apache.org/jira/browse/KAFKA-15596


Re: [DISCUSS] KIP-977: Partition-Level Throughput Metrics

2023-11-13 Thread Divij Vaidya
Thank you for updating the KIP Qichao.

I don't have any more questions or suggestions. Looks good to move forward
from my perspective.



--
Divij Vaidya



On Fri, Nov 10, 2023 at 2:25 PM Qichao Chu  wrote:

> Thank you again for the nice suggestions, Jorge!
> I will wait for Divij's response and move it to the vote stage once the
> generic filter part reaches consensus.
>
> Qichao Chu
> Software Engineer | Data - Kafka
>
>
> On Fri, Nov 10, 2023 at 6:49 AM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Qichao,
> >
> > Thanks for updating the KIP, all updates look good to me.
> >
> > Looking forward to seeing this KIP moving forward!
> >
> > Cheers,
> > Jorge.
> >
> >
> >
> > On Wed, 8 Nov 2023 at 08:55, Qichao Chu  wrote:
> >
> > > Hi Divij,
> > >
> > > > Thank you for the feedback. I updated the KIP to make it a little bit more
> > > > generic: filters will stay in an array instead of different top-level
> > > > objects. In this way, if we need language filters in the future, they can
> > > > be added to the same array. The logical relationship of filters is also
> > > > added.
> > >
> > > Hi Jorge,
> > >
> > > > Thank you for the review and great comments. Here is the reply for each of
> > > > the suggestions:
> > > >
> > > > 1) The words describing the property are now updated to include more
> > > > details of the keys in the JSON. It also explicitly mentions the JSON
> > > > nature of the config now.
> > > > 2) The JSON entries should be non-conflicting, so the order is not relevant.
> > > > If there is a conflict, the conflict resolution rules are stated in the KIP.
> > > > To make it clearer, ordering and duplication rules are updated in the
> > > > Restrictions section of the *level* property.
> > > > 3) Yeah, we did take a look at the RecordingLevel config and it does not
> > > > work for this case. The RecordingLevel config does not offer the capability
> > > > of filtering and it has the drawback of needing to be added to all future
> > > > sensors. To reduce the duplication, I propose we merge RecordingLevel
> > > > into this more generic config in the future. Please take a look at the
> > > > *Using the Existing RecordingLevel Config* section under *Rejected
> > > > Alternatives* for more details.
> > > > 4) This suggestion makes a lot of sense. My idea is to create a
> > > > table/form/doc in the documentation for the verbosity levels of all metric
> > > > series. If it's too verbose to be in the docs, I will update the KIP to
> > > > include this info. I will create a JIRA for this effort once the KIP is
> > > > approved.
> > > > 5) Sure, we can expand to all other series; added to the KIP.
> > > > 6) Added a new section (*Working with the Configuration via CLI*) with the
> > > > user experience details.
> > > > 7) Links are updated.
> > > 7) Links are updated.
> > >
> > > Please take another look and let me know if you have any more concerns.
> > >
> > > Best,
> > > Qichao Chu
> > > Software Engineer | Data - Kafka
> > >
> > >
> > > On Wed, Nov 8, 2023 at 6:29 AM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Hi Qichao,
> > > >
> > > > > Thanks for the KIP! This will be a valuable contribution and improve the
> > > > > tooling for troubleshooting.
> > > > >
> > > > > I have a couple of comments:
> > > > >
> > > > > 1. It's unclear from the `metrics.verbosity` description what the supported
> > > > > values are. The description mentions "If the value is high ... In the
> > > > > low settings" but I think it's referring to the `level` property
> > > > > specifically instead of the whole value that is now JSON. Could you clarify
> > > > > this?
> > > > >
> > > > > 2. Could we state in which order the JSON entries are going to be
> > > > > evaluated? I guess the last entry wins if it overlaps previous values, but
> > > > > better to make this explicit.
> > > > >
> > > > > 3. Kafka metrics library has a `RecordingLevel` configuration -- have we
> > > > > considered aligning these concepts and maybe reusing it instead of
> > > > > `verbosityLevel`? Then we can reuse the levels: INFO, DEBUG, TRACE.
> > > > >
> > > > > 4. Not sure if this is within the scope of the KIP, but it would be helpful
> > > > > to document the metrics with the verbosity level attached to the metrics.
> > > > > Maybe creating a JIRA ticket to track this would be enough if we can't
> > > > > cover it as part of the KIP.
> > > >
> > > > 5. Could we consider the following client-related metrics as well:
> > > >   - BytesRejectedPerSec
> > > >   - TotalProduceRequestsPerSec
> > > >   - TotalFetchRequestsPerSec
> > > >   - FailedProduceRequestsPerSec
> > > >   - FailedFetchRequestsPerSec
> > > >   - FetchMessageConversionsPerSec
> > > >   - ProduceMessageConversionsPerSec
> > > > > Would be great to have these from day 1 instead of requiring a following
> > > > > KIP to extend this. Could be implemented in separate PRs if needed.
> > > >
> > > > 6. To make it clearer how the user experience would be, 

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2380

2023-11-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 431558 lines...]
> Task :connect:json:compileTestJava UP-TO-DATE
> Task :connect:json:testClasses UP-TO-DATE
> Task :connect:api:jar UP-TO-DATE
> Task :connect:json:testJar
> Task :connect:api:generateMetadataFileForMavenJavaPublication
> Task :connect:json:copyDependantLibs UP-TO-DATE
> Task :connect:json:jar UP-TO-DATE
> Task :connect:api:compileTestJava UP-TO-DATE
> Task :connect:api:testClasses UP-TO-DATE
> Task :connect:json:generateMetadataFileForMavenJavaPublication
> Task :connect:json:testSrcJar
> Task :connect:api:testJar
> Task :connect:api:testSrcJar
> Task :connect:api:publishMavenJavaPublicationToMavenLocal
> Task :connect:api:publishToMavenLocal
> Task :connect:json:publishMavenJavaPublicationToMavenLocal
> Task :connect:json:publishToMavenLocal
> Task :storage:storage-api:compileTestJava
> Task :storage:storage-api:testClasses
> Task :server-common:compileTestJava
> Task :server-common:testClasses
> Task :raft:compileTestJava
> Task :raft:testClasses
> Task :core:compileScala
> Task :group-coordinator:compileTestJava
> Task :group-coordinator:testClasses

> Task :clients:javadoc
/home/jenkins/workspace/Kafka_kafka_trunk/clients/src/main/java/org/apache/kafka/clients/admin/ScramMechanism.java:32:
 warning - Tag @see: missing final '>': "https://cwiki.apache.org/confluence/display/KAFKA/KIP-554%3A+Add+Broker-side+SCRAM+Config+API;>KIP-554:
 Add Broker-side SCRAM Config API

 This code is duplicated in 
org.apache.kafka.common.security.scram.internals.ScramMechanism.
 The type field in both files must match and must not change. The type field
 is used both for passing ScramCredentialUpsertion and for the internal
 UserScramCredentialRecord. Do not change the type field."
/home/jenkins/workspace/Kafka_kafka_trunk/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/package-info.java:21:
 warning - Tag @link: reference not found: 
org.apache.kafka.common.security.oauthbearer
2 warnings

> Task :clients:javadocJar
> Task :metadata:compileTestJava
> Task :metadata:testClasses
> Task :clients:srcJar
> Task :clients:testJar
> Task :clients:testSrcJar
> Task :clients:publishMavenJavaPublicationToMavenLocal
> Task :clients:publishToMavenLocal
> Task :core:classes
> Task :core:compileTestJava NO-SOURCE
> Task :core:compileTestScala
> Task :core:testClasses
> Task :streams:compileTestJava
> Task :streams:testClasses
> Task :streams:testJar
> Task :streams:testSrcJar
> Task :streams:publishMavenJavaPublicationToMavenLocal
> Task :streams:publishToMavenLocal

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

For more on this, please refer to 
https://docs.gradle.org/8.3/userguide/command_line_interface.html#sec:command_line_warnings
 in the Gradle documentation.

BUILD SUCCESSFUL in 4m 58s
93 actionable tasks: 40 executed, 53 up-to-date

Publishing build scan...
https://ge.apache.org/s/yxybgkorhgn5k

[Pipeline] sh
+ grep ^version= gradle.properties
+ cut -d= -f 2
[Pipeline] dir
Running in /home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart
[Pipeline] {
[Pipeline] sh
+ mvn clean install -Dgpg.skip
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Kafka Streams :: Quickstart[pom]
[INFO] streams-quickstart-java[maven-archetype]
[INFO] 
[INFO] < org.apache.kafka:streams-quickstart >-
[INFO] Building Kafka Streams :: Quickstart 3.7.0-SNAPSHOT[1/2]
[INFO]   from pom.xml
[INFO] [ pom ]-
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ streams-quickstart ---
[INFO] 
[INFO] --- remote-resources:1.5:process (process-resource-bundles) @ 
streams-quickstart ---
[INFO] 
[INFO] --- site:3.5.1:attach-descriptor (attach-descriptor) @ 
streams-quickstart ---
[INFO] 
[INFO] --- gpg:1.6:sign (sign-artifacts) @ streams-quickstart ---
[INFO] 
[INFO] --- install:2.5.2:install (default-install) @ streams-quickstart ---
[INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.7.0-SNAPSHOT/streams-quickstart-3.7.0-SNAPSHOT.pom
[INFO] 
[INFO] --< org.apache.kafka:streams-quickstart-java >--
[INFO] Building streams-quickstart-java 3.7.0-SNAPSHOT[2/2]
[INFO]   from java/pom.xml
[INFO] --[ maven-archetype ]---
[INFO] 
[INFO] --- clean:3.0.0:clean (default-clean) @ streams-quickstart-java 

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Divij Vaidya
Thanks for bringing this up David.

My primary concern revolves around the possibility that the currently
disabled tests may remain inactive indefinitely. We currently have
unresolved JIRA tickets for flaky tests that have been pending for an
extended period. I am inclined to support the idea of disabling these tests
temporarily and merging changes only when the build is successful, provided
there is a clear plan for re-enabling them in the future.

To address this issue, I propose the following measures:

1\ Foster a supportive environment for new contributors within the
community, encouraging them to take on tickets associated with flaky tests.
This initiative would require individuals familiar with the relevant code
to offer guidance to those undertaking these tasks. Committers should
prioritize reviewing and addressing these tickets within their available
bandwidth. To kickstart this effort, we can publish a list of such tickets
in the community and assign one or more committers the role of a "shepherd"
for each ticket.

2\ Implement a policy to block minor version releases until the Release
Manager (RM) is satisfied that the disabled tests do not result in gaps in
our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
the specific code areas to provide assurance before giving the green light
for a release.

3\ Set a community-wide goal for 2024 to achieve a stable Continuous
Integration (CI) system. This goal should encompass projects such as
refining our test suite to eliminate flakiness and addressing
infrastructure issues if necessary. By publishing this goal, we create a
shared vision for the community in 2024, fostering alignment on our
objectives. This alignment will aid in prioritizing tasks for community
members and guide reviewers in allocating their bandwidth effectively.

--
Divij Vaidya



On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan 
wrote:

> I will say that I have also seen tests that seem to be more flaky
> intermittently. It may be ok for some time and suddenly the CI is
> overloaded and we see issues.
> I have also seen the CI struggling with running out of space recently, so I
> wonder if we can also try to improve things on that front.
>
> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
> I'm happy to try to get to green builds, but everyone needs to be on board.
>
> https://issues.apache.org/jira/browse/KAFKA-15529
> https://issues.apache.org/jira/browse/KAFKA-14806
> https://issues.apache.org/jira/browse/KAFKA-14249
> https://issues.apache.org/jira/browse/KAFKA-15798
> https://issues.apache.org/jira/browse/KAFKA-15797
> https://issues.apache.org/jira/browse/KAFKA-15690
> https://issues.apache.org/jira/browse/KAFKA-15699
> https://issues.apache.org/jira/browse/KAFKA-15772
> https://issues.apache.org/jira/browse/KAFKA-15759
> https://issues.apache.org/jira/browse/KAFKA-15760
> https://issues.apache.org/jira/browse/KAFKA-15700
>
> I've also seen that the KRaft transaction tests are often flaky: the
> producer ID is not allocated and the test times out.
> I can file a JIRA for that too.
>
> Hopefully this is a place we can start from.
>
> Justine
>
> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
>
> > On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
> wrote:
> >
> > > In other words, I’m biased to think that new flakiness indicates
> > > non-deterministic bugs more often than it indicates a bad test.
> > >
> >
> > My experience is exactly the opposite. As someone who has tracked many of
> > the flaky fixes, the vast majority of the time they are an issue with the
> > test.
> >
> > Ismael
> >
>


[jira] [Created] (KAFKA-15813) Improve implementation of client instance cache

2023-11-13 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15813:
-

 Summary: Improve implementation of client instance cache
 Key: KAFKA-15813
 URL: https://issues.apache.org/jira/browse/KAFKA-15813
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


In the current implementation the ClientMetricsManager uses an LRU cache, but we 
should also support expiring stale clients, i.e. clients which haven't reported 
metrics for a while.

 

The KIP mentions: This client instance specific state is maintained in broker 
memory up to MAX(60*1000, PushIntervalMs * 3) milliseconds and is used to 
enforce the push interval rate-limiting. There is no persistence of client 
instance metrics state across broker restarts or between brokers 
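
A minimal sketch of how LRU eviction could be combined with time-based expiry 
of stale clients. This is not the ClientMetricsManager code: the class 
ExpiringClientCache, its methods and the explicit nowMs parameters are 
hypothetical, chosen for illustration. Only the retention formula 
MAX(60*1000, PushIntervalMs * 3) comes from the KIP text quoted above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-bounded LRU cache whose entries also expire.
final class ExpiringClientCache<K, V> {
    // Value plus the time after which the entry counts as stale.
    private static final class Slot<V> {
        final V value;
        final long expiresAtMs;
        Slot(V value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final int maxSize;
    private final LinkedHashMap<K, Slot<V>> map;

    ExpiringClientCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder=true keeps the least recently used entry first.
        this.map = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                return size() > ExpiringClientCache.this.maxSize;
            }
        };
    }

    // Retention window from the KIP: MAX(60*1000, PushIntervalMs * 3).
    static long retentionMs(long pushIntervalMs) {
        return Math.max(60 * 1000L, pushIntervalMs * 3);
    }

    // Called when a client reports metrics; refreshes its expiry time.
    synchronized void put(K key, V value, long pushIntervalMs, long nowMs) {
        map.put(key, new Slot<>(value, nowMs + retentionMs(pushIntervalMs)));
    }

    // Returns null for unknown or stale clients; stale entries are dropped.
    synchronized V get(K key, long nowMs) {
        Slot<V> slot = map.get(key);
        if (slot == null) {
            return null;
        }
        if (slot.expiresAtMs <= nowMs) {
            map.remove(key);
            return null;
        }
        return slot.value;
    }

    // Periodic sweep: removes every stale entry, returns how many were removed.
    synchronized int expireStale(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<K, Slot<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().expiresAtMs <= nowMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int size() {
        return map.size();
    }
}
```

In this sketch the sweep in expireStale would be driven by a timer, while 
get also drops a stale entry when it is looked up, so a stale client is 
never served even between sweeps.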



--
This message was sent by Atlassian Jira
(v8.20.10#820010)