Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Luke Chen
Thanks Bruno!

+1

Luke

On Sat, Feb 5, 2022 at 10:50 AM Guozhang Wang  wrote:

> Thanks Bruno! +1
>
> On Fri, Feb 4, 2022 at 4:14 PM Ismael Juma  wrote:
>
> > Thanks for volunteering, Bruno. +1!
> >
> > Ismael
> >
> > On Fri, Feb 4, 2022 at 7:03 AM Bruno Cadonna  wrote:
> >
> > > Hi,
> > >
> > > I'd like to volunteer to be the release manager for our next
> > > feature release, 3.2.0. If there are no objections, I'll send
> > > out the release plan soon.
> > >
> > > Best,
> > > Bruno
> > >
> >
>
>
> --
> -- Guozhang
>


[jira] [Resolved] (KAFKA-13346) Kafka Streams fails due to RocksDB Locks Not Available Exception

2022-02-04 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13346.
---
Resolution: Not A Problem

> Kafka Streams fails due to RocksDB Locks Not Available Exception
> 
>
> Key: KAFKA-13346
> URL: https://issues.apache.org/jira/browse/KAFKA-13346
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Amit Gupta
>Priority: Major
>
> Hello,
> We are using Kafka Streams, and we observe that sometimes, on some of the
> hosts running the streams application, the Kafka Streams instance fails with
> an unexpected exception. We are running with 40 stream threads per host and 20
> hosts in total.
> Can someone please help identify the root cause here?
>  
> |org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> state-store at location .
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:214)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:188)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:224)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:42)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$0(MeteredKeyValueStore.java:101)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:101)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:199)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:76)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.StandbyTask.initializeIfNeeded(StandbyTask.java:95)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:426)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:660)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:510)
>  ~[kafka-streams-2.6.0.jar:?]
>  Caused by: org.rocksdb.RocksDBException: lock : 
> ./0_468/rocksdb/state-store/LOCK: No locks available
>  at org.rocksdb.RocksDB.open(Native Method) ~[rocksdbjni-5.18.3.jar:?]
>  at org.rocksdb.RocksDB.open(RocksDB.java:286) ~[rocksdbjni-5.18.3.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:211)
>  ~[kafka-streams-2.6.0.jar:?]
>  ... 15 more
>   
>  Sometimes I also see this exception
>   |
> |org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> state-store at location ./0_433/rocksdb/state-store
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openRocksDB(RocksDBStore.java:214)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:188)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:224)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.init(ChangeLoggingKeyValueBytesStore.java:42)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.WrappedStateStore.init(WrappedStateStore.java:48)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$init$0(MeteredKeyValueStore.java:101)
>  ~[kafka-streams-2.6.0.jar:?]
>  at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:836)
>  

Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Guozhang Wang
Thanks Bruno! +1

On Fri, Feb 4, 2022 at 4:14 PM Ismael Juma  wrote:

> Thanks for volunteering, Bruno. +1!
>
> Ismael
>
> On Fri, Feb 4, 2022 at 7:03 AM Bruno Cadonna  wrote:
>
> > Hi,
> >
> > I'd like to volunteer to be the release manager for our next
> > feature release, 3.2.0. If there are no objections, I'll send
> > out the release plan soon.
> >
> > Best,
> > Bruno
> >
>


-- 
-- Guozhang


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #661

2022-02-04 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Ismael Juma
Thanks for volunteering, Bruno. +1!

Ismael

On Fri, Feb 4, 2022 at 7:03 AM Bruno Cadonna  wrote:

> Hi,
>
> I'd like to volunteer to be the release manager for our next
> feature release, 3.2.0. If there are no objections, I'll send
> out the release plan soon.
>
> Best,
> Bruno
>


Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Israel Ekpo
+1 thanks for taking this on.

On Fri, Feb 4, 2022 at 11:51 AM Bill Bejeck  wrote:

> +1.  Thanks for volunteering to run the release, Bruno!
>
> -Bill
>
> On Fri, Feb 4, 2022 at 10:18 AM David Jacot  wrote:
>
> > +1. Thanks for volunteering, Bruno!
> >
> > Le ven. 4 févr. 2022 à 16:03, Bruno Cadonna  a
> écrit :
> >
> > > Hi,
> > >
> > > I'd like to volunteer to be the release manager for our next
> > > feature release, 3.2.0. If there are no objections, I'll send
> > > out the release plan soon.
> > >
> > > Best,
> > > Bruno
> > >
> >
>
-- 
Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://www.youtube.com/c/izzyacademy
https://izzyacademy.com/


[jira] [Resolved] (KAFKA-13322) Java client produces a large amount of garbage during a poll

2022-02-04 Thread Michail (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michail resolved KAFKA-13322.
-
Resolution: Won't Fix

Having done the analysis, the changes required to achieve this are far too
intrusive, so I decided not to proceed with this.

> Java client produces a large amount of garbage during a poll
> 
>
> Key: KAFKA-13322
> URL: https://issues.apache.org/jira/browse/KAFKA-13322
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 3.0.0
>Reporter: Michail
>Priority: Minor
>  Labels: new-rebalance-should-fix
>
> The Java Kafka consumer creates multiple collections during a single poll
> call: in my test system I have a consumer that polls a topic with 100
> partitions, and even though no messages are coming through, the code allocates
> around 100 MB per 5 minutes.
>  
> I've investigated the allocations, and the biggest ones can be easily avoided
> by moving them to the instance level, something that can be done because
> KafkaConsumer is not thread safe. The purpose of this Jira is to get rid of
> most of them by applying either this or a similar approach.
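For illustration only, a minimal Java sketch of the allocation-hoisting idea described above; the class and field names are hypothetical, not the real consumer internals, and the pattern is safe only because KafkaConsumer is confined to a single thread:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: reuse an instance-level collection across poll() calls
    // instead of allocating a new one each time. Valid only under single-threaded
    // use, which is how KafkaConsumer is meant to be used.
    class FetchBufferSketch {
        private final List<String> readyPartitions = new ArrayList<>();

        List<String> collectReadyPartitions(List<String> partitionsWithData) {
            readyPartitions.clear();                 // reuse the same backing array
            readyPartitions.addAll(partitionsWithData);
            return readyPartitions;                  // caller must not retain it across polls
        }
    }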



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] KIP-704: Send a hint to broker if it is an unclean leader

2022-02-04 Thread José Armando García Sancio
David Jacot wrote:

> The behavior of the leader is clear. However, the part which is
> not clear to me is how followers which could get fetch requests
> from consumers as well will handle them. Sorry if I was not clear.

Got it. I updated the KIP to add more information regarding how the
topic partition follower will handle the LEADER_AND_ISR request. In other
words, the follower will return a NOT_LEADER_OR_FOLLOWER error if the
leader is RECOVERING from an unclean leader election.
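As a rough sketch of that rule only (the surrounding types are hypothetical; the Errors constants are the real client-protocol codes, but the broker's actual logic lives in its replica/partition code):

    import org.apache.kafka.common.protocol.Errors;

    // Hypothetical sketch of the follower-side check described above.
    final class RecoveringLeaderFetchCheck {
        enum LeaderRecoveryState { RECOVERED, RECOVERING }

        static Errors checkConsumerFetchOnFollower(LeaderRecoveryState leaderState) {
            // While the partition leader is still RECOVERING from an unclean
            // election, the follower rejects consumer fetches so the client
            // refreshes metadata and retries later.
            return leaderState == LeaderRecoveryState.RECOVERING
                    ? Errors.NOT_LEADER_OR_FOLLOWER
                    : Errors.NONE;
        }
    }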

Thanks,
--
-José


[jira] [Resolved] (KAFKA-13193) Replica manager doesn't update partition state when transitioning from leader to follower with unknown leader

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-13193.
--
Resolution: Fixed

> Replica manager doesn't update partition state when transitioning from leader 
> to follower with unknown leader
> -
>
> Key: KAFKA-13193
> URL: https://issues.apache.org/jira/browse/KAFKA-13193
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft, replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: kip-500
>
> This issue applies to both the ZK and KRaft implementations of the replica
> manager. In the rare case when a replica transitions from leader to follower
> with no leader, the partition state is not updated.
> This is because when handling makeFollowers, the ReplicaManager only updates
> the partition state if the leader is alive. The solution is to always
> transition to follower but not start the fetcher thread if the leader is
> unknown or not alive.
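A hedged sketch of that fix (the actual change is in the Scala ReplicaManager; all types below are illustrative stand-ins):

    import java.util.Optional;

    // Illustrative only: always record the follower transition, but only start a
    // fetcher when there is a live leader to fetch from.
    final class MakeFollowerSketch {
        interface Partition { void transitionToFollower(Optional<Integer> leaderId); }
        interface FetcherManager { void addFetcher(Partition partition, int leaderId); }

        static void makeFollower(Partition partition, Optional<Integer> leaderId,
                                 boolean leaderIsAlive, FetcherManager fetchers) {
            partition.transitionToFollower(leaderId);   // update state even if the leader is unknown
            if (leaderId.isPresent() && leaderIsAlive) {
                fetchers.addFetcher(partition, leaderId.get());
            }
        }
    }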



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] KIP-816: Topology changes without local state reset

2022-02-04 Thread Guozhang Wang
Hi folks,

I think the NamedTopology work would help with the convenience of the
solution for this KIP, but I feel it is not by itself the solution here. If
I'm not mistaken, the scope of this KIP is the following: *assuming the
developer already knows* that a new topology, or part of it (e.g. a state
store), does not change, how can that part of the topology be reused
effectively? Today it is very hard to reuse part (say a state store or an
internal topic) of a previous topology's persistent state because:

1) the names of those persistent states are prefixed by the application id.
2) the names of those persistent states are suffixed by the index, which
reflects the structure of the topology.
3) the dir path of the persistent states are "prefixed" by the task id,
which is hence dependent on the sub-topology id.

My quick thoughts are that 1) is easy to work around as long as users reuse
the same appId; 3) can be tackled with the help of the named topology, although
each named topology can still be composed of multiple sub-topologies, so
extra work is still needed to align the sub-topology ids; but we still need
something to tackle 2) here, which I was pondering between those options
and at the moment leaning towards option 2).
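To make 1)-3) concrete, here is roughly how those names are composed today; the concrete values below are made up for the example:

    // Illustrative values only.
    final class StreamsStateNamingSketch {
        static final String APP_ID = "my-app";                              // 1) application id prefix
        static final String GENERATED_STORE =
                "KSTREAM-AGGREGATE-STATE-STORE-0000000003";                 // 2) index-suffixed generated name
        static final String CHANGELOG_TOPIC =
                APP_ID + "-" + GENERATED_STORE + "-changelog";
        static final String STATE_DIR =                                     // 3) task id (<subTopologyId>_<partition>)
                "/tmp/kafka-streams/" + APP_ID + "/1_0/rocksdb/" + GENERATED_STORE;
    }

Adding or removing an upstream operator can shift both the generated index and the sub-topology id, which is why all three names can change even when the store's data itself has not.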

Does that make sense to you?








On Fri, Feb 4, 2022 at 4:38 AM Nick Telford  wrote:

> Hi Guozhang, Sophie,
>
> Thanks for both taking the time to review my proposal.
>
> I did actually see the NamedTopology classes, and noted that they were
> internal. I didn't realise they are part of an intended solution to this
> problem, that's very interesting. I'm going to try to find some time to
> take a look at your experimental work so I can understand it a bit better.
>
> From your description, it sounds like the NamedTopology approach should
> enable users to solve this problem at the level that they wish to. My
> concern is that users will need to be explicit about how their Topology is
> structured, and will need to know in advance how their Topologies might
> evolve in the future in order to correctly break them up by name. For
> example, if a user mistakenly assumes one particular structure for their
> application, but later makes changes that implicitly cause an existing
> NamedTopology to have its internal Subtopologies re-ordered, the user will
> need to clear all the local state for that NamedTopology, at least.
>
> Unless I'm mistaken, StateStores are defined exclusively by the data in
> their changelogs. Even if you make changes to a Topology that requires
> clearing locally materialized state, the changelogs aren't reset[1], so the
> newly rebuilt state is materialized from the pre-existing values. Even if
> changes are made to the Subtopology that writes to the StateStore, the
> existing data in the changelog hasn't changed. The contents of the
> StateStore evolves. This is exactly the same as a traditional database
> table, where a client may evolve its behaviour to subtly change the
> semantics of the data written to the table, without deleting the existing
> data.
>
> If a user makes a change that means a different Subtopology reads from the
> StateStore, the semantics of, and the data in the store itself hasn't
> actually changed. The only reason we need to reset this local state at all
> is due to the conflict on-disk caused by the change in Subtopology ordinal.
> If local StateStore data was decoupled from Tasks, this conflict would
> disappear, and the application would work as expected.
>
> A Subtopology is defined by all connected topics, including changelogs,
> repartition topics, source topics and sink topics. Whereas a StateStore is
> defined exclusively by its changelog. So why do we tightly couple
> StateStore to Subtopology? This is my central argument for option A that I
> outlined in the KIP, and I would like to discuss it further, even if only
> to educate myself on why it's not possible :-D
>
> I still think the NamedTopology work is valuable, but more as a means to
> better organize large applications.
>
> Regards,
> Nick
>
> 1: The only exception to this I can think of is when a user decides to
> change the format (Serdes) or semantics of the data in the store, in which
> case they would need to do a full reset by also clearing the changelog
> topic for that store. Realistically, users that wish to do this would be
> better off just creating a new store and deleting the old one, so I don't
> think it's a case worth optimizing for.
>
> On Fri, 4 Feb 2022 at 08:22, Sophie Blee-Goldman
>  wrote:
>
> > Hey Nick,
> >
> > thanks for the KIP, this is definitely a much-needed feature. I've
> actually
> > been working on
> > a somewhat similar feature for a while now and have a good chunk of the
> > implementation
> > completed -- but so far it's only exposed via internal APIs and hasn't
> been
> > brought to a KIP
> > yet, as it's a fairly large and complex project and I wanted to get all
> the
> > details hashed out
> > 

[jira] [Created] (KAFKA-13646) Implement KIP-801: KRaft authorizer

2022-02-04 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-13646:


 Summary: Implement KIP-801: KRaft authorizer
 Key: KAFKA-13646
 URL: https://issues.apache.org/jira/browse/KAFKA-13646
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe
Assignee: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12421) Improve controller's atomic grouping

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-12421.
--
  Assignee: Jose Armando Garcia Sancio  (was: HaiyuanZhao)
Resolution: Fixed

> Improve controller's atomic grouping
> 
>
> Key: KAFKA-12421
> URL: https://issues.apache.org/jira/browse/KAFKA-12421
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Reporter: José Armando García Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: kip-500
>
> The current controller implementation atomically appends to the metadata log 
> by making sure that all required records are on the same batch. The 
> controller groups all of the records that result from an RPC into one batch. 
> Some of the RPCs are:
>  # Client quota changes
>  # Configuration changes
>  # Feature changes
>  # Topic creation
> This is good enough for correctness but it is more aggressive than necessary. 
> For example, for topic creation, since errors are reported independently for 
> each topic, the controller only needs to guarantee that all of the records for 
> one topic are committed atomically.
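A hedged sketch of the finer-grained grouping the ticket asks for (the record type is a made-up placeholder, not the controller's actual record classes):

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: split the records produced by one CreateTopics-style RPC
    // into one atomic batch per topic instead of a single batch for the whole RPC.
    final class PerTopicBatchingSketch {
        static final class MetadataRecord {
            final String topic;
            final String payload;
            MetadataRecord(String topic, String payload) { this.topic = topic; this.payload = payload; }
        }

        static List<List<MetadataRecord>> batchesByTopic(List<MetadataRecord> rpcRecords) {
            Map<String, List<MetadataRecord>> perTopic = new LinkedHashMap<>();
            for (MetadataRecord r : rpcRecords) {
                perTopic.computeIfAbsent(r.topic, t -> new ArrayList<>()).add(r);
            }
            return new ArrayList<>(perTopic.values()); // each inner list is appended atomically
        }
    }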



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12214) Generated code does not include UUID or struct fields in its toString output

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-12214.
--
Resolution: Fixed

> Generated code does not include UUID or struct fields in its toString output
> 
>
> Key: KAFKA-12214
> URL: https://issues.apache.org/jira/browse/KAFKA-12214
> Project: Kafka
>  Issue Type: Bug
>  Components: generator
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>  Labels: kip-500
>
> The generated code does not include UUID or struct fields in its toString 
> output.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12271) Expose consistent Raft metadata to components

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-12271.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

> Expose consistent Raft metadata to components
> -
>
> Key: KAFKA-12271
> URL: https://issues.apache.org/jira/browse/KAFKA-12271
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Arthur
>Assignee: Colin McCabe
>Priority: Minor
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> In Raft clusters, we need a way of exposing consistent metadata to components 
> such as ReplicaManager, LogManager, etc. 
> This is done through a new class named MetadataImage. As Raft metadata 
> records are processed, new MetadataImage instances are built and provided to 
> the MetadataCache atomically. This prevents readers from seeing any partially 
> materialized metadata.
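The underlying pattern is an immutable image swapped in atomically; a minimal sketch follows (the real MetadataImage/MetadataCache classes are far richer than this):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicReference;

    // Hedged sketch of "build a new immutable image, then publish it in one step".
    final class MetadataImageSketch {
        static final class Image {
            final Map<String, Integer> partitionCounts;
            Image(Map<String, Integer> counts) {
                this.partitionCounts = Collections.unmodifiableMap(new HashMap<>(counts));
            }
        }

        private final AtomicReference<Image> current =
                new AtomicReference<>(new Image(Collections.emptyMap()));

        void publish(Image next) { current.set(next); }  // readers never see a half-applied batch
        Image read() { return current.get(); }
    }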



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-12209) Add the timeline data structures for the KIP-631 controller

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-12209.
--
Fix Version/s: 2.8.0
 Assignee: Colin McCabe
   Resolution: Fixed

> Add the timeline data structures for the KIP-631 controller
> ---
>
> Key: KAFKA-12209
> URL: https://issues.apache.org/jira/browse/KAFKA-12209
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>  Labels: kip-500
> Fix For: 2.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-10724) Command to run single quorum in raft is missing "--config" parameters.

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-10724.
--
Resolution: Fixed

> Command to run single quorum in raft is missing "--config" parameters.
> --
>
> Key: KAFKA-10724
> URL: https://issues.apache.org/jira/browse/KAFKA-10724
> Project: Kafka
>  Issue Type: Bug
>  Components: core, docs
>Reporter: huldar chen
>Priority: Major
>  Labels: kip-500
>
> When I run "bin/test-raft-server-start.sh config/raft.properties", I get an 
> error:
> [2020-11-14 23:00:38,742] ERROR Exiting Kafka due to fatal exception 
> (kafka.tools.TestRaftServer$)
> org.apache.kafka.common.config.ConfigException: Missing required 
> configuration "zookeeper.connect" which has no default value.
>  at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:478)
>  at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:468)
>  at 
> org.apache.kafka.common.config.AbstractConfig.(AbstractConfig.java:108)
>  at 
> org.apache.kafka.common.config.AbstractConfig.(AbstractConfig.java:142)
>  at kafka.server.KafkaConfig.(KafkaConfig.scala:1314)
>  at kafka.server.KafkaConfig.(KafkaConfig.scala:1317)
>  at kafka.tools.TestRaftServer$.main(TestRaftServer.scala:607)
>  at kafka.tools.TestRaftServer.main(TestRaftServer.scala)
> The correct command is “ ./bin/test-raft-server-start.sh --config 
> ./config/raft.properties”



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-9154) ProducerId generation should be managed by the Controller

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-9154.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

We implemented KIP-730 in Kafka 3.1. Closing this.

> ProducerId generation should be managed by the Controller
> -
>
> Key: KAFKA-9154
> URL: https://issues.apache.org/jira/browse/KAFKA-9154
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Viktor Somogyi-Vass
>Assignee: David Arthur
>Priority: Major
>  Labels: kip-500
> Fix For: 3.1.0
>
>
> Currently producerIds are maintained in ZooKeeper, but in the future we'd like 
> them to be managed by the controller quorum in an internal topic.
> The reason for storing this in ZooKeeper was that the ID must be unique across 
> the cluster. In this task it should be refactored such that the 
> TransactionManager turns to the Controller for a ProducerId, and the Controller 
> connects to ZooKeeper to acquire this ID. Since ZK is the single source of truth 
> and the PID won't be cached anywhere, it should be safe (just one extra hop added).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13503) Validate broker configs for KRaft

2022-02-04 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-13503.
--
Fix Version/s: 3.2.0
 Reviewer: Jose Armando Garcia Sancio
 Assignee: Colin McCabe  (was: dengziming)
   Resolution: Fixed

> Validate broker configs for KRaft
> -
>
> Key: KAFKA-13503
> URL: https://issues.apache.org/jira/browse/KAFKA-13503
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>  Labels: kip-500
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] KIP-801 new authorizer for kip-500 kraft mode

2022-02-04 Thread Colin McCabe
Hi all,

The vote is now closed. KIP-801 has been approved with 3 binding +1s from Jason 
Gustafson, Manikumar Reddy, and David Arthur.

Thanks, all.
Colin

On Wed, Feb 2, 2022, at 16:14, Colin McCabe wrote:
> Thanks, Manikumar. Together with Jason and David's votes (David's vote 
> is in the other thread) that makes 3 +1s.
>
> I will let the vote run for another day and then close it out if there 
> are no more comments.
>
> regards,
> Colin
>
> On Wed, Feb 2, 2022, at 11:42, Manikumar wrote:
>> +1 (binding). Thanks for the KIP.
>>
>> On Tue, Feb 1, 2022 at 10:43 PM Jason Gustafson 
>> wrote:
>>
>>> +1 Thanks!
>>>
>>> On Mon, Jan 31, 2022 at 6:20 PM Colin McCabe  wrote:
>>>
>>> > Hi all,
>>> >
>>> > It looks like people using gmail are seeing the previous vote thread as
>>> > merged with the discuss thread, so let me create a new thread in order to
>>> > avoid confusion. Usually using a very different thread title works well
>>> > enough to avoid the merging.
>>> >
>>> > Original vote thread:
>>> > https://lists.apache.org/thread/jwjhpdll4jp3y6lo9kox3p5thwo8qpk3
>>> >
>>> > best,
>>> > Colin
>>> >
>>>


Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Bill Bejeck
+1.  Thanks for volunteering to run the release, Bruno!

-Bill

On Fri, Feb 4, 2022 at 10:18 AM David Jacot  wrote:

> +1. Thanks for volunteering, Bruno!
>
> Le ven. 4 févr. 2022 à 16:03, Bruno Cadonna  a écrit :
>
> > Hi,
> >
> > I'd like to volunteer to be the release manager for our next
> > feature release, 3.2.0. If there are no objections, I'll send
> > out the release plan soon.
> >
> > Best,
> > Bruno
> >
>


Re: [DISCUSS] KIP-817: Fix inconsistency in dynamic application log levels

2022-02-04 Thread David Jacot
Hi Dongjin,

Thanks for the KIP.

It is not so clear to me why we decided not to support OFF in the
first place. I understand that entirely disabling a logger is rare.

I find the KIP a bit weak at the moment for two reasons:

1) The KIP says that the levels that we use are not fully
consistent with log4j's levels. OFF and ALL are missing. However,
the KIP proposes to only introduce OFF.

2) Introducing ALL is rejected because TRACE could be used. I
think that the same argument applies to OFF: FATAL could be used
to reduce the verbosity to the minimum, and as we rarely use
FATAL in the code base, that is more or less equivalent to OFF.
This is what I usually do, personally.
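For reference, dynamically changing a broker logger already works through the Admin API (KIP-412); if OFF were accepted it would presumably be set the same way. The logger name, broker id, bootstrap address, and the "OFF" value below are assumptions for illustration, since accepting "OFF" is exactly what this KIP proposes:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public final class SetLoggerOffSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");                    // illustrative address
            try (Admin admin = Admin.create(props)) {
                ConfigResource brokerLogger =
                        new ConfigResource(ConfigResource.Type.BROKER_LOGGER, "0"); // broker id 0
                AlterConfigOp silence = new AlterConfigOp(
                        new ConfigEntry("kafka.server.ReplicaFetcherThread", "OFF"), // proposed level
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(brokerLogger, List.of(silence)))
                     .all().get();
            }
        }
    }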

Honestly, I don't feel strong either way so let's see what others
have to say.

Cheers,
David




On Fri, Jan 28, 2022 at 11:04 AM Dongjin Lee  wrote:
>
> Hi Kafka dev,
>
> I would like to start the discussion of KIP-817: Fix inconsistency in
> dynamic application log levels.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-817%3A+Fix+inconsistency+in+dynamic+application+log+levels
>
> This is rather a minor issue, but I found it while working with KIP-653:
> Upgrade log4j to log4j2 (Accepted).
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-653%3A+Upgrade+log4j+to+log4j2
>
> All kinds of feedbacks are greatly appreciated!
>
> Best,
> Dongjin
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
>
>
> *github:  github.com/dongjinleekr
> keybase: https://keybase.io/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck: speakerdeck.com/dongjin
> *


Re: [DISCUSS] KIP-792: Add "generation" field into consumer protocol

2022-02-04 Thread David Jacot
Hi Luke,

Thanks for updating the KIP. I just have a minor request.
Could you fully describe the changes to the Subscription
public class in the KIP? I think that it would be better than
just saying that the generation will be added to it.

Otherwise, the KIP LGTM.

Thanks,
David

On Mon, Nov 29, 2021 at 4:29 AM Luke Chen  wrote:
>
> Hi devs,
> Welcome to provide feedback.
>
> If there are no other comments, I'll start a vote tomorrow.
>
> Thank you.
> Luke
>
>
> On Mon, Nov 22, 2021 at 4:16 PM Luke Chen  wrote:
>
> > Hello David,
> >
> > For (3):
> >
> >
> >
> > * I suppose that we could add a `generation` field to the JoinGroupRequest
> > instead to do the fencing that you describe while handling the sentinel in
> > the assignor directly. If we would add the `generation` to the request
> > itself, would we need the `generation` in the subscription protocol as
> > well?*
> >
> > On second thought, I think this is not better than adding the `generation`
> > field in the subscription protocol, because I think we don't have to do any
> > generation validation on the joinGroup request. The purpose of
> > `joinGroupRequest` is to accept any member joining this group, whether the
> > member is new or has joined before. As long as we have the generationId
> > in the subscription metadata, the consumer group leader can leverage the info
> > to ignore the old ownedPartitions (or do other handling), and the rebalance
> > can still complete successfully and correctly. On the other hand, if we did
> > the generation check on JoinGroupRequest and returned `ILLEGAL_GENERATION`
> > back to the consumer, the consumer would need to clear its generation info and
> > rejoin the group to continue the rebalance. That requires more request/response
> > round trips and slows down the rebalance.
> >
> > So I think we should add the `generationId` field into Subscription
> > protocol to achieve what we want.
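For illustration, a sketch of how the group leader's assignor could use such a generationId; the field and accessor names are assumptions, since the exact Subscription changes are what the KIP is still defining:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: the leader ignores ownedPartitions reported with a stale
    // generation instead of failing the member, so the rebalance still completes.
    final class StaleOwnedPartitionsFilter {
        static final class MemberSubscription {
            final int generationId;                 // -1 would be the sentinel for "unknown"
            final List<String> ownedPartitions;
            MemberSubscription(int generationId, List<String> ownedPartitions) {
                this.generationId = generationId;
                this.ownedPartitions = ownedPartitions;
            }
        }

        static List<String> usableOwnedPartitions(MemberSubscription sub, int currentGeneration) {
            return sub.generationId == currentGeneration
                    ? sub.ownedPartitions
                    : new ArrayList<>();            // stale or unknown generation: treat as owning nothing
        }
    }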
> >
> > Thank you.
> > Luke
> >
> > On Thu, Nov 18, 2021 at 8:51 PM Luke Chen  wrote:
> >
> >> Hi David,
> >> Thanks for your feedback.
> >>
> >> I've updated the KIP for your comments (1)(2).
> >> For (3), it's a good point! Yes, we didn't deserialize the subscription
> >> metadata on broker side, and it's not necessary to add overhead on broker
> >> side. And, yes, I think we can fix the original issue by adding a
> >> "generation" field into `JoinGroupRequest` instead, and also add a field
> >> into `JoinGroupResponse` in `JoinGroupResponseMember` field. That way, the
> >> broker can identify the old member from `JoinGroupRequest`. And the
> >> assignor can also get the "generation" info via the `Subscription` 
> >> instance.
> >>
> >> I'll update the KIP to add "generation" field into `JoinGroupRequest` and
> >> `JoinGroupResponse`, if there is no other options.
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >> On Tue, Nov 16, 2021 at 12:31 AM David Jacot 
> >> wrote:
> >>
> >>> Hi Luke,
> >>>
> >>> Thanks for the KIP. Overall, I think that the motivation makes sense. I
> >>> have a couple of comments/questions:
> >>>
> >>> 1. In the Public Interfaces section, it would be great if you could put
> >>> the
> >>> end state not the current one.
> >>>
> >>> 2. Do we need to update the Subscription class to expose the
> >>> generation? If so, it would be great to mention it in the Public
> >>> Interfaces section as well.
> >>>
> >>> 3. You mention that the broker will set the generation if the
> >>> subscription
> >>> contains a sentinel value (-1). As of today, the broker does not parse
> >>> the subscription so I am not sure how/why we would do this. I suppose
> >>> that we could add a `generation` field to the JoinGroupRequest instead
> >>> to do the fencing that you describe while handling the sentinel in the
> >>> assignor directly. If we would add the `generation` to the request
> >>> itself,
> >>> would we need the `generation` in the subscription protocol as well?
> >>>
> >>> Best,
> >>> David
> >>>
> >>> On Fri, Nov 12, 2021 at 3:31 AM Luke Chen  wrote:
> >>> >
> >>> > Hi all,
> >>> >
> >>> > I'd like to start the discussion for KIP-792: Add "generation" field
> >>> into
> >>> > consumer protocol.
> >>> >
> >>> > The goal of this KIP is to allow assignor/consumer coordinator/group
> >>> > coordinator to have a way to identify the out-of-date
> >>> members/assignments.
> >>> >
> >>> > Detailed description can be found here:
> >>> >
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336614
> >>> >
> >>> > Any feedback is welcome.
> >>> >
> >>> > Thank you.
> >>> > Luke
> >>>
> >>


Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread David Jacot
+1. Thanks for volunteering, Bruno!

Le ven. 4 févr. 2022 à 16:03, Bruno Cadonna  a écrit :

> Hi,
>
> I'd like to volunteer to be the release manager for our next
> feature release, 3.2.0. If there are no objections, I'll send
> out the release plan soon.
>
> Best,
> Bruno
>


[DISCUSS] Apache Kafka 3.2.0 release

2022-02-04 Thread Bruno Cadonna

Hi,

I'd like to volunteer to be the release manager for our next
feature release, 3.2.0. If there are no objections, I'll send
out the release plan soon.

Best,
Bruno


Re: [DISCUSS] KIP-816: Topology changes without local state reset

2022-02-04 Thread Nick Telford
Hi Guozhang, Sophie,

Thanks for both taking the time to review my proposal.

I did actually see the NamedTopology classes, and noted that they were
internal. I didn't realise they are part of an intended solution to this
problem, that's very interesting. I'm going to try to find some time to
take a look at your experimental work so I can understand it a bit better.

From your description, it sounds like the NamedTopology approach should
enable users to solve this problem at the level that they wish to. My
concern is that users will need to be explicit about how their Topology is
structured, and will need to know in advance how their Topologies might
evolve in the future in order to correctly break them up by name. For
example, if a user mistakenly assumes one particular structure for their
application, but later makes changes that implicitly cause an existing
NamedTopology to have its internal Subtopologies re-ordered, the user will
need to clear all the local state for that NamedTopology, at least.

Unless I'm mistaken, StateStores are defined exclusively by the data in
their changelogs. Even if you make changes to a Topology that requires
clearing locally materialized state, the changelogs aren't reset[1], so the
newly rebuilt state is materialized from the pre-existing values. Even if
changes are made to the Subtopology that writes to the StateStore, the
existing data in the changelog hasn't changed. The contents of the
StateStore evolves. This is exactly the same as a traditional database
table, where a client may evolve its behaviour to subtly change the
semantics of the data written to the table, without deleting the existing
data.

If a user makes a change that means a different Subtopology reads from the
StateStore, the semantics of, and the data in the store itself hasn't
actually changed. The only reason we need to reset this local state at all
is due to the conflict on-disk caused by the change in Subtopology ordinal.
If local StateStore data was decoupled from Tasks, this conflict would
disappear, and the application would work as expected.

A Subtopology is defined by all connected topics, including changelogs,
repartition topics, source topics and sink topics. Whereas a StateStore is
defined exclusively by its changelog. So why do we tightly couple
StateStore to Subtopology? This is my central argument for option A that I
outlined in the KIP, and I would like to discuss it further, even if only
to educate myself on why it's not possible :-D
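A hedged sketch of what option A amounts to (the path layout is hypothetical, not the current Streams layout): key the local store directory by the store/changelog identity and partition rather than by the task id, so re-ordering sub-topologies no longer invalidates on-disk state:

    // Illustrative only; not how Kafka Streams lays out state directories today.
    final class DecoupledStateDirSketch {
        static String storePath(String stateDir, String applicationId, String storeName, int partition) {
            // e.g. /tmp/kafka-streams/my-app/stores/my-store/3
            return stateDir + "/" + applicationId + "/stores/" + storeName + "/" + partition;
        }
    }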

I still think the NamedTopology work is valuable, but more as a means to
better organize large applications.

Regards,
Nick

1: The only exception to this I can think of is when a user decides to
change the format (Serdes) or semantics of the data in the store, in which
case they would need to do a full reset by also clearing the changelog
topic for that store. Realistically, users that wish to do this would be
better off just creating a new store and deleting the old one, so I don't
think it's a case worth optimizing for.

On Fri, 4 Feb 2022 at 08:22, Sophie Blee-Goldman
 wrote:

> Hey Nick,
>
> thanks for the KIP, this is definitely a much-needed feature. I've actually
> been working on
> a somewhat similar feature for a while now and have a good chunk of the
> implementation
> completed -- but so far it's only exposed via internal APIs and hasn't been
> brought to a KIP
> yet, as it's a fairly large and complex project and I wanted to get all the
> details hashed out
> before settling on a public API.
>
> For some sense of how complicated it's been, you can check out the JIRA
> ticket we've been
> filing PRs under -- there are already 25 PRs to the feature. See
> KAFKA-12648
> . You can check
> out the new KafkaStreamsNamedTopologyWrapper class to see what the current
> API looks like
> -- I recommend taking a look to see if this might cover some or all of the
> things you wanted
> this KIP to do.
>
> For a high-level sketch, my work introduces the concept of a
> "NamedTopology" (which will be
> renamed to "ModularTopology" in the future, but is still referred to as
> "named" in the codebase
> so I'll keep using it for now) . Each KafkaStreams app can execute multiple
> named topologies,
> which are just regular topologies that are given a unique name. The
> essential feature of a
> named topology is that it can be dynamically added or removed without even
> stopping the
> application, much less resetting it. Technically a NamedTopology can be
> composed or one
> or more subtopologies, but if you want to be able to update the application
> at a subtopology
> level you can just name each  subtopology.
>
> So I believe the feature you want is actually already implemented, for the
> most part -- it's currently
> missing a few things that I just didn't bother to implement yet since I've
> been focused
> on getting a working, minimal POC that I could use for testing. (For
> example it doesn't yet
> 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-04 Thread Sagar
Thanks Sophie/Guozhang.

Yeah, I could have amended the KIP, but it slipped my mind when Guozhang
proposed this in the PR. Later on, the PR was merged and the KIP was marked as
adopted, so I thought I would write a new one. I know the PR has been
reopened now :p. I don't have much preference on a new KIP vs. the original
one, so either is OK with me.

I agree with the INFO part. I will make that change.

Regarding the task level: from my understanding, since every task's
buffer/cache size might be different, a task-level metric might help people
figure out if a certain task is overshooting the limits. Also, thanks for the
explanation, Guozhang, on why this should be a task-level metric. What are your
thoughts on this, @Sophie?

Thanks!
Sagar.


On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang  wrote:

> Thanks Sagar for proposing the KIP, and Sophie for sharing your thoughts.
> Here're my 2c:
>
> I think I agree with Sophie for making the two metrics (both the added and
> the newly proposed) on INFO level since we are always calculating them
> anyways. Regarding the level of the cache-size metric though, I'm thinking a bit
> differently from you two: today we do not actually keep the caches at the
> per-store level, but rather at the per-thread level, i.e. when the cache is
> full we would flush not only the triggering state store but also
> potentially other state stores of the task that the thread owns.
> This mechanism, in hindsight, is a bit weird and we have some discussions
> about refactoring that in the future already. Personally I'd like to make
> this new metric to be aligned with whatever our future design will be.
>
> In the long run if we would not have a static assignment from tasks to
> threads, it may not make sense to keep a dedicated cache pool per thread.
> Instead all tasks will be dynamically sharing the globally configured max
> cache size (dynamically here means, we would not just divide the total size
> by the num.tasks and then assign that to each task), and when a cache put
> triggers the flushing because the sum now exceeds the global configured
> value, we would potentially flush all the cached records for that task. If
> this is the end stage, then I think keeping this metric at the task level
> is good.
>
>
>
> Guozhang
>
>
>
>
> On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
>  wrote:
>
> > Hey Sagar,  thanks for the KIP!
> >
> > And yes, all metrics are considered part of the public API and thus
> require
> > a KIP to add (or modify, etc...) Although in this particular case, you
> > could probably make a good case for just considering it as an update to
> the
> > original KIP which added the analogous metric `input-buffer-bytes-total`.
> > For  things like this that weren't considered during the KIP proposal but
> > came up during the implementation or review, and are small changes that
> > would have made sense to include in that KIP had they been thought of,
> you
> > can just send an update to the existing KIP's discussion and.or voting
> > thread that explains what you want to add or modify and maybe a brief
> > description why.
> >
> > It's always ok to make a new KIP when in doubt, but there are some cases
> > where an update email is sufficient. If there are any concerns or
> > suggestions that significantly expand the scope of the update, you can
> > always go create a new KIP and move the discussion there.
> >
> > I'd say you can feel free to proceed in whichever way you'd prefer for
> this
> > new proposal -- it just needs to appear in some KIP somewhere, and have
> > given the community thew opportunity to discuss and provide feedback on.
> >
> > On that note, I do have two suggestions:
> >
> > 1)  since we need to measure the size of the cache (and the input
> buffer(s)
> > anyways, we may as well make `cache-size-bytes-total` -- and also the new
> > input-buffer-bytes-total -- an INFO level metric. In general the more
> > metrics the merrier, the only real reason for disabling some are if they
> > have a performance impact or other cost that not everyone will want to
> pay.
> > In this case we're already computing the value of these metrics, so why
> not
> > expose it to the user as an INFO metric
> > 2) I think it would be both more natural and easier to implement if this
> > was a store-level metric. A single task could in theory contain multiple
> > physical state store caches and we would have to roll them up to report
> the
> > size for the task as a whole. It's additional work just to lose some
> > information that the user may want to have
> >
> > Let me know if anything here doesn't make sense or needs clarification.
> And
> > thanks for the quick followup to get this 2nd metric!
> >
> > -Sophie
> >
> > On Sat, Jan 29, 2022 at 4:27 AM Sagar  wrote:
> >
> > > Hi All,
> > >
> > > I would like to open a discussion thread on the following KIP:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> > >
> > > 

Re: [DISCUSS] KIP-816: Topology changes without local state reset

2022-02-04 Thread Sophie Blee-Goldman
Hey Nick,

thanks for the KIP, this is definitely a much-needed feature. I've actually
been working on
a somewhat similar feature for a while now and have a good chunk of the
implementation
completed -- but so far it's only exposed via internal APIs and hasn't been
brought to a KIP
yet, as it's a fairly large and complex project and I wanted to get all the
details hashed out
before settling on a public API.

For some sense of how complicated it's been, you can check out the JIRA
ticket we've been
filing PRs under -- there are already 25 PRs to the feature. See KAFKA-12648
. You can check
out the new KafkaStreamsNamedTopologyWrapper class to see what the current
API looks like
-- I recommend taking a look to see if this might cover some or all of the
things you wanted
this KIP to do.

For a high-level sketch, my work introduces the concept of a
"NamedTopology" (which will be
renamed to "ModularTopology" in the future, but is still referred to as
"named" in the codebase
so I'll keep using it for now) . Each KafkaStreams app can execute multiple
named topologies,
which are just regular topologies that are given a unique name. The
essential feature of a
named topology is that it can be dynamically added or removed without even
stopping the
application, much less resetting it. Technically a NamedTopology can be
composed of one
or more subtopologies, but if you want to be able to update the application
at a subtopology
level you can just name each  subtopology.

So I believe the feature you want is actually already implemented, for the
most part -- it's currently
missing a few things that I just didn't bother to implement yet since I've
been focused
on getting a working, minimal POC that I could use for testing. (For
example it doesn't yet
support global state stores) But beyond that, the only remaining work to
make this feature
available is to settle on the APIs, get a KIP passed, and implement said
APIs.

Would you be interested in helping out with the NamedTopology work so we
can turn it into a
a full-fledged public feature? I'm happy to let you take the lead on the
KIP, maybe by adapting
this one if you think it makes sense to do so. The NamedTopology feature is
somewhat larger
in scope than strictly necessary for your purposes, however, so you could
take on just a part
of it and leave anything beyond that for me to do as followup.

By the way: one advantage of the NamedTopology feature is that we don't
have to worry about
any compatibility issues or upgrade/migration path -- it's opt-in by
definition. (Of course we would
recommend using it to all users, like we do with named operators)

Let me know what you think and how you want to proceed from here -- I
wouldn't want you to
spend time re-implementing more or less the same thing, but I most likely
wasn't going to find time
to put out a KIP for the NamedTopology feature in the near future. If you
would be able to help
drive this to completion, we'd each have significantly less work to do to
achieve our goals :)

Cheers,
Sophie


On Thu, Feb 3, 2022 at 6:12 PM Guozhang Wang  wrote:

> Hello Nick,
>
> Thanks for bringing this up and for the proposed options. I read though
> your writeup and here are some of my thoughts:
>
> 1) When changing the topology of Kafka Streams, the developer needs to first
> decide whether the whole topology's persisted state (including the state
> stores as well as their changelogs, the repartition topics, and the
> source/sink external topics) or only part of the persisted state can be reused.
> This involves two types of changes:
>
> a) structural changes of the topology, such as a new processor node being
> added/removed, a new intermediate topic being added/removed, etc.
> b) semantic change of a processor, such as a numerical filter node changing
> its filter threshold etc.
>
> Today both of them are more or less determined by developers manually.
> However, though automatically determining changes of type b) is hard if
> not impossible, automatically determining changes of type a) is doable since it
> depends on just the following information:
> * number of sub-topologies, and their orders (i.e. sequence of ids)
> * used state stores and changelog topics within the sub-topology
> * used repartition topics
> * etc
>
> So let's assume that in the long run we can indeed automatically determine
> whether a topology, or part of it (a sub-topology), is structurally the same;
> then what we can do is to "translate" the old persisted state names to the
> new, isomorphic topology's names. Following this thought I'm leaning
> towards the direction of option B in your proposal. But since automatically
> determining structural changes is out of the scope of this KIP, I feel we
> can consider adding some sort of a "migration tool" from an old topology to a
> new topology by renaming all the persisted states (store dirs and names,
> topic names).
>
>
> Guozhang
>
>
> On Tue, Jan 25, 2022 at 9:10 AM Nick Telford 
> wrote:
>
> > Hi