[jira] [Commented] (KAFKA-15326) Decouple Processing Thread from Polling Thread

2023-11-18 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787544#comment-17787544
 ] 

Guozhang Wang commented on KAFKA-15326:
---

My bad for not linking the first couple of PRs to this ticket --- I was thinking 
of renaming them only after I got someone to review, but in the end I did not 
rename them until they were merged.

> Decouple Processing Thread from Polling Thread
> --
>
> Key: KAFKA-15326
> URL: https://issues.apache.org/jira/browse/KAFKA-15326
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Lucas Brutschy
>Assignee: Lucas Brutschy
>Priority: Critical
>
> As part of an ongoing effort to implement a better threading architecture in 
> Kafka Streams, we decouple N stream threads into N polling threads and N 
> processing threads. The effort to consolidate the N polling threads into a 
> single thread is a follow-up to this ticket. 
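> The sketch below is purely illustrative (it is not the actual Kafka Streams 
> implementation): one thread polls for records and hands them to a separate 
> processing thread over a bounded queue, so slow processing no longer blocks 
> polling. Class and variable names are made up for the example.
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
> 
> public class DecoupledThreadsSketch {
>     public static void main(String[] args) throws InterruptedException {
>         // Bounded hand-off buffer between the polling and processing threads.
>         BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);
> 
>         Thread pollingThread = new Thread(() -> {
>             // Stands in for consumer.poll(...) returning records.
>             for (int i = 0; i < 10; i++) {
>                 try {
>                     buffer.put("record-" + i); // blocks when full: natural back-pressure
>                 } catch (InterruptedException e) {
>                     Thread.currentThread().interrupt();
>                     return;
>                 }
>             }
>         });
> 
>         Thread processingThread = new Thread(() -> {
>             try {
>                 for (int i = 0; i < 10; i++) {
>                     String record = buffer.take(); // blocks until a record arrives
>                     System.out.println("processed " + record);
>                 }
>             } catch (InterruptedException e) {
>                 Thread.currentThread().interrupt();
>             }
>         });
> 
>         pollingThread.start();
>         processingThread.start();
>         pollingThread.join();
>         processingThread.join();
>     }
> }
> {code}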



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15602) Breaking change in 3.4.0 ByteBufferSerializer

2023-10-29 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17780759#comment-17780759
 ] 

Guozhang Wang commented on KAFKA-15602:
---

[~luke.kirby] Thanks for reporting the issue. I read through the description / 
examples that you and [~mjsax] [~pnee] discussed, and I agree I totally misread 
the original issue while reviewing it. To get a fix into the hotfix release 
quickly, I'm on board with reverting the changes first.

> Breaking change in 3.4.0 ByteBufferSerializer
> -
>
> Key: KAFKA-15602
> URL: https://issues.apache.org/jira/browse/KAFKA-15602
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 3.4.0, 3.5.0, 3.4.1, 3.6.0, 3.5.1
>Reporter: Luke Kirby
>Priority: Critical
>
> [This PR|https://github.com/apache/kafka/pull/12683/files] claims to have 
> solved the situation described by KAFKA-4852, namely, to have 
> ByteBufferSerializer respect ByteBuffers wrapping byte arrays with non-0 
> offsets (or, put another way, to honor the buffer's position() as the start 
> point to consume bytes from). Unfortunately, it failed to actually do this, 
> and instead changed the expectations for how an input ByteBuffer's limit and 
> position should be set before being provided to send() on a producer 
> configured with ByteBufferSerializer. Code that worked with pre-3.4.0 
> releases now produces 0-length messages instead of the intended messages, 
> effectively introducing a breaking change for existing users of the 
> serializer in the wild.
> Here are a few different inputs and serialized outputs under pre-3.4.0 and 
> 3.4.0+ to summarize the breaking change:
> ||buffer argument||3.3.2 serialized output||3.4.0+ serialized output||
> |ByteBuffer.wrap("test".getBytes(UTF_8))|len=4 val=test|len=4 val=test|
> |ByteBuffer.allocate(8).put("test".getBytes(UTF_8)).flip()|len=4 val=test|len=0 val=|
> |ByteBuffer.allocate(8).put("test".getBytes(UTF_8))|len=8 val=test<0><0><0><0>|len=4 val=test|
> |ByteBuffer buff = ByteBuffer.allocate(8).put("test".getBytes(UTF_8)); buff.limit(buff.position());|len=4 val=test|len=4 val=test|
> |ByteBuffer.wrap("test".getBytes(UTF_8), 1, 3)|len=4 val=test|len=1 val=t|
> Notably, plain wrappers of byte arrays continue to work under both versions 
> due to the special case in the serializer for them. I suspect that this is 
> the dominant use-case, which is why this has apparently gone unreported to 
> this point. The wrapped-with-offset case fails under both versions, for 
> different reasons (the expected value would be "est"). As demonstrated here, 
> you can ensure that a manually assembled ByteBuffer will work under both 
> versions by ensuring that your buffers have position == limit == 
> message-length (and an actual desired start position of 0). Clearly, though, 
> behavior has changed dramatically for the second and third cases there, with 
> the 3.3.2 behavior, in my experience, aligning better with naive expectations.
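> A small illustrative example of the workaround described above (not taken 
> from the ticket's attachments): assemble the buffer so that position == limit 
> == message length before calling send(), which serializes the same 4 bytes 
> under both pre-3.4.0 and 3.4.0+ semantics.
> {code:java}
> import java.nio.ByteBuffer;
> import static java.nio.charset.StandardCharsets.UTF_8;
> 
> public class BothVersionsSafeBuffer {
>     public static void main(String[] args) {
>         ByteBuffer buff = ByteBuffer.allocate(8).put("test".getBytes(UTF_8));
>         buff.limit(buff.position()); // now position == limit == 4, per the table above
> 
>         System.out.println("position=" + buff.position() + " limit=" + buff.limit());
>         // producer.send(new ProducerRecord<>("topic", buff)) would then serialize
>         // len=4 val=test under both 3.3.2 and 3.4.0+ (fourth row of the table).
>     }
> }
> {code}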
> [Previously|https://github.com/apache/kafka/blob/35a0de32ee3823dfb548a1cd5d5faf4f7c99e4e0/clients/src/main/java/org/apache/kafka/common/serialization/ByteBufferSerializer.java],
>  the serializer would just rewind() the buffer and respect the limit as the 
> indicator as to how much data was in the buffer. So, essentially, the 
> prevailing contract was that the data from position 0 (always!) up to the 
> limit on the buffer would be serialized; so it was really just the limit that 
> was honored. So if, per the original issue, you have a byte[] array wrapped 
> with, say, ByteBuffer.wrap(bytes, 3, 5), then that will yield a ByteBuffer 
> with position = 3 indicating the desired start point to read from, but that 
> position is effectively ignored by the serializer due to the rewind().
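> A rough approximation of the pre-3.4.0 behavior described above (this is not 
> the actual ByteBufferSerializer source, just a sketch of its effective 
> contract): rewind() resets the position to 0, and bytes are copied up to the 
> limit, so only the limit is honored and any non-zero start position is ignored.
> {code:java}
> import java.nio.ByteBuffer;
> 
> public class LegacySerializeSketch {
>     static byte[] serialize(ByteBuffer data) {
>         if (data == null) {
>             return null;
>         }
>         data.rewind();                           // position forced back to 0, i.e. ignored
>         byte[] out = new byte[data.remaining()]; // remaining() == limit after rewind
>         data.get(out);                           // copy bytes [0, limit)
>         return out;
>     }
> }
> {code}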
> So while the serializer didn't work when presenting a ByteBuffer view onto a 
> sub-view of a backing array, it did however follow expected behavior when 
> employing standard patterns to populate ByteBuffers backed by 
> larger-than-necessary arrays and using limit() to identify the end of actual 
> data, consistent with conventional usage of flip() to switch from writing to 
> a buffer to setting it up to be read from (e.g., to be passed into a 
> producer.send() call). E.g.,
> {code:java}
> ByteBuffer bb = ByteBuffer.allocate(TOO_MUCH);
> ... // some sequence of 
> bb.put(...); // populate buffer with some number of bytes less than TOO_MUCH 
> ... 
> bb.flip(); /* logically, this says "I am done writing, let's set this up for 
> reading"; pragmatically, it sets the limit to the current position so that 
> whoever reads the buffer knows when to stop reading, and sets the position to 
> zero so it knows where to start reading from */ 
> producer.send(bb); {code}
> Technically, you wouldn't even need to use flip() there, 

[jira] [Commented] (KAFKA-13152) Replace "buffered.records.per.partition" & "cache.max.bytes.buffering" with "{statestore.cache}/{input.buffer}.max.bytes"

2023-10-15 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775500#comment-17775500
 ] 

Guozhang Wang commented on KAFKA-13152:
---

At the moment I'm afraid I might become the bottleneck if I took on any role as 
a primary reviewer for large PRs / KIPs. For this ticket, I think 
[~wcarl...@confluent.io] [~ableegoldman] and also [~lihaosky] have enough 
background to guide it through. I'm happy to act as a second pair of eyes, but 
I'm concerned I cannot commit enough time to be a primary reviewer.

> Replace "buffered.records.per.partition" & "cache.max.bytes.buffering" with 
> "{statestore.cache}/{input.buffer}.max.bytes"
> -
>
> Key: KAFKA-13152
> URL: https://issues.apache.org/jira/browse/KAFKA-13152
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Sagar Rao
>Priority: Major
>  Labels: kip
> Fix For: 3.7.0
>
>
> The current config "buffered.records.per.partition" controls the maximum 
> number of records to bookkeep per partition; when it is exceeded we pause 
> fetching from that partition. However, this config has two issues:
> * It's a per-partition config, so the total memory consumed depends on the 
> dynamic number of partitions assigned.
> * Record size could vary from case to case.
> Hence it's hard to bound the memory usage for this buffering. We should 
> consider deprecating that config in favor of a global one, e.g. 
> "input.buffer.max.bytes", which controls how many bytes in total are allowed 
> to be buffered. This is doable since we buffer the raw records in .
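> A hypothetical sketch of how the proposed global config might be set; the 
> config name "input.buffer.max.bytes" is taken from this ticket, and its exact 
> name and availability depend on the final KIP/release.
> {code:java}
> import java.util.Properties;
> import org.apache.kafka.streams.StreamsConfig;
> 
> public class InputBufferConfigSketch {
>     public static Properties streamsProps() {
>         Properties props = new Properties();
>         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
>         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>         // Bound the total bytes buffered across all assigned partitions, instead of
>         // the per-partition record count bound of buffered.records.per.partition.
>         props.put("input.buffer.max.bytes", 64 * 1024 * 1024L);
>         return props;
>     }
> }
> {code}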



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15297) Cache flush order might not be topological order

2023-08-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752108#comment-17752108
 ] 

Guozhang Wang commented on KAFKA-15297:
---

{{We are not sure if we should not instead decouple caching from forwarding.}} 
I'd assume the double negation here means "We think we should just try to 
decouple caching from forwarding as the right solution" :) And yes, I'd love to 
see that happen, as I've advocated for it for many years. I was thinking about 
just "suppressing" the records in the last sink processor of the sub-topology 
to achieve the same effect of sending less over the network. It may be similar 
to what you meant by "caching on the last state store", or it may have some 
corner-case differences. Either way, as you said, it will lose some of the 
benefit of processing fewer records at the later stages of a sub-topology, but 
in most cases, given a sub-topology's size, this seems a good trade for 
simplicity.

It also has many other benefits, just to name a few: 1) we get much simpler 
timestamp tracking within a task (today it's as fine-grained as per-processor), 
since every record will always go through the whole sub-topology; 2) we get 
simpler version tracking within sub-topologies for IQ, since all state stores 
now have the same version.

> Cache flush order might not be topological order 
> -
>
> Key: KAFKA-15297
> URL: https://issues.apache.org/jira/browse/KAFKA-15297
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.4.0
>Reporter: Bruno Cadonna
>Priority: Major
> Attachments: minimal_example.png
>
>
> The flush order of the state store caches in Kafka Streams might not 
> correspond to the topological order of the state stores in the topology. The 
> order depends on how the processors and state stores are added to the 
> topology. 
> In some cases downstream state stores might be flushed before upstream state 
> stores. That means, that during a commit records in upstream caches might end 
> up in downstream caches that have already been flushed during the same 
> commit. If a crash happens at that point, those records in the downstream 
> caches are lost. Those records are lost for two reasons:
> 1. Records in caches are only changelogged after they are flushed from the 
> cache. However, the downstream caches have already been flushed and they will 
> not be flushed again during the same commit.
> 2. The offsets of the input records that caused the records that now are 
> blocked in the downstream caches are committed during the same commit and so 
> they will not be re-processed after the crash.
> An example for a topology where the flush order of the caches is wrong is the 
> following:
> {code:java}
> final String inputTopic1 = "inputTopic1";
> final String inputTopic2 = "inputTopic2";
> final String outputTopic1 = "outputTopic1";
> final String processorName = "processor1";
> final String stateStoreA = "stateStoreA";
> final String stateStoreB = "stateStoreB";
> final String stateStoreC = "stateStoreC";
> streamsBuilder.stream(inputTopic2, Consumed.with(Serdes.String(), 
> Serdes.String()))
> .process(
> () -> new Processor<String, String, String, String>() {
> private ProcessorContext<String, String> context;
> @Override
> public void init(ProcessorContext<String, String> context) {
> this.context = context;
> }
> @Override
> public void process(Record<String, String> record) {
> context.forward(record);
> }
> @Override
> public void close() {}
> },
> Named.as("processor1")
> )
> .to(outputTopic1, Produced.with(Serdes.String(), Serdes.String()));
> streamsBuilder.stream(inputTopic1, Consumed.with(Serdes.String(), 
> Serdes.String()))
> .toTable(Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreA).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .mapValues(value -> value, Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreB).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .mapValues(value -> value, Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreC).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .toStream()
> .to(outputTopic1, Produced.with(Serdes.String(), Serdes.String()));
> final Topology topology = streamsBuilder.build(streamsConfiguration);
> topology.connectProcessorAndStateStores(processorName, stateStoreC);
> {code}
> This code results in the attached topology.
> In the topology {{processor1}} is connected to {{stateStoreC}}. If 
> {{processor1}} is added to the topology before the other processors, i.e., if 
> the right branch of the topology is added before the left branch as in the 
> code above, the cache of {{stateStoreC}} is flushed before the caches of 
> {{stateStoreA}} and {{stateStoreB}}.

[jira] [Commented] (KAFKA-15259) Kafka Streams does not continue processing due to rollback despite ProductionExceptionHandlerResponse.CONTINUE if using exactly_once

2023-08-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751878#comment-17751878
 ] 

Guozhang Wang commented on KAFKA-15259:
---

Got it, KAFKA-15309 makes sense.

> Kafka Streams does not continue processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE if using exactly_once
> 
>
> Key: KAFKA-15259
> URL: https://issues.apache.org/jira/browse/KAFKA-15259
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.1
>Reporter: Tomonari Yamashita
>Priority: Major
> Attachments: Reproducer.java, app_at_least_once.log, 
> app_exactly_once.log
>
>
> [Problem]
>  - Kafka Streams does not continue processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE if using exactly_once.
>  -- "CONTINUE will signal that Streams should ignore the issue and continue 
> processing"(1), so Kafka Streams should continue processing even if using 
> exactly_once when ProductionExceptionHandlerResponse.CONTINUE used.
>  -- However, if using exactly_once, Kafka Streams does not continue 
> processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE. And the client will be shut down 
> as the default behavior(StreamThreadExceptionResponse.SHUTDOWN_CLIENT) 
> [Environment]
>  - Kafka Streams 3.5.1
> [Reproduction procedure]
>  # Create "input-topic" topic and "output-topic"
>  # Put several messages on "input-topic"
>  # Execute a simple Kafka streams program that transfers too large messages 
> from "input-topic" to "output-topic" with exactly_once and returns 
> ProductionExceptionHandlerResponse.CONTINUE when an exception occurs in the 
> producer. Please refer to the reproducer program (attached file: 
> Reproducer.java).
>  # ==> However, Kafka Streams does not continue processing due to rollback 
> despite ProductionExceptionHandlerResponse.CONTINUE. And the stream thread 
> shutdown as the default 
> behavior(StreamThreadExceptionResponse.SHUTDOWN_CLIENT) (2). Please refer to 
> the debug log (attached file: app_exactly_once.log).
>  ## My expected behavior is that Kafka Streams should continue processing 
> even if using exactly_once when ProductionExceptionHandlerResponse.CONTINUE is 
> used (a minimal handler sketch is shown below).
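> A minimal handler sketch in the spirit of the attached Reproducer.java (which 
> is not reproduced here); the class name is made up. It always returns CONTINUE 
> so that, per the documented contract, Streams is expected to skip the failed 
> record and keep processing.
> {code:java}
> import java.util.Map;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.streams.errors.ProductionExceptionHandler;
> 
> public class ContinueProductionExceptionHandler implements ProductionExceptionHandler {
> 
>     @Override
>     public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
>                                                      final Exception exception) {
>         // e.g. the RecordTooLargeException from the logs below ends up here
>         return ProductionExceptionHandlerResponse.CONTINUE;
>     }
> 
>     @Override
>     public void configure(final Map<String, ?> configs) {}
> }
> {code}
> The handler would be registered via the default.production.exception.handler 
> config, with processing.guarantee set to exactly_once, to reproduce the 
> reported behavior.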
> [As far as my investigation]
>  - FYI, if using at_least_once instead of exactly_once, Kafka Streams 
> continue processing without rollback when 
> ProductionExceptionHandlerResponse.CONTINUE is used. Please refer to the 
> debug log (attached file: app_at_least_once.log).
> - "continue" worked in Kafka Streams 3.1.2, but no longer works since Kafka 
> Streams 3.2.0, as rollback occurs.
> (1) CONFIGURING A STREAMS APPLICATION > default.production.exception.handler
>  - 
> [https://kafka.apache.org/35/documentation/streams/developer-guide/config-streams.html#default-production-exception-handler]
> (2) Transaction abort and shutdown occur
> {code:java}
> 2023-07-26 21:27:19 DEBUG KafkaProducer:1073 - [Producer 
> clientId=java-kafka-streams-e3187cf9-5337-4155-a7cd-fd4e426b889d-StreamThread-1-0_0-producer,
>  transactionalId=java-kafka-streams-0_0] Exception occurred during message 
> send:
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> 2023-07-26 21:27:19 ERROR RecordCollectorImpl:322 - stream-thread 
> [java-kafka-streams-e3187cf9-5337-4155-a7cd-fd4e426b889d-StreamThread-1] 
> stream-task [0_0] Error encountered sending record to topic output-topic for 
> task 0_0 due to:
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> Exception handler choose to CONTINUE processing in spite of this error but 
> written offsets would not be recorded.
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> 2023-07-26 21:27:19 INFO  TransactionManager:393 - [Producer 
> clientId=java-kafka-streams-e3187cf9-5337-4155-a7cd-fd4e426b889d-StreamThread-1-0_0-producer,
>  transactionalId=java-kafka-streams-0_0] Transiting to abortable error state 
> due to org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> 2023-07-26 21:27:19 DEBUG TransactionManager:986 - [Producer 
> 

[jira] [Commented] (KAFKA-15302) Stale value returned when using store.all() in punctuation function.

2023-08-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751830#comment-17751830
 ] 

Guozhang Wang commented on KAFKA-15302:
---

In this case I'm not sure this is defined as a bug that should be fixed in 
KS: the semantics of `store.all()` is to return a snapshot (behind the scenes, 
it's a combo of a cache snapshot and the underlying RocksDB's snapshot) at the 
time of the call, and if the store gets modified after the `all()` call, we do 
not guarantee that the entries from the `all()` iterator would be consistent 
with `get()`, which is meant to return the latest value.

What's confusing, though, is that users may not expect that modifying the 
store for `keyA` might change the content of `keyB` compared with the snapshot 
--- which is what happened here due to evictions --- but unless we change 
how we execute the caching layer's put/delete calls and how they can trigger 
eviction (of other keys), this would always be the case. For now the only thing 
we can probably do is clarify it in the docs.

In the long run, if we feel such confusion is really unintuitive, we could 
consider 1) decoupling flushing from forwarding, and then 2) letting any range 
query flush the cache first and only then return from the underlying store.

> Stale value returned when using store.all() in punctuation function.
> 
>
> Key: KAFKA-15302
> URL: https://issues.apache.org/jira/browse/KAFKA-15302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.1
>Reporter: Jinyong Choi
>Priority: Major
>
> When using the store.all() function within the punctuation function scheduled 
> via this.context.schedule, a stale value is returned. In other words, even 
> though the value has been updated from 1 to 2, it doesn't return 2; instead, 
> it returns 1.
> In the provided test code, you can see the output 'BROKEN !!!', and while 
> this doesn't occur 100% of the time, by adding logs, it's evident that during 
> the while loop after all() is called, the cache is flushed. As a result, the 
> named cache holds a null value, causing the return of a value from RocksDB. 
> This is observed as the value after the .get() call is different from the 
> expected value. This is possibly due to the consistent read functionality of 
> RocksDB, although the exact cause is not certain.
> Of course, if you perform {{store.flush()}} before {{all()}} there won't be 
> any errors.
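> A tiny sketch of that workaround (method and store names are made up): flush 
> the store before iterating so the iterator and get() agree.
> {code:java}
> import org.apache.kafka.streams.KeyValue;
> import org.apache.kafka.streams.state.KeyValueIterator;
> import org.apache.kafka.streams.state.KeyValueStore;
> 
> public class FlushBeforeAllSketch {
>     // Call this from the punctuator instead of iterating over the un-flushed cache.
>     static void forwardAllFlushed(final KeyValueStore<String, Integer> kvStore) {
>         kvStore.flush(); // push dirty cache entries down to the underlying store first
>         try (final KeyValueIterator<String, Integer> iter = kvStore.all()) {
>             while (iter.hasNext()) {
>                 final KeyValue<String, Integer> entry = iter.next();
>                 final Integer latest = kvStore.get(entry.key);
>                 // latest matches entry.value once the cache has been flushed
>             }
>         }
>     }
> }
> {code}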
>  
>  * test code (forked from balajirrao and modified for this)
> [https://github.com/jinyongchoi/kafka-streams-multi-runner/|https://github.com/jinyongchoi/kafka-streams-multi-runner/tree/main]
>  
> {code:java}
> private void forwardAll(final long timestamp) {
>     System.err.println("forwardAll Start");
>     KeyValueIterator<String, Integer> kvList = this.kvStore.all();
>     while (kvList.hasNext()) {
>         KeyValue<String, Integer> entry = kvList.next();
>         final Record<String, Integer> msg = new Record<>(entry.key,
>                 entry.value, context.currentSystemTimeMs());
>         final Integer storeValue = this.kvStore.get(entry.key);
>         if (entry.value != storeValue) {
>             System.err.println("[" + instanceId + "]" + "!!! BROKEN !!! Key: "
>                     + entry.key + " Expected in stored(Cache or Store) value: " + storeValue
>                     + " but KeyValueIterator value: " + entry.value);
>             throw new RuntimeException("Broken!");
>         }
>         this.context.forward(msg);
>     }
>     kvList.close();
> }
> {code}
>  * log file (add log in stream source)
>  
> {code:java}
> # console log
> sbt clean "worker/assembly"; sbt "worker/assembly"; sbt "coordinator / run 1"
> [info] welcome to sbt 1.8.2 (Ubuntu Java 11.0.20)
> ...
> [info] running Coordinator 1
> appid: 95108c48-7c69-4eeb-adbd-9d091bd84933
> [0] starting instance +1
> forwardAll Start
> [0]!!! BROKEN !!! Key: 636398 Expected in stored(Cache or Store) value: 2 but 
> KeyValueIterator value: 1
> # log file
> ...
> 01:05:00.382 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.NamedCache -- Named cache 0_0-Counts stats on 
> flush: #hits=5628524, #misses=5636397, #overwrites=636397, #flushes=401
> 01:05:00.388 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.NamedCache -- Named Cache flush 
> dirtyKeys.size():7873 entries:7873
> 01:05:00.434 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.p.i.ProcessorStateManager -- stream-thread 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  stream-task [0_0] Flushed cache or buffer Counts
> ...
> 01:05:00.587 
> 

[jira] [Commented] (KAFKA-15259) Kafka Streams does not continue processing due to rollback despite ProductionExceptionHandlerResponse.CONTINUE if using execute_once

2023-08-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751824#comment-17751824
 ] 

Guozhang Wang commented on KAFKA-15259:
---

[~mjsax] Did you mean a different ticket other than KAFKA-15259 (which is this 
ticket)? BTW I think that after the change you proposed, we could still improve 
the KS layer's handling of abortable and fatal txn errors from the underlying 
producers on top of what's summarized in KIP-691: if people agree that when 
KS's exception handler is configured as CONTINUE for `RecordTooLargeException` 
etc. (maybe for all exceptions causing `abortable` rather than `fatal` errors 
as in KIP-691), then KS could tell KP not to transit to an error state for any 
`abortable` error type but just ignore it and continue?

> Kafka Streams does not continue processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE if using execute_once
> 
>
> Key: KAFKA-15259
> URL: https://issues.apache.org/jira/browse/KAFKA-15259
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.1
>Reporter: Tomonari Yamashita
>Priority: Major
> Attachments: Reproducer.java, app_at_least_once.log, 
> app_exactly_once.log
>
>
> [Problem]
>  - Kafka Streams does not continue processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE if using execute_once.
>  -- "CONTINUE will signal that Streams should ignore the issue and continue 
> processing"(1), so Kafka Streams should continue processing even if using 
> execute_once when ProductionExceptionHandlerResponse.CONTINUE used.
>  -- However, if using execute_once, Kafka Streams does not continue 
> processing due to rollback despite 
> ProductionExceptionHandlerResponse.CONTINUE. And the client will be shut down 
> as the default behavior(StreamThreadExceptionResponse.SHUTDOWN_CLIENT) 
> [Environment]
>  - Kafka Streams 3.5.1
> [Reproduction procedure]
>  # Create "input-topic" topic and "output-topic"
>  # Put several messages on "input-topic"
>  # Execute a simple Kafka streams program that transfers too large messages 
> from "input-topic" to "output-topic" with execute_once and returns 
> ProductionExceptionHandlerResponse.CONTINUE when an exception occurs in the 
> producer. Please refer to the reproducer program (attached file: 
> Reproducer.java).
>  # ==> However, Kafka Streams does not continue processing due to rollback 
> despite ProductionExceptionHandlerResponse.CONTINUE. And the stream thread 
> shutdown as the default 
> behavior(StreamThreadExceptionResponse.SHUTDOWN_CLIENT) (2). Please refer to 
> the debug log (attached file: app_exactly_once.log).
>  ## My expected behavior is that Kafka Streams should continue processing 
> even if using execute_once when ProductionExceptionHandlerResponse.CONTINUE is 
> used.
> [As far as my investigation]
>  - FYI, if using at_least_once instead of execute_once, Kafka Streams 
> continue processing without rollback when 
> ProductionExceptionHandlerResponse.CONTINUE is used. Please refer to the 
> debug log (attached file: app_at_least_once.log).
> - "continue" worked in Kafka Streams 3.1.2, but no longer works since Kafka 
> Streams 3.2.0, as rollback occurs.
> (1) CONFIGURING A STREAMS APPLICATION > default.production.exception.handler
>  - 
> [https://kafka.apache.org/35/documentation/streams/developer-guide/config-streams.html#default-production-exception-handler]
> (2) Transaction abort and shutdown occur
> {code:java}
> 2023-07-26 21:27:19 DEBUG KafkaProducer:1073 - [Producer 
> clientId=java-kafka-streams-e3187cf9-5337-4155-a7cd-fd4e426b889d-StreamThread-1-0_0-producer,
>  transactionalId=java-kafka-streams-0_0] Exception occurred during message 
> send:
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> 2023-07-26 21:27:19 ERROR RecordCollectorImpl:322 - stream-thread 
> [java-kafka-streams-e3187cf9-5337-4155-a7cd-fd4e426b889d-StreamThread-1] 
> stream-task [0_0] Error encountered sending record to topic output-topic for 
> task 0_0 due to:
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> Exception handler choose to CONTINUE processing in spite of this error but 
> written offsets would not be recorded.
> org.apache.kafka.common.errors.RecordTooLargeException: The message is 
> 1188 bytes when serialized which is larger than 1048576, which is the 
> value of the max.request.size configuration.
> 2023-07-26 21:27:19 INFO  

[jira] [Commented] (KAFKA-15297) Cache flush order might not be topological order

2023-08-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751821#comment-17751821
 ] 

Guozhang Wang commented on KAFKA-15297:
---

I think this is indeed a general issue: state stores are initialized in the 
order of the topology, which is essentially the "processor node order", as in 
``InternalTopologyBuilder#build``. This works when a state store is only 
associated with one processor, or when a store is associated with multiple 
processors but they are built as part of a built-in operator (like a join in 
the DSL), in which case we carefully make sure that the state store order is 
consistent with the processor order; but in a PAPI scenario like the one Bruno 
reported here, all bets are off.

I think a general fix would be: in the above ``build`` function, only build the 
processors in the first loop pass without initializing the state stores, and 
then, based on the order of the built processors, determine the order in which 
the state stores should be initialized and do that in a second pass.

> Cache flush order might not be topological order 
> -
>
> Key: KAFKA-15297
> URL: https://issues.apache.org/jira/browse/KAFKA-15297
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.4.0
>Reporter: Bruno Cadonna
>Priority: Major
> Attachments: minimal_example.png
>
>
> The flush order of the state store caches in Kafka Streams might not 
> correspond to the topological order of the state stores in the topology. The 
> order depends on how the processors and state stores are added to the 
> topology. 
> In some cases downstream state stores might be flushed before upstream state 
> stores. That means, that during a commit records in upstream caches might end 
> up in downstream caches that have already been flushed during the same 
> commit. If a crash happens at that point, those records in the downstream 
> caches are lost. Those records are lost for two reasons:
> 1. Records in caches are only changelogged after they are flushed from the 
> cache. However, the downstream caches have already been flushed and they will 
> not be flushed again during the same commit.
> 2. The offsets of the input records that caused the records that now are 
> blocked in the downstream caches are committed during the same commit and so 
> they will not be re-processed after the crash.
> An example for a topology where the flush order of the caches is wrong is the 
> following:
> {code:java}
> final String inputTopic1 = "inputTopic1";
> final String inputTopic2 = "inputTopic2";
> final String outputTopic1 = "outputTopic1";
> final String processorName = "processor1";
> final String stateStoreA = "stateStoreA";
> final String stateStoreB = "stateStoreB";
> final String stateStoreC = "stateStoreC";
> streamsBuilder.stream(inputTopic2, Consumed.with(Serdes.String(), 
> Serdes.String()))
> .process(
> () -> new Processor<String, String, String, String>() {
> private ProcessorContext<String, String> context;
> @Override
> public void init(ProcessorContext<String, String> context) {
> this.context = context;
> }
> @Override
> public void process(Record<String, String> record) {
> context.forward(record);
> }
> @Override
> public void close() {}
> },
> Named.as("processor1")
> )
> .to(outputTopic1, Produced.with(Serdes.String(), Serdes.String()));
> streamsBuilder.stream(inputTopic1, Consumed.with(Serdes.String(), 
> Serdes.String()))
> .toTable(Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreA).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .mapValues(value -> value, Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreB).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .mapValues(value -> value, Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(stateStoreC).withKeySerde(Serdes.String()).withValueSerde(Serdes.String()))
> .toStream()
> .to(outputTopic1, Produced.with(Serdes.String(), Serdes.String()));
> final Topology topology = streamsBuilder.build(streamsConfiguration);
> topology.connectProcessorAndStateStores(processorName, stateStoreC);
> {code}
> This code results in the attached topology.
> In the topology {{processor1}} is connected to {{stateStoreC}}. If 
> {{processor1}} is added to the topology before the other processors, i.e., if 
> the right branch of the topology is added before the left branch as in the 
> code above, the cache of {{stateStoreC}} is flushed before the caches of 
> {{stateStoreA}} and {{stateStoreB}}.
> You can observe the flush order by feeding some records into the input topics 
> of the topology, waiting for a commit,  and looking for the following log 
> message:
> 

[jira] [Commented] (KAFKA-15302) Stale value returned when using store.all() in punctuation function.

2023-08-06 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751403#comment-17751403
 ] 

Guozhang Wang commented on KAFKA-15302:
---

[~jinyong.choi] to help us better understand your issue, could you share the 
whole testing code here? More specifically, I'm wondering if you have a 
concurrent thread calling `store.delete` or `store.put(k, null)` (hence causing 
a flush) at the same time while you are looping via `store.all()`?

> Stale value returned when using store.all() in punctuation function.
> 
>
> Key: KAFKA-15302
> URL: https://issues.apache.org/jira/browse/KAFKA-15302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.1
>Reporter: Jinyong Choi
>Priority: Major
>
> When using the store.all() function within the punctuation function scheduled 
> via this.context.schedule, a stale value is returned. In other words, even 
> though the value has been updated from 1 to 2, it doesn't return 2; instead, 
> it returns 1.
> In the provided test code, you can see the output 'BROKEN !!!', and while 
> this doesn't occur 100% of the time, by adding logs, it's evident that during 
> the while loop after all() is called, the cache is flushed. As a result, the 
> named cache holds a null value, causing the return of a value from RocksDB. 
> This is observed as the value after the .get() call is different from the 
> expected value. This is possibly due to the consistent read functionality of 
> RocksDB, although the exact cause is not certain.
> Of course, if you perform {{store.flush()}} before {{all()}} there won't be 
> any errors.
>  
>  * test code (forked from balajirrao and modified for this)
> [https://github.com/jinyongchoi/kafka-streams-multi-runner/|https://github.com/jinyongchoi/kafka-streams-multi-runner/tree/main]
>  
> {code:java}
> private void forwardAll(final long timestamp) {
>     System.err.println("forwardAll Start");
>     KeyValueIterator<String, Integer> kvList = this.kvStore.all();
>     while (kvList.hasNext()) {
>         KeyValue<String, Integer> entry = kvList.next();
>         final Record<String, Integer> msg = new Record<>(entry.key,
>                 entry.value, context.currentSystemTimeMs());
>         final Integer storeValue = this.kvStore.get(entry.key);
>         if (entry.value != storeValue) {
>             System.err.println("[" + instanceId + "]" + "!!! BROKEN !!! Key: "
>                     + entry.key + " Expected in stored(Cache or Store) value: " + storeValue
>                     + " but KeyValueIterator value: " + entry.value);
>             throw new RuntimeException("Broken!");
>         }
>         this.context.forward(msg);
>     }
>     kvList.close();
> }
> {code}
>  * log file (add log in stream source)
>  
> {code:java}
> # console log
> sbt clean "worker/assembly"; sbt "worker/assembly"; sbt "coordinator / run 1"
> [info] welcome to sbt 1.8.2 (Ubuntu Java 11.0.20)
> ...
> [info] running Coordinator 1
> appid: 95108c48-7c69-4eeb-adbd-9d091bd84933
> [0] starting instance +1
> forwardAll Start
> [0]!!! BROKEN !!! Key: 636398 Expected in stored(Cache or Store) value: 2 but 
> KeyValueIterator value: 1
> # log file
> ...
> 01:05:00.382 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.NamedCache -- Named cache 0_0-Counts stats on 
> flush: #hits=5628524, #misses=5636397, #overwrites=636397, #flushes=401
> 01:05:00.388 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.NamedCache -- Named Cache flush 
> dirtyKeys.size():7873 entries:7873
> 01:05:00.434 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.p.i.ProcessorStateManager -- stream-thread 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  stream-task [0_0] Flushed cache or buffer Counts
> ...
> 01:05:00.587 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.s.i.CachingKeyValueStore --  KeyValueIterator 
> all()
> 01:05:00.588 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.RocksDBStore --  RocksDB KeyValueIterator all
> 01:05:00.590 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.ThreadCache -- stream-thread 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>   MemoryLRUCacheBytesIterator cache all()
> 01:05:00.591 
> [95108c48-7c69-4eeb-adbd-9d091bd84933-67de276e-fce4-4621-99c1-aea7849262d2-StreamThread-1]
>  INFO  o.a.k.s.state.internals.NamedCache --   NamedCache allKeys() 
> size():325771
> 

[jira] [Commented] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2023-08-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750491#comment-17750491
 ] 

Guozhang Wang commented on KAFKA-12317:
---

Thanks [~aki] ! Will take a look soon.

> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Florin Akermann
>Priority: Major
>
> Currently, for a stream-stream and stream-table/globalTable join, 
> KafkaStreams drops all stream records with a `null`-key (`null`-join-key for 
> stream-globalTable), because for a `null`-(join)key the join is undefined: 
> ie, we don't have an attribute to do the table lookup with (we consider the 
> stream record as malformed). Note that we define the semantics of 
> _left/outer_ join as: keep the stream record if no matching join record was 
> found.
> We could relax the definition of _left_ stream-table/globalTable and 
> _left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
> records, and call the ValueJoiner with a `null` "other-side" value instead: 
> if the stream record key (or join-key) is `null`, we could treat it as a 
> "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users that want to keep the current behavior can add 
> a `filter()` before the join to drop `null`-(join)key records from the stream 
> explicitly (see the sketch below).
> Note that this change also requires changing the behavior if we insert a 
> repartition topic before the join: currently, we drop `null`-key records 
> before writing into the repartition topic (as we know they would be dropped 
> later anyway). We need to relax this behavior for a left stream-table and 
> left/outer stream-stream join. Users need to be aware (ie, we might need to 
> put this into the docs and JavaDocs) that records with a `null`-key would be 
> partitioned randomly.
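> Sketch of the filter() workaround mentioned above (topic names and serdes are 
> made up for illustration): explicitly dropping null-key records before the 
> join preserves today's behavior once the relaxation lands.
> {code:java}
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.StreamsBuilder;
> import org.apache.kafka.streams.kstream.Consumed;
> import org.apache.kafka.streams.kstream.Joined;
> import org.apache.kafka.streams.kstream.KStream;
> import org.apache.kafka.streams.kstream.KTable;
> import org.apache.kafka.streams.kstream.Produced;
> 
> public class NullKeyFilterSketch {
>     public static void build(final StreamsBuilder builder) {
>         KStream<String, String> stream =
>             builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));
>         KTable<String, String> table =
>             builder.table("lookup", Consumed.with(Serdes.String(), Serdes.String()));
> 
>         stream
>             .filter((key, value) -> key != null) // drop null-key records explicitly
>             .leftJoin(table,
>                       (eventValue, lookupValue) -> eventValue + "/" + lookupValue,
>                       Joined.with(Serdes.String(), Serdes.String(), Serdes.String()))
>             .to("joined-events", Produced.with(Serdes.String(), Serdes.String()));
>     }
> }
> {code}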



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12829) Remove Deprecated methods under Topology

2023-08-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750490#comment-17750490
 ] 

Guozhang Wang commented on KAFKA-12829:
---

[~pegasas] Your comment seems to have been deleted; are you still interested in 
working on this JIRA?

> Remove Deprecated methods under Topology
> 
>
> Key: KAFKA-12829
> URL: https://issues.apache.org/jira/browse/KAFKA-12829
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Josep Prat
>Priority: Blocker
> Fix For: 4.0.0
>
>
> The following methods were deprecated in version 2.7:
>  * org.apache.kafka.streams.Topology#addProcessor(java.lang.String, 
> org.apache.kafka.streams.processor.ProcessorSupplier, java.lang.String...) 
>  * 
> org.apache.kafka.streams.Topology#addGlobalStore(org.apache.kafka.streams.state.StoreBuilder,
>  java.lang.String, org.apache.kafka.common.serialization.Deserializer, 
> org.apache.kafka.common.serialization.Deserializer, java.lang.String, 
> java.lang.String, org.apache.kafka.streams.processor.ProcessorSupplier)
>  * 
> org.apache.kafka.streams.Topology#addGlobalStore(org.apache.kafka.streams.state.StoreBuilder,
>  java.lang.String, org.apache.kafka.streams.processor.TimestampExtractor, 
> org.apache.kafka.common.serialization.Deserializer, 
> org.apache.kafka.common.serialization.Deserializer, java.lang.String, 
> java.lang.String, org.apache.kafka.streams.processor.ProcessorSupplier) 
>  
> See KAFKA-10605 and KIP-478.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

2023-07-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748285#comment-17748285
 ] 

Guozhang Wang commented on KAFKA-14748:
---

[~aki] Thanks for picking up this series! I think we can have a light KIP just 
to summarize:

1. All the operator behavioral changes among these JIRAs.
2. Why we do not make it an opt-in; and therefore:
3. For users who still want the old behavior, what they should change in their 
code.


> Relax non-null FK left-join requirement
> ---
>
> Key: KAFKA-14748
> URL: https://issues.apache.org/jira/browse/KAFKA-14748
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Florin Akermann
>Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.
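> A rough sketch of the FK left-join in question (value formats and names are 
> made up): today, rows whose foreign-key extractor returns null are dropped; 
> the proposal is to keep them and call the joiner with a null right-hand side.
> {code:java}
> import java.util.function.Function;
> import org.apache.kafka.streams.kstream.KTable;
> 
> public class FkLeftJoinSketch {
>     // orders: key = orderId, value = "customerId|amount"; customers: key = customerId, value = name
>     public static KTable<String, String> build(final KTable<String, String> orders,
>                                                final KTable<String, String> customers) {
>         Function<String, String> foreignKeyExtractor =
>             orderValue -> orderValue.contains("|") ? orderValue.split("\\|")[0] : null; // may be null
>         return orders.leftJoin(
>             customers,
>             foreignKeyExtractor,
>             (orderValue, customerName) ->
>                 orderValue + " -> " + (customerName == null ? "<no match>" : customerName));
>     }
> }
> {code}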



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12317) Relax non-null key requirement for left/outer KStream joins

2023-07-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748284#comment-17748284
 ] 

Guozhang Wang commented on KAFKA-12317:
---

[~mjsax] Though it may not introduce any new configs or interfaces, I'd still 
suggest we have a light KIP for the behavioral change; also, we would definitely 
want to add a section in the upgrade guide for this.

> Relax non-null key requirement for left/outer KStream joins
> ---
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Florin Akermann
>Priority: Major
>
> Currently, for a stream-stream and stream-table/globalTable join, 
> KafkaStreams drops all stream records with a `null`-key (`null`-join-key for 
> stream-globalTable), because for a `null`-(join)key the join is undefined: 
> ie, we don't have an attribute to do the table lookup with (we consider the 
> stream record as malformed). Note that we define the semantics of 
> _left/outer_ join as: keep the stream record if no matching join record was 
> found.
> We could relax the definition of _left_ stream-table/globalTable and 
> _left/outer_ stream-stream join though, and not drop `null`-(join)key stream 
> records, and call the ValueJoiner with a `null` "other-side" value instead: 
> if the stream record key (or join-key) is `null`, we could treat it as a 
> "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users that want to keep the current behavior can add 
> a `filter()` before the join to drop `null`-(join)key records from the stream 
> explicitly.
> Note that this change also requires changing the behavior if we insert a 
> repartition topic before the join: currently, we drop `null`-key records 
> before writing into the repartition topic (as we know they would be dropped 
> later anyway). We need to relax this behavior for a left stream-table and 
> left/outer stream-stream join. Users need to be aware (ie, we might need to 
> put this into the docs and JavaDocs) that records with a `null`-key would be 
> partitioned randomly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15240) BrokerToControllerChannelManager cache activeController error cause DefaultAlterPartitionManager send AlterPartition request failed

2023-07-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748281#comment-17748281
 ] 

Guozhang Wang commented on KAFKA-15240:
---

[~lushilin] Thanks for reporting this. I think [~cmccabe] [~hachikuji] would 
have the most context to help investigating.

> BrokerToControllerChannelManager cache activeController error cause 
> DefaultAlterPartitionManager send AlterPartition request failed
> ---
>
> Key: KAFKA-15240
> URL: https://issues.apache.org/jira/browse/KAFKA-15240
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0, 2.8.1, 2.8.2, 3.5.0
> Environment: 2.8.1 kafka version
>Reporter: shilin Lu
>Assignee: shilin Lu
>Priority: Major
> Attachments: image-2023-07-24-16-35-56-589.png
>
>
> After KIP-497, the partition leader no longer uses ZooKeeper to propagate ISR 
> changes; it sends an AlterPartition request to the controller instead. The 
> broker caches the active controller node info through the 
> controllerNodeProvider interface.
> On 2023.07.12, in a production Kafka environment, we saw many `Broker had a 
> stale broker epoch` errors when sending AlterPartition requests to the 
> controller. In this Kafka cluster many replicas were missing from the ISR 
> assignment even though replica fetching was correct, so only the ISR-change 
> propagation was failing.
> !https://iwiki.woa.com/tencent/api/attachments/s3/url?attachmentid=3165506!
> Something strange: if a broker fails to send an AlterPartition request, the 
> controller should print a log like the one below, but this log was not found 
> on the active controller node.
> !image-2023-07-24-16-35-56-589.png!
> I then suspected this broker was connected to the wrong active controller. 
> Through network packet capture, we found this broker was sending requests to 
> an unfamiliar broker port (9092). Referring to this Kafka cluster's operation 
> history, this unfamiliar broker is an old broker node of this cluster, and 
> that node is now a controller node in a new Kafka cluster.
> Currently, BrokerToControllerChannelManager updates the active controller only 
> on disconnect or when the response code is NOT_CONTROLLER. So when no requests 
> are being sent and the stale broker node is another Kafka cluster's controller 
> node, this situation will repeat.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15215) The default.dsl.store config is not compatible with custom state stores

2023-07-23 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746106#comment-17746106
 ] 

Guozhang Wang commented on KAFKA-15215:
---

Just realized I have not read the KIP! :) At first glance it seems we are on 
the same page.

Will read it through asap. Also updated the KIP link in this JIRA.

> The default.dsl.store config is not compatible with custom state stores
> ---
>
> Key: KAFKA-15215
> URL: https://issues.apache.org/jira/browse/KAFKA-15215
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Almog Gavra
>Priority: Major
>  Labels: needs-kip
>
> Sort of a bug, sort of a new/missing feature. When we added the long-awaited 
> default.dsl.store config, it was decided to scope the initial KIP to just the 
> two out-of-the-box state stores types offered by Streams, rocksdb and 
> in-memory. The reason being that this would address a large number of the 
> relevant use cases, and could always be followed up with another KIP for 
> custom state stores if/when the demand arose.
> Of course, since rocksdb is the default anyways, the only beneficiaries of 
> this KIP right now are the people who specifically want only in-memory stores 
> – yet custom state stores users are probably by far the ones with the 
> greatest need for an easier way to configure the store type across an entire 
> application. And unfortunately, because the config currently relies on enum 
> definitions for the known OOTB store types, there's not really any way to 
> extend this feature as it is to work with custom implementations.
> I think this is a great feature, which is why I hope to see it extended to 
> the broader user base. Most likely we'll want to introduce a new config for 
> this, though whether it replaces the old default.dsl.store config or 
> complements it will have to be decided during the KIP discussion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15215) The default.dsl.store config is not compatible with custom state stores

2023-07-23 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746102#comment-17746102
 ] 

Guozhang Wang commented on KAFKA-15215:
---

[~ableegoldman] Thanks for bringing this up! I agree that it would be very 
beneficial to support customized stores with a configurable store type; what we 
need to figure out, though, is how to ask users to provide the customized store 
impls. Some quick options off the top of my head:

1. Ask users to "register" a set of customized stores, one for each store type 
(i.e. kv, time-ranged, versioned, etc.) with a naming convention like a prefix, 
so that if dsl.store is configured as "XYZ", the runtime would try to find 
impl classes named "XYZ.keyvalue.." via class loading.

2. Provide a "StoreImplSelector" API to extend (or replace) the config-based 
"default.dsl.store". Its default implementation could still follow the config 
value and return the corresponding in-mem / RocksDB impl classes given the 
store type; users can override this API to select new impl classes of their own 
(again, potentially still relying on the config with new config values; and if 
the corresponding config value is not recognized, then error out).

Personally, just between these two, the second option (sketched below) seems a 
clear winner --- of course there might be other, smarter options as we 
brainstorm more --- since it a) would not force users to provide one impl class 
for each possible store type, and b) also acts as a quality bar that restricts 
us from casually adding more store types in the future, as doing so would 
impact all the API instantiations.
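A rough sketch of what option 2 above could look like; the interface name 
"StoreImplSelector" comes from this comment, and the method and config values 
shown are hypothetical, not an existing Kafka Streams API.
{code:java}
import org.apache.kafka.streams.state.KeyValueBytesStoreSupplier;
import org.apache.kafka.streams.state.Stores;

public interface StoreImplSelector {

    // One method per store type could be added here; only key-value is sketched.
    KeyValueBytesStoreSupplier keyValueSupplier(String storeName);

    // The default implementation could keep honoring the existing config value.
    static StoreImplSelector fromConfig(String configuredStoreType) {
        return storeName -> "in_memory".equals(configuredStoreType)
            ? Stores.inMemoryKeyValueStore(storeName)
            : Stores.persistentKeyValueStore(storeName);
    }
}
{code}
A custom implementation could return suppliers for a third-party store instead, 
which avoids tying the config to a fixed enum of built-in store types.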

> The default.dsl.store config is not compatible with custom state stores
> ---
>
> Key: KAFKA-15215
> URL: https://issues.apache.org/jira/browse/KAFKA-15215
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Almog Gavra
>Priority: Major
>  Labels: needs-kip
>
> Sort of a bug, sort of a new/missing feature. When we added the long-awaited 
> default.dsl.store config, it was decided to scope the initial KIP to just the 
> two out-of-the-box state stores types offered by Streams, rocksdb and 
> in-memory. The reason being that this would address a large number of the 
> relevant use cases, and could always be followed up with another KIP for 
> custom state stores if/when the demand arose.
> Of course, since rocksdb is the default anyways, the only beneficiaries of 
> this KIP right now are the people who specifically want only in-memory stores 
> – yet custom state stores users are probably by far the ones with the 
> greatest need for an easier way to configure the store type across an entire 
> application. And unfortunately, because the config currently relies on enum 
> definitions for the known OOTB store types, there's not really any way to 
> extend this feature as it is to work with custom implementations.
> I think this is a great feature, which is why I hope to see it extended to 
> the broader user base. Most likely we'll want to introduce a new config for 
> this, though whether it replaces the old default.dsl.store config or 
> complements it will have to be decided during the KIP discussion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14639) Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle

2023-03-28 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17706162#comment-17706162
 ] 

Guozhang Wang commented on KAFKA-14639:
---

Had some chat with [~pnee] today regarding the client-side fix. I think a more 
conservative fix may work, which is to only honor the old generation's owned 
partitions if 1) it's the only participant claiming to own this partition, and 
2) its generation is no smaller than the current generation X minus 1. If its 
generation is X-2 or less, then the above-mentioned risk would materialize (see 
the illustrative sketch below).

That being said, I still feel that the ultimate fix should touch the broker 
side, and hence would better be left for the new rebalance protocol KIP.
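An illustrative predicate only (names are hypothetical, and this is not actual 
assignor code): the conservative rule above honors a previously owned partition 
only when no other member claims it and the claimed generation is at least the 
current generation minus one.
{code:java}
public final class OwnedPartitionValidationSketch {

    static boolean honorOwnedPartition(final int claimedGeneration,
                                       final int currentGeneration,
                                       final int otherClaimants) {
        return otherClaimants == 0 && claimedGeneration >= currentGeneration - 1;
    }

    public static void main(String[] args) {
        System.out.println(honorOwnedPartition(639, 640, 0)); // true: generation X-1, sole claimant
        System.out.println(honorOwnedPartition(638, 640, 0)); // false: generation X-2 or older
        System.out.println(honorOwnedPartition(639, 640, 1)); // false: someone else also claims it
    }
}
{code}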

> Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance 
> cycle
> 
>
> Key: KAFKA-14639
> URL: https://issues.apache.org/jira/browse/KAFKA-14639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.2.1
>Reporter: Bojan Blagojevic
>Assignee: Philip Nee
>Priority: Major
> Attachments: consumers-jira.log
>
>
> I have an application that runs 6 consumers in parallel. I am getting some 
> unexpected results when I use {{{}CooperativeStickyAssignor{}}}. If I 
> understand the mechanism correctly, if the consumer loses a partition in one 
> rebalance cycle, the partition should be assigned in the next rebalance cycle.
> This assumption is based on the 
> [RebalanceProtocol|https://kafka.apache.org/31/javadoc/org/apache/kafka/clients/consumer/ConsumerPartitionAssignor.RebalanceProtocol.html]
>  documentation and few blog posts that describe the protocol, like [this 
> one|https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/]
>  on Confluent blog.
> {quote}The assignor should not reassign any owned partitions immediately, but 
> instead may indicate consumers the need for partition revocation so that the 
> revoked partitions can be reassigned to other consumers in the next rebalance 
> event. This is designed for sticky assignment logic which attempts to 
> minimize partition reassignment with cooperative adjustments.
> {quote}
> {quote}Any member that revoked partitions then rejoins the group, triggering 
> a second rebalance so that its revoked partitions can be assigned. Until 
> then, these partitions are unowned and unassigned.
> {quote}
> These are the logs from the application that uses 
> {{{}protocol='cooperative-sticky'{}}}. In the same rebalance cycle 
> ({{{}generationId=640{}}}) {{partition 74}} moves from {{consumer-3}} to 
> {{{}consumer-4{}}}. I omitted the lines that are logged by the other 4 
> consumers.
> Mind that the log is in reverse(bottom to top)
> {code:java}
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler1 : New 
> partition assignment: partition-59, seek to min common offset: 85120524
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler2 : Partitions 
> [partition-59] assigned successfully
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler1 : Partitions 
> assigned: [partition-59]
> 2022-12-14 11:18:24 1 — [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer clientId=partition-3-my-client-id-my-group-id, 
> groupId=my-group-id] Adding newly assigned partitions: partition-59
> 2022-12-14 11:18:24 1 — [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer clientId=partition-3-my-client-id-my-group-id, 
> groupId=my-group-id] Notifying assignor about the new 
> Assignment(partitions=[partition-59])
> 2022-12-14 11:18:24 1 — [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer clientId=partition-3-my-client-id-my-group-id, 
> groupId=my-group-id] Request joining group due to: need to revoke partitions 
> [partition-26, partition-74] as indicated by the current assignment and 
> re-join
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler2 : Partitions 
> [partition-26, partition-74] revoked successfully
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler1 : Finished 
> removing partition data
> 2022-12-14 11:18:24 1 — [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer clientId=partition-4-my-client-id-my-group-id, 
> groupId=my-group-id] (Re-)joining group
> 2022-12-14 11:18:24 1 — [consumer-4] x.y.z.MyRebalanceHandler1 : New 
> partition assignment: partition-74, seek to min common offset: 107317730
> 2022-12-14 11:18:24 1 — [consumer-4] x.y.z.MyRebalanceHandler2 : Partitions 
> [partition-74] assigned successfully
> 2022-12-14 11:18:24 1 — [consumer-4] x.y.z.MyRebalanceHandler1 : Partitions 
> assigned: [partition-74]
> 2022-12-14 11:18:24 1 — [consumer-4] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer 

[jira] [Commented] (KAFKA-14639) Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle

2023-03-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705734#comment-17705734
 ] 

Guozhang Wang commented on KAFKA-14639:
---

As for the fix... it's quite tough, since the ultimate and best solution is the
new rebalance protocol KIP, and a near-term fix would have to touch only the
client side, not the server side. That means the suggestions on the other
ticket cannot apply.

The trickiness is that, when a sync-group request gets a REBALANCE_IN_PROGRESS
error for generation X (setting the COORDINATOR_LOAD_IN_PROGRESS case aside for
the moment), the consumer did not actually miss a rebalance; it's just that the
next generation X+1 has started. HOWEVER, its currently owned partitions are
not from generation X but from an older generation, say X-1 (since it never
received the actual assignment for generation X), and hence when it reports
those owned partitions of generation X-1 they are likely to be discarded by the
coordinator. Without fixing the broker side, this trickiness is hard to resolve.

I'd suggest we coordinate with the folks working on the KIP (i.e. on the
broker side) for the near-term fix. If everyone agrees this is an issue worth
remedying before the KIP is out, I'd suggest we take the broker-side change,
i.e. the approach described in KAFKA-14016 by Shawn.

If that's considered too short-lived a fix given the upcoming KIP, another
option would be, on the assignor side, to not discard an old generation's owned
partitions if those partitions are NOT claimed by anyone else (see the sketch
below). Admittedly this is a risky fix: think about a partition owned by host A
in generation X, then assigned to B in generation X+1 which host A missed, and
then host B missing generation X+2 in which the partition is given back to A;
we may then see the partition's consumed offsets "going backwards". But it may
give us the runway until the new rebalance protocol is out. Personally I do not
like it and would only consider it as a last resort.
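To make that last-resort idea concrete, here is a minimal sketch (names and
structure are hypothetical, not the actual assignor code) of keeping a stale
member's owned partitions only when no member of the latest generation claims
them:

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
public final class StaleOwnershipFilter {

    public static List<String> retainUnclaimedPartitions(
            List<String> staleOwnedPartitions,
            Map<String, List<String>> latestGenerationOwnership) {

        // Collect every partition claimed by members on the latest generation.
        Set<String> claimedByLatestGeneration = new HashSet<>();
        for (List<String> owned : latestGenerationOwnership.values()) {
            claimedByLatestGeneration.addAll(owned);
        }

        // Keep only the stale member's partitions that nobody else claims; these
        // are the ones that would otherwise be treated as unowned and reassigned.
        List<String> retained = new ArrayList<>();
        for (String partition : staleOwnedPartitions) {
            if (!claimedByLatestGeneration.contains(partition)) {
                retained.add(partition);
            }
        }
        return retained;
    }
}
{code}

As noted above, if ownership ping-pongs across missed generations this can make
consumed offsets appear to go backwards, which is why it is only a last resort.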

> Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance 
> cycle
> 
>
> Key: KAFKA-14639
> URL: https://issues.apache.org/jira/browse/KAFKA-14639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.2.1
>Reporter: Bojan Blagojevic
>Assignee: Philip Nee
>Priority: Major
> Attachments: consumers-jira.log
>
>
> I have an application that runs 6 consumers in parallel. I am getting some 
> unexpected results when I use {{{}CooperativeStickyAssignor{}}}. If I 
> understand the mechanism correctly, if the consumer loses a partition in one 
> rebalance cycle, the partition should be assigned in the next rebalance cycle.
> This assumption is based on the 
> [RebalanceProtocol|https://kafka.apache.org/31/javadoc/org/apache/kafka/clients/consumer/ConsumerPartitionAssignor.RebalanceProtocol.html]
>  documentation and few blog posts that describe the protocol, like [this 
> one|https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/]
>  on Confluent blog.
> {quote}The assignor should not reassign any owned partitions immediately, but 
> instead may indicate consumers the need for partition revocation so that the 
> revoked partitions can be reassigned to other consumers in the next rebalance 
> event. This is designed for sticky assignment logic which attempts to 
> minimize partition reassignment with cooperative adjustments.
> {quote}
> {quote}Any member that revoked partitions then rejoins the group, triggering 
> a second rebalance so that its revoked partitions can be assigned. Until 
> then, these partitions are unowned and unassigned.
> {quote}
> These are the logs from the application that uses 
> {{{}protocol='cooperative-sticky'{}}}. In the same rebalance cycle 
> ({{{}generationId=640{}}}) {{partition 74}} moves from {{consumer-3}} to 
> {{{}consumer-4{}}}. I omitted the lines that are logged by the other 4 
> consumers.
> Mind that the log is in reverse (bottom to top)
> {code:java}
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler1 : New 
> partition assignment: partition-59, seek to min common offset: 85120524
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler2 : Partitions 
> [partition-59] assigned successfully
> 2022-12-14 11:18:24 1 — [consumer-3] x.y.z.MyRebalanceHandler1 : Partitions 
> assigned: [partition-59]
> 2022-12-14 11:18:24 1 — [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer clientId=partition-3-my-client-id-my-group-id, 
> groupId=my-group-id] Adding newly assigned partitions: partition-59
> 2022-12-14 11:18:24 1 — [consumer-3] o.a.k.c.c.internals.ConsumerCoordinator 
> : [Consumer 

[jira] [Commented] (KAFKA-14639) Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle

2023-03-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17705731#comment-17705731
 ] 

Guozhang Wang commented on KAFKA-14639:
---

I looked at both JIRA tickets and I believe the root cause is KAFKA-13891; this
ticket is just a symptom of that issue. More specifically, I think there are
two common scenarios here (using the example's names for illustration purposes
only):

Scenario 1:

* in generation 639, partition-74 is "decided" to go to consumer-3 in the 
join-group. Note here I used the word "decided" since the assignment only 
completes after the Sync-group round trip. So this assignment is only logged on 
the leader, but consumer-3 has not learned about this new assignment in the 
sync-group yet.
* then before the sync-group finishes, a new member joins (or more generally, 
some other event happens, such as what KAFKA-13891 describes: some member 
received the sync-group earlier than others, and immediately sends another 
re-join group to trigger a new rebalance). The server-side group coordinator 
bumps up the generation to 640 for the new rebalance.
* consumer-3 sends the sync-group for generation 639 and gets a 
REBALANCE_IN_PROGRESS, and hence it needs to re-join the group without knowing 
partition-74 was given to it in generation 639. In this case consumer-3 has 
never officially "owned" partition-74.
* in generation 640, partition-74 is decided to go to consumer-4.

From the callback's perspective, consumer-3 would trigger neither
`onPartitionsAssigned` nor `onPartitionsRevoked` for partition-74, since the
assignment of generation 639 was obsoleted before it was even notified, so it
never actually gets that partition. But this should still be okay, since
whoever owned partition-74 prior to generation 639 should still invoke
`onPartitionsRevoked` for partition-74.

Scenario 2 (pretty much what KAFKA-13891's description already states):

* prior to generation 639, partition-74 is already owned by consumer-3 and 
consumer-3 knows about it. The rebalance of generation 639 did not change that 
ownership, i.e. it's still assigned to consumer-3.
* then before the sync-group finishes, a new member joins (or more generally, 
some other event happens, such as what KAFKA-13891 describes: some member 
received the sync-group earlier than others, and immediately sends another 
re-join group to trigger a new rebalance). The server-side group coordinator 
bumps up the generation to 640 for the new rebalance.
* consumer-3 sends the sync-group for generation 639 and gets a
REBALANCE_IN_PROGRESS, and hence it needs to re-join the group with its
generation reset. In this case, its owned partitions are discarded by the
broker-side coordinator since its generation is literally "-1".
* in generation 640, partition-74 is decided to go to consumer-4 under the
perception that no one owns this partition in that generation.

In this case, `onPartitionsRevoked` would be triggered on consumer-3 and
`onPartitionsAssigned` on consumer-4 within the same rebalance, which is the
problematic case.
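For illustration only, a minimal sketch of the generation staleness check
described in scenario 2 (the names and the exact condition here are
hypothetical, not the actual coordinator/assignor code):

{code:java}
import java.util.Collections;
import java.util.List;

// Illustrative model of why a rejoining member's owned partitions are discarded:
// after REBALANCE_IN_PROGRESS on sync-group the member resets its generation to -1,
// so a generation-aware check treats its reported ownership as stale.
final class OwnedPartitionsCheck {
    static final int UNKNOWN_GENERATION = -1;

    static List<String> effectiveOwnedPartitions(int reportedGeneration,
                                                 int currentGeneration,
                                                 List<String> reportedOwnedPartitions) {
        if (reportedGeneration == UNKNOWN_GENERATION || reportedGeneration < currentGeneration) {
            // Stale or unknown generation: the ownership claim is ignored, so the
            // sticky assignor sees these partitions as unowned and may hand them to
            // another member within the very same rebalance.
            return Collections.emptyList();
        }
        return reportedOwnedPartitions;
    }
}
{code}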

> Kafka CooperativeStickyAssignor revokes/assigns partition in one rebalance 
> cycle
> 
>
> Key: KAFKA-14639
> URL: https://issues.apache.org/jira/browse/KAFKA-14639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.2.1
>Reporter: Bojan Blagojevic
>Assignee: Philip Nee
>Priority: Major
> Attachments: consumers-jira.log
>
>
> I have an application that runs 6 consumers in parallel. I am getting some 
> unexpected results when I use {{{}CooperativeStickyAssignor{}}}. If I 
> understand the mechanism correctly, if the consumer loses a partition in one 
> rebalance cycle, the partition should be assigned in the next rebalance cycle.
> This assumption is based on the 
> [RebalanceProtocol|https://kafka.apache.org/31/javadoc/org/apache/kafka/clients/consumer/ConsumerPartitionAssignor.RebalanceProtocol.html]
>  documentation and few blog posts that describe the protocol, like [this 
> one|https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/]
>  on Confluent blog.
> {quote}The assignor should not reassign any owned partitions immediately, but 
> instead may indicate consumers the need for partition revocation so that the 
> revoked partitions can be reassigned to other consumers in the next rebalance 
> event. This is designed for sticky assignment logic which attempts to 
> minimize partition reassignment with cooperative adjustments.
> {quote}
> {quote}Any member that revoked partitions then rejoins the group, triggering 
> a second rebalance so that its revoked partitions can be assigned. Until 
> then, these partitions are unowned and unassigned.
> 

[jira] [Created] (KAFKA-14847) Separate the callers of commitAllTasks v.s. commitTasks for EOS(-v2) and ALOS

2023-03-24 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-14847:
-

 Summary: Separate the callers of commitAllTasks v.s. commitTasks 
for EOS(-v2) and ALOS
 Key: KAFKA-14847
 URL: https://issues.apache.org/jira/browse/KAFKA-14847
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


Today, EOS-v2/v1 and ALOS share the same internal call path inside
TaskManager/TaskExecutor for committing tasks in various scenarios: the call
path {{commitTasksAndMaybeUpdateCommitableOffsets}} ->
{{commitOffsetsOrTransaction}} takes in a list of tasks as its input, which can
be a subset of the tasks that the thread / task manager owns. For EOS-v1 / ALOS
it is fine to commit just a subset of the tasks; however for EOS-v2, since all
tasks participate in the same txn, it could lead to dangerous violations, and
today we rely on all the callers of the commit function to make sure that the
list of tasks they pass in, under EOS-v2, would still not violate the
semantics. As summarized today (thanks to Matthias), that commit path can be
triggered in the following cases:

1) Inside handleRevocation() -- this is a clean path, and we add all non-revoked
tasks with the commitNeeded() flag set to the commit -- so this seems to be fine.
2) tryCloseCleanAllActiveTasks() -- here we only call it, if 
tasksToCloseDirty.isEmpty() -- so it seems fine, too.
3) commit() with a list of tasks handed in -- we call commit() inside the TM
three times:
3.a) inside commitAll() as commit(tasks.values()) (passing in all tasks)
3.b) inside maybeCommitActiveTasksPerUserRequested as 
commit(activeTaskIterable()); (passing in all tasks)
3.c) inside handleCorruption() -- here, we only consider RUNNING and RESTORING 
tasks, which are not corrupted -- note we only throw a TaskCorruptedException 
during restore state initialization, thus, corrupted tasks did not process 
anything yet, and all other tasks should be clean to be committed.
3.d) commitSuccessfullyProcessedTasks() -- under EOS-v2, we commit just a
subset of tasks' source offsets while at the same time still committing the
unsuccessful tasks' outgoing records, if there are any.

Just going through this list of callers, as demonstrated above, is already
pretty complex and very vulnerable to bugs. It's better not to rely on the
callers, but on the callee, to make sure the semantics hold. More concretely, I
think we can introduce a new function called {{commitAllTasks}} such that under
EOS-v2 the callers always call {{commitAllTasks}} instead, and if there are
some tasks that should not be committed because we know they have not processed
any data, the {{commitAllTasks}} callee itself would do the filtering
internally (see the sketch below).

Given its scope, I think it's better to do this refactoring after EOS-v1 is 
removed.
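A rough sketch of the proposed shape (hypothetical names and interfaces, not
the actual TaskManager/TaskExecutor code), where the EOS-v2 path always goes
through a single entry point that does the filtering internally:

{code:java}
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical interfaces, for illustration only.
interface Task {
    boolean commitNeeded();
    boolean hasProcessedAnything();
}

interface TaskCommitter {

    // Existing style: each caller picks the subset of tasks to commit, which is
    // exactly what makes the EOS-v2 paths fragile.
    void commitTasks(Collection<Task> tasks);

    // Proposed style for EOS-v2: callers always hand over all tasks, and the callee
    // decides internally which ones can safely be skipped (e.g. tasks that have not
    // processed any data yet).
    default void commitAllTasks(Collection<Task> allTasks) {
        List<Task> committable = allTasks.stream()
                .filter(t -> t.commitNeeded() && t.hasProcessedAnything())
                .collect(Collectors.toList());
        commitTasks(committable);
    }
}
{code}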



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2023-03-21 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703404#comment-17703404
 ] 

Guozhang Wang commented on KAFKA-13295:
---

[~sagarrao] I had a chat with others driving the 4.0 release and it seems like
it will still take some time; given that, I will resume helping you finish
fixing this bug rather than wait for that release.

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2023-03-21 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-13295:
--
Fix Version/s: (was: 4.0.0)

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14172) bug: State stores lose state when tasks are reassigned under EOS wit…

2023-03-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698180#comment-17698180
 ] 

Guozhang Wang commented on KAFKA-14172:
---

[~gray.john][~Horslev] I took a deep look into this issue and I think I found 
the culprit. Here's a short summary:

1. When standbys are enabled, Kafka Streams could recycle a standby task (and 
its state stores) into an active, and vice versa.
2. When caching is enabled, we would bypass the caching layer when updating a 
standby task (i.e. via putInternal).

These two combined can cause an issue. Take a concrete example following
https://github.com/apache/kafka/pull/12540's demo: let's say we have a task A
with a cached state store S.

* For a given host, originally the task was hosted as an active.
* A rebalance happens, and that task is recycled into a standby. At that time
the cache is flushed, so the underlying store and the cache layer are
consistent; let's assume they are both at S1 (version 1).
* The standby task was updated for a period of time, where updates are directly 
written into the underlying store. Now the underlying store is S2 while the 
caching layer is still S1.
* A second rebalance happens, and that task is recycled again into an active.
Then, when that task is processing normally, a read into the store hits the
cache layer first and very likely reads out the older version S1 instead of S2.
As a result, we get a duplicate: more specifically, in the above PR's example,
the {{count}} store would return an old counter and hence cause the ID derived
from the counter to be used twice.

That also explains why the test does not fail if caching is disabled or standby
replicas are disabled (tested locally). I think this test could still fail even
when the acceptable lag is set to 0, but a standby -> active -> standby cycle
is then less likely, so people may not easily observe it.

I have a hack fix (note: this is not for merging, as it is just a hack), built
on top of [~Horslev]'s integration test, which clears the cache upon flushing
it (flushing is called when the task manager is flushed); a bare-bones sketch
of the idea follows. With this fix the test no longer fails.
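For illustration, a simplified stand-in for the caching layer (not the actual
CachingKeyValueStore code) showing the "clear the cache on flush" idea, so a
recycled task cannot read a stale cached value:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified caching layer on top of an underlying store, for illustration only.
final class SimpleCachingStore {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> underlying = new HashMap<>();

    void put(String key, String value) {
        cache.put(key, value); // normal (active-task) writes go through the cache
    }

    void putToUnderlyingDirectly(String key, String value) {
        underlying.put(key, value); // standby updates bypass the cache, as described above
    }

    String get(String key) {
        String cached = cache.get(key);
        return cached != null ? cached : underlying.get(key);
    }

    void flush() {
        underlying.putAll(cache);
        // The essence of the hack fix: drop cached entries on flush so that a task
        // recycled standby -> active cannot read a value older than the underlying store.
        cache.clear();
    }
}
{code}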

> bug: State stores lose state when tasks are reassigned under EOS wit…
> -
>
> Key: KAFKA-14172
> URL: https://issues.apache.org/jira/browse/KAFKA-14172
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.1
>Reporter: Martin Hørslev
>Priority: Critical
>
> h1. State stores lose state when tasks are reassigned under EOS with standby 
> replicas and default acceptable lag.
> I have observed that state stores used in a transform step under a Exactly 
> Once semantics ends up losing state after a rebalancing event that includes 
> reassignment of tasks to previous standby task within the acceptable standby 
> lag.
>  
> The problem is reproduceable and an integration test have been created to 
> showcase the [issue|https://github.com/apache/kafka/pull/12540]. 
> A detailed description of the observed issue is provided 
> [here|https://github.com/apache/kafka/pull/12540/files?short_path=3ca480e#diff-3ca480ef093a1faa18912e1ebc679be492b341147b96d7a85bda59911228ef45]
> Similar issues have been observed and reported to StackOverflow for example 
> [here|https://stackoverflow.com/questions/69038181/kafka-streams-aggregation-data-loss-between-instance-restarts-and-rebalances].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

2023-03-06 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697167#comment-17697167
 ] 

Guozhang Wang commented on KAFKA-14748:
---

Cool! We are on the same page for the first question then.

As for the second question: I think you're right, people can still get the old
behavior if they filter out records whose extracted key is null beforehand.
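For example, a sketch with the Streams DSL (topic names, types and the
extractor are made up, and serdes are omitted) where records whose extracted
foreign key is null are dropped before the left join, preserving today's
behavior:

{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KTable;

public class FkLeftJoinFilterExample {

    static class Order {
        String customerId; // may be null; such records are filtered out below
        String item;
    }

    static class Customer {
        String name;
    }

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical topics, for illustration only.
        KTable<String, Order> orders = builder.table("orders");
        KTable<String, Customer> customers = builder.table("customers");

        orders
            // Dropping records whose extracted FK is null up front keeps the current
            // semantics even if the non-null requirement were relaxed later.
            .filter((orderId, order) -> order.customerId != null)
            .leftJoin(customers,
                      order -> order.customerId,                      // FK extractor
                      (order, customer) -> order.item + "/" +
                          (customer == null ? "no-match" : customer.name));

        return builder.build();
    }
}
{code}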

> Relax non-null FK left-join requirement
> ---
>
> Key: KAFKA-14748
> URL: https://issues.apache.org/jira/browse/KAFKA-14748
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

2023-03-03 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696279#comment-17696279
 ] 

Guozhang Wang commented on KAFKA-14748:
---

For table-table FK left-joins, if we have a record B whose extractKey function
returns null, today we would not emit any record; if we allow it to return
null, then we would emit a join record. And if there are no further records for
this join key, then that record would be the last one, and hence the emitting
behavior has indeed changed.

> Relax non-null FK left-join requirement
> ---
>
> Key: KAFKA-14748
> URL: https://issues.apache.org/jira/browse/KAFKA-14748
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14533) Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance

2023-03-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695888#comment-17695888
 ] 

Guozhang Wang commented on KAFKA-14533:
---

After some investigation I found the following:

1. In both successful and failed runs, the first `list-offset` request could
fail if the topic creation is not yet completed; that failure is actually not
fatal, since we would just fall back to the naive assignor behavior. So that
one does not play a role here.
2. What's happening (as I found both from Jenkins and locally after about 25
runs, phew) is that stateUpdater.shutdown(timeout), where the timeout defaults
to `MAX.value`, never completes because the thread itself never exits. I have a
PR (https://github.com/apache/kafka/pull/13318) that does not rely on
interruptions to shut down the thread (a generic sketch of that approach is
below); I think it could fix the never-shutting-down issue.
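The gist of that direction, as a generic sketch (not the actual StateUpdater
code): shut the thread down via a flag it checks in its loop rather than
relying on interruption:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Generic illustration of a cooperative shutdown that does not depend on interrupts.
final class FlagBasedWorker implements Runnable {
    private volatile boolean running = true;
    private final CountDownLatch shutdownComplete = new CountDownLatch(1);

    @Override
    public void run() {
        try {
            while (running) {
                doOneUnitOfWork();
            }
        } finally {
            shutdownComplete.countDown();
        }
    }

    private void doOneUnitOfWork() {
        // e.g. restore a batch of records; must return regularly so the flag is observed
    }

    boolean shutdown(long timeoutMs) throws InterruptedException {
        running = false; // no Thread.interrupt() involved
        return shutdownComplete.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
{code}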

> Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
> -
>
> Key: KAFKA-14533
> URL: https://issues.apache.org/jira/browse/KAFKA-14533
> Project: Kafka
>  Issue Type: Test
>  Components: streams, unit tests
>Reporter: Greg Harris
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: flaky-test
>
> The SmokeTestDriverIntegrationTest appears to be flakey failing in recent 
> runs:
> ```
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1444/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1443/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1441/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1440/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1438/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1434/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
> ```
> The stacktrace appears to be:
> ```
> java.util.concurrent.TimeoutException: shouldWorkWithRebalance(boolean) timed 
> out after 600 seconds
>  at 
> org.junit.jupiter.engine.extension.TimeoutExceptionFactory.create(TimeoutExceptionFactory.java:29)
>  at 
> org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:58)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
> ...
>  Suppressed: java.lang.InterruptedException: sleep interrupted
>  at java.lang.Thread.sleep(Native Method)
>  at 
> org.apache.kafka.streams.integration.SmokeTestDriverIntegrationTest.shouldWorkWithRebalance(SmokeTestDriverIntegrationTest.java:151)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>  at 
> org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
>  ... 134 more
> ```
> The test appears to be timing out waiting for the SmokeTestClient to complete 
> its asynchronous close, and taking significantly longer to do so (600s 
> instead of 60s) than a typical local test execution time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-12639) AbstractCoordinator ignores backoff timeout when joining the consumer group

2023-02-28 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12639.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> AbstractCoordinator ignores backoff timeout when joining the consumer group
> ---
>
> Key: KAFKA-12639
> URL: https://issues.apache.org/jira/browse/KAFKA-12639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 2.7.0
>Reporter: Matiss Gutmanis
>Assignee: Philip Nee
>Priority: Major
> Fix For: 3.5.0
>
>
> We observed heavy logging while trying to join consumer group during partial 
> unavailability of Kafka cluster (it's part of our testing process). Seems 
> that {{rebalanceConfig.retryBackoffMs}} used in  {{ 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator#joinGroupIfNeeded}}
>  is not respected. Debugging revealed that {{Timer}} instance technically is 
> expired thus using sleep of 0 milliseconds which defeats the purpose of 
> backoff timeout.
> Minimal backoff timeout should be respected.
>  
> {code:java}
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14533) Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance

2023-02-28 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694801#comment-17694801
 ] 

Guozhang Wang commented on KAFKA-14533:
---

I spent some time on this and also cannot find any clues linking the
state-updater to the list-offset request failures on changelog topics.
Nevertheless, I re-enabled the state-updater flag with finer-grained logging in
the assignor in https://github.com/apache/kafka/pull/13318, and hopefully
Jenkins can give me some more clues (like [~ableegoldman], I cannot reproduce
this issue locally either, with about 15 tries).

> Flaky Test SmokeTestDriverIntegrationTest.shouldWorkWithRebalance
> -
>
> Key: KAFKA-14533
> URL: https://issues.apache.org/jira/browse/KAFKA-14533
> Project: Kafka
>  Issue Type: Test
>  Components: streams, unit tests
>Reporter: Greg Harris
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: flaky-test
>
> The SmokeTestDriverIntegrationTest appears to be flakey failing in recent 
> runs:
> ```
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1444/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1443/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1441/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1440/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1438/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
>     
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1434/tests/
>         java.util.concurrent.TimeoutException: 
> shouldWorkWithRebalance(boolean) timed out after 600 seconds
> ```
> The stacktrace appears to be:
> ```
> java.util.concurrent.TimeoutException: shouldWorkWithRebalance(boolean) timed 
> out after 600 seconds
>  at 
> org.junit.jupiter.engine.extension.TimeoutExceptionFactory.create(TimeoutExceptionFactory.java:29)
>  at 
> org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:58)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
> ...
>  Suppressed: java.lang.InterruptedException: sleep interrupted
>  at java.lang.Thread.sleep(Native Method)
>  at 
> org.apache.kafka.streams.integration.SmokeTestDriverIntegrationTest.shouldWorkWithRebalance(SmokeTestDriverIntegrationTest.java:151)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>  at 
> org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
>  ... 134 more
> ```
> The test appears to be timing out waiting for the SmokeTestClient to complete 
> its asynchronous close, and taking significantly longer to do so (600s 
> instead of 60s) than a typical local test execution time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

2023-02-28 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694652#comment-17694652
 ] 

Guozhang Wang commented on KAFKA-14748:
---

I agree for Stream-Stream joins now, since in the case "no right hand side
value found" we would not emit immediately, which is the same as the case
"key-extractor returns null". But for table-table FK-joins, today the former
case would emit while the latter case would not?

> Relax non-null FK left-join requirement
> ---
>
> Key: KAFKA-14748
> URL: https://issues.apache.org/jira/browse/KAFKA-14748
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14747) FK join should record discarded subscription responses

2023-02-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694213#comment-17694213
 ] 

Guozhang Wang commented on KAFKA-14747:
---

Echoing that, I think we can piggy-back on the existing `dropped-records`
sensor, as it has also been replacing other old sensors like
`expired-window-record-drop` in KIP-743.

> FK join should record discarded subscription responses
> --
>
> Key: KAFKA-14747
> URL: https://issues.apache.org/jira/browse/KAFKA-14747
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Koma Zhang
>Priority: Minor
>  Labels: beginner, newbie
>
> FK-joins are subject to a race condition: If the left-hand side record is 
> updated, a subscription is sent to the right-hand side (including a hash 
> value of the left-hand side record), and the right-hand side might send back 
> join responses (also including the original hash). The left-hand side only 
> processed the responses if the returned hash matches to current hash of the 
> left-hand side record, because a different hash implies that the lef- hand 
> side record was updated in the mean time (including sending a new 
> subscription to the right hand side), and thus the data is stale and the 
> response should not be processed (joining the response to the new record 
> could lead to incorrect results).
> A similar thing can happen on a right-hand side update that triggers a 
> response, that might be dropped if the left-hand side record was updated in 
> parallel.
> While the behavior is correct, we don't record if this happens. We should 
> consider to record this using the existing "dropped record" sensor or maybe 
> add a new sensor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14748) Relax non-null FK left-join requirement

2023-02-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694147#comment-17694147
 ] 

Guozhang Wang commented on KAFKA-14748:
---

Originally I thought this could be treated without a KIP, just as a fix in join
semantics. But thinking about it again, I realized that may not be the case,
primarily because we then cannot distinguish the following two cases:

1) The key extractor returns a non-null `K0`, and a matching record is then
found for `K0` with a null `V0`, resulting in a join output with a null
right-hand value.

2) The key extractor returns a null `K0`, and hence we directly produce a join
output with a null right-hand value.

Hence, adding a `filter` operator after the `join` operator alone cannot
preserve the old behavior (it cannot tell these two outputs apart) if a
developer really wants it.

In fact, the same question applies to the general issue of
https://issues.apache.org/jira/browse/KAFKA-12317 as well: should we try to
distinguish between the case where a null key is extracted for the join, vs.
the case where a non-null extracted key did not find a matching record in the
other relation (or more specifically, the other relation returns a null value
for the extracted key)?

My thoughts on the above question are as follows: performance benefits aside,
if the developer knows there are certain keys that would never exist in the
other relation (i.e. that would always return a null value), then the developer
could let the key extractor return one of those keys whenever they want a
no-match join result; that means the value of KAFKA-12317/KAFKA-14748 lies in
the case where the developer does not know of any key in the other relation
that would never exist.

If we want to change to the behavior that does not distinguish these two cases,
I'd suggest we add a flag config to enable it across FK/outer/left joins, and
remove the flag (i.e. always enable the behavior) once we have not heard people
complain about the behavior change for a while. But this would require a KIP.
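To illustrate why the two cases collapse after the join (hypothetical output
type, purely for illustration): both surface downstream as a left value paired
with a null right value, so a subsequent filter cannot tell them apart:

{code:java}
// Hypothetical join output type, for illustration only.
final class JoinOutput {
    final String leftValue;
    final String rightValue; // null in both cases below

    JoinOutput(String leftValue, String rightValue) {
        this.leftValue = leftValue;
        this.rightValue = rightValue;
    }

    // Case 1: the FK extractor returned a non-null key, but the matching record's value is null.
    static JoinOutput matchedWithNullRhs(String leftValue) {
        return new JoinOutput(leftValue, null);
    }

    // Case 2: the FK extractor returned null, so the join directly emits a null right-hand side.
    static JoinOutput nullExtractedKey(String leftValue) {
        return new JoinOutput(leftValue, null);
    }
    // Downstream both outputs look identical, (leftValue, null), so a filter on the
    // join result alone cannot recover which situation produced the record.
}
{code}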

> Relax non-null FK left-join requirement
> ---
>
> Key: KAFKA-14748
> URL: https://issues.apache.org/jira/browse/KAFKA-14748
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Priority: Major
>
> Kafka Streams enforces a strict non-null-key policy in the DSL across all 
> key-dependent operations (like aggregations and joins).
> This also applies to FK-joins, in particular to the ForeignKeyExtractor. If 
> it returns `null`, it's treated as invalid. For left-joins, it might make 
> sense to still accept a `null`, and add the left-hand record with an empty 
> right-hand-side to the result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14253) StreamsPartitionAssignor should print the member count in assignment logs

2023-02-16 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-14253.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> StreamsPartitionAssignor should print the member count in assignment logs
> -
>
> Key: KAFKA-14253
> URL: https://issues.apache.org/jira/browse/KAFKA-14253
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: Christopher Pooya Razavian
>Priority: Minor
>  Labels: newbie, newbie++
> Fix For: 3.5.0
>
>
> Debugging rebalance and assignment issues is harder than it needs to be. One 
> simple thing that can help is to print out information in the logs that users 
> have to compute today.
> For example, the StreamsPartitionAssignor prints two messages that contain 
> the newline-delimited group membership:
> {code:java}
> [StreamsPartitionAssignor] [...-StreamThread-1] stream-thread 
> [...-StreamThread-1-consumer] All members participating in this rebalance:
> : []
> : []
> : []{code}
> and
> {code:java}
> [StreamsPartitionAssignor] [...-StreamThread-1] stream-thread 
> [...-StreamThread-1-consumer] Assigned tasks [...] including stateful [...] 
> to clients as:
> =[activeTasks: ([...]) standbyTasks: ([...])]
> =[activeTasks: ([...]) standbyTasks: ([...])]
> =[activeTasks: ([...]) standbyTasks: ([...])
> {code}
>  
> In both of these cases, it would be nice to:
>  # Include the number of members in the group (I.e., "15 members 
> participating" and "to 15 clients as")
>  # sort the member ids (to help compare the membership and assignment across 
> rebalances)
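For illustration only, a small sketch of how such a log line could be built
(not the actual StreamsPartitionAssignor code), with the member count included
and the member ids sorted for easier comparison across rebalances:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

final class AssignmentLogging {
    // Builds a message like "15 members participating in this rebalance: a, b, c, ..."
    static String membershipSummary(List<String> memberIds) {
        String sorted = memberIds.stream()
            .sorted()
            .collect(Collectors.joining(", "));
        return memberIds.size() + " members participating in this rebalance: " + sorted;
    }
}
{code}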



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13152) Replace "buffered.records.per.partition" & "cache.max.bytes.buffering" with "{statestore.cache}/{input.buffer}.max.bytes"

2023-02-14 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688684#comment-17688684
 ] 

Guozhang Wang commented on KAFKA-13152:
---

Thanks [~mimaison], this is still quite relevant and we'd want to fix it 
forward. Let's move it to 3.5.0.

> Replace "buffered.records.per.partition" & "cache.max.bytes.buffering" with 
> "{statestore.cache}/{input.buffer}.max.bytes"
> -
>
> Key: KAFKA-13152
> URL: https://issues.apache.org/jira/browse/KAFKA-13152
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Sagar Rao
>Priority: Major
>  Labels: kip
>
> The current config "buffered.records.per.partition" controls the maximum 
> number of records to bookkeep, and hence when it is exceeded we would pause 
> fetching from this partition. However this config has two issues:
> * It's a per-partition config, so the total memory consumed is dependent on 
> the dynamic number of partitions assigned.
> * Record size could vary from case to case.
> And hence it's hard to bound the memory usage for this buffering. We should 
> consider deprecating that config with a global, e.g. "input.buffer.max.bytes" 
> which controls how much bytes in total is allowed to be buffered. This is 
> doable since we buffer the raw records in .
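For illustration of the change described above, roughly what the replacement
could look like in application config (config names are taken from the ticket
title and the KIP discussion; they are proposals, not necessarily the final
keys):

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class BufferConfigSketch {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Today: a per-partition record count plus a separate cache bound, e.g.
        // props.put(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, 1000);
        // props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);

        // Proposed direction: global byte-based bounds (names not guaranteed final).
        props.put("statestore.cache.max.bytes", 10 * 1024 * 1024L);
        props.put("input.buffer.max.bytes", 512 * 1024 * 1024L);
        return props;
    }
}
{code}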



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12639) AbstractCoordinator ignores backoff timeout when joining the consumer group

2023-02-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683598#comment-17683598
 ] 

Guozhang Wang commented on KAFKA-12639:
---

Yeah that would work better I think.

> AbstractCoordinator ignores backoff timeout when joining the consumer group
> ---
>
> Key: KAFKA-12639
> URL: https://issues.apache.org/jira/browse/KAFKA-12639
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 2.7.0
>Reporter: Matiss Gutmanis
>Assignee: Philip Nee
>Priority: Major
>
> We observed heavy logging while trying to join consumer group during partial 
> unavailability of Kafka cluster (it's part of our testing process). Seems 
> that {{rebalanceConfig.retryBackoffMs}} used in  {{ 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator#joinGroupIfNeeded}}
>  is not respected. Debugging revealed that {{Timer}} instance technically is 
> expired thus using sleep of 0 milliseconds which defeats the purpose of 
> backoff timeout.
> Minimal backoff timeout should be respected.
>  
> {code:java}
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,488 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,489 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,490 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,491 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] JoinGroup failed: Coordinator 
> 127.0.0.1:9092 (id: 2147483634 rack: null) is loading the group.
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] Rebalance failed.
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.
> 2021-03-30 08:30:24,492 INFO 
> [fs2-kafka-consumer-41][o.a.k.c.c.i.AbstractCoordinator] [Consumer 
> clientId=app_clientid, groupId=consumer-group] (Re-)joining group
> {code}
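An illustrative sketch of the behavior the description above asks for (not the
actual AbstractCoordinator code; names are made up): back off for at least the
retry backoff even when the join timer has already expired, instead of retrying
in a tight loop:

{code:java}
final class BackoffSketch {
    // Conceptually the bug: sleep(min(timer.remainingMs(), retryBackoffMs)) sleeps 0 ms
    // once the timer has expired, producing a tight retry loop and heavy logging.
    // The requested behavior: always respect a minimal backoff before re-joining.
    static void backoffBeforeRetry(long timerRemainingMs, long retryBackoffMs)
            throws InterruptedException {
        long sleepMs = timerRemainingMs > 0
            ? Math.min(timerRemainingMs, retryBackoffMs)
            : retryBackoffMs; // an expired timer still honors the minimal backoff
        Thread.sleep(sleepMs);
    }
}
{code}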



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14382) StreamThreads can miss rebalance events when processing records during a rebalance

2023-01-20 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679346#comment-17679346
 ] 

Guozhang Wang commented on KAFKA-14382:
---

Thanks for catching this bug [~ableegoldman]! I'm late reviewing the PR but I
still agree with your description of the general case. And I think we already
have the tools to address the fundamentals as well:

1) Once the consumer thread refactoring is done (which is why I also added the
corresponding label), the rebalance would be handled completely by the
background thread and would not rely on Streams calling `poll` in time at all.
To validate that the caller thread is still alive we would still need to call
it within the max poll interval, but nothing else, like the rebalance-related
timeouts, would matter.

2) Once we move restoration (a heavy IO operation) to a separate thread, the
likelihood that the stream thread gets stuck and cannot call `poll` in time
should also be much lower.

> StreamThreads can miss rebalance events when processing records during a 
> rebalance
> --
>
> Key: KAFKA-14382
> URL: https://issues.apache.org/jira/browse/KAFKA-14382
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: new-consumer-threading-should-fix, rebalancing
> Fix For: 3.4.0, 3.3.2, 3.2.4, 3.1.3, 3.0.3
>
>
> One of the main improvements introduced by the cooperative protocol was the 
> ability to continue processing records during a rebalance. In Streams, we 
> take advantage of this by polling with a timeout of 0 when a rebalance is/has 
> been in progress, so it can return immediately and continue on through the 
> main loop to process new records. The main poll loop uses an algorithm based 
> on the max.poll.interval.ms to ensure the StreamThread returns to call #poll 
> in time to stay in the consumer group.
>  
> Generally speaking, it should exit the processing loop and invoke poll within 
> a few minutes at most based on the poll interval, though typically it will 
> break out much sooner once it's used up all the records from the last poll 
> (based on the max.poll.records config which Streams sets to 1,000 by 
> default). However, if doing heavy processing or setting a higher 
> max.poll.records, the thread may continue processing for more than a few 
> seconds. If it had sent out a JoinGroup request before going on to process 
> and was waiting for its JoinGroup response, then once it does return to 
> invoke #poll it will process this response and send out a SyncGroup – but if 
> the processing took too long, this SyncGroup may immediately fail with the 
> REBALANCE_IN_PROGRESS error.
>  
> Essentially, while the thread was processing the group leader will itself be 
> processing the JoinGroup subscriptions of all members and generating an 
> assignment, then sending this back in its SyncGroup. This may take only a few 
> seconds or less, and the group coordinator will not yet have noticed (or 
> care) that one of the consumers hasn't sent a SyncGroup – it will just return 
> the assigned partitions in the SyncGroup request of the members who have 
> responded in time, and "complete" the rebalance in their eyes. But if the 
> assignment involved moving any partitions from one consumer to another, then 
> it will need to trigger a followup rebalance right away to finish assigning 
> those partitions which were revoked in the previous rebalance. This is what 
> causes a new rebalance to be kicked off just seconds after the first one 
> began.
>  
> If the consumer that was stuck processing was among those who needed to 
> revoke partitions, this can lead to repeating rebalances – since it fails the 
> SyncGroup of the 1st rebalance it never receives the assignment for it and 
> never knows to revoke those partitions, meaning it will rejoin for the new 
> rebalance still claiming them among its ownedPartitions. When the assignor 
> generates the same assignment for the 2nd rebalance, it will again see that 
> some partitions need to be revoked and will therefore trigger yet another new 
> rebalance after finishing the 2nd. This can go on for as long as the 
> StreamThreads are struggling to finish the JoinGroup phase in time due to 
> processing.
>  
> Note that the best workaround at the moment is probably to just set a lower 
> max.poll.records to reduce the processing loop duration
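As an illustration of that workaround (the values here are arbitrary), lowering
max.poll.records for the Streams-managed consumer:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class LowerMaxPollRecords {
    public static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Streams defaults this to 1,000; a lower value shortens each processing loop
        // so the StreamThread returns to poll() (and the rebalance protocol) sooner.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), 100);
        return props;
    }
}
{code}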



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14382) StreamThreads can miss rebalance events when processing records during a rebalance

2023-01-20 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14382:
--
Labels: new-consumer-threading-should-fix rebalancing  (was: rebalancing)

> StreamThreads can miss rebalance events when processing records during a 
> rebalance
> --
>
> Key: KAFKA-14382
> URL: https://issues.apache.org/jira/browse/KAFKA-14382
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: new-consumer-threading-should-fix, rebalancing
> Fix For: 3.4.0, 3.3.2, 3.2.4, 3.1.3, 3.0.3
>
>
> One of the main improvements introduced by the cooperative protocol was the 
> ability to continue processing records during a rebalance. In Streams, we 
> take advantage of this by polling with a timeout of 0 when a rebalance is/has 
> been in progress, so it can return immediately and continue on through the 
> main loop to process new records. The main poll loop uses an algorithm based 
> on the max.poll.interval.ms to ensure the StreamThread returns to call #poll 
> in time to stay in the consumer group.
>  
> Generally speaking, it should exit the processing loop and invoke poll within 
> a few minutes at most based on the poll interval, though typically it will 
> break out much sooner once it's used up all the records from the last poll 
> (based on the max.poll.records config which Streams sets to 1,000 by 
> default). However, if doing heavy processing or setting a higher 
> max.poll.records, the thread may continue processing for more than a few 
> seconds. If it had sent out a JoinGroup request before going on to process 
> and was waiting for its JoinGroup response, then once it does return to 
> invoke #poll it will process this response and send out a SyncGroup – but if 
> the processing took too long, this SyncGroup may immediately fail with the 
> REBALANCE_IN_PROGRESS error.
>  
> Essentially, while the thread was processing the group leader will itself be 
> processing the JoinGroup subscriptions of all members and generating an 
> assignment, then sending this back in its SyncGroup. This may take only a few 
> seconds or less, and the group coordinator will not yet have noticed (or 
> care) that one of the consumers hasn't sent a SyncGroup – it will just return 
> the assigned partitions in the SyncGroup request of the members who have 
> responded in time, and "complete" the rebalance in their eyes. But if the 
> assignment involved moving any partitions from one consumer to another, then 
> it will need to trigger a followup rebalance right away to finish assigning 
> those partitions which were revoked in the previous rebalance. This is what 
> causes a new rebalance to be kicked off just seconds after the first one 
> began.
>  
> If the consumer that was stuck processing was among those who needed to 
> revoke partitions, this can lead to repeating rebalances – since it fails the 
> SyncGroup of the 1st rebalance it never receives the assignment for it and 
> never knows to revoke those partitions, meaning it will rejoin for the new 
> rebalance still claiming them among its ownedPartitions. When the assignor 
> generates the same assignment for the 2nd rebalance, it will again see that 
> some partitions need to be revoked and will therefore trigger yet another new 
> rebalance after finishing the 2nd. This can go on for as long as the 
> StreamThreads are struggling to finish the JoinGroup phase in time due to 
> processing.
>  
> Note that the best workaround at the moment is probably to just set a lower 
> max.poll.records to reduce the processing loop duration



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14641) Cleanup CommitNeeded after EOS-V1 is removed

2023-01-19 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-14641:
-

 Summary: Cleanup CommitNeeded after EOS-V1 is removed
 Key: KAFKA-14641
 URL: https://issues.apache.org/jira/browse/KAFKA-14641
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang


This is a follow-up of KAFKA-14294.

Today we have several flags to determine if KS need to execute a commit: 1) 
task-level "commitNeeded" which is set whenever process() or punctuator() is 
called, 2) if there are input topic offsets to commit, retrieved from the 
"task.prepareCommit()", 3) the "transactionInFlight" flag from producer as a 
fix of KAFKA-14294 (this subsumes the first "commitNeeded" functionality).

Given that we still have EOS-v1, cleaning this up would be a bit complex. But 
after the deprecated EOS-v1 is removed, we can clean up those controls, since for 
any commit case we would need to commit all tasks anyway, whereas in EOS-v1 we 
would probably commit only a subset of tasks, since they are handled by different 
producers and hence different txns.

A quick thought is the following:

1) We would not need the per-task "commitNeeded" anymore.
2) We would maintain a single "commitNeeded" flag on the task-executor, hence 
on the thread level. It is set whenever `process()` or `punctuator` is called.
3) Whenever we need to commit, either a) periodically, b) upon revocation, c) 
upon user request, we simply check that flag, and if necessary commit all tasks 
and reset the flag.
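
A rough sketch of the proposal above, using hypothetical class and method names rather than the actual Streams internals:

{code:java}
import java.util.Collection;

interface Task {
    void process();
    void commit();
}

class TaskExecutor {
    // single thread-level flag replacing the per-task "commitNeeded"
    private boolean commitNeeded = false;

    void process(Task task) {
        task.process();        // same idea for punctuation
        commitNeeded = true;
    }

    // called periodically, upon revocation, or upon user request
    void maybeCommitAll(Collection<Task> tasks) {
        if (commitNeeded) {
            tasks.forEach(Task::commit);
            commitNeeded = false;
        }
    }
}
{code}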



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14641) Cleanup CommitNeeded after EOS-V1 is removed

2023-01-19 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14641:
--
Component/s: streams

> Cleanup CommitNeeded after EOS-V1 is removed
> 
>
> Key: KAFKA-14641
> URL: https://issues.apache.org/jira/browse/KAFKA-14641
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>
> This is a follow-up of KAFKA-14294.
> Today we have several flags to determine if KS need to execute a commit: 1) 
> task-level "commitNeeded" which is set whenever process() or punctuator() is 
> called, 2) if there are input topic offsets to commit, retrieved from the 
> "task.prepareCommit()", 3) the "transactionInFlight" flag from producer as a 
> fix of KAFKA-14294 (this subsumes the first "commitNeeded" functionality).
> Given that we still have EOS-v1, cleaning this up would be a bit complex. 
> But after the deprecated EOS-v1 is removed, we can clean up those controls, 
> since for any commit case we would need to commit all tasks anyway, whereas 
> in EOS-v1 we would probably commit only a subset of tasks, since they are 
> handled by different producers and hence different txns.
> A quick thought is the following:
> 1) We would not need the per-task "commitNeeded" anymore.
> 2) We would maintain a single "commitNeeded" flag on the task-executor, hence 
> on the thread level. It is set whenever `process()` or `punctuator` is called.
> 3) Whenever we need to commit, either a) periodically, b) upon revocation, c) 
> upon user request, we simply check that flag, and if necessary commit all 
> tasks and reset the flag.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14328) KafkaAdminClient should be Changing the exception level When an exception occurs

2022-10-27 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625256#comment-17625256
 ] 

Guozhang Wang commented on KAFKA-14328:
---

Hello [~shizhenzhen] sorry for the late reply. I was OOO for a while.

I reviewed the ticket and the latest PR, and I think your proposed change makes 
sense: it is less intrusive than logging a warn on every retry, which people might 
worry would flood the log entries. I left some comments on the PR.

> KafkaAdminClient should be Changing the exception level When an exception 
> occurs
> 
>
> Key: KAFKA-14328
> URL: https://issues.apache.org/jira/browse/KAFKA-14328
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 3.3
>Reporter: shizhenzhen
>Priority: Major
> Attachments: image-2022-10-21-11-19-21-064.png, 
> image-2022-10-21-14-56-31-753.png, image-2022-10-21-16-54-40-588.png, 
> image-2022-10-21-16-56-45-448.png, image-2022-10-21-16-58-19-353.png, 
> image-2022-10-24-14-28-10-365.png, image-2022-10-24-14-47-30-641.png, 
> image-2022-10-24-14-48-27-907.png
>
>
>  
>  
> Some of KafkaAdminClient's logging is done entirely at log.trace. When an 
> exception occurs you have no idea what caused it, which makes troubleshooting 
> very difficult.
>  
> For example, here: when a Metadata request is sent and one of the queried 
> topics has a partition with Leader=-1, an exception is thrown;
>  
> but at that point the exception is effectively swallowed. After it is thrown 
> upwards here, it lands in the catch block shown in the second screenshot below.
> The request is put back into the request queue, and the client then falls into 
> an endless retry loop until the timeout is reached and it throws: Timed out 
> waiting for a node assignment. Call: metadata
>  
> "Unable to assign a node to the Metadata request" -- under normal 
> circumstances, who would know that the real underlying exception is actually
>  
> ```
> org.apache.kafka.common.errors.LeaderNotAvailableException: There is no 
> leader for this topic-partition as we are in the middle of a leadership 
> election.
>  
> ```
>  
>  
>  
>  
> !https://user-images.githubusercontent.com/10442648/196944422-e11b732f-6f7f-4f77-8d9c-1f0544257461.png!
>  
>  
>  
> The screenshot below shows the log after I changed it to warn level:
> !image-2022-10-21-11-19-21-064.png!
>  
> So I hope the log.trace here can be changed to log.warn, to give a hint that 
> the current retry may be caused by some underlying exception.
>  
>  
> 
>  
>  
> !image-2022-10-21-14-56-31-753.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14270) Kafka Streams logs exception on startup

2022-10-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17612165#comment-17612165
 ] 

Guozhang Wang commented on KAFKA-14270:
---

Thanks for filing the bug [~eikemeier], will take a look.

From your description, it seems that whenever Kafka Streams is started with any 
integration tooling besides groovy, it will always log a warning?

> Kafka Streams logs exception on startup
> ---
>
> Key: KAFKA-14270
> URL: https://issues.apache.org/jira/browse/KAFKA-14270
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.0
>Reporter: Oliver Eikemeier
>Priority: Minor
>
> Kafka Streams expects a version resource at 
> /kafka/kafka-streams-version.properties. It is read by {{{}ClientMetrics{}}}, 
> initialised by
> [https://github.com/apache/kafka/blob/3.3.0/streams/src/main/java/org/apache/kafka/streams/KafkaStreams.java#L894]
> When the resource is not found,
> [https://github.com/apache/kafka/blob/3.3.0/streams/src/main/java/org/apache/kafka/streams/internals/metrics/ClientMetrics.java#L55]
> logs a warning at startup:
> org.apache.kafka.streams.internals.metrics.ClientMetrics  WARN: Error 
> while loading kafka-streams-version.properties
> java.lang.NullPointerException: inStream parameter is null
>   at java.base/java.util.Objects.requireNonNull(Objects.java:233)
>   at java.base/java.util.Properties.load(Properties.java:407)
>   at 
> org.apache.kafka.streams.internals.metrics.ClientMetrics.(ClientMetrics.java:53)
>   at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:894)
>   at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:856)
>   at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:826)
>   at org.apache.kafka.streams.KafkaStreams.(KafkaStreams.java:738)
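
For illustration only (not the actual ClientMetrics code, and the property key is an assumption), a minimal sketch of loading the same resource defensively so a missing file degrades to an "unknown" version instead of the NullPointerException above:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class VersionPropertiesExample {
    public static String loadVersion() {
        Properties props = new Properties();
        try (InputStream in = VersionPropertiesExample.class
                .getResourceAsStream("/kafka/kafka-streams-version.properties")) {
            if (in == null) {
                // resource missing (e.g. repackaged/shaded jar): fall back quietly
                return "unknown";
            }
            props.load(in);
            return props.getProperty("version", "unknown");   // key name assumed
        } catch (IOException e) {
            return "unknown";
        }
    }
}
{code}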



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14260) InMemoryKeyValueStore iterator still throws ConcurrentModificationException

2022-10-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17612164#comment-17612164
 ] 

Guozhang Wang edited comment on KAFKA-14260 at 10/3/22 3:50 AM:


Hello [~aviperksy] sorry for the late reply! I looked at the code again and I 
think I agree with you --- we are probably looking at different versions of the 
source code since in latest trunk, line 125 seems irrelevant 
(https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStore.java#L125)
 --- but the thing is that at the time when this line was called:

{code}
if (forward) {
this.iter = new TreeSet<>(keySet).iterator();
} else {
this.iter = new TreeSet<>(keySet).descendingIterator();
}
{code}

in which the constructor of {{TreeSet}} loops over the {{keySet}}: if that 
{{keySet}}'s underlying map is modified during the copy, then we still have an 
issue. As for the fix, I think in the near term we'd have to bite the bullet on 
performance and turn back to ConcurrentSkipListMap (we may tune some initial 
params, e.g. using a concurrencyLevel of 1). In the long term, I think we could 
leverage ideas similar to what we are pursuing for transactional state stores, 
where we keep two in-memory maps: the first map is read-only for IQ, and the 
second is used to maintain deltas within a commit interval. During processing 
we'd need to read both maps, and upon committing we lock the first one to apply 
the deltas.
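
A minimal sketch of the near-term option using plain JDK types (not the actual store code): ConcurrentSkipListMap's iterators are weakly consistent and never throw ConcurrentModificationException:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListSketch {
    public static void main(String[] args) {
        ConcurrentNavigableMap<String, byte[]> store = new ConcurrentSkipListMap<>();
        store.put("a", new byte[] {1});

        // "IQ thread": safe to iterate even while the stream thread keeps writing
        for (Map.Entry<String, byte[]> entry : store.headMap("z").entrySet()) {
            System.out.println(entry.getKey());
        }

        // "stream thread": concurrent writes are fine; readers see a weakly-consistent view
        store.put("b", new byte[] {2});
    }
}
{code}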

cc [~ableegoldman] what do you think?


was (Author: guozhang):
Hello [~aviperksy] sorry for the late reply!

> InMemoryKeyValueStore iterator still throws ConcurrentModificationException
> ---
>
> Key: KAFKA-14260
> URL: https://issues.apache.org/jira/browse/KAFKA-14260
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 3.2.3
>Reporter: Avi Cherry
>Priority: Major
>
> This is the same bug as KAFKA-7912 which was then re-introduced by KAFKA-8802.
> Any iterator returned from {{InMemoryKeyValueStore}} may end up throwing a 
> ConcurrentModificationException because the backing map is not concurrent 
> safe. I expect that this only happens when the store is retrieved from 
> {{KafkaStreams.store()}} from outside of the topology since any usage of the 
> store from inside of the topology should be naturally single-threaded.
> To start off, a reminder that this behaviour explicitly violates the 
> interface contract for {{ReadOnlyKeyValueStore}} which states
> {quote}The returned iterator must be safe from 
> java.util.ConcurrentModificationExceptions
> {quote}
> It is often complicated to make code to demonstrate concurrency bugs, but 
> thankfully it is trivial to reason through the source code in 
> {{InMemoryKeyValueStore.java}} to show why this happens:
>  * All of the InMemoryKeyValueStore methods that return iterators do so by 
> passing a keySet based on the backing TreeMap to the InMemoryKeyValueIterator 
> constructor.
>  * These keySets are all VIEWS of the backing map, not copies.
>  * The InMemoryKeyValueIterator then makes a private copy of the keySet by 
> passing the original keySet into the constructor for TreeSet. This copying 
> was implemented in KAFKA-8802, incorrectly intending it to fix the 
> concurrency problem.
>  * TreeSet then iterates over the keySet to make a copy. If the original 
> backing TreeMap in InMemoryKeyValueStore is changed while this copy is being 
> created it will fail-fast a ConcurrentModificationException.
> This bug should be able to be trivially fixed by replacing the backing 
> TreeMap with a ConcurrentSkipListMap but here's the rub:
> This bug has already been found in KAFKA-7912 and the TreeMap was replaced 
> with a ConcurrentSkipListMap. It was then reverted back to a TreeMap in 
> KAFKA-8802 because of the performance regression. I can [see from one of the 
> PRs|https://github.com/apache/kafka/pull/7212/commits/384c12e40f3a59591f897d916f92253e126820ed]
>  that it was believed the concurrency problem with the TreeMap implementation 
> was fixed by copying the keyset when the iterator is created but the problem 
> remains, plus the fix creates an extra copy of the iterated portion of the 
> set in memory.
> For what it's worth, the performance difference between TreeMap and 
> ConcurrentSkipListMap do not extend into complexity. TreeMap enjoys a similar 
> ~2x speed through all operations with any size of data, but at the cost of 
> what turned out to be an easy-to-encounter bug.
> This is all unfortunate since the only time the state stores ever get 
> accessed concurrently is through the `KafkaStreams.store()` mechanism, but I 
> would imagine that "correct and 

[jira] [Commented] (KAFKA-14260) InMemoryKeyValueStore iterator still throws ConcurrentModificationException

2022-10-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17612164#comment-17612164
 ] 

Guozhang Wang commented on KAFKA-14260:
---

Hello [~aviperksy] sorry for the late reply!

> InMemoryKeyValueStore iterator still throws ConcurrentModificationException
> ---
>
> Key: KAFKA-14260
> URL: https://issues.apache.org/jira/browse/KAFKA-14260
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 3.2.3
>Reporter: Avi Cherry
>Priority: Major
>
> This is the same bug as KAFKA-7912 which was then re-introduced by KAFKA-8802.
> Any iterator returned from {{InMemoryKeyValueStore}} may end up throwing a 
> ConcurrentModificationException because the backing map is not concurrent 
> safe. I expect that this only happens when the store is retrieved from 
> {{KafkaStreams.store()}} from outside of the topology since any usage of the 
> store from inside of the topology should be naturally single-threaded.
> To start off, a reminder that this behaviour explicitly violates the 
> interface contract for {{ReadOnlyKeyValueStore}} which states
> {quote}The returned iterator must be safe from 
> java.util.ConcurrentModificationExceptions
> {quote}
> It is often complicated to make code to demonstrate concurrency bugs, but 
> thankfully it is trivial to reason through the source code in 
> {{InMemoryKeyValueStore.java}} to show why this happens:
>  * All of the InMemoryKeyValueStore methods that return iterators do so by 
> passing a keySet based on the backing TreeMap to the InMemoryKeyValueIterator 
> constructor.
>  * These keySets are all VIEWS of the backing map, not copies.
>  * The InMemoryKeyValueIterator then makes a private copy of the keySet by 
> passing the original keySet into the constructor for TreeSet. This copying 
> was implemented in KAFKA-8802, incorrectly intending it to fix the 
> concurrency problem.
>  * TreeSet then iterates over the keySet to make a copy. If the original 
> backing TreeMap in InMemoryKeyValueStore is changed while this copy is being 
> created it will fail-fast a ConcurrentModificationException.
> This bug should be able to be trivially fixed by replacing the backing 
> TreeMap with a ConcurrentSkipListMap but here's the rub:
> This bug has already been found in KAFKA-7912 and the TreeMap was replaced 
> with a ConcurrentSkipListMap. It was then reverted back to a TreeMap in 
> KAFKA-8802 because of the performance regression. I can [see from one of the 
> PRs|https://github.com/apache/kafka/pull/7212/commits/384c12e40f3a59591f897d916f92253e126820ed]
>  that it was believed the concurrency problem with the TreeMap implementation 
> was fixed by copying the keyset when the iterator is created but the problem 
> remains, plus the fix creates an extra copy of the iterated portion of the 
> set in memory.
> For what it's worth, the performance difference between TreeMap and 
> ConcurrentSkipListMap do not extend into complexity. TreeMap enjoys a similar 
> ~2x speed through all operations with any size of data, but at the cost of 
> what turned out to be an easy-to-encounter bug.
> This is all unfortunate since the only time the state stores ever get 
> accessed concurrently is through the `KafkaStreams.store()` mechanism, but I 
> would imagine that "correct and slightly slower" is better than "incorrect 
> and faster".
> Too bad BoilerBay's AirConcurrentMap is closed-source and patented.
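
For reference, a tiny stand-alone sketch of the failure mode described above: copying a live keySet view into a TreeSet fails fast if the backing map changes mid-copy (it is a race, so it may take a few runs to trigger):

{code:java}
import java.util.Iterator;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeySetCopyRace {
    public static void main(String[] args) throws Exception {
        TreeMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            store.put("key-" + i, "value");
        }

        // "stream thread": keeps mutating the backing map
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                store.put("extra-" + i, "value");
            }
        });
        writer.start();

        // "IQ thread": new TreeSet<>(keySet) iterates the live view while copying,
        // which can throw ConcurrentModificationException
        Iterator<String> iter = new TreeSet<>(store.keySet()).iterator();
        System.out.println(iter.hasNext());
        writer.join();
    }
}
{code}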



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14269) Partition Assignment Strategy - Topic Round Robin Assignor

2022-10-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17612163#comment-17612163
 ] 

Guozhang Wang commented on KAFKA-14269:
---

Hello [~mathieu.amblard], thanks for reporting this use case.

I'm wondering whether, in your scenario, all topics have the same number of 
partitions and similar data traffic as well? I'm asking because one of the 
primary goals of partition assignors is to achieve workload balance, so if topics 
have different partition counts / data traffic, such an assignor may lead to a 
very imbalanced assignment.

> Partition Assignment Strategy - Topic Round Robin Assignor
> --
>
> Key: KAFKA-14269
> URL: https://issues.apache.org/jira/browse/KAFKA-14269
> Project: Kafka
>  Issue Type: Wish
>  Components: clients
>Reporter: Mathieu Amblard
>Priority: Major
>
> *The context :*
> I have :
>  * only one type of message per topic
>  * the same number of consumers and topics
>  * each consumer subscribes to all topics in the same microservice
>  * a strategy where I stopped the consumer if the consumption failed
>  
> *The need :*
> I would like to have a Topic Round Robin Assignor in order to assign all 
> partitions of same topic to exactly one consumer, therefore I will be able to 
> continue the consumption of one topic even if one failed.
> If there are exactly the same number of consumers and topics, then each 
> consumer will get all partitions of one topic.
> If there are more consumers than topics, then some consumer will not have any 
> partitions to consume.
> If there are less consumers than topics, then some consumer will have 
> multiple topics to consume.
>  
> As far as I know, there are currently 4 different strategies : 
> CooperativeStickyAssignor, RangeAssignor, RoundRobinAssignor, StickyAssignor.
> Therefore, I have written my own Topic Round Robin Assignor that assigns all 
> partitions from each topic to exactly one consumer.
>  
> For example, suppose there are two consumers *C0* and {*}C1{*}, two topics 
> *t0* and {*}t1{*}, and each topic has 3 partitions, resulting in partitions 
> {*}t0p0{*}, {*}t0p1{*}, {*}t0p2{*}, {*}t1p0{*}, {*}t1p1{*}, and {*}t1p2{*}.
> The assignment will be:
> C0: [t0p0, t0p1, t0p2]
> C1: [t1p0, t1p1, t1p2]
>  
> First of all, I would like to know if this is a legitimate need.
> If this is the case, if you are interested to have a Pull Request about it.
> Thank you in advance.
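
To make the wish concrete, here is a rough sketch of such an assignor built on the consumer's (internal) AbstractPartitionAssignor; it assumes every consumer subscribes to every topic, as in the setup above, and is not production code:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerPartitionAssignor.Subscription;
import org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor;
import org.apache.kafka.common.TopicPartition;

public class TopicRoundRobinAssignor extends AbstractPartitionAssignor {

    @Override
    public String name() {
        return "topicroundrobin";
    }

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        List<String> consumers = new ArrayList<>(subscriptions.keySet());
        List<String> topics = new ArrayList<>(partitionsPerTopic.keySet());
        Collections.sort(consumers);
        Collections.sort(topics);

        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        consumers.forEach(c -> assignment.put(c, new ArrayList<>()));

        // hand out whole topics round-robin, so all partitions of a topic land on one consumer
        int i = 0;
        for (String topic : topics) {
            String consumer = consumers.get(i++ % consumers.size());
            for (int p = 0; p < partitionsPerTopic.get(topic); p++) {
                assignment.get(consumer).add(new TopicPartition(topic, p));
            }
        }
        return assignment;
    }
}
{code}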



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-4852) ByteBufferSerializer not compatible with offsets

2022-09-29 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4852:
-
Fix Version/s: 3.4.0

> ByteBufferSerializer not compatible with offsets
> 
>
> Key: KAFKA-4852
> URL: https://issues.apache.org/jira/browse/KAFKA-4852
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.1
> Environment: all
>Reporter: Werner Daehn
>Assignee: LinShunkang
>Priority: Minor
> Fix For: 3.4.0
>
>
> Quick intro: A ByteBuffer.rewind() resets the position to zero. What if the 
> ByteBuffer was created with an offset? new ByteBuffer(data, 3, 10)? The 
> ByteBufferSerializer will send from pos=0 and not from pos=3 onwards.
> Solution: No rewind() but flip() for reading a ByteBuffer. That's what the 
> flip is meant for.
> Story:
> Imagine the incoming data comes from a byte[], e.g. a network stream 
> containing topicname, partition, key, value, ... and you want to create a new 
> ProducerRecord for that. As the constructor of ProducerRecord requires 
> (topic, partition, key, value) you have to copy from above byte[] the key and 
> value. That means there is a memcopy taking place. Since the payload can be 
> potentially large, that introduces a lot of overhead. Twice the memory.
> A nice solution to this problem is to simply wrap the network byte[] into new 
> ByteBuffers:
> ByteBuffer key = ByteBuffer.wrap(data, keystart, keylength);
> ByteBuffer value = ByteBuffer.wrap(data, valuestart, valuelength);
> and then use the ByteBufferSerializer instead of the ByteArraySerializer.
> But that does not work as the ByteBufferSerializer does a rewind(), hence 
> both, key and value, will start at position=0 of the data[].
> public class ByteBufferSerializer implements Serializer {
> public byte[] serialize(String topic, ByteBuffer data) {
>  data.rewind();
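
For illustration, a small self-contained sketch of the intended zero-copy usage, reading between position and limit instead of rewinding (the byte contents are made up):

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferOffsetSketch {
    public static void main(String[] args) {
        byte[] network = "topickeyvalue".getBytes(StandardCharsets.UTF_8);

        // wrap slices of the same array without copying
        ByteBuffer key = ByteBuffer.wrap(network, 5, 3);    // "key"
        ByteBuffer value = ByteBuffer.wrap(network, 8, 5);  // "value"

        // reading between position and limit preserves the wrapped offsets ...
        byte[] keyBytes = new byte[key.remaining()];
        key.duplicate().get(keyBytes);
        System.out.println(new String(keyBytes, StandardCharsets.UTF_8));  // prints "key"

        // ... whereas rewind() resets the position to 0 and reads from the array start
        ByteBuffer rewound = value.duplicate();
        rewound.rewind();
        System.out.println((char) rewound.get());                          // prints 't'
    }
}
{code}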



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-26 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609798#comment-17609798
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Hi [~nicktelford], just making sure you got the latest message, and checking 
whether you could reproduce the issue with the PR applied to improve the logging 
information.

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
> Attachments: logs.csv
>
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14260) InMemoryKeyValueStore iterator still throws ConcurrentModificationException

2022-09-26 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609705#comment-17609705
 ] 

Guozhang Wang commented on KAFKA-14260:
---

Hello [~aviperksy], just checking: is https://github.com/apache/kafka/pull/11367 
related?

> InMemoryKeyValueStore iterator still throws ConcurrentModificationException
> ---
>
> Key: KAFKA-14260
> URL: https://issues.apache.org/jira/browse/KAFKA-14260
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 3.2.3
>Reporter: Avi Cherry
>Priority: Major
>
> This is the same bug as KAFKA-7912 which was then re-introduced by KAFKA-8802.
> Any iterator returned from {{InMemoryKeyValueStore}} may end up throwing a 
> ConcurrentModificationException because the backing map is not concurrent 
> safe. I expect that this only happens when the store is retrieved from 
> {{KafkaStreams.store()}} from outside of the topology since any usage of the 
> store from inside of the topology should be naturally single-threaded.
> To start off, a reminder that this behaviour explicitly violates the 
> interface contract for {{ReadOnlyKeyValueStore}} which states
> {quote}The returned iterator must be safe from 
> java.util.ConcurrentModificationExceptions
> {quote}
> It is often complicated to make code to demonstrate concurrency bugs, but 
> thankfully it is trivial to reason through the source code in 
> {{InMemoryKeyValueStore.java}} to show why this happens:
>  * All of the InMemoryKeyValueStore methods that return iterators do so by 
> passing a keySet based on the backing TreeMap to the InMemoryKeyValueIterator 
> constructor.
>  * These keySets are all VIEWS of the backing map, not copies.
>  * The InMemoryKeyValueIterator then makes a private copy of the keySet by 
> passing the original keySet into the constructor for TreeSet. This copying 
> was implemented in KAFKA-8802, incorrectly intending it to fix the 
> concurrency problem.
>  * TreeSet then iterates over the keySet to make a copy. If the original 
> backing TreeMap in InMemoryKeyValueStore is changed while this copy is being 
> created it will fail-fast a ConcurrentModificationException.
> This bug should be able to be trivially fixed by replacing the backing 
> TreeMap with a ConcurrentSkipListMap but here's the rub:
> This bug has already been found in KAFKA-7912 and the TreeMap was replaced 
> with a ConcurrentSkipListMap. It was then reverted back to a TreeMap in 
> KAFKA-8802 because of the performance regression. I can [see from one of the 
> PRs|https://github.com/apache/kafka/pull/7212/commits/384c12e40f3a59591f897d916f92253e126820ed]
>  that it was believed the concurrency problem with the TreeMap implementation 
> was fixed by copying the keyset when the iterator is created but the problem 
> remains, plus the fix creates an extra copy of the iterated portion of the 
> set in memory.
> For what it's worth, the performance difference between TreeMap and 
> ConcurrentSkipListMap do not extend into complexity. TreeMap enjoys a similar 
> ~2x speed through all operations with any size of data, but at the cost of 
> what turned out to be an easy-to-encounter bug.
> This is all unfortunate since the only time the state stores ever get 
> accessed concurrently is through the `KafkaStreams.store()` mechanism, but I 
> would imagine that "correct and slightly slower" is better than "incorrect 
> and faster".
> Too bad BoilerBay's AirConcurrentMap is closed-source and patented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-12370) Refactor KafkaStreams exposed metadata hierarchy

2022-09-26 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-12370:
--
Description: 
Currently in KafkaStreams we have two groups of metadata getter:

1.
{code}
allMetadata
allMetadataForStore
{code}

Return collection of {{StreamsMetadata}}, which only contains the partitions as 
active/standby, plus the hostInfo, but not exposing any task info.

2.
{code}
queryMetadataForKey
{code}

Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
standbys, plus the partition id.

3.
{code}
localThreadsMetadata
{code}

Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for 
active and standby tasks.

All the above functions are used for interactive queries, but their exposed 
metadata are very different, and some use cases would need to have all client, 
thread, and task metadata to fulfill the feature development. At the same time, 
we may have a more dynamic "task -> thread" mapping in the future and also the 
embedded clients like consumers would not be per thread, but per client.

---

Rethinking about the metadata, I feel we can have a more consistent hierarchy 
as the following:

* {{StreamsMetadata}} represents the metadata for the client, which includes the 
set of {{ThreadMetadata}} for its existing threads and the set of 
{{TaskMetadata}} for active and standby tasks assigned to this client, plus 
client metadata including hostInfo and embedded client ids.

* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
currently assigned tasks. Also after we removed the deprecated EOSv1, it should 
always return a single producer client id since each thread would only have one 
client.

* {{TaskMetadata}} includes the name (including the sub-topology id and the 
partition id), the state, the corresponding sub-topology description (including 
the state store names, source topic names).

* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from 
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and 
{{localMetadata}} (renamed from localThreadMetadata) returns a single 
{{StreamsMetadata}}.

* {{KeyQueryMetadata}} Class would be deprecated and replaced by 
{{TaskMetadata}}.

To illustrate as an example, to find out who are the current active host / 
standby hosts of a specific store, we would call {{allMetadataForStore}}, and 
for each returned {{StreamsMetadata}} we loop over their contained 
{{TaskMetadata}} for active / standby, and filter by its corresponding 
sub-topology's description's contained store name. 
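
Pseudocode of the example above against the *proposed* hierarchy; none of these accessors exist today, the names simply mirror the description:

{code:java}
// hypothetical API, for illustration of the proposal only
for (StreamsMetadata client : kafkaStreams.allMetadataForStore("my-store")) {
    for (TaskMetadata active : client.activeTaskMetadata()) {
        if (active.subTopologyDescription().stateStoreNames().contains("my-store")) {
            System.out.println("active host for my-store: " + client.hostInfo());
        }
    }
}
{code}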

  was:
Currently in KafkaStreams we have two groups of metadata getter:

1.
{code}
allMetadata
allMetadataForStore
{code}

Return collection of {{StreamsMetadata}}, which only contains the partitions as 
active/standby, plus the hostInfo, but not exposing any task info.

2.
{code}
queryMetadataForKey
{code}

Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
standbys, plus the partition id.

3.
{code}
localThreadsMetadata
{code}

Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for 
active and standby tasks.

All the above functions are used for interactive queries, but their exposed 
metadata are very different, and some use cases would need to have all client, 
thread, and task metadata to fulfill the feature development. At the same time, 
we may have a more dynamic "task -> thread" mapping in the future and also the 
embedded clients like consumers would not be per thread, but per client.

---

Rethinking about the metadata, I feel we can have a more consistent hierarchy 
as the following:

* {{StreamsMetadata}} represent the metadata for the client, which includes the 
set of {{ThreadMetadata}} for its existing thread and the set of 
{{TaskMetadata}} for active and standby tasks assigned to this client, plus 
client metadata including hostInfo, embedded client ids.

* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
currently assigned tasks.

* {{TaskMetadata}} includes the name (including the sub-topology id and the 
partition id), the state, the corresponding sub-topology description (including 
the state store names, source topic names).

* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from 
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and 
{{localMetadata}} (renamed from localThreadMetadata) returns a single 
{{StreamsMetadata}}.

* {{KeyQueryMetadata}} Class would be deprecated and replaced by 
{{TaskMetadata}}.

To illustrate as an example, to find out who are the current active host / 
standby hosts of a specific store, we would call {{allMetadataForStore}}, and 
for each returned {{StreamsMetadata}} we loop over their contained 
{{TaskMetadata}} for active / standby, and filter by its corresponding 
sub-topology's description's 

[jira] [Comment Edited] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-20 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607329#comment-17607329
 ] 

Guozhang Wang edited comment on KAFKA-10635 at 9/20/22 5:20 PM:


Could you try out this patch https://github.com/apache/kafka/pull/12667 and 
reproduce the issue, and collect the logs again?


was (Author: guozhang):
Could you try out this patch https://github.com/apache/kafka/pull/12667 and 
reproduce the issue?

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
> Attachments: logs.csv
>
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our 

[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-20 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607329#comment-17607329
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Could you try out this patch https://github.com/apache/kafka/pull/12667 and 
reproduce the issue?

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
> Attachments: logs.csv
>
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-19 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606711#comment-17606711
 ] 

Guozhang Wang commented on KAFKA-10635:
---

From the logs, I think the OOOSException was thrown here: 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/ProducerStateManager.scala#L236,
because `currentLastSeq` is -1 (i.e. UNKNOWN). It usually indicates that, due to 
a log truncation (which did happen before the exception was thrown), the 
producer's state has all been deleted; in that case 
currentEntry.producerEpoch == RecordBatch.NO_PRODUCER_EPOCH should be satisfied, 
however it is not. Here's my suspected sequence of events that led to this:

 

T1: The partition starts with replicas 3,4,5, with 5 as the leader; producers 
are still writing to 5.

T2: Assume there's a producer with id 61009 that starts writing to leader 5; its 
first append is at an offset larger than 853773. NOTE that at this time the 
append has not been fully replicated across the replicas, and hence the high 
watermark has not been advanced.

T3: Replica 10 is added to the replica list and old leader 5 is removed. 
Replica 10 truncates itself to 853773, and then rebuilds its producer state up 
to offset 853773 as well (you can see that from the log). Note that since 
producer 61009's append record is beyond 853773, it is not yet contained in the 
persistent producer snapshot and hence not loaded into the new leader 10's 
in-memory producer state.

T4: A truncation happened: it seems to be deleting an empty log segment though, 
since the log segment is (baseOffset=0, size=0), and that should not have any 
impact on the producer state since deleting files does not immediately affect 
the in-memory producer entries.

T5: Producer 61009 finally learns about the new leader and starts sending to it. 
Its append start offset is 853836 (larger than 853773), the producer entry's 
metadata queue is empty, HOWEVER its epoch is somehow not -1 (UNKNOWN), i.e. in 
the old snapshot it does not have any metadata but does have an existing epoch, 
and hence this exception is triggered. Unfortunately, since we do not have enough 
log info (I can file a quick PR to enhance it in future releases), I cannot be 
certain why that snapshot contains no metadata but an epoch that is not -1... I 
would like to hear some experts' opinions, [~hachikuji] [~junrao].

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
> Attachments: logs.csv
>
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> 

[jira] [Commented] (KAFKA-12634) Should checkpoint after restore finished

2022-09-16 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606009#comment-17606009
 ] 

Guozhang Wang commented on KAFKA-12634:
---

We should be able to close this ticket once KAFKA-10199 is merged. cc [~cadonna]

> Should checkpoint after restore finished
> 
>
> Key: KAFKA-12634
> URL: https://issues.apache.org/jira/browse/KAFKA-12634
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.5.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: new-streams-runtime-should-fix, newbie++
>
> For state stores, Kafka Streams maintains local checkpoint files to track the 
> offsets of the state store changelog topics. The checkpoint is updated on 
> commit or when a task is closed cleanly.
> However, after a successful restore, the checkpoint is not written. Thus, if 
> an instance crashes after restore but before committing, even if the state is 
> on local disk the checkpoint file is missing (indicating that there is no 
> state) and thus state would be restored from scratch.
> While for most cases, the time between restore end and next commit is small, 
> there are cases when this time could be large, for example if there is no new 
> input data to be processed (if there is no input data, the commit would be 
> skipped).
> Thus, we should write the checkpoint file after a successful restore to close 
> this gap (of course, only for at-least-once processing).
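
A minimal sketch of the idea under at-least-once, assuming the internal OffsetCheckpoint helper (which Streams already uses for .checkpoint files) is written right after restoration completes; the path and offset below are made up:

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.state.internals.OffsetCheckpoint;

public class CheckpointAfterRestoreSketch {
    public static void main(String[] args) throws IOException {
        File taskStateDir = new File("/tmp/kafka-streams/my-app/0_0");  // illustrative path
        long restoredEndOffset = 42L;                                    // offset reached by restore

        // persist the restored changelog offset immediately, instead of waiting for the next commit
        OffsetCheckpoint checkpoint = new OffsetCheckpoint(new File(taskStateDir, ".checkpoint"));
        checkpoint.write(Collections.singletonMap(
                new TopicPartition("my-app-my-store-changelog", 0), restoredEndOffset));
    }
}
{code}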



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13561) Consider deprecating `StreamsBuilder#build(props)` function

2022-09-16 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606007#comment-17606007
 ] 

Guozhang Wang commented on KAFKA-13561:
---

Hi [~showuon], since we are now working on KIP-862, I think we can consider 
completing this ticket; are you still interested in it?

> Consider deprecating `StreamsBuilder#build(props)` function
> ---
>
> Key: KAFKA-13561
> URL: https://issues.apache.org/jira/browse/KAFKA-13561
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Luke Chen
>Priority: Major
>  Labels: needs-kip
>
> With 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-591%3A+Add+Kafka+Streams+config+to+set+default+state+store
>  being accepted that introduced the new `StreamsBuilder(TopologyConfig)` 
> constructor, we can consider deprecating the `StreamsBuilder#build(props)` 
> function now. There are still a few things we'd need to do:
> 1. Copy the `StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG` to TopologyConfig.
> 2. Make sure the overloaded `StreamsBuilder()` constructor takes in default 
> values of TopologyConfig.
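
For illustration, a sketch of what relying on the KIP-591 constructor instead of StreamsBuilder#build(props) could look like (topic names and config values are made up):

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyConfig;

public class TopologyConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // illustrative values
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.OPTIMIZE);

        // topology-level configs are passed up front via the KIP-591 constructor ...
        StreamsBuilder builder = new StreamsBuilder(new TopologyConfig(new StreamsConfig(props)));
        builder.stream("input-topic").to("output-topic");

        // ... so the no-arg build() suffices and build(Properties) could eventually go away
        Topology topology = builder.build();
        System.out.println(topology.describe());
    }
}
{code}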



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2022-09-16 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605945#comment-17605945
 ] 

Guozhang Wang commented on KAFKA-10575:
---

I've just created a new KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility

> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: highluck
>Priority: Major
>
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transit it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration 
> stopped in any scenarios.
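
For context, a minimal sketch of the listener in question; today the onRestoreEnd() callback below is only invoked when restoration fully completes, which is the gap this ticket is about:

{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {

    @Override
    public void onRestoreStart(TopicPartition topicPartition, String storeName,
                               long startingOffset, long endingOffset) {
        System.out.printf("restore of %s (%s) starting: %d -> %d%n",
                storeName, topicPartition, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition topicPartition, String storeName,
                                long batchEndOffset, long numRestored) {
        // per-batch progress updates
    }

    @Override
    public void onRestoreEnd(TopicPartition topicPartition, String storeName, long totalRestored) {
        // currently not invoked if the restoring task is closed mid-restore
        System.out.printf("restore of %s (%s) done, %d records%n",
                storeName, topicPartition, totalRestored);
    }
}

// registered once per instance, e.g.:
//   kafkaStreams.setGlobalStateRestoreListener(new LoggingRestoreListener());
{code}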



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2022-09-16 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605911#comment-17605911
 ] 

Guozhang Wang commented on KAFKA-10575:
---

Hello [~nicktelford], thanks for your input! Yes, I'm now thinking about 
introducing a new API on the `StateRestoreListener` for the paused scenarios, and 
about creating a KIP for that new API as well as a couple of related metrics 
changes that will be introduced by KAFKA-10199.

Regarding the TaskStateChangeListener, I think it worth a separate discussion 
thread for its own scope --- personally I think only very advanced users would 
be leverage on this since {{Tasks}} are a concept that Streams library wants to 
more or less abstract away from common users: they should not worry too much 
about the unit of parallelism afterall.
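
To make the idea concrete, here is a minimal, hypothetical sketch: a listener 
implementing the existing three callbacks plus one possible extra callback for 
the stopped/paused case. The extra method name and signature (onRestoreSuspended) 
are illustrative assumptions only, not the final API of the KIP:

{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
    @Override
    public void onRestoreStart(TopicPartition tp, String storeName, long startingOffset, long endingOffset) {
        System.out.printf("restore of %s (%s) started at offset %d%n", storeName, tp, startingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition tp, String storeName, long batchEndOffset, long numRestored) {
        System.out.printf("restored %d records of %s (%s)%n", numRestored, storeName, tp);
    }

    @Override
    public void onRestoreEnd(TopicPartition tp, String storeName, long totalRestored) {
        System.out.printf("restore of %s (%s) completed, %d records%n", storeName, tp, totalRestored);
    }

    // Illustrative only: a callback fired when restoration stops before completion,
    // e.g. because the restoring task is closed or migrated to another client.
    public void onRestoreSuspended(TopicPartition tp, String storeName, long totalRestored) {
        System.out.printf("restore of %s (%s) suspended after %d records%n", storeName, tp, totalRestored);
    }
}
{code}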

> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: highluck
>Priority: Major
>
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transition it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration is 
> stopped in any scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14239) Merge StateRestorationIntegrationTest into RestoreIntegrationTest

2022-09-16 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-14239:
-

 Summary: Merge StateRestorationIntegrationTest into 
RestoreIntegrationTest
 Key: KAFKA-14239
 URL: https://issues.apache.org/jira/browse/KAFKA-14239
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Reporter: Guozhang Wang


We have two integration test classes for store restoration, and 
StateRestorationIntegrationTest only has a single test method. We can merge 
it into the other to save integration testing time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-14 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604838#comment-17604838
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Hmm okay, I think we'd need to reproduce this, which would help us get a better 
trace on the broker side. cc [~hachikuji]

In the meantime, do you happen to still have the broker-side logs from when the 
OOOSException was thrown? If yes, could you share them in the comments?

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira

[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-13 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603789#comment-17603789
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Hi [~nicktelford], regarding the broker's behavior:

> Irrespective of the behaviour on the Streams side, I'm confident that the 
> real issue is that brokers should not be producing an 
> OutOfOrderSequenceException just because partition leadership changed while a 
> producer was writing to that partition. As I mentioned in my earlier comment, 
> I believe this is caused by the producerEpoch not being properly tracked when 
> partition leadership changes.

I think it is related to KIP-360, but when I looked through the history I 
could not find an obvious connection to what you've observed, and on the 
current trunk the behavior does not seem to match what you observed either. 
Would you mind upgrading to a broker version newer than 2.5.1, e.g. 3.0+, and 
seeing if this issue still persists?

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition 

[jira] [Commented] (KAFKA-14224) Consumer should auto-commit prior to cooperative partition revocation

2022-09-13 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603743#comment-17603743
 ] 

Guozhang Wang commented on KAFKA-14224:
---

cc [~pnee] I think we can address this by always pairing the commit with the 
revocation whenever the latter happens while auto-commit is enabled. More 
specifically, any time the consumer needs to revoke a partition, it would (see 
the sketch after this list):

1) "mute" the partition from returning any more data to the poll call, so that 
its position would not advance any more.
2) set the flag so that the next poll call from the polling thread would then 
trigger the {{onPartitionsRevoked}} listener.
3) after 2), we know that the positions of those partitions would not advance 
any more, and we can then send out the commit offset request.
4) after the response is returned, remove those partitions, so that the next HB 
could let the coordinator know that it has removed those partitions.
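
For illustration only, here is a minimal sketch of the same ordering expressed 
with the public consumer API (pause the revoking partitions, commit their 
current positions, then let the cooperative revocation proceed). The topic 
name, group id, and bootstrap address are placeholder assumptions, and the 
actual fix would of course live in the consumer internals rather than in 
application code:

{code:java}
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitBeforeRevocationExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "commit-before-revocation-demo");
        props.put("enable.auto.commit", "false");            // committed explicitly in the listener below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test_topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> revoked) {
                // Steps 1) and 3): stop the revoking partitions from advancing, then commit
                // their current positions before the revocation takes effect.
                consumer.pause(revoked);
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (TopicPartition tp : revoked) {
                    offsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));
                }
                consumer.commitSync(offsets);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> assigned) {
                // Step 4): once the rebalance completes, only the retained/new partitions resume.
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r ->
                    System.out.printf("%s-%d@%d%n", r.topic(), r.partition(), r.offset()));
        }
    }
}
{code}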

> Consumer should auto-commit prior to cooperative partition revocation
> -
>
> Key: KAFKA-14224
> URL: https://issues.apache.org/jira/browse/KAFKA-14224
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: new-consumer-threading-should-fix
>
> With the old "eager" reassignment logic, we always revoked all partitions 
> prior to each rebalance. When auto-commit is enabled, a part of this process 
> is committing current position. Under the new "cooperative" logic, we defer 
> revocation until after the rebalance, which means we can continue fetching 
> while the rebalance is in progress. However, when reviewing KAFKA-14196, we 
> noticed that there is no similar logic to commit offsets prior to this 
> deferred revocation. This means that cooperative consumption is more likely 
> to lead to have duplicate consumption even when there is no failure involved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14224) Consumer should auto-commit prior to cooperative partition revocation

2022-09-13 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14224:
--
Labels: new-consumer-threading-should-fix  (was: )

> Consumer should auto-commit prior to cooperative partition revocation
> -
>
> Key: KAFKA-14224
> URL: https://issues.apache.org/jira/browse/KAFKA-14224
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: new-consumer-threading-should-fix
>
> With the old "eager" reassignment logic, we always revoked all partitions 
> prior to each rebalance. When auto-commit is enabled, a part of this process 
> is committing current position. Under the new "cooperative" logic, we defer 
> revocation until after the rebalance, which means we can continue fetching 
> while the rebalance is in progress. However, when reviewing KAFKA-14196, we 
> noticed that there is no similar logic to commit offsets prior to this 
> deferred revocation. This means that cooperative consumption is more likely 
> to lead to have duplicate consumption even when there is no failure involved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-09 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602574#comment-17602574
 ] 

Guozhang Wang commented on KAFKA-14208:
---

I marked it as a blocker for 3.3.0 since it is a newly introduced regression. 
[~jsancio] Let me know what you think?

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{{}KafkaConsumer#commitAsync{}}}, Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-09 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14208:
--
Priority: Blocker  (was: Major)

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 3.3.0
>
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{{}KafkaConsumer#commitAsync{}}}, Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-09 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14208:
--
Fix Version/s: 3.3.0

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{{}KafkaConsumer#commitAsync{}}}, Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602131#comment-17602131
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Hello [~nicktelford], thanks for the updates!

I've summarized the EOS-related exceptions and their handling logic in KIP-691 
(the section starting with "As of 08/16/2022"). As you can see, 
OutOfOrderSequenceException is an abortable exception, not a fatal one. Kafka 
Streams, relying on the Kafka producer, would handle this exception as a 
TaskMigratedException; note that the latter is an internal exception that would 
be handled without failing the Kafka Streams app, i.e. the second stack trace 
should not be killing the app.

The first stack trace, though, in version 2.5, would be killing the app. The 
difference between the two is that the first stack trace was thrown when 
committing the stream task, while the second was thrown when the task was being 
processed normally while trying to send a record.

I checked the source code and have confirmed that this issue has been resolved 
in trunk, i.e. we would always throw it as a TaskMigratedException and handle 
it internally rather than failing the app. So I'd suggest you upgrade your 
application beyond 2.5.



> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> 

[jira] [Commented] (KAFKA-6599) KTable KTable join semantics violated when caching enabled

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602124#comment-17602124
 ] 

Guozhang Wang commented on KAFKA-6599:
--

Hi [~jfilipiak], great to hear back from you.

Yeah, I think I agree with you that whether the joined tables are materialized 
or not is orthogonal to this issue. What I thought is that, if the result is 
materialized, then we can potentially mitigate the issue by comparing the new 
joined value with the old value (retrieved from the materialized store), and 
then we would know whether the new value should be emitted (a sketch of this 
check is below). But that's not the key of this issue itself.

For this issue itself, I think the key is still decoupling the caching policy 
from the emitting policy. We are already working towards that by allowing 
users to specify explicitly when to emit / suppress, and by default, even with 
caching, we always emit results.
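
A minimal sketch of that comparison only, using a plain map as a stand-in for 
the materialized store and a callback as a stand-in for forwarding downstream; 
these names are assumptions for illustration, not the Streams-internal join 
processor or state store API:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;

public class EmitOnChangeSketch {
    // Forward the new join result only if it differs from the previously materialized one.
    static <K, V> void maybeEmit(K key, V newJoined, Map<K, V> store, BiConsumer<K, V> forward) {
        V oldJoined = store.put(key, newJoined);
        if (!Objects.equals(oldJoined, newJoined)) {
            forward.accept(key, newJoined);
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        BiConsumer<String, String> forward = (k, v) -> System.out.println("emit " + k + " -> " + v);
        maybeEmit("A", "A,B", store, forward); // emitted: the join result changed
        maybeEmit("A", "A,B", store, forward); // suppressed: same result as before
    }
}
{code}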

> KTable KTable join semantics violated when caching enabled
> --
>
> Key: KAFKA-6599
> URL: https://issues.apache.org/jira/browse/KAFKA-6599
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Jan Filipiak
>Priority: Critical
>  Labels: bug
>
> Say a tuple A,B got emitted after joining and the delete for A goes into the 
> cache. After that the B record would be deleted as well. B's join processor 
> would look up A and see `null` while computing the old and new value (at this 
> point we can execute the joiner with A being null and still emit something, 
> but it's not going to represent the actual oldValue). Then when A's cache 
> flushes, it doesn't see B, so it's also not going to put a proper oldValue. 
> The output can then not be used for, say, any aggregate, as a delete would 
> not reliably find its old aggregate where it needs to be removed from; filter 
> will also break as it stops null,null changes from propagating. So to me it 
> looks pretty clear that caching with join breaks KTable semantics, be it my 
> new join or the currently existing ones.
>  
> this if branch here
> [https://github.com/apache/kafka/blob/1.0/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java#L155]
> is not useful. I think it's there because, if one would delegate the true 
> case to the underlying store, one would get proper semantics for streams, but 
> the weirdest cache I've seen.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14202) IQv2: Expose binary store schema to store implementations

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602107#comment-17602107
 ] 

Guozhang Wang commented on KAFKA-14202:
---

Hey [~vvcephei], are you considering the scope to cover store key schemas only, 
or both keys and values?

> IQv2: Expose binary store schema to store implementations
> -
>
> Key: KAFKA-14202
> URL: https://issues.apache.org/jira/browse/KAFKA-14202
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Minor
>
> One feature of IQv2 is that store implementations can handle custom queries. 
> Many custom query handlers will need to process the key or value bytes, for 
> example deserializing them to implement some filter or aggregations, or even 
> performing binary operations on them.
> For the most part, this should be straightforward for users, since they 
> provide Streams with the serdes, the store implementation, and the custom 
> queries.
> However, Streams will sometimes pack extra data around the data produced by 
> the user-provided serdes. For example, the Timestamped store wrappers add a 
> timestamp on the beginning of the value byte array. And in Windowed stores, 
> we add window timestamps to the key bytes.
> It would be nice to have some generic mechanism to communicate those schemas 
> to the user-provided inner store layers to support users who need to write 
> custom queries. For example, perhaps we can add an extractor class to the 
> state store context



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-14208:
-

Assignee: Guozhang Wang

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Assignee: Guozhang Wang
>Priority: Major
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{{}KafkaConsumer#commitAsync{}}}, Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14196) Duplicated consumption during rebalance, causing OffsetValidationTest to act flaky

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602049#comment-17602049
 ] 

Guozhang Wang commented on KAFKA-14196:
---

Thanks Philip, and regarding your two questions above I agree with [~showuon]'s 
thoughts as well. Especially for 1), I think even if the subscription changed 
in between consecutive onJoinPrepare calls, as long as that does not change the 
assigned partitions (i.e. as long as `assignFromSubscribed()` has not been 
called) I think we are fine, since the returned records depend on the assigned 
partitions.

> Duplicated consumption during rebalance, causing OffsetValidationTest to act 
> flaky
> --
>
> Key: KAFKA-14196
> URL: https://issues.apache.org/jira/browse/KAFKA-14196
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.2.1
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.2
>
>
> Several flaky tests under OffsetValidationTest are indicating potential 
> consumer duplication issue, when autocommit is enabled.  I believe this is 
> affecting *3.2* and onward.  Below shows the failure message:
>  
> {code:java}
> Total consumed records 3366 did not match consumed position 3331 {code}
>  
> After investigating the log, I discovered that the data consumed between the 
> start of a rebalance event and the async commit was lost for those failing 
> tests.  In the example below, the rebalance event kicks in at around 
> 1662054846995 (first record), and the async commit of the offset 3739 is 
> completed at around 1662054847015 (right before partitions_revoked).
>  
> {code:java}
> {"timestamp":1662054846995,"name":"records_consumed","count":3,"partitions":[{"topic":"test_topic","partition":0,"count":3,"minOffset":3739,"maxOffset":3741}]}
> {"timestamp":1662054846998,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3742,"maxOffset":3743}]}
> {"timestamp":1662054847008,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3744,"maxOffset":3745}]}
> {"timestamp":1662054847016,"name":"partitions_revoked","partitions":[{"topic":"test_topic","partition":0}]}
> {"timestamp":1662054847031,"name":"partitions_assigned","partitions":[{"topic":"test_topic","partition":0}]}
> {"timestamp":1662054847038,"name":"records_consumed","count":23,"partitions":[{"topic":"test_topic","partition":0,"count":23,"minOffset":3739,"maxOffset":3761}]}
>  {code}
> A few things to note here:
>  # Manually calling commitSync in the onPartitionsRevoke cb seems to 
> alleviate the issue
>  # Setting includeMetadataInTimeout to false also seems to alleviate the 
> issue.
> The above tries seems to suggest that contract between poll() and 
> asyncCommit() is broken.  AFAIK, we implicitly uses poll() to ack the 
> previously fetched data, and the consumer would (try to) commit these offsets 
> in the current poll() loop.  However, it seems like as the poll continues to 
> loop, the "acked" data isn't being committed.
>  
> I believe this could be introduced in  KAFKA-14024, which originated from 
> KAFKA-13310.
> More specifically, (see the comments below), the ConsumerCoordinator will 
> always return before async commit, due to the previous incomplete commit.  
> However, this is a bit contradictory here because:
>  # I think we want to commit asynchronously while the poll continues, and if 
> we do that, we are back to KAFKA-14024, that the consumer will get rebalance 
> timeout and get kicked out of the group.
>  # But we also need to commit all the "acked" offsets before revoking the 
> partition, and this has to be blocked.
> *Steps to Reproduce the Issue:*
>  # Check out AK 3.2
>  # Run this several times: (Recommend to only run runs with autocommit 
> enabled in consumer_test.py to save time)
> {code:java}
> _DUCKTAPE_OPTIONS="--debug" 
> TC_PATHS="tests/kafkatest/tests/client/consumer_test.py::OffsetValidationTest.test_consumer_failure"
>  bash tests/docker/run_tests.sh {code}
>  
> *Steps to Diagnose the Issue:*
>  # Open the test results in *results/*
>  # Go to the consumer log.  It might look like this
>  
> {code:java}
> results/2022-09-03--005/OffsetValidationTest/test_consumer_failure/clean_shutdown=True.enable_autocommit=True.metadata_quorum=ZK/2/VerifiableConsumer-0-xx/dockerYY
>  {code}
> 3. Find the docker instance that has partition getting revoked and rejoined.  
> Observed the offset before and after.
> *Propose Fixes:*
>  TBD
>  
> https://github.com/apache/kafka/pull/12603



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14208) KafkaConsumer#commitAsync throws unexpected WakeupException

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602046#comment-17602046
 ] 

Guozhang Wang commented on KAFKA-14208:
---

Hello Qingsheng, thanks for reporting this issue. I looked at the source code 
and agree with you that it was introduced as part of KAFKA-13563. I will try 
to fix this with a follow-up PR.

> KafkaConsumer#commitAsync throws unexpected WakeupException
> ---
>
> Key: KAFKA-14208
> URL: https://issues.apache.org/jira/browse/KAFKA-14208
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.1
>Reporter: Qingsheng Ren
>Priority: Major
>
> We recently encountered a bug after upgrading Kafka client to 3.2.1 in Flink 
> Kafka connector (FLINK-29153). Here's the exception:
> {code:java}
> org.apache.kafka.common.errors.WakeupException
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.maybeTriggerWakeup(ConsumerNetworkClient.java:514)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:278)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:252)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.coordinatorUnknownAndUnready(ConsumerCoordinator.java:493)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:1055)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1573)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.KafkaConsumerThread.run(KafkaConsumerThread.java:226)
>  {code}
> As {{WakeupException}} is not listed in the JavaDoc of 
> {{{}KafkaConsumer#commitAsync{}}}, Flink Kafka connector doesn't catch the 
> exception thrown directly from KafkaConsumer#commitAsync but handles all 
> exceptions in the callback.
> I checked the source code and suspect this is caused by KAFKA-13563. Also we 
> never had this exception in commitAsync when we used Kafka client 2.4.1 & 
> 2.8.1. 
> I'm wondering if this is kind of breaking the public API as the 
> WakeupException is not listed in JavaDoc, and maybe it's better to invoke the 
> callback to handle the {{WakeupException}} instead of throwing it directly 
> from the method itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14189) Improve connection limit and reuse of coordinator and leader in KafkaConsumer

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602037#comment-17602037
 ] 

Guozhang Wang commented on KAFKA-14189:
---

Hi [~aglicacha] [~vongosling]

The main motivation for using two connection sockets for the coordinator and 
the partition leader is to not block coordination-related requests such as 
join/sync behind fetch requests (which could be long-polling, and during that 
time we cannot send other requests over the same socket). Reusing the 
connection may cause issues, e.g. a heartbeat request not being processed in 
time if there is already a fetch request parked on the broker side.
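
As a toy illustration only of the connection-id arithmetic described in this 
ticket (this is not the client source, just the arithmetic), which is also why 
the same broker can count as two connections from a single consumer:

{code:java}
// Toy illustration of why the coordinator and the partition leader use separate
// sockets even when they are the same broker: the client derives a distinct id for
// the coordinator connection, so join/sync/heartbeat requests are never queued
// behind a long-polling fetch. The arithmetic mirrors the ticket description.
public class CoordinatorConnectionIdExample {
    public static void main(String[] args) {
        int brokerId = 2;                                            // broker that is both leader and coordinator
        int fetchConnectionId = brokerId;                            // id used for FETCH traffic
        int coordinatorConnectionId = Integer.MAX_VALUE - brokerId;  // id used for coordination traffic

        System.out.println("fetch connection id       = " + fetchConnectionId);        // 2
        System.out.println("coordinator connection id = " + coordinatorConnectionId);  // 2147483645
        // Two different ids mean two sockets to the same broker, so a consumer that has
        // already connected to the coordinator may still be rejected when opening the
        // fetch connection if the broker's connection limit has been reached.
    }
}
{code}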

> Improve connection limit and reuse of coordinator and leader in KafkaConsumer
> -
>
> Key: KAFKA-14189
> URL: https://issues.apache.org/jira/browse/KAFKA-14189
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.0
>Reporter: Junyang Liu
>Priority: Major
>
> The connection id of connection with coordinator in KafkaConsumer is 
> Integer.MAX_VALUE - coordinator id, which is different with connection id of 
> partition leader. So the connection cannot be reused when coordinator and 
> leader are in the same broker, which means we need two seperated connections 
> with the same broker. Suppose such case, a consumer has connected to the 
> coordinator and finished Join and Sync, and wants to send FETCH to leader in 
> the same broker. But the connection count has reached limit, so the consumer 
> with be in the group but cannot consume messages
> partial logs:
> {code:java}
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Added 
> READ_UNCOMMITTED fetch request for partition topic-test-4 at offset 9 to node 
> :9092 (id: 2 rack: 2) 
> (org.apache.kafka.clients.consumer.internals.Fetcher) 
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Built full fetch 
> (sessionId=INVALID, epoch=INITIAL) for node 2 with 1 partition(s). 
> (org.apache.kafka.clients.FetchSessionHandler) 
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Sending 
> READ_UNCOMMITTED FullFetchRequest(topic-test-4) to broker :9092 (id: 2 
> rack: 2) (org.apache.kafka.clients.consumer.internals.Fetcher)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Initiating 
> connection to node :9092 (id: 2 rack: 2) using address / 
> (org.apache.kafka.clients.NetworkClient) 
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Using older server 
> API v3 to send OFFSET_COMMIT 
> {group_id=group-test,generation_id=134,member_id=consumer-11-2e2b16eb-516c-496c-8aa4-c6e990b43598,retention_time=-1,topics=[{topic=topic-test,partitions=[{partition=3,offset=0,metadata=},{partition=4,offset=9,metadata=},{partition=5,offset=13,metadata=}]}]}
>  with correlation id 242 to node 2147483645 
> (org.apache.kafka.clients.NetworkClient)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Created socket with 
> SO_RCVBUF = 65536, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 2 
> (org.apache.kafka.common.network.Selector)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Completed 
> connection to node 2. Fetching API versions. 
> (org.apache.kafka.clients.NetworkClient)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Initiating API 
> versions fetch from node 2. (org.apache.kafka.clients.NetworkClient)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Subscribed to 
> topic(s): topic-test (org.apache.kafka.clients.consumer.KafkaConsumer)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Connection with 
> / disconnected (org.apache.kafka.common.network.Selector)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Node 2 
> disconnected. (org.apache.kafka.clients.NetworkClient) 
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Cancelled request 
> with header RequestHeader(apiKey=FETCH, apiVersion=10, clientId=consumer-11, 
> correlationId=241) due to node 2 being disconnected 
> (org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient)
> DEBUG [Consumer clientId=consumer-11, groupId=group-test] Error sending fetch 
> request (sessionId=INVALID, epoch=INITIAL) to node 2: 
> org.apache.kafka.common.errors.DisconnectException. 
> (org.apache.kafka.clients.FetchSessionHandler){code}
> connection to coordinator, rebalance and fetching offsets have finished. when 
> preparing connection to leader for fetching, the connection limit has 
> reached, so after tcp connection, the broker disconnect the client.  
>  
> The root cause of this issue is that the process of consuming is a 
> combination of multiple connections(connections with coordinator and leader 
> in same broker), not atomic, which may leads to "half connected". I think we 
> can do some 

[jira] [Commented] (KAFKA-12887) Do not trigger user-customized ExceptionalHandler for RTE

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602033#comment-17602033
 ] 

Guozhang Wang commented on KAFKA-12887:
---

Hello [~icorne], we have reverted this change for reasons similar to what you 
described here. Do you still see it in 3.1.1? Could you try 3.2.1 instead?

> Do not trigger user-customized ExceptionalHandler for RTE
> -
>
> Key: KAFKA-12887
> URL: https://issues.apache.org/jira/browse/KAFKA-12887
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Josep Prat
>Priority: Major
> Fix For: 3.1.0
>
>
> Today in StreamThread we have a try-catch block that captures all {{Throwable 
> e}} and then triggers {{this.streamsUncaughtExceptionHandler.accept(e)}}. 
> However, there are possible RTEs such as IllegalState/IllegalArgument 
> exceptions which are usually caused by bugs, etc. In such cases we should not 
> let users to decide what to do with these exceptions, but should let Streams 
> itself to enforce the decision, e.g. in the IllegalState/IllegalArgument we 
> should fail fast to notify the potential error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13766) Use `max.poll.interval.ms` as the timeout during complete-rebalance phase

2022-09-08 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601949#comment-17601949
 ] 

Guozhang Wang commented on KAFKA-13766:
---

Inside onCompleteJoin, the block starting with

{{// trigger the awaiting join group response callback for all the members 
after rebalancing}}

indicates that once we are in the completing-rebalance phase, we've re-enabled 
the heartbeat expiration with the session timeout. I.e. in that phase we 
effectively have two timers:

{{completeAndScheduleNextHeartbeatExpiration(group, member)}}
and
{{schedulePendingSync(group)}}

Whichever triggers first, we would fail the member and re-trigger the 
rebalance. And since in general the session timeout is smaller than the 
rebalance timeout, we would hit the former if there's a delay on the assignment.
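
As a toy numeric illustration only (the timeout values below are example 
assumptions, and this is not broker code): during the completing-rebalance 
phase both deadlines are armed, and the earlier one bounds the phase, which 
with typical configurations is the session timeout.

{code:java}
// Toy model of the two timers armed during the completing-rebalance phase.
public class CompletingRebalanceTimers {
    public static void main(String[] args) {
        long sessionTimeoutMs = 10_000L;      // example session.timeout.ms (heartbeat expiration timer)
        long rebalanceTimeoutMs = 300_000L;   // example rebalance timeout, piggy-backed as max.poll.interval.ms
        long phaseStartMs = 0L;

        long heartbeatDeadline = phaseStartMs + sessionTimeoutMs;     // completeAndScheduleNextHeartbeatExpiration
        long pendingSyncDeadline = phaseStartMs + rebalanceTimeoutMs; // schedulePendingSync

        // Whichever deadline passes first fails the member and re-triggers the rebalance,
        // so today a delayed assignment is bounded by the (smaller) session timeout.
        long effectiveDeadline = Math.min(heartbeatDeadline, pendingSyncDeadline);
        System.out.println("member kicked out at t=" + effectiveDeadline + " ms if no sync-group arrives");
    }
}
{code}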

> Use `max.poll.interval.ms` as the timeout during complete-rebalance phase
> -
>
> Key: KAFKA-13766
> URL: https://issues.apache.org/jira/browse/KAFKA-13766
> Project: Kafka
>  Issue Type: Bug
>  Components: core, group-coordinator
>Reporter: Guozhang Wang
>Assignee: David Jacot
>Priority: Major
>  Labels: new-rebalance-should-fix
>
> The lifetime of a consumer can be categorized in three phases:
> 1) During normal processing, the broker expects a hb request periodically 
> from consumer, and that is timed by the `session.timeout.ms`.
> 2) During the prepare_rebalance, the broker would expect a join-group request 
> to be received within the rebalance.timeout, which is piggy-backed as the 
> `max.poll.interval.ms`.
> 3) During the complete_rebalance, the broker would expect a sync-group 
> request to be received again within the `session.timeout.ms`.
> So during different phases of the life of the consumer, different timeout 
> would be used to bound the timer.
> Nowadays with cooperative rebalance protocol, we can still return records and 
> process them in the middle of a rebalance from {{consumer.poll}}. In that 
> case, for phase 3) we should also use the `max.poll.interval.ms` to bound the 
> timer, which is in practice larger than `session.timeout.ms`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14196) Duplicated consumption during rebalance, causing OffsetValidationTest to act flaky

2022-09-06 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600985#comment-17600985
 ] 

Guozhang Wang commented on KAFKA-14196:
---

[~pnee] Thanks for reporting this. While reviewing KAFKA-13310 I had realized 
this, but as Luke said it is not a new regression (we could potentially have 
duplicates even before this: since we commit sync, if the commit fails we still 
log a warning and move forward with the revocation, in which case we would also 
have duplicates), so I suggested we add a TODO there indicating it's 
sub-optimal but allowed under at-least-once semantics.

I think in the long run, as we move all the rebalancing-related procedures to 
the background thread, this would no longer be an issue: between the time the 
background thread receives a response telling it to start rebalancing (of 
which the first step is to potentially revoke partitions in `onJoinPrepare`) 
and the time the auto commit has completed, the background thread could simply 
mark those revoking partitions as "not retrievable" so that the calling 
thread's `poll` calls would not return any more data for those partitions. 
Right?

If that's the case, then we only need to consider what to do before that lands. 
Like I said, the behaviors before were 1) we commit sync, and even if it fails 
we still move forward, which would cause duplicates, or 2) we commit async so 
that the `poll` timeout could be respected, but we would still potentially 
return data for those revoking partitions. I'm thinking about just taking the 
middle ground: we still commit async, while at the same time marking those 
revoking partitions as "not retrievable" so that we do not return any more 
data; note this would still not forbid duplicates completely, but it would 
basically take us back to where we were in terms of the likelihood of 
duplicates (a toy model of this is sketched below). And then we rely on the 
threading remodeling (there's a WIP page that Philip will be sending out soon) 
to completely resolve this issue.
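
A self-contained toy model of that middle ground follows; plain strings and 
maps stand in for partitions, offsets, and the commit path, and none of the 
names below are the actual consumer internals. Once a partition is marked as 
pending revocation, poll() stops returning its records, so nothing is consumed 
beyond the offsets covered by the async commit.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class PendingRevocationModel {
    private final Map<String, Long> positions = new HashMap<>();   // partition -> records consumed so far
    private final Set<String> pendingRevocation = new HashSet<>();

    // Return fetched records, skipping partitions that are pending revocation.
    List<String> poll(Map<String, List<String>> fetched) {
        List<String> returned = new ArrayList<>();
        fetched.forEach((partition, records) -> {
            if (pendingRevocation.contains(partition)) {
                return;                                            // "not retrievable" while revoking
            }
            returned.addAll(records);
            positions.merge(partition, (long) records.size(), Long::sum);
        });
        return returned;
    }

    // Freeze the revoking partitions first, then "commit" their positions asynchronously.
    CompletableFuture<Map<String, Long>> prepareRevocation(Collection<String> revoking) {
        pendingRevocation.addAll(revoking);
        Map<String, Long> toCommit = new HashMap<>();
        revoking.forEach(p -> toCommit.put(p, positions.getOrDefault(p, 0L)));
        return CompletableFuture.completedFuture(toCommit);
    }

    public static void main(String[] args) {
        PendingRevocationModel consumer = new PendingRevocationModel();
        System.out.println(consumer.poll(Map.of("test_topic-0", List.of("a", "b", "c")))); // [a, b, c]
        consumer.prepareRevocation(List.of("test_topic-0"))
                .thenAccept(offsets -> System.out.println("committing " + offsets));       // {test_topic-0=3}
        System.out.println(consumer.poll(Map.of("test_topic-0", List.of("d", "e"))));      // [] -- nothing uncommitted
    }
}
{code}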

> Duplicated consumption during rebalance, causing OffsetValidationTest to act 
> flaky
> --
>
> Key: KAFKA-14196
> URL: https://issues.apache.org/jira/browse/KAFKA-14196
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 3.3.0, 3.2.1
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Major
>  Labels: new-consumer-threading-should-fix
>
> Several flaky tests under OffsetValidationTest are indicating potential 
> consumer duplication issue, when autocommit is enabled.  I believe this is 
> affecting *3.2* and onward.  Below shows the failure message:
>  
> {code:java}
> Total consumed records 3366 did not match consumed position 3331 {code}
>  
> After investigating the log, I discovered that the data consumed between the 
> start of a rebalance event and the async commit was lost for those failing 
> tests.  In the example below, the rebalance event kicks in at around 
> 1662054846995 (first record), and the async commit of the offset 3739 is 
> completed at around 1662054847015 (right before partitions_revoked).
>  
> {code:java}
> {"timestamp":1662054846995,"name":"records_consumed","count":3,"partitions":[{"topic":"test_topic","partition":0,"count":3,"minOffset":3739,"maxOffset":3741}]}
> {"timestamp":1662054846998,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3742,"maxOffset":3743}]}
> {"timestamp":1662054847008,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3744,"maxOffset":3745}]}
> {"timestamp":1662054847016,"name":"partitions_revoked","partitions":[{"topic":"test_topic","partition":0}]}
> {"timestamp":1662054847031,"name":"partitions_assigned","partitions":[{"topic":"test_topic","partition":0}]}
> {"timestamp":1662054847038,"name":"records_consumed","count":23,"partitions":[{"topic":"test_topic","partition":0,"count":23,"minOffset":3739,"maxOffset":3761}]}
>  {code}
> A few things to note here:
>  # Manually calling commitSync in the onPartitionsRevoke cb seems to 
> alleviate the issue
>  # Setting includeMetadataInTimeout to false also seems to alleviate the 
> issue.
> The above tries seems to suggest that contract between poll() and 
> asyncCommit() is broken.  AFAIK, we implicitly uses poll() to ack the 
> previously fetched data, and the consumer would (try to) commit these offsets 
> in the current poll() loop.  However, it seems like as the poll continues to 
> loop, the "acked" data isn't being committed.
>  
> I believe this could be introduced in  KAFKA-14024, which originated from 
> KAFKA-13310.
> More specifically, (see the comments below), the ConsumerCoordinator will 
> alway 

[jira] [Commented] (KAFKA-14130) Reduce RackAwarenessIntegrationTest to a unit test

2022-09-04 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600190#comment-17600190
 ] 

Guozhang Wang commented on KAFKA-14130:
---

Hello [~hoping] [~Eslam.hamdy], sorry about the confusion. I have completed this 
ticket myself through this PR: https://github.com/apache/kafka/pull/12476. I 
forgot to close the ticket though, which is my bad; closing it now.

> Reduce RackAwarenessIntegrationTest to a unit test
> --
>
> Key: KAFKA-14130
> URL: https://issues.apache.org/jira/browse/KAFKA-14130
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Eslam
>Priority: Major
>  Labels: newbie
>
> While working on KAFKA-13877, I feel it's overkill to introduce the whole 
> test class as an integration test, since all we need is to test the 
> assignor itself, which could be a unit test. Running this suite with 9+ 
> instances takes a long time and is still vulnerable to all kinds of 
> timing-based flakiness. A better choice is to reduce it to a unit test, 
> similar to {{HighAvailabilityStreamsPartitionAssignorTest}}, which just tests 
> the behavior of the assignor itself, rather than creating many instances and 
> hence depending on various timing bombs to not explode.
> The scope of this ticket is to refactor the {{RackAwarenessIntegrationTest}} 
> into a {{RackAwarenessStreamsPartitionAssignorTest}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14130) Reduce RackAwarenessIntegrationTest to a unit test

2022-09-04 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-14130.
---
Fix Version/s: 3.4.0
 Assignee: Guozhang Wang  (was: Eslam)
   Resolution: Fixed

> Reduce RackAwarenessIntegrationTest to a unit test
> --
>
> Key: KAFKA-14130
> URL: https://issues.apache.org/jira/browse/KAFKA-14130
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: newbie
> Fix For: 3.4.0
>
>
> While working on KAFKA-13877, I feel it's overkill to introduce the whole 
> test class as an integration test, since all we need is to test the assignor 
> itself, which could be a unit test. Running this suite with 9+ instances 
> takes a long time and is still vulnerable to all kinds of timing-based 
> flakiness. A better choice is to reduce it to a unit test, similar to 
> {{HighAvailabilityStreamsPartitionAssignorTest}}, which just tests the 
> behavior of the assignor itself rather than creating many instances and hence 
> depending on various timing bombs not to explode.
> The scope of this ticket is to refactor the {{RackAwarenessIntegrationTest}} 
> into a {{RackAwarenessStreamsPartitionAssignorTest}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-08-19 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581943#comment-17581943
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Thanks [~nicktelford], just to clarify: what I asked for is the logs / stack 
trace on the client (Kafka Streams) side, not the broker side.

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10635) Streams application fails with OutOfOrderSequenceException after rolling restarts of brokers

2022-08-18 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581574#comment-17581574
 ] 

Guozhang Wang commented on KAFKA-10635:
---

Hello [~nicktelford], could you also paste the full stack trace of the 
OutOfOrderSequenceException? In 3.2 a lot of code has changed (e.g. there's no 
`AssignedTasks` anymore), and from a quick peek at the source code I think this 
exception should have been handled internally without killing the app, so a 
full stack trace of the thrown exception would be very helpful.

> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> 
>
> Key: KAFKA-10635
> URL: https://issues.apache.org/jira/browse/KAFKA-10635
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 2.5.1
>Reporter: Peeraya Maetasatidsuk
>Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice one of our streams app (client version 2.4.1) fails with 
> OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected 
> error. Record: a_record, destination topic: 
> topic-name-Aggregation-repartition 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] 
> [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread 
> [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the 
> following error: org.apache.kafka.streams.errors.StreamsException: task 
> [1_39] Abort sending since an error caught with a previous record (timestamp 
> 1602654659000) to topic topic-name-Aggregation-repartition due to 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
> at 
> org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
> at 
> org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
> at 
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 
>    at 
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)   
>  at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)   
>  at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)   
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker 
> received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error 
> processing append operation on partition 
> topic-name-Aggregation-repartition-52  
> (kafka.server.ReplicaManager)org.apache.kafka.common.errors.OutOfOrderSequenceException:
>  Out of order sequence number for producerId 2819098 at offset 1156041 in 
> partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), 
> -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we 
> rollback the broker version to 2.3.1 from 2.5.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. 

[jira] [Commented] (KAFKA-14069) Allow custom configuration of foreign key join internal topics

2022-08-09 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577634#comment-17577634
 ] 

Guozhang Wang commented on KAFKA-14069:
---

That's interesting... could you check a few additional things?

1) Does your app commit successfully from time to time (the delete-records 
request should be sent aligned with the commit)?
2) Did you see the following log lines in your logs?

```
"Sent delete-records request: {}"
```

```
"Previous delete-records request has failed: {}. Try sending the new request 
now"
```

> Allow custom configuration of foreign key join internal topics
> --
>
> Key: KAFKA-14069
> URL: https://issues.apache.org/jira/browse/KAFKA-14069
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Emmanuel Brard
>Priority: Minor
>
> Internal topics supporting foreign key joins (-subscription-registration-topic 
> and -subscription-response-topic) are automatically created with _infinite 
> retention_ (retention.ms=-1, retention.bytes=-1).
> As far as I understand, those topics are used for communication between tasks 
> that are involved in the FK join; the intermediate result, though, is 
> persisted in a compacted topic (-subscription-store-changelog).
> This means, if I understood right, that during normal operation of the stream 
> application, once a message is read from the registration/subscription topic, 
> it will not be read again, even in case of recovery (the position in those 
> topics is committed).
> Because we have very large tables being joined this way with a very high 
> change frequency, we end up with FK internal topics in the order of 1 or 2 
> TB. This is complicated to maintain, especially in terms of disk space.
> I was wondering if:
> - this infinite retention is really a required configuration, and if not
> - this infinite retention could be replaced with a configurable one (for 
> example 1 week, meaning that I accept that in case of failure I must restore 
> my app within one week)
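
Not part of the ticket itself, but a hedged sketch of one possible out-of-band 
stopgap: overriding retention.ms on such an internal topic with the Admin 
client. The topic name below is a placeholder for the actual internal topic 
name, the 1-week value mirrors the example in the description, and whether a 
manual override like this is appropriate is exactly the question the ticket 
raises.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class OverrideFkJoinTopicRetention {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Placeholder: substitute the real "-subscription-registration-topic" name
        // created by your application.
        String topic = "my-app-subscription-registration-topic";

        try (Admin admin = Admin.create(props)) {
            ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);
            AlterConfigOp setRetention = new AlterConfigOp(
                new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(resource, Collections.singletonList(setRetention)))
                .all()
                .get();
        }
    }
}
{code}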



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14070) Improve documentation for queryMetadataForKey for state stores with Processor API

2022-08-09 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577633#comment-17577633
 ] 

Guozhang Wang commented on KAFKA-14070:
---

Hello [~balajirrao] Thanks for the updated description! That's much clearer 
now. Yes, the `queryMetadataForKey` overload without the {{StreamPartitioner}} 
does not work very well with the Processor API, since it assumes the key of the 
store is inherited from the key of the input (or repartition) topic. If the 
user is storing the key in a different manner, they'd need to use the other 
overload that requires a {{StreamPartitioner}}. But as you brought up in the 
second example, if the partitioning scheme does not depend on the key (e.g. if 
it's by the value), then that function does not help either. I think in the 
near term it's definitely necessary to improve our docs to clarify that for 
PAPI users --- would you like to file a PR?

In the long run, we should consider generalizing this function to allow users 
to provide any form of partitioning scheme.
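
For the first example in the description, a rough sketch of what using the 
{{StreamPartitioner}} overload could look like when the store key is derived 
from the input record key: the partitioner re-derives the original Int key from 
the store key and hashes it the way the default partitioner would. The store 
name and key format come from the example; everything else (class names, the 
murmur2-based hashing assumption) is illustrative:

{code:java}
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.utils.Utils;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class StoreKeyMetadataLookup {

    // Maps a store key like "1-a" back to the partition of the original Int record
    // key "1", mirroring murmur2-based default partitioning of the input topic.
    static final StreamPartitioner<String, Void> BY_RECORD_KEY_PREFIX =
        (topic, storeKey, value, numPartitions) -> {
            Integer recordKey = Integer.valueOf(storeKey.split("-")[0]);
            byte[] keyBytes = new IntegerSerializer().serialize(topic, recordKey);
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        };

    public static KeyQueryMetadata metadataFor(KafkaStreams streams, String storeKey) {
        // Use the overload that takes an explicit partitioner instead of assuming
        // the store key equals the input record key.
        return streams.queryMetadataForKey("store", storeKey, BY_RECORD_KEY_PREFIX);
    }
}
{code}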

> Improve documentation for queryMetadataForKey for state stores with Processor 
> API
> -
>
> Key: KAFKA-14070
> URL: https://issues.apache.org/jira/browse/KAFKA-14070
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 3.2.0
>Reporter: Balaji Rao
>Priority: Minor
>
> Using {{queryMetadataForKey}} for state stores with Processor API is tricky. 
> One could use state stores in Processor API in ways that would make it 
> impossible to use {{queryMetadataForKey}} with just a key alone - one would 
> have to know the input record's key. This could lead to the method being 
> called with incorrect expectations. The documentation could be improved 
> around this, and around using state stores with the Processor API in general.
> Example Scala snippet:
> {code:scala}
> val input = streamsBuilder.stream(
> "input-topic",
> Consumed.`with`(Serdes.intSerde, Serdes.stringSerde)
>   )
>   private val storeBuilder = Stores
> .keyValueStoreBuilder[String, String](
>   Stores.inMemoryKeyValueStore("store"),
>   Serdes.stringSerde,
>   Serdes.stringSerde
> )
>   streamsBuilder.addStateStore(storeBuilder)
>   input.process(
> new ProcessorSupplier[Int, String, Void, Void] {
>   override def get(): Processor[Int, String, Void, Void] =
> new Processor[Int, String, Void, Void] {
>   var store: KeyValueStore[String, String] = _
>   override def init(context: ProcessorContext[Void, Void]): Unit = {
> super.init(context)
> store = context.getStateStore("store")
>   }
>   override def process(record: Record[Int, String]): Unit = {
> ('a' to 'j').foreach(x =>
>   store.put(s"${record.key}-$x", record.value)
> )
>   }
> }
> },
> "store"
>   )
> {code}
> In the code sample above, AFAICT there is no way the possible partition of 
> the {{store}} containing the key {{"1-a"}} could be determined by calling 
> {{queryMetadataForKey}} with the string {{{}"1-a"{}}}. One has to call 
> {{queryMetadataForKey}} with the record's key that produced {{{}"1-a"{}}}, in 
> this case the {{Int}} 1, to find the partition.
>  
> Example 2:
> The same as above, but with a different {{process}} method.
> {code:scala}
> override def process(record: Record[Int, String]): Unit = {
>   ('a' to 'j').foreach(x => store.put(s"$x", s"${record.key}"))
> }{code}
> In this case the key {{"a"}} could exist in multiple partitions, with 
> different values in different partitions. In this case, AFAICT, one must use 
> {{queryMetadataForKey}} with an {{Int}} to determine the partition where a 
> given {{String}} would be stored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14138) The Exception Throwing Behavior of Transactional Producer is Inconsistent

2022-08-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14138:
--
Description: 
There's an issue for inconsistent error throwing inside Kafka Producer when 
transactions are enabled. In short, there are two places where the received 
error code from the brokers would be eventually thrown to the caller:

* Recorded on the batch's metadata, via "Sender#failBatch"
* Recorded on the txn manager, via "txnManager#handleFailedBatch".

The former would be thrown from 1) the `Future` returned from 
`send`; or 2) the `callback` inside `send(record, callback)`. The latter would 
be thrown from `producer.send()` directly, in which we call 
`txnManager.maybeAddPartition -> maybeFailWithError`. However, when thrown from 
the former it's not wrapped, hence the direct exception (e.g. 
ClusterAuthorizationException), whereas in the latter it's wrapped as, e.g., 
KafkaException(ClusterAuthorizationException). And which one is thrown depends 
on a race condition, since we cannot control whether the previous 
produceRequest's error has come back by the time the caller thread calls 
`txnManager.maybeAddPartition`.

For example, consider the following sequence for an idempotent producer:


1. caller thread: within future = producer.send(), call recordAccumulator.append

2. sender thread: drain the accumulator, send the produceRequest and get the 
error back.

3. caller thread: within future = producer.send(), call 
txnManager.maybeAddPartition, in which we would check `maybeFailWithError` 
before `isTransactional`.

4. caller thread: future.get()

In a sequence where 3) happened before 2), we would only get the raw exception 
at step 4; in a sequence where 2) happened before 3), we would throw the 
exception immediately at 3).

This inconsistent error throwing is pretty annoying for users since they'd need 
to handle both cases, but many of them are actually not aware of this 
trickiness. We should make the error throwing consistent, e.g. we should 
consider: 1) which errors would be thrown from the callback / future.get, and 
which would be thrown from the `send` call directly, and these sets of errors 
should ideally be non-overlapping; 2) whether we should wrap the raw error or 
not, and we should do so consistently.

  was:
There's an issue for inconsistent error throwing inside Kafka Producer when 
transactions are enabled. In short, there are two places where the received 
error code from the brokers would be eventually thrown to the caller:

* Recorded on the batch's metadata, via "Sender#failBatch"
* Recorded on the txn manager, via "txnManager#handleFailedBatch".

The former would be thrown from 1) the `Future` returned from 
`send`; or 2) the `callback` inside `send(record, callback)`. The latter would 
be thrown from `producer.send()` directly, in which we call 
`txnManager.maybeAddPartition -> maybeFailWithError`. However, when thrown from 
the former it's not wrapped, hence the direct exception (e.g. 
ClusterAuthorizationException), whereas in the latter it's wrapped as, e.g., 
KafkaException(ClusterAuthorizationException). And which one is thrown depends 
on a race condition, since we cannot control whether the previous 
produceRequest's error has come back by the time the caller thread calls 
`txnManager.maybeAddPartition`.

For example consider the following sequence:


1. caller thread: within future = producer.send(), call recordAccumulator.append

2. sender thread: drain the accumulator, send the produceRequest and get the 
error back.

3. caller thread: within future = producer.send(), call 
txnManager.maybeAddPartition

4. sender thread: get the addPartition token, send the txnRequest and get the 
error back. NOTE the sender thread could send these two requests in any order.

5. caller thread: future.get()

In a sequence where 3) happened before 2), we would only get the raw exception 
at step 5; in a sequence where 2) happened before 3), we would throw the 
exception immediately at 3).

This inconsistent error throwing is pretty annoying for users since they'd need 
to handle both cases, but many of them are actually not aware of this 
trickiness. We should make the error throwing consistent, e.g. we should 
consider: 1) which errors would be thrown from the callback / future.get, and 
which would be thrown from the `send` call directly, and these sets of errors 
should ideally be non-overlapping; 2) whether we should wrap the raw error or 
not, and we should do so consistently.


> The Exception Throwing Behavior of Transactional Producer is Inconsistent
> -
>
> Key: KAFKA-14138
> URL: https://issues.apache.org/jira/browse/KAFKA-14138
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>

[jira] [Commented] (KAFKA-14138) The Exception Throwing Behavior of Transactional Producer is Inconsistent

2022-08-07 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576529#comment-17576529
 ] 

Guozhang Wang commented on KAFKA-14138:
---

[~sagarrao] Yes, I think this is highly related to KIP-691 (see "Callback 
Exception Improvement" and "Unify Wrapped KafkaException").

My intention is that, if we wrap all directly thrown exceptions with a 
`KafkaException`, then we should consider doing the same for exceptions passed 
in the future/callback as well. Of course, maybe we could use a different 
wrapping exception class to be aligned with KIP-691. And again, we also need to 
distinguish which exceptions should be thrown directly. And the principles are:

1. If EOS is not enabled, then produce delivery failures would only be set in 
the future/callback, since they are not fatal; fatal exceptions should be 
thrown directly and the user thread should handle them by closing the producer.
2. If EOS is enabled, even non-fatal produce delivery failures should be thrown 
directly and be handled immediately, since otherwise we'd be violating the 
semantics. However, depending on the exception, the caller thread could decide 
whether the producer should be closed or whether we could still abort the 
transaction and then move on (see the sketch below).
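
A minimal sketch of how a caller following principle 2 might handle the two 
outcomes today, assuming a transactional producer; the topic, configs and the 
fatal/non-fatal classification follow the standard producer usage pattern 
rather than anything specific to this ticket:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosSendExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-txn-id");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // Fatal for this producer instance: aborting cannot recover it, so close.
            producer.close();
        } catch (KafkaException e) {
            // Non-fatal: abort the transaction; the caller may then retry it.
            producer.abortTransaction();
        }
        producer.close();
    }
}
{code}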

> The Exception Throwing Behavior of Transactional Producer is Inconsistent
> -
>
> Key: KAFKA-14138
> URL: https://issues.apache.org/jira/browse/KAFKA-14138
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Guozhang Wang
>Assignee: Sagar Rao
>Priority: Critical
>
> There's an issue for inconsistent error throwing inside Kafka Producer when 
> transactions are enabled. In short, there are two places where the received 
> error code from the brokers would be eventually thrown to the caller:
> * Recorded on the batch's metadata, via "Sender#failBatch"
> * Recorded on the txn manager, via "txnManager#handleFailedBatch".
> The former would be thrown from 1) the `Future` returned from 
> the `send`; or 2) the `callback` inside `send(record, callback)`. The latter 
> would be thrown from `producer.send()` directly, in which we call 
> `txnManager.maybeAddPartition -> maybeFailWithError`. However, when thrown 
> from the former it's not wrapped, hence the direct exception (e.g. 
> ClusterAuthorizationException), whereas in the latter it's wrapped as, e.g., 
> KafkaException(ClusterAuthorizationException). And which one is thrown 
> depends on a race condition, since we cannot control whether the previous 
> produceRequest's error has come back by the time the caller thread calls 
> `txnManager.maybeAddPartition`.
> For example consider the following sequence:
> 1. caller thread: within future = producer.send(), call 
> recordAccumulator.append
> 2. sender thread: drain the accumulator, send the produceRequest and get the 
> error back.
> 3. caller thread: within future = producer.send(), call 
> txnManager.maybeAddPartition
> 4. sender thread: get the addPartition token, send the txnRequest and get the 
> error back. NOTE the sender thread could send these two requests in any order.
> 5. caller thread: future.get()
> In a sequence where 3) happened before 2), we would only get the raw 
> exception at step 5; in a sequence where 2) happened before 3), we would 
> throw the exception immediately at 3).
> This inconsistent error throwing is pretty annoying for users since they'd 
> need to handle both cases, but many of them are actually not aware of this 
> trickiness. We should make the error throwing consistent, e.g. we should 
> consider: 1) which errors would be thrown from the callback / future.get, and 
> which would be thrown from the `send` call directly, and these sets of errors 
> should ideally be non-overlapping; 2) whether we should wrap the raw error or 
> not, and we should do so consistently.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14138) The Exception Throwing Behavior of Transactional Producer is Inconsistent

2022-08-03 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-14138:
-

 Summary: The Exception Throwing Behavior of Transactional Producer 
is Inconsistent
 Key: KAFKA-14138
 URL: https://issues.apache.org/jira/browse/KAFKA-14138
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Guozhang Wang


There's an issue for inconsistent error throwing inside Kafka Producer when 
transactions are enabled. In short, there are two places where the received 
error code from the brokers would be eventually thrown to the caller:

* Recorded on the batch's metadata, via "Sender#failBatch"
* Recorded on the txn manager, via "txnManager#handleFailedBatch".

The former would be thrown from 1) the `Future` returned from 
`send`; or 2) the `callback` inside `send(record, callback)`. The latter would 
be thrown from `producer.send()` directly, in which we call 
`txnManager.maybeAddPartition -> maybeFailWithError`. However, when thrown from 
the former it's not wrapped, hence the direct exception (e.g. 
ClusterAuthorizationException), whereas in the latter it's wrapped as, e.g., 
KafkaException(ClusterAuthorizationException). And which one is thrown depends 
on a race condition, since we cannot control whether the previous 
produceRequest's error has come back by the time the caller thread calls 
`txnManager.maybeAddPartition`.

For example consider the following sequence:


1. caller thread: within future = producer.send(), call recordAccumulator.append

2. sender thread: drain the accumulator, send the produceRequest and get the 
error back.

3. caller thread: within future = producer.send(), call 
txnManager.maybeAddPartition

4. sender thread: get the addPartition token, send the txnRequest and get the 
error back. NOTE the sender thread could send these two requests in any order.

5. caller thread: future.get()

In a sequence where 3) happened before 2), we would only get the raw exception 
at step 5; in a sequence where 2) happened before 3), we would throw the 
exception immediately at 3).

This inconsistent error throwing is pretty annoying for users since they'd need 
to handle both cases, but many of them are actually not aware of this 
trickiness. We should make the error throwing consistent, e.g. we should 
consider: 1) which errors would be thrown from the callback / future.get, and 
which would be thrown from the `send` call directly, and these sets of errors 
should ideally be non-overlapping; 2) whether we should wrap the raw error or 
not, and we should do so consistently.
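
A hedged sketch of the two surfaces a caller currently has to guard against, 
for the idempotent case described above; the topic, record and configs are 
illustrative, and which path actually fires depends on the race described 
above:

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TwoErrorSurfacesExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                // Surface 1: an error recorded on the txn manager is rethrown
                // (wrapped as a KafkaException) from send() itself.
                Future<RecordMetadata> future = producer.send(
                    new ProducerRecord<>("some-topic", "key", "value"),
                    (metadata, exception) -> {
                        // Surface 2a: an error recorded on the batch arrives here unwrapped.
                    });
                // Surface 2b: the same batch-level error also fails the returned future.
                future.get();
            } catch (KafkaException e) {
                // Thrown directly from send(): the wrapped form of the broker error.
            } catch (ExecutionException e) {
                // Thrown from future.get(): e.getCause() is the unwrapped broker error.
            }
        }
    }
}
{code}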



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13840) KafkaConsumer is unable to recover connection to group coordinator after commitOffsetsAsync exception

2022-08-03 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574928#comment-17574928
 ] 

Guozhang Wang commented on KAFKA-13840:
---

I think this issue is indeed fixed in the latest release (starting in 3.1) 
where upon `commitAsync` we would try to clear-and-discover the coordinator:

https://github.com/apache/kafka/pull/12259/files#diff-0029e982555d1fae10943b862924da962ca8e247a3070cded92c5f5a5960244fR954

Could you kindly check that code change and see if it would avoid the scenario 
you observed in the previous version?

> KafkaConsumer is unable to recover connection to group coordinator after 
> commitOffsetsAsync exception
> -
>
> Key: KAFKA-13840
> URL: https://issues.apache.org/jira/browse/KAFKA-13840
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 2.6.1, 3.1.0, 2.7.2, 2.8.1, 3.0.0
>Reporter: Kyle R Stehbens
>Assignee: Luke Chen
>Priority: Critical
>
> Hi, I've discovered an issue with the java Kafka client (consumer) whereby a 
> timeout or any other retry-able exception triggered during an async offset 
> commit, renders the client unable to recover its group co-coordinator and 
> leaves the client in a broken state.
>  
> I first encountered this using v2.8.1 of the java client, and after going 
> through the code base for all versions of the client, have found it affects 
> all versions of the client from 2.6.1 onward.
> I also confirmed that by rolling back to 2.5.1, the issue is not present.
>  
> The issue stems from changes to how the FindCoordinatorResponseHandler in 
> 2.5.1 used to call clearFindCoordinatorFuture(); on both success and failure 
> here:
> [https://github.com/apache/kafka/blob/0efa8fb0f4c73d92b6e55a112fa45417a67a7dc2/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L783]
>  
> In all future versions of the client this call is not made:
> [https://github.com/apache/kafka/blob/839b886f9b732b151e1faeace7303c80641c08c4/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L838]
>  
> What this results in, is when the KafkaConsumer makes a call to 
> coordinator.commitOffsetsAsync(...), if an error occurs such that the 
> coordinator is unavailable here:
> [https://github.com/apache/kafka/blob/c5077c679c372589215a1b58ca84360c683aa6e8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L1007]
>  
> then the client will try to call:
> [https://github.com/apache/kafka/blob/c5077c679c372589215a1b58ca84360c683aa6e8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L1017]
> However this will never be able to succeed as it perpetually returns a 
> reference to a failed future: findCoordinatorFuture that is never cleared out.
>  
> This manifests in all future calls to commitOffsetsAsync() throwing a 
> "coordinator unavailable" exception forever going forward after any 
> retry-able exception causes the coordinator to close. 
> Note we discovered this when we upgraded the kafka client in our Flink 
> consumers from 2.4.1 to 2.8.1 and subsequently needed to downgrade the 
> client. We noticed this occurring in our non-flink java consumers too running 
> 3.x client versions.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13840) KafkaConsumer is unable to recover connection to group coordinator after commitOffsetsAsync exception

2022-08-03 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574893#comment-17574893
 ] 

Guozhang Wang commented on KAFKA-13840:
---

[~kyle.stehbens] Just to clarify, when the retriable error happens for the 
commitAsync, did the caller thread further trigger other functions on the 
consumer?

In the current code, as long as the caller triggers "poll", or another 
"commitAsync", the "ensureCoordinatorReady" function would be triggered, which 
would clear the failed future and mark the coordinator unknown, so I'm still a 
bit unclear how the client would be unable to recover its group coordinator and 
be left in a broken state.

> KafkaConsumer is unable to recover connection to group coordinator after 
> commitOffsetsAsync exception
> -
>
> Key: KAFKA-13840
> URL: https://issues.apache.org/jira/browse/KAFKA-13840
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 2.6.1, 3.1.0, 2.7.2, 2.8.1, 3.0.0
>Reporter: Kyle R Stehbens
>Assignee: Luke Chen
>Priority: Major
>
> Hi, I've discovered an issue with the java Kafka client (consumer) whereby a 
> timeout or any other retry-able exception triggered during an async offset 
> commit, renders the client unable to recover its group co-coordinator and 
> leaves the client in a broken state.
>  
> I first encountered this using v2.8.1 of the java client, and after going 
> through the code base for all versions of the client, have found it affects 
> all versions of the client from 2.6.1 onward.
> I also confirmed that by rolling back to 2.5.1, the issue is not present.
>  
> The issue stems from changes to how the FindCoordinatorResponseHandler in 
> 2.5.1 used to call clearFindCoordinatorFuture(); on both success and failure 
> here:
> [https://github.com/apache/kafka/blob/0efa8fb0f4c73d92b6e55a112fa45417a67a7dc2/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L783]
>  
> In all future versions of the client this call is not made:
> [https://github.com/apache/kafka/blob/839b886f9b732b151e1faeace7303c80641c08c4/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L838]
>  
> What this results in, is when the KafkaConsumer makes a call to 
> coordinator.commitOffsetsAsync(...), if an error occurs such that the 
> coordinator is unavailable here:
> [https://github.com/apache/kafka/blob/c5077c679c372589215a1b58ca84360c683aa6e8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L1007]
>  
> then the client will try to call:
> [https://github.com/apache/kafka/blob/c5077c679c372589215a1b58ca84360c683aa6e8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L1017]
> However this will never be able to succeed as it perpetually returns a 
> reference to a failed future: findCoordinatorFuture that is never cleared out.
>  
> This manifests in all future calls to commitOffsetsAsync() throwing a 
> "coordinator unavailable" exception forever going forward after any 
> retry-able exception causes the coordinator to close. 
> Note we discovered this when we upgraded the kafka client in our Flink 
> consumers from 2.4.1 to 2.8.1 and subsequently needed to downgrade the 
> client. We noticed this occurring in our non-flink java consumers too running 
> 3.x client versions.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13877) Flaky RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags

2022-08-03 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13877.
---
Resolution: Fixed

> Flaky 
> RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags
> 
>
> Key: KAFKA-13877
> URL: https://issues.apache.org/jira/browse/KAFKA-13877
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> The following test fails on local testbeds about once per 10-15 runs:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.kafka.streams.integration.RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags(RackAwarenessIntegrationTest.java:192)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:53)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14130) Reduce RackAwarenessIntegrationTest to a unit test

2022-08-01 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-14130:
-

 Summary: Reduce RackAwarenessIntegrationTest to a unit test
 Key: KAFKA-14130
 URL: https://issues.apache.org/jira/browse/KAFKA-14130
 Project: Kafka
  Issue Type: Improvement
  Components: streams, unit tests
Reporter: Guozhang Wang


While working on KAFKA-13877, I feel it's overkill to introduce the whole test 
class as an integration test, since all we need is to test the assignor itself, 
which could be a unit test. Running this suite with 9+ instances takes a long 
time and is still vulnerable to all kinds of timing-based flakiness. A better 
choice is to reduce it to a unit test, similar to 
{{HighAvailabilityStreamsPartitionAssignorTest}}, which just tests the behavior 
of the assignor itself rather than creating many instances and hence depending 
on various timing bombs not to explode.

The scope of this ticket is to refactor the {{RackAwarenessIntegrationTest}} 
into a {{RackAwarenessStreamsPartitionAssignorTest}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13877) Flaky RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags

2022-08-01 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573993#comment-17573993
 ] 

Guozhang Wang commented on KAFKA-13877:
---

I've re-tested this case locally 5+ times, each with 50 runs, and identified 
that it is flakiness by itself, not a real bug. The fix is summarized in 
https://github.com/apache/kafka/pull/12468.

On a side note, I think it's overkill to introduce the whole test class as an 
integration test, since all we need is to test the assignor itself, which could 
be a unit test. Running this suite with 9+ instances takes a long time and is 
still vulnerable to all kinds of timing-based flakiness (yes, this PR alone 
cannot guarantee we've avoided all of it). So I will file a separate ticket for 
reducing this test into unit tests.

> Flaky 
> RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags
> 
>
> Key: KAFKA-13877
> URL: https://issues.apache.org/jira/browse/KAFKA-13877
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> The following test fails on local testbeds about once per 10-15 runs:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.kafka.streams.integration.RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags(RackAwarenessIntegrationTest.java:192)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:53)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-13877) Flaky RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags

2022-08-01 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-13877:
-

Assignee: Guozhang Wang  (was: Levani Kokhreidze)

> Flaky 
> RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags
> 
>
> Key: KAFKA-13877
> URL: https://issues.apache.org/jira/browse/KAFKA-13877
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> The following test fails on local testbeds about once per 10-15 runs:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.kafka.streams.integration.RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags(RackAwarenessIntegrationTest.java:192)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:53)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13877) Flaky RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags

2022-07-29 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573180#comment-17573180
 ] 

Guozhang Wang commented on KAFKA-13877:
---

[~lkokhreidze] ping again, please let me know if you are still working on it.

> Flaky 
> RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags
> 
>
> Key: KAFKA-13877
> URL: https://issues.apache.org/jira/browse/KAFKA-13877
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Levani Kokhreidze
>Priority: Major
>  Labels: newbie
>
> The following test fails on local testbeds about once per 10-15 runs:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.kafka.streams.integration.RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags(RackAwarenessIntegrationTest.java:192)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:53)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13877) Flaky RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags

2022-07-25 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571014#comment-17571014
 ] 

Guozhang Wang commented on KAFKA-13877:
---

Hello [~lkokhreidze], are you still actively working on this flaky test?

> Flaky 
> RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags
> 
>
> Key: KAFKA-13877
> URL: https://issues.apache.org/jira/browse/KAFKA-13877
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Levani Kokhreidze
>Priority: Major
>  Labels: newbie
>
> The following test fails on local testbeds about once per 10-15 runs:
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.kafka.streams.integration.RackAwarenessIntegrationTest.shouldDistributeStandbyReplicasOverMultipleClientTags(RackAwarenessIntegrationTest.java:192)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:53)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14024) Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare

2022-07-19 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568732#comment-17568732
 ] 

Guozhang Wang commented on KAFKA-14024:
---

Thanks to [~aiquestion] for filing this and also submitting the PR, I've added 
you as a contributor and assigned the ticket to you too.

> Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare
> --
>
> Key: KAFKA-14024
> URL: https://issues.apache.org/jira/browse/KAFKA-14024
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.0
>Reporter: Shawn Wang
>Assignee: Shawn Wang
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.1
>
>
> Hi,
> In https://issues.apache.org/jira/browse/KAFKA-13310 we tried to fix an issue 
> where consumer#poll(duration) could return later than the provided duration. 
> That happens because, if a rebalance is needed, we first commit the current 
> offsets synchronously before rebalancing, and if that offset commit takes too 
> long, consumer#poll spends more time than the provided duration. To fix that, 
> we replaced the synchronous commit with an asynchronous commit before the 
> rebalance (i.e. in onJoinPrepare).
>  
> However, in this ticket we found that the async commit keeps sending a new 
> commit request during each Consumer#poll, because the offset commit never 
> completes in time. The impact is that the existing consumer is kicked out of 
> the group after the rebalance timeout without rejoining. That is, suppose we 
> have consumer A in group G; when consumer B joins the group, after the 
> rebalance only consumer B remains in the group.
>  
> The workaround for this issue is to switch back to an eager assignor, e.g. 
> StickyAssignor or RoundRobinAssignor.
>  
> To fix the issue, we came up with 2 solutions:
>  # We can explicitly wait for the async commit to complete in onJoinPrepare, 
> but that would reintroduce the KAFKA-13310 issue.
>  # We can keep the in-flight async commit offset future around, so that on 
> each Consumer#poll we only wait for that same future to complete.
>  
> Besides, another bug was found while fixing this one. Before KAFKA-13310, we 
> committed offsets synchronously with the rebalanceTimeout, retrying on 
> retriable errors until the timeout. After KAFKA-13310, we thought we still 
> retried, but the retry now happens after the partitions are revoked. That is, 
> even if the retried offset commit succeeds, some partition offsets remain 
> un-committed, and after the rebalance other consumers will consume 
> overlapping records.
>  
>  
> ===
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L752]
>  
> We didn't wait for the client to receive the commit offset response here, so 
> onJoinPrepareAsyncCommitCompleted will be false in a cooperative rebalance, 
> and the client will loop invoking onJoinPrepare.
> I think EAGER mode doesn't have this problem because it revokes the 
> partitions even if onJoinPrepareAsyncCommitCompleted=false and will not try 
> to commit in the next round.
> To reproduce:
>  * single-node Kafka version 3.2.0 && client version 3.2.0
>  * topic1 has 5 partitions
>  * start a consumer1 (cooperative rebalance)
>  * start another consumer2 (same consumer group)
>  * consumer1 hangs for a long time before re-joining
>  * from the server log, consumer1's rebalance times out before JoinGroup and 
> it re-joins with another memberId
> consumer1's log keeps printing:
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xx-1, groupId=xxx] Executing onJoinPrepare with generation 
> 54 and memberId consumer-xxx-1-fd3d04a8-009a-4ed1-949e-71b636716938 
> (ConsumerCoordinator.java:739)
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xxx-1, groupId=xxx] Sending asynchronous auto-commit of 
> offsets \{topic1-4=OffsetAndMetadata{offset=5, leaderEpoch=0, metadata=''}} 
> (ConsumerCoordinator.java:1143)
>  
> and coordinator's log:
> [2022-06-26 17:00:13,855] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group xxx in state PreparingRebalance with old generation 56 
> (__consumer_offsets-30) (reason: Adding new member 
> consumer-xxx-1-fa7fe5ec-bd2f-42f6-b5d7-c5caeafe71ac with group instance id 
> None; client reason: rebalance failed due to 'The group member needs to have 
> a valid member id before actually entering a consumer group.' 
> (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,855] INFO [GroupCoordinator 0]: Group xxx removed 
> dynamic members who haven't joined: 
> Set(consumer-xxx-1-d62a0923-6ca6-48dd-a84e-f97136d4603a) 

[jira] [Updated] (KAFKA-14024) Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare

2022-07-19 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-14024:
--
Reviewer: Luke Chen

> Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare
> --
>
> Key: KAFKA-14024
> URL: https://issues.apache.org/jira/browse/KAFKA-14024
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.0
>Reporter: Shawn Wang
>Assignee: Shawn Wang
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.1
>
>
> Hi,
> In https://issues.apache.org/jira/browse/KAFKA-13310 we tried to fix an issue 
> where consumer#poll(duration) could return later than the provided duration. 
> That happens because, if a rebalance is needed, we first commit the current 
> offsets synchronously before rebalancing, and if that offset commit takes too 
> long, consumer#poll spends more time than the provided duration. To fix that, 
> we replaced the synchronous commit with an asynchronous commit before the 
> rebalance (i.e. in onJoinPrepare).
>  
> However, in this ticket we found that the async commit keeps sending a new 
> commit request during each Consumer#poll, because the offset commit never 
> completes in time. The impact is that the existing consumer is kicked out of 
> the group after the rebalance timeout without rejoining. That is, suppose we 
> have consumer A in group G; when consumer B joins the group, after the 
> rebalance only consumer B remains in the group.
>  
> The workaround for this issue is to switch back to an eager assignor, e.g. 
> StickyAssignor or RoundRobinAssignor.
>  
> To fix the issue, we came up with 2 solutions:
>  # We can explicitly wait for the async commit to complete in onJoinPrepare, 
> but that would reintroduce the KAFKA-13310 issue.
>  # We can keep the in-flight async commit offset future around, so that on 
> each Consumer#poll we only wait for that same future to complete.
>  
> Besides, another bug was found while fixing this one. Before KAFKA-13310, we 
> committed offsets synchronously with the rebalanceTimeout, retrying on 
> retriable errors until the timeout. After KAFKA-13310, we thought we still 
> retried, but the retry now happens after the partitions are revoked. That is, 
> even if the retried offset commit succeeds, some partition offsets remain 
> un-committed, and after the rebalance other consumers will consume 
> overlapping records.
>  
>  
> ===
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L752]
>  
> We didn't wait for the client to receive the commit offset response here, so 
> onJoinPrepareAsyncCommitCompleted will be false in a cooperative rebalance, 
> and the client will loop invoking onJoinPrepare.
> I think EAGER mode doesn't have this problem because it revokes the 
> partitions even if onJoinPrepareAsyncCommitCompleted=false and will not try 
> to commit in the next round.
> To reproduce:
>  * single-node Kafka version 3.2.0 && client version 3.2.0
>  * topic1 has 5 partitions
>  * start a consumer1 (cooperative rebalance)
>  * start another consumer2 (same consumer group)
>  * consumer1 hangs for a long time before re-joining
>  * from the server log, consumer1's rebalance times out before JoinGroup and 
> it re-joins with another memberId
> consumer1's log keeps printing:
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xx-1, groupId=xxx] Executing onJoinPrepare with generation 
> 54 and memberId consumer-xxx-1-fd3d04a8-009a-4ed1-949e-71b636716938 
> (ConsumerCoordinator.java:739)
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xxx-1, groupId=xxx] Sending asynchronous auto-commit of 
> offsets \{topic1-4=OffsetAndMetadata{offset=5, leaderEpoch=0, metadata=''}} 
> (ConsumerCoordinator.java:1143)
>  
> and coordinator's log:
> [2022-06-26 17:00:13,855] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group xxx in state PreparingRebalance with old generation 56 
> (__consumer_offsets-30) (reason: Adding new member 
> consumer-xxx-1-fa7fe5ec-bd2f-42f6-b5d7-c5caeafe71ac with group instance id 
> None; client reason: rebalance failed due to 'The group member needs to have 
> a valid member id before actually entering a consumer group.' 
> (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,855] INFO [GroupCoordinator 0]: Group xxx removed 
> dynamic members who haven't joined: 
> Set(consumer-xxx-1-d62a0923-6ca6-48dd-a84e-f97136d4603a) 
> (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,856] INFO [GroupCoordinator 0]: Stabilized group xxx 
> generation 57 

[jira] [Assigned] (KAFKA-14024) Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare

2022-07-19 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-14024:
-

Assignee: Guozhang Wang

> Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare
> --
>
> Key: KAFKA-14024
> URL: https://issues.apache.org/jira/browse/KAFKA-14024
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.0
>Reporter: Shawn Wang
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.1
>
>
> Hi,
> In https://issues.apache.org/jira/browse/KAFKA-13310 we tried to fix an issue 
> where consumer#poll(duration) could return later than the provided duration. 
> That happens because, if a rebalance is needed, we first commit the current 
> offsets synchronously before rebalancing, and if that offset commit takes too 
> long, consumer#poll spends more time than the provided duration. To fix that, 
> we replaced the synchronous commit with an asynchronous commit before the 
> rebalance (i.e. in onJoinPrepare).
>  
> However, in this ticket we found that the async commit keeps sending a new 
> commit request during each Consumer#poll, because the offset commit never 
> completes in time. The impact is that the existing consumer is kicked out of 
> the group after the rebalance timeout without rejoining. That is, suppose we 
> have consumer A in group G; when consumer B joins the group, after the 
> rebalance only consumer B remains in the group.
>  
> The workaround for this issue is to switch back to an eager assignor, e.g. 
> StickyAssignor or RoundRobinAssignor.
>  
> To fix the issue, we came up with 2 solutions:
>  # We can explicitly wait for the async commit to complete in onJoinPrepare, 
> but that would reintroduce the KAFKA-13310 issue.
>  # We can keep the in-flight async commit offset future around, so that on 
> each Consumer#poll we only wait for that same future to complete.
>  
> Besides, another bug was found while fixing this one. Before KAFKA-13310, we 
> committed offsets synchronously with the rebalanceTimeout, retrying on 
> retriable errors until the timeout. After KAFKA-13310, we thought we still 
> retried, but the retry now happens after the partitions are revoked. That is, 
> even if the retried offset commit succeeds, some partition offsets remain 
> un-committed, and after the rebalance other consumers will consume 
> overlapping records.
>  
>  
> ===
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L752]
>  
> We didn't wait for the client to receive the commit offset response here, so 
> onJoinPrepareAsyncCommitCompleted will be false in a cooperative rebalance, 
> and the client will loop invoking onJoinPrepare.
> I think EAGER mode doesn't have this problem because it revokes the 
> partitions even if onJoinPrepareAsyncCommitCompleted=false and will not try 
> to commit in the next round.
> To reproduce:
>  * single-node Kafka version 3.2.0 && client version 3.2.0
>  * topic1 has 5 partitions
>  * start a consumer1 (cooperative rebalance)
>  * start another consumer2 (same consumer group)
>  * consumer1 hangs for a long time before re-joining
>  * from the server log, consumer1's rebalance times out before JoinGroup and 
> it re-joins with another memberId
> consumer1's log keeps printing:
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xx-1, groupId=xxx] Executing onJoinPrepare with generation 
> 54 and memberId consumer-xxx-1-fd3d04a8-009a-4ed1-949e-71b636716938 
> (ConsumerCoordinator.java:739)
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xxx-1, groupId=xxx] Sending asynchronous auto-commit of 
> offsets \{topic1-4=OffsetAndMetadata{offset=5, leaderEpoch=0, metadata=''}} 
> (ConsumerCoordinator.java:1143)
>  
> and coordinator's log:
> [2022-06-26 17:00:13,855] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group xxx in state PreparingRebalance with old generation 56 
> (__consumer_offsets-30) (reason: Adding new member 
> consumer-xxx-1-fa7fe5ec-bd2f-42f6-b5d7-c5caeafe71ac with group instance id 
> None; client reason: rebalance failed due to 'The group member needs to have 
> a valid member id before actually entering a consumer group.' 
> (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,855] INFO [GroupCoordinator 0]: Group xxx removed 
> dynamic members who haven't joined: 
> Set(consumer-xxx-1-d62a0923-6ca6-48dd-a84e-f97136d4603a) 
> (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,856] INFO [GroupCoordinator 0]: Stabilized group xxx 
> generation 57 

[jira] [Assigned] (KAFKA-14024) Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare

2022-07-19 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-14024:
-

Assignee: Shawn Wang  (was: Guozhang Wang)

> Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare
> --
>
> Key: KAFKA-14024
> URL: https://issues.apache.org/jira/browse/KAFKA-14024
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.0
>Reporter: Shawn Wang
>Assignee: Shawn Wang
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.1
>
>
> Hi,
> In https://issues.apache.org/jira/browse/KAFKA-13310 we tried to fix an issue 
> where consumer#poll(duration) could return later than the provided duration. 
> That happens because, if a rebalance is needed, we first commit the current 
> offsets synchronously before rebalancing, and if that offset commit takes too 
> long, consumer#poll spends more time than the provided duration. To fix that, 
> we replaced the synchronous commit with an asynchronous commit before the 
> rebalance (i.e. in onJoinPrepare).
>  
> However, in this ticket we found that the async commit keeps sending a new 
> commit request during each Consumer#poll, because the offset commit never 
> completes in time. The impact is that the existing consumer is kicked out of 
> the group after the rebalance timeout without rejoining. That is, suppose we 
> have consumer A in group G; when consumer B joins the group, after the 
> rebalance only consumer B remains in the group.
>  
> The workaround for this issue is to switch back to an eager assignor, e.g. 
> StickyAssignor or RoundRobinAssignor.
>  
> To fix the issue, we came up with 2 solutions:
>  # We can explicitly wait for the async commit to complete in onJoinPrepare, 
> but that would reintroduce the KAFKA-13310 issue.
>  # We can keep the in-flight async commit offset future around, so that on 
> each Consumer#poll we only wait for that same future to complete.
>  
> Besides, another bug was found while fixing this one. Before KAFKA-13310, we 
> committed offsets synchronously with the rebalanceTimeout, retrying on 
> retriable errors until the timeout. After KAFKA-13310, we thought we still 
> retried, but the retry now happens after the partitions are revoked. That is, 
> even if the retried offset commit succeeds, some partition offsets remain 
> un-committed, and after the rebalance other consumers will consume 
> overlapping records.
>  
>  
> ===
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L752]
>  
> We didn't wait for the client to receive the commit offset response here, so 
> onJoinPrepareAsyncCommitCompleted will be false in a cooperative rebalance, 
> and the client will loop invoking onJoinPrepare.
> I think EAGER mode doesn't have this problem because it revokes the 
> partitions even if onJoinPrepareAsyncCommitCompleted=false and will not try 
> to commit in the next round.
> To reproduce:
>  * single-node Kafka version 3.2.0 && client version 3.2.0
>  * topic1 has 5 partitions
>  * start a consumer1 (cooperative rebalance)
>  * start another consumer2 (same consumer group)
>  * consumer1 hangs for a long time before re-joining
>  * from the server log, consumer1's rebalance times out before JoinGroup and 
> it re-joins with another memberId
> consumer1's log keeps printing:
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xx-1, groupId=xxx] Executing onJoinPrepare with generation 
> 54 and memberId consumer-xxx-1-fd3d04a8-009a-4ed1-949e-71b636716938 
> (ConsumerCoordinator.java:739)
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> clientId=consumer-xxx-1, groupId=xxx] Sending asynchronous auto-commit of 
> offsets \{topic1-4=OffsetAndMetadata{offset=5, leaderEpoch=0, metadata=''}} 
> (ConsumerCoordinator.java:1143)
>  
> and coordinator's log:
> [2022-06-26 17:00:13,855] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group xxx in state PreparingRebalance with old generation 56 
> (__consumer_offsets-30) (reason: Adding new member 
> consumer-xxx-1-fa7fe5ec-bd2f-42f6-b5d7-c5caeafe71ac with group instance id 
> None; client reason: rebalance failed due to 'The group member needs to have 
> a valid member id before actually entering a consumer group.' 
> (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,855] INFO [GroupCoordinator 0]: Group xxx removed 
> dynamic members who haven't joined: 
> Set(consumer-xxx-1-d62a0923-6ca6-48dd-a84e-f97136d4603a) 
> (kafka.coordinator.group.GroupCoordinator)
> [2022-06-26 17:00:43,856] INFO [GroupCoordinator 0]: Stabilized group xxx 
> 

[jira] [Commented] (KAFKA-14024) Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare

2022-07-19 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568731#comment-17568731
 ] 

Guozhang Wang commented on KAFKA-14024:
---

Hello [~mumrah], I took a look at the ticket and the PR 
(https://github.com/apache/kafka/pull/12349/files) as well, and I agree with 
[~showuon] that this is a pretty bad regression that we should fix asap, and 
hence it is worth treating as a blocker for 3.2.1.

As for the PR, personally I'd simplify it a bit compared to the current fix, by 
making `onJoinPrepare` more re-entrant and idempotent. More specifically, when 
the caller thread of `poll` enters `onJoinPrepare`, it checks whether there is 
already a commit in flight and whether it has completed; if there is none, it 
sends out the request and returns from `onJoinPrepare` immediately, and hence 
from the `poll` call as well. The next `poll` call would re-enter 
`onJoinPrepare` and check whether the commit request has completed; only once 
the maintained commit future has completed would it continue within the 
function to revoke partitions, trigger callbacks, etc. In this way we would not 
need a separate timer inside `onJoinPrepare` for the commit itself. But since 
[~showuon] is almost done reviewing the current PR, I'd rather leave it to him 
than block the merge.

In the new rebalance protocol (KIP-848) we will have a much simpler model on 
the client side, so hopefully we will not fall into this awkward design pattern 
anymore.
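
For illustration, here is a minimal sketch of that re-entrant shape; all names 
(inFlightCommitFuture, commitOffsetsAsync, revokePartitionsAndTriggerCallbacks) 
are hypothetical stand-ins rather than the actual ConsumerCoordinator code:

{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative only: keeps a single in-flight commit future across poll() calls
// so that re-entering onJoinPrepare() does not fire a new commit request each time.
class ReentrantJoinPrepareSketch {
    private CompletableFuture<Void> inFlightCommitFuture;

    /** Returns true once it is safe to proceed with revocation and rejoin. */
    synchronized boolean onJoinPrepare() {
        if (inFlightCommitFuture == null) {
            // First entry: fire the async commit and return immediately so
            // poll() can still return within the caller's timeout.
            inFlightCommitFuture = commitOffsetsAsync();
            return false;
        }
        if (!inFlightCommitFuture.isDone()) {
            // Re-entered from a later poll(): the same future is still pending,
            // so do not send another commit request.
            return false;
        }
        // The commit finished: clear the future and continue with revocation.
        inFlightCommitFuture = null;
        revokePartitionsAndTriggerCallbacks();
        return true;
    }

    private CompletableFuture<Void> commitOffsetsAsync() {
        return CompletableFuture.completedFuture(null); // stand-in for the real async commit
    }

    private void revokePartitionsAndTriggerCallbacks() { /* stand-in */ }
}
{code}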

> Consumer stuck during cooperative rebalance for Commit offset in onJoinPrepare
> --
>
> Key: KAFKA-14024
> URL: https://issues.apache.org/jira/browse/KAFKA-14024
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.2.0
>Reporter: Shawn Wang
>Priority: Blocker
>  Labels: new-consumer-threading-should-fix
> Fix For: 3.3.0, 3.2.1
>
>
> Hi,
> In https://issues.apache.org/jira/browse/KAFKA-13310 we tried to fix an issue 
> where consumer#poll(duration) could return later than the provided duration. 
> That happens because, if a rebalance is needed, we first commit the current 
> offsets synchronously before rebalancing, and if that offset commit takes too 
> long, consumer#poll spends more time than the provided duration. To fix that, 
> we replaced the synchronous commit with an asynchronous commit before the 
> rebalance (i.e. in onJoinPrepare).
>  
> However, in this ticket we found that the async commit keeps sending a new 
> commit request during each Consumer#poll, because the offset commit never 
> completes in time. The impact is that the existing consumer is kicked out of 
> the group after the rebalance timeout without rejoining. That is, suppose we 
> have consumer A in group G; when consumer B joins the group, after the 
> rebalance only consumer B remains in the group.
>  
> The workaround for this issue is to switch back to an eager assignor, e.g. 
> StickyAssignor or RoundRobinAssignor.
>  
> To fix the issue, we came up with 2 solutions:
>  # We can explicitly wait for the async commit to complete in onJoinPrepare, 
> but that would reintroduce the KAFKA-13310 issue.
>  # We can keep the in-flight async commit offset future around, so that on 
> each Consumer#poll we only wait for that same future to complete.
>  
> Besides, another bug was found while fixing this one. Before KAFKA-13310, we 
> committed offsets synchronously with the rebalanceTimeout, retrying on 
> retriable errors until the timeout. After KAFKA-13310, we thought we still 
> retried, but the retry now happens after the partitions are revoked. That is, 
> even if the retried offset commit succeeds, some partition offsets remain 
> un-committed, and after the rebalance other consumers will consume 
> overlapping records.
>  
>  
> ===
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L752]
>  
> We didn't wait for the client to receive the commit offset response here, so 
> onJoinPrepareAsyncCommitCompleted will be false in a cooperative rebalance, 
> and the client will loop invoking onJoinPrepare.
> I think EAGER mode doesn't have this problem because it revokes the 
> partitions even if onJoinPrepareAsyncCommitCompleted=false and will not try 
> to commit in the next round.
> To reproduce:
>  * single-node Kafka version 3.2.0 && client version 3.2.0
>  * topic1 has 5 partitions
>  * start a consumer1 (cooperative rebalance)
>  * start another consumer2 (same consumer group)
>  * consumer1 hangs for a long time before re-joining
>  * from the server log, consumer1's rebalance times out before JoinGroup and 
> it re-joins with another memberId
> consumer1's log keeps printing:
> 16:59:16 [main] DEBUG o.a.k.c.c.i.ConsumerCoordinator - [Consumer 
> 

[jira] [Commented] (KAFKA-13846) Add an overloaded metricOrElseCreate function in Metrics

2022-07-14 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566989#comment-17566989
 ] 

Guozhang Wang commented on KAFKA-13846:
---

Hi Jose, the PR has actually been merged, so we should close this ticket. I 
will go ahead and do it.

> Add an overloaded metricOrElseCreate function in Metrics
> 
>
> Key: KAFKA-13846
> URL: https://issues.apache.org/jira/browse/KAFKA-13846
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Guozhang Wang
>Assignee: Sagar Rao
>Priority: Major
>  Labels: newbie
>
> The `Metrics` registry is often used by concurrent threads; however, its 
> get/create APIs are not well suited for that. A common pattern from users 
> today is:
> {code}
> metric = metrics.metric(metricName);
> if (metric == null) {
>   try {
> metrics.createMetric(..)
>   } catch (IllegalArgumentException e){
> // another thread may create the metric at the mean time
>   }
> } 
> {code}
> Otherwise the caller would need to synchronize the whole block just to get 
> the metric, even though the `createMetric` call itself already synchronizes 
> internally when updating the metric map.
> So we could consider adding a metricOrElseCreate function that is similar to 
> createMetric, but instead of throwing an IllegalArgumentException within the 
> internal synchronization block, it would simply return the already existing 
> metric.
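
For illustration, the get-or-create semantics being proposed amount to 
something like the following sketch; this is a generic map-based example, not 
the actual `Metrics` implementation, and the names are only illustrative:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: an atomic get-or-create over a concurrent map, which is
// the behavior the proposed metricOrElseCreate would expose to callers.
class MetricRegistrySketch<N, M> {
    private final ConcurrentMap<N, M> metrics = new ConcurrentHashMap<>();

    /** Returns the existing metric, or registers the one built by the factory if absent. */
    M metricOrElseCreate(N name, Supplier<M> factory) {
        // computeIfAbsent is atomic, so two threads racing on the same name
        // never see an "already exists" failure: one creates the metric, the
        // other simply gets the winner's instance back.
        return metrics.computeIfAbsent(name, n -> factory.get());
    }
}
{code}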



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13846) Add an overloaded metricOrElseCreate function in Metrics

2022-07-14 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13846.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

> Add an overloaded metricOrElseCreate function in Metrics
> 
>
> Key: KAFKA-13846
> URL: https://issues.apache.org/jira/browse/KAFKA-13846
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Guozhang Wang
>Assignee: Sagar Rao
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0
>
>
> The `Metrics` registry is often used by concurrent threads; however, its 
> get/create APIs are not well suited for that. A common pattern from users 
> today is:
> {code}
> metric = metrics.metric(metricName);
> if (metric == null) {
>   try {
> metrics.createMetric(..)
>   } catch (IllegalArgumentException e){
> // another thread may create the metric at the mean time
>   }
> } 
> {code}
> Otherwise the caller would need to synchronize the whole block just to get 
> the metric, even though the `createMetric` call itself already synchronizes 
> internally when updating the metric map.
> So we could consider adding a metricOrElseCreate function that is similar to 
> createMetric, but instead of throwing an IllegalArgumentException within the 
> internal synchronization block, it would simply return the already existing 
> metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14069) Allow custom configuration of foreign key join internal topics

2022-07-13 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566419#comment-17566419
 ] 

Guozhang Wang commented on KAFKA-14069:
---

Hello [~ebrard], thanks for bringing this ticket up.

These internal topics are treated as repartition topics, and Kafka Streams 
uses the admin client's `DeleteRecords` API to periodically truncate them after 
they are read, so these topics should not grow indefinitely. Did you observe 
that such delete-records requests are never issued (which would indicate a 
bug)? Or do you observe that the delete-records rate cannot keep up with the 
append rate (in that case, you can consider tuning 
"repartition.purge.interval.ms")?
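
For example, a Streams application could lower the purge interval so that 
truncation keeps up with a high append rate; the property values below are 
placeholders, not recommendations:

{code:java}
import java.util.Properties;

public class PurgeIntervalExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "fk-join-app");        // placeholder application id
        props.put("bootstrap.servers", "localhost:9092");   // placeholder brokers
        // Ask Streams to issue DeleteRecords for repartition topics more often,
        // so the delete-records rate can keep up with the append rate.
        props.put("repartition.purge.interval.ms", "30000"); // example value only
        // props would then be passed to new KafkaStreams(topology, props)
    }
}
{code}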

> Allow custom configuration of foreign key join internal topics
> --
>
> Key: KAFKA-14069
> URL: https://issues.apache.org/jira/browse/KAFKA-14069
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Emmanuel Brard
>Priority: Minor
>
> Internal topics supporting foreign key joins (-subscription-registration-topic 
> and -subscription-response-topic) are automatically created with _infinite 
> retention_ (retention.ms=-1, retention.bytes=-1).
> As far as I understand, those topics are used for communication between tasks 
> involved in the FK join; the intermediate result, though, is persisted in a 
> compacted topic (-subscription-store-changelog).
> This means, if I understood correctly, that during normal operation of the 
> streams application, once a message is read from the registration/subscription 
> topic, it will not be read again, even in case of recovery (the position in 
> those topics is committed).
> Because we have very large tables being joined this way with a very high 
> change frequency, we end up with FK internal topics in the order of 1 or 2 
> TB. This is complicated to maintain, especially in terms of disk space.
> I was wondering if:
> - this infinite retention is really a required configuration, and if not
> - this infinite retention could be replaced with a configurable one (for 
> example 1 week, meaning that I accept that in case of failure I must recover 
> my app within one week)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14070) Improve documentation for queryMetadataForKey

2022-07-13 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566414#comment-17566414
 ] 

Guozhang Wang commented on KAFKA-14070:
---

Hello [~balajirrao], thanks for filing this ticket. Could you provide a 
specific example of key types that would break `queryMetadataForKey`? Note that 
the function still has an overload that takes a partitioner, precisely so that 
users can provide one when possible to determine which partition contains the 
specific key.
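
For reference, a minimal usage sketch of the two overloads; the store name, 
key, and partitioner below are hypothetical:

{code:java}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class QueryMetadataExample {
    static KeyQueryMetadata locate(KafkaStreams streams,
                                   String userId,
                                   StreamPartitioner<String, Object> partitioner) {
        if (partitioner == null) {
            // Default overload: assumes the key was partitioned the default way
            // on the store's source topic -- the pitfall described in this ticket
            // when a PAPI store uses a different key type or partitioning strategy.
            return streams.queryMetadataForKey("my-store", userId, Serdes.String().serializer());
        }
        // Overload with an explicit partitioner, for stores whose keys are not
        // partitioned the default way.
        return streams.queryMetadataForKey("my-store", userId, partitioner);
    }
}
{code}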

> Improve documentation for queryMetadataForKey
> -
>
> Key: KAFKA-14070
> URL: https://issues.apache.org/jira/browse/KAFKA-14070
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 3.2.0
>Reporter: Balaji Rao
>Priority: Minor
>
> When using the Processor API, one can add key-value state stores of 
> arbitrary key types to a topology. This could lead to the 
> `queryMetadataForKey` method in `KafkaStreams` being used with incorrect 
> expectations.
> In my understanding, `queryMetadataForKey` uses the source topics of the 
> processor connected to the store to return the `KeyQueryMetadata`. This means 
> that it could provide "incorrect" answers when used with key-value stores of 
> arbitrary key types. The description of the method should be improved to make 
> users aware of this pitfall.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13386) Foreign Key Join filtering out valid records after a code change / schema evolved

2022-06-29 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17560812#comment-17560812
 ] 

Guozhang Wang commented on KAFKA-13386:
---

Hi [~nikuis], I agree that with the same FK1 it will be sent to the same node 
and hence there won't be out-of-order data, but you'd still end up with two 
duplicate join results, since the second subscription would come back and join 
with the current value again (assuming it's still the same).

I think by the time we have multi-versioned joins, we can relax the ordering of 
the emit, since differently versioned join results would not rely on the emit 
ordering anymore.

> Foreign Key Join filtering out valid records after a code change / schema 
> evolved
> -
>
> Key: KAFKA-13386
> URL: https://issues.apache.org/jira/browse/KAFKA-13386
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.2
>Reporter: Sergio Duran Vegas
>Priority: Major
>
> The join optimization assumes the serializer is deterministic and invariant 
> across upgrades, so in case of changes this optimization will drop 
> invalid/intermediate records. In other situations we have relied on the same 
> property, for example when computing whether an update is a duplicate result 
> or not.
>  
> The problem is that some serializers are sadly not deterministic.
>  
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionResolverJoinProcessorSupplier.java]
>  
> {code:java}
> //If this value doesn't match the current value from the original table, it 
> is stale and should be discarded.
>  if (java.util.Arrays.equals(messageHash, currentHash)) {{code}
>  
> A solution for this problem would be for the comparison to use the 
> foreign-key reference itself instead of the whole message hash.
>  
> The bug fix proposal is to allow the user to choose between the two 
> comparison methods (whole-message hash or FK reference). This would fix the 
> problem of dropping valid records in certain cases, while still letting the 
> user choose the current optimized way of checking valid records and dropping 
> intermediate results.
>  
>  
>  
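
A rough sketch of the two comparison strategies discussed above, with 
hypothetical types and method names (this is not the actual 
SubscriptionResolverJoinProcessorSupplier code):

{code:java}
import java.util.Arrays;

// Hypothetical illustration of the staleness check discussed in this ticket.
class StalenessCheckSketch {
    // Current behavior: compare a hash of the whole re-serialized message, which
    // can drop valid records if the serializer is not deterministic across upgrades.
    static boolean isCurrentByHash(byte[] messageHash, byte[] currentHash) {
        return Arrays.equals(messageHash, currentHash);
    }

    // Proposed alternative: compare only the foreign-key reference carried in the
    // subscription, so re-serialization differences do not discard valid records.
    static boolean isCurrentByForeignKey(String subscriptionFk, String currentFk) {
        return subscriptionFk != null && subscriptionFk.equals(currentFk);
    }
}
{code}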



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-12478) Consumer group may lose data for newly expanded partitions when add partitions for topic if the group is set to consume from the latest

2022-06-22 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557576#comment-17557576
 ] 

Guozhang Wang commented on KAFKA-12478:
---

Thanks [~hudeqi], I will take a look at the KIP.

> Consumer group may lose data for newly expanded partitions when add 
> partitions for topic if the group is set to consume from the latest
> ---
>
> Key: KAFKA-12478
> URL: https://issues.apache.org/jira/browse/KAFKA-12478
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 3.1.1
>Reporter: hudeqi
>Priority: Blocker
>  Labels: kip-842
> Attachments: safe-console-consumer.png, safe-consume.png, 
> safe-produce.png, trunk-console-consumer.png, trunk-consume.png, 
> trunk-produce.png
>
>   Original Estimate: 1,158h
>  Remaining Estimate: 1,158h
>
>   This problem was exposed in our production environment: a topic is used to 
> produce monitoring data. *After expanding its partitions, the consuming 
> business reported that data was lost.*
>   After a preliminary investigation, the lost data was all concentrated in 
> the newly expanded partitions. The reason is: when the topic is expanded, the 
> producer perceives the expansion first and writes some data into the newly 
> expanded partitions. The consumer group perceives the expansion later, and 
> after the rebalance completes, the newly expanded partitions are consumed 
> from the latest offset if the group is set to consume from the latest. For a 
> period of time, the data in the newly expanded partitions is therefore 
> skipped and lost by the consumer.
>   Setting such a high-volume topic to consume from the earliest at startup is 
> not a good option either, since that would make the group aggressively 
> consume historical data from the brokers, which affects broker performance to 
> a certain extent. Therefore, *it is necessary to consume only these newly 
> expanded partitions from the earliest separately.*
>  
> I did a test and the results are in the attached screenshots. First, the 
> producer's and consumer's "metadata.max.age.ms" are set to 500ms and 3ms 
> respectively.
> _trunk-console-consumer.png_ shows the consumer started with the community 
> version and set to "latest". 
> _trunk-produce.png_ shows the data produced: "partition_count" is the number 
> of partitions of the topic at that moment, "message" is the numeric content 
> of the corresponding message, and "send_to_partition_index" is the index of 
> the partition to which the corresponding message is sent. It can be seen that 
> at 11:32:10, the producer perceives the expansion of the total partitions 
> from 2 to 3, and writes the numbers 38, 41, and 44 into the newly expanded 
> partition 2.
> _trunk-consume.png_ shows all the numbers consumed by the community version. 
> You can see that 38 and 41, sent to partition 2, were not consumed at the 
> beginning. Even after partition 2 was perceived, 38 and 41 were still not 
> consumed. Instead, consumption started from the latest message, 44, so the 
> two records 38 and 41 were discarded.
>  
> _safe-console-consumer.png_ shows the consumer started with the fixed version 
> and set to "safe_latest". 
> _safe-produce.png_ shows the data produced. It can be seen that at 12:12:09, 
> the producer perceives the expansion of the total partitions from 4 to 5, and 
> writes the numbers 109 and 114 into the newly expanded partition 4.
> _safe-consume.png_ shows all the numbers consumed by the fixed version. You 
> can see that 109, sent to partition 4, was not consumed at the beginning. 
> After partition 4 was perceived, 109 was consumed as the first record of 
> partition 4. So the fixed version does not lose data under this condition.
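
For context, a minimal consumer configuration that exhibits the described 
behavior; all values are placeholders, and "safe_latest" is only a proposed 
value from KIP-842, not an existing option:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ResetPolicyExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder brokers
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "monitoring-consumers");    // placeholder group
        // With "latest", partitions added after the last rebalance start from the
        // log end once they are assigned, so records produced in between are skipped.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        // Controls how quickly the consumer notices the new partitions.
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "30000");
    }
}
{code}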



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

