[jira] [Commented] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-17 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690560#comment-17690560
 ] 

Matthias J. Sax commented on KAFKA-14713:
-

Ah. Thanks. That makes sense. I did not look into the consumer code, only 
Streams. So it's fixed via https://issues.apache.org/jira/browse/KAFKA-12980 in 
3.2.0 – updated the ticket accordingly. Thanks for getting back. It bugged me 
that I did not understand it :) 

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.2
>Reporter: Tamas
>Priority: Critical
> Fix For: 3.2.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.
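
The linear scaling the report describes (30 seconds per partition: 5 minutes for 10 partitions, 48 minutes for 96) can be sketched as follows. This is an illustrative helper with made-up names, not Kafka code:

```java
// Illustration (not Kafka code) of the sequential worst case described in the
// ticket: each empty partition blocks restore for the full request timeout.
public class GlobalTableStartupDelay {
    static long worstCaseStartupMs(int partitions, long requestTimeoutMs) {
        // Partitions are restored one after another, so delays add up linearly.
        return (long) partitions * requestTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseStartupMs(10, 30_000L)); // 300000 ms = 5 minutes
        System.out.println(worstCaseStartupMs(96, 30_000L)); // 2880000 ms = 48 minutes
    }
}
```

Solution #2 in the list above would make this cost constant in the number of partitions instead of linear.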



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-17 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-14713:
-

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.2
>Reporter: Tamas
>Priority: Critical
> Fix For: 3.2.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Updated] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-17 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14713:

Fix Version/s: 3.2.0
   (was: 3.4.0)

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.2
>Reporter: Tamas
>Priority: Critical
> Fix For: 3.2.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Comment Edited] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-16 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690055#comment-17690055
 ] 

Matthias J. Sax edited comment on KAFKA-14713 at 2/16/23 11:52 PM:
---

Thanks for getting back. Glad it's resolved. I am not sure why though. 
Comparing 3.4 and 3.0 code, it seems they do the same thing.

In the end, if you have a valid checkpoint on restart, you should not even hit 
`poll(pollMsPlusRequestTimeout)` during restore, because it should hold that 
`offset == highWatermark` and we should not enter the while-loop...

For K14442, we know that `offset == highWatermark - 1` (because we write the 
"incorrect" watermark into the checkpoint file), and thus `poll()` is executed 
and hangs because there is no data – the last "record" is just a commit marker.
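
The condition described in the comment can be sketched like this. The names are illustrative, not the actual `GlobalStateManagerImpl` code:

```java
// Hedged sketch (illustrative names, not the actual GlobalStateManagerImpl
// code) of the restore-loop condition described in the comment above.
public class RestoreLoopSketch {
    // True when restore would enter the while-loop and call poll(). With a
    // stale checkpoint (offset == highWatermark - 1, the gap being a
    // transactional commit marker), the loop is entered but poll() finds no
    // data and blocks for the full timeout.
    static boolean entersRestoreLoop(long checkpointedOffset, long highWatermark) {
        return checkpointedOffset < highWatermark;
    }

    public static void main(String[] args) {
        System.out.println(entersRestoreLoop(100, 100)); // clean checkpoint: false
        System.out.println(entersRestoreLoop(99, 100));  // KAFKA-14442 case: true
    }
}
```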


was (Author: mjsax):
Thanks for getting back. Glad it's resolved. I am not sure why though. 
Comparing 3.4 and 3.0 code, it seems they do the same thing.

In the end, if you have valid checkpoint on restart, you should not even hit 
`poll(pollMsPlusRequestTimeout)` during restore, because it should hold that 
`offset == highWatermark` and we should not enter the while-loop...

For K14442, we know that `offset == highWatermark - 1` (because we write the 
"incorrect" watermark into the checkpoint file), and thus you thus `poll()` is 
executed and hang because there is no data – the last "record" is just a commit 
marker.

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.2
>Reporter: Tamas
>Priority: Critical
> Fix For: 3.4.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Commented] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-16 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690055#comment-17690055
 ] 

Matthias J. Sax commented on KAFKA-14713:
-

Thanks for getting back. Glad it's resolved. I am not sure why though. 
Comparing 3.4 and 3.0 code, it seems they do the same thing.

In the end, if you have a valid checkpoint on restart, you should not even hit 
`poll(pollMsPlusRequestTimeout)` during restore, because it should hold that 
`offset == highWatermark` and we should not enter the while-loop...

For K14442, we know that `offset == highWatermark - 1` (because we write the 
"incorrect" watermark into the checkpoint file), and thus `poll()` is 
executed and hangs because there is no data – the last "record" is just a commit 
marker.

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.2
>Reporter: Tamas
>Priority: Critical
> Fix For: 3.4.0
>
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Updated] (KAFKA-14722) Make BooleanSerde public

2023-02-15 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14722:

Description: We introduce a "BooleanSerde" via 
[https://github.com/apache/kafka/pull/13249] as internal class. We could make 
it public.  (was: We introduce a "BooleanSerde via 
[https://github.com/apache/kafka/pull/13249] as internal class. We could make 
it public.)

> Make BooleanSerde public
> 
>
> Key: KAFKA-14722
> URL: https://issues.apache.org/jira/browse/KAFKA-14722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Spacrocket
>Priority: Minor
>  Labels: beginner, need-kip, newbie
>
> We introduce a "BooleanSerde" via 
> [https://github.com/apache/kafka/pull/13249] as internal class. We could make 
> it public.





[jira] [Commented] (KAFKA-14722) Make BooleanSerde public

2023-02-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689401#comment-17689401
 ] 

Matthias J. Sax commented on KAFKA-14722:
-

Thanks for your interest. We will need a KIP for this change (cf. 
[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]) 
– it should not be hard to write the KIP and get it approved.

Let us know if you have any questions.

> Make BooleanSerde public
> 
>
> Key: KAFKA-14722
> URL: https://issues.apache.org/jira/browse/KAFKA-14722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Assignee: Spacrocket
>Priority: Minor
>  Labels: beginner, need-kip, newbie
>
> We introduce a "BooleanSerde" via [https://github.com/apache/kafka/pull/13249] 
> as internal class. We could make it public.





[jira] [Comment Edited] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689356#comment-17689356
 ] 

Matthias J. Sax edited comment on KAFKA-14713 at 2/15/23 9:44 PM:
--

What version are you using? – Also, can you point me to the code where it 
actually waits/hangs (as you already looked into it, it would be quicker 
this way). – I am not sure yet if both issues are actually the same or 
not. (Maybe the "eos" config on the other ticket is a red herring.) But I guess 
we can dig into it a little bit.


was (Author: mjsax):
What version are you using? – Also, can you point me to the code where is 
actually waits/hangs (as you did already looked into it, it would be quicker 
this way). – I am not sure yet, if the issue is still the same though or not. 
But I guess we can dig into it a little bit.

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Tamas
>Priority: Critical
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Commented] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-15 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689356#comment-17689356
 ] 

Matthias J. Sax commented on KAFKA-14713:
-

What version are you using? – Also, can you point me to the code where it 
actually waits/hangs (as you already looked into it, it would be quicker 
this way). – I am not sure yet if the issue is still the same or not. 
But I guess we can dig into it a little bit.

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Tamas
>Priority: Critical
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.





[jira] [Created] (KAFKA-14722) Make BooleanSerde public

2023-02-15 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-14722:
---

 Summary: Make BooleanSerde public
 Key: KAFKA-14722
 URL: https://issues.apache.org/jira/browse/KAFKA-14722
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Matthias J. Sax


We introduce a "BooleanSerde" via [https://github.com/apache/kafka/pull/13249] 
as internal class. We could make it public.







[jira] [Updated] (KAFKA-14717) KafkaStreams can't get running if the rebalance happens before StreamThread gets shutdown completely

2023-02-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14717:

Component/s: streams

> KafkaStreams can't get running if the rebalance happens before StreamThread 
> gets shutdown completely
> ---
>
> Key: KAFKA-14717
> URL: https://issues.apache.org/jira/browse/KAFKA-14717
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 3.5.0
>
>
> I noticed this issue when tracing KAFKA-7109.
> StreamThread closes the consumer before changing state to DEAD. If the 
> partition rebalance happens quickly, the other StreamThreads can't change the 
> KafkaStreams state from REBALANCING to RUNNING since there is a 
> PENDING_SHUTDOWN StreamThread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14713) Kafka Streams global table startup takes too long

2023-02-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17688679#comment-17688679
 ] 

Matthias J. Sax commented on KAFKA-14713:
-

Sounds like a duplicate of https://issues.apache.org/jira/browse/KAFKA-14442? 
Can we close this ticket?

> Kafka Streams global table startup takes too long
> -
>
> Key: KAFKA-14713
> URL: https://issues.apache.org/jira/browse/KAFKA-14713
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Tamas
>Priority: Critical
>
> *Some context first*
> We have a spring based kafka streams application. This application is 
> listening to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we are joining the visitor stream 
> together with the apartment table. In our test environment, both topics 
> contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because the global table takes 30 seconds (default value for the 
> global request timeout) to accept that there are no messages in the apartment 
> topics, for each and every partition. If we send out the list of apartments 
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific for 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5 minute delay for this particular 
> client (more if we will have other use cases in the future where we will need 
> to use more global tables), which is far too much, considering that the 
> application would be able to otherwise start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.
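For reference, the first workaround above can be narrowed without touching the global request timeout for everything else: Kafka Streams supports per-client config overrides, and the `global.consumer.` prefix (the literal form of `StreamsConfig.globalConsumerPrefix(...)`) scopes a setting to the global consumer alone. A minimal sketch, assuming the 30-second timeout in question is the consumer's `request.timeout.ms`; the application id and bootstrap servers are placeholders:

```java
import java.util.Properties;

public class GlobalConsumerTimeoutSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders -- any Streams app needs these two at minimum.
        props.put("application.id", "apartment-visitor-join");
        props.put("bootstrap.servers", "localhost:9092");
        // Scope the shorter timeout to the global consumer only. The
        // "global.consumer." prefix is the literal form of
        // StreamsConfig.globalConsumerPrefix("request.timeout.ms"), so the
        // main consumer, producer, and admin client keep their defaults.
        props.put("global.consumer.request.timeout.ms", "5000");
        System.out.println(props.getProperty("global.consumer.request.timeout.ms"));
    }
}
```

Whether this actually removes the startup delay depends on which timeout GlobalStateManagerImpl consults internally, so treat it as a sketch to experiment with, not a verified fix.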



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-889 Versioned State Stores

2023-02-14 Thread Matthias J. Sax

Thanks Victoria. Makes sense to me.


On 2/13/23 5:55 PM, Victoria Xia wrote:

Hi everyone,

I have just pushed two minor amendments to KIP-889:

- Updated the versioned store specification to clarify that the *"history
  retention" parameter is also used as "grace period,"* which means that
  writes (including inserts, updates, and deletes) to the store will not be
  accepted if the associated timestamp is older than the store's grace
  period (i.e., history retention) relative to the current observed stream
  time.
    - Additional context: previously, the KIP was not explicit about
      if/when old writes would no longer be accepted. The reason for
      enforcing a strict grace period after which writes will no longer be
      accepted is that otherwise tombstones must be retained indefinitely --
      if the latest value for a key is a very old tombstone, we would not
      be able to expire it from the store, because if there's an even older
      non-null put to the store later, then without the tombstone the store
      would accept this write as the latest value for the key, even though
      it isn't. In the spirit of not adding more to this KIP, which has
      already been accepted, I do not propose to add additional interfaces
      to allow users to configure grace period separately from history
      retention at this time. Such options can be introduced in a future
      KIP in a backwards-compatible way.
- Added a *new method to TopologyTestDriver* for getting a versioned
  store: getVersionedKeyValueStore().
    - This new method is analogous to existing methods for other types of
      stores, and its previous omission from the KIP was an oversight.
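The tombstone argument can be made concrete with a toy model. This is not the real versioned-store implementation, just a sketch of the stated rule: writes older than observed stream time minus the grace period are rejected, so a retained tombstone cannot be overwritten by an even older out-of-order put. The class name, field names, and grace value are all illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the grace-period rule (NOT the real store): a write is
// rejected when its timestamp is older than observedStreamTime - GRACE.
public class GracePeriodSketch {
    static final long GRACE = 100L;
    long observedStreamTime = Long.MIN_VALUE;
    final Map<String, Long> latestTimestamp = new HashMap<>();
    final Map<String, String> latestValue = new HashMap<>();

    /** Returns true if the write was accepted. */
    boolean put(String key, String value, long ts) {
        observedStreamTime = Math.max(observedStreamTime, ts);
        if (ts < observedStreamTime - GRACE) {
            return false; // outside grace period: reject the stale write
        }
        if (!latestTimestamp.containsKey(key) || ts >= latestTimestamp.get(key)) {
            latestTimestamp.put(key, ts);
            latestValue.put(key, value); // value == null models a tombstone
        }
        return true;
    }

    public static void main(String[] args) {
        GracePeriodSketch store = new GracePeriodSketch();
        store.put("k", null, 1000L);     // tombstone for "k" at t=1000
        store.put("other", "x", 2000L);  // stream time advances to 2000
        // An even older non-null put arrives out of order. Without the
        // grace-period rejection, once the tombstone is expired this write
        // would wrongly become the latest value for "k".
        boolean accepted = store.put("k", "stale", 500L);
        System.out.println(accepted);                   // false
        System.out.println(store.latestValue.get("k")); // null (tombstone wins)
    }
}
```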

If there are no concerns / objections, then perhaps these updates are minor
enough that we can proceed without re-voting.

Happy to discuss,
Victoria

On Wed, Dec 21, 2022 at 8:22 AM Victoria Xia 
wrote:


Hi everyone,

We have 3 binding and 1 non-binding vote in favor of this KIP (and no
objections) so KIP-889 is now accepted.

Thanks for voting, and for your excellent comments in the KIP discussion
thread!

Happy holidays,
Victoria

On Tue, Dec 20, 2022 at 12:24 PM Sagar  wrote:


Hi Victoria,

+1 (non-binding).

Thanks!
Sagar.

On Tue, Dec 20, 2022 at 1:39 PM Bruno Cadonna  wrote:


Hi Victoria,

Thanks for the KIP!

+1 (binding)

Best,
Bruno

On 19.12.22 20:03, Matthias J. Sax wrote:

+1 (binding)

On 12/15/22 1:27 PM, John Roesler wrote:

Thanks for the thorough KIP, Victoria!

I'm +1 (binding)

-John

On 2022/12/15 19:56:21 Victoria Xia wrote:

Hi all,

I'd like to start a vote on KIP-889 for introducing versioned key-value
state stores to Kafka Streams:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores

The discussion thread has been open for a few weeks now and has
converged among the current participants.

Thanks,
Victoria











[jira] [Updated] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-02-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14660:

Fix Version/s: 3.4.1

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 3.5.0, 3.4.1
>
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-02-06 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17684958#comment-17684958
 ] 

Matthias J. Sax commented on KAFKA-14660:
-

Correct. 3.4.0 is already voted and should be released soon. I plan to 
cherry-pick for 3.4.1 after 3.4.0 is out.

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 3.5.0
>
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-02-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683951#comment-17683951
 ] 

Matthias J. Sax edited comment on KAFKA-14660 at 2/3/23 4:52 PM:
-

Not sure why the PR was not auto-linked... Fixed.

[https://github.com/apache/kafka/pull/13175]

Thanks for your follow up. Can we close this ticket? Let me know if there is 
anything else I can do.


was (Author: mjsax):
Not sure why the PR was not auto-linked... Fixed.

[https://github.com/apache/kafka/pull/13175]

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 3.5.0
>
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-02-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683951#comment-17683951
 ] 

Matthias J. Sax commented on KAFKA-14660:
-

Not sure why the PR was not auto-linked... Fixed.

[https://github.com/apache/kafka/pull/13175]

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 3.5.0
>
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-890: Transactions Server Side Defense

2023-02-02 Thread Matthias J. Sax

Thanks for the KIP!

+1 (binding)


On 2/2/23 4:18 PM, Artem Livshits wrote:

(non-binding) +1.  Looking forward to the implementation and fixing the
issues that we've got.

-Artem

On Mon, Jan 23, 2023 at 2:25 PM Guozhang Wang 
wrote:


Thanks Justine, I have no further comments on the KIP. +1.

On Tue, Jan 17, 2023 at 10:34 AM Jason Gustafson
 wrote:


+1. Thanks Justine!

-Jason

On Tue, Jan 10, 2023 at 3:46 PM Colt McNealy 

wrote:



(non-binding) +1. Thank you for the KIP, Justine! I've read it; it makes
sense to me and I am excited for the implementation.

Colt McNealy
*Founder, LittleHorse.io*


On Tue, Jan 10, 2023 at 10:46 AM Justine Olshan
 wrote:


Hi everyone,

I would like to start a vote on KIP-890 which aims to prevent some of the
common causes of hanging transactions and make other general improvements
to transactions in Kafka.






https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense


Please take a look if you haven't already and vote!

Justine









Re: Coralogix Logo on Powered By Page

2023-02-01 Thread Matthias J. Sax

Thanks for reaching out.

Can you open a PR against https://github.com/apache/kafka-site updating 
`powered-by.html`?



-Matthias

On 2/1/23 1:13 AM, Tali Soroker wrote:

Hi,
I am writing on behalf of Coralogix to request adding us to the Powered 
By page on the Apache Kafka website.


I am attaching our logo and here is a description of our usage for your 
consideration:


Coralogix uses Kafka Streams to power our observability platform
that ingests and analyzes up to tens of billions of messages per
day. Using Kafka Streams, we are able to decouple analytics from
indexing and storage to provide our users with the best performance,
scalability, and cost.


Best,
Tali


--
Tali Soroker
Product Marketing
+972 058-681-1707
coralogix.com



[jira] [Commented] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-01-31 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682648#comment-17682648
 ] 

Matthias J. Sax commented on KAFKA-14660:
-

The original PR did not make sense: if totalCapacity were really zero, there 
is a bug, and just setting it to 1 does not sound right. I already merged a 
new PR that raises an exception for this case, and thus avoids the 
divide-by-zero. This should resolve the issue.
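The merged fix, as described, replaces the silent division with an explicit failure. A standalone sketch of that pattern (the method and class names here are hypothetical, not the actual Kafka Streams code): if the divisor would be zero, raise an exception instead of substituting a value:

```java
public class CapacityGuardSketch {
    // Hypothetical stand-in for the real computation in the task assignor.
    static double loadRatio(long assignedBytes, long totalCapacity) {
        if (totalCapacity <= 0) {
            // A zero capacity here indicates a bug upstream, so fail
            // loudly rather than divide (or quietly "fix" the divisor).
            throw new IllegalStateException(
                "totalCapacity must be positive: " + totalCapacity);
        }
        return (double) assignedBytes / totalCapacity;
    }

    public static void main(String[] args) {
        System.out.println(loadRatio(50, 100)); // 0.5
        try {
            loadRatio(50, 0);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```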

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 3.5.0
>
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14650) IQv2 can throw ConcurrentModificationException when accessing Tasks

2023-01-30 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14650:
---

Assignee: Guozhang Wang

> IQv2 can throw ConcurrentModificationException when accessing Tasks 
> 
>
> Key: KAFKA-14650
> URL: https://issues.apache.org/jira/browse/KAFKA-14650
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.4.0
>Reporter: A. Sophie Blee-Goldman
>Assignee: Guozhang Wang
>Priority: Major
>
> From failure in *[PositionRestartIntegrationTest.verifyStore[cache=false, 
> log=true, supplier=IN_MEMORY_WINDOW, 
> kind=PAPI]|https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/63/testReport/junit/org.apache.kafka.streams.integration/PositionRestartIntegrationTest/Build___JDK_11_and_Scala_2_13___verifyStore_cache_false__log_true__supplier_IN_MEMORY_WINDOW__kind_PAPI_/]*
> java.util.ConcurrentModificationException
>   at 
> java.base/java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1208)
>   at java.base/java.util.TreeMap$EntryIterator.next(TreeMap.java:1244)
>   at java.base/java.util.TreeMap$EntryIterator.next(TreeMap.java:1239)
>   at java.base/java.util.HashMap.putMapEntries(HashMap.java:508)
>   at java.base/java.util.HashMap.putAll(HashMap.java:781)
>   at 
> org.apache.kafka.streams.processor.internals.Tasks.allTasksPerId(Tasks.java:361)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.allTasks(TaskManager.java:1537)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.allTasks(StreamThread.java:1278)
>   at org.apache.kafka.streams.KafkaStreams.query(KafkaStreams.java:1921)
>   at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.iqv2WaitForResult(IntegrationTestUtils.java:168)
>   at 
> org.apache.kafka.streams.integration.PositionRestartIntegrationTest.shouldReachExpectedPosition(PositionRestartIntegrationTest.java:438)
>   at 
> org.apache.kafka.streams.integration.PositionRestartIntegrationTest.verifyStore(PositionRestartIntegrationTest.java:423)
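The failure mode in the stack trace above, fail-fast iteration over a `TreeMap` that is structurally modified mid-copy, can be reproduced deterministically in isolation. In the real bug the modification comes from a stream thread mutating the tasks map while a query thread copies it via `HashMap.putAll`; this single-threaded sketch triggers the same `ConcurrentModificationException` by mutating during iteration:

```java
import java.util.Map;
import java.util.TreeMap;

public class CmeSketch {
    public static void main(String[] args) {
        Map<String, String> tasks = new TreeMap<>();
        tasks.put("0_0", "running");
        tasks.put("0_1", "running");
        boolean caught = false;
        try {
            // TreeMap's entry iterator (the one HashMap.putAll uses
            // internally) is fail-fast: a structural modification during
            // iteration makes the next next() call throw.
            for (Map.Entry<String, String> e : tasks.entrySet()) {
                tasks.put("1_" + e.getKey(), "restoring"); // mutate mid-iteration
            }
        } catch (java.util.ConcurrentModificationException e) {
            caught = true;
        }
        System.out.println(caught); // true
    }
}
```

The usual remedies are to snapshot the map under a lock before copying, or to hand the query thread an immutable copy; which one the eventual fix uses is up to the assignee.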



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-30 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682401#comment-17682401
 ] 

Matthias J. Sax commented on KAFKA-14646:
-

I was thinking about this issue, and I think the only way to upgrade is to 
"drain" your topology. Ie, you would need to stop your upstream producers and 
not send any new input data. Afterwards, you let KS finish processing of all 
input data (including processing of all data from internal topics, ie, 
repartition, fk-subscription, and fk-response topics), to really "drain" the 
topology completely. Next, do a two round rolling bounce using `upgrade.from`, 
and finally resume your upstream producers.

Would you be willing to try this out (and report back)?
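The two-round rolling bounce mentioned above hinges on the `upgrade.from` Streams config. A minimal sketch of the two rounds as plain `Properties`, assuming "3.2" is an accepted `upgrade.from` value in the patched releases:

```java
import java.util.Properties;

public class UpgradeFromSketch {
    public static void main(String[] args) {
        // Round 1 of the rolling bounce: run the new binaries but keep the
        // old on-the-wire behavior by setting upgrade.from.
        Properties round1 = new Properties();
        round1.put("upgrade.from", "3.2"); // literal key of StreamsConfig.UPGRADE_FROM_CONFIG
        // Round 2: drop the setting and bounce again, so all instances
        // switch to the new SubscriptionWrapper version together.
        Properties round2 = new Properties();
        System.out.println(round1.getProperty("upgrade.from")); // 3.2
        System.out.println(round2.getProperty("upgrade.from")); // null
    }
}
```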

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-01-30 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682291#comment-17682291
 ] 

Matthias J. Sax commented on KAFKA-14660:
-

Hey Andy – thanks for the ticket. We would have accepted a PR, too. :D

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14660) Divide by zero security vulnerability (sonatype-2019-0422)

2023-01-30 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14660:
---

Assignee: Matthias J. Sax

> Divide by zero security vulnerability (sonatype-2019-0422)
> --
>
> Key: KAFKA-14660
> URL: https://issues.apache.org/jira/browse/KAFKA-14660
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
>Reporter: Andy Coates
>    Assignee: Matthias J. Sax
>Priority: Minor
>
> Looks like Sonatype has picked up a "Divide by Zero" issue reported in a PR 
> and, because the PR was never merged, is now reporting it as a security 
> vulnerability in the latest Kafka Streams library.
>  
> See:
>  * [Vulnerability: 
> sonatype-2019-0422]([https://ossindex.sonatype.org/vulnerability/sonatype-2019-0422?component-type=maven=org.apache.kafka%2Fkafka-streams_source=ossindex-client_medium=integration_content=1.7.0)]
>  * [Original PR]([https://github.com/apache/kafka/pull/7414])
>  
> While it looks from the comments made by [~mjsax] and [~bbejeck] that the 
> divide-by-zero is not really an issue, the fact that it's now being reported 
> as a vulnerability is a problem, especially with regulators.
> PITA, but we should consider either getting this vulnerability removed 
> (Google wasn't very helpful in providing info on how to do this), or fixed 
> (Again, not sure how to tag the fix as fixing this issue).  One option may 
> just be to reopen the PR and merge (and then fix forward by switching it to 
> throw an exception).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14646.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13769) KTable FK join can miss records if an upstream non-key-changing operation changes key serializer

2023-01-25 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680858#comment-17680858
 ] 

Matthias J. Sax commented on KAFKA-13769:
-

I just updated the fix version from 3.0.0 to 3.3.3 and 3.4.0. Cf. 
https://issues.apache.org/jira/browse/KAFKA-14646 for details.

> KTable FK join can miss records if an upstream non-key-changing operation 
> changes key serializer
> 
>
> Key: KAFKA-13769
> URL: https://issues.apache.org/jira/browse/KAFKA-13769
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Alex Sorokoumov
>Assignee: Alex Sorokoumov
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Consider a topology, where the source KTable is followed by a 
> {{transformValues}} operation [that changes the key 
> schema|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L452]
>  followed by a foreign key join. The FK join might miss records in such a 
> topology because they might be sent to the wrong partitions.
> As {{transformValues}} does not change the key itself, repartition won't 
> happen after this operation. However, the KTable instance that calls 
> {{doJoinOnForeignKey}} uses the new serde coming from {{transformValues}} 
> rather than the original. As a result, all nodes in the FK join topology 
> except for 
> [SubscriptionResolverJoinProcessorSupplier|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1225-L1232]
>  use the "new" serde. {{SubscriptionResolverJoinProcessorSupplier}} uses the 
> old one because it uses 
> [valueGetterSupplier|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1225]
>  that in turn will retrieve the records from the topic.
> A different serializer might serialize keys to different series of bytes, 
> which will lead to sending them to the wrong partitions. To run into that 
> issue, multiple things must happen:
> * a topic should have more than one partition,
> * KTable's serializer should be modified via a non-key-changing operation,
> * the new serializer should serialize keys differently
> In practice, it might happen if the key type is a {{Struct}} because it 
> serializes to a JSON string {{columnName -> value}}. If the 
> {{transformValues}} operation changes column names to avoid name clashes with 
> the joining table, such join can lead to incorrect behavior.
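To make the failure mode concrete, here is a self-contained sketch of why two serializers for the same logical key can route records to different partitions. This is illustrative only: Kafka's default partitioner actually uses murmur2 over the serialized key bytes, while the hash, class, and key names below are stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FkJoinPartitioning {
    // Stand-in for partition = hash(serializedKeyBytes) % numPartitions.
    // Kafka actually uses murmur2; any hash over the bytes shows the point.
    public static int partitionFor(byte[] keyBytes, int numPartitions) {
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    // "Old" serde: plain string bytes.
    public static byte[] oldSerialize(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical "new" serde after transformValues, e.g. a JSON-ish
    // wrapper with a renamed column -- same logical key, different bytes.
    public static byte[] newSerialize(String key) {
        return ("{\"renamedCol\":\"" + key + "\"}").getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "k1";
        // Different bytes generally hash to different partitions, so the
        // FK-join subscription written with the new serde misses the state
        // that was partitioned under the old one.
        System.out.println("old partition: " + partitionFor(oldSerialize(key), 10));
        System.out.println("new partition: " + partitionFor(newSerialize(key), 10));
    }
}
```

The two serialized forms are always different byte sequences here, so with more than one partition they will usually (though not always, since hashes can collide) land on different partitions.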





[jira] [Updated] (KAFKA-13769) KTable FK join can miss records if an upstream non-key-changing operation changes key serializer

2023-01-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-13769:

Fix Version/s: 3.4.0
   3.3.3
   (was: 3.3.0)

> KTable FK join can miss records if an upstream non-key-changing operation 
> changes key serializer
> 
>
> Key: KAFKA-13769
> URL: https://issues.apache.org/jira/browse/KAFKA-13769
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Alex Sorokoumov
>Assignee: Alex Sorokoumov
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Consider a topology, where the source KTable is followed by a 
> {{transformValues}} operation [that changes the key 
> schema|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L452]
>  followed by a foreign key join. The FK join might miss records in such a 
> topology because they might be sent to the wrong partitions.
> As {{transformValues}} does not change the key itself, repartition won't 
> happen after this operation. However, the KTable instance that calls 
> {{doJoinOnForeignKey}} uses the new serde coming from {{transformValues}} 
> rather than the original. As a result, all nodes in the FK join topology 
> except for 
> [SubscriptionResolverJoinProcessorSupplier|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1225-L1232]
>  use the "new" serde. {{SubscriptionResolverJoinProcessorSupplier}} uses the 
> old one because it uses 
> [valueGetterSupplier|https://github.com/apache/kafka/blob/db724f23f38cdb6c668a10681ea2a03bb11611ad/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1225]
>  that in turn will retrieve the records from the topic.
> A different serializer might serialize keys to different series of bytes, 
> which will lead to sending them to the wrong partitions. To run into that 
> issue, multiple things must happen:
> * a topic should have more than one partition,
> * KTable's serializer should be modified via a non-key-changing operation,
> * the new serializer should serialize keys differently
> In practice, it might happen if the key type is a {{Struct}} because it 
> serializes to a JSON string {{columnName -> value}}. If the 
> {{transformValues}} operation changes column names to avoid name clashes with 
> the joining table, such join can lead to incorrect behavior.





[jira] [Updated] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14646:

Affects Version/s: 3.3.1
   3.3.0

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}
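For illustration only, here is a minimal sketch of the kind of version gate the report describes: a reader that rejects any SubscriptionWrapper whose version is newer than the highest it understands. The class name, exception type, and version numbers are stand-ins, not the actual Kafka Streams internals; the point is that if one call site's "latest supported" constant is bumped and the other's is not, records written in the new format trip the stale check.

```java
class SubscriptionVersionCheck {
    // Throws on any wrapper version newer than what this reader supports,
    // mirroring the UnsupportedVersionException in the stack trace above.
    public static void checkVersion(byte wrapperVersion, byte latestSupported) {
        if (wrapperVersion > latestSupported) {
            throw new IllegalStateException(
                "SubscriptionWrapper is of an incompatible version.");
        }
    }

    public static boolean accepts(byte wrapperVersion, byte latestSupported) {
        try {
            checkVersion(wrapperVersion, latestSupported);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An updated check site (supports v1) accepts new-format records:
        System.out.println(accepts((byte) 1, (byte) 1));
        // A check site still pinned to v0 rejects the same records:
        System.out.println(accepts((byte) 1, (byte) 0));
    }
}
```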





[jira] [Resolved] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14646.
-
Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
> Fix For: 3.4.0, 3.3.3
>
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}





[jira] [Commented] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-25 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680857#comment-17680857
 ] 

Matthias J. Sax commented on KAFKA-14646:
-

Ok. Talked to Alex and he quickly figured it out – it's quite embarrassing... 
The last two PRs for https://issues.apache.org/jira/browse/KAFKA-13769 did not 
land in the 3.3 release branch (it seems we forgot to cherry-pick after merging 
to `trunk`). So we indeed have a bug in 3.3.2 (and 3.3.1 and 3.3.0), and it's 
only fixed in 3.4.0. Bottom line: KAFKA-13769 does not really fix the issue in 
3.3.2 or earlier, but only in 3.4.0, which should be available soon.

I just cherry-picked both commits to the 3.3 branch, so if there is a 3.3.3 
release, the fix will be picked up.

Thanks for reporting this! Will close this ticket now. Feel free to follow up 
in the comments.

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}





Re: [VOTE] KIP-869: Improve Streams State Restoration Visibility

2023-01-25 Thread Matthias J. Sax

Thanks!

+1 (binding)

-Matthias

On 1/24/23 1:17 PM, Guozhang Wang wrote:

Hi Matthias:

re "paused" -> "suspended": I got your point now, thanks. Just to
clarify the two functions are a bit different: "paused" tasks are
because of the topology being paused, i.e. from KIP-834; whereas
"suspended" tasks are restoring tasks that are removed before
they complete due to a follow-up rebalance, and this is to distinguish
from "onRestoreEnd", as described in KAFKA-10575. A suspended task is
no longer owned by the thread and hence there's no need to measure the
number of such tasks.

re: "restore-ratio": that's a good point. I like it to function in the
same way as the "records-rate" metrics. Will update the wiki.

re: making "restore-remaining-records-total" at INFO level: sounds
good to me too. I will also update the metric name a bit to be more
specific.



On Thu, Jan 19, 2023 at 2:35 PM Guozhang Wang
 wrote:


Hello Matthias,

Thanks for the feedback. I was on vacation for a while. Pardon for the
late replies. Please see them inline below

On Thu, Dec 1, 2022 at 11:23 PM Matthias J. Sax  wrote:


Seems I am late to the party... Great KIP. Couple of questions from my side:

(1) What is the purpose of `standby-updating-tasks`? It seems to be the
same as the number of assigned standby task? Not sure how useful it
would be?


In general, yes, it is the number of assigned standby tasks --- there
will be transient periods when the assigned standby tasks are not yet
being updated, but those would not last long --- but we did not have a
direct gauge to expose this before, and users have to infer it from
other indirect metrics.




(2) `active-paused-tasks` / `standby-paused-tasks` -- what does "paused"
exactly mean? There was a discussion about renaming the callback method
from pause to suspended. So should this be called `suspended`, too? And
if yes, how is it useful for users?


Pausing here refers to "KIP-834: Pause / Resume KafkaStreams
Topologies" 
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832).
When a topology is paused, all its tasks including standbys will be
paused too.

I'm not aware of a discussion to rename the call name to "suspend" for
KIP-834. Could you point me to the reference?




(3) `restore-ratio`: the description says


The fraction of time the thread spent on restoring active or standby tasks


I find the term "restoring" does only apply to active tasks, but not to
standbys. Can we reword this?


Yeah, I have been discussing this with others in the community a bit as
well, but so far I have not been convinced of a better name.
Some other alternatives that were discussed but did not win everyone's
love are "restore-or-update-ratio", "process-ratio" (which, for the
restore thread, means restoring or updating), and "io-ratio".

The only one so far that I feel is probably better, is
"state-update-ratio". If folks feel this one is better than
"restore-ratio" I'm happy to update.
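As a hedged sketch of what such a ratio metric measures, under the assumption that it works like the consumer's io-ratio (the fraction of a measurement window spent in restore/update calls); class and method names here are illustrative, not Kafka code:

```java
class RestoreRatio {
    // Fraction of the measurement window spent restoring/updating state:
    // restoreTimeNanos accumulated inside restore calls, windowNanos is
    // the elapsed wall-clock length of the window.
    public static double ratio(long restoreTimeNanos, long windowNanos) {
        if (windowNanos <= 0) {
            return 0.0; // avoid division by zero for an empty window
        }
        return (double) restoreTimeNanos / (double) windowNanos;
    }

    public static void main(String[] args) {
        // 250 ms of restore work inside a 1 s window -> 0.25
        System.out.println(ratio(250_000_000L, 1_000_000_000L)); // prints 0.25
    }
}
```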



(4) `restore-call-rate`: not sure what you exactly mean by "restore calls"?


This is similar to the "io-calls-rate" in the selector classes, i.e.
the number of "restore" function calls made. It's arguably a very
low-level metric, but I included it since it could be useful in some
debugging scenarios.



(5) `restore-remaining-records-total` -- why is this a task metric?
Seems we could roll it up into a thread metric that we report at INFO
level (we could still have per-task DEBUG level metric for it in addition).


The rationale behind it is the general principle in metrics design
that "Kafka would provide the lowest necessary metrics levels, and
users can do the roll-ups however they want".
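The roll-up left to users can be sketched as follows; the task IDs and record counts are made up, and the map stands in for whatever per-task gauge values a metrics reporter exposes:

```java
import java.util.Map;

class MetricRollup {
    // Sum per-task restore-remaining-records gauges into a thread-level
    // total -- the aggregation users do themselves under the "provide the
    // lowest necessary level, let users roll up" principle.
    public static long threadTotal(Map<String, Long> perTaskRemaining) {
        return perTaskRemaining.values().stream()
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Long> perTask = Map.of("0_0", 1200L, "0_1", 300L, "1_0", 0L);
        System.out.println(threadTotal(perTask)); // prints 1500
    }
}
```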



(6) What about "warmup tasks"? Internally, we treat them as standbys,
but it seems it's hard for users to reason about it in the scale-out
warm-up case. Would it be helpful (and possible) to report "warmup
progress" explicitly?


At the restore thread level, we cannot differentiate standby tasks
from warmup tasks, since the latter are created exactly like the
former. But I do agree this is a visibility issue worth addressing;
I think another KIP would be needed to first consider distinguishing
the two at the class level.



-Matthias


On 11/1/22 2:44 AM, Lucas Brutschy wrote:

We need this!

+ 1 non binding

Cheers,
Lucas

On Tue, Nov 1, 2022 at 10:01 AM Bruno Cadonna  wrote:


Guozhang,

Thanks for the KIP!

+1 (binding)

Best,
Bruno

On 25.10.22 22:07, Walker Carlson wrote:

+1 non binding

Thanks for the kip!

On Thu, Oct 20, 2022 at 10:25 PM John Roesler  wrote:


Thanks for the KIP, Guozhang!

I'm +1 (binding)

-John

On Wed, Oct 12, 2022, at 16:36, Nick Telford wrote:

Can't wait!
+1 (non-binding)

On Wed, 12 Oct 2022, 18:02 Guozhang Wang, 
wr

Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-25 Thread Matthias J. Sax

would it build an offset map with just the latest timestamp for a key?


Cannot remember the details without reading the KIP, but yes, something 
like this (I believe it actually needs to track both offset and 
timestamp per key).



I wonder if ordering assumptions are baked in there, why not use offset-based 
compaction.


The use case is a compacted topic that does contain out-of-order data. 
If you write key k1=v1 at timestamp 5, offset 100, and later k1=v0 at 
timestamp 3, offset 200, we want to clean up v0, despite its higher 
offset, because it's out-of-order based on time, but keep v1, which is 
the actual latest version of k1.
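The example above can be sketched as code. This is a hedged illustration of timestamp-based compaction, not the KIP-280 implementation; the record type and tie-breaking rule are assumptions for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

class TimestampCompaction {
    record Rec(String key, String value, long timestamp, long offset) {}

    // Keep, per key, the record with the highest timestamp (ties broken
    // by offset) -- rather than simply keeping the highest offset, as
    // plain offset-based compaction would.
    public static Map<String, Rec> compact(Rec[] log) {
        Map<String, Rec> latest = new HashMap<>();
        for (Rec r : log) {
            Rec cur = latest.get(r.key());
            if (cur == null
                    || r.timestamp() > cur.timestamp()
                    || (r.timestamp() == cur.timestamp() && r.offset() > cur.offset())) {
                latest.put(r.key(), r);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // k1=v1 at ts=5/offset=100, then out-of-order k1=v0 at ts=3/offset=200.
        // Offset-based compaction would keep v0; timestamp-based keeps v1.
        Rec[] log = {
            new Rec("k1", "v1", 5, 100),
            new Rec("k1", "v0", 3, 200),
        };
        System.out.println(compact(log).get("k1").value()); // prints v1
    }
}
```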




I was also not aware of this "guarantee" with regards to broker side time.


As already said: I am not sure if it's a public contract, but based on 
my experience, people might rely on it as an "implicit contract". -- Maybe 
somebody else knows if it's public or not, and if it would be ok to 
"break" it.



Let me know if you have any concerns here.


My understanding is: While we cannot make an offset-order guarantee for 
interleaved writes of different producers, if the topic is configured 
with "append_time", we "guarantee" (cf. my comment above) timestamp 
order... If that's the case, it would be an issue if we break this 
"guarantee".


I am not sure when the broker sets the timestamp for the "append_time" 
config. If we do it before putting the request into purgatory, we have a 
problem. However, if we set the timestamp when we actually process the 
request and do the actual append, it seems there is no issue, as the 
request that was waiting in purgatory gets the "newest" timestamp and 
thus cannot introduce out-of-order data.



-Matthias


On 1/24/23 10:44 AM, Justine Olshan wrote:

Hey Matthias,

I have actually never heard of KIP-280 so thanks for bringing it up. That
seems interesting. I wonder how it would work though -- would it build an
offset map with just the latest timestamp for a key? I wonder if ordering
assumptions are baked in there, why not use offset-based compaction.

I was also not aware of this "guarantee" with regards to broker side time.
I think that we can do in-order handling for a given producer, but not
across all producers. However, we can't guarantee that anyway.

Let me know if you have any concerns here.

Thanks,
Justine

On Mon, Jan 23, 2023 at 6:33 PM Matthias J. Sax  wrote:


Just a side note about Guozhang comments about timestamps.

If the producer sets the timestamp, putting the record into purgatory
seems not to be an issue (as already said: for this case we don't
guarantee timestamp order between writes of different producers anyway).
However, if the broker sets the timestamp, the expectation is that there
is no out-of-order data in the partition ever; if we would introduce
out-of-order data for this case (for interleaved writes of different
producers), it seems we would violate the current contract? (To be fair:
I don't know if that's an official contract, but I assume people rely on
this behavior -- and it is "advertised" in many public talks...)

About compaction: there is actually KIP-280 that adds timestamp based
compaction what is a very useful feature for Kafka Streams with regard
to out-of-order data handling. So the impact if we introduce
out-of-order data could be larger scoped.


-Matthias


On 1/20/23 4:48 PM, Justine Olshan wrote:

Hey Artem,

I see there is a check for transactional producers. I'm wondering if we
don't handle the epoch overflow case. I'm also not sure it will be a huge
issue to extend to transactional producers, but maybe I'm missing

something.


As for the recovery path -- I think Guozhang's point was if we have a bad
client that repeatedly tries to produce without adding to the transaction
we would do the following:
a) if not fatal, we just fail the produce request over and over
b) if fatal, we fence the producer

Here with B, the issue with the client would be made clear more quickly. I
suppose there are some intermediate cases where the issue only occurs
sometimes, but I wonder if we should consider how to recover with clients
who don't behave as expected anyway.

I think there is a place for the abortable error that we are adding -- just
abort and try again. But I think there are also some cases where trying to
recover overcomplicates some logic. Especially if we are considering older
clients -- there I'm not sure if there's a ton we can do besides fail the
batch or fence the producer. With newer clients, we can consider more
options for what can just be recovered after aborting. But epochs might be
a hard one unless we also want to reset the producer ID.

Thanks,
Justine



On Fri, Jan 20, 2023 at 3:59 PM Artem Livshits
 wrote:


   besides the poorly written client case


A poorly written client could create a lot of grief to people who run

Kafka

brokers :-), so when deciding to make an error fatal I wou

[jira] [Commented] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-25 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680840#comment-17680840
 ] 

Matthias J. Sax commented on KAFKA-14646:
-

Could be – let me sync with Alex, who worked on the ticket I linked above – I 
need to think about it a little bit.

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 15 topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}





Re: [VOTE] KIP-869: Improve Streams State Restoration Visibility

2023-01-23 Thread Matthias J. Sax

Thanks Guozhang. Couple of clarifications and follow up questions.



I'm not aware of a discussion to rename the call name to "suspend" for
KIP-834. Could you point me to the reference?


My comment was not about KIP-834, but about this KIP. You originally 
proposed to call the new callback `onRestorePause`, but to avoid 
confusion it was improved and renamed to `onRestoreSuspended`.




The only one so far that I feel is probably better, is
"state-update-ratio". If folks feel this one is better than
"restore-ratio" I'm happy to update.


Could we actually report two metrics, one for the restore phase 
(restore-ratio), and one for steady state ([standby-]update-ratio)?


I could live with `state-update-ratio` if we want to have a single 
metric for both, but splitting them sounds useful to me.




(4) `restore-call-rate`


Maybe we can clarify in the description a little bit. I agree it's very 
low level but if you think it could be useful to debugging, I have no 
objection.




The rationale behind it is the general principle in metrics design
that "Kafka would provide the lowest necessary metrics levels, and
users can do the roll-ups however they want".


That's fair, but it seems to be a rather important metric, and having it 
only at DEBUG level seems not ideal. Could we make it INFO level, even 
if it's a task-level metric (i.e., apply an exception to the rule)?




-Matthias



On 1/19/23 2:35 PM, Guozhang Wang wrote:

Hello Matthias,

Thanks for the feedback. I was on vacation for a while. Pardon for the
late replies. Please see them inline below

On Thu, Dec 1, 2022 at 11:23 PM Matthias J. Sax  wrote:


Seems I am late to the party... Great KIP. Couple of questions from my side:

(1) What is the purpose of `standby-updating-tasks`? It seems to be the
same as the number of assigned standby task? Not sure how useful it
would be?


In general, yes, it is the number of assigned standby tasks --- there
will be transient periods when the assigned standby tasks are not yet
being updated, but those would not last long --- but we did not have a
direct gauge to expose this before, and users have to infer it from
other indirect metrics.




(2) `active-paused-tasks` / `standby-paused-tasks` -- what does "paused"
exactly mean? There was a discussion about renaming the callback method
from pause to suspended. So should this be called `suspended`, too? And
if yes, how is it useful for users?


Pausing here refers to "KIP-834: Pause / Resume KafkaStreams
Topologies" 
(https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832).
When a topology is paused, all its tasks including standbys will be
paused too.

I'm not aware of a discussion to rename the call name to "suspend" for
KIP-834. Could you point me to the reference?




(3) `restore-ratio`: the description says


The fraction of time the thread spent on restoring active or standby tasks


I find the term "restoring" does only apply to active tasks, but not to
standbys. Can we reword this?


Yeah, I have been discussing this with others in the community a bit as
well, but so far I have not been convinced of a better name.
Some other alternatives that were discussed, but did not win everyone's
love, are "restore-or-update-ratio", "process-ratio" (for the restore
thread that means restoring or updating), and "io-ratio".

The only one so far that I feel is probably better, is
"state-update-ratio". If folks feel this one is better than
"restore-ratio" I'm happy to update.
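Whatever the final name, the idea behind a "...-ratio" metric is simply busy
time divided by elapsed time over a measurement window. A minimal,
self-contained sketch of that computation (class and method names here are
illustrative, not the actual Streams implementation):

```java
// Sketch of a "fraction of time spent restoring/updating" ratio metric.
// All names are illustrative assumptions, not the real Streams sensors.
public class UpdateRatio {
    private long busyNanos = 0L;
    private final long windowStartNanos;

    public UpdateRatio(long nowNanos) {
        this.windowStartNanos = nowNanos;
    }

    // Record one unit of restore/update work, bounded by start/end timestamps.
    public void recordWork(long startNanos, long endNanos) {
        busyNanos += endNanos - startNanos;
    }

    // Ratio of time spent working vs. total elapsed time since window start.
    public double ratio(long nowNanos) {
        long elapsed = nowNanos - windowStartNanos;
        return elapsed == 0L ? 0.0 : (double) busyNanos / elapsed;
    }

    public static void main(String[] args) {
        UpdateRatio r = new UpdateRatio(0L);
        r.recordWork(0L, 250_000L);              // 250µs spent restoring
        System.out.println(r.ratio(1_000_000L)); // prints 0.25
    }
}
```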



(4) `restore-call-rate`: not sure what you exactly mean by "restore calls"?


This is similar to the "io-calls-rate" in the selector classes, i.e.
the number of "restore" function calls made. It's arguably a very
low-level metric, but I included it since it could be useful in some
debugging scenarios.



(5) `restore-remaining-records-total` -- why is this a task metric?
Seems we could roll it up into a thread metric that we report at INFO
level (we could still have per-task DEBUG level metric for it in addition).


The rationale behind it is the general principle in metrics design
that "Kafka would provide the lowest necessary metrics levels, and
users can do the roll-ups however they want".
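To illustrate the roll-up principle: given the lowest-level (task) values,
a user can aggregate them to the thread level themselves. A hypothetical
sketch (task IDs and numbers are made up for illustration):

```java
import java.util.Map;

public class MetricRollup {
    // Roll per-task values of a metric such as "restore-remaining-records-total"
    // up to the thread level by summing across the thread's tasks.
    public static long threadLevelTotal(Map<String, Long> perTaskRemaining) {
        return perTaskRemaining.values().stream()
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical per-task remaining-record counts, keyed by task ID.
        Map<String, Long> perTask = Map.of("0_0", 1200L, "0_1", 300L, "1_0", 0L);
        System.out.println(threadLevelTotal(perTask)); // prints 1500
    }
}
```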



(6) What about "warmup tasks"? Internally, we treat them as standbys,
but it seems it's hard for users to reason about it in the scale-out
warm-up case. Would it be helpful (and possible) to report "warmup
progress" explicitly?


At the restore thread level, we cannot differentiate standby tasks
from warmup tasks, since the latter are created exactly like the
former. But I do agree this is a visibility issue worth
addressing; I think another KIP would be needed to first consider
distinguishing these two at the class level.



-Matthias


On 11/1/22 2:44 AM, Lucas Brutschy wrote:

We need this!

+

Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-23 Thread Matthias J. Sax
rkerRequests with the bumped epoch."

Hmm, the epoch is associated with the current txn right? So, it seems 
weird to write a commit message with a bumped epoch. Should we only bump 
up the epoch in EndTxnResponse and rename the field to sth like 
nextProducerEpoch?

Thanks,

Jun


On Mon, Dec 12, 2022 at 8:54 PM Matthias J. Sax <mj...@apache.org> wrote:


Thanks for the background.

20/30: SGTM. My proposal was only focusing to avoid dangling 
transactions if records are added without registered partition. -- Maybe 
you can add a few more details to the KIP about this scenario for better 
documentation purpose?

40: I think you hit a fair point about race conditions or client bugs 
(incorrectly not bumping the epoch). The complexity/confusion for using 
the bumped epoch I see, is mainly for internal debugging, ie, inspecting 
log segment dumps -- it seems harder to reason about the system for us 
humans. But if we get better guarantees, it would be worth to use the 
bumped epoch.

60: as I mentioned already, I don't know the broker internals to provide 
more input. So if nobody else chimes in, we should just move forward 
with your proposal.


-Matthias


On 12/6/22 4:22 PM, Justine Olshan wrote:

Hi all,

After Artem's questions about error behavior, I've re-evaluated the 
unknown producer ID exception and had some discussions offline.

I think generally it makes sense to simplify error handling in cases like 
this and the UNKNOWN_PRODUCER_ID error has a pretty long and complicated 
history. Because of this, I propose adding a new error code 
ABORTABLE_ERROR that when encountered by new clients (gated by the 
produce request version) will simply abort the transaction. This allows 
the server to have some say in whether the client aborts and makes 
handling much simpler. In the future, we can also use this error in other 
situations where we want to abort the transactions. We can even use it on 
other APIs.

I've added this to the KIP. Let me know if there are any questions or 
issues.

Justine

On Fri, Dec 2, 2022 at 10:22 AM Justine Olshan <jols...@confluent.io> 
wrote:


Hey Matthias,


20/30 — Maybe I also didn't express myself clearly. For older clients we 
don't have a way to distinguish between a previous and the current 
transaction since we don't have the epoch bump. This means that a late 
message from the previous transaction may be added to the new one. With 
older clients — we can't guarantee this won't happen if we already sent 
the addPartitionsToTxn call (why we make changes for the newer client) 
but we can at least gate some by ensuring that the partition has been 
added to the transaction. The rationale here is that there are likely 
LESS late arrivals as time goes on, so hopefully most late arrivals will 
come in BEFORE the addPartitionsToTxn call. Those that arrive before will 
be properly gated with the describeTransactions approach.

If we take the approach you suggested, ANY late arrival from a previous 
transaction will be added. And we don't want that. I also don't see any 
benefit in sending addPartitionsToTxn over the describeTxns call. They 
will both be one extra RPC to the Txn coordinator.


To be clear — newer clients will use addPartitionsToTxn instead of the 
DescribeTxns.


40)
My concern is that if we have some delay in the client to bump the epoch, 
it could continue to send epoch 73 and those records would not be fenced. 
Perhaps this is not an issue if we don't allow the next produce to go 
through before the EndTxn request returns. I'm also thinking about cases 
of failure. I will need to think on this a bit.

I wasn't sure if it was that confusing. But if we think it is, we can 
investigate other ways.


60)

I'm not sure these are the same purgatories since one is a produce 
purgatory (I was planning on using a callback rather than purgatory) and 
the other is simply a request to append to the log. Not sure we have any 
structure here for ordering, but my understanding is that the broker 
could handle the write request before it hears back from the Txn 
Coordinator.


Let me know if I misunderstood something or something was unclear.


Justine

On Thu, Dec 1, 2022 at 12:15 PM Matthias J. Sax <mj...@apache.org> 
wrote:


Thanks for the details Justine!


20)

The client side change for 2 is removing the addPartitions to transaction 
call. We don't need to make this from the producer to the txn 
coordinator, only server side.

I think I did not express myself clearly. I understand that we can (and 
should) change t

Re: Custom Kafka Streams State Restore Logic

2023-01-23 Thread Matthias J. Sax

Thanks.

I agree. Seems your options are limited. The API is not really a good 
fix for what you want to do... Sorry.


-Matthias

On 1/18/23 7:48 AM, Upesh Desai wrote:

Hi Matthias, thanks for your reply! Sure, so the use case is as follows.

We currently store some time series data in the state store, and it is 
stored to a changelog as well. The time series data is bucketed (5 
minutes, 1 hour, and 1 day). Our goal was to always only have a max of 2 
time buckets in the store at once. As we receive new timeseries data, we 
figure out what time bucket it belongs to, and add it to its respective 
bucket. We have a “grace period” which allows for late arriving data to 
be processed even after a time bucket has ended. That’s the reason why 
we have this constraint of 2 time buckets max within the store; 1 for 
the previous bucket in its grace period, 1 for the current bucket.
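The two-bucket bookkeeping described above could be sketched roughly as
follows. This is a hypothetical illustration, not the actual application
code: the bucket size, key type, and roll-forward rule are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks at most two time buckets per key: the previous bucket (still in
// its grace period) and the current bucket. All names are illustrative.
public class TwoBucketTracker {
    private final long bucketSizeMs;
    // key -> {previous bucket start, current bucket start}; -1 means "none"
    private final Map<String, long[]> buckets = new HashMap<>();

    public TwoBucketTracker(long bucketSizeMs) {
        this.bucketSizeMs = bucketSizeMs;
    }

    // Returns the bucket start the record belongs to, or -1 if it is older
    // than the previous bucket, i.e. outside the grace period.
    public long assign(String key, long timestampMs) {
        long bucketStart = (timestampMs / bucketSizeMs) * bucketSizeMs;
        long[] tracked = buckets.computeIfAbsent(key, k -> new long[]{-1L, bucketStart});
        if (bucketStart > tracked[1]) {   // newer bucket: roll forward
            tracked[0] = tracked[1];
            tracked[1] = bucketStart;
        }
        if (bucketStart == tracked[1] || bucketStart == tracked[0]) {
            return bucketStart;           // current bucket, or late but in grace
        }
        return -1L;                       // too late, drop
    }
}
```

With 5-minute buckets, a record that lands in the previous bucket is still
accepted, while anything older is rejected.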


So we wanted to extend the base state store and add a simple in-memory 
map to track the 2 time buckets per timeseries (that's the store key). A 
couple of reasons why we don't want to add this as a separate state store 
or add it to the existing store are:
1. There is a ton of serialization / deserialization that happens behind 
the scenes


2. This new time bucket tracking map would only be updated a couple 
times per time bucket, and does not need to be updated on every message 
read.


3. There’s no API on the included stores that allows us to do so

Therefore, I thought it best to try to use the existing store 
functionality: create a "new state store" that really just instantiates 
one of the included stores within it, add this in-memory map, and then plug 
into/alter/extend the restore functionality to populate the time bucket 
tracking map during restore time.


It sounds like I will either have to 1) create a custom state store from 
scratch, or 2) see if there is a post-restore hook that can then call a 
method to scan the whole store and build up the time bucket map before 
starting to process.


Any advice on Kafka streams / state store logic would be appreciated!

-Upesh

Upesh Desai​	 | 	Senior Software Developer	 | 	*ude...@itrsgroup.com* 
<mailto:ude...@itrsgroup.com>


*www.itrsgroup.com* <https://www.itrsgroup.com/>  





*From: *Matthias J. Sax 
*Date: *Wednesday, January 18, 2023 at 12:50 AM
*To: *users@kafka.apache.org 
*Subject: *Re: Custom Kafka Streams State Restore Logic

Guess it depends what you actually want to achieve?

Also note: `InMemoryWindowStore` is an internal class, and thus might
change at any point, and it was never designed to be extended...


-Matthias

On 1/13/23 2:55 PM, Upesh Desai wrote:

Hello all,

I am currently working on creating a new InMemoryWindowStore, by 
extending the default in memory window store. One of the roadblocks I’ve 
run into is finding a way to add some custom logic when the state store 
is being restored from the changelog. I know that this is possible if I 
completely write the store logic from scratch, but we really only want 
to add a tiny bit of custom logic, and do not want to have to replicate 
all the existing logic.


Is there a simple way for this to be done? I see the default 
implementation in the InMemoryWindowStore :


context.register(
    root,
    (RecordBatchingStateRestoreCallback) records -> {
        for (final ConsumerRecord<byte[], byte[]> record : records) {
            put(
                Bytes.wrap(extractStoreKeyBytes(record.key())),
                record.value(),
                extractStoreTimestamp(record.key())
            );
            ChangelogRecordDeserializationHelper.applyChecksAndUpdatePosition(
                record,
                consistencyEnabled,
                position
            );
        }
    }
);

Thanks in advance!

Upesh






[jira] [Commented] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680042#comment-17680042
 ] 

Matthias J. Sax commented on KAFKA-14646:
-

Thanks for following up – glad to hear that it's in the docs... And I hope it 
resolved the problem.

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 20+ topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679991#comment-17679991
 ] 

Matthias J. Sax edited comment on KAFKA-14646 at 1/23/23 7:21 PM:
--

Did you upgrade with two rolling bounces leveraging the `upgrade.from` config?

I assume it is related to https://issues.apache.org/jira/browse/KAFKA-13769

Of course, K13769 could have introduced some bug, but we actually do test 
rolling upgrades and would hope the tests would have caught it (otherwise, we 
need to improve our testing...)

We unfortunately lack some docs on the web page about the required two rolling 
bounce upgrade path – it unfortunately slipped. We have it in the backlog to 
add the missing docs asap.


was (Author: mjsax):
Did you upgrade with two rolling bounced leveraging `upgrad_from` config?

I assume is related to https://issues.apache.org/jira/browse/KAFKA-13769

We unfortunately, lag some docs on the web-page about the required two rolling 
bounce upgrade path – it unfortunately slipped. We have it in the backlog to 
add the missing docs asap.

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 20+ topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14646) SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 3.3.2)

2023-01-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679991#comment-17679991
 ] 

Matthias J. Sax commented on KAFKA-14646:
-

Did you upgrade with two rolling bounces leveraging the `upgrade.from` config?

I assume it is related to https://issues.apache.org/jira/browse/KAFKA-13769

We unfortunately lack some docs on the web page about the required two rolling 
bounce upgrade path – it unfortunately slipped. We have it in the backlog to 
add the missing docs asap.

> SubscriptionWrapper is of an incompatible version (Kafka Streams 3.2.3 -> 
> 3.3.2)
> 
>
> Key: KAFKA-14646
> URL: https://issues.apache.org/jira/browse/KAFKA-14646
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.2
> Environment: Kafka Streams 3.2.3 (before update)
> Kafka Streams 3.3.2 (after update)
> Java 17 (Eclipse Temurin 17.0.5), Linux x86_64
>Reporter: Jochen Schalanda
>Priority: Major
>
> Hey folks,
>  
> we've just updated an application from *_Kafka Streams 3.2.3 to 3.3.2_* and 
> started getting the following exceptions:
> {code:java}
> org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version. {code}
> After swiftly looking through the code, this exception is potentially thrown 
> in two places:
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionJoinForeignProcessorSupplier.java#L73-L78]
>  ** Here the check was changed in Kafka 3.3.x: 
> [https://github.com/apache/kafka/commit/9dd25ecd9ce17e608c6aba98e0422b26ed133c12]
>  * 
> [https://github.com/apache/kafka/blob/3.3.2/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/SubscriptionStoreReceiveProcessorSupplier.java#L94-L99]
>  ** Here the check wasn't changed.
>  
> Is it possible that the second check in 
> {{SubscriptionStoreReceiveProcessorSupplier}} was forgotten?
>  
> Any hints how to resolve this issue without a downgrade?
> Since this only affects 2 of 20+ topologies in the application, I'm hesitant 
> to just downgrade to Kafka 3.2.3 again since the internal topics might 
> already have been updated to use the "new" version of 
> {{{}SubscriptionWrapper{}}}.
>  
> Related discussion in the Confluent Community Slack: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1674497054507119]
> h2. Stack trace
> {code:java}
> org.apache.kafka.streams.errors.StreamsException: Exception caught in 
> process. taskId=1_8, 
> processor=XXX-joined-changed-fk-subscription-registration-source, 
> topic=topic.rev7-XXX-joined-changed-fk-subscription-registration-topic, 
> partition=8, offset=12297976, 
> stacktrace=org.apache.kafka.common.errors.UnsupportedVersionException: 
> SubscriptionWrapper is of an incompatible version.
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:750)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
> at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
> at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1182)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:770)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:588)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:550)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14638) Documentation for transaction.timeout.ms should be more precise

2023-01-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14638.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Documentation for transaction.timeout.ms should be more precise
> ---
>
> Key: KAFKA-14638
> URL: https://issues.apache.org/jira/browse/KAFKA-14638
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs
>Reporter: Alex Sorokoumov
>Assignee: Alex Sorokoumov
>Priority: Minor
> Fix For: 3.5.0
>
>
> The documentation for {{transaction.timeout.ms}} is
> {quote}
> The maximum amount of time in ms that the transaction coordinator will wait 
> for a transaction status update from the producer before proactively aborting 
> the ongoing transaction. If this value is larger than the 
> transaction.max.timeout.ms setting in the broker, the request will fail with 
> a InvalidTxnTimeoutException error.
> {quote}
> It would be easier to reason about this timeout if the documentation would 
> elaborate on when the timer starts ticking and under what circumstances it 
> might reset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (KAFKA-14638) Documentation for transaction.timeout.ms should be more precise

2023-01-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14638:

Issue Type: Improvement  (was: Bug)

> Documentation for transaction.timeout.ms should be more precise
> ---
>
> Key: KAFKA-14638
> URL: https://issues.apache.org/jira/browse/KAFKA-14638
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs
>Reporter: Alex Sorokoumov
>Assignee: Alex Sorokoumov
>Priority: Minor
>
> The documentation for {{transaction.timeout.ms}} is
> {quote}
> The maximum amount of time in ms that the transaction coordinator will wait 
> for a transaction status update from the producer before proactively aborting 
> the ongoing transaction. If this value is larger than the 
> transaction.max.timeout.ms setting in the broker, the request will fail with 
> a InvalidTxnTimeoutException error.
> {quote}
> It would be easier to reason about this timeout if the documentation would 
> elaborate on when the timer starts ticking and under what circumstances it 
> might reset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14638) Documentation for transaction.timeout.ms should be more precise

2023-01-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14638:

Component/s: docs

> Documentation for transaction.timeout.ms should be more precise
> ---
>
> Key: KAFKA-14638
> URL: https://issues.apache.org/jira/browse/KAFKA-14638
> Project: Kafka
>  Issue Type: Bug
>  Components: docs
>Reporter: Alex Sorokoumov
>Assignee: Alex Sorokoumov
>Priority: Minor
>
> The documentation for {{transaction.timeout.ms}} is
> {quote}
> The maximum amount of time in ms that the transaction coordinator will wait 
> for a transaction status update from the producer before proactively aborting 
> the ongoing transaction. If this value is larger than the 
> transaction.max.timeout.ms setting in the broker, the request will fail with 
> a InvalidTxnTimeoutException error.
> {quote}
> It would be easier to reason about this timeout if the documentation would 
> elaborate on when the timer starts ticking and under what circumstances it 
> might reset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Tumbling windows offset

2023-01-18 Thread Matthias J. Sax

You can use a custom window implementation to do this.

Cf 
https://github.com/confluentinc/kafka-streams-examples/blob/7.1.1-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
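The core of such a custom window implementation is aligning the window start
to the desired day. A minimal sketch of that computation (assuming UTC; a
real `Windows` subclass would wrap something like this in its
`windowsFor()` method — the class and method names below are illustrative):

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;

// Computes the start of the Monday-to-Monday week containing a timestamp.
// UTC is an assumption; a production version should take a ZoneId.
public class WeeklyWindowStart {
    public static long mondayWindowStart(long timestampMs) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMs).atZone(ZoneOffset.UTC);
        ZonedDateTime monday = t.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
                                .toLocalDate()
                                .atStartOfDay(ZoneOffset.UTC);
        return monday.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // 1970-01-01 (epoch 0) was a Thursday; its week started 1969-12-29.
        System.out.println(mondayWindowStart(0L)); // prints -259200000
    }
}
```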



-Matthias

On 1/18/23 6:33 AM, Amsterdam Luís de Lima Filho wrote:

Hello everyone,

I need to perform a windowed aggregation over a weekly interval, from Monday to 
Monday, using Tumbling Windows.

It does not seem to be supported: 
https://stackoverflow.com/questions/72785744/how-to-apply-an-offset-to-tumbling-window-in-order-to-delay-the-starting-of-win

Can someone help me with a workaround or guidance on how to make a PR adding 
this feature?

I need this to finish a migration from Flink to Kafka Streams in prod.

Best,
Amsterdam


Re: [VOTE] 3.3.2 RC1

2023-01-18 Thread Matthias J. Sax

Done.

On 1/18/23 10:36 AM, Chris Egerton wrote:

Thanks Mickael!

@John, @Matthias -- would either of you be able to lend a hand with the 
push to S3?


Cheers,

Chris

On Tue, Jan 17, 2023 at 12:24 PM Mickael Maison 
mailto:mickael.mai...@gmail.com>> wrote:


Hi Chris,

I've pushed the artifacts to
https://dist.apache.org/repos/dist/release/kafka/
 and added your key
to the KEYS file in that repo.
You should now be able to release the artifacts. Then you'll need a
Confluent employee (I usually ping John or Matthias) to push a few
binaries to their S3 bucket.

Thanks,
Mickael

On Tue, Jan 17, 2023 at 5:04 PM Chris Egerton
 wrote:
 >
 > Hi Mickael,
 >
 > Haven't found anyone yet; your help would be greatly appreciated!
 >
 > Cheers,
 >
 > Chris
 >
 > On Mon, Jan 16, 2023 at 8:46 AM Mickael Maison
mailto:mickael.mai...@gmail.com>>
 > wrote:
 >
 > > Hi Chris,
 > >
 > > Have you already found a PMC member to help you out? If not I'm
happy
 > > to lend you a hand.
 > >
 > > Thanks,
 > > Mickael
 > >
 > > On Wed, Jan 11, 2023 at 9:07 PM Chris Egerton

 > > wrote:
 > > >
 > > > Hi all,
 > > >
 > > > In order to continue with the release process, I need the
assistance of a
 > > > PMC member to help with publishing artifacts to the release
directory of
 > > > our SVN repo, and adding my public key to the KEYS file.
Would anyone be
 > > > willing to lend a hand?
 > > >
 > > > Cheers,
 > > >
 > > > Chris
 > > >
 > > > On Wed, Jan 11, 2023 at 11:45 AM Chris Egerton
mailto:chr...@aiven.io>> wrote:
 > > >
 > > > > Hi all,
 > > > >
 > > > > Thanks to José for running the system tests, and to Bruno
for verifying
 > > > > the results!
 > > > >
 > > > > With that, I'm closing the vote. The RC has passed with the
required
 > > > > number of votes. I'll be sending out a results announcement
shortly.
 > > > >
 > > > > Cheers,
 > > > >
 > > > > Chris
 > > > >
 > > > > On Wed, Jan 11, 2023 at 6:19 AM Bruno Cadonna
mailto:cado...@apache.org>>
 > > wrote:
 > > > >
 > > > >> Hi Chris and José,
 > > > >>
 > > > >> I think the issue is not Streams related but has to do
with the
 > > > >> following commit:
 > > > >>
 > > > >> commit b66af662e61082cb8def576ded1fe5cee37e155f (HEAD,
tag: 3.3.2-rc1)
 > > > >> Author: Chris Egerton mailto:chr...@aiven.io>>
 > > > >> Date:   Wed Dec 21 16:14:10 2022 -0500
 > > > >>
 > > > >>      Bump version to 3.3.2
 > > > >>
 > > > >>
 > > > >> The Streams upgrade system tests verify the upgrade from a
previous
 > > > >> version to the current version. In the above the current
version in
 > > > >> gradle.properties is set from 3.3.2-SNAPSHOT to 3.3.2 but
the test
 > > > >> verifies for development version 3.3.2-SNAPSHOT.
 > > > >>
 > > > >> I ran the following failing test:
 > > > >>
 > > > >> TC_PATHS="tests/kafkatest/tests/streams/streams_application_upgrade_test.py::StreamsUpgradeTest.test_app_upgrade" \
 > > > >> _DUCKTAPE_OPTIONS='--parameters '\''{"bounce_type":"full","from_version":"2.2.2","to_version":"3.3.2-SNAPSHOT"}'\' \
 > > > >> bash tests/docker/run_tests.sh
 > > > >>
 > > > >> on the previous commit to the above commit, i.e.:
 > > > >>
 > > > >> commit e3212f28eb88e8f7fcf3d2d4c646b2a28b0f668e
 > > > >> Author: José Armando García Sancio
 > > > >> Date:   Tue Dec 20 10:55:14 2022 -0800
 > > > >>
 > > > >> and the test passed.
 > > > >>
 > > > >> Best,
 > > > >> Bruno
 > > > >>
 > > > >> On 10.01.23 19:25, José Armando García Sancio wrote:
 > > > >> > Hey Chris,
 > > > >> >
 > > > >> > Here are the results:
 > > > >> >
 > > > >>
 > >

http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1673314598--apache--HEAD--b66af662e6/2023-01-09--001./2023-01-09--001./report.html
 

 > > > >> >
 > > > >> > It looks like all of the failures are when trying to
upgrade to
 > > > >> > 3.3.2-SNAPSHOT. I saw a similar error in my PR here but
I am not
 > > sure
 > > > >> > if it is related:
https://github.com/apache/kafka/pull/13077

 > > > >> >
 > > > >> > Maybe someone familiar with Kafka Streams can help.
 > > > >> 

Re: Custom Kafka Streams State Restore Logic

2023-01-17 Thread Matthias J. Sax

Guess it depends what you actually want to achieve?

Also note: `InMemoryWindowStore` is an internal class, and thus might 
change at any point, and it was never designed to be extended...



-Matthias

On 1/13/23 2:55 PM, Upesh Desai wrote:

Hello all,

I am currently working on creating a new InMemoryWindowStore, by 
extending the default in memory window store. One of the roadblocks I’ve 
run into is finding a way to add some custom logic when the state store 
is being restored from the changelog. I know that this is possible if I 
completely write the store logic from scratch, but we really only want 
to add a tiny bit of custom logic, and do not want to have to replicate 
all the existing logic.


Is there a simple way for this to be done? I see the default 
implementation in the InMemoryWindowStore :


context.register(
    root,
    (RecordBatchingStateRestoreCallback) records -> {
        for (final ConsumerRecord<byte[], byte[]> record : records) {
            put(
                Bytes.wrap(extractStoreKeyBytes(record.key())),
                record.value(),
                extractStoreTimestamp(record.key())
            );
            ChangelogRecordDeserializationHelper.applyChecksAndUpdatePosition(
                record,
                consistencyEnabled,
                position
            );
        }
    }
);

Thanks in advance!

Upesh




Upesh Desai​
Senior Software Developer

*ude...@itrsgroup.com* 
*www.itrsgroup.com* 

Internet communications are not secure and therefore the ITRS Group does 
not accept legal responsibility for the contents of this message. Any 
view or opinions presented are solely those of the author and do not 
necessarily represent those of the ITRS Group unless otherwise 
specifically stated.


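One way to read Matthias's reply is to decorate the store's existing restore callback instead of extending the internal class. The sketch below shows that pattern only in outline: `RestoreCallback` and `SimpleStore` are simplified stand-ins invented for illustration, not Kafka's real `RecordBatchingStateRestoreCallback` or `StateStoreContext` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: run a little custom logic per restored record, then
// delegate to the store's unmodified restore logic.
public class RestoreDecoratorSketch {

    interface RestoreCallback {
        void restoreBatch(List<Map.Entry<String, String>> records);
    }

    static class SimpleStore {
        final Map<String, String> data = new HashMap<>();

        // The store's own restore logic (analogous to the put() loop in the
        // InMemoryWindowStore snippet quoted above).
        RestoreCallback restoreCallback() {
            return records -> records.forEach(r -> data.put(r.getKey(), r.getValue()));
        }
    }

    // Wrap the delegate callback: custom logic first, then the existing logic.
    static RestoreCallback decorate(RestoreCallback delegate, List<String> restoredKeys) {
        return records -> {
            records.forEach(r -> restoredKeys.add(r.getKey())); // custom logic
            delegate.restoreBatch(records);                     // existing logic
        };
    }

    static List<String> demoRestore() {
        SimpleStore store = new SimpleStore();
        List<String> restoredKeys = new ArrayList<>();
        decorate(store.restoreCallback(), restoredKeys)
            .restoreBatch(List.of(Map.entry("k1", "v1"), Map.entry("k2", "v2")));
        return restoredKeys;
    }

    public static void main(String[] args) {
        System.out.println(demoRestore());
    }
}
```

Since the real store registers its callback internally, applying this in Streams would still require a wrapping store that intercepts `context.register(...)`; the sketch only shows the callback-decoration step.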



Re: [ANNOUNCE] New committer: Stanislav Kozlovski

2023-01-17 Thread Matthias J. Sax

Congrats!

On 1/17/23 1:26 PM, Ron Dagostino wrote:

Congratulations, Stan!

Ron


On Jan 17, 2023, at 12:29 PM, Mickael Maison  wrote:

Congratulations Stanislav!


On Tue, Jan 17, 2023 at 6:06 PM Rajini Sivaram  wrote:

Congratulations, Stan!

Regards,

Rajini


On Tue, Jan 17, 2023 at 5:04 PM Tom Bentley  wrote:

Congratulations!


On Tue, 17 Jan 2023 at 16:52, Bill Bejeck  wrote:



Congratulations Stan!

-Bill

On Tue, Jan 17, 2023 at 11:37 AM Bruno Cadonna 

wrote:



Congrats Stan!

Well deserved!

Best,
Bruno

On 17.01.23 16:50, Jun Rao wrote:

Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer
Stanislav Kozlovski.

Stan has been contributing to Apache Kafka since June 2018. He made

various

contributions including the following KIPs.

KIP-455: Create an Administrative API for Replica Reassignment
KIP-412: Extend Admin API to support dynamic application log levels

Congratulations, Stan!

Thanks,

Jun (on behalf of the Apache Kafka PMC)









[jira] [Updated] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2023-01-17 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14302:

Priority: Critical  (was: Major)

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, has a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there is no data
>  * As there are no checkpoint entries, this instance specifies a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then computes a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> and thus decides to schedule a probing rebalance
>  * In the next probing rebalance, nothing has changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, applications having an empty changelog are experiencing 
> frequent rebalances:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
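The lag check at the heart of KAFKA-14302 above can be sketched as follows. This is a hypothetical simplification, not the actual StreamsPartitionAssignor code; the only assumed constant is the Kafka Streams default of 10,000 for `acceptable.recovery.lag`.

```java
// Hypothetical sketch of the assignor's lag check: with no checkpoint file,
// the restarted instance reports a task offset of 0, so an emptied changelog
// whose log-end offset is still high looks permanently "behind".
public class LagCheckSketch {

    // Default value of acceptable.recovery.lag in Kafka Streams.
    static final long ACCEPTABLE_RECOVERY_LAG = 10_000L;

    static boolean needsProbingRebalance(long logEndOffset, long taskOffset) {
        long lag = logEndOffset - taskOffset;
        return lag > ACCEPTABLE_RECOVERY_LAG;
    }

    public static void main(String[] args) {
        // Changelog once held 20,000 records and is now empty; the restored
        // instance has no checkpoint, so it reports offset 0 on every probe.
        System.out.println(needsProbingRebalance(20_000L, 0L));
    }
}
```

Because the reported offset never moves and the log-end offset never shrinks, the condition stays true on every probe, which is exactly the infinite-rebalance loop described in the ticket.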


[ANNOUNCE] New committer: Walker Carlson

2023-01-17 Thread Matthias J. Sax

Dear community,

I am pleased to announce Walker Carlson as a new Kafka committer.

Walker has been contributing to Apache Kafka since November 2019. He 
made various contributions including the following KIPs.


KIP-671: Introduce Kafka Streams Specific Uncaught Exception Handler
KIP-696: Update Streams FSM to clarify ERROR state meaning
KIP-715: Expose Committed offset in streams


Congratulations Walker and welcome on board!


Thanks,
  -Matthias (on behalf of the Apache Kafka PMC)



[jira] [Commented] (KAFKA-14624) State restoration is broken with standby tasks and cache-enabled stores in processor API

2023-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676934#comment-17676934
 ] 

Matthias J. Sax commented on KAFKA-14624:
-

Thanks for reporting this issue – will need to think about the details more 
(and look into your project that reproduces it) – my first reaction was, "why is 
the cache not empty?" Registering the call-back on the cache might introduce a 
lot of overhead during restore, and thus, I have doubts whether it's a good fix.
{quote}If a partition moves from instance 1 to 2 and then back to instance 1
{quote}
If the task/partition moves from 1 to 2, the task on instance 1 should be 
closed and its cache should be gone? Or are you saying the task on instance 1 
becomes a standby?

> State restoration is broken with standby tasks and cache-enabled stores in 
> processor API
> 
>
> Key: KAFKA-14624
> URL: https://issues.apache.org/jira/browse/KAFKA-14624
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Balaji Rao
>Priority: Major
>
> I found that cache-enabled state stores in PAPI with standby tasks sometimes 
> return stale data when a partition moves from one app instance to another 
> and back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a 
> small project that I used to reproduce the issue.
> I dug around a bit and it seems like it's a bug in standby task state 
> restoration when caching is enabled. If a partition moves from instance 1 to 
> 2 and then back to instance 1,  since the `CachingKeyValueStore` doesn't 
> register a restore callback, it can return potentially stale data for 
> non-dirty keys. 
> I could fix the issue by modifying the `CachingKeyValueStore` to register a 
> restore callback in which the cache restored keys are added to the cache. Is 
> this fix in the right direction?
> {code:java}
> // register the store
> context.register(
> root,
> (RecordBatchingStateRestoreCallback) records -> {
> for (final ConsumerRecord<byte[], byte[]> record : records) {
> put(Bytes.wrap(record.key()), record.value());
> }
> }
> );
> {code}
>  
> I would like to contribute a fix, if I can get some help!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14279) Add 3.3.1 to broker/client and stream upgrade/compatibility tests

2023-01-09 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14279.
-
Resolution: Fixed

> Add 3.3.1 to broker/client and stream upgrade/compatibility tests
> -
>
> Key: KAFKA-14279
> URL: https://issues.apache.org/jira/browse/KAFKA-14279
> Project: Kafka
>  Issue Type: Task
>  Components: clients, core, streams, system tests
>Reporter: José Armando García Sancio
>Assignee: José Armando García Sancio
>Priority: Blocker
> Fix For: 3.4.0
>
>
> Per the penultimate bullet on the [release 
> checklist|https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses],
>  Kafka v3.3.0 is released. We should add this version to the system tests.
> Example PRs:
>  * Broker and clients: [https://github.com/apache/kafka/pull/6794]
>  * Streams: [https://github.com/apache/kafka/pull/6597/files]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Updated] (KAFKA-14279) Add 3.3.1 to broker/client and stream upgrade/compatibility tests

2023-01-09 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14279:

Fix Version/s: 3.4.0
   (was: 3.5.0)

> Add 3.3.1 to broker/client and stream upgrade/compatibility tests
> -
>
> Key: KAFKA-14279
> URL: https://issues.apache.org/jira/browse/KAFKA-14279
> Project: Kafka
>  Issue Type: Task
>  Components: clients, core, streams, system tests
>Reporter: José Armando García Sancio
>Assignee: José Armando García Sancio
>Priority: Blocker
> Fix For: 3.4.0
>
>
> Per the penultimate bullet on the [release 
> checklist|https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses],
>  Kafka v3.3.0 is released. We should add this version to the system tests.
> Example PRs:
>  * Broker and clients: [https://github.com/apache/kafka/pull/6794]
>  * Streams: [https://github.com/apache/kafka/pull/6597/files]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14126) Convert remaining DynamicBrokerReconfigurationTest tests to KRaft

2023-01-09 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656475#comment-17656475
 ] 

Matthias J. Sax commented on KAFKA-14126:
-

Saw this failing 2x on this PR: https://github.com/apache/kafka/pull/13077

> Convert remaining DynamicBrokerReconfigurationTest tests to KRaft
> -
>
> Key: KAFKA-14126
> URL: https://issues.apache.org/jira/browse/KAFKA-14126
> Project: Kafka
>  Issue Type: Test
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Major
>
> After the initial conversion in https://github.com/apache/kafka/pull/12455, 
> three tests still need to be converted. 
> * testKeyStoreAlter
> * testTrustStoreAlter
> * testThreadPoolResize



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Edoardo Comar

2023-01-06 Thread Matthias J. Sax

Congrats!

On 1/6/23 5:15 PM, Luke Chen wrote:

Congratulations, Edoardo!

Luke

On Sat, Jan 7, 2023 at 7:58 AM Mickael Maison 
wrote:


Congratulations Edo!


On Sat, Jan 7, 2023 at 12:05 AM Jun Rao  wrote:


Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer

Edoardo

Comar.

Edoardo has been a long time Kafka contributor since 2016. His major
contributions are the following.

KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
KIP-277: Fine Grained ACL for CreateTopics API
KIP-136: Add Listener name to SelectorMetrics tags

Congratulations, Edoardo!

Thanks,

Jun (on behalf of the Apache Kafka PMC)







[jira] [Updated] (KAFKA-14597) [Streams] record-e2e-latency-max is not reporting correct metrics

2023-01-05 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14597:

Component/s: metrics
 streams

> [Streams] record-e2e-latency-max is not reporting correct metrics 
> --
>
> Key: KAFKA-14597
> URL: https://issues.apache.org/jira/browse/KAFKA-14597
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics, streams
>Reporter: Atul Jain
>Priority: Major
> Attachments: process-latency-max.jpg, record-e2e-latency-max.jpg
>
>
> I was following this KIP documentation 
> ([https://cwiki.apache.org/confluence/display/KAFKA/KIP-613%3A+Add+end-to-end+latency+metrics+to+Streams])
>  and kafka streams documentation 
> ([https://kafka.apache.org/documentation/#kafka_streams_monitoring:~:text=node%2Did%3D(%5B%2D.%5Cw%5D%2B)-,record%2De2e%2Dlatency%2Dmax,-The%20maximum%20end])
>  . Based on these documentations , the *record-e2e-latency-max* should 
> monitor the full end to end latencies, which includes both *consumption 
> latencies* and  {*}processing delays{*}.
> However, based on my observations , record-e2e-latency-max seems to be only 
> measuring the consumption latencies. processing delays can be measured using 
> *process-latency-max* .I am checking all this using a simple topology 
> consisting of source, processors and sink (code added). I have added some 
> sleep time (of 3 seconds) in one of the processors to ensure some delays in 
> the processing logic. These delays are not getting accounted in the 
> record-e2e-latency-max but are accounted in process-latency-max. 
> process-latency-max was observed to be 3002 ms which accounts for sleep time 
> of 3 seconds. However, record-e2e-latency-max observed in jconsole is 422ms, 
> which does not account for 3 seconds of sleep time.
>  
> Code describing my topology:
> {code:java}
>static Topology buildTopology(String inputTopic, String outputTopic) {
> log.info("Input topic: " + inputTopic + " and output topic: " + 
> outputTopic);
> Serde<String> stringSerde = Serdes.String();
> StreamsBuilder builder = new StreamsBuilder();
> builder.stream(inputTopic, Consumed.with(stringSerde, stringSerde))
> .peek((k,v) -> log.info("Observed event: key" + k + " value: 
> " + v))
> .mapValues(s -> {
> try {
> System.out.println("sleeping for 3 seconds");
> Thread.sleep(3000);
> }
> catch (InterruptedException e) {
> e.printStackTrace();
> }
> return  s.toUpperCase();
> })
> .peek((k,v) -> log.info("Transformed event: key" + k + " 
> value: " + v))
> .to(outputTopic, Produced.with(stringSerde, stringSerde));
> return builder.build();
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14572) Migrate EmbeddedKafkaCluster used by Streams integration tests from EmbeddedZookeeper to KRaft

2023-01-04 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-14572:
---

 Summary: Migrate EmbeddedKafkaCluster used by Streams integration 
tests from EmbeddedZookeeper to KRaft
 Key: KAFKA-14572
 URL: https://issues.apache.org/jira/browse/KAFKA-14572
 Project: Kafka
  Issue Type: Test
  Components: streams, unit tests
Reporter: Matthias J. Sax
 Fix For: 4.0.0


ZK will be removed in 4.0, and we need to update our tests to switch from ZK to 
KRaft.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Created] (KAFKA-14567) Kafka Streams crashes after ProducerFencedException

2023-01-03 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-14567:
---

 Summary: Kafka Streams crashes after ProducerFencedException
 Key: KAFKA-14567
 URL: https://issues.apache.org/jira/browse/KAFKA-14567
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Matthias J. Sax


Running a Kafka Streams application with EOS-v2.

After a thread crashed, we re-spawned a new thread, which implies that the 
thread-index number was re-used, resulting in a `transactional.id` reuse that 
led to a `ProducerFencedException`.

After the fencing, the fenced thread crashed resulting in a non-recoverable 
error:
{quote}[2022-12-22 13:49:13,423] ERROR [i-0c291188ec2ae17a0-StreamThread-3] 
stream-thread [i-0c291188ec2ae17a0-StreamThread-3] Failed to process stream 
task 1_2 due to the following error: 
(org.apache.kafka.streams.processor.internals.TaskExecutor)
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. 
taskId=1_2, processor=KSTREAM-SOURCE-05, topic=node-name-repartition, 
partition=2, offset=539776276, stacktrace=java.lang.IllegalStateException: 
TransactionalId stream-soak-test-72b6e57c-c2f5-489d-ab9f-fdbb215d2c86-3: 
Invalid transition attempted from state FATAL_ERROR to state ABORTABLE_ERROR
at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:974)
at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:394)
at 
org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:620)
at 
org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1079)
at 
org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:959)
at 
org.apache.kafka.streams.processor.internals.StreamsProducer.send(StreamsProducer.java:257)
at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:207)
at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:162)
at 
org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:85)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228)
at 
org.apache.kafka.streams.kstream.internals.KStreamKTableJoinProcessor.process(KStreamKTableJoinProcessor.java:88)
at 
org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:157)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:290)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:269)
at 
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:228)
at 
org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84)
at 
org.apache.kafka.streams.processor.internals.StreamTask.lambda$doProcess$1(StreamTask.java:791)
at 
org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:867)
at 
org.apache.kafka.streams.processor.internals.StreamTask.doProcess(StreamTask.java:791)
at 
org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:722)
at 
org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:95)
at 
org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:76)
at 
org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1645)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:607)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:569)
at 
org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:748)
at 
org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:95)
at 
org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:76)
at 
org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1645)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop


[jira] [Commented] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2023-01-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654211#comment-17654211
 ] 

Matthias J. Sax commented on KAFKA-13295:
-

Thanks. SG.

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
> Fix For: 3.5.0
>
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.
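The exposure window described above is bounded by the producer's transaction timeout. As a stopgap while restoration times remain long, the timeout can be raised through the Streams "producer."-prefixed pass-through configs. A minimal sketch using plain string keys (the property names are the standard Streams/producer keys; whether a longer timeout is acceptable depends on the broker-side transaction.max.timeout.ms cap):

```java
import java.util.Properties;

public class EosTimeoutConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Enable exactly-once processing in Streams.
        props.put("processing.guarantee", "exactly_once_v2");
        // Streams forwards "producer."-prefixed keys to its internal producers;
        // a larger txn timeout gives restoration more headroom before fencing.
        // Must stay at or below the broker's transaction.max.timeout.ms.
        props.put("producer.transaction.timeout.ms", "600000"); // 10 minutes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("producer.transaction.timeout.ms"));
    }
}
```

This only widens the window; it does not remove the underlying problem of an open transaction spanning restoration.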



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Justine Olshan

2023-01-03 Thread Matthias J. Sax

Congrats!

On 12/29/22 6:47 PM, ziming deng wrote:

Congratulations Justine!
—
Best,
Ziming


On Dec 30, 2022, at 10:06, Luke Chen  wrote:

Congratulations, Justine!
Well deserved!

Luke

On Fri, Dec 30, 2022 at 9:15 AM Ron Dagostino  wrote:


Congratulations, Justine!Well-deserved., and I’m very happy for you.

Ron


On Dec 29, 2022, at 6:13 PM, Israel Ekpo  wrote:

Congratulations Justine!



On Thu, Dec 29, 2022 at 5:05 PM Greg Harris



wrote:

Congratulations Justine!


On Thu, Dec 29, 2022 at 1:37 PM Bill Bejeck  wrote:

Congratulations Justine!


-Bill


On Thu, Dec 29, 2022 at 4:36 PM Philip Nee 

wrote:



wow congrats!

On Thu, Dec 29, 2022 at 1:05 PM Chris Egerton <

fearthecel...@gmail.com



wrote:


Congrats, Justine!

On Thu, Dec 29, 2022, 15:58 David Jacot  wrote:


Hi all,

The PMC of Apache Kafka is pleased to announce a new Kafka

committer

Justine
Olshan.

Justine has been contributing to Kafka since June 2019. She

contributed

53

PRs including the following KIPs.

KIP-480: Sticky Partitioner
KIP-516: Topic Identifiers & Topic Deletion State Improvements
KIP-854: Separate configuration for producer ID expiry
KIP-890: Transactions Server-Side Defense (in progress)

Congratulations, Justine!

Thanks,

David (on behalf of the Apache Kafka PMC)

Re: Kafka Stream: The state store, wkstore, may have migrated to another instance

2022-12-29 Thread Matthias J. Sax

Sounds like a SpringBoot issue rather than a KS issue.

-Matthias

On 12/29/22 2:45 AM, Nawal Sah wrote:

Hi,

My SpringBoot stream application works fine in a fresh start of the
clustered environment.
But when I restart one of the pods out of two pods, I start getting the
below exception from "KafkaStreams.store". I debugged the state but found
it was RUNNING.

org.apache.kafka.streams.errors.InvalidStateStoreException: The state store, wkstore, may have migrated to another instance.
at org.apache.kafka.streams.state.internals.WrappingStoreProvider.stores(WrappingStoreProvider.java:67)
at org.apache.kafka.streams.state.internals.CompositeReadOnlyWindowStore.fetchAll(CompositeReadOnlyWindowStore.java:175)

To fix this, I must restart all the pods/instances or the cluster.

*Steps to reproduce*

1. Kafka Stream application environment
'kafka-streams', version: '3.0.1'
'kafka-clients', version: '3.0.1'
2. Create a cluster environment with at least two replication factors of
the Kafka Stream application.
3. Restart one of the pods.
4. Call KafkaStreams:store(), it starts throwing an exception. If not then
try at least 5 times to restart the same pod.
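InvalidStateStoreException is a retriable condition: during and shortly after a rebalance the store may be mid-migration even while the instance reports RUNNING. The usual client-side mitigation is to retry the store lookup with backoff rather than restarting pods. A plain-Java sketch of that pattern (the `fetchFromStore` supplier stands in for the `KafkaStreams.store(...)` call that throws; names are illustrative):

```java
import java.util.function.Supplier;

public class StoreRetry {
    // Retry a store lookup a bounded number of times, sleeping between
    // attempts; rethrows the last failure if the store never becomes available.
    public static <T> T withRetries(Supplier<T> fetchFromStore,
                                    int maxAttempts, long backoffMs) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fetchFromStore.get();
            } catch (RuntimeException e) { // e.g. InvalidStateStoreException
                last = e;
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;
    }
}
```

If the exception persists long after rebalancing has settled, that points to a genuine migration of the store to another instance rather than a transient condition.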

Priority: *Blocker*


*Regards,*

Nawal Sah

(M): +91-9717932863



[jira] [Assigned] (KAFKA-9224) State store should not see uncommitted transaction result

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-9224:
--

Assignee: (was: Boyang Chen)

> State store should not see uncommitted transaction result
> -
>
> Key: KAFKA-9224
> URL: https://issues.apache.org/jira/browse/KAFKA-9224
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Priority: Major
>
> Currently under EOS, the uncommitted write could be reflected in the state 
> store before the ongoing transaction is finished. This means interactive 
> query could see uncommitted data within state store which is not ideal for 
> users relying on state stores for strong consistency. Ideally, we should have 
> an option to include state store commit as part of ongoing transaction, 
> however an immediate step towards a better reasoned system is to `write after 
> transaction commit`, which means we always buffer data within stream cache 
> for EOS until the ongoing transaction is committed.
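The "write after transaction commit" approach above can be sketched as a small buffer that holds writes until commit, so interactive queries only ever see committed data (names are hypothetical; the real Streams cache is considerably more involved):

```java
import java.util.HashMap;
import java.util.Map;

public class TxnWriteBuffer<K, V> {
    private final Map<K, V> pending = new HashMap<>(); // uncommitted writes
    private final Map<K, V> store = new HashMap<>();   // committed, queryable state

    // Uncommitted writes land in the pending buffer only.
    public void put(K key, V value) { pending.put(key, value); }

    // Interactive queries read only committed data.
    public V committedGet(K key) { return store.get(key); }

    // On txn commit, flush pending writes into the queryable store.
    public void commit() { store.putAll(pending); pending.clear(); }

    // On txn abort, drop uncommitted writes.
    public void abort() { pending.clear(); }
}
```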



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-8977) Remove MockStreamsMetrics Since it is not a Mock

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8977:
--

Assignee: (was: bibin sebastian)

> Remove MockStreamsMetrics Since it is not a Mock
> 
>
> Key: KAFKA-8977
> URL: https://issues.apache.org/jira/browse/KAFKA-8977
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Minor
>  Labels: newbie
>
> The class {{MockStreamsMetrics}} is used throughout unit tests as a mock but 
> it is not really a mock since it only hides two parameters of the 
> {{StreamsMetricsImpl}} constructor. Either a real mock or the real 
> {{StreamsMetricsImpl}} should be used in the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-6460) Add mocks for state stores used in Streams unit testing

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-6460:
--

Assignee: (was: Yishun Guan)

> Add mocks for state stores used in Streams unit testing
> ---
>
> Key: KAFKA-6460
> URL: https://issues.apache.org/jira/browse/KAFKA-6460
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams-test-utils
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: kip, newbie++
>
> We'd like to use mocks for different types of state stores: kv, window, 
> session that can be used to record the number of expected put / get calls 
> used in the DSL operator unit testing. This involves implementing the two 
> interfaces {{StoreSupplier}} and {{StoreBuilder}} that can return an object 
> created from, say, EasyMock, and the object can then be set up with the 
> expected calls.
> In addition, we should also add a mock record collector which can be returned 
> from the mock processor context so that with logging enabled store, users can 
> also validate if the changes have been forwarded to the changelog as well.
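The call-recording mock described above can be sketched as a map-backed store that counts puts and gets (a plain-Java stand-in; a real implementation would sit behind the {{StoreSupplier}}/{{StoreBuilder}} interfaces and could instead return an EasyMock object):

```java
import java.util.HashMap;
import java.util.Map;

public class CountingKeyValueStore<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private int putCalls = 0;
    private int getCalls = 0;

    public void put(K key, V value) { putCalls++; delegate.put(key, value); }
    public V get(K key) { getCalls++; return delegate.get(key); }

    // Unit tests assert on the recorded call counts.
    public int putCalls() { return putCalls; }
    public int getCalls() { return getCalls; }
}
```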



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-8272) Changed(De)Serializer does not forward configure() call

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-8272:
--

Assignee: (was: Matthias J. Sax)

> Changed(De)Serializer does not forward configure() call
> ---
>
> Key: KAFKA-8272
> URL: https://issues.apache.org/jira/browse/KAFKA-8272
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>        Reporter: Matthias J. Sax
>Priority: Major
>
> Exposed via KAFKA-3729.
> Because this was not a problem in the past, it might not be necessary to 
> back-port to all versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-6786) Remove additional configs in StreamsBrokerDownResilienceTest

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-6786:
--

Assignee: (was: Abhishek Sharma)

> Remove additional configs in StreamsBrokerDownResilienceTest
> 
>
> Key: KAFKA-6786
> URL: https://issues.apache.org/jira/browse/KAFKA-6786
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Minor
>  Labels: newbie
>
> Since we are now passing in a property file into the streams service 
> initialization code, we do not need to pass in those configs as additional 
> properties in StreamsBrokerDownResilienceTest. We can refactor this test to 
> get rid of the additional properties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-7653) Streams-Scala: Add type level differentiation for Key and Value serdes.

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-7653:
--

Assignee: (was: Mark Tranter)

> Streams-Scala: Add type level differentiation for Key and Value serdes.
> ---
>
> Key: KAFKA-7653
> URL: https://issues.apache.org/jira/browse/KAFKA-7653
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Mark Tranter
>Priority: Minor
>  Labels: kip, scala
>
> Implicit resolution/conversion of Serdes/Consumed etc is a big improvement 
> for the Scala Streams API. However in cases where a user needs to 
> differentiate between Key and Value serializer functionality (i.e. using the 
> Schema Registry), implicit resolution doesn't help and could cause issues. 
> e.g.
> {code:java}
> case class MouseClickEvent(pageId: Long, userId: String)
> builder
>   // Long serde taken from implicit scope configured with
>   // `isKey` = true
>   .stream[Long, MouseClickEvent]("mouse-clicks")
>   .selectKey((_,v) => v.userId)
>   .groupByKey
>   .aggregate(() => 0L, (_: String, mce: MouseClickEvent, count: Long) => 
> count + 1)
>   .toStream
>   // Same Long serde taken from implicit scope configured with
>   // `isKey` = true, even though the `Long` value in this case
>   // will be the Value
>   .to("mouse-clicks-by-user")
> {code}
> It would be ideal if Key and Value Serde/SerdeWrapper types/type classes 
> could be introduced to overcome this limitation.
> KIP-513: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-513%3A+Distinguish+between+Key+and+Value+serdes+in+scala+wrapper+library+for+kafka+streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-7653) Streams-Scala: Add type level differentiation for Key and Value serdes.

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7653:
---
Labels: kip scala  (was: scala)

> Streams-Scala: Add type level differentiation for Key and Value serdes.
> ---
>
> Key: KAFKA-7653
> URL: https://issues.apache.org/jira/browse/KAFKA-7653
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Mark Tranter
>Assignee: Mark Tranter
>Priority: Minor
>  Labels: kip, scala
>
> Implicit resolution/conversion of Serdes/Consumed etc is a big improvement 
> for the Scala Streams API. However in cases where a user needs to 
> differentiate between Key and Value serializer functionality (i.e. using the 
> Schema Registry), implicit resolution doesn't help and could cause issues. 
> e.g.
> {code:java}
> case class MouseClickEvent(pageId: Long, userId: String)
> builder
>   // Long serde taken from implicit scope configured with
>   // `isKey` = true
>   .stream[Long, MouseClickEvent]("mouse-clicks")
>   .selectKey((_,v) => v.userId)
>   .groupByKey
>   .aggregate(() => 0L, (_: String, mce: MouseClickEvent, count: Long) => 
> count + 1)
>   .toStream
>   // Same Long serde taken from implicit scope configured with
>   // `isKey` = true, even though the `Long` value in this case
>   // will be the Value
>   .to("mouse-clicks-by-user")
> {code}
> It would be ideal if Key and Value Serde/SerdeWrapper types/type classes 
> could be introduced to overcome this limitation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-7653) Streams-Scala: Add type level differentiation for Key and Value serdes.

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7653:
---
Description: 
Implicit resolution/conversion of Serdes/Consumed etc is a big improvement for 
the Scala Streams API. However in cases where a user needs to differentiate 
between Key and Value serializer functionality (i.e. using the Schema 
Registry), implicit resolution doesn't help and could cause issues. 

e.g.
{code:java}
case class MouseClickEvent(pageId: Long, userId: String)

builder
  // Long serde taken from implicit scope configured with
  // `isKey` = true
  .stream[Long, MouseClickEvent]("mouse-clicks")
  .selectKey((_,v) => v.userId)
  .groupByKey
  .aggregate(() => 0L, (_: String, mce: MouseClickEvent, count: Long) => count 
+ 1)
  .toStream
  // Same Long serde taken from implicit scope configured with
  // `isKey` = true, even though the `Long` value in this case
  // will be the Value
  .to("mouse-clicks-by-user")
{code}
It would be ideal if Key and Value Serde/SerdeWrapper types/type classes could 
be introduced to overcome this limitation.

KIP-513: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-513%3A+Distinguish+between+Key+and+Value+serdes+in+scala+wrapper+library+for+kafka+streams]

  was:
Implicit resolution/conversion of Serdes/Consumed etc is a big improvement for 
the Scala Streams API. However in cases where a user needs to differentiate 
between Key and Value serializer functionality (i.e. using the Schema 
Registry), implicit resolution doesn't help and could cause issues. 

e.g.
{code:java}
case class MouseClickEvent(pageId: Long, userId: String)

builder
  // Long serde taken from implicit scope configured with
  // `isKey` = true
  .stream[Long, MouseClickEvent]("mouse-clicks")
  .selectKey((_,v) => v.userId)
  .groupByKey
  .aggregate(() => 0L, (_: String, mce: MouseClickEvent, count: Long) => count 
+ 1)
  .toStream
  // Same Long serde taken from implicit scope configured with
  // `isKey` = true, even though the `Long` value in this case
  // will be the Value
  .to("mouse-clicks-by-user")
{code}
It would be ideal if Key and Value Serde/SerdeWrapper types/type classes could 
be introduced to overcome this limitation.


> Streams-Scala: Add type level differentiation for Key and Value serdes.
> ---
>
> Key: KAFKA-7653
> URL: https://issues.apache.org/jira/browse/KAFKA-7653
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Mark Tranter
>Assignee: Mark Tranter
>Priority: Minor
>  Labels: kip, scala
>
> Implicit resolution/conversion of Serdes/Consumed etc is a big improvement 
> for the Scala Streams API. However in cases where a user needs to 
> differentiate between Key and Value serializer functionality (i.e. using the 
> Schema Registry), implicit resolution doesn't help and could cause issues. 
> e.g.
> {code:java}
> case class MouseClickEvent(pageId: Long, userId: String)
> builder
>   // Long serde taken from implicit scope configured with
>   // `isKey` = true
>   .stream[Long, MouseClickEvent]("mouse-clicks")
>   .selectKey((_,v) => v.userId)
>   .groupByKey
>   .aggregate(() => 0L, (_: String, mce: MouseClickEvent, count: Long) => 
> count + 1)
>   .toStream
>   // Same Long serde taken from implicit scope configured with
>   // `isKey` = true, even though the `Long` value in this case
>   // will be the Value
>   .to("mouse-clicks-by-user")
> {code}
> It would be ideal if Key and Value Serde/SerdeWrapper types/type classes 
> could be introduced to overcome this limitation.
> KIP-513: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-513%3A+Distinguish+between+Key+and+Value+serdes+in+scala+wrapper+library+for+kafka+streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-7075) Allow Topology#addGlobalStore to add a window store

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-7075:
--

Assignee: (was: Nishanth Pradeep)

> Allow Topology#addGlobalStore to add a window store
> ---
>
> Key: KAFKA-7075
> URL: https://issues.apache.org/jira/browse/KAFKA-7075
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> Today although {{Topology#addGlobalStore}} can take any {{StateStore}} types, 
> the internal implementation {{InternalTopologyBuilder#addGlobalStore}} only 
> accepts {{StoreBuilder}}. It means if users pass in a windowed 
> store builder in {{Topology#addGlobalStore}} it will cause a runtime 
> ClassCastException.
> We should fix this issue by relaxing the {{InternalTopologyBuilder}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2022-12-28 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652620#comment-17652620
 ] 

Matthias J. Sax commented on KAFKA-13295:
-

[~ableegoldman] – I am wondering if this issue is resolved via 
https://issues.apache.org/jira/browse/KAFKA-14294 implicitly?

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-10493) KTable out-of-order updates are not being ignored

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-10493.
-
Fix Version/s: (was: 4.0.0)
 Assignee: (was: Matthias J. Sax)
   Resolution: Won't Fix

> KTable out-of-order updates are not being ignored
> -
>
> Key: KAFKA-10493
> URL: https://issues.apache.org/jira/browse/KAFKA-10493
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Pedro Gontijo
>Priority: Blocker
> Attachments: KTableOutOfOrderBug.java, out-of-order-table.png
>
>
> On a materialized KTable, out-of-order records for a given key (records which 
> timestamp are older than the current value in store) are not being ignored 
> but used to update the local store value and also being forwarded.
> I believe the bug is here: 
> [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ValueAndTimestampSerializer.java#L77]
>  It should return true, not false (see javadoc)
> The bug impacts here: 
> [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableSource.java#L142-L148]
> I have attached a simple stream app that shows the issue happening.
> Thank you!
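The core of the reported bug is the timestamp comparison that decides whether an incoming record is out of order. The intended semantics per the ticket can be sketched as a small helper (hypothetical; this is not the actual ValueAndTimestampSerializer code):

```java
public class OutOfOrderCheck {
    // A new record is out of order -- and should be ignored by the
    // materialized KTable -- when its timestamp is strictly older than
    // the timestamp currently stored for the same key. Equal timestamps
    // are treated as in order, so the latest write wins.
    public static boolean shouldIgnore(long storedTimestamp, long newTimestamp) {
        return newTimestamp < storedTimestamp;
    }
}
```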



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10493) KTable out-of-order updates are not being ignored

2022-12-28 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652613#comment-17652613
 ] 

Matthias J. Sax commented on KAFKA-10493:
-

Given that the "versioned KTable" KIP is approved, I am closing this ticket. Cf 
https://issues.apache.org/jira/browse/KAFKA-14491 

If there are objections, please let us know.

> KTable out-of-order updates are not being ignored
> -
>
> Key: KAFKA-10493
> URL: https://issues.apache.org/jira/browse/KAFKA-10493
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.6.0
>Reporter: Pedro Gontijo
>Assignee: Matthias J. Sax
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: KTableOutOfOrderBug.java, out-of-order-table.png
>
>
> On a materialized KTable, out-of-order records for a given key (records which 
> timestamp are older than the current value in store) are not being ignored 
> but used to update the local store value and also being forwarded.
> I believe the bug is here: 
> [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ValueAndTimestampSerializer.java#L77]
>  It should return true, not false (see javadoc)
> The bug impacts here: 
> [https://github.com/apache/kafka/blob/2.6.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableSource.java#L142-L148]
> I have attached a simple stream app that shows the issue happening.
> Thank you!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-8403) Make suppression results queriable

2022-12-28 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652611#comment-17652611
 ] 

Matthias J. Sax commented on KAFKA-8403:


[~vvcephei] – given https://issues.apache.org/jira/browse/KAFKA-13785 do we 
still need this ticket?

> Make suppression results queriable
> --
>
> Key: KAFKA-8403
> URL: https://issues.apache.org/jira/browse/KAFKA-8403
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: Dongjin Lee
>Priority: Major
>  Labels: needs-kip
>
> WIP KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-508%3A+Make+Suppression+State+Queriable]
> The newly added KTable Suppress operator lacks a Materialized variant, which 
> would be useful if you wanted to query the results of the suppression.
> Suppression results will eventually match the upstream results, but the 
> intermediate distinction may be meaningful for some applications. For 
> example, you could want to query only the final results of a windowed 
> aggregation.
>  
> Note: This is _not_ the same as implementing on-disk suppression, which is 
> captured here: https://issues.apache.org/jira/browse/KAFKA-7224



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13881) Add package.java for public package javadoc

2022-12-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-13881.
-
Fix Version/s: 3.4.0
   (was: 3.5.0)
   Resolution: Fixed

> Add package.java for public package javadoc
> ---
>
> Key: KAFKA-13881
> URL: https://issues.apache.org/jira/browse/KAFKA-13881
> Project: Kafka
>  Issue Type: Task
>Reporter: Tom Bentley
>Assignee: Greg Harris
>Priority: Trivial
> Fix For: 3.4.0
>
>
> Our public javadoc ([https://kafka.apache.org/31/javadoc/index.html]) doesn't 
> have any package descriptions, which is a bit intimidating for anyone who is 
> new to the project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14548) Stable streams applications stall due to infrequent restoreConsumer polls

2022-12-27 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652327#comment-17652327
 ] 

Matthias J. Sax commented on KAFKA-14548:
-

Thanks!

> Stable streams applications stall due to infrequent restoreConsumer polls
> -
>
> Key: KAFKA-14548
> URL: https://issues.apache.org/jira/browse/KAFKA-14548
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Greg Harris
>Priority: Major
>
> We have observed behavior with Streams where otherwise healthy applications 
> stall and become unable to process data after a rebalance 
> (https://issues.apache.org/jira/browse/KAFKA-13405). The root cause is that 
> a restoreConsumer can be partitioned from a Kafka cluster with stale 
> metadata, while the mainConsumer is healthy with up-to-date metadata. This is 
> due to both an issue in streams and an issue in the consumer logic.
> In StoreChangelogReader, a long-lived restoreConsumer is kept instantiated 
> while the streams app is running. This consumer is only `poll()`ed when the 
> ChangelogReader::restore method is called and at least one changelog is in 
> the RESTORING state. This may be very infrequent if the streams app is stable.
> This is an anti-pattern, as frequent poll()s are expected to keep kafka 
> consumers in contact with the kafka cluster. Infrequent polls are considered 
> failures from the perspective of the consumer API. From the [official Kafka 
> Consumer 
> documentation|https://kafka.apache.org/33/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html]:
> {noformat}
> The poll API is designed to ensure consumer liveness.
> ...
> So to stay in the group, you must continue to call poll.
> ...
> The recommended way to handle these cases [where the main thread is not ready 
> for more data] is to move message processing to another thread, which allows 
> the consumer to continue calling poll while the processor is still working.
> ...
> Note also that you will need to pause the partition so that no new records 
> are received from poll until after thread has finished handling those 
> previously returned.{noformat}
> With the current behavior, it is expected that the restoreConsumer will fall 
> out of the group regularly and be considered failed, when the rest of the 
> application is running exactly as intended.
> This is not normally an issue, as falling out of the group is easily repaired 
> by joining the group during the next poll. It does mean that there is 
> slightly higher latency to performing a restore, but that does not appear to 
> be a major concern at this time.
> This does become an issue when other deeper assumptions about the usage of 
> Kafka clients are violated. Relevant to this issue, it is assumed by the 
> client metadata management logic that regular polling will take place, and 
> that the regular poll call can be piggy-backed to initiate a metadata update. 
> Without a regular poll, the regular metadata update cannot be performed, and 
> the consumer violates its own `metadata.max.age.ms` configuration. This leads 
> to the restoreConsumer having a much older metadata containing none of the 
> currently live brokers, partitioning it from the cluster.
> Alleviating this failure mode does not _require_ the streams' polling 
> behavior to change, as solutions for all clients have been considered 
> (https://issues.apache.org/jira/browse/KAFKA-3068 and that family of 
> duplicate issues).
> However, as a tactical fix for the issue, and one which does not require a 
> KIP changing the behavior of {_}every kafka client{_}, we should consider 
> changing the restoreConsumer poll behavior to bring it closer to the expected 
> happy-path of at least one poll() every poll.interval.ms.
> If there is another hidden assumption of the clients that relies on regular 
> polling, then this tactical fix may prevent users of the streams library from 
> being affected, reducing the impact of that hidden assumption through 
> defense-in-depth.
> This would also be a backport-able fix for streams users, instead of a fix to 
> the consumers which would only apply to new versions of the consumers.
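The failure condition described above is easy to state numerically: the client is expected to refresh its metadata at least every `metadata.max.age.ms` (default 5 minutes), but the refresh is piggy-backed on poll(), so a consumer that is never polled never refreshes. A toy model of that invariant (the class and field names below are illustrative, not the actual client internals):

```java
public class MetadataAgeDemo {
    static final long METADATA_MAX_AGE_MS = 5 * 60 * 1000L; // client default: 5 minutes

    static final class ConsumerModel {
        long lastMetadataRefreshMs; // in the real client, updated as a side effect of poll()

        ConsumerModel(long startMs) { this.lastMetadataRefreshMs = startMs; }

        // poll() piggy-backs a metadata update once the cached copy is old enough
        void poll(long nowMs) {
            if (nowMs - lastMetadataRefreshMs >= METADATA_MAX_AGE_MS) {
                lastMetadataRefreshMs = nowMs;
            }
        }

        // the invariant the ticket says is violated for the restoreConsumer
        boolean metadataStale(long nowMs) {
            return nowMs - lastMetadataRefreshMs > METADATA_MAX_AGE_MS;
        }
    }

    public static void main(String[] args) {
        long twoHours = 2 * 60 * 60 * 1000L;

        // Stable app: no changelog in RESTORING state, so restore() never polls.
        ConsumerModel restore = new ConsumerModel(0);
        System.out.println("restore stale: " + restore.metadataStale(twoHours)); // true

        // A consumer polled every second keeps its metadata within the max age.
        ConsumerModel regular = new ConsumerModel(0);
        for (long t = 0; t <= twoHours; t += 1000) regular.poll(t);
        System.out.println("regular stale: " + regular.metadataStale(twoHours)); // false
    }
}
```

In this sketch the idle restore consumer's metadata is hours old after a stable run, while the regularly polled consumer never exceeds the configured age.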



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Satish Duggana

2022-12-27 Thread Matthias J. Sax

Congrats!

On 12/27/22 10:20 AM, Kirk True wrote:

Congrats, Satish!

On Fri, Dec 23, 2022, at 10:07 AM, Jun Rao wrote:

Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Satish
Duggana.

Satish has been a long time Kafka contributor since 2017. He is the main
driver behind KIP-405 that integrates Kafka with remote storage, a
significant and much anticipated feature in Kafka.

Congratulations, Satish!

Thanks,

Jun (on behalf of the Apache Kafka PMC)






[jira] [Commented] (KAFKA-14548) Stable streams applications stall due to infrequent restoreConsumer polls

2022-12-27 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652292#comment-17652292
 ] 

Matthias J. Sax commented on KAFKA-14548:
-

Thanks! – I have, of course, an interest in getting this addressed. What client 
ticket would need to be tackled? Are they all linked to this ticket? If we 
understand what needs to be done, I am happy to make a case to get this 
prioritized. Also, what KIP did you refer to?

> Stable streams applications stall due to infrequent restoreConsumer polls
> -
>
> Key: KAFKA-14548
> URL: https://issues.apache.org/jira/browse/KAFKA-14548
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Greg Harris
>Priority: Major
>
> We have observed behavior with Streams where otherwise healthy applications 
> stall and become unable to process data after a rebalance 
> (https://issues.apache.org/jira/browse/KAFKA-13405.) The root cause of which 
> is that a restoreConsumer can be partitioned from a Kafka cluster with stale 
> metadata, while the mainConsumer is healthy with up-to-date metadata. This is 
> due to both an issue in streams and an issue in the consumer logic.
> In StoreChangelogReader, a long-lived restoreConsumer is kept instantiated 
> while the streams app is running. This consumer is only `poll()`ed when the 
> ChangelogReader::restore method is called and at least one changelog is in 
> the RESTORING state. This may be very infrequent if the streams app is stable.
> This is an anti-pattern, as frequent poll()s are expected to keep kafka 
> consumers in contact with the kafka cluster. Infrequent polls are considered 
> failures from the perspective of the consumer API. From the [official Kafka 
> Consumer 
> documentation|https://kafka.apache.org/33/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html]:
> {noformat}
> The poll API is designed to ensure consumer liveness.
> ...
> So to stay in the group, you must continue to call poll.
> ...
> The recommended way to handle these cases [where the main thread is not ready 
> for more data] is to move message processing to another thread, which allows 
> the consumer to continue calling poll while the processor is still working.
> ...
> Note also that you will need to pause the partition so that no new records 
> are received from poll until after thread has finished handling those 
> previously returned.{noformat}
> With the current behavior, it is expected that the restoreConsumer will fall 
> out of the group regularly and be considered failed, when the rest of the 
> application is running exactly as intended.
> This is not normally an issue, as falling out of the group is easily repaired 
> by joining the group during the next poll. It does mean that there is 
> slightly higher latency to performing a restore, but that does not appear to 
> be a major concern at this time.
> This does become an issue when other deeper assumptions about the usage of 
> Kafka clients are violated. Relevant to this issue, it is assumed by the 
> client metadata management logic that regular polling will take place, and 
> that the regular poll call can be piggy-backed to initiate a metadata update. 
> Without a regular poll, the regular metadata update cannot be performed, and 
> the consumer violates its own `metadata.max.age.ms` configuration. This leads 
> to the restoreConsumer having a much older metadata containing none of the 
> currently live brokers, partitioning it from the cluster.
> Alleviating this failure mode does not _require_ the streams' polling 
> behavior to change, as solutions for all clients have been considered 
> (https://issues.apache.org/jira/browse/KAFKA-3068 and that family of 
> duplicate issues).
> However, as a tactical fix for the issue, and one which does not require a 
> KIP changing the behavior of {_}every kafka client{_}, we should consider 
> changing the restoreConsumer poll behavior to bring it closer to the expected 
> happy-path of at least one poll() every poll.interval.ms.
> If there is another hidden assumption of the clients that relies on regular 
> polling, then this tactical fix may prevent users of the streams library from 
> being affected, reducing the impact of that hidden assumption through 
> defense-in-depth.
> This would also be a backport-able fix for streams users, instead of a fix to 
> the consumers which would only apply to new versions of the consumers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14453) Flaky test suite MirrorConnectorsWithCustomForwardingAdminIntegrationTest

2022-12-27 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652290#comment-17652290
 ] 

Matthias J. Sax commented on KAFKA-14453:
-

[~ChrisEgerton] – Jenkins seems to be in very bad shape currently, and I did 
not get a single green PR build for a while – seems this test fails frequently. 
Would you have time to pick it up?

> Flaky test suite MirrorConnectorsWithCustomForwardingAdminIntegrationTest
> -
>
> Key: KAFKA-14453
> URL: https://issues.apache.org/jira/browse/KAFKA-14453
> Project: Kafka
>  Issue Type: Test
>  Components: mirrormaker
>Reporter: Chris Egerton
>Priority: Major
>  Labels: flaky-test
>
> We've been seeing some integration test failures lately for the 
> {{MirrorConnectorsWithCustomForwardingAdminIntegrationTest}} test suite. A 
> couple examples:
> {{org.opentest4j.AssertionFailedError: Condition not met within timeout 
> 6. Topic: mm2-offset-syncs.backup.internal didn't get created in the 
> FakeLocalMetadataStore ==> expected:  but was: }}
> {{    at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)}}
> {{    at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)}}
> {{    at 
> app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)}}
> {{    at 
> app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)}}
> {{    at 
> app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)}}
> {{    at 
> app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)}}
> {{    at 
> app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)}}
> {{    at 
> app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)}}
> {{    at 
> app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)}}
> {{    at 
> app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308)}}
> {{    at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.waitForTopicToPersistInFakeLocalMetadataStore(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:326)}}
> {{    at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.testReplicationIsCreatingTopicsUsingProvidedForwardingAdmin(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:217)}}
> {{}}
>  
> And:
>  
> {{org.opentest4j.AssertionFailedError: Condition not met within timeout 
> 6. Topic: primary.test-topic-1's configs don't have partitions:11 ==> 
> expected:  but was: }}
> {{    }}{{at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)}}
> {{    }}{{at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)}}
> {{    }}{{at 
> org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)}}
> {{    }}{{at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)}}
> {{    }}{{at 
> org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)}}
> {{    }}{{at 
> org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)}}
> {{    }}{{at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)}}
> {{    }}{{at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)}}
> {{    }}{{at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)}}
> {{    }}{{at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308)}}
> {{    }}{{at 
> org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.waitForTopicConfigPersistInFakeLocalMetaDataStore(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:334)}}
> {{    }}{{at 
> org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.testCreatePartitionsUseProvidedForwardingAdmin(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:255)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14548) Stable streams applications stall due to infrequent restoreConsumer polls

2022-12-22 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651440#comment-17651440
 ] 

Matthias J. Sax commented on KAFKA-14548:
-

{quote}This is an anti-pattern, as frequent poll()s are expected to keep kafka 
consumers in contact with the kafka cluster.
{quote}
Well, not really. Note that the JavaDoc you quote is about a consumer that is 
part of a consumer group. However, the restore consumer is a "stand-alone" 
consumer that is not part of any group, so periodic polling is not necessary: 
there is no consumer group, group management, heartbeating, etc.
{quote}Without a regular poll, the regular metadata update cannot be performed, 
and the consumer violates its own `metadata.max.age.ms` configuration. This 
leads to the restoreConsumer having a much older metadata containing none of 
the currently live brokers, partitioning it from the cluster.
{quote}
I am not an expert on the consumer, but I would expect that the restore 
consumer would refresh its metadata when we use it again if its cached 
metadata aged out (for any API call, not just poll()) after a longer pause. 
Thus, as long as its bootstrap servers are reachable, it should be able to 
refresh its metadata. Back in the days I filed a follow-up ticket for the 
clients about cached IPs: https://issues.apache.org/jira/browse/KAFKA-13467 

So far, I still think that there is nothing we can (i.e., should) do in 
Streams – if it's a client issue, we should not put a workaround into Streams 
to mask the client issue but rather fix the client.

Thoughts?

> Stable streams applications stall due to infrequent restoreConsumer polls
> -
>
> Key: KAFKA-14548
> URL: https://issues.apache.org/jira/browse/KAFKA-14548
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Greg Harris
>Priority: Major
>
> We have observed behavior with Streams where otherwise healthy applications 
> stall and become unable to process data after a rebalance 
> (https://issues.apache.org/jira/browse/KAFKA-13405.) The root cause of which 
> is that a restoreConsumer can be partitioned from a Kafka cluster with stale 
> metadata, while the mainConsumer is healthy with up-to-date metadata. This is 
> due to both an issue in streams and an issue in the consumer logic.
> In StoreChangelogReader, a long-lived restoreConsumer is kept instantiated 
> while the streams app is running. This consumer is only `poll()`ed when the 
> ChangelogReader::restore method is called and at least one changelog is in 
> the RESTORING state. This may be very infrequent if the streams app is stable.
> This is an anti-pattern, as frequent poll()s are expected to keep kafka 
> consumers in contact with the kafka cluster. Infrequent polls are considered 
> failures from the perspective of the consumer API. From the [official Kafka 
> Consumer 
> documentation|https://kafka.apache.org/33/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html]:
> {noformat}
> The poll API is designed to ensure consumer liveness.
> ...
> So to stay in the group, you must continue to call poll.
> ...
> The recommended way to handle these cases [where the main thread is not ready 
> for more data] is to move message processing to another thread, which allows 
> the consumer to continue calling poll while the processor is still working.
> ...
> Note also that you will need to pause the partition so that no new records 
> are received from poll until after thread has finished handling those 
> previously returned.{noformat}
> With the current behavior, it is expected that the restoreConsumer will fall 
> out of the group regularly and be considered failed, when the rest of the 
> application is running exactly as intended.
> This is not normally an issue, as falling out of the group is easily repaired 
> by joining the group during the next poll. It does mean that there is 
> slightly higher latency to performing a restore, but that does not appear to 
> be a major concern at this time.
> This does become an issue when other deeper assumptions about the usage of 
> Kafka clients are violated. Relevant to this issue, it is assumed by the 
> client metadata management logic that regular polling will take place, and 
> that the regular poll call can be piggy-backed to initiate a metadata update. 
> Without a regular poll, the regular metadata update cannot be performed, and 
> the consumer violates its own `metadata.max.age.ms` configuration. This leads 
> to the restoreConsumer having a much older metadata containing none of the 
> currently live brokers, partitioning it fro

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-12-22 Thread Matthias J. Sax

Thanks.

Glad we could align.


-Matthias

On 12/21/22 2:09 AM, Sagar wrote:

Hi All,

Just as an update, the changes described here:

```
Changes at a high level are:

1) KeyQueryMetadata enhanced to have a new method called partitions().
2) Lifting the restriction of a single partition for IQ. Now the
restriction holds only for FK Join.
```

are reverted. As things stand, KeyQueryMetadata exposes only the
partition() method and the restriction to a single partition is added back
for IQ. This has been done based on the points raised by Matthias above.

The KIP has been updated accordingly.

Thanks!
Sagar.
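For readers following the thread: the multicast mechanism under discussion lets a partitioner name a set of sink partitions instead of exactly one. A standalone sketch of that shape (this is a hypothetical analogue written for illustration, not the actual Streams StreamPartitioner interface; the return-value semantics in the comments reflect my reading of the KIP discussion):

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class MulticastDemo {
    // Hypothetical analogue of a multicast-capable partitioner: a set of
    // partitions means multicast, and an empty Optional would mean "fall
    // back to default partitioning".
    interface MultiPartitioner<K, V> {
        Optional<Set<Integer>> partitions(String topic, K key, V value, int numPartitions);
    }

    // Broadcast every record to all partitions of the sink topic.
    static final class Broadcast<K, V> implements MultiPartitioner<K, V> {
        public Optional<Set<Integer>> partitions(String topic, K key, V value, int numPartitions) {
            Set<Integer> all = new TreeSet<>();
            for (int p = 0; p < numPartitions; p++) all.add(p);
            return Optional.of(all);
        }
    }

    public static void main(String[] args) {
        Broadcast<String, String> b = new Broadcast<>();
        System.out.println(b.partitions("topic", "k", "v", 3).get()); // [0, 1, 2]
    }
}
```

The IQ change discussed above follows directly from this shape: once a key may live in several partitions, a metadata lookup for that key has to be able to report all of them, not just one.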

On Sat, Dec 10, 2022 at 12:09 AM Sagar  wrote:


Hey Matthias,

Actually I had shared the PR link for any potential issues that might have
gone missing. I guess it didn't come out that way in my response. Apologies
for that!

I am more than happy to incorporate any feedback/changes or address any
concerns that are still present around this at this point as well.

And I would keep in mind the feedback to provide more time in such a
scenario.

Thanks!
Sagar.

On Fri, Dec 9, 2022 at 11:41 PM Matthias J. Sax  wrote:


It is what it is.


we did have internal discussions on this


We sometimes have the case that a KIP needs adjustment as stuff is
discovered during coding. And having a discussion on the PR about it is
fine. -- However, before the PR gets merged, the KIP change should be
announced to verify that nobody has objections to the change, before we
carry forward.

It's up to the committer who reviews/merges the PR to ensure that this
process is followed IMHO. I hope we can do better next time.

(And yes, there was the 3.4 release KIP deadline that might explain it,
but it seems important that we give enough time to make "tricky" changes
and not rush into stuff IMHO.)


-Matthias


On 12/8/22 7:04 PM, Sagar wrote:

Thanks Matthias,

Well, as things stand, we did have internal discussions on this and it
seemed ok to open it up for IQ and more importantly not ok to have it
opened up for FK-Join. And more importantly, the PR for this is already
merged and some of these things came up during that. Here's the PR link:
https://github.com/apache/kafka/pull/12803.

Thanks!
Sagar.


On Fri, Dec 9, 2022 at 5:15 AM Matthias J. Sax 

wrote:



Ah. Missed it as it does not have a nice "code block" similar to
`StreamPartitioner` changes.

I understand the motivation, but I am wondering if we might head into a
tricky direction? State stores (at least the built-in ones) and IQ are
kinda built with the idea of having sharded data, where a multi-cast of
keys is an anti-pattern?

Maybe it's fine, but I also don't want to open Pandora's Box. Are we
sure that generalizing the concepts does not cause issues in the future?

Ie, should we claim that the multi-cast feature should be used for
KStreams only, but not for KTables?

Just want to double check that we are not doing something we regret later.



-Matthias


On 12/7/22 6:45 PM, Sagar wrote:

Hi Matthias,

I did save it. The changes are added under Public Interfaces (Pt#2 about
enhancing KeyQueryMetadata with a partitions method) and throwing
IllegalArgumentException when the StreamPartitioner#partitions method
returns multiple partitions for just FK-Join instead of the earlier decided
FK-Join and IQ.

The background is that for IQ, if the users have multicasted records to
multiple partitions during ingestion but the fetch returns only a single
partition, then it would be wrong. That's why the restriction was lifted
for IQ and that's the reason KeyQueryMetadata now has another partitions()
method to signify the same.

FK-Join also has a similar case, but while reviewing it was felt that
FK-Join on its own is fairly complicated and we don't need this feature
right away so the restriction still exists.

Thanks!
Sagar.


On Wed, Dec 7, 2022 at 9:42 PM Matthias J. Sax 

wrote:



I don't see any update on the wiki about it. Did you forget to hit "save"?


Can you also provide some background? I am not sure right now if I
understand the proposed changes?


-Matthias

On 12/6/22 6:36 PM, Sophie Blee-Goldman wrote:

Thanks Sagar, this makes sense to me -- we clearly need additional changes
to avoid breaking IQ when using this feature, but I agree with continuing
to restrict FKJ since they wouldn't stop working without it, and would
become much harder to reason about (than they already are) if we did
enable them to use it.

And of course, they can still multicast the final results of a FKJ, they
just can't mess with the internal workings of it in this way.

On Tue, Dec 6, 2022 at 9:48 AM Sagar  wrote:



Hi All,

I made a couple of edits to the KIP which came up during the code

review.

Changes at a high level are:

1) KeyQueryMetadata enhanced to have a new method called partitions().
2) Lifting the restriction of a single partition for IQ. Now the
restriction holds only for FK Join.

Updated KIP:






https://cwi

Re: [ANNOUNCE] New committer: Josep Prat

2022-12-20 Thread Matthias J. Sax

Congrats!

On 12/20/22 12:01 PM, Josep Prat wrote:

Thank you all!

———
Josep Prat

Aiven Deutschland GmbH

Immanuelkirchstraße 26, 10405 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

m: +491715557497

w: aiven.io

e: josep.p...@aiven.io

On Tue, Dec 20, 2022, 20:42 Bill Bejeck  wrote:


Congratulations Josep!

-Bill

On Tue, Dec 20, 2022 at 1:11 PM Mickael Maison 
wrote:


Congratulations Josep!

On Tue, Dec 20, 2022 at 6:55 PM Bruno Cadonna 

wrote:


Congrats, Josep!

Well deserved!

Best,
Bruno

On 20.12.22 18:40, Kirk True wrote:

Congrats Josep!

On Tue, Dec 20, 2022, at 9:33 AM, Jorge Esteban Quilcate Otoya wrote:

Congrats Josep!!

On Tue, 20 Dec 2022, 17:31 Greg Harris,




wrote:


Congratulations Josep!

On Tue, Dec 20, 2022 at 9:29 AM Chris Egerton <

fearthecel...@gmail.com>

wrote:


Congrats Josep! Well-earned.

On Tue, Dec 20, 2022, 12:26 Jun Rao 

wrote:



Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka

committer

Josep

   Prat.

Josep has been contributing to Kafka since May 2021. He

contributed 20

PRs

including the following 2 KIPs.

KIP-773 Differentiate metric latency measured in ms and ns
KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface

with

internal implementation

Congratulations, Josep!

Thanks,

Jun (on behalf of the Apache Kafka PMC)


















[jira] [Commented] (KAFKA-14537) Materialized with / as ordering issues

2022-12-20 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649894#comment-17649894
 ] 

Matthias J. Sax commented on KAFKA-14537:
-

What you observe is behavior by design. Note that some methods (like `as()` and 
`with()`) are {_}static{_}. Those static methods are the "entry point" to 
create a new `Materialized` object with default config. If you have a 
`Materialized` object at hand, you can modify it by calling non-static methods.

It's a Java thing that you should never call a static method on an 
{_}object{_}. If you call a static method, it just creates a new object, and 
you cannot chain method calls to modify an existing object.

Closing this ticket as "not a bug".
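The pitfall fits in a few lines. Below is a simplified stand-in for `Materialized` (the names `Conf`, `as`, `with`, and `withSerde` are hypothetical, chosen only to mirror the static/instance split), showing how calling a static method through an instance silently discards the already-configured state:

```java
public class StaticChainDemo {
    // Simplified analogue of a builder with static factories and instance modifiers.
    static final class Conf {
        final String name;   // like the store name
        final String serde;  // like a configured serde

        private Conf(String name, String serde) { this.name = name; this.serde = serde; }

        // static factories: always start from defaults
        static Conf as(String name) { return new Conf(name, "default"); }
        static Conf with(String serde) { return new Conf(null, serde); }

        // instance method: copies, keeping the other fields
        Conf withSerde(String serde) { return new Conf(this.name, serde); }
    }

    public static void main(String[] args) {
        // Safe: one static entry point, then instance methods.
        Conf good = Conf.as("flattenedTable").withSerde("avro");

        // Pitfall: Java lets you call a static method "through" an object,
        // but the receiver is ignored and a fresh object is built -- the
        // previously configured name is lost.
        Conf bad = Conf.as("flattenedTable").with("avro");

        System.out.println(good.name + "/" + good.serde); // flattenedTable/avro
        System.out.println(bad.name + "/" + bad.serde);   // null/avro
    }
}
```

Java resolves `Conf.as("x").with("y")` as the static call `Conf.with("y")`; the receiver is thrown away, which is why compilers warn about accessing static members through an instance reference.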

> Materialized with / as ordering issues
> --
>
> Key: KAFKA-14537
> URL: https://issues.apache.org/jira/browse/KAFKA-14537
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
> Environment: Java 17
>Reporter: Matt Allwood
>Priority: Major
>
> I have found a couple of cases where using Materialized .with and .as in the 
> wrong order can either remove your configured serdes, reverting to the 
> default configured serdes, or remove the store name, reverting to the 
> auto-generated Kafka store name.
> This does not appear to affect .withKeySerde().withValueSerde pairs
> My most recent example is using a simple Processor node followed by a 
> toTable()
> {code:java}
> .process(new PreferenceFlatteningProcessor(PREFERENCE_FLATTENING_STORE_NAME), 
>  PREFERENCE_FLATTENING_STORE_NAME)
>                 .toTable(
>                         Materialized
>                          
> .with(avroSerdes.commodityRegionValuesWithCLKeySerde, Serdes.Integer())
>                          .as("flattenedUserPreferencesTable"))
>  {code}
> (apologies for formatting - it's difficult to see in JIRA)
> Having the .as in the above resulted in toTable failing as it tried to use 
> the default serdes rather than those provided. It was a confusing bug, as the 
> error suggested that the issue was in my .process() code in serialising the 
> record rather than in the following toTable().
> As mentioned I have also encountered issues with the names going missing, but 
> didn't raise that at the time, as it was an annoyance rather than crashing my 
> application.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14537) Materialized with / as ordering issues

2022-12-20 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14537.
-
Resolution: Not A Bug

> Materialized with / as ordering issues
> --
>
> Key: KAFKA-14537
> URL: https://issues.apache.org/jira/browse/KAFKA-14537
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
> Environment: Java 17
>Reporter: Matt Allwood
>Priority: Major
>
> I have found a couple of cases where using Materialized .with and .as in the 
> wrong order can either remove your configured serdes, reverting to the 
> default configured serdes, or remove the store name, reverting to the 
> auto-generated Kafka store name.
> This does not appear to affect .withKeySerde().withValueSerde pairs
> My most recent example is using a simple Processor node followed by a 
> toTable()
> {code:java}
> .process(new PreferenceFlatteningProcessor(PREFERENCE_FLATTENING_STORE_NAME), 
>  PREFERENCE_FLATTENING_STORE_NAME)
>                 .toTable(
>                         Materialized
>                          
> .with(avroSerdes.commodityRegionValuesWithCLKeySerde, Serdes.Integer())
>                          .as("flattenedUserPreferencesTable"))
>  {code}
> (apologies for formatting - it's difficult to see in JIRA)
> Having the .as in the above resulted in toTable failing as it tried to use 
> the default serdes rather than those provided. It was a confusing bug, as the 
> error suggested that the issue was in my .process() code in serialising the 
> record rather than in the following toTable().
> As mentioned I have also encountered issues with the names going missing, but 
> didn't raise that at the time, as it was an annoyance rather than crashing my 
> application.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)



Re: [VOTE] KIP-889 Versioned State Stores

2022-12-19 Thread Matthias J. Sax

+1 (binding)

On 12/15/22 1:27 PM, John Roesler wrote:

Thanks for the thorough KIP, Victoria!

I'm +1 (binding)

-John

On 2022/12/15 19:56:21 Victoria Xia wrote:

Hi all,

I'd like to start a vote on KIP-889 for introducing versioned key-value
state stores to Kafka Streams:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores

The discussion thread has been open for a few weeks now and has converged
among the current participants.

Thanks,
Victoria



Re: [ANNOUNCE] New committer: Ron Dagostino

2022-12-16 Thread Matthias J. Sax

Congrats!

On 12/15/22 7:09 AM, Rajini Sivaram wrote:

Congratulations, Ron! Well deserved!!

Regards,

Rajini

On Thu, Dec 15, 2022 at 11:42 AM Ron Dagostino  wrote:


Thank you, everyone!

Ron


On Dec 15, 2022, at 5:09 AM, Bruno Cadonna  wrote:

Congrats Ron!

Best,
Bruno


On 15.12.22 10:23, Viktor Somogyi-Vass wrote:
Congrats Ron! :)

On Thu, Dec 15, 2022 at 10:22 AM Mickael Maison <

mickael.mai...@gmail.com>

wrote:
Congratulations Ron!

On Thu, Dec 15, 2022 at 9:41 AM Eslam Farag 

wrote:


Congratulations, Ron ☺️

On Thu, 15 Dec 2022 at 10:40 AM Tom Bentley 

wrote:



Congratulations!

On Thu, 15 Dec 2022 at 07:40, Satish Duggana <

satish.dugg...@gmail.com



wrote:


Congratulations, Ron!!

On Thu, 15 Dec 2022 at 07:48, ziming deng 


wrote:


Congratulations, Ron!
Well deserved!

--
Ziming


On Dec 15, 2022, at 09:16, Luke Chen  wrote:

Congratulations, Ron!
Well deserved!

Luke














