[jira] [Assigned] (KAFKA-8134) ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients 2.1
[ https://issues.apache.org/jira/browse/KAFKA-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruvil Shah reassigned KAFKA-8134: --- Assignee: Dhruvil Shah > ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients > 2.1 > - > > Key: KAFKA-8134 > URL: https://issues.apache.org/jira/browse/KAFKA-8134 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 2.1.0 >Reporter: Sam Lendle >Assignee: Dhruvil Shah >Priority: Major > > Prior to 2.1, the type of the "linger.ms" config was Long, but was changed to > Integer in 2.1.0 ([https://github.com/apache/kafka/pull/5270]). A config using > a Long value for that parameter, which works with kafka-clients < 2.1, will > cause a ConfigException to be thrown when constructing a KafkaProducer if > kafka-clients is upgraded to >= 2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
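A portable way around the incompatibility described above is to pass the value as a string, which the config parser coerces to the declared numeric type on either side of the change. A minimal sketch using only java.util.Properties (the commented-out line stands in for the call that would trigger the ConfigException under kafka-clients >= 2.1):

```java
import java.util.Properties;

public class LingerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Fragile: a boxed Long satisfied the Long-typed definition in
        // kafka-clients < 2.1, but fails validation against the Integer
        // definition in >= 2.1, so new KafkaProducer<>(props) would throw
        // a ConfigException:
        // props.put(ProducerConfig.LINGER_MS_CONFIG, 100L);

        // Portable: a string value is parsed into whatever numeric type
        // the config definition declares, so it works with both lines.
        props.put("linger.ms", "100");

        System.out.println(props.getProperty("linger.ms"));
    }
}
```

This sketch only illustrates how the value would be supplied; the actual validation happens inside the KafkaProducer constructor.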
[jira] [Commented] (KAFKA-8134) ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients 2.1
[ https://issues.apache.org/jira/browse/KAFKA-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801386#comment-16801386 ] ASF GitHub Bot commented on KAFKA-8134: --- dhruvilshah3 commented on pull request #6502: KAFKA-8134: `linger.ms` must be a long URL: https://github.com/apache/kafka/pull/6502 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients > 2.1 > - -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7986) distinguish the logging from different ZooKeeperClient instances
[ https://issues.apache.org/jira/browse/KAFKA-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801307#comment-16801307 ] ASF GitHub Bot commented on KAFKA-7986: --- junrao commented on pull request #6493: KAFKA-7986: Distinguish logging from different ZooKeeperClient instances URL: https://github.com/apache/kafka/pull/6493 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > distinguish the logging from different ZooKeeperClient instances > > > Key: KAFKA-7986 > URL: https://issues.apache.org/jira/browse/KAFKA-7986 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Ivan Yurchenko >Priority: Major > Labels: newbie > > It's possible for each broker to have more than 1 ZooKeeperClient instance. > For example, SimpleAclAuthorizer creates a separate ZooKeeperClient instance > when configured. It would be useful to distinguish the logging from different > ZooKeeperClient instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7991) Add StreamsUpgradeTest for 2.2 release
[ https://issues.apache.org/jira/browse/KAFKA-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax reassigned KAFKA-7991: -- Assignee: John Roesler > Add StreamsUpgradeTest for 2.2 release > -- > > Key: KAFKA-7991 > URL: https://issues.apache.org/jira/browse/KAFKA-7991 > Project: Kafka > Issue Type: Task > Components: streams, system tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Assignee: John Roesler >Priority: Blocker > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8156) Client id when provided is not suffixed with an index
Nagaraj Gopal created KAFKA-8156: Summary: Client id when provided is not suffixed with an index Key: KAFKA-8156 URL: https://issues.apache.org/jira/browse/KAFKA-8156 Project: Kafka Issue Type: Improvement Components: consumer Affects Versions: 2.1.1 Reporter: Nagaraj Gopal We use the Camel Kafka component, and one of its configuration options is consumersCount, the number of concurrent consumers that read data from the topic. Usually we don't care about the client id, but once we start emitting metrics it becomes an important piece of the puzzle. The client id would help differentiate metrics between different consumers, each with `n` concurrent consumers and each deployed in a different JVM. Currently, when a client id is provided it is not suffixed with an index, whereas when it is not provided the library creates its own client id suffixed with an index (format: consumer-0, consumer-1). This is limiting when we have multiple consumers as described above. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
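Until the library does this itself, one workaround when constructing the consumers yourself is to suffix the configured client id with the consumer index before building each KafkaConsumer. A sketch with assumed names (`baseClientId` and the loop are illustrative, not part of Camel's API):

```java
import java.util.Properties;

public class ClientIdSuffixExample {
    public static void main(String[] args) {
        String baseClientId = "orders-service"; // assumed base client id
        int consumersCount = 3;                 // e.g. Camel's consumersCount

        for (int i = 0; i < consumersCount; i++) {
            Properties props = new Properties();
            // Suffix the index so each concurrent consumer reports
            // metrics under a distinct client id, mirroring the
            // consumer-0, consumer-1 pattern the library uses when no
            // client id is configured.
            props.put("client.id", baseClientId + "-" + i);
            System.out.println(props.getProperty("client.id"));
        }
    }
}
```

Each Properties object would then be handed to its own KafkaConsumer instance.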
[jira] [Commented] (KAFKA-8151) Broker hangs and lockups after Zookeeper outages
[ https://issues.apache.org/jira/browse/KAFKA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801228#comment-16801228 ] Jeff Nadler commented on KAFKA-8151: We've seen similar problems with 2.1.0 and 2.1.1, and we don't use SSL. The issues are always preceded by a ZK disconnect like yours: WARN Client session timed out, have not heard from server in 5334ms for sessionid 0x400216eb36a0275 ... but if we have any 'real' ZK unavailability it's very brief - I even have GC logging enabled and the GC pauses are very short. I rolled all of our clusters back to 2.0.1 and we're stable again. > Broker hangs and lockups after Zookeeper outages > > > Key: KAFKA-8151 > URL: https://issues.apache.org/jira/browse/KAFKA-8151 > Project: Kafka > Issue Type: Bug > Components: controller, core, zkclient >Affects Versions: 2.1.1 >Reporter: Joe Ammann >Priority: Major > Attachments: symptom3_lxgurten_kafka_dump1.txt, > symptom3_lxgurten_kafka_dump2.txt, symptom3_lxgurten_kafka_dump3.txt > > > We're running several clusters (mostly with 3 brokers) with 2.1.1, where we > see at least 3 different symptoms, all resulting in broker/controller lockups. > We are pretty sure that the triggering cause for all these symptoms is > temporary outages (for 3-5 minutes normally) of the Zookeeper cluster. The Linux VMs > where the ZK nodes run regularly get stalled for a couple of minutes. The > ZK nodes always very quickly reunite and build a quorum after the situation > clears, but the Kafka brokers (which run on the same Linux VMs) quite often > show problems after this procedure. > I've seen 3 different kinds of problems (this is why I put "reproduce" in > quotes, I can never predict what will happen) > # the brokers get their ZK sessions expired (obviously) and sometimes only 2 > of 3 re-register under /brokers/ids. The 3rd broker doesn't re-register for > some reason (that's the problem I originally described) > # the brokers all re-register and re-elect a new controller, but that new > controller does not fully work. For example, it doesn't process partition > reassignment requests or does not transfer partition leadership after I > kill a broker > # the previous controller gets "dead-locked" (it has 3-4 of the important > controller threads in a lock) and hence does not perform any of its > controller duties, but it still regards itself as the valid controller and > is accepted by the other brokers > I'll try to describe each of the problems in more detail below, and hope > to be able to clearly separate them. > I'm able to provoke these problems in our DEV environment quite regularly > using the following procedure > * make sure all ZK nodes and Kafka brokers are stable and reacting normally > * freeze 2 out of 3 ZK nodes with {{kill -STOP}} for some minutes > * leave the Kafka brokers running; of course they will start complaining about being > unable to reach ZK > * thaw the processes with {{kill -CONT}} > * now all Kafka brokers get notified that their ZK session has expired, and > they start to reorganize the cluster > In about 20% of the tests, I'm able to produce one of the symptoms above. I > cannot predict which one, though. I vary this procedure sometimes by > also freezing one Kafka broker (most often the controller), but until now I > haven't been able to find a clear pattern or really force one specific > symptom > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8151) Broker hangs and lockups after Zookeeper outages
[ https://issues.apache.org/jira/browse/KAFKA-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801214#comment-16801214 ] Joe Ammann commented on KAFKA-8151: --- I still can't reproduce any symptoms in DEV when using PLAINTEXT for interbroker comms. But last night we had 2 occurrences of symptom 2 (all brokers and the controller registered in ZK, but controller actions - e.g. partition leader reassignment - do not happen) in TEST, where I had also enabled PLAINTEXT. So it definitely also happens with PLAINTEXT. > Broker hangs and lockups after Zookeeper outages > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8026) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted
[ https://issues.apache.org/jira/browse/KAFKA-8026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801077#comment-16801077 ] ASF GitHub Bot commented on KAFKA-8026: --- bbejeck commented on pull request #6463: KAFKA-8026: Fix flaky regex source integration test 1.0 URL: https://github.com/apache/kafka/pull/6463 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted > > > Key: KAFKA-8026 > URL: https://issues.apache.org/jira/browse/KAFKA-8026 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 1.0.2, 1.1.1 >Reporter: Matthias J. Sax >Assignee: Bill Bejeck >Priority: Critical > Labels: flaky-test > Fix For: 1.0.3, 1.1.2 > > > {quote}java.lang.AssertionError: Condition not met within timeout 15000. > Stream tasks not updated > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:276) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:254) > at > org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:215){quote} > Happened in 1.0 and 1.1 builds: > [https://builds.apache.org/blue/organizations/jenkins/kafka-1.0-jdk7/detail/kafka-1.0-jdk7/263/tests/] > and > [https://builds.apache.org/blue/organizations/jenkins/kafka-1.1-jdk7/detail/kafka-1.1-jdk7/249/tests/] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8155) Update Streams system tests for 2.2.0 and 2.1.1 releases
John Roesler created KAFKA-8155: --- Summary: Update Streams system tests for 2.2.0 and 2.1.1 releases Key: KAFKA-8155 URL: https://issues.apache.org/jira/browse/KAFKA-8155 Project: Kafka Issue Type: Task Components: streams, system tests Reporter: John Roesler Assignee: John Roesler -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8147) Add changelog topic configuration to KTable suppress
[ https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801023#comment-16801023 ] John Roesler commented on KAFKA-8147: - No worries! I think it's pretty normal to see (and ignore) a lot of updates from the wiki as everyone is editing it. > Add changelog topic configuration to KTable suppress > > > Key: KAFKA-8147 > URL: https://issues.apache.org/jira/browse/KAFKA-8147 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 2.1.1 >Reporter: Maarten >Assignee: Maarten >Priority: Minor > Labels: needs-kip > > The streams DSL does not provide a way to configure the changelog topic > created by KTable.suppress. > From the perspective of an external user this could be implemented similarly to > the configuration of aggregate + materialized, i.e., > {code:java} > changelogTopicConfigs = // Configs > materialized = Materialized.as(..).withLoggingEnabled(changelogTopicConfigs) > .. > KGroupedStream.aggregate(..,materialized) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8154) Buffer Overflow exceptions between brokers and with clients
Rajesh Nataraja created KAFKA-8154: -- Summary: Buffer Overflow exceptions between brokers and with clients Key: KAFKA-8154 URL: https://issues.apache.org/jira/browse/KAFKA-8154 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 2.1.0 Reporter: Rajesh Nataraja Attachments: server.properties.txt https://github.com/apache/kafka/pull/6495 https://github.com/apache/kafka/pull/5785 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7895) KTable suppress operator emitting more than one record for the same key per window
[ https://issues.apache.org/jira/browse/KAFKA-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800950#comment-16800950 ] John Roesler commented on KAFKA-7895: - Hi [~AndrewRK], Sorry to hear that! Looking at your topology, I'd expect it to work as advertised (obviously), and it also looks very similar to what I've tested heavily, so I might need to get some more help from you to investigate the cause. I have a couple of follow-up questions... # Is this restarting after a clean shutdown or a crash? (i.e., how do you stop and restart the app?) # Do you have EOS enabled? (If you don't, can you try to repro with EOS on?) Thanks, -John > KTable suppress operator emitting more than one record for the same key per > window > - > > Key: KAFKA-7895 > URL: https://issues.apache.org/jira/browse/KAFKA-7895 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.1.0, 2.1.1 >Reporter: prasanthi >Assignee: John Roesler >Priority: Major > Fix For: 2.2.0, 2.1.2 > > > Hi, We are using kstreams to get the aggregated counts per vendor (key) within > a specified window. > Here's how we configured the suppress operator to emit one final record per > key/window. > {code:java} > KTable, Long> windowedCount = groupedStream > .windowedBy(TimeWindows.of(Duration.ofMinutes(1)).grace(ofMillis(5L))) > .count(Materialized.with(Serdes.Integer(),Serdes.Long())) > .suppress(Suppressed.untilWindowCloses(unbounded())); > {code} > But we are getting more than one record for the same key/window as shown > below. > {code:java} > [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1039 > [KTABLE-TOSTREAM-10]: [131@154906704/154906710], 1162 > [KTABLE-TOSTREAM-10]: [9@154906704/154906710], 6584 > [KTABLE-TOSTREAM-10]: [88@154906704/154906710], 107 > [KTABLE-TOSTREAM-10]: [108@154906704/154906710], 315 > [KTABLE-TOSTREAM-10]: [119@154906704/154906710], 119 > [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 746 > [KTABLE-TOSTREAM-10]: [154@154906704/154906710], 809{code} > Could you please take a look? > Thanks > > > Added by John: > Acceptance Criteria: > * add suppress to system tests, such that it's exercised with crash/shutdown > recovery, rebalance, etc. > ** [https://github.com/apache/kafka/pull/6278] > * make sure that there's some system test coverage with caching disabled. > ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7943 > * test with tighter time bounds with windows of say 30 seconds and use > system time without adding any extra time for verification > ** Follow-on ticket: https://issues.apache.org/jira/browse/KAFKA-7944 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6988) Kafka windows classpath too long
[ https://issues.apache.org/jira/browse/KAFKA-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800922#comment-16800922 ] ASF GitHub Bot commented on KAFKA-6988: --- ward-eric commented on pull request #6499: KAFKA-6988: Reduce classpath via classpath jar URL: https://github.com/apache/kafka/pull/6499 We hit the issue outlined in #5960; however, we came up with a slightly different solution that attempts to not pollute the classpath with unnecessary items. Instead we build a classpath jar via Gradle and append it to the classpath. The Gradle function handles the regex originally done by `should_include_file`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Kafka windows classpath too long > > > Key: KAFKA-6988 > URL: https://issues.apache.org/jira/browse/KAFKA-6988 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.0, 2.0.0 >Reporter: lkgen >Priority: Major > > On Windows, the kafka-run-class.bat script builds a CLASSPATH with the full > path to each jar. > If the installation is in a long path directory, the CLASSPATH becomes too long > and there is an error of > {{The input line is too long.}} > when running zookeeper-server-start.bat and other commands. > A possible solution may be to not expand all jars individually but to add dir\* > to the classpath. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
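The "classpath jar" technique from the pull request above can be sketched with just the JDK: build an otherwise-empty jar whose manifest carries a Class-Path attribute, then put only that jar on the command line. The entry list below is hypothetical; a real build (the PR does this via Gradle) would enumerate every dependency jar:

```java
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical relative jar paths; a manifest Class-Path is
        // resolved relative to the location of the classpath jar itself.
        String entries = "libs/kafka-clients.jar libs/zookeeper.jar";

        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.CLASS_PATH, entries);

        Path jar = Files.createTempFile("classpath", ".jar");
        // The jar needs no entries: the manifest alone carries the paths.
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jar.toFile()), manifest)) {
        }

        // Read the attribute back to show what the JVM would see.
        try (JarFile jf = new JarFile(jar.toFile())) {
            System.out.println(
                jf.getManifest().getMainAttributes().get(Attributes.Name.CLASS_PATH));
        }
    }
}
```

Launching with `java -cp classpath.jar ...` then keeps the command line short no matter how many jars the manifest lists, which sidesteps the Windows line-length limit.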
[jira] [Resolved] (KAFKA-8150) Fix bugs in handling null arrays in generated RPC code
[ https://issues.apache.org/jira/browse/KAFKA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8150. Resolution: Fixed Fix Version/s: 2.2.1 The code path that this fixes isn't used in 2.2, I think. But I backported the patch to that branch just for the purpose of future-proofing. > Fix bugs in handling null arrays in generated RPC code > -- > > Key: KAFKA-8150 > URL: https://issues.apache.org/jira/browse/KAFKA-8150 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > Fix For: 2.2.1 > > > Fix bugs in handling null arrays in generated RPC code. > toString should not get a NullPointerException. > Also, read() must properly translate a negative array length to a null field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8102) Trogdor - Add Produce workload transaction generator by interval
[ https://issues.apache.org/jira/browse/KAFKA-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800881#comment-16800881 ] ASF GitHub Bot commented on KAFKA-8102: --- cmccabe commented on pull request #6444: KAFKA-8102: Add an interval-based Trogdor transaction generator URL: https://github.com/apache/kafka/pull/6444 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Trogdor - Add Produce workload transaction generator by interval > > > Key: KAFKA-8102 > URL: https://issues.apache.org/jira/browse/KAFKA-8102 > Project: Kafka > Issue Type: Improvement >Reporter: Stanislav Kozlovski >Assignee: Stanislav Kozlovski >Priority: Major > > Trogdor's specification for produce worker workloads > ([ProduceBenchSpec|https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/trogdor/workload/ProduceBenchSpec.java]) > supports configuring a transactional producer using a class that implements > the `TransactionGenerator` interface. The existing > [UniformTransactionsGenerator|https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/trogdor/workload/UniformTransactionsGenerator.java] > triggers a transaction every N records. > It would be useful to have a generator which supports triggering a > transaction at an interval - e.g. every 100 milliseconds. This is how Kafka > Streams configures its own [EOS semantics by > default|https://github.com/apache/kafka/blob/8e975400711b0ea64bf4a00c8c551e448ab48416/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L140]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
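The interval-based behavior being requested can be sketched in a few lines (class and method names here are hypothetical; Trogdor's real hook is the `TransactionGenerator` interface mentioned above): commit whenever the configured interval has elapsed since the last commit, instead of counting records as the uniform generator does:

```java
public class IntervalTransactionDemo {

    // Hypothetical stand-in for a Trogdor transaction generator that
    // triggers a commit every `intervalMs` milliseconds.
    static class TimeIntervalGenerator {
        private final long intervalMs;
        private long lastCommitMs;

        TimeIntervalGenerator(long intervalMs, long startMs) {
            this.intervalMs = intervalMs;
            this.lastCommitMs = startMs;
        }

        // Consulted once per produced record with the current clock value.
        boolean shouldCommit(long nowMs) {
            if (nowMs - lastCommitMs >= intervalMs) {
                lastCommitMs = nowMs; // restart the interval at each commit
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        TimeIntervalGenerator gen = new TimeIntervalGenerator(100, 0);
        System.out.println(gen.shouldCommit(50));   // only 50 ms elapsed
        System.out.println(gen.shouldCommit(120));  // >= 100 ms since last commit
        System.out.println(gen.shouldCommit(130));  // only 10 ms since the commit at 120
    }
}
```

A real implementation would take its clock readings from the worker's time source rather than from arguments, but the commit decision is the same.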
[jira] [Updated] (KAFKA-8150) Fix bugs in handling null arrays in generated RPC code
[ https://issues.apache.org/jira/browse/KAFKA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe updated KAFKA-8150: --- Affects Version/s: 2.2.1 > Fix bugs in handling null arrays in generated RPC code > -- -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8150) Fix bugs in handling null arrays in generated RPC code
[ https://issues.apache.org/jira/browse/KAFKA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800872#comment-16800872 ] ASF GitHub Bot commented on KAFKA-8150: --- cmccabe commented on pull request #6489: KAFKA-8150: Fix bugs in handling null arrays in generated RPC code URL: https://github.com/apache/kafka/pull/6489 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix bugs in handling null arrays in generated RPC code > -- -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8147) Add changelog topic configuration to KTable suppress
[ https://issues.apache.org/jira/browse/KAFKA-8147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800867#comment-16800867 ] Maarten commented on KAFKA-8147: I would like to apologize for the unnecessary KIP updates; I'm working on it but struggling with Confluence at the moment. > Add changelog topic configuration to KTable suppress > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-8026) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted
[ https://issues.apache.org/jira/browse/KAFKA-8026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800858#comment-16800858 ] ASF GitHub Bot commented on KAFKA-8026: --- bbejeck commented on pull request #6459: KAFKA-8026: Fix for flaky RegexSourceIntegrationTest URL: https://github.com/apache/kafka/pull/6459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8014) Extend Connect integration tests to add and remove workers dynamically
[ https://issues.apache.org/jira/browse/KAFKA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch resolved KAFKA-8014. -- Resolution: Fixed > Extend Connect integration tests to add and remove workers dynamically > -- > > Key: KAFKA-8014 > URL: https://issues.apache.org/jira/browse/KAFKA-8014 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Major > Fix For: 2.3.0, 2.1.2, 2.2.1 > > > To allow for even more integration tests that can focus on testing Connect > framework itself, it seems necessary to add the ability to add and remove > workers from within a test case. > The suggestion is to extend Connect's integration test harness > {{EmbeddedConnectCluster}} to include methods to add and remove workers as > well as return the workers that are online at any given point. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8014) Extend Connect integration tests to add and remove workers dynamically
[ https://issues.apache.org/jira/browse/KAFKA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Randall Hauch updated KAFKA-8014: - Fix Version/s: 2.2.1 2.1.2 2.3.0 Description: To allow for even more integration tests that can focus on testing Connect framework itself, it seems necessary to add the ability to add and remove workers from within a test case. The suggestion is to extend Connect's integration test harness {{EmbeddedConnectCluster}} to include methods to add and remove workers as well as return the workers that are online at any given point. was: To allow for even more integration tests that can focus on testing Connect framework itself, it seems necessary to add the ability to add and remove workers from within a test case. The suggestion is to extend Connect's integration test harness {{EmbeddedConnectCluster}} to include methods to add and remove workers as well as return the workers that are online at any given point. > Extend Connect integration tests to add and remove workers dynamically > -- > > Key: KAFKA-8014 > URL: https://issues.apache.org/jira/browse/KAFKA-8014 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Major > Fix For: 2.3.0, 2.1.2, 2.2.1 > > > To allow for even more integration tests that can focus on testing Connect > framework itself, it seems necessary to add the ability to add and remove > workers from within a test case. > The suggestion is to extend Connect's integration test harness > {{EmbeddedConnectCluster}} to include methods to add and remove workers as > well as return the workers that are online at any given point. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-3539) KafkaProducer.send() may block even though it returns the Future
[ https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800714#comment-16800714 ] radai rosenblatt commented on KAFKA-3539: - IIUC, the root of the problem is that the Kafka producer stores compressed batches of messages in a map keyed by the partition those messages are intended for. Since without metadata there's no knowing the layout of a topic, the producer can't tell where to "place" a message, which is why it blocks when metadata is missing. One possible solution would be to have an "unknown" message bucket (with some finite capacity) where messages of unknown destination go. The biggest issue with this is that those messages cannot be compressed (as Kafka compresses batches, not individual messages, and there's no guarantee that everything in the unknown bucket will go into the same batch). Once metadata is obtained, the "unknown bucket" would need to be iterated over, and the messages deposited (and compressed) into the correct queues. This would need to happen when metadata arrives and before any new messages are allowed into the producer (to not violate ordering). > KafkaProducer.send() may block even though it returns the Future > > > Key: KAFKA-3539 > URL: https://issues.apache.org/jira/browse/KAFKA-3539 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Oleg Zhurakousky >Priority: Critical > > You can get more details from us...@kafka.apache.org by searching on the > thread with the subject "KafkaProducer block on send". > The bottom line is that a method that returns a Future must never block, since it > essentially violates the Future contract as it was specifically designed to > return immediately, passing control back to the user to check for completion, > cancel, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
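The "unknown bucket" idea from the comment above can be sketched as follows (all names and the toy round-robin partitioner are purely illustrative, not the producer's actual code): buffer records uncompressed in a bounded queue while metadata is missing, then drain them in order into per-partition batches once metadata arrives and before any new records are accepted:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnknownBucketSketch {

    static final int CAPACITY = 1024;               // finite bucket capacity
    static final ArrayDeque<String> unknown = new ArrayDeque<>();
    static final Map<String, List<String>> batches = new TreeMap<>();

    /** Accept a record whose destination partition is not yet known. */
    static boolean bufferUnknown(String record) {
        if (unknown.size() >= CAPACITY) {
            return false; // bucket full: the caller must block or fail fast
        }
        return unknown.add(record);
    }

    /**
     * Called once metadata arrives, before any new records are accepted,
     * so drained records keep their original send order.
     */
    static void drain(int partitionCount) {
        int next = 0;
        while (!unknown.isEmpty()) {
            String record = unknown.poll();
            // Toy round-robin stand-in for the real partitioner.
            String key = "partition-" + (next++ % partitionCount);
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
        }
    }

    public static void main(String[] args) {
        bufferUnknown("a");
        bufferUnknown("b");
        bufferUnknown("c");
        drain(2); // metadata reports 2 partitions for the topic
        System.out.println(batches);
    }
}
```

As the comment notes, records in the bucket stay uncompressed; compression would only happen here at drain time, when each record joins a per-partition batch.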
[jira] [Commented] (KAFKA-3539) KafkaProducer.send() may block even though it returns the Future
[ https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800656#comment-16800656 ]

Spyridon Ninos commented on KAFKA-3539:
---------------------------------------

Hi guys, have any solutions been proposed? I've hit an issue similar to [~tu...@avast.com]'s, but from studying the code I am not confident that any solution will be much better than the current one, either semantically or technically. That said, some weeks ago I took a look at how to solve the blocking nature of the producer - I'd like to know what others have considered as probable solutions. Any suggestions?

Thanks
[jira] [Updated] (KAFKA-8153) Streaming application with state stores takes up to 1 hour to restart
[ https://issues.apache.org/jira/browse/KAFKA-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Melsen updated KAFKA-8153:
----------------------------------
    Affects Version/s:     (was: 2.1.1)
                       2.1.0

> Streaming application with state stores takes up to 1 hour to restart
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-8153
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8153
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.1.0
>            Reporter: Michael Melsen
>            Priority: Major
>
> We are using Spring Cloud Stream with Kafka Streams 2.0.1 and utilizing the
> InteractiveQueryService to fetch data from the stores. There are 4 stores
> that persist data on disk after aggregating data. The code for the topology
> looks like this:
> {code:java}
> @Slf4j
> @EnableBinding(SensorMeasurementBinding.class)
> public class Consumer {
>
>   public static final String RETENTION_MS = "retention.ms";
>   public static final String CLEANUP_POLICY = "cleanup.policy";
>
>   @Value("${windowstore.retention.ms}")
>   private String retention;
>
>   /**
>    * Process the data flowing in from a Kafka topic. Aggregate the data to:
>    * - 2 minutes
>    * - 15 minutes
>    * - one hour
>    * - 12 hours
>    *
>    * @param stream
>    */
>   @StreamListener(SensorMeasurementBinding.ERROR_SCORE_IN)
>   public void process(KStream<String, SensorMeasurement> stream) {
>     Map<String, String> topicConfig = new HashMap<>();
>     topicConfig.put(RETENTION_MS, retention);
>     topicConfig.put(CLEANUP_POLICY, "delete");
>
>     log.info("Changelog and local window store retention.ms: {} and cleanup.policy: {}",
>         topicConfig.get(RETENTION_MS),
>         topicConfig.get(CLEANUP_POLICY));
>
>     createWindowStore(LocalStore.TWO_MINUTES_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.FIFTEEN_MINUTES_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.ONE_HOUR_STORE, topicConfig, stream);
>     createWindowStore(LocalStore.TWELVE_HOURS_STORE, topicConfig, stream);
>   }
>
>   private void createWindowStore(
>       LocalStore localStore,
>       Map<String, String> topicConfig,
>       KStream<String, SensorMeasurement> stream) {
>
>     // Configure how the state store should be materialized using the provided storeName
>     Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> materialized = Materialized
>         .as(localStore.getStoreName());
>
>     // Set retention of changelog topic
>     materialized.withLoggingEnabled(topicConfig);
>
>     // Configure what the windows look like and how long data will be retained in local stores
>     TimeWindows configuredTimeWindows = getConfiguredTimeWindows(
>         localStore.getTimeUnit(),
>         Long.parseLong(topicConfig.get(RETENTION_MS)));
>
>     // Processing description:
>     // The input data are 'samples' keyed on installation::asset::model::algorithm
>     // 1. With the map we add the Tag to the key and we extract the error score from the data
>     // 2. With the groupByKey we group the data on the new key
>     // 3. With windowedBy we split up the data in time intervals depending on the provided LocalStore enum
>     // 4. With reduce we determine the maximum value in the time window
>     // 5. Materialized will make it stored in a table
>     stream
>         .map(getInstallationAssetModelAlgorithmTagKeyMapper())
>         .groupByKey()
>         .windowedBy(configuredTimeWindows)
>         .reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue), materialized);
>   }
>
>   private TimeWindows getConfiguredTimeWindows(long windowSizeMs, long retentionMs) {
>     TimeWindows timeWindows = TimeWindows.of(windowSizeMs);
>     timeWindows.until(retentionMs);
>     return timeWindows;
>   }
>
>   /**
>    * Determine the max error score to keep by looking at the aggregated error
>    * signal and the freshly consumed error signal
>    *
>    * @param aggValue
>    * @param newValue
>    * @return
>    */
>   private ErrorScore getMaxErrorScore(ErrorScore aggValue, ErrorScore newValue) {
>     if (aggValue.getErrorSignal() > newValue.getErrorSignal()) {
>       return aggValue;
>     }
>     return newValue;
>   }
>
>   private KeyValueMapper<String, SensorMeasurement, KeyValue<String, ErrorScore>>
>       getInstallationAssetModelAlgorithmTagKeyMapper() {
>     return (s, sensorMeasurement) -> new KeyValue<>(s + "::" + sensorMeasurement.getT(),
>         new ErrorScore(sensorMeasurement.getTs(), sensorMeasurement.getE(), sensorMeasurement.getO()));
>   }
> }
> {code}
> So we are materializing aggregated data to four different stores after
> determining the max value within a specific window for a specific key. Please
> note that retention is set to two months of data and the cleanup policy is
> delete. We don't compact data.
> The size of the individual state stores on disk is between 14 and 20 GB of
> data.
> We are making use of Interactive Queries:
[jira] [Created] (KAFKA-8153) Streaming application with state stores takes up to 1 hour to restart
Michael Melsen created KAFKA-8153:
----------------------------------

             Summary: Streaming application with state stores takes up to 1 hour to restart
                 Key: KAFKA-8153
                 URL: https://issues.apache.org/jira/browse/KAFKA-8153
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.1.1
            Reporter: Michael Melsen
We are making use of Interactive Queries:
[https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html#interactive-queries]

In our setup we have 4 instances of our streaming app, used as one consumer group, so every instance stores a specific part of all the data in its store. This all seems to work nicely, until we restart one or more instances and wait for them to become available again. (The restart itself only takes about 3 minutes max.) I would expect that the restart of the app would not take that long, but unfortunately it takes up to 1 hour.
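Long startups like this usually come from replaying changelog topics into the local state stores from scratch. A commonly suggested mitigation (an assumption here, not something proposed in this report) is to keep warm copies of the state via standby replicas and to pin the state directory so a restarted instance can reuse its on-disk files. A minimal config sketch using the literal property names `num.standby.replicas` and `state.dir` (the application id, broker address, and path are illustrative):

```java
import java.util.Properties;

class StandbyConfigSketch {
    // Build Streams properties that keep a warm standby copy of each state
    // store on another instance and pin the state directory so a restarted
    // instance can reuse local store files instead of replaying changelogs.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "error-score-aggregator"); // assumed app id
        props.put("bootstrap.servers", "localhost:9092");      // assumed brokers
        props.put("num.standby.replicas", "1");                // warm replica per store
        props.put("state.dir", "/var/lib/kafka-streams");      // stable, persistent volume
        return props;
    }
}
```

Whether this helps depends on whether the local state directory actually survives the restart (e.g. a persistent volume rather than an ephemeral container filesystem).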
[jira] [Commented] (KAFKA-6820) Improve on StreamsMetrics Public APIs
[ https://issues.apache.org/jira/browse/KAFKA-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800458#comment-16800458 ]

ASF GitHub Bot commented on KAFKA-6820:
---------------------------------------

guozhangwang commented on pull request #6498: [DO NOT MERGE] KAFKA-6820: Refactor Stream Metrics
URL: https://github.com/apache/kafka/pull/6498

> Improve on StreamsMetrics Public APIs
> -------------------------------------
>
>                 Key: KAFKA-6820
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6820
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: needs-kip
>
> Our current `addLatencyAndThroughputSensor` and `addThroughputSensor` are not
> very well designed, and hence not very user-friendly for people adding their
> own customized sensors. We could consider improving this feature. Some
> related things to consider:
> 1. Our internal built-in metrics should be independent of these public APIs,
> which are for user-customized sensors only. See KAFKA-6819 for a related
> description.
> 2. We could constrain the possible values of scopeName, and document well the
> sensor hierarchies that would result from the function calls. In this way the
> library can help close a user's sensors automatically when the corresponding
> scope (store, task, thread, etc.) is being de-constructed.
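Point 2 above (closing a user's sensors automatically when their owning scope goes away) can be illustrated with a small, self-contained sketch. This is plain Java with made-up names (`ScopedSensorRegistry`, `removeScope`), not the StreamsMetrics API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative registry that ties sensors to a scope (store/task/thread). */
class ScopedSensorRegistry {
    private final Map<String, List<String>> sensorsByScope = new HashMap<>();

    /** Register a sensor under a scope such as "task-0_1" or "store-two-minutes". */
    void addSensor(String scope, String sensorName) {
        sensorsByScope.computeIfAbsent(scope, s -> new ArrayList<>()).add(sensorName);
    }

    /**
     * De-construct a scope: every sensor registered under it is removed
     * (i.e., "closed") in one call, so user code cannot leak sensors when a
     * store, task, or thread goes away.
     */
    List<String> removeScope(String scope) {
        List<String> closed = sensorsByScope.remove(scope);
        return closed == null ? new ArrayList<>() : closed;
    }

    int sensorCount() {
        return sensorsByScope.values().stream().mapToInt(List::size).sum();
    }
}
```

With an enforced set of scopeName values, the library (rather than the user) would know exactly when `removeScope` should fire for each scope kind.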