[jira] [Commented] (KAFKA-9066) Kafka Connect JMX : source & sink task metrics missing for tasks in failed state
[ https://issues.apache.org/jira/browse/KAFKA-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044186#comment-17044186 ] Selman Kayrancioglu commented on KAFKA-9066: We're also experiencing this with version 2.3.0. > Kafka Connect JMX : source & sink task metrics missing for tasks in failed > state > > > Key: KAFKA-9066 > URL: https://issues.apache.org/jira/browse/KAFKA-9066 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.1.1 >Reporter: Mikołaj Stefaniak >Priority: Major > > h2. Overview > Kafka Connect exposes various metrics via JMX. Those metrics can be exported > i.e. by _Prometheus JMX Exporter_ for further processing. > One of crucial attributes is connector's *task status.* > According to official Kafka docs, status is available as +status+ attribute > of following MBean: > {quote}kafka.connect:type=connector-task-metrics,connector="\{connector}",task="\{task}"status > - The status of the connector task. One of 'unassigned', 'running', > 'paused', 'failed', or 'destroyed'. > {quote} > h2. Issue > Generally +connector-task-metrics+ are exposed propery for tasks in +running+ > status but not exposed at all if task is +failed+. > Failed Task *appears* properly with failed status when queried via *REST API*: > > {code:java} > $ curl -X GET -u 'user:pass' > http://kafka-connect.mydomain.com/connectors/customerconnector/status > {"name":"customerconnector","connector":{"state":"RUNNING","worker_id":"kafka-connect.mydomain.com:8080"},"tasks":[{"id":0,"state":"FAILED","worker_id":"kafka-connect.mydomain.com:8080","trace":"org.apache.kafka.connect.errors.ConnectException: > Received DML 'DELETE FROM mysql.rds_sysinfo .."}],"type":"source"} > $ {code} > > Failed Task *doesn't appear* as bean with +connector-task-metrics+ type when > queried via *JMX*: > > {code:java} > $ echo "beans -d kafka.connect" | java -jar > target/jmxterm-1.1.0-SNAPSHOT-uber.jar -l localhost:8081 -n -v silent | grep > connector=customerconnector > kafka.connect:connector=customerconnector,task=0,type=task-error-metricskafka.connect:connector=customerconnector,type=connector-metrics > $ > {code} > h2. Expected result > It is expected, that bean with +connector-task-metrics+ type will appear also > for tasks that failed. > Below is example of how beans are properly registered for tasks in Running > state: > > {code:java} > $ echo "get -b > kafka.connect:connector=sinkConsentSubscription-1000,task=0,type=connector-task-metrics > status" | java -jar target/jmxterm-1.1.0-SNAPSHOT-ube r.jar -l > localhost:8081 -n -v silent > status = running; > $ > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9602) Incorrect close of producer instance during partition assignment
[ https://issues.apache.org/jira/browse/KAFKA-9602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044162#comment-17044162 ] ASF GitHub Bot commented on KAFKA-9602: --- abbccdda commented on pull request #8166: (WIP) KAFKA-9602: Close the stream producer only in EOS URL: https://github.com/apache/kafka/pull/8166 This bug reproduces through the trunk stream test, the producer was closed unexpectedly when EOS is not turned on. Will work on adding unit test to guard this logic. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Incorrect close of producer instance during partition assignment > > > Key: KAFKA-9602 > URL: https://issues.apache.org/jira/browse/KAFKA-9602 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > The new StreamProducer instance close doesn't distinguish between an > EOS/non-EOS shutdown. The StreamProducer should take care of that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9602) Incorrect close of producer instance during partition assignment
Boyang Chen created KAFKA-9602: -- Summary: Incorrect close of producer instance during partition assignment Key: KAFKA-9602 URL: https://issues.apache.org/jira/browse/KAFKA-9602 Project: Kafka Issue Type: Bug Affects Versions: 2.6.0 Reporter: Boyang Chen Assignee: Boyang Chen The new StreamProducer instance close doesn't distinguish between an EOS/non-EOS shutdown. The StreamProducer should take care of that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9601) Workers log raw connector configs, including values
[ https://issues.apache.org/jira/browse/KAFKA-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044147#comment-17044147 ] ASF GitHub Bot commented on KAFKA-9601: --- C0urante commented on pull request #8165: KAFKA-9601: Stop logging raw connector config values URL: https://github.com/apache/kafka/pull/8165 [Jira](https://issues.apache.org/jira/browse/KAFKA-9601) whoopsie daisy This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Workers log raw connector configs, including values > --- > > Key: KAFKA-9601 > URL: https://issues.apache.org/jira/browse/KAFKA-9601 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Chris Egerton >Assignee: Chris Egerton >Priority: Critical > > [This line right > here|https://github.com/apache/kafka/blob/5359b2e3bc1cf13a301f32490a6630802afc4974/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L78] > logs all configs (key and value) for a connector, which is bad, since it can > lead to secrets (db credentials, cloud storage credentials, etc.) being > logged in plaintext. > We can remove this line. Or change it to just log config keys. Or try to do > some super-fancy parsing that masks sensitive values. Well, hopefully not > that. That sounds like a lot of work. > Affects all versions of Connect back through 0.10.1. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9601) Workers log raw connector configs, including values
Chris Egerton created KAFKA-9601: Summary: Workers log raw connector configs, including values Key: KAFKA-9601 URL: https://issues.apache.org/jira/browse/KAFKA-9601 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Chris Egerton Assignee: Chris Egerton [This line right here|https://github.com/apache/kafka/blob/5359b2e3bc1cf13a301f32490a6630802afc4974/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConnector.java#L78] logs all configs (key and value) for a connector, which is bad, since it can lead to secrets (db credentials, cloud storage credentials, etc.) being logged in plaintext. We can remove this line. Or change it to just log config keys. Or try to do some super-fancy parsing that masks sensitive values. Well, hopefully not that. That sounds like a lot of work. Affects all versions of Connect back through 0.10.1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9600) EndTxn handler should check strict epoch equality
[ https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044133#comment-17044133 ] ASF GitHub Bot commented on KAFKA-9600: --- abbccdda commented on pull request #8164: KAFKA-9600: EndTxn should enforce strict epoch checking if from client URL: https://github.com/apache/kafka/pull/8164 This PR enhances the epoch checking logic for endTransaction call in TransactionCoordinator. Previously it relaxes the checking by allowing a producer epoch bump, which is error-prone since there is no reason to see a producer epoch bump from client. Since this is purely a server side bug which requires no client side change, we haven't added any integration test to verify this behavior yet. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > EndTxn handler should check strict epoch equality > - > > Key: KAFKA-9600 > URL: https://issues.apache.org/jira/browse/KAFKA-9600 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Boyang Chen >Priority: Major > > The EndTxn path in TransactionCoordinator is shared between direct calls to > EndTxn from the client and internal transaction abort logic. To support the > latter, the code is written to allow an epoch bump. However, if the client > bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the > internal invariants are violated which results in a hanging transaction. > Specifically, the transaction is left in a pending state because the epoch > following append to the log does not match what we expect. > To fix this, we should ensure that an EndTxn from the client checks for > strict epoch equality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-7711) Add a bounded flush() API to Kafka Producer
[ https://issues.apache.org/jira/browse/KAFKA-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] highluck reassigned KAFKA-7711: --- Assignee: (was: highluck) > Add a bounded flush() API to Kafka Producer > > > Key: KAFKA-7711 > URL: https://issues.apache.org/jira/browse/KAFKA-7711 > Project: Kafka > Issue Type: Improvement > Components: producer >Reporter: kun du >Priority: Minor > Labels: needs-kip > > Currently the call to Producer.flush() can be hang there for indeterminate > time. > It is a good idea to add a bounded flush() API and timeout if producer is > unable to flush all the batch records in a limited time. In this way the > caller of flush() has a chance to decide what to do next instead of just wait > forever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams
[ https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044128#comment-17044128 ] ASF GitHub Bot commented on KAFKA-9455: --- highluck commented on pull request #8163: KAFKA-9455; Consider using TreeMap for in-memory stores of Streams URL: https://github.com/apache/kafka/pull/8163 https://issues.apache.org/jira/browse/KAFKA-9455 ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Consider using TreeMap for in-memory stores of Streams > -- > > Key: KAFKA-9455 > URL: https://issues.apache.org/jira/browse/KAFKA-9455 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Guozhang Wang >Assignee: highluck >Priority: Major > Labels: newbie++ > > From [~ableegoldman]: It's worth noting that it might be a good idea to > switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap > allows us to safely perform range queries without copying over the entire > keyset, but the performance on point queries seems to scale noticeably worse > with the number of unique keys. Point queries are used by aggregations while > range queries are used by windowed joins, but of course both are available > within the PAPI and for interactive queries so it's hard to say which we > should prefer. Maybe rather than make that tradeoff we should have one > version for efficient range queries (a "JoinWindowStore") and one for > efficient point queries ("AggWindowStore") - or something. I know we've had > similar thoughts for a different RocksDB store layout for Joins (although I > can't find that ticket anywhere..), it seems like the in-memory stores could > benefit from a special "Join" version as well cc/ Guozhang Wang > Here are some random thoughts: > 1. For kafka streams processing logic (i.e. without IQ), it's better to make > all processing logic relying on point queries rather than range queries. > Right now the only processor that use range queries are, as mentioned above, > windowed stream-stream joins. I think we should consider using a different > window implementation for this (and as a result also get rid of the > retainDuplicate flags) to refactor the windowed stream-stream join operation. > 2. With 1), range queries would only be exposed as IQ. Depending on its usage > frequency I think it makes lots of sense to optimize for single-point queries. > Of course, even without step 1) we should still consider using tree-map for > windowed in-memory stores to have a better scaling effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams
[ https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044126#comment-17044126 ] highluck commented on KAFKA-9455: - [~guozhang] thank you! `JoinWindowStore` is my mistake > Consider using TreeMap for in-memory stores of Streams > -- > > Key: KAFKA-9455 > URL: https://issues.apache.org/jira/browse/KAFKA-9455 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Guozhang Wang >Assignee: highluck >Priority: Major > Labels: newbie++ > > From [~ableegoldman]: It's worth noting that it might be a good idea to > switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap > allows us to safely perform range queries without copying over the entire > keyset, but the performance on point queries seems to scale noticeably worse > with the number of unique keys. Point queries are used by aggregations while > range queries are used by windowed joins, but of course both are available > within the PAPI and for interactive queries so it's hard to say which we > should prefer. Maybe rather than make that tradeoff we should have one > version for efficient range queries (a "JoinWindowStore") and one for > efficient point queries ("AggWindowStore") - or something. I know we've had > similar thoughts for a different RocksDB store layout for Joins (although I > can't find that ticket anywhere..), it seems like the in-memory stores could > benefit from a special "Join" version as well cc/ Guozhang Wang > Here are some random thoughts: > 1. For kafka streams processing logic (i.e. without IQ), it's better to make > all processing logic relying on point queries rather than range queries. > Right now the only processor that use range queries are, as mentioned above, > windowed stream-stream joins. I think we should consider using a different > window implementation for this (and as a result also get rid of the > retainDuplicate flags) to refactor the windowed stream-stream join operation. > 2. With 1), range queries would only be exposed as IQ. Depending on its usage > frequency I think it makes lots of sense to optimize for single-point queries. > Of course, even without step 1) we should still consider using tree-map for > windowed in-memory stores to have a better scaling effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9598) RocksDB exception when grouping dynamically appearing topics into a KTable
[ https://issues.apache.org/jira/browse/KAFKA-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044035#comment-17044035 ] Guozhang Wang commented on KAFKA-9598: -- I think it maybe related to some old bugs that we fixed in trunk, [~sergem] could you compile the latest trunk and run to see if it works? > RocksDB exception when grouping dynamically appearing topics into a KTable > --- > > Key: KAFKA-9598 > URL: https://issues.apache.org/jira/browse/KAFKA-9598 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.2.0, 2.4.0 >Reporter: Sergey Menshikov >Priority: Major > Attachments: exception-details.txt > > > A streams application consumes a number of topics via a whitelisted regex. > The topics appear dynamically, generated from dynamically appearing MongoDB > collections by debezium MongoDB source driver. > The development is running on debezium docker images (Debezium 0.9 and > Debezium 1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and > the streams consumer app. > As the MongoDB driver provides only deltas of the changes, to collect full > record for each key, the code creates KTable which is then transformed into a > KStream for further joining with other KTables and Global KTables. > The following piece of code results in the exception when a new topic is > added: > > {code:java} > Pattern tResultPattern = > > Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}"); > KStream tResultsTempStream = builder.stream(tResultPattern, > Consumed.with(stringSerde, jsonSerde)); > KTable tResultsTempTable = > tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde)) > .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // > mergeNodes is a Json traverse/merger procedure > KStream tResults = > tResultsTempTable.toStream(); > > {code} > kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" > org.apache.kafka.streams.errors.ProcessorStateException: Error opening store > KSTREAM-REDUCE-STATE-STORE-32 at location > /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32 > ... > kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : > /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: > No locks available > Kstore 10_0 contains tr[0-9a-fA-F] 32 records, I checked. > > more details about exception are in the attached file. > > The exception is no longer present when I use an intermediate topic instead: > > > {code:java} > Pattern tResultPattern = > > Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}"); > KStream tResultsTempStream = builder.stream(tResultPattern, > Consumed.with(stringSerde, jsonSerde)); > > tResultsTempStream.transform(trTransformer::new).to(config.getProperty("tr_intermediate_topic_name"),Produced.with(stringSerde, > jsonSerde)); // trTransformer adds topic name into value Json, in previous > snippet it was done in the pipeline after grouping/streaming > KStream tResultsTempStream2 = > builder.stream(config.getProperty("tr_intermediate_topic_name"), > Consumed.with(stringSerde, jsonSerde)); > KTable tResultsTempTable = > tResultsTempStream2.groupByKey(Grouped.with(stringSerde,jsonSerde)) > .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); > KStream tResults = > tResultsTempTable.toStream(); > {code} > > > If making KTable from multiple whitelisted topics is something that is > outside of scope of Kafka Streams, perhaps it would make sense to mention it > in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9598) RocksDB exception when grouping dynamically appearing topics into a KTable
[ https://issues.apache.org/jira/browse/KAFKA-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-9598: - Description: A streams application consumes a number of topics via a whitelisted regex. The topics appear dynamically, generated from dynamically appearing MongoDB collections by debezium MongoDB source driver. The development is running on debezium docker images (Debezium 0.9 and Debezium 1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and the streams consumer app. As the MongoDB driver provides only deltas of the changes, to collect full record for each key, the code creates KTable which is then transformed into a KStream for further joining with other KTables and Global KTables. The following piece of code results in the exception when a new topic is added: {code:java} Pattern tResultPattern = Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}"); KStream tResultsTempStream = builder.stream(tResultPattern, Consumed.with(stringSerde, jsonSerde)); KTable tResultsTempTable = tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde)) .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // mergeNodes is a Json traverse/merger procedure KStream tResults = tResultsTempTable.toStream(); {code} kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-REDUCE-STATE-STORE-32 at location /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32 ... kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: No locks available Kstore 10_0 contains tr[0-9a-fA-F] 32 records, I checked. more details about exception are in the attached file. The exception is no longer present when I use an intermediate topic instead: {code:java} Pattern tResultPattern = Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}"); KStream tResultsTempStream = builder.stream(tResultPattern, Consumed.with(stringSerde, jsonSerde)); tResultsTempStream.transform(trTransformer::new).to(config.getProperty("tr_intermediate_topic_name"),Produced.with(stringSerde, jsonSerde)); // trTransformer adds topic name into value Json, in previous snippet it was done in the pipeline after grouping/streaming KStream tResultsTempStream2 = builder.stream(config.getProperty("tr_intermediate_topic_name"), Consumed.with(stringSerde, jsonSerde)); KTable tResultsTempTable = tResultsTempStream2.groupByKey(Grouped.with(stringSerde,jsonSerde)) .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); KStream tResults = tResultsTempTable.toStream(); {code} If making KTable from multiple whitelisted topics is something that is outside of scope of Kafka Streams, perhaps it would make sense to mention it in the docs. was: A streams application consumes a number of topics via a whitelisted regex. The topics appear dynamically, generated from dynamically appearing MongoDB collections by debezium MongoDB source driver. The development is running on debezium docker images (Debezium 0.9 and Debezium 1.0 -> Kafka 2.2.0 and 2.4.0), single instance of Kafka, Connect and the streams consumer app. As the MongoDB driver provides only deltas of the changes, to collect full record for each key, the code creates KTable which is then transformed into a KStream for further joining with other KTables and Global KTables. The following piece of code results in the exception when a new topic is added: {code:java} Pattern tResultPattern = Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}"); KStream tResultsTempStream = builder.stream(tResultPattern, Consumed.with(stringSerde, jsonSerde)); KTable tResultsTempTable = tResultsTempStream.groupByKey(Grouped.with(stringSerde,jsonSerde)) .reduce((aggValue, newValue) -> mergeNodes(aggValue,newValue)); // mergeNodes is a Json traverse/merger procedure KStream tResults = tResultsTempTable.toStream(); {code} kconsumer_1 | Exception in thread "split-reader-client3-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: Error opening store KSTREAM-REDUCE-STATE-STORE-32 at location /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32 ... kconsumer_1 | Caused by: org.rocksdb.RocksDBException: lock : /tmp/split-reader3/10_0/rocksdb/KSTREAM-REDUCE-STATE-STORE-32/LOCK: No locks available Kstore 10_0 contains tr[0-9a-fA-F] {32} records, I checked. more details about exception are in the attached file. The exception is no longer present when I use an intermediate topic instead: {code:java} Pattern tResultPattern = Pattern.compile(config.getProperty("mongodb_source_prefix")+".tr[0-9a-fA-F]{32}");
[jira] [Commented] (KAFKA-9572) Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some Records
[ https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044030#comment-17044030 ] Guozhang Wang commented on KAFKA-9572: -- I looked into the source code of 2.4 and 2.5 and did not find any new issues other than conjectured https://issues.apache.org/jira/browse/KAFKA-8574 – the logs themselves cannot further validates whether it was the root cause, but at least it shows that some restoring tasks are closed before restoration is completed, which could possibly lead to the bug of KAFKA-8574. This bug is fixed as part of the tech debt cleanup as in KAFKA-9113. So I think I have about 60 percent confidence that this issue is no longer there in 2.6 but it would still be in 2.4 and 2.5 since the fix itself incurs a lot of the cleanup it is hard to cherry-pick to older branches. I'd suggest we close this ticket for 2.6 only and if StreamsEosTest.test_failure_and_recovery failed on trunk again we could look into this once more. WDYT [~cadonna] [~apurva] > Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some > Records > --- > > Key: KAFKA-9572 > URL: https://issues.apache.org/jira/browse/KAFKA-9572 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Bruno Cadonna >Assignee: Guozhang Wang >Priority: Blocker > Fix For: 2.5.0 > > Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, > streams23.log, streams30.log, sum-1.txt > > > System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a > wrongly computed aggregation under exactly-once (EOS). The specific error is: > {code:java} > Exception in thread "main" java.lang.RuntimeException: Result verification > failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset > = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value > size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = > [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698> > at > org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444) > at > org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196) > at > org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69) > {code} > That means, the sum computed by the Streams app seems to be wrong for key > 6069. I checked the dumps of the log segments of the input topic partition > (attached: data-1.txt) and indeed two input records are not considered in the > sum. With those two missed records the sum would be correct. More concretely, > the input values for key 6069 are: > # 147 > # 9250 > # 5340 > # 1231 > # 1301 > The sum of this values is 17269 as stated in the exception above as expected > sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get > 10698 , which is the actual sum in the exception above. Somehow those two > values are missing. > In the log dump of the output topic partition (attached: sum-1.txt), the sum > is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with > 10698. > In the log dump of the changelog topic of the state store that stores the sum > (attached: 7-changelog-1.txt), the sum is also overwritten as in the output > topic. > I attached the logs of the three Streams instances involved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9600) EndTxn handler should check strict epoch equality
[ https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen reassigned KAFKA-9600: -- Assignee: Boyang Chen > EndTxn handler should check strict epoch equality > - > > Key: KAFKA-9600 > URL: https://issues.apache.org/jira/browse/KAFKA-9600 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Boyang Chen >Priority: Major > > The EndTxn path in TransactionCoordinator is shared between direct calls to > EndTxn from the client and internal transaction abort logic. To support the > latter, the code is written to allow an epoch bump. However, if the client > bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the > internal invariants are violated which results in a hanging transaction. > Specifically, the transaction is left in a pending state because the epoch > following append to the log does not match what we expect. > To fix this, we should ensure that an EndTxn from the client checks for > strict epoch equality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9595) Config tool should use admin client's incremental config API
[ https://issues.apache.org/jira/browse/KAFKA-9595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044003#comment-17044003 ] ASF GitHub Bot commented on KAFKA-9595: --- agam commented on pull request #8162: KAFKA-9595: switch usage of `alterConfigs` to `incrementalAlterConfigs` for kafka-configs tool URL: https://github.com/apache/kafka/pull/8162 - Also, some minor refactoring for common code - Test changes to `ConfigCommandTest` - Verifies builds, tests pass ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Config tool should use admin client's incremental config API > > > Key: KAFKA-9595 > URL: https://issues.apache.org/jira/browse/KAFKA-9595 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Agam Brahma >Priority: Major > Labels: newbie > > The `alterConfigs` API is deprecated. We should convert `ConfigCommand` to > use `incrementalAlterConfigs`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9047) AdminClient group operations may not respect backoff
[ https://issues.apache.org/jira/browse/KAFKA-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043866#comment-17043866 ] ASF GitHub Bot commented on KAFKA-9047: --- skaundinya15 commented on pull request #8161: KAFKA-9047: Making AdminClient group operations respect retries and backoff URL: https://github.com/apache/kafka/pull/8161 https://issues.apache.org/jira/browse/KAFKA-9047 Previously, `AdminClient` group operations did not respect a `Call`'s number of configured tries and retry backoff. This could lead to tight retry loops that put a lot of pressure on the broker. This PR introduces fixes that ensures for all group operations the `AdminClient` respects the number of tries and the backoff a given `Call` has. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AdminClient group operations may not respect backoff > > > Key: KAFKA-9047 > URL: https://issues.apache.org/jira/browse/KAFKA-9047 > Project: Kafka > Issue Type: Bug > Components: admin >Reporter: Jason Gustafson >Assignee: Sanjana Kaundinya >Priority: Major > > The retry logic for consumer group operations in the admin client is > complicated by the need to find the coordinator. Instead of simply retry > loops which send the same request over and over, we can get more complex > retry loops like the following: > # Send FindCoordinator to B -> Coordinator is A > # Send DescribeGroup to A -> NOT_COORDINATOR > # Go back to 1 > Currently we construct a new Call object for each step in this loop, which > means we lose some of retry bookkeeping such as the last retry time and the > number of tries. This means it is possible to have tight retry loops which > bounce between steps 1 and 2 and do not respect the retry backoff. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9455) Consider using TreeMap for in-memory stores of Streams
[ https://issues.apache.org/jira/browse/KAFKA-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043797#comment-17043797 ] Guozhang Wang commented on KAFKA-9455: -- I'm actually considering that we should use tree-map for all in-memory time-windowed stores, independent of what queries they may be accessed for. What do you mean by `JoinWindowStore`? I think we do not have a specific store-type just for windowed stream-stream join? > Consider using TreeMap for in-memory stores of Streams > -- > > Key: KAFKA-9455 > URL: https://issues.apache.org/jira/browse/KAFKA-9455 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Guozhang Wang >Assignee: highluck >Priority: Major > Labels: newbie++ > > From [~ableegoldman]: It's worth noting that it might be a good idea to > switch to TreeMap for different reasons. Right now the ConcurrentSkipListMap > allows us to safely perform range queries without copying over the entire > keyset, but the performance on point queries seems to scale noticeably worse > with the number of unique keys. Point queries are used by aggregations while > range queries are used by windowed joins, but of course both are available > within the PAPI and for interactive queries so it's hard to say which we > should prefer. Maybe rather than make that tradeoff we should have one > version for efficient range queries (a "JoinWindowStore") and one for > efficient point queries ("AggWindowStore") - or something. I know we've had > similar thoughts for a different RocksDB store layout for Joins (although I > can't find that ticket anywhere..), it seems like the in-memory stores could > benefit from a special "Join" version as well cc/ Guozhang Wang > Here are some random thoughts: > 1. For kafka streams processing logic (i.e. without IQ), it's better to make > all processing logic relying on point queries rather than range queries. > Right now the only processor that use range queries are, as mentioned above, > windowed stream-stream joins. I think we should consider using a different > window implementation for this (and as a result also get rid of the > retainDuplicate flags) to refactor the windowed stream-stream join operation. > 2. With 1), range queries would only be exposed as IQ. Depending on its usage > frequency I think it makes lots of sense to optimize for single-point queries. > Of course, even without step 1) we should still consider using tree-map for > windowed in-memory stores to have a better scaling effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9599) create unique sensor to record group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-9599. -- Fix Version/s: 2.4.1 2.5.0 Resolution: Fixed > create unique sensor to record group rebalance > -- > > Key: KAFKA-9599 > URL: https://issues.apache.org/jira/browse/KAFKA-9599 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > Fix For: 2.5.0, 2.4.1 > > > {code:scala} > val offsetDeletionSensor = metrics.sensor("OffsetDeletions") > ... > val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions") > {code} > the "offset deletion" and "group rebalance" should not be recorded by the > same sensor since they are totally different. > the code is introduced by KAFKA-8730 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9600) EndTxn handler should check strict epoch equality
[ https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson updated KAFKA-9600: --- Description: The EndTxn path in TransactionCoordinator is shared between direct calls to EndTxn from the client and internal transaction abort logic. To support the latter, the code is written to allow an epoch bump. However, if the client bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the internal invariants are violated which results in a hanging transaction. Specifically, the transaction is left in a pending state because the epoch following append to the log does not match what we expect. To fix this, we should ensure that an EndTxn from the client checks for strict epoch equality. was:The EndTxn path in TransactionCoordinator is shared between direct calls to EndTxn from the client and internal transaction abort logic. To support the latter, the code is written to allow an epoch bump. However, if the client bumps the epoch unexpectedly (e.g. due to a buggy implementation), then we can be left with a hanging transaction. To fix this, we should ensure that an EndTxn from the client checks for strict epoch equality. > EndTxn handler should check strict epoch equality > - > > Key: KAFKA-9600 > URL: https://issues.apache.org/jira/browse/KAFKA-9600 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Priority: Major > > The EndTxn path in TransactionCoordinator is shared between direct calls to > EndTxn from the client and internal transaction abort logic. To support the > latter, the code is written to allow an epoch bump. However, if the client > bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the > internal invariants are violated which results in a hanging transaction. > Specifically, the transaction is left in a pending state because the epoch > following append to the log does not match what we expect. > To fix this, we should ensure that an EndTxn from the client checks for > strict epoch equality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9600) EndTxn handler should check strict epoch equality
Jason Gustafson created KAFKA-9600: -- Summary: EndTxn handler should check strict epoch equality Key: KAFKA-9600 URL: https://issues.apache.org/jira/browse/KAFKA-9600 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson The EndTxn path in TransactionCoordinator is shared between direct calls to EndTxn from the client and internal transaction abort logic. To support the latter, the code is written to allow an epoch bump. However, if the client bumps the epoch unexpectedly (e.g. due to a buggy implementation), then we can be left with a hanging transaction. To fix this, we should ensure that an EndTxn from the client checks for strict epoch equality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9562) Streams not making progress under heavy failures with EOS enabled on 2.5 branch
[ https://issues.apache.org/jira/browse/KAFKA-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043717#comment-17043717 ] Boyang Chen commented on KAFKA-9562: A status update: we have fixed two sub tasks and one flush call try catch. The latest changes are deployed to soak. Will update this ticket once we believe 2.5 is good to go on stream side. > Streams not making progress under heavy failures with EOS enabled on 2.5 > branch > --- > > Key: KAFKA-9562 > URL: https://issues.apache.org/jira/browse/KAFKA-9562 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.5.0 >Reporter: John Roesler >Assignee: Boyang Chen >Priority: Blocker > Fix For: 2.5.0 > > > During soak testing in preparation for the 2.5.0 release, we have discovered > a case in which Streams appears to stop making progress. Specifically, this > is a failure-resilience test in which we inject network faults separating the > instances from the brokers roughly every twenty minutes. > On 2.4, Streams would obviously spend a lot of time rebalancing under this > scenario, but would still make progress. However, on the current 2.5 branch, > Streams effectively stops making progress except rarely. > This appears to be a severe regression, so I'm filing this ticket as a 2.5.0 > release blocker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9581) Deprecate rebalanceException on StreamThread to avoid infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen resolved KAFKA-9581. Resolution: Fixed > Deprecate rebalanceException on StreamThread to avoid infinite loop > --- > > Key: KAFKA-9581 > URL: https://issues.apache.org/jira/browse/KAFKA-9581 > Project: Kafka > Issue Type: Sub-task >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9581) Deprecate rebalanceException on StreamThread to avoid infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043671#comment-17043671 ] ASF GitHub Bot commented on KAFKA-9581: --- guozhangwang commented on pull request #8145: KAFKA-9581: Remove rebalance exception withholding URL: https://github.com/apache/kafka/pull/8145 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Deprecate rebalanceException on StreamThread to avoid infinite loop > --- > > Key: KAFKA-9581 > URL: https://issues.apache.org/jira/browse/KAFKA-9581 > Project: Kafka > Issue Type: Sub-task >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-6435) Application Reset Tool might delete incorrect internal topics
[ https://issues.apache.org/jira/browse/KAFKA-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-6435: --- Labels: bug help-wanted newbie (was: bug) > Application Reset Tool might delete incorrect internal topics > - > > Key: KAFKA-6435 > URL: https://issues.apache.org/jira/browse/KAFKA-6435 > Project: Kafka > Issue Type: Bug > Components: streams, tools >Affects Versions: 1.0.0 >Reporter: Matthias J. Sax >Priority: Major > Labels: bug, help-wanted, newbie > > The streams application reset tool, deletes all topic that start with > {{-}}. > If people have two versions of the same application and name them {{"app"}} > and {{"app-v2"}}, resetting {{"app"}} would also delete the internal topics > of {{"app-v2"}}. > We either need to disallow the dash in the application ID, or improve the > topic identification logic in the reset tool to fix this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9599) create unique sensor to record group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043633#comment-17043633 ] Guozhang Wang commented on KAFKA-9599: -- Thanks for catching this bug [~chia7712]! Please ping me when you have a PR ready. > create unique sensor to record group rebalance > -- > > Key: KAFKA-9599 > URL: https://issues.apache.org/jira/browse/KAFKA-9599 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Major > > {code:scala} > val offsetDeletionSensor = metrics.sensor("OffsetDeletions") > ... > val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions") > {code} > the "offset deletion" and "group rebalance" should not be recorded by the > same sensor since they are totally different. > the code is introduced by KAFKA-8730 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043607#comment-17043607 ] Bill Bejeck commented on KAFKA-9398: The PR for this ticket was closed as discussions are on-going to what is the proper fix. I've updated the {{Fix Version}} field. > Kafka Streams main thread may not exit even after close timeout has passed > -- > > Key: KAFKA-9398 > URL: https://issues.apache.org/jira/browse/KAFKA-9398 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Critical > > Kafka Streams offers the KafkaStreams.close() method when shutting down a > Kafka Streams application. There are two overloads to this method, one that > takes no parameters and another taking a Duration specifying how long the > close() method should block waiting for streams shut down operations to > complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. > The issue is that if a StreamThread is taking to long to complete or if one > of the Consumer or Producer clients is in a hung state, the Kafka Streams > application won't exit even after the specified timeout has expired. > For example, consider this scenario: > # A sink topic gets deleted by accident > # The user sets Producer max.block.ms config to a high value > In this case, the Producer will issue a WARN logging statement and will > continue to make metadata requests looking for the expected topic. The > {{Producer}} will continue making metadata requests up until the max.block.ms > expires. If this value is high enough, calling close() with a timeout won't > fix the issue as when the timeout expires, the Kafka Streams application's > main thread won't exit. > To prevent this type of issue, we should call Thread.interrupt() on all > StreamThread instances once the close() timeout has expired. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck updated KAFKA-9398: --- Fix Version/s: (was: 2.5.0) > Kafka Streams main thread may not exit even after close timeout has passed > -- > > Key: KAFKA-9398 > URL: https://issues.apache.org/jira/browse/KAFKA-9398 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Critical > > Kafka Streams offers the KafkaStreams.close() method when shutting down a > Kafka Streams application. There are two overloads to this method, one that > takes no parameters and another taking a Duration specifying how long the > close() method should block waiting for streams shut down operations to > complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. > The issue is that if a StreamThread is taking to long to complete or if one > of the Consumer or Producer clients is in a hung state, the Kafka Streams > application won't exit even after the specified timeout has expired. > For example, consider this scenario: > # A sink topic gets deleted by accident > # The user sets Producer max.block.ms config to a high value > In this case, the Producer will issue a WARN logging statement and will > continue to make metadata requests looking for the expected topic. The > {{Producer}} will continue making metadata requests up until the max.block.ms > expires. If this value is high enough, calling close() with a timeout won't > fix the issue as when the timeout expires, the Kafka Streams application's > main thread won't exit. > To prevent this type of issue, we should call Thread.interrupt() on all > StreamThread instances once the close() timeout has expired. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9599) create unique sensor to record group rebalance
Chia-Ping Tsai created KAFKA-9599: - Summary: create unique sensor to record group rebalance Key: KAFKA-9599 URL: https://issues.apache.org/jira/browse/KAFKA-9599 Project: Kafka Issue Type: Bug Affects Versions: 2.4.0 Reporter: Chia-Ping Tsai Assignee: Chia-Ping Tsai {code:scala} val offsetDeletionSensor = metrics.sensor("OffsetDeletions") ... val groupCompletedRebalanceSensor = metrics.sensor("OffsetDeletions") {code} the "offset deletion" and "group rebalance" should not be recorded by the same sensor since they are totally different. the code is introduced by KAFKA-8730 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9541) Flaky Test DescribeConsumerGroupTest#testDescribeGroupMembersWithShortInitializationTimeout
[ https://issues.apache.org/jira/browse/KAFKA-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxihx reassigned KAFKA-9541: - Assignee: (was: huxihx) > Flaky Test > DescribeConsumerGroupTest#testDescribeGroupMembersWithShortInitializationTimeout > --- > > Key: KAFKA-9541 > URL: https://issues.apache.org/jira/browse/KAFKA-9541 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.4.0 >Reporter: huxihx >Priority: Major > > h3. Error Message > java.lang.AssertionError: assertion failed > h3. Stacktrace > java.lang.AssertionError: assertion failed at > scala.Predef$.assert(Predef.scala:267) at > kafka.admin.DescribeConsumerGroupTest.testDescribeGroupMembersWithShortInitializationTimeout(DescribeConsumerGroupTest.scala:630) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at jdk.internal.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) > at jdk.internal.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at >
[jira] [Updated] (KAFKA-9497) Brokers start up even if SASL provider is not loaded and throw NPE when clients connect
[ https://issues.apache.org/jira/browse/KAFKA-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram updated KAFKA-9497: -- Fix Version/s: (was: 2.5.0) 2.6.0 > Brokers start up even if SASL provider is not loaded and throw NPE when > clients connect > --- > > Key: KAFKA-9497 > URL: https://issues.apache.org/jira/browse/KAFKA-9497 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.2, 0.11.0.3, 1.1.1, 2.4.0 >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Major > Fix For: 2.6.0 > > > Note: This is not a regression, this has been the behaviour since SASL was > first implemented in Kafka. > > Sasl.createSaslServer and Sasl.createSaslClient may return null if a SASL > provider that works for the specified configs cannot be created. We don't > currently handle this case. As a result broker/client throws > NullPointerException if a provider has not been loaded. On the broker-side, > we allow brokers to start up successfully even if SASL provider for its > enabled mechanisms are not found. For SASL mechanisms > PLAIN/SCRAM-xx/OAUTHBEARER, the login module in Kafka loads the SASL > providers. If the login module is incorrectly configured, brokers startup and > then fail client connections when hitting NPE. Clients see disconnections > during authentication as a result. It is difficult to tell from the client or > broker logs why the failure occurred. We should fail during startup if SASL > providers are not found and provide better diagnostics for this case. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)