[jira] [Issue Comment Deleted] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongLiang updated KAFKA-6326: - Comment: was deleted (was: [~liuweiwell] are you joking?) > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongLiang updated KAFKA-6326: - Attachment: (was: fast-recver-shutdownbroker.diff) > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283158#comment-16283158 ] HongLiang edited comment on KAFKA-6326 at 12/8/17 7:32 AM: --- Sorry, the version is 0.10.2.1 [~huxi_2b] was (Author: hongliang): Sorry, the version is 0.10.2.1 > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283158#comment-16283158 ] HongLiang commented on KAFKA-6326: -- Sorry, the version is 0.10.2.1 > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283156#comment-16283156 ] huxihx commented on KAFKA-6326: --- Based on the screenshot, it seems to be the 0.10.2.1 codebase. Did you run into this problem against 1.0.0? > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongLiang updated KAFKA-6326: - Attachment: (was: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png) > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283053#comment-16283053 ] HongLiang edited comment on KAFKA-6326 at 12/8/17 4:51 AM: --- [~huxi_2b] zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout is not the problem in this case. The problem is that the broker is down, yet the controller still sends requests to it and waits 30 sec. I think the controller should not wait, because it already knows the broker is down. was (Author: hongliang): [~huxi_2b] zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout is not the problem in this case. The problem is that the broker is down (ZooKeeper learns this within the session timeout), yet the controller still sends requests to it and waits 30 sec. I think the controller should not wait, because it already knows the broker is down. > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283053#comment-16283053 ] HongLiang commented on KAFKA-6326: -- [~huxi_2b] zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout is not the problem in this case. The problem is that the broker is down (ZooKeeper learns this within the session timeout), yet the controller still sends requests to it and waits 30 sec. I think the controller should not wait, because it already knows the broker is down. > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HongLiang updated KAFKA-6326: - Attachment: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283028#comment-16283028 ] huxihx commented on KAFKA-6326: --- What's your `zookeeper.session.timeout.ms`? And I am also curious about what ZkEventThread complained about before shutting down the request-sending thread. > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable (such as the broker's machine being down), the controller will > wait for a 30 sec timeout by default. This timeout wait seems > unnecessary, and it increases the MTTR for a dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
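For readers following this thread: the 30-second wait discussed above matches the controller's broker-request timeout, which is a separate setting from the ZooKeeper session timeout. A hedged sketch of the relevant broker properties (defaults as I understand them for this era of Kafka; verify against your version's configuration documentation):

```properties
# Controller-to-broker socket/request timeout; the 30 sec wait described
# in this ticket matches this default.
controller.socket.timeout.ms=30000
# ZooKeeper session timeout; the reporter runs this at 120000 (120 sec).
zookeeper.session.timeout.ms=6000
```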
[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283003#comment-16283003 ] ASF GitHub Bot commented on KAFKA-6323: --- GitHub user fredfp opened a pull request: https://github.com/apache/kafka/pull/4304 KAFKA-6323: document that punctuation is called immediately. If KAFKA-6323 is not a bug, then it needs better documentation. Alternative to https://github.com/apache/kafka/pull/4301 @mihbor @mjsax You can merge this pull request into a Git repository by running: $ git pull https://github.com/fredfp/kafka KAFKA-6323 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/4304.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4304 commit 935d0fc61084f383ec528d28f9c18f8b51fff1d2 Author: Frederic Arno Date: 2017-12-08T03:22:52Z KAFKA-6323: document that punctuation is called immediately. > punctuate with WALL_CLOCK_TIME triggered immediately > > > Key: KAFKA-6323 > URL: https://issues.apache.org/jira/browse/KAFKA-6323 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Frederic Arno > Fix For: 1.1.0, 1.0.1 > > > While working on a custom Processor in which I am scheduling a punctuation > using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I > set, a call to my Punctuator is always triggered immediately. > Having a quick look at kafka-streams' code, I could find that all > PunctuationSchedule's timestamps are matched against the current time in > order to decide whether or not to trigger the punctuator > (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). > However, I've only seen code that initializes PunctuationSchedule's timestamp > to 0, which I guess is what is causing an immediate punctuation. 
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's > timestamp be initialized to current time + interval? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
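The report above boils down to the schedule-check logic: a schedule whose first timestamp is 0 satisfies `now >= timestamp` on the very first wall-clock check. A self-contained sketch with simplified types (illustrative only, not the actual PunctuationQueue code):

```java
// Minimal sketch of the scheduling check described in the ticket
// (simplified types; NOT the actual Kafka Streams internals).
public class PunctuateSketch {
    static class Schedule {
        long nextTs;            // next time the punctuator should fire
        final long interval;
        Schedule(long firstTs, long interval) { this.nextTs = firstTs; this.interval = interval; }
    }

    // Mirrors the idea of PunctuationQueue#mayPunctuate: fire if 'now' has
    // reached the schedule's timestamp, then re-arm for the next interval.
    static boolean mayPunctuate(Schedule s, long now) {
        if (now >= s.nextTs) {
            s.nextTs = now + s.interval;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // Timestamp initialized to 0, as the report describes: fires immediately.
        Schedule zeroInit = new Schedule(0L, 60_000L);
        System.out.println(mayPunctuate(zeroInit, now));             // true

        // The reporter's suggestion: initialize to now + interval instead.
        Schedule fixedInit = new Schedule(now + 60_000L, 60_000L);
        System.out.println(mayPunctuate(fixedInit, now));            // false
        System.out.println(mayPunctuate(fixedInit, now + 60_000L));  // true
    }
}
```

With a zero-initialized timestamp the first check always fires; initializing to current time + interval defers the first call by one full interval, which is what the ticket asks about.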
[jira] [Created] (KAFKA-6329) Load trust store as a resource
Allen Wang created KAFKA-6329: - Summary: Load trust store as a resource Key: KAFKA-6329 URL: https://issues.apache.org/jira/browse/KAFKA-6329 Project: Kafka Issue Type: Improvement Components: clients Affects Versions: 1.0.0, 0.11.0.0, 0.10.2.0 Reporter: Allen Wang We would like to publish a Kafka client library with SSL enabled by default and distribute it to internal applications so that they can communicate with our brokers securely. We also need to distribute a trust store with our internal CA cert. In our library/application ecosystem, this is the easiest way to enable security without adding burdens to each application to deploy a certain trust store. However, that does not seem to be possible, as the Kafka client assumes that the trust store is in a local file system and uses FileInputStream, which does not work with classpath resources. https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/ssl/SslFactory.java Here is the actual line of code: {code:java} in = new FileInputStream(path); {code} Ideally we would also like to be able to do this as another way to load the trust store: {code:java} in = this.getClass().getResourceAsStream(resourcePath) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
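A hedged sketch of the fallback this ticket asks for, built on `Class.getResourceAsStream`: try the file system first (the current Kafka behaviour), then fall back to the classpath. The helper name and resolution rules are assumptions for illustration, not Kafka code:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TrustStoreLoader {
    // Open a trust store either from disk or, failing that, from the classpath.
    static InputStream open(String path) throws IOException {
        if (Files.isRegularFile(Paths.get(path))) {
            return new FileInputStream(path);   // current Kafka behaviour
        }
        // Proposed fallback: resolve the same path as a classpath resource.
        String resource = path.startsWith("/") ? path : "/" + path;
        InputStream in = TrustStoreLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("trust store not found on disk or classpath: " + path);
        }
        return in;
    }
}
```

Packaging the trust store inside the library jar and resolving it this way would let the library ship a default CA cert without requiring each application to deploy a file.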
[jira] [Assigned] (KAFKA-6126) Reduce rebalance time by not checking if created topics are available
[ https://issues.apache.org/jira/browse/KAFKA-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax reassigned KAFKA-6126: -- Assignee: Matthias J. Sax > Reduce rebalance time by not checking if created topics are available > - > > Key: KAFKA-6126 > URL: https://issues.apache.org/jira/browse/KAFKA-6126 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > Fix For: 1.1.0 > > > Within {{StreamPartitionAssignor#assign}} we create new topics and afterwards > wait in an "infinite loop" until the topic metadata has propagated throughout the > cluster. We do this to make sure topics are available when we start > processing. > However, with this approach we "extend" the time spent in the rebalance phase and > thus are not responsive (no calls to `poll` for the liveness check, and > {{KafkaStreams#close}} suffers). Thus, we might want to remove this check and > handle potential "topic not found" exceptions in the main thread gracefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
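The "infinite loop" described above can be sketched as follows (hypothetical helper shape, not the real StreamPartitionAssignor code); the point is that the member thread blocks here without calling poll():

```java
import java.util.Set;
import java.util.function.Supplier;

public class TopicWaitSketch {
    // Block until every created topic shows up in cluster metadata.
    // While this loops, the group member makes no poll() calls, so liveness
    // checks and KafkaStreams#close suffer -- the behaviour the ticket wants
    // to remove in favour of handling "topic not found" errors lazily.
    static void waitForTopics(Set<String> topics, Supplier<Set<String>> describeTopics)
            throws InterruptedException {
        while (!describeTopics.get().containsAll(topics)) {
            Thread.sleep(100L);   // metadata not yet propagated; retry
        }
    }
}
```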
[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282953#comment-16282953 ] HongLiang edited comment on KAFKA-6326 at 12/8/17 2:57 AM: --- [~liuweiwell]gao xiao ne? was (Author: hongliang): [~liuweiwell]gao xiao ni? > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282953#comment-16282953 ] HongLiang commented on KAFKA-6326: -- [~liuweiwell]gao xiao ni? > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6318) StreamsResetter should return non-zero return code on error
[ https://issues.apache.org/jira/browse/KAFKA-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282932#comment-16282932 ] siva santhalingam commented on KAFKA-6318: -- Hi [~mjsax], can I assign this to myself? Also, maybeResetInputAndSeekToEndIntermediateTopicOffsets is the only method that needs to be changed, right? Please let me know if I'm missing something. > StreamsResetter should return non-zero return code on error > --- > > Key: KAFKA-6318 > URL: https://issues.apache.org/jira/browse/KAFKA-6318 > Project: Kafka > Issue Type: Bug > Components: streams, tools >Affects Versions: 1.0.0 >Reporter: Matthias J. Sax > > If users specify a non-existing topic as input parameter, > {{StreamsResetter}} only prints out an error message that the topic was not > found, but the return code is still zero. We should return a non-zero return code > for this case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
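A minimal sketch of the fix the ticket asks for: have the tool compute an exit status and propagate it through `System.exit`, so shell scripts can detect the failure. The `run` helper and its shape are hypothetical, not the actual StreamsResetter API:

```java
import java.util.List;

public class ResetterExitSketch {
    static final int EXIT_SUCCESS = 0;
    static final int EXIT_ERROR = 1;

    // Stand-in validation: the tool should fail if a requested topic is unknown.
    static int run(List<String> requested, List<String> existing) {
        for (String t : requested) {
            if (!existing.contains(t)) {
                System.err.println("ERROR: topic not found: " + t);
                return EXIT_ERROR;   // propagate the failure to the caller
            }
        }
        return EXIT_SUCCESS;
    }

    public static void main(String[] args) {
        int code = run(List.of("input"), List.of("input", "other"));
        System.exit(code);   // 0 here; a missing topic would yield 1
    }
}
```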
[jira] [Commented] (KAFKA-4857) Replace StreamsKafkaClient with AdminClient in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282785#comment-16282785 ] ASF GitHub Bot commented on KAFKA-4857: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/4242 > Replace StreamsKafkaClient with AdminClient in Kafka Streams > > > Key: KAFKA-4857 > URL: https://issues.apache.org/jira/browse/KAFKA-4857 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > Fix For: 1.1.0 > > > Streams uses {{KafkaClientSupplier}} to get > consumer/restore-consumer/producer clients. Streams also uses one more client > for admin purposes, namely {{StreamsKafkaClient}}, that is instantiated > "manually". > With the newly upcoming {{AdminClient}} from KIP-117, we can simplify (or > even replace) {{StreamsKafkaClient}} with the new {{AdminClient}}. We > furthermore want to unify how the client is generated and extend > {{KafkaClientSupplier}} with a method that returns this client. > NOTE: The public facing changes are summarized in a separate ticket, > KAFKA-6170, and this ticket is only for the internal swap, with the accepted > criterion being to completely remove StreamsKafkaClient, replacing it with the newly introduced > KafkaAdminClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-6328) Exclude node groups belonging to global stores in InternalTopologyBuilder#makeNodeGroups
[ https://issues.apache.org/jira/browse/KAFKA-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-6328: --- Affects Version/s: 1.0.0 > Exclude node groups belonging to global stores in > InternalTopologyBuilder#makeNodeGroups > > > Key: KAFKA-6328 > URL: https://issues.apache.org/jira/browse/KAFKA-6328 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Guozhang Wang > Labels: newbie > > Today when we group processor nodes into groups (i.e. sub-topologies), we > assign the sub-topology id for global tables' dummy groups as well. As a > result, the subtopology ids (and hence task ids) are not consecutive anymore. > This is quite confusing for users troubleshooting and debugging; in > addition, the node groups for global stores are not useful either: we simply > exclude them in all the caller functions of makeNodeGroups. > It would be better to simply exclude the global store's node groups in this > function so that the subtopology ids and task ids are consecutive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6328) Exclude node groups belonging to global stores in InternalTopologyBuilder#makeNodeGroups
Guozhang Wang created KAFKA-6328: Summary: Exclude node groups belonging to global stores in InternalTopologyBuilder#makeNodeGroups Key: KAFKA-6328 URL: https://issues.apache.org/jira/browse/KAFKA-6328 Project: Kafka Issue Type: Bug Components: streams Reporter: Guozhang Wang Today when we group processor nodes into groups (i.e. sub-topologies), we assign the sub-topology id for global tables' dummy groups as well. As a result, the subtopology ids (and hence task ids) are not consecutive anymore. This is quite confusing for users troubleshooting and debugging; in addition, the node groups for global stores are not useful either: we simply exclude them in all the caller functions of makeNodeGroups. It would be better to simply exclude the global store's node groups in this function so that the subtopology ids and task ids are consecutive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
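The proposed change can be illustrated with a toy model (hypothetical data structures, not InternalTopologyBuilder itself): skip the global-store groups while assigning ids, so the remaining sub-topology ids come out consecutive:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class NodeGroupSketch {
    // Assign consecutive ids to non-global node groups only; dummy groups
    // for global stores are skipped entirely, so ids (and hence task ids)
    // stay consecutive -- the behaviour the ticket proposes.
    static Map<Integer, Set<String>> makeNodeGroups(Map<Set<String>, Boolean> groupsToIsGlobal) {
        Map<Integer, Set<String>> result = new LinkedHashMap<>();
        int id = 0;
        for (Map.Entry<Set<String>, Boolean> e : groupsToIsGlobal.entrySet()) {
            if (e.getValue()) continue;      // dummy group for a global store
            result.put(id++, e.getKey());
        }
        return result;
    }
}
```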
[jira] [Commented] (KAFKA-6327) IllegalArgumentException in RocksDB when RocksDBException being generated
[ https://issues.apache.org/jira/browse/KAFKA-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282593#comment-16282593 ] Matthias J. Sax commented on KAFKA-6327: Thanks for the heads up [~nyokodo]! > IllegalArgumentException in RocksDB when RocksDBException being generated > - > > Key: KAFKA-6327 > URL: https://issues.apache.org/jira/browse/KAFKA-6327 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Anthony May >Priority: Minor > > RocksDB had a bug where RocksDBException subCodes related to disk usage were > not present, and when a RocksDBException is generated for one of those, it throws an > IllegalArgumentException instead, obscuring the error. This is > [fixed|https://github.com/facebook/rocksdb/pull/3050] in RocksDB master but > doesn't appear to have been released yet. Adding this issue so that it can be > tracked for a future release. > Exception: > {noformat} > java.lang.IllegalArgumentException: Illegal value provided for SubCode. > at org.rocksdb.Status$SubCode.getSubCode(Status.java:109) > at org.rocksdb.Status.(Status.java:30) > at org.rocksdb.RocksDB.write0(Native Method) > at org.rocksdb.RocksDB.write(RocksDB.java:602) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6319) kafka-acls regression for comma characters (and maybe other characters as well)
[ https://issues.apache.org/jira/browse/KAFKA-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282489#comment-16282489 ] ASF GitHub Bot commented on KAFKA-6319: --- GitHub user rajinisivaram opened a pull request: https://github.com/apache/kafka/pull/4303 KAFKA-6319: Quote strings stored in JSON configs This is required for ACLs where SSL principals contain special characters (e.g. comma) that are escaped using backslash. The strings need to be quoted for JSON to ensure that the JSON stored in ZK is valid. Also converted `SslEndToEndAuthorizationTest` to use a principal with special characters to ensure that this path is tested. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajinisivaram/kafka KAFKA-6319 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/4303.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4303 commit 9518f465a76b3431a36777d83d57bf1430eaa302 Author: Rajini Sivaram Date: 2017-12-07T20:00:03Z KAFKA-6319: Quote strings in JSON to enable ACLs for principals with special chars > kafka-acls regression for comma characters (and maybe other characters as > well) > --- > > Key: KAFKA-6319 > URL: https://issues.apache.org/jira/browse/KAFKA-6319 > Project: Kafka 
> Issue Type: Bug > Components: admin >Affects Versions: 1.0.0 > Environment: Debian 8. Java 8. SSL clients. >Reporter: Jordan Mcmillan >Assignee: Rajini Sivaram > Labels: regression > Fix For: 1.1.0 > > > As of version 1.0.0, kafka-acls.sh no longer recognizes my ACLs stored in > zookeeper. I am using SSL and the default principal builder class. My > principal name contains a comma. Ex: > "CN=myhost.mycompany.com,OU=MyOU,O=MyCompany, Inc.,ST=MyST,C=US" > The default principal builder uses the getName() function in > javax.security.auth.x500: > https://docs.oracle.com/javase/8/docs/api/javax/security/auth/x500/X500Principal.html#getName > The documentation states "The only characters in attribute values that are > escaped are those which section 2.4 of RFC 2253 states must be escaped". This > makes sense, as my kafka-authorizer log shows the comma correctly escaped with > a backslash: > INFO Principal = User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > Inc.,ST=MyST,C=US is Denied Operation = Describe from host = 1.2.3.4 on > resource = Topic:mytopic (kafka.authorizer.logger) > Here's what I get when I try to create the ACL in kafka 1.0: > {code:java} > > # kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add > > --allow-principal User:"CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > > Inc.,ST=MyST,C=US" --operation "Describe" --allow-host "*" --topic="mytopic" > Adding ACLs for resource `Topic:mytopic`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > Current ACLs for resource `Topic:mytopic`: > "User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US has > Allow permission for operations: Describe from hosts: *"> > {code} > Examining Zookeeper, I can see the data. Though I notice that the json string > for ACLs is not actually valid, since the backslash is not escaped with a > double backslash. 
This was true for 0.11.0.1 as well, but was never actually > a problem. > {code:java} > > # zk-shell localhost:2181 > Welcome to zk-shell (1.1.1) > (CLOSED) /> > (CONNECTED) /> get /kafka-acl/Topic/mytopic > {"version":1,"acls":[{"principal":"User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > > Inc.,ST=MyST,C=US","permissionType":"Allow","operation":"Describe","host":"*"}]} > (CONNECTED) /> json_get /kafka-acl/Topic/mytopic acls > Path /kafka-acl/Topic/mytopic has bad JSON. > {code} > Now Kafka does not recognize any ACLs that have an escaped comma in the > principal name and all the clients are denied access. I tried searching for
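The root cause described above (a lone backslash making the stored JSON invalid) is easy to demonstrate with a minimal string escaper; this is an illustration of the idea behind the fix, not the actual patch in PR 4303:

```java
// In JSON, '\' must be written as "\\" inside a string; "\," is an invalid
// escape, which is why zk-shell reports "bad JSON" for the stored ACL.
public class JsonQuoteSketch {
    // Minimal JSON string quoter: handles only backslash and double quote,
    // the characters relevant to this ticket.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String principal = "User:CN=myhost.mycompany.com,O=MyCompany\\, Inc.,C=US";
        // Naive embedding leaves the lone backslash in place -> invalid JSON.
        String bad = "{\"principal\":\"" + principal + "\"}";
        // Proper quoting doubles the backslash -> valid JSON.
        String good = "{\"principal\":" + quote(principal) + "}";
        System.out.println(bad);
        System.out.println(good);
    }
}
```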
[jira] [Created] (KAFKA-6327) IllegalArgumentException in RocksDB when RocksDBException being generated
Anthony May created KAFKA-6327: -- Summary: IllegalArgumentException in RocksDB when RocksDBException being generated Key: KAFKA-6327 URL: https://issues.apache.org/jira/browse/KAFKA-6327 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 1.0.0 Reporter: Anthony May Priority: Minor RocksDB had a bug where RocksDBException subCodes related to disk usage were not present, and when a RocksDBException is generated for one of those, it throws an IllegalArgumentException instead, obscuring the error. This is [fixed|https://github.com/facebook/rocksdb/pull/3050] in RocksDB master but doesn't appear to have been released yet. Adding this issue so that it can be tracked for a future release. Exception: {noformat} java.lang.IllegalArgumentException: Illegal value provided for SubCode. at org.rocksdb.Status$SubCode.getSubCode(Status.java:109) at org.rocksdb.Status.(Status.java:30) at org.rocksdb.RocksDB.write0(Native Method) at org.rocksdb.RocksDB.write(RocksDB.java:602) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-6323: --- Fix Version/s: 1.1.0 > punctuate with WALL_CLOCK_TIME triggered immediately > > > Key: KAFKA-6323 > URL: https://issues.apache.org/jira/browse/KAFKA-6323 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Frederic Arno > Fix For: 1.1.0, 1.0.1 > > > While working on a custom Processor in which I am scheduling a punctuation > using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I > set, a call to my Punctuator is always triggered immediately. > Having a quick look at kafka-streams' code, I could find that all > PunctuationSchedule's timestamps are matched against the current time in > order to decide whether or not to trigger the punctuator > (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). > However, I've only seen code that initializes PunctuationSchedule's timestamp > to 0, which I guess is what is causing an immediate punctuation. > At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's > timestamp be initialized to current time + interval? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception
[ https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282274#comment-16282274 ] Matthias J. Sax commented on KAFKA-6260: You could check out {{https://github.com/apache/kafka/tree/1.0}} and build it yourself. Then update your application dependencies to use the new client jars. > AbstractCoordinator not clearly handles NULL Exception > -- > > Key: KAFKA-6260 > URL: https://issues.apache.org/jira/browse/KAFKA-6260 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: RedHat Linux >Reporter: Seweryn Habdank-Wojewodzki >Assignee: Jason Gustafson > Fix For: 1.1.0, 1.0.1 > > > The error reporting is not clear, but it seems that the Kafka Heartbeat shuts > down the application due to a NULL exception caused by "fake" disconnections. > One more comment. We are processing messages in the stream, but sometimes we > have to block processing for minutes, as the consumers cannot handle too much > load. Is it possible that while the stream is waiting, the heartbeat is > blocked as well? > Can you check that? 
> {code} > 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Sending Heartbeat request to coordinator > cljp01.eb.lan.at:9093 (id: 2147483646 rack: null) > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Sending HEARTBEAT > {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08} > with correlation id 24 to node 2147483646 > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT > with correlation id 24, received {throttle_time_ms=0,error_code=0} > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout. 
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Cancelled request > {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]} > with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, > apiVersion=6, > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > correlationId=21) with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Fetch request > {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, > maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed > org.apache.kafka.common.errors.DisconnectException: null > 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer > 
clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > grou
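On the question above of whether a blocked stream also blocks the heartbeat: since KIP-62 (Kafka 0.10.1), the consumer sends heartbeats from a dedicated background thread, so they continue while processing is stalled; what a long stall does trip is {{max.poll.interval.ms}}. A minimal sketch of that separation using plain threads (no Kafka dependency; names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a heartbeat running on its own thread while the "processing"
// thread is blocked, mirroring the KIP-62 design: liveness pings continue
// even though poll() is not being called. Illustrative only.
public class HeartbeatDemo {
    static final AtomicInteger heartbeats = new AtomicInteger();

    static int runBlockedProcessing(long blockMillis) {
        Thread hb = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                heartbeats.incrementAndGet(); // stands in for a HEARTBEAT request
                try { Thread.sleep(20); } catch (InterruptedException e) { return; }
            }
        });
        hb.setDaemon(true);
        hb.start();
        try {
            Thread.sleep(blockMillis); // "processing" blocked, as in the report
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        hb.interrupt();
        return heartbeats.get();
    }
}
```

Even while the main thread is blocked, heartbeats keep flowing; in a real consumer, the member is only evicted once the poll interval is exceeded.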
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282200#comment-16282200 ] Ismael Juma commented on KAFKA-6322: The issue was always related to holding a file handle while trying to delete the file. KAFKA-6324 fixes one instance of that. > Error deleting log for topic, all log dirs failed. > -- > > Key: KAFKA-6322 > URL: https://issues.apache.org/jira/browse/KAFKA-6322 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: RancherOS with NFS mounted for log directory >Reporter: dongyan li > > Hello, > I encountered an error when I tried to delete a topic with Kafka version 1.0.0; the error is not present on version 0.10.2.1, which is the version I upgraded from. > I suspect that some other thread is still using that file while Kafka is trying to delete it. > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting > Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in > dir /opt/kafka/logs. 
(kafka.log.LogManager) > 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: > Error while deleting log for topicname-0 in dir /opt/kafka/logs > 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: > /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce: > Device or resource busy > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) > 12/6/2017 3:37:32 PM at > sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) > 12/6/2017 3:37:32 PM at java.nio.file.Files.delete(Files.java:1126) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2670) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2742) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils.delete(Utils.java:619) > 12/6/2017 3:37:32 PM at kafka.log.Log.$anonfun$delete$2(Log.scala:1432) > 12/6/2017 3:37:32 PM at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > 12/6/2017 3:37:32 PM at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > 12/6/2017 3:37:32 PM at kafka.log.Log.delete(Log.scala:1427) > 12/6/2017 3:37:32 PM at kafka.log.LogManager.deleteLogs(LogManager.scala:626) > 12/6/2017 3:37:32 PM at > kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362) > 12/6/2017 3:37:32 PM at > kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110) > 12/6/2017 3:37:32 PM at 
kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > 12/6/2017 3:37:32 PM at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > 12/6/2017 3:37:32 PM at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 12/6/2017 3:37:32 PM at java.lang.Thread.run(Thread.java:748) > {code} > Then > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in > dir /opt/kafka/logs (kafka.log.LogManager) > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because > all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
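The {{.nfs...: Device or resource busy}} error in the trace above is the NFS "silly rename": deleting a file that some process still has open succeeds on a local POSIX filesystem (the inode survives until the last descriptor closes), but an NFS client renames it to a {{.nfsXXXX}} placeholder and refuses to remove that placeholder while the handle is open. The local-filesystem half can be sketched as follows (illustrative, assumes the temp directory is on a local POSIX filesystem):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Deleting a file while a channel to it is still open: on a local POSIX
// filesystem the name disappears and the open channel keeps working. On an
// NFS mount the same sequence leaves a .nfsXXXX placeholder that cannot be
// deleted, which is exactly what the stack trace above shows.
public class DeleteWhileOpenDemo {
    static boolean deleteWhileOpen() {
        try {
            Path p = Files.createTempFile("delete-while-open", ".log");
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                Files.delete(p); // succeeds locally despite the open handle
                return Files.notExists(p) && ch.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This is why the delete-while-open pattern only surfaces as a failure on NFS-backed log directories like the reporter's.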
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282186#comment-16282186 ] Ted Yu commented on KAFKA-6322: --- Edited previous comment. Since Dongyan's case was not about the file being missing at deletion time, we should look for whatever might still be holding the file in use.
[jira] [Comment Edited] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282120#comment-16282120 ] Ted Yu edited comment on KAFKA-6322 at 12/7/17 5:20 PM: FileSystemException was encountered in Dongyan's case (which is not among the exceptions thrown by https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)). Edit: If the problem is not fixed, I am not sure what exception would surface when deleting file. was (Author: yuzhih...@gmail.com): FileSystemException was encountered in Dongyan's case (which is not among the exceptions thrown by https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)).
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282156#comment-16282156 ] Ismael Juma commented on KAFKA-6322: It was thrown by Files.delete as shown by the stacktrace. Not sure how the javadoc helps here.
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282120#comment-16282120 ] Ted Yu commented on KAFKA-6322: --- FileSystemException was encountered in Dongyan's case (which is not among the exceptions thrown by https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)).
[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage
[ https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282116#comment-16282116 ] Manikumar commented on KAFKA-6199: -- [~rt_skyscanner] yes, jstack output > Single broker with fast growing heap usage > -- > > Key: KAFKA-6199 > URL: https://issues.apache.org/jira/browse/KAFKA-6199 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.1 > Environment: Amazon Linux >Reporter: Robin Tweedie > Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot > 2017-11-10 at 11.59.06 AM.png, dominator_tree.png, histo_live.txt, > histo_live_20171206.txt, histo_live_80.txt, merge_shortest_paths.png, > path2gc.png > > > We have a single broker in our cluster of 25 with fast growing heap usage > which necessitates us restarting it every 12 hours. If we don't restart the > broker, it becomes very slow from long GC pauses and eventually has > {{OutOfMemory}} errors. > See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage > percentage on the broker. A "normal" broker in the same cluster stays below > 50% (averaged) over the same time period. > We have taken heap dumps when the broker's heap usage is getting dangerously > high, and there are a lot of retained {{NetworkSend}} objects referencing > byte buffers. > We also noticed that the single affected broker logs a lot more of this kind > of warning than any other broker: > {noformat} > WARN Attempting to send response via channel for which there is no open > connection, connection id 13 (kafka.network.Processor) > {noformat} > See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log > message visualized across all the brokers (to show it happens a bit on other > brokers, but not nearly as much as it does on the "bad" broker). > I can't make the heap dumps public, but would appreciate advice on how to pin > down the problem better. 
We're currently trying to narrow it down to a > particular client, but without much success so far. > Let me know what else I could investigate or share to track down the source > of this leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage
[ https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282111#comment-16282111 ] Robin Tweedie commented on KAFKA-6199: -- [~omkreddy] Nothing obvious beyond the warnings I shared further up. I'll have another look. When you say thread dump, just the output of {{jstack PID}}?
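On the thread-dump question: yes, {{jstack PID}} output is what's wanted. The same information can also be captured from inside the JVM with the standard {{Thread.getAllStackTraces()}} API, which can be handy when attaching jstack to the broker process is inconvenient (generic JDK API, nothing Kafka-specific):

```java
import java.util.Map;

// Rough in-process equivalent of `jstack <pid>`: capture every live
// thread's name and stack trace as text. Illustrative sketch only.
public class ThreadDumpDemo {
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```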
[jira] [Commented] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery
[ https://issues.apache.org/jira/browse/KAFKA-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282109#comment-16282109 ] Ismael Juma commented on KAFKA-6324: https://github.com/apache/kafka/pull/4040 > Change LogSegment.delete to deleteIfExists and harden log recovery > -- > > Key: KAFKA-6324 > URL: https://issues.apache.org/jira/browse/KAFKA-6324 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma > Fix For: 1.1.0 > > > Fixes KAFKA-6194 and a delete while open issue (KAFKA-6322 and KAFKA-6075) > and makes the code more robust. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
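The change referenced here (LogSegment.delete to deleteIfExists) matters because {{java.nio.file.Files.delete}} throws {{NoSuchFileException}} when the file is already gone (for example, removed by an earlier, partially completed recovery step), whereas {{Files.deleteIfExists}} simply returns false. A self-contained illustration of that difference:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Files.delete on a missing path throws NoSuchFileException, which in the
// broker can escalate to a log-dir failure; Files.deleteIfExists reports
// false instead, making repeated or racing deletions harmless.
public class DeleteIfExistsDemo {
    static boolean deleteTwice() {
        try {
            Path p = Files.createTempFile("segment", ".log");
            Files.delete(p);                     // first delete succeeds
            try {
                Files.delete(p);                 // second delete: file is gone
                return false;                    // not reached on a correct JDK
            } catch (NoSuchFileException expected) {
                return !Files.deleteIfExists(p); // no exception, just false
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```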
[jira] [Updated] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery
[ https://issues.apache.org/jira/browse/KAFKA-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-6324: --- Description: Fixes KAFKA-6194 and a delete while open issue (KAFKA-6322 and KAFKA-6075) and makes the code more robust. (was: Fixes KAFKA-6194 and makes the code more robust.)
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282105#comment-16282105 ] Ismael Juma commented on KAFKA-6322: Probably a duplicate of KAFKA-6324, which is nearly ready to be merged. > Error deleting log for topic, all log dirs failed. > -- > > Key: KAFKA-6322 > URL: https://issues.apache.org/jira/browse/KAFKA-6322 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: RancherOS with NFS mounted for log directory >Reporter: dongyan li > > Hello, > I encountered an error when I tried to delete a topic with Kafka version 1.0.0; > the error was not present on version 0.10.2.1, which is the version I upgraded > from. > I suspect that some other thread is still using that file while Kafka is > trying to delete it. > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting > Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in > dir /opt/kafka/logs. 
(kafka.log.LogManager) > 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: > Error while deleting log for topicname-0 in dir /opt/kafka/logs > 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: > /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce: > Device or resource busy > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) > 12/6/2017 3:37:32 PM at > sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) > 12/6/2017 3:37:32 PM at java.nio.file.Files.delete(Files.java:1126) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2670) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2742) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils.delete(Utils.java:619) > 12/6/2017 3:37:32 PM at kafka.log.Log.$anonfun$delete$2(Log.scala:1432) > 12/6/2017 3:37:32 PM at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > 12/6/2017 3:37:32 PM at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > 12/6/2017 3:37:32 PM at kafka.log.Log.delete(Log.scala:1427) > 12/6/2017 3:37:32 PM at kafka.log.LogManager.deleteLogs(LogManager.scala:626) > 12/6/2017 3:37:32 PM at > kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362) > 12/6/2017 3:37:32 PM at > kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110) > 12/6/2017 3:37:32 PM at 
kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > 12/6/2017 3:37:32 PM at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > 12/6/2017 3:37:32 PM at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 12/6/2017 3:37:32 PM at java.lang.Thread.run(Thread.java:748) > {code} > Then > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in > dir /opt/kafka/logs (kafka.log.LogManager) > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because > all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.
[ https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282064#comment-16282064 ] dongyan li commented on KAFKA-6322: --- Yes, I think they stem from the same problem. However, instead of changing the deletion methods, we may want to find who is actually holding the file handle when the system tries to delete the folder. I know the NFS filesystem is not a "regular" filesystem, and that is the root problem. > Error deleting log for topic, all log dirs failed. > -- > > Key: KAFKA-6322 > URL: https://issues.apache.org/jira/browse/KAFKA-6322 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: RancherOS with NFS mounted for log directory >Reporter: dongyan li > > Hello, > I encountered an error when I tried to delete a topic with Kafka version 1.0.0; > the error was not present on version 0.10.2.1, which is the version I upgraded > from. > I suspect that some other thread is still using that file while Kafka is > trying to delete it. > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting > Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in > dir /opt/kafka/logs. 
(kafka.log.LogManager) > 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: > Error while deleting log for topicname-0 in dir /opt/kafka/logs > 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: > /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce: > Device or resource busy > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > 12/6/2017 3:37:32 PM at > sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) > 12/6/2017 3:37:32 PM at > sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) > 12/6/2017 3:37:32 PM at java.nio.file.Files.delete(Files.java:1126) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2670) > 12/6/2017 3:37:32 PM at java.nio.file.Files.walkFileTree(Files.java:2742) > 12/6/2017 3:37:32 PM at > org.apache.kafka.common.utils.Utils.delete(Utils.java:619) > 12/6/2017 3:37:32 PM at kafka.log.Log.$anonfun$delete$2(Log.scala:1432) > 12/6/2017 3:37:32 PM at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12) > 12/6/2017 3:37:32 PM at kafka.log.Log.maybeHandleIOException(Log.scala:1669) > 12/6/2017 3:37:32 PM at kafka.log.Log.delete(Log.scala:1427) > 12/6/2017 3:37:32 PM at kafka.log.LogManager.deleteLogs(LogManager.scala:626) > 12/6/2017 3:37:32 PM at > kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362) > 12/6/2017 3:37:32 PM at > kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110) > 12/6/2017 3:37:32 PM at 
kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61) > 12/6/2017 3:37:32 PM at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > 12/6/2017 3:37:32 PM at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 12/6/2017 3:37:32 PM at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 12/6/2017 3:37:32 PM at java.lang.Thread.run(Thread.java:748) > {code} > Then > {code:java} > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in > dir /opt/kafka/logs (kafka.log.LogManager) > 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because > all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage
[ https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282041#comment-16282041 ] Manikumar commented on KAFKA-6199: -- Are there any exceptions in the broker? Can you upload the thread dump of the problematic broker? > Single broker with fast growing heap usage > -- > > Key: KAFKA-6199 > URL: https://issues.apache.org/jira/browse/KAFKA-6199 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.1 > Environment: Amazon Linux >Reporter: Robin Tweedie > Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot > 2017-11-10 at 11.59.06 AM.png, dominator_tree.png, histo_live.txt, > histo_live_20171206.txt, histo_live_80.txt, merge_shortest_paths.png, > path2gc.png > > > We have a single broker in our cluster of 25 with fast growing heap usage > which necessitates us restarting it every 12 hours. If we don't restart the > broker, it becomes very slow from long GC pauses and eventually has > {{OutOfMemory}} errors. > See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage > percentage on the broker. A "normal" broker in the same cluster stays below > 50% (averaged) over the same time period. > We have taken heap dumps when the broker's heap usage is getting dangerously > high, and there are a lot of retained {{NetworkSend}} objects referencing > byte buffers. > We also noticed that the single affected broker logs a lot more of this kind > of warning than any other broker: > {noformat} > WARN Attempting to send response via channel for which there is no open > connection, connection id 13 (kafka.network.Processor) > {noformat} > See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log > message visualized across all the brokers (to show it happens a bit on other > brokers, but not nearly as much as it does on the "bad" broker). > I can't make the heap dumps public, but would appreciate advice on how to pin > down the problem better. 
We're currently trying to narrow it down to a > particular client, but without much success so far. > Let me know what else I could investigate or share to track down the source > of this leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6325) Producer.flush() doesn't throw exception on timeout
[ https://issues.apache.org/jira/browse/KAFKA-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282030#comment-16282030 ] Ted Yu commented on KAFKA-6325: --- I assume you have modified your producer code to accommodate this behavior. Looks like option #2 can be adopted. > Producer.flush() doesn't throw exception on timeout > --- > > Key: KAFKA-6325 > URL: https://issues.apache.org/jira/browse/KAFKA-6325 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Erik Scheuter > Attachments: FlushTest.java > > > Reading the javadoc of the flush() method we assumed an exception would've > been thrown when an error occurs. This would make the code more > understandable as we don't have to return a list of futures if we want to > send multiple records to kafka and eventually call future.get(). > When send() is called, the metadata is retrieved and send is blocked on this > process. When this process fails (no brokers) an FutureFailure is returned. > When you just flush; no exceptions will be thrown (in contrast to > future.get()). Ofcourse you can implement callbacks in the send method. > I think there are two solutions: > * Change flush() (& doSend()) and throw exceptions > * Change the javadoc and describe the scenario you can lose events because no > exceptions are thrown and the events are not sent. > I added an unittest to show the behaviour. Kafka doesn't have to be available > for this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wei liu updated KAFKA-6326: --- Comment: was deleted (was: Hi [~China body] {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {code:xml} {code} {panel}) > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987 ] wei liu edited comment on KAFKA-6326 at 12/7/17 3:19 PM: - Hi [~China body] {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {code:xml} {code} {panel} was (Author: liuweiwell): {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {code:xml} {code} {panel} > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987 ] wei liu edited comment on KAFKA-6326 at 12/7/17 3:18 PM: - {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {code:xml} {code} {panel} was (Author: liuweiwell): {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {panel} > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
[ https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987 ] wei liu commented on KAFKA-6326: {panel:title=My SQL code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} {code:} select * from table; {code} {panel} > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout > -- > > Key: KAFKA-6326 > URL: https://issues.apache.org/jira/browse/KAFKA-6326 > Project: Kafka > Issue Type: Bug > Components: controller >Affects Versions: 1.0.0 >Reporter: HongLiang > Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, > fast-recver-shutdownbroker.diff > > > when broker is unavailable(such as broker's machine is down), controller will > wait 30 sec timeout by dedault. it seems to be that the timeout waiting is > not necessary. It will be increase the MTTR of dead broker . -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout
HongLiang created KAFKA-6326: Summary: when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout Key: KAFKA-6326 URL: https://issues.apache.org/jira/browse/KAFKA-6326 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 1.0.0 Reporter: HongLiang Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff When a broker is unavailable (such as when the broker's machine is down), the controller will wait for a 30 sec timeout by default. This timeout wait does not seem necessary, and it increases the MTTR of the dead broker. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
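For context, the 30-second wait described here matches the broker default of `controller.socket.timeout.ms` (30000 ms), which governs controller-to-broker channels. A sketch of lowering it in `server.properties` (an assumption about which timeout is in play, and a trade-off, not a recommendation):

```properties
# Sketch: shorten how long the controller waits on an unreachable broker.
# Assumes the 30 s wait is controller.socket.timeout.ms (default 30000 ms).
# Lowering it speeds up handling of dead brokers but risks spurious
# timeouts on slow or congested links.
controller.socket.timeout.ms=10000
```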
[jira] [Created] (KAFKA-6325) Producer.flush() doesn't throw exception on timeout
Erik Scheuter created KAFKA-6325: Summary: Producer.flush() doesn't throw exception on timeout Key: KAFKA-6325 URL: https://issues.apache.org/jira/browse/KAFKA-6325 Project: Kafka Issue Type: Bug Components: producer Reporter: Erik Scheuter Attachments: FlushTest.java Reading the javadoc of the flush() method, we assumed an exception would be thrown when an error occurs. This would make the code more understandable, as we wouldn't have to keep a list of futures when we want to send multiple records to Kafka and eventually call future.get(). When send() is called, the metadata is retrieved and send() blocks on this process. When this process fails (no brokers), a FutureFailure is returned. When you just flush, no exceptions are thrown (in contrast to future.get()). Of course, you can implement callbacks in the send method. I think there are two solutions: * Change flush() (& doSend()) and throw exceptions * Change the javadoc and describe the scenario where you can lose events because no exceptions are thrown and the events are not sent. I added a unit test to show the behaviour; Kafka doesn't have to be available for this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
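The failure mode the reporter describes can be illustrated without a broker or the kafka-clients dependency; this hedged sketch models the already-failed future that send() hands back (per the report) with a plain CompletableFuture, showing that only per-future get() surfaces the error, while a flush()-style "wait for completion" returns quietly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeoutException;

public class FlushVsGet {
    public static void main(String[] args) {
        // Stand-in for KafkaProducer.send(): when metadata can't be fetched
        // (no brokers), the producer hands back an already-failed future.
        List<Future<String>> pending = new ArrayList<>();
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new TimeoutException("no brokers"));
        pending.add(failed);

        // A flush()-style wait sees every send as "complete" and returns
        // without error; the failure only surfaces if each future is checked.
        for (Future<String> f : pending) {
            try {
                f.get(); // only get() rethrows the send error
            } catch (ExecutionException e) {
                System.out.println("send failed: " + e.getCause().getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

This is why the reporter must either keep the list of futures or register callbacks in send(); flush() alone cannot report the lost events.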
[jira] [Commented] (KAFKA-3240) Replication issues
[ https://issues.apache.org/jira/browse/KAFKA-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281779#comment-16281779 ] Sergey AKhmatov commented on KAFKA-3240: This happens to me as well: Kafka 1.0.0, 3-node setup on FreeBSD-11.1 with openjdk8-8.152.16. Sometimes a .log file becomes corrupted on random topics and partitions; hexdump shows zeroes in a place where a record should be. I can't figure out how to reproduce it. > Replication issues > -- > > Key: KAFKA-3240 > URL: https://issues.apache.org/jira/browse/KAFKA-3240 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1 > Environment: FreeBSD 10.2-RELEASE-p9 >Reporter: Jan Omar > Labels: reliability > > Hi, > We are trying to replace our 3-broker cluster running on 0.6 with a new > cluster on 0.9.0.1 (but tried 0.8.2.2 and 0.9.0.0 as well). > - 3 kafka nodes with one zookeeper instance on each machine > - FreeBSD 10.2 p9 > - Nagle off (sysctl net.inet.tcp.delayed_ack=0) > - all kafka machines write a ZFS ZIL to a dedicated SSD > - 3 producers on 3 machines, writing to 1 topics, partitioning 3, replication > factor 3 > - acks all > - 10 Gigabit Ethernet, all machines on one switch, ping 0.05 ms worst case. > While using the ProducerPerformance or rdkafka_performance we are seeing very > strange Replication errors. Any hint on what's going on would be highly > appreciated. Any suggestion on how to debug this properly would help as well. 
> This is what our broker config looks like: > {code} > broker.id=5 > auto.create.topics.enable=false > delete.topic.enable=true > listeners=PLAINTEXT://:9092 > port=9092 > host.name=kafka-five.acc > advertised.host.name=10.5.3.18 > zookeeper.connect=zookeeper-four.acc:2181,zookeeper-five.acc:2181,zookeeper-six.acc:2181 > zookeeper.connection.timeout.ms=6000 > num.replica.fetchers=1 > replica.fetch.max.bytes=1 > replica.fetch.wait.max.ms=500 > replica.high.watermark.checkpoint.interval.ms=5000 > replica.socket.timeout.ms=30 > replica.socket.receive.buffer.bytes=65536 > replica.lag.time.max.ms=1000 > min.insync.replicas=2 > controller.socket.timeout.ms=3 > controller.message.queue.size=100 > log.dirs=/var/db/kafka > num.partitions=8 > message.max.bytes=1 > auto.create.topics.enable=false > log.index.interval.bytes=4096 > log.index.size.max.bytes=10485760 > log.retention.hours=168 > log.flush.interval.ms=1 > log.flush.interval.messages=2 > log.flush.scheduler.interval.ms=2000 > log.roll.hours=168 > log.retention.check.interval.ms=30 > log.segment.bytes=536870912 > zookeeper.connection.timeout.ms=100 > zookeeper.sync.time.ms=5000 > num.io.threads=8 > num.network.threads=4 > socket.request.max.bytes=104857600 > socket.receive.buffer.bytes=1048576 > socket.send.buffer.bytes=1048576 > queued.max.requests=10 > fetch.purgatory.purge.interval.requests=100 > producer.purgatory.purge.interval.requests=100 > replica.lag.max.messages=1000 > {code} > These are the errors we're seeing: > {code:borderStyle=solid} > ERROR [Replica Manager on Broker 5]: Error processing fetch operation on > partition [test,0] offset 50727 (kafka.server.ReplicaManager) > java.lang.IllegalStateException: Invalid message size: 0 > at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:141) > at kafka.log.LogSegment.translateOffset(LogSegment.scala:105) > at kafka.log.LogSegment.read(LogSegment.scala:126) > at kafka.log.Log.read(Log.scala:506) > at > 
kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:536) > at > kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:507) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245) > at > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:245) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:507) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:462) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:431) > at kafka.server.KafkaApis.handle(KafkaApis.scala:69) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745)0 > {code} > and > {code} > ERROR Found invalid messages during fetch for partition [test,0] offset 2732 > error Message found with corrupt size (0) in shallow iterator > (kafka
[jira] [Commented] (KAFKA-1996) Scaladoc error: unknown tag parameter
[ https://issues.apache.org/jira/browse/KAFKA-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281760#comment-16281760 ] Andy Doddington commented on KAFKA-1996: I am getting this error on Scaladoc comments that I have added to a method in a trait. Curiously, the error only appears on the third @param definition in the method header (the third of five; the others have no errors). I am using IntelliJ IDEA CE 2017.3 Build IC-173.3727.127. > Scaladoc error: unknown tag parameter > - > > Key: KAFKA-1996 > URL: https://issues.apache.org/jira/browse/KAFKA-1996 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Yaguo Zhou >Assignee: Yaguo Zhou >Priority: Minor > Labels: doc > Attachments: scala-doc-unknown-tag-parameter.patch > > > There are some scala doc error: unknown tag parameter -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery
Ismael Juma created KAFKA-6324: -- Summary: Change LogSegment.delete to deleteIfExists and harden log recovery Key: KAFKA-6324 URL: https://issues.apache.org/jira/browse/KAFKA-6324 Project: Kafka Issue Type: Improvement Reporter: Ismael Juma Assignee: Ismael Juma Fix For: 1.1.0 Fixes KAFKA-6194 and makes the code more robust. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5272) Improve validation for Alter Configs (KIP-133)
[ https://issues.apache.org/jira/browse/KAFKA-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-5272: -- Assignee: (was: Ismael Juma) > Improve validation for Alter Configs (KIP-133) > -- > > Key: KAFKA-5272 > URL: https://issues.apache.org/jira/browse/KAFKA-5272 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma > Fix For: 1.1.0 > > > TopicConfigHandler.processConfigChanges() warns about certain topic configs. > We should include such validations in alter configs and reject the change if > the validation fails. Note that this should be done without changing the > behaviour of the ConfigCommand (as it does not have access to the broker > configs). > We should consider adding other validations like KAFKA-4092 and KAFKA-4680. > Finally, the AlterConfigsPolicy mentioned in KIP-133 will be added at the > same time. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5276) Support derived and prefixed configs in DescribeConfigs (KIP-133)
[ https://issues.apache.org/jira/browse/KAFKA-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-5276: -- Assignee: (was: Ismael Juma) > Support derived and prefixed configs in DescribeConfigs (KIP-133) > - > > Key: KAFKA-5276 > URL: https://issues.apache.org/jira/browse/KAFKA-5276 > Project: Kafka > Issue Type: Sub-task >Reporter: Ismael Juma > Fix For: 1.1.0 > > > The broker supports config overrides per listener. The way we do that is by > prefixing the configs with the listener name. These configs are not defined > by ConfigDef and they don't appear in `values()`. They do appear in > `originals()`. We should change the code so that we return these configs. > Because these configs are read-only, nothing needs to be done for > AlterConfigs. > With regards to derived configs, an example is advertised.listeners, which > falls back to listeners. This is currently done outside AbstractConfig. We > should look into including these into AbstractConfig so that the fallback > happens for the returned configs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-3665) Default ssl.endpoint.identification.algorithm should be https
[ https://issues.apache.org/jira/browse/KAFKA-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-3665: -- Assignee: (was: Ismael Juma) > Default ssl.endpoint.identification.algorithm should be https > - > > Key: KAFKA-3665 > URL: https://issues.apache.org/jira/browse/KAFKA-3665 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.9.0.1, 0.10.0.0 >Reporter: Ismael Juma > > The default `ssl.endpoint.identification.algorithm` is `null` which is not a > secure default (man in the middle attacks are possible). > We should probably use `https` instead. A more conservative alternative would > be to update the documentation instead of changing the default. > A paper on the topic (thanks to Ryan Pridgeon for the reference): > http://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf -- This message was sent by Atlassian JIRA (v6.4.14#64029)
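Until the default changes, clients can opt in to hostname verification explicitly; a minimal client-config sketch:

```properties
# Client config sketch: enable server hostname verification against the
# certificate, rather than relying on the insecure null default.
security.protocol=SSL
ssl.endpoint.identification.algorithm=https
```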
[jira] [Assigned] (KAFKA-4680) min.insync.replicas can be set higher than replication factor
[ https://issues.apache.org/jira/browse/KAFKA-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-4680: -- Assignee: (was: Ismael Juma) > min.insync.replicas can be set higher than replication factor > - > > Key: KAFKA-4680 > URL: https://issues.apache.org/jira/browse/KAFKA-4680 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: James Cheng > > It is possible to specify a min.insync.replicas for a topic that is higher > than the replication factor of the topic. If you do this, you will not be > able to produce to the topic with acks=all. > Furthermore, each produce request (including retries) to the topic will emit > an ERROR level message to the broker debuglogs. If this is not noticed > quickly enough, it can cause the debuglogs to balloon. > We actually hosed one of our Kafka clusters because of this. A topic got > configured with min.insync.replicas > replication factor. It had partitions > on all brokers of our cluster. The broker logs ballooned and filled up the > disks. We run these clusters on CoreOS, and CoreOS's etcd database got > corrupted. (Kafka didn't get corrupted, tho). > I think Kafka should do validation when someone tries to change a topic to > min.insync.replicas > replication factor, and reject the change. > This would presumably affect kafka-topics.sh, kafka-configs.sh, as well as > the CreateTopics operation that came in KIP-4. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
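The validation the reporter asks for is small; a hedged sketch of the check (names are illustrative, not Kafka's actual code):

```java
public class MinIsrValidation {
    // Illustrative check: reject configurations where min.insync.replicas
    // exceeds the topic's replication factor, since acks=all produce
    // requests can then never succeed.
    static void validate(int minInsyncReplicas, int replicationFactor) {
        if (minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas (" + minInsyncReplicas
                + ") must not exceed replication factor ("
                + replicationFactor + ")");
        }
    }

    public static void main(String[] args) {
        validate(2, 3); // valid: passes silently
        try {
            validate(3, 2); // invalid: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

As the description notes, this same check would apply in kafka-topics.sh, kafka-configs.sh, and the KIP-4 CreateTopics path.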
[jira] [Assigned] (KAFKA-4637) Update system test(s) to use multiple listeners for the same security protocol
[ https://issues.apache.org/jira/browse/KAFKA-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-4637: -- Assignee: (was: Ismael Juma) > Update system test(s) to use multiple listeners for the same security protocol > -- > > Key: KAFKA-4637 > URL: https://issues.apache.org/jira/browse/KAFKA-4637 > Project: Kafka > Issue Type: Test >Reporter: Ismael Juma > Labels: newbie > > Even though this is tested via the JUnit tests introduced by KAFKA-4565, it > would be good to have at least one system test exercising this functionality. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-6319) kafka-acls regression for comma characters (and maybe other characters as well)
[ https://issues.apache.org/jira/browse/KAFKA-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram reassigned KAFKA-6319: - Assignee: Rajini Sivaram > kafka-acls regression for comma characters (and maybe other characters as > well) > --- > > Key: KAFKA-6319 > URL: https://issues.apache.org/jira/browse/KAFKA-6319 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 1.0.0 > Environment: Debian 8. Java 8. SSL clients. >Reporter: Jordan Mcmillan >Assignee: Rajini Sivaram > Labels: regression > Fix For: 1.1.0 > > > As of version 1.0.0, kafka-acls.sh no longer recognizes my ACLs stored in > ZooKeeper. I am using SSL and the default principal builder class. My > principal name contains a comma. Ex: > "CN=myhost.mycompany.com,OU=MyOU,O=MyCompany, Inc.,ST=MyST,C=US" > The default principal builder uses the getName() function in > javax.security.auth.x500: > https://docs.oracle.com/javase/8/docs/api/javax/security/auth/x500/X500Principal.html#getName > The documentation states "The only characters in attribute values that are > escaped are those which section 2.4 of RFC 2253 states must be escaped". 
This > makes sense, as my kafka-authorizer log shows the comma correctly escaped with > a backslash: > INFO Principal = User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > Inc.,ST=MyST,C=US is Denied Operation = Describe from host = 1.2.3.4 on > resource = Topic:mytopic (kafka.authorizer.logger) > Here's what I get when I try to create the ACL in kafka 1.0: > {code:java} > > # kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add > > --allow-principal User:"CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > > Inc.,ST=MyST,C=US" --operation "Describe" --allow-host "*" --topic="mytopic" > Adding ACLs for resource `Topic:mytopic`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > Current ACLs for resource `Topic:mytopic`: > "User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US has > Allow permission for operations: Describe from hosts: *"> > {code} > Examining Zookeeper, I can see the data. Though I notice that the json string > for ACLs is not actually valid, since the backslash is not escaped with a > double backslash. This was true for 0.11.0.1 as well, but was never actually > a problem. > {code:java} > > # zk-shell localhost:2181 > Welcome to zk-shell (1.1.1) > (CLOSED) /> > (CONNECTED) /> get /kafka-acl/Topic/mytopic > {"version":1,"acls":[{"principal":"User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, > > Inc.,ST=MyST,C=US","permissionType":"Allow","operation":"Describe","host":"*"}]} > (CONNECTED) /> json_get /kafka-acl/Topic/mytopic acls > Path /kafka-acl/Topic/mytopic has bad JSON. > {code} > Now Kafka does not recognize any ACLs that have an escaped comma in the > principal name and all the clients are denied access. 
I tried searching for > anything relevant that changed between 0.11.0.1 and 1.0.0, and I noticed > KAFKA-1595: > https://github.com/apache/kafka/commit/8b14e11743360a711b2bb670cf503acc0e604602#diff-db89a14f2c85068b1f0076d52e590d05 > Could the new json library be failing because the acl is not actually a valid > json string? > I can store a valid json string with an escaped backslash (ex: containing > "O=MyCompany\\, Inc."), and the comparison between the principal builder > string and what is read from zookeeper succeeds. However, successively applying > ACLs seems to strip the backslashes and generally corrupts things: > {code:java} > > # kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 > > --add --allow-principal > > User:"CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\\, Inc.,ST=MyST,C=US" > > --operation Describe --group="*" --allow-host "*" --topic="mytopic" > Adding ACLs for resource `Topic:mytopic`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > Adding ACLs for resource `Group:*`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > Current ACLs for resource `Topic:mytopic`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > Current ACLs for resource `Group:*`: > User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US > has Allow permission for operations: Describe from hosts: * > {code} > It looks as though the backslash used for escaping RFC 2253 strings is not > being handled correctly. That's as far as I've dug.
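The escaping mismatch the reporter describes can be reproduced outside Kafka. Below is a minimal Python sketch (illustrative only, not Kafka's code) showing that a compliant JSON encoder doubles the backslash in an RFC 2253-escaped principal, while a strict JSON parser rejects a string stored verbatim with a lone `\,` escape:

```python
import json

# The principal as the authorizer logs it: the comma inside the organization
# name is escaped with a single backslash per RFC 2253.
principal = "CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\, Inc.,ST=MyST,C=US"

# A compliant encoder escapes the backslash itself, writing "\\," to disk,
# and the value round-trips cleanly.
valid = json.dumps({"principal": principal})
assert json.loads(valid)["principal"] == principal

# Writing the principal verbatim (single backslash, as seen in zookeeper)
# yields invalid JSON: "\," is not a legal JSON escape sequence, so a strict
# parser rejects it -- matching zk-shell's "has bad JSON" error.
invalid = '{"principal": "O=MyCompany\\, Inc."}'
try:
    json.loads(invalid)
    ok = False
except json.JSONDecodeError:
    ok = True
```

This is consistent with the hypothesis above: a lenient parser tolerated the bad escape in 0.11.0.1, and a stricter JSON library in 1.0.0 does not.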
[jira] [Resolved] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency
[ https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma resolved KAFKA-6313. Resolution: Fixed > Kafka Core should have explicit SLF4J API dependency > > > Key: KAFKA-6313 > URL: https://issues.apache.org/jira/browse/KAFKA-6313 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.0 >Reporter: Randall Hauch >Assignee: Randall Hauch > Fix For: 1.1.0 > > > In an application that depends on the Kafka server artifacts with: > {code:xml} > > org.apache.kafka > kafka_2.11 > 1.1.0-SNAPSHOT > > {code} > and then running the server programmatically, the following error occurs: > {noformat} > [2017-11-23 01:01:45,029] INFO Shutting down producer > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,051] INFO Closing all sync producers > (kafka.producer.ProducerPool:63) > [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down > (kafka.server.KafkaServer:63) > [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during > KafkaServer shutdown. (kafka.server.KafkaServer:161) > java.lang.NoClassDefFoundError: org/slf4j/event/Level > at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83) > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520) >... > Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level > at java.net.URLClassLoader$1.run(URLClassLoader.java:359) > at java.net.URLClassLoader$1.run(URLClassLoader.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:347) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 
25 more > {noformat} > It appears that KAFKA-1044 and [this > PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from > Core but [added use > of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34] > the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The > {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, > which is currently not included in the dependencies of > {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the > server, the SLF4J API library probably needs to be added to the dependencies. > [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that > the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just > because this probably needs to be sorted out before the release. > *Update*: As the comments below explain, the actual problem is a bit > different to what is in the JIRA description. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
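As a stopgap before picking up a Kafka build containing the fix, an application embedding the Kafka server can declare the SLF4J API dependency explicitly alongside the {{kafka_2.11}} artifact. The version below is illustrative only; it should match whatever {{slf4j-api}} the Kafka version in use was built against:

```xml
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.25</version>
</dependency>
```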
[jira] [Updated] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency
[ https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-6313: --- Description: In an application that depends on the Kafka server artifacts with: {code:xml} org.apache.kafka kafka_2.11 1.1.0-SNAPSHOT {code} and then running the server programmatically, the following error occurs: {noformat} [2017-11-23 01:01:45,029] INFO Shutting down producer (kafka.producer.Producer:63) [2017-11-23 01:01:45,051] INFO Closing all sync producers (kafka.producer.ProducerPool:63) [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms (kafka.producer.Producer:63) [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer:63) [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during KafkaServer shutdown. (kafka.server.KafkaServer:161) java.lang.NoClassDefFoundError: org/slf4j/event/Level at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83) at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520) ... Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level at java.net.URLClassLoader$1.run(URLClassLoader.java:359) at java.net.URLClassLoader$1.run(URLClassLoader.java:348) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:347) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} It appears that KAFKA-1044 and [this PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from Core but [added use of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34] the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. 
The {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, which is currently not included in the dependencies of {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the server, the SLF4J API library probably needs to be added to the dependencies. [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just because this probably needs to be sorted out before the release. *Update*: As the comments below explain, the actual problem is a bit different to what is in the JIRA description. was: In an application that depends on the Kafka server artifacts with: {code:xml} org.apache.kafka kafka_2.11 1.1.0-SNAPSHOT {code} and then running the server programmatically, the following error occurs: {noformat} [2017-11-23 01:01:45,029] INFO Shutting down producer (kafka.producer.Producer:63) [2017-11-23 01:01:45,051] INFO Closing all sync producers (kafka.producer.ProducerPool:63) [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms (kafka.producer.Producer:63) [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer:63) [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during KafkaServer shutdown. (kafka.server.KafkaServer:161) java.lang.NoClassDefFoundError: org/slf4j/event/Level at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83) at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520) ... 
Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level at java.net.URLClassLoader$1.run(URLClassLoader.java:359) at java.net.URLClassLoader$1.run(URLClassLoader.java:348) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:347) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} It appears that KAFKA-1044 and [this PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from Core but [added use of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34] the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, which is currently not included in the dependencies of {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the server, the SLF4J API library probably needs to be added to the dependencies. [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that the SLF4J API be marked
[jira] [Updated] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency
[ https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-6313: --- Summary: Kafka Core should have explicit SLF4J API dependency (was: Kafka Core maven dependencies are missing SLF4J API) > Kafka Core should have explicit SLF4J API dependency > > > Key: KAFKA-6313 > URL: https://issues.apache.org/jira/browse/KAFKA-6313 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.0 >Reporter: Randall Hauch >Assignee: Randall Hauch > Fix For: 1.1.0 > > > In an application that depends on the Kafka server artifacts with: > {code:xml} > > org.apache.kafka > kafka_2.11 > 1.1.0-SNAPSHOT > > {code} > and then running the server programmatically, the following error occurs: > {noformat} > [2017-11-23 01:01:45,029] INFO Shutting down producer > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,051] INFO Closing all sync producers > (kafka.producer.ProducerPool:63) > [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down > (kafka.server.KafkaServer:63) > [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during > KafkaServer shutdown. (kafka.server.KafkaServer:161) > java.lang.NoClassDefFoundError: org/slf4j/event/Level > at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83) > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520) >... > Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level > at java.net.URLClassLoader$1.run(URLClassLoader.java:359) > at java.net.URLClassLoader$1.run(URLClassLoader.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:347) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 
25 more > {noformat} > It appears that KAFKA-1044 and [this > PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from > Core but [added use > of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34] > the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The > {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, > which is currently not included in the dependencies of > {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the > server, the SLF4J API library probably needs to be added to the dependencies. > [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that > the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just > because this probably needs to be sorted out before the release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6313) Kafka Core maven dependencies are missing SLF4J API
[ https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281499#comment-16281499 ] ASF GitHub Bot commented on KAFKA-6313: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/4296 > Kafka Core maven dependencies are missing SLF4J API > --- > > Key: KAFKA-6313 > URL: https://issues.apache.org/jira/browse/KAFKA-6313 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.0 >Reporter: Randall Hauch >Assignee: Randall Hauch > Fix For: 1.1.0 > > > In an application that depends on the Kafka server artifacts with: > {code:xml} > > org.apache.kafka > kafka_2.11 > 1.1.0-SNAPSHOT > > {code} > and then running the server programmatically, the following error occurs: > {noformat} > [2017-11-23 01:01:45,029] INFO Shutting down producer > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,051] INFO Closing all sync producers > (kafka.producer.ProducerPool:63) > [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms > (kafka.producer.Producer:63) > [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down > (kafka.server.KafkaServer:63) > [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during > KafkaServer shutdown. (kafka.server.KafkaServer:161) > java.lang.NoClassDefFoundError: org/slf4j/event/Level > at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83) > at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520) >... 
> Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level > at java.net.URLClassLoader$1.run(URLClassLoader.java:359) > at java.net.URLClassLoader$1.run(URLClassLoader.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:347) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 25 more > {noformat} > It appears that KAFKA-1044 and [this > PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from > Core but [added use > of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34] > the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The > {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, > which is currently not included in the dependencies of > {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the > server, the SLF4J API library probably needs to be added to the dependencies. > [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that > the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just > because this probably needs to be sorted out before the release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately
[ https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281498#comment-16281498 ] ASF GitHub Bot commented on KAFKA-6323: --- GitHub user fredfp opened a pull request: https://github.com/apache/kafka/pull/4301 KAFKA-6323: punctuate with WALL_CLOCK_TIME triggered immediately This is the only way I found to fix the issue without altering the API. @mihbor @mjsax the contribution is my original work and I license the work to the project under the project's open source license. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fredfp/kafka trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/4301.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4301 commit 0c9b6cac6a5de8e6db81e6ae6f42fe8933012621 Author: Frederic Arno Date: 2017-12-07T08:18:42Z KAFKA-6323: fix punctuate with WALL_CLOCK_TIME triggered immediately > punctuate with WALL_CLOCK_TIME triggered immediately > > > Key: KAFKA-6323 > URL: https://issues.apache.org/jira/browse/KAFKA-6323 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Frederic Arno > Fix For: 1.0.1 > > > While working on a custom Processor in which I am scheduling a punctuation > using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I > set, a call to my Punctuator is always triggered immediately. > Having a quick look at kafka-streams' code, I could find that all > PunctuationSchedule's timestamps are matched against the current time in > order to decide whether or not to trigger the punctuator > (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp > to 0, which I guess is what is causing an immediate punctuation. > At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's > timestamp be initialized to current time + interval?
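The suggested fix can be sketched as follows. This is a minimal Python model with hypothetical names (`PunctuationSchedule`, `may_punctuate`), not the actual Kafka Streams classes: arming the schedule's deadline at current time + interval, rather than 0, prevents the immediate first firing the reporter observed.

```python
class PunctuationSchedule:
    def __init__(self, interval_ms, now_ms):
        self.interval_ms = interval_ms
        # Initializing the deadline to now + interval gives the expected
        # behaviour; initializing it to 0 (the reported bug) makes the very
        # first may_punctuate() check fire immediately.
        self.next_ms = now_ms + interval_ms

def may_punctuate(schedule, now_ms, punctuator):
    # Fire the punctuator only once its deadline has been reached, then re-arm.
    if now_ms >= schedule.next_ms:
        punctuator(now_ms)
        schedule.next_ms = now_ms + schedule.interval_ms
        return True
    return False

fired = []
s = PunctuationSchedule(interval_ms=100, now_ms=0)
first = may_punctuate(s, 0, fired.append)     # deadline is 100, so no firing yet
second = may_punctuate(s, 100, fired.append)  # deadline reached: punctuator runs
```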
[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask
[ https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281484#comment-16281484 ] Seweryn Habdank-Wojewodzki commented on KAFKA-5882: --- Hmm... I will try to retest everything together with the fix for KAFKA-6260. > NullPointerException in StreamTask > -- > > Key: KAFKA-5882 > URL: https://issues.apache.org/jira/browse/KAFKA-5882 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Seweryn Habdank-Wojewodzki > > It seems bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] > was made, but it introduced some other issue. > In some cases (I am not sure which ones) I get an NPE (below). > I would expect that even in case of a FATAL error, anything other than an NPE would be thrown. > {code} > 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener > for group streamer failed on partition assignment > java.lang.NullPointerException: null > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183) > ~[myapp-streamer.jar:?] 
> at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527) > [myapp-streamer.jar:?] > 2017-09-12 23:34:54 INFO StreamThread:1040 - stream-thread > [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down > 2017-09-12 23:34:54 INFO KafkaProducer:972 - Closing the Kafka producer with > timeoutMillis = 9223372036854775807 ms. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5882) NullPointerException in StreamTask
[ https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272747#comment-16272747 ] Seweryn Habdank-Wojewodzki edited comment on KAFKA-5882 at 12/7/17 8:09 AM: [~mjsax] In meanwhile I had ported the code to {{1.0.0}} :-). I will try do my best. was (Author: habdank): [~mjsax] In mean while I had ported the code to {{1.0.0}} :-). I will try do my best. > NullPointerException in StreamTask > -- > > Key: KAFKA-5882 > URL: https://issues.apache.org/jira/browse/KAFKA-5882 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Seweryn Habdank-Wojewodzki > > It seems bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] > was made, but it introduced some other issue. > In some cases (I am not sure which ones) I get an NPE (below). > I would expect that even in case of a FATAL error, anything other than an NPE would be thrown. > {code} > 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener > for group streamer failed on partition assignment > java.lang.NullPointerException: null > at > org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73) > ~[myapp-streamer.jar:?] 
> at > org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183) > ~[myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078) > [myapp-streamer.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553) > [myapp-streamer.jar:?] > at > org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527) > [myapp-streamer.jar:?] > 2017-09-12 23:34:54 INFO StreamThread:1040 - stream-thread > [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down > 2017-09-12 23:34:54 INFO KafkaProducer:972 - Closing the Kafka producer with > timeoutMillis = 9223372036854775807 ms. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception
[ https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281481#comment-16281481 ] Seweryn Habdank-Wojewodzki commented on KAFKA-6260: --- Thanks a lot! When the releases are ready, I will test them. I am not sure if and how I can get the libraries earlier, before they are released :-). > AbstractCoordinator not clearly handles NULL Exception > -- > > Key: KAFKA-6260 > URL: https://issues.apache.org/jira/browse/KAFKA-6260 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: RedHat Linux >Reporter: Seweryn Habdank-Wojewodzki >Assignee: Jason Gustafson > Fix For: 1.1.0, 1.0.1 > > > The error reporting is not clear. But it seems that the Kafka heartbeat shuts > down the application due to a NULL exception caused by "fake" disconnections. > One more comment. We are processing messages in the stream, but sometimes we > have to block processing for minutes, as consumers are not handling too much > load. Is it possible that when the stream is waiting, the heartbeat is > blocked as well? > Can you check that? 
> {code} > 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Sending Heartbeat request to coordinator > cljp01.eb.lan.at:9093 (id: 2147483646 rack: null) > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Sending HEARTBEAT > {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08} > with correlation id 24 to node 2147483646 > 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT > with correlation id 24, received {throttle_time_ms=0,error_code=0} > 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Received successful Heartbeat response > 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout. 
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Cancelled request > {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]} > with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, > apiVersion=6, > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > correlationId=21) with correlation id 21 due to node 1 being disconnected > 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer > clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > groupId=kafka-endpoint] Fetch request > {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, > maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, > maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed > org.apache.kafka.common.errors.DisconnectException: null > 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer > 
clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer, > gro