[jira] [Commented] (KAFKA-9376) Plugin class loader not found using MM2
[ https://issues.apache.org/jira/browse/KAFKA-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013391#comment-17013391 ]

yzhou commented on KAFKA-9376:
------------------------------

[~ryannedolan] Here is a portion of the startup log:

[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:264 ] - [ INFO ] Registered loader: sun.misc.Launcher$AppClassLoader@18b4aac2
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.MockConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.SchemaSourceConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.mirror.MirrorSourceConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.VerifiableSinkConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.VerifiableSourceConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.MockSourceConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.tools.MockSinkConnector'
[ org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:193 ] - [ INFO ] Added plugin 'org.apache.kafka.connect.converters.LongConverter'

MirrorCheckpointConnector, MirrorHeartbeatConnector, and MirrorSourceConnector are loaded by the AppClassLoader (getParent()), but they should be loaded by the pluginLoader independently (PluginUtils.shouldLoadInIsolation(name)).

> Plugin class loader not found using MM2
> ---------------------------------------
>
> Key: KAFKA-9376
> URL: https://issues.apache.org/jira/browse/KAFKA-9376
> Project: Kafka
> Issue Type: Bug
> Components: mirrormaker
> Affects Versions: 2.4.0
> Reporter: Sinóros-Szabó Péter
> Priority: Minor
>
> I am using MM2 (release 2.4.0 with Scala 2.12) and I get a bunch of classloader
> errors. MM2 seems to be working, but I do not know if all of its components
> are working as expected, as this is the first time I use MM2.
> I run MM2 with the following command:
> {code:java}
> ./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties
> {code}
> Errors are:
> {code:java}
> [2020-01-07 15:06:17,892] ERROR Plugin class loader for connector:
> 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found.
> Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6ebf0f36
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
> [2020-01-07 15:06:17,889] ERROR Plugin class loader for connector:
> 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found.
> Returning:
> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@6ebf0f36
> (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:165)
> [2020-01-07 15:06:17,904] INFO ConnectorConfig values:
>     config.action.reload = restart
>     connector.class = org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
>     errors.log.enable = false
>     errors.log.include.messages = false
>     errors.retry.delay.max.ms = 6
>     errors.retry.timeout = 0
>     errors.tolerance = none
>     header.converter = null
>     key.converter = null
>     name = MirrorHeartbeatConnector
>     tasks.max = 1
>     transforms = []
>     value.converter = null
> (org.apache.kafka.connect.runtime.ConnectorConfig:347)
> [2020-01-07 15:06:17,904] INFO EnrichedConnectorConfig values:
>     config.action.reload = restart
>     connector.class = org.apache.kafka.connect.mirror.MirrorHeartbeatConnector
>     errors.log.enable = false
>     errors.log.include.messages = false
>     errors.retry.delay.max.ms = 6
>     errors.retry.timeout = 0
>     errors.tolerance = none
>     header.converter = null
>     key.converter = null
>     name = MirrorHeartbeatConnector
>     tasks.max = 1
>     transforms = []
>     value.converter = null
> (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:347)
> [2020-01-07 15:06:17,905] INFO TaskConfig values:
>     task.class = class org.apache.kafka.connect.mirror.MirrorHeartbeatTask
> (org.apache.kafka.connect.runtime.TaskConfig:347)
> [2020-01-07 15:06:17,905] INFO Instantiated task MirrorHeartbeatConnector-0
> with version 1 of type
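The comment above observes that the Mirror connectors ended up on the AppClassLoader instead of a plugin loader. As a rough illustration of why a class can fall through to the parent loader, here is a simplified sketch of the *kind* of prefix-based check a `shouldLoadInIsolation`-style predicate performs (this is not the actual Kafka source; the allowlist contents are illustrative): framework classes under `org.apache.kafka.` are served by the parent loader unless they match an explicit allowlist, so a connector package missing from that allowlist would show exactly the reported behavior.

```java
import java.util.List;

// Simplified, illustrative isolation predicate (NOT the real PluginUtils code).
public class IsolationCheck {
    // Hypothetical allowlist of framework packages that are still isolated.
    private static final List<String> ISOLATED_FRAMEWORK_PREFIXES = List.of(
            "org.apache.kafka.connect.transforms.",
            "org.apache.kafka.connect.json.");

    public static boolean shouldLoadInIsolation(String className) {
        if (!className.startsWith("org.apache.kafka.")) {
            return true; // third-party plugin code: always isolate
        }
        // Framework code: isolate only if it is on the allowlist; otherwise it
        // falls through to the parent (App) class loader.
        return ISOLATED_FRAMEWORK_PREFIXES.stream().anyMatch(className::startsWith);
    }
}
```

Under this sketch, `org.apache.kafka.connect.mirror.*` classes are framework classes not on the allowlist, so they load from the parent loader; a third-party connector such as `io.confluent.connect.jdbc.JdbcSourceConnector` would be isolated.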
[jira] [Commented] (KAFKA-8770) Either switch to or add an option for emit-on-change
[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013373#comment-17013373 ]

John Roesler commented on KAFKA-8770:
-------------------------------------

Hey! Thanks for the effort on this. I looked over your draft. For the KIP, I think the only question is whether this should simply become the default, whether there should be a config, or whether it should be the new default with an opt-out config. I don't think the question of whether we should separately store a hash of the value vs. the value itself really needs to be discussed in a KIP. That would be more like an implementation discussion.

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close (using Suppression)
> * emit-on-update (i.e., emit a new result whenever a new record is
>   processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally for
> downstream Streams operations and for external systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
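The implementation idea quoted above ("loading the prior results before a stateful operation and comparing with the new result before persisting or forwarding") can be sketched in a few lines. This is a minimal stand-in, not the Streams Processor API: a plain `HashMap` plays the role of the state store, and the `forwarded` list plays the role of `context.forward(...)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal emit-on-change sketch: forward a result only when it differs from
// the stored prior result for the same key; identical results are no-ops.
public class EmitOnChange<K, V> {
    private final Map<K, V> store = new HashMap<>();   // stand-in for a state store
    public final List<V> forwarded = new ArrayList<>(); // stand-in for context.forward(...)

    public void process(K key, V newResult) {
        V prior = store.get(key);
        if (prior != null && prior.equals(newResult)) {
            return; // no-op update: suppress the downstream emission
        }
        store.put(key, newResult);
        forwarded.add(newResult);
    }
}
```

Note that in this sketch the comparison happens on the deserialized value via `equals`; the timestamp question raised below (whether a suppressed no-op should advance the result's timestamp) is deliberately left out, since the ticket identifies it as an open design point.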
[jira] [Comment Edited] (KAFKA-8770) Either switch to or add an option for emit-on-change
[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013370#comment-17013370 ]

Richard Yu edited comment on KAFKA-8770 at 1/11/20 4:23 AM:
------------------------------------------------------------

I have created a draft KIP for this JIRA. [~xmar] [~mjsax] [~vvcephei] Input would be greatly appreciated! Right now, I've not formalized any API additions / configuration changes. It would be good first to get some discussion on what is needed and what is not!

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams]

was (Author: yohan123):
I have created a draft KIP for this JIRA. [~xmar] [~mjsax] [~vvcephei] Input would be greatly appreciated! Right now, I've not formalized any API additions / configuration changes. It would be good first to get some discussion on what is needed and what is not!

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-NUM%3A+Add+emit+on+change+support+for+Kafka+Streams#KIP-NUM:AddemitonchangesupportforKafkaStreams-DetailsonCoreImprovement]

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close (using Suppression)
> * emit-on-update (i.e., emit a new result whenever a new record is
>   processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally for
> downstream Streams operations and for external systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-8770) Either switch to or add an option for emit-on-change
[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013370#comment-17013370 ]

Richard Yu commented on KAFKA-8770:
-----------------------------------

I have created a draft KIP for this JIRA. [~xmar] [~mjsax] [~vvcephei] Input would be greatly appreciated! Right now, I've not formalized any API additions / configuration changes. It would be good first to get some discussion on what is needed and what is not!

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-NUM%3A+Add+emit+on+change+support+for+Kafka+Streams#KIP-NUM:AddemitonchangesupportforKafkaStreams-DetailsonCoreImprovement]

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close (using Suppression)
> * emit-on-update (i.e., emit a new result whenever a new record is
>   processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally for
> downstream Streams operations and for external systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (KAFKA-7499) Extend ProductionExceptionHandler to cover serialization exceptions
[ https://issues.apache.org/jira/browse/KAFKA-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jbfletch reassigned KAFKA-7499:
-------------------------------

Assignee: jbfletch (was: Walker Carlson)

> Extend ProductionExceptionHandler to cover serialization exceptions
> -------------------------------------------------------------------
>
> Key: KAFKA-7499
> URL: https://issues.apache.org/jira/browse/KAFKA-7499
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: jbfletch
> Priority: Major
> Labels: beginner, kip, newbie
>
> In [KIP-210|https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce],
> an exception handler for the write path was introduced. This exception
> handler covers exceptions that are raised in the producer callback.
> However, serialization happens before the data is handed to the producer,
> within Kafka Streams itself, and the producer uses `byte[]/byte[]`
> key-value-pair types.
> Thus, we might want to extend the ProductionExceptionHandler to cover
> serialization exceptions, too, to skip over corrupted output messages. An
> example could be a "String" message that contains invalid JSON and should be
> serialized as JSON.
> KIP-399 (not voted yet; feel free to pick it up):
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-399%3A+Extend+ProductionExceptionHandler+to+cover+serialization+exceptions]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
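To make the gap concrete: the existing handler only sees exceptions from the producer callback, where the record is already `byte[]/byte[]`, so a serialization failure never reaches it. A hypothetical sketch of the proposed extension (illustrative names only, not the final KIP-399 API) would add a second hook that fires when serialization itself throws:

```java
// Hypothetical, simplified handler interface sketching the KIP-399 idea.
// Names and shapes are illustrative; see the KIP for the real proposal.
public interface ExtendedProductionExceptionHandler {
    enum Response { CONTINUE, FAIL }

    // Existing-style hook: called for exceptions raised in the producer callback,
    // after serialization, so the record is already byte[]/byte[].
    Response handle(byte[] key, byte[] value, Exception exception);

    // Proposed-style hook: called when serialization itself throws, before the
    // record ever reaches the producer. Defaulting to FAIL preserves today's
    // fail-fast behavior for handlers that don't override it.
    default Response handleSerializationException(Object record, Exception exception) {
        return Response.FAIL;
    }
}
```

A handler returning `CONTINUE` from the serialization hook would let Streams skip the corrupted output message (e.g. the invalid-JSON "String" example above) instead of dying.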
[jira] [Commented] (KAFKA-2758) Improve Offset Commit Behavior
[ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013271#comment-17013271 ]

Guozhang Wang commented on KAFKA-2758:
--------------------------------------

We used to put this on hold, especially for 1), since KIP-211 was not merged yet. However, even now that KIP-211 is merged we should be careful, since a newer-versioned client may talk to an older-versioned broker (2.0-) which does not have KIP-211 yet. We have some plans for automatically detecting broker versions, so I'd suggest we not pick up this ticket before that lands.

> Improve Offset Commit Behavior
> ------------------------------
>
> Key: KAFKA-2758
> URL: https://issues.apache.org/jira/browse/KAFKA-2758
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: Guozhang Wang
> Priority: Major
> Labels: newbie, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) We can filter out the partitions whose committed offset is equal to the
> consumed offset, meaning there are no newly consumed messages from this
> partition and hence we do not need to include this partition in the commit
> request.
> 2) We can make a commit request right after resetting to a fetch / consume
> position, either according to the reset policy (e.g. on consumer startup, or
> when handling an out-of-range offset, etc.) or through {code}seek{code}, so
> that if the consumer fails right after these events, upon recovery it can
> restart from the reset position instead of resetting again: otherwise this
> can lead to, for example, data loss if we use "largest" as the reset policy
> while there are new messages coming to the fetched partitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
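Improvement 1) above is a pure bookkeeping filter and can be sketched without any client machinery. This is an illustrative stand-in, not consumer-client code: partitions are just strings, and the function returns the subset of consumed positions worth committing.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of improvement 1): before building a commit request, drop every
// partition whose consumed position equals its last committed offset, since
// re-committing the same offset is a no-op on the broker.
public class CommitFilter {
    public static Map<String, Long> offsetsToCommit(Map<String, Long> consumed,
                                                    Map<String, Long> committed) {
        Map<String, Long> toCommit = new HashMap<>();
        for (Map.Entry<String, Long> e : consumed.entrySet()) {
            Long committedOffset = committed.get(e.getKey());
            if (!e.getValue().equals(committedOffset)) {
                toCommit.put(e.getKey(), e.getValue()); // only partitions with progress
            }
        }
        return toCommit;
    }
}
```

For example, if partition p0 is consumed and committed at 5 while p1 is consumed at 3 but committed at 2, only p1 would appear in the commit request.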
[jira] [Comment Edited] (KAFKA-8770) Either switch to or add an option for emit-on-change
[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013230#comment-17013230 ]

Richard Yu edited comment on KAFKA-8770 at 1/10/20 10:09 PM:
-------------------------------------------------------------

[~vvcephei] Actually, I did think of something which might be very useful as a performance enhancement. As mentioned in the JIRA description, Kafka Streams would load prior results and compare them to the original. However, that nonetheless has the potential to be a severe hit to processing speed.

I propose that instead of loading the prior results, we just get the hash code for that prior result instead. If there is a no-op, the hash code of the prior result would be the same as the one that we have currently. However, if the result has _changed_, then, if the hash code function has been implemented correctly, the hash code would have changed correspondingly as well. Therefore, what should be done is the following:
# We keep the hash codes of prior results in some store / whatever other device we might be able to use for storage.
# Whenever we obtain a new processed result, retrieve the corresponding prior hash code to see if it has changed.
# Update the store / table as necessary if the hash code has changed.

was (Author: yohan123):
[~vvcephei] Actually, I did think of something which might be very useful as a performance enhancement. As mentioned in the JIRA description, Kafka Streams would load prior results and compare them to the original. However, that nonetheless has the potential to be a severe hit to processing speed.

I propose that instead of loading the prior results, we just get the hash code for that prior result instead. If there is a no-op, the hash code of the prior result would be the same as the one that we have currently. However, if the result has _changed_, then, if the hash code function has been implemented correctly, the hash code would have changed correspondingly as well. Therefore, what should be done is the following:
# We keep the hash codes of prior results in some store / whatever other device we might be able to use for storage.
# Whenever we obtain a new processed result, retrieve the corresponding prior hash code to see if it has changed.
# Update the store / table as necessary if the hash code has changed.

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close (using Suppression)
> * emit-on-update (i.e., emit a new result whenever a new record is
>   processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally for
> downstream Streams operations and for external systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
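The three steps proposed in the comment above can be sketched in a few lines. This is an illustrative stand-in (a plain `HashMap` for the store, `hashCode()` for the hash function), not Streams code; note the caveat, which the comment leaves implicit, that a hash collision would wrongly suppress a real change, so a production version would likely fall back to a byte-level comparison when the hashes match.

```java
import java.util.HashMap;
import java.util.Map;

// Hash-based emit-on-change sketch: store only the hash of the prior result
// per key, and treat a matching hash as a no-op update.
public class HashEmitOnChange<K, V> {
    private final Map<K, Integer> priorHashes = new HashMap<>();

    // Returns true if the new result should be forwarded downstream.
    public boolean shouldEmit(K key, V newResult) {
        int newHash = newResult.hashCode();
        Integer priorHash = priorHashes.get(key);
        if (priorHash != null && priorHash == newHash) {
            return false; // same hash as prior result: assume no-op
        }
        priorHashes.put(key, newHash); // step 3: update the store on change
        return true;
    }
}
```

Compared with storing full prior values, this trades memory (one `int` per key instead of the whole value) against the small false-negative risk from collisions.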
[jira] [Commented] (KAFKA-8770) Either switch to or add an option for emit-on-change
[ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013230#comment-17013230 ]

Richard Yu commented on KAFKA-8770:
-----------------------------------

[~vvcephei] Actually, I did think of something which might be very useful as a performance enhancement. As mentioned in the JIRA description, Kafka Streams would load prior results and compare them to the original. However, that nonetheless has the potential to be a severe hit to processing speed.

I propose that instead of loading the prior results, we just get the hash code for that prior result instead. If there is a no-op, the hash code of the prior result would be the same as the one that we have currently. However, if the result has _changed_, then, if the hash code function has been implemented correctly, the hash code would have changed correspondingly as well. Therefore, what should be done is the following:
# We keep the hash codes of prior results in some store / whatever other device we might be able to use for storage.
# Whenever we obtain a new processed result, retrieve the corresponding prior hash code to see if it has changed.
# Update the store / table as necessary if the hash code has changed.

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: John Roesler
> Priority: Major
> Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close (using Suppression)
> * emit-on-update (i.e., emit a new result whenever a new record is
>   processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be
> forwarded only if the result has changed. This has been reported to be
> extremely valuable as a performance optimization for some high-traffic
> applications, and it reduces the computational burden both internally for
> downstream Streams operations and for external systems that consume
> the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior
> results before a stateful operation and comparing with the new result before
> persisting or forwarding. In many cases, we load the prior result anyway, so
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at
> time 1 that produces a result, and then another at time 2 that produces a
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide
> emit-on-change for stateful operators, or if it should be configurable.
> Configuration increases complexity, so unless the performance impact is high,
> we may just want to change the emission model without a configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-6078) Investigate failure of ReassignPartitionsClusterTest.shouldExpandCluster
[ https://issues.apache.org/jira/browse/KAFKA-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013208#comment-17013208 ]

Matthias J. Sax commented on KAFKA-6078:
----------------------------------------

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/172/testReport/junit/kafka.admin/ReassignPartitionsClusterTest/shouldExpandCluster/]

> Investigate failure of ReassignPartitionsClusterTest.shouldExpandCluster
> ------------------------------------------------------------------------
>
> Key: KAFKA-6078
> URL: https://issues.apache.org/jira/browse/KAFKA-6078
> Project: Kafka
> Issue Type: Bug
> Components: core, unit tests
> Affects Versions: 2.3.0
> Reporter: Dong Lin
> Priority: Major
> Labels: flaky-test
> Fix For: 2.5.0
>
> See https://github.com/apache/kafka/pull/4084

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (KAFKA-9399) Flaky Test BranchedMultiLevelRepartitionConnectedTopologyTest.testTopologyBuild
Matthias J. Sax created KAFKA-9399:
--------------------------------------

Summary: Flaky Test BranchedMultiLevelRepartitionConnectedTopologyTest.testTopologyBuild
Key: KAFKA-9399
URL: https://issues.apache.org/jira/browse/KAFKA-9399
Project: Kafka
Issue Type: Bug
Components: streams, unit tests
Affects Versions: 2.5.0
Reporter: Matthias J. Sax

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/172/testReport/junit/org.apache.kafka.streams.integration/BranchedMultiLevelRepartitionConnectedTopologyTest/testTopologyBuild/]

{quote}java.lang.AssertionError: Condition not met within timeout 15000. Failed to observe stream transits to RUNNING
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26)
at org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:369)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:417)
at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:368)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:339)
at org.apache.kafka.streams.integration.BranchedMultiLevelRepartitionConnectedTopologyTest.testTopologyBuild(BranchedMultiLevelRepartitionConnectedTopologyTest.java:146){quote}

STDOUT

{quote}[2020-01-10 20:54:59,190] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-consumer, groupId=branched-repartition-topic-test] Connection to node 0 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
[2020-01-10 20:54:59,264] WARN [Producer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-producer] Connection to node 0 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
[2020-01-10 20:54:59,314] WARN [AdminClient clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-admin] Connection to node 0 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
...
[2020-01-10 20:54:59,356] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-restore-consumer, groupId=null] Connection to node -1 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
[2020-01-10 20:54:59,356] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-restore-consumer, groupId=null] Bootstrap broker localhost:38720 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1024)
[2020-01-10 20:54:59,391] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-consumer, groupId=branched-repartition-topic-test] Connection to node 0 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
[2020-01-10 20:54:59,457] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-restore-consumer, groupId=null] Connection to node -1 (localhost/127.0.0.1:38720) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient:756)
[2020-01-10 20:54:59,457] WARN [Consumer clientId=branched-repartition-topic-test-7bb4acef-c1c1-401f-adb9-a67a448cae02-StreamThread-1-restore-consumer, groupId=null] Bootstrap broker localhost:38720 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient:1024)
... and some more of those...{quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013187#comment-17013187 ] Tomislav Rajakovic commented on KAFKA-8764: --- [~jnadler] for now, try to "help" LogCleaner by following solution steps from above. Idea is to move it's state file (cleaner-offset-checkpoint inside topic-partition folder) on all brokers that experiencing issue. Once when next LogSegment becomes available for cleaning, LogCleaner would fix himself and continue to work as expected (unless you'll have new record "holes", but that is, as [~junrao] stated, is rare event happening in edge cases). Advanced solution is to checkout kafka 2.4.0 from github, apply patch attached to this issue, rebuild kafka and run patched version of kafka with your data, and issue, hopefully, should be gone. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. 
> > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index >
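The checkpoint-moving workaround described in the comment above can be sketched as follows. This is a minimal illustration only, not Kafka code: the class and method names are hypothetical, the broker should be stopped before touching its files, and the exact directory holding cleaner-offset-checkpoint depends on your log.dirs layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: move the LogCleaner's checkpoint file aside so the
// cleaner rebuilds its state on the next cleaning pass. Run only while the
// broker is stopped.
public class CleanerCheckpointReset {

    /** Moves cleaner-offset-checkpoint to a .bak file; returns true if moved. */
    public static boolean moveCheckpointAside(Path logDir) throws IOException {
        Path checkpoint = logDir.resolve("cleaner-offset-checkpoint");
        if (!Files.exists(checkpoint)) {
            return false; // nothing to do on this broker
        }
        Files.move(checkpoint, logDir.resolve("cleaner-offset-checkpoint.bak"));
        return true;
    }

    /** Self-contained demo against a throwaway directory. */
    public static boolean demo() throws IOException {
        Path dir = Files.createTempDirectory("logdir");
        Files.write(dir.resolve("cleaner-offset-checkpoint"), "0\n1\n".getBytes());
        boolean moved = moveCheckpointAside(dir);
        return moved
                && Files.exists(dir.resolve("cleaner-offset-checkpoint.bak"))
                && !Files.exists(dir.resolve("cleaner-offset-checkpoint"));
    }
}
```

Renaming rather than deleting keeps a backup in case the cleaner's state needs to be restored.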
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013179#comment-17013179 ] ASF GitHub Bot commented on KAFKA-8764: --- trajakovic commented on pull request #7932: KAFKA-8764: LogCleanerManager endless loop while compacting/clea URL: https://github.com/apache/kafka/pull/7932 This PR fixes the LogCleaner's endless loop while cleaning LogSegments with holes. In rare cases, when cleaning LogSegments with missing records, the LogCleaner was unable to progress, resulting in high CPU usage, heavy disk reads/writes, and excessive cleaner logs (if enabled). This PR addresses this situation by skipping the missing record(s), thereby avoiding the endless loop while cleaning such logs. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org 
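The failure mode and the fix this PR describes can be illustrated with a toy model. This is not Kafka's actual cleaner code — the class, the offset set, and the method are invented for illustration: a loop that demands every consecutive offset spins forever at a hole, while one that jumps to the next *present* offset always terminates.

```java
import java.util.TreeSet;

// Toy model of the endless-loop bug: offsets 3 and 4 below play the role of
// a "hole" left by an earlier compaction. Skipping to the next present
// offset (TreeSet navigation) guarantees forward progress.
public class HoleSkippingSketch {

    /** Visits each present record in [start, end), skipping over holes. */
    public static int cleanWithSkip(TreeSet<Long> presentOffsets, long start, long end) {
        int visited = 0;
        Long next = presentOffsets.ceiling(start);  // first record at or after start
        while (next != null && next < end) {
            visited++;                              // "clean" this record
            next = presentOffsets.higher(next);     // advance past any hole
        }
        return visited;
    }
}
```

The key property is that the cursor is driven by the records that actually exist, never by an expected-but-missing offset.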
[jira] [Commented] (KAFKA-9253) Test failure : ReassignPartitionsClusterTest.shouldListMovingPartitionsThroughApi
[ https://issues.apache.org/jira/browse/KAFKA-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013152#comment-17013152 ] Bill Bejeck commented on KAFKA-9253: Seen again in [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4182/testReport/junit/kafka.admin/ReassignPartitionsClusterTest/shouldListMovingPartitionsThroughApi/] {noformat} Error Message: java.lang.NullPointerException Stacktrace: java.lang.NullPointerException at kafka.admin.ReassignPartitionsClusterTest.assertIsReassigning(ReassignPartitionsClusterTest.scala:1190) at kafka.admin.ReassignPartitionsClusterTest.shouldListMovingPartitionsThroughApi(ReassignPartitionsClusterTest.scala:811) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) at 
jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182) at
[jira] [Commented] (KAFKA-9397) Deprecate Direct Zookeeper access in Kafka Administrative Tools
[ https://issues.apache.org/jira/browse/KAFKA-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013151#comment-17013151 ] Gérald Quintana commented on KAFKA-9397: kafka-acls.sh doesn't have a --zookeeper argument, but --authorizer-property zookeeper.connect amounts to the same thing in the end > Deprecate Direct Zookeeper access in Kafka Administrative Tools > --- > > Key: KAFKA-9397 > URL: https://issues.apache.org/jira/browse/KAFKA-9397 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.5.0 >Reporter: Colin McCabe >Assignee: Colin McCabe >Priority: Major > Fix For: 2.5.0 > > > KIP-555: Deprecate Direct Zookeeper access in Kafka Administrative Tools -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9393) DeleteRecords may cause extreme lock contention for large partition directories
[ https://issues.apache.org/jira/browse/KAFKA-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013150#comment-17013150 ] ASF GitHub Bot commented on KAFKA-9393: --- gardnervickers commented on pull request #7929: KAFKA-9393: Establish a 1:1 mapping between producer state snapshot files and segment files. URL: https://github.com/apache/kafka/pull/7929 https://issues.apache.org/jira/browse/KAFKA-9393 This PR avoids a performance issue with `DeleteRecords` when a partition directory contains a large number of files. Previously, `DeleteRecords` would iterate the partition directory searching for producer state snapshot files. With this change, the iteration is removed in favor of keeping a 1:1 mapping between producer state snapshot files and segment files. A segment file's corresponding producer state snapshot file is now deleted when the segment file is deleted. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > DeleteRecords may cause extreme lock contention for large partition > directories > --- > > Key: KAFKA-9393 > URL: https://issues.apache.org/jira/browse/KAFKA-9393 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Lucas Bradstreet >Priority: Major > > DeleteRecords, frequently used by Kafka Streams, triggers a > Log.maybeIncrementLogStartOffset call, calling > kafka.log.ProducerStateManager.listSnapshotFiles, which calls > java.io.File.listFiles on the partition dir. The time taken to list this > directory can be extreme for partitions with many small segments (e.g 2), > taking multiple seconds to finish. 
This causes lock contention for the log, > and if produce requests are also occurring for the same log, a > majority of request handler threads can become blocked waiting for the > DeleteRecords call to finish. > I believe this is a problem going back to the initial implementation of the > transactional producer, but I need to confirm how far back it goes. > One possible solution is to maintain a producer state snapshot aligned to the > log segment, and simply delete it whenever we delete a segment. This would > ensure that we never have to perform a directory scan. -- This message was sent by Atlassian Jira (v8.3.4#803005)
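The 1:1 mapping the PR proposes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PR's code: Kafka zero-pads offsets to 20 digits in segment file names, and deriving the snapshot name directly from a segment's base offset makes the lookup O(1), with no directory listing held under the log lock.

```java
import java.nio.file.Path;

// Sketch of the 1:1 mapping idea: instead of scanning the partition
// directory (java.io.File.listFiles) to find producer-state snapshot files,
// derive the snapshot file name from a segment's base offset, mirroring the
// 20-digit zero-padded naming Kafka uses for segment files.
public class SnapshotPathSketch {

    public static String snapshotFileName(long baseOffset) {
        return String.format("%020d.snapshot", baseOffset);
    }

    /** O(1) path construction: no directory listing, nothing to block on. */
    public static Path snapshotPath(Path partitionDir, long segmentBaseOffset) {
        return partitionDir.resolve(snapshotFileName(segmentBaseOffset));
    }
}
```

With the name derivable, deleting a segment can delete exactly its snapshot, so the directory scan disappears entirely.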
[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013115#comment-17013115 ] Matthias J. Sax commented on KAFKA-9398: Other one: https://issues.apache.org/jira/browse/KAFKA-8178 > Kafka Streams main thread may not exit even after close timeout has passed > -- > > Key: KAFKA-9398 > URL: https://issues.apache.org/jira/browse/KAFKA-9398 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bill Bejeck >Assignee: Bill Bejeck >Priority: Critical > Fix For: 2.5.0 > > > Kafka Streams offers the KafkaStreams.close() method when shutting down a > Kafka Streams application. There are two overloads of this method, one that > takes no parameters and another taking a Duration specifying how long the > close() method should block waiting for stream shutdown operations to > complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. > The issue is that if a StreamThread is taking too long to complete or if one > of the Consumer or Producer clients is in a hung state, the Kafka Streams > application won't exit even after the specified timeout has expired. > For example, consider this scenario: > # A sink topic gets deleted by accident > # The user sets the Producer max.block.ms config to a high value > In this case, the Producer will issue a WARN logging statement and will > continue to make metadata requests looking for the expected topic. The > {{Producer}} will continue making metadata requests until max.block.ms > expires. If this value is high enough, calling close() with a timeout won't > fix the issue, as when the timeout expires, the Kafka Streams application's > main thread won't exit. > To prevent this type of issue, we should call Thread.interrupt() on all > StreamThread instances once the close() timeout has expired. -- This message was sent by Atlassian Jira (v8.3.4#803005)
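The interrupt-after-timeout idea from the issue description can be sketched with plain threads. The class and method names are hypothetical; "worker" stands in for a StreamThread, and a real StreamThread would also need its blocking calls to respond to interruption.

```java
import java.time.Duration;

// Sketch of the proposed fix: bound the wait on a worker thread, and if it
// is still alive when the close() timeout expires, interrupt it so the main
// thread can exit.
public class TimeoutInterruptSketch {

    /** Returns true if the worker exited within the timeout on its own. */
    public static boolean closeWithTimeout(Thread worker, Duration timeout)
            throws InterruptedException {
        worker.join(timeout.toMillis());  // wait up to the caller's close() timeout
        if (!worker.isAlive()) {
            return true;
        }
        worker.interrupt();               // unblock a hung sleep/wait/send
        worker.join();                    // the interrupted thread should now exit
        return false;                     // signal the timeout was exceeded
    }

    /** Demo: a thread "hung" for 10 s is shut down after a 100 ms timeout. */
    public static boolean demo() throws InterruptedException {
        Thread hung = new Thread(() -> {
            try {
                Thread.sleep(10_000);     // simulates a client stuck blocking
            } catch (InterruptedException e) {
                // interrupted: exit promptly, as the proposal intends
            }
        });
        hung.start();
        boolean cleanExit = closeWithTimeout(hung, Duration.ofMillis(100));
        return !cleanExit && !hung.isAlive();
    }
}
```

The return value lets the caller distinguish a clean shutdown from a forced one, which is useful for logging why close() overran its budget.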
[jira] [Commented] (KAFKA-8178) KafkaProducer#send(ProducerRecord,Callback) may block for up to 60 seconds
[ https://issues.apache.org/jira/browse/KAFKA-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013117#comment-17013117 ] Matthias J. Sax commented on KAFKA-8178: KAFKA-3450 is also related > KafkaProducer#send(ProducerRecord,Callback) may block for up to 60 seconds > -- > > Key: KAFKA-8178 > URL: https://issues.apache.org/jira/browse/KAFKA-8178 > Project: Kafka > Issue Type: Bug > Components: clients, producer >Reporter: Sergei Egorov >Priority: Major > > Hello. I was running reactor-kafka with [the BlockHound > agent|https://github.com/reactor/BlockHound] (you can see the progress > [here|https://github.com/reactor/reactor-kafka/pull/75] and even run it > yourself) and it detected a very dangerous blocking call in > KafkaProducer#send(ProducerRecord,Callback) which is supposed to be async: > {code:java} > java.lang.Error: Blocking call! java.lang.Object#wait > at reactor.BlockHound$Builder.lambda$new$0(BlockHound.java:154) > at reactor.BlockHound$Builder.lambda$install$8(BlockHound.java:254) > at reactor.BlockHoundRuntime.checkBlocking(BlockHoundRuntime.java:43) > at java.lang.Object.wait(Object.java) > at org.apache.kafka.clients.Metadata.awaitUpdate(Metadata.java:181) > at > org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata(KafkaProducer.java:938) > at > org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:823) > at > org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:803) > {code} > it blocks for up to "maxBlockTimeMs" (60 seconds by default) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-6571) KafkaProducer.close(0) should be non-blocking
[ https://issues.apache.org/jira/browse/KAFKA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax resolved KAFKA-6571. Resolution: Duplicate KIP-367 adds `close(Duration)` with new non-blocking semantics: [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89070496] Closing this. > KafkaProducer.close(0) should be non-blocking > - > > Key: KAFKA-6571 > URL: https://issues.apache.org/jira/browse/KAFKA-6571 > Project: Kafka > Issue Type: Bug >Reporter: Dong Lin >Assignee: Ahmed Al-Mehdi >Priority: Major > > According to the Java doc of producer.close(long timeout, TimeUnit timeUnit), > "Specifying a timeout of zero means do not wait for pending > send requests to complete". However, producer.close(0) can currently block > waiting for the sender thread to exit, which in turn can block on the user's > callback. > We probably should not let producer.close(0) join the sender thread if the user > has specified a zero timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-3450) Producer blocks on send to topic that doesn't exist if auto create is disabled
[ https://issues.apache.org/jira/browse/KAFKA-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013110#comment-17013110 ] Matthias J. Sax commented on KAFKA-3450: Seems KAFKA-3539 is a duplicate? > Producer blocks on send to topic that doesn't exist if auto create is disabled > -- > > Key: KAFKA-3450 > URL: https://issues.apache.org/jira/browse/KAFKA-3450 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.9.0.1 >Reporter: Michal Turek >Priority: Critical > > {{producer.send()}} is blocked for {{max.block.ms}} (default 60 seconds) if > the destination topic doesn't exist and its automatic creation is > disabled. A warning from NetworkClient containing UNKNOWN_TOPIC_OR_PARTITION is > logged every 100 ms in a loop until the 60 second timeout expires, but the > operation is not recoverable. > Preconditions > - Kafka 0.9.0.1 with default configuration and auto.create.topics.enable=false > - Kafka 0.9.0.1 clients. > Example minimalist code > https://github.com/avast/kafka-tests/blob/master/src/main/java/com/avast/kafkatests/othertests/nosuchtopic/NoSuchTopicTest.java > {noformat} > /** > * Test of sending to a topic that does not exist while automatic creation of > topics is disabled in Kafka (auto.create.topics.enable=false). 
> */ > public class NoSuchTopicTest { > private static final Logger LOGGER = > LoggerFactory.getLogger(NoSuchTopicTest.class); > public static void main(String[] args) { > Properties properties = new Properties(); > properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, > "localhost:9092"); > properties.setProperty(ProducerConfig.CLIENT_ID_CONFIG, > NoSuchTopicTest.class.getSimpleName()); > properties.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "1000"); > // Default is 60 seconds > try (Producer<String, String> producer = new > KafkaProducer<>(properties, new StringSerializer(), new StringSerializer())) { > LOGGER.info("Sending message"); > producer.send(new ProducerRecord<>("ThisTopicDoesNotExist", > "key", "value"), (metadata, exception) -> { > if (exception != null) { > LOGGER.error("Send failed: {}", exception.toString()); > } else { > LOGGER.info("Send successful: {}-{}/{}", > metadata.topic(), metadata.partition(), metadata.offset()); > } > }); > LOGGER.info("Sending message"); > producer.send(new ProducerRecord<>("ThisTopicDoesNotExistToo", > "key", "value"), (metadata, exception) -> { > if (exception != null) { > LOGGER.error("Send failed: {}", exception.toString()); > } else { > LOGGER.info("Send successful: {}-{}/{}", > metadata.topic(), metadata.partition(), metadata.offset()); > } > }); > } > } > } > {noformat} > Related output > {noformat} > 2016-03-23 12:44:37.725 INFO c.a.k.o.nosuchtopic.NoSuchTopicTest [main]: > Sending message (NoSuchTopicTest.java:26) > 2016-03-23 12:44:37.830 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 0 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 2016-03-23 12:44:37.928 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 1 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 
2016-03-23 12:44:38.028 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 2 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 2016-03-23 12:44:38.130 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 3 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 2016-03-23 12:44:38.231 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 4 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 2016-03-23 12:44:38.332 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread | NoSuchTopicTest]: Error while fetching > metadata with correlation id 5 : > {ThisTopicDoesNotExist=UNKNOWN_TOPIC_OR_PARTITION} (NetworkClient.java:582) > 2016-03-23 12:44:38.433 WARN o.a.kafka.clients.NetworkClient > [kafka-producer-network-thread |
[jira] [Commented] (KAFKA-3539) KafkaProducer.send() may block even though it returns the Future
[ https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013111#comment-17013111 ] Matthias J. Sax commented on KAFKA-3539: Seems KAFKA-3450 is a duplicate? > KafkaProducer.send() may block even though it returns the Future > > > Key: KAFKA-3539 > URL: https://issues.apache.org/jira/browse/KAFKA-3539 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Oleg Zhurakousky >Priority: Critical > Labels: needs-discussion, needs-kip > > You can get more details from the us...@kafka.apache.org by searching on the > thread with the subject "KafkaProducer block on send". > The bottom line is that a method that returns a Future must never block, since that > essentially violates the Future contract: it was specifically designed to > return immediately, passing control back to the user to check for completion, > cancel, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
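One client-side mitigation for the contract violation discussed in these threads can be sketched as follows. This is not a KafkaProducer API — the wrapper class and names are invented for illustration: hand the potentially blocking call to an executor so the caller genuinely regains control immediately.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// If an API that returns a Future may still block the calling thread (as
// KafkaProducer.send() can while waiting for metadata), submitting the call
// to an executor keeps the caller non-blocking. The Callable below is a
// stand-in for the potentially blocking send().
public class AsyncSendWrapper {

    private final ExecutorService sender = Executors.newSingleThreadExecutor();

    /** Submits the (possibly blocking) call; returns without blocking. */
    public <T> Future<T> sendAsync(Callable<T> blockingSend) {
        return sender.submit(blockingSend);
    }

    public void shutdown() {
        sender.shutdownNow(); // also interrupts an in-flight blocking call
    }
}
```

The trade-off is that the blocking merely moves onto the executor's thread, so the wrapper bounds how many sends can be stuck at once rather than eliminating the block itself.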
[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013109#comment-17013109 ] Matthias J. Sax commented on KAFKA-9398: Other related issue: https://issues.apache.org/jira/browse/KAFKA-3450 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013108#comment-17013108 ] Matthias J. Sax commented on KAFKA-9398: The root cause seems to be https://issues.apache.org/jira/browse/KAFKA-3539 – i.e., because `send()` might block, the `StreamThread` is stuck and thus cannot call `producer.close()`. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013081#comment-17013081 ] Jun Rao commented on KAFKA-8764: [~trajakovic]: Thanks for providing the additional info. That makes sense now. Currently, the log cleaner runs independently on each replica. If a follower has been down for some time, it is possible that the leader has already cleaned the data and left holes in some log segments. When those segments get replicated to the follower, the follower will clean the same data again and potentially hit the above issue. 
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013073#comment-17013073 ] Tomislav Rajakovic commented on KAFKA-8764: --- Hi [~junrao]. This issue happened on a relatively busy Kafka cluster, but I've never tinkered with Kafka's files or performed any manual actions. What's more indicative, this issue first happened on follower brokers/replicas (ISRs), while the leader broker for the topic-partition didn't "feel" the same effect. Maybe the origin of the problem was the first-ever cleanup on the leader broker leaving holes, but that's just my guess. And yes, I'm going to open a PR; I just need to find a "first good PR" and read the contribution guidelines.
> LogCleanerManager endless loop while compacting/cleaning segments
> -
>
> Key: KAFKA-8764
> URL: https://issues.apache.org/jira/browse/KAFKA-8764
> Project: Kafka
> Issue Type: Bug
> Components: log cleaner
> Affects Versions: 2.3.0, 2.2.1, 2.4.0
> Environment: docker base image: openjdk:8-jre-alpine, kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
> Reporter: Tomislav Rajakovic
> Priority: Major
> Labels: patch
> Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, log-cleaner-bug-reproduction.zip
>
> {{LogCleanerManager stuck in an endless loop while cleaning segments for one partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>
> The issue appeared on follower brokers, and it happens on every (new) broker if the partition assignment is changed.
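To make the reported failure mode concrete, here is a deliberately simplified model (illustrative Python, not Kafka's actual Scala log cleaner, whose bookkeeping is more involved) of how a hole at the cleaner's checkpoint can pin it in place; the function name and the advance rule are assumptions for illustration only:

```python
# Simplified rule: after a cleaning pass, the dirty-offset checkpoint
# moves just past the last offset mapped from the dirty section. If the
# checkpoint sits in a "hole" (an offset compacted away), nothing maps
# and the checkpoint never moves.

def next_dirty_offset(segment_offsets, dirty_offset):
    """Checkpoint after one cleaning pass over a single segment."""
    mapped = [o for o in segment_offsets if o >= dirty_offset]
    if not mapped:
        return dirty_offset  # nothing mapped: the checkpoint is stuck
    return mapped[-1] + 1

# Segment [0, 233) with offset 232 missing, as discussed on this ticket:
segment = [o for o in range(233) if o != 232]

# First pass: the last mapped offset is 231, so the checkpoint lands on
# 232 -- exactly the hole.
assert next_dirty_offset(segment, 0) == 232

# Every later pass: no offset at or beyond 232 exists, so the checkpoint
# never moves, and the same segment is re-cleaned (and its old files
# re-deleted, as in the log excerpt above) forever.
assert next_dirty_offset(segment, 232) == 232
```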
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013058#comment-17013058 ] Jun Rao commented on KAFKA-8764: [~trajakovic]: Thanks for the investigation. Normally, the offset map is built from the dirty portion of the log, which shouldn't contain holes in its offsets. If segment [0,233) is missing offset 232, it means that this segment has already been cleaned, and the dirty offset should have moved to 233 after the first round of cleaning. Was the dirty offset ever reset manually? In any case, I agree that it would be better to make the code more defensive. Could you submit the patch as a PR (details can be found in [https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes])?
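A sketch of the "more defensive" check suggested here (an assumed shape in illustrative Python, not the actual patch attached to this ticket): if no offsets at or beyond the dirty offset exist in the segment, jump the checkpoint to the segment's end offset instead of retrying the same segment forever:

```python
def advance_checkpoint(segment_offsets, segment_end_offset, dirty_offset):
    """Advance the cleaner checkpoint past a segment, tolerating holes."""
    mapped = [o for o in segment_offsets if o >= dirty_offset]
    if not mapped:
        # Defensive path: the dirty offset points into a hole, so don't
        # spin on this segment -- skip to its end offset.
        return segment_end_offset
    return mapped[-1] + 1

segment = [o for o in range(232)]            # offset 232 compacted away
assert advance_checkpoint(segment, 233, 232) == 233   # no endless loop
assert advance_checkpoint(segment, 233, 0) == 232     # normal advance
```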
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013052#comment-17013052 ] Jeff Nadler commented on KAFKA-8764: I just attached a graph of CPU usage in a small, 3-node cluster that shows this issue. You can see a massive increase in CPU usage on all 3 nodes when the log cleaner is going nuts - the high-CPU periods correspond to tons of log entries in 'log-cleaner.log' and high CPU usage of the 'kafka-log-clean' thread.
[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Nadler updated KAFKA-8764: --- Attachment: Screen Shot 2020-01-10 at 8.38.25 AM.png
[jira] [Commented] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013044#comment-17013044 ] Bill Bejeck commented on KAFKA-9398: A PR is available for this ticket: [https://github.com/apache/kafka/pull/7814]
> Kafka Streams main thread may not exit even after close timeout has passed
> --
>
> Key: KAFKA-9398
> URL: https://issues.apache.org/jira/browse/KAFKA-9398
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Bill Bejeck
> Assignee: Bill Bejeck
> Priority: Critical
> Fix For: 2.5.0
>
> Kafka Streams offers the KafkaStreams.close() method for shutting down a Kafka Streams application. There are two overloads of this method: one takes no parameters, and the other takes a Duration specifying how long close() should block waiting for shutdown operations to complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE.
> The issue is that if a StreamThread is taking too long to complete, or if one of the Consumer or Producer clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired.
> For example, consider this scenario:
> # A sink topic gets deleted by accident
> # The user sets the Producer max.block.ms config to a high value
> In this case, the Producer will issue a WARN logging statement and will continue to make metadata requests looking for the expected topic, up until max.block.ms expires. If this value is high enough, calling close() with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread still won't exit.
> To prevent this type of issue, we should call Thread.interrupt() on all StreamThread instances once the close() timeout has expired.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
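The core of the problem can be reproduced outside Kafka. The sketch below is a Python analogue, not Kafka's Java code: an Event stands in for a Producer stuck inside max.block.ms, since Python has no Thread.interrupt(). A timed join() returns when the deadline passes, but the still-alive non-daemon thread keeps the process from exiting, just as a hung StreamThread keeps the JVM up:

```python
import threading

hang = threading.Event()

def worker():
    # Stands in for a StreamThread blocked on a hung client call.
    hang.wait()

t = threading.Thread(target=worker)   # non-daemon, like a StreamThread
t.start()

t.join(timeout=0.1)   # "close(Duration)": gives up after the timeout...
assert t.is_alive()   # ...but the thread (and the process) lives on

hang.set()            # unblocking the thread plays the role the proposed
t.join()              # Thread.interrupt() would play in the JVM
assert not t.is_alive()
```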
[jira] [Updated] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck updated KAFKA-9398:
---
Description:
Kafka Streams offers the KafkaStreams.close() method when shutting down a Kafka Streams application. There are two overloads to this method, one that takes no parameters and another taking a Duration specifying how long the close() method should block waiting for streams shutdown operations to complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. The issue is that if a StreamThread is taking too long to complete, or if one of the Consumer or Producer clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired. For example, consider this scenario: # A sink topic gets deleted by accident # The user sets the Producer max.block.ms config to a high value. In this case, the Producer will issue a WARN logging statement and will continue to make metadata requests looking for the expected topic until max.block.ms expires. If this value is high enough, calling close() with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread won't exit. To prevent this type of issue, we should call Thread.interrupt() on all StreamThread instances once the close() timeout has expired.

was:
Kafka Streams offers the KafkaStreams.close() method when shutting down a Kafka Streams application. There are two overloads to this method, one that takes no parameters and another taking a Duration specifying how long the close() method should block waiting for streams shutdown operations to complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. The issue is that if a StreamThread is taking too long to complete, or if one of the Consumer or Producer clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired. For example, consider this scenario: # A sink topic gets deleted by accident # The user sets the Producer max.block.ms config to a high value. In this case, the Producer will issue a WARN logging statement and will continue to make metadata requests looking for the expected topic. The {{Producer}} will continue making metadata requests until max.block.ms expires. If this value is high enough, calling close() with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread won't exit. To prevent this type of issue, we should call Thread.interrupt() on all StreamThread instances once the close() timeout has expired.
[jira] [Updated] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck updated KAFKA-9398:
---
Description:
Kafka Streams offers the KafkaStreams.close() method when shutting down a Kafka Streams application. There are two overloads to this method, one that takes no parameters and another taking a Duration specifying how long the close() method should block waiting for streams shutdown operations to complete. The no-arg version of close() sets the timeout to Long.MAX_VALUE. The issue is that if a StreamThread is taking too long to complete, or if one of the Consumer or Producer clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired. For example, consider this scenario: # A sink topic gets deleted by accident # The user sets the Producer max.block.ms config to a high value. In this case, the Producer will issue a WARN logging statement and will continue to make metadata requests looking for the expected topic. The {{Producer}} will continue making metadata requests until max.block.ms expires. If this value is high enough, calling close() with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread won't exit. To prevent this type of issue, we should call Thread.interrupt() on all StreamThread instances once the close() timeout has expired.

was:
Kafka Streams offers the {{KafkaStreams.close()}} method when shutting down a Kafka Streams application. There are two overloads to this method, one that takes no parameters and another taking a {{Duration}} specifying how long the {{close()}} method should block waiting for streams shutdown operations to complete. The no-arg version of {{close()}} sets the timeout to {{Long.MAX_VALUE}}. The issue is that if a {{StreamThread}} is somehow hung, or if one of the {{Consumer}} or {{Producer}} clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired. For example, consider this scenario: # A sink topic gets deleted by accident # The {{Producer}} max.block.ms config is set to a high value. In this case the {{Producer}} will issue a {{WARN}} logging statement and will continue to make metadata requests looking for the expected topic. This will continue until {{max.block.ms}} expires. If this value is high enough, calling {{close()}} with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread won't exit. To prevent this type of issue, we should call {{Thread.interrupt()}} on all {{StreamThread}} instances once the {{close()}} timeout has expired.
[jira] [Updated] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
[ https://issues.apache.org/jira/browse/KAFKA-9398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bejeck updated KAFKA-9398:
---
Priority: Critical (was: Major)
[jira] [Created] (KAFKA-9398) Kafka Streams main thread may not exit even after close timeout has passed
Bill Bejeck created KAFKA-9398:
--
Summary: Kafka Streams main thread may not exit even after close timeout has passed
Key: KAFKA-9398
URL: https://issues.apache.org/jira/browse/KAFKA-9398
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Bill Bejeck
Assignee: Bill Bejeck
Fix For: 2.5.0

Kafka Streams offers the {{KafkaStreams.close()}} method when shutting down a Kafka Streams application. There are two overloads to this method, one that takes no parameters and another taking a {{Duration}} specifying how long the {{close()}} method should block waiting for streams shutdown operations to complete. The no-arg version of {{close()}} sets the timeout to {{Long.MAX_VALUE}}. The issue is that if a {{StreamThread}} is somehow hung, or if one of the {{Consumer}} or {{Producer}} clients is in a hung state, the Kafka Streams application won't exit even after the specified timeout has expired. For example, consider this scenario: # A sink topic gets deleted by accident # The {{Producer}} max.block.ms config is set to a high value. In this case the {{Producer}} will issue a {{WARN}} logging statement and will continue to make metadata requests looking for the expected topic. This will continue until {{max.block.ms}} expires. If this value is high enough, calling {{close()}} with a timeout won't fix the issue: when the timeout expires, the Kafka Streams application's main thread won't exit. To prevent this type of issue, we should call {{Thread.interrupt()}} on all {{StreamThread}} instances once the {{close()}} timeout has expired.
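The proposed remedy can be sketched as follows (illustrative Python, not Kafka's API: the class and method names are made up for this example, and an Event again plays the role of Thread.interrupt()). close() waits until the deadline, then actively unblocks any threads that are still running instead of returning with them alive:

```python
import threading
import time

class Streams:
    """Toy shutdown model: close() interrupts workers after the timeout."""

    def __init__(self):
        self._interrupted = threading.Event()
        self._threads = [threading.Thread(target=self._loop) for _ in range(2)]
        for t in self._threads:
            t.start()

    def _loop(self):
        # Stands in for a StreamThread blocked on a hung client call.
        self._interrupted.wait()

    def close(self, timeout):
        deadline = time.monotonic() + timeout
        for t in self._threads:
            t.join(max(0.0, deadline - time.monotonic()))
        if any(t.is_alive() for t in self._threads):
            self._interrupted.set()      # the Thread.interrupt() analogue
            for t in self._threads:
                t.join()
        return all(not t.is_alive() for t in self._threads)

s = Streams()
assert s.close(timeout=0.1)   # True: all worker threads really exited
```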
[jira] [Commented] (KAFKA-9152) Improve Sensor Retrieval
[ https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013029#comment-17013029 ] ASF GitHub Bot commented on KAFKA-9152:
---
highluck commented on pull request #7928: KAFKA-9152; Improve Sensor Retrieval
URL: https://github.com/apache/kafka/pull/7928

@cadonna I fixed it. I'm sorry, but can you confirm it once more?

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve Sensor Retrieval
> ------------------------
>
> Key: KAFKA-9152
> URL: https://issues.apache.org/jira/browse/KAFKA-9152
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Bruno Cadonna
> Assignee: highluck
> Priority: Minor
> Labels: newbie, tech-debt
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g.
> {{ThreadMetrics.createTaskSensor()}}) after it was created with the same
> method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding
> queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in
> {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when
> {{removeAll*LevelSensors()}} is called.
> Having the same sensor multiple times in those queues is not an issue from a
> correctness point of view. However, it would reduce the footprint to only
> store a sensor once in those queues.
> 2. When a sensor is retrieved, the current code attempts to create a new
> sensor and to add the corresponding metrics to it again. This could be
> avoided.
>
> Both aspects could be improved by checking whether a sensor already exists by
> calling {{getSensor()}} on the {{Metrics}} object and checking the return
> value.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
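The fix the ticket suggests, checking for an existing sensor before creating one and enqueueing it for removal, follows a common get-or-create pattern. A minimal sketch of that pattern is below; the `SensorRegistry` class and its names are hypothetical illustrations, not the actual `StreamsMetricsImpl` or `Metrics` API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SensorRegistry {

    // Stand-in for a metrics Sensor: just a named object in this sketch.
    public record Sensor(String name) {}

    private final ConcurrentMap<String, Sensor> sensors = new ConcurrentHashMap<>();
    // Analogue of the per-level removal queues (e.g. threadLevelSensors).
    private final Deque<Sensor> removalQueue = new ArrayDeque<>();

    // Analogous to calling getSensor() first: only on a miss do we create
    // the sensor, attach its metrics, and enqueue it for later removal.
    public Sensor getOrCreateSensor(String name) {
        Sensor existing = sensors.get(name);
        if (existing != null) {
            // Already registered: no duplicate enqueue, no re-adding metrics.
            return existing;
        }
        Sensor created = new Sensor(name);
        sensors.put(name, created);
        synchronized (removalQueue) {
            removalQueue.push(created);
        }
        return created;
    }

    // Each sensor appears at most once here, so removeAll-style cleanup
    // does not process duplicates.
    public int queuedForRemoval() {
        synchronized (removalQueue) {
            return removalQueue.size();
        }
    }
}
```

Retrieving the same sensor name twice returns the same instance and leaves a single entry in the removal queue, which is exactly the footprint reduction the ticket describes.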
[jira] [Assigned] (KAFKA-9389) Document how to use kafka-reassign-partitions.sh to change log dirs for a partition
[ https://issues.apache.org/jira/browse/KAFKA-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mitchell reassigned KAFKA-9389: --- Assignee: Mitchell > Document how to use kafka-reassign-partitions.sh to change log dirs for a > partition > --- > > Key: KAFKA-9389 > URL: https://issues.apache.org/jira/browse/KAFKA-9389 > Project: Kafka > Issue Type: Improvement >Reporter: James Cheng >Assignee: Mitchell >Priority: Major > Labels: newbie > > KIP-113 introduced support for moving replicas between log directories. As > part of it, support was added to kafka-reassign-partitions.sh so that users > can move replicas between log directories. Specifically, when you call > "kafka-reassign-partitions.sh --topics-to-move-json-file > topics-to-move.json", you can specify a "log_dirs" key in the > topics-to-move.json file, and kafka-reassign-partitions.sh will then move > those replicas to those directories. > > However, when working on that KIP, we didn't update the docs on > kafka.apache.org to describe how to use the new functionality. We should add > documentation on that. > > I haven't used it before, but whoever works on this Jira can probably figure > it out by experimentation with kafka-reassign-partitions.sh, or by reading > KIP-113 page or the associated JIRAs. > * > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories] > * KAFKA-5163 > * KAFKA-5694 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
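For illustration, a reassignment file using the "log_dirs" key might look like the following. The topic name, broker ids, and directory paths here are made up; per KIP-113, the "log_dirs" array is parallel to "replicas", and the value "any" leaves that replica's current log directory unchanged.

```json
{
  "version": 1,
  "partitions": [
    {
      "topic": "example-topic",
      "partition": 0,
      "replicas": [1, 2],
      "log_dirs": ["/data/disk1/kafka-logs", "any"]
    }
  ]
}
```

Such a file would then be passed to kafka-reassign-partitions.sh via --reassignment-json-file together with --execute.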
[jira] [Commented] (KAFKA-9152) Improve Sensor Retrieval
[ https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012987#comment-17012987 ] ASF GitHub Bot commented on KAFKA-9152:
---
highluck commented on pull request #7914: KAFKA-9152; Improve Sensor Retrieval
URL: https://github.com/apache/kafka/pull/7914

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve Sensor Retrieval
> ------------------------
>
> Key: KAFKA-9152
> URL: https://issues.apache.org/jira/browse/KAFKA-9152
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Bruno Cadonna
> Assignee: highluck
> Priority: Minor
> Labels: newbie, tech-debt
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g.
> {{ThreadMetrics.createTaskSensor()}}) after it was created with the same
> method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding
> queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in
> {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when
> {{removeAll*LevelSensors()}} is called. Having the same sensor multiple times
> in those queues is not an issue from a correctness point of view. However, it
> would reduce the footprint to only store a sensor once in those queues.
> 2. When a sensor is retrieved, the current code attempts to create a new
> sensor and to add the corresponding metrics to it again. This could be
> avoided.
>
> Both aspects could be improved by checking whether a sensor already exists by
> calling {{getSensor()}} on the {{Metrics}} object and checking the return
> value.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Issue Comment Deleted] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Comment: was deleted (was: I think that this [^kafka2.4.0-KAFKA-8764.patch] resolves LogCleaner issue(s).) > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: kafka2.4.0-KAFKA-8764.patch, > kafka2.4.0-KAFKA-8764.patch, log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012952#comment-17012952 ] Tomislav Rajakovic commented on KAFKA-8764: --- I think that this [^kafka2.4.0-KAFKA-8764.patch] resolves LogCleaner issue(s). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index >
[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Attachment: kafka2.4.0-KAFKA-8764.patch > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. 
> (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. >
[jira] [Resolved] (KAFKA-9188) Flaky Test SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads
[ https://issues.apache.org/jira/browse/KAFKA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram resolved KAFKA-9188. --- Fix Version/s: 2.5.0 Reviewer: Manikumar Resolution: Fixed > Flaky Test > SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads > --- > > Key: KAFKA-9188 > URL: https://issues.apache.org/jira/browse/KAFKA-9188 > Project: Kafka > Issue Type: Test > Components: core >Reporter: Bill Bejeck >Assignee: Rajini Sivaram >Priority: Major > Labels: flaky-test > Fix For: 2.5.0 > > > Failed in > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9373/testReport/junit/kafka.api/SslAdminClientIntegrationTest/testSynchronousAuthorizerAclUpdatesBlockRequestThreads/] > > {noformat} > Error Messagejava.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout.Stacktracejava.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. 
> at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at > kafka.api.SslAdminClientIntegrationTest.$anonfun$testSynchronousAuthorizerAclUpdatesBlockRequestThreads$1(SslAdminClientIntegrationTest.scala:201) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > kafka.api.SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads(SslAdminClientIntegrationTest.scala:201) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout. > Standard Output[2019-11-14 15:13:51,489] ERROR [ReplicaFetcher replicaId=1, > leaderId=2, fetcherId=0] Error for partition mytopic1-1 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:13:51,490] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition mytopic1-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:14:04,686] ERROR [KafkaApi-2] Error when handling request: > clientId=adminclient-644, correlationId=4, api=CREATE_ACLS, version=1, > body={creations=[{resource_type=2,resource_name=foobar,resource_pattern_type=3,principal=User:ANONYMOUS,host=*,operation=3,permission_type=3},{resource_type=5,resource_name=transactional_id,resource_pattern_type=3,principal=User:ANONYMOUS,host=*,operation=4,permission_type=3}]} > (kafka.server.KafkaApis:76) > org.apache.kafka.common.errors.ClusterAuthorizationException: Request > Request(processor=1, connectionId=127.0.0.1:41993-127.0.0.1:34770-0, > session=Session(User:ANONYMOUS,/127.0.0.1), listenerName=ListenerName(SSL), >
[jira] [Commented] (KAFKA-9188) Flaky Test SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads
[ https://issues.apache.org/jira/browse/KAFKA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012742#comment-17012742 ] ASF GitHub Bot commented on KAFKA-9188: --- rajinisivaram commented on pull request #7918: KAFKA-9188; Fix flaky test SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads URL: https://github.com/apache/kafka/pull/7918 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky Test > SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads > --- > > Key: KAFKA-9188 > URL: https://issues.apache.org/jira/browse/KAFKA-9188 > Project: Kafka > Issue Type: Test > Components: core >Reporter: Bill Bejeck >Assignee: Rajini Sivaram >Priority: Major > Labels: flaky-test > > Failed in > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9373/testReport/junit/kafka.api/SslAdminClientIntegrationTest/testSynchronousAuthorizerAclUpdatesBlockRequestThreads/] > > {noformat} > Error Messagejava.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout.Stacktracejava.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. 
> at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at > kafka.api.SslAdminClientIntegrationTest.$anonfun$testSynchronousAuthorizerAclUpdatesBlockRequestThreads$1(SslAdminClientIntegrationTest.scala:201) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > kafka.api.SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads(SslAdminClientIntegrationTest.scala:201) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout. > Standard Output[2019-11-14 15:13:51,489] ERROR [ReplicaFetcher replicaId=1, > leaderId=2, fetcherId=0] Error for partition mytopic1-1 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:13:51,490] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition mytopic1-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:14:04,686] ERROR [KafkaApi-2] Error when handling request: > clientId=adminclient-644, correlationId=4, api=CREATE_ACLS, version=1, >
[jira] [Comment Edited] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012683#comment-17012683 ] Tomislav Rajakovic edited comment on KAFKA-8764 at 1/10/20 10:24 AM:
-
[~jnadler] and [~seva.f] It seems this issue still hasn't been fixed, or it only happens under rare conditions. Back when I was investigating it, I tested multiple 2.x.x versions and all had the same problem. Although I'm not very fluent in Scala, I could probably compile the latest version and give some hints about patching the issue (I gathered some clues when it occurred). Btw, voting for this issue might get it faster help/attention ;).

> LogCleanerManager endless loop while compacting/cleaning segments
> -----------------------------------------------------------------
>
>                 Key: KAFKA-8764
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8764
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.3.0, 2.2.1, 2.4.0
>         Environment: docker base image: openjdk:8-jre-alpine,
>                      kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>            Reporter: Tomislav Rajakovic
>            Priority: Major
>         Attachments: log-cleaner-bug-reproduction.zip
>
> {{LogCleanerManager stuck in an endless loop while cleaning segments for one partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>
> Issue appeared on follower brokers, and it happens on every (new) broker if partition assignment is changed.
>
> Original issue setup:
> * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers
> * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind, 750GiB
> * 5 zookeepers
> * topic created with config:
> ** name = "backup_br_domain_squad"
>    partitions = 36
>    replication_factor = 3
>    config = {
>      "cleanup.policy" = "compact"
>      "min.compaction.lag.ms" = "8640"
>      "min.cleanable.dirty.ratio" = "0.3"
>    }
>
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset
[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764:
--
Affects Version/s: 2.4.0
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012683#comment-17012683 ] Tomislav Rajakovic commented on KAFKA-8764:
---
[~jnadler] and [~seva.f] It seems this issue still hasn't been fixed, or it only happens under rare conditions. Back when I was investigating it, I tested multiple 2.x.x versions and all had the same problem. Although I'm not very fluent in Scala, I could probably compile the latest version and give some hints about patching the issue (I gathered some clues when it occurred).
[jira] [Commented] (KAFKA-2758) Improve Offset Commit Behavior
[ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012672#comment-17012672 ] highluck commented on KAFKA-2758:
-
[~guozhang] Is this issue still open?

> Improve Offset Commit Behavior
> ------------------------------
>
>                 Key: KAFKA-2758
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2758
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: newbie, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) We can filter out the partitions whose committed offset equals the consumed offset, meaning there are no newly consumed messages from that partition, and hence we do not need to include it in the commit request.
> 2) We can make a commit request right after resetting to a fetch/consume position, either according to the reset policy (e.g. on consumer startup, or when handling an out-of-range offset) or through {code}seek{code}, so that if the consumer fails right after these events, upon recovery it can restart from the reset position instead of resetting again: otherwise this can lead, for example, to data loss if we use "largest" as the reset policy while new messages are arriving on the fetched partitions.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
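The first improvement described above, skipping partitions that made no progress, can be sketched as a small pure function. This is an illustrative model only, not the actual consumer code; the class and method names are invented for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitFilter {

    // Given the last committed offset and the current consumed position per
    // partition, keep only the partitions that actually advanced: committing
    // an unchanged offset adds nothing and just bloats the commit request.
    public static Map<String, Long> partitionsToCommit(Map<String, Long> committed,
                                                       Map<String, Long> consumed) {
        Map<String, Long> toCommit = new HashMap<>();
        for (Map.Entry<String, Long> e : consumed.entrySet()) {
            Long prev = committed.get(e.getKey());
            if (prev == null || !prev.equals(e.getValue())) {
                toCommit.put(e.getKey(), e.getValue());
            }
        }
        return toCommit;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        committed.put("t-0", 100L);
        committed.put("t-1", 50L);
        Map<String, Long> consumed = new HashMap<>();
        consumed.put("t-0", 100L); // no progress: skipped
        consumed.put("t-1", 60L);  // progressed: committed
        System.out.println(partitionsToCommit(committed, consumed)); // {t-1=60}
    }
}
```

In the real consumer this filtering would happen while building the OffsetCommitRequest, so unchanged partitions never leave the client at all.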
[jira] [Comment Edited] (KAFKA-2758) Improve Offset Commit Behavior
[ https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012672#comment-17012672 ] highluck edited comment on KAFKA-2758 at 1/10/20 10:16 AM:
---
[~guozhang] Is this issue still open?
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012667#comment-17012667 ] Seva Feldman commented on KAFKA-8764:
-
Hi, we have the exact same issue with the __consumer_offsets compacted topic, which kills our consumer groups. Thanks, [~trajakovic], for the workaround of manually updating the *cleaner-offset-checkpoint* file. BR
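For reference, the manual workaround mentioned above can be sketched in shell. The paths, topic, partition, and target offset here are all placeholders, and the three-part file layout (version line, entry count, then one `topic partition offset` triple per line) is an assumption about Kafka's checkpoint-file format; always stop the broker before touching the file, since the cleaner rewrites it while running.

```shell
# Hypothetical log.dirs entry; on a real broker this would be e.g.
# /var/lib/kafka/data/topics. Here we fabricate a file to show the edit.
LOG_DIR=./data-topics
mkdir -p "$LOG_DIR"

# Assumed format: line 1 = version (0), line 2 = number of entries,
# then one "topic partition offset" triple per line.
printf '0\n1\nbackup_br_domain_squad 14 0\n' > "$LOG_DIR/cleaner-offset-checkpoint"

# Bump the checkpointed offset for the stuck partition past the bad range,
# so the cleaner stops re-cleaning the same segment forever. The value 1000
# is a placeholder; pick an offset beyond the problematic segment.
awk '{ if ($1 == "backup_br_domain_squad" && $2 == 14) $3 = 1000; print }' \
    "$LOG_DIR/cleaner-offset-checkpoint" > "$LOG_DIR/cleaner-offset-checkpoint.tmp" \
  && mv "$LOG_DIR/cleaner-offset-checkpoint.tmp" "$LOG_DIR/cleaner-offset-checkpoint"

cat "$LOG_DIR/cleaner-offset-checkpoint"
```

After restarting the broker, the cleaner reads the checkpoint and resumes from the edited offset.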
[jira] [Updated] (KAFKA-7519) Transactional Ids Left in Pending State by TransactionStateManager During Transactional Id Expiration Are Unusable
[ https://issues.apache.org/jira/browse/KAFKA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alper Kanat updated KAFKA-7519:
---
Attachment: image-2020-01-10-12-37-28-804.png

> Transactional Ids Left in Pending State by TransactionStateManager During Transactional Id Expiration Are Unusable
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7519
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7519
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer
>    Affects Versions: 2.0.0
>            Reporter: Bridger Howell
>            Assignee: Bridger Howell
>            Priority: Blocker
>             Fix For: 2.0.1, 2.1.0
>
>         Attachments: KAFKA-7519.patch, image-2018-10-18-13-02-22-371.png, image-2020-01-10-12-37-28-804.png
>
> After digging into a case where an exactly-once streams process was bizarrely unable to process incoming data, we observed the following:
> * StreamThreads stalling while creating a producer, eventually resulting in no consumption by that streams process. Looking into those threads, we found they were stuck in a loop, sending InitProducerIdRequests and always receiving back the retriable error CONCURRENT_TRANSACTIONS and trying again. These requests always had the same transactional id.
> * After changing the streams process to not use exactly-once, it was able to process messages with no problems.
> * Alternatively, changing the applicationId for that streams process, it was able to process with no problems.
> * Every hour, every broker would fail the task `transactionalId-expiration` with the following error:
> {code:java}
> {"exception":{"stacktrace":"java.lang.IllegalStateException: Preparing transaction state transition to Dead while it already a pending state Dead
>     at kafka.coordinator.transaction.TransactionMetadata.prepareTransitionTo(TransactionMetadata.scala:262)
>     at kafka.coordinator.transaction.TransactionMetadata.prepareDead(TransactionMetadata.scala:237)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9$$anonfun$3.apply(TransactionStateManager.scala:151)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9$$anonfun$3.apply(TransactionStateManager.scala:151)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>     at kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:172)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9.apply(TransactionStateManager.scala:150)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2$$anonfun$apply$9.apply(TransactionStateManager.scala:149)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.immutable.List.map(List.scala:296)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2.apply(TransactionStateManager.scala:149)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1$$anonfun$2.apply(TransactionStateManager.scala:142)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
>     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
>     at kafka.coordinator.transaction.TransactionStateManager$$anonfun$enableTransactionalIdExpiration$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(TransactionStateManager.scala:142)
>     at
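The IllegalStateException in the stack trace comes from a guard that refuses to prepare a state transition while another transition is still pending. A toy Java model of that guard (illustrative only, not the actual TransactionMetadata code) shows why a transactional id left with a pending Dead state fails the expiration task on every hourly pass:

```java
public class PendingStateModel {

    enum State { ONGOING, DEAD }

    // A transition that was prepared but never completed or rolled back.
    private State pending = null;

    // Mirrors the shape of the guard in TransactionMetadata.prepareTransitionTo:
    // any new transition attempted while one is pending throws. If the pending
    // transition is never cleared, every later attempt fails the same way.
    void prepareTransitionTo(State target) {
        if (pending != null) {
            throw new IllegalStateException(
                "Preparing transaction state transition to " + target
                    + " while it already a pending state " + pending);
        }
        pending = target;
    }

    public static void main(String[] args) {
        PendingStateModel m = new PendingStateModel();
        m.prepareTransitionTo(State.DEAD);     // first expiration pass succeeds
        try {
            m.prepareTransitionTo(State.DEAD); // every later pass fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix in the attached patch amounts to making sure a prepared-but-failed transition is cleaned up so the id does not stay pending forever; the model above only demonstrates the failure mode.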
[jira] [Commented] (KAFKA-7519) Transactional Ids Left in Pending State by TransactionStateManager During Transactional Id Expiration Are Unusable
[ https://issues.apache.org/jira/browse/KAFKA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012625#comment-17012625 ] Alper Kanat commented on KAFKA-7519:
-
I just hit the exact same error on Kafka 2.2.0. Any ideas? Was this PR really merged into the 2.0.1 and 2.1.0 releases?
!image-2020-01-10-12-37-28-804.png!
[jira] [Commented] (KAFKA-9152) Improve Sensor Retrieval
[ https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012572#comment-17012572 ] highluck commented on KAFKA-9152:
-
[~cadonna] [https://github.com/apache/kafka/pull/7914] I'm not sure I understood this correctly. Please review the pull request.

> Improve Sensor Retrieval
> -------------------------
>
>                 Key: KAFKA-9152
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9152
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Bruno Cadonna
>            Assignee: highluck
>            Priority: Minor
>              Labels: newbie, tech-debt
>
> This ticket shall improve two aspects of the retrieval of sensors:
> 1. Currently, when a sensor is retrieved with {{*Metrics.*Sensor()}} (e.g. {{ThreadMetrics.createTaskSensor()}}) after it was created with the same method {{*Metrics.*Sensor()}}, the sensor is added again to the corresponding queue in {{*Sensors}} (e.g. {{threadLevelSensors}}) in {{StreamsMetricsImpl}}. Those queues are used to remove the sensors when {{removeAll*LevelSensors()}} is called. Having the same sensor multiple times in a queue is not an issue from a correctness point of view, but storing each sensor only once would reduce the footprint.
> 2. When a sensor is retrieved, the current code attempts to create a new sensor and to add the corresponding metrics to it again. This could be avoided.
>
> Both aspects could be improved by calling {{getSensor()}} on the {{Metrics}} object and checking the return value to see whether the sensor already exists.
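The fix the ticket suggests, consulting getSensor() before creating, can be sketched with a simplified stand-in for Metrics/StreamsMetricsImpl. All names here are simplified for the sketch, not the real Kafka API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class SensorRegistry {

    static class Sensor {
        final String name;
        Sensor(String name) { this.name = name; }
    }

    private final Map<String, Sensor> sensors = new HashMap<>();          // stand-in for Metrics
    private final Deque<Sensor> threadLevelSensors = new ArrayDeque<>();  // removal queue

    // Return the existing sensor if present; only on first creation do we
    // register it in the removal queue and attach its metrics. This avoids
    // both the duplicate queue entries and the wasted re-creation work the
    // ticket describes.
    Sensor threadLevelSensor(String name) {
        Sensor existing = sensors.get(name);   // the getSensor() check
        if (existing != null) {
            return existing;                   // no second queue entry, no re-added metrics
        }
        Sensor created = new Sensor(name);
        sensors.put(name, created);
        threadLevelSensors.push(created);
        // ...attach metrics to `created` here, exactly once...
        return created;
    }

    int queued() { return threadLevelSensors.size(); }

    public static void main(String[] args) {
        SensorRegistry r = new SensorRegistry();
        Sensor a = r.threadLevelSensor("create-task");
        Sensor b = r.threadLevelSensor("create-task"); // retrieval, not re-creation
        System.out.println((a == b) + " " + r.queued()); // true 1
    }
}
```

With this shape, removeAll-style cleanup can drain the queue knowing each sensor appears exactly once.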
[jira] [Comment Edited] (KAFKA-9152) Improve Sensor Retrieval
[ https://issues.apache.org/jira/browse/KAFKA-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012572#comment-17012572 ] highluck edited comment on KAFKA-9152 at 1/10/20 8:40 AM:
--
[~cadonna] [https://github.com/apache/kafka/pull/7914] I'm not sure I understood this correctly. Please review the pull request.