[jira] [Created] (KAFKA-12456) Log "Listeners are not identical across brokers" message at WARN/INFO instead of ERROR
Michael Bingham created KAFKA-12456:
---------------------------------------

             Summary: Log "Listeners are not identical across brokers" message at WARN/INFO instead of ERROR
                 Key: KAFKA-12456
                 URL: https://issues.apache.org/jira/browse/KAFKA-12456
             Project: Kafka
          Issue Type: Improvement
          Components: core, security
    Affects Versions: 2.6.1, 2.7.0
            Reporter: Michael Bingham

Currently, when listeners are not identical across brokers, an {{ERROR}}-level log message is generated, for example:

{code:java}
[2021-03-11 20:25:18,881] ERROR [MetadataCache brokerId=0] Listeners are not identical across brokers: LongMap(0 -> Map(ListenerName(PLAINTEXT) -> 192.168.111.99:9092 (id: 0 rack: null), ListenerName(SSL) -> 192.168.111.99:9094 (id: 0 rack: null), ListenerName(SASL_PLAINTEXT) -> 192.168.111.99:9093 (id: 0 rack: null)), 2 -> Map(ListenerName(PLAINTEXT) -> 192.168.116.148:9092 (id: 2 rack: null), ListenerName(SASL_PLAINTEXT) -> 192.168.116.148:9093 (id: 2 rack: null)), 1 -> Map(ListenerName(PLAINTEXT) -> 192.168.111.100:9092 (id: 1 rack: null), ListenerName(SASL_PLAINTEXT) -> 192.168.111.100:9093 (id: 1 rack: null))) (kafka.server.MetadataCache)
{code}

When adding security incrementally with a rolling update, this state is normal and expected, so I recommend lowering the log level for this message to {{WARN}} or {{INFO}} to avoid confusion. This log message was originally added in https://issues.apache.org/jira/browse/KAFKA-6501.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
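For context: until the level is lowered upstream, operators can mute this specific logger at runtime through the dynamic broker-logger API (KIP-412, available since AK 2.4). A minimal sketch using the Java Admin client, assuming broker id {{0}} and taking the logger name from the log line above:

{code:java}
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MuteMetadataCacheLogger {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(
                Map.<String, Object>of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // BROKER_LOGGER resources are addressed per broker id ("0" here).
            ConfigResource brokerLogger =
                    new ConfigResource(ConfigResource.Type.BROKER_LOGGER, "0");
            // Raise only this logger to FATAL so the expected mid-rolling-update
            // mismatch no longer surfaces as ERROR noise.
            Collection<AlterConfigOp> ops = List.of(new AlterConfigOp(
                    new ConfigEntry("kafka.server.MetadataCache", "FATAL"),
                    AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(brokerLogger, ops)).all().get();
        }
    }
}
{code}

This is a per-broker override and is lost on restart, so it is a stopgap rather than a substitute for changing the level in the code.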
[jira] [Created] (KAFKA-10564) Continuous logging about deleting obsolete state directories
Michael Bingham created KAFKA-10564:
------------------------------------

             Summary: Continuous logging about deleting obsolete state directories
                 Key: KAFKA-10564
                 URL: https://issues.apache.org/jira/browse/KAFKA-10564
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.6.0
            Reporter: Michael Bingham

The internal process which automatically cleans up obsolete task state directories was modified in https://issues.apache.org/jira/browse/KAFKA-6647. The current logic in 2.6 is to remove all files from the task directory except the {{.lock}} file:

[https://github.com/apache/kafka/blob/2.6/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L335]

However, the directory is only removed in its entirety for a manual cleanup:

[https://github.com/apache/kafka/blob/2.6/streams/src/main/java/org/apache/kafka/streams/processor/internals/StateDirectory.java#L349-L353]

As a result, Streams keeps revisiting this directory and trying to clean it up, since it decides what to clean based on the last-modification time of the task directory (which is no longer deleted during the automatic cleanup). So a user will see log messages like:

{code}
stream-thread [app-c2d773e6-8ac3-4435-9777-378e0ec0ab82-CleanupThread] Deleting obsolete state directory 0_8 for task 0_8 as 6061ms has elapsed (cleanup delay is 60ms)
{code}

repeated again and again. This issue doesn't seem to break anything; it's more about avoiding unnecessary logging and cleaning up empty task directories.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
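For context, one mitigation until this is fixed is to make the cleanup thread run less often. A minimal sketch (note: this only reduces how frequently the message is logged; it does not address the root cause described above):

{code:java}
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// state.cleanup.delay.ms defaults to 600000 (10 minutes); raising it makes the
// CleanupThread revisit the leftover task directories less frequently.
props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, TimeUnit.HOURS.toMillis(6));
{code}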
[jira] [Created] (KAFKA-9965) Uneven distribution with RoundRobinPartitioner in AK 2.4+
Michael Bingham created KAFKA-9965:
-----------------------------------

             Summary: Uneven distribution with RoundRobinPartitioner in AK 2.4+
                 Key: KAFKA-9965
                 URL: https://issues.apache.org/jira/browse/KAFKA-9965
             Project: Kafka
          Issue Type: Bug
          Components: producer
    Affects Versions: 2.4.1, 2.5.0, 2.4.0
            Reporter: Michael Bingham

{{RoundRobinPartitioner}} states that it will provide equal distribution of records across partitions. However, with the enhancements made in KIP-480, it may not. In some cases, when a new batch is started, the partitioner may be called a second time for the same record:

[https://github.com/apache/kafka/blob/2.4/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L909]

[https://github.com/apache/kafka/blob/2.4/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L934]

Each time the partitioner is called, it increments a counter in {{RoundRobinPartitioner}}, so this can result in unequal distribution. The easiest fix might be to decrement the counter in {{RoundRobinPartitioner#onNewBatch}}, as sketched below.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
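A minimal sketch of that fix, assuming {{topicCounterMap}} is the per-topic counter inside {{RoundRobinPartitioner}} (field names may differ from the actual implementation):

{code:java}
@Override
public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
    // partition() already ran once for this record and bumped the counter
    // before the producer decided to open a new batch; undo that increment so
    // the second partition() call for the same record does not skip a partition.
    AtomicInteger counter = topicCounterMap.get(topic);
    if (counter != null) {
        counter.decrementAndGet();
    }
}
{code}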
[jira] [Resolved] (KAFKA-9964) Better description of RoundRobinPartitioner behavior for AK 2.4+
     [ https://issues.apache.org/jira/browse/KAFKA-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bingham resolved KAFKA-9964.
------------------------------------
    Resolution: Invalid

> Better description of RoundRobinPartitioner behavior for AK 2.4+
> -----------------------------------------------------------------
>
>                 Key: KAFKA-9964
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9964
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer
>    Affects Versions: 2.4.0, 2.5.0, 2.4.1
>            Reporter: Michael Bingham
>            Priority: Minor
>
> The Javadocs for {{RoundRobinPartitioner}} currently state:
> {quote}This partitioning strategy can be used when user wants to distribute
> the writes to all partitions equally
> {quote}
> In AK 2.4+, equal distribution is not guaranteed, even with this partitioner.
> The enhancements to consider batching made with
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner]
> affect this partitioner as well.
> So it would be useful to add some additional Javadocs to explain that unless
> batching is disabled, even distribution of records is not guaranteed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (KAFKA-9964) Better description of RoundRobinPartitioner behavior for AK 2.4+
Michael Bingham created KAFKA-9964:
-----------------------------------

             Summary: Better description of RoundRobinPartitioner behavior for AK 2.4+
                 Key: KAFKA-9964
                 URL: https://issues.apache.org/jira/browse/KAFKA-9964
             Project: Kafka
          Issue Type: Improvement
          Components: producer
    Affects Versions: 2.4.1, 2.5.0, 2.4.0
            Reporter: Michael Bingham

The Javadocs for {{RoundRobinPartitioner}} currently state:

{quote}This partitioning strategy can be used when user wants to distribute the writes to all partitions equally
{quote}

In AK 2.4+, equal distribution is not guaranteed, even with this partitioner. The enhancements to consider batching made with [https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner] affect this partitioner as well. So it would be useful to add some additional Javadocs to explain that unless batching is disabled, even distribution of records is not guaranteed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
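For context, the skew is easy to observe empirically. A minimal sketch that counts how many records land on each partition (broker address, topic name, and record count are placeholders):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class.getName());

Map<Integer, Integer> counts = new HashMap<>();
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    for (int i = 0; i < 10_000; i++) {
        producer.send(new ProducerRecord<>("test-topic", "value-" + i), (metadata, e) -> {
            if (e == null) {
                synchronized (counts) {
                    counts.merge(metadata.partition(), 1, Integer::sum);
                }
            }
        });
    }
}
// With strict round robin each partition would receive ~10000/N records;
// under the behavior described in KAFKA-9965 the counts come out skewed.
System.out.println(counts);
{code}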
[jira] [Created] (KAFKA-9524) Default window retention does not consider grace period
Michael Bingham created KAFKA-9524:
-----------------------------------

             Summary: Default window retention does not consider grace period
                 Key: KAFKA-9524
                 URL: https://issues.apache.org/jira/browse/KAFKA-9524
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.4.0
            Reporter: Michael Bingham

In a windowed aggregation, if you specify a window size larger than the default window retention (1 day), Streams will implicitly raise the retention to accommodate windows of that size. For example:

{code:java}
.windowedBy(TimeWindows.of(Duration.ofDays(20)))
{code}

In this case, Streams implicitly sets the window retention to 20 days, and no exception occurs. However, if you also include a non-zero grace period on the window, such as:

{code:java}
.windowedBy(TimeWindows.of(Duration.ofDays(20)).grace(Duration.ofMinutes(5)))
{code}

then Streams still implicitly sets the window retention to 20 days (not 20 days + 5 minutes grace), and an exception is thrown:

{code}
Exception in thread "main" java.lang.IllegalArgumentException: The retention period of the window store KSTREAM-KEY-SELECT-02 must be no smaller than its window size plus the grace period. Got size=[172800], grace=[30], retention=[172800]
{code}

Ideally, Streams should include the grace period when implicitly setting window retention.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
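For context, a possible workaround until this is fixed is to size the retention explicitly as window size plus grace. A minimal sketch, assuming {{stream}} is a keyed {{KStream<String, String>}} and {{"counts"}} is an arbitrary store name:

{code:java}
import java.time.Duration;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

Duration size = Duration.ofDays(20);
Duration grace = Duration.ofMinutes(5);
stream.groupByKey()
      .windowedBy(TimeWindows.of(size).grace(grace))
      // An explicit retention of size + grace satisfies the check that the
      // implicit default currently fails.
      .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts")
                         .withRetention(size.plus(grace)));
{code}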
[jira] [Created] (KAFKA-9485) Dynamic updates to num.recovery.threads.per.data.dir are not applied right away
Michael Bingham created KAFKA-9485:
-----------------------------------

             Summary: Dynamic updates to num.recovery.threads.per.data.dir are not applied right away
                 Key: KAFKA-9485
                 URL: https://issues.apache.org/jira/browse/KAFKA-9485
             Project: Kafka
          Issue Type: Improvement
          Components: core, log
    Affects Versions: 2.4.0
            Reporter: Michael Bingham

The {{num.recovery.threads.per.data.dir}} broker property is a {{cluster-wide}} dynamically configurable setting, but changing it dynamically does not appear to have any effect on actual broker behavior. The recovery thread pool is only created once, when the {{LogManager}} is started and the {{loadLogs()}} method is called. If this property is later changed dynamically, the change has no effect until the broker is restarted.

This might be confusing to someone modifying the property, so perhaps it should be documented more clearly, or the property changed to {{read-only}}. The only benefit I see to keeping it a dynamic config property is that it can be applied once for the entire cluster (as shown below), instead of being specified individually in each broker's {{server.properties}} file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
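For reference, a minimal sketch of applying the cluster-wide default through the Java Admin client, inside a method declared {{throws Exception}} (bootstrap address and thread count are placeholders; per this ticket, the value only takes effect after each broker restarts):

{code:java}
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

try (Admin admin = Admin.create(
        Map.<String, Object>of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
    // An empty resource name addresses the cluster-wide default for all brokers.
    ConfigResource allBrokers = new ConfigResource(ConfigResource.Type.BROKER, "");
    Collection<AlterConfigOp> ops = List.of(new AlterConfigOp(
            new ConfigEntry("num.recovery.threads.per.data.dir", "4"),
            AlterConfigOp.OpType.SET));
    admin.incrementalAlterConfigs(Map.of(allBrokers, ops)).all().get();
}
{code}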
[jira] [Created] (KAFKA-9331) Add option to terminate application when StreamThread(s) die
Michael Bingham created KAFKA-9331:
-----------------------------------

             Summary: Add option to terminate application when StreamThread(s) die
                 Key: KAFKA-9331
                 URL: https://issues.apache.org/jira/browse/KAFKA-9331
             Project: Kafka
          Issue Type: Improvement
          Components: streams
    Affects Versions: 2.4.0
            Reporter: Michael Bingham

Currently, if a {{StreamThread}} dies due to an unexpected exception, the Streams application continues running. Even if all {{StreamThread}}s die, the application keeps running, but in an {{ERROR}} state. Many users want or expect the application to terminate in the event of a fatal exception that kills one or more {{StreamThread}}s. Currently, this requires extra work from the developer: registering an uncaught exception handler on the {{KafkaStreams}} object and triggering a shutdown as needed (see the sketch below). It would be useful to provide a configurable option to have the Streams application automatically terminate with an exception if one or more {{StreamThread}}s die.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
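A minimal sketch of that manual workaround, assuming {{topology}} and {{props}} are already defined; the exit policy is application-specific:

{code:java}
import java.time.Duration;
import org.apache.kafka.streams.KafkaStreams;

KafkaStreams streams = new KafkaStreams(topology, props);
streams.setUncaughtExceptionHandler((thread, throwable) -> {
    System.err.println("StreamThread " + thread.getName() + " died: " + throwable);
    // close() joins the stream threads, and this handler runs on the dying
    // thread itself, so shut down from a separate thread with a bounded timeout.
    new Thread(() -> {
        streams.close(Duration.ofSeconds(10));
        Runtime.getRuntime().halt(1);
    }).start();
});
streams.start();
{code}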
[jira] [Created] (KAFKA-9209) Avoid sending unnecessary offset updates from consumer after KIP-211
Michael Bingham created KAFKA-9209:
-----------------------------------

             Summary: Avoid sending unnecessary offset updates from consumer after KIP-211
                 Key: KAFKA-9209
                 URL: https://issues.apache.org/jira/browse/KAFKA-9209
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
    Affects Versions: 2.3.0
            Reporter: Michael Bingham

With KIP-211 ([https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]), offsets no longer expire as long as the consumer group is active. Yet if the consumer has {{enable.auto.commit=true}} and no new events are arriving on its subscribed partition(s), it still sends unchanged offsets to the group coordinator just to keep them from expiring. This is no longer necessary, and an optimization could be implemented so that auto commit sends offsets only when there are actual updates to make (i.e., when new events have been processed). This would require detecting whether the broker supports the new expiration semantics of KIP-211 and applying the optimization only when it does.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
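For context, the proposed behavior mirrors what an application can already do with manual commits. A minimal sketch with {{enable.auto.commit=false}}, assuming {{consumer}} and {{running}} are defined elsewhere and {{process()}} is a hypothetical handler:

{code:java}
while (running) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    if (!records.isEmpty()) {
        process(records);       // hypothetical application logic
        consumer.commitAsync(); // commit only when offsets actually advanced
    }
    // No keep-alive commit on idle partitions: the group stays active, so
    // under KIP-211 semantics the committed offsets do not expire.
}
{code}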
[jira] [Created] (KAFKA-7487) DumpLogSegments reports mismatches for indexed offsets which are not at the start of a record batch
Michael Bingham created KAFKA-7487:
-----------------------------------

             Summary: DumpLogSegments reports mismatches for indexed offsets which are not at the start of a record batch
                 Key: KAFKA-7487
                 URL: https://issues.apache.org/jira/browse/KAFKA-7487
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.0.0
            Reporter: Michael Bingham

When running {{DumpLogSegments}} against an {{.index}} file, mismatches may be reported when the indexed message offset is not the first record in a batch. For example:

{code}
Mismatches in :/var/lib/kafka/data/replicated-topic-0/.index
Index offset: 968, log offset: 966
{code}

And looking at the corresponding {{.log}} file:

{code}
baseOffset: 966 lastOffset: 968 count: 3 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false position: 3952771 CreateTime: 1538768639065 isvalid: true size: 12166 magic: 2 compresscodec: NONE crc: 294402254
{code}

In this case, the last offset in the batch was indexed instead of the first, but the index has to map the physical position to the start of the batch, leading to the reported mismatch. It seems like {{DumpLogSegments}} should not report these cases as mismatches, which users might otherwise interpret as an error or problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
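For illustration, a check along these lines would stop flagging index entries whose offset falls anywhere inside the batch they point to (a sketch; method and variable names are illustrative, not the actual {{DumpLogSegments}} internals):

{code:java}
import org.apache.kafka.common.record.RecordBatch;

// Current behavior (simplified): an entry is flagged unless its offset equals
// the base offset of the batch found at the indexed physical position.
boolean isMismatch(long indexedOffset, RecordBatch batchAtPosition) {
    return indexedOffset != batchAtPosition.baseOffset();
}

// Proposed: accept any indexed offset that lies within the batch, since the
// index can only ever point at the batch's starting position.
boolean isMismatchProposed(long indexedOffset, RecordBatch batchAtPosition) {
    return indexedOffset < batchAtPosition.baseOffset()
        || indexedOffset > batchAtPosition.lastOffset();
}
{code}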