[jira] [Created] (KAFKA-12891) Add --files and --file-separator options to the ConsoleProducer
Wenbing Shen created KAFKA-12891: Summary: Add --files and --file-separator options to the ConsoleProducer Key: KAFKA-12891 URL: https://issues.apache.org/jira/browse/KAFKA-12891 Project: Kafka Issue Type: New Feature Components: tools Reporter: Wenbing Shen Assignee: Wenbing Shen Introduce a *--files* option to the producer command line tool to support reading data from *multiple files*. The file paths are separated by the *--file-separator* value; the default separator is a *comma*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
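As an illustration of the proposed option, the --files value could be split on the separator and the files read in order, each line becoming one producer record. This is a minimal sketch under that assumption; the class and method names are hypothetical and not part of any patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: split the --files value on the separator and read the
// lines of every file in order, as the ConsoleProducer would then send them.
public class MultiFileInput {
    public static List<String> readAll(String filesArg, String separator) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String file : filesArg.split(Pattern.quote(separator))) {
            lines.addAll(Files.readAllLines(Path.of(file.trim())));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files stand in for the user's input files.
        Path a = Files.createTempFile("msgs-a", ".txt");
        Path b = Files.createTempFile("msgs-b", ".txt");
        Files.write(a, List.of("m1", "m2"));
        Files.write(b, List.of("m3"));
        System.out.println(readAll(a + "," + b, ",")); // prints [m1, m2, m3]
    }
}
```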
[jira] [Resolved] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
[ https://issues.apache.org/jira/browse/KAFKA-12734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12734. -- Resolution: Duplicate Duplicate of KAFKA-10471; this problem has been fixed in version 2.8. > LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip > activeSegment sanityCheck > > > Key: KAFKA-12734 > URL: https://issues.apache.org/jira/browse/KAFKA-12734 > Project: Kafka > Issue Type: Bug > Components: log > Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 > Reporter: Wenbing Shen > Assignee: Wenbing Shen > Priority: Blocker > Attachments: LoadIndex.png, image-2021-04-30-22-49-24-202.png, niobufferoverflow.png > > > This issue is similar to KAFKA-9156. > We introduced lazy indexes, which let us skip checking the index files of all log segments when starting Kafka; this greatly improved our Kafka startup speed. > Unfortunately, it also skips the index file check for the active segment, even though the active segment still receives write requests from clients and the replica synchronization thread. > There is a situation where we skip the index check for all segments, no unflushed log segments need to be recovered, and the index file of the last active segment is damaged; when data is then appended to the active segment, the program reports an error. > Below is the problem I encountered in a production environment: > When Kafka started loading log segments, I saw in the program log that the memory-map position of the time index and offset index files was far into the file, even though the index files had not actually been written with that many index entries. > I suspect this happens during the Kafka startup process: if the Kafka process is stopped before startup completes and then started again, the limit of the index file memory map may remain at the file's maximum size instead of being trimmed to the size actually used, which causes the memory-map position to be set to the limit when Kafka starts. > At this point, appending data to the active segment causes an NIO buffer overflow. > I agree with skipping the index check for all inactive segments, because they will no longer receive write requests, but for the active segment we need to perform the index file check. > Another situation is that we had a clean shutdown, but for some reason the position of the active segment's index file memory map was set to the limit, resulting in an NIO buffer overflow on write.
[jira] [Created] (KAFKA-12734) LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck
Wenbing Shen created KAFKA-12734: Summary: LazyTimeIndex & LazyOffsetIndex may cause niobufferoverflow when skip activeSegment sanityCheck Key: KAFKA-12734 URL: https://issues.apache.org/jira/browse/KAFKA-12734 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.8.0, 2.7.0, 2.6.0, 2.5.0, 2.4.0 Reporter: Wenbing Shen Attachments: LoadIndex.png, niobufferoverflow.png This issue is similar to KAFKA-9156. We introduced lazy indexes, which let us skip checking the index files of all log segments when starting Kafka; this greatly improved our Kafka startup speed. Unfortunately, it also skips the index file check for the active segment, even though the active segment still receives write requests from clients and the replica synchronization thread. There is a situation where we skip the index check for all segments, no unflushed log segments need to be recovered, and the index file of the last active segment is damaged; when data is then appended to the active segment, the program reports an error. Below is the problem I encountered in a production environment: when Kafka started loading log segments, I saw in the program log that the memory-map position of the time index and offset index files was far into the file, even though the index files had not actually been written with that many index entries. I suspect this happens during the Kafka startup process: if the Kafka process is stopped before startup completes and then started again, the limit of the index file memory map may remain at the file's maximum size instead of being trimmed to the size actually used, which causes the memory-map position to be set to the limit when Kafka starts. At this point, appending data to the active segment causes an NIO buffer overflow. I agree with skipping the index check for all inactive segments, because they will no longer receive write requests, but for the active segment we need to perform the index file check.
[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12680. -- Resolution: Not A Problem > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug > Reporter: Wenbing Shen > Priority: Major > > I tested KRaft mode for the first time today. I deployed a single-node KRaft-mode broker according to the documentation: > [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] > > First step: ./bin/kafka-storage.sh random-uuid > Second step: use the uuid generated above to execute the following command: > ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties > > Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties > > Then I created two topics with two partitions and a single replica: > ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 --replication-factor 1 --bootstrap-server localhost:9092 > I verified that production and consumption worked, but after I called kafka-server-stop.sh and then ran the start command again, the broker reported an error on startup. > I am not sure whether this is a known bug or a problem with my usage. > > [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception > (kafka.Kafka$) > java.io.IOException: Invalid argument > at java.io.RandomAccessFile.setLength(Native Method) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) > at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) > at kafka.log.LogSegment.recover(LogSegment.scala:385) > at kafka.log.Log.recoverSegment(Log.scala:741) > at kafka.log.Log.recoverLog(Log.scala:894) > at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) > at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) > at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) > at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) > at kafka.log.Log.loadSegments(Log.scala:816) > at kafka.log.Log.<init>(Log.scala:326) > at kafka.log.Log$.apply(Log.scala:2593) > at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) > at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) > at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:127) > at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:74) > at kafka.Kafka$.buildServer(Kafka.scala:79) > at kafka.Kafka$.main(Kafka.scala:87) > at kafka.Kafka.main(Kafka.scala)
[jira] [Reopened] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen reopened KAFKA-12680: -- > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug > Reporter: Wenbing Shen > Priority: Major
[jira] [Resolved] (KAFKA-12680) Failed to restart the broker in kraft mode
[ https://issues.apache.org/jira/browse/KAFKA-12680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12680. -- Resolution: Cannot Reproduce > Failed to restart the broker in kraft mode > -- > > Key: KAFKA-12680 > URL: https://issues.apache.org/jira/browse/KAFKA-12680 > Project: Kafka > Issue Type: Bug > Reporter: Wenbing Shen > Priority: Major
[jira] [Created] (KAFKA-12702) Unhandled exception caught in InterBrokerSendThread
Wenbing Shen created KAFKA-12702: Summary: Unhandled exception caught in InterBrokerSendThread Key: KAFKA-12702 URL: https://issues.apache.org/jira/browse/KAFKA-12702 Project: Kafka Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wenbing Shen Attachments: afterFixing.png, beforeFixing.png In KRaft mode, if listeners and advertised.listeners are not configured with host addresses, the host parameter of the Listener in BrokerRegistrationRequestData will be null. When the broker is started, a null pointer exception is thrown, causing startup failure. A feasible solution is to replace the empty host of the endPoint in advertisedListeners with InetAddress.getLocalHost.getCanonicalHostName in BrokerServer when building networkListeners. The following is the debug log. Before fixing: [2021-04-21 14:15:20,032] DEBUG (broker-2-to-controller-send-thread org.apache.kafka.clients.NetworkClient 522) [broker-2-to-controller] Sending BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, correlationId=6) and timeout 3 to node 2: BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=X8w4_1DFT2yUjOm6asPjIQ, listeners=[Listener(name='PLAINTEXT', host=null, port=9092, securityProtocol=0)], features=[], rack=null) [2021-04-21 14:15:20,033] ERROR (broker-2-to-controller-send-thread kafka.server.BrokerToControllerRequestThread 76) [broker-2-to-controller-send-thread]: unhandled exception caught in InterBrokerSendThread java.lang.NullPointerException at org.apache.kafka.common.message.BrokerRegistrationRequestData$Listener.addSize(BrokerRegistrationRequestData.java:515) at org.apache.kafka.common.message.BrokerRegistrationRequestData.addSize(BrokerRegistrationRequestData.java:216) at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218) at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187) at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:101) at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:525) at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:501) at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:461) at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1(InterBrokerSendThread.scala:104) at kafka.common.InterBrokerSendThread.$anonfun$sendRequests$1$adapted(InterBrokerSendThread.scala:99) at kafka.common.InterBrokerSendThread$$Lambda$259/910445654.apply(Unknown Source) at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) at scala.collection.AbstractIterable.foreach(Iterable.scala:919) at kafka.common.InterBrokerSendThread.sendRequests(InterBrokerSendThread.scala:99) at kafka.common.InterBrokerSendThread.pollOnce(InterBrokerSendThread.scala:73) at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:368) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96) [2021-04-21 14:15:20,034] INFO (broker-2-to-controller-send-thread kafka.server.BrokerToControllerRequestThread 66) [broker-2-to-controller-send-thread]: Stopped After fixing: [2021-04-21 15:05:01,095] DEBUG (BrokerToControllerChannelManager broker=2 name=heartbeat org.apache.kafka.clients.NetworkClient 512) [BrokerToControllerChannelManager broker=2 name=heartbeat] Sending BROKER_REGISTRATION request with header RequestHeader(apiKey=BROKER_REGISTRATION, apiVersion=0, clientId=2, correlationId=0) and timeout 3 to node 2: BrokerRegistrationRequestData(brokerId=2, clusterId='nCqve6D1TEef3NpQniA0Mg', incarnationId=xF29h_IRR1KzrERWwssQ2w, listeners=[Listener(name='PLAINTEXT', host='hdpxxx.cn', port=9092, securityProtocol=0)], features=[], rack=null)
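The fallback proposed above can be sketched in isolation. This is a minimal illustration, not the actual BrokerServer code; the helper name is hypothetical, and only the null-or-empty check plus the InetAddress fallback come from the report:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: fall back to the canonical local host name when a
// listener host is not configured, so the registration request never
// carries host=null.
public class ListenerHost {
    public static String effectiveHost(String configuredHost) throws UnknownHostException {
        if (configuredHost == null || configuredHost.isEmpty()) {
            return InetAddress.getLocalHost().getCanonicalHostName();
        }
        return configuredHost;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(effectiveHost("hdpxxx.cn")); // a configured host is kept as-is
        System.out.println(effectiveHost(null));        // falls back to the local canonical host name
    }
}
```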
[jira] [Created] (KAFKA-12684) The valid partition list is incorrectly replaced by the successfully elected partition list
Wenbing Shen created KAFKA-12684: Summary: The valid partition list is incorrectly replaced by the successfully elected partition list Key: KAFKA-12684 URL: https://issues.apache.org/jira/browse/KAFKA-12684 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.7.0, 2.6.0 Reporter: Wenbing Shen Fix For: 3.0.0 Attachments: election-preferred-leader.png, non-preferred-leader.png When using the kafka-leader-election tool for preferred replica election, if some partitions in the election list already have the preferred replica as leader, the list of partitions that are already on the preferred replica is incorrectly replaced by the list of successfully elected partitions.
[jira] [Created] (KAFKA-12680) Failed to restart the broker in kraft mode
Wenbing Shen created KAFKA-12680: Summary: Failed to restart the broker in kraft mode Key: KAFKA-12680 URL: https://issues.apache.org/jira/browse/KAFKA-12680 Project: Kafka Issue Type: Bug Reporter: Wenbing Shen I tested KRaft mode for the first time today. I deployed a single-node KRaft-mode broker according to the documentation: [https://github.com/apache/kafka/blob/6d1d68617ecd023b787f54aafc24a4232663428d/config/kraft/README.md] First step: ./bin/kafka-storage.sh random-uuid Second step: use the uuid generated above to execute the following command: ./bin/kafka-storage.sh format -t -c ./config/kraft/server.properties Third step: ./bin/kafka-server-start.sh ./config/kraft/server.properties Then I created two topics with two partitions and a single replica: ./bin/kafka-topics.sh --create --topic test-01 --partitions 2 --replication-factor 1 --bootstrap-server localhost:9092 I verified that production and consumption worked, but after I called kafka-server-stop.sh and then ran the start command again, the broker reported an error on startup. I am not sure whether this is a known bug or a problem with my usage. [2021-04-18 00:19:37,443] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$) java.io.IOException: Invalid argument at java.io.RandomAccessFile.setLength(Native Method) at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:189) at kafka.log.AbstractIndex.resize(AbstractIndex.scala:175) at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:241) at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:241) at kafka.log.LogSegment.recover(LogSegment.scala:385) at kafka.log.Log.recoverSegment(Log.scala:741) at kafka.log.Log.recoverLog(Log.scala:894) at kafka.log.Log.$anonfun$loadSegments$2(Log.scala:816) at kafka.log.Log$$Lambda$153/391630194.apply$mcJ$sp(Unknown Source) at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17) at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2456) at kafka.log.Log.loadSegments(Log.scala:816) at kafka.log.Log.<init>(Log.scala:326) at kafka.log.Log$.apply(Log.scala:2593) at kafka.raft.KafkaMetadataLog$.apply(KafkaMetadataLog.scala:358) at kafka.raft.KafkaRaftManager.buildMetadataLog(RaftManager.scala:253) at kafka.raft.KafkaRaftManager.<init>(RaftManager.scala:127) at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:74) at kafka.Kafka$.buildServer(Kafka.scala:79) at kafka.Kafka$.main(Kafka.scala:87) at kafka.Kafka.main(Kafka.scala)
[jira] [Resolved] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
[ https://issues.apache.org/jira/browse/KAFKA-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Shen resolved KAFKA-12445. -- Resolution: Won't Do > Improve the display of ConsumerPerformance indicators > - > > Key: KAFKA-12445 > URL: https://issues.apache.org/jira/browse/KAFKA-12445 > Project: Kafka > Issue Type: Improvement > Components: tools > Affects Versions: 2.7.0 > Reporter: Wenbing Shen > Priority: Minor > Attachments: image-2021-03-10-13-30-27-734.png > > > The current test indicators are shown below; the user experience is poor, and the meaning of each indicator is not displayed intuitively. > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf10 --messages 4 --from-latest > start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 779.7162, 3096, 48206, 8.1034, 829.7930 > > With --show-detailed-stats: > bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf --messages 1 --show-detailed-stats > time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec > 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 1615346338626, -1615346333626, 0., 0. > 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 5000, 758.4341, 795275.8000 > 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 5000, 795.6612, 834311.2000 > 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 5140, 188.8091, 197980.7393 > 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 1.9438, 2038.2166 > > One possible improvement is to display each indicator's name and its measured value in a table, so that the meaning of each measurement is clearly expressed.
[jira] [Created] (KAFKA-12556) Add --under-preferred-replica-partitions option to describe topics command
Wenbing Shen created KAFKA-12556: Summary: Add --under-preferred-replica-partitions option to describe topics command Key: KAFKA-12556 URL: https://issues.apache.org/jira/browse/KAFKA-12556 Project: Kafka Issue Type: Improvement Components: tools Reporter: Wenbing Shen Whether the preferred replica is the partition leader directly affects a broker's outbound traffic. When the preferred replica of every partition is the leader, outbound traffic across brokers is balanced; when a large number of partition leaders are not the preferred replica, this balance is destroyed. Currently, the controller periodically checks the imbalance ratio of partition preferred replicas (if enabled) to trigger a preferred replica election, or the election can be triggered manually through the kafka-leader-election tool. However, if we want to know which partition leaders are not the preferred replica, we need to look it up in the controller log or work it out ourselves from the topic details list. We can add a --under-preferred-replica-partitions option to TopicCommand's describe topics to query the list of partitions in the current cluster whose leader is not the preferred replica.
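The check the option would perform can be sketched as follows. This is a hypothetical illustration, not TopicCommand code: a partition is "under preferred replica" when its current leader is not the first replica in its assignment, and the maps below stand in for metadata the tool would fetch from the cluster:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: report partitions whose current leader is not the
// preferred replica (the first replica in the assignment).
public class UnderPreferredReplica {
    public static List<String> find(Map<String, List<Integer>> assignments,
                                    Map<String, Integer> leaders) {
        return assignments.entrySet().stream()
                .filter(e -> !e.getValue().get(0).equals(leaders.get(e.getKey())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> assignments = new LinkedHashMap<>();
        assignments.put("test-0", List.of(1, 2, 3)); // preferred leader: 1
        assignments.put("test-1", List.of(2, 3, 1)); // preferred leader: 2
        Map<String, Integer> leaders = Map.of("test-0", 1, "test-1", 3);
        System.out.println(find(assignments, leaders)); // prints [test-1]
    }
}
```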
[jira] [Created] (KAFKA-12493) The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper
Wenbing Shen created KAFKA-12493: Summary: The controller should handle the consistency between the controllerContext and the partition replicas assignment on zookeeper Key: KAFKA-12493 URL: https://issues.apache.org/jira/browse/KAFKA-12493 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 2.7.0, 2.6.0, 2.5.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 2.0.0 Reporter: Wenbing Shen Fix For: 3.0.0 This issue can be linked to this email: [https://lists.apache.org/thread.html/redf5748ec787a9c65fc48597e3d2256ffdd729de14afb873c63e6c5b%40%3Cusers.kafka.apache.org%3E] This problem is 100% reproducible. Problem description: in our customer's production environment, code from colleagues in another department redistributed the existing partitions and wrote the new assignment to ZooKeeper. When processing the resulting partition-modification event, the controller only handled the newly added partitions: it registered the new partitions and replicas in the partition state machine and replica state machine and issued LeaderAndIsr and other control requests. The controller did not verify whether the existing partition replica assignment in the controllerContext still matched the partition assignment on the znode in ZooKeeper. This seems harmless, but when we later had to restart a broker for some reason, such as a configuration update or an upgrade, the affected topics in real-time production became abnormal: the controller could not complete the election of a new leader, and the original leader could not correctly identify the replicas currently assigned on ZooKeeper. The real-time business in our customer's environment was interrupted and some data was lost.
This problem can be stably reproduced in the following way: adding partitions or modifying replicas of an existing topic through the code below causes the original partition replicas to be reallocated and the result written to ZooKeeper. The controller does not process this event correctly; after restarting the topic's brokers, the topic can no longer be produced to or consumed from. {code:java}
public void updateKafkaTopic(KafkaTopicVO kafkaTopicVO) {
    ZkUtils zkUtils = ZkUtils.apply(ZK_LIST, SESSION_TIMEOUT, CONNECTION_TIMEOUT,
            JaasUtils.isZkSecurityEnabled());
    try {
        if (kafkaTopicVO.getPartitionNum() >= 0 && kafkaTopicVO.getReplicationNum() >= 0) {
            // Get the original broker metadata
            Seq<BrokerMetadata> brokerMetadata = AdminUtils.getBrokerMetadatas(zkUtils,
                    RackAwareMode.Enforced$.MODULE$, Option.apply(null));
            // Generate a new partition replica allocation plan
            scala.collection.Map<Object, Seq<Object>> replicaAssign =
                    AdminUtils.assignReplicasToBrokers(brokerMetadata,
                            kafkaTopicVO.getPartitionNum(),    // number of partitions
                            kafkaTopicVO.getReplicationNum(),  // number of replicas per partition
                            -1, -1);
            // Write the modified partition replica assignment to ZooKeeper
            AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK(zkUtils,
                    kafkaTopicVO.getTopicNameList().get(0), replicaAssign, null, true);
        }
    } catch (Exception e) {
        System.out.println("Adjust partition abnormal");
        System.exit(0);
    } finally {
        zkUtils.close();
    }
}
{code}
[jira] [Created] (KAFKA-12454) Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster
Wenbing Shen created KAFKA-12454: Summary: Add ERROR logging on kafka-log-dirs when given brokerIds do not exist in current kafka cluster Key: KAFKA-12454 URL: https://issues.apache.org/jira/browse/KAFKA-12454 Project: Kafka Issue Type: Improvement Reporter: Wenbing Shen When non-existent brokerId values are given, the kafka-log-dirs tool fails with a timeout error: Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeLogDirs at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) at kafka.admin.LogDirsCommand$.describe(LogDirsCommand.scala:50) at kafka.admin.LogDirsCommand$.main(LogDirsCommand.scala:36) at kafka.admin.LogDirsCommand.main(LogDirsCommand.scala) Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeLogDirs When a brokerId entered by the user does not exist, an error message indicating that the node is not present should be printed instead.
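The proposed validation can be sketched as follows. This is a hypothetical helper, not LogDirsCommand code: the requested ids are compared against the ids actually present in the cluster, and any unknown ids are reported instead of letting describeLogDirs time out:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: report requested broker ids that are absent from the
// set of ids known to the cluster.
public class BrokerIdCheck {
    public static List<Integer> unknownIds(List<Integer> requested, Set<Integer> clusterIds) {
        return requested.stream()
                .filter(id -> !clusterIds.contains(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<Integer> cluster = Set.of(1001, 1002, 1003); // ids known to the cluster
        List<Integer> unknown = unknownIds(List.of(1001, 9999), cluster);
        if (!unknown.isEmpty()) {
            // Fail fast with an explicit ERROR instead of a describeLogDirs timeout.
            System.out.println("ERROR: broker id(s) " + unknown + " do not exist in the current cluster");
        }
    }
}
```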
[jira] [Created] (KAFKA-12445) Improve the display of ConsumerPerformance indicators
Wenbing Shen created KAFKA-12445: Summary: Improve the display of ConsumerPerformance indicators Key: KAFKA-12445 URL: https://issues.apache.org/jira/browse/KAFKA-12445 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.7.0 Reporter: Wenbing Shen The current test indicators are shown below; the user experience is poor, and the meaning of each indicator is not displayed intuitively. bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf10 --messages 4 --from-latest start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec 2021-03-10 04:32:54:349, 2021-03-10 04:33:45:651, 390.6348, 7.6144, 40001, 779.7162, 3096, 48206, 8.1034, 829.7930 With --show-detailed-stats: bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test-perf --messages 1 --show-detailed-stats time, threadId, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec 2021-03-10 11:19:00:146, 0, 785.6112, 157.1222, 823773, 164754.6000, 1615346338626, -1615346333626, 0., 0. 2021-03-10 11:19:05:146, 0, 4577.7817, 758.4341, 4800152, 795275.8000, 0, 5000, 758.4341, 795275.8000 2021-03-10 11:19:10:146, 0, 8556.0875, 795.6612, 8971708, 834311.2000, 0, 5000, 795.6612, 834311.2000 2021-03-10 11:19:15:286, 0, 9526.5665, 188.8091, 9989329, 197980.7393, 0, 5140, 188.8091, 197980.7393 2021-03-10 11:19:20:310, 0, 9536.3321, 1.9438, 569, 2038.2166, 0, 5024, 1.9438, 2038.2166 One possible improvement is to display each indicator's name and its measured value in a table, so that the meaning of each measurement is clearly expressed.
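The suggested table layout could look like the sketch below. This is a minimal illustration of the idea, not the tool's code; the class name is hypothetical, and the sample values are taken from the first perf-test run quoted above:

```java
// Hypothetical sketch: print each indicator name next to its value in an
// aligned two-column table instead of a bare CSV row.
public class MetricsTable {
    public static String format(String[] names, String[] values) {
        int width = 0;
        for (String n : names) width = Math.max(width, n.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            sb.append(String.format("%-" + width + "s : %s%n", names[i], values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"data.consumed.in.MB", "MB.sec", "data.consumed.in.nMsg", "nMsg.sec"};
        String[] values = {"390.6348", "7.6144", "40001", "779.7162"};
        System.out.print(format(names, values));
    }
}
```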
[jira] [Created] (KAFKA-12157) Testing an upgrade from 2.0.0 to 2.7.0 hits a problem
Wenbing Shen created KAFKA-12157: Summary: Testing an upgrade from 2.0.0 to 2.7.0 hits a problem Key: KAFKA-12157 URL: https://issues.apache.org/jira/browse/KAFKA-12157 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.7.0 Reporter: Wenbing Shen Attachments: 1001server.log, 1001serverlog.png, 1003server.log, 1003serverlog.png, 1003statechange.log In a test environment, I did a rolling upgrade from version 2.0.0 to version 2.7.0 and hit the following problem. When the rolling upgrade reached its second round, I stopped the first broker (1001) and the errors below appeared. While a broker was processing a client produce request, the start offset of the partition leader's current leader epoch suddenly became 0; subsequent write requests to the same partition then logged errors. All partition leaders with a replica on 1001 moved to broker 1003, and every such partition on 1003 that received produce requests hit this error. When I restarted broker 1001, it reported the following error: [2021-01-06 16:46:55,955] ERROR (ReplicaFetcherThread-8-1003 kafka.server.ReplicaFetcherThread 76) [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=8] Unexpected error occurred while processing data for partition test-perf1-9 at offset 9666953 I generated the produce load with the following command: nohup /home/kafka/software/kafka/bin/kafka-producer-perf-test.sh --num-records 1 --record-size 1000 --throughput 3 --producer-props bootstrap.servers=hdp1:9092,hdp2:9092,hdp3:9092 acks=1 --topic test-perf1 > 1pro.log 2>&1 & I tried to reproduce the problem, but after three attempts it did not reappear. I am curious how this happened, and why broker 1003 reset the startOffset of leaderEpoch 4 to 0 when the offset was assigned by the broker in the Log.append function.
broker 1003: server.log [2021-01-06 16:37:59,492] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-9] New epoch entry EpochEntry(epoch=4, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=4, startOffset=9667122), EpochEntry(epoch=3, startOffset=9195729), EpochEntry(epoch=2, startOffset=8348201)). Cache now contains 0 entries. [2021-01-06 16:37:59,493] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-8] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8342787)). Cache now contains 0 entries. [2021-01-06 16:37:59,495] WARN (data-plane-kafka-request-handler-131 kafka.server.epoch.LeaderEpochFileCache 70) [LeaderEpochCache test-perf1-2] New epoch entry EpochEntry(epoch=3, startOffset=0) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=3, startOffset=9667478), EpochEntry(epoch=2, startOffset=9196127), EpochEntry(epoch=1, startOffset=8336727)). Cache now contains 0 entries.
[2021-01-06 16:37:59,498] ERROR (data-plane-kafka-request-handler-142 kafka.server.ReplicaManager 76) [ReplicaManager broker=1003] Error processing append operation on partition test-perf1-9 java.lang.IllegalArgumentException: Received invalid partition leader epoch entry EpochEntry(epoch=4, startOffset=-3) at kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:67) at kafka.server.epoch.LeaderEpochFileCache.assign(LeaderEpochFileCache.scala:59) at kafka.log.Log.maybeAssignEpochStartOffset(Log.scala:1268) at kafka.log.Log.$anonfun$append$6(Log.scala:1181) at kafka.log.Log$$Lambda$935/184936331.accept(Unknown Source) at java.lang.Iterable.forEach(Iterable.java:75) at kafka.log.Log.$anonfun$append$2(Log.scala:1179) at kafka.log.Log.append(Log.scala:2387) at kafka.log.Log.appendAsLeader(Log.scala:1050) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1079) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1067) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:953) at kafka.server.ReplicaManager$$Lambda$1025/1369541490.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:941) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:621) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:625) broker 1001:server.log [2021-01-06 16:46:55,955]
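The truncation reported in the WARN lines above can be modeled with a simplified sketch. This is not Kafka's actual LeaderEpochFileCache, only an illustration of its conflict rule: assigning a new epoch entry drops every existing entry whose epoch or start offset conflicts with it, which is why a bogus (epoch=4, startOffset=0) entry wiped all three cached entries at once.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EpochCacheModel {
    static final class Entry {
        final int epoch;
        final long startOffset;
        Entry(int epoch, long startOffset) { this.epoch = epoch; this.startOffset = startOffset; }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();

    // Drops entries whose epoch or start offset conflicts with the new
    // entry, appends the new entry, and returns how many were truncated.
    public int assign(int epoch, long startOffset) {
        int truncated = 0;
        while (!entries.isEmpty()
                && (entries.peekLast().epoch >= epoch || entries.peekLast().startOffset >= startOffset)) {
            entries.removeLast();
            truncated++;
        }
        entries.addLast(new Entry(epoch, startOffset));
        return truncated;
    }

    public int size() { return entries.size(); }
}
```

Replaying the logged scenario against this model, assigning (4, 0) on top of the entries (2, 8348201), (3, 9195729), (4, 9667122) truncates all three, matching the ListBuffer in the first WARN line.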
[jira] [Created] (KAFKA-10904) There is a misleading log when the replica fetcher thread handles offsets that are out of range
Wenbing Shen created KAFKA-10904: Summary: There is a misleading log when the replica fetcher thread handles offsets that are out of range Key: KAFKA-10904 URL: https://issues.apache.org/jira/browse/KAFKA-10904 Project: Kafka Issue Type: Improvement Components: replication Affects Versions: 2.7.0 Reporter: Wenbing Shen Attachments: ReplicaFetcherThread-has-a-misleading-log.png There is ambiguity in the replica fetcher thread's log. When the fetcher thread handles an out-of-range offset, it needs to try to truncate the log. When the follower replica's end offset is greater than the leader replica's log start offset but smaller than the leader's end offset, the follower keeps its own fetch offset. However, this case is processed together with the case where the follower's end offset is smaller than the leader's log start offset, which creates ambiguity in the log: it reports that the follower's fetch offset was reset to the leader's start offset. In fact the follower still keeps its own fetch offset, so this WARN log misleads the user. [2020-11-12 05:30:54,319] WARN (ReplicaFetcherThread-1-1003 kafka.server.ReplicaFetcherThread 70) [ReplicaFetcher replicaId=1010, leaderId=1003, fetcherId=1] Reset fetch offset for partition eb_raw_msdns-17 from 1933959108 to current leader's start offset 1883963889 [2020-11-12 05:30:54,320] INFO (ReplicaFetcherThread-1-1003 kafka.server.ReplicaFetcherThread 66) [ReplicaFetcher replicaId=1010, leaderId=1003, fetcherId=1] Current offset 1933959108 for partition eb_raw_msdns-17 is out of range, which typically implies a leader change. Reset fetch offset to 1933959108 I think it is more accurate to print the WARN log only when the follower replica really truncates its fetch offset to the leader replica's log start offset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
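The distinction the issue asks for can be sketched as below (the names are illustrative, not Kafka's actual ReplicaFetcherThread code): only when the follower's fetch offset falls below the leader's log start offset is the offset really reset, and only that branch deserves the "Reset fetch offset" WARN; otherwise the follower keeps its own offset and a reset message would mislead.

```java
public class OutOfRangeHandling {
    // Resolves the follower's fetch offset when the leader reports it out
    // of range. Returns the leader's log start offset only on a genuine
    // reset; otherwise the follower's own offset is kept.
    public static long resolveFetchOffset(long followerFetchOffset,
                                          long leaderLogStartOffset,
                                          long leaderLogEndOffset) {
        if (followerFetchOffset < leaderLogStartOffset) {
            // Genuine reset: this is the only case worth the WARN log.
            return leaderLogStartOffset;
        }
        // Offset still within the leader's range: keep it, no reset WARN.
        return followerFetchOffset;
    }
}
```

In the logged example above (fetch offset 1933959108, leader start offset 1883963889) the fetch offset is above the leader's start offset, so under this rule no reset WARN would be printed.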
[jira] [Created] (KAFKA-10891) The control plane needs to force the validation of requests from the controller
Wenbing Shen created KAFKA-10891: Summary: The control plane needs to force the validation of requests from the controller Key: KAFKA-10891 URL: https://issues.apache.org/jira/browse/KAFKA-10891 Project: Kafka Issue Type: Improvement Components: controller, core Affects Versions: 2.7.0 Reporter: Wenbing Shen Assignee: Wenbing Shen Attachments: 0880c08b0110fd91d30e.png, 0880c08b0110fdd1eb0f.png Currently, data and control requests travel through separate planes in isolation, but both endpoints are registered on the ZooKeeper node. A client can therefore obtain the control-plane endpoint and use it for produce, consume, and other data requests, which violates the intended separation of data and control requests. The server should validate requests on the control plane and reject data requests that arrive through it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
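The proposed check could be sketched as follows: on the control-plane listener, accept only controller-originated request types and reject produce, fetch, and other data requests. The enum and method here are hypothetical illustrations, not Kafka's actual request-handling API; the listed controller APIs match the inter-broker control requests named in Kafka's protocol (LeaderAndIsr, StopReplica, UpdateMetadata, ControlledShutdown).

```java
import java.util.EnumSet;
import java.util.Set;

public class ControlPlaneGuard {
    public enum ApiKey { PRODUCE, FETCH, METADATA, LEADER_AND_ISR, STOP_REPLICA, UPDATE_METADATA, CONTROLLED_SHUTDOWN }

    // Only requests the controller itself sends are allowed on the
    // control-plane listener; everything else should be rejected there.
    private static final Set<ApiKey> CONTROLLER_APIS =
        EnumSet.of(ApiKey.LEADER_AND_ISR, ApiKey.STOP_REPLICA, ApiKey.UPDATE_METADATA, ApiKey.CONTROLLED_SHUTDOWN);

    public static boolean allowedOnControlPlane(ApiKey api) {
        return CONTROLLER_APIS.contains(api);
    }
}
```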
[jira] [Created] (KAFKA-10889) The log cleaner is not working for topic partitions
Wenbing Shen created KAFKA-10889: Summary: The log cleaner is not working for topic partitions Key: KAFKA-10889 URL: https://issues.apache.org/jira/browse/KAFKA-10889 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.0.0 Reporter: Wenbing Shen Attachments: 0880c08b0110fcdd9b0c.png, 0880c08b0110fcddfb0b.png I have a topic with the default retention of 7 days, but log segments from October 26th through today, December 25th, still exist. The log cleaner does not seem to be working on it. This looks like an underlying problem in Kafka. -- This message was sent by Atlassian Jira (v8.3.4#803005)
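For reference, the time-based retention rule that should have removed those segments boils down to a simple predicate: a segment is eligible for deletion once its newest record is older than retention.ms (7 days by default). This is an illustrative sketch, not Kafka's internal API.

```java
public class RetentionCheck {
    // Default retention.ms: 7 days, matching Kafka's log.retention.hours=168.
    public static final long DEFAULT_RETENTION_MS = 7L * 24 * 60 * 60 * 1000;

    // True when the segment's largest timestamp is older than the
    // retention window and the segment should be deleted.
    public static boolean segmentExpired(long segmentLargestTimestampMs, long nowMs, long retentionMs) {
        return nowMs - segmentLargestTimestampMs > retentionMs;
    }
}
```

A segment last written two months ago is well past the 7-day window, so under this rule it should long since have been deleted; segments surviving that long suggests the cleaner thread is not running or is stuck for this partition.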
[jira] [Created] (KAFKA-10886) Kafka crashed in windows environment2
Wenbing Shen created KAFKA-10886: Summary: Kafka crashed in windows environment2 Key: KAFKA-10886 URL: https://issues.apache.org/jira/browse/KAFKA-10886 Project: Kafka Issue Type: Bug Components: log Affects Versions: 2.0.0 Environment: Windows Server Reporter: Wenbing Shen I tried using the KAFKA-9458 patch to fix the Kafka problems in the Windows environment, but it did not seem to work. The problems include: restarting the Kafka service causing data to be deleted by mistake, and deleting a topic or migrating a partition causing a disk to go offline or the broker to crash. [2020-12-23 17:26:11,124] ERROR (kafka-request-handler-11 kafka.server.LogDirFailureChannel 76) Error while renaming dir for test-1-1 in log dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsFileCopy.move(Unknown Source) at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at java.nio.file.Files.move(Unknown Source) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786) at kafka.log.Log$$anonfun$renameDir$1.apply$mcV$sp(Log.scala:689) at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at kafka.log.Log$$anonfun$renameDir$1.apply(Log.scala:687) at kafka.log.Log.maybeHandleIOException(Log.scala:1837) at kafka.log.Log.renameDir(Log.scala:687) at kafka.log.LogManager.asyncDelete(LogManager.scala:833) at
kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:267) at kafka.cluster.Partition$$anonfun$delete$1.apply(Partition.scala:262) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:256) at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:264) at kafka.cluster.Partition.delete(Partition.scala:262) at kafka.server.ReplicaManager.stopReplica(ReplicaManager.scala:341) at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:371) at kafka.server.ReplicaManager$$anonfun$stopReplicas$2.apply(ReplicaManager.scala:369) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:369) at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:222) at kafka.server.KafkaApis.handle(KafkaApis.scala:113) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:73) at java.lang.Thread.run(Unknown Source) Suppressed: java.nio.file.AccessDeniedException: E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1 -> E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log\test-1-1.7c4fdeca40294cc38aed815b1a8e7663-delete at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsFileCopy.move(Unknown Source) at sun.nio.fs.WindowsFileSystemProvider.move(Unknown Source) at java.nio.file.Files.move(Unknown Source) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783) ... 
23 more [2020-12-23 17:26:11,127] INFO (LogDirFailureHandler kafka.server.ReplicaManager 66) [ReplicaManager broker=1002] Stopping serving replicas in dir E:\bigdata\works\kafka-2.0.0.5\kafka-2.0.0.5\bin\windows\.\..\..\data01\kafka\log -- This message was sent by Atlassian Jira (v8.3.4#803005)
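The call that fails in the stack trace is Utils.atomicMoveWithFallback; its idea can be sketched as below: try an atomic rename first and fall back to a plain move if the filesystem does not support atomic moves. This is a minimal re-implementation for illustration, not Kafka's exact code. On Windows, an open memory-mapped file inside the directory can make both attempts fail with AccessDeniedException, which matches the trace above.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveWithFallback {
    // Attempts an atomic rename; falls back to a non-atomic move when the
    // filesystem refuses ATOMIC_MOVE. AccessDeniedException from either
    // attempt still propagates to the caller.
    public static void atomicMoveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```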
[jira] [Created] (KAFKA-10882) When sending a response to the client,a null pointer exception has occurred in the error code set
Wenbing Shen created KAFKA-10882: Summary: When sending a response to the client,a null pointer exception has occurred in the error code set Key: KAFKA-10882 URL: https://issues.apache.org/jira/browse/KAFKA-10882 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.11.0.4 Reporter: Wenbing Shen Attachments: 0880c08b0110fbd3bb0d.png, 0880c08b0110fbd3fb0e.png After the IO thread received a produce request, a NullPointerException occurred while parsing the error code set when returning the response error to the client. A replica fetcher thread on another broker node also failed to parse the protocol; I do not know whether that is the same problem. I hope I can get help. -- This message was sent by Atlassian Jira (v8.3.4#803005)