[jira] [Assigned] (KAFKA-5358) Consumer perf tool should count rebalance time separately
[ https://issues.apache.org/jira/browse/KAFKA-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxi reassigned KAFKA-5358:
---------------------------

    Assignee: huxi

> Consumer perf tool should count rebalance time separately
> ---------------------------------------------------------
>
>            Key: KAFKA-5358
>            URL: https://issues.apache.org/jira/browse/KAFKA-5358
>        Project: Kafka
>     Issue Type: Improvement
>       Reporter: Jason Gustafson
>       Assignee: huxi
>
> It would be helpful to measure rebalance time separately in the performance tool so that throughput between different versions can be compared more easily in spite of improvements such as https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance.
> At the moment, running the perf tool on 0.11.0 or trunk for a short amount of time will present a severely skewed picture since the overall time will be dominated by the join group delay.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KAFKA-5327) Console Consumer should only poll for up to max messages
[ https://issues.apache.org/jira/browse/KAFKA-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxi reassigned KAFKA-5327:
---------------------------

    Assignee: huxi

> Console Consumer should only poll for up to max messages
> --------------------------------------------------------
>
>            Key: KAFKA-5327
>            URL: https://issues.apache.org/jira/browse/KAFKA-5327
>        Project: Kafka
>     Issue Type: Improvement
>     Components: tools
>       Reporter: Dustin Cote
>       Assignee: huxi
>       Priority: Minor
>
> The ConsoleConsumer has a --max-messages flag that can be used to limit the number of messages consumed. However, the number of records actually consumed is governed by max.poll.records. This means you see one message on the console, but your offset has moved forward a default of 500, which is kind of counterintuitive. It would be good to only commit offsets for messages we have printed to the console.
[jira] [Commented] (KAFKA-5296) Unable to write to some partitions of newly created topic in 10.2
[ https://issues.apache.org/jira/browse/KAFKA-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019074#comment-16019074 ]

huxi commented on KAFKA-5296:
-----------------------------

With the 0.10.2 client, could you try enlarging `request.timeout.ms` a bit to see whether this issue happens again?

> Unable to write to some partitions of newly created topic in 10.2
> -----------------------------------------------------------------
>
>            Key: KAFKA-5296
>            URL: https://issues.apache.org/jira/browse/KAFKA-5296
>        Project: Kafka
>     Issue Type: Bug
>       Reporter: Abhisek Saikia
>
> We are using kafka 10.2 and the cluster was running fine for a month with 50 topics. Now we are having an issue producing messages on newly created topics. The create topic command is successful, but producers throw errors while writing to some partitions.
> Error in producer:
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for [topic1]-8: 30039 ms has passed since batch creation plus linger time
>         at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70) ~[kafka-clients-0.10.2.0.jar:na]
>         at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57) ~[kafka-clients-0.10.2.0.jar:na]
>         at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25) ~[kafka-clients-0.10.2.0.jar:na]
> On the broker side, I don't see any topic-partition folder getting created for the broker that is the leader for the partition.
> While using the 0.8 client, the write used to hang when it started writing to the partition having this issue. With 10.2, the producer hang issue is resolved.
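For reference, the suggestion above maps onto the producer configuration sketched below; the 60000 ms value is purely illustrative, not a number recommended anywhere in this thread:

```properties
# Illustrative producer settings -- the values are placeholders, pick your own.
# In the 0.10.x producer, a batch sitting in the accumulator is expired roughly
# request.timeout.ms after batch creation plus linger time, which is exactly
# the "Expiring 1 record(s) ... 30039 ms has passed" error quoted above
# (default request.timeout.ms is 30000).
request.timeout.ms=60000
linger.ms=0
```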
[jira] [Commented] (KAFKA-5299) MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted
[ https://issues.apache.org/jira/browse/KAFKA-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019072#comment-16019072 ]

huxi commented on KAFKA-5299:
-----------------------------

How did you specify the whitelist? It should be a valid regular expression.

> MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted
> --------------------------------------------------------------------------------------
>
>            Key: KAFKA-5299
>            URL: https://issues.apache.org/jira/browse/KAFKA-5299
>        Project: Kafka
>     Issue Type: Bug
>     Affects Versions: 0.9.0.1
>       Reporter: Jyoti
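As a sketch of what a valid whitelist looks like: it is a single Java regular expression, so multiple topics are joined with `|`. The topic names and config file paths below are made up for illustration:

```shell
# Hypothetical invocation -- consumer.properties, producer.properties, and the
# topic names are placeholders. Quote the whitelist so the shell does not
# interpret '|' or '*'.
bin/kafka-mirror-maker.sh \
  --new.consumer \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist 'topicA|topicB|metrics.*'
```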
[jira] [Assigned] (KAFKA-5278) kafka-console-consumer: `--value-deserializer` is not working but `--property value.deserializer` does
[ https://issues.apache.org/jira/browse/KAFKA-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxi reassigned KAFKA-5278:
---------------------------

    Assignee: huxi

> kafka-console-consumer: `--value-deserializer` is not working but `--property value.deserializer` does
> ------------------------------------------------------------------------------------------------------
>
>            Key: KAFKA-5278
>            URL: https://issues.apache.org/jira/browse/KAFKA-5278
>        Project: Kafka
>     Issue Type: Bug
>     Components: clients
>     Affects Versions: 0.10.2.1
>       Reporter: Yeva Byzek
>       Assignee: huxi
>       Priority: Minor
>
> kafka-console-consumer: {{--value-deserializer}} is not working but {{--property value.deserializer}} is working
> 1. Does not work
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic TEST1 --value-deserializer org.apache.kafka.common.serialization.LongDeserializer
> [2017-05-18 13:09:41,745] ERROR Error processing message, terminating consumer process:  (kafka.tools.ConsoleConsumer$)
> java.lang.ClassCastException: java.lang.Long cannot be cast to [B
>         at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:100)
>         at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>         at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>         at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>         at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> Processed a total of 0 messages
> {noformat}
> 2. Does work
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic TEST1 --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
> 1000
> 2500
> 2000
> 5500
> 8000
> {noformat}
> Without either, the output is
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic TEST1
> ?
> ?
> ?
> |
> @
> {noformat}
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009914#comment-16009914 ]

huxi commented on KAFKA-5200:
-----------------------------

[~ecomar] It seems there is no convenient tool to do this. However, you could follow the instructions below to complete the deletion:
1. Stop all healthy brokers
2. Manually delete the `/brokers/topics/` znode in ZooKeeper
3. Delete the log files for the topic in the file systems
4. Remove the `/admin/delete_topics` znode in ZooKeeper
5. Start all healthy brokers

> Deleting topic when one broker is down will prevent topic to be re-creatable
> ----------------------------------------------------------------------------
>
>            Key: KAFKA-5200
>            URL: https://issues.apache.org/jira/browse/KAFKA-5200
>        Project: Kafka
>     Issue Type: Improvement
>     Components: core
>       Reporter: Edoardo Comar
>
> In a cluster with 5 brokers, replication factor=3, min in sync=2, one broker went down.
> A user's app remained of course unaware of that and deleted a topic that (unknowingly) had a replica on the dead broker.
> The topic went into 'pending delete' mode.
> The user then tried to recreate the topic - which failed, so his app was left stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the open-source reassignment tool.
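Steps 2-4 above might look like the following with the zookeeper-shell.sh tool that ships with Kafka. The topic name `topic1`, the ZooKeeper address, and the log directory are placeholders, and this is destructive last-resort metadata surgery: only attempt it with the brokers stopped (step 1) and backups in hand.

```shell
# DANGER: manual topic-metadata surgery. All names below (topic1,
# localhost:2181, /path/to/kafka-logs) are placeholders to adapt first.

# Step 2: remove the topic znode (rmr deletes recursively in ZooKeeper 3.4)
bin/zookeeper-shell.sh localhost:2181 rmr /brokers/topics/topic1

# Step 3: delete the topic's log directories on every broker
rm -rf /path/to/kafka-logs/topic1-*

# Step 4: clear the pending-delete marker for the topic
bin/zookeeper-shell.sh localhost:2181 rmr /admin/delete_topics/topic1
```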
[jira] [Commented] (KAFKA-5222) Kafka Connection error in contoller.logs
[ https://issues.apache.org/jira/browse/KAFKA-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007770#comment-16007770 ]

huxi commented on KAFKA-5222:
-----------------------------

Are you sure you see it while the broker is running, rather than while it is being shut down?

> Kafka Connection error in contoller.logs
> ----------------------------------------
>
>            Key: KAFKA-5222
>            URL: https://issues.apache.org/jira/browse/KAFKA-5222
>        Project: Kafka
>     Issue Type: Bug
>     Components: log
>     Affects Versions: 0.10.2.0
>     Environment: AMI - Amazon Linux AMI
>                  Hardware Specifications - (8 core, 16 GB RAM, 1 TB HardDisk)
>       Reporter: Abhimanyu Nagrath
>       Priority: Minor
>
> I am using single node Kafka and single node zookeeper, and my controller.log files are flooded with this exception.
> java.io.IOException: Connection to 1 was disconnected before the response was read
>         at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>         at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>         at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>         at kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>         at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>         at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:192)
>         at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> What is the fix for this error?
[jira] [Commented] (KAFKA-5221) Kafka Consumer Attached to partition but not consuming messages
[ https://issues.apache.org/jira/browse/KAFKA-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007766#comment-16007766 ]

huxi commented on KAFKA-5221:
-----------------------------

[~manyu_aditya] How many partitions does the consumer instance you suspect is not consuming subscribe to? If it subscribes to too many partitions, I guess it's very likely that some partitions are not consumed for quite a while.

> Kafka Consumer Attached to partition but not consuming messages
> ---------------------------------------------------------------
>
>            Key: KAFKA-5221
>            URL: https://issues.apache.org/jira/browse/KAFKA-5221
>        Project: Kafka
>     Issue Type: Bug
>     Components: consumer
>     Affects Versions: 0.10.2.0
>     Environment: Amazon-Linux AMI
>                  Hardware Specifications - (8 core, 16 GB RAM, 1 TB HardDisk)
>       Reporter: Abhimanyu Nagrath
>
> I have a single node Kafka broker and a single zookeeper (3.4.9). I am using the new Kafka Consumer APIs. One strange thing I observed is that when I start multiple Kafka consumers for multiple topics placed in a single group and run the ./kafka-consumer-groups.sh script for the group, a few of the consumers are attached to the group but do not consume any messages. Below are the stats from the group command.
> {noformat}
> TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST
> topic1  0          288             288             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /    consumer-8
> topic1  1          283             283             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /    consumer-8
> topic1  2          279             279             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /    consumer-8
> topic2  0          -               9               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /    consumer-1
> topic2  1          -               2               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /    consumer-1
> topic3  0          450             450             0    consumer-3-63c07703-17d0-471b-8c5f-17347699f108  /    consumer-3
> topic4  1          -               54              -    consumer-2-94dcc209-8377-45ce-8473-9ab0d85951c4  /
> topic2  2          441             441             0    consumer-5-bcfffc99-5915-41f4-b3e4-970baa204c14  /
> {noformat}
> So can someone help me understand why, for topic **topic2** partition 0, the **current-offset** shows - and the **lag** shows -, while messages are still there on the server, as **LOG-END-OFFSET** shows 9?
> I have 600 GB of data distributed among 280 topics and 7000 partitions. This happens very frequently, and restarting the consumers solves the issue temporarily.
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007757#comment-16007757 ]

huxi commented on KAFKA-5200:
-----------------------------

Alternatively, you could start that broker back up; the pending delete operation will then resume automatically.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> ----------------------------------------------------------------------------
[jira] [Commented] (KAFKA-5222) Kafka Connection error in contoller.logs
[ https://issues.apache.org/jira/browse/KAFKA-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007742#comment-16007742 ]

huxi commented on KAFKA-5222:
-----------------------------

Did you set `broker.id` in server.properties? If you did not set this config and only one broker is started, I wonder where the `Connection to 1` comes from, since broker IDs start from zero by default.

> Kafka Connection error in contoller.logs
> ----------------------------------------
[jira] [Comment Edited] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002107#comment-16002107 ]

huxi edited comment on KAFKA-5200 at 5/9/17 6:15 AM:
-----------------------------------------------------

We could add a parameter to the kafka-reassign-partitions script to enable a forced reassignment of partitions away from dead brokers.

was (Author: huxi_2b):
Could add parameter for kafka-reassign-partitions script to forcibly reassign partitions from dead brokers.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> ----------------------------------------------------------------------------
[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable
[ https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002107#comment-16002107 ]

huxi commented on KAFKA-5200:
-----------------------------

Could add parameter for kafka-reassign-partitions script to forcibly reassign partitions from dead brokers.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> ----------------------------------------------------------------------------
[jira] [Commented] (KAFKA-5178) Potential Performance Degradation in Kafka Producer when using Multiple Threads
[ https://issues.apache.org/jira/browse/KAFKA-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998316#comment-15998316 ]

huxi commented on KAFKA-5178:
-----------------------------

I suggest splitting the test into multiple sub-tests in order to find the bottleneck. The round-trip latency comprises several components, such as remote time, queue time on the producer side, and polling time on the consumer side. It's hard to tell which part contributes most when hitting the performance limit. Besides, I notice that there are 64 partitions in total distributed across three brokers. How many physical disks does each broker machine have on average? Could you also monitor the disk usage% during the test to see if the disks are too busy?

> Potential Performance Degradation in Kafka Producer when using Multiple Threads
> -------------------------------------------------------------------------------
>
>            Key: KAFKA-5178
>            URL: https://issues.apache.org/jira/browse/KAFKA-5178
>        Project: Kafka
>     Issue Type: Bug
>       Reporter: Ben Stopford
>
> There is evidence that the Kafka Producer drops performance as we increase the number of threads using it.
> This is based on some benchmarking done in the community. I have not independently validated these results.
[jira] [Commented] (KAFKA-5177) Automatic creation of topic prompts warnings into console
[ https://issues.apache.org/jira/browse/KAFKA-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997932#comment-15997932 ]

huxi commented on KAFKA-5177:
-----------------------------

This appears to be a transient warning meaning the client failed to find the leader in its metadata. You will not see it any more once the client-side metadata is updated.

> Automatic creation of topic prompts warnings into console
> ---------------------------------------------------------
>
>            Key: KAFKA-5177
>            URL: https://issues.apache.org/jira/browse/KAFKA-5177
>        Project: Kafka
>     Issue Type: Wish
>     Affects Versions: 0.10.1.0
>     Environment: Mac OSX 16.5.0 Darwin Kernel Version 16.5.0 root:xnu-3789.51.2~3/RELEASE_X86_64 x86_64 & JDK 1.8.0_121
>       Reporter: Pranas Baliuka
>       Priority: Minor
>
> The quick guide https://kafka.apache.org/0101/documentation.html#quickstart leaves a bad first impression at the step when test messages are appended:
> {code}
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-topics.sh --list --zookeeper localhost:2181
> session-1
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> Message 1
> [2017-05-05 09:05:10,923] WARN Error while fetching metadata with correlation id 0 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> Message 2
> Message 3
> {code}
[jira] [Commented] (KAFKA-5177) Default quick start config prompts warnings into console
[ https://issues.apache.org/jira/browse/KAFKA-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997692#comment-15997692 ]

huxi commented on KAFKA-5177:
-----------------------------

Did you create the topic `test` before running the kafka-console-producer script?

> Default quick start config prompts warnings into console
> --------------------------------------------------------
[jira] [Commented] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
[ https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994197#comment-15994197 ]

huxi commented on KAFKA-5153:
-----------------------------

Could you set `replica.lag.time.max.ms` and `replica.fetch.wait.max.ms` to larger values and retry?

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---------------------------------------------------------------------------
>
>            Key: KAFKA-5153
>            URL: https://issues.apache.org/jira/browse/KAFKA-5153
>        Project: Kafka
>     Issue Type: Bug
>     Affects Versions: 0.10.2.0
>     Environment: RHEL 6
>                  Java Version 1.8.0_91-b14
>       Reporter: Arpan
>       Priority: Critical
>    Attachments: server_1_72server.log, server_2_73_server.log, server_3_74Server.log, server.properties, ThreadDump_1493564142.dump, ThreadDump_1493564177.dump, ThreadDump_1493564249.dump
>
> Hi Team,
> I was earlier referring to issue KAFKA-4477 because the problem I am facing is similar. I tried to search for the same reference in the release docs as well but did not find anything in 0.10.1.1 or 0.10.2.0. I am currently using 2.11_0.10.2.0.
> I have a 3 node cluster for Kafka and a cluster for ZK as well, on the same set of servers, in cluster mode. We have around 240GB of data transferred through Kafka every day. What we are observing is servers getting disconnected from the cluster, the ISR shrinking, and service starting to be impacted.
> I have also observed the file descriptor count increasing a bit; in normal circumstances we have not observed an FD count of more than 500, but when the issue started we were observing it in the range of 650-700 on all 3 servers.
> Attaching thread dumps of all 3 servers from when we started facing the issue recently.
> The issue vanishes once you bounce the nodes, and the setup does not run for more than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information.
> Attaching server.properties as well for one of the servers (it's similar on all 3 servers).
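Both settings suggested above live in the brokers' server.properties; a sketch with illustrative values follows (in 0.10.2 the defaults are 10000 ms and 500 ms respectively; the numbers below are placeholders, not tuning advice from this thread):

```properties
# server.properties -- illustrative values only, adapt to your cluster.
# A follower is dropped from the ISR if it has not caught up to the leader
# within this window.
replica.lag.time.max.ms=30000
# Maximum time a follower fetch request waits for data; keep it well below
# replica.lag.time.max.ms so a slow fetch cannot itself trigger ISR shrink.
replica.fetch.wait.max.ms=1000
```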
[jira] [Comment Edited] (KAFKA-5155) Messages can be deleted prematurely when some producers use timestamps and some not
[ https://issues.apache.org/jira/browse/KAFKA-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994177#comment-15994177 ]

huxi edited comment on KAFKA-5155 at 5/3/17 2:27 AM:
-----------------------------------------------------

This is very similar to a JIRA issue ([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) reported by me, complaining that the broker side cannot honor the order of timestamps. It sounds like you cannot mix the new timestamps and old timestamps under the current design.

was (Author: huxi_2b):
This is very similar with a jira issue([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) reported by me complaining of the fact that Kafka cannot broker side cannot honor the order of timestamp. Sounds like you cannot mix up the new timestamps and old timestamps based on the current design.

> Messages can be deleted prematurely when some producers use timestamps and some not
> -----------------------------------------------------------------------------------
>
>            Key: KAFKA-5155
>            URL: https://issues.apache.org/jira/browse/KAFKA-5155
>        Project: Kafka
>     Issue Type: Bug
>     Components: log
>     Affects Versions: 0.10.2.0
>       Reporter: Petr Plavjaník
>
> Some messages can be deleted prematurely and never read in the following scenario. One producer uses timestamps and produces messages that are appended to the beginning of a log segment. Another producer produces messages without a timestamp. In that case the largest timestamp is determined by the old messages with a timestamp; the new messages without a timestamp do not influence it, and the log segment with old and new messages can be deleted immediately after the last new message with no timestamp is appended. When all appended messages have no timestamp, they are not deleted, because the {{lastModified}} attribute of a {{LogSegment}} is used.
> New test case for {{kafka.log.LogTest}} that fails:
> {code}
> @Test
> def shouldNotDeleteTimeBasedSegmentsWhenTimestampIsNotProvidedForSomeMessages() {
>   val retentionMs = 1000
>   val old = TestUtils.singletonRecords("test".getBytes, timestamp = 0)
>   val set = TestUtils.singletonRecords("test".getBytes, timestamp = -1, magicValue = 0)
>   val log = createLog(set.sizeInBytes, retentionMs = retentionMs)
>   // append some messages to create some segments
>   log.append(old)
>   for (_ <- 0 until 12)
>     log.append(set)
>   assertEquals("No segment should be deleted", 0, log.deleteOldSegments())
> }
> {code}
> It can be prevented by using {{def largestTimestamp = Math.max(maxTimestampSoFar, lastModified)}} in LogSegment, or by using the current timestamp when messages with timestamp {{-1}} are appended.
[jira] [Commented] (KAFKA-5155) Messages can be deleted prematurely when some producers use timestamps and some not
[ https://issues.apache.org/jira/browse/KAFKA-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994177#comment-15994177 ]

huxi commented on KAFKA-5155:
-----------------------------

This is very similar to a JIRA issue ([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) reported by me, complaining that the broker side cannot honor the order of timestamps. It sounds like you cannot mix the new timestamps and old timestamps under the current design.

> Messages can be deleted prematurely when some producers use timestamps and some not
> -----------------------------------------------------------------------------------
[jira] [Assigned] (KAFKA-5161) reassign-partitions to check if broker of ID exists in cluster
[ https://issues.apache.org/jira/browse/KAFKA-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxi reassigned KAFKA-5161:
---------------------------

    Assignee: huxi

> reassign-partitions to check if broker of ID exists in cluster
> --------------------------------------------------------------
>
>            Key: KAFKA-5161
>            URL: https://issues.apache.org/jira/browse/KAFKA-5161
>        Project: Kafka
>     Issue Type: Improvement
>     Affects Versions: 0.10.1.1
>     Environment: Debian 8
>       Reporter: Lawrence Weikum
>       Assignee: huxi
>       Priority: Minor
>
> A topic was created with only one replica. We wanted to increase it later to 3 replicas. A JSON file was created, but the IDs for the brokers were incorrect and not part of the system.
> The script or the brokers receiving the reassignment command should first check whether the new IDs exist in the cluster before continuing, throwing an error to the user if any of them do not.
> The current effect of assigning partitions to non-existent brokers is a stuck replication assignment with no way to stop it.
[jira] [Created] (KAFKA-5143) Windows platform does not offer kafka-broker-api-versions.bat
huxi created KAFKA-5143: --- Summary: Windows platform does not offer kafka-broker-api-versions.bat Key: KAFKA-5143 URL: https://issues.apache.org/jira/browse/KAFKA-5143 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 0.10.2.0 Reporter: huxi Assignee: huxi Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4879) KafkaConsumer.position may hang forever when deleting a topic
[ https://issues.apache.org/jira/browse/KAFKA-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982598#comment-15982598 ] huxi commented on KAFKA-4879: - [~guozhang] What about KafkaConsumer.pollOnce? Does the running time of `updateFetchPositions` count toward the total timeout? > KafkaConsumer.position may hang forever when deleting a topic > - > > Key: KAFKA-4879 > URL: https://issues.apache.org/jira/browse/KAFKA-4879 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.10.2.0 >Reporter: Shixiong Zhu >Assignee: Armin Braun > > KafkaConsumer.position may hang forever when deleting a topic. The problem is > this line > https://github.com/apache/kafka/blob/022bf129518e33e165f9ceefc4ab9e622952d3bd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L374 > The timeout is "Long.MAX_VALUE", and it will just retry forever for > UnknownTopicOrPartitionException. > Here is a reproducer: > {code} > import org.apache.kafka.clients.consumer.KafkaConsumer; > import org.apache.kafka.common.TopicPartition; > import org.apache.kafka.common.serialization.StringDeserializer; > import java.util.Collections; > import java.util.Properties; > import java.util.Set; > public class KafkaReproducer { > public static void main(String[] args) { > // Make sure "delete.topic.enable" is set to true. > // Please create the topic test with "3" partitions manually. > // The issue is gone when there is only one partition.
> String topic = "test"; > Properties props = new Properties(); > props.put("bootstrap.servers", "localhost:9092"); > props.put("group.id", "testgroup"); > props.put("value.deserializer", StringDeserializer.class.getName()); > props.put("key.deserializer", StringDeserializer.class.getName()); > props.put("enable.auto.commit", "false"); > KafkaConsumer kc = new KafkaConsumer(props); > kc.subscribe(Collections.singletonList(topic)); > kc.poll(0); > Set partitions = kc.assignment(); > System.out.println("partitions: " + partitions); > kc.pause(partitions); > kc.seekToEnd(partitions); > System.out.println("please delete the topic in 30 seconds"); > try { > // Sleep 30 seconds to give us enough time to delete the topic. > Thread.sleep(3); > } catch (InterruptedException e) { > e.printStackTrace(); > } > System.out.println("sleep end"); > for (TopicPartition p : partitions) { > System.out.println(p + " offset: " + kc.position(p)); > } > System.out.println("cannot reach here"); > kc.close(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
[ https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-5118: --- Assignee: huxi > Improve message for Kafka failed startup with non-Kafka data in data.dirs > - > > Key: KAFKA-5118 > URL: https://issues.apache.org/jira/browse/KAFKA-5118 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.2.0 >Reporter: Dustin Cote >Assignee: huxi >Priority: Minor > > Today, if you try to startup a broker with some non-Kafka data in the > data.dirs you end up with a cryptic message: > {code} > [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads > during logs loading: java.lang.StringIndexOutOfBoundsException: String index > out of range: -1 (kafka.log.LogManager) > [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > java.lang.StringIndexOutOfBoundsException: String index out of range: -1 > {code} > It'd be better if we could tell the user to look for non-Kafka data in the > data.dirs and print out the offending directory that caused the problem in > the first place. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
[ https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982092#comment-15982092 ] huxi edited comment on KAFKA-5118 at 4/25/17 1:12 AM: -- I remember 0.10.2.0 checks whether the directory name contains the dash before substring-ing it. There is another possibility that your non-Kafka directory ends with `-delete`. Right? was (Author: huxi_2b): I remember 0.10.2.0 checks whether the directory name contains the dash before substring-ing it. This is another possibility that your non-Kafka directory ends with `-delete`. Right? > Improve message for Kafka failed startup with non-Kafka data in data.dirs > - > > Key: KAFKA-5118 > URL: https://issues.apache.org/jira/browse/KAFKA-5118 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.2.0 >Reporter: Dustin Cote >Priority: Minor > > Today, if you try to startup a broker with some non-Kafka data in the > data.dirs you end up with a cryptic message: > {code} > [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads > during logs loading: java.lang.StringIndexOutOfBoundsException: String index > out of range: -1 (kafka.log.LogManager) > [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > java.lang.StringIndexOutOfBoundsException: String index out of range: -1 > {code} > It'd be better if we could tell the user to look for non-Kafka data in the > data.dirs and print out the offending directory that caused the problem in > the first place. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs
[ https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982092#comment-15982092 ] huxi commented on KAFKA-5118: - I remember 0.10.2.0 checks whether the directory name contains the dash before substring-ing it. This is another possibility that your non-Kafka directory ends with `-delete`. Right? > Improve message for Kafka failed startup with non-Kafka data in data.dirs > - > > Key: KAFKA-5118 > URL: https://issues.apache.org/jira/browse/KAFKA-5118 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.2.0 >Reporter: Dustin Cote >Priority: Minor > > Today, if you try to startup a broker with some non-Kafka data in the > data.dirs you end up with a cryptic message: > {code} > [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads > during logs loading: java.lang.StringIndexOutOfBoundsException: String index > out of range: -1 (kafka.log.LogManager) > [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > java.lang.StringIndexOutOfBoundsException: String index out of range: -1 > {code} > It'd be better if we could tell the user to look for non-Kafka data in the > data.dirs and print out the offending directory that caused the problem in > the first place. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
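The `String index out of range: -1` error is consistent with splitting a log-directory name on a dash that is not there: Kafka log dirs are named `<topic>-<partition>`, so a foreign directory with no dash makes `lastIndexOf('-')` return -1. An illustrative sketch (hypothetical helpers, not the actual LogManager code):

```java
// Illustration of the crash and the friendlier alternative.
public class DirNameParse {
    // Naive: assumes every directory is "<topic>-<partition>". For a
    // non-Kafka dir like "lost+found", lastIndexOf('-') is -1 and
    // substring(0, -1) throws StringIndexOutOfBoundsException.
    static String topicFromDirNaive(String dirName) {
        return dirName.substring(0, dirName.lastIndexOf('-'));
    }

    // Checked: detect the malformed name and name the offending directory,
    // which is exactly the improved message the ticket asks for.
    static String topicFromDirChecked(String dirName) {
        int dash = dirName.lastIndexOf('-');
        if (dash < 0)
            throw new IllegalArgumentException(
                "Found a directory that is not a Kafka log directory: " + dirName);
        return dirName.substring(0, dash);
    }
}
```

The checked variant turns a cryptic `StringIndexOutOfBoundsException` into an actionable error naming the bad directory.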
[jira] [Commented] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980835#comment-15980835 ] huxi commented on KAFKA-4295: - [~ijuma] Are we going to continue to fix this issue, given that KIP-109 is planned to be finished in 0.11? > kafka-console-consumer.sh does not delete the temporary group in zookeeper > -- > > Key: KAFKA-4295 > URL: https://issues.apache.org/jira/browse/KAFKA-4295 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1 >Reporter: Sswater Shi >Assignee: huxi >Priority: Minor > > I'm not sure whether it is a bug or you guys designed it this way. > Since 0.9.x.x, kafka-console-consumer.sh will not delete the group > information in zookeeper/consumers on exit when run without "--new-consumer". > There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if > kafka-console-consumer.sh runs a lot of times. > In 0.8.x.x, kafka-console-consumer.sh can be followed by an argument > "group". If not specified, kafka-console-consumer.sh will create a > temporary group name like 'console-consumer-'. If the group name is > specified by "group", the information in zookeeper/consumers will be kept > on exit. If the group name is a temporary one, the information in > zookeeper will be deleted when kafka-console-consumer.sh is quit with > Ctrl+C. Why was this changed in 0.9.x.x? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4592) Kafka Producer Metrics Invalid Value
[ https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980818#comment-15980818 ] huxi commented on KAFKA-4592: - [~ijuma] What's the status of this JIRA? You said you needed to discuss with Jun whether 0 is a good initial value for metrics. > Kafka Producer Metrics Invalid Value > > > Key: KAFKA-4592 > URL: https://issues.apache.org/jira/browse/KAFKA-4592 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 0.10.1.1 >Reporter: AJ Jwair >Assignee: huxi > > Producer metrics > Metric name: record-size-max > When no records are produced during the monitoring window, > record-size-max has an invalid value of -9.223372036854776E16 > Please notice that the value is not a very small number close to zero bytes; > it is negative 90 quadrillion bytes. > The same behavior was observed in: records-lag-max -- This message was sent by Atlassian JIRA (v6.3.15#6346)
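The huge negative reading is consistent with a windowed "max" statistic that reports its initial sentinel when no samples were recorded in the window. A minimal sketch of that pattern (not Kafka's actual metrics code; whether 0, NaN, or a sentinel is the right empty-window value is exactly the open question in the comment above):

```java
// Minimal "max" statistic: with no samples recorded it reports its sentinel,
// which is how a metric like record-size-max can surface a large negative
// number instead of something sensible during idle windows.
public class MaxStat {
    private double max = -Double.MAX_VALUE; // sentinel meaning "nothing recorded"

    void record(double value) {
        max = Math.max(max, value);
    }

    double measure() {
        return max; // leaks the sentinel when no record() calls happened
    }
}
```

A fix would special-case the empty window in `measure()` (returning NaN or 0) rather than exposing the sentinel.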
[jira] [Assigned] (KAFKA-5091) ReassignPartitionsCommand should protect against empty replica list assignment
[ https://issues.apache.org/jira/browse/KAFKA-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-5091: --- Assignee: huxi > ReassignPartitionsCommand should protect against empty replica list > assignment > --- > > Key: KAFKA-5091 > URL: https://issues.apache.org/jira/browse/KAFKA-5091 > Project: Kafka > Issue Type: Improvement > Components: tools >Reporter: Ryan P >Assignee: huxi > > Currently it is possible to lower a topic's replication factor to 0 through > the use of the kafka-reassign-partitions command. > i.e. > cat increase-replication-factor.json > {"version":1, > "partitions":[{"topic":"foo","partition":0,"replicas":[]}]} > kafka-reassign-partitions --zookeeper localhost:2181 --reassignment-json-file > increase-replication-factor.json --execute > Topic:test PartitionCount:1 ReplicationFactor:0 Configs: > Topic: foo Partition: 0 Leader: -1 Replicas: Isr: > I for one can't think of a reason why this is something someone would do > intentionally. That said, I think it's worth validating that at least 1 > replica remains within the replica list prior to executing the partition > reassignment. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
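The suggested validation is a single pass over the parsed assignment. A hypothetical sketch (assumes the reassignment JSON has already been parsed into a map from partition to replica list; not the actual tool code):

```java
import java.util.List;
import java.util.Map;

// Guard asked for in the ticket: refuse any assignment that would leave a
// partition with zero replicas (i.e. replication factor 0).
public class AssignmentValidation {
    static void requireNonEmptyReplicas(Map<String, List<Integer>> assignment) {
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet())
            if (e.getValue().isEmpty())
                throw new IllegalArgumentException(
                    "Empty replica list for partition " + e.getKey()
                        + "; every partition needs at least one replica");
    }
}
```

Run before `--execute`, this turns the silent RF=0 outcome shown above into an upfront error.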
[jira] [Assigned] (KAFKA-5104) DumpLogSegments should not open index files with `rw`
[ https://issues.apache.org/jira/browse/KAFKA-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-5104: --- Assignee: huxi > DumpLogSegments should not open index files with `rw` > - > > Key: KAFKA-5104 > URL: https://issues.apache.org/jira/browse/KAFKA-5104 > Project: Kafka > Issue Type: Improvement > Components: log >Affects Versions: 0.10.2.0 > Environment: Kafka is run as root >Reporter: Yeva Byzek >Assignee: huxi >Priority: Minor > > kafka.tools.DumpLogSegments requires sudo for index files but not for log > files. It seems inconsistent that `w` access is required for the index > files just to dump the output. > {noformat} > $ sudo kafka-run-class kafka.tools.DumpLogSegments --files > .index > Dumping .index > offset: 0 position: 0 > {noformat} > {noformat} > $ kafka-run-class kafka.tools.DumpLogSegments --files > .index > Dumping .index > Exception in thread "main" java.io.FileNotFoundException: > .index (Permission denied) > at java.io.RandomAccessFile.open0(Native Method) > at java.io.RandomAccessFile.open(RandomAccessFile.java:316) > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) > at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:50) > at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52) > at > kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpIndex(DumpLogSegments.scala:137) > at > kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:100) > at > kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:93) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:93) > at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
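The fix idea is simply to open the index read-only when only dumping: `RandomAccessFile` mode `"rw"` needs write permission on the file, while `"r"` does not. A sketch (hypothetical helper, not DumpLogSegments itself):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;

// Open a file for dumping: mode "r" avoids the Permission denied failure
// seen above when the process lacks write access to the index file.
public class ReadOnlyOpen {
    static RandomAccessFile openForDump(File f, boolean writable) throws FileNotFoundException {
        return new RandomAccessFile(f, writable ? "rw" : "r");
    }
}
```

A read-only tool should always pass `writable = false`; only code paths that actually resize or mutate the index need `"rw"`.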
[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak
[ https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972314#comment-15972314 ] huxi commented on KAFKA-5007: - A complete stack trace would be helpful to diagnose which action throws this exception. > Kafka Replica Fetcher Thread- Resource Leak > --- > > Key: KAFKA-5007 > URL: https://issues.apache.org/jira/browse/KAFKA-5007 > Project: Kafka > Issue Type: Bug > Components: core, network >Affects Versions: 0.10.0.0, 0.10.1.1, 0.10.2.0 > Environment: Centos 7 > Java 8 >Reporter: Joseph Aliase >Priority: Critical > Labels: reliability > > Kafka runs out of open file descriptors when the system network interface is > down. > Issue description: > We have a Kafka cluster of 5 nodes running on version 0.10.1.1. The open file > descriptor limit for the account running Kafka is set to 10. > During an upgrade, the network interface went down. The outage continued for 12 hours; > eventually all the brokers crashed with a java.io.IOException: Too many open > files error. > We repeated the test in a lower environment and observed that the open socket > count keeps increasing while the NIC is down. > We have around 13 topics with a max partition count of 120, and the number of replica > fetcher threads is set to 8. > Using an internal monitoring tool, we observed that the open socket descriptor > count for the broker pid continued to increase while the NIC was down, leading to the > open file descriptor error. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer
[ https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972081#comment-15972081 ] huxi commented on KAFKA-5010: - [~ijuma] Is it due to the fact that Kafka chooses 1024 as the minimum block size for Snappy if the total size of the log message set is smaller than 1024? See o.a.k.common.record.AbstractRecords#estimatedSize: return compressionType == CompressionType.NONE ? size : Math.min(Math.max(size / 2, 1024), 1 << 16); > Log cleaner crashed with BufferOverflowException when writing to the > writeBuffer > > > Key: KAFKA-5010 > URL: https://issues.apache.org/jira/browse/KAFKA-5010 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.2.0 >Reporter: Shuai Lin >Priority: Critical > Labels: reliability > Fix For: 0.11.0.0 > > > After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with > BufferOverflowException when writing the filtered records into the > writeBuffer: > {code} > [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting > (kafka.log.LogCleaner) > [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log > app-topic-20170317-20. (kafka.log.LogCleaner) > [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for > app-topic-20170317-20... (kafka.log.LogCleaner) > [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log > app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). > (kafka.log.LogCleaner) > [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log > app-topic-20170317-20 complete. (kafka.log.LogCleaner) > [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 > (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior > to Thu Mar 23 10:18:02 GMT 2017)...
(kafka.log.LogCleaner) > [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log > app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into > 0, retaining deletes. (kafka.log.LogCleaner) > [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to > (kafka.log.LogCleaner) > java.nio.BufferOverflowException > at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206) > at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98) > at > org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158) > at > org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111) > at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468) > at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405) > at > kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401) > at scala.collection.immutable.List.foreach(List.scala:378) > at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401) > at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363) > at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362) > at scala.collection.immutable.List.foreach(List.scala:378) > at kafka.log.Cleaner.clean(LogCleaner.scala:362) > at > kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241) > at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped > (kafka.log.LogCleaner) > {code} > I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M > to 128M, all with no luck: The log cleaner thread crashed immediately after > the broker got restarted. But setting it to 256MB fixed the problem! 
> Here are the settings for the cluster: > {code} > - log.message.format.version = 0.9.0.0 (we use 0.9 format because have old > consumers) > - log.cleaner.enable = 'true' > - log.cleaner.min.cleanable.ratio = '0.1' > - log.cleaner.threads = '1' > - log.cleaner.io.buffer.load.factor = '0.98' > - log.roll.hours = '24' > - log.cleaner.dedupe.buffer.size = 2GB > - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB > for this topic) > - message.max.bytes = 10MB > {code} > Given that the size of readBuffer and writeBuffer are exactly the same (half > of log.cleaner.io.buffer.size), why would the cleaner throw a > BufferOverflowException when writing the filtered records into the > writeBuffer? IIUC that should never happen because the size of the filtered > records should be no greater than the size of the readBuffer, thus no greater > than the size of the writeBuffer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
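For reference, the size estimate quoted in the comment above can be isolated as a standalone method. Note the 1024-byte floor and 64 KiB ceiling for compressed batches, which is the behavior huxi is asking about:

```java
// The estimate from o.a.k.common.record.AbstractRecords#estimatedSize,
// extracted for illustration: uncompressed batches use their real size,
// compressed batches are guessed at size/2 clamped to [1024, 65536] bytes.
public class SizeEstimate {
    static int estimatedSize(boolean compressed, int size) {
        return !compressed ? size : Math.min(Math.max(size / 2, 1024), 1 << 16);
    }
}
```

So a tiny compressed message set is always estimated at 1024 bytes, and a very large one is capped at 64 KiB regardless of its actual compressed size.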
[jira] [Assigned] (KAFKA-5068) Optionally print out metrics after running the perf tests
[ https://issues.apache.org/jira/browse/KAFKA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-5068: --- Assignee: huxi > Optionally print out metrics after running the perf tests > - > > Key: KAFKA-5068 > URL: https://issues.apache.org/jira/browse/KAFKA-5068 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 0.10.2.0 >Reporter: Jun Rao >Assignee: huxi > Labels: newbie > > Often, we run ProducerPerformance/ConsumerPerformance tests to investigate > performance issues. It's useful for the tool to print out the metrics in the > producer/consumer at the end of the tests. We can make this optional to > preserve the current behavior by default. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5068) Optionally print out metrics after running the perf tests
[ https://issues.apache.org/jira/browse/KAFKA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968805#comment-15968805 ] huxi commented on KAFKA-5068: - It seems ProducerPerformance already prints the final result for the test. > Optionally print out metrics after running the perf tests > - > > Key: KAFKA-5068 > URL: https://issues.apache.org/jira/browse/KAFKA-5068 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 0.10.2.0 >Reporter: Jun Rao > Labels: newbie > > Often, we run ProducerPerformance/ConsumerPerformance tests to investigate > performance issues. It's useful for the tool to print out the metrics in the > producer/consumer at the end of the tests. We can make this optional to > preserve the current behavior by default. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-5062) Kafka brokers can accept malformed requests which allocate gigabytes of memory
[ https://issues.apache.org/jira/browse/KAFKA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967026#comment-15967026 ] huxi commented on KAFKA-5062: - I'm curious how this problem can be reproduced. > Kafka brokers can accept malformed requests which allocate gigabytes of memory > -- > > Key: KAFKA-5062 > URL: https://issues.apache.org/jira/browse/KAFKA-5062 > Project: Kafka > Issue Type: Bug >Reporter: Apurva Mehta > > In some circumstances, it is possible to cause a Kafka broker to allocate > massive amounts of memory by writing malformed bytes to the broker's port. > In investigating an issue, we saw byte arrays on the Kafka heap up to 1.8 > gigabytes, the first 360 bytes of which were non-Kafka requests -- an > application was writing the wrong data to Kafka, causing the broker to > interpret the request size as 1.8GB and then allocate that amount. Apart from > the first 360 bytes, the rest of the 1.8GB byte array was null. > We have socket.request.max.bytes set at 100MB to protect against this kind > of thing, but somehow that limit is not always respected. We need to > investigate why and fix it. > cc [~rnpridgeon], [~ijuma], [~gwenshap], [~cmccabe] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
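One way such an allocation can happen: the broker reads a 4-byte size prefix before anything else, so arbitrary non-Kafka bytes decode to an arbitrary "request size". An illustration of the decoding (the HTTP bytes below are a hypothetical example of wrong data hitting the Kafka port, not taken from the ticket):

```java
import java.nio.ByteBuffer;

// The first 4 bytes of whatever arrives on the socket are interpreted as a
// big-endian request size; junk bytes therefore yield junk (often huge) sizes.
public class SizePrefix {
    static int requestSizeFromFirstBytes(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }
}
```

For example, the ASCII bytes of `"GET "` decode to 0x47455420, roughly 1.2 billion, which is why a size limit like socket.request.max.bytes must be enforced before allocating.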
[jira] [Commented] (KAFKA-5048) kafka brokers hang when one of broker node killed in the cluster and remaining broker nodes are not consuming the data.
[ https://issues.apache.org/jira/browse/KAFKA-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965622#comment-15965622 ] huxi commented on KAFKA-5048: - These warnings should be transient as long as the replication factor is set properly. What's the replication factor for the topic? > kafka brokers hang when one of broker node killed in the cluster and > remaining broker nodes are not consuming the data. > --- > > Key: KAFKA-5048 > URL: https://issues.apache.org/jira/browse/KAFKA-5048 > Project: Kafka > Issue Type: Bug > Components: consumer, offset manager >Affects Versions: 0.10.1.1 >Reporter: Srinivas Yarra > > Kafka brokers hang when one of the broker nodes is killed in the cluster, and the > remaining broker nodes stop consuming data. The message below appears on the > console consumer screen: > WARN Auto offset commit failed for group console-consumer-26249: Offset > commit failed with a retriable exception. You should retry committing > offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > If the stopped broker is brought back, all brokers start consuming > data again. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4987) Topic creation allows invalid config values on running brokers
[ https://issues.apache.org/jira/browse/KAFKA-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950237#comment-15950237 ] huxi commented on KAFKA-4987: - Did you use a higher-version client talking to a lower-version broker? > Topic creation allows invalid config values on running brokers > -- > > Key: KAFKA-4987 > URL: https://issues.apache.org/jira/browse/KAFKA-4987 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1, 0.10.1.0 >Reporter: dan norwood > > we use KIP-4 capabilities to make a `CreateTopicsRequest` for our topics. one > of the configs we use is `cleanup.policy=compact, delete`. this was > inadvertently run against a cluster that does not support that policy. the > result was that the topic was created, however on subsequent broker bounce > the broker fails to start up > {code} > [2017-03-23 00:00:44,837] FATAL Fatal error during KafkaServer startup. > Prepare to shutdown (kafka.server.KafkaServer) > org.apache.kafka.common.config.ConfigException: Invalid value compact,delete > for configuration cleanup.policy: String must be one of: compact, delete > at > org.apache.kafka.common.config.ConfigDef$ValidString.ensureValid(ConfigDef.java:827) > at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:427) > at > org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:55) > at kafka.log.LogConfig.<init>(LogConfig.scala:56) > at kafka.log.LogConfig$.fromProps(LogConfig.scala:192) > at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:598) > at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:597) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at > 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at > scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:597) > at kafka.server.KafkaServer.startup(KafkaServer.scala:183) > at > kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37) > at kafka.Kafka$.main(Kafka.scala:67) > at kafka.Kafka.main(Kafka.scala) > [2017-03-23 00:00:44,839] INFO shutting down (kafka.server.KafkaServer) > [2017-03-23 00:00:44,844] INFO shut down completed (kafka.server.KafkaServer) > {code} > I believe that the topic creation should fail when given an invalid config -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4952) ZkUtils get topics by consumer group not correct
[ https://issues.apache.org/jira/browse/KAFKA-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944500#comment-15944500 ] huxi commented on KAFKA-4952: - [~zxylvlp] Every subscribed topic's name is created as a child znode under `owners` while the consumer group is running, so I think it is feasible to retrieve all topics by fetching the children of that directory. > ZkUtils get topics by consumer group not correct > > > Key: KAFKA-4952 > URL: https://issues.apache.org/jira/browse/KAFKA-4952 > Project: Kafka > Issue Type: Bug > Components: zkclient >Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, > 0.10.1.1, 0.10.2.0 > Environment: all >Reporter: ZhaoXingyu > Labels: easyfix > Attachments: WechatIMG1.jpeg > > Original Estimate: 24h > Remaining Estimate: 24h > > ZkUtils.getTopicsByConsumerGroup should get topics by the consumer group's offsets > dir, but the code gets topics by owners now. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4949) Calling kaka-consumer-group.sh to get the consumer offset throws OOM with heap space error
[ https://issues.apache.org/jira/browse/KAFKA-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942584#comment-15942584 ] huxi edited comment on KAFKA-4949 at 3/27/17 3:30 AM: -- Is it a duplicate of [kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]? was (Author: huxi_2b): Is is a duplicate of [kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]? > Calling kaka-consumer-group.sh to get the consumer offset throws OOM with > heap space error > -- > > Key: KAFKA-4949 > URL: https://issues.apache.org/jira/browse/KAFKA-4949 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.10.1.0 >Reporter: T Rao > > Command --> > bin/kafka-consumer-groups.sh --bootstrap-server > broker1:9092,broker2:9092,broker3:9092 --describe --group testgroups > Error > - > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:180) > at > kafka.admin.AdminClient.kafka$admin$AdminClient$$send(AdminClient.scala:49) > at > kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:61) > at > 
kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:58) > at scala.collection.immutable.List.foreach(List.scala:381) > at kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:58) > at kafka.admin.AdminClient.findCoordinator(AdminClient.scala:72) > at kafka.admin.AdminClient.describeGroup(AdminClient.scala:125) > at kafka.admin.AdminClient.describeConsumerGroup(AdminClient.scala:147) > at > kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:308) > at > kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89) > at > kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describe(ConsumerGroupCommand.scala:296) > at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68) > at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4949) Calling kaka-consumer-group.sh to get the consumer offset throws OOM with heap space error
[ https://issues.apache.org/jira/browse/KAFKA-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942584#comment-15942584 ] huxi commented on KAFKA-4949: - Is is a duplicate of [kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]? > Calling kaka-consumer-group.sh to get the consumer offset throws OOM with > heap space error > -- > > Key: KAFKA-4949 > URL: https://issues.apache.org/jira/browse/KAFKA-4949 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.10.1.0 >Reporter: T Rao > > Command --> > bin/kafka-consumer-groups.sh --bootstrap-server > broker1:9092,broker2:9092,broker3:9092 --describe --group testgroups > Error > - > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343) > at org.apache.kafka.common.network.Selector.poll(Selector.java:291) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:180) > at > kafka.admin.AdminClient.kafka$admin$AdminClient$$send(AdminClient.scala:49) > at > kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:61) > at > kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:58) > at scala.collection.immutable.List.foreach(List.scala:381) > at 
kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:58) > at kafka.admin.AdminClient.findCoordinator(AdminClient.scala:72) > at kafka.admin.AdminClient.describeGroup(AdminClient.scala:125) > at kafka.admin.AdminClient.describeConsumerGroup(AdminClient.scala:147) > at > kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:308) > at > kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89) > at > kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describe(ConsumerGroupCommand.scala:296) > at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68) > at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
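If this is indeed a duplicate of KAFKA-4090, the usual trigger is a protocol mismatch: the client reads the 4-byte size prefix of what it assumes is a plaintext Kafka response, but the bytes actually belong to something else (commonly a TLS handshake record), so the decoded "size" is enormous and the subsequent `ByteBuffer.allocate` exhausts the heap. A toy illustration of that decoding (the byte values are illustrative, not taken from this report):

```java
import java.nio.ByteBuffer;

public class SizePrefix {
    // Reads the 4-byte big-endian size prefix the Kafka protocol puts before every response.
    public static int readSize(byte[] firstFourBytes) {
        return ByteBuffer.wrap(firstFourBytes).getInt();
    }

    public static void main(String[] args) {
        // A TLS handshake record starts with 0x16 0x03 ...; read as a size prefix it is huge.
        byte[] tlsHello = {0x16, 0x03, 0x01, 0x00};
        // ~369 MB allocation attempt from four handshake bytes
        System.out.println(readSize(tlsHello) + " bytes requested");
    }
}
```

This is why a plaintext admin client pointed at an SSL listener can die with `OutOfMemoryError` inside `NetworkReceive.readFromReadableChannel` rather than with a clean protocol error.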
[jira] [Commented] (KAFKA-4908) consumer.properties logging warnings
[ https://issues.apache.org/jira/browse/KAFKA-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932099#comment-15932099 ] huxi commented on KAFKA-4908: - `zookeeper.connection.timeout.ms` and `zookeeper.connect` are old-consumer configs that should not be specified when running the console consumer with the new consumer (which is also the default). If you do, Kafka warns that it does not recognize them as valid configs, which is expected behavior. > consumer.properties logging warnings > > > Key: KAFKA-4908 > URL: https://issues.apache.org/jira/browse/KAFKA-4908 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 >Reporter: Seweryn Habdank-Wojewodzki >Priority: Minor > > default consumer.properties at startup of the console consumer delivered > with Kafka package are logging warnings: > [2017-03-15 16:36:57,439] WARN The configuration > 'zookeeper.connection.timeout.ms' was supplied but isn't a known config. > (org.apache.kafka.clients.consumer.ConsumerConfig) > [2017-03-15 16:36:57,455] WARN The configuration 'zookeeper.connect' was > supplied but isn't a known config. > (org.apache.kafka.clients.consumer.ConsumerConfig) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
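For reference, a minimal `consumer.properties` for the new consumer uses broker-based settings only; the ZooKeeper keys that trigger the warnings can simply be dropped (host names below are placeholders):

```properties
# The new consumer talks to brokers directly; no ZooKeeper settings are needed.
bootstrap.servers=localhost:9092
group.id=test-consumer-group
# Old-consumer keys such as the following are unknown to the new consumer
# and only produce "isn't a known config" warnings -- remove them:
# zookeeper.connect=localhost:2181
# zookeeper.connection.timeout.ms=6000
```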
[jira] [Commented] (KAFKA-4879) KafkaConsumer.position may hang forever when deleting a topic
[ https://issues.apache.org/jira/browse/KAFKA-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904322#comment-15904322 ] huxi commented on KAFKA-4879: - We could have KafkaConsumer#updateFetchPositions, Fetcher#updateFetchPositions, and Fetcher#resetOffset take a timeout. An alternative is to return immediately once `UnknownTopicOrPartitionException` is caught explicitly. > KafkaConsumer.position may hang forever when deleting a topic > - > > Key: KAFKA-4879 > URL: https://issues.apache.org/jira/browse/KAFKA-4879 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.10.2.0 >Reporter: Shixiong Zhu > > KafkaConsumer.position may hang forever when deleting a topic. The problem is > this line > https://github.com/apache/kafka/blob/022bf129518e33e165f9ceefc4ab9e622952d3bd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L374 > The timeout is "Long.MAX_VALUE", and it will just retry forever for > UnknownTopicOrPartitionException. > Here is a reproducer > {code} > import org.apache.kafka.clients.consumer.KafkaConsumer; > import org.apache.kafka.common.TopicPartition; > import org.apache.kafka.common.serialization.StringDeserializer; > import java.util.Collections; > import java.util.Properties; > import java.util.Set; > public class KafkaReproducer { > public static void main(String[] args) { > // Make sure "delete.topic.enable" is set to true. > // Please create the topic test with "3" partitions manually. > // The issue is gone when there is only one partition. 
> String topic = "test"; > Properties props = new Properties(); > props.put("bootstrap.servers", "localhost:9092"); > props.put("group.id", "testgroup"); > props.put("value.deserializer", StringDeserializer.class.getName()); > props.put("key.deserializer", StringDeserializer.class.getName()); > props.put("enable.auto.commit", "false"); > KafkaConsumer<String, String> kc = new KafkaConsumer<>(props); > kc.subscribe(Collections.singletonList(topic)); > kc.poll(0); > Set<TopicPartition> partitions = kc.assignment(); > System.out.println("partitions: " + partitions); > kc.pause(partitions); > kc.seekToEnd(partitions); > System.out.println("please delete the topic in 30 seconds"); > try { > // Sleep 30 seconds to give us enough time to delete the topic. > Thread.sleep(30000); > } catch (InterruptedException e) { > e.printStackTrace(); > } > System.out.println("sleep end"); > for (TopicPartition p : partitions) { > System.out.println(p + " offset: " + kc.position(p)); > } > System.out.println("cannot reach here"); > kc.close(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
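Until the consumer itself supports a timeout here, a caller-side workaround is to run the potentially blocking call under an executor and give up after a deadline. A JDK-only sketch (the blocking task below merely simulates a stuck `kc.position(p)` call; with a real consumer you would submit that call instead):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutGuard {
    // Runs task on a helper thread and gives up after timeoutMs, returning null on timeout.
    public static <T> T callWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the stuck call
                return null;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a call like kc.position(p) that may block indefinitely.
        Callable<Long> blockingCall = () -> { Thread.sleep(60_000); return 42L; };
        Long position = callWithTimeout(blockingCall, 200);
        System.out.println(position == null ? "timed out" : "position: " + position);
    }
}
```

Note that the KafkaConsumer is not thread-safe, so in a real application the guarded call and all other uses of the consumer must stay on the same single helper thread.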
[jira] [Assigned] (KAFKA-4866) Kafka console consumer property is ignored
[ https://issues.apache.org/jira/browse/KAFKA-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4866: --- Assignee: huxi > Kafka console consumer property is ignored > -- > > Key: KAFKA-4866 > URL: https://issues.apache.org/jira/browse/KAFKA-4866 > Project: Kafka > Issue Type: Bug > Components: core, tools >Affects Versions: 0.10.2.0 > Environment: Java 8, Mac >Reporter: Frank Lyaruu >Assignee: huxi >Priority: Trivial > > I'd like to read a topic using the console consumer, which prints the keys > but not the values: > kafka-console-consumer --bootstrap-server someserver:9092 --from-beginning > --property print.key=true --property print.value=false --topic some_topic > the print.value property seems to be completely ignored (it seems missing in > the source), but it is mentioned in the quickstart. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898790#comment-15898790 ] huxi commented on KAFKA-4834: - Did you delete the ZooKeeper nodes manually before issuing this topic deletion? > Kafka cannot delete topic with ReplicaStateMachine went wrong > - > > Key: KAFKA-4834 > URL: https://issues.apache.org/jira/browse/KAFKA-4834 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.1 >Reporter: Dan > Labels: reliability > > It happened several times that some topics can not be deleted in our > production environment. By analyzing the log, we found ReplicaStateMachine > went wrong. Here are the error messages: > In state-change.log: > ERROR Controller 2 epoch 201 initiated state change of replica 1 for > partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted > failed (state.change.logger) > java.lang.AssertionError: assertion failed: Replica > [Topic=test_create_topic1,Partition=1,Replica=1] should be in the > OfflineReplica states before moving to ReplicaDeletionStarted state. 
Instead > it is in OnlineReplica state > at scala.Predef$.assert(Predef.scala:179) > at > kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309) > at > kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190) > at > kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114) > at > kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114) > at > scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114) > at > kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344) > at > kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:109) > at > kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334) > at > kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367) > at > kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313) > at > kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312) > at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) > at > kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431) > at > 
kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403) > at > scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > In controller.log: > INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip > sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch) > There may exist two controllers in the cluster because creating a new topic > may trigger two machines to change the state of same partition,
[jira] [Commented] (KAFKA-4844) kafka is holding open file descriptors
[ https://issues.apache.org/jira/browse/KAFKA-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896815#comment-15896815 ] huxi commented on KAFKA-4844: - What is your OS version? This seems to be known behavior before NFSv4. > kafka is holding open file descriptors > -- > > Key: KAFKA-4844 > URL: https://issues.apache.org/jira/browse/KAFKA-4844 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: chao >Priority: Critical > > We found strange issue on Kafka 0.9.0.1 , kafka is holding open file > descriptors , and not allowing disk space to be reclaimed > my question: > 1. what does file (nfsX) mean ??? > 2. why kafka is holding file ?? > $ sudo lsof /nas/kafka_logs/kafka/Order-6/.nfs04550ffcbd61 > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > java 97465 kafka mem REG 0,25 10485760 72683516 > /nas/kafka_logs/kafka/Order-6/.nfs04550ffcbd61 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
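The `.nfsXXXX` files are the NFS client's "silly rename": POSIX semantics let a process keep reading a file after it is unlinked, and NFS implements that by renaming the file instead of removing it until the last descriptor closes, which is why disk space is not reclaimed while the broker still holds the segment open. The underlying unlink-while-open behavior can be demonstrated on any local POSIX filesystem:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeletedButOpen {
    // Returns what can still be read from a file after its directory entry is deleted.
    public static String readAfterDelete() throws IOException {
        Path p = Files.createTempFile("segment", ".log");
        Files.write(p, "data".getBytes());
        try (InputStream in = Files.newInputStream(p)) {
            Files.delete(p);                      // directory entry is gone...
            return new String(in.readAllBytes()); // ...but the open descriptor still sees the data
        }
        // only once the stream closes can the filesystem reclaim the blocks
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete());
    }
}
```

On NFS the same sequence leaves a `.nfsXXXX` placeholder behind until the holder (here, the Kafka process) closes its descriptor or exits.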
[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong
[ https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893619#comment-15893619 ] huxi commented on KAFKA-4834: - How did you delete the topic? It seems the partitions' state is still Online, which is a little weird since the delete thread should first put them into Offline. > Kafka cannot delete topic with ReplicaStateMachine went wrong > - > > Key: KAFKA-4834 > URL: https://issues.apache.org/jira/browse/KAFKA-4834 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.1 >Reporter: Dan > > It happened several times that some topics can not be deleted in our > production environment. By analyzing the log, we found ReplicaStateMachine > went wrong. Here are the error messages: > In state-change.log: > ERROR Controller 2 epoch 201 initiated state change of replica 1 for > partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted > failed (state.change.logger) > java.lang.AssertionError: assertion failed: Replica > [Topic=test_create_topic1,Partition=1,Replica=1] should be in the > OfflineReplica states before moving to ReplicaDeletionStarted state. 
Instead > it is in OnlineReplica state > at scala.Predef$.assert(Predef.scala:179) > at > kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309) > at > kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190) > at > kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114) > at > kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114) > at > scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114) > at > kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344) > at > kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:109) > at > kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334) > at > kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367) > at > kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313) > at > kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312) > at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) > at > kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431) > at > 
kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403) > at > scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234) > at > kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > In controller.log: > INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip > sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch) > There may exist two controllers in the cluster because creating a new topic > may trigger two machines to change
[jira] [Commented] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.
[ https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889871#comment-15889871 ] huxi commented on KAFKA-4822: - For KafkaProducer, all writes are asynchronous by default. The snippet below implements a synchronous write: {code} Future<RecordMetadata> future = producer.send(record); RecordMetadata metadata = future.get(); {code} > Kafka producer implementation without additional threads, similar to sync > producer of 0.8. > -- > > Key: KAFKA-4822 > URL: https://issues.apache.org/jira/browse/KAFKA-4822 > Project: Kafka > Issue Type: New Feature > Components: producer >Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1 >Reporter: Giri >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
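The snippet above turns an asynchronous send into a synchronous one by blocking on the returned future. Since exercising the real producer requires a running broker, here is a JDK-only model of the same sync-over-async pattern; the executor-backed `sendAsync` below is only a stand-in for `producer.send`, not Kafka code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncOverAsync {
    // Stand-in for producer.send(record): submits work and returns a Future immediately.
    static Future<String> sendAsync(ExecutorService executor, String payload) {
        return executor.submit(() -> {
            Thread.sleep(50);          // simulated network round-trip to the broker
            return "acked:" + payload; // stand-in for RecordMetadata
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = sendAsync(executor, "record-1"); // asynchronous: returns at once
        String ack = future.get(); // blocking on get() makes the overall call synchronous
        System.out.println(ack);
        executor.shutdown();
    }
}
```

In production code, prefer the timed `future.get(timeout, unit)` overload so a dead broker cannot block the caller forever.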
[jira] [Assigned] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric
[ https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4811: --- Assignee: huxi > ReplicaFetchThread may fail to create due to existing metric > > > Key: KAFKA-4811 > URL: https://issues.apache.org/jira/browse/KAFKA-4811 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 0.10.2.0 >Reporter: Jun Rao >Assignee: huxi > Labels: newbie > > Someone reported the following error. > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, group=replica-fetcher-metrics, > description=Connections closed per second in the window., tags={broker-id=1, > fetcher-id=0}]' already exists, can't register another one. > at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.(Selector.java:680) > at org.apache.kafka.common.network.Selector.(Selector.java:140) > at > kafka.server.ReplicaFetcherThread.(ReplicaFetcherThread.scala:86) > at > kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35) > at > kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83) > at > kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:109) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at > kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) > at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882) > at > 
kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700) > at > kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148) > at kafka.server.KafkaApis.handle(KafkaApis.scala:84) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric
[ https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887308#comment-15887308 ] huxi commented on KAFKA-4811: - [~junrao] what about the port change? Do we also need to consider this situation? > ReplicaFetchThread may fail to create due to existing metric > > > Key: KAFKA-4811 > URL: https://issues.apache.org/jira/browse/KAFKA-4811 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 0.10.2.0 >Reporter: Jun Rao > Labels: newbie > > Someone reported the following error. > java.lang.IllegalArgumentException: A metric named 'MetricName > [name=connection-close-rate, group=replica-fetcher-metrics, > description=Connections closed per second in the window., tags={broker-id=1, > fetcher-id=0}]' already exists, can't register another one. > at > org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249) > at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234) > at > org.apache.kafka.common.network.Selector$SelectorMetrics.(Selector.java:680) > at org.apache.kafka.common.network.Selector.(Selector.java:140) > at > kafka.server.ReplicaFetcherThread.(ReplicaFetcherThread.scala:86) > at > kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35) > at > kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83) > at > kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at scala.collection.immutable.Map$Map1.foreach(Map.scala:109) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at > kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) > at 
kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882) > at > kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700) > at > kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148) > at kafka.server.KafkaApis.handle(KafkaApis.scala:84) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
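Regarding the port question: a metric's identity in Kafka is its name, group, and tags, and the `replica-fetcher-metrics` tags here are only {broker-id, fetcher-id}, so a re-created fetcher for the same broker id collides even if the broker came back on a different port. A toy registry reproducing the duplicate-identity rule (the class and method names are illustrative, not Kafka's):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MetricRegistry {
    private final Set<String> identities = new HashSet<>();

    // Mirrors the rule in Kafka's Metrics.registerMetric: a metric's identity is
    // its name plus its tags, and duplicate identities are rejected.
    public void register(String name, Map<String, String> tags) {
        String identity = name + new TreeMap<>(tags); // sorted tags for a stable identity
        if (!identities.add(identity)) {
            throw new IllegalArgumentException(
                "A metric named '" + identity + "' already exists, can't register another one.");
        }
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        Map<String, String> tags = Map.of("broker-id", "1", "fetcher-id", "0");
        registry.register("connection-close-rate", tags);
        try {
            // Same name and tags: a re-created fetcher for broker 1 collides,
            // regardless of which port the broker came back on.
            registry.register("connection-close-rate", tags);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected duplicate");
        }
    }
}
```

Fixes therefore need either to unregister the old fetcher's metrics before creating the new one, or to widen the tags so the identities differ.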
[jira] [Comment Edited] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large
[ https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877641#comment-15877641 ] huxi edited comment on KAFKA-4762 at 2/22/17 7:16 AM: -- [~huadongliu] You saw `NoCompressionCodec` in the output because of the `deep-iteration` option, which forces a deep iteration of the compressed messages. However, it is easy to see that many messages share the same position, which means the producer enabled compression even though it is not explicitly set on the broker side. was (Author: huxi_2b): [~huadongliu] The reason you saw `NoCompressionCodec` in the output is because of the effect of the config `deep-iteration` which enforces a deep interation of the compressed messages. However, it's easy to find out that many messages share a same position which means the producer enables the compression if it's not explicitly set on broker side. > Consumer throwing RecordTooLargeException even when messages are not that > large > --- > > Key: KAFKA-4762 > URL: https://issues.apache.org/jira/browse/KAFKA-4762 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Vipul Singh > > We were just recently hit by a weird error. > Before going in any further, explaining of our service setup. we have a > producer which produces messages not larger than 256 kb of messages( we have > an explicit check about this on the producer side) and on the client side we > have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) > Recently our client started to see this error: > {quote} > org.apache.kafka.common.errors.RecordTooLargeException: There are some > messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is > larger than the fetch size 524288 and hence cannot be ever returned. Increase > the fetch size, or decrease the maximum message size the broker will allow. 
> {quote} > We tried consuming messages with another consumer, without any > max.partition.fetch.bytes limit, and it consumed fine. The messages were > small, and did not seem to be greater than 256 kb > We took a log dump, and the log size looked fine. > {quote} > mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8 > offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: > 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8 > offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: > 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8 > offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: > 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8 > offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: > 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8 > offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: > 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8 > offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: > 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8 > offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: > 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8 > offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: > 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8 > {quote} > Has anyone seen something similar? or any points to troubleshoot this further > Please Note: To overcome this issue, we deployed a new consumer, without this > limit of max.partition.fetch.bytes, and it worked fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large
[ https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877641#comment-15877641 ] huxi commented on KAFKA-4762: - [~huadongliu] You saw `NoCompressionCodec` in the output because of the `deep-iteration` option, which forces a deep iteration of the compressed messages. However, it is easy to see that many messages share the same position, which means the producer enabled compression even though it is not explicitly set on the broker side. > Consumer throwing RecordTooLargeException even when messages are not that > large > --- > > Key: KAFKA-4762 > URL: https://issues.apache.org/jira/browse/KAFKA-4762 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Vipul Singh > > We were just recently hit by a weird error. > Before going in any further, explaining of our service setup. we have a > producer which produces messages not larger than 256 kb of messages( we have > an explicit check about this on the producer side) and on the client side we > have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) > Recently our client started to see this error: > {quote} > org.apache.kafka.common.errors.RecordTooLargeException: There are some > messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is > larger than the fetch size 524288 and hence cannot be ever returned. Increase > the fetch size, or decrease the maximum message size the broker will allow. > {quote} > We tried consuming messages with another consumer, without any > max.partition.fetch.bytes limit, and it consumed fine. The messages were > small, and did not seem to be greater than 256 kb > We took a log dump, and the log size looked fine. 
> {quote} > mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8 > offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: > 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8 > offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: > 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8 > offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: > 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8 > offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: > 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8 > offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: > 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8 > offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: > 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8 > offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: > 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8 > offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: > 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8 > {quote} > Has anyone seen something similar? or any points to troubleshoot this further > Please Note: To overcome this issue, we deployed a new consumer, without this > limit of max.partition.fetch.bytes, and it worked fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
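A quick way to see the effect described above is to group the dump entries by file position: entries sharing a position sit inside one compressed wrapper message, and it is the wrapper, not any single payload, that must fit in `max.partition.fetch.bytes`. A small sketch using the positions and payload sizes from the dump:

```java
import java.util.Map;
import java.util.TreeMap;

public class BatchSize {
    // Sums payload sizes per file position: one compressed wrapper per position.
    public static Map<Long, Integer> batchBytes(long[] positions, int[] payloadSizes) {
        Map<Long, Integer> perBatch = new TreeMap<>();
        for (int i = 0; i < positions.length; i++) {
            perBatch.merge(positions[i], payloadSizes[i], Integer::sum);
        }
        return perBatch;
    }

    public static void main(String[] args) {
        // Positions and payload sizes copied from the log dump in the issue description.
        long[] pos = {191380053, 191380053, 191380053, 191380053,
                      191713371, 191713371, 191713371, 191713371};
        int[] size = {539, 1551, 1307, 1520, 1207, 418, 299, 1571};
        // Payload bytes per compressed batch: {191380053=4917, 191713371=3495}
        System.out.println(batchBytes(pos, size));
    }
}
```

These per-batch sums undercount the wrapper (they exclude keys and record overhead), but they already show why a fetch limit sized for one 256 KB message can still be too small for a whole compressed batch.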
[jira] [Assigned] (KAFKA-4767) KafkaProducer is not joining its IO thread properly
[ https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4767: --- Assignee: huxi > KafkaProducer is not joining its IO thread properly > --- > > Key: KAFKA-4767 > URL: https://issues.apache.org/jira/browse/KAFKA-4767 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.11.0.0 >Reporter: Buğra Gedik >Assignee: huxi >Priority: Minor > > The {{KafkaProducer}} is not properly joining the thread it creates. The code > is like this: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > firstException.compareAndSet(null, t); > log.error("Interrupted while joining ioThread", t); > } > {code} > If the code is interrupted while performing the join, it will end up leaving > the io thread running. The correct way of handling this is as follows: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > // propagate the interrupt > this.ioThread.interrupt(); > try { > this.ioThread.join(); > } catch (InterruptedException e) { > firstException.compareAndSet(null, e); > log.error("Interrupted while joining ioThread", e); > } finally { > // make sure we maintain the interrupted status > Thread.currentThread().interrupt(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
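The proposed handling can be exercised with a plain worker thread standing in for the producer's IO thread; a self-contained sketch (method names are illustrative) showing the interrupt being propagated to the worker and the caller's interrupted status being restored:

```java
public class JoinWithInterrupt {
    // Joins worker for up to timeoutMs; if interrupted, propagates the interrupt
    // to the worker, waits for it to finish, and restores our interrupted status.
    public static void joinPropagatingInterrupt(Thread worker, long timeoutMs) {
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException first) {
            worker.interrupt(); // propagate the interrupt to the worker
            try {
                worker.join(); // wait for the worker to actually exit
            } catch (InterruptedException second) {
                // interrupted again: give up on waiting
            } finally {
                Thread.currentThread().interrupt(); // restore interrupted status
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* exit */ }
        });
        worker.start();
        Thread caller = new Thread(() -> joinPropagatingInterrupt(worker, 30_000));
        caller.start();
        caller.interrupt(); // interrupt the thread doing the join
        caller.join();
        System.out.println("worker alive: " + worker.isAlive()); // prints worker alive: false
    }
}
```

Without the propagation step, interrupting the caller would leave `worker` sleeping for the full 60 seconds, which is the leak described in this issue.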
[jira] [Commented] (KAFKA-4780) ReplicaFetcherThread.fetch could not get any reponse
[ https://issues.apache.org/jira/browse/KAFKA-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873942#comment-15873942 ] huxi commented on KAFKA-4780: - Could this be a duplicate of [KAFKA-4477|https://issues.apache.org/jira/browse/KAFKA-4477?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22was%20disconnected%20before%20the%20response%20was%20read%22]? > ReplicaFetcherThread.fetch could not get any reponse > > > Key: KAFKA-4780 > URL: https://issues.apache.org/jira/browse/KAFKA-4780 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 0.10.1.0 >Reporter: mashudong > Attachments: capture.png, log.png, partition.png > > > All partitions with broker 3 as leader have just broker 3 in their isr > !partition.png! > Many IOException in server.log on broker 1 and 2 > !log.png! > According to network packet capture, ReplicaFetcherThread of broker 1 could > not get any response from broker 3, and after 30 seconds the connection was > closed by broker 1. > !capture.png! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly
[ https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870055#comment-15870055 ] huxi commented on KAFKA-4767: - What do you mean by "leaking the IO thread"? Do you mean it could not be shut down successfully after the user thread in which KafkaProducer.close was invoked gets interrupted? That should not happen, since this.sender.initiateClose() will always have run even if you interrupt the user thread. In my opinion, interrupting the user thread is no different from invoking ioThread.join with a relatively small timeout, because there is still a chance to force close the IO thread and wait for it again. That is also why we swallow the InterruptedException during the first join. Does that look good to you? And out of curiosity, did you encounter any case where the IO thread failed to shut down? > KafkaProducer is not joining its IO thread properly > --- > > Key: KAFKA-4767 > URL: https://issues.apache.org/jira/browse/KAFKA-4767 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.11.0.0 >Reporter: Buğra Gedik >Priority: Minor > > The {{KafkaProducer}} is not properly joining the thread it creates. The code > is like this: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > firstException.compareAndSet(null, t); > log.error("Interrupted while joining ioThread", t); > } > {code} > If the code is interrupted while performing the join, it will end up leaving > the io thread running. 
The correct way of handling this is as follows: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > // propagate the interrupt > this.ioThread.interrupt(); > try { > this.ioThread.join(); > } catch (InterruptedException t2) { > firstException.compareAndSet(null, t2); > log.error("Interrupted while joining ioThread", t2); > } finally { > // make sure we maintain the interrupted status > Thread.currentThread().interrupt(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
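The join-with-interrupt-propagation pattern proposed in the issue can be exercised as a standalone sketch. All names here (`joinPropagatingInterrupt`, the placeholder worker thread) are hypothetical, for illustration only; this is not the producer's actual close path:

```java
public final class InterruptSafeJoin {

    // Join a worker thread; if interrupted while waiting, propagate the
    // interrupt to the worker, wait once more, and restore our own
    // interrupted status so callers can still observe it.
    static void joinPropagatingInterrupt(Thread worker, long timeoutMs) {
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException first) {
            worker.interrupt();             // propagate the interrupt
            try {
                worker.join(timeoutMs);     // wait for the worker once more
            } catch (InterruptedException second) {
                // give up waiting; fall through to restore status
            } finally {
                Thread.currentThread().interrupt(); // restore interrupted status
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10_000);        // simulates long in-flight work
            } catch (InterruptedException e) {
                // worker honors the interrupt and exits
            }
        });
        worker.start();
        Thread.sleep(50);                    // let the worker get going
        Thread.currentThread().interrupt();  // simulate the caller being interrupted
        joinPropagatingInterrupt(worker, 1_000);
        System.out.println("worker alive: " + worker.isAlive());
        System.out.println("caller interrupted: " + Thread.interrupted());
    }
}
```

Restoring the interrupted status in the finally block is what lets code higher up the stack still observe the interrupt after the join completes.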
[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly
[ https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869382#comment-15869382 ] huxi commented on KAFKA-4767: - I think I see your point now. What you are really concerned about is that the IO thread might fail to shut down, or be left in an inconsistent state, if the user thread is interrupted. Am I right? 1. `this.sender.initiateClose()` is not interruptible, which means the user thread is always able to initiate a close of the IO thread even after we interrupt the user thread somewhere. 2. I do agree with restoring the interrupted status of the user thread after catching the InterruptedException, which is really good practice. 3. The current logic already handles the situation where the user thread does not wait long enough for the IO thread to finish its work: it adds forceClose and corresponding code to force-close the IO thread. In that case, we don't have to do the same thing again explicitly in the catch clause as you suggest. Does that make sense? > KafkaProducer is not joining its IO thread properly > --- > > Key: KAFKA-4767 > URL: https://issues.apache.org/jira/browse/KAFKA-4767 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.11.0.0 >Reporter: Buğra Gedik >Priority: Minor > > The {{KafkaProducer}} is not properly joining the thread it creates. The code > is like this: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > firstException.compareAndSet(null, t); > log.error("Interrupted while joining ioThread", t); > } > {code} > If the code is interrupted while performing the join, it will end up leaving > the io thread running. 
The correct way of handling this is as follows: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > // propagate the interrupt > this.ioThread.interrupt(); > try { > this.ioThread.join(); > } catch (InterruptedException t2) { > firstException.compareAndSet(null, t2); > log.error("Interrupted while joining ioThread", t2); > } finally { > // make sure we maintain the interrupted status > Thread.currentThread().interrupt(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly
[ https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868943#comment-15868943 ] huxi commented on KAFKA-4767: - It seems that KafkaProducer already initiates a close of the IO thread by setting running to false, so it is not reasonable to issue ioThread.interrupt() directly. > KafkaProducer is not joining its IO thread properly > --- > > Key: KAFKA-4767 > URL: https://issues.apache.org/jira/browse/KAFKA-4767 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 0.11.0.0 >Reporter: Buğra Gedik >Priority: Minor > > The {{KafkaProducer}} is not properly joining the thread it creates. The code > is like this: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > firstException.compareAndSet(null, t); > log.error("Interrupted while joining ioThread", t); > } > {code} > If the code is interrupted while performing the join, it will end up leaving > the io thread running. The correct way of handling this is as follows: > {code} > try { > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t) { > // propagate the interrupt > this.ioThread.interrupt(); > try { > // join again (if you want to be more accurate, you can re-adjust > the timeout) > this.ioThread.join(timeUnit.toMillis(timeout)); > } catch (InterruptedException t2) { > firstException.compareAndSet(null, t2); > log.error("Interrupted while joining ioThread", t2); > } finally { > // make sure we maintain the interrupted status > Thread.currentThread().interrupt(); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large
[ https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864954#comment-15864954 ] huxi edited comment on KAFKA-4762 at 2/15/17 9:17 AM: -- Logs show that you are using 0.10.x (or earlier), where max.partition.fetch.bytes is a hard limit even when compression is enabled. In your case, it seems that you have enabled compression on the producer side. `max.partition.fetch.bytes` also applies to the whole compressed message set, which is often much larger than a single record. That's why you run into RecordTooLargeException. 0.10.1, which completes [KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes], already 'fixes' your problem by making the `max.partition.fetch.bytes` field in the fetch request much less restrictive, so you can try a 0.10.1 build. was (Author: huxi_2b): Logs show that you are using 0.10.x where max.partition.fetch.bytes is a hard limit even when you enable the compression. In your case, seems that you have enabled the compression on the producer side. `max.partition.fetch.bytes` also applies to the whole compressed message which is often much larger than a single one. That's why you run into RecordTooLargeException. 0.10.1 which completes [KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes] already 'fixes' your problem by making `max.partition.fetch.bytes` field in the fetch request much less useful, so you can try with an 0.10.1 build. > Consumer throwing RecordTooLargeException even when messages are not that > large > --- > > Key: KAFKA-4762 > URL: https://issues.apache.org/jira/browse/KAFKA-4762 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Vipul Singh > > We were just recently hit by a weird error. > Before going any further, here is an explanation of our service setup. 
We have a > producer which produces messages no larger than 256 KB (we have > an explicit check for this on the producer side), and on the client side we > have a fetch limit of 512 KB (max.partition.fetch.bytes is set to 524288 bytes). > Recently our client started to see this error: > {quote} > org.apache.kafka.common.errors.RecordTooLargeException: There are some > messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is > larger than the fetch size 524288 and hence cannot be ever returned. Increase > the fetch size, or decrease the maximum message size the broker will allow. > {quote} > We tried consuming messages with another consumer, without any > max.partition.fetch.bytes limit, and it consumed fine. The messages were > small, and did not seem to be greater than 256 KB. > We took a log dump, and the log size looked fine. > {quote} > compresscodec: NoCompressionCodec crc: 2473548911 keysize: 8 > offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: > 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8 > offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: > 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8 > offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: > 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8 > offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: > 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8 > offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: > 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8 > offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: > 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8 > offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: > 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8 > offset: 9925056039 position: 191713371 isvalid: 
true payloadsize: 1571 magic: > 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8 > {quote} > Has anyone seen something similar? Or any pointers to troubleshoot this further? > Please note: To overcome this issue, we deployed a new consumer, without this > limit of max.partition.fetch.bytes, and it worked fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
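The comments on KAFKA-4762 attribute the error to the compressed message set rather than to any individual record. A toy back-of-the-envelope check (all batch sizes and the compression ratio are made up for illustration; this is not actual Kafka accounting) shows how every record can satisfy a 256 KB producer-side cap while the wrapped batch still exceeds max.partition.fetch.bytes under pre-0.10.1 fetch semantics:

```java
public final class CompressedBatchLimit {
    public static void main(String[] args) {
        int maxPartitionFetchBytes = 524_288;   // 512 KB, as in the report
        int maxSingleRecordBytes   = 262_144;   // 256 KB producer-side cap
        int recordsPerBatch        = 5;         // hypothetical batch size

        // Even at an assumed 50% compression ratio, five maximal records
        // wrapped into one compressed message set exceed the per-partition
        // fetch limit, although each record alone fits comfortably.
        int compressedBatchBytes = (int) (recordsPerBatch * maxSingleRecordBytes * 0.5);

        System.out.println("single record fits: " + (maxSingleRecordBytes <= maxPartitionFetchBytes));
        System.out.println("compressed batch fits: " + (compressedBatchBytes <= maxPartitionFetchBytes));
    }
}
```

On 0.10.x and earlier the broker compares the whole compressed wrapper against the limit, which is why the consumer raised RecordTooLargeException despite every payload being small.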
[jira] [Commented] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large
[ https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864954#comment-15864954 ] huxi commented on KAFKA-4762: - Logs show that you are using 0.10.x, where max.partition.fetch.bytes is a hard limit even when compression is enabled. In your case, it seems that you have enabled compression on the producer side. `max.partition.fetch.bytes` also applies to the whole compressed message set, which is often much larger than a single record. That's why you run into RecordTooLargeException. 0.10.1, which completes [KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes], already 'fixes' your problem by making the `max.partition.fetch.bytes` field in the fetch request much less restrictive, so you can try a 0.10.1 build. > Consumer throwing RecordTooLargeException even when messages are not that > large > --- > > Key: KAFKA-4762 > URL: https://issues.apache.org/jira/browse/KAFKA-4762 > Project: Kafka > Issue Type: Bug >Reporter: Vipul Singh > > We were just recently hit by a weird error. > Before going any further, here is an explanation of our service setup. We have a > producer which produces messages no larger than 256 KB (we have > an explicit check for this on the producer side), and on the client side we > have a fetch limit of 512 KB (max.partition.fetch.bytes is set to 524288 bytes). > Recently our client started to see this error: > {quote} > org.apache.kafka.common.errors.RecordTooLargeException: There are some > messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is > larger than the fetch size 524288 and hence cannot be ever returned. Increase > the fetch size, or decrease the maximum message size the broker will allow. > {quote} > We tried consuming messages with another consumer, without any > max.partition.fetch.bytes limit, and it consumed fine. 
The messages were > small, and did not seem to be greater than 256 KB. > We took a log dump, and the log size looked fine. > {quote} > compresscodec: NoCompressionCodec crc: 2473548911 keysize: 8 > offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: > 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8 > offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: > 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8 > offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: > 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8 > offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: > 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8 > offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: > 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8 > offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: > 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8 > offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: > 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8 > offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: > 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8 > {quote} > Has anyone seen something similar? Or any pointers to troubleshoot this further? > Please note: To overcome this issue, we deployed a new consumer, without this > limit of max.partition.fetch.bytes, and it worked fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop
[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855232#comment-15855232 ] huxi commented on KAFKA-4739: - It seems that FETCH requests time out every 40 seconds, and 40 seconds is the default value for the new consumer's `request.timeout.ms` config. Could you confirm whether your settings are taking effect? And why did you set such a large timeout value? Poor network conditions? > KafkaConsumer poll going into an infinite loop > -- > > Key: KAFKA-4739 > URL: https://issues.apache.org/jira/browse/KAFKA-4739 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 >Reporter: Vipul Singh > > We are seeing an issue with our kafka consumer where it seems to go into an > infinite loop while polling, trying to fetch data from kafka. We are seeing > the heartbeat requests on the broker from the consumer, but nothing else from > the kafka consumer. > We enabled debug level logging on the consumer, and see these logs: > https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4 > And this just goes on. The way we have been able to replicate this issue is > by restarting the process several times in quick succession. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KAFKA-4727) A Production server configuration needs to be updated
[ https://issues.apache.org/jira/browse/KAFKA-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4727: --- Assignee: huxi > A Production server configuration needs to be updated > - > > Key: KAFKA-4727 > URL: https://issues.apache.org/jira/browse/KAFKA-4727 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: huxi > Labels: newbie > > In docs/ops.html, we have a section on "A Production server configuration" > with queued.max.requests=16. This is often too low. We should change it to > the default value, which is 500. It will also be useful to see if other > configurations need to be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4685) All partitions offline, no controller znode in ZK
[ https://issues.apache.org/jira/browse/KAFKA-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834169#comment-15834169 ] huxi commented on KAFKA-4685: - Server logs show that all the brokers came back and registered in ZK again; see "Creating /brokers/ids/0,1,2" in the logs. So did you mean /controller and /brokers/ids are still empty even after broker 2 was elected as the new controller? > All partitions offline, no controller znode in ZK > > > Key: KAFKA-4685 > URL: https://issues.apache.org/jira/browse/KAFKA-4685 > Project: Kafka > Issue Type: Bug >Reporter: Sinóros-Szabó Péter > Attachments: kafka-0-logs.zip, kafka-1-logs.zip, kafka-2-logs.zip, > zookeeper-logs.zip > > > Setup: 3 Kafka 0.11.1.1 nodes on kubernetes (in AWS), and another 3 nodes of > Zookeeper 3.5.2-alpha also in kubernetes (in AWS). > At 2017-01-23 06:51 ZK sessions expired. It seems from the logs that kafka-2 > was elected as the new controller, but I am not sure how to read those logs. > I've checked the ZK data, and both /controller and > /brokers/ids are empty. Kafka reports that all partitions are offline, > although it seems to be working because messages are coming and going. > We are using an alpha version, I know that it may be a problem, but I suppose > that Kafka should notice that no node is registered as controller. > I have attached the Kafka and ZK logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4684) Kafka does not offer kafka-configs.bat on Windows box
[ https://issues.apache.org/jira/browse/KAFKA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi updated KAFKA-4684: Labels: newbie (was: ) > Kafka does not offer kafka-configs.bat on Windows box > - > > Key: KAFKA-4684 > URL: https://issues.apache.org/jira/browse/KAFKA-4684 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 0.10.1.1 >Reporter: huxi >Assignee: huxi >Priority: Minor > Labels: newbie > > Kafka does not ship with kafka-configs.bat, so it's a little inconvenient to > add/modify configs on Windows platform -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834053#comment-15834053 ] huxi edited comment on KAFKA-4669 at 1/23/17 8:25 AM: -- Maybe we could invoke the alternative method `ProduceRequestResult#await(long, TimeUnit)` and specify a reasonable timeout value (e.g. 30 seconds, or request.timeout.ms) in RecordAccumulator#awaitFlushCompletion. If await times out, we log a warning to tell users that some request was never marked complete, and flush will no longer be stuck, as shown below: {code:borderStyle=solid} public void awaitFlushCompletion() throws InterruptedException { try { for (RecordBatch batch : this.incomplete.all()) { if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) { log.warn("Did not complete the produce request for {} in 30 seconds.", batch.produceFuture.topicPartition()); } } } finally { this.flushesInProgress.decrementAndGet(); } } {code} [~becket_qin] Does it make sense? was (Author: huxi_2b): Maybe we could invoke an alternative method `ProduceRequestResult#await(long, TimeUnit)` and specify a reasonable timeout value in RecordAccumulator#awaitFlushCompletion. If await times out, we record a warn log to indicate users that some request did not get marked as complete, and flush will not be stuck anymore, as show below: {code:borderStyle=solid} public void awaitFlushCompletion() throws InterruptedException { try { for (RecordBatch batch : this.incomplete.all()) { if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) { log.warn("Did not complete the produce request for {} in 30 seconds.", batch.produceFuture.topicPartition()); } } } finally { this.flushesInProgress.decrementAndGet(); } } {code} [~becket_qin] Does it make sense? 
> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.10.2.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception
[ https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834053#comment-15834053 ] huxi commented on KAFKA-4669: - Maybe we could invoke the alternative method `ProduceRequestResult#await(long, TimeUnit)` and specify a reasonable timeout value in RecordAccumulator#awaitFlushCompletion. If await times out, we log a warning to tell users that some request was never marked complete, and flush will no longer be stuck, as shown below: {code:borderStyle=solid} public void awaitFlushCompletion() throws InterruptedException { try { for (RecordBatch batch : this.incomplete.all()) { if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) { log.warn("Did not complete the produce request for {} in 30 seconds.", batch.produceFuture.topicPartition()); } } } finally { this.flushesInProgress.decrementAndGet(); } } {code} [~becket_qin] Does it make sense? > KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws > exception > - > > Key: KAFKA-4669 > URL: https://issues.apache.org/jira/browse/KAFKA-4669 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1 >Reporter: Cheng Ju >Priority: Critical > Labels: reliability > Fix For: 0.10.2.0 > > > There is no try catch in NetworkClient.handleCompletedReceives. If an > exception is thrown after inFlightRequests.completeNext(source), then the > corresponding RecordBatch's done will never get called, and > KafkaProducer.flush will hang on this RecordBatch. > I've checked 0.10 code and think this bug does exist in 0.10 versions. > A real case. 
First a correlateId not match exception happens: > 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] > (org.apache.kafka.clients.producer.internals.Sender.run:130) - Uncaught > error in kafka producer I/O thread: > java.lang.IllegalStateException: Correlation id for response (703766) does > not match request (703764) > at > org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > Then jstack shows the thread is hanging on: > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57) > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425) > at > org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544) > at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224) > at > org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67) > at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145) > at java.lang.Thread.run(Thread.java:745) > client code -- This message was sent by Atlassian JIRA (v6.3.4#6332)
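The bounded-await idea huxi proposes for KAFKA-4669 can be sketched without the Kafka internals. Here `FakeBatch` and its CountDownLatch are hypothetical stand-ins for RecordBatch and ProduceRequestResult (assumptions for illustration, not the real classes):

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class BoundedFlush {
    // Stand-in for a produce request result: completed when the latch hits zero.
    static final class FakeBatch {
        final String topicPartition;
        final CountDownLatch done = new CountDownLatch(1);
        FakeBatch(String tp) { this.topicPartition = tp; }
    }

    // Wait for each incomplete batch, but never longer than the timeout,
    // so a batch whose done() is never called cannot hang flush() forever.
    static void awaitFlushCompletion(List<FakeBatch> incomplete,
                                     long timeout, TimeUnit unit) throws InterruptedException {
        for (FakeBatch batch : incomplete) {
            if (!batch.done.await(timeout, unit)) {
                System.out.println("WARN: did not complete produce request for "
                        + batch.topicPartition + " in time");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FakeBatch completed = new FakeBatch("demo-0");
        completed.done.countDown();                  // this one finished normally
        FakeBatch stuck = new FakeBatch("demo-1");   // this one is never completed
        awaitFlushCompletion(List.of(completed, stuck), 100, TimeUnit.MILLISECONDS);
        System.out.println("flush returned");
    }
}
```

With the unbounded `CountDownLatch.await()` that the stack trace shows, the second batch would block the flush forever; the timed variant trades a silent hang for a logged warning.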
[jira] [Comment Edited] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command
[ https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1586#comment-1586 ] huxi edited comment on KAFKA-4675 at 1/22/17 7:26 AM: -- I failed to reproduce this issue with the snippet below: {code} val zkUtils = ZkUtils("localhost:2181", 3, 3, false) AdminUtils.deleteTopic(zkUtils, "old-topic") AdminUtils.createTopic(zkUtils, "new-topic", 1, 1) {code} After running the code above, topic "new-topic" is created, so it seems the subsequent CreateTopic command is still invoked. [~guozhang] Does the code reflect the way you ran into this problem? was (Author: huxi_2b): Failed to reproduce this issue with snippet below: {code} val zkUtils = ZkUtils("localhost:2181", 3, 3, false) AdminUtils.deleteTopic(zkUtils, "old-topic") AdminUtils.createTopic(zkUtils, "new-topic", 1, 1) {code} After running the code above, topic "new-topic" will be created. Seems subsequent CreateTopic command has still been invoked. Does the code reflect the way you run into this problem? > Subsequent CreateTopic command could be lost after a DeleteTopic command > > > Key: KAFKA-4675 > URL: https://issues.apache.org/jira/browse/KAFKA-4675 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang > Labels: admin > > This is discovered while investigating KAFKA-3896: If an admin client sends a > delete topic command and a create topic command consecutively, even if it > waits for the response of the previous command before issuing the second, > there is still a race condition that the create topic command could be "lost". > This is because currently these commands are all asynchronous as defined in > KIP-4, and the controller will return the response once it has written the > corresponding data to the ZK path, which can be handled by different listener > threads at different paces, and if the thread handling create is faster than > the other, the executions could be effectively re-ordered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command
[ https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1586#comment-1586 ] huxi commented on KAFKA-4675: - I failed to reproduce this issue with the snippet below: {code} val zkUtils = ZkUtils("localhost:2181", 3, 3, false) AdminUtils.deleteTopic(zkUtils, "old-topic") AdminUtils.createTopic(zkUtils, "new-topic", 1, 1) {code} After running the code above, topic "new-topic" is created, so it seems the subsequent CreateTopic command is still invoked. Does the code reflect the way you ran into this problem? > Subsequent CreateTopic command could be lost after a DeleteTopic command > > > Key: KAFKA-4675 > URL: https://issues.apache.org/jira/browse/KAFKA-4675 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang > Labels: admin > > This is discovered while investigating KAFKA-3896: If an admin client sends a > delete topic command and a create topic command consecutively, even if it > waits for the response of the previous command before issuing the second, > there is still a race condition that the create topic command could be "lost". > This is because currently these commands are all asynchronous as defined in > KIP-4, and the controller will return the response once it has written the > corresponding data to the ZK path, which can be handled by different listener > threads at different paces, and if the thread handling create is faster than > the other, the executions could be effectively re-ordered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
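If the delete really is asynchronous as KAFKA-4675 describes, one defensive client-side pattern is to poll until the topic is actually gone before re-creating it. This is a generic sketch: the in-memory `topics` set and the `topicExists` probe are hypothetical stand-ins for a ZooKeeper or AdminClient check, not real Kafka APIs:

```java
import java.util.HashSet;
import java.util.Set;

public final class CreateAfterDelete {
    // Hypothetical registry standing in for ZooKeeper's /brokers/topics path.
    static final Set<String> topics = new HashSet<>();

    static boolean topicExists(String name) { return topics.contains(name); }

    // Poll (a bounded number of times) until the topic is gone, then create
    // it; this avoids racing an asynchronous, not-yet-finished delete.
    static boolean recreate(String name, int maxPolls) throws InterruptedException {
        for (int i = 0; i < maxPolls && topicExists(name); i++) {
            Thread.sleep(10); // deletion is asynchronous; give it time
        }
        if (topicExists(name)) {
            return false;     // delete never finished; refuse to race it
        }
        topics.add(name);     // safe to create now
        return true;
    }

    public static void main(String[] args) throws Exception {
        topics.add("old-topic");
        topics.remove("old-topic");  // simulate a delete that has completed
        System.out.println("created: " + recreate("old-topic", 5));
    }
}
```

The bounded poll count keeps the caller from spinning forever if the delete is stuck, which mirrors how an admin client might surface a timeout instead of silently losing the create.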
[jira] [Created] (KAFKA-4684) Kafka does not offer kafka-configs.bat on Windows box
huxi created KAFKA-4684: --- Summary: Kafka does not offer kafka-configs.bat on Windows box Key: KAFKA-4684 URL: https://issues.apache.org/jira/browse/KAFKA-4684 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 0.10.1.1 Reporter: huxi Assignee: huxi Priority: Minor Kafka does not ship with kafka-configs.bat, so it's a little inconvenient to add/modify configs on Windows platform -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4680) min.insync.replicas can be set higher than replication factor
[ https://issues.apache.org/jira/browse/KAFKA-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832729#comment-15832729 ] huxi commented on KAFKA-4680: - If you send records to a topic whose 'min.insync.replicas' is larger than its replication factor with acks set to all, the client callback returns `NOT_ENOUGH_REPLICAS`, which tells you to adjust the topic-level min.insync.replicas or check broker availability, doesn't it? I do agree that it's definitely better if we add a check before creating/altering a topic. What I am concerned about is completeness. If a check is introduced when creating/altering topics, do we also need checks before changing the replication factor, especially when reducing it (we could call kafka-reassign-partitions.sh to do this, although it's somewhat inconvenient to use), to ensure the reduced number is still no smaller than min.insync.replicas? If so, it seems we have to identify all the affected code paths. You could of course assign this JIRA to yourself and get rolling :-) > min.insync.replicas can be set higher than replication factor > - > > Key: KAFKA-4680 > URL: https://issues.apache.org/jira/browse/KAFKA-4680 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: James Cheng > > It is possible to specify a min.insync.replicas for a topic that is higher > than the replication factor of the topic. If you do this, you will not be > able to produce to the topic with acks=all. > Furthermore, each produce request (including retries) to the topic will emit > an ERROR level message to the broker debuglogs. If this is not noticed > quickly enough, it can cause the debuglogs to balloon. > We actually hosed one of our Kafka clusters because of this. A topic got > configured with min.insync.replicas > replication factor. It had partitions > on all brokers of our cluster. The broker logs ballooned and filled up the > disks. 
We run these clusters on CoreOS, and CoreOS's etcd database got > corrupted. (Kafka didn't get corrupted, though.) > I think Kafka should do validation when someone tries to change a topic to > min.insync.replicas > replication factor, and reject the change. > This would presumably affect kafka-topics.sh, kafka-configs.sh, as well as > the CreateTopics operation that came in KIP-4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
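The validation suggested in KAFKA-4680 amounts to a one-line pre-check at topic create/alter time. A minimal sketch (a hypothetical helper for illustration, not the actual broker code path):

```java
public final class MinIsrCheck {
    // Reject a topic configuration whose min.insync.replicas exceeds the
    // replication factor: with acks=all such a topic can never be produced to.
    static void validate(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas (" + minInsyncReplicas
                + ") must not exceed replication factor (" + replicationFactor + ")");
        }
    }

    public static void main(String[] args) {
        validate(3, 2);                        // a sane config passes silently
        try {
            validate(2, 3);                    // the misconfiguration from the issue
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

As huxi's comment notes, the same check would also have to run anywhere the replication factor can shrink (e.g. partition reassignment), not just at creation time.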
[jira] [Commented] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command
[ https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831945#comment-15831945 ] huxi commented on KAFKA-4675: - It seems that deleting '/brokers/topics/' comes near the end (fourth from the bottom) of the topic-deletion sequence, while creating it comes first in the createTopic logic. Did you encounter a TopicExistsException that failed the creation? > Subsequent CreateTopic command could be lost after a DeleteTopic command > > > Key: KAFKA-4675 > URL: https://issues.apache.org/jira/browse/KAFKA-4675 > Project: Kafka > Issue Type: Bug >Reporter: Guozhang Wang > Labels: admin > > This is discovered while investigating KAFKA-3896: If an admin client sends a > delete topic command and a create topic command consecutively, even if it > waits for the response of the previous command before issuing the second, > there is still a race condition that the create topic command could be "lost". > This is because currently these commands are all asynchronous as defined in > KIP-4, and the controller will return the response once it has written the > corresponding data to the ZK path, which can be handled by different listener > threads at different paces, and if the thread handling create is faster than > the other, the executions could be effectively re-ordered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4592) Kafka Producer Metrics Invalid Value
[ https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi updated KAFKA-4592: Reviewer: Ismael Juma > Kafka Producer Metrics Invalid Value > > > Key: KAFKA-4592 > URL: https://issues.apache.org/jira/browse/KAFKA-4592 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 0.10.1.1 >Reporter: AJ Jwair >Assignee: huxi > > Producer metrics > Metric name: record-size-max > When no records are produced during the monitoring window, the > record-size-max has an invalid value of -9.223372036854776E16 > Please notice that the value is not a very small number close to zero bytes, > it is negative 90 quadrillion bytes > The same behavior was observed in: records-lag-max -- This message was sent by Atlassian JIRA (v6.3.4#6332)
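The huge negative record-size-max value reported in KAFKA-4592 is consistent with a "max over a window" statistic whose sentinel initial value leaks out when no samples were recorded in the window. A toy illustration of that bug pattern (pure Python; this is not the metrics-core or Kafka implementation, and the actual sentinel in the affected version appears to be a large negative double rather than minus infinity):

```python
import math


class MaxStat:
    """Toy 'max over a window' metric showing the sentinel-leak bug.

    When nothing has been recorded in the window, measure_buggy() returns
    the huge negative initial value instead of NaN -- the same symptom
    KAFKA-4592 reports for record-size-max and records-lag-max.
    """

    SENTINEL = -float("inf")

    def __init__(self):
        self._max = self.SENTINEL

    def record(self, value: float) -> None:
        self._max = max(self._max, value)

    def measure_buggy(self) -> float:
        return self._max  # leaks the sentinel when the window is empty

    def measure_fixed(self) -> float:
        # Report NaN for an empty window so dashboards can tell
        # "no data" apart from a real measurement.
        return float("nan") if self._max == self.SENTINEL else self._max
```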
[jira] [Updated] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi updated KAFKA-4295: Reviewer: Ismael Juma > kafka-console-consumer.sh does not delete the temporary group in zookeeper > -- > > Key: KAFKA-4295 > URL: https://issues.apache.org/jira/browse/KAFKA-4295 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1 >Reporter: Sswater Shi >Assignee: huxi >Priority: Minor > > I'm not sure it is a bug or you guys designed it. > Since 0.9.x.x, the kafka-console-consumer.sh will not delete the group > information in zookeeper/consumers on exit when without "--new-consumer". > There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if > kafka-console-consumer.sh runs a lot of times. > When 0.8.x.x, the kafka-console-consumer.sh can be followed by an argument > "group". If not specified, the kafka-console-consumer.sh will create a > temporary group name like 'console-consumer-'. If the group name is > specified by "group", the information in the zookeeper/consumers will be kept > on exit. If the group name is a temporary one, the information in the > zookeeper will be deleted when kafka-console-consumer.sh is quitted by > Ctrl+C. Why this is changed from 0.9.x.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers
[ https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829525#comment-15829525 ] huxi commented on KAFKA-4674: - Is it a duplicate of [KAFKA-3916|https://issues.apache.org/jira/browse/KAFKA-3916]? > Frequent ISR shrinking and expanding and disconnects among brokers > -- > > Key: KAFKA-4674 > URL: https://issues.apache.org/jira/browse/KAFKA-4674 > Project: Kafka > Issue Type: Bug > Components: controller, core >Affects Versions: 0.10.0.1 > Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64 > JDK: 1.8.0_45 >Reporter: Kaiming Wan > Attachments: controller.log.rar, server.log.2017-01-11-14, > zookeeper.out.2017-01-11.log > > > We use a kafka cluster with 3 brokers in a production environment. It worked > well for several months. Recently, we got the UnderReplicatedPartitions>0 > warning mail. When we checked the log, we found that the partition was always > experiencing ISR shrinking and expanding, and a disconnection exception could > be found in the controller's log. > We also found some deviant output in zookeeper's log which pointed to a > consumer (using the old zookeeper-based API) which had stopped its work with > many lags. > Actually, this is not the first time we have encountered this problem. When we > first met this problem, we also found the same phenomenon and the same log output. > We solved the problem by deleting the consumer node info in zookeeper. Then > everything went well. > However, this time, after we deleted the consumer which already had a > large lag, the frequent ISR shrinking and expanding didn't stop for a very > long time (several hours). Though the issue didn't affect our consumers and > producers, we think it makes our cluster unstable. So at last, we solved > this problem by restarting the controller broker. > And now I wonder what causes this problem. I checked the source code and > only know that a poll timeout will cause disconnection and ISR shrinking. 
Is the > issue related to zookeeper, because it cannot handle so many metadata > modifications, making the replica fetch threads take more time? > I uploaded the log files as attachments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545 ] huxi edited comment on KAFKA-4629 at 1/17/17 6:48 AM: -- Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some debugging, I found that BrokerTopicMetrics.close could always be invoked even when the corresponding MBeans did not get removed. Not sure if it's not caused by Kafka code. Might need to investigate whether it's a known issue for metrics-core library. was (Author: huxi_2b): Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some debugging, I found that BrokerTopicMetrics.close could always be invoked even when the corresponding MBeans did not get removed. Seems that it's not caused by Kafka code. Might need to investigate whether it's a known issue for metrics-core library. > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. 
> I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545 ] huxi edited comment on KAFKA-4629 at 1/17/17 6:41 AM: -- Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some debugging, I found that BrokerTopicMetrics.close could always be invoked even when the corresponding MBeans did not get removed. Seems that it's not caused by Kafka code. Might need to investigate whether it's a known issue for metrics-core library. was (Author: huxi_2b): Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some debugging, I found that BrokerTopicMetrics.close could always be invoked even the corresponding MBeans did not get removed. Seems that it's not caused by Kafka code. Might need to investigate whether it's a known issue for metrics-core library. > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. 
> I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545 ] huxi commented on KAFKA-4629: - Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some debugging, I found that BrokerTopicMetrics.close could always be invoked even the corresponding MBeans did not get removed. Seems that it's not caused by Kafka code. Might need to investigate whether it's a known issue for metrics-core library. > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. > I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823329#comment-15823329 ] huxi edited comment on KAFKA-4629 at 1/16/17 9:36 AM: -- [~alberto.fo...@natwestmarkets.com] Yes, multiple brokers DOES help reproducing the issue. And some interesting things were found. If deleting topic command was issued right after creating the topic, some MBeans at the topic level indeed failed to get removed. But if some delay time was put on between these two commands, then all the topic-level MBeans could be removed as expected. In the former case, the controller is still doing many background tasks to complete the topic creation although the CREATE command returns successfully. So could you wait some time before issuing the delete command to see whether you would run into this issue? By the way, I am using 0.10.1.0 although I think it does not matter. was (Author: huxi_2b): [~alberto.fo...@natwestmarkets.com] Yes, multiple brokers DOES help reproducing the issue. And some interesting things were found. If deleting topic command was issued right after creating the topic, some MBeans at the topic level indeed failed to get removed. But if some delay time was put on between these two commands, then all the topic-level MBeans could be removed as expected. In the former case, the controller is still doing many background tasks to complete the topic creation although the CREATE command returns successfully. So could you wait some time before issuing the delete command to see whether you would run into this issue? > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. 
Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. > I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823594#comment-15823594 ] huxi commented on KAFKA-4595: - [~pengwei] I don't think it's doable since it would leave inconsistent states for the involved partitions, thus polluting controller's cache. As in the current design, the topic deleting is totally asynchronous. Users nearly always see the topic is marked as deleted successfully although there are several steps the controller needs to finish in the background. If we time out the deleteTopicStopReplicaCallback, completeReplicaDeletion will not be invoked. Does it make sense? > Controller send thread can't stop when broker change listener event trigger > for dead brokers > - > > Key: KAFKA-4595 > URL: https://issues.apache.org/jira/browse/KAFKA-4595 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.10.1.1 >Reporter: Pengwei >Priority: Critical > Labels: reliability > Fix For: 0.10.2.0 > > > In our test env, we found controller is not working after a delete topic > opertation and network issue, the stack is below: > "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" > #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on > condition [0x7fb76b7c8000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xc05497b8> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > 
kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50) > at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128) > at > kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81) > - locked <0xc0258760> (a java.lang.Object) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369) > at scala.collection.immutable.Set$Set1.foreach(Set.scala:79) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) > at 
kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356) > at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) >Locked ownable synchronizers: > - <
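The thread dump above shows the ZK event thread parked forever in CountDownLatch.await inside ShutdownableThread.shutdown. One defensive pattern, sketched here in Python as an analogue rather than the actual Kafka fix, is to bound the wait and surface a timeout instead of blocking the listener thread indefinitely:

```python
import threading
import time


class ShutdownableWorker:
    """Minimal analogue of kafka.utils.ShutdownableThread with a bounded wait."""

    def __init__(self, work):
        self._work = work
        self._running = threading.Event()
        self._running.set()
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Loop until asked to stop, then signal completion.
        while self._running.is_set():
            self._work()
        self._done.set()

    def shutdown(self, timeout_s: float) -> bool:
        """Signal the loop to stop and wait at most timeout_s.

        Returns False on timeout instead of parking the caller forever,
        which is what hangs the ZkClient event thread in this report.
        """
        self._running.clear()
        return self._done.wait(timeout=timeout_s)
```

The trade-off is that a False return must then be handled (log, escalate, or force-stop) rather than silently assumed to mean a clean shutdown.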
[jira] [Commented] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop
[ https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823399#comment-15823399 ] huxi commented on KAFKA-4557: - [~scf37] Did you create your own Sender instances in the producer code? > ConcurrentModificationException in KafkaProducer event loop > --- > > Key: KAFKA-4557 > URL: https://issues.apache.org/jira/browse/KAFKA-4557 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.1.0 >Reporter: Sergey Alaev >Priority: Critical > Labels: reliability > Fix For: 0.10.2.0 > > > Under heavy load, Kafka producer can stop publishing events. Logs below. > [2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3] > [NetworkClient] [] [] [] [DEBUG]: Disconnecting from node 2 due to > request timeout. > [2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3] > [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message > to Kafka > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > [2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3] > [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message > to Kafka > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. 
(#2 from 2016-12-19T15:01:28.793Z) > > [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] > [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message > to Kafka > org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for > events-deadletter-0 due to 30032 ms has passed since batch creation plus > linger time (#285 from 2016-12-19 > T15:01:28.793Z) > [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] > [SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka > deadletter queue > org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for > events-deadletter-0 due to 30032 ms has passed since batch creation plus > linger time (#286 from 2016-12-19 > T15:01:28.793Z) > [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] > [Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer > I/O thread: > java.util.ConcurrentModificationException: null > at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) > ~[na:1.8.0_45] > at > org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242) > ~[kafka-clients-0.10.1.0.jar:na] > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) > ~[kafka-clients-0.10.1.0.jar:na] > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) > ~[kafka-clients-0.10.1.0.jar:na] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45] > [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] > [NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching > metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
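The ConcurrentModificationException in the trace comes from one thread iterating the in-flight batch deque while another mutates it. The failure mode is easy to reproduce in miniature; here in Python, where `collections.deque` raises the analogous RuntimeError (a toy model, not the producer's actual data structures):

```python
from collections import deque


def expire_batches_buggy(batches: deque) -> None:
    """Mimics the bug pattern behind abortExpiredBatches: removing
    elements from the deque while iterating it raises at runtime."""
    for b in batches:
        if b["expired"]:
            batches.remove(b)  # mutation during iteration -> RuntimeError


def expire_batches_fixed(batches: deque) -> None:
    """Collect first, mutate after iteration -- the standard remedy."""
    expired = [b for b in batches if b["expired"]]
    for b in expired:
        batches.remove(b)
```

In the real producer the mutation comes from a concurrent thread rather than the loop body, but the invariant violated is the same: the deque must not change under an active iterator.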
[jira] [Commented] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers
[ https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823347#comment-15823347 ] huxi commented on KAFKA-4577: - I suspect the TopicDeletionManager instance in KafkaController is null although the controller should have created an instance of that during the initialization. Could you check logs to see if controller has been failed over to another broker? > NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers > -- > > Key: KAFKA-4577 > URL: https://issues.apache.org/jira/browse/KAFKA-4577 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 >Reporter: Scott Reynolds > > Seems as if either deleteTopicManager or deleteTopicManager. > partitionsToBeDeleted wasn't set ? > {code} > java.lang.NullPointerException > at > kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers > (ControllerChannelManager.scala:331) > at kafka.controller.KafkaController.sendUpdateMetadataRequest > (KafkaController.scala:1023) > at > kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications > (KafkaController.scala:1371) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp > (KafkaController.scala:1358) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234) > at kafka.controller.IsrChangeNotificationListener.handleChildChange > (KafkaController.scala:1351) > at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843) > at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823329#comment-15823329 ] huxi commented on KAFKA-4629: - [~alberto.fo...@natwestmarkets.com] Yes, multiple brokers DOES help reproducing the issue. And some interesting things were found. If deleting topic command was issued right after creating the topic, some MBeans at the topic level indeed failed to get removed. But if some delay time was put on between these two commands, then all the topic-level MBeans could be removed as expected. In the former case, the controller is still doing many background tasks to complete the topic creation although the CREATE command returns successfully. So could you wait some time before issuing the delete command to see whether you would run into this issue? > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. 
> I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4629) Per topic MBeans leak
[ https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823029#comment-15823029 ] huxi commented on KAFKA-4629: - I tried in my Kafka 0.10.0.1 single-broker test environment with no luck reproducing the issue. The partition count and replication factor were both set to 1, though I am not sure whether that setting matters. As for the WARN logs, it seems you are requesting deletion of a non-existent topic. Could you confirm that the topic exists before invoking AdminUtils.deleteTopic? > Per topic MBeans leak > - > > Key: KAFKA-4629 > URL: https://issues.apache.org/jira/browse/KAFKA-4629 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.0.1 >Reporter: Alberto Forti >Priority: Minor > > Hi, > In our application we create and delete topics dynamically. Most of the times > when a topic is deleted the related MBeans are not deleted. Example of MBean: > kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d > Also, deleting a topic often produces (what I think is) noise in the logs at > WARN level. One example is: > WARN PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener > on 1]: Ignoring request to delete non-existing topics > dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2 > Easy reproducible with a basic Kafka cluster with two brokers, just create > and delete topics few times. Sometimes the MBeans for the topic are deleted > and sometimes are not. > I'm creating and deleting topics using the AdminUtils class in the Java API: > AdminUtils.deleteTopic(zkUtils, topicName); > AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, > topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$); > Kafka version: 0.10.0.1 (haven't tried other versions) > Thanks, > Alberto -- This message was sent by Atlassian JIRA (v6.3.4#6332)
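One mechanism consistent with huxi's observation in this thread (close() always runs, yet the MBean survives when delete follows create too quickly) is that a background creation task re-registers the metric after the delete path has unregistered it. A toy registry illustrating that hypothesized race (pure Python; the real objects are metrics-core MBeans, and this is speculation, not the confirmed root cause):

```python
class MetricRegistry:
    """Toy per-topic metric registry illustrating the KAFKA-4629 leak pattern."""

    def __init__(self):
        self.mbeans = {}  # MBean name -> metric object

    def register(self, topic: str) -> None:
        self.mbeans[f"BrokerTopicMetrics,topic={topic}"] = object()

    def close_topic(self, topic: str, creation_finished: bool) -> None:
        """Unregister the topic's metric, as BrokerTopicMetrics.close does.

        If the controller's background creation tasks have not finished
        (creation_finished=False), a late task re-registers the metric
        after removal -- leaving it leaked, exactly the symptom seen when
        delete is issued immediately after create.
        """
        name = f"BrokerTopicMetrics,topic={topic}"
        self.mbeans.pop(name, None)
        if not creation_finished:
            self.mbeans[name] = object()  # late background registration
```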
[jira] [Commented] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821473#comment-15821473 ] huxi commented on KAFKA-4616: - They are not missing, but are just not delivered to Kafka successfully. bq. The guarantee that Kafka offers is that a committed message will not be lost, as long as there is at least one in sync replica alive, at all times. To avoid your "data loss", try to append "retries=" to the command, although you might see some repeated produced messages. > Message loss is seen when kafka-producer-perf-test.sh is running and any > broker restarted in middle > --- > > Key: KAFKA-4616 > URL: https://issues.apache.org/jira/browse/KAFKA-4616 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.0.0 > Environment: Apache mesos >Reporter: sandeep kumar singh > > if any broker is restarted while kafka-producer-perf-test.sh command is > running, we see message loss. > commands i run: > **perf command: > $ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096 > --throughput 1000 --topic test3R3P3 --producer-props > bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x: > I am sending 10 messages of each having size 4096 > error thrown by perf command: > 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, > 433.0 max latency. > 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, > 798.0 max latency. > 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, > 503.0 max latency. > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, > 594.0 max latency. > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, > 501.0 max latency. > 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, > 516.0 max latency. > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. 
> org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received. > truncated > 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, > 497.0 max latency. > 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, > 521.0 max latency. > 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, > 418.0 max latency. > 10 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg > latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms > 99.9th. > **consumer command: > $ bin/kafka-console-consumer.sh --zookeeper > x.x.x.x:2181/dcos-service-kafka-framework --topic test3R3P3 > 1>~/kafka_output.log > message stored: > $ wc -l ~/kafka_output.log > 99932 /home/montana/kafka_output.log > I found only 99932 message are stored and 68 messages are lost. > **topic describe command: > $ bin/kafka-topics.sh --zookeeper x.x.x.x:2181/dcos-service-kafka-framework > --describe |grep test3R3 > Topic:test3R3P3 PartitionCount:3ReplicationFactor:3 Configs: > Topic: test3R3P3Partition: 0Leader: 2 Replicas: > 1,2,0 Isr: 2,0,1 > Topic: test3R3P3Partition: 1Leader: 2 Replicas: > 2,0,1 Isr: 2,0,1 > Topic: test3R3P3Partition: 2Leader: 0 Replicas: > 0,1,2 Isr: 2,0,1 > **consumer group command: > $ bin/kafka-consumer-groups.sh --zookeeper > x.x.x.x:2181/dcos-service-kafka-framework --describe --group > console-consumer-9926 > GROUP TOPIC PARTITION > CURRENT-OFFSET LOG-END-OFFSET LAG OWNER > console-consumer-9926 test3R3P3 0 > 33265 33265 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > console-consumer-9926 test3R3P3 1 > 4 4 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > console-consumer-9926 test3R3P3 2 > 3 3 0 > console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0 > could you please help me understand what 
this error means "err - > org.apache.kafka.common.errors.NetworkException: The server disconnected > before a response was received."?
[jira] [Commented] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820267#comment-15820267 ] huxi commented on KAFKA-4616: - Append the acks option to the command, like this: --producer-props bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x: acks=-1 > Message loss is seen when kafka-producer-perf-test.sh is running and any > broker restarted in middle > -- > > Key: KAFKA-4616 > URL: https://issues.apache.org/jira/browse/KAFKA-4616 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.0.0 > Environment: Apache mesos >Reporter: sandeep kumar singh
[jira] [Commented] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle
[ https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817886#comment-15817886 ] huxi commented on KAFKA-4616: - Could you set acks=-1 and rerun the producer to see if there is still message loss? > Message loss is seen when kafka-producer-perf-test.sh is running and any > broker restarted in middle > -- > > Key: KAFKA-4616 > URL: https://issues.apache.org/jira/browse/KAFKA-4616 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.0.0 > Environment: Apache mesos >Reporter: sandeep kumar singh
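The advice in this thread boils down to two producer settings: acks=-1 (all) so the leader waits for the in-sync replicas before acknowledging, and retries so transient NetworkExceptions during a broker restart are retried instead of dropped. A minimal sketch of such a configuration using only java.util.Properties; the config key names are standard Kafka producer configs, but the helper class and method names here are hypothetical:

```java
import java.util.Properties;

// Sketch of the producer settings suggested in this thread for avoiding
// message loss across a broker restart. Class and method names are illustrative.
public class DurableProducerProps {

    public static Properties durableProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // acks=all (equivalent to -1): the leader waits for the full set of
        // in-sync replicas to acknowledge before the send is considered done.
        props.put("acks", "all");
        // Retry transient failures such as NetworkException thrown while a
        // broker restarts; with the default of 0 retries those sends are lost.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // A retried batch can be delivered twice, which matches the "repeated
        // produced messages" caveat in the comment; limiting in-flight
        // requests at least preserves ordering under retries.
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties p = durableProps("localhost:9092");
        System.out.println(p.getProperty("acks") + " " + p.getProperty("retries"));
    }
}
```

These properties would be passed to a KafkaProducer constructor, or equivalently appended to kafka-producer-perf-test.sh via --producer-props.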
[jira] [Comment Edited] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817849#comment-15817849 ] huxi edited comment on KAFKA-4614 at 1/11/17 10:00 AM: --- Great catch on the whole thing. As for unmapping, there seems to be no perfect solution at the moment: Java does not offer a corresponding unmap that invokes the underlying munmap system call for a mapped file. See [the JDK bug|http://bugs.java.com/view_bug.do?bug_id=4724038]. Right now there are two "not-that-perfect" workarounds: 1. Explicitly unmap the mapped file, as [~kawamuray] said, by invoking 'DirectBuffer.cleaner().clean()'. Incidentally, Kafka already offers a method named 'forceUnmap' that does exactly this, so we could simply reuse it. However, this approach has two drawbacks: (a) if anything touches the MappedByteBuffer after it has been unmapped, the JVM will crash; (b) a portability risk, although sun.misc.Cleaner is present in most major JVM vendors' runtimes. 2. Explicitly invoke 'System.gc()', which is also not a good idea, especially when -XX:+DisableExplicitGC is set. was (Author: huxi_2b): Great catch for the whole things. As for the unmap things, seems there is no perfect solution at this moment. Java does not offer a corresponding unmap that invokes underlying munmap system call for the mapped file. See [the jdk bug|http://bugs.java.com/view_bug.do?bug_id=4724038] Right now there exists two "not-that-perfect" solutions: 1. Explicitly unmap the mapped file as [~kawamuray] said via invoking 'DirectBuffer.cleaner().clean()'. By the way, Kafka already offers a method named 'forceUnmap' that actually does this kind of thing. We could simply reuse this method. However, this solution has two drawbacks: 1. If others use MappedByteBuffer later, it will crash. 2. Portability risk, although sun.misc.Cleaner is present in most of major JVM vendors. 2 Explicitly invoke 'System.gc()', which is also a good enough idea, especially when setting -XX:+DisableExplicitGC. 
> Long GC pause harming broker performance which is caused by mmap objects > created for OffsetIndex > > > Key: KAFKA-4614 > URL: https://issues.apache.org/jira/browse/KAFKA-4614 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.1 >Reporter: Yuto Kawamura >Assignee: Yuto Kawamura > > First let me clarify our system environment information as I think it's > important to understand this issue: > OS: CentOS6 > Kernel version: 2.6.32-XX > Filesystem used for data volume: XFS > Java version: 1.8.0_66 > GC option: Kafka default(G1GC) > Kafka version: 0.10.0.1 > h2. Phenomenon > In our Kafka cluster, an usual response time for Produce request is about 1ms > for 50th percentile to 10ms for 99th percentile. All topics are configured to > have 3 replicas and all producers are configured {{acks=all}} so this time > includes replication latency. > Sometimes we observe 99th percentile latency get increased to 100ms ~ 500ms > but for the most cases the time consuming part is "Remote" which means it is > caused by slow replication which is known to happen by various reasons(which > is also an issue that we're trying to improve, but out of interest within > this issue). > However, we found that there are some different patterns which happens rarely > but stationary 3 ~ 5 times a day for each servers. The interesting part is > that "RequestQueue" also got increased as well as "Total" and "Remote". > At the same time, we observed that the disk read metrics(in terms of both > read bytes and read time) spikes exactly for the same moment. Currently we > have only caught up consumers so this metric sticks to zero while all > requests are served by page cache. > In order to investigate what Kafka is "read"ing, I employed SystemTap and > wrote the following script. It traces all disk reads(only actual read by > physical device) made for the data volume by broker process. 
> {code} > global target_pid = KAFKA_PID > global target_dev = DATA_VOLUME > probe ioblock.request { > if (rw == BIO_READ && pid() == target_pid && devname == target_dev) { > t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment > printf("%s,%03d: tid = %d, device = %s, inode = %d, size = %d\n", > ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size) > print_backtrace() > print_ubacktrace() > } > } > {code} > As the result, we could observe many logs like below: > {code} &
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817849#comment-15817849 ] huxi commented on KAFKA-4614: - Great catch on the whole thing. As for unmapping, there seems to be no perfect solution at the moment: Java does not offer a corresponding unmap that invokes the underlying munmap system call for a mapped file. See [the JDK bug|http://bugs.java.com/view_bug.do?bug_id=4724038]. Right now there are two "not-that-perfect" workarounds: 1. Explicitly unmap the mapped file, as [~kawamuray] said, by invoking 'DirectBuffer.cleaner().clean()'. Incidentally, Kafka already offers a method named 'forceUnmap' that does exactly this, so we could simply reuse it. However, this approach has two drawbacks: (a) if anything touches the MappedByteBuffer after it has been unmapped, the JVM will crash; (b) a portability risk, although sun.misc.Cleaner is present in most major JVM vendors' runtimes. 2. Explicitly invoke 'System.gc()', which is also not a good idea, especially when -XX:+DisableExplicitGC is set. > Long GC pause harming broker performance which is caused by mmap objects > created for OffsetIndex > > > Key: KAFKA-4614 > URL: https://issues.apache.org/jira/browse/KAFKA-4614 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.1 >Reporter: Yuto Kawamura >Assignee: Yuto Kawamura > > First let me clarify our system environment information as I think it's > important to understand this issue: > OS: CentOS6 > Kernel version: 2.6.32-XX > Filesystem used for data volume: XFS > Java version: 1.8.0_66 > GC option: Kafka default(G1GC) > Kafka version: 0.10.0.1 > h2. Phenomenon > In our Kafka cluster, an usual response time for Produce request is about 1ms > for 50th percentile to 10ms for 99th percentile. All topics are configured to > have 3 replicas and all producers are configured {{acks=all}} so this time > includes replication latency. 
> Sometimes we observe 99th percentile latency get increased to 100ms ~ 500ms > but for the most cases the time consuming part is "Remote" which means it is > caused by slow replication which is known to happen by various reasons(which > is also an issue that we're trying to improve, but out of interest within > this issue). > However, we found that there are some different patterns which happens rarely > but stationary 3 ~ 5 times a day for each servers. The interesting part is > that "RequestQueue" also got increased as well as "Total" and "Remote". > At the same time, we observed that the disk read metrics(in terms of both > read bytes and read time) spikes exactly for the same moment. Currently we > have only caught up consumers so this metric sticks to zero while all > requests are served by page cache. > In order to investigate what Kafka is "read"ing, I employed SystemTap and > wrote the following script. It traces all disk reads(only actual read by > physical device) made for the data volume by broker process. 
> {code} > global target_pid = KAFKA_PID > global target_dev = DATA_VOLUME > probe ioblock.request { > if (rw == BIO_READ && pid() == target_pid && devname == target_dev) { > t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment > printf("%s,%03d: tid = %d, device = %s, inode = %d, size = %d\n", > ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size) > print_backtrace() > print_ubacktrace() > } > } > {code} > As the result, we could observe many logs like below: > {code} > Thu Dec 22 17:21:39 2016,209: tid = 126123, device = sdb1, inode = -1, size > = 4096 > 0x81275050 : generic_make_request+0x0/0x5a0 [kernel] > 0x81275660 : submit_bio+0x70/0x120 [kernel] > 0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs] > 0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs] > 0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs] > 0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs] > 0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs] > 0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs] > 0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs] > 0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs] > 0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs] > 0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs] > 0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs] > 0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs] > 0xa
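The explicit-unmap workaround discussed in this thread (Kafka's forceUnmap, i.e. invoking the JDK-internal cleaner on a MappedByteBuffer instead of waiting for GC) can be sketched as below. This is an illustrative best-effort version using reflection, not Kafka's actual code; on newer JDKs with strong encapsulation the internal API is inaccessible and the method simply falls back to doing nothing:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Best-effort explicit unmap in the spirit of Kafka's forceUnmap: invoke the
// JDK-internal cleaner reflectively so the mmap is released promptly rather
// than whenever the GC collects the buffer. Sketch only.
public class ForceUnmapSketch {

    static boolean forceUnmap(MappedByteBuffer buffer) {
        try {
            // DirectByteBuffer exposes cleaner(); its return type differs
            // across JDK versions, so everything is resolved reflectively.
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
            return true;
        } catch (Throwable t) {
            // Internals unavailable (e.g. JDK 16+ encapsulation): the GC will
            // unmap eventually, which is exactly the problem in this ticket.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("offset-index", ".idx");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(4096);
            MappedByteBuffer mmap =
                raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mmap.putInt(0, 42);
            int readBack = mmap.getInt(0); // read BEFORE unmapping
            forceUnmap(mmap);              // touching mmap after this may crash
            System.out.println(readBack);
        }
    }
}
```

This illustrates drawback (a) from the comment directly: once forceUnmap succeeds, any further access to the buffer is undefined and can crash the JVM, so the caller must guarantee the buffer is dead.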
[jira] [Commented] (KAFKA-4603) argument error,and command parsed error
[ https://issues.apache.org/jira/browse/KAFKA-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803933#comment-15803933 ] huxi commented on KAFKA-4603: - [~auroraxlh] If you type '--zookeeper.connect' exactly, the command runs successfully, and it only fails when the option name has a typo. Correct? > argument error,and command parsed error > --- > > Key: KAFKA-4603 > URL: https://issues.apache.org/jira/browse/KAFKA-4603 > Project: Kafka > Issue Type: Bug > Components: admin, documentation >Affects Versions: 0.10.0.1, 0.10.2.0 > Environment: suse >Reporter: Xin >Priority: Minor > > according to the 7.6.2 Migrating clusters of document : > ./zookeeper-security-migration.sh --zookeeper.acl=secure > --zookeeper.connection=localhost:2181 > joptsimple.OptionArgumentConversionException: Cannot parse argument > 'localhost:2181' of option zookeeper.connection.timeout > at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93) > at > joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274) > at joptsimple.OptionSet.valuesOf(OptionSet.java:223) > at joptsimple.OptionSet.valueOf(OptionSet.java:172) > at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111) > at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119) > at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala) > Caused by: joptsimple.internal.ReflectionException: > java.lang.NumberFormatException: For input string: "localhost:2181" > at > joptsimple.internal.Reflection.reflectionException(Reflection.java:140) > at joptsimple.internal.Reflection.invoke(Reflection.java:122) > at > joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48) > at joptsimple.internal.Reflection.convertWith(Reflection.java:128) > at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90) > ... 
6 more > Caused by: java.lang.NumberFormatException: For input string: "localhost:2181" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:492) > at java.lang.Integer.valueOf(Integer.java:582) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at joptsimple.internal.Reflection.invoke(Reflection.java:119) > ... 9 more > ===> the argument "zookeeper.connection" has been parsed as > "zookeeper.connection.timeout" > using the help output I found that the arguments are: > --zookeeper.connect Sets the ZooKeeper connect string > (ensemble). This parameter takes a > comma-separated list of host:port > pairs. (default: localhost:2181) > --zookeeper.connection.timeout Sets the ZooKeeper connection timeout. > the documentation is wrong, and the code also has a problem: > in ZkSecurityMigrator.scala, > val parser = new OptionParser() ==> > Any of --v, --ve, ... are accepted on the command line and treated as though > you had typed --verbose. > To suppress this behavior, use the OptionParser constructor > OptionParser(boolean allowAbbreviations) and pass a value of false.
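The misrouting described above, where joptsimple's abbreviation matching resolves the nonexistent `--zookeeper.connection` to `--zookeeper.connection.timeout`, can be illustrated with a small prefix-matching sketch. This mimics the behavior in plain Java; it is not joptsimple's actual code, and the names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of how option-name abbreviation misroutes the flag in KAFKA-4603.
public class AbbreviationSketch {

    // An exact match wins; otherwise any declared option that starts with the
    // typed text is considered a match (the "abbreviation" feature).
    static List<String> resolve(String typed, List<String> declared) {
        if (declared.contains(typed)) {
            return Arrays.asList(typed);
        }
        List<String> matches = new ArrayList<>();
        for (String opt : declared) {
            if (opt.startsWith(typed)) {
                matches.add(opt);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList(
            "zookeeper.connect", "zookeeper.connection.timeout", "zookeeper.acl");
        // The user typed an option that does not exist, and it silently
        // resolved to zookeeper.connection.timeout, which then failed to
        // parse "localhost:2181" as an integer.
        System.out.println(resolve("zookeeper.connection", declared));
    }
}
```

Disabling abbreviations, as the reporter notes for joptsimple's OptionParser(false) constructor, would make the typed option an error instead of a silent mismatch.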
[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers
[ https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803804#comment-15803804 ] huxi commented on KAFKA-4595: - Yes, it seems the process might look like this: 1. At some point, a delete-topic operation began, which resumed the DeleteTopicsThread. That thread acquired the controller lock, ran successfully, and released the lock, but the request send thread had not yet begun to execute the callback. 2. Later, something went wrong with the network, and the broker-change ZK listener thread was triggered to remove the dead brokers. It acquired the controller lock, shut down the request send thread for the dead broker, and waited. 3. Meanwhile, the request send thread began to execute deleteTopicStopReplicaCallback, which also tried to acquire the controller lock, so it waited forever and could never be shut down. 4. Deadlock. [~pengwei] Does that make sense? > Controller send thread can't stop when broker change listener event trigger > for dead brokers > - > > Key: KAFKA-4595 > URL: https://issues.apache.org/jira/browse/KAFKA-4595 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.0, 0.10.1.1 >Reporter: Pengwei > Fix For: 0.10.2.0 > > > In our test env, we found controller is not working after a delete topic > operation and network issue, the stack is below: > "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" > #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on > condition [0x7fb76b7c8000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xc05497b8> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at > kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50) > at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32) > at > kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128) > at > kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81) > - locked <0xc0258760> (a java.lang.Object) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369) > at scala.collection.immutable.Set$Set1.foreach(Set.scala:79) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359) > at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33) > at > 
kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262) > at > kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChan
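The cycle in the comment above (the ZK listener holds the controller lock and awaits the send thread's shutdown latch, while the send thread's callback needs that same lock to finish) can be reproduced in miniature. A hypothetical sketch, using tryLock/await with timeouts so the demo terminates instead of hanging like the real controller did:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Miniature of the KAFKA-4595 deadlock: one thread holds the "controller
// lock" while waiting for another thread to finish, but that other thread
// needs the controller lock to finish. Timeouts keep the demo bounded.
public class ControllerDeadlockSketch {

    static boolean simulate() {
        ReentrantLock controllerLock = new ReentrantLock();
        CountDownLatch sendThreadDone = new CountDownLatch(1);

        // The "request send thread" executing deleteTopicStopReplicaCallback:
        // it must take the controller lock before it can finish.
        Thread sendThread = new Thread(() -> {
            try {
                if (controllerLock.tryLock(1, TimeUnit.SECONDS)) {
                    try {
                        sendThreadDone.countDown(); // callback completed
                    } finally {
                        controllerLock.unlock();
                    }
                }
            } catch (InterruptedException ignored) { }
        });

        // The "ZK listener thread": holds the controller lock while awaiting
        // the send thread's shutdown, the exact cycle in the stack trace.
        controllerLock.lock();
        boolean finished;
        try {
            sendThread.start();
            finished = sendThreadDone.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            finished = false;
        } finally {
            controllerLock.unlock();
        }
        try { sendThread.join(); } catch (InterruptedException ignored) { }
        return finished; // false: the latch never counted down under the lock
    }

    public static void main(String[] args) {
        System.out.println(simulate() ? "no deadlock" : "deadlock");
    }
}
```

In the real controller there are no timeouts, so step 2's awaitShutdown blocks forever; the usual fix for this pattern is to release the lock before awaiting the other thread, or to run the callback without taking the lock.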
[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped
[ https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803315#comment-15803315 ] huxi commented on KAFKA-4599: - What kind of Storm-supplied kafka client do you use? storm-kafka or storm-kafka-client ? And the version? > KafkaConsumer encounters SchemaException when Kafka broker stopped > -- > > Key: KAFKA-4599 > URL: https://issues.apache.org/jira/browse/KAFKA-4599 > Project: Kafka > Issue Type: Bug > Components: consumer >Reporter: Andrew Olson > > We recently observed an issue in production that can apparently occur a small > percentage of the time when a Kafka broker is stopped. We're using version > 0.9.0.1 for all brokers and clients. > During a recent episode, 3 KafkaConsumer instances (out of approximately 100) > ran into the following SchemaException within a few seconds of instructing > the broker to shutdown. > {noformat} > 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: > Error reading field 'responses': Error reading array of size 2774863, only 62 > bytes available > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71) > at > org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) > {noformat} > The exception message was slightly different for one consumer, > {{Error reading field 'responses': Error reading array of size 2774863, only > 260 
bytes available}} > The exception was not caught and caused the Storm Executor thread to restart, > so it's not clear if it would have been transient or fatal for the > KafkaConsumer. > Here are the initial broker shutdown logs, > {noformat} > 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], > shutting down > 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutting down > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Stopped > 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-1-40], Shutdown completed > 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: > [ReplicaFetcherThread-3-30], Shutting down > 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], > Controlled shutdown succeeded > 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutting down > 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on > Broker 4], Shutdown completed > {noformat} > We've found one very similar reported occurrence, > http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
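The exception text ("Error reading array of size 2774863, only 62 bytes available") is characteristic of a size-prefixed wire field whose declared length exceeds the bytes actually received, i.e. a truncated or corrupt response, which fits a broker dropping connections mid-shutdown. A minimal sketch of such a length check (hypothetical names, not Kafka's actual Schema code):

```java
import java.nio.ByteBuffer;

// Sketch of reading a size-prefixed array, as a wire-protocol decoder does.
// A truncated response leaves fewer bytes than the declared size, producing
// an error like the SchemaException in this ticket. Names are illustrative.
public class SizedArraySketch {

    static byte[] readSizedArray(ByteBuffer buf) {
        int size = buf.getInt(); // declared byte count from the header
        if (size < 0 || size > buf.remaining()) {
            throw new IllegalStateException("Error reading array of size "
                + size + ", only " + buf.remaining() + " bytes available");
        }
        byte[] out = new byte[size];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        // A well-formed message: 4-byte size prefix followed by 3 bytes.
        ByteBuffer ok = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3});
        ok.flip();
        System.out.println(readSizedArray(ok).length);

        // A truncated message: the prefix claims far more bytes than arrived.
        ByteBuffer truncated = ByteBuffer.allocate(8).putInt(2774863).putInt(0);
        truncated.flip();
        try {
            readSizedArray(truncated);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```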
[jira] [Comment Edited] (KAFKA-4592) Kafka Producer Metrics Invalid Value
[ https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801348#comment-15801348 ] huxi edited comment on KAFKA-4592 at 1/5/17 2:36 PM: - Seems that o.a.k.common.metrics.stats.Max's default constructor assigns Double.NEGATIVE_INFINITY as the initial value and picks NEGATIVE_INFINITY in combine, but for record size and the like, a value of zero is more reasonable. was (Author: huxi_2b): Seems that o.a.k.common.metrics.stats.Max's default constructor assigns Double.NEGATIVE_INFINITY as the initial value, but for the record size and like, value of zero is more reasonable. > Kafka Producer Metrics Invalid Value > > > Key: KAFKA-4592 > URL: https://issues.apache.org/jira/browse/KAFKA-4592 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 0.10.1.1 >Reporter: AJ Jwair >Assignee: huxi > > Producer metrics > Metric name: record-size-max > When no records are produced during the monitoring window, the > record-size-max has an invalid value of -9.223372036854776E16 > Please notice that the value is not a very small number close to zero bytes, > it is negative 90 quadrillion bytes > The same behavior was observed in: records-lag-max
[jira] [Assigned] (KAFKA-4592) Kafka Producer Metrics Invalid Value
[ https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4592: --- Assignee: huxi > Kafka Producer Metrics Invalid Value > > > Key: KAFKA-4592 > URL: https://issues.apache.org/jira/browse/KAFKA-4592 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 0.10.1.1 >Reporter: AJ Jwair >Assignee: huxi > > Producer metrics > Metric name: record-size-max > When no records are produced during the monitoring window, the > record-size-max has an invalid value of -9.223372036854776E16 > Please notice that the value is not a very small number close to zero bytes, > it is negative 90 quadrillion bytes > The same behavior was observed in: records-lag-max -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4592) Kafka Producer Metrics Invalid Value
[ https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801348#comment-15801348 ] huxi edited comment on KAFKA-4592 at 1/5/17 1:31 PM: - Seems that o.a.k.common.metrics.stats.Max's default constructor assigns Double.NEGATIVE_INFINITY as the initial value, but for the record size and like, value of zero is more reasonable. was (Author: huxi_2b): Seems that o.a.k.common.metrics.stats.Max's default constructor assign Double.NEGATIVE_INFINITY as the initial value, but for the record size and like, value of zero is more reasonable. > Kafka Producer Metrics Invalid Value > > > Key: KAFKA-4592 > URL: https://issues.apache.org/jira/browse/KAFKA-4592 > Project: Kafka > Issue Type: Bug > Components: core, producer >Affects Versions: 0.10.1.1 >Reporter: AJ Jwair > > Producer metrics > Metric name: record-size-max > When no records are produced during the monitoring window, the > record-size-max has an invalid value of -9.223372036854776E16 > Please notice that the value is not a very small number close to zero bytes, > it is negative 90 quadrillion bytes > The same behavior was observed in: records-lag-max -- This message was sent by Atlassian JIRA (v6.3.4#6332)
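The NEGATIVE_INFINITY initialization discussed in this thread explains the bogus record-size-max reading: the max over an empty window is the sentinel initial value itself, not zero, so an absurd negative number surfaces as the metric. A simplified model of a max statistic (this is a sketch, not Kafka's actual Max/SampledStat implementation):

```java
// Simplified model of a windowed "max" statistic. With no samples recorded,
// the measured value is the sentinel initial value rather than zero, which is
// how record-size-max can surface as a huge negative number.
public class MaxStatSketch {

    private double max;

    public MaxStatSketch(double initialValue) {
        this.max = initialValue;
    }

    public void record(double value) {
        max = Math.max(max, value);
    }

    public double measure() {
        return max;
    }

    public static void main(String[] args) {
        // Kafka-style initialization: an empty window leaks the sentinel.
        MaxStatSketch buggy = new MaxStatSketch(Double.NEGATIVE_INFINITY);
        System.out.println(buggy.measure());

        // The fix suggested in the comment: start size-like maxima at zero.
        MaxStatSketch saner = new MaxStatSketch(0.0);
        saner.record(512.0);
        System.out.println(saner.measure());
    }
}
```

The actual reported value (-9.223372036854776E16 rather than -Infinity) comes from further arithmetic Kafka applies when combining samples, but the root cause is the same sentinel escaping an empty window.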
[jira] [Reopened] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reopened KAFKA-4295: - > kafka-console-consumer.sh does not delete the temporary group in zookeeper > -- > > Key: KAFKA-4295 > URL: https://issues.apache.org/jira/browse/KAFKA-4295 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1 >Reporter: Sswater Shi >Assignee: huxi >Priority: Minor > > I'm not sure it is a bug or you guys designed it. > Since 0.9.x.x, the kafka-console-consumer.sh will not delete the group > information in zookeeper/consumers on exit when without "--new-consumer". > There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if > kafka-console-consumer.sh runs a lot of times. > When 0.8.x.x, the kafka-console-consumer.sh can be followed by an argument > "group". If not specified, the kafka-console-consumer.sh will create a > temporary group name like 'console-consumer-'. If the group name is > specified by "group", the information in the zookeeper/consumers will be kept > on exit. If the group name is a temporary one, the information in the > zookeeper will be deleted when kafka-console-consumer.sh is quitted by > Ctrl+C. Why this is changed from 0.9.x.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-4434) KafkaProducer configuration is logged twice
[ https://issues.apache.org/jira/browse/KAFKA-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi resolved KAFKA-4434. - Resolution: Fixed Reviewer: Ismael Juma > KafkaProducer configuration is logged twice > --- > > Key: KAFKA-4434 > URL: https://issues.apache.org/jira/browse/KAFKA-4434 > Project: Kafka > Issue Type: Bug > Components: config >Affects Versions: 0.10.0.1 >Reporter: Ruben de Gooijer >Assignee: huxi >Priority: Minor > Labels: newbie > Fix For: 0.10.2.0 > > > The constructor of org.apache.kafka.clients.producer.KafkaProducer accepts a > ProducerConfig which when constructed logs the configuration: > https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java#L58 > . > However, when the construction of KafkaProducer proceeds the provided > ProducerConfig is repurposed and another instance is created > https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L323 > which triggers another log with the same contents (only the clientId can > differ in case its not supplied in the original config). > At first sight this seems like unintended behaviour to me. At least it caused > me to dive into it in order to verify if there weren't two producer instances > running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
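The pattern the reporter describes — a config object that logs its values from its own constructor being built twice — can be sketched as follows. The class names are invented for illustration; this is not Kafka's actual AbstractConfig or KafkaProducer code.

```java
import java.util.Map;

// Illustrative sketch of the double-logging pattern described in KAFKA-4434.
public class DoubleLogSketch {
    static int timesLogged = 0; // counts constructor-time log lines

    static class LoggingConfig {
        LoggingConfig(Map<String, Object> values) {
            timesLogged++;
            System.out.println("ConfigValues: " + values); // logged on every construction
        }
    }

    static class SketchProducer {
        SketchProducer(Map<String, Object> values) {
            LoggingConfig supplied = new LoggingConfig(values); // first log line
            // The producer rebuilds the config internally (e.g. to inject a
            // generated client.id), which logs the same values a second time:
            LoggingConfig internal = new LoggingConfig(values); // second, near-identical log line
        }
    }

    public static void main(String[] args) {
        new SketchProducer(Map.of("acks", "all"));
        System.out.println("config logged " + timesLogged + " times"); // 2
    }
}
```

The duplicate output looks like two producer instances starting, which is exactly the confusion the reporter ran into.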
[jira] [Assigned] (KAFKA-3739) Add no-arg constructor for library provided serdes
[ https://issues.apache.org/jira/browse/KAFKA-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-3739: --- Assignee: huxi > Add no-arg constructor for library provided serdes > -- > > Key: KAFKA-3739 > URL: https://issues.apache.org/jira/browse/KAFKA-3739 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Guozhang Wang >Assignee: huxi > Labels: newbie, user-experience > > We need to add the no-arg constructor explicitly for those library-provided > serdes such as {{WindowedSerde}} that already have constructors with > arguments. Otherwise they cannot be used through configs which are expecting > to construct them via reflections with no-arg constructors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
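The no-arg constructor matters because config-driven frameworks instantiate classes by name via reflection, which requires one. A minimal sketch, with an invented serde class standing in for library serdes such as {{WindowedSerde}}:

```java
// Sketch of config-driven instantiation via reflection; WindowedLikeSerde is
// an invented stand-in, not Kafka's actual WindowedSerde.
public class SerdeReflectionSketch {
    public static class WindowedLikeSerde {
        private final String innerType;
        public WindowedLikeSerde() { this("byte[]"); }    // the no-arg constructor the ticket asks to add
        public WindowedLikeSerde(String innerType) { this.innerType = innerType; }
        public String innerType() { return innerType; }
    }

    public static void main(String[] args) throws Exception {
        // A config typically carries only the class name, so the framework
        // must call the no-arg constructor reflectively:
        String className = SerdeReflectionSketch.class.getName() + "$WindowedLikeSerde";
        Object serde = Class.forName(className).getDeclaredConstructor().newInstance();
        System.out.println(((WindowedLikeSerde) serde).innerType()); // byte[]
    }
}
```

Without the no-arg constructor, `getDeclaredConstructor().newInstance()` throws NoSuchMethodException, which is why such serdes cannot currently be supplied through configs.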
[jira] [Resolved] (KAFKA-4308) Inconsistent parameters between console producer and consumer
[ https://issues.apache.org/jira/browse/KAFKA-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi resolved KAFKA-4308. - Resolution: Duplicate > Inconsistent parameters between console producer and consumer > - > > Key: KAFKA-4308 > URL: https://issues.apache.org/jira/browse/KAFKA-4308 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 >Reporter: Gwen Shapira > Labels: newbie > > kafka-console-producer uses --broker-list while kafka-console-consumer uses > --bootstrap-server. > Let's add --bootstrap-server to the producer for some consistency? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-4434) KafkaProducer configuration is logged twice
[ https://issues.apache.org/jira/browse/KAFKA-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huxi reassigned KAFKA-4434: --- Assignee: huxi (was: kevin.chen) > KafkaProducer configuration is logged twice > --- > > Key: KAFKA-4434 > URL: https://issues.apache.org/jira/browse/KAFKA-4434 > Project: Kafka > Issue Type: Bug > Components: config >Affects Versions: 0.10.0.1 >Reporter: Ruben de Gooijer >Assignee: huxi >Priority: Minor > Labels: newbie > Fix For: 0.10.2.0 > > > The constructor of org.apache.kafka.clients.producer.KafkaProducer accepts a > ProducerConfig which when constructed logs the configuration: > https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java#L58 > . > However, when the construction of KafkaProducer proceeds the provided > ProducerConfig is repurposed and another instance is created > https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L323 > which triggers another log with the same contents (only the clientId can > differ in case its not supplied in the original config). > At first sight this seems like unintended behaviour to me. At least it caused > me to dive into it in order to verify if there weren't two producer instances > running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4576) Log segments close to max size break on fetch
[ https://issues.apache.org/jira/browse/KAFKA-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795040#comment-15795040 ] huxi commented on KAFKA-4576: - If we use a loop structure to collect all 12 bytes, there is a possibility that the loop never exits, although that should be an extremely rare situation. Should we handle that case if we decide to employ the loop? > Log segments close to max size break on fetch > - > > Key: KAFKA-4576 > URL: https://issues.apache.org/jira/browse/KAFKA-4576 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.1 >Reporter: Ivan Babrou >Assignee: huxi >Priority: Critical > Fix For: 0.10.2.0 > > > We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1). > Max segment size is set to 2147483647 globally, that's 1 byte less than max > signed int32. > Every now and then we see failures like this: > {noformat} > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on > Broker 1006]: Error processing fetch operation on partition [mytopic,11], > offset 483579108587 (kafka.server.ReplicaManager) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: > java.lang.IllegalStateException: Failed to read complete buffer for > targetOffset 483686627237 startPosition 2145701130 in > /disk/data0/kafka-logs/mytopic-11/483571890786.log > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.LogSegment.translateOffset(LogSegment.scala:128) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.LogSegment.read(LogSegment.scala:180) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.Log.read(Log.scala:563) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > 
kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.Iterator$class.foreach(Iterator.scala:893) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaApis.handle(KafkaApis.scala:79) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > java.lang.Thread.run(Thread.java:745) > {noformat} > {noformat} > ... 
> -rw-r--r-- 1 kafka kafka 0 Dec 25 15:15 > 483557418204.timeindex > -rw-r--r-- 1 kafka kafka 9496 Dec 25 15:26 483564654488.index > -rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:26 > 483564654488.timeindex > -rw-r--r-- 1 kafka kafka 9576 Dec 25 15:37 483571890786.index > -rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:37 > 483571890786.timeindex > -rw-r--r-- 1 kafka kafka 9568 Dec 25 15:48 483579135712.index > -rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:48 > 483579135712.timeindex > -rw-r--r-- 1 kafka kafka 9408 Dec 25 15:59 483586374164.index > ... > {noformat} > Here 483571890786.log is just 3 bytes below the max size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4576) Log segments close to max size break on fetch
[ https://issues.apache.org/jira/browse/KAFKA-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794042#comment-15794042 ] huxi commented on KAFKA-4576: - It seems FileChannel.read does not fill up the 12-byte buffer even though there are enough bytes to read. The javadoc does not specify exactly how many bytes will be read into the buffer. In practice we are likely to always get a full buffer when using FileChannel, but Java makes no guarantee about that; it depends on the OS implementation. Maybe we should use a while loop to make sure all 12 bytes have been read into the buffer without violating the corner-case check. Does that make sense? [~guozhang] [~ijuma] [~becket_qin] > Log segments close to max size break on fetch > - > > Key: KAFKA-4576 > URL: https://issues.apache.org/jira/browse/KAFKA-4576 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.10.1.1 >Reporter: Ivan Babrou > > We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1). > Max segment size is set to 2147483647 globally, that's 1 byte less than max > signed int32. 
> Every now and then we see failures like this: > {noformat} > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on > Broker 1006]: Error processing fetch operation on partition [mytopic,11], > offset 483579108587 (kafka.server.ReplicaManager) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: > java.lang.IllegalStateException: Failed to read complete buffer for > targetOffset 483686627237 startPosition 2145701130 in > /disk/data0/kafka-logs/mytopic-11/483571890786.log > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.LogSegment.translateOffset(LogSegment.scala:128) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.LogSegment.read(LogSegment.scala:180) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.log.Log.read(Log.scala:563) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.Iterator$class.foreach(Iterator.scala:893) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605) > Dec 25 18:20:47 myhost 
kafka-run-class.sh[2054]: at > kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaApis.handle(KafkaApis.scala:79) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at > java.lang.Thread.run(Thread.java:745) > {noformat} > {noformat} > ... > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:15 > 483557418204.timeindex > -rw-r--r-- 1 kafka kafka 9496 Dec 25 15:26 483564654488.index > -rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:26 > 483564654488.timeindex > -rw-r--r-- 1 kafka kafka 9576 Dec 25 15:37 483571890786.index > -rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log > -rw-r--r-- 1 kafka kafka 0 Dec 25 15:37 > 483571890786.timeindex > -rw-r--r-- 1 kafka kafka 9568 Dec 25 15:48 483579135712.index > -rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log > -rw-r--r-- 1 kafka kafka
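The while-loop fix huxi proposes in the comments on this issue, plus a guard against the never-exiting loop raised in the follow-up comment, might look roughly like this. The helper name and its EOF handling are assumptions for illustration, not the actual Kafka patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of the proposed read loop; not the actual Kafka fix.
public final class ReadFully {
    /**
     * Keeps calling FileChannel.read until the buffer is full, since a single
     * read is not guaranteed to return all requested bytes. Bails out with an
     * exception at end-of-file so the loop cannot spin forever.
     */
    public static void readFully(FileChannel channel, ByteBuffer buffer, long position) throws IOException {
        long pos = position;
        while (buffer.hasRemaining()) {
            int bytesRead = channel.read(buffer, pos);
            if (bytesRead < 0) // EOF before the buffer filled: fail fast instead of looping forever
                throw new IOException("Hit end of file before reading a full buffer at position " + position);
            pos += bytesRead;
        }
    }
}
```

Checking for a negative return value is what bounds the loop: FileChannel.read returns -1 at end-of-stream, so an undersized file raises an exception rather than leaving the loop spinning on a buffer that can never fill.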
[jira] [Commented] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers
[ https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790517#comment-15790517 ] huxi commented on KAFKA-4577: - Could you elaborate on the steps you ran into this problem? > NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers > -- > > Key: KAFKA-4577 > URL: https://issues.apache.org/jira/browse/KAFKA-4577 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 >Reporter: Scott Reynolds > > Seems as if either deleteTopicManager or deleteTopicManager. > partitionsToBeDeleted wasn't set ? > {code} > java.lang.NullPointerException > at > kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers > (ControllerChannelManager.scala:331) > at kafka.controller.KafkaController.sendUpdateMetadataRequest > (KafkaController.scala:1023) > at > kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications > (KafkaController.scala:1371) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp > (KafkaController.scala:1358) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234) > at kafka.controller.IsrChangeNotificationListener.handleChildChange > (KafkaController.scala:1351) > at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843) > at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers
[ https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790517#comment-15790517 ] huxi edited comment on KAFKA-4577 at 1/1/17 4:26 AM: - Could you elaborate on the steps that led to this problem? was (Author: huxi_2b): Could you elaborate on the steps you ran into this problem? > NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers > -- > > Key: KAFKA-4577 > URL: https://issues.apache.org/jira/browse/KAFKA-4577 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.0 >Reporter: Scott Reynolds > > Seems as if either deleteTopicManager or deleteTopicManager. > partitionsToBeDeleted wasn't set ? > {code} > java.lang.NullPointerException > at > kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers > (ControllerChannelManager.scala:331) > at kafka.controller.KafkaController.sendUpdateMetadataRequest > (KafkaController.scala:1023) > at > kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications > (KafkaController.scala:1371) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp > (KafkaController.scala:1358) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at > kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply > (KafkaController.scala:1351) > at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234) > at kafka.controller.IsrChangeNotificationListener.handleChildChange > (KafkaController.scala:1351) > at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843) > at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-4573) Producer sporadic timeout
[ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784692#comment-15784692 ] huxi commented on KAFKA-4573: - Is it possible it's caused by a transient network error? > Producer sporadic timeout > - > > Key: KAFKA-4573 > URL: https://issues.apache.org/jira/browse/KAFKA-4573 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.9.0.1 >Reporter: Ankur C > > We had production outage due to sporadic kafka producer timeout. About 1 to > 2% of the message would timeout continuously. > Kafka version - 0.9.0.1 > #Kafka brokers - 5 > #Replication for each topic - 3 > #Number of topics - ~30 > #Number of partition - ~300 > We have kafka 0.9.0.1 running in our 5 broker cluster for 1 month without any > issues. However, on Dec 23rd we saw sporadic kafka producer timeout. > Issue begin around 6:51am and continued until we bounced kafka broker. > 6:51am Underreplication started on small number of topics > 6:53am All underreplication recovered > 11:00am We restarted all kafka producer writer app but this didn't solve the > sporadic kafka producer timeout issue > 12:01pm We restarted all kafka broker after this the issue was resolved. > Kafka metrics and kafka logs doesn't show any major issue. There were no > offline partitions during the outage and #controller was exactly 1. > We only saw following exception in kafka broker in controller.log. This log > was present for all broker 0 to 4. 
> java.io.IOException: Connection to 2 was disconnected before the response was > read at > kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87) > at > kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84) > at scala.Option.foreach(Option.scala:236) at > kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84) > at > kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80) > at > kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129) > at > kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139) > at > kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) > [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], > Controller 2 epoch 18 fails to send request > {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]} > to 
broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. > (kafka.controller.RequestSendThread) -- This message was sent by Atlassian JIRA (v6.3.4#6332)