[jira] [Commented] (KAFKA-1630) ConsumerFetcherThread locked in Tomcat
[ https://issues.apache.org/jira/browse/KAFKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130576#comment-14130576 ]

Neha Narkhede commented on KAFKA-1630:
--------------------------------------

Can you describe the observed behavior a little more? Is the consumer lagging? (Use the ConsumerOffsetChecker.) Is your fetch size set larger than the largest message in the topic?

ConsumerFetcherThread locked in Tomcat
--------------------------------------

Key: KAFKA-1630
URL: https://issues.apache.org/jira/browse/KAFKA-1630
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.8.0
Environment: linux redhat
Reporter: vijay
Assignee: Neha Narkhede
Labels: performance
Original Estimate: 12h
Remaining Estimate: 12h

I am using the high-level consumer API for consuming messages from Kafka. The ConsumerFetcherThread gets locked. Kindly look into the stack trace below:

ConsumerFetcherThread-SocialTwitterStream6_172.31.240.136-1410398702143-61a247c3-0-1 prio=10 tid=0x7f294001e800 nid=0x1677 runnable [0x7f297aae9000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked 0x7f2a7c38eb40 (a sun.nio.ch.Util$1)
	- locked 0x7f2a7c38eb28 (a java.util.Collections$UnmodifiableSet)
	- locked 0x7f2a7c326f20 (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:193)
	- locked 0x7f2a7c2163c0 (a java.lang.Object)
	at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
	- locked 0x7f2a7c229950 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
	at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:200)
	- locked 0x7f2a7c38ea50 (a java.lang.Object)
	at kafka.utils.Utils$.read(Utils.scala:395)
	at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
	at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
	at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
	at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
	at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
	at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
	- locked 0x7f2a7c38e9f0 (a java.lang.Object)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
	at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
	at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
	at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:94)
	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
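The fetch-size question above can be restated as a concrete check: in the 0.8 high-level consumer, fetch.message.max.bytes must be at least the size of the largest message in the topic, or the fetcher cannot make progress past it. A minimal illustrative sketch (the class and the message sizes are hypothetical, not Kafka code):

```java
import java.util.Properties;

public class FetchSizeCheck {
    // A fetch can only return a message if the configured fetch size is at
    // least that message's size; a larger message stalls the fetcher.
    static boolean canFetch(int fetchMessageMaxBytes, int largestMessageBytes) {
        return fetchMessageMaxBytes >= largestMessageBytes;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("fetch.message.max.bytes", "1048576"); // 1 MiB, the 0.8 default
        int fetchSize = Integer.parseInt(props.getProperty("fetch.message.max.bytes"));
        System.out.println(canFetch(fetchSize, 2 * 1024 * 1024)); // a 2 MiB message
    }
}
```

If this check fails for your topic, raising fetch.message.max.bytes on the consumer is the usual remedy.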
[jira] [Comment Edited] (KAFKA-1630) ConsumerFetcherThread locked in Tomcat
[ https://issues.apache.org/jira/browse/KAFKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130576#comment-14130576 ]

Neha Narkhede edited comment on KAFKA-1630 at 9/11/14 7:35 PM:
---------------------------------------------------------------

Can you describe the observed behavior a little more? Is the consumer lagging? (Use the ConsumerOffsetChecker.) Is your fetch size set larger than the largest message in the topic? Also, these sorts of exploratory questions are better suited to the mailing list; a JIRA is created after a potential bug or solution is identified.

was (Author: nehanarkhede):
Can you describe the observed behavior a little more? Is the consumer lagging? (Use the ConsumerOffsetChecker.) Is your fetch size set larger than the largest message in the topic?

ConsumerFetcherThread locked in Tomcat
--------------------------------------

Key: KAFKA-1630
URL: https://issues.apache.org/jira/browse/KAFKA-1630
Project: Kafka
Issue Type: Bug
Components: consumer
Affects Versions: 0.8.0
Environment: linux redhat
Reporter: vijay
Assignee: Neha Narkhede
Labels: performance
Original Estimate: 12h
Remaining Estimate: 12h

I am using the high-level consumer API for consuming messages from Kafka. The ConsumerFetcherThread gets locked.
[jira] [Updated] (KAFKA-1628) [New Java Producer] Topic which contains . does not correct corresponding metric name
[ https://issues.apache.org/jira/browse/KAFKA-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1628:
---------------------------------
Labels: newbie (was: )

[New Java Producer] Topic which contains . does not correct corresponding metric name
-------------------------------------------------------------------------------------

Key: KAFKA-1628
URL: https://issues.apache.org/jira/browse/KAFKA-1628
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 0.8.2
Environment: ALL
Reporter: Bhavesh Mistry
Priority: Minor
Labels: newbie

Hmm, it seems that we do allow . in the topic name. The topic name can't be just . or .. though. So, if there is a topic test.1, we will have the following JMX metric name:

kafka.producer.console-producer.topic.test:type=1

It should be changed to:

kafka.producer.console-producer.topic:type=test.1

Could you file a jira to follow up on this? Thanks, Jun
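The fix Jun describes is to keep the full topic name intact after `type=` instead of letting the topic's own dots extend the metric group. A hedged sketch of that naming rule (the helper is illustrative, not Kafka's actual code):

```java
public class TopicMetricName {
    // Build the JMX name with the topic kept whole after "type=", so a topic
    // like "test.1" is not split into group "...topic.test" / type "1".
    static String metricName(String clientId, String topic) {
        return "kafka.producer." + clientId + ".topic:type=" + topic;
    }

    public static void main(String[] args) {
        System.out.println(metricName("console-producer", "test.1"));
        // -> kafka.producer.console-producer.topic:type=test.1
    }
}
```

The key design point: the topic goes into an attribute value position, where dots are opaque, rather than into the dot-delimited metric group.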
[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work
[ https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130611#comment-14130611 ]

Neha Narkhede commented on KAFKA-1558:
--------------------------------------

bq. I think what we need now is to test deleteTopic under failure modes - leader election, partition reassignment, etc.

Yes, it would help if someone who has cycles can take this up and basically try to break delete topic under various failure scenarios. Currently, the approach to fixing delete topic is a little ad hoc.

AdminUtils.deleteTopic does not work
------------------------------------

Key: KAFKA-1558
URL: https://issues.apache.org/jira/browse/KAFKA-1558
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
Priority: Blocker
Fix For: 0.8.2

The AdminUtils.deleteTopic method is implemented as

{code}
def deleteTopic(zkClient: ZkClient, topic: String) {
  ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
}
{code}

but the DeleteTopicCommand actually does

{code}
zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
{code}

so I guess that the 'createPersistentPath' above should actually be

{code}
def deleteTopic(zkClient: ZkClient, topic: String) {
  ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
}
{code}
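The two code paths in the report write to different parts of the zookeeper tree: AdminUtils.deleteTopic only creates a marker znode under the delete-topics admin path, while DeleteTopicCommand removed the topic znode itself. A sketch of the two paths involved (constants copied from memory of 0.8.x ZkUtils; treat them as an assumption, not a citation):

```java
public class TopicZkPaths {
    // Path constants as in 0.8.x ZkUtils (assumed, see lead-in)
    static final String BROKER_TOPICS_PATH = "/brokers/topics";
    static final String DELETE_TOPICS_PATH = "/admin/delete_topics";

    static String getTopicPath(String topic) {
        return BROKER_TOPICS_PATH + "/" + topic;
    }

    static String getDeleteTopicPath(String topic) {
        return DELETE_TOPICS_PATH + "/" + topic;
    }

    public static void main(String[] args) {
        // AdminUtils.deleteTopic creates a marker here ...
        System.out.println(getDeleteTopicPath("test")); // /admin/delete_topics/test
        // ... while DeleteTopicCommand deleted the topic znode here:
        System.out.println(getTopicPath("test"));       // /brokers/topics/test
    }
}
```

The marker approach only works once a controller component actually watches the delete path and performs the deletion, which is what the follow-up testing discussion is about.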
[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work
[ https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130630#comment-14130630 ]

Sriharsha Chintalapani commented on KAFKA-1558:
-----------------------------------------------

[~gwenshap] Thanks for the info. [~nehanarkhede] I am working on testing the cases mentioned above. Thanks.
[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work
[ https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130641#comment-14130641 ]

Neha Narkhede commented on KAFKA-1558:
--------------------------------------

Great. Thanks, [~sriharsha], for taking this on.
[jira] [Updated] (KAFKA-1591) Clean-up Unnecessary stack trace in error/warn logs
[ https://issues.apache.org/jira/browse/KAFKA-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1591:
---------------------------------
Assignee: Abhishek Sharma

Clean-up Unnecessary stack trace in error/warn logs
---------------------------------------------------

Key: KAFKA-1591
URL: https://issues.apache.org/jira/browse/KAFKA-1591
Project: Kafka
Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Abhishek Sharma
Labels: newbie
Fix For: 0.9.0
Attachments: Jira-1591-SocketConnection-Warning.patch, Jira1591-SendProducerRequest-Warning.patch

Some of the unnecessary stack traces in error/warning log entries can easily pollute the log files. Examples include KAFKA-1066, etc.
[jira] [Created] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
Ryan Berdeen created KAFKA-1631:
--------------------------------

Summary: ReplicationFactor and under-replicated partitions incorrect during reassignment
Key: KAFKA-1631
URL: https://issues.apache.org/jira/browse/KAFKA-1631
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen

We have a topic with a replication factor of 3. We monitor UnderReplicatedPartitions as recommended by the documentation. During a partition reassignment, partitions being reassigned are reported as under-replicated. Running a describe shows:

{code}
Topic:activity-wal-1	PartitionCount:15	ReplicationFactor:5	Configs:
	Topic: activity-wal-1	Partition: 0	Leader: 14	Replicas: 14,13,12,11,15	Isr: 14,12,11,13
	Topic: activity-wal-1	Partition: 1	Leader: 14	Replicas: 15,14,11	Isr: 14,11
	Topic: activity-wal-1	Partition: 2	Leader: 11	Replicas: 11,15,12	Isr: 12,11,15
	...
{code}

It looks like the displayed replication factor, 5, is simply the number of replicas listed for the first partition, which includes both the brokers in the current list and those onto which the partition is being reassigned. Partition 0 is also included in the list when using the `--under-replicated-partitions` option, even though it is replicated to more brokers than the true replication factor. During a reassignment, the under-replicated partitions metric is not usable, meaning that actual under-replicated partitions can go unnoticed.
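The report boils down to which baseline the under-replication check compares the ISR against. A simplified restatement (this is an illustration of the behavior described above, not Kafka's source):

```java
import java.util.Arrays;
import java.util.List;

public class UnderReplicatedCheck {
    // What the tooling effectively does: compare ISR size to the current
    // replica list, which during reassignment holds old + new replicas.
    static boolean reportedUnderReplicated(List<Integer> replicas, List<Integer> isr) {
        return isr.size() < replicas.size();
    }

    // What the reporter expects: compare ISR size to the topic's true
    // replication factor instead.
    static boolean trulyUnderReplicated(List<Integer> isr, int trueReplicationFactor) {
        return isr.size() < trueReplicationFactor;
    }

    public static void main(String[] args) {
        // Partition 0 from the describe output above, mid-reassignment:
        List<Integer> replicas = Arrays.asList(14, 13, 12, 11, 15); // old + new
        List<Integer> isr = Arrays.asList(14, 12, 11, 13);
        System.out.println(reportedUnderReplicated(replicas, isr)); // true (false alarm)
        System.out.println(trulyUnderReplicated(isr, 3));           // false
    }
}
```

With four replicas in sync against a true replication factor of 3, the partition is healthy, yet the replica-list comparison flags it.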
[jira] [Commented] (KAFKA-1382) Update zkVersion on partition state update failures
[ https://issues.apache.org/jira/browse/KAFKA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131013#comment-14131013 ]

Jagbir commented on KAFKA-1382:
-------------------------------

We are in the same fix. Can you please comment on whether this can be patched safely on 0.8.1.1?

Update zkVersion on partition state update failures
---------------------------------------------------

Key: KAFKA-1382
URL: https://issues.apache.org/jira/browse/KAFKA-1382
Project: Kafka
Issue Type: Bug
Reporter: Joel Koshy
Assignee: Sriharsha Chintalapani
Fix For: 0.8.2
Attachments: KAFKA-1382.patch, KAFKA-1382_2014-05-30_21:19:21.patch, KAFKA-1382_2014-05-31_15:50:25.patch, KAFKA-1382_2014-06-04_12:30:40.patch, KAFKA-1382_2014-06-07_09:00:56.patch, KAFKA-1382_2014-06-09_18:23:42.patch, KAFKA-1382_2014-06-11_09:37:22.patch, KAFKA-1382_2014-06-16_13:50:16.patch, KAFKA-1382_2014-06-16_14:19:27.patch

Our updateIsr code is currently:

{code}
private def updateIsr(newIsr: Set[Replica]) {
  debug("Updated ISR for partition [%s,%d] to %s".format(topic, partitionId, newIsr.mkString(",")))
  val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch, newIsr.map(r => r.brokerId).toList, zkVersion)
  // use the epoch of the controller that made the leadership decision, instead of the current controller epoch
  val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(zkClient,
    ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
    ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
  if (updateSucceeded) {
    inSyncReplicas = newIsr
    zkVersion = newVersion
    trace("ISR updated to [%s] and zkVersion updated to [%d]".format(newIsr.mkString(","), zkVersion))
  } else {
    info("Cached zkVersion [%d] not equal to that in zookeeper, skip updating ISR".format(zkVersion))
  }
}
{code}

We encountered an interesting scenario recently when a large producer fully saturated the broker's NIC for over an hour. The large volume of data led to a number of ISR shrinks (and subsequent expands). The NIC saturation affected the zookeeper client heartbeats and led to a session timeout. The timeline was roughly as follows:

- Attempt to expand ISR.
- Expansion written to zookeeper (confirmed in zookeeper transaction logs).
- Session timeout after around 13 seconds (the configured timeout is 20 seconds), so that lines up.
- zkclient reconnects to zookeeper (with the same session ID) and retries the write - but uses the old zkVersion. This fails because the zkVersion has already been updated (above).
- The ISR expand keeps failing after that, and the only way to get out of it is to bounce the broker.

In the above code, if the zkVersion is different we should probably update the cached version and even retry the expansion until it succeeds.
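The suggested fix - refresh the cached version on a mismatch and retry - is essentially a compare-and-set loop. A minimal sketch against an in-memory stand-in for zookeeper's conditional setData (all names here are illustrative, not Kafka's or ZooKeeper's API):

```java
public class IsrUpdateRetry {
    // In-memory stand-in for a znode with a version counter.
    static class VersionedStore {
        private String data = "isr:1";
        private int version = 5;

        // Returns the new version on success, or -1 on a version mismatch
        // (mimicking zookeeper's BADVERSION result).
        synchronized int conditionalUpdate(String newData, int expectedVersion) {
            if (version != expectedVersion) return -1;
            data = newData;
            return ++version;
        }

        synchronized int currentVersion() { return version; }
    }

    public static void main(String[] args) {
        VersionedStore zk = new VersionedStore();
        int cachedVersion = 4; // stale, as after the session-timeout scenario above

        // Retry loop: on mismatch, re-read the version instead of giving up.
        int newVersion;
        while ((newVersion = zk.conditionalUpdate("isr:1,2", cachedVersion)) == -1) {
            cachedVersion = zk.currentVersion(); // refresh the cached zkVersion
        }
        System.out.println("updated at version " + newVersion);
    }
}
```

A real fix would also re-read the znode's data, not just its version, since the leader-and-ISR contents may have changed along with the version.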
[jira] [Updated] (KAFKA-1382) Update zkVersion on partition state update failures
[ https://issues.apache.org/jira/browse/KAFKA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1382:
---------------------------------
Fix Version/s: 0.8.1.2

Update zkVersion on partition state update failures
---------------------------------------------------

Key: KAFKA-1382
URL: https://issues.apache.org/jira/browse/KAFKA-1382
Project: Kafka
Issue Type: Bug
Reporter: Joel Koshy
Assignee: Sriharsha Chintalapani
Fix For: 0.8.2, 0.8.1.2
Attachments: KAFKA-1382.patch, KAFKA-1382_2014-05-30_21:19:21.patch, KAFKA-1382_2014-05-31_15:50:25.patch, KAFKA-1382_2014-06-04_12:30:40.patch, KAFKA-1382_2014-06-07_09:00:56.patch, KAFKA-1382_2014-06-09_18:23:42.patch, KAFKA-1382_2014-06-11_09:37:22.patch, KAFKA-1382_2014-06-16_13:50:16.patch, KAFKA-1382_2014-06-16_14:19:27.patch
[jira] [Updated] (KAFKA-1419) cross build for scala 2.11
[ https://issues.apache.org/jira/browse/KAFKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Stein updated KAFKA-1419:
-----------------------------
Fix Version/s: 0.8.1.2

cross build for scala 2.11
--------------------------

Key: KAFKA-1419
URL: https://issues.apache.org/jira/browse/KAFKA-1419
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 0.8.1
Reporter: Scott Clasen
Assignee: Ivan Lyutov
Priority: Blocker
Fix For: 0.8.2, 0.8.1.2
Attachments: KAFKA-1419-scalaBinaryVersion.patch, KAFKA-1419.patch, KAFKA-1419.patch, KAFKA-1419_2014-07-28_15:05:16.patch, KAFKA-1419_2014-07-29_15:13:43.patch, KAFKA-1419_2014-08-04_14:43:26.patch, KAFKA-1419_2014-08-05_12:51:16.patch, KAFKA-1419_2014-08-07_10:17:34.patch, KAFKA-1419_2014-08-07_10:52:18.patch

Please publish builds for Scala 2.11; hopefully this just needs a small tweak to the Gradle conf?
[jira] [Updated] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1631:
---------------------------------
Labels: newbie (was: )
[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment
[ https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131051#comment-14131051 ]

Neha Narkhede commented on KAFKA-1631:
--------------------------------------

Thanks for reporting the issue, [~rberdeen]. Since partition reassignment involves changing the replicas of a partition, it is tricky to report the under-replicated status correctly at all times. However, one possible improvement is to change the topics tool so that it does not report partitions being reassigned as under-replicated. It is a minor change; feel free to give it a stab.
[jira] [Commented] (KAFKA-1534) transient unit test failure in testBasicPreferredReplicaElection
[ https://issues.apache.org/jira/browse/KAFKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131065#comment-14131065 ]

Neha Narkhede commented on KAFKA-1534:
--------------------------------------

[~abhioncbr] Thanks for signing up to fix this. For any of the unit test failures, especially those that are transient in nature, the best way to reproduce is to run the test in a loop several times with logging, so you can go back and check why it failed.

transient unit test failure in testBasicPreferredReplicaElection
----------------------------------------------------------------

Key: KAFKA-1534
URL: https://issues.apache.org/jira/browse/KAFKA-1534
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
Labels: newbie

Saw the following transient failure:

kafka.admin.AdminTest testBasicPreferredReplicaElection FAILED
    junit.framework.AssertionFailedError: Timing out after 5000 ms since leader is not elected or changed for partition [test,1]
        at junit.framework.Assert.fail(Assert.java:47)
        at kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:542)
        at kafka.admin.AdminTest.testBasicPreferredReplicaElection(AdminTest.scala:310)
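The run-it-in-a-loop advice can be captured as a tiny harness that reruns a flaky check until the first failure, so that run's logs can be inspected. A generic sketch (not Kafka tooling; the simulated test below is hypothetical):

```java
public class RepeatUntilFailure {
    // Run a flaky check repeatedly; stop and report the first failing run.
    // Returns the 1-based run number of the failure, or -1 if all runs pass.
    static int runUntilFailure(Runnable test, int maxRuns) {
        for (int i = 1; i <= maxRuns; i++) {
            try {
                test.run();
            } catch (AssertionError | RuntimeException e) {
                System.out.println("failed on run " + i + ": " + e.getMessage());
                return i;
            }
        }
        System.out.println("all " + maxRuns + " runs passed");
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated flaky test that fails on its 3rd invocation.
        int failing = runUntilFailure(() -> {
            if (++calls[0] == 3) throw new RuntimeException("leader not elected");
        }, 50);
        System.out.println("first failure: run " + failing);
    }
}
```

In practice the Runnable would shell out to the real test (or call it directly) with verbose logging redirected to a per-run file.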
[jira] [Updated] (KAFKA-1590) Binarize trace level request logging along with debug level text logging
[ https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1590:
---------------------------------
Assignee: Abhishek Sharma

Binarize trace level request logging along with debug level text logging
-------------------------------------------------------------------------

Key: KAFKA-1590
URL: https://issues.apache.org/jira/browse/KAFKA-1590
Project: Kafka
Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Abhishek Sharma
Labels: newbie
Fix For: 0.9.0

With trace-level logging, the request handling logs can grow very fast depending on client behavior (e.g. a consumer with 0 maxWait, which hence keeps sending fetch requests). Previously we changed it to debug level, which only provides a summary of the requests, omitting request details. However, this does not work perfectly, since summaries are not sufficient for troubleshooting, and turning on trace level once an issue occurs is too late. The proposed solution is to default to debug-level text logging, with trace-level logging printed in a binary format at the same time. The generated binary files can then be further compressed / rolled. When needed, we can then decompress / parse the trace logs into text.
[jira] [Updated] (KAFKA-1534) transient unit test failure in testBasicPreferredReplicaElection
[ https://issues.apache.org/jira/browse/KAFKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1534:
---------------------------------
Assignee: Abhishek Sharma
[jira] [Commented] (KAFKA-1590) Binarize trace level request logging along with debug level text logging
[ https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131069#comment-14131069 ]

Neha Narkhede commented on KAFKA-1590:
--------------------------------------

[~abhioncbr] We'd rather not include more dependencies just to do logging. The request logger currently uses log4j, and you can just override the appender and use it to write a binary log. At the same time, we would also need a tool that can parse and convert the request log to human-readable text.
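Whatever appender ends up writing the binary log, the file needs a framing so the parse-back tool can walk it record by record. A minimal length-prefixed framing sketch, independent of log4j (the record contents here are hypothetical examples):

```java
import java.io.*;

public class BinaryLogFraming {
    // Length-prefixed framing for binary trace records: a 4-byte length,
    // then the payload. A rolled file of such records can be parsed back
    // to text by reading frames in sequence.
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    static byte[] readRecord(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] payload = new byte[len];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "FetchRequest correlationId=7".getBytes("UTF-8"));
        writeRecord(out, "ProducerRequest acks=1".getBytes("UTF-8"));

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(new String(readRecord(in), "UTF-8"));
        System.out.println(new String(readRecord(in), "UTF-8"));
    }
}
```

In a log4j 1.x setup this framing would live inside a custom appender's append() method, with the payload being a serialized form of the request rather than a string.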
[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131073#comment-14131073 ]

Neha Narkhede commented on KAFKA-1481:
--------------------------------------

This issue has popped up enough times on the mailing list that we should pay attention to it and fix it. [~junrao], what plan would you suggest for fixing this so that the change works for everyone?

Stop using dashes AND underscores as separators in MBean names
--------------------------------------------------------------

Key: KAFKA-1481
URL: https://issues.apache.org/jira/browse/KAFKA-1481
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Labels: patch
Fix For: 0.8.2
Attachments: KAFKA-1481_2014-06-06_13-06-35.patch

MBeans should not use dashes or underscores as separators, because these characters are allowed in hostnames, topics, group and consumer IDs, etc., and these are embedded in MBean names, making it impossible to parse out the individual bits from the MBeans. Perhaps a pipe character should be used to avoid the conflict. This looks like a major blocker, because it means nobody can write Kafka 0.8.x monitoring tools unless they are doing it for themselves AND do not use dashes AND do not use underscores. See: http://search-hadoop.com/m/4TaT4lonIW
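The parsing problem is easy to demonstrate: once the joined components may themselves contain the separator, the split is ambiguous; with a character that cannot appear in the components (the pipe suggested in the issue), the split is well-defined. A sketch with hypothetical ids:

```java
public class MBeanSeparators {
    // Join and split with '|'; '|' is not legal in hostnames and is unlikely
    // in group/client ids, so the split is unambiguous.
    static String join(String group, String host) {
        return group + "|" + host;
    }

    static String[] split(String name) {
        return name.split("\\|");
    }

    public static void main(String[] args) {
        String group = "my-consumer-group"; // hypothetical ids for illustration
        String host = "app-host-01";

        // Dash-joined, as today: the components' own dashes make it impossible
        // to recover (group, host) from the name.
        System.out.println((group + "-" + host).split("-").length); // 6 fragments

        // Pipe-joined: splits back into exactly the original two components.
        String[] parts = split(join(group, host));
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

Note that '|' would need quoting if used inside a JMX ObjectName value, which is part of why the eventual fix needs a plan that "works for everyone".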
[jira] [Updated] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names
[ https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1481:
---------------------------------
Priority: Critical (was: Major)
[jira] [Updated] (KAFKA-1625) Sample Java code contains Scala syntax
[ https://issues.apache.org/jira/browse/KAFKA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-1625:
---------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks for the patch, [~davidzchen]

Sample Java code contains Scala syntax
--------------------------------------

Key: KAFKA-1625
URL: https://issues.apache.org/jira/browse/KAFKA-1625
Project: Kafka
Issue Type: Bug
Components: website
Reporter: David Chen
Assignee: David Chen
Attachments: KAFKA-1625.site.0.patch, KAFKA-1625.site.1.patch, KAFKA-1625.site.2.patch

As I was reading the Kafka documentation, I noticed that some of the parameters use Scala syntax, even though the code appears to be Java. For example:

{code}
public static kafka.javaapi.consumer.ConsumerConnector createJavaConsumerConnector(config: ConsumerConfig);
{code}

Also, what is the reason for fully qualifying these classes? I understand that there are Scala and Java classes with the same name, but I think that fully qualifying them in the sample code would encourage that practice by users, which is not desirable in Java code.