[jira] [Commented] (KAFKA-1630) ConsumerFetcherThread locked in Tomcat

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130576#comment-14130576
 ] 

Neha Narkhede commented on KAFKA-1630:
--

Can you describe the observed behavior a little more? Is the consumer lagging? 
(Use the ConsumerOffsetChecker) Is your fetch size set to larger than the 
largest message in the topic?
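
For reference, a minimal sketch of an 0.8 high-level consumer with the fetch size raised (the connection values and group name are placeholders; lag itself can be checked with the kafka.tools.ConsumerOffsetChecker tool):

{code}
import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

// Sketch: fetch.message.max.bytes must be at least the size of the largest
// message in the topic, or the fetcher cannot make progress on a partition
// that contains a larger message.
val props = new Properties()
props.put("zookeeper.connect", "localhost:2181")  // placeholder
props.put("group.id", "my-consumer-group")        // placeholder
props.put("fetch.message.max.bytes", "10485760")  // > largest message size
val connector = Consumer.create(new ConsumerConfig(props))
{code}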

 ConsumerFetcherThread locked in Tomcat
 --

 Key: KAFKA-1630
 URL: https://issues.apache.org/jira/browse/KAFKA-1630
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.0
 Environment: linux redhat
Reporter: vijay
Assignee: Neha Narkhede
  Labels: performance
   Original Estimate: 12h
  Remaining Estimate: 12h

 I am using the high-level consumer API for consuming messages from Kafka. 
 The ConsumerFetcherThread gets locked. Kindly look into the stack trace below:
 ConsumerFetcherThread-SocialTwitterStream6_172.31.240.136-1410398702143-61a247c3-0-1
  prio=10 tid=0x7f294001e800 nid=0x1677 runnable [0x7f297aae9000]
 java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked 0x7f2a7c38eb40 (a sun.nio.ch.Util$1)
    - locked 0x7f2a7c38eb28 (a java.util.Collections$UnmodifiableSet)
    - locked 0x7f2a7c326f20 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:193)
    - locked 0x7f2a7c2163c0 (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
    - locked 0x7f2a7c229950 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:200)
    - locked 0x7f2a7c38ea50 (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:395)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
    - locked 0x7f2a7c38e9f0 (a java.lang.Object)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:94)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1630) ConsumerFetcherThread locked in Tomcat

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130576#comment-14130576
 ] 

Neha Narkhede edited comment on KAFKA-1630 at 9/11/14 7:35 PM:
---

Can you describe the observed behavior a little more? Is the consumer lagging? 
(Use the ConsumerOffsetChecker) Is your fetch size set to larger than the 
largest message in the topic?

Also, these sorts of exploratory questions are better suited to the mailing 
list; a JIRA is best created after a potential bug or solution has been 
identified.


was (Author: nehanarkhede):
Can you describe the observed behavior a little more? Is the consumer lagging? 
(Use the ConsumerOffsetChecker) Is your fetch size set to larger than the 
largest message in the topic?

 ConsumerFetcherThread locked in Tomcat
 --

 Key: KAFKA-1630
 URL: https://issues.apache.org/jira/browse/KAFKA-1630
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.0
 Environment: linux redhat
Reporter: vijay
Assignee: Neha Narkhede
  Labels: performance
   Original Estimate: 12h
  Remaining Estimate: 12h

 I am using the high-level consumer API for consuming messages from Kafka. 
 The ConsumerFetcherThread gets locked. Kindly look into the stack trace below:
 ConsumerFetcherThread-SocialTwitterStream6_172.31.240.136-1410398702143-61a247c3-0-1
  prio=10 tid=0x7f294001e800 nid=0x1677 runnable [0x7f297aae9000]
 java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked 0x7f2a7c38eb40 (a sun.nio.ch.Util$1)
    - locked 0x7f2a7c38eb28 (a java.util.Collections$UnmodifiableSet)
    - locked 0x7f2a7c326f20 (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:193)
    - locked 0x7f2a7c2163c0 (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
    - locked 0x7f2a7c229950 (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:200)
    - locked 0x7f2a7c38ea50 (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:395)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
    - locked 0x7f2a7c38e9f0 (a java.lang.Object)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:110)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:110)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:109)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:108)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:94)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1628) [New Java Producer] Topic which contains "." does not correct corresponding metric name

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1628:
-
Labels: newbie  (was: )

 [New Java Producer] Topic which contains "." does not correct corresponding metric name
 -

 Key: KAFKA-1628
 URL: https://issues.apache.org/jira/browse/KAFKA-1628
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 0.8.2
 Environment: ALL
Reporter: Bhavesh Mistry
Priority: Minor
  Labels: newbie

 Hmm, it seems that we do allow "." in the topic name. The topic name can't
 be just "." or ".." though. So, if there is a topic "test.1", we will have
 the following jmx metrics name.
 kafka.producer.console-producer.topic.test:type=1
 It should be changed to
 kafka.producer.console-producer.topic:type=test.1
 Could you file a jira to follow up on this?
 Thanks,
 Jun
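
To make the ambiguity concrete, a small sketch (simplified string-building, not the actual producer metrics code):

{code}
// Appending the topic after a '.' separator makes the name ambiguous once
// topics themselves may contain '.':
val clientId = "console-producer"
val topic = "test.1"
val ambiguous = "kafka.producer." + clientId + ".topic." + topic
// -> "kafka.producer.console-producer.topic.test.1"
// anything splitting on '.' reads the topic as "test" with type "1".
// Emitting the topic as the value of a key=value property avoids the split:
val fixed = "kafka.producer." + clientId + ".topic:type=" + topic
// -> "kafka.producer.console-producer.topic:type=test.1"
{code}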



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130611#comment-14130611
 ] 

Neha Narkhede commented on KAFKA-1558:
--

bq. I think what we need now is to test deleteTopic under failure modes - 
leader election, partition reassignment, etc.

Yes, it would help if someone who has cycles could take this up and try to 
break delete topic under various failure scenarios. Currently, the approach 
to fixing delete topic is a little ad hoc.
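
A hedged sketch of one such failure-mode test (the setup names — servers, zkClient, topic — are assumptions borrowed from the style of existing kafka.admin tests, and the waitUntilTrue signature is the 0.8-era one):

{code}
// Sketch: delete a topic while bouncing its leader, then verify that the
// topic path eventually disappears from ZooKeeper.
AdminUtils.deleteTopic(zkClient, topic)
ZkUtils.getLeaderForPartition(zkClient, topic, 0).foreach { leaderId =>
  val leader = servers.find(_.config.brokerId == leaderId).get
  leader.shutdown()   // fail the leader mid-delete
  leader.startup()
}
assert(TestUtils.waitUntilTrue(
  () => !ZkUtils.pathExists(zkClient, ZkUtils.getTopicPath(topic)), 5000))
{code}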

 AdminUtils.deleteTopic does not work
 

 Key: KAFKA-1558
 URL: https://issues.apache.org/jira/browse/KAFKA-1558
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
Priority: Blocker
 Fix For: 0.8.2


 the AdminUtils.deleteTopic method is implemented as
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
 }
 {code}
 but the DeleteTopicCommand actually does
 {code}
 zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
 zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
 {code}
 so I guess that the 'createPersistentPath' above should actually be
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work

2014-09-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130630#comment-14130630
 ] 

Sriharsha Chintalapani commented on KAFKA-1558:
---

[~gwenshap] Thanks for the info. [~nehanarkhede] I am working on testing those 
cases mentioned above. Thanks.

 AdminUtils.deleteTopic does not work
 

 Key: KAFKA-1558
 URL: https://issues.apache.org/jira/browse/KAFKA-1558
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
Priority: Blocker
 Fix For: 0.8.2


 the AdminUtils.deleteTopic method is implemented as
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
 }
 {code}
 but the DeleteTopicCommand actually does
 {code}
 zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
 zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
 {code}
 so I guess that the 'createPersistentPath' above should actually be
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130641#comment-14130641
 ] 

Neha Narkhede commented on KAFKA-1558:
--

Great. Thanks [~sriharsha] for taking this on.

 AdminUtils.deleteTopic does not work
 

 Key: KAFKA-1558
 URL: https://issues.apache.org/jira/browse/KAFKA-1558
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
Priority: Blocker
 Fix For: 0.8.2


 the AdminUtils.deleteTopic method is implemented as
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
 }
 {code}
 but the DeleteTopicCommand actually does
 {code}
 zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
 zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
 {code}
 so I guess that the 'createPersistentPath' above should actually be
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1591) Clean-up Unnecessary stack trace in error/warn logs

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1591:
-
Assignee: Abhishek Sharma

 Clean-up Unnecessary stack trace in error/warn logs
 ---

 Key: KAFKA-1591
 URL: https://issues.apache.org/jira/browse/KAFKA-1591
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Abhishek Sharma
  Labels: newbie
 Fix For: 0.9.0

 Attachments: Jira-1591-SocketConnection-Warning.patch, 
 Jira1591-SendProducerRequest-Warning.patch


 Some of the unnecessary stack traces in error / warning log entries can 
 easily pollute the log files. Examples include KAFKA-1066, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment

2014-09-11 Thread Ryan Berdeen (JIRA)
Ryan Berdeen created KAFKA-1631:
---

 Summary: ReplicationFactor and under-replicated partitions 
incorrect during reassignment
 Key: KAFKA-1631
 URL: https://issues.apache.org/jira/browse/KAFKA-1631
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen


We have a topic with a replication factor of 3. We monitor 
UnderReplicatedPartitions as recommended by the documentation.

During a partition reassignment, partitions being reassigned are reported as 
under-replicated. Running a describe shows:

{code}
Topic:activity-wal-1    PartitionCount:15   ReplicationFactor:5 Configs:
    Topic: activity-wal-1   Partition: 0    Leader: 14  Replicas: 14,13,12,11,15    Isr: 14,12,11,13
    Topic: activity-wal-1   Partition: 1    Leader: 14  Replicas: 15,14,11  Isr: 14,11
    Topic: activity-wal-1   Partition: 2    Leader: 11  Replicas: 11,15,12  Isr: 12,11,15
...
{code}

It looks like the displayed replication factor, 5, is simply the number of 
replicas listed for the first partition, which includes both the brokers in 
the current replica list and those onto which the partition is being 
reassigned. Partition 0 is also included in the list when using the 
`--under-replicated-partitions` option, even though it is replicated to more 
brokers than the true replication factor.

During a reassignment, the under-replicated partitions metric is not usable, 
meaning that actual under-replicated partitions can go unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1382) Update zkVersion on partition state update failures

2014-09-11 Thread Jagbir (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131013#comment-14131013
 ] 

Jagbir commented on KAFKA-1382:
---

We are in the same fix. Can you please comment on whether this can be patched 
safely onto 0.8.1.1?

 Update zkVersion on partition state update failures
 ---

 Key: KAFKA-1382
 URL: https://issues.apache.org/jira/browse/KAFKA-1382
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.2

 Attachments: KAFKA-1382.patch, KAFKA-1382_2014-05-30_21:19:21.patch, 
 KAFKA-1382_2014-05-31_15:50:25.patch, KAFKA-1382_2014-06-04_12:30:40.patch, 
 KAFKA-1382_2014-06-07_09:00:56.patch, KAFKA-1382_2014-06-09_18:23:42.patch, 
 KAFKA-1382_2014-06-11_09:37:22.patch, KAFKA-1382_2014-06-16_13:50:16.patch, 
 KAFKA-1382_2014-06-16_14:19:27.patch


 Our updateIsr code is currently:
 private def updateIsr(newIsr: Set[Replica]) {
   debug("Updated ISR for partition [%s,%d] to %s".format(topic, partitionId, newIsr.mkString(",")))
   val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch,
     newIsr.map(r => r.brokerId).toList, zkVersion)
   // use the epoch of the controller that made the leadership decision,
   // instead of the current controller epoch
   val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(zkClient,
     ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
     ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
   if (updateSucceeded) {
     inSyncReplicas = newIsr
     zkVersion = newVersion
     trace("ISR updated to [%s] and zkVersion updated to [%d]".format(newIsr.mkString(","), zkVersion))
   } else {
     info("Cached zkVersion [%d] not equal to that in zookeeper, skip updating ISR".format(zkVersion))
   }
 }
 We encountered an interesting scenario recently when a large producer fully
 saturated the broker's NIC for over an hour. The large volume of data led to
 a number of ISR shrinks (and subsequent expands). The NIC saturation
 affected the zookeeper client heartbeats and led to a session timeout. The
 timeline was roughly as follows:
 - Attempt to expand ISR
 - Expansion written to zookeeper (confirmed in zookeeper transaction logs)
 - Session timeout after around 13 seconds (the configured timeout is 20
   seconds) so that lines up.
 - zkclient reconnects to zookeeper (with the same session ID) and retries
   the write - but uses the old zkVersion. This fails because the zkVersion
   has already been updated (above).
 - The ISR expand keeps failing after that and the only way to get out of it
   is to bounce the broker.
 In the above code, if the zkVersion is different we should probably update
 the cached version and even retry the expansion until it succeeds.
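
A hedged sketch of that last suggestion, reusing the ZkUtils calls already shown above (the retry loop and the version refresh are assumptions, not the committed fix):

{code}
// Sketch only: on a conditional-update failure, refresh the cached
// zkVersion from ZooKeeper and retry instead of failing forever.
private def updateIsrWithRetry(newIsr: Set[Replica]) {
  var updated = false
  while (!updated) {
    val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch,
      newIsr.map(r => r.brokerId).toList, zkVersion)
    val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(zkClient,
      ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
      ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
    if (updateSucceeded) {
      inSyncReplicas = newIsr
      zkVersion = newVersion
      updated = true
    } else {
      // refresh the stale cached version before the next attempt
      val (_, stat) = ZkUtils.readData(zkClient,
        ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId))
      zkVersion = stat.getVersion
    }
  }
}
{code}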



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1382) Update zkVersion on partition state update failures

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1382:
-
Fix Version/s: 0.8.1.2

 Update zkVersion on partition state update failures
 ---

 Key: KAFKA-1382
 URL: https://issues.apache.org/jira/browse/KAFKA-1382
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.2, 0.8.1.2

 Attachments: KAFKA-1382.patch, KAFKA-1382_2014-05-30_21:19:21.patch, 
 KAFKA-1382_2014-05-31_15:50:25.patch, KAFKA-1382_2014-06-04_12:30:40.patch, 
 KAFKA-1382_2014-06-07_09:00:56.patch, KAFKA-1382_2014-06-09_18:23:42.patch, 
 KAFKA-1382_2014-06-11_09:37:22.patch, KAFKA-1382_2014-06-16_13:50:16.patch, 
 KAFKA-1382_2014-06-16_14:19:27.patch


 Our updateIsr code is currently:
 private def updateIsr(newIsr: Set[Replica]) {
   debug("Updated ISR for partition [%s,%d] to %s".format(topic, partitionId, newIsr.mkString(",")))
   val newLeaderAndIsr = new LeaderAndIsr(localBrokerId, leaderEpoch,
     newIsr.map(r => r.brokerId).toList, zkVersion)
   // use the epoch of the controller that made the leadership decision,
   // instead of the current controller epoch
   val (updateSucceeded, newVersion) = ZkUtils.conditionalUpdatePersistentPath(zkClient,
     ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId),
     ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch), zkVersion)
   if (updateSucceeded) {
     inSyncReplicas = newIsr
     zkVersion = newVersion
     trace("ISR updated to [%s] and zkVersion updated to [%d]".format(newIsr.mkString(","), zkVersion))
   } else {
     info("Cached zkVersion [%d] not equal to that in zookeeper, skip updating ISR".format(zkVersion))
   }
 }
 We encountered an interesting scenario recently when a large producer fully
 saturated the broker's NIC for over an hour. The large volume of data led to
 a number of ISR shrinks (and subsequent expands). The NIC saturation
 affected the zookeeper client heartbeats and led to a session timeout. The
 timeline was roughly as follows:
 - Attempt to expand ISR
 - Expansion written to zookeeper (confirmed in zookeeper transaction logs)
 - Session timeout after around 13 seconds (the configured timeout is 20
   seconds) so that lines up.
 - zkclient reconnects to zookeeper (with the same session ID) and retries
   the write - but uses the old zkVersion. This fails because the zkVersion
   has already been updated (above).
 - The ISR expand keeps failing after that and the only way to get out of it
   is to bounce the broker.
 In the above code, if the zkVersion is different we should probably update
 the cached version and even retry the expansion until it succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1419) cross build for scala 2.11

2014-09-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1419:
-
Fix Version/s: 0.8.1.2

 cross build for scala 2.11
 --

 Key: KAFKA-1419
 URL: https://issues.apache.org/jira/browse/KAFKA-1419
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.8.1
Reporter: Scott Clasen
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2, 0.8.1.2

 Attachments: KAFKA-1419-scalaBinaryVersion.patch, KAFKA-1419.patch, 
 KAFKA-1419.patch, KAFKA-1419_2014-07-28_15:05:16.patch, 
 KAFKA-1419_2014-07-29_15:13:43.patch, KAFKA-1419_2014-08-04_14:43:26.patch, 
 KAFKA-1419_2014-08-05_12:51:16.patch, KAFKA-1419_2014-08-07_10:17:34.patch, 
 KAFKA-1419_2014-08-07_10:52:18.patch


 Please publish builds for Scala 2.11; hopefully this just needs a small tweak 
 to the Gradle conf?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1631:
-
Labels: newbie  (was: )

 ReplicationFactor and under-replicated partitions incorrect during 
 reassignment
 ---

 Key: KAFKA-1631
 URL: https://issues.apache.org/jira/browse/KAFKA-1631
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
  Labels: newbie

 We have a topic with a replication factor of 3. We monitor 
 UnderReplicatedPartitions as recommended by the documentation.
 During a partition reassignment, partitions being reassigned are reported as 
 under-replicated. Running a describe shows:
 {code}
 Topic:activity-wal-1    PartitionCount:15   ReplicationFactor:5 Configs:
     Topic: activity-wal-1   Partition: 0    Leader: 14  Replicas: 14,13,12,11,15    Isr: 14,12,11,13
     Topic: activity-wal-1   Partition: 1    Leader: 14  Replicas: 15,14,11  Isr: 14,11
     Topic: activity-wal-1   Partition: 2    Leader: 11  Replicas: 11,15,12  Isr: 12,11,15
 ...
 {code}
 It looks like the displayed replication factor, 5, is simply the number of 
 replicas listed for the first partition, which includes both the brokers in 
 the current replica list and those onto which the partition is being 
 reassigned. Partition 0 is also included in the list when using the 
 `--under-replicated-partitions` option, even though it is replicated to more 
 brokers than the true replication factor.
 During a reassignment, the under-replicated partitions metric is not usable, 
 meaning that actual under-replicated partitions can go unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1631) ReplicationFactor and under-replicated partitions incorrect during reassignment

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131051#comment-14131051
 ] 

Neha Narkhede commented on KAFKA-1631:
--

Thanks for reporting the issue, [~rberdeen]. Since partition reassignment 
involves changing the replicas of a partition, it is tricky to report the 
under-replicated status correctly at all times. However, one possible 
improvement is to change the topics tool so that it does not report partitions 
that are being reassigned as under-replicated. It is a minor change; feel free 
to give it a stab.
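
A hedged sketch of that filter (the surrounding tool plumbing, including the assignments map, is an assumption; getPartitionsBeingReassigned and getInSyncReplicasForPartition are existing ZkUtils calls):

{code}
// Sketch: when reporting under-replicated partitions, skip partitions that
// are mid-reassignment, since their replica list is temporarily inflated.
// `assignments` is an assumed Map[TopicAndPartition, Seq[Int]] of
// partition -> assigned replicas, as read by the topics tool.
val beingReassigned = ZkUtils.getPartitionsBeingReassigned(zkClient).keySet
val underReplicated = assignments.filter { case (tp, replicas) =>
  val isr = ZkUtils.getInSyncReplicasForPartition(zkClient, tp.topic, tp.partition)
  isr.size < replicas.size && !beingReassigned.contains(tp)
}
{code}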

 ReplicationFactor and under-replicated partitions incorrect during 
 reassignment
 ---

 Key: KAFKA-1631
 URL: https://issues.apache.org/jira/browse/KAFKA-1631
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Ryan Berdeen
  Labels: newbie

 We have a topic with a replication factor of 3. We monitor 
 UnderReplicatedPartitions as recommended by the documentation.
 During a partition reassignment, partitions being reassigned are reported as 
 under-replicated. Running a describe shows:
 {code}
 Topic:activity-wal-1    PartitionCount:15   ReplicationFactor:5 Configs:
     Topic: activity-wal-1   Partition: 0    Leader: 14  Replicas: 14,13,12,11,15    Isr: 14,12,11,13
     Topic: activity-wal-1   Partition: 1    Leader: 14  Replicas: 15,14,11  Isr: 14,11
     Topic: activity-wal-1   Partition: 2    Leader: 11  Replicas: 11,15,12  Isr: 12,11,15
 ...
 {code}
 It looks like the displayed replication factor, 5, is simply the number of 
 replicas listed for the first partition, which includes both the brokers in 
 the current replica list and those onto which the partition is being 
 reassigned. Partition 0 is also included in the list when using the 
 `--under-replicated-partitions` option, even though it is replicated to more 
 brokers than the true replication factor.
 During a reassignment, the under-replicated partitions metric is not usable, 
 meaning that actual under-replicated partitions can go unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1534) transient unit test failure in testBasicPreferredReplicaElection

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131065#comment-14131065
 ] 

Neha Narkhede commented on KAFKA-1534:
--

[~abhioncbr] Thanks for signing up to fix this. For any of the unit test 
failures, especially the transient ones, the best way to reproduce the problem 
is to run the test in a loop many times with logging enabled, so you can go 
back and check why it failed.
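
A hedged sketch of such a loop, driving the suite through JUnitCore (the iteration count is arbitrary):

{code}
import org.junit.runner.JUnitCore
import scala.collection.JavaConversions._

// Run the flaky test repeatedly; stop at the first failure so that run's
// logs can be inspected.
object AdminTestLoop extends App {
  for (i <- 1 to 100) {
    val result = JUnitCore.runClasses(classOf[kafka.admin.AdminTest])
    if (!result.wasSuccessful) {
      println("Failed on iteration " + i)
      result.getFailures.foreach(f => println(f.getTrace))
      System.exit(1)
    }
  }
  println("100 iterations passed")
}
{code}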

 transient unit test failure in testBasicPreferredReplicaElection
 

 Key: KAFKA-1534
 URL: https://issues.apache.org/jira/browse/KAFKA-1534
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
  Labels: newbie

 Saw the following transient failure. 
 kafka.admin.AdminTest > testBasicPreferredReplicaElection FAILED
 junit.framework.AssertionFailedError: Timing out after 5000 ms since leader is not elected or changed for partition [test,1]
 at junit.framework.Assert.fail(Assert.java:47)
 at kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:542)
 at kafka.admin.AdminTest.testBasicPreferredReplicaElection(AdminTest.scala:310)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1590) Binarize trace level request logging along with debug level text logging

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1590:
-
Assignee: Abhishek Sharma

 Binarize trace level request logging along with debug level text logging
 

 Key: KAFKA-1590
 URL: https://issues.apache.org/jira/browse/KAFKA-1590
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Abhishek Sharma
  Labels: newbie
 Fix For: 0.9.0


 With trace level logging, the request handling logs can grow very fast 
 depending on the client behavior (e.g. a consumer with 0 maxWait will keep 
 sending fetch requests). We previously changed this to debug level, which 
 only provides a summary of the requests, omitting request details. However, 
 this does not work perfectly, since summaries are not sufficient for 
 troubleshooting, and turning on trace level only after an issue appears is too late.
 The proposed solution is to default to debug level text logging, with trace 
 level logging written in a binary format at the same time. The generated binary 
 files can then be further compressed / rolled over. When needed, we can then 
 decompress / parse the trace logs back into text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1534) transient unit test failure in testBasicPreferredReplicaElection

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1534:
-
Assignee: Abhishek Sharma

 transient unit test failure in testBasicPreferredReplicaElection
 

 Key: KAFKA-1534
 URL: https://issues.apache.org/jira/browse/KAFKA-1534
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Abhishek Sharma
  Labels: newbie

 Saw the following transient failure. 
 kafka.admin.AdminTest > testBasicPreferredReplicaElection FAILED
 junit.framework.AssertionFailedError: Timing out after 5000 ms since leader is not elected or changed for partition [test,1]
 at junit.framework.Assert.fail(Assert.java:47)
 at kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:542)
 at kafka.admin.AdminTest.testBasicPreferredReplicaElection(AdminTest.scala:310)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1590) Binarize trace level request logging along with debug level text logging

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131069#comment-14131069
 ] 

Neha Narkhede commented on KAFKA-1590:
--

[~abhioncbr] We'd rather not include more dependencies just to do logging. The 
request logger currently uses log4j, so you can just override the appender and 
use it to write a binary log. At the same time, we would also need a tool that 
can parse the binary request log and convert it back to human-readable text.
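
A hedged sketch of such an appender (the framing — timestamp plus length-prefixed payload — and the output file handling are assumptions):

{code}
import java.io.{BufferedOutputStream, DataOutputStream, FileOutputStream}
import org.apache.log4j.AppenderSkeleton
import org.apache.log4j.spi.LoggingEvent

// log4j appender that writes each request-log event as a binary record:
// an 8-byte timestamp, a 4-byte length, then the UTF-8 payload. A companion
// tool would read the same framing back and print the text.
class BinaryRequestLogAppender extends AppenderSkeleton {
  private val out = new DataOutputStream(
    new BufferedOutputStream(new FileOutputStream("kafka-request.bin", true)))

  override def append(event: LoggingEvent) {
    val payload = String.valueOf(event.getRenderedMessage).getBytes("UTF-8")
    out.writeLong(event.getTimeStamp)
    out.writeInt(payload.length)
    out.write(payload)
  }

  override def close() { out.close() }
  override def requiresLayout() = false
}
{code}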

 Binarize trace level request logging along with debug level text logging
 

 Key: KAFKA-1590
 URL: https://issues.apache.org/jira/browse/KAFKA-1590
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Abhishek Sharma
  Labels: newbie
 Fix For: 0.9.0


 With trace level logging, the request handling logs can grow very fast 
 depending on the client behavior (e.g. a consumer with 0 maxWait will keep 
 sending fetch requests). We previously changed this to debug level, which 
 only provides a summary of the requests, omitting request details. However, 
 this does not work perfectly, since summaries are not sufficient for 
 troubleshooting, and turning on trace level only after an issue appears is too late.
 The proposed solution is to default to debug level text logging, with trace 
 level logging written in a binary format at the same time. The generated binary 
 files can then be further compressed / rolled over. When needed, we can then 
 decompress / parse the trace logs back into text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-09-11 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131073#comment-14131073
 ] 

Neha Narkhede commented on KAFKA-1481:
--

This issue has popped up enough times on the mailing list that we should pay 
attention to it and fix it. [~junrao], what plan would you suggest for fixing 
this so that the change works for everyone?

 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBean names, making it impossible to parse 
 individual fields out of the MBean names.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are writing them only for themselves AND do not 
 use dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW
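
To make the parsing problem concrete (hypothetical names, a sketch only):

{code}
// '-' is legal in client ids, hosts, and topics, so a dash-separated MBean
// name cannot be split back into its parts:
val clientId = "my-consumer"
val topic = "my-topic"
val mbean = clientId + "-" + topic + "-BytesPerSec"
// -> "my-consumer-my-topic-BytesPerSec"
// splitting on '-' cannot distinguish clientId "my-consumer" + topic
// "my-topic" from clientId "my" + topic "consumer-my-topic". A character
// disallowed in those fields (e.g. '|') would make the split unambiguous.
{code}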



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1481:
-
Priority: Critical  (was: Major)

 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
Priority: Critical
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBean names, making it impossible to parse 
 individual fields out of the MBean names.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are writing them only for themselves AND do not 
 use dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1625) Sample Java code contains Scala syntax

2014-09-11 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1625:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch, [~davidzchen]

 Sample Java code contains Scala syntax
 --

 Key: KAFKA-1625
 URL: https://issues.apache.org/jira/browse/KAFKA-1625
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: David Chen
Assignee: David Chen
 Attachments: KAFKA-1625.site.0.patch, KAFKA-1625.site.1.patch, 
 KAFKA-1625.site.2.patch


 As I was reading the Kafka documentation, I noticed that some of the 
 parameters use Scala syntax, even though the code appears to be Java. For 
 example:
 {code}
 public static kafka.javaapi.consumer.ConsumerConnector 
 createJavaConsumerConnector(config: ConsumerConfig);
 {code}
 Also, what is the reason for fully qualifying these classes? I understand 
 that there are Scala and Java classes with the same name, but I think that 
 fully qualifying them in the sample code would encourage that practice by 
 users, which is not desirable in Java code.
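
For reference, the quoted snippet written in proper Java syntax would presumably be:

{code}
public static kafka.javaapi.consumer.ConsumerConnector createJavaConsumerConnector(ConsumerConfig config);
{code}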



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)