Re: Proposed Changes To New Producer Public API
In order to allow correctly choosing a partition, the Producer interface will include a new method: List<PartitionInfo> partitionsForTopic(String topic); PartitionInfo will be changed to include the actual Node objects, not just the node ids.

Why are the node ids alone insufficient? You can still do round-robin/connection-limiting, etc. with just the node id, right?

Actually no - we do need the full node info.

-- Sent from Gmail Mobile
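The proposed shapes can be sketched in plain Java. This is only an illustration of the discussion above: the Node/PartitionInfo field names (host, port, leader) and the interface name are assumptions, not the actual API.

```java
import java.util.List;

// Sketch of the proposed shapes. Everything beyond the names
// partitionsForTopic / PartitionInfo / Node is an illustrative assumption.
class Node {
    final int id;
    final String host; // full connection info, not just an id,
    final int port;    // so clients can do connection-aware partitioning
    Node(int id, String host, int port) { this.id = id; this.host = host; this.port = port; }
}

class PartitionInfo {
    final String topic;
    final int partition;
    final Node leader; // an actual Node object rather than a bare node id
    PartitionInfo(String topic, int partition, Node leader) {
        this.topic = topic; this.partition = partition; this.leader = leader;
    }
}

interface PartitionAwareProducer {
    // The new method under discussion: exposes the live partition layout
    // so a caller can choose a partition (round-robin, locality, etc.).
    List<PartitionInfo> partitionsForTopic(String topic);
}
```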
[jira] [Updated] (KAFKA-1125) Add options to let admin tools block until finish
[ https://issues.apache.org/jira/browse/KAFKA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1125: - Summary: Add options to let admin tools block until finish (was: Add options to let admin tools blocking until finish) Add options to let admin tools block until finish - Key: KAFKA-1125 URL: https://issues.apache.org/jira/browse/KAFKA-1125 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Guozhang Wang Assignee: Guozhang Wang Topic config change as well as create-topic, add-partition, partition-reassignment and preferred leader election are all asynchronous, in the sense that the admin command returns immediately and one has to check for oneself whether the process has finished. It would be better to add an option to make these commands block until the process is done. It would also be good to queue admin tasks so that they can be executed sequentially. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
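The requested blocking behavior amounts to polling until the operation is observably complete. A minimal sketch, assuming a caller-supplied completion check (e.g. "the topic's path exists in ZooKeeper"); nothing here is actual Kafka admin-tool code:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a blocking admin option: poll until the admin
// operation is observed to be complete, or give up at a deadline.
class BlockingAdminOp {
    static boolean awaitCompletion(BooleanSupplier isDone, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!isDone.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) return false; // timed out
            Thread.sleep(pollMs);                                     // back off before re-checking
        }
        return true; // operation observed complete
    }
}
```

Running admin tasks through one such loop at a time would also give the sequential ordering the issue asks for.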
[jira] [Commented] (KAFKA-330) Add delete topic support
[ https://issues.apache.org/jira/browse/KAFKA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888685#comment-13888685 ] Neha Narkhede commented on KAFKA-330: - Updated reviewboard against branch trunk Add delete topic support - Key: KAFKA-330 URL: https://issues.apache.org/jira/browse/KAFKA-330 Project: Kafka Issue Type: Bug Components: controller, log, replication Affects Versions: 0.8.0, 0.8.1 Reporter: Neha Narkhede Assignee: Neha Narkhede Labels: features, project Attachments: KAFKA-330.patch, KAFKA-330_2014-01-28_15:19:20.patch, KAFKA-330_2014-01-28_22:01:16.patch, KAFKA-330_2014-01-31_14:19:14.patch, KAFKA-330_2014-01-31_17:45:25.patch, KAFKA-330_2014-02-01_11:30:32.patch, kafka-330-v1.patch, kafka-330-v2.patch One proposal of this API is here - https://cwiki.apache.org/confluence/display/KAFKA/Kafka+replication+detailed+design+V2#KafkareplicationdetaileddesignV2-Deletetopic -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request
[ https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-703: Fix Version/s: (was: 0.8.1) 0.8.2 A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request --- Key: KAFKA-703 URL: https://issues.apache.org/jira/browse/KAFKA-703 Project: Kafka Issue Type: Bug Components: purgatory Affects Versions: 0.8.1 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Priority: Blocker Fix For: 0.8.2 When a producer request is handled, the fetch purgatory is checked to ensure any fetch requests are satisfied. When the produce request is satisfied we do the check again and if the same fetch request was still in the fetch purgatory it would end up double counting the bytes received. Possible Solutions 1. In the delayed produce request case, do the check only after the produce request is satisfied. This could potentially delay the fetch request from being satisfied. 2. Remove dependency of fetch request on produce request and just look at the last logical log offset (which should mostly be cached). This would need the replica.fetch.min.bytes to be number of messages rather than bytes. This also helps KAFKA-671 in that we would no longer need to pass the ProduceRequest object to the producer purgatory and hence not have to consume any memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request
[ https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888692#comment-13888692 ] Neha Narkhede commented on KAFKA-703: - Moving to 0.8.2 A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request --- Key: KAFKA-703 URL: https://issues.apache.org/jira/browse/KAFKA-703 Project: Kafka Issue Type: Bug Components: purgatory Affects Versions: 0.8.1 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Priority: Blocker Fix For: 0.8.2 When a producer request is handled, the fetch purgatory is checked to ensure any fetch requests are satisfied. When the produce request is satisfied we do the check again and if the same fetch request was still in the fetch purgatory it would end up double counting the bytes received. Possible Solutions 1. In the delayed produce request case, do the check only after the produce request is satisfied. This could potentially delay the fetch request from being satisfied. 2. Remove dependency of fetch request on produce request and just look at the last logical log offset (which should mostly be cached). This would need the replica.fetch.min.bytes to be number of messages rather than bytes. This also helps KAFKA-671 in that we would no longer need to pass the ProduceRequest object to the producer purgatory and hence not have to consume any memory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
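The double count is easy to model outside Kafka: if the purgatory check runs once when the produce request arrives and again when the delayed produce is satisfied, naive accumulation adds the same bytes twice. Purely as an illustrative toy model (this is neither of the two proposed solutions, just a demonstration of idempotent byte accounting):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (not Kafka code): make the byte accounting of a delayed fetch
// idempotent per produce request, so a repeated purgatory check for the
// same produce request cannot double count its bytes.
class DelayedFetch {
    private long accumulatedBytes = 0;
    private final Set<Long> countedRequests = new HashSet<>(); // produce request ids already counted

    /** Called on each purgatory check; a no-op for an already-counted request. */
    void onProduce(long produceRequestId, long bytes) {
        if (countedRequests.add(produceRequestId)) {
            accumulatedBytes += bytes;
        }
    }

    boolean satisfied(long fetchMinBytes) { return accumulatedBytes >= fetchMinBytes; }
}
```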
[jira] [Updated] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-686: Fix Version/s: (was: 0.8.1) 0.8.2 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper -- Key: KAFKA-686 URL: https://issues.apache.org/jira/browse/KAFKA-686 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jay Kreps Priority: Blocker Labels: newbie, patch Fix For: 0.8.2 Attachments: KAFAK-686-null-pointer-fix.patch People will not know that the zookeeper paths are not compatible. When you try to start the 0.8 broker pointed at a 0.7 zookeeper you get a NullPointerException. We should detect this and give a more sane error. Error: kafka.common.KafkaException: Can't parse json string: null at kafka.utils.Json$.liftedTree1$1(Json.scala:20) at kafka.utils.Json$.parseFull(Json.scala:16) at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498) at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494) at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61) at scala.collection.immutable.List.foreach(List.scala:45) at kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494) at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446) at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220) at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85) at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53) at kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43) at kafka.controller.KafkaController.startup(KafkaController.scala:381) at kafka.server.KafkaServer.startup(KafkaServer.scala:90) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34) at kafka.Kafka$.main(Kafka.scala:46) at kafka.Kafka.main(Kafka.scala) Caused by: 
java.lang.NullPointerException at scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52) at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71) at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85) at kafka.utils.Json$.liftedTree1$1(Json.scala:17) ... 16 more -- This message was sent by Atlassian JIRA (v6.1.5#6160)
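A hedged sketch of the suggested detection: check for the null znode payload before parsing and raise a descriptive error instead of the NullPointerException above. The names here are illustrative; the real fix would presumably live near ZkUtils.getReplicaAssignmentForTopics:

```java
// Hypothetical sketch: fail fast with a sane message when the znode payload
// is null, as happens when an 0.8 broker reads a 0.7-style ZooKeeper tree.
class ZkCompatCheck {
    static String requireAssignmentJson(String topic, String json) {
        if (json == null) {
            throw new IllegalStateException(
                "No replica assignment data found for topic '" + topic + "'. "
                + "The ZooKeeper tree looks like it was written by Kafka 0.7, "
                + "whose paths are not compatible with 0.8.");
        }
        return json; // real code would go on to parse the JSON
    }
}
```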
[jira] [Resolved] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede resolved KAFKA-917. - Resolution: Won't Fix Fix Version/s: (was: 0.8.1) KAFKA-924 will fix this. Expose zk.session.timeout.ms in console consumer Key: KAFKA-917 URL: https://issues.apache.org/jira/browse/KAFKA-917 Project: Kafka Issue Type: Bug Affects Versions: 0.7, 0.8.0 Reporter: Swapnil Ghike Assignee: Swapnil Ghike Priority: Blocker Labels: bugs Attachments: kafka-917-0.8-rebased.patch, kafka-917.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Closed] (KAFKA-917) Expose zk.session.timeout.ms in console consumer
[ https://issues.apache.org/jira/browse/KAFKA-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-917. --- Expose zk.session.timeout.ms in console consumer Key: KAFKA-917 URL: https://issues.apache.org/jira/browse/KAFKA-917 Project: Kafka Issue Type: Bug Affects Versions: 0.7, 0.8.0 Reporter: Swapnil Ghike Assignee: Swapnil Ghike Priority: Blocker Labels: bugs Attachments: kafka-917-0.8-rebased.patch, kafka-917.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-496) high level producer send should return a response
[ https://issues.apache.org/jira/browse/KAFKA-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-496: Fix Version/s: (was: 0.8.1) 0.9.0 high level producer send should return a response - Key: KAFKA-496 URL: https://issues.apache.org/jira/browse/KAFKA-496 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Jay Kreps Priority: Blocker Labels: features Fix For: 0.9.0 Original Estimate: 72h Remaining Estimate: 72h Currently, Producer.send() doesn't return any value. In 0.8, since each produce request will be acked, we should pass the response back. What we can do is that if the producer is in sync mode, we can return a map of (topic, partitionId) -> (errorcode, offset). If the producer is in async mode, we can just return null. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
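The sync/async contract described above can be sketched as follows. The SendResult type, the string map key, and all signatures are illustrative assumptions, not the proposed API:

```java
import java.util.Collections;
import java.util.Map;

// Sketch of the proposal: sync mode returns a per-partition
// (errorCode, offset) result; async mode returns null.
class SendResult {
    final short errorCode;
    final long offset;
    SendResult(short errorCode, long offset) { this.errorCode = errorCode; this.offset = offset; }
}

class SyncModeProducer {
    private final boolean sync;
    SyncModeProducer(boolean sync) { this.sync = sync; }

    /** Key is "topic-partitionId" for brevity; a real API would use a proper TopicPartition type. */
    Map<String, SendResult> send(String topic, int partition, byte[] payload) {
        if (!sync) return null; // async mode: fire and forget, no response
        // Stand-in for a broker ack: no error, appended at offset 0.
        return Collections.singletonMap(topic + "-" + partition, new SendResult((short) 0, 0L));
    }
}
```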
[jira] [Updated] (KAFKA-411) Message Error in highly concurrent environment
[ https://issues.apache.org/jira/browse/KAFKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-411: Resolution: Cannot Reproduce Fix Version/s: (was: 0.8.1) Status: Resolved (was: Patch Available) Not sure if this is a problem at all Message Error in highly concurrent environment --- Key: KAFKA-411 URL: https://issues.apache.org/jira/browse/KAFKA-411 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.7 Reporter: jian fan Priority: Blocker Labels: InvalidTopic Attachments: kafka-411.patch In a highly concurrent environment, these errors always appear in the kafka broker: ERROR Error processing MultiProducerRequest on bxx:2 (kafka.server.KafkaRequestHandlers) kafka.message.InvalidMessageException: message is invalid, compression codec: NoCompressionCodec size: 1030 curr offset: 1034 init offset: 0 at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:130) at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:166) at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:100) at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59) at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51) at scala.collection.Iterator$class.foreach(Iterator.scala:631) at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30) at scala.collection.IterableLike$class.foreach(IterableLike.scala:79) at kafka.message.MessageSet.foreach(MessageSet.scala:87) at kafka.log.Log.append(Log.scala:205) at kafka.server.KafkaRequestHandlers.kafka$server$KafkaRequestHandlers$$handleProducerRequest(KafkaRequestHandlers.scala:69) at kafka.server.KafkaRequestHandlers$$anonfun$handleMultiProducerRequest$1.apply(KafkaRequestHandlers.scala:62) at kafka.server.KafkaRequestHandlers$$anonfun$handleMultiProducerRequest$1.apply(KafkaRequestHandlers.scala:62) at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34) at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34) at scala.collection.TraversableLike$class.map(TraversableLike.scala:206) at scala.collection.mutable.ArrayOps.map(ArrayOps.scala:34) at kafka.server.KafkaRequestHandlers.handleMultiProducerRequest(KafkaRequestHandlers.scala:62) at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$4.apply(KafkaRequestHandlers.scala:41) at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$4.apply(KafkaRequestHandlers.scala:41) at kafka.network.Processor.handle(SocketServer.scala:296) at kafka.network.Processor.read(SocketServer.scala:319) at kafka.network.Processor.run(SocketServer.scala:214) at java.lang.Thread.run(Thread.java:722) ERROR Closing socket for /192.168.75.15 because of error (kafka.network.Processor) kafka.common.InvalidTopicException: topic name can't be empty at kafka.log.LogManager.getLogPool(LogManager.scala:159) at kafka.log.LogManager.getOrCreateLog(LogManager.scala:195) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-496) high level producer send should return a response
[ https://issues.apache.org/jira/browse/KAFKA-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888696#comment-13888696 ] Neha Narkhede commented on KAFKA-496: - Will be fixed as part of the new producer release (0.9) high level producer send should return a response - Key: KAFKA-496 URL: https://issues.apache.org/jira/browse/KAFKA-496 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Assignee: Jay Kreps Priority: Blocker Labels: features Fix For: 0.9.0 Original Estimate: 72h Remaining Estimate: 72h Currently, Producer.send() doesn't return any value. In 0.8, since each produce request will be acked, we should pass the response back. What we can do is that if the producer is in sync mode, we can return a map of (topic, partitionId) -> (errorcode, offset). If the producer is in async mode, we can just return null. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-95) Create Jenkins readable test output
[ https://issues.apache.org/jira/browse/KAFKA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-95: --- Fix Version/s: (was: 0.8.1) 0.8.2 Create Jenkins readable test output --- Key: KAFKA-95 URL: https://issues.apache.org/jira/browse/KAFKA-95 Project: Kafka Issue Type: Improvement Components: packaging Reporter: Chris Burroughs Fix For: 0.8.2 Jenkins likes XML. See http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-288) java impacted changes from new producer and consumer request format
[ https://issues.apache.org/jira/browse/KAFKA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-288: Fix Version/s: (was: 0.8.1) 0.9.0 java impacted changes from new producer and consumer request format --- Key: KAFKA-288 URL: https://issues.apache.org/jira/browse/KAFKA-288 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: replication, wireprotocol Fix For: 0.9.0 1) javaapi.SyncProducer: We should get rid of send(topic: String, messages: ByteBufferMessageSet) and only keep send(producerRequest: kafka.javaapi.ProducerRequest). This affects KafkaRecordWriter and DataGenerator 2) javaapi.ProducerRequest: We will need to define a java version of TopicData so that java producers can create request conveniently. The java version of TopicData will use the java version of ByteBufferMessageSet. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time
[ https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888698#comment-13888698 ] Neha Narkhede commented on KAFKA-1194: -- Needs review from [~jkreps]. Moving to 0.8.2 for now. The kafka broker cannot delete the old log files after the configured time -- Key: KAFKA-1194 URL: https://issues.apache.org/jira/browse/KAFKA-1194 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.1 Environment: window Reporter: Tao Qin Assignee: Jay Kreps Labels: features, patch Fix For: 0.8.2 Attachments: KAFKA-1194.patch, kafka-1194-v1.patch Original Estimate: 72h Remaining Estimate: 72h We tested it in windows environment, and set the log.retention.hours to 24 hours. # The minimum age of a log file to be eligible for deletion log.retention.hours=24 After several days, the kafka broker still cannot delete the old log file. And we get the following exceptions: [2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler) kafka.common.KafkaStorageException: Failed to change the log file suffix from to .deleted for log segment 1516723 at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249) at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638) at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629) at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418) at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418) at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59) at scala.collection.immutable.List.foreach(List.scala:76) at kafka.log.Log.deleteOldSegments(Log.scala:418) at kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284) at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316) at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314) at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743) at scala.collection.Iterator$class.foreach(Iterator.scala:772) at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573) at scala.collection.IterableLike$class.foreach(IterableLike.scala:73) at scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742) at kafka.log.LogManager.cleanupLogs(LogManager.scala:314) at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143) at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) I think this error happens because kafka tries to rename the log file when it is still opened. So we should close the file first before rename. The index file uses a special data structure, the MappedByteBuffer. Javadoc describes it as: A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected. Fortunately, I find a forceUnmap function in kafka code, and perhaps it can be used to free the MappedByteBuffer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-289) reuse topicdata when sending producerrequest
[ https://issues.apache.org/jira/browse/KAFKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-289: Fix Version/s: (was: 0.8.1) 0.9.0 reuse topicdata when sending producerrequest Key: KAFKA-289 URL: https://issues.apache.org/jira/browse/KAFKA-289 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: optimization, replication, wireprotocol Fix For: 0.9.0 The way that SyncProducer sends a ProducerRequest over socket is to first serialize the whole request in a bytebuffer and then sends the bytebuffer through socket. An alternative is to send the request like FetchReponse, using a ProduceRequestSend that reuses TopicDataSend. This avoids code duplication and is more efficient since it sends data in ByteBufferMessagesSet directly to socket and avoids extra copying from messageset to bytebuffer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time
[ https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1194: - Fix Version/s: (was: 0.8.1) 0.8.2 The kafka broker cannot delete the old log files after the configured time -- Key: KAFKA-1194 URL: https://issues.apache.org/jira/browse/KAFKA-1194 Project: Kafka Issue Type: Bug Components: log Affects Versions: 0.8.1 Environment: window Reporter: Tao Qin Assignee: Jay Kreps Labels: features, patch Fix For: 0.8.2 Attachments: KAFKA-1194.patch, kafka-1194-v1.patch Original Estimate: 72h Remaining Estimate: 72h We tested it in windows environment, and set the log.retention.hours to 24 hours. # The minimum age of a log file to be eligible for deletion log.retention.hours=24 After several days, the kafka broker still cannot delete the old log file. And we get the following exceptions: [2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler) kafka.common.KafkaStorageException: Failed to change the log file suffix from to .deleted for log segment 1516723 at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249) at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638) at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629) at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418) at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418) at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59) at scala.collection.immutable.List.foreach(List.scala:76) at kafka.log.Log.deleteOldSegments(Log.scala:418) at kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284) at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316) at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743) at 
scala.collection.Iterator$class.foreach(Iterator.scala:772) at scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573) at scala.collection.IterableLike$class.foreach(IterableLike.scala:73) at scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742) at kafka.log.LogManager.cleanupLogs(LogManager.scala:314) at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143) at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) I think this error happens because kafka tries to rename the log file when it is still opened. So we should close the file first before rename. The index file uses a special data structure, the MappedByteBuffer. Javadoc describes it as: A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected. Fortunately, I find a forceUnmap function in kafka code, and perhaps it can be used to free the MappedByteBuffer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-174) Add performance suite for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-174: Fix Version/s: (was: 0.8.1) 0.9.0 Add performance suite for Kafka --- Key: KAFKA-174 URL: https://issues.apache.org/jira/browse/KAFKA-174 Project: Kafka Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Neha Narkhede Labels: replication Fix For: 0.9.0 This is a placeholder JIRA for adding a perf suite to Kafka. The high level proposal is here - https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing There will be more JIRAs covering smaller tasks to fully implement this. They will be linked to this JIRA. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-316) disallow recursively compressed message
[ https://issues.apache.org/jira/browse/KAFKA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-316: Fix Version/s: (was: 0.8.1) 0.9.0 disallow recursively compressed message --- Key: KAFKA-316 URL: https://issues.apache.org/jira/browse/KAFKA-316 Project: Kafka Issue Type: Improvement Components: core Reporter: Jun Rao Fix For: 0.9.0 Attachments: 0001-KAFKA-316-Disallow-MessageSets-within-MessageSets.patch Currently, it is possible to create a compressed Message that contains a set of Messages, each of which is further compressed. Supporting recursively compressed messages has little benefit and complicates the on-disk storage format. We should probably disallow this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
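The proposed rule can be sketched as a simple validation pass; codec names and signatures here are illustrative, not Kafka's wire-format code:

```java
import java.util.List;

// Illustrative sketch of the proposed rule: a compressed wrapper message
// may only contain uncompressed inner messages.
class CompressionValidator {
    enum Codec { NONE, GZIP, SNAPPY }

    static void validate(Codec wrapperCodec, List<Codec> innerCodecs) {
        if (wrapperCodec == Codec.NONE) return; // plain message set: nothing to check
        for (Codec inner : innerCodecs) {
            if (inner != Codec.NONE) {
                throw new IllegalArgumentException(
                    "recursively compressed messages are disallowed");
            }
        }
    }
}
```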
[jira] [Updated] (KAFKA-175) Add helper scripts to wrap the current perf tools
[ https://issues.apache.org/jira/browse/KAFKA-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-175: Fix Version/s: (was: 0.8.1) 0.9.0 Add helper scripts to wrap the current perf tools - Key: KAFKA-175 URL: https://issues.apache.org/jira/browse/KAFKA-175 Project: Kafka Issue Type: Sub-task Reporter: Neha Narkhede Assignee: Neha Narkhede Fix For: 0.9.0 Attachments: kafka-175-updated.patch, kafka-175.patch We have 3 useful tools to run producer and consumer perf tests - ProducerPerformance.scala, SimpleConsumerPerformance.scala and ConsumerPerformance.scala. These tests expose several options that allows you to define the load for each perf run. It will be good to expose some helper scripts that will cover some single node perf testing scenarios. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1143) Consumer should cache topic partition info
[ https://issues.apache.org/jira/browse/KAFKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1143: - Fix Version/s: (was: 0.8.1) 0.9.0 Consumer should cache topic partition info -- Key: KAFKA-1143 URL: https://issues.apache.org/jira/browse/KAFKA-1143 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 So that 1. It can check if rebalances are necessary when the topic/partition watcher fires (watchers can be triggered by a state change event even when the data does not change at all). 2. Rebalance does not need to read again from ZK. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
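Point 1 can be sketched as a small cache keyed by topic; this is illustrative only, not consumer code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: cache the last-seen partition list per topic so a
// spurious watcher firing (state change, same data) neither triggers a
// rebalance nor forces another ZooKeeper read.
class TopicPartitionCache {
    private final Map<String, List<Integer>> lastSeen = new HashMap<>();

    /** Returns true only when the partition list differs from the cached copy. */
    boolean updateAndCheck(String topic, List<Integer> current) {
        List<Integer> previous = lastSeen.put(topic, current);
        return !current.equals(previous);
    }
}
```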
[jira] [Updated] (KAFKA-1148) Delayed fetch/producer requests should be satisfied on a leader change
[ https://issues.apache.org/jira/browse/KAFKA-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1148: - Fix Version/s: (was: 0.8.1) Delayed fetch/producer requests should be satisfied on a leader change -- Key: KAFKA-1148 URL: https://issues.apache.org/jira/browse/KAFKA-1148 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Somewhat related to KAFKA-1016. This would be an issue only if max.wait is set to a very high value. When a leader change occurs we should remove the delayed request from the purgatory - either satisfy with error/expire - whichever makes more sense. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1147) Consumer socket timeout should be greater than fetch max wait
[ https://issues.apache.org/jira/browse/KAFKA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1147: - Fix Version/s: (was: 0.8.1) 0.8.2 0.9.0 Consumer socket timeout should be greater than fetch max wait - Key: KAFKA-1147 URL: https://issues.apache.org/jira/browse/KAFKA-1147 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Assignee: Guozhang Wang Fix For: 0.9.0, 0.8.2 Attachments: KAFKA-1147.patch, KAFKA-1147_2013-12-07_18:22:18.patch, KAFKA-1147_2013-12-09_09:14:24.patch, KAFKA-1147_2013-12-10_14:31:46.patch From the mailing list: The consumer-config documentation states that The actual timeout set will be max.fetch.wait + socket.timeout.ms. - however, that change seems to have been lost in the code a while ago - we should either fix the doc or re-introduce the addition. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
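For reference, the documented relationship is simple arithmetic; a sketch of the intended computation (not the actual consumer code):

```java
// Sketch of the documented intent: the socket timeout actually used for
// consumer fetches should be the fetch max wait plus the configured socket
// timeout, so the broker's long poll cannot outlive the client socket.
class FetchTimeouts {
    static int effectiveSocketTimeoutMs(int fetchMaxWaitMs, int socketTimeoutMs) {
        return fetchMaxWaitMs + socketTimeoutMs;
    }
}
```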
[jira] [Updated] (KAFKA-1164) kafka should depend on snappy 1.0.5 (instead of 1.0.4.1)
[ https://issues.apache.org/jira/browse/KAFKA-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1164: - Fix Version/s: (was: 0.8.1) 0.8.2 kafka should depend on snappy 1.0.5 (instead of 1.0.4.1) Key: KAFKA-1164 URL: https://issues.apache.org/jira/browse/KAFKA-1164 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jason Rosenberg Labels: release Fix For: 0.8.2 There is a bug in snappy 1.0.4.1 that makes it not work when using Java 7 on MacOSX. This issue is fixed in snappy 1.0.5. We've confirmed this locally. https://github.com/ptaoussanis/carmine/issues/5 So, the kafka distribution should update the KafkaProject.scala file to call out version 1.0.5 instead of 1.0.4.1. I believe this file is used when generating the pom.xml file for kafka. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1163) see whats going on with the 2.8.0 pom
[ https://issues.apache.org/jira/browse/KAFKA-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888700#comment-13888700 ] Neha Narkhede commented on KAFKA-1163: -- [~joestein] Is this still a problem? see whats going on with the 2.8.0 pom - Key: KAFKA-1163 URL: https://issues.apache.org/jira/browse/KAFKA-1163 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: release Fix For: 0.8.1 assuming we are going to even support 2.8.0 anymore in 0.8.1 =8^/ The POM for 2.8.2, 2.9.1, 2.9.2, and 2.10 include scala-compiler and some other stuff that is not included in 2.8.0 2.8.0 https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.8.0/0.8.0/kafka_2.8.0-0.8.0.pom 2.8.2 https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka_2.8.2/0.8.0/kafka_2.8.2-0.8.0.pom Here's a diff of those two: https://gist.github.com/mumrah/7bd6bd8e2805210d5d9d/revisions I think maybe the 2.8.0 POM is missing some stuff it needs (zkclient, snappy, yammer metrics). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1161) review report of the dependencies, conflicts, and licenses into ivy-report
[ https://issues.apache.org/jira/browse/KAFKA-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888702#comment-13888702 ] Neha Narkhede commented on KAFKA-1161: -- [~joestein] What changes do we need before the 0.8.1 release? review report of the dependencies, conflicts, and licenses into ivy-report -- Key: KAFKA-1161 URL: https://issues.apache.org/jira/browse/KAFKA-1161 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: release Fix For: 0.8.1 I created a simple Ant+Ivy project to test resolving the artifacts published to the Apache staging repo: https://github.com/mumrah/kafka-ivy. This will fetch Kafka libs from the Apache staging area and other things from Maven Central. It will fetch the jars into lib/ivy/{conf} and generate a report of the dependencies, conflicts, and licenses into ivy-report. Notice I had to add three exclusions to get things working. Maybe we should add these to our pom? We're already handling the dependencies in KAFKA-1159, but if there is anything else to change from this output (like confirming licensing with Scala or anything else), let's use this JIRA to capture that here. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1162) handle duplicate entry for ZK in the pom file
[ https://issues.apache.org/jira/browse/KAFKA-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888701#comment-13888701 ] Neha Narkhede commented on KAFKA-1162: -- [~joestein] Could you add more details on what needs to be fixed here? handle duplicate entry for ZK in the pom file - Key: KAFKA-1162 URL: https://issues.apache.org/jira/browse/KAFKA-1162 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: release Fix For: 0.8.1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1160) have the pom reference the exclusions necessary so folks don't have to
[ https://issues.apache.org/jira/browse/KAFKA-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888703#comment-13888703 ] Neha Narkhede commented on KAFKA-1160: -- [~joestein] What changes do we need before the 0.8.1 release? have the pom reference the exclusions necessary so folks don't have to -- Key: KAFKA-1160 URL: https://issues.apache.org/jira/browse/KAFKA-1160 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: release Fix For: 0.8.1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1159) try to get the bin tar smaller
[ https://issues.apache.org/jira/browse/KAFKA-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888704#comment-13888704 ] Neha Narkhede commented on KAFKA-1159: -- [~joestein] What changes do we need before the 0.8.1 release? try to get the bin tar smaller -- Key: KAFKA-1159 URL: https://issues.apache.org/jira/browse/KAFKA-1159 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Joe Stein Labels: release Fix For: 0.8.1 It probably makes sense to drop libs/scala-compiler.jar -- Kafka does not perform compilation at runtime, and this step will trim some fat from the resulting release (from 17 MB down to 9.5 MB*). * By the way, using the best possible compression method (-9 instead of the default -6) plus dropping the compiler lib gave me the very same result -- 9.5 MB -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1173) Using Vagrant to get up and running with Apache Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1173: - Fix Version/s: (was: 0.8.1) Using Vagrant to get up and running with Apache Kafka - Key: KAFKA-1173 URL: https://issues.apache.org/jira/browse/KAFKA-1173 Project: Kafka Issue Type: Improvement Reporter: Joe Stein Attachments: KAFKA-1173_2013-12-07_12:07:55.patch Vagrant has been getting a lot of pickup in the tech communities. I have found it very useful for development and testing, and I am working with a few clients now using it to help virtualize their environments in repeatable ways. Using Vagrant to get up and running. For 0.8.0 I have a patch on github https://github.com/stealthly/kafka 1) Install Vagrant http://www.vagrantup.com/ 2) Install VirtualBox https://www.virtualbox.org/ In the main kafka folder 1) ./sbt update 2) ./sbt package 3) ./sbt assembly-package-dependency 4) vagrant up Once this is done: * Zookeeper will be running on 192.168.50.5 * Broker 1 on 192.168.50.10 * Broker 2 on 192.168.50.20 * Broker 3 on 192.168.50.30 When you are all up and running you will be back at a command prompt. If you want you can log in to the machines using vagrant ssh machineName, but you don't need to. You can access the brokers and zookeeper by their IP, e.g. bin/kafka-console-producer.sh --broker-list 192.168.50.10:9092,192.168.50.20:9092,192.168.50.30:9092 --topic sandbox bin/kafka-console-consumer.sh --zookeeper 192.168.50.5:2181 --topic sandbox --from-beginning -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1173) Using Vagrant to get up and running with Apache Kafka
[ https://issues.apache.org/jira/browse/KAFKA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888705#comment-13888705 ] Neha Narkhede commented on KAFKA-1173: -- Not sure what needs to happen here, moving out of 0.8.1 Using Vagrant to get up and running with Apache Kafka - Key: KAFKA-1173 URL: https://issues.apache.org/jira/browse/KAFKA-1173 Project: Kafka Issue Type: Improvement Reporter: Joe Stein Attachments: KAFKA-1173_2013-12-07_12:07:55.patch Vagrant has been getting a lot of pickup in the tech communities. I have found it very useful for development and testing, and I am working with a few clients now using it to help virtualize their environments in repeatable ways. Using Vagrant to get up and running. For 0.8.0 I have a patch on github https://github.com/stealthly/kafka 1) Install Vagrant http://www.vagrantup.com/ 2) Install VirtualBox https://www.virtualbox.org/ In the main kafka folder 1) ./sbt update 2) ./sbt package 3) ./sbt assembly-package-dependency 4) vagrant up Once this is done: * Zookeeper will be running on 192.168.50.5 * Broker 1 on 192.168.50.10 * Broker 2 on 192.168.50.20 * Broker 3 on 192.168.50.30 When you are all up and running you will be back at a command prompt. If you want you can log in to the machines using vagrant ssh machineName, but you don't need to. You can access the brokers and zookeeper by their IP, e.g. bin/kafka-console-producer.sh --broker-list 192.168.50.10:9092,192.168.50.20:9092,192.168.50.30:9092 --topic sandbox bin/kafka-console-consumer.sh --zookeeper 192.168.50.5:2181 --topic sandbox --from-beginning -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1096) An old controller coming out of long GC could update its epoch to the latest controller's epoch
[ https://issues.apache.org/jira/browse/KAFKA-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888707#comment-13888707 ] Neha Narkhede commented on KAFKA-1096: -- This is relatively rare, moving to 0.8.2 An old controller coming out of long GC could update its epoch to the latest controller's epoch --- Key: KAFKA-1096 URL: https://issues.apache.org/jira/browse/KAFKA-1096 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8.0 Reporter: Swapnil Ghike Fix For: 0.8.2 If a controller GCs for too long, we could have two controllers in the cluster. The controller epoch is supposed to minimize the damage in such a situation, as the brokers will reject the requests sent by the controller with an older epoch. When the old controller is still in long GC, a new controller could be elected. This will fire ControllerEpochListener on the old controller. When it comes out of GC, its ControllerEpochListener will update its own epoch to the new controller's epoch. So both controllers are now able to send out requests with the same controller epoch until the old controller's handleNewSession() can execute in the controller lock. ControllerEpochListener does not seem necessary, so we can probably delete it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1096) An old controller coming out of long GC could update its epoch to the latest controller's epoch
[ https://issues.apache.org/jira/browse/KAFKA-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1096: - Component/s: controller An old controller coming out of long GC could update its epoch to the latest controller's epoch --- Key: KAFKA-1096 URL: https://issues.apache.org/jira/browse/KAFKA-1096 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8.0 Reporter: Swapnil Ghike Fix For: 0.8.2 If a controller GCs for too long, we could have two controllers in the cluster. The controller epoch is supposed to minimize the damage in such a situation, as the brokers will reject the requests sent by the controller with an older epoch. When the old controller is still in long GC, a new controller could be elected. This will fire ControllerEpochListener on the old controller. When it comes out of GC, its ControllerEpochListener will update its own epoch to the new controller's epoch. So both controllers are now able to send out requests with the same controller epoch until the old controller's handleNewSession() can execute in the controller lock. ControllerEpochListener does not seem necessary, so we can probably delete it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1011) Decompression and re-compression on MirrorMaker could result in messages being dropped in the pipeline
[ https://issues.apache.org/jira/browse/KAFKA-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1011: - Fix Version/s: (was: 0.8.1) 0.9.0 Decompression and re-compression on MirrorMaker could result in messages being dropped in the pipeline -- Key: KAFKA-1011 URL: https://issues.apache.org/jira/browse/KAFKA-1011 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1011.v1.patch The way MirrorMaker works today is that its consumers could use a deep iterator to decompress messages received from the source brokers and its producers could re-compress the messages while sending them to the target brokers. Since MirrorMakers use a centralized data channel for its consumers to pipe messages to its producers, and since producers would compress messages with the same topic within a batch as a single produce request, this could result in messages accepted at the front end of the pipeline being dropped at the target brokers of the MirrorMaker due to MessageSizeTooLargeException if it happens that one batch of messages contains too many messages of the same topic in MirrorMaker's producer. If we can use a shallow iterator at the MirrorMaker's consumer side to directly pipe compressed messages this issue can be fixed. Also as Swapnil pointed out, currently if the MirrorMaker lags and there are large messages in the MirrorMaker queue (large after decompression), it can run into an OutOfMemoryException. Shallow iteration will be very helpful in avoiding this exception. The proposed solution of this issue is also related to KAFKA-527. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
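The failure mode described in KAFKA-1011 can be illustrated with a toy sketch (plain Python with zlib as a stand-in compressor; the size limit, message shapes, and names are all invented, this is not MirrorMaker code): forwarding the source's compressed batches unchanged keeps every request under the limit, while decompressing and re-batching all same-topic messages into one produce request can blow past it.

```python
import hashlib
import zlib

MAX_BATCH_BYTES = 500  # hypothetical stand-in for the broker's size limit

# Nine incompressible 128-byte messages for one topic (deterministic).
msgs = [b"".join(hashlib.sha256(b"%d-%d" % (i, j)).digest() for j in range(4))
        for i in range(9)]

# The source brokers sent them as three compressed batches of three.
src_batches = [zlib.compress(b"".join(msgs[i:i + 3])) for i in range(0, 9, 3)]

# Shallow iteration: each already-compressed batch is forwarded unchanged,
# so every request stays within the limit.
assert all(len(b) <= MAX_BATCH_BYTES for b in src_batches)

# Deep iteration: decompress, funnel everything through one channel, and
# re-compress all same-topic messages as a single batch in one request.
rebatched = zlib.compress(b"".join(zlib.decompress(b) for b in src_batches))
assert len(rebatched) > MAX_BATCH_BYTES  # the merged batch now exceeds it
```

The numbers are contrived, but the mechanism matches the report: the size constraint was satisfied batch-by-batch at the source, and re-batching silently re-groups the data into a unit that no longer satisfies it.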
[jira] [Updated] (KAFKA-1096) An old controller coming out of long GC could update its epoch to the latest controller's epoch
[ https://issues.apache.org/jira/browse/KAFKA-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1096: - Fix Version/s: (was: 0.8.1) 0.8.2 An old controller coming out of long GC could update its epoch to the latest controller's epoch --- Key: KAFKA-1096 URL: https://issues.apache.org/jira/browse/KAFKA-1096 Project: Kafka Issue Type: Bug Components: controller Affects Versions: 0.8.0 Reporter: Swapnil Ghike Fix For: 0.8.2 If a controller GCs for too long, we could have two controllers in the cluster. The controller epoch is supposed to minimize the damage in such a situation, as the brokers will reject the requests sent by the controller with an older epoch. When the old controller is still in long GC, a new controller could be elected. This will fire ControllerEpochListener on the old controller. When it comes out of GC, its ControllerEpochListener will update its own epoch to the new controller's epoch. So both controllers are now able to send out requests with the same controller epoch until the old controller's handleNewSession() can execute in the controller lock. ControllerEpochListener does not seem necessary, so we can probably delete it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged
[ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1108: - Fix Version/s: (was: 0.8.1) 0.8.2 when controlled shutdown attempt fails, the reason is not always logged --- Key: KAFKA-1108 URL: https://issues.apache.org/jira/browse/KAFKA-1108 Project: Kafka Issue Type: Bug Reporter: Jason Rosenberg Fix For: 0.8.2 In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and then if there's a failure, it will retry the controlledShutdown. Looking at the code, there are 2 ways a retry could fail: one with an error response from the controller, with this messaging code: {code} info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(","))) info("Error code from controller: %d".format(shutdownResponse.errorCode)) {code} Alternatively, there could be an IOException, with this code executed: {code} catch { case ioe: java.io.IOException => channel.disconnect() channel = null // ignore and try again } {code} And then finally, in either case: {code} if (!shutdownSuceeded) { Thread.sleep(config.controlledShutdownRetryBackoffMs) warn("Retrying controlled shutdown after the previous attempt failed...") } {code} It would be nice if the nature of the IOException were logged in either case (I'd be happy with an ioe.getMessage() instead of a full stack trace, as kafka in general tends to be too willing to dump IOException stack traces!). I suspect, in my case, the actual IOException is a socket timeout (as the time between the initial "Starting controlled shutdown" and the first "Retrying..." message is usually about 35 seconds (the socket timeout + the controlled shutdown retry backoff). So, it would seem that really, the issue in this case is that controlled shutdown is taking too long. It would seem sensible instead to have the controller report back to the server (before the socket timeout) that more time is needed, etc. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
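The pattern the report asks for, a retry loop that logs why each attempt failed (message only, no stack trace) before backing off, can be sketched generically. This is illustrative Python with invented names, not Kafka's Scala shutdown path:

```python
import logging
import time

log = logging.getLogger("controlled-shutdown")

def controlled_shutdown(attempt_fn, retries=3, backoff_s=0.01):
    """Run attempt_fn until it succeeds or retries are exhausted.

    Unlike a bare `// ignore and try again`, each failure's reason is
    logged (the exception's message, not a full stack trace) so an
    operator can tell a socket timeout apart from other I/O errors.
    """
    for attempt in range(1, retries + 1):
        try:
            attempt_fn()
            return True
        except IOError as ioe:
            log.warning("attempt %d failed: %s; retrying after backoff",
                        attempt, ioe)
            time.sleep(backoff_s)
    return False
```

With this shape, the 35-second gap the reporter observed would show up directly in the log as "attempt 1 failed: socket timeout ..." rather than having to be inferred from timestamps.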
[jira] [Updated] (KAFKA-570) Kafka should not need snappy jar at runtime
[ https://issues.apache.org/jira/browse/KAFKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-570: Fix Version/s: (was: 0.8.1) 0.8.2 Kafka should not need snappy jar at runtime --- Key: KAFKA-570 URL: https://issues.apache.org/jira/browse/KAFKA-570 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Swapnil Ghike Labels: bugs Fix For: 0.8.2 CompressionFactory imports snappy jar in a pattern match. The purpose of importing it this way seems to be avoiding the import unless snappy compression is actually required. However, kafka throws a ClassNotFoundException if snappy jar is removed at runtime from lib_managed. This exception can be easily seen by producing some data with the console producer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1002) Delete aliveLeaders field from LeaderAndIsrRequest
[ https://issues.apache.org/jira/browse/KAFKA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1002: - Fix Version/s: (was: 0.8.1) 0.8.2 Delete aliveLeaders field from LeaderAndIsrRequest -- Key: KAFKA-1002 URL: https://issues.apache.org/jira/browse/KAFKA-1002 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Swapnil Ghike Fix For: 0.8.2 After KAFKA-999 is committed, we don't need aliveLeaders in LeaderAndIsrRequest. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-566) Add last modified time to the TopicMetadataRequest
[ https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-566: Fix Version/s: (was: 0.8.1) 0.8.2 Add last modified time to the TopicMetadataRequest -- Key: KAFKA-566 URL: https://issues.apache.org/jira/browse/KAFKA-566 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.8.2 To support KAFKA-560 it would be nice to have a last modified time in the TopicMetadataRequest. This would be the timestamp of the last append to the log as taken from stat on the final log segment. Implementation would involve 1. Adding a new field to TopicMetadataRequest 2. Adding a method Log.lastModified: Long to get the last modified time from a log This timestamp would, of course, be subject to error in the event that the file was touched without modification, but I think that is actually okay since it provides a manual way to avoid gc'ing a topic that you know you will want. It is debatable whether this should go in 0.8. It would be nice to add the field to the metadata request, at least, as that change should be easy and would avoid needing to bump the version in the future. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
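Step 2 of the KAFKA-566 proposal, deriving the timestamp from a stat on the final log segment, might look roughly like this (an illustrative Python sketch; `last_modified` and the segment naming are assumptions for the example, not the proposed Scala `Log.lastModified` itself):

```python
import os

def last_modified(segment_paths):
    """Return the mtime of the final log segment.

    Log segments are named by base offset, so the lexicographically
    last path is the active segment; a stat on it gives the time of
    the last append (subject to the touch-without-modification caveat
    noted in the ticket).
    """
    final_segment = sorted(segment_paths)[-1]
    return os.stat(final_segment).st_mtime
```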
[jira] [Updated] (KAFKA-558) KafkaETLContext should use getTopicMetadata before sending offset requests
[ https://issues.apache.org/jira/browse/KAFKA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-558: Fix Version/s: (was: 0.8.1) 0.9.0 KafkaETLContext should use getTopicMetadata before sending offset requests -- Key: KAFKA-558 URL: https://issues.apache.org/jira/browse/KAFKA-558 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Joel Koshy Assignee: Joel Koshy Fix For: 0.9.0 Filing this or I may forget. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-564) Wildcard-based topic consumption should assign partitions to threads uniformly
[ https://issues.apache.org/jira/browse/KAFKA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-564: Fix Version/s: (was: 0.8.1) 0.9.0 Wildcard-based topic consumption should assign partitions to threads uniformly -- Key: KAFKA-564 URL: https://issues.apache.org/jira/browse/KAFKA-564 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Joel Koshy Fix For: 0.9.0 Right now, when a client uses createMessageStreamsByFilter and specifies 'n' streams (threads), 'n' should be <= the max partition count of any topic. If it is greater than that, the excess threads will be idle. However, it would be better to allow a greater number of threads and spread all the available partitions across the threads. This should not be too difficult, but may require significant refactoring. Although it is relevant to current trunk/0.7, we will target this for post-0.8. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
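The improvement KAFKA-564 describes amounts to a round-robin spread of all matched partitions over the requested streams; a minimal sketch under that reading (illustrative Python, not the consumer's actual rebalance logic):

```python
def assign_round_robin(partitions, num_streams):
    """Spread (topic, partition) pairs across streams round-robin.

    Every stream gets work as long as there are at least num_streams
    partitions in total (across all matched topics), and stream sizes
    differ by at most one -- instead of capping useful threads at the
    max partition count of any single topic.
    """
    streams = [[] for _ in range(num_streams)]
    for i, partition in enumerate(sorted(partitions)):
        streams[i % num_streams].append(partition)
    return streams
```

For example, two topics with 3 and 2 partitions give 5 assignable units, so 4 streams would all stay busy, whereas the per-topic cap described in the ticket would leave one idle.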
[jira] [Updated] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR
[ https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-661: Fix Version/s: (was: 0.8.1) 0.8.2 Prevent a shutting down broker from re-entering the ISR --- Key: KAFKA-661 URL: https://issues.apache.org/jira/browse/KAFKA-661 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Fix For: 0.8.2 There is a timing issue in controlled shutdown that affects low-volume topics. The leader that is being shut down receives a leaderAndIsrRequest informing it is no longer the leader and thus starts up a follower which starts issuing fetch requests to the new leader. We then shrink the ISR and send a StopReplicaRequest to the shutting down broker. However, the new leader upon receiving the fetch request expands the ISR again. This does not really have critical impact in the sense that it can cause producers to that topic to time out. However, there are probably very few or no produce requests coming in as it primarily affects low-volume topics. The shutdown logic itself seems to be working correctly in that the leader has been successfully moved. One possible approach would be to use the callback feature in the ControllerBrokerRequestBatch and wait until the StopReplicaRequest has been processed by the shutting down broker before shrinking the ISR; and there are probably other ways as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-493: Fix Version/s: (was: 0.8.1) 0.8.2 High CPU usage on inactive server - Key: KAFKA-493 URL: https://issues.apache.org/jira/browse/KAFKA-493 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.8.2 Attachments: stacktrace.txt I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background task? I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. Then I run a test that essentially does 10k publish operations followed by immediate consume operations (I was measuring latency). Once this has run, the kafka nodes seem to consistently be consuming CPU essentially forever. 
hprof results: THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, group=system) THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 21, name=main, group=main) THREAD START (obj=53ae, id = 27, name=Thread-2, group=main) THREAD START (obj=53ae, id = 28, name=Thread-3, group=main) THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, group=main) THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, group=main) THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main) THREAD START (obj=574b, id = 200012, name=ZkClient-EventThread-20-localhost:2181, group=main) THREAD START (obj=576e, id = 200014, name=main-SendThread(), group=main) THREAD START (obj=576d, id = 200013, name=main-EventThread, group=main) THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, group=main) THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, group=main) THREAD START (obj=53ae, id = 200017, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200018, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, group=main) THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, group=main) THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main) THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main) THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on broker 1, , group=main) THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on broker 1, , group=main) THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on broker 1, , group=main) THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on broker 1, , group=main) THREAD START (obj=53ae, id = 200028, name=SIGINT handler, group=system) THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main) THREAD START 
(obj=574b, id = 200030, name=Thread-1, group=main) THREAD START (obj=574b, id = 200031, name=Thread-0, group=main) THREAD END (id = 200031) THREAD END (id = 200029) THREAD END (id = 200020) THREAD END (id = 200019) THREAD END (id = 28) THREAD END (id = 200021) THREAD END (id = 27) THREAD END (id = 200022) THREAD END (id = 200018) THREAD END (id = 200017) THREAD END (id = 200012) THREAD END (id = 200013) THREAD END (id = 200014) THREAD END (id = 200025) THREAD END (id = 200023) THREAD END (id = 200026) THREAD END (id = 200024) THREAD END (id = 200011) THREAD END (id = 29) THREAD END (id = 200010) THREAD END (id = 200030) THREAD END (id = 200028) TRACE 301281: sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218) sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) kafka.utils.Utils$.read(Utils.scala:453) kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
[jira] [Commented] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR
[ https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888709#comment-13888709 ] Neha Narkhede commented on KAFKA-661: - 0.8.1 will add callbacks as part of delete topic. These could be used to fix the issue described here. Prevent a shutting down broker from re-entering the ISR --- Key: KAFKA-661 URL: https://issues.apache.org/jira/browse/KAFKA-661 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Fix For: 0.8.2 There is a timing issue in controlled shutdown that affects low-volume topics. The leader that is being shut down receives a leaderAndIsrRequest informing it is no longer the leader and thus starts up a follower which starts issuing fetch requests to the new leader. We then shrink the ISR and send a StopReplicaRequest to the shutting down broker. However, the new leader upon receiving the fetch request expands the ISR again. This does not really have critical impact in the sense that it can cause producers to that topic to time out. However, there are probably very few or no produce requests coming in as it primarily affects low-volume topics. The shutdown logic itself seems to be working correctly in that the leader has been successfully moved. One possible approach would be to use the callback feature in the ControllerBrokerRequestBatch and wait until the StopReplicaRequest has been processed by the shutting down broker before shrinking the ISR; and there are probably other ways as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-493) High CPU usage on inactive server
[ https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888711#comment-13888711 ] Neha Narkhede commented on KAFKA-493: - We should try this on Java 7 and observe the CPU usage. High CPU usage on inactive server - Key: KAFKA-493 URL: https://issues.apache.org/jira/browse/KAFKA-493 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.0 Reporter: Jay Kreps Fix For: 0.8.2 Attachments: stacktrace.txt I've been playing with the 0.8 branch of Kafka and noticed that idle CPU usage is fairly high (13% of a core). Is that to be expected? I did look at the stack, but didn't see anything obvious. A background task? I wanted to mention how I am getting into this state. I've set up two machines with the latest 0.8 code base and am using a replication factor of 2. On starting the brokers there is no idle CPU activity. Then I run a test that essentially does 10k publish operations followed by immediate consume operations (I was measuring latency). Once this has run, the kafka nodes seem to consistently be consuming CPU essentially forever. 
hprof results: THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, group=system) THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, group=system) THREAD START (obj=53ae, id = 21, name=main, group=main) THREAD START (obj=53ae, id = 27, name=Thread-2, group=main) THREAD START (obj=53ae, id = 28, name=Thread-3, group=main) THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, group=main) THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, group=main) THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main) THREAD START (obj=574b, id = 200012, name=ZkClient-EventThread-20-localhost:2181, group=main) THREAD START (obj=576e, id = 200014, name=main-SendThread(), group=main) THREAD START (obj=576d, id = 200013, name=main-EventThread, group=main) THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, group=main) THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, group=main) THREAD START (obj=53ae, id = 200017, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200018, name=request-expiration-task, group=main) THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, group=main) THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, group=main) THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main) THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main) THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on broker 1, , group=main) THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on broker 1, , group=main) THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on broker 1, , group=main) THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on broker 1, , group=main) THREAD START (obj=53ae, id = 200028, name=SIGINT handler, group=system) THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main) THREAD START 
(obj=574b, id = 200030, name=Thread-1, group=main) THREAD START (obj=574b, id = 200031, name=Thread-0, group=main) THREAD END (id = 200031) THREAD END (id = 200029) THREAD END (id = 200020) THREAD END (id = 200019) THREAD END (id = 28) THREAD END (id = 200021) THREAD END (id = 27) THREAD END (id = 200022) THREAD END (id = 200018) THREAD END (id = 200017) THREAD END (id = 200012) THREAD END (id = 200013) THREAD END (id = 200014) THREAD END (id = 200025) THREAD END (id = 200023) THREAD END (id = 200026) THREAD END (id = 200024) THREAD END (id = 200011) THREAD END (id = 29) THREAD END (id = 200010) THREAD END (id = 200030) THREAD END (id = 200028) TRACE 301281: sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218) sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) kafka.utils.Utils$.read(Utils.scala:453)
[jira] [Updated] (KAFKA-906) Invoke halt on shutdown and startup failure to ensure the jvm is brought down
[ https://issues.apache.org/jira/browse/KAFKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-906: Fix Version/s: (was: 0.8.1) Invoke halt on shutdown and startup failure to ensure the jvm is brought down - Key: KAFKA-906 URL: https://issues.apache.org/jira/browse/KAFKA-906 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Sometimes kafka is made to run as a service in an external container. This container usually disables the individual services from exiting the process by installing a security manager. The right fix is to implement the startup and shutdown logic of kafka using the interfaces provided by these containers which would involve more work. For 0.8, we will simply call halt as the last step of shutdown and startup during a failure. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
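The workaround described above (call halt as the last step so a container's SecurityManager cannot keep the JVM alive) can be sketched as below. This is a hypothetical illustration, not Kafka's actual code: the exit and halt actions are injected so the control flow is testable; production code would pass `System::exit` and `Runtime.getRuntime()::halt`.

```java
import java.util.function.IntConsumer;

class FatalExitSketch {
    // On fatal startup/shutdown failure, try a normal exit first, then halt.
    // A container-installed SecurityManager may veto exit() with a
    // SecurityException, but halt() cannot be intercepted.
    static void fatalExit(int status, IntConsumer exit, IntConsumer halt) {
        try {
            exit.accept(status);   // may be vetoed by a SecurityManager
        } catch (SecurityException ignored) {
            // swallowed: we still want to bring the JVM down
        } finally {
            halt.accept(status);   // unreachable only if exit() really exited
        }
    }
}
```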
[jira] [Updated] (KAFKA-1034) Improve partition reassignment to optimize writes to zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1034: - Fix Version/s: (was: 0.8.1) 0.8.2 Improve partition reassignment to optimize writes to zookeeper -- Key: KAFKA-1034 URL: https://issues.apache.org/jira/browse/KAFKA-1034 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Fix For: 0.8.2 For ReassignPartition tool, check if optimizing the writes to ZK after every replica reassignment is possible -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (KAFKA-906) Invoke halt on shutdown and startup failure to ensure the jvm is brought down
[ https://issues.apache.org/jira/browse/KAFKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede resolved KAFKA-906. - Resolution: Won't Fix Invoke halt on shutdown and startup failure to ensure the jvm is brought down - Key: KAFKA-906 URL: https://issues.apache.org/jira/browse/KAFKA-906 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Fix For: 0.8.1 Sometimes kafka is made to run as a service in an external container. This container usually disables the individual services from exiting the process by installing a security manager. The right fix is to implement the startup and shutdown logic of kafka using the interfaces provided by these containers which would involve more work. For 0.8, we will simply call halt as the last step of shutdown and startup during a failure. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-984) Avoid a full rebalance in cases when a new topic is discovered but container/broker set stay the same
[ https://issues.apache.org/jira/browse/KAFKA-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-984: Fix Version/s: (was: 0.8.1) 0.9.0 Avoid a full rebalance in cases when a new topic is discovered but container/broker set stay the same - Key: KAFKA-984 URL: https://issues.apache.org/jira/browse/KAFKA-984 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-984.v1.patch, KAFKA-984.v2.patch, KAFKA-984.v2.patch Currently a full rebalance will be triggered on high level consumers even when just a new topic is added to ZK. It would be better to avoid this behavior and rebalance only on the newly added topic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (KAFKA-666) Fetch requests from the replicas take several seconds to complete on the leader
[ https://issues.apache.org/jira/browse/KAFKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede resolved KAFKA-666. - Resolution: Incomplete Fix Version/s: (was: 0.8.1) Need more information to understand the issue Fetch requests from the replicas take several seconds to complete on the leader --- Key: KAFKA-666 URL: https://issues.apache.org/jira/browse/KAFKA-666 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Neha Narkhede Priority: Critical Labels: replication-performance I've seen that fetch requests from the replicas take several seconds to complete. The nature of the latency breakdown is different, sometimes they spend too long sitting in the request/response queue, sometimes the local/remote time is too large -
[2012-12-07 20:14:51,233] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.45-port_9092: FetchRequest:9967, queueTime:0, localTime:4, remoteTime:9963, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 20:14:51,236] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.45-port_9092: FetchRequest:9967, queueTime:1, localTime:3, remoteTime:9963, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 20:14:51,239] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.45-port_9092: FetchRequest:9966, queueTime:0, localTime:2, remoteTime:9964, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 20:16:07,643] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.45-port_9092: FetchRequest:9996, queueTime:1, localTime:2, remoteTime:9992, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 20:16:07,645] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.45-port_9092: FetchRequest:, queueTime:0, localTime:4, remoteTime:9994, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 20:59:22,424] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3502, queueTime:1, localTime:1, remoteTime:3500, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:13:35,042] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3981, queueTime:0, localTime:1, remoteTime:3979, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 21:13:35,042] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:4095, queueTime:1, localTime:6, remoteTime:4088, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:13:57,254] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:4116, queueTime:1, localTime:1, remoteTime:4113, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 21:13:57,300] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3844, queueTime:1, localTime:3795, remoteTime:48, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:14:19,645] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:4239, queueTime:1, localTime:1, remoteTime:4236, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 21:14:19,689] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3977, queueTime:3931, localTime:8, remoteTime:38, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:23:58,427] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3940, queueTime:1, localTime:1, remoteTime:3938, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:23:58,435] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3858, queueTime:1, localTime:6, remoteTime:3851, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:24:21,575] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:4037, queueTime:0, localTime:1, remoteTime:4036, sendTime:0 (kafka.network.RequestChannel$)
[2012-12-07 21:24:21,583] TRACE Completed request with correlation id 0 and client replica-fetcher-host_172.20.72.48-port_9092: FetchRequest:3962, queueTime:1, localTime:4, remoteTime:3956, sendTime:1 (kafka.network.RequestChannel$)
[2012-12-07 21:24:43,965] TRACE Completed request with correlation id 0 and client
[jira] [Updated] (KAFKA-1034) Improve partition reassignment to optimize writes to zookeeper
[ https://issues.apache.org/jira/browse/KAFKA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1034: - Description: For ReassignPartition tool, check if optimizing the writes to ZK after every replica reassignment is possible (was: Once the 0.8 changes are merged to trunk, we need to do the following 1. Integrate AddPartition command with other admin tools in trunk 2. Remove usage of the first index of the partition list to find the number of replicas 3. For ReassignPartition tool, check if optimizing the writes to ZK after every replica reassignment is possible) Improve partition reassignment to optimize writes to zookeeper -- Key: KAFKA-1034 URL: https://issues.apache.org/jira/browse/KAFKA-1034 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Fix For: 0.8.2 For ReassignPartition tool, check if optimizing the writes to ZK after every replica reassignment is possible -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Closed] (KAFKA-906) Invoke halt on shutdown and startup failure to ensure the jvm is brought down
[ https://issues.apache.org/jira/browse/KAFKA-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-906. --- Invoke halt on shutdown and startup failure to ensure the jvm is brought down - Key: KAFKA-906 URL: https://issues.apache.org/jira/browse/KAFKA-906 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Sriram Subramanian Assignee: Sriram Subramanian Sometimes kafka is made to run as a service in an external container. This container usually disables the individual services from exiting the process by installing a security manager. The right fix is to implement the startup and shutdown logic of kafka using the interfaces provided by these containers which would involve more work. For 0.8, we will simply call halt as the last step of shutdown and startup during a failure. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-740) Improve crash-safety of log segment swap
[ https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-740: Fix Version/s: (was: 0.8.1) 0.8.2 Improve crash-safety of log segment swap Key: KAFKA-740 URL: https://issues.apache.org/jira/browse/KAFKA-740 Project: Kafka Issue Type: Bug Components: log Reporter: Jay Kreps Assignee: Jay Kreps Fix For: 0.8.2 Currently Log.replaceSegments has a bug that can cause a swap containing multiple segments to partially complete. This would lead to duplicate data in the log. The proposed fix is to use a name like offset1_and_offset2.swap for a segment meant to replace segments with base offsets offset1 and offset2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
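The naming scheme proposed above (encode every base offset being replaced into the .swap file name, so that recovery after a crash knows exactly which segments the swap was meant to replace) can be sketched as follows. This is an illustrative sketch only; the class and method names are assumptions, not Kafka's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SwapFileName {
    // Build a swap file name like "offset1_and_offset2.swap" from the base
    // offsets of the segments being replaced.
    static String forOffsets(List<Long> baseOffsets) {
        return baseOffsets.stream()
                .map(String::valueOf)
                .collect(Collectors.joining("_and_")) + ".swap";
    }

    // During crash recovery, parse the name back into the list of base
    // offsets so the partial swap can be completed or rolled back.
    static List<Long> parse(String name) {
        String stem = name.substring(0, name.length() - ".swap".length());
        return Arrays.stream(stem.split("_and_"))
                .map(Long::parseLong)
                .collect(Collectors.toList());
    }
}
```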
[jira] [Updated] (KAFKA-740) Improve crash-safety of log segment swap
[ https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-740: Component/s: log Improve crash-safety of log segment swap Key: KAFKA-740 URL: https://issues.apache.org/jira/browse/KAFKA-740 Project: Kafka Issue Type: Bug Components: log Reporter: Jay Kreps Assignee: Jay Kreps Fix For: 0.8.2 Currently Log.replaceSegments has a bug that can cause a swap containing multiple segments to partially complete. This would lead to duplicate data in the log. The proposed fix is to use a name like offset1_and_offset2.swap for a segment meant to replace segments with base offsets offset1 and offset2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-747) RequestChannel re-design
[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888717#comment-13888717 ] Neha Narkhede commented on KAFKA-747: - Relatively large redesign, moving out of 0.8.1 RequestChannel re-design Key: KAFKA-747 URL: https://issues.apache.org/jira/browse/KAFKA-747 Project: Kafka Issue Type: New Feature Components: network Reporter: Jay Kreps Assignee: Neha Narkhede Fix For: 0.8.2 We have had some discussion around how to handle queuing requests. There are two competing concerns: 1. We need to maintain request order on a per-socket basis. 2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O request processing continues. Two Approaches We Have Considered 1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering have the network layer only read one request at a time. 2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues. Problems With These Approaches In the first case you are not able to get any per-producer parallelism. That is you can't read the next request while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be. In the second case there are two problems. The first is that when an I/O thread gets blocked all request processing for sockets attached to that I/O thread will grind to a halt. If you have 10,000 connections, and 10 I/O threads, then each blockage will stop 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking) this will cause big latency blips for all producers using that I/O thread. The next problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads. And say that one I/O thread stalls. 
Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues. A Proposed Better Solution The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered. Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing. Instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
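The proposed RequestChannel internals (a linked list of pending requests plus a set of in-flight socket keys) can be sketched as below. This is a minimal illustration of the scheduling invariant only, with illustrative names rather than Kafka's real API: requests for the same socket key stay ordered, while a blocked key never prevents other keys from being dispatched.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class Request {
    final int socketKey;
    final String payload;
    Request(int socketKey, String payload) { this.socketKey = socketKey; this.payload = payload; }
}

class OrderedRequestChannel {
    private final Deque<Request> pending = new ArrayDeque<>();
    private final Set<Integer> inFlightKeys = new HashSet<>();

    // Network threads append here; this never blocks on a stalled I/O thread.
    synchronized void send(Request r) { pending.addLast(r); }

    // I/O threads take the FIRST request whose socket key is not already in
    // flight: per-socket order is preserved, other sockets are never starved.
    synchronized Request poll() {
        for (Iterator<Request> it = pending.iterator(); it.hasNext(); ) {
            Request r = it.next();
            if (!inFlightKeys.contains(r.socketKey)) {
                it.remove();
                inFlightKeys.add(r.socketKey);
                return r;
            }
        }
        return null; // nothing eligible right now
    }

    // Called once the response for a request has been sent.
    synchronized void complete(Request r) { inFlightKeys.remove(r.socketKey); }
}
```

Note how a request for key K only blocks later requests for K: while K is in flight, poll() skips over K's queued requests but still hands out work for every other key.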
[jira] [Updated] (KAFKA-747) RequestChannel re-design
[ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-747: Fix Version/s: (was: 0.8.1) 0.8.2 RequestChannel re-design Key: KAFKA-747 URL: https://issues.apache.org/jira/browse/KAFKA-747 Project: Kafka Issue Type: New Feature Components: network Reporter: Jay Kreps Assignee: Neha Narkhede Fix For: 0.8.2 We have had some discussion around how to handle queuing requests. There are two competing concerns: 1. We need to maintain request order on a per-socket basis. 2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O request processing continues. Two Approaches We Have Considered 1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering have the network layer only read one request at a time. 2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues. Problems With These Approaches In the first case you are not able to get any per-producer parallelism. That is you can't read the next request while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be. In the second case there are two problems. The first is that when an I/O thread gets blocked all request processing for sockets attached to that I/O thread will grind to a halt. If you have 10,000 connections, and 10 I/O threads, then each blockage will stop 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking) this will cause big latency blips for all producers using that I/O thread. The next problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads. And say that one I/O thread stalls. 
Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues. A Proposed Better Solution The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another. Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered. Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first thing. Instead we need to walk the list and look for something with a request key not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array. This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-806) Index may not always observe log.index.interval.bytes
[ https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-806: Component/s: (was: core) log Index may not always observe log.index.interval.bytes - Key: KAFKA-806 URL: https://issues.apache.org/jira/browse/KAFKA-806 Project: Kafka Issue Type: Improvement Components: log Reporter: Jun Rao Currently, each log.append() will add at most 1 index entry, even when the appended data is larger than log.index.interval.bytes. One potential issue is that if a follower restarts after being down for a long time, it may fetch data much bigger than log.index.interval.bytes at a time. This means that fewer index entries are created, which can increase the fetch time from the consumers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
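The improvement implied above (add one index entry per log.index.interval.bytes of appended data, rather than at most one per append) can be sketched as below. Names and structure are illustrative assumptions, not Kafka's actual OffsetIndex code.

```java
class OffsetIndexSketch {
    final int indexIntervalBytes;   // corresponds to log.index.interval.bytes
    int bytesSinceLastEntry = 0;
    int entries = 0;

    OffsetIndexSketch(int indexIntervalBytes) {
        this.indexIntervalBytes = indexIntervalBytes;
    }

    // Called once per append. A large batch (e.g. a follower catching up after
    // a long outage) may cross the interval several times and therefore
    // produces several index entries, not just one.
    void onAppend(int appendedBytes) {
        bytesSinceLastEntry += appendedBytes;
        while (bytesSinceLastEntry >= indexIntervalBytes) {
            entries++;  // a real index would record an (offset, position) pair here
            bytesSinceLastEntry -= indexIntervalBytes;
        }
    }
}
```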
[jira] [Updated] (KAFKA-404) When using chroot path, create chroot on startup if it doesn't exist
[ https://issues.apache.org/jira/browse/KAFKA-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-404: Fix Version/s: (was: 0.8.1) 0.8.2 When using chroot path, create chroot on startup if it doesn't exist Key: KAFKA-404 URL: https://issues.apache.org/jira/browse/KAFKA-404 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1 Environment: CentOS 5.5, Linux 2.6.18-194.32.1.el5 x86_64 GNU/Linux Reporter: Jonathan Creasy Labels: newbie, patch Fix For: 0.8.2 Attachments: KAFKA-404-0.7.1.patch, KAFKA-404-0.8.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-i.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v2.patch, KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-803) Offset returned to producer is not consistent
[ https://issues.apache.org/jira/browse/KAFKA-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-803: Fix Version/s: (was: 0.8.1) 0.9.0 Offset returned to producer is not consistent - Key: KAFKA-803 URL: https://issues.apache.org/jira/browse/KAFKA-803 Project: Kafka Issue Type: Bug Reporter: Jun Rao Fix For: 0.9.0 Currently, the offset that we return to the producer is the offset of the first message if the producer request doesn't go through purgatory, and the offset of the last message, if otherwise. We need to make this consistent. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-795) Improvements to PreferredReplicaLeaderElection tool
[ https://issues.apache.org/jira/browse/KAFKA-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-795: Fix Version/s: (was: 0.8.1) 0.8.2 Improvements to PreferredReplicaLeaderElection tool --- Key: KAFKA-795 URL: https://issues.apache.org/jira/browse/KAFKA-795 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Swapnil Ghike Assignee: Swapnil Ghike Fix For: 0.8.2 We can make some improvements to the PreferredReplicaLeaderElection tool: 1. Terminate the tool if a controller is not up and running. Currently we can run the tool without having any broker running, which is kind of confusing. 2. Should we delete /admin zookeeper path in PreferredReplicaLeaderElection (and ReassignPartition) tool at the end? Otherwise the next run of the tool complains that a replica election is already in progress. 3. If there is an error, we can see it in controller.log. Should the tool also throw an error? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-806) Index may not always observe log.index.interval.bytes
[ https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-806: Fix Version/s: (was: 0.8.1) Index may not always observe log.index.interval.bytes - Key: KAFKA-806 URL: https://issues.apache.org/jira/browse/KAFKA-806 Project: Kafka Issue Type: Improvement Components: log Reporter: Jun Rao Currently, each log.append() will add at most 1 index entry, even when the appended data is larger than log.index.interval.bytes. One potential issue is that if a follower restarts after being down for a long time, it may fetch data much bigger than log.index.interval.bytes at a time. This means that fewer index entries are created, which can increase the fetch time from the consumers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-766) Isr shrink/expand check is fragile
[ https://issues.apache.org/jira/browse/KAFKA-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-766: Fix Version/s: (was: 0.8.1) Isr shrink/expand check is fragile -- Key: KAFKA-766 URL: https://issues.apache.org/jira/browse/KAFKA-766 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Sriram Subramanian Assignee: Neha Narkhede Currently the isr check is coupled tightly with the produce batch size. For example, if the producer batch size is 1 message and the isr check is 4000 messages, we continuously oscillate between shrinking isr and expanding isr every second. This is because a single produce request throws the replica out of the isr. This results in hundreds of calls to ZK (we still don't have multi write). This can be alleviated by making the producer batch size smaller than the isr check size. Going forward, we should try to not have this coupling. It is worth investigating if we can make the check more robust under such scenarios. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-806) Index may not always observe log.index.interval.bytes
[ https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888718#comment-13888718 ] Neha Narkhede commented on KAFKA-806: - [~junrao] Do we still need this? Index may not always observe log.index.interval.bytes - Key: KAFKA-806 URL: https://issues.apache.org/jira/browse/KAFKA-806 Project: Kafka Issue Type: Improvement Components: log Reporter: Jun Rao Currently, each log.append() will add at most 1 index entry, even when the appended data is larger than log.index.interval.bytes. One potential issue is that if a follower restarts after being down for a long time, it may fetch data much bigger than log.index.interval.bytes at a time. This means that fewer index entries are created, which can increase the fetch time from the consumers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-438) Code cleanup in MessageTest
[ https://issues.apache.org/jira/browse/KAFKA-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-438: Fix Version/s: (was: 0.8.1) 0.8.2 Code cleanup in MessageTest --- Key: KAFKA-438 URL: https://issues.apache.org/jira/browse/KAFKA-438 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.7.1 Reporter: Jim Plush Priority: Trivial Fix For: 0.8.2 Attachments: KAFKA-438 While exploring the Unit Tests this class had an unused import statement, some ambiguity on which HashMap implementation was being used and assignments of function returns when not required. Trivial stuff -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1146) toString() on KafkaStream gets stuck indefinitely
[ https://issues.apache.org/jira/browse/KAFKA-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1146: - Fix Version/s: (was: 0.8.1) 0.9.0 toString() on KafkaStream gets stuck indefinitely - Key: KAFKA-1146 URL: https://issues.apache.org/jira/browse/KAFKA-1146 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 0.8.0 Reporter: Arup Malakar Priority: Trivial Labels: newbie Fix For: 0.9.0 There is no toString implementation for KafkaStream, so if a user tries to print the stream it falls back to default toString implementation which tries to iterate over the collection and gets stuck indefinitely as it awaits messages. KafkaStream could instead override the toString and return a verbose description of the stream with topic name etc. println("Current stream: " + stream) // This call never returns -- This message was sent by Atlassian JIRA (v6.1.5#6160)
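The fix suggested above (override toString() so printing a stream never iterates the blocking message queue) can be sketched as below. KafkaStream's real fields and constructor differ; these names are assumptions for illustration only.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;

class KafkaStreamSketch<V> implements Iterable<V> {
    private final String topic;
    private final BlockingQueue<V> queue;

    KafkaStreamSketch(String topic, BlockingQueue<V> queue) {
        this.topic = topic;
        this.queue = queue;
    }

    @Override
    public Iterator<V> iterator() {
        // The real iterator blocks awaiting messages; omitted in this sketch.
        throw new UnsupportedOperationException("blocking iterator omitted");
    }

    // Describe the stream without touching the queue, so println(stream)
    // returns immediately instead of blocking forever.
    @Override
    public String toString() {
        return "KafkaStream(topic=" + topic + ")";
    }
}
```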
[jira] [Updated] (KAFKA-1040) ConsumerConfig and ProducerConfig do work in the Constructor
[ https://issues.apache.org/jira/browse/KAFKA-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1040: - Fix Version/s: (was: 0.8.1) 0.9.0 ConsumerConfig and ProducerConfig do work in the Constructor -- Key: KAFKA-1040 URL: https://issues.apache.org/jira/browse/KAFKA-1040 Project: Kafka Issue Type: Improvement Components: config, consumer, producer Affects Versions: 0.8.0 Environment: Java 1.7 Linux Mint 14 (64bit) Reporter: Sharmarke Aden Assignee: Neha Narkhede Priority: Minor Labels: config Fix For: 0.9.0 It appears that validation of configuration properties is performed in the ConsumerConfig and ProducerConfig constructors. This is generally bad practice as it couples object construction and validation. It also makes it difficult to mock these objects in unit tests. Ideally validation of the configuration properties should be separated from object construction and initiated by those that rely/use these config objects. http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
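The refactoring suggested above (separate validation from construction so config objects are easy to mock in tests) can be sketched as below. This is a hedged sketch under assumed names, not Kafka's actual ConsumerConfig: the constructor only stores the properties, and validation is an explicit step invoked by code that relies on the config.

```java
import java.util.Properties;

class ConsumerConfigSketch {
    final Properties props;

    // No work in the constructor: tests can construct unvalidated instances.
    ConsumerConfigSketch(Properties props) {
        this.props = props;
    }

    // Validation is explicit and separate from object construction, invoked
    // by the components that actually depend on the config.
    void validate() {
        if (props.getProperty("group.id") == null) {
            throw new IllegalArgumentException("group.id is required");
        }
    }
}
```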
[jira] [Updated] (KAFKA-1027) Add documentation for system tests
[ https://issues.apache.org/jira/browse/KAFKA-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1027: - Fix Version/s: (was: 0.8.1) 0.9.0 Add documentation for system tests -- Key: KAFKA-1027 URL: https://issues.apache.org/jira/browse/KAFKA-1027 Project: Kafka Issue Type: Sub-task Components: website Reporter: Tejas Patil Priority: Minor Labels: documentation Fix For: 0.9.0 Create a document describing following things: - Overview of Kafka system test framework - how to run the entire system test suite - how to run a specific system test - how to interpret the system test results - how to troubleshoot a failed test case - how to add new test module - how to add new test case -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (KAFKA-712) Controlled shutdown tool should provide a meaningful message if a controller failover occurs during the operation
[ https://issues.apache.org/jira/browse/KAFKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede resolved KAFKA-712. - Resolution: Not A Problem Controlled shutdown tool should provide a meaningful message if a controller failover occurs during the operation - Key: KAFKA-712 URL: https://issues.apache.org/jira/browse/KAFKA-712 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Joel Koshy Priority: Minor Fix For: 0.8.1 If the controller fails over before a jmx connection can be established, the tool shows the following exception:
javax.management.InstanceNotFoundException: kafka.controller:type=KafkaController,name=ControllerOps
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassLoaderFor(DefaultMBeanServerInterceptor.java:1438)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getClassLoaderFor(JmxMBeanServer.java:1276)
	at javax.management.remote.rmi.RMIConnectionImpl$5.run(RMIConnectionImpl.java:1326)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.management.remote.rmi.RMIConnectionImpl.getClassLoaderFor(RMIConnectionImpl.java:1323)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:771)
	at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
	at sun.rmi.transport.Transport$1.run(Transport.java:159)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
	at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
	at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:50)
	at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:105)
	at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala)
Using the retry option on the tool would work, but we should provide a more meaningful message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1116) Need to upgrade sbt-assembly to compile on scala 2.10.2
[ https://issues.apache.org/jira/browse/KAFKA-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1116: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Since we are moving to gradle. Need to upgrade sbt-assembly to compile on scala 2.10.2 --- Key: KAFKA-1116 URL: https://issues.apache.org/jira/browse/KAFKA-1116 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0, 0.8.1 Reporter: Kane Kim Priority: Minor Labels: build Fix For: 0.8.1 Attachments: kafka-1116-scala.2.10.2.patch Need to upgrade sbt-assembly to 0.9.0 to compile on scala 2.10.2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-334) Some tests fail when building on a Windows box
[ https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-334: Fix Version/s: (was: 0.8.1) 0.8.2 Some tests fail when building on a Windows box -- Key: KAFKA-334 URL: https://issues.apache.org/jira/browse/KAFKA-334 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.7 Environment: Windows 7 - reproduces under command shell, cygwin, and MINGW32 (Git Bash) Reporter: Roman Garcia Priority: Minor Labels: build-failure, test-fail Fix For: 0.8.2 Trying to create a ZIP distro from sources failed. On Win7. On cygwin, command shell and git bash. Tried with incubator-src download from ASF download page, as well as fresh checkout from latest trunk (r1329547). Once I tried the same on a Linux box, everything was working ok. svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-0.7.0 ./sbt update (OK) ./sbt package (OK) ./sbt release-zip (FAIL) Tests failing: [error] Error running kafka.integration.LazyInitProducerTest: Test FAILED [error] Error running kafka.zk.ZKLoadBalanceTest: Test FAILED [error] Error running kafka.javaapi.producer.ProducerTest: Test FAILED [error] Error running kafka.producer.ProducerTest: Test FAILED [error] Error running test: One or more subtasks failed [error] Error running doc: Scaladoc generation failed Stacks: [error] Test Failed: testZKSendWithDeadBroker junit.framework.AssertionFailedError: Message set should have another message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendWithDeadBroker(ProducerTest.scala:448) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at kafka.javaapi.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:416) [error] Test Failed: 
testLoadBalance(kafka.zk.ZKLoadBalanceTest) junit.framework.AssertionFailedError: expected:5 but was:0 at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:277) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at kafka.zk.ZKLoadBalanceTest.checkSetEqual(ZKLoadBalanceTest.scala:121) at kafka.zk.ZKLoadBalanceTest.testLoadBalance(ZKLoadBalanceTest.scala:89) [error] Test Failed: testPartitionedSendToNewTopic java.lang.AssertionError: Unexpected method call send(test-topic1, 0, ByteBufferMessageSet(MessageAndOffset(message(magic = 1, attributes = 0, crc = 2326977762, payload = java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]),15), )): close(): expected: 1, actual: 0 at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45) at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73) at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92) at kafka.producer.SyncProducer$$EnhancerByCGLIB$$4385e618.send(generated) at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43) at kafka.producer.ProducerPool.send(ProducerPool.scala:100) at kafka.producer.Producer.zkSend(Producer.scala:137) at kafka.producer.Producer.send(Producer.scala:99) at kafka.producer.ProducerTest.testPartitionedSendToNewTopic(ProducerTest.scala:576) [error] Test Failed: testZKSendToNewTopic junit.framework.AssertionFailedError: Message set should have 1 message at junit.framework.Assert.fail(Assert.java:47) at 
junit.framework.Assert.assertTrue(Assert.java:20) at kafka.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:429) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-997) Provide a strict verification mode when reading configuration properties
[ https://issues.apache.org/jira/browse/KAFKA-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-997: Fix Version/s: (was: 0.8.1) 0.8.2 Provide a strict verification mode when reading configuration properties Key: KAFKA-997 URL: https://issues.apache.org/jira/browse/KAFKA-997 Project: Kafka Issue Type: Improvement Components: config Reporter: Sam Meder Assignee: Sam Meder Priority: Minor Fix For: 0.8.2 Attachments: strict-verification-2.patch This ticket is based on the discussion in KAFKA-943. It introduces a new property that makes the config system throw an exception when it encounters unrecognized properties (instead of a simple warn-level log statement). This new property defaults to false. Hopefully this will result in fewer instances of out-of-date configuration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
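The behavior described above can be sketched roughly as follows; all names here are hypothetical, not the actual Kafka config classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Sketch of the proposed strict mode: in strict mode, unrecognized
// properties raise an exception; otherwise only a warning is logged
// (today's behavior, and the default).
public class ConfigVerifier {
    public static void verify(Properties props, Set<String> known, boolean strict) {
        List<String> unknown = new ArrayList<>();
        for (Object key : props.keySet())
            if (!known.contains(key.toString())) unknown.add(key.toString());
        if (!unknown.isEmpty()) {
            String msg = "Unrecognized configuration properties: " + unknown;
            if (strict) throw new IllegalArgumentException(msg);
            System.err.println("WARN " + msg); // default: warn, keep going
        }
    }
}
```

A mistyped property such as `broker.idd` would then fail fast at startup instead of being silently ignored.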
[jira] [Closed] (KAFKA-1116) Need to upgrade sbt-assembly to compile on scala 2.10.2
[ https://issues.apache.org/jira/browse/KAFKA-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede closed KAFKA-1116. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-322) Remove one-off Send objects
[ https://issues.apache.org/jira/browse/KAFKA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-322: Fix Version/s: (was: 0.8.1) 0.9.0 Remove one-off Send objects --- Key: KAFKA-322 URL: https://issues.apache.org/jira/browse/KAFKA-322 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.8.0 Reporter: Jay Kreps Assignee: Jay Kreps Priority: Minor Labels: replication Fix For: 0.9.0 We seem to be accumulating a bunch of unnecessary classes that implement Send. I am not sure why people are doing this. Example: ProducerResponseSend.scala It is not at all clear why we would add a custom send object for each request/response type. They all do the same thing. The only reason for having the concept of a Send object was to allow two implementations: ByteBufferSend and MessageSetSend; the latter lets us abstract over the difference between a normal write and a sendfile() call. I think we can refactor ByteBufferSend to take one or more ByteBuffers instead of just one and delete all of these one-offs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
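The refactoring suggested above could look roughly like this (a sketch with a hypothetical class name, shown in Java rather than the Scala of the actual codebase): a single Send implementation over an array of buffers, relying on a gathering write to cover all the per-request cases.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

// Hypothetical replacement for the one-off Send subclasses: one class that
// writes any number of ByteBuffers with a single gathering write.
public class MultiByteBufferSend {
    private final ByteBuffer[] buffers;

    public MultiByteBufferSend(ByteBuffer... buffers) {
        this.buffers = buffers;
    }

    /** True once every buffer has been fully written. */
    public boolean complete() {
        for (ByteBuffer b : buffers)
            if (b.hasRemaining()) return false;
        return true;
    }

    /** Writes as much as the channel accepts; call until complete(). */
    public long writeTo(GatheringByteChannel channel) throws IOException {
        return channel.write(buffers);
    }
}
```

Each request/response type would then just build its buffers and hand them to this one class, instead of defining its own Send subclass.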
[jira] [Updated] (KAFKA-179) Log files always touched when broker is bounced
[ https://issues.apache.org/jira/browse/KAFKA-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-179: Fix Version/s: (was: 0.8.1) 0.8.2 Log files always touched when broker is bounced --- Key: KAFKA-179 URL: https://issues.apache.org/jira/browse/KAFKA-179 Project: Kafka Issue Type: Bug Components: core Reporter: Joel Koshy Priority: Minor Labels: newbie Fix For: 0.8.2 It looks like the latest log segment is always touched upon broker start-up, regardless of whether it has corrupt data or not, which fudges the segment's mtime. Minor issue, but I found it a bit misleading when trying to verify a log cleanup setting in production. I think it should be as simple as adding a guard in FileMessageSet's recover method to skip truncate if validUpTo == the length of the segment. Will test this later. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
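The guard proposed above amounts to a one-line predicate; a sketch with hypothetical names (the real code lives in FileMessageSet.recover):

```java
// Sketch of the proposed guard: only truncate (and hence touch the file's
// mtime) when recovery actually found invalid trailing bytes.
public class RecoverySketch {
    /** True if the segment should be truncated after recovery scanned it. */
    public static boolean needsTruncate(long validUpTo, long segmentLength) {
        // If every byte validated, leave the file - and its mtime - untouched.
        return validUpTo < segmentLength;
    }
}
```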
[jira] [Commented] (KAFKA-121) pom should include standard maven niceties
[ https://issues.apache.org/jira/browse/KAFKA-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888722#comment-13888722 ] Neha Narkhede commented on KAFKA-121: - [~joestein] What do we need to do here before the 0.8.1 release? pom should include standard maven niceties -- Key: KAFKA-121 URL: https://issues.apache.org/jira/browse/KAFKA-121 Project: Kafka Issue Type: Improvement Components: packaging Reporter: Chris Burroughs Priority: Minor Fix For: 0.8.1 * license info, name, description, etc * webpage link * parent tags for sub-modules * groupid pending mailing list discussion One reference: http://vasilrem.com/blog/software-development/from-sbt-to-maven-in-one-move/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-313) Add JSON output and looping options to ConsumerOffsetChecker
[ https://issues.apache.org/jira/browse/KAFKA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-313: Fix Version/s: (was: 0.8.1) 0.8.2 Add JSON output and looping options to ConsumerOffsetChecker Key: KAFKA-313 URL: https://issues.apache.org/jira/browse/KAFKA-313 Project: Kafka Issue Type: Improvement Reporter: Dave DeMaagd Priority: Minor Labels: patch Fix For: 0.8.2 Attachments: KAFKA-313-2012032200.diff Adds: * '--loop N' - causes the program to loop forever, sleeping for up to N seconds between loops (loop time minus collection time, unless that's less than 0, at which point it will just run again immediately) * '--asjson' - display as a JSON string instead of the more human readable output format. Neither of the above depend on each other (you can loop in the human readable output, or do a single shot execution with JSON output). Existing behavior/output maintained if neither of the above are used. Diff Attached. Impacted files: core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala -- This message was sent by Atlassian JIRA (v6.1.5#6160)
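The '--loop N' sleep computation described above (loop time minus collection time, floored at zero) is simple enough to state directly; names here are hypothetical:

```java
// Sketch of the loop-interval arithmetic for '--loop N': sleep for the
// interval minus the time collection took, but never a negative duration.
public class LoopTimer {
    public static long sleepMillis(long loopIntervalMs, long collectionTimeMs) {
        return Math.max(0, loopIntervalMs - collectionTimeMs);
    }
}
```

So a 5-second loop whose collection took 1.2 seconds sleeps 3.8 seconds, and a collection that overruns the interval triggers the next run immediately.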
[jira] [Commented] (KAFKA-334) Some tests fail when building on a Windows box
[ https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888721#comment-13888721 ] Neha Narkhede commented on KAFKA-334: - [~romangarciam] Do these still fail on latest trunk? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1041) Number of file handles increases indefinitely in producer if broker host is unresolvable
[ https://issues.apache.org/jira/browse/KAFKA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1041: - Fix Version/s: (was: 0.8.1) 0.8.2 Number of file handles increases indefinitely in producer if broker host is unresolvable Key: KAFKA-1041 URL: https://issues.apache.org/jira/browse/KAFKA-1041 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.0 Environment: *unix* Reporter: Rajasekar Elango Assignee: Neha Narkhede Labels: features, newbie Fix For: 0.8.2 We found an issue: if the broker host is unresolvable, the number of open file handles keeps increasing for every message we produce, eventually using up all available file handles in the operating system. If the broker itself is not running but its host name is resolvable, the open file handle count stays flat. lsof output shows the number of these open file handles continuing to grow for every message we produce: java 19631relango 81u sock0,6 0t0 196966526 can't identify protocol This is easy to reproduce with the console producer. If I run the console producer with the right hostname and the broker is not running, it exits after three tries. But if I run it with an unresolvable broker host, it throws the exception below and continues to wait for user input; every time I enter a new message, it opens a socket and the file handle count keeps increasing.
Here is Exception in producer ERROR fetching topic metadata for topics [Set(test-1378245487417)] from broker [ArrayBuffer(id:0,host:localhost1,port:6667)] failed (kafka.utils.Utils$) kafka.common.KafkaException: fetching topic metadata for topics [Set(test-1378245487417)] from broker [ArrayBuffer(id:0,host:localhost1,port:6667)] failed at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:51) at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82) at kafka.producer.async.DefaultEventHandler$$anonfun$handle$2.apply$mcV$sp(DefaultEventHandler.scala:79) at kafka.utils.Utils$.swallow(Utils.scala:186) at kafka.utils.Logging$class.swallowError(Logging.scala:105) at kafka.utils.Utils$.swallowError(Utils.scala:45) at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:79) at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:104) at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:87) at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:67) at scala.collection.immutable.Stream.foreach(Stream.scala:526) at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:66) at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:44) Caused by: java.nio.channels.UnresolvedAddressException at sun.nio.ch.Net.checkAddress(Net.java:30) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:487) at kafka.network.BlockingChannel.connect(BlockingChannel.scala:59) at kafka.producer.SyncProducer.connect(SyncProducer.scala:151) at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:166) at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:73) at kafka.producer.SyncProducer.send(SyncProducer.scala:117) at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:37) ... 
12 more -- This message was sent by Atlassian JIRA (v6.1.5#6160)
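The stack trace points at the likely leak: `UnresolvedAddressException` is a `RuntimeException`, not an `IOException`, so a connect path that only closes the channel on `IOException` leaks one file descriptor per attempt. A hedged sketch of the fix, with a hypothetical wrapper rather than the actual `BlockingChannel` code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

// Hypothetical illustration of the fix: close the channel on *any* connect
// failure, including unchecked ones like UnresolvedAddressException.
public class SafeConnect {
    public static SocketChannel connect(InetSocketAddress addr) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.connect(addr);
            return channel;
        } catch (IOException | RuntimeException e) {
            channel.close(); // release the fd before propagating the failure
            throw e;
        }
    }
}
```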
[jira] [Updated] (KAFKA-1041) Number of file handles increases indefinitely in producer if broker host is unresolvable
[ https://issues.apache.org/jira/browse/KAFKA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1041: - Labels: features newbie (was: features) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1213) Adding fetcher needs to be avoided upon make-follower when replica manager is shutting down
[ https://issues.apache.org/jira/browse/KAFKA-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1213: - Fix Version/s: (was: 0.8.1) 0.8.2 Adding fetcher needs to be avoided upon make-follower when replica manager is shutting down --- Key: KAFKA-1213 URL: https://issues.apache.org/jira/browse/KAFKA-1213 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Fix For: 0.8.2 Today in ReplicaManager.makeFollowers, we check that isShuttingDown.get() is false before adding fetchers. However, this check cannot prevent adding a fetcher while the replica manager is shutting down, since isShuttingDown can be set to true after the check but before the fetcher is added. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
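This is a classic check-then-act race. One way to close the window, shown as a minimal sketch with hypothetical names rather than the actual ReplicaManager code, is to make the check and the add a single atomic step under the same lock that the shutdown path takes:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the race fix: the shutdown flag check and the fetcher add
// happen under one lock, and shutdown takes the same lock, so no fetcher
// can be added once shutdown has begun.
public class ReplicaManagerSketch {
    private final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    private final Object stateLock = new Object();
    private int fetcherCount = 0;

    public boolean maybeAddFetcher() {
        synchronized (stateLock) {          // check and act atomically
            if (isShuttingDown.get()) return false;
            fetcherCount++;
            return true;
        }
    }

    public void shutdown() {
        synchronized (stateLock) {          // excludes concurrent adds
            isShuttingDown.set(true);
            fetcherCount = 0;
        }
    }

    public int fetcherCount() { return fetcherCount; }
}
```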
[jira] [Commented] (KAFKA-1190) create a draw performance graph script
[ https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888723#comment-13888723 ] Neha Narkhede commented on KAFKA-1190: -- Patch pending, moving out of 0.8.1 create a draw performance graph script -- Key: KAFKA-1190 URL: https://issues.apache.org/jira/browse/KAFKA-1190 Project: Kafka Issue Type: Improvement Reporter: Joe Stein Fix For: 0.8.2 Attachments: KAFKA-1190.patch This will be an R script to draw relevant graphs given a bunch of csv files from the above tools. The output of this script will be a bunch of png files that can be combined with some html to act as a perf report. Here are the graphs that would be good to see: * Latency histogram for producer * MB/sec and messages/sec produced * MB/sec and messages/sec consumed * Flush time * Errors (should not be any) * Consumer cache hit ratio (both the bytes and count, specifically 1 - #physical_reads / #requests and 1 - physical_bytes_read / bytes_read) * Write merge ratio (num_physical_writes/num_produce_requests and avg_request_size/avg_physical_write_size) * CPU, network, io, etc -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1230) shell script files under bin don't work with cygwin (bash on windows)
[ https://issues.apache.org/jira/browse/KAFKA-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888724#comment-13888724 ] Neha Narkhede commented on KAFKA-1230: -- [~aloklal99] Would you like to attach a patch here? shell script files under bin don't work with cygwin (bash on windows) - Key: KAFKA-1230 URL: https://issues.apache.org/jira/browse/KAFKA-1230 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 0.8.0 Environment: The change have been tested under GNU bash, version 4.1.11(2)-release (x86_64-unknown-cygwin) running on Windows 7 Enterprise. Reporter: Alok Lal Fix For: 0.8.0, 0.8.1 Original Estimate: 24h Remaining Estimate: 24h h3. Introduction This bug is being created for a pull request that I had submitted earlier for these. Per Jun this is so changes confirm to Apache license. h3. Background The script files to run Kafka under Windows don't work as is. One needs to hand tweak them since their location is not bin but bin/windows. Further, the script files under bin/windows are not a complete replica of those under bin. To be sure, this isn't a complaint. To the contrary most projects now-a-days don't bother to support running on Windows or do so very late. Just that because of these limitation it might be more prudent to make the script files under bin itself run under windows rather than trying to make the files under bin/windows work or to make them complete. h3. Change Summary Most common unix-like shell on windows is the bash shell which is a part of the cygwin project. Out of the box the scripts don't work mostly due to peculiarities of the directory paths and class path separators. This change set makes a focused change to a single file under bin so that all of the script files under bin would work as is on windows platform when using bash shell of Cygwin distribution. h3. 
Motivation Acceptance of this change would enable the vast body of developers who use (or have to use) Windows as their development/testing/production platform to use Kafka with ease. More importantly, by making the examples run smoothly on Windows+Cygwin-bash, it would make the process of evaluating Kafka simpler and smoother, and potentially make for a favorable evaluation, since it would show the Kafka team's commitment to supporting deployments on Windows (albeit only under cygwin). Further, as the number of people who use Kafka on Windows increases, one would attract people who can eventually fix the script files under bin/windows itself, so that the need to run under Cygwin goes away, too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1061) Break-down sendTime to multipleSendTime
[ https://issues.apache.org/jira/browse/KAFKA-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1061: - Fix Version/s: (was: 0.8.1) 0.8.2 Break-down sendTime to multipleSendTime --- Key: KAFKA-1061 URL: https://issues.apache.org/jira/browse/KAFKA-1061 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.2 After KAFKA-1060 is done we would also like to break the sendTime to each MultiSend's time and its corresponding send data size. This is related to KAFKA-1043 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release
[ https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1054: - Fix Version/s: (was: 0.8.1) 0.8.2 Eliminate Compilation Warnings for 0.8 Final Release --- Key: KAFKA-1054 URL: https://issues.apache.org/jira/browse/KAFKA-1054 Project: Kafka Issue Type: Improvement Reporter: Guozhang Wang Fix For: 0.8.2 Currently we have a total of 38 warnings for source code compilation of 0.8: 1) 3 from Unchecked type pattern 2) 6 from Unchecked conversion 3) 29 from Deprecated Hadoop API functions It would be better to fix these before the final release of 0.8 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1056: - Component/s: log Evenly Distribute Intervals in OffsetIndex -- Key: KAFKA-1056 URL: https://issues.apache.org/jira/browse/KAFKA-1056 Project: Kafka Issue Type: Improvement Components: log Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.2 Today a new entry will be created in OffsetIndex for each produce request regardless of the number of messages it contains. It is better to evenly distribute the intervals between index entries for index search efficiency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
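The even spacing described above is usually achieved by indexing on accumulated bytes rather than on request count; a sketch with hypothetical names (not the actual OffsetIndex code):

```java
// Sketch of byte-interval indexing: instead of one index entry per produce
// request, write an entry only after roughly indexIntervalBytes of log have
// accumulated, so entries end up evenly spaced regardless of request size.
public class IndexIntervalSketch {
    private final int indexIntervalBytes;
    private int bytesSinceLastEntry = 0;

    public IndexIntervalSketch(int indexIntervalBytes) {
        this.indexIntervalBytes = indexIntervalBytes;
    }

    /** Returns true if an index entry should be written for this append. */
    public boolean onAppend(int messageSetBytes) {
        bytesSinceLastEntry += messageSetBytes;
        if (bytesSinceLastEntry >= indexIntervalBytes) {
            bytesSinceLastEntry = 0;
            return true;
        }
        return false;
    }
}
```

With this scheme, many small produce requests share one index entry, while a single huge request still gets indexed, keeping lookup cost roughly uniform across the segment.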
[jira] [Commented] (KAFKA-1080) why are builds for 2.10 not coming out with the trailing minor version number
[ https://issues.apache.org/jira/browse/KAFKA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13888726#comment-13888726 ] Neha Narkhede commented on KAFKA-1080: -- [~joestein] Do we need any changes here? why are builds for 2.10 not coming out with the trailing minor version number - Key: KAFKA-1080 URL: https://issues.apache.org/jira/browse/KAFKA-1080 Project: Kafka Issue Type: Task Reporter: Joe Stein Fix For: 0.8.1 Joes-MacBook-Air:kafka joestein$ ls -l target/ total 0 drwxr-xr-x 4 joestein wheel 136 Sep 26 22:54 resolution-cache drwxr-xr-x 42 joestein wheel 1428 Oct 10 00:33 scala-2.10 drwxr-xr-x 21 joestein wheel 714 Oct 9 23:37 scala-2.8.0 drwxr-xr-x 21 joestein wheel 714 Oct 9 23:38 scala-2.8.2 drwxr-xr-x 21 joestein wheel 714 Oct 9 23:39 scala-2.9.1 drwxr-xr-x 21 joestein wheel 714 Oct 9 23:40 scala-2.9.2 drwxr-xr-x 5 joestein wheel 170 Oct 10 00:02 streams -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-472) update metadata in batches in Producer
[ https://issues.apache.org/jira/browse/KAFKA-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-472: Fix Version/s: (was: 0.8.1) 0.9.0 update metadata in batches in Producer -- Key: KAFKA-472 URL: https://issues.apache.org/jira/browse/KAFKA-472 Project: Kafka Issue Type: Bug Components: core Reporter: Jun Rao Labels: optimization Fix For: 0.9.0 Currently, the producer obtains the metadata of topics being produced one at a time. This means that tools like mirror maker will make many getMetadata requests to the broker. Ideally, we should make BrokerPartition.getBrokerPartitionInfo() a batch api that takes a list of topics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
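The shape of the batch API asked for above can be sketched as follows; the class and the metadata representation are hypothetical stand-ins, not the real producer code:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the batch metadata call: N topics in one logical
// request, instead of one round trip per topic as today.
public class BatchMetadataSketch {
    /** Stand-in for the per-topic lookup (one broker round trip per call). */
    static Map<Integer, String> fetchTopicMetadata(String topic) {
        return Collections.singletonMap(0, "leader-for-" + topic);
    }

    /** Proposed batch form: the broker answers all topics at once. */
    public static Map<String, Map<Integer, String>> fetchTopicMetadata(List<String> topics) {
        Map<String, Map<Integer, String>> result = new LinkedHashMap<>();
        for (String t : topics)
            result.put(t, fetchTopicMetadata(t));
        return result;
    }
}
```

A mirror maker consuming hundreds of topics would then refresh all of its metadata with one request per broker rather than hundreds.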
[jira] [Updated] (KAFKA-1043) Time-consuming FetchRequest could block other requests in the response queue
[ https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1043: - Fix Version/s: (was: 0.8.1) 0.8.2 Time-consuming FetchRequest could block other requests in the response queue --- Key: KAFKA-1043 URL: https://issues.apache.org/jira/browse/KAFKA-1043 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1 Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.2 Since in SocketServer the processor that takes a request is also responsible for writing the response for that request, each processor owns its own response queue. If a FetchRequest takes an irregularly long time to write to the channel buffer, it blocks all other responses in that queue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
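The head-of-line blocking described above can be illustrated with a toy model (all names hypothetical): responses leave a processor's queue strictly in order, so a response needing many write passes delays every response behind it.

```java
import java.util.Queue;

// Toy model of a processor draining its response queue: only the head
// response makes progress on each iteration, so one slow response holds
// up everything queued behind it.
public class ResponseQueueDemo {
    /** A response that needs `passes` write calls to finish sending. */
    public static final class Response {
        int passes;
        public Response(int passes) { this.passes = passes; }
    }

    /** Drains the queue in order; returns how many iterations it took. */
    public static int drain(Queue<Response> queue) {
        int iterations = 0;
        while (!queue.isEmpty()) {
            iterations++;
            Response head = queue.peek();   // only the head may progress
            if (--head.passes <= 0) queue.poll();
        }
        return iterations;
    }
}
```

Two one-pass responses queued behind a ten-pass fetch response wait ten iterations before either is sent, which is the latency effect the ticket is about.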
[jira] [Updated] (KAFKA-485) Support MacOS for this test framework
[ https://issues.apache.org/jira/browse/KAFKA-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-485: Fix Version/s: (was: 0.8.1) 0.9.0 Support MacOS for this test framework - Key: KAFKA-485 URL: https://issues.apache.org/jira/browse/KAFKA-485 Project: Kafka Issue Type: Sub-task Affects Versions: 0.8.0 Reporter: John Fung Assignee: John Fung Labels: replication-testing Fix For: 0.9.0 Currently this test framework doesn't work properly on MacOS because its ps arguments differ from Linux's. ps must work on MacOS so that the framework can stop background processes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (KAFKA-1032) Messages sent to the old leader will be lost on broker GC resulted failure
[ https://issues.apache.org/jira/browse/KAFKA-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888727#comment-13888727 ] Neha Narkhede commented on KAFKA-1032: -- [~junrao], [~guozhang] Do we want this in 0.8.1? Messages sent to the old leader will be lost on broker GC resulted failure -- Key: KAFKA-1032 URL: https://issues.apache.org/jira/browse/KAFKA-1032 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.1 Attachments: KAFKA-1032.v1.patch As pointed out by Swapnil, today when a broker is in a long GC pause, the controller will mark it as failed and trigger the onBrokerFailure function to migrate leadership to other brokers. However, the controller does not notify the broker with a stopReplica request even after a new leader has been elected for its partitions. The new leader will hence stop fetching from the old leader, while the old leader is not aware that it is no longer the leader. And since the old leader is not really dead, producers will not refresh their metadata immediately and will continue sending messages to it. The old leader will only learn that it is no longer the leader when the controller notifies it in the onBrokerStartup function, so messages sent between the time the new leader is elected and the time the old leader realizes it is no longer the leader will be lost. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-1026) Dynamically Adjust Batch Size Upon Receiving MessageSizeTooLargeException
[ https://issues.apache.org/jira/browse/KAFKA-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1026: - Fix Version/s: (was: 0.8.1) 0.9.0 Dynamically Adjust Batch Size Upon Receiving MessageSizeTooLargeException - Key: KAFKA-1026 URL: https://issues.apache.org/jira/browse/KAFKA-1026 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Among the exceptions that can possibly be received in Producer.send(), MessageSizeTooLargeException is currently not recoverable, since the producer does not change the batch size but still retries sending. It would be better to have a dynamic batch size adjustment mechanism based on MessageSizeTooLargeException. This is related to KAFKA-998. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
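One plausible shape for the dynamic adjustment the ticket asks for is multiplicative back-off: on MessageSizeTooLargeException, halve the batch size and retry. The sketch below is illustrative only (the class and method names are invented, and a real producer would re-split the accumulated messages rather than track a bare size number):

```java
// Hypothetical sketch of a batch-size back-off policy for KAFKA-1026:
// when a batch is rejected as too large, halve the batch size and retry,
// down to a floor of one message per batch.
public class BatchSizeBackoff {

    // Returns the batch size to use on the next retry after a
    // MessageSizeTooLargeException-style rejection.
    public static int nextBatchSize(int current) {
        if (current <= 1) {
            // A single message is still too large: batching cannot fix this,
            // so surface the error instead of retrying forever.
            throw new IllegalStateException("single message exceeds broker limit");
        }
        return Math.max(1, current / 2);
    }

    // Simulate retrying until the batch fits under the broker's limit.
    public static int settle(int batchSize, int maxAccepted) {
        while (batchSize > maxAccepted) {
            batchSize = nextBatchSize(batchSize);
        }
        return batchSize;
    }
}
```

Halving converges in O(log n) retries and naturally bottoms out at the unrecoverable single-oversized-message case, which is the distinction the ticket draws.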
[jira] [Updated] (KAFKA-934) kafka hadoop consumer and producer use older 0.19.2 hadoop api's
[ https://issues.apache.org/jira/browse/KAFKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-934: Labels: hadoop hadoop-2.0 newbie (was: hadoop hadoop-2.0) kafka hadoop consumer and producer use older 0.19.2 hadoop api's Key: KAFKA-934 URL: https://issues.apache.org/jira/browse/KAFKA-934 Project: Kafka Issue Type: Bug Components: contrib Affects Versions: 0.8.0 Environment: [amilkowski@localhost impl]$ uname -a Linux localhost.localdomain 3.9.4-200.fc18.x86_64 #1 SMP Fri May 24 20:10:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux Reporter: Andrew Milkowski Labels: hadoop, hadoop-2.0, newbie Fix For: 0.8.2 The new hadoop api present in 0.20.1, especially the package org.apache.hadoop.mapreduce.lib, is not used. The code affected is both the consumer and the producer in the kafka contrib package: [amilkowski@localhost contrib]$ pwd /opt/local/git/kafka/contrib [amilkowski@localhost contrib]$ ls -lt total 12 drwxrwxr-x 8 amilkowski amilkowski 4096 May 30 11:14 hadoop-consumer drwxrwxr-x 6 amilkowski amilkowski 4096 May 29 19:31 hadoop-producer drwxrwxr-x 6 amilkowski amilkowski 4096 May 29 16:43 target [amilkowski@localhost contrib]$ For example, the imports import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.RunningJob; import org.apache.hadoop.mapred.TextOutputFormat; use the 0.19.2 hadoop api format. This prevents merging the hadoop features into more modern hadoop implementations, instead of drawing from the 0.20.1 api set in org.apache.hadoop.mapreduce. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-478) Move start_consumer start_producer inside start_entity_in_background
[ https://issues.apache.org/jira/browse/KAFKA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-478: Fix Version/s: (was: 0.8.1) 0.9.0 Move start_consumer start_producer inside start_entity_in_background Key: KAFKA-478 URL: https://issues.apache.org/jira/browse/KAFKA-478 Project: Kafka Issue Type: Sub-task Affects Versions: 0.8.0 Reporter: John Fung Assignee: John Fung Labels: replication-testing Fix For: 0.9.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-665) Outgoing responses delayed on a busy Kafka broker
[ https://issues.apache.org/jira/browse/KAFKA-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-665: Fix Version/s: (was: 0.8.1) 0.8.2 Outgoing responses delayed on a busy Kafka broker -- Key: KAFKA-665 URL: https://issues.apache.org/jira/browse/KAFKA-665 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Neha Narkhede Priority: Critical Labels: replication-performance Fix For: 0.8.2 In a long running test, I observed that after a few hours of operation, a few requests start timing out, mainly because they spend a very long time sitting in the response queue - [2012-12-07 22:05:56,670] TRACE Completed request with correlation id 3965966 and client : TopicMetadataRequest:4009, queueTime:1, localTime:28, remoteTime:0, sendTime:3980 (kafka.network.RequestChannel$) [2012-12-07 22:04:12,046] TRACE Completed request with correlation id 3962561 and client : TopicMetadataRequest:3449, queueTime:0, localTime:29, remoteTime:0, sendTime:3420 (kafka.network.RequestChannel$) We might have a problem in the way we process outgoing responses. Basically, if the processor thread blocks on enqueuing requests in the request queue, it doesn't come around to processing its responses which are ready to go out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
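The failure mode described above goes away if the processor never blocks indefinitely while enqueuing a request, and instead drains its ready responses between enqueue attempts. A minimal, hypothetical sketch of that loop follows (invented names, not the actual SocketServer code):

```java
import java.util.concurrent.*;

// Hypothetical sketch for KAFKA-665: instead of blocking forever on the
// request queue (and starving responses that are ready to go out), the
// processor uses a non-blocking offer() and flushes responses in between.
public class ProcessorLoopSketch {
    final BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(1); // bounded, shared
    final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>(); // per-processor
    int responsesSent = 0;

    void handle(String request) {
        // Try to enqueue without blocking; while the request queue is full,
        // keep draining responses so they are not delayed behind the enqueue.
        while (!requestQueue.offer(request)) {
            drainResponses();
            Thread.yield();   // a real loop would park on the selector instead
        }
        drainResponses();
    }

    void drainResponses() {
        String resp;
        while ((resp = responseQueue.poll()) != null) {
            responsesSent++;  // stand-in for writing the response to the socket channel
        }
    }
}
```

The key property is that response progress no longer depends on request-queue capacity, which is exactly the coupling the TRACE lines above expose.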
[jira] [Updated] (KAFKA-934) kafka hadoop consumer and producer use older 0.19.2 hadoop api's
[ https://issues.apache.org/jira/browse/KAFKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-934: Fix Version/s: (was: 0.8.1) 0.8.2 kafka hadoop consumer and producer use older 0.19.2 hadoop api's Key: KAFKA-934 URL: https://issues.apache.org/jira/browse/KAFKA-934 Project: Kafka Issue Type: Bug Components: contrib Affects Versions: 0.8.0 Reporter: Andrew Milkowski Labels: hadoop, hadoop-2.0, newbie Fix For: 0.8.2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-777) Add system tests for important tools
[ https://issues.apache.org/jira/browse/KAFKA-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-777: Fix Version/s: (was: 0.8.1) 0.9.0 Add system tests for important tools Key: KAFKA-777 URL: https://issues.apache.org/jira/browse/KAFKA-777 Project: Kafka Issue Type: Bug Affects Versions: 0.8.0 Reporter: Sriram Subramanian Assignee: John Fung Labels: kafka-0.8, p2, replication-testing Fix For: 0.9.0 A few tools were broken after the zk format change. It would be great to catch these issues during system tests. Some of the tools are 1. ShutdownBroker 2. PreferredReplicaAssignment 3. ConsumerOffsetChecker There might be a few more for which we need tests. Need to add them once identified. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (KAFKA-853) Allow OffsetFetchRequest to initialize offsets
[ https://issues.apache.org/jira/browse/KAFKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-853: Fix Version/s: (was: 0.8.1) 0.8.2 Allow OffsetFetchRequest to initialize offsets -- Key: KAFKA-853 URL: https://issues.apache.org/jira/browse/KAFKA-853 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.8.1 Reporter: David Arthur Fix For: 0.8.2 Original Estimate: 24h Remaining Estimate: 24h It would be nice for the OffsetFetchRequest API to have the option to initialize offsets instead of returning unknown_topic_or_partition. It could mimic the Offsets API by adding the time field and then follow the same code path on the server as the Offsets API. In this case, the response would need a boolean to indicate whether the returned offset was initialized or fetched from ZK. This would simplify the client logic when dealing with new topics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
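A rough sketch of the proposed semantics (all names here are invented; the real change would carry the flag in the OffsetFetchResponse wire format): fetch the offset if it exists, otherwise initialize it and flag the response so the client knows which happened.

```java
import java.util.*;

// Hypothetical sketch for KAFKA-853: an offset fetch that, for an unknown
// topic/partition, initializes the offset (here: to 0) instead of returning
// unknown_topic_or_partition, and reports which path was taken.
public class OffsetFetchSketch {

    static class OffsetResponse {
        final long offset;
        final boolean initialized; // true if created now, false if fetched from storage
        OffsetResponse(long offset, boolean initialized) {
            this.offset = offset;
            this.initialized = initialized;
        }
    }

    final Map<String, Long> store = new HashMap<>(); // stand-in for the ZK-backed offsets

    OffsetResponse fetchOrInitialize(String topicPartition) {
        Long existing = store.get(topicPartition);
        if (existing != null) {
            return new OffsetResponse(existing, false);
        }
        store.put(topicPartition, 0L);  // initialize rather than erroring out
        return new OffsetResponse(0L, true);
    }
}
```

With this shape a client consuming a brand-new topic needs no special-case error handling: it always gets an offset back, plus one bit telling it whether the offset is fresh.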
Preparing for 0.8.1 release
Hi, I cleaned up the JIRAs in preparation for the 0.8.1 release. Here are some JIRAs (https://issues.apache.org/jira/browse/KAFKA-1158?jql=fixVersion%20%3D%20%220.8.1%22%20AND%20project%20%3D%20KAFKA%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) marked for 0.8.1 that either need a patch review or were filed to fix build issues that were discovered during the 0.8 release. Let's review this list and fix the build issues before the 0.8.1 release. We can probably start a vote in 1-2 weeks. Thanks, Neha
Re: Proposed Changes To New Producer Public API
1. Change send to use java.util.concurrent.Future in send():

Future<RecordPosition> send(ProducerRecord record, Callback callback)

The cancel method will always return false and not do anything. Callback will then be changed to:

interface Callback {
  public void onCompletion(RecordPosition position, Exception exception);
}

so that the exception is available there too, and will be null if no error occurred. I'm not planning on changing the name Callback because I haven't thought of a better one.

+1

2. We will change the way serialization works to proposal 1A in the previous discussion. That is, the Partitioner and Serializer interfaces will disappear. ProducerRecord will change to:

class ProducerRecord {
  public byte[] key() {...}
  public byte[] value() {...}
  public Integer partition() {...} // can be null
}

In order to allow correctly choosing a partition, the Producer interface will include a new method:

List<PartitionInfo> partitionsForTopic(String topic);

PartitionInfo will be changed to include the actual Node objects, not just the node ids.

Mostly agree with this, but wanted to confirm my understanding of partitionsForTopic. Is that something a user has to remember to invoke in order to detect newly added partitions to a topic? Will each invocation of partitionsForTopic lead to a TopicMetadataRequest RPC?

3. I will make the producer implement java.io.Closeable but not throw any exceptions, as there doesn't really seem to be any disadvantage to this and the interface may remind people to call close.

+1

Non-changes

1. I don't plan to change the build system. The SBT-to-gradle change is basically orthogonal and we should debate it in the context of its ticket.

+1

2. I'm going to stick with my oddball kafka.* rather than org.apache.kafka.* package name and non-getter methods unless everyone complains.

-1. I prefer org.apache.kafka for consistency with other top-level Apache projects.

3. I'm not going to introduce a zookeeper dependency in the client as I don't see any advantage. 
+1. I see the point about non-java clients not having an easy way to talk to zookeeper, but don't really think the connections to zookeeper are the problem.

4. There were a number of reasonable suggestions on the Callback execution model. I'm going to leave it as is, though. Because we are moving to use Java's Future we can't include functionality like ListenableFuture. I think the simplistic callback model, where callbacks are executed in the i/o thread, should be good enough for most cases, and other cases can use their own threading.

+1

On Sat, Feb 1, 2014 at 9:06 AM, Joel Koshy jjkosh...@gmail.com wrote: . In order to allow correctly choosing a partition the Producer interface will include a new method: List<PartitionInfo> partitionsForTopic(String topic); PartitionInfo will be changed to include the actual Node objects, not just the node ids. Why are the node ids alone insufficient? You can still do round-robin/connection-limiting, etc. with just the node id, right? Actually no - we do need the full node info. -- Sent from Gmail Mobile
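Putting the pieces of the proposal together, here is a compile-level sketch of the API shape under discussion. The class and method names follow the email, but the bodies, the empty RecordPosition placeholder, and the hash partitioner are illustrative stand-ins, not the actual Kafka implementation:

```java
import java.util.*;

// Compile-level sketch of the proposed new-producer API shape.
// Names follow the thread above; all logic is an illustrative stand-in.
public class ProducerApiSketch {

    static class RecordPosition {}  // placeholder for the position returned by send()

    interface Callback {
        // exception is null when the send succeeded
        void onCompletion(RecordPosition position, Exception exception);
    }

    static class ProducerRecord {
        final byte[] key;
        final byte[] value;
        final Integer partition; // may be null: the producer picks the partition
        ProducerRecord(byte[] key, byte[] value, Integer partition) {
            this.key = key;
            this.value = value;
            this.partition = partition;
        }
    }

    // Stand-in for what a client would do with partitionsForTopic(): take the
    // partition count out of List<PartitionInfo> and pick a partition, honoring
    // an explicit partition if the record carries one.
    static int choosePartition(ProducerRecord record, int numPartitions) {
        if (record.partition != null) return record.partition;     // caller's choice
        return (Arrays.hashCode(record.key) & 0x7fffffff) % numPartitions;
    }
}
```

This also makes Joel's question concrete: choosing a partition this way needs the current partition count, so the client must refresh partitionsForTopic (and thus topic metadata) to notice newly added partitions.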
Re: Review Request 17460: Patch for KAFKA-330
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17460/ --- (Updated Feb. 1, 2014, 10:58 p.m.) Review request for kafka. Bugs: KAFKA-330 https://issues.apache.org/jira/browse/KAFKA-330 Repository: kafka Description (updated) --- Removed init() API from TopicDeletionManager and added docs to TopicDeletionManager to describe the lifecycle of topic deletion Updated docs for the new states. Removed the changes to log4j.properties Cleanup unused APIs, consolidated APIs of TopicDeletionManager, added docs, unit tests working Moved deletion states into ReplicaStateMachine. All unit tests pass. Cleanup of some APIs pending Changed controller to reference APIs in TopicDeletionManager. All unit tests pass Introduced a TopicDeletionManager. KafkaController changes pending to use the new TopicDeletionManager Addressed Guozhang's review comments Fixed docs in a few places Fixed the resume logic for partition reassignment to also include topics that are queued up for deletion, since topic deletion is halted until partition reassignment can finish anyway. We need to let partition reassignment finish (since it started before topic deletion) so that topic deletion can resume Organized imports Moved offline replica handling to controller failover Reading replica assignment from zookeeper instead of local cache Deleting unused APIs Reverting the change to the stop replica request protocol. 
Instead hacking around with callbacks All functionality and all unit tests working Rebased with trunk after controller cleanup patch Diffs (updated) - core/src/main/scala/kafka/admin/AdminUtils.scala a167756f0fd358574c8ccb42c5c96aaf13def4f5 core/src/main/scala/kafka/admin/TopicCommand.scala 842c11047cca0531fbc572fdb25523244ba2b626 core/src/main/scala/kafka/api/ControlledShutdownResponse.scala a80aa4924cfe9a4670591d03258dd82c428bc3af core/src/main/scala/kafka/api/LeaderAndIsrRequest.scala a984878fbd8147b21211829a49de511fd1335421 core/src/main/scala/kafka/api/StopReplicaRequest.scala 820f0f57b00849a588a840358d07f3a4a31772d4 core/src/main/scala/kafka/api/StopReplicaResponse.scala d7e36308263aec2298e8adff8f22e18212e33fca core/src/main/scala/kafka/api/UpdateMetadataRequest.scala 54dd7bd4e195cc2ff4637ac93e2f9b681e316024 core/src/main/scala/kafka/controller/ControllerChannelManager.scala ea8485b479155b479c575ebc89a4f73086c872cb core/src/main/scala/kafka/controller/DeleteTopicsThread.scala PRE-CREATION core/src/main/scala/kafka/controller/KafkaController.scala a0267ae2670e8d5f365e49ec0fa5db1f62b815bf core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala fd9200f3bf941aab54df798bb5899eeb552ea3a3 core/src/main/scala/kafka/controller/PartitionStateMachine.scala ac4262a403fc73edaecbddf55858703c640b11c0 core/src/main/scala/kafka/controller/ReplicaStateMachine.scala 483559aa64726c51320d18b64a1b48f8fb2905a0 core/src/main/scala/kafka/controller/TopicDeletionManager.scala PRE-CREATION core/src/main/scala/kafka/network/BlockingChannel.scala d22dabdf4fc2346c5487b9fd94cadfbcab70040d core/src/main/scala/kafka/server/KafkaApis.scala bd7940b80ca1f1fa4a671c49cf6be1aeec2bbd7e core/src/main/scala/kafka/server/KafkaHealthcheck.scala 9dca55c9254948f1196ba17e1d3ebacdcd66be0c core/src/main/scala/kafka/server/OffsetCheckpoint.scala b5719f89f79b9f2df4b6cb0f1c869b6eae9f8a7b core/src/main/scala/kafka/server/ReplicaManager.scala f9d10d385cee49a1e3be8c82e3ffa22ef87a8fd6 
core/src/main/scala/kafka/server/TopicConfigManager.scala 42e98dd66f3269e6e3a8210934dabfd65df2dba9 core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala b189619bdc1b0d2bba8e8f88467fce014be96ccd core/src/main/scala/kafka/utils/ZkUtils.scala b42e52b8e5668383b287b2a86385df65e51b5108 core/src/test/scala/unit/kafka/admin/AdminTest.scala 59de1b469fece0b28e1d04dcd7b7015c12576abb core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala PRE-CREATION core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala 8df0982a1e71e3f50a073c4ae181096d32914f3e core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9aea67b140e50c6f9f1868ce2e2aac9e7530fa77 core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala c0475d07a778ff957ad266c08a7a81ea500debd2 core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 03e6266ffdad5891ec81df786bd094066b78b4c0 core/src/test/scala/unit/kafka/utils/TestUtils.scala 426b1a7bea1d83a64081f2c6b672c88c928713b7 Diff: https://reviews.apache.org/r/17460/diff/ Testing --- Several integration tests added to test - 1. Topic deletion when all replica brokers are alive 2. Halt and resume topic deletion after a follower replica is restarted 3. Halt and resume topic deletion after a controller failover 4. Request handling
[jira] [Commented] (KAFKA-330) Add delete topic support
[ https://issues.apache.org/jira/browse/KAFKA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888769#comment-13888769 ] Neha Narkhede commented on KAFKA-330: - Updated reviewboard https://reviews.apache.org/r/17460/ against branch trunk Add delete topic support - Key: KAFKA-330 URL: https://issues.apache.org/jira/browse/KAFKA-330 Project: Kafka Issue Type: Bug Components: controller, log, replication Affects Versions: 0.8.0, 0.8.1 Reporter: Neha Narkhede Assignee: Neha Narkhede Labels: features, project Attachments: KAFKA-330.patch, KAFKA-330_2014-01-28_15:19:20.patch, KAFKA-330_2014-01-28_22:01:16.patch, KAFKA-330_2014-01-31_14:19:14.patch, KAFKA-330_2014-01-31_17:45:25.patch, KAFKA-330_2014-02-01_11:30:32.patch, KAFKA-330_2014-02-01_14:58:31.patch, kafka-330-v1.patch, kafka-330-v2.patch One proposal of this API is here - https://cwiki.apache.org/confluence/display/KAFKA/Kafka+replication+detailed+design+V2#KafkareplicationdetaileddesignV2-Deletetopic -- This message was sent by Atlassian JIRA (v6.1.5#6160)