[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release
[ https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krzysztof Szafrański updated KAFKA-1054: Summary: Eliminate Compilation Warnings for 0.8 Final Release (was: Elimiate Compilation Warnings for 0.8 Final Release) Eliminate Compilation Warnings for 0.8 Final Release Key: KAFKA-1054 URL: https://issues.apache.org/jira/browse/KAFKA-1054 Project: Kafka Issue Type: Improvement Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1054.patch Currently we have a total of 38 warnings when compiling the 0.8 source code: 1) 3 from unchecked type patterns 2) 6 from unchecked conversions 3) 29 from deprecated Hadoop API functions. It would be better to fix these before the final release of 0.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
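For context, a minimal Scala sketch (illustrative only, not taken from the warning sites in the Kafka codebase) of the "unchecked type pattern" class of warning and one common way to resolve it:

{code}
// Hypothetical example: matching on a parameterized type triggers an
// unchecked warning because the type argument is erased at runtime.
object UncheckedExample {
  def describe(msg: Any): String = msg match {
    // This would warn: "non-variable type argument String ... is unchecked"
    //   case strings: List[String] => strings.mkString(",")
    // Warning-free alternative: match on the existential type and validate
    // the elements explicitly.
    case strings: List[_] if strings.forall(_.isInstanceOf[String]) =>
      strings.mkString(",")
    case other =>
      s"unexpected message: $other"
  }
}
{code}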
[jira] [Commented] (KAFKA-1633) Data loss if broker is killed
[ https://issues.apache.org/jira/browse/KAFKA-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142124#comment-14142124 ] Jun Rao commented on KAFKA-1633: If both brokers are shut down normally, there shouldn't be any data loss, since during a normal shutdown we force unflushed data to disk. The hard-kill case is a bit different: depending on the OS and file system, unflushed data could be lost during a hard kill. If that's the case, when both brokers are hard killed, some previously acked (but not yet flushed) messages could be lost. Data loss if broker is killed - Key: KAFKA-1633 URL: https://issues.apache.org/jira/browse/KAFKA-1633 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.1.1 Environment: centos 6.3, open jdk 7 Reporter: gautham varada Assignee: Jun Rao We have a 2-node Kafka cluster and experienced data loss when we did a kill -9 on the brokers. We also found a workaround to prevent this loss. Replication factor: 2, 4 partitions. Steps to reproduce:
1. Create a 2-node cluster with replication factor 2 and 4 partitions.
2. Use JMeter to pump events.
3. Use the Kafka web console to inspect the log size after the test.
During the test, we simultaneously killed the brokers using kill -9; when we tallied the metrics reported by JMeter against the sizes observed in the web console, we had lost tons of messages. We went back, set the producer retry count to 1 instead of the default 3, repeated the above tests, and did not lose a single message. We repeated the tests with the producer retry count set to 3 and to 1 against a single broker, and observed data loss when the retry count was 3 and no loss when it was 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
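To make the acknowledgement trade-off concrete, here is a minimal sketch of an 0.8-era Scala producer configured to wait for replica acknowledgement; the broker addresses and topic name are assumptions:

{code}
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

// Sketch only: an 0.8 producer that waits for the in-sync replicas to ack.
object AckedProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092,broker2:9092") // assumed
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    // -1 = wait for all in-sync replicas to acknowledge. As Jun notes above,
    // acked-but-unflushed messages can still be lost on a hard kill,
    // depending on OS/filesystem behavior.
    props.put("request.required.acks", "-1")
    props.put("message.send.max.retries", "3") // the default discussed above
    val producer = new Producer[String, String](new ProducerConfig(props))
    producer.send(new KeyedMessage[String, String]("test-topic", "key", "value"))
    producer.close()
  }
}
{code}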
Re: Review Request 25703: Patch for KAFKA-1490
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25703/#review54082 --- Thanks for the patch. Could you update the README to explain any extra step that's needed before using gradlew? - Jun Rao On Sept. 16, 2014, 6:54 p.m., Ivan Lyutov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25703/ --- (Updated Sept. 16, 2014, 6:54 p.m.) Review request for kafka. Bugs: KAFKA-1490 https://issues.apache.org/jira/browse/KAFKA-1490 Repository: kafka Description --- Removed generated wrapper contents and added a default task to generate the gradle wrapper binary. Removed the gradlew initial setup output from the source distribution. Diffs - build.gradle 74c8c8a9e2f4d9a651181d5337d5a8f07f0cb313 gradle/wrapper/gradle-wrapper.jar a7634b071cb255e91a4572934e55b8cd8877b3e4 gradle/wrapper/gradle-wrapper.properties d238df326fec6d925ccecdecaac0bcedf8e68672 wrapper.gradle PRE-CREATION Diff: https://reviews.apache.org/r/25703/diff/ Testing --- Thanks, Ivan Lyutov
[jira] [Resolved] (KAFKA-1634) Update protocol wiki to reflect the new offset management feature
[ https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao resolved KAFKA-1634. Resolution: Fixed The wiki is actually consistent with the code in trunk. The timestamp is used to decide whether some old consumer offsets can be garbage collected. Typically, the timestamp should just be the time when the offset is committed (and can be obtained on the broker side). However, the idea is that a client can, if needed, set a larger timestamp if it wants to preserve the offset for longer. It's not clear to me whether this is super useful; we can debate that in a separate jira if needed. Update protocol wiki to reflect the new offset management feature - Key: KAFKA-1634 URL: https://issues.apache.org/jira/browse/KAFKA-1634 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: Joel Koshy Priority: Blocker Fix For: 0.8.2 From the mailing list, following up on this -- I think the online API docs for OffsetCommitRequest still incorrectly refer to client-side timestamps: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest Wasn't that removed, and isn't it always handled server-side now? Would one of the devs mind updating the API spec wiki? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
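A small sketch of the retention rule Jun describes (an illustrative model only, not the broker's actual offset manager code; the retention window value is an assumption):

{code}
// Hypothetical model: an offset whose commit timestamp is older than the
// retention window becomes eligible for garbage collection, so a client
// that commits a later timestamp keeps its offset around longer.
case class CommittedOffset(offset: Long, commitTimestampMs: Long)

object OffsetRetentionSketch {
  val retentionMs: Long = 24L * 60 * 60 * 1000 // assumed retention window

  def expired(entry: CommittedOffset, nowMs: Long): Boolean =
    nowMs - entry.commitTimestampMs > retentionMs
}
{code}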
[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142132#comment-14142132 ] Jun Rao commented on KAFKA-1493: Since this is a blocker for 0.8.2, if we can't get this fixed in the next few days, I suggest that we just remove the documentation about LZ4 in ProducerConfig and leave LZ4 as an unsupported compression codec for now. Use a well-documented LZ4 compression format and remove redundant LZ4HC option -- Key: KAFKA-1493 URL: https://issues.apache.org/jira/browse/KAFKA-1493 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2 Reporter: James Oliver Assignee: James Oliver Priority: Blocker Fix For: 0.8.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1054) Eliminate Compilation Warnings for 0.8 Final Release
[ https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1054: - Reviewer: Neha Narkhede Eliminate Compilation Warnings for 0.8 Final Release Key: KAFKA-1054 URL: https://issues.apache.org/jira/browse/KAFKA-1054 Project: Kafka Issue Type: Improvement Reporter: Guozhang Wang Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1054.patch Currently we have a total of 38 warnings when compiling the 0.8 source code: 1) 3 from unchecked type patterns 2) 6 from unchecked conversions 3) 29 from deprecated Hadoop API functions. It would be better to fix these before the final release of 0.8 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (KAFKA-1634) Update protocol wiki to reflect the new offset management feature
[ https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede reopened KAFKA-1634: -- Assignee: Jun Rao (was: Joel Koshy) Reopening until we have filed a JIRA to fix the issue. Update protocol wiki to reflect the new offset management feature - Key: KAFKA-1634 URL: https://issues.apache.org/jira/browse/KAFKA-1634 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: Jun Rao Priority: Blocker Fix For: 0.8.2 From the mailing list, following up on this -- I think the online API docs for OffsetCommitRequest still incorrectly refer to client-side timestamps: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest Wasn't that removed, and isn't it always handled server-side now? Would one of the devs mind updating the API spec wiki? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker
[ https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142168#comment-14142168 ] Neha Narkhede commented on KAFKA-1618: -- Thanks for the patch, [~balaji.sesha...@dish.com]. A few review comments - ConsoleProducer:
1. The variable regex is not used on line 222.
2. In the validate() API, shouldn't the regex be val regex = new Regex(":[0-9].*") instead?
3. Since the PortValidator is common to all tools, it is worth moving validateAndDie to a new class, ToolsUtils. I'm sure we can refactor the tools to move some commonly usable APIs there.
4. Let's rename validateAndDie to validatePortOrDie. When you create ToolsUtils, let's only move validatePortOrDie there and fold the logic from validate into validatePortOrDie; validate is only used by that API.
Throughout the patch:
1. Please remove the whitespace introduced for the imports, to keep things consistent across the codebase.
2. Please get rid of the whitespace between if and (. It is also not consistent with the rest of the code.
Exception thrown when running console producer with no port number for the broker - Key: KAFKA-1618 URL: https://issues.apache.org/jira/browse/KAFKA-1618 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.1.1 Reporter: Gwen Shapira Assignee: Balaji Seshadri Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1618-ALL.patch, KAFKA-1618.patch When running the console producer with just localhost as the broker list, I get an ArrayIndexOutOfBoundsException. I expect either a clearer error about the arguments or for the producer to guess a default port.
[root@shapira-1 bin]# ./kafka-console-producer.sh --topic rufus1 --broker-list localhost
java.lang.ArrayIndexOutOfBoundsException: 1
    at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
    at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
    at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
    at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
    at kafka.producer.Producer.<init>(Producer.scala:59)
    at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
    at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
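A hedged sketch of what the suggested ToolsUtils.validatePortOrDie refactoring could look like; the class and method names follow the review comments above, but the details are assumptions rather than the actual patch:

{code}
import scala.util.matching.Regex

// Sketch only: a shared tools utility per review comments 3 and 4 above.
object ToolsUtils {
  // Per comment 2: each broker entry should end in ":<digits>".
  private val portRegex: Regex = new Regex(":[0-9].*")

  def validatePortOrDie(brokerList: String): Unit = {
    val allValid = brokerList.split(",").forall { hostPort =>
      portRegex.findFirstIn(hostPort).isDefined
    }
    if (!allValid) {
      System.err.println(
        s"Invalid broker list '$brokerList': each entry must be host:port")
      sys.exit(1)
    }
  }
}
{code}

With a check like this in place, the tool can exit with a clear message instead of failing deep inside parseBrokerList with an ArrayIndexOutOfBoundsException.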
[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1637: - Assignee: (was: Neha Narkhede) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group Key: KAFKA-1637 URL: https://issues.apache.org/jira/browse/KAFKA-1637 Project: Kafka Issue Type: Bug Components: consumer, core Affects Versions: 0.8.1, 0.8.1.1 Environment: Linux Reporter: Amir Malekpour Labels: newbie This concerns Kafka's Offset Fetch API. According to Kafka's current documentation, if there is no offset associated with a topic-partition under that consumer group, the broker does not set an error code (since it is not really an error), but returns empty metadata and sets the offset field to -1. (Link below.) However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes it impossible for the client to decide whether there was an error or there is simply no offset associated with the topic-partition under that consumer group. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1637) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group
[ https://issues.apache.org/jira/browse/KAFKA-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1637: - Labels: newbie (was: ) SimpleConsumer.fetchOffset returns wrong error code when no offset exists for topic/partition/consumer group Key: KAFKA-1637 URL: https://issues.apache.org/jira/browse/KAFKA-1637 Project: Kafka Issue Type: Bug Components: consumer, core Affects Versions: 0.8.1, 0.8.1.1 Environment: Linux Reporter: Amir Malekpour Labels: newbie This concerns Kafka's Offset Fetch API. According to Kafka's current documentation, if there is no offset associated with a topic-partition under that consumer group, the broker does not set an error code (since it is not really an error), but returns empty metadata and sets the offset field to -1. (Link below.) However, in Kafka 0.8.1.1 error code '3' is returned, which effectively makes it impossible for the client to decide whether there was an error or there is simply no offset associated with the topic-partition under that consumer group. https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-MetadataAPI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
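To illustrate why the wrong error code is a problem for clients, here is a hypothetical handler (illustrative only, not the SimpleConsumer API) for the two cases the report distinguishes:

{code}
// Illustrative only: how a client would separate "no committed offset"
// from a real error if the broker followed the documented behavior.
case class OffsetFetchResult(offset: Long, errorCode: Short)

object OffsetFetchHandling {
  val NoError: Short = 0
  val UnknownTopicOrPartition: Short = 3 // the code the reporter observes

  def interpret(r: OffsetFetchResult): String = r match {
    // Documented: no committed offset => no error code, offset -1.
    case OffsetFetchResult(-1L, NoError) => "no offset committed yet"
    case OffsetFetchResult(o, NoError)   => s"committed offset $o"
    // With the bug, code 3 is returned for a missing offset as well, so
    // this branch cannot tell a missing offset from a genuine error.
    case OffsetFetchResult(_, code)      => s"error code $code"
  }
}
{code}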
[jira] [Commented] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142170#comment-14142170 ] Neha Narkhede commented on KAFKA-1476: -- [~balaji.sesha...@dish.com], for #1 and #2, please read my previous comment. For #3, for reading offsets we need: --offsets --describe --group groupName. Let's file another JIRA for the reset-offset tool, as that requires more thought and the tool doesn't even exist today. Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: Balaji Seshadri Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1476-RENAME.patch, KAFKA-1476.patch It would be useful to have a way to get a list of the consumer groups currently active via some tool/script that ships with Kafka. This would be helpful so that the system tools can be explored more easily. For example, the ConsumerOffsetChecker requires a group option: bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ? But when just getting started with Kafka, using the console producer and consumer, it is not clear what value to use for the group option. If the consumer groups could be listed, it would be clear what value to use. Background: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
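As a starting point, a minimal sketch of what such a tool could do for ZooKeeper-based consumers: group names are simply the children of the /consumers path. The connection settings below are assumptions, and this is illustrative rather than the proposed patch:

{code}
import scala.collection.JavaConverters._
import org.I0Itec.zkclient.ZkClient

// Sketch only: list consumer group names registered in ZooKeeper.
object ListConsumerGroupsSketch {
  def main(args: Array[String]): Unit = {
    val zkClient = new ZkClient("localhost:2181", 30000, 30000) // assumed
    try {
      zkClient.getChildren("/consumers").asScala.foreach(println)
    } finally {
      zkClient.close()
    }
  }
}
{code}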
[jira] [Commented] (KAFKA-328) Write unit test for kafka server startup and shutdown API
[ https://issues.apache.org/jira/browse/KAFKA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142171#comment-14142171 ] Neha Narkhede commented on KAFKA-328: - Ping [~balaji.sesha...@dish.com] Write unit test for kafka server startup and shutdown API -- Key: KAFKA-328 URL: https://issues.apache.org/jira/browse/KAFKA-328 Project: Kafka Issue Type: Bug Reporter: Neha Narkhede Assignee: Balaji Seshadri Labels: newbie Background discussion in KAFKA-320: people often try to embed KafkaServer in an application that ends up calling startup() and shutdown() repeatedly, and sometimes in odd ways. To ensure this works correctly we have to be very careful about cleaning up resources; this is a good practice for making unit tests reliable anyway. A good first step would be to add some unit tests on startup and shutdown to cover various cases:
1. A Kafka server can start up if it is not already starting up, is not currently being shut down, and hasn't already been started.
2. A Kafka server can shut down if it is not already shutting down, is not currently starting up, and hasn't already been shut down.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
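A rough sketch of how such a test might exercise the lifecycle; server construction via the test harness is elided, and the exact expected behavior on re-startup is an assumption the real test would need to pin down:

{code}
// Sketch only: exercise repeated startup/shutdown as described above.
class ServerStartupShutdownSketch {
  def testRepeatedStartupAndShutdown(server: kafka.server.KafkaServer): Unit = {
    server.startup()
    server.shutdown()
    // Case 2: shutting down an already-shutdown server should be a no-op.
    server.shutdown()
    // Case 1: starting up again should either succeed cleanly or fail fast
    // with a clear error -- never leak sockets, threads, or log handles.
    server.startup()
    server.shutdown()
  }
}
{code}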
[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector
[ https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142174#comment-14142174 ] Neha Narkhede commented on KAFKA-1282: -- +1 on your latest patch. I'm leaning towards accepting the patch, since the test failure above points to an issue that seems unrelated to the patch. [~nmarasoi], it would be great if you could follow Jun's suggestion to reproduce the issue and then file a JIRA to track it. I'm guessing killing idle connections shouldn't lead to data loss. Disconnect idle socket connection in Selector - Key: KAFKA-1282 URL: https://issues.apache.org/jira/browse/KAFKA-1282 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: nicu marasoiu Labels: newbie++ Fix For: 0.9.0 Attachments: 1282_brush.patch, 1282_brushed_up.patch, KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch To reduce the number of socket connections, it would be useful for the new producer to close socket connections that are idle. We can introduce a new producer config for the idle time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
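For intuition, a stripped-down sketch of one way to track idle connections with an access-ordered LRU map. This is illustrative, not the actual Selector change, and the config name referenced in the comments is hypothetical:

{code}
import java.util.{LinkedHashMap => JLinkedHashMap}

// Sketch only: record last activity per connection and expire the least
// recently active one once it exceeds a "max idle ms"-style setting.
class IdleConnectionTracker(maxIdleMs: Long) {
  // accessOrder = true: iteration starts at the least recently used entry.
  private val lastActive = new JLinkedHashMap[String, Long](16, 0.75f, true)

  def recordActivity(connectionId: String, nowMs: Long): Unit =
    lastActive.put(connectionId, nowMs)

  // The connection to close, if the least recently active one has expired.
  def expiredConnection(nowMs: Long): Option[String] = {
    val it = lastActive.entrySet().iterator()
    if (!it.hasNext) return None
    val oldest = it.next()
    if (nowMs - oldest.getValue > maxIdleMs) Some(oldest.getKey) else None
  }
}
{code}

The access-ordered map makes the expiry check O(1): only the head of the iteration order (the least recently used connection) ever needs to be examined.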
[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector
[ https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142199#comment-14142199 ] Neha Narkhede commented on KAFKA-1282: -- Pushed the latest patch to trunk. Disconnect idle socket connection in Selector - Key: KAFKA-1282 URL: https://issues.apache.org/jira/browse/KAFKA-1282 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Jun Rao Assignee: nicu marasoiu Labels: newbie++ Fix For: 0.9.0 Attachments: 1282_brush.patch, 1282_brushed_up.patch, KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch To reduce the number of socket connections, it would be useful for the new producer to close socket connections that are idle. We can introduce a new producer config for the idle time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Build failed in Jenkins: Kafka-trunk #270
See https://builds.apache.org/job/Kafka-trunk/270/changes

Changes: [neha.narkhede] KAFKA-1282 Disconnect idle socket connection in Selector; reviewed by Neha Narkhede and Jun Rao

--
[...truncated 2154 lines...]
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testPartitionReassignmentNonOverlappingReplicas FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testReassigningNonExistingPartition FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testResumePartitionReassignmentThatWasCompleted FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testPreferredReplicaJsonData FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testBasicPreferredReplicaElection FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testShutdownBroker FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
        at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
        at kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
        at kafka.admin.AdminTest.setUp(AdminTest.scala:33)

kafka.admin.AdminTest testTopicConfigChange FAILED
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at
[jira] [Updated] (KAFKA-1622) project shouldn't require signing to build
[ https://issues.apache.org/jira/browse/KAFKA-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1622: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. +1 and committed to trunk. project shouldn't require signing to build -- Key: KAFKA-1622 URL: https://issues.apache.org/jira/browse/KAFKA-1622 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Ivan Lyutov Priority: Blocker Labels: build, newbie, packaging Fix For: 0.8.2 Attachments: KAFKA-1622.patch We only need signing for uploadArchives, that is it. The project trunk failed to build due to some signing/license checks (the diff I used to get things to build is here: https://gist.github.com/dehora/7e3c0bd75bb2b5d87557) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Jenkins build is back to normal : Kafka-trunk #271
See https://builds.apache.org/job/Kafka-trunk/271/changes
[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled
[ https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142261#comment-14142261 ] Jiangjie Qin commented on KAFKA-1305: - I looked into this problem, and it seems to me the issue is mainly that the default controller queue size is too small. The failure flow is as follows:
1. Controller 265 received a controlled shutdown request.
2. Controller 265 put leaderAndIsrRequests into the controller message queue and responded to broker 265.
3. Broker 265 received the response from controller 265, shut down successfully, and de-registered itself from ZK.
4. Controller 265's request send thread started to send the leaderAndIsrRequests queued in step 2 to the brokers. Since broker 265 had already shut down, the thread goes into infinite retry, and from this moment the controller message queue size never decreases. (Thread t@113)
5. The scheduled preferred leader election started, grabbed the controller lock, and tried to put its LeaderAndIsr request into the controller message queue. Because the queue size is only 10, it could not finish and just blocked on the put method while still holding the controller lock. (Thread t@117)
6. The broker change listener on controller 265 was triggered by the broker path change from step 3. It tried to grab the controller lock and stop thread t@113, but failed because thread t@117 was holding the controller lock while waiting on the controller message queue.
Currently the controller message queue size is 10. IMO if we increase that to 100 or even bigger, this problem won't happen again. In fact, most of the time the number of messages in the queue will be small or even zero, because there should not be many controller messages, so increasing the queue size won't cause memory consumption to increase.
Controller can hang on controlled shutdown with auto leader balance enabled --- Key: KAFKA-1305 URL: https://issues.apache.org/jira/browse/KAFKA-1305 Project: Kafka Issue Type: Bug Reporter: Joel Koshy Assignee: Neha Narkhede Priority: Blocker Fix For: 0.9.0 This is relatively easy to reproduce, especially when doing a rolling bounce. What happened here is as follows:
1. The previous controller was bounced and broker 265 became the new controller.
2. I went on to do a controlled shutdown of broker 265 (the new controller).
3. In the meantime, the automatically scheduled preferred replica leader election process started doing its thing and sending LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers). (t@113 below)
4. While that was happening, the controlled shutdown process on 265 succeeded and proceeded to deregister itself from ZooKeeper and shut down the socket server.
5. (ReplicaStateMachine actually removes deregistered brokers from the controller channel manager's list of brokers to send requests to. However, that removal could not take place (t@18 below) because the preferred replica leader election task owned the controller lock.)
6. So the request thread to broker 265 got into infinite retries.
7. The entire broker shutdown process was blocked on controller shutdown for the same reason (it needed to acquire the controller lock).
Relevant portions from the thread dump:

Controller-265-to-broker-265-send-thread - Thread t@113
    java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
        at kafka.utils.Utils$.swallow(Utils.scala:167)
        at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
        at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
        at kafka.utils.Logging$class.swallow(Logging.scala:94)
        at kafka.utils.Utils$.swallow(Utils.scala:46)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
        - locked java.lang.Object@6dbf14a7
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

    Locked ownable synchronizers:
        - None

...

Thread-4 - Thread t@17
    java.lang.Thread.State: WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: kafka-scheduler-0
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
        at
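A stripped-down sketch of the lock-ordering problem in steps 4-6 above. The queue size of 10 mirrors the description; everything else (names, request payload) is illustrative, not the actual KafkaController code:

{code}
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.locks.ReentrantLock

// Sketch only: put() into a full bounded queue while holding the controller
// lock blocks forever if the consumer of the queue is stuck, and every
// other task needing the lock deadlocks behind it.
object ControllerQueueDeadlockSketch {
  val controllerLock = new ReentrantLock()
  val messageQueue = new ArrayBlockingQueue[AnyRef](10) // default size 10

  // Step 5: preferred leader election runs while holding the lock.
  def preferredReplicaLeaderElection(): Unit = {
    controllerLock.lock()
    try {
      // The send thread (t@113) retries a dead broker forever, so the
      // queue never drains and this put() blocks while we hold the lock.
      messageQueue.put("LeaderAndIsrRequest")
    } finally controllerLock.unlock()
  }

  // Step 6: the broker change listener, which would stop the send thread,
  // blocks here because the election task still holds the lock.
  def onBrokerChange(): Unit = {
    controllerLock.lock()
    try { /* stop send thread, clean up channel manager */ }
    finally controllerLock.unlock()
  }
}
{code}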
[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work
[ https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142342#comment-14142342 ] Sriharsha Chintalapani commented on KAFKA-1558: --- [~nehanarkhede] [~junrao] Update on the tests. I ran the above tests on 5 Kafka brokers and 3 ZooKeeper nodes with 1000 topics, each with 3 partitions and replication factor 3. All of these topics were being written to and read from simultaneously.
1) After the controller is restarted: topic log files and metadata deleted successfully.
2) After a soft failure of the controller (can be simulated by pausing the JVM for longer than the ZK session timeout): I am not able to consistently pass this test. I am getting this error: [2014-09-21 01:21:39,101] INFO Partition [my-topic-435,0] on broker 5: Cached zkVersion [114] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition). Initially I thought it could be related to KAFKA-1382, but from the debug logs: expected Some((Leader:2,ISR:2,LeaderEpoch:4,ControllerEpoch:7)) written Some((Leader:4,ISR:4,LeaderEpoch:5,ControllerEpoch:9)) (kafka.utils.ReplicationUtils$). The leader is different, which is causing the broker to keep logging these messages. I am not sure why it didn't get the leader update; I didn't see any errors in the ZooKeeper logs. I'll get more details on this.
3) After a topic's partitions have been reassigned to other brokers: topic deleted, no issues.
4) After running a preferred leader election command: topic successfully deleted.
5) After a topic's partition count has been increased: topic and new partition data also deleted.
6) Controller broker is killed (kill -9): successfully deleted the topic and metadata. Once the killed controller came back online, the log files for that topic partition on that broker also got deleted. I am not able to reproduce the ReplicaFetcher issue that was pointed out earlier; previously I was running the tests on VMs with low memory, and I wasn't able to reproduce it on these nodes.
AdminUtils.deleteTopic does not work Key: KAFKA-1558 URL: https://issues.apache.org/jira/browse/KAFKA-1558 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Reporter: Henning Schmiedehausen Assignee: Sriharsha Chintalapani Priority: Blocker Fix For: 0.8.2 The AdminUtils.deleteTopic method is implemented as
{code}
def deleteTopic(zkClient: ZkClient, topic: String) {
  ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
}
{code}
but DeleteTopicCommand actually does
{code}
zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
{code}
so I guess that the 'createPersistentPath' above should actually be
{code}
def deleteTopic(zkClient: ZkClient, topic: String) {
  ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)