Re: Review Request 24676: Fix KAFKA-1583
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24676/#review57147 --- Thanks for the patch. A few more comments. core/src/main/scala/kafka/server/ReplicaManager.scala https://reviews.apache.org/r/24676/#comment97617 typo uni core/src/main/scala/kafka/server/DelayedProduce.scala https://reviews.apache.org/r/24676/#comment97628 Instead of saying return error, it's more accruate to say set an error in response. core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/24676/#comment97627 It's not super clear what the relationships for forceComplete, tryComplete, complete, onComplete are and how they should be used together. Perhaps we can add an explanation here? core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/24676/#comment97626 How about we rename this to onComplete()? core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/24676/#comment97619 The description of the return value is not quite right. It should be the same as forceComplete(): return true iff the operation is completed by the caller. core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/24676/#comment97620 can be completed = can be completed by the caller - Jun Rao On Oct. 17, 2014, 4:15 a.m., Guozhang Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24676/ --- (Updated Oct. 17, 2014, 4:15 a.m.) Review request for kafka. Bugs: KAFKA-1583 https://issues.apache.org/jira/browse/KAFKA-1583 Repository: kafka Description --- Incoporate Jun's comments after rebase Diffs - core/src/main/scala/kafka/api/FetchRequest.scala 59c09155dd25fad7bed07d3d00039e3dc66db95c core/src/main/scala/kafka/api/FetchResponse.scala 8d085a1f18f803b3cebae4739ad8f58f95a6c600 core/src/main/scala/kafka/api/OffsetCommitRequest.scala 861a6cf11dc6b6431fcbbe9de00c74a122f204bd core/src/main/scala/kafka/api/ProducerRequest.scala b2366e7eedcac17f657271d5293ff0bef6f3cbe6 core/src/main/scala/kafka/api/ProducerResponse.scala a286272c834b6f40164999ff8b7f8998875f2cfe core/src/main/scala/kafka/cluster/Partition.scala e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 core/src/main/scala/kafka/common/ErrorMapping.scala 880ab4a004f078e5d84446ea6e4454ecc06c95f2 core/src/main/scala/kafka/log/Log.scala 157d67369baabd2206a2356b2aa421e848adab17 core/src/main/scala/kafka/network/BoundedByteBufferSend.scala a624359fb2059340bb8dc1619c5b5f226e26eb9b core/src/main/scala/kafka/server/DelayedFetch.scala e0f14e25af03e6d4344386dcabc1457ee784d345 core/src/main/scala/kafka/server/DelayedProduce.scala 9481508fc2d6140b36829840c337e557f3d090da core/src/main/scala/kafka/server/FetchRequestPurgatory.scala ed1318891253556cdf4d908033b704495acd5724 core/src/main/scala/kafka/server/KafkaApis.scala 85498b4a1368d3506f19c4cfc64934e4d0ac4c90 core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653 core/src/main/scala/kafka/server/RequestPurgatory.scala 9d76234bc2c810ec08621dc92bb4061b8e7cd993 core/src/main/scala/kafka/utils/DelayedItem.scala d7276494072f14f1cdf7d23f755ac32678c5675c core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 209a409cb47eb24f83cee79f4e064dbc5f5e9d62 core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala fb61d552f2320fedec547400fbbe402a0b2f5d87 core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 03a424d45215e1e7780567d9559dae4d0ae6fc29 core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala cd302aa51eb8377d88b752d48274e403926439f2 core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala a9c4ddc78df0b3695a77a12cf8cf25521a203122 core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala a577f4a8bf420a5bc1e62fad6d507a240a42bcaa core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 3804a114e97c849cae48308997037786614173fc core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 Diff: https://reviews.apache.org/r/24676/diff/ Testing --- Unit tests Thanks, Guozhang Wang
[jira] [Updated] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1583: - Attachment: KAFKA-1583_2014-10-17_09:56:33.patch Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1711) WARN Property topic is not valid when running console producer
Jun Rao created KAFKA-1711: -- Summary: WARN Property topic is not valid when running console producer Key: KAFKA-1711 URL: https://issues.apache.org/jira/browse/KAFKA-1711 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.2 Reporter: Jun Rao bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test [2014-10-17 09:54:23,984] WARN Property topic is not valid (kafka.utils.VerifiableProperties) It would be good if we can get rid of the warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175239#comment-14175239 ] Guozhang Wang commented on KAFKA-1583: -- Updated reviewboard https://reviews.apache.org/r/24676/diff/ against branch origin/trunk Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: Security JIRAS
I'm looking at Kafka Brokers authentication with ZooKeeper since this looks independent of other tasks. [AM] 1) Is authentication required only between kafka broker and zookeeper? Can we assume world read so that consumers don't have to be authenticated (I believe in any case kafka is planning to change in such that consumers don't have to interact with zk)? In this case I assume kafka broker can I think easily create the znode with appropriate acl list - broker can be admin. 2) Zookeeper supports Kerberos authentication. Zookeeper supports SSL connections (version 3.4 or later) but I don't see an x509 authentication provider. Do we want to support x509 cert based authentication for zk? - Arvind
[jira] [Updated] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option
[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1493: --- Resolution: Fixed Status: Resolved (was: Patch Available) James, Thanks a lot for the patch. +1 and committed to trunk and 0.8.2. This patch also removes the lz4 jar as a runtime dependency if lz4 compression is not used. Use a well-documented LZ4 compression format and remove redundant LZ4HC option -- Key: KAFKA-1493 URL: https://issues.apache.org/jira/browse/KAFKA-1493 Project: Kafka Issue Type: Improvement Affects Versions: 0.8.2 Reporter: James Oliver Assignee: James Oliver Priority: Blocker Fix For: 0.8.2 Attachments: KAFKA-1493.patch, KAFKA-1493.patch, KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Jenkins build is back to normal : Kafka-trunk #305
See https://builds.apache.org/job/Kafka-trunk/305/changes
Re: Security JIRAS
For the moment, consumers still need to write under the /consumers tree. Even if they are committing offsets to Kafka instead of ZK, they will need to write owner information there when they are balancing. Eventually, you are correct, this is going away with the new consumer. -Todd On Fri, Oct 17, 2014 at 10:09 AM, Arvind Mani am...@linkedin.com.invalid wrote: I'm looking at Kafka Brokers authentication with ZooKeeper since this looks independent of other tasks. [AM] 1) Is authentication required only between kafka broker and zookeeper? Can we assume world read so that consumers don't have to be authenticated (I believe in any case kafka is planning to change in such that consumers don't have to interact with zk)? In this case I assume kafka broker can I think easily create the znode with appropriate acl list - broker can be admin. 2) Zookeeper supports Kerberos authentication. Zookeeper supports SSL connections (version 3.4 or later) but I don't see an x509 authentication provider. Do we want to support x509 cert based authentication for zk? - Arvind
Re: Security JIRAS
Yes, I think we can focus on Broker to Zookeeper communication only. At least for initial stage. Gwen On Fri, Oct 17, 2014 at 2:10 PM, Todd Palino tpal...@gmail.com wrote: For the moment, consumers still need to write under the /consumers tree. Even if they are committing offsets to Kafka instead of ZK, they will need to write owner information there when they are balancing. Eventually, you are correct, this is going away with the new consumer. -Todd On Fri, Oct 17, 2014 at 10:09 AM, Arvind Mani am...@linkedin.com.invalid wrote: I'm looking at Kafka Brokers authentication with ZooKeeper since this looks independent of other tasks. [AM] 1) Is authentication required only between kafka broker and zookeeper? Can we assume world read so that consumers don't have to be authenticated (I believe in any case kafka is planning to change in such that consumers don't have to interact with zk)? In this case I assume kafka broker can I think easily create the znode with appropriate acl list - broker can be admin. 2) Zookeeper supports Kerberos authentication. Zookeeper supports SSL connections (version 3.4 or later) but I don't see an x509 authentication provider. Do we want to support x509 cert based authentication for zk? - Arvind
Re: Review Request 24676: Rebase KAFKA-1583
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24676/#review57178 --- Thanks for the patch. +1 after addressing a couple of more minor comments below. Also, do you plan to have a followup jira to rename request to operation globally? core/src/main/scala/kafka/server/DelayedProduce.scala https://reviews.apache.org/r/24676/#comment97691 set the response core/src/main/scala/kafka/server/RequestPurgatory.scala https://reviews.apache.org/r/24676/#comment97709 Reworded the explanation as follows. Does it look ok? * * The logic upon completing a delayed operation is defined in onComplete() and will be called exactly once. * Once an operation is completed, isCompleted() will return true. onComplete() can be triggered by either * forceComplete(), which forces calling onComplete() after delayMs if the operation is not yet completed, * or tryComplete(), which first checks if the operation can be completed or not now, and if yes calls * forceComplete(). A subclass of DelayedRequest needs to provide an implementation of both onComplete() and * tryComplete(). - Jun Rao On Oct. 17, 2014, 4:56 p.m., Guozhang Wang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24676/ --- (Updated Oct. 17, 2014, 4:56 p.m.) Review request for kafka. Bugs: KAFKA-1583 https://issues.apache.org/jira/browse/KAFKA-1583 Repository: kafka Description --- Incorporate Jun's comments round two after rebase Diffs - core/src/main/scala/kafka/api/FetchRequest.scala 59c09155dd25fad7bed07d3d00039e3dc66db95c core/src/main/scala/kafka/api/FetchResponse.scala 8d085a1f18f803b3cebae4739ad8f58f95a6c600 core/src/main/scala/kafka/api/OffsetCommitRequest.scala 861a6cf11dc6b6431fcbbe9de00c74a122f204bd core/src/main/scala/kafka/api/ProducerRequest.scala b2366e7eedcac17f657271d5293ff0bef6f3cbe6 core/src/main/scala/kafka/api/ProducerResponse.scala a286272c834b6f40164999ff8b7f8998875f2cfe core/src/main/scala/kafka/cluster/Partition.scala e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 core/src/main/scala/kafka/common/ErrorMapping.scala 880ab4a004f078e5d84446ea6e4454ecc06c95f2 core/src/main/scala/kafka/log/Log.scala 157d67369baabd2206a2356b2aa421e848adab17 core/src/main/scala/kafka/network/BoundedByteBufferSend.scala a624359fb2059340bb8dc1619c5b5f226e26eb9b core/src/main/scala/kafka/server/DelayedFetch.scala e0f14e25af03e6d4344386dcabc1457ee784d345 core/src/main/scala/kafka/server/DelayedProduce.scala 9481508fc2d6140b36829840c337e557f3d090da core/src/main/scala/kafka/server/FetchRequestPurgatory.scala ed1318891253556cdf4d908033b704495acd5724 core/src/main/scala/kafka/server/KafkaApis.scala 85498b4a1368d3506f19c4cfc64934e4d0ac4c90 core/src/main/scala/kafka/server/OffsetManager.scala 43eb2a35bb54d32c66cdb94772df657b3a104d1a core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f core/src/main/scala/kafka/server/ReplicaManager.scala 78b7514cc109547c562e635824684fad581af653 core/src/main/scala/kafka/server/RequestPurgatory.scala 9d76234bc2c810ec08621dc92bb4061b8e7cd993 core/src/main/scala/kafka/utils/DelayedItem.scala d7276494072f14f1cdf7d23f755ac32678c5675c core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 209a409cb47eb24f83cee79f4e064dbc5f5e9d62 core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala fb61d552f2320fedec547400fbbe402a0b2f5d87 core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 03a424d45215e1e7780567d9559dae4d0ae6fc29 core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala cd302aa51eb8377d88b752d48274e403926439f2 core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala a9c4ddc78df0b3695a77a12cf8cf25521a203122 core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala a577f4a8bf420a5bc1e62fad6d507a240a42bcaa core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 3804a114e97c849cae48308997037786614173fc core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 Diff: https://reviews.apache.org/r/24676/diff/ Testing --- Unit tests Thanks, Guozhang Wang
[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1642: - Assignee: Ewen Cheslack-Postava (was: Jun Rao) Status: Patch Available (was: Open) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost --- Key: KAFKA-1642 URL: https://issues.apache.org/jira/browse/KAFKA-1642 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Attachments: KAFKA-1642.patch I see my CPU spike to 100% when network connection is lost for while. It seems network IO thread are very busy logging following error message. Is this expected behavior ? 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread: java.lang.IllegalStateException: No entry found for node -2 at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110) at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99) at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394) at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115) at java.lang.Thread.run(Thread.java:744) Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175360#comment-14175360 ] Ewen Cheslack-Postava commented on KAFKA-1642: -- Created reviewboard https://reviews.apache.org/r/26885/diff/ against branch origin/trunk [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost --- Key: KAFKA-1642 URL: https://issues.apache.org/jira/browse/KAFKA-1642 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Bhavesh Mistry Assignee: Jun Rao Attachments: KAFKA-1642.patch I see my CPU spike to 100% when network connection is lost for while. It seems network IO thread are very busy logging following error message. Is this expected behavior ? 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread: java.lang.IllegalStateException: No entry found for node -2 at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110) at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99) at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394) at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115) at java.lang.Thread.run(Thread.java:744) Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 26885: Patch for KAFKA-1642
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26885/ --- Review request for kafka. Bugs: KAFKA-1642 https://issues.apache.org/jira/browse/KAFKA-1642 Repository: kafka Description --- Fixes two issues with computation of poll timeouts in Sender/RecordAccumulator. First, the timeout was being computed by RecordAccumulator as it looked up which nodes had data to send, but the timeout cannot be computed until after nodes that aren't ready for sending are filtered since this could result in a node that is currently unreachable always returning a timeout of 0 and triggering a busy loop. The fixed version computes per-node timeouts and only computes the final timeout after nodes that aren't ready for sending are removed. Second, timeouts were only being computed based on the first TopicAndPartition encountered for each node. This could result in incorrect timeouts if the first encountered didn't have the minimum timeout for that node. This now evaluates every TopicAndPartition with a known leader and takes the minimum. Diffs - clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java c5d470011d334318d5ee801021aadd0c000974a6 clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 8ebe7ed82c9384b71ce0cc3ddbef2c2325363ab9 clients/src/test/java/org/apache/kafka/clients/producer/RecordAccumulatorTest.java 0762b35abba0551f23047348c5893bb8c9acff14 Diff: https://reviews.apache.org/r/26885/diff/ Testing --- Thanks, Ewen Cheslack-Postava
[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava updated KAFKA-1642: - Attachment: KAFKA-1642.patch [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost --- Key: KAFKA-1642 URL: https://issues.apache.org/jira/browse/KAFKA-1642 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 0.8.2 Reporter: Bhavesh Mistry Assignee: Jun Rao Attachments: KAFKA-1642.patch I see my CPU spike to 100% when network connection is lost for while. It seems network IO thread are very busy logging following error message. Is this expected behavior ? 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread: java.lang.IllegalStateException: No entry found for node -2 at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110) at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99) at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394) at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115) at java.lang.Thread.run(Thread.java:744) Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1485) Upgrade to Zookeeper 3.4.6 and create shim for ZKCLI so system tests can run
[ https://issues.apache.org/jira/browse/KAFKA-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175365#comment-14175365 ] Jun Rao commented on KAFKA-1485: Gwen, Joe, It seems that ZK 3.4.6 added a runtime dependency on netty-3.7.0.Final.jar. However, it doesn't seem that's really needed to start ZK server and ZK client. Could you confirm that runtime dependency is unnecessary? If so, we should exclude that in our build file. Thanks, Upgrade to Zookeeper 3.4.6 and create shim for ZKCLI so system tests can run -- Key: KAFKA-1485 URL: https://issues.apache.org/jira/browse/KAFKA-1485 Project: Kafka Issue Type: Wish Affects Versions: 0.8.1.1 Reporter: Machiel Groeneveld Assignee: Gwen Shapira Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1485.2.patch, KAFKA-1485.3.patch, KAFKA-1485.4.patch, KAFKA-1485.patch I can't run projects alongside Kafka that use zookeeper 3.4 jars. 3.4 has been out for 2.5 years and seems to be ready for adoption. In particular Apache Storm will upgrade to Zookeeper 3.4.x in their next 0.9.2 release. I can't run both versions in my tests at the same time. The only compile problem I saw was in EmbeddedZookeeper.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1712) Excessive storage usage on newly added node
Oleg Golovin created KAFKA-1712: --- Summary: Excessive storage usage on newly added node Key: KAFKA-1712 URL: https://issues.apache.org/jira/browse/KAFKA-1712 Project: Kafka Issue Type: Bug Reporter: Oleg Golovin When a new node is added to cluster data stars replicating into it. The mtime of creating segments will be set on the last message being written to them. Though the replication is a prolonged process, let's assume (for simplicity of explanation) that their mtime is very close to the time when the new node was added. After the replication is done, new data will start to flow into this new node. After `log.retention.hours` the amount of data will be 2 * daily_amount_of_data_in_kafka_node (first one is the replicated data from other nodes when the node was added (let us call it `t1`) and the second is the amount of replicated data from other nodes during `t1 + log.retention.hours`). So by that time the node will have twice as much data as the other nodes. This poses a big problem to us as our storage is chosen to fit normal amount of data (not twice this amount). In our particular case it poses another problem. We have an emergency segment cleaner which runs in case storage is nearly full (90%). We try to balance the amount of data for it not to run to rely solely on kafka internal log deletion, but sometimes emergency cleaner runs. It works this way: - it gets all kafka segments for the volume - it filters out last segments of each partition (just to avoid unnecessary recreation of last small-size segments) - it sorts them by segment mtime - it changes mtime of the first N segements (with the lowest mtime) to 1, so they become really really old. Number N is chosen to free specified percentage of volume (3% in our case). Kafka deletes these segments later (as they are very old). Emergency cleaner works very well. Except for the case when the data is replicated to the newly added node. In this case segment mtime is the time the segment was replicated and does not reflect the real creation time of original data stored in this segment. So in this case kafka emergency cleaner will delete segments with the lowest mtime, which may hold the data which is much more recent than the data in other segments. This is not a big problem until we delete the data which hasn't been fully consumed. In this case we loose data and this makes it a big problem. Is it possible to retain segment mtime during initial replication on a new node? This will help not to load the new node with the twice as large amount of data as other nodes have. Or maybe there are another ways to sort segments by data creation times (or close to data creation time)? (for example if this ticket is implemented https://issues.apache.org/jira/browse/KAFKA-1403, we may take time of the first message from .index). In our case it will help with kafka emergency cleaner, which will be deleting really the oldest data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1671) uploaded archives are missing for Scala version 2.11
[ https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Rao updated KAFKA-1671: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. +1. Committed to trunk and 0.8.2. uploaded archives are missing for Scala version 2.11 Key: KAFKA-1671 URL: https://issues.apache.org/jira/browse/KAFKA-1671 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Ivan Lyutov Priority: Blocker Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1671.patch https://repository.apache.org/content/groups/staging/org/apache/kafka/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175430#comment-14175430 ] Ewen Cheslack-Postava commented on KAFKA-1710: -- bq. The dead lock will occur something depending on Thread scheduling and how log the are blocked. Dead lock has a specific definition -- two or more threads that are both waiting on each other such that neither can make any forward progress -- and as far as I can tell this isn't triggering a deadlock. From what I've seen this is simply an issue of trying of anywhere from 50 - 200 threads trying to access a shared, synchronized resource. This is just contention, everything continues to make progress. The test program runs to completion just fine. As for performance, I have no doubt there are improvements to be made in the Producer implementation, but you'll get a far bigger performance boost with careful design in your system. I already mentioned multiple ways you can improve performance that, based on your current test code, shouldn't affect anything else. Here's a quick example (using a lightly modified version of your code against a local test cluster): {quote} Existing setup (4 producers, 1 partition): All Producers done...! All done...! real1m50.135s user1m45.019s sys 1m53.219s {quote} {quote} 8 Producers, 1 partition (and parameters adjusted to generate same # of msgs): All Producers done...! All done...! real0m55.465s user1m27.132s sys 1m1.144s {quote} Nothing surprising, but since you haven't specified a constraint on the # of producers this seems like the simplest solution to improve performance. [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)
Build failed in Jenkins: Kafka-trunk #306
See https://builds.apache.org/job/Kafka-trunk/306/changes Changes: [junrao] kafka-1671; uploaded archives are missing for Scala version 2.11; patched by Ivan Lyutov; reviewed by Jun Rao -- [...truncated 1011 lines...] kafka.producer.ProducerTest testSendWithDeadBroker PASSED kafka.producer.ProducerTest testAsyncSendCanCorrectlyFailWithTimeout PASSED kafka.producer.ProducerTest testSendNullMessage PASSED kafka.producer.AsyncProducerTest testProducerQueueSize PASSED kafka.producer.AsyncProducerTest testProduceAfterClosed PASSED kafka.producer.AsyncProducerTest testBatchSize PASSED kafka.producer.AsyncProducerTest testQueueTimeExpired PASSED kafka.producer.AsyncProducerTest testPartitionAndCollateEvents PASSED kafka.producer.AsyncProducerTest testSerializeEvents PASSED kafka.producer.AsyncProducerTest testInvalidPartition PASSED kafka.producer.AsyncProducerTest testNoBroker PASSED kafka.producer.AsyncProducerTest testIncompatibleEncoder PASSED kafka.producer.AsyncProducerTest testRandomPartitioner PASSED kafka.producer.AsyncProducerTest testFailedSendRetryLogic PASSED kafka.producer.AsyncProducerTest testJavaProducer PASSED kafka.producer.AsyncProducerTest testInvalidConfiguration PASSED kafka.log.CleanerTest testCleanSegments PASSED kafka.log.CleanerTest testCleaningWithDeletes PASSED kafka.log.CleanerTest testCleanSegmentsWithAbort PASSED kafka.log.CleanerTest testSegmentGrouping PASSED kafka.log.CleanerTest testBuildOffsetMap PASSED kafka.log.LogManagerTest testCreateLog PASSED kafka.log.LogManagerTest testGetNonExistentLog PASSED kafka.log.LogManagerTest testCleanupExpiredSegments PASSED kafka.log.LogManagerTest testCleanupSegmentsToMaintainSize PASSED kafka.log.LogManagerTest testTimeBasedFlush PASSED kafka.log.LogManagerTest testLeastLoadedAssignment PASSED kafka.log.LogManagerTest testTwoLogManagersUsingSameDirFails PASSED kafka.log.LogManagerTest testCheckpointRecoveryPoints PASSED kafka.log.LogManagerTest testRecoveryDirectoryMappingWithTrailingSlash PASSED kafka.log.LogManagerTest testRecoveryDirectoryMappingWithRelativeDirectory PASSED kafka.log.OffsetIndexTest truncate PASSED kafka.log.OffsetIndexTest randomLookupTest PASSED kafka.log.OffsetIndexTest lookupExtremeCases PASSED kafka.log.OffsetIndexTest appendTooMany PASSED kafka.log.OffsetIndexTest appendOutOfOrder PASSED kafka.log.OffsetIndexTest testReopen PASSED kafka.log.FileMessageSetTest testWrittenEqualsRead PASSED kafka.log.FileMessageSetTest testIteratorIsConsistent PASSED kafka.log.FileMessageSetTest testSizeInBytes PASSED kafka.log.FileMessageSetTest testWriteTo PASSED kafka.log.FileMessageSetTest testFileSize PASSED kafka.log.FileMessageSetTest testIterationOverPartialAndTruncation PASSED kafka.log.FileMessageSetTest testIterationDoesntChangePosition PASSED kafka.log.FileMessageSetTest testRead PASSED kafka.log.FileMessageSetTest testSearch PASSED kafka.log.FileMessageSetTest testIteratorWithLimits PASSED kafka.log.FileMessageSetTest testTruncate PASSED kafka.log.LogCleanerIntegrationTest cleanerTest PASSED kafka.log.OffsetMapTest testBasicValidation PASSED kafka.log.OffsetMapTest testClear PASSED kafka.log.LogTest testTimeBasedLogRoll PASSED kafka.log.LogTest testTimeBasedLogRollJitter PASSED kafka.log.LogTest testSizeBasedLogRoll PASSED kafka.log.LogTest testLoadEmptyLog PASSED kafka.log.LogTest testAppendAndReadWithSequentialOffsets PASSED kafka.log.LogTest testAppendAndReadWithNonSequentialOffsets PASSED kafka.log.LogTest testReadAtLogGap PASSED kafka.log.LogTest testReadOutOfRange PASSED kafka.log.LogTest testLogRolls PASSED kafka.log.LogTest testCompressedMessages PASSED kafka.log.LogTest testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED kafka.log.LogTest testMessageSetSizeCheck PASSED kafka.log.LogTest testMessageSizeCheck PASSED kafka.log.LogTest testLogRecoversToCorrectOffset PASSED kafka.log.LogTest testIndexRebuild PASSED kafka.log.LogTest testTruncateTo PASSED kafka.log.LogTest testIndexResizingAtTruncation PASSED kafka.log.LogTest testBogusIndexSegmentsAreRemoved PASSED kafka.log.LogTest testReopenThenTruncate PASSED kafka.log.LogTest testAsyncDelete PASSED kafka.log.LogTest testOpenDeletesObsoleteFiles PASSED kafka.log.LogTest testAppendMessageWithNullPayload PASSED kafka.log.LogTest testCorruptLog PASSED kafka.log.LogTest testCleanShutdownFile PASSED kafka.log.LogSegmentTest testTruncate PASSED kafka.log.LogSegmentTest testReadOnEmptySegment PASSED kafka.log.LogSegmentTest testReadBeforeFirstOffset PASSED kafka.log.LogSegmentTest testMaxOffset PASSED kafka.log.LogSegmentTest testReadAfterLast PASSED kafka.log.LogSegmentTest testReadFromGap PASSED kafka.log.LogSegmentTest testTruncateFull PASSED kafka.log.LogSegmentTest testNextOffsetCalculation PASSED kafka.log.LogSegmentTest testChangeFileSuffixes
[jira] [Commented] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175500#comment-14175500 ] BalajiSeshadri commented on KAFKA-1476: --- [~nehanarkhede] I tried to implement the describe group,but the cal is getting stuck in while loop of readFrom. Please see log below. io.EOFException: Received -1 when reading from channel, socket has likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108) at kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154) at kafka.tools.ConsumerCommand$.getOffsetsByTopic(ConsumerCommand.scala:92) at kafka.tools.ConsumerCommand$.main(ConsumerCommand.scala:67) at kafka.tools.ConsumerCommand.main(ConsumerCommand.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108) at kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154) at kafka.tools.ConsumerCommand$.getOffsetsByTopic(ConsumerCommand.scala:92) at kafka.tools.ConsumerCommand$.main(ConsumerCommand.scala:67) at kafka.tools.ConsumerCommand.main(ConsumerCommand.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54) at kafka.network.Receive$class.readCompletely(Transmission.scala:56) at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29) at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108) at kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154) at kafka.tools.ConsumerCommand$.getOffsetsByTopic(ConsumerCommand.scala:92) at kafka.tools.ConsumerCommand$.main(ConsumerCommand.scala:67) at kafka.tools.ConsumerCommand.main(ConsumerCommand.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Its just continously going through while loop of kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154). Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: BalajiSeshadri Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, KAFKA-1476.patch It would be useful to have a way to get a list of consumer groups currently active via some tool/script that ships with kafka. This would be helpful so that the system tools can be explored more easily. For example, when running the ConsumerOffsetChecker, it requires a group option bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ?
[jira] [Updated] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BalajiSeshadri updated KAFKA-1476: -- Attachment: ConsumerCommand.scala Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: BalajiSeshadri Labels: newbie Fix For: 0.9.0 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, KAFKA-1476.patch It would be useful to have a way to get a list of consumer groups currently active via some tool/script that ships with kafka. This would be helpful so that the system tools can be explored more easily. For example, when running the ConsumerOffsetChecker, it requires a group option bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ? But, when just getting started with kafka, using the console producer and consumer, it is not clear what value to use for the group option. If a list of consumer groups could be listed, then it would be clear what value to use. Background: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ewen Cheslack-Postava resolved KAFKA-1710. -- Resolution: Invalid Assignee: Ewen Cheslack-Postava [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1476) Get a list of consumer groups
[ https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175503#comment-14175503 ] BalajiSeshadri commented on KAFKA-1476: --- [~nehanarkhede] Please find the attached code. Get a list of consumer groups - Key: KAFKA-1476 URL: https://issues.apache.org/jira/browse/KAFKA-1476 Project: Kafka Issue Type: Wish Components: tools Affects Versions: 0.8.1.1 Reporter: Ryan Williams Assignee: BalajiSeshadri Labels: newbie Fix For: 0.9.0 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, KAFKA-1476.patch It would be useful to have a way to get a list of consumer groups currently active via some tool/script that ships with kafka. This would be helpful so that the system tools can be explored more easily. For example, when running the ConsumerOffsetChecker, it requires a group option bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group ? But, when just getting started with kafka, using the console producer and consumer, it is not clear what value to use for the group option. If a list of consumer groups could be listed, then it would be clear what value to use. Background: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175548#comment-14175548 ] Bhavesh Mistry commented on KAFKA-1710: --- [~ewencp], Thank you for entertaining this issue and you may close this. I do agree with you if I increase number of producers then throughput will be alleviated (thread contention to critical block) at expense of TCP connections, memory etc. Do you think it would be good to open another jira issues or story for improving performance when sending to single partition for some time to avoid Thread contention? Please let me know if I should open the performance aspect of New Producer. Thanks, Bhavesh [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175551#comment-14175551 ] Jay Kreps commented on KAFKA-1710: -- [~Bmis13] What is the performance you see? What do you hope to see? [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175548#comment-14175548 ] Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:19 PM: - [~ewencp], Thank you for entertaining this issue and you may close this. I do agree with you if I increase number of producers then throughput will be alleviated (thread contention to critical block) at expense of TCP connections, memory etc. Do you think it would be good to open another jira issues or story for improving performance when sending to single partition for some time to avoid Thread contention? Please let me know if I should open the performance aspect of New Producer. Only request is to make New Producer truly Async to enqueue the message regardless of message key or partition number. Thanks, Bhavesh was (Author: bmis13): [~ewencp], Thank you for entertaining this issue and you may close this. I do agree with you if I increase number of producers then throughput will be alleviated (thread contention to critical block) at expense of TCP connections, memory etc. Do you think it would be good to open another jira issues or story for improving performance when sending to single partition for some time to avoid Thread contention? Please let me know if I should open the performance aspect of New Producer. Thanks, Bhavesh [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by
blog with some out of the box pains
This guy documented a few struggles getting going with Kafka. Not sure if there is anything we can do to make it better? http://ispyker.blogspot.com/2014/10/kafka-part-1.html 1. Would be great to figure out the apache/gradle thing. 2. The problem of having Kafka advertise localhost on AWS is really common. I was thinking one possible solution for this would be to get all the interfaces and prefer non-localhost interfaces if they exist. -Jay
[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175574#comment-14175574 ] Bhavesh Mistry commented on KAFKA-1710: --- [~jkreps], Only request is to make New Producer truly Async to enqueue the message regardless of message key hashcode or partition number for the message. The new Producer is far far better than old Scala producer. ( I have worked both with new and old producers/consumer and entire linked-in pipeline) But new producer inherit the same problem that old producer had thread contention when queuing message into buffer. I think Kafka Dev team can do better because this use case of aggregating events into single partition is widely used. What my plan is to replace the Steam processing framework with Kafka is possible (For Aggregation and counting metrics etc) We currently use following steam processor, but it has lots of down fall and only distribute the load which Kafka Brokers provide. Any way this is our use case. https://github.com/walmartlabs/mupd8 http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf Thanks, Bhavesh [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175574#comment-14175574 ] Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:32 PM: - [~jkreps], Only request is to make New Producer truly Async to enqueue the message regardless of message key hashcode or partition number for the message. The new Producer is far far better than old Scala producer. ( I have worked on both new and old producers/consumer and entire linked-in pipeline) But new producer inherit the same problem that old producer had thread contention when queuing message into buffer. I think Kafka Dev team can do better because this use case of aggregating events into single partition is widely used. What my plan is to replace the Steam processing framework with Kafka is possible (For Aggregation and counting metrics etc) We currently use following steam processor, but it has lots of down fall and only distribute the load which Kafka Brokers provide. Any way this is our use case. https://github.com/walmartlabs/mupd8 http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf Thanks, Bhavesh was (Author: bmis13): [~jkreps], Only request is to make New Producer truly Async to enqueue the message regardless of message key hashcode or partition number for the message. The new Producer is far far better than old Scala producer. ( I have worked both with new and old producers/consumer and entire linked-in pipeline) But new producer inherit the same problem that old producer had thread contention when queuing message into buffer. I think Kafka Dev team can do better because this use case of aggregating events into single partition is widely used. What my plan is to replace the Steam processing framework with Kafka is possible (For Aggregation and counting metrics etc) We currently use following steam processor, but it has lots of down fall and only distribute the load which Kafka Brokers provide. Any way this is our use case. https://github.com/walmartlabs/mupd8 http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf Thanks, Bhavesh [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run()
[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175586#comment-14175586 ] Jay Kreps commented on KAFKA-1710: -- Well but of course you can't have multiple threads appending to a shared in memory data structure without some synchronization. That lock should be very very cheap, though. What is meant by asynchronous is not that it doesn't block but rather that it doesn't block on the network request (after all due to gc, context switches, etc your program always stops). It sounds like you were seeing some kind of performance problem. What performance (say msgs/sec) were you seeing and what were you hoping for? [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition Key: KAFKA-1710 URL: https://issues.apache.org/jira/browse/KAFKA-1710 Project: Kafka Issue Type: Bug Components: producer Environment: Development Reporter: Bhavesh Mistry Assignee: Ewen Cheslack-Postava Priority: Critical Labels: performance Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump Hi Kafka Dev Team, When I run the test to send message to single partition for 3 minutes or so on, I have encounter deadlock (please see the screen attached) and thread contention from YourKit profiling. Use Case: 1) Aggregating messages into same partition for metric counting. 2) Replicate Old Producer behavior for sticking to partition for 3 minutes. Here is output: Frozen threads found (potential deadlock) It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung. pool-1-thread-128 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-159 --- Frozen for at least 2m 1 sec org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 pool-1-thread-55 --- Frozen for at least 2m org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237 org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145 java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615 java.lang.Thread.run() Thread.java:744 Thanks, Bhavesh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged
[ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1108: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch. Pushed to trunk and 0.8.2 when controlled shutdown attempt fails, the reason is not always logged --- Key: KAFKA-1108 URL: https://issues.apache.org/jira/browse/KAFKA-1108 Project: Kafka Issue Type: Bug Reporter: Jason Rosenberg Assignee: Ewen Cheslack-Postava Labels: newbie Fix For: 0.9.0 Attachments: KAFKA-1108.patch, KAFKA-1108_2014-10-16_13:53:11.patch In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and then if there's a failure, it will retry the controlledShutdown. Looking at the code, there are 2 ways a retry could fail, one with an error response from the controller, and this messaging code: {code} info(Remaining partitions to move: %s.format(shutdownResponse.partitionsRemaining.mkString(,))) info(Error code from controller: %d.format(shutdownResponse.errorCode)) {code} Alternatively, there could be an IOException, with this code executed: {code} catch { case ioe: java.io.IOException = channel.disconnect() channel = null // ignore and try again } {code} And then finally, in either case: {code} if (!shutdownSuceeded) { Thread.sleep(config.controlledShutdownRetryBackoffMs) warn(Retrying controlled shutdown after the previous attempt failed...) } {code} It would be nice if the nature of the IOException were logged in either case (I'd be happy with an ioe.getMessage() instead of a full stack trace, as kafka in general tends to be too willing to dump IOException stack traces!). I suspect, in my case, the actual IOException is a socket timeout (as the time between initial Starting controlled shutdown and the first Retrying... message is usually about 35 seconds (the socket timeout + the controlled shutdown retry backoff). So, it would seem that really, the issue in this case is that controlled shutdown is taking too long. It would seem sensible instead to have the controller report back to the server (before the socket timeout) that more time is needed, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged
[ https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede updated KAFKA-1108: - Fix Version/s: (was: 0.9.0) 0.8.2 when controlled shutdown attempt fails, the reason is not always logged --- Key: KAFKA-1108 URL: https://issues.apache.org/jira/browse/KAFKA-1108 Project: Kafka Issue Type: Bug Reporter: Jason Rosenberg Assignee: Ewen Cheslack-Postava Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1108.patch, KAFKA-1108_2014-10-16_13:53:11.patch In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and then if there's a failure, it will retry the controlledShutdown. Looking at the code, there are 2 ways a retry could fail, one with an error response from the controller, and this messaging code: {code} info(Remaining partitions to move: %s.format(shutdownResponse.partitionsRemaining.mkString(,))) info(Error code from controller: %d.format(shutdownResponse.errorCode)) {code} Alternatively, there could be an IOException, with this code executed: {code} catch { case ioe: java.io.IOException = channel.disconnect() channel = null // ignore and try again } {code} And then finally, in either case: {code} if (!shutdownSuceeded) { Thread.sleep(config.controlledShutdownRetryBackoffMs) warn(Retrying controlled shutdown after the previous attempt failed...) } {code} It would be nice if the nature of the IOException were logged in either case (I'd be happy with an ioe.getMessage() instead of a full stack trace, as kafka in general tends to be too willing to dump IOException stack traces!). I suspect, in my case, the actual IOException is a socket timeout (as the time between initial Starting controlled shutdown and the first Retrying... message is usually about 35 seconds (the socket timeout + the controlled shutdown retry backoff). So, it would seem that really, the issue in this case is that controlled shutdown is taking too long. It would seem sensible instead to have the controller report back to the server (before the socket timeout) that more time is needed, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: blog with some out of the box pains
In #2, do you refer to advertising the internal hostname instead of the external one? In this case, will it be enough to use getCanonicalHostName (which uses a name service)? Note that I think the problem the blog reported (wrong name advertised) is somewhat orthogonal to the question of which interface we bind to (which should probably be the default interface). Gwen On Fri, Oct 17, 2014 at 5:28 PM, Jay Kreps jay.kr...@gmail.com wrote: This guy documented a few struggles getting going with Kafka. Not sure if there is anything we can do to make it better? http://ispyker.blogspot.com/2014/10/kafka-part-1.html 1. Would be great to figure out the apache/gradle thing. 2. The problem of having Kafka advertise localhost on AWS is really common. I was thinking one possible solution for this would be to get all the interfaces and prefer non-localhost interfaces if they exist. -Jay
Jenkins build is back to normal : Kafka-trunk #307
See https://builds.apache.org/job/Kafka-trunk/307/changes
[jira] [Created] (KAFKA-1713) Partition files not created
Pradeep created KAFKA-1713: -- Summary: Partition files not created Key: KAFKA-1713 URL: https://issues.apache.org/jira/browse/KAFKA-1713 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Environment: Linux, 2GB RAM, 2 Core CPU Reporter: Pradeep We are using Kafka Topic APIs to create the topic. But in some cases, the topic gets created but we don't see the partition specific files and when producer/consumer tries to get the topic metadata and it fails with exception. k.p.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic tloader1 - No partition metadata for topic tloader1 due to kafka.common.UnknownTopicOrPartitionException}] for topic [tloader1]: class kafka.common.UnknownTopicOrPartitionException -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-1713) Partition files not created
[ https://issues.apache.org/jira/browse/KAFKA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neha Narkhede resolved KAFKA-1713. -- Resolution: Won't Fix Please can you redirect such questions to the mailing list? Regarding your problem, did you send any data from the producer? Partition files not created --- Key: KAFKA-1713 URL: https://issues.apache.org/jira/browse/KAFKA-1713 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Environment: Linux, 2GB RAM, 2 Core CPU Reporter: Pradeep We are using Kafka Topic APIs to create the topic. But in some cases, the topic gets created but we don't see the partition specific files and when producer/consumer tries to get the topic metadata and it fails with exception. k.p.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for topic tloader1 - No partition metadata for topic tloader1 due to kafka.common.UnknownTopicOrPartitionException}] for topic [tloader1]: class kafka.common.UnknownTopicOrPartitionException -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: blog with some out of the box pains
Hmm, yes, actually I don't think I actually understand the issue. Basically as I understand it we do InetAddress.getLocalHost.getHostAddress which on AWS picks the wrong hostname/ip and then the producer can't connect. People eventually find this FAQ, but I was hoping there was a more automatic way since everyone is on AWS these days. Maybe getCanonicalHostName would fix it? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers ? -Jay On Fri, Oct 17, 2014 at 3:19 PM, Gwen Shapira gshap...@cloudera.com wrote: In #2, do you refer to advertising the internal hostname instead of the external one? In this case, will it be enough to use getCanonicalHostName (which uses a name service)? Note that I think the problem the blog reported (wrong name advertised) is somewhat orthogonal to the question of which interface we bind to (which should probably be the default interface). Gwen On Fri, Oct 17, 2014 at 5:28 PM, Jay Kreps jay.kr...@gmail.com wrote: This guy documented a few struggles getting going with Kafka. Not sure if there is anything we can do to make it better? http://ispyker.blogspot.com/2014/10/kafka-part-1.html 1. Would be great to figure out the apache/gradle thing. 2. The problem of having Kafka advertise localhost on AWS is really common. I was thinking one possible solution for this would be to get all the interfaces and prefer non-localhost interfaces if they exist. -Jay
Kafka Command Line Shell
Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
Re: Kafka Command Line Shell
Joe I think this is great! On Fri, Oct 17, 2014 at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote: Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
Re: blog with some out of the box pains
Basically, the issue (or at least one of very many possible network issues...) is that the server has localhost hardcoded as its canonical name in /etc/hosts: [root@Billc-cent70x64 ~]# cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 Billc-cent70x64 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 Unfortunately a very common default for RedHat and Centos machines. As the blog mentions, a good solution (other than instructing Kafka on the right name to advertise) is to add the correct IP and hostname to /etc/hosts. We may want to add this option to the FAQ. Gwen On Fri, Oct 17, 2014 at 7:56 PM, Gwen Shapira gshap...@cloudera.com wrote: It looks like we are using canonical hostname: def register() { val advertisedHostName = if(advertisedHost == null || advertisedHost.trim.isEmpty) InetAddress.getLocalHost.getCanonicalHostName else advertisedHost val jmxPort = System.getProperty(com.sun.management.jmxremote.port, -1).toInt ZkUtils.registerBrokerInZk(zkClient, brokerId, advertisedHostName, advertisedPort, zkSessionTimeoutMs, jmxPort) } So never mind :) On Fri, Oct 17, 2014 at 6:36 PM, Jay Kreps jay.kr...@gmail.com wrote: Hmm, yes, actually I don't think I actually understand the issue. Basically as I understand it we do InetAddress.getLocalHost.getHostAddress which on AWS picks the wrong hostname/ip and then the producer can't connect. People eventually find this FAQ, but I was hoping there was a more automatic way since everyone is on AWS these days. Maybe getCanonicalHostName would fix it? https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers ? -Jay On Fri, Oct 17, 2014 at 3:19 PM, Gwen Shapira gshap...@cloudera.com wrote: In #2, do you refer to advertising the internal hostname instead of the external one? In this case, will it be enough to use getCanonicalHostName (which uses a name service)? Note that I think the problem the blog reported (wrong name advertised) is somewhat orthogonal to the question of which interface we bind to (which should probably be the default interface). Gwen On Fri, Oct 17, 2014 at 5:28 PM, Jay Kreps jay.kr...@gmail.com wrote: This guy documented a few struggles getting going with Kafka. Not sure if there is anything we can do to make it better? http://ispyker.blogspot.com/2014/10/kafka-part-1.html 1. Would be great to figure out the apache/gradle thing. 2. The problem of having Kafka advertise localhost on AWS is really common. I was thinking one possible solution for this would be to get all the interfaces and prefer non-localhost interfaces if they exist. -Jay
Re: Kafka Command Line Shell
We've been talking about this a little internally as well. What about the idea of presenting all the admin functions through a web API interface (restful or not) complete with authentication? That would make it much easier for creating structure around Kafka without having to layer commands on top of each other. I'm not a big fan of the language specific interfaces, because they tend to complicate trying to integrate with larger systems. Consider something like AWS or Azure, where it would be much easier if there is an API interface like that. -Todd On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote: Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
[jira] [Commented] (KAFKA-1583) Kafka API Refactoring
[ https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175763#comment-14175763 ] Joel Koshy commented on KAFKA-1583: --- Reviewing is taking a bit longer than expected - I'm about half-way through, so hopefully should be done on Monday. Kafka API Refactoring - Key: KAFKA-1583 URL: https://issues.apache.org/jira/browse/KAFKA-1583 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.9.0 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, KAFKA-1583_2014-10-17_09:56:33.patch This is the next step of KAFKA-1430. Details can be found at this page: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Kafka Command Line Shell
+1 It would definitely be useful to have a CLI. We had a cursory discussion on this in the past [1] but it would be useful to have a full proposal describing everything the CLI should provide. [1] http://grokbase.com/t/kafka/dev/1435tr3pfc/command-line-tools On Fri, Oct 17, 2014 at 05:12:16PM -0700, Todd Palino wrote: We've been talking about this a little internally as well. What about the idea of presenting all the admin functions through a web API interface (restful or not) complete with authentication? That would make it much easier for creating structure around Kafka without having to layer commands on top of each other. I'm not a big fan of the language specific interfaces, because they tend to complicate trying to integrate with larger systems. Consider something like AWS or Azure, where it would be much easier if there is an API interface like that. -Todd On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote: Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
Re: Kafka Command Line Shell
Absolutely. My suggestion of an HTTP interface was in addition to a CLI. I think the CLI should can the HTTP interface underneath to keep it simple. -Todd On Oct 17, 2014, at 6:24 PM, Joel Koshy jjkosh...@gmail.com wrote: +1 It would definitely be useful to have a CLI. We had a cursory discussion on this in the past [1] but it would be useful to have a full proposal describing everything the CLI should provide. [1] http://grokbase.com/t/kafka/dev/1435tr3pfc/command-line-tools On Fri, Oct 17, 2014 at 05:12:16PM -0700, Todd Palino wrote: We've been talking about this a little internally as well. What about the idea of presenting all the admin functions through a web API interface (restful or not) complete with authentication? That would make it much easier for creating structure around Kafka without having to layer commands on top of each other. I'm not a big fan of the language specific interfaces, because they tend to complicate trying to integrate with larger systems. Consider something like AWS or Azure, where it would be much easier if there is an API interface like that. -Todd On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote: Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
Re: Kafka Command Line Shell
+1 to http interface and cli. Http layer will make it easier to integrate with gui like Hue. Gwen — Sent from Mailbox On Fri, Oct 17, 2014 at 10:14 PM, Todd Palino tpal...@gmail.com wrote: Absolutely. My suggestion of an HTTP interface was in addition to a CLI. I think the CLI should can the HTTP interface underneath to keep it simple. -Todd On Oct 17, 2014, at 6:24 PM, Joel Koshy jjkosh...@gmail.com wrote: +1 It would definitely be useful to have a CLI. We had a cursory discussion on this in the past [1] but it would be useful to have a full proposal describing everything the CLI should provide. [1] http://grokbase.com/t/kafka/dev/1435tr3pfc/command-line-tools On Fri, Oct 17, 2014 at 05:12:16PM -0700, Todd Palino wrote: We've been talking about this a little internally as well. What about the idea of presenting all the admin functions through a web API interface (restful or not) complete with authentication? That would make it much easier for creating structure around Kafka without having to layer commands on top of each other. I'm not a big fan of the language specific interfaces, because they tend to complicate trying to integrate with larger systems. Consider something like AWS or Azure, where it would be much easier if there is an API interface like that. -Todd On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote: Hi, I have been thinking about the ease of use for operations with Kafka. We have lots of tools doing a lot of different things and they are all kind of in different places. So, what I was thinking is to have a single interface for our tooling https://issues.apache.org/jira/browse/KAFKA-1694 This would manifest itself in two ways 1) a command line interface 2) a repl We would have one entry point centrally for all Kafka commands. kafka CMD ARGS kafka createTopic --brokerList etc, kafka reassignPartition --brokerList etc, or execute and run the shell kafka --brokerList localhost kafkause topicName; kafkaset acl='label'; I was thinking that all calls would be initialized through --brokerList and the broker can tell the KafkaCommandTool what server to connect to for MetaData. Thoughts? Tomatoes? /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop /
Re: blog with some out of the box pains
The first issue he runs into is one I also find frustrating -- with cloud providers pushing SSDs, you have to use a pretty large instance type to get a reasonable test setup. I'm not sure if he couldn't launch an older type like m1.large (I think some newer AWS accounts aren't able to) or if he just didn't see it as an option since they are hidden by default. Even the largest general purpose instance types are pretty wimpy wrt storage, only 80GB local instance storage. The hostname issues are a well known pain point and unfortunately there aren't any great solutions that aren't EC2-specific. Here's a quick run down: * None of the images for popular distros on EC2 will auto-set the hostname beyond what EC2 already sets up (which isn't publicly routable). The following details might explain why they can't. For example, a recent Ubuntu image gives: ubuntu@ip-172-30-2-76:~$ hostname ip-172-30-2-76 ubuntu@ip-172-30-2-76:~$ cat /etc/hosts 127.0.0.1 localhost # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback --- cut irrelevant pieces --- * Sometimes the hostname is set, but isn't useful. For example, in this Ubuntu image, the hostname is set to ip-[ip-address-], but that isn't routable, so generates really irritating behavior. Running on the server itself (which is running in a VPC, see below for more details): scala InetAddress.getLocalHost java.net.UnknownHostException: ip-172-30-2-76: ip-172-30-2-76: Name or service not known at java.net.InetAddress.getLocalHost(InetAddress.java:1473) at .init(console:9) at .clinit(console) at .init(console:11) at .clinit(console) at $print(console) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704) at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920) at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43) at scala.tools.nsc.io.package$$anon$2.run(package.scala:25) at java.lang.Thread.run(Thread.java:745) Caused by: java.net.UnknownHostException: ip-172-30-2-76: Name or service not known at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) at java.net.InetAddress.getLocalHost(InetAddress.java:1469) ... 14 more * As described in a bunch of places, the only reliable way to get public DNS info is through EC2's own instance metadata API: https://forums.aws.amazon.com/thread.jspa?threadID=77788 For example: curl -s http://169.254.169.254/latest/meta-data/public-hostname might give something like: ec2-203-0-113-25.compute-1.amazonaws.com * But you may not even *have* a public DNS hostname. If you launch in a VPC, you'll only get one if you set the VPC to generate them (and I'm pretty sure the default is to not create them): http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html The output of the curl call above will just be empty. * AWS is pretty aggressively trying to move away from EC2-Classic (i.e. non-VPC instances), so most new instances will end up in VPCs unless you are working in a grandfathered account + AZ. If VPC without public DNS is the default, we'll have to carefully guide new users in generating a setup that works properly if we try to use hostnames. * Even if you try moving the IP addresses, you still have to deal with VPCs. You can't directly get your public IP address without accessing something outside the host since you're in a VPC. You need to use the instance metadata API to look it up, i.e., curl -s http://169.254.169.254/latest/meta-data/public-ipv4 * And yet another problem with IPs: unless you use an elastic IP, you're not guaranteed they'll be stable: Auto-assign Public IP Requests a public IP address from Amazon's public IP address pool, to make your instance reachable from the Internet. In most cases, the public IP address is associated with the instance until it’s stopped or terminated, after which it’s no longer available for you to use. If you require a persistent public IP address that you can associate and disassociate at will, use an Elastic IP address (EIP) instead. You can allocate your own EIP, and associate it to your instance after launch. I know Spark had some similar issues -- using their (very convenient!) ec2 script, you still ended up with some stuff