Re: Review Request 24676: Fix KAFKA-1583

2014-10-17 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24676/#review57147
---


Thanks for the patch. A few more comments.


core/src/main/scala/kafka/server/ReplicaManager.scala
https://reviews.apache.org/r/24676/#comment97617

typo uni



core/src/main/scala/kafka/server/DelayedProduce.scala
https://reviews.apache.org/r/24676/#comment97628

Instead of saying return error, it's more accurate to say set an error 
in the response.



core/src/main/scala/kafka/server/RequestPurgatory.scala
https://reviews.apache.org/r/24676/#comment97627

It's not super clear what the relationships among forceComplete, tryComplete, 
complete, and onComplete are and how they should be used together. Perhaps we 
can add an explanation here?



core/src/main/scala/kafka/server/RequestPurgatory.scala
https://reviews.apache.org/r/24676/#comment97626

How about we rename this to onComplete()?



core/src/main/scala/kafka/server/RequestPurgatory.scala
https://reviews.apache.org/r/24676/#comment97619

The description of the return value is not quite right. It should be the 
same as forceComplete(): return true iff the operation is completed by the 
caller.



core/src/main/scala/kafka/server/RequestPurgatory.scala
https://reviews.apache.org/r/24676/#comment97620

can be completed = can be completed by the caller


- Jun Rao


On Oct. 17, 2014, 4:15 a.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24676/
 ---
 
 (Updated Oct. 17, 2014, 4:15 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1583
 https://issues.apache.org/jira/browse/KAFKA-1583
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Incorporate Jun's comments after rebase
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/api/FetchRequest.scala 
 59c09155dd25fad7bed07d3d00039e3dc66db95c 
   core/src/main/scala/kafka/api/FetchResponse.scala 
 8d085a1f18f803b3cebae4739ad8f58f95a6c600 
   core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
 861a6cf11dc6b6431fcbbe9de00c74a122f204bd 
   core/src/main/scala/kafka/api/ProducerRequest.scala 
 b2366e7eedcac17f657271d5293ff0bef6f3cbe6 
   core/src/main/scala/kafka/api/ProducerResponse.scala 
 a286272c834b6f40164999ff8b7f8998875f2cfe 
   core/src/main/scala/kafka/cluster/Partition.scala 
 e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 
   core/src/main/scala/kafka/common/ErrorMapping.scala 
 880ab4a004f078e5d84446ea6e4454ecc06c95f2 
   core/src/main/scala/kafka/log/Log.scala 
 157d67369baabd2206a2356b2aa421e848adab17 
   core/src/main/scala/kafka/network/BoundedByteBufferSend.scala 
 a624359fb2059340bb8dc1619c5b5f226e26eb9b 
   core/src/main/scala/kafka/server/DelayedFetch.scala 
 e0f14e25af03e6d4344386dcabc1457ee784d345 
   core/src/main/scala/kafka/server/DelayedProduce.scala 
 9481508fc2d6140b36829840c337e557f3d090da 
   core/src/main/scala/kafka/server/FetchRequestPurgatory.scala 
 ed1318891253556cdf4d908033b704495acd5724 
   core/src/main/scala/kafka/server/KafkaApis.scala 
 85498b4a1368d3506f19c4cfc64934e4d0ac4c90 
   core/src/main/scala/kafka/server/OffsetManager.scala 
 43eb2a35bb54d32c66cdb94772df657b3a104d1a 
   core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala 
 d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f 
   core/src/main/scala/kafka/server/ReplicaManager.scala 
 78b7514cc109547c562e635824684fad581af653 
   core/src/main/scala/kafka/server/RequestPurgatory.scala 
 9d76234bc2c810ec08621dc92bb4061b8e7cd993 
   core/src/main/scala/kafka/utils/DelayedItem.scala 
 d7276494072f14f1cdf7d23f755ac32678c5675c 
   core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
 209a409cb47eb24f83cee79f4e064dbc5f5e9d62 
   core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala 
 fb61d552f2320fedec547400fbbe402a0b2f5d87 
   core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 
 03a424d45215e1e7780567d9559dae4d0ae6fc29 
   core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
 cd302aa51eb8377d88b752d48274e403926439f2 
   core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala 
 a9c4ddc78df0b3695a77a12cf8cf25521a203122 
   core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
 a577f4a8bf420a5bc1e62fad6d507a240a42bcaa 
   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 
 3804a114e97c849cae48308997037786614173fc 
   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 
 
 Diff: https://reviews.apache.org/r/24676/diff/
 
 
 Testing
 ---
 
 Unit tests
 
 
 Thanks,
 
 Guozhang Wang
 




[jira] [Updated] (KAFKA-1583) Kafka API Refactoring

2014-10-17 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1583:
-
Attachment: KAFKA-1583_2014-10-17_09:56:33.patch

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, 
 KAFKA-1583_2014-10-17_09:56:33.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1711) WARN Property topic is not valid when running console producer

2014-10-17 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-1711:
--

 Summary: WARN Property topic is not valid when running console 
producer
 Key: KAFKA-1711
 URL: https://issues.apache.org/jira/browse/KAFKA-1711
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao


bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
[2014-10-17 09:54:23,984] WARN Property topic is not valid 
(kafka.utils.VerifiableProperties)

It would be good if we can get rid of the warning.





[jira] [Commented] (KAFKA-1583) Kafka API Refactoring

2014-10-17 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175239#comment-14175239
 ] 

Guozhang Wang commented on KAFKA-1583:
--

Updated reviewboard https://reviews.apache.org/r/24676/diff/
 against branch origin/trunk

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, 
 KAFKA-1583_2014-10-17_09:56:33.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring





RE: Security JIRAS

2014-10-17 Thread Arvind Mani

I'm looking at Kafka Brokers authentication with ZooKeeper since this
looks independent of other tasks.

[AM] 

1) Is authentication required only between the Kafka broker and ZooKeeper? Can 
we assume world read so that consumers don't have to be authenticated (I believe 
Kafka is planning to change things so that consumers don't have to interact with 
ZK anyway)? In that case I think the Kafka broker can easily create the znode 
with an appropriate ACL list - the broker can be admin.

2)  Zookeeper supports Kerberos authentication. Zookeeper supports SSL 
connections (version 3.4 or later) but I don't see an x509 authentication 
provider. Do we want to support x509 cert based authentication for zk? 

- Arvind



[jira] [Updated] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-10-17 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1493:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

James,

Thanks a lot for the patch. +1 and committed to trunk and 0.8.2.

This patch also removes the lz4 jar as a runtime dependency if lz4 compression 
is not used.

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Assignee: James Oliver
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1493.patch, KAFKA-1493.patch, 
 KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch








Jenkins build is back to normal : Kafka-trunk #305

2014-10-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/305/changes



Re: Security JIRAS

2014-10-17 Thread Todd Palino
For the moment, consumers still need to write under the /consumers tree.
Even if they are committing offsets to Kafka instead of ZK, they will need
to write owner information there when they are balancing. Eventually, you
are correct, this is going away with the new consumer.

-Todd

On Fri, Oct 17, 2014 at 10:09 AM, Arvind Mani am...@linkedin.com.invalid
wrote:


 I'm looking at Kafka Brokers authentication with ZooKeeper since this
 looks independent of other tasks.

 [AM]

 1) Is authentication required only between the Kafka broker and ZooKeeper? Can
 we assume world read so that consumers don't have to be authenticated (I
 believe Kafka is planning to change things so that consumers don't have to
 interact with ZK anyway)? In that case I think the Kafka broker can easily
 create the znode with an appropriate ACL list - the broker can be
 admin.

 2)  Zookeeper supports Kerberos authentication. Zookeeper supports SSL
 connections (version 3.4 or later) but I don't see an x509 authentication
 provider. Do we want to support x509 cert based authentication for zk?

 - Arvind




Re: Security JIRAS

2014-10-17 Thread Gwen Shapira
Yes, I think we can focus on Broker to Zookeeper communication only.
At least for initial stage.

Gwen

On Fri, Oct 17, 2014 at 2:10 PM, Todd Palino tpal...@gmail.com wrote:
 For the moment, consumers still need to write under the /consumers tree.
 Even if they are committing offsets to Kafka instead of ZK, they will need
 to write owner information there when they are balancing. Eventually, you
 are correct, this is going away with the new consumer.

 -Todd

 On Fri, Oct 17, 2014 at 10:09 AM, Arvind Mani am...@linkedin.com.invalid
 wrote:


 I'm looking at Kafka Brokers authentication with ZooKeeper since this
 looks independent of other tasks.

 [AM]

 1) Is authentication required only between the Kafka broker and ZooKeeper? Can
 we assume world read so that consumers don't have to be authenticated (I
 believe Kafka is planning to change things so that consumers don't have to
 interact with ZK anyway)? In that case I think the Kafka broker can easily
 create the znode with an appropriate ACL list - the broker can be
 admin.

 2)  Zookeeper supports Kerberos authentication. Zookeeper supports SSL
 connections (version 3.4 or later) but I don't see an x509 authentication
 provider. Do we want to support x509 cert based authentication for zk?

 - Arvind




Re: Review Request 24676: Rebase KAFKA-1583

2014-10-17 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24676/#review57178
---


Thanks for the patch. +1 after addressing a couple more minor comments below.

Also, do you plan to have a followup jira to rename request to operation 
globally?


core/src/main/scala/kafka/server/DelayedProduce.scala
https://reviews.apache.org/r/24676/#comment97691

set the response



core/src/main/scala/kafka/server/RequestPurgatory.scala
https://reviews.apache.org/r/24676/#comment97709

Reworded the explanation as follows. Does it look ok?
 * 
 * The logic for completing a delayed operation is defined in onComplete() 
and will be called exactly once.
 * Once an operation is completed, isCompleted() will return true. 
onComplete() can be triggered by either 
 * forceComplete(), which forces calling onComplete() after delayMs if the 
operation is not yet completed,
 * or tryComplete(), which first checks whether the operation can be 
completed now, and if so calls
 * forceComplete(). A subclass of DelayedRequest needs to provide an 
implementation of both onComplete() and
 * tryComplete().
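For readers following the thread, the contract described above can be sketched roughly as follows. This is a hypothetical skeleton for illustration only, not the actual Kafka class; the method names mirror the proposed comment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the DelayedRequest completion contract.
abstract class DelayedOperationSketch {
    private final AtomicBoolean completed = new AtomicBoolean(false);

    // Completion logic; called exactly once, on normal completion or timeout.
    abstract void onComplete();

    // Checks whether the operation can be completed now; if so, it must call
    // forceComplete() and return its result.
    abstract boolean tryComplete();

    // Returns true iff THIS caller completed the operation: the first caller
    // to flip the flag wins and runs onComplete() exactly once.
    final boolean forceComplete() {
        if (completed.compareAndSet(false, true)) {
            onComplete();
            return true;
        }
        return false;
    }

    final boolean isCompleted() {
        return completed.get();
    }
}
```

The compare-and-set guarantees the "exactly once" property even when a timeout and a tryComplete() race.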


- Jun Rao


On Oct. 17, 2014, 4:56 p.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24676/
 ---
 
 (Updated Oct. 17, 2014, 4:56 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1583
 https://issues.apache.org/jira/browse/KAFKA-1583
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Incorporate round two of Jun's comments after rebase
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/api/FetchRequest.scala 
 59c09155dd25fad7bed07d3d00039e3dc66db95c 
   core/src/main/scala/kafka/api/FetchResponse.scala 
 8d085a1f18f803b3cebae4739ad8f58f95a6c600 
   core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
 861a6cf11dc6b6431fcbbe9de00c74a122f204bd 
   core/src/main/scala/kafka/api/ProducerRequest.scala 
 b2366e7eedcac17f657271d5293ff0bef6f3cbe6 
   core/src/main/scala/kafka/api/ProducerResponse.scala 
 a286272c834b6f40164999ff8b7f8998875f2cfe 
   core/src/main/scala/kafka/cluster/Partition.scala 
 e88ecf224a4dab8bbd26ba7b0c3ccfe844c6b7f4 
   core/src/main/scala/kafka/common/ErrorMapping.scala 
 880ab4a004f078e5d84446ea6e4454ecc06c95f2 
   core/src/main/scala/kafka/log/Log.scala 
 157d67369baabd2206a2356b2aa421e848adab17 
   core/src/main/scala/kafka/network/BoundedByteBufferSend.scala 
 a624359fb2059340bb8dc1619c5b5f226e26eb9b 
   core/src/main/scala/kafka/server/DelayedFetch.scala 
 e0f14e25af03e6d4344386dcabc1457ee784d345 
   core/src/main/scala/kafka/server/DelayedProduce.scala 
 9481508fc2d6140b36829840c337e557f3d090da 
   core/src/main/scala/kafka/server/FetchRequestPurgatory.scala 
 ed1318891253556cdf4d908033b704495acd5724 
   core/src/main/scala/kafka/server/KafkaApis.scala 
 85498b4a1368d3506f19c4cfc64934e4d0ac4c90 
   core/src/main/scala/kafka/server/OffsetManager.scala 
 43eb2a35bb54d32c66cdb94772df657b3a104d1a 
   core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala 
 d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f 
   core/src/main/scala/kafka/server/ReplicaManager.scala 
 78b7514cc109547c562e635824684fad581af653 
   core/src/main/scala/kafka/server/RequestPurgatory.scala 
 9d76234bc2c810ec08621dc92bb4061b8e7cd993 
   core/src/main/scala/kafka/utils/DelayedItem.scala 
 d7276494072f14f1cdf7d23f755ac32678c5675c 
   core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
 209a409cb47eb24f83cee79f4e064dbc5f5e9d62 
   core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala 
 fb61d552f2320fedec547400fbbe402a0b2f5d87 
   core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 
 03a424d45215e1e7780567d9559dae4d0ae6fc29 
   core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
 cd302aa51eb8377d88b752d48274e403926439f2 
   core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala 
 a9c4ddc78df0b3695a77a12cf8cf25521a203122 
   core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
 a577f4a8bf420a5bc1e62fad6d507a240a42bcaa 
   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 
 3804a114e97c849cae48308997037786614173fc 
   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 
 
 Diff: https://reviews.apache.org/r/24676/diff/
 
 
 Testing
 ---
 
 Unit tests
 
 
 Thanks,
 
 Guozhang Wang
 




[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-17 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1642:
-
Assignee: Ewen Cheslack-Postava  (was: Jun Rao)
  Status: Patch Available  (was: Open)

 [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
 connection is lost
 ---

 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
 Attachments: KAFKA-1642.patch


 I see my CPU spike to 100% when the network connection is lost for a while. 
 It seems the network IO threads are very busy logging the following error 
 message. Is this expected behavior?
 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
 org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
 producer I/O thread: 
 java.lang.IllegalStateException: No entry found for node -2
 at 
 org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
 at 
 org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
 at 
 org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
 at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
 at java.lang.Thread.run(Thread.java:744)
 Thanks,
 Bhavesh





[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-17 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175360#comment-14175360
 ] 

Ewen Cheslack-Postava commented on KAFKA-1642:
--

Created reviewboard https://reviews.apache.org/r/26885/diff/
 against branch origin/trunk

 [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
 connection is lost
 ---

 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Jun Rao
 Attachments: KAFKA-1642.patch


 I see my CPU spike to 100% when the network connection is lost for a while. 
 It seems the network IO threads are very busy logging the following error 
 message. Is this expected behavior?
 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
 org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
 producer I/O thread: 
 java.lang.IllegalStateException: No entry found for node -2
 at 
 org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
 at 
 org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
 at 
 org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
 at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
 at java.lang.Thread.run(Thread.java:744)
 Thanks,
 Bhavesh





Review Request 26885: Patch for KAFKA-1642

2014-10-17 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26885/
---

Review request for kafka.


Bugs: KAFKA-1642
https://issues.apache.org/jira/browse/KAFKA-1642


Repository: kafka


Description
---

Fixes two issues with computation of poll timeouts in Sender/RecordAccumulator. 
First, the timeout
was being computed by RecordAccumulator as it looked up which nodes had data to 
send, but the timeout
cannot be computed until after nodes that aren't ready for sending are filtered 
since this could
result in a node that is currently unreachable always returning a timeout of 0 
and triggering a busy
loop. The fixed version computes per-node timeouts and only computes the final 
timeout after nodes
that aren't ready for sending are removed.

Second, timeouts were only being computed based on the first TopicAndPartition 
encountered for each
node. This could result in incorrect timeouts if the first encountered didn't 
have the minimum
timeout for that node. This now evaluates every TopicAndPartition with a known 
leader and takes the
minimum.
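The gist of the fix can be illustrated with a small sketch. All names here are hypothetical, not the actual RecordAccumulator/Sender code: per-partition timeouts are first reduced to a per-node minimum over every partition with a known leader, and only nodes that are ready to send contribute to the final poll timeout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the corrected poll-timeout computation.
class PollTimeoutSketch {
    /**
     * leaders:           partition -> leader node id
     * partitionTimeouts: partition -> ms until that partition's batch is sendable
     * readyNodes:        nodes currently ready for sending
     */
    static long pollTimeout(Map<String, Integer> leaders,
                            Map<String, Long> partitionTimeouts,
                            Set<Integer> readyNodes) {
        // Reduce per-partition timeouts to a per-node minimum, considering
        // every partition with a known leader, not just the first one seen.
        Map<Integer, Long> perNode = new HashMap<>();
        for (Map.Entry<String, Long> e : partitionTimeouts.entrySet()) {
            Integer node = leaders.get(e.getKey());
            if (node == null) continue; // leader unknown: skip
            perNode.merge(node, e.getValue(), Math::min);
        }
        // Only nodes ready for sending contribute to the final timeout, so an
        // unreachable node cannot force a 0 timeout and trigger a busy loop.
        long timeout = Long.MAX_VALUE;
        for (Map.Entry<Integer, Long> e : perNode.entrySet())
            if (readyNodes.contains(e.getKey()))
                timeout = Math.min(timeout, e.getValue());
        return timeout;
    }
}
```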


Diffs
-

  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 c5d470011d334318d5ee801021aadd0c000974a6 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
8ebe7ed82c9384b71ce0cc3ddbef2c2325363ab9 
  
clients/src/test/java/org/apache/kafka/clients/producer/RecordAccumulatorTest.java
 0762b35abba0551f23047348c5893bb8c9acff14 

Diff: https://reviews.apache.org/r/26885/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-10-17 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1642:
-
Attachment: KAFKA-1642.patch

 [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
 connection is lost
 ---

 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Jun Rao
 Attachments: KAFKA-1642.patch


 I see my CPU spike to 100% when the network connection is lost for a while. 
 It seems the network IO threads are very busy logging the following error 
 message. Is this expected behavior?
 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
 org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
 producer I/O thread: 
 java.lang.IllegalStateException: No entry found for node -2
 at 
 org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
 at 
 org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
 at 
 org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
 at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
 at java.lang.Thread.run(Thread.java:744)
 Thanks,
 Bhavesh





[jira] [Commented] (KAFKA-1485) Upgrade to Zookeeper 3.4.6 and create shim for ZKCLI so system tests can run

2014-10-17 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175365#comment-14175365
 ] 

Jun Rao commented on KAFKA-1485:


Gwen, Joe,

It seems that ZK 3.4.6 added a runtime dependency on netty-3.7.0.Final.jar. 
However, it doesn't seem that it's really needed to start the ZK server and ZK 
client. Could you confirm that the runtime dependency is unnecessary? If so, we 
should exclude it in our build file. Thanks,

 Upgrade to Zookeeper 3.4.6 and create shim for ZKCLI so system tests can run
 --

 Key: KAFKA-1485
 URL: https://issues.apache.org/jira/browse/KAFKA-1485
 Project: Kafka
  Issue Type: Wish
Affects Versions: 0.8.1.1
Reporter: Machiel Groeneveld
Assignee: Gwen Shapira
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1485.2.patch, KAFKA-1485.3.patch, 
 KAFKA-1485.4.patch, KAFKA-1485.patch


 I can't run projects alongside Kafka that use zookeeper 3.4 jars. 3.4 has 
 been out for 2.5 years and seems to be ready for adoption.
 In particular Apache Storm will upgrade to Zookeeper 3.4.x in their next 
 0.9.2 release. I can't run both versions in my tests at the same time. 
 The only compile problem I saw was in EmbeddedZookeeper.scala 





[jira] [Created] (KAFKA-1712) Excessive storage usage on newly added node

2014-10-17 Thread Oleg Golovin (JIRA)
Oleg Golovin created KAFKA-1712:
---

 Summary: Excessive storage usage on newly added node
 Key: KAFKA-1712
 URL: https://issues.apache.org/jira/browse/KAFKA-1712
 Project: Kafka
  Issue Type: Bug
Reporter: Oleg Golovin


When a new node is added to the cluster, data starts replicating into it. The 
mtime of the newly created segments will be set to the time of the last message 
written to them. Though replication is a prolonged process, let's assume (for 
simplicity of explanation) that their mtime is very close to the time when the 
new node was added.

After the replication is done, new data will start to flow into this new node. 
After `log.retention.hours` the amount of data will be 2 * 
daily_amount_of_data_in_kafka_node: the first half is the data replicated from 
the other nodes when the node was added (at a time we'll call `t1`), and the 
second half is the data replicated from the other nodes during `t1 + 
log.retention.hours`. So by that time the node will have twice as much data as 
the other nodes.

This poses a big problem for us, as our storage is sized to fit the normal 
amount of data (not twice that amount).

In our particular case it poses another problem. We have an emergency segment 
cleaner which runs in case storage is nearly full (90%). We try to balance the 
amount of data so that the cleaner does not need to run and we can rely solely 
on Kafka's internal log deletion, but sometimes the emergency cleaner does run.
It works this way:
- it gets all kafka segments for the volume
- it filters out last segments of each partition (just to avoid unnecessary 
recreation of last small-size segments)
- it sorts them by segment mtime
- it changes the mtime of the first N segments (with the lowest mtime) to 1, so 
they become really, really old. The number N is chosen to free a specified 
percentage of the volume (3% in our case). Kafka deletes these segments later 
(as they are very old).
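Sketched in code (hypothetical names, just to illustrate the selection logic described above; the real cleaner operates on files and mtimes directly):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One on-disk segment, as seen by the hypothetical emergency cleaner.
class SegmentRecord {
    final String partition;
    final long mtimeMs;
    final boolean isLastOfPartition;
    SegmentRecord(String partition, long mtimeMs, boolean isLastOfPartition) {
        this.partition = partition;
        this.mtimeMs = mtimeMs;
        this.isLastOfPartition = isLastOfPartition;
    }
}

class EmergencyCleanerSketch {
    // Drop the last (active) segment of each partition, sort the rest by
    // mtime, and pick the N oldest. The caller would then set their mtime
    // to 1 so Kafka's retention deletes them as "very old".
    static List<SegmentRecord> selectOldest(List<SegmentRecord> segments, int n) {
        List<SegmentRecord> candidates = new ArrayList<>();
        for (SegmentRecord s : segments)
            if (!s.isLastOfPartition)   // never touch the active segment
                candidates.add(s);
        candidates.sort(Comparator.comparingLong(s -> s.mtimeMs));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }
}
```

As the report explains, this heuristic breaks on a freshly replicated node, where low mtime no longer implies old data.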

The emergency cleaner works very well, except when data has been replicated to 
a newly added node. In that case the segment mtime is the time the segment was 
replicated and does not reflect the real creation time of the original data 
stored in the segment. So the emergency cleaner will delete the segments with 
the lowest mtime, which may hold data that is much more recent than the data in 
other segments.
This is not a big problem unless we delete data which hasn't been fully 
consumed yet. In that case we lose data, and that makes it a big problem.

Is it possible to retain segment mtime during the initial replication onto a 
new node? This would help avoid loading the new node with twice as much data as 
the other nodes have.

Or maybe there are other ways to sort segments by data creation time (or 
something close to it)? For example, if this ticket is implemented 
https://issues.apache.org/jira/browse/KAFKA-1403, we could take the time of the 
first message from the .index file. In our case this would let the emergency 
cleaner delete what is really the oldest data.





[jira] [Updated] (KAFKA-1671) uploaded archives are missing for Scala version 2.11

2014-10-17 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1671:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. +1. Committed to trunk and 0.8.2.

 uploaded archives are missing for Scala version 2.11
 

 Key: KAFKA-1671
 URL: https://issues.apache.org/jira/browse/KAFKA-1671
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Ivan Lyutov
Priority: Blocker
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1671.patch


 https://repository.apache.org/content/groups/staging/org/apache/kafka/





[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175430#comment-14175430
 ] 

Ewen Cheslack-Postava commented on KAFKA-1710:
--

bq. The deadlock will occur sometimes depending on thread scheduling and how 
long they are blocked.

Deadlock has a specific definition -- two or more threads each waiting on the 
other such that neither can make any forward progress -- and as far as I can 
tell this isn't triggering a deadlock. From what I've seen this is simply an 
issue of anywhere from 50-200 threads trying to access a shared, synchronized 
resource. This is just contention; everything continues to make progress, and 
the test program runs to completion just fine.

As for performance, I have no doubt there are improvements to be made in the 
Producer implementation, but you'll get a far bigger performance boost with 
careful design in your system. I already mentioned multiple ways you can 
improve performance that, based on your current test code, shouldn't affect 
anything else. Here's a quick example (using a lightly modified version of your 
code against a local test cluster):

{quote}
Existing setup (4 producers, 1 partition):
All Producers done...!
All done...!

real    1m50.135s
user    1m45.019s
sys     1m53.219s
{quote}

{quote}
8 Producers, 1 partition (and parameters adjusted to generate same # of msgs):
All Producers done...!
All done...!

real    0m55.465s
user    1m27.132s
sys     1m1.144s
{quote}

Nothing surprising, but since you haven't specified a constraint on the # of 
producers this seems like the simplest solution to improve performance.

 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump


 Hi Kafka Dev Team,
 When I run the test sending messages to a single partition for 3 minutes or 
 so, I encounter a deadlock (please see the attached screenshot) and thread 
 contention in YourKit profiling.  
 Use Case:
 1)  Aggregating messages into same partition for metric counting. 
 2)  Replicate Old Producer behavior for sticking to partition for 3 minutes.
 Here is output:
 Frozen threads found (potential deadlock)
  
 It seems that the following threads have not changed their stack for more 
 than 10 seconds.
 These threads are possibly (but not necessarily!) in a deadlock or hung.
  
 pool-1-thread-128 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run() 
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744
 pool-1-thread-159 --- Frozen for at least 2m 1 sec
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run() 
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744
 pool-1-thread-55 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 

Build failed in Jenkins: Kafka-trunk #306

2014-10-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/306/changes

Changes:

[junrao] kafka-1671; uploaded archives are missing for Scala version 2.11; 
patched by Ivan Lyutov; reviewed by Jun Rao

--
[...truncated 1011 lines...]
kafka.producer.ProducerTest > testSendWithDeadBroker PASSED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout PASSED

kafka.producer.ProducerTest > testSendNullMessage PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.log.CleanerTest > testCleanSegments PASSED

kafka.log.CleanerTest > testCleaningWithDeletes PASSED

kafka.log.CleanerTest > testCleanSegmentsWithAbort PASSED

kafka.log.CleanerTest > testSegmentGrouping PASSED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory PASSED

kafka.log.OffsetIndexTest > truncate PASSED

kafka.log.OffsetIndexTest > randomLookupTest PASSED

kafka.log.OffsetIndexTest > lookupExtremeCases PASSED

kafka.log.OffsetIndexTest > appendTooMany PASSED

kafka.log.OffsetIndexTest > appendOutOfOrder PASSED

kafka.log.OffsetIndexTest > testReopen PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogSegmentTest > testTruncate PASSED

kafka.log.LogSegmentTest > testReadOnEmptySegment PASSED

kafka.log.LogSegmentTest > testReadBeforeFirstOffset PASSED

kafka.log.LogSegmentTest > testMaxOffset PASSED

kafka.log.LogSegmentTest > testReadAfterLast PASSED

kafka.log.LogSegmentTest > testReadFromGap PASSED

kafka.log.LogSegmentTest > testTruncateFull PASSED

kafka.log.LogSegmentTest > testNextOffsetCalculation PASSED

kafka.log.LogSegmentTest > testChangeFileSuffixes 

[jira] [Commented] (KAFKA-1476) Get a list of consumer groups

2014-10-17 Thread BalajiSeshadri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175500#comment-14175500
 ] 

BalajiSeshadri commented on KAFKA-1476:
---

[~nehanarkhede] I tried to implement the describe group, but the call is 
getting stuck in the while loop of readFrom.

Please see log below.
java.io.EOFException: Received -1 when reading from channel, socket has likely 
been closed.
at kafka.utils.Utils$.read(Utils.scala:381)
at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at 
kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108)
at 
kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154)
at 
kafka.tools.ConsumerCommand$.getOffsetsByTopic(ConsumerCommand.scala:92)
at kafka.tools.ConsumerCommand$.main(ConsumerCommand.scala:67)
at kafka.tools.ConsumerCommand.main(ConsumerCommand.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)

It's just continuously going through the while loop of 
kafka.client.ClientUtils$.channelToOffsetManager(ClientUtils.scala:154).
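A bounded retry, rather than the endless loop described above, would fail fast when the broker keeps closing the socket. This is a generic sketch under our own names, not Kafka's actual channelToOffsetManager code:

```java
import java.util.concurrent.Callable;

public class BoundedRetry {
    // Retry `op` up to `maxAttempts` times with linear backoff, then rethrow
    // the last failure instead of spinning forever. `maxAttempts` must be >= 1.
    public static <T> T retry(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();                  // e.g. open channel + read response
            } catch (Exception e) {                // e.g. EOFException: socket closed
                last = e;
                Thread.sleep(backoffMs * attempt); // back off before the next attempt
            }
        }
        throw last;                                // give up: surface the real error
    }
}
```

With this shape the tool would report the EOFException after a few attempts instead of looping in readFrom indefinitely.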

 Get a list of consumer groups
 -

 Key: KAFKA-1476
 URL: https://issues.apache.org/jira/browse/KAFKA-1476
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Williams
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1476-LIST-GROUPS.patch, KAFKA-1476-RENAME.patch, 
 KAFKA-1476.patch


 It would be useful to have a way to get a list of consumer groups currently 
 active via some tool/script that ships with kafka. This would be helpful so 
 that the system tools can be explored more easily.
 For example, when running the ConsumerOffsetChecker, it requires a group 
 option
 bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
 ?
 

[jira] [Updated] (KAFKA-1476) Get a list of consumer groups

2014-10-17 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1476:
--
Attachment: ConsumerCommand.scala

 Get a list of consumer groups
 -

 Key: KAFKA-1476
 URL: https://issues.apache.org/jira/browse/KAFKA-1476
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Williams
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.9.0

 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, 
 KAFKA-1476-RENAME.patch, KAFKA-1476.patch


 It would be useful to have a way to get a list of consumer groups currently 
 active via some tool/script that ships with kafka. This would be helpful so 
 that the system tools can be explored more easily.
 For example, when running the ConsumerOffsetChecker, it requires a group 
 option
 bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
 ?
 But, when just getting started with kafka, using the console producer and 
 consumer, it is not clear what value to use for the group option.  If a list 
 of consumer groups could be listed, then it would be clear what value to use.
 Background:
 http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-1710.
--
Resolution: Invalid
  Assignee: Ewen Cheslack-Postava

 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump







[jira] [Commented] (KAFKA-1476) Get a list of consumer groups

2014-10-17 Thread BalajiSeshadri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175503#comment-14175503
 ] 

BalajiSeshadri commented on KAFKA-1476:
---

[~nehanarkhede] Please find the attached code.

 Get a list of consumer groups
 -

 Key: KAFKA-1476
 URL: https://issues.apache.org/jira/browse/KAFKA-1476
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Ryan Williams
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.9.0

 Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, 
 KAFKA-1476-RENAME.patch, KAFKA-1476.patch







[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175548#comment-14175548
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~ewencp],

Thank you for entertaining this issue; you may close it.  I do agree that if I 
increase the number of producers, throughput will improve (less thread 
contention on the critical block) at the expense of TCP connections, memory, 
etc.

Do you think it would be good to open another JIRA issue or story for 
improving performance when sending to a single partition for some time, to 
avoid thread contention?  Please let me know if I should open an issue for the 
performance aspect of the new producer.

Thanks,
Bhavesh

 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump







[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175551#comment-14175551
 ] 

Jay Kreps commented on KAFKA-1710:
--

[~Bmis13] What is the performance you see? What do you hope to see?

 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump







[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175548#comment-14175548
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:19 PM:
-

[~ewencp],

Thank you for entertaining this issue; you may close it.  I do agree that if I 
increase the number of producers, throughput will improve (less thread 
contention on the critical block) at the expense of TCP connections, memory, 
etc.

Do you think it would be good to open another JIRA issue or story for 
improving performance when sending to a single partition for some time, to 
avoid thread contention?  Please let me know if I should open an issue for the 
performance aspect of the new producer.  My only request is to make the new 
producer truly async, enqueuing the message regardless of its key or partition 
number.

Thanks,
Bhavesh



 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump






blog with some out of the box pains

2014-10-17 Thread Jay Kreps
This guy documented a few struggles getting going with Kafka. Not sure if
there is anything we can do to make it better?
http://ispyker.blogspot.com/2014/10/kafka-part-1.html

1. It would be great to figure out the apache/gradle thing.
2. The problem of Kafka advertising localhost on AWS is really common.
One possible solution would be to enumerate all the network interfaces and
prefer non-localhost interfaces if they exist.
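The idea in point 2 might look like the following sketch; the method name and fallback policy here are assumptions, not existing Kafka code:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class PreferPublic {
    // Walk all interfaces and return the first non-loopback IPv4 address,
    // falling back to localhost only when no external interface is up.
    public static InetAddress advertisedAddress() throws Exception {
        InetAddress fallback = InetAddress.getLoopbackAddress();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!nic.isUp() || nic.isLoopback()) continue;
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (!addr.isLoopbackAddress() && addr instanceof java.net.Inet4Address)
                    return addr;                   // first non-loopback IPv4 wins
            }
        }
        return fallback;                           // no external interface found
    }
}
```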

-Jay


[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175574#comment-14175574
 ] 

Bhavesh Mistry commented on KAFKA-1710:
---

[~jkreps],

My only request is to make the new producer truly async: enqueue the message 
regardless of the message key's hashcode or the message's partition number. The 
new producer is far better than the old Scala producer (I have worked with both 
the new and the old producers/consumers and the entire LinkedIn pipeline), but 
the new producer inherits the same problem the old producer had: thread 
contention when queuing messages into the buffer. I think the Kafka dev team 
can do better here, because this use case of aggregating events into a single 
partition is widely used. My plan is to replace our stream-processing framework 
with Kafka if possible (for aggregation, counting metrics, etc.). We currently 
use the following stream processor, but it has many shortcomings and only 
distributes the load, which Kafka brokers already provide. Anyway, this is our 
use case.

https://github.com/walmartlabs/mupd8
http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf 

Thanks,
Bhavesh


 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump


 Hi Kafka Dev Team,
 When I run the test sending messages to a single partition for 3 minutes or 
 so, I encounter a deadlock (please see the attached screenshots) and thread 
 contention in YourKit profiling.
 Use Case:
 1)  Aggregating messages into the same partition for metric counting.
 2)  Replicating the old producer's behavior of sticking to a partition for 3 minutes.
 Here is output:
 Frozen threads found (potential deadlock)
  
 It seems that the following threads have not changed their stack for more 
 than 10 seconds.
 These threads are possibly (but not necessarily!) in a deadlock or hung.
  
 pool-1-thread-128 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run() 
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744
 pool-1-thread-159 --- Frozen for at least 2m 1 sec
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run() 
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744
 pool-1-thread-55 --- Frozen for at least 2m
 org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
 org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
 Callback) KafkaProducer.java:237
 org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
 TestNetworkDownProducer.java:84
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
 ThreadPoolExecutor.java:1145
 java.util.concurrent.ThreadPoolExecutor$Worker.run() 
 ThreadPoolExecutor.java:615
 java.lang.Thread.run() Thread.java:744
 Thanks,
 Bhavesh 





[jira] [Comment Edited] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Bhavesh Mistry (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175574#comment-14175574
 ] 

Bhavesh Mistry edited comment on KAFKA-1710 at 10/17/14 9:32 PM:
-

[~jkreps],

My only request is to make the new producer truly async: enqueue the message 
regardless of the message key's hashcode or the message's partition number. The 
new producer is far better than the old Scala producer (I have worked with both 
the new and the old producers/consumers and the entire LinkedIn pipeline), but 
the new producer inherits the same problem the old producer had: thread 
contention when queuing messages into the buffer. I think the Kafka dev team 
can do better here, because this use case of aggregating events into a single 
partition is widely used. My plan is to replace our stream-processing framework 
with Kafka if possible (for aggregation, counting metrics, etc.). We currently 
use the following stream processor, but it has many shortcomings and only 
distributes the load, which Kafka brokers already provide. Anyway, this is our 
use case.

https://github.com/walmartlabs/mupd8
http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf 

Thanks,
Bhavesh






[jira] [Commented] (KAFKA-1710) [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition

2014-10-17 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175586#comment-14175586
 ] 

Jay Kreps commented on KAFKA-1710:
--

Well, of course you can't have multiple threads appending to a shared 
in-memory data structure without some synchronization. That lock should be very 
cheap, though. What is meant by asynchronous is not that send() never blocks, 
but that it doesn't block on the network request (after all, due to GC, context 
switches, etc., your program always pauses occasionally). It sounds like you 
were seeing some kind of performance problem. What throughput (say, msgs/sec) 
were you seeing, and what were you hoping for?
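Jay's distinction — a brief in-memory lock on enqueue versus blocking on the network — can be sketched with a toy producer. Everything below (the class, the queue-based accumulator, the "ack" format) is illustrative only and is not Kafka's actual implementation:

```java
import java.util.concurrent.*;

// Toy model of an "async" send: enqueuing only takes the brief internal
// lock of the queue, while the slow network work runs on a background
// sender thread. Not Kafka's real code.
class AsyncSendSketch {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private final ExecutorService sender = Executors.newSingleThreadExecutor();

    // Returns immediately: the caller never waits on network I/O here.
    Future<String> send(String record) {
        CompletableFuture<String> ack = new CompletableFuture<>();
        buffer.add(record);                      // brief lock, no I/O
        sender.submit(() -> {                    // simulated network send
            String r = buffer.poll();
            ack.complete("acked:" + r);
        });
        return ack;
    }

    void close() { sender.shutdown(); }

    public static void main(String[] args) throws Exception {
        AsyncSendSketch p = new AsyncSendSketch();
        Future<String> ack = p.send("m1");       // does not block on the send
        System.out.println(ack.get());           // prints acked:m1
        p.close();
    }
}
```

The same shape explains the thread dumps above: many caller threads can still contend briefly on the buffer's lock even though no thread ever waits for the network.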

 [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
 being sent to single partition
 

 Key: KAFKA-1710
 URL: https://issues.apache.org/jira/browse/KAFKA-1710
 Project: Kafka
  Issue Type: Bug
  Components: producer 
 Environment: Development
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Critical
  Labels: performance
 Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
 TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
 th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
 th6.dump, th7.dump, th8.dump, th9.dump







[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-10-17 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1108:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. Pushed to trunk and 0.8.2

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1108.patch, KAFKA-1108_2014-10-16_13:53:11.patch


 In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and 
 then if there's a failure, it will retry the controlledShutdown.
 Looking at the code, there are 2 ways a retry could fail, one with an error 
 response from the controller, and this messaging code:
 {code}
 info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
 info("Error code from controller: %d".format(shutdownResponse.errorCode))
 {code}
 Alternatively, there could be an IOException, with this code executed:
 {code}
 catch {
 case ioe: java.io.IOException =>
 channel.disconnect()
 channel = null
 // ignore and try again
 }
 {code}
 And then finally, in either case:
 {code}
   if (!shutdownSuceeded) {
 Thread.sleep(config.controlledShutdownRetryBackoffMs)
 warn("Retrying controlled shutdown after the previous attempt failed...")
   }
 {code}
 It would be nice if the nature of the IOException were logged in either case 
 (I'd be happy with an ioe.getMessage() instead of a full stack trace, as 
 kafka in general tends to be too willing to dump IOException stack traces!).
 I suspect, in my case, the actual IOException is a socket timeout, since the 
 time between the initial "Starting controlled shutdown" and the first 
 "Retrying..." message is usually about 35 seconds (the socket timeout plus the 
 controlled shutdown retry backoff).  So the real issue in this case is that 
 controlled shutdown is taking too long.  It would seem more sensible to have 
 the controller report back to the server (before the socket timeout) that more 
 time is needed, etc.



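The logging improvement Jason asks for — surfacing ioe.getMessage() before the retry — could look roughly like the sketch below. It is plain Java (the actual fix is in Scala), and attempt(), the simulated timeout, and the retry count are all invented for illustration:

```java
import java.io.IOException;

// Sketch of a controlled-shutdown retry loop that logs WHY each attempt
// failed (ioe.getMessage()) instead of silently swallowing the IOException.
class ShutdownRetry {
    static String lastWarning = "";

    static void warn(String msg) {
        lastWarning = msg;
        System.out.println("WARN " + msg);
    }

    // Simulates one controlled-shutdown round trip: the first attempt
    // times out, the retry succeeds.
    static boolean attempt(int round) throws IOException {
        if (round == 0) throw new IOException("socket timeout");
        return true;
    }

    static boolean shutdownWithRetries(int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            try {
                if (attempt(i)) return true;
            } catch (IOException ioe) {
                // The requested change: include the exception message,
                // not a full stack trace, in the retry warning.
                warn("Controlled shutdown attempt failed: " + ioe.getMessage()
                        + "; retrying...");
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shutdownWithRetries(3));  // true, after one logged failure
    }
}
```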


[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-10-17 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1108:
-
Fix Version/s: (was: 0.9.0)
   0.8.2

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
Assignee: Ewen Cheslack-Postava
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1108.patch, KAFKA-1108_2014-10-16_13:53:11.patch







Re: blog with some out of the box pains

2014-10-17 Thread Gwen Shapira
In #2, do you refer to advertising the internal hostname instead of
the external one?
In this case, will it be enough to use getCanonicalHostName (which
uses a name service)?

Note that I think the problem the blog reported (wrong name
advertised) is somewhat orthogonal to the question of which interface
we bind to (which should probably be the default interface).

Gwen



Jenkins build is back to normal : Kafka-trunk #307

2014-10-17 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/307/changes



[jira] [Created] (KAFKA-1713) Partition files not created

2014-10-17 Thread Pradeep (JIRA)
Pradeep created KAFKA-1713:
--

 Summary: Partition files not created
 Key: KAFKA-1713
 URL: https://issues.apache.org/jira/browse/KAFKA-1713
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
 Environment: Linux, 2GB RAM, 2 Core CPU
Reporter: Pradeep


We are using the Kafka topic APIs to create the topic. In some cases the topic 
gets created, but we don't see the partition-specific files, and when a 
producer/consumer tries to fetch the topic metadata, it fails with an exception:

k.p.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for 
topic tloader1 - No partition metadata for topic tloader1 due to 
kafka.common.UnknownTopicOrPartitionException}] for topic [tloader1]: class 
kafka.common.UnknownTopicOrPartitionException





[jira] [Resolved] (KAFKA-1713) Partition files not created

2014-10-17 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede resolved KAFKA-1713.
--
Resolution: Won't Fix

Please can you redirect such questions to the mailing list?
Regarding your problem, did you send any data from the producer?

 Partition files not created
 ---

 Key: KAFKA-1713
 URL: https://issues.apache.org/jira/browse/KAFKA-1713
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
 Environment: Linux, 2GB RAM, 2 Core CPU
Reporter: Pradeep

 We are using Kafka Topic APIs to create the topic. But in some cases, the 
 topic gets created but we don't see the partition specific files and when 
 producer/consumer tries to get the topic metadata and it fails with exception.
 k.p.BrokerPartitionInfo - Error while fetching metadata [{TopicMetadata for 
 topic tloader1 - No partition metadata for topic tloader1 due to 
 kafka.common.UnknownTopicOrPartitionException}] for topic [tloader1]: class 
 kafka.common.UnknownTopicOrPartitionException





Re: blog with some out of the box pains

2014-10-17 Thread Jay Kreps
Hmm, yes, I don't think I actually understand the issue. As I understand it, we
do InetAddress.getLocalHost.getHostAddress, which on AWS picks the wrong
hostname/IP, and then the producer can't connect. People eventually find this
FAQ, but I was hoping for something more automatic, since everyone is on AWS
these days. Maybe getCanonicalHostName would fix it?

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers
?

-Jay




Kafka Command Line Shell

2014-10-17 Thread Joe Stein
Hi, I have been thinking about the ease of use for operations with Kafka.
We have lots of tools doing a lot of different things and they are all kind
of in different places.

So, what I was thinking is to have a single interface for our tooling
https://issues.apache.org/jira/browse/KAFKA-1694

This would manifest itself in two ways: 1) a command-line interface, and 2) a REPL.

We would have one entry point centrally for all Kafka commands.
kafka CMD ARGS
kafka createTopic --brokerList etc,
kafka reassignPartition --brokerList etc,

or execute and run the shell

kafka --brokerList localhost
kafka> use topicName;
kafka> set acl='label';

I was thinking that all calls would be initialized through --brokerList and
the broker can tell the KafkaCommandTool what server to connect to for
MetaData.

Thoughts? Tomatoes?

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/
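The proposed single entry point (kafka CMD ARGS) amounts to a thin subcommand dispatcher. The Java sketch below shows the shape only; the command names and handler bodies are hypothetical, not an agreed design:

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of a single "kafka" entry point that routes
// subcommands (createTopic, reassignPartition, ...) to handlers.
class KafkaCli {
    static final Map<String, Function<String[], String>> COMMANDS = new HashMap<>();
    static {
        COMMANDS.put("createTopic",
                a -> "creating topic via " + String.join(" ", a));
        COMMANDS.put("reassignPartition",
                a -> "reassigning via " + String.join(" ", a));
    }

    // Dispatch argv[0] as the subcommand; everything else goes to the handler.
    static String run(String[] argv) {
        if (argv.length == 0 || !COMMANDS.containsKey(argv[0]))
            return "usage: kafka CMD ARGS";
        return COMMANDS.get(argv[0]).apply(Arrays.copyOfRange(argv, 1, argv.length));
    }

    public static void main(String[] args) {
        System.out.println(run(new String[]{"createTopic", "--brokerList", "localhost"}));
    }
}
```

A REPL would reuse the same table, looping over lines read from stdin instead of argv.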


Re: Kafka Command Line Shell

2014-10-17 Thread Steve Morin
Joe I think this is great!

On Fri, Oct 17, 2014 at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote:




Re: blog with some out of the box pains

2014-10-17 Thread Gwen Shapira
Basically, the issue (or at least one of very many possible network
issues...) is that the server has localhost hardcoded as its
canonical name in /etc/hosts:

[root@Billc-cent70x64 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4
localhost4.localdomain4 Billc-cent70x64
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Unfortunately a very common default for RedHat and Centos machines.

As the blog mentions, a good solution (other than instructing Kafka on
the right name to advertise) is to add the correct IP and hostname to
/etc/hosts. We may want to add this option to the FAQ.
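Whether a given machine would advertise "localhost" can be checked with a short Java probe that mirrors the broker's lookup. The output depends entirely on the host's /etc/hosts and DNS, so none is shown; on a box with the /etc/hosts above, getCanonicalHostName() would resolve to "localhost":

```java
import java.net.InetAddress;

// Probe what InetAddress reports on this machine: getHostAddress() is the
// value Jay mentions, getCanonicalHostName() is what the broker registers.
class HostProbe {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostAddress   = " + local.getHostAddress());
        System.out.println("canonicalName = " + local.getCanonicalHostName());
    }
}
```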

Gwen




On Fri, Oct 17, 2014 at 7:56 PM, Gwen Shapira gshap...@cloudera.com wrote:
 It looks like we are using canonical hostname:

  def register() {
 val advertisedHostName =
   if(advertisedHost == null || advertisedHost.trim.isEmpty)
 InetAddress.getLocalHost.getCanonicalHostName
   else
 advertisedHost
 val jmxPort =
 System.getProperty("com.sun.management.jmxremote.port", "-1").toInt
 ZkUtils.registerBrokerInZk(zkClient, brokerId, advertisedHostName,
 advertisedPort, zkSessionTimeoutMs, jmxPort)
   }

 So never mind :)




Re: Kafka Command Line Shell

2014-10-17 Thread Todd Palino
We've been talking about this a little internally as well. What about the idea 
of presenting all the admin functions through a web API interface (restful or 
not) complete with authentication? That would make it much easier for creating 
structure around Kafka without having to layer commands on top of each other.

I'm not a big fan of language-specific interfaces, because they tend to 
complicate integrating with larger systems. Consider something like AWS or 
Azure, where it would be much easier if there were an API interface like that.

-Todd

 On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote:
 


[jira] [Commented] (KAFKA-1583) Kafka API Refactoring

2014-10-17 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175763#comment-14175763
 ] 

Joel Koshy commented on KAFKA-1583:
---

Reviewing is taking a bit longer than expected - I'm about half-way through, so 
hopefully should be done on Monday.

 Kafka API Refactoring
 -

 Key: KAFKA-1583
 URL: https://issues.apache.org/jira/browse/KAFKA-1583
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1583.patch, KAFKA-1583_2014-08-20_13:54:38.patch, 
 KAFKA-1583_2014-08-21_11:30:34.patch, KAFKA-1583_2014-08-27_09:44:50.patch, 
 KAFKA-1583_2014-09-01_18:07:42.patch, KAFKA-1583_2014-09-02_13:37:47.patch, 
 KAFKA-1583_2014-09-05_14:08:36.patch, KAFKA-1583_2014-09-05_14:55:38.patch, 
 KAFKA-1583_2014-10-13_19:41:58.patch, KAFKA-1583_2014-10-16_21:15:40.patch, 
 KAFKA-1583_2014-10-17_09:56:33.patch


 This is the next step of KAFKA-1430. Details can be found at this page:
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+API+Refactoring





Re: Kafka Command Line Shell

2014-10-17 Thread Joel Koshy
+1 
It would definitely be useful to have a CLI. We had a cursory
discussion on this in the past [1] but it would be useful to have a
full proposal describing everything the CLI should provide.

[1] http://grokbase.com/t/kafka/dev/1435tr3pfc/command-line-tools 

On Fri, Oct 17, 2014 at 05:12:16PM -0700, Todd Palino wrote:
 We've been talking about this a little internally as well. What about the 
 idea of presenting all the admin functions through a web API interface 
 (restful or not) complete with authentication? That would make it much easier 
 for creating structure around Kafka without having to layer commands on top 
 of each other.
 
 I'm not a big fan of the language specific interfaces, because they tend to 
 complicate trying to integrate with larger systems. Consider something like 
 AWS or Azure, where it would be much easier if there is an API interface like 
 that.
 
 -Todd
 
  On Oct 17, 2014, at 5:03 PM, Joe Stein joe.st...@stealth.ly wrote:
  
  Hi, I have been thinking about the ease of use for operations with Kafka.
  We have lots of tools doing a lot of different things and they are all kind
  of in different places.
  
  So, what I was thinking is to have a single interface for our tooling
  https://issues.apache.org/jira/browse/KAFKA-1694
  
  This would manifest itself in two ways: 1) a command line interface, and 2) a repl
  
  We would have one entry point centrally for all Kafka commands.
  kafka CMD ARGS
  kafka createTopic --brokerList etc,
  kafka reassignPartition --brokerList etc,
  
  or execute and run the shell
  
  kafka --brokerList localhost
   kafka> use topicName;
   kafka> set acl='label';
  
  I was thinking that all calls would be initialized through --brokerList and
  the broker can tell the KafkaCommandTool what server to connect to for
  MetaData.
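
  The dispatch described above could look something like the following sketch.
  Everything here is illustrative: the kafka_cli wrapper, the echoed
  kafka-run-class.sh invocations, and the flag mapping are assumptions, not an
  actual Kafka tool.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed single "kafka" entry point.
# Subcommand names mirror the examples above; the mapping onto the
# existing kafka.admin.* tool classes is an assumption. Commands are
# echoed rather than executed, so this stays a dry-run sketch.
kafka_cli() {
  cmd="$1"; shift
  case "$cmd" in
    createTopic)
      echo "kafka-run-class.sh kafka.admin.TopicCommand --create $*" ;;
    reassignPartition)
      echo "kafka-run-class.sh kafka.admin.ReassignPartitionsCommand $*" ;;
    *)
      echo "unknown command: $cmd" >&2; return 1 ;;
  esac
}

# Example invocation, matching the usage sketched above:
kafka_cli createTopic --brokerList localhost:9092
```

  A repl would just be a read loop feeding lines into the same dispatch
  function.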
  
  Thoughts? Tomatoes?
  
  /***
  Joe Stein
  Founder, Principal Consultant
  Big Data Open Source Security LLC
  http://www.stealth.ly
  Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
  /



Re: Kafka Command Line Shell

2014-10-17 Thread Todd Palino
Absolutely. My suggestion of an HTTP interface was in addition to a CLI. I 
think the CLI should call the HTTP interface underneath to keep it simple.

-Todd




Re: Kafka Command Line Shell

2014-10-17 Thread gshapira
+1 to an HTTP interface and CLI.

An HTTP layer will make it easier to integrate with GUIs like Hue.

Gwen

—
Sent from Mailbox


Re: blog with some out of the box pains

2014-10-17 Thread Ewen Cheslack-Postava
The first issue he runs into is one I also find frustrating -- with
cloud providers pushing SSDs, you have to use a pretty large instance
type to get a reasonable test setup. I'm not sure if he couldn't launch
an older type like m1.large (I think some newer AWS accounts aren't able
to) or if he just didn't see it as an option since they are hidden by
default. Even the largest general purpose instance types are pretty
wimpy with respect to storage: only 80GB of local instance storage.

The hostname issues are a well known pain point and unfortunately there
aren't any great solutions that aren't EC2-specific. Here's a quick run
down:

* None of the images for popular distros on EC2 will auto-set the
hostname beyond what EC2 already sets up (which isn't publicly
routable). The following details might explain why they can't. For
example, a recent Ubuntu image gives:

  ubuntu@ip-172-30-2-76:~$ hostname
  ip-172-30-2-76
  
  ubuntu@ip-172-30-2-76:~$ cat /etc/hosts
  127.0.0.1 localhost
  
  # The following lines are desirable for IPv6 capable hosts
  ::1 ip6-localhost ip6-loopback
  --- cut irrelevant pieces ---

* Sometimes the hostname is set, but isn't useful. For example, in this
Ubuntu image the hostname is set to ip-<dashed-private-ip>, but that isn't
routable, so it generates really irritating behavior. Running on the server
itself (which is running in a VPC, see below for more details):

  scala> InetAddress.getLocalHost
  java.net.UnknownHostException: ip-172-30-2-76: ip-172-30-2-76: Name or service not known
    at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
    at .<init>(<console>:9)
    at .<clinit>(<console>)
    at .<init>(<console>:11)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
    at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
    at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
    at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
    at java.lang.Thread.run(Thread.java:745)
  Caused by: java.net.UnknownHostException: ip-172-30-2-76: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
    ... 14 more

* As described in a bunch of places, the only reliable way to get public
DNS info is through EC2's own instance metadata API:
https://forums.aws.amazon.com/thread.jspa?threadID=77788 For example:

  curl -s http://169.254.169.254/latest/meta-data/public-hostname

might give something like:

  ec2-203-0-113-25.compute-1.amazonaws.com

* But you may not even *have* a public DNS hostname. If you launch in a
VPC, you'll only get one if you set the VPC to generate them (and I'm
pretty sure the default is to not create them):
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html The
output of the curl call above will just be empty.

* AWS is pretty aggressively trying to move away from EC2-Classic (i.e.
non-VPC instances), so most new instances will end up in VPCs unless you
are working in a grandfathered account + AZ. If VPC without public DNS
is the default, we'll have to carefully guide new users in generating a
setup that works properly if we try to use hostnames.

* Even if you try moving to IP addresses, you still have to deal with
VPCs. You can't directly get your public IP address without accessing
something outside the host since you're in a VPC. You need to use the
instance metadata API to look it up, i.e.,

  curl -s http://169.254.169.254/latest/meta-data/public-ipv4
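
Putting the two metadata lookups together, here is a hedged sketch of an
address-selection helper (the advertised_host name and the fallback order are
my assumptions): try public-hostname, fall back to public-ipv4, and finally to
the local hostname when the metadata API is unreachable or returns nothing.

```shell
#!/bin/sh
# Hypothetical helper: pick an externally advertisable address for a broker.
# Falls back gracefully when the EC2 metadata API is unreachable or returns
# an empty body (e.g. a VPC without public DNS, or a non-EC2 machine).
advertised_host() {
  md=http://169.254.169.254/latest/meta-data
  # --max-time bounds the hang on machines with no route to the
  # link-local metadata address.
  h=$(curl -s --max-time 1 "$md/public-hostname" 2>/dev/null)
  [ -n "$h" ] && { echo "$h"; return; }
  ip=$(curl -s --max-time 1 "$md/public-ipv4" 2>/dev/null)
  [ -n "$ip" ] && { echo "$ip"; return; }
  hostname   # last resort: whatever the machine calls itself
}
advertised_host
```

On a non-EC2 machine both curl calls come back empty, so this just prints the
local hostname; the point is that the empty-response case has to be handled
explicitly.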

* And yet another problem with IPs: unless you use an elastic IP, you're
not guaranteed they'll be stable:

  Auto-assign Public IP

  Requests a public IP address from Amazon's public IP address pool, to
  make your instance reachable from the Internet. In most cases, the
  public IP address is associated with the instance until it's stopped or
  terminated, after which it's no longer available for you to use. If you
  require a persistent public IP address that you can associate and
  disassociate at will, use an Elastic IP address (EIP) instead. You can
  allocate your own EIP, and associate it to your instance after launch.

I know Spark had some similar issues -- using their (very convenient!)
ec2 script, you still ended up with some stuff