[jira] [Commented] (KAFKA-1772) Add an Admin message type for request response

2014-12-01 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229799#comment-14229799
 ] 

Joe Stein commented on KAFKA-1772:
--

1) agreed
2) I think we can do away with the format field. Whether we use JSON or byte 
structures, it will be one or the other and not both, so there is no need for 
a field saying which.
3) Yes, let's document the message format(s) in the main section 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements
4) We definitely need to support an immediate response and polling for status. 
It might be best to do this for all of the commands: never block at the broker 
and keep a single pattern at the wire level. The layer above the calls will 
definitely want/need a --sync type option, which should live entirely in the 
CLI (IMHO), looping on the status check (every few ms, with a configurable 
override), and in some cases (like topic-related commands) it should be the 
default because that matches the current experience. We could also make some 
commands blocking and some not.
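The CLI-side --sync loop described above (broker never blocks; the client polls a status check every few configurable milliseconds until completion or timeout) could be sketched roughly like this. The class and method names are hypothetical, not part of any committed API:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a CLI-side --sync loop: the broker never blocks,
// the client polls a status endpoint until the command completes or the
// deadline passes.
public class SyncPoller {
    public static boolean awaitCompletion(Supplier<Boolean> statusCheck,
                                          long pollIntervalMs,
                                          long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (statusCheck.get()) {          // ask the broker: is the command done?
                return true;
            }
            try {
                TimeUnit.MILLISECONDS.sleep(pollIntervalMs); // configurable interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                          // caller decides how to report timeout
    }

    public static void main(String[] args) {
        // Stub "broker" that reports done on the third poll.
        final int[] calls = {0};
        boolean done = awaitCompletion(() -> ++calls[0] >= 3, 5, 1000);
        System.out.println("completed=" + done + " polls=" + calls[0]);
    }
}
```

Commands that should block by default (e.g. topic operations) would simply call this loop unless the user opts out.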

 Add an Admin message type for request response
 --

 Key: KAFKA-1772
 URL: https://issues.apache.org/jira/browse/KAFKA-1772
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Andrii Biletskyi
 Fix For: 0.8.3

 Attachments: KAFKA-1772.patch


 - utility int8
 - command int8
 - format int8
 - args variable length bytes
 utility 
 0 - Broker
 1 - Topic
 2 - Replication
 3 - Controller
 4 - Consumer
 5 - Producer
 Command
 0 - Create
 1 - Alter
 3 - Delete
 4 - List
 5 - Audit
 format
 0 - JSON
 args e.g. (which would equate to the data structure values == 2,1,0)
 meta-store: {
   zookeeper: localhost:12913/kafka
 }
 args: {
   partitions:
     [
       {topic: topic1, partition: 0},
       {topic: topic1, partition: 1},
       {topic: topic1, partition: 2},
       {topic: topic2, partition: 0},
       {topic: topic2, partition: 1}
     ]
 }
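The proposed layout above (three int8 fields followed by variable-length args bytes) can be sketched as a plain ByteBuffer encoding. The int32 length prefix on the args is an assumption here; the JIRA text does not say how the args length is carried:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the proposed admin request layout: utility/command/format as
// int8 each, then args as length-prefixed bytes (the int32 prefix is an
// assumption, not specified in the ticket).
public class AdminMessage {
    public static ByteBuffer encode(byte utility, byte command, byte format, byte[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 + 4 + args.length);
        buf.put(utility);   // e.g. 2 = Replication
        buf.put(command);   // e.g. 1 = Alter
        buf.put(format);    // e.g. 0 = JSON
        buf.putInt(args.length);
        buf.put(args);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        byte[] payload = "{\"partitions\":[{\"topic\":\"topic1\",\"partition\":0}]}"
                .getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = encode((byte) 2, (byte) 1, (byte) 0, payload);
        System.out.println("encoded " + buf.remaining() + " bytes");
    }
}
```

The (2, 1, 0) values here correspond to the worked example in the ticket: Replication, Alter, JSON.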



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1737) Document required ZkSerializer for ZkClient used with AdminUtils

2014-12-01 Thread Vivek Madani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Madani updated KAFKA-1737:

Reviewer: Guozhang Wang
  Status: Patch Available  (was: Open)

Hi Guozhang,
Find attached the patch.

Since the patch is straightforward I doubt it warrants unit tests. Let me 
know if you think otherwise.

 Document required ZkSerializer for ZkClient used with AdminUtils
 

 Key: KAFKA-1737
 URL: https://issues.apache.org/jira/browse/KAFKA-1737
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Stevo Slavic
Assignee: Vivek Madani
Priority: Minor

 {{ZkClient}} instances passed to {{AdminUtils}} calls must have 
 {{kafka.utils.ZKStringSerializer}} set as the {{ZkSerializer}}. Otherwise 
 commands executed via {{AdminUtils}} may not be seen/recognized by the 
 broker, producer, or consumer. E.g. a producer (with auto topic creation 
 turned off) will not be able to send messages to a topic created via 
 {{AdminUtils}}; it will throw {{UnknownTopicOrPartitionException}}.
 Please consider at least documenting this requirement in {{AdminUtils}} 
 scaladoc.
 For more info see [related discussion on Kafka user mailing 
 list|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAAUywg-oihNiXuQRYeS%3D8Z3ymsmEHo6ghLs%3Dru4nbm%2BdHVz6TA%40mail.gmail.com%3E].
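The failure mode here is a byte-level mismatch: ZkClient's default serializer writes Java object serialization, while Kafka's ZKStringSerializer writes plain UTF-8 strings that brokers parse. A stdlib-only illustration of why the stored bytes differ (the ZkClient API itself is not used, so this is just the underlying idea):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration of the KAFKA-1737 pitfall: the same string produces different
// bytes under Java object serialization (ZkClient's default) than under plain
// UTF-8 (what kafka.utils.ZKStringSerializer writes and brokers expect).
public class SerializerMismatch {
    public static byte[] javaSerialized(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bos.toByteArray();
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String topicAssignment = "{\"version\":1,\"partitions\":{\"0\":[0]}}";
        // The broker parses the UTF-8 form; the Java-serialized form is opaque to it.
        System.out.println("utf8 bytes:   " + utf8(topicAssignment).length);
        System.out.println("object bytes: " + javaSerialized(topicAssignment).length);
        System.out.println("identical:    "
                + Arrays.equals(utf8(topicAssignment), javaSerialized(topicAssignment)));
    }
}
```

This is why a topic "created" with the wrong serializer is invisible to brokers even though the znode exists.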



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1737) Document required ZkSerializer for ZkClient used with AdminUtils

2014-12-01 Thread Vivek Madani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Madani updated KAFKA-1737:

Attachment: KAFKA-1737.patch

 Document required ZkSerializer for ZkClient used with AdminUtils
 

 Key: KAFKA-1737
 URL: https://issues.apache.org/jira/browse/KAFKA-1737
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.8.1.1
Reporter: Stevo Slavic
Assignee: Vivek Madani
Priority: Minor
 Attachments: KAFKA-1737.patch


 {{ZkClient}} instances passed to {{AdminUtils}} calls must have 
 {{kafka.utils.ZKStringSerializer}} set as the {{ZkSerializer}}. Otherwise 
 commands executed via {{AdminUtils}} may not be seen/recognized by the 
 broker, producer, or consumer. E.g. a producer (with auto topic creation 
 turned off) will not be able to send messages to a topic created via 
 {{AdminUtils}}; it will throw {{UnknownTopicOrPartitionException}}.
 Please consider at least documenting this requirement in {{AdminUtils}} 
 scaladoc.
 For more info see [related discussion on Kafka user mailing 
 list|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAAUywg-oihNiXuQRYeS%3D8Z3ymsmEHo6ghLs%3Dru4nbm%2BdHVz6TA%40mail.gmail.com%3E].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1789) Issue with Async producer

2014-12-01 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229936#comment-14229936
 ] 

Jiangjie Qin commented on KAFKA-1789:
-

Hmm, that looks a little bit strange... Can you double-check to confirm that 
you are using the async producer instead of the sync producer?

 Issue with Async producer
 -

 Key: KAFKA-1789
 URL: https://issues.apache.org/jira/browse/KAFKA-1789
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: devendra tagare
Priority: Critical

 Hi,
 We are using an async producer to send data to a Kafka cluster. The event 
 rate at peak is around 250 events/second of size 25KB each.
 In the producer code base we have added specific debug statements to capture 
 the time taken to create a producer, create a keyed message with a byte 
 payload & send the message.
 We have added the below properties to the producerConfig
 queue.enqueue.timeout.ms=20
 send.buffer.bytes=1024000
 topic.metadata.refresh.interval.ms=3
 Based on the documentation, producer.send() queues the message on the async 
 producer's queue.
 So, ideally if the queue is full then the enqueue operation should result in 
 a kafka.common.QueueFullException in 20 ms.
 The logs indicate that the enqueue operation is taking more than 20ms (takes 
 around 250ms) without throwing any exceptions.
 Is there any other property that could conflict with queue.enqueue.timeout.ms 
 which is causing this behavior?
 Or is it possible that the queue is not full & yet the producer.send() call 
 is still taking around 200ms under peak load?
 Also, could you suggest any other alternatives so that we can either enforce 
 a timeout or throw an exception in case the async producer is taking more 
 than a specified amount of time.
 Regards,
 Dev
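The contract being questioned (with queue.enqueue.timeout.ms=20, a full queue should reject the message after roughly 20 ms rather than blocking) maps onto a bounded-queue offer with a timeout. This stdlib sketch mimics that contract without using the Kafka producer itself; the exception class is a local stand-in for kafka.common.QueueFullException:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Mimics the async producer's documented enqueue contract: with a 20 ms
// enqueue timeout, a full queue should reject the message after ~20 ms
// instead of blocking. Plain BlockingQueue, not the Kafka client.
public class EnqueueTimeoutDemo {
    static class QueueFullException extends RuntimeException {
        QueueFullException(String msg) { super(msg); }
    }

    public static void enqueue(BlockingQueue<String> queue, String event, long timeoutMs) {
        try {
            if (!queue.offer(event, timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new QueueFullException("queue full after " + timeoutMs + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new QueueFullException("interrupted while enqueueing");
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        enqueue(queue, "event-1", 20);           // fits
        long start = System.nanoTime();
        try {
            enqueue(queue, "event-2", 20);       // queue is full -> should fail fast
        } catch (QueueFullException e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("rejected after ~" + elapsedMs + " ms: " + e.getMessage());
        }
    }
}
```

If the real producer takes ~250 ms here, time is likely being spent somewhere other than the queue offer (e.g. serialization or metadata refresh), which matches the direction of the question above.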



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1802) Add a new type of request for the discovery of the controller

2014-12-01 Thread Andrii Biletskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230001#comment-14230001
 ] 

Andrii Biletskyi commented on KAFKA-1802:
-

Proposed RQ/RP format:
===
ControllerDiscoveryRequest = 
// Comment: empty response body

ControllerDiscoveryResponse = [Broker] Controller
Controller = Broker
Broker = NodeId Host Port
NodeId = int32
Host = string
Port = int32
// Comment: controller will be included in [Broker] list, for simplicity
===
I'll update A Guide To The Kafka Protocol if there are no major remarks.
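Following the field types given above (NodeId int32, Host string, Port int32), one Broker entry of the proposed response could be encoded and decoded as below. The int16-length-prefixed UTF-8 string encoding follows the convention of other Kafka protocol messages, which is an assumption here:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of one Broker = NodeId Host Port entry from the proposed
// ControllerDiscoveryResponse. Strings are int16-length-prefixed UTF-8,
// matching the convention used by other Kafka protocol messages.
public class BrokerEntry {
    public final int nodeId;
    public final String host;
    public final int port;

    public BrokerEntry(int nodeId, String host, int port) {
        this.nodeId = nodeId; this.host = host; this.port = port;
    }

    public static ByteBuffer encode(BrokerEntry b) {
        byte[] hostBytes = b.host.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 + hostBytes.length + 4);
        buf.putInt(b.nodeId);                   // NodeId = int32
        buf.putShort((short) hostBytes.length); // string length prefix
        buf.put(hostBytes);                     // Host = string
        buf.putInt(b.port);                     // Port = int32
        buf.flip();
        return buf;
    }

    public static BrokerEntry decode(ByteBuffer buf) {
        int nodeId = buf.getInt();
        byte[] hostBytes = new byte[buf.getShort()];
        buf.get(hostBytes);
        String host = new String(hostBytes, StandardCharsets.UTF_8);
        return new BrokerEntry(nodeId, host, buf.getInt());
    }

    public static void main(String[] args) {
        BrokerEntry b = decode(encode(new BrokerEntry(1, "broker1.example.com", 9092)));
        System.out.println(b.nodeId + " " + b.host + ":" + b.port);
    }
}
```

The full response would be an array of such entries plus one more entry (or node id) identifying the controller.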

 Add a new type of request for the discovery of the controller
 -

 Key: KAFKA-1802
 URL: https://issues.apache.org/jira/browse/KAFKA-1802
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
 Fix For: 0.8.3


 The goal here is analogous to metadata discovery for the producer: it lets 
 the CLI find which broker it should send the rest of its admin requests to. 
 Any broker can respond to this specific AdminMeta RQ/RP, but only the 
 controller broker should respond to admin messages; any other broker should 
 respond to an admin message with a response identifying the controller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1802) Add a new type of request for the discovery of the controller

2014-12-01 Thread Andrii Biletskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230001#comment-14230001
 ] 

Andrii Biletskyi edited comment on KAFKA-1802 at 12/1/14 4:37 PM:
--

Proposed RQ/RP format:
===
ControllerDiscoveryRequest = 
// Comment: empty request body

ControllerDiscoveryResponse = [Broker] Controller
Controller = Broker
Broker = NodeId Host Port
NodeId = int32
Host = string
Port = int32
// Comment: controller will be included in [Broker] list, for simplicity
===
I'll update A Guide To The Kafka Protocol if there are no major remarks.


was (Author: abiletskyi):
Proposed RQ/RP format:
===
ControllerDiscoveryRequest = 
// Comment: empty response body

ControllerDiscoveryResponse = [Broker] Controller
Controller = Broker
Broker = NodeId Host Port
NodeId = int32
Host = string
Port = int32
// Comment: controller will be included in [Broker] list, for simplicity
===
I'll update A Guide To The Kafka Protocol if there will be no major remarks.

 Add a new type of request for the discovery of the controller
 -

 Key: KAFKA-1802
 URL: https://issues.apache.org/jira/browse/KAFKA-1802
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
 Fix For: 0.8.3


 The goal here is analogous to metadata discovery for the producer: it lets 
 the CLI find which broker it should send the rest of its admin requests to. 
 Any broker can respond to this specific AdminMeta RQ/RP, but only the 
 controller broker should respond to admin messages; any other broker should 
 respond to an admin message with a response identifying the controller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1802) Add a new type of request for the discovery of the controller

2014-12-01 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230037#comment-14230037
 ] 

Joe Stein commented on KAFKA-1802:
--

Let's not update A Guide To The Kafka Protocol until everything is baked in 
and committed (that is its final resting place), but please do update 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Command+Line+and+Related+Improvements
 so discussion of all message changes/additions for this can happen there.

 Add a new type of request for the discovery of the controller
 -

 Key: KAFKA-1802
 URL: https://issues.apache.org/jira/browse/KAFKA-1802
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
 Fix For: 0.8.3


 The goal here is analogous to metadata discovery for the producer: it lets 
 the CLI find which broker it should send the rest of its admin requests to. 
 Any broker can respond to this specific AdminMeta RQ/RP, but only the 
 controller broker should respond to admin messages; any other broker should 
 respond to an admin message with a response identifying the controller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1800) KafkaException was not recorded at the per-topic metrics

2014-12-01 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230111#comment-14230111
 ] 

Jay Kreps commented on KAFKA-1800:
--

Is it important to count this exception at the topic level? Maybe just count it 
at the global level?

 KafkaException was not recorded at the per-topic metrics
 

 Key: KAFKA-1800
 URL: https://issues.apache.org/jira/browse/KAFKA-1800
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1800.patch


 When a KafkaException is thrown from the producer.send() call, it is not 
 recorded in the per-topic record-error-rate, but only in the global 
 error-rate.
 Since users usually monitor the per-topic metrics, losing all dropped-message 
 counts at this level that are caused by producer-thrown exceptions such as 
 BufferExhaustedException could be very dangerous.
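The gap described above comes down to incrementing both the per-topic and global sensors on the error path. A stdlib sketch of the intended accounting (not the actual Kafka Metrics API, whose sensor types differ):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of the accounting KAFKA-1800 asks for: when a send fails (e.g. with
// BufferExhaustedException), bump the per-topic error counter as well as the
// global one, so users watching only their topics still see the drops.
public class ErrorCounters {
    private final LongAdder globalErrors = new LongAdder();
    private final Map<String, LongAdder> perTopicErrors = new ConcurrentHashMap<>();

    public void recordError(String topic) {
        globalErrors.increment();                                      // global error-rate
        perTopicErrors.computeIfAbsent(topic, t -> new LongAdder())
                      .increment();                                    // per-topic record-error-rate
    }

    public long global() { return globalErrors.sum(); }

    public long forTopic(String topic) {
        LongAdder a = perTopicErrors.get(topic);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        ErrorCounters counters = new ErrorCounters();
        counters.recordError("orders");
        counters.recordError("orders");
        counters.recordError("clicks");
        System.out.println("global=" + counters.global()
                + " orders=" + counters.forTopic("orders"));
    }
}
```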



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1800) KafkaException was not recorded at the per-topic metrics

2014-12-01 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230151#comment-14230151
 ] 

Guozhang Wang commented on KAFKA-1800:
--

It is actually quite important to monitor topic-level error/retry rates for 
producer-thrown exceptions, since many users do not monitor global-level 
metrics but only the topics they are interested in.

 KafkaException was not recorded at the per-topic metrics
 

 Key: KAFKA-1800
 URL: https://issues.apache.org/jira/browse/KAFKA-1800
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1800.patch


 When a KafkaException is thrown from the producer.send() call, it is not 
 recorded in the per-topic record-error-rate, but only in the global 
 error-rate.
 Since users usually monitor the per-topic metrics, losing all dropped-message 
 counts at this level that are caused by producer-thrown exceptions such as 
 BufferExhaustedException could be very dangerous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1783) Missing slash in documentation for the Zookeeper paths in ZookeeperConsumerConnector

2014-12-01 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230160#comment-14230160
 ] 

Guozhang Wang commented on KAFKA-1783:
--

Thanks for the patch, committed to trunk.

 Missing slash in documentation for the Zookeeper paths in 
 ZookeeperConsumerConnector
 

 Key: KAFKA-1783
 URL: https://issues.apache.org/jira/browse/KAFKA-1783
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Jean-Francois Im
Assignee: Neha Narkhede
Priority: Trivial
 Attachments: kafka-missing-doc-slash.patch


 The documentation for the ZookeeperConsumerConnector refers to the consumer 
 id registry location as /consumers/[group_id]/ids[consumer_id]; it should be 
 /consumers/[group_id]/ids/[consumer_id], as evidenced by 
 registerConsumerInZK() and TopicCount.scala line 61.
 A patch is provided that adds the missing forward slash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1783) Missing slash in documentation for the Zookeeper paths in ZookeeperConsumerConnector

2014-12-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1783.
--
Resolution: Fixed
  Assignee: Jean-Francois Im  (was: Neha Narkhede)

 Missing slash in documentation for the Zookeeper paths in 
 ZookeeperConsumerConnector
 

 Key: KAFKA-1783
 URL: https://issues.apache.org/jira/browse/KAFKA-1783
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Jean-Francois Im
Assignee: Jean-Francois Im
Priority: Trivial
 Attachments: kafka-missing-doc-slash.patch


 The documentation for the ZookeeperConsumerConnector refers to the consumer 
 id registry location as /consumers/[group_id]/ids[consumer_id]; it should be 
 /consumers/[group_id]/ids/[consumer_id], as evidenced by 
 registerConsumerInZK() and TopicCount.scala line 61.
 A patch is provided that adds the missing forward slash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1792) change behavior of --generate to produce assignment config with fair replica distribution and minimal number of reassignments

2014-12-01 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230236#comment-14230236
 ] 

Neha Narkhede commented on KAFKA-1792:
--

[~Dmitry Pekar] Left some comments on the rb. 

At a high level, it will help to know the source of the algorithm. Is it 
published or well tested someplace else? If it is completely new, we'd have 
to do due diligence on generating enough test cases, beyond unit tests, to 
verify that the algorithm achieves its objectives (fairness and minimum data 
movement). Would you be up for sharing such test results? The metrics for 
such tests would be the number of replicas per broker and the total replica 
movement.

 change behavior of --generate to produce assignment config with fair replica 
 distribution and minimal number of reassignments
 -

 Key: KAFKA-1792
 URL: https://issues.apache.org/jira/browse/KAFKA-1792
 Project: Kafka
  Issue Type: Sub-task
  Components: tools
Reporter: Dmitry Pekar
Assignee: Dmitry Pekar
 Fix For: 0.8.3

 Attachments: KAFKA-1792.patch


 The current implementation produces a fair replica distribution across the 
 specified list of brokers. Unfortunately, it doesn't take the current 
 replica assignment into account.
 So if we have, for instance, 3 brokers id=[0..2] and are going to add a 
 fourth broker id=3, --generate will create an assignment config which 
 redistributes replicas fairly across brokers [0..3] in the same way as if 
 those partitions were created from scratch. It will not take the current 
 replica assignment into consideration and accordingly will not try to 
 minimize the number of replica moves between brokers.
 As proposed by [~charmalloc] this should be improved. The output of the 
 improved --generate algorithm should satisfy the following requirements:
 - fairness of replica distribution - every broker will have R or R+1 
 replicas assigned;
 - minimum of reassignments - the number of replica moves between brokers 
 will be minimal.
 Example.
 Consider the following replica distribution across brokers [0..3] (we just 
 added brokers 2 and 3):
 - broker - 0, 1, 2, 3 
 - replicas - 7, 6, 0, 0
 The new algorithm will produce the following assignment:
 - broker - 0, 1, 2, 3 
 - replicas - 4, 3, 3, 3
 - moves - -3, -3, +3, +3
 It is fair, and the number of moves is 6, which is minimal for the specified 
 initial distribution.
 The scope of this issue is:
 - design an algorithm matching the above requirements;
 - implement this algorithm and unit tests;
 - test it manually using different initial assignments;
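The fairness and minimal-moves arithmetic in the example above (counts 7,6,0,0 across brokers 0..3 becoming 4,3,3,3 with 6 moves) can be checked with a small calculation. This sketch only computes target replica counts and move totals, not actual partition placements, and is not the proposed algorithm itself:

```java
import java.util.Arrays;

// Computes the fair target distribution (every broker gets R or R+1 replicas)
// and the minimal number of replica moves to reach it, matching the KAFKA-1792
// example: [7, 6, 0, 0] -> [4, 3, 3, 3] with 6 moves.
public class FairTargets {
    public static int[] targets(int[] current) {
        int total = Arrays.stream(current).sum();
        int brokers = current.length;
        int base = total / brokers;        // R
        int extra = total % brokers;       // number of brokers that get R+1
        int[] out = new int[brokers];
        for (int i = 0; i < brokers; i++) {
            out[i] = base + (i < extra ? 1 : 0);
        }
        return out;
    }

    public static int minMoves(int[] current, int[] target) {
        int moves = 0;
        for (int i = 0; i < current.length; i++) {
            // Every surplus replica on a broker must move exactly once,
            // so counting removals gives the minimum number of moves.
            moves += Math.max(0, current[i] - target[i]);
        }
        return moves;
    }

    public static void main(String[] args) {
        int[] current = {7, 6, 0, 0};
        int[] target = targets(current);
        System.out.println(Arrays.toString(target) + " moves=" + minMoves(current, target));
    }
}
```

Note this toy version hands the R+1 surplus to the lowest-indexed brokers; a real minimal-movement algorithm would give the extras to the brokers that already hold the most replicas.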



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1753) add --decommission-broker option

2014-12-01 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230239#comment-14230239
 ] 

Joe Stein commented on KAFKA-1753:
--

So, I think with the changes in KAFKA-1792 this now becomes the same type of 
algorithm, but the partitions on the broker being decommissioned are what we 
are evenly spreading across the rest of the cluster nodes, essentially.

 add --decommission-broker option
 

 Key: KAFKA-1753
 URL: https://issues.apache.org/jira/browse/KAFKA-1753
 Project: Kafka
  Issue Type: Sub-task
  Components: tools
Reporter: Dmitry Pekar
Assignee: Dmitry Pekar
 Fix For: 0.8.3






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 28481: Patch for KAFKA-1792

2014-12-01 Thread Neha Narkhede

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28481/#review63409
---



core/src/main/scala/kafka/admin/AdminUtils.scala
https://reviews.apache.org/r/28481/#comment105663

Will help to clarify that this groups the reassignment by partition.



core/src/main/scala/kafka/admin/AdminUtils.scala
https://reviews.apache.org/r/28481/#comment105662

It will help to document the algorithm with examples in detail here. Makes 
it much easier to understand the code



core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala
https://reviews.apache.org/r/28481/#comment105664

Shouldn't we get rid of the 2 ways of generating reassignments?


- Neha Narkhede


On Nov. 26, 2014, 9:09 p.m., Dmitry Pekar wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28481/
 ---
 
 (Updated Nov. 26, 2014, 9:09 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1792
 https://issues.apache.org/jira/browse/KAFKA-1792
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1792: change behavior of --generate to produce assignment config with 
 fair replica distribution and minimal number of reassignments
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/admin/AdminUtils.scala 
 28b12c7b89a56c113b665fbde1b95f873f8624a3 
   core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala 
 979992b68af3723cd229845faff81c641123bb88 
   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
 e28979827110dfbbb92fe5b152e7f1cc973de400 
   topics.json ff011ed381e781b9a177036001d44dca3eac586f 
 
 Diff: https://reviews.apache.org/r/28481/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Dmitry Pekar
 




Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-01 Thread Joe Stein
+1 to doing this. Can you add a sub-ticket under the security ticket when you
create the JIRA for this (unless you did already and I missed it)? I made one
comment regarding the JSON returned on the confluence page; otherwise this
matches what we already agreed and discussed, and I like how it breaks the
work down into smaller chunks so that the different implementations can
utilize it.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/

On Tue, Nov 25, 2014 at 2:13 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Hi Everyone,

 One of the pre-requisites we have for supporting multiple security
 protocols (SSL, Kerberos) is to support them on separate ports.

 This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
 several different issues - Multiple ports, enriching the channels, SSL
 implementation - which makes it more challenging to review and to test.

 I'd like to split this into 3 separate patches: multi-port brokers,
 enriching SocketChannel, and  the actual security implementations.

 Since even just adding support for multiple listeners per broker is
 somewhat involved and touches multiple components, I wrote a short design
 document that covers the necessary changes and the upgrade process:


 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers

 Comments are more than welcome :)

 If this is acceptable, hope to have a patch ready in few days.

 Gwen Shapira
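For concreteness, a broker configured per the design above might declare one listener per security protocol. The property names below are illustrative sketches in the spirit of the wiki design document, not a released configuration format:

```properties
# Illustrative only -- property names follow the design sketch, not a
# shipped Kafka release.
listeners=PLAINTEXT://broker1.example.com:9092,SSL://broker1.example.com:9093
# During a rolling upgrade, brokers would keep advertising the old single
# host/port fields alongside the new endpoints until every broker
# understands the multi-listener format.
```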



Re: Review Request 27391: Fix KAFKA-1634

2014-12-01 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27391/
---

(Updated Dec. 1, 2014, 7:44 p.m.)


Review request for kafka.


Bugs: KAFKA-1634
https://issues.apache.org/jira/browse/KAFKA-1634


Repository: kafka


Description (updated)
---

Add another API in the offset manager to return the struct; the cache layer 
will only read its expiration timestamp, while the offset formatter will read 
the struct as a whole.
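The split described above (cache layer reads only the expiration timestamp; offset formatter reads the whole struct) can be sketched as a value type with two read paths. The field names are illustrative, not the actual OffsetManager types:

```java
// Illustrative sketch of the KAFKA-1634 change: one struct, two read paths.
// The cache only needs the expiration timestamp for eviction checks; the
// offset formatter dumps every field. Field names are made up for the example.
public class OffsetAndMetadata {
    public final long offset;
    public final String metadata;
    public final long commitTimestamp;
    public final long expireTimestamp;

    public OffsetAndMetadata(long offset, String metadata,
                             long commitTimestamp, long expireTimestamp) {
        this.offset = offset;
        this.metadata = metadata;
        this.commitTimestamp = commitTimestamp;
        this.expireTimestamp = expireTimestamp;
    }

    // Cheap path used by the cache layer: expiration check only.
    public boolean isExpired(long nowMs) { return nowMs > expireTimestamp; }

    // Full path used by the offset formatter: the struct as a whole.
    public String format() {
        return "[" + offset + "," + metadata + ","
                + commitTimestamp + "," + expireTimestamp + "]";
    }

    public static void main(String[] args) {
        OffsetAndMetadata o = new OffsetAndMetadata(42L, "checkpoint", 1000L, 2000L);
        System.out.println(o.format() + " expired@2500=" + o.isExpired(2500L));
    }
}
```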


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java 
7517b879866fc5dad5f8d8ad30636da8bbe7784a 
  
clients/src/main/java/org/apache/kafka/common/requests/OffsetCommitRequest.java 
3ee5cbad55ce836fd04bb954dcf6ef2f9bc3288f 
  
clients/src/test/java/org/apache/kafka/common/requests/RequestResponseTest.java 
df37fc6d8f0db0b8192a948426af603be3444da4 
  core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
050615c72efe7dbaa4634f53943bd73273d20ffb 
  core/src/main/scala/kafka/api/OffsetFetchRequest.scala 
c7604b9cdeb8f46507f04ed7d35fcb3d5f150713 
  core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
4cabffeacea09a49913505db19a96a55d58c0909 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
da29a8cb461099eb675161db2f11a9937424a5c6 
  core/src/main/scala/kafka/server/KafkaApis.scala 
2a1c0326b6e6966d8b8254bd6a1cb83ad98a3b80 
  core/src/main/scala/kafka/server/KafkaServer.scala 
1bf7d10cef23a77e71eb16bf6d0e68bc4ebe 
  core/src/main/scala/kafka/server/OffsetManager.scala 
3c79428962604800983415f6f705e04f52acb8fb 
  core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala 
cd16ced5465d098be7a60498326b2a98c248f343 
  core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
8c5364fa97da1be09973c176d1baeb339455d319 

Diff: https://reviews.apache.org/r/27391/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Updated] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2014-12-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1634:
-
Attachment: KAFKA-1634_2014-12-01_11:44:35.patch

 Improve semantics of timestamp in OffsetCommitRequests and update 
 documentation
 ---

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
 Fix For: 0.8.3

 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
 KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
 KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed and now always handled server-side now?  Would one of
 the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2014-12-01 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230317#comment-14230317
 ] 

Guozhang Wang commented on KAFKA-1634:
--

Updated reviewboard https://reviews.apache.org/r/27391/diff/
 against branch origin/trunk

 Improve semantics of timestamp in OffsetCommitRequests and update 
 documentation
 ---

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
 Fix For: 0.8.3

 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
 KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
 KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed and now always handled server-side now?  Would one of
 the devs mind updating the API spec wiki?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-01 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1642:
-
Attachment: KAFKA-1642.patch

 [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
 connection is lost
 ---

 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Blocker
 Fix For: 0.8.2

 Attachments: 
 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
 KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
 KAFKA-1642_2014-10-23_16:19:41.patch


 I see my CPU spike to 100% when the network connection is lost for a while. 
 It seems the network & I/O threads are very busy logging the following error 
 message. Is this expected behavior?
 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
 org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
 producer I/O thread: 
 java.lang.IllegalStateException: No entry found for node -2
 at 
 org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
 at 
 org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
 at 
 org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
 at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
 at java.lang.Thread.run(Thread.java:744)
 Thanks,
 Bhavesh
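A spin like the one reported (the I/O thread retrying a dead node at full speed) is usually cured by backing off between reconnect attempts instead of retrying immediately. This is a stdlib sketch of exponential backoff as a general technique, not the actual KAFKA-1642 patch:

```java
// Sketch of exponential backoff for reconnect attempts -- the usual remedy
// for a busy-loop where an I/O thread retries an unreachable node at full
// speed. Not the actual KAFKA-1642 fix, just the general technique.
public class ReconnectBackoff {
    private final long baseMs;
    private final long maxMs;
    private int failures = 0;

    public ReconnectBackoff(long baseMs, long maxMs) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    public void onFailure() { failures++; }
    public void onSuccess() { failures = 0; }

    // Delay doubles with each consecutive failure, capped at maxMs.
    public long nextDelayMs() {
        if (failures == 0) return 0;
        long delay = baseMs << Math.min(failures - 1, 20); // cap shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        ReconnectBackoff backoff = new ReconnectBackoff(50, 10_000);
        for (int i = 0; i < 5; i++) {
            backoff.onFailure();
            System.out.println("attempt " + (i + 1) + " -> wait "
                    + backoff.nextDelayMs() + " ms");
        }
    }
}
```

The sender loop would sleep for nextDelayMs() before each reconnect attempt, so a lost network costs a few wakeups per second instead of a pegged core.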



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-01 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230709#comment-14230709
 ] 

Ewen Cheslack-Postava commented on KAFKA-1642:
--

Created reviewboard https://reviews.apache.org/r/28582/diff/
 against branch origin/trunk

 [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
 connection is lost
 ---

 Key: KAFKA-1642
 URL: https://issues.apache.org/jira/browse/KAFKA-1642
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Bhavesh Mistry
Assignee: Ewen Cheslack-Postava
Priority: Blocker
 Fix For: 0.8.2

 Attachments: 
 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
 KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
 KAFKA-1642_2014-10-23_16:19:41.patch


 I see my CPU spike to 100% when the network connection is lost for a while. 
 It seems the network & I/O threads are very busy logging the following error 
 message. Is this expected behavior?
 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
 org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
 producer I/O thread: 
 java.lang.IllegalStateException: No entry found for node -2
 at 
 org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
 at 
 org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
 at 
 org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
 at 
 org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
 at java.lang.Thread.run(Thread.java:744)
 Thanks,
 Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-01 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1642:
-
Status: Patch Available  (was: Reopened)



Review Request 28582: Patch for KAFKA-1642

2014-12-01 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28582/
---

Review request for kafka.


Bugs: KAFKA-1642
https://issues.apache.org/jira/browse/KAFKA-1642


Repository: kafka


Description
---

KAFKA-1642 Mark nodes as connecting before making the connect request so exceptions 
are handled properly.


KAFKA-1642 Update the last time no nodes were available when updating metadata has 
to initiate a connection.


KAFKA-1642 Don't busy wait trying to send metadata to a node that is connected 
but has too many in flight requests.


Diffs
-

  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
525b95e98010cd2053eacd8c321d079bcac2f910 

Diff: https://reviews.apache.org/r/28582/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Commented] (KAFKA-1642) [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost

2014-12-01 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230715#comment-14230715
 ] 

Ewen Cheslack-Postava commented on KAFKA-1642:
--

Attached a new patch that fixes all the timeout issues I'm aware of. Here's how 
it addresses each of the situations I listed earlier:

1. lastNoNodeAvailableMs is updated, which forces the metadata timeout for each 
poll to use a backoff period

2. Added another backoff value based on metadataFetchInProgress. Since the 
request actually made it out, this can be arbitrarily large -- we just need to 
see some sort of response or failure for the request.

3. Requires some response to arrive to clear out space, so we can wait 
arbitrarily long. Updating lastNoNodeAvailableMs works even though it may wake 
up sooner than necessary. But making all cases that didn't send the data use a 
single approach keeps the code simpler.

4a. This can happen if, e.g., the network interface has been taken down 
entirely. After fixing the ordering of marking the node as connecting and 
issuing the request, this error is now cleaned up properly. 
lastNoNodeAvailableMs is updated since there *weren't* any nodes available. 
This triggers a backoff period where the connection won't be retried.

4b. This can be handled in the same way - we set lastNoNodeAvailableMs whether 
or not we immediately saw an error from the connection request. This causes it 
to sleep while waiting for the connection request. This may wake up before we 
get connected. However, if the node is still in the connecting state, it'll be 
ignored during the next round and we'll either start trying to connect to 
another node or we'll end up in state 1 with no nodes available. Either way, we 
still only wake up periodically based on this timeout.

5. Looking more carefully at how leastLoadedNode works, this case isn't 
actually possible.

One additional note -- apparently you can't use Long.MAX_VALUE as a timeout; it 
throws an exception. That's why Integer.MAX_VALUE is there instead. We could 
also detect the large value and convert it to a negative value instead, which 
the underlying API treats as having no timeout.

[~Bmis13] can you test this out for the failure modes you found?
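The backoff handling described in the points above can be sketched roughly as follows. The names (lastNoNodeAvailableMs, metadataFetchInProgress, retryBackoffMs) mirror the fields mentioned in this comment, but this is a hypothetical standalone illustration of the timeout logic, not Kafka's actual NetworkClient code:

```java
// Sketch of the metadata poll-timeout backoff, under the assumptions above.
public class MetadataBackoffSketch {
    private final long retryBackoffMs;
    private long lastNoNodeAvailableMs = 0;
    private boolean metadataFetchInProgress = false;

    public MetadataBackoffSketch(long retryBackoffMs) {
        this.retryBackoffMs = retryBackoffMs;
    }

    // Called when no node was available or a connection attempt failed (cases 1, 4a, 4b):
    // record the time so subsequent polls back off instead of busy-looping.
    public void noNodeAvailable(long nowMs) { lastNoNodeAvailableMs = nowMs; }

    public void fetchStarted()  { metadataFetchInProgress = true; }
    public void fetchFinished() { metadataFetchInProgress = false; }

    // Timeout to pass to poll(): effectively unbounded while a fetch is in
    // flight (case 2) -- Integer.MAX_VALUE rather than Long.MAX_VALUE, per the
    // note above -- otherwise the remaining backoff, and 0 once it has elapsed.
    public long metadataTimeout(long nowMs) {
        if (metadataFetchInProgress)
            return Integer.MAX_VALUE;
        return Math.max(retryBackoffMs - (nowMs - lastNoNodeAvailableMs), 0);
    }
}
```

The key property is that every failure path funnels through the same timestamp, so the I/O thread sleeps in poll() instead of spinning.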



[jira] [Assigned] (KAFKA-742) Existing directories under the Kafka data directory without any data cause process to not start

2014-12-01 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh reassigned KAFKA-742:


Assignee: Ashish Kumar Singh

 Existing directories under the Kafka data directory without any data cause 
 process to not start
 ---

 Key: KAFKA-742
 URL: https://issues.apache.org/jira/browse/KAFKA-742
 Project: Kafka
  Issue Type: Bug
  Components: config
Affects Versions: 0.8.0
Reporter: Chris Curtin
Assignee: Ashish Kumar Singh

 I incorrectly set up the configuration file to have the metrics go to 
 /var/kafka/metrics while the logs were in /var/kafka. On startup I received 
 the following error and then the daemon exited:
 30   [main] INFO  kafka.log.LogManager  - [Log Manager on Broker 0] Loading 
 log 'metrics'
 32   [main] FATAL kafka.server.KafkaServerStartable  - Fatal error during 
 KafkaServerStable startup. Prepare to shutdown
 java.lang.StringIndexOutOfBoundsException: String index out of range: -1
 at java.lang.String.substring(String.java:1937)
 at 
 kafka.log.LogManager.kafka$log$LogManager$$parseTopicPartitionName(LogManager.scala:335)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$3.apply(LogManager.scala:112)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1$$anonfun$apply$3.apply(LogManager.scala:109)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:109)
 at 
 kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:101)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 at 
 scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32)
 at kafka.log.LogManager.loadLogs(LogManager.scala:101)
 at kafka.log.LogManager.<init>(LogManager.scala:62)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:59)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 34   [main] INFO  kafka.server.KafkaServer  - [Kafka Server 0], shutting down
 This was on a brand new cluster so no data or metrics logs existed yet.
 Moving the metrics to their own directory (not a child of the logs) allowed 
 the daemon to start.
 Took a few minutes to figure out what was wrong.
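The stack trace points at parseTopicPartitionName calling substring with the result of a failed lastIndexOf: log directory names are expected to look like "topic-partition", so a directory with no dash (here "metrics") yields -1 and substring throws. A minimal hypothetical sketch of that parse with a defensive check (an illustration, not Kafka's actual fix):

```java
// Sketch of parsing a log directory name of the form "<topic>-<partition>".
public class TopicDirParse {
    public static String topicOf(String dirName) {
        int dash = dirName.lastIndexOf('-');
        // Without this check, substring(0, -1) throws
        // StringIndexOutOfBoundsException, matching the report above.
        if (dash < 0)
            throw new IllegalArgumentException("not a topic directory: " + dirName);
        return dirName.substring(0, dash);
    }
}
```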





[jira] [Commented] (KAFKA-742) Existing directories under the Kafka data directory without any data cause process to not start

2014-12-01 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230737#comment-14230737
 ] 

Ashish Kumar Singh commented on KAFKA-742:
--

[~jkreps], [~chriscurtin], I would like to take a stab at this. Assigning it to 
myself.



[jira] [Updated] (KAFKA-1799) ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG doesn't work

2014-12-01 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1799:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch. +1 and committed to trunk and 0.8.2.

 ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG doesn't work
 --

 Key: KAFKA-1799
 URL: https://issues.apache.org/jira/browse/KAFKA-1799
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Manikumar Reddy
Priority: Blocker
  Labels: newbie++
 Fix For: 0.8.2

 Attachments: KAFKA-1799.patch, KAFKA-1799_2014-11-30_15:58:32.patch, 
 KAFKA-1799_2014-11-30_16:04:16.patch


 When running the following test, we got an unknown configuration exception.
 @Test
 public void testMetricsReporter() {
     Properties producerProps = new Properties();
     producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "host1:123");
     producerProps.put(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
         "org.apache.kafka.clients.producer.new-metrics-reporter");
     new KafkaProducer(producerProps);
 }
 org.apache.kafka.common.config.ConfigException: Unknown configuration 
 'org.apache.kafka.clients.producer.new-metrics-reporter'
   at 
 org.apache.kafka.common.config.AbstractConfig.get(AbstractConfig.java:60)
   at 
 org.apache.kafka.common.config.AbstractConfig.getClass(AbstractConfig.java:91)
   at 
 org.apache.kafka.common.config.AbstractConfig.getConfiguredInstances(AbstractConfig.java:147)
   at 
 org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:105)
   at 
 org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:94)
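The exception message shows a configured *value* (the reporter class name) being looked up as if it were a config *key*, and AbstractConfig only knows keys declared in its config definition. A rough standalone sketch of that lookup behavior (assumed names, not Kafka's actual AbstractConfig):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch: a config object that, like AbstractConfig, keeps only the keys
// present in its definition and throws on lookup of anything else.
public class ConfigSketch {
    private final Map<String, Object> values = new HashMap<>();

    public ConfigSketch(Set<String> definedKeys, Map<String, ?> supplied) {
        // Only keys present in the definition survive parsing.
        for (Map.Entry<String, ?> e : supplied.entrySet())
            if (definedKeys.contains(e.getKey()))
                values.put(e.getKey(), e.getValue());
    }

    public Object get(String key) {
        // Looking up an undefined key -- or a value mistaken for a key --
        // fails here, mirroring the "Unknown configuration" exception above.
        if (!values.containsKey(key))
            throw new IllegalStateException("Unknown configuration '" + key + "'");
        return values.get(key);
    }
}
```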





[jira] [Created] (KAFKA-1803) UncleanLeaderElectionEnableProp in LogConfig should be of boolean

2014-12-01 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-1803:
--

 Summary: UncleanLeaderElectionEnableProp in LogConfig should be of 
boolean
 Key: KAFKA-1803
 URL: https://issues.apache.org/jira/browse/KAFKA-1803
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
 Fix For: 0.8.3


Now that KAFKA-1798 is fixed, we should define UncleanLeaderElectionEnableProp 
as a boolean, instead of String in LogConfig and get rid of the customized 
validation for boolean.





[jira] [Commented] (KAFKA-1789) Issue with Async producer

2014-12-01 Thread devendra tagare (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230804#comment-14230804
 ] 

devendra tagare commented on KAFKA-1789:


Hi,

We are using an async producer.

~Dev

 Issue with Async producer
 -

 Key: KAFKA-1789
 URL: https://issues.apache.org/jira/browse/KAFKA-1789
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: devendra tagare
Priority: Critical

 Hi,
 We are using an async producer to send data to a Kafka cluster. The event rate 
 at peak is around 250 events/second of size 25KB each.
 In the producer code base we have added specific debug statements to capture 
 the time taken to create a producer, create a keyed message with a byte 
 payload, and send the message.
 We have added the below properties to the producerConfig
 queue.enqueue.timeout.ms=20
 send.buffer.bytes=1024000
 topic.metadata.refresh.interval.ms=3
 Based on the documentation, producer.send() queues the message on the async 
 producer's queue.
 So, ideally if the queue is full then the enqueue operation should result in 
 a kafka.common.QueueFullException in 20 ms.
 The logs indicate that the enqueue operation is taking more than 20 ms (around 
 250 ms) without throwing any exceptions.
 Is there any other property that could conflict with queue.enqueue.timeout.ms 
 and cause this behavior?
 Or is it possible that the queue is not full and yet the producer.send() call 
 is still taking around 200 ms under peak load?
 Also, could you suggest any other alternatives so that we can either enforce 
 a timeout or throw an exception in case the async producer is taking more 
 than a specified amount of time?
 Regards,
 Dev
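The documented contract above — enqueue within queue.enqueue.timeout.ms or fail — matches a bounded blocking offer. A hypothetical sketch of that contract using a standard blocking queue (an illustration of the expected semantics, not the producer's actual code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of queue.enqueue.timeout.ms semantics: block up to the timeout,
// then report failure (the real producer throws QueueFullException) rather
// than blocking indefinitely.
public class EnqueueSketch {
    public static boolean tryEnqueue(LinkedBlockingQueue<String> queue,
                                     String event, long timeoutMs) {
        try {
            return queue.offer(event, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Under this contract, an enqueue against a full queue should come back in roughly the configured 20 ms, which is why the observed 200-250 ms latencies suggest time is being spent somewhere other than the queue wait.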





Re: Review Request 27391: Fix KAFKA-1634

2014-12-01 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27391/
---

(Updated Dec. 2, 2014, 2:03 a.m.)


Review request for kafka.


Bugs: KAFKA-1634
https://issues.apache.org/jira/browse/KAFKA-1634


Repository: kafka


Description
---

Add another API in the offset manager to return the struct; the cache layer 
will only read its expiration timestamp while the offset formatter will read 
the struct as a whole.


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java 
7517b879866fc5dad5f8d8ad30636da8bbe7784a 
  clients/src/main/java/org/apache/kafka/common/protocol/types/Struct.java 
121e880a941fcd3e6392859edba11a94236494cc 
  
clients/src/main/java/org/apache/kafka/common/requests/OffsetCommitRequest.java 
3ee5cbad55ce836fd04bb954dcf6ef2f9bc3288f 
  
clients/src/test/java/org/apache/kafka/common/requests/RequestResponseTest.java 
df37fc6d8f0db0b8192a948426af603be3444da4 
  core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
050615c72efe7dbaa4634f53943bd73273d20ffb 
  core/src/main/scala/kafka/api/OffsetFetchRequest.scala 
c7604b9cdeb8f46507f04ed7d35fcb3d5f150713 
  core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
4cabffeacea09a49913505db19a96a55d58c0909 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
da29a8cb461099eb675161db2f11a9937424a5c6 
  core/src/main/scala/kafka/server/KafkaApis.scala 
2a1c0326b6e6966d8b8254bd6a1cb83ad98a3b80 
  core/src/main/scala/kafka/server/KafkaServer.scala 
1bf7d10cef23a77e71eb16bf6d0e68bc4ebe 
  core/src/main/scala/kafka/server/OffsetManager.scala 
3c79428962604800983415f6f705e04f52acb8fb 
  core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala 
cd16ced5465d098be7a60498326b2a98c248f343 
  core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
8c5364fa97da1be09973c176d1baeb339455d319 

Diff: https://reviews.apache.org/r/27391/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Commented] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2014-12-01 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230836#comment-14230836
 ] 

Guozhang Wang commented on KAFKA-1634:
--

Updated reviewboard https://reviews.apache.org/r/27391/diff/
 against branch origin/trunk

 Improve semantics of timestamp in OffsetCommitRequests and update 
 documentation
 ---

 Key: KAFKA-1634
 URL: https://issues.apache.org/jira/browse/KAFKA-1634
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: Guozhang Wang
Priority: Blocker
 Fix For: 0.8.3

 Attachments: KAFKA-1634.patch, KAFKA-1634_2014-11-06_15:35:46.patch, 
 KAFKA-1634_2014-11-07_16:54:33.patch, KAFKA-1634_2014-11-17_17:42:44.patch, 
 KAFKA-1634_2014-11-21_14:00:34.patch, KAFKA-1634_2014-12-01_11:44:35.patch, 
 KAFKA-1634_2014-12-01_18:03:12.patch


 From the mailing list -
 following up on this -- I think the online API docs for OffsetCommitRequest
 still incorrectly refer to client-side timestamps:
 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest
 Wasn't that removed, and isn't it now always handled server-side?  Would one of
 the devs mind updating the API spec wiki?





[jira] [Updated] (KAFKA-1634) Improve semantics of timestamp in OffsetCommitRequests and update documentation

2014-12-01 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1634:
-
Attachment: KAFKA-1634_2014-12-01_18:03:12.patch



[jira] [Commented] (KAFKA-1787) Consumer should delete offsets and release partition ownership for deleted topics

2014-12-01 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230875#comment-14230875
 ] 

Jun Rao commented on KAFKA-1787:


Even if we trigger a rebalance in the consumer on topic deletion, it's still 
possible for a deleted topic to be recreated since a consumer could have issued 
a fetch request (with the deleted topic) after the topic is deleted on the 
broker, but before the rebalance is triggered on the consumer. To completely 
solve this problem, we need to add a create topic api and modify the 
TopicMetadataRequest such that it doesn't trigger auto topic creation on the 
broker. This is probably too big a change for 0.8.2 though.

 Consumer should delete offsets and release partition ownership for deleted 
 topics
 -

 Key: KAFKA-1787
 URL: https://issues.apache.org/jira/browse/KAFKA-1787
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Onur Karaman
Priority: Blocker
 Fix For: 0.8.2


 Marking as a blocker for now, but we can evaluate.
 If a topic is deleted, the consumer currently does not rebalance 
 (handleDataDeleted is a TODO in ZooKeeperConsumerConnector)
 As a result, it will continue owning that (deleted) partition and continue to 
 issue fetch requests. Those fetch requests will result in an 
 UnknownTopicOrPartition error and thus will result in TopicMetadataRequests 
 issued by the leader finder thread which can recreate the topic if 
 auto-create is turned on.
 Furthermore if we don't delete the offsets it is possible to lose messages if 
 the topic is recreated immediately and the previously checkpointed offset 
 (for the old data) is small. E.g., if a consumer is at offset 10 on some 
 partition, and if the partition is deleted and recreated and 15 messages are 
 sent to it then the first 10 messages will be lost.
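The arithmetic in the example above generalizes: resuming from a stale checkpoint against a recreated topic skips min(checkpoint, newMessages) of the new messages. A trivial sketch (hypothetical helper, for illustration only):

```java
// Sketch of the offset-loss scenario: a checkpointed offset from the old
// topic incarnation is reused against the recreated topic's log.
public class OffsetLossSketch {
    public static int messagesLost(int staleCheckpoint, int newMessages) {
        // The consumer resumes at the stale checkpoint, skipping everything
        // before it; it cannot lose more messages than were actually sent.
        return Math.min(staleCheckpoint, newMessages);
    }
}
```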





[jira] [Commented] (KAFKA-1784) Implement a ConsumerOffsetClient library

2014-12-01 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230877#comment-14230877
 ] 

Jun Rao commented on KAFKA-1784:


Joel, does this need to be a blocker for 0.8.2?

 Implement a ConsumerOffsetClient library
 

 Key: KAFKA-1784
 URL: https://issues.apache.org/jira/browse/KAFKA-1784
 Project: Kafka
  Issue Type: New Feature
Reporter: Joel Koshy
Assignee: Mayuresh Gharat
Priority: Blocker
 Fix For: 0.8.2


 I think it would be useful to provide an offset client library. It would make 
 the documentation a lot simpler. Right now it is non-trivial to commit/fetch 
 offsets to/from Kafka.





[jira] [Updated] (KAFKA-1803) UncleanLeaderElectionEnableProp in LogConfig should be of boolean

2014-12-01 Thread Dave Parfitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Parfitt updated KAFKA-1803:

Attachment: KAFKA1803.patch

 UncleanLeaderElectionEnableProp in LogConfig should be of boolean
 -

 Key: KAFKA-1803
 URL: https://issues.apache.org/jira/browse/KAFKA-1803
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
  Labels: newbie
 Fix For: 0.8.3

 Attachments: KAFKA1803.patch


 Now that KAFKA-1798 is fixed, we should define 
 UncleanLeaderElectionEnableProp as a boolean, instead of String in LogConfig 
 and get rid of the customized validation for boolean.





[jira] [Commented] (KAFKA-1787) Consumer should delete offsets and release partition ownership for deleted topics

2014-12-01 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231088#comment-14231088
 ] 

Joel Koshy commented on KAFKA-1787:
---

Yeah that is one issue; [~onurkaraman] and I had discussed this offline a few 
days ago. Another obvious issue is that it won't help with consumers that 
happen to be down while the topic deletion happens. Our conclusion was that 
ultimately some kind of topic versioning in combination with a create-topic API 
may be necessary.

So I think we can close this as out of scope for 0.8.2




[jira] [Resolved] (KAFKA-1787) Consumer should delete offsets and release partition ownership for deleted topics

2014-12-01 Thread Joel Koshy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Koshy resolved KAFKA-1787.
---
Resolution: Won't Fix
