[jira] [Created] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)
Artur Denysenko created KAFKA-1585:
--

 Summary: Client: Infinite conflict in /consumers/
 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko


Periodically we have kafka consumers cycling in a conflict in /consumers/.
When the process is restarted, the kafka consumers work perfectly.








[jira] [Updated] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Denysenko updated KAFKA-1585:
---

Description: 
Periodically we have kafka consumers cycling in a conflict in /consumers/.
Please see attached log extract.
After restarting the process the kafka consumers work perfectly.

We are using Zookeeper 3.4.5





  was:
Periodically we have kafka consumers cycling in a conflict in /consumers/.
When the process is restarted, the kafka consumers work perfectly.





 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling in a conflict in /consumers/.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Updated] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Denysenko updated KAFKA-1585:
---

Component/s: consumer

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling in a conflict in /consumers/.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Updated] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Denysenko updated KAFKA-1585:
---

Description: 
Periodically we have kafka consumers cycling with "conflict in /consumers/" and
"I wrote this conflicted ephemeral node" log messages.
Please see attached log extract.
After restarting the process the kafka consumers work perfectly.

We are using Zookeeper 3.4.5




  was:
Periodically we have kafka consumers cycling in a conflict in /consumers/.
Please see attached log extract.
After restarting the process the kafka consumers work perfectly.

We are using Zookeeper 3.4.5






 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling with "conflict in /consumers/" and
 "I wrote this conflicted ephemeral node" log messages.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Updated] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Denysenko updated KAFKA-1585:
---

Priority: Critical  (was: Major)

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling in a conflict in /consumers/.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Updated] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1585:
-

Fix Version/s: 0.8.2

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Fix For: 0.8.2

 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling with "conflict in /consumers/" and
 "I wrote this conflicted ephemeral node" log messages.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Commented] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092050#comment-14092050
 ] 

Joe Stein commented on KAFKA-1585:
--

FWIW, there were a lot of bug fixes released in Zookeeper 3.4.6 
(http://zookeeper.apache.org/doc/r3.4.6/releasenotes.html) since the 3.4.5 version.

You could be hitting ZOOKEEPER-1382, which was fixed in the 3.4.6 release.

The Zookeeper version currently recommended for Kafka 0.8.1.1 is listed at 
https://kafka.apache.org/documentation.html#zk, though folks are using 3.4.6 in 
production and that should be the recommended Zookeeper version for 0.8.2.

Regarding your logs, before this happened it looks like you had errors, then a 
reconnect and a consumer shutdown:

Line 132356: 18:31:38,948 [7-cloudera:2181] INFO  kafka.utils.Logging$class - 
[Q_dev-1407608193903-1cb30b18], Q_dev-1407608193903-1cb30b18-0 attempting to 
claim partition 0
Line 132357: 18:31:38,975 [26-d7f0e66a-0-0] ERROR kafka.utils.Logging$class - 
[ConsumerFetcherThread-Q_dev-1407608195226-d7f0e66a-0-0], Current offset 15 for 
partition [gk.q.event,0] out of range; reset offset to 0
Line 132358: 18:31:38,980 [62-1d81f64b-0-0] ERROR kafka.utils.Logging$class - 
[ConsumerFetcherThread-Q_dev-1407608193962-1d81f64b-0-0], Current offset 4 for 
partition [gk.q.mail.api,0] out of range; reset offset to 0
Line 132359: 18:31:38,994 [84-ceea5788-0-0] WARN  kafka.utils.Logging$class - 
Reconnect due to socket error: null
Line 132360: 18:31:38,995 [84-ceea5788-0-0] INFO  kafka.utils.Logging$class - 
[ConsumerFetcherThread-dev_dev-1407608194884-ceea5788-0-0], Stopped 
Line 132361: 18:31:38,995 [atcher_executor] INFO  kafka.utils.Logging$class - 
[ConsumerFetcherThread-dev_dev-1407608194884-ceea5788-0-0], Shutdown completed
Line 132362: 18:31:38,995 [atcher_executor] INFO  kafka.utils.Logging$class - 
[ConsumerFetcherManager-1407608194890] All connections stopped
Line 132363: 18:31:38,996 [atcher_executor] INFO  kafka.utils.Logging$class - 
[dev_dev-1407608194884-ceea5788], Cleared all relevant queues for this fetcher
Line 132364: 18:31:38,996 [atcher_executor] INFO  kafka.utils.Logging$class - 
[dev_dev-1407608194884-ceea5788], Cleared the data chunks in all the consumer 
message iterators
Line 132365: 18:31:38,996 [atcher_executor] INFO  kafka.utils.Logging$class - 
[dev_dev-1407608194884-ceea5788], Committing all offsets after clearing the 
fetcher queues
Line 132366: 18:31:38,996 [atcher_executor] INFO  kafka.utils.Logging$class - 
[dev_dev-1407608194884-ceea5788], Releasing partition ownership
Line 132367: 18:31:39,005 [7-cloudera:2181] INFO  kafka.utils.Logging$class - 
conflict in /consumers/Q/owners/gk.q.log/0 data: Q_dev-1407608193903-1cb30b18-0 
stored data: Q_dev-1407608205503-9cfb99aa-0

What likely happened is that when it reconnected, the timeout with zk never occurred 
and it got stuck there. It could be the Zk bug; it could also be related somewhat to 
KAFKA-1387 or KAFKA-1451. I will link the JIRAs so that when we test 0.8.2 we can see 
about reproducing this on a good zk version.

To resolve that, you can stop the consumers, wait for the zk nodes to expire, and 
start up the consumers again.
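
As a concrete illustration of that workaround, here is a minimal sketch using the 
zkclient library that Kafka 0.8 ships with; the connect string is a placeholder, and 
the owner path is taken from the log extract above:

{code}
import org.I0Itec.zkclient.ZkClient

// After stopping every consumer in the group, poll ZooKeeper until the stale
// ephemeral owner node expires, then it is safe to restart the consumers.
val zkClient = new ZkClient("localhost:2181", 6000, 6000)
val ownerPath = "/consumers/Q/owners/gk.q.log/0" // path from the log extract
while (zkClient.exists(ownerPath)) {
  Thread.sleep(1000) // the node disappears once the old session times out
}
zkClient.close()
{code}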



 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Fix For: 0.8.2

 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling in conflict in /consumers/ and 
 I wrote this conflicted ephemeral node. 
 Please see attached log extract.
 After restarting the process kafka consumers are working perfectly. 
 We are using Zookeeper 3.4.5





[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race

2014-08-10 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092051#comment-14092051
 ] 

Joe Stein commented on KAFKA-1451:
--

Hi, two issues have been found so far with leader election: 
https://issues.apache.org/jira/browse/KAFKA-1387?focusedCommentId=14087063&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14087063
I don't know yet if the issues are related to each other, or even to this one. The 
issues found were not happening on the 0.8.1 branch; it could be another 0.8.2 patch, 
I suppose. But before I start trying to test on a 0.8.2 version without this patch 
(to isolate the root cause), I wanted to see if this type of scenario was tested, and 
what the general thoughts were on this patch and how it might be affecting either of 
the two issues found in 0.8.2 trunk.

 Broker stuck due to leader election race 
 -

 Key: KAFKA-1451
 URL: https://issues.apache.org/jira/browse/KAFKA-1451
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Maciek Makowski
Assignee: Manikumar Reddy
Priority: Minor
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch, 
 KAFKA-1451_2014-07-29_10:13:23.patch


 h3. Symptoms
 The broker does not become available due to being stuck in an infinite loop 
 while electing leader. This can be recognised by the following line being 
 repeatedly written to server.log:
 {code}
 [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node 
 [{"version":1,"brokerid":1,"timestamp":"1400060079108"}] at /controller a 
 while back in a different session, hence I will backoff for this node to be 
 deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 {code}
 h3. Steps to Reproduce
 In a single kafka 0.8.1.1 node, single zookeeper 3.4.6 (but will likely 
 behave the same with the ZK version included in Kafka distribution) node 
 setup:
 # start both zookeeper and kafka (in any order)
 # stop zookeeper
 # stop kafka
 # start kafka
 # start zookeeper
 h3. Likely Cause
 {{ZookeeperLeaderElector}} subscribes to data changes on startup, and then 
 triggers an election. If the deletion of the ephemeral {{/controller}} node 
 associated with previous zookeeper session of the broker happens after 
 subscription to changes in new session, election will be invoked twice, once 
 from {{startup}} and once from {{handleDataDeleted}}:
 * {{startup}}: acquire {{controllerLock}}
 * {{startup}}: subscribe to data changes
 * zookeeper: delete {{/controller}} since the session that created it timed 
 out
 * {{handleDataDeleted}}: {{/controller}} was deleted
 * {{handleDataDeleted}}: wait on {{controllerLock}}
 * {{startup}}: elect -- writes {{/controller}}
 * {{startup}}: release {{controllerLock}}
 * {{handleDataDeleted}}: acquire {{controllerLock}}
 * {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then 
 gets into infinite loop as a result of conflict
 {{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing 
 znode was written from different session, which is not true in this case; it 
 was written from the same session. That adds to the confusion.
 h3. Suggested Fix
 In {{ZookeeperLeaderElector.startup}} first run {{elect}} and then subscribe 
 to data changes.
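
A minimal sketch of the suggested reordering, using simplified names based on the 
description above (this is illustrative, not the actual patch):

{code}
def startup {
  inLock(controllerContext.controllerLock) {
    // Write /controller *before* watching it, so a deletion queued from the
    // previous session cannot trigger a second, conflicting election.
    elect
    controllerContext.zkClient.subscribeDataChanges(electionPath, leaderChangeListener)
  }
}
{code}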





Re: Review Request 24006: Patch for KAFKA-1420

2014-08-10 Thread Jonathan Natkins

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24006/#review50126
---



core/src/test/scala/unit/kafka/admin/AdminTest.scala
https://reviews.apache.org/r/24006/#comment87702

I wasn't totally sure I understood this comment, so I made a change that I 
think reflects what you were looking for. Let me know if I missed the mark.


- Jonathan Natkins


On Aug. 10, 2014, 9:11 p.m., Jonathan Natkins wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24006/
 ---
 
 (Updated Aug. 10, 2014, 9:11 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1420
 https://issues.apache.org/jira/browse/KAFKA-1420
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK 
 with TestUtils.createTopic in unit tests
 
 
 Diffs
 -
 
   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
 e28979827110dfbbb92fe5b152e7f1cc973de400 
   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc 
   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
 f44568cb25edf25db857415119018fd4c9922f61 
   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
 c4e13c5240c8303853d08cc3b40088f8c7dae460 
 
 Diff: https://reviews.apache.org/r/24006/diff/
 
 
 Testing
 ---
 
 Automated
 
 
 Thanks,
 
 Jonathan Natkins
 




[jira] [Commented] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests

2014-08-10 Thread Jonathan Natkins (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092203#comment-14092203
 ] 

Jonathan Natkins commented on KAFKA-1420:
-

Updated reviewboard https://reviews.apache.org/r/24006/diff/
 against branch origin/trunk

 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with 
 TestUtils.createTopic in unit tests
 --

 Key: KAFKA-1420
 URL: https://issues.apache.org/jira/browse/KAFKA-1420
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1420.patch, KAFKA-1420_2014-07-30_11:18:26.patch, 
 KAFKA-1420_2014-07-30_11:24:55.patch, KAFKA-1420_2014-08-02_11:04:15.patch, 
 KAFKA-1420_2014-08-10_14:12:05.patch


 This is a follow-up JIRA from KAFKA-1389.
 There are a bunch of places in the unit tests where we misuse 
 AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, 
 where TestUtils.createTopic needs to be used instead.





[jira] [Commented] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092205#comment-14092205
 ] 

Jun Rao commented on KAFKA-1510:


Thinking about this a bit more, would it be more reliable to do the expiration 
of an offset based on the last connect time from the client, instead of the 
last time the offset is modified? In the new consumer, we will be tracking the 
set of consumers per consumer group on the broker. We can expire an offset if 
the time since the last time the partition was actively owned by a consumer 
exceeds the threshold. Handling consumer coordinator failover can be a bit 
tricky. We can probably just start doing the expiration countdown from the 
beginning during the failover. This means that the removal of some of the 
offsets may be delayed. This may be ok since the consumer coordinator failover 
should be rare.
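
A small sketch of that rule (illustrative only; the case class and parameter names 
below are assumptions, not actual broker code):

{code}
// Expire an offset entry based on when the partition was last actively owned
// by a consumer, rather than on when the offset was last committed.
case class OffsetEntry(offset: Long, lastOwnedTimeMs: Long)

def expired(entry: OffsetEntry, nowMs: Long, retentionMs: Long): Boolean =
  nowMs - entry.lastOwnedTimeMs > retentionMs
{code}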

 Force offset commits when migrating consumer offsets from zookeeper to kafka
 

 Key: KAFKA-1510
 URL: https://issues.apache.org/jira/browse/KAFKA-1510
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Joel Koshy
  Labels: newbie
 Fix For: 0.8.2

 Attachments: kafka-1510.patch


 When migrating consumer offsets from ZooKeeper to kafka, we have to turn on 
 dual-commit (i.e., the consumers will commit offsets to both zookeeper and 
 kafka) in addition to setting offsets.storage to kafka. However, when we 
 commit offsets we only commit offsets if they have changed (since the last 
 commit). For low-volume topics or for topics that receive data in bursts 
 offsets may not move for a long period of time. Therefore we may want to 
 force the commit (even if offsets have not changed) when migrating (i.e., 
 when dual-commit is enabled) - we can add a minimum interval threshold (say 
 force commit after every 10 auto-commits) as well as on rebalance and 
 shutdown.
 Also, I think it is safe to switch the default for offsets.storage from 
 zookeeper to kafka and set the default to dual-commit (for people who have 
 not migrated yet). We have deployed this to the largest consumers at linkedin 
 and have not seen any issues so far (except for the migration caveat that 
 this jira will resolve).





[jira] [Commented] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-08-10 Thread Artur Denysenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092207#comment-14092207
 ] 

Artur Denysenko commented on KAFKA-1585:


I've tried with Zookeeper 3.4.6 - same problem.

KAFKA-1029 has some comments related to consumers as well: 
https://issues.apache.org/jira/browse/KAFKA-1029?focusedCommentId=13944775

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Fix For: 0.8.2

 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling with "conflict in /consumers/" and
 "I wrote this conflicted ephemeral node" log messages.
 Please see attached log extract.
 After restarting the process the kafka consumers work perfectly.
 We are using Zookeeper 3.4.5





[jira] [Created] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-10 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-1586:
--

 Summary: support sticky partitioning in the new producer
 Key: KAFKA-1586
 URL: https://issues.apache.org/jira/browse/KAFKA-1586
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao


If a message doesn't specify a key or a partition, the new producer selects a 
partition for each message in a round-robin way. As a result, in a window of 
linger.ms, messages are spread around in all partitions of a topic. Compared 
with another strategy that assigns all messages to a single partition in the 
same time window, this strategy may not compress the message set as well since 
the batch is smaller. Another potential problem with this strategy is that the 
compression ratio could be sensitive to the change of # partitions in a topic. 
If # partitions are increased in a topic, the produced data may not be 
compressed as well as before. 





[jira] [Commented] (KAFKA-1586) support sticky partitioning in the new producer

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092237#comment-14092237
 ] 

Jun Rao commented on KAFKA-1586:


One way to address this issue is to introduce a new config, 
partition.sticky.time.ms, in the new producer. The producer will then stick to 
a partition for the configured amount of time before switching to another. 
partition.sticky.time.ms can default to 0, which means every message will 
switch to a new partition.
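
A rough sketch of the proposed behavior (hypothetical code, not the producer's 
actual partitioner API):

{code}
// Stick to one partition for stickyTimeMs, then move to the next. With
// stickyTimeMs = 0 (the proposed default) every message switches partitions,
// matching the current round-robin behavior.
class StickyPartitioner(stickyTimeMs: Long) {
  private var currentPartition = 0
  private var lastSwitchMs = 0L

  def partition(numPartitions: Int, nowMs: Long): Int = {
    if (nowMs - lastSwitchMs >= stickyTimeMs) {
      currentPartition = (currentPartition + 1) % numPartitions
      lastSwitchMs = nowMs
    }
    currentPartition
  }
}
{code}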

 support sticky partitioning in the new producer
 ---

 Key: KAFKA-1586
 URL: https://issues.apache.org/jira/browse/KAFKA-1586
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao

 If a message doesn't specify a key or a partition, the new producer selects a 
 partition for each message in a round-robin way. As a result, in a window of 
 linger.ms, messages are spread around in all partitions of a topic. Compared 
 with another strategy that assigns all messages to a single partition in the 
 same time window, this strategy may not compress the message set as well 
 since the batch is smaller. Another potential problem with this strategy is 
 that the compression ratio could be sensitive to the change of # partitions 
 in a topic. If # partitions are increased in a topic, the produced data may 
 not be compressed as well as before. 





[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092253#comment-14092253
 ] 

Jun Rao commented on KAFKA-1451:


Joe,

KAFKA-1387 seems to be related to broker registration and this jira only fixes 
how the controller is registered in ZK. So, I am not sure if they are related.

 Broker stuck due to leader election race 
 -

 Key: KAFKA-1451
 URL: https://issues.apache.org/jira/browse/KAFKA-1451
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Maciek Makowski
Assignee: Manikumar Reddy
Priority: Minor
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch, 
 KAFKA-1451_2014-07-29_10:13:23.patch


 h3. Symptoms
 The broker does not become available due to being stuck in an infinite loop 
 while electing leader. This can be recognised by the following line being 
 repeatedly written to server.log:
 {code}
 [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node 
 [{"version":1,"brokerid":1,"timestamp":"1400060079108"}] at /controller a 
 while back in a different session, hence I will backoff for this node to be 
 deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 {code}
 h3. Steps to Reproduce
 In a single kafka 0.8.1.1 node, single zookeeper 3.4.6 (but will likely 
 behave the same with the ZK version included in Kafka distribution) node 
 setup:
 # start both zookeeper and kafka (in any order)
 # stop zookeeper
 # stop kafka
 # start kafka
 # start zookeeper
 h3. Likely Cause
 {{ZookeeperLeaderElector}} subscribes to data changes on startup, and then 
 triggers an election. If the deletion of the ephemeral {{/controller}} node 
 associated with previous zookeeper session of the broker happens after 
 subscription to changes in new session, election will be invoked twice, once 
 from {{startup}} and once from {{handleDataDeleted}}:
 * {{startup}}: acquire {{controllerLock}}
 * {{startup}}: subscribe to data changes
 * zookeeper: delete {{/controller}} since the session that created it timed 
 out
 * {{handleDataDeleted}}: {{/controller}} was deleted
 * {{handleDataDeleted}}: wait on {{controllerLock}}
 * {{startup}}: elect -- writes {{/controller}}
 * {{startup}}: release {{controllerLock}}
 * {{handleDataDeleted}}: acquire {{controllerLock}}
 * {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then 
 gets into infinite loop as a result of conflict
 {{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing 
 znode was written from different session, which is not true in this case; it 
 was written from the same session. That adds to the confusion.
 h3. Suggested Fix
 In {{ZookeeperLeaderElector.startup}} first run {{elect}} and then subscribe 
 to data changes.





[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092255#comment-14092255
 ] 

Jun Rao commented on KAFKA-1387:


Hmm, this seems really weird. Not sure why starting two brokers at the same 
time will affect the ZK registration. Is this reproducible by running multiple 
brokers on the same machine?

 Kafka getting stuck creating ephemeral node it has already created when two 
 zookeeper sessions are established in a very short period of time
 -

 Key: KAFKA-1387
 URL: https://issues.apache.org/jira/browse/KAFKA-1387
 Project: Kafka
  Issue Type: Bug
Reporter: Fedor Korotkiy

 Kafka broker re-registers itself in zookeeper every time handleNewSession() 
 callback is invoked.
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
  
 Now imagine the following sequence of events.
 1) Zookeeper session reestablishes. handleNewSession() callback is queued by 
 the zkClient, but not invoked yet.
 2) Zookeeper session reestablishes again, queueing callback second time.
 3) First callback is invoked, creating /broker/[id] ephemeral path.
 4) Second callback is invoked and it tries to create /broker/[id] path using 
 createEphemeralPathExpectConflictHandleZKBug() function. But the path 
 already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
 stuck in an infinite loop.
 Seems like the controller election code has the same issue.
 I'm able to reproduce this issue on the 0.8.1 branch from github using the 
 following configs.
 # zookeeper
 tickTime=10
 dataDir=/tmp/zk/
 clientPort=2101
 maxClientCnxns=0
 # kafka
 broker.id=1
 log.dir=/tmp/kafka
 zookeeper.connect=localhost:2101
 zookeeper.connection.timeout.ms=100
 zookeeper.sessiontimeout.ms=100
 Just start kafka and zookeeper and then pause zookeeper several times using 
 Ctrl-Z.
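
A simplified sketch of the retry loop being described; the method name and structure 
below are approximations of the behavior, not the actual kafka.utils.ZkUtils code:

{code}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException

// The helper assumes a conflicting ephemeral node belongs to an *older* session
// and will eventually be expired by ZooKeeper. If the node was in fact created
// by our own (current) session, it never expires and this loop spins forever.
def createEphemeralExpectingExpiry(zk: ZkClient, path: String, data: String,
                                   backoffMs: Long): Unit = {
  while (true) {
    try {
      zk.createEphemeral(path, data)
      return
    } catch {
      case e: ZkNodeExistsException =>
        Thread.sleep(backoffMs) // back off, expecting ZooKeeper to expire the node
    }
  }
}
{code}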





Re: Review Request 24214: Patch for KAFKA-1374

2014-08-10 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review50128
---



core/src/main/scala/kafka/log/LogCleaner.scala
https://reviews.apache.org/r/24214/#comment87704

Thinking about this a bit more. I am wondering if it would be better if we 
introduce a per-topic level log.compact.compress.codec property. During log 
compaction, we always write the retained data using the specified compress 
codec, independent of whether the original records are compressed or not. This 
provides the following benefits.

1. Whether or not the messages were compressed originally, they can be compressed 
on the broker side over time. Since compact topics preserve records much 
longer, enabling compression on the broker side will be beneficial in general.

2. As old records are removed, we still want to batch enough messages to do 
the compression.

3. The code can be a bit simpler. We can just (deep) iterate messages 
(using MemoryRecords.iterator) and append retained messages to an output 
MemoryRecords. The output MemoryRecords will be initialized with the configured 
compress codec and batch size.
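
A hedged sketch of the shape of point 3, with stand-in types (the names here are 
illustrative only; the real code would operate on MemoryRecords):

{code}
// Deep-iterate the (possibly compressed) source and append every retained
// entry to an output buffer created with the compact codec and batch size,
// so retained data is always re-compressed with the configured codec.
case class Entry(offset: Long, key: Array[Byte], value: Array[Byte])

def compact(source: Iterator[Entry], shouldRetain: Entry => Boolean,
            append: Entry => Unit): Unit = {
  for (entry <- source if shouldRetain(entry))
    append(entry)
}
{code}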


- Jun Rao


On Aug. 9, 2014, 10:51 a.m., Manikumar Reddy O wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24214/
 ---
 
 (Updated Aug. 9, 2014, 10:51 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1374
 https://issues.apache.org/jira/browse/KAFKA-1374
 
 
 Repository: kafka
 
 
 Description
 ---
 
Addressed Jun's comments; added a few changes in LogCleaner stats for compressed 
messages
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 
 5bfa764638e92f217d0ff7108ec8f53193c22978 
 
 Diff: https://reviews.apache.org/r/24214/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Manikumar Reddy O
 




Re: Review Request 24510: Patch for KAFKA-1582

2014-08-10 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24510/#review50127
---


Thanks for the patch. Could you take a look at the following comment?


system_test/utils/kafka_system_test_utils.py
https://reviews.apache.org/r/24510/#comment87703

Does this work as expected?

I tried the following test. However, the pid in the file tt is not the pid 
of the sleep process.

echo $! > tt & sleep 100


- Jun Rao


On Aug. 8, 2014, 9:50 p.m., Dong Lin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24510/
 ---
 
 (Updated Aug. 8, 2014, 9:50 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1582
 https://issues.apache.org/jira/browse/KAFKA-1582
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1582; System Test should wait for producer to finish
 
 
 Diffs
 -
 
   system_test/utils/kafka_system_test_utils.py 
 6edd64a92ce749421b5ab60fbea5b55e2ccbfe94 
 
 Diff: https://reviews.apache.org/r/24510/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Dong Lin
 




[jira] [Commented] (KAFKA-1587) Possible Memory Leak when we use Kafka 8.0 Producer for sending messages

2014-08-10 Thread Gopinath Sundaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092292#comment-14092292
 ] 

Gopinath Sundaram commented on KAFKA-1587:
--

I found this issue https://issues.apache.org/jira/browse/KAFKA-1024 and hence I 
double-checked the GC again. We are using Parallel GC only.

 Possible Memory Leak when we use Kafka 8.0 Producer for sending messages
 

 Key: KAFKA-1587
 URL: https://issues.apache.org/jira/browse/KAFKA-1587
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Gopinath Sundaram
Priority: Critical

 Hi Kafka team,
 We use Kafka to send messages in a high-volume, memory-intensive application 
 which uses Parallel GC. We send messages at a rate of 12500/min in the 
 first few hours, and then the number of messages drops to 6000/min. Our 
 application usually runs for a maximum of 24 hours.
 What we have:
 1) When we do not send messages through Kafka Producer 0.8, our 
 application never slows down much and our entire process completes within 24 
 hours.
 2) When we use Kafka, our machines slow down to sending around 2500 
 messages/min, and as time progresses even fewer messages are sent.
 3) We suspect that our application spends more time in GC and hence the 
 problem. The heap dump does not contain a leak suspect involving Kafka, but this 
 slowness happens only when the Kafka messaging system is used.
 Any pointers that could help us resolve this issue will be highly appreciated.





[jira] [Resolved] (KAFKA-1419) cross build for scala 2.11

2014-08-10 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-1419.


Resolution: Fixed

Thanks for the patch. +1. Committed to trunk after fixing README and the 2.11 
compilation warning.

 cross build for scala 2.11
 --

 Key: KAFKA-1419
 URL: https://issues.apache.org/jira/browse/KAFKA-1419
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.8.1
Reporter: Scott Clasen
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1419.patch, KAFKA-1419.patch, 
 KAFKA-1419_2014-07-28_15:05:16.patch, KAFKA-1419_2014-07-29_15:13:43.patch, 
 KAFKA-1419_2014-08-04_14:43:26.patch, KAFKA-1419_2014-08-05_12:51:16.patch, 
 KAFKA-1419_2014-08-07_10:17:34.patch, KAFKA-1419_2014-08-07_10:52:18.patch


 Please publish builds for scala 2.11, hopefully just needs a small tweak to 
 the gradle conf?





Re: Review Request 24510: Patch for KAFKA-1582

2014-08-10 Thread Dong Lin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24510/
---

(Updated Aug. 11, 2014, 4:23 a.m.)


Review request for kafka.


Bugs: KAFKA-1582
https://issues.apache.org/jira/browse/KAFKA-1582


Repository: kafka


Description
---

KAFKA-1582; System Test should wait for producer to finish


Diffs (updated)
-

  system_test/utils/kafka_system_test_utils.py 
6edd64a92ce749421b5ab60fbea5b55e2ccbfe94 

Diff: https://reviews.apache.org/r/24510/diff/


Testing
---


Thanks,

Dong Lin



Re: Review Request 24006: Patch for KAFKA-1420

2014-08-10 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24006/#review50137
---


Thanks for the patch. One comment below.


core/src/test/scala/unit/kafka/utils/TestUtils.scala
https://reviews.apache.org/r/24006/#comment87735

Now that KAFKA-1419 is committed, could we remove this?


- Jun Rao


On Aug. 10, 2014, 9:11 p.m., Jonathan Natkins wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24006/
 ---
 
 (Updated Aug. 10, 2014, 9:11 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1420
 https://issues.apache.org/jira/browse/KAFKA-1420
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK 
 with TestUtils.createTopic in unit tests
 
 
 Diffs
 -
 
   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
 e28979827110dfbbb92fe5b152e7f1cc973de400 
   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc 
   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
 f44568cb25edf25db857415119018fd4c9922f61 
   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
 c4e13c5240c8303853d08cc3b40088f8c7dae460 
 
 Diff: https://reviews.apache.org/r/24006/diff/
 
 
 Testing
 ---
 
 Automated
 
 
 Thanks,
 
 Jonathan Natkins
 




[jira] [Commented] (KAFKA-1582) System Test should wait for producer to finish

2014-08-10 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092428#comment-14092428
 ] 

Dong Lin commented on KAFKA-1582:
-

Updated reviewboard https://reviews.apache.org/r/24510/diff/
 against branch origin/trunk

 System Test should wait for producer to finish
 --

 Key: KAFKA-1582
 URL: https://issues.apache.org/jira/browse/KAFKA-1582
 Project: Kafka
  Issue Type: Bug
Reporter: Dong Lin
Assignee: Dong Lin
 Attachments: KAFKA-1582.patch, KAFKA-1582_2014-08-10_21:24:07.patch


 1) start_producer_in_thread() does not wait for producer to finish before 
 creating the next producer process. And producers may be killed before they 
 finish.
 2) Replace tab with spaces in kafka_system_test_utils.py.





Re: Review Request 24214: Patch for KAFKA-1374

2014-08-10 Thread Manikumar Reddy O


 On Aug. 10, 2014, 11:44 p.m., Jun Rao wrote:
  core/src/main/scala/kafka/log/LogCleaner.scala, lines 400-420
  https://reviews.apache.org/r/24214/diff/4/?file=657033#file657033line400
 
  Thinking about this a bit more. I am wondering if it would be better if 
  we introduce a per-topic level log.compact.compress.codec property. During 
  log compaction, we always write the retained data using the specified 
  compress codec, independent of whether the original records are compressed 
  or not. This provides the following benefits.
  
  1. Whether or not the messages were compressed originally, they can be 
  compressed on the broker side over time. Since compact topics preserve 
  records much longer, enabling compression on the broker side will be 
  beneficial in general.
  
  2. As old records are removed, we still want to batch enough messages 
  to do the compression.
  
  3. The code can be a bit simpler. We can just (deep) iterate messages 
  (using MemoryRecords.iterator) and append retained messages to an output 
  MemoryRecords. The output MemoryRecords will be initialized with the 
  configured compress codec and batch size.

What you proposed is similar to KAFKA-1499, which deals with default 
broker-side compression configuration.
I proposed new configuration properties on KAFKA-1499. The idea is to compress 
the data upon reaching the server.
This is applicable to all topics (log compaction and retention).

Can you comment on KAFKA-1499?


- Manikumar Reddy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review50128
---


On Aug. 9, 2014, 10:51 a.m., Manikumar Reddy O wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24214/
 ---
 
 (Updated Aug. 9, 2014, 10:51 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1374
 https://issues.apache.org/jira/browse/KAFKA-1374
 
 
 Repository: kafka
 
 
 Description
 ---
 
  Addressed Jun's comments; added a few changes in LogCleaner stats for compressed 
  messages
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/log/LogCleaner.scala 
 c20de4ad4734c0bd83c5954fdb29464a27b91dff 
   core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 
 5bfa764638e92f217d0ff7108ec8f53193c22978 
 
 Diff: https://reviews.apache.org/r/24214/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Manikumar Reddy O
 




Re: Review Request 24510: Patch for KAFKA-1582

2014-08-10 Thread Dong Lin


 On Aug. 11, 2014, 12:06 a.m., Jun Rao wrote:
  system_test/utils/kafka_system_test_utils.py, lines 1121-1124
  https://reviews.apache.org/r/24510/diff/1/?file=656792#file656792line1121
 
  Does this work as expected?
  
  I tried the following test. However, the pid in the file tt is not the 
  pid of the sleep process.
  
   echo $! > tt & sleep 100

Oh, you are right. echo $! actually outputs the pid of the last background process 
started before it, thus not the pid of the sleep process here.

I just uploaded a new patch to fix the problem. This time I have verified that 
1) the system test waits for ProducerPerformance to finish, and 2) the pid of 
ProducerPerformance is indeed dumped to the pid file.

Please review the updated patch. Thanks!


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24510/#review50127
---


On Aug. 11, 2014, 4:23 a.m., Dong Lin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24510/
 ---
 
 (Updated Aug. 11, 2014, 4:23 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1582
 https://issues.apache.org/jira/browse/KAFKA-1582
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1582; System Test should wait for producer to finish
 
 
 Diffs
 -
 
   system_test/utils/kafka_system_test_utils.py 
 6edd64a92ce749421b5ab60fbea5b55e2ccbfe94 
 
 Diff: https://reviews.apache.org/r/24510/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Dong Lin
 




[jira] [Updated] (KAFKA-1574) unit tests can hang on socketserver shutdown

2014-08-10 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1574:
---

Summary: unit tests can hang on socketserver shutdown  (was: unit tests can 
hand on socketserver shutdown)

 unit tests can hang on socketserver shutdown
 

 Key: KAFKA-1574
 URL: https://issues.apache.org/jira/browse/KAFKA-1574
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
  Labels: newbie++

 Saw the following stacktrace.
 kafka-network-thread-59843-2 prio=5 tid=7fc7e5943800 nid=0x11eefa000 
 runnable [11eef9000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
 at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
 at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 7f4a80328 (a sun.nio.ch.Util$2)
 - locked 7f4a80310 (a java.util.Collections$UnmodifiableSet)
 - locked 7f4a71968 (a sun.nio.ch.KQueueSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at kafka.network.Processor.run(SocketServer.scala:296)
 at java.lang.Thread.run(Thread.java:695)
 Test worker prio=5 tid=7fc7e50d4800 nid=0x11534c000 waiting on condition 
 [115349000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  7f4a69d50 (a 
 java.util.concurrent.CountDownLatch$Sync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
 at kafka.network.AbstractServerThread.shutdown(SocketServer.scala:113)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:92)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:91)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
 at kafka.network.SocketServer.shutdown(SocketServer.scala:91)
 at 
 kafka.server.KafkaServer$$anonfun$shutdown$3.apply$mcV$sp(KafkaServer.scala:246)
 at kafka.utils.Utils$.swallow(Utils.scala:172)
 at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
 at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
 at kafka.utils.Logging$class.swallow(Logging.scala:94)
 at kafka.utils.Utils$.swallow(Utils.scala:45)
 at kafka.server.KafkaServer.shutdown(KafkaServer.scala:246)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)





Review Request 24540: Patch for KAFKA-1574

2014-08-10 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24540/
---

Review request for kafka.


Bugs: KAFKA-1574
https://issues.apache.org/jira/browse/KAFKA-1574


Repository: kafka


Description
---

set alive to true only during initialization


Diffs
-

  core/src/main/scala/kafka/network/SocketServer.scala 
8e99de08a443ce7cb032968c602655444ee30dce 

Diff: https://reviews.apache.org/r/24540/diff/


Testing
---


Thanks,

Jun Rao



[jira] [Commented] (KAFKA-1574) unit tests can hang on socketserver shutdown

2014-08-10 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092436#comment-14092436
 ] 

Jun Rao commented on KAFKA-1574:


Created reviewboard https://reviews.apache.org/r/24540/
 against branch origin/trunk

 unit tests can hang on socketserver shutdown
 

 Key: KAFKA-1574
 URL: https://issues.apache.org/jira/browse/KAFKA-1574
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
  Labels: newbie++
 Attachments: KAFKA-1574.patch


 Saw the following stacktrace.
 kafka-network-thread-59843-2 prio=5 tid=7fc7e5943800 nid=0x11eefa000 
 runnable [11eef9000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
 at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
 at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 7f4a80328 (a sun.nio.ch.Util$2)
 - locked 7f4a80310 (a java.util.Collections$UnmodifiableSet)
 - locked 7f4a71968 (a sun.nio.ch.KQueueSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at kafka.network.Processor.run(SocketServer.scala:296)
 at java.lang.Thread.run(Thread.java:695)
 Test worker prio=5 tid=7fc7e50d4800 nid=0x11534c000 waiting on condition 
 [115349000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  7f4a69d50 (a 
 java.util.concurrent.CountDownLatch$Sync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
 at kafka.network.AbstractServerThread.shutdown(SocketServer.scala:113)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:92)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:91)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
 at kafka.network.SocketServer.shutdown(SocketServer.scala:91)
 at 
 kafka.server.KafkaServer$$anonfun$shutdown$3.apply$mcV$sp(KafkaServer.scala:246)
 at kafka.utils.Utils$.swallow(Utils.scala:172)
 at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
 at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
 at kafka.utils.Logging$class.swallow(Logging.scala:94)
 at kafka.utils.Utils$.swallow(Utils.scala:45)
 at kafka.server.KafkaServer.shutdown(KafkaServer.scala:246)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)





[jira] [Updated] (KAFKA-1574) unit tests can hang on socketserver shutdown

2014-08-10 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-1574:
---

Attachment: KAFKA-1574.patch

 unit tests can hang on socketserver shutdown
 

 Key: KAFKA-1574
 URL: https://issues.apache.org/jira/browse/KAFKA-1574
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
  Labels: newbie++
 Attachments: KAFKA-1574.patch


 Saw the following stacktrace.
 kafka-network-thread-59843-2 prio=5 tid=7fc7e5943800 nid=0x11eefa000 
 runnable [11eef9000]
java.lang.Thread.State: RUNNABLE
 at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
 at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
 at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
 at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 - locked 7f4a80328 (a sun.nio.ch.Util$2)
 - locked 7f4a80310 (a java.util.Collections$UnmodifiableSet)
 - locked 7f4a71968 (a sun.nio.ch.KQueueSelectorImpl)
 at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 at kafka.network.Processor.run(SocketServer.scala:296)
 at java.lang.Thread.run(Thread.java:695)
 Test worker prio=5 tid=7fc7e50d4800 nid=0x11534c000 waiting on condition 
 [115349000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  7f4a69d50 (a 
 java.util.concurrent.CountDownLatch$Sync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
 at kafka.network.AbstractServerThread.shutdown(SocketServer.scala:113)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:92)
 at 
 kafka.network.SocketServer$$anonfun$shutdown$2.apply(SocketServer.scala:91)
 at 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
 at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
 at kafka.network.SocketServer.shutdown(SocketServer.scala:91)
 at 
 kafka.server.KafkaServer$$anonfun$shutdown$3.apply$mcV$sp(KafkaServer.scala:246)
 at kafka.utils.Utils$.swallow(Utils.scala:172)
 at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
 at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
 at kafka.utils.Logging$class.swallow(Logging.scala:94)
 at kafka.utils.Utils$.swallow(Utils.scala:45)
 at kafka.server.KafkaServer.shutdown(KafkaServer.scala:246)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)
 at 
 kafka.admin.AdminTest$$anonfun$testPartitionReassignmentNonOverlappingReplicas$3.apply(AdminTest.scala:232)





Re: Review Request 24540: Patch for KAFKA-1574

2014-08-10 Thread Jay Kreps

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24540/#review50148
---

Ship it!


Ship It!

- Jay Kreps


On Aug. 11, 2014, 4:44 a.m., Jun Rao wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24540/
 ---
 
 (Updated Aug. 11, 2014, 4:44 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1574
 https://issues.apache.org/jira/browse/KAFKA-1574
 
 
 Repository: kafka
 
 
 Description
 ---
 
 set alive to true only during initialization
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/network/SocketServer.scala 
 8e99de08a443ce7cb032968c602655444ee30dce 
 
 Diff: https://reviews.apache.org/r/24540/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jun Rao
 




Re: Review Request 24540: Patch for KAFKA-1574

2014-08-10 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24540/#review50157
---

Ship it!


Ship It!

- Guozhang Wang


On Aug. 11, 2014, 4:44 a.m., Jun Rao wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24540/
 ---
 
 (Updated Aug. 11, 2014, 4:44 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1574
 https://issues.apache.org/jira/browse/KAFKA-1574
 
 
 Repository: kafka
 
 
 Description
 ---
 
 set alive to true only during initialization
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/network/SocketServer.scala 
 8e99de08a443ce7cb032968c602655444ee30dce 
 
 Diff: https://reviews.apache.org/r/24540/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Jun Rao
 




[jira] [Commented] (KAFKA-1588) Offset response does not support two requests for the same topic/partition combo

2014-08-10 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092461#comment-14092461
 ] 

Guozhang Wang commented on KAFKA-1588:
--

As discussed with Todd, changing the map to a set in the offset response 
would resolve this problem.
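
For illustration, the overwrite is just standard map semantics (plain Scala here, 
not the actual response-assembly code):

{code}
// Requesting the same topic/partition twice (e.g. for both the earliest and
// latest offsets): a Map keyed by the combo keeps only the last binding.
val earliest = 0L
val latest = 15L
val byMap = Map(("topic1", 0) -> earliest, ("topic1", 0) -> latest)
println(byMap.size) // 1 -- the second entry overwrote the first
{code}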

 Offset response does not support two requests for the same topic/partition 
 combo
 

 Key: KAFKA-1588
 URL: https://issues.apache.org/jira/browse/KAFKA-1588
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Todd Palino

 When performing an OffsetFetchRequest, if you request the same topic and 
 partition combination in a single request more than once (for example, if you 
 want to get both the head and tail offsets for a partition in the same 
 request), you will get a response for both, but they will be the same offset.
 We identified that the problem is that when the offset response is assembled, 
 a map is used to store the offset info before it is converted to the response 
 format and sent to the client. Therefore, the second request for a 
 topic/partition combination will overwrite the offset from the first request.





Re: Review Request 24006: Patch for KAFKA-1420

2014-08-10 Thread Guozhang Wang


 On Aug. 10, 2014, 9:12 p.m., Jonathan Natkins wrote:
  core/src/test/scala/unit/kafka/admin/AdminTest.scala, line 114
  https://reviews.apache.org/r/24006/diff/3-4/?file=646111#file646111line114
 
  I wasn't totally sure I understood this comment, so I made a change 
  that I think reflects what you were looking for. Let me know if I missed 
  the mark.

What I meant is that we need to guarantee the preferred replica will be the 
first replica in the list. In our case, it just happens that

range(0, 10).map(i => (i -> i % brokers.size)).toMap

gets the same result, in that it picks the first replica of the lists returned by

combinations(brokers, replicationFactor)

but that may not always be the case.


- Guozhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24006/#review50126
---


On Aug. 10, 2014, 9:11 p.m., Jonathan Natkins wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24006/
 ---
 
 (Updated Aug. 10, 2014, 9:11 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1420
 https://issues.apache.org/jira/browse/KAFKA-1420
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK 
 with TestUtils.createTopic in unit tests
 
 
 Diffs
 -
 
   core/src/test/scala/unit/kafka/admin/AdminTest.scala 
 e28979827110dfbbb92fe5b152e7f1cc973de400 
   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc 
   core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala 
 f44568cb25edf25db857415119018fd4c9922f61 
   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
 c4e13c5240c8303853d08cc3b40088f8c7dae460 
 
 Diff: https://reviews.apache.org/r/24006/diff/
 
 
 Testing
 ---
 
 Automated
 
 
 Thanks,
 
 Jonathan Natkins
 




Re: Review Request 24510: Patch for KAFKA-1582

2014-08-10 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24510/#review50159
---

Ship it!


Ship It!

- Guozhang Wang


On Aug. 11, 2014, 4:23 a.m., Dong Lin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24510/
 ---
 
 (Updated Aug. 11, 2014, 4:23 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1582
 https://issues.apache.org/jira/browse/KAFKA-1582
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1582; System Test should wait for producer to finish
 
 
 Diffs
 -
 
   system_test/utils/kafka_system_test_utils.py 
 6edd64a92ce749421b5ab60fbea5b55e2ccbfe94 
 
 Diff: https://reviews.apache.org/r/24510/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Dong Lin
 




Jenkins build is back to normal : Kafka-trunk #243

2014-08-10 Thread Apache Jenkins Server
See https://builds.apache.org/job/Kafka-trunk/243/changes