Build failed in Jenkins: Kafka-trunk #235
See https://builds.apache.org/job/Kafka-trunk/235/changes

Changes:

[junrao] kafka-1414; Speedup broker startup after hard reset; patched by Anton Karamanov; reviewed by Jay Kreps and Jun Rao

--
[...truncated 475 lines...]
org.apache.kafka.common.record.RecordTest testChecksum[62] PASSED
org.apache.kafka.common.record.RecordTest testEquality[62] PASSED
org.apache.kafka.common.record.RecordTest testFields[63] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[63] PASSED
org.apache.kafka.common.record.RecordTest testEquality[63] PASSED
org.apache.kafka.common.record.RecordTest testFields[64] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[64] PASSED
org.apache.kafka.common.record.RecordTest testEquality[64] PASSED
org.apache.kafka.common.record.RecordTest testFields[65] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[65] PASSED
org.apache.kafka.common.record.RecordTest testEquality[65] PASSED
org.apache.kafka.common.record.RecordTest testFields[66] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[66] PASSED
org.apache.kafka.common.record.RecordTest testEquality[66] PASSED
org.apache.kafka.common.record.RecordTest testFields[67] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[67] PASSED
org.apache.kafka.common.record.RecordTest testEquality[67] PASSED
org.apache.kafka.common.record.RecordTest testFields[68] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[68] PASSED
org.apache.kafka.common.record.RecordTest testEquality[68] PASSED
org.apache.kafka.common.record.RecordTest testFields[69] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[69] PASSED
org.apache.kafka.common.record.RecordTest testEquality[69] PASSED
org.apache.kafka.common.record.RecordTest testFields[70] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[70] PASSED
org.apache.kafka.common.record.RecordTest testEquality[70] PASSED
org.apache.kafka.common.record.RecordTest testFields[71] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[71] PASSED
org.apache.kafka.common.record.RecordTest testEquality[71] PASSED
org.apache.kafka.common.record.RecordTest testFields[72] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[72] PASSED
org.apache.kafka.common.record.RecordTest testEquality[72] PASSED
org.apache.kafka.common.record.RecordTest testFields[73] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[73] PASSED
org.apache.kafka.common.record.RecordTest testEquality[73] PASSED
org.apache.kafka.common.record.RecordTest testFields[74] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[74] PASSED
org.apache.kafka.common.record.RecordTest testEquality[74] PASSED
org.apache.kafka.common.record.RecordTest testFields[75] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[75] PASSED
org.apache.kafka.common.record.RecordTest testEquality[75] PASSED
org.apache.kafka.common.record.RecordTest testFields[76] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[76] PASSED
org.apache.kafka.common.record.RecordTest testEquality[76] PASSED
org.apache.kafka.common.record.RecordTest testFields[77] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[77] PASSED
org.apache.kafka.common.record.RecordTest testEquality[77] PASSED
org.apache.kafka.common.record.RecordTest testFields[78] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[78] PASSED
org.apache.kafka.common.record.RecordTest testEquality[78] PASSED
org.apache.kafka.common.record.RecordTest testFields[79] PASSED
org.apache.kafka.common.record.RecordTest testChecksum[79] PASSED
org.apache.kafka.common.record.RecordTest testEquality[79] PASSED
org.apache.kafka.common.record.MemoryRecordsTest testIterator[0] PASSED
org.apache.kafka.common.record.MemoryRecordsTest testIterator[1] PASSED
org.apache.kafka.common.record.MemoryRecordsTest testIterator[2] PASSED
org.apache.kafka.common.record.MemoryRecordsTest testIterator[3] PASSED
org.apache.kafka.common.record.MemoryRecordsTest testIterator[4] PASSED
org.apache.kafka.clients.NetworkClientTest testReadyAndDisconnect PASSED
org.apache.kafka.clients.NetworkClientTest testSendToUnreadyNode PASSED
org.apache.kafka.clients.NetworkClientTest testSimpleRequestResponse PASSED
org.apache.kafka.clients.producer.MetadataTest testMetadata PASSED
org.apache.kafka.clients.producer.MockProducerTest testAutoCompleteMock PASSED
org.apache.kafka.clients.producer.MockProducerTest testManualCompletion PASSED
org.apache.kafka.clients.producer.RecordAccumulatorTest testFull PASSED
org.apache.kafka.clients.producer.RecordAccumulatorTest testAppendLarge PASSED
org.apache.kafka.clients.producer.RecordAccumulatorTest testLinger PASSED
org.apache.kafka.clients.producer.RecordAccumulatorTest testPartialDrain PASSED
[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution
[ https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075969#comment-14075969 ]

Stevo Slavic commented on KAFKA-1490:
-------------------------------------

The SAMZA-283 solution doesn't make sense to me - why would one want to bootstrap the Gradle wrapper if one already has Gradle installed? I've raised the issue on the Gradle forum (see [here|http://forums.gradle.org/gradle/topics/bootstrap_gradle_wrapper_with_gradle_wrapper_scripts]).

remove gradlew initial setup output from source distribution

             Key: KAFKA-1490
             URL: https://issues.apache.org/jira/browse/KAFKA-1490
         Project: Kafka
      Issue Type: Bug
        Reporter: Joe Stein
        Assignee: Ivan Lyutov
        Priority: Blocker
         Fix For: 0.8.2

Our current source release contains lots of stuff in the gradle folder that we do not need.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (KAFKA-1559) Upgrade Gradle wrapper to Gradle 2.0
[ https://issues.apache.org/jira/browse/KAFKA-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075971#comment-14075971 ]

Stevo Slavic commented on KAFKA-1559:
-------------------------------------

Added a link noting that this issue depends on resolving KAFKA-1490.

Upgrade Gradle wrapper to Gradle 2.0

             Key: KAFKA-1559
             URL: https://issues.apache.org/jira/browse/KAFKA-1559
         Project: Kafka
      Issue Type: Task
      Components: build
Affects Versions: 0.8.1.1
        Reporter: Stevo Slavic
        Priority: Trivial
          Labels: build
     Attachments: KAFKA-1559.patch
[jira] [Updated] (KAFKA-1533) transient unit test failure in ProducerFailureHandlingTest
[ https://issues.apache.org/jira/browse/KAFKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Corley updated KAFKA-1533:
--------------------------------
    Attachment: kafka.threads

Seeing the same issue. Jun, not sure if you attached your own thread dump or a copy of mine from the mailing list, but attaching here again per your request.

transient unit test failure in ProducerFailureHandlingTest

             Key: KAFKA-1533
             URL: https://issues.apache.org/jira/browse/KAFKA-1533
         Project: Kafka
      Issue Type: Bug
      Components: core
Affects Versions: 0.8.2
        Reporter: Jun Rao
        Assignee: Guozhang Wang
         Fix For: 0.8.2
     Attachments: KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533_2014-07-21_15:45:58.patch, kafka.threads, stack.out

Occasionally, the test hangs on tear down. The following is the stack trace:

"Test worker" prio=5 tid=7f9246956000 nid=0x10e078000 in Object.wait() [10e075000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <7f4e69578> (a org.apache.zookeeper.ClientCnxn$Packet)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1344)
    - locked <7f4e69578> (a org.apache.zookeeper.ClientCnxn$Packet)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:732)
    at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
    at org.I0Itec.zkclient.ZkClient$8.call(ZkClient.java:720)
    at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
    at org.I0Itec.zkclient.ZkClient.delete(ZkClient.java:716)
    at kafka.utils.ZkUtils$.deletePath(ZkUtils.scala:416)
    at kafka.utils.ZkUtils$.deregisterBrokerInZk(ZkUtils.scala:184)
    at kafka.server.KafkaHealthcheck.shutdown(KafkaHealthcheck.scala:50)
    at kafka.server.KafkaServer$$anonfun$shutdown$2.apply$mcV$sp(KafkaServer.scala:243)
    at kafka.utils.Utils$.swallow(Utils.scala:172)
    at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
    at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
    at kafka.utils.Logging$class.swallow(Logging.scala:94)
    at kafka.utils.Utils$.swallow(Utils.scala:45)
    at kafka.server.KafkaServer.shutdown(KafkaServer.scala:243)
    at kafka.api.ProducerFailureHandlingTest.tearDown(ProducerFailureHandlingTest.scala:90)
Re: Gradle hanging on unit test run against ProducerFailureHangingTest-testBrokerFailure
Done

On Sun, Jul 27, 2014 at 7:45 PM, Jun Rao jun...@gmail.com wrote:

David,

Apache mailing list doesn't seem to allow large attachments. Could you attach the stack trace to the jira KAFKA-1533 (now reopened)?

Thanks,

Jun

On Sun, Jul 27, 2014 at 11:21 AM, David Corley davidcor...@gmail.com wrote:

Nope. It definitely sent. Are there some restrictions on mailing list attachments, I wonder? I'll put it inline here:

=
2014-07-25 18:40:15
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.65-b04-462 mixed mode):

"Attach Listener" daemon prio=9 tid=7fcfb92a5000 nid=0x11a961000 waiting on condition []
   java.lang.Thread.State: RUNNABLE

"kafka-scheduler-17" daemon prio=5 tid=7fcfbb80c000 nid=0x1387c3000 waiting on condition [1387c2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <7f53b84d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
    at java.lang.Thread.run(Thread.java:695)

"ReplicaFetcherThread-0-0" prio=5 tid=7fcfbb80b000 nid=0x1385bd000 runnable [1385bb000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:136)
    at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:69)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <7f53b7578> (a sun.nio.ch.Util$2)
    - locked <7f53b7560> (a java.util.Collections$UnmodifiableSet)
    - locked <7f5400668> (a sun.nio.ch.KQueueSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:193)
    - locked <7f53b7590> (a java.lang.Object)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
    - locked <7f5405530> (a sun.nio.ch.SocketAdaptor$SocketInputStream)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:221)
    - locked <7f5407fc0> (a java.lang.Object)
    at kafka.utils.Utils$.read(Utils.scala:380)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:71)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
    - locked <7f5407ff0> (a java.lang.Object)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:112)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:112)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:111)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:111)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:110)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"Controller-0-to-broker-1-send-thread" prio=5 tid=7fcfbb809800 nid=0x1384ba000 waiting on condition [1384b9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <7f53eba88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at
Re: Review Request 23962: Patch for KAFKA-1451
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23962/
---

(Updated July 28, 2014, 2:50 p.m.)

Review request for kafka.

Bugs: KAFKA-1451
    https://issues.apache.org/jira/browse/KAFKA-1451

Repository: kafka

Description (updated)
---

Addressing Jun's comments

Diffs (updated)
---

core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala e5b6ff1e2544b043007cf16a6b9dd4451c839e63

Diff: https://reviews.apache.org/r/23962/diff/

Testing
---

Thanks,

Manikumar Reddy O
[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076270#comment-14076270 ]

Manikumar Reddy commented on KAFKA-1451:
----------------------------------------

Updated reviewboard https://reviews.apache.org/r/23962/diff/ against branch origin/trunk

Broker stuck due to leader election race

             Key: KAFKA-1451
             URL: https://issues.apache.org/jira/browse/KAFKA-1451
         Project: Kafka
      Issue Type: Bug
      Components: core
Affects Versions: 0.8.1.1
        Reporter: Maciek Makowski
        Assignee: Manikumar Reddy
        Priority: Minor
          Labels: newbie
     Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:17:21.patch

h3. Symptoms

The broker does not become available because it is stuck in an infinite loop while electing a leader. This can be recognised by the following line being repeatedly written to server.log:

{code}
[2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node [{version:1,brokerid:1,timestamp:1400060079108}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
{code}

h3. Steps to Reproduce

In a setup with a single Kafka 0.8.1.1 node and a single ZooKeeper 3.4.6 node (it will likely behave the same with the ZK version included in the Kafka distribution):

# start both zookeeper and kafka (in any order)
# stop zookeeper
# stop kafka
# start kafka
# start zookeeper

h3. Likely Cause

{{ZookeeperLeaderElector}} subscribes to data changes on startup, and then triggers an election. If the deletion of the ephemeral {{/controller}} node associated with the broker's previous ZooKeeper session happens after the subscription to changes in the new session, the election will be invoked twice, once from {{startup}} and once from {{handleDataDeleted}}:

* {{startup}}: acquire {{controllerLock}}
* {{startup}}: subscribe to data changes
* zookeeper: delete {{/controller}} since the session that created it timed out
* {{handleDataDeleted}}: {{/controller}} was deleted
* {{handleDataDeleted}}: wait on {{controllerLock}}
* {{startup}}: elect -- writes {{/controller}}
* {{startup}}: release {{controllerLock}}
* {{handleDataDeleted}}: acquire {{controllerLock}}
* {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then gets into an infinite loop as a result of the conflict

{{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing znode was written from a different session, which is not true in this case; it was written from the same session. That adds to the confusion.

h3. Suggested Fix

In {{ZookeeperLeaderElector.startup}}, first run {{elect}} and then subscribe to data changes.
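The suggested fix is purely a reordering: run the initial election before registering the deletion watcher, so a stale-session deletion of {{/controller}} can no longer interleave a second election with startup's own. A minimal sketch of that ordering (illustrative names only, not the actual ZookeeperLeaderElector source):

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderElectorSketch {
    // Records the order in which startup performs its two steps.
    final List<String> events = new ArrayList<>();
    // Watchers that would be fired by a /controller deletion event.
    final List<Runnable> watchers = new ArrayList<>();

    void elect() {
        events.add("elect"); // stands in for writing the /controller znode
    }

    void subscribeToDataChanges() {
        events.add("subscribe");
        watchers.add(this::elect); // handleDataDeleted would re-elect
    }

    // Fixed ordering per the issue: elect first, then subscribe, so a
    // pending stale-session deletion cannot trigger a conflicting second
    // election between the subscription and the initial elect.
    void startup() {
        elect();
        subscribeToDataChanges();
    }
}
```

Calling `startup()` on this sketch records `elect` strictly before `subscribe`, which is the whole content of the proposed fix; the buggy ordering was the reverse.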
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
-----------------------------------
    Attachment: KAFKA-1451_2014-07-28_20:17:21.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
-----------------------------------
    Attachment: (was: KAFKA-1451_2014-07-28_20:17:21.patch)
Review Request 23983: Patch for KAFKA-1451
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23983/
---

Review request for kafka.

Bugs: KAFKA-1451
    https://issues.apache.org/jira/browse/KAFKA-1451

Repository: kafka

Description
---

controller existence check added in election process of ZookeeperLeaderElector

Diffs
---

core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala e5b6ff1e2544b043007cf16a6b9dd4451c839e63

Diff: https://reviews.apache.org/r/23983/diff/

Testing
---

Thanks,

Manikumar Reddy O
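The one-line description above proposes a different guard than the reordering fix: check whether a controller already exists before attempting to write {{/controller}}. A hypothetical sketch of such a guard (names invented for illustration; the actual change is in the linked diff):

```java
import java.util.Optional;

public class ElectionGuardSketch {
    // Stand-in for the contents of the /controller znode; empty means
    // no controller is currently registered.
    private Optional<Integer> currentControllerId = Optional.empty();
    public int writes = 0;

    Optional<Integer> readControllerPath() {
        return currentControllerId;
    }

    // Guarded election: only write /controller when no controller is
    // registered, so a re-entrant elect() call (e.g. from a deletion
    // watcher) does not conflict with a node this session just wrote.
    public boolean elect(int brokerId) {
        Optional<Integer> existing = readControllerPath();
        if (existing.isPresent()) {
            // Someone is already controller; we won only if it is us.
            return existing.get() == brokerId;
        }
        currentControllerId = Optional.of(brokerId); // stands in for the ZK write
        writes++;
        return true;
    }
}
```

With this guard, a second `elect()` from the same broker becomes a no-op instead of a conflicting write, which is the failure mode described in the issue.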
[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076274#comment-14076274 ]

Manikumar Reddy commented on KAFKA-1451:
----------------------------------------

Created reviewboard https://reviews.apache.org/r/23983/diff/ against branch origin/trunk
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
-----------------------------------
    Attachment: KAFKA-1451.patch
Re: Review Request 23962: Patch for KAFKA-1451
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23962/
---

(Updated July 28, 2014, 2:59 p.m.)

Review request for kafka.

Bugs: KAFKA-1451
    https://issues.apache.org/jira/browse/KAFKA-1451

Repository: kafka

Description (updated)
---

Addressing Jun's comments

Diffs (updated)
---

core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala e5b6ff1e2544b043007cf16a6b9dd4451c839e63

Diff: https://reviews.apache.org/r/23962/diff/

Testing
---

Thanks,

Manikumar Reddy O
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
-----------------------------------
    Attachment: (was: KAFKA-1451.patch)
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar Reddy updated KAFKA-1451: --- Attachment: KAFKA-1451_2014-07-28_20:27:32.patch Broker stuck due to leader election race - Key: KAFKA-1451 URL: https://issues.apache.org/jira/browse/KAFKA-1451 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.8.1.1 Reporter: Maciek Makowski Assignee: Manikumar Reddy Priority: Minor Labels: newbie Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch h3. Symptoms The broker does not become available due to being stuck in an infinite loop while electing leader. This can be recognised by the following line being repeatedly written to server.log: {code} [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node [{version:1,brokerid:1,timestamp:1400060079108}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$) {code} h3. Steps to Reproduce In a single kafka 0.8.1.1 node, single zookeeper 3.4.6 (but will likely behave the same with the ZK version included in Kafka distribution) node setup: # start both zookeeper and kafka (in any order) # stop zookeeper # stop kafka # start kafka # start zookeeper h3. Likely Cause {{ZookeeperLeaderElector}} subscribes to data changes on startup, and then triggers an election. 
[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076277#comment-14076277 ] Manikumar Reddy commented on KAFKA-1451: Updated reviewboard https://reviews.apache.org/r/23962/diff/ against branch origin/trunk
Re: Review Request 23895: Patch for KAFKA-1419
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23895/ --- (Updated July 28, 2014, 3:07 p.m.) Review request for kafka. Bugs: KAFKA-1419 https://issues.apache.org/jira/browse/KAFKA-1419 Repository: kafka Description (updated) --- KAFKA-1419 - cross build for scala 2.11 - dropped scala 2.8 support - minor bug fixes Diffs (updated) - build.gradle a72905df824ba68bed5d5170d18873c23e1782c9 core/src/main/scala/kafka/javaapi/message/ByteBufferMessageSet.scala fecee8d5f7b32f483bb1bfc6a5080d589906f9c4 core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 73401c5ff34d08abce22267aa9c4d86632c6fb74 gradle.properties 4827769a3f8e34f0fe7e783eb58e44d4db04859b gradle/buildscript.gradle 225e0a82708bc5f390e5e2c1d4d9a0d06f491b95 gradle/wrapper/gradle-wrapper.properties 610282a699afc89a82203ef0e4e71ecc53761039 scala.gradle ebd21b870c0746aade63248344ab65d9b5baf820 Diff: https://reviews.apache.org/r/23895/diff/ Testing --- Thanks, Ivan Lyutov
[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability
[ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076286#comment-14076286 ] Jun Rao commented on KAFKA-1555:

Hmm, I have to understand the semantics and the implementation of dfs.replication.min a bit more. For now, this seems more like the ack 1 case in Kafka. Basically, if you have R replicas and you set dfs.replication.min to a value smaller than R, you can get an ack back sooner. However, there is a slight possibility that the acked data is lost with fewer than R-1 failures.

A couple of other thoughts on min.isr.required.

1. Let's say we do support this mode. However, immediately after a message is successfully published, some replicas could go down and bring the ISR below min.isr.required. There is still no guarantee that you have more than one copy of the data at that point. So, min.isr.required can only be enforced at the time of publishing. Is that what you want?

2. I imagine the implementation could be: when a message is committed, we check the ISR size and return an error if |ISR| is less than min.isr.required. However, the message is still committed and will be exposed to the consumer. The error is just to tell the producer that the message may be under-replicated. Does that match what you expect?

provide strong consistency with reasonable availability
Key: KAFKA-1555
URL: https://issues.apache.org/jira/browse/KAFKA-1555
Project: Kafka
Issue Type: Improvement
Components: controller
Affects Versions: 0.8.1.1
Reporter: Jiang Wu
Assignee: Neha Narkhede

In a mission-critical application, we expect that a kafka cluster with 3 brokers can satisfy two requirements: 1. When 1 broker is down, no message loss or service blocking happens. 2. In worse cases, such as when two brokers are down, service can be blocked, but no message loss happens. We found that the current kafka version (0.8.1.1) cannot achieve the requirements due to three of its behaviors: 1.
when choosing a new leader from 2 followers in ISR, the one with fewer messages may be chosen as the leader. 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it has fewer messages than the leader. 3. ISR can contain only 1 broker, therefore acknowledged messages may be stored in only 1 broker.

The following is an analytical proof. We consider a cluster with 3 brokers and a topic with 3 replicas, and assume that at the beginning, all 3 replicas, leader A, followers B and C, are in sync, i.e., they have the same messages and are all in ISR. According to the value of request.required.acks (acks for short), there are the following cases.

1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.

2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this time, although C hasn't received m, C is still in ISR. If A is killed, C can be elected as the new leader, and consumers will miss m.

3. acks=-1. B and C restart and are removed from ISR. Producer sends a message m to A, and receives an acknowledgement. Disk failure happens in A before B and C replicate m. Message m is lost.

In summary, no existing configuration can satisfy the requirements.

-- This message was sent by Atlassian JIRA (v6.2#6252)
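Point 2 of Jun's comment above describes semantics in which a min.isr.required violation does not block the write: the message is committed regardless, and the broker only reports an error to signal possible under-replication. A minimal sketch of that decision rule (min.isr.required is a proposed setting under discussion, not an existing Kafka config; names here are illustrative):

```java
// Sketch of the semantics described for a proposed min.isr.required setting:
// the message is committed either way; the ack merely carries an error code
// when the ISR has shrunk below the configured minimum at publish time.
class MinIsrSketch {
    static final String NONE = "NONE";
    static final String NOT_ENOUGH_REPLICAS = "NOT_ENOUGH_REPLICAS";

    /** Hypothetical error code returned with an acks=-1 produce response. */
    static String ackError(int isrSize, int minIsrRequired) {
        // At this point the write has already been appended and committed;
        // the error only warns the producer it may be under-replicated.
        return isrSize >= minIsrRequired ? NONE : NOT_ENOUGH_REPLICAS;
    }
}
```

This is the "enforced only at publish time" caveat from point 1: the check reflects the ISR size at the moment of the ack, and replicas can still fail immediately afterwards.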
[jira] [Updated] (KAFKA-1419) cross build for scala 2.11
[ https://issues.apache.org/jira/browse/KAFKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Lyutov updated KAFKA-1419: --- Attachment: KAFKA-1419_2014-07-28_15:05:16.patch cross build for scala 2.11 -- Key: KAFKA-1419 URL: https://issues.apache.org/jira/browse/KAFKA-1419 Project: Kafka Issue Type: Improvement Components: clients Affects Versions: 0.8.1 Reporter: Scott Clasen Assignee: Ivan Lyutov Priority: Blocker Fix For: 0.8.2 Attachments: KAFKA-1419.patch, KAFKA-1419.patch, KAFKA-1419_2014-07-28_15:05:16.patch Please publish builds for scala 2.11, hopefully just needs a small tweak to the gradle conf? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1419) cross build for scala 2.11
[ https://issues.apache.org/jira/browse/KAFKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076285#comment-14076285 ] Ivan Lyutov commented on KAFKA-1419: Updated reviewboard https://reviews.apache.org/r/23895/diff/ against branch apache/trunk
Re: Review Request 23648: Patch for KAFKA-1524
On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 92
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line92

What if the coordinator itself fails after a while and needs to be re-queried? Not sure if you intend for all of these to be covered in the subsequent work that addresses failure scenarios more thoroughly.

Yes, I will revisit this once we check all failure scenarios, as there are other changes we may need to make.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 146
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line146

I think it would be clearer to use a negated condition. i.e., if (txStatus != TransactionStatus.NOTRANSACTION)

There are more than 2 states. With a negated condition we would not allow starting a transaction when the status is ABORTED, for example.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 158
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line158

if (txStatus != TransactionStatus.ONGOING)

Similar to the previous one. In this case we would be allowing a committed transaction to be aborted.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 170
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line170

if (txStatus != TransactionStatus.ONGOING)

In this case, repeating a commit command would throw an exception. I'm not sure if this is what we want, as there might be some failure scenarios where we want to be able to repeat commits. Not sure about this; maybe I should wait until the failure handling is done and then check this again?
On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 215
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line215

Each check is a full iteration. Rather, can we just keep a count of pending messages for this check?

Yes, the iteration also removes messages. The idea is to remove them lazily when it is necessary to check if there are more to send. The alternative seems more complex, as we would need to attach a listener to check and remove received messages. Also, by keeping the remove logic in the same method that counts, we avoid the need to synchronize between the listener and this thread, i.e. it seems easier to return the exact count.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/NetworkClient.java, line 186
https://reviews.apache.org/r/23648/diff/1/?file=634383#file634383line186

One issue with this call is that it can block and add to the total time that poll needs to wait, which breaks the API contract of poll (i.e., it should return in at most timeout ms). Can we fold these into the poll below? Also (especially since the method is prefixed with maybe...) it would be cleaner to move the txContext null check inside the method.

Regarding poll, do you mean putting the maybeUpdate... methods inside poll? This would require passing at least two additional references. What if instead I measure the time taken to perform these maybeUpdate... calls and subtract it from the remaining timeout for poll? I think this should be accurate enough.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/NetworkClient.java, line 463
https://reviews.apache.org/r/23648/diff/1/?file=634383#file634383line463

Offset commits are only needed to redo aborted transactions - i.e., typically in a failure scenario - so this is not really required for the scope of this jira, right?
Furthermore, we would want the partition of the offsets topic to be part of the transaction itself, and the offset manager is not transaction-aware either.

Correct. I started sketching this and should not have included it in this patch. I will provide a complete version once we handle failure scenarios.

On July 22, 2014, 7:59 p.m., Joel Koshy wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionContext.java, line 218
https://reviews.apache.org/r/23648/diff/1/?file=634391#file634391line218

May need to unnecessarily wait. i.e., who interrupts?

True. My intention was to check if all ACKs are received, otherwise wait a fraction of the total timeout. Instead of wait(remainingWaitMs) it would be something like wait(remainingWaitMs/4).

- Raul

--- This is an automatically generated e-mail. To reply, visit:
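The exchange about line 146 turns on a subtlety worth making concrete: with more than two transaction states, "error if status != NOTRANSACTION" is not equivalent to the original check. The enum and guards below are an illustration of that point only, not the patch's actual code; the exact condition in the patch is not shown in the thread, so the permissive variant here is an assumption based on the reply:

```java
// Why the reviewer's negated check changes behavior: with four states,
// "error if (txStatus != NOTRANSACTION)" also rejects starting a new
// transaction after an ABORTED one, which the reply says should be allowed.
enum TxStatus { NOTRANSACTION, ONGOING, COMMITTED, ABORTED }

class TxGuards {
    // Behavior defended in the reply (assumed): begin is legal from a
    // clean slate or after an abort.
    static boolean canBegin(TxStatus s) {
        return s == TxStatus.NOTRANSACTION || s == TxStatus.ABORTED;
    }

    // Reviewer's suggested form: begin is legal only when the status is
    // exactly NOTRANSACTION, so an ABORTED transaction blocks a new begin.
    static boolean canBeginNegated(TxStatus s) {
        return s == TxStatus.NOTRANSACTION;
    }
}
```

The two predicates agree on NOTRANSACTION and ONGOING but disagree on ABORTED, which is exactly the counterexample raised in the reply.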
[jira] [Commented] (KAFKA-1542) normal IOException in the new producer is logged as ERROR
[ https://issues.apache.org/jira/browse/KAFKA-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076479#comment-14076479 ] Jun Rao commented on KAFKA-1542:

This change introduced a bug. Saw the following when running ProducerFailureHandlingTest. The problem is that the remote InetAddress may not always be available (e.g., in connecting mode). Committed a follow-up patch.

{code}
java.lang.NullPointerException
at org.apache.kafka.common.network.Selector.poll(Selector.java:265)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:178)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:695)
{code}

normal IOException in the new producer is logged as ERROR
Key: KAFKA-1542
URL: https://issues.apache.org/jira/browse/KAFKA-1542
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: David Corley
Labels: newbie
Fix For: 0.8.2
Attachments: KAFKA-1542.patch

Saw the following error in the log. It seems this can happen if the broker is down. So, this should probably be logged as WARN instead of ERROR.
{code}
2014/07/16 00:12:51.799 [Selector] Error in I/O:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:60)
at org.apache.kafka.common.network.Selector.poll(Selector.java:241)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:171)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:174)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:114)
at java.lang.Thread.run(Thread.java:744)
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
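The NullPointerException arises because a channel that is still connecting may not yet have a remote address to put in the log message. A small sketch of the guard the follow-up patch implies (the helper below is hypothetical, not Kafka's Selector code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;

// Sketch of the idea behind the follow-up fix: when logging an I/O error,
// the remote address of a still-connecting channel may be unavailable, so
// fall back to a placeholder instead of dereferencing null.
class IoErrorLogging {
    static String describe(InetSocketAddress remote, IOException e) {
        String peer = (remote != null)
                ? remote.getHostString()
                : "unknown (still connecting)";
        return "I/O error with " + peer + ": " + e.getMessage();
    }
}
```

The same guard is what lets the exception be logged at WARN with useful context rather than crashing the poll loop with an NPE.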
[jira] [Commented] (KAFKA-1533) transient unit test failure in ProducerFailureHandlingTest
[ https://issues.apache.org/jira/browse/KAFKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076484#comment-14076484 ] Jun Rao commented on KAFKA-1533:

This seems to have been introduced by a bug in KAFKA-1542. I just committed a fix. After that, I don't see the unit test hanging issue any more. David, could you try the test again?

transient unit test failure in ProducerFailureHandlingTest
Key: KAFKA-1533
URL: https://issues.apache.org/jira/browse/KAFKA-1533
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Guozhang Wang
Fix For: 0.8.2
Attachments: KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533_2014-07-21_15:45:58.patch, kafka.threads, stack.out

Occasionally, saw the test hang on tear down. The following is the stack trace:

{code}
Test worker prio=5 tid=7f9246956000 nid=0x10e078000 in Object.wait() [10e075000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 7f4e69578 (a org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1344)
- locked 7f4e69578 (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:732)
at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
at org.I0Itec.zkclient.ZkClient$8.call(ZkClient.java:720)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
at org.I0Itec.zkclient.ZkClient.delete(ZkClient.java:716)
at kafka.utils.ZkUtils$.deletePath(ZkUtils.scala:416)
at kafka.utils.ZkUtils$.deregisterBrokerInZk(ZkUtils.scala:184)
at kafka.server.KafkaHealthcheck.shutdown(KafkaHealthcheck.scala:50)
at kafka.server.KafkaServer$$anonfun$shutdown$2.apply$mcV$sp(KafkaServer.scala:243)
at kafka.utils.Utils$.swallow(Utils.scala:172)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:45)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:243)
at kafka.api.ProducerFailureHandlingTest.tearDown(ProducerFailureHandlingTest.scala:90)
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23767: Fix KAFKA-1430
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23767/ --- (Updated July 28, 2014, 6:30 p.m.) Review request for kafka. Bugs: KAFKA-1430 https://issues.apache.org/jira/browse/KAFKA-1430 Repository: kafka Description (updated) --- Address Jun's comments: 1. Kept the first comment about removing readlock on leaderReplicaIfLocal for further discussion; Kept the comment on whether satisfying a delayed fetch immediately if on partition has an error for further discussion 3. Rebased on KAFKA-1542 follow-up; Diffs (updated) - core/src/main/scala/kafka/api/FetchRequest.scala 55a5982c5234f3ac10d4b4ea9fd7c4aa11d34154 core/src/main/scala/kafka/api/FetchResponse.scala d117f10f724b09d6deef0df3a138d28fc91aa13a core/src/main/scala/kafka/cluster/Partition.scala 134aef9c88068443d4d465189f376dd78605b4f8 core/src/main/scala/kafka/cluster/Replica.scala 5e659b4a5c0256431aecc200a6b914472da9ecf3 core/src/main/scala/kafka/consumer/ConsumerFetcherThread.scala f8c1b4e674f7515c377c6c30d212130f1ff022dd core/src/main/scala/kafka/consumer/SimpleConsumer.scala 0e64632210385ef63c2ad3445b55ac4f37a63df2 core/src/main/scala/kafka/log/Log.scala b7bc5ffdecba7221b3d2e4bca2b0c703bc74fa12 core/src/main/scala/kafka/log/LogCleaner.scala afbeffc72e7d7706b44961aecf8519c5c5a3b4b1 core/src/main/scala/kafka/log/LogSegment.scala 0d6926ea105a99c9ff2cfc9ea6440f2f2d37bde8 core/src/main/scala/kafka/server/AbstractFetcherThread.scala 3b15254f32252cf824d7a292889ac7662d73ada1 core/src/main/scala/kafka/server/DelayedFetch.scala PRE-CREATION core/src/main/scala/kafka/server/DelayedProduce.scala PRE-CREATION core/src/main/scala/kafka/server/DelayedRequestKey.scala PRE-CREATION core/src/main/scala/kafka/server/FetchDataInfo.scala PRE-CREATION core/src/main/scala/kafka/server/FetchRequestPurgatory.scala PRE-CREATION core/src/main/scala/kafka/server/KafkaApis.scala fd5f12ee31e78cdcda8e24a0ab3e1740b3928884 core/src/main/scala/kafka/server/LogOffsetMetadata.scala PRE-CREATION 
core/src/main/scala/kafka/server/OffsetManager.scala 0e22897cd1c7e45c58a61c3c468883611b19116d core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala PRE-CREATION core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 75ae1e161769a020a102009df416009bd6710f4a core/src/main/scala/kafka/server/ReplicaManager.scala 897783cb756de548a8b634876f729b63ffe9925e core/src/main/scala/kafka/server/RequestPurgatory.scala 3d0ff1e2dbd6a5c3587cffa283db70415af83a7b core/src/main/scala/kafka/tools/MetadataRequestProducer.scala PRE-CREATION core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala af4783646803e58714770c21f8c3352370f26854 core/src/main/scala/kafka/tools/TestEndToEndLatency.scala 5f8f6bc46ae6cbbe15cec596ac99d0b377d1d7ef core/src/test/scala/other/kafka/StressTestLog.scala 8fcd068b248688c40e73117dc119fa84cceb95b3 core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala cd16ced5465d098be7a60498326b2a98c248f343 core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala 9f04bd38be639cde3e7f402845dbe6ae92e87dc2 core/src/test/scala/unit/kafka/log/LogManagerTest.scala 7d4c70ce651b1af45cf9bb69b974aa770de8e59d core/src/test/scala/unit/kafka/log/LogSegmentTest.scala 6b7603728ae5217565d68b92dd5349e7c6508f31 core/src/test/scala/unit/kafka/log/LogTest.scala 1da1393983d4b20330e7c7f374424edd1b26f2a3 core/src/test/scala/unit/kafka/message/BaseMessageSetTestCases.scala 6db245c956d2172cde916defdb0749081bf891fd core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala e532c2826c96bfa47d9a3c41b4d71dbf69541eac core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 2cd3a3faf7be2bbc22a892eec78c6e4c05545e18 core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 0ec120a4a953114e88c575dd6b583874371a09e3 core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 4f61f8469df99e02d6ce7aad897d10e158cca8fd core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala b1c4ce9f66aa86c68f2e6988f67546fa63ed1f9b Diff: 
https://reviews.apache.org/r/23767/diff/ Testing --- Thanks, Guozhang Wang
[jira] [Commented] (KAFKA-1430) Purgatory redesign
[ https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076550#comment-14076550 ] Guozhang Wang commented on KAFKA-1430:

Updated reviewboard https://reviews.apache.org/r/23767/ against branch origin/trunk

Purgatory redesign
Key: KAFKA-1430
URL: https://issues.apache.org/jira/browse/KAFKA-1430
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Guozhang Wang
Attachments: KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430.patch, KAFKA-1430_2014-06-05_17:07:37.patch, KAFKA-1430_2014-06-05_17:13:13.patch, KAFKA-1430_2014-06-05_17:40:53.patch, KAFKA-1430_2014-06-10_11:22:06.patch, KAFKA-1430_2014-06-10_11:26:02.patch, KAFKA-1430_2014-07-11_10:59:13.patch, KAFKA-1430_2014-07-21_12:53:39.patch, KAFKA-1430_2014-07-25_09:52:43.patch, KAFKA-1430_2014-07-28_11:30:23.patch

We have seen 2 main issues with the Purgatory.

1. There is no atomic checkAndWatch functionality. So, a client typically first checks whether a request is satisfied or not and then registers the watcher. However, by the time the watcher is registered, the registered item could already be satisfied. This item won't be satisfied until the next update happens or the delayed time expires, which means the watched item could be delayed.

2. FetchRequestPurgatory doesn't quite work. This is because the current design tries to incrementally maintain the accumulated bytes ready for fetch. However, this is difficult since the right time to check whether a fetch (for a regular consumer) request is satisfied is when the high watermark moves. At that point, it's hard to figure out how many bytes we should incrementally add to each pending fetch request. The problem has been reported in KAFKA-1150 and KAFKA-703.

-- This message was sent by Atlassian JIRA (v6.2#6252)
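Issue 1 above is a classic check-then-register race: an update can land between the satisfaction check and the watcher registration, leaving a satisfied request parked until expiration. The sketch below shows the shape of an atomic checkAndWatch, where both steps happen under one lock (names and structure are illustrative, not the actual purgatory code in the patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Sketch of an atomic checkAndWatch: the satisfaction check and the watcher
// registration happen under a single lock, so an update delivered in between
// can no longer strand an already-satisfied request until it expires.
class PurgatorySketch {
    private final List<Runnable> watchers = new ArrayList<>();
    private final Object lock = new Object();

    /** Returns true if already satisfied; otherwise registers the watcher. */
    boolean checkAndWatch(BooleanSupplier satisfied, Runnable onSatisfied) {
        synchronized (lock) {
            if (satisfied.getAsBoolean()) {
                return true;            // caller completes immediately
            }
            watchers.add(onSatisfied);  // parked atomically w.r.t. updates
            return false;
        }
    }

    /** Called on every state change (e.g. when the high watermark moves). */
    void update() {
        synchronized (lock) {
            for (Runnable w : watchers) w.run();
            watchers.clear();
        }
    }

    int pending() {
        synchronized (lock) { return watchers.size(); }
    }
}
```

Because update() takes the same lock, there is no window in which a request has been checked but not yet watched, which is the gap described in issue 1.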
[jira] [Updated] (KAFKA-1430) Purgatory redesign
[ https://issues.apache.org/jira/browse/KAFKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1430: Attachment: KAFKA-1430_2014-07-28_11:30:23.patch
Jenkins build is back to normal : Kafka-trunk #236
See https://builds.apache.org/job/Kafka-trunk/236/changes
Re: Review Request 23895: Patch for KAFKA-1419
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23895/#review48877 --- Good stuff, I was able to get all binaries to build and pass tests with this patch. I think a fix to maintain compatibility with the BeanProperty change is important, but the other two aren't so much. build.gradle https://reviews.apache.org/r/23895/#comment85642 I think the file can be removed so that the exclusion isn't needed. core/src/main/scala/kafka/javaapi/message/ByteBufferMessageSet.scala https://reviews.apache.org/r/23895/#comment85644 you need to add `def getBuffer = buffer` here and below in order to maintain backwards compatibility (those are added by the @BeanProperty annotation). You could also do something similar to `kafka.utils.Annotations_2.9+.scala` and the groovy excludes to keep the BeanProperty annotation working. gradle.properties https://reviews.apache.org/r/23895/#comment85643 I hesitate to nit, but scala 2.11.2 was released 6 days ago. Might be worth switching if it doesn't cause any issues? - Joe Crobak On July 28, 2014, 3:07 p.m., Ivan Lyutov wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23895/ --- (Updated July 28, 2014, 3:07 p.m.) Review request for kafka. 
[jira] Subscription: outstanding kafka patches
Issue Subscription
Filter: outstanding kafka patches (111 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-1559 Upgrade Gradle wrapper to Gradle 2.0 https://issues.apache.org/jira/browse/KAFKA-1559
KAFKA-1550 Patch review tool should use git format-patch to generate patch https://issues.apache.org/jira/browse/KAFKA-1550
KAFKA-1543 Changing replication factor https://issues.apache.org/jira/browse/KAFKA-1543
KAFKA-1541 Add transactional request definitions to clients package https://issues.apache.org/jira/browse/KAFKA-1541
KAFKA-1536 Change the status of the JIRA to Patch Available in the kafka-review-tool https://issues.apache.org/jira/browse/KAFKA-1536
KAFKA-1528 Normalize all the line endings https://issues.apache.org/jira/browse/KAFKA-1528
KAFKA-1527 SimpleConsumer should be transaction-aware https://issues.apache.org/jira/browse/KAFKA-1527
KAFKA-1526 Producer performance tool should have an option to enable transactions https://issues.apache.org/jira/browse/KAFKA-1526
KAFKA-1525 DumpLogSegments should print transaction IDs https://issues.apache.org/jira/browse/KAFKA-1525
KAFKA-1524 Implement transactional producer https://issues.apache.org/jira/browse/KAFKA-1524
KAFKA-1523 Implement transaction manager module https://issues.apache.org/jira/browse/KAFKA-1523
KAFKA-1522 Transactional messaging request/response definitions https://issues.apache.org/jira/browse/KAFKA-1522
KAFKA-1517 Messages is a required argument to Producer Performance Test https://issues.apache.org/jira/browse/KAFKA-1517
KAFKA-1510 Force offset commits when migrating consumer offsets from zookeeper to kafka https://issues.apache.org/jira/browse/KAFKA-1510
KAFKA-1509 Restart of destination broker after unreplicated partition move leaves partitions without leader https://issues.apache.org/jira/browse/KAFKA-1509
KAFKA-1507 Using GetOffsetShell against non-existent topic creates the topic unintentionally https://issues.apache.org/jira/browse/KAFKA-1507
KAFKA-1500 adding new consumer requests using the new protocol https://issues.apache.org/jira/browse/KAFKA-1500
KAFKA-1498 new producer performance and bug improvements https://issues.apache.org/jira/browse/KAFKA-1498
KAFKA-1496 Using batch message in sync producer only sends the first message if we use a Scala Stream as the argument https://issues.apache.org/jira/browse/KAFKA-1496
KAFKA-1481 Stop using dashes AND underscores as separators in MBean names https://issues.apache.org/jira/browse/KAFKA-1481
KAFKA-1477 add authentication layer and initial JKS x509 implementation for brokers, producers and consumer for network communication https://issues.apache.org/jira/browse/KAFKA-1477
KAFKA-1476 Get a list of consumer groups https://issues.apache.org/jira/browse/KAFKA-1476
KAFKA-1475 Kafka consumer stops LeaderFinder/FetcherThreads, but application does not know https://issues.apache.org/jira/browse/KAFKA-1475
KAFKA-1471 Add Producer Unit Tests for LZ4 and LZ4HC compression https://issues.apache.org/jira/browse/KAFKA-1471
KAFKA-1468 Improve perf tests https://issues.apache.org/jira/browse/KAFKA-1468
KAFKA-1460 NoReplicaOnlineException: No replica for partition https://issues.apache.org/jira/browse/KAFKA-1460
KAFKA-1451 Broker stuck due to leader election race https://issues.apache.org/jira/browse/KAFKA-1451
KAFKA-1450 check invalid leader in a more robust way https://issues.apache.org/jira/browse/KAFKA-1450
KAFKA-1430 Purgatory redesign https://issues.apache.org/jira/browse/KAFKA-1430
KAFKA-1419 cross build for scala 2.11 https://issues.apache.org/jira/browse/KAFKA-1419
KAFKA-1394 Ensure last segment isn't deleted on expiration when there are unflushed messages https://issues.apache.org/jira/browse/KAFKA-1394
KAFKA-1372 Upgrade to Gradle 1.10 https://issues.apache.org/jira/browse/KAFKA-1372
KAFKA-1367 Broker topic metadata not kept in sync with ZooKeeper https://issues.apache.org/jira/browse/KAFKA-1367
KAFKA-1351 String.format is very expensive in Scala https://issues.apache.org/jira/browse/KAFKA-1351
KAFKA-1343 Kafka consumer iterator thread stalls https://issues.apache.org/jira/browse/KAFKA-1343
KAFKA-1330 Implement subscribe(TopicPartition...partitions) in the new consumer https://issues.apache.org/jira/browse/KAFKA-1330
KAFKA-1329 Add metadata fetch and refresh functionality to the consumer https://issues.apache.org/jira/browse/KAFKA-1329
KAFKA-1324 Debian packaging https://issues.apache.org/jira/browse/KAFKA-1324
KAFKA-1303 metadata
[jira] [Commented] (KAFKA-1414) Speedup broker startup after hard reset
[ https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076660#comment-14076660 ] ASF GitHub Bot commented on KAFKA-1414: --- Github user ataraxer closed the pull request at: https://github.com/apache/kafka/pull/26

Speedup broker startup after hard reset
---
Key: KAFKA-1414
URL: https://issues.apache.org/jira/browse/KAFKA-1414
Project: Kafka
Issue Type: Improvement
Components: log
Affects Versions: 0.8.1.1, 0.8.2, 0.9.0
Reporter: Dmitry Bugaychenko
Assignee: Anton Karamanov
Fix For: 0.8.2
Attachments: 0001-KAFKA-1414-Speedup-broker-startup-after-hard-reset-a.patch, KAFKA-1414-rev1.patch, KAFKA-1414-rev2.fixed.patch, KAFKA-1414-rev2.patch, KAFKA-1414-rev3-interleaving.patch, KAFKA-1414-rev3.patch, KAFKA-1414-rev4.patch, KAFKA-1414-rev5.patch, freebie.patch, parallel-dir-loading-0.8.patch, parallel-dir-loading-trunk-fixed-threadpool.patch, parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch

After a hard reset due to power failure, the broker takes far too much time recovering unflushed segments in a single thread. This could be easily improved by launching multiple threads (one per data directory, assuming that typically each data directory is on a dedicated drive). Locally we tried this simple patch to LogManager.loadLogs and it seems to work; however, I'm too new to Scala, so do not take it literally:

{code}
/**
 * Recover and load all logs in the given data directories
 */
private def loadLogs(dirs: Seq[File]) {
  val threads: Array[Thread] = new Array[Thread](dirs.size)
  var i: Int = 0
  val me = this
  for (dir <- dirs) {
    val thread = new Thread(new Runnable {
      def run() {
        val recoveryPoints = me.recoveryPointCheckpoints(dir).read
        /* load the logs */
        val subDirs = dir.listFiles()
        if (subDirs != null) {
          val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
          if (cleanShutDownFile.exists())
            info("Found clean shutdown file. Skipping recovery for all logs in data directory '%s'".format(dir.getAbsolutePath))
          for (subDir <- subDirs) {
            if (subDir.isDirectory) {
              info("Loading log '" + subDir.getName + "'")
              val topicPartition = Log.parseTopicPartitionName(subDir.getName)
              val config = topicConfigs.getOrElse(topicPartition.topic, defaultConfig)
              val log = new Log(subDir, config, recoveryPoints.getOrElse(topicPartition, 0L), scheduler, time)
              val previous = addLogWithLock(topicPartition, log)
              if (previous != null)
                throw new IllegalArgumentException("Duplicate log directories found: %s, %s!".format(log.dir.getAbsolutePath, previous.dir.getAbsolutePath))
            }
          }
          cleanShutDownFile.delete()
        }
      }
    })
    thread.start()
    threads(i) = thread
    i = i + 1
  }
  for (thread <- threads) {
    thread.join()
  }
}

def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
  logCreationOrDeletionLock synchronized {
    this.logs.put(topicPartition, log)
  }
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
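The later attachments on this ticket (parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk-fixed-threadpool.patch) suggest the hand-rolled threads were eventually replaced with a pool. A minimal Java sketch of that shape: one recovery task per data directory, joined at the end. Here `recoverLogsIn` is a hypothetical stand-in for the real per-directory recovery work, not Kafka code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLogLoader {
    // Recover all data directories concurrently, one task per directory
    // (mirroring the one-thread-per-dir patch), then wait for them all.
    public static List<String> loadLogs(List<String> dataDirs) {
        ExecutorService pool = Executors.newFixedThreadPool(dataDirs.size());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : dataDirs) {
                futures.add(pool.submit(() -> recoverLogsIn(dir)));
            }
            List<String> recovered = new ArrayList<>();
            for (Future<String> f : futures) {
                recovered.add(f.get()); // equivalent of thread.join(); rethrows task failures
            }
            return recovered;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for the real per-directory work: reading the
    // recovery-point checkpoint and loading each topic-partition log.
    static String recoverLogsIn(String dir) {
        return "recovered:" + dir;
    }
}
```

Bounding the pool to one worker per data directory keeps the patch's assumption that each directory sits on its own drive, so concurrent recovery tasks mostly contend for different disks.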
[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability
[ https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076758#comment-14076758 ] saurabh agarwal commented on KAFKA-1555:

"this seems more like the ack=1 case in Kafka. Basically, if you have R replicas and you set dfs.replication.min to a value smaller than R, you can get an ack back sooner. However, there is a slight possibility that the acked data is lost with fewer than R-1 failures"

Agree on the part that ack=1 is similar to dfs.replication.min. It is actually also the same as the proposed min.isr.required. What I am suggesting is to pair it with acks=-1, which is similar to dfs.replication. In this case, when the system is running normally, all writes will get acked by all replicas (enforced by acks=-1). When replicas are going down one by one, min.isr will ensure that a minimum number of replicas remain.

provide strong consistency with reasonable availability
---
Key: KAFKA-1555
URL: https://issues.apache.org/jira/browse/KAFKA-1555
Project: Kafka
Issue Type: Improvement
Components: controller
Affects Versions: 0.8.1.1
Reporter: Jiang Wu
Assignee: Neha Narkhede

In a mission critical application, we expect a kafka cluster with 3 brokers to satisfy two requirements:
1. When 1 broker is down, no message loss or service blocking happens.
2. In worse cases, such as when two brokers are down, service can be blocked, but no message loss happens.
We found that the current kafka version (0.8.1.1) cannot achieve these requirements due to three of its behaviors:
1. When choosing a new leader from 2 followers in ISR, the one with fewer messages may be chosen as the leader.
2. Even when replica.lag.max.messages=0, a follower can stay in ISR when it has fewer messages than the leader.
3. ISR can contain only 1 broker, therefore acknowledged messages may be stored in only 1 broker.
The following is an analytical proof.
We consider a cluster with 3 brokers and a topic with 3 replicas, and assume that at the beginning, all 3 replicas, leader A, followers B and C, are in sync, i.e., they have the same messages and are all in ISR. According to the value of request.required.acks (acks for short), there are the following cases. 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement. 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this time, although C hasn't received m, C is still in ISR. If A is killed, C can be elected as the new leader, and consumers will miss m. 3. acks=-1. B and C restart and are removed from ISR. Producer sends a message m to A, and receives an acknowledgement. Disk failure happens in A before B and C replicate m. Message m is lost. In summary, any existing configuration cannot satisfy the requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
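The case analysis above can be checked mechanically with a toy model (an illustrative sketch, not Kafka code): a produce is acknowledged once enough replicas hold the message (all current ISR members for acks=-1), and an acknowledged message survives a failover only if the newly elected leader actually stored it, since followers truncate to the new leader's log:

```java
import java.util.Set;

public class AckModel {
    // A produce with request.required.acks = k is acknowledged once k replicas
    // hold the message; acks = -1 waits for every replica currently in the ISR.
    static boolean acked(int acks, int replicasWithMessage, Set<Integer> isr) {
        if (acks == -1) {
            return replicasWithMessage >= isr.size();
        }
        return replicasWithMessage >= acks;
    }

    // Followers truncate to the new leader's log on failover, so an
    // acknowledged message survives only if the elected leader stored it.
    static boolean lostAfterFailover(Set<Integer> brokersHoldingMessage, int newLeader) {
        return !brokersHoldingMessage.contains(newLeader);
    }
}
```

In this model, case 2 (acks=2, message on A and B, leader A killed, C elected) and case 3 (acks=-1 with the ISR shrunk to A alone) both come out as acknowledged-then-lost, matching the proof.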
Review Request 24006: Patch for KAFKA-1420
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24006/ --- Review request for kafka. Bugs: KAFKA-1420 https://issues.apache.org/jira/browse/KAFKA-1420 Repository: kafka Description --- KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in all unit tests Diffs - core/src/test/scala/unit/kafka/admin/AdminTest.scala e28979827110dfbbb92fe5b152e7f1cc973de400 core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala f44568cb25edf25db857415119018fd4c9922f61 core/src/test/scala/unit/kafka/utils/TestUtils.scala c4e13c5240c8303853d08cc3b40088f8c7dae460 Diff: https://reviews.apache.org/r/24006/diff/ Testing --- Thanks, Jonathan Natkins
Review Request 24008: Patch for KAFKA-1420
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24008/ --- Review request for kafka. Bugs: KAFKA-1420 https://issues.apache.org/jira/browse/KAFKA-1420 Repository: kafka Description --- KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in all unit tests Diffs - core/src/test/scala/unit/kafka/admin/AdminTest.scala e28979827110dfbbb92fe5b152e7f1cc973de400 core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala f44568cb25edf25db857415119018fd4c9922f61 core/src/test/scala/unit/kafka/utils/TestUtils.scala c4e13c5240c8303853d08cc3b40088f8c7dae460 Diff: https://reviews.apache.org/r/24008/diff/ Testing --- Thanks, Jonathan Natkins
Re: Review Request 24006: Patch for KAFKA-1420
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24006/ --- (Updated July 28, 2014, 8:52 p.m.) Review request for kafka. Bugs: KAFKA-1420 https://issues.apache.org/jira/browse/KAFKA-1420 Repository: kafka Description --- KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in all unit tests Diffs - core/src/test/scala/unit/kafka/admin/AdminTest.scala e28979827110dfbbb92fe5b152e7f1cc973de400 core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala f44568cb25edf25db857415119018fd4c9922f61 core/src/test/scala/unit/kafka/utils/TestUtils.scala c4e13c5240c8303853d08cc3b40088f8c7dae460 Diff: https://reviews.apache.org/r/24006/diff/ Testing (updated) --- Automated Thanks, Jonathan Natkins
[jira] [Updated] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Natkins updated KAFKA-1420: Attachment: KAFKA-1420.patch Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1420.patch This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24007: Patch for KAFKA-1420
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24007/ --- Review request for kafka. Bugs: KAFKA-1420 https://issues.apache.org/jira/browse/KAFKA-1420 Repository: kafka Description --- KAFKA-1420 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in all unit tests Diffs - core/src/test/scala/unit/kafka/admin/AdminTest.scala e28979827110dfbbb92fe5b152e7f1cc973de400 core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 29cc01bcef9cacd8dec1f5d662644fc6fe4994bc core/src/test/scala/unit/kafka/integration/UncleanLeaderElectionTest.scala f44568cb25edf25db857415119018fd4c9922f61 core/src/test/scala/unit/kafka/utils/TestUtils.scala c4e13c5240c8303853d08cc3b40088f8c7dae460 Diff: https://reviews.apache.org/r/24007/diff/ Testing --- Thanks, Jonathan Natkins
[jira] [Commented] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076838#comment-14076838 ] Guozhang Wang commented on KAFKA-1420: -- Thanks for the patch, will review it asap. Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1420.patch This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Natkins updated KAFKA-1420: Status: Patch Available (was: Open) Patch available at https://reviews.apache.org/r/24006/ Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.8.2 This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Natkins updated KAFKA-1420: Attachment: (was: KAFKA-1420.patch) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1420.patch This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests
[ https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Natkins updated KAFKA-1420: Attachment: KAFKA-1420.patch Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests -- Key: KAFKA-1420 URL: https://issues.apache.org/jira/browse/KAFKA-1420 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Labels: newbie Fix For: 0.8.2 Attachments: KAFKA-1420.patch This is a follow-up JIRA from KAFKA-1389. There are a bunch of places in the unit tests where we misuse AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, where TestUtils.createTopic needs to be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24012: Patch for KAFKA-1560
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24012/ --- Review request for kafka. Bugs: KAFKA-1560 https://issues.apache.org/jira/browse/KAFKA-1560 Repository: kafka Description --- KAFKA-1560 Make arguments to jira-python API more explicit in kafka-patch-review's get_jira() Diffs - kafka-patch-review.py dc45549f886440f1721c60aab9aa0a4af9b4cbef Diff: https://reviews.apache.org/r/24012/diff/ Testing --- Thanks, Jonathan Natkins
[jira] [Updated] (KAFKA-1560) Make arguments to jira-python API more explicit in kafka-patch-review's get_jira()
[ https://issues.apache.org/jira/browse/KAFKA-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Natkins updated KAFKA-1560: Status: Patch Available (was: Open) Make arguments to jira-python API more explicit in kafka-patch-review's get_jira() --- Key: KAFKA-1560 URL: https://issues.apache.org/jira/browse/KAFKA-1560 Project: Kafka Issue Type: Improvement Reporter: Jonathan Natkins Priority: Minor Attachments: KAFKA-1560.patch I ran into an issue with kafka-patch-review.py as a result of a change in parameters for the JIRA() object between jira-python 0.28 and 0.29. A quick fix is to make parameters more explicit. As a side note, there is a separate bug affecting the review tool, but I'm not sure it's fixable without modifying the jira-python source (https://bitbucket.org/bspeakmon/jira-python/issue/122/clientissue-function-needs-param-removal) It seems the answer is that people should not use jira-python 0.29 yet, since it's a little broken. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (KAFKA-1560) Make arguments to jira-python API more explicit in kafka-patch-review's get_jira()
[ https://issues.apache.org/jira/browse/KAFKA-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076957#comment-14076957 ] Jonathan Natkins commented on KAFKA-1560: - Created reviewboard https://reviews.apache.org/r/24012/diff/ against branch origin/trunk Make arguments to jira-python API more explicit in kafka-patch-review's get_jira() --- Key: KAFKA-1560 URL: https://issues.apache.org/jira/browse/KAFKA-1560 Project: Kafka Issue Type: Improvement Reporter: Jonathan Natkins Priority: Minor Attachments: KAFKA-1560.patch I ran into an issue with kafka-patch-review.py as a result of a change in parameters for the JIRA() object between jira-python 0.28 and 0.29. A quick fix is to make parameters more explicit. As a side note, there is a separate bug affecting the review tool, but I'm not sure it's fixable without modifying the jira-python source (https://bitbucket.org/bspeakmon/jira-python/issue/122/clientissue-function-needs-param-removal) It seems the answer is that people should not use jira-python 0.29 yet, since it's a little broken. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23339: Patch for KAFKA-1507
On July 28, 2014, 3:16 a.m., Jun Rao wrote:

clients/src/main/java/org/apache/kafka/clients/producer/internals/Metadata.java, lines 56-57
https://reviews.apache.org/r/23339/diff/3/?file=641061#file641061line56

We need to pass in the createTopic flag in this constructor too. In the producer, we will then set createTopic to true, and in the consumer, we will set it to false.

I added another constructor, public Metadata(long refreshBackoffMs, long metadataExpireMs, boolean createTopic), and kept this one for backward compatibility; if it causes confusion I'll remove it. I was thinking that if the user is calling Metadata() from the producer, createTopic should be transparent to them, as they expect it to work since it's part of the broker config by default.

- Sriharsha

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23339/#review48822 ---

On July 24, 2014, 12:07 a.m., Sriharsha Chintalapani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23339/ --- (Updated July 24, 2014, 12:07 a.m.) Review request for kafka. Bugs: KAFKA-1507 https://issues.apache.org/jira/browse/KAFKA-1507 Repository: kafka Description --- KAFKA-1507. Using GetOffsetShell against non-existent topic creates the topic unintentionally. Changes as per Jun's suggestions.
Diffs - clients/src/main/java/org/apache/kafka/clients/NetworkClient.java d8f9ce663ee24d2b0852c974136741280c39f8f8 clients/src/main/java/org/apache/kafka/clients/producer/internals/Metadata.java 4aa5b01d611631db72df47d50bbe30edb8c478db clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java 7517b879866fc5dad5f8d8ad30636da8bbe7784a clients/src/main/java/org/apache/kafka/common/protocol/types/Struct.java 444e69e7c95d5ffad19896fff0ab15cb4f5c9b4e clients/src/main/java/org/apache/kafka/common/requests/MetadataRequest.java b22ca1dce65f665d84c2a980fd82f816e93d9960 clients/src/test/java/org/apache/kafka/common/requests/RequestResponseTest.java df37fc6d8f0db0b8192a948426af603be3444da4 core/src/main/scala/kafka/api/TopicMetadataRequest.scala 7dca09ce637a40e125de05703dc42e8b611971ac core/src/main/scala/kafka/client/ClientUtils.scala ce7ede3f6d60e756e252257bd8c6fedc21f21e1c core/src/main/scala/kafka/consumer/ConsumerFetcherManager.scala b9e2bea7b442a19bcebd1b350d39541a8c9dd068 core/src/main/scala/kafka/javaapi/TopicMetadataRequest.scala b0b7be14d494ae8c87f4443b52db69d273c20316 core/src/main/scala/kafka/server/KafkaApis.scala fd5f12ee31e78cdcda8e24a0ab3e1740b3928884 core/src/main/scala/kafka/tools/GetOffsetShell.scala 9c6064e201eebbcd5b276a0dedd02937439edc94 core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala af4783646803e58714770c21f8c3352370f26854 core/src/main/scala/kafka/tools/SimpleConsumerShell.scala 36314f412a8281aece2789fd2b74a106b82c57d2 core/src/test/scala/unit/kafka/admin/AddPartitionsTest.scala 1bf2667f47853585bc33ffb3e28256ec5f24ae84 core/src/test/scala/unit/kafka/integration/TopicMetadataTest.scala 35dc071b1056e775326981573c9618d8046e601d Diff: https://reviews.apache.org/r/23339/diff/ Testing --- Thanks, Sriharsha Chintalapani
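The overload-for-compatibility approach Sriharsha describes, keeping the two-argument constructor and delegating to a new three-argument one, looks roughly like this. This is a sketch of the pattern only, not the contents of the real Metadata.java; the `allowTopicCreation` accessor is a hypothetical name for illustration:

```java
// Sketch of the constructor-overloading pattern from the review thread.
// Field and accessor names are illustrative, not the real Metadata class.
public class Metadata {
    private final long refreshBackoffMs;
    private final long metadataExpireMs;
    private final boolean createTopic;

    // Existing two-argument constructor kept for backward compatibility;
    // it preserves the old behavior of allowing topic auto-creation.
    public Metadata(long refreshBackoffMs, long metadataExpireMs) {
        this(refreshBackoffMs, metadataExpireMs, true);
    }

    // New constructor proposed in the review: the producer would pass
    // createTopic = true, while tools and consumers would pass false.
    public Metadata(long refreshBackoffMs, long metadataExpireMs, boolean createTopic) {
        this.refreshBackoffMs = refreshBackoffMs;
        this.metadataExpireMs = metadataExpireMs;
        this.createTopic = createTopic;
    }

    public boolean allowTopicCreation() {
        return createTopic;
    }
}
```

Defaulting the old constructor to createTopic = true preserves the existing auto-create behavior for callers that have not been updated, while tools like GetOffsetShell can opt out explicitly.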
[jira] [Created] (KAFKA-1561) Data Loss for Incremented Replica Factor and Leader Election
Guozhang Wang created KAFKA-1561: Summary: Data Loss for Incremented Replica Factor and Leader Election Key: KAFKA-1561 URL: https://issues.apache.org/jira/browse/KAFKA-1561 Project: Kafka Issue Type: Bug Reporter: Guozhang Wang Assignee: Guozhang Wang Fix For: 0.8.2 This is reported on the mailing list (thanks to Jad). quote Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on localhost:56821 2. Broker 1 is still catching up Broker 0 Log: [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971. 
All leo's are 975,971 [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms [2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:57086 ;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0 3. Leader election is triggered by the scheduler: Broker 0 Log: [2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0] [k.c.PreferredReplicaPartitionLeaderSelector] [PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [ EventServiceUpsertTopic,13] is not the preferred replica. Trigerring preferred replica leader election [2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/13/state with value {controller_epoch:1,leader:1,version:1,leader_epoch:3,isr:[0,1]} and expected version 3 succeeded, returning the new version: 4 [2014-07-24 14:44:26,994] [DEBUG] [kafka-scheduler-0] [k.controller.PartitionStateMachine] [Partition state machine on Controller 0]: After leader election, leader cache is updated to Map(Snipped(Leader:1,ISR:0,1,LeaderEpoch:3,ControllerEpoch:1),EndSnip) [2014-07-24 14:44:26,994] [INFO ] [kafka-scheduler-0] [kafka.controller.KafkaController] [Controller 0]: Partition [ EventServiceUpsertTopic,13] completed preferred replica leader election. New leader is 1 4. Broker 1 is still behind, but it sets the high water mark to 971!!! 
Broker 1 Log: [2014-07-24 14:44:26,999] [INFO ] [kafka-request-handler-6] [kafka.server.ReplicaFetcherManager] [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [EventServiceUpsertTopic,13] [2014-07-24 14:44:27,000] [DEBUG] [kafka-request-handler-6] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Old hw for partition [EventServiceUpsertTopic,13] is 970. New hw is -1. All leo's are -1,971 [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-1] Maybe update partition HW due to fetch request: Name: FetchRequest; Version: 0; CorrelationId: 1; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [EventServiceUpsertTopic,13] - PartitionFetchInfo(971,1048576), Snipped [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Recording follower 0 position 971 for partition [ EventServiceUpsertTopic,13]. [2014-07-24 14:44:27,100] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Highwatermark for partition [EventServiceUpsertTopic,13] updated to 971
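The high-watermark movements in steps 2-4 follow from how the leader derives the HW: it is the minimum log-end-offset (LEO) across the ISR, so it can sit below the leader's own LEO while a follower catches up. A toy sketch of that calculation (illustrative, not broker code):

```java
import java.util.Map;
import java.util.Set;

public class HighWatermark {
    // The leader advances the partition HW to the smallest log-end-offset
    // among the in-sync replicas: everything below the HW is known to be
    // present on every ISR member.
    static long leaderHighWatermark(Map<Integer, Long> leoByReplica, Set<Integer> isr) {
        long hw = Long.MAX_VALUE;
        for (int replica : isr) {
            hw = Math.min(hw, leoByReplica.getOrDefault(replica, 0L));
        }
        return hw == Long.MAX_VALUE ? 0L : hw;
    }
}
```

With LEOs of 975 (broker 0) and 971 (broker 1) and both in the ISR, the HW stays at 971 even though the old leader has offsets up to 975; after broker 1 takes over as leader, offsets 972-975 sit above the HW it knows about, which is exactly the range the consumer never sees.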
[jira] [Updated] (KAFKA-1561) Data Loss for Incremented Replica Factor and Leader Election
[ https://issues.apache.org/jira/browse/KAFKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-1561: - Description: This is reported on the mailing list (thanks to Jad). {quote} Hi, I have a test that continuously sends messages to one broker, brings up another broker, and adds it as a replica for all partitions, with it being the preferred replica for some. I have auto.leader.rebalance.enable=true, so replica election gets triggered. Data is being pumped to the old broker all the while. It seems that some data gets lost while switching over to the new leader. Is this a bug, or do I have something misconfigured? I also have request.required.acks=-1 on the producer. Here's what I think is happening: 1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/ broker 0 currently leader, with ISR=(0), so write returns successfully, even when acks = -1. Correlation id 35836 Producer log: [2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [kafka.producer.BrokerPartitionInfo] Partition [EventServiceUpsertTopic,13] has leader 0 [2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH /v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1] [k.producer.async.DefaultEventHandler] Producer sent messages with correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on localhost:56821 2. Broker 1 is still catching up Broker 0 Log: [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971. 
All leo's are 975,971 [2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms [2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0] [kafka.request.logger] Completed request:Name: ProducerRequest; Version: 0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 1 ms from client /127.0.0.1:57086 ;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0 3. Leader election is triggered by the scheduler: Broker 0 Log: [2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0] [k.c.PreferredReplicaPartitionLeaderSelector] [PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [ EventServiceUpsertTopic,13] is not the preferred replica. Trigerring preferred replica leader election [2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0] [kafka.utils.ZkUtils$] Conditional update of path /brokers/topics/EventServiceUpsertTopic/partitions/13/state with value {controller_epoch:1,leader:1,version:1,leader_epoch:3,isr:[0,1]} and expected version 3 succeeded, returning the new version: 4 [2014-07-24 14:44:26,994] [DEBUG] [kafka-scheduler-0] [k.controller.PartitionStateMachine] [Partition state machine on Controller 0]: After leader election, leader cache is updated to Map(Snipped(Leader:1,ISR:0,1,LeaderEpoch:3,ControllerEpoch:1),EndSnip) [2014-07-24 14:44:26,994] [INFO ] [kafka-scheduler-0] [kafka.controller.KafkaController] [Controller 0]: Partition [ EventServiceUpsertTopic,13] completed preferred replica leader election. New leader is 1 4. Broker 1 is still behind, but it sets the high water mark to 971!!! 
Broker 1 Log: [2014-07-24 14:44:26,999] [INFO ] [kafka-request-handler-6] [kafka.server.ReplicaFetcherManager] [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [EventServiceUpsertTopic,13] [2014-07-24 14:44:27,000] [DEBUG] [kafka-request-handler-6] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Old hw for partition [EventServiceUpsertTopic,13] is 970. New hw is -1. All leo's are -1,971 [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.server.KafkaApis] [KafkaApi-1] Maybe update partition HW due to fetch request: Name: FetchRequest; Version: 0; CorrelationId: 1; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [EventServiceUpsertTopic,13] - PartitionFetchInfo(971,1048576), Snipped [2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Recording follower 0 position 971 for partition [ EventServiceUpsertTopic,13]. [2014-07-24 14:44:27,100] [DEBUG] [kafka-request-handler-3] [kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker 1: Highwatermark for partition [EventServiceUpsertTopic,13] updated to 971 5. Consumer is none the wiser. All data that was in offsets 972-975 doesn't show up! I tried this with 2 initial replicas, and adding a 3rd which is supposed to be the leader for some new partitions,
[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally
[ https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077182#comment-14077182 ] Jay Kreps commented on KAFKA-1507: -- I kind of feel this is not the right fix. I think the behavior we have is kind of indefensible. The history was that we previously had topic auto-creation that happened when a message was sent to a broker and that broker didn't host the topic. However, when we moved to replication this broke for multiple reasons: the producer needed metadata about the topic to send the message, and not all brokers would have all partitions. However, this auto-create behavior is very useful. So we kind of just grandfathered it in by having the metadata request have the same side effect. But this is really a silly, indefensible behavior. There is no reason that asking for metadata should create topics! This would be like a database where running DESCRIBE TABLE X would create table X for you if it didn't exist. This just confuses everyone, as it must have confused you. In any case the auto-creation behavior is very limited because there is no way to specify the number of partitions, the replication factor, or any topic-specific configuration. Rather than further enshrining this behavior by starting to add topic creation options to the metadata request, I really think we should add a proper create_topic API and have producers use that. We could even make this a little more general and handle create, alter, and delete. This would give clients full control.
Using GetOffsetShell against non-existent topic creates the topic unintentionally
--
Key: KAFKA-1507
URL: https://issues.apache.org/jira/browse/KAFKA-1507
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8.1.1
Environment: centos
Reporter: Luke Forehand
Assignee: Sriharsha Chintalapani
Priority: Minor
Labels: newbie
Attachments: KAFKA-1507.patch, KAFKA-1507_2014-07-22_10:27:45.patch, KAFKA-1507_2014-07-23_17:07:20.patch

A typo in using the GetOffsetShell command can cause a topic to be created which cannot be deleted (because deletion is still in progress):

./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka10:9092,kafka11:9092,kafka12:9092,kafka13:9092 --topic typo --time 1

./kafka-topics.sh --zookeeper stormqa1/kafka-prod --describe --topic typo
Topic:typo  PartitionCount:8  ReplicationFactor:1  Configs:
  Topic: typo  Partition: 0  Leader: 10  Replicas: 10  Isr: 10
...

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23895: Patch for KAFKA-1419
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23895/#review48957
---

build.gradle
https://reviews.apache.org/r/23895/#comment85795

    I think these two dependencies need to be updated. As I was trying out this code, I got some strange compile errors [1] in a downstream project. This seems to work better:

    + compile 'org.scala-lang.modules:scala-xml_2.11:1.0.2'
    + compile 'org.scala-lang.modules:scala-parser-combinators_2.11:1.0.2'

    That error also makes me wonder if those compile dependencies are actually needed, too.

    [1] java.lang.RuntimeException: Conflicting cross-version suffixes in: org.scala-lang.modules:scala-xml, org.scala-lang.modules:scala-parser-combinators

- Joe Crobak

On July 28, 2014, 3:07 p.m., Ivan Lyutov wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23895/
---

(Updated July 28, 2014, 3:07 p.m.)

Review request for kafka.

Bugs: KAFKA-1419
https://issues.apache.org/jira/browse/KAFKA-1419

Repository: kafka

Description
---
KAFKA-1419
- cross build for scala 2.11
- dropped scala 2.8 support
- minor bug fixes

Diffs
---
build.gradle a72905df824ba68bed5d5170d18873c23e1782c9
core/src/main/scala/kafka/javaapi/message/ByteBufferMessageSet.scala fecee8d5f7b32f483bb1bfc6a5080d589906f9c4
core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 73401c5ff34d08abce22267aa9c4d86632c6fb74
gradle.properties 4827769a3f8e34f0fe7e783eb58e44d4db04859b
gradle/buildscript.gradle 225e0a82708bc5f390e5e2c1d4d9a0d06f491b95
gradle/wrapper/gradle-wrapper.properties 610282a699afc89a82203ef0e4e71ecc53761039
scala.gradle ebd21b870c0746aade63248344ab65d9b5baf820

Diff: https://reviews.apache.org/r/23895/diff/

Testing
---

Thanks,
Ivan Lyutov
[jira] [Commented] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077385#comment-14077385 ]

Manikumar Reddy commented on KAFKA-1451:

Updated reviewboard https://reviews.apache.org/r/23962/diff/ against branch origin/trunk

Broker stuck due to leader election race
--
Key: KAFKA-1451
URL: https://issues.apache.org/jira/browse/KAFKA-1451
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.1.1
Reporter: Maciek Makowski
Assignee: Manikumar Reddy
Priority: Minor
Labels: newbie
Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch, KAFKA-1451_2014-07-29_10:13:23.patch

h3. Symptoms

The broker does not become available because it is stuck in an infinite loop while electing a leader. This can be recognised by the following line being repeatedly written to server.log:

{code}
[2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node [{version:1,brokerid:1,timestamp:1400060079108}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
{code}

h3. Steps to Reproduce

In a setup with a single Kafka 0.8.1.1 node and a single ZooKeeper 3.4.6 node (though it will likely behave the same with the ZK version included in the Kafka distribution):

# start both zookeeper and kafka (in any order)
# stop zookeeper
# stop kafka
# start kafka
# start zookeeper

h3. Likely Cause

{{ZookeeperLeaderElector}} subscribes to data changes on startup, and then triggers an election. If the deletion of the ephemeral {{/controller}} node associated with the broker's previous ZooKeeper session happens after the subscription to changes in the new session, the election will be invoked twice, once from {{startup}} and once from {{handleDataDeleted}}:

* {{startup}}: acquire {{controllerLock}}
* {{startup}}: subscribe to data changes
* zookeeper: delete {{/controller}} since the session that created it timed out
* {{handleDataDeleted}}: {{/controller}} was deleted
* {{handleDataDeleted}}: wait on {{controllerLock}}
* {{startup}}: elect -- writes {{/controller}}
* {{startup}}: release {{controllerLock}}
* {{handleDataDeleted}}: acquire {{controllerLock}}
* {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then gets into an infinite loop as a result of the conflict

{{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing znode was written from a different session, which is not true in this case; it was written from the same session. That adds to the confusion.

h3. Suggested Fix

In {{ZookeeperLeaderElector.startup}}, first run {{elect}} and then subscribe to data changes.

-- This message was sent by Atlassian JIRA (v6.2#6252)
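The interleaving listed above can be played out in a toy model (plain Python, not the actual Scala code; `Elector`, `zk_deletes_stale_node`, and the sequential call ordering are constructed for illustration, with the lock standing in for {{controllerLock}}):

```python
import threading

class Elector:
    """Toy model of the double-election interleaving described above."""
    def __init__(self):
        self.controller_lock = threading.Lock()
        self.znode = None          # stands in for the /controller znode
        self.elect_calls = []

    def elect(self, caller):
        self.elect_calls.append(caller)
        if self.znode is None:
            self.znode = "broker-1"
        # else: conflict -- in the real code this is where
        # createEphemeralPathExpectConflictHandleZKBug backs off forever,
        # because the conflicting znode came from the *same* session.

    def startup(self):
        with self.controller_lock:
            # 1. subscribe to data changes (watch registered)
            # 2. ZooKeeper deletes the stale /controller from the old
            #    session; the watch fires but queues on controller_lock
            self.elect("startup")            # 3. first election wins
        self.handle_data_deleted()           # 4. queued watcher runs

    def handle_data_deleted(self):
        with self.controller_lock:
            self.elect("handleDataDeleted")  # 5. second election: conflict

e = Elector()
e.startup()
assert e.elect_calls == ["startup", "handleDataDeleted"]  # elected twice
assert e.znode == "broker-1"   # znode written by the *same* broker/session
```

The suggested fix maps onto this model directly: if `startup` ran `elect` before registering the watch, the stale-node deletion would be observed only once.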
Re: Review Request 23962: Patch for KAFKA-1451
On July 29, 2014, 4:08 a.m., Jun Rao wrote:

    core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala, lines 71-72
    https://reviews.apache.org/r/23962/diff/3/?file=643587#file643587line71

    I think we need to first update leaderId here based on the value returned from getControllerID(), and then check if the controller exists. Otherwise, if another broker is the controller, this broker's leaderId won't be updated correctly.

Initially I also thought of updating the leaderId field, but since we use the leaderId field only in the amILeader method, I thought it was OK not to update it. Anyhow, I will upload a new patch with the correct leaderId update.

- Manikumar Reddy

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23962/#review48965
---

On July 29, 2014, 4:45 a.m., Manikumar Reddy O wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23962/
---

(Updated July 29, 2014, 4:45 a.m.)

Review request for kafka.

Bugs: KAFKA-1451
https://issues.apache.org/jira/browse/KAFKA-1451

Repository: kafka

Description
---
Addressing Jun's comments

Diffs
---
core/src/main/scala/kafka/server/ZookeeperLeaderElector.scala e5b6ff1e2544b043007cf16a6b9dd4451c839e63

Diff: https://reviews.apache.org/r/23962/diff/

Testing
---

Thanks,
Manikumar Reddy O
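The ordering Jun asks for can be sketched in a toy model (plain Python, not the actual Scala; `LeaderElector`, the dict standing in for ZooKeeper, and the method names are illustrative): refresh `leaderId` from the stored controller value before deciding whether a controller already exists, so that `amILeader` stays correct even when another broker won the election.

```python
class LeaderElector:
    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk               # dict standing in for ZooKeeper
        self.leader_id = -1

    def get_controller_id(self):
        return self.zk.get("/controller", -1)

    def elect(self):
        # First update leader_id from the stored value...
        self.leader_id = self.get_controller_id()
        # ...then check whether a controller already exists. Done in the
        # other order, leader_id would stay stale when another broker won.
        if self.leader_id != -1:
            return self.am_i_leader()
        self.zk["/controller"] = self.broker_id
        self.leader_id = self.broker_id
        return True

    def am_i_leader(self):
        return self.leader_id == self.broker_id

zk = {"/controller": 2}            # broker 2 is already the controller
e = LeaderElector(broker_id=1, zk=zk)
assert e.elect() is False          # broker 1 loses the election...
assert e.leader_id == 2            # ...and correctly records the winner
```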
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451_2014-07-29_10:13:23.patch

Broker stuck due to leader election race
--
Key: KAFKA-1451
URL: https://issues.apache.org/jira/browse/KAFKA-1451

-- This message was sent by Atlassian JIRA (v6.2#6252)