[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725206#comment-15725206
 ] 

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user resetius opened a pull request:

https://github.com/apache/kafka/pull/2213

KAFKA-3038; Future-based pseudo-async controller



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/resetius/kafka KAFKA-3038-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2213


commit 339f8d76f7c2eb1b4ff45c7e088c6c8486ba786a
Author: Alexey Ozeritsky 
Date:   2016-12-01T17:29:12Z

KAFKA-3038; Future-based pseudo-async controller




> Speeding up partition reassignment after broker failure
> ---
>
> Key: KAFKA-3038
> URL: https://issues.apache.org/jira/browse/KAFKA-3038
> Project: Kafka
> Issue Type: Improvement
> Components: controller, core
> Affects Versions: 0.9.0.0
> Reporter: Eno Thereska
> Fix For: 0.11.0.0
>
>
> After a broker failure, the controller does several writes to ZooKeeper for
> each partition on the failed broker. Writes are done one at a time, in a closed
> loop, which is slow, especially over high-latency networks. ZooKeeper
> supports batching operations (the "multi" API). Substituting batched writes
> for serial ones is expected to reduce failure handling time by an order of
> magnitude.
> This is identified as an issue in
> https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3
> (section "End-to-end latency during a broker failure").
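
To make the "order of magnitude" expectation in the description concrete, here is a back-of-the-envelope latency model. It is only a sketch: the partition count, writes per partition, round-trip time and batch size are hypothetical numbers chosen for illustration, and server-side processing cost is ignored.

```python
import math


def serial_seconds(partitions, writes_per_partition, rtt_s):
    """Closed loop: every write waits for the previous write's reply."""
    return partitions * writes_per_partition * rtt_s


def batched_seconds(partitions, writes_per_partition, rtt_s, batch_size):
    """Batched: one round trip per batch of writes (server cost ignored)."""
    total_writes = partitions * writes_per_partition
    return math.ceil(total_writes / batch_size) * rtt_s


# Hypothetical inputs, not measurements from the issue.
partitions = 1000
writes_per_partition = 3
rtt_s = 0.010
batch_size = 10

serial = serial_seconds(partitions, writes_per_partition, rtt_s)
batched = batched_seconds(partitions, writes_per_partition, rtt_s, batch_size)

print(f"serial:  {serial:.1f} s")
print(f"batched: {batched:.1f} s")
print(f"speedup: {serial / batched:.0f}x")
```

Under this model the speedup is roughly the batch size, so a tenfold gain needs about ten writes per round trip.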



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

2016-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112505#comment-15112505
 ] 

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user enothereska closed the pull request at:

https://github.com/apache/kafka/pull/750


> Speeding up partition reassignment after broker failure
> ---
>
> Key: KAFKA-3038
> URL: https://issues.apache.org/jira/browse/KAFKA-3038
> Project: Kafka
> Issue Type: Improvement
> Components: controller, core
> Affects Versions: 0.9.0.0
> Reporter: Eno Thereska
> Assignee: Eno Thereska
> Fix For: 0.9.0.0
>
>
> After a broker failure, the controller does several writes to ZooKeeper for
> each partition on the failed broker. Writes are done one at a time, in a closed
> loop, which is slow, especially over high-latency networks. ZooKeeper
> supports batching operations (the "multi" API). Substituting batched writes
> for serial ones is expected to reduce failure handling time by an order of
> magnitude.
> This is identified as an issue in
> https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3
> (section "End-to-end latency during a broker failure").





[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

2016-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090765#comment-15090765
 ] 

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/750

KAFKA-3038: use async ZK calls to speed up leader reassignment

Updated the failure code path to deal specifically with the issue identified as
affecting latency most.
@fpj could you have a look please?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka kafka-3038

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #750


commit 3be8bb68c6ccb37b77ed527cf4ff05bc80ee8e99
Author: Eno Thereska 
Date:   2016-01-08T16:09:38Z

Asynchronous implementation of failure path when updating Zookeeper

commit e288c5e35d151e6e8ce06eaa1076ebb2ceb2db13
Author: Eno Thereska 
Date:   2016-01-08T16:10:07Z

Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038

commit 3913ab76707a6ad125b4252d88bc3cdf091702ee
Author: Eno Thereska 
Date:   2016-01-09T18:23:33Z

Implemented top method using a CountDownLatch. Minor code cleanup

commit a40ad4e768f1c626fc6c818c28d22f0a91d33eaf
Author: Eno Thereska 
Date:   2016-01-09T18:24:25Z

Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038






[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

2016-01-06 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085300#comment-15085300
 ] 

Flavio Junqueira commented on KAFKA-3038:
-

You don't really need to batch with multi, you just need to make the calls 
asynchronous. In fact, unless you really need to make multiple updates 
transactional, the preferred way is to push updates asynchronously to keep the 
pipeline full.
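
The difference between closed-loop writes and pipelined asynchronous writes can be sketched with a small simulation. This is not the ZooKeeper client API: FakeStore, set_sync, set_async, the paths and the 20 ms round trip are all hypothetical stand-ins, and a thread pool only imitates requests being in flight at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

ROUND_TRIP_S = 0.02  # assumed round trip per write, for illustration only


class FakeStore:
    """Hypothetical stand-in for a remote store; NOT the ZooKeeper client."""

    def __init__(self):
        self.data = {}
        self._pool = ThreadPoolExecutor(max_workers=16)

    def _apply(self, path, value):
        time.sleep(ROUND_TRIP_S)  # simulate one network round trip
        self.data[path] = value
        return path

    def set_sync(self, path, value):
        # Blocks the caller until the simulated reply arrives.
        return self._apply(path, value)

    def set_async(self, path, value):
        # Returns immediately with a Future; the write completes later.
        return self._pool.submit(self._apply, path, value)


def update_serial(store, updates):
    # Closed loop: one outstanding write at a time.
    for path, value in updates.items():
        store.set_sync(path, value)


def update_pipelined(store, updates):
    # Issue every write first, then wait for all replies.
    futures = [store.set_async(p, v) for p, v in updates.items()]
    wait(futures)
    for future in futures:
        future.result()  # re-raise any failure from a write


def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


updates = {f"/partitions/{i}/state": f"leader={i % 3}" for i in range(10)}
serial_store, pipelined_store = FakeStore(), FakeStore()
serial_s = timed(update_serial, serial_store, updates)
pipelined_s = timed(update_pipelined, pipelined_store, updates)
print(f"serial: {serial_s:.3f} s, pipelined: {pipelined_s:.3f} s")
```

Both variants end with the same stored data. The pipelined one pays roughly one round trip in total instead of one per write, but unlike a transactional batch it gives no all-or-nothing guarantee across the updates.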



[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure

2016-01-06 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085628#comment-15085628
 ] 

Eno Thereska commented on KAFKA-3038:
-

[~fpj]: makes sense, thanks
