[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725206#comment-15725206 ]

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user resetius opened a pull request:

    https://github.com/apache/kafka/pull/2213

    KAFKA-3038; Future'based pseudo-async controller

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/resetius/kafka KAFKA-3038-trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2213

commit 339f8d76f7c2eb1b4ff45c7e088c6c8486ba786a
Author: Alexey Ozeritsky
Date:   2016-12-01T17:29:12Z

    KAFKA-3038; Future'based pseudo-async controller

> Speeding up partition reassignment after broker failure
> ---
>
>                 Key: KAFKA-3038
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3038
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 0.9.0.0
>            Reporter: Eno Thereska
>             Fix For: 0.11.0.0
>
>
> After a broker failure the controller does several writes to Zookeeper for
> each partition on the failed broker. Writes are done one at a time, in a closed
> loop, which is slow, especially on high-latency networks. Zookeeper has
> support for batching operations (the "multi" API). It is expected that
> substituting serial writes with batched ones should reduce failure handling
> time by an order of magnitude.
> This is identified as an issue in
> https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3
> (section End-to-end latency during a broker failure)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
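The issue description contrasts serial, closed-loop writes with ZooKeeper's batched "multi" call. Serial writes cost one round trip per znode update. The sketch below is a hypothetical in-memory model and does not use the ZooKeeper client. `FakeStore`, `set_data`, `multi` and the paths are illustrative names, and each request is assumed to cost exactly one round trip. In the real Java client the batched form is `ZooKeeper.multi(Iterable<Op>)` with `Op.setData(path, data, version)`. The model counts round trips and also shows the all-or-nothing behaviour of a batch that carries version checks.

```python
from dataclasses import dataclass, field


class BadVersion(Exception):
    """A conditional update saw a different znode version than expected."""


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for ZooKeeper; NOT the real client API.

    Simplification: writing to a missing path creates it (real setData fails).
    """
    data: dict = field(default_factory=dict)  # path -> (value, version)
    round_trips: int = 0                      # one per request sent

    def set_data(self, path, value, expected_version=None):
        """Serial-style write: costs one round trip per znode."""
        self.round_trips += 1
        _, version = self.data.get(path, (None, 0))
        if expected_version is not None and version != expected_version:
            raise BadVersion(path)
        self.data[path] = (value, version + 1)

    def multi(self, ops):
        """Batched write of (path, value, expected_version) ops in one round trip.

        Ops are applied in order to a copy, so either all of them take
        effect or, if any version check fails, none of them do.
        """
        self.round_trips += 1
        staged = dict(self.data)
        for path, value, expected_version in ops:
            _, version = staged.get(path, (None, 0))
            if expected_version is not None and version != expected_version:
                raise BadVersion(path)  # self.data is left untouched
            staged[path] = (value, version + 1)
        self.data = staged  # commit every update at once


# Illustrative znode paths, one per partition led by the failed broker.
partitions = [f"/brokers/topics/t/partitions/{p}/state" for p in range(100)]

serial = FakeStore()
for path in partitions:  # closed loop: each write finishes before the next
    serial.set_data(path, "leader=2")

batched = FakeStore()
batched.multi([(path, "leader=2", None) for path in partitions])

print(serial.round_trips, batched.round_trips)  # 100 1

# A failed version check on the second op rolls back the first one too.
try:
    batched.multi([(partitions[0], "leader=3", 1), (partitions[1], "leader=3", 99)])
except BadVersion:
    pass
assert batched.data[partitions[0]] == ("leader=2", 1)
```

Under this model the round-trip count drops from N to 1. That is the source of the order-of-magnitude estimate in the description. The atomicity is a separate property, and it only matters when the updates must succeed or fail together.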
[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112507#comment-15112507 ]

Eno Thereska commented on KAFKA-3038:
-

Closing the initial PR, since there is an opportunity to speed up other parts of the controller (in addition to failover). It is likely this JIRA will be part of a larger story.

> Speeding up partition reassignment after broker failure
> ---
>
>                 Key: KAFKA-3038
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3038
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 0.9.0.0
>            Reporter: Eno Thereska
>            Assignee: Eno Thereska
>             Fix For: 0.9.0.0
[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112505#comment-15112505 ]

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user enothereska closed the pull request at:

    https://github.com/apache/kafka/pull/750
[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090765#comment-15090765 ]

ASF GitHub Bot commented on KAFKA-3038:
---

GitHub user enothereska opened a pull request:

    https://github.com/apache/kafka/pull/750

    KAFKA-3038: use async ZK calls to speed up leader reassignment

    Updated the failure code path to deal specifically with the issue identified as affecting latency the most. @fpj could you have a look please?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/enothereska/kafka kafka-3038

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #750

commit 3be8bb68c6ccb37b77ed527cf4ff05bc80ee8e99
Author: Eno Thereska
Date:   2016-01-08T16:09:38Z

    Asynchronous implementation of failure path when updating Zookeeper

commit e288c5e35d151e6e8ce06eaa1076ebb2ceb2db13
Author: Eno Thereska
Date:   2016-01-08T16:10:07Z

    Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038

commit 3913ab76707a6ad125b4252d88bc3cdf091702ee
Author: Eno Thereska
Date:   2016-01-09T18:23:33Z

    Implemented top method using a CountDownLatch. Minor code cleanup

commit a40ad4e768f1c626fc6c818c28d22f0a91d33eaf
Author: Eno Thereska
Date:   2016-01-09T18:24:25Z

    Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038
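PR #750 switches the failure path to asynchronous ZooKeeper calls. One of its commits mentions a `CountDownLatch`. The patch itself is not reproduced here. What follows is a minimal Python sketch of the general pattern: issue every write without waiting, count down a latch in each completion callback, and block once at the end.

The sketch rests on these assumptions:
- `SimulatedAsyncClient.set_data_async` is a hypothetical stand-in for a callback-style call. In the real Java client that would be `ZooKeeper.setData(path, data, version, AsyncCallback.StatCallback cb, Object ctx)`.
- Latency is simulated with a thread pool and a fixed `sleep`.
- `CountDownLatch` is hand-rolled, because Python's standard library has no direct equivalent of `java.util.concurrent.CountDownLatch`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CountDownLatch:
    """Minimal Python analogue of java.util.concurrent.CountDownLatch."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def await_(self, timeout=None):
        """Block until the count reaches zero; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout)


class SimulatedAsyncClient:
    """Hypothetical callback-style client; every request takes one simulated RTT."""

    def __init__(self, rtt=0.05, workers=32):
        self.rtt = rtt
        self.store = {}
        self.in_flight = 0
        self.peak_in_flight = 0
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def set_data_async(self, path, value, callback):
        """Return immediately; invoke callback(path, rc) once the write completes."""
        self._pool.submit(self._round_trip, path, value, callback)

    def _round_trip(self, path, value, callback):
        with self._lock:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        time.sleep(self.rtt)  # simulated network + server latency
        with self._lock:
            self.in_flight -= 1
            self.store[path] = value
        callback(path, 0)  # rc 0 means success in this model

    def close(self):
        self._pool.shutdown(wait=True)


def update_all(client, paths, value, timeout=10.0):
    """Pipeline every write, then block once until all callbacks have fired."""
    latch = CountDownLatch(len(paths))
    results = {}

    def on_complete(path, rc):
        results[path] = rc
        latch.count_down()

    for path in paths:
        client.set_data_async(path, value, on_complete)
    if not latch.await_(timeout):
        raise TimeoutError("not every write completed in time")
    return results


paths = [f"/partitions/{p}/state" for p in range(20)]  # illustrative paths
client = SimulatedAsyncClient(rtt=0.05)
start = time.monotonic()
results = update_all(client, paths, "leader=2")
elapsed = time.monotonic() - start
client.close()
```

In this model a serial loop over the same 20 paths needs about 20 × `rtt` = 1 s. The pipelined version finishes in roughly one `rtt`, because all requests are outstanding at once. The latch keeps the caller's contract synchronous (it returns only after every update has been acknowledged) while the writes themselves overlap.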
[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085628#comment-15085628 ]

Eno Thereska commented on KAFKA-3038:
-

[~fpj]: makes sense, thanks
[jira] [Commented] (KAFKA-3038) Speeding up partition reassignment after broker failure
[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085300#comment-15085300 ]

Flavio Junqueira commented on KAFKA-3038:
-

You don't really need to batch with multi; you just need to make the calls asynchronous. In fact, unless you really need to make multiple updates transactional, the preferred way is to push updates asynchronously to keep the pipeline full.
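A back-of-the-envelope model shows why keeping the pipeline full recovers most of the benefit without multi. The model assumes:
- N partition-state updates.
- A client-to-ensemble round-trip time R.
- A per-request processing cost s on the ZooKeeper side, with s much smaller than R.

It ignores bandwidth and client-side costs. The symbols and the example numbers (N = 1000, R = 10 ms, s = 0.1 ms) are illustrative assumptions, not measurements from this issue.

```latex
\begin{aligned}
T_{\text{serial}}    &\approx N\,(R + s) = 1000 \times 10.1\,\text{ms} = 10.1\,\text{s} \\
T_{\text{pipelined}} &\approx R + N\,s = 10\,\text{ms} + 1000 \times 0.1\,\text{ms} = 110\,\text{ms}
\end{aligned}
```

Under the same model, a single multi carrying all N updates also costs about R + Ns. Batching and pipelining therefore buy roughly the same latency, which fits the point that multi is only needed when the updates must be transactional.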