[jira] [Updated] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread Vahid Hashemian (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-9205:
---
Description: 
One regularly used healing operation on Kafka clusters is replica reassignment 
for topic partitions. For example, when there is a skew in the inbound/outbound 
traffic of a broker, replica reassignment can be used to move some 
leaders/followers off that broker; or, if there is a skew in disk usage across 
brokers, replica reassignment can move some partitions to other brokers that 
have more disk space available.

In Kafka clusters that span multiple data centers (or availability 
zones), high availability is a priority, in the sense that when a data center 
goes offline the cluster should be able to continue normal operation. This 
requires every partition to have replicas in all data centers.

This guarantee is currently the responsibility of either the on-call engineer 
who performs the reassignment or the tool that automatically generates the 
reassignment plan for improving cluster health (e.g. by considering the 
rack configuration value of each broker in the cluster). The former is quite 
error-prone, and the latter leads to duplicate code across all such admin 
tools (which are not error-free either). Not all use cases can make use of the 
default assignment strategy used by the --generate option, and the current 
rack-aware enforcement applies to that option only.

It would be great for the built-in replica assignment API and tool provided by 
Kafka to support a rack-aware verification option for the --execute scenario 
that would simply return an error when [some] brokers in any replica set share 
a common rack. 
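
To make the requested check concrete, here is a minimal sketch of what such a 
verification could look like. It is not Kafka code and not a proposed 
implementation. It assumes the plan is given in the JSON format accepted by 
the reassignment tool (a "partitions" list with "topic", "partition" and 
"replicas"), and that a broker-id-to-rack mapping is available from each 
broker's rack configuration. The function names and the choice to skip 
brokers that have no rack configured are assumptions made for illustration.

```python
import json
from collections import defaultdict


def find_rack_violations(plan_json, broker_racks):
    """Return (topic, partition, rack, brokers) for each shared rack.

    plan_json    -- reassignment plan as a JSON string
    broker_racks -- hypothetical mapping of broker id -> rack name
    """
    plan = json.loads(plan_json)
    violations = []
    for entry in plan["partitions"]:
        brokers_by_rack = defaultdict(list)
        for broker in entry["replicas"]:
            rack = broker_racks.get(broker)
            if rack is None:
                # Assumption: a broker without a rack cannot be checked.
                continue
            brokers_by_rack[rack].append(broker)
        for rack, brokers in sorted(brokers_by_rack.items()):
            if len(brokers) > 1:
                violations.append(
                    (entry["topic"], entry["partition"], rack, brokers)
                )
    return violations


def verify_rack_aware(plan_json, broker_racks):
    """Raise ValueError if any replica set places two brokers on one rack."""
    violations = find_rack_violations(plan_json, broker_racks)
    if violations:
        details = "; ".join(
            f"{topic}-{partition}: brokers {brokers} share rack {rack}"
            for topic, partition, rack, brokers in violations
        )
        raise ValueError("Reassignment is not rack-aware: " + details)
```

A tool could run a check of this kind before submitting the plan and refuse to 
proceed if it raises, which is the "simply return an error" behavior described 
above.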

  was:
One regularly used healing operation on Kafka clusters is replica reassignments 
for topic partitions. For example, when there is a skew in inbound/outbound 
traffic of a broker replica reassignment can be used to move some 
leaders/followers from the broker; or if there is a skew in disk usage of 
brokers, replica reassignment can more some partitions to other brokers that 
have more disk space available.

In Kafka clusters that span across multiple data centers (or availability 
zones), high availability is a priority; in the sense that when a data center 
goes offline the cluster should be able to resume normal operation by 
guaranteeing partition replicas in all data centers.

This guarantee is currently the responsibility of the on-call engineer that 
performs the reassignment or the tool that automatically generates the 
reassignment plan for improving the cluster health (e.g. by considering the 
rack configuration value of each broker in the cluster). the former, is quite 
error-prone, and the latter, would lead to duplicate code in all such admin 
tools (which are not error free either).

It would be great for the built-in replica assignment API and tool provided by 
Kafka to support a rack aware verification option that would simply return an 
error when [some] brokers in any replica set share a common rack. 


> Add an option to enforce rack-aware partition reassignment
> --
>
> Key: KAFKA-9205
> URL: https://issues.apache.org/jira/browse/KAFKA-9205
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, tools
>Reporter: Vahid Hashemian
>Priority: Minor
>  Labels: needs-kip

[jira] [Updated] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread Vahid Hashemian (Jira)



Vahid Hashemian updated KAFKA-9205:
---
Labels:   (was: needs-kip)

> Add an option to enforce rack-aware partition reassignment
> --
>
> Key: KAFKA-9205
> URL: https://issues.apache.org/jira/browse/KAFKA-9205
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin, tools
>Reporter: Vahid Hashemian
>Priority: Minor


