[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-22 Thread sats (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979966#comment-16979966
 ] 

sats commented on KAFKA-9205:
-

Here is KIP-548:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-548+Add+Option+to+enforce+rack-aware+custom+partition+reassignment+execution

> Add an option to enforce rack-aware partition reassignment
> --
>
> Key: KAFKA-9205
> URL: https://issues.apache.org/jira/browse/KAFKA-9205
> Project: Kafka
> Issue Type: Improvement
> Components: admin, tools
> Reporter: Vahid Hashemian
> Priority: Minor
>
> One regularly used healing operation on Kafka clusters is replica 
> reassignment for topic partitions. For example, when there is a skew in 
> the inbound/outbound traffic of a broker, replica reassignment can be used to 
> move some leaders/followers off that broker; or, if there is a skew in disk 
> usage across brokers, replica reassignment can move some partitions to other 
> brokers that have more disk space available.
> In Kafka clusters that span multiple data centers (or availability 
> zones), high availability is a priority, in the sense that when a data center 
> goes offline the cluster should be able to resume normal operation because 
> partition replicas are guaranteed to be present in all data centers.
> This guarantee is currently the responsibility of the on-call engineer who 
> performs the reassignment, or of the tool that automatically generates the 
> reassignment plan for improving cluster health (e.g. by considering the 
> rack configuration value of each broker in the cluster). The former is quite 
> error-prone, and the latter would lead to duplicate code in all such admin 
> tools (which are not error-free either). Not all use cases can make use of the 
> default assignment strategy that is used by the --generate option, and the 
> current rack-aware enforcement applies to this option only.
> It would be great for the built-in replica assignment API and tool provided 
> by Kafka to support a rack-aware verification option for the --execute 
> scenario that would simply return an error when [some] brokers in any replica 
> set share a common rack.
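
To make the requested check concrete, here is a minimal sketch of what such a 
verification could look like. It is not the Kafka implementation and not what 
the KIP specifies. It assumes the reassignment plan uses the JSON layout 
accepted by the reassignment tool ("partitions" entries with "topic", 
"partition" and "replicas"), and that a broker-id-to-rack mapping is already 
available. The function name find_rack_violations and the broker_racks 
argument are hypothetical names chosen for illustration.

```python
import json
from collections import defaultdict


def find_rack_violations(reassignment_json, broker_racks):
    """Return (topic, partition, {rack: [broker ids]}) for every replica set
    in which two or more brokers share a rack.

    broker_racks is a hypothetical dict of broker id -> rack name. Brokers
    with no known rack are ignored here; a real tool might reject them.
    """
    plan = json.loads(reassignment_json)
    violations = []
    for entry in plan["partitions"]:
        brokers_by_rack = defaultdict(list)
        for broker_id in entry["replicas"]:
            rack = broker_racks.get(broker_id)
            if rack is not None:
                brokers_by_rack[rack].append(broker_id)
        # Keep only the racks that host more than one replica of this partition.
        shared = {rack: ids for rack, ids in brokers_by_rack.items() if len(ids) > 1}
        if shared:
            violations.append((entry["topic"], entry["partition"], shared))
    return violations


# Example: brokers 1 and 2 sit in the same rack, broker 3 in another one.
racks = {1: "dc-a", 2: "dc-a", 3: "dc-b"}
plan = json.dumps({
    "version": 1,
    "partitions": [
        {"topic": "foo", "partition": 0, "replicas": [1, 2]},
        {"topic": "foo", "partition": 1, "replicas": [1, 3]},
    ],
})
print(find_rack_violations(plan, racks))
# prints [('foo', 0, {'dc-a': [1, 2]})]
```

A tool running in --execute mode could refuse to submit the plan whenever the 
returned list is non-empty, which matches the "simply return an error" 
behaviour described above.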



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-19 Thread sats (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977902#comment-16977902
 ] 

sats commented on KAFKA-9205:
-

Cool, let me dig into it. Thanks, [~vahid].



[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-19 Thread Vahid Hashemian (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977592#comment-16977592
 ] 

Vahid Hashemian commented on KAFKA-9205:


[~sbellapu] The KIP process is not that difficult. If you have access to the 
wiki, you can easily create one and start a discussion on it on the mailing 
list (and after enough discussion/time, you call a vote). The KIP page has all 
the necessary info: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
You can also take some of the recent KIPs as examples.

Since there is an existing option for disabling rack aware mode, this change 
should be designed in a way that either makes use of that option, or works well 
alongside it (without causing confusion); and at the same time preserves 
backward compatibility (i.e. existing default behavior should ideally not 
change).
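
One way to read the compatibility requirement above is as an opt-in check that 
also honours the existing disable switch. The sketch below only illustrates 
that decision logic and is not the tool's actual option handling. The flag 
name --enforce-rack-aware-on-execute is hypothetical; --execute and 
--disable-rack-aware mirror options the reassignment tool already has, but 
here they are just argparse stand-ins.

```python
import argparse


def build_parser():
    # Stand-in parser; the real tool's option handling is not reproduced here.
    parser = argparse.ArgumentParser(prog="reassign-sketch")
    parser.add_argument("--execute", action="store_true")
    parser.add_argument("--disable-rack-aware", action="store_true")
    # Hypothetical opt-in flag: off by default, so existing behavior is unchanged.
    parser.add_argument("--enforce-rack-aware-on-execute", action="store_true")
    return parser


def should_verify_racks(args):
    """Decide whether a rack check should run before executing a plan."""
    if not args.execute:
        return False  # the check only concerns the --execute path
    if args.disable_rack_aware:
        return False  # the existing disable switch takes precedence
    return args.enforce_rack_aware_on_execute


parser = build_parser()
print(should_verify_racks(parser.parse_args(["--execute"])))
print(should_verify_racks(parser.parse_args(
    ["--execute", "--enforce-rack-aware-on-execute"])))
```

With this arrangement, a plain --execute run behaves exactly as before, and 
the check only runs when it is explicitly requested and rack awareness has 
not been disabled.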



[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread sats (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977155#comment-16977155
 ] 

sats commented on KAFKA-9205:
-

[~vahid] Can you please create the KIP? (Sorry, I am a newbie and not aware of 
the process.) I can work on the code part.



[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread Vahid Hashemian (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976895#comment-16976895
 ] 

Vahid Hashemian commented on KAFKA-9205:


This will still likely require a KIP since the default behavior could change.



[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread Vahid Hashemian (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976888#comment-16976888
 ] 

Vahid Hashemian commented on KAFKA-9205:


Thanks [~sbellapu] for the pointer. KIP-36 and the current implementation 
enforce rack-aware assignment when generating an assignment (using the 
--generate option). If a custom reassignment algorithm is used to generate the 
assignment, or if the reassignment is manually generated on an ad-hoc basis, 
the tool does not enforce rack awareness when run with the --execute option. It 
would be great if enforcement could be implemented in the --execute scenario 
too. I updated the description as well.



[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment

2019-11-18 Thread sats (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976855#comment-16976855
 ] 

sats commented on KAFKA-9205:
-

[~vahid] Do you have a new KIP for this, or can it be an extension to KIP-36 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment)? 
Please let me know so that I can take a shot at the implementation.

> Add an option to enforce rack-aware partition reassignment
> --
>
> Key: KAFKA-9205
> URL: https://issues.apache.org/jira/browse/KAFKA-9205
> Project: Kafka
> Issue Type: Improvement
> Components: admin, tools
> Reporter: Vahid Hashemian
> Priority: Minor
> Labels: needs-kip
>
> One regularly used healing operation on Kafka clusters is replica 
> reassignment for topic partitions. For example, when there is a skew in 
> the inbound/outbound traffic of a broker, replica reassignment can be used to 
> move some leaders/followers off that broker; or, if there is a skew in disk 
> usage across brokers, replica reassignment can move some partitions to other 
> brokers that have more disk space available.
> In Kafka clusters that span multiple data centers (or availability 
> zones), high availability is a priority, in the sense that when a data center 
> goes offline the cluster should be able to resume normal operation because 
> partition replicas are guaranteed to be present in all data centers.
> This guarantee is currently the responsibility of the on-call engineer who 
> performs the reassignment, or of the tool that automatically generates the 
> reassignment plan for improving cluster health (e.g. by considering the 
> rack configuration value of each broker in the cluster). The former is quite 
> error-prone, and the latter would lead to duplicate code in all such admin 
> tools (which are not error-free either).
> It would be great for the built-in replica assignment API and tool provided 
> by Kafka to support a rack-aware verification option that would simply return 
> an error when [some] brokers in any replica set share a common rack.


