[jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

2020-05-20 Thread GEORGE LI (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111858#comment-17111858
 ] 

GEORGE LI commented on KAFKA-8638:
--

[~hai_lin]

Some of the recent activity on KIP-491 is in KAFKA-4084, where I made a 
patch for versions 2.4 (and 1.1) with an installation guide. 



> Preferred Leader Blacklist (deprioritized list)
> ---
>
> Key: KAFKA-8638
> URL: https://issues.apache.org/jira/browse/KAFKA-8638
> Project: Kafka
>  Issue Type: Improvement
>  Components: config, controller, core
>Affects Versions: 1.1.1, 2.3.0, 2.2.1
>Reporter: GEORGE LI
>Assignee: GEORGE LI
>Priority: Major
>
> Currently, Kafka preferred leader election picks the broker_id from the 
> topic/partition replica assignment in priority order, as long as that broker is 
> in the ISR. The preferred leader is the broker id in the first position of the 
> replica list. There are use cases where, even though the first broker in the 
> replica assignment is in the ISR, it should be moved to the end of the ordering 
> (lowest priority) when deciding leadership during preferred leader election.
> Let’s use topic/partition replica (1,2,3) as an example. 1 is the preferred 
> leader. When preferred leader election runs, it picks 1 as the leader if 1 is 
> in the ISR; if 1 is not online or not in the ISR, it picks 2; if 2 is not in 
> the ISR, it picks 3. There are use cases where, even though 1 is in the ISR, we 
> would like it moved to the end of the ordering (lowest priority) when deciding 
> leadership during preferred leader election. Below is a list of use cases:
>  * If broker_id 1 is a swapped failed host and is brought up with only the last 
> segments or the latest offset, without historical data (there is another effort 
> on this), it's better for it not to serve leadership until it has caught up.
>  * A cross-data-center cluster has AWS instances with less computing power than 
> the on-prem bare-metal machines. We could put the AWS broker_ids in the 
> preferred leader blacklist, so the on-prem brokers can be elected leaders 
> without changing the replica assignment ordering.
>  * If broker_id 1 keeps losing leadership after some time ("flapping"), we 
> would want to exclude 1 from being a leader unless all other brokers of this 
> topic/partition are offline. The "flapping" effect was seen in the past when 2 
> or more brokers were bad: as they lost leadership constantly and quickly, the 
> sets of partition replicas they belonged to saw leadership change constantly. 
> The ultimate solution is to swap out these bad hosts, but for quick mitigation 
> we can put them in the preferred leader blacklist to move their election 
> priority to the lowest.
>  * If the controller is busy serving an extra load of metadata requests and 
> other tasks, we would like to move the controller's leaders to other brokers to 
> lower its CPU load. Currently, bouncing the broker to shed leadership does not 
> work for the controller, because after the bounce the controller fails over to 
> another broker.
>  * Avoid bouncing a broker just to shed its leadership: it would be good if we 
> had a way to specify which broker should be excluded from serving 
> traffic/leadership (without changing the replica assignment ordering via 
> reassignments, even though that's quick) and then run preferred leader 
> election. A bouncing broker causes temporary URP, and sometimes other issues. 
> Also, bouncing a broker (e.g. broker_id 1) can temporarily shed all of its 
> leadership, but if another broker (e.g. broker_id 2) later fails or gets 
> bounced, some of its leadership will likely fail over to broker_id 1 for a 
> 3-broker replica set. If broker_id 1 is in the blacklist, then even with 
> broker_id 2 offline, the 3rd broker can take leadership.
> The current workaround for the above is to change the topic/partition's replica 
> assignment to move broker_id 1 from the first position to the last position and 
> run preferred leader election, e.g. (1, 2, 3) => (2, 3, 1). This changes the 
> replica assignment, and we need to keep track of the original ordering and 
> restore it when things change (e.g. the controller fails over to another 
> broker, or the swapped empty broker catches up). That’s a rather tedious task.
> KIP is located at 
> [KIP-491|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

2020-05-19 Thread Hai Lin (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111427#comment-17111427
 ] 

Hai Lin commented on KAFKA-8638:


What's the current status of this KIP? I see this being very useful when we do 
any operation on the cluster. For example, during a rolling restart, first 
blacklist the broker and bounce it, and re-enable it only once it is fully 
replicated. Most of the time, when a broker comes up after a restart, the small 
partitions get back in sync faster than the big ones. The broker is under a lot 
of stress while replicating the big partitions, yet the small partitions already 
become leaders and performance is compromised. I think even a simple 
blacklist/whitelist would be very helpful.
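
A rough sketch of what that workflow could look like if KIP-491 lands; the 
preferred_leader_blacklist dynamic config below is only the name proposed in the 
KIP, not something available in a released Kafka:

{code}
# Hypothetical rolling-restart workflow, assuming the proposed KIP-491
# preferred_leader_blacklist dynamic config existed.

# 1. Deprioritize broker 5 and move its leadership elsewhere before bouncing it.
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config preferred_leader_blacklist=5
/usr/lib/kafka/bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --all-topic-partitions

# 2. Restart broker 5, then wait until no partitions are under-replicated.
/usr/lib/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions

# 3. Remove the entry and run preferred leader election again.
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --delete-config preferred_leader_blacklist
/usr/lib/kafka/bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --all-topic-partitions
{code}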


[jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

2019-07-14 Thread GEORGE LI (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884787#comment-16884787
 ] 

GEORGE LI commented on KAFKA-8638:
--

Here is the KIP:  
[KIP-491|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982]




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

2019-07-09 Thread GEORGE LI (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881480#comment-16881480
 ] 

GEORGE LI commented on KAFKA-8638:
--

Hi Jose,

Because a broker can have hundreds or thousands of topic/partitions assigned to 
it, doing reassignments to move it to the end of the ordering (lowering its 
priority), and then remembering the original ordering so it can be restored 
later, is much more tedious than simply putting it in a "deprioritized list" for 
some time and removing it once certain conditions are met.
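
For context, this is roughly what the per-partition workaround looks like with 
the existing tooling (the topic name and broker ids below are made up for 
illustration):

{code}
# Current workaround, sketched: demote broker 1 for one partition by rotating
# the replica order, then trigger preferred leader election.
cat > demote-broker-1.json <<'EOF'
{"version":1,
 "partitions":[{"topic":"my-topic","partition":0,"replicas":[2,3,1]}]}
EOF
/usr/lib/kafka/bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file demote-broker-1.json --execute

cat > elect.json <<'EOF'
{"partitions":[{"topic":"my-topic","partition":0}]}
EOF
/usr/lib/kafka/bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 \
  --path-to-json-file elect.json

# ...and the original [1,2,3] ordering has to be remembered and restored later,
# for every affected partition on the broker.
{code}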

We have a Rebalance Tool that rebalances the whole cluster, and it's better not 
to keep changing the replica assignment ordering constantly. With the 
"deprioritized list", it's cleaner.

Let's take the use case of taking the controller out of being a leader/serving 
traffic, leaving it as a follower only. We observed that a broker not serving 
any leaders has lower CPU utilization. For clusters where the controller is busy 
doing extra work compared to the other brokers, we would like it to not take any 
leadership. Right now, for a broker to lose leadership we need to bounce it, and 
in this case the controller fails over to another broker after the bounce. If we 
instead change the ordering of the current assignments for the controller, then 
the next time the controller fails over we need to do the same thing again.

For managing the "deprioritized list", the user (e.g. the on-call engineer who 
sees an issue with a broker that should not serve leadership traffic) should 
have the ability to add/remove entries. My initial thoughts on how to store this 
"deprioritized list" are the two approaches below:

* Design #1:
Introduce a preferred leader blacklist ZK path/node, e.g.: 
/preferred_leader_blacklist/ 

Direct manipulation of ZK should be avoided, since Kafka is moving toward being 
RPC-based, so a new Request/Response RPC call would be needed.

No ZK watcher on this node's children is needed to trigger leadership changes in 
the current design.
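
For illustration, the layout could look something like the following (a sketch 
only: one child znode per blacklisted broker id, matching the per-broker node 
mentioned under Design #2 below; shown read-only, since add/remove would go 
through the new RPC rather than direct ZK writes):

{code}
# Sketch of the Design #1 ZK layout: one child node per blacklisted broker id.
/usr/lib/kafka/bin/zookeeper-shell.sh localhost:2181 ls /preferred_leader_blacklist
# [1, 10, 65]
{code}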

* Design #2:
Introduce a preferred_leader_blacklist dynamic config, empty by default, which 
accepts a list of broker IDs separated by commas. E.g. below, broker IDs 1, 10 
and 65 are put into the blacklist:


{code}
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 
--entity-type brokers --entity-default --alter --add-config 
preferred_leader_blacklist=1,10,65
{code}

Since Kafka dynamic configs already use --bootstrap-server, this does not need 
to manipulate ZooKeeper directly. The downside: when adding/removing one broker 
from the list, instead of touching one ZK node per broker as in Design #1 above, 
the dynamic config needs to be updated with a new complete list. E.g. in order 
to remove broker 10 from the blacklist, update 
preferred_leader_blacklist=1,65.
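
Concretely, that removal would be another --alter that rewrites the whole list 
(again assuming the proposed config name):

{code}
# Remove broker 10 from the proposed blacklist by re-writing the full list
# (1,10,65 -> 1,65).
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config preferred_leader_blacklist=1,65
{code}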

The dynamic config should not trigger any leadership changes in the current 
design. 





[jira] [Commented] (KAFKA-8638) Preferred Leader Blacklist (deprioritized list)

2019-07-09 Thread Jose Armando Garcia Sancio (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881421#comment-16881421
 ] 

Jose Armando Garcia Sancio commented on KAFKA-8638:
---

Hi George,

Thanks for the issue. I see that you have tried a reassignment where the 
assigned replicas stay the same but the order of the replicas is changed. How is 
managing that more or less tedious than managing this "deprioritized list"? How 
do you see the user managing this "deprioritized list"? For example, how do you 
see the user determining which brokers should be added to and removed from this 
list?

Thanks!




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)