[jira] [Updated] (KAFKA-8903) allow the new replica (offset 0) to catch up with current leader using latest offset

2019-09-12 Thread GEORGE LI (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GEORGE LI updated KAFKA-8903:
-
Description: 
It is very common (and sometimes frequent) in a large Kafka deployment with thousands of brokers for a broker to suffer hardware failures (disk, memory, cpu, nic).  The failed host is replaced by a new one with the same "broker.id", and the new broker starts up empty, so all topic/partitions start at offset 0.  If the current leader's start offset is > 0, the replaced broker will start fetching each partition from the leader's earliest (start) offset.

If the number and total size of the partitions this broker hosts are large, it can take quite some time for the ReplicaFetcher threads to pull all of that data from the leaders in the cluster, and the extra fetch load on those brokers/leaders can hurt latency and overall performance.  Once the replaced broker has caught up, preferred leader election can be run to move the leaders back to it.

To avoid this performance impact and make the failed-broker replacement process easier and more scalable, we are proposing a new dynamic config _replica.start.offset.strategy_.  The default is Earliest, and it can be set dynamically to Latest for a broker (or brokers).  When set to Latest, all partitions on the starting empty broker begin fetching from the current leader's latest offset (LEO, LogEndOffset), so the replaced broker's replicas join the ISR with 0 TotalLag/MaxLag and 0 URP almost instantly.
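
As a minimal, hedged sketch only (not part of this proposal's implementation): assuming the proposed _replica.start.offset.strategy_ would be applied like other dynamic broker configs through the Admin API's {{incrementalAlterConfigs}}, setting it to Latest for the replaced broker before it rejoins could look like the following. The broker id 1001 and the bootstrap address are placeholders.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetReplicaStartOffsetStrategy {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder cluster address

        try (AdminClient admin = AdminClient.create(props)) {
            // Target the replaced broker by its (reused) broker.id; 1001 is a placeholder.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1001");

            // Proposed config: start empty replicas from each leader's LogEndOffset.
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("replica.start.offset.strategy", "Latest"),
                    AlterConfigOp.OpType.SET);

            admin.incrementalAlterConfigs(
                            Collections.singletonMap(broker, Collections.singletonList(op)))
                 .all()
                 .get();
        }
    }
}
{code}

The same per-broker setting could presumably be applied with kafka-configs.sh using --entity-type brokers, and reverted to Earliest once the replacement is done, since under the proposal the strategy only matters while the broker's replicas are empty (offset 0).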

For preferred leader election, we can wait until the retention time has passed and the replaced broker has been replicating for long enough.  The better/safer approach is to enable the Preferred Leader Blacklist mentioned in KAFKA-8638 / [KIP-491|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982], so that until the replaced broker is completely caught up, its priority in leadership determination is moved to the lowest.
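
For illustration, a minimal sketch of the last step under stated assumptions: once the replaced broker is back in the ISR and has replicated long enough, preferred leader election can be triggered from the Admin client. {{AdminClient#electPreferredLeaders}} is the call available in the affected 2.x versions (superseded by {{electLeaders}} in later releases); the topic name, partition number, and bootstrap address below are placeholders, and KIP-491's blacklist config itself is only a proposal, so it is not shown.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.TopicPartition;

public class ElectPreferredLeaderAfterCatchUp {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder cluster address

        // Placeholder partition whose preferred replica lives on the replaced broker.
        TopicPartition partition = new TopicPartition("my-topic", 0);

        try (AdminClient admin = AdminClient.create(props)) {
            // Ask the controller to move leadership back to the preferred replica.
            admin.electPreferredLeaders(Collections.singletonList(partition))
                 .partitionResult(partition)
                 .get();
        }
    }
}
{code}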










> allow the new replica (offset 0) to catch up with current leader using latest 
> offset
> 
>
> Key: KAFKA-8903
> URL: https://issues.apache.org/jira/browse/KAFKA-8903
> Project: Kafka
> Issue Type: Improvement
> Components: config, core
> Affects Versions: 1.1.0, 1.1.1, 2.3.0
> Reporter: GEORGE LI
> Assignee: GEORGE LI
> Priority: Major
