[ https://issues.apache.org/jira/browse/KAFKA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928906#comment-16928906 ]
GEORGE LI commented on KAFKA-8903:
----------------------------------

e.g.
{code:java}
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config replica.start.offset.strategy=Latest --entity-type brokers --entity-name 1028
Completed updating config for broker: 1028.

[2019-09-08 06:48:34,997] 1307 [/config/changes-event-process-thread] INFO kafka.server.DynamicConfigManager - Processing override for entityPath: brokers/1028 with config: Map(replica.start.offset.strategy -> Latest)
......
[2019-09-08 07:34:31,826] 1777 [ReplicaFetcherThread-1-1025] INFO kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1028, leaderId=1025, fetcherId=1] brokerConfig.ReplicaStartOffsetStrategy: Latest
[2019-09-08 07:34:31,826] 1777 [ReplicaFetcherThread-1-1025] INFO kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1028, leaderId=1025, fetcherId=1] replica.logEndOffset.messageOffset: 0
[2019-09-08 07:34:31,826] 1777 [ReplicaFetcherThread-1-1025] WARN kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1028, leaderId=1025, fetcherId=1] replica.start.offset.strategy: Latest. Reset fetch offset for partition georgeli_test-0 from 0 to current leader's latest offset 339347
...
[2019-09-08 07:34:31,826] 1777 [ReplicaFetcherThread-1-1025] INFO kafka.log.LogCleaner - The cleaning for partition georgeli_test-0 is aborted and paused
[2019-09-08 07:34:31,826] 1777 [ReplicaFetcherThread-1-1025] INFO kafka.log.Log - [Log partition=georgeli_test-0, dir=/var/kafka-spool/data] Scheduling log segment [baseOffset 0, size 0] for deletion.
...
[2019-09-08 07:34:31,828] 1779 [ReplicaFetcherThread-1-1025] INFO kafka.log.LogCleaner - Compaction for partition georgeli_test-0 is resumed
...
{code}
To remove the config and use the default "Earliest" leader start offset:
{code:java}
/usr/lib/kafka/bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --delete-config replica.start.offset.strategy --entity-type brokers --entity-name 1028
Completed updating config for broker: 1028.

[2019-09-08 23:18:20,581] 468051 [/config/changes-event-process-thread] INFO kafka.common.ZkNodeChangeNotificationListener - Processing notification(s) to /config/changes
[2019-09-08 23:18:20,588] 468058 [/config/changes-event-process-thread] INFO kafka.server.DynamicConfigManager - Processing override for entityPath: brokers/1028 with config: Map()
[2019-09-08 23:18:20,589] 468059 [/config/changes-event-process-thread] INFO kafka.server.KafkaConfig - KafkaConfig values:
    advertised.host.name = kafka1028-dc3
    ...
    replica.start.offset.strategy = Earliest
    ...
[2019-09-08 23:23:36,408] 1773 [ReplicaFetcherThread-1-1025] INFO kafka.log.Log - [Log partition=georgeli_test-0, dir=/var/kafka-spool/data] Truncating to 0 has no effect as the largest offset in the log is -1
[2019-09-08 23:23:36,439] 1804 [ReplicaFetcherThread-1-1025] WARN kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1028, leaderId=1025, fetcherId=1] Reset fetch offset for partition georgeli_test-0 from 0 to current leader's start offset 319246
[2019-09-08 23:23:36,440] 1805 [ReplicaFetcherThread-1-1025] INFO kafka.log.LogCleaner - The cleaning for partition georgeli_test-0 is aborted and paused
[2019-09-08 23:23:36,440] 1805 [ReplicaFetcherThread-1-1025] INFO kafka.log.Log - [Log partition=georgeli_test-0, dir=/var/kafka-spool/data] Scheduling log segment [baseOffset 0, size 0] for deletion.
[2019-09-08 23:23:36,441] 1806 [ReplicaFetcherThread-1-1025] INFO kafka.log.LogCleaner - Compaction for partition georgeli_test-0 is resumed
[2019-09-08 23:23:36,449] 1814 [ReplicaFetcherThread-1-1025] INFO kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1028, leaderId=1025, fetcherId=1] Current offset 0 for partition georgeli_test-0 is out of range, which typically implies a leader change. Reset fetch offset to 319246
...
{code}
From the above, we can see the effect of the new config "replica.start.offset.strategy" (Earliest/Latest) on an empty/new partition's start offset: 319246 vs. 339347.


> allow the new replica (offset 0) to catch up with current leader using latest offset
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8903
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8903
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config, core
>    Affects Versions: 1.1.0, 1.1.1, 2.3.0
>            Reporter: GEORGE LI
>            Assignee: GEORGE LI
>            Priority: Major
>
> It is very common (and sometimes frequent) for a broker to have hardware failures (disk, memory, CPU, NIC) in a large Kafka deployment with thousands of brokers. The failed host is replaced by a new one with the same "broker.id", and the new broker starts up empty: all topic/partitions start at offset 0. If the current leader's start offset is > 0, the replaced broker starts each partition from the leader's earliest (start) offset.
> If the number and size of the partitions this broker hosts are high, it can take quite some time for the ReplicaFetcher threads to pull from all the leaders in the cluster, and the extra load on the brokers/leaders can hurt latency and other performance metrics. Once the replaced broker has caught up, preferred leader election can be run to move the leaders back to it.
> To avoid the above performance impact and make the failed-broker replacement process much easier and more scalable, we are proposing a new dynamic config {{replica.start.offset.strategy}}. The default is Earliest, and it can be dynamically set to Latest for a broker (or brokers). If set to Latest, when the empty broker starts up, all partitions start from the latest offset (LEO, LogEndOffset) of the current leader, so the replaced broker's replicas are in the ISR with 0 TotalLag/MaxLag and 0 URPs almost instantly.
> For preferred leader election, we can wait until the retention time has passed and the replaced broker has been replicating long enough. A better/safer approach is to enable the Preferred Leader Blacklist mentioned in KAFKA-8638 / KIP-491, so that before the replaced broker is completely caught up, its leadership-determination priority is moved to the lowest.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
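The offset-selection behavior the logs above demonstrate can be sketched in Java. This is a minimal illustration of the proposed strategy, not Kafka's actual fetcher code; all class and method names here are hypothetical:

```java
// Hypothetical sketch of the proposed replica.start.offset.strategy decision.
// An empty replica (local log end offset == 0) begins fetching either from the
// leader's log start offset (default, Earliest) or from the leader's log end
// offset (Latest, the proposal). A non-empty replica keeps its own position.
enum ReplicaStartOffsetStrategy { EARLIEST, LATEST }

final class StartOffsetChooser {
    static long initialFetchOffset(long replicaLogEndOffset,
                                   long leaderLogStartOffset,
                                   long leaderLogEndOffset,
                                   ReplicaStartOffsetStrategy strategy) {
        if (replicaLogEndOffset > 0) {
            // Normal catch-up: the strategy is not consulted.
            return replicaLogEndOffset;
        }
        return strategy == ReplicaStartOffsetStrategy.LATEST
                ? leaderLogEndOffset      // skip the backlog, join ISR quickly
                : leaderLogStartOffset;   // default: replicate the full retention
    }
}
```

With the leader offsets seen in the logs above (start offset 319246, latest offset 339347), Earliest would begin fetching georgeli_test-0 at 319246 and Latest at 339347, matching the two "Reset fetch offset" lines.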