[ https://issues.apache.org/jira/browse/KAFKA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Lin updated KAFKA-6636:
----------------------------
    Description: 
ReplicaFetcherThread can die in the following scenario:

 

1) Partition P1 has a replica set of size 1. Broker A is the leader. The
segment is empty and the log start offset is 100.

2) The user executes a partition reassignment to change the replica set from
{A} to {B, C}.

3) Broker B starts ReplicaFetcherThread, which triggers
handleOffsetOutOfRange(), truncates the log fully, and starts at offset 100.
At this moment its high watermark is still 0 (or -1). The same holds for
broker C.

4) Broker B sends a FetchRequest to A at offset 100. Broker A immediately
adds broker B to the ISR set, and the controller moves leadership to broker B.

5) Broker B handles LeaderAndIsrRequest to become leader. It calls 
`leaderReplica.convertHWToLocalOffsetMetadata()` to initialize its HW. Since 
its HW was smaller than logStartOffset=100, now its HW will be overridden to 
LogOffsetMetadata.UnknownOffsetMetadata, i.e. -1.

6) Broker C handles LeaderAndIsrRequest to fetch from broker B. Broker C
updates its HW to the HW in broker B's FetchResponse, i.e. -1. Then broker C
calls replica.maybeIncrementLogStartOffset(leaderLogStartOffset) where
leaderLogStartOffset=100. This causes an exception because
leaderLogStartOffset > HW. The exception is unhandled, so the
ReplicaFetcherThread exits.
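
Steps 5 and 6 can be sketched as a small standalone model. This is only an
illustration of the offsets described above, not Kafka's actual code: the
class HwScenarioSketch, the ReplicaModel type, and both helper methods are
hypothetical names, and the exception type is an assumption.

```java
// Simplified model of steps 5 and 6 above; these are NOT Kafka's real classes.
public class HwScenarioSketch {

    // Stand-in for the offset carried by LogOffsetMetadata.UnknownOffsetMetadata.
    static final long UNKNOWN_OFFSET = -1L;

    /** Hypothetical stand-in for a follower replica's offset state. */
    static final class ReplicaModel {
        long logStartOffset;
        long logEndOffset;
        long highWatermark;

        ReplicaModel(long logStartOffset, long logEndOffset, long highWatermark) {
            this.logStartOffset = logStartOffset;
            this.logEndOffset = logEndOffset;
            this.highWatermark = highWatermark;
        }

        /** Models the check in step 6: the new log start offset may not exceed the HW. */
        void maybeIncrementLogStartOffset(long newLogStartOffset) {
            if (newLogStartOffset > highWatermark) {
                // Exception type is an assumption made for this sketch.
                throw new IllegalStateException("log start offset " + newLogStartOffset
                        + " is greater than high watermark " + highWatermark);
            }
            if (newLogStartOffset > logStartOffset) {
                logStartOffset = newLogStartOffset;
            }
        }
    }

    /** Step 5: a new leader whose HW is below its log start offset ends up with HW = -1. */
    static long leaderHwAfterBecomingLeader(long hw, long logStartOffset) {
        return hw < logStartOffset ? UNKNOWN_OFFSET : hw;
    }

    /** Step 6: the follower adopts the leader's HW, then tries to advance its log start offset. */
    static boolean followerSurvivesFetch(ReplicaModel follower, long leaderHw,
                                         long leaderLogStartOffset) {
        follower.highWatermark = leaderHw;
        try {
            follower.maybeIncrementLogStartOffset(leaderLogStartOffset);
            return true;
        } catch (IllegalStateException e) {
            // In the real fetcher thread nothing catches this, so the thread exits.
            return false;
        }
    }

    public static void main(String[] args) {
        long leaderHw = leaderHwAfterBecomingLeader(0L, 100L);   // broker B, step 5
        ReplicaModel brokerC = new ReplicaModel(100L, 100L, 0L); // broker C after step 3
        System.out.println("leader HW = " + leaderHw);
        System.out.println("fetcher survives = "
                + followerSurvivesFetch(brokerC, leaderHw, 100L));
    }
}
```

With the offsets from the scenario, main prints a leader HW of -1 and reports
that the fetcher does not survive. Passing a leader HW of 100 instead lets
the same call succeed.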

> ReplicaFetcherThread should not die if hw < 0
> ---------------------------------------------
>
>                 Key: KAFKA-6636
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6636
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
