Hi all,

This morning I woke up to a very high max replica lag on one of my brokers. I looked at the logs, and it seems that one of the replica fetchers for a partition just decided that its offset was out of range, so it reset its offset to the beginning of the leader’s log and started replicating from there. The broker is currently catching back up, so things will be fine.
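If anyone wants to poke at something similar, GetOffsetShell will report a partition's earliest/latest offsets from the leader (broker14:9092 below is just a placeholder for the leader, broker 14, not a real hostname):

  # Earliest (-2) and latest (-1) offsets on the leader for partition 0:
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list broker14:9092 --topic webrequest_upload --partitions 0 --time -2
  bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list broker14:9092 --topic webrequest_upload --partitions 0 --time -1

The -2 number should correspond to the leader's start offset, i.e. the 28493996399 the fetcher reset to in the logs below.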
But I’m curious: has anyone seen this before? Why would this just happen? The logs show that many segments for this partition were scheduled for deletion all at once, right before the fetcher reset its offset:

  [2015-10-29 09:27:11,899] 5421994218 [ReplicaFetcherThread-5-14] INFO kafka.log.Log - Scheduling log segment 28493996399 for log webrequest_upload-0 for deletion.
  … (repeats for about 950 segments) …
  [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] WARN kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-5-14], Replica 18 for partition [webrequest_upload,0] reset its fetch offset from 28493996399 to current leader 14's start offset 28493996399
  [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] ERROR kafka.server.ReplicaFetcherThread - [ReplicaFetcherThread-5-14], Current offset 31062784634 for partition [webrequest_upload,0] out of range; reset offset to 28493996399
  …

A more complete capture of this log is here: https://gist.github.com/ottomata/033ddef8f699ca09cfa8

Thanks!
-Ao
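PS: if I'm reading the 0.8.x fetcher code right, the burst of deletions is the follower wiping its own local copy before restarting from the leader's start offset. A rough paraphrase of ReplicaFetcherThread's out-of-range handling as I understand it (not the verbatim source; the helper names here are mine):

  // Paraphrase of the follower's out-of-range handling, as I understand it.
  trait OutOfRangeHandling {
    def leaderOffset(earliest: Boolean): Long        // OffsetRequest to the leader
    def localLogEndOffset: Long
    def truncateTo(offset: Long): Unit
    def truncateFullyAndStartAt(offset: Long): Unit  // deletes every local segment

    def handleOffsetOutOfRange(): Long = {
      val leaderEnd = leaderOffset(earliest = false)
      if (leaderEnd < localLogEndOffset) {
        // Follower is ahead of the leader: truncate back to the leader's end offset.
        truncateTo(leaderEnd)
        leaderEnd
      } else {
        // Follower's offset is no longer on the leader at all: drop the whole
        // local log (hence the ~950 "Scheduling log segment ... for deletion"
        // lines) and start over from the leader's start offset.
        val leaderStart = leaderOffset(earliest = true)
        truncateFullyAndStartAt(leaderStart)
        leaderStart
      }
    }
  }

That would explain the flood of deletion lines, though not why the offset went out of range in the first place.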