It might be related to KAFKA-2477.
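
If I remember right, KAFKA-2477 is the one about replicas spuriously deleting all of a partition's segments. Roughly speaking (this is a simplified sketch from memory, not the actual broker code, and the object/class names and the leader end offset below are made up), when a follower's fetch comes back with OffsetOutOfRange, the ReplicaFetcherThread asks the leader for its offsets and then either truncates back to the leader's log end offset, or throws away its whole local log and restarts from the leader's start offset:

  // Simplified sketch (not the real broker source) of the follower's
  // out-of-range handling in the 0.8.x line. Names here are invented for
  // illustration; the leader end offset in main() is also invented.
  object OutOfRangeSketch {

    // Offsets the follower fetches from the leader when it sees OffsetOutOfRange.
    final case class LeaderOffsets(logStartOffset: Long, logEndOffset: Long)

    // Returns the offset the follower will fetch from next.
    def handleOffsetOutOfRange(replicaLogEndOffset: Long, leader: LeaderOffsets): Long = {
      if (replicaLogEndOffset > leader.logEndOffset) {
        // Follower is ahead of the leader (e.g. after an unclean leader
        // election): truncate the local log back to the leader's end offset.
        leader.logEndOffset
      } else {
        // Otherwise the follower assumes it fell behind the leader's earliest
        // retained offset: it deletes its whole local log and restarts from
        // the leader's start offset. KAFKA-2477 is about this branch being
        // taken spuriously, which wipes an otherwise healthy replica.
        leader.logStartOffset
      }
    }

    def main(args: Array[String]): Unit = {
      // Numbers from the log below: replica 18 was at 31062784634 and the
      // leader's start offset was 28493996399; the end offset is a made-up value.
      val next = handleOffsetOutOfRange(31062784634L, LeaderOffsets(28493996399L, 31500000000L))
      println(s"next fetch offset: $next")  // prints 28493996399
    }
  }

In the log below, replica 18's offset (31062784634) was well above the leader's start offset (28493996399), so it doesn't look like it had genuinely fallen behind retention; the ~950 "Scheduling log segment ... for deletion" lines right before the reset would be that full truncation.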

On Thu, Oct 29, 2015 at 6:44 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> Hi all,
>
> This morning I woke up to see a very high max replica lag on one of my
> brokers.  I looked at the logs, and it seems that one of the replica fetchers
> for a partition just decided that its offset was out of range, so it reset
> its offset to the beginning of the leader’s log and started replicating
> from there.  This broker is currently catching back up, so things will be
> fine.
>
> But, I’m curious.  Has anyone seen this before?  Why would this just
> happen?
>
> The logs show that many segments for this partition were scheduled for
> deletion all at once, right before the fetcher reset its offset:
>
>
> [2015-10-29 09:27:11,899] 5421994218 [ReplicaFetcherThread-5-14] INFO
> kafka.log.Log  - Scheduling log segment 28493996399 for log
> webrequest_upload-0 for deletion.
> …
> (repeats for about 950 segments…)
> …
> [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] WARN
> kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-5-14], Replica
> 18 for partition [webrequest_upload,0] reset its fetch offset from
> 28493996399 to current leader 14's start offset 28493996399
> [2015-10-29 09:27:12,606] 5421994925 [ReplicaFetcherThread-5-14] ERROR
> kafka.server.ReplicaFetcherThread  - [ReplicaFetcherThread-5-14], Current
> offset 31062784634 for partition [webrequest_upload,0] out of range; reset
> offset to 28493996399
> …
>
>
> A more complete capture of this log is here:
> https://gist.github.com/ottomata/033ddef8f699ca09cfa8
>
> Thanks!
> -Ao
>
>
