Re: topics stuck in "Leader: -1" after crash while migrating topics

James Brown Fri, 28 Apr 2017 10:58:57 -0700

The following is also appearing in the logs a lot, if anyone has any ideas:


INFO Partition [easypost.syslog,7] on broker 1: Cached zkVersion [647] not
equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
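
For triage, one rough way to see which brokers the leaderless partitions have in common is to tally the replica lists from the kafka-topics output quoted below. This is only a sketch: the function names are made up for illustration, the sample is three lines copied from that output, and the parser assumes the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout shown in this thread.

```python
import re
from collections import Counter

# Matches one partition line of `kafka-topics.sh --describe` output,
# with or without a leading "> " quote marker. The layout is assumed
# to be the one pasted in this thread.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+"
    r"Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Return one dict per partition line found in `text` (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip anything that is not a partition line
        rows.append({
            "topic": match.group("topic"),
            "partition": int(match.group("partition")),
            "leader": int(match.group("leader")),
            "replicas": [int(b) for b in match.group("replicas").split(",")],
            # An empty ISR shows up as nothing after "Isr:".
            "isr": [int(b) for b in match.group("isr").split(",") if b],
        })
    return rows


def leaderless_replica_counts(rows):
    """Count how often each broker id appears in the replica list of a
    partition whose leader is -1 (hypothetical helper)."""
    counts = Counter()
    for row in rows:
        if row["leader"] == -1:
            counts.update(row["replicas"])
    return counts


# Three lines copied from the output in this thread.
SAMPLE = """\
Topic: __consumer_offsets Partition: 9 Leader: -1 Replicas: 3,2,4 Isr:
Topic: __consumer_offsets Partition: 23 Leader: -1 Replicas: 5,2,1 Isr:
Topic: __consumer_offsets Partition: 33 Leader: -1 Replicas: 1,2,4 Isr:
"""

if __name__ == "__main__":
    for broker, count in sorted(leaderless_replica_counts(parse_describe(SAMPLE)).items()):
        print(f"broker {broker}: in {count} leaderless partition(s)")
```

Piping the full `--describe --unavailable-partitions` output through `parse_describe` instead of `SAMPLE` gives the same tally for every affected partition. It only summarizes what the tool already printed and does not query the cluster.
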

On Fri, Apr 28, 2017 at 10:43 AM, James Brown <jbr...@easypost.com> wrote:

> We're running 0.10.1.0 on a five-node cluster.
>
> I was in the process of migrating some topics from having two replicas to
> having three replicas when two of the five machines in this cluster crashed
> (brokers 2 and 3).
>
> After restarting them, all of the topics that were previously assigned to
> them are unavailable and showing "Leader: -1".
>
> Example kafka-topics output:
>
> % kafka-topics.sh --zookeeper $ZK_HP --describe  --unavailable-partitions
> Topic: __consumer_offsets Partition: 9 Leader: -1 Replicas: 3,2,4 Isr:
> Topic: __consumer_offsets Partition: 13 Leader: -1 Replicas: 3,2,4 Isr:
> Topic: __consumer_offsets Partition: 17 Leader: -1 Replicas: 3,2,5 Isr:
> Topic: __consumer_offsets Partition: 23 Leader: -1 Replicas: 5,2,1 Isr:
> Topic: __consumer_offsets Partition: 25 Leader: -1 Replicas: 3,2,5 Isr:
> Topic: __consumer_offsets Partition: 26 Leader: -1 Replicas: 3,2,1 Isr:
> Topic: __consumer_offsets Partition: 30 Leader: -1 Replicas: 3,1,2 Isr:
> Topic: __consumer_offsets Partition: 33 Leader: -1 Replicas: 1,2,4 Isr:
> Topic: __consumer_offsets Partition: 35 Leader: -1 Replicas: 1,2,5 Isr:
> Topic: __consumer_offsets Partition: 39 Leader: -1 Replicas: 3,1,2 Isr:
> Topic: __consumer_offsets Partition: 40 Leader: -1 Replicas: 3,4,2 Isr:
> Topic: __consumer_offsets Partition: 44 Leader: -1 Replicas: 3,1,2 Isr:
> Topic: __consumer_offsets Partition: 45 Leader: -1 Replicas: 1,3,2 Isr:
>
> Note that I wasn't even moving any of the __consumer_offsets partitions,
> so I'm not sure if the fact that a reassignment was in progress is a red
> herring or not.
>
> The logs are full of
>
> ERROR [ReplicaFetcherThread-0-3], Error for partition [tracking.syslog,2]
> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
> server experienced an unexpected error when processing the request
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-3], Error for partition [tracking.syslog,2]
> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
> server experienced an unexpected error when processing the request
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-3], Error for partition [epostg.request_log_v1,0]
> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
> server experienced an unexpected error when processing the request
> (kafka.server.ReplicaFetcherThread)
> ERROR [ReplicaFetcherThread-0-3], Error for partition [epostg.request_log_v1,0]
> to broker 3:org.apache.kafka.common.errors.UnknownServerException: The
> server experienced an unexpected error when processing the request
> (kafka.server.ReplicaFetcherThread)
>
> What can I do to fix this? Should I manually reassign all partitions that
> were led by brokers 2 or 3 to only have whatever the third broker was in
> their replica-set as their replica set? Do I need to temporarily enable
> unclean elections?
>
> I've never seen a cluster fail this way...
>
> --
> James Brown
> Engineer
>



-- 
James Brown
Engineer
