Re: kafka in unrecoverable state

SenthilKumar K Sat, 07 Apr 2018 06:07:38 -0700

I observed the error below on one of the brokers, and it is unresponsive ...

[2018-04-07 12:51:39,830] ERROR [Replica Manager on Broker 3]: Error
processing append operation on partition __consumer_offsets-27
(kafka.server.ReplicaManager)


org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync
replicas for partition __consumer_offsets-27 is [1], below required minimum
[2]
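
For context, here is a rough sketch of the rule behind this exception. It is
a simplified model, not the broker's actual code: check_append and
NotEnoughReplicas are made-up names. It assumes the offset commit is written
with acks=-1 (all), which is the default for offsets.commit.required.acks,
and that min.insync.replicas is 2, as the "required minimum [2]" in the
message suggests. The ISR value [3] is illustrative only.

```python
# Simplified model of the broker-side check; NOT Kafka's actual code.
class NotEnoughReplicas(Exception):
    pass


def check_append(isr, min_insync_replicas, required_acks):
    """Raise if an acks=all (-1) append cannot meet min.insync.replicas."""
    if required_acks == -1 and len(isr) < min_insync_replicas:
        raise NotEnoughReplicas(
            f"Number of insync replicas is [{len(isr)}], "
            f"below required minimum [{min_insync_replicas}]"
        )
    return True


# Illustrative values: one replica left in the ISR, required minimum of 2.
try:
    check_append(isr=[3], min_insync_replicas=2, required_acks=-1)
except NotEnoughReplicas as e:
    print(e)
```

With only one replica in the ISR, every acks=all write to that partition is
rejected until a second replica catches up, which would explain why offset
commits keep failing.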


For the first 24 hours the cluster works well under ~60K messages/sec of
inbound and outbound load. After a day the broker becomes unresponsive and
the Group Coordinator starts throwing "Offset commit failed with a retriable
exception. You should retry committing offsets. The underlying error was:
The coordinator is not available".


./bin/kafka-topics.sh --zookeeper localhost:2181:/kafka --describe --topic __consumer_offsets

Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3
Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer

Topic: __consumer_offsets Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3

Topic: __consumer_offsets Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1

Initially, the __consumer_offsets topic replicas were in sync.
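
To see how many partitions are affected, the describe output can be scanned
for partitions whose ISR is smaller than the required minimum. This is a
minimal sketch: under_min_isr is a hypothetical helper, the default min_isr=2
is taken from the error message, and the regex assumes the output format
shown above.

```python
import re

# Matches the per-partition lines of `kafka-topics.sh --describe` output.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def under_min_isr(describe_output, min_isr=2):
    """Return (partition, isr) pairs whose ISR is smaller than min_isr."""
    flagged = []
    for line in describe_output.splitlines():
        m = PARTITION_LINE.search(line)
        if not m:
            continue  # topic summary line or blank line
        partition = int(m.group(1))
        isr = [int(b) for b in m.group(4).split(",") if b]
        if len(isr) < min_isr:
            flagged.append((partition, isr))
    return flagged


sample = """\
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3
Topic: __consumer_offsets Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3
Topic: __consumer_offsets Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 1
"""
print(under_min_isr(sample))
```

Both sample partitions are flagged, since each has a single broker in its ISR.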


Kafka Version : 0.11.0.



On Fri, Aug 25, 2017 at 6:04 PM, Murad Mamedov <m...@muradm.net> wrote:

> When it first occurred, all replicas were in sync.
> But after restarting the clients and brokers, the exception started to
> occur immediately, and the replicas started falling out of sync.
> As explained in the issue, the bug is related to configuration and the
> timing of records.
>
> On Fri, Aug 25, 2017 at 10:31 AM, Dan Markhasin <minimi...@gmail.com>
> wrote:
>
> > If you run kafka-topics.sh --describe --topic __consumer_offsets, does it
> > show that all replicas are in sync?
> >
> > On 23 August 2017 at 23:11, Murad Mamedov <m...@muradm.net> wrote:
> >
> > > Hi David,
> > >
> > > Thanks for the reply. However, I don't have a problem with the number
> > > of replicas. I have 3 brokers, and the topics are configured
> > > accordingly, especially __consumer_offsets:
> > >
> > > Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3
> > > Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
> > >
> > > And everything was working fine for months, until today.
> > >
> > > Why would I want to change the replication factor? To what value?
> > >
> > > On Wed, Aug 23, 2017 at 11:19 PM, David Frederick
> > > <david.freder...@gmail.com> wrote:
> > >
> > > > |> NotEnoughReplicasException: Number of insync replicas for partition
> > > > __consumer_offsets-17 is [1], below required minimum [2]
> > > >
> > > > Please refer to
> > > > https://stackoverflow.com/questions/37960767/how-to-change-the-replicas-of-kafka-topic
> > > > Hope it helps!
> > > >
> > > >
> > > > On Aug 23, 2017 5:17 AM, "Murad Mamedov" <m...@muradm.net> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Did you manage to find the root cause of this issue?
> > > > >
> > > > > Same thing happened here.
> > > > >
> > > > > Thanks in advance
> > > > >
> > > > > > On Tue, Jun 13, 2017 at 7:50 PM, Paul van der Linden
> > > > > > <p...@sportr.co.uk> wrote:
> > > > >
> > > > > > I managed to solve it by:
> > > > > > - stopping and deleting all data on kafka & zookeeper
> > > > > > - stopping all consumers and producers
> > > > > > - starting kafka & zookeeper, waiting till they are up
> > > > > > > - starting all consumers & producers
> > > > > > >
> > > > > > > Is there a better way to do this, without data loss and halting
> > > > > > > everything?
> > > > > >
> > > > > > > On Tue, Jun 13, 2017 at 4:28 PM, Paul van der Linden
> > > > > > > <p...@sportr.co.uk> wrote:
> > > > > >
> > > > > > > A few lines of the logs:
> > > > > > >
> > > > > > > [2017-06-13 15:25:37,343] INFO [GroupCoordinator 0]: Stabilized
> > > group
> > > > > > > summarizer generation 701 (kafka.coordinator.GroupCoordinator)
> > > > > > > [2017-06-13 15:25:37,345] INFO [GroupCoordinator 0]: Assignment
> > > > > received
> > > > > > > from leader for group summarizer for generation 701
> > > > (kafka.coordinator.
> > > > > > > GroupCoordinator)
> > > > > > > [2017-06-13 15:25:37,345] ERROR [Replica Manager on Broker 0]:
> > > Error
> > > > > > > processing append operation on partition __consumer_offsets-17
> > > > > > > (kafka.server.ReplicaManager)
> > > > > > > org.apache.kafka.common.errors.NotEnoughReplicasException:
> > Number
> > > of
> > > > > > > insync replicas for partition __consumer_offsets-17 is [1],
> below
> > > > > > required
> > > > > > > minimum [2]
> > > > > > > [2017-06-13 15:25:37,345] INFO [GroupCoordinator 0]: Preparing
> to
> > > > > > > restabilize group summarizer with old generation 701
> > > > > (kafka.coordinator.
> > > > > > > GroupCoordinator)
> > > > > > >
> > > > > > > > This keeps happening, for all consumer offsets and all groups, etc.
> > > > > > >
> > > > > > > > On Tue, Jun 13, 2017 at 4:21 PM, Paul van der Linden
> > > > > > > > <p...@sportr.co.uk> wrote:
> > > > > > >
> > > > > > >> Hi,
> > > > > > >>
> > > > > > > >> I'm trying to find out how to at least get my Kafka working again.
> > > > > > > >> Something went wrong and Kafka has dropped to a throughput of 0
> > > > > > > >> messages. It keeps looping on stabilizing consumer groups and
> > > > > > > >> erroring on an append operation to the offset partitions, plus
> > > > > > > >> "Not enough replicas".
> > > > > > >>
> > > > > > > >> The weird thing is that, after not being able to work this out, I
> > > > > > > >> went pretty brutal (luckily I can afford to lose more messages):
> > > > > > > >> - deleted all Kafka and ZooKeeper instances
> > > > > > > >> - updated Kafka
> > > > > > > >> - cleared all disks
> > > > > > >>
> > > > > > > >> Kafka is still in this unrecoverable error state. Does anyone have
> > > > > > > >> any idea how to fix this?
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > *Murad M*
> > > > > *M (tr): +90 (533) 4874329*
> > > > > *E: m...@muradm.net <m...@muradm.net>*
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > *Murad M*
> > > *M (tr): +90 (533) 4874329*
> > > *E: m...@muradm.net <m...@muradm.net>*
> > >
> >
>
>
>
>
