Thanks for sharing, Sahil. Just FYI, there is a KIP proposal that
considers always turning on "log.cleaner.enable", here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-184%3A+Rename+LogCleaner+and+related+classes+to+LogCompactor


Guozhang


On Thu, Aug 3, 2017 at 5:58 AM, sahil aggarwal <sahil.ag...@gmail.com>
wrote:

> Faced a similar issue in Kafka 0.10.0.1. Going through the Kafka code,
> I figured out that when the coordinator goes down, the other ISR scans
> the whole log file of the __consumer_offsets partition for my consumer
> group to rebuild its offsets cache. In my case its size was around
> ~600G, which took around ~40 mins, during which the consumers were
> without a coordinator. So the duration of consumers being in this
> state depends on how big the partition's log file is.
>
>
> Made the following change in the broker config to fix it:
>
>
> log.cleaner.enable=true
>
>
> (This enabled the __consumer_offsets log files to be compacted, which in our case happened every ~10 mins.)
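>
> For reference, a sketch of the relevant server.properties settings.
> log.cleaner.enable is the one change made here; the other two are
> standard broker configs shown with their default values, which you may
> want to review alongside it:
>
> # turn the log cleaner on so compacted topics such as
> # __consumer_offsets actually get compacted
> log.cleaner.enable=true
>
> # segment size for the offsets topic; only rolled (closed) segments
> # are eligible for compaction (default 100 MB)
> offsets.topic.segment.bytes=104857600
>
> # how long the cleaner sleeps when there is nothing to clean (default 15 s)
> log.cleaner.backoff.ms=15000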
>
>
>
> On Sun, May 14, 2017 at 1:01 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > Hi,
> >
> > I just dug a little bit. The messages are logged at INFO level and thus
> > should not be a problem if they go away by themselves after some time.
> > Compare:
> > https://groups.google.com/forum/#!topic/confluent-platform/A14dkPlDlv4
> >
> > Do you still see missing data?
> >
> >
> > -Matthias
> >
> >
> > On 5/11/17 2:39 AM, Mahendra Kariya wrote:
> > > Hi Matthias,
> > >
> > > We faced the issue again. The logs are below.
> > >
> > > 16:13:16.527 [StreamThread-7] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator broker-05:6667 (id: 2147483642 rack: null) dead for group grp_id
> > > 16:13:16.543 [StreamThread-3] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id.
> > > 16:13:16.543 [StreamThread-3] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator broker-05:6667 (id: 2147483642 rack: null) dead for group grp_id
> > > 16:13:16.547 [StreamThread-6] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id.
> > > 16:13:16.547 [StreamThread-6] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator broker-05:6667 (id: 2147483642 rack: null) dead for group grp_id
> > > 16:13:16.551 [StreamThread-1] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id.
> > > 16:13:16.551 [StreamThread-1] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator broker-05:6667 (id: 2147483642 rack: null) dead for group grp_id
> > > 16:13:16.572 [StreamThread-4] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id.
> > > 16:13:16.572 [StreamThread-4] INFO o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator broker-05:6667 (id: 2147483642 rack: null) dead for group grp_id
> > > 16:13:16.573 [StreamThread-2] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 (id: 2147483642 rack: null) for group grp_id.
> > >
> > >
> > >
> > > On Tue, May 9, 2017 at 3:40 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> > >
> > >> Great! Glad 0.10.2.1 fixes it for you!
> > >>
> > >> -Matthias
> > >>
> > >> On 5/7/17 8:57 PM, Mahendra Kariya wrote:
> > >>> Upgrading to 0.10.2.1 seems to have fixed the issue.
> > >>>
> > >>> Until now, we were looking at random 1-hour windows of data to
> > >>> analyse the issue. Over the weekend, we wrote a simple test that
> > >>> continuously checks for inconsistencies in real time and reports
> > >>> if there is any issue.
> > >>>
> > >>> No issues have been reported for the last 24 hours. Will update
> > >>> this thread if we find any issue.
> > >>>
> > >>> Thanks for all the support!
> > >>>
> > >>>
> > >>>
> > >>> On Fri, May 5, 2017 at 3:55 AM, Matthias J. Sax <matth...@confluent.io> wrote:
> > >>>
> > >>>> About
> > >>>>
> > >>>>> 07:44:08.493 [StreamThread-10] INFO o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator broker-05:6667 for group group-2.
> > >>>>
> > >>>> Please upgrade to Streams 0.10.2.1 -- we fixed a couple of bugs
> > >>>> and I would assume this issue is fixed, too. If not, please
> > >>>> report back.
> > >>>>
> > >>>>> Another question that I have is, is there a way for us to detect
> > >>>>> how many messages have come out of order? And if possible, what
> > >>>>> is the delay?
> > >>>>
> > >>>> There is no metric or API for this. What you could do, though, is
> > >>>> use #transform() to forward each record unchanged and, as a side
> > >>>> task, extract the timestamp via `context#timestamp()` and do some
> > >>>> bookkeeping to compute whether it is out-of-order and what the
> > >>>> delay was.
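> > >>>>
> > >>>> A minimal sketch of such a Transformer against the 0.10.2 API
> > >>>> (class and field names here are illustrative, not from this
> > >>>> thread):
> > >>>>
> > >>>> import org.apache.kafka.streams.KeyValue;
> > >>>> import org.apache.kafka.streams.kstream.Transformer;
> > >>>> import org.apache.kafka.streams.processor.ProcessorContext;
> > >>>>
> > >>>> public class OutOfOrderTracker<K, V>
> > >>>>         implements Transformer<K, V, KeyValue<K, V>> {
> > >>>>     private ProcessorContext context;
> > >>>>     private long maxTimestamp = Long.MIN_VALUE; // highest ts seen
> > >>>>     private long outOfOrderCount = 0;
> > >>>>     private long maxDelayMs = 0;
> > >>>>
> > >>>>     @Override
> > >>>>     public void init(final ProcessorContext context) {
> > >>>>         this.context = context;
> > >>>>     }
> > >>>>
> > >>>>     @Override
> > >>>>     public KeyValue<K, V> transform(final K key, final V value) {
> > >>>>         final long ts = context.timestamp();
> > >>>>         if (ts < maxTimestamp) {
> > >>>>             // smaller timestamp than one already seen -> out of order
> > >>>>             outOfOrderCount++;
> > >>>>             maxDelayMs = Math.max(maxDelayMs, maxTimestamp - ts);
> > >>>>         } else {
> > >>>>             maxTimestamp = ts;
> > >>>>         }
> > >>>>         return KeyValue.pair(key, value); // forward unchanged
> > >>>>     }
> > >>>>
> > >>>>     @Override
> > >>>>     public KeyValue<K, V> punctuate(final long timestamp) {
> > >>>>         return null; // no periodic emission
> > >>>>     }
> > >>>>
> > >>>>     @Override
> > >>>>     public void close() { }
> > >>>> }
> > >>>>
> > >>>> and wire it in with something like
> > >>>> stream.transform(OutOfOrderTracker::new).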
> > >>>>
> > >>>>
> > >>>>>>>  - same for .mapValues()
> > >>>>>>>
> > >>>>>>
> > >>>>>> I am not sure how to check this.
> > >>>>
> > >>>> The same way as you do for filter()?
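> > >>>>
> > >>>> E.g. with a logging side effect in the mapper (a sketch; `LOG`,
> > >>>> `keep()` and `doMap()` are placeholders for your own logger,
> > >>>> predicate, and mapping logic):
> > >>>>
> > >>>> stream.filter((key, value) -> keep(value))
> > >>>>       .mapValues(value -> {
> > >>>>           LOG.info("mapValues received: {}", value); // confirm input arrives
> > >>>>           return doMap(value);
> > >>>>       });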
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>> On 5/4/17 10:29 AM, Mahendra Kariya wrote:
> > >>>>> Hi Matthias,
> > >>>>>
> > >>>>> Please find the answers below.
> > >>>>>
> > >>>>>> I would recommend double-checking the following:
> > >>>>>>
> > >>>>>>  - can you confirm that the filter does not remove all data for
> > >>>>>> those time periods?
> > >>>>>>
> > >>>>>
> > >>>>> Filter does not remove all data. There is a lot of data coming in
> > >>>>> even after the filter stage.
> > >>>>>
> > >>>>>
> > >>>>>>  - I would also check input for your AggregatorFunction() -- does
> > >>>>>> it receive everything?
> > >>>>>>
> > >>>>>
> > >>>>> Yes. The aggregate function seems to be receiving everything.
> > >>>>>
> > >>>>>
> > >>>>>>  - same for .mapValues()
> > >>>>>>
> > >>>>>
> > >>>>> I am not sure how to check this.
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>



-- 
-- Guozhang
