Re: Frequent ZK session timeouts

Dillian Murphey Tue, 12 Jan 2016 23:20:07 -0800

[2016-01-12 22:16:59,629] TRACE [Controller 925537]: leader imbalance ratio for broker 925537 is 0.000000 (kafka.controller.KafkaController)

[2016-01-12 22:21:07,167] INFO [SessionExpirationListener on 925537], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)

[2016-01-12 22:21:07,167] INFO [delete-topics-thread-925537], Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread)

[2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Shutdown completed (kafka.controller.TopicDeletionManager$DeleteTopicsThread)

[2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Stopped (kafka.controller.TopicDeletionManager$Del

This occurs very frequently, even after wiping Kafka and starting from a
clean slate. It never occurs in our production env. I've read here and there
that it could be a GC issue. Here is the tail end of a recent GC log.


20534K(8354560K), 52.5293140 secs] [Times: user=209.09 sys=0.06, real=52.53
secs]

2016-01-11T23:16:05.149+0000: 784.219: [GC 784.219: [ParNew:
274263K->1685K(306688K), 54.8993730 secs] 793174K->520803K(8354560K),
54.8994450 secs] [Times: user=218.86 sys=0.03, real=54.90 secs]

2016-01-11T23:17:01.095+0000: 840.165: [GC 840.165: [ParNew:
274325K->1896K(306688K), 56.4208930 secs] 793443K->521139K(8354560K),
56.4209750 secs] [Times: user=224.88 sys=0.05, real=56.42 secs]

2016-01-11T23:17:59.024+0000: 898.093: [GC 898.093: [ParNew:
274536K->1705K(306688K), 58.1100630 secs] 793779K->521093K(8354560K),
58.1101400 secs] [Times: user=231.75 sys=0.05, real=58.12 secs]

2016-01-11T23:18:58.240+0000: 957.310: [GC 957.310: [ParNew:
274345K->1483K(306688K), 64.2820420 secs] 793733K->521047K(8354560K),
64.2821180 secs] [Times: user=241.93 sys=0.06, real=64.28 secs]

2016-01-11T23:20:03.571+0000: 1022.640: [GC 1022.640: [ParNew:
274123K->1379K(306688K), 61.5305280 secs] 793687K->521097K(8354560K),
61.5305990 secs] [Times: user=245.72 sys=0.01, real=61.53 secs]

2016-01-11T23:21:06.194+0000: 1085.263: [GC 1085.263: [ParNew:
274019K->1508K(306688K), 63.4433440 secs] 793737K->521372K(8354560K),
63.4434240 secs] [Times: user=253.33 sys=0.02, real=63.44 secs]

2016-01-11T23:22:10.413+0000: 1149.482: [GC 1149.483: [ParNew:
274148K->1313K(306688K), 65.6956010 secs] 794012K->521330K(8354560K),
65.6956660 secs] [Times: user=262.01 sys=0.05, real=65.69 secs]

Heap

 par new generation   total 306688K, used 132112K [0x00000005f5a00000,
0x000000060a6c0000, 0x000000060a6c0000)

  eden space 272640K,  47% used [0x00000005f5a00000, 0x00000005fd9bbba0,
0x0000000606440000)

  from space 34048K,   3% used [0x0000000606440000, 0x00000006065884a8,
0x0000000608580000)

  to   space 34048K,   0% used [0x0000000608580000, 0x0000000608580000,
0x000000060a6c0000)

 concurrent mark-sweep generation total 8047872K, used 520016K
[0x000000060a6c0000, 0x00000007f5a00000, 0x00000007f5a00000)

 concurrent-mark-sweep perm gen total 38760K, used 25768K
[0x00000007f5a00000, 0x00000007f7fda000, 0x0000000800000000)
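
To check whether pauses of this length could by themselves outlast a ZooKeeper session, here is a minimal sketch that pulls the `real=` wall-clock times out of a GC log and compares them to a session timeout. The function names are hypothetical, and `tick_time_ms=2000` is an assumption (the value in ZooKeeper's sample config), since this cluster's tickTime is not shown in the thread. By default a ZooKeeper server only grants session timeouts between 2x and 20x tickTime, so a very large value requested by the broker may be cut down to that upper bound.

```python
import re

# Wall-clock pause at the end of each GC entry, e.g. "real=54.90 secs".
# The pasted log wraps lines, so "secs" may sit on the next line; \s+ covers that.
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?)\s+secs")


def gc_pauses_seconds(gc_log_text):
    """Return every 'real=' pause (in seconds) found in the GC log text."""
    return [float(value) for value in REAL_PAUSE.findall(gc_log_text)]


def negotiated_timeout_ms(requested_ms, tick_time_ms=2000, min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the range the server allows.

    Unless minSessionTimeout / maxSessionTimeout are set in zoo.cfg, the
    bounds default to 2x and 20x tickTime. tick_time_ms=2000 is an
    assumption, not a value taken from this cluster.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


def pauses_exceeding(gc_log_text, session_timeout_ms):
    """Return the GC pauses (seconds) longer than the session timeout."""
    limit_seconds = session_timeout_ms / 1000.0
    return [p for p in gc_pauses_seconds(gc_log_text) if p > limit_seconds]


if __name__ == "__main__":
    # Two entries copied from the log above, including the wrapped "secs]".
    sample = (
        "20534K(8354560K), 52.5293140 secs] "
        "[Times: user=209.09 sys=0.06, real=52.53\nsecs]\n"
        "54.8994450 secs] [Times: user=218.86 sys=0.03, real=54.90 secs]\n"
    )
    # 600000 ms stands in for "a massive timeout"; the real setting is not in the thread.
    effective = negotiated_timeout_ms(600000)
    print("effective session timeout (ms):", effective)
    print("pauses longer than that (s):", pauses_exceeding(sample, effective))
```

Under those assumptions the effective timeout is capped at 40 seconds, which every ParNew pause in the log above exceeds. The negotiated value actually granted to the broker would need to be confirmed from the ZooKeeper client log lines on the broker.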



On Tue, Jan 12, 2016 at 6:34 PM, Mayuresh Gharat <gharatmayures...@gmail.com
> wrote:

> Can you paste the logs?
>
> Thanks,
>
> Mayuresh
>
> On Tue, Jan 12, 2016 at 4:58 PM, Dillian Murphey <crackshotm...@gmail.com>
> wrote:
>
> > Possibly running more stable with 1.7 JVM.
> >
> > Can someone explain the Zookeeper session?  Should it never expire,
> unless
> > the broker becomes unresponsive?  I set a massive timeout value in the
> > broker config far beyond the amount of time I see the zk expiration. Is
> > this entirely on the kafka side, or could zookeeper be doing something?
> > From my zk logs I didn't see anything unusual, just exceptions as a
> result
> > of the zk session expiring (my guess).
> >
> > tnx
> >
> > On Tue, Jan 12, 2016 at 3:05 PM, Dillian Murphey <
> crackshotm...@gmail.com>
> > wrote:
> >
> > > Our 2 node kafka cluster has become unhealthy.  We're running zookeeper as
> > > a 3 node system, with very light load.
> > >
> > > What seems to be happening is in the controller log we get a ZK session
> > > expire message, and in the process of re-assigning the leader for the
> > > partitions (if I'm understanding this right, please correct me), the
> > broker
> > > goes offline and it interrupts our applications that are publishing
> > > messages.
> > >
> > > We don't see this in production, and kafka has been stable for months,
> > > since September.
> > >
> > > I've searched a lot and found some similar complaints but no real
> > > solutions.
> > >
> > > I'm running 0.8.2 and JVM 1.6.X on ubuntu.
> > >
> > > Thanks for any ideas at all.
> > >
> > >
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
