[2016-01-12 22:16:59,629] TRACE [Controller 925537]: leader imbalance ratio for broker 925537 is 0.000000 (kafka.controller.KafkaController)
[2016-01-12 22:21:07,167] INFO [SessionExpirationListener on 925537], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
[2016-01-12 22:21:07,167] INFO [delete-topics-thread-925537], Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
[2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Shutdown completed (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
[2016-01-12 22:21:07,169] INFO [delete-topics-thread-925537], Stopped (kafka.controller.TopicDeletionManager$Del

This occurs very frequently, even after wiping Kafka and starting from a clean slate. It never occurs in our production environment. I've read here and there that it could be a GC issue. Here is the tail end of a recent GC log:

20534K(8354560K), 52.5293140 secs] [Times: user=209.09 sys=0.06, real=52.53 secs]
2016-01-11T23:16:05.149+0000: 784.219: [GC 784.219: [ParNew: 274263K->1685K(306688K), 54.8993730 secs] 793174K->520803K(8354560K), 54.8994450 secs] [Times: user=218.86 sys=0.03, real=54.90 secs]
2016-01-11T23:17:01.095+0000: 840.165: [GC 840.165: [ParNew: 274325K->1896K(306688K), 56.4208930 secs] 793443K->521139K(8354560K), 56.4209750 secs] [Times: user=224.88 sys=0.05, real=56.42 secs]
2016-01-11T23:17:59.024+0000: 898.093: [GC 898.093: [ParNew: 274536K->1705K(306688K), 58.1100630 secs] 793779K->521093K(8354560K), 58.1101400 secs] [Times: user=231.75 sys=0.05, real=58.12 secs]
2016-01-11T23:18:58.240+0000: 957.310: [GC 957.310: [ParNew: 274345K->1483K(306688K), 64.2820420 secs] 793733K->521047K(8354560K), 64.2821180 secs] [Times: user=241.93 sys=0.06, real=64.28 secs]
2016-01-11T23:20:03.571+0000: 1022.640: [GC 1022.640: [ParNew: 274123K->1379K(306688K), 61.5305280 secs] 793687K->521097K(8354560K), 61.5305990 secs] [Times: user=245.72 sys=0.01, real=61.53 secs]
2016-01-11T23:21:06.194+0000: 1085.263: [GC 1085.263: [ParNew: 274019K->1508K(306688K), 63.4433440 secs] 793737K->521372K(8354560K), 63.4434240 secs] [Times: user=253.33 sys=0.02, real=63.44 secs]
2016-01-11T23:22:10.413+0000: 1149.482: [GC 1149.483: [ParNew: 274148K->1313K(306688K), 65.6956010 secs] 794012K->521330K(8354560K), 65.6956660 secs] [Times: user=262.01 sys=0.05, real=65.69 secs]
Heap
 par new generation total 306688K, used 132112K [0x00000005f5a00000, 0x000000060a6c0000, 0x000000060a6c0000)
  eden space 272640K, 47% used [0x00000005f5a00000, 0x00000005fd9bbba0, 0x0000000606440000)
  from space 34048K, 3% used [0x0000000606440000, 0x00000006065884a8, 0x0000000608580000)
  to space 34048K, 0% used [0x0000000608580000, 0x0000000608580000, 0x000000060a6c0000)
 concurrent mark-sweep generation total 8047872K, used 520016K [0x000000060a6c0000, 0x00000007f5a00000, 0x00000007f5a00000)
 concurrent-mark-sweep perm gen total 38760K, used 25768K [0x00000007f5a00000, 0x00000007f7fda000, 0x0000000800000000)

On Tue, Jan 12, 2016 at 6:34 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:

> Can you paste the logs?
>
> Thanks,
>
> Mayuresh
>
> On Tue, Jan 12, 2016 at 4:58 PM, Dillian Murphey <crackshotm...@gmail.com>
> wrote:
>
> > Possibly running more stable with 1.7 JVM.
> >
> > Can someone explain the Zookeeper session? Should it never expire, unless
> > the broker becomes unresponsive? I set a massive timeout value in the
> > broker config, far beyond the amount of time I see the zk expiration. Is
> > this entirely on the kafka side, or could zookeeper be doing something?
> > From my zk logs I didn't see anything unusual, just exceptions as a result
> > of the zk session expiring (my guess).
> >
> > tnx
> >
> > On Tue, Jan 12, 2016 at 3:05 PM, Dillian Murphey <crackshotm...@gmail.com>
> > wrote:
> >
> > > Our 2 node kafka cluster has become unhealthy. We're running zookeeper as
> > > a 3 node system, with very light load.
> > >
> > > What seems to be happening is in the controller log we get a ZK session
> > > expire message, and in the process of re-assigning the leader for the
> > > partitions (if I'm understanding this right, please correct me), the broker
> > > goes offline and it interrupts our applications that are publishing
> > > messages.
> > >
> > > We don't see this in production, and kafka has been stable for months,
> > > since September.
> > >
> > > I've searched a lot and found some similar complaints but no real
> > > solutions.
> > >
> > > I'm running 0.8.2 and JVM 1.6.X on ubuntu.
> > >
> > > Thanks for any ideas at all.
> > >
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
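
One rough way to test the GC theory against the log above is to pull the wall-clock pause ("real=") out of each GC entry and compare it with the ZooKeeper session timeout. The following is only a minimal sketch in Python: the function name long_pauses is made up for illustration, and the 6-second timeout is an assumed value, so substitute whatever session timeout the broker actually ends up with.

```python
import re

# Matches the wall-clock pause at the end of a GC entry, e.g. "real=54.90 secs".
REAL_PAUSE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(gc_log_text, session_timeout_secs):
    """Return every wall-clock GC pause (seconds) longer than the given timeout.

    Hypothetical helper for illustration; not part of Kafka or ZooKeeper.
    """
    pauses = [float(value) for value in REAL_PAUSE.findall(gc_log_text)]
    return [p for p in pauses if p > session_timeout_secs]


# Two entries copied from the GC log in this thread.
sample = (
    "2016-01-11T23:16:05.149+0000: 784.219: [GC 784.219: "
    "[ParNew: 274263K->1685K(306688K), 54.8993730 secs] "
    "793174K->520803K(8354560K), 54.8994450 secs] "
    "[Times: user=218.86 sys=0.03, real=54.90 secs]\n"
    "2016-01-11T23:17:01.095+0000: 840.165: [GC 840.165: "
    "[ParNew: 274325K->1896K(306688K), 56.4208930 secs] "
    "793443K->521139K(8354560K), 56.4209750 secs] "
    "[Times: user=224.88 sys=0.05, real=56.42 secs]\n"
)

# ASSUMPTION: 6.0 seconds stands in for the effective ZK session timeout.
print(long_pauses(sample, 6.0))
```

If the broker is stopped for longer than the session timeout, it cannot heartbeat to ZooKeeper during that time, so any pause this check reports is a candidate cause for the "ZK expired" message in the controller log.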