How to dump Kafka consumer offset?

On Thu, Mar 1, 2018, 10:25 PM Erik Weathers <[email protected]> wrote:
> Agreed, there have been a number of fixes in the storm-kafka spout that
> might account for that problem. If you need to debug further on 0.9.x you
> should dump the Kafka consumer offsets and see if the topology is getting
> stuck at some specific offsets. Then examine the data at those offsets
> using a console consumer to try to infer why the topology would get stuck.
>
> - Erik
>
> On Wed, Feb 28, 2018 at 2:41 PM Jungtaek Lim <[email protected]> wrote:
>
>> Hi Ajeesh,
>>
>> Sorry, but that version is really outdated; it was released 3 years ago.
>> Would you mind upgrading to a recent version, 1.2.1 for example, and
>> seeing whether it helps?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Feb 28, 2018 at 9:48 PM, Ajeesh <[email protected]> wrote:
>>
>>> Hi Team,
>>>
>>> We are facing issues in Storm version 0.9.4: the Storm application hangs
>>> after processing for 3-4 days. We tried to restart the same Storm topology,
>>> but it fails within 1-2 minutes, after processing around 15K-16K tuples.
>>> If we decrease the max.spout.pending value, it fails after processing only
>>> a few tuples.
>>>
>>> If we start a new topology with a new Kafka topic, then everything
>>> works fine for 3-4 days. Our daily volume is around 11 million.
>>>
>>> Checked the execute latency; it's around 6ms.
>>> Checked the worker logs; there are no errors/exceptions.
>>> The Storm visualization graph shows all nodes in "green" color.
>>>
>>> Workflow:
>>> KafkaSpout->Bolt-1->Bolt-2->Bolt-3->Bolt-4.
>>>
>>> Storm configurations:
>>> No. of workers: 10
>>> No. of executors: 260
>>> Max Spout Pending: 50
>>> No. of KafkaSpout executors: 10
>>>
>>> TODO:
>>> 1. We want to take a thread dump
>>> 2. Is there anything more you need to know about this issue?
>>>
>>> Analyzed worker logs in debug mode:
>>> 2018-02-28T05:12:51.462-0500 s.k.PartitionManager [DEBUG] failing at offset=163348 with _pending.size()=3953 pending and _emittedToOffset=168353
>>> 2018-02-28T05:12:51.461-0500 s.k.PartitionManager [DEBUG] failing at offset=195116 with _pending.size()=4442 pending and _emittedToOffset=199437
>>> 2018-02-28T05:12:51.463-0500 s.k.PartitionManager [DEBUG] failing at offset=194007 with _pending.size()=4442 pending and _emittedToOffset=199437
>>> 2018-02-28T05:12:51.463-0500 s.k.PartitionManager [DEBUG] failing at offset=194700 with _pending.size()=4442 pending and _emittedToOffset=199437
>>> 2018-02-28T05:12:51.463-0500 s.k.PartitionManager [DEBUG] failing at offset=193891 with _pending.size()=4442 pending and _emittedToOffset=199437
>>> 2018-02-28T05:12:51.463-0500 s.k.PartitionManager [DEBUG] failing at offset=194455 with _pending.size()=4442 pending and _emittedToOffset=199437
>>> 2018-02-28T05:12:51.463-0500 s.k.PartitionManager [DEBUG] failing at offset=194632 with _pending.size()=4442 pending and _emittedToOffset=199437
>>>
>>> 2018-02-28T05:14:05.241-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d003e after 10ms
>>> 2018-02-28T05:14:05.703-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x361d81df7a80048 after 0ms
>>> 2018-02-28T05:14:05.703-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d003d after 0ms
>>> 2018-02-28T05:14:05.745-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30043 after 2ms
>>> 2018-02-28T05:14:05.775-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30045 after 3ms
>>> 2018-02-28T05:14:05.849-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x361d81df7a80044 after 1ms
>>> 2018-02-28T05:14:05.969-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30046 after 0ms
>>> 2018-02-28T05:14:07.067-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30041 after 11ms
>>> 2018-02-28T05:14:07.131-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d003c after 0ms
>>> 2018-02-28T05:14:07.135-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30042 after 0ms
>>> 2018-02-28T05:14:07.140-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d003b after 0ms
>>> 2018-02-28T05:14:07.150-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30044 after 0ms
>>> 2018-02-28T05:14:08.319-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x361d81df7a8004b after 6ms
>>> 2018-02-28T05:14:08.938-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d0042 after 1ms
>>> 2018-02-28T05:14:08.977-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x161d81df7a30047 after 10ms
>>> 2018-02-28T05:14:08.985-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d0043 after 6ms
>>> 2018-02-28T05:14:08.985-0500 o.a.z.ClientCnxn [DEBUG] Got ping response for sessionid: 0x261d81e8b3d0044 after 7ms
>>>
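
One way to see which offsets the topology keeps failing on is to summarize the PartitionManager DEBUG lines quoted above. The following is only a rough sketch: the regular expression is written against the exact wording of those quoted lines, the function name is made up for illustration, and grouping by `_emittedToOffset` is an assumption used to tell partitions apart, since the log line itself does not name the partition.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... s.k.PartitionManager [DEBUG] failing at offset=163348 with
#   _pending.size()=3953 pending and _emittedToOffset=168353
# (expects each log entry on a single line, as in the worker log file).
FAIL_RE = re.compile(
    r"failing at offset=(\d+) with _pending\.size\(\)=(\d+) "
    r"pending and _emittedToOffset=(\d+)"
)


def summarize_failures(lines):
    """Group failed offsets by _emittedToOffset (hypothetical helper).

    Returns {emitted_to_offset: {"failures", "lowest_failed_offset",
    "max_distance_behind"}}; lines that do not match are ignored.
    """
    failed_by_emitted = defaultdict(list)
    for line in lines:
        match = FAIL_RE.search(line)
        if match is None:
            continue
        failed, _pending_size, emitted_to = map(int, match.groups())
        failed_by_emitted[emitted_to].append(failed)

    summary = {}
    for emitted_to, failed in failed_by_emitted.items():
        lowest = min(failed)
        summary[emitted_to] = {
            "failures": len(failed),
            "lowest_failed_offset": lowest,
            # How far the oldest failing offset trails the emit position.
            "max_distance_behind": emitted_to - lowest,
        }
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_failures.py < worker.log
    for emitted_to, stats in sorted(summarize_failures(sys.stdin).items()):
        print(emitted_to, stats)
```

If the same low offsets keep showing up across restarts, those are the ones worth reading back with a console consumer.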

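On the question in the subject line: as far as I recall, the old storm-kafka spout commits its offsets to ZooKeeper as small JSON nodes under `<zkRoot>/<spout id>/partition_<n>`, where `zkRoot` and `id` come from your SpoutConfig. Please treat that layout and the JSON field names below as assumptions and confirm them with `zkCli.sh` first. A minimal sketch, assuming the third-party `kazoo` client is installed; the function names and the example hosts and paths are placeholders, not anything from your setup:

```python
import json


def offset_node_path(zk_root, spout_id, partition):
    """Build the assumed ZooKeeper path of one partition's offset node."""
    return "/".join([zk_root.rstrip("/"), spout_id, f"partition_{partition}"])


def parse_offset_node(raw):
    """Decode one offset node (bytes) into topic / partition / offset.

    The field names ("topic", "partition", "offset") are assumed from
    memory of the storm-kafka spout; missing fields come back as None.
    """
    data = json.loads(raw.decode("utf-8"))
    return {
        "topic": data.get("topic"),
        "partition": data.get("partition"),
        "offset": data.get("offset"),
    }


def dump_offsets(zk_hosts, zk_root, spout_id):
    """Read every child node under <zk_root>/<spout_id> and parse it."""
    # Imported lazily so the parsing helpers work without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        base = zk_root.rstrip("/") + "/" + spout_id
        offsets = []
        for child in sorted(zk.get_children(base)):
            raw, _stat = zk.get(base + "/" + child)
            offsets.append(parse_offset_node(raw))
        return offsets
    finally:
        zk.stop()


if __name__ == "__main__":
    # Placeholder connection details; substitute your own SpoutConfig values.
    for entry in dump_offsets("zk1:2181", "/kafka-spout", "my-spout-id"):
        print(entry)
```

Running this a few minutes apart shows whether the committed offset for any partition has stopped moving, which is the "stuck at some specific offsets" check Erik describes.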