Kashyap - I see this same issue on 0.9.5.

On Sun, Sep 13, 2015 at 9:58 AM, Enno Shioji <[email protected]> wrote:
> There was a change in that area in 0.9.6
> (https://issues.apache.org/jira/browse/STORM-763), although I'm not sure
> if it will help your issue.
>
> On Sun, Sep 13, 2015 at 2:35 PM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> Hmm. Thanks for the lead. On the Storm UI, the uptime for each executor
>> except the spout shows pretty much consistent values. The spout has
>> crashed for sure, but then it never comes back up. Will check this again.
>>
>> But the other question is - is the Netty reconnect issue solved in
>> 0.9.5? What is your Storm version?
>>
>> Thanks
>> Kashyap
>>
>> On Sep 13, 2015 08:04, "Martin Burian" <[email protected]> wrote:
>>
>>> They do restart after a while, yes. But if you don't see any error in
>>> the log, it's weird. I encountered a case of workers not starting because
>>> I configured the worker JVM to expose a JMX interface for remote
>>> monitoring on a given port. Other workers on the same machine, however,
>>> could not start as they failed to bind to the already used port. No error
>>> messages whatsoever. Might any such thing be your case?
>>>
>>> Otherwise the cause should be logged somewhere. A worker is definitely
>>> not running, or at least not talking to the supervisor. You could try
>>> using fewer workers to find out when/where the error occurs.
>>>
>>> Martin
>>>
>>> On Sun, Sep 13, 2015 at 13:43, Kashyap Mhaisekar <[email protected]>
>>> wrote:
>>>
>>>> All worker logs have the same log. Workers are up. I am using only one
>>>> box with multiple workers to test.
>>>> Workers should be restarted if they fail, right? So ideally, this error
>>>> should be gone in a while.
>>>>
>>>> Thanks
>>>> Kashyap
>>>>
>>>> On Sep 13, 2015 05:10, "Martin Burian" <[email protected]>
>>>> wrote:
>>>>
>>>>> When this appears in the worker log, it means that the worker is
>>>>> trying to connect to another worker, but the other is not running.
>>>>> What do you see in worker-6707.log? Is the other worker running?
>>>>> Martin
>>>>>
>>>>> On Sun, Sep 13, 2015 at 6:06, Kashyap Mhaisekar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Also,
>>>>>> Is there a way to switch back to 0mq from Netty? If so, what needs to
>>>>>> be done?
>>>>>>
>>>>>> Thanks
>>>>>> Kashyap
>>>>>>
>>>>>> On Sat, Sep 12, 2015 at 10:49 PM, Kashyap Mhaisekar <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I am having Netty related issues in my Storm cluster where the spout
>>>>>>> stops consuming after a while. The corresponding worker logs show:
>>>>>>>
>>>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [ERROR] connection attempt 26 to Netty-Client-trsttel2pascapp01.vm.itg.corp.us.shldcorp.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>>>> 2015-09-12T23:28:23.391-0400 b.s.m.n.Client [INFO] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 scheduled to run in 392 ms
>>>>>>> 2015-09-12T23:28:23.784-0400 b.s.m.n.Client [ERROR] connection attempt 27 to Netty-Client-serverstorm1.myorg.com/10.2.70.18:6707 failed: java.lang.RuntimeException: Returned channel was actually not established
>>>>>>>
>>>>>>> The corresponding supervisor logs had:
>>>>>>>
>>>>>>> 2015-09-12T23:28:23.018-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>> 2015-09-12T23:28:23.518-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>> 2015-09-12T23:28:24.019-0400 b.s.d.supervisor [INFO] 32e3f906-3869-4f0c-ac1c-4916615daf99 still hasn't started
>>>>>>>
>>>>>>> I had Storm version 0.9.3 when this issue occurred and upgraded to
>>>>>>> 0.9.4 and then 0.9.5 to seek relief, but the issue still persists. I
>>>>>>> am not sure what else to do. I am not even sure why this issue occurs
>>>>>>> or what triggers it. Any help would be great and appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Kashyap
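
Two of the checks suggested in the thread (is the other worker actually listening on its port, and is a port such as a fixed JMX port already taken on the box) can be done by hand with a few lines of Python. This is only a rough sketch using the standard `socket` module and is not part of Storm; the function names `can_connect` and `can_bind` are made up for illustration, and the host and port in the usage note are simply the ones that appear in the worker log above.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    False suggests nothing is listening there (for example, the target
    worker never started) or that the port is unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_bind(host, port):
    """Return True if this process can bind a TCP socket to host:port.

    False suggests another process already holds the port, which is the
    situation described for workers sharing one fixed JMX port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```

For example, `can_connect("10.2.70.18", 6707)` run from the machine whose worker logs the errors would show whether anything is accepting connections on the port named in the log, and `can_bind("0.0.0.0", jmx_port)` run on the worker box (with `jmx_port` being whatever port the worker JVM options specify, if any) would show whether that port is already in use.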
