There is nothing to kill. crmd has finished (I can see it in the log) and it's
just a ghost in defunct (zombie) state at this point.
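
For anyone who wants to confirm the same thing on their own node: a defunct (zombie) process has already exited, and only its process-table entry remains until the parent collects its exit status, so kill has no effect on it. Below is a minimal sketch for spotting such entries, assuming a procps-style ps. The helper name filter_defunct is my own for illustration and is not part of any cluster tool.

```shell
#!/bin/sh
# filter_defunct: hypothetical helper, not part of corosync or pacemaker.
# Reads rows of "PID PPID STAT COMMAND" on stdin (first row is the header)
# and prints only the rows whose STAT column starts with Z (zombie/defunct).
filter_defunct() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $3, $4 }'
}
```

Run it as `ps -eo pid,ppid,stat,comm | filter_defunct`. The PPID column shows which parent has not yet reaped the exited child.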


On Tue, May 11, 2010 at 8:42 AM, Dejan Muhamedagic <[email protected]> wrote:

> Hi,
>
> On Tue, May 11, 2010 at 07:40:39AM -0400, Vadym Chepkov wrote:
> > By the way, a reboot is too drastic; I just do a kill -9 of corosync
>
> I guess that corosync is waiting for crmd to stop. Did you try to
> kill crmd?
>
> Thanks,
>
> Dejan
>
> > On May 11, 2010, at 7:37 AM, Alain.Moulle wrote:
> >
> > > Hi Steven,
> > > Vadym, just to know: did you run crm_mon in another window while the
> > > corosync shutdown was stalled, just to see whether there were any
> > > "failed" items?
> > > On my side: I've set debug off, and the news (bad or good) is that it
> > > did not occur again, but that was also the case since yesterday with
> > > debug on! With debug off, I've tried 10 times without any problem on
> > > corosync shutdown. So I tried again the thing I thought was a good
> > > clue two days ago:
> > > with debug: off (but it is similar with debug on)
> > > /etc/init.d/corosync stop    => successful
> > > mv external/ipmi external/ipmi.save to force the start of my
> > > resourcetofence to fail
> > > /etc/init.d/corosync start    => successful
> > > but crm_mon shows:
> > >  restofencenode2        (stonith:external/ipmi):    Started node3 FAILED
> > >  Failed actions:
> > >    restofencenode2_start_0 (node=node3, call=5, rc=1, status=complete): unknown error
> > > then:
> > > /etc/init.d/corosync stop
> > > Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
> > > Waiting for corosync services to unload:............................................
> > > and it does not return (for about 5 minutes now)
> > > So I did:
> > > crm resource cleanup restofencenode2
> > > crm resource stop restofencenode2
> > > but unfortunately, it does not help the corosync shutdown to complete...
> > > So I have to reboot the node...
> > >
> > > Don't know if this helps but ... ok I'll try other things ...
> > > Alain
> > >
> > >
> > >> The bad news - it didn't help, still observing the same issue.
> > >
> > >> The good news - it's 100% reproducible.
> > >>
> > >> Vadym
> > >>
> > >> On May 10, 2010, at 7:19 PM, Steven Dake wrote:
> > >>
> > >>
> > >>>> On Mon, 2010-05-10 at 19:02 -0400, Vadym Chepkov wrote:
> > >>>
> > >>>>>> Yes, I am
> > >>>>>>
> > >>>>
> > >>>> try without
> > >>>>
> > >>>
> > >>>>>>
> > >>>>>> On May 10, 2010, at 6:59 PM, Steven Dake wrote:
> > >>>>>>
> > >>>>
> > >>>>>>>> Do you have debug: on in your config file?
> > >>>>>>>>
> > >>>>>>>> Regards
> > >>>>>>>> -steve
> > >>>>>>>>
> > >>>>>>>> On Mon, 2010-05-10 at 18:24 -0400, Vadym Chepkov wrote:
> > >>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I experienced the same issue on Redhat 5.5 PPC.
> > > >>>>>>>>>> I compiled all packages myself, since there are no ppc packages
> > > >>>>>>>>>> available in the clusterlabs repository.
> > > >>>>>>>>>> If Andrew will post his SRPM somewhere, or maybe instructions on
> > > >>>>>>>>>> how to compile it, I would be happy to contribute.
> > >>>>>>>>>>
> > >>>>>>>>>> Vadym
> > >>>>>>>>>>
> > >>>>>>>>>> On May 10, 2010, at 5:38 PM, Steven Dake wrote:
> > >>>>>>>>>>
> > >>>>>>
> > > >>>>>>>>>>>> It seems pretty clear from the recent mailing list traffic that there is a
> > > >>>>>>>>>>>> critical flaw with the shutdown, related in some way to Pacemaker and
> > > >>>>>>>>>>>> Corosync, that happens on a few people's opensuse systems. It seems to
> > > >>>>>>>>>>>> only reproduce on opensuse; however, we don't know if it is limited to
> > > >>>>>>>>>>>> this platform. Finally, we want Corosync to work perfectly for every
> > > >>>>>>>>>>>> Linux platform and will do everything possible to understand the
> > > >>>>>>>>>>>> specific environmental issues that are exposing bugs in Corosync.
> > > >>>>>>>>>>>> Unfortunately, for several weeks we have been unable to reproduce this
> > > >>>>>>>>>>>> problem in our labs, which means we need your help!
> > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The developers will work to resolve this problem at our highest priority
> > > >>>>>>>>>>>> and release a fix as soon as we can generate an adequate execution
> > > >>>>>>>>>>>> trace.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> We have a backtrace around where the issue occurred which presents us
> > > >>>>>>>>>>>> with enough data to get started.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Our plans are as follows:
> > > >>>>>>>>>>>> Mon-Wed: Code review of suspected areas and instrumentation patch
> > > >>>>>>>>>>>> created
> > > >>>>>>>>>>>> Thu: Special build created by Andrew with the instrumentation patch for
> > > >>>>>>>>>>>> those people affected by this issue.
> > > >>>>>>>>>>>> We will begin analysis of the instrumentation results once we have a
> > > >>>>>>>>>>>> trace.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I would really appreciate it if those people affected by this issue would
> > > >>>>>>>>>>>> run Andrew's special build of Corosync, which will have more trace info
> > > >>>>>>>>>>>> in it, when it is available.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards
> > >>>>>>>>>>>> -steve
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Mon, 2010-05-10 at 14:26 +0200, Alain.Moulle wrote:
> > >>>>>>>
> > > >>>>>>>>>>>>>> As soon as I get it again ... because it is strange, I have not faced
> > > >>>>>>>>>>>>>> the problem again since this morning! And besides, I'm sure that on
> > > >>>>>>>>>>>>>> Friday I was in a case where the stop/cleanup (of a resource that failed
> > > >>>>>>>>>>>>>> on start) enabled the corosync shutdown to complete, and as long as I
> > > >>>>>>>>>>>>>> had not cleaned up the failed resource, the corosync stop did not
> > > >>>>>>>>>>>>>> return and was stalled in "Waiting for corosync services to
> > > >>>>>>>>>>>>>> unload:........
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I'll keep you informed if I can find the conditions for this abnormal
> > > >>>>>>>>>>>>>> behavior.
> > >>>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>> Regards
> > >>>>>>>>>>>>>> Alain
> > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Andrew Beekhof wrote:
> > >>>>>>>>
> > > >>>>>>>>>>>>>>>> On Mon, May 10, 2010 at 8:31 AM, Alain.Moulle <[email protected]> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I meant  "/etc/init.d/corosync stop" never returns.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Ok. Can you show us the logs and "ps axf" please?
> > > _______________________________________________
> > > Openais mailing list
> > > [email protected]
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
> >
> > _______________________________________________
> > Openais mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/openais
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais