The bad news: running without debug didn't help; I'm still observing the same issue. The good news: it's 100% reproducible.
Vadym

On May 10, 2010, at 7:19 PM, Steven Dake wrote:

> On Mon, 2010-05-10 at 19:02 -0400, Vadym Chepkov wrote:
>> Yes, I am
>>
> try without
>
>>
>> On May 10, 2010, at 6:59 PM, Steven Dake wrote:
>>
>>> Do you have debug: on in your config file?
>>>
>>> Regards
>>> -steve
>>>
>>> On Mon, 2010-05-10 at 18:24 -0400, Vadym Chepkov wrote:
>>>> Hi,
>>>>
>>>> I experienced the same issue on Redhat 5.5 PPC.
>>>> I compiled all packages myself, since there are no ppc packages available
>>>> in the clusterlabs repository.
>>>> If Andrew will post his SRPM somewhere or maybe instructions how to
>>>> compile it, I would be happy to contribute.
>>>>
>>>> Vadym
>>>>
>>>> On May 10, 2010, at 5:38 PM, Steven Dake wrote:
>>>>
>>>>> It seems pretty clear from the mailing list traffic recently there is a
>>>>> critical flaw with the shutdown related in some way to Pacemaker and
>>>>> Corosync that happens on a few people's opensuse systems. It seems to
>>>>> only reproduce on opensuse however we don't know if it is limited to
>>>>> this platform. Finally we want Corosync to work perfectly for every
>>>>> Linux platform and will do everything possible to understand the
>>>>> specific environmental issues that are exposing bugs in Corosync.
>>>>> Unfortunately for several weeks we have been unable in our labs to
>>>>> reproduce this problem which means we need your help!
>>>>>
>>>>> The developers will work to resolve this problem at our highest priority
>>>>> and release a fix as soon as we can generate an adequate execution
>>>>> trace.
>>>>>
>>>>> We have a backtrace around where the issue occurred which presents us
>>>>> with enough data to get started.
>>>>>
>>>>> Our plans are as follows:
>>>>> Mon-Wed: Code review of suspected areas and instrumentation patch
>>>>> created
>>>>> Thu: Special build created by Andrew with the instrumentation patch for
>>>>> those people affected by this issue.
>>>>> We will begin analysis of the instrumentation results once we have a
>>>>> trace.
>>>>>
>>>>> I would really appreciate those people affected by this issue to run
>>>>> Andrew's special build of Corosync which will have more trace info in it
>>>>> when it is available.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>> On Mon, 2010-05-10 at 14:26 +0200, Alain.Moulle wrote:
>>>>>> As soon as I got it again ... because it is strange, I did not face
>>>>>> the problem again since this morning ! And besides I'm sure that on
>>>>>> Friday I was in a case where the stop/cleanup (of a resource failed
>>>>>> on start) enables the corosync shutdown to complete , and as long as
>>>>>> I had not cleanup the failed resource, the corosync stop does not
>>>>>> returns and was stalled in "Waiting for corosync services to
>>>>>> unload:........
>>>>>>
>>>>>> I'll keep you inform if I can find the conditions for this abnormal
>>>>>> behavior.
>>>>>> Thanks
>>>>>> Regards
>>>>>> Alain
>>>>>>
>>>>>> Andrew Beekhof wrote:
>>>>>>> On Mon, May 10, 2010 at 8:31 AM, Alain.Moulle <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I meant "/etc/init.d/corosync stop" never returns.
>>>>>>>>
>>>>>>>
>>>>>>> Ok. Can you show us the logs and "ps axf" please?
>>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Openais mailing list
>>>>> [email protected]
>>>>> https://lists.linux-foundation.org/mailman/listinfo/openais
