Hi,

On Tue, Mar 30, 2010 at 11:43:22AM +0200, Colin wrote:
> Hi All,
>
> we are running Corosync 1.2.0-0ubuntu1 on Ubuntu 10.4 beta w/current
> updates; the cluster consists of two systems running in KVM, each on a
> dedicated host.
>
> We have observed several times, but are unfortunately unable to nail
> the exact cause, that when the virtualised system that is running
> corosync has a "hiccup", i.e. hangs for couple of seconds when we
> introduce a delay into its storage access, then the corosync process
> enters an endless loop from which it doesn't ever seem to recover.
>
> In this endless loop the process uses 193% CPU in the 2-CPU
> virtualised system, and is issuing a stream of wait4() system-calls
> (with an occasional nanosleep() and some futex-stuff).
>
> ...?

It'd be good to kill -ABRT the process and then get the backtrace
with gdb. If you're running pacemaker, there's hb_report to collect
all relevant information (including the backtraces). Make sure that
coredumps are allowed and install the packages which contain the
debugging information.

Thanks,

Dejan

> Thanks, Colin
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
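
P.S. In case it helps, here is a rough sketch of the steps suggested above
(allow coredumps, kill -ABRT, then gdb), written as a small Python helper.
It is only an illustration under a few assumptions: it is Linux-only
(resource.prlimit), the function names are made up for this example, and
the binary path and core file name you pass to gdb depend on your
installation and on the kernel's core_pattern setting. Note that a core
limit set in your shell does not affect a corosync that is already
running, which is why the sketch adjusts the limit of the running process.
A hard limit of 0 can only be raised by root.

```python
import os
import resource
import signal


def allow_core_dump(pid):
    """Raise the soft core-size limit of a running process to its hard limit.

    Linux-only. Returns the resulting (soft, hard) pair so the caller can
    see whether a core dump is actually possible (hard == 0 means it is not).
    """
    _soft, hard = resource.prlimit(pid, resource.RLIMIT_CORE)
    resource.prlimit(pid, resource.RLIMIT_CORE, (hard, hard))
    return resource.prlimit(pid, resource.RLIMIT_CORE)


def abort_process(pid):
    """Same as `kill -ABRT <pid>`: by default the process dies and dumps core."""
    os.kill(pid, signal.SIGABRT)


def gdb_backtrace_command(binary, core_file):
    """Build a non-interactive gdb command line that prints all thread backtraces.

    Both arguments are supplied by the caller; nothing here guesses where
    the corosync binary or the core file live.
    """
    return ["gdb", "--batch", "-ex", "thread apply all bt full", binary, core_file]
```

Typical use would be allow_core_dump(pid), then abort_process(pid), then
running the command from gdb_backtrace_command() with subprocess.run()
once the core file has been written. The backtraces are only readable if
the debugging-information packages are installed.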
