On Fri, Apr 16, 2010 at 02:28:26PM -0500, Schaefer, Diane E wrote:
> Hi,
> I have a resource that sometimes can take 10 minutes to start after
> a failure due to log records that need to be sync'd. (my own OCF)
>
> I noticed while the start action was being performed, if other
> resources in my cluster report a "not running", no restart will be
> attempted until my long running started resource returns.
>
> Meanwhile, the crm_mon reports the resources as "started"
> even though they are not running, and may not be for many minutes.
> Is the lrm process single threaded?

You are saying that while your RA starts (with a long start timeout),
and the start action is not yet complete, other _independent_
resources are not yet started, but crm_mon thinks they are running
already, even though "something" (what?) reports "not running" for those?

I think you lost me ;)

Please show the output of "crm configure show".

Can you reproduce this easily?
Can you reproduce this with just a few "Dummy" resources?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.