On Fri, Apr 16, 2010 at 9:28 PM, Schaefer, Diane E <diane.schae...@unisys.com> wrote:
> Hi,
>
> I have a resource (my own OCF) that can sometimes take 10 minutes to start
> after a failure, because log records need to be sync’d. I noticed that
> while the start action is being performed, if other resources in my cluster
> report “not running”, no restart is attempted until my long-running start
> returns. Meanwhile, crm_mon reports those resources as “started” even though
> they are not running, and may not be for many minutes.
Does your RA return from the start action immediately, or only after the sync is complete and the service is truly started? It _must_ only do the latter. Doing the former would explain what you're seeing.

> Is the lrm process single-threaded? Is running my resource start action
> async a better strategy? I am concerned that other critical resources will
> not be restarted in case of failures during the restart of the long-starting
> one. Is the resource state of started, not running or failed triggered by
> the result of start instead of monitor?
>
> Thanks for any information.
>
> Diane Schaefer
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
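For illustration, here is a minimal sketch of a start action that only reports success once the service is really up. It is written in Python purely as an example (RAs are often shell scripts). The names `start_and_wait`, `launch` and `monitor` are hypothetical, and the internal deadline is an assumed safety net: in practice the cluster's own timeout on the start operation is what bounds it, so that timeout would presumably need to exceed the worst-case sync time. The return codes 0, 1 and 7 are the standard OCF values for success, generic error and not running.

```python
import time
from typing import Callable

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def start_and_wait(
    launch: Callable[[], None],
    monitor: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical 'start' action: return OCF_SUCCESS only once monitor agrees.

    `launch` kicks off the service (e.g. begins the log-record sync) and may
    return before the service is usable; `monitor` is assumed to return
    OCF_NOT_RUNNING while startup is still in progress.
    """
    launch()
    deadline = clock() + timeout_s  # assumed safety net; the cluster's op timeout is the real bound
    while True:
        rc = monitor()
        if rc == OCF_SUCCESS:
            return OCF_SUCCESS  # service is truly started; safe to report "started"
        if rc != OCF_NOT_RUNNING:
            return rc  # a real failure: report it rather than keep waiting
        if clock() >= deadline:
            return OCF_ERR_GENERIC  # gave up waiting; report a failed start
        sleep(poll_interval_s)
```

Returning from start early and letting monitor catch up later is what produces the "started but not running" window described above; blocking in start until monitor succeeds keeps the reported state honest.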