On Mon, Nov 29, 2010 at 5:37 PM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> Hi,
>
> On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:
>> Was: Re: [Pacemaker] crm resource restart doesn't restart the correct
>> resource
>>
>> Quoting Dejan Muhamedagic <deja...@fastmail.fm>:
>>
>>>> This is happening because, when the clone is created,
>>>> pacemaker stops the primitive but does not wait for the stop action
>>>> to return, and just starts the primitive over. And that of course
>>>> causes problems.
>>>
>>> Hmm, don't quite understand what is going on. Is that primitive
>>> part of the group? Can you describe in more detail what is going
>>> on.
>>
>> I have a group (grp_fs) consisting of an LVM and several Filesystem
>> resources, in that order. That group is started and all resources are
>> running. Now I clone this group by issuing:
>>
>> crm configure clone clo_fs grp_fs
>>
>> That stops all resources and starts them again as a clone. But
>> Pacemaker does not seem to wait until the stop action has finished. I
>> have modified the LVM RA to log the action command issued to the agent
>> and the value returned by the agent:
>>
>> 14:24:11 [ 14495 ] Action: start
>> 14:24:11 [ 14494 ] Action: stop
>> 14:24:13 [ 14494 ] RC: 1
>> 14:24:14 [ 14495 ] RC: 0
>> 14:24:14 [ 14599 ] Action: monitor
>> 14:24:14 [ 14599 ] RC: 0
>>
>> In brackets you see the PID. As can be seen, Pacemaker first issues a
>> start command and then immediately a stop, without waiting for
>> the first command to return. That produces an orphaned resource, which
>> means the state of the LVM resource (which is now cloned) is
>> uncertain: it may start, but it may also fail.
>
> I see. The problem here is that as far as the cluster's
> concerned, the new resources and the old resources are
> unrelated: they have different names (before it was say lvm1 and
> now it's lvm1:0).
> I'm not sure if the crmd/pengine can tell if
> the resources of the group which are running actually belong to
> the cloned group as well. Andrew?

We'll find it if it's an anonymous clone (thanks to the initial monitor op).
Although things might be a bit confusing for a while, since we'll
probably try to stop it under the "old" name (which would cause any
recurring monitor ops for the "new" name to fail).

> If not, then we'll have to
> forbid creating a clone of running resources in the shell.

Might be the best option.

> Thanks,
>
> Dejan
>
>> --
>> Uwe Grawert
>> Linux / Unix Consultant & Trainer
>> Tel.: +49 151 12051100
>> Mail: graw...@b1-systems.de
>>
>> B1 Systems GmbH
>> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
>> Managing Director: Ralph Dehner / Registered office: Vohburg / District court: Ingolstadt, HRB 3537
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
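
For anyone wanting to check a longer RA log for the same symptom, the overlap Uwe describes (one action invoked before the previous one has returned) can be detected mechanically. This is only a minimal sketch, assuming the log format shown in his message (`HH:MM:SS [ PID ] Action: ...` and `HH:MM:SS [ PID ] RC: ...`); that format comes from his locally modified LVM RA, not from Pacemaker itself, and the name `find_overlaps` is hypothetical.

```python
import re

# Line format assumed from the modified-RA log quoted in this thread:
#   "HH:MM:SS [ PID ] Action: <name>"  or  "HH:MM:SS [ PID ] RC: <code>"
LINE_RE = re.compile(r"(\d\d:\d\d:\d\d) \[ (\d+) \] (Action|RC): (\S+)")

LOG = """\
14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0
"""


def find_overlaps(log_text):
    """Return (running_action, new_action) pairs where new_action was
    invoked while running_action had not yet logged its return code."""
    running = {}   # pid -> action name, for invocations with no RC line yet
    overlaps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not in the assumed format
        _time, pid, kind, value = m.groups()
        if kind == "Action":
            # Every action still running at this point overlaps the new one.
            overlaps.extend((other, value) for other in running.values())
            running[pid] = value
        else:
            # An RC line means the action run by this PID has returned.
            running.pop(pid, None)
    return overlaps


print(find_overlaps(LOG))
```

On the log from this thread it reports a single pair, the start that was still running when the stop was invoked. A log in which every action returns before the next one begins yields an empty list.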