On 9/30/19 6:45 PM, Lentes, Bernd wrote:
>>> Hi Yan,
>>>
>>> I had a look in the logs, and what happened when I issued a "resource
>>> cleanup" of the GFS2 resource is that the cluster deleted an entry in
>>> the status section:
>>>
>>> Sep 26 14:52:52 [9317] ha-idg-2 cib: info: cib_process_request:
>>>     Completed cib_delete operation for section
>>>     //node_state[@uname='ha-idg-1']//lrm_resource[@id='dlm']: OK (rc=0,  <====
>>>
>>> and soon afterwards it recognized dlm on ha-idg-1 as stopped (or stopped it):
>>>
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>>     dlm (ocf::pacemaker:controld): Stopped  <====
>>> Sep 26 14:52:54 [9321] ha-idg-2 pengine: info: common_print:
>>>     clvmd (ocf::heartbeat:clvm): Started ha-idg-1
>>>
>>> Following the logs, dlm was running before. Does the deletion of that
>>> entry lead to the stop of the dlm resource?
>>> Is that expected behaviour?
>>
>> First, unless "force" is specified, cleanup issued for a child resource
>> will do the work for the whole resource group.
>
> Ah. Then I will use "force" in the future when I just want to do a
> "resource cleanup" for one resource in a group.
> But is the initial deletion of the dlm entry from the status section the
> expected behaviour when I do a "resource cleanup"?
> Is it because dlm is the first resource in that group?
> Sorry for insisting, but I'm interested in really understanding what was
> going on.
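For reference, the distinction looks roughly like this on the command
line (a sketch assuming the crm_resource tool of the pacemaker 1.1
series; "dlm" is the resource id from the logs above):

    # Without --force, cleaning up one member of a resource group
    # cleans up the whole group:
    crm_resource --cleanup --resource dlm

    # With --force, only the named resource is cleaned up:
    crm_resource --cleanup --resource dlm --force

If you go through crmsh's "crm resource cleanup" instead, check whether
your crmsh version exposes a matching force argument.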
It's supposed to be a feature (arguable :-)) of "crm_resource -C" that it
intelligently cleans up "relevant" resources all together at once. The
behavior/idea of cleanup makes more sense in pacemaker-2.0 (the SLE-HA 15
releases): there it does a *real* cleanup only if a resource has any
failures.

>> Cleanup deletes resources' history, which triggers a (re-)probe of the
>> resources. But before the probe of a resource has finished, the
>> resource will be shown as "Stopped", which doesn't necessarily mean
>> it's actually "Stopped". A running resource will be detected as
>> "Started" by the probe.
>
> Deleting history means resetting fail-count and last-failure?

Fail-count yes, and also all the recorded historical operations of the
resource in the cib status section, rather than just the last failure.
(A sketch of how to inspect that section follows at the end of this
message.)

>> The restart of the VM happened because pengine/crmd thought the
>> resources it depended on were really "Stopped" and wasn't patient
>> enough to wait for the probe of them to finish. That's what the pull
>> request resolved.
>
> I installed it. Is there a way to test it?

Simply clean up any resource, like gfs2, from the resource group that the
VM depends on, as you did before, and see whether the VM remains
untouched (also sketched below).

Regards,
Yan

> Thanks.
>
> Bernd
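To watch what cleanup deletes and how the subsequent probe settles, the
status section can be inspected directly (a sketch; the node and
resource names are the ones from the logs in this thread):

    # Watch resource states live; a cleaned-up resource shows as
    # "Stopped" until its probe has completed:
    crm_mon -r

    # Query the per-node operation history that cleanup deletes, using
    # the same xpath that appeared in the cib_delete log message:
    cibadmin --query --xpath \
        "//node_state[@uname='ha-idg-1']//lrm_resource[@id='dlm']"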

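A possible way to run that test (a sketch; "vm_xyz" is a placeholder for
the actual VM resource id, which isn't named in this thread):

    # Terminal 1: watch the cluster while the test runs:
    crm_mon -r

    # Terminal 2: clean up a resource from the group the VM depends on:
    crm_resource --cleanup --resource gfs2

    # With the fix applied, the VM should stay put while dlm, clvmd and
    # gfs2 are re-probed; verify where it is running:
    crm_resource --locate --resource vm_xyz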