On 23/05/2013, at 8:52 PM, Alexandr A. Alexandrov <shurr...@gmail.com> wrote:
> Hi, All!
>
> On one of my clusters I have resource groups; the second group depends on the first
> resource in the first group. Today I needed to restart one service from the
> first group (no dependencies other than the group), so I made it unmanaged:
>
> May 23 14:14:22 kennedy cib[20888]: notice: cib:diff: --
> <nvpair value="true" id="wcs_wcsd-meta_attributes-is-managed" />
> May 23 14:14:22 kennedy cib[20888]: notice: cib:diff: ++
> <nvpair id="wcs_wcsd-meta_attributes-is-managed" name="is-manage"
> value="false" />
>
> I made sure that the resource is "unmanaged" in crm_mon. After that I stopped
> the resource.
> However, after that the monitor operation was performed and the resource was
> marked as failed, and both groups got stopped! Well, the second group got
> stopped because of the dependency, but why was the first group stopped because of
> the failure of an unmanaged resource in the first place?

Did you set is-managed=false for the group or for a resource in the group?

I'm assuming the latter - basically the cluster noticed your resource was not running anymore. While it did not try to do anything to fix that resource, it did stop anything that needed it. Then when the resource came back, it was able to start the dependencies again.

A better approach would have been to disable the recurring monitor - then the cluster wouldn't have noticed the resource was restarted. Well, unless the dependencies noticed something they needed wasn't there and failed themselves.
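For what it's worth, a recurring monitor can be disabled by setting enabled="false" on the operation in the CIB. A sketch of what the op definition for this resource might look like, based on the 15-second monitor interval in your logs (the op id is illustrative, not taken from your configuration):

```
<primitive id="wcs_wcsd" ...>
  <operations>
    <!-- enabled="false" stops the cluster from scheduling this recurring monitor -->
    <op id="wcs_wcsd-monitor-15000" name="monitor" interval="15000ms" enabled="false"/>
  </operations>
</primitive>
```

Re-enable it (or remove the attribute) once you're done with the manual restart, or the cluster will keep ignoring the resource's actual state.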
> May 23 14:16:32 kennedy crmd[1787]: notice: process_lrm_event: LRM
> operation wcs_wcsd_monitor_15000 (call=832, rc=7, cib-update=668,
> confirmed=false) not running
> May 23 14:16:32 kennedy crmd[1787]: warning: update_failcount: Updating
> failcount for wcs_wcsd on kennedy after failed monitor: rc=7 (update=value++,
> time=1369304192)
> May 23 14:16:32 kennedy crmd[1787]: notice: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> May 23 14:16:32 kennedy attrd[20891]: notice: attrd_trigger_update: Sending
> flush op to all hosts for: fail-count-wcs_wcsd (1)
> May 23 14:16:32 kennedy pengine[1783]: notice: unpack_config: On loss of
> CCM Quorum: Ignore
> May 23 14:16:32 kennedy pengine[1783]: warning: unpack_rsc_op: Processing
> failed op monitor for wcs_wcsd on kennedy: not running (7)
> May 23 14:16:32 kennedy attrd[20891]: notice: attrd_perform_update: Sent
> update 230: fail-count-wcs_wcsd=1
>
> Is this a bug, or expected behaviour, or did I miss something from the
> documentation?
>
> Thanks in advance,
> Alexandr
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org