Hi Andrei,

Thanks for your comment.
We are not assuming node-level fencing in the current environment.

I tried the power_timeout setting that you suggested. However, fence_mpath
returns the status "off" immediately when the off action is executed:
https://github.com/ClusterLabs/fence-agents/blob/v4.0.25/fence/agents/lib/fencing.py.py#L744
so we could not use this option to make fencing wait until IPaddr2 had been
stopped.

Reading the code, I found the power_wait option. It delays the completion of
STONITH by the specified number of seconds, so it seems to meet our
requirements.
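For reference, this is roughly how we plan to set it (only a sketch, assuming
the configuration is managed with pcs; the 120-second value is just an example
and would need to be tuned to exceed the pgsql monitor timeout):

    # power_wait makes the fence agent sleep for the given number of
    # seconds after a successful "off" before reporting completion,
    # so the surviving node does not start resources immediately.
    pcs stonith update fenceMpath-x3650e power_wait=120
    pcs stonith update fenceMpath-x3650f power_wait=120

With this delay in place, the fenced node should have time to detect the
pgsql monitor failure and stop IPaddr2 before the other node starts it.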
Thanks,
Yusuke

> -----Original Message-----
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Andrei Borzenkov
> Sent: Friday, April 06, 2018 2:04 PM
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] How can I prevent multiple start of IPaddr2 in an
> environment using fence_mpath?
>
> 06.04.2018 07:30, 飯田 雄介 wrote:
> > Hi all,
> > I am testing an environment using fence_mpath with the following settings.
> >
> > =======
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum
> > Last updated: Fri Apr 6 13:16:20 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650e x3650f ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e  (stonith:fence_mpath):  Started x3650e
> >  fenceMpath-x3650f  (stonith:fence_mpath):  Started x3650f
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmApPostgreSQLDB   (ocf::heartbeat:pgsql):       Started x3650e
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):     Started x3650e
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      Started: [ x3650e x3650f ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      Started: [ x3650e x3650f ]
> >  Clone Set: clnPing [prmPing]
> >      Started: [ x3650e x3650f ]
> > =======
> >
> > When a split-brain occurs in this environment, x3650f executes fencing and
> > the resources are started on x3650f.
> >
> > === view of x3650e ====
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:36 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e  (stonith:fence_mpath):  Started x3650e
> >  fenceMpath-x3650f  (stonith:fence_mpath):  Started[ x3650e x3650f ]
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmApPostgreSQLDB   (ocf::heartbeat:pgsql):       Started x3650e
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):     Started x3650e
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      prmDiskd1  (ocf::pacemaker:diskd):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      prmDiskd2  (ocf::pacemaker:diskd):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      prmPing  (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >
> > === view of x3650f ====
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:36 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650f ]
> > OFFLINE: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e  (stonith:fence_mpath):  Started x3650f
> >  fenceMpath-x3650f  (stonith:fence_mpath):  Started x3650f
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):  Started x3650f
> >      prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):  Started x3650f
> >      prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):  Started x3650f
> >      prmApPostgreSQLDB   (ocf::heartbeat:pgsql):       Started x3650f
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):     Started x3650f
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> > =======
> >
> > However, IPaddr2 on x3650e does not stop until a pgsql monitor error
> > occurs. Until then, IPaddr2 is temporarily started on both nodes.
> > === view of after pgsql monitor error ===
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:56 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e  (stonith:fence_mpath):  Started x3650e
> >  fenceMpath-x3650f  (stonith:fence_mpath):  Started[ x3650e x3650f ]
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):  Started x3650e
> >      prmApPostgreSQLDB   (ocf::heartbeat:pgsql):       Stopped
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):     Stopped
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      prmDiskd1  (ocf::pacemaker:diskd):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      prmDiskd2  (ocf::pacemaker:diskd):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      prmPing  (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >
> > Node Attributes:
> > * Node x3650e:
> >     + default_ping_set          : 100
> >     + diskcheck_status          : normal
> >     + diskcheck_status_internal : normal
> >
> > Migration Summary:
> > * Node x3650e:
> >    prmApPostgreSQLDB: migration-threshold=1 fail-count=1
> >        last-failure='Fri Apr 6 13:16:39 2018'
> >
> > Failed Actions:
> > * prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7): call=60,
> >     status=complete, exitreason='Configuration file
> >     /dbfp/pgdata/data/postgresql.conf doesn't exist',
> >     last-rc-change='Fri Apr 6 13:16:39 2018', queued=0ms, exec=0ms
> > ======
> >
> > We regard this behavior as a problem.
> > Is there a way to avoid this behavior?
> >
>
> Use a node-level stonith agent instead of storage resource fencing? :)
>
> Seriously, storage fencing just ensures that other node(s) cannot access
> the same resources and so damage data through uncontrolled concurrent
> access. Otherwise, the node whose resources were fenced off continues to
> run "normally".
>
> See also https://access.redhat.com/articles/3078811 for some statements
> regarding the use of storage fencing.
>
> The only workaround for a two-node cluster I can think of is to
> artificially delay stonith agent completion so that it takes longer than
> the monitor timeout. This way the node will not begin failing over
> resources until they are (hopefully) stopped on the other node. You can
> probably do it with the power_timeout property.
>
> For three or more nodes, setting no-quorum-policy=stop may work, although
> it does not solve the problem of intentional fencing of a healthy node
> (e.g. due to a resource stop failure).

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org