On Fri, 2018-04-06 at 04:30 +0000, 飯田 雄介 wrote:
> Hi, all
> I am testing an environment that uses fence_mpath, with the following
> settings.
>
> =======
> Stack: corosync
> Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum
> Last updated: Fri Apr 6 13:16:20 2018
> Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
>
> 2 nodes configured
> 13 resources configured
>
> Online: [ x3650e x3650f ]
>
> Full list of resources:
>
> fenceMpath-x3650e   (stonith:fence_mpath):  Started x3650e
> fenceMpath-x3650f   (stonith:fence_mpath):  Started x3650f
> Resource Group: grpPostgreSQLDB
>     prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmApPostgreSQLDB   (ocf::heartbeat:pgsql): Started x3650e
> Resource Group: grpPostgreSQLIP
>     prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):       Started x3650e
> Clone Set: clnDiskd1 [prmDiskd1]
>     Started: [ x3650e x3650f ]
> Clone Set: clnDiskd2 [prmDiskd2]
>     Started: [ x3650e x3650f ]
> Clone Set: clnPing [prmPing]
>     Started: [ x3650e x3650f ]
> =======
>
> When a split-brain occurs in this environment, x3650f executes fencing
> and the resources are started on x3650f.
>
> === view of x3650e ====
> Stack: corosync
> Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> Last updated: Fri Apr 6 13:16:36 2018
> Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
>
> 2 nodes configured
> 13 resources configured
>
> Node x3650f: UNCLEAN (offline)
> Online: [ x3650e ]
>
> Full list of resources:
>
> fenceMpath-x3650e   (stonith:fence_mpath):  Started x3650e
> fenceMpath-x3650f   (stonith:fence_mpath):  Started [ x3650e x3650f ]
> Resource Group: grpPostgreSQLDB
>     prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmApPostgreSQLDB   (ocf::heartbeat:pgsql): Started x3650e
> Resource Group: grpPostgreSQLIP
>     prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):       Started x3650e
> Clone Set: clnDiskd1 [prmDiskd1]
>     prmDiskd1   (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
> Clone Set: clnDiskd2 [prmDiskd2]
>     prmDiskd2   (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
> Clone Set: clnPing [prmPing]
>     prmPing     (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
>
> === view of x3650f ====
> Stack: corosync
> Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> Last updated: Fri Apr 6 13:16:36 2018
> Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
>
> 2 nodes configured
> 13 resources configured
>
> Online: [ x3650f ]
> OFFLINE: [ x3650e ]
>
> Full list of resources:
>
> fenceMpath-x3650e   (stonith:fence_mpath):  Started x3650f
> fenceMpath-x3650f   (stonith:fence_mpath):  Started x3650f
> Resource Group: grpPostgreSQLDB
>     prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):    Started x3650f
>     prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):    Started x3650f
>     prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):    Started x3650f
>     prmApPostgreSQLDB   (ocf::heartbeat:pgsql): Started x3650f
> Resource Group: grpPostgreSQLIP
>     prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):       Started x3650f
> Clone Set: clnDiskd1 [prmDiskd1]
>     Started: [ x3650f ]
>     Stopped: [ x3650e ]
> Clone Set: clnDiskd2 [prmDiskd2]
>     Started: [ x3650f ]
>     Stopped: [ x3650e ]
> Clone Set: clnPing [prmPing]
>     Started: [ x3650f ]
>     Stopped: [ x3650e ]
> =======
>
> However, IPaddr2 on x3650e does not stop until a pgsql monitor error
> occurs.
> During that time, IPaddr2 is running on both nodes.
>
> === view of after pgsql monitor error ===
> Stack: corosync
> Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> Last updated: Fri Apr 6 13:16:56 2018
> Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
>
> 2 nodes configured
> 13 resources configured
>
> Node x3650f: UNCLEAN (offline)
> Online: [ x3650e ]
>
> Full list of resources:
>
> fenceMpath-x3650e   (stonith:fence_mpath):  Started x3650e
> fenceMpath-x3650f   (stonith:fence_mpath):  Started [ x3650e x3650f ]
> Resource Group: grpPostgreSQLDB
>     prmFsPostgreSQLDB1  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB2  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmFsPostgreSQLDB3  (ocf::heartbeat:Filesystem):    Started x3650e
>     prmApPostgreSQLDB   (ocf::heartbeat:pgsql): Stopped
> Resource Group: grpPostgreSQLIP
>     prmIpPostgreSQLDB   (ocf::heartbeat:IPaddr2):       Stopped
> Clone Set: clnDiskd1 [prmDiskd1]
>     prmDiskd1   (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
> Clone Set: clnDiskd2 [prmDiskd2]
>     prmDiskd2   (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
> Clone Set: clnPing [prmPing]
>     prmPing     (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
>     Started: [ x3650e ]
>
> Node Attributes:
> * Node x3650e:
>     + default_ping_set              : 100
>     + diskcheck_status              : normal
>     + diskcheck_status_internal     : normal
>
> Migration Summary:
> * Node x3650e:
>    prmApPostgreSQLDB: migration-threshold=1 fail-count=1
>        last-failure='Fri Apr 6 13:16:39 2018'
>
> Failed Actions:
> * prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7):
>     call=60, status=complete, exitreason='Configuration file
>     /dbfp/pgdata/data/postgresql.conf doesn't exist',
>     last-rc-change='Fri Apr 6 13:16:39 2018', queued=0ms, exec=0ms
> ======
>
> We regard this behavior as a problem.
> Is there a way to avoid it?
>
> Regards, Yusuke
Hi Yusuke,

One possibility would be to implement network fabric fencing as well, e.g.
fence_snmp with an SNMP-capable network switch. You can make a fencing
topology level with both the storage and network devices.

The main drawback is that unfencing isn't automatic: after a fenced node is
ready to rejoin, you have to clear the block at the switch yourself.
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
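
As a rough sketch of the suggestion above, assuming pcs and an
SNMP-manageable switch: the switch address, community string, port names,
and the fenceSnmp-* resource names below are made up for illustration, and
fence_ifmib is used here only as one common SNMP switch agent (parameter
names vary by agent and fence-agents version). Only the fenceMpath-* names
come from the original post.

    # Hypothetical fabric (switch-port) fencing devices, one per node
    pcs stonith create fenceSnmp-x3650e fence_ifmib \
        ipaddr=switch1 community=private port=Gi1/0/1 \
        pcmk_host_list=x3650e
    pcs stonith create fenceSnmp-x3650f fence_ifmib \
        ipaddr=switch1 community=private port=Gi1/0/2 \
        pcmk_host_list=x3650f

    # Put each node's storage and fabric devices in the same topology
    # level, so fencing a node cuts both its disk access and its
    # network access
    pcs stonith level add 1 x3650e fenceMpath-x3650e,fenceSnmp-x3650e
    pcs stonith level add 1 x3650f fenceMpath-x3650f,fenceSnmp-x3650f

When the fenced node is ready to rejoin, the switch port would then have to
be re-enabled by hand, for example by running the fabric agent with its
"on" action (again, option names depend on the agent):

    fence_ifmib -a switch1 -c private -n Gi1/0/1 -o on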