For Groovy:
# fence_mpath
node 1: clusterg01
node 2: clusterg02
node 3: clusterg03
primitive fence-mpath-clusterg01 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg01 pcmk_monitor_action=metadata pcmk_ \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg02 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg02 pcmk_monitor_action=metadata pcmk_ \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg03 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg03 pcmk_monitor_action=metadata pcmk_ \
        meta provides=unfencing target-role=Started
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=2.0.3-4b1f869f0f \
cluster-infrastructure=corosync \
cluster-name=clusterg \
stonith-enabled=true \
no-quorum-policy=stop \
last-lrm-refresh=1590773755
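As a side note, a quick way to confirm that the three stonith devices were
registered with the fencer after loading a configuration like the one above
(a sketch; the crmsh file name fence-mpath.crm is only an assumption):
# load the configuration from a file (hypothetical name) and then list the
# fencing devices pacemaker has registered
$ sudo crm configure load update fence-mpath.crm
$ sudo stonith_admin --list-registered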
--
$ crm status
Cluster Summary:
* Stack: corosync
* Current DC: clusterg01 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Mon Jun 1 04:17:28 2020
* Last change: Mon Jun 1 04:07:10 2020 by root via cibadmin on clusterg03
* 3 nodes configured
* 3 resource instances configured
Node List:
* Online: [ clusterg01 clusterg02 clusterg03 ]
Full List of Resources:
* fence-mpath-clusterg01 (stonith:fence_mpath): Started clusterg01
* fence-mpath-clusterg02 (stonith:fence_mpath): Started clusterg02
* fence-mpath-clusterg03 (stonith:fence_mpath): Started clusterg03
--
(k)rafaeldtinoco@clusterg02:~$ sudo mpathpersist --in -r /dev/mapper/volume01
PR generation=0x11, Reservation follows:
Key = 0x59450001
scope = LU_SCOPE, type = Write Exclusive, registrants only
(k)rafaeldtinoco@clusterg02:~$ sudo mpathpersist --in -k /dev/mapper/volume01
PR generation=0x11, 12 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450000
0x59450000
0x59450000
0x59450000
0x59450002
0x59450002
0x59450002
0x59450002
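For context, each node registers its key on every path of the multipath map,
so the 12 registrations above are consistent with 3 nodes and 4 paths each
(0x59450000 being clusterg01's key, as the later outputs show). The path
count can be cross-checked with something like:
# list the paths backing the map; expect one registration per node per path
$ sudo multipath -ll volume01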
-- after cutting communication between clusterg01 and all the other nodes:
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
PR generation=0x12, 8 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
PR generation=0x12, Reservation follows:
Key = 0x59450001
scope = LU_SCOPE, type = Write Exclusive, registrants only
and the cluster status now shows:
Node List:
* Node clusterg01: UNCLEAN (offline)
* Online: [ clusterg02 clusterg03 ]
Full List of Resources:
* fence-mpath-clusterg01 (stonith:fence_mpath): Started [ clusterg01 clusterg02 ]
* fence-mpath-clusterg02 (stonith:fence_mpath): Started clusterg03
* fence-mpath-clusterg03 (stonith:fence_mpath): Started clusterg03
Pending Fencing Actions:
* reboot of clusterg01 pending: client=pacemaker-controld.906, origin=clusterg02
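As the key listing above shows, clusterg01's registrations (0x59450000) had
already been removed from the device at this point, which is what the
fence_mpath "off" action does. The manual equivalent, and a way to inspect
the fencing history from a surviving node, would be roughly (a sketch;
normally the agent, not the admin, removes the key):
# what the agent's "off" action amounts to for clusterg01's key 0x59450000
$ sudo fence_mpath -v -d /dev/mapper/volume01 -n 59450000 -o off
# fencing operations recorded for clusterg01 (should match the pending reboot above)
$ sudo stonith_admin --history clusterg01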
The watchdog on host clusterg01 then rebooted it. After the reboot, only a
single path had its key registered again, not all of them:
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
PR generation=0x13, 9 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
0x59450000
I had to stop the "fence-mpath-clusterg01" fence agent and restore all
registrations manually with:
(k)rafaeldtinoco@clusterg01:~$ sudo fence_mpath -v -d /dev/mapper/volume01 -n 59450000 -o on
and then start the "fence-mpath-clusterg01" resource again. This is the
problem with automatic recovery on multipathed devices: sometimes it is
better to require manual intervention only and keep the faulty node fenced
until you reboot it and fix its reservations by hand.
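To make that recovery less manual, my understanding (an assumption, not
something verified in this bug) is that multipath-tools can track the
reservation key itself when "reservation_key file" is set in
/etc/multipath.conf, so that multipathd re-registers the key on paths that
come back. Combined with restarting the fence device, the recovery would
look roughly like:
# /etc/multipath.conf excerpt (assumption: supported by the multipath-tools
# version in use; requires a multipathd reconfigure)
defaults {
        reservation_key file
}
# after re-registering the key on clusterg01, verify and re-enable the device
$ sudo mpathpersist --in -k /dev/mapper/volume01
$ sudo crm resource start fence-mpath-clusterg01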
** Description changed:
- Whenever trying to configure fence_scsi using Ubuntu Bionic the
- following error happens:
+ This bug's intent is to check that the fence_scsi and fence_mpath agents
+ work in all supported Ubuntu versions. This is needed because both
+ agents are error-prone and, depending on how they are configured, a wide
+ range of failures can occur.
+
+ # fence-agents:
+
+ Both agents, fence_scsi and fence_mpath, are prone to errors.
+
+ ## fence_scsi:
+
+ You may find the following cluster resource manager errors:
Failed Actions:
- * fence_clubionicpriv01_start_0 on clubionic01 'unknown error' (1): call=8, status=Error, exitreason='',
- last-rc-change='Mon Feb 24 03:20:28 2020', queued=0ms, exec=1132ms
+ * fence_bionic_start_0 on clubionic01 'unknown error' (1): call=8, status=Error, exitreason='', last-rc-change='Mon Feb 24 03:20:28 2020', queued=0ms, exec=1132ms
And the logs show:
Feb 24 03:20:31 clubionic02 fence_scsi[14072]: Failed: Cannot open file "/var/run/cluster/fence_scsi.key"
Feb 24 03:20:31 clubionic02 fence_scsi[14072]: Please use '-h' for usage
- That happens because the key to be used by fence_scsi agent does not
- exist.
+ The fence_scsi agent is responsible for creating those files on the fly
+ and this error might be related to how the fence agent was configured in
+ pacemaker.
- The fence agent is responsible for creating those files on the fly and
- this error might be related to how the fence agent was configured in
- pacemaker.
+ ## fence_mpath:
+
+ You may find it very difficult to configure fence_mpath to work
+ flawlessly; try following the comments in this bug.
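For comparison with the fence_mpath primitives earlier in this comment, a
fence_scsi primitive that lets the agent create
/var/run/cluster/fence_scsi.key on the fly might look roughly like this (a
sketch only; the device path and host names are illustrative and the exact
parameter set has not been validated here):
primitive fence-scsi stonith:fence_scsi \
        params devices=/dev/mapper/volume01 pcmk_host_list="clubionic01 clubionic02 clubionic03" \
        pcmk_monitor_action=metadata pcmk_reboot_action=off \
        meta provides=unfencing target-role=Started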
** Summary changed:
- fence_scsi cannot open /var/run/cluster/fence_scsi.key (does not exist) after nodes are rebooted
+ fence_scsi and fence_mpath configuration issues (e.g. /var/run/cluster/fence_scsi.key)