Public bug reported:

Note: this issue was originally tracked in LP: #1865523 but has been split out into its own bug.

#### SRU: pacemaker

[Impact]

 * fence_scsi is not currently working in a shared-disk environment

 * all clusters relying on fence_scsi (with or without a watchdog) will
be unable to start the fencing agent or, in the worst case, the
fence_scsi agent might start but will not place SCSI reservations on the
shared SCSI disk.

 * this bug takes care of the pacemaker 1.1.18 issues with fence_scsi,
since the latter was already fixed in LP: #1865523.

[Test Case]

 * on a 3-node setup (nodes clubionic01, clubionic02 and clubionic03)
with a shared SCSI disk /dev/sda that fully supports persistent
reservations, and with corosync and pacemaker operational and running,
one might try:

rafaeldtinoco@clubionic01:~$ crm configure
crm(live)configure# property stonith-enabled=on
crm(live)configure# property stonith-action=off
crm(live)configure# property no-quorum-policy=stop
crm(live)configure# property have-watchdog=true
crm(live)configure# commit
crm(live)configure# end
crm(live)# end

rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
    stonith:fence_scsi params \
    pcmk_host_list="clubionic01 clubionic02 clubionic03" \
    devices="/dev/sda" \
    meta provides=unfencing

and observe the following errors:

Failed Actions:
* fence_clubionic_start_0 on clubionic02 'unknown error' (1): call=6, status=Error, exitreason='',
    last-rc-change='Wed Mar  4 19:53:12 2020', queued=0ms, exec=1105ms
* fence_clubionic_start_0 on clubionic03 'unknown error' (1): call=6, status=Error, exitreason='',
    last-rc-change='Wed Mar  4 19:53:13 2020', queued=0ms, exec=1109ms
* fence_clubionic_start_0 on clubionic01 'unknown error' (1): call=6, status=Error, exitreason='',
    last-rc-change='Wed Mar  4 19:53:11 2020', queued=0ms, exec=1108ms

and corosync.log will show:

warning: unpack_rsc_op_failure: Processing failed op start for fence_clubionic on clubionic01: unknown error (1)
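
As a sanity check once a fixed pacemaker is in place (a rough sketch, assuming the same shared disk /dev/sda as above), sg_persist should report one registered key per cluster node and a reservation held by one of them, instead of the empty output shown under [Other Info] below:

$ sudo sg_persist --in --read-keys --device=/dev/sda   # expect one key per node
$ sudo sg_persist -r /dev/sda                          # expect a held reservation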

[Regression Potential]

 * LP: #1865523 shows fence_scsi fully operational after the SRU for
that bug is applied.

 * LP: #1865523 used pacemaker 1.1.19 (vanilla) in order to fix
fence_scsi.

 * TODO

[Other Info]

 * Original Description:

While setting up a cluster with an iSCSI shared disk, using fence_scsi
as the fencing mechanism, I realized that fence_scsi is not working on
Ubuntu Bionic. I first thought it was related to the Azure environment (LP:
#1864419), where I was initially testing, but after reproducing the setup
locally I figured out that somehow pacemaker 1.1.18 is not fencing the
shared SCSI disk properly.

Note: I was able to "backport" vanilla 1.1.19 from upstream and
fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
and it did not work either. Bisecting between 1.1.18 and 1.1.19
should tell us which commit fixed the behaviour needed by the
fence_scsi agent.
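
A possible bisect workflow (a sketch only, assuming the upstream ClusterLabs git repository and its Pacemaker-1.1.18 / Pacemaker-1.1.19 release tags; each step requires rebuilding, installing, and re-running the [Test Case] above):

$ git clone https://github.com/ClusterLabs/pacemaker.git && cd pacemaker
$ git bisect start --term-old=broken --term-new=fixed
$ git bisect fixed Pacemaker-1.1.19
$ git bisect broken Pacemaker-1.1.18
# after testing each commit git suggests, mark the result:
$ git bisect fixed     # fence_scsi starts and registers keys
$ git bisect broken    # start fails as described above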

(k)rafaeldtinoco@clubionic01:~$ crm conf show
node 1: clubionic01.private
node 2: clubionic02.private
node 3: clubionic03.private
primitive fence_clubionic stonith:fence_scsi \
        params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
        meta provides=unfencing
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.18-2b07d5c5a9 \
        cluster-infrastructure=corosync \
        cluster-name=clubionic \
        stonith-enabled=on \
        stonith-action=off \
        no-quorum-policy=stop \
        symmetric-cluster=true

----

(k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1
Stack: corosync
Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Mar 2 15:55:30 2020
Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private

3 nodes configured
1 resource configured

Online: [ clubionic01.private clubionic02.private clubionic03.private ]

Active resources:

 fence_clubionic (stonith:fence_scsi): Started clubionic01.private

----

(k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
  LIO-ORG cluster.bionic. 4.0
  Peripheral device type: disk
  PR generation=0x0, there are NO registered reservation keys

(k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda
  LIO-ORG cluster.bionic. 4.0
  Peripheral device type: disk
  PR generation=0x0, there is NO reservation held

** Affects: pacemaker (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: pacemaker (Ubuntu Bionic)
     Importance: Undecided
         Status: In Progress

** Also affects: pacemaker (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: pacemaker (Ubuntu)
       Status: New => Fix Released

** Changed in: pacemaker (Ubuntu Bionic)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to pacemaker in Ubuntu.
https://bugs.launchpad.net/bugs/1866119

Title:
  [bionic] fence_scsi not working properly with 1.1.18-2ubuntu1.1

Status in pacemaker package in Ubuntu:
  Fix Released
Status in pacemaker source package in Bionic:
  In Progress
