** Description changed:

  OBS: I have split this bug into 2 bugs:
-      - fence-agents (this) and pacemaker (LP: #1866119)
+      - fence-agents (this) and pacemaker (LP: #1866119)
  
  #### SRU: fence-agents
  
  [Impact]
  
   * fence_scsi is not currently working in a shared disk environment
  
   * all clusters relying on fence_scsi and/or fence_scsi + watchdog won't
  be able to start the fencing agents OR, in the worst case, the
  fence_scsi agent might start but won't make scsi reservations on the
  shared scsi disk.
  
  [Test Case]
  
   * having a 3-node setup, nodes called "clubionic01, clubionic02,
  clubionic03", with a shared scsi disk (fully supporting persistent
  reservations) /dev/sda, one might try the following command:
  
  sudo fence_scsi --verbose -n clubionic01 -d /dev/sda -k 3abe0000 -o off
  
  from nodes "clubionic02 or clubionic03" and check if the reservation
  worked:
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
    LIO-ORG   cluster.bionic.   4.0
    Peripheral device type: disk
    PR generation=0x0, there are NO registered reservation keys
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda
    LIO-ORG   cluster.bionic.   4.0
    Peripheral device type: disk
    PR generation=0x0, there is NO reservation held
  
   * having a 3-node setup, nodes called "clubionic01, clubionic02,
  clubionic03", with a shared scsi disk (fully supporting persistent
  reservations) /dev/sda, with corosync and pacemaker operational and
  running, one might try:
  
  rafaeldtinoco@clubionic01:~$ crm configure
  crm(live)configure# property stonith-enabled=on
  crm(live)configure# property stonith-action=off
  crm(live)configure# property no-quorum-policy=stop
  crm(live)configure# property have-watchdog=true
  crm(live)configure# property symmetric-cluster=true
  crm(live)configure# commit
  crm(live)configure# end
  crm(live)# end
  
  rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \
      stonith:fence_scsi params \
      pcmk_host_list="clubionic01 clubionic02 clubionic03" \
      devices="/dev/sda" \
      meta provides=unfencing
  
  And see that crm_mon won't show the fence_clubionic resource as operational.
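
  The two checks used in this test case (registered keys on the shared
  disk, and the state of the stonith resource) could be scripted roughly
  as below. This is only a sketch: the helper names pr_keys_registered
  and resource_started are hypothetical, and the matching relies solely
  on the strings visible in the sg_persist and crm_mon output quoted in
  this report.

```shell
# Sketch: decide from `sg_persist --in --read-keys` output (on stdin)
# whether any reservation key is registered.
# pr_keys_registered is a hypothetical helper name.
# Returns 0 = keys registered, 1 = no keys, 2 = output not recognized.
pr_keys_registered() {
    output=$(cat)
    # Require a "PR generation=" line so that empty or failed output is
    # not mistaken for success.
    printf '%s\n' "$output" | grep 'PR generation=' >/dev/null || return 2
    # This is the message sg_persist printed in the failing case.
    if printf '%s\n' "$output" | grep 'NO registered reservation keys' >/dev/null; then
        return 1
    fi
    return 0
}

# Sketch: check `crm_mon -1` output (on stdin) for a started resource.
# resource_started is a hypothetical helper name.
resource_started() {
    name=$1
    # Match a line that begins with the resource name and reports "Started".
    grep -E "^[[:space:]]*${name}[[:space:]].*Started" >/dev/null
}

# Usage on a cluster node (needs sg3-utils, pacemaker and a real device):
#   sudo sg_persist --in --read-keys --device=/dev/sda | pr_keys_registered \
#       && echo "keys registered" || echo "no keys registered"
#   sudo crm_mon -1 | resource_started fence_clubionic \
#       && echo "fence_clubionic started" || echo "fence_clubionic not started"
```

  With the broken package both checks are expected to report failure;
  with the fixed one both should succeed.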
  
  [Regression Potential]
+ 
+  * Fix involves adding new cmdline and stdin arguments to the fencing
+ agents. Both changes in that direction (normalizing "-" with "_" and
+ deprecating some commands in favor of others) keep the existing commands
+ working and allow the new commands to work as well (that part is the
+ fix, because of the integration with pacemaker).
  
   * Comments #3 and #4 show this new version fully working.
  
   * This fix has the potential to break other currently working fencing
  agents. If that happens, I suggest that those affected revert to the
  previous package AND open a bug against pacemaker and/or
  fence-agents.
  
   * Judging by this issue, it is very likely that any Ubuntu user who
  has tried using fence_scsi has migrated to a newer version, because
  the fence_scsi agent has been broken since its release.
  
   * The way I fixed fence_scsi was this:
  
  I packaged pacemaker in latest 1.1.X version and kept it "vanilla" so I
  could bisect fence-agents. At that moment I realized that bisecting was
  going to be hard because there were multiple issues, not only one. I
  backported the latest fence-agents together with Pacemaker 1.1.19-0 and
  saw that it worked.
  
  From then on, I bisected the following intervals:
  
  4.3.0 .. 4.4.0 (eoan - working)
  4.2.0 .. 4.3.0
  4.1.0 .. 4.2.0
  4.0.25 .. 4.1.0 (bionic - not working)
  
  In each of those intervals I discovered issues. For example, using 4.3.0
  I faced problems, so I had to backport fixes from the 4.3.0 <-> 4.4.0
  interval. Then, backporting 4.2.0, I faced issues, so I had to backport
  fixes from the 4.2.0 <-> 4.3.0 interval. I did this until I reached
  4.0.25, the current Bionic fence-agents version.
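
  As an illustration of the interval list above, the adjacent release
  pairs can be generated from an ascending version list. A small sketch:
  release_intervals is a hypothetical helper, and the git tag names in
  the trailing comment are an assumption about the upstream repository.

```shell
# Sketch: print adjacent release intervals, newest first, from a list of
# versions given in ascending order.
# release_intervals is a hypothetical helper name.
release_intervals() {
    prev=
    out=
    for v in "$@"; do
        if [ -n "$prev" ]; then
            # Prepend so that the newest interval ends up first.
            out="$prev .. $v
$out"
        fi
        prev=$v
    done
    printf '%s' "$out"
}

release_intervals 4.0.25 4.1.0 4.2.0 4.3.0 4.4.0

# Candidate commits for one interval could then be listed with, for
# example (tag names assumed, not confirmed by this report):
#   git log --oneline v4.3.0..v4.4.0
```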
  
  [Other Info]
  
   * Original Description:
  
  Trying to set up a cluster with an iscsi shared disk, using fence_scsi as
  the fencing mechanism, I realized that fence_scsi is not working in
  Ubuntu Bionic. I first thought it was related to the Azure environment
  (LP: #1864419), where I was testing this setup, but then, trying
  locally, I figured out that somehow pacemaker 1.1.18 is not fencing the
  shared scsi disk properly.
  
  Note: I was able to "backport" vanilla 1.1.19 from upstream and
  fence_scsi worked. I then tried 1.1.18 without any of the quilt patches
  and it didn't work either. I think that bisecting 1.1.18 <-> 1.1.19
  might tell us which commit fixed the behaviour needed by the
  fence_scsi agent.
  
  (k)rafaeldtinoco@clubionic01:~$ crm conf show
  node 1: clubionic01.private
  node 2: clubionic02.private
  node 3: clubionic03.private
  primitive fence_clubionic stonith:fence_scsi \
          params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \
          meta provides=unfencing
  property cib-bootstrap-options: \
          have-watchdog=false \
          dc-version=1.1.18-2b07d5c5a9 \
          cluster-infrastructure=corosync \
          cluster-name=clubionic \
          stonith-enabled=on \
          stonith-action=off \
          no-quorum-policy=stop \
          symmetric-cluster=true
  
  ----
  
  (k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1
  Stack: corosync
  Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum
  Last updated: Mon Mar  2 15:55:30 2020
  Last change: Mon Mar  2 15:45:33 2020 by root via cibadmin on clubionic01.private
  
  3 nodes configured
  1 resource configured
  
  Online: [ clubionic01.private clubionic02.private clubionic03.private ]
  
  Active resources:
  
   fence_clubionic        (stonith:fence_scsi):   Started clubionic01.private
  
  ----
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda
    LIO-ORG   cluster.bionic.   4.0
    Peripheral device type: disk
    PR generation=0x0, there are NO registered reservation keys
  
  (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda
    LIO-ORG   cluster.bionic.   4.0
    Peripheral device type: disk
    PR generation=0x0, there is NO reservation held

-- 
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1865523

Title:
  [bionic] fence_scsi not working properly with Pacemaker
  1.1.18-2ubuntu1.1
