** Description changed: OBS: I have split this bug into 2 bugs: - fence-agents (this) and pacemaker (LP: #1866119) #### SRU: fence-agents [Impact] * fence_scsi is not currently working in a share disk environment * all clusters relying in fence_scsi and/or fence_scsi + watchdog won't be able to start the fencing agents OR, in worst case scenarios, the fence_scsi agent might start but won't make scsi reservations in the shared scsi disk. [Test Case] * having a 3-node setup, nodes called "clubionic01, clubionic02, clubionic03", with a shared scsi disk (fully supporting persistent reservations) /dev/sda, one might try the following command: sudo fence_scsi --verbose -n clubionic01 -d /dev/sda -k 3abe0000 -o off from nodes "clubionic02 or clubionic03" and check if the reservation worked: (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there are NO registered reservation keys (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there is NO reservation held * having a 3-node setup, nodes called "clubionic01, clubionic02, clubionic03", with a shared scsi disk (fully supporting persistent reservations) /dev/sda, with corosync and pacemaker operational and running, one might try: rafaeldtinoco@clubionic01:~$ crm configure crm(live)configure# property stonith-enabled=on crm(live)configure# property stonith-action=off crm(live)configure# property no-quorum-policy=stop crm(live)configure# property have-watchdog=true crm(live)configure# property symmetric-cluster=true crm(live)configure# commit crm(live)configure# end crm(live)# end rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \ stonith:fence_scsi params \ pcmk_host_list="clubionic01 clubionic02 clubionic03" \ devices="/dev/sda" \ meta provides=unfencing And see that crm_mon won't show fence_clubionic resource operational. [Regression Potential] - * Fix involves adding new cmdline and stdin arguments to the fencing + * Fix involves adding new cmdline and stdin arguments to the fencing agents. Both changes in that direction (normalizing "-" with "_" and deprecating some commands in favor of others) keep the existing commands working and allow the new commands to work as well (that part is the fix, because of the integration with pacemaker). * Comments #3 and #4 show this new version fully working. - * This fix has a potential of breaking other "nowadays working" fencing - agent. If that happens, I suggest that ones affected revert previous to - previous package AND open a bug against either pacemaker and/or fence- - agents. + * This is a quite complex change and I'd appreciate leaving it in -proposed for a + while longer (15 days ?) for a higher chance to detect issues. Furthermore there was no update since bionic release, so users could in the worst-case (and only then) + report a bug and downgrade to the former version. * Judging by this issue, it is very likely that any Ubuntu user that have tried using fence_scsi has probably migrated to a newer version because fence_scsi agent is broken since its release. * The way I fixed fence_scsi was this: I packaged pacemaker in latest 1.1.X version and kept it "vanilla" so I could bisect fence-agents. At that moment I realized that bisecting was going to be hard because there were multiple issues, not only one. I backported the latest fence-agents together with Pacemaker 1.1.19-0 and saw that it worked. From then on, I bisected the following intervals: 4.3.0 .. 4.4.0 (eoan - working) 4.2.0 .. 4.3.0 4.1.0 .. 4.2.0 4.0.25 .. 4.1.0 (bionic - not working) In each of those intervals I discovered issues. For example, Using 4.3.0 I faced problems so I had to backport fixes that were in between 4.4.0 and 4.3.0. Then, backporting 4.2.0, I faced issues so I had to backport fixes from the 4.3.0 <-> 4.2.0 interval. I did this until I was at 4.0.25 version, current Bionic fence-agents version. [Other Info] * Original Description: Trying to setup a cluster with an iscsi shared disk, using fence_scsi as the fencing mechanism, I realized that fence_scsi is not working in Ubuntu Bionic. I first thought it was related to Azure environment (LP: #1864419), where I was trying this environment, but then, trying locally, I figured out that somehow pacemaker 1.1.18 is not fencing the shared scsi disk properly. Note: I was able to "backport" vanilla 1.1.19 from upstream and fence_scsi worked. I have then tried 1.1.18 without all quilt patches and it didnt work as well. I think that bisecting 1.1.18 <-> 1.1.19 might tell us which commit has fixed the behaviour needed by the fence_scsi agent. (k)rafaeldtinoco@clubionic01:~$ crm conf show node 1: clubionic01.private node 2: clubionic02.private node 3: clubionic03.private primitive fence_clubionic stonith:fence_scsi \ params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \ meta provides=unfencing property cib-bootstrap-options: \ have-watchdog=false \ dc-version=1.1.18-2b07d5c5a9 \ cluster-infrastructure=corosync \ cluster-name=clubionic \ stonith-enabled=on \ stonith-action=off \ no-quorum-policy=stop \ symmetric-cluster=true ---- (k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1 Stack: corosync Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum Last updated: Mon Mar 2 15:55:30 2020 Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private 3 nodes configured 1 resource configured Online: [ clubionic01.private clubionic02.private clubionic03.private ] Active resources: fence_clubionic (stonith:fence_scsi): Started clubionic01.private ---- (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there are NO registered reservation keys (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there is NO reservation held
** Description changed: OBS: I have split this bug into 2 bugs: - fence-agents (this) and pacemaker (LP: #1866119) #### SRU: fence-agents [Impact] * fence_scsi is not currently working in a share disk environment * all clusters relying in fence_scsi and/or fence_scsi + watchdog won't be able to start the fencing agents OR, in worst case scenarios, the fence_scsi agent might start but won't make scsi reservations in the shared scsi disk. [Test Case] * having a 3-node setup, nodes called "clubionic01, clubionic02, clubionic03", with a shared scsi disk (fully supporting persistent reservations) /dev/sda, one might try the following command: sudo fence_scsi --verbose -n clubionic01 -d /dev/sda -k 3abe0000 -o off from nodes "clubionic02 or clubionic03" and check if the reservation worked: (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there are NO registered reservation keys (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there is NO reservation held * having a 3-node setup, nodes called "clubionic01, clubionic02, clubionic03", with a shared scsi disk (fully supporting persistent reservations) /dev/sda, with corosync and pacemaker operational and running, one might try: rafaeldtinoco@clubionic01:~$ crm configure crm(live)configure# property stonith-enabled=on crm(live)configure# property stonith-action=off crm(live)configure# property no-quorum-policy=stop crm(live)configure# property have-watchdog=true crm(live)configure# property symmetric-cluster=true crm(live)configure# commit crm(live)configure# end crm(live)# end rafaeldtinoco@clubionic01:~$ crm configure primitive fence_clubionic \ stonith:fence_scsi params \ pcmk_host_list="clubionic01 clubionic02 clubionic03" \ devices="/dev/sda" \ meta provides=unfencing And see that crm_mon won't show fence_clubionic resource operational. [Regression Potential] * Fix involves adding new cmdline and stdin arguments to the fencing agents. Both changes in that direction (normalizing "-" with "_" and deprecating some commands in favor of others) keep the existing commands working and allow the new commands to work as well (that part is the fix, because of the integration with pacemaker). * Comments #3 and #4 show this new version fully working. - * This is a quite complex change and I'd appreciate leaving it in -proposed for a + * This is a quite complex change and I'd appreciate leaving it in -proposed for a while longer (15 days ?) for a higher chance to detect issues. Furthermore there was no update since bionic release, so users could in the worst-case (and only then) report a bug and downgrade to the former version. * Judging by this issue, it is very likely that any Ubuntu user that have tried using fence_scsi has probably migrated to a newer version because fence_scsi agent is broken since its release. + + [Other Info] * The way I fixed fence_scsi was this: I packaged pacemaker in latest 1.1.X version and kept it "vanilla" so I could bisect fence-agents. At that moment I realized that bisecting was going to be hard because there were multiple issues, not only one. I backported the latest fence-agents together with Pacemaker 1.1.19-0 and saw that it worked. From then on, I bisected the following intervals: 4.3.0 .. 4.4.0 (eoan - working) 4.2.0 .. 4.3.0 4.1.0 .. 4.2.0 4.0.25 .. 4.1.0 (bionic - not working) In each of those intervals I discovered issues. For example, Using 4.3.0 I faced problems so I had to backport fixes that were in between 4.4.0 and 4.3.0. Then, backporting 4.2.0, I faced issues so I had to backport fixes from the 4.3.0 <-> 4.2.0 interval. I did this until I was at 4.0.25 version, current Bionic fence-agents version. - - [Other Info] * Original Description: Trying to setup a cluster with an iscsi shared disk, using fence_scsi as the fencing mechanism, I realized that fence_scsi is not working in Ubuntu Bionic. I first thought it was related to Azure environment (LP: #1864419), where I was trying this environment, but then, trying locally, I figured out that somehow pacemaker 1.1.18 is not fencing the shared scsi disk properly. Note: I was able to "backport" vanilla 1.1.19 from upstream and fence_scsi worked. I have then tried 1.1.18 without all quilt patches and it didnt work as well. I think that bisecting 1.1.18 <-> 1.1.19 might tell us which commit has fixed the behaviour needed by the fence_scsi agent. (k)rafaeldtinoco@clubionic01:~$ crm conf show node 1: clubionic01.private node 2: clubionic02.private node 3: clubionic03.private primitive fence_clubionic stonith:fence_scsi \ params pcmk_host_list="10.250.3.10 10.250.3.11 10.250.3.12" devices="/dev/sda" \ meta provides=unfencing property cib-bootstrap-options: \ have-watchdog=false \ dc-version=1.1.18-2b07d5c5a9 \ cluster-infrastructure=corosync \ cluster-name=clubionic \ stonith-enabled=on \ stonith-action=off \ no-quorum-policy=stop \ symmetric-cluster=true ---- (k)rafaeldtinoco@clubionic02:~$ sudo crm_mon -1 Stack: corosync Current DC: clubionic01.private (version 1.1.18-2b07d5c5a9) - partition with quorum Last updated: Mon Mar 2 15:55:30 2020 Last change: Mon Mar 2 15:45:33 2020 by root via cibadmin on clubionic01.private 3 nodes configured 1 resource configured Online: [ clubionic01.private clubionic02.private clubionic03.private ] Active resources: fence_clubionic (stonith:fence_scsi): Started clubionic01.private ---- (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist --in --read-keys --device=/dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there are NO registered reservation keys (k)rafaeldtinoco@clubionic02:~$ sudo sg_persist -r /dev/sda LIO-ORG cluster.bionic. 4.0 Peripheral device type: disk PR generation=0x0, there is NO reservation held -- You received this bug notification because you are a member of Ubuntu Server, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/1865523 Title: [bionic] fence_scsi not working properly with Pacemaker 1.1.18-2ubuntu1.1 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1865523/+subscriptions -- Ubuntu-server-bugs mailing list [email protected] Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
