Re: [ClusterLabs] fence_scsi no such device

marvin Mon, 21 Mar 2016 06:42:05 -0700


On 03/15/2016 03:39 PM, Ken Gaillot wrote:

On 03/15/2016 09:10 AM, marvin wrote:

Hi,

I'm trying to get fence_scsi working, but I get a "No such device" error.
It's a two node cluster with nodes called "node01" and "node03". The OS
is RHEL 7.2.

Here is some relevant info:

# pcs status
Cluster name: testrhel7cluster
Last updated: Tue Mar 15 15:05:40 2016          Last change: Tue Mar 15
14:33:39 2016 by root via cibadmin on node01
Stack: corosync
Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 23 resources configured

Online: [ node01 node03 ]

Full list of resources:

  Clone Set: dlm-clone [dlm]
      Started: [ node01 node03 ]
  Clone Set: clvmd-clone [clvmd]
      Started: [ node01 node03 ]
  fence-node1    (stonith:fence_ipmilan):        Started node03
  fence-node3    (stonith:fence_ipmilan):        Started node01
  Resource Group: test_grupa
      test_ip    (ocf::heartbeat:IPaddr):        Started node01
      lv_testdbcl        (ocf::heartbeat:LVM):   Started node01
      fs_testdbcl        (ocf::heartbeat:Filesystem):    Started node01
      oracle11_baza      (ocf::heartbeat:oracle):        Started node01
      oracle11_lsnr      (ocf::heartbeat:oralsnr):       Started node01
  fence-scsi-node1       (stonith:fence_scsi):   Started node03
  fence-scsi-node3       (stonith:fence_scsi):   Started node01

PCSD Status:
   node01: Online
   node03: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

# pcs stonith show
  fence-node1    (stonith:fence_ipmilan):        Started node03
  fence-node3    (stonith:fence_ipmilan):        Started node01
  fence-scsi-node1       (stonith:fence_scsi):   Started node03
  fence-scsi-node3       (stonith:fence_scsi):   Started node01
  Node: node01
   Level 1 - fence-scsi-node3
   Level 2 - fence-node3
  Node: node03
   Level 1 - fence-scsi-node1
   Level 2 - fence-node1

# pcs stonith show fence-scsi-node1 --all
  Resource: fence-scsi-node1 (class=stonith type=fence_scsi)
   Attributes: pcmk_host_list=node01 pcmk_monitor_action=metadata
pcmk_reboot_action=off
   Meta Attrs: provides=unfencing
   Operations: monitor interval=60s (fence-scsi-node1-monitor-interval-60s)

# pcs stonith show fence-scsi-node3 --all
  Resource: fence-scsi-node3 (class=stonith type=fence_scsi)
   Attributes: pcmk_host_list=node03 pcmk_monitor_action=metadata
pcmk_reboot_action=off
   Meta Attrs: provides=unfencing
   Operations: monitor interval=60s (fence-scsi-node3-monitor-interval-60s)

node01 # pcs stonith fence node03
Error: unable to fence 'node03'
Command failed: No such device

node01 # tail /var/log/messages
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Client
stonith_admin.29191.2b7fe910 wants to fence (reboot) 'node03' with
device '(any)'
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: Initiating remote
operation reboot for node03: d1df9201-5bb1-447f-9b40-d3d7235c3d0a (0)
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-scsi-node3 can
fence (reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: fence-node3 can fence
(reboot) node03: static-list
Mar 15 14:54:04 node01 stonith-ng[20024]:  notice: All fencing options
to fence node03 for [email protected] failed

The above line is the key. Both of the devices registered for node03
returned failure. Pacemaker then looked for any other device capable of
fencing node03 and there is none, so that's why it reported "No such
device" (an admittedly obscure message).

It looks like the fence agents require more configuration options than
you have set. If you run "/path/to/fence/agent -o metadata", you can see
the available options. It's a good idea to first get the agent running
successfully manually on the command line ("status" command is usually
sufficient), then put those same options in the cluster configuration.
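
A minimal sketch of the manual check described above, assuming the agent is on the PATH as fence_scsi and that it takes the node name with -n and the shared device with -d (confirm both against the "-o metadata" output). The node name and /dev/sdX are placeholders, and run_or_print and agent_check are hypothetical helper names. By default the commands are only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run_or_print() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Check a fence agent by hand before configuring it in the cluster.
# Arguments: agent name, node name, shared device (all placeholders).
agent_check() {
    agent=$1
    node=$2
    device=$3
    # List the options the agent accepts.
    run_or_print "$agent" -o metadata
    # Ask the agent for the node's status, using the same options
    # that would later go into the stonith resource.
    run_or_print "$agent" -o status -n "$node" -d "$device"
}

# Example (prints the two commands; /dev/sdX is a hypothetical device):
# agent_check fence_scsi node01 /dev/sdX
```

Once the status call succeeds by hand, the same option values can be copied into the stonith resource attributes.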

Made some progress, found new issue.

So I got fence_scsi to work: it unfences at start and fences when I tell it to.

The problem appears when I, for instance, fence node01. It stops pacemaker but leaves corosync running, so node01 is in "pending" state and node03 won't start the services until node01 is restarted. The keys seem to be handled correctly.
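
One way to check the keys is to list the registrations on the shared device with sg_persist from sg3_utils. This is a rough sketch: DEVICE is a placeholder (the real device is not named in this post), count_keys is a hypothetical helper, and it assumes sg_persist prints each registered key on its own line as a bare hex value.

```shell
#!/bin/sh
# DEVICE is a placeholder for the shared LUN; it is not taken from this thread.
DEVICE=${DEVICE:-/dev/sdX}

# Count registered keys in "sg_persist --in --read-keys" output read from stdin.
# Assumes each key is printed on its own line as a bare hex value (0x...).
count_keys() {
    grep -cE '^[[:space:]]*0x[0-9a-fA-F]+[[:space:]]*$'
}

# Only query the device when sg_persist is installed and DEVICE exists.
if command -v sg_persist >/dev/null 2>&1 && [ -e "$DEVICE" ]; then
    sg_persist --no-inquiry --in --read-keys --device="$DEVICE" | count_keys
fi
```

Running it on the surviving node before and after a fence should show the count drop once the fenced node's key is removed. With multipath, each path may register separately, so the count can be higher than the number of nodes.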


Before fence:
# pcs status
Cluster name: testrhel7cluster

Last updated: Mon Mar 21 14:26:53 2016 Last change: Mon Mar 21 14:26:27 2016 by root via crm_resource on node01

Stack: corosync
Current DC: node01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 21 resources configured

Online: [ node01 node03 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node01 node03 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node01 node03 ]
 Resource Group: test_grupa
     test_ip    (ocf::heartbeat:IPaddr):        Started node01
     lv_testdbcl        (ocf::heartbeat:LVM):   Started node01
     fs_testdbcl        (ocf::heartbeat:Filesystem):    Started node01
     oracle11_baza      (ocf::heartbeat:oracle):        Started node01
     oracle11_lsnr      (ocf::heartbeat:oralsnr):       Started node01
 Resource Group: oracle12_test
     oracle12_ip        (ocf::heartbeat:IPaddr):        Started node03
     lv_testdbcl12      (ocf::heartbeat:LVM):   Started node03
     fs_testdbcl12      (ocf::heartbeat:Filesystem):    Started node03
     oracle12_baza      (ocf::heartbeat:oracle):        Started node03
     oracle12_lsnr      (ocf::heartbeat:oralsnr):       Started node03
 scsi-node03    (stonith:fence_scsi):   Started node03
 scsi-node01    (stonith:fence_scsi):   Started node01

PCSD Status:
  node01: Online
  node03: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



After fence:
# pcs status
Cluster name: testrhel7cluster

Last updated: Mon Mar 21 14:28:40 2016 Last change: Mon Mar 21 14:26:27 2016 by root via crm_resource on node01

Stack: corosync
Current DC: node03 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 21 resources configured

Node node01: pending
Online: [ node03 ]

Full list of resources:

 Clone Set: dlm-clone [dlm]
     Started: [ node03 ]
     Stopped: [ node01 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ node03 ]
     Stopped: [ node01 ]
 Resource Group: test_grupa
     test_ip    (ocf::heartbeat:IPaddr):        Stopped
     lv_testdbcl        (ocf::heartbeat:LVM):   Stopped
     fs_testdbcl        (ocf::heartbeat:Filesystem):    Stopped
     oracle11_baza      (ocf::heartbeat:oracle):        Stopped
     oracle11_lsnr      (ocf::heartbeat:oralsnr):       Stopped
 Resource Group: oracle12_test
     oracle12_ip        (ocf::heartbeat:IPaddr):        Started node03
     lv_testdbcl12      (ocf::heartbeat:LVM):   Started node03
     fs_testdbcl12      (ocf::heartbeat:Filesystem):    Started node03
     oracle12_baza      (ocf::heartbeat:oracle):        Started node03
     oracle12_lsnr      (ocf::heartbeat:oralsnr):       Started node03
 scsi-node03    (stonith:fence_scsi):   Started node03
 scsi-node01    (stonith:fence_scsi):   Stopped

PCSD Status:
  node01: Online
  node03: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Note that node01 is in the pending state.

The stonith config is this:
# pcs stonith show scsi-node01 --all
 Resource: scsi-node01 (class=stonith type=fence_scsi)

  Attributes: pcmk_host_list=node01,node03 pcmk_host_check=static-list pcmk_monitor_action=metadata pcmk_reboot_action=off logfile=/var/log/cluster/fence_scsi.log verbose=3

  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-node01-monitor-interval-60s)
# pcs stonith show scsi-node03 --all
 Resource: scsi-node03 (class=stonith type=fence_scsi)

  Attributes: pcmk_host_list=node01,node03 pcmk_host_check=static-list pcmk_monitor_action=metadata pcmk_reboot_action=off logfile=/var/log/cluster/fence_scsi.log verbose=3

  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-node03-monitor-interval-60s)

As soon as I restart node01 or disconnect it from the network, the services start on node03.


Is this somehow expected behavior or is something weird going on here?

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
