Re: [ClusterLabs] unable to start fence_scsi on a new add node

2020-04-20 Thread Stefan Sabolowitsch
Oyvind,
>> Sounds like you need to increase the number of journals for your GFS2 
>> filesystem.

Thanks, that was the trick.
Thank you for the professional help here.

Stefan





Re: [ClusterLabs] unable to start fence_scsi on a new add node

2020-04-20 Thread Oyvind Albrigtsen

Sounds like you need to increase the number of journals for your GFS2
filesystem.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/s1-manage-addjournalfs
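
For reference, the procedure in that document boils down to two commands, run
on a node that already has the filesystem mounted (mount point and device
taken from this thread; gfs2_edit and gfs2_jadd ship with gfs2-utils):

  gfs2_edit -p jindex /dev/vg_cluster/lv_cluster | grep journal   # count existing journals
  gfs2_jadd -j 1 /data-san                                        # add one journal for the new node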


Oyvind

On 19/04/20 11:03 +, Stefan Sabolowitsch wrote:

Andrei,
I found this.
If I try to mount the volume by hand, I get this error message:

[root@logger log]# mount /dev/mapper/vg_cluster-lv_cluster /data-san
mount: mount /dev/mapper/vg_cluster-lv_cluster on /data-san failed: Too many 
users





Re: [ClusterLabs] unable to start fence_scsi on a new add node

2020-04-19 Thread Stefan Sabolowitsch
Andrei,
I found this.
If I try to mount the volume by hand, I get this error message:

[root@logger log]# mount /dev/mapper/vg_cluster-lv_cluster /data-san
mount: mount /dev/mapper/vg_cluster-lv_cluster on /data-san failed: Too many 
users
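
A mount failing with "Too many users" is GFS2's way of reporting that every
journal on the filesystem is already claimed by another node; the kernel log
on the failing node usually carries a more explicit message, which can be
checked with e.g.:

  dmesg | grep -i gfs2 | tail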


Re: [ClusterLabs] unable to start fence_scsi on a new add node

2020-04-19 Thread Stefan Sabolowitsch
Privjet / Hello Andrei (happy Easter to Russia),
thanks for the tip, which got me a step further, but the volume still does not 
mount on the new node.

[root@elastic ~]# pcs status
Cluster name: cluster_elastic
Stack: corosync
Current DC: elastic-02 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with 
quorum
Last updated: Sun Apr 19 12:21:01 2020
Last change: Sun Apr 19 12:15:29 2020 by root via cibadmin on elastic-03

3 nodes configured
10 resources configured

Online: [ elastic-01 elastic-02 elastic-03 ]

Full list of resources:

 scsi   (stonith:fence_scsi):   Started elastic-01
 Clone Set: dlm-clone [dlm]
 Started: [ elastic-01 elastic-02 elastic-03 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ elastic-01 elastic-02 elastic-03 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]

Failed Resource Actions:
* fs_gfs2_start_0 on elastic-03 'unknown error' (1): call=53, status=complete, 
exitreason='Couldn't mount device [/dev/vg_cluster/lv_cluster] as /data-san',
last-rc-change='Sun Apr 19 12:02:44 2020', queued=0ms, exec=1015ms

Failed Fencing Actions:
* unfencing of elastic-03 failed: delegate=, client=crmd.5149, 
origin=elastic-02,
last-failed='Sun Apr 19 11:32:59 2020'

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

my config:

Cluster Name: cluster_elastic
Corosync Nodes:
 elastic-01 elastic-02 elastic-03
Pacemaker Nodes:
 elastic-01 elastic-02 elastic-03

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
   start interval=0s timeout=90 (dlm-start-interval-0s)
   stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
   start interval=0s timeout=90s (clvmd-start-interval-0s)
   stop interval=0s timeout=90s (clvmd-stop-interval-0s)
 Clone: fs_gfs2-clone
  Meta Attrs: interleave=true
  Resource: fs_gfs2 (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/vg_cluster/lv_cluster directory=/data-san 
fstype=gfs2 options=noatime,nodiratime
   Operations: monitor interval=10s on-fail=fence (fs_gfs2-monitor-interval-10s)
   notify interval=0s timeout=60s (fs_gfs2-notify-interval-0s)
   start interval=0s timeout=60s (fs_gfs2-start-interval-0s)
   stop interval=0s timeout=60s (fs_gfs2-stop-interval-0s)

Stonith Devices:
 Resource: scsi (class=stonith type=fence_scsi)
  Attributes: pcmk_host_list="elastic-01 elastic-02 elastic-03" 
pcmk_monitor_action=metadata pcmk_reboot_action=off devices=/dev/mapper/mpatha 
verbose=true
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) 
(id:order-dlm-clone-clvmd-clone-mandatory)
  start clvmd-clone then start fs_gfs2-clone (kind:Mandatory) 
(id:order-clvmd-clone-fs_gfs2-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) 
(id:colocation-clvmd-clone-dlm-clone-INFINITY)
  fs_gfs2-clone with clvmd-clone (score:INFINITY) 
(id:colocation-fs_gfs2-clone-clvmd-clone-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_elastic
 dc-version: 1.1.20-5.el7_7.2-3c4c782f70
 have-watchdog: false
 maintenance-mode: false
 no-quorum-policy: ignore

Quorum:
  Options:
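
As an aside, a fence agent configured like the scsi device above can also be
exercised by hand, outside Pacemaker, by feeding it key=value parameters on
stdin (device path and node name taken from the config above; "status" only
queries the key registration and does not fence; confirm the exact parameter
names with fence_scsi -o metadata):

  echo -e "action=status\ndevices=/dev/mapper/mpatha\nport=elastic-03" | fence_scsi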





Re: [ClusterLabs] unable to start fence_scsi on a new add node

2020-04-19 Thread Andrei Borzenkov
On 16.04.2020 18:58, Stefan Sabolowitsch wrote:
> Hi there,
> I have expanded a two-node cluster with an additional node, "elastic-03". 
> However, fence_scsi does not start on the new node.
> 
> pcs-status:
> [root@logger cluster]# pcs status
> Cluster name: cluster_elastic
> Stack: corosync
> Current DC: elastic-02 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with 
> quorum
> Last updated: Thu Apr 16 17:38:16 2020
> Last change: Thu Apr 16 17:23:43 2020 by root via cibadmin on elastic-03
> 
> 3 nodes configured
> 10 resources configured
> 
> Online: [ elastic-01 elastic-02 elastic-03 ]
> 
> Full list of resources:
> 
>  scsi (stonith:fence_scsi):   Stopped
>  Clone Set: dlm-clone [dlm]
>  Started: [ elastic-01 elastic-02 ]
>  Stopped: [ elastic-03 ]
>  Clone Set: clvmd-clone [clvmd]
>  Started: [ elastic-01 elastic-02 ]
>  Stopped: [ elastic-03 ]
>  Clone Set: fs_gfs2-clone [fs_gfs2]
>  Started: [ elastic-01 elastic-02 ]
>  Stopped: [ elastic-03 ]
> 
> Failed Fencing Actions:
> * unfencing of elastic-03 failed: delegate=, client=crmd.5149, 
> origin=elastic-02,
> last-failed='Thu Apr 16 17:23:43 2020'
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> corosync.log 
> Apr 16 17:27:10 [4572] logger stonith-ng:   notice: 
> can_fence_host_with_device:   scsi can fence (off) elastic-01 : 
> static-list
> Apr 16 17:27:12 [4572] logger stonith-ng:   notice: 
> can_fence_host_with_device:   scsi can fence (off) elastic-02 : 
> static-list
> Apr 16 17:27:13 [4572] logger stonith-ng:   notice: 
> can_fence_host_with_device:   scsi can not fence (off) elastic-03: 
> static-list
> Apr 16 17:38:43 [4572] logger stonith-ng:   notice: 
> can_fence_host_with_device:   scsi can not fence (on) elastic-03: 
> static-list

You probably need to update your stonith resource to include the new node.
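
With pcs that is typically a one-liner, along the lines of (resource name and
host list as used in this thread):

  pcs stonith update scsi pcmk_host_list="elastic-01 elastic-02 elastic-03"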

> Apr 16 17:38:43 [4572] logger stonith-ng:   notice: remote_op_done:   
> Operation on of elastic-03 by  for crmd.5149@elastic-02.4b624305: No 
> such device
> Apr 16 17:38:43 [4576] logger.feltengroup.local   crmd:error: 
> tengine_stonith_notify:   Unfencing of elastic-03 by  failed: No such 
> device (-19)
> 
> [root@logger cluster]# stonith_admin -L
>  scsi
> 1 devices found
> 
> [root@logger cluster]# stonith_admin -l elastic-03
> No devices found
> 
> Thanks for any help here.
> Stefan
> 



[ClusterLabs] unable to start fence_scsi on a new add node

2020-04-16 Thread Stefan Sabolowitsch
Hi there,
I have expanded a two-node cluster with an additional node, "elastic-03". 
However, fence_scsi does not start on the new node.

pcs-status:
[root@logger cluster]# pcs status
Cluster name: cluster_elastic
Stack: corosync
Current DC: elastic-02 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with 
quorum
Last updated: Thu Apr 16 17:38:16 2020
Last change: Thu Apr 16 17:23:43 2020 by root via cibadmin on elastic-03

3 nodes configured
10 resources configured

Online: [ elastic-01 elastic-02 elastic-03 ]

Full list of resources:

 scsi   (stonith:fence_scsi):   Stopped
 Clone Set: dlm-clone [dlm]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]
 Clone Set: fs_gfs2-clone [fs_gfs2]
 Started: [ elastic-01 elastic-02 ]
 Stopped: [ elastic-03 ]

Failed Fencing Actions:
* unfencing of elastic-03 failed: delegate=, client=crmd.5149, 
origin=elastic-02,
last-failed='Thu Apr 16 17:23:43 2020'

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


corosync.log 
Apr 16 17:27:10 [4572] logger stonith-ng:   notice: can_fence_host_with_device: 
  scsi can fence (off) elastic-01 : static-list
Apr 16 17:27:12 [4572] logger stonith-ng:   notice: can_fence_host_with_device: 
  scsi can fence (off) elastic-02 : static-list
Apr 16 17:27:13 [4572] logger stonith-ng:   notice: can_fence_host_with_device: 
  scsi can not fence (off) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng:   notice: can_fence_host_with_device: 
  scsi can not fence (on) elastic-03: static-list
Apr 16 17:38:43 [4572] logger stonith-ng:   notice: remote_op_done:   Operation 
on of elastic-03 by  for crmd.5149@elastic-02.4b624305: No such device
Apr 16 17:38:43 [4576] logger.feltengroup.local   crmd:error: 
tengine_stonith_notify:   Unfencing of elastic-03 by  failed: No such 
device (-19)

[root@logger cluster]# stonith_admin -L
 scsi
1 devices found

[root@logger cluster]# stonith_admin -l elastic-03
No devices found
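
A complementary check at the SCSI layer is to list the persistent-reservation
keys actually registered on the shared device, since fence_scsi unfencing
works by registering a per-node key (sg_persist comes from sg3_utils; the
device path is taken from the stonith configuration shown later in the
thread):

  sg_persist --no-inquiry --in --read-keys --device=/dev/mapper/mpatha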

Thanks for any help here.
Stefan



Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Ken Gaillot
On 05/18/2016 05:21 AM, Marco A. Carcano wrote:
> Hi Ken,
> 
> by the way I’ve just also tried with pacemaker 1.1.14 (I built it from 
> source into a new RPM) but it doesn’t work
> 
> 
>> On 18 May 2016, at 11:29, Marco A. Carcano  wrote:
>>
>> Hi Ken,
>>
>> thank you for the reply
>>
>> I tried as you suggested, and now the stonith devices tries to start but 
>> fails.
>>
>> I tried this
>>
>> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
>> apache-up002.ring0 apache-up003.ring0" 
>> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
>> apache-up002.ring1=apache-up002.ring0; 
>> apache-up003.ring1=apache-up003.ring0" pcmk_reboot_action="off" 
>> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
>> provides="unfencing"  op monitor interval=60s
>>
>> and even this, adding pcmk_monitor_action="metadata" as suggested in a post 
>> on RH knowledge base (even if the error was quite different)

Avoid that -- it's a last resort for a fence agent with a missing or
broken monitor action. If the fence agent is properly written, you're
just glossing over real errors.

>> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
>> apache-up002.ring0 apache-up003.ring0" 
>> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
>> apache-up002.ring1=apache-up002.ring0; 
>> apache-up003.ring1=apache-up003.ring0" pcmk_reboot_action="off" 
>> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
>> provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s
>>
>> I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
>> fence-agents-scsi-4.0.11-27
>>
>> the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
>> with any device" and "error: Operation on of apache-up003.ring0 by  
>> for crmd.15918@apache-up001.ring0.0599387e: No such device"

I'm not sure why that would happen. You can try the following (collected as a
command sequence below):

* fence_scsi -o metadata

Make sure "on" is in the list of supported actions. The stock one does,
but just to be sure you don't have a modified version ...

* stonith_admin -L

Make sure "scsi" is in the output (list of configured fence devices).

* stonith_admin -l apache-up003.ring0

to see what devices the cluster thinks can fence that node

* Does the cluster status show the fence device running on some node?
Does it list any failed actions?
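
Collected as a command sequence, the checks above look like this (node name
from this thread):

  fence_scsi -o metadata | grep 'action name'   # "on" must appear among the supported actions
  stonith_admin -L                              # list configured fence devices
  stonith_admin -l apache-up003.ring0           # devices the cluster can use against that node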


>> Thanks
>>
>> Marco
>>
>>
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE 
>> -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
>> origin=abort_transition_graph ]
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
>> Ignore
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up001.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up002.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
>> apache-up003.ring0: node discovery
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
>> scsi (apache-up001.ring0)
>> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 
>> 11: /var/lib/pacemaker/pengine/pe-input-95.bz2
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (11) on apache-up003.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
>> probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
>> waiting
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (8) on apache-up002.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
>> probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
>> waiting
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
>> operation (5) on apache-up001.ring0 (timeout=6)
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
>> operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
>> operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
>> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
>> probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 
>> (local) - no waiting
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
>> crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device 
>> '(any)'
>> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: 

Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Marco A. Carcano
Hi Ken,

by the way I’ve just also tried with pacemaker 1.1.14 (I built it from 
source into a new RPM) but it doesn’t work


> On 18 May 2016, at 11:29, Marco A. Carcano  wrote:
> 
> Hi Ken,
> 
> thank you for the reply
> 
> I tried as you suggested, and now the stonith devices tries to start but 
> fails.
> 
> I tried this
> 
> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
> apache-up002.ring0 apache-up003.ring0" 
> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
> apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
> pcmk_reboot_action="off" 
> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
> provides="unfencing"  op monitor interval=60s
> 
> and even this, adding pcmk_monitor_action="metadata" as suggested in a post 
> on RH knowledge base (even if the error was quite different)
> 
> pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
> apache-up002.ring0 apache-up003.ring0" 
> pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
> apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
> pcmk_reboot_action="off" 
> devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
> provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s
> 
> I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
> fence-agents-scsi-4.0.11-27
> 
> the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
> with any device" and "error: Operation on of apache-up003.ring0 by  
> for crmd.15918@apache-up001.ring0.0599387e: No such device"
> 
> Thanks
> 
> Marco
> 
> 
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph ]
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
> Ignore
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up001.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up002.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
> apache-up003.ring0: node discovery
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
> scsi (apache-up001.ring0)
> May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 
> 11: /var/lib/pacemaker/pengine/pe-input-95.bz2
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (11) on apache-up003.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
> probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
> waiting
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (8) on apache-up002.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
> probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
> waiting
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
> operation (5) on apache-up001.ring0 (timeout=6)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
> May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
> probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 
> (local) - no waiting
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
> crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device 
> '(any)'
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
> operation on for apache-up001.ring0: e50d7e16-9578-4964-96a3-7b36bdcfba46 (0)
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) apache-up003.ring0 with any device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) apache-up002.ring0 with any device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
> apache-up003.ring0 by  for crmd.15918@apache-up001.ring0.0599387e: No 
> such device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
> apache-up002.ring0 by  for crmd.15918@apache-up001.ring0.76aba815: No 
> such device
> May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
> to fence (on) 

Re: [ClusterLabs] unable to start fence_scsi

2016-05-18 Thread Marco A. Carcano
Hi Ken,

thank you for the reply

I tried as you suggested, and now the stonith devices tries to start but fails.

I tried this

pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
apache-up002.ring0 apache-up003.ring0" 
pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
pcmk_reboot_action="off" 
devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
provides="unfencing"  op monitor interval=60s

and even this, adding pcmk_monitor_action="metadata" as suggested in a post on 
RH knowledge base (even if the error was quite different)

pcs stonith create scsi fence_scsi pcmk_host_list="apache-up001.ring0 
apache-up002.ring0 apache-up003.ring0" 
pcmk_host_map="apache-up001.ring1=apache-up001.ring0; 
apache-up002.ring1=apache-up002.ring0; apache-up003.ring1=apache-up003.ring0" 
pcmk_reboot_action="off" 
devices="/dev/mapper/36001405973e201b3fdb4a999175b942f" meta 
provides="unfencing" pcmk_monitor_action="metadata"  op monitor interval=60s

I’m using CentOS 7.2, pacemaker-1.1.13-10  resource-agents-3.9.5-54 and 
fence-agents-scsi-4.0.11-27

the error messages are "Couldn't find anyone to fence (on) apache-up003.ring0 
with any device" and "error: Operation on of apache-up003.ring0 by  
for crmd.15918@apache-up001.ring0.0599387e: No such device"

Thanks

Marco


May 18 10:37:03 apache-up001 crmd[15918]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
May 18 10:37:03 apache-up001 pengine[15917]:  notice: On loss of CCM Quorum: 
Ignore
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up001.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up002.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Unfencing 
apache-up003.ring0: node discovery
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Start   
scsi (apache-up001.ring0)
May 18 10:37:03 apache-up001 pengine[15917]:  notice: Calculated Transition 11: 
/var/lib/pacemaker/pengine/pe-input-95.bz2
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (11) on apache-up003.ring0 (timeout=6)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 9: 
probe_complete probe_complete-apache-up003.ring0 on apache-up003.ring0 - no 
waiting
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (8) on apache-up002.ring0 (timeout=6)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 6: 
probe_complete probe_complete-apache-up002.ring0 on apache-up002.ring0 - no 
waiting
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Executing on fencing 
operation (5) on apache-up001.ring0 (timeout=6)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up003.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up003.ring0: 0599387e-0a30-4e1b-b641-adea5ba2a4ad (0)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up002.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up002.ring0: 76aba815-280e-491a-bd17-40776c8169e9 (0)
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Initiating action 3: 
probe_complete probe_complete-apache-up001.ring0 on apache-up001.ring0 (local) 
- no waiting
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Client 
crmd.15918.697c495e wants to fence (on) 'apache-up001.ring0' with device '(any)'
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Initiating remote 
operation on for apache-up001.ring0: e50d7e16-9578-4964-96a3-7b36bdcfba46 (0)
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up003.ring0 with any device
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up002.ring0 with any device
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up003.ring0 by  for crmd.15918@apache-up001.ring0.0599387e: No 
such device
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up002.ring0 by  for crmd.15918@apache-up001.ring0.76aba815: No 
such device
May 18 10:37:03 apache-up001 stonith-ng[15914]:  notice: Couldn't find anyone 
to fence (on) apache-up001.ring0 with any device
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Stonith operation 
5/11:11:0:8248cebf-c198-4ff2-bd43-7415533ce50f: No such device (-19)
May 18 10:37:03 apache-up001 stonith-ng[15914]:   error: Operation on of 
apache-up001.ring0 by  for crmd.15918@apache-up001.ring0.e50d7e16: No 
such device
May 18 10:37:03 apache-up001 crmd[15918]:  notice: Stonith operation 5 for