On 07/06/2017 10:29 AM, Ken Gaillot wrote:
> On 07/06/2017 10:13 AM, ArekW wrote:
>> Hi,
>>
>> It seems that fence_vbox is running, but there are errors in the logs
>> every few minutes, like:
>>
>> Jul 6 12:51:12 nfsnode1 fence_vbox: Unable to connect/login to fencing device
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220] stderr: [ Unable to connect/login to fencing device ]
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220] stderr: [ ]
>> Jul 6 12:51:13 nfsnode1 stonith-ng[7899]: warning: fence_vbox[30220] stderr: [ ]
>>
>> Eventually, after some time, pcs status shows Failed Actions:
>>
>> # pcs status --full
>> Cluster name: nfscluster
>> Stack: corosync
>> Current DC: nfsnode1 (1) (version 1.1.15-11.el7_3.5-e174ec8) - partition with quorum
>> Last updated: Thu Jul 6 13:02:52 2017
>> Last change: Thu Jul 6 13:00:33 2017 by root via crm_resource on nfsnode1
>>
>> 2 nodes and 11 resources configured
>>
>> Online: [ nfsnode1 (1) nfsnode2 (2) ]
>>
>> Full list of resources:
>>
>>  Master/Slave Set: StorageClone [Storage]
>>      Storage (ocf::linbit:drbd): Master nfsnode1
>>      Storage (ocf::linbit:drbd): Master nfsnode2
>>      Masters: [ nfsnode1 nfsnode2 ]
>>  Clone Set: dlm-clone [dlm]
>>      dlm (ocf::pacemaker:controld): Started nfsnode1
>>      dlm (ocf::pacemaker:controld): Started nfsnode2
>>      Started: [ nfsnode1 nfsnode2 ]
>>  vbox-fencing (stonith:fence_vbox): Started nfsnode1
>>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>      ClusterIP:0 (ocf::heartbeat:IPaddr2): Started nfsnode1
>>      ClusterIP:1 (ocf::heartbeat:IPaddr2): Started nfsnode2
>>  Clone Set: StorageFS-clone [StorageFS]
>>      StorageFS (ocf::heartbeat:Filesystem): Started nfsnode1
>>      StorageFS (ocf::heartbeat:Filesystem): Started nfsnode2
>>      Started: [ nfsnode1 nfsnode2 ]
>>  Clone Set: WebSite-clone [WebSite]
>>      WebSite (ocf::heartbeat:apache): Started nfsnode1
>>      WebSite (ocf::heartbeat:apache): Started nfsnode2
>>      Started: [ nfsnode1 nfsnode2 ]
>>
>> Failed Actions:
>> * vbox-fencing_start_0 on nfsnode1 'unknown error' (1): call=157,
>>     status=Error, exitreason='none',
>>     last-rc-change='Thu Jul 6 13:58:04 2017', queued=0ms, exec=11947ms
>> * vbox-fencing_start_0 on nfsnode2 'unknown error' (1): call=57,
>>     status=Error, exitreason='none',
>>     last-rc-change='Thu Jul 6 13:58:16 2017', queued=0ms, exec=11953ms
>>
>> The fence device was created with:
>>
>> pcs -f stonith_cfg stonith create vbox-fencing fence_vbox ip=10.0.2.2 \
>>     ipaddr=10.0.2.2 login=AW23321 username=AW23321 \
>>     identity_file=/root/.ssh/id_rsa host_os=windows \
>>     pcmk_host_check=static-list pcmk_host_list="centos1 centos2" \
>>     vboxmanage_path="/cygdrive/c/Program\ Files/Oracle/VirtualBox/VBoxManage" \
>>     op monitor interval=5
>>
>> where centos1 and centos2 are the VBox machine names (not hostnames).
>> I used the duplicated login/username parameters because both are
>> indicated as required in the fence_vbox stonith description.
>>
>> Then I updated the configuration and set:
>>
>> pcs stonith update vbox-fencing pcmk_host_list="nfsnode1 nfsnode2"
>> pcs stonith update vbox-fencing pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2"
>>
>> where nfsnode1 and nfsnode2 are the hostnames.
>>
>> I'm not sure which config is correct, but both show Failed Actions
>> after some time.
>
> You only need one of pcmk_host_list or pcmk_host_map. Use pcmk_host_list
> if fence_vbox recognizes the node names used by the cluster, or
> pcmk_host_map if fence_vbox knows the nodes by other names. In this
> case, it looks like you want to tell fence_vbox to use "centos2" when
> the cluster wants to fence nfsnode2, so your pcmk_host_map is the right
> choice.
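>
> For example, something like this should clear the now-unneeded list and
> keep only the map (a sketch, assuming the device is still named
> vbox-fencing; I believe pcs drops an option when you set it to an empty
> value):
>
> # sketch: empty value should remove pcmk_host_list from the device
> pcs stonith update vbox-fencing pcmk_host_list= \
>     pcmk_host_map="nfsnode1:centos1;nfsnode2:centos2"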
>
>> I've successfully tested the fence connection to the VBox host with:
>>
>> fence_vbox --ip 10.0.2.2 --username=AW23321 \
>>     --identity-file=/root/.ssh/id_rsa --plug=centos2 --host-os=windows \
>>     --action=status \
>>     --vboxmanage-path="/cygdrive/c/Program\ Files/Oracle/VirtualBox/VBoxManage"
>>
>> Why does the above configuration work as a standalone command but not
>> in pcs?
>
> Two main possibilities: you haven't expressed those identical options in
> the cluster configuration correctly; or, you have some permissions on
> the command line that the cluster doesn't have (maybe SELinux, or file
> permissions, or ...).
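>
> For the SELinux angle, one quick check is to look for fresh AVC denials
> right after a failed start (a sketch; ausearch ships with the audit
> package):
>
> # look for recent SELinux denials mentioning fence
> ausearch -m avc -ts recent | grep -i fence
>
> If denials show up, temporarily running "setenforce 0", retesting, and
> then restoring with "setenforce 1" will confirm it.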

Forgot one other possibility: the status shows that the *start* action
is what failed, not a fence action. Check the fence_vbox source code to
see what start does, and try to do that manually step by step.
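If it helps: as far as I know, the cluster's start of a stonith device
amounts to registering it with stonith-ng and then running the agent's
monitor action, which for fence_vbox without a plug should be a VM
listing. You can approximate that by hand with your known-good options
(a sketch; only the action differs from your working test command):

# same options as the working test, but with a monitor-style action
fence_vbox --ip 10.0.2.2 --username=AW23321 \
    --identity-file=/root/.ssh/id_rsa --host-os=windows \
    --vboxmanage-path="/cygdrive/c/Program\ Files/Oracle/VirtualBox/VBoxManage" \
    --action=list

If that succeeds as root while the cluster's start still fails, the
difference is likely in the daemons' environment rather than in the
agent options themselves.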

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org