Re: [ClusterLabs] Nodes see each other as OFFLINE - fence agent (fence_pcmk) may not be working properly on RHEL 6.5

2016-12-16 Thread Ken Gaillot
On 12/16/2016 07:46 AM, avinash shankar wrote:
> 
> Hello team,
> 
> I am a newbie to Pacemaker and Corosync clusters.
> I am facing trouble with the fence agent on RHEL 6.5.
> I have installed pcs, pacemaker, corosync, and cman on a two-node
> virtual (libvirt) cluster on RHEL 6.5.
> SELinux and the firewall are completely disabled.
> 
> # yum list installed | egrep 'pacemaker|corosync|cman|fence'
> cman.x86_64                     3.0.12.1-78.el6    @rhel-ha-for-rhel-6-server-rpms
> corosync.x86_64                 1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
> corosynclib.x86_64              1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
> fence-agents.x86_64             4.0.15-12.el6      @rhel-6-server-rpms
> fence-virt.x86_64               0.2.3-19.el6       @rhel-ha-for-rhel-6-server-eus-rpms
> pacemaker.x86_64                1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-cli.x86_64            1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-cluster-libs.x86_64   1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-libs.x86_64           1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
>  
> 
> I bring up the cluster with: pcs cluster start --all
> I have also run: pcs property set stonith-enabled=false

fence_pcmk simply tells CMAN to use pacemaker's fencing ... it can't
work if pacemaker's fencing is disabled.
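
For libvirt guests like these, one way to get fencing back is to re-enable
stonith in Pacemaker and point it at a real fence device. A rough sketch
only, assuming fence_virtd is already configured on the hypervisor and the
fence_xvm key has been copied to /etc/cluster/ on both guests (the resource
names below are just examples):

  # give fence_pcmk something to redirect to
  pcs property set stonith-enabled=true

  # one fence_xvm device per guest; "port" is the libvirt domain name
  pcs stonith create fence_cnode1 fence_xvm port="cnode1" pcmk_host_list="cnode1"
  pcs stonith create fence_cnode2 fence_xvm port="cnode2" pcmk_host_list="cnode2"

  # confirm the devices are defined
  pcs stonith show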

> Below is the status
> ---
> # pcs status
> Cluster name: roamclus
> Last updated: Fri Dec 16 18:54:40 2016
> Last change: Fri Dec 16 17:44:50 2016 by root via cibadmin on cnode1
> Stack: cman
> Current DC: NONE
> 2 nodes and 2 resources configured
> 
> Online: [ cnode1 ]
> OFFLINE: [ cnode2 ]
> 
> Full list of resources:
> 
> PCSD Status:
>   cnode1: Online
>   cnode2: Online
> ---
> The same output is observed on the other node (cnode2),
> so the nodes see each other as OFFLINE.
> The expected result is Online: [ cnode1 cnode2 ]
> I did the same package installation on RHEL 6.8, and when I start the
> cluster there, each node shows the other as ONLINE.
> 
> I need to resolve this so that on the RHEL 6.5 nodes, when the cluster
> is started, both nodes show each other as online by default.
> --
> Below is the /etc/cluster/cluster.conf
> [the cluster.conf XML was stripped by the list archive; only blank
> indented lines survive]
> --
> # cat /var/lib/pacemaker/cib/cib.xml
> [the CIB XML was also mangled by the archive; the surviving fragments
> show cib attributes cib-last-written="Fri Dec 16 18:57:10 2016",
> update-origin="cnode1", update-client="cibadmin", update-user="root",
> have-quorum="1", dc-uuid="cnode1", and these cluster properties:
>   have-watchdog=false
>   dc-version=1.1.14-8.el6_8.2-70404b0
>   cluster-infrastructure=cman
>   stonith-enabled=false ]
> 
> /var/log/messages has the below contents:
> 
> Dec 15 20:29:43 cnode2 kernel: DLM (built Oct 26 2016 10:26:08) installed
> Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Corosync Cluster
> Engine ('1.4.7'): started and ready to provide service.
> Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Corosync built-in
> features: nss dbus rdma snmp
> Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Successfully read
> config from /etc/cluster/cluster.conf
> Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Successfully parsed
> cman config
> Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] Initializing transport
> (UDP/IP Multicast).
> Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] Initializing
> transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] The network interface
> [10.10.18.138] is now up.
> Dec 15 20:29:46 cnode2 corosync[2464]:   [QUORUM] Using quorum provider
> quorum_cman
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> corosync cluster quorum service v0.1
> Dec 15 20:29:46 cnode2 corosync[2464]:   [CMAN  ] CMAN 3.0.12.1 (built
> Feb  1 2016 07:06:19) started
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> corosync CMAN membership service 2.90
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> openais checkpoint service B.01.01
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> corosync extended virtual synchrony service
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> corosync configuration service
> Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
> corosync cluster closed process group service v1.01
>
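
As a quick check of where the problem sits, it may also help to compare
what the membership layer itself sees with what Pacemaker reports, e.g.:

  # membership as seen by CMAN/corosync
  cman_tool status
  cman_tool nodes

  # membership as seen by Pacemaker
  crm_mon -1

If cman_tool already shows only one member, the problem is below
Pacemaker, in the corosync/cman membership layer.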

[ClusterLabs] Nodes see each other as OFFLINE - fence agent (fence_pcmk) may not be working properly on RHEL 6.5

2016-12-16 Thread avinash shankar
Hello team,

I am a newbie to Pacemaker and Corosync clusters.
I am facing trouble with the fence agent on RHEL 6.5.
I have installed pcs, pacemaker, corosync, and cman on a two-node virtual
(libvirt) cluster on RHEL 6.5.
SELinux and the firewall are completely disabled.

# yum list installed | egrep 'pacemaker|corosync|cman|fence'
cman.x86_64                     3.0.12.1-78.el6    @rhel-ha-for-rhel-6-server-rpms
corosync.x86_64                 1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
corosynclib.x86_64              1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
fence-agents.x86_64             4.0.15-12.el6      @rhel-6-server-rpms
fence-virt.x86_64               0.2.3-19.el6       @rhel-ha-for-rhel-6-server-eus-rpms
pacemaker.x86_64                1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-cli.x86_64            1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-cluster-libs.x86_64   1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
pacemaker-libs.x86_64           1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms


I bring up the cluster with: pcs cluster start --all
I have also run: pcs property set stonith-enabled=false

Below is the status
---
# pcs status
Cluster name: roamclus
Last updated: Fri Dec 16 18:54:40 2016
Last change: Fri Dec 16 17:44:50 2016 by root via cibadmin on cnode1
Stack: cman
Current DC: NONE
2 nodes and 2 resources configured

Online: [ cnode1 ]
OFFLINE: [ cnode2 ]

Full list of resources:

PCSD Status:
  cnode1: Online
  cnode2: Online
---
The same output is observed on the other node (cnode2),
so the nodes see each other as OFFLINE.
The expected result is Online: [ cnode1 cnode2 ]
I did the same package installation on RHEL 6.8, and when I start the
cluster there, each node shows the other as ONLINE.

I need to resolve this so that on the RHEL 6.5 nodes, when the cluster is
started, both nodes show each other as online by default.
--
Below is the /etc/cluster/cluster.conf
[the cluster.conf XML was stripped by the list archive; only blank
indented lines survive]
--
# cat /var/lib/pacemaker/cib/cib.xml
[CIB XML not preserved by the list archive; as quoted in the reply above,
it sets have-watchdog=false, dc-version=1.1.14-8.el6_8.2-70404b0,
cluster-infrastructure=cman and stonith-enabled=false]

/var/log/messages has the below contents:

Dec 15 20:29:43 cnode2 kernel: DLM (built Oct 26 2016 10:26:08) installed
Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Corosync Cluster Engine
('1.4.7'): started and ready to provide service.
Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Corosync built-in
features: nss dbus rdma snmp
Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Successfully parsed cman
config
Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] Initializing transport
(UDP/IP Multicast).
Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] The network interface
[10.10.18.138] is now up.
Dec 15 20:29:46 cnode2 corosync[2464]:   [QUORUM] Using quorum provider
quorum_cman
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Dec 15 20:29:46 cnode2 corosync[2464]:   [CMAN  ] CMAN 3.0.12.1 (built Feb
1 2016 07:06:19) started
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync configuration service
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync profile loading service
Dec 15 20:29:46 cnode2 corosync[2464]:   [QUORUM] Using quorum provider
quorum_cman
Dec 15 20:29:46 cnode2 corosync[2464]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Dec 15 20:29:46 cnode2 corosync[2464]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Dec 15 20:29:46 cnode2 corosync[2464]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Dec 15 20:29:46 cnode2 corosync[2464]:   [CMAN  ] quorum regained, resuming
activity
Dec 15 20:29:46 cnode2 corosync[2464]:   [QUORUM] This node is within the
primary component and will provide service.
Dec 15