Re: [ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Ken Gaillot
On 12/10/2015 12:45 PM, Louis Munro wrote:
> Hello all,
> 
> I am trying to get a Corosync 2 cluster going on CentOS 6.7, but I am running
> into a bit of a problem with either Corosync or Pacemaker.
> crm reports that all my nodes are offline and the stack is unknown (I am not 
> sure if that is relevant).
> 
> I believe both nodes are actually present and seen in corosync, but they may 
> not be considered as such by pacemaker.
> I have messages in the logs saying that the processes cannot get the node 
> name and default to uname -n: 
> 
> Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: 
> corosync_node_name:Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com   crmd:   notice: get_node_name: 
> Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: crm_get_peer:  
> Node 739513528 is now known as hack1.example.com
> 
> The uname -n is correct as far as that is concerned.
> 
> 
> Does this mean anything to anyone here? 
> 
> 
> [Lots of details to follow]...
> 
> I compiled my own versions of Corosync, Pacemaker, crm and the 
> resource-agents seemingly without problems.
> 
> Here is what I currently have installed:
> 
> # corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> # pacemakerd -F
> Pacemaker 1.1.13 (Build: 5b41ae1)
>  Supporting v3.0.10:  generated-manpages agent-manpages ascii-docs ncurses 
> libqb-logging libqb-ipc lha-fencing upstart nagios  corosync-native 
> atomic-attrd libesmtp acls
> 
> # crm --version
> crm 2.2.0-rc3
> 
> 
> 
> Here is the output of crm status:
> 
> # crm status
> Last updated: Thu Dec 10 12:47:50 2015    Last change: Thu Dec 10
> 12:02:33 2015 by root via cibadmin on hack1.example.com
> Stack: unknown
> Current DC: NONE
> 2 nodes and 0 resources configured
> 
> OFFLINE: [ hack1.example.com hack2.example.com ]
> 
> Full list of resources:
> 
> {nothing to see here}
> 
> 
> 
> # corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.739513528.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
> runtime.totem.pg.mrp.srp.members.739513528.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
> runtime.totem.pg.mrp.srp.members.739513590.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
> runtime.totem.pg.mrp.srp.members.739513590.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
> 
> 
> # uname -n
> hack1.example.com
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513528
> RING ID 0
>   id  = 172.20.20.184
>   status  = ring 0 active with no faults
> 
> 
> # uname -n
> hack2.example.com
> 
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513590
> RING ID 0
>   id  = 172.20.20.246
>   status  = ring 0 active with no faults
> 
> 
> 
> 
> Shouldn’t I see both nodes in the same ring?

They are in the same ring, but the cfgtool will only print the local id.
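Incidentally, the large numeric IDs can be explained: with no nodelist and
clear_node_high_bit: yes (as in the config below), corosync derives each nodeid
from the ring0 IPv4 address, with the high bit cleared. A quick sketch of that
derivation (my reconstruction for illustration, not a corosync API):

```python
import socket
import struct

def corosync_auto_nodeid(ipv4: str) -> int:
    """Derive the auto-generated nodeid from a ring0 IPv4 address,
    assuming clear_node_high_bit: yes (top bit of the 32-bit value cleared)."""
    (as_int,) = struct.unpack("!I", socket.inet_aton(ipv4))
    return as_int & 0x7FFFFFFF  # clear the high bit

# The IDs in the logs above match the ring addresses:
print(corosync_auto_nodeid("172.20.20.184"))  # 739513528
print(corosync_auto_nodeid("172.20.20.246"))  # 739513590
```

So "nodeid 739513528" really is hack1 (172.20.20.184), just in a less readable form.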

> My corosync config is currently defined as:
> 
> # egrep -v '#' /etc/corosync/corosync.conf
> totem {
>   version: 2
> 
>   crypto_cipher: none
>   crypto_hash: none
>   clear_node_high_bit: yes
>   cluster_name: hack_cluster
>   interface {
>   ringnumber: 0
>   bindnetaddr: 172.20.0.0
>   mcastaddr: 239.255.1.1
>   mcastport: 5405
>   ttl: 1
>   }
> 
> }
> 
> logging {
>   fileline: on
>   to_stderr: no
>   to_logfile: yes
>   logfile: /var/log/cluster/corosync.log
>   to_syslog: yes
>   debug: off
>   timestamp: on
>   logger_subsys {
>   subsys: QUORUM
>   debug: off
>   }
> }
> 
> # cat /etc/corosync/service.d/pacemaker
> service {
> name: pacemaker
> ver: 1
> }

You don't want this section if you're using corosync 2. That's the old
"plugin" used with corosync 1.

> 
> 
> And here is my pacemaker configuration:
> 
> # crm config show xml
> <cib crm_feature_set="3.0.10" validate-with="pacemaker-2.4" epoch="13" admin_epoch="0"
>  update-client="cibadmin" update-user="root" cib-last-written="Thu Dec 10 13:35:06 2015">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="..."/>
>         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="..."/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="739513528" uname="hack1.example.com">
>         <instance_attributes id="hack1.example.com-instance_attributes">
>           <nvpair id="hack1.example.com-instance_attributes-standby" name="standby" value="..."/>
>         </instance_attributes>
>       </node>
>       <node id="739513590" uname="hack2.example.com">
>         <instance_attributes id="hack2.example.com-instance_attributes">
>           <nvpair id="hack2.example.com-instance_attributes-standby" name="standby" value="..."/>
>         </instance_attributes>
>       </node>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
>   <status/>
> </cib>
> 
> And finally some logs that might be relevant: 
> 
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [MAIN  ] 
> main.c:1227 Corosync Cluster Engine ('2.3.5'): started and ready to provide 
> service.

Re: [ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Ken Gaillot
On 12/10/2015 01:14 PM, Louis Munro wrote:
> I can now answer parts of my own question.
> 
> 
> My config was missing the quorum configuration:
> 
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> two_node: 1
> expected_votes: 2
> }
> 
> 
> I read the manpage as saying that was optional, but it looks like I may be 
> misreading here.
> corosync.conf(5) says the following: 
> 
> Within the quorum directive it is possible to specify the quorum algorithm to 
> use with the
> provider directive. At the time of writing only corosync_votequorum is 
> supported.  
> See votequorum(5) for configuration options.
> 
> 
> 
> I still have messages in the logs saying 
> crmd:   notice: get_node_name:   Defaulting to uname -n for the local 
> corosync node name
> 
> I am not sure which part of the configuration I should be setting for that.
> 
> Any pointers regarding that would be nice.

Hi,

As long as the unames are what you want the nodes to be called, that
message is fine. You can explicitly set the node names by using a
nodelist {} section in corosync.conf, with each node {} having a
ring0_addr specifying the name.
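
For example, a minimal sketch of such a nodelist using the hostnames from this
thread (the nodeid values are illustrative; they are optional, and omitting them
keeps the auto-derived IDs):

```
nodelist {
    node {
        ring0_addr: hack1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: hack2.example.com
        nodeid: 2
    }
}
```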

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Louis Munro
I can now answer parts of my own question.


My config was missing the quorum configuration:

quorum {
# Enable and configure quorum subsystem (default: off)
# see also corosync.conf.5 and votequorum.5
provider: corosync_votequorum
two_node: 1
expected_votes: 2
}


I read the manpage as saying that was optional, but it looks like I may be 
misreading here.
corosync.conf(5) says the following: 

Within the quorum directive it is possible to specify the quorum algorithm to 
use with the
provider directive. At the time of writing only corosync_votequorum is 
supported.  
See votequorum(5) for configuration options.



I still have messages in the logs saying 
crmd:   notice: get_node_name:   Defaulting to uname -n for the local corosync 
node name

I am not sure which part of the configuration I should be setting for that.

Any pointers regarding that would be nice.

Regards,
--
Louis Munro
lmu...@inverse.ca  ::  www.inverse.ca 
+1.514.447.4918 x125  :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)



[ClusterLabs] Stack: unknown and all nodes offline

2015-12-10 Thread Louis Munro
Hello all,

I am trying to get a Corosync 2 cluster going on CentOS 6.7, but I am running into
a bit of a problem with either Corosync or Pacemaker.
crm reports that all my nodes are offline and the stack is unknown (I am not 
sure if that is relevant).

I believe both nodes are actually present and seen in corosync, but they may 
not be considered as such by pacemaker.
I have messages in the logs saying that the processes cannot get the node name 
and default to uname -n: 

Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: 
corosync_node_name:  Unable to get node name for nodeid 739513528
Dec 10 13:38:53 [2236] hack1.example.com   crmd:   notice: get_node_name:   
Defaulting to uname -n for the local corosync node name
Dec 10 13:38:53 [2236] hack1.example.com   crmd: info: crm_get_peer:
Node 739513528 is now known as hack1.example.com

The uname -n is correct as far as that is concerned.


Does this mean anything to anyone here? 


[Lots of details to follow]...

I compiled my own versions of Corosync, Pacemaker, crm and the resource-agents 
seemingly without problems.

Here is what I currently have installed:

# corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.

# pacemakerd -F
Pacemaker 1.1.13 (Build: 5b41ae1)
 Supporting v3.0.10:  generated-manpages agent-manpages ascii-docs ncurses 
libqb-logging libqb-ipc lha-fencing upstart nagios  corosync-native 
atomic-attrd libesmtp acls

# crm --version
crm 2.2.0-rc3



Here is the output of crm status:

# crm status
Last updated: Thu Dec 10 12:47:50 2015  Last change: Thu Dec 10 
12:02:33 2015 by root via cibadmin on hack1.example.com
Stack: unknown
Current DC: NONE
2 nodes and 0 resources configured

OFFLINE: [ hack1.example.com hack2.example.com ]

Full list of resources:

{nothing to see here}



# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.739513528.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
runtime.totem.pg.mrp.srp.members.739513528.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
runtime.totem.pg.mrp.srp.members.739513590.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
runtime.totem.pg.mrp.srp.members.739513590.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
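
The members output above can also be checked mechanically. A small sketch (the
sample text is copied from the output above) that groups the cmap keys per
nodeid and confirms both members are joined:

```python
import re

# Sample lines taken from the corosync-cmapctl output above.
CMAP = """\
runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
"""

def parse_members(text: str) -> dict:
    """Collect per-nodeid attributes from `corosync-cmapctl | grep members` output."""
    members = {}
    pat = re.compile(r"members\.(\d+)\.(\w+) \(\w+\) = (.*)")
    for line in text.splitlines():
        m = pat.search(line)
        if m:
            nodeid, key, value = m.groups()
            members.setdefault(int(nodeid), {})[key] = value
    return members

members = parse_members(CMAP)
assert all(attrs["status"] == "joined" for attrs in members.values())
print(sorted(members))  # [739513528, 739513590]
```

Both node IDs show up with status "joined", so corosync membership itself is fine.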


# uname -n
hack1.example.com

# corosync-cfgtool -s
Printing ring status.
Local node ID 739513528
RING ID 0
id  = 172.20.20.184
status  = ring 0 active with no faults


# uname -n
hack2.example.com


# corosync-cfgtool -s
Printing ring status.
Local node ID 739513590
RING ID 0
id  = 172.20.20.246
status  = ring 0 active with no faults




Shouldn’t I see both nodes in the same ring?



My corosync config is currently defined as:

# egrep -v '#' /etc/corosync/corosync.conf
totem {
version: 2

crypto_cipher: none
crypto_hash: none
clear_node_high_bit: yes
cluster_name: hack_cluster
interface {
ringnumber: 0
bindnetaddr: 172.20.0.0
mcastaddr: 239.255.1.1
mcastport: 5405
ttl: 1
}

}

logging {
fileline: on
to_stderr: no
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}

# cat /etc/corosync/service.d/pacemaker
service {
name: pacemaker
ver: 1
}



And here is my pacemaker configuration:

# crm config show xml
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.4" epoch="13" admin_epoch="0"
 update-client="cibadmin" update-user="root" cib-last-written="Thu Dec 10 13:35:06 2015">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="..."/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="..."/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="739513528" uname="hack1.example.com">
        <instance_attributes id="hack1.example.com-instance_attributes">
          <nvpair id="hack1.example.com-instance_attributes-standby" name="standby" value="..."/>
        </instance_attributes>
      </node>
      <node id="739513590" uname="hack2.example.com">
        <instance_attributes id="hack2.example.com-instance_attributes">
          <nvpair id="hack2.example.com-instance_attributes-standby" name="standby" value="..."/>
        </instance_attributes>
      </node>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>

And finally some logs that might be relevant: 

Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [MAIN  ] main.c:1227 
Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Dec 10 13:38:50 [2227] hack1.example.com corosync info[MAIN  ] main.c:1228 
Corosync built-in features: pie relro bindnow
Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [TOTEM ] 
totemnet.c:248 Initializing transport (UDP/IP Multicast).
Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [TOTEM ] 
totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none 
hash: none
Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [TOTEM ] 
totemudp.c:671 The network interface [172.20.20.184] is now up.
Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [SERV  ] 
service.c:174 Service engine loaded: corosync configuration map access [0]
Dec 10 13:38:50 [2227] hack1.example.com corosync info[QB] 
ipc_setup.c:377 server name: cmap
Dec 10 13:38:50 [2227] hack1.example.com corosync notice  [SERV  ] 
service.c:174 Service engine loaded: corosync co

Re: [ClusterLabs] duplicate node

2015-12-10 Thread Dejan Muhamedagic
Hi,

On Tue, Dec 08, 2015 at 09:17:27PM +, gerry kernan wrote:
> Hi 
>  
> How would I remove a duplicate node? I have a 2-node setup, but one node is
> showing twice. crm configure show output is below; node gat-voip-01.gdft.org is
> listed twice.
>  
>  
> node $id="0dc85a64-01ad-4fc5-81fd-698208a8322c" gat-voip-02\
> attributes standby="on"
> node $id="3b5d1061-8f68-4ab3-b169-e0ebe890c446" gat-voip-01
> node $id="ae4d76e7-af64-4d93-acdd-4d7b5c274eff" gat-voip-01\
> attributes standby="off"

First you need to figure out which one is the old uuid, then try:

# crm node delete <uuid>

This looks like Heartbeat; there used to be a crm_uuid tool or
something similar to read the uuid. There's also a uuid file
somewhere in /var/lib/heartbeat.

Thanks,

Dejan

> primitive res_Filesystem_rep ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/rep" fstype="ext3" \
> operations $id="res_Filesystem_rep-operations" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60" \
> op monitor interval="20" timeout="40" start-delay="0" \
> op notify interval="0" timeout="60" \
> meta target-role="started" is-managed="true"
> primitive res_IPaddr2_northIP ocf:heartbeat:IPaddr2 \
> params ip="10.75.29.10" cidr_netmask="26" \
> operations $id="res_IPaddr2_northIP-operations" \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="20" \
> op monitor interval="10" timeout="20" start-delay="0" \
> meta target-role="started" is-managed="true"
> primitive res_IPaddr2_sipIP ocf:heartbeat:IPaddr2 \
> params ip="158.255.224.226" nic="bond2" \
> operations $id="res_IPaddr2_sipIP-operations" \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="20" \
> op monitor interval="10" timeout="20" start-delay="0" \
> meta target-role="started" is-managed="true"
> primitive res_asterisk_res_asterisk lsb:asterisk \
> operations $id="res_asterisk_res_asterisk-operations" \
> op start interval="0" timeout="15" \
> op stop interval="0" timeout="15" \
> op monitor interval="15" timeout="15" start-delay="15" \
> meta target-role="started" is-managed="true"
> primitive res_drbd_1 ocf:linbit:drbd \
> params drbd_resource="r0" \
> operations $id="res_drbd_1-operations" \
> op start interval="0" timeout="240" \
> op promote interval="0" timeout="90" \
> op demote interval="0" timeout="90" \
> op stop interval="0" timeout="100" \
> op monitor interval="10" timeout="20" start-delay="0" \
> op notify interval="0" timeout="90"
> primitive res_httpd_res_httpd lsb:httpd \
> operations $id="res_httpd_res_httpd-operations" \
> op start interval="0" timeout="15" \
> op stop interval="0" timeout="15" \
> op monitor interval="15" timeout="15" start-delay="15" \
> meta target-role="started" is-managed="true"
> primitive res_mysqld_res_mysql lsb:mysqld \
> operations $id="res_mysqld_res_mysql-operations" \
> op start interval="0" timeout="15" \
> op stop interval="0" timeout="15" \
> op monitor interval="15" timeout="15" start-delay="15" \
> meta target-role="started"
> group asterisk res_Filesystem_rep res_IPaddr2_northIP res_IPaddr2_sipIP 
> res_mysqld_res_mysql res_httpd_res_httpd res_asterisk_res_asterisk
> ms ms_drbd_1 res_drbd_1 \
> meta clone-max="2" notify="true" interleave="true" 
> resource-stickiness="100"
> location loc_res_httpd_res_httpd_gat-voip-01.gdft.org asterisk inf: 
> gat-voip-01.gdft.org
> location loc_res_mysqld_res_mysql_gat-voip-01.gdft.org asterisk inf: 
> gat-voip-01.gdft.org
> colocation col_res_Filesystem_rep_ms_drbd_1 inf: asterisk ms_drbd_1:Master
> order ord_ms_drbd_1_res_Filesystem_rep inf: ms_drbd_1:promote asterisk:start
> property $id="cib-bootstrap-options" \
> stonith-enabled="false" \
> dc-version="1.0.12-unknown" \
> no-quorum-policy="ignore" \
> cluster-infrastructure="Heartbeat" \
> last-lrm-refresh="1345727614"
>  
>  
>  
> Gerry Kernan
>  
>  
> Infinity IT   |   17 The Mall   |   Beacon Court   |   Sandyford   |   Dublin 
> D18 E3C8   |   Ireland
> Tel:  +353 - (0)1 - 293 0090   |   E-Mail:  gerry.ker...@infinityit.ie
>  
> Managed IT Services: Infinity IT - www.infinityit.ie
> IP Telephony: Asterisk Consulting - www.asteriskconsulting.com
> Contact Centre: Total Interact - www.totalinteract.com
>  


