On 12/10/2015 12:45 PM, Louis Munro wrote:
> Hello all,
>
> I am trying to get a Corosync 2 cluster going on CentOS 6.7, but I am running
> into a bit of a problem with either Corosync or Pacemaker.
> crm reports that all my nodes are offline and the stack is unknown (I am not
> sure if that is relevant).
>
> I believe both nodes are actually present and seen by corosync, but they may
> not be recognized as members by pacemaker.
> I have messages in the logs saying that the processes cannot get the node
> name and default to uname -n:
>
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info:
> corosync_node_name:Unable to get node name for nodeid 739513528
> Dec 10 13:38:53 [2236] hack1.example.com crmd: notice: get_node_name:
> Defaulting to uname -n for the local corosync node name
> Dec 10 13:38:53 [2236] hack1.example.com crmd: info: crm_get_peer:
> Node 739513528 is now known as hack1.example.com
>
> The uname -n output is correct, as far as that is concerned.
>
>
> Does this mean anything to anyone here?
>
>
> [Lots of details to follow]...
>
> I compiled my own versions of Corosync, Pacemaker, crm and the
> resource-agents seemingly without problems.
>
> Here is what I currently have installed:
>
> # corosync -v
> Corosync Cluster Engine, version '2.3.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> # pacemakerd -F
> Pacemaker 1.1.13 (Build: 5b41ae1)
> Supporting v3.0.10: generated-manpages agent-manpages ascii-docs ncurses
> libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native
> atomic-attrd libesmtp acls
>
> # crm --version
> crm 2.2.0-rc3
>
>
>
> Here is the output of crm status:
>
> # crm status
> Last updated: Thu Dec 10 12:47:50 2015
> Last change: Thu Dec 10 12:02:33 2015 by root via cibadmin on hack1.example.com
> Stack: unknown
> Current DC: NONE
> 2 nodes and 0 resources configured
>
> OFFLINE: [ hack1.example.com hack2.example.com ]
>
> Full list of resources:
>
> {nothing to see here}
>
>
>
> # corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.739513528.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513528.ip (str) = r(0) ip(172.20.20.184)
> runtime.totem.pg.mrp.srp.members.739513528.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513528.status (str) = joined
> runtime.totem.pg.mrp.srp.members.739513590.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.739513590.ip (str) = r(0) ip(172.20.20.246)
> runtime.totem.pg.mrp.srp.members.739513590.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.739513590.status (str) = joined
>
>
> # uname -n
> hack1.example.com
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513528
> RING ID 0
> id = 172.20.20.184
> status = ring 0 active with no faults
>
>
> # uname -n
> hack2.example.com
>
>
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 739513590
> RING ID 0
> id = 172.20.20.246
> status = ring 0 active with no faults
>
>
>
>
> Shouldn’t I see both nodes in the same ring?
They are in the same ring; corosync-cfgtool -s only prints the ring status
of the local node, so each host will show just its own ID there.
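If you want a single command that lists every member, and assuming you have
votequorum enabled (quorum { provider: corosync_votequorum } in
corosync.conf — your posted config does not have it yet), you can run:

```
# corosync-quorumtool -l
```

That prints a membership table with one row per node (Nodeid, Votes, Name),
which should show both 739513528 and 739513590 when the ring is healthy.
Your corosync-cmapctl output already confirms the same thing.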
> My corosync config is currently defined as:
>
> # egrep -v '#' /etc/corosync/corosync.conf
> totem {
> version: 2
>
> crypto_cipher: none
> crypto_hash: none
> clear_node_high_bit: yes
> cluster_name: hack_cluster
> interface {
> ringnumber: 0
> bindnetaddr: 172.20.0.0
> mcastaddr: 239.255.1.1
> mcastport: 5405
> ttl: 1
> }
>
> }
>
> logging {
> fileline: on
> to_stderr: no
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: yes
> debug: off
> timestamp: on
> logger_subsys {
> subsys: QUORUM
> debug: off
> }
> }
>
> # cat /etc/corosync/service.d/pacemaker
> service {
> name: pacemaker
> ver: 1
> }
You don't want this section if you're using corosync 2. That's the old
"plugin" used with corosync 1.
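Also, the "Unable to get node name for nodeid" messages are usually a sign
that there is no nodelist in corosync.conf, so pacemaker has nothing to map
the numeric nodeid back to a name with and falls back to uname -n. A sketch
of what you could add — the addresses and nodeids are taken from your
corosync-cmapctl output, and the quorum section is my assumption for a
two-node setup:

```
nodelist {
    node {
        ring0_addr: 172.20.20.184
        name: hack1.example.com
        nodeid: 739513528
    }
    node {
        ring0_addr: 172.20.20.246
        name: hack2.example.com
        nodeid: 739513590
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

With a nodelist in place (and the service.d plugin snippet removed), restart
corosync and pacemaker on both nodes and crm status should show a proper
stack and both nodes online.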
>
>
> And here is my pacemaker configuration:
>
> # crm config show xml
>
> [The XML output was mangled in transit — the tags were stripped. From the
> surviving fragments: the CIB (crm_feature_set="3.0.10",
> validate-with="pacemaker-2.4", epoch="13", admin_epoch="0", last written
> Thu Dec 10 13:35:06 2015 by root via cibadmin) sets the cluster properties
> stonith-enabled and no-quorum-policy, and defines a standby instance
> attribute for each of hack1.example.com and hack2.example.com. No
> resources are defined.]
>
> And finally some logs that might be relevant:
>
> Dec 10 13:38:50 [2227] hack1.example.com corosync notice [MAIN ]
> main.c:1227 Corosync Cluster Engine ('2.3.5'): started and ready to provide
> service.