Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

Denis Gribkov Wed, 08 Mar 2017 08:33:43 -0800

Hi Everyone,

So my question is: why didn't Corosync work correctly when it was started under Pacemaker?


Finally I found the answer.

In short, it was caused by incorrect settings in the local name resolver.

/etc/cluster/cluster.conf contains similar settings for each node:

<clusternode name="NODE17.local" nodeid="17">
  <fence>
    <method name="pcmk-redirect">
      <device name="pcmk" port="NODE17"/>
    </method>
  </fence>
  <altname name="NODE17.pub"/>
</clusternode>

where
NODE17.local points to the private IP address
NODE17.pub points to the public IP address

Due to a configuration error on the most recently added node, NODE17, the internal name NODE17.local incorrectly resolved to the public IP address (that of NODE17.pub). In that case, 'corosync-cfgtool -s' on NODE17 printed a status like this:

Local node ID 17
RING ID 0
        id      = 111.11.11.1
        Marking ringid 0 interface 111.11.11.1 FAULTY
RING ID 1
        id      = 111.11.11.1
        status  = ring 1 active with no faults

RING ID 0 was also marked as FAULTY on all 17 nodes.
After name resolution was fixed, both rings operated as expected.
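
For anyone who wants to check for the same mistake, here is a minimal sketch in Python (my assumption, not something I ran during the troubleshooting above). It asks the local resolver for the addresses behind the ring 0 name and the altname and reports any overlap. The function names resolve_ipv4 and check_ring_names are made up for this example; only socket.getaddrinfo comes from the standard library.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is (address, port).
    return {info[4][0] for info in infos}


def check_ring_names(ring0_name, ring1_name, resolve=resolve_ipv4):
    """Return a list of problem descriptions; empty if no overlap is found.

    'resolve' is injectable so the check can be tried without real DNS.
    """
    problems = []
    shared = resolve(ring0_name) & resolve(ring1_name)
    if shared:
        problems.append(
            f"{ring0_name} and {ring1_name} both resolve to "
            f"{', '.join(sorted(shared))}"
        )
    return problems
```

Calling check_ring_names("NODE17.local", "NODE17.pub") on the affected node would be expected to return one message while the resolver is misconfigured and an empty list once it is fixed. The sketch only compares the two names against each other; it does not verify that the private name actually lands in the private network.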

I think the error would have been resolved much more quickly if I had used newer pacemaker/corosync versions, since both have had many improvements and fixes applied to their logging subsystems.

Thanks to everyone who took the time to help me, and be careful with DNS names when you're using a similar configuration :)


--

Regards Denis Gribkov

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
