Hi All,

in a test that we started last week we have two Pacemaker+Corosync clusters, each with three hosts, and all six hosts are on the same network(s). The two clusters are identically configured, with one exception: the mcastport is 688 for one and 689 for the other.
This morning I found the clusters in a strange state: none of the hosts could see any of the others, i.e. Pacemaker output was "as if" Corosync wasn't running on the other nodes, although the network was fine, as I could easily verify with a ping etc.

I then noticed in the lsof output that Corosync seems to also use the port below the configured mcastport, which leads me to my questions:

Is this normal? It doesn't seem to be documented in http://corosync.org/doku.php?id=faq:configure_openais and corosync.conf(5).

Is this overlap created by the additional port a likely cause for the cluster conking out?

Thanks, Colin

PS: I'm in the process of trying to revive the cluster; /etc/init.d/corosync stop didn't work, but a few "kill -9" and "rm -f /var/lib/heartbeat/crm/*" commands later I'm up-and-running again on 2x2 of the 2x3 nodes with the same config as previously, looking fine so far...

r...@h001:~# dpkg -l | grep corosync
ii  corosync      1.2.0-0ubuntu1  Standards-based cluster framework (daemon an
ii  libcorosync4  1.2.0-0ubuntu1  Standards-based cluster framework (libraries

r...@h001:~# cat /etc/corosync/corosync.conf
totem {
    version: 2
    consensus: 1500
    vsftype: none
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.50.32
        broadcast: yes
        mcastport: 688    <=== 689 for the other cluster
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.52.32
        broadcast: yes
        mcastport: 688    <=== 689 for the other cluster
    }
}
amf {
    mode: disabled
}
service {
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: on
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

r...@h001:~# lsof -n | grep corosync | grep UDP
corosync 17688 root  5u IPv4 89563 0t0 UDP 255.255.255.255:688
corosync 17688 root  6u IPv4 89564 0t0 UDP 192.168.50.40:687
corosync 17688 root  7u IPv4 89565 0t0 UDP 192.168.50.40:688
corosync 17688 root  8u IPv4 89612 0t0 UDP 255.255.255.255:688
corosync 17688 root  9u IPv4 89613 0t0 UDP 192.168.52.40:687
corosync 17688 root 10u IPv4 89614 0t0 UDP 192.168.52.40:688
r...@h001:~#
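To make the suspected overlap concrete, here is a minimal Python sketch. It rests on one assumption taken only from the lsof output in this post: that each Corosync instance binds its configured mcastport and the port directly below it. The function names are hypothetical helpers for illustration and are not part of Corosync.

```python
def ports_in_use(mcastport):
    """UDP ports one Corosync instance appears to bind.

    Assumption (from the lsof output, not from documentation quoted here):
    the configured mcastport plus the port directly below it.
    """
    return {mcastport - 1, mcastport}


def overlapping_ports(mcastport_a, mcastport_b):
    """Ports that two clusters on the same network would both use."""
    return ports_in_use(mcastport_a) & ports_in_use(mcastport_b)


if __name__ == "__main__":
    # The two clusters from this post: mcastport 688 and 689.
    print(sorted(overlapping_ports(688, 689)))  # [688]
    # Leaving a gap of one port between the configured values avoids the overlap.
    print(sorted(overlapping_ports(688, 690)))  # []
```

Under that assumption, the two clusters here share port 688, while configured values that differ by at least two would not share any port.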
