Hello,
I've built a cluster with just two nodes. Both of them see each other, but
they won't come online. This is my config:
interface {
    bindnetaddr: 172.28.87.0
    mcastaddr: 226.94.1.1
    mcastport: 5420
    ringnumber: 0
}
Both nodes have the same config.
..
# crm_mon --one-shot
============
Last updated: Tue Jul 6 13:38:39 2010
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
1 Resources configured.
============
OFFLINE: [ lis01 lis11 ]
..
I ran tcpdump:
...
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
....
Most of the time only the .64 node is sending packets. This excerpt just
happens to show the .66 node, which sends only after a long pause.
The tcpdump on the other node looks nearly the same: .64 again sends most of
the packets.
When I stop openais (corosync) on .64, the other node sends continuously
until .64 is online again.
So it seems that both nodes see each other.
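To put numbers on that impression, here is a minimal Python sketch that tallies packets per sender from the six capture lines quoted above. The names CAPTURE, LINE_RE and count_senders are only illustrative, and the regex assumes tcpdump's default one-line "IP src.port > dst.port:" output format.

```python
import re
from collections import Counter

# The six lines copied from the tcpdump excerpt in this mail.
CAPTURE = """\
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
"""

# Assumed line shape: "<timestamp> IP <src addr>.<src port> > <dst>..."
LINE_RE = re.compile(r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > ")


def count_senders(text):
    """Return a Counter mapping source IPv4 address to packet count."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # skip anything that is not a packet line
            counts[match.group(1)] += 1
    return counts


senders = count_senders(CAPTURE)
for addr, packets in senders.most_common():
    print(addr, packets)
```

On this short excerpt it counts 4 packets from .64 and 2 from .66. Feeding it a longer capture would show how lopsided the traffic really is.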
The syslog output:
# tail -f /var/log/messages
Jul 6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
Jul 6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul 6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
Jul 6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
Jul 6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul 6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
... and so on
Jul 6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x68eba0, async-conn=0x68eba0) left
Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=15909, rc=2)
Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] ERROR: pcmk_wait_dispatch: Child respawn count exceeded by crmd
Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] info: update_member: Node hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 6 13:47:06 lis11 corosync[13445]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul 6 13:47:54 lis11 cib: [13507]: info: cib_stats: Processed 28 operations (1071.00us average, 0% utilization) in the last 10min
....
OS is SuSE SLES11 SP1
pacemaker-1.1.2-0.2.1
pacemaker-mgmt-2.0.0-0.2.19
corosync-1.2.1-0.5.1
libcorosync4-1.2.1-0.5.1
openais-1.1.2-0.5.19
libopenais3-1.1.2-0.5.19
openais config is empty.
Kernel: 2.6.32.12-0.7-default x86_64
Any help?
Thomas Schreiber