Hello,

I've built a cluster with just two nodes. Both nodes see each other, but
they won't come online. This is my config:

interface {
        bindnetaddr:    172.28.87.0
        mcastaddr:      226.94.1.1
        mcastport:      5420
        ringnumber:     0
}
Both nodes have the same config.
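One quick sanity check for this config is whether bindnetaddr actually falls on the same network as each node's cluster address. The sketch below compares the two after applying a netmask, which is roughly how corosync 1.x picks the interface to bind to, as far as I understand it. The /24 prefix is an assumption (the netmask is not stated above), the node addresses .64 and .66 come from the tcpdump below, and bindnetaddr_matches() is a hypothetical helper name.

```python
import ipaddress


def bindnetaddr_matches(bindnetaddr: str, node_ip: str, prefixlen: int) -> bool:
    """Return True if bindnetaddr and node_ip fall in the same network.

    Both addresses are masked with the same prefix length (assumed to be
    the netmask of the node's cluster interface) and the resulting
    networks are compared.
    """
    bind_net = ipaddress.ip_interface(f"{bindnetaddr}/{prefixlen}").network
    node_net = ipaddress.ip_interface(f"{node_ip}/{prefixlen}").network
    return bind_net == node_net


# Assumed /24 netmask on both nodes (not stated in the original post).
for ip in ("172.28.87.64", "172.28.87.66"):
    print(ip, bindnetaddr_matches("172.28.87.0", ip, 24))
```

With a /24 mask both nodes match 172.28.87.0; with a narrower mask such as /26, the .64 node would land on 172.28.87.64 instead and no longer match.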
..

# crm_mon --one-shot
============
Last updated: Tue Jul  6 13:38:39 2010
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
1 Resources configured.
============

OFFLINE: [ lis01 lis11 ]
..


I made a tcpdump:
...
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
....

Most of the time, only the .64 node is sending packets; this excerpt shows
the .66 node only after a long gap.
A tcpdump on the other node looks nearly the same: there, too, .64 sends
most of the packets.
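To put a number on "most of the packets", the tcpdump text output can be tallied per source address. A minimal sketch, assuming the default one-line tcpdump format shown above; senders() and SAMPLE are hypothetical names, and SAMPLE is just the excerpt from this mail.

```python
import re
from collections import Counter

# Matches tcpdump lines such as:
# 13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length (?P<length>\d+)"
)


def senders(lines):
    """Count UDP packets per source IP address in tcpdump text output."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip "..." and any non-matching lines
            counts[match.group("src")] += 1
    return counts


SAMPLE = """\
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
"""

print(senders(SAMPLE.splitlines()).most_common())
```

Feeding it a longer capture (for example the saved output of `tcpdump -n`) gives the per-node ratio over the whole run instead of this short excerpt.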

When I stop openais (corosync) on .64, the other node sends continuously
until .64 is online again.
So it seems that both nodes see each other.
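Whether each node really receives the other's multicast traffic can also be checked directly, independent of corosync. A minimal Python sketch of a multicast listener for the group and port from the config above, assuming it runs on a node where corosync is stopped (so nothing else holds port 5420) and that the default multicast interface is the cluster NIC; otherwise pass that node's own address as iface_addr. make_mreq() and listen() are hypothetical helper names.

```python
import socket
import struct

# Group and port copied from the corosync "interface" section.
MCAST_GROUP = "226.94.1.1"
MCAST_PORT = 5420


def make_mreq(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq struct: multicast group address + local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def listen(group=MCAST_GROUP, port=MCAST_PORT, iface_addr="0.0.0.0",
           max_packets=20, timeout=5.0):
    """Join the group and count received datagrams per sender address.

    Stops after max_packets datagrams or after `timeout` seconds of silence.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, iface_addr))
    sock.settimeout(timeout)
    seen = {}
    try:
        for _ in range(max_packets):
            try:
                _data, (src, _sport) = sock.recvfrom(65535)
            except socket.timeout:
                break
            seen[src] = seen.get(src, 0) + 1
    finally:
        sock.close()
    return seen
```

Calling `listen()` on the stopped node should report the other node's address if its multicast packets arrive there; an empty result would point at the network (switch or IGMP snooping) rather than at corosync itself.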


The syslog output:

# tail -f /var/log/messages
Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
Jul  6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul  6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
Jul  6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
Jul  6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
... and so on
Jul  6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x68eba0, async-conn=0x68eba0) left
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=15909, rc=2)
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child respawn count exceeded by crmd
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: update_member: Node hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul  6 13:47:06 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jul  6 13:47:54 lis11 cib: [13507]: info: cib_stats: Processed 28 operations (1071.00us average, 0% utilization) in the last 10min
....
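Two details in this log can be pulled out mechanically: how far crmd got in its LRM sign-on retries (the "N (30 max)" counter), and the process list that the pcmk plugin reports, which is printed both as a hex mask and as its decimal value. A minimal sketch that extracts both from syslog lines; the helper names and SAMPLE_LOG are hypothetical, the lines are taken from the excerpt above with email wrapping removed, and the sketch only lists which bits are set without claiming which daemon each bit stands for.

```python
import re

LRM_RE = re.compile(r"Failed to sign on to the LRM (\d+) \((\d+) max\) times")
PROC_RE = re.compile(r"now has process list: ([0-9a-fA-F]+) \((\d+)\)")


def lrm_attempts(lines):
    """Return (attempt, max_attempts) tuples from crmd sign-on warnings."""
    out = []
    for line in lines:
        match = LRM_RE.search(line)
        if match:
            out.append((int(match.group(1)), int(match.group(2))))
    return out


def process_lists(lines):
    """Return (hex_mask, decimal_value) pairs from pcmk update_member lines."""
    out = []
    for line in lines:
        match = PROC_RE.search(line)
        if match:
            out.append((match.group(1), int(match.group(2))))
    return out


def decode_process_list(hex_mask):
    """Convert the hex mask to an int and list the indices of the set bits."""
    value = int(hex_mask, 16)
    bits = [i for i in range(value.bit_length()) if (value >> i) & 1]
    return value, bits


SAMPLE_LOG = """\
Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: update_member: Node hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
"""

lines = SAMPLE_LOG.splitlines()
print(lrm_attempts(lines))
for hex_mask, decimal in process_lists(lines):
    print(hex_mask, decimal, decode_process_list(hex_mask))
```

The decoded value equals the decimal number printed in parentheses, so the two fields in the log are the same mask in two notations.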



OS is SuSE SLES11 SP1

pacemaker-1.1.2-0.2.1
pacemaker-mgmt-2.0.0-0.2.19
corosync-1.2.1-0.5.1
libcorosync4-1.2.1-0.5.1
openais-1.1.2-0.5.19
libopenais3-1.1.2-0.5.19

The openais config is empty.


Kernel: 2.6.32.12-0.7-default      x86_64


Any help?


Thomas Schreiber
_______________________________________________
Openais mailing list
https://lists.linux-foundation.org/mailman/listinfo/openais
