On Tue, Jul 6, 2010 at 1:53 PM, <[email protected]> wrote:
>
> Hello,
>
> I've build a cluster with just two nodes, both of them see each other, but
> they don't like to go online. This is my config:
>
> interface {
> bindnetaddr: 172.28.87.0
> mcastaddr: 226.94.1.1
> mcastport: 5420
> ringnumber: 0
> }
> Both nodes have the same config.
> ..
>
> # crm_mon --one-shot
> ============
> Last updated: Tue Jul 6 13:38:39 2010
> Stack: openais
> Current DC: NONE
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> OFFLINE: [ lis01 lis11 ]
> ..
>
>
> I made a tcpdump:
> ...
> 13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
> 13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
> 13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> ....
>
> most of the time, just the .64 node is sending packets. Just this cut shows
> after long time the .66 node
> This tcpdump is one the other node near the same, also .64 sends most of the
> packets.
>
> When I stop openais(corosync) on .64 the other node send all the time until
> the .64 is online again.
> That seems that both see each other.
>
>
> The syslog output:
>
> # tail -f /var/log/messages
> Jul 6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on
> to the LRM 6 (30 max) times
> Jul 6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> Jul 6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate
> connection
> Jul 6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on
> to the LRM 7 (30 max) times
> Jul 6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer
> (I_NULL) just popped!
> Jul 6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate
> connection
> ... and so on
So did you check if the lrmd was running (and if not, why not)?

> Jul 6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to
> crmd failed: reply failed
> Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] info: pcmk_ipc_exit:
> Client crmd (conn=0x68eba0, async-conn=0x68eba0) left
> Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] ERROR: pcmk_wait_dispatch:
> Child process crmd exited (pid=15909, rc=2)
> Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] ERROR: pcmk_wait_dispatch:
> Child respawn count exceeded by crmd
> Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] info: update_member: Node
> hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
> Jul 6 13:46:17 lis11 corosync[13445]: [pcmk ] WARN: route_ais_message:
> Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Jul 6 13:47:06 lis11 corosync[13445]: [pcmk ] WARN: route_ais_message:
> Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Jul 6 13:47:54 lis11 cib: [13507]: info: cib_stats: Processed 28 operations
> (1071.00us average, 0% utilization) in the last 10min
> ....
>
>
>
> OS is SuSE SLES11 SP1
>
> pacemaker-1.1.2-0.2.1
> pacemaker-mgmt-2.0.0-0.2.19
> corosync-1.2.1-0.5.1
> libcorosync4-1.2.1-0.5.1
> openais-1.1.2-0.5.19
> libopenais3-1.1.2-0.5.19
>
> openais config is empty.
>
>
> Kernel: 2.6.32.12-0.7-default x86_64
>
>
> Any help?
>
>
> Thomas Schreiber
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>
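One way to answer the "is lrmd running" question on a Linux node is to look for the process directly. The following is only a minimal Python sketch, not part of Pacemaker or its tooling: the helper name find_pids_by_name and its proc_root parameter are hypothetical, and it assumes that the daemon's executable is called lrmd and that /proc/<pid>/cmdline is readable by the user running it.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose argv[0] basename equals `name` (Linux /proc layout)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue  # process exited in the meantime, or file not readable
        # Kernel threads have an empty cmdline, so argv[0] is b"" for them.
        exe = os.path.basename(argv[0].decode(errors="replace"))
        if exe == name:
            pids.append(int(entry))
    return sorted(pids)
```

Calling find_pids_by_name("lrmd") on each node returns an empty list when no such process exists. In that case the next step would be to search /var/log/messages for entries from lrmd itself to see why it did not start or why it exited.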
