On Tue, Jul 6, 2010 at 1:53 PM,  <[email protected]> wrote:
>
> Hello,
>
> I've built a cluster with just two nodes. Both of them see each other, but
> they won't come online. This is my config:
>
> interface {
>         bindnetaddr:    172.28.87.0
>         mcastaddr:      226.94.1.1
>         mcastport:      5420
>         ringnumber:     0
> }
> Both nodes have the same config.
> ..
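A quick sanity check for the interface block above. corosync.conf(5) describes bindnetaddr as the network address to bind to (for 192.168.5.92/255.255.255.0 that is 192.168.5.0). It also says corosync receives multicast on mcastport and sends from mcastport - 1, which matches the 5419 > 5420 traffic in the capture further down. The sketch below assumes a /24 netmask, which the post does not state, and takes the node addresses .64 and .66 from that capture.

```python
import ipaddress


def expected_bindnetaddr(node_addr: str) -> str:
    """Return the network address that bindnetaddr should hold for an
    interface address given in CIDR form, e.g. '172.28.87.64/24'."""
    return str(ipaddress.ip_interface(node_addr).network.network_address)


def totem_ports(mcastport: int) -> tuple[int, int]:
    """Return (send_port, receive_port): corosync sends from mcastport - 1
    and receives on mcastport, per corosync.conf(5)."""
    return mcastport - 1, mcastport


if __name__ == "__main__":
    # The /24 prefix is an assumption; the post does not give the netmask.
    for node in ("172.28.87.64/24", "172.28.87.66/24"):
        print(node, "->", expected_bindnetaddr(node))
    send, recv = totem_ports(5420)
    print(f"send port {send}, receive port {recv}")
```

If both nodes map to 172.28.87.0, the bindnetaddr line is consistent with the addresses seen on the wire.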
>
> # crm_mon --one-shot
> ============
> Last updated: Tue Jul  6 13:38:39 2010
> Stack: openais
> Current DC: NONE
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
>
> OFFLINE: [ lis01 lis11 ]
> ..
>
>
> I captured the traffic with tcpdump:
> ...
> 13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
> 13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
> 13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> 13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
> ....
>
> Most of the time, only the .64 node is sending packets. Only this excerpt
> shows the .66 node, and only after a long gap.
> A tcpdump on the other node looks nearly the same: there too, .64 sends most
> of the packets.
>
> When I stop openais (corosync) on .64, the other node sends continuously
> until .64 is online again.
> So it seems that both nodes see each other.
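To put a number on "most of the time", here is a short sketch that counts packets per source address in tcpdump's one-line text output. The regular expression assumes exactly the UDP line format shown in the capture above. Verbose output, name resolution, or IPv6 would need a different pattern.

```python
import re
from collections import Counter

# Matches lines such as:
# "13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119"
LINE = re.compile(
    r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)


def senders(lines):
    """Count packets per source IP address; lines that don't match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    capture = """\
13:40:15.870996 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.085725 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 75
13:40:16.086270 IP 172.28.87.66.5419 > 226.94.1.1.5420: UDP, length 919
13:40:16.296619 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.539215 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
13:40:16.773796 IP 172.28.87.64.5419 > 226.94.1.1.5420: UDP, length 119
"""
    for ip, n in senders(capture.splitlines()).most_common():
        print(ip, n)
```

Feeding it a longer capture from each node would show whether the imbalance holds over minutes rather than a single excerpt.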
>
>
> The syslog output:
>
>  # tail -f /var/log/messages
> Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times
> Jul  6 13:42:57 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> Jul  6 13:42:57 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
> Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times
> Jul  6 13:42:59 lis11 crmd: [13107]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
> Jul  6 13:42:59 lis11 crmd: [13107]: WARN: lrm_signon: can not initiate connection
> ... and so on
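The sign-on failures above are about two seconds apart and carry a "(30 max)" limit. The sketch below extracts the attempt counters from syslog lines and extrapolates how long crmd keeps retrying before, presumably, giving up. It assumes unwrapped syslog lines and a roughly constant retry interval. Both are inferences from the excerpt, not documented crmd behaviour.

```python
import re
from datetime import datetime

# Captures the time of day and "N (M max)" from a crmd sign-on failure line.
SIGNON = re.compile(
    r"(\d\d:\d\d:\d\d) .*Failed to sign on to the LRM (\d+) \((\d+) max\) times"
)


def signon_attempts(lines):
    """Yield (time_of_day, attempt, max_attempts) for each LRM sign-on failure."""
    for line in lines:
        m = SIGNON.search(line)
        if m:
            # Only the time of day is parsed; intervals within one day are enough.
            ts = datetime.strptime(m.group(1), "%H:%M:%S")
            yield ts, int(m.group(2)), int(m.group(3))


def estimate_remaining_seconds(attempts):
    """Extrapolate the seconds left until max_attempts from the average interval."""
    if len(attempts) < 2:
        return None
    (t0, a0, _), (t1, a1, limit) = attempts[0], attempts[-1]
    if a1 <= a0:
        return None
    per_attempt = (t1 - t0).total_seconds() / (a1 - a0)
    return (limit - a1) * per_attempt


if __name__ == "__main__":
    log = [
        "Jul  6 13:42:55 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 6 (30 max) times",
        "Jul  6 13:42:57 lis11 crmd: [13107]: WARN: do_lrm_control: Failed to sign on to the LRM 7 (30 max) times",
    ]
    attempts = list(signon_attempts(log))
    print("estimated seconds until the limit:", estimate_remaining_seconds(attempts))
```

With the two failures quoted above (attempts 6 and 7, two seconds apart), this extrapolates (30 - 7) * 2 s = 46 s. That is short enough that the later "Child process crmd exited" lines plausibly follow from these retries running out.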

So did you check if the lrmd was running (and if not, why not)?
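On the node itself, `pgrep -x lrmd` answers that directly. The Python sketch below does roughly the same thing by scanning /proc/<pid>/stat for a process whose command name is exactly "lrmd". It assumes the daemon runs under that name. It reads stat rather than /proc/<pid>/comm because proc(5) lists comm as available only since Linux 2.6.33, and the reported kernel is 2.6.32. The kernel truncates command names to 15 characters, which does not matter for "lrmd".

```python
import os


def comm_from_stat(stat_text: str) -> str:
    """Extract the command name from a /proc/<pid>/stat line. The name sits
    between the first '(' and the last ')' and may contain spaces or parens."""
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def find_pids(name: str, proc_root: str = "/proc") -> list[int]:
    """Return the sorted PIDs whose command name equals `name` exactly."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if comm_from_stat(f.read()) == name:
                    pids.append(int(entry))
        except (OSError, ValueError):
            # The process exited during the scan, or the stat line was unexpected.
            continue
    return sorted(pids)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        pids = find_pids("lrmd")
        print("lrmd PIDs:", pids or "none found")
    else:
        print("no /proc filesystem here; this check is Linux-only")
```

If nothing is found, the syslog entries just before the first "Failed to sign on to the LRM" line are the place to look for why lrmd did not start.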


> Jul  6 13:46:17 lis11 cib: [13507]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x68eba0, async-conn=0x68eba0) left
> Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=15909, rc=2)
> Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child respawn count exceeded by crmd
> Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] info: update_member: Node hhloklis11 now has process list: 00000000000000000000000000111112 (1118482)
> Jul  6 13:46:17 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Jul  6 13:47:06 lis11 corosync[13445]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
> Jul  6 13:47:54 lis11 cib: [13507]: info: cib_stats: Processed 28 operations (1071.00us average, 0% utilization) in the last 10min
> ....
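The "process list" field in the update_member line reads as a hexadecimal bitmask with its decimal value in parentheses: 0x111112 = 1118482, which checks out. The sketch below decodes such a field into its set bits. Which daemon each bit stands for is defined in the pacemaker plugin's source and is deliberately not guessed here. Comparing this mask with one logged while crmd was still running would show which bit was cleared.

```python
def decode_process_list(field: str) -> tuple[int, list[int]]:
    """Parse a hex process-list mask; return (value, values of the set bits)."""
    value = int(field, 16)
    bits = [1 << i for i in range(value.bit_length()) if (value >> i) & 1]
    return value, bits


if __name__ == "__main__":
    value, bits = decode_process_list("00000000000000000000000000111112")
    print(value)  # 1118482, the decimal value shown in the log
    print([hex(b) for b in bits])
```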
>
>
>
> OS is SuSE SLES11 SP1
>
> pacemaker-1.1.2-0.2.1
> pacemaker-mgmt-2.0.0-0.2.19
> corosync-1.2.1-0.5.1
> libcorosync4-1.2.1-0.5.1
> openais-1.1.2-0.5.19
> libopenais3-1.1.2-0.5.19
>
> openais config is empty.
>
>
> Kernel: 2.6.32.12-0.7-default      x86_64
>
>
> Any help?
>
>
> Thomas Schreiber
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
>