On Fri, 2010-04-09 at 07:45 -0400, Vadym Chepkov wrote:
> Hi,
>
> I experience this issue on every cluster I have, not just this one, so it
> could be a common misconfiguration on my part.
> I am using the latest version of corosync:
>
Broadcast and redundant ring probably don't work too well together. If
you really want to use broadcast, take care to ensure the port numbers
are separated by 2. In your config, you're using port 5405 for one ring
and 5406 for another. Internally totem uses both mcastport and
mcastport - 1, so it will use 5405+5404 for one ring and 5406+5405 for
the other, which means both rings end up on port 5405. With multicast
this isn't a problem since you could use different multicast addresses.
With broadcast, this is not the case.
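As a rough sketch of what I mean, the two interface sections could look
something like the following. Everything is copied from your config
except the second mcastport; 5407 is only an assumed value, so pick any
port that is free on your hosts and at least 2 away from the first one.

```
interface {
        ringnumber: 0
        broadcast: yes
        bindnetaddr: 10.0.0.0
        # totem uses 5405 and 5404 for this ring
        mcastport: 5405
}
interface {
        ringnumber: 1
        broadcast: yes
        bindnetaddr: 207.207.163.0
        # assumed free port; totem uses 5407 and 5406 for this ring
        mcastport: 5407
}
```

With that spacing the two rings no longer share a port.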
Try fixing that and report back if it helps. If not we can further
investigate.
Regards
-steve
> corosync-1.2.1-1.el5
>
> Here is my config:
>
> compatibility: none
>
> aisexec {
> user: root
> group: root
> }
>
> service {
> name: pacemaker
> ver: 0
> }
>
> totem {
> version: 2
> token: 5000
> token_retransmits_before_loss_const: 20
> join: 1000
> consensus: 7500
> vsftype: none
> max_messages: 20
> secauth: off
> threads: 0
> clear_node_high_bit: yes
> rrp_mode: passive
> interface {
> ringnumber: 0
> broadcast: yes
> bindnetaddr: 10.0.0.0
> mcastport: 5405
> }
> interface {
> ringnumber: 1
> broadcast: yes
> bindnetaddr: 207.207.163.0
> mcastport: 5406
> }
> }
>
> logging {
> fileline: off
> to_stderr: no
> to_syslog: yes
> debug: on
> timestamp: on
> }
>
> amf {
> mode: disabled
> }
>
> [r...@xen-11 ~]# ifconfig
> eth0 Link encap:Ethernet HWaddr 00:30:48:62:4E:DC
> inet addr:207.207.163.11 Bcast:207.207.163.255 Mask:255.255.255.0
> inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
> TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:1428434820 (1.3 GiB) TX bytes:664164837 (633.3 MiB)
>
> eth1 Link encap:Ethernet HWaddr 00:30:48:62:4E:DD
> inet addr:10.0.0.1 Bcast:10.0.0.3 Mask:255.255.255.252
> inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
> TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:518593446 (494.5 MiB) TX bytes:14199338528 (13.2 GiB)
> Memory:d8060000-d8080000
>
> [r...@xen-12 ~]# ifconfig
> eth0 Link encap:Ethernet HWaddr 00:30:48:62:4C:CA
> inet addr:207.207.163.12 Bcast:207.207.163.255 Mask:255.255.255.0
> inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
> TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:698444593 (666.0 MiB) TX bytes:1145344594 (1.0 GiB)
>
> eth1 Link encap:Ethernet HWaddr 00:30:48:62:4C:CB
> inet addr:10.0.0.2 Bcast:10.0.0.3 Mask:255.255.255.252
> inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
> TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:14138136203 (13.1 GiB) TX bytes:493569061 (470.7 MiB)
> Memory:d8060000-d8080000
>
> Cross-over connection on eth1
>
> I don't see much detail in the message log; I probably need to increase
> the debug level
>
> [r...@xen-12 ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 33554442
> RING ID 0
> id = 10.0.0.2
> status = ring 0 active with no faults
> RING ID 1
> id = 207.207.163.12
> status = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY -
> adminisrtative intervention required.
>
>
> I can reset it just fine
>
> [r...@xen-12 ~]# corosync-cfgtool -r
> Re-enabling all failed rings.
> [r...@xen-12 ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 33554442
> RING ID 0
> id = 10.0.0.2
> status = ring 0 active with no faults
> RING ID 1
> id = 207.207.163.12
> status = ring 1 active with no faults
>
> But it goes into FAULTY mode almost right away:
>
> Apr 9 11:40:56 xen-12 corosync[13835]: [TOTEM ] Marking seqid 18340 ringid
> 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
>
> That's the only message from corosync in the log.
>
> Thank you,
> Vadym Chepkov
>
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais