I was optimistic too early. One of the interfaces is still marked FAULTY pretty quickly. I have tried a multicast address as well, with the same result, but with multicast the results are even more upsetting: the cluster members don't see each other, and each thinks the other node is dead, even though one of the interfaces in the ring is in a healthy state.
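For anyone following along, the port-spacing rule from Steven's reply quoted below can be checked with a quick sketch. It assumes each ring occupies two UDP ports, mcastport and mcastport - 1, which is my reading of the corosync.conf man page; the function names and the sample port numbers are only illustration, not my actual settings.

```python
def ring_ports(mcastport):
    """Ports assumed to be used by one ring: mcastport and mcastport - 1."""
    return {mcastport, mcastport - 1}


def conflicting_rings(mcastports):
    """Return (ring_a, ring_b, shared_ports) for every pair of rings that overlap."""
    conflicts = []
    for i in range(len(mcastports)):
        for j in range(i + 1, len(mcastports)):
            shared = ring_ports(mcastports[i]) & ring_ports(mcastports[j])
            if shared:
                conflicts.append((i, j, sorted(shared)))
    return conflicts


# Ports from the original config: both rings touch 5405.
print(conflicting_rings([5405, 5406]))
# Ports separated by 2: no overlap.
print(conflicting_rings([5405, 5407]))
```

With broadcast the overlap matters because both rings share the same destination address space per port; with multicast, distinct group addresses would keep the rings apart even on adjacent ports.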
Sincerely yours,
Vadym Chepkov

--- On Mon, 4/12/10, Vadym Chepkov <[email protected]> wrote:

> From: Vadym Chepkov <[email protected]>
> Subject: Re: [Openais] corosync ring marked FAULTY - administrative intervention required
> To: [email protected]
> Cc: [email protected]
> Date: Monday, April 12, 2010, 8:50 AM
>
> --- On Fri, 4/9/10, Steven Dake <[email protected]> wrote:
>
> > Broadcast and redundant ring probably don't work too well together. If
> > you really want to use broadcast, take care to ensure port numbers are
> > separated by 2. In your config, you're using port 5405 for one ring and
> > 5406 for another. Internally totem will use 5405+5404 for one ring, and
> > 5405+5406 for another. With multicast this isn't a problem since you
> > could use different multicast addresses. With broadcast, this is not the
> > case.
> >
> > Try fixing that and report back if it helps. If not we can further
> > investigate.
> >
> > Regards
> > -steve
>
> I have changed the ports and it did help, thank you. The
> reason I was using broadcast is that my second ring is a
> cross-over cable. I wasn't sure if multicast makes any sense
> on such an interface. Also, I didn't know if I can have one
> redundant ring with multicast and another with broadcast. I
> would really like to know how an expert would configure
> corosync in my setup (two nodes, two ethernet cards each,
> connected to a common switch, with a crossover link between them).
>
> Thank you,
> Vadym
>
> > > corosync-1.2.1-1.el5
> > >
> > > Here is my config:
> > >
> > > compatibility: none
> > >
> > > aisexec {
> > >     user: root
> > >     group: root
> > > }
> > >
> > > service {
> > >     name: pacemaker
> > >     ver: 0
> > > }
> > >
> > > totem {
> > >     version: 2
> > >     token: 5000
> > >     token_retransmits_before_loss_const: 20
> > >     join: 1000
> > >     consensus: 7500
> > >     vsftype: none
> > >     max_messages: 20
> > >     secauth: off
> > >     threads: 0
> > >     clear_node_high_bit: yes
> > >     rrp_mode: passive
> > >     interface {
> > >         ringnumber: 0
> > >         broadcast: yes
> > >         bindnetaddr: 10.0.0.0
> > >         mcastport: 5405
> > >     }
> > >     interface {
> > >         ringnumber: 1
> > >         broadcast: yes
> > >         bindnetaddr: 207.207.163.0
> > >         mcastport: 5406
> > >     }
> > > }
> > >
> > > logging {
> > >     fileline: off
> > >     to_stderr: no
> > >     to_syslog: yes
> > >     debug: on
> > >     timestamp: on
> > > }
> > >
> > > amf {
> > >     mode: disabled
> > > }
> > >
> > > [r...@xen-11 ~]# ifconfig
> > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DC
> > >           inet addr:207.207.163.11  Bcast:207.207.163.255  Mask:255.255.255.0
> > >           inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:1428434820 (1.3 GiB)  TX bytes:664164837 (633.3 MiB)
> > >
> > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DD
> > >           inet addr:10.0.0.1  Bcast:10.0.0.3  Mask:255.255.255.252
> > >           inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:1000
> > >           RX bytes:518593446 (494.5 MiB)  TX bytes:14199338528 (13.2 GiB)
> > >           Memory:d8060000-d8080000
> > >
> > > [r...@xen-12 ~]# ifconfig
> > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CA
> > >           inet addr:207.207.163.12  Bcast:207.207.163.255  Mask:255.255.255.0
> > >           inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:698444593 (666.0 MiB)  TX bytes:1145344594 (1.0 GiB)
> > >
> > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CB
> > >           inet addr:10.0.0.2  Bcast:10.0.0.3  Mask:255.255.255.252
> > >           inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:1000
> > >           RX bytes:14138136203 (13.1 GiB)  TX bytes:493569061 (470.7 MiB)
> > >           Memory:d8060000-d8080000
> > >
> > > Cross-over connection on eth1
> > >
> > > I don't see much detail in the message log; I probably need to
> > > increase the debug level.
> > >
> > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > Printing ring status.
> > > Local node ID 33554442
> > > RING ID 0
> > >         id      = 10.0.0.2
> > >         status  = ring 0 active with no faults
> > > RING ID 1
> > >         id      = 207.207.163.12
> > >         status  = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > >
> > > I can reset it just fine
> > >
> > > [r...@xen-12 ~]# corosync-cfgtool -r
> > > Re-enabling all failed rings.
> > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > Printing ring status.
> > > Local node ID 33554442
> > > RING ID 0
> > >         id      = 10.0.0.2
> > >         status  = ring 0 active with no faults
> > > RING ID 1
> > >         id      = 207.207.163.12
> > >         status  = ring 1 active with no faults
> > >
> > > But it goes into FAULTY mode almost right away:
> > >
> > > Apr  9 11:40:56 xen-12 corosync[13835]:   [TOTEM ] Marking seqid 18340 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > >
> > > That's the only message from corosync in the log.
> > >
> > > Thank you,
> > > Vadym Chepkov
> > >
> > > _______________________________________________
> > > Openais mailing list
> > > [email protected]
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
