One more thing I observed: when it fails, it sometimes gets rebound to the
loopback interface:

RING ID 1
        id = 127.0.0.1
        status = ring 1 active with no faults

After that, corosync-cfgtool -r doesn't help anymore; at least, I don't see
any output from tcpdump -lnn -i eth0 udp portrange 5400-5499 at all.
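If you want to catch this automatically (e.g. from cron), here is a minimal Python sketch that scans `corosync-cfgtool -s` output for either symptom: a FAULTY status, or a ring id that has fallen back to a 127.x address. It assumes the "RING ID / id = / status =" layout shown in this thread; `parse_ring_status` and `suspicious_rings` are hypothetical helpers, not part of corosync, and the sample text is assembled from listings in this thread.

```python
import re
from typing import Dict, List

# Assumed layout, taken from the corosync-cfgtool -s listings in this thread.
RING_RE = re.compile(r"^RING ID (\d+)$")
FIELD_RE = re.compile(r"^(id|status)\s*=\s*(.*)$")


def parse_ring_status(text: str) -> List[Dict[str, str]]:
    """Split `corosync-cfgtool -s` output into one dict per ring."""
    rings: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        ring = RING_RE.match(line)
        if ring:
            rings.append({"ring": ring.group(1)})
            continue
        field = FIELD_RE.match(line)
        if field and rings:
            # Attach id/status to the most recent RING ID block.
            rings[-1][field.group(1)] = field.group(2).strip()
    return rings


def suspicious_rings(rings: List[Dict[str, str]]) -> List[str]:
    """Ring numbers that are FAULTY or bound to a loopback (127.x) address."""
    return [
        r["ring"]
        for r in rings
        if r.get("id", "").startswith("127.") or "FAULTY" in r.get("status", "")
    ]


# Illustrative sample: ring 1 has fallen back to loopback, as described above.
sample = """Printing ring status.
Local node ID 33554442
RING ID 0
        id = 10.0.0.2
        status = ring 0 active with no faults
RING ID 1
        id = 127.0.0.1
        status = ring 1 active with no faults
"""
print(suspicious_rings(parse_ring_status(sample)))  # ['1']
```

Note that the loopback case is flagged even though its status claims "no faults", which is exactly the situation described above.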
Sincerely yours,
Vadym Chepkov
--- On Thu, 4/15/10, Vadym Chepkov <[email protected]> wrote:
> From: Vadym Chepkov <[email protected]>
> Subject: Re: [Openais] corosync ring marked FAULTY - administrative intervention required
> To: [email protected]
> Cc: [email protected]
> Date: Thursday, April 15, 2010, 11:55 AM
> I was optimistic too early. One of the interfaces is still marked as
> FAULTY pretty fast. I have tried a multicast address as well, with the
> same result, but with multicast the results are even more upsetting.
> Cluster members don't see each other; each thinks the other node is
> dead, even though one of the interfaces in the ring is in a healthy
> state.
>
> Sincerely yours,
> Vadym Chepkov
>
>
> --- On Mon, 4/12/10, Vadym Chepkov <[email protected]> wrote:
>
> > From: Vadym Chepkov <[email protected]>
> > Subject: Re: [Openais] corosync ring marked FAULTY - administrative intervention required
> > To: [email protected]
> > Cc: [email protected]
> > Date: Monday, April 12, 2010, 8:50 AM
> >
> > --- On Fri, 4/9/10, Steven Dake <[email protected]> wrote:
> >
> > >
> > > Broadcast and redundant ring probably don't work too well together. If
> > > you really want to use broadcast, take care to ensure port numbers are
> > > separated by 2. In your config, you're using port 5405 for one ring and
> > > 5406 for the other. Internally, totem will use ports 5405 and 5404 for
> > > one ring, and ports 5405 and 5406 for the other, so both rings share
> > > 5405. With multicast this isn't a problem, since you could use different
> > > multicast addresses. With broadcast, this is not the case.
> > >
> > > Try fixing that and report back if it helps. If not, we can investigate
> > > further.
> > >
> > > Regards
> > > -steve
> > >
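The port clash Steve describes can be checked mechanically. This Python sketch assumes, as he explains, that a ring configured with mcastport P occupies both P and P - 1, which is what makes 5405/5406 collide while 5405/5407 do not; with broadcast, port is the only thing separating the rings. The helper names are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def totem_ports(mcastport: int) -> Set[int]:
    # Assumption from the explanation above: a ring with mcastport P
    # uses both P and P - 1 (e.g. 5405 -> {5404, 5405}).
    return {mcastport - 1, mcastport}


def overlapping_rings(ring_ports: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return pairs of ring numbers whose derived UDP port sets collide."""
    clashes = []
    for (ra, pa), (rb, pb) in combinations(sorted(ring_ports.items()), 2):
        if totem_ports(pa) & totem_ports(pb):
            clashes.append((ra, rb))
    return clashes


# Original config: ring 0 on 5405, ring 1 on 5406 -> both use 5405.
print(overlapping_rings({0: 5405, 1: 5406}))  # [(0, 1)]
# Ports separated by 2, as suggested: no overlap.
print(overlapping_rings({0: 5405, 1: 5407}))  # []
```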
> >
> > I have changed the ports and it did help, thank you. The reason I was
> > using broadcast is that my second ring is a crossover cable. I wasn't
> > sure whether multicast makes any sense on such an interface. Also, I
> > didn't know whether I can have one redundant ring with multicast and
> > another with broadcast. I would really like to know how an expert would
> > configure corosync in my setup (two nodes, two Ethernet cards each,
> > connected to a common switch and with a crossover link between them).
> >
> > Thank you,
> > Vadym
> >
> > >
> > > > corosync-1.2.1-1.el5
> > > >
> > > > Here is my config:
> > > >
> > > > compatibility: none
> > > >
> > > > aisexec {
> > > >     user: root
> > > >     group: root
> > > > }
> > > >
> > > > service {
> > > >     name: pacemaker
> > > >     ver: 0
> > > > }
> > > >
> > > > totem {
> > > >     version: 2
> > > >     token: 5000
> > > >     token_retransmits_before_loss_const: 20
> > > >     join: 1000
> > > >     consensus: 7500
> > > >     vsftype: none
> > > >     max_messages: 20
> > > >     secauth: off
> > > >     threads: 0
> > > >     clear_node_high_bit: yes
> > > >     rrp_mode: passive
> > > >     interface {
> > > >         ringnumber: 0
> > > >         broadcast: yes
> > > >         bindnetaddr: 10.0.0.0
> > > >         mcastport: 5405
> > > >     }
> > > >     interface {
> > > >         ringnumber: 1
> > > >         broadcast: yes
> > > >         bindnetaddr: 207.207.163.0
> > > >         mcastport: 5406
> > > >     }
> > > > }
> > > >
> > > > logging {
> > > >     fileline: off
> > > >     to_stderr: no
> > > >     to_syslog: yes
> > > >     debug: on
> > > >     timestamp: on
> > > > }
> > > >
> > > > amf {
> > > >     mode: disabled
> > > > }
> > > >
> > > > [r...@xen-11 ~]# ifconfig
> > > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DC
> > > >           inet addr:207.207.163.11  Bcast:207.207.163.255  Mask:255.255.255.0
> > > >           inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:0
> > > >           RX bytes:1428434820 (1.3 GiB)  TX bytes:664164837 (633.3 MiB)
> > > >
> > > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DD
> > > >           inet addr:10.0.0.1  Bcast:10.0.0.3  Mask:255.255.255.252
> > > >           inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:1000
> > > >           RX bytes:518593446 (494.5 MiB)  TX bytes:14199338528 (13.2 GiB)
> > > >           Memory:d8060000-d8080000
> > > >
> > > > [r...@xen-12 ~]# ifconfig
> > > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CA
> > > >           inet addr:207.207.163.12  Bcast:207.207.163.255  Mask:255.255.255.0
> > > >           inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:0
> > > >           RX bytes:698444593 (666.0 MiB)  TX bytes:1145344594 (1.0 GiB)
> > > >
> > > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CB
> > > >           inet addr:10.0.0.2  Bcast:10.0.0.3  Mask:255.255.255.252
> > > >           inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
> > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > >           RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
> > > >           TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
> > > >           collisions:0 txqueuelen:1000
> > > >           RX bytes:14138136203 (13.1 GiB)  TX bytes:493569061 (470.7 MiB)
> > > >           Memory:d8060000-d8080000
> > > >
> > > > The crossover connection is on eth1.
> > > >
> > > > I don't see much detail in the messages log; I probably need to
> > > > increase the debug level.
> > > >
> > > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > > Printing ring status.
> > > > Local node ID 33554442
> > > > RING ID 0
> > > >         id = 10.0.0.2
> > > >         status = ring 0 active with no faults
> > > > RING ID 1
> > > >         id = 207.207.163.12
> > > >         status = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > > >
> > > >
> > > > I can reset it just fine:
> > > >
> > > > [r...@xen-12 ~]# corosync-cfgtool -r
> > > > Re-enabling all failed rings.
> > > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > > Printing ring status.
> > > > Local node ID 33554442
> > > > RING ID 0
> > > >         id = 10.0.0.2
> > > >         status = ring 0 active with no faults
> > > > RING ID 1
> > > >         id = 207.207.163.12
> > > >         status = ring 1 active with no faults
> > > >
> > > > But it goes into FAULTY mode almost right away:
> > > >
> > > > Apr 9 11:40:56 xen-12 corosync[13835]: [TOTEM ] Marking seqid 18340 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
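To measure how quickly a ring goes FAULTY again after `corosync-cfgtool -r`, the TOTEM message can be pulled out of syslog. A minimal Python sketch; the regex only assumes the message wording quoted above, and `faulty_events` is a hypothetical helper, not a corosync tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the part of the TOTEM message shown above:
# "Marking seqid N ringid R interface ADDR FAULTY".
FAULTY_RE = re.compile(
    r"Marking seqid (?P<seqid>\d+) ringid (?P<ring>\d+) "
    r"interface (?P<iface>\S+) FAULTY"
)


def faulty_events(lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (ringid, interface address, seqid) for each FAULTY message."""
    for line in lines:
        m = FAULTY_RE.search(line)
        if m:
            yield int(m.group("ring")), m.group("iface"), int(m.group("seqid"))


log = [
    "Apr 9 11:40:56 xen-12 corosync[13835]: [TOTEM ] Marking seqid 18340 "
    "ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.",
]
print(list(faulty_events(log)))  # [(1, '207.207.163.12', 18340)]
```

Feeding it the lines of the syslog file (e.g. iterating over an open file object) and comparing timestamps against when the rings were re-enabled gives a rough time-to-fault.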
> > > >
> > > > That's the only message from corosync in the log.
> > > >
> > > > Thank you,
> > > > Vadym Chepkov
> > > >
> > > >
> > > > _______________________________________________
> > > > Openais mailing list
> > > > [email protected]
> > > > https://lists.linux-foundation.org/mailman/listinfo/openais