One more thing I observed: when it fails, it sometimes gets rebound to the
loopback interface:

RING ID 1
        id      = 127.0.0.1
        status  = ring 1 active with no faults

After that, corosync-cfgtool -r doesn't help anymore; at least, I don't see
any output at all from tcpdump -lnn -i eth0 udp portrange 5400-5499.

Sincerely yours,
  Vadym Chepkov


--- On Thu, 4/15/10, Vadym Chepkov <[email protected]> wrote:

> From: Vadym Chepkov <[email protected]>
> Subject: Re: [Openais] corosync ring marked FAULTY - administrative intervention required
> To: [email protected]
> Cc: [email protected]
> Date: Thursday, April 15, 2010, 11:55 AM
> I was optimistic too early. One of the interfaces still gets marked as FAULTY
> pretty quickly. I have tried a multicast address as well, with the same
> result, but with multicast the results are even more upsetting: the cluster
> members don't see each other, and each thinks the other node is dead, even
> though one of the interfaces in the ring is in a healthy state.
> 
> Sincerely yours,
>   Vadym Chepkov
> 
> 
> --- On Mon, 4/12/10, Vadym Chepkov <[email protected]> wrote:
> 
> > From: Vadym Chepkov <[email protected]>
> > Subject: Re: [Openais] corosync ring marked FAULTY - administrative intervention required
> > To: [email protected]
> > Cc: [email protected]
> > Date: Monday, April 12, 2010, 8:50 AM
> > 
> > --- On Fri, 4/9/10, Steven Dake <[email protected]> wrote:
> > 
> > > 
> > > > Broadcast and redundant ring probably don't work too well together. If
> > > > you really want to use broadcast, take care to ensure port numbers are
> > > > separated by 2. In your config, you're using port 5405 for one ring and
> > > > 5406 for another. Internally totem will use 5405+5404 for one ring, and
> > > > 5405+5406 for another. With multicast this isn't a problem since you
> > > > could use different multicast addresses. With broadcast, this is not the
> > > > case.
> > > 
> > > > Try fixing that and report back if it helps. If not we can further
> > > > investigate.
> > > 
> > > Regards
> > > -steve
> > > 
> > 
> > I have changed the ports and it did help, thank you. The reason I was using
> > broadcast is that my second ring is a cross-over cable. I wasn't sure
> > whether multicast makes any sense on such an interface. Also, I didn't know
> > whether I can have one redundant ring with multicast and another with
> > broadcast. I would really like to know how an expert would configure
> > corosync in my setup (two nodes, two Ethernet cards each, connected to a
> > common switch, with a crossover link between them).
> > 
> > Thank you,
> > Vadym
> > 
> > > 
> > > > corosync-1.2.1-1.el5
> > > > 
> > > > Here is my config:
> > > > 
> > > > compatibility: none
> > > > 
> > > > > aisexec {
> > > > >         user:   root
> > > > >         group:  root
> > > > > }
> > > > 
> > > > service {
> > > >         name: pacemaker
> > > >         ver:  0
> > > > }
> > > > 
> > > > > totem {
> > > > >         version: 2
> > > > >         token: 5000
> > > > >         token_retransmits_before_loss_const: 20
> > > > >         join: 1000
> > > > >         consensus: 7500
> > > > >         vsftype: none
> > > > >         max_messages: 20
> > > > >         secauth: off
> > > > >         threads: 0
> > > > >         clear_node_high_bit: yes
> > > > >         rrp_mode: passive
> > > > >         interface {
> > > > >                 ringnumber: 0
> > > > >                 broadcast: yes
> > > > >                 bindnetaddr: 10.0.0.0
> > > > >                 mcastport: 5405
> > > > >         }
> > > > >         interface {
> > > > >                 ringnumber: 1
> > > > >                 broadcast: yes
> > > > >                 bindnetaddr: 207.207.163.0
> > > > >                 mcastport: 5406
> > > > >         }
> > > > > }
> > > > 
> > > > logging {
> > > >         fileline: off
> > > >         to_stderr: no
> > > >         to_syslog: yes
> > > >         debug: on
> > > >         timestamp: on
> > > > }
> > > > 
> > > > amf {
> > > >         mode: disabled
> > > > }
> > > > 
> > > > [r...@xen-11 ~]# ifconfig 
> > > > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DC
> > > > >           inet addr:207.207.163.11  Bcast:207.207.163.255  Mask:255.255.255.0
> > > > >           inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > > >           RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
> > > > >           TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
> > > > >           collisions:0 txqueuelen:0
> > > > >           RX bytes:1428434820 (1.3 GiB)  TX bytes:664164837 (633.3 MiB)
> > > > > 
> > > > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DD
> > > > >           inet addr:10.0.0.1  Bcast:10.0.0.3  Mask:255.255.255.252
> > > > >           inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > > >           RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
> > > > >           TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
> > > > >           collisions:0 txqueuelen:1000
> > > > >           RX bytes:518593446 (494.5 MiB)  TX bytes:14199338528 (13.2 GiB)
> > > > >           Memory:d8060000-d8080000
> > > > 
> > > > [r...@xen-12 ~]# ifconfig 
> > > > > eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CA
> > > > >           inet addr:207.207.163.12  Bcast:207.207.163.255  Mask:255.255.255.0
> > > > >           inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > > >           RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
> > > > >           TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
> > > > >           collisions:0 txqueuelen:0
> > > > >           RX bytes:698444593 (666.0 MiB)  TX bytes:1145344594 (1.0 GiB)
> > > > > 
> > > > > eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CB
> > > > >           inet addr:10.0.0.2  Bcast:10.0.0.3  Mask:255.255.255.252
> > > > >           inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
> > > > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > > > >           RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
> > > > >           TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
> > > > >           collisions:0 txqueuelen:1000
> > > > >           RX bytes:14138136203 (13.1 GiB)  TX bytes:493569061 (470.7 MiB)
> > > > >           Memory:d8060000-d8080000
> > > > 
> > > > Cross-over connection on eth1
> > > > 
> > > > > I don't see much detail in the message log; I probably need to increase
> > > > > the debug level.
> > > > 
> > > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > > Printing ring status.
> > > > Local node ID 33554442
> > > > > RING ID 0
> > > > >         id      = 10.0.0.2
> > > > >         status  = ring 0 active with no faults
> > > > > RING ID 1
> > > > >         id      = 207.207.163.12
> > > > >         status  = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > > > 
> > > > 
> > > > I can reset it just fine
> > > > 
> > > > [r...@xen-12 ~]# corosync-cfgtool -r
> > > > Re-enabling all failed rings.
> > > > [r...@xen-12 ~]# corosync-cfgtool -s
> > > > Printing ring status.
> > > > Local node ID 33554442
> > > > > RING ID 0
> > > > >         id      = 10.0.0.2
> > > > >         status  = ring 0 active with no faults
> > > > > RING ID 1
> > > > >         id      = 207.207.163.12
> > > > >         status  = ring 1 active with no faults
> > > > 
> > > > > But it goes into FAULTY mode almost right away:
> > > > > 
> > > > > Apr  9 11:40:56 xen-12 corosync[13835]:   [TOTEM ] Marking seqid 18340 ringid 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> > > > > 
> > > > > That's the only message from corosync in the log.
> > > > 
> > > > Thank you,
> > > > Vadym Chepkov
> > > > 
> > > >
> _______________________________________________
> > > > Openais mailing list
> > > > [email protected]
> > > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > > 
> > > _______________________________________________
> > > Openais mailing list
> > > [email protected]
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
> > >
> > 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
> 
_______________________________________________
Openais mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/openais

Reply via email to