Re: [Openais] corosync ring marked FAULTY - administrative intervention required

Steven Dake Fri, 09 Apr 2010 11:34:04 -0700

On Fri, 2010-04-09 at 07:45 -0400, Vadym Chepkov wrote:
> Hi,
> 
> I experience this issue on every cluster I have, not just this one, so it 
> could be a common misconfiguration on my part.
> I am using the latest version of corosync:
>


Broadcast and redundant ring probably don't work too well together.  If
you really want to use broadcast, take care to ensure port numbers are
separated by at least 2.  In your config, you're using port 5405 for one
ring and 5406 for another.  Internally totem will use 5405+5404 for one
ring, and 5406+5405 for the other, so both rings end up on port 5405.
With multicast this isn't a problem since you could use different
multicast addresses.  With broadcast, this is not the case.
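
For illustration, here is a sketch of what the two interface sections
might look like with the ports separated.  The choice of 5407 for ring 1
is only an assumption; any value that leaves ring 1's pair of ports clear
of 5404 and 5405 should do, and the rest of the totem section would stay
as you have it.

```
totem {
        # (other totem options unchanged)
        rrp_mode: passive
        # ring 0: totem uses 5405 and 5404
        interface {
                ringnumber: 0
                broadcast: yes
                bindnetaddr: 10.0.0.0
                mcastport: 5405
        }
        # ring 1: assumed port 5407, so totem uses 5407 and 5406
        interface {
                ringnumber: 1
                broadcast: yes
                bindnetaddr: 207.207.163.0
                mcastport: 5407
        }
}
```

The same change would need to go into the config on every node, followed
by a corosync restart.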

Try fixing that and report back if it helps.  If not we can further
investigate.

Regards
-steve


> corosync-1.2.1-1.el5
> 
> Here is my config:
> 
> compatibility: none
> 
> aisexec {
>         user:   root
>         group:  root
> }
> 
> service {
>         name: pacemaker
>         ver:  0
> }
> 
> totem {
>         version: 2
>         token: 5000
>         token_retransmits_before_loss_const: 20
>         join: 1000
>         consensus: 7500
>         vsftype: none
>         max_messages: 20
>         secauth: off
>         threads: 0
>         clear_node_high_bit: yes
>         rrp_mode: passive
>         interface {
>                 ringnumber: 0
>                 broadcast: yes
>                 bindnetaddr: 10.0.0.0
>                 mcastport: 5405
>         }
>         interface {
>                 ringnumber: 1
>                 broadcast: yes
>                 bindnetaddr: 207.207.163.0
>                 mcastport: 5406
>         }
> }
> 
> logging {
>         fileline: off
>         to_stderr: no
>         to_syslog: yes
>         debug: on
>         timestamp: on
> }
> 
> amf {
>         mode: disabled
> }
> 
> [r...@xen-11 ~]# ifconfig 
> eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DC  
>           inet addr:207.207.163.11  Bcast:207.207.163.255  Mask:255.255.255.0
>           inet6 addr: fe80::230:48ff:fe62:4edc/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:2009418 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:799835 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:1428434820 (1.3 GiB)  TX bytes:664164837 (633.3 MiB)
> 
> eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4E:DD  
>           inet addr:10.0.0.1  Bcast:10.0.0.3  Mask:255.255.255.252
>           inet6 addr: fe80::230:48ff:fe62:4edd/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:4233811 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:14118095 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:518593446 (494.5 MiB)  TX bytes:14199338528 (13.2 GiB)
>           Memory:d8060000-d8080000 
> 
> [r...@xen-12 ~]# ifconfig 
> eth0      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CA  
>           inet addr:207.207.163.12  Bcast:207.207.163.255  Mask:255.255.255.0
>           inet6 addr: fe80::230:48ff:fe62:4cca/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1210002 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:473204 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0 
>           RX bytes:698444593 (666.0 MiB)  TX bytes:1145344594 (1.0 GiB)
> 
> eth1      Link encap:Ethernet  HWaddr 00:30:48:62:4C:CB  
>           inet addr:10.0.0.2  Bcast:10.0.0.3  Mask:255.255.255.252
>           inet6 addr: fe80::230:48ff:fe62:4ccb/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:13776771 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4008079 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:14138136203 (13.1 GiB)  TX bytes:493569061 (470.7 MiB)
>           Memory:d8060000-d8080000 
> 
> Cross-over connection on eth1
> 
> I don't see much detail in the message log; I probably need to increase 
> the debug level.
> 
> [r...@xen-12 ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 33554442
> RING ID 0
>       id      = 10.0.0.2
>       status  = ring 0 active with no faults
> RING ID 1
>       id      = 207.207.163.12
>       status  = Marking seqid 6594 ringid 1 interface 207.207.163.12 FAULTY - 
> adminisrtative intervention required.
> 
> 
> I can reset it just fine:
> 
> [r...@xen-12 ~]# corosync-cfgtool -r
> Re-enabling all failed rings.
> [r...@xen-12 ~]# corosync-cfgtool -s
> Printing ring status.
> Local node ID 33554442
> RING ID 0
>       id      = 10.0.0.2
>       status  = ring 0 active with no faults
> RING ID 1
>       id      = 207.207.163.12
>       status  = ring 1 active with no faults
> 
> But it goes into FAULTY mode almost right away:
> 
> Apr  9 11:40:56 xen-12 corosync[13835]:   [TOTEM ] Marking seqid 18340 ringid 
> 1 interface 207.207.163.12 FAULTY - adminisrtative intervention required.
> 
> That's the only message from corosync in the log.
> 
> Thank you,
> Vadym Chepkov
> 
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
