Hi Jan,
Thanks for all your accurate answers.

Again, some questions somewhat related to this problem and to another one:

We are also facing the problem described in *Bug 854216* <https://bugzilla.redhat.com/show_bug.cgi?id=854216> ([TOTEM] FAILED TO RECEIVE + corosync crash). That bug notes it is also related to bug 917914, which states: "This generally indicates a network (multicast) issue between the nodes".

So now my questions are:

1/ If we modify the mcastport values so that the ports for the 1st and 2nd rings are distinct (5405 & 5407; see the sketch below), will that also prevent the corosync crash described in 854216, or not?

2/ Can the corosync crash described in 854216 happen only in HA clusters with more than two nodes, or can it also happen in two-node HA clusters?

3/ Which is the first corosync release that includes the patch given in 854216?
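
For reference, a sketch of the change in question 1/ (the bindnetaddr
values are simply taken from the sample at the bottom of this thread).
Corosync uses two UDP ports per ring, mcastport and mcastport - 1, so
5405 & 5407 do not overlap (5404/5405 vs 5406/5407), whereas 5405 & 5406
would:

        interface {
                ringnumber: 0
                bindnetaddr: 182.128.3.0
                mcastaddr: 239.0.0.1
                mcastport: 5405        # corosync also uses 5404
        }
        interface {
                ringnumber: 1
                bindnetaddr: 182.128.2.0
                mcastaddr: 239.0.0.1
                mcastport: 5407        # distinct and non-adjacent port
        }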

Thanks again.
Alain





On 29/04/2013 08:54, Jan Friesse wrote:
Moullé Alain wrote:
Jan,
OK thanks.

By the way, is there a way to modify the corosync.conf mcast values and
make a started Pacemaker/corosync configuration take them into account
without a restart?

Sadly not. It's one of the "nice to have" features, but quite complex
to implement.

Regards,
   Honza

Alain
Alain,
don't test with ifdown. Ifdown changes the routing table and makes corosync
bind to 127.0.0.1, so the node sees itself as its own cluster, and corosync
will behave extremely badly (no reaction to another node's failure, or a
gather-state loop).
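
One way to simulate a ring failure without touching the routing table (a
sketch, assuming ring 0 runs over eth0 with the default mcastport 5405, so
corosync also uses UDP port 5404) is to drop the totem traffic with
iptables and leave the interface up:

    # block corosync totem traffic on eth0 in both directions
    iptables -A INPUT  -i eth0 -p udp --dport 5404:5405 -j DROP
    iptables -A OUTPUT -o eth0 -p udp --dport 5404:5405 -j DROP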

I really can't remember what the real issue was with identical mcast
addresses, but when I tested it, there were big problems. Maybe you
didn't hit them because of ifdown, but they are there. Don't expect a
crash of corosync; expect an inconsistent membership view, no reaction
to failures, ... In short, just don't do that.

Moullé Alain wrote:
Hi again Jan,

Let me add another question to my last one:
"what is the real risk with my configuration (and corosync-1.4.1-7) ?"

And is there any dependency between the risk and the interfaces used for
both heartbeats?
Let me explain:
    I've tested this configuration with two standard eth interfaces,
i.e. doing ifdown on one interface and then the same test on the second
one, and everything works fine: provided there is always at least one
interface reachable, there is no impact on Pacemaker/corosync.
    But when mixing two interface types, say a standard eth interface
and a bridged eth interface, or a standard eth interface and an IP/IB
interface, etc., I've always had a problem when "ifdown-ing" one of the
interfaces (I don't remember which one), as if there were only one
heartbeat network.
    So my question: does the risk of rrp mode not working correctly (if
mcastaddr and mcastport are the same for both rings) depend on the
interfaces used?
Not from the corosync side, but packet routing may do a really bad job,
e.g. a packet gets delivered even when it shouldn't be... To make a
really highly available setup, use two NICs per machine connected to two
independent switches.

Honza

    And is the risk zero when using two standard eth interfaces?

Thanks for all information.
Alain

On 25/04/2013 17:33, Jan Friesse wrote:
Moullé Alain wrote:
Hi,

"you can choose" ... meaning that it is not mandatory ? and my
configuration is correct anyway ?
No, your configuration is not correct. "You can choose..." means a
binary OR; see the table:

same_mcast_addr | same_port (or +-1) | works
----------------+--------------------+------
0               | 0                  | 1
0               | 1                  | 1
1               | 0                  | 1
1               | 1                  | 0
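
For instance, the second row (distinct mcastaddr, same or adjacent port)
corresponds to a sketch like this (239.0.0.2 is an illustrative address;
the subnets are taken from your sample below):

        interface {
                ringnumber: 0
                bindnetaddr: 182.128.3.0
                mcastaddr: 239.0.0.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 182.128.2.0
                mcastaddr: 239.0.0.2   # distinct multicast address
                mcastport: 5405        # same port is fine when addresses differ
        }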


Because somebody told me that if we use the same mcastaddr, it is
written in the corosync documentation (but I can't find where) that
corosync may crash if one network becomes unreachable... Could you
confirm this, or reassure me by telling me that it is not true ;-) ?
Yes, rrp will not work as expected.

Honza

Thanks
Alain
Moullé Alain wrote:
Hi,

corosync-1.4.1-7

With two rings in corosync.conf and rrp mode active, is it recommended
to have two distinct mcastaddr values?
You can choose to have either two distinct mcastaddr(esses) or distinct
ports (don't use port +- 1).

(And if so, where can I find this information?)

It looks like it's not documented (thanks for that information, I will
add it to my TODO) and sadly, corosync will not complain about it either
(this one is already in the TODO).

Regards,
      Honza

Or is it not important, so that we can have the same mcastaddr on both
rings?

To put it another way, is this sample correct:

        interface {
                ringnumber: 0

                # The following three values need to be set based on
                # your environment
                bindnetaddr: 182.128.3.0
                mcastaddr: 239.0.0.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1

                # The following values need to be set based on
                # your environment
                bindnetaddr: 182.128.2.0
                mcastaddr: 239.0.0.1
                mcastport: 5405
        }


Thanks
Alain