Thanks again, it is clear.

Just one last thing ;-) :
what is the latest official release of the corosync rpm,
and where can I get it?
(I can't find it on the clusterlabs site)

Thanks
Alain

On 17/05/2013 07:55, Jan Friesse wrote:
Moullé Alain wrote:
Hi Jan,
thanks for all your accurate answers.

A few more questions somewhat related to this problem and another one:

We are also facing the problem described in *Bug 854216*
<https://bugzilla.redhat.com/show_bug.cgi?id=854216> - [TOTEM] FAILED TO
RECEIVE + corosync crash. That bug says it is also related to 917914,
which states: "This generally indicates a network (multicast) issue
between the nodes".
Yes. The bug happens when one of the nodes is unable to receive multicast
packets for quite a long time (multicast is blocked, OR for example the
switch decides to stop sending multicast packets, OR the switch is
overloaded and drops packets, ...). Also for VMs, on newer kernels you may
need to run "echo 1 > /sys/class/net/virbrN/bridge/multicast_querier",
because the mcast querier there is somehow buggy (see
https://bugzilla.redhat.com/show_bug.cgi?id=880035).


So now my questions are:

1/ If we change the mcastport settings so that the port for the 1st ring
and the one for the 2nd ring are distinct (5405 & 5407), will that also
avoid the corosync crash described in 854216, or not?

No, it will not avoid the described crash. Actually, 854216 is not related
to rrp at all. The crash can happen with and without rrp.

2/ Can the corosync crash described in 854216 happen only in HA
clusters with more than two nodes, or can it also happen in two-node HA
clusters?

Also with two nodes.

3/ What is the first corosync release that includes the patch from 854216?

Upstream 1.4.5 (or 2.3.0). If you are asking about RHEL, it's not in
6.4, but hopefully will be in 6.5.

Thanks again.
Alain

Regards,
   Honza




On 29/04/2013 08:54, Jan Friesse wrote:
Moullé Alain wrote:
Jan,
OK thanks.

By the way, is there a way to modify the mcast values in corosync.conf
and have a running Pacemaker/corosync configuration take them into
account without a restart?

Sadly not. It's one of the "nice to have" features, but sadly quite
complex to implement.

Regards,
    Honza

Alain
Alain,
don't test with ifdown. Ifdown changes the routing table and makes
corosync bind to 127.0.0.1, so other nodes see that node as their own, and
corosync will behave extremely badly (no reaction to another node's
failure, or a gather state loop).

I'm really not able to remember what the real issue was with identical
mcast addresses, but when I was testing it, it had big problems. You maybe
didn't hit them because of ifdown, but they are there. Don't expect a
crash of corosync. Expect an inconsistent membership view, no reaction to
failure, ... In short, just don't do that.

Moullé Alain wrote:
Hi again Jan,

Let me add another question to my last one:
"what is the real risk with my configuration (and corosync-1.4.1-7)?"

And is there any dependency between the risk and the interfaces used for
both heartbeats?
Let me explain:
     I've tested this configuration with two standard eth interfaces,
doing ifdown on one interface and then the same test on the second
interface, and everything works fine provided there is always at least
one reachable interface; there is no impact on Pacemaker/corosync.
     But when mixing two interface types, let's say a standard eth
interface and a bridged eth interface, or a standard eth interface and an
IP/IB interface, etc., I've always had a problem when "ifdown-ing" one of
the interfaces (I don't remember which one), as if there was only one
heartbeat network.
     So my question: does the risk of rrp mode not working correctly (if
mcastaddr and mcastport are the same for both rings) depend on the
interfaces used?
Not from the corosync side, but routing may do a really bad job with the
packets, e.g. a packet may be delivered even when it shouldn't be ... To
make a really highly available setup, use two NICs per machine connected
to two independent switches.

Honza

     And is the risk null when using two standard eth interfaces?

Thanks for all information.
Alain

On 25/04/2013 17:33, Jan Friesse wrote:
Moullé Alain wrote:
Hi,

"you can choose" ... meaning that it is not mandatory ? and my
configuration is correct anyway ?
No, your configuration is not correct. "You can choose..." means a binary
OR. So (table)

same mcastaddr | same port (or port +- 1) | works
--------------------------------------------------
0              | 0                         | 1
0              | 1                         | 1
1              | 0                         | 1
1              | 1                         | 0
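
As a minimal sketch of a layout the table allows (distinct mcastaddr on the same port); the networks here follow the sample quoted later in this thread, and the second ring's multicast address is illustrative:

```
interface {
        ringnumber: 0
        bindnetaddr: 182.128.3.0
        mcastaddr: 239.0.0.1
        mcastport: 5405
}
interface {
        ringnumber: 1
        bindnetaddr: 182.128.2.0
        # distinct multicast address, so the same port is fine
        mcastaddr: 239.0.0.2
        mcastport: 5405
}
```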


Because somebody told me that the corosync documentation says (but I
can't find where) that if we use the same mcastaddr, corosync may crash
if one network becomes unreachable ... could you confirm this, or
reassure me by telling me that it is not true ;-) ?
Yes, rrp will not work as expected.

Honza

Thanks
Alain
Moullé Alain wrote:
Hi,

corosync-1.4.1-7

with two rings in corosync.conf and rrp mode active, is it recommended
to have two distinct mcastaddr values?
You can choose to have either two distinct mcastaddr(esses) or distinct
ports (don't use port +- 1).

(and if so, where can I find this information?)

It looks like it's not documented (thanks for that information, I will
add it to my TODO), and sadly corosync will not complain either (this
already exists in the TODO).

Regards,
       Honza

or is it not important, and can we have the same mcast addr on both
rings?

To say it another way, is this sample correct:

         interface {
                 ringnumber: 0

                 # The following three values need to be set based on your environment
                 bindnetaddr: 182.128.3.0
                 mcastaddr: 239.0.0.1
                 mcastport: 5405
         }
         interface {
                 ringnumber: 1

                 # The following values need to be set based on your environment
                 bindnetaddr: 182.128.2.0
                 mcastaddr: 239.0.0.1
                 mcastport: 5405
         }


Thanks
Alain
_______________________________________________
Openais mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/openais

