FYI, I recently tried NIC bonding on CentOS 5.2 32-bit and ran into bugs in the bonding driver that caused kernel panics. I ended up disabling bonding because it made the node less stable, not more.
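For reference, what I was testing was a plain active-backup bond, roughly along these lines (device names, addresses, and options here are illustrative, not my exact config):

```ini
# /etc/modprobe.conf -- mode=1 is active-backup; miimon=100 polls link state every 100 ms
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- repeat equivalently for eth1
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```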
Eliot Gable
Senior Engineer
1228 Euclid Ave, Suite 390
Cleveland, OH 44115
Direct: 216-373-4808
Fax: 216-373-4657
ega...@broadvox.net

-----Original Message-----
From: Lars Marowsky-Bree [mailto:l...@suse.de]
Sent: Thursday, June 04, 2009 12:05 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

On 2009-05-25T18:10:32, Florian Haas <florian.h...@linbit.com> wrote:

> I've repeatedly told customers that NIC bonding is not a valid
> substitute for redundant Heartbeat links, I will stubbornly insist it
> isn't one for OpenAIS RRP links either.

I think your stubbornness is misguided, actually. I had a similar initial reaction when I looked at this - before ending up recommending bonding - but it turns out that bonding actually seemed preferable.

The downside with RRP, as mentioned on IRC, is that it is "only" available to OpenAIS clients. The DLM, drbd, and other software open independent TCP connections - not to mention the server-client connectivity - which only benefit from redundancy if bonding is used.

> Some reasons:

These reasons are all technically valid, but I don't think they outweigh the benefit of getting redundancy for all cluster communications.

> - You're not protected against bugs, currently known or unknown, in the
>   bonding driver. If bonding itself breaks, you're screwed.

The same is true for bugs in the network stack in general.

> - Most people actually run bonding over interfaces of the same make,
>   model, and chipset. That's not necessarily optimal, but it's a reality.
>   Thus, if your driver breaks, you're screwed again.
> Granted, this is probably true if you ran two RRP links in that same
> configuration too.

Exactly. Some of this can be offset by at least running different NICs in different nodes, which mitigates the problem at the cluster level even if a single node goes down.

> - Finally, you can't bond between a switched and a direct back-to-back
>   connection, which makes bonding entirely unsuitable for the redundant
>   links use case I described earlier.

Yes, bonding implies a different deployment mode than the scenario you described. On the other hand, modifying the deployment scenario would give you more redundancy even for the replication, which has benefits too.

> That I fully agree with. The question is what "working properly" means
> in this case -- should it be capable of auto-recovery, or should it not?

Despite the above arguments - which are why nowadays I'd design my clusters with bonding in mind - I of course agree that RRP _should_ work. Just like drbd/DLM/etc. should work with SCTP to make use of the redundant, un-bonded links. But for the time being, I think bonded NICs are overall the best solution.

Regards,
    Lars

--
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker