FYI, I recently tried NIC bonding on CentOS 5.2 32-bit and ran into bugs in the bonding driver that caused kernel panics. I ended up disabling bonding because it made the node less stable, not more.
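For reference, what I was testing was a plain active-backup bond, roughly along these lines (device names, addresses, and options here are illustrative, not my exact config):

```ini
# /etc/modprobe.conf -- mode=1 is active-backup; miimon=100 polls link state every 100 ms
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- repeat equivalently for eth1
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```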
Eliot Gable
Senior Engineer
1228 Euclid Ave, Suite 390
Cleveland, OH 44115
Direct: 216-373-4808
Fax: 216-373-4657
ega...@broadvox.net

-----Original Message-----
From: Lars Marowsky-Bree [mailto:l...@suse.de]
Sent: Thursday, June 04, 2009 12:05 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

On 2009-05-25T18:10:32, Florian Haas <florian.h...@linbit.com> wrote:

> I've repeatedly told customers that NIC bonding is not a valid
> substitute for redundant Heartbeat links, I will stubbornly insist it
> isn't one for OpenAIS RRP links either.

I think your stubbornness is misguided, actually. I had a similar initial reaction when I looked at this - before ending up recommending bonding - but it turns out that bonding actually seemed preferable.

The downside with RRP, as mentioned on IRC, is that it is "only" available to OpenAIS clients. The DLM, drbd, and other software open independent TCP connections - not to mention the server-client connectivity - which only benefit from redundancy if bonding is used.

> Some reasons:

These reasons are all technically valid, but I don't think they outweigh the benefit of getting redundancy for all cluster communications.

> - You're not protected against bugs, currently known or unknown, in the
>   bonding driver. If bonding itself breaks, you're screwed.

The same is true for bugs in the network stack in general.

> - Most people actually run bonding over interfaces of the same make,
>   model, and chipset. That's not necessarily optimal, but it's a reality.
>   Thus, if your driver breaks, you're screwed again.
> Granted, this is probably true if you ran two RRP links in that same
> configuration too.

Exactly. Some of this can be offset by at least running different NICs in different nodes, which mitigates the problem at the cluster level even if a single node goes down.

> - Finally, you can't bond between a switched and a direct back-to-back
>   connection, which makes bonding entirely unsuitable for the redundant
>   links use case I described earlier.

Yes, bonding implies a different deployment mode than the scenario you described. On the other hand, modifying the deployment scenario would give you more redundancy even for the replication, which has benefits too.

> That I fully agree with. The question is what "working properly" means
> in this case -- should it be capable of auto-recovery, or should it not?

Despite the above arguments - which are why nowadays I'd design my clusters with bonding in mind - I of course agree that RRP _should_ work. Just like drbd/DLM/etc. should work with SCTP to make use of the redundant, un-bonded links. But for the time being, I think bonded NICs are overall the best solution.

Regards,
    Lars

--
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker