On 7/30/25 10:59, Friedrich Weber wrote:
> Testing has shown that running corosync (only) over a bond can be
> problematic in some failure scenarios and for certain bond modes. The
> documentation only discourages bonds for corosync because corosync can
> switch between available networks itself, but does not mention other
> caveats when using bonds for corosync.
>
> Hence, extend the documentation with recommendations and caveats
> regarding bonds for corosync.
>
> Signed-off-by: Friedrich Weber <f.we...@proxmox.com>
> ---
>
> Notes:
>     Aaron suggested we could expose the bond-lacp-rate in the GUI to
>     make it easier to change the setting on the PVE side. I'd open a
>     feature report for this.
>
>     Changes since v3:
>     - describe recommendations first, and further details for interested
>       readers below. Consequently, rephrase failure scenario description
>       (thx HD!)
>
>     Changes since v2:
>     - fix wording in the failure scenario description
>     - explain that load-balancing bond modes are affected and why
>     - clarify that the caveats apply whenever a bond is used for Corosync
>       traffic (even if only as a redundant link)
>
>     Changes since v1:
>     - move to its own section under "Cluster Network"
>     - reword remarks about bond-lacp-rate fast
>     - reword remark under "Requirements"
>
>  pve-network.adoc |  4 ++-
>  pvecm.adoc       | 68 +++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 67 insertions(+), 5 deletions(-)
>
> diff --git a/pve-network.adoc b/pve-network.adoc
> index 2dec882..b361f97 100644
> --- a/pve-network.adoc
> +++ b/pve-network.adoc
> @@ -495,7 +495,9 @@ use the active-backup mode.
>
>  For the cluster network (Corosync) we recommend configuring it with multiple
>  networks. Corosync does not need a bond for network redundancy as it can switch
> -between networks by itself, if one becomes unusable.
> +between networks by itself, if one becomes unusable.
> +Some bond modes are known
> +to be problematic for Corosync, see
> +xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
>
>  The following bond configuration can be used as distributed/shared
>  storage network. The benefit would be that you get more speed and the
> diff --git a/pvecm.adoc b/pvecm.adoc
> index 312a26f..3af1a06 100644
> --- a/pvecm.adoc
> +++ b/pvecm.adoc
> @@ -89,10 +89,8 @@ NOTE: To ensure reliable Corosync redundancy, it is essential to have at least
>  another link on a different physical network. This enables Corosync to keep the
>  cluster communication alive should the dedicated network be down.
>  +
> -NOTE: A single link backed by a bond is not enough to provide Corosync
> -redundancy. When a bonded interface fails and Corosync cannot fall back to
> -another link, it can lead to asymmetric communication in the cluster, which in
> -turn can lead to the cluster losing quorum.
> +NOTE: A single link backed by a bond can be problematic in certain failure
> +scenarios, see xref:pvecm_corosync_over_bonds[Corosync Over Bonds].
>
>  * The root password of a cluster node is required for adding nodes.
>
> @@ -606,6 +604,68 @@ transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf
>  but keep in mind that this will disable all cryptography and redundancy support.
>  This is therefore not recommended.
>
> +[[pvecm_corosync_over_bonds]]
> +Corosync Over Bonds
> +~~~~~~~~~~~~~~~~~~~
> +
> +Recommendations
> +^^^^^^^^^^^^^^^
> +
> +We recommend at least one dedicated physical NIC for the primary Corosync link,
> +see xref:pvecm_cluster_requirements[Requirements].
> +xref:sysadmin_network_bond[Bonds] may be used as additional links for increased
> +redundancy. The following caveats apply *whenever a bond is used for Corosync
> +traffic*:
> +
> +* Bond mode *active-backup* may not provide the expected redundancy in certain
> +  failure scenarios, see below for details.
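As an aside for readers of this thread, the patch's recommendation to give Corosync several independent links rather than one bond can be pictured with a minimal `corosync.conf` nodelist sketch. This is not part of the patch; the node name and addresses are placeholders, and kronosnet fails over between the declared links on its own:

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    # link 0: dedicated Corosync network (used preferentially)
    ring0_addr: 10.10.10.1
    # link 1: redundant link on a different physical network
    ring1_addr: 192.0.2.1
  }
}
```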
> +
> +* We *advise against* using bond modes *balance-rr*, *balance-xor*,
> +  *balance-tlb*, or *balance-alb* for Corosync traffic. They are known to be
> +  problematic in certain failure scenarios, see below for details.
> +
> +* *IEEE 802.3ad (LACP)*: If LACP bonds are used for Corosync traffic, we
> +  strongly recommend setting `bond-lacp-rate fast` *on the Proxmox VE node and
> +  the switch*! With the default setting `bond-lacp-rate slow`, this mode is

Looking at the rendered version, having the `bond-lacp-rate fast` and then
the bold sentence afterwards seems a bit much. Maybe we could limit the
bold parts to just `Proxmox VE` and `switch` here instead?
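To make the `bond-lacp-rate fast` advice concrete, an ifupdown2 `/etc/network/interfaces` stanza might look roughly as follows. This is a sketch only, not from the patch: the interface names `eno1`/`eno2` and the address are placeholders, and the switch ports need a matching LACP configuration:

```
auto bond0
iface bond0 inet static
    address 10.10.10.1/24
    bond-slaves eno1 eno2
    bond-mode 802.3ad
    bond-miimon 100
    # request LACPDUs every second instead of every 30 seconds;
    # set the equivalent option on the switch side as well
    bond-lacp-rate fast
```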
> +  known to be problematic in certain failure scenarios, see below for details.
> +
> +Background
> +^^^^^^^^^^
> +
> +Using a xref:sysadmin_network_bond[bond] as a Corosync link can be problematic
> +in certain failure scenarios. Consider the failure scenario where one of the
> +bonded interfaces fails and stops transmitting packets, but its link state
> +stays up, and there are no other Corosync links available. In this scenario,
> +some bond modes may cause a state of asymmetric connectivity where cluster
> +nodes can only communicate with different subsets of other nodes. Affected are
> +bond modes that provide load balancing, as these modes may still try to send
> +out a subset of packets via the failed interface. In case of asymmetric
> +connectivity, Corosync may not be able to form a stable quorum in the cluster.
> +If this state persists and HA is enabled, even nodes whose bond does not have
> +any issues may fence themselves. In the worst case, the whole cluster may fence
> +itself.
> +
> +The bond mode *active-backup* will not cause asymmetric connectivity in the

Maybe we can make the `not` here bold as well, to better differentiate its
behavior from the other bond modes?

> +failure scenario described above. However, the bond with the interface failure
> +may not switch over to the backup link. The node may lose connection to the
> +cluster and, if HA is enabled, fence itself.
> +
> +Bond modes *balance-rr*, *balance-xor*, *balance-tlb*, or *balance-alb* may
> +cause asymmetric connectivity in the failure scenario above, which can lead to
> +unexpected fencing if HA is enabled.
> +
> +Bond mode *IEEE 802.3ad (LACP)* can cause asymmetric connectivity in the
> +failure scenario above, but it can recover from this state, as each side of the
> +bond (Proxmox VE node and switch) can stop using a bonded interface if it has
> +not received three LACPDUs in a row on it.
> +However, with default settings,
> +LACPDUs are only sent every 30 seconds, yielding a failover time of 90 seconds.
> +This is too long, as nodes with HA resources will fence themselves already
> +after roughly one minute without a stable quorum. If LACP bonds are used for
> +Corosync traffic, we recommend setting `bond-lacp-rate fast` on the Proxmox VE
> +node and the switch! Setting this option on one side requests the other side to

This should match the part above and be bold as well.

> +send an LACPDU every second. Setting this option on both sides can reduce the
> +failover time in the scenario above to 3 seconds and thus prevent fencing.
> +
>  Separate Cluster Network
>  ~~~~~~~~~~~~~~~~~~~~~~~~

The changes look good to me, so consider this:

Reviewed-by: Mira Limbeck <m.limb...@proxmox.com>

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel