On 07.05.25 17:22, Kevin Schneider wrote:
IMO this isn't strict enough and we should emphasize the importance of the problem. I would go for:

To ensure reliable Corosync redundancy, it is essential to use at least two separate physical and logical networks. A single bonded interface does not provide Corosync redundancy on its own. If a bonded interface fails and there is no second Corosync link, this can lead to asymmetric communication, causing all nodes to lose quorum, even if more than half of them can still communicate with each other.


Although a bond on the interface together with MLAG'd switches CAN provide further resiliency in case of switch or single NIC PHY failure, it does not protect against total failure of the NIC, of course.


I think adding a "typical topologies" or "example topologies" section to the docs might be a good idea?


Below is my personal, opinionated recommendation after deploying quite a number of Proxmox clusters. Of course I don't expect everyone to agree with this... but hopefully it can serve as a starting point?


Typical topologies:

In most cases, a server for a Proxmox cluster will have at least two physical NICs. One is usually a low or medium speed dual-port onboard NIC (1GBase-T or 10GBase-T). The other one is typically a medium or high speed add-in PCIe NIC (e.g. 10G SFP+, 40G QSFP+, 25G SFP28, 100G QSFP28). There may be more NICs depending on the specific use case, e.g. a separate NIC for Ceph Cluster (private, replication, back-side) traffic.

In such a setup, it is recommended to reserve the low or medium speed onboard NICs for cluster traffic (and potentially management purposes). These NICs should be connected using a switch. Although for very small clusters (3 nodes) and a dual-port NIC a ring topology could be used to connect the nodes together, this is not recommended as it makes later expansion more troublesome.

It is recommended to use a physically separate switch just for the cluster network. If your main switch is the only way for nodes to communicate, failure of this switch will take out your entire cluster with potentially catastrophic consequences.

For single-port onboard NICs there are no further design decisions to make. However, onboard NICs are almost always dual port, which allows some more freedom in the design of the cluster network.

Design of the dedicated cluster network:

a) Two separate cluster switches, switches support MLAG or Stacking / Virtual Chassis

This is the ideal scenario: you deploy two managed switches in an MLAG or Stacking / Virtual Chassis configuration. This requires the switches to have a link between them, called IPL ("Inter Peer Link"). MLAG or Stacking / Virtual Chassis makes the two switches behave as if they were one, but if one switch fails, the remaining one will still work and take over seamlessly!

Each cluster node is connected to both switches. Both NIC ports on each node are bonded together (LACP recommended).

This topology provides a very good degree of resiliency.

The bond is configured as Ring0 for corosync.
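
For reference, a minimal /etc/network/interfaces sketch of such a bond (interface names, addresses and the exact options are placeholder assumptions, adjust to your hardware):

    auto eno1
    iface eno1 inet manual

    auto eno2
    iface eno2 inet manual

    # LACP bond over both onboard ports, dedicated to corosync / cluster traffic
    auto bond0
    iface bond0 inet static
        address 10.10.10.1/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

The switch side needs a matching LACP port-channel / MLAG interface spanning both switches, of course.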


b) Two separate cluster switches, switches DO NOT support MLAG or Stacking / Virtual Chassis

In this scenario you deploy two separate switches (potentially unmanaged). There should not be a link between the switches, as this can easily lead to loops and makes the entire configuration more complex.

Each cluster node is connected to both switches, but the NIC ports are not bonded together. Typically, both NIC ports will be in separate IP subnets.

This topology provides a slightly lower degree of resiliency than option a).

One switch / broadcast domain is configured as Ring0 for corosync, the other one is configured as Ring1.
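
As a sketch, the nodelist in corosync.conf would then look roughly like this (node names and addresses are placeholders; with PVE you would normally not edit this by hand but pass both addresses via the --link0 / --link1 options of pvecm create / pvecm add):

    nodelist {
      node {
        name: pve1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
        ring1_addr: 10.20.20.1
      }
      node {
        name: pve2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 10.10.10.2
        ring1_addr: 10.20.20.2
      }
    }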


c) Single separate cluster switch

If you only want to deploy a single switch that is reserved for cluster traffic, you can either use a single NIC port on each node, or both bonded together. It will not make much of a difference, as bonding will only protect against single PHY / port failure.

The interface is configured as Ring0 for corosync.
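
If you do bond the two ports, a minimal sketch could look like this (placeholders again; active-backup has the advantage of working even with an unmanaged switch that does not speak LACP):

    # active-backup bond to the single cluster switch
    auto bond0
    iface bond0 inet static
        address 10.10.10.1/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode active-backup
        bond-primary eno1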


Usage of the other NICs for redundancy purposes:
It is recommended to add the other NICs / networks in the system as backup links / additional rings to corosync. Bad connectivity over a potentially congested storage network is better than no connectivity at all in case the dedicated cluster network has failed and there is no other backup.
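
With knet (corosync 3.x) you can also give the links explicit priorities, so that corosync prefers the dedicated cluster network and only falls back to the other networks when it is actually down. A sketch for the totem section of corosync.conf (priority values are arbitrary placeholders; if I remember the knet semantics correctly, with the default passive link mode the highest-priority link that is up carries the traffic):

    totem {
      ...
      interface {
        # dedicated cluster network, preferred
        linknumber: 0
        knet_link_priority: 20
      }
      interface {
        # e.g. the storage network, backup only
        linknumber: 1
        knet_link_priority: 10
      }
    }

If I recall correctly, the priorities can also be set at cluster creation / join time via pvecm (e.g. --link0 <addr>,priority=20).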


