Hey all, I'm investigating WireGuard as an alternative to mTLS in our infrastructure, and I'm trying to understand what's achievable in terms of throughput and incremental CPU overhead. At the moment I'm focused on throughput.
We run 48-core machines with 25 Gbit/s NICs on Linux kernel 5.10.113. Benchmarking unencrypted traffic between hosts in the same data center achieves line rate, while WireGuard initially hit ~2 Gbit/s. After bumping WireGuard's MTU from the default up to 8920, I'm able to get ~12 Gbit/s consistently over repeated benchmarks. At this point I'm at a bit of a loss as to where to look for further gains. Independent results seem to be sparse for this kind of workload, but in Cilium's benchmark they were able to get around 17.5 Gbit/s [0] with the setup described in [1].

One thing I'm wondering (probably incorrectly?) is whether we're being impacted by WireGuard's lack of CPU affinity. Our NIC has 8 queues which are bound to explicit cores, and my understanding is that WireGuard's encrypt/decrypt handling is going to get splayed across all 48 cores.

[0] https://docs.cilium.io/en/latest/operations/performance/benchmark/#wireguard-vs-ipsec
[1] https://docs.cilium.io/en/latest/operations/performance/benchmark/#test-hardware
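For anyone sanity-checking the MTU numbers above, here's a rough back-of-the-envelope sketch. It assumes (not stated above) a 9000-byte jumbo-frame underlay and an IPv6 outer header, which is where an 8920 tunnel MTU would come from; the per-field sizes are the standard WireGuard data-message layout. The function names are just illustrative, and the packets-per-second figure is approximate because it treats every packet as a full MTU-sized one.

```python
# Rough arithmetic only: WireGuard per-packet overhead and implied packet rates.
# Assumptions (not from the post): 9000-byte underlay MTU, IPv6 outer header,
# and every packet being exactly MTU-sized.

WG_HEADER = 4 + 4 + 8        # message type + receiver index + counter (bytes)
WG_AUTH_TAG = 16             # Poly1305 authentication tag (bytes)
UDP_HEADER = 8               # outer UDP header (bytes)
OUTER_IP = {"ipv4": 20, "ipv6": 40}  # outer IP header without options (bytes)


def wg_overhead(outer="ipv6"):
    """Bytes added to each tunneled packet by the outer headers and WireGuard."""
    return OUTER_IP[outer] + UDP_HEADER + WG_HEADER + WG_AUTH_TAG


def tunnel_mtu(underlay_mtu, outer="ipv6"):
    """Largest inner packet that fits in one underlay packet without fragmenting."""
    return underlay_mtu - wg_overhead(outer)


def packets_per_second(gbit_per_s, mtu):
    """Approximate packet rate if all packets are full MTU-sized."""
    return gbit_per_s * 1e9 / 8 / mtu


if __name__ == "__main__":
    print("overhead (ipv6 outer):", wg_overhead("ipv6"), "bytes")
    print("tunnel MTU on 1500 underlay:", tunnel_mtu(1500))
    print("tunnel MTU on 9000 underlay:", tunnel_mtu(9000))
    for gbit, mtu in [(2, 1420), (12, 8920), (25, 8920)]:
        print(f"{gbit} Gbit/s at MTU {mtu}: ~{packets_per_second(gbit, mtu):,.0f} pps")
```

Under those assumptions, ~2 Gbit/s at the default 1420 MTU and ~12 Gbit/s at 8920 both work out to a similar packet rate (on the order of 170k packets per second), which would be consistent with a per-packet rather than per-byte limit. That's only a hint from the arithmetic, not a measurement.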