I have a series of patches against the FreeBSD driver available on the jb-wip branch which I believe is (mostly) a candidate for merging to master. The changes in the branch include:
- 9d1881a Only include compat.h for if_wg.c and fix build with an obj directory.

  This patch permits building the module as part of a kernel build via
  the LOCAL_MODULES hook in 'make buildkernel'.

- 2c4b941 wg_queue_len: Remove locking.
- 61b401e wg_queue_delist_staged: Use more standard STAILQ_CONCAT.

  Misc small fixes, generally cosmetic.

- d746eed wgc_get/set: Use M_WAITOK with malloc().
- 9223fbb wg_clone_create: Use M_WAITOK with malloc.
- 9095ebe wg_peer_alloc and wg_aip_add: Use M_WAITOK with malloc.

  The ioctl path in FreeBSD generally uses M_WAITOK rather than
  M_NOWAIT (e.g. the non-smalldata case in sys_ioctl()). This slightly
  simplifies the code by removing allocation-failure edge cases that
  would otherwise need handling. In addition, FreeBSD admins generally
  do not expect 'ifconfig create' to fail non-deterministically (that
  is, they only expect failure for things like invalid arguments, a
  missing kernel module, etc.).

- 4fbba97 wg_mbuf_reset: Don't free send tags.
- 0a5fa77 ratelimit_init: Use callout_init_mtx.

  More misc small fixes.

- a679db5 DNM: Add counters for the times en/decrypt tasks do no work.

  This is the one commit that I don't think is a merge candidate;
  instead, it adds some counters useful for evaluating the effect of
  the next commit.

- 89f91dc Avoid scheduling excessive tasks for encryption/decryption.

  There is more detail in the commit log, but this commit changes the
  scheduling of encryption/decryption tasks to match the behavior of
  the WireGuard driver in Linux instead of the current approach of
  scheduling a task on all available CPUs for every packet.

  In my performance benchmarks of this series, this commit had the
  single largest effect of any of the changes. My benchmark consisted
  of running iperf across a tunnel between two jails on the same host
  (an X1 Carbon laptop with 8 CPU threads). This commit generally
  resulted in a doubling of throughput for both UDP and TCP with 1, 2,
  4, or 8 streams.
  To better explain why this change matters, I sampled the counters
  added in the previous commit for a sample run of iperf with a single
  TCP stream. The "empty" counters count the number of tasks which ran
  on a CPU but had no work to do; the "work" counters count the number
  of tasks which encrypted or decrypted at least one packet.

  Using the current code gave the following counts:

    hw.wg.encrypt_work: 992858
    hw.wg.encrypt_empty: 6274830
    hw.wg.decrypt_work: 1114235
    hw.wg.decrypt_empty: 7064707

    encrypt efficiency: 13.7%
    decrypt efficiency: 13.6%

  The efficiency is close to the 12.5% worst case for an 8 CPU system
  (one useful task out of the eight scheduled per packet).

  Using the code in this commit gave the following:

    hw.wg.encrypt_work: 1486616
    hw.wg.encrypt_empty: 783377
    hw.wg.decrypt_work: 1880567
    hw.wg.decrypt_empty: 657807

    encrypt efficiency: 65.5%
    decrypt efficiency: 74.1%

  Note: The increased "work" counts here are a result of the increased
  throughput.

  In addition, a user recently mailed Jason and me directly to say that
  this commit greatly reduced the power usage for a WG endpoint in an
  ESXi VM, putting the FreeBSD VM nearly on par with a Linux VM
  performing the same work.

- 4e0478f wg_module_init: Clean up more if the self tests fail.
- b885223 Return an error code from mbuf crypt routines.

  Small fixes preparing to use crypto support from FreeBSD.

- ce85779 Use OCF to encrypt/decrypt packets when supported.
- 8ad55a8 Use <crypto/chacha20_poly1305.h> when present.
- 447abb Use curve25519 API from the kernel when available.

  Use crypto support from FreeBSD's kernel on new-enough versions (the
  OCF bits are available on 13.0-stable and later, the rest are only
  present in 14.0-current). FreeBSD's kernel does not currently provide
  a suitable API for Blake2 that matches WireGuard's needs, but does
  provide suitable APIs for the other crypto algorithms used by
  WireGuard in 14.0-current.

--
John Baldwin
