That's excellent. Thanks for writing that. I'll review this implementation.
Is your speedup measured against your unaligned optimization from the
other patch? Or is that against vanilla?
With only a 1% increase, I'm first interested to see where precisely
that improvement is coming from, and whether we could squeeze that out
of gcc instead, so that they're producing more or less the same code.
WireGuard mailing list