I am using the LEDE project's default kernel.
My comparison is only between the patched C version with the aligned
memory reads and my assembly version of the module.
I think it is too complex for GCC to optimize, so it flows the code by
This results in a lot of data hazards.
By scheduling the instructions by hand you can prevent many data hazards.
The trick is to do two things at once by weaving their code together,
although this results in less maintainable code.
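
To illustrate the weaving idea, here is a minimal sketch in C. It is not the
assembly module itself; the function names and the toy "add then multiply"
step are made up for the example. The first function is one long dependency
chain, where every step has to wait for the previous result. The second runs
two independent chains side by side, so the instruction for chain B can issue
while chain A's result is still in flight. A compiler is free to reorder this
code on its own, so the C only shows the intended ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy step: h = (h + m[i]) * r, wrapping modulo 2^32.
 * Every iteration depends on the h produced by the previous one. */
uint32_t mix_serial(const uint32_t *m, size_t n, uint32_t r)
{
    uint32_t h = 0;

    for (size_t i = 0; i < n; i++)
        h = (h + m[i]) * r;
    return h;
}

/* The same step applied to two independent inputs, woven together.
 * No statement uses the result of the statement directly before it,
 * which is the ordering a hand-scheduled version would aim for. */
void mix_two_interleaved(const uint32_t *ma, const uint32_t *mb, size_t n,
                         uint32_t r, uint32_t *ha, uint32_t *hb)
{
    uint32_t a = 0, b = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t ta = a + ma[i]; /* chain A, add      */
        uint32_t tb = b + mb[i]; /* chain B, add      */
        a = ta * r;              /* chain A, multiply */
        b = tb * r;              /* chain B, multiply */
    }
    *ha = a;
    *hb = b;
}
```

Both functions compute the same values per input; only the order of the
operations differs. Whether that order survives compilation depends on the
compiler and its scheduling model for the target CPU.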
René van Dorst.
Quoting "Jason A. Donenfeld" <ja...@zx2c4.com>:
That's excellent. Thanks for writing that. I'll review this implementation.
Is your speed up compared to your unaligned optimization from the
other patch? Or is that against vanilla?
With only a 1% increase, I'm first interested to see where precisely
that improvement is coming from, and if we could squeeze that out of
gcc instead, so that they're producing more or less the same code.
WireGuard mailing list