An update of my current findings.
Most of the improvement I have seen so far comes from writing and optimizing
the poly1305_generic_blocks function in assembly.
This gives an improvement of more than 1%.
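For readers who want to see what poly1305_generic_blocks has to compute, here is a minimal C sketch of Poly1305 using five 26-bit limbs. This is the representation commonly used on 32-bit CPUs such as MIPS32, in the style of the public-domain poly1305-donna-32 code. All names here (p_le32, poly1305_blocks_sketch, poly1305_mac_sketch) are hypothetical. This is not WireGuard's code or the assembly from the branch above. The loop body of poly1305_blocks_sketch is the hot path an assembly version would target: per 16-byte block, 25 32x32->64-bit multiplies followed by a carry chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t p_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* For each 16-byte block: h = (h + m + hibit) * r mod 2^130 - 5, with h and r
 * held as five 26-bit limbs. hibit is 1 << 24 (2^128) for full blocks. */
static void poly1305_blocks_sketch(uint32_t h[5], const uint32_t r[5],
				   const uint8_t *m, size_t len, uint32_t hibit)
{
	/* Load key and accumulator into locals once so they can stay in registers. */
	const uint32_t r0 = r[0], r1 = r[1], r2 = r[2], r3 = r[3], r4 = r[4];
	const uint32_t s1 = r1 * 5, s2 = r2 * 5, s3 = r3 * 5, s4 = r4 * 5;
	uint32_t h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3], h4 = h[4], c;
	uint64_t d0, d1, d2, d3, d4;

	for (; len >= 16; m += 16, len -= 16) {
		h0 += p_le32(m + 0) & 0x3ffffff;
		h1 += (p_le32(m + 3) >> 2) & 0x3ffffff;
		h2 += (p_le32(m + 6) >> 4) & 0x3ffffff;
		h3 += (p_le32(m + 9) >> 6) & 0x3ffffff;
		h4 += (p_le32(m + 12) >> 8) | hibit;

		/* 5x5 limb multiply; s_i = 5 * r_i folds 2^130 back in as 5. */
		d0 = (uint64_t)h0 * r0 + (uint64_t)h1 * s4 + (uint64_t)h2 * s3 + (uint64_t)h3 * s2 + (uint64_t)h4 * s1;
		d1 = (uint64_t)h0 * r1 + (uint64_t)h1 * r0 + (uint64_t)h2 * s4 + (uint64_t)h3 * s3 + (uint64_t)h4 * s2;
		d2 = (uint64_t)h0 * r2 + (uint64_t)h1 * r1 + (uint64_t)h2 * r0 + (uint64_t)h3 * s4 + (uint64_t)h4 * s3;
		d3 = (uint64_t)h0 * r3 + (uint64_t)h1 * r2 + (uint64_t)h2 * r1 + (uint64_t)h3 * r0 + (uint64_t)h4 * s4;
		d4 = (uint64_t)h0 * r4 + (uint64_t)h1 * r3 + (uint64_t)h2 * r2 + (uint64_t)h3 * r1 + (uint64_t)h4 * r0;

		/* Carry back down to 26-bit limbs (partial reduction). */
		c = (uint32_t)(d0 >> 26); h0 = (uint32_t)d0 & 0x3ffffff;
		d1 += c; c = (uint32_t)(d1 >> 26); h1 = (uint32_t)d1 & 0x3ffffff;
		d2 += c; c = (uint32_t)(d2 >> 26); h2 = (uint32_t)d2 & 0x3ffffff;
		d3 += c; c = (uint32_t)(d3 >> 26); h3 = (uint32_t)d3 & 0x3ffffff;
		d4 += c; c = (uint32_t)(d4 >> 26); h4 = (uint32_t)d4 & 0x3ffffff;
		h0 += c * 5; c = h0 >> 26; h0 &= 0x3ffffff;
		h1 += c;
	}
	h[0] = h0; h[1] = h1; h[2] = h2; h[3] = h3; h[4] = h4;
}

/* One-shot tag: full blocks, 0x01-padded last block, final reduction, + s. */
static void poly1305_mac_sketch(uint8_t tag[16], const uint8_t *msg, size_t len,
				const uint8_t key[32])
{
	uint32_t r[5], h[5] = { 0 }, g[5], w[4], c, mask;
	uint8_t last[16] = { 0 };
	size_t full = len & ~(size_t)15;
	uint64_t f = 0;
	int i, j;

	/* Split r into 26-bit limbs and clamp it. */
	r[0] = p_le32(key + 0) & 0x3ffffff;
	r[1] = (p_le32(key + 3) >> 2) & 0x3ffff03;
	r[2] = (p_le32(key + 6) >> 4) & 0x3ffc0ff;
	r[3] = (p_le32(key + 9) >> 6) & 0x3f03fff;
	r[4] = (p_le32(key + 12) >> 8) & 0x00fffff;

	poly1305_blocks_sketch(h, r, msg, full, 1u << 24);
	if (len > full) {
		memcpy(last, msg + full, len - full);
		last[len - full] = 1;
		poly1305_blocks_sketch(h, r, last, 16, 0);
	}

	/* Fully carry h, then g = h + 5 - 2^130; keep g only if h >= p. */
	c = h[1] >> 26; h[1] &= 0x3ffffff;
	for (i = 2; i < 5; i++) { h[i] += c; c = h[i] >> 26; h[i] &= 0x3ffffff; }
	h[0] += c * 5; c = h[0] >> 26; h[0] &= 0x3ffffff;
	h[1] += c;
	for (c = 5, i = 0; i < 4; i++) { g[i] = h[i] + c; c = g[i] >> 26; g[i] &= 0x3ffffff; }
	g[4] = h[4] + c - (1u << 26);
	mask = (g[4] >> 31) - 1; /* all ones when there was no borrow */
	for (i = 0; i < 5; i++)
		h[i] = (h[i] & ~mask) | (g[i] & mask);

	/* Repack to 4 x 32 bits (mod 2^128), add s = key[16..31], store LE. */
	w[0] = h[0] | (h[1] << 26);
	w[1] = (h[1] >> 6) | (h[2] << 20);
	w[2] = (h[2] >> 12) | (h[3] << 14);
	w[3] = (h[3] >> 18) | (h[4] << 8);
	for (i = 0; i < 4; i++) {
		f = (uint64_t)w[i] + p_le32(key + 16 + 4 * i) + (f >> 32);
		for (j = 0; j < 4; j++)
			tag[4 * i + j] = (uint8_t)(f >> (8 * j));
	}
}
```

A sketch like this can be checked against the RFC 7539 section 2.5.2 test vector ("Cryptographic Forum Research Group"). It can then serve as a reference when comparing an assembly version's intermediate h values.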
I also noticed that the ping time does not change.
The current improvement is about 1.47% for UDP and 1.68% for TCP on
Here is my latest source code: https://github.com/vDorst/wireguard/tree/mips32r2
It includes the long history of trial and error ;).
But it also contains good ideas, such as optimizing the code for better data
dependency (so instructions wait less on the result of the previous one),
which makes the code less readable but more efficient.
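The data-dependency idea can be illustrated in plain C. When each operation needs the result of the previous one, an in-order pipeline stalls for the full latency of every step. Splitting the work into independent chains lets the next operation start while an earlier one is still in flight. This is a minimal, hypothetical sketch (prod_one_chain and prod_two_chains are illustrative names, not from the branch above). It uses 32-bit multiplication, which usually has a latency of several cycles on MIPS32 cores. Whether this helps on a given core, or whether the compiler already does it, is not measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* One chain: every multiply has to wait for the previous result. */
static uint32_t prod_one_chain(const uint32_t *v, size_t n)
{
	uint32_t acc = 1;

	for (size_t i = 0; i < n; i++)
		acc *= v[i];
	return acc;
}

/* Two independent chains: the multiplies into a and b do not depend on each
 * other, so an in-order pipeline can overlap their latencies. Multiplication
 * mod 2^32 is associative and commutative, so the result is identical. */
static uint32_t prod_two_chains(const uint32_t *v, size_t n)
{
	uint32_t a = 1, b = 1;
	size_t i = 0;

	for (; i + 1 < n; i += 2) {
		a *= v[i];
		b *= v[i + 1];
	}
	if (i < n)
		a *= v[i]; /* odd element left over */
	return a * b;
}
```

The trade-off matches the one described above: the second version is a little harder to read, in exchange for shorter dependency chains.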
This is the assembly part
Not yet.
But I think more platforms suffer from this misaligned memory fetching.
So if someone also fixes this in the C code, it will boost
performance even without the assembly version.
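In C, misaligned fetching usually comes from dereferencing a uint32_t pointer that points into a byte buffer at an arbitrary offset. Two alignment-safe alternatives are sketched below. The function names are hypothetical. Inside the Linux kernel, the existing helper for the little-endian case is get_unaligned_le32(). The comment about MIPS trapping describes typical Linux behaviour and is not a measurement from this thread.

```c
#include <stdint.h>
#include <string.h>

/* Avoid: uint32_t w = *(const uint32_t *)p;
 * If p is not 4-byte aligned this is undefined behaviour in C; on MIPS a
 * misaligned lw typically traps and the kernel has to emulate the access,
 * which is slow. It also returns host byte order. */

/* Alignment-safe and endian-independent: assemble the word from bytes. */
static uint32_t load_le32_bytes(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Alignment-safe, host byte order: the compiler may lower this to
 * unaligned-load instructions (e.g. lwl/lwr on MIPS32) instead of a call. */
static uint32_t load_host32_memcpy(const uint8_t *p)
{
	uint32_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}
```

The byte-wise version costs a few extra shifts and ORs. It never traps, and it gives the same result on big- and little-endian MIPS.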
Greetings,
René
Quoting Baptiste Jonglez :
Nice work! I had tried to write chacha20_generic_block in MIPS assembly,
but I got confused with endianness issues and the code didn't work in the
end.
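Endianness is a common stumbling block here. ChaCha20 defines its key, nonce and output as little-endian 32-bit words, and MIPS routers exist in both big- and little-endian variants. A byte-order-independent C reference is therefore useful when checking an assembly version word by word. The following is a minimal sketch that follows the RFC 7539 block layout (32-bit counter, 96-bit nonce). The names chacha20_block_sketch, c_ld_le32, c_st_le32, QR and ROTL32 are hypothetical and are not WireGuard's chacha20_generic_block.

```c
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = ROTL32(d, 16);	\
	c += d; b ^= c; b = ROTL32(b, 12);	\
	a += b; d ^= a; d = ROTL32(d, 8);	\
	c += d; b ^= c; b = ROTL32(b, 7);	\
} while (0)

/* Byte-wise loads/stores: correct on little- and big-endian hosts alike. */
static uint32_t c_ld_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void c_st_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* RFC 7539 layout: 4 constants, 8 key words, 32-bit counter, 3 nonce words. */
static void chacha20_block_sketch(uint8_t out[64], const uint8_t key[32],
				  uint32_t counter, const uint8_t nonce[12])
{
	uint32_t in[16], x[16];
	int i;

	in[0] = 0x61707865; in[1] = 0x3320646e; in[2] = 0x79622d32; in[3] = 0x6b206574;
	for (i = 0; i < 8; i++)
		in[4 + i] = c_ld_le32(key + 4 * i);
	in[12] = counter;
	for (i = 0; i < 3; i++)
		in[13 + i] = c_ld_le32(nonce + 4 * i);

	for (i = 0; i < 16; i++)
		x[i] = in[i];
	for (i = 0; i < 10; i++) { /* 10 double rounds = 20 rounds */
		QR(x[0], x[4], x[8], x[12]);  /* column rounds */
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]); /* diagonal rounds */
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	/* Add the input and serialize each word little-endian, whatever the host order. */
	for (i = 0; i < 16; i++)
		c_st_le32(out + 4 * i, x[i] + in[i]);
}
```

On a big-endian router, an assembly block function that skips the byte swaps would still produce the correct state words. Its serialized output, however, would differ from this reference in every word, which is an easy symptom to spot.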
Is your code available somewhere? I'd be happy to test on a variety of
MIPS routers.
On Fri, Sep 09, 2016 at 01:46:11PM +, René van Dorst
I did try to write some MIPS32r2 code.
I wrote the chacha20_keysetup, chacha20_generic_block and
poly1305_generic_blocks in assembly.
I tried to load all needed variables into registers, which should
reduce the memory overhead.
But it is very difficult for me to do code profiling and/or
Would you like to write it?
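The approach of loading all needed variables into registers has a C counterpart. Copy state out of the context structure into local variables before the hot loop, then write it back once afterwards. A uint32_t output pointer may legally alias the uint32_t fields of the context, so without this the compiler may have to reload those fields from memory after every store. The struct and function names below are hypothetical and for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

struct mix_ctx { /* hypothetical context, for illustration only */
	uint32_t a, b;
};

/* State updated through the pointer: since out[] may alias *ctx as far as
 * the compiler knows, it may reload ctx->a and ctx->b after every store. */
static void mix_via_memory(struct mix_ctx *ctx, uint32_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		ctx->a += ctx->b;
		ctx->b ^= ctx->a << 3;
		out[i] = ctx->a;
	}
}

/* Same computation with the state held in locals (registers) for the whole
 * loop and written back once. Assumes out does not overlap *ctx. */
static void mix_via_registers(struct mix_ctx *ctx, uint32_t *out, size_t n)
{
	uint32_t a = ctx->a, b = ctx->b;

	for (size_t i = 0; i < n; i++) {
		a += b;
		b ^= a << 3;
		out[i] = a;
	}
	ctx->a = a;
	ctx->b = b;
}
```

Both functions give the same result as long as out and ctx do not overlap. The second version gives the compiler the same freedom that hand-written assembly has when it keeps everything in registers.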
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
http://lists.zx2c4.com/mailman/listinfo/wireguard