Hey René,
I've begun trying to integrate your excellent work into WireGuard in
the branch rvd/mips:
https://git.zx2c4.com/WireGuard/commit/?h=rvd/mips
It seems like there's still a bit of cleaning up and polishing to do,
but it's headed in a great direction. There's a lot of weird
formatting and
Hi Jason,
I am using the LEDE project's default kernel.
My comparison is only between the patched C version with the aligned
memory reads and my assembly module.
I think the code is too complex for GCC to optimize, so it follows the
code to the letter.
This results in a lot of data hazards
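To make "data hazards" concrete, here is a generic C sketch (not taken from René's code): with a single running sum every addition must wait for the previous result, while two independent accumulators give an in-order pipeline two chains it can interleave. Whether a particular MIPS core or GCC version actually benefits is an assumption to measure, not a result from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each addition depends on the previous one, so an
 * in-order pipeline stalls on every result. */
static uint32_t sum_serial(const uint32_t *v, size_t n)
{
	uint32_t acc = 0;
	for (size_t i = 0; i < n; i++)
		acc += v[i];
	return acc;
}

/* Two accumulators: even and odd elements form independent dependency
 * chains. The result is identical because unsigned addition wraps
 * modulo 2^32 and is associative and commutative. */
static uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
	uint32_t acc0 = 0, acc1 = 0;
	size_t i = 0;
	for (; i + 1 < n; i += 2) {
		acc0 += v[i];
		acc1 += v[i + 1];
	}
	if (i < n)
		acc0 += v[i]; /* odd leftover element */
	return acc0 + acc1;
}
```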
Hey René,
That's excellent. Thanks for writing that. I'll review this implementation.
Is your speedup measured against your unaligned optimization from the
other patch, or against vanilla?
With only a 1% increase, I'm first interested to see where precisely
that improvement is coming from, a
On 09.09.2016 15:52, Baptiste Jonglez wrote:
> Nice work! I had tried to write chacha20_generic_block in MIPS assembly,
> but I got confused with endianness issues and the code didn't work in the
> end.
>
> Is your code available somewhere? I'd be happy to test on a variety of
> MIPS routers.
An update on my current findings.
Most of the improvement I have seen so far comes from rewriting and
optimizing the poly1305_generic_blocks function.
This gives an improvement of more than 1%.
I also noticed that the ping time does not change.
The improvement at the moment is around UDP: ~1.47%, TCP: ~1.68% on l
Here is my latest source code: https://github.com/vDorst/wireguard/tree/mips32r2
It includes the long history of trial and error ;), but also some good
ideas, like reordering the code to reduce data dependencies between
neighbouring instructions. That makes the code less readable but more
efficient.
This is the assembly part:
https://git
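The kind of reordering described above can be sketched in C. The four column quarter rounds of a ChaCha20 round touch disjoint state words, so their steps can be interleaved: once the loops are unrolled, adjacent operations belong to different columns and do not wait on each other. This is an illustrative sketch of the general idea, not the code in the linked branch; the function names are hypothetical.

```c
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha quarter round: every step depends on the step before it. */
static void quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(x[d], 16);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(x[b], 12);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(x[d], 8);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(x[b], 7);
}

/* Column round written one quarter round at a time. */
static void column_round_serial(uint32_t x[16])
{
	quarter_round(x, 0, 4, 8, 12);
	quarter_round(x, 1, 5, 9, 13);
	quarter_round(x, 2, 6, 10, 14);
	quarter_round(x, 3, 7, 11, 15);
}

/* Same column round, interleaved: each loop applies one step of the
 * quarter round to all four independent columns i = 0..3. */
static void column_round_interleaved(uint32_t x[16])
{
	int i;
	for (i = 0; i < 4; i++) x[i] += x[i + 4];
	for (i = 0; i < 4; i++) x[i + 12] = ROTL32(x[i + 12] ^ x[i], 16);
	for (i = 0; i < 4; i++) x[i + 8] += x[i + 12];
	for (i = 0; i < 4; i++) x[i + 4] = ROTL32(x[i + 4] ^ x[i + 8], 12);
	for (i = 0; i < 4; i++) x[i] += x[i + 4];
	for (i = 0; i < 4; i++) x[i + 12] = ROTL32(x[i + 12] ^ x[i], 8);
	for (i = 0; i < 4; i++) x[i + 8] += x[i + 12];
	for (i = 0; i < 4; i++) x[i + 4] = ROTL32(x[i + 4] ^ x[i + 8], 7);
}
```

Both versions compute the same state; the interleaved one simply exposes four independent chains, which is what a hand-scheduled assembly version would exploit.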
Not yet.
But I think more platforms suffer from this misaligned memory fetching.
So if someone also fixes this in the C code, it will boost the
performance even without the assembly version.
Greetings,
René
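One possible shape for such a C fix, sketched under assumptions: before decoding a 16-byte block, check whether the pointer is 4-byte aligned and, if not, copy the block into an aligned stack buffer, so that the 32-bit loads that follow are always aligned. `aligned_block` is a hypothetical helper, not WireGuard's actual patch, and whether the extra copy beats the misaligned loads on a given MIPS core is something to measure.

```c
#include <stdint.h>
#include <string.h>

/* Return a 4-byte-aligned pointer to the 16-byte block at src: src itself
 * when it is already aligned, otherwise an aligned copy placed in buf
 * (uint32_t storage is guaranteed to be 4-byte aligned). */
static const uint8_t *aligned_block(const uint8_t *src, uint32_t buf[4])
{
	if (((uintptr_t)src & 3) == 0)
		return src;
	memcpy(buf, src, 16);
	return (const uint8_t *)buf;
}
```

In a blocks loop, this would be called once per 16-byte block, with the word loads then done on the returned pointer.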
Quoting Baptiste Jonglez:
Nice work! I had tried to write chacha20_generic_block in MIPS assembly,
but I got confused with endianness issues and the code didn't work in the
end.
Is your code available somewhere? I'd be happy to test on a variety of
MIPS routers.
On Fri, Sep 09, 2016 at 01:46:11PM +, René van Dorst wrote:
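Due to misaligned data fetching, functions like poly1305 show a
performance regression on MIPS. Here the loads at src + 3, src + 6 and
src + 9 are misaligned even when src itself is 4-byte aligned: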
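h0 += (le32_to_cpuvp(src + 0) >> 0) & 0x3ffffff;
h1 += (le32_to_cpuvp(src + 3) >> 2) & 0x3ffffff;
h2 += (le32_to_cpuvp(src + 6) >> 4) & 0x3ffffff;
h3 += (le32_to_cpuvp(src + 9) >> 6) & 0x3ffffff;
h4 += (le32_to_cpuvp(src + 12) >> 8) | hibit;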
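These lines decode a 16-byte Poly1305 block into 26-bit limbs using 32-bit loads at unaligned offsets. A portable alternative, sketched here with hypothetical helper names, builds each little-endian word from single bytes: that never performs a misaligned word access and does not depend on host byte order. In this sketch `hibit` is the 2^128 padding bit, which lands at bit 24 of the fifth limb because 128 - 4 * 26 = 24.

```c
#include <stdint.h>

/* Little-endian 32-bit load built from single bytes: no alignment
 * requirement and no dependence on host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Split a 16-byte block into five 26-bit limbs m[0..4].
 * For a full block, hibit is 1u << 24 (the 2^128 padding bit). */
static void poly1305_split_block(uint32_t m[5], const uint8_t *src,
				 uint32_t hibit)
{
	m[0] = (load_le32(src + 0) >> 0) & 0x3ffffff;  /* bits   0..25  */
	m[1] = (load_le32(src + 3) >> 2) & 0x3ffffff;  /* bits  26..51  */
	m[2] = (load_le32(src + 6) >> 4) & 0x3ffffff;  /* bits  52..77  */
	m[3] = (load_le32(src + 9) >> 6) & 0x3ffffff;  /* bits  78..103 */
	m[4] = (load_le32(src + 12) >> 8) | hibit;     /* bits 104..127 */
}
```

On MIPS32r2 a compiler may turn byte-wise loads like these into several instructions per word, so this trades misaligned-access cost for extra ALU work; which side wins is again a question for profiling.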
I tried to write some MIPS32r2 code.
I wrote chacha20_keysetup, chacha20_generic_block and
poly1305_generic_blocks in assembly.
I tried to keep all needed variables in registers, which should
reduce the memory overhead.
But it is very difficult for me to do code profiling and/or isolat
Would you like to write it?
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
http://lists.zx2c4.com/mailman/listinfo/wireguard
Any news about MIPS- and ARM-optimized code?
Greetings,
René van Dorst.