PPC chacha

2020-09-24 Thread Niels Möller
I'm trying to learn a bit of ppc assembly. Below is an implementation of _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just put the file in the powerpc64 directory and reconfigure). This machine is little-endian, I haven't yet tested on big-endian. Unfortunately I don't get

[PATCH] "PowerPC64" GCM support

2020-09-24 Thread Maamoun TK
This is a stand-alone patch that applies all the previous patches to the optimized GCM implementation. This patch is based on the master upstream so it can be merged directly. It passes the testsuite and yields the expected performance. --- configure.ac | 5 +- fat-ppc.c

Re: [PATCH] "PowerPC64" GCM support

2020-09-24 Thread Niels Möller
Maamoun TK writes: > This is a stand-alone patch that applies all the previous patches to the > optimized GCM implementation. This patch is based on the master upstream so > it can be merged directly. Some questions on the overall structure: What's the speedup you get from assembly gcm_fill? I

Re: [PATCH] "PowerPC64" GCM support

2020-09-24 Thread Maamoun TK
> What's the speedup you get from assembly gcm_fill? I see the C > implementation uses memcpy and WRITE_UINT32, and is likely significantly > slower than the ctr_fill16 in ctr.c. But it could be improved using > portable means. If done well, it should be a very small fraction of the > cpu time
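
For readers unfamiliar with the operation under discussion, here is a minimal sketch of what a portable counter-fill routine does: it expands one 16-byte counter block into a run of consecutive blocks, incrementing only the last 32 bits, stored big-endian, as GCM specifies. This is not the code from the patch or from the library. The names counter_fill_sketch, write_be32 and read_be32 are hypothetical, and the memcpy plus byte-wise store only stands in for the "memcpy and WRITE_UINT32" approach mentioned in the quoted text.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void write_be32(uint8_t *p, uint32_t x)
{
  p[0] = (uint8_t)(x >> 24);
  p[1] = (uint8_t)(x >> 16);
  p[2] = (uint8_t)(x >> 8);
  p[3] = (uint8_t)x;
}

/* Hypothetical helper: load a 32-bit big-endian value. */
static uint32_t read_be32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write `blocks` consecutive counter blocks to `buffer`, starting from
   the value in `ctr`. Only the low 32 bits are incremented (wrapping
   modulo 2^32); the leading 12 bytes are copied unchanged. On return,
   `ctr` holds the next unused counter value. */
static void counter_fill_sketch(uint8_t *ctr, size_t blocks, uint8_t *buffer)
{
  uint32_t c = read_be32(ctr + BLOCK_SIZE - 4);
  size_t i;

  for (i = 0; i < blocks; i++, c++)
    {
      uint8_t *dst = buffer + i * BLOCK_SIZE;
      memcpy(dst, ctr, BLOCK_SIZE - 4);       /* fixed 96-bit prefix */
      write_be32(dst + BLOCK_SIZE - 4, c);    /* 32-bit block counter */
    }
  write_be32(ctr + BLOCK_SIZE - 4, c);
}
```

Since the per-block work is one 12-byte copy and one 4-byte store, a routine of this shape would normally cost little next to the block cipher itself, which is consistent with the expectation in the quoted text that it should be a small fraction of the cpu time.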

Re: PPC chacha

2020-09-24 Thread Jeffrey Walton
On Thu, Sep 24, 2020 at 3:46 PM Niels Möller wrote: > > I'm trying to learn a bit of ppc assembly. Below is an implementation of > _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just > put the file in the powerpc64 directory and reconfigure). This machine > is little-endian, I