I'm trying to learn a bit of ppc assembly. Below is an implementation of
_chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just
put the file in the powerpc64 directory and reconfigure). This machine
is little-endian; I haven't yet tested on big-endian.
Unfortunately I don't get
This is a stand-alone patch that applies all the previous patches to the
optimized GCM implementation. This patch is based on the master upstream so
it can be merged directly.
It passes the testsuite and yields the expected performance.
---
configure.ac | 5 +-
fat-ppc.c
Maamoun TK writes:
> This is a stand-alone patch that applies all the previous patches to the
> optimized GCM implementation. This patch is based on the master upstream so
> it can be merged directly.
Some questions on the overall structure:
What's the speedup you get from assembly gcm_fill? I
>
> What's the speedup you get from assembly gcm_fill? I see the C
> implementation uses memcpy and WRITE_UINT32, and is likely significantly
> slower than the ctr_fill16 in ctr.c. But it could be improved using
> portable means. If done well, it should be a very small fraction of the
> cpu time
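For reference, the kind of portable fill loop discussed in the quoted text
could look roughly like the sketch below. It is not Nettle's gcm_fill or
ctr_fill16; the name fill_ctr_blocks and its signature are made up for
illustration. It assumes the usual GCM counter layout (a fixed 96-bit prefix
followed by a 32-bit big-endian counter that wraps modulo 2^32) and reads the
counter once up front rather than once per block. Whether this is measurably
faster than the memcpy/WRITE_UINT32 version would have to be benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical helper (not part of Nettle): writes `blocks` consecutive
   16-byte counter blocks to `out`, then stores the advanced counter back
   into the last four bytes of `ctr`. */
static void
fill_ctr_blocks(uint8_t *ctr, size_t blocks, uint8_t *out)
{
  uint32_t c;
  size_t i;

  assert(ctr != NULL && out != NULL);

  /* Read the big-endian 32-bit counter once, up front. */
  c = ((uint32_t) ctr[12] << 24) | ((uint32_t) ctr[13] << 16)
    | ((uint32_t) ctr[14] << 8) | (uint32_t) ctr[15];

  for (i = 0; i < blocks; i++, c++, out += BLOCK_SIZE)
    {
      /* The first 12 bytes are the same in every block. */
      memcpy(out, ctr, 12);
      /* Write the counter big-endian; unsigned arithmetic wraps mod 2^32. */
      out[12] = (uint8_t) (c >> 24);
      out[13] = (uint8_t) (c >> 16);
      out[14] = (uint8_t) (c >> 8);
      out[15] = (uint8_t) c;
    }

  /* Store the updated counter for the next call. */
  ctr[12] = (uint8_t) (c >> 24);
  ctr[13] = (uint8_t) (c >> 16);
  ctr[14] = (uint8_t) (c >> 8);
  ctr[15] = (uint8_t) c;
}
```

A caller would pass the current 16-byte counter block, the number of blocks
wanted, and an output buffer of at least blocks * 16 bytes.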
On Thu, Sep 24, 2020 at 3:46 PM Niels Möller wrote:
>
> I'm trying to learn a bit of ppc assembly. Below is an implementation of
> _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just
> put the file in the powerpc64 directory and reconfigure). This machine
> is little-endian, I