On Thu, Jul 9, 2020 at 4:11 PM Niels Möller wrote:
>
> Do you expect that this "auto" logic does what that user wants? I'm
> thinking, maybe it's simpler to stick with just yes/no (no being the
> default), and then add support for --enable-fat later, to select code at
> run-time?
You are right. I measured the throughput and latency of the vncipher and vxor
instructions on POWER8 and updated the patch accordingly.
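The run-time code selection raised in the quoted question can be sketched
roughly as below. This is only an illustration of the general function-pointer
dispatch technique, not the code from the patch: xor_block, xor_block_c,
xor_block_accel and cpu_has_feature are hypothetical names, and the feature
probe is a placeholder that always reports "not available".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature, chosen for illustration only. */
typedef void xor_block_func(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback implementation. */
static void
xor_block_c(uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Stand-in for an assembly implementation; it computes the same
   result so the sketch stays self-contained. */
static void
xor_block_accel(uint8_t *dst, const uint8_t *src, size_t n)
{
  while (n-- > 0)
    *dst++ ^= *src++;
}

/* Placeholder probe (assumption): a real build would query the
   platform for the CPU feature here. */
static int
cpu_has_feature(void)
{
  return 0;
}

/* Selected implementation; NULL until the first call. */
static xor_block_func *xor_block_impl = NULL;

/* Public entry point: pick an implementation once, then dispatch. */
static void
xor_block(uint8_t *dst, const uint8_t *src, size_t n)
{
  if (!xor_block_impl)
    xor_block_impl = cpu_has_feature() ? xor_block_accel : xor_block_c;
  xor_block_impl(dst, src, n);
}
```

Callers only ever use xor_block(); with the placeholder probe above, the first
call binds it to the portable C version. Note that the lazy initialisation
here is not thread-safe, which a real implementation would have to address.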
On Thu, Jul 9, 2020 at 5:58 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > +L16x_round_loop:
> > + lxvd2x KX,10,KEYS
> > + vperm K,K,K,swap_mask
> > +
---
powerpc64/machine.m4 | 32
1 file changed, 32 insertions(+)
create mode 100644 powerpc64/machine.m4
diff --git a/powerpc64/machine.m4 b/powerpc64/machine.m4
new file mode 100644
index ..3a121260
--- /dev/null
+++ b/powerpc64/machine.m4
@@ -0,0 +1,32
---
gcm.c | 82 +++-
powerpc64/P8/gcm-hash.asm | 998 ++
2 files changed, 1066 insertions(+), 14 deletions(-)
create mode 100644 powerpc64/P8/gcm-hash.asm
diff --git a/gcm.c b/gcm.c
index cf615daf..935d4420 100644
--- a/gcm.c
+++
I measured the latency and throughput of the vcipher/vncipher/vxor
instructions on POWER8:

vcipher/vncipher
  throughput: 6 instructions per cycle
  latency: 0.91 clock cycles
vxor
  throughput: 6 instructions per cycle
  latency: 0.32 clock cycles

So the ideal option for POWER8 is processing 8 blocks, it has
---
powerpc64/README | 86
1 file changed, 86 insertions(+)
create mode 100644 powerpc64/README
diff --git a/powerpc64/README b/powerpc64/README
new file mode 100644
index ..f78357ab
--- /dev/null
+++ b/powerpc64/README
@@ -0,0
---
aes-decrypt-internal.c | 10 ++
aes-encrypt-internal.c | 10 ++
fat-ppc.c | 173 +++
fat-setup.h | 9 ++
powerpc64/fat/aes-decrypt-internal-2.asm | 37 +++