Re: ppc64 micro optimization

2024-04-14 Thread Niels Möller
Niels Möller writes: > I've added tests that set the initial counter so that the four counter > bytes wrap around 2^32, and I've verified that these instructions > should be changed to vadduwm to get output that agrees with nettle's > other gcm implementations. I've committed those fixes, and
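
For readers following the counter discussion: GCM increments only the last four bytes of the 16-byte counter block, modulo 2^32, whereas vaddudm adds 64-bit doublewords and vadduwm adds 32-bit words. The C sketch below models that difference on a plain byte array. It is an illustration only, not code from the patch or from nettle; the names load_be32, store_be32, ctr_add32 and ctr_add64 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: read and write a 32-bit big-endian value. */
static uint32_t load_be32(const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static void store_be32(uint8_t *p, uint32_t x)
{
  p[0] = (uint8_t) (x >> 24);
  p[1] = (uint8_t) (x >> 16);
  p[2] = (uint8_t) (x >> 8);
  p[3] = (uint8_t) x;
}

/* Word-sized add (the vadduwm-style behaviour): only bytes 12..15
   change, and the sum wraps modulo 2^32, as GCM requires. */
void ctr_add32(uint8_t block[16], uint32_t n)
{
  store_be32(block + 12, load_be32(block + 12) + n);
}

/* Doubleword-sized add (the vaddudm-style behaviour): a carry out of
   bytes 12..15 propagates into bytes 8..11, which GCM must not touch. */
void ctr_add64(uint8_t block[16], uint64_t n)
{
  uint64_t c = ((uint64_t) load_be32(block + 8) << 32)
             | (uint64_t) load_be32(block + 12);
  c += n;
  store_be32(block + 8, (uint32_t) (c >> 32));
  store_be32(block + 12, (uint32_t) c);
}
```

The two functions agree until the low four bytes reach ff ff ff ff; the next increment then carries into byte 11 with the 64-bit add but not with the 32-bit add, which is the case the new tests exercise.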

Re: ppc64 micro optimization

2024-03-24 Thread Niels Möller
Niels Möller writes: > One other question: In the counter updates, > >> C increase ctr value as input to aes_encrypt >> vaddudm S1, S0, CNT1 >> vaddudm S2, S1, CNT1 >> vaddudm S3, S2, CNT1 >> vaddudm S4, S3, CNT1 >> vaddudm S5, S4, CNT1 >> vaddudm S6, S5, CNT1 >>

Re: ppc64 micro optimization

2024-03-20 Thread Niels Möller
Niels Möller writes: > Below is an updated version of gcm-aes-encrypt.asm, seems to work for > me, and uses fewer of the regular registers. Some comments and > questions: > > 1. What about the vsrX registers, 0 <= X < 32? They are used to copy >values from and to the v registers (aka vsrX,

Re: ppc64 micro optimization

2024-03-17 Thread Niels Möller
Niels Möller writes: > Next, I'll have a look at register usage in the assembly code. Below is an updated version of gcm-aes-encrypt.asm, seems to work for me, and uses fewer of the regular registers. Some comments and questions: 1. What about the vsrX registers, 0 <= X < 32? They are used to

Re: ppc64 micro optimization

2024-03-15 Thread Niels Möller
Niels Möller writes: > Danny Tsen writes: > >> My fault. I did not include the gym-aes-crypt.c in the patch. Here is >> the updated patch. Please apply this one and we can work from there. > > Thanks, now pushed onto a new branch ppc64-gcm-aes. I've now pushed some more changes to that branch:

Re: ppc64 micro optimization

2024-03-06 Thread Niels Möller
Danny Tsen writes: > My fault. I did not include the gym-aes-crypt.c in the patch. Here is > the updated patch. Please apply this one and we can work from there. Thanks, now pushed onto a new branch ppc64-gcm-aes. Regards, /Niels -- Niels Möller. PGP key

Re: ppc64 micro optimization

2024-03-05 Thread Danny Tsen
Hi Niels, My fault. I did not include the gym-aes-crypt.c in the patch. Here is the updated patch. Please apply this one and we can work from there. Thanks. -Danny > On Mar 5, 2024, at 1:08 PM, Niels Möller wrote: > > Danny Tsen writes: > >> Please let me know when you merge the code and

Re: ppc64 micro optimization

2024-03-05 Thread Niels Möller
Danny Tsen writes: > Please let me know when you merge the code and we can work from there. Hi, I tried to apply and build with the v5 patch, and noticed some problems. Declaration of _gcm_aes_encrypt / _gcm_aes_decrypt is missing. It can go in gcm-internal.h, like on this branch,

RE: ppc64 micro optimization

2024-02-26 Thread Danny Tsen
Hi Niels, Please let me know when you merge the code and we can work from there. Thanks. -Danny From: Niels Möller Sent: Friday, February 23, 2024 1:07 AM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: ppc64 micro

Re: ppc64 micro optimization

2024-02-22 Thread Niels Möller
Danny Tsen writes: > Here is the v5 patch from your comments. Please review. Thanks. I think this looks pretty good. Maybe I should commit it on a branch and we can iterate from there. I'll be on vacation and mostly offline next week, though. > --- a/gcm-aes128.c > +++ b/gcm-aes128.c > @@

Re: ppc64 micro optimization

2024-02-20 Thread Danny Tsen
Hi Niels, Here is the v5 patch from your comments. Please review. Thanks. -Danny > On Feb 14, 2024, at 8:46 AM, Niels Möller wrote: > > Danny Tsen writes: > >> Here is the new patch v4 for AES/GCM stitched implementation and >> benchmark based on the current repo. > > Thanks. I'm not able

Re: ppc64 micro optimization

2024-02-14 Thread Niels Möller
Danny Tsen writes: > Here is the new patch v4 for AES/GCM stitched implementation and > benchmark based on the current repo. Thanks. I'm not able to read it all carefully at the moment, but I have a few comments, see below. In the meantime, I've also tried to implement something similar for

Re: ppc64 micro optimization

2024-02-03 Thread Danny Tsen
Hi Niels, Here is the new patch v4 for AES/GCM stitched implementation and benchmark based on the current repo. Thanks. -Danny > On Jan 31, 2024, at 4:35 AM, Niels Möller wrote: > > Niels Möller writes: > >> While the powerpc64 vncipher instruction really wants the original >> subkeys, not

Re: ppc64 micro optimization

2024-01-30 Thread Niels Möller
Niels Möller writes: > While the powerpc64 vncipher instruction really wants the original > subkeys, not transformed. So on power, it would be better to have a > _nettle_aes_invert that is essentially a memcpy, and then the aes > decrypt assembly code could be reworked without the xors, and run

RE: ppc64 micro optimization

2024-01-24 Thread Danny Tsen
, January 25, 2024 3:58 AM To: Danny Tsen Cc: nettle-bugs@lists.lysator.liu.se ; George Wilson Subject: [EXTERNAL] Re: ppc64 micro optimization Danny Tsen writes: > Thanks for merging the stitched implementation for PPC64 with your > detailed information and efforts We're not quite the

Re: ppc64 micro optimization

2024-01-24 Thread Niels Möller
Danny Tsen writes: > Thanks for merging the stitched implementation for PPC64 with your > detailed information and efforts We're not quite there yet, though. Do you think you could rebase your work on top of recent changes? Sorry about conflicts, but I think new macros should fit well with what

Re: ppc64 micro optimization

2024-01-22 Thread Danny Tsen
Hi Niels, Thanks for merging the stitched implementation for PPC64 with your detailed information and efforts Thanks. -Danny > On Jan 21, 2024, at 11:27 PM, Niels Möller wrote: > > In preparing for merging the gcm-aes "stitched" implementation, I'm > reviewing the existing ghash code. WIP

ppc64 micro optimization

2024-01-21 Thread Niels Möller
In preparing for merging the gcm-aes "stitched" implementation, I'm reviewing the existing ghash code. WIP branch "ppc-ghash-macros". I've introduced a macro GHASH_REDUCE, for the reduction logic. Besides that, I've been able to improve scheduling of the reduction instructions (adding in the