Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-08-01 Thread Maamoun TK
I will add PPC to this check.
Thank you,
Mamone
On Fri, Jul 31, 2020 at 8:56 PM Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > BTW, about fat tests, I'm considering adding a make target "check-fat"
> > which will run make check with some different settings of
> >

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-08-01 Thread Maamoun TK
Sounds good.
Thank you,
Mamone
On Fri, Jul 31, 2020 at 9:42 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > Yes, both are part of the same extension. I considered calling the
> > directory "P8" for three reasons:
> > - POWER8 is the minimal processor that supports the crypto extensions
> > -

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-31 Thread Niels Möller
Maamoun TK writes:
> Yes, both are part of the same extension. I considered calling the
> directory "P8" for three reasons:
> - POWER8 is the minimal processor that supports the crypto extensions
> - I measured the throughput and latency of the instructions on POWER8
> - The current

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-31 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> BTW, about fat tests, I'm considering adding a make target "check-fat"
> which will run make check with some different settings of
> NETTLE_FAT_OVERRIDE (platform specific, and determined by configure).
I've added this now, with fairly solid coverage

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-23 Thread Maamoun TK
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote:
> then add ghash and fat builds
> (not sure in which order).
>
I forgot to mention that you can merge them in any order.
Regards,
Mamone

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-23 Thread Maamoun TK
On Wed, Jul 22, 2020 at 6:04 PM Niels Möller wrote:
> But in the patch for fat builds, you do the runtime check as
>
> +  hwcap2 = getauxval(AT_HWCAP2);
>
> +  features->have_crypto_ext =
> +   (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) == PPC_FEATURE2_VEC_CRYPTO ? 1 : 0;
>
> I think I would prefer to
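
A minimal sketch of the runtime check quoted in the message above, written as a standalone C fragment. It assumes Linux with glibc (for getauxval and <sys/auxv.h>). The function names has_crypto_ext and detect_crypto_ext are hypothetical and are not taken from the patch. The fallback definition of PPC_FEATURE2_VEC_CRYPTO mirrors the value in the Linux powerpc headers and is there only so the fragment compiles on other architectures.

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 (Linux/glibc) */

/* Fallback so the sketch compiles off-PowerPC; value as in the Linux
   powerpc <asm/cputable.h>. On PowerPC the system headers define it. */
#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

/* Hypothetical helper: returns 1 if the vector crypto bit is set in
   the given HWCAP2 word, 0 otherwise. */
static int
has_crypto_ext (unsigned long hwcap2)
{
  return (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

/* Hypothetical wrapper: reads HWCAP2 from the auxiliary vector that
   the kernel passes to the process. */
static int
detect_crypto_ext (void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
  return has_crypto_ext (getauxval (AT_HWCAP2));
#else
  /* HWCAP2 bits have different meanings on other architectures. */
  return 0;
#endif
}
```

Separating the bit test from the getauxval call keeps the test itself checkable on any machine. Since the mask is a single bit, comparing the masked value against zero gives the same result as the "== PPC_FEATURE2_VEC_CRYPTO ? 1 : 0" form in the quoted patch.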

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-22 Thread Niels Möller
Maamoun TK writes:
> On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote:
>
>> Latency less than one cycle sounds wrong.
> I had the same concern, I measured the clock time from the start of the
> instruction execution until the start of the next dependent instruction.
> I'm sure about the

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-21 Thread Maamoun TK
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote:
> Latency less than one cycle sounds wrong. Usually, simple ALU
> instructions like xor have a latency of exactly one cycle (i.e., when an
> instruction starts executing (all inputs are available), the result is
> available for depending

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-20 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> To get going, I've merged this and the machine.m4 patch to a development
> branch. I'd like to do things stepwise, first do the minimal configure
> changes to get AES working (and maybe with default on, to get it
> exercised by the .gitlab-ci

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-20 Thread Niels Möller
Maamoun TK writes:
> I measured the latency and throughput of vcipher/vncipher/vxor instructions
> for POWER8
> vcipher/vncipher
> throughput 6 instructions per cycle
> latency 0.91 clock cycles
> vxor
> throughput 6 instructions per cycle
> latency 0.32 clock cycles
Latency less than one cycle

[PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-14 Thread Maamoun TK
I measured the latency and throughput of vcipher/vncipher/vxor instructions
for POWER8
vcipher/vncipher
throughput 6 instructions per cycle
latency 0.91 clock cycles
vxor
throughput 6 instructions per cycle
latency 0.32 clock cycles
So the ideal option for POWER8 is processing 8 blocks, it has