Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Niels Möller
Maamoun TK writes:
> Thanks for the clarification, I just misunderstood the division with the
> partial reduction in a previous reply.
>
> Ok, so you mean a polynomial division of b_0(x) by P(x) where P(x) = X^128
> + X^127 + X^126 + X^121 + 1
> b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod

Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Maamoun TK
Thanks for the clarification, I just misunderstood the division with the partial reduction in a previous reply.
Ok, so you mean a polynomial division of b_0(x) by P(x) where P(x) = X^128 + X^127 + X^126 + X^121 + 1
b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod P(x)
b_0(x)/P(x) = (b_0(x)*(p'))

Re: GCM with ARM Neon

2020-10-11 Thread Jeffrey Walton
On Sun, Oct 11, 2020 at 2:03 PM Niels Möller wrote:
>
> Jeffrey Walton writes:
>
> > I may be mistaken, but I believe 64-bit poly multiplies are available.
> > Or they are available on Aarch64 with Crypto extensions.
>
> I'm looking in the Arm Instruction Set Reference Guide, labeled version

Re: GCM with ARM Neon

2020-10-11 Thread Niels Möller
Jeffrey Walton writes:
> I may be mistaken, but I believe 64-bit poly multiplies are available.
> Or they are available on Aarch64 with Crypto extensions.
I'm looking in the Arm Instruction Set Reference Guide, labeled version 1.0, 2018. It includes a section on cryptographic instructions, but

Re: GCM with ARM Neon (was: Re: [PATCH] "PowerPC64" GCM support)

2020-10-11 Thread Jeffrey Walton
On Sun, Oct 11, 2020 at 1:42 PM Niels Möller wrote:
>
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > So if we have the input in register A (loaded from memory with no
> > processing besides ensuring proper *byte* order), and precompute two
> > values, M representing b_1(x) x^64 + c_1(x), and

GCM with ARM Neon (was: Re: [PATCH] "PowerPC64" GCM support)

2020-10-11 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:
> So if we have the input in register A (loaded from memory with no
> processing besides ensuring proper *byte* order), and precompute two
> values, M representing b_1(x) x^64 + c_1(x), and L representing b_0(x)
> x^64 + d_1(x), then we get the two

Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Niels Möller
Maamoun TK writes:
> Hi Niels,
>
> I tried to apply your method but can't get it to work,
Hmm, do you think I've missed something in the math, or are there other difficulties?
> while applying it one
> question came to my mind.
>
> >> First, compute b_0(x) / x^64 (mod P(x)), which expands it from

Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Maamoun TK
Hi Niels, I tried to apply your method but can't get it to work, while applying it one question came to my mind.
> First, compute b_0(x) / x^64 (mod P(x)), which expands it from 64 bits to
> 128,
>
> c_1(x) x^64 + c_0(x) = b_0(x) / x^64 (mod P(x))
Here you are trying to get partially reduced
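
For readers following the math, the quoted step c_1(x) x^64 + c_0(x) = b_0(x) / x^64 (mod P(x)) can be checked with a small sketch. This is not code from the thread or from Nettle. It assumes P(x) = X^128 + X^127 + X^126 + X^121 + 1 as given in the thread, and it assumes that bit i of a Python integer holds the coefficient of x^i, which is not necessarily the bit order GCM uses in memory. The names div_by_x, div_by_x64, mul_by_x and split_halves are made up for illustration.

```python
# P(x) = x^128 + x^127 + x^126 + x^121 + 1, as stated in the thread.
# Assumption: bit i of the integer is the coefficient of x^i.
P = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1
MASK64 = (1 << 64) - 1


def div_by_x(a):
    """Divide a(x) by x modulo P(x), for deg a < 128."""
    # If the constant term is 1, add P(x) first (its constant term is 1),
    # which leaves the residue class unchanged and makes a(x) divisible by x.
    if a & 1:
        a ^= P
    return a >> 1


def div_by_x64(b0):
    """Compute b0(x) / x^64 (mod P(x)); a 64-bit input grows to 128 bits."""
    c = b0
    for _ in range(64):
        c = div_by_x(c)
    return c


def mul_by_x(a):
    """Multiply a(x) by x modulo P(x), used here only to verify the result."""
    a <<= 1
    if a >> 128:
        a ^= P
    return a


def split_halves(c):
    """Return (c_1, c_0) such that c(x) = c_1(x) x^64 + c_0(x)."""
    return c >> 64, c & MASK64


if __name__ == "__main__":
    b0 = 0x0123456789ABCDEF          # arbitrary 64-bit example value
    c1, c0 = split_halves(div_by_x64(b0))
    print(f"c_1 = {c1:016x}, c_0 = {c0:016x}")
```

Multiplying the result by x^64 again (64 calls to mul_by_x) should give back b_0(x), which is a quick way to confirm that the division was done modulo P(x) rather than as a plain shift.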