Re: [PATCH] "PowerPC64" AES improve syntax

2020-09-04 Thread Maamoun TK
e syntax. On Fri, Sep 4, 2020 at 1:07 PM Niels Möller wrote: > Maamoun TK writes: > > > This patch adds "VSR" macro to improve the syntax of assembly code, > > Speaking of syntax, I've had quick look at the powerpc64 assembly in > GMP, and it seems to use symbols li

Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Maamoun TK
at 8:14 PM Niels Möller wrote: > Maamoun TK writes: > > > Hi Niels, > > > > I tried to apply your method but can't get it work, > > Hmm, do you think I've missed something in the math, or are there other > difficulties? > > > while applying it one > >

Re: [PATCH] "PowerPC64" GCM support

2020-10-11 Thread Maamoun TK
Hi Niels, I tried to apply your method but can't get it to work. While applying it, one question came to my mind. > First, compute b_0(x) / x^64 (mod P(x)), which expands it from 64 bits to > 128, > > c_1(x) x^64 + c_0(x) = b_0(x) / x^64 (mod P(x)) > Here you are trying to get partially reduced

[PATCH] "PowerPC64" AES improve syntax

2020-08-29 Thread Maamoun TK
This patch adds a "VSR" macro to improve the syntax of the assembly code. I will create a separate patch for gcm-hash since it hasn't been merged to master yet. I also removed the TODO from the README because I tried to use "lxv/stxv" on POWER9 instead of "lxvd2x/stxvd2x" but gcc produced "lxvd2x/stxvd2x"

Re: [Patch] "PowerPC64" Add README (Reformatted)

2020-08-29 Thread Maamoun TK
Great. I can confirm the testsuite passes and the performance of AES is as expected for both fat and explicit crypto configurations. On Sat, Aug 29, 2020 at 4:14 PM Niels Möller wrote: > ni...@lysator.liu.se (Niels Möller) writes: > > > Merged to the power-asm-wip branch (there were still

Re: [PATCH 4/6] "PowerPC64" Add fat build

2020-08-19 Thread Maamoun TK
On Wed, Aug 19, 2020 at 10:46 PM Niels Möller wrote: > Hi, I've been busy and silent for a while. > > I was thinking, that I don't want to merge to master with > --enable-power-crypto-ext enabled by default. And then we really need > fat builds to have easy ci testing. > > So I'm applying the

[PATCH] "PowerPC64" Detect and check ABI

2020-08-20 Thread Maamoun TK
--- configure.ac | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/configure.ac b/configure.ac index 49db7af1..6db3fc42 100644 --- a/configure.ac +++ b/configure.ac @@ -322,6 +322,17 @@ case "$host_cpu" in AC_TRY_COMPILE([ #if defined(__sgi) &&

Re: [PATCH 4/6] "PowerPC64" Add fat build

2020-08-20 Thread Maamoun TK
On Thu, Aug 20, 2020 at 1:07 AM Jeffrey Walton wrote: > You might want to check this. In practice I don't believe 32-bit ABIs > are supported on the 64-bit iron. > > I don't recall if the Linux ABI supports 32-bit on these machines. I > thought Steven Monroe said something about this (i.e., not

[PATCH] "PowerPC64" GCM support

2020-09-27 Thread Maamoun TK
--- configure.ac | 6 +- fat-ppc.c | 33 ++ fat-setup.h| 7 + gcm.c | 69 +++- powerpc64/fat/gcm-hash.asm | 39 +++ powerpc64/p8/gcm-hash.asm | 773 + 6 files changed, 909

[PATCH] "PowerPC64" GCM support

2020-09-24 Thread Maamoun TK
This is a stand-alone patch that applies all the previous patches to the optimized GCM implementation. This patch is based on the master upstream so it can be merged directly. It passes the testsuite and yields the expected performance. --- configure.ac | 5 +- fat-ppc.c

Re: [PATCH] "PowerPC64" GCM support

2020-09-25 Thread Maamoun TK
It's gotten better with this patch, now it takes 0.49 seconds to execute under the same circumstances. On Fri, Sep 25, 2020 at 9:59 AM Niels Möller wrote: > Maamoun TK writes: > > >> What's the speedup you get from assembly gcm_fill? I see the C > >> im

Re: PPC chacha

2020-09-25 Thread Maamoun TK
> > > I'm trying to learn a bit of ppc assembly. Below is an implementation of > _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just > put the file in the powerpc64 directory and reconfigure). This machine > is little-endian, I haven't yet tested on big-endian. > Great work.

Re: [PATCH] "PowerPC64" GCM support

2020-09-24 Thread Maamoun TK
> > What's the speedup you get from assembly gcm_fill? I see the C > implementation uses memcpy and WRITE_UINT32, and is likely significantly > slower than the ctr_fill16 in ctr.c. But it could be improved using > portable means. If done well, it should be a very small fraction of the > cpu time

Re: PPC chacha

2020-09-25 Thread Maamoun TK
Yes, it would make sense. On Fri, Sep 25, 2020 at 5:25 PM Niels Möller wrote: > Jeffrey Walton writes: > > > I believe the 64-bit adds (addudm) and subtracts (subudm) require > > POWER8. > > I don't think there are any 64-bit adds in my chacha code, only 32-bit, > vadduwm. The chacha state is

Re: PPC chacha

2020-09-25 Thread Maamoun TK
ep 25, 2020 at 10:58 PM Niels Möller wrote: > Maamoun TK writes: > > > Great work. The implementation looks fine, I like the idea of using -16 > > instead of 16 for rotating because vspltisw is limited to (-16 to 15) > > and vrlw picks the low-order 5 bits which is the same fo
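
The point about splatting -16 instead of 16 can be illustrated with plain C arithmetic. This is only a sketch of the idea, not the chacha assembly itself: the function names are hypothetical, and the immediate range and the "low-order 5 bits" behaviour are taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left using only the low-order 5 bits of the
   count, which is how the message describes vrlw treating each element. */
static uint32_t
rotl32_low5(uint32_t x, int count)
{
  unsigned n = (unsigned) count & 31u;   /* (unsigned) -16 & 31 == 16 */
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Hypothetical helper: true if v fits a 5-bit signed immediate
   (-16..15), the range the message gives for vspltisw. */
static int
fits_simm5(int v)
{
  return v >= -16 && v <= 15;
}

/* Self-check: 16 does not fit the immediate, -16 does, and both
   rotate by the same amount once masked to 5 bits. */
void
check_rotate_by_minus_16(void)
{
  assert(fits_simm5(-16) && !fits_simm5(16));
  assert(rotl32_low5(0x12345678u, -16) == 0x56781234u);
  assert(rotl32_low5(0x12345678u, 16) == 0x56781234u);
}
```

Calling check_rotate_by_minus_16() exercises both properties at once.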

[PATCH] "PowerPC64" Use explicit register names

2020-09-18 Thread Maamoun TK
This patch is built upon ppc-m4-macrology.patch. Using explicit register names is working as expected now. --- powerpc64/machine.m4 | 11 +- powerpc64/p8/aes-decrypt-internal.asm | 194 +- powerpc64/p8/aes-encrypt-internal.asm | 192

[PATCH] "PowerPC64" Use same register convention in VSR macro

2020-09-19 Thread Maamoun TK
--- powerpc64/machine.m4 | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/powerpc64/machine.m4 b/powerpc64/machine.m4 index f867ec01..e2383201 100644 --- a/powerpc64/machine.m4 +++ b/powerpc64/machine.m4 @@ -24,10 +24,9 @@ define(`EPILOGUE', C Get vector-scalar

Re: [PATCH] "PowerPC64" GCM support

2020-09-28 Thread Maamoun TK
of optimizing such functions. On Mon, Sep 28, 2020 at 8:40 AM Amos Jeffries wrote: > On 28/09/20 8:25 am, Maamoun TK wrote: > > gcm_fill() got C optimization which performs close to the one I > > What do you mean by "close"? > faster or slower? > and it that differen

Re: [PATCH] "PowerPC64" GCM support

2020-10-03 Thread Maamoun TK
> > 1. Take out the fat support to it's own patch. > Done! I will post the assembly part first so you can review it. 2. You could consider doing the init_key in C, if nothing else as >documentation. It could be either under some #ifdef in gcm.c, or a >separate .c file under

[PATCH] "PowerPC64" chacha-core big-endian support "Shorter version"

2020-09-25 Thread Maamoun TK
The last patch follows the C implementation but I just figured out a decent way to do it. --- powerpc64/p7/chacha-core-internal.asm | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/powerpc64/p7/chacha-core-internal.asm

[PATCH] "PowerPC64" chacha-core big-endian support

2020-09-25 Thread Maamoun TK
--- powerpc64/p7/chacha-core-internal.asm | 55 ++- 1 file changed, 54 insertions(+), 1 deletion(-) diff --git a/powerpc64/p7/chacha-core-internal.asm b/powerpc64/p7/chacha-core-internal.asm index 33c721c1..922050ff 100644 ---

[PATCH] "PowerPC64" Use register names explicitly

2020-09-16 Thread Maamoun TK
Use explicit register names to improve the syntax of assembly files and pass -mregnames to the assembler to allow building the assembly files. I will make a stand-alone patch for GCM which brings all the accumulated modifications so it can be directly merged. --- configure.ac

Re: [PATCH] "PowerPC64" Use register names explicitly

2020-09-18 Thread Maamoun TK
> > I'm not sure it's a good idea to unconditionally use these gcc specific > flags. Are they supported by all relevant compilers? It seems that Clang doesn't support that flag. > I'm considering instead adding the attached patch. > This is a better solution. I'll consider this patch for

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-08-01 Thread Maamoun TK
Sounds good. Thank you, Mamone On Fri, Jul 31, 2020 at 9:42 PM Niels Möller wrote: > Maamoun TK writes: > > > Yes, both are part of the same extension. I considered calling the > > directory "P8" for three reasons: > > - POWER8 is the minimal processor

[Patch] "PowerPC64" Add README (Reformatted)

2020-08-01 Thread Maamoun TK
--- powerpc64/README | 73 1 file changed, 73 insertions(+) create mode 100644 powerpc64/README diff --git a/powerpc64/README b/powerpc64/README new file mode 100644 index ..19351be8 --- /dev/null +++ b/powerpc64/README @@ -0,0

Re: [PATCH] "PowerPC64" Add README

2020-08-01 Thread Maamoun TK
Hi, I reformatted the README as you suggested and re-wrote line breaks to avoid the invalid ones. Regards, Mamone On Fri, Jul 31, 2020 at 9:40 PM Niels Möller wrote: > Maamoun TK writes: > > > powerpc64/README | 86 > > Hi, this patch still has line break problems,

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-08-01 Thread Maamoun TK
I will add PPC to this check. Thank you, Mamone On Fri, Jul 31, 2020 at 8:56 PM Niels Möller wrote: > ni...@lysator.liu.se (Niels Möller) writes: > > > BTW, about fat tests, I'm considering adding a make target "check-fat" > > which will run make check with some different settings of > >

Re: [Patch] "PowerPC64" Add README (Reformatted)

2020-08-02 Thread Maamoun TK
Thanks for the info, I'll take a look. Regards, Mamone On Sun, Aug 2, 2020 at 9:27 PM Jeffrey Walton wrote: > On Sun, Aug 2, 2020 at 2:12 PM Niels Möller wrote: > > > > Maamoun TK writes: > > > > > --

[PATCH] Check for ENV_OVERRIDE in get_ppc_features()

2020-08-02 Thread Maamoun TK
This patch doesn't add FAT_TEST_LIST to ppc in configure.ac because the check-fat patch hasn't merged to the power-asm-wip branch. It can be enabled easily by adding FAT_TEST_LIST="crypto_ext" to the ppc condition in configure.ac once both patches are merged together. Regards, Mamone ---
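
A minimal sketch of how an override check of this kind might look in C. The function name override_enables is hypothetical, and the comma-separated list format is an assumption rather than something stated in the message; in a real build the list would be read from the environment variable named by ENV_OVERRIDE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `feature` appears as a whole item in the comma-separated
   `list`, 0 otherwise. A NULL list means no override is set. */
int
override_enables(const char *list, const char *feature)
{
  size_t flen;

  assert(feature != NULL);
  if (list == NULL)
    return 0;

  flen = strlen(feature);
  for (;;)
    {
      const char *sep = strchr(list, ',');
      size_t len = sep ? (size_t) (sep - list) : strlen(list);

      /* Compare whole items only, so "crypto" does not match "crypto_ext". */
      if (len == flen && memcmp(list, feature, len) == 0)
        return 1;
      if (sep == NULL)
        return 0;
      list = sep + 1;
    }
}
```

With this helper, a value such as "crypto_ext" (the string the message proposes for FAT_TEST_LIST) would switch the feature on, while an unrelated or empty value would leave it off.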

Re: [Patch] Optimize AES and GHASH for PowerPC64 (support little-endian and big-endian)

2020-06-30 Thread Maamoun TK
I tested something similar, I tried to load data at address 0xXXX1 using lxvd2x and it loaded it properly. On Tue, Jun 30, 2020 at 12:35 PM Jeffrey Walton wrote: > On Tue, Jun 30, 2020 at 5:29 AM Jeffrey Walton wrote: > > > > On Tue, Jun 30, 2020 at 5:14 AM Maam

Re: [PATCH] Add ppc64 and ppc64el to Gitlab CI

2020-06-30 Thread Maamoun TK
On Tue, Jun 30, 2020 at 12:26 PM Niels Möller wrote: > Does that mean that explicitly > setting QEMU_LD_PREFIX is needed only for ppc64 (big-endian), but not > for ppc64el? > It's needed for both, I just give ppc64 as an example. ___ nettle-bugs

[Patch] Optimize AES and GHASH for PowerPC64 (support little-endian and big-endian)

2020-06-30 Thread Maamoun TK
Patch implementation benchmark for GCM_AES (Tested on POWER8): little-endian: - Encrypt x~17.5 of nettle C implementation - Decrypt x~17.5 of nettle C implementation - Update x~30 of nettle C implementation big-endian: - Encrypt x~18.5 of nettle C implementation - Decrypt x~18.5 of nettle C

Re: [Patch] Optimize AES and GHASH for PowerPC64 (support little-endian and big-endian)

2020-06-30 Thread Maamoun TK
On Tue, Jun 30, 2020 at 12:29 PM Jeffrey Walton wrote: One small comment for aes_encrypt and aes_decrypt... src and dst are > usually user supplied buffers. Using lxvd2x to load a vector may > produce incorrect results if the user is feeding a stream to an > encryptor or decryptor that is not

Re: [PATCH 1/4] Check for PowerPC64 assembly if crypto extensions are available

2020-07-14 Thread Maamoun TK
On Thu, Jul 9, 2020 at 4:11 PM Niels Möller wrote: > > Do you expect that this "auto" logic does what that user wants? I'm > thinking, maybe it's simpler to stick with just yes/no (no being the > default), and then add support for --enable-fat later, to select code at > run-time? You are

Re: [PATCH 4/4] Add AES [Enc|Dec] optimized implementations for PowerPC64

2020-07-14 Thread Maamoun TK
You are right, I measured the throughput and latency for vncipher and vxor instructions for POWER8 and updated the patch accordingly. On Thu, Jul 9, 2020 at 5:58 PM Niels Möller wrote: > Maamoun TK writes: > > > +L16x_round_loop: > > + lxvd2x KX,10,KEYS > > +

[PATCH 1/6] "PowerPC64" Add machine.m4

2020-07-14 Thread Maamoun TK
--- powerpc64/machine.m4 | 32 1 file changed, 32 insertions(+) create mode 100644 powerpc64/machine.m4 diff --git a/powerpc64/machine.m4 b/powerpc64/machine.m4 new file mode 100644 index ..3a121260 --- /dev/null +++ b/powerpc64/machine.m4 @@ -0,0 +1,32

[PATCH 3/6] "PowerPC64" Add optimized GHASH

2020-07-14 Thread Maamoun TK
--- gcm.c | 82 +++- powerpc64/P8/gcm-hash.asm | 998 ++ 2 files changed, 1066 insertions(+), 14 deletions(-) create mode 100644 powerpc64/P8/gcm-hash.asm diff --git a/gcm.c b/gcm.c index cf615daf..935d4420 100644 --- a/gcm.c +++

[PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-14 Thread Maamoun TK
I measured the latency and throughput of vcipher/vncipher/vxor instructions for POWER8. vcipher/vncipher: throughput 6 instructions per cycle, latency 0.91 clock cycles. vxor: throughput 6 instructions per cycle, latency 0.32 clock cycles. So the ideal option for POWER8 is processing 8 blocks, it has

[PATCH 5/6] "PowerPC64" Add README

2020-07-14 Thread Maamoun TK
--- powerpc64/README | 86 1 file changed, 86 insertions(+) create mode 100644 powerpc64/README diff --git a/powerpc64/README b/powerpc64/README new file mode 100644 index ..f78357ab --- /dev/null +++ b/powerpc64/README @@ -0,0

[PATCH 4/6] "PowerPC64" Add fat build

2020-07-14 Thread Maamoun TK
--- aes-decrypt-internal.c | 10 ++ aes-encrypt-internal.c | 10 ++ fat-ppc.c| 173 +++ fat-setup.h | 9 ++ powerpc64/fat/aes-decrypt-internal-2.asm | 37 +++

Re: [Patch] Optimize AES and GHASH for PowerPC64 (support little-endian and big-endian)

2020-06-30 Thread Maamoun TK
ster/ppc/ghashp8-ppc.pl He uses lvx_u, which I guess stands for lvx_unaligned; the Perl script translates this instruction to lxvd2x before passing it to the assembler. regards, Mamone On Tue, Jun 30, 2020 at 11:31 PM Jeffrey Walton wrote: > On Tue, Jun 30, 2020 at 7:55 AM Maamoun TK > wrote: >

Re: [Patch] Optimize AES and GHASH for PowerPC64 (support little-endian and big-endian)

2020-06-30 Thread Maamoun TK
On Tue, Jun 30, 2020 at 11:06 PM Niels Möller wrote: Can you explain a bit more what's special with the powerpc64 prologue > and epilogue? What are the .C_NAME($1) symbols? This is ELFv1 ABI stuff that is still required to get the function working for PowerPC64 big-endian systems. PowerPC64

[PATCH 1/4] Check for PowerPC64 assembly if crypto extensions are available

2020-07-09 Thread Maamoun TK
--- Makefile.in | 2 +- aclocal.m4 | 52 configure.ac | 11 + powerpc64/README | 67 powerpc64/machine.m4 | 24 +++ 5 files changed, 155

[PATCH 3/4] Add test 128 bytes to gcm-test

2020-07-09 Thread Maamoun TK
--- testsuite/gcm-test.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/testsuite/gcm-test.c b/testsuite/gcm-test.c index c8174019..df1fc94a 100644 --- a/testsuite/gcm-test.c +++ b/testsuite/gcm-test.c @@ -170,6 +170,29 @@ test_main(void)

[PATCH 2/4] Add GHASH optimized implementation for PowerPC64

2020-07-09 Thread Maamoun TK
--- gcm.c | 19 +- powerpc64/gcm-hash8.asm | 1004 +++ 2 files changed, 1022 insertions(+), 1 deletion(-) create mode 100644 powerpc64/gcm-hash8.asm diff --git a/gcm.c b/gcm.c index cf615daf..809c03bc 100644 --- a/gcm.c +++ b/gcm.c

[PATCH 4/4] Add AES [Enc|Dec] optimized implementations for PowerPC64

2020-07-09 Thread Maamoun TK
--- powerpc64/aes-decrypt-internal.asm | 579 + powerpc64/aes-encrypt-internal.asm | 540 ++ 2 files changed, 1119 insertions(+) create mode 100644 powerpc64/aes-decrypt-internal.asm create mode 100644

Re: Optimizing salsa20

2020-07-09 Thread Maamoun TK
I would like to help but I have no clue or experience with ARM NEON, sorry. regards, Mamone On Tue, Jul 7, 2020 at 5:46 PM Niels Möller wrote: > I've written some new ARM Neon assembly for salsa20. See > > https://gitlab.com/gnutls/nettle/-/commit/2ac58a1ce729a6cfe1d3703f4deb6da8862909e9 > , >

[PATCH] Add missing undef directives in configure.ac

2020-07-09 Thread Maamoun TK
--- configure.ac | 2 ++ 1 file changed, 2 insertions(+) diff --git a/configure.ac b/configure.ac index cc0d67ec..d7020fd6 100644 --- a/configure.ac +++ b/configure.ac @@ -582,7 +582,9 @@ AH_VERBATIM([HAVE_NATIVE], #undef HAVE_NATIVE_ecc_secp384r1_redc #undef HAVE_NATIVE_ecc_secp521r1_modp

PPC64LE optimizing AES and GHASH

2020-06-18 Thread Maamoun TK
I added a PowerPC64LE optimized version of AES and GHASH to nettle. Patch summary: GHASH Algorithm I took advantage of several references and research results to achieve a high-speed implementation of this algorithm. These references include several techniques that have been used to improve the

Post patch on nettle-bugs list

2020-06-19 Thread Maamoun TK
Is there a limit to the number of lines of a message posted on the list or special option that should be passed to git diff? I created a patch with command "git diff --stat --summary --patch HEAD~1 > patch.patch", when I posted this patch on the list nothing happened.

Re: Add ppc64le arch to Gitlab CI

2020-06-23 Thread Maamoun TK
I made a workaround by installing the required packages for ppc64el via .gitlab-ci.yml. I will post the patch with big-endian support for the optimized functions.

[PATCH] Add ppc64 and ppc64el to Gitlab CI

2020-06-23 Thread Maamoun TK
To run tests on ppc64 and ppc64el, this patch installs cross-building packages for both architectures on the Debian image; these packages will be installed every time the CI is triggered. A proper fix would be to install these packages to the image directly. This patch follows a different approach to

Re: [PATCH] Add ppc64 and ppc64el to Gitlab CI

2020-06-24 Thread Maamoun TK
On Wed, Jun 24, 2020 at 10:43 AM Niels Möller wrote: Speaking of ppc64 (big-endian, I assume) vs ppc64el, do you think it's > possible and reasonable to use same assembly files? That's how the > current ARM big-endian support works. > Yes, that is what I'm working on. nettle has IF_LE and IF_BE

Re: [PATCH] Add ppc64 and ppc64el to Gitlab CI

2020-06-29 Thread Maamoun TK
On Mon, Jun 29, 2020 at 3:15 PM Niels Möller wrote: > I've committed a change based on this patch. I dropped big-endian ppc > support for now (the "apt-get remove nettle-dev:$arch" failed because > it's not an arch in official debian. Not sure if that was the only > problem, but I wanted to get

[PATCH] optimized AES and GHASH for PPC64LE

2020-06-20 Thread Maamoun TK
Makefile.in | 2 +- configure.ac | 5 + gcm.c | 19 +- powerpc64le/aes-decrypt-internal.asm (new +x) | 573 +++ powerpc64le/aes-encrypt-internal.asm (new +x) | 534

[PATCH] optimizing AES and GHASH for PPC64LE

2020-06-20 Thread Maamoun TK
Makefile.in | 2 +- configure.ac | 5 + gcm.c| 19 +- powerpc64le/aes-decrypt-internal.asm | 573 powerpc64le/aes-encrypt-internal.asm | 534 +++ powerpc64le/gcm-hash8.asm

Re: PPC64LE optimizing AES and GHASH

2020-06-20 Thread Maamoun TK
diff -urN nettle/configure.ac nettle_PowerPC64LE/configure.ac --- nettle/configure.ac 2020-06-08 08:42:20.0 +0300 +++ nettle_PowerPC64LE/configure.ac 2020-06-15 18:41:43.485342900 +0300 @@ -435,6 +435,9 @@ esac fi ;; +*powerpc64le*) + asm_path=powerpc64le + ;;

[PATCH] optimizing AES and GHASH for PPC64LE

2020-06-20 Thread Maamoun TK
--- Makefile.in | 2 +- configure.ac | 5 + gcm.c | 19 +- powerpc64le/aes-decrypt-internal.asm | 573 +++ powerpc64le/aes-encrypt-internal.asm | 534

[PATCH] optimized AES and GHASH for PPC64LE

2020-06-20 Thread Maamoun TK
diff --git a/configure.ac b/configure.ac index 90ea1ea8..1ea54ce8 100644 --- a/configure.ac +++ b/configure.ac @@ -435,6 +435,9 @@ if test "x$enable_assembler" = xyes ; then esac fi ;; +*powerpc64le*) + asm_path=powerpc64le + ;; *) enable_assembler=no

Fwd: PPC64LE optimizing AES and GHASH

2020-06-22 Thread Maamoun TK
On Sat, Jun 20, 2020 at 11:54 AM Niels Möller wrote: > Have you measured speedup when going from 4 to 8 blocks? We shouldn't > add larger loops than needed. > The 8x loop has x~1.15 performance boost over 4x loop, if you think it's not worth it, I can add only 4x loop to make the code simpler.

Add ppc64le arch to Gitlab CI

2020-06-22 Thread Maamoun TK
I investigated this issue. The Debian image used for Gitlab CI supports only the following archs: amd64, mips, armhf, arm64. To add a new arch, it should be added to the apt sources list, and the packages required to build and check the nettle library should be installed.

Re: PPC64LE optimizing AES and GHASH

2020-06-18 Thread Maamoun TK
On Thu, Jun 18, 2020 at 6:58 PM Maamoun TK wrote: > I added a PowerPC64LE optimized version of AES and GHASH to nettle. > Patch summary: > > GHASH Algorithm > > I took the advantage of several references and researches to achieve the > high-speed implementat

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-21 Thread Maamoun TK
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote: Latency less than one cycle sounds wrong. Usually, simple ALU > instructions like xor has a latency of exactly one cycle (i.e., when an > instruction starts executing (all inputs are available), the result is > available for depending

[PATCH] "PowerPC64" Add README

2020-07-21 Thread Maamoun TK
--- powerpc64/README | 86 1 file changed, 86 insertions(+) create mode 100644 powerpc64/README diff --git a/powerpc64/README b/powerpc64/README new file mode 100644 index ..6d6f3fbb --- /dev/null +++ b/powerpc64/README @@ -0,0

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-23 Thread Maamoun TK
On Wed, Jul 22, 2020 at 6:04 PM Niels Möller wrote: > But in the patch for fat builds, you do the runtime check as > > + hwcap2 = getauxval(AT_HWCAP2); > > + features->have_crypto_ext = > + (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) == PPC_FEATURE2_VEC_CRYPTO ? 1 : 0; > > I think I would prefer to
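
The runtime check quoted above can be sketched as a small self-contained C fragment. This is an illustration under assumptions, not the fat-ppc.c code: the struct and function names are hypothetical, and the fallback value for PPC_FEATURE2_VEC_CRYPTO is the one Linux uses, supplied here only so the fragment compiles where the kernel header is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__linux__)
# include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */
#endif

/* Fallback definition, assumed to match the Linux kernel header. */
#ifndef PPC_FEATURE2_VEC_CRYPTO
# define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

/* Hypothetical stand-in for the features struct used in the patch. */
struct ppc_features_sketch
{
  int have_crypto_ext;
};

/* Pure bit test, separated from the system query so it can be
   checked on any machine. */
static int
hwcap2_has_crypto(unsigned long hwcap2)
{
  return (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

void
get_features_sketch(struct ppc_features_sketch *features)
{
  assert(features != NULL);
  features->have_crypto_ext = 0;
#if defined(__linux__) && defined(__powerpc64__) && defined(AT_HWCAP2)
  /* Only meaningful on PowerPC64 Linux; elsewhere the flag stays 0. */
  features->have_crypto_ext = hwcap2_has_crypto(getauxval(AT_HWCAP2));
#endif
}
```

Keeping the bit test in its own function makes the "!= 0" form easy to compare with the "== PPC_FEATURE2_VEC_CRYPTO ? 1 : 0" form in the quoted patch; for a single-bit mask the two give the same result.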

Re: [PATCH 2/6] "PowerPC64" Add optimized AES [Enc|Dec]

2020-07-23 Thread Maamoun TK
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote: > then add ghash and fat builds > (not sure in which order). > I forgot to mention that you can merge them in any order. Regards, Mamone

[PowerPC64] Add AIX to cpu detection

2020-07-20 Thread Maamoun TK
--- fat-ppc.c | 29 - 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/fat-ppc.c b/fat-ppc.c index e09b2097..eca689fe 100644 --- a/fat-ppc.c +++ b/fat-ppc.c @@ -39,10 +39,17 @@ #include #include #include -#if defined(__FreeBSD__) && __FreeBSD__ < 12

Re: [PowerPC64] Add AIX to cpu detection

2020-07-20 Thread Maamoun TK
this is fine. On Mon, Jul 20, 2020 at 7:34 PM Jeffrey Walton wrote: > On Mon, Jul 20, 2020 at 12:18 PM Maamoun TK > wrote: > > > > --- > > fat-ppc.c | 29 - > > 1 file changed, 20 insertions(+), 9 deletions(-) > > > > diff --git

Re: [AArch64] Optimize GHASH

2021-01-11 Thread Maamoun TK
ssh connection so we can get it to work properly. The patch is built on top of the master branch. regards, Mamone On Sun, Jan 10, 2021 at 10:45 PM Michael Weiser wrote: > Hello Maamoun, > > On Tue, Jan 05, 2021 at 09:04:59PM +0200, Maamoun TK wrote: > > > Thank you, I will keep

Re: Make --enable-fat the default? (was: Re: Release of Nettle-3.7?)

2020-12-26 Thread Maamoun TK
Since there are many variants of architectures, some are supported and others could be supported in the future, it becomes a little annoying for end-users to browse the configurable options and enable specific options to get maximum speed for corresponding algorithms so here --enable-fat comes in

Re: [PowerPC] GCM optimization

2020-11-27 Thread Maamoun TK
I made a pull request in the repository. regards, Mamone On Thu, Nov 26, 2020 at 11:41 PM Niels Möller wrote: > Maamoun TK writes: > > > To suppress these warnings we need to declare a prototype for > > _nettle_gcm_init_key() and _nettle_gcm_hash() if >

Re: [PowerPC] GCM optimization

2020-11-27 Thread Maamoun TK
On Fri, Nov 27, 2020 at 8:13 PM Niels Möller wrote: > I wonder if gcm-internal.h can be cut down a bit, to > > /* Functions available only in some configurations */ > void > _nettle_gcm_init_key (union nettle_block16 *table); > > void > _nettle_gcm_hash(const struct gcm_key *key, union

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 10:15 AM Niels Möller wrote: > Maamoun TK writes: > > Let's leave that as is, then. Do you want to make another pull request > with only the fixes for register usage? > Sure. I updated the pull request. > I was thinking of something similar to how t

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 9:21 PM Niels Möller wrote: It remains to wire it up for fat-ppc.c. Anything else that is missing? > No, I'll make a pull request for fat build support.

Re: [PowerPC] GCM optimization

2020-11-25 Thread Maamoun TK
On Wed, Nov 25, 2020 at 10:13 PM Maamoun TK wrote: > I'll make a pull request for fat build support. > Done!

Re: [PowerPC] GCM optimization

2020-11-24 Thread Maamoun TK
-storing permuting operations on little-endian mode. regards, Mamone On Sun, Nov 22, 2020 at 11:26 PM Niels Möller wrote: > Maamoun TK writes: > > > It generates a mask compatible with the length of leftovers, for example > if > > the length is 1 th

Re: PPC chacha

2020-11-24 Thread Maamoun TK
Thank you for your work. On POWER9 I got the following benchmark result: ./configured: chacha encrypt 308.58 chacha decrypt 325.87 ./configured --enable-power-altivec "master branch": chacha encrypt 342.15 chacha decrypt 356.24 ./configured --enable-power-altivec

Re: PPC chacha

2020-11-30 Thread Maamoun TK
On Mon, Nov 30, 2020 at 10:07 PM Maamoun TK wrote: > BTW since there is no function called while the register of the stack > frame is modified, I think it's fine to not follow the rules and keep the > store and restore sequences as are without any modification. > I'm thinking what

Re: PPC chacha

2020-11-30 Thread Maamoun TK
On Mon, Nov 30, 2020 at 12:37 PM Niels Möller wrote: > Niels Möller writes: > 1. Does the save and restore of registers look correct? I checked the >abi spec, and the intention is to use the part of the 288 byte >"Protected zone" below the stack pointer. There are requirements should

Re: Fwd: [PowerPC] GCM optimization

2020-12-01 Thread Maamoun TK
Hi George, I'll start writing a white paper called "Optimizing Galois-Counter-Mode on PowerPC Architecture Processors". Once I finish the first draft I'll send it to Niels to review it. > What do you need from the IBM side? I may be able to help. We'd > definitely > like to support you and

Re: [PowerPC] GCM optimization

2020-11-26 Thread Maamoun TK
Let me know if you want me to make a pull request for these changes. regards, Mamone On Thu, Nov 26, 2020 at 9:13 PM Niels Möller wrote: > Niels Möller writes: > > > Maamoun TK writes: > > > >>> I'll make a pull request for fat build support. > >

Re: PPC chacha

2020-11-30 Thread Maamoun TK
On POWER9 I get the following benchmark with "./configure --enable-power-altivec": chacha encrypt 763.57 chacha decrypt 780.64 regards, Mamone On Mon, Nov 30, 2020 at 11:08 PM Niels Möller wrote: > Niels Möller writes: > > > Below code seems to work (but is not yet a drop-in

Re: PPC chacha

2020-11-30 Thread Maamoun TK
On Mon, Nov 30, 2020 at 11:18 PM Maamoun TK wrote: > on POWER9 I get the following benchmark with "./configure > --enable-power-altivec": > > chacha encrypt 763.57 > chacha decrypt 780.64 > > regards, > Mamone > I got this result using pp

Re: PPC chacha

2020-11-30 Thread Maamoun TK
On Mon, Nov 30, 2020 at 10:56 PM Niels Möller wrote: > Hmm. I agree just lowering the stack pointer sounds a bit questionable. > But if we use some other register to point into the protected zone, we > should be fine? E.g., > > addi r10, r1, -0x40 C Save callee-save registers >

Re: PPC chacha

2020-12-02 Thread Maamoun TK
looks like this: std r30,-16(r1) std r31,-8(r1) li r0,-80 stvx v28,r1,r0 li r0,-64 stvx v29,r1,r0 li r0,-48 stvx v30,r1,r0 li r0,-32 stvx v31,r1,r0 regards, Mamone On Wed, Dec 2, 2020 at 7:31 PM David Edelsohn wrote: > On Wed, Dec 2, 2020 at 9:41 AM Maam

Re: PPC chacha

2020-12-02 Thread Maamoun TK
On Tue, Dec 1, 2020 at 8:02 PM Niels Möller wrote: > How portable is this, do all relevant operating systems support storing > data below the stack pointer? > I need to investigate this. regards, Mamone

Re: [AArch64] Optimize GHASH

2020-12-17 Thread Maamoun TK
> > I wonder which assembly files we should use if target host is aarch64, > but ABI=32? I guess the arm/v6/ code can be used unconditionally. Can > we also use arm/neon/ code unconditionally? > It seems gcc for aarch64 doesn't support building 32-bit binaries, maybe we should remove the check of

Re: [AArch64] Optimize GHASH

2020-12-18 Thread Maamoun TK
I created a couple of merge requests in the repo. With those MRs merged, I think the powerpc code is stable enough to be included in the upcoming version of nettle. regards, Mamone On Thu, Dec 17, 2020 at 12:28 PM Maamoun TK wrote: > I wonder which assembly files we should use if target h

Re: Release of Nettle-3.7?

2020-12-18 Thread Maamoun TK
> > I think the powerpc64 code is in good shape now, and ready for release. Are > you aware of anything that needs fixing? > No, all I can think of are a couple of issues that make the powerpc64 code more stable (you can check their merge requests in the repo). > One problem with the

Re: Release of Nettle-3.7?

2020-12-19 Thread Maamoun TK
Hi Michael, On Fri, Dec 18, 2020 at 8:00 PM Michael Weiser wrote: > qemu-user works nicely for aarch64_be. I used it to semi-natively > compile a whole aarch64 userland. I could dust off pine64 board that is > running that userland now for real-world testing if you like. > Thank you, I will

Re: [AArch64] Optimize GHASH

2020-12-19 Thread Maamoun TK
On Sat, Dec 19, 2020 at 11:27 AM Niels Möller wrote: > For the other one, > https://git.lysator.liu.se/nettle/nettle/-/merge_requests/15 "Use signal > to detect CPU features when getauxval() isn't available", can you > explain for which systems is that needed? In the current code, you > handle
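
The general technique named in that merge request title (probing with a signal handler when getauxval() is unavailable) can be sketched in portable C as follows. This is not the code from the merge request: all names are hypothetical, and the probe here is an ordinary function pointer so the sketch runs anywhere; a real detector would execute one instruction from the extension being tested.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;

/* On SIGILL, jump back to the sigsetjmp point in the caller. */
static void
sigill_handler(int signo)
{
  (void) signo;
  siglongjmp(probe_env, 1);
}

/* Run probe(); return 1 if it completed, 0 if it raised SIGILL.
   The previous SIGILL disposition is restored before returning. */
int
probe_runs_without_sigill(void (*probe)(void))
{
  struct sigaction sa, old;
  volatile int ok = 0;

  assert(probe != NULL);
  sa.sa_handler = sigill_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction(SIGILL, &sa, &old) != 0)
    return 0;

  /* savemask = 1 so the signal mask is restored after the jump. */
  if (sigsetjmp(probe_env, 1) == 0)
    {
      probe();
      ok = 1;
    }

  sigaction(SIGILL, &old, NULL);
  return ok;
}

/* Stand-ins for "instruction supported" and "instruction missing". */
static void probe_ok(void) { }
static void probe_bad(void) { raise(SIGILL); }
```

Here probe_runs_without_sigill(probe_ok) reports success and probe_runs_without_sigill(probe_bad) reports failure, which mirrors how a supported or unsupported instruction would behave.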

Re: [AArch64] Optimize GHASH

2020-12-21 Thread Maamoun TK
On Mon, Dec 21, 2020 at 9:29 AM Niels Möller wrote: > Thanks, looks pretty good. I added a few minor comments on the mr > (https://git.lysator.liu.se/nettle/nettle/-/merge_requests/16 for > reference). > Thank you, I made a commit with the changes. regards, Mamone

Re: [AArch64] Optimize GHASH

2020-12-14 Thread Maamoun TK
I forgot to mention that I made the benchmark test on gcc17 in GCC Farm. regards, Mamone On Tue, Dec 15, 2020 at 12:12 AM Maamoun TK wrote: > I made a merge request in the main repo that enables optimized GHASH on > AArch64 architecture. The implementation is based on Niels Möller's >

[AArch64] Optimize GHASH

2020-12-14 Thread Maamoun TK
I made a merge request in the main repo that enables optimized GHASH on the AArch64 architecture. The implementation is based on Niels Möller's enhanced algorithm, which yields a larger speedup on AArch64 than the Intel algorithm. Using the Karatsuba algorithm with the Intel algorithm yielded an

Re: [AArch64] Optimize GHASH

2020-12-20 Thread Maamoun TK
On Sat, Dec 19, 2020 at 9:05 PM Niels Möller wrote: > Do you have any idea how common such old systems might be? > I don't have a specific number, but I think using such old versions of glibc is uncommon, especially for POWER8 and above processors, considering those versions are more than 8 years

[PowerPC] GCM optimization

2020-11-09 Thread Maamoun TK
This implementation takes advantage of research done by Niels Möller to optimize GCM on PowerPC. This optimization yields a +27.7% performance boost on POWER8 over the previous implementation that was based on Intel documents. The performance comparison is made by processing 4 blocks per loop

[PATCH] "PowerPC" Detect VSX support on AIX and FreeBSD

2020-11-10 Thread Maamoun TK
--- fat-ppc.c | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/fat-ppc.c b/fat-ppc.c index 2bfd649f..ec971706 100644 --- a/fat-ppc.c +++ b/fat-ppc.c @@ -43,8 +43,13 @@ #if defined(_AIX) # include #elif defined(__linux__) +# include # include

Re: [PowerPC] GCM optimization

2020-11-10 Thread Maamoun TK
I think I mislabeled the percentage of performance comparison: the new method achieved a 27.7% reduction in time on POWER8, which corresponds to a 37.9% increase in performance. On Tue, Nov 10, 2020 at 6:25 AM Maamoun TK wrote: > This implementation takes advantage of research made by Niels Möl
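
The relation between a reduction in run time and an increase in throughput can be written as 1 / (1 - r) - 1, where r is the fractional time reduction. The small C helper below is a sketch of that conversion; the function name is hypothetical. With the rounded 27.7% figure it gives roughly 38%, close to the figure quoted in the message; the small gap presumably comes from rounding in the measured times.

```c
#include <assert.h>

/* Convert a fractional reduction in run time (0 <= r < 1) into the
   matching fractional increase in throughput: 1 / (1 - r) - 1. */
double
speedup_from_time_reduction(double reduction)
{
  assert(reduction >= 0.0 && reduction < 1.0);
  return 1.0 / (1.0 - reduction) - 1.0;
}
```

For example, halving the run time (r = 0.5) doubles the throughput, that is, a 100% increase.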

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
| |*| regards, Mamone On Wed, Nov 11, 2020 at 6:24 PM George Wilson wrote: > On Wed, Nov 11, 2020 at 02:17:41AM +0200, Maamoun TK wrote: > > I think I mislabeled the percentage of performance c

Re: [PowerPC] GCM optimization

2020-11-20 Thread Maamoun TK
On Sat, Nov 14, 2020 at 8:11 PM Maamoun TK wrote: > For the first approach I can think of this method: > lxvd2x VSR(C0),0,DATA > IF_LE(` > vperm C0,C0,C0,LE_MASK > ') > slwi LENGTH,LENGTH,4 (Shift left 4 bits because vsro gets > bit[121:124])
