e syntax.
On Fri, Sep 4, 2020 at 1:07 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > This patch adds "VSR" macro to improve the syntax of assembly code,
>
> Speaking of syntax, I've had quick look at the powerpc64 assembly in
> GMP, and it seems to use symbols li
at 8:14 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > Hi Niels,
> >
> > I tried to apply your method but can't get it to work,
>
> Hmm, do you think I've missed something in the math, or are there other
> difficulties?
>
> > while applying it one
> >
Hi Niels,
I tried to apply your method but can't get it to work. While applying it, one
question came to mind.
> First, compute b_0(x) / x^64 (mod P(x)), which expands it from 64 bits to
> 128,
>
> c_1(x) x^64 + c_0(x) = b_0(x) / x^64 (mod P(x))
>
Here you are trying to get partially reduced
This patch adds a "VSR" macro to improve the syntax of the assembly code. I will
create a separate patch for gcm-hash since it hasn't been merged to
master yet. I also removed the TODO from README because I tried to use
"lxv/stxv" in POWER9 instead of "lxvd2x/stxvd2x" but gcc produced
"lxvd2x/stxvd2x"
Great. I can confirm that the testsuite passes and the AES performance is
as expected for both fat and explicit crypto configurations.
On Sat, Aug 29, 2020 at 4:14 PM Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > Merged to the power-asm-wip branch (there were still
On Wed, Aug 19, 2020 at 10:46 PM Niels Möller wrote:
> Hi, I've been busy and silent for a while.
>
> I was thinking, that I don't want to merge to master with
> --enable-power-crypto-ext enabled by default. And then we really need
> fat builds to have easy ci testing.
>
> So I'm applying the
---
configure.ac | 27 ---
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/configure.ac b/configure.ac
index 49db7af1..6db3fc42 100644
--- a/configure.ac
+++ b/configure.ac
@@ -322,6 +322,17 @@ case "$host_cpu" in
AC_TRY_COMPILE([
#if defined(__sgi) &&
On Thu, Aug 20, 2020 at 1:07 AM Jeffrey Walton wrote:
> You might want to check this. In practice I don't believe 32-bit ABIs
> are supported on the 64-bit iron.
>
> I don't recall if the Linux ABI supports 32-bit on these machines. I
> thought Steven Monroe said something about this (i.e., not
---
configure.ac | 6 +-
fat-ppc.c | 33 ++
fat-setup.h| 7 +
gcm.c | 69 +++-
powerpc64/fat/gcm-hash.asm | 39 +++
powerpc64/p8/gcm-hash.asm | 773
+
6 files changed, 909
This is a stand-alone patch that applies all the previous patches to the
optimized GCM implementation. This patch is based on the master upstream so
it can be merged directly.
It passes the testsuite and yields the expected performance.
---
configure.ac | 5 +-
fat-ppc.c
It's gotten better with this patch, now it takes 0.49 seconds to
execute under the same circumstances.
On Fri, Sep 25, 2020 at 9:59 AM Niels Möller wrote:
> Maamoun TK writes:
>
> >> What's the speedup you get from assembly gcm_fill? I see the C
> >> im
>
>
> I'm trying to learn a bit of ppc assembly. Below is an implementation of
> _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just
> put the file in the powerpc64 directory and reconfigure). This machine
> is little-endian, I haven't yet tested on big-endian.
>
Great work.
>
> What's the speedup you get from assembly gcm_fill? I see the C
> implementation uses memcpy and WRITE_UINT32, and is likely significantly
> slower than the ctr_fill16 in ctr.c. But it could be improved using
> portable means. If done well, it should be a very small fraction of the
> cpu time
Yes, it would make sense.
On Fri, Sep 25, 2020 at 5:25 PM Niels Möller wrote:
> Jeffrey Walton writes:
>
> > I believe the 64-bit adds (addudm) and subtracts (subudm) require
> > POWER8.
>
> I don't think there are any 64-bit adds in my chacha code, only 32-bit,
> vadduwm. The chacha state is
ep 25, 2020 at 10:58 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > Great work. The implementation looks fine, I like the idea of using -16
> > instead of 16 for rotating because vspltisw is limited to (-16 to 15)
> > and vrlw picks the low-order 5 bits which is the same fo
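The rotate-count remark quoted above comes down to modular arithmetic, and a small C model makes it concrete. This is only a sketch: `rotl32_low5` is a hypothetical helper, not code from the patch, and it assumes (as the message says of vrlw) that only the low-order 5 bits of the count are used.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit rotate that looks only at the
   low-order 5 bits of the count, as described for vrlw above. */
uint32_t
rotl32_low5 (uint32_t x, uint32_t count)
{
  unsigned n = count & 31;   /* keep the low-order 5 bits */
  /* The second shift count is masked so that n == 0 is well defined. */
  return (x << n) | (x >> ((32 - n) & 31));
}
```

Since -16 and 16 agree modulo 32, `rotl32_low5 (x, (uint32_t) -16)` rotates by 16, while -16 still fits the (-16 to 15) immediate range mentioned for vspltisw.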
This patch is built upon ppc-m4-macrology.patch. Using explicit register
names is working as expected now.
---
powerpc64/machine.m4 | 11 +-
powerpc64/p8/aes-decrypt-internal.asm | 194
+-
powerpc64/p8/aes-encrypt-internal.asm | 192
---
powerpc64/machine.m4 | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/powerpc64/machine.m4 b/powerpc64/machine.m4
index f867ec01..e2383201 100644
--- a/powerpc64/machine.m4
+++ b/powerpc64/machine.m4
@@ -24,10 +24,9 @@ define(`EPILOGUE',
C Get vector-scalar
of optimizing such functions.
On Mon, Sep 28, 2020 at 8:40 AM Amos Jeffries wrote:
> On 28/09/20 8:25 am, Maamoun TK wrote:
> > gcm_fill() got C optimization which performs close to the one I
>
> What do you mean by "close"?
> faster or slower?
> and it that differen
>
> 1. Take out the fat support to its own patch.
>
Done! I will post the assembly part first so you can review it.
2. You could consider doing the init_key in C, if nothing else as
>documentation. It could be either under some #ifdef in gcm.c, or a
>separate .c file under
The last patch follows the C implementation but I just figured out a decent
way to do it.
---
powerpc64/p7/chacha-core-internal.asm | 22 +-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/powerpc64/p7/chacha-core-internal.asm
---
powerpc64/p7/chacha-core-internal.asm | 55
++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/powerpc64/p7/chacha-core-internal.asm
b/powerpc64/p7/chacha-core-internal.asm
index 33c721c1..922050ff 100644
---
Use explicit register names to improve the syntax of assembly files and
pass -mregnames to the assembler to allow building the assembly files. I
will make a stand-alone patch for GCM which brings all the accumulated
modifications so it can be directly merged.
---
configure.ac
>
> I'm not sure it's a good idea to unconditionally use these gcc specific
> flags. Are they supported by all relevant compilers?
It seems that Clang doesn't support that flag.
> I'm considering instead adding the attached patch.
>
This is a better solution. I'll consider this patch for
Sounds good.
Thank you,
Mamone
On Fri, Jul 31, 2020 at 9:42 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > Yes, both are part of the same extension. I considered calling the
> > directory "P8" for three reasons:
> > - POWER8 is the minimal processor
---
powerpc64/README | 73
1 file changed, 73 insertions(+)
create mode 100644 powerpc64/README
diff --git a/powerpc64/README b/powerpc64/README
new file mode 100644
index ..19351be8
--- /dev/null
+++ b/powerpc64/README
@@ -0,0
Hi,
I reformatted the README as you suggested and re-wrote line breaks to avoid
the invalid ones.
Regards,
Mamone
On Fri, Jul 31, 2020 at 9:40 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > powerpc64/README | 86
>
> Hi, this patch still has line break problems,
I will add PPC to this check.
Thank you,
Mamone
On Fri, Jul 31, 2020 at 8:56 PM Niels Möller wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > BTW, about fat tests, I'm considering adding a make target "check-fat"
> > which will run make check with some different settings of
> >
Thanks for the info, I'll take a look.
Regards,
Mamone
On Sun, Aug 2, 2020 at 9:27 PM Jeffrey Walton wrote:
> On Sun, Aug 2, 2020 at 2:12 PM Niels Möller wrote:
> >
> > Maamoun TK writes:
> >
> > > --
This patch doesn't add FAT_TEST_LIST to ppc in configure.ac because the
check-fat patch hasn't been merged into the power-asm-wip branch. It can be
enabled easily by adding FAT_TEST_LIST="crypto_ext" to the ppc condition in
configure.ac once both patches are merged together.
Regards,
Mamone
---
I tested something similar, I tried to load data at address 0xXXX1
using lxvd2x and it loaded it properly.
On Tue, Jun 30, 2020 at 12:35 PM Jeffrey Walton wrote:
> On Tue, Jun 30, 2020 at 5:29 AM Jeffrey Walton wrote:
> >
> > On Tue, Jun 30, 2020 at 5:14 AM Maam
On Tue, Jun 30, 2020 at 12:26 PM Niels Möller wrote:
> Does that mean that explicitly
> setting QEMU_LD_PREFIX is needed only for ppc64 (big-endian), but not
> for ppc64el?
>
It's needed for both; I just gave ppc64 as an example.
___
nettle-bugs
Patch implementation benchmark for GCM_AES (Tested on POWER8):
little-endian:
- Encrypt x~17.5 of nettle C implementation
- Decrypt x~17.5 of nettle C implementation
- Update x~30 of nettle C implementation
big-endian:
- Encrypt x~18.5 of nettle C implementation
- Decrypt x~18.5 of nettle C
On Tue, Jun 30, 2020 at 12:29 PM Jeffrey Walton wrote:
One small comment for aes_encrypt and aes_decrypt... src and dst are
> usually user supplied buffers. Using lxvd2x to load a vector may
> produce incorrect results if the user is feeding a stream to an
> encryptor or decryptor that is not
On Thu, Jul 9, 2020 at 4:11 PM Niels Möller wrote:
>
> Do you expect that this "auto" logic does what that user wants? I'm
> thinking, maybe it's simpler to stick with just yes/no (no being the
> default), and then add support for --enable-fat later, to select code at
> run-time?
You are
You are right, I measured the throughput and latency for vncipher and vxor
instructions for POWER8 and updated the patch accordingly.
On Thu, Jul 9, 2020 at 5:58 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > +L16x_round_loop:
> > + lxvd2x KX,10,KEYS
> > +
---
powerpc64/machine.m4 | 32
1 file changed, 32 insertions(+)
create mode 100644 powerpc64/machine.m4
diff --git a/powerpc64/machine.m4 b/powerpc64/machine.m4
new file mode 100644
index ..3a121260
--- /dev/null
+++ b/powerpc64/machine.m4
@@ -0,0 +1,32
---
gcm.c | 82 +++-
powerpc64/P8/gcm-hash.asm | 998
++
2 files changed, 1066 insertions(+), 14 deletions(-)
create mode 100644 powerpc64/P8/gcm-hash.asm
diff --git a/gcm.c b/gcm.c
index cf615daf..935d4420 100644
--- a/gcm.c
+++
I measured the latency and throughput of vcipher/vncipher/vxor instructions
for POWER8
vcipher/vncipher
throughput 6 instructions per cycle
latency 0.91 clock cycles
vxor
throughput 6 instructions per cycle
latency 0.32 clock cycles
So the ideal option for POWER8 is processing 8 blocks, it has
---
powerpc64/README | 86
1 file changed, 86 insertions(+)
create mode 100644 powerpc64/README
diff --git a/powerpc64/README b/powerpc64/README
new file mode 100644
index ..f78357ab
--- /dev/null
+++ b/powerpc64/README
@@ -0,0
---
aes-decrypt-internal.c | 10 ++
aes-encrypt-internal.c | 10 ++
fat-ppc.c| 173
+++
fat-setup.h | 9 ++
powerpc64/fat/aes-decrypt-internal-2.asm | 37 +++
ster/ppc/ghashp8-ppc.pl
He uses lvx_u, which I guess stands for lvx_unaligned; the Perl script translates this
instruction to lxvd2x before passing it to the assembler.
regards,
Mamone
On Tue, Jun 30, 2020 at 11:31 PM Jeffrey Walton wrote:
> On Tue, Jun 30, 2020 at 7:55 AM Maamoun TK
> wrote:
> >
On Tue, Jun 30, 2020 at 11:06 PM Niels Möller wrote:
Can you explain a bit more what's special with the powerpc64 prologue
> and epilogue? What are the .C_NAME($1) symbols?
This is ELFv1 ABI stuff that is still required to get the function working
for PowerPC64 big-endian systems. PowerPC64
---
Makefile.in | 2 +-
aclocal.m4 | 52
configure.ac | 11 +
powerpc64/README | 67
powerpc64/machine.m4 | 24 +++
5 files changed, 155
---
testsuite/gcm-test.c | 23 +++
1 file changed, 23 insertions(+)
diff --git a/testsuite/gcm-test.c b/testsuite/gcm-test.c
index c8174019..df1fc94a 100644
--- a/testsuite/gcm-test.c
+++ b/testsuite/gcm-test.c
@@ -170,6 +170,29 @@ test_main(void)
---
gcm.c | 19 +-
powerpc64/gcm-hash8.asm | 1004
+++
2 files changed, 1022 insertions(+), 1 deletion(-)
create mode 100644 powerpc64/gcm-hash8.asm
diff --git a/gcm.c b/gcm.c
index cf615daf..809c03bc 100644
--- a/gcm.c
+++ b/gcm.c
---
powerpc64/aes-decrypt-internal.asm | 579
+
powerpc64/aes-encrypt-internal.asm | 540 ++
2 files changed, 1119 insertions(+)
create mode 100644 powerpc64/aes-decrypt-internal.asm
create mode 100644
I would like to help but I have no clue or experience with ARM NEON, sorry.
regards,
Mamone
On Tue, Jul 7, 2020 at 5:46 PM Niels Möller wrote:
> I've written some new ARM Neon assembly for salsa20. See
>
> https://gitlab.com/gnutls/nettle/-/commit/2ac58a1ce729a6cfe1d3703f4deb6da8862909e9
> ,
>
---
configure.ac | 2 ++
1 file changed, 2 insertions(+)
diff --git a/configure.ac b/configure.ac
index cc0d67ec..d7020fd6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -582,7 +582,9 @@ AH_VERBATIM([HAVE_NATIVE],
#undef HAVE_NATIVE_ecc_secp384r1_redc
#undef HAVE_NATIVE_ecc_secp521r1_modp
I added a PowerPC64LE optimized version of AES and GHASH to nettle.
Patch summary:
GHASH Algorithm
I took advantage of several references and research to achieve a
high-speed implementation of this algorithm. These references include
several techniques that have been used to improve the
Is there a limit to the number of lines of a message posted on the
list, or a special
option that should be passed to git diff?
I created a patch with the command "git diff --stat --summary --patch HEAD~1 >
patch.patch". When I posted this patch on the list, nothing happened.
I made a workaround by installing the required packages for ppc64el via
.gitlab-ci.yml
I will post the patch with big endian support for the optimized functions.
To run tests on ppc64 and ppc64el, this patch installs cross-building
packages for both architectures on the Debian image. These packages will be
installed every time the CI is triggered. A proper fix would be to install these
packages into the image directly.
This patch follows a different approach to
On Wed, Jun 24, 2020 at 10:43 AM Niels Möller wrote:
Speaking of ppc64 (big-endian, I assume) vs ppc64el, do you think it's
> possible and reasonable to use same assembly files? That's how the
> current ARM big-endian support works.
>
Yes, that is what I'm working on. nettle has IF_LE and IF_BE
On Mon, Jun 29, 2020 at 3:15 PM Niels Möller wrote:
> I've committed a change based on this patch. I dropped big-endian ppc
> support for now (the "apt-get remove nettle-dev:$arch" failed because
> it's not an arch in official debian. Not sure if that was the only
> problem, but I wanted to get
Makefile.in | 2 +-
configure.ac | 5 +
gcm.c | 19 +-
powerpc64le/aes-decrypt-internal.asm (new +x) | 573 +++
powerpc64le/aes-encrypt-internal.asm (new +x) | 534
Makefile.in | 2 +-
configure.ac | 5 +
gcm.c| 19 +-
powerpc64le/aes-decrypt-internal.asm | 573
powerpc64le/aes-encrypt-internal.asm | 534 +++
powerpc64le/gcm-hash8.asm
diff -urN nettle/configure.ac nettle_PowerPC64LE/configure.ac
--- nettle/configure.ac 2020-06-08 08:42:20.0 +0300
+++ nettle_PowerPC64LE/configure.ac 2020-06-15 18:41:43.485342900 +0300
@@ -435,6 +435,9 @@
esac
fi
;;
+*powerpc64le*)
+ asm_path=powerpc64le
+ ;;
---
Makefile.in | 2 +-
configure.ac | 5 +
gcm.c | 19 +-
powerpc64le/aes-decrypt-internal.asm | 573 +++
powerpc64le/aes-encrypt-internal.asm | 534
diff --git a/configure.ac b/configure.ac
index 90ea1ea8..1ea54ce8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -435,6 +435,9 @@ if test "x$enable_assembler" = xyes ; then
esac
fi
;;
+*powerpc64le*)
+ asm_path=powerpc64le
+ ;;
*)
enable_assembler=no
On Sat, Jun 20, 2020 at 11:54 AM Niels Möller wrote:
> Have you measured speedup when going from 4 to 8 blocks? We shouldn't
> add larger loops than needed.
>
The 8x loop gives a ~1.15x performance boost over the 4x loop. If you think it's
not worth it, I can add only the 4x loop to make the code simpler.
I investigated this issue. The Debian image used for GitLab CI only
supports the following archs: amd64, mips, armhf, arm64. To add a new arch,
it should be added to apt's sources list, and the packages
required to build and check the nettle library should be installed.
On Thu, Jun 18, 2020 at 6:58 PM Maamoun TK
wrote:
> I added a PowerPC64LE optimized version of AES and GHASH to nettle.
> Patch summary:
>
> GHASH Algorithm
>
> I took advantage of several references and research to achieve the
> high-speed implementat
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote:
Latency less than one cycle sounds wrong. Usually, simple ALU
> instructions like xor have a latency of exactly one cycle (i.e., when an
> instruction starts executing (all inputs are available), the result is
> available for depending
---
powerpc64/README | 86
1 file changed, 86 insertions(+)
create mode 100644 powerpc64/README
diff --git a/powerpc64/README b/powerpc64/README
new file mode 100644
index ..6d6f3fbb
--- /dev/null
+++ b/powerpc64/README
@@ -0,0
On Wed, Jul 22, 2020 at 6:04 PM Niels Möller wrote:
> But in the patch for fat builds, you do the runtime check as
>
> + hwcap2 = getauxval(AT_HWCAP2);
>
> + features->have_crypto_ext =
> + (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) == PPC_FEATURE2_VEC_CRYPTO ? 1 : 0;
>
> I think I would prefer to
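The run-time check quoted above can be written out as a small self-contained C sketch. The helper names `hwcap_has` and `have_crypto_ext` are hypothetical, and the fallback value for `PPC_FEATURE2_VEC_CRYPTO` is an assumption taken from the Linux powerpc headers so that the file also compiles on other hosts; it is not the code from the patch.

```c
#if defined(__linux__)
# include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */
#endif

/* Assumed fallback: the bit value used by the Linux powerpc headers.
   It is only defined here when the system headers do not provide it. */
#ifndef PPC_FEATURE2_VEC_CRYPTO
# define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

/* Return 1 if every bit of MASK is set in HWCAP, else 0. */
int
hwcap_has (unsigned long hwcap, unsigned long mask)
{
  return (hwcap & mask) == mask;
}

/* Hypothetical wrapper: the query is only meaningful on powerpc
   Linux, so every other host simply reports 0. */
int
have_crypto_ext (void)
{
#if defined(__linux__) && defined(__powerpc__)
  return hwcap_has (getauxval (AT_HWCAP2), PPC_FEATURE2_VEC_CRYPTO);
#else
  return 0;
#endif
}
```

Splitting the mask test from the `getauxval` call keeps the bit logic testable on any machine, independent of the hardware it runs on.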
On Mon, Jul 20, 2020 at 8:41 PM Niels Möller wrote:
> then add ghash and fat builds
> (not sure in which order).
>
I forgot to mention that you can merge them in any order.
Regards,
Mamone
---
fat-ppc.c | 29 -
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/fat-ppc.c b/fat-ppc.c
index e09b2097..eca689fe 100644
--- a/fat-ppc.c
+++ b/fat-ppc.c
@@ -39,10 +39,17 @@
#include
#include
#include
-#if defined(__FreeBSD__) && __FreeBSD__ < 12
this is fine.
On Mon, Jul 20, 2020 at 7:34 PM Jeffrey Walton wrote:
> On Mon, Jul 20, 2020 at 12:18 PM Maamoun TK
> wrote:
> >
> > ---
> > fat-ppc.c | 29 -
> > 1 file changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git
ssh connection so we can get it to work properly.
The patch is built on top of the master branch.
regards,
Mamone
On Sun, Jan 10, 2021 at 10:45 PM Michael Weiser
wrote:
> Hello Maamoun,
>
> On Tue, Jan 05, 2021 at 09:04:59PM +0200, Maamoun TK wrote:
>
> > Thank you, I will keep
Since there are many architecture variants, some supported now and others
that could be supported in the future, it becomes a little annoying for
end-users to browse the configurable options and enable specific ones to
get maximum speed for the corresponding algorithms, so here --enable-fat comes
in
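The idea behind a fat build, choosing an implementation at run time instead of at configure time, can be sketched in C with a function pointer. All names below are hypothetical, and the "accelerated" variant is just a stand-in with identical behavior; nettle's actual fat setup is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

typedef void xor_bytes_func (uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback implementation. */
void
xor_bytes_generic (uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Stand-in for an optimized variant; in a real fat build this would
   be the assembly routine for a specific CPU extension. */
void
xor_bytes_accel (uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Pick the implementation once, based on a detected CPU feature. */
xor_bytes_func *
select_xor_bytes (int have_ext)
{
  return have_ext ? xor_bytes_accel : xor_bytes_generic;
}
```

The caller stores the returned pointer once at startup and calls through it afterwards, so end-users get the fastest supported variant without choosing configure options themselves.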
I made a pull request in the repository.
regards,
Mamone
On Thu, Nov 26, 2020 at 11:41 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > To suppress these warnings we need to declare a prototype for
> > _nettle_gcm_init_key() and _nettle_gcm_hash() if
> "
On Fri, Nov 27, 2020 at 8:13 PM Niels Möller wrote:
> I wonder if gcm-internal.h can be cut down a bit, to
>
> /* Functions available only in some configurations */
> void
> _nettle_gcm_init_key (union nettle_block16 *table);
>
> void
> _nettle_gcm_hash(const struct gcm_key *key, union
On Wed, Nov 25, 2020 at 10:15 AM Niels Möller wrote:
> Maamoun TK writes:
>
> Let's leave that as is, then. Do you want to make another pull request
> with only the fixes for register usage?
>
Sure. I updated the pull request.
> I was thinking of something similar to how t
On Wed, Nov 25, 2020 at 9:21 PM Niels Möller wrote:
It remains to wire it up for fat-ppc.c. Anything else that is missing?
>
No, I'll make a pull request for fat build support.
On Wed, Nov 25, 2020 at 10:13 PM Maamoun TK
wrote:
> I'll make a pull request for fat build support.
>
Done!
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
-storing permuting operations on little-endian mode.
regards,
Mamone
On Sun, Nov 22, 2020 at 11:26 PM Niels Möller wrote:
> Maamoun TK writes:
>
> > It generates a mask compatible with the length of leftovers, for example if
> > the length is 1 th
Thank you for your work.
On POWER9 I got the following benchmark result:
./configure:
chacha encrypt 308.58
chacha decrypt 325.87
./configure --enable-power-altivec "master branch":
chacha encrypt 342.15
chacha decrypt 356.24
./configure --enable-power-altivec
On Mon, Nov 30, 2020 at 10:07 PM Maamoun TK
wrote:
> BTW since there is no function called while the register of the stack
> frame is modified, I think it's fine to not follow the rules and keep the
> store and restore sequences as are without any modification.
>
I'm thinking what
On Mon, Nov 30, 2020 at 12:37 PM Niels Möller wrote:
> Niels Möller writes:
> 1. Does the save and restore of registers look correct? I checked the
>abi spec, and the intention is to use the part of the 288 byte
>"Protected zone" below the stack pointer.
There are requirements should
Hi George,
I'll start writing a white paper called "Optimizing Galois-Counter-Mode on
PowerPC Architecture Processors". Once I finish the first draft I'll send
it to Niels for review.
> What do you need from the IBM side? I may be able to help. We'd
> definitely
> like to support you and
"
Let me know if you want me to make a pull request for these changes.
regards,
Mamone
On Thu, Nov 26, 2020 at 9:13 PM Niels Möller wrote:
> Niels Möller writes:
>
> > Maamoun TK writes:
> >
> >>> I'll make a pull request for fat build support.
> >
On POWER9 I get the following benchmark with "./configure
--enable-power-altivec":
chacha encrypt 763.57
chacha decrypt 780.64
regards,
Mamone
On Mon, Nov 30, 2020 at 11:08 PM Niels Möller wrote:
> Niels Möller writes:
>
> > Below code seems to work (but is not yet a drop-in
On Mon, Nov 30, 2020 at 11:18 PM Maamoun TK
wrote:
> On POWER9 I get the following benchmark with "./configure
> --enable-power-altivec":
>
> chacha encrypt 763.57
> chacha decrypt 780.64
>
> regards,
> Mamone
>
I got this result using pp
On Mon, Nov 30, 2020 at 10:56 PM Niels Möller wrote:
> Hmm. I agree just lowering the stack pointer sounds a bit questionable.
> But if we use some other register to point into the protected zone, we
> should be fine? E.g.,
>
> addi r10, r1, -0x40 C Save callee-save registers
>
looks like this:
std r30,-16(r1)
std r31,-8(r1)
li r0,-80
stvx v28,r1,r0
li r0,-64
stvx v29,r1,r0
li r0,-48
stvx v30,r1,r0
li r0,-32
stvx v31,r1,r0
regards,
Mamone
On Wed, Dec 2, 2020 at 7:31 PM David Edelsohn wrote:
> On Wed, Dec 2, 2020 at 9:41 AM Maam
On Tue, Dec 1, 2020 at 8:02 PM Niels Möller wrote:
> How portable is this, do all relevant operating systems support storing
> data below the stack pointer?
>
I need to investigate this.
regards,
Mamone
>
> I wonder which assembly files we should use if target host is aarch64,
> but ABI=32? I guess the arm/v6/ code can be used unconditionally. Can
> we also use arm/neon/ code unconditionally?
>
It seems gcc for aarch64 doesn't support building 32-bit binaries, maybe we
should remove the check of
I created a couple of merge requests in the repo. With those MRs merged, I
think the powerpc code is stable enough to be included in the upcoming version of
nettle.
regards,
Mamone
On Thu, Dec 17, 2020 at 12:28 PM Maamoun TK
wrote:
> I wonder which assembly files we should use if target h
>
> I think the powerpc64 code is in good shape now, and ready for release. Are
> you aware of anything that needs fixing?
>
No, all I can think of are a couple of issues that make the powerpc64
code more stable (you can check their merge requests in the repo).
> One problem with the
Hi Michael,
On Fri, Dec 18, 2020 at 8:00 PM Michael Weiser
wrote:
> qemu-user works nicely for aarch64_be. I used it to semi-natively
> compile a whole aarch64 userland. I could dust off pine64 board that is
> running that userland now for real-world testing if you like.
>
Thank you, I will
On Sat, Dec 19, 2020 at 11:27 AM Niels Möller wrote:
> For the other one,
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/15 "Use signal
> to detect CPU features when getauxval() isn't available", can you
> explain for which systems is that needed? In the current code, you
> handle
On Mon, Dec 21, 2020 at 9:29 AM Niels Möller wrote:
> Thanks, looks pretty good. I added a few minor comments on the mr
> (https://git.lysator.liu.se/nettle/nettle/-/merge_requests/16 for
> reference).
>
Thank you, I made a commit with the changes.
regards,
Mamone
I forgot to mention that I ran the benchmark test on gcc17 in the GCC Farm.
regards,
Mamone
On Tue, Dec 15, 2020 at 12:12 AM Maamoun TK
wrote:
> I made a merge request in the main repo that enables optimized GHASH on
> AArch64 architecture. The implementation is based on Niels Möller's
>
I made a merge request in the main repo that enables optimized GHASH on
the AArch64 architecture. The implementation is based on Niels Möller's
enhanced algorithm, which yields more speedup on AArch64 than
the Intel algorithm. Using the Karatsuba algorithm with the Intel
algorithm yielded an
On Sat, Dec 19, 2020 at 9:05 PM Niels Möller wrote:
> Do you have any idea how common such old systems might be?
>
I don't have a specific number, but I think using such old versions of glibc
is uncommon, especially for POWER8 and above processors, considering those
versions are more than 8 years
This implementation takes advantage of research done by Niels Möller to
optimize GCM on PowerPC. This optimization yields a +27.7% performance
boost on POWER8 over the previous implementation, which was based on Intel
documents. The performance comparison is made by processing 4 blocks per
loop
---
fat-ppc.c | 25 +
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/fat-ppc.c b/fat-ppc.c
index 2bfd649f..ec971706 100644
--- a/fat-ppc.c
+++ b/fat-ppc.c
@@ -43,8 +43,13 @@
#if defined(_AIX)
# include
#elif defined(__linux__)
+# include
# include
I think I mislabeled the percentage in the performance comparison: the new
method achieved a 27.7% reduction in time on POWER8, which corresponds to a 37.9%
increase in performance.
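The relation between a reduction in run time and an increase in performance is worth spelling out, since the two percentages are easy to confuse. A minimal C sketch, with a hypothetical function name:

```c
/* Convert a fractional reduction in run time (e.g. 0.277) into the
   corresponding fractional increase in throughput. If the new time is
   (1 - r) of the old one, throughput grows by a factor 1 / (1 - r). */
double
speedup_from_time_reduction (double reduction)
{
  return 1.0 / (1.0 - reduction) - 1.0;
}
```

With the rounded input 0.277 this gives roughly 0.38, in line with the figure quoted above; the small gap from 37.9% presumably reflects rounding of the measured times, which is an assumption here.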
On Tue, Nov 10, 2020 at 6:25 AM Maamoun TK
wrote:
> This implementation takes advantage of research made by Niels Möl
regards,
Mamone
On Wed, Nov 11, 2020 at 6:24 PM George Wilson
wrote:
> On Wed, Nov 11, 2020 at 02:17:41AM +0200, Maamoun TK wrote:
> > I think I mislabeled the percentage of performance c
On Sat, Nov 14, 2020 at 8:11 PM Maamoun TK
wrote:
> For the first approach I can think of this method:
> lxvd2x VSR(C0),0,DATA
> IF_LE(`
> vperm C0,C0,C0,LE_MASK
> ')
> slwi LENGTH,LENGTH,4 (Shift left 4 bits because vsro gets
> bit[121:124])