Re: Old ARM Neon code for salsa20 and chacha

2021-02-06 Thread Michael Weiser
Hello Niels, On Thu, Jan 28, 2021 at 07:26:46PM +0100, Niels Möller wrote: > > With the new 2-way or 3-way functions, performance of the single-block > > functions isn't that critical, so deletion may be ok even if it causes > > some small regression on some processors (e.g., single-block chacha

Re: [AArch64] Optimize GHASH

2021-02-06 Thread Michael Weiser
Hello Niels, On Tue, Feb 02, 2021 at 06:09:42PM +0100, Niels Möller wrote: > > I've downloaded binary builds of clang for aarch64 from > > https://releases.llvm.org/download.html. 3.9.1 was the oldest prebuilt > > toolchain I could find there and 11.0.0 the most recent. > [...] > > They also

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Michael Weiser
Hi all, On Tue, Feb 02, 2021 at 08:23:39AM -0500, Jeffrey Walton wrote: > > > I think my mentioning of llvm-as was a red herring. Looking at the > > > output of clang -v, llvm-as isn't involved at all. This is supported by > > > the man page stating that llvm-as accepts LLVM assembly and emits

Re: [AArch64] Optimize GHASH

2021-02-02 Thread Michael Weiser
Hello Niels, On Tue, Feb 02, 2021 at 07:40:44AM +0100, Niels Möller wrote: > > llvm-as wouldn't recognize pmull instruction without > > adding -march=armv8-a+crypto flag at least with the version I use "3.8.1" 3.8.1 was released in 2017. It might not support recent aarch64 additions regarding
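For context on the instruction being discussed: PMULL in its 64-bit form (part of the ARMv8 crypto extension that `-march=armv8-a+crypto` enables) computes a carry-less product of two 64-bit polynomials over GF(2), which is the core operation of GHASH. Below is a minimal portable C reference sketch of that operation. `clmul64` is a hypothetical name, not a Nettle function, and the loop is slow and not constant-time.

```c
#include <stdint.h>

/* Hypothetical reference helper: carry-less (GF(2)[x]) product of two
   64-bit polynomials, returned as a 128-bit value split into *hi and *lo.
   Branches on the bits of b, so it is NOT constant-time. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t h = 0, l = 0;

  for (unsigned i = 0; i < 64; i++)
    {
      if ((b >> i) & 1)
        {
          l ^= a << i;            /* low 64 bits of a * x^i */
          if (i != 0)
            h ^= a >> (64 - i);   /* bits that spill into the high half */
        }
    }
  *hi = h;
  *lo = l;
}
```

Since (x + 1)^2 = x^2 + 1 in GF(2)[x], `clmul64(3, 3, &hi, &lo)` yields hi = 0 and lo = 5: unlike an integer multiply, no carries propagate.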

Re: [AArch64] Optimize GHASH

2021-01-31 Thread Michael Weiser
Hello Niels, > I think this would be more user-friendly without the "a", > --enable-armv8-crypto, or --enable-arm64-crypto. Or do you foresee any > collision with an incompatible ARMv8-M crypto extension or the like? FWIW, I like --enable-arm64-crypto because it would nicely match with a

Re: [AArch64] Optimize GHASH

2021-01-26 Thread Michael Weiser
Hello Mamone, On Tue, Jan 26, 2021 at 07:15:22PM +0200, Maamoun TK wrote: > > Attached are the current > > patches, the first being your original. What do you think? > I liked how the patch ended up so far, just give me one or two days to give > the patch additional review before letting it up

Re: [AArch64] Optimize GHASH

2021-01-25 Thread Michael Weiser
if test "$ABI" = 64 ; then + CFLAGS="$CFLAGS -Wa,-march=armv8-a+crypto" + asm_path="arm64 arm64/v8" +fi + fi +;; *powerpc64*) if test "$ABI" = 64 ; then GMP_ASM_POWERPC_R_REGISTERS -- 2.30.0 From 3c219dfc

Re: [AArch64] Optimize GHASH

2021-01-24 Thread Michael Weiser
Hello Mamone, On Sat, Jan 23, 2021 at 08:52:30PM +0200, Maamoun TK wrote: > > @@ -280,9 +266,9 @@ L1x: > > tst LENGTH,#-16 > > b.eq Lmod > > > > -ld1 {H1M.16b,H1L.16b},[TABLE] > > +ld1 {H1M.2d,H1L.2d},[TABLE] > > > > -ld1
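The excerpt above switches the table loads from the `.16b` to the `.2d` arrangement. A small C model may help explain why this matters only on big-endian. `ld1 {v.16b}` places memory byte i into register byte i under either byte order. `ld1 {v.2d}`, by contrast, loads two 64-bit elements in the data endianness, so on big-endian each doubleword ends up byte-reversed: the same result as `.16b` followed by `rev64`. The register is modelled as 16 bytes indexed by significance (byte 0 is the least significant byte of lane 0). The type and function names are hypothetical; this is an illustration, not Nettle code.

```c
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit vector register: b[0] is the least significant byte
   of 64-bit lane 0, b[8] the least significant byte of lane 1. */
typedef struct { uint8_t b[16]; } vreg;

/* ld1 {v.16b}: memory byte i goes to register byte i, either endianness. */
static vreg ld1_16b(const uint8_t *mem)
{
  vreg r;
  memcpy(r.b, mem, 16);
  return r;
}

/* ld1 {v.2d}: each 8-byte element is read in the data endianness and put
   into its lane with the least significant byte at the lowest index. */
static vreg ld1_2d(const uint8_t *mem, int big_endian)
{
  vreg r;
  for (int lane = 0; lane < 2; lane++)
    for (int k = 0; k < 8; k++)
      r.b[8 * lane + k] = big_endian ? mem[8 * lane + 7 - k]
                                     : mem[8 * lane + k];
  return r;
}

/* rev64 with 8-bit elements: reverse the bytes within each 64-bit lane. */
static vreg rev64_8(vreg x)
{
  vreg r;
  for (int lane = 0; lane < 2; lane++)
    for (int k = 0; k < 8; k++)
      r.b[8 * lane + k] = x.b[8 * lane + 7 - k];
  return r;
}
```

In this model, on big-endian `ld1 {v.2d}` puts memory byte 0 into register byte 7 and memory byte 8 into register byte 15. On little-endian, the two arrangements give identical registers.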

Re: [AArch64] Optimize GHASH

2021-01-23 Thread Michael Weiser
Hi Mamone, Jeff, sorry for the duplication, used the wrong sender address for the list again. On Fri, Jan 22, 2021 at 07:07:46PM -0500, Jeffrey Walton wrote: > > > Do you think it makes sense to try and adjust the code to work with the > > > BE layout natively and have a full 128bit reverse

Re: [AArch64] Optimize GHASH

2021-01-22 Thread Michael Weiser
Hello Mamone, On Fri, Jan 22, 2021 at 10:14:36PM +0200, Maamoun TK wrote: > > The difference in index in dup EMSB nicely shows the doubleword > > transposition compared to LE. If on LE the dup was done after the rev64, > > it'd be H.b[7] vs. H.b[15]. > I see what you did here, but I'm confused

Re: [AArch64] Optimize GHASH

2021-01-21 Thread Michael Weiser
Hello Mamone, On Wed, Jan 20, 2021 at 10:25:19PM +0200, Maamoun TK wrote: > I'm trying to install Gentoo on VMware by walking through this recipe > https://medium.com/@steensply/vmware-installation-of-gentoo-linux-from-scratch-on-an-encrypted-partition-9e4665f638e2 > I'm in the middle of receip

Re: [AArch64] Optimize GHASH

2021-01-19 Thread Michael Weiser
Hello Mamone, On Mon, Jan 18, 2021 at 06:27:40PM +0200, Maamoun TK wrote: > It would be nice to get the implementation of the enhanced algorithm > working for both endian modes as it yields a good performance boost. Also, > there is not much effort here, the only thing I'm struggling with is to

Re: Release of Nettle-3.7?

2021-01-13 Thread Michael Weiser
Hello Niels, On Wed, Jan 13, 2021 at 01:43:38PM +0100, Niels Möller wrote: > > Attached is the new patch that unconditionally switches from vldm to > > vld1.32 but > > keeps vstm in favour of vst1.8 on little-endian for stores. > Thanks! Applied now. Perfect! Incidentally: The other day I was
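Some background on why vldm and vld1.32 can differ on big-endian ARMv7. vldm transfers whole 64-bit D registers in the data endianness, while vld1.32 loads individual 32-bit elements. Viewed as 32-bit lanes, a big-endian vldm therefore places the second word of each 8-byte block in lane 0, whereas vld1.32 keeps memory order. On little-endian, the two agree. Below is a minimal C model of one D register under those assumptions. The function names are hypothetical and this is not Nettle code.

```c
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes in the given byte order. */
static uint32_t load32(const uint8_t *p, int big_endian)
{
  return big_endian
    ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
    : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}

/* Assemble a 64-bit value from 8 bytes in the given byte order. */
static uint64_t load64(const uint8_t *p, int big_endian)
{
  uint64_t first = load32(p, big_endian), second = load32(p + 4, big_endian);
  return big_endian ? (first << 32) | second : (second << 32) | first;
}

/* vldm model: one 64-bit transfer; 32-bit lane 0 is the low half of Dn. */
static void model_vldm(const uint8_t *mem, int big_endian, uint32_t lane[2])
{
  uint64_t d = load64(mem, big_endian);
  lane[0] = (uint32_t)d;
  lane[1] = (uint32_t)(d >> 32);
}

/* vld1.32 model: two 32-bit element loads, lane i from bytes 4*i..4*i+3. */
static void model_vld1_32(const uint8_t *mem, int big_endian, uint32_t lane[2])
{
  lane[0] = load32(mem, big_endian);
  lane[1] = load32(mem + 4, big_endian);
}
```

With bytes 00..07 in memory and big-endian data, the vldm model yields lanes {0x04050607, 0x00010203}, while the vld1.32 model yields {0x00010203, 0x04050607}.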

Re: [AArch64] Optimize GHASH

2021-01-13 Thread Michael Weiser
4_19=y' ; \ echo 'BR2_PACKAGE_GMP=y' ; \ + echo 'BR2_PACKAGE_HOST_GDB=y' ; \ echo 'BR2_PER_PACKAGE_DIRECTORIES=y' ; \ ) > .config && \ make olddefconfig && \ @@ -75,7 +76,7 @@ MAINTAINER Michael Weiser RUN apt-get upda

Re: [AArch64] Optimize GHASH

2021-01-10 Thread Michael Weiser
Hello Maamoun, On Tue, Jan 05, 2021 at 09:04:59PM +0200, Maamoun TK wrote: > Thank you, I will keep you updated about progress of big-endian support for > GHASH on arm64 arch so we can test the patch on real device before sending > it to Niels. I've added aarch64_be buildroot toolchain

Re: [AArch64] Optimize GHASH

2021-01-05 Thread Michael Weiser
Hello Maamoun, On Tue, Jan 05, 2021 at 05:52:35PM +0200, Maamoun TK wrote: > > I've made a new branch "arm64" with the configure changes. If you think > > that looks ok, can you add your new ghash code on top of that? > Great. I'll add the ghash code to the branch once I finish the big-endian >

Re: Release of Nettle-3.7?

2021-01-03 Thread Michael Weiser
; for va in variants: refva = "master" if "no23core" in va: refva = "master-no23core" ref = val[refva][bo][al][op] abs_ = val[va][bo][al][op] rel = abs_ *

Re: Release of Nettle-3.7?

2021-01-01 Thread Michael Weiser
Happy new year, Niels and all around, On Wed, Dec 30, 2020 at 09:12:24PM +0100, Niels Möller wrote: > > It comes out at around seven cycles per block slowdown for chacha-3core > > and five for salsa20-2core. I trace this to vst1.8. It's just slower > Thanks for investigating. Maybe keep some

Re: Release of Nettle-3.7?

2020-12-29 Thread Michael Weiser
t point to some flaw in my benchmarking or system software/hardware? (I've done my best using gdb to verify that the asm routines are in use. Unfortunately, nettle-benchmark is resisting attempts to ltrace or gdb-debug it, so I diagnosed the testsuite tests instead.) -- Thanks, Michael From d5

Re: Release of Nettle-3.7?

2020-12-25 Thread Michael Weiser
X2 - vrev32.u8 X3, X3') - - vstm DST, {X0,X1,X2,X3} + C vst1.8 because caller expects results little-endian +IF_LE(`vstm DST, {X0,X1,X2,X3}') +IF_BE(`vst1.8 {X0,X1}, [DST]! + vst1.8 {X2,X3}, [DST]') bx lr EPILOGUE(_nettle_salsa20_core) >Fr
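The comment in the patch ("caller expects results little-endian") states the byte-level contract that the big-endian code path has to meet. In portable C, the same requirement can be expressed with shifts, independent of host byte order. `store_le32_words` is a hypothetical helper for illustration, not Nettle code.

```c
#include <stddef.h>
#include <stdint.h>

/* Write n 32-bit words to dst as little-endian bytes (least significant
   byte first); the byte sequence is the same on any host byte order. */
static void store_le32_words(uint8_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[4 * i + 0] = (uint8_t)(src[i]);
      dst[4 * i + 1] = (uint8_t)(src[i] >> 8);
      dst[4 * i + 2] = (uint8_t)(src[i] >> 16);
      dst[4 * i + 3] = (uint8_t)(src[i] >> 24);
    }
}
```

For example, the word 0x03020100 is written out as the bytes 00 01 02 03.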

Re: Release of Nettle-3.7?

2020-12-21 Thread Michael Weiser
Hello Niels, On Sat, Dec 19, 2020 at 09:51:45AM +0100, Niels Möller wrote: > > Porting over the basic > > IF_[LB]E mechanism from chacha-core-internal was easy and fixed up the > > first of the three interleaved blocks right away. For the other two I am > > still in the process of wrapping my

Re: Release of Nettle-3.7?

2020-12-18 Thread Michael Weiser
Hi Niels and Maamoun, On Fri, Dec 18, 2020 at 07:18:24PM +0200, Maamoun TK wrote: > > One problem with the current state is that big-endian arm is most likely > > broken. I don't want to delay the release for that though, since I'm not > > able to fix it. If anyone is able to test and fix, soon

Re: Optimizing salsa20

2020-07-21 Thread Michael Weiser
Hello Niels, sorry for the delay - I've been on vacation. On Thu, Jul 09, 2020 at 04:05:21PM +0200, Niels Möller wrote: > This will break support for big-endian ARM for > now, since I'm not able to test that. We still have the ARM BE CI ready to go. Is it maybe time to get it activated on

Re: Add ppc64le arch to Gitlab CI

2020-07-21 Thread Michael Weiser
Hello Niels, On Tue, Jun 23, 2020 at 09:50:20AM +0200, Niels Möller wrote: > > I investigated this issue. The Debian image used for Gitlab CI only > > supports the following archs amd64 mips armhf arm64. To add a new arch, > > this arch should be added to the sources list of apt and install the

Re: Nettle 3.5.1 and OS X 10.12 patch

2020-04-01 Thread Michael Weiser
Hello Niels, On Tue, Mar 31, 2020 at 08:08:35PM +0200, Niels Möller wrote: > > I think a reasonable way is to add > > > > abs_top_builddir = @abs_top_builddir@ > > > > TEST_SHLIB_DIR = "${abs_top_builddir}/.lib" > > > > to config.make.in, and use that to set LD_LIBRARY_PATH. And possibly > >

Re: Nettle 3.5.1 and OS X 10.12 patch

2020-03-31 Thread Michael Weiser
Hi Jeff, On Tue, Mar 31, 2020 at 06:51:37AM -0400, Jeffrey Walton wrote: > > I believe the reason the patch works is, the environment is scrubbed > > before run-tests.sh is run. run-tests.sh then sets DYLD_LIBRARY_PATH > > (and friends). Since the test runner is calling programs outside the > >

Re: Nettle 3.5.1 and OS X 10.12 patch

2020-03-31 Thread Michael Weiser
Hi Jeff, On Tue, Mar 31, 2020 at 05:51:38AM -0400, Jeffrey Walton wrote: > > > > In a quick test on Mojave it appears that any attempt to setenv() a > > > > variable that starts with DYLD_ is silently ignored. Can you confirm > > > Odd. I hope it's still possible to set DYLD_LIBRARY_PATH to a

Re: Nettle 3.5.1 and OS X 10.12 patch

2020-03-31 Thread Michael Weiser
Hello Niels, On Tue, Mar 31, 2020 at 09:27:02AM +0200, Niels Möller wrote: > > In a quick test on Mojave it appears that any attempt to setenv() a > > variable that starts with DYLD_ is silently ignored. Can you confirm > > that? My testcase is: > > > > DYLD_FOO=foo DLYD_FOO=bar bash -c 'echo

Re: Nettle 3.5.1 and OS X 10.12 patch

2020-03-31 Thread Michael Weiser
Hi Jeff, On Sat, Mar 28, 2020 at 01:52:05AM -0400, Jeffrey Walton wrote: > I know Apple did some hardening lately but I have not read anything > about scrubbing LD_LIBRARY_PATH and DYLD_LIBRARY_PATH. But it appears > something was sanitizing the environment. In a quick test on Mojave it appears

Re: [PATCH 1/1] arm: Fix memxor for non-armv6+ big-endian systems

2020-03-12 Thread Michael Weiser
Hello Niels, On Thu, Mar 12, 2020 at 10:01:51PM +0100, Niels Möller wrote: > > ARM assembly adjustments for big-endian systems contained armv6+-only > > instructions (rev) in generic arm memxor code. Replace those with an > > actual conversion of the leftover byte store routines for big-endian >

[PATCH 1/1] arm: Fix memxor for non-armv6+ big-endian systems

2020-03-05 Thread Michael Weiser
as well as increased symmetry between little- and big-endian implementations. Signed-off-by: Michael Weiser --- arm/memxor.asm | 13 +++-- arm/memxor3.asm | 31 ++- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/arm/memxor.asm b/arm/memxor.asm

[PATCH 0/1] Re: Armeb is broken

2020-03-05 Thread Michael Weiser
Hello Niels, > > This is where after a lot of scratching of my thinking cap I got to the > > conclusion that in LE we're actually working with the least significant > > bytes of r4 at the low end of the register. > My understanding of LE here is that the least significant CNT bits > (first in

Re: Armeb is broken

2020-03-04 Thread Michael Weiser
Hello Niels, On Tue, Mar 03, 2020 at 06:57:25PM +0100, Niels Möller wrote: > The correctness in all cases is not that obvious to me now, but the idea > is that we write aligned words, and read aligned words. But since input > and output may have different alignment, src words are shifted around
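The approach Niels describes, with aligned word reads and writes and source words shifted into place when src and dst alignment differ, can be illustrated with a small C sketch. The sketch also shows the point from the [PATCH 0/1] thread that the shift directions depend on byte order. It is not Nettle's memxor. The name `xor_shifted` is hypothetical, and words are assembled explicitly in the chosen byte order so that the sketch behaves the same on any host. The source is passed as an aligned word array plus a byte offset, so no out-of-bounds reads are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes in the given byte order. */
static uint32_t load32(const uint8_t *p, int big_endian)
{
  return big_endian
    ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
    : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}

/* Hypothetical sketch: dst[i] ^= the 32-bit word that starts at byte offset
   4*i + off of the aligned word array src (1 <= off <= 3). Only aligned
   words of src are read; src must hold at least n + 1 words. */
static void xor_shifted(uint32_t *dst, const uint32_t *src, size_t n,
                        unsigned off, int big_endian)
{
  unsigned s = 8 * off;          /* bits of each word that precede the offset */
  uint32_t lo = src[0];

  assert(off >= 1 && off <= 3);  /* off == 0 would need a shift by 32 */
  for (size_t i = 0; i < n; i++)
    {
      uint32_t hi = src[i + 1];
      /* Little-endian: the earliest byte is the least significant, so the
         wanted bytes are at the top of lo and the bottom of hi.
         Big-endian: the earliest byte is the most significant, so the
         shift directions are mirrored. */
      dst[i] ^= big_endian ? (lo << s) | (hi >> (32 - s))
                           : (lo >> s) | (hi << (32 - s));
      lo = hi;
    }
}
```

Swapping the shift directions is the only change needed between the two byte orders. No byte-reversal instruction such as rev is required.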

Re: Armeb is broken

2020-02-24 Thread Michael Weiser
tps://gitlab.com/gnutls/build-images)? There's some qemu bits (specific to Debian's multiarch though) in docker-debian-cross that might be helpful. -- Thanks, Michael From 3e2118d41472842c368bb5bb56d71023b861b59d Mon Sep 17 00:00:00 2001 From: Michael Weiser Date: Sun, 23 Feb 2020 15:22:51 +0100

Re: Armeb is broken

2020-02-23 Thread Michael Weiser
Hi Dmitry, On Sun, Feb 23, 2020 at 07:08:54PM +0300, Dmitry Baryshkov wrote: > If I remember correctly, ARMv5 be was BE-32 Yep. So is the buildroot output I get for arm926 armeb. I just need the BE8 workaround on my BE8 armv7veb system to test the armv5 build. On Sun, Feb 23, 2020 at

Re: Armeb is broken

2020-02-23 Thread Michael Weiser
Hi all, On Sat, Feb 22, 2020 at 07:43:18PM +0100, Michael Weiser wrote: > > 2. Eliminate use of rev in the armbe code. > ... I've been looking at the revs and they now strike me as taking the > easy way out anyway. They work around the implicit LE order in which Updated code is

Re: Armeb is broken

2020-02-22 Thread Michael Weiser
Hi Niels, On Sat, Feb 22, 2020 at 07:58:10AM +0100, Niels Möller wrote: > > I am *using* it on Cubieboard 2's which are armv7. I don't think there's > > ever been an actual hardware support statement. > Question is, what devices are out where, there armv5 support would be > useful? Well, arm

Re: Armeb is broken

2020-02-21 Thread Michael Weiser
Hi Dmitry, On Fri, Feb 21, 2020 at 07:02:54PM +0300, Dmitry Baryshkov wrote: > > The asm for arm certainly needed adjustment to work on armeb. See > > https://lists.lysator.liu.se/pipermail/nettle-bugs/2018/007280.html. > What is the target hardware for armeb? I am *using* it on Cubieboard 2's

Re: Armeb is broken

2020-02-19 Thread Michael Weiser
Hi Niels, On Tue, Feb 18, 2020 at 09:45:01PM +0100, Niels Möller wrote: > > That's exactly the rabbithole I went down and got lost in the last time: > > armeb is so niche that I'm not aware of any ready-to-install mainstream > > distribution for it. I am running Gentoo but that's not easy to

Re: Armeb is broken

2020-02-18 Thread Michael Weiser
Hello Niels, On Tue, Feb 18, 2020 at 12:25:05PM +0100, Niels Möller wrote: > > Let me know if there are any outstanding issues regarding armeb and I'll > > look into them. > To keep it in working shape, it would help a lot with additional tests > in .gitlab-ci. I'm not really familiar with the

Re: Armeb is broken

2020-02-18 Thread Michael Weiser
Hi Андрей, On Tue, Feb 18, 2020 at 02:28:50AM +0300, Андрей Аладьев wrote: > Hello, please see the following gnutls issue > https://gitlab.com/gnutls/gnutls/issues/941. > Nettle today is working on aarch64, aarch64_be and arm, but broken on armeb. Nice to see that someone other than me is

Re: [PATCH 1/3] Add arm endianness-aware assembly infrastructure

2018-03-25 Thread Michael Weiser
Hi Niels, On Sun, Mar 25, 2018 at 11:51:38AM +0200, Niels Möller wrote: > > Introduce m4 macros to conditionally handle differences of little- and > > big-endian arm in assembler code. > This and the next two patches now merged to master-updates for some final > testing. > At the moment, I'm
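The m4 macros described here select assembly code per byte order at build time. A rough C analogue, assuming a GCC- or Clang-compatible compiler (both predefine `__BYTE_ORDER__`), is conditional compilation on that macro. In the sketch below it is cross-checked against a runtime probe and fails with an explicit error when the byte order is unknown. This is an illustration only. The related thread discusses determining endianness in configure with AC_C_BIGENDIAN instead, and `HOST_IS_BIG_ENDIAN` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time selection, similar in spirit to IF_LE/IF_BE in the asm. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define HOST_IS_BIG_ENDIAN 1
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define HOST_IS_BIG_ENDIAN 0
#else
# error "byte order unknown; a configure-time check would be needed"
#endif

/* Runtime probe: inspect the first byte in memory of a known value. */
static int runtime_is_big_endian(void)
{
  uint32_t v = 0x01020304u;
  uint8_t first;
  memcpy(&first, &v, 1);
  return first == 0x01;
}
```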

Re: Miscomputation with big-endian arm asm

2018-02-11 Thread Michael Weiser
ianness. But then I want to have a nice error message so as to not leave the user with an aborted build and no apparent reason. :) Is this portable? The patch got quite large now. Would it be better to make a series out of it? From 67de31a70f8b8076681d6ddd221605365080103f Mon Sep 17 00:00:00 2001 From: Michael Weiser

Re: Miscomputation with big-endian arm asm

2018-02-10 Thread Michael Weiser
ASM_WORDS_BIGENDIAN directly. Also this should make the explicit value checking in IF_BE redundant because we now know for sure configure will never emit anything other than yes and no. Documentation says that AC_C_BIGENDIAN will abort if endianness can't be determined. From db70ecccdc65a97c103f3900b4f45d8370c