Hello Niels,
On Wed, Jan 13, 2021 at 01:43:38PM +0100, Niels Möller wrote:
> > Attached is the new patch that unconditionally switches from vldm to
> > vld1.32 but keeps vstm in favour of vst1.8 on little-endian for stores.
> Thanks! Applied now.
Perfect! Incidentally: The other day I was
ni...@lysator.liu.se (Niels Möller) writes:
> I've done a benchmark run of nettle-3.6 on the GMP "nanot2" system, with
> a Cortex-A9 processor. The installed compiler is gcc-5.4 (a few years
> old).
I chose Cortex-A9 for this test in an attempt to reproduce my old numbers.
Even if it's probably
Michael Weiser writes:
> Attached is the new patch that unconditionally switches from vldm to vld1.32
> but keeps vstm in favour of vst1.8 on little-endian for stores.
Thanks! Applied now.
> From that point of view, the slight performance hit for vld1.32 but
> keeping of vstm on LE seems
Hello Niels,
On Fri, Jan 01, 2021 at 06:07:14PM +0100, Niels Möller wrote:
> > With the help of Jeff I've gone on a bit of a benchmark binge using a:
> >
> > - Raspberry Pi 1B (Broadcom BCM2835, arm11),
> > - Cubieboard2 (Allwinner A20, Cortex-A7),
> > - Wandboard (Freescale i.MX6 DualLite,
I've made a release candidate tarball, see
http://www.lysator.liu.se/~nisse/archive/nettle-3.7rc1.tar.gz
Intend to release in a day or two. I mostly trust the ci system, so I
will do only a few tests on the tarball to try to catch any packaging
mistakes. As usual, any additional testing highly
ni...@lysator.liu.se (Niels Möller) writes:
> Thanks for investigating. So from these charts, it looks like the
> single-block Neon code is of no benefit on any of the test systems. And
> even significantly slower on the tinkerboard and rpi4.
>
> If that's right, the code should probably just be
Michael Weiser writes:
> Happy new year, Niels and all around,
>
> On Wed, Dec 30, 2020 at 09:12:24PM +0100, Niels Möller wrote:
>
>> > It comes out at around seven cycles per block slowdown for chacha-3core
>> > and five for salsa20-2core. I trace this to vst1.8. It's just slower
>> Thanks for
Happy new year, Niels and all around,
On Wed, Dec 30, 2020 at 09:12:24PM +0100, Niels Möller wrote:
> > It comes out at around seven cycles per block slowdown for chacha-3core
> > and five for salsa20-2core. I trace this to vst1.8. It's just slower
> Thanks for investigating. Maybe keep some
On Tue, Dec 29, 2020 at 5:15 PM Michael Weiser wrote:
>
> ...
> Do you (or anybody else) have a hardware arm board for testing, possibly
> with a Cortex A8 or A9 implementation to see how it behaves there?
I've got a Wandboard/Cortex-A9 and Tinkerboard/Cortex-A17 hanging off
the internet with
Michael Weiser writes:
> It comes out at around seven cycles per block slowdown for chacha-3core
> and five for salsa20-2core. I trace this to vst1.8. It's just slower
> than vstm (in contrast to vldm vs. vld1.32). I managed to save a
> cumulative two cycles by rescheduling instructions so that
Hello Niels,
On Fri, Dec 25, 2020 at 10:48:19PM +0100, Niels Möller wrote:
> Since we have plenty of registers available, (including r3 which seems
> unused and free to clobber), I'd suggest using
>   define(`SRCp32', `r3')
> and an
>   add SRCp32, SRC, #32
> in function entry, and then
Since there are many variants of architectures, some are supported and
others could be supported in the future, it becomes a little annoying for
end-users to browse the configurable options and enable specific options to
get maximum speed for corresponding algorithms so here --enable-fat comes
in
ni...@lysator.liu.se (Niels Möller) writes:
> Hi, I wonder if it would make sense to try to cut a release pretty soon
> (and without any arm64 changes)? Previous release was made end of April,
> and there's been quite a few improvements since then.
I've pushed a couple of changes to increase
Michael Weiser writes:
> Longer story for completeness: It seems I ran afoul of gdb's way of
> displaying registers in memory endianness again. I knew all this once
> already.[1] I should likely do this more often than every couple of
> years. ;)
I'm always confused by the conventions for ordering
Hello Niels,
On Mon, Dec 21, 2020 at 09:16:25PM +0100, Niels Möller wrote:
> What's the layout before the transpose, immediately after load? I'd
> guess you get X1: 1 0 3 2?
TL;DR: Yes, it is. I abandoned this approach for now though, since I
found some options to eliminate the word
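
The "1 0 3 2" layout discussed above can be reproduced with a small model. This is only a sketch under stated assumptions, not a description of the actual Neon code: it assumes a load that reads memory as 64-bit elements in the machine's byte order and then views each element as two 32-bit lanes with the low half as lane 0. The function names are hypothetical.

```python
# Simplified model (an assumption, not the real instruction semantics):
# a "64-bit element" load reads 8 bytes in the machine's byte order and
# exposes the low 32 bits as lane 0 and the high 32 bits as lane 1.

def words_to_memory(words, byteorder):
    """Store 32-bit words to memory in the given byte order."""
    return b"".join(w.to_bytes(4, byteorder) for w in words)

def lanes_from_64bit_load(mem, byteorder):
    """Load 64-bit elements, then split each into two 32-bit lanes."""
    lanes = []
    for off in range(0, len(mem), 8):
        d = int.from_bytes(mem[off:off + 8], byteorder)
        lanes.append(d & 0xFFFFFFFF)  # lane 0: low half
        lanes.append(d >> 32)         # lane 1: high half
    return lanes

def lanes_from_32bit_load(mem, byteorder):
    """Load 32-bit elements directly; lane order follows memory order."""
    return [int.from_bytes(mem[off:off + 4], byteorder)
            for off in range(0, len(mem), 4)]

if __name__ == "__main__":
    for order in ("little", "big"):
        mem = words_to_memory([0, 1, 2, 3], order)
        print(order, lanes_from_64bit_load(mem, order),
              lanes_from_32bit_load(mem, order))
```

In this model the 32-bit element load gives the same lane order for both byte orders, while the 64-bit element load swaps each pair of words on big-endian only.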
Michael Weiser writes:
> See the attached patch for my current approach to fixing it, which is
> explicit transposing, adding and then transposing again to be as
> transposed as the other operands.
I haven't yet read the code, but I have some comments based on your
description only.
> I
Hello Niels,
On Sat, Dec 19, 2020 at 09:51:45AM +0100, Niels Möller wrote:
> > Porting over the basic
> > IF_[LB]E mechanism from chacha-core-internal was easy and fixed up the
> > first of the three interleaved blocks right away. For the other two I am
> > still in the process of wrapping my
Jeffrey Walton writes:
> Also see
> https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html.
It's not entirely clear to me how libtool versions map to soname, but
from looking at GMP, I guess the number embedded in the soname is
current - age. So for gmp-6.1.2, the
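
The "current - age" guess can be worked through with a short sketch. It assumes the usual libtool convention on Linux (soname major = current - age, file suffix major.age.revision); other platforms differ. The library name "example" and the version triple 8:1:2 are made up for illustration and are not taken from GMP or Nettle.

```python
# Sketch of libtool's current:revision:age mapping, assuming the Linux
# convention. All names and numbers here are hypothetical.

def libtool_linux_names(name, current, revision, age):
    """Return (soname, filename) for a libtool version triple."""
    if not 0 <= age <= current:
        raise ValueError("age must be between 0 and current")
    major = current - age
    soname = f"lib{name}.so.{major}"
    return soname, f"{soname}.{age}.{revision}"

def add_interfaces(current, revision, age):
    """Interfaces added, none removed: current+1, revision=0, age+1."""
    return current + 1, 0, age + 1

def break_interfaces(current, revision, age):
    """Interfaces removed or changed: current+1, revision=0, age=0."""
    return current + 1, 0, 0

if __name__ == "__main__":
    version = (8, 1, 2)
    print(libtool_linux_names("example", *version))
    print(libtool_linux_names("example", *add_interfaces(*version)))
    print(libtool_linux_names("example", *break_interfaces(*version)))
```

Adding functions increments current and age together, so their difference, and hence the soname, stays the same. Only an incompatible change resets age and moves the soname.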
On 2020-12-19 Niels Möller wrote:
> Amos Jeffries writes:
> > I would have thought this needs a soname bump. Otherwise software built
> > to use bcrypt might try to link to the old version with same soname.
> My understanding is that one usually doesn't bump the soname when adding
> new
On Sat, Dec 19, 2020 at 4:44 AM Niels Möller wrote:
>
> Amos Jeffries writes:
>
> > I would have thought this needs a soname bump. Otherwise software built
> > to use bcrypt might try to link to the old version with same soname.
>
> My understanding is that one usually doesn't bump the soname
Amos Jeffries writes:
> I would have thought this needs a soname bump. Otherwise software built
> to use bcrypt might try to link to the old version with same soname.
My understanding is that one usually doesn't bump the soname when adding
new functions.
I was trying to look at how it has been
On 19/12/20 5:29 am, Niels Möller wrote:
Andreas Metzler writes:
it would not count as transition
https://release.debian.org/bullseye/freeze_policy.html#transition
...
* Support for bcrypt, contributed by Stephen R. van den Berg.
I would have thought this needs a soname bump.
Michael Weiser writes:
> Porting over the basic
> IF_[LB]E mechanism from chacha-core-internal was easy and fixed up the
> first of the three interleaved blocks right away. For the other two I am
> still in the process of wrapping my head around how the interleaving
> works and how it would need
Hi Michael,
On Fri, Dec 18, 2020 at 8:00 PM Michael Weiser
wrote:
> qemu-user works nicely for aarch64_be. I used it to semi-natively
> compile a whole aarch64 userland. I could dust off pine64 board that is
> running that userland now for real-world testing if you like.
>
Thank you, I will
Hi Niels and Maamoun,
On Fri, Dec 18, 2020 at 07:18:24PM +0200, Maamoun TK wrote:
> > One problem with the current state is that big-endian arm is most likely
> > broken. I don't want to delay the release for that though, since I'm not
> > able to fix it. If anyone is able to test and fix, soon
>
> I think the powerpc64 code is in good shape now, and ready for release. Are
> you aware of anything that needs fixing?
>
No, all I can think of are a couple of issues that make the powerpc64
code more stable (you can check their merge requests in the repo).
> One problem with the
.
Regards,
/Niels
NEWS for the Nettle 3.7 release
This release adds one new feature, the bcrypt password hashing
function, and lots of optimizations. The release adds
PowerPC64 assembly for a few algorithms, resulting in great
speedups. Benchmarked on a Power9 machine, spee
On 2020-12-15 Niels Möller wrote:
> Hi, I wonder if it would make sense to try to cut a release pretty soon
> (and without any arm64 changes)? Previous release was made end of April,
> and there's been quite a few improvements since then.
> I wonder if it is possible to make a release in time for
Hi, I wonder if it would make sense to try to cut a release pretty soon
(and without any arm64 changes)? Previous release was made end of April,
and there's been quite a few improvements since then.
I wonder if it is possible to make a release in time for the upcoming
debian release?