Re: x86 sha_ni

2018-03-12 Thread Niels Möller
Jeffrey Walton  writes:

> On Mon, Mar 12, 2018 at 2:40 PM, Niels Möller  wrote:
>> ni...@lysator.liu.se (Niels Möller) writes:
>> ...
>>
>> Now wired up for fat builds, changes pushed to the same branch.
>
> Looks good on a Celeron J3455 (https://www.amazon.com/dp/B01LYCDG4H):
>
> Without --enable-fat
>
>            md2   update    6.88
>            md4   update  570.47
>            md5   update  383.59
>    openssl md5   update  444.94
>           sha1   update  238.53
>   openssl sha1   update 1323.53
>         sha224   update  110.07
>         sha256   update  110.25
>         sha384   update  173.90
>         sha512   update  174.35
>     sha512-224   update  174.30
>     sha512-256   update  174.08
>
> With --enable-fat
>
>            md2   update    6.89
>            md4   update  569.68
>            md5   update  382.82
>    openssl md5   update  444.76
>           sha1   update 1192.25
>   openssl sha1   update 1324.47
>         sha224   update  494.33
>         sha256   update  495.22
>         sha384   update  173.87
>         sha512   update  174.33

So you get 5 times speedup of sha1 and 4.5 times for sha256. Nice!

On gcc67 (AMD Ryzen 5 2400G), I measure 3 times and 4.8 times speedup,
respectively.
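
The Celeron ratios quoted above can be rechecked from the benchmark figures. This is only a small arithmetic sketch; the dictionary and function names are made up for illustration, and the numbers are copied from the two runs in the quoted message.

```python
# Throughput figures copied from the Celeron J3455 runs quoted above
# (nettle-benchmark "update" rows; a larger number means faster hashing).
without_fat = {"sha1": 238.53, "sha256": 110.25}
with_fat = {"sha1": 1192.25, "sha256": 495.22}


def speedup(after, before):
    """Return how many times faster `after` is than `before`."""
    return after / before


# One ratio per algorithm: fat build throughput over non-fat throughput.
ratios = {name: speedup(with_fat[name], without_fat[name]) for name in without_fat}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The same helper can be applied to the Ryzen figures once the raw throughput numbers are available; they are not given in this thread.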

Now, I think there are opportunities for improvements also for sha1 and
sha256 without sha_ni, but that's a more difficult project, to carefully
take data dependencies into account, and deal with hard-to-predict x86
scheduling.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs


Re: x86 sha_ni

2018-03-12 Thread Jeffrey Walton
On Mon, Mar 12, 2018 at 4:23 PM, Amos Jeffries  wrote:
> On 13/03/18 08:44, Jeffrey Walton wrote:
>> Check /proc/cpuinfo for the sha_ni flag. If present, then you can test
>> the SHA extensions.
>>
>> SHA extensions made their debut in Goldmont. They are also available
>> in Goldmont+. They were scheduled for one of the lakes but they did
>> not make it in.
>>
>> I have a Goldmont machine for testing SHA but it is a turd. It is a
>> Celeron J3455 (https://www.amazon.com/dp/B01LYCDG4H).
>
> Ah, okay. That Goldmont info matches my /proc/cpuinfo.
> No "sha_ni" listed :-(, just the aes-ni instruction set.

Yeah, if I recall correctly, SHA was supposed to be in Kaby Lake. It
looks like it slipped, and SHA was added to the non-turd machines at
Cannon Lake. Also see
https://en.wikipedia.org/wiki/Cannon_Lake_(microarchitecture)
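
A minimal sketch of the /proc/cpuinfo check described above, assuming a Linux x86 system where the kernel reports features on lines starting with "flags". The function names are hypothetical and not part of nettle.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "flags : ..." lines carry the feature list on x86 Linux.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_sha_ni(cpuinfo_text):
    """True if the sha_ni flag appears among the reported CPU flags."""
    return "sha_ni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("sha_ni present" if has_sha_ni(text) else "sha_ni not reported")
```

Matching whole flag names rather than substrings avoids false positives from unrelated flags that merely contain "sha".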

Jeff


Re: x86 sha_ni

2018-03-12 Thread Jeffrey Walton
On Mon, Mar 12, 2018 at 2:40 PM, Niels Möller  wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
> ...
>
> Now wired up for fat builds, changes pushed to the same branch.

Looks good on a Celeron J3455 (https://www.amazon.com/dp/B01LYCDG4H):

Without --enable-fat

           md2   update    6.88
           md4   update  570.47
           md5   update  383.59
   openssl md5   update  444.94
          sha1   update  238.53
  openssl sha1   update 1323.53
        sha224   update  110.07
        sha256   update  110.25
        sha384   update  173.90
        sha512   update  174.35
    sha512-224   update  174.30
    sha512-256   update  174.08

With --enable-fat

           md2   update    6.89
           md4   update  569.68
           md5   update  382.82
   openssl md5   update  444.76
          sha1   update 1192.25
  openssl sha1   update 1324.47
        sha224   update  494.33
        sha256   update  495.22
        sha384   update  173.87
        sha512   update  174.33

Jeff


Re: x86 sha_ni

2018-03-12 Thread Amos Jeffries
On 13/03/18 08:44, Jeffrey Walton wrote:
> Check /proc/cpuinfo for the sha_ni flag. If present, then you can test
> the SHA extensions.
>
> SHA extensions made their debut in Goldmont. They are also available
> in Goldmont+. They were scheduled for one of the lakes but they did
> not make it in.
>
> I have a Goldmont machine for testing SHA but it is a turd. It is a
> Celeron J3455 (https://www.amazon.com/dp/B01LYCDG4H).
>
> Jeff
>

Ah, okay. That Goldmont info matches my /proc/cpuinfo.
No "sha_ni" listed :-(, just the aes-ni instruction set.

AYJ


Re: x86 sha_ni

2018-03-12 Thread Niels Möller
Amos Jeffries  writes:

> Is there anything you would like in the way of tests or benchmarking
> done with this hardware and environment?
> Just let me know what build and/or test commands you want run, and on
> which git branch.

It would be nice if you could verify the code on branch
x86_64-sha_ni-sha256. Build with and without --enable-fat (and if you
don't want to mess with setting LD_LIBRARY_PATH=.lib, I'd recommend also
using --disable-shared).

Run make check and 

NETTLE_FAT_VERBOSE=1 ./examples/nettle-benchmark

and see if the results look right (NETTLE_FAT_VERBOSE naturally has an
effect only in fat builds).

If you like, also compare the performance with the nettle-3.4 release.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


Re: x86 sha_ni

2018-03-12 Thread Amos Jeffries
On 13/03/18 07:40, Niels Möller wrote:
> nisse (Niels Möller) writes:
> 
>> nisse (Niels Möller) writes:
>>
>>> I've been trying out the sha_ni instructions available on some newer
>>> x86_64 processors.
>>
>> And now that the gcc67 machine is up again, I got my sha256
>> implementation working too. Pushed to branch x86_64-sha_ni-sha256.
>>
>> Not yet wired up in fat builds, but can be tested with
>> --enable-x86-sha-ni to configure.
> 
> Now wired up for fat builds, changes pushed to the same branch.
> 
> Regards,
> /Niels
> 

I have a new machine with Intel KabyLake CPU + GPU which apparently has
AES and related crypto support available. Running Debian sid with GCC-6,
7, and 8 all available.

Is there anything you would like in the way of tests or benchmarking
done with this hardware and environment?
Just let me know what build and/or test commands you want run, and on
which git branch.

AYJ


Re: x86 sha_ni

2018-03-12 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>> I've been trying out the sha_ni instructions available on some newer
>> x86_64 processors.
>
> And now that the gcc67 machine is up again, I got my sha256
> implementation working too. Pushed to branch x86_64-sha_ni-sha256.
>
> Not yet wired up in fat builds, but can be tested with
> --enable-x86-sha-ni to configure.

Now wired up for fat builds, changes pushed to the same branch.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


Re: x86 sha_ni

2018-02-21 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> I've been trying out the sha_ni instructions available on some newer
> x86_64 processors.

And now that the gcc67 machine is up again, I got my sha256
implementation working too. Pushed to branch x86_64-sha_ni-sha256.

Not yet wired up in fat builds, but can be tested with
--enable-x86-sha-ni to configure.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


Re: x86 sha_ni

2018-02-08 Thread Jeffrey Walton
On Thu, Feb 8, 2018 at 5:15 PM, Niels Möller  wrote:
> Jeffrey Walton  writes:
>
>> Looks good on a Celeron J3455, which is a [low-end] Goldmont machine
>> with the instructions:
>
> [...]
>
>> goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
>> sha1_compress: 84.60 cycles
>
> 85 cycles is a lot less than the 136 cycles I observed in my testing.
> The function is 131 instructions long, so it's approximately 1.5
> instructions per cycle.
>
>>   sha1   update 1194.33
>>   openssl sha1   update 1321.71
>
> And this is an 11% difference (compared to 8% in my benchmarks). That
> makes sense: if the main crunching takes fewer cycles, the per-block
> function call overhead is relatively larger.

I think this might be explained by root access. I can put the Celeron
in performance mode. Using
https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh
(based on a script by Andy Polyakov):

$ sudo ./governor.sh perf
Current CPU governor scaling settings:
  CPU 0: powersave
  CPU 1: powersave
  CPU 2: powersave
  CPU 3: powersave
New CPU governor scaling settings:
  CPU 0: performance
  CPU 1: performance
  CPU 2: performance
  CPU 3: performance

The benchmarks are then performed using the new governor scaling,
which I believe is max freq.
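
To confirm which governor is active before benchmarking, the settings can also be read straight from sysfs. The sketch below is not the governor.sh script mentioned above; it assumes the standard Linux cpufreq layout under /sys/devices/system/cpu, and the function name is made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # .../cpuN/cpufreq/scaling_governor
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    # Prints nothing on systems without cpufreq support.
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

Reading needs no root access; only changing the governor does.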

>> A small suggestion may be to update Section 8 Installation
>> (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not
>> obvious to me how to enable the hardware acceleration.
>
> There's an --enable-x86-aesni configure option which should enable the
> aesni code unconditionally in non-fat builds. And an --enable-arm-neon.
> But it seems I forgot to add a corresponding --enable-x86-sha-ni.
>
> But --enable-fat is the most common way to enable the support. I'm
> considering enabling it by default in the next release.

+1.

Jeff


Re: x86 sha_ni

2018-02-08 Thread Niels Möller
Jeffrey Walton  writes:

> Looks good on a Celeron J3455, which is a [low-end] Goldmont machine
> with the instructions:

[...]

> goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
> sha1_compress: 84.60 cycles

85 cycles is a lot less than the 136 cycles I observed in my testing.
The function is 131 instructions long, so it's approximately 1.5
instructions per cycle.
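
The estimate above, spelled out as a small arithmetic sketch. It assumes that each of the 131 instructions executes exactly once per sha1_compress call and that one call handles one 64-byte SHA-1 block; the helper names are hypothetical.

```python
SHA1_BLOCK_BYTES = 64  # SHA-1 consumes its input in 64-byte blocks


def instructions_per_cycle(instructions, cycles):
    """Average number of instructions retired per cycle."""
    return instructions / cycles


def cycles_per_byte(cycles_per_block, block_bytes=SHA1_BLOCK_BYTES):
    """Convert a per-block cycle count into cycles per input byte."""
    return cycles_per_block / block_bytes


ipc = instructions_per_cycle(131, 84.60)  # Celeron J3455 measurement
cpb_at_85 = cycles_per_byte(84.60)        # same measurement, per byte
cpb_at_136 = cycles_per_byte(136)         # the 136-cycle measurement, per byte

print(f"IPC: {ipc:.2f}")
print(f"cycles/byte at 84.60 cycles/block: {cpb_at_85:.2f}")
print(f"cycles/byte at 136 cycles/block:   {cpb_at_136:.2f}")
```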

>   sha1   update 1194.33
>   openssl sha1   update 1321.71

And this is an 11% difference (compared to 8% in my benchmarks). That
makes sense: if the main crunching takes fewer cycles, the per-block
function call overhead is relatively larger.
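
A rough illustration of that reasoning. The measured gap is computed from the two figures quoted above; the fixed per-block overhead of 10 cycles is purely an assumed value chosen to show the trend, not something measured in this thread.

```python
def relative_gap(faster, slower):
    """Fractional throughput advantage of `faster` over `slower`."""
    return faster / slower - 1.0


# openssl sha1 versus nettle sha1 on the Celeron J3455, as quoted above.
measured_gap = relative_gap(1321.71, 1194.33)


def overhead_share(overhead_cycles, compress_cycles):
    """Fixed per-block overhead expressed relative to the compression cost."""
    return overhead_cycles / compress_cycles


HYPOTHETICAL_OVERHEAD = 10.0  # cycles per block; assumed, not measured

share_at_136 = overhead_share(HYPOTHETICAL_OVERHEAD, 136.0)
share_at_85 = overhead_share(HYPOTHETICAL_OVERHEAD, 84.60)

print(f"measured gap: {measured_gap:.1%}")
print(f"overhead share at 136 cycles/block:   {share_at_136:.1%}")
print(f"overhead share at 84.60 cycles/block: {share_at_85:.1%}")
```

With the same fixed overhead, the share grows as the compression function gets faster, which is the direction of the 8% versus 11% difference.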

> A small suggestion may be to update Section 8 Installation
> (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not
> obvious to me how to enable the hardware acceleration.

There's an --enable-x86-aesni configure option which should enable the
aesni code unconditionally in non-fat builds. And an --enable-arm-neon.
But it seems I forgot to add a corresponding --enable-x86-sha-ni.

But --enable-fat is the most common way to enable the support. I'm
considering enabling it by default in the next release.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


Re: x86 sha_ni

2018-02-08 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

> The replacement for sha1-compress.asm below seems to run at roughly 2
> cycles/byte when I benchmark it on an "AMD Ryzen 7 1700X" cpu in the gcc
> compile farm. It is still slightly slower than openssl; to squeeze out a
> few more cycles, it might help to change the sha1_compress interface to
> let it process more than one 64-byte block at a time.
>
> I hope to be able to wire it up via fat-x86_64.c reasonably soon. In the
> mean time, if anyone wants to try it out, just change the
> sha1-compress.asm symlink to point to this file.

Enabled via fat-x86_64 now, and pushed to a branch named
x86_64-sha_ni-sha1.

I intend to merge to master soon.

Testing and benchmarking appreciated.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.