Re: AES-NI: slower than aes-generic?

2016-06-08 Thread Stephan Mueller
Am Donnerstag, 26. Mai 2016, 22:14:29 schrieb Theodore Ts'o:

Hi Theodore,

> On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
> > Using the kernel crypto API one can relieve the CPU of the crypto work, if
> > a hardware or assembler implementation is available. This may be of
> > particular interest for smaller systems. So, for smaller systems (where
> > kernel bloat is bad, but where now these days more and more hardware
> > crypto support is added), we must weigh the kernel bloat (of 3 or 4
> > additional C files for the basic kernel crypto API logic) against
> > relieving the CPU of work.
> 
> There are a number of caveats with using hardware acceleration; one is
> that many hardware accelerators are optimized for bulk data
> encryption, and so key scheduling, or switching between key schedules,
> can have a higher overhead than a pure software implementation.

As a follow-up: I tweaked the DRBG side a bit to use the full speed of the
AES-NI implementation. With that tweak, the initial finding no longer applies.

Depending on the request size, I now get more than 800 MB/s (an increase of
more than 450% compared to the initial implementation) from the AES-NI
implementation. Hence, the frequent key schedule update does not seem to be
much of an issue.

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AES-NI: slower than aes-generic?

2016-05-29 Thread Theodore Ts'o
On Sun, May 29, 2016 at 09:51:59PM +0200, Stephan Mueller wrote:
> 
> I personally am not sure that taking some arbitrary cipher and turning it
> into a DRNG by simply using a self-feeding loop based on the ideas of X9.31
> Appendix A2.4 is good. Chacha20 is a good cipher, but is it equally good for
> a DRNG? I do not know. There are too few assessments from mathematicians out
> there regarding that topic.

If ChaCha20 is a good (stream) cipher, it must be a good DRNG by
definition.  In other words, if you can predict the output of a
ChaCha20-based DRNG with any accuracy greater than chance, this can be
used as a wedge to attack the stream cipher.

I will note that OpenBSD's "ARC4" random number generator is currently
using ChaCha20, BTW.

Regards,

- Ted


Re: AES-NI: slower than aes-generic?

2016-05-29 Thread Stephan Mueller
Am Samstag, 28. Mai 2016, 07:28:25 schrieb Aaron Zauner:

Hi Aaron,

> Heya,
> 
> > On 27 May 2016, at 01:49, Stephan Mueller  wrote:
> > Then, the use of the DRBG offers users to choose between a Hash/HMAC and
> > CTR implementation to suit their needs. The DRBG code is agnostic of the
> > underlying cipher. So, you could even use Blowfish instead of AES or
> > whirlpool instead of SHA -- these changes are just one entry in
> > drbg_cores[] away without any code change.
> 
> That's a really nice change and something I've been thinking about for a
> couple of months as well. Then I came across tytso's ChaCha patches to
> urandom and was thinking ISA dependent switches between ciphers would make
> sense, i.e. you get AESNI performance when there's support.
> > Finally, the LRNG code is completely agnostic of the underlying
> > deterministic RNG. You only need a replacement of two small functions to
> > invoke the seeding and generate operation of a DRNG. So, if one wants a
> > Chacha20, he can have it. If one wants X9.31, he can have it. See section
> > 2.8.3 [1] -- note, that DRNG does not even need to be a kernel crypto API
> > registered implementation.
> It's valid criticism that the number of algorithms should be limited.
> Algorithmic agility is an issue and has caused many real-world security
> problems in protocols; liberally allowing crypto primitives to be chosen by
> the user isn't a good idea. We should think about algorithms that make
> sense. E.g. TLS 1.3 and HTTP/2 have been moving in this direction. TLS
> 1.3 will only allow a couple of cipher suites, as opposed to the
> combinatorial explosion in previous iterations of the protocol.

I cannot agree more with you, if the attacker can choose the algo. However, I
would think that a compile-time selection of one specific algo is not prone to
this issue. Also, the code of the LRNG provides a pre-defined set of DRBGs
that should leave nothing to be desired. Hence, I doubt that many folks
would change the code here.

Though, if folks really want to, they have the option to do so.
> 
> I'd suggest sticking to AES-CTR and ChaCha20 for DRNG designs. That should
> fit almost all platforms with great performance, keep code-base small etc.

Regarding the CTR DRBG: I did not make it the default for two reasons:

- it was not the fastest. However, I just found a drag on the CTR DRBG
performance, and I want to push the fix upstream after the merge window
closes. With that patch the CTR DRBG is now the fastest by orders of
magnitude, so this issue no longer applies.

- the DF/BCC function in the DRBG is critical, as I think it loses entropy.
When you seed the DRBG with, say, 256 or 384 bits of data, the BCC acts
akin to a MAC by taking the 256 or 384 bits and collapsing them into one AES
block of 128 bits. Then the DF function expands this one block into the DRBG
internal state (including the AES key) of 256 / 384 bits, depending on the
type of AES you use. So, if you have 256 bits of entropy in the seed, you have
128 bits left after the BCC operation.
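
The bottleneck argument above can be written down as simple arithmetic. This
is only a sketch of the reasoning as stated in this mail, assuming the seed
really is funnelled through a single 128-bit block before being expanded. It
does not implement the SP800-90A derivation function, and the function name
and stage widths are made up for illustration.

```python
def entropy_upper_bound(seed_entropy_bits, stage_widths_bits):
    """Upper bound on entropy after passing through a chain of states.

    Each stage can carry at most as many bits of entropy as it is wide,
    and a later, wider stage cannot add back what an earlier stage lost.
    """
    bound = seed_entropy_bits
    for width in stage_widths_bits:
        bound = min(bound, width)
    return bound


# Assumed pipeline from the paragraph above:
# seed -> one 128-bit AES block (BCC) -> 256- or 384-bit internal state (DF)
after_df_256 = entropy_upper_bound(256, [128, 256])
after_df_384 = entropy_upper_bound(384, [128, 384])

print("256-bit seed:", after_df_256, "bits at most")
print("384-bit seed:", after_df_384, "bits at most")
```

Under that assumption both cases end up bounded by 128 bits, which is the
concern raised here for AES key sizes above 128 bits.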

Given that criticism, I question whether the CTR DRBG with AES key sizes
above 128 bits should be the default. Also, the CTR DRBG is the most complex
of all three DRBGs (the HMAC DRBG, the current default, is the leanest and
cleanest).

But if folks think that the CTR DRBG should be made the default, I would 
listen and make it so.

> 
> There's now heavily optimised assembly in OpenSSL for ChaCha20 if you want
> to take a look:
> https://github.com/openssl/openssl/tree/master/crypto/chacha/asm But as
> mentioned in the ChaCha/urandom thread: architecture specific optimisation
> may be painful and error-prone.

I personally am not sure that taking some arbitrary cipher and turning it into
a DRNG by simply using a self-feeding loop based on the ideas of X9.31
Appendix A2.4 is good. Chacha20 is a good cipher, but is it equally good for a
DRNG? I do not know. There are too few assessments from mathematicians out
there regarding that topic.

Hence, I would rather stick to DRNG designs that have been analyzed by
different folks.

> > Bottom line, I want to give folks a full flexibility. That said, the LRNG
> > code is more of a logic to collect entropy and maintain two DRNG types
> > which are seeded according to a defined schedule than it is a tightly
> > integrated RNG.
> > 
> > Also, I am not so sure that simply taking a cipher, sprinkling some
> > backtracking logic on it implies you have a good DRNG. As of now, I have
> > not seen assessments from others for the Chacha20 DRNG approach. I
> > personally would think that the Chacha20 approach from Ted is good. Yet
> > others may have a more conservative approach of using a DRNG
> > implementation that has been reviewed by a lot of folks.
> > 
> > [1] http://www.chronox.de/lrng/doc/lrng.pdf
> 
> Currently reading that paper, it seems like a solid approach.

There was criticism on the entropy maintenance. I have now reverted 

Re: AES-NI: slower than aes-generic?

2016-05-27 Thread Aaron Zauner
Heya,

> On 27 May 2016, at 01:49, Stephan Mueller  wrote:
> Then, the use of the DRBG offers users to choose between a Hash/HMAC and CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or whirlpool
> instead of SHA -- these changes are just one entry in drbg_cores[] away
> without any code change.

That's a really nice change and something I've been thinking about for a couple 
of months as well. Then I came across tytso's ChaCha patches to urandom and was 
thinking ISA dependent switches between ciphers would make sense, i.e. you get 
AESNI performance when there's support.

> Finally, the LRNG code is completely agnostic of the underlying deterministic
> RNG. You only need a replacement of two small functions to invoke the seeding
> and generate operation of a DRNG. So, if one wants a Chacha20, he can have it.
> If one wants X9.31, he can have it. See section 2.8.3 [1] -- note, that DRNG
> does not even need to be a kernel crypto API registered implementation.

It's valid criticism that the number of algorithms should be limited.
Algorithmic agility is an issue and has caused many real-world security
problems in protocols; liberally allowing crypto primitives to be chosen by the
user isn't a good idea. We should think about algorithms that make sense. E.g.
TLS 1.3 and HTTP/2 have been moving in this direction. TLS 1.3 will only
allow a couple of cipher suites, as opposed to the combinatorial explosion in
previous iterations of the protocol.

I'd suggest sticking to AES-CTR and ChaCha20 for DRNG designs. That should fit 
almost all platforms with great performance, keep code-base small etc.

There's now heavily optimised assembly in OpenSSL for ChaCha20 if you want to 
take a look: https://github.com/openssl/openssl/tree/master/crypto/chacha/asm
But as mentioned in the ChaCha/urandom thread: architecture specific 
optimisation may be painful and error-prone.

> Bottom line, I want to give folks a full flexibility. That said, the LRNG code
> is more of a logic to collect entropy and maintain two DRNG types which are
> seeded according to a defined schedule than it is a tightly integrated RNG.
> 
> Also, I am not so sure that simply taking a cipher, sprinkling some
> backtracking logic on it implies you have a good DRNG. As of now, I have not
> seen assessments from others for the Chacha20 DRNG approach. I personally
> would think that the Chacha20 approach from Ted is good. Yet others may have a
> more conservative approach of using a DRNG implementation that has been
> reviewed by a lot of folks.
> 
> [1] http://www.chronox.de/lrng/doc/lrng.pdf

Currently reading that paper, it seems like a solid approach.

I don't like the approach that user-space programs may modify entropy. It's a
myth that `haveged` etc. provide more security, and EGDs have barely been
audited, are usually written as academic work, and have been left completely
unmaintained. I regularly end up in randomness[sic!] discussions with core
language maintainers [0] [1] - they seem to have little understanding of what's
going on in the kernel and either use /dev/random as a seed or a userspace RNG
(most of which aren't particularly safe to begin with -- OpenSSL is not fork
safe [2] [3], a recent paper found weaknesses in the OpenSSL RNG at low entropy
state leaking secrets [4], et cetera). This seems to be mostly the case because
of the infamous `random(4)` man-page, with end-users (protocol implementers,
core language designers, ...) always pointing to upstream, which - of course -
is the Linux kernel.

I can't really tell from the paper if /dev/random would still be blocking in 
some cases? If so that's unfortunate.

Thanks for your work on this,
Aaron

[0] https://bugs.ruby-lang.org/issues/9569
[1] https://github.com/nodejs/node/issues/5798
[2] https://emboss.github.io/blog/2013/08/21/openssl-prng-is-not-really-fork-safe/
[3] https://wiki.openssl.org/index.php/Random_fork-safety
[4] https://eprint.iacr.org/2016/367.pdf




Re: AES-NI: slower than aes-generic?

2016-05-27 Thread Jeffrey Walton
> If we implement something which happens to result in a 2 minute stall
> in boot times, the danger is that a clueless engineer at Sony, or LGE,
> or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
> telling anyone about what they did, and we might not notice right away
> that the fix was in fact catastrophically bad.

This is a non-trivial threat. +1 for recognizing it.

I know of one VM hypervisor used in US Financial that was effectively
doing "One thing you should not do is the following..." from
http://lwn.net/Articles/525459/.

Jeff


Re: AES-NI: slower than aes-generic?

2016-05-27 Thread Stephan Mueller
Am Donnerstag, 26. Mai 2016, 22:14:29 schrieb Theodore Ts'o:

Hi Theodore,

> On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
> > Using the kernel crypto API one can relieve the CPU of the crypto work, if
> > a hardware or assembler implementation is available. This may be of
> > particular interest for smaller systems. So, for smaller systems (where
> > kernel bloat is bad, but where now these days more and more hardware
> > crypto support is added), we must weigh the kernel bloat (of 3 or 4
> > additional C files for the basic kernel crypto API logic) against
> > relieving the CPU of work.
> 
> There are a number of caveats with using hardware acceleration; one is
> that many hardware accelerators are optimized for bulk data
> encryption, and so key scheduling, or switching between key schedules,
> can have a higher overhead than a pure software implementation.

Squeezing the last drop of speed out of the ciphers for the LRNG is not my
priority given that the speed is limited by the reseeding. The LRNG should
allow the CPU to offload the crypto work. For small systems, crypto is
intensive work, and those CPU cycles could be spent elsewhere.
> 
> There have also been situations where the hardware crypto engine is
> actually slower than a highly optimized software implementation.  This
> has been the case for certain ARM SOC's, for example.

And I would be fine with that. Besides, if a user wants to use software 
implementations with the LRNG and still offer HW support for all else, all 
they need to do is to statically compile the software implementation and 
compile the hardware support as a module. As the LRNG initializes before 
kernel modules can be loaded, it can only use what it finds in the static 
kernel.
> 
> This is not that big of deal, if you are developing a cryptographic
> application (such as file system level encryption, for example) for a
> specific hardware platform (such as a specific Nexus device).  But if
> you are trying to develop a generic service that has to work on a wide
> variety of CPU architectures, and specific CPU/SOC implementations,
> life is a lot more complicated.  I've worked on both problems, let me
> assure you the second is way tricker than the first.
> 
> > Then, the use of the DRBG offers users to choose between a Hash/HMAC and
> > CTR implementation to suit their needs. The DRBG code is agnostic of the
> > underlying cipher. So, you could even use Blowfish instead of AES or
> > whirlpool instead of SHA -- these changes are just one entry in
> > drbg_cores[] away without any code change.
> 
> I really question how much this matters in practice.  Unless you are a
> US Government Agency, where you might be laboring under a Federal
> mandate to use DUAL-EC DRBG (for example), most users really don't

I am not sure such references help the discussion.

> care about the details of the algorithm used in their random number
> generator.  Giving users choice (or lots of knobs) isn't necessarily
> always a feature, as the many TLS downgrade attacks have demonstrated.

The options are at compile time, not at runtime.
> 
> This is why from my perspectve it's more important to implement an
> interface which is always there, and which by default is secure, to
> minimize the chances that random JV-team kernel developers working for
> a Linux distribution or some consumer electronics manufacturer won't
> actually make things worse.  As the Debian's attempt to "improve" the
> security of OpenSSL demonstrates, it doesn't always end well.  :-)

Rest assured, the current implementation of /dev/random gives many people many 
headaches. And I can tell you that I have seen "random JV-team kernel 
developers" doing things you do not want to know just to make the behavior 
better.

And even if they do not change anything, I am still under the impression that
the current implementation has shortcomings in typical deployment scenarios
(mainly VMs and headless server systems).

Hence I want to give a framework where people can safely alter a few things to 
suit their needs. But the things they can change should not affect the overall 
security.
> 
> If we implement something which happens to result in a 2 minute stall
> in boot times, the danger is that a clueless engineer at Sony, or LGE,
> or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
> telling anyone about what they did, and we might not notice right away
> that the fix was in fact catastrophically bad.

Such "fixes" are employed these days already! And they are not employed
because of the crypto used (the topic this thread started with), but due
to the handling and accounting of the initial entropy.

So, I think that the crypto used on the DRNG side is just the icing (hence I
said I can live with SP800-90A, your Chacha20, even X9.31 given that the LRNG
ensures proper seeding and reseeding). The real issues are in the entropy
accounting and maintenance and the reseeding of the DRNGs.

Ciao
Stephan

Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Theodore Ts'o
On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
> 
> Using the kernel crypto API one can relieve the CPU of the crypto work, if a
> hardware or assembler implementation is available. This may be of particular
> interest for smaller systems. So, for smaller systems (where kernel bloat is
> bad, but where now these days more and more hardware crypto support is
> added), we must weigh the kernel bloat (of 3 or 4 additional C files for the
> basic kernel crypto API logic) against relieving the CPU of work.

There are a number of caveats with using hardware acceleration; one is
that many hardware accelerators are optimized for bulk data
encryption, and so key scheduling, or switching between key schedules,
can have a higher overhead than a pure software implementation.

There have also been situations where the hardware crypto engine is
actually slower than a highly optimized software implementation.  This
has been the case for certain ARM SoCs, for example.

This is not that big of a deal if you are developing a cryptographic
application (such as file system level encryption, for example) for a
specific hardware platform (such as a specific Nexus device).  But if
you are trying to develop a generic service that has to work on a wide
variety of CPU architectures, and specific CPU/SoC implementations,
life is a lot more complicated.  I've worked on both problems; let me
assure you the second is way trickier than the first.

> Then, the use of the DRBG offers users to choose between a Hash/HMAC and CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or
> whirlpool instead of SHA -- these changes are just one entry in
> drbg_cores[] away without any code change.

I really question how much this matters in practice.  Unless you are a
US Government Agency, where you might be laboring under a Federal
mandate to use DUAL-EC DRBG (for example), most users really don't
care about the details of the algorithm used in their random number
generator.  Giving users choice (or lots of knobs) isn't necessarily
always a feature, as the many TLS downgrade attacks have demonstrated.

This is why, from my perspective, it's more important to implement an
interface which is always there, and which by default is secure, to
minimize the chances that random JV-team kernel developers working for
a Linux distribution or some consumer electronics manufacturer will
actually make things worse.  As Debian's attempt to "improve" the
security of OpenSSL demonstrates, it doesn't always end well.  :-)

If we implement something which happens to result in a 2 minute stall
in boot times, the danger is that a clueless engineer at Sony, or LGE,
or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
telling anyone about what they did, and we might not notice right away
that the fix was in fact catastrophically bad.

These aren't the standard things which academics tend to worry about;
they tend to assume that attackers will be able to read arbitrary
kernel memory, and that recovering from such an exposure of the entropy
pool is _the_ most important thing to worry about (as opposed to, say,
the contents of the user's private keys in the ssh-agent process).  But
this will perhaps explain why worrying about accommodating users who
care about whether Blowfish or AES should be used in their random
number generator isn't near the top of my personal priority list.

Cheers,

- Ted



Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Sandy Harris
On Thu, May 26, 2016 at 2:49 PM, Stephan Mueller  wrote:

> Then, the use of the DRBG offers users to choose between a Hash/HMAC and CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or whirlpool
> instead of SHA -- these changes are just one entry in drbg_cores[] away
> without any code change.

Not Blowfish in anything like the code you describe! It has only
64-bit blocks which might or might not be a problem, but it also has
an extremely expensive key schedule which would be awful if you want
to rekey often.
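
To put rough numbers on both concerns, here is a small sketch. The birthday
bound is the generic 2^(n/2) estimate for an n-bit block; the Blowfish table
sizes (an 18-entry P-array and four 256-entry S-boxes, filled two 32-bit words
per block encryption) are taken from the published cipher description. The
256-blocks-per-key figure is the rekey interval mentioned in the first message
of this thread, and all identifiers are illustrative only.

```python
# Rough arithmetic only; all names here are illustrative.

def birthday_bound_bytes(block_bits):
    """Output volume at which an n-bit block cipher reaches 2^(n/2) blocks."""
    blocks = 2 ** (block_bits // 2)
    return blocks * (block_bits // 8)


GIB = 2 ** 30
limit_64 = birthday_bound_bytes(64)      # Blowfish-sized block
limit_128 = birthday_bound_bytes(128)    # AES-sized block

# Blowfish key setup fills 18 P-array words and 4 * 256 S-box words,
# producing two 32-bit words per block encryption.
key_schedule_encryptions = (18 + 4 * 256) // 2

# Assumed rekey interval: 256 blocks per key, as in the start of this thread.
BLOCKS_PER_KEY = 256
overhead = key_schedule_encryptions / BLOCKS_PER_KEY

print("64-bit block bound: ", limit_64 // GIB, "GiB")
print("128-bit block bound:", limit_128 // GIB, "GiB")
print("Blowfish key setup: ", key_schedule_encryptions, "encryptions")
print("setup encryptions per output block:", round(overhead, 2))
```

Under these assumptions, rekeying Blowfish every 256 blocks would spend about
twice as many cipher invocations on key setup as on producing output.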

I'd say if you want a block cipher there you can quite safely restrict
the interface to ciphers with the same block & key sizes as AES.
Implement AES and one of the other finalists (I'd pick Serpent) to
test, and others can add the remaining finalists or national standards
like Korean ARIA or the Japanese one if they want them.


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Stephan Mueller
Am Donnerstag, 26. Mai 2016, 14:20:19 schrieb Sandy Harris:

Hi Sandy,
> 
> Why are you using AES? Granted, it is a reasonable idea, but when Ted
> replaced the non-blocking pool with a DRBG, he used a different cipher
> (I think chacha, not certain) and I think chose not to use the crypto
> library implementation to avoid kernel bloat.
> 
> So he has adopted one of your better ideas. Why not follow his
> lead on how to implement it?

Using the kernel crypto API one can relieve the CPU of the crypto work, if a
hardware or assembler implementation is available. This may be of particular
interest for smaller systems. So, for smaller systems (where kernel bloat is
bad, but where these days more and more hardware crypto support is being
added), we must weigh the kernel bloat (of 3 or 4 additional C files for the
basic kernel crypto API logic) against relieving the CPU of work.

Then, the use of the DRBG lets users choose between a Hash/HMAC and a CTR
implementation to suit their needs. The DRBG code is agnostic of the
underlying cipher. So, you could even use Blowfish instead of AES or whirlpool
instead of SHA -- these changes are just one entry in drbg_cores[] away
without any other code change.

Finally, the LRNG code is completely agnostic of the underlying deterministic 
RNG. You only need a replacement of two small functions to invoke the seeding 
and generate operation of a DRNG. So, if one wants a Chacha20, he can have it. 
If one wants X9.31, he can have it. See section 2.8.3 [1] -- note, that DRNG 
does not even need to be a kernel crypto API registered implementation.

Bottom line, I want to give folks full flexibility. That said, the LRNG code
is more of a logic to collect entropy and maintain two DRNG types, which are
seeded according to a defined schedule, than it is a tightly integrated RNG.

Also, I am not so sure that simply taking a cipher, sprinkling some 
backtracking logic on it implies you have a good DRNG. As of now, I have not 
seen assessments from others for the Chacha20 DRNG approach. I personally 
would think that the Chacha20 approach from Ted is good. Yet others may have a 
more conservative approach of using a DRNG implementation that has been 
reviewed by a lot of folks.

[1] http://www.chronox.de/lrng/doc/lrng.pdf

Ciao
Stephan


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Sandy Harris
Stephan Mueller  wrote:

> for the DRBG and the LRNG work I am doing, I also test the speed of the DRBG.
> The DRBG can be considered as a form of block chaining mode on top of a raw
> cipher.
>
> What I am wondering is that when encrypting 256 16 byte blocks, I get a speed
> of about 170 MB/s with the AES-NI driver. When using the aes-generic or
> aes-asm, I get up to 180 MB/s with all else being equal. Note, that figure
> includes a copy_to_user of the generated data.

Why are you using AES? Granted, it is a reasonable idea, but when Ted
replaced the non-blocking pool with a DRBG, he used a different cipher
(I think chacha, not certain) and I think chose not to use the crypto
library implementation to avoid kernel bloat.

So he has adopted one of your better ideas. Why not follow his
lead on how to implement it?


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Stephan Mueller
Am Donnerstag, 26. Mai 2016, 19:30:01 schrieb Stephan Mueller:

Hi,

> 
> However, the key difference to a standard speed test is that I set up a new
> key schedule quite frequently. And I would suspect that something is going
> on here...

With tcrypt, there is some interesting hint: on smaller blocks, the C 
implementation is indeed faster:

[   20.391510] testing speed of async ecb(aes) (ecb(aes-generic)) encryption
[   20.391513] test 0 (128 bit key, 16 byte blocks): 1 operation in 275 cycles (16 bytes)
[   20.391517] test 1 (128 bit key, 64 byte blocks): 1 operation in 702 cycles (64 bytes)
[   20.391521] test 2 (128 bit key, 256 byte blocks): 1 operation in 2431 cycles (256 bytes)
[   20.391532] test 3 (128 bit key, 1024 byte blocks): 1 operation in 9347 cycles (1024 bytes)
[   20.391570] test 4 (128 bit key, 8192 byte blocks): 1 operation in 74375 cycles (8192 bytes)


vs for ecb-aes-aesni:

[  143.482123] test 0 (128 bit key, 16 byte blocks): 1 operation in 1203 cycles (16 bytes)
[  143.482138] test 1 (128 bit key, 64 byte blocks): 1 operation in 1328 cycles (64 bytes)
[  143.482148] test 2 (128 bit key, 256 byte blocks): 1 operation in 1922 cycles (256 bytes)
[  143.482159] test 3 (128 bit key, 1024 byte blocks): 1 operation in 3328 cycles (1024 bytes)
[  143.482176] test 4 (128 bit key, 8192 byte blocks): 1 operation in 19483 cycles (8192 bytes)


As I use crypto_cipher_encrypt_one, I only send one block at a time to AES-NI.
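
The tcrypt figures above are easier to compare when normalized to cycles per
byte. The following is a small sketch of that arithmetic; the cycle counts are
copied from the two logs above, while the variable and function names are
illustrative only and are not tcrypt identifiers.

```python
# Cycle counts copied from the two tcrypt logs above (128-bit key, ECB).
SIZES = [16, 64, 256, 1024, 8192]                 # bytes per operation
AES_GENERIC = [275, 702, 2431, 9347, 74375]       # cycles per operation
AES_NI = [1203, 1328, 1922, 3328, 19483]          # cycles per operation


def cycles_per_byte(cycles, sizes):
    """Return the cost of each test normalized to cycles per byte."""
    return [c / s for c, s in zip(cycles, sizes)]


def first_size_where_faster(fast, slow, sizes):
    """Return the smallest request size at which `fast` needs fewer cycles."""
    for size, f, s in zip(sizes, fast, slow):
        if f < s:
            return size
    return None


generic_cpb = cycles_per_byte(AES_GENERIC, SIZES)
aesni_cpb = cycles_per_byte(AES_NI, SIZES)
crossover = first_size_where_faster(AES_NI, AES_GENERIC, SIZES)

for size, g, n in zip(SIZES, generic_cpb, aesni_cpb):
    print(f"{size:5d} bytes: generic {g:6.2f} c/B, aesni {n:6.2f} c/B")
print("AES-NI is first faster at", crossover, "bytes per request")
```

With these numbers a single 16-byte request costs roughly 75 cycles per byte
with AES-NI versus roughly 17 with aes-generic, while at 8192 bytes AES-NI
needs about 2.4 cycles per byte versus about 9.1. That pattern would be
consistent with a fixed per-call overhead dominating small requests.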


Ciao
Stephan


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Jeffrey Walton
> What I am wondering is that when encrypting 256 16 byte blocks, I get a speed
> of about 170 MB/s with the AES-NI driver. When using the aes-generic or
> aes-asm, I get up to 180 MB/s with all else being equal. Note, that figure
> includes a copy_to_user of the generated data.
>
> ...

Something sounds amiss.

AES-NI should be an order of magnitude faster than a generic
implementation. Can you verify AES-NI is actually using AES-NI, and
aes-generic is a software implementation?

Here are some OpenSSL numbers. EVP uses AES-NI when available.
Omitting -evp means it's software only (no hardware acceleration, like
AES-NI).

$ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc    626533.60k   669884.42k   680917.93k   682079.91k   684736.51k


$ openssl speed -elapsed aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    106520.59k   114380.16k   116741.46k   117489.32k   117563.39k
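
For reference, the ratio between the two runs above can be computed directly.
This is just arithmetic on the figures quoted in this mail; the variable names
are illustrative only.

```python
# Throughput in 1000s of bytes per second, copied from the two runs above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
WITH_EVP = [626533.60, 669884.42, 680917.93, 682079.91, 684736.51]
WITHOUT_EVP = [106520.59, 114380.16, 116741.46, 117489.32, 117563.39]

# Speedup of the -evp run over the run without -evp, per block size.
speedups = [evp / sw for evp, sw in zip(WITH_EVP, WITHOUT_EVP)]

for size, ratio in zip(BLOCK_SIZES, speedups):
    print(f"{size:5d} bytes: -evp run is {ratio:.2f}x faster")
```

On these particular numbers the -evp run comes out a little under 6x faster
in every column.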

Jeff


Re: AES-NI: slower than aes-generic?

2016-05-26 Thread Stephan Mueller
Am Donnerstag, 26. Mai 2016, 13:25:02 schrieb Jeffrey Walton:

Hi Jeffrey,

> > What I am wondering is that when encrypting 256 16 byte blocks, I get a
> > speed of about 170 MB/s with the AES-NI driver. When using the
> > aes-generic or aes-asm, I get up to 180 MB/s with all else being equal.
> > Note, that figure includes a copy_to_user of the generated data.
> > 
> > ...
> 
> Something sounds amiss.
> 
> AES-NI should be an order of magnitude faster than a generic
> implementation. Can you verify AES-NI is actually using AES-NI, and
> aes-generic is a software implementation?

I am pretty sure I am using the right implementations as I checked the 
refcount in /proc/crypto.
> 
> Here are some OpenSSL numbers. EVP uses AES-NI when available.
> Omitting -evp means it's software only (no hardware acceleration, like
> AES-NI).

I understand that AES-NI should be faster. That is what I am wondering about.

However, the key difference to a standard speed test is that I set up a new 
key schedule quite frequently. And I would suspect that something is going on 
here...

Ciao
Stephan


AES-NI: slower than aes-generic?

2016-05-26 Thread Stephan Mueller
Hi,

For the DRBG and the LRNG work I am doing, I also test the speed of the DRBG.
The DRBG can be considered a form of block chaining mode on top of a raw
cipher.

What puzzles me is that when encrypting 256 16-byte blocks, I get a speed
of about 170 MB/s with the AES-NI driver. When using aes-generic or aes-asm,
I get up to 180 MB/s with all else being equal. Note that this figure
includes a copy_to_user of the generated data.

To be precise, the code does the following steps:

1. setkey

2. AES encrypt 256 blocks and copy_to_user

3. Use AES to generate a new key and start at step 1.
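
The three steps above can be sketched as follows. This is not the DRBG code:
Python's standard library has no AES, so a truncated SHA-256 call stands in
for one AES block encryption, the copy_to_user is replaced by appending to a
buffer, and the 16-byte key size and all names are assumptions made for the
sketch. It only illustrates how often setkey runs relative to the amount of
output.

```python
import hashlib

BLOCK_BYTES = 16
BLOCKS_PER_KEY = 256
KEY_BYTES = 16            # assumed key size for this sketch


def stand_in_block(key, counter):
    """Stand-in for one AES block encryption (NOT AES): truncated SHA-256."""
    data = key + counter.to_bytes(16, "big")
    return hashlib.sha256(data).digest()[:BLOCK_BYTES]


def generate(key, nbytes):
    """Follow the three steps: setkey, emit 256 blocks, derive a new key."""
    out = bytearray()
    setkey_calls = 0
    counter = 0
    while len(out) < nbytes:
        setkey_calls += 1                          # step 1: setkey
        for _ in range(BLOCKS_PER_KEY):            # step 2: 256 blocks
            out += stand_in_block(key, counter)    # buffer replaces copy_to_user
            counter += 1
        key = stand_in_block(key, counter)         # step 3: derive the next key
        counter += 1
    return bytes(out[:nbytes]), setkey_calls


data, setkeys = generate(b"\x00" * KEY_BYTES, 64 * 1024)
print(len(data), "bytes generated with", setkeys, "setkey calls")
```

In this model there is one setkey per 4096 bytes of output, so any fixed cost
attached to setkey or to each single-block call is paid many times per request.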

I am wondering why the AES-NI driver is slower than the C/ASM implementations 
given that all else is equal.

Note, if I have fewer blocks in step 2 above, AES-NI becomes faster. E.g.
if I have 8 blocks or just one block in step 2 above, AES-NI is faster by 10
or 20%.

Ciao
Stephan
-- 
atsec information security GmbH, Steinstraße 70, 81667 München, Germany
P: +49 89 442 49 830 - F: +49 89 442 49 831
M: +49 172 216 55 78 - HRB: 129439 (Amtsgericht München)
US: +1 949 545 4096
GF: Salvatore la Pietra, Staffan Persson
atsec it security news blog - atsec-information-security.blogspot.com
