Re: AES-NI: slower than aes-generic?
On Thursday, 26 May 2016, 22:14:29, Theodore Ts'o wrote:

Hi Theodore,

> On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
> > Using the kernel crypto API one can relieve the CPU of the crypto work, if
> > a hardware or assembler implementation is available. This may be of
> > particular interest for smaller systems. So, for smaller systems (where
> > kernel bloat is bad, but where these days more and more hardware
> > crypto support is added), we must weigh the kernel bloat (of 3 or 4
> > additional C files for the basic kernel crypto API logic) against
> > relieving the CPU of work.
>
> There are a number of caveats with using hardware acceleration; one is
> that many hardware accelerators are optimized for bulk data
> encryption, and so key scheduling, or switching between key schedules,
> can have a higher overhead than a pure software implementation.

As a follow-up: I tweaked the DRBG side a bit to use the full speed of the AES-NI implementation. With that tweak, the initial finding no longer applies. Depending on the request size, I now get more than 800 MB/s (an increase of more than 450% compared to the initial implementation) from the AES-NI implementation.

Hence, the frequent key schedule update does not seem to be much of an issue.

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AES-NI: slower than aes-generic?
On Sun, May 29, 2016 at 09:51:59PM +0200, Stephan Mueller wrote:
>
> I personally am not sure that taking some arbitrary cipher and turning it
> into a DRNG by simply using a self-feeding loop based on the ideas of X9.31
> Appendix A2.4 is good. Chacha20 is a good cipher, but is it equally good
> for a DRNG? I do not know. There are too few assessments from
> mathematicians out there regarding that topic.

If ChaCha20 is a good (stream) cipher, it must be a good DRNG by definition. In other words, if you can predict the output of a ChaCha20-based DRNG with any accuracy greater than chance, this can be used as a wedge to attack the stream cipher.

I will note that OpenBSD's "ARC4" random number generator is currently using ChaCha20, BTW.

Regards,

- Ted
Re: AES-NI: slower than aes-generic?
On Saturday, 28 May 2016, 07:28:25, Aaron Zauner wrote:

Hi Aaron,

> Heya,
>
> > On 27 May 2016, at 01:49, Stephan Mueller wrote:
> > Then, the use of the DRBG lets users choose between a Hash/HMAC and a
> > CTR implementation to suit their needs. The DRBG code is agnostic of the
> > underlying cipher. So, you could even use Blowfish instead of AES or
> > whirlpool instead of SHA -- these changes are just one entry in
> > drbg_cores[] away without any code change.
>
> That's a really nice change and something I've been thinking about for a
> couple of months as well. Then I came across tytso's ChaCha patches to
> urandom and was thinking ISA dependent switches between ciphers would make
> sense, i.e. you get AESNI performance when there's support.
>
> > Finally, the LRNG code is completely agnostic of the underlying
> > deterministic RNG. You only need a replacement of two small functions to
> > invoke the seeding and generate operation of a DRNG. So, if one wants a
> > Chacha20, he can have it. If one wants X9.31, he can have it. See section
> > 2.8.3 [1] -- note that the DRNG does not even need to be a kernel crypto
> > API registered implementation.
>
> It's valid criticism that the number of algorithms should be limited.
> Algorithmic agility is an issue and has caused many real-world security
> problems in protocols; liberally allowing crypto primitives to be chosen by
> the user isn't a good idea. We should think about algorithms that make
> sense. E.g. TLS 1.3 and HTTP/2 have been moving in this direction. TLS
> 1.3 will only allow a couple of cipher-suites as opposed to the
> combinatorial explosion with previous iterations of the protocol.

I cannot agree more with you, if the attacker can choose the algo. However, I would think that a compile time selection of one specific algo is not prone to this issue. Also, the code of the LRNG provides a pre-defined set of DRBGs that should leave nothing to be desired.
Hence, I am not sure that too many folks would change the code here. Though, if folks really want to, they have the option to do so.

> I'd suggest sticking to AES-CTR and ChaCha20 for DRNG designs. That should
> fit almost all platforms with great performance, keep code-base small etc.

Regarding the CTR DRBG: I did not make it the default for two reasons:

- It is not the fastest -- though I just found a drag on the CTR DRBG performance, and I want to push the fix upstream after the merge window closes. With that patch the CTR DRBG is now the fastest by orders of magnitude. So, this issue does not apply any more.

- The DF/BCC function in the DRBG is critical, as IMHO it loses entropy. When you seed the DRBG with, say, 256 or 384 bits of data, the BCC acts akin to a MAC by taking the 256 or 384 bits and collapsing them into one AES block of 128 bits. Then the DF function expands this one block into the DRBG internal state, including the AES key of 256 / 384 bits depending on the type of AES you use. So, if you have 256 bits of entropy in the seed, you have 128 bits left after the BCC operation.

Given that criticism, I am asking whether the CTR DRBG with AES > 128 should be used as the default. Also, the CTR DRBG is the most complex of all three DRBGs (the HMAC DRBG, the current default, is the leanest and cleanest).

But if folks think that the CTR DRBG should be made the default, I would listen and make it so.

> There's now heavily optimised assembly in OpenSSL for ChaCha20 if you want
> to take a look:
> https://github.com/openssl/openssl/tree/master/crypto/chacha/asm But as
> mentioned in the ChaCha/urandom thread: architecture specific optimisation
> may be painful and error-prone.

I personally am not sure that taking some arbitrary cipher and turning it into a DRNG by simply using a self-feeding loop based on the ideas of X9.31 Appendix A2.4 is good. Chacha20 is a good cipher, but is it equally good for a DRNG? I do not know.
There are too few assessments from mathematicians out there regarding that topic. Hence, I would rather stick to DRNG designs that have been analyzed by different folks.

> > Bottom line, I want to give folks full flexibility. That said, the LRNG
> > code is more of a logic to collect entropy and maintain two DRNG types
> > which are seeded according to a defined schedule than it is a tightly
> > integrated RNG.
> >
> > Also, I am not so sure that simply taking a cipher and sprinkling some
> > backtracking logic on it implies you have a good DRNG. As of now, I have
> > not seen assessments from others for the Chacha20 DRNG approach. I
> > personally would think that the Chacha20 approach from Ted is good. Yet
> > others may have a more conservative approach of using a DRNG
> > implementation that has been reviewed by a lot of folks.
> >
> > [1] http://www.chronox.de/lrng/doc/lrng.pdf
>
> Currently reading that paper, it seems like a solid approach.

There was criticism on the entropy maintenance. I have now reverted
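The DF/BCC argument in this message is essentially a counting bound: entropy that passes through a fixed-width intermediate value cannot exceed that width. The following Python sketch only restates that argument as stated in the mail; the function name and the list of stage widths are made up for illustration, and it is not an analysis of the actual SP 800-90A derivation function.

```python
def entropy_upper_bound(seed_entropy_bits, stage_widths_bits):
    """Upper bound on entropy after passing through fixed-width stages.

    Each stage is modelled only by the width of the value it produces;
    a stage cannot carry more entropy than it has bits.
    """
    bound = seed_entropy_bits
    for width in stage_widths_bits:
        bound = min(bound, width)
    return bound


if __name__ == "__main__":
    # Scenario from the mail: the seed is collapsed into one 128-bit AES
    # block and then expanded to 256 or 384 bits of internal state.
    for seed_bits, state_bits in ((256, 256), (384, 384)):
        bound = entropy_upper_bound(seed_bits, [128, state_bits])
        print(f"{seed_bits}-bit seed -> at most {bound} bits in the state")
```

Under this simplified model, expanding the 128-bit block afterwards cannot restore what the compression step discarded, which is the concern raised about using AES key sizes above 128 bits by default.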
Re: AES-NI: slower than aes-generic?
Heya,

> On 27 May 2016, at 01:49, Stephan Mueller wrote:
> Then, the use of the DRBG lets users choose between a Hash/HMAC and a CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or
> whirlpool instead of SHA -- these changes are just one entry in
> drbg_cores[] away without any code change.

That's a really nice change and something I've been thinking about for a couple of months as well. Then I came across tytso's ChaCha patches to urandom and was thinking ISA dependent switches between ciphers would make sense, i.e. you get AESNI performance when there's support.

> Finally, the LRNG code is completely agnostic of the underlying
> deterministic RNG. You only need a replacement of two small functions to
> invoke the seeding and generate operation of a DRNG. So, if one wants a
> Chacha20, he can have it. If one wants X9.31, he can have it. See section
> 2.8.3 [1] -- note that the DRNG does not even need to be a kernel crypto
> API registered implementation.

It's valid criticism that the number of algorithms should be limited. Algorithmic agility is an issue and has caused many real-world security problems in protocols; liberally allowing crypto primitives to be chosen by the user isn't a good idea. We should think about algorithms that make sense. E.g. TLS 1.3 and HTTP/2 have been moving in this direction. TLS 1.3 will only allow a couple of cipher-suites as opposed to the combinatorial explosion with previous iterations of the protocol.

I'd suggest sticking to AES-CTR and ChaCha20 for DRNG designs. That should fit almost all platforms with great performance, keep code-base small etc.

There's now heavily optimised assembly in OpenSSL for ChaCha20 if you want to take a look: https://github.com/openssl/openssl/tree/master/crypto/chacha/asm

But as mentioned in the ChaCha/urandom thread: architecture specific optimisation may be painful and error-prone.
> Bottom line, I want to give folks full flexibility. That said, the LRNG
> code is more of a logic to collect entropy and maintain two DRNG types
> which are seeded according to a defined schedule than it is a tightly
> integrated RNG.
>
> Also, I am not so sure that simply taking a cipher and sprinkling some
> backtracking logic on it implies you have a good DRNG. As of now, I have not
> seen assessments from others for the Chacha20 DRNG approach. I personally
> would think that the Chacha20 approach from Ted is good. Yet others may
> have a more conservative approach of using a DRNG implementation that has
> been reviewed by a lot of folks.
>
> [1] http://www.chronox.de/lrng/doc/lrng.pdf

Currently reading that paper, it seems like a solid approach.

I don't like the approach that user-space programs may modify entropy. It's a myth that `haveged` etc. provide more security, and EGDs have barely been audited; they are usually written as academic work and then left completely unmaintained.

I regularly end up in randomness[sic!] discussions with core language maintainers [0] [1] - they seem to have little understanding of what's going on in the kernel and either use /dev/random as a seed or a userspace RNG (most of which aren't particularly safe to begin with -- OpenSSL is not fork safe [2] [3], a recent paper found weaknesses in the OpenSSL RNG at low entropy state leaking secrets [4] et cetera). This seems to be mostly the case because of the infamous `random(4)` man page, with end-users (protocol implementers, core language designers, ...) always pointing to upstream, which - of course - is the Linux kernel.

I can't really tell from the paper if /dev/random would still be blocking in some cases? If so that's unfortunate.
Thanks for your work on this,
Aaron

[0] https://bugs.ruby-lang.org/issues/9569
[1] https://github.com/nodejs/node/issues/5798
[2] https://emboss.github.io/blog/2013/08/21/openssl-prng-is-not-really-fork-safe/
[3] https://wiki.openssl.org/index.php/Random_fork-safety
[4] https://eprint.iacr.org/2016/367.pdf
Re: AES-NI: slower than aes-generic?
> If we implement something which happens to result in a 2 minute stall
> in boot times, the danger is that a clueless engineer at Sony, or LGE,
> or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
> telling anyone about what they did, and we might not notice right away
> that the fix was in fact catastrophically bad.

This is a non-trivial threat. +1 for recognizing it.

I know of one VM hypervisor used in US Financial that was effectively doing "One thing you should not do is the following..." from http://lwn.net/Articles/525459/.

Jeff
Re: AES-NI: slower than aes-generic?
On Thursday, 26 May 2016, 22:14:29, Theodore Ts'o wrote:

Hi Theodore,

> On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
> > Using the kernel crypto API one can relieve the CPU of the crypto work, if
> > a hardware or assembler implementation is available. This may be of
> > particular interest for smaller systems. So, for smaller systems (where
> > kernel bloat is bad, but where these days more and more hardware
> > crypto support is added), we must weigh the kernel bloat (of 3 or 4
> > additional C files for the basic kernel crypto API logic) against
> > relieving the CPU of work.
>
> There are a number of caveats with using hardware acceleration; one is
> that many hardware accelerators are optimized for bulk data
> encryption, and so key scheduling, or switching between key schedules,
> can have a higher overhead than a pure software implementation.

Squeezing the last drop of speed out of the ciphers for the LRNG is not my priority given that the speed is limited by the reseeding. The LRNG should allow the CPU to offload the crypto work. For small systems, crypto is intensive work, and those CPU cycles could be spent elsewhere.

> There have also been situations where the hardware crypto engine is
> actually slower than a highly optimized software implementation. This
> has been the case for certain ARM SoCs, for example.

And I would be fine with that. Besides, if a user wants to use software implementations with the LRNG and still offer HW support for all else, all they need to do is to statically compile the software implementation and compile the hardware support as a module. As the LRNG initializes before kernel modules can be loaded, it can only use what it finds in the static kernel.

> This is not that big of a deal if you are developing a cryptographic
> application (such as file system level encryption, for example) for a
> specific hardware platform (such as a specific Nexus device).
> But if you are trying to develop a generic service that has to work on a
> wide variety of CPU architectures, and specific CPU/SoC implementations,
> life is a lot more complicated. I've worked on both problems; let me
> assure you the second is way trickier than the first.
>
> > Then, the use of the DRBG lets users choose between a Hash/HMAC and a
> > CTR implementation to suit their needs. The DRBG code is agnostic of the
> > underlying cipher. So, you could even use Blowfish instead of AES or
> > whirlpool instead of SHA -- these changes are just one entry in
> > drbg_cores[] away without any code change.
>
> I really question how much this matters in practice. Unless you are a
> US Government Agency, where you might be laboring under a Federal
> mandate to use DUAL-EC DRBG (for example), most users really don't

I am not sure such references help the discussion.

> care about the details of the algorithm used in their random number
> generator. Giving users choice (or lots of knobs) isn't necessarily
> always a feature, as the many TLS downgrade attacks have demonstrated.

The options are at compile time, not at runtime.

> This is why from my perspective it's more important to implement an
> interface which is always there, and which by default is secure, to
> minimize the chances that random JV-team kernel developers working for
> a Linux distribution or some consumer electronics manufacturer will
> actually make things worse. As Debian's attempt to "improve" the
> security of OpenSSL demonstrates, it doesn't always end well. :-)

Rest assured, the current implementation of /dev/random gives many people many headaches. And I can tell you that I have seen "random JV-team kernel developers" doing things you do not want to know about just to make the behavior better. And even if they do not change anything, I am still under the impression that the current implementation has shortcomings in typical deployment scenarios (mainly VMs and headless server systems).
Hence I want to give a framework where people can safely alter a few things to suit their needs. But the things they can change should not affect the overall security.

> If we implement something which happens to result in a 2 minute stall
> in boot times, the danger is that a clueless engineer at Sony, or LGE,
> or Motorola, or BMW, or Toyota, etc, will "fix" the problem without
> telling anyone about what they did, and we might not notice right away
> that the fix was in fact catastrophically bad.

Such "fixes" are employed these days already! And they are not employed because of the crypto used (which was the topic this thread started with), but due to the handling and accounting of the initial entropy.

So, I think that the crypto used for the DRNG side is just the icing (hence I said I can live with SP800-90A, your Chacha20, even X9.31, given that the LRNG ensures proper seeding and reseeding). The real issues are in the entropy accounting and maintenance and the reseeding of the DRNGs.

Ciao
Stephan
Re: AES-NI: slower than aes-generic?
On Thu, May 26, 2016 at 08:49:39PM +0200, Stephan Mueller wrote:
>
> Using the kernel crypto API one can relieve the CPU of the crypto work, if a
> hardware or assembler implementation is available. This may be of
> particular interest for smaller systems. So, for smaller systems (where
> kernel bloat is bad, but where these days more and more hardware crypto
> support is added), we must weigh the kernel bloat (of 3 or 4 additional C
> files for the basic kernel crypto API logic) against relieving the CPU of
> work.

There are a number of caveats with using hardware acceleration; one is that many hardware accelerators are optimized for bulk data encryption, and so key scheduling, or switching between key schedules, can have a higher overhead than a pure software implementation.

There have also been situations where the hardware crypto engine is actually slower than a highly optimized software implementation. This has been the case for certain ARM SoCs, for example.

This is not that big of a deal if you are developing a cryptographic application (such as file system level encryption, for example) for a specific hardware platform (such as a specific Nexus device). But if you are trying to develop a generic service that has to work on a wide variety of CPU architectures, and specific CPU/SoC implementations, life is a lot more complicated. I've worked on both problems; let me assure you the second is way trickier than the first.

> Then, the use of the DRBG lets users choose between a Hash/HMAC and a CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or
> whirlpool instead of SHA -- these changes are just one entry in
> drbg_cores[] away without any code change.

I really question how much this matters in practice.
Unless you are a US Government Agency, where you might be laboring under a Federal mandate to use DUAL-EC DRBG (for example), most users really don't care about the details of the algorithm used in their random number generator. Giving users choice (or lots of knobs) isn't necessarily always a feature, as the many TLS downgrade attacks have demonstrated.

This is why from my perspective it's more important to implement an interface which is always there, and which by default is secure, to minimize the chances that random JV-team kernel developers working for a Linux distribution or some consumer electronics manufacturer will actually make things worse. As Debian's attempt to "improve" the security of OpenSSL demonstrates, it doesn't always end well. :-)

If we implement something which happens to result in a 2 minute stall in boot times, the danger is that a clueless engineer at Sony, or LGE, or Motorola, or BMW, or Toyota, etc, will "fix" the problem without telling anyone about what they did, and we might not notice right away that the fix was in fact catastrophically bad.

These aren't the standard things which academics tend to worry about; they tend to assume that attackers will be able to read arbitrary kernel memory, and that recovering from such an exposure of the entropy pool is _the_ most important thing to worry about (as opposed to, say, the contents of the user's private keys in the ssh-agent process). But this will perhaps explain why worrying about accommodating users who care about whether Blowfish or AES should be used in their random number generator isn't near the top of my personal priority list.

Cheers,

- Ted
Re: AES-NI: slower than aes-generic?
On Thu, May 26, 2016 at 2:49 PM, Stephan Mueller wrote:

> Then, the use of the DRBG lets users choose between a Hash/HMAC and a CTR
> implementation to suit their needs. The DRBG code is agnostic of the
> underlying cipher. So, you could even use Blowfish instead of AES or
> whirlpool instead of SHA -- these changes are just one entry in
> drbg_cores[] away without any code change.

Not Blowfish in anything like the code you describe! It has only 64-bit blocks, which might or might not be a problem, but it also has an extremely expensive key schedule, which would be awful if you want to rekey often.

I'd say if you want a block cipher there you can quite safely restrict the interface to ciphers with the same block & key sizes as AES. Implement AES and one of the other finalists (I'd pick Serpent) to test, and others can add the remaining finalists or national standards like Korean ARIA or the Japanese one if they want them.
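To put a rough number on the key schedule concern, here is a back-of-the-envelope Python sketch. It assumes the rekey interval from the start of the thread (256 blocks of 16 bytes per key) and uses the figures from Blowfish's published design, which are not stated in this mail: an 18-word P-array plus four 256-word S-boxes, filled two words per block encryption. The function and constant names are illustrative only.

```python
BLOWFISH_BLOCK_BYTES = 8
# Subkey material that key setup has to fill, in 32-bit words:
# 18 P-array entries plus four S-boxes of 256 entries each.
BLOWFISH_SUBKEY_WORDS = 18 + 4 * 256
# Every block encryption during key setup yields two 32-bit words.
BLOWFISH_SETUP_ENCRYPTIONS = BLOWFISH_SUBKEY_WORDS // 2


def setup_to_data_ratio(bytes_per_key, block_bytes, setup_encryptions):
    """Key-setup block encryptions per data block encryption."""
    data_encryptions = bytes_per_key // block_bytes
    return setup_encryptions / data_encryptions


if __name__ == "__main__":
    bytes_per_key = 256 * 16  # output produced between two setkey calls
    ratio = setup_to_data_ratio(
        bytes_per_key, BLOWFISH_BLOCK_BYTES, BLOWFISH_SETUP_ENCRYPTIONS
    )
    print(f"setup encryptions: {BLOWFISH_SETUP_ENCRYPTIONS}")
    print(f"setup work / data work: {ratio:.2f}")
```

With these assumptions the key setup costs slightly more block encryptions than the 4096 bytes of output it protects, so rekeying at that interval would roughly double the work.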
Re: AES-NI: slower than aes-generic?
On Thursday, 26 May 2016, 14:20:19, Sandy Harris wrote:

Hi Sandy,

> Why are you using AES? Granted, it is a reasonable idea, but when Ted
> replaced the non-blocking pool with a DRBG, he used a different cipher
> (I think chacha, not certain) and I think chose not to use the crypto
> library implementation to avoid kernel bloat.
>
> So he has adopted one of your better ideas. Why not follow his
> lead on how to implement it?

Using the kernel crypto API one can relieve the CPU of the crypto work, if a hardware or assembler implementation is available. This may be of particular interest for smaller systems. So, for smaller systems (where kernel bloat is bad, but where these days more and more hardware crypto support is added), we must weigh the kernel bloat (of 3 or 4 additional C files for the basic kernel crypto API logic) against relieving the CPU of work.

Then, the use of the DRBG lets users choose between a Hash/HMAC and a CTR implementation to suit their needs. The DRBG code is agnostic of the underlying cipher. So, you could even use Blowfish instead of AES or whirlpool instead of SHA -- these changes are just one entry in drbg_cores[] away without any code change.

Finally, the LRNG code is completely agnostic of the underlying deterministic RNG. You only need a replacement of two small functions to invoke the seeding and generate operation of a DRNG. So, if one wants a Chacha20, he can have it. If one wants X9.31, he can have it. See section 2.8.3 [1] -- note that the DRNG does not even need to be a kernel crypto API registered implementation.

Bottom line, I want to give folks full flexibility. That said, the LRNG code is more of a logic to collect entropy and maintain two DRNG types which are seeded according to a defined schedule than it is a tightly integrated RNG.

Also, I am not so sure that simply taking a cipher and sprinkling some backtracking logic on it implies you have a good DRNG.
As of now, I have not seen assessments from others for the Chacha20 DRNG approach. I personally would think that the Chacha20 approach from Ted is good. Yet others may have a more conservative approach of using a DRNG implementation that has been reviewed by a lot of folks.

[1] http://www.chronox.de/lrng/doc/lrng.pdf

Ciao
Stephan
Re: AES-NI: slower than aes-generic?
Stephan Mueller wrote:

> for the DRBG and the LRNG work I am doing, I also test the speed of the
> DRBG. The DRBG can be considered as a form of block chaining mode on top
> of a raw cipher.
>
> What I am wondering about is that when encrypting 256 16-byte blocks, I get
> a speed of about 170 MB/s with the AES-NI driver. When using the
> aes-generic or aes-asm, I get up to 180 MB/s with all else being equal.
> Note that this figure includes a copy_to_user of the generated data.

Why are you using AES? Granted, it is a reasonable idea, but when Ted replaced the non-blocking pool with a DRBG, he used a different cipher (I think chacha, not certain) and I think chose not to use the crypto library implementation to avoid kernel bloat.

So he has adopted one of your better ideas. Why not follow his lead on how to implement it?
Re: AES-NI: slower than aes-generic?
On Thursday, 26 May 2016, 19:30:01, Stephan Mueller wrote:

Hi,

> However, the key difference to a standard speed test is that I set up a new
> key schedule quite frequently. And I would suspect that something is going
> on here...

With tcrypt, there is an interesting hint: on smaller blocks, the C implementation is indeed faster:

[   20.391510] testing speed of async ecb(aes) (ecb(aes-generic)) encryption
[   20.391513] test 0 (128 bit key, 16 byte blocks): 1 operation in 275 cycles (16 bytes)
[   20.391517] test 1 (128 bit key, 64 byte blocks): 1 operation in 702 cycles (64 bytes)
[   20.391521] test 2 (128 bit key, 256 byte blocks): 1 operation in 2431 cycles (256 bytes)
[   20.391532] test 3 (128 bit key, 1024 byte blocks): 1 operation in 9347 cycles (1024 bytes)
[   20.391570] test 4 (128 bit key, 8192 byte blocks): 1 operation in 74375 cycles (8192 bytes)

vs for ecb-aes-aesni:

[  143.482123] test 0 (128 bit key, 16 byte blocks): 1 operation in 1203 cycles (16 bytes)
[  143.482138] test 1 (128 bit key, 64 byte blocks): 1 operation in 1328 cycles (64 bytes)
[  143.482148] test 2 (128 bit key, 256 byte blocks): 1 operation in 1922 cycles (256 bytes)
[  143.482159] test 3 (128 bit key, 1024 byte blocks): 1 operation in 3328 cycles (1024 bytes)
[  143.482176] test 4 (128 bit key, 8192 byte blocks): 1 operation in 19483 cycles (8192 bytes)

As I use crypto_cipher_encrypt_one, I only send one block at a time to AES-NI.

Ciao
Stephan
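The tcrypt figures are easier to compare as cycles per byte. The following Python sketch uses only the cycle counts quoted in this mail; the dictionary and function names are invented for the example.

```python
# Cycles per operation from the tcrypt output (128-bit key, ECB),
# keyed by request size in bytes.
AES_GENERIC = {16: 275, 64: 702, 256: 2431, 1024: 9347, 8192: 74375}
AES_NI = {16: 1203, 64: 1328, 256: 1922, 1024: 3328, 8192: 19483}


def cycles_per_byte(table):
    """Convert cycles-per-operation into cycles-per-byte for each size."""
    return {size: cycles / size for size, cycles in table.items()}


def crossover(generic, aesni):
    """Smallest measured request size at which AES-NI needs fewer cycles."""
    for size in sorted(generic):
        if aesni[size] < generic[size]:
            return size
    return None


if __name__ == "__main__":
    gen = cycles_per_byte(AES_GENERIC)
    ni = cycles_per_byte(AES_NI)
    for size in sorted(gen):
        print(f"{size:5d} B  generic {gen[size]:6.2f} c/B  aesni {ni[size]:6.2f} c/B")
    print("AES-NI is faster from", crossover(AES_GENERIC, AES_NI), "bytes upward")
```

On these numbers the AES-NI path carries a large fixed cost per call (about 75 cycles per byte for a single 16-byte block versus about 17 for aes-generic) and only overtakes the C implementation somewhere between 64-byte and 256-byte requests, which fits the observation that one block is submitted at a time.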
Re: AES-NI: slower than aes-generic?
> What I am wondering about is that when encrypting 256 16-byte blocks, I get
> a speed of about 170 MB/s with the AES-NI driver. When using the
> aes-generic or aes-asm, I get up to 180 MB/s with all else being equal.
> Note that this figure includes a copy_to_user of the generated data.
>
> ...

Something sounds amiss.

AES-NI should be an order of magnitude faster than a generic implementation. Can you verify AES-NI is actually using AES-NI, and aes-generic is a software implementation?

Here are some OpenSSL numbers. EVP uses AES-NI when available. Omitting -evp means it's software only (no hardware acceleration, like AES-NI).

$ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     626533.60k   669884.42k   680917.93k   682079.91k   684736.51k

$ openssl speed -elapsed aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     106520.59k   114380.16k   116741.46k   117489.32k   117563.39k

Jeff
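For reference, the ratio between the two OpenSSL runs can be computed directly. This Python sketch uses only the throughput figures quoted above (in units of 1000 bytes per second); the names are chosen for the example.

```python
# Throughput in 1000s of bytes per second, keyed by buffer size in bytes.
EVP_AES_NI = {16: 626533.60, 64: 669884.42, 256: 680917.93,
              1024: 682079.91, 8192: 684736.51}
SOFTWARE_ONLY = {16: 106520.59, 64: 114380.16, 256: 116741.46,
                 1024: 117489.32, 8192: 117563.39}


def speedup(fast, slow):
    """Per-buffer-size throughput ratio fast/slow."""
    return {size: fast[size] / slow[size] for size in fast}


if __name__ == "__main__":
    for size, ratio in sorted(speedup(EVP_AES_NI, SOFTWARE_ONLY).items()):
        print(f"{size:5d} B: {ratio:.2f}x")
```

On this particular machine the hardware path comes out a little under six times faster at every buffer size, and in user space the advantage does not shrink for small buffers the way it does in the in-kernel measurement.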
Re: AES-NI: slower than aes-generic?
On Thursday, 26 May 2016, 13:25:02, Jeffrey Walton wrote:

Hi Jeffrey,

> > What I am wondering about is that when encrypting 256 16-byte blocks, I
> > get a speed of about 170 MB/s with the AES-NI driver. When using the
> > aes-generic or aes-asm, I get up to 180 MB/s with all else being equal.
> > Note that this figure includes a copy_to_user of the generated data.
> >
> > ...
>
> Something sounds amiss.
>
> AES-NI should be an order of magnitude faster than a generic
> implementation. Can you verify AES-NI is actually using AES-NI, and
> aes-generic is a software implementation?

I am pretty sure I am using the right implementations as I checked the refcount in /proc/crypto.

> Here are some OpenSSL numbers. EVP uses AES-NI when available.
> Omitting -evp means it's software only (no hardware acceleration, like
> AES-NI).

I understand that AES-NI should be faster. That is what I am wondering about.

However, the key difference to a standard speed test is that I set up a new key schedule quite frequently. And I would suspect that something is going on here...

Ciao
Stephan
AES-NI: slower than aes-generic?
Hi,

for the DRBG and the LRNG work I am doing, I also test the speed of the DRBG. The DRBG can be considered as a form of block chaining mode on top of a raw cipher.

What I am wondering about is that when encrypting 256 16-byte blocks, I get a speed of about 170 MB/s with the AES-NI driver. When using the aes-generic or aes-asm, I get up to 180 MB/s with all else being equal. Note that this figure includes a copy_to_user of the generated data.

To be precise, the code does the following steps:

1. setkey

2. AES encrypt 256 blocks and copy_to_user

3. Use AES to generate a new key and start at step 1.

I am wondering why the AES-NI driver is slower than the C/ASM implementations given that all else is equal.

Note, if I have fewer blocks in step 2 above, AES-NI becomes faster. E.g. if I have 8 blocks or just one block in step 2 above, AES-NI is faster by 10 or 20%.

Ciao
Stephan
--
atsec information security GmbH, Steinstraße 70, 81667 München, Germany
P: +49 89 442 49 830 - F: +49 89 442 49 831
M: +49 172 216 55 78 - HRB: 129439 (Amtsgericht München)
US: +1 949 545 4096
GF: Salvatore la Pietra, Staffan Persson
atsec it security news blog - atsec-information-security.blogspot.com
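The three steps can be sketched in user space to make the control flow and the number of setkey calls concrete. This is only a structural model in Python, not the kernel code: the standard library has no AES, so a truncated HMAC-SHA256 stands in for one block encryption, copy_to_user is replaced by appending to a buffer, and all names are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 16       # AES block size
BLOCKS_PER_KEY = 256   # blocks produced before a new key is set up


def block_fn(key: bytes, counter: int) -> bytes:
    """Stand-in for one AES block encryption (this is NOT AES)."""
    msg = counter.to_bytes(16, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK_BYTES]


def generate(key: bytes, nbytes: int):
    """Return (output, number of key setups) for the three-step loop."""
    out = bytearray()
    setkeys = 0
    while len(out) < nbytes:
        setkeys += 1                          # step 1: setkey
        for ctr in range(BLOCKS_PER_KEY):     # step 2: produce blocks
            out += block_fn(key, ctr)
            if len(out) >= nbytes:
                break
        # step 3: use the cipher itself to derive the next key
        key = block_fn(key, BLOCKS_PER_KEY) + block_fn(key, BLOCKS_PER_KEY + 1)
    return bytes(out[:nbytes]), setkeys


if __name__ == "__main__":
    data, setkeys = generate(b"\x00" * 16, 3 * 4096)
    print(len(data), "bytes generated with", setkeys, "key setups")
```

In this model every 4096 bytes of output costs one key setup plus 256 single-block calls, so any fixed per-call overhead in the cipher driver is paid 256 times per key.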