Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP. My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?. Perhaps a compromise ?. Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine ?. Peter From: Huang Ying [EMAIL PROTECTED] To: openssl-dev@openssl.org openssl-dev@openssl.org Date: 12/11/2008 05:06 PM Subject:Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote: Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack. Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked. Best Regards, Huang Ying [attachment signature.asc deleted by Peter Waltenberg/Australia/IBM] __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager [EMAIL PROTECTED]
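For readers following the "copy the key schedule to an aligned location on stack, then zero the area ASAP" idea, here is a minimal sketch in portable C11. It is not code from the patch under discussion: `KS_BYTES`, `wipe()`, `use_aligned_schedule()` and `with_aligned_copy()` are hypothetical names, and `wipe()` only stands in for what `OPENSSL_cleanse()` would do in real code.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_BYTES 240  /* assumption: 15 round keys x 16 bytes (AES-256) */

/* Hypothetical stand-in for OPENSSL_cleanse(): writing through a volatile
 * pointer discourages the compiler from dropping the wipe as a dead store. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n--)
        *vp++ = 0;
}

/* Hypothetical consumer that needs a 16-byte-aligned schedule, as a
 * movaps-based code path would. Returns 1 if the pointer is aligned. */
static int use_aligned_schedule(const unsigned char *ks)
{
    return ((uintptr_t)ks % 16) == 0;
}

/* Copy a possibly unaligned schedule to an aligned stack buffer, use it,
 * then wipe the copy before returning. Returns 0 if len is too large. */
int with_aligned_copy(const unsigned char *unaligned_ks, size_t len)
{
    alignas(16) unsigned char tmp[KS_BYTES];
    int ok;

    if (len > sizeof(tmp))
        return 0;
    memcpy(tmp, unaligned_ks, len);
    ok = use_aligned_schedule(tmp);
    wipe(tmp, sizeof(tmp));   /* zero the area as soon as we are done */
    return ok;
}
```

The wipe shortens the window in which the copy exists; it does not by itself stop the page from being swapped out while the copy is live.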
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote:
> Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP. My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins?

Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather than ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option.

In any case, the solution is not to say "nah, that's not quite what I need - I'd rather patch into the default s/w implementation instead and disable engine support". The default s/w implementation is a legacy notion maintained for backward compatibility (in an ideal world, that would be an engine too, in fact it would be multiple engines - asm, no-asm, [etc]...)

The issues of signatures and dynamic loading are questions of build configuration, which distinguish between what is (a) built in to the libcrypto image and registered/enabled by ENGINE_load_builtin_engines(), and (b) possibly external to libcrypto and must be explicitly enabled/loaded. If you have issues with those semantics (and I know of a few, but fixing and simplifying them will require deprecating the notion of s/w fallback), then patches to crypto/engine/ are welcome (as they would be for ./Configure).

Cheers, Geoff
--
Un terrien, c'est un singe avec des clefs de char...
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, Dec 11, 2008 at 10:03:32AM -0500, Geoff Thorpe wrote: Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. I'm surprised this can be certified for FIPS. Are you sure it is the case for the FIPS module? Consider that eng_cryptodev will in many cases end up using unknown -- and thus presumptively unvalidated -- hardware implementations of most of the core algorithms, in some cases even software implementations in the kernel. I would be surprised that the test lab would allow 'hooks' like this in the FIPS module. Thor __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager [EMAIL PROTECTED]
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 10:52:36 Thor Lancelot Simon wrote: On Thu, Dec 11, 2008 at 10:03:32AM -0500, Geoff Thorpe wrote: Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. I'm surprised this can be certified for FIPS. Are you sure it is the case for the FIPS module? Consider that eng_cryptodev will in many cases end up using unknown -- and thus presumptively unvalidated -- hardware implementations of most of the core algorithms, in some cases even software implementations in the kernel. I would be surprised that the test lab would allow 'hooks' like this in the FIPS module. I'm not saying they did, do, would, nor should. But your point about unvalidated hardware implementations would equally cover the alternative intel code-path, just as it would(/might/could/has) cover(ed) the via-padlock code-path. The engine issue is about how the C code is laid out, not about what was, is, or should be validated, nor is it *necessarily* about what is static versus dynamic. My point is that there's zero technical argument for bypassing a mechanism whose sole purpose is to organise algorithm implementations, so that one can instead directly hack/patch a pre-existing implementation. If the engine mechanism needs work to meet or beat requirements, that's where the effort should go. Andy can speak for himself, but I *think* that was his point too. Caveat: I haven't been involved in FIPS at all BTW, so I find myself in the couldn't care less category regarding the details. But if there's something that needs to be locked down (ie. fixed in the C code), we can surely lock it down as a part *of* the engine code rather than locking out engine code altogether. If this isn't the case then long-term, FIPS work will diverge significantly from (a) ongoing development, and (b) common sense. 
Side-note: I've heard people who *are* involved in the FIPS work speak of it in fairly colourful ways - I suspect they'd agree that the standard (and its process) is little more than bureaucratic, superstitious B-S. IANAL, FWIW, etc. :-) Cheers, Geoff
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
Geoff Thorpe wrote:
> ...
> Caveat: I haven't been involved in FIPS at all BTW, so I find myself in the "couldn't care less" category regarding the details. But if there's something that needs to be locked down (ie. fixed in the C code), we can surely lock it down as a part *of* the engine code rather than locking out engine code altogether. If this isn't the case then long-term, FIPS work will diverge significantly from (a) ongoing development, and (b) common sense.

FIPS 140-2 level 1 does NOT require that all possible invalid modes of operation of the validated module be absolutely locked down. Many validated modules contain within the module boundary implementations of disallowed algorithms. However FIPS 140-2 does require through the Security Policy that the end user not utilize any such disallowed functionality while in the validated mode of operation. In other words, the fact that the module *could* be misused is OK as long as it *isn't* misused.

That said, in the OpenSSL FIPS Object Modules we've taken care to automatically disable disallowed functionality wherever feasible, to aid the end user with the responsibility of not using such functionality.

> Side-note: I've heard people who *are* involved in the FIPS work speak of it in fairly colourful ways - I suspect they'd agree that the standard (and its process) is little more than bureaucratic, superstitious B-S. IANAL, FWIW, etc. :-)

Well, as with most such initiatives the FIPS validation program was initiated with the best of intentions and IMHO in the beginning served a useful purpose. Other than that, no comment :-)

-Steve M.
--
Steve Marquess
Open Source Software Institute
[EMAIL PROTECTED]
openSSL : digest command (md5) to crypto driver
Hi All, I am having an issue when using the digest command from openssl. When I issue the md5 digest command from openssl, the kernel side never receives the CIOCGSESSION ioctl with sop->mac set, nor the CIOCCRYPT ioctl with a mac operation set, even though the crypto driver I have written registers new-session, free-session and process functions for CRYPTO_MD5 and CRYPTO_MD5_HMAC. But when I issue des/3des/aes encryption commands from openssl, the open crypto device on the kernel side receives the proper ioctls and calls my crypto driver's new-session and process functions with sop->cipher and the other cipher-related fields set. Is there anything I might be missing in my driver, or is there anything I have to enable in order to receive digest commands? BTW, I don't have any engine support, so I don't pass engine parameters when issuing commands from openssl. Thanks, MB.
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote: On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote: Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP. My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?. Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option. The difference between Intel AES-NI and padlock is that padlock provide support for different modes directly including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for AES core block algorithm NOT for modes directly. At the same time, AES-NI pipelining implementation can benefit ecb encrypt and cbc decrypt and counter mode. If we implement AES-NI with branch, we can get full power of AES-NI except ecb and ctr mode. If we implement AES-NI with engine, we can get full power of AES-NI for all modes. 
But we must duplicate mode implementations that cannot benefit from AES-NI, such as cfb, ofb, etc. Are you OK with code duplication? So I think the best method is to implement AES-NI with both branch and engine. With the branch version, we get the full power of AES-NI for cbc, cfb and ofb mode. At the same time the engine version can provide further acceleration for ecb and ctr mode. Best Regards, Huang Ying
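The pipelining point above (ecb encrypt, cbc decrypt and counter mode benefit; cbc encrypt does not) comes down to data dependencies between block-cipher calls. The following is only an illustrative sketch: `toy_enc()`/`toy_dec()` are a made-up invertible 32-bit permutation, NOT AES, and all function names are hypothetical. What matters is which loops feed one call's output into the next call's input.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy invertible "block cipher" on 32-bit blocks. NOT AES and not secure;
 * it only gives the mode loops something to call. */
static uint32_t toy_enc(uint32_t x, uint32_t k)
{
    x ^= k;
    x = (x << 7) | (x >> 25);   /* rotate left by 7 */
    return x + k;
}

static uint32_t toy_dec(uint32_t y, uint32_t k)
{
    y -= k;
    y = (y >> 7) | (y << 25);   /* rotate right by 7 */
    return y ^ k;
}

/* CBC encrypt: block i needs ciphertext i-1 as input, so the toy_enc()
 * calls form a serial chain and cannot overlap. */
void cbc_encrypt(const uint32_t *in, uint32_t *out, size_t n,
                 uint32_t k, uint32_t iv)
{
    uint32_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        prev = toy_enc(in[i] ^ prev, k);
        out[i] = prev;
    }
}

/* CBC decrypt: every toy_dec() input is ciphertext that is already known,
 * so the calls are independent and could be interleaved.
 * Note: in and out must not overlap in this sketch. */
void cbc_decrypt(const uint32_t *in, uint32_t *out, size_t n,
                 uint32_t k, uint32_t iv)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t prev = (i == 0) ? iv : in[i - 1];
        out[i] = toy_dec(in[i], k) ^ prev;
    }
}

/* CTR: each keystream block depends only on the counter value, so all
 * toy_enc() calls are independent. The same routine encrypts and decrypts. */
void ctr_crypt(const uint32_t *in, uint32_t *out, size_t n,
               uint32_t k, uint32_t nonce)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_enc(nonce + (uint32_t)i, k);
}
```

In the independent loops, several block operations can be in flight at once, which is where a pipelined instruction implementation gains; in the chained loop only the per-block latency improves.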
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote: Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP. At least on UNIX system, mlock() can be used to prevent specified memory range from swapping out. Maybe we should put all key schedule into a memory area protected by mlock()? That is safer than stack I think. Best Regards, Huang Ying My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?. Perhaps a compromise ?. Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine ?. Peter From: Huang Ying ying.hu...@intel.com To: openssl-dev@openssl.org openssl-dev@openssl.org Date: 12/11/2008 05:06 PM Subject:Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote: Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack. Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked. 
Best Regards, Huang Ying
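As a rough illustration of the mlock() suggestion, the sketch below allocates a page-aligned buffer, tries to lock it, and wipes it before release. It assumes a POSIX system; `locked_buf`, `locked_buf_alloc()` and `locked_buf_free()` are hypothetical names, not OpenSSL API. mlock() can fail (for example when the process is over its locked-memory limit), so the sketch records whether locking succeeded instead of treating failure as fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical holder for a key-schedule area that we try to keep out of swap. */
typedef struct {
    void  *mem;     /* page-aligned allocation */
    size_t len;     /* rounded up to a whole number of pages */
    int    locked;  /* 1 if mlock() succeeded */
} locked_buf;

/* Allocate at least `want` bytes, page aligned, and attempt to lock them.
 * Returns 0 on success (even if locking was refused), -1 on failure. */
int locked_buf_alloc(locked_buf *b, size_t want)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t pg;

    if (page <= 0 || want == 0)
        return -1;
    pg = (size_t)page;
    b->len = (want + pg - 1) / pg * pg;
    if (posix_memalign(&b->mem, pg, b->len) != 0)
        return -1;
    b->locked = (mlock(b->mem, b->len) == 0);
    return 0;
}

/* Wipe, unlock and free the buffer. */
void locked_buf_free(locked_buf *b)
{
    volatile unsigned char *p = b->mem;

    for (size_t i = 0; i < b->len; i++)
        p[i] = 0;               /* real code would use OPENSSL_cleanse() */
    if (b->locked)
        munlock(b->mem, b->len);
    free(b->mem);
    b->mem = NULL;
}
```

Whether every key schedule deserves this treatment is exactly the policy question raised in the thread; the sketch only shows the mechanics.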
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
I think it's overly paranoid :). No other keys are protected in that way. Also: If mlock()'d memory isn't automatically free'd when a process crashes you risk DOS'ing the OS eventually as well. (I'm not sure whether that's the case or not, just a vague memory from the past.). I'd certainly prefer the code in the core, with extended function as an engine. Steve Marquess described the problem with certification and engines pretty well, give the certifiers anything they can complain about and the process gets 10x harder/slower, it's a lot easier if we can have a clean we don't do that statement. Peter From: Huang Ying ying.hu...@intel.com To: openssl-dev@openssl.org openssl-dev@openssl.org Date: 12/12/2008 11:47 Subject:Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote: Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP. At least on UNIX system, mlock() can be used to prevent specified memory range from swapping out. Maybe we should put all key schedule into a memory area protected by mlock()? That is safer than stack I think. Best Regards, Huang Ying My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?. Perhaps a compromise ?. Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine ?. 
RE: Realligning const void *data variables into 32-bit boundaries
From: owner-openssl-...@openssl.org On Behalf Of Ger Hobbelt
Sent: Wednesday, 10 December, 2008 12:51
To: openssl-dev@openssl.org
Subject: Re: Realligning const void *data variables into 32-bit boundaries

> Processing unaligned data in an aligned fashion always requires some data copying. There's two different problems, each with a slightly different solution.

<snip>

> One note: type casting doesn't modify the pointer value (check your ANSI/ISO C89/C99 standard references). What you need is data at an 'aligned pointer'.

Yes and no. Standard C allows different data pointer types to have different representations (with a few exceptions not relevant here). This was because several important architectures had varying pointer/address formats, primarily for word versus byte/character, so converting a pointer on those machines uses actual code to move bits around. Nowadays all mainstream (i.e. desktop) architectures use all byte addressing, so pointer conversions are no-ops.

Standard C also does NOT define the results of casting a pointer value that doesn't meet alignment requirements of the target type; this falls in the category of Undefined Behavior and according to the standard the implementation is permitted to do anything. The purpose for this, and in practice the result, is that a C implementation just accepts whatever the underlying hardware/OS does, which in all known cases is a fault (e.g. exception), a fixup, or accessing a changed address.

But (I believe) your point was that casting an unaligned pointer doesn't (reliably) make it aligned. That's true, and important.

> For this, there's a way too: get a buffer somewhere (malloc() or stack); we will assume this buffer is unaligned, then align it as needed. Hence to process W words, the size of the buffer MUST be W words PLUS extra (wordsize-1) bytes, to allow for aligning the pointer.
>
> Same code off the top of my head (bugs in it come free):
>
> // for C89:
> typedef unsigned char byte;

Did you mean this single line 'for C89' or the whole code? There is no difference between C89 and C99 as to a type named 'byte', so the typedef is equally needed in C99. (And // comments aren't C89!) (C99 library has new 'exact' types like uint8_t, only if you #include <stdint.h> or <inttypes.h>, but not any named byte.)

> void process_unaligned_data_in_aligned_fashion
> ( void *src /* unaligned source */
> , size_t srclen /* ASSUME padding has already been taken care of: this one is already 'wordsize' aligned. VALUE is therefore in WORDS, _NOT_ bytes! */
> , int wordsize
> )
> {
>     size_t buflen;
>     /* allocate buffer for aligning; allow for unaligned result: */
>     void *rawptr = OPENSSL_malloc(srclen * wordsize + wordsize - 1);
>     /* calc aligned pointer for target buf: shift UP */
>     int shift = wordsize;
>     shift -= ((int)rawptr) % wordsize;

Use unsigned. int is not necessarily big enough for (all) addresses to be positive, and % of a negative value probably won't work as needed in C89 and definitely won't in C99. Technically the conversion to any integer type is implementation-defined but nonnormatively 'intended to be unsurprising' if I remember the wording correctly. In practice if int/uint is narrower than the address width, implementations will just drop the high bits, so this only works correctly if wordsize is (always) a power of 2. Which is true on all known machines, and therefore (unsigned)rawptr & (wordsize-1) gives the same result and is almost always more efficient.

>     byte *al_ptr = (byte *)rawptr;
>     al_ptr += shift;

Declaration after (executable) statement in a block is C99 or C++ only. Easily fixed; for that matter, we don't really need shift as a separate variable, I assume you just used it for greater explanatoryness.

Note that this method will give (byte*)rawptr+wordsize in the case rawptr is already aligned, which for _malloc it *usually* will be. (Specifically, if OPENSSL_malloc actually uses C malloc, as it usually does, that is required to be aligned for all datatypes supported by the C implementation; that should be most, though maybe not all, of the types/sizes used in crypto or an engine or whatever.) Thus you need to allocate

    srclen*wordsize + wordsize /* no -1 */

Or, we can easily use (byte*)rawptr+0 in that 'special' case:

    al_ptr = ((unsigned)rawptr + wordsize-1) & -wordsize;

>     /* now al_ptr is aligned at 'wordsize' aligned memory address */
>     memcpy(al_ptr, src, srclen * wordsize);
>     /* perform word-aligned operation: */
>     do_aligned_thing();

on al_ptr for srclen*wordsize, or maybe part(s) thereof, which you need to pass to it (either directly or indirectly).

>     ...
>     IF the aligned processing modified the data, need to copy it back.

Don't forget _cleanse if appropriate and _free.

> So far, 'C class 102'. ;-)
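Pulling the review comments above together, a cleaned-up version of the align-and-copy technique might look like the sketch below. It is an illustration, not the original poster's code: `align_up()`, `process_aligned()`, `aligned_fn` and `flip_bytes()` are hypothetical names, sizes are in bytes rather than words, plain `malloc()`/`memset()` stand in for `OPENSSL_malloc()`/`OPENSSL_cleanse()`, and it assumes `uintptr_t` is available (optional in C99 but provided by mainstream implementations).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round p up to the next multiple of align (align must be a power of two).
 * Returns p unchanged when it is already aligned, so align-1 bytes of
 * slack in the allocation are always enough. */
static unsigned char *align_up(void *p, size_t align)
{
    uintptr_t a = (uintptr_t)p;
    a = (a + (align - 1)) & ~(uintptr_t)(align - 1);
    return (unsigned char *)a;
}

/* Callback type for the operation that needs aligned data. */
typedef void (*aligned_fn)(void *aligned, size_t nbytes);

/* Example callback: flips every byte. Stands in for do_aligned_thing(). */
void flip_bytes(void *aligned, size_t nbytes)
{
    unsigned char *p = aligned;
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (unsigned char)~p[i];
}

/* Copy data into an aligned scratch buffer, run fn on it, copy the result
 * back, then wipe and free the scratch buffer. Returns 0 on success. */
int process_aligned(void *data, size_t nbytes, size_t align, aligned_fn fn)
{
    unsigned char *raw, *al;

    if (align == 0 || (align & (align - 1)) != 0)
        return -1;                        /* not a power of two */
    if (nbytes == 0)
        return 0;
    raw = malloc(nbytes + align - 1);     /* worst-case alignment slack */
    if (raw == NULL)
        return -1;
    al = align_up(raw, align);
    memcpy(al, data, nbytes);
    fn(al, nbytes);
    memcpy(data, al, nbytes);             /* copy results back */
    memset(raw, 0, nbytes + align - 1);   /* real code: OPENSSL_cleanse() */
    free(raw);
    return 0;
}
```

A call such as `process_aligned(buf, len, 16, flip_bytes)` runs the callback on a 16-byte-aligned copy regardless of how `buf` itself is aligned.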
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 20:39:41 Huang Ying wrote: On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote: Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option. The difference between Intel AES-NI and padlock is that padlock provide support for different modes directly including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for AES core block algorithm NOT for modes directly. At the same time, AES-NI pipelining implementation can benefit ecb encrypt and cbc decrypt and counter mode. If we implement AES-NI with branch, we can get full power of AES-NI except ecb and ctr mode. If we implement AES-NI with engine, we can get full power of AES-NI for all modes. But we must duplicate mode implementations that can not benefit from AES-NI, such as cfb, ofb, etc. Do you OK with code duplication? The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so. Please look at the padlock engine implementation, particularly the use of the DECLARE_AES_EVP macro. I humbly suggest this is a better way to go. 
However I see no reason why it couldn't go into crypto/engine/ (like eng_cryptodev.c) rather than in engines/ (like e_padlock.c) so that it is compiled in and loaded by default. Also, the load function for your engine could perform the run-time check for processor type, and only register its implementations with the engine infrastructure if the processor supports the AES-NI extensions (similar to padlock_available() for example). It would then suffice to add the -DHAVE_INTEL_AES_NI define to applicable build targets in Configure (presumably x86_64-related targets). Cheers, Geoff
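To make the "check the processor at load time and only register what you accelerate" suggestion concrete, here is a small sketch. The CPUID part relies on the `<cpuid.h>` helper shipped with GCC and Clang (an assumption about the toolchain) and on the AES-NI feature flag being bit 25 of ECX for CPUID leaf 1. The registration part deliberately does not use the real ENGINE API: the `ID_*` values, `impl_t`, `toy_engine_load()` and `lookup()` are hypothetical stand-ins for nids and for the engine table.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; toolchain assumption */
#endif

/* Returns 1 if CPUID leaf 1 reports AES-NI (ECX bit 25), otherwise 0.
 * On non-x86 builds it always returns 0. */
int aesni_available(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 25) & 1u);
#else
    return 0;
#endif
}

/* Hypothetical identifiers standing in for (cipher, key length, mode) nids. */
enum { ID_AES128_ECB, ID_AES128_CBC, ID_AES128_CFB, ID_AES128_OFB, ID_COUNT };

typedef enum { IMPL_DEFAULT = 0, IMPL_AESNI } impl_t;

/* Toy dispatch table; zero-initialised, so every id starts at IMPL_DEFAULT. */
static impl_t impl_table[ID_COUNT];

/* Toy "load function": register only the combinations this engine
 * implements, and only when the hardware check passed. */
void toy_engine_load(int hw_ok)
{
    static const int accelerated[] = { ID_AES128_ECB, ID_AES128_CBC };
    size_t i;

    if (!hw_ok)
        return;
    for (i = 0; i < sizeof(accelerated) / sizeof(accelerated[0]); i++)
        impl_table[accelerated[i]] = IMPL_AESNI;
}

/* Which implementation would serve this id? */
impl_t lookup(int id)
{
    return impl_table[id];
}
```

A real load function would call something like `toy_engine_load(aesni_available())`; anything it does not register keeps using the default implementation, which is the per-nid granularity described above.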
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote: On Thursday 11 December 2008 20:39:41 Huang Ying wrote: On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote: Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option. The difference between Intel AES-NI and padlock is that padlock provide support for different modes directly including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for AES core block algorithm NOT for modes directly. At the same time, AES-NI pipelining implementation can benefit ecb encrypt and cbc decrypt and counter mode. If we implement AES-NI with branch, we can get full power of AES-NI except ecb and ctr mode. If we implement AES-NI with engine, we can get full power of AES-NI for all modes. But we must duplicate mode implementations that can not benefit from AES-NI, such as cfb, ofb, etc. Do you OK with code duplication? The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so. 
The situation is:

- We implement cbc and ecb mode in engine.
- If we implement cfb and ofb in engine too, we will duplicate the code of cfb and ofb mode itself.
- If we do not implement cfb and ofb in engine, there is no code duplication, BUT we can NOT get AES-NI acceleration for the AES core block algorithm (which benefits cfb and ofb too) until we have a branch version.

So my suggestion is:

- Accelerate the AES core block algorithm with the branch version, which is used by cbc, cfb and ofb too.
- Accelerate AES ecb and ctr? with the engine version.

Best Regards, Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 23:02:12 Huang Ying wrote: On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote: The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so. The situation is: - We implement cbc and ecb mode in engine - If we implement cfb and ofb in engine too, we will duplicate code of cfb and ofb mode itself. - If we do not implement cfb and ofb in engine, no code duplication, BUT we can NOT get AES-NI acceleration for AES core block algorithm (which benefit cfb and ofb too) until we have a branch version. OK, I (mis)understood from your original mail that you could only accelerate a subset of modes. If you can accelerate them all, then please do so by implementing an intel/aes-ni engine. But not by branching in the vanilla implementation. So my suggestion is: - Accelerate AES core block algorithm with branch version. Which is used by cbc, cfb and ofb too. - Accelerate AES ecb and ctr? with engine version. And my suggestion is: - write an engine for your hardware. It is my intention to eventually engineify the default s/w implementations anyway - their static existence is a legacy thing that will get deprecated eventually (that way, if you don't want them and don't load them, or only load a subset of them, your linker and run-time footprint will shrink). At that point the whole issue of duplication is moot because someone will use the vanilla implementation *OR* your AES-NI engine, but not both. Even for now, the duplication issue is a red herring (especially when weighed against the inelegant mess it'll make of the code). So please, just write an engine and see how that goes. 
Cheers, Geoff