Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
>>> The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so.
>>
>> The situation is:
>>
>> - We implement cbc and ecb mode in engine
>> - If we implement cfb and ofb in engine too, we will duplicate code of cfb and ofb mode itself.

The plan is to consolidate mode implementations, so it doesn't have to be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

>> - If we do not implement cfb and ofb in engine, no code duplication, BUT we can NOT get AES-NI acceleration for AES core block algorithm (which benefit cfb and ofb too) until we have a branch version.
>
> OK, I (mis)understood from your original mail that you could only accelerate a subset of modes.

Just to clarify the CBC situation. While it's absolutely correct that *de*cryption is the one that can take full advantage of pipe-lining, a dedicated *en*cryption procedure should also be implemented in assembler. Why? It doesn't come as a surprise that CBC timing is the sum of time spent in the block procedure and time spent performing the block chaining. The latter can be underestimated, and as the block procedure gets faster it actually does get underestimated. I reckon that with a 4x faster block procedure, C timing for block chaining would be comparable with the block procedure. This in turn means that overall performance would be almost twice as low as if chaining was implemented in assembler. This applies to x86_64; on x86 the performance loss would be even higher...

> If you can accelerate them all, then please do so by implementing an intel/aes-ni engine. But not by branching in the vanilla implementation.
>>
>> So my suggestion is:
>>
>> - Accelerate AES core block algorithm with branch version. Which is used by cbc, cfb and ofb too.
>> - Accelerate AES ecb and ctr? with engine version.
>
> And my suggestion is:
>
> - write an engine for your hardware.

I second it. And an additional note. As the padlock engine was mentioned, I can imagine that the idea of using inline assembler will pop up in the head. Please don't! As already mentioned, we support other compilers as well and it's favorable if gcc-isms can be avoided. Well, in the 32-bit case it might be acceptable (both GNU and Microsoft compilers support inline assembler), but not in the 64-bit case (GNU is the only one supporting inline assembler).

As for FIPS. Given current precedent it should be noted that if the branch version is certified, then the branch becomes bound to be taken. In other words, the branched version would be prohibited from reaching the certified mode of operation on a CPU that does not support the instruction set extension in question. Then why does it have to be branched? Having this in mind, wouldn't it make as much sense to implement a module that can be used as a *drop-in replacement* for aes-[586|x86_64].pl? So that those who are willing to pursue certification for given hardware can do so with not so much hassle(*)? Would it be effort duplication? Does not have to be! Because the code can be used in engine context just as well...

Now to practicalities. What I can do to help. I can put together perl scripts for x86_64 and x86, which can be used as a drop-in replacement for aes-[586|x86_64].pl as well as in engine context. Note that a drop-in replacement implies presence of a CBC procedure, though I'd be reluctant to implement a pipe-lined version. At least not without further consideration, because it might turn out that the pipe-lined version doesn't have to be monolithic. Most notably, one can break decryption into multi-block ECB and separate multi-block chaining to minimize development effort.

A.

(*) such narrow platform binding is hardly of general interest till the given instruction set implementation is really commonplace (and hopefully multi-vendor:-).
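The CBC timing estimate in the message above can be written out as a small back-of-the-envelope calculation. This is only a sketch of the argument as stated: the symbols are introduced here for illustration, the "4x" figure and the "comparable" chaining cost are the author's own estimates, and the assumption that assembler chaining is nearly free is an idealisation.

```latex
% t_b: per-block time of the accelerated block procedure
% t_c: per-block time of chaining done in C
% \varepsilon: per-block chaining cost in assembler (assumed small)
\begin{align*}
T_{\text{C chaining}}   &= t_b + t_c \approx 2\,t_b && \text{(taking } t_c \approx t_b \text{ after the 4x speedup)} \\
T_{\text{asm chaining}} &= t_b + \varepsilon \approx t_b && \text{(assuming } \varepsilon \ll t_b\text{)} \\
\frac{T_{\text{C chaining}}}{T_{\text{asm chaining}}} &\approx 2
\end{align*}
```

Under these assumptions CBC encryption with C chaining takes roughly twice as long per block, which is the "almost twice as low" performance figure quoted above.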
__ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
>> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.
>
> Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked.

Not that I'm answering the question with a question, but what's wrong with movups? I mean, I consider that the question about copying the key schedule was already discussed in enough detail, but I'm left with the feeling that the other options are even less preferable. So I wonder why? Is movups expected to be so much worse? Even if input is moderately misaligned, say at a 64-bit boundary instead of 128? Can't one pipe-line it, i.e. schedule movups longer before aes[enc|dec], to amortize the additional latency, or will it be non-pipe-line-able? Even if so, it's possible to compromise with movaps and palignr. If one chooses to declare the key schedule to ensure 64-bit alignment(*), it would really have to be a single palignr case...

A.

(*) for example see http://cvs.openssl.org/fileview?f=openssl/crypto/camellia/camellia.hv=1.4.
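The options listed in the quoted text (always use unaligned loads, test alignment and pick a code path, or copy the schedule to an aligned location) can be sketched in portable C. This is an illustration only: the real code under discussion would be assembler, and the names `is_aligned16`, `choose_load_path` and `copy_to_aligned` are hypothetical helpers, not part of OpenSSL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { LOAD_ALIGNED, LOAD_UNALIGNED } load_path;

/* Returns 1 if p is on a 16-byte boundary, i.e. safe for movaps-style loads. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)15) == 0;
}

/* "Check alignment and choose": test once, then pick a code path. */
static load_path choose_load_path(const void *key_schedule)
{
    return is_aligned16(key_schedule) ? LOAD_ALIGNED : LOAD_UNALIGNED;
}

/* "Copy to aligned location": destination buffer is owned by the caller. */
static void copy_to_aligned(unsigned char *aligned_dst, const void *src, size_t len)
{
    assert(is_aligned16(aligned_dst));
    memcpy(aligned_dst, src, len);
}
```

A caller would run `choose_load_path()` once per call and branch to the matching loop; the copy variant trades that branch for an extra copy of key material, which is what the question about stack safety is about.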
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Tue, 2008-12-16 at 21:30 +0800, Andy Polyakov wrote:
>>> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.
>>
>> Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked.
>
> Not that I'm answering the question with a question, but what's wrong with movups? I mean I consider that the question about copying key schedule was already discussed in enough detail, but I'm left with feeling that other options are even less preferable. So I wonder why? Is movups expected to so much worse? Even if input is moderately misaligned, say at 64-bit boundary instead of 128? Can't one pipe-line it, i.e. schedule movups longer before aes[enc|dec], to amortize additional latency, or will it be non-pipe-line-able? Even if so, it's possible to compromise having movaps and palignr. If one chooses to declare key schedule to ensure 64-bit alignment(*) it would really have to be a single palignr case...
>
> A.

That sounds interesting. It at least deserves a try.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Tue, 2008-12-16 at 19:12 +0800, Andy Polyakov wrote:
>>>> The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so.
>>>
>>> The situation is:
>>>
>>> - We implement cbc and ecb mode in engine
>>> - If we implement cfb and ofb in engine too, we will duplicate code of cfb and ofb mode itself.
>
> The plan is to consolidate mode implementations, so it doesn't have to be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

Good! Hope that can be merged quickly.

>>> - If we do not implement cfb and ofb in engine, no code duplication, BUT we can NOT get AES-NI acceleration for AES core block algorithm (which benefit cfb and ofb too) until we have a branch version.
>>
>> OK, I (mis)understood from your original mail that you could only accelerate a subset of modes.
>
> Just to clarify CBC situation. While it's absolutely correct that *de*cryption is the one that can take full advantage of pipe-lining, dedicated *en*cryption procedure should also be implemented in assembler. Why? It doesn't come as surprise that CBC timing is sum of time spent in block procedure and time spent performing the block chaining. The latter can be underestimated and as block procedure gets faster it actually becomes underestimated. I reckon that with 4x faster block procedure, C timing for block chaining would be comparable with block procedure. This in turn means that overall performance would be almost twice as low as if chaining was implemented in assembler. This applies to x86_64, on x86 performance loss would be even higher...

OK, I will implement CBC encryption in ASM too.

>> If you can accelerate them all, then please do so by implementing an intel/aes-ni engine. But not by branching in the vanilla implementation.
>>>
>>> So my suggestion is:
>>>
>>> - Accelerate AES core block algorithm with branch version. Which is used by cbc, cfb and ofb too.
>>> - Accelerate AES ecb and ctr? with engine version.
>>
>> And my suggestion is:
>>
>> - write an engine for your hardware.
>
> I second it. And additional note. As padlock engine was mentioned, I can imagine that the idea of using inline assembler will pop up in the head. Please don't! As already mentioned we support other compilers as well and it's favorable if gcc-isms can be avoided. Well, in 32-bit case it might be acceptable (both GNU and Microsoft compilers support inline assembler), but not in 64-bit case (GNU is the only one supporting inline assembler).

OK. I will use the same format as aes-x86_64.pl.

> As for FIPS. Given current precedent it should be noted that if branch version is certified, then the branch becomes bound to be taken. In other words branched version would be prohibited to reach certified mode of operation on CPU that does not support the instruction set extension in question. Then why does it have to be branched? Having this in mind wouldn't it make as much sense to implement module that can be used as *drop-in replacement* for aes-[586|x86_64].pl? So that those who are willing to pursue certification for given hardware can do so with not so much hassle(*)? Would it be effort duplication? Does not have to be! Because the code can be used in engine context just as well...
>
> Now to practicalities. What I can do to help. I can put together perl scripts for x86_64 and x86, which can be used as drop-in replacement for aes-[586|x86_64].pl as well as in engine context. Note that drop-in replacement implies presence of CBC procedure, though I'd be reluctant to implement pipe-lined version. At least not without further consideration, because it might turn out that pipe-lined version doesn't have to be monolithic. Most notably one can break decryption into multi-block ECB and separate multi-block chaining to minimize developing effort.
>
> A.

Thank you very much. I can change the format to the perl format, but I need your help to test it on Windows 64 and to fix some issues such as SSE operands. I think an AES-NI based pipelined implementation can be a starting point for a general version.

Best Regards,
Huang Ying
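The idea of breaking CBC decryption into a multi-block ECB pass followed by a separate chaining pass can be sketched as below. This is a minimal illustration, not the proposed implementation: `toy_encrypt_block`/`toy_decrypt_block` are made-up stand-ins for the block procedure (they are NOT AES and offer no security), and the sketch assumes the output buffer does not overlap the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy stand-in for the block procedure: XOR with key, rotate bytes by one. */
static void toy_encrypt_block(unsigned char *out, const unsigned char *in,
                              const unsigned char *key)
{
    unsigned char t[BLK];
    for (int i = 0; i < BLK; i++) t[i] = in[i] ^ key[i];
    for (int i = 0; i < BLK; i++) out[i] = t[(i + 1) % BLK];
}

/* Inverse of toy_encrypt_block: undo the rotation, then undo the XOR. */
static void toy_decrypt_block(unsigned char *out, const unsigned char *in,
                              const unsigned char *key)
{
    unsigned char t[BLK];
    for (int i = 0; i < BLK; i++) t[(i + 1) % BLK] = in[i];
    for (int i = 0; i < BLK; i++) out[i] = t[i] ^ key[i];
}

/* CBC encryption is inherently serial: block b needs ciphertext b-1. */
static void cbc_encrypt(unsigned char *out, const unsigned char *in, size_t nblk,
                        const unsigned char *key, const unsigned char *iv)
{
    unsigned char chain[BLK], x[BLK];
    memcpy(chain, iv, BLK);
    for (size_t b = 0; b < nblk; b++) {
        for (int i = 0; i < BLK; i++) x[i] = in[b * BLK + i] ^ chain[i];
        toy_encrypt_block(out + b * BLK, x, key);
        memcpy(chain, out + b * BLK, BLK);
    }
}

/* CBC decryption split in two passes; out must not alias in. */
static void cbc_decrypt_split(unsigned char *out, const unsigned char *in, size_t nblk,
                              const unsigned char *key, const unsigned char *iv)
{
    /* Pass 1: multi-block ECB decrypt; blocks are independent of each other. */
    for (size_t b = 0; b < nblk; b++)
        toy_decrypt_block(out + b * BLK, in + b * BLK, key);
    /* Pass 2: chaining; XOR each block with the previous ciphertext (or IV). */
    for (size_t b = 0; b < nblk; b++) {
        const unsigned char *prev = b ? in + (b - 1) * BLK : iv;
        for (int i = 0; i < BLK; i++) out[b * BLK + i] ^= prev[i];
    }
}
```

Because pass 1 has no dependency between blocks, it is the part a pipelined block procedure could process several blocks at a time, while pass 2 stays a plain XOR loop.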
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Fri, 2008-12-12 at 12:24 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 23:02:12 Huang Ying wrote:
>> On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
>>> The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so.
>>
>> The situation is:
>>
>> - We implement cbc and ecb mode in engine
>> - If we implement cfb and ofb in engine too, we will duplicate code of cfb and ofb mode itself.
>> - If we do not implement cfb and ofb in engine, no code duplication, BUT we can NOT get AES-NI acceleration for AES core block algorithm (which benefit cfb and ofb too) until we have a branch version.
>
> OK, I (mis)understood from your original mail that you could only accelerate a subset of modes. If you can accelerate them all, then please do so by implementing an intel/aes-ni engine. But not by branching in the vanilla implementation.
>
>> So my suggestion is:
>>
>> - Accelerate AES core block algorithm with branch version. Which is used by cbc, cfb and ofb too.
>> - Accelerate AES ecb and ctr? with engine version.
>
> And my suggestion is:
>
> - write an engine for your hardware.

OK. I will write an engine version first, at least for discussion.

Best Regards,
Huang Ying
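The point about nid granularity (one identifier per algorithm, key length and mode combination) can be illustrated with a small lookup table. This sketch deliberately avoids the real ENGINE/EVP API: `cipher_id`, `engine_handles` and `select_impl` are hypothetical names, and the table contents simply mirror the "cbc and ecb in engine" case from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a nid: one entry per (algorithm, key length, mode). */
typedef struct {
    const char *alg;
    int key_bits;
    const char *mode;
} cipher_id;

/* Combinations the illustrative engine chooses to accelerate. */
static const cipher_id engine_ciphers[] = {
    {"aes", 128, "ecb"}, {"aes", 192, "ecb"}, {"aes", 256, "ecb"},
    {"aes", 128, "cbc"}, {"aes", 192, "cbc"}, {"aes", 256, "cbc"},
};

/* Returns 1 if the engine registered this exact combination, else 0. */
static int engine_handles(const char *alg, int key_bits, const char *mode)
{
    size_t n = sizeof engine_ciphers / sizeof engine_ciphers[0];
    for (size_t i = 0; i < n; i++) {
        const cipher_id *c = &engine_ciphers[i];
        if (strcmp(c->alg, alg) == 0 && c->key_bits == key_bits &&
            strcmp(c->mode, mode) == 0)
            return 1;
    }
    return 0;
}

/* Anything the engine did not register falls through to the default code. */
static const char *select_impl(const char *alg, int key_bits, const char *mode)
{
    return engine_handles(alg, key_bits, mode) ? "engine" : "default";
}
```

With such a table, a request for AES-128-CFB never reaches the engine, so the engine has no need to duplicate the cfb or ofb mode code.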
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
Anything in memory could end up swapped out, but stack is the least likely since it's more often in use; the best you can do is zero the area ASAP.

My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto isn't replaceable by some random chunk of code. That's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins?

Perhaps a compromise? Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine?

Peter

From: Huang Ying [EMAIL PROTECTED]
To: openssl-dev@openssl.org openssl-dev@openssl.org
Date: 12/11/2008 05:06 PM
Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

> On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
>> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.
>
> Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked.
>
> Best Regards,
> Huang Ying

[attachment signature.asc deleted by Peter Waltenberg/Australia/IBM]
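The advice to "zero the area ASAP" can be sketched as follows. This is an illustration under stated assumptions: `wipe` and `use_stack_copy` are hypothetical helpers, the 240-byte buffer assumes at most 15 round keys of 16 bytes, and the volatile pointer is a common idiom to discourage the compiler from optimising the stores away rather than a hard guarantee. OpenSSL ships its own `OPENSSL_cleanse()` for this purpose, which real code would presumably prefer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite len bytes at p with zeros through a volatile pointer. */
static void wipe(void *p, size_t len)
{
    volatile unsigned char *v = (volatile unsigned char *)p;
    while (len--) *v++ = 0;
}

/*
 * Work on a temporary on-stack copy of a key schedule and wipe it before
 * returning. The byte sum is only there so the copy is actually used.
 */
static unsigned use_stack_copy(const unsigned char *ks, size_t len)
{
    unsigned char tmp[240];   /* assumption: up to 15 round keys x 16 bytes */
    unsigned sum = 0;

    assert(len <= sizeof tmp);
    memcpy(tmp, ks, len);
    for (size_t i = 0; i < len; i++) sum += tmp[i];
    wipe(tmp, len);           /* zero the copy as soon as it is no longer needed */
    return sum;
}
```

Wiping shortens the window in which the copy could be paged out; it does not close that window.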
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote:
> Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP.
>
> My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?.

Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather than ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option.

In any case, the solution is not to say "nah, that's not quite what I need - I'd rather patch into the default s/w implementation instead and disable engine support". The "default s/w implementation" is a legacy notion maintained for backward compatibility (in an ideal world, that would be an engine too, in fact it would be multiple engines - asm, no-asm, [etc]...)

The issues of signatures and dynamic loading are questions of build configuration, which distinguish between what is (a) built in to the libcrypto image and registered/enabled by ENGINE_load_builtin_engines(), and (b) possibly external to libcrypto and must be explicitly enabled/loaded. If you have issues with those semantics (and I know of a few, but fixing and simplifying them will require deprecating the notion of s/w fallback), then patches to crypto/engine/ are welcome (as they would be for ./Configure).

Cheers,
Geoff

--
Un terrien, c'est un singe avec des clefs de char...
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, Dec 11, 2008 at 10:03:32AM -0500, Geoff Thorpe wrote:
> Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply.

I'm surprised this can be certified for FIPS. Are you sure it is the case for the FIPS module? Consider that eng_cryptodev will in many cases end up using unknown -- and thus presumptively unvalidated -- hardware implementations of most of the core algorithms, in some cases even software implementations in the kernel.

I would be surprised that the test lab would allow 'hooks' like this in the FIPS module.

Thor
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 10:52:36 Thor Lancelot Simon wrote:
> On Thu, Dec 11, 2008 at 10:03:32AM -0500, Geoff Thorpe wrote:
>> Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply.
>
> I'm surprised this can be certified for FIPS. Are you sure it is the case for the FIPS module? Consider that eng_cryptodev will in many cases end up using unknown -- and thus presumptively unvalidated -- hardware implementations of most of the core algorithms, in some cases even software implementations in the kernel.
>
> I would be surprised that the test lab would allow 'hooks' like this in the FIPS module.

I'm not saying they did, do, would, nor should. But your point about unvalidated hardware implementations would equally cover the alternative intel code-path, just as it would(/might/could/has) cover(ed) the via-padlock code-path. The engine issue is about how the C code is laid out, not about what was, is, or should be validated, nor is it *necessarily* about what is static versus dynamic. My point is that there's zero technical argument for bypassing a mechanism whose sole purpose is to organise algorithm implementations, so that one can instead directly hack/patch a pre-existing implementation. If the engine mechanism needs work to meet or beat requirements, that's where the effort should go. Andy can speak for himself, but I *think* that was his point too.

Caveat: I haven't been involved in FIPS at all BTW, so I find myself in the "couldn't care less" category regarding the details. But if there's something that needs to be locked down (ie. fixed in the C code), we can surely lock it down as a part *of* the engine code rather than locking out engine code altogether. If this isn't the case then long-term, FIPS work will diverge significantly from (a) ongoing development, and (b) common sense.

Side-note: I've heard people who *are* involved in the FIPS work speak of it in fairly colourful ways - I suspect they'd agree that the standard (and its process) is little more than bureaucratic, superstitious B-S. IANAL, FWIW, etc. :-)

Cheers,
Geoff

--
Un terrien, c'est un singe avec des clefs de char...
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
Geoff Thorpe wrote:
> ...
> Caveat: I haven't been involved in FIPS at all BTW, so I find myself in the couldn't care less category regarding the details. But if there's something that needs to be locked down (ie. fixed in the C code), we can surely lock it down as a part *of* the engine code rather than locking out engine code altogether. If this isn't the case then long-term, FIPS work will diverge significantly from (a) ongoing development, and (b) common sense.

FIPS 140-2 level 1 does NOT require that all possible invalid modes of operation of the validated module be absolutely locked down. Many validated modules contain within the module boundary implementations of disallowed algorithms. However, FIPS 140-2 does require through the Security Policy that the end user not utilize any such disallowed functionality while in the validated mode of operation. In other words, the fact that the module *could* be misused is OK as long as it *isn't* misused.

That said, in the OpenSSL FIPS Object Modules we've taken care to automatically disable disallowed functionality wherever feasible, to aid the end user with the responsibility of not using such functionality.

> Side-note: I've heard people who *are* involved in the FIPS work speak of it in fairly colourful ways - I suspect they'd agree that the standard (and its process) is little more than bureaucratic, superstitious B-S. IANAL, FWIW, etc. :-)

Well, as with most such initiatives, the FIPS validation program was initiated with the best of intentions and IMHO in the beginning served a useful purpose. Other than that, no comment :-)

-Steve M.

--
Steve Marquess
Open Source Software Institute
[EMAIL PROTECTED]
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 05:04:36 Peter Waltenberg wrote:
>> Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP.
>>
>> My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?.
>
> Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option.

The difference between Intel AES-NI and padlock is that padlock provides support for different modes directly, including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for the AES core block algorithm, NOT for modes directly. At the same time, an AES-NI pipelining implementation can benefit ecb encrypt, cbc decrypt and counter mode.

If we implement AES-NI with a branch, we can get the full power of AES-NI except for ecb and ctr mode.

If we implement AES-NI with an engine, we can get the full power of AES-NI for all modes. But we must duplicate mode implementations that cannot benefit from AES-NI pipelining, such as cfb, ofb, etc. Are you OK with code duplication?

So I think the best method is to implement AES-NI with both branch and engine. With the branch version, we get the full power of AES-NI for cbc, cfb and ofb mode. At the same time the engine version can provide further acceleration for ecb and ctr mode.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote:
> Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP.

At least on UNIX systems, mlock() can be used to prevent a specified memory range from being swapped out. Maybe we should put all key schedules into a memory area protected by mlock()? That is safer than the stack, I think.

Best Regards,
Huang Ying

> My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?.
>
> Perhaps a compromise ?. Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine ?.
>
> Peter
>
> From: Huang Ying ying.hu...@intel.com
> To: openssl-dev@openssl.org openssl-dev@openssl.org
> Date: 12/11/2008 05:06 PM
> Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
>
>> On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
>>> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.
>>
>> Should it be considered an unsafe behavior to copy key schedule to stack? The stack maybe swapped out to a swap file, so that the key schedule is leaked.
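The mlock() suggestion can be sketched for a POSIX system as follows. This is an illustration, not a proposed patch: `lock_key_memory` and `release_key_memory` are hypothetical helpers, and the sketch treats a failed mlock() (for example because of RLIMIT_MEMLOCK) as non-fatal, which is a design assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Try to pin buf in RAM. Returns 1 if locked, 0 if mlock() failed. */
static int lock_key_memory(void *buf, size_t len)
{
    return mlock(buf, len) == 0;
}

/* Zero the key material first, then drop the lock if one was taken. */
static void release_key_memory(void *buf, size_t len, int was_locked)
{
    volatile unsigned char *v = buf;
    for (size_t i = 0; i < len; i++) v[i] = 0;
    if (was_locked)
        munlock(buf, len);
}
```

mlock() works on whole pages, so locking a small key schedule also pins whatever else shares its pages, and the amount a process may lock is limited. Per POSIX, memory locks are removed when the process terminates.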
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
I think it's overly paranoid :). No other keys are protected in that way.

Also: if mlock()'d memory isn't automatically freed when a process crashes, you risk DoS'ing the OS eventually as well. (I'm not sure whether that's the case or not, just a vague memory from the past.)

I'd certainly prefer the code in the core, with extended function as an engine. Steve Marquess described the problem with certification and engines pretty well: give the certifiers anything they can complain about and the process gets 10x harder/slower. It's a lot easier if we can have a clean "we don't do that" statement.

Peter

From: Huang Ying ying.hu...@intel.com
To: openssl-dev@openssl.org openssl-dev@openssl.org
Date: 12/12/2008 11:47
Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

> On Thu, 2008-12-11 at 18:04 +0800, Peter Waltenberg wrote:
>> Anything in memory could end up swapped out, but stack is the least likely since it's more often in use, the best you can do is zero the area ASAP.
>
> At least on UNIX system, mlock() can be used to prevent specified memory range from swapping out. Maybe we should put all key schedule into a memory area protected by mlock()? That is safer than stack I think.
>
> Best Regards,
> Huang Ying
>
>> My other objection to putting all of this into an engine is that the engine code is unusable in quite a few cases. Export approvals, and certifications like FIPS and Common Criteria all pretty much insist that the crypto. isn't replaceable by some random chunk of code, that's not an OpenSSL issue as such, but it's going to be awkward for some subset of OpenSSL consumers. There are ways around those issues, but I doubt you really want to add the option of signature checking engine plugins ?.
>>
>> Perhaps a compromise ?. Put the generic AES speedup into the core, and the extra modes where you gain the big performance boosts into an engine ?.
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 20:39:41 Huang Ying wrote: On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote: Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option. The difference between Intel AES-NI and padlock is that padlock provide support for different modes directly including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for AES core block algorithm NOT for modes directly. At the same time, AES-NI pipelining implementation can benefit ecb encrypt and cbc decrypt and counter mode. If we implement AES-NI with branch, we can get full power of AES-NI except ecb and ctr mode. If we implement AES-NI with engine, we can get full power of AES-NI for all modes. But we must duplicate mode implementations that can not benefit from AES-NI, such as cfb, ofb, etc. Do you OK with code duplication? The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so. Please look at the padlock engine implementation, particularly the use of the DECLARE_AES_EVP macro. I humbly suggest this is a better way to go. 
However I see no reason why it couldn't go into crypto/engine/ (like eng_cryptodev.c) rather than in engines/ (like e_padlock.c) so that it is compiled in and loaded by default. Also, the load function for your engine could perform the run-time check for processor type, and only register its implementations with the engine infrastructure if the processor supports the AES-NI extensions (similar to padlock_available() for example). It would then suffice to add the -DHAVE_INTEL_AES_NI define to applicable build targets in Configure (presumably x86_64-related targets). Cheers, Geoff -- Un terrien, c'est un singe avec des clefs de char... __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
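A minimal sketch of the nid-granularity point made in this message, in plain C with no OpenSSL dependency. Everything named DEMO_* or demo_* is a hypothetical stand-in: the values are not real OpenSSL nids and the lookup is not the real engine callback. It only shows the idea that an engine advertises the (cipher, key length, mode) combinations it accelerates and leaves everything else to the default implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for cipher nids. The real values live in the
 * OpenSSL headers and are deliberately not reproduced here. */
enum demo_nid {
    DEMO_NID_aes_128_ecb = 1,
    DEMO_NID_aes_128_cbc,
    DEMO_NID_aes_128_cfb,
    DEMO_NID_aes_128_ofb
};

/* Only the combinations the engine chooses to accelerate are listed. */
static const int demo_engine_nids[] = {
    DEMO_NID_aes_128_ecb,
    DEMO_NID_aes_128_cbc
};

/* Returns 1 if the engine claims this nid, 0 if the caller should fall
 * through to the default software implementation. */
static int demo_engine_claims(int nid)
{
    size_t i;
    size_t count = sizeof(demo_engine_nids) / sizeof(demo_engine_nids[0]);

    for (i = 0; i < count; i++)
        if (demo_engine_nids[i] == nid)
            return 1;
    return 0;
}
```

With this table, a request for the cfb or ofb stand-in is simply not claimed, so no mode code has to be duplicated inside the engine.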
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
> On Thursday 11 December 2008 20:39:41 Huang Ying wrote:
> > On Thu, 2008-12-11 at 23:03 +0800, Geoff Thorpe wrote:
> > > Engines like eng_cryptodev.c *are* built in (they're in ./crypto/engine/ rather ./engines/) and the intention is that they should be the implementation de base for those build targets to which they apply. Cryptodev is the only one so far, but there could be others. In fact, the padlock support for VIA chips (which is comparable to what's being discussed here, with all due respect to the intel instruction-set faithful) sits in ./engines like any other h/w support - a similar argument could be made that, on chips that support it, it should provide the default implementation(s), but right now they've been happy enough to make it a non-default option.
> >
> > The difference between Intel AES-NI and padlock is that padlock provide support for different modes directly including ecb, cbc, cfb and ofb, while Intel AES-NI just provides instructions for AES core block algorithm NOT for modes directly.
> >
> > At the same time, AES-NI pipelining implementation can benefit ecb encrypt and cbc decrypt and counter mode. If we implement AES-NI with branch, we can get full power of AES-NI except ecb and ctr mode. If we implement AES-NI with engine, we can get full power of AES-NI for all modes. But we must duplicate mode implementations that can not benefit from AES-NI, such as cfb, ofb, etc. Do you OK with code duplication?
>
> The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so.
The situation is:

- We implement cbc and ecb mode in the engine.
- If we implement cfb and ofb in the engine too, we duplicate the code of the cfb and ofb modes themselves.
- If we do not implement cfb and ofb in the engine, there is no code duplication, BUT we can NOT get AES-NI acceleration for the AES core block algorithm (which benefits cfb and ofb too) until we have a branch version.

So my suggestion is:

- Accelerate the AES core block algorithm with the branch version, which is used by cbc, cfb and ofb too.
- Accelerate AES ecb and ctr? with the engine version.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Thursday 11 December 2008 23:02:12 Huang Ying wrote:
> On Fri, 2008-12-12 at 11:38 +0800, Geoff Thorpe wrote:
> > The cipher and digest support is at the granularity of nids, and these combine algorithm, key-length, and mode. So if you implement support for those cipher,length,mode combinations that can be accelerated by AES-NI, your engine will only be invoked for those combinations. You're not obliged to implement anything else, and indeed there is nothing to be gained by doing so.
>
> The situation is:
>
> - We implement cbc and ecb mode in engine
> - If we implement cfb and ofb in engine too, we will duplicate code of cfb and ofb mode itself.
> - If we do not implement cfb and ofb in engine, no code duplication, BUT we can NOT get AES-NI acceleration for AES core block algorithm (which benefit cfb and ofb too) until we have a branch version.

OK, I (mis)understood from your original mail that you could only accelerate a subset of modes. If you can accelerate them all, then please do so by implementing an intel/aes-ni engine. But not by branching in the vanilla implementation.

> So my suggestion is:
>
> - Accelerate AES core block algorithm with branch version. Which is used by cbc, cfb and ofb too.
> - Accelerate AES ecb and ctr? with engine version.

And my suggestion is:

- write an engine for your hardware.

It is my intention to eventually engineify the default s/w implementations anyway - their static existence is a legacy thing that will get deprecated eventually (that way, if you don't want them and don't load them, or only load a subset of them, your linker and run-time footprint will shrink). At that point the whole issue of duplication is moot because someone will use the vanilla implementation *OR* your AES-NI engine, but not both. Even for now, the duplication issue is a red herring (especially when weighed against the inelegant mess it'll make of the code). So please, just write an engine and see how that goes.
Cheers,
Geoff
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> > - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
>
> OK. I will follow the way like that in aes-x86_64.pl to deal with ABI issue.

Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so some SIMD instructions can't be handled. Some adjustments for crypto/perlasm are required...

A.
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> > > I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
> >
> > One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide
>
> I am new to OpenSSL. Can you tell me how to do that? how to use the proper engine automatically?

I seem to recall that the cryptodev engine (for use on *BSDs) is loaded by default if HAVE_CRYPTODEV is defined, and if so, the load function will bind the engine at run time if /dev/crypto is alive and well. This means it'll get used by default for those algorithms/modes it supports.

Isn't this precisely what you'd want to do for processor-specific enhancements? Enable compilation on platforms that might have your processor by setting the corresponding -Dfoo in Configure, and then have your load function bind the engine only if a run-time check shows you're running on a compatible chip.

Cheers,
Geoff
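The "bind only if a run-time check passes" pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the cryptodev or padlock code: the demo_* names and the load hook are hypothetical, and the probe relies on the `__get_cpuid` helper from the GCC/Clang `<cpuid.h>` header, falling back to "not available" on other compilers or architectures. The one hardware fact used is that CPUID leaf 1 reports AES-NI in ECX bit 25.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define DEMO_HAVE_CPUID 1
#endif

/* CPUID leaf 1 reports AES-NI support in ECX bit 25. */
#define DEMO_CPUID_ECX_AESNI ((uint32_t)1 << 25)

/* Pure helper so the bit test can be checked without special hardware. */
static int demo_ecx_has_aesni(uint32_t ecx)
{
    return (ecx & DEMO_CPUID_ECX_AESNI) != 0;
}

/* Run-time probe. Returns 0 when CPUID cannot be queried at all. */
static int demo_aesni_available(void)
{
#ifdef DEMO_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return demo_ecx_has_aesni(ecx);
#endif
    return 0;
}

/* Hypothetical load hook: the engine registers itself only on capable
 * hardware, so the default software path stays in place elsewhere. */
static int demo_engine_registered = 0;

static int demo_engine_load(void)
{
    if (!demo_aesni_available())
        return 0;               /* leave the defaults untouched */
    demo_engine_registered = 1; /* a real engine would bind its ciphers here */
    return 1;
}
```

Because the probe runs at load time rather than at build time, an image built on new hardware and cloned onto older machines would simply not bind the engine there.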
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
> > > - and $-16, %rdx is unacceptable in this context. The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;
> >
> > Does there any other mechanism to deal with alignment issue in OpenSSL?
>
> The answer is engine.

In an engine, can I just re-align the expanded key address, because it is not exposed to the user? Something as follows:

    typedef struct {
        AES_KEY ks;
        unsigned int _pad[3];
    } INTEL_AES_KEY;

    IMPLEMENT_BLOCK_CIPHER(intel_aes_128, ks, intel_AES, INTEL_AES_KEY,
                           NID_aes_128, 16, 16, 16, 128, 0,
                           intel_aes_init_key, NULL,
                           EVP_CIPHER_set_asn1_iv, EVP_CIPHER_get_asn1_iv,
                           NULL)

BTW: The comment on AES_KEY in aes.h says:

    /* This should be a hidden type, but EVP requires that the size be known */

Does this mean AES_KEY is not a public interface and users should not rely on its internal layout?

> > > - implementation should allow for pipelining;
> > >
> > > As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option. The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.
> >
> > But as Peter Waltenberg said, engine has its issue too.
>
> Yes, and the relevant question is if it worth it.
>
> > At least we should have a branch based version (may be slower) to benefit most users, until we can make engine version usable by most users.
>
> There is no hardware in sight, so until is not really an argument. One can reserve for branch version as back-up/exit plan, i.e. in case, but not until.

ECB, CBC decrypt and CTR can benefit from AES-NI pipelining, but other modes cannot.
So maybe we should have both a branch version and an engine version: the branch version used for the other modes and CBC decrypt, the engine version used for ECB and CTR modes.

BTW: Is ECB used widely in practice?

Best Regards,
Huang Ying
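The re-alignment idea from this message can be sketched in isolation. The structs below are stand-ins that copy the layouts quoted in the thread (they are not the real OpenSSL types), and demo_aligned_key is a hypothetical helper. The sketch assumes a 4-byte unsigned int and that the outer struct is at least 4-byte aligned, so the key is misaligned by at most 12 bytes, which is exactly what the three padding words absorb.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the layout quoted in this thread; not the real AES_KEY. */
#define DEMO_AES_MAXNR 14
typedef struct {
    unsigned int rd_key[4 * (DEMO_AES_MAXNR + 1)];
    int rounds;
} DEMO_AES_KEY;

/* Engine-private wrapper: the key plus 12 bytes of slack. */
typedef struct {
    DEMO_AES_KEY ks;
    unsigned int _pad[3];
} DEMO_INTEL_AES_KEY;

/* Round the address of the wrapper up to the next 16-byte boundary and
 * treat that as the location of the key schedule. The engine must use
 * this same helper every time it touches the key. */
static DEMO_AES_KEY *demo_aligned_key(DEMO_INTEL_AES_KEY *k)
{
    uintptr_t p = (uintptr_t)k;

    p = (p + 15) & ~(uintptr_t)15;
    return (DEMO_AES_KEY *)p;
}
```

This only works because the wrapper is private to the engine; a caller that copies the bytes to an address with a different alignment would shift the key relative to the rounded-up pointer, which is the concern raised about the public interface.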
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 16:01 +0800, Andy Polyakov wrote:
> > > > I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
> > >
> > > One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide
> >
> > I am new to OpenSSL. Can you tell me how to do that? how to use the proper engine automatically?
>
> I said one can discuss it, there is no way currently, but as it's *soft*ware there is hardly limit for what one can do. A.

What's your idea about that? It seems that EVP_CipherInit_ex() will check engines, and the AES-NI engine could register itself when the appropriate CPUID bit is set.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 17:22 +0800, Andy Polyakov wrote:
> > > - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
> >
> > OK. I will follow the way like that in aes-x86_64.pl to deal with ABI issue.
>
> Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so some SIMD instructions can't be handled. Some adjustments for crypto/perlasm are required...

I know only a little Perl and I have no Win64 machine to test on. Can you help me with that?

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.

Should copying the key schedule to the stack be considered unsafe? The stack may be swapped out to a swap file, so the key schedule could leak.

Best Regards,
Huang Ying
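As a rough illustration of the "copy to an aligned location on the stack" option and the concern raised about it, the sketch below copies a key schedule into a 16-byte-aligned stack buffer, uses it, and then overwrites the buffer before returning. All demo_* names are hypothetical, and demo_use_aligned is a placeholder for a routine that requires aligned input. Wiping only shortens the time the copy exists; it does not by itself stop the page from being swapped out, which would need an OS facility such as POSIX mlock(). Inside OpenSSL the wiping job would normally fall to OPENSSL_cleanse() rather than a hand-written loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_KS_BYTES 240 /* 15 round keys of 16 bytes each */

/* Overwrite memory through a volatile pointer so that the compiler does
 * not drop the stores as dead code. */
static void demo_wipe(void *p, size_t n)
{
    volatile unsigned char *v = (volatile unsigned char *)p;

    while (n--)
        *v++ = 0;
}

/* Placeholder for an aligned-only routine; here it just sums the bytes. */
static unsigned demo_use_aligned(const unsigned char *ks)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < DEMO_KS_BYTES; i++)
        sum += ks[i];
    return sum;
}

/* Copy a possibly unaligned key schedule to an aligned stack buffer,
 * use it, then wipe the whole buffer. */
static unsigned demo_with_stack_copy(const unsigned char *ks)
{
    unsigned char buf[DEMO_KS_BYTES + 15];
    unsigned char *aligned =
        (unsigned char *)(((uintptr_t)buf + 15) & ~(uintptr_t)15);
    unsigned result;

    memcpy(aligned, ks, DEMO_KS_BYTES);
    result = demo_use_aligned(aligned);
    demo_wipe(buf, sizeof(buf));
    return result;
}
```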
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
As for RFC part. NO! This is NOT the way to do it. For several reasons (in ascending order of importance):

- OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
- and $-16, %rdx is unacceptable in this context. The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;
- zero-copy CBC routine gives a fair performance improvement even in ordinary case, and driving ultra-fast block function from C would be just wasteful. In other words AESENC/DEC would benefit more from dedicated CBC routine (see even comment below);
- implementation should allow for pipelining;

As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option. The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.

> This patch adds support to Intel AES-NI instruction set for x86_64 platform. Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009.

Hardware however is not expected before 2010, right?

A.
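Why CBC decrypt is listed among the parallelizable modes while CBC encrypt is not can be seen from the data dependencies alone. The sketch below uses a toy invertible byte transform in place of AES (an assumption made purely to keep the example self-contained; it has no cryptographic value) and hypothetical demo_* names. In the encrypt loop each block needs the previous ciphertext block, which is only known after the previous iteration; in the decrypt loop every iteration reads ciphertext only, so several blocks could be in flight at once.

```c
#include <stddef.h>

#define DEMO_BLK 16

/* Toy invertible "block cipher". NOT AES, only a stand-in. */
static void demo_enc_block(unsigned char *b, unsigned char key)
{
    size_t i;

    for (i = 0; i < DEMO_BLK; i++)
        b[i] = (unsigned char)((b[i] ^ key) + 1);
}

static void demo_dec_block(unsigned char *b, unsigned char key)
{
    size_t i;

    for (i = 0; i < DEMO_BLK; i++)
        b[i] = (unsigned char)((b[i] - 1) ^ key);
}

/* CBC encrypt: block n depends on ciphertext block n-1, so it is serial. */
static void demo_cbc_encrypt(const unsigned char *in, unsigned char *out,
                             size_t nblocks, const unsigned char *iv,
                             unsigned char key)
{
    const unsigned char *prev = iv;
    size_t n, i;

    for (n = 0; n < nblocks; n++) {
        unsigned char *c = out + n * DEMO_BLK;

        for (i = 0; i < DEMO_BLK; i++)
            c[i] = (unsigned char)(in[n * DEMO_BLK + i] ^ prev[i]);
        demo_enc_block(c, key);
        prev = c;
    }
}

/* CBC decrypt (in and out must not overlap): each iteration reads only
 * ciphertext, so the iterations are independent of one another. */
static void demo_cbc_decrypt(const unsigned char *in, unsigned char *out,
                             size_t nblocks, const unsigned char *iv,
                             unsigned char key)
{
    size_t n, i;

    for (n = 0; n < nblocks; n++) {
        const unsigned char *prev = n ? in + (n - 1) * DEMO_BLK : iv;
        unsigned char *p = out + n * DEMO_BLK;

        for (i = 0; i < DEMO_BLK; i++)
            p[i] = in[n * DEMO_BLK + i];
        demo_dec_block(p, key);
        for (i = 0; i < DEMO_BLK; i++)
            p[i] ^= prev[i];
    }
}
```

A single block in the middle of the message can be decrypted on its own by passing the preceding ciphertext block as the IV, which is the property a pipelined decrypt routine would exploit.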
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
If you want this in the mainstream code, you'll need to detect the capability at runtime and use your alternate code paths only if the hardware is present. It's not even to Intel's advantage if OpenSSL crashes and burns on older Intel CPUs, and most bulk users of OpenSSL (OS vendors) won't want to mess around installing different OpenSSL versions for different hardware.

Autodetection is the best option if the detection overhead is reasonable - take a look at crypto/x86_64cpuid.pl for how to do the detection logic neatly.

There are advantages in this being present all the time/dynamically enabled if it can be done; most users/OS vendors wouldn't bother to configure an engine backend anyway. I'll disagree with Andy on that aspect only. The engine modules aren't particularly useful for this situation, where the function is inherent in some subset of CPUs; the engines will only get used by the few end users that can be bothered to configure them. I doubt the OS vendors would bother to enable an engine by default: testing of the possible configurations is expensive, and the cost of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition (i.e. you get scenarios like building an image on a system with the new hardware, then cloning it across large numbers of machines).

Peter

From: Andy Polyakov [EMAIL PROTECTED]
To: openssl-dev@openssl.org
Date: 10/12/2008 05:42
Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

> As for RFC part. NO! This is NOT the way to do it. For several reasons (in ascending order of importance):
>
> - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
> - and $-16, %rdx is unacceptable in this context.
> The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;
> - zero-copy CBC routine gives a fair performance improvement even in ordinary case, and driving ultra-fast block function from C would be just wasteful. In other words AESENC/DEC would benefit more from dedicated CBC routine (see even comment below);
> - implementation should allow for pipelining;
>
> As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option. The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.
>
> > This patch adds support to Intel AES-NI instruction set for x86_64 platform. Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009.
>
> Hardware however is not expected before 2010, right?
>
> A.
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> If you want this in the mainstream code, you'll need to detect the capability at runtime and use your alternate code paths only if the hardware is present.

He did. It wouldn't work on Win64, but on Unix detection would actually work.

> There are advantages in this being present all the time/dynamically enabled if it can be done, most users/OS vendors wouldn't bother to configure an engine backend anyway. I'll disagree with Andy on that aspect only. The engine modules aren't particularly useful for this situation where the function is inherent in some subset of CPU's, the engines will only get used by a few end users that can be bothered to configure them.

As mentioned, in order to fully utilize the pipelined architecture one would have to implement a number of mode-specific subroutines, most notably Nx-interleaved and non-interleaved for short input and tail processing, and wrap them in specific C logic. Putting all this in general purpose code serves no purpose. Of course one could argue that improvement by patching single block function would be impressive enough, ~4x(?), but why stop there if you can reach for ~20x? This is my main argument for engine.

> I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.

One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide proper isolation for specific pipeline-enabling logic, would it? Either way, there were more points:-)

A.
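The "Nx-interleaved routine plus a non-interleaved routine for short input and tails, wrapped in C logic" structure mentioned above might look roughly like this. It is a shape-only sketch: the block transform is a toy XOR rather than AES, the interleave factor of 4 is an arbitrary assumption, and all demo_* names are hypothetical. The wrapper feeds full batches to the wide routine and hands whatever is left to the single-block routine.

```c
#include <stddef.h>

#define DEMO_BLK 16
#define DEMO_LANES 4 /* hypothetical interleave factor "N" */

/* Non-interleaved routine: one block at a time (toy transform). */
static void demo_one_block(unsigned char *b, unsigned char key)
{
    size_t i;

    for (i = 0; i < DEMO_BLK; i++)
        b[i] ^= key;
}

/* Stand-in for an Nx-interleaved routine: the same step is applied to
 * all lanes before moving on, mimicking independent blocks in flight. */
static void demo_n_blocks(unsigned char *b, unsigned char key)
{
    size_t lane, i;

    for (i = 0; i < DEMO_BLK; i++)
        for (lane = 0; lane < DEMO_LANES; lane++)
            b[lane * DEMO_BLK + i] ^= key;
}

/* C wrapper: full batches go to the wide routine, the tail (and any
 * short input) goes to the single-block routine. Returns the number of
 * wide batches issued. */
static size_t demo_ecb(unsigned char *buf, size_t nblocks, unsigned char key)
{
    size_t batches = 0;

    while (nblocks >= DEMO_LANES) {
        demo_n_blocks(buf, key);
        buf += DEMO_LANES * DEMO_BLK;
        nblocks -= DEMO_LANES;
        batches++;
    }
    while (nblocks--) {
        demo_one_block(buf, key);
        buf += DEMO_BLK;
    }
    return batches;
}
```

For a six-block input this issues one wide batch and two single-block calls, and the result is the same as processing every block individually.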
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 03:40 +0800, Andy Polyakov wrote:
> As for RFC part. NO! This is NOT the way to do it. For several reasons (in ascending order of importance):
>
> - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;

OK. I will follow the approach used in aes-x86_64.pl to deal with the ABI issue.

> - and $-16, %rdx is unacceptable in this context. The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;

Is there any other mechanism to deal with the alignment issue in OpenSSL? Would it be better to declare AES_KEY as follows:

    struct aes_key_st {
        unsigned int rd_key[4 *(AES_MAXNR + 1)];
        int rounds;
    } __attribute__ ((aligned (16)));

And how should memory allocated with malloc() be dealt with?

> - zero-copy CBC routine gives a fair performance improvement even in ordinary case, and driving ultra-fast block function from C would be just wasteful. In other words AESENC/DEC would benefit more from dedicated CBC routine (see even comment below);

I will investigate that further.

> - implementation should allow for pipelining;
>
> As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option. The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.

But as Peter Waltenberg said, the engine approach has its issues too. At least we should have a branch-based version (which may be slower) to benefit most users, until we can make the engine version usable by most users.

> > This patch adds support to Intel AES-NI instruction set for x86_64 platform.
> > Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD) instructions that are going to be introduced in the next generation of Intel processor, as of 2009.
>
> Hardware however is not expected before 2010, right? A.

Maybe 2009 or 2010; I don't know exactly either.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 04:58 +0800, Peter Waltenberg wrote:
> If you want this in the mainstream code, you'll need to detect the capability at runtime and use your alternate code paths only if the hardware is present. It's not even to Intels advantage if OpenSSL crashes and burns on older Intel CPU's and most bulk users of OpenSSL (OS vendors) won't want to mess around installing different OpenSSL versions for different hardware.
>
> Autodetection is the best option if the detection overhead is reasonable - take a look at crypto/x86_64cpuid.pl for how to do the detection logic neatly.
>
> There are advantages in this being present all the time/dynamically enabled if it can be done, most users/OS vendors wouldn't bother to configure an engine backend anyway.

Auto-detection has been implemented in the patch: at the entry point of the AES algorithm in crypto/aes/asm/aes-x86_64.pl, OPENSSL_ia32cap_P is checked, and if the corresponding bit (57) is set, the code branches into the AES-NI based implementation.

Best Regards,
Huang Ying
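The branch-on-capability check described in this message can be illustrated in C, with the caveat that the patch does this in assembler and that the layout assumed here is a guess at what "bit 57" means: a 64-bit capability word with the CPUID leaf 1 EDX flags in the low half and the ECX flags in the high half, so that bit 57 is 32 + 25, the AES-NI flag. The demo_* names and the two placeholder implementations are hypothetical.

```c
#include <stdint.h>

/* Bit number quoted in the thread; under the layout assumed above it
 * corresponds to CPUID leaf 1, ECX bit 25 (AES-NI). */
#define DEMO_AESNI_BIT 57

static int demo_cap_has_aesni(uint64_t cap)
{
    return (int)((cap >> DEMO_AESNI_BIT) & 1);
}

/* Placeholder implementations; each returns a tag so the chosen path
 * can be observed. */
typedef int (*demo_block_fn)(void);

static int demo_block_generic(void) { return 0; }
static int demo_block_aesni(void)   { return 1; }

/* The "branch version": pick the code path from the capability word. */
static demo_block_fn demo_pick(uint64_t cap)
{
    return demo_cap_has_aesni(cap) ? demo_block_aesni : demo_block_generic;
}
```

On hardware without the flag the generic path is selected, so the same binary can be deployed everywhere.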
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 05:47 +0800, Andy Polyakov wrote:
> > I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
>
> One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide

I am new to OpenSSL. Can you tell me how to do that, i.e. how to use the proper engine automatically?

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> > - and $-16, %rdx is unacceptable in this context. The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;
>
> Does there any other mechanism to deal with alignment issue in OpenSSL?

The answer is engine.

> Is it better to declare AES_KEY as follow:
>
>     struct aes_key_st {
>         unsigned int rd_key[4 *(AES_MAXNR + 1)];
>         int rounds;
>     } __attribute__ ((aligned (16)));

This is gcc-ism and we support other compilers, so no.

> And how to deal with memory allocated with malloc()?

Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.

> > - implementation should allow for pipelining;
> >
> > As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option. The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.
>
> But as Peter Waltenberg said, engine has its issue too.

Yes, and the relevant question is whether it's worth it.

> At least we should have a branch based version (may be slower) to benefit most users, until we can make engine version usable by most users.

There is no hardware in sight, so "until" is not really an argument. One can reserve for branch version as back-up/exit plan, i.e. "in case", but not "until".

A.
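Of the three options listed above, "check alignment and choose between code paths" is the easiest to show in portable C. In the sketch below both paths are plain memcpy calls standing in for what an assembler routine would do with movaps (aligned) and movups (unaligned) loads; the demo_* names are hypothetical and the returned tag exists only so the chosen path is visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p sits on a 16-byte boundary. */
static int demo_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

enum demo_path { DEMO_PATH_ALIGNED, DEMO_PATH_UNALIGNED };

/* Load one 16-byte round key, reporting which path was taken. */
static enum demo_path demo_load_round_key(const void *ks,
                                          unsigned char out[16])
{
    if (demo_is_aligned16(ks)) {
        memcpy(out, ks, 16); /* stand-in for an aligned (movaps) load */
        return DEMO_PATH_ALIGNED;
    }
    memcpy(out, ks, 16);     /* stand-in for an unaligned (movups) load */
    return DEMO_PATH_UNALIGNED;
}
```

The check is made per call, so a key schedule that was copied to a differently aligned address keeps working instead of faulting.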
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> > > I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
> >
> > One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide
>
> I am new to OpenSSL. Can you tell me how to do that? how to use the proper engine automatically?

I said one can discuss it, there is no way currently, but as it's *soft*ware there is hardly limit for what one can do.

A.