[openssl.org #1787] [PATCH] speed -multi buffered output fix
Thanks, patch applied. Best regards, Lutz __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager [EMAIL PROTECTED]
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
>
> OK. I will follow the way like that in aes-x86_64.pl to deal with ABI issue.

Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so some SIMD instructions can't be handled. Some adjustments for crypto/perlasm are required...

A.
Realigning const void *data variables into 32-bit boundaries
Hi guys,

This is the HASH_BLOCK_DATA_ORDER (../crypto/sha/sha_locl.h) function prototype:

    static void HASH_BLOCK_DATA_ORDER (SHA_CTX *c, const void *p, size_t num)

As I mentioned in a previous thread, I am trying to modify this function to use my own code on my crappy embedded system for my university project. I am using this for my input data:

    unsigned long *data = (unsigned long *)p;

Most of the time, my custom function works. But when OpenSSL passes an address that is not 32-bit aligned, my calculations go wrong, because my data is not aligned to a word boundary. How do I compile OpenSSL so that it aligns this data to 32-bit boundaries? I know I can solve this easily by just working with bytes, but then my function will be slower than OpenSSL's. Any ideas?

Regards,
Vishnu.
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
> I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
>
> One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide
>
> I am new to OpenSSL. Can you tell me how to do that? How to use the proper engine automatically?

I seem to recall that the cryptodev engine (for use on *BSDs) is loaded by default if HAVE_CRYPTODEV is defined, and if so, the load function will bind the engine at run time if /dev/crypto is alive and well. This means it gets used by default for those algorithms/modes it supports.

Isn't this precisely what you'd want to do for processor-specific enhancements? Enable compilation on platforms that might have your processor by setting the corresponding -Dfoo in Configure, and then have your load function bind the engine only if a run-time check shows you're running on a compatible chip.

Cheers,
Geoff

--
An Earthling is a monkey with the keys to a tank...
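The run-time check suggested above could look roughly like the following. This is only a sketch under stated assumptions: it relies on the GCC/Clang `<cpuid.h>` helper `__get_cpuid`, it tests the AES-NI feature flag (CPUID leaf 1, ECX bit 25), and `try_bind_aesni` is a hypothetical stand-in for an engine's bind step, not an OpenSSL function.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */
#endif

/* Returns 1 if the CPU reports AES-NI (CPUID leaf 1, ECX bit 25), else 0.
 * On non-x86 targets it always returns 0. */
static int cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 1 not available */
    return (int)((ecx >> 25) & 1u);
#else
    return 0;
#endif
}

/* Hypothetical bind step: decline unless the chip is compatible, so the
 * caller silently falls back to the generic implementation. */
static int try_bind_aesni(void)
{
    if (!cpu_has_aesni())
        return 0;
    /* a real engine would register its ciphers here */
    return 1;
}
```

The point of the design is that the binary can be built everywhere the code might be useful, while the decision to use it is deferred to load time.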
Re: Realigning const void *data variables into 32-bit boundaries
Processing unaligned data in an aligned fashion always requires some data copying. There are two different problems, each with a slightly different solution.

#1: input data must be word-aligned but can be processed per byte.

Assume wordsize=8, then an unaligned data input of length = 21 can be cut up as

    1 2 3 || 4 5 6 7 8 9 0 1 | 2 3 4 5 6 7 8 9 || 0 1

so three calls: process three bytes (unaligned), 2*8 aligned, then 2 bytes unaligned at the end. No data copying required (unless maybe for the head and tail). It just needs two routines: one for aligned data, one for unaligned data.

#2 (and this is what the crypto hardware needs!): not only must the input be word-aligned, its *length* must also be word-aligned. (That's where 'padding' comes in.)

With wordsize=8 and input length = 21, the solution is QUITE different:

    1 2 3 | 4 5 6 7 8 9 0 1 | 2 3 4 5 6 7 8 9 | 0 1

must be *moved* (copy is fine too ;-) ) to an _aligned_ buffer, i.e.

    -> 1 2 3 4 5 6 7 8 | 9 0 1 2 3 4 5 6 | 7 8 9 0 1

and padded:

    -> 1 2 3 4 5 6 7 8 | 9 0 1 2 3 4 5 6 | 7 8 9 0 1 p p p |

so it can be fed to the process. (You may exchange the padding and aligning steps, of course. (Proof thereof is left as an exercise to the reader.))

One note: type casting doesn't modify the pointer value (check your ANSI/ISO C89/C99 standard references). What you need is data at an 'aligned pointer'. For this, there's a way too: get a buffer somewhere (malloc() or stack); we will assume this buffer is unaligned, then align it as needed. Hence to process W words, the size of the buffer MUST be W words PLUS an extra (wordsize-1) bytes, to allow for aligning the pointer.

Some code off the top of my head (bugs in it come free):

    #include <string.h>          /* memcpy */
    #include <openssl/crypto.h>  /* OPENSSL_malloc */

    /* for C89: */
    typedef unsigned char byte;

    void process_unaligned_data_in_aligned_fashion
    ( void *src      /* unaligned source */
    , size_t srclen  /* ASSUME padding has already been taken care of:
                        this one is already 'wordsize' aligned.
                        VALUE is therefore in WORDS, _NOT_ bytes! */
    , int wordsize
    )
    {
        /* allocate buffer for aligning; allow for unaligned result: */
        void *rawptr = OPENSSL_malloc(srclen * wordsize + wordsize - 1);
        /* calc aligned pointer for target buf: shift UP by 0..wordsize-1
           bytes; the shift is 0 when rawptr is already aligned, so the
           copy never runs past the (wordsize-1) bytes of slack */
        size_t misalign = (size_t)rawptr % (size_t)wordsize;
        size_t shift = misalign ? (size_t)wordsize - misalign : 0;
        byte *al_ptr = (byte *)rawptr;
        al_ptr += shift;
        /* now al_ptr is aligned at 'wordsize' aligned memory address */
        memcpy(al_ptr, src, srclen * wordsize);
        /* perform word-aligned operation: */
        do_aligned_thing();
        ...
    }

So far, 'C class 102'. ;-) I am sure you'll be able to glean the relevant parts from this and deduce how and which bits must be applied to your particular problem.

Sigh. Too bad you're not on M68K hardware or other machinery which simply (and quite fatally) bombs out on you at a hardware level when addressing words at UNaligned boundaries. Ah, those were the days... Java doesn't care any more. (Oops, sorry. Let's stop this rant in its tracks!)

If you run on Intel (and you very probably are), you don't get that penalty, so (performance degrading!) unaligned word accesses will 'work'. Combine this with your given fault description, then consider that 'aligning the data' /may/ not be the answer you seek -- mark the mention of the padding in passing. Consider it a hint that other things may be wrong with your code. (hint != answer)

--
Met vriendelijke groeten / Best regards,
Ger Hobbelt
--
web: http://www.hobbelt.com/ http://www.hebbut.net/
mail: [EMAIL PROTECTED]
mobile: +31-6-11 120 978
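Case #1 above (unaligned head, aligned body, unaligned tail) can be sketched as a small helper that only computes the three lengths. This is a minimal illustration, not code from OpenSSL: the name `split_by_alignment` is made up here, `wordsize` is assumed to be a power of two, and converting a pointer to `uintptr_t` to inspect its low bits is implementation-defined, although it behaves as expected on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Split the range [p, p+len) into:
 *   head: bytes before the first wordsize-aligned address,
 *   body: a whole number of words starting at an aligned address,
 *   tail: the leftover bytes after the last whole word.
 * Assumes wordsize is a power of two. */
static void split_by_alignment(const unsigned char *p, size_t len,
                               size_t wordsize,
                               size_t *head, size_t *body, size_t *tail)
{
    size_t mis = (size_t)((uintptr_t)p & (wordsize - 1));
    size_t h = mis ? wordsize - mis : 0;   /* bytes up to the next boundary */
    size_t b;

    if (h > len)
        h = len;                           /* input ends before the boundary */
    b = (len - h) & ~(wordsize - 1);       /* round down to whole words */

    *head = h;
    *body = b;
    *tail = len - h - b;
}
```

For the 21-byte example with wordsize 8 and a start address 5 bytes past a boundary, this yields the 3 / 16 / 2 split drawn above; the caller would then run the byte-wise routine on the head and tail and the word-wise routine on the body.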
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
> - and $-16, %rdx is unacceptable in this context. The relevant interface is exposed to end-user and we have to reserve for possibility that key schedule is memcpy-ed to location with alternative alignment;
>
> Is there any other mechanism to deal with the alignment issue in OpenSSL?
>
> The answer is engine.

In an engine, can I just re-align the expanded key address, because it is not exposed to the user? Something as follows:

    typedef struct {
        AES_KEY ks;
        unsigned int _pad[3];
    } INTEL_AES_KEY;

    IMPLEMENT_BLOCK_CIPHER(intel_aes_128, ks, intel_AES, INTEL_AES_KEY,
                           NID_aes_128, 16, 16, 16, 128, 0,
                           intel_aes_init_key, NULL,
                           EVP_CIPHER_set_asn1_iv, EVP_CIPHER_get_asn1_iv, NULL)

BTW: the comment on AES_KEY in aes.h says:

    /* This should be a hidden type, but EVP requires that the size be known */

Does this mean AES_KEY is not a public interface and users should not rely on its internal layout?

> - implementation should allow for pipelining;
>
> As for the latter. I refer to possibility of scheduling of multiple AESENC/DEC with same key schedule element and multiple data chunks. It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to reserve for this option.
>
> The answer is engine. I mean this preferably should be implemented as engine that will be able to take full advantage of architecture, not as patch to general purpose block function.
>
> But as Peter Waltenberg said, engine has its issue too.
>
> Yes, and the relevant question is if it worth it.
>
> At least we should have a branch based version (may be slower) to benefit most users, until we can make engine version usable by most users.
>
> There is no hardware in sight, so until is not really an argument. One can reserve for branch version as back-up/exit plan, i.e. in case, but not until.

ECB, CBC decrypt and CTR can benefit from AES-NI pipelining, but the other modes cannot. So maybe we should have both a branch version and an engine version: the branch version used for the other modes and CBC decrypt, the engine version used for ECB and CTR. BTW: is ECB widely used in practice?

Best Regards,
Huang Ying
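On the padded-struct idea above: three extra `unsigned int`s give 12 bytes of slack, which is exactly what is needed to round a 4-byte-aligned address up to the next 16-byte boundary. The sketch below illustrates only that arithmetic. `padded_key`, `aligned_key_schedule` and the size `KS_BYTES` are stand-ins invented for this example, not definitions from aes.h.

```c
#include <stddef.h>
#include <stdint.h>

#define KS_BYTES 244u   /* illustrative payload size, a multiple of 4 */

/* Payload plus 12 bytes of slack, stored as 32-bit words so the start
 * address is at least 4-byte aligned. */
typedef struct {
    uint32_t storage[KS_BYTES / 4 + 3];
} padded_key;

/* Round the storage address up to the next 16-byte boundary. Because the
 * start is 4-byte aligned, the adjustment is 0, 4, 8 or 12 bytes, so
 * KS_BYTES still fit behind the returned pointer. */
static void *aligned_key_schedule(padded_key *k)
{
    uintptr_t a = (uintptr_t)k->storage;

    a = (a + 15u) & ~(uintptr_t)15u;
    return (void *)a;
}
```

Note that this only works while the struct stays where it was initialized. If the whole struct is memcpy-ed to an address with a different alignment modulo 16, the data no longer sits at the re-computed aligned offset, which is the objection quoted at the top of this message.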
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 16:01 +0800, Andy Polyakov wrote:
> I doubt the OS vendors would bother to enable an engine by default, testing of the possible configurations is expensive and the costs of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition.
>
> One can discuss loading selected engines by default, i.e. you'd have to work to not load it:-) Then it wouldn't be any different, yet provide
>
> I am new to OpenSSL. Can you tell me how to do that? How to use the proper engine automatically?
>
> I said one can discuss it, there is no way currently, but as it's *soft*ware there is hardly limit for what one can do. A.

What's your idea about that? It seems that EVP_CipherInit_ex() checks engines, so an AES-NI engine could register itself when the appropriate CPUID bit is set.

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 17:22 +0800, Andy Polyakov wrote:
> - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
>
> OK. I will follow the way like that in aes-x86_64.pl to deal with ABI issue.
>
> Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so some SIMD instructions can't be handled. Some adjustments for crypto/perlasm are required...

I know only a little Perl and I have no 64-bit Windows machine to test on. Can you help me with that?

Best Regards,
Huang Ying
Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
> Implementation aiming to complement interface exposed by crypto/aes/asm should allow for non-16-byte-aligned key schedule. Period. One can use movups, or check alignment and choose between movups and movaps code paths, or copy key schedule to aligned location on stack.

Should copying the key schedule to the stack be considered unsafe behavior? The stack may be swapped out to a swap file, which could leak the key schedule.

Best Regards,
Huang Ying
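The "copy to an aligned location on stack" option, together with the concern raised about it, might be sketched as follows. Everything named here is hypothetical and for illustration only: `KS_BYTES`, `wipe`, `with_aligned_copy` and `demo_consumer` are not OpenSSL symbols, and the code assumes a C11 compiler for `_Alignas`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_BYTES 240u   /* illustrative key-schedule size */

/* Zero memory through a volatile pointer so the stores are not removed
 * as dead writes by the optimizer. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *v = (volatile unsigned char *)p;

    while (n--)
        *v++ = 0;
}

/* Hypothetical consumer that requires a 16-byte-aligned key schedule. */
typedef unsigned (*aligned_user)(const unsigned char *ks);

/* Copy a possibly unaligned key schedule into an aligned stack buffer,
 * run the consumer on the copy, then wipe the copy before returning. */
static unsigned with_aligned_copy(const unsigned char *ks, aligned_user use)
{
    _Alignas(16) unsigned char tmp[KS_BYTES];
    unsigned r;

    memcpy(tmp, ks, KS_BYTES);
    r = use(tmp);
    wipe(tmp, sizeof tmp);
    return r;
}

/* Demo consumer: returns 0 if handed a misaligned pointer, otherwise
 * the sum of all key-schedule bytes. */
static unsigned demo_consumer(const unsigned char *ks)
{
    unsigned sum = 0;
    size_t i;

    if ((uintptr_t)ks & 15u)
        return 0;
    for (i = 0; i < KS_BYTES; i++)
        sum += ks[i];
    return sum;
}
```

Wiping shortens the time the copy exists, but it does not by itself stop the page from being swapped out while the copy is live, so it narrows the exposure asked about above rather than removing it.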