[openssl.org #1787] [PATCH] speed -multi buffered output fix

2008-12-10 Thread Lutz Jaenicke via RT
Thanks, patch applied.

Best regards,
Lutz
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Andy Polyakov

- OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable
for both Unix and Win64;


OK. I will follow the way like that in aes-x86_64.pl to deal with ABI
issue.


Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so 
some SIMD instructions can't be handled. Some adjustments for 
crypto/perlasm are required... A.



Realigning const void *data variables into 32-bit boundaries

2008-12-10 Thread Vishnu Param

Hi guys,

This is the HASH_BLOCK_DATA_ORDER (../crypto/sha/sha_locl.h) function prototype 
:
static void HASH_BLOCK_DATA_ORDER (SHA_CTX *c, const void *p, size_t num)

As I have mentioned before in a previous thread, I am trying to modify this 
function to use my own code in my crappy embedded system for my university 
project.

I am using this for my input data :
unsigned long *data = (unsigned long *)p;

Most of the time, my custom function works. But when OpenSSL passes an address 
that is not 32-bit aligned, my calculations go wrong, because my data is not 
aligned to a 32-bit boundary. How do I compile OpenSSL so that it aligns this 
data to 32-bit boundaries?

I know I can solve this easily by just working with bytes, but then my function 
will be slower than OpenSSL. Any ideas?

Regards,
Vishnu.
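
For reference, one way to stay correct on any input address is to assemble each word from bytes. The sketch below assumes the hash consumes 32-bit big-endian words in 64-byte blocks, as SHA-1 does; load_be32 and load_block are hypothetical helper names, not part of OpenSSL.

```c
#include <assert.h>
#include <stddef.h>

/* Load one 32-bit big-endian word from a possibly unaligned address.
   Reading byte by byte never performs a misaligned word access. */
unsigned long load_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] <<  8) |
            (unsigned long)p[3];
}

/* Hypothetical helper: unpack one 64-byte block into 16 words. */
void load_block(unsigned long w[16], const void *block)
{
    const unsigned char *p = (const unsigned char *)block;
    size_t i;

    assert(w != NULL && block != NULL);
    for (i = 0; i < 16; i++)
        w[i] = load_be32(p + 4 * i);
}
```

This costs four byte loads per word, so it trades some speed for working at any alignment; a fast path for already aligned input can be added on top.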


Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Geoff Thorpe
  I doubt the OS vendors would bother
  to enable an engine by default, testing of the possible
  configurations is expensive and the costs of support calls if they
  mess up makes autodetecting the engine to use a very unattractive
  proposition.
 
  One can discuss loading selected engines by default, i.e. you'd
  have to work to not load it:-) Then it wouldn't be any different,
  yet provide
 
  I am new to OpenSSL. Can you tell me how to do that? how to use the
  proper engine automatically?

I seem to recall that the cryptodev engine (for use on *BSDs) is loaded 
by default if HAVE_CRYPTODEV is defined, and if so, the load function 
will bind the engine at run time if /dev/crypto is alive and well. This 
means it'll get used by default for those algorithms/modes it supports.

Isn't this precisely what you'd want to do for processor-specific 
enhancements? Enable compilation on platforms that might have your 
processor by setting the corresponding -Dfoo in Configure, and then 
have your load function bind the engine only if a run-time check shows 
you're running on a compatible chip.

Cheers,
Geoff

-- 
Un terrien, c'est un singe avec des clefs de char...


Re: Realigning const void *data variables into 32-bit boundaries

2008-12-10 Thread Ger Hobbelt
Processing unaligned data in an aligned fashion always requires some
data copying.

There are two different problems, each with a slightly different solution.

#1: input data must be word-aligned but can be processed per byte.

Assume wordsize=8, then an unaligned data input of length = 21 can be cut up as

1 2 3 || 4 5 6 7 8 9 0 1 | 2 3 4 5 6 7 8 9 || 0 1

so three calls: process three bytes (unaligned), then 2*8 bytes aligned, then 2
bytes unaligned at the end.
No data copying required (unless maybe for the head and tail). It just
needs two routines: one for aligned data, one for unaligned data.
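
The split for case #1 can be sketched as follows. The struct and function names are hypothetical; the address is passed as a plain number so the arithmetic is easy to check against the 21-byte example.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts of the three pieces: unaligned head, aligned body, tail. */
struct split {
    size_t head;
    size_t body;
    size_t tail;
};

/* addr is the numeric address of the first input byte. */
struct split split_for_alignment(size_t addr, size_t len, size_t wordsize)
{
    struct split s;
    size_t misalign;

    assert(wordsize > 0);
    misalign = addr % wordsize;
    s.head = misalign ? wordsize - misalign : 0;  /* bytes up to next boundary */
    if (s.head > len)
        s.head = len;                             /* short input: all head */
    s.body = (len - s.head) / wordsize * wordsize; /* whole words only */
    s.tail = len - s.head - s.body;
    return s;
}
```

With an address that is 5 past a boundary, a length of 21 and wordsize 8, this gives head 3, body 16, tail 2, matching the diagram. For a real pointer p, the address would be obtained with a cast such as (size_t)p.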


#2 (and this is what the crypto hardware needs!) not only must the
input be word-aligned, its *length*
must also be word-aligned. (That's where 'padding' comes in)

wordsize=8, input length = 21, then the solution is QUITE different:

1 2 3 | 4 5 6 7 8 9 0 1 | 2 3 4 5 6 7 8 9 | 0 1

must be *moved* (copy is fine too ;-) ) to an _aligned_ buffer, i.e.

- 1 2 3 4 5 6 7 8 | 9 0 1 2 3 4 5 6 | 7 8 9 0 1

and padded:

- 1 2 3 4 5 6 7 8 | 9 0 1 2 3 4 5 6 | 7 8 9 0 1 p p p |

so it can be fed to the process. (You may exchange the padding and
aligning steps, of course. (Proof thereof is left as an exercise to
the reader.))
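
A sketch of case #2, copying into a destination buffer and padding the length up to a word multiple. The function names and the pad byte are hypothetical; a real hash or cipher defines its own padding rule, and the caller is assumed to supply an aligned destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round len up to the next multiple of wordsize. */
size_t padded_length(size_t len, size_t wordsize)
{
    assert(wordsize > 0);
    return (len + wordsize - 1) / wordsize * wordsize;
}

/* Copy len bytes from src (any alignment) into dst and fill the rest of
   the last word with pad. Returns the padded length, or 0 if dst is too
   small (an empty input also yields 0). */
size_t copy_and_pad(unsigned char *dst, size_t dstcap,
                    const void *src, size_t len,
                    size_t wordsize, unsigned char pad)
{
    size_t total = padded_length(len, wordsize);

    if (total > dstcap)
        return 0;
    memcpy(dst, src, len);
    memset(dst + len, pad, total - len);
    return total;
}
```

For the example above, padded_length(21, 8) is 24, so three pad bytes are appended.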


One note: type casting doesn't modify the pointer value (check your
ANSI/ISO C89/C99 standard references). What you need is data at an
'aligned pointer'.
For this, there's a way too:

get a buffer somewhere (malloc() or stack); we will assume this buffer
is unaligned, then align it as needed.
Hence to process W words, the size of the buffer MUST be W words PLUS
extra (wordsize-1) bytes, to allow for aligning the pointer.

Some code off the top of my head (bugs in it come free):

/* for C89: */
#include <string.h>          /* memcpy */
#include <openssl/crypto.h>  /* OPENSSL_malloc, OPENSSL_free */

typedef unsigned char byte;

void process_unaligned_data_in_aligned_fashion
  ( const void *src /* unaligned source */
  , size_t srclen /* ASSUME padding has already been taken care of:
   this one is already 'wordsize' aligned.
   VALUE is therefore in WORDS, _NOT_ bytes! */
  , int wordsize )
{
  size_t ws = (size_t)wordsize;
  size_t buflen = srclen * ws;
  size_t shift;
  byte *al_ptr;
  /* allocate buffer for aligning; allow for unaligned result: */
  void *rawptr = OPENSSL_malloc(buflen + ws - 1);

  if (rawptr == NULL)
    return;

  /* calc aligned pointer for target buf: shift UP by 0..ws-1 bytes
     (the final '% ws' keeps an already aligned pointer where it is,
     so the shift never exceeds the ws-1 spare bytes) */
  shift = (ws - ((size_t)rawptr % ws)) % ws;
  al_ptr = (byte *)rawptr + shift;

  /* now al_ptr is aligned at 'wordsize' aligned memory address */
  memcpy(al_ptr, src, buflen);

  /* perform word-aligned operation: */
  do_aligned_thing();

  ...

  OPENSSL_free(rawptr); /* free the raw pointer, never al_ptr */
}




So far, 'C class 102'.  ;-)

I am sure you'll be able to glean the relevant parts from this and
deduce how and which bits must be applied to your particular problem.





Sigh. Too bad you're not on M68K hardware or other machinery which
simply (and quite fatally) bombs out on you at a hardware level when
addressing words at UNaligned boundaries. Ah, those were the days...
Java doesn't care any more. (Oops, sorry. let's stop this rant in its
tracks!)


If you run on Intel (and you very probably are), you don't get that
penalty, so (performance degrading!) unaligned word accesses will
'work'. Combine this with your given fault description, then consider
that 'aligning the data' /may/ not be the answer you seek -- mark the
mention of the padding in passing. Consider it a hint that other
things may be wrong with your code. (hint != answer)




On Wed, Dec 10, 2008 at 1:51 PM, Vishnu Param [EMAIL PROTECTED] wrote:
 Hi guys,

 This is the HASH_BLOCK_DATA_ORDER (../crypto/sha/sha_locl.h) function
 prototype :
 static void HASH_BLOCK_DATA_ORDER (SHA_CTX *c, const void *p, size_t num)

 As I have mentioned before in a previous thread, I am trying to modify this
 function to use my own code in my crappy embedded system for my university
 project.

 I am using this for my input data :
 unsigned long *data = (unsigned long *)p;

 Most of the time, my custom function works. But when OpenSSL passes an
 address that is not 32-bit aligned, my calculations go wrong, because my
 data is not aligned to a 32-bit boundary. How do I compile OpenSSL so that
 it aligns this data to 32-bit boundaries?

 I know I can solve this easily by just working with bytes, but then my
 function will be slower than OpenSSL. Any ideas?

 Regards,
 Vishnu.

 



-- 
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--
web:http://www.hobbelt.com/
http://www.hebbut.net/
mail:   [EMAIL PROTECTED]
mobile: +31-6-11 120 978
--


Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
  - and $-16, %rdx is unacceptable in this context. The relevant
  interface is exposed to end-user and we have to reserve for possibility
  that key schedule is memcpy-ed to location with alternative alignment;
  
  Is there any other mechanism to deal with the alignment issue in OpenSSL?
 
 The answer is engine.

In an engine, can I just re-align the expanded key address because it
is not exposed to the user? Something like the following:

typedef struct
{
   AES_KEY ks;
   unsigned int _pad[3];
} INTEL_AES_KEY;

IMPLEMENT_BLOCK_CIPHER(intel_aes_128, ks, intel_AES, INTEL_AES_KEY,
   NID_aes_128, 16, 16, 16, 128,
   0, intel_aes_init_key, NULL,
   EVP_CIPHER_set_asn1_iv,
   EVP_CIPHER_get_asn1_iv,
   NULL)
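
The re-alignment itself could be sketched like this. It is an illustration under assumptions, not engine code: KS_BYTES is a hypothetical key-schedule size, and padded_key uses 15 spare bytes so that any start address works. The 12 bytes of _pad[3] above are enough only if the struct itself is at least 4-byte aligned, since the distance to the next 16-byte boundary is then 0, 4, 8 or 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KS_BYTES 244u  /* hypothetical key-schedule size in bytes */

/* Storage with enough slack to find a 16-byte boundary inside it. */
struct padded_key {
    unsigned char raw[KS_BYTES + 15];
};

/* Round a pointer up to the next 16-byte boundary. */
void *align_up_16(void *p)
{
    uintptr_t a = (uintptr_t)p;

    return (void *)((a + 15u) & ~(uintptr_t)15u);
}

/* Return the 16-byte-aligned location inside k where the schedule lives. */
unsigned char *aligned_key(struct padded_key *k)
{
    unsigned char *p;

    assert(k != NULL);
    p = (unsigned char *)align_up_16(k->raw);
    assert((size_t)(p - k->raw) <= 15);
    return p;
}
```

Note that the aligned offset depends on where the struct sits in memory, so a struct that is memcpy-ed to a location with a different alignment no longer has its schedule at the recomputed offset.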

BTW: The comment on AES_KEY in aes.h says:
/* This should be a hidden type, but EVP requires that the size be 
known */

Does this mean AES_KEY is not a public interface and users should not
rely on its internal implementation?

  - implementation should allow for pipelining;
 
  As for the latter. I refer to possibility of scheduling of multiple
  AESENC/DEC with same key schedule element and multiple data chunks. It's
  possible in modes that allow for parallelization (e.g. ECB, CBC decrypt,
  CTR), and as far as I understand it is even recommended. So we are kind
  of obliged to reserve for this option.
 
  The answer is engine. I mean this preferably should be implemented as
  engine that will be able to take full advantage of architecture, not as
  patch to general purpose block function.
  
  But as Peter Waltenberg said, engine has its issue too.
 
 Yes, and the relevant question is whether it is worth it.

  At least we
  should have a branch based version (may be slower) to benefit most
  users, until we can make engine version usable by most users.
 
 There is no hardware in sight, so "until" is not really an argument. One 
 can reserve the branch version as a back-up/exit plan, i.e. "in case", 
 but not "until".

ECB, CBC decrypt and CTR can benefit from AES-NI pipelining, but other
modes cannot. So maybe we should have both a branch version and an
engine version: the branch version used for other modes and CBC decrypt,
and the engine version used for ECB and CTR modes.
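
To illustrate why CBC decrypt can be pipelined while CBC encrypt cannot, here is a sketch with a toy byte-wise "cipher" standing in for AES. All names are hypothetical and the toy transform is not secure. Encryption of block i needs ciphertext block i-1, so it is serial; decryption of block i needs only ciphertext blocks i and i-1, so blocks can be processed in any order.

```c
#include <assert.h>
#include <stddef.h>

#define BLK 16

/* Toy invertible block transform (NOT secure; stands in for AES). */
void toy_encrypt(unsigned char b[BLK], unsigned char key)
{
    size_t i;

    for (i = 0; i < BLK; i++)
        b[i] = (unsigned char)((b[i] ^ key) + 1);
}

void toy_decrypt(unsigned char b[BLK], unsigned char key)
{
    size_t i;

    for (i = 0; i < BLK; i++)
        b[i] = (unsigned char)((b[i] - 1) ^ key);
}

/* In-place CBC encryption: each block depends on the previous
   ciphertext block, so the loop cannot be reordered. */
void cbc_encrypt(unsigned char *buf, size_t nblocks,
                 const unsigned char iv[BLK], unsigned char key)
{
    const unsigned char *prev = iv;
    size_t i, j;

    assert(nblocks == 0 || buf != NULL);
    for (i = 0; i < nblocks; i++) {
        unsigned char *cur = buf + i * BLK;

        for (j = 0; j < BLK; j++)
            cur[j] ^= prev[j];
        toy_encrypt(cur, key);
        prev = cur;
    }
}

/* Decrypt block i only: reads ciphertext blocks i and i-1 (or the IV). */
void cbc_decrypt_block(unsigned char *out, const unsigned char *ct, size_t i,
                       const unsigned char iv[BLK], unsigned char key)
{
    const unsigned char *prev = i ? ct + (i - 1) * BLK : iv;
    size_t j;

    for (j = 0; j < BLK; j++)
        out[j] = ct[i * BLK + j];
    toy_decrypt(out, key);
    for (j = 0; j < BLK; j++)
        out[j] ^= prev[j];
}
```

Because cbc_decrypt_block has no dependency on other output blocks, several calls could be interleaved, which is the property the pipelining discussion relies on.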

BTW: Is ECB used widely in practice?

Best Regards,
Huang Ying





Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying
On Wed, 2008-12-10 at 16:01 +0800, Andy Polyakov wrote:
  I doubt the OS vendors would bother
  to enable an engine by default, testing of the possible configurations is
  expensive and the costs of support calls if they mess up makes
  autodetecting the engine to use a very unattractive proposition.
  One can discuss loading selected engines by default, i.e. you'd have to
  work to not load it:-) Then it wouldn't be any different, yet provide
  
  I am new to OpenSSL. Can you tell me how to do that? how to use the
  proper engine automatically?
 
 I said one can discuss it, there is no way currently, but as it's 
 *soft*ware there is hardly limit for what one can do. A.

What's your idea about that? It seems that EVP_CipherInit_ex() will
check engines, and an AES-NI engine could register itself when the
appropriate CPUID bit is set.

Best Regards,
Huang Ying





Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying
On Wed, 2008-12-10 at 17:22 +0800, Andy Polyakov wrote:
  - OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable
  for both Unix and Win64;
  
  OK. I will follow the way like that in aes-x86_64.pl to deal with ABI
  issue.
 
 Oh! Currently x86_64-xlate.pl doesn't handle 3 operand instructions, so 
 some SIMD instructions can't be handled. Some adjustments for 
 crypto/perlasm are required...

I know only a little Perl and I have no Win64 machine to
test on. Can you help me with that?

Best Regards,
Huang Ying





Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

2008-12-10 Thread Huang Ying
On Wed, 2008-12-10 at 15:56 +0800, Andy Polyakov wrote:
 Implementation aiming to complement interface exposed by crypto/aes/asm 
 should allow for non-16-byte-aligned key schedule. Period. One can use 
 movups, or check alignment and choose between movups and movaps code 
 paths, or copy key schedule to aligned location on stack.

Should copying the key schedule to the stack be considered unsafe
behavior? The stack may be swapped out to a swap file, so the key
schedule could be leaked.
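
For illustration, the "check alignment, else copy to an aligned stack location" option might look like the sketch below, with the temporary copy wiped afterwards. KS_BYTES, the callback type and all function names are hypothetical. Wiping only shortens the lifetime of the copy; it does not by itself keep the page out of swap, which would need a separate measure such as locking the memory. OpenSSL code would normally use OPENSSL_cleanse for the wipe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_BYTES 240u  /* hypothetical key-schedule size in bytes */

int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Overwrite through a volatile pointer so the stores are not elided. */
void wipe(void *p, size_t n)
{
    volatile unsigned char *v = (volatile unsigned char *)p;

    while (n--)
        *v++ = 0;
}

/* Hypothetical consumer that requires a 16-byte-aligned schedule. */
typedef void (*aligned_fn)(const unsigned char *ks, void *arg);

/* Demo consumer: records whether it was handed an aligned pointer. */
void record_alignment(const unsigned char *ks, void *arg)
{
    *(int *)arg = is_aligned_16(ks);
}

void with_aligned_schedule(const unsigned char *ks, aligned_fn fn, void *arg)
{
    unsigned char tmp[KS_BYTES + 15];
    unsigned char *al;

    assert(ks != NULL && fn != NULL);
    if (is_aligned_16(ks)) {   /* fast path: use the schedule in place */
        fn(ks, arg);
        return;
    }
    al = (unsigned char *)(((uintptr_t)tmp + 15u) & ~(uintptr_t)15u);
    memcpy(al, ks, KS_BYTES);
    fn(al, arg);
    wipe(tmp, sizeof tmp);     /* do not leave the copy on the stack */
}
```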

Best Regards,
Huang Ying


