Re: Open SSL and CUDA

2012-11-14 Thread Miele Andrea
Thanks a lot for your reply, Andy.
Some time ago I came up with a proof-of-concept multi-threaded implementation.
The GPU is not used if the system load (measured through getloadavg() under Linux)
is below a certain threshold.
Otherwise each thread puts its message (on which the private key operation has
to be performed) into a shared buffer.
If the buffer is full after inserting the message, the current thread runs the
private key operation batch on the GPU.
If the buffer is not full, it sleeps for some time.
The first thread to wake up runs the batch on the GPU even if the buffer is not
full.
The thread running the batch wakes up the others afterwards.
What do you think about this approach?
Can you give more insight about developing the idea for DNSSEC?
How can I go about that?
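A minimal Python sketch of the batching scheme described above, for illustration only. It uses `threading.Condition` as the shared buffer's lock and wake-up mechanism, and `os.getloadavg()` (Unix-only) for the load check. The names `GpuBatcher`, `private_key_op`, `run_batch`, `cpu_op`, `capacity`, `wait_seconds` and `load_threshold` are hypothetical; `run_batch` merely stands in for the GPU private-key kernel, and no real CUDA or OpenSSL call is made.

```python
import os
import threading


class _Request:
    """One pending private-key operation; compared by identity on purpose."""
    __slots__ = ("msg", "result", "error", "done")

    def __init__(self, msg):
        self.msg, self.result, self.error, self.done = msg, None, None, False


class GpuBatcher:
    """Hypothetical shared buffer that batches private-key operations."""

    def __init__(self, run_batch, capacity, wait_seconds):
        # run_batch stands in for the GPU kernel: it must map a list of
        # messages to a list with exactly one result per message.
        self.run_batch = run_batch
        self.capacity = capacity          # a full buffer is flushed at once
        self.wait_seconds = wait_seconds  # how long a waiting thread sleeps
        self.cond = threading.Condition()
        self.pending = []

    def _flush(self):
        # Caller holds self.cond. Detach the current batch so other threads
        # can start filling a new one, run it without the lock held, then
        # publish the results.
        batch, self.pending = self.pending, []
        self.cond.release()
        try:
            results, error = self.run_batch([r.msg for r in batch]), None
        except Exception as exc:
            results, error = [None] * len(batch), exc
        self.cond.acquire()
        for req, res in zip(batch, results):
            req.result, req.error, req.done = res, error, True
        self.cond.notify_all()  # wake the threads sleeping on this batch

    def submit(self, msg):
        req = _Request(msg)
        with self.cond:
            self.pending.append(req)
            if len(self.pending) >= self.capacity:
                self._flush()  # buffer full: this thread runs the batch
            while not req.done:
                woke = self.cond.wait(self.wait_seconds)
                # Timed out and our request is still buffered: the first
                # thread to get here runs the (possibly partial) batch.
                if not woke and not req.done and req in self.pending:
                    self._flush()
        if req.error is not None:
            raise req.error
        return req.result


def private_key_op(msg, batcher, cpu_op, load_threshold):
    # Low 1-minute load average: skip the GPU and run on the CPU directly.
    if os.getloadavg()[0] < load_threshold:
        return cpu_op(msg)
    return batcher.submit(msg)
```

One design choice worth noting: the batch is detached from the buffer and run with the lock released, so other threads can keep filling the next batch while the GPU is busy, and a thread that times out only runs a batch if its own request is still waiting in the buffer.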


Compiling OpenSSL with icc

2012-11-14 Thread Timur I. Bakeyev
Hi all!

Not sure if this is an appropriate ML for the question, but I hope it's
interesting for the developers.

I've tried to compile OpenSSL 1.0.1c with the latest ICC (Intel C Compiler)
on my x86_64 Debian Squeeze machine:

# icc -v
icc version 13.0.1 (gcc version 4.4.5 compatibility)

Despite the absence of a special -icc target for the x86_64 architecture,
specifying CC=icc did the trick and OpenSSL compiled smoothly without any
errors. But... make test failed with tons of errors, starting with:

des_ede3_cbcm_encrypt decrypt error
 37 36 35 34 33 32 31 20 4e 6f 77 20 69 73 20 74 68 65 20 74 69 6d 65 20 66 6f 72 20 00
 5c c4 64 56 4b 7d 39 f6 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Doing ecb
Encryption error  1
k= p= o=8CA64DE9C1B123A7
act=5CC464564B7D39F6
Decryption error  1
k= p=5CC464564B7D39F6 o=
act=5CC464564B7D39F6
Encryption error  2
k= p= o=7359B2163E4EDC58
act=68845C562BF5316A
Decryption error  2
...
fast crypt test fast crypt error, ef18m.H5CTFIY should be efGnQx2725bI2
fast crypt error, yAH2WUFGPVkkc should be yA1Rp/1hZXIJk

Obviously, something went wrong here :( I played around with the optimization
flags in the Makefile (removing whatever was left over from the default target),
but nothing helped.

This is definitely beyond my expertise: I can't figure out why the results
from icc are so different from those from gcc. But I think it would be really
useful to fix the problem and get icc working, as in some of the speed tests
I noticed a 15% increase in performance. Of course, those numbers are
meaningless while the resulting hashes are wrong, but I hope to see a real
comparison.

With best regards,
Timur Bakeyev.


RC4 - asm or C?

2012-11-14 Thread Timur I. Bakeyev
Hi all!

I know it's an old topic that has been discussed several times in the past,
but I've decided to check the difference between the asm and C implementations
of RC4 in OpenSSL 1.0.1c in my own environment, on an
Intel(R) Xeon(R) CPU X5679 @ 3.20GHz.

http://zombe.es/post/405783/openssl-outmoded-asm

Well, the results are quite interesting.

# ./openssl speed -evp rc4

ASM
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4                287633.90k   573238.77k   735101.34k   777062.91k   794848.66k
rc4                286393.18k   572485.03k   731541.58k   795963.08k   817934.21k

vs.

NO ASM
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4                462543.94k   530657.76k   539455.79k   547207.11k   548447.55k
rc4                472625.58k   531457.61k   541795.39k   547749.59k   548894.14k

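For the smallest blocks (16 bytes) the C implementation still rocks (roughly
1.6 times the asm throughput), but as the block size grows, the assembler code
outperforms the C one: it is already ahead at 64 bytes and about 1.45 to 1.5
times faster at 8192 bytes.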
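The per-block-size ratio can be computed directly from the two `openssl speed` runs per build shown in the tables. A small Python sketch (the names `ASM`, `NO_ASM` and `c_over_asm` are illustrative; the numbers are copied from the tables and are in 1000s of bytes per second) that averages the two runs and divides the C throughput by the asm throughput:

```python
# openssl speed -evp rc4 results from the tables above, in 1000s of bytes
# per second; two runs per build.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
ASM = [
    [287633.90, 573238.77, 735101.34, 777062.91, 794848.66],
    [286393.18, 572485.03, 731541.58, 795963.08, 817934.21],
]
NO_ASM = [
    [462543.94, 530657.76, 539455.79, 547207.11, 548447.55],
    [472625.58, 531457.61, 541795.39, 547749.59, 548894.14],
]


def mean(values):
    return sum(values) / len(values)


def c_over_asm(asm_runs, c_runs):
    """Mean C throughput divided by mean asm throughput, per block size.

    A value above 1.0 means the C (no-asm) build is faster at that size.
    """
    return {
        size: mean([run[i] for run in c_runs]) / mean([run[i] for run in asm_runs])
        for i, size in enumerate(BLOCK_SIZES)
    }


if __name__ == "__main__":
    for size, ratio in c_over_asm(ASM, NO_ASM).items():
        print(f"{size:5d} bytes: C/asm = {ratio:.2f}")
```

On these numbers the ratio is about 1.63 for 16-byte blocks, already drops below 1 at 64 bytes (about 0.93), and reaches about 0.68 at 8192 bytes, i.e. the asm build is roughly 1.47 times faster there.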

I guess that from now on the asm implementation of RC4 should be preferred.

But I'm curious: why is there such a drop in the performance of the asm code on
small blocks, and what can be done to address that issue? Also, what is the
typical size of an RC4 block in SSL traffic, i.e. which test is more realistic?

With best regards,
Timur Bakeyev.


Re: RC4 - asm or C?

2012-11-14 Thread Peter Waltenberg
Quite a simple answer.
The maximum TLS record size is 16k - overhead. Optimize for that (16k).

Yes but ...

The other cases don't matter: as the packet size decreases, other factors,
like the TCP/IP stack and network latency, dominate performance. So if you
send lots of small packets, your net throughput is going to be limited by
things other than encryption speed anyway.

For other uses of encryption, it might matter, but for SSL, it's an easy
answer.
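To illustrate why the large-block numbers are the relevant ones for bulk SSL/TLS traffic, here is a minimal Python sketch assuming the 2^14-byte (16384) limit on a record's plaintext fragment from the TLS 1.0 to 1.2 specifications: a large application write is split into records of at most 16 KiB, so the cipher mostly sees chunks of that size. `record_sizes` is a hypothetical helper; it ignores the per-record MAC (which RC4 also encrypts) and any smaller record size an implementation may choose.

```python
# Upper bound on a TLS record's plaintext fragment: 2**14 bytes (TLS 1.0-1.2).
MAX_FRAGMENT = 2 ** 14


def record_sizes(payload_len, max_fragment=MAX_FRAGMENT):
    """Plaintext length of each record when one application write of
    payload_len bytes is split into maximum-size TLS records.

    Simplified: ignores the MAC/padding added per record and any smaller
    record size an implementation might prefer.
    """
    sizes = []
    remaining = payload_len
    while remaining > 0:
        chunk = min(remaining, max_fragment)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(record_sizes(40000))  # [16384, 16384, 7232]
```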

Peter




From:    Timur I. Bakeyev ti...@com.bat.ru
To:      openssl-dev openssl-dev@openssl.org
Date:    14/11/2012 23:58
Subject: RC4 - asm or C?
Sent by: owner-openssl-...@openssl.org

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org