Re: Open SSL and CUDA
Thanks a lot for your reply, Andy.

Some time ago I came up with a proof-of-concept multi-threaded implementation. The GPU is not used if the system load (measured through getloadavg under Linux) is below a certain threshold. Otherwise each thread puts its message (on which the private key operation has to be performed) into a shared buffer. If the buffer is full after inserting the message, the current thread runs the private key operation batch on the GPU. If the buffer is not full, the thread sleeps for some time. The first thread to wake up runs the batch on the GPU even if the buffer is not full. The thread running the batch wakes up the others afterwards.

What do you think about this approach? Can you give more insight into developing the idea for DNSSEC? How can I go about that?
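The batching scheme described above could be sketched roughly as follows. This is only an illustration in Python, not the proof of concept itself: the names BatchQueue, submit, private_key_op, run_batch and cpu_op are all made up, run_batch stands in for whatever call actually launches the batch on the GPU, and the lock is held while the batch runs, which a real implementation would probably want to avoid.

```python
import os
import threading


class BatchQueue:
    """Collects messages from worker threads and processes them in batches."""

    def __init__(self, capacity, max_wait, run_batch):
        self.capacity = capacity    # buffer size that triggers an immediate batch run
        self.max_wait = max_wait    # seconds a thread sleeps before flushing a partial batch
        self.run_batch = run_batch  # hypothetical GPU call: list of messages -> list of results
        self.cond = threading.Condition()
        self.pending = []           # (message, slot) pairs waiting for the next batch

    def submit(self, message):
        slot = {"done": False, "result": None}
        with self.cond:
            self.pending.append((message, slot))
            if len(self.pending) < self.capacity:
                # Buffer not full: sleep until a batch runner wakes us or the timeout expires.
                self.cond.wait_for(lambda: slot["done"], timeout=self.max_wait)
            if not slot["done"]:
                # We either filled the buffer or were the first to wake up: run the batch.
                batch, self.pending = self.pending, []
                results = self.run_batch([m for m, _ in batch])
                for (_, s), r in zip(batch, results):
                    s["result"] = r
                    s["done"] = True
                # Wake up the other threads whose messages were in this batch.
                self.cond.notify_all()
        return slot["result"]


def private_key_op(message, queue, cpu_op, threshold,
                   load=lambda: os.getloadavg()[0]):
    """Use the CPU path when the 1-minute load average is below the threshold."""
    if load() < threshold:
        return cpu_op(message)
    return queue.submit(message)
```

Each worker thread would call private_key_op with its own message; max_wait bounds the extra latency a message can pick up while waiting for the buffer to fill.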
Compiling OpenSSL with icc
Hi all!

Not sure if this is the appropriate mailing list for the question, but I hope it's interesting for the developers.

I've tried to compile OpenSSL 1.0.1c with the latest ICC (Intel C Compiler) on my x86_64 Debian Squeeze machine:

    # icc -v
    icc version 13.0.1 (gcc version 4.4.5 compatibility)

Although there is no dedicated icc target for the x86_64 architecture, specifying CC=icc did the trick and OpenSSL compiled smoothly without any errors. But make test failed with tons of errors, starting with:

    des_ede3_cbcm_encrypt decrypt error
    37 36 35 34 33 32 31 20 4e 6f 77 20 69 73 20 74 68 65 20 74 69 6d 65 20 66 6f 72 20 00 5c c4 64 56 4b 7d 39 f6 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Doing ecb
    Encryption error 1 k= p= o=8CA64DE9C1B123A7 act=5CC464564B7D39F6
    Decryption error 1 k= p=5CC464564B7D39F6 o= act=5CC464564B7D39F6
    Encryption error 2 k= p= o=7359B2163E4EDC58 act=68845C562BF5316A
    Decryption error 2 ...
    fast crypt test
    fast crypt error, ef18m.H5CTFIY should be efGnQx2725bI2
    fast crypt error, yAH2WUFGPVkkc should be yA1Rp/1hZXIJk

Obviously, something went wrong here :(

I played around with the optimization flags in the Makefile (removing whatever was left from the default target), but nothing helped. This is definitely beyond my expertise. I can't figure out why the results from icc are so different from gcc, but I think it would be really useful to fix the problem and get icc working, as in some of the speed tests I noticed a 15% increase in performance. Of course, those numbers are useless as long as the results are wrong, but I hope to see a real comparison.

With best regards,
Timur Bakeyev.
RC4 - asm or C?
Hi all!

I know it's an old topic that has been discussed several times in the past, but I decided to check in my own environment the difference between the asm and C implementations of RC4 in OpenSSL 1.0.1c on an Intel(R) Xeon(R) CPU X5679 @ 3.20GHz.

http://zombe.es/post/405783/openssl-outmoded-asm

Well, the results are quite interesting.

    # ./openssl speed -evp rc4

ASM:

    type      16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    rc4     287633.90k   573238.77k   735101.34k   777062.91k   794848.66k
    rc4     286393.18k   572485.03k   731541.58k   795963.08k   817934.21k

vs. NO ASM:

    type      16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    rc4     462543.94k   530657.76k   539455.79k   547207.11k   548447.55k
    rc4     472625.58k   531457.61k   541795.39k   547749.59k   548894.14k

For small blocks the C implementation still rocks (roughly 1.6 times the asm throughput at 16 bytes), but as the block size grows the assembler code outperforms the C one, by 45-50% at 8192 bytes. I guess from now on the asm implementation of RC4 should be preferred. But I'm curious why the asm code is so much slower on small blocks, and what can be done to address that? Also, what is the common size of an RC4 block in SSL traffic, i.e. which test is more realistic?

With best regards,
Timur Bakeyev.
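To put numbers on the comparison, here is a small sketch that divides the C throughput by the asm throughput for each block size. The figures are copied from the first run of each table above (openssl speed reports them in thousands of bytes per second); the names BLOCK_SIZES, ASM, C_ONLY and c_over_asm are made up for this illustration.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]

# First run of each table above, in 1000s of bytes per second.
ASM = [287633.90, 573238.77, 735101.34, 777062.91, 794848.66]
C_ONLY = [462543.94, 530657.76, 539455.79, 547207.11, 548447.55]


def c_over_asm():
    """Return {block size: C throughput / asm throughput}."""
    return {size: c / asm for size, asm, c in zip(BLOCK_SIZES, ASM, C_ONLY)}


for size, ratio in c_over_asm().items():
    faster = "C" if ratio > 1 else "asm"
    print(f"{size:5d} bytes: C/asm = {ratio:.2f} ({faster} faster)")
```

On these figures C leads only at 16 bytes, by a factor of about 1.6, and asm leads from 64 bytes up, reaching about 1.45 times the C throughput at 8192 bytes.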
Re: RC4 - asm or C?
Quite a simple answer: the maximum TLS record size is 16k minus overhead. Optimize for that (16k).

Yes, but ... the other cases don't matter. As the packet size decreases, other factors such as the TCP/IP stack and network latency dominate performance, so if you send lots of small packets your net throughput is going to be limited by things other than encryption speed anyway.

For other uses of encryption it might matter, but for SSL it's an easy answer.

Peter

In reply to:
From: Timur I. Bakeyev ti...@com.bat.ru
To: openssl-dev openssl-dev@openssl.org
Date: 14/11/2012 23:58
Subject: RC4 - asm or C?
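The point about the record size can be illustrated with a small sketch that splits an application payload into TLS records. It assumes the sender always fills records up to the protocol limit of 2^14 = 16384 plaintext bytes, which real stacks do not necessarily do; record_sizes and MAX_FRAGMENT are names made up for this example.

```python
MAX_FRAGMENT = 2 ** 14  # TLS limit on plaintext bytes per record (16384)


def record_sizes(payload_len, max_fragment=MAX_FRAGMENT):
    """Split a payload into plaintext fragment sizes, filling each record (an assumption)."""
    full, rest = divmod(payload_len, max_fragment)
    sizes = [max_fragment] * full
    if rest:
        sizes.append(rest)
    return sizes


# A 100000-byte transfer: six full 16384-byte records plus one 1696-byte record.
print(record_sizes(100000))
```

Under that assumption, bulk transfers are encrypted almost entirely in full-size blocks, which is why the largest block size in the benchmark is the closest match for bulk SSL traffic.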