Re: RC4 - asm or C?
First of all, I'd dare to assert that the results in the original message are tainted by varying oscillator frequency. If you want to obtain reliable results, you should perform benchmarks at a fixed oscillator frequency. Note that it's not sufficient to configure the OS to run at a fixed frequency, e.g. by setting cpufreq/scaling_governor to 'performance' on Linux; you have to disable Turbo Boost in the BIOS as well. Only then can one meaningfully compare results.

>> But I'm curious, why there is such a drop in performance of asm code and
> In the C case the unrolled loop is entered for lengths of 8 bytes and beyond. In assembler the optimized loop is engaged for lengths larger than 32...
>> what can be done to address that issue?
> As Peter implied, the question is whether it's worth the effort. Say you improve small-block performance by 60%. But if the operation in question takes only 10% of the total time, then the net effect would be 6%. Well, I can have a look, but please don't hold your breath.

SSH is surely the protocol that would exhibit the shortest packets, upon single-character entry, right? As the whole thing is encrypted (i.e. there is no clear-text data in the SSH stream, right?), the shortest observable packet is directly representative in this context. Now, even the shortest packet seems to be larger than 16 bytes; the least I could observe is 48. So the comparison at 16-byte block size is probably not that relevant. 48 is larger than the above-mentioned 32, and so there is hardly any reason to put effort into optimizing for shorter inputs.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majord...@openssl.org
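The "60% of 10%" estimate is an instance of Amdahl's law. A minimal C sketch of that arithmetic, assuming "improve by 60%" means the RC4 part takes 60% less time (which saves exactly 6% of total run time); if it instead means 1.6 times the throughput, the overall speedup comes out at only about 3.9%. The function names are illustrative, not from any library.

```c
/* Back-of-envelope arithmetic for "improve a 10% part by 60%".
 * p is the fraction of total run time spent in the part being optimized
 * (e.g. 0.10 for RC4 on small blocks). */

/* Fraction of total run time saved when the optimized part's own time
 * is cut by time_cut (0.60 means it now takes 60% less time). */
double total_time_saved(double p, double time_cut)
{
    return p * time_cut;
}

/* Amdahl's law: overall speedup when the optimized part runs s times
 * faster (s = 1.6 means 60% more throughput for that part). */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For p = 0.10, total_time_saved(0.10, 0.60) is 0.06, while amdahl_speedup(0.10, 1.6) is 1 / 0.9625, roughly 1.039. Either reading supports the point that small-block tuning buys little overall.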
Re: RC4 - asm or C?
Is the ASM code compiled with MASM or NASM? And what does the value 287633.90k mean? Can you send your testing code in C/C++? I will test it.

From: Timur I. Bakeyev
Sent: Wednesday, November 14, 2012 5:57 PM
To: openssl-dev
Subject: RC4 - asm or C?
Re: RC4 - asm or C?
Hi, Vladimir!

Not much to send, it's the standard OpenSSL benchmark:

# ./openssl speed -evp rc4

Just compiled with and without the ASM RC4 implementation. In this case it's whatever gcc uses as its assembler.

With best regards,
Timur Bakeyev.

On Fri, Nov 16, 2012 at 10:16 AM, Vladimir Belov vladimbe...@gmail.com wrote:
> ./openssl speed -evp rc4
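`openssl speed` reports its figures in 1000s of bytes per second, so "287633.90k" is roughly 287.6 million bytes per second. A small C sketch converting such figures to cycles per byte, assuming the core stayed at its nominal 3.20 GHz for the whole run (only true with frequency scaling and Turbo Boost disabled). The helper names are hypothetical.

```c
/* openssl speed prints throughput in 1000s of bytes per second,
 * e.g. "287633.90k" -> 287633.90 * 1000 bytes/s. */
double speed_k_to_bytes_per_sec(double k)
{
    return k * 1000.0;
}

/* Cycles spent per byte, ASSUMING the core ran at exactly clock_hz for
 * the whole measurement (no frequency scaling, no Turbo Boost). */
double cycles_per_byte(double k, double clock_hz)
{
    return clock_hz / speed_k_to_bytes_per_sec(k);
}
```

Under that fixed-clock assumption, the asm build's 287633.90k at 16 bytes works out to about 11.1 cycles per byte, and its 794848.66k at 8192 bytes to about 4.0 cycles per byte.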
Re: RC4 - asm or C?
> But I'm curious, why there is such a drop in performance of asm code and

In the C case the unrolled loop is entered for lengths of 8 bytes and beyond. In assembler the optimized loop is engaged only for lengths larger than 32. One should keep in mind that RC4 is very sensitive to architectural characteristics. The fact that the C code was faster had less to do with the quality of compiler-generated code than with the fact that pre-Sandy Bridge hardware was confusing itself on long blocks. As mentioned in rc4-586.pl, performance vs. block size had [quite a] maximum at 64 bytes. Then the assembler performed worse because it was using a compact byte-based key schedule, while C used a word-based one, and the degree of confusion was in reverse proportion to key schedule size.

> what can be done to address that issue?

As Peter implied, the question is whether it's worth the effort. Say you improve small-block performance by 60%. But if the operation in question takes only 10% of the total time, then the net effect would be 6%. Well, I can have a look, but please don't hold your breath.
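A minimal plain-C sketch of the two mechanisms described above: an unrolled main loop that is only entered while at least 8 bytes remain (shorter inputs fall through to a byte-at-a-time tail), and a key schedule stored either as bytes or as 32-bit words. This is a hypothetical illustration, not OpenSSL's rc4_enc.c or rc4-586.pl; `rc4_ctx`, `rc4_set_key`, `rc4_crypt` and the `RC4_WORD_STATE` switch are made-up names. Both state layouts produce identical output; they differ in memory footprint (256 bytes vs. 1 KB) and in how the hardware handles the loads and stores.

```c
#include <stddef.h>

/* Hypothetical switch: define RC4_WORD_STATE to keep the permutation as
 * 32-bit words ("word-based key schedule") instead of bytes. */
#ifdef RC4_WORD_STATE
typedef unsigned int rc4_elem;
#else
typedef unsigned char rc4_elem;
#endif

typedef struct {
    unsigned int x, y;
    rc4_elem S[256];
} rc4_ctx;

/* Standard RC4 key-scheduling algorithm; keylen must be > 0. */
void rc4_set_key(rc4_ctx *c, const unsigned char *key, size_t keylen)
{
    unsigned int i, j = 0;
    for (i = 0; i < 256; i++)
        c->S[i] = (rc4_elem)i;
    for (i = 0; i < 256; i++) {
        rc4_elem t = c->S[i];
        j = (j + t + key[i % keylen]) & 0xff;
        c->S[i] = c->S[j];
        c->S[j] = t;
    }
    c->x = c->y = 0;
}

/* One PRGA step: advance x/y, swap, XOR one keystream byte into out[k]. */
#define RC4_STEP(k)                                                  \
    do {                                                             \
        rc4_elem tx, ty;                                             \
        x = (x + 1) & 0xff;                                          \
        tx = S[x];                                                   \
        y = (y + tx) & 0xff;                                         \
        ty = S[y];                                                   \
        S[x] = ty;                                                   \
        S[y] = tx;                                                   \
        out[k] = (unsigned char)(in[k] ^ S[(tx + ty) & 0xff]);       \
    } while (0)

void rc4_crypt(rc4_ctx *c, size_t len, const unsigned char *in,
               unsigned char *out)
{
    unsigned int x = c->x, y = c->y;
    rc4_elem *S = c->S;

    /* Unrolled main loop: taken only while at least 8 bytes remain. */
    while (len >= 8) {
        RC4_STEP(0); RC4_STEP(1); RC4_STEP(2); RC4_STEP(3);
        RC4_STEP(4); RC4_STEP(5); RC4_STEP(6); RC4_STEP(7);
        in += 8; out += 8; len -= 8;
    }
    /* Tail: the remaining 0..7 bytes, one at a time. */
    while (len > 0) {
        RC4_STEP(0);
        in++; out++; len--;
    }
    c->x = x;
    c->y = y;
}
```

Compiling with `-DRC4_WORD_STATE` switches the state to words; timing both variants at 16-byte and 8192-byte inputs on a fixed-frequency machine is one way to look for the effect described above.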
RC4 - asm or C?
Hi all!

I know it's an old topic that has been discussed several times in the past, but I've decided to check, in my own environment, the difference between the asm and C implementations of RC4 in OpenSSL 1.0.1c on an Intel(R) Xeon(R) CPU X5679 @ 3.20GHz.

http://zombe.es/post/405783/openssl-outmoded-asm

Well, the results are quite interesting.

# ./openssl speed -evp rc4

ASM (throughput in 1000s of bytes per second):

type     16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4    287633.90k   573238.77k   735101.34k   777062.91k   794848.66k
rc4    286393.18k   572485.03k   731541.58k   795963.08k   817934.21k

vs. NO ASM (throughput in 1000s of bytes per second):

type     16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4    462543.94k   530657.76k   539455.79k   547207.11k   548447.55k
rc4    472625.58k   531457.61k   541795.39k   547749.59k   548894.14k

For small blocks the C implementation still rocks (the performance gain is about 60%), but as the block size grows, the assembler code outperforms the C one. I guess from now on the asm implementation of RC4 should be preferred.

But I'm curious: why is there such a drop in performance of the asm code, and what can be done to address that issue? Also, what is the common size of an RC4 block in SSL traffic, i.e. which test is more realistic?

With best regards,
Timur Bakeyev.
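The crossover in the tables above can be checked with a few lines of C. The figures are copied from the first rc4 row of each table (units: 1000s of bytes per second); the function names are illustrative only.

```c
#include <stddef.h>

/* First rc4 row of each table, in 1000s of bytes per second. */
static const int block_size[5] = {16, 64, 256, 1024, 8192};
static const double asm_k[5] = {287633.90, 573238.77, 735101.34, 777062.91, 794848.66};
static const double c_k[5]   = {462543.94, 530657.76, 539455.79, 547207.11, 548447.55};

/* Ratio > 1.0 means the C build was faster at block_size[i]. */
double c_over_asm(size_t i)
{
    return c_k[i] / asm_k[i];
}

/* Smallest measured block size at which the asm build wins, or -1. */
int smallest_block_where_asm_wins(void)
{
    for (size_t i = 0; i < 5; i++)
        if (asm_k[i] > c_k[i])
            return block_size[i];
    return -1;
}
```

At 16 bytes the C build is about 1.6 times as fast; from 64 bytes on the asm build wins, reaching about 1.45 times the C throughput at 8192 bytes.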
Re: RC4 - asm or C?
Quite a simple answer. The maximum TLS record size is 16k minus overhead. Optimize for that (16k).

Yes, but ...

The other cases don't matter: as the packet size decreases, other factors like the TCP/IP stack and network latency dominate performance, so if you send lots of small packets your net throughput is going to be limited by things other than encryption speed anyway.

For other uses of encryption it might matter, but for SSL it's an easy answer.

Peter

From: Timur I. Bakeyev ti...@com.bat.ru
To: openssl-dev openssl-dev@openssl.org
Date: 14/11/2012 23:58
Subject: RC4 - asm or C?
Sent by: owner-openssl-...@openssl.org
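For reference, RFC 5246 (section 6.2.1) limits the plaintext fragment of a single TLS record to 2^14 = 16384 bytes, which is where the 16k figure comes from; with RC4 the cipher also runs over the appended MAC, so a full record means slightly more than 16 KB of RC4 input. A minimal sketch, assuming a sender that fills every record to the limit, of how a large write splits into records; the helper names are hypothetical.

```c
#include <stddef.h>

/* Maximum TLSPlaintext.fragment length (2^14 bytes). */
#define TLS_MAX_PLAINTEXT 16384u

/* Number of records needed for len bytes of application data,
 * assuming each record is filled up to the limit. */
size_t tls_record_count(size_t len)
{
    return (len + TLS_MAX_PLAINTEXT - 1) / TLS_MAX_PLAINTEXT;
}

/* Plaintext length of record i (0-based), or 0 if i is out of range. */
size_t tls_record_len(size_t len, size_t i)
{
    size_t off = i * TLS_MAX_PLAINTEXT;
    if (off >= len)
        return 0;
    return (len - off < TLS_MAX_PLAINTEXT) ? len - off : TLS_MAX_PLAINTEXT;
}
```

Under this assumption a 1 MB write becomes 64 full records, so the per-record RC4 input sits around 16 KB, far above the block sizes where the C code was ahead.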