Re: RC4 - asm or C?

2012-11-19 Thread Andy Polyakov
First of all, I'd dare to assert that the results in the originating
message are tainted by varying oscillator frequency. If you want to
obtain reliable results, you should perform benchmarks at a fixed
oscillator frequency. Note that it's not sufficient to configure the OS
to run at a fixed frequency, e.g. by setting cpufreq/scaling_governor to
'performance' on Linux; you even have to disable Turbo Boost in the
BIOS. Only then can one meaningfully compare results.
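A minimal sketch of a pre-benchmark check, assuming a Linux system that exposes the usual cpufreq sysfs files (the paths and the helper names first_line_equals and frequency_not_obviously_scaled are assumptions, not part of OpenSSL). It can only flag a machine whose frequency is visibly being scaled. As noted above, a 'performance' governor alone is not enough, and Turbo Boost may still be active unless it is disabled in the BIOS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare the first line of a (sysfs) file against `expected`.
 * Returns 1 on match, 0 on mismatch, -1 if the file can't be read
 * (e.g. the kernel doesn't expose that knob on this machine). */
int first_line_equals(const char *path, const char *expected)
{
    char buf[64];
    FILE *f;

    assert(path != NULL && expected != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return strcmp(buf, expected) == 0;
}

/* Pre-benchmark sanity check (assumed Linux sysfs paths). Returns 0 if
 * frequency scaling is visibly active, 1 otherwise. A 1 does NOT prove
 * the clock is fixed: Turbo Boost may still be enabled in the BIOS. */
int frequency_not_obviously_scaled(void)
{
    int gov = first_line_equals(
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
        "performance");
    /* Only present with the newer intel_pstate driver; "1" = turbo off. */
    int no_turbo = first_line_equals(
        "/sys/devices/system/cpu/intel_pstate/no_turbo", "1");

    if (gov == 0 || no_turbo == 0)
        return 0;
    return 1;
}
```

Run it before `openssl speed -evp rc4` and treat a 0 result as a reason to distrust the numbers.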

>> But I'm curious, why is there such a drop in performance of the asm code, and
>
> In the C case, the unrolled loop is entered for lengths of 8 bytes and
> beyond. In assembler, the optimized loop is engaged for lengths larger
> than 32...
>
>> what can be done to address that issue?
>
> As Peter implied, the question is whether it's worth the effort. Say you
> improve small-block performance by 60%. But if the operation in question
> takes only 10% of the time, then the net effect would be 6%. Well, I can
> have a look, but please don't hold your breath.

SSH is surely the protocol that would exhibit the shortest packets, upon
single-character entry, right? As the whole thing is encrypted (i.e.
there is no clear-text data in the SSH stream, right?), the shortest
observable packet is directly representative in this context. Now, even
the shortest packet seems to be larger than 16 bytes; the smallest I
could observe was 48. So the comparison at 16-byte block size is
probably not that relevant. 48 is larger than the [above-mentioned] 32,
so there is hardly any reason to put effort into optimizing for shorter
inputs.
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: RC4 - asm or C?

2012-11-16 Thread Vladimir Belov
Is the ASM build compiled with MASM or NASM? And what does the value 287633.90k mean?


Can you send your testing code in C/C++? I will test it.


Re: RC4 - asm or C?

2012-11-16 Thread Timur I. Bakeyev
Hi, Vladimir!

Not much to send, it's a standard benchmark of openssl:

# ./openssl speed -evp rc4

I just compiled it with and without the asm RC4 implementation.

In this case, it's whatever gcc uses as its assembler.

With best regards,
Timur Bakeyev.


On Fri, Nov 16, 2012 at 10:16 AM, Vladimir Belov vladimbe...@gmail.com wrote:

> ./openssl speed -evp rc4


Re: RC4 - asm or C?

2012-11-15 Thread Andy Polyakov
> But I'm curious, why is there such a drop in performance of the asm code, and

In the C case, the unrolled loop is entered for lengths of 8 bytes and
beyond. In assembler, the optimized loop is engaged for lengths larger
than 32. One should keep in mind that RC4 is very sensitive to
architectural characteristics. The fact that the C code was faster had
less to do with the quality of compiler-generated code than with the
fact that pre-Sandy Bridge hardware was confusing itself on long blocks.
As mentioned in rc4-586.pl, performance vs. block size had [quite a]
maximum at 64 bytes. Then assembler performed worse because it was using
a compact byte-based key schedule while C used a word-based one, and the
degree of confusion was inversely proportional to key schedule size.
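To make the two code paths concrete, here is a minimal RC4 sketch in portable C. It is not OpenSSL's rc4_enc.c or rc4-586.pl; the names sketch_rc4_init, sketch_rc4 and the SKETCH_RC4_INT knob are hypothetical. The knob selects the element type of the state array, i.e. a byte-based (unsigned char) or word-based (unsigned int) key schedule. Both produce the same keystream, so any speed difference between them comes purely from how the hardware copes with the memory layout. Lengths of 8 bytes and beyond take an 8x-unrolled loop, mirroring the threshold described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical knob, loosely modelled on OpenSSL's RC4_INT setting:
 * unsigned char -> compact byte-based key schedule (256 bytes),
 * unsigned int  -> word-based key schedule (1 KB).
 * The keystream is identical; only the memory layout differs. */
#ifndef SKETCH_RC4_INT
#define SKETCH_RC4_INT unsigned int
#endif

typedef struct {
    unsigned char x, y;
    SKETCH_RC4_INT S[256];
} sketch_rc4_state;

/* Standard RC4 key-scheduling algorithm (KSA). */
void sketch_rc4_init(sketch_rc4_state *st, const unsigned char *key,
                     size_t keylen)
{
    unsigned int i;
    unsigned char j = 0;

    assert(key != NULL && keylen > 0 && keylen <= 256);
    for (i = 0; i < 256; i++)
        st->S[i] = (SKETCH_RC4_INT)i;
    for (i = 0; i < 256; i++) {
        SKETCH_RC4_INT t = st->S[i];
        j = (unsigned char)(j + t + key[i % keylen]);
        st->S[i] = st->S[j];
        st->S[j] = t;
    }
    st->x = st->y = 0;
}

/* One PRGA step: swap S[x] and S[y], return the next keystream byte. */
static unsigned char rc4_next(sketch_rc4_state *st)
{
    SKETCH_RC4_INT tx, ty;

    st->x = (unsigned char)(st->x + 1);
    tx = st->S[st->x];
    st->y = (unsigned char)(st->y + tx);
    ty = st->S[st->y];
    st->S[st->x] = ty;
    st->S[st->y] = tx;
    return (unsigned char)st->S[(unsigned char)(tx + ty)];
}

/* XOR `len` bytes of keystream into `in` (encrypt == decrypt).
 * Lengths of 8 bytes and beyond go through an 8x-unrolled loop;
 * the 0..7-byte remainder is processed one byte at a time. */
void sketch_rc4(sketch_rc4_state *st, size_t len,
                const unsigned char *in, unsigned char *out)
{
    while (len >= 8) {
        out[0] = (unsigned char)(in[0] ^ rc4_next(st));
        out[1] = (unsigned char)(in[1] ^ rc4_next(st));
        out[2] = (unsigned char)(in[2] ^ rc4_next(st));
        out[3] = (unsigned char)(in[3] ^ rc4_next(st));
        out[4] = (unsigned char)(in[4] ^ rc4_next(st));
        out[5] = (unsigned char)(in[5] ^ rc4_next(st));
        out[6] = (unsigned char)(in[6] ^ rc4_next(st));
        out[7] = (unsigned char)(in[7] ^ rc4_next(st));
        in += 8;
        out += 8;
        len -= 8;
    }
    while (len-- > 0)
        *out++ = (unsigned char)(*in++ ^ rc4_next(st));
}
```

Building with `-DSKETCH_RC4_INT='unsigned char'` gives the byte-based variant. Timing both variants at 16 to 8192-byte inputs on a fixed-frequency machine is one way to observe the layout effect, though a portable sketch like this will not match tuned assembler.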

> what can be done to address that issue?

As Peter implied, the question is whether it's worth the effort. Say you
improve small-block performance by 60%. But if the operation in question
takes only 10% of the time, then the net effect would be 6%. Well, I can
have a look, but please don't hold your breath.
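The estimate above is Amdahl's law. A minimal sketch in C (function names are illustrative) covering both readings of "improve by 60%": cutting the operation's own time by 60%, which yields the 6% saving directly, and making the operation run 1.6x faster.

```c
#include <assert.h>

/* Fraction of total run time saved when an operation that takes
 * `share` of the time (0..1) has its own time cut by `time_cut`
 * (e.g. 0.6 for "60% less time"). */
double time_saved(double share, double time_cut)
{
    assert(share >= 0.0 && share <= 1.0);
    assert(time_cut >= 0.0 && time_cut <= 1.0);
    return share * time_cut;
}

/* Amdahl's law: overall speedup when `share` of the run time
 * becomes `local_speedup` times faster. */
double overall_speedup(double share, double local_speedup)
{
    assert(share >= 0.0 && share <= 1.0 && local_speedup > 0.0);
    return 1.0 / ((1.0 - share) + share / local_speedup);
}
```

With the numbers from the message, time_saved(0.10, 0.60) gives 0.06. Reading "60%" as 1.6x throughput instead, overall_speedup(0.10, 1.6) is about 1.039, i.e. roughly 4% end to end. Either way the gain is in the single digits.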


RC4 - asm or C?

2012-11-14 Thread Timur I. Bakeyev
Hi all!

I know it's an old topic that has been discussed several times in the
past, but I've decided to check the difference between the asm and C
implementations of RC4 in OpenSSL 1.0.1c in my own environment, on an
Intel(R) Xeon(R) CPU X5679 @ 3.20GHz.

http://zombe.es/post/405783/openssl-outmoded-asm

Well, results are quite interesting.

# ./openssl speed -evp rc4

ASM
type     16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4    287633.90k   573238.77k   735101.34k   777062.91k   794848.66k
rc4    286393.18k   572485.03k   731541.58k   795963.08k   817934.21k

vs.

NO ASM
type     16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4    462543.94k   530657.76k   539455.79k   547207.11k   548447.55k
rc4    472625.58k   531457.61k   541795.39k   547749.59k   548894.14k

For small blocks the C implementation still rocks (a performance gain of
about 60%), but as the block size grows, the assembler code outperforms
the C one.
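The figures in the tables above are in thousands of bytes processed per second, the unit `openssl speed` reports in, so 287633.90k is roughly 287.6 MB/s. The sketch below copies the first-run numbers from the tables and turns them into C/asm throughput ratios per block size; all identifiers are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* First-run figures from the tables above, in 1000s of bytes/second. */
static const int block_sizes[5] = {16, 64, 256, 1024, 8192};
static const double asm_k[5] = {287633.90, 573238.77, 735101.34,
                                777062.91, 794848.66};
static const double c_k[5]   = {462543.94, 530657.76, 539455.79,
                                547207.11, 548447.55};

/* Throughput of the C build relative to the asm build at block index i:
 * > 1.0 means C is faster, < 1.0 means asm is faster. */
double c_over_asm(int i)
{
    assert(i >= 0 && i < 5);
    return c_k[i] / asm_k[i];
}

/* Print both throughputs in (decimal) MB/s plus the ratio. */
void print_comparison(void)
{
    int i;

    for (i = 0; i < 5; i++)
        printf("%5d bytes: asm %7.1f MB/s, C %7.1f MB/s, C/asm = %.2f\n",
               block_sizes[i], asm_k[i] / 1000.0, c_k[i] / 1000.0,
               c_over_asm(i));
}
```

The ratio is about 1.61 at 16 bytes, drops below 1 from 64 bytes on, and reaches about 0.69 at 8192 bytes, i.e. asm is roughly 45% faster there.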

I guess from now on the asm implementation of RC4 should be preferred.

But I'm curious, why is there such a drop in performance of the asm code,
and what can be done to address that issue? Also, what is the common size
of an RC4 block in SSL traffic, i.e. which test is more realistic?

With best regards,
Timur Bakeyev.


Re: RC4 - asm or C?

2012-11-14 Thread Peter Waltenberg
Quite a simple answer.
The maximum TLS record size is 16k - overhead. Optimize for that (16k).

Yes but ...

The other cases don't matter: as the packet size decreases, other factors,
like the TCP/IP stack and network latency, dominate performance, so if you
send lots of small packets, your net throughput is going to be limited by
things other than encryption speed anyway.

For other uses of encryption, it might matter, but for SSL, it's an easy
answer.
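The record-size argument can be illustrated with a sketch that splits a bulk transfer into TLS records. It assumes TLS 1.0 to 1.2, where a record carries at most 2^14 = 16384 bytes of plaintext (RFC 5246, section 6.2.1), and a sender that always fills records up to that limit; real stacks may flush short records earlier. The MAC that RC4 encrypts together with each fragment is ignored, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* TLS 1.0-1.2 cap a record's plaintext fragment at 2^14 bytes. */
#define TLS_MAX_PLAINTEXT ((size_t)16384)

/* Number of records needed for `total` bytes of application data,
 * assuming every record except the last is filled to the maximum. */
size_t tls_record_count(size_t total)
{
    return (total + TLS_MAX_PLAINTEXT - 1) / TLS_MAX_PLAINTEXT;
}

/* Plaintext size of the i-th record (0-based) under the same assumption. */
size_t tls_record_size(size_t total, size_t i)
{
    size_t n = tls_record_count(total);

    assert(i < n);
    if (i + 1 < n)
        return TLS_MAX_PLAINTEXT;                 /* full record */
    return total - (n - 1) * TLS_MAX_PLAINTEXT;   /* short tail */
}
```

For a 40000-byte transfer, tls_record_count() gives 3: two full 16384-byte records and a 7232-byte tail, so most bytes reach RC4 in 16 KB chunks.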

Peter



