On 7/22/2010 4:10 AM, Martin Sustrik wrote:
>> This is a somewhat weak example because the work being done by the
>> worker is so trivial, but even so on a virtual quad-core machine
>> building with -O0 I see a 35-40% reduction in processing time.
>>
> Worker being trivial, the large reduction in processing time is even more
> impressive.
>
Just to follow up on that, I thought I'd post the findings of my
benchmark comparison of GCC vs the Intel C++ compiler. They're kinda
impressive:
Test machine: virtual Ubuntu 10.04 guest running under VMware 7.0 on an
i7 host running Windows 7, 2 virtual CPUs with 2 cores each:
Async-Worker tests with GCC v4.4.3 with -O3 -msse -msse2 -msse3 -mssse3
-msse4 -msse4.1 -msse4.2 -mfpmath=sse -mtune=core2 -march=core2:
(NOTE: I used Acovea to find these optimal settings; I wouldn't
ordinarily use -mtune/-march because I always find they make things worse :)
~3580ms for serial RunAndReturn, ~3580ms for serial RunAndReturnLocal
~930ms for parallel RunAndReturn, ~940ms for parallel RunAndReturnLocal
Async-Worker tests with Intel C++ compiler 11.1 72 with -O3 -xHOST -ipo:
~2590ms for serial RunAndReturn, ~2580ms for serial RunAndReturnLocal (27% gain)
~700ms for parallel RunAndReturn, ~700ms for parallel RunAndReturnLocal (25% gain)
Building ZeroMQ with "icpc -O3 -ipo -xHOST" instead of GCC shaved an
extra 4-10ms off the parallel results.
Building both the Async::Worker examples and ZeroMQ with "icpc -O3 -ipo
-xHOST -fbuiltin" reduces benchmark times by up to 50ms.
Async-Worker tests with Intel C++ compiler 11.1 72 with -O3 -xHOST -ipo
-fbuiltin and ZeroMQ compiled with the same flags:
~2510ms for serial RunAndReturn, ~2510ms for serial RunAndReturnLocal (30% gain)
~640ms for parallel RunAndReturn, ~650ms for parallel RunAndReturnLocal (32% gain)
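For anyone who wants to check the arithmetic, here is a minimal sketch of how the gains above can be derived from the quoted timings, taking the GCC runs as the baseline. The helper names (percent_gain, speedup) and the constants are mine, purely for illustration; they are not part of the Async::Worker examples, and the inputs are the approximate figures from this post.

```cpp
// Percentage reduction in run time relative to a baseline timing.
constexpr double percent_gain(double baseline_ms, double new_ms) {
    return (baseline_ms - new_ms) / baseline_ms * 100.0;
}

// How many times faster the new run is than the baseline run.
constexpr double speedup(double baseline_ms, double new_ms) {
    return baseline_ms / new_ms;
}

// Approximate RunAndReturn timings quoted in this post, in milliseconds.
constexpr double gcc_serial_ms   = 3580.0;
constexpr double gcc_parallel_ms = 930.0;
constexpr double icc_serial_ms   = 2590.0;  // icpc -O3 -xHOST -ipo
constexpr double icc_parallel_ms = 700.0;   // icpc -O3 -xHOST -ipo

// Compiler gain, serial:   (3580 - 2590) / 3580, a little under 28%.
constexpr double serial_gain = percent_gain(gcc_serial_ms, icc_serial_ms);

// Compiler gain, parallel: (930 - 700) / 930, a little under 25%.
constexpr double parallel_gain = percent_gain(gcc_parallel_ms, icc_parallel_ms);

// Serial vs parallel under GCC: 3580 / 930, a bit under 4x on 4 cores.
constexpr double gcc_parallel_speedup = speedup(gcc_serial_ms, gcc_parallel_ms);
```

Since the timings are rounded to the nearest 10ms or so, the percentages are only good to about a point either way.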
Given the trivial workloads, these are fairly impressive benchmarks.
The Intel C++ compiler is dual-licensed; you can download the Linux
version free of charge:
http://software.intel.com/en-us/intel-compilers/
Compared to the Microsoft Visual C++ compiler (2008) we found performance
improvements of between 15% and 50%. The 2010 version is significantly
improved, but Intel's compiler still produces 10-30% improvements.
You may be aware there was some controversy over the Intel compiler
generating code that didn't work as well on AMD chips. This only
occurred when you built "alternate code paths" for SSE instructions etc,
and code built with the (9.x) version of the compiler would tend not to
use the alternate code paths unless it was running on an Intel CPU.
That option is now called "Build Intel specific optimizations", and the
alternate code paths now apply fairly to any CPU that claims to have
the feature set you are targeting.
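To make the difference concrete, here is a toy model of the two dispatch policies described above. This is a sketch for illustration only, not Intel's actual dispatcher: the Cpu struct, the Vendor and Path enums, and both function names are hypothetical, and a real dispatcher would query the CPU at run time rather than take a struct.

```cpp
// Hypothetical description of the CPU the program finds itself on.
enum class Vendor { Intel, AMD };
enum class Path { Baseline, Optimized };

struct Cpu {
    Vendor vendor;
    bool has_feature_set;  // CPU claims the instruction set being targeted
};

// Behaviour as described for the old (9.x) builds: the optimized
// path is taken only when the vendor is Intel AND the feature is present.
constexpr Path select_vendor_gated(Cpu cpu) {
    return (cpu.vendor == Vendor::Intel && cpu.has_feature_set)
               ? Path::Optimized
               : Path::Baseline;
}

// Behaviour as described for current builds: any CPU that claims the
// feature set gets the optimized path, regardless of vendor.
constexpr Path select_by_feature(Cpu cpu) {
    return cpu.has_feature_set ? Path::Optimized : Path::Baseline;
}
```

Under the first policy an AMD chip with the right features falls back to the baseline path; under the second it gets the same optimized path an Intel chip would.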
- Oliver