At least for now, I think I've done pretty much all I can do to optimize
the JPEG codec.  The latest round of optimizations in r3902 speeds up the
Huffman decoder by employing some of the same techniques I used in the
encoder, including register reduction, using a 64-bit holding register
on 64-bit platforms, eliminating branches, etc.  Unfortunately, due to
the nature of Huffman decoding, the loop unrolling that I did in the
encoder couldn't be applied to the decoder (because the decoder
dynamically alters the loop counter).  Some optimizations were borrowed
from TurboJPEG/mediaLib, so these specific modifications are licensed
under the wxWidgets Library License (GPL compatible).
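
For anyone unfamiliar with two of the techniques named above, here is a
minimal sketch in C of a 64-bit holding register and a branch-free sign
extension.  It is not the code from r3902.  The names bit_reader,
refill, get_bits and extend are hypothetical, and the sketch assumes a
plain MSB-first bit stream, ignoring JPEG's 0xFF byte stuffing and
markers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader, for illustration only. */
typedef struct {
    const uint8_t *src;   /* next unread input byte */
    size_t remaining;     /* input bytes left */
    uint64_t hold;        /* holding register; valid bits sit at the low end */
    int bits;             /* number of valid bits in hold */
} bit_reader;

/* Top up the holding register one byte at a time until it holds more
 * than 56 bits.  Once the input runs out, zero bytes are shifted in so
 * that callers never see a short register. */
static void refill(bit_reader *br)
{
    while (br->bits <= 56) {
        uint64_t byte = 0;
        if (br->remaining > 0) {
            byte = *br->src++;
            br->remaining--;
        }
        br->hold = (br->hold << 8) | byte;
        br->bits += 8;
    }
}

/* Return the next n bits (1 <= n <= 16), most significant bit first.
 * With a 64-bit register the refill path runs far less often than it
 * would with a 32-bit one. */
static unsigned get_bits(bit_reader *br, int n)
{
    if (br->bits < n)
        refill(br);
    br->bits -= n;
    return (unsigned)(br->hold >> br->bits) & ((1u << n) - 1u);
}

/* Branch-free form of the JPEG sign extension
 *     v < (1 << (s - 1)) ? v - (1 << s) + 1 : v
 * for 1 <= s <= 16.  The comparison yields 0 or 1, and negating it
 * gives an all-zeros or all-ones mask, so no conditional jump is
 * required. */
static int extend(unsigned v, int s)
{
    int x = (int)v;
    int mask = -(int)(x < (1 << (s - 1)));
    return x + (mask & (1 - (1 << s)));
}
```

A decoder built this way would call get_bits() for the additional bits
of each coefficient and pass the result through extend().  Whether the
compiler actually emits branch-free code for extend() depends on the
compiler and target.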

The attached spreadsheet shows where we stand, and a summary is included
below:


Performance relative to TurboJPEG/IPP (neglecting grayscale):

Pentium 4 Xeon/2.8 GHz/CentOS 4.7

         Compress         Decompress
64-bit:  -18.6 to -5.23%  -16.4 to +11.42%
32-bit:  -39.1 to -20.8%  -20.2 to +2.17%


AMD Athlon 64 X2 5050e/dual core/2.6 GHz/CentOS 5.3

         Compress         Decompress
64-bit:  +4.45 to +11.4%  -16.4 to +7.63%
32-bit:  -18.1 to -8.16%  -11.23 to +17.7%


Intel Core Duo/2.0 GHz/OS X 10.4.11

         Compress         Decompress
32-bit:  -22.8 to -9.56%  -26.0 to +3.76%



Performance relative to TigerVNC 1.0.0:

         Compress         Decompress
64-bit:  2.20 to 2.54x    2.11 to 2.54x
32-bit:  No change        +12.4 to +21.2%


In the aggregate, libjpeg/SIMD is now about an 80-90% solution relative
to IPP.  32-bit compression is still its weakness, particularly on Intel
chips.  However, 64-bit performance and 32-bit decompression performance
on AMD chips are very strong and are often even faster than IPP.  On
older 64-bit AMD processors that lack SSE3, libjpeg/SIMD will be the
clear victor, since 64-bit IPP does not support SSE2 instructions.


Looking at just the 4:4:4 performance, which is the case most likely to
be used on a LAN (where these types of performance differences would be
the most obvious), the performance is generally toward the high side of
the ranges listed above, making libjpeg/SIMD a 90-95% solution in the
aggregate relative to IPP.


Since this codec is now also integrated with the evolving VirtualGL 2.2
code base, I will be maintaining it in both projects and will make
changes to it later on if I discover any obvious improvements, but for
now, I'm happy with shipping it the way it is, barring any unforeseen
breakage.


Darrell

Attachment: tigervncvsipp3.ods
Description: Binary data

_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel
