At least for now, I think I've done pretty much all I can to optimize the JPEG codec. The latest round of optimizations, in r3902, speeds up the Huffman decoder using some of the same techniques I applied to the encoder, including register reduction, a 64-bit holding register on 64-bit platforms, and branch elimination. Unfortunately, due to the nature of Huffman decoding, the loop unrolling that I did in the encoder couldn't be done in the decoder, because the decoder dynamically alters the loop counter. Some optimizations were borrowed from TurboJPEG/mediaLib, so these specific modifications are licensed under the wxWidgets Library License (GPL-compatible).
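For readers who haven't looked at the decoder, here is a rough sketch of two of the ideas mentioned above: a 64-bit holding register that is refilled a byte at a time, and a branch-free version of the JPEG sign-extension step. This is not the code from r3902. The names (bit_reader, br_init, br_fill, br_get, huff_extend) are made up for illustration, and the sketch ignores JPEG byte stuffing (0xFF 0x00) and markers, zero-padding past the end of the input instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; not the libjpeg data structure. */
typedef struct {
    const uint8_t *src;  /* next unread input byte */
    const uint8_t *end;  /* one past the last input byte */
    uint64_t bits;       /* holding register; the low `count` bits are valid */
    int count;           /* number of valid bits in `bits` */
} bit_reader;

static void br_init(bit_reader *br, const uint8_t *data, size_t len)
{
    br->src = data;
    br->end = data + len;
    br->bits = 0;
    br->count = 0;
}

/* Top up the register until it holds more than 56 bits.
 * Assumption: no byte stuffing or markers; input is zero-padded at the end. */
static void br_fill(bit_reader *br)
{
    while (br->count <= 56) {
        uint64_t byte = (br->src < br->end) ? *br->src++ : 0;
        br->bits = (br->bits << 8) | byte;
        br->count += 8;
    }
}

/* Return the next n bits (1..32), most significant bit first. */
static uint32_t br_get(bit_reader *br, int n)
{
    assert(n >= 1 && n <= 32);
    if (br->count < n)
        br_fill(br);  /* with a 64-bit register this is needed only rarely */
    br->count -= n;
    return (uint32_t)((br->bits >> br->count) & (((uint64_t)1 << n) - 1));
}

/* Branch-free JPEG sign extension: an s-bit value v whose top bit is
 * clear represents the negative number v - (2^s - 1). */
static int32_t huff_extend(uint32_t v, int s)
{
    assert(s >= 1 && s <= 16);
    uint32_t top_clear = ((v >> (s - 1)) & 1u) ^ 1u;  /* 1 if v < 2^(s-1) */
    uint32_t mask = 0u - top_clear;                   /* all ones or zero */
    return (int32_t)v - (int32_t)(mask & ((1u << s) - 1u));
}
```

With the two input bytes 0xA5 0x3C, br_get(&br, 3) returns 5 (binary 101), and huff_extend(2, 3) returns -5. The point of the wider register is simply that the refill loop runs once per several symbols instead of once per symbol or two.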
The attached spreadsheet shows where we stand, and a summary is included below:

Relative performance to TurboJPEG/IPP (neglecting grayscale):

Pentium 4 Xeon/2.8 GHz/CentOS 4.7
          Compress            Decompress
64-bit:   -18.6 to -5.23%     -16.4 to +11.42%
32-bit:   -39.1 to -20.8%     -20.2 to +2.17%

AMD Athlon 64 X2 5050e/dual core/2.6 GHz/CentOS 5.3
          Compress            Decompress
64-bit:   +4.45 to +11.4%     -16.4 to +7.63%
32-bit:   -18.1 to -8.16%     -11.23 to +17.7%

Intel Core Duo/2.0 GHz/OS X 10.4.11
          Compress            Decompress
32-bit:   -22.8 to -9.56%     -26.0 to +3.76%

Relative performance to TigerVNC 1.0.0:
          Compress            Decompress
64-bit:   2.20 to 2.54x       2.11 to 2.54x
32-bit:   No change           +12.4 to +21.2%

In the aggregate, libjpeg/SIMD is now about an 80-90% solution relative to IPP. 32-bit compression is still its weakness, particularly on Intel chips. However, 64-bit performance and 32-bit decompression performance on AMD chips are very strong and are often even faster than IPP. On older 64-bit AMD processors which lack SSE3, libjpeg/SIMD will be the clear victor, since 64-bit IPP does not support SSE2 instructions.

Looking at just the 4:4:4 performance, which is the case most likely to be used on a LAN (where these types of performance differences would be the most obvious), the performance is generally toward the high side of the ranges listed above, making libjpeg/SIMD a 90-95% solution in the aggregate relative to IPP.

Since this codec is now also integrated with the evolving VirtualGL 2.2 code base, I will be maintaining it in both projects and will make changes to it later on if I discover any obvious improvements, but for now, I'm happy with shipping it the way it is, barring any unforeseen breakage.

Darrell
tigervncvsipp3.ods
_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel