Sorry about that previous bit of noise. I thought I had stumbled upon an even simpler way of accomplishing this with TigerVNC, but it was because of a spreadsheet error.
Anyhow, the original analysis is all very buried in the libvncserver-common mailing list in multiple threads, with lots of hemming and hawing, and the spreadsheet is an eye chart. Thus, I'm going to repeat the analysis here, so you can have the information all in one place.

Basically, this all started with convincing the libvncserver developers to support the TurboVNC encoder as a build option, mainly because I wanted to be able to use that encoder with x11vnc without patching the libvncserver code. They agreed, but this led to discussions about why it was then necessary to keep supporting the TightVNC 1.3.x encoder, so ultimately we agreed that replacing the TightVNC encoder with the TurboVNC encoder was the cleanest solution. However, this led to questions about whether the TurboVNC encoder could compress as "tightly" in all cases, which prompted me to do this research.

Basically, the tests I conducted were similar to the ones I performed when designing the TurboVNC codec, using the same canonical datasets (8 2D datasets from Constantin and 12 3D datasets of my own design; refer to http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf for dataset descriptions). NOTE: the 8-bit test (freshmeat-8) is not particularly realistic anymore, since it doesn't use JPEG, so it was excluded from this analysis, but I also verified that including it would not have changed any of the conclusions (that is, I verified that that dataset was never an outlier).

Since libvncserver has to support both libjpeg and libjpeg-turbo, the libvncserver developers also needed reassurance that the performance with the new TurboVNC encoder was not solely dependent on the use of libjpeg-turbo. Thus, I tested the Turbo encoder against both libjpeg and libjpeg-turbo, but that aspect of the research is largely irrelevant for TigerVNC.
Further, I tested the encoder with 4:2:2 Q37 and Q80 JPEG images, so as to get a true apples-to-apples comparison with low-quality JPEG (JPEG Quality Level 4) and medium-quality JPEG (JPEG Quality Level 9), respectively, in TightVNC 1.3.x (since TightVNC 1.3.x only supports 4:2:2 subsampling). In all cases, I compared aggregate 2D and 3D compression ratio and performance across the series of datasets and also looked for any outliers that diverged significantly from the average.

I started by finding the "maximum useful" compression level in TightVNC, which I defined as the level beyond which it was impossible to get more than a 5% improvement in compression ratio by going any higher. For almost all datasets, this was CL 5, but a few were able to get some incremental benefit (no more than 10%) by moving from CL 5 to CL 6. No datasets saw more than a 5% improvement in compression ratio by moving from CL 6 to CL 9.

TightVNC CL 9 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets: -0.58% to +3.4% (avg +1.0% for Q80, +2.2% for Q37)
-- Compression Ratio for 3D datasets: -2.3% to +5.0% (avg +0.82% for Q80, +1.3% for Q37)
-- Speedup for 2D datasets: -83% to -46% (avg -74% for Q80, -75% for Q37)
-- Speedup for 3D datasets: -87% to -4.7% (avg -78% for Q80, -78% for Q37)

Thus, CL 6 in TightVNC is the maximum useful compression level and served as the compression ratio target for the TurboVNC encoder. Note that switching from CL 6 to CL 9 in the TightVNC encoder increased CPU time by, on average, 4-5x without providing any significant increase in compression ratio for any dataset I tested.

Why does CPU time matter, you ask? Because using CL 9 in TightVNC produces a situation in which the CPU usage on the server is so high that it can become the primary bottleneck, even on low-bandwidth connections. For some datasets, TightVNC CL 9 would have been unable to fill even a 2-megabit pipe.
CL 6, in contrast, could generally fill a 10-20 megabit pipe without any significant reduction in compression ratio relative to CL 9.

NOTE: I ran the same tests without JPEG enabled as well (using Raw subrects in place of JPEG, not using gradient encoding), just as a sanity check, and CL 6 proved to be the maximum useful compression level in that case as well, with similar statistics to the above.

I then tested TurboVNC's CL 2 (which is similar to TigerVNC's CL 2, and which at the time was the maximum compression level available in TurboVNC) against TightVNC CL 6:
-- Compression Ratio for 2D datasets: -29% to +1.1% (avg -16% for Q80, -12% for Q37)
-- Compression Ratio for 3D datasets: -14% to +131% (avg +51% for Q80, +4.5% for Q37)
-- Speedup for 2D datasets (libjpeg): -4.9% to +115% (avg +53% for Q80, +50% for Q37)
-- Speedup for 3D datasets (libjpeg): -12% to +210% (avg +91% for Q80, +52% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo): +26% to +226% (avg +128% for Q80, +121% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo): +115% to +574% (avg +327% for Q80, +243% for Q37)

So, you can see that TurboVNC CL 2 already compresses as well as or better than TightVNC on 3D workloads in most cases (in all cases for Q80, but Q37 had one -14% outlier and three -7% outliers among the 3D datasets). TurboVNC CL 2 also has generally better performance across the board, even when using libjpeg instead of libjpeg-turbo. However, it isn't quite up to par with TightVNC 1.3.x in terms of overall compression ratio for 2D datasets.

I went through a good bit of trial and error and finally stumbled upon a mode that mimics the other compression levels in TurboVNC, except that it borrows the Zlib compression levels from TightVNC CL 5 (7 for index and mono subrects and 5 for raw subrects) and it uses a palette threshold of 256 (to favor the use of indexed color subrects as much as possible). This new mode became TurboVNC CL 9:

TurboVNC CL 9 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets: -16% to +17% (avg +0.043% for Q80, -7.0% for Q37)
-- Compression Ratio for 3D datasets: -2.8% to +133% (avg +62% for Q80, +11% for Q37)
-- Speedup for 2D datasets (libjpeg): -18% to +1.1% (avg -4.8% for Q80, -8.1% for Q37)
-- Speedup for 3D datasets (libjpeg): -15% to +117% (avg +27% for Q80, -1.1% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo): -6.6% to +132% (avg +8.1% for Q80, +4.0% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo): +11% to +366% (avg +90% for Q80, +47% for Q37)

The compression ratio for 2D apps is now a lot more in the ballpark of TightVNC. Only a couple of 2D workloads still compressed significantly better with TightVNC, and most compressed better with the new TurboVNC mode. The outliers were not extreme, and the average level of compression was about the same. The only negative compression ratio outliers worse than -6% were the kde-hearts and photos datasets. I will discuss these individually.

kde-hearts-16 & kde-hearts-24:
With medium-quality JPEG (Q80) and TurboVNC CL 2, the two kde-hearts tests were the biggest negative outliers (~ -30%) in terms of compression ratio when compared apples-to-apples with TightVNC CL 6. With low-quality JPEG, however, those same tests with TurboVNC CL 2 compressed about equally well when compared apples-to-apples with TightVNC CL 6. Thus, if a real-world workload was similar in nature to those datasets, and if a user was looking for the maximum compression, then we already had a mode that could provide it. The new TurboVNC CL 9 basically split the difference. Rather than being a -30% outlier with medium-quality JPEG and performing on parity with low-quality JPEG, the kde-hearts tests were now at about -10% to -15% on both.

Disabling JPEG altogether (while still using the new TurboVNC CL 9) proved to be a better approach for the kde-hearts tests.
kde-hearts-16 now compressed 7% better than with TightVNC CL 6/JPEG Q80, and kde-hearts-24 was now only 4% worse. Thus, we now had modes that could achieve the same compression ratio and image quality with kde-hearts-* when compared to the TightVNC baseline.

The photos test was still at -10% compared to TightVNC, and numerous attempts to improve this were unsuccessful.

The kde-hearts and photos datasets represent rare cases for which the smoothness detection routines in TightVNC 1.3.x are actually beneficial, but as the original TurboVNC research showed, smoothness detection is too computationally expensive to be enabled all the time, and it is not beneficial in most cases. It's sort of a moot point, though, since smoothness detection is a legacy feature of TightVNC and has been removed in more recent releases.

The purpose of this exercise was to demonstrate that, in general, it was possible to use the TurboVNC codec without giving up the "tightness" of TightVNC. The libvncserver developers ran their own independent tests confirming that the new TurboVNC CL 9 achieved this to their satisfaction.

Now, do I consider the new CL 9 in TurboVNC to be a generally useful mode? Not in the vast majority of cases. On average, it doubles the CPU time relative to CL 2 and provides only about 20% better compression for 2D datasets and 7% better compression for 3D datasets. But there are a few 2D cases that compress 30-50% better with this mode than with CL 2. It's one of those "try it and see" situations. We document the new mode but do not expose it in the GUI, so it is considered an advanced feature.

----------

Over the past few days, I repeated the above research for TigerVNC (hacking the TigerVNC encoder slightly so that JPEG quality level 5 corresponds to Q80 and JPEG quality level 3 corresponds to Q37, so I could get a true apples-to-apples comparison with TightVNC).
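
For reference, the "maximum useful" criterion that the analysis above relies on can be written down as a short sketch. This is only an illustration in C++, not code from any of the encoders: maxUsefulLevel and cpuTimeFactor are hypothetical helper names, and the input is assumed to be one aggregate compression ratio per compression level. The second helper assumes that "speedup" means the relative change in encoder throughput, which is an assumption about how the figures were computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: ratios[i] is the aggregate compression ratio measured
// at compression level i (higher is tighter).  Returns the lowest level beyond
// which no higher level improves the ratio by more than `threshold`
// (0.05 = 5%, the cutoff used in the analysis).
int maxUsefulLevel(const std::vector<double>& ratios, double threshold = 0.05)
{
  assert(!ratios.empty());
  for (std::size_t level = 0; level < ratios.size(); level++) {
    bool higherLevelHelps = false;
    for (std::size_t higher = level + 1; higher < ratios.size(); higher++) {
      if (ratios[higher] > ratios[level] * (1.0 + threshold)) {
        higherLevelHelps = true;
        break;
      }
    }
    if (!higherLevelHelps)
      return static_cast<int>(level);
  }
  // Not reached: the last level never has a higher level to compare against.
  return static_cast<int>(ratios.size()) - 1;
}

// Assumption: "speedup" is the relative change in encoder throughput, so a
// speedup of -0.78 (-78%) leaves 22% of the original throughput, which is
// about 4.5x the CPU time for the same data.
double cpuTimeFactor(double speedup)
{
  assert(speedup > -1.0);
  return 1.0 / (1.0 + speedup);
}
```

Under that assumption, the average speedups of -74% to -78% reported for TightVNC CL 9 vs. CL 6 work out to roughly 3.8x to 4.5x the CPU time, which is in line with the 4-5x figure quoted earlier.
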
From previous tests (http://www.virtualgl.org/pmwiki/uploads/About/turbototiger.pdf), we already know that CL 6 is the maximum useful mode in TigerVNC. Well, in fact, CL 5 and CL 6 are virtually identical in TigerVNC. The only major difference is that CL 5 uses Zlib level 4 for raw subrects and CL 6 uses Zlib level 5 for raw subrects, so that difference is irrelevant when JPEG is enabled.

TigerVNC CL 9 vs. TigerVNC CL 5:
-- Compression Ratio for 2D datasets: +0.10% to +2.6% (avg +0.80% for Q80, +1.1% for Q37)
-- Compression Ratio for 3D datasets: +0% to +4.8% (avg +0.69% for Q80, +1.1% for Q37)
-- Speedup for 2D datasets (libjpeg): -74% to -29% (avg -61% for Q80, -61% for Q37)
-- Speedup for 3D datasets (libjpeg): -85% to +0.59% (avg -68% for Q80, -70% for Q37)

In TigerVNC, CL 5 is the maximum useful compression level when JPEG is enabled. As with TightVNC, increasing the compression level beyond this has only a negligible impact on compression ratio and a very significant negative impact on performance. Again, this produces a situation in which the server CPU is the primary bottleneck on any network faster than a few megabits/sec.

With JPEG disabled, the maximum useful level turns out to be CL 7, but only because CL 7 sets the raw Zlib level to 6 instead of 5. From the research with the TurboVNC encoder, it became apparent that using Zlib levels > 7 was never beneficial. Thus, it turns out that we can set the Zlib levels in TigerVNC CL 6 to 7,7,6 (the same as TightVNC CL 6) and get the compression ratio to within +/- 5% of TigerVNC CL 9 in all cases, including with JPEG disabled.

So let's see where TigerVNC stands with respect to TightVNC:

TigerVNC CL 5 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets: -28% to +11% (avg -9.4% for Q80, -3.7% for Q37)
-- Compression Ratio for 3D datasets: -3.6% to +133% (avg +56% for Q80, +10% for Q37)
-- Speedup for 2D datasets (libjpeg): -38% to +50% (avg +9.9% for Q80, +8.6% for Q37)
-- Speedup for 3D datasets (libjpeg): +6.9% to +138% (avg +67% for Q80, +33% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo): -37% to +200% (avg +35% for Q80, +28% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo): +43% to +594% (avg +155% for Q80, +94% for Q37)

TigerVNC is in a better position than TurboVNC was, but we're still dealing with some large negative outliers relative to TightVNC. However, changing the parameters of TigerVNC CL 9 such that a palette threshold of 256 and Zlib levels 7, 7, and 6 are used produces very similar results to the new TurboVNC CL 9.

Thus, I make the following proposals:
-- Cap the Zlib levels used by the Tight encoder in TigerVNC to 7 for indexed and mono subrects and to 6 for raw subrects
-- Change the Zlib levels of CL 5 and CL 6 to match those used in TightVNC
-- Set the palette threshold of CL 9 to 256.
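
The three proposals can also be read as a small transformation of each compression level's parameters. The following C++ sketch is illustrative only: LevelParams and applyProposals are hypothetical names, the struct models just the four values the proposals touch, and none of this is TigerVNC's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, trimmed-down view of one compression-level entry.  Only the
// values touched by the proposals are modeled; the field names are assumed.
struct LevelParams {
  int idxZlibLevel;      // Zlib level for indexed-color subrects
  int monoZlibLevel;     // Zlib level for mono subrects
  int rawZlibLevel;      // Zlib level for raw subrects
  int paletteThreshold;  // palette threshold used when JPEG is enabled
};

// Apply the three proposals to the parameters of compression level `cl`.
LevelParams applyProposals(int cl, LevelParams p)
{
  assert(cl >= 0 && cl <= 9);

  // 1. Cap the Zlib levels at 7 (indexed, mono) and 6 (raw).
  p.idxZlibLevel = std::min(p.idxZlibLevel, 7);
  p.monoZlibLevel = std::min(p.monoZlibLevel, 7);
  p.rawZlibLevel = std::min(p.rawZlibLevel, 6);

  // 2. CL 5 and CL 6 take the Zlib levels of the same TightVNC levels:
  //    7/7/5 for CL 5 and 7/7/6 for CL 6.
  if (cl == 5 || cl == 6) {
    p.idxZlibLevel = 7;
    p.monoZlibLevel = 7;
    p.rawZlibLevel = (cl == 5) ? 5 : 6;
  }

  // 3. CL 9 favors indexed-color subrects with a palette threshold of 256.
  if (cl == 9)
    p.paletteThreshold = 256;

  return p;
}
```

For example, feeding in Zlib levels 9, 9, 9 and a palette threshold of 96 for CL 9 yields 7, 7, 6 and 256, while a level such as CL 4 with 6, 7, 3 passes through unchanged.
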
--- TightEncoder.cxx (revision 5946)
+++ TightEncoder.cxx (working copy)
@@ -67,11 +67,11 @@
   { 65536, 2048, 8, 3, 3, 2, 24, 96, 41, SUBSAMP_420 }, // 2
   { 65536, 2048, 12, 5, 5, 2, 32, 96, 42, SUBSAMP_422 }, // 3
   { 65536, 2048, 12, 6, 7, 3, 32, 96, 62, SUBSAMP_422 }, // 4
-  { 65536, 2048, 12, 7, 8, 4, 32, 96, 77, SUBSAMP_422 }, // 5
-  { 65536, 2048, 16, 7, 8, 5, 32, 96, 79, SUBSAMP_NONE }, // 6
-  { 65536, 2048, 16, 8, 9, 6, 64, 96, 86, SUBSAMP_NONE }, // 7
-  { 65536, 2048, 24, 9, 9, 7, 64, 96, 92, SUBSAMP_NONE }, // 8
-  { 65536, 2048, 32, 9, 9, 9, 96, 96,100, SUBSAMP_NONE } // 9
+  { 65536, 2048, 12, 7, 7, 5, 32, 96, 77, SUBSAMP_422 }, // 5
+  { 65536, 2048, 16, 7, 7, 6, 32, 96, 79, SUBSAMP_NONE }, // 6
+  { 65536, 2048, 16, 7, 7, 6, 64, 96, 86, SUBSAMP_NONE }, // 7
+  { 65536, 2048, 24, 7, 7, 6, 64, 96, 92, SUBSAMP_NONE }, // 8
+  { 65536, 2048, 32, 7, 7, 6, 96, 256,100, SUBSAMP_NONE } // 9
 };
 
 const int TightEncoder::defaultCompressLevel = 1;

What this does, conceptually:
-- CL 6 in TigerVNC is now, in almost all cases, the maximum useful mode. This matches what the GUI already says (1=fast, 6=best, 4-6 are rarely useful).
-- If someone decides that they want to jack it up to CL 9, this will never have a negative effect on compression ratio relative to CL 6, but whether it has a significant benefit will depend on the workload. On average, 2D datasets will compress 10% better with a 10% loss in performance, and on average, 3D datasets will compress 3% better with a 10% loss in performance. That's a lot nicer of a trade-off than the current CL 9 provides (basically no compression benefit with 4-5x more CPU usage).
-- CL 7 and 8 will basically perform the same as CL 6.

On 1/23/14 4:01 AM, Pierre Ossman wrote:
> On Wed, 22 Jan 2014 17:30:25 -0600,
> DRC wrote:
>
>>
>> My proposal is for TigerVNC to adopt four compression modes: CL 0, CL
>> 1, CL 2, and CL 5.
>> CL 3 and 4 would map to 2, and CL 6-9 would map to
>> 5, and the GUI could be restructured so that it sets the compression
>> level to "low, high, and very high", with a warning that "very high" is
>> only better than "high" in some rare cases.
>>
>> Just a suggestion. I can provide a server-side patch for this if requested.
>>
>
> Interesting stuff. I'd definitely like to know more about the
> discussion and the testing that was done.
>
> I'm shuffling around all of that code at the moment though, so please
> hold off on that patch for a while. :)
>
> Rgds
>
_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel