Sorry about that previous bit of noise.  I thought I had stumbled upon 
an even simpler way of accomplishing this with TigerVNC, but it was 
because of a spreadsheet error.

Anyhow, the original analysis is buried in multiple threads on the 
libvncserver-common mailing list, with lots of hemming and hawing, and 
the spreadsheet is an eye chart.  Thus, I'm going to repeat the analysis 
here, so you can have the information all in one place.

Basically, this all started with convincing the libvncserver developers 
to support the TurboVNC encoder as a build option, mainly because I 
wanted to be able to use that encoder with x11vnc without patching the 
libvncserver code.  They agreed, but this led to discussions about why 
it was then necessary to keep supporting the TightVNC 1.3.x encoder, so 
ultimately we agreed that replacing the TightVNC encoder with the 
TurboVNC encoder was the cleanest solution.  However, this led to 
questions about whether the TurboVNC encoder could compress as "tightly" 
in all cases, which prompted me to do this research.

Basically, the tests I conducted were similar to the ones I performed 
when designing the TurboVNC codec, using the same canonical datasets (8 
2D datasets from Constantin and 12 3D datasets of my own design.  Refer 
to http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf for 
dataset descriptions.)  NOTE: the 8-bit test (freshmeat-8) is not 
particularly realistic anymore, since it doesn't use JPEG, so it was 
excluded from this analysis, but I also verified that including it would 
not have changed any of the conclusions (that is, I verified that that 
dataset was never an outlier.)

Since libvncserver has to support both libjpeg and libjpeg-turbo, the 
libvncserver developers also needed reassurance that the performance 
with the new TurboVNC encoder was not solely dependent on the use of 
libjpeg-turbo.  Thus, I tested the Turbo encoder against both libjpeg 
and libjpeg-turbo, but that aspect of the research is largely irrelevant 
for TigerVNC.  Further, I tested the encoder with 4:2:2 Q80 and Q37 JPEG 
images, so as to get a true apples-to-apples comparison with 
medium-quality JPEG (JPEG Quality Level 9) and low-quality JPEG (JPEG 
Quality Level 4), respectively, in TightVNC 1.3.x (since TightVNC 1.3.x 
only supports 4:2:2 subsampling.)  In all cases, I compared aggregate 2D 
and 3D compression ratio and performance across the series of datasets 
and also looked for any outliers that diverged significantly from the 
average.
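
As a rough illustration of how the ranges and averages reported below 
can be read, here is a minimal sketch that turns per-dataset 
measurements into a min/max/mean summary.  It assumes that each figure 
is a simple percent change relative to the baseline encoder and that the 
average is an unweighted mean across datasets; neither detail is spelled 
out in this message, and the names percentChange() and summarize() are 
hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Range and mean of a set of per-dataset percent changes.
struct Summary {
  double min, max, mean;
};

// Percent change of `candidate` relative to `baseline`.
// Positive means the candidate value is higher (assumed convention).
double percentChange(double baseline, double candidate) {
  assert(baseline > 0.0);
  return (candidate / baseline - 1.0) * 100.0;
}

// Summarize per-dataset percent changes as "min to max (avg mean)".
Summary summarize(const std::vector<double>& percentChanges) {
  assert(!percentChanges.empty());
  auto mm = std::minmax_element(percentChanges.begin(), percentChanges.end());
  double sum =
      std::accumulate(percentChanges.begin(), percentChanges.end(), 0.0);
  return { *mm.first, *mm.second, sum / percentChanges.size() };
}
```

A dataset whose percent change sits far from the mean of its group is 
what is treated as an outlier in the discussion that follows.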

I started by finding the "maximum useful" compression level in TightVNC, 
which I defined as the level beyond which it was impossible to get more 
than a 5% improvement in compression ratio by going any higher.  For 
almost all datasets, this was CL 5, but a few were able to get some 
incremental benefit (no more than 10%) by moving from CL 5 to CL 6.  No 
datasets saw more than a 5% improvement in compression ratio by moving 
from CL 6 to CL 9.
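
The "maximum useful" definition above can be sketched as a small search 
over per-level compression ratios.  This is only an illustration of the 
definition, not the spreadsheet logic actually used; the function name 
maxUsefulLevel() and its input layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the lowest compression level L such that no higher level improves
// the compression ratio by more than `threshold` (0.05 = 5%) relative to L.
// ratios[i] is the measured compression ratio at level i (hypothetical input).
std::size_t maxUsefulLevel(const std::vector<double>& ratios,
                           double threshold = 0.05) {
  assert(!ratios.empty());
  for (std::size_t level = 0; level < ratios.size(); ++level) {
    bool higherLevelHelps = false;
    for (std::size_t higher = level + 1; higher < ratios.size(); ++higher) {
      if (ratios[higher] > ratios[level] * (1.0 + threshold)) {
        higherLevelHelps = true;
        break;
      }
    }
    if (!higherLevelHelps) return level;
  }
  return ratios.size() - 1;  // not reached: the top level always qualifies
}
```

Running this per dataset and taking the highest result across all 
datasets gives a single level that is "useful enough" for every 
workload in the set.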

TightVNC CL 9 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets:  -0.58% to +3.4%  (avg +1.0% for Q80,  +2.2% for Q37)
-- Compression Ratio for 3D datasets:  -2.3% to +5.0%   (avg +0.82% for Q80, +1.3% for Q37)
-- Speedup for 2D datasets:            -83% to -46%     (avg -74% for Q80,   -75% for Q37)
-- Speedup for 3D datasets:            -87% to -4.7%    (avg -78% for Q80,   -78% for Q37)

Thus, CL 6 in TightVNC is the maximum useful compression level and 
served as the compression ratio target for the TurboVNC encoder.  Note 
that switching from CL 6 to CL 9 in the TightVNC encoder increased CPU 
time by, on average, 4-5x without providing any significant increase in 
compression ratio for any dataset I tested.  Why does CPU time matter, 
you ask?  Because using CL 9 in TightVNC produces a situation in which 
the CPU usage on the server is so high that it can become the primary 
bottleneck, even on low-bandwidth connections.  For some datasets, 
TightVNC CL 9 would have been unable to fill even a 2-megabit pipe.  CL 
6, in contrast, could generally fill a 10-20 megabit pipe without any 
significant reduction in compression ratio relative to CL 9.
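
To connect the negative speedup percentages to the "4-5x" CPU time 
figure, here is a minimal sketch of the conversion.  It assumes that 
"speedup" means the percent change in encoding throughput relative to 
the baseline, which is my reading of the numbers rather than something 
stated explicitly; cpuTimeFactor() is a hypothetical helper name.

```cpp
#include <cassert>

// Converts a speedup percentage into a CPU time multiplier.
// Assumed definition:
//   speedupPercent = (newThroughput / oldThroughput - 1) * 100
// so a speedup of -75% means one quarter of the throughput, i.e. 4x the
// CPU time for the same amount of work.
double cpuTimeFactor(double speedupPercent) {
  assert(speedupPercent > -100.0);  // -100% would mean zero throughput
  return 1.0 / (1.0 + speedupPercent / 100.0);
}
```

Under that assumption, the average speedups of -74% and -78% listed 
above correspond to roughly 3.8x and 4.5x the CPU time, respectively.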

NOTE:  I ran the same tests without JPEG enabled as well (using Raw 
subrects in place of JPEG, not using gradient encoding), just as a 
sanity check, and CL 6 proved to be the maximum useful compression level 
in that case as well, with similar statistics to the above.

I then tested TurboVNC's CL 2 (which is similar to TigerVNC's CL 2, and 
at the time this was the maximum compression level available in 
TurboVNC) against TightVNC CL 6:
-- Compression Ratio for 2D datasets:        -29% to +1.1%   (avg -16% for Q80,  -12% for Q37)
-- Compression Ratio for 3D datasets:        -14% to +131%   (avg +51% for Q80,  +4.5% for Q37)
-- Speedup for 2D datasets (libjpeg):        -4.9% to +115%  (avg +53% for Q80,  +50% for Q37)
-- Speedup for 3D datasets (libjpeg):        -12% to +210%   (avg +91% for Q80,  +52% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo):  +26% to +226%   (avg +128% for Q80, +121% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo):  +115% to +574%  (avg +327% for Q80, +243% for Q37)

So, you can see that TurboVNC CL 2 already compresses as well as or 
better than TightVNC CL 6 on 3D workloads in most cases (in all cases 
for Q80, but Q37 had one -14% outlier and three -7% outliers among the 
3D datasets.)  TurboVNC CL 2 also has generally better performance 
across the board, even when using libjpeg instead of libjpeg-turbo.  
However, it isn't quite up to par with TightVNC 1.3.x in terms of 
overall compression ratio for 2D datasets.

I went through a good bit of trial and error and finally stumbled upon a 
mode that mimics the other compression levels in TurboVNC, except that 
it borrows the Zlib compression levels from TightVNC CL 5 (7 for index 
and mono subrects and 5 for raw subrects) and it uses a palette 
threshold of 256 (to favor the use of indexed color subrects as much as 
possible.)  This new mode became TurboVNC CL 9:

TurboVNC CL 9 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets:        -16% to +17%    (avg +0.043% for Q80, -7.0% for Q37)
-- Compression Ratio for 3D datasets:        -2.8% to +133%  (avg +62% for Q80,    +11% for Q37)
-- Speedup for 2D datasets (libjpeg):        -18% to +1.1%   (avg -4.8% for Q80,   -8.1% for Q37)
-- Speedup for 3D datasets (libjpeg):        -15% to +117%   (avg +27% for Q80,    -1.1% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo):  -6.6% to +132%  (avg +8.1% for Q80,   +4.0% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo):  +11% to +366%   (avg +90% for Q80,    +47% for Q37)

The compression ratio for 2D apps is now a lot more in the ballpark of 
TightVNC.  Only a couple of 2D workloads still compressed significantly 
better with TightVNC, and most compressed better with the new TurboVNC 
mode.  The outliers were not extreme, and the average level of 
compression was about the same.  The only negative compression ratio 
outliers worse than -6% were the kde-hearts and photos datasets.  I 
will discuss these individually.

kde-hearts-16 & kde-hearts-24:
With medium-quality JPEG (Q80) and TurboVNC CL 2, the two kde-hearts 
tests were the biggest negative outliers (~ -30%) in terms of 
compression ratio when compared apples-to-apples with TightVNC CL 6. 
With low-quality JPEG, however, those same tests with TurboVNC CL 2 
compressed about equally well when compared apples-to-apples with 
TightVNC CL 6.  Thus, if a real-world workload was similar in nature to 
those datasets, and if a user was looking for the maximum compression, 
then we already had a mode that could provide it.  The new TurboVNC CL 9 
basically split the difference.  Rather than being a -30% outlier with 
medium-quality JPEG and at parity with low-quality JPEG, the kde-hearts 
tests were now at about -10% to -15% on both.

Disabling JPEG altogether (while still using the new TurboVNC CL 9) 
proved to be a better approach for the kde-hearts tests.  kde-hearts-16 
now compressed 7% better than with TightVNC CL 6/JPEG Q80, and 
kde-hearts-24 was now only 4% worse.  Thus, we now had modes that could 
achieve the same compression ratio and image quality with kde-hearts-* 
when compared to the TightVNC baseline.

The photos test was still at -10% compared to TightVNC, and numerous 
attempts to improve this were unsuccessful.  The kde-hearts and photos 
datasets represent rare cases for which the smoothness detection 
routines in TightVNC 1.3.x are actually beneficial, but as the original 
TurboVNC research showed, smoothness detection is too computationally 
expensive to be enabled all the time, and it is not beneficial in most 
cases.  It's sort of a moot point, though, since smoothness detection is 
a legacy feature of TightVNC and has been removed in more recent releases.

The purpose of this exercise was to demonstrate that it was possible to 
use the TurboVNC codec without giving up the "tightness" of TightVNC in 
most cases.  The libvncserver developers ran their own independent tests 
confirming that the new TurboVNC CL 9 achieved this to their 
satisfaction.

Now, do I consider the new CL 9 in TurboVNC to be a generally useful 
mode?  Not in the vast majority of cases.  On average, it doubles the 
CPU time relative to CL 2 and provides only about 20% better compression 
for 2D datasets and 7% better compression for 3D datasets.  But there 
are a few 2D cases that compress 30-50% better with this mode than with 
CL 2.  It's one of those "try it and see" situations.  We document the 
new mode but do not expose it in the GUI, so it is considered an 
advanced feature.

----------


Over the past few days, I repeated the above research for TigerVNC 
(hacking the TigerVNC encoder slightly so that JPEG quality level 5 
corresponds to Q80 and JPEG quality level 3 corresponds to Q37, so I 
could get a true apples-to-apples comparison with TightVNC.)

From previous tests 
(http://www.virtualgl.org/pmwiki/uploads/About/turbototiger.pdf), we 
already know that CL 6 is the maximum useful mode in TigerVNC.  In 
fact, CL 5 and CL 6 are virtually identical in TigerVNC.  The only major 
difference is that CL 5 uses Zlib level 4 for raw subrects and CL 6 uses 
Zlib level 5 for raw subrects, and that difference is irrelevant when 
JPEG is enabled (since JPEG is used in place of raw subrects.)

TigerVNC CL 9 vs. TigerVNC CL 5:
-- Compression Ratio for 2D datasets:  +0.10% to +2.6%  (avg +0.80% for Q80, +1.1% for Q37)
-- Compression Ratio for 3D datasets:  +0% to +4.8%     (avg +0.69% for Q80, +1.1% for Q37)
-- Speedup for 2D datasets (libjpeg):  -74% to -29%     (avg -61% for Q80,   -61% for Q37)
-- Speedup for 3D datasets (libjpeg):  -85% to +0.59%   (avg -68% for Q80,   -70% for Q37)

In TigerVNC, CL 5 is the maximum useful compression level when JPEG is 
enabled.  As with TightVNC, increasing the compression level beyond this 
has only a negligible impact on compression ratio and a very significant 
negative impact on performance, which produces a situation in which the 
server CPU is the primary bottleneck on any network faster than a few 
megabits/sec.  With JPEG disabled, the maximum 
useful level turns out to be CL 7, but only because CL 7 sets the raw 
Zlib level to 6 instead of 5.  From the research with the TurboVNC 
encoder, it became apparent that using Zlib levels > 7 was never 
beneficial.  Thus, it turns out that we can set the Zlib levels in 
TigerVNC CL 6 to 7,7,6 (the same as TightVNC CL 6) and get the 
compression ratio to within +/- 5% of TigerVNC CL 9 in all cases, 
including with JPEG disabled.

So let's see where TigerVNC stands with respect to TightVNC:

TigerVNC CL 5 vs. TightVNC CL 6:
-- Compression Ratio for 2D datasets:        -28% to +11%    (avg -9.4% for Q80, -3.7% for Q37)
-- Compression Ratio for 3D datasets:        -3.6% to +133%  (avg +56% for Q80,  +10% for Q37)
-- Speedup for 2D datasets (libjpeg):        -38% to +50%    (avg +9.9% for Q80, +8.6% for Q37)
-- Speedup for 3D datasets (libjpeg):        +6.9% to +138%  (avg +67% for Q80,  +33% for Q37)
-- Speedup for 2D datasets (libjpeg-turbo):  -37% to +200%   (avg +35% for Q80,  +28% for Q37)
-- Speedup for 3D datasets (libjpeg-turbo):  +43% to +594%   (avg +155% for Q80, +94% for Q37)

TigerVNC is in a better position than TurboVNC was, but we're still 
dealing with some large negative outliers relative to TightVNC. 
However, changing the parameters of TigerVNC CL 9 such that a palette 
threshold of 256 and Zlib levels 7, 7, and 6 are used produces very 
similar results to the new TurboVNC CL 9.

Thus, I make the following proposals:
-- Cap the Zlib levels used by the Tight encoder in TigerVNC to 7 for 
indexed and mono subrects and to 6 for raw subrects
-- Change the Zlib levels of CL 5 and CL 6 to match those used in TightVNC
-- Set the palette threshold of CL 9 to 256.

--- TightEncoder.cxx    (revision 5946)
+++ TightEncoder.cxx    (working copy)
@@ -67,11 +67,11 @@
    { 65536, 2048,   8, 3, 3, 2,  24, 96, 41, SUBSAMP_420 }, // 2
    { 65536, 2048,  12, 5, 5, 2,  32, 96, 42, SUBSAMP_422 }, // 3
    { 65536, 2048,  12, 6, 7, 3,  32, 96, 62, SUBSAMP_422 }, // 4
-  { 65536, 2048,  12, 7, 8, 4,  32, 96, 77, SUBSAMP_422 }, // 5
-  { 65536, 2048,  16, 7, 8, 5,  32, 96, 79, SUBSAMP_NONE }, // 6
-  { 65536, 2048,  16, 8, 9, 6,  64, 96, 86, SUBSAMP_NONE }, // 7
-  { 65536, 2048,  24, 9, 9, 7,  64, 96, 92, SUBSAMP_NONE }, // 8
-  { 65536, 2048,  32, 9, 9, 9,  96, 96,100, SUBSAMP_NONE }  // 9
+  { 65536, 2048,  12, 7, 7, 5,  32, 96, 77, SUBSAMP_422 }, // 5
+  { 65536, 2048,  16, 7, 7, 6,  32, 96, 79, SUBSAMP_NONE }, // 6
+  { 65536, 2048,  16, 7, 7, 6,  64, 96, 86, SUBSAMP_NONE }, // 7
+  { 65536, 2048,  24, 7, 7, 6,  64, 96, 92, SUBSAMP_NONE }, // 8
+  { 65536, 2048,  32, 7, 7, 6,  96, 256,100, SUBSAMP_NONE } // 9
  };
  const int TightEncoder::defaultCompressLevel = 1;


What this does, conceptually:
-- CL 6 in TigerVNC is now, in almost all cases, the maximum useful 
mode.  This matches what the GUI already says (1=fast, 6=best, 4-6 are 
rarely useful).
-- If someone decides that they want to jack it up to CL 9, this will 
never have a negative effect on compression ratio relative to CL 6, but 
whether it has a significant benefit will depend on the workload.  On 
average, 2D datasets will compress 10% better with a 10% loss in 
performance, and on average, 3D datasets will compress 3% better with a 
10% loss in performance.  That's a much nicer trade-off than the 
current CL 9 provides (basically no compression benefit with 4-5x more 
CPU usage.)
-- CL 7 and 8 will basically perform the same as CL 6.
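
For anyone not familiar with the layout of that table, here is a 
self-contained sketch of the proposed CL 5-9 rows with named columns.  
The struct and field names are my assumption about what each column 
means (the Zlib and palette threshold columns follow from the 
discussion above); they are written out for illustration and are not 
copied from the TigerVNC headers.  confForLevel() and 
allZlibLevelsCapped() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Assumed column meanings; the column order follows the patch above.
enum Subsampling { SUBSAMP_NONE, SUBSAMP_422, SUBSAMP_420 };

struct TightConf {
  unsigned maxRectSize, maxRectWidth;
  unsigned monoMinRectSize;
  int idxZlibLevel, monoZlibLevel, rawZlibLevel;  // Zlib levels per subrect type
  int idxMaxColorsDivisor;
  int palMaxColorsWithJPEG;                       // the "palette threshold"
  int jpegQuality;
  Subsampling jpegSubsampling;
};

// Proposed rows for CL 5 through CL 9, copied from the "+" lines of the patch.
const TightConf proposed[] = {
  { 65536, 2048, 12, 7, 7, 5, 32,  96,  77, SUBSAMP_422  },  // 5
  { 65536, 2048, 16, 7, 7, 6, 32,  96,  79, SUBSAMP_NONE },  // 6
  { 65536, 2048, 16, 7, 7, 6, 64,  96,  86, SUBSAMP_NONE },  // 7
  { 65536, 2048, 24, 7, 7, 6, 64,  96,  92, SUBSAMP_NONE },  // 8
  { 65536, 2048, 32, 7, 7, 6, 96, 256, 100, SUBSAMP_NONE }   // 9
};
const std::size_t proposedCount = sizeof(proposed) / sizeof(proposed[0]);

// Looks up the proposed row for a compression level in the range 5..9.
const TightConf& confForLevel(int level) {
  assert(level >= 5 && level <= 9);
  return proposed[level - 5];
}

// Checks the first proposal: Zlib capped at 7 (indexed, mono) and 6 (raw).
bool allZlibLevelsCapped() {
  for (std::size_t i = 0; i < proposedCount; ++i) {
    const TightConf& c = proposed[i];
    if (c.idxZlibLevel > 7 || c.monoZlibLevel > 7 || c.rawZlibLevel > 6)
      return false;
  }
  return true;
}
```

Read this way, CL 7 and CL 8 differ from CL 6 only in columns other 
than the Zlib levels and the palette threshold, and CL 9 is the only 
row with the palette threshold raised to 256.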


On 1/23/14 4:01 AM, Pierre Ossman wrote:
> On Wed, 22 Jan 2014 17:30:25 -0600,
> DRC wrote:
>
>>
>> My proposal is for TigerVNC to adopt four compression modes:  CL 0, CL
>> 1, CL 2, and CL 5.  CL 3 and 4 would map to 2, and CL 6-9 would map to
>> 5, and the GUI could be restructured so that it sets the compression
>> level to "low, high, and very high", with a warning that "very high" is
>> only better than "high" in some rare cases.
>>
>> Just a suggestion.  I can provide a server-side patch for this if requested.
>>
>
> Interesting stuff. I'd definitely like to know more about the
> discussion and the testing that was done.
>
> I'm shuffling around all of that code at the moment though, so please
> hold off on that patch for a while. :)
>
> Rgds
>

_______________________________________________
Tigervnc-devel mailing list
Tigervnc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-devel
