> Did you see that they implemented the RemoteFX codec (libfreerdp-rfx) -
> both encode/decode with SSE2 acceleration? That must be something worth
> testing as another option in TurboVNC? I don't know how it would compare
> to libjpeg-turbo (it is some wavelet algorithm), but I think it's good
> stuff. It is available on github in their repo:
>
> https://github.com/FreeRDP/FreeRDP/tree/master/libfreerdp/codec

I doubt that it will perform as well as what we're doing, because I've 
personally tested the full-blown RemoteFX solution using a Windows 
Server and Windows Client.  It doesn't support accelerated OpenGL, so it 
was a non-starter as a remote 3D solution for technical computing, but I 
also looked at the general performance, and I wasn't able to push more 
than about 1/3 the pixels that TurboVNC is capable of.  I agree, though, 
that it does bear further investigation.

You can read more about what TurboVNC is doing here 
http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf and in 
the TurboVNC User's Guide, but in a nutshell, our codec has been 
designed almost from the ground up around the needs of 3D and video 
applications.  We built upon TightVNC, which has the ability to split 
out areas of solid color in a framebuffer update and send them 
separately as bounding boxes (very fast, extremely low-bandwidth), then 
the remaining areas of the FBU are divided into subrectangles, and each 
subrect is sent using the most suitable subencoding based on the number
of unique colors in the subrect.  JPEG is used for high-color subrects, 
and mono or indexed color is used for low-color subrects.
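For illustration only, the color-count classification described above might look roughly like the following C sketch. The enum names, the linear palette search, and the max_palette limit are hypothetical stand-ins, not the actual TightVNC/TurboVNC code, and the sketch ignores what happens to high-color subrects when JPEG is disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels, not the real Tight subencoding constants. */
typedef enum { SUBENC_SOLID, SUBENC_MONO, SUBENC_INDEXED, SUBENC_JPEG } subenc_t;

/* Count distinct 32-bit pixel values in a w x h subrect (row stride `pitch`
   pixels), giving up early once the count exceeds `limit`.  Returns the
   count, or limit + 1 if there are more than `limit` colors. */
static int count_colors(const uint32_t *pix, int w, int h, int pitch, int limit)
{
    uint32_t palette[256];
    int n = 0;
    assert(limit > 0 && limit <= 256);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint32_t c = pix[(size_t)y * pitch + x];
            int i;
            for (i = 0; i < n; i++)          /* linear search, for clarity */
                if (palette[i] == c) break;
            if (i == n) {                    /* color not seen yet */
                if (n == limit) return limit + 1;
                palette[n++] = c;
            }
        }
    }
    return n;
}

/* Pick a subencoding from the number of unique colors.  max_palette is a
   hypothetical tuning knob standing in for the real encoder's limit. */
static subenc_t choose_subencoding(const uint32_t *pix, int w, int h,
                                   int pitch, int max_palette)
{
    assert(w > 0 && h > 0 && max_palette >= 2);
    int n = count_colors(pix, w, h, pitch, max_palette);
    if (n == 1) return SUBENC_SOLID;            /* one color: fill */
    if (n == 2) return SUBENC_MONO;             /* two colors: 1 bit/pixel */
    if (n <= max_palette) return SUBENC_INDEXED; /* small palette */
    return SUBENC_JPEG;                          /* high-color: JPEG */
}
```

The early exit in count_colors matters in practice: a high-color subrect is identified as soon as the palette overflows, without scanning every pixel.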

Where we improve upon TightVNC is in picking the mix of subencodings-- I 
refer to this in the docs as the "encoding method."  TightVNC's encoding 
methods are geared toward compressing 2D workloads absolutely as tightly 
as possible, since the solution was originally targeted at remote desktop 
access over dial-up and satellite.  It used JPEG only sparingly because, 
at the time, JPEG was really slow.  However, the performance assumptions 
made by TightVNC 1.3.x are no longer valid.  libjpeg-turbo makes JPEG 
the fastest encoding method, so we can get really tight and 
fast compression on high-color subrects without having to resort to high 
levels of Zlib compression.  The Zlib performance curve is nonlinear, 
and as you get into the higher levels, you can easily encounter 
situations in which your CPU usage doubles but you only get 10% better 
compression (NOTE: really wish someone would come up with SSE2 
optimizations for Zlib!)  That's basically the problem with the higher 
compression levels in TightVNC.  CL 9 in TightVNC doesn't, in the 
aggregate, produce any better compression than CL 5 except on rare 
corner cases, and it eats up 5 times (!) as much CPU as CL 5.  Anything 
above CL 5 in the TightVNC 1.3.x codec (which is what libvncserver used 
prior to my involvement, and what many other projects still use) is 
literally useless.


> I also note they are experimenting with both an x11 RDP server and a
> server for Windows. The Windows server (how to capture on Windows has
> been discussed here in other threads) is using the new "Desktop
> Duplication API" that was introduced in Windows 8 that also TightVNC is
> using in their latest release:
>
> http://www.tightvnc.com/release-2.7.php

Definite improvement over mirror drivers, but it's still a screen 
scraper.  Screen scrapers can only serve one user at a time, and it's 
difficult to do hardware-accelerated 3D with them.  Not sure whether the 
desktop duplication API changes that, but I doubt it.  HP RGS is the 
only screen scraper solution I know of that has managed to solve the 3D 
problem.

I looked at TightVNC 2.x recently, and although it has some codec 
improvements relative to TightVNC 1.3.x (IIRC, it no longer performs 
smoothness detection), it also has regressed in a key area-- it no 
longer splits out solid regions of the FBU, so its ability to optimize 
subrects based on color count is not as good as TightVNC 1.3.x was (and 
not as good as TurboVNC is.)  TigerVNC was initially suffering from the 
same fate, so I had to port over that functionality from TurboVNC in 
order to bring its encoder in line with ours 
(http://www.virtualgl.org/pmwiki/uploads/About/turbototiger.pdf). 
Further, at least when I looked at TightVNC 2.x, they weren't using 
libjpeg-turbo, and the subrect mix was still similar to the old TightVNC 
1.3.x, thus limiting the potential speedup from accelerated JPEG.
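To make "splitting out solid regions" concrete, here is a simplified greedy sketch in C: starting from a seed pixel, it grows a rectangle rightward along the seed row, then downward while each new row segment stays the same color. This is an assumption-laden illustration, not the actual TightVNC, TigerVNC, or TurboVNC search; rect_t, row_is_solid, and grow_solid_rect are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int x, y, w, h; } rect_t;  /* hypothetical rectangle type */

/* Return 1 if every pixel in row y, columns [x, x + w), equals color. */
static int row_is_solid(const uint32_t *fb, int pitch, int x, int y, int w,
                        uint32_t color)
{
    const uint32_t *row = fb + (size_t)y * pitch + x;
    for (int i = 0; i < w; i++)
        if (row[i] != color) return 0;
    return 1;
}

/* Greedily grow a solid-color rectangle from seed (x0, y0) inside an
   fb_w x fb_h framebuffer with a row stride of `pitch` pixels. */
static rect_t grow_solid_rect(const uint32_t *fb, int fb_w, int fb_h,
                              int pitch, int x0, int y0)
{
    assert(x0 >= 0 && x0 < fb_w && y0 >= 0 && y0 < fb_h);
    uint32_t color = fb[(size_t)y0 * pitch + x0];
    int w = 1, h = 1;
    /* Extend to the right along the seed row. */
    while (x0 + w < fb_w && fb[(size_t)y0 * pitch + x0 + w] == color)
        w++;
    /* Extend downward while the whole w-wide segment is still solid. */
    while (y0 + h < fb_h && row_is_solid(fb, pitch, x0, y0 + h, w, color))
        h++;
    rect_t r = { x0, y0, w, h };
    return r;
}
```

A rectangle found this way can be sent as a single fill, and only the remaining area needs to go through the per-subrect color-count logic.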

I really wish TightVNC would adopt the TurboVNC encoding strategy.  I've 
spent many hours proving that we are much, much faster in all cases, and 
we can produce approximately the same "tightness" when using the new 
compression level (9) provided in TurboVNC 1.2.  I measure the 
performance of this stuff by isolating the codecs at the low level and 
using them to encode a set of 20 RFB captures-- 8 of them are the 2D 
datasets supplied by Const that he used when designing the TightVNC 
codec, and the other 12 were captured when running Viewperf datasets, 
Quake, Google Earth, and our old friend GLXspheres.  The following is a 
summary of the research I conducted for the libvncserver developers, in 
which I compared TightVNC with TurboVNC, using the new CL 9 in the 
latter with medium-quality JPEG (same quality that TightVNC uses when 
you set Quality=9.)

TurboVNC CL 9, 2X subsamp, Qual 80 compared to TightVNC 1.3.x CL 5, Qual 9:

2D datasets:
Compression ratio:  83-119% (average 101%)
Performance:        89-211% (average 104%)

3D datasets:
Compression ratio:  114-236% (average 162%)
Performance:        116-432% (average 173%)

TurboVNC CL 9, 2X subsamp, Qual 80 compared to TightVNC 1.3.x CL 9, Qual 9:

2D datasets:
Compression ratio:  84-115% (average 98%)
Performance:        181-743% (average 472%)

3D datasets:
Compression ratio:  114-230% (average 160%)
Performance:        386-1751% (average 852%)

So, in numbers, this says the same thing that I said above-- we are able 
to compress as tightly as TightVNC on 2D datasets, in the aggregate (and 
much better than TightVNC on 3D datasets), and our performance is 
generally way better across the board.  Also, it shows numerically that, 
in the aggregate, TightVNC CL 9 does not improve the compression ratio 
relative to TightVNC CL 5.  It does offer some small improvement (<10%) 
in compression ratio relative to CL 5 on some isolated datasets, but it 
loses a similar amount on others.
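One plausible way to read the relative figures in the summary above is sketched below in C. It assumes each per-dataset percentage is TurboVNC's compression ratio (raw bytes / encoded bytes) divided by TightVNC's, and TightVNC's encode time divided by TurboVNC's, so that values above 100% favor TurboVNC, and that the "average" is an arithmetic mean. The post does not spell out these definitions, so treat them as assumptions; codec_result_t and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-dataset measurements for one codec configuration. */
typedef struct {
    double raw_bytes;      /* uncompressed size of the RFB capture */
    double encoded_bytes;  /* total bytes produced by the encoder */
    double encode_secs;    /* CPU time spent encoding */
} codec_result_t;

/* Compression ratio of A relative to B, in percent (> 100: A is tighter). */
static double rel_compression_pct(codec_result_t a, codec_result_t b)
{
    double cr_a = a.raw_bytes / a.encoded_bytes;
    double cr_b = b.raw_bytes / b.encoded_bytes;
    return 100.0 * cr_a / cr_b;
}

/* Throughput of A relative to B, in percent.  Both encode the same data,
   so this is B's encode time divided by A's (> 100: A is faster). */
static double rel_performance_pct(codec_result_t a, codec_result_t b)
{
    return 100.0 * b.encode_secs / a.encode_secs;
}

/* Reduce n per-dataset percentages to min, max, and arithmetic mean. */
static void summarize(const double *pct, size_t n,
                      double *min, double *max, double *mean)
{
    assert(n > 0);
    double sum = 0.0;
    *min = *max = pct[0];
    for (size_t i = 0; i < n; i++) {
        if (pct[i] < *min) *min = pct[i];
        if (pct[i] > *max) *max = pct[i];
        sum += pct[i];
    }
    *mean = sum / (double)n;
}
```

Under these definitions, a range like "84-115% (average 98%)" means the tighter codec varies by dataset, while the mean stays close to parity.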

Now, also bear in mind that CL 9 in TurboVNC was added solely to assuage 
the fears of those who were using TightVNC.  CL 2 in TurboVNC generally 
provides a much better balance of performance and compression ratio.  In 
the aggregate, CL 9 vs. CL 2 in TurboVNC compresses only 18% better on 
2D workloads and 7% better on 3D workloads, and CL 9 uses literally 
twice the CPU time of CL 2.  So even though CL 9 in TurboVNC matches or 
bests TightVNC, it's still our worst-performing mode by far.

The trade-off between CL 2 and CL 1 in TurboVNC is much more equal-- CL 
2 generally compresses 10-20% better but uses 10-20% more CPU time.
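Whether the extra CPU time of a tighter compression level pays off depends on the network. The C sketch below uses a deliberately simple cost model (encode time plus transmit time, assuming the two do not overlap, which real pipelined encoders may not satisfy) to find the link speed at which two levels break even. The level_cost_t type, the function names, and any numbers fed to them are hypothetical, not TurboVNC measurements.

```c
#include <assert.h>

/* Hypothetical per-frame cost of one compression level. */
typedef struct {
    double encode_secs;  /* CPU time to encode one frame */
    double bytes;        /* encoded frame size in bytes */
} level_cost_t;

/* Seconds to encode and then send one frame over a bw_bps (bits/s) link,
   assuming encoding and transmission are serialized. */
static double frame_secs(level_cost_t c, double bw_bps)
{
    return c.encode_secs + c.bytes * 8.0 / bw_bps;
}

/* Link speed (bits/s) at which a tighter-but-slower level and a
   looser-but-faster level take equally long per frame.  Below this
   bandwidth the tighter level wins; above it the faster level wins. */
static double crossover_bps(level_cost_t fast, level_cost_t tight)
{
    assert(tight.encode_secs > fast.encode_secs && tight.bytes < fast.bytes);
    return 8.0 * (fast.bytes - tight.bytes)
               / (tight.encode_secs - fast.encode_secs);
}
```

For example, with a hypothetical level that is 12.5% smaller but takes 25% more CPU (0.625 s and 875000 bytes versus 0.5 s and 1000000 bytes), the break-even point is 8 Mbit/s: the tighter level wins on slower links and the faster one on quicker links.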


> Btw, I looked at the vglrun shell script and AFAICT it simply uses the
> LD_PRELOAD trick. If I turn Remmina into a dynamic library and compile
> it myself, I presume it is easy to make sure I get the faker lib and not
> have to use vglrun? I am not a hardcore C/C++ guy, but I can push
> myself through things ;)

You're confusing the purposes of VirtualGL and TurboVNC.  VirtualGL 
redirects the 3D rendering on the server side into a Pbuffer on the 
"root" X display, then reads back the rendered pixels and transmits them 
to the X proxy (TurboVNC in our case.)  You can use other X proxies or 
remote display technologies, but that doesn't eliminate the need for 
VirtualGL.  VirtualGL is what gives you 3D hardware acceleration in your 
remote display environment.


> Ah, another silly question - the IPP version of libjpeg-turbo, how do I
> use that? 3-4x faster sounds interesting.

There is no IPP version of libjpeg-turbo.  libjpeg-turbo has its own 
built-in SIMD routines that accelerate it to 3-4x the performance of 
plain libjpeg.  It performs on par with IPP in most cases (I re-tested 
that recently with IPP 7.1.) 
http://www.libjpeg-turbo.org/About/Performance has a run-down.

_______________________________________________
VirtualGL-Devel mailing list
VirtualGL-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtualgl-devel
