> Did you see that they implemented the RemoteFX codec (libfreerdp-rfx) -
> both encode/decode with SSE2 acceleration? That must be something worth
> testing as another option in TurboVNC? I don't know how it would compare
> to libjpeg-turbo (it is some wavelet algorithm) but I think its good
> stuff. It is available on github in their repo:
>
> https://github.com/FreeRDP/FreeRDP/tree/master/libfreerdp/codec

I doubt that it will perform as well as what we're doing, because I've personally tested the full-blown RemoteFX solution using a Windows Server and Windows Client. It doesn't support accelerated OpenGL, so it was a non-starter as a remote 3D solution for technical computing, but I also looked at the general performance, and I wasn't able to push more than about 1/3 the pixels that TurboVNC is capable of. I agree, though, that it does bear further investigation.

You can read more about what TurboVNC is doing here http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf and in the TurboVNC User's Guide, but in a nutshell, our codec has been designed almost from the ground up around the needs of 3D and video applications.

We built upon TightVNC, which has the ability to split out areas of solid color in a framebuffer update (FBU) and send them separately as bounding boxes (very fast, extremely low-bandwidth). The remaining areas of the FBU are then divided into subrectangles, and each subrect is sent using the most suitable subencoding based on the number of unique colors in the subrect. JPEG is used for high-color subrects, and mono or indexed color is used for low-color subrects.

Where we improve upon TightVNC is in picking the mix of subencodings. I refer to this in the docs as the "encoding method." TightVNC's encoding methods are geared toward compressing 2D workloads as tightly as possible, since the solution was originally targeted at remote desktop access over dial-up and satellite. It used JPEG only sparingly because, at the time, JPEG was really slow.

However, the performance assumptions made by TightVNC 1.3.x are no longer valid. libjpeg-turbo makes JPEG encoding the fastest method of encoding, so we can get really tight and fast compression on high-color subrects without having to resort to high levels of Zlib compression.
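
As a rough illustration of the color-count-based subencoding choice described above, here is a minimal Python sketch. It is not the TurboVNC or TightVNC code: the function name `pick_subencoding`, the label strings, and the threshold `MAX_INDEXED_COLORS` are all hypothetical, and the real codecs tune their thresholds per compression level.

```python
# Hypothetical threshold: above this many unique colors, treat the
# subrectangle as "high-color". Real codecs vary this per compression level.
MAX_INDEXED_COLORS = 24


def pick_subencoding(pixels, max_indexed_colors=MAX_INDEXED_COLORS):
    """Pick a subencoding label for one subrectangle.

    `pixels` is an iterable of (r, g, b) tuples. The labels returned here
    ("solid", "mono", "indexed", "jpeg") are illustrative names only.
    """
    unique = set()
    for pixel in pixels:
        unique.add(pixel)
        # Stop counting early: once the limit is exceeded, the subrect is
        # high-color regardless of how many more colors it contains.
        if len(unique) > max_indexed_colors:
            return "jpeg"
    if not unique:
        raise ValueError("subrectangle contains no pixels")
    if len(unique) == 1:
        return "solid"    # one color: send as a filled bounding box
    if len(unique) == 2:
        return "mono"     # two colors: 1 bit per pixel plus a tiny palette
    return "indexed"      # few colors: palette plus per-pixel indices


if __name__ == "__main__":
    flat = [(40, 40, 40)] * 64
    text = [(0, 0, 0), (255, 255, 255)] * 32
    gradient = [(i, i, 255 - i) for i in range(256)]
    for name, rect in (("flat", flat), ("text", text), ("gradient", gradient)):
        print(name, "->", pick_subencoding(rect))
```

The early exit matters for speed: a high-color subrect is identified after a handful of pixels instead of after a full palette count, which is the cheap path when most of the update is 3D or video content.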

The Zlib performance curve is nonlinear, and as you get into the higher levels, you can easily encounter situations in which your CPU usage doubles but you only get 10% better compression. (NOTE: I really wish someone would come up with SSE2 optimizations for Zlib!) That's basically the problem with the higher compression levels in TightVNC. CL 9 in TightVNC doesn't, in the aggregate, produce any better compression than CL 5 except in rare corner cases, and it uses 5 times (!) as much CPU as CL 5. Anything above CL 5 in the TightVNC 1.3.x codec (which is what libvncserver used prior to my involvement, and what many other projects still use) is literally useless.

> I also note they are experimenting with both an x11 RDP server and a
> server for Windows. The Windows server (how to capture on Windows has
> been discussed here in other threads) is using the new "Desktop
> Duplication API" that was introduced in Windows 8 that also TightVNC is
> using in their latest release:
>
> http://www.tightvnc.com/release-2.7.php

Definite improvement over mirror drivers, but it's still a screen scraper. Screen scrapers can only serve one user at a time, and it's difficult to do hardware-accelerated 3D with them. I'm not sure whether the desktop duplication API changes that, but I doubt it. HP RGS is the only screen scraper solution I know of that has managed to solve the 3D problem.

I looked at TightVNC 2.x recently, and although it has some codec improvements relative to TightVNC 1.3.x (IIRC, it no longer performs smoothness detection), it has also regressed in a key area: it no longer splits out solid regions of the FBU, so its ability to optimize subrects based on color count is not as good as that of TightVNC 1.3.x (or TurboVNC). TigerVNC initially had the same problem, so I had to port over that functionality from TurboVNC in order to bring its encoder in line with ours (http://www.virtualgl.org/pmwiki/uploads/About/turbototiger.pdf).

Further, at least when I looked at TightVNC 2.x, they weren't using libjpeg-turbo, and the subrect mix was still similar to the old TightVNC 1.3.x, thus limiting the potential speedup from accelerated JPEG.

I really wish TightVNC would adopt the TurboVNC encoding strategy. I've spent many hours proving that we are much, much faster in all cases, and we can produce approximately the same "tightness" when using the new compression level (9) provided in TurboVNC 1.2. I measure the performance of this stuff by isolating the codecs at the low level and using them to encode a set of 20 RFB captures. 8 of them are the 2D datasets supplied by Const that he used when designing the TightVNC codec, and the other 12 were captured when running Viewperf datasets, Quake, Google Earth, and our old friend GLXspheres.

The following is a summary of the research I conducted for the libvncserver developers, in which I compared TightVNC with TurboVNC, using the new CL 9 in the latter with medium-quality JPEG (same quality that TightVNC uses when you set Quality=9.)

TurboVNC CL 9, 2X subsamp, Qual 80 compared to TightVNC 1.3.x CL 5, Qual 9:

  2D datasets:
    Compression ratio:  83-119% (average 101%)
    Performance:        89-211% (average 104%)
  3D datasets:
    Compression ratio:  114-236% (average 162%)
    Performance:        116-432% (average 173%)

TurboVNC CL 9, 2X subsamp, Qual 80 compared to TightVNC 1.3.x CL 9, Qual 9:

  2D datasets:
    Compression ratio:  84-115% (average 98%)
    Performance:        181-743% (average 472%)
  3D datasets:
    Compression ratio:  114-230% (average 160%)
    Performance:        386-1751% (average 852%)

So, in numbers, this says the same thing that I said above: we are able to compress as tightly as TightVNC on 2D datasets, in the aggregate (and much better than TightVNC on 3D datasets), and our performance is generally way better across the board. It also shows numerically that, in the aggregate, TightVNC CL 9 does not improve the compression ratio relative to TightVNC CL 5.
TightVNC CL 9 does offer some small improvement (<10%) in compression ratio relative to CL 5 on some isolated datasets, but it gives up the same on others.

Now, also bear in mind that CL 9 in TurboVNC was added solely to assuage the fears of those who were using TightVNC. CL 2 in TurboVNC generally provides a much better balance of performance and compression ratio. In the aggregate, CL 9 vs. CL 2 in TurboVNC compresses only 18% better on 2D workloads and 7% better on 3D workloads, and CL 9 uses literally twice the CPU time of CL 2. So even though CL 9 in TurboVNC matches or bests TightVNC, it's still our worst-performing mode by far. The trade-off between CL 2 and CL 1 in TurboVNC is much more equal: CL 2 generally compresses 10-20% better but uses 10-20% more CPU time.

> Btw, I looked at the vglrun shell script and AFAICT it simply uses the
> LD_PRELOAD trick. If I turn Remmina into a dynamic library and compile
> it myself, I presume it is easy to make sure I get the faker lib and not
> having to use vglrun? I am not a hardcore C/C++ guy, but I can push
> myself through things ;)

You're confusing the purposes of VirtualGL and TurboVNC. VirtualGL redirects the 3D rendering on the server side into a Pbuffer on the "root" X display, then reads back the rendered pixels and transmits them to the X proxy (TurboVNC in our case.) You can use other X proxies or remote display technologies, but that doesn't eliminate the need for VirtualGL. VirtualGL is what gives you 3D hardware acceleration in your remote display environment.

> Ah, another silly question - the IPP version of libjpeg-turbo, how do I
> use that? 3-4x faster sounds interesting.

There is no IPP version of libjpeg-turbo. libjpeg-turbo has its own built-in SIMD routines that accelerate it to 3-4x the performance of plain libjpeg. It performs on par with IPP in most cases (I re-tested that recently with IPP 7.1.) http://www.libjpeg-turbo.org/About/Performance has a run-down.
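
On the Zlib point made earlier in this message (CPU cost climbing much faster than compression improves at the higher levels), a quick way to see the shape of that curve on your own data is sketched below in Python. Everything here is an assumption for illustration: `zlib_level_profile` and `make_sample` are hypothetical helpers, the sample is synthetic rather than a real RFB capture, and raw Zlib levels are not the same thing as the VNC compression levels (CL) discussed above, so the numbers will not match the ones quoted in this thread.

```python
import time
import zlib


def make_sample(rows=256, width=512):
    """Build deterministic, low-color bytes loosely resembling 2D content.

    Purely synthetic; substitute a real framebuffer capture for
    meaningful numbers.
    """
    out = bytearray()
    for y in range(rows):
        for x in range(width):
            out.append(((x // 8) * 3 + (y // 4) * 5 + (x * y) % 11) % 16)
    return bytes(out)


def zlib_level_profile(data, levels=(1, 5, 9)):
    """Return {level: (compressed_size, seconds)} for each Zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: the compressed stream must round-trip.
        if zlib.decompress(compressed) != data:
            raise RuntimeError("round-trip mismatch at level %d" % level)
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    sample = make_sample()
    for level, (size, seconds) in sorted(zlib_level_profile(sample).items()):
        ratio = len(sample) / size
        print("level %d: %d bytes (%.1f:1), %.4f s" % (level, size, ratio, seconds))
```

Timings from a single run are noisy, so repeat the measurement a few times and compare size against time per level before drawing conclusions.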
_______________________________________________
VirtualGL-Devel mailing list
VirtualGL-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtualgl-devel