[VirtualGL-Users] More data regarding the performance of the new Java viewer

DRC Tue, 26 Feb 2013 18:17:55 -0800

Out of curiosity, I ported the benchmarking system from the TurboVNC Viewer 
into the TigerVNC native (FLTK) Viewer.  This was partly done to get a 
better idea of how the new JNI-accelerated Java TurboVNC Viewer compares.  
Since that viewer is architecturally more similar to TigerVNC, it's a 
bit more of an apples-to-apples comparison than comparing it to the 
TurboVNC native viewers.  Also, this research serves as a baseline for 
an upcoming project to multi-thread the TurboVNC decoder.


For those who aren't familiar with this benchmarking system, it takes 
the set of 20 session captures that were originally used to design the 
TurboVNC encoding methods 
(http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf), 
pre-encodes them using the TurboVNC Benchmark Tools 
(http://virtualgl.svn.sourceforge.net/viewvc/virtualgl/vncbenchtools/trunk/) 
with the turbo-1.1 encoder and "Perceptually Lossless JPEG", and runs 
the encoded sessions through the viewer unimpeded (basically replacing 
socket reads with reads from a session capture file).  The benchmarking 
system can make multiple runs of the same dataset and report the 
per-iteration total time, decoding time, and blitting time, averaged 
over those runs (I typically take the average of 5 runs with 2 
"throw-away" runs at the beginning).  The system also subtracts out any 
time spent reading the session capture from disk.  This is the same 
system we used to figure out that the new Java TurboVNC Viewer is 
actually faster than the X11 TurboVNC Viewer on OS X, which led to 
replacing that viewer in TurboVNC 1.2.
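
As a rough sketch of the averaging described above (this is not the actual 
benchmark code), the following assumes that the 2 throw-away runs precede 
the 5 timed runs and that disk read time is measured separately and 
subtracted.  The names benchmark() and decode_session() are hypothetical; 
decode_session() stands in for replaying one pre-encoded capture and is 
assumed to return the seconds it spent reading from disk.

```python
import time
from statistics import mean


def benchmark(decode_session, runs=5, warmup=2, clock=time.perf_counter):
    """Average the wall time of `runs` iterations after `warmup` discarded ones.

    decode_session is a hypothetical callable that replays one session
    capture and returns the number of seconds it spent reading from disk.
    """
    timed = []
    for i in range(warmup + runs):
        start = clock()
        read_time = decode_session()
        elapsed = clock() - start - read_time  # subtract out disk read time
        if i >= warmup:                          # discard the throw-away runs
            timed.append(elapsed)
    return mean(timed)
```

Passing the clock in as a parameter is only a convenience so that the 
function can be exercised with a fake timer.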

The results comparing with TigerVNC were interesting.  Prior studies 
with the X11 TurboVNC Viewer had revealed that the decoding performance 
of that viewer on Linux was higher than that of the new Java TurboVNC 
Viewer, and I suspected that this was due to the different ways in which 
the two viewers handle solid-colored subrectangles.  When decoding the 
Tight stream, the X11 TurboVNC Viewer uses an XImage to store the 
decoded non-solid subrectangles, but it stores just the coordinates and 
fill color of the solid-colored rectangles.  Thus, when it comes time to 
draw the frame, the viewer uses a series of XShmPutImage() calls to draw 
the non-solid subrectangles and a series of XFillRectangle() calls to 
draw the solid subrectangles.  This is not technically double buffering, 
but those X11 calls are processed so fast that effectively it appears 
double-buffered.  The TigerVNC Viewer and the new Java TurboVNC Viewer 
use true double buffering, and as such, solid-colored subrectangles have 
to be rendered to the back buffer whenever they are decoded.  Thus, the 
Java TurboVNC Viewer turns in slower decoding performance than the X11 
TurboVNC Viewer on Linux, but in fact, on that platform, the decoding 
performance of the Java TurboVNC Viewer is about equal to that of the 
native TigerVNC Viewer.
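
To illustrate why the two approaches charge different amounts of work to 
the decoder, here is a simplified sketch (not code from any of the 
viewers).  It models only solid-colored subrectangles, given as 
(x, y, w, h, color) tuples, and a flat row-major back buffer; both 
function names are hypothetical.

```python
def decode_deferred(solid_rects):
    """Deferred style: record coordinates and fill color only.

    No pixels are written while decoding; the fills would be issued
    later, at draw time (XFillRectangle() in the X11 viewer).
    """
    return [(x, y, w, h, color) for (x, y, w, h, color) in solid_rects]


def decode_into_back_buffer(solid_rects, back, width):
    """Double-buffered style: write every solid pixel while decoding."""
    for x, y, w, h, color in solid_rects:
        for row in range(y, y + h):
            start = row * width + x
            back[start:start + w] = [color] * w
    return back
```

In the first function the decoder's cost per solid rectangle is constant, 
whereas in the second it grows with the rectangle's area, which is 
consistent with the fill cost showing up as decode time in the 
double-buffered viewers.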

Results follow.  "Total" is the total (wall) time taken to fully 
decode/draw all 20 datasets, averaged over 5 runs with 2 "warmup" runs.  
"Decode" is the portion of that wall time spent in the Tight decoder.  
64-bit code (or a 64-bit JVM) was used in all cases.  J2SE or OpenJDK 
1.6.0 was used for all Java cases.  -O3 with GCC 4 was used for the 
C/C++ code.

Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, nVidia Quadro 
600 (310.14), CentOS 5.8:

X11 TurboVNC Viewer 1.2 beta1:
   Decode / Total = 10.6 / 15.6 s
Java TurboVNC Viewer 1.2 beta1:
   Decode / Total = 12.5 / 23.2 s
Native TigerVNC Viewer r5051:
   Decode / Total = 12.4 / 16.6 s

HP Pavilion Slimline s5100z, dual-core 2.6 GHz AMD Athlon 64 X2 5050e, 
nVidia GeForce 6150SE (304.64), CentOS 6.3:

X11 TurboVNC Viewer 1.2 beta1:
   Decode / Total = 17.2 / 24.0 s
Java TurboVNC Viewer 1.2 beta1:
   Decode / Total = 25.3 / 64.4 s
Native TigerVNC Viewer r5051:
   Decode / Total = 24.9 / 30.4 s

Mac Mini 2.0 GHz Intel Core 2 Duo, nVidia GeForce 9400, OS X 10.8.2:

X11 TurboVNC Viewer 1.2 beta1:
   Decode / Total = 16.2 / 48.8 s
Java TurboVNC Viewer 1.2 beta1:
   Decode / Total = 19.0 / 39.7 s
Native TigerVNC Viewer r5051:
   Decode / Total = 24.9 / 219 s

NOTES:

-- I am interested, in the long term, in replacing the X11 TurboVNC 
Viewer on Linux with the Java viewer, but we first need to figure out 
how to get the drawing performance up to snuff on that platform.  Note 
that the drawing time (Total - Decode) on Linux is similar for the 
native TurboVNC and TigerVNC viewers.  The Java viewer's drawing 
performance, compared to native, is about half on the Dell (10.7 s vs. 
5.0 s for the X11 TurboVNC Viewer) and about 1/6 on the HP (39.1 s vs. 
6.8 s), for reasons unknown.

-- The drawing performance of FLTK is awful on OS X.  Not sure why.  But 
assuming this is a legitimate result (it seems to be; I watched it, and 
it was visibly very slow), it further validates the notion that Java 
may be the fastest solution for Mac, at least at the moment.

-- As mentioned above, the decoding performance of the Java TurboVNC 
Viewer is about the same as the native TigerVNC Viewer on Linux, and 
it's actually better than TigerVNC on OS X.
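
The drawing times referred to in the notes above can be reproduced by 
subtracting Decode from Total for each result.  The numbers below are 
copied from the results in this message; the machine labels are 
shortened, and draw_time() is just an illustrative helper.

```python
# (decode_s, total_s) pairs copied from the results above.
RESULTS = {
    "Dell T3500": {
        "X11 TurboVNC": (10.6, 15.6),
        "Java TurboVNC": (12.5, 23.2),
        "Native TigerVNC": (12.4, 16.6),
    },
    "HP s5100z": {
        "X11 TurboVNC": (17.2, 24.0),
        "Java TurboVNC": (25.3, 64.4),
        "Native TigerVNC": (24.9, 30.4),
    },
    "Mac Mini": {
        "X11 TurboVNC": (16.2, 48.8),
        "Java TurboVNC": (19.0, 39.7),
        "Native TigerVNC": (24.9, 219.0),
    },
}


def draw_time(decode_s, total_s):
    """Drawing time in seconds, i.e. Total - Decode, rounded to 0.1 s."""
    return round(total_s - decode_s, 1)


for machine, viewers in RESULTS.items():
    for viewer, (decode_s, total_s) in viewers.items():
        print(f"{machine:12} {viewer:16} draw = {draw_time(decode_s, total_s):6.1f} s")
```

On the Mac Mini this works out to 32.6 s of drawing time for the X11 
viewer, 20.7 s for the Java viewer, and 194.1 s for the TigerVNC (FLTK) 
viewer.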


I'll be doing similar comparisons with the Windows Viewer over the next 
couple of months and will keep everyone posted on that.

If anyone wants to peer review this work, I'm happy to provide the 
TigerVNC patches that implement the benchmark functionality.

DRC
