Re: [VirtualGL-Users] More data regarding the performance of the new Java viewer

DRC Fri, 01 Mar 2013 17:00:31 -0800

Major correction to the results quoted below: I was measuring the 
performance of TigerVNC incorrectly.  When benchmarking vncviewer, it's 
necessary to synchronize the decode and the blitting in order to 
accurately break down the time spent in each, and that's what both of 
the TurboVNC viewers do, so I had to do likewise with the TigerVNC 
viewer in order to obtain an apples-to-apples comparison.  Here are the 
amended results:


Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, nVidia Quadro
600 (310.14), CentOS 5.8:

X11 TurboVNC Viewer 1.2 beta1:
    Decode / Total = 10.6 / 15.6 s
Java TurboVNC Viewer 1.2 beta1:
    Decode / Total = 12.5 / 23.2 s
Native TigerVNC Viewer r5051:
    Decode / Total = 12.2 / 23.7 s

HP Pavilion Slimline s5100z, dual-core 2.6 GHz AMD Athlon 64 X2 5050e,
nVidia GeForce 6150SE (304.64), CentOS 6.3:

X11 TurboVNC Viewer 1.2 beta1:
    Decode / Total = 17.2 / 24.0 s
Java TurboVNC Viewer 1.2 beta1:
    Decode / Total = 25.3 / 64.4 s
Native TigerVNC Viewer r5051:
    Decode / Total = 20.6 / 36.7 s

Mac Mini 2.0 GHz Intel Core 2 Duo, nVidia GeForce 9400, OS X 10.8.2:

X11 TurboVNC Viewer 1.2 beta1:
    Decode / Total = 16.2 / 48.8 s
Java TurboVNC Viewer 1.2 beta1:
    Decode / Total = 19.0 / 39.7 s
Native TigerVNC Viewer r5051:
    Decode / Total = 25.0 / 381 s
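
For reference, the drawing (blitting) time implied by each result is
Total minus Decode, the same definition used in the original message
quoted below.  Here is a minimal Python sketch of that arithmetic, using
only the amended numbers above.  The dictionary layout and the function
names are illustrative assumptions and are not part of the benchmark
tools.

```python
# Amended results from above, as (decode seconds, total seconds).
# The layout and names here are illustrative only.
RESULTS = {
    "Dell Precision T3500": {
        "X11 TurboVNC": (10.6, 15.6),
        "Java TurboVNC": (12.5, 23.2),
        "Native TigerVNC": (12.2, 23.7),
    },
    "HP Pavilion s5100z": {
        "X11 TurboVNC": (17.2, 24.0),
        "Java TurboVNC": (25.3, 64.4),
        "Native TigerVNC": (20.6, 36.7),
    },
    "Mac Mini": {
        "X11 TurboVNC": (16.2, 48.8),
        "Java TurboVNC": (19.0, 39.7),
        "Native TigerVNC": (25.0, 381.0),
    },
}


def draw_time(decode, total):
    """Wall time not spent in the Tight decoder (Total - Decode)."""
    return round(total - decode, 1)


def draw_times(results):
    """Map each machine and viewer to its implied drawing time."""
    return {
        machine: {viewer: draw_time(*times) for viewer, times in viewers.items()}
        for machine, viewers in results.items()
    }


if __name__ == "__main__":
    for machine, viewers in draw_times(RESULTS).items():
        print(machine)
        for viewer, seconds in viewers.items():
            print(f"    {viewer}: {seconds} s drawing")
```

On the Dell, for example, this works out to 5.0 s of drawing time for the
X11 viewer versus 10.7 s for Java and 11.5 s for TigerVNC.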


NOTES:

-- The FLTK performance on OS X still seems like a bug.  It's hard to 
believe that it would really be nearly 10x slower than Java in total 
time (381 s vs. 39.7 s).  If any TigerVNC developers are reading this, 
you may want to look into it.

-- As you can see, our Java viewer actually performs almost the same as 
the TigerVNC native viewer on the Dell machine.  The disparity in 
performance between those two viewers and the X11 TurboVNC Viewer is 
entirely due to differing double buffering strategies.  As explained 
below, the X11 TurboVNC Viewer doesn't do true double buffering.  It 
instead waits until all rectangles in a framebuffer update have been 
decoded and draws them all in rapid succession to the screen.  TigerVNC 
and the Java TurboVNC Viewer, on the other hand, draw a bounding box 
containing all of the updated rectangles.  In some cases, this causes 
those two viewers to draw more pixels.  For instance, if the user is 
typing text into a console application, like Emacs, the X11 TurboVNC 
Viewer could get away with drawing only the tiny rectangles representing 
the character being typed, the previous character, and the updated 
status bar text at the bottom.  TigerVNC (and Java TurboVNC), however, 
will redraw most of the window in that situation, because the bounding 
box of the status bar (with updated text at the bottom right) and the 
text being typed at the upper left encompasses most of the window.  I 
don't think this represents a noticeable performance issue from the 
point of view of the user, unless the user is able to type 100 
characters per second.  :)  When you look at just the 3D datasets, the 
blitting performance under Java is very much in line with the X11 
TurboVNC Viewer, because the framebuffer updates are typically large and 
monolithic.

-- The remaining issue is that, for whatever reason, Java2D is not being 
accelerated on my HP machine.  We're looking into that, and I will post 
amended results once we find a solution.
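
To make the bounding-box point in the second note concrete, here is a
small Python sketch that compares the number of pixels drawn when each
updated rectangle is drawn individually with the number drawn when one
bounding box covering all of them is drawn.  The window size and the
rectangle coordinates are hypothetical values chosen to resemble the
typing scenario, and the function names are made up for illustration.
None of this is taken from either viewer's source.

```python
# Rectangles are (x, y, width, height) in pixels.  All values below are
# hypothetical and only illustrate the "typing in an editor" scenario.

def pixels_per_rect(rects):
    """Pixels drawn when each (non-overlapping) rectangle is drawn by itself."""
    return sum(w * h for _, _, w, h in rects)


def bounding_box(rects):
    """Smallest single rectangle that contains every updated rectangle."""
    x0 = min(x for x, _, _, _ in rects)
    y0 = min(y for _, y, _, _ in rects)
    x1 = max(x + w for x, _, w, _ in rects)
    y1 = max(y + h for _, y, _, h in rects)
    return (x0, y0, x1 - x0, y1 - y0)


def pixels_bounding_box(rects):
    """Pixels drawn when only the bounding box is drawn."""
    _, _, w, h = bounding_box(rects)
    return w * h


# One framebuffer update in a hypothetical 800x600 editor window.
update = [
    (16, 32, 8, 16),      # character just typed, near the upper left
    (8, 32, 8, 16),       # previous character
    (600, 580, 192, 16),  # status bar text at the bottom right
]

if __name__ == "__main__":
    print("individual rectangles:", pixels_per_rect(update), "pixels")
    print("bounding box:", pixels_bounding_box(update), "pixels")
```

With these made-up coordinates, drawing the three rectangles individually
touches 3328 pixels, while the bounding box is 784x564 = 442176 pixels,
which is most of the 480000-pixel window.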


There are a couple of other minor usability tweaks that need to happen 
with the Java viewer before it could fully replace the X11 viewer on 
Linux (for instance, figuring out how to implement keyboard grabbing, 
extending the -via/-tunnel feature to allow using an external SSH 
command rather than the built-in Java SSH client, etc.), but it seems 
like the performance is there, assuming we can figure out the Java2D 
acceleration issue.  When I accelerated the TigerVNC codecs in 2011, I 
did extensive work to convince myself that, from an end user point of 
view, the TigerVNC Viewer would appear as fast as the TurboVNC Viewer, 
so I'm confident that if our Java viewer can hit that same baseline, 
we're golden.


On 2/26/13 8:16 PM, DRC wrote:
> Out of curiosity, I ported the benchmarking system from the TurboVNC
> Viewer into the TigerVNC native (FLTK) Viewer.  This was partly done to get a
> better idea of how the new JNI-accelerated Java TurboVNC Viewer compared
> -- since that viewer is architecturally more similar to TigerVNC, it's a
> bit more of an apples to apples comparison than comparing it to the
> TurboVNC native viewers.  Also, this research served as a baseline for
> an upcoming project to multi-thread the TurboVNC decoder.
>
> For those who aren't familiar with this benchmarking system, it takes
> the set of 20 session captures that were originally used to design the
> TurboVNC encoding methods
> (http://www.virtualgl.org/pmwiki/uploads/About/tighttoturbo.pdf),
> pre-encodes them using the TurboVNC Benchmark Tools
> (http://virtualgl.svn.sourceforge.net/viewvc/virtualgl/vncbenchtools/trunk/)
> using the turbo-1.1 encoder with "Perceptually Lossless JPEG", and runs
> the encoded sessions through the viewer unimpeded (basically replacing
> socket reads with reads from a session capture file.)  The benchmarking
> system can make multiple runs of the same dataset and report the average
> of the total time, decoding time, and blitting time for each iteration
> (I typically take the average of 5 runs with 2 "throw-away" runs at the
> beginning.)  The system also subtracts out any time spent reading the
> session capture from disk.  This was the same system we used to figure
> out that the new Java TurboVNC Viewer is actually faster than the X11
> TurboVNC Viewer on OS X, which led to replacing that viewer in TurboVNC
> 1.2.
>
> The results comparing with TigerVNC were interesting.  Prior studies
> with the X11 TurboVNC Viewer had revealed that the decoding performance
> of that viewer on Linux was higher than that of the new Java TurboVNC
> Viewer, and I suspected that it was due to the different way in which
> both solutions handle solid-colored subrectangles.  When decoding the
> Tight stream, the X11 TurboVNC Viewer uses an XImage to store the
> decoded non-solid subrectangles, but it stores just the coordinates and
> fill color of the solid-colored rectangles.  Thus, when it comes time to
> draw the frame, the viewer uses a series of XShmPutImage() calls to draw
> the non-solid subrectangles and a series of XFillRectangle() calls to
> draw the solid subrectangles.  This is not technically double buffering,
> but those X11 calls are processed so fast that effectively it appears
> double-buffered.  The TigerVNC Viewer and the new Java TurboVNC Viewer
> use true double buffering, and as such, solid-colored subrectangles have
> to be rendered to the back buffer whenever they are decoded.  Thus, the
> Java TurboVNC Viewer turns in slower decoding performance than the X11
> TurboVNC Viewer on Linux, but in fact, on that platform, the decoding
> performance of the Java TurboVNC Viewer is about equal with the native
> TigerVNC Viewer.
>
> Results follow.  "Total" is the total (wall) time taken to fully
> decode/draw all 20 datasets, averaged over 5 runs with 2 "warmup" runs.
> "Decode" is the portion of that wall time spent in the Tight decoder.
> 64-bit code (or a 64-bit JVM) was used in all cases.  J2SE or OpenJDK
> 1.6.0 was used for all Java cases.  -O3 with GCC 4 was used for the
> C/C++ code.
>
> Dell Precision T3500, quad-core 2.8 GHz Intel Xeon W3530, nVidia Quadro
> 600 (310.14), CentOS 5.8:
>
> X11 TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 10.6 / 15.6 s
> Java TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 12.5 / 23.2 s
> Native TigerVNC Viewer r5051:
>    Decode / Total = 12.4 / 16.6 s
>
> HP Pavilion Slimline s5100z, dual-core 2.6 GHz AMD Athlon 64 X2 5050e,
> nVidia GeForce 6150SE (304.64), CentOS 6.3:
>
> X11 TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 17.2 / 24.0 s
> Java TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 25.3 / 64.4 s
> Native TigerVNC Viewer r5051:
>    Decode / Total = 24.9 / 30.4 s
>
> Mac Mini 2.0 GHz Intel Core 2 Duo, nVidia GeForce 9400, OS X 10.8.2:
>
> X11 TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 16.2 / 48.8 s
> Java TurboVNC Viewer 1.2 beta1:
>    Decode / Total = 19.0 / 39.7 s
> Native TigerVNC Viewer r5051:
>    Decode / Total = 24.9 / 219 s
>
> NOTES:
>
> -- I am interested, in the long term, in replacing the X11 TurboVNC
> Viewer on Linux with the Java viewer, but we need to first figure out
> how to get the drawing performance up to snuff on that platform.  Note
> that the drawing performance (Total - Decode) on Linux is similar for
> the native TurboVNC and TigerVNC viewers.  The Java viewer's drawing
> performance, compared to native, is about half on the Dell and about 1/6
> on the HP, for reasons unknown.
>
> -- The drawing performance of FLTK is awful on OS X.  Not sure why.  But
> assuming this is a legitimate result (it seems to be-- I mean, I watched
> it, and it was visibly very slow), it further validates the notion that
> Java may be the fastest solution for Mac, at least at the moment.
>
> -- As mentioned above, the decoding performance of the Java TurboVNC
> Viewer is about the same as the native TigerVNC Viewer on Linux, and
> it's actually better than TigerVNC on OS X.
>
>
> I'll be doing similar comparisons with the Windows Viewer over the next
> couple of months and will keep everyone posted on that.
>
> If anyone wants to peer review this work, I'm happy to provide the
> TigerVNC patches that implement the benchmark functionality.
>
> DRC
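
The averaging scheme described in the quoted message (discard the first 2
"throw-away" runs, average the remaining 5, and subtract out the time
spent reading the session capture from disk) can be sketched in Python as
follows.  This is only an illustration under stated assumptions: the
per-run record layout, the function name, and the sample timings are all
hypothetical, and the blit time is assumed here to be whatever remains
after decoding, which may not be how the real benchmark code measures it.

```python
from statistics import mean


def summarize(runs, warmup=2):
    """Average per-run timings after discarding the first `warmup` runs.

    Each run is a dict with 'wall', 'read', and 'decode' times in seconds.
    The time spent reading the capture from disk is subtracted from the
    wall time.  Blit time is assumed to be total minus decode.
    """
    kept = runs[warmup:]
    if not kept:
        raise ValueError("need more runs than warm-up runs")
    total = mean(r["wall"] - r["read"] for r in kept)
    decode = mean(r["decode"] for r in kept)
    return {"total": total, "decode": decode, "blit": total - decode}


# Hypothetical timings for 2 warm-up runs followed by 5 measured runs.
# These numbers are made up and are not results from any viewer.
sample_runs = [
    {"wall": 40.0, "read": 4.0, "decode": 15.0},  # warm-up, discarded
    {"wall": 35.0, "read": 3.0, "decode": 14.0},  # warm-up, discarded
    {"wall": 26.0, "read": 2.0, "decode": 12.0},
    {"wall": 25.0, "read": 2.0, "decode": 12.5},
    {"wall": 27.0, "read": 2.0, "decode": 11.5},
    {"wall": 26.0, "read": 2.0, "decode": 12.0},
    {"wall": 26.0, "read": 2.0, "decode": 12.0},
]

if __name__ == "__main__":
    print(summarize(sample_runs))
```

Discarding the warm-up runs keeps one-time costs, such as cold disk caches
or JIT compilation in the Java case, out of the reported averages.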
