We run into this question from time to time, usually from end users who
ask why GLXgears only performs at 500 frames/second in VirtualGL
instead of 5000.  My answer is always the same:  does anything over 30
fps matter?  Unless you're running a heads-up flight simulator, the
answer is almost certainly "no" (and even if you are running an
immersive application, anything over 60 fps probably doesn't matter.)

I'm not sure how measuring the overhead of VirtualGL is relevant unless
it is in the context of potentially reducing that overhead or comparing
it to other solutions.  Comparing local performance to remote
performance is comparing apples to oranges.  You aren't comparing two
solutions with the same functionality.  You wouldn't use a GeForce as a
VirtualGL server, nor would you use a Tesla as a stand-alone
workstation.  VirtualGL gives you the ability to do things that you
can't do on a workstation-- sharing GPUs among multiple users, tightly
coupling 3D rendering with compute and storage resources, using a few
beefy servers instead of hundreds of less powerful, scattered
workstations.  Don't underestimate the administrative savings of this.
Our biggest customer saved close to $2 million in capital outlays as a
result of bypassing their 3-year workstation upgrade cycle and replacing
their outdated hardware with a handful of big iron TurboVNC/VirtualGL
servers, plus they are saving nearly $1 million/year in IT costs as a
result of being able to deploy applications in one place rather than 300
(http://www.redhat.com/summit/santos/).

Now, comparing the overhead of, say, indirect GLX with the VGL Transport
would be a reasonable thing to do, since those solutions are
functionally equivalent, but even there, you have to be careful to model
what a real application would do.  'glxspheres -p 100000 -m' would be a
good place to start.  That causes GLXspheres to use immediate mode (no
display lists) and render 100,000 polys (a modest model size by today's
standards.)  The advantages of VirtualGL relative to indirect GLX should
be extremely apparent with that benchmark.
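
To make that concrete, here is roughly how I would run such a
comparison from the client machine (the host name 'myserver' and the
/opt/VirtualGL/bin install path are just placeholders-- adjust for your
own setup):

    # Indirect GLX:  GLX commands travel over the network to the
    # client's X server
    ssh -X myserver
    /opt/VirtualGL/bin/glxspheres -p 100000 -m

    # VGL Transport:  rendering happens on the server's GPU, and
    # VirtualGL streams the rendered frames back to the client
    vglconnect myserver
    vglrun /opt/VirtualGL/bin/glxspheres -p 100000 -m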

Similarly, it's reasonable to compare VirtualGL + TurboVNC with
VirtualGL + some other X proxy, or to compare VirtualGL + TurboVNC with
some other X proxy that does software OpenGL.  However, you have to be
careful to measure the CPU usage of both VirtualGL and the X proxy,
normalized against the frame rate, for it to be meaningful (see
http://www.virtualgl.org/pmwiki/uploads/About/vglperf21.pdf).
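
By "normalized against the frame rate", I mean CPU seconds consumed per
frame, not raw CPU percentages.  A crude sketch of how one might
measure that (the process names and the ps invocation are just an
illustration-- a careful study would isolate the runs better):

    # Record the cumulative CPU time of the X proxy and of the
    # application (with VirtualGL preloaded) before the run ...
    ps -o cputime= -C Xvnc
    ps -o cputime= -p <application_pid>

    # ... run the benchmark and note the number of frames it reports ...

    # ... then record the same numbers again and compute
    #     (CPU time after - CPU time before) / frames rendered
    # for both processes.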

Statements like "VirtualGL's readback reduces performance by 20%" aren't
very meaningful.  20% relative to what?  It's a moot point unless we can
somehow reduce that overhead and still provide the same functionality.
It would be like saying "my car's tires reduce gas mileage by 20%."
Maybe they do, but you can't drive the car without them.  We can't
exactly eliminate readback, or else we would have no way of delivering
the frames remotely, so this is only meaningful if it is compared to
something else that would still deliver the same functionality (for
instance, using Pbuffers vs. using FBOs to do rendering, or using PBO
readback vs. synchronous readback.)  PBO vs. synchronous readback is
something you can measure now, since PBO readback is implemented in
VirtualGL 2.2 and later, but even then, overhead may not tell you what
you really need to know, since the primary advantage of PBOs is in
making the readback asynchronous, which makes the solution more scalable
when multiple users are banging on the same GPU.
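
If you want to try that comparison yourself, the readback mode can be
selected at run time-- if memory serves, via the VGL_READBACK
environment variable (check the User's Guide for your version to
confirm the exact spelling and values):

    # PBO (asynchronous) readback
    VGL_READBACK=pbo vglrun /opt/VirtualGL/bin/glxspheres -p 100000 -m

    # synchronous readback
    VGL_READBACK=sync vglrun /opt/VirtualGL/bin/glxspheres -p 100000 -m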

Benchmarking remote display solutions is a tricky thing, because the
solutions have a fundamentally different performance characteristic from
local displays.  If you were comparing one nVidia card vs. the next
nVidia card, you might be able to draw a halfway reasonable conclusion
by running a toy benchmark that rendered at 100 fps on one vs. 200 fps
on the other (although I would caution that, even then, you are probably
measuring a lot more of the CPU than you are the GPU.)  With a remote
display solution, however, the important question is not raw polygon
performance or even raw frame rate.  The important questions are:  (a)
can this solution deliver the same user experience as a local
application?  In most cases, that means can it deliver 15-20 fps or
higher, and in the case of VirtualGL and TurboVNC, that should be a
solid "yes" until you get up to displays of 4 MP or bigger (in which
case, multi-threading the TurboVNC client would deliver better
performance-- seeking funding for this, BTW.)  (b) can this solution
deliver better performance than other remote solutions?  Again, the
answer should be a solid "yes."


On 3/14/12 9:50 AM, Arthur Huillet wrote:
> Hello,
> 
> I haven't found an "official" study by DRC of VirtualGL's performance impact,
> so I set out to conduct a few benchmarks to see how much of a loss VirtualGL
> represented.
> It seems that the readback done by VirtualGL does reduce performance by around
> 20%, but I would like to have reliable figures to back this up.
> 
> Is there any interest from the community? I have access to different hardware
> with which I can test (however my Intel GPU driver crashes when I try to use
> VGL, so I'll stick to nVidia boards for this test).
> 
> The testing method I have in mind is to test up to four cases:
> 
> (1) Control case: no VirtualGL, no VNC, test against the system's "3D" X 
> server
> (2) VirtualGL-loopback-case: VirtualGL reading from :0 and drawing to :0 (is
> that really a relevant test?)
> (3) VirtualGL-to-TurboVNC-with-no-viewer: this tests the performance impact of
> VirtualGL without image compression 
> (4) VirtualGL-to-TurboVNC-with-remote-viewer: is that relevant?
> 
> "VirtualGL performance factor" is (3) divided by (1).
> 
> The benchmarks I wish to use are recent OpenGL benchmarks, and it turns out
> those are rather difficult to find. I wanted to use Unigine Heaven and
> Sanctuary, but those benchmarks require a sound card (!!!) which isn't present
> on most of my hardware.
> 
> On my list are:
> 
> A-Unigine Heaven
> B-Unigine Sanctuary
> C-Xonotic's "the-big-benchmark", see
> http://dev.xonotic.org/projects/xonotic/wiki/Hardware_Requirements
> D-gluxMark2 
> 
> Some results so far:
> 
> (On Geforce 9800GT)
> B(1) 53FPS
> B(3) 43FPS
> 
> (On Tesla M2070Q)
> C(1) 
> 270.0412749fps
> 260.8189556fps
> 245.7475499fps
> 234.8674465fps
> 205.9180823fps
> 120.8288956fps
> 108.1091079fps
> 
> C(3) 
> 183.7750899 fps
> 210.5079939 fps
> 199.9529210 fps
> 194.9457950 fps
> 176.9814786 fps
> 107.4935744 fps
> 87.3782494 fps
> 
> 
> Obviously I have many more tests to do. I want to know how much interest there
> is, how much help I can hope to get, and if it's justified I wish we could
> conduct a real scientific study!
> 
> Thanks
