The problem with what you're proposing is that, if VirtualGL does not
read back the pixels, then the VNC X server will not have a copy of
them in its virtual framebuffer. The lossless refresh, ALR, and
interframe comparison features wouldn't work properly, since those
features require that the pixels in the virtual FB be up to date.

Further, if the window manager or another app attempted to obtain the 3D
pixels via XGetImage() or XCopyArea(), they would get back bogus data.

Also, since TurboVNC's encoder is out of process from VirtualGL,
synchronization would be an issue. Ideally, you'd have VirtualGL create
a PBO, copy the pixels into it, and pass the PBO handle to TurboVNC,
letting TurboVNC read/compress the pixels out of the PBO while the
rendering thread moves on to the next frame. You'd still need
synchronization, however, because you don't want VGL to create more
PBOs than necessary. You want it to create a pool of 2 or 3 of them and
reuse them, which requires blocking until TurboVNC has finished with a
particular PBO handle.
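
To make the pooling requirement concrete, here is a minimal Python
sketch of a fixed pool of buffer handles that the producer has to wait
on. It is only an illustration under stated assumptions: the integers
stand in for PBO handles, a thread stands in for the TurboVNC encoder
(which in reality is a separate process and would need cross-process
synchronization), and the names BufferPool, acquire, release, and
run_frames are hypothetical, not part of VirtualGL or TurboVNC.

```python
import queue
import threading


class BufferPool:
    """Fixed-size pool of reusable handles (stand-ins for PBO handles)."""

    def __init__(self, size=3):
        self._free = queue.Queue()
        for handle in range(size):
            self._free.put(handle)

    def acquire(self, timeout=None):
        # Blocks until the consumer has released a handle.
        return self._free.get(timeout=timeout)

    def release(self, handle):
        self._free.put(handle)


def run_frames(num_frames, pool_size=3):
    """Simulate a renderer handing frames to an encoder through the pool."""
    pool = BufferPool(pool_size)
    ready = queue.Queue()
    used = []  # (frame, handle) pairs, in the order the encoder saw them

    def encoder():
        while True:
            item = ready.get()
            if item is None:          # sentinel: no more frames
                break
            frame, handle = item
            used.append((frame, handle))  # placeholder for "compress"
            pool.release(handle)          # handle may now be reused

    worker = threading.Thread(target=encoder)
    worker.start()
    for frame in range(num_frames):
        handle = pool.acquire()       # blocks if every buffer is in flight
        ready.put((frame, handle))
    ready.put(None)
    worker.join()
    return used
```

Calling run_frames(10, pool_size=3) pushes ten frames through the pool
while only ever using handles 0 through 2, which is the reuse behavior
described above.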
If I were designing such a system, I'd do it as follows:

(1) Develop a custom X extension that works similarly to MIT-SHM but
uses GPU memory (via a PBO) instead of System V shared memory. You would
need an equivalent of XShmPutImage() ("XGPUPutImage()" or
"XPBOPutImage()" or whatnot). Within the body of this PutImage()
function, TurboVNC would synchronize the pixels from the specified PBO
to the VNC virtual framebuffer, then it would compress/send the given
pixels. It would be easiest if the compression took place within the
body of the XPBOPutImage() function. That way, we wouldn't have to have
separate synchronization functions to allow VGL and TVNC to lock the
PBO, nor would we have to track the PBO region separately within Xvnc so
that it could be handled by a different codec whenever the deferred
updates are processed.

(2) Develop an image transport extension in VGL that calls the
hypothetical X extension. Since the proposed PutImage() function would
be synchronous, the image transport extension would have to work like
the existing X11 transport, calling the PutImage() function in a
separate thread.

(3) Extend TurboVNC such that it can compress a specific region of the
virtual framebuffer from a PBO source instead of from the virtual FB.
Note that this doesn't eliminate the need to synchronize the pixels from
the PBO to the virtual FB, but it eliminates the need to copy the pixels
back to the GPU for compression.
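
Step (2) amounts to running a blocking call on a dedicated thread so
that the rendering thread is not held up. A rough Python sketch of that
pattern follows. The AsyncTransport class and the put_image callable are
hypothetical stand-ins for the proposed image transport and the proposed
PutImage() function. This is not how VirtualGL's transports are actually
implemented.

```python
import queue
import threading


class AsyncTransport:
    """Runs a blocking put_image callable on a dedicated thread."""

    def __init__(self, put_image):
        self._put_image = put_image            # hypothetical synchronous call
        self._frames = queue.Queue(maxsize=1)  # at most one frame waiting
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            frame = self._frames.get()
            if frame is None:                  # sentinel: shut down
                break
            self._put_image(frame)             # blocks only this thread

    def send_frame(self, frame):
        # Returns once the frame is queued. Blocks only while an earlier
        # frame is still waiting to be picked up by the worker.
        self._frames.put(frame)

    def close(self):
        self._frames.put(None)
        self._thread.join()
```

With sent = [] and transport = AsyncTransport(sent.append), a few calls
to transport.send_frame() followed by transport.close() leave the frames
in sent in the order they were submitted.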
This solution is still more of a hack than I would prefer. Essentially
all you're eliminating is a single buffer copy, and that overhead may
not amount to much if you use the GPU from within libjpeg-turbo instead
of at a higher level. The idea is that, if you are using the GPU at the
low level, you are copying data to it in very small chunks, and you can
hide that transfer time behind the time taken to do other operations.
For instance, there is a proposed patch for libjpeg-turbo that uses
OpenCL for certain decode operations, and the patched code is able
to pipeline the GPU operations with Huffman decoding (Huffman is best
left on the CPU due to the algorithm's lack of parallelism).

In short, the foremost question on my mind is whether, despite the fact
that you would incur an additional buffer copy, it might still be better
from a performance point of view to do GPU compression at a lower level,
within libjpeg-turbo. It would certainly make things tons easier, since
none of the hacks proposed above would be necessary.

The other thing is that JPEG is not always the most appropriate
compression algorithm. TurboVNC only uses it for areas of the display
that have high numbers of unique colors. If you are using a CAD app or
something else that doesn't generate images with a lot of unique colors,
then a good portion of what you're sending to the client may actually be
indexed color rather than JPEG. This is another reason why implementing
GPU compression at the codec level may be a better idea: it wouldn't
interfere with the existing encoding method selection mechanism in
TurboVNC.
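
As a rough illustration of selecting an encoding by color count, the
Python sketch below picks between indexed color and JPEG for one tile.
It is not TurboVNC's actual selection logic: the function name
choose_encoding and the max_palette cutoff are assumptions made up for
this example.

```python
def choose_encoding(pixels, max_palette=24):
    """Pick an encoding for one tile from its number of unique colors.

    pixels: iterable of (r, g, b) tuples.
    max_palette: hypothetical cutoff, not a value taken from TurboVNC.
    """
    unique = set()
    for pixel in pixels:
        unique.add(pixel)
        if len(unique) > max_palette:
            return "jpeg"     # many unique colors: use JPEG
    return "indexed"          # few unique colors: a palette is enough
```

A two-color tile such as a CAD wireframe would come back as "indexed",
whereas a tile with hundreds of distinct colors would come back as
"jpeg". A GPU codec sitting underneath the JPEG path would not change
this decision.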
On 6/21/13 12:34 AM, Bharatkumar Sharma wrote:
> The final goal is to have the JPEG compression that TurboVNC performs
> using libjpeg run on the GPU rather than on the CPU. Our setup is
> described below.
>
> We have a setup where VirtualGL intercepts all GLX calls and renders
> off-screen into a pbuffer on an NVIDIA GPU. In the normal setup,
> VirtualGL reads back this rendered image, and the image is then taken
> by TurboVNC. TurboVNC compresses the image using libjpeg and sends it
> to the client. Now we want to speed up the compression of the image by
> using CUDA on the NVIDIA GPU.
>
> To do that, I need to send this image to the GPU, run the parallel
> compression algorithm, and get the compressed image back to the CPU
> so that it can be sent to the client. To avoid the extra effort of
> transferring data back and forth between the CPU and the GPU, we
> thought VirtualGL should not read back the pbuffer, and TurboVNC
> should take this pbuffer directly and do the compression on the GPU.
>
> This saves 2 copies to the GPU. So I set VGL_READBACK=0 so that
> VirtualGL does not read back the pbuffer. Now the question is: how can
> this pbuffer be made accessible to TurboVNC?
>
> As I said, I am new to VirtualGL and TurboVNC, so kindly suggest the
> appropriate way of doing this.
>
>
> On Thu, Jun 20, 2013 at 8:16 PM, DRC <[email protected]> wrote:
>
> Please explain what you're trying to accomplish.
>
> On Jun 20, 2013, at 1:54 AM, Bharatkumar Sharma <[email protected]>
> wrote:
>
> > Hi,
> >
> > I am new to VirtualGL and TurboVNC. We have a requirement that
> > VirtualGL should not read back the rendered image and that TurboVNC
> > should get a handle to this pbuffer before compressing the rendered
> > image.
> > After reading the VirtualGL guide, I see that setting VGL_READBACK
> > will solve the first part of the problem, where the rendered image is
> > not read back.
> > But how to get a handle to the pbuffer in TurboVNC before compressing
> > and sending to the client is not very clear to me.
> >
> > To my knowledge, a pbuffer cannot be shared across processes. I saw a
> > similar approach used by ParaView, where they create a wrapper around
> > swapbuffer, but I am not sure how to implement this.
> >
> > Regards,
> > Bharat
> >
> ------------------------------------------------------------------------------
> > This SF.net email is sponsored by Windows:
> >
> > Build for Windows Store.
> >
> >http://p.sf.net/sfu/windows-dev2dev
> > _______________________________________________
> > VirtualGL-Devel mailing list
> >[email protected]
> <mailto:[email protected]>
> >https://lists.sourceforge.net/lists/listinfo/virtualgl-devel