Re: [osg-users] osgPPU CUDA Example - slower than expected?

J.P. Delport Mon, 03 Jan 2011 02:00:34 -0800

Hi,

I don't have any suggestions other than to use a GL debugger to make sure nothing is going to the CPU, or to try the new CUDA functions in osgPPU or in your own code. I remember something in the GL-to-CUDA path bugging me, but I cannot remember the details. AFAIR, something was converting from texture to PBO and then to CUDA memory.


jp

On 16/12/10 13:25, Thorsten Roth wrote:

Hi,

as I explained in another mail to this list, I am currently working
on a graph-based image processing framework using CUDA. Basically, it
is independent of OSG, but I am using OSG for my example application :-)

For the first postprocessing algorithm I implemented, I need color and depth
data. Because I want the depth linearized to the range [0, 1], I use a
shader for that and render it in a separate pass from the color.
This data is then fetched from the GPU to the CPU by directly attaching
osg::Images to the cameras. This works perfectly, but is quite slow,
as you might already have suspected, because the data is later
processed in CUDA kernels again, which means a lot of back and forth ;-)
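For reference, a minimal C++ sketch of one common way to linearize depth, assuming a standard OpenGL perspective projection with near plane n and far plane f. The actual shader is not shown in the mail, and linearizeDepth is a hypothetical name; the same expression can be written in GLSL in the depth pass.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a depth-buffer value d in [0,1], written by a
// standard OpenGL perspective projection (near plane n, far plane f), into a
// depth that is linear in eye space and normalized to [0,1].
double linearizeDepth(double d, double n, double f) {
    assert(n > 0.0 && f > n);
    double zNdc = 2.0 * d - 1.0;                          // window depth -> NDC [-1,1]
    double zEye = 2.0 * n * f / (f + n - zNdc * (f - n)); // NDC -> eye distance in [n,f]
    return (zEye - n) / (f - n);                          // [n,f] -> [0,1]
}
```

With n = 1 and f = 100, a raw depth of 0.5 maps to roughly 0.01, which illustrates how nonlinear the stored depth is.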

In fact, my application with three CUDA-based filter kernels (one
Gaussian blur with radius 21, one image subtract and one image "pseudo-add"
(about as elaborate as a simple add ;-)) yields about 15 fps at a
resolution of 1024 x 1024 (images with normal and absolute position
information are also rendered and transferred from GPU to CPU here).
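A Gaussian blur of radius 21 can be computed separably: a 1D pass over the rows followed by the same pass over the columns, which needs 2 x 43 = 86 taps per pixel instead of 43 x 43 = 1849. The CPU reference sketch below (hypothetical helpers gaussianKernel1D and blurRows; sigma is left to the caller as an assumption) shows the idea. It is not the CUDA kernel used in the mail.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: 1D Gaussian weights with 2*radius + 1 taps,
// normalized so that they sum to 1.
std::vector<float> gaussianKernel1D(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        float v = std::exp(-(i * i) / (2.0f * sigma * sigma));
        w[i + radius] = v;
        sum += v;
    }
    for (float& v : w) v /= sum;
    return w;
}

// Horizontal pass of a separable blur on a single-channel, row-major image,
// clamping reads at the left/right borders. Running the same pass along the
// columns afterwards yields the full 2D blur.
void blurRows(const std::vector<float>& src, std::vector<float>& dst,
              int width, int height, const std::vector<float>& w) {
    int r = static_cast<int>(w.size()) / 2;
    dst.assign(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int k = -r; k <= r; ++k) {
                int xs = std::min(std::max(x + k, 0), width - 1);
                acc += w[k + r] * src[y * width + xs];
            }
            dst[y * width + x] = acc;
        }
    }
}
```

On a GPU, each pass typically maps to one thread per output pixel; splitting the blur this way keeps the per-pixel work linear in the radius.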

So with these 15 fps, I thought it should perform FAR better when
avoiding the GPU <-> CPU copying. That's when I came across the
osgPPU-cuda example. As far as I know, it maps
PixelBufferObjects directly into CUDA memory space. This should be fast! At least
that's what I thought, but running it at a resolution of 1024 x 1024
with a StatsHandler attached shows that it runs at just ~21 fps, and it
does not get much better when CUDA kernel execution is completely
disabled.
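To put the numbers above in per-frame terms, a small C++ sketch of the arithmetic (hypothetical helpers imageBytes and frameTimeMs; the pixel formats are assumptions, since the mail does not state them):

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one width x height image with bytesPerPixel bytes per
// pixel. Which formats the images actually use is an assumption here.
constexpr std::size_t imageBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Time available per frame, in milliseconds, at a given frame rate.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// One 1024 x 1024 RGBA8 image (4 bytes per pixel) is exactly 4 MiB.
static_assert(imageBytes(1024, 1024, 4) == 4u * 1024u * 1024u, "4 MiB");
```

At 15 fps a frame takes about 66.7 ms, and at 21 fps about 47.6 ms, so the mapped-PBO example saves roughly 19 ms per frame. A 1024 x 1024 image stored as four 32-bit floats per pixel would be 16 MiB.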

Now my question is: Is this a general (known) problem that cannot be
avoided? Does it have anything to do with the memory mapping functions?
How can it be optimized? I know that while osgPPU uses the older CUDA
memory mapping functions, there are new ones as of CUDA 3. Is there a
difference in performance?

Any information on this is appreciated, because it will really help me
decide whether I should integrate buffer mapping or just keep the
copying going :-)

Best Regards
-Thorsten
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

--

This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard. The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

