Hi,

I don't have any suggestions other than to use a GL debugger to make sure nothing is going through the CPU, or to try the new CUDA functions in osgPPU or in your own code. I remember something in the GL-to-CUDA path bugging me, but I cannot remember the details. AFAIR something was converting from texture to PBO and then to CUDA memory.

jp

On 16/12/10 13:25, Thorsten Roth wrote:
Hi,

as I explained in some other mail to this list, I am currently working
on a graph based image processing framework using CUDA. Basically, this
is independent from OSG, but I am using OSG for my example application :-)

For my first implemented postprocessing algorithm I need color and depth
data. As I want the depth to be linearized between 0 and 1, I use a
shader for that and render it in a separate pass from the color. Both
are then fetched from the GPU to the CPU by attaching osg::Images
directly to the cameras. This works perfectly, but is quite slow, as
you might already have suspected, because the data is later processed
in CUDA kernels again, which is quite a back and forth ;-)
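
To illustrate what linearizing depth to the 0..1 range can look like, here is a minimal CPU-side sketch in C++. It is not the shader used in the message, which is not shown. It assumes a standard OpenGL perspective projection and the default depth range of [0, 1]; the function name linearizeDepth and the parameters zNear and zFar are hypothetical.

```cpp
#include <cassert>

// Maps a depth-buffer value d in [0, 1] to a linear value in [0, 1].
// Assumption: standard OpenGL perspective projection and the default
// glDepthRange(0, 1); zNear and zFar are the camera clip distances.
double linearizeDepth(double d, double zNear, double zFar)
{
    assert(zNear > 0.0 && zFar > zNear);
    assert(d >= 0.0 && d <= 1.0);

    // Window-space depth -> NDC z in [-1, 1].
    const double zNdc = 2.0 * d - 1.0;
    // Invert the perspective projection to get the eye-space distance.
    const double zEye =
        (2.0 * zNear * zFar) / (zFar + zNear - zNdc * (zFar - zNear));
    // Rescale so the near plane maps to 0 and the far plane to 1.
    return (zEye - zNear) / (zFar - zNear);
}
```

The same arithmetic would carry over to a fragment shader more or less line by line, with zNear and zFar passed in as uniforms.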

In fact, my application with three filter kernels based on CUDA (one
Gaussian blur with radius 21, one image subtract and one image
"pseudo-add" (about as elaborate as a simple add ;-)) yields about
15 fps at a resolution of 1024 x 1024 (images for normal and absolute
position information are also rendered and transferred from GPU to CPU
here).
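
To get a feeling for how much data such a readback moves, here is a rough C++ sketch of the arithmetic. The pixel formats of the images are not stated in the message, so the bytes-per-pixel value is an assumption the caller has to supply; the function names imageBytes and readbackBytesPerSecond are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one tightly packed image.
// Assumption: no row padding; bytesPerPixel depends on the chosen format
// (for example 4 for RGBA8), which the original message does not state.
std::uint64_t imageBytes(std::uint32_t width, std::uint32_t height,
                         std::uint32_t bytesPerPixel)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Bytes per second moved when bytesPerFrame is read back every frame.
double readbackBytesPerSecond(std::uint64_t bytesPerFrame,
                              double framesPerSecond)
{
    assert(framesPerSecond >= 0.0);
    return static_cast<double>(bytesPerFrame) * framesPerSecond;
}
```

For a single 1024 x 1024 image with an assumed 4 bytes per pixel, imageBytes(1024, 1024, 4) gives 4194304 bytes (4 MiB), and at 15 fps that is 62914560 bytes per second for that one image. Several attachments multiply this accordingly, and any upload back to the GPU for the CUDA kernels comes on top.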

So with these 15 fps, I thought it should perform FAR better when
avoiding that GPU <-> CPU copying stuff. That's when I came across the
osgPPU-cuda example. As far as I am aware, this uses direct mapping of
PixelBufferObjects to CUDA memory space. This should be fast! At least
that's what I thought, but running it at a resolution of 1024 x 1024
with a StatsHandler attached shows that it runs at just ~21 fps, and
it does not get much better when the CUDA kernel execution is
completely disabled.

Now my question is: Is that a general (known) problem which cannot be
avoided? Does it have anything to do with the memory mapping functions?
How can it be optimized? I know that, while osgPPU uses older CUDA
memory mapping functions, there are new ones as of CUDA 3. Is there a
difference in performance?

Any information on this is appreciated, because it will really help me
to decide whether I should integrate buffer mapping or just keep the
copying stuff going :-)

Best Regards
-Thorsten
_______________________________________________
osg-users mailing list
[email protected]
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org


