Hi Thorsten,

The problem you are seeing is caused by the lack of direct memory mapping 
between OpenGL and CUDA memory. Even though CUDA supports GPU<->GPU memory 
mapping (at least version 2 did), a full memory copy is performed whenever 
you access OpenGL textures.

I do not know whether this has been solved in CUDA 3; it may be worth checking. 
CUDA 2 definitely does not perform direct mapping between GL textures and CUDA 
textures/arrays.
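
For a sense of scale, the frame rates quoted below can be turned into 
per-frame times. This is only a rough sketch in Python: the 15 and 21 fps 
figures come from Thorsten's mail, while the 4 bytes per pixel (RGBA8) is an 
assumption, since the actual buffer formats are not stated in the thread.

```python
# Frame rates quoted in the thread (1024 x 1024 render target).
FPS_CPU_READBACK = 15.0  # osg::Image readback, then processing in CUDA
FPS_OSGPPU_CUDA = 21.0   # osgPPU-cuda example with buffer mapping


def frame_time_ms(fps):
    """Return the time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def buffer_size_mib(width, height, bytes_per_pixel):
    """Return the size of one image buffer in MiB."""
    return width * height * bytes_per_pixel / (1024.0 * 1024.0)


readback_ms = frame_time_ms(FPS_CPU_READBACK)
mapped_ms = frame_time_ms(FPS_OSGPPU_CUDA)
saved_ms = readback_ms - mapped_ms

# ASSUMPTION: 4 bytes per pixel (RGBA8); the thread does not give formats.
size_mib = buffer_size_mib(1024, 1024, 4)

print(f"readback path: {readback_ms:.1f} ms per frame")
print(f"mapped path:   {mapped_ms:.1f} ms per frame")
print(f"difference:    {saved_ms:.1f} ms per frame")
print(f"one buffer:    {size_mib:.1f} MiB (assumed RGBA8)")
```

That works out to roughly 67 ms versus 48 ms per frame, so the mapped path 
saves about 19 ms but is still far from fast. That would fit with a copy 
still being made behind the mapping call.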

regards,
art



Thorsten Roth wrote:
> Hi,
> 
> as I explained in some other mail to this list, I am currently working 
> on a graph based image processing framework using CUDA. Basically, this 
> is independent from OSG, but I am using OSG for my example application :-)
> 
> For my first implemented postprocessing algorithm I need color and depth 
> data. As I want the depth to be linearized between 0 and 1, I use a 
> shader for that and render it in a pass separate from the color pass. 
> The results are then fetched from the GPU to the CPU by directly attaching 
> osg::Images to the cameras. This works perfectly but is rather slow, 
> as you might already have suspected, because the data is processed in 
> CUDA kernels afterwards, which means quite a bit of back and forth ;-)
> 
> In fact, my application with three filter kernels based on CUDA (one 
> gauss blur with radius 21, one image subtract and one image "pseudo-add" 
> (about as elaborate as a simple add ;-)) yields about 15 fps with a 
> resolution of 1024 x 1024 (images for normal and absolute position 
> information are also rendered and transferred from GPU to CPU here).
> 
> So with these 15 frames, I thought it should perform FAR better when 
> avoiding that GPU <-> CPU copying stuff. That's when I came across the 
> osgPPU-cuda example. As far as I am aware, this uses direct mapping of 
> PixelBufferObjects to CUDA memory space. This should be fast! At least 
> that's what I thought, but running it at a resolution of 1024 x 1024 
> with a StatsHandler attached shows that it runs at just ~21 fps, and 
> does not get much better when the CUDA kernel execution is completely 
> disabled.
> 
> Now my question is: Is that a general (known) problem which cannot be 
> avoided? Does it have anything to do with the memory mapping functions? 
> How can it be optimized? I know that, while osgPPU uses older CUDA 
> memory mapping functions, there are new ones as of CUDA 3. Is there a 
> difference in performance?
> 
> Any information on this is appreciated, because it will really help me 
> to decide whether I should integrate buffer mapping or just keep the 
> copying stuff going :-)
> 
> Best Regards
> -Thorsten


------------------
Read this topic online here:
http://forum.openscenegraph.org/viewtopic.php?p=35415#35415





_______________________________________________
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org