Hi Thorsten,

the problem you are experiencing is caused by the lack of direct memory mapping between OpenGL and CUDA memory. Even though CUDA (at least as of version 2) supports GPU<->GPU memory mapping, a full memory copy is performed whenever you access OpenGL textures.
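For a sense of scale, here is a rough back-of-envelope estimate of how much data the CPU-readback approach in the quoted mail moves per frame at 1024 x 1024 (color, linearized depth, normal and position images). The pixel formats are assumptions, since the mail does not state them: RGBA8 for color, a single 32-bit float for depth, and RGBA32F for normal and position. The helper names are hypothetical and only for illustration.

```python
"""Rough per-frame readback estimate (hypothetical helpers, assumed formats)."""
from typing import Mapping

WIDTH, HEIGHT = 1024, 1024   # resolution mentioned in the thread
MIB = 1024 * 1024            # bytes per mebibyte

# ASSUMED bytes per pixel; the thread does not state the actual formats.
BYTES_PER_PIXEL = {
    "color (RGBA8)": 4,
    "linear depth (R32F)": 4,
    "normal (RGBA32F)": 16,
    "position (RGBA32F)": 16,
}


def frame_time_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def readback_bytes_per_frame(width: int, height: int, bpp: Mapping[str, int]) -> int:
    """Bytes copied GPU -> CPU per frame: one full image per entry in bpp."""
    return sum(width * height * b for b in bpp.values())


if __name__ == "__main__":
    per_frame = readback_bytes_per_frame(WIDTH, HEIGHT, BYTES_PER_PIXEL)
    print(f"readback per frame: {per_frame / MIB:.0f} MiB")
    print(f"at 15 fps: {frame_time_ms(15):.1f} ms/frame, "
          f"{15 * per_frame / MIB:.0f} MiB/s of readback")
    print(f"at 21 fps: {frame_time_ms(21):.1f} ms/frame")
```

With these assumed formats that is 40 MiB per frame, or about 600 MiB/s of readback at 15 fps, and roughly the same amount again if the data is uploaded back to the GPU for the CUDA kernels. Adjust the table to the real render-target formats before drawing conclusions.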
I don't know whether this has been solved in CUDA 3, so maybe you should check. CUDA 2 definitely doesn't perform direct mapping between GL textures and CUDA textures/arrays.

regards,
art

Thorsten Roth wrote:
> Hi,
>
> as I explained in another mail to this list, I am currently working
> on a graph-based image processing framework using CUDA. Basically, this
> is independent from OSG, but I am using OSG for my example application :-)
>
> For my first implemented postprocessing algorithm I need color and depth
> data. As I want the depth to be linearized between 0 and 1, I used a
> shader for that and also render it in a separate pass from the color.
> This stuff is then fetched from the GPU to the CPU by directly attaching
> osg::Images to the cameras. This works perfectly, but is quite
> slow, as you might already have suspected, because the data is also
> processed in CUDA kernels later, which is quite a back and forth ;-)
>
> In fact, my application with three filter kernels based on CUDA (one
> Gaussian blur with radius 21, one image subtract and one image "pseudo-add"
> (about as elaborate as a simple add ;-)) yields about 15 fps at a
> resolution of 1024 x 1024 (images for normal and absolute position
> information are also rendered and transferred from GPU to CPU here).
>
> So with these 15 fps, I thought it should perform FAR better when
> avoiding that GPU <-> CPU copying. That's when I came across the
> osgPPU-cuda example. As far as I am aware, this uses direct mapping of
> PixelBufferObjects to CUDA memory space. This should be fast! At least
> that's what I thought, but running it at a resolution of 1024 x 1024
> with a StatsHandler attached shows that it runs at just ~21 fps, and it
> doesn't get much better even when the CUDA kernel execution is completely
> disabled.
>
> Now my question is: Is that a general (known) problem which cannot be
> avoided? Does it have anything to do with the memory mapping functions?
> How can it be optimized? I know that, while osgPPU uses older CUDA
> memory mapping functions, there are new ones as of CUDA 3. Is there a
> difference in performance?
>
> Any information on this is appreciated, because it will really help me
> decide whether I should integrate buffer mapping or just keep the
> copying going :-)
>
> Best Regards
> -Thorsten
> _______________________________________________
> osg-users mailing list
>
> http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org