On Tue, Jan 26, 2010 at 3:09 AM, Florian Bösch <[email protected]> wrote:
> On Jan 26, 2:34 am, Tristam MacDonald <[email protected]> wrote:
> > Computing directly on the GPU removes all of those bottlenecks, as the CPU
> > and bus are used solely to calculate and transfer control data to the GPU.
> > Further, the GPU is many, many times faster than the CPU at updating
> > vertex/pixel data - we are looking at 1,500+ shader cores on a
> > current-generation GPU, and unlike multiple cores in a CPU, it isn't
> > unreasonable to run them all flat out.
>
> That is theoretically true, there's just two major flaws in that theory.
> 1) The reality: Geometry shaders are hog slow

Geometry shaders aren't as fast as they might be, but histopyramids perform
the exact same task, and are much faster.

> true instancing is not any faster than pseudo instancing

Instancing was devised to work around the high cost of DX 9 draw calls. It was
never billed as a performance win in the OpenGL world, although it does let
you free up some CPU time for other tasks.

> and why does the following (wave simulation) bring even high end GPUs to
> its knees for maps exceeding 256x256 cells even though it runs entirely on
> the GPU?
> http://hg.codeflow.org/gletools/file/e351ce1564b0/examples/waves.py

Amusingly enough, the poor performance of that example has nothing to do with
the question at hand. Instead, when you increase the map size to 512x512 or
higher, you become entirely vertex bound, copying roughly 250,000 vertices
into a vertex buffer and rendering them as a terrain. Your actual
texture-based computations aren't slow at all - disable the
copy-to-vertex-buffer, and it runs fast up to huge resolutions.

I would recommend that you eliminate that expensive copy, either by switching
the terrain visualisation to relief mapping (thus rendering entirely in the
pixel shader), by using vertex texture fetch to read the heightmap texture
directly in the vertex shader, or by using a texture buffer object plus a
framebuffer blit to perform the copy a lot faster.
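
For what it's worth, the vertex texture fetch route is a very small change.
Here is a minimal sketch - the sampler and uniform names are made up for
illustration (they are not the ones in waves.py), and it assumes the
simulation writes height into a single-channel texture and that the terrain
is a static grid whose x/z coordinates lie in [0, 1]:

    // Hypothetical vertex shader (GLSL 1.20-era): displace a static grid by
    // sampling the simulation's height texture directly, instead of copying
    // it into a vertex buffer each frame.
    uniform sampler2D heightmap;    // height field written by the wave pass
    uniform float height_scale;     // vertical exaggeration

    void main()
    {
        // The grid's x/z position doubles as the texture coordinate.
        vec2 uv = gl_Vertex.xz;

        // Vertex texture fetch: explicit-LOD sampling is required in the
        // vertex stage, hence texture2DLod rather than texture2D.
        float h = texture2DLod(heightmap, uv, 0.0).r;

        vec4 displaced = vec4(gl_Vertex.x, h * height_scale, gl_Vertex.z, 1.0);
        gl_Position = gl_ModelViewProjectionMatrix * displaced;
    }

Vertex texture fetch does need SM 3.0-class hardware, but anything capable of
running your simulation in the first place should qualify.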

> 2) The programming model required to achieve it is extremely complex
> and very inflexible, because it's not truly "programmable", just
> hopping around on buffers in a semi-fixed pipeline. It's why I have
> such high hopes for OpenCL applications, where you'd not run any of
> your program on the CPU anymore (except to supply user input)

There has been a lot of hype around OpenCL/CUDA, but GPGPU programming has
moved a long way since OpenCL was first announced, and it isn't the
revolutionary idea it used to be. Among other things, it doesn't address any
of your complaints: you are still constrained to small kernels, you still
have to "hop around on buffers", and all the control logic still lives on the
CPU side.

The advantage of OpenCL/CUDA is that you (as the programmer) get full control
over scheduling and synchronisation, rather than letting the OpenGL driver
take care of it behind the scenes. Whether this is a benefit in the general
case is as yet unclear to me.

--
Tristam MacDonald
http://swiftcoder.wordpress.com/