---------- Forwarded message ----------
From: Rodolphe Ortalo <[EMAIL PROTECTED]>
Date: Tue, 1 Feb 2005 21:52:48 +0100
Subject: Re: [Open-graphics] open graphics
To: Timothy Miller <[EMAIL PROTECTED]>

On Friday 28 January 2005 14:43, Timothy Miller wrote:
> There are actually lots of ways to do DMA ring buffering. One thing
> you don't want to do is fill the buffer with NOPs. You want to
> prevent the GPU from fetching anything it doesn't have to fetch.
> Don't waste bus bandwidth.
>
> How about people look at how existing GPUs do DMA buffering and
> discuss them. We can pick something that works well for us.

I am not sure that looking at complex DMA operations on other GPUs is
a good idea. However, IIRC, the ATI Radeon has ring buffers like those
described earlier, while Matrox >G400 chips have 3 types of DMA
buffers: simple "dense" command buffers (4x8 bits giving 4 register
indices, then 4 int32 values), vertex buffers (several subtypes), and
"indirect" vertex buffers (lists of addresses of vertices).

I tried to use all of these in practice in KGI (with some success).
What I remember of this work is:

- with respect to command buffers, the biggest performance problem
  nowadays is moving data fast from userspace to kernel space (without
  revolutionizing the kernel interface - too much - and without
  compromising the performance opportunities offered by AGP or memory
  remapping); moving from kernel space to the chipset engine (i.e. DMA)
  is _not_ really a performance problem. In fact all modern software
  APIs allow asynchronous rendering (so the userspace software can
  start to work on the next command buffer while the previous one is
  executing). This easily amortizes the cost of buffering (at least
  from what I saw). Applications that cannot amortize _and_ are
  performance critical are probably rare (counterexamples welcome of
  course).

- this is entirely another matter for "data" buffers, e.g.
  transferring lists of vertices, or maybe even pictures (though the
  latter case is probably most flagrant for TV cards or video capture).
  There, the most annoying issue is the fact that the software
  manipulates userspace addresses while the kernel manipulates (sort
  of) physical addresses while the chipset manipulates DMA addresses
  (possibly through AGP magic). If all these chipsets could be set up
  to have a coherent view of memory and of memory protections (e.g.
  software [not] accessing memory while the graphic card is using it -
  remember I'm speaking of "data" buffers here, not commands) that
  would greatly simplify the whole thing. And simple things can go
  fast. Amortizing via pipelining is not always easy here (though
  possible I guess). For example, with indirect buffers using physical
  addresses, it is nearly impossible to guarantee system integrity
  (wrt memory protection) and simultaneously offer maximal performance
  to a (userspace) software application.

Personally, I would suggest offering a single DMA method, and a very
simple one (start and length or end, nothing more). However, I would
double check _every_ hardware assumption that could break those of
(virtual) memory management in kernels, AGP, DRI or software libraries
(Mesa?). For example, crossing a page boundary is not really
affordable if you want to use normal kernel virtual memory. (Ok, AGP
may be used to correct this, but AGP does not exist everywhere, and
won't last forever.)

So, in fact, the DMA engine could simply take [page, offset, length]
(with offset+length < page_length - and _check_ that it does not cross
page boundaries). [1]

This is not as simple as it sounds, as the chipset engine should then
be set up with some page size knowledge (this would be nice from a
kernel driver perspective: big pages could be available on some
architectures and not on others, and depending on the kernel
decisions). But it is simple enough to implement right and without
hesitation.

Personally, as I said earlier on this list, I would even favour
implementing some address translation and memory protection logic
in the graphics chipset itself...
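
The [page, offset, length] check suggested above can be sketched in C.
This is only an illustration under assumptions: the struct dma_desc
layout, the function names and the page_shift parameter are
hypothetical and do not describe any existing chipset or driver. The
sketch accepts offset+length == page_length (so it uses <= rather than
the strict < written above), on the assumption that a transfer ending
exactly on the last byte of the page does not cross the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor; names and field widths are illustrative only. */
struct dma_desc {
    uint64_t page;    /* page frame number of the buffer */
    uint32_t offset;  /* byte offset of the first byte inside the page */
    uint32_t length;  /* number of bytes to transfer */
};

/*
 * Returns true when the transfer stays inside a single page.
 * page_shift is log2 of the page size (12 for 4 KiB pages); it stands
 * for the "page size knowledge" the engine would be set up with.
 */
static bool dma_desc_fits_page(const struct dma_desc *d, unsigned page_shift)
{
    assert(page_shift < 32);  /* offset is 32 bits wide in this sketch */
    const uint64_t page_size = UINT64_C(1) << page_shift;

    if (d->length == 0)
        return false;  /* assumption: empty transfers are rejected */

    /* Sum in 64 bits so that offset + length cannot wrap around. */
    return (uint64_t)d->offset + d->length <= page_size;
}

/*
 * Address of the first byte, assuming page numbers map linearly to
 * bus addresses (an assumption; a real system may remap them).
 */
static uint64_t dma_desc_start(const struct dma_desc *d, unsigned page_shift)
{
    return (d->page << page_shift) + d->offset;
}
```

With 4 KiB pages (page_shift = 12), a descriptor with offset 4000 and
length 96 is accepted, while the same descriptor with length 97 is
rejected because its last byte would land in the next page.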
:-)

Rodolphe

[1] A variation of this one would be to store the offset and length at
the start of the page. The advantage is that userspace data could be
sent to the graphic chipset without even looking at it (if the chipset
guarantees some minimal safety measures such as not crossing page
boundaries... :-)

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
