Yes of course... (Typo, sorry...)

----------  Forwarded Message  ----------
Subject: Re: [Open-graphics] open graphics
Date: Tuesday 01 February 2005 22:00
From: Timothy Miller <[EMAIL PROTECTED]>
To: Rodolphe Ortalo <[EMAIL PROTECTED]>

This is very informative.  Mind if we share it with the list?  I do
have a few ideas about it that I'd like to share with everyone, but
first, they should see what you have to say.

On Tue, 1 Feb 2005 21:52:48 +0100, Rodolphe Ortalo <[EMAIL PROTECTED]> wrote:
> On Friday 28 January 2005 14:43, Timothy Miller wrote:
> > There are actually lots of ways to do DMA ring buffering.  One thing
> > you don't want to do is fill the buffer with NOPs.  You want to
> > prevent the GPU from fetching anything it doesn't have to fetch.
> > Don't waste bus bandwidth.
> >
> > How about people look at how existing GPUs do DMA buffering and
> > discuss them.  We can pick something that works well for us.
>
> I am not sure that looking at complex DMA operations on other GPUs is a
> good idea. However, IIRC, ATI Radeon has ring buffers like those described
> earlier, while Matrox >G400 have 3 types of DMA buffers: simple "dense"
> command buffers (4x8 bits to indicate 4 register indices, then 4 int32
> values), vertex buffers (several subtypes), and "indirect" vertex buffers
> (lists of addresses of vertices).
> I tried to use all of these in practice in KGI (with some success).
>
> What I remember of this work is:
> - with respect to command buffers, the biggest performance problem nowadays
> is moving data fast from userspace to kernel space (without revolutionizing
> the kernel interface - too much - and without compromising the performance
> opportunities offered by AGP or memory remapping); moving from kernel space
> to the chipset engine (i.e. DMA) is _not_ really a performance problem. In
> fact all modern software APIs allow asynchronous rendering (so the
> userspace software can start to work on the next command buffer while the
> original one is executing). This easily amortizes the cost of buffering (at
> least from what I saw).
> Applications that cannot amortize _and_ are
> performance critical are probably rare (counterexamples welcome, of
> course).
> - this is entirely another matter for "data" buffers, e.g. transferring
> lists of vertices, or maybe even pictures (though the latter case is
> probably most flagrant for TV cards or video capture). There, the most
> annoying issue is the fact that the software manipulates userspace addresses
> while the kernel manipulates (sort of) physical addresses while the chipset
> manipulates DMA addresses (possibly through AGP magic). If all these
> chipsets could be set up to have a coherent view of memory and of memory
> protections (e.g. software [not] accessing memory while the graphics card is
> using it - remember I'm speaking of "data" buffers here, not commands) that
> would greatly simplify the whole thing. And simple things can go fast.
> Amortizing via pipelining is not always easy here (though possible I guess).
> For example, with indirect buffers using physical addresses, it is nearly
> impossible to guarantee system integrity (wrt memory protection) and
> simultaneously offer maximal performance to a (userspace) software
> application.
>
> Personally, I would suggest offering a single DMA method, and a very
> simple one (start and length or end, nothing more). However, I would
> double check _every_ hardware assumption that could break those of
> (virtual) memory management in kernels, AGP, DRI or software libraries
> (Mesa?). For example, crossing a page boundary is not very affordable if
> you want to use normal kernel virtual memory. (OK, AGP may be used to
> correct this, but AGP does not exist everywhere, and won't last forever.)
> So, in fact, the DMA engine could simply take [page, offset, length] (with
> offset+length < page_length - and _check_ that it does not cross page
> boundaries).
> [1] This is not as simple as it sounds, as the chipset engine
> should then be set up with some page size knowledge (this would be nice from
> a kernel driver perspective: big pages could be available on some
> architectures and not on others, depending on the kernel's decisions).
> But it is simple enough to implement right and without hesitation.
> Personally, as I said earlier on this list, I would even favour
> implementing some address translation and memory protection logic in the
> graphics chipset itself... :-)
>
> Rodolphe
>
> [1] A variation of this one would be to store the offset and length at the
> start of the page. The advantage is that userspace data could be sent to
> the graphics chipset without even looking at it (if the chipset guarantees
> some minimal safety measures such as not crossing page boundaries... :-)

-------------------------------------------------------
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
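
For reference, the [page, offset, length] scheme suggested in the quoted
message can be sketched in C. This is only an illustration under stated
assumptions: the struct dma_desc layout, the field widths, and the function
names are hypothetical and come neither from the thread nor from any
existing driver. The page size is passed as a parameter because, as the
message points out, it may differ between architectures and kernels. The
message writes the condition as offset+length < page_length; the sketch also
accepts offset+length == page_length, since a transfer that ends exactly at
the end of a page does not cross into the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor; names and field widths are illustrative only. */
struct dma_desc {
    uint64_t page;    /* page frame number of the buffer */
    uint32_t offset;  /* byte offset of the transfer inside that page */
    uint32_t length;  /* number of bytes to transfer */
};

/*
 * Return true if the transfer stays entirely inside one page.
 * page_size is supplied by the caller (assumption: the driver knows it).
 */
static bool dma_desc_valid(const struct dma_desc *d, uint32_t page_size)
{
    if (page_size == 0 || d->length == 0)
        return false;
    if (d->offset >= page_size)
        return false;
    /* Compare against the remaining room so offset + length cannot overflow. */
    return d->length <= page_size - d->offset;
}

/* Offset of the last byte touched inside the page; descriptor must be valid. */
static uint32_t dma_desc_last_byte(const struct dma_desc *d, uint32_t page_size)
{
    assert(dma_desc_valid(d, page_size));
    return d->offset + d->length - 1;
}
```

With a 4096-byte page, a descriptor with offset 4000 and length 96 passes
the check (its last byte is at offset 4095), while offset 4000 and length 97
is rejected because it would spill into the next page.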
