---------- Forwarded message ----------
From: Rodolphe Ortalo <[EMAIL PROTECTED]>
Date: Tue, 1 Feb 2005 21:52:48 +0100
Subject: Re: [Open-graphics] open graphics
To: Timothy Miller <[EMAIL PROTECTED]>


On Friday 28 January 2005 14:43, Timothy Miller wrote:
> There are actually lots of ways to do DMA ring buffering.  One thing
> you don't want to do is fill the buffer with NOPs.  You want to
> prevent the GPU from fetching anything it doesn't have to fetch.
> Don't waste bus bandwidth.
>
> How about people look at how existing GPUs do DMA buffering and
> discuss them.  We can pick something that works well for us.
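
One common way to avoid NOP padding, sketched below, is a ring with two pointers. The GPU owns the read pointer (head) and the driver owns the write pointer (tail), and the GPU fetches only the words between them, so the unused part of the ring is never fetched. The names, the power-of-two size and the "write pointer register" are hypothetical and do not come from any particular GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size in 32-bit words; must be a power of two. */
#define RING_WORDS 1024u
static_assert((RING_WORDS & (RING_WORDS - 1)) == 0, "RING_WORDS must be a power of two");

struct cmd_ring {
    uint32_t buf[RING_WORDS];
    uint32_t head; /* next word the GPU will fetch (advanced by the GPU) */
    uint32_t tail; /* next word the driver will write (advanced by the driver) */
};

/* Words queued but not yet fetched by the GPU. */
static uint32_t ring_pending(const struct cmd_ring *r)
{
    return (r->tail - r->head) & (RING_WORDS - 1);
}

/* Free space; one slot stays empty so head == tail always means "empty". */
static uint32_t ring_space(const struct cmd_ring *r)
{
    return RING_WORDS - 1 - ring_pending(r);
}

/* Copy n command words into the ring, wrapping at the end.
 * Returns false (and writes nothing) if there is not enough space. */
static bool ring_emit(struct cmd_ring *r, const uint32_t *cmds, uint32_t n)
{
    if (n > ring_space(r))
        return false;
    for (uint32_t i = 0; i < n; i++) {
        r->buf[r->tail] = cmds[i];
        r->tail = (r->tail + 1) & (RING_WORDS - 1);
    }
    /* A real driver would now write r->tail to a (hypothetical) GPU
     * write-pointer register; the GPU then fetches only head..tail. */
    return true;
}

/* Simulates the GPU consuming up to n words (for testing only). */
static uint32_t ring_consume(struct cmd_ring *r, uint32_t n)
{
    uint32_t avail = ring_pending(r);
    if (n > avail)
        n = avail;
    r->head = (r->head + n) & (RING_WORDS - 1);
    return n;
}
```

Because the GPU stops at tail, only commands that were actually queued cross the bus.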

I am not sure that looking at complex DMA operations on other GPUs is a good
idea. However, IIRC, the ATI Radeon has ring buffers like those described
earlier, while Matrox G400 and later chips have 3 types of DMA buffers: simple
"dense" command buffers (four 8-bit register indices, followed by four 32-bit
values), vertex buffers (several subtypes), and "indirect" vertex buffers
(lists of addresses of vertices).
I tried to use all of these in practice in KGI (with some success).
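
To make the "dense" format concrete, here is a sketch of packing one packet, assuming the layout described above: one 32-bit header word holding four 8-bit register indices, followed by four 32-bit values. The byte order of the indices inside the header word is an assumption for illustration, not taken from Matrox documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: idx[0] in the low byte of the header, idx[3] in the high byte. */
static uint32_t dense_header(const uint8_t idx[4])
{
    return (uint32_t)idx[0]
         | (uint32_t)idx[1] << 8
         | (uint32_t)idx[2] << 16
         | (uint32_t)idx[3] << 24;
}

/* Writes one 5-word packet (header + 4 values) into out.
 * Returns the number of 32-bit words written. */
static size_t dense_pack(uint32_t out[5], const uint8_t idx[4],
                         const uint32_t val[4])
{
    assert(out != NULL && idx != NULL && val != NULL);
    out[0] = dense_header(idx);
    for (size_t i = 0; i < 4; i++)
        out[1 + i] = val[i];
    return 5;
}
```

Four register writes cost 20 bytes this way, instead of the 32 bytes a format with one (32-bit register, 32-bit value) pair per write would need.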

What I remember of this work is:
- with respect to command buffers, the biggest performance problem nowadays is
moving data quickly from userspace to kernel space (without revolutionizing the
kernel interface too much, and without compromising the performance
opportunities offered by AGP or memory remapping); moving from kernel space
to the chipset engine (i.e. DMA) is _not_ really a performance problem. In fact,
all modern software APIs allow asynchronous rendering (so the userspace
software can start working on the next command buffer while the previous one
is executing). This easily amortizes the cost of buffering (at least from
what I saw). Applications that cannot amortize it _and_ are performance
critical are probably rare (counterexamples welcome, of course).
- this is an entirely different matter for "data" buffers, e.g. transferring
lists of vertices, or maybe even pictures (though the latter case is probably
most acute for TV cards or video capture). There, the most annoying issue is
that the software manipulates userspace addresses, the kernel manipulates
(sort of) physical addresses, and the chipset manipulates DMA addresses
(possibly through AGP magic). If all of these could be set up to share a
coherent view of memory and of memory protections (e.g. software [not]
accessing memory while the graphics card is using it; remember I'm speaking of
"data" buffers here, not commands), that would greatly simplify the whole
thing. And simple things can go fast. Amortizing via pipelining is not always
easy here (though possible, I guess).
For example, with indirect buffers using physical addresses, it is nearly
impossible to guarantee system integrity (w.r.t. memory protection) while
simultaneously offering maximal performance to a (userspace) application.

Personally, I would suggest offering a single DMA method, and a very simple one
(start and length or end, nothing more). However, I would double-check
_every_ hardware assumption that could break those of (virtual) memory
management in kernels, AGP, DRI or software libraries (Mesa?). For example,
a transfer that crosses a page boundary is not very affordable if you want to
use normal kernel virtual memory, since pages that are contiguous in virtual
memory need not be contiguous in physical memory. (OK, AGP may be used to
correct this, but AGP does not exist everywhere, and won't last forever.)
So, in fact, the DMA engine could simply take [page, offset, length] (with
offset+length <= page_length, and _check_ that it does not cross page
boundaries). [1] This is not as simple as it sounds, as the chipset engine
should then be set up with some knowledge of the page size (this would be nice
from a kernel driver perspective: big pages could be available on some
architectures and not on others, depending on the kernel's decisions). But it
is simple enough to implement right and without hesitation.
Personally, as I said earlier on this list, I would even favour implementing
some address translation and memory protection logic in the graphics chipset
itself... :-)
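
The [page, offset, length] descriptor and its boundary check fit in a few lines. This sketch assumes a hypothetical descriptor struct and a page size programmed into the engine by the kernel driver as a power-of-two shift (the "page size knowledge" mentioned above). A descriptor is accepted only if offset + length <= page size, i.e. the transfer ends at or before the last byte of its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical DMA descriptor: one transfer confined to one page. */
struct dma_desc {
    uint64_t page;   /* page frame number */
    uint32_t offset; /* byte offset within the page */
    uint32_t length; /* bytes to transfer */
};

/* Page size chosen by the kernel driver, e.g. 12 (4 KiB) or 21 (2 MiB). */
struct dma_engine {
    unsigned page_shift;
};

static bool dma_desc_valid(const struct dma_engine *e, const struct dma_desc *d)
{
    assert(e->page_shift < 32);
    uint64_t page_size = (uint64_t)1 << e->page_shift;
    /* 64-bit sum, so offset + length cannot wrap around. */
    return d->length > 0 &&
           (uint64_t)d->offset + d->length <= page_size;
}

/* Bus address of the first byte; meaningful only after dma_desc_valid(). */
static uint64_t dma_desc_addr(const struct dma_engine *e, const struct dma_desc *d)
{
    return (d->page << e->page_shift) + d->offset;
}
```

The same descriptor that is rejected with 4 KiB pages can be valid with 2 MiB pages, which is why the page size has to be a setting rather than a constant.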

Rodolphe

[1] A variation of this one would be to store the offset and length at the
start of the page. The advantage is that userspace data could be sent to the
graphics chipset without the kernel even looking at it (if the chipset
guarantees some minimal safety measures such as not crossing page boundaries... :-)
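
The footnote's variant, sketched under the same caveats: a hypothetical 8-byte header (32-bit offset and length, in host byte order) sits at the start of the page itself, and the engine rejects any header whose payload would overlap the header or run past the end of the page. That check is what would let the kernel hand over a user page without inspecting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-page header; the payload follows it in the same page. */
struct page_hdr {
    uint32_t offset; /* payload start, counted from the start of the page */
    uint32_t length; /* payload size in bytes */
};

/* What the engine would do on fetch: trust the header only if the payload
 * lies within [sizeof(header), page_size). */
static bool page_payload(const uint8_t *page, uint32_t page_size,
                         const uint8_t **payload, uint32_t *len)
{
    struct page_hdr h;
    assert(page_size >= sizeof h);
    memcpy(&h, page, sizeof h); /* avoids unaligned-access and aliasing issues */
    if (h.offset < sizeof h || h.length == 0 ||
        (uint64_t)h.offset + h.length > page_size)
        return false;
    *payload = page + h.offset;
    *len = h.length;
    return true;
}
```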
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
