Yes of course... (Typo, sorry...)

----------  Forwarded Message  ----------

Subject: Re: [Open-graphics] open graphics
Date: Tuesday 01 February 2005 22:00
From: Timothy Miller <[EMAIL PROTECTED]>
To: Rodolphe Ortalo <[EMAIL PROTECTED]>

This is very informative.  Mind if we share it with the list?  I do
have a few ideas about it that I'd like to share with everyone, but
first, they should see what you have to say.


On Tue, 1 Feb 2005 21:52:48 +0100, Rodolphe Ortalo <[EMAIL PROTECTED]> wrote:
> On Friday 28 January 2005 14:43, Timothy Miller wrote:
> > There are actually lots of ways to do DMA ring buffering.  One thing
> > you don't want to do is fill the buffer with NOPs.  You want to
> > prevent the GPU from fetching anything it doesn't have to fetch.
> > Don't waste bus bandwidth.
> >
> > How about people look at how existing GPUs do DMA buffering and
> > discuss them.  We can pick something that works well for us.
>
> I am not sure that looking at complex DMA operations on other GPUs is a
> good idea. However, IIRC, ATI Radeon has ring buffers like those described
> earlier, while Matrox G400 and later have 3 types of DMA buffers: simple
> "dense" command buffers (4x8 bits giving 4 register indices, then 4 int32
> values), vertex buffers (several subtypes), and "indirect" vertex buffers
> (lists of addresses of vertices).
> I tried to use all of these in practice in KGI (with some success).
>
> What I remember of this work is:
> - with respect to command buffers, the biggest performance problem nowadays
> is moving data fast from userspace to kernel space (without revolutionizing
> the kernel interface - too much - and without compromising the performance
> opportunities offered by AGP or memory remapping); moving from kernel space
> to the chipset engine (i.e. DMA) is _not_ really a performance problem. In
> fact all modern software APIs allow asynchronous rendering (so the
> userspace software can start to work on the next command buffer while the
> previous one is executing). This easily amortizes the cost of buffering (at
> least from what I saw). Applications that cannot amortize _and_ are
> performance critical are probably rare (counterexamples welcome of
> course).
> - this is entirely another matter for "data" buffers, e.g. transferring
> lists of vertices, or maybe even pictures (though the latter case is
> probably most flagrant for TV cards or video capture). There, the most
> annoying issue is that the software manipulates userspace addresses
> while the kernel manipulates (sort of) physical addresses while the chipset
> manipulates DMA addresses (possibly through AGP magic). If all these
> chipsets could be set up to have a coherent view of memory and of memory
> protections (e.g. software [not] accessing memory while the graphics card
> is using it - remember I'm speaking of "data" buffers here, not commands),
> that would greatly simplify the whole thing. And simple things can go fast.
> Amortizing via pipelining is not always easy here (though possible I guess).
> For example, with indirect buffers using physical addresses, it is nearly
> impossible to guarantee system integrity (wrt memory protection) and
> simultaneously offer maximal performance to a (userspace) software
> application.
>
> Personally, I would suggest offering a single DMA method, probably
> very simple (start and length or end, nothing more). However, I would
> double check _every_ hardware assumption that could break those of
> (virtual) memory management in kernels, AGP, DRI or software libraries
> (Mesa?). For example, crossing a page boundary is not very affordable if
> you want to use normal kernel virtual memory. (Ok, AGP may be used to
> correct this, but AGP does not exist everywhere, and won't last forever.)
> So, in fact, the DMA engine could simply take [page, offset, length] (with
> offset+length < page_length - and _check_ that it does not cross page
> boundaries). [1] This is not as simple as it sounds, as the chipset engine
> should then be set up with some page size knowledge (this would be nice from
> a kernel driver perspective: big pages could be available on some
> architectures and not on others, and depending on the kernel decisions).
> But it is simple enough to implement right and without hesitation.
> Personally, as I said earlier on this list, I would even favour
> implementing some address translation and memory protection logic in the
> graphics chipset itself... :-)
>
> Rodolphe
>
> [1] A variation of this one would be to store the offset and length at the
> start of the page. The advantage is that userspace data could be sent to
> the graphics chipset without even looking at it (if the chipset guarantees
> some minimal safety measures such as not crossing page boundaries... :-)
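
The [page, offset, length] check suggested in the quoted message could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything specified on the list: the struct layout, the field widths and the names dma_desc, dma_desc_valid and dma_desc_last are all hypothetical, and the page size is passed in as a parameter because the message notes that it may differ between architectures. The message writes the bound as a strict "<"; the sketch also accepts a transfer that ends exactly at the end of the page, which is an assumption made here because such a transfer does not cross a boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: one DMA transfer confined to a single page.
 * Field names and widths are assumptions made for illustration only. */
struct dma_desc {
    uint64_t page;    /* page frame number */
    uint32_t offset;  /* byte offset inside the page */
    uint32_t length;  /* number of bytes to transfer */
};

/* Returns true if the transfer is non-empty and stays inside one page of
 * page_size bytes. The comparison is written as a subtraction so that
 * offset + length cannot wrap around in 32-bit arithmetic. */
static bool dma_desc_valid(const struct dma_desc *d, uint32_t page_size)
{
    if (d->length == 0 || d->offset >= page_size)
        return false;
    return d->length <= page_size - d->offset;
}

/* Offset of the last byte touched; only meaningful for a valid descriptor. */
static uint32_t dma_desc_last(const struct dma_desc *d, uint32_t page_size)
{
    assert(dma_desc_valid(d, page_size));
    return d->offset + d->length - 1;
}
```

With a 4096-byte page, a descriptor with offset 4000 and length 96 is accepted, while offset 4000 and length 97 is rejected because it would spill into the next page. The same test would apply to the footnote's variation, with offset and length read from the start of the page rather than from a separate descriptor.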

-------------------------------------------------------
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
