On Tuesday 15 March 2005 17:48, Timothy Miller wrote:
> It doesn't need it, really.  The solution is to make the direct DMA
> buffer only one or two pages.  Everything else is done through
> indirect by putting small commands into direct.  If an indirect
> buffer is noncontiguous, the kernel can break up the DMA into
> multiple smaller pieces.  Problem solved.

That is exactly my feeling, however I recently had to withstand a tirade 
from another kernel hacker about how the hardware will suck completely 
without scatter-gather DMA.  What I think is, our scheme here is 
considerably more elegant than scatter-gather, and certainly eats less 
hardware.

Now, the criticism here is: suppose texture load rates really go through 
the roof, is the ring buffer big enough to handle the huge number of 
individual 4K DMA commands that will be needed?  (Because user memory 
is typically completely physically fragmented.)  As far as I can see, 
that is the only criticism.
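
To make the "individual 4K DMA commands" concrete, here is a rough sketch of how the kernel side might split a physically fragmented buffer into per-page pieces.  The struct, the function name and the idea of passing in an array of per-page physical addresses are all assumptions made for illustration; nothing here is a settled interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_PAGE_SIZE 4096u

/* Hypothetical descriptor: one physically contiguous piece of a transfer. */
struct dma_piece {
    uint64_t phys;   /* physical start address */
    uint32_t len;    /* bytes, never crossing a page boundary */
};

/*
 * Split a buffer into per-page pieces.  page_phys[i] is the physical
 * base of the buffer's i-th page (assumed already locked down);
 * first_off is the buffer's starting offset within its first page.
 * Returns the number of pieces written, or 0 if out[] is too small
 * (a zero-length buffer also yields 0 pieces).
 */
size_t dma_split(const uint64_t *page_phys, uint32_t first_off,
                 size_t len, struct dma_piece *out, size_t max_out)
{
    size_t n = 0, page = 0;
    uint32_t off = first_off;

    assert(first_off < DMA_PAGE_SIZE);
    while (len > 0) {
        uint32_t chunk = DMA_PAGE_SIZE - off;   /* room left in this page */
        if (chunk > len)
            chunk = (uint32_t)len;
        if (n == max_out)
            return 0;
        out[n].phys = page_phys[page] + off;
        out[n].len = chunk;
        n++;
        len -= chunk;
        page++;
        off = 0;                                /* later pages start at 0 */
    }
    return n;
}
```

Each piece would then become one small indirect command in the ring.  The sketch does not try to merge pages that happen to be physically adjacent, which would reduce the command count a little.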

Now let's see if it is a serious problem.  Suppose each indirect DMA 
command is 8 bytes.  Suppose we are loading textures at 128 MB/s, or 
128 KB/ms, or 32 pages/ms.  Suppose we are willing to take an interrupt 
and wake up a task that loads up the next batch of DMA commands every 
10 ms.  That is 8 bytes/command * 32 pages/ms * 10 ms = 2560 bytes of 
ring buffer space, so round it up to 4K, which allows for enough slack 
to take an interrupt and refill the buffer before it drains completely.
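
The arithmetic above can be checked with a few lines of C.  This is only a back-of-envelope sketch: the function and parameter names are made up for the example, and it assumes one DMA command per 4K page.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4K pages, as in the estimate above */

/* Bytes of ring buffer consumed between two refills.
 * Names are hypothetical; one DMA command per page is assumed. */
uint64_t ring_bytes_needed(uint64_t bytes_per_ms,
                           uint32_t cmd_bytes,
                           uint32_t refill_ms)
{
    assert(cmd_bytes > 0 && refill_ms > 0);
    uint64_t pages_per_ms = bytes_per_ms / PAGE_SIZE;
    return (uint64_t)cmd_bytes * pages_per_ms * refill_ms;
}

/* Round up to a whole number of pages to leave some refill slack. */
uint64_t round_up_to_page(uint64_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```

With 128 KB/ms, 8-byte commands and a 10 ms refill period, ring_bytes_needed() gives 2560 bytes, which rounds up to a single 4K page.  Scaling the rate by 100 gives 256000 bytes, which fits within 256K.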

That is the PCI case.  PCI-e can transfer one or two orders of magnitude 
faster.  Say we could somehow keep up with this, regardless of whether 
it is fanciful at this point.  Then we might need, say, a 256K ring 
buffer, and we could not be sure of being able to find that much 
unfragmented physical memory, except at boot time.

Now, we probably will only ever initialize the ring buffer at boot time, 
but say for the sake of argument that we want to fix this theoretical 
problem.  The way I would propose to do it is: initialize the command 
ring buffer by loading a number of physical page addresses via PIO, so 
that the command ring buffer does not have to be physically contiguous.  
In other words, the DMA hardware translates ring buffer addresses 
through a table of up to, say, 64 4K pages (256K in total), so the ring 
buffer does not have to be allocated from physically contiguous memory.
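
As a rough software model of that translation (the real thing would live in hardware), something like the following is what I have in mind.  The struct layout, the names and the wrap-around behaviour are assumptions for the sake of the sketch, and ring_table_set() merely stands in for a PIO register write.

```c
#include <assert.h>
#include <stdint.h>

#define RING_PAGE_SHIFT 12u                     /* 4K pages */
#define RING_PAGE_SIZE  (1u << RING_PAGE_SHIFT)
#define RING_MAX_PAGES  64u                     /* 64 * 4K = 256K ring */

/* Hypothetical model of the on-card table, filled once at init. */
struct ring_table {
    uint64_t page_phys[RING_MAX_PAGES];  /* physical base of each ring page */
    uint32_t npages;                     /* entries actually in use */
};

/* Record one page address; stands in for a PIO register write. */
void ring_table_set(struct ring_table *t, uint32_t index, uint64_t phys)
{
    assert(index < RING_MAX_PAGES);
    assert((phys & (RING_PAGE_SIZE - 1)) == 0);  /* must be page aligned */
    t->page_phys[index] = phys;
    if (index >= t->npages)
        t->npages = index + 1;
}

/* Translate a ring offset to a physical address, wrapping at ring end. */
uint64_t ring_translate(const struct ring_table *t, uint32_t offset)
{
    assert(t->npages > 0);
    uint32_t ring_bytes = t->npages * RING_PAGE_SIZE;
    offset %= ring_bytes;
    return t->page_phys[offset >> RING_PAGE_SHIFT]
         + (offset & (RING_PAGE_SIZE - 1));
}
```

The upper bits of the ring offset select a table entry and the low 12 bits pass through unchanged, so the cost in hardware should be one small lookup per fetch.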

Indirect DMA pages will be locked down by DRI via a software interface, 
so the hardware doesn't have to worry about that at all.

What do you think?  Seriously, I doubt it would be a real problem to 
just make the command ring buffer physically contiguous and hope for 
the best at init time.  But if it isn't too hard, maybe we should solve 
the fragmentation problem definitively, at least on paper.  We can 
always put this under the "after initial release but before ASIC" 
category.

Regards,

Daniel

> On Tue, 15 Mar 2005 16:39:33 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > Hi Timothy,
> >
> > Have you thought about whether this card needs scatter-gather DMA?
> >
> > Regards,
> >
> > Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
