On Tue, 15 Mar 2005 18:29:58 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Tuesday 15 March 2005 17:48, Timothy Miller wrote:
> > It doesn't need it, really.  The solution is to make the direct DMA
> > buffer only one or two pages.  Everything else is done through
> > indirect by putting small commands into direct.  If an indirect
> > buffer is noncontiguous, the kernel can break up the DMA into
> > multiple smaller pieces.  Problem solved.
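
As a rough illustration of "break up the DMA into multiple smaller pieces", here is a minimal C sketch. It assumes the kernel already holds the physical address of each 4K page of a locked-down indirect buffer. The names (build_dma_segs, struct dma_seg, DMA_PAGE_SIZE) are hypothetical and not part of any real driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_PAGE_SIZE 4096u  /* assumed 4K pages, as in the thread */

/* One physically contiguous piece of an indirect buffer (hypothetical). */
struct dma_seg {
    uint64_t phys;  /* physical start address */
    uint32_t len;   /* length in bytes */
};

/*
 * Walk the per-page physical addresses of a buffer and merge pages that
 * happen to be physically adjacent into a single segment.  Stops early if
 * 'out' fills up; '*consumed' reports how many pages were covered so the
 * caller can queue the rest in a later batch.  Returns the segment count.
 */
static size_t build_dma_segs(const uint64_t *page_phys, size_t npages,
                             struct dma_seg *out, size_t max_segs,
                             size_t *consumed)
{
    size_t n = 0;

    assert(page_phys != NULL && out != NULL && consumed != NULL);

    for (size_t i = 0; i < npages; i++) {
        if (n > 0 && out[n - 1].phys + out[n - 1].len == page_phys[i]) {
            /* This page directly follows the previous one: extend it. */
            out[n - 1].len += DMA_PAGE_SIZE;
        } else {
            if (n == max_segs) {
                *consumed = i;
                return n;
            }
            out[n].phys = page_phys[i];
            out[n].len = DMA_PAGE_SIZE;
            n++;
        }
    }
    *consumed = npages;
    return n;
}
```

In the worst case (fully fragmented memory) this yields one segment per page, which is the case the ring buffer sizing discussion below is concerned with.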
> 
> That is exactly my feeling, however I recently had to withstand a tirade
> from another kernel hacker about how the hardware will suck completely
> without scatter-gather DMA.  What I think is, our scheme here is
> considerably more elegant than scatter-gather, and certainly eats less
> hardware.
> 
> Now, the criticism here is: suppose texture load rates really go through
> the roof, is the ring buffer big enough to handle the huge number of
> individual 4K DMA commands that will be needed?  (Because user memory
> is typically completely physically fragmented.)  As far as I can see,
> that is the only criticism.
> 
> Now let's see if it is a serious problem.  Suppose each indirect DMA
> command is 8 bytes.  Suppose we are loading textures at 128 MB/s, or
> 128 KB/ms, or 32 pages/ms.  Suppose we are willing to take an interrupt
> and wake up a task that loads up the next batch of DMA commands every
> 10 ms.  That is 8 bytes/command * 32 pages/ms * 10 ms = 2560 bytes of
> ring buffer space, so round it up to 4K, which allows for enough slack
> to take an interrupt and refill the buffer before it drains completely.
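
The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic, using the figures assumed in the message (8-byte commands, 32 pages/ms, a 10 ms refill interval, 4K pages). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of ring buffer consumed between two refills, assuming one
 * fixed-size command per 4K page transferred.
 */
static uint32_t ring_bytes_needed(uint32_t bytes_per_cmd,
                                  uint32_t pages_per_ms,
                                  uint32_t refill_interval_ms)
{
    return bytes_per_cmd * pages_per_ms * refill_interval_ms;
}

/* Round a byte count up to a multiple of page_size (a power of two). */
static uint32_t round_up_to_page(uint32_t bytes, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (bytes + page_size - 1) & ~(page_size - 1);
}
```

With the numbers from the message, ring_bytes_needed(8, 32, 10) gives 2560, and round_up_to_page(2560, 4096) gives 4096, i.e. a single 4K page.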

It would take much longer than that to drain the ring buffer.  Each
entry in the ring buffer would point to some other DMA transaction
that would itself likely take a long time.

> 
> That is the PCI case.  PCI-e can transfer one or two orders of magnitude
> faster.  Say we could somehow keep up with this, regardless of whether
> it is fanciful at this point.  Then we might need, say, a 256K ring
> buffer, and we could not be sure of being able to find that much
> unfragmented physical memory, except at boot time.
> 
> Now, we probably will only ever initialize the ring buffer at boot time,
> but say for the sake of argument that we want to fix this theoretical
> problem.  The way I would propose to do it is: initialize the command
> ring buffer by loading a number of physical page addresses via PIO, so
> that the command ring buffer does not have to be physically contiguous.
> In other words, the DMA hardware translates ring buffer addresses
> through a table of up to, say, 64 4K pages, so the ring buffer does not
> have to be allocated from physically contiguous memory.
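
A software model of the proposed translation might look like the following C sketch. It assumes a table of up to 64 physical page addresses loaded once at init time and a ring offset that wraps at the ring size. The structure and function names are hypothetical, and real hardware would do this lookup in logic rather than code.

```c
#include <assert.h>
#include <stdint.h>

#define RING_PAGE_SHIFT 12u                      /* 4K pages */
#define RING_PAGE_SIZE  (1u << RING_PAGE_SHIFT)
#define RING_MAX_PAGES  64u                      /* 64 * 4K = 256K ring */

/* Model of the on-chip table, filled in via PIO at init (hypothetical). */
struct ring_page_table {
    uint64_t phys[RING_MAX_PAGES];  /* physical base address of each page */
    uint32_t npages;                /* valid entries, 1..RING_MAX_PAGES */
};

/*
 * Translate a logical ring offset into a physical address.  The upper
 * bits of the offset select the table entry, the low 12 bits are the
 * offset within that page.  Offsets wrap around at the ring size.
 */
static uint64_t ring_translate(const struct ring_page_table *t,
                               uint32_t offset)
{
    uint32_t ring_size;
    uint32_t off;

    assert(t->npages >= 1 && t->npages <= RING_MAX_PAGES);

    ring_size = t->npages * RING_PAGE_SIZE;
    off = offset % ring_size;
    return t->phys[off >> RING_PAGE_SHIFT] + (off & (RING_PAGE_SIZE - 1));
}
```

For a two-page ring with pages at 0x10000 and 0x7000, offset 0x1004 falls in the second page and translates to 0x7004, while offset 0x2000 wraps back to 0x10000.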
> 
> Indirect DMA pages will be locked down by DRI via a software interface,
> so the hardware doesn't have to worry about that at all.
> 
> What do you think?  Seriously, I doubt it would be a real problem to
> just make the command ring buffer physically contiguous and hope for
> the best at init time.  But if it isn't too hard, maybe we should solve
> the fragmentation problem definitively, at least on paper.  We can
> always put this under the "after initial release but before ASIC"
> category.

I'll consider the idea of being able to take a LIST of address ranges
for the ring buffer.  Say, maybe 8 entries or something.
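
For comparison with a fixed per-page table, a list of address ranges could be modelled roughly as below. This is a C sketch under the assumption of at most 8 variable-length ranges that together form one logical ring. All names are invented for illustration and say nothing about how the actual hardware would be built.

```c
#include <assert.h>
#include <stdint.h>

#define RING_MAX_RANGES 8u  /* "maybe 8 entries or something" */

/* One physically contiguous piece of the ring (hypothetical layout). */
struct ring_range {
    uint64_t phys;  /* physical start address */
    uint32_t len;   /* length in bytes */
};

struct ring_range_list {
    struct ring_range r[RING_MAX_RANGES];
    uint32_t count;  /* valid entries, 1..RING_MAX_RANGES */
};

/* Total ring size: the sum of all range lengths. */
static uint32_t ring_total_len(const struct ring_range_list *l)
{
    uint32_t total = 0;

    assert(l->count >= 1 && l->count <= RING_MAX_RANGES);
    for (uint32_t i = 0; i < l->count; i++)
        total += l->r[i].len;
    return total;
}

/*
 * Translate a logical ring offset into a physical address by walking
 * the ranges in order.  Offsets wrap around at the total ring size.
 */
static uint64_t ring_range_translate(const struct ring_range_list *l,
                                     uint32_t offset)
{
    uint32_t total = ring_total_len(l);
    uint32_t off;

    assert(total > 0);
    off = offset % total;
    for (uint32_t i = 0; i < l->count; i++) {
        if (off < l->r[i].len)
            return l->r[i].phys + off;
        off -= l->r[i].len;
    }
    return 0;  /* not reached: off is always below total */
}
```

The trade-off against a per-page table is fewer entries to load via PIO, at the cost of a short search on each translation (or precomputed cumulative offsets) instead of a direct index.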
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
