On Tuesday 15 March 2005 17:48, Timothy Miller wrote:
> It doesn't need it, really. The solution is to make the direct DMA
> buffer only one or two pages. Everything else is done through
> indirect by putting small commands into direct. If an indirect
> buffer is noncontiguous, the kernel can break up the DMA into
> multiple smaller pieces. Problem solved.
That is exactly my feeling. However, I recently had to withstand a tirade from another kernel hacker about how the hardware will suck completely without scatter-gather DMA. What I think is that our scheme here is considerably more elegant than scatter-gather, and certainly eats less hardware.

Now, the criticism here is: suppose texture load rates really go through the roof. Is the ring buffer big enough to handle the huge number of individual 4K DMA commands that will be needed? (User memory is typically completely physically fragmented.) As far as I can see, that is the only criticism.

Now let's see if it is a serious problem. Suppose each indirect DMA command is 8 bytes, and suppose we are loading textures at 128 MB/s, or 128 KB/ms, or 32 4K pages/ms. Suppose we are willing to take an interrupt and wake up a task that loads the next batch of DMA commands every 10 ms. That is 8 bytes/command * 32 pages/ms * 10 ms = 2560 bytes of ring buffer space, so round it up to 4K, which leaves enough slack to take an interrupt and refill the buffer before it drains completely.

That is the PCI case. PCI-e can transfer one or two orders of magnitude faster. Say we could somehow keep up with this, regardless of whether it is fanciful at this point. Then we might need, say, a 256K ring buffer, and we could not be sure of finding that much unfragmented physical memory, except at boot time. Now, we probably will only ever initialize the ring buffer at boot time, but say for the sake of argument that we want to fix this theoretical problem.

The way I would propose to do it is to initialize the command ring buffer by loading a number of physical page addresses via PIO, so that the command ring buffer does not have to be physically contiguous. In other words, the DMA hardware translates ring buffer addresses through a table of up to, say, 64 4K pages (256K in total), and therefore the ring does not have to be allocated from physically contiguous memory.
Indirect DMA pages will be locked down by DRI via a software interface, so the hardware doesn't have to worry about that at all.

What do you think? Seriously, I doubt it would be a real problem to just make the command ring buffer physically contiguous and hope for the best at init time. But if it isn't too hard, maybe we should solve the fragmentation problem definitively, at least on paper. We can always put this under the "after initial release but before ASIC" category.

Regards,

Daniel

> On Tue, 15 Mar 2005 16:39:33 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > Hi Timothy,
> >
> > Have you thought about whether this card needs scatter-gather DMA?
> >
> > Regards,
> >
> > Daniel

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
