On Thu, 17 Mar 2005 23:42:46 +0100, Attila Kinali <[EMAIL PROTECTED]> wrote:

> On Thu, 17 Mar 2005 16:48:00 -0500
> Timothy Miller <[EMAIL PROTECTED]> wrote:
>
> > On Thu, 17 Mar 2005 22:19:01 +0100, Attila Kinali <[EMAIL PROTECTED]> wrote:
> >
> > > How about to keep this ring buffer on the card itself ?
> > >
> > > Then we wouldn't have to worry about it's continuity
> > > as it will be placed in unfragmented memory anyways.
> >
> > What use is a DMA buffer if it's not in host memory? :)
>
> Right, stupid me.
> Didn't i already mention that i don't know what i'm
> talking about ? :)
>
> > What you're describing is using PIOs to program indirect buffer loads,
> > which you WILL be able to do. And, in fact, it may be better to
> > prefer that.
>
> Why should it be better ?
Well, let's think for a moment about what we put into the ring buffer. Let's say most packets are 64 bits: a 40-bit address, an 8-bit command, and a 16-bit length.

If you use PIO, that's an immediate PCI burst of two words and nothing else.

If you use DMA, and you update the write pointer for every access, then that's one PCI transaction per packet, which is only one cycle shorter than the PIO. But then you have the added latency of the GPU taking the bus and doing a read, which won't help you much.

If you use DMA but only periodically update the write pointer, then there's much less host CPU involvement, but there's considerable added latency for cases when you have few small packets with long gaps between them.

We also need to consider, in all cases, what happens when you try to do a PIO while DMA is going on: you wait for something like 16+ bus cycles just to get in one transaction.

Another possibility is to have the write pointer live in host memory and have the GPU poll it periodically. That eliminates bus overhead entirely from the kernel's side but does introduce some amount of latency. The advantage is that the write pointer is never passed over the bus when it doesn't need to be (it only happens when the GPU realizes that it can't do anything else useful).

The simplest approach is to use PIO to push DMA commands into a queue, but that has the latency issue when a DMA transaction is already going on. The fastest approach is the one where the host does absolutely no PIO at all and it's the GPU's job to poll the write pointer and update the read pointer at convenient times.

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
