On Thu, 17 Mar 2005 21:40:57 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > If you use DMA but you only periodically update the write pointer,
> > then there's much less host CPU involvement, but there's considerable
> > added latency for cases when you have few small packets with long
> > times between them.
>
> But in that case we don't care... because it's just a few small packets
> with long times between them!

True.  In one implementation I did, I decided, on a per-command basis,
whether to update the write pointer (it was a PIO to do it) immediately
or whether to do it during the vertical blank interrupt.  60Hz was good
enough.

> > We also need to consider, in all cases, what happens when you try to
> > do a PIO while DMA is going on: You wait for, like, 16+ bus cycles
> > just to get in one transaction.
>
> My suggestion is to not implement PIO at all, except for the basic card
> control commands.  I do not think PIO as an alternate means of issuing
> commands adds any useful functionality.
>
> I also suggest that PIO commands, direct DMA commands and indirect DMA
> commands should not overlap.  At this point, I haven't seen any
> examples at all of where overlap makes sense.

Updating the cursor glyph in interrupt context while DMA is going on.
Note that the latency for that could be evil.

> > Another possibility is to have the write pointer hang out in host
> > memory and have the GPU poll it periodically.  That eliminates bus
> > overhead entirely from the kernel but does introduce some amount of
> > latency.  The advantage is that the write pointer is never passed
> > over the bus when it doesn't need to be (it only happens when the GPU
> > realizes that it can't do anything else useful).
>
> This isn't a problem.
> When the drm issues a command list or a texture ioctl, the kernel
> driver will:
>
> 1) Parse it into individual 4K DMA regions
> 2) Load/lock the pages
> 3) Load all the DMA commands into the ring buffer (if they fit)
> 4) Update the write pointer just once
>
> (If they don't all fit, the remainder will be loaded when the next
> buffer-low interrupt arrives.)
>
> > The simplest approach is to use PIO to push DMA commands into a
> > queue.  But that has the latency issue when a DMA transaction is
> > already going on.  The fastest approach is the one where the host
> > does absolutely no PIO at all and it's the GPU's job to poll the
> > write pointer and update the read pointer at convenient times.
>
> I doubt the GPU needs to poll.  Each of those ring buffer commands is
> going to take quite some time to execute, and they will almost always
> be submitted in batches.  When they aren't, I don't think we care.

The GPU will "poll" the write pointer under two conditions:

(1) The FIFO for the ring buffer is running low and should be loaded
    with new commands, if available.
(2) The engine is completely idle.

It's hard to completely eliminate (2).  I suppose if we also interrupt
on engine-idle, we can set a state bit that indicates that we have to
do one PIO to kick-start the DMA for the ring buffer.  Could that ever
get confused?

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
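The scheme the thread converges on -- copy a batch of commands into the
ring, update the write pointer with a single PIO (possibly deferred to
the vertical blank interrupt), and use an engine-idle state bit to decide
when that PIO is needed as a kick-start -- can be sketched in C. This is
only an illustration of the idea under discussion: every name here
(`struct ring`, `pio_write_pointer`, the flag fields) is hypothetical and
not the actual OGP register interface.

```c
#include <stdint.h>

#define RING_SIZE 4096              /* entries; power of two */

/* Hypothetical host-side view of the ring; not real OGP hardware. */
struct ring {
    uint32_t buf[RING_SIZE];        /* DMA-visible command buffer      */
    uint32_t write;                 /* host-side write index           */
    uint32_t read;                  /* advanced by the GPU             */
    int      engine_idle;           /* set by an engine-idle interrupt */
    int      defer_wptr;            /* batch wptr updates to vblank    */
    uint32_t pending_wptr;          /* wptr value awaiting the PIO     */
};

static uint32_t ring_space(const struct ring *r)
{
    return (r->read - r->write - 1) & (RING_SIZE - 1);
}

/* Stand-in for the one PIO that tells the GPU where 'write' now is. */
static void pio_write_pointer(struct ring *r, uint32_t wptr)
{
    /* e.g. mmio_write(WPTR_REG, wptr); elided in this sketch */
    (void)r; (void)wptr;
}

/* Copy one batch into the ring, then update the write pointer once. */
static int ring_submit(struct ring *r, const uint32_t *cmds, uint32_t n,
                       int urgent)
{
    uint32_t i;

    if (ring_space(r) < n)
        return -1;              /* caller retries on buffer-low IRQ */

    for (i = 0; i < n; i++)
        r->buf[(r->write + i) & (RING_SIZE - 1)] = cmds[i];
    r->write = (r->write + n) & (RING_SIZE - 1);

    if (urgent || r->engine_idle) {
        /* Idle engine: one PIO kick-starts DMA on the ring. */
        pio_write_pointer(r, r->write);
        r->engine_idle = 0;
    } else if (r->defer_wptr) {
        /* Not urgent: fold the PIO into the next vblank interrupt. */
        r->pending_wptr = r->write;
    } else {
        pio_write_pointer(r, r->write);
    }
    return 0;
}

/* Called from the 60Hz vertical-blank interrupt handler. */
static void ring_vblank_irq(struct ring *r)
{
    pio_write_pointer(r, r->pending_wptr);
}
```

The per-command choice mentioned earlier maps onto the `urgent` flag:
a cursor-glyph update would pass `urgent = 1` and take the immediate PIO,
while bulk rendering traffic rides the deferred vblank path.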
