On Thu, 17 Mar 2005 21:40:57 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:

> > If you use DMA but you only periodically update the write pointer,
> > then there's much less host CPU involvement, but there's considerable
> > added latency for cases when you have few small packets with long
> > times between them.
> 
> But in that case we don't care... because it's just a few small packets
> with long times between them!

True.  In one implementation I did, I decided on a per-command basis
whether to update the write pointer (a PIO write) immediately or to
defer it to the vertical-blank interrupt.  60Hz was good enough.
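
Roughly like this (the names below are invented, not from that
implementation):

    /* Sketch only -- hypothetical names.  Urgent commands flush the
     * write pointer immediately; everything else waits for vblank. */
    struct cmd  { int urgent; /* ... payload ... */ };
    struct ring { int flush_pending; /* ... ring state ... */ };

    void ring_copy_in(struct ring *r, const struct cmd *c);  /* hypothetical */
    int  ring_nearly_full(const struct ring *r);             /* hypothetical */
    void flush_write_pointer(struct ring *r);                /* the one PIO  */

    static void submit_cmd(struct ring *r, const struct cmd *c)
    {
            ring_copy_in(r, c);
            if (c->urgent || ring_nearly_full(r))
                    flush_write_pointer(r);   /* PIO right now            */
            else
                    r->flush_pending = 1;     /* picked up at 60Hz vblank */
    }

    static void vblank_irq(struct ring *r)
    {
            if (r->flush_pending) {
                    r->flush_pending = 0;
                    flush_write_pointer(r);
            }
    }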

> 
> > We also need to consider, in all cases, what happens when you try to
> > do a PIO while DMA is going on:  You wait for, like, 16+ bus cycles
> > just to get in one transaction.
> 
> My suggestion is to not implement PIO at all, except for the basic card
> control commands.  I do not think PIO as an alternate means of issuing
> commands adds any useful functionality.
> 
> I also suggest that PIO commands, direct DMA commands and indirect DMA
> commands should not overlap.  At this point, I haven't seen any
> examples at all of where overlap makes sense.

Updating the cursor glyph from interrupt context while DMA is going
on.  Note that the latency for that could be evil.
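
Something like this, say (the register offsets and the PIO helper are
invented):

    /* Sketch only.  Each of these PIO writes can stall for a long time
     * behind an in-flight DMA burst, which is where the latency worry
     * comes from. */
    struct gpu;
    void pio_write(struct gpu *g, int reg, int val);  /* hypothetical */

    #define REG_CURSOR_X 0x70   /* invented offsets */
    #define REG_CURSOR_Y 0x74

    static void cursor_move_irq(struct gpu *g, int x, int y)
    {
            pio_write(g, REG_CURSOR_X, x);
            pio_write(g, REG_CURSOR_Y, y);
    }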

> > Another possibility is to have the write pointer hang out in host
> > memory and have the GPU poll it periodically. That eliminates bus
> > overhead entirely from the kernel but does introduce some amount of
> > latency.  The advantage is that the write pointer is never passed
> > over the bus when it doesn't need to be (it only happens when the GPU
> > realizes that it can't do anything else useful).
> 
> This isn't a problem.  When the drm issues a command list or a texture
> ioctl, the kernel driver will:
> 
>   1) Parse it into individual 4K DMA regions
>   2) Load/lock the pages
>   3) Load all the DMA commands into the ring buffer (if they fit)
>   4) Update the write pointer just once
> 
> (If they don't all fit, the remainder will be loaded when the next
> buffer low interrupt arrives.)
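
In code, that path might look roughly like this (the structures and
helper names are invented, not from any actual driver):

    /* Sketch only: hypothetical helpers, just illustrating steps 1-4. */
    struct region { unsigned long bus_addr; unsigned int len; }; /* one 4K chunk */
    struct ring;
    struct dma_req { int nregions; struct region *region; /* ... */ };

    void split_into_4k_regions(struct dma_req *req);            /* step 1 */
    void pin_user_pages(struct dma_req *req);                   /* step 2 */
    int  ring_has_space(const struct ring *r);
    void ring_emit_dma_cmd(struct ring *r, struct region *rg);
    void save_remainder(struct dma_req *req, int next);   /* finish later */
    void update_write_pointer(struct ring *r);         /* step 4: one PIO */

    static void submit_request(struct ring *r, struct dma_req *req)
    {
            int i;

            split_into_4k_regions(req);            /* 1) parse            */
            pin_user_pages(req);                   /* 2) load/lock pages  */

            for (i = 0; i < req->nregions; i++) {  /* 3) fill the ring    */
                    if (!ring_has_space(r)) {
                            /* remainder is loaded from the buffer-low
                             * interrupt, as described above */
                            save_remainder(req, i);
                            break;
                    }
                    ring_emit_dma_cmd(r, &req->region[i]);
            }

            update_write_pointer(r);               /* 4) one PIO, once    */
    }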
> 
> > The simplest approach is to use PIO to push DMA commands into a
> > queue. But that has the latency issue when a DMA transaction is
> > already going on.  The fastest approach is the one where the host
> > does absolutely no PIO at all and it's the GPU's job to poll the
> > write pointer and update the read pointer at convenient times.
> 
> I doubt the GPU needs to poll.  Each of those ring buffer commands is
> going to take quite some time to execute, and they will almost always
> be submitted in batches.  When they aren't, I don't think we care.

The GPU will "poll" the write pointer under two conditions:

(1) The FIFO for the ring buffer is running low and should be loaded
with new commands, if available.
(2) The engine is completely idle.

It's hard to completely eliminate (2).  I suppose if we also interrupt
on engine-idle, we can set a state bit that indicates that we have to
do one PIO to kick-start the DMA for the ring buffer.  Could that ever
get confused?
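
Roughly what I have in mind (the flag and helpers here are invented):

    /* Sketch only.  The engine-idle interrupt records that ring DMA has
     * drained and stopped; the next time the host adds commands it does
     * exactly one PIO to restart it. */
    struct ring { int need_kickstart; /* ... ring state ... */ };

    void kickstart_ring_dma(struct ring *r);   /* the one PIO write */

    static void engine_idle_irq(struct ring *r)
    {
            r->need_kickstart = 1;
    }

    /* called after new commands and the write pointer have been written */
    static void after_ring_update(struct ring *r)
    {
            if (r->need_kickstart) {
                    r->need_kickstart = 0;
                    kickstart_ring_dma(r);
            }
            /* otherwise the GPU picks up the new write pointer on its
             * own the next time its FIFO runs low */
    }

If it can get confused anywhere, I'd guess it's the window where the
engine has drained but the idle interrupt hasn't been serviced yet while
the host is adding commands, so the kick-start probably wants to check
the real ring state rather than trust the flag alone.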