On Wed, 2 Feb 2005 00:04:17 +0100, Rodolphe Ortalo
<[EMAIL PROTECTED]> wrote:
> On Tuesday 01 February 2005 23:06, Timothy Miller wrote:
> > To all:
> >
> > Ok, here are the kinds of DMA transactions that I want to support:
> >
> > (1) Direct command buffer.  Using a ring buffer with "read pointer"
> > (controlled by GPU) and "write pointer" (controlled by CPU), the host
> > can fill an empty portion of the buffer, starting at the write
> > pointer, with command packets.  The GPU fetches words in blocks and
> > feeds them into the GPU fifo.
> 
> For (1) and (2), are the "blocks" of fixed size? (If not, how do you define
> the size of each block?)

They would be variable.

For Direct, the host sets the queue tail pointer and the DMA engine
just reads up to that point and stops until the pointer changes again.

For indirect, you need to indicate an address and a length, in words.
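
As a rough sketch of the two disciplines (all structure and field names here are hypothetical; nothing in this thread pins down a register layout), a minimal C model might look like:

```c
#include <stdint.h>

/* Direct mode: the CPU owns the tail (write) pointer, the GPU owns
 * the head (read) pointer.  The DMA engine consumes words up to the
 * tail and then stalls until the tail moves again. */
struct ring {
    uint32_t *base;     /* ring buffer in DMA-able memory      */
    uint32_t  size;     /* ring size in words, a power of two  */
    uint32_t  head;     /* read pointer, advanced by the GPU   */
    uint32_t  tail;     /* write pointer, advanced by the CPU  */
};

/* Free space in the ring, in words.  One slot is kept open so that
 * head == tail unambiguously means "empty". */
static uint32_t ring_space(const struct ring *r)
{
    return (r->head - r->tail - 1) & (r->size - 1);
}

/* Indirect mode: each unit names an arbitrary host range by
 * address and length in words. */
struct indirect_desc {
    uint64_t host_addr;   /* bus address of the command block */
    uint32_t len_words;   /* variable-length, as noted above  */
};
```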

> 
> > (2) Indirect command buffer.  Using PIO or a ring buffer, arbitrary
> > host address ranges are specified.  The GPU fetches them in blocks, in
> > order.  This is useful for multitasking, where there is a central
> > server and individual packets from different processes are thrown into
> > the one ring buffer.
> 
> Both for (1) and (2), note that the most efficient way for the kernel driver
> to grab command buffers from userspace is to steal pages from userspace
> memory. In this case, you can expect that the common case is that these pages
> are not full (command list rarely ends exactly on a page boundary) and that
> these pages are not physically contiguous (even if they appear to be in the
> virtual address space of the userspace process).

I thought there were special "DMA pages" you could allocate that
wouldn't swap and would stay in a fixed location.

> I hope such a case does not interfere with the way of operation you have in
> mind.

For the ring buffer, it would just wrap around to the beginning.  If
the ring buffer is only 4K, will anyone cry?  Especially if it's
mostly used to post indirect loads from other 4k pages?

> Note also that stealing pages from a process's memory is not exactly easy
> (unless they are write-only or read-only by design, like for a file or pipe).
> The usual way is to mark them unwritable and unreadable and to put the
> process to sleep if it tries to touch these pages, until they are available
> again for it.

I don't think this is how you do it.

> > (3) Direct data move.  Data to be written to graphic memory is loaded
> > directly into the ring buffer.  In this case, the graphics memory,
> > rather than the GPU engine is the target.  Moves from card to host are
> > not possible this way.
> 
> It seems to me that, in this case, you will require AGP-like remapping
> capabilities to transfer efficiently areas longer than one page from host
> memory to graphics memory. (Unfortunately, contiguous areas in virtual memory
> cannot be assumed to be contiguous in physical memory. Enforcing such a
> situation usually requires specific APIs in kernels and causes lots of
> secondary software problems...)

For the sake of everything else on the bus, you want to put limits on
the length of individual bus transactions.  Those limits are on the
order of 16 or 32 bus cycles.  Therefore, this doesn't matter.

> > (4) Indirect data move.  Using either PIO or a ring buffer, arbitrary
> > host address ranges are specified.  The source/target is graphics
> > memory, and either reads or writes can be specified.
> 
> If this is the way to go to do arbitrary size data movements (by batching one
> page at a time) from host memory to graphics memory, I wonder if the previous
> case "(3) Direct data move" is really needed?

It's incredibly convenient for certain operations.  It's really only
useful, though, if it goes through the GPU pipeline.  The reason is
that it's completely coherent with rendering.  This way, you can throw
text glyphs into the pipeline and not have to sync anything.

Actually, it may not be proper to separate this out.  Really, this
kind of transfer would be expressed as a GPU command packet that just
happens to have a relatively large payload.
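
One way to picture that (the opcode values and header encoding below are made up for illustration; the actual packet format isn't specified anywhere in this thread):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: an inline data move is just another command
 * packet in the stream -- a header word, a destination offset, then
 * the payload itself.  Because it travels through the same FIFO as
 * the drawing commands, it is automatically ordered (coherent) with
 * respect to them. */
enum { OP_NOP = 0, OP_DRAW = 1, OP_INLINE_WRITE = 2 };

/* Header word: opcode in the top byte, payload length (in words)
 * in the low 24 bits. */
static uint32_t pkt_header(uint32_t op, uint32_t len_words)
{
    return (op << 24) | (len_words & 0x00ffffff);
}

/* Emit an inline write to graphics memory into the command stream
 * at q; returns the next free position. */
static uint32_t *emit_inline_write(uint32_t *q, uint32_t dst_offset,
                                   const uint32_t *data, uint32_t n)
{
    *q++ = pkt_header(OP_INLINE_WRITE, n + 1);  /* +1: dst word */
    *q++ = dst_offset;
    memcpy(q, data, n * sizeof *data);
    return q + n;
}
```

With this shape, throwing a text glyph at the screen is just one more packet in the stream, with no sync against the rendering around it.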

> > For indirect DMA, an interrupt can be asserted for the completion of
> > each unit.  For direct DMA, two interrupts are available:  ring empty;
> > ring has reached low water-mark.
> 
> In the end, why not have only indirect data move and indirect command buffer
> move? 

And how are you going to specify those?  You need a queue.  You could
PIO to the queue, but what if the queue fills?  Well, you have to
sleep.  It would be LOTS nicer to just throw them into a ring buffer
somewhere.  Overall, it would improve efficiency.

(Well, it wouldn't if your DMA request packets are only about 3 words
long and you do a PIO for each one.  What you should do is flush the
write pointer periodically, or when you know that nothing more is
coming.  The problem there is the latency.)
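
Concretely, that batching might look something like this (FLUSH_EVERY and the doorbell field are illustrative stand-ins; the real flush would be a PIO write to a device register):

```c
#include <stdint.h>

/* Copy request packets into the ring with plain stores, and only
 * "flush" -- i.e. perform the expensive PIO write of the tail
 * pointer to the device -- every few packets, or explicitly when
 * nothing more is coming.  This trades one doorbell write per
 * packet for a little extra latency. */
#define FLUSH_EVERY 8

struct queue {
    uint32_t *ring;
    uint32_t  size;      /* in words, a power of two         */
    uint32_t  tail;      /* local (CPU-side) write pointer   */
    uint32_t  pending;   /* packets since the last doorbell  */
    uint32_t  doorbell;  /* stand-in for the device register */
};

static void queue_flush(struct queue *q)
{
    q->doorbell = q->tail;   /* the one PIO write */
    q->pending = 0;
}

static void queue_post(struct queue *q, const uint32_t *pkt, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        q->ring[(q->tail + i) & (q->size - 1)] = pkt[i];
    q->tail = (q->tail + n) & (q->size - 1);
    if (++q->pending >= FLUSH_EVERY)
        queue_flush(q);
}
```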

> As I see it, such indirect modes will be mandatory to handle cleanly
> multiple-pages data transfers and the kernel will probably enforce unit host
> address ranges shorter than one page length (unless AGP is fully used - not
> such a common case, but X could use it).
> 
> BTW, an interrupt for each unit (in the indirect case) is probably overkill
> (depends on the maximal interrupt rate however). Would it be possible to mark
> units that should raise an interrupt?

The way we did it with TROZ was that different units can raise a
condition, which shows up in a register you can read.  Then, there's a
mask which indicates which ones can actually assert an interrupt.  You
can turn the interrupts on and off selectively.
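
In C terms, that status/mask scheme amounts to something like this (register names invented for the sketch):

```c
#include <stdint.h>

/* Each unit can raise a condition, which latches a bit in a status
 * register the driver can read.  A separate mask register selects
 * which conditions are allowed to assert the interrupt line, so
 * interrupts can be turned on and off selectively without losing
 * the latched conditions. */
struct irq_regs {
    uint32_t status;  /* latched condition bits, one per source */
    uint32_t mask;    /* 1 = this condition may interrupt       */
};

/* The interrupt line is the OR of all unmasked conditions. */
static int irq_asserted(const struct irq_regs *r)
{
    return (r->status & r->mask) != 0;
}

static void irq_enable(struct irq_regs *r, uint32_t bit, int on)
{
    if (on)
        r->mask |= bit;
    else
        r->mask &= ~bit;
}
```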

> (Indirect mode is especially useful for 2D operations with multiple processes
> - like the case of an X server. But in this case, units are probably pretty
> short in size, and the sync offered by interrupts is rarely requested by
> processes - they usually do not bother to know that their drawings have been
> processed.)

When you have lots of small packets, like in the case of X11, it's
MUCH more efficient to use a single ring buffer.

> While we're on the subject of interrupts: note that, IMHO, two kinds of
> interrupts are useful: one for the end of the data transfer (it allows the
> kernel to reclaim resources), but also one for the end of the associated
> drawing operation.
> The latter is rarely furnished by the hardware - usually the driver needs to
> wait for the engine to go idle to be sure that drawing operations are
> completed. However, this is the "interrupt" that drawing processes want (they
> want to know that the drawing is finished on screen, not that the engine has
> finished fetching ops).
> Would it be possible to mark units transferred via DMA so that an interrupt
> is generated when the drawing operation is completed? (Given transistor
> budget constraints, of course.)

We can provide various different interrupts.  The one that they ALWAYS
miss, which I find to be really stupid of them, is one which says "I'm
not done processing, but I now have enough room in my queue that it's
worth waking you up so you can fill some more."
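
As a sketch, that "room available" condition is just a low-water-mark test on free space (names hypothetical):

```c
#include <stdint.h>

/* Raise the wakeup condition once free space in the ring rises back
 * above a threshold, so the producer can be woken to refill before
 * the engine starves -- rather than only signalling "ring empty". */
static int room_irq_pending(uint32_t head, uint32_t tail,
                            uint32_t size, uint32_t threshold)
{
    uint32_t free = (head - tail - 1) & (size - 1);
    return free >= threshold;
}
```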
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
