Re: [Open-graphics] open graphics

Nicolai Haehnle Fri, 28 Jan 2005 02:20:58 -0800

On Friday 28 January 2005 06:16, you wrote:
> It would make a whole lot of sense to work closely with DRI/Xorg crowd 
> and get it right on both sides.  DRI is by no means immutable.


I agree with you, and since I'm interested in driver development and have at 
least a little bit of experience with DRI, I believe this is doable.

> Here's an off-the-wall idea for command DMA.  Caveat: I've never 
> actually looked at command DMA on a graphics card, so I may just be 
> rambling.  But I have worked a fair bit with sound card DMA, and I 
> rather like the way that works.
> 
> The idea is to have a power-of-two sized ring buffer that generates two 
> interrupts each time round, one at halfway and the other at the wrap 
> point.  To start a frame, you fill the bottom half of the buffer plus a 
> bit with commands+data, then set up the DMA via command registers and 
> start it cycling.  Each interrupt refills half the command buffer.  
> When the frame is finished, the last command in the buffer is "stop 
> DMA".  The point is, there's no per-cycle setup overhead for this 
> scheme at all.  It is possible for the refill routine to underrun, 
> since it ultimately has to be driven from a foreground task which 
> might fail to deliver on time for one reason or another (e.g., disk 
> IO).  In that case, the interrupt routine could fill the buffer with 
> no-ops, rather than stalling it and requiring fresh setup.

This scheme makes sense for devices where the bytes/second bandwidth is 
fairly constant, which is the case for sound cards. With graphics cards, 
however, the same number of bytes can take either a very short time (small 
triangles) or a very long time (large triangles) to process. So a 
straightforward ring buffer, which is what basically every graphics card 
out there uses, is just right.
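
To make "straightforward ring buffer" concrete, here is a minimal sketch in 
C of the host side, with a small stand-in for the hardware. All names 
(cmd_ring, ring_emit, ring_consume, RING_WORDS) are made up for illustration 
and are not taken from any real driver. It assumes the card exposes a read 
pointer and the host advances a write pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS 1024u /* hypothetical size; must be a power of two */
_Static_assert((RING_WORDS & (RING_WORDS - 1u)) == 0u, "power of two");

struct cmd_ring {
    uint32_t buf[RING_WORDS];
    uint32_t wptr; /* free-running count of words written by the host */
    uint32_t rptr; /* free-running count of words read by the hardware */
};

/* Words queued but not yet consumed. Unsigned wraparound keeps this
 * correct even after the counters overflow. */
static uint32_t ring_used(const struct cmd_ring *r)
{
    return r->wptr - r->rptr;
}

static uint32_t ring_free(const struct cmd_ring *r)
{
    return RING_WORDS - ring_used(r);
}

/* Queue n command words. Returns 0 on success, -1 if the ring is too
 * full. In a real driver the final pointer update would be one register
 * write; that is an assumption of this sketch. */
static int ring_emit(struct cmd_ring *r, const uint32_t *words, uint32_t n)
{
    assert(words != NULL);
    if (n > ring_free(r))
        return -1;
    for (uint32_t i = 0; i < n; i++)
        r->buf[(r->wptr + i) & (RING_WORDS - 1u)] = words[i];
    r->wptr += n;
    return 0;
}

/* Stand-in for the hardware: consume up to n words at whatever pace the
 * commands happen to take. Returns the number of words consumed. */
static uint32_t ring_consume(struct cmd_ring *r, uint32_t n)
{
    uint32_t avail = ring_used(r);
    if (n > avail)
        n = avail;
    r->rptr += n;
    return n;
}
```

The host only ever compares the two pointers, so it does not matter whether 
the hardware spends a microsecond or a millisecond on a given command.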

> The advantage of this scheme versus just setting up each block of DMA 
> when the completion interrupt for the one before it arrives is, there's 
> no latency between the interrupt and delivering the next batch of 
> commands.  There's almost no IO register traffic, and obviously there's 
> no busy waiting.

How would the amount of IO register traffic be any different? Busy-waiting 
and interrupt latency should not be a problem with a normal DMA ring buffer 
either.

When it comes to command streams, there are two important considerations:

1. The kernel has to prevent user-level applications from issuing arbitrary 
commands (otherwise apps could program arbitrary DMA transfers).

2. There will be *a lot* of bandwidth used for what is essentially just 
geometry data.

This is why we absolutely must have some kind of indirect buffer system. The 
way I believe it should work is roughly this:
- the user-level OpenGL (or whatever) code creates a buffer in DMA-able 
memory
- the user-level code issues an ioctl telling the kernel "please execute 
this buffer"
- the kernel will put a "CALL indirect_buffer" command into the main ring 
buffer
- the hardware only allows a subset of commands in indirect buffers
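
The steps above could look roughly like this in C. This is only a sketch 
under assumptions: the opcodes, the three-word CALL packet layout, the 
exec_request struct and the function names are all hypothetical, since no 
command format has been defined yet. Free-space accounting for the main 
ring is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes; no real command set is implied. */
#define OP_NOP           0u
#define OP_DRAW          1u
#define OP_SET_STATE     2u
#define OP_DMA_COPY      3u /* privileged: main ring only */
#define OP_CALL_INDIRECT 4u /* privileged: main ring only */

#define RING_WORDS         256u  /* hypothetical, power of two */
#define MAX_INDIRECT_WORDS 4096u /* hypothetical upper bound */

struct ring {
    uint32_t buf[RING_WORDS];
    uint32_t wptr;
};

/* What user level would hand to the "please execute this buffer" ioctl. */
struct exec_request {
    uint32_t gpu_addr;  /* address of the buffer in DMA-able memory */
    uint32_t num_words; /* length of the buffer in 32-bit words */
};

/* Kernel side: never copies or parses the geometry data, it only emits
 * a CALL packet (opcode, address, length) into the main ring. */
static int kernel_exec_buffer(struct ring *r, const struct exec_request *req)
{
    if (req->num_words == 0u || req->num_words > MAX_INDIRECT_WORDS)
        return -1;
    r->buf[r->wptr++ & (RING_WORDS - 1u)] = OP_CALL_INDIRECT;
    r->buf[r->wptr++ & (RING_WORDS - 1u)] = req->gpu_addr;
    r->buf[r->wptr++ & (RING_WORDS - 1u)] = req->num_words;
    return 0;
}

/* Model of the hardware rule: only a harmless subset of commands is
 * accepted while executing an indirect buffer. */
static int hw_allowed_in_indirect(uint32_t opcode)
{
    return opcode == OP_NOP || opcode == OP_DRAW || opcode == OP_SET_STATE;
}
```

Because the subset check sits in the hardware, the kernel does not have to 
scan every buffer, which addresses both points above: apps cannot program 
arbitrary DMA, and the bulk of the data is never touched by the CPU twice.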

cu,
Nicolai
