On Friday 28 January 2005 06:16, you wrote:
> It would make a whole lot of sense to work closely with DRI/Xorg crowd
> and get it right on both sides. DRI is by no means immutable.

I agree with you, and since I'm interested in driver development and have
at least a little bit of experience with DRI, I believe this is doable.

> Here's an off-the-wall idea for command DMA. Caveat: I've never
> actually looked at command DMA on a graphics card, so I may just be
> rambling. But I have worked a fair bit with sound card DMA, and I
> rather like the way that works.
>
> The idea is to have a power-of-two sized ring buffer that generates two
> interrupts each time round, one at halfway and the other at the wrap
> point. To start a frame, you fill the bottom half of the buffer plus a
> bit with commands+data, then set up the DMA via command registers and
> start it cycling. Each interrupt refills half the command buffer.
> When the frame is finished, the last command in the buffer is "stop
> DMA". The point is, there's no per-cycle setup overhead for this
> scheme at all. It is possible for the refill routine to underrun,
> since it ultimately has to be driven from an foreground task which
> might fail to deliver on time for one reason or another (e.g., disk
> IO). In that case, the interrupt routine could fill the buffer with
> no-ops, rather than stalling it and requiring fresh setup.

This scheme makes sense for devices where the bytes/second bandwidth is
fairly constant, which is the case for sound cards. With graphics cards,
however, the same number of bytes can take either a very short time (small
triangles) or a very long time (large triangles) to process. So a
straightforward ring buffer - which is what basically every graphics card
out there uses - is just right.

> The advantage of this scheme versus just setting up each block of DMA
> when the completion interrupt for the one before it arrives is, there's
> no latency between the interrupt and delivering the next batch of
> commands. There's almost no IO register traffic, and obviously there's
> no busy waiting.

How would the amount of IO register traffic be any different?
The busy-waiting and interrupt latencies should also be no problem with a
normal DMA ring buffer.

When it comes to command streams, there are some important considerations:
1. The kernel has to prevent user-level applications from issuing
   arbitrary commands (otherwise apps could program arbitrary DMA
   transfers).
2. There will be *a lot* of bandwidth used for what is essentially just
   geometry data.

This is why we absolutely must have some kind of indirect buffer system.
The way I believe it should work is roughly this:
- the user-level OpenGL (or whatever) code creates a buffer in DMA-able
  memory
- the user-level code issues an ioctl telling the kernel "please execute
  this buffer"
- the kernel will put a "CALL indirect_buffer" command into the main ring
  buffer
- the hardware only allows a subset of commands in indirect buffers

cu,
Nicolai
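
To make the indirect buffer flow listed above a bit more concrete, here is
a rough user-space simulation in C. It is only a sketch under assumptions:
the opcodes, the struct layout, and the names submit_indirect() and
hw_run() are all invented for illustration, buffers are referenced by a
table index rather than a bus address, and every command is a single word.
No real hardware or DRI interface is being described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS 64u                 /* must be a power of two */
#define RING_MASK  (RING_WORDS - 1u)
#define MAX_BUFS   8u

/* Hypothetical opcodes, invented for this sketch. */
enum { CMD_NOP, CMD_DRAW, CMD_DMA_COPY, CMD_CALL_INDIRECT };

/* Stand-in for a buffer that user-level code placed in DMA-able memory. */
struct indirect_buf { const uint32_t *words; uint32_t nwords; };

struct device {
    uint32_t ring[RING_WORDS];         /* main ring, written by the kernel */
    uint32_t head, tail;               /* free-running read/write counters */
    struct indirect_buf bufs[MAX_BUFS];
    unsigned draws, rejected;          /* bookkeeping for the simulation */
};

static uint32_t ring_free(const struct device *d)
{
    return RING_WORDS - (d->tail - d->head);
}

static void ring_put(struct device *d, uint32_t word)
{
    assert(ring_free(d) > 0);
    d->ring[d->tail++ & RING_MASK] = word;
}

/* Kernel side of a hypothetical "please execute this buffer" ioctl:
 * user code never writes the ring itself, it only names a buffer. */
static int submit_indirect(struct device *d, uint32_t idx)
{
    if (idx >= MAX_BUFS || d->bufs[idx].words == NULL)
        return -1;                     /* unknown buffer */
    if (ring_free(d) < 2)
        return -1;                     /* ring full, caller must retry */
    ring_put(d, CMD_CALL_INDIRECT);
    ring_put(d, idx);
    return 0;
}

/* Simulated hardware: drains the ring and, inside an indirect buffer,
 * accepts only NOP and DRAW. Anything else aborts that buffer. */
static void hw_run(struct device *d)
{
    while (d->head != d->tail) {
        uint32_t op = d->ring[d->head++ & RING_MASK];
        if (op != CMD_CALL_INDIRECT)
            continue;                  /* other ring commands omitted */
        const struct indirect_buf *b =
            &d->bufs[d->ring[d->head++ & RING_MASK]];
        for (uint32_t i = 0; i < b->nwords; i++) {
            if (b->words[i] == CMD_DRAW) {
                d->draws++;
            } else if (b->words[i] != CMD_NOP) {
                d->rejected++;         /* privileged command in user buffer */
                break;
            }
        }
    }
}
```

To try it, fill in one or two entries of bufs[] with small command
arrays, call submit_indirect() for each, and then hw_run(). A buffer
containing CMD_DMA_COPY stops at that word and bumps the rejected counter,
which mirrors the idea that user-supplied buffers cannot program arbitrary
DMA transfers, while the ring itself only ever carries two words per
submission regardless of how much geometry the buffer holds.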
_______________________________________________ Open-graphics mailing list [email protected] http://lists.duskglow.com/mailman/listinfo/open-graphics List service provided by Duskglow Consulting, LLC (www.duskglow.com)
