On Mon, 21 Mar 2005 16:09:03 +0100, Lourens Veen <[EMAIL PROTECTED]> wrote:
> On Monday 21 March 2005 13:59, Nicolai Haehnle wrote:
> > On Sunday 20 March 2005 23:49, Daniel Phillips wrote:
> >
> > > > So, why not combine these two thoughts into a bigger whole. Instead
> > > > of the current model where command buffers are submitted
> > > > synchronously via ioctl, the userspace driver will write the command
> > > > buffers somewhere in userspace and simply point the kernel at it
> > > > without taking the big hardware lock.
> > >
> > > That is _exactly_ what I had in mind. The main detail I've been
> > > fretting over is how to deliver notification of command buffer
> > > completion. I'm currently mulling over using a socket for that, in
> > > which case the indirect DMA submission might as well go over the socket
> > > too.
> >
> > DRI drivers already open an fd (/dev/dri/*) to send ioctls. Reading and
> > writing from this fd is currently not used, so this is a good candidate
> > IMO.
>
> Looks logical to me. Open device, write your commands to it (well, a pointer
> to your commands), and read from it to know if it's finished executing them.
> Nice and simple.
>
> > > > and it allows proper scheduling of access to the hardware,
> > > > eliminates ping-ponging of the lock, etc. All in all, this design
> > > > should behave *much* better in the face of multiple 3D apps.
> > > >
> > > > There are a number of problems, though:
> > > > - Proper scheduling means that we also need proper context switching,
> > > > including preserving all the relevant hardware state, i.e. texture,
> > > > blending, etc. settings. This will be expensive unless we figure out
> > > > a way for the userspace driver to communicate "reconfiguration
> > > > points" in the command stream that contains the necessary information
> > > > to reload state.
> > >
> > > The kernel driver knows which task it got the command submission from,
> > > so it can switch to the correct context.
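
The write/read idea quoted above can be sketched as a small Python simulation. This is not driver code: the class FakeGpuDevice, its methods, and the (task, buffer address) pairs are all hypothetical, and a real /dev/dri/* fd would need kernel support for read and write that, as noted, is currently unused. It only shows the shape of the protocol: write a reference to a command buffer, then read back completion notices.

```python
from collections import deque


class FakeGpuDevice:
    """Toy stand-in for a /dev/dri/* fd (hypothetical; models only the protocol)."""

    def __init__(self):
        self._pending = deque()    # submitted, not yet executed
        self._completed = deque()  # executed, not yet reported to the task

    def write(self, task_id, buffer_addr):
        """Submit a pointer to a command buffer; returns without waiting."""
        self._pending.append((task_id, buffer_addr))

    def run_one(self):
        """Pretend the hardware finished the oldest pending buffer."""
        if self._pending:
            self._completed.append(self._pending.popleft())

    def read(self, task_id):
        """Return addresses of this task's finished buffers, oldest first."""
        done = [addr for tid, addr in self._completed if tid == task_id]
        # Keep completion notices that belong to other tasks.
        self._completed = deque(
            (tid, addr) for tid, addr in self._completed if tid != task_id
        )
        return done


dev = FakeGpuDevice()
dev.write(task_id=1, buffer_addr=0x1000)
dev.write(task_id=2, buffer_addr=0x2000)
dev.run_one()        # only the first buffer has executed so far
print(dev.read(1))   # task 1 sees its buffer as completed
print(dev.read(2))   # task 2 has nothing to collect yet
```

Note that neither write() nor read() takes a global lock in this model; ordering comes from the queue alone, which is the property the proposal is after.
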
> >
> > Yes, but it can be expensive. When the kernel switches contexts, it must
> > make sure that all the on-card registers (I know you don't like "register
> > writes", but that's what they are) that reflect OpenGL state (blending, Z
> > test, texturing, texture environments, ...) are correctly preserved. It
> > must also make sure that all the referenced textures, including offscreen
> > rendering targets, are in place and not swapped out.
>
> Can't we have a privileged instruction in direct DMA for that? And then maybe
> we can have these new values flow down the rendering pipeline just like
> rendering commands? It will take up a bit more chip real estate, but then we
> wouldn't even need to do a pipeline flush when switching context.
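
As a rough illustration of state reloads travelling in the same queue as drawing commands, here is a toy Python model. Everything in it is an assumption made for the sketch: the ContextScheduler class, the LOAD_STATE and DRAW tuples, and the register names are hypothetical and do not describe any real hardware or driver. The kernel side keeps a per-task shadow of register values and, when a submission comes from a different task than the last one, queues the saved values ahead of that task's drawing commands.

```python
class ContextScheduler:
    """Toy model: one hardware queue shared by several tasks (names hypothetical)."""

    def __init__(self):
        self.saved_state = {}    # task_id -> {register: value}
        self.current_task = None
        self.queue = []          # entries in the order the card would see them

    def set_state(self, task_id, register, value):
        """Record a state change so it can be replayed after a context switch."""
        self.saved_state.setdefault(task_id, {})[register] = value
        if task_id == self.current_task:
            # The task already owns the pipeline, so the change goes out now.
            self.queue.append(("LOAD_STATE", register, value))

    def submit(self, task_id, commands):
        """Queue drawing commands, prefixed by a state reload if the task changed."""
        if task_id != self.current_task:
            state = self.saved_state.get(task_id, {})
            # Privileged state loads share the queue with drawing commands.
            for register, value in sorted(state.items()):
                self.queue.append(("LOAD_STATE", register, value))
            self.current_task = task_id
        self.queue.extend(("DRAW", c) for c in commands)


sched = ContextScheduler()
sched.set_state(1, "blend", 1)
sched.set_state(2, "blend", 0)
sched.submit(1, ["tri_a"])
sched.submit(1, ["tri_b"])   # same task: no reload needed
sched.submit(2, ["tri_c"])   # different task: reload precedes the draw
for entry in sched.queue:
    print(entry)
```

The reload cost in this model grows with the number of saved registers per task, and it does not track which textures are resident, which is the other half of the expense raised above.
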

If you need your own Z buffer, you'll have allocated it in advance.
Only stage information needs to be set up. Of course, there's the
issue of what happens when you run out of graphics memory. :)

> > This means that the kernel must keep track of which memory areas each "GPU
> > program" currently references and what all the registers contain. You can't
> > just wave that away.
> >
> > Some hardware support for writing/reading register states into a predefined
> > area of video memory could help a lot here, but I don't know if that's
> > feasible.
>
> Block RAM? How much data are we talking about anyway? And what is a reasonable
> amount of saved register states? Perhaps data could be swapped between
> registers and a block RAM, so that we can write the next state into the block
> RAM (perhaps via Daniel's cursor upload mechanism :-)) while the card is busy
> drawing, and then the kernel sends it a swap command in the direct DMA queue,
> directly followed by new drawing commands from the new process. While the
> card processes those, the kernel can get the old state out of the block RAM
> and store it in memory, and then write the next state to the block RAM. A
> sort of cache really. Of course, its usefulness depends on the bus being idle
> long enough to upload the state while the card is drawing. Since we're
> bus-limited that may not be the case. Comments?

The state is only as large as the number of registers in the rendering
pipeline.

> (Storing state in video RAM is obviously better if we can do that, we could
> read/write during the pipeline flush if we have one).

All of these things are wonderful, but all they do is take a reasonable
design and make it better. Don't forget them, but remember that as long
as there is an acceptable way of doing without, then that's how we have
to do it.
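
For reference, the block RAM swap sequence quoted above can be written out as a toy Python model. The Card class, its method names, and the register dictionaries are hypothetical and exist only to show the ordering of the three steps: stage the next state while the card draws, swap, then read the old state back out. It says nothing about whether the bus is idle long enough for this to pay off.

```python
class Card:
    """Toy card with live registers and a block RAM staging area (hypothetical)."""

    def __init__(self, initial_registers):
        self.registers = dict(initial_registers)  # live pipeline state
        self.block_ram = {}                       # staged or swapped-out state

    def upload_next(self, state):
        """Kernel writes the next context's state while the card is drawing."""
        self.block_ram = dict(state)

    def swap(self):
        """Swap command in the DMA queue: exchange registers and block RAM."""
        self.registers, self.block_ram = self.block_ram, self.registers

    def download_old(self):
        """Kernel reads the previous context's state out of block RAM."""
        old, self.block_ram = self.block_ram, {}
        return old


def switch_context(card, saved, old_task, new_task):
    """Run the three steps in order; saved maps task -> register state."""
    card.upload_next(saved[new_task])       # 1. stage while the card draws
    card.swap()                             # 2. new state goes live
    saved[old_task] = card.download_old()   # 3. old state stored in memory


saved = {"app_b": {"blend": 0, "ztest": 1}}
card = Card({"blend": 1, "ztest": 0})       # app_a currently owns the pipeline
switch_context(card, saved, "app_a", "app_b")
print(card.registers)
print(saved["app_a"])
```

The amount of data moved in each step is one value per pipeline register, so the block RAM only has to hold one context's worth of state at a time.
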
