On Sunday 20 March 2005 12:48, Nicolai Haehnle wrote:
> Hi Daniel, (others are obviously free to join the discussion)
It's been a little lonely talking to myself about it ;-)

> This doesn't really have anything to do with the hardware, but it's
> relevant for the software, so here goes...
>
> On Friday 18 March 2005 22:16, Daniel Phillips wrote:
> > Sure, let me stop bleating about that.  It is easy to provide and
> > somebody might find a use for it, though I don't see any use at
> > present.  The problem is, if you have more than one task writing to
> > the ring buffer, then we need to synchronize access.  DRI provides a
> > locking model that can be used for this.  But we are within a hair's
> > breadth of being able to bypass that odious locking and let each
> > drawing task be truly asynchronous, depending only on the kernel
> > module to sort out the indirect DMA submissions and switch to the
> > correct drawing context.  I think we ought to go the distance.  That
> > will get some attention.
>
> I believe I understand you better now.  In fact, I believe this is
> something I've been thinking about already, though maybe I'm coming
> from a different direction.  Start with two observations:
>
> 1. Run glxgears on a DRI-enabled system.  Then run 'yes' in a terminal
> emulator.  Watch the system go crazy.  A similar effect can sometimes
> be observed while moving and resizing OpenGL windows.
> This suggests that access to the GPU needs a proper scheduler, just
> like access to the CPU is arbitrated using a proper scheduler.

That has always bothered me a lot.  Is it just restricted to DRI
programs?  I think I've seen that with 2D SDL animations as well.

> 2. The GPU is really a second processor.  It is unlike the floating
> point coprocessor in the olden days because it has a different
> address space and because of the vastly different performance
> bottlenecks.  So the GPU is more like a processor in its own right.
> Digging a bit deeper into that thought, you'll notice that when an
> application calls OpenGL commands, one can interpret this as
> just-in-time assembling of a GPU "program".
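For the archives, here is how I picture those two observations fitting together, as a userspace C simulation.  To be clear, everything in it is invented for illustration (the packet format, the CTX_SWITCH marker, the sizes, the names) and is not any real driver's interface: each client assembles its own "GPU program" just-in-time into a private buffer, and a scheduler pass arbitrates the single hardware ring, inserting a context switch whenever the owning client changes.

```c
/* Illustrative sketch only: the packet format, opcodes, sizes, and the
 * CTX_SWITCH marker are all invented for this example. */
#include <stdint.h>
#include <string.h>

#define CMD_WORDS  64          /* per-client command buffer, in words */
#define RING_WORDS 256         /* simulated hardware ring buffer */
#define N_CLIENTS  2
#define CTX_SWITCH 0xC0DEC0DEu /* fake "restore context N" command */

/* One client's just-in-time assembled "GPU program". */
struct cmd_buf {
    uint32_t words[CMD_WORDS];
    size_t   len;
    int      pending;          /* submitted, waiting to be scheduled */
};

/* Append one packet: a header word (opcode + payload count), then payload. */
static int emit(struct cmd_buf *b, uint32_t op,
                const uint32_t *payload, uint32_t count)
{
    if (b->len + 1 + count > CMD_WORDS)
        return -1;             /* full: submit to the kernel first */
    b->words[b->len++] = (op << 16) | count;
    memcpy(&b->words[b->len], payload, count * sizeof *payload);
    b->len += count;
    return 0;
}

/* Simulated hardware ring, plus the context the engine last ran. */
struct ring {
    uint32_t words[RING_WORDS];
    size_t   head;
    int      current_ctx;      /* -1 = no context loaded yet */
};

/* One scheduler pass, as the kernel module might run it when the ring
 * drains: round-robin over clients, copy each pending buffer into the
 * ring, and emit a context switch whenever the owner changes. */
static void schedule_pass(struct ring *r, struct cmd_buf bufs[N_CLIENTS])
{
    for (int ctx = 0; ctx < N_CLIENTS; ctx++) {
        struct cmd_buf *b = &bufs[ctx];
        if (!b->pending)
            continue;
        size_t need = b->len + (r->current_ctx != ctx ? 2 : 0);
        if (r->head + need > RING_WORDS)
            return;            /* ring full: wait for the next IRQ */
        if (r->current_ctx != ctx) {
            r->words[r->head++] = CTX_SWITCH;
            r->words[r->head++] = (uint32_t)ctx;
            r->current_ctx = ctx;
        }
        memcpy(&r->words[r->head], b->words, b->len * sizeof(uint32_t));
        r->head += b->len;
        b->pending = 0;
        b->len = 0;
    }
}
```

Note that the big hardware lock never appears: each client touches only its own buffer, and only the scheduler pass writes the ring.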
That's how I think of it.

> So, why not combine these two thoughts into a bigger whole.  Instead
> of the current model where command buffers are submitted
> synchronously via ioctl, the userspace driver will write the command
> buffers somewhere in userspace and simply point the kernel at it
> without taking the big hardware lock.

That is _exactly_ what I had in mind.  The main detail I've been
fretting over is how to deliver notification of command buffer
completion.  I'm currently mulling over using a socket for that, in
which case the indirect DMA submission might as well go over the
socket too.

> At some point (either in an ioctl when the engine is idle, or in a
> bottom-half handler for the "hardware ring buffer empty" IRQ), the
> kernel will look at all outstanding scheduled command buffers and
> write to the hardware ring buffer.

Exactly.

> Note that this idea is completely independent of hardware
> capabilities.  It has the very big advantage that the big hardware
> lock is taken almost never, which reduces the potential impact of
> bugs,

Unless I missed something major, you can drop the "almost".

> and it allows proper scheduling of access to the hardware,
> eliminates ping-ponging of the lock, etc.  All in all, this design
> should behave *much* better in the face of multiple 3D apps.
>
> There are a number of problems, though:
> - Proper scheduling means that we also need proper context switching,
> including preserving all the relevant hardware state, i.e. texture,
> blending, etc. settings.  This will be expensive unless we figure out
> a way for the userspace driver to communicate "reconfiguration
> points" in the command stream that contain the necessary information
> to reload state.

The kernel driver knows which task it got the command submission from,
so it can switch to the correct context.

> - In the current DRI design, the kernel module does basically
> everything in a process context.
> With this design, it'll have to do almost everything in interrupt or
> bottom-half context.  This alone brings a number of problems, such as
> access to the calling process' data.

Fortunately, the virtual pages are resolved to physical when the
process obtains the DMA buffer.  I don't think we need anything else
from process context.

> - This design has a lot of impact on video memory management.
> Basically, I believe you'd have to use a completely new kind of
> memory management (Then again, "memory management" in DRI is really
> bad right now, so maybe that's even a bonus ;))

I've always labored under the assumption that our kernel driver would
manage video memory, and nobody else.

> - I don't think it is possible to eliminate the big hardware lock
> completely.  I'm mostly thinking about some rare operations like mode
> setting; when looking at the DRI in general, keep in mind that some
> hardware has problems when the CPU accesses video memory while the
> engine is busy.  Such hardware would need the big hardware lock more
> often.

Great.  That will make our hardware look good, and broken hardware can
use the hardware lock.

Regards,

Daniel

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
