On Sunday 20 March 2005 12:48, Nicolai Haehnle wrote:
> Hi Daniel, (others are obviously free to join the discussion)

It's been a little lonely talking to myself about it ;-)

> This doesn't really have anything to do with the hardware, but it's
> relevant for the software, so here goes...
>
> On Friday 18 March 2005 22:16, Daniel Phillips wrote:
> > Sure, let me stop bleating about that.  It is easy to provide and
> > somebody might find a use for it, though I don't see any use at
> > present.  The problem is, if you have more than one task writing to
> > the ring buffer, then we need to synchronize access.  DRI provides a
> > locking model that can be used for this.  But we are within a hair's
> > breadth of being able to bypass that odious locking and let each
> > drawing task be truly asynchronous, depending only on the kernel
> > module to sort out the indirect DMA submissions and switch to the
> > correct drawing context.  I think we ought to go the distance.  That
> > will get some attention. 
>
> I believe I understand you better now. In fact, I believe this is
> something I've been thinking about already, though maybe I'm coming
> from a different direction. Start with two observations:
>
> 1. Run glxgears on a DRI-enabled system. Then run 'yes' in a terminal
> emulator. Watch the system go crazy. A similar effect can sometimes
> be observed while moving and resizing OpenGL windows.
> This suggests that access to the GPU needs a proper scheduler, just
> like access to the CPU is arbitrated using a proper scheduler.

That has always bothered me a lot.  Is it just restricted to DRI 
programs?  I think I've seen that with 2D SDL animations as well.

> 2. The GPU is really a second processor. It is unlike the floating
> point coprocessor in the olden days because it has a different
> address space and because of the vastly different performance
> bottlenecks. So the GPU is more like a processor in its own right.
> Digging a bit deeper into that thought, you'll notice that when an
> application calls OpenGL commands, one can interpret this as
> just-in-time assembling of a GPU "program".

That's how I think of it.

> So, why not combine these two thoughts into a bigger whole. Instead
> of the current model where command buffers are submitted
> synchronously via ioctl, the userspace driver will write the command
> buffers somewhere in userspace and simply point the kernel at it
> without taking the big hardware lock.

That is _exactly_ what I had in mind.  The main detail I've been 
fretting over is how to deliver notification of command buffer 
completion.  I'm currently mulling over using a socket for that, in 
which case the indirect DMA submission might as well go over the socket 
too.

> At some point (either in an 
> ioctl when the engine is idle, or in a bottom-half handler for the
> "hardware ring buffer empty" IRQ), the kernel will look at all
> outstanding scheduled command buffers and write to the hardware ring
> buffer.

Exactly.

> Note that this idea is completely independent of hardware
> capabilities. It has the very big advantage that the big hardware
> lock is taken almost never, which reduces the potential impact of
> bugs,

Unless I missed something major, you can drop the "almost".

> and it allows proper scheduling of access to the hardware, 
> eliminates ping-ponging of the lock, etc. All in all, this design
> should behave *much* better in the face of multiple 3D apps.
>
> There are a number of problems, though:
> - Proper scheduling means that we also need proper context switching,
> including preserving all the relevant hardware state, i.e. texture,
> blending, etc. settings. This will be expensive unless we figure out
> a way for the userspace driver to communicate "reconfiguration
> points" in the command stream that contains the necessary information
> to reload state.

The kernel driver knows which task it got the command submission from, 
so it can switch to the correct context.

> - In the current DRI design, the kernel module does basically
> everything in a process context. With this design, it'll have to do
> almost everything in interrupt or bottom-half context. This alone
> brings a number of problems, such as access to the calling process'
> data.

Fortunately, the virtual pages are resolved to physical addresses when 
the process obtains the DMA buffer.  I don't think we need anything else 
from process context.

> - This design has a lot of impact on video memory management.
> Basically, I believe you'd have to use a completely new kind of
> memory management (Then again, "memory management" in DRI is really
> bad right now, so maybe that's even a bonus ;))

I've always labored under the assumption that our kernel driver would 
manage video memory, and nobody else.

> - I don't think it is possible to eliminate the big hardware lock
> completely. I'm mostly thinking about some rare operations like mode
> setting; when looking at the DRI in general, keep in mind that some
> hardware has problems when the CPU accesses video memory while the
> engine is busy. Such hardware would need the big hardware lock more
> often.

Great.  That will make our hardware look good, and broken hardware can 
use the hardware lock.

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)