On Monday 21 March 2005 07:59, Nicolai Haehnle wrote:
> > > Instead of the current model where command buffers are submitted
> > > synchronously via ioctl, the userspace driver will write the
> > > command buffers somewhere in userspace and simply point the
> > > kernel at it without taking the big hardware lock.
> >
> > That is _exactly_ what I had in mind.  The main detail I've been
> > fretting over is how to deliver notification of command buffer
> > completion.  I'm currently mulling over using a socket for that, in
> > which case the indirect DMA submission might as well go over the
> > socket too.
>
> DRI drivers already open an fd (/dev/dri/*) to send ioctls. Reading
> and writing from this fd is currently not used, so this is a good
> candidate IMO.

This fd is currently just a generic character device and not a socket, 
so DRI would have to be patched along with adding our driver to the X 
tree, which isn't necessarily a bad idea.  I can't think of any 
compatibility problem with changing the DRI character device(s) to a 
socket.  The quick test is to try it and see if anything breaks.

The DRI socket's job would just be to listen for connections, then 
create the real socket and hand it to the client.

We can probably manage to set up a per-client socket connection within 
the existing DRI framework, and so be able to offer a driver variant 
that works without upgrading X/DRI, for what it's worth.  I haven't 
tried this, and I still haven't looked at a lot of DRI code, so I can't 
swear it will work.

> > > - Proper scheduling means that we also need proper context
> > > switching, including preserving all the relevant hardware state,
> > > i.e. texture, blending, etc. settings. This will be expensive
> > > unless we figure out a way for the userspace driver to
> > > communicate "reconfiguration points" in the command stream that
> > > contains the necessary information to reload state.
> >
> > The kernel driver knows which task it got the command submission
> > from, so it can switch to the correct context.
>
> Yes, but it can be expensive. When the kernel switches contexts, it
> must make sure that all the on-card registers (I know you don't like
> "register writes", but that's what they are)

Well let me clear something up: I'm not quite that dense ;-)  I call 
those the "real" registers, and of course you can't build anything 
useful without them.  The registers that sit somewhere on the engine 
pipeline are the ones we're interested in at the moment.  We can't 
update a pipeline register whenever we want, we must arrange to update 
it at exactly the right time: after data already in the pipeline has 
progressed past that stage and before that stage processes any data 
that relies on the new setting.

This requires some clever register updating mechanism, which Timothy 
partly described earlier: there is a queue of register update values 
that runs in parallel with the pipeline and there is some means of 
deciding when to pull a value out of the queue into a particular 
register.  This still leaves a bunch of questions, like: how exactly 
are the queue values tagged, and what do you do when more than one 
register needs to be updated at the same pipeline stage?

Anyway, the point of this is, the 'real' registers aren't accessible in 
any simple way, either for writing or reading.  So if a set of 
registers is available via PIO, it is either an indirect interface to 

the real registers or it is completely unsynchronized with the 
pipeline, and it is up to software to compensate, probably by flushing 
the pipeline before any register read or write.  This is the main 
reason I think that exposing a drawing register interface is worse than 
useless.

For non-DMA control of the card, I suggested emulating the DMA stream 
with an auto-incrementing PIO register.  This is an inverted way of 
looking at the problem: instead of seeing the world in terms of PIO 
registers with the DMA interface built on top of them, you see the 
world in terms of the DMA stream, or instruction stream if you like, 
and provide a PIO interface capable of emulating the DMA stream for any 
special situations where DMA isn't possible (though I still haven't 
seen a clear explanation of how this might come up).

> that reflect OpenGL 
> state (blending, Z test, texturing, texture environments, ...) are
> correctly preserved.

That is easy: the driver will keep the OpenGL state in host memory.  It 
would be perverse to try to read the GL state from the engine 
registers, since the 'real' registers reflect the GL state as of some 
point in the past.  On the other hand, if the host remembers the GL 
state it is available immediately and efficiently.  It does not make a 
bit of sense to try to use the card as some sort of auxiliary, 
out-of-sync state memory.

The corollary is, except perhaps for debugging, there is no reason to 
provide read access to the internal pipeline registers, which ought to 
save a little hardware and certainly will save some documentation.  It 
also removes the need to devise a register numbering scheme, which is 
replaced by a queue tagging scheme that can be whatever is convenient, 
and can even change from rev to rev, transparently to the driver.

> It must also make sure that all the referenced 
> textures, including offscreen rendering targets, are in place and not
> swapped out.
>
> This means that the kernel must keep track of which memory areas each
> "GPU program" currently references and what all the registers
> contain. You can't just wave that away.

Yes.  This leads us back to the discussion of resource handles, I 
think.  Perhaps you'd care to weigh in on that?

> Some hardware support for writing/reading register states into a
> predefined area of video memory could help a lot here, but I don't
> know if that's feasible.

I don't see why it's needed, since the driver can remember all the state 
it sent to the card.  It might be nice to have a way for the driver to 
confirm that its view of drawing state matches the card's view, for 
debugging purposes.

For some non-pipeline state, direct r/w PIO access makes perfect sense, 
for example, the cursor position register.

> And even if it is feasible, the video memory management issues still
> remain. They can obviously be reduced by allowing each GPU program to
> lock memory in place so that it will not be moved by the video memory
> manager until a certain part of the program has been executed.

Ideally, the DMA engine would advance its head pointer only after a 
drawing operation has completely cleared the pipeline.  But perhaps 
that is too hard to implement in hardware.  A reasonable alternative is 
to flush the pipeline before recovering a resource that is known to be 
in use.

> > > - In the current DRI design, the kernel module does basically
> > > everything in a process context. With this design, it'll have to
> > > do almost everything in interrupt or bottomhalf context. This
> > > alone brings a number of problems, such as access to the calling
> > > process' data.
> >
> > Fortunately, the virtual pages are resolved to physical when the
> > process obtains the dma buffer.  I don't think we need anything
> > else from process context.
>
> There is some meta information about the GPU program that doesn't fit
> into the DMA buffer itself. For example, with an in-kernel memory
> manager, we must communicate which textures and rendering targets the
> program currently needs and when they become "unlocked".
> Think of it like this: All userspace clients will need the kernel to
> issue some direct DMA commands for them: Memory moves, i.e. memory
> management stuff, and "calls" to indirect DMA. The meta information
> that is not in DMA memory reflects the direct DMA commands that
> userspace needs the kernel to issue.

My assumption is that the kernel driver keeps all the necessary state 
on behalf of each client.  The client _updates_ the state in process 
context, and the kernel driver accesses it through its kernel-space 
mapping.

> However, this probably isn't too big a problem: This data shouldn't
> be too much, so we could just allocate a special page for it in
> memory.

Indeed.  Even if it is a lot, that's still ok.

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
