On Tuesday 22 March 2005 04:32, Daniel Phillips wrote:
> On Monday 21 March 2005 18:50, Nicolai Haehnle wrote:
> > On Monday 21 March 2005 19:51, I wrote:
> > > The DRI socket's job would just be to listen for connections, then
> > > create the real socket and hand it to the client.
> > >
> > > We can probably manage to set up a per-client socket connection
> > > within the existing DRI framework, and so be able to offer a driver
> > > variant that works without upgrading X/DRI, for what it's worth. I
> > > haven't tried this, and I still haven't looked at a lot of DRI
> > > code, so I can't swear it will work.
> >
> > Stop right there. You have just blown this whole thing up an order of
> > magnitude in complexity without a good reason.
>
> Why do you think that using a socket is complex? (In my experience, it
> is not.)
We are not doing userspace programming here. We're communicating between
the *kernel* and *userspace*, not between two userspace apps. Why on
earth do you want to force socket semantics onto this when we don't need
them? The notion of "address family" has exactly zero meaning here.

> > What exactly do you want to achieve? I thought you just wanted a way
> > for the kernel to notify userspace when some event happened in the
> > GPU program. Most of this can be done using the classic ioctl model
> > (think "wait_for_xyz" ioctls like all drivers already use) or shared
> > memory or a combination of both.
>
> Wait in the kernel so the task can't do anything else? Kind of crude...

Note that with wait_for_xyz ioctls, it is *entirely* up to the
application whether it wants to wait. We never want to wait unless the
buffers become full or we want to do readback, and we never have to wait
with an ioctl-based interface (if it works asynchronously with submitted
GPU programs, as I suggested at the beginning of this thread).

Now, if one of those things (buffer low or readback) happens, there is
no point in going event-based unless the APIs that are available to
applications (i.e. OpenGL) are event-based. As far as I know, there are
no event-based APIs in that area, so unless we come up with our own
extensions for that, event-based simply isn't an issue.

> > The only situation where this *isn't* enough is if we ever find the
> > need for a fully event-based model, because in that situation we need
> > to poll() or select() on multiple event sources - where one of them
> > is the DRI file. But we can easily extend the current DRI file to be
> > a file that can be waited on by userspace. No need to go crazy with
> > sockets here...
>
> Right, you know about poll, but you think using it is complex.

No. I think that changing the DRI file into a socket, instead of just
implementing the minor stuff needed to make poll and select work, is
absolutely insane.
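To make the "wait only when asked" point concrete, here is a minimal
userspace sketch of the wait_for_xyz pattern. All names (`drm_fence`,
`fence_signaled`, `fence_wait`) are invented for illustration; a real
driver would sleep on a wait queue and advance the completed counter
from an interrupt handler, not spin in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence: a sequence counter the GPU advances as it
 * finishes work.  The application polls or waits at its discretion. */
struct drm_fence {
    uint32_t emitted;   /* last sequence number handed to the GPU */
    uint32_t completed; /* last sequence number the GPU finished  */
};

/* Nonblocking query, as a cheap "is it done yet?" ioctl might expose.
 * The signed subtraction keeps the comparison correct across
 * counter wraparound. */
static bool fence_signaled(const struct drm_fence *f, uint32_t seq)
{
    return (int32_t)(f->completed - seq) >= 0;
}

/* Blocking wait, as a wait_for_xyz ioctl might expose.  In a real
 * driver this would sleep; here the loop body stands in for "the GPU
 * makes progress while we sleep". */
static void fence_wait(struct drm_fence *f, uint32_t seq)
{
    while (!fence_signaled(f, seq))
        f->completed++; /* stand-in for GPU progress */
}
```

The application calls `fence_signaled()` when it merely wants to know,
and `fence_wait()` only in the two cases named above: buffers full or
readback.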
> > For the record, I don't think such a fully event-based model is even
> > needed for an OpenGL implementation, unless we come up with some
> > really fancy new extensions.
>
> Of course, nothing forces you to use it in an event style. You can just
> write to and read from the socket, blocking in the kernel just like a
> blocking ioctl, only without sucking as much because the read/write
> interface is cleaner (note that ioctls do copy to/from user just like
> read/write).

I don't think read/write is cleaner. Read/write requires
(de)marshalling of command packets, whereas with ioctls the requested
command is implied in the ioctl number.

> If you want to implement a fully asynchronous model, which I believe we
> do, the blocking ioctl interface just doesn't cut it. You suggest:

I repeat: ioctls only block when we want them to block.

> * DRM allocates DMA buffer B
> * application draws via DRM into B until full
> * DRM submits B and waits for completion
> * <engine is idle here>
> * DRM wakes up and returns control to application
> * repeat

I have never suggested that. In fact, the entire point of the "GPU
program" idea is to avoid stalls like that, both on the CPU and on the
GPU side.

> If the drawing task takes some time to wake up, you may see a
> noticeable stall, and the card bandwidth isn't fully used.
>
> Now consider:
>
> * DRM allocates two DMA buffers, A and B
> * application draws via DRM into A until full
> * DRM submits A
> * draw into B via DRM until full
> * DRM submits B and waits for socket data
> * DRM wakes up and receives completion for A
> * draw into A via DRM until full
> * DRM submits A and waits for socket data
> * DRM wakes up and receives completion for B
> * repeat
>
> With this interface style, the drawing pipeline is never idle. This
> goodness is realized even though the drawing task is inherently
> linear - we haven't even done anything fancy with poll yet.

And we can do all this with ioctls.
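The two-buffer scheme above is interface-agnostic; the following sketch
simulates its control flow with plain counters. Everything here
(`submit_stats`, `run_frames`, the two-buffer limit) is invented for
illustration; the point is only that the client waits solely when both
buffers are already in flight, which is exactly a blocking
wait_for-completion ioctl, not a socket.

```c
#include <assert.h>

enum { NBUF = 2 }; /* the A/B pair from the example above */

struct submit_stats {
    int submitted;
    int completed;
    int waits; /* times the client blocked because no buffer was free */
};

/* Simulate submitting `frames` filled buffers through a two-deep
 * pipeline.  Completion here is instantaneous once we wait; a real
 * GPU would complete buffers asynchronously. */
static void run_frames(struct submit_stats *s, int frames)
{
    int in_flight = 0;
    for (int i = 0; i < frames; i++) {
        if (in_flight == NBUF) {
            /* both buffers busy: block (wait_for ioctl) until the
             * oldest submission completes */
            s->completed++;
            in_flight--;
            s->waits++;
        }
        /* fill the free buffer and submit it without blocking */
        s->submitted++;
        in_flight++;
    }
    /* drain remaining in-flight buffers at the end */
    s->completed += in_flight;
}
```

Note the first two submissions never block at all: waiting happens only
once the pipeline is full, which is the desired behavior in both the
socket and the ioctl formulation.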
> > Now for the state register discussion:
> > > My assumption is, the kernel driver keeps all the necessary state
> > > on behalf of each client. The client _updates_ the state in
> > > process context, the kernel driver accesses the state via kernel
> > > address.
> >
> > And this is exactly the problem. How are state changes submitted by
> > the client? The direct path would be to write all
> > (non-safety-critical) state changes directly into an indirect DMA
> > buffer.
>
> Nothing says the client can't write its state to the card via indirect
> DMA state commands and also to kernel memory.
>
> For efficiency, the client would only supply the hardware state deltas.
> The kernel associates the deltas with the buffer, and there is also a
> cumulative state buffer for the client. As each indirect buffer
> completes, the kernel applies the associated deltas to the cumulative
> state. To switch contexts, the kernel compares two hardware contexts
> and submits the differences as state commands via the command ring
> buffer.
>
> Does this sound complex? It is, a little. But it avoids having to read
> state from the hardware and it costs only a few bytes of state deltas
> on each buffer submission.

Okay, this sounds very similar to my solution number 3, except that
instead of posting the complete state, userspace only posts deltas and
the kernel keeps track of them internally.

[snip]

> > It's not that there aren't any solutions to this problem. It's just
> > that it is far from obvious to me what the right solution is.
>
> Same here. Now, are we going to allow a context switch in the middle of
> processing an indirect DMA buffer, or only between indirect buffers?
> The latter is considerably easier, but we are then at the mercy of the
> client to submit reasonably granular indirect buffers.

Only support context switches between indirect buffers, and limit the
size of indirect buffers to something reasonable.
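The delta-tracking scheme quoted above can be sketched in a few lines.
The register count, mask layout, and function names below are all
invented for illustration; the real register file and command encoding
would of course be hardware-specific.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 8 }; /* toy register file; real hardware has far more */

/* Cumulative per-client hardware state, kept in kernel memory. */
struct hw_context {
    uint32_t reg[NREGS];
};

/* Delta attached to one indirect buffer: only the changed registers. */
struct state_delta {
    uint32_t mask;        /* bit i set => reg[i] was changed */
    uint32_t val[NREGS];  /* new values for the set bits      */
};

/* Applied by the kernel as each indirect buffer completes, keeping the
 * cumulative context in sync with what the buffer wrote to the card. */
static void apply_delta(struct hw_context *ctx, const struct state_delta *d)
{
    for (int i = 0; i < NREGS; i++)
        if (d->mask & (1u << i))
            ctx->reg[i] = d->val[i];
}

/* Context switch: emit only the registers that differ between the
 * outgoing and incoming contexts.  Returns how many state commands
 * would go into the ring buffer; real code would queue them here. */
static int emit_switch(const struct hw_context *from,
                       const struct hw_context *to)
{
    int writes = 0;
    for (int i = 0; i < NREGS; i++)
        if (from->reg[i] != to->reg[i])
            writes++;
    return writes;
}
```

This shows why the scheme avoids reading state back from the hardware:
the cumulative context is always authoritative, and a switch costs only
the diff between two in-kernel copies.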
There's obviously a tradeoff here, but if an application abuses the
system by sending huge copy operations to the card, the GPU scheduler
can penalize it.

cu,
Nicolai
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
