On Mon, 21 Mar 2005 18:58:56 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:

> On Monday 21 March 2005 14:50, Timothy Miller wrote:
> > On Mon, 21 Mar 2005 13:51:31 -0500, Daniel Phillips wrote:
> > > This fd is currently just a generic character device and not a
> > > socket, so DRI would have to be patched along with adding our
> > > driver to the X tree, which isn't necessarily a bad idea. I can't
> > > think of any compatibility problem with changing the DRI character
> > > device(s) to a socket. The quick test is to try it and see if
> > > anything breaks.
> > >
> > > The DRI socket's job would just be to listen for connections, then
> > > create the real socket and hand it to the client.
> > >
> > > We can probably manage to set up a per-client socket connection
> > > within the existing DRI framework, and so be able to offer a driver
> > > variant that works without upgrading X/DRI, for what it's worth. I
> > > haven't tried this, and I still haven't looked at a lot of DRI
> > > code, so I can't swear it will work.
> >
> > I'm not sure I like the idea of using the read/write interface.
> > That'll most likely involve extra copies.
>
> This is only for low-volume traffic that has to be synchronized and
> checked for validity. Most (99% or more) of the traffic will go
> through indirect DMA.
>
> > Rather, I prefer to use shared memory pages, and the GL app sends
> > pointers via an ioctl interface. Zero-copy.
>
> That was one of the ideas I looked at. The tricky part is making it
> event-driven. With sockets, we already have a nice mechanism: poll.
> We could get way more creative, but it wouldn't necessarily be better.
What do you mean by "event-driven"? In any case, you can make an ioctl
sleep.

> > > ...This requires some clever register-updating mechanism, which
> > > Timothy partly described earlier: there is a queue of register
> > > update values that runs in parallel with the pipeline, and there is
> > > some means of deciding when to pull a value out of the queue into a
> > > particular register. This still leaves a bunch of questions, like:
> > > how exactly are the queue values tagged, and what do you do when
> > > more than one register needs to be updated at the same pipeline
> > > stage?
> >
> > Since I've done this before, I can answer the question directly....
> >
> > While it's perfectly valid to look at it the way you've described it,
> > I tend to think of the register writes going THROUGH the pipeline.
> > Of course, being hardware, it's also parallel to the pipeline; it's
> > just that they share synchronization logic.
> >
> > As for numbering, that's easy too: each pipeline stage has a number,
> > and each register within that stage has a number. N bits for the
> > stage number and M bits for the register number give you an ADDRESS
> > that is M+N bits long.
>
> It's a reasonable addressing scheme. I had imagined you could have a
> comparator attached to each register, which recognizes a particular
> tag (register number) and loads the register. So the pipeline stage
> number doesn't matter. There is more than one way to skin a cat. :)
>
> Anyway, that still leaves the question: what about when you have to
> load multiple registers at the same stage? For example, 16 parameter
> increments have to be loaded all at the same stage, and further down,
> two register base addresses plus (if you accept my theories on
> accurate calculation) four base values for S1, T1, S2 and T2. Some
> trick is needed.

Yeah, you load them one at a time. You can only get them into the
pipeline serially, so there's no reason to try to load more than one
register at a time.
10 registers take 10 clocks.

> I know you've already sorted this out, but it's fun to think about it
> anyway. I imagined you could either stall the pipeline for a few
> clocks until a whole set of parameters arrives, or you could send
> some of the parameters down a few clocks early, and unload them into
> temporary registers until the whole set arrives.

Ah, I see what you're talking about, I think. If a lower substage is
working on something, and the earlier substage that takes register
writes changes a register, then the register will be changed too soon.
This is particularly easy to see if the lower stage is looping, and
you change its operating data during the loop.

There are a few different ways to deal with that. One way I dealt with
it was to have register writes accepted at the earlier stage, but
passed down the pipeline of the substage, so they're always in time.
This sounds like a waste: if there are 10 substages, you have to have
10 copies of the same register. But in chip design, there's this evil
thing called wire load, which is increased by fanout. If each substage
needs to use every register (not usually the case, but just for the
sake of argument), then you can much more easily route the logic than
if you had one copy of the register and had to route it to all
targets.

The other option, of course, is to stall that part of the pipeline
until that stage is no longer busy and then start accepting writes.
I've done it that way too. It just depends on context.

> > > For non-DMA control of the card, I suggested emulating the DMA
> > > stream with an auto-incrementing PIO register.
> > > This is an inverted way of looking at the problem: instead of
> > > seeing the world in terms of PIO registers with the DMA interface
> > > built on top of them, you see the world in terms of the DMA
> > > stream, or instruction stream if you like, and provide a PIO
> > > interface capable of emulating the DMA stream for any special
> > > situations where DMA isn't possible (though I still haven't seen
> > > a clear explanation of how this might come up).
> >
> > Regardless of the fate of the registers appearing in the PIO space,
> > what you describe is a totally valid and potentially very useful
> > alternative. Add it to the list!
>
> Great, and I'll shut up about the PIO interface, which I perceive as
> a debugging interface. (Oops, was that me not shutting up about
> it? ;-)
>
> As a debugging interface, it will obviously give a lot of feedback
> very quickly. Do we have a way to stop/single-step the whole pipeline
> while we look at the registers?

That may be a useful feature, although I haven't designed it into
anything before. Usually, when we need that level of detail, we run a
simulation.

> > Normally, you'd have the DMA engine feed fifo A, which is read by
> > translation logic that then feeds fifo B, which is read by the
> > engine. The PIO method you describe gives PIO access to fifo A. If
> > we also allow PIO access directly to registers by number, then that
> > would be PIO access to fifo B.
>
> This sounds clean and simple.

That's the objective. :)

> > > The corollary is, except perhaps for debugging, there is no
> > > reason to provide read access to the internal pipeline registers.
> > > Which ought to save a little hardware and certainly will save
> > > some documentation. It also removes the need to devise a register
> > > numbering scheme, which is replaced by a queue tagging scheme
> > > that can be whatever is convenient, and can even change from rev
> > > to rev, transparently to the driver.
> >
> > I don't see why the register numbers and register tags can't be
> > exactly the same numbers.
>
> They are; the driver just doesn't care what the numbers are, because
> they are purely an internal detail as I see it. Where my "tag"
> terminology comes from: register updates in the pipeline are "tagged
> with the register number".

Yeah, I've dealt with tags like that before.

> > > > And even if it is feasible, the video memory management issues
> > > > still remain. They can obviously be reduced by allowing each
> > > > GPU program to lock memory in place so that it will not be
> > > > moved by the video memory manager until a certain part of the
> > > > program has been executed.
> > >
> > > Ideally, the DMA engine would advance its head pointer only after
> > > a drawing operation has completely cleared the pipeline. But
> > > perhaps that is too hard to implement in hardware. A reasonable
> > > alternative is to flush the pipeline before recovering a resource
> > > that is known to be in use.
> >
> > I agree with this approach. If you have to shuffle memory, you have
> > to bite the bullet and do it serially with rendering.
>
> So which is it, do we flush the pipeline to be sure a resource is
> released?

Yes.

> A half-formed alternative idea: we could have a command to delay
> advancing the ring buffer head.

I plan on things LIKE that, but some of them get too complicated for
what they're worth.

> When that command arrives, we send a special tag down the pipeline,
> and when it pops out the other end, the DMA head is updated. The
> driver sees that and carries on with releasing the resource. A
> barrier instruction of sorts.

I think you're saying what I'm thinking. Let me phrase it: you pass a
token down the pipeline. This token is sent in just after you use a
resource, so when the interrupt caused by the token arrives, that
means the resource is done being used.
But that doesn't keep you from passing in other commands after the
token that have nothing to do with the particular resource.

> > > > However, this probably isn't too big a problem: this data
> > > > shouldn't be too much, so we could just allocate a special page
> > > > for it in memory.
> > >
> > > Indeed. Even if it is a lot, that's still ok.
> >
> > Someone want to count? :)
>
> The GL state isn't more than a page or two per client, I would think.
> Resource tables will be bulkier, but it's hard to say much until
> we've established the model.
>
> There isn't really a practical limit here; the kernel driver can grab
> as many pages as it needs.

True, but we don't want to take a long time to do a context switch.
Actually, it's likely that the state of two apps will have some
overlap, so all you have to do is compare the state from the last
process to the current one and send any differences.

_______________________________________________
Open-graphics mailing list
[EMAIL PROTECTED]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
