Re: [Open-graphics] Cheap trick for infinite ring-buffer scatter-gather

Timothy Miller Fri, 18 Mar 2005 18:59:45 -0800

On Fri, 18 Mar 2005 16:16:55 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Friday 18 March 2005 11:53, Timothy Miller wrote:
> > On Fri, 18 Mar 2005 02:13:56 -0500, Daniel Phillips wrote:
> > > Seriously, if you have one PIO gizmo capable of uploading N words of
> > > data, why not use that for every different vector that needs uploading
> > > via PIO?  Instead of calling this the "cursor upload register" I should
> > > really just call it what it is, the data register.
> >
> > There are some things, like perhaps the cursor glyph, which will be done
> > via indirect access.  One register (address in the memory-mapped
> > engine space) is an index into the cursor, and the other is where you
> > write the data, and it auto-increments the index.
> 
> I see that this will work, but I don't see why it is better than poking each
> value into a PIO register one at a time.  Isn't this just an unnecessary
> level of indirection at the hardware level that the driver can handle
> perfectly well?


Maybe.  It's just that chip designers seem to like to use an extra
level of indirection for things that are out-of-the-way sorts of
things to access.  For things that you don't access often that would
take up a lot of register space, it only helps to make it indirect. 
For instance, consider the microcode for the video controller. 
Definitely going to be indirect.
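A small Python model of the index/data pair described above, assuming a word-addressed backing array and an index that wraps at the array size. The class and method names are hypothetical, not the chip's register map. It shows why indirection saves aperture space: two addresses cover an array of any length.

```python
class IndirectPort:
    """Index/data register pair fronting a large on-chip array.

    Only two addresses in the engine aperture are consumed, however big the
    array behind them is (cursor glyph, video-controller microcode, ...).
    """

    def __init__(self, size):
        self.backing = [0] * size
        self.index = 0

    def write_index(self, value):
        # Writing the index register selects the starting word.
        self.index = value % len(self.backing)

    def _advance(self):
        # Auto-increment, wrapping like a fixed-width hardware counter (assumed).
        self.index = (self.index + 1) % len(self.backing)

    def write_data(self, value):
        self.backing[self.index] = value
        self._advance()

    def read_data(self):
        value = self.backing[self.index]
        self._advance()
        return value


# Load four words of a hypothetical cursor glyph starting at word 8.
port = IndirectPort(64)
port.write_index(8)
for word in (0x0F, 0xF0, 0xFF, 0x00):
    port.write_data(word)
```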

> 
> > But most things won't be like that.  To write to, say, dX1dY is a
> > simple pointer dereference, having a different address from dX2dY.
> 
> I still do not see why it is good to have, e.g., gradients as PIO registers.
> These are properly command fields, or derived from other command fields.

See my other email.
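For contrast, the direct case quoted above (dX1dY and dX2dY at distinct addresses) is a single store per register. This Python sketch uses a bytearray in place of the mapped aperture; the offsets and the 16.16 fixed-point format are assumptions for illustration. In a C driver each write_reg call would be a volatile 32-bit store through the mapped pointer.

```python
import struct

# Hypothetical byte offsets within the memory-mapped engine aperture.
REG_DX1DY = 0x100
REG_DX2DY = 0x104

aperture = bytearray(4096)  # stands in for the mapped register space


def write_reg(offset, value):
    # A direct register write is one little-endian 32-bit store at a fixed address.
    struct.pack_into("<I", aperture, offset, value & 0xFFFFFFFF)


def read_reg(offset):
    return struct.unpack_from("<I", aperture, offset)[0]


write_reg(REG_DX1DY, 0x00010000)  # slope 1.0 in an assumed 16.16 fixed-point format
write_reg(REG_DX2DY, 0xFFFF8000)  # -0.5 in the same format (two's complement)
```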

> > Some addresses, like ones which target the rendering pipeline, dump
> > into a FIFO.  Some, like updating the cursor position, go direct.
> 
> Yes, understood.  In essence, the cursor acts like a completely separate
> piece of hardware.  This is good.  I'm just quibbling over matters of taste
> in the implementation.
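The split between queued and immediate writes can be modeled as an address decoder. In this Python sketch the boundary FIFO_BASE, the register offsets, and the FIFO depth are all hypothetical; the point is only that a cursor move takes effect at once while a pipeline write waits behind earlier drawing.

```python
from collections import deque

# Hypothetical address split: offsets at or above FIFO_BASE feed the rendering
# pipeline through a FIFO; lower offsets (e.g. cursor position) act immediately.
FIFO_BASE = 0x1000
REG_CURSOR_XY = 0x0040
REG_DRAW_X0 = 0x1004


class EngineDecoder:
    def __init__(self, fifo_depth=16):
        self.direct = {}       # registers that take effect on the write itself
        self.fifo = deque()    # (offset, value) pairs awaiting the pipeline
        self.fifo_depth = fifo_depth

    def write(self, offset, value):
        if offset >= FIFO_BASE:
            if len(self.fifo) >= self.fifo_depth:
                raise RuntimeError("FIFO overflow: check free space before writing")
            self.fifo.append((offset, value))
        else:
            self.direct[offset] = value

    def drain_one(self):
        # The pipeline consumes queued writes strictly in order.
        return self.fifo.popleft()


dec = EngineDecoder()
dec.write(REG_CURSOR_XY, (100 << 16) | 50)  # cursor moves now
dec.write(REG_DRAW_X0, 7)                   # queued behind earlier drawing
```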
> 
> > > > The cursor is only one of numerous things that will be accessible via
> > > > PIO.
> > >
> > > Right.  So I would like to suggest a two register interface that
> > > handles uploading any vector.  You write the number of the thing you
> > > want to upload and how many words it is into one register, then you
> > > write the words into the other (the data register).  This is
> > > economical, robust and extensible.
> >
> > We can really make the engine aperture as large as we want.  There's
> > no reason to do everything indirect like that.
> 
> It is not indirect.  I must have failed to explain. It is just a succession
> of pokes into a single data register, autoincrementing the on-card
> destination.  

Below this point, you lose me.

> Poking the "number of the thing" into the control register
> selects which card vector we are loading and initializes the autoincrement.
> This is about as simple as you can get, and general to boot.
> 
> Sure, at the moment, there are only two vectors needing loading, and that is
> on the assumption that you eventually buy my argument about having a page
> vector for the ring buffer.  But that is enough to justify it, and in
> future, we might come up with other vectors that want sequential loading.
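A minimal Python sketch of the control/data upload interface described above. The packing of vector number and word count into one control word, the vector numbers, and the page addresses are assumptions made for illustration.

```python
# Hypothetical vector numbers; the thread mentions the cursor and a ring-buffer page vector.
VEC_CURSOR = 0
VEC_RING_PAGES = 1


class VectorUploader:
    """Control register + data register.

    Control word layout (assumed): bits 31..16 = vector number, bits 15..0 = word count.
    """

    def __init__(self, sizes):
        self.vectors = {vid: [0] * n for vid, n in sizes.items()}
        self.target = None
        self.pos = 0
        self.remaining = 0

    def write_control(self, value):
        vector_id, count = value >> 16, value & 0xFFFF
        if count > len(self.vectors[vector_id]):
            raise ValueError("count exceeds vector length")
        # Select the destination and reset the autoincrement to word 0.
        self.target, self.pos, self.remaining = vector_id, 0, count

    def write_data(self, word):
        if self.remaining == 0:
            raise RuntimeError("data write with no upload armed")
        self.vectors[self.target][self.pos] = word
        self.pos += 1
        self.remaining -= 1


def upload(dev, vector_id, words):
    """Driver side: one control poke, then a run of data pokes."""
    dev.write_control((vector_id << 16) | len(words))
    for w in words:
        dev.write_data(w)


dev = VectorUploader({VEC_CURSOR: 8, VEC_RING_PAGES: 4})
upload(dev, VEC_RING_PAGES, [0x1000, 0x5000, 0x9000])  # made-up page addresses
```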
> 
> > > > All engine registers (for
> > > > reading), the engine write fifo, debug, status, interrupt control,
> > > > etc., etc. will be accessible via PIO.  Basically, via PIO, you'll be
> > > > able to access all privileged and unprivileged registers.
> > >
> > > I see why you'd want to read engine registers by PIO (since there is no
> > > other way) but not why you would want to write them, except for
> > > debugging.  Doing 2D graphics via PIO makes zero sense to me.
> >
> > The first X11 driver prototype will use all PIO to control the engine.
> >  Why?  Because it's quickest and easiest to do it that way.  We need
> > to ramp up quickly so that we can start looking for hardware bugs.
> 
> The DMA buffers are an essential component.  Without them, we don't have a
> card in my opinion.  So why not debug that right from the beginning?
> Getting the interface and synchronization to the card working reliably and
> efficiently at the driver level is a long lead item.  It doesn't make sense
> to delay it.

It doesn't make sense to delay initial ramp-up in the testing,
especially when some of our test environments make it impossible or
inconvenient to do DMA.

Also, what if you have a bug that you're trying to work out, and being
able to poke one register at a time helps with that?

Also, I'm betting one or more of the interfaces we'll have to
implement for embedded systems doesn't support DMA.

> 
> If you want to have debug registers so you can see what got loaded via DMA,
> fine.  I'm just violently objecting to letting this concept evolve into a
> supported interface.

I'm not sure how I can get rid of it.  There will be cases where it
can't be avoided.

> > Plus, when the BIOS and kernel driver talk to the engine, they're not
> > performance critical, so the best thing to do is just do everything by
> > PIO.
> 
> Ah, BIOS, I forgot about that.  Are we providing register compatibility or
> not?  (Not is better, imho.)  Anyway, how hard is it for our bios to submit
> commands via DMA?  It sounds like a subroutine to me.  But do we even need
> that?  To draw the glyphs, all we need is access to video memory as far as
> I can see.
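To illustrate the point that glyph drawing needs only video memory access, here is a software glyph expansion into a linear 32-bit framebuffer, modeled in Python with a flat list. The 8-pixel glyph width, the bit order, and the pitch are assumptions.

```python
def draw_glyph(fb, pitch, x, y, glyph_rows, fg, bg):
    """Expand an 8-pixel-wide, 1-bit-per-pixel glyph into a 32-bit framebuffer.

    fb is a flat list standing in for mapped video memory; pitch is in pixels.
    No drawing engine is involved: every pixel is a plain memory store.
    """
    for row, bits in enumerate(glyph_rows):
        base = (y + row) * pitch + x
        for col in range(8):
            # Bit 7 is taken as the leftmost pixel (a common, but assumed, convention).
            fb[base + col] = fg if bits & (0x80 >> col) else bg


PITCH = 16
fb = [0] * (PITCH * 8)
draw_glyph(fb, PITCH, 4, 2, [0x80, 0x01], fg=0xFFFFFF, bg=0x000000)
```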

You can't do DMA from the BIOS, because you cannot allocate memory for
it.  There's no memory management, and there's nothing to prevent
anything else from clobbering the buffer.  Even if we were to risk it,
it's a horrible kludge.  PIO access is a GOOD THING when you need it.

> > Only when you are doing something performance-critical does DMA
> > become any kind of help.
> 
> To risk an analogy duel: just like a 3D engine has now become an acceptable
> way to do 2D, even if overkill for many situations, command DMA is now an
> acceptable way to accomplish what used to be done via PIO registers.
> Progress marches on.  The PIO graphics model is outmoded, let's cut the
> cord.

I don't think we can.

> > > > The only
> > > > reason DMA can't get at privileged registers is because the command
> > > > packets will always refer to registers implicitly.
> > >
> > > Of course.  But that should be enough.  If you think that having a PIO
> > > interface to write the internal registers will speed up debugging,
> > > great then, it's worth it.  But this interface should not be exposed as
> > > a drawing interface.  There is no proper synchronization for one thing.
> > > We already have a way to draw for another, so what is the advantage of
> > > having two ways to draw, that race with each other?
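Implicit register addressing can be sketched as a decoder in which each opcode implies a fixed list of target registers, so a packet has no field that could name a privileged register. The opcodes, register names, and the PRIVILEGED set below are hypothetical; the thread does not specify the packet format.

```python
# Hypothetical opcodes, each implying a fixed list of target registers.
PACKET_REGS = {
    0x01: ("dX1dY", "dX2dY"),          # set edge gradients
    0x02: ("x0", "y0", "x1", "y1"),    # draw rectangle
}
# Registers reachable only by PIO from the kernel (hypothetical names).
PRIVILEGED = {"dma_base", "interrupt_mask"}


def decode_packet(words, regs):
    """Apply one command packet: the header word is the opcode, payload words follow.

    The packet never carries a register address; the opcode implies the targets,
    so nothing in PRIVILEGED can be named from the command stream.
    """
    opcode, payload = words[0], words[1:]
    targets = PACKET_REGS.get(opcode)
    if targets is None or len(payload) != len(targets):
        raise ValueError("malformed packet")
    for name, value in zip(targets, payload):
        regs[name] = value


engine_regs = {}
decode_packet([0x01, 0x00010000, 0xFFFF8000], engine_regs)
```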
> >
> > The PIO interface to the drawing engine will have all the
> > synchronization you need.  TROZ is all PIO, and it's optimized to be
> > efficient that way.  Of course, it's also 2D only, but you get the
> > point.
> 
> Yes, I know it will work.  I just hate cruft, and I see a legacy 2D-oriented
> PIO alternate interface as cruft.
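One common way to give a PIO path its synchronization is to poll a FIFO free-space status register before each burst of writes. This Python sketch assumes such a status register exists and models the engine retiring one queued entry per poll; the register and its semantics are assumptions, not TROZ's actual interface.

```python
class PioEngine:
    """Toy engine with a command FIFO and a 'free entries' status register."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = []
        self.executed = []

    def read_free_entries(self):
        # Model the engine retiring one queued word between status polls.
        if self.queue:
            self.executed.append(self.queue.pop(0))
        return self.depth - len(self.queue)

    def write_fifo(self, word):
        if len(self.queue) >= self.depth:
            raise RuntimeError("FIFO overflow")
        self.queue.append(word)


def pio_submit(engine, words):
    """Never write more words than the status register says will fit."""
    i = 0
    while i < len(words):
        free = engine.read_free_entries()
        for _ in range(min(free, len(words) - i)):
            engine.write_fifo(words[i])
            i += 1


eng = PioEngine(depth=4)
pio_submit(eng, list(range(10)))
```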
> 
> > I've noticed in some X.org drivers that some XAA functions will do
> > PIO, and others will use the ring buffer.  Like for instance, a line
> > segment will be started by PIO, but they'll use DMA for putimage.
> 
> Yuck!

I said the same thing.

> > For our chip, I would suggest always using DMA after X11 starts, but I
> > bet fbconsole will be done using PIO.
> 
> Why does fbconsole have to do anything fancier than poking video memory?

Scrolling and other kinds of shifting text.  I find it sad that some
fbconsole drivers don't bother to accelerate text drawing.  It would
be loads faster that way.  At least they do scrolling, though.
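The cost that acceleration removes is easy to see in a software scroll, which copies nearly the whole framebuffer through the CPU. A Python sketch over a bytearray, with the pitch in bytes and the scroll distance in scanlines (the function name and parameters are illustrative):

```python
def scroll_up(fb, pitch_bytes, height, rows, fill=0):
    """Software scroll of a linear framebuffer by `rows` scanlines.

    Every byte below the scrolled-off band is copied through the CPU; an
    accelerated fbconsole driver would instead queue one screen-to-screen
    blit plus one rectangle fill. Returns the number of bytes copied.
    """
    moved = (height - rows) * pitch_bytes
    # The right-hand slice is copied before assignment, so the overlap is safe.
    fb[0:moved] = fb[rows * pitch_bytes:height * pitch_bytes]
    fb[moved:height * pitch_bytes] = bytes([fill]) * (rows * pitch_bytes)
    return moved


fb = bytearray([1, 1, 2, 2, 3, 3, 4, 4])  # 4 scanlines, 2 bytes each
copied = scroll_up(fb, pitch_bytes=2, height=4, rows=1)
```

For example, at an assumed 1024x768 mode with 32 bits per pixel (pitch 4096 bytes), scrolling one 16-scanline text row copies (768 - 16) * 4096 = 3,080,192 bytes.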

> > > > BTW, you'll also be able to put rendering command packets into the
> > > > ring buffer.
> > >
> > > Yes, I remember that from way back.  We can get by without it,
> > > obviously.  Maybe the X server would use this feature to avoid some
> > > indirect DMA setup commands, which might matter for some loads (I have
> > > my doubts).  But then you introduce the need for userspace locking
> > > around the DMA write register, which isn't needed if only the kernel
> > > touches that register.
> >
> > These are all problems that can be solved in software; all the
> > possibilities will be available in hardware, and people can experiment
> > with them.
> 
> Sure, let me stop bleating about that.  It is easy to provide and somebody
> might find a use for it, though I don't see any use at present.  The
> problem is, if you have more than one task writing to the ring buffer, then
> we need to synchronize access.  DRI provides a locking model that can be
> used for this.  But we are within a hair's breadth of being able to bypass
> that odious locking and let each drawing task be truly asynchronous,
> depending only on the kernel module to sort out the indirect DMA
> submissions and switch to the correct drawing context.  I think we ought to
> go the distance.  That will get some attention.
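The synchronization problem with several writers can be sketched as a lock held across the space check, the copy, and the write-pointer update; without it, two tasks could interleave words of their packets. In this Python model threading.Lock stands in for whatever lock (for example the DRI lock) the driver would use, and the ring size and packet format are hypothetical.

```python
import threading


class RingBuffer:
    """Shared command ring: producers serialize reserve + copy + pointer update."""

    def __init__(self, size):
        self.buf = [None] * size
        self.size = size
        self.wptr = 0  # next slot the host writes
        self.rptr = 0  # next slot the engine would consume (not advanced in this model)
        self.lock = threading.Lock()

    def submit(self, packet):
        with self.lock:
            used = (self.wptr - self.rptr) % self.size
            # Keep one slot empty so a full ring is distinguishable from an empty one.
            if used + len(packet) >= self.size:
                raise BufferError("ring full")
            for word in packet:
                self.buf[self.wptr] = word
                self.wptr = (self.wptr + 1) % self.size
            # A real driver would now write wptr to the card's ring-tail register.


ring = RingBuffer(1024)
```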
> 
> > > 3D applications will always use indirect DMA, first because it is more
> > > capable, and second because they aren't allowed to use direct DMA.
> >
> > Indirect DMA isn't more capable.  It's just easier to schedule.
> 
> That's what I mean by "more capable".  Also, it is easy to allocate DMA
> buffers of the right size, because the knowledge is in the right place.  As
> opposed to trying to automagically resize the command buffer.
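The sizing advantage of indirect DMA can be sketched as follows: each client fills a buffer of exactly the length it needs, and the ring carries only a small descriptor pointing at it. The descriptor fields (including the drawing-context number), the bump allocator, and word-addressed memory are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndirectDescriptor:
    bus_addr: int  # where the client's command buffer starts (word address, assumed)
    length: int    # in words, known exactly by the client that filled it
    context: int   # drawing context the kernel should switch to first (hypothetical)


class BumpAllocator:
    """Trivial stand-in for a DMA-able memory allocator."""

    def __init__(self, words):
        self.memory = [0] * words
        self.next = 0

    def alloc(self, n):
        if self.next + n > len(self.memory):
            raise MemoryError("out of DMA memory")
        addr = self.next
        self.next += n
        return addr


def build_descriptor(allocator, commands, context):
    """Place a client's commands in their own buffer and return the ring entry."""
    addr = allocator.alloc(len(commands))
    allocator.memory[addr:addr + len(commands)] = commands
    return IndirectDescriptor(addr, len(commands), context)


def fetch(memory, desc):
    """What the engine would read when it reaches this ring entry."""
    return memory[desc.bus_addr:desc.bus_addr + desc.length]


alloc = BumpAllocator(256)
d1 = build_descriptor(alloc, [1, 2, 3], context=7)
d2 = build_descriptor(alloc, [9] * 5, context=3)
```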
> 
> > > I see what you're thinking: you have one command decoder and it doesn't
> > > care what the source is, however you selectively suppress a few command
> > > types depending on the source.  But that sounds like a solution looking
> > > for a problem.  You can easily separate the two command sets cleanly.
> > > Nobody has shown a command that needs to be available in both direct
> > > and indirect DMA mode, except for your direct DMA drawing.  As you may
> > > have noticed, I am not wild about that feature.
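The single-decoder approach described above amounts to one opcode table plus a per-source filter. A Python sketch, with hypothetical opcode values and command classes:

```python
from enum import Enum


class Source(Enum):
    RING = "ring"          # kernel-owned direct DMA through the ring buffer
    INDIRECT = "indirect"  # user-filled indirect DMA buffers

# Hypothetical opcode classes.
DRAW_OPS = {0x10, 0x11, 0x12}
KERNEL_OPS = {0x80, 0x81}  # e.g. launch an indirect buffer, switch context

ALLOWED = {
    Source.RING: DRAW_OPS | KERNEL_OPS,
    Source.INDIRECT: DRAW_OPS,
}


def accept(opcode, source):
    """One decoder for both paths; only the permitted set differs by source."""
    return opcode in ALLOWED[source]
```

Dropping direct DMA drawing, as suggested, would shrink ALLOWED[Source.RING] to KERNEL_OPS, leaving two disjoint command sets.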
> >
> > I see your point, and remember, things are subject to change.
> 
> Please just forget I ever threw tomatoes at the direct DMA drawing
> feature ;)
> 
> It is not a bad thing, whether or not anybody uses it.  PIO drawing on the
> other hand, is a bad thing.  I see it as a make-work project.  It's another
> interface to design and document, another interface to write support for,
> another interface to implement and maintain, and worst of all, it is an
> obsolete interface that will eventually diverge from our command stream
> model.  Then we will either have to do unnatural things to force some
> semblance of backward compatibility or drop it.  I suggest looking into the
> future and dropping it now, except perhaps as a debugging interface.

I welcome your insights into how I might replace it with something
that can always do the job.  :)

> Or perhaps you have other reasons for wanting it, like compatibility with
> other Tech Source products, which would be perfectly valid.

No such concerns.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)