Re: [Open-graphics] Cheap trick for infinite ring-buffer scatter-gather

Daniel Phillips Fri, 18 Mar 2005 16:52:05 -0800

On Friday 18 March 2005 11:53, Timothy Miller wrote:
> On Fri, 18 Mar 2005 02:13:56 -0500, Daniel Phillips wrote:
> > Seriously, if you have one PIO gizmo capable of uploading N words of 
> > data, why not use that for every different vector that needs uploading
> > via PIO?  Instead of calling this the "cursor upload register" I should
> > really just call it what it is, the data register.
>
> There are some things, like perhaps the cursor glyph, which will be done
> via indirect access.  One register (address in the memory-mapped
> engine space) is an index into the cursor, and the other is where you
> write the data, and it auto-increments the index.


I see that this will work, but I don't see why it is better than poking each 
value into a PIO register one at a time.  Isn't this just an unnecessary 
level of indirection at the hardware level that the driver can handle 
perfectly well?

> But most things won't be like that.  To write to, say, dX1dY is a
> simple pointer dereference, having a different address from dX2dY.

I still do not see why it is good to have, e.g., gradients as PIO registers.  
These are properly command fields, or derived from other command fields.

> Some addresses, like ones which target the rendering pipeline, dump
> into a FIFO.  Some, like updating the cursor position, go direct.

Yes, understood.  In essence, the cursor acts like a completely separate 
piece of hardware.  This is good.  I'm just quibbling over matters of taste 
in the implementation.

> > > The cursor is only one of numerous things that will be accessible via
> > > PIO.
> >
> > Right.  So I would like to suggest a two register interface that
> > handles uploading any vector.  You write the number of the thing you
> > want to upload and how many words it is into one register, then you
> > write the words into the other (the data register).  This is
> > economical, robust and extensible.
>
> We can really make the engine aperture as large as we want.  There's
> no reason to do everything indirect like that.

It is not indirect.  I must have failed to explain. It is just a succession 
of pokes into a single data register, autoincrementing the on-card 
destination.  Poking the "number of the thing" into the control register 
selects which card vector we are loading and initializes the autoincrement.  
This is about as simple as you can get, and general to boot.
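To make the proposal concrete, here is a minimal software model of the two-register upload, written as a sketch rather than a register specification. Everything named here is an assumption for illustration: the vector IDs, the 64-word limit, the control-word layout (vector number in the high 16 bits, word count in the low 16 bits), and the choice to ignore pokes beyond the announced count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vector numbers; the thread only mentions the cursor glyph
 * and a ring-buffer page vector as candidates. */
enum { VEC_CURSOR = 0, VEC_RING_PAGES = 1, VEC_COUNT = 2 };
enum { VEC_MAX_WORDS = 64 };    /* assumed limit, for the model only */

/* Software stand-in for the card side of the interface. */
struct card_model {
    uint32_t vec[VEC_COUNT][VEC_MAX_WORDS];
    unsigned sel;        /* vector selected by the last control write */
    unsigned index;      /* autoincrementing destination index */
    unsigned remaining;  /* words still accepted for this upload */
};

/* Control register write. Assumed layout: vector number in bits 31..16,
 * word count in bits 15..0. Selects the vector and resets the index. */
static void write_control(struct card_model *c, uint32_t value)
{
    unsigned id = value >> 16;
    unsigned count = value & 0xffffu;

    assert(id < VEC_COUNT && count <= VEC_MAX_WORDS);
    c->sel = id;
    c->index = 0;
    c->remaining = count;
}

/* Data register write: store one word and advance the destination. */
static void write_data(struct card_model *c, uint32_t word)
{
    if (c->remaining == 0)
        return;          /* assumed behavior: extra pokes are dropped */
    c->vec[c->sel][c->index++] = word;
    c->remaining--;
}

/* Driver side: one control poke, then a succession of data pokes. */
static void upload_vector(struct card_model *c, unsigned id,
                          const uint32_t *words, unsigned count)
{
    write_control(c, ((uint32_t)id << 16) | count);
    for (unsigned i = 0; i < count; i++)
        write_data(c, words[i]);
}
```

With this model, uploading the cursor glyph and uploading the ring-buffer page vector are the same driver call with a different vector number, which is the point of calling the second register simply "the data register".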

Sure, at the moment, there are only two vectors needing loading, and that is 
on the assumption that you eventually buy my argument about having a page 
vector for the ring buffer.  But that is enough to justify it, and in 
future, we might come up with other vectors that want sequential loading.

> > > All engine registers (for
> > > reading), the engine write fifo, debug, status, interrupt control,
> > > etc., etc. will be accessible via PIO.  Basically, via PIO, you'll be
> > > able to access all privileged and unprivileged registers.
> >
> > I see why you'd want to read engine registers by PIO (since there is no
> > other way) but not why you would want to write them, except for
> > debugging.  Doing 2D graphics via PIO makes zero sense to me.
>
> The first X11 driver prototype will use all PIO to control the engine.
>  Why?  Because it's quickest and easiest to do it that way.  We need
> to ramp up quickly so that we can start looking for hardware bugs.

The DMA buffers are an essential component.  Without them, we don't have a 
card in my opinion.  So why not debug that right from the beginning?  
Getting the interface and synchronization to the card working reliably and 
efficiently at the driver level is a long lead item.  It doesn't make sense 
to delay it.

If you want to have debug registers so you can see what got loaded via DMA, 
fine.  I'm just violently objecting to letting this concept evolve into a 
supported interface.

> Plus, when the BIOS and kernel driver talk to the engine, they're not
> performance critical, so the best thing to do is just do everything by
> PIO.

Ah, BIOS, I forgot about that.  Are we providing register compatibility or 
not?  (Not is better, imho.)  Anyway, how hard is it for our BIOS to submit 
commands via DMA?  It sounds like a subroutine to me.  But do we even need 
that?  To draw the glyphs, all we need is access to video memory as far as 
I can see.

> Only when you are doing something performance-critical does DMA 
> become any kind of help.

To risk an analogy duel: just like a 3D engine has now become an acceptable 
way to do 2D, even if overkill for many situations, command DMA is now an 
acceptable way to accomplish what used to be done via PIO registers.  
Progress marches on.  The PIO graphics model is outmoded; let's cut the 
cord.

> > > The only
> > > reason DMA can't get at privileged registers is because the command
> > > packets will always refer to registers implicitly.
> >
> > Of course.  But that should be enough.  If you think that having a PIO
> > interface to write the internal registers will speed up debugging,
> > great then, it's worth it.  But this interface should not be exposed as
> > a drawing interface.  There is no proper synchronization for one thing.
> > We already have a way to draw for another, so what is the advantage of
> > having two ways to draw, that race with each other?
>
> The PIO interface to the drawing engine will have all the
> synchronization you need.  TROZ is all PIO, and it's optimized to be
> efficient that way.  Of course, it's also 2D only, but you get the
> point.

Yes, I know it will work.  I just hate cruft, and I see a legacy 2D-oriented 
PIO alternate interface as cruft.

> I've noticed in some X.org drivers that some XAA functions will do
> PIO, and others will use the ring buffer.  Like for instance, a line
> segment will be started by PIO, but they'll use DMA for putimage.

Yuck!

> For our chip, I would suggest always using DMA after X11 starts, but I
> bet fbconsole will be done using PIO.

Why does fbconsole have to do anything fancier than poking video memory?

> > > BTW, you'll also be able to put rendering command packets into the
> > > ring buffer.
> >
> > Yes, I remember that from way back.  We can get by without it,
> > obviously.  Maybe the X server would use this feature to avoid some
> > indirect DMA setup commands, which might matter for some loads (I have
> > my doubts).  But then you introduce the need for userspace locking
> > around the DMA write register, which isn't needed if only the kernel
> > touches that register.
>
> These are all problems that can be solved in software; all the
> possibilities will be available in hardware, and people can experiment
> with them.

Sure, let me stop bleating about that.  It is easy to provide and somebody 
might find a use for it, though I don't see any use at present.  The 
problem is, if you have more than one task writing to the ring buffer, then 
we need to synchronize access.  DRI provides a locking model that can be 
used for this.  But we are within a hair's breadth of being able to bypass 
that odious locking and let each drawing task be truly asynchronous, 
depending only on the kernel module to sort out the indirect DMA 
submissions and switch to the correct drawing context.  I think we ought to 
go the distance.  That will get some attention.
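A rough sketch of that arrangement, under stated assumptions: each task fills its own indirect DMA buffer, and only the kernel module appends entries to the ring, inserting a context switch whenever the submitting context changes. The entry format, the opcode names, the 16-slot ring, and kernel_submit() itself are all hypothetical, and the kernel's own internal serialization is left out of the model.

```c
#include <assert.h>
#include <stdint.h>

enum { RING_SLOTS = 16 };   /* assumed size; a power of two */

/* Hypothetical ring commands: switch drawing context, or run an
 * indirect DMA buffer (address in arg0, length in words in arg1). */
enum ring_op { OP_SWITCH_CONTEXT, OP_INDIRECT };

struct ring_entry {
    enum ring_op op;
    uint32_t arg0;
    uint32_t arg1;
};

struct ring {
    struct ring_entry slot[RING_SLOTS];
    unsigned head;         /* free-running count of entries the card consumed */
    unsigned tail;         /* free-running count of entries the kernel wrote */
    uint32_t current_ctx;  /* context of the most recently queued work */
    int have_ctx;          /* zero until the first submission */
};

static unsigned ring_free(const struct ring *r)
{
    return RING_SLOTS - (r->tail - r->head);
}

static void ring_put(struct ring *r, enum ring_op op,
                     uint32_t arg0, uint32_t arg1)
{
    assert(ring_free(r) > 0);
    r->slot[r->tail % RING_SLOTS] = (struct ring_entry){ op, arg0, arg1 };
    r->tail++;
}

/* Kernel-side submission: the only writer of the ring. Queues a context
 * switch first if the submitting context differs from the last one.
 * Returns 0 on success, -1 if the ring lacks room. */
static int kernel_submit(struct ring *r, uint32_t ctx,
                         uint32_t buf_addr, uint32_t words)
{
    unsigned need = (!r->have_ctx || r->current_ctx != ctx) ? 2 : 1;

    if (ring_free(r) < need)
        return -1;
    if (need == 2) {
        ring_put(r, OP_SWITCH_CONTEXT, ctx, 0);
        r->current_ctx = ctx;
        r->have_ctx = 1;
    }
    ring_put(r, OP_INDIRECT, buf_addr, words);
    return 0;
}
```

Because the ring has a single writer in this model, drawing tasks never take a lock around the DMA write register; they only hand completed buffers to the kernel.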

> > 3D applications will always use indirect DMA, first because it is more
> > capable, and second because they aren't allowed to use direct DMA.
>
> Indirect DMA isn't more capable.  It's just easier to schedule.

That's what I mean by "more capable".  Also, it is easy to allocate DMA 
buffers of the right size, because the knowledge is in the right place.  As 
opposed to trying to automagically resize the command buffer.

> > I see what you're thinking: you have one command decoder and it doesn't
> > care what the source is, however you selectively suppress a few command
> > types depending on the source.  But that sounds like a solution looking
> > for a problem.  You can easily separate the two command sets cleanly.
> > Nobody has shown a command that needs to be available in both direct
> > and indirect DMA mode, except for your direct DMA drawing.  As you may
> > have noticed, I am not wild about that feature.
>
> I see your point, and remember, things are subject to change.

Please just forget I ever threw tomatoes at the direct DMA drawing 
feature ;)

It is not a bad thing, whether or not anybody uses it.  PIO drawing, on the 
other hand, is a bad thing.  I see it as a make-work project.  It's another 
interface to design and document, another interface to write support for, 
another interface to implement and maintain, and worst of all, it is an 
obsolete interface that will eventually diverge from our command stream 
model.  Then we will either have to do unnatural things to force some 
semblance of backward compatibility or drop it.  I suggest looking into the 
future and dropping it now, except perhaps as a debugging interface.

Or perhaps you have other reasons for wanting it, like compatibility with 
other Tech Source products, which would be perfectly valid.

Regards,

Daniel