Re: [Open-graphics] Re: Scatter-gather?

Timothy Miller Sat, 19 Mar 2005 13:57:50 -0800

On Fri, 18 Mar 2005 21:40:06 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Friday 18 March 2005 20:16, Timothy Miller wrote:
> > On Fri, 18 Mar 2005 14:24:22 -0500, Daniel Phillips wrote:
> > > OK, the two level DMA structure is not unlike our current proposal.
> > >  (Hmm, I ask myself is this where Timothy got his model or did he
> > > arrive at it from first principles as I did...)
> >
> > I got it from experience with a different chip, plus some of my own
> > ideas.
> >
> > As for interrupts, I've never seen another chip have "fifo/buffer
> > almost empty", or even "fifo/buffer completely empty."  I've only
> > seen them have "engine completely idle".
> 
> Hmm, it seems kind of stupid not to have it.  Even lowly serial chips
> have that.


I've worked with a lot of graphics chips over the years.  The only one
to have those interrupts was the one I designed myself.

> 
> > > The register model however is an obsolete throwback that I would
> > > like to eradicate from our design.  We have a command stream,
> > > commands have command fields, there may be no simple mapping
> > > between command fields and registers.  We certainly do not need
> > > register numbers in the command fields.  In fact, we don't need
> > > registers at all, except for DMA control and similar.  We don't
> > > even need registers for reading GL state: the driver knows the GL
> > > state, and as a bonus, it knows the current values, not state as of
> > > some time in the past, which is what the rendering pipeline knows.
> > > Getting rid of the requirement for reading GL state from the card
> > > gets rid of a whole class of messy pipeline synchronization issues.
> >
> > Perhaps I'm thinking in terms of an archaic design philosophy,
> > although I have designed a GPU before, and it seemed most logical to
> > do it this way.
> >
> > Here's how the pipeline works:  Each pipeline stage that you see in
> > the model is really composed of many substages.  One of the early
> > substages is responsible for extracting register writes.
> 
> I see this as "instruction decode" rather than "register read".  I
> suppose it amounts to the same thing if you look at it a certain way.

Well, yes, but in this case, the register number doesn't cause an
action.  The closest thing to a machine instruction is like a move
instruction with immediate source and register destination, but that's
the only instruction type, so there's no instruction code, just a
register number, so we're back to it just being a data-address pair.

> > See, we
> > want writes to occur in pipeline order, and we don't want to stall
> > the pipeline when writing to registers, so we just carry them down
> > the pipeline just like fragments.
> 
> Yep, I have some sort of slightly foggy idea of what's happening there.
> I even had a suggestion way back about optimizing that by carrying only
> a single bit down the pipeline for register synchronization, but I like
> your idea below a lot more.
> 
> > To extract the register writes,
> > they're identified by number.  If the register doesn't belong to this
> > stage, it's passed along; if it does belong to this stage, it's
> > stored and dropped; and if it partially belongs to this stage, it's
> > stored and passed along.
> 
> Hmm, ok, I see that lets you use a single queue for a whole bunch of
> different registers.  That makes a lot of sense.  How long is our
> pipeline anyway?  It feels like dozens of clocks by now.  Each element
> in the queue is what, a chunk of distributed ram?

The fifo at the front end of the pipeline could be a block RAM, but
that's likely overkill.  There are actually multiple fifos at the
front end of the pipeline.  One is for the rendering pipeline
directly, where PIO writes to engine registers go.  Another is for DMA
ring buffer packets, another is for indirect DMA packets.  The DMA
queues are fed periodically, and emptied as packets are decoded,
converted to register writes and fed into the engine fifo.
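
As a rough sketch of that front end, the model below keeps three queues and
turns DMA packets into register writes before they reach the engine fifo.
The packet layout (a base register number followed by values for
consecutive registers) and the names FrontEnd, decode_packet and drain are
assumptions made for illustration; the real packet format and the
arbitration between the queues are not specified here.

```python
from collections import deque

def decode_packet(packet):
    """Expand one DMA packet into (register, value) writes.

    Assumed layout, not the real format: (base_register, [values...]),
    with each value landing in the next consecutive register number.
    """
    base, values = packet
    return [(base + i, v) for i, v in enumerate(values)]

class FrontEnd:
    def __init__(self):
        self.engine_fifo = deque()    # register writes headed down the pipeline
        self.ring_fifo = deque()      # DMA ring buffer packets
        self.indirect_fifo = deque()  # indirect DMA packets

    def pio_write(self, register, value):
        # PIO writes to engine registers go straight into the engine fifo.
        self.engine_fifo.append((register, value))

    def drain(self, fifo):
        # Empty a DMA queue, decoding each packet into register writes.
        while fifo:
            self.engine_fifo.extend(decode_packet(fifo.popleft()))

fe = FrontEnd()
fe.pio_write(0x10, 7)
fe.ring_fifo.append((0x20, [1, 2, 3]))
fe.drain(fe.ring_fifo)
writes = list(fe.engine_fifo)
```

Whatever the source, everything that reaches the engine fifo has the same
shape: a register number and a value.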

> > Given this architecture, the registers need to be numbered, and
> > there's little reason not to number them as a subset of all of the
> > rest of the registers in the chip and give access to them the same
> > way.
> >
> > Really, as it turns out, the biggest negative is that the logic to
> > translate the DMA packets into the appropriate register writes is
> > non-trivial.  This is why some chips use a microcontroller for this
> > purpose.  If I have to develop anything programmable for that, it'll
> > be documented.  Smile.  :)
> 
> How about just skipping the register write?  It seems to me that what
> you have is a queue of values and tags.  The tag identifies the
> pipeline stage where the queue value is to be loaded into a (real)
> register.  (There may be more than one register that has to be loaded
> at the same stage, so some trickery is needed there.)  So you translate
> the instruction code through a table that gives the correct tag value
> for each instruction field, which ought to end up being a single 36 bit
> lookup in block ram to handle up to, say, 4 parameters.  Now you can
> pull in the rest of the command from DMA, and each field goes straight
> into the queue along with the correct tag.  If command fields aren't
> word aligned, it gets a little more complicated, but not much.
> 
> Does that make sense, or am I smoking crack?

What you're describing is the idea of dividing the register "tag" or
address into a field of bits that indicates which pipeline stage
takes the data and another field which indicates which register in the
stage.  Of course, that's how it was going to happen anyhow, and those
tag numbers will map one-to-one to addresses in the PIO engine
aperture.
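
To make the tag split concrete, here is a small model of a register number
divided into a stage field and a register field, with each stage applying
the pass / store-and-drop / store-and-pass rule to writes coming down the
pipeline.  The field widths, the Stage class and the "shared" set are
assumptions for the sketch, not part of any agreed design.

```python
STAGE_BITS = 4  # assumed width of the stage field
REG_BITS = 4    # assumed width of the register-within-stage field

def make_tag(stage, reg):
    return (stage << REG_BITS) | reg

def split_tag(tag):
    return tag >> REG_BITS, tag & ((1 << REG_BITS) - 1)

class Stage:
    def __init__(self, stage_id, shared=()):
        self.stage_id = stage_id
        self.shared = set(shared)  # tags this stage stores but also forwards
        self.regs = {}             # stored values, keyed by full tag

    def accept(self, tag, value):
        """Return the write to forward downstream, or None if it is dropped."""
        stage, _reg = split_tag(tag)
        if tag in self.shared:
            self.regs[tag] = value  # partially ours: store and pass along
            return (tag, value)
        if stage == self.stage_id:
            self.regs[tag] = value  # ours: store and drop
            return None
        return (tag, value)         # not ours: pass along

def run(stages, writes):
    # Carry each write down the pipeline in order until some stage drops it.
    for item in writes:
        for s in stages:
            item = s.accept(*item)
            if item is None:
                break

s0 = Stage(0)
s1 = Stage(1, shared=[make_tag(2, 0)])
s2 = Stage(2)
run([s0, s1, s2], [(make_tag(1, 3), 0xAB), (make_tag(2, 0), 0xCD)])
```

In this model the tag is the register address; the table of which tags are
shared is the only thing a stage needs beyond its own stage number.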

> > > The only plausible argument for having registers I've seen so far
> > > is for debugging, and then it's unimportant to have any formal
> > > definition.  Let's just get rid of the idea of drawing registers,
> > > it's obsolete.
> >
> > You have to store the state information SOMEWHERE.  Those where's are
> > registers, and they have to be numbered.
> 
> Yes, I see what you're thinking.  The _real_ place they have to end up
> is somewhere down the pipeline in most cases.  So it seems to me they
> can move straight from DMA into the value queue.

What do you mean by "value queue"?  

You're making a distinction between "pipeline register tag" and
"address" that doesn't really exist.  It's like the distinction
between "document" and "program".  (Actually MUCH less so.)  The only
difference between a document and a program is the number of levels of
interpretation before you hit the hardware, and some things are hard
to classify as strictly one or the other, like HTML.  (And in the case
of the registers, there are NO additional levels of interpretation for
one or the other.)

You are suffering from a typical engineer's syndrome.  You're taught
that two things are different, because the people who taught it to you
didn't realize they were the same thing, and you believed it.  I fall
into that trap constantly.  :)

> > > > The thing that makes it ugly for me is that every transfer
> > > > is based either on whole lines or whole pixels. This has the
> > > > disadvantage that if my user space program has a picture to
> > > > draw, it needs to be split at line ends, and the
> > > > lines may not cross page boundaries under any circumstances
> > > > unless the pages are contiguous.
> > >
> > > We haven't even gotten to that part yet, so it can't possibly be
> > > misdesigned ;)
> > >
> > > I'm working on the assumption that when a command deals with a
> > > rectangle, the engine is smart enough to process the whole
> > > rectangle.  Raster lines don't come into it at all, at the command
> > > level.
> >
> > Our DMA will be linear data.  There are no rectangles, and the
> > scanline granularity is to the pixel. None of the things you're
> > talking about are going to be a problem.
> 
> I meant commands like "blit rectangle".  But then I forgot that we are
> probably not going to have rectangles and will probably use trapezoids
> for things like that.

Well, there will be a configuration of the engine which does "blit
rectangle", but none of this has anything to do with DMA, since DMA
and the graphics memory have no concept of rectangles, trapezoids, or
anything else.  Graphics memory is completely one-dimensional.  2D
only comes about when you define things like "pitch" when drawing
something, but the same memory could have different pitches, depending
on what's looking at it.

In fact, the only case where you have something that's truly 2D is
when it hits the monitor screen.  Everything else is a loose
constraint on 1D data.
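
A tiny illustration of pitch as a constraint on 1D data: the same linear
memory is read as two differently shaped surfaces just by changing the
pitch.  The function names and the choice to measure pitch in pixels
rather than bytes are assumptions for the sketch.

```python
def offset(x, y, pitch, base=0):
    """Linear index of pixel (x, y) when rows are `pitch` pixels apart."""
    return base + y * pitch + x

def view(memory, width, height, pitch, base=0):
    """Read a width x height window out of one-dimensional memory."""
    return [[memory[offset(x, y, pitch, base)] for x in range(width)]
            for y in range(height)]

memory = list(range(24))  # 24 pixels of one-dimensional "graphics memory"

wide = view(memory, width=6, height=4, pitch=6)
narrow = view(memory, width=4, height=6, pitch=4)
```

Nothing in memory changes between the two views; only the pitch supplied
by whatever is looking at it does.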

> By the way, how are we going to handle miscellaneous (ugly and useless)
> 2D commands like DrawCircle and FillCircle?

Same way I always do:  Points, spans, and line-segments.  (If you look
at mi code in the X server, it generates either spans or polypoint for
circles; I developed my own code to do line segments when it was a
better thing to do.  One of our customers found a horrible
inefficiency in mi for dashed circles that causes it to take MINUTES
to draw one and hit us with it one day; we developed our own code to
replace it.)
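
For illustration only, this is one way a filled circle can be reduced to
horizontal spans.  It uses the textbook midpoint circle algorithm, which is
neither the mi code nor the replacement code mentioned above, and the
function name and return format are made up for the sketch.

```python
def filled_circle_spans(cx, cy, r):
    """Return {y: (x_left, x_right)} spans covering a filled circle."""
    spans = {}

    def put(y, half_width):
        # Record a span on row y, widening any span already there.
        lo, hi = cx - half_width, cx + half_width
        if y in spans:
            lo = min(lo, spans[y][0])
            hi = max(hi, spans[y][1])
        spans[y] = (lo, hi)

    x, y, d = r, 0, 1 - r
    while x >= y:
        # Eight-way symmetry collapses to four spans per step.
        put(cy + y, x)
        put(cy - y, x)
        put(cy + x, y)
        put(cy - x, y)
        y += 1
        if d < 0:
            d += 2 * y + 1
        else:
            x -= 1
            d += 2 * (y - x) + 1
    return spans

spans = filled_circle_spans(0, 0, 3)
```

Each entry is one span for the engine to fill, so a circle of radius r
costs 2r + 1 spans rather than a write per pixel.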
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
