On Friday 18 March 2005 20:16, Timothy Miller wrote:
> On Fri, 18 Mar 2005 14:24:22 -0500, Daniel Phillips wrote:
> > OK, the two level DMA structure is not unlike our current proposal.
> >  (Hmm, I ask myself is this where Timothy got his model or did he
> > arrive at it from first principles as I did...)
>
> I got it from experience with a different chip, plus some of my own
> ideas.
>
> As for interrupts, I've never seen another chip have "fifo/buffer
> almost empty", or even "fifo/buffer completely empty."  I've only
> seen them have "engine completely idle".

Hmm, it seems kind of stupid not to have it.  Even lowly serial chips 
have that.
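
To make concrete what I mean by "almost empty", here is a minimal software sketch of a low-watermark notification. It is only a model, not a proposal for the actual logic; the class name, the depth, and the watermark value are all made up for illustration.

```python
from collections import deque


class CommandFifo:
    """Toy model of a command FIFO with an 'almost empty' notification."""

    def __init__(self, depth, low_watermark, on_almost_empty):
        self.depth = depth
        self.low_watermark = low_watermark      # hypothetical threshold
        self.on_almost_empty = on_almost_empty  # stands in for the interrupt
        self._q = deque()

    def push(self, word):
        if len(self._q) >= self.depth:
            raise OverflowError("FIFO full")
        self._q.append(word)

    def pop(self):
        word = self._q.popleft()
        # Edge-triggered: fire only when the fill level drops onto the
        # watermark, not on every pop below it.
        if len(self._q) == self.low_watermark:
            self.on_almost_empty(len(self._q))
        return word


events = []
fifo = CommandFifo(depth=8, low_watermark=2, on_almost_empty=events.append)
for w in range(5):
    fifo.push(w)
drained = [fifo.pop() for _ in range(5)]
```

The point of firing at a watermark rather than at "engine completely idle" is that the host gets a chance to refill the buffer before the engine runs dry.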

> > The register model however is an obsolete throwback that I would
> > like to eradicate from our design.  We have a command stream,
> > commands have command fields, there may be no simple mapping
> > between command fields and registers.  We certainly do not need
> > register numbers in the command fields.  In fact, we don't need
> > registers at all, except for DMA control and similar.  We don't
> > even need registers for reading GL state: the driver knows the GL
> > state, and as a bonus, it knows the current values, not state as of
> > some time in the past, which is what the rendering pipeline knows.
> > Getting rid of the requirement for reading GL state from the card
> > gets rid of a whole class of messy pipeline synchronization issues.
>
> Perhaps I'm thinking in terms of an archaic design philosophy,
> although I have designed a GPU before, and it seemed most logical to
> do it this way.
>
> Here's how the pipeline works:  Each pipeline stage that you see in
> the model is really composed of many substages.  One of the early
> substages is responsible for extracting register writes. 

I see this as "instruction decode" rather than "register read".  I 
suppose it amounts to the same thing if you look at it a certain way.

> See, we 
> want writes to occur in pipeline order, and we don't want to stall
> the pipeline when writing to registers, so we just carry them down
> the pipeline just like fragments.

Yep, I have some sort of slightly foggy idea of what's happening there.  
I even had a suggestion way back about optimizing that by carrying only 
a single bit down the pipeline for register synchronization, but I like 
your idea below a lot more.

> To extract the register writes, 
> they're identified by number.  If the register doesn't belong to this
> stage, it's passed along; if it does belong to this stage, it's
> stored and dropped; and if it partially belongs to this stage, it's
> stored and passed along.

Hmm, ok, I see that lets you use a single queue for a whole bunch of 
different registers.  That makes a lot of sense.  How long is our 
pipeline anyway?  It feels like dozens of clocks by now.  Each element 
in the queue is what, a chunk of distributed ram?
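
Here is how I read the three cases (pass along, store and drop, store and pass along), as a rough software model rather than RTL. The stage names, register numbers, and token layout are assumptions made up for the example.

```python
from dataclasses import dataclass, field

PASS, STORE_AND_DROP, STORE_AND_PASS = range(3)


@dataclass
class Stage:
    name: str
    owned: set    # register numbers used only by this stage
    shared: set   # register numbers this stage and a later stage both need
    regs: dict = field(default_factory=dict)

    def classify(self, regnum):
        if regnum in self.owned:
            return STORE_AND_DROP
        if regnum in self.shared:
            return STORE_AND_PASS
        return PASS

    def step(self, token):
        """Return the token if it continues downstream, else None."""
        kind, regnum, value = token
        if kind != "regwrite":
            return token  # fragments just flow through (real work omitted)
        action = self.classify(regnum)
        if action != PASS:
            self.regs[regnum] = value
        return None if action == STORE_AND_DROP else token


def run_pipeline(stages, tokens):
    """Push tokens through the stages in order; return what falls out the end."""
    out = []
    for token in tokens:
        for stage in stages:
            token = stage.step(token)
            if token is None:
                break
        if token is not None:
            out.append(token)
    return out


setup = Stage("setup", owned={0x10}, shared={0x20})
raster = Stage("raster", owned={0x20, 0x30}, shared=set())
leftover = run_pipeline(
    [setup, raster],
    [("regwrite", 0x10, 5), ("fragment", None, "f0"),
     ("regwrite", 0x20, 7), ("regwrite", 0x30, 9)],
)
```

Because register writes travel in the same queue as fragments, each write takes effect at its stage in pipeline order without any stall.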

> Given this architecture, the registers need to be numbered, and
> there's little reason not to number them as a subset of all of the
> rest of the registers in the chip and give access to them the same
> way.
>
> Really, as it turns out, the biggest negative is that the logic to
> translate the DMA packets into the appropriate register writes is
> non-trivial.  This is why some chips use a microcontroller for this
> purpose.  If I have to develop anything programmable for that, it'll
> be documented.  Smile.  :)

How about just skipping the register write?  It seems to me that what 
you have is a queue of values and tags.  The tag identifies the 
pipeline stage where the queue value is to be loaded into a (real) 
register.  (There may be more than one register that has to be loaded 
at the same stage, so some trickery is needed there.)  So you translate 
the instruction code through a table that gives the correct tag value 
for each instruction field, which ought to end up being a single 36 bit 
lookup in block ram to handle up to, say, 4 parameters.  Now you can 
pull in the rest of the command from DMA, and each field goes straight 
into the queue along with the correct tag.  If command fields aren't 
word aligned, it gets a little more complicated, but not much.
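
A rough sketch of that lookup, in Python for readability. I am assuming the 36-bit table word is split into four 9-bit tags, that tag 0 means "no field", and that every field is one word; the opcodes and tag values are invented for the example.

```python
from collections import deque

TAG_BITS = 9      # assumption: 36-bit table word / 4 fields
MAX_FIELDS = 4
TAG_MASK = (1 << TAG_BITS) - 1


def pack_tags(*tags):
    """Pack up to four non-zero tags into one 36-bit table word."""
    assert len(tags) <= MAX_FIELDS
    assert all(0 < t <= TAG_MASK for t in tags)
    word = 0
    for i, t in enumerate(tags):
        word |= t << (i * TAG_BITS)
    return word


def unpack_tags(word):
    """Recover the tag list; a zero tag terminates the list."""
    tags = []
    for i in range(MAX_FIELDS):
        t = (word >> (i * TAG_BITS)) & TAG_MASK
        if t == 0:
            break
        tags.append(t)
    return tags


# Hypothetical table: opcode -> packed tags, one tag per command field.
TAG_TABLE = {
    0x01: pack_tags(0x011, 0x012),
    0x02: pack_tags(0x021),
}


def decode(words):
    """Turn a flat list of DMA words into a queue of (tag, value) pairs."""
    queue = deque()
    i = 0
    while i < len(words):
        tags = unpack_tags(TAG_TABLE[words[i]])  # the single table lookup
        fields = words[i + 1 : i + 1 + len(tags)]
        if len(fields) < len(tags):
            raise ValueError("truncated command")
        queue.extend(zip(tags, fields))
        i += 1 + len(tags)
    return queue


queue = decode([0x01, 100, 200, 0x02, 0xFF00FF])
```

Nothing here ever mentions a register number in the command stream; the tag alone says where down the pipeline the value gets latched.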

Does that make sense, or am I smoking crack?

> > The only plausible argument for having registers I've seen so far
> > is for debugging, and then it's unimportant to have any formal
> > definition.  Let's just get rid of the idea of drawing registers,
> > it's obsolete.
>
> You have to store the state information SOMEWHERE.  Those places are
> registers, and they have to be numbered.

Yes, I see what you're thinking.  The _real_ place they have to end up 
is somewhere down the pipeline in most cases.  So it seems to me they 
can move straight from DMA into the value queue.

> > > The thing that makes it ugly for me is that every transfer
> > > is based on either whole lines or whole pixels. The
> > > disadvantage is that if my user-space program has a picture to
> > > draw, it needs to be split at line ends, and the
> > > lines may not cross page boundaries under any circumstances
> > > unless the pages are contiguous.
> >
> > We haven't even gotten to that part yet, so it can't possibly be
> > misdesigned ;)
> >
> > I'm working on the assumption that when a command deals with a
> > rectangle, the engine is smart enough to process the whole
> > rectangle.  Raster lines don't come into it at all, at the command
> > level.
>
> Our DMA will be linear data.  There are no rectangles, and the
> scanline granularity is to the pixel. None of the things you're
> talking about are going to be a problem. 

I meant commands like "blit rectangle".  But then I forgot that we are 
probably not going to have rectangles and will probably use trapezoids 
for things like that.
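
For what it's worth, a rectangle is just a trapezoid whose edges have zero slope. A minimal sketch, assuming a trapezoid is described by a top and bottom scanline plus a starting x and a per-scanline slope for each edge (this parameterization is my guess, not anything we have agreed on):

```python
from dataclasses import dataclass


@dataclass
class Trapezoid:
    y_top: int        # first scanline
    y_bottom: int     # one past the last scanline
    x_left: float     # left edge at y_top
    x_right: float    # right edge at y_top (exclusive)
    dx_left: float    # left edge slope per scanline
    dx_right: float   # right edge slope per scanline


def rect_as_trapezoid(x, y, w, h):
    """A rectangle is the degenerate case: both edge slopes are zero."""
    return Trapezoid(y, y + h, x, x + w, 0.0, 0.0)


def spans(t):
    """Yield (y, x_start, x_end) for each scanline, x_end exclusive."""
    xl, xr = t.x_left, t.x_right
    for y in range(t.y_top, t.y_bottom):
        yield y, int(xl), int(xr)
        xl += t.dx_left
        xr += t.dx_right


rect_spans = list(spans(rect_as_trapezoid(2, 3, 4, 2)))
# → [(3, 2, 6), (4, 2, 6)]
```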

By the way, how are we going to handle miscellaneous (ugly and useless) 
2D commands like DrawCircle and FillCircle?

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
