Hi Timothy,

This started out as a simple reply and bloated up into an analysis of 
trapezoid format optimizations...

On Saturday 19 March 2005 17:22, Timothy Miller wrote:
> ...On the structure of rendering commands:
>
> The first word contains a command number and a set of bits that
> indicate the presence of an optional parameter.

Some commands will repeat many times with the same format, so we might 
want to have a separate command to specify the format.  I'm thinking 
mainly about what should be the most common command, "draw trapezoid".  
If there is plenty of slack space in the first word anyway, this is 
moot, but why don't we try to fully define the trapezoid command and 
see how it looks?
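Here is a minimal sketch of how a command decoder could use a separate format command. The presence bits are then sent once per format change instead of once per trapezoid. The opcodes, the 64-bit presence mask (one bit per float field) and the one-word-per-field layout are all hypothetical:

```python
# Hypothetical opcodes: the real command numbers are not defined yet.
CMD_SET_FORMAT = 1
CMD_DRAW_TRAPEZOID = 2


class Decoder:
    """Tracks the parameter format latched by the last SET_FORMAT command."""

    def __init__(self):
        self.format_mask = 0  # bit i set => parameter field i is present

    def command_length(self, words):
        """Return the length in words of the command starting at words[0]."""
        opcode = words[0] & 0xFF
        if opcode == CMD_SET_FORMAT:
            # Opcode word, then a 64-bit field mask split across two words.
            self.format_mask = words[1] | (words[2] << 32)
            return 3
        if opcode == CMD_DRAW_TRAPEZOID:
            # Opcode word, then one word per field enabled by the format.
            return 1 + bin(self.format_mask).count("1")
        raise ValueError(f"unknown opcode {opcode}")


d = Decoder()
fmt_len = d.command_length([CMD_SET_FORMAT, (1 << 31) - 1, 0])  # 31 fields
draw_len = d.command_length([CMD_DRAW_TRAPEZOID])  # payload not shown
print(fmt_len, draw_len)  # 3 32
```

Every draw command after the format command carries only its opcode and data. The mask costs its words once, no matter how many trapezoids follow.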

We noticed a while back that trapezoid commands are bulkier than 
triangle commands because we have to spell out all the gradients 
instead of deriving them from vertices.  Also, we need to give the 
parameters twice for at least one of the triangle vertices because the 
top of the trapezoid is truncated to a scan line.  Finally, we have 
nearly two trapezoids on average for each triangle.

The last point is painful, but we can do something about it: we can 
have a "continue trapezoid" command that supplies only the gradients 
that change in the second trapezoid of a triangle.  This doesn't have 
to be a separate command; it can be an optional field-of-fields.  There 
are two flavors, left-knee and right-knee.  In the right-knee case the 
only thing that changes is dX2/dy; in the left-knee case every dy 
except dX2/dy changes.
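The driver can decide cheaply which flavor a triangle needs. This sketch assumes screen-space (x, y) vertices with y increasing downward. It classifies a triangle by which side of its long edge the middle vertex falls on. The function name and return values are made up for illustration:

```python
def classify_knee(v0, v1, v2):
    """Return "single", "left" or "right" for a screen-space triangle.

    Vertices are (x, y) tuples with y increasing downward. A triangle with
    a horizontal top or bottom edge is a single trapezoid; otherwise the
    knee is on whichever side of the long (top-to-bottom) edge the middle
    vertex lies.
    """
    top, mid, bot = sorted((v0, v1, v2), key=lambda v: v[1])
    if top[1] == mid[1] or mid[1] == bot[1]:
        return "single"
    # x coordinate of the long edge at the middle vertex's scan line
    t = (mid[1] - top[1]) / (bot[1] - top[1])
    x_long = top[0] + t * (bot[0] - top[0])
    return "left" if mid[0] < x_long else "right"


# Extra fields a "continue trapezoid" needs in the full 52-field format:
# a right knee changes only dX2/dy; a left knee changes dX1/dy and the
# 16 attribute dy's (W, F, A1..B2, S1..T2, D1, D2).
CONTINUE_FIELDS = {"single": 0, "right": 1, "left": 17}

print(classify_knee((0, 0), (-5, 5), (10, 10)))  # left
print(classify_knee((0, 0), (15, 5), (10, 10)))  # right
```

A middle vertex that lies exactly on the long edge falls into the right-knee branch. That is harmless, because such a triangle has zero area.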

For the main part of the trapezoid we need the following, all in float25 
format:

   X1 X1y
   X2 X2y
   W Wy Wx
   F Fy Fx

   A1 A1y A1x
   R1 R1y R1x
   G1 G1y G1x
   B1 B1y B1x

   A2 A2y A2x
   R2 R2y R2x
   G2 G2y G2x
   B2 B2y B2x

   S1 S1y S1x
   T1 T1y T1x
   S2 S2y S2x
   T2 T2y T2x

   D1 D1y D1x
   D2 D2y D2x

(The naming scheme is slightly compressed here: redundant d's are 
omitted, LOD is just D, and primary and secondary colors are 1 and 2, 
as are the two textures' coordinates and LODs.)

The grand total is 52 float25 fields.  A left-knee triangle needs an 
additional 17 dy's and a right-knee triangle needs one.  Assuming we 
use a full word for each float25 and a full word for the 
command/options field, we have:

  Single trapezoid: 53 words
  Right knee triangle: 54 words
  Left knee triangle: 70 words
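The word counts above can be checked against the field list. This sketch tabulates the fields per parameter: two for X1/X2 and three for everything else. It then adds the continuation fields for each knee flavor. The dictionary layout is just one convenient way to write it down:

```python
# Fields per parameter in the full trapezoid format (one float25 each):
# X1, X2: start value and d/dy; everything else: value, d/dy and d/dx.
FULL_FORMAT = {
    "X1": 2, "X2": 2, "W": 3, "F": 3,
    "A1": 3, "R1": 3, "G1": 3, "B1": 3,
    "A2": 3, "R2": 3, "G2": 3, "B2": 3,
    "S1": 3, "T1": 3, "S2": 3, "T2": 3,
    "D1": 3, "D2": 3,
}
COMMAND_WORDS = 1  # the command/options word


def word_counts(fmt):
    """Words per command: one 32-bit word per field plus the command word."""
    single = COMMAND_WORDS + sum(fmt.values())
    right_knee = single + 1  # only dX2/dy is resent
    # dX1/dy plus the d/dy of every interpolated attribute.
    left_knee = single + 1 + sum(1 for k in fmt if k not in ("X1", "X2"))
    return single, right_knee, left_knee


print(word_counts(FULL_FORMAT))  # (53, 54, 70)
```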

Single-trapezoid triangles will be relatively rare, say about 4%, and of 
the remainder, left- and right-knee triangles should be about equally 
common.  This gives an average of about 62 words per triangle.  To 
estimate bandwidth, let's round that up to 64 words, or 2**8 bytes, and 
round PCI bandwidth to 2**27 bytes/second.  That gives 2**19 ~= 0.5 
million triangles/sec, which is pathetic even for PCI.  We have to do 
something about that, so let's get started.
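Spelled out, with the 4%/48%/48% mix and PCI rounded to 2**27 bytes/second:

```latex
\[
\begin{aligned}
\bar{n} &= 0.04 \cdot 53 + 0.48 \cdot 54 + 0.48 \cdot 70 = 61.64 \text{ words} \\
        &\approx 64 \text{ words} = 2^{8} \text{ bytes} \\
\text{rate} &\approx \frac{2^{27}\ \text{bytes/s}}{2^{8}\ \text{bytes/triangle}}
             = 2^{19} \approx 5.2 \times 10^{5}\ \text{triangles/s}
\end{aligned}
\]
```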

First let's cut out the parameters we don't really need for rendering 
Quake III: the fog, one of the LODs, the secondary color and (in most 
cases) the alpha field, leaving:

   X1 X1y
   X2 X2y
   W Wy Wx

   R1 R1y R1x
   G1 G1y G1x
   B1 B1y B1x

   S1 S1y S1x
   T1 T1y T1x
   S2 S2y S2x
   T2 T2y T2x

   D1 D1y D1x

Including the command/options word we have:

  Single trapezoid: 36 words
  Right knee triangle: 37 words
  Left knee triangle: 45 words

This gives a little over 800 thousand triangles/second on PCI, still 
fairly lame.  I think this proves we need to give up 4-byte boundaries 
and go to byte alignment.  We don't need any negative numbers here, so 
we can drop the sign bit, giving exactly 3 bytes per value.  We end up 
with:

  Single trapezoid: 108 bytes (2 bytes unused)
  Right knee triangle: 112 bytes (3 bytes unused)
  Left knee triangle: 136 bytes (3 bytes unused)
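Here is a sketch of the byte-aligned layout. It makes three assumptions: the 24 bits left after dropping the sign are treated as an opaque unsigned value, stored little-endian; the command header shrinks to a single byte; and each command is padded to a 4-byte boundary. A one-byte header is what reproduces the 2 and 3 unused bytes above. The field counts come from the word totals (36, 37 and 45 words, each including one command word):

```python
def pack_u24(values):
    """Pack unsigned 24-bit values back to back, 3 bytes each, little-endian."""
    out = bytearray()
    for v in values:
        if not 0 <= v < 1 << 24:
            raise ValueError(f"{v:#x} does not fit in 24 bits")
        out += v.to_bytes(3, "little")
    return bytes(out)


def padded_size(n_fields, field_bytes=3, header_bytes=1, align=4):
    """Return (total, unused) bytes for one command padded to `align`."""
    used = header_bytes + n_fields * field_bytes
    total = -(-used // align) * align  # round up to a multiple of align
    return total, total - used


for name, n in (("single", 35), ("right knee", 36), ("left knee", 44)):
    print(name, padded_size(n))
# single (108, 2)
# right knee (112, 3)
# left knee (136, 3)
```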

With an average of 123 bytes/triangle, we now have just over 1 million 
triangles/second.  But let's not stop here!  Next, we can get rid of the 
nasty left-knee case by optionally rendering right-to-left, which turns 
every left-knee triangle into a right-knee one.  This gives about 1.2 
million triangles/second.

Finally, let's do the last really easy thing and cut down the precision 
of color and level of detail to float16, giving:

  Single trapezoid: 84 bytes (2 bytes unused)
  Right knee triangle: 88 bytes (3 bytes unused)

Now we are a little over 1.5 million triangles/second, much better.  
Short of bringing the gradient calculations on-chip, I don't think 
there is a lot more low-hanging fruit.  (But prove me wrong, please!)
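Suppose float16 means IEEE 754 binary16. That is an assumption; the chip's own float16 layout may differ. Under it, Python's struct module can already do the conversion, which makes it easy to check how much precision a color value loses. The helper names are illustrative:

```python
import struct


def to_float16_bytes(x):
    """Encode x as IEEE 754 binary16 (2 bytes, little-endian)."""
    return struct.pack("<e", x)


def from_float16_bytes(b):
    """Decode 2 little-endian bytes as IEEE 754 binary16."""
    return struct.unpack("<e", b)[0]


# A color channel in [0, 1]: binary16 has an 11-bit significand, so the
# round-trip error stays well below half of one 8-bit color step (1/255).
value = 200 / 255
encoded = to_float16_bytes(value)
decoded = from_float16_bytes(encoded)
print(len(encoded), abs(decoded - value) < 1 / 255 / 2)  # 2 True
```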

Another way to get rid of the bus bandwidth bottleneck is to go to 
PCI-e.  However, all of my suggestions above are easy to implement, and 
we're probably going to have to live with the PCI version for a while, 
so I think we ought to write at least some of them into the design.

To summarize, here are my suggestions:

   * Include only enabled fields (already planned)
   * Put the options field in a separate command
   * Combine trapezoid pairs
   * Render right to left for left-knee triangles
   * Use float16 for low precision fields
   * Use float24 for other fields
   * Align fields on bytes, not words

These optimizations more than triple our triangle rate on PCI compared 
with a dumb format that transmits each trapezoid separately, includes 
all parameters whether needed or not, and uses 4-byte floating-point 
fields.
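To check the headline claim, this sketch recomputes every stage of the estimate with the same 4%/48%/48% mix and 2**27 bytes/second. It assumes the dumb format sends each knee triangle as two full 53-word trapezoids. The stage labels are only descriptive:

```python
PCI_BYTES_PER_SEC = 2 ** 27  # rough PCI bandwidth, as in the estimates above

# Fraction of triangles that are single trapezoids, right knees, left knees.
MIX = (0.04, 0.48, 0.48)

# Bytes per triangle (single, right knee, left knee) at each stage.
STAGES = {
    "dumb (separate 53-word trapezoids)": (53 * 4, 2 * 53 * 4, 2 * 53 * 4),
    "full format, combined pairs": (53 * 4, 54 * 4, 70 * 4),
    "Quake III fields only": (36 * 4, 37 * 4, 45 * 4),
    "byte-aligned float24": (108, 112, 136),
    "right-to-left, no left knees": (108, 112, 112),
    "float16 color and LOD": (84, 88, 88),
}


def triangles_per_sec(sizes, mix=MIX, bandwidth=PCI_BYTES_PER_SEC):
    """Bus-limited triangle rate for a weighted average command size."""
    avg_bytes = sum(f * s for f, s in zip(mix, sizes))
    return bandwidth / avg_bytes


rates = {name: triangles_per_sec(s) for name, s in STAGES.items()}
for name, r in rates.items():
    print(f"{name:36s} {r / 1e6:5.2f} M triangles/s")

speedup = (rates["float16 color and LOD"]
           / rates["dumb (separate 53-word trapezoids)"])
print(f"speedup over dumb format: {speedup:.1f}x")
```

With these inputs the final format comes out roughly 4.7 times faster than the dumb one. The intermediate stages land on the figures quoted along the way.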

The dumb format is the easiest to implement, which is a good argument 
for starting out with it.  But we ought to plan the optimizations now, 
and implement them early.

By the way, I just noticed that I left out all the Y/Ycount fields, 
which will add another 4 bytes or so to every trapezoid, but I'm not 
going to go back and fix it now.  It doesn't change the estimates much.  
After we've kicked this around, I can redo it with corrections.

> There won't be any "draw line" commands just yet.  Too complicated,
> and we don't have space for a microcontroller to do the necessary
> math.

So our 2D X driver and DRM driver will flush the pipeline and draw 
lines/circles/other strange things directly into the frame buffer?  
Flushing before each line would be deadly for performance.

> Our rendering commands are little more than PIO register writes with
> implied addresses, so they use less bandwidth.

Great, getting rid of the register numbers is one step towards burying 
the register model completely ;-)
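Here is a small sketch of the bandwidth difference. A register-write stream carries an address with every value. A command stream sends one header and then the values in an order both sides agree on. The register numbers, parameter subset and opcode are hypothetical:

```python
# Hypothetical register numbers for a few trapezoid parameters,
# listed in the fixed order a command would send them.
REGS = {"X1": 0x10, "X1y": 0x11, "X2": 0x12, "X2y": 0x13}
CMD_DRAW_TRAPEZOID = 2  # hypothetical opcode


def explicit_stream(values):
    """Register-write style: an (address, data) word pair per parameter."""
    out = []
    for name, v in values.items():
        out += [REGS[name], v]
    return out


def implied_stream(values):
    """Command style: one opcode word, then data words in REGS order."""
    return [CMD_DRAW_TRAPEZOID] + [values[name] for name in REGS]


params = {"X1": 100, "X1y": 3, "X2": 180, "X2y": 5}  # raw data words
print(len(explicit_stream(params)), len(implied_stream(params)))  # 8 5
```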

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
