Hi Timothy,

This started out as a simple reply and bloated up into an analysis of
trapezoid format optimizations...

On Saturday 19 March 2005 17:22, Timothy Miller wrote:
> ...On the structure of rendering commands:
>
> The first word contains a command number and a set of bits that
> indicate the presence of an optional parameter.

Some commands will repeat many times with the same format, so we might
want to have a separate command to specify the format. I'm thinking
mainly about what should be the most common command, "draw trapezoid".
If there is plenty of slack space in the first word anyway, this is
moot, but why don't we try to fully define the trapezoid command and see
how it looks?

We noticed a while back that trapezoid commands are bulkier than
triangle commands because we have to spell out all the gradients instead
of deriving them from vertices. Also, we need to give the parameters
twice for at least one of the triangle vertices because the top of the
trapezoid is truncated to a scan line. Finally, we have nearly two
trapezoids on average for each triangle.

The last point is painful. But we can do something about it: we can
have a "continue trapezoid" command that only supplies the gradients
that change in the second trapezoid of a triangle. This doesn't have to
be a separate command, it can be an optional field-of-fields. There are
two flavors, left-knee and right-knee. In the right-knee case the only
thing that changes is dX2/dy; in the left-knee case all the other dy's
change.

For the main part of the trapezoid we need the following, all in float25
format:

   X1  X1y X2  X2y
   W   Wy  Wx
   F   Fy  Fx
   A1  A1y A1x
   R1  R1y R1x
   G1  G1y G1x
   B1  B1y B1x
   A2  A2y A2x
   R2  R2y R2x
   G2  G2y G2x
   B2  B2y B2x
   S1  S1y S1x
   T1  T1y T1x
   S2  S2y S2x
   T2  T2y T2x
   D1  D1y D1x
   D2  D2y D2x

(The naming scheme is slightly compressed here: redundant d's are
omitted, LOD is just D, primary and secondary colors are 1 and 2)

The grand total is 52 float25 fields. A left-knee triangle needs an
additional 17 dy's and a right-knee triangle needs one.

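The field count above can be double-checked with a short script. This is
only a sketch: the Python names are mine, and it assumes that every
interpolated parameter carries a value, a dy and a dx, that the two
edges carry only a value and a dy, and that a left knee resends every dy
except X2y.

```python
# Sketch only: rebuild the field list from the naming scheme and count it.
# Assumption: each interpolated parameter has a value, a dy and a dx field;
# the edges X1 and X2 have only a value and a dy.
EDGES = ["X1", "X2"]
PARAMS = (
    ["W", "F"]
    + [c + n for n in "12" for c in "ARGB"]  # primary and secondary colors
    + [c + n for n in "12" for c in "ST"]    # two sets of texture coordinates
    + ["D1", "D2"]                           # one LOD per texture
)

fields = []
for e in EDGES:
    fields += [e, e + "y"]
for p in PARAMS:
    fields += [p, p + "y", p + "x"]

dy_fields = [f for f in fields if f.endswith("y")]
# Assumption: a left knee changes every dy except the right edge slope X2y,
# while a right knee changes only X2y.
left_knee_extra = [f for f in dy_fields if f != "X2y"]
right_knee_extra = ["X2y"]

print(len(fields), len(left_knee_extra), len(right_knee_extra))
```

Under those assumptions the three counts come out as 52, 17 and 1.
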
Assuming we use a full word for each float25 and a full word for the
command/options field, we have:

   Single trapezoid:     53 words
   Right knee triangle:  54 words
   Left knee triangle:   70 words

Single trapezoid triangles will be relatively rare, say about 4%, and of
the remainder, left and right knee triangles should be about equal. This
gives an average of about 62 words per triangle, so let's round it up to
64 words, or 2**8 bytes to estimate bandwidth, and round PCI bandwidth to
2**27 bytes/second. That gives 2**19 ~= .5 million triangles/sec, which
is pathetic even for PCI. We have to do something about that, so let's
get started.

First let's cut out the parameters we don't really need for rendering
Quake III. That would be the fog, one of the LOD's, the secondary color
and (in most cases) the alpha field, leaving:

   X1  X1y X2  X2y
   W   Wy  Wx
   R1  R1y R1x
   G1  G1y G1x
   B1  B1y B1x
   S1  S1y S1x
   T1  T1y T1x
   S2  S2y S2x
   T2  T2y T2x
   D1  D1y D1x

Including the command/options word we have:

   Single trapezoid:     36 words
   Right knee triangle:  37 words
   Left knee triangle:   45 words

This gives a little over 800 thousand triangles/second on PCI, still
fairly lame. I think this proves we need to lose the 4 byte boundaries
and go to byte alignment. We don't need any negative numbers here, so we
can lose the sign bit, giving exactly 3 bytes per value. We end up with:

   Single trapezoid:     108 bytes (2 bytes unused)
   Right knee triangle:  112 bytes (3 bytes unused)
   Left knee triangle:   136 bytes (3 bytes unused)

With an average of 123 bytes/triangle, we now have just over 1 million
triangles/second. But let's not stop here!

Next, we can get rid of the nasty left-knee case by optionally rendering
right-to-left. This gives about 1.2 million triangles/second.

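The running estimates above can be reproduced with a few lines of
arithmetic. A rough sketch, taking the sizes quoted above as given, the
4% / 48% / 48% split as the assumed triangle mix, and 2**27 bytes/second
as the rounded PCI bandwidth; the function and variable names are mine.

```python
PCI_BYTES_PER_SEC = 2**27  # rounded PCI bandwidth from the text


def triangles_per_sec(single, right, left, left_share=0.48):
    """Triangle rate for the given per-triangle sizes in bytes.

    Assumes 4% single-trapezoid triangles; left_share of all triangles
    take the left-knee size and the rest take the right-knee size.
    """
    single_share = 0.04
    right_share = 1.0 - single_share - left_share
    avg = single_share * single + right_share * right + left_share * left
    return PCI_BYTES_PER_SEC / avg


# Word-aligned float25, all 52 fields (4 bytes per word).
full = triangles_per_sec(53 * 4, 54 * 4, 70 * 4)
# Word-aligned, trimmed to the fields needed for Quake III.
trimmed = triangles_per_sec(36 * 4, 37 * 4, 45 * 4)
# Byte-aligned, 3 bytes per value.
packed = triangles_per_sec(108, 112, 136)
# Right-to-left rendering: left knees are sent as right knees instead.
mirrored = triangles_per_sec(108, 112, 136, left_share=0.0)

for name, rate in [("full", full), ("trimmed", trimmed),
                   ("packed", packed), ("mirrored", mirrored)]:
    print(f"{name:9s} {rate / 1e6:.2f} million triangles/sec")
```

Without the rounding up to 64 words, the first case comes out a bit
above half a million; the other three land close to the 0.8, 1.0 and 1.2
million figures quoted above.
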
Finally, let's do the last really easy thing and cut down the precision
of color and level of detail to float16, giving:

   Single trapezoid:     84 bytes (2 bytes unused)
   Right knee triangle:  88 bytes (3 bytes unused)

Now we are a little over 1.5 million triangles/second, much better.
Short of bringing the gradient calculations on-chip, I don't think there
is a lot more low-hanging fruit. (But prove me wrong, please!)

Another way to get rid of the bus bandwidth bottleneck is to go to
PCI-e. However, since all of my suggestions above are easy to
implement, I think we ought to write at least some of them into the
design, since we're probably going to have to live with the PCI version
for a while.

To summarize, here are my suggestions:

  * Include only enabled fields (already planned)
  * Put the options field in a separate command
  * Combine trapezoid pairs
  * Render right to left for left-knee triangles
  * Use float16 for low precision fields
  * Use float24 for other fields
  * Align fields on bytes, not words

These optimizations more than triple our triangle rate on PCI versus a
dumb format that transmits each trapezoid separately, includes all
parameters whether needed or not, and uses 4 byte floating point fields.
The dumb format is the easiest to implement, which is a good argument
for starting out with it. But we ought to plan the optimizations now,
and implement them early.

By the way, I just noticed that I left out all the Y/Ycount fields,
which will add another 4 bytes or so to every trapezoid, but I'm not
going to go back and fix it now. It doesn't change the estimates much.
After we've kicked this around, I can redo it with corrections.

> There won't be any "draw line" commands just yet. Too complicated,
> and we don't have space for a microcontroller to do the necessary
> math.

So our 2D X driver and DRM driver will flush the pipeline and draw
lines/circles/other strange things directly into the frame buffer?
Flushing before each line would be deadly for performance.

> Our rendering commands are little more than PIO register writes with
> implied addresses, so they use less bandwidth.

Great, getting rid of the register numbers is one step towards burying
the register model completely ;-)

Regards,

Daniel
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
