On 2/22/09, Kenneth Ostby <[email protected]> wrote:
> Heya,
>
> Recently, I've been reading through the documentation available for
> the 3D Engine[1] part of the project (it's in need of some love as
> well), and the software model of our rasterizer found in the SVN
> repository, in order to plan ahead and identify the work that has to
> be done. In relation to that, I have some questions.
>
> First, in each of the different stages [2] in the pipeline (Rasterize,
> Ownership, etc.), we have some data that just has to be forwarded for
> further use down the pipeline; texture coordinates and texture ID are
> a good example. So what I'm thinking is that if we include an issue
> unit in front of the Rasterizer / Scissor step in the pipeline, it
> should be possible for the issue unit to bite off the unneeded parts
> and forward them directly to the stage where they are first needed.
> Then, in the different stages of the pipeline, we could include a FIFO
> buffer where we store the data for future use. Also, since we're
> operating as a strict in-order processor, it shouldn't need to be
> anything more complex than a simple FIFO buffer.
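For illustration, here is a toy Python sketch of the proposed split: an issue unit that routes each attribute straight into a side FIFO at the stage that first needs it. The field names, stage names, and routing table are invented for the example, not taken from the actual design.

```python
from collections import deque

# Which stage first needs each fragment field (illustrative only).
ROUTING = {
    "x": "rasterize", "y": "rasterize",
    "z": "depth",
    "tex_u": "texture", "tex_v": "texture", "tex_id": "texture",
}

# Per-stage side FIFOs that hold bypassed fields until the fragment
# arrives at that stage. Strictly in-order operation means a plain
# FIFO pairs each entry with the right fragment.
fifos = {stage: deque() for stage in ("rasterize", "depth", "texture")}

def issue(fragment):
    """Issue unit: split the fragment and push each field group into
    the FIFO of the stage that first consumes it, bypassing the
    stages in between."""
    per_stage = {stage: {} for stage in fifos}
    for field, value in fragment.items():
        per_stage[ROUTING[field]][field] = value
    for stage, fields in per_stage.items():
        if fields:
            fifos[stage].append(fields)

issue({"x": 10, "y": 20, "z": 0.5, "tex_u": 0.1, "tex_v": 0.9, "tex_id": 3})
```

Each stage would then pop its FIFO in lockstep with the fragments flowing through the main pipeline.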
You seem to be suggesting that we provide an ability to bypass pipeline segments that are not going to be used. This would certainly reduce latency. There are some tradeoffs, however. One is that what is enabled and disabled changes fairly frequently. Enabling and disabling pipeline stages would require a pipeline flush, introducing a significant delay. As long as we're keeping the pipeline busy, it doesn't really matter what the latency is. Secondly, this introduces the need to add additional multiplexing and routing around pipeline segments. All reasonable configurations need to be accounted for, which implies crossbars. Those introduce latency of their own, as well as creating routing congestion in the FPGA. We'll get a higher clock frequency if we can make the pipeline streamlined and easy to place and route.

> Also, by doing this, we can try to hide some of the latencies found
> in the pipeline in memory reads / writes. As an example, imagine that
> we have the case of a texture element. In the current model, it would
> require that the pipeline stalls while waiting for a texture fetch.
> If we send the coordinates ahead, it should be possible to prefetch
> some of the needed texels before the fragments get to the texture
> stage. I am sure that this technique can be utilized in more places
> in the pipeline as well.

Let's consider something simpler, like the Z buffer. This one is straightforward. For each fragment, we need to read a word from memory and possibly write one. The write is trivial, since it can just be dumped into a fifo and processed out of order (unless the fifo fills, in which case we're memory bound and don't care about the delay). The read, however, complicates things. The solution involves three fifos. The memory system is already built around fifos: for reads, requests are issued down one pipe, and then the data comes back in another.
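A toy, cycle-stepped Python sketch of this three-fifo read path may make it concrete. The latency value, addresses, and Z-pass rule are all illustrative, not the actual memory system:

```python
from collections import deque

MEM_LATENCY = 8      # modeled memory round-trip, in cycles (made up)

req_fifo = deque()   # read requests going out to the memory system
data_fifo = deque()  # read data coming back from memory
absorber = deque()   # fragments parked while their Z read is in flight

zbuf = {}            # toy Z buffer: addr -> stored depth (1.0 = far)

def issue(frag, cycle):
    """Front segment: issue the Z read and park the fragment.
    Issuing never waits for data, so the pipeline keeps moving."""
    req_fifo.append((frag["addr"], cycle))
    absorber.append(frag)

def memory(cycle):
    """Toy memory: answers the oldest request MEM_LATENCY cycles later."""
    if req_fifo and cycle - req_fifo[0][1] >= MEM_LATENCY:
        addr, _ = req_fifo.popleft()
        data_fifo.append(zbuf.get(addr, 1.0))

def receive():
    """Back segment: pair returned data with the oldest parked fragment.
    Returns the fragment if it passes the Z test, else None."""
    if data_fifo:
        old_z = data_fifo.popleft()
        frag = absorber.popleft()
        if frag["z"] < old_z:               # strictly closer wins
            zbuf[frag["addr"]] = frag["z"]  # the (trivial) write-back
            return frag
    return None

# Drive it: one fragment issued per cycle. The first write to each of
# the four addresses passes; the repeats tie at equal depth and fail.
frags = [{"addr": i % 4, "z": 0.5} for i in range(8)]
passed = []
for cycle in range(40):
    if cycle < len(frags):
        issue(frags[cycle], cycle)
    memory(cycle)
    out = receive()
    if out:
        passed.append(out)
```

Note that the issue side never stalls here; results simply emerge MEM_LATENCY cycles later, which is exactly the latency-hiding being described.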
If you could continue to issue requests asynchronously from processing the data, you could keep the pipeline moving. What we do is insert a third fifo between the pipeline segment that requests the reads and the segment that receives and processes the data. As requests are pushed into the memory fifo, fragments are pushed into this other fifo, which only fills up and causes a stall if memory can't keep up with requests. Let's call this third fifo a "latency absorber".

For Z, this is quite straightforward. Textures are harder: they may involve multiple requests per fragment, which implies a state machine that will hold up the pipeline, and the receiver must also loop over multiple received texels to fully compute the fragment. I have two possible solutions to this. One is to have two full state machines, one at each end of the latency absorber. Each has its own set of configuration variables, and we just design them so that they do complementary work. The other alternative is to have one state machine at the head of the queue that also passes commands down the latency absorber, which are processed by the receiver. Those commands would be things like "here's a fragment to be processed", "expect a texel from memory, do this with it to modify the fragment", and "complete processing of the fragment and forward it to the next segment". (Some of those commands may be issued simultaneously; for example, when the last texel is received, the finish command may just be a flag bit.)

Speaking of configuration variables, we would pass those down the pipeline as if they were fragment data, reusing some of the same signals. When a parameter reaches the segment or stage that holds that variable, it gets stored right there. Thus, the pipeline is not stalled by variable changes.

> Secondly, we have a lot of configuration parameters which need to be
> set for the engine, and which need to be handled in an efficient
> manner.
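A minimal sketch of that register-update scheme: write tokens travel down the same pipeline as fragments, and each stage captures and consumes the writes addressed to it, so no flush is ever needed. The stage names and register addresses below are made up for illustration.

```python
from collections import deque

class Stage:
    """One pipeline segment. Register-write tokens share the pipeline
    with fragments; a write addressed to this stage is stored locally
    and consumed, everything else is forwarded unchanged."""
    def __init__(self, name, my_regs):
        self.name = name
        self.my_regs = my_regs  # register addresses this stage owns
        self.regs = {}          # locally stored configuration
        self.out = deque()      # output toward the next stage

    def accept(self, token):
        kind, payload = token
        if kind == "regwrite":
            addr, value = payload
            if addr in self.my_regs:
                self.regs[addr] = value
                return              # consumed here; not forwarded
        self.out.append(token)      # fragments and foreign writes pass

# Illustrative stages and register map (not the real OGP layout).
scissor = Stage("scissor", my_regs={0x10})
texture = Stage("texture", my_regs={0x20, 0x21})

stream = [
    ("regwrite", (0x10, (0, 0, 640, 480))),  # scissor rectangle
    ("fragment", {"x": 5, "y": 7}),
    ("regwrite", (0x20, 3)),                 # texture id for next draw
]

for token in stream:
    scissor.accept(token)
while scissor.out:
    texture.accept(scissor.out.popleft())
```

Because a write is ordered with the fragments around it, it naturally takes effect for exactly the fragments issued after it.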
> Hence, I'm suggesting that we add some sort of register file to the
> architecture as well. By employing this technique, we could later
> incorporate some sort of performance counter system, or use it as a
> way of getting feedback from the system.

Counters would definitely be useful for debugging. I've already described how to handle register writes: they pass down the pipeline and are stored locally in the stage. For read-backs, which are rare and only used for debugging, we can make them happen out of order; just a big MUX in the engine routes them all back to PCI.

> I tried to modify Tim's original block diagram with Gimp to kinda
> show what I was talking about, and the result can be found here [3].
> Also, I would like to apologize for breaking the pretty diagram. It
> seems like a simple action such as drawing a box or a straight line
> in Gimp is meant to be hard.
>
> Kenneth
>
> [1] http://wiki.opengraphics.org/tiki-index.php?page=OGA%20Engine
> [2] http://langly.org/og/block_diagram.gif
> [3] http://langly.org/og/block-mod.gif
>
> --
> Life on the earth might be expensive, but it
> includes an annual free trip around the sun.
>
> Kenneth Østby
> http://langly.org
>
> _______________________________________________
> Open-graphics mailing list
> [email protected]
> http://lists.duskglow.com/mailman/listinfo/open-graphics
> List service provided by Duskglow Consulting, LLC (www.duskglow.com)

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
