On Friday 04 February 2005 18:11, Timothy Miller wrote:
> On Fri, 4 Feb 2005 18:02:16 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > On Friday 04 February 2005 17:41, Timothy Miller wrote:
> > > On Fri, 4 Feb 2005 17:21:03 -0500, Daniel Phillips
> > > <[EMAIL PROTECTED]> wrote:
> > > > But why can't we solve this with an intermediate queue that
> > > > holds the same number of entries as stages in the iteration?
> > >
> > > Ummm... I'm not sure I understand what you're saying, but it
> > > sounds vaguely like the solution I already have. :)
> >
> > The problem is, I don't really know the hardware terminology very
> > well. I'll try to say it in more words.
> >
> > Suppose each horizontal iteration including the divide
> > approximation requires N stages and yields one pair of results per
> > clock. The goal is to deliver results to the next stage of the
> > pixel pipeline once per clock. To get around the N stage latency,
> > instead of delivering results to the next pixel stage, they are
> > delivered to an N entry queue, which advances one entry per clock.
> > The horizontal iterator goes to work on the next pixel pair
> > immediately. Results are thus delivered to the pixel pipeline once
> > per clock with N clocks latency.
> >
> > So latency does not turn into throughput reduction. I think.
>
> I'm still not totally sure if I get what you're trying to say, but it
> sounds like you're not grokking the iteration dependency that's
> going on in the loop. The iterator cannot immediately go on to the
> next pixel pair, because it doesn't have the results of the previous
> iteration.
Grok is the right word; the concept is clear but the details are still a little fuzzy. I was already on my way out for some gong hai fat choi by the time I realized I had stated it incorrectly. The key idea is to pipe a prior result into the queue to be used later.

Say the divide requires N clocks. At clock N, the result of this divide is placed in the N-entry queue. At clock N+1, the result of this divide and the result of the previous divide, read from the queue, are combined to set up the current pixel pair. One reason for combining quotients this way would be to estimate odd-numbered quotients by interpolating between pairs of even-numbered quotients. There are probably other dependencies I overlooked, but they can be treated the same way. I think.

Furthermore, we can treat the texture pipe as a whole in a similar way. To handle more interpolants than we have multipliers for, we have to iterate. We can accomplish this by pushing partial results into a queue that holds them for as many clocks as are required to compute the remaining interpolants, then combine, convert to RGBA and continue. If we are iterating twice (because we have up to twice as many interpolants as we can hardcode), then a new pixel enters the texture pipe every second clock.

I was mumbling about this concept much earlier, but at that time even more incoherently than here. Surely this must be a standard technique for dealing with multi-clock latency?

Regards,

Daniel

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
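[Editor's note: the delay-queue technique discussed above can be sketched in software. This is purely an illustration, not the actual OGP hardware: the function name, the toy "divide" (x/2), and the averaging combine are my own assumptions. The N-stage divider is modeled as a shift register that accepts one operand per clock and emits its quotient N clocks later, and each emerging quotient is interpolated with the prior quotient held from the result queue, as in the even/odd quotient example.]

```python
from collections import deque

def pipelined_divides(dividends, n_stages):
    """Sketch of the N-entry delay-queue idea (illustrative only).

    An operand entering the N-stage pipeline at clock t emerges at
    clock t + n_stages, so after the initial latency the unit still
    delivers one quotient per clock: latency does not become a
    throughput reduction. Each quotient is combined with the previous
    one (held over from the result queue) to interpolate an
    in-between value. Returns (clock, value) pairs.
    """
    pipe = deque([None] * n_stages)   # in-flight divides (shift register)
    prev = None                       # prior quotient, read back from the queue
    out = []
    # feed the operands, then n_stages bubbles to drain the pipeline
    for clock, x in enumerate(list(dividends) + [None] * n_stages):
        pipe.append(x / 2 if x is not None else None)  # toy "divide": x/2
        q = pipe.popleft()
        if q is not None:
            if prev is not None:
                # estimate the in-between quotient by interpolation
                out.append((clock, (prev + q) / 2))
            out.append((clock, q))
            prev = q
    return out
```

With n_stages = 3 and dividends [2, 4, 6, 8], the first quotient (1.0) appears at clock 3, and one raw quotient emerges every clock thereafter, each paired with an interpolated value between it and its predecessor.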
