On Friday 04 February 2005 18:11, Timothy Miller wrote:
> On Fri, 4 Feb 2005 18:02:16 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > On Friday 04 February 2005 17:41, Timothy Miller wrote:
> > > On Fri, 4 Feb 2005 17:21:03 -0500, Daniel Phillips
> > > <[EMAIL PROTECTED]> wrote:
> > > > But why can't we solve this with an intermediate queue that
> > > > holds the same number of entries as stages in the iteration?
> > >
> > > Ummm... I'm not sure I understand what you're saying, but it
> > > sounds vaguely like the solution I already have. :)
> >
> > The problem is, I don't really know the hardware terminology very
> > well. I'll try to say it in more words.
> >
> > Suppose each horizontal iteration including the divide
> > approximation requires N stages and yields one pair of results per
> > clock. The goal is to deliver results to the next stage of the
> > pixel pipeline once per clock. To get around the N stage latency,
> > instead of delivering results to the next pixel stage, they are
> > delivered to an N entry queue, which advances one entry per clock.
> > The horizontal iterator goes to work on the next pixel pair
> > immediately. Results are thus delivered to the pixel pipeline once
> > per clock with N clocks latency.
> >
> > So latency does not turn into throughput reduction. I think.
>
> I'm still not totally sure if I get what you're trying to say, but it
> sounds like you're not grokking the iteration dependency that's
> going on in the loop. The iterator cannot immediately go on to the
> next pixel pair, because it doesn't have the results of the previous
> iteration.
Grok is the right word; the concept is clear but the details are still a little fuzzy. I was already on my way out for some gong hai fat choi by the time I realized I had stated it incorrectly. The key idea is to pipe a prior result into the queue to be used later.

Say the divide requires N clocks. At clock N, the result of this divide is placed in the N-entry queue. At clock N+1, the result of this divide and the result of the previous divide, read from the queue, are combined to set up the current pixel pair. One reason for combining quotients this way would be to estimate odd-numbered quotients by interpolating between pairs of even-numbered quotients. There are probably other dependencies I overlooked, but they can be treated the same way. I think.

Furthermore, we can treat the texture pipe as a whole in a similar way. To handle more interpolants than we have multipliers for, we have to iterate. We can accomplish this by pushing partial results into a queue that holds them for as many clocks as are required to compute the remaining interpolants, then combine, convert to RGBA and continue. If we are iterating twice (because we have up to twice as many interpolants as we can hardcode), then a new pixel enters the texture pipe every second clock.

I was mumbling about this concept much earlier, but at that time even more incoherently than here. Surely this must be a standard technique for dealing with multi-clock latency?

Regards,

Daniel

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
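[Editor's note: the delay-queue technique discussed above can be sketched in software. This is purely an illustration, not the actual OGP hardware: the function name, the toy "divide" (x/2), and the averaging combine are my own assumptions. The N-stage divider is modeled as a shift register that accepts one operand per clock and emits its quotient N clocks later, and each emerging quotient is interpolated with the prior quotient held from the result queue, as in the even/odd quotient example.]

```python
from collections import deque

def pipelined_divides(dividends, n_stages):
    """Sketch of the N-entry delay-queue idea (illustrative only).

    An operand entering the N-stage pipeline at clock t emerges at
    clock t + n_stages, so after the initial latency the unit still
    delivers one quotient per clock: latency does not become a
    throughput reduction. Each quotient is combined with the previous
    one (held over from the result queue) to interpolate an
    in-between value. Returns (clock, value) pairs.
    """
    pipe = deque([None] * n_stages)   # in-flight divides (shift register)
    prev = None                       # prior quotient, read back from the queue
    out = []
    # feed the operands, then n_stages bubbles to drain the pipeline
    for clock, x in enumerate(list(dividends) + [None] * n_stages):
        pipe.append(x / 2 if x is not None else None)  # toy "divide": x/2
        q = pipe.popleft()
        if q is not None:
            if prev is not None:
                # estimate the in-between quotient by interpolation
                out.append((clock, (prev + q) / 2))
            out.append((clock, q))
            prev = q
    return out
```

With n_stages = 3 and dividends [2, 4, 6, 8], the first quotient (1.0) appears at clock 3, and one raw quotient emerges every clock thereafter, each paired with an interpolated value between it and its predecessor.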
