On Tue, 8 Feb 2005 19:59:54 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Tuesday 08 February 2005 18:11, Lourens Veen wrote:
> > On Tuesday 08 February 2005 22:06, Daniel Phillips wrote:
> > > It's the interpolants that are really going to eat multipliers:
> > >
> > > Horizontal rasterization:
> > >
> > >   - two multiplies per interpolant for perspective correction
> > >
> > > Vertical rasterization:
> > >
> > >   - one multiply per interpolant to correct for pixel alignment
> >
> > Are these in the model yet?
> 
> Yes.
> 
> > > With 17 interpolants, most of which need perspective correction (in
> > > my opinion; some may think this justifiable only for textures),
> > > we've already exceeded our multiplier budget and haven't even begun
> > > to think about filtering, blending, mipmapping, fog and probably
> > > other things.
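A back-of-the-envelope tally shows why the budget goes so fast.  This is only a sketch.  It assumes each pixel lane needs its own two horizontal multiplies per perspective-corrected interpolant, that the vertical alignment multiply is done once per interpolant and shared between lanes, and that nothing is time-multiplexed.  The function name multipliers_needed is made up for illustration.

```python
# Rough multiplier tally for the rasterizer, with NO resource sharing.
# Assumptions (hypothetical, not taken from the model):
#  - 2 horizontal multiplies per perspective-corrected interpolant, per pixel lane
#  - 1 vertical alignment multiply per interpolant, shared by all lanes


def multipliers_needed(interpolants, perspective_corrected, pixels_per_clock):
    """Dedicated multipliers needed to sustain pixels_per_clock."""
    horizontal = 2 * perspective_corrected * pixels_per_clock
    vertical = 1 * interpolants
    return horizontal + vertical


print(multipliers_needed(17, 17, 2))  # 85: all 17 corrected, two pixels per clock
print(multipliers_needed(17, 17, 1))  # 51: same, one pixel per clock
```

Dropping to one pixel per clock, or correcting fewer interpolants, is where the savings would have to come from; whether either number fits depends on how many multipliers the part actually has.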
> >
> > If we can make a reciprocal with a LUT and some logic, maybe we can
> > do the same for a multiplier? Haven't thought it through, but
> > multiplying is generally easier than dividing.
> 
> Yes.  I suppose that is why multipliers start disappearing when you use
> the larger RAM blocks.

No, the multipliers are dedicated logic.  If you wanted to put an
18x18 multiplier into a RAM block, you'd need a 36-bit address with a
36-bit product stored at each location.  Where are you going to fit
2^36 entries of 36 bits each?
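To put numbers on that, here is the size of a complete product table for a few operand widths, compared against an 18 Kbit block RAM.  That is the block size on Xilinx Spartan-3 parts; whether it matches the chip in question is an assumption.  The names multiply_lut_bits and BLOCK_RAM_BITS are hypothetical.

```python
BLOCK_RAM_BITS = 18 * 1024  # assumed 18 Kbit block RAM (data plus parity bits)


def multiply_lut_bits(width):
    """Bits needed to tabulate every product of two unsigned width-bit numbers."""
    entries = 1 << (2 * width)   # both operands together form a 2*width-bit address
    bits_per_entry = 2 * width   # each full-precision product is 2*width bits wide
    return entries * bits_per_entry


for w in (4, 8, 18):
    bits = multiply_lut_bits(w)
    print(f"{w}x{w}: {bits:,} bits, {bits / BLOCK_RAM_BITS:,.1f} block RAMs")
```

A full 18x18 table works out to exactly 2^27 such blocks, while a 4x4 table fits in a fraction of one, which is why table multiplication only makes sense for very narrow operands.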

The multipliers disappear when you use RAMs in 36-bit mode because
each multiplier is paired with a RAM block, and they share some data
lines.  Apparently, the pairing is useful for digital signal
processing like FFTs, but since we want to use them independently,
we have to deal with some limitations.

> 
> > > So pretty soon it's time to make some hard choices about what is
> > > expendable, where to compromise on quality and throughput, and how
> > > throughput is going to degrade gracefully as features are turned
> > > on. All of which I'm sure Timothy has been thinking about, but now
> > > it's about time to take inventory and see just how bad things are.
> >
> > How complete is the software model right now? I think it would be a
> > good idea to try and complete that as much as possible. It will give
> > a complete picture of what we need to do, and a framework to figure
> > out what is the best compromise.
> 
> Did you really mean to direct all these questions to Timothy?  Anyway:
> it appears to be a rather well thought out implementation of the OpenGL
> 1.3 rendering spec, though I still haven't researched a lot of OpenGL
> details thoroughly enough to know for sure.  Others here have.

Well, if we need to add functionality, we need to figure that out. 
Otherwise, the most useful thing to do right now is to use the float25
class and perhaps start making other modifications that reflect the
implementation (like fixed-point).  But it may be too early to make
some of those decisions.

I suggest we work out exactly how many fixed-point bits each fragment
attribute needs after perspective divide.
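One way to start on that: for each attribute, pick the largest magnitude it can take and the step size it has to resolve, then count bits from those two numbers.  The sketch below does just that.  The helper names are made up, and the attribute ranges in the example calls are placeholders, not numbers from the model.

```python
def ceil_log2(x):
    """Smallest n with 2**n >= x, for integer x >= 1."""
    return (x - 1).bit_length()


def fixed_point_bits(max_magnitude, steps_per_unit, signed=False):
    """Total bits for values with magnitude below max_magnitude,
    resolved to steps of 1/steps_per_unit (both positive integers)."""
    int_bits = ceil_log2(max_magnitude)    # covers [0, max_magnitude)
    frac_bits = ceil_log2(steps_per_unit)  # step of 2**-frac_bits or finer
    return int_bits + frac_bits + (1 if signed else 0)


# Placeholder ranges, not taken from the model:
print(fixed_point_bits(2048, 16))  # 15: texel coordinate below 2048, 1/16-texel steps
print(fixed_point_bits(256, 4))    # 10: 8-bit color channel with 2 guard bits
```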

> > > It's also possible to create more multipliers in random logic, as
> > > Timothy mentioned several times, but this is only going to work out
> > > in places where precision is really limited.
> >
> > And it may be expensive. If a single generic adder takes up 1%, then
> > how much will a multiplier be?
> 
> Floating-point multiplication is easier than floating-point addition:
> you multiply the mantissas, discard the least significant bits, add the
> exponents and xor the signs.  It's coming up with lots of dedicated
> fixed-point multipliers that is the problem.  They can be pretty
> simple, but the simple implementation will eat a lot of logic in the
> form of adders.  These simple shift-add multipliers need a fixed-point
> add for each stage, and there are as many stages as there are bits.
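Here is a toy version of both points.  shift_add_mul does one conditional add per bit of the multiplier, so an n-bit hardware version unrolled for throughput needs n adders.  fmul follows the recipe above on a made-up format (hidden leading 1, 16 fraction bits, exponent bias 127; this is not the float25 layout) and adds the two steps the recipe leaves implicit: removing one copy of the bias when the exponents are added, and renormalizing by one bit when the mantissa product reaches 2.0.  Zero, denormals, overflow and rounding are ignored.

```python
import math

M_BITS = 16  # fraction bits after the hidden leading 1 (hypothetical format)
BIAS = 127   # exponent bias (hypothetical format)


def shift_add_mul(a, b, width):
    """Multiply unsigned integers; b has `width` bits. One conditional add per bit."""
    acc = 0
    for i in range(width):       # each iteration is one adder stage in hardware
        if (b >> i) & 1:
            acc += a << i        # add the shifted multiplicand
    return acc


def fmul(x, y):
    """Multiply two (sign, biased_exp, fraction) tuples, truncating the result."""
    sx, ex, fx = x
    sy, ey, fy = y
    ma = (1 << M_BITS) | fx                   # restore hidden 1: mantissa in [1, 2)
    mb = (1 << M_BITS) | fy
    prod = shift_add_mul(ma, mb, M_BITS + 1)  # mantissa product in [1, 4), scaled
    e = ex + ey - BIAS                        # add exponents, remove one bias
    if prod >> (2 * M_BITS + 1):              # product >= 2.0: renormalize
        prod >>= 1
        e += 1
    frac = (prod >> M_BITS) & ((1 << M_BITS) - 1)  # drop low bits and hidden 1
    return (sx ^ sy, e, frac)                 # xor the signs


def encode(v):
    """Nonzero Python float -> (sign, biased_exp, fraction), truncating."""
    m, e = math.frexp(abs(v))                 # abs(v) = m * 2**e, 0.5 <= m < 1
    frac = int((2 * m - 1) * (1 << M_BITS))   # mantissa 2m is in [1, 2)
    return (1 if v < 0 else 0, e - 1 + BIAS, frac)


def decode(x):
    s, e, f = x
    return (-1) ** s * (1 + f / (1 << M_BITS)) * 2.0 ** (e - BIAS)


print(decode(fmul(encode(1.5), encode(-2.5))))  # -3.75
print(decode(fmul(encode(1.5), encode(1.5))))   # 2.25, uses the renormalize step
```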

> Multiplication by table lookup is also possible, as you mentioned.

Only for numbers so small that you're better off using dedicated
logic.  The main reason to use a LUT is for complicated, nonlinear
functions that are one-in/one-out, like a reciprocal.  This also
includes color/gamma tables and things like that.
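A reciprocal table along those lines might look like the sketch below.  It assumes the input is a normalized mantissa in [1, 2) with 16 fraction bits, indexes the table with the top 10 of those bits, and stores 18-bit results, so the whole table is 1024 x 18 bits and would fit one 18 Kbit block RAM configured as 1K x 18.  Those widths and names are illustrative choices, not decisions from the model.  A gamma table is the same pattern with a different function.

```python
K = 10  # table index bits (hypothetical)
Q = 18  # bits per table entry (hypothetical)
F = 16  # fraction bits of the input mantissa (hypothetical)


def build_recip_table(k=K, q=Q):
    """Entry i holds 1/m scaled by 2^q, where m is the midpoint of bucket i in [1, 2)."""
    table = []
    for i in range(1 << k):
        m = 1 + (i + 0.5) / (1 << k)
        table.append(round((1 << q) / m))
    return table


TABLE = build_recip_table()


def recip(frac):
    """Approximate 1 / (1 + frac / 2^F) using only the top K fraction bits."""
    index = frac >> (F - K)
    return TABLE[index] / (1 << Q)


print(len(TABLE), max(TABLE) < (1 << Q))  # entry count, and whether entries fit in Q bits
print(recip(1 << (F - 1)))                # approximately 1/1.5
```

Using the bucket midpoint keeps the relative error under 2^-(K+1) plus the output rounding, so every extra index bit roughly halves the error and doubles the table.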

> In
> fact, this FPGA appears to work entirely by lookup tables and doesn't
> actually implement gate logic at all.  I don't know, maybe they all
> work that way, but this is the only one I've ever looked at and it does
> seem very cool.

That's a misleading way to put it.  Yes, Xilinx CLBs are made up of
small look-up tables, but they are really just generalized
four-input logic gates.  Basically, what you have is a register with
16 bits in it and a MUX with 4 select bits.  The output of the MUX is
the logic function.  In addition to those, there are some extra XOR
gates, MUXes, and flip-flops/latches, plus some other basic stuff.
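In software terms, a 4-input LUT is just this: the 16 stored bits are the truth table and the four inputs form the MUX select.  The helper names are made up; the 16-bit value corresponds to what Xilinx tools call the LUT's INIT value, with input a as the least significant select bit.

```python
def lut4(init, a, b, c, d):
    """Evaluate a 4-input LUT: the inputs select one of the 16 stored bits."""
    select = (d << 3) | (c << 2) | (b << 1) | a  # the 4 MUX select bits
    return (init >> select) & 1


def make_init(fn):
    """Build the 16-bit truth table for any 4-input boolean function."""
    init = 0
    for select in range(16):
        a, b, c, d = (select >> 0) & 1, (select >> 1) & 1, (select >> 2) & 1, (select >> 3) & 1
        if fn(a, b, c, d):
            init |= 1 << select
    return init


XOR4 = make_init(lambda a, b, c, d: a ^ b ^ c ^ d)
AND4 = make_init(lambda a, b, c, d: a & b & c & d)
print(hex(XOR4), hex(AND4))  # 0x6996 0x8000
```

Any four-input function costs the same single LUT, whether it is an AND or a 4-input parity.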

> I still have only an inkling of where the boundaries lie in terms of
> FPGA resources, but the picture that's beginning to emerge is that the
> render model as defined will happily use up all the resources this FPGA
> has to offer, without some really careful shoehorning.  The thing that
> makes it hard is trying to get everything running in parallel at a
> steady two pixels per clock.  And as Timothy mentioned a few times,
> having enough resources is only half the battle, there's also routing
> to worry about.
> 
> It's fun isn't it? :-)

Indeed.  :)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
