Robin Mélinand wrote:
> I think the key idea we should be following here is to minimize the
> complexity of the hardware-software combination for the specific shading
> operation.
> And indeed MISC designs have been successful in reducing this complexity
> for general-purpose computing, at least among some "extremists" following
> the path opened by the Forth programming language.
> I was going through their papers again, trying to figure out what could
> be of interest for this list. They ended up with very simple
> integer-only processors, many efficient attempts to implement them in
> FPGAs, very simple compiler heuristics, and very decent performance.
> Taking advantage of their work raises two questions for me:
> 1. Is a MISC stack processor an option for us? That is to say, are the
> stack paradigm's drawbacks greater than its advantages?
> 2. Is an FPU-based stack processor feasible? In the works I've been
> looking through, the FPU had always been removed for the sake of
> simplicity, but dropping it doesn't look like an option for us.
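As a rough illustration of question 2 (a sketch only, not a design from either poster), here is a tiny Python model of a Forth-style stack machine whose data stack holds floats instead of integers. The opcode names (LIT, FETCH, DUP, SWAP, DROP, FADD, FMUL) and the (op, arg) tuple encoding are hypothetical. The point is that floating-point operations fit the same zero-operand instruction format as the integer ones.

```python
# Hypothetical float stack machine; opcodes loosely follow Forth words.
def run(program, memory):
    """Execute a list of (op, arg) tuples on a float data stack; return the stack."""
    stack = []
    for op, arg in program:
        if op == "LIT":        # push an immediate value
            stack.append(float(arg))
        elif op == "FETCH":    # push memory[arg]
            stack.append(float(memory[arg]))
        elif op == "DUP":      # copy the top of stack
            stack.append(stack[-1])
        elif op == "SWAP":     # exchange the top two items
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DROP":     # discard the top of stack
            stack.pop()
        elif op == "FADD":     # a b -- a+b
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "FMUL":     # a b -- a*b
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

# Dot product of a = memory[0:3] and b = memory[3:6], e.g. for a lighting term.
DOT3 = [
    ("FETCH", 0), ("FETCH", 3), ("FMUL", None),
    ("FETCH", 1), ("FETCH", 4), ("FMUL", None), ("FADD", None),
    ("FETCH", 2), ("FETCH", 5), ("FMUL", None), ("FADD", None),
]

print(run(DOT3, [1, 2, 3, 4, 5, 6]))  # [32.0]
```

No instruction names a register, which is what keeps the decoder simple. The cost is that every vector component passes through the stack one at a time.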
The problem with using the FORTH stack paradigm is that the shader has
to deal with long data items (vectors), which means that the registers
need to hold four 32-bit floats. Since the length of the data varies,
this would probably result in a "sparse stack" (as in a sparse matrix)
with a lot of "0"s and wasted space. OTOH, my bytecode stack machine
concept first parses the fetched data stream and then puts it on the
stack (which would be packed and 32 bits wide) only as needed for
operations that cannot be executed immediately.
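A small sketch of the space argument, assuming that every slot of the "sparse" stack is a fixed four-float vector register and that the packed stack stores one float per 32-bit word. The item list is made up for illustration.

```python
SLOT_WIDTH = 4  # one fixed-width slot = four 32-bit floats

def sparse_layout(items, slot_width=SLOT_WIDTH):
    """Fixed-width slots: every item is zero-padded to a full vector register."""
    slots = []
    for item in items:
        if len(item) > slot_width:
            raise ValueError("item wider than a stack slot")
        slots.append(tuple(item) + (0.0,) * (slot_width - len(item)))
    return slots

def packed_layout(items):
    """32-bit-wide stack: items are laid end to end, one float per word."""
    return [x for item in items for x in item]

# A position (vec4), two scalars, and a normal (vec3).
items = [(1.0, 2.0, 3.0, 1.0), (0.5,), (0.25,), (0.0, 1.0, 0.0)]
sparse = sparse_layout(items)
packed = packed_layout(items)
wasted = sum(len(slot) for slot in sparse) - len(packed)
print(len(packed), wasted)  # 9 7
```

Here 7 of the 16 words in the sparse layout are padding zeros. A packed stack avoids that, but each pop must then know the item's length, which the bytecode would have to supply.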
However, the stack approach has the advantage that, if the program is
properly written, the stack acts as a cache and you never have to fetch
a variable from main memory twice.
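To illustrate the "stack as cache" point, this hedged sketch counts main-memory fetches for computing x*x + x in two ways: re-fetching x for every use, and fetching it once and duplicating it on the stack with Forth-style DUP. The CountingMemory class is a hypothetical model, not a real memory interface.

```python
class CountingMemory:
    """Hypothetical main-memory model that counts the fetches reaching it."""
    def __init__(self, data):
        self.data = list(data)
        self.fetches = 0

    def fetch(self, addr):
        self.fetches += 1
        return self.data[addr]

def x_squared_plus_x_refetch(mem, addr):
    # Without keeping x on the stack, every use goes back to main memory.
    return mem.fetch(addr) * mem.fetch(addr) + mem.fetch(addr)

def x_squared_plus_x_stack(mem, addr):
    # Forth-style sequence: FETCH DUP DUP F* F+ (x is read from memory once).
    stack = [mem.fetch(addr)]        # FETCH  -> x
    stack.append(stack[-1])          # DUP    -> x x
    stack.append(stack[-1])          # DUP    -> x x x
    b, a = stack.pop(), stack.pop()  # F*     -> x x*x
    stack.append(a * b)
    b, a = stack.pop(), stack.pop()  # F+     -> x+x*x
    stack.append(a + b)
    return stack.pop()

m1, m2 = CountingMemory([3.0]), CountingMemory([3.0])
print(x_squared_plus_x_refetch(m1, 0), m1.fetches)  # 12.0 3
print(x_squared_plus_x_stack(m2, 0), m2.fetches)    # 12.0 1
```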
With my approach you would need a small memory segment and a 1-way
cache to achieve this. With a small memory segment, all the data for the
current operation (including what would otherwise be immediate data) is
placed together (as with OS/360 assembler) and is then copied once, on
the first fetch, into a cache that is really a local memory. You can
also have a scratch-pad local memory to hold intermediate values that
don't need to be saved. This would be particularly applicable to a
shader since, IIUC, it outputs only one vector and everything else is
discarded.
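One way to model the "small memory segment plus 1-way cache" idea is sketched below. It assumes "1-way" means direct-mapped (one-way set-associative) and that each cache line holds a single word. The segment addresses, line count and values are hypothetical. The first read of each operand goes to main memory; later reads hit the local copy. Intermediates live in a separate scratch pad that is never written back.

```python
class DirectMappedCache:
    """1-way (direct-mapped) cache with one-word lines (illustrative assumption)."""
    def __init__(self, main_memory, num_lines=16):
        self.main = main_memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # full address stored as the tag, for simplicity
        self.lines = [0.0] * num_lines
        self.main_fetches = 0

    def read(self, addr):
        index = addr % self.num_lines
        if self.tags[index] != addr:     # miss: copy the word in on first fetch
            self.lines[index] = self.main[addr]
            self.tags[index] = addr
            self.main_fetches += 1
        return self.lines[index]

# Hypothetical segment holding all operands of the current operation.
main = {100: 0.5, 101: 2.0, 102: 4.0, 103: 1.0}
cache = DirectMappedCache(main)
scratch = [0.0] * 4  # local scratch pad for intermediates, never written back

for _ in range(2):   # run the same operation twice
    scratch[0] = cache.read(100) * cache.read(101)
    scratch[1] = scratch[0] + cache.read(102) * cache.read(103)

print(scratch[1], cache.main_fetches)  # 5.0 4
```

Eight operand reads cost only four main-memory fetches here. Two addresses that map to the same line (addresses differing by a multiple of num_lines) would evict each other. Keeping the current operation's data together in one small segment avoids that.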
--
JRT
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)