Reply to the open-hardware list.
Vinicius Santos wrote:
On Dec 15, 2007 5:47 PM, <[EMAIL PROTECTED]> wrote:
--- [EMAIL PROTECTED] wrote:
I am not speaking in generic terms; I am speaking about real-world
examples of current complex vector shaders and pixel shaders.
I speak of the actual code which we intend to use; it contains a
lot of vector and matrix operations. --- We aren't trying to make a
programmable shader with these SIMD units, are we? I think we can
easily use the SIMD units in a pseudo-fixed pipeline.
Look at some previous posts on the subject:
http://article.gmane.org/gmane.comp.graphics.opengraphics/2445
http://article.gmane.org/gmane.comp.graphics.opengraphics/2553
http://article.gmane.org/gmane.comp.graphics.opengraphics/2461
These are two shader examples and an analysis of various functions
used.
People might hate me for feeding an IMHO fruitless discussion
This is not a discussion. It is NB using the rhetorical device commonly
called 'hit and run'. I have made the mistake of trying to answer his
postings, but now realize that this is just a waste of my time. It
started with an offhand remark about Sun having released the FGX RTL
code (VIS instructions). I guess that I will have to stop making such
remarks when there are list members who seem interested mostly in
finding a way to say that I am wrong about something.
Might I add here that there is no black and white here (except perhaps
for the fact that RISC processors don't, by definition, use microcode).
There are things wrong with every possible way that we might do this.
That is what engineering is all about. You do not find the perfect
solution to a problem, you choose what you think is the best solution
from among various good solutions.
but previous threads(including those cited above) don't exactly show
how "real world shaders" translate into scalar operations:
1-You have shader examples written in a high-level (OGSL) language
2-You have some compiled into a specific (arbvp1 or arbfp1)
architecture assembly [1]
3-You have a dataflow profiling of DirectX shaders, I assume already
compiled into the architecture assembly.
The way I see it, the device driver gets that arbvp assembly code and
translates it into your own architecture's code. So can arbvp be
efficiently translated into a systolic/SIMD/multi-core
architecture? (stalls, dependencies, etc.)[*]
Those examples are not directly related to our project.
In any case, dot products are vector operations and they don't have
any dependencies when run on a vector processor that has a wide enough
word (might not be a necessary condition) and the MAC instruction.
However, dependencies might be an issue with some of the code in
ogmodel.cpp.
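To make the point about dot products concrete, here is a minimal C sketch of a 4x4 matrix times vector transform done with vector MACs. It is only an illustration: the names vec4, vmac and mat4_mul_vec4 and the 4-lane width are my assumptions and are not taken from ogmodel.cpp. Each lane accumulates one dot product (one matrix row against the vector), and no lane ever reads another lane's value.

```c
#define LANES 4  /* assumed word width: one 4-component vector per word */

typedef struct { float lane[LANES]; } vec4;  /* hypothetical vector word */

/* One vector MAC: acc = acc + a * s, lane by lane.
   The lanes are independent, so hardware could do all four at once. */
static vec4 vmac(vec4 acc, vec4 a, float s)
{
    for (int i = 0; i < LANES; i++)
        acc.lane[i] += a.lane[i] * s;
    return acc;
}

/* out = M * v, with M stored as four columns.
   After four MACs, lane i holds dot(row i of M, v). */
static vec4 mat4_mul_vec4(const vec4 col[4], vec4 v)
{
    vec4 acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int k = 0; k < 4; k++)
        acc = vmac(acc, col[k], v.lane[k]);
    return acc;
}
```

The four MACs still run one after another because each needs the previous accumulator, but there is nothing to wait for between lanes. With a narrower word, the same loop would have to be split into several passes.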
I think that OGSL, or C, can be translated into any of these
architectures. The major issue is how much hardware it would require to
implement it in various ways.
Using multiple SIMD processors controlled by microcode has the advantage
that it is totally reconfigurable and you know how much hardware it will
use. It also should be scalable. I have no doubt that a systolic array
processor will run faster. The questions are how much faster and how
much hardware will it require.
After that, can said architecture be implemented efficiently in FPGA?
(complexity,space,etc)
There are two efficiency issues here: how well shader code maps onto
an architecture, and the complexity of said architecture. People are
jumping from one issue to another, leaving questions unanswered.
Yes, that is 'hit and run' and I would very much like to have a coherent
discussion instead. I am not convinced of what the best solution is, so
I would like to discuss it. I see no value in jumping from one issue to
another to the point that I am sure that NB contradicted himself. I may
have as well, which I would attribute to the fact that I am ill and
easily confused -- can't seem to get the brain functioning fully. :-)
This whole discussion doesn't change a couple of facts:
1-As Timothy pointed out, there is already a tested model for OGA1.
Actually, there is tested C code (ogmodel.cpp). IIUC, the
implementation of this in Verilog is not yet done. I don't know if TM
has started on it yet. He said that he had some 2D stuff running in
simulation.
It might be interesting that future versions are improvements on
that. (ie. OGA2 adds more reconfigurability to the fixed pipeline,
instead of a whole new architecture)
There are certainly advantages of configurable hardware -- especially if
it is to be a custom ASIC. IIUC, ATI & nVidia use configurable arrays
of processors. 3D-Labs switched from a fixed pixel pipeline (Oxygen) to
multiple processors.
2-OGD1 isn't released yet, so people can't hack with it to figure out
its limits in any of the propositions.
Quite true, however the number of integer hardware multipliers (and
their width) in the Xilinx chip is known. I think that this is going to
be the limiting factor, but it might turn out differently.
3-Any of the models will have to be able to handle compiz and
high resolutions efficiently. Might be a limiting factor for
programmable shaders (at least in FPGA).
The number of hardware multipliers is always going to be a limiting
factor even if designing a custom ASIC because they take up a lot of
real estate on the chip and, therefore, consume a lot of power.
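As a rough back-of-the-envelope illustration of why the multiplier count matters, the C sketch below estimates how many cycles it takes to issue a given number of multiplies on a given number of hardware multipliers. The assumptions are mine: each multiplier is fully pipelined and starts one multiply per cycle, there are no stalls, and the names cycles_for and MULTS_PER_TRANSFORM are made up for this example. The multiplier counts are placeholders, not the figures for the actual Xilinx part.

```c
/* Multiplies in one 4x4 matrix times 4-vector transform. */
enum { MULTS_PER_TRANSFORM = 16 };

/* Cycles needed to issue `mults` multiplies on `units` multipliers,
   assuming one multiply started per unit per cycle and no stalls.
   Returns 0 if there are no multipliers (nothing can be issued). */
static unsigned cycles_for(unsigned mults, unsigned units)
{
    if (units == 0)
        return 0;
    return (mults + units - 1) / units;  /* ceiling division */
}
```

For example, under these assumptions cycles_for(MULTS_PER_TRANSFORM, 4) is 4 and cycles_for(MULTS_PER_TRANSFORM, 16) is 1, so quadrupling the multipliers buys a 4x speedup on this one operation at the cost of four times the multiplier area.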
--
JRT