Re: [Open-graphics] Intellectual Property, cash-flow, GPU architecture, OGA2, innovation, dominance!

Nicolas Boulay Mon, 28 May 2012 13:53:51 -0700

2012/5/28 Andre Pouliot <[email protected]>:
> Hi Nicolas
>
> 2012/5/28 Nicolas Boulay <[email protected]>:
>> How could you be efficient on a fully scalar shader with a single
>> decoder for 4 ALUs? How do you manage a register/memory bank with many
>> ports? You need many ports to feed many scalar pipelines, but many
>> ports mean slower accesses.
>>
>> GPUs use large register banks to avoid going to RAM as much as
>> possible (32K registers for Fermi?).
>>
> The design is still a paper design at this point. There were a few
> things that weren't fully thought through. But I was working out the
> hardware design with a Spartan-6 FPGA as the target, so a lot of BRAM
> was to be used, along with flip-flops in the fabric.
>
The http://milkymist.org project was done with no specific technology in
mind. The code was portable. It took 5 years to finish the HDL code,
and then the code was targeted at the best available platform (they
have a kind of tiny fragment shader).


>> I am trying to think about a very high-level instruction design to
>> fill many scalar pipelines. GPUs usually have many data formats
>> handled in registers (packed RGB, etc.). But why not also add square
>> matrices of fixed size 2 to 4, diagonals (for complex and quaternion
>> calculations), and vectors of "any" size (and arrays of vectors)?
>>
>> For example, multiplying an array of size-4 vectors by a single
>> 4x4 matrix could keep many resources busy at the same time,
>> efficiently.
>>
> We had that discussion in the past. Scalar is more practical in that
> you use less hardware than a vector architecture. That means better
> utilisation of the hardware, but at the cost of having to unroll
> vector operations into scalar ones.
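
To make the unrolling cost concrete, here is a minimal Python sketch of
how one 4x4 matrix times size-4 vector product expands into scalar
operations. It is only an illustration, not OGA2 code: the function
names, the register names (m00, v0, r0, ...) and the "mul"/"mac"
opcodes are all hypothetical.

```python
def unroll_mat4_vec4():
    """Return the scalar ops for r = M * v as (op, dst, a, b) tuples.

    Register and opcode names are hypothetical, chosen for illustration.
    """
    ops = []
    for row in range(4):
        dst = f"r{row}"
        # The first term of each row is a plain multiply: dst = m * v
        ops.append(("mul", dst, f"m{row}0", "v0"))
        for col in range(1, 4):
            # The remaining terms are multiply-accumulates: dst += m * v
            ops.append(("mac", dst, f"m{row}{col}", f"v{col}"))
    return ops


def run_scalar(ops, env):
    """Execute the unrolled ops one at a time, as a scalar pipeline would."""
    env = dict(env)
    for op, dst, a, b in ops:
        prod = env[a] * env[b]
        env[dst] = prod if op == "mul" else env[dst] + prod
    return env
```

One vector instruction becomes 16 scalar operations here, so an array
of N vectors needs 16*N decoded instructions unless the decoder can
expand a single high-level instruction itself.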

I know that. But if you have a flexible enough design, you don't have
that problem. "Simple" SIMD can't handle vectors of size 3 easily, for
example. But if your design has many decoders and many ALUs, you could
fill them with such instructions.
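
A rough way to see the size-3 problem is to count issue slots. The
Python sketch below is a simplified model under my own assumptions (one
component-wise operation per vector, 4 lanes, no data dependencies, no
register port limits); the function names are made up for this example.

```python
import math


def simd_utilisation(vec_size, lanes=4):
    """Fraction of lanes doing useful work when each vector is issued alone."""
    issues = math.ceil(vec_size / lanes)
    return vec_size / (issues * lanes)


def simd_issues(num_vectors, vec_size, lanes=4):
    """Issue slots needed when every vector occupies whole SIMD issues."""
    return num_vectors * math.ceil(vec_size / lanes)


def packed_issues(num_vectors, vec_size, lanes=4):
    """Issue slots needed when components of different vectors may share a slot."""
    return math.ceil(num_vectors * vec_size / lanes)
```

With these assumptions, simd_utilisation(3) is 0.75: one lane in four
sits idle. For 4 vectors of size 3, simd_issues(4, 3) gives 4 slots
while packed_issues(4, 3) gives 3, which is the gain a more flexible
decoder could recover.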

You could picture the design as a few big decoders in front of a RISC
CPU. If there are many "layers" of registers, it is easier to make them
fast.

>
> Regards,
> André
>>
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
