Fwd: [Open-graphics] OGA2 SIMD/MIMD

Nicolas Boulay Wed, 23 Sep 2009 05:56:26 -0700

2009/9/23 Kenneth Ostby <[email protected]>:
> Nicolas Boulay:
>>2009/9/23 Kenneth Ostby <[email protected]>:
>>> Hi,
>>>
>>> Nicolas Boulay:
>>>>2009/9/23 Hugh Fisher <[email protected]>:
>>>>> Andre Pouliot wrote:
>><...>
>>>>
>>>>Personally, LIW is what I prefer: expose every unit of the shader in
>>>>the instruction word. Then it becomes a software challenge to optimise
>>>>them.
>>>
>>> I'm unsure whether LIW is the right option for this architecture. As
>>> Andre mentioned earlier, we have a lot of threads that need to
>>> execute the same instruction over data in close spatial locality.
>>> Hence, there is really no use in having fine-grained control over the
>>> different units in a single shader, since in most cases they are going
>>> to execute the same instruction anyway. Thus, including LIW will only
>>> increase the complexity of the hardware without providing any
>>> substantial gains.
>>>
>>
>>I don't understand your point. Does that mean the ALU will be busy
>>but the other units will be idle? For example, the adder and the
>>multiplier could be separate units, and both could be kept busy at
>>the same time (a MAC instead of a MUL plus an add would be better).
>
> Aaah, the joy of terminology. If you take a look at the shader unit
> figure in [1], you can see how we plan to have several ALUs in a single
> shader. All those ALUs will execute the same instruction over
> different threads. Thus, exposing the ALUs to the software developer
> only adds more complexity on both the hardware and software side.
> Furthermore, the software side would in most cases only have to
> duplicate the same instruction over several ALUs.
>


For me, that's the definition of SIMD code. How do you deal with
branches? Do you execute both branches and discard one? If you use
masked vectors, it looks very much like the new 512-bit vector
instructions from Intel's Larrabee (AVX?).
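
A minimal sketch of the "execute both branches and discard one" approach, written in Python for readability. Each list element stands for one thread's lane, both sides of an if/else are computed on every lane, and a per-lane mask decides which result is written back. The function name `simd_if_else` and the early skip of a path that no lane takes are illustrative assumptions, not part of the OGA2 design.

```python
def simd_if_else(xs, cond, then_op, else_op):
    """Masked (predicated) SIMD evaluation of `then_op if cond else else_op`.

    Every lane runs the same instruction stream. A path is issued for all
    lanes whenever at least one lane needs it; the mask decides which lanes
    keep the result. Returns (results, number_of_paths_issued).
    """
    mask = [cond(x) for x in xs]       # one predicate bit per lane
    out = [None] * len(xs)
    paths_issued = 0
    if any(mask):                      # at least one lane takes "then"
        paths_issued += 1
        for i, x in enumerate(xs):
            r = then_op(x)             # every lane computes it...
            if mask[i]:                # ...but only masked-in lanes write
                out[i] = r
    if not all(mask):                  # at least one lane takes "else"
        paths_issued += 1
        for i, x in enumerate(xs):
            r = else_op(x)
            if not mask[i]:
                out[i] = r
    return out, paths_issued


# Eight threads, one value each (the width of 8 is hypothetical).
lanes = [-3, -1, 0, 2, 5, -7, 4, 1]
result, paths = simd_if_else(lanes, lambda x: x < 0, lambda x: -x, lambda x: x * 2)
print(result, paths)  # [3, 1, 0, 4, 10, 7, 8, 2] 2
```

When the lanes diverge, the shader pays for both paths; when all lanes agree, only one path is issued, which is why spatially coherent threads matter for this kind of design.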

> That being said, after having finished my coffee and had some time to
> think, we might be able to utilize LIW, although I'm still unsure about
> the cost-to-benefit ratio. Imagine if we include several functional
> units (adders, multipliers, etc.) in what we call the ALUs; we could
> then use LIW to fully utilize them. However, this comes with an added
> cost in logic and design complexity. The simple way to solve this
> could be to add a single multiply-add unit inside each ALU, and thus
> avoid the LIW problem altogether.
>

An x86 instruction uses 2 register addresses: 1 for reading, 1 for
read/write. It's compact, but fast only with register renaming.
A typical RISC operation is 2r1w: 3 addresses, 2 reads, 1 write. A MAC
operation is 3 reads, 1 write.

An LIW bundle of three such RISC operations could be seen as a 6-read,
3-write execution unit.

From your terminology, it looks like an ALU with a lot of register
ports (for example, a MAC/MUL unit beside a load/store unit, beside a
complete ALU without MUL).
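
A small sketch of the register-port arithmetic above. It assumes every slot of the bundle reads and writes one shared register file in the same cycle, with no operand forwarding or port sharing between slots; that is an assumption for illustration, and `Ports` and `bundle_ports` are hypothetical names.

```python
from typing import NamedTuple


class Ports(NamedTuple):
    reads: int
    writes: int


# Per-operation register-file ports, as listed in the message above.
RISC_2R1W = Ports(2, 1)  # typical 3-address RISC op: 2 reads, 1 write
MAC_3R1W = Ports(3, 1)   # multiply-accumulate: 3 reads, 1 write


def bundle_ports(slots):
    """Ports needed to issue every slot of an LIW bundle in one cycle.

    Assumes (hypothetically) that all slots share one register file and
    that no operands are forwarded between slots.
    """
    return Ports(sum(s.reads for s in slots), sum(s.writes for s in slots))


print(bundle_ports([RISC_2R1W] * 3))                   # Ports(reads=6, writes=3)
print(bundle_ports([MAC_3R1W, RISC_2R1W, RISC_2R1W]))  # Ports(reads=7, writes=3)
```

Swapping one RISC slot for a MAC adds a read port, which is the hardware cost side of the LIW versus single multiply-add trade-off discussed in this thread.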

>
>>
>>>>
>>>>Another solution is word-aligned instructions, so you could have
>>>>32-, 64- or 128-bit instruction sizes.
>>>
>>> Before we decide on the length of the instruction, it would be fun to
>>> further investigate some stuff from real life. This is where we can
>>> benefit from some of the software dudes out there. I would like to see
>>> how big the average shader code is, compared to the available memory we
>>> have on the underlying technology. By my initial calculations, if we
>>> assume 32,000 instructions in a kernel (which, from what I have seen,
>>> is a lot), we use about 250KB [1] to store it using 64-bit instruction
>>> words. That also leaves us with a lot of flexibility in the
>>> instruction word, and the decoding should really not be that hard
>>> either. However, depending on the underlying technology, 250KB might be
>>> a lot of RAM.
>>
>>I hope you can fit more than a single RISC instruction in 64 bits!
>>If you fit 3 "basic" instructions in 64 bits, you should divide your
>>result by 3.
>
> Yup, I haven't been thinking a lot about how to structure the ISA yet,
> and of course, using 64 bits for a RISC-ish ISA is a waste of space. The
> 64 bits were just to get an example of a worst-case kernel size. However,
> it would still be interesting to get some metrics on the average shader
> size, so we can get a better feeling of how big real-world programs are.
>
> [1]
> http://docs.google.com/View?id=dfsp4qpd_41dtrrskfb#Specification_for_Shaders_9367_2463043036062943
>
> --
> Life on the earth might be expensive, but it
> includes an annual free trip around the sun.
>
> Kenneth Østby
> http://langly.org
>
>
>
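
The kernel-size estimate quoted above can be checked with a few lines of Python. `kernel_bytes` is a hypothetical helper; the 32,000-instruction kernel and the 64-bit word come from the thread, and three instructions per 64-bit word is the packing Nicolas suggests, not a settled ISA decision.

```python
def kernel_bytes(n_instructions, bits_per_word=64, instrs_per_word=1):
    """Bytes needed to store a kernel when `instrs_per_word` ops share a word."""
    words = -(-n_instructions // instrs_per_word)  # ceiling division
    return words * bits_per_word // 8


worst = kernel_bytes(32_000)                      # one op per 64-bit word
packed = kernel_bytes(32_000, instrs_per_word=3)  # three ops per 64-bit word
print(worst, worst / 1024)    # 256000 250.0
print(packed, packed / 1024)  # 85336 83.3359375
```

So the "250KB" figure is 250 KiB (256,000 bytes), and packing three operations per word brings the same kernel down to roughly 83 KiB, close to the "divide by 3" estimate (the small excess comes from rounding up to whole words).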
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
