2009/9/23 Kenneth Ostby <[email protected]>:
> Nicolas Boulay:
>>2009/9/23 Kenneth Ostby <[email protected]>:
>>> Hi,
>>>
>>> Nicolas Boulay:
>>>>2009/9/23 Hugh Fisher <[email protected]>:
>>>>> Andre Pouliot wrote:
>><...>
>>>>
>>>>Personally, LIW is what I prefer: expose every unit of the shader in
>>>>the instruction word. Then it becomes a software challenge to optimise
>>>>them.
>>>
>>> I'm unsure if LIW is a good option for this architecture. This is due
>>> to the fact that Andre mentioned earlier: we have a lot of threads that
>>> need to execute the same instruction over data in close spatial
>>> locality. Hence, there is really no use in having fine-grained control
>>> over the different units in a single shader, since in most cases they
>>> are going to execute the same instruction anyway. Thus, including LIW
>>> will only increase the complexity of the hardware, without providing any
>>> substantial gains.
>>>
>>
>>I don't understand your point. Does that mean that the ALU will be full
>>but the other units will be unused? For example, the adder and the
>>multiplier could be separate units, and both could be filled at the
>>same time (a MAC instead of a MUL + an adder should be better).
>
> Aaah, the joy of terminology. If you take a look at the shader unit
> figure in [1], you can see how we plan to have several ALUs in a single
> shader. All those ALUs will execute the same instruction over
> different threads. Thus, exposing the ALUs to the software developer
> only adds more complexity on both the hardware and software. Furthermore,
> the software side will in most cases only have to duplicate the same
> instruction over several ALUs.
>
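The lockstep model in the quoted paragraph above (several ALUs executing one instruction over different threads) can be illustrated with a small sketch. This is only an assumption about how divergent branches might be handled, not the design from [1]: every lane runs both sides of the branch and a per-lane mask picks the result. The names `simd_select` and `shade` are hypothetical.

```python
# Hypothetical model: one shader whose ALU lanes all execute the same
# instruction stream in lockstep, each lane on its own thread's data.

def simd_select(mask, then_vals, else_vals):
    """Per-lane select: keep the 'then' result where the mask is set."""
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

def shade(lanes):
    # Every lane evaluates the condition with the same compare instruction.
    mask = [x < 0 for x in lanes]
    # Both sides of the branch are executed on all lanes...
    then_vals = [-x for x in lanes]      # "if x < 0: x = -x"
    else_vals = [x * 2 for x in lanes]   # "else:     x = x * 2"
    # ...and the mask decides which result each lane keeps.
    return simd_select(mask, then_vals, else_vals)

if __name__ == "__main__":
    print(shade([-3, 1, -5, 4]))
```

The cost of this approach is that a divergent branch takes the time of both sides, which is why it suits workloads where neighbouring threads mostly take the same path.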
For me, that is the definition of SIMD code. How do you deal with
branches? Do you execute both branches and discard one? If you use
masked vectors, it looks very much like the new 512-bit vector
instructions from Intel and Larrabee (AVX?).

> That being said, after having finished my coffee, and had some time to
> think, we might be able to utilize LIW, although I'm still unsure about
> the cost to benefit ratio. Imagine if we, in what we call the ALUs,
> include several functional units, adders, multipliers, &c. we can use
> LIW in order to fully utilize them. However, this comes with the added
> cost of logic, and design complexity. The simple way to solve this could
> be to add a single multiply-adder unit inside each ALU, and thus we
> avoid the LIW problem altogether.
>

An x86 instruction uses 2 register addresses: 1 for reading, 1 for
read/write. It's compact, but fast only with register renaming. A typical
RISC operation is 2r1w: 3 addresses, 2 reads, 1 write. A MAC operation is
3 reads, 1 write. An LIW could be seen as a 6-read, 3-write execution
unit (for example, three 2r1w operations issued together).

From your terminology, it looks like an ALU with a lot of register
ports (for example a MAC/MUL unit, beside a load/store unit, beside a
complete ALU without MUL).

>
>>
>>>>
>>>>One other solution is having word-aligned instructions. So you could
>>>>have 32, 64, 128 bit instruction sizes.
>>>
>>> Before we decide on the length of the instruction, it would be fun to
>>> further investigate some stuff from real life. And this is where we can
>>> benefit from some of the software dudes out there. I would like to see
>>> how big the average shader code is, compared to the available memory we
>>> have on the underlying technology. Because, by my initial calculations
>>> here, if we assume 32'000 instructions in a kernel (which from what I
>>> have seen is a lot), we use about 250KB [1] to store it using 64 bit
>>> instruction words.
>>> That also leaves us with a lot of flexibility in the
>>> instruction word, and the decoding should really not be that hard
>>> either. However, depending on the underlying technology, 250KB might be
>>> a lot of RAM.
>>
>>I hope you can put more than a single RISC instruction in 64 bits!
>>If you fit 3 "basic" instructions in 64 bits, you should divide your
>>result by 3.
>
> Yup, I haven't been thinking a lot about how to structure the ISA yet,
> and of course, using 64 bits for a RISC-ish ISA is a waste of space. The
> 64 bit was just to get an example of a worst-case kernel size. However,
> it would still be interesting to get some metric on the average shader
> size though, so we can get a better feeling of how big real-world
> programs are.
>
> [1]
> http://docs.google.com/View?id=dfsp4qpd_41dtrrskfb#Specification_for_Shaders_9367_2463043036062943
>
> --
> Life on the earth might be expensive, but it
> includes an annual free trip around the sun.
>
> Kenneth Østby
> http://langly.org
>
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
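The worst-case estimate quoted above (32'000 instructions at 64 bits each, about 250KB) and the suggested division by 3 can be checked with a short calculation. This is a minimal sketch: the function name `kernel_size_bytes` and the `ops_per_word` parameter are hypothetical, and "3 basic instructions per 64-bit word" is only the packing proposed in the thread, not a decided encoding.

```python
def kernel_size_bytes(instructions, word_bits, ops_per_word=1):
    """Storage needed for a kernel, assuming fixed-size instruction words.

    ops_per_word is how many basic operations are packed into one word
    (1 for the worst case, 3 for the packing suggested in the thread).
    """
    words = -(-instructions // ops_per_word)  # ceiling division
    return words * word_bits // 8

if __name__ == "__main__":
    worst = kernel_size_bytes(32_000, 64)                   # one op per word
    packed = kernel_size_bytes(32_000, 64, ops_per_word=3)  # three ops per word
    print(f"worst case: {worst} bytes = {worst / 1024:.1f} KiB")
    print(f"3 ops/word: {packed} bytes = {packed / 1024:.1f} KiB")
```

With one operation per word the result is 256000 bytes, which matches the 250KB figure when a KB is taken as 1024 bytes.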
