Hi,
I'll answer each email independently, for simplicity's sake.

On 2012-06-06 04:07, Nicolas Boulay wrote:
Hello,

Could you add the diagonal of a matrix of size 2 and 4 as a data type? This is
a complex and quaternion data type.

Maybe strings could be added (UTF-8)? This is quite far from GPGPU, but
strings are really a pain to manage in pure software.

Nvidia has a bunch of vector formats coded as 16-bit numbers for RGB
values, using 5 5 5 bits for each texture. "S3TC" is something similar
that packs a few pixels into 64 bits, etc. This kind of data avoids a lot
of shifts and bit manipulation, and still saves data bandwidth.

You should also add a way to define arrays of all these data types (sizes
are defined at runtime), so you could introduce a kind of "map"
instruction or behavior to apply an instruction to the whole array.
This could help to have fast, tiny inner loops. This looks like the repeat
instruction of some DSPs, but I think it's more comprehensible to link
the behavior to the data themselves.
For the types, we use a few basic data types and try to keep it simple. Each data type we need to support adds complexity we don't want, and it could be covered by another data type.

We don't support vector or matrix types. The choice was made because each ALU is a scalar unit and can process a vector or matrix as scalar components. The latency is greater, but we are optimizing hardware resource utilization.
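
As a rough illustration of handling a vector as scalar components, here is a minimal Python sketch of a 4-component dot product done one scalar step at a time. The function name dot4_scalar and the choice of a dot product are assumptions made for the example; nothing here is taken from the OGA2 instruction set.

```python
def dot4_scalar(a, b):
    """Dot product of two 4-component vectors using only scalar steps."""
    acc = a[0] * b[0]              # one scalar multiply
    for i in range(1, 4):
        acc = acc + a[i] * b[i]    # one scalar multiply and add per component
    return acc


print(dot4_scalar((1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)))  # 70.0
```

The result takes four dependent steps instead of one wide operation, which is the latency cost mentioned above.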

The big thing is that we have a "kernel" that controls multiple ALUs. Each ALU has its own thread. So essentially you are controlling, for example, N different threads of data from the same program.

In parallel to that, each step of the pipeline is controlled by a different kernel. So for one ALU you could have 8 different threads running different kernels and data sets. If you have 4 ALUs, that means you have 32 threads running concurrently, controlled by 8 kernels.
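
The thread count above can be checked with a small Python sketch. It assumes an 8-stage pipeline (one kernel per stage) and a simple round-robin rotation of kernels through the stages; the rotation order and the function names are assumptions for illustration only.

```python
def threads_in_flight(pipeline_stages, alus):
    """Each ALU holds one thread per pipeline stage."""
    return pipeline_stages * alus


def kernel_in_stage(cycle, stage, num_kernels):
    """Which kernel controls a given stage on a given cycle.

    Assumes round-robin issue: kernel k enters stage 0 on every cycle
    where cycle % num_kernels == k, then moves one stage per cycle.
    """
    return (cycle - stage) % num_kernels


STAGES = 8
ALUS = 4
print(threads_in_flight(STAGES, ALUS))  # 32
print([kernel_in_stage(10, s, STAGES) for s in range(STAGES)])
```

On any cycle, each of the 8 stages is held by a different kernel, and all 4 ALUs follow the same kernel in a given stage while working on their own data.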

It could be interesting to save data bandwidth, but we aren't trying to do that yet. We were trying to keep it simple for the few people who will write the code.
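
For reference, this is roughly the shift-and-mask work that software has to do when a packed 5-5-5 format like the one mentioned in the quoted message is not supported natively. The bit layout (red in bits 14-10, green in 9-5, blue in 4-0, top bit unused) and the function names are assumptions for the example.

```python
def pack_rgb555(r, g, b):
    """Pack three 5-bit channels into one 16-bit word (assumed layout)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 31:
            raise ValueError("each channel must fit in 5 bits")
    return (r << 10) | (g << 5) | b


def unpack_rgb555(word):
    """Recover the three 5-bit channels with shifts and masks."""
    return (word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F


word = pack_rgb555(31, 0, 7)
print(hex(word))            # 0x7c07
print(unpack_rgb555(word))  # (31, 0, 7)
```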

For the string data type, I must say I don't know of any processor that supports it natively. It's usually all software. I know of a few architectures that support BCD integers and floats, but that's for banking systems.

++ and -- are very annoying functions because they mean using the same
register to read and write.
You can, but you could also target another register. It's just that the operation is so common that having a dedicated instruction for it can be interesting, unless we use a constant register for the value 1.
Some new CPUs use an encoding for some heavily used constants. So there are
some instructions which reserve 3 bits for coding 8 constants such as 1 2 4
8 16, and not only an immediate number.

Read-after-write dependencies should be broken to make better use of the
pipeline. I have already thought about a "load load" instruction to
better encode accesses such as "pointer->tab[i]". This also hides more
memory latency: if the 2 loads cause a cache miss, the core will wait
only a single time instead of 2.

Normally we have a deep enough pipeline and enough kernels running to not care about read/write problems. For memory access, we assume that most of the data set will be available locally. With each section being broken into small work units, we believe that would be the case most of the time.

You should add a MACC operation, which is the most used instruction
(d=a*b+c). You should also think about a fast way to do polynomial
evaluation such as ((((x+a)*x+b)*x+c)*x+d)... This is used a lot in GPGPU
to approximate mathematical functions and for trigonometric functions.
This could be optimised because there is only one variable, reused a
lot, and a bunch of constants. The most common constants could also be
hardwired.
See the previous point: a simple ALU means each operation is seen as a scalar operation, so that operation will be broken down into multiple simple instructions. For the constants, there will be some values in a constant register file to help with some calculations. These could include approximations of trigonometric functions to help speed up the result.
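
The nested polynomial form quoted above maps directly onto a chain of multiply-accumulate steps. Below is a minimal Python sketch; macc and horner_monic are hypothetical helper names, and plain Python arithmetic rounds after the multiply, so this only models the data flow of a MACC, not a fused hardware operation.

```python
def macc(a, b, c):
    """Multiply-accumulate, d = a*b + c (not fused in plain Python)."""
    return a * b + c


def horner_monic(x, coeffs):
    """Evaluate (((x + a)*x + b)*x + c)*x + d ... for coeffs = [a, b, c, d, ...]."""
    acc = x + coeffs[0]           # innermost term (x + a)
    for k in coeffs[1:]:
        acc = macc(acc, x, k)     # one MACC per remaining coefficient
    return acc


# x = 2, a..d = 1, 2, 3, 4: x^4 + x^3 + 2x^2 + 3x + 4 = 42
print(horner_monic(2, [1, 2, 3, 4]))  # 42
```

Only x and the running accumulator change between steps; the coefficients are constants, which is why keeping them in a constant register file fits this pattern.
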
1/sqrt(x) is missing. It could be a one-cycle instruction; it's much
faster than sqrt() and 1/x done separately.
Divide was still an open question: it was debated, but we never decided what the best option was.
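
One well-known way to get 1/sqrt(x) without a divider is Newton-Raphson refinement, which needs only multiplies and a subtract per step. The Python sketch below is an illustration of that general technique, not a description of what OGA2 would do; the power-of-two first guess, the iteration count and the name rsqrt_newton are all assumptions.

```python
import math


def rsqrt_newton(x, iterations=6):
    """Approximate 1/sqrt(x) for x > 0 using Newton-Raphson steps."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    _, exponent = math.frexp(x)              # x = m * 2**exponent, 0.5 <= m < 1
    y = math.ldexp(1.0, -(exponent // 2))    # crude power-of-two first guess
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)      # refinement step, no division
    return y


print(rsqrt_newton(0.25))  # 2.0
```

Each step roughly doubles the number of correct bits, so a better first guess (for example from a small lookup table) would reduce the number of steps needed.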

Kenneth's remark about thread management looks like what AMD has done
for Bulldozer: many decoders that fill many ALUs, instead of having one
ALU for many decoders. I don't think it's possible to have a single
decoder for a sea of ALUs. It looks like a large SIMD processor, like a Cray
computer. SSE is missing some generic instructions, such as a vector-of-pointers
load, to really vectorise all kinds of loops.
It's not SIMD; it's more SIMT: each ALU works on an independent data set. That means no vector instructions. We evaluated the hardware usage of scalar versus vector, and you have a lot of wasted resources with a vector processor.
Regards,
Nicolas
2012/6/6 Andre Pouliot<[email protected]>:
Hi everyone,

I have two documents to share. They are both related to OGA2 and the
programmable architecture we were planning.

The first one is the architecture description for OGA2 as it was
discussed between me and Kenneth. It is now a few years old; time passes
fast. If people want to rework it, it needs some reorganizing and some
updates. Contact me and I'll allow editing for those who ask.
https://docs.google.com/document/pub?id=1yE70dWsRPmg723tfxouQHdK5Mlq3khU8gCRmiu1vxNI

The other document is the breakdown of an ALU for the shader that does
both float and integer. It's based mostly on the instruction set in
the specification. I still need to track down the original document; I
have only found the PDF.
https://docs.google.com/open?id=0B0gdvUojV4mJUWhldEZIWWx0UEk

If you have questions after looking at those documents, I'll try to
answer as best I can.

Have fun

André
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
