[EMAIL PROTECTED] wrote:
> From an architecture standpoint, why not put 5 execution units in one block? Of those 5 units, one is dedicated to memory management: load/store, register, and data movement. The other four act either as a single execution unit for vector operations or as 4 distinct units for scalar operations. Each unit could implement only a subset of the scalar operations; they don't all need to support the same ones, and this would also help reduce the overall size of each execution block.
>
> With such an architecture, and since the code to run is rather small, it will probably be possible to optimise the order of the operations so that most of the work is done in parallel, since all the units work at the same time. We just need to define a rather wide instruction memory on chip. It doesn't need to be deep, since first-generation shader programs couldn't exceed 255 instructions for a basic program. So one instruction line feeds 5 operations at a time. It looks a little bit like a DSP architecture.
>
> After that, you could replicate the meta-block many times, depending on the performance you want. But with more than one block you will need some kind of dispatcher (in hardware, or in software in the driver) to divide the work. Since it is a small processor, if you don't have too many dependencies between instructions and you know the number of clocks each one takes to execute, you could have multiple instructions executing at the same time by pipelining the operations.
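
To make the "one instruction line feeds 5 operations" idea a bit more concrete, here is a minimal sketch in Python of how a compiler or driver might pack a short shader program into 5-slot instruction lines. Everything in it is an assumption for illustration, not part of the proposal: the names (`Op`, `schedule`, `PROGRAM`), the simple greedy in-order packing, a one-line latency for every operation, and the requirement that each register is written at most once. A vector operation is assumed to occupy all four scalar slots of a line.

```python
from dataclasses import dataclass

MEM_SLOTS = 1     # one unit for load/store and data movement
ALU_SLOTS = 4     # four scalar units, or one 4-wide vector operation
MAX_LINES = 255   # program length limit quoted in the post

@dataclass
class Op:
    name: str
    kind: str          # "mem", "scalar" or "vector"
    dst: str           # destination register (assumed written only once)
    srcs: tuple = ()   # source registers

def schedule(ops):
    """Greedy packing: put each op in the earliest instruction line that
    comes after all of its producers and still has a free slot."""
    lines = []      # each line: {"mem": [...], "alu": [...], "alu_used": int}
    ready_at = {}   # register -> first line index allowed to read it
    for op in ops:
        # Hypothetical 1-line latency: a result is readable on the next line.
        i = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while True:
            while i >= len(lines):
                lines.append({"mem": [], "alu": [], "alu_used": 0})
            line = lines[i]
            if op.kind == "mem":
                if len(line["mem"]) < MEM_SLOTS:
                    line["mem"].append(op.name)
                    break
            else:
                cost = ALU_SLOTS if op.kind == "vector" else 1
                if line["alu_used"] + cost <= ALU_SLOTS:
                    line["alu"].append(op.name)
                    line["alu_used"] += cost
                    break
            i += 1   # no free slot on this line, try the next one
        ready_at[op.dst] = i + 1
    if len(lines) > MAX_LINES:
        raise ValueError("program does not fit in instruction memory")
    return lines

# A made-up six-operation program, only to exercise the scheduler.
PROGRAM = [
    Op("ld_a", "mem", "a"),
    Op("ld_b", "mem", "b"),
    Op("mul", "scalar", "t0", ("a", "b")),
    Op("add", "scalar", "t1", ("a", "a")),
    Op("dot", "vector", "v", ("t0", "t1")),
    Op("st", "mem", "out", ("v",)),
]

for index, line in enumerate(schedule(PROGRAM)):
    print(index, "mem:", line["mem"], "alu:", line["alu"])
```

With this made-up program the six operations fit in five instruction lines, because `add` only depends on `a` and can share a line with the second load. A real scheduler would also have to model per-unit operation subsets and multi-cycle latencies, which this sketch leaves out.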
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
