Timothy Normand Miller wrote:

> The global scratch memory would have to be meted out in some
> semi-static way, depending on how many tasks are expected to be
> running simultaneously.  This is the part that I'm debating about,
> whether this is adequate, or if we need to design the local register
> file for worst-case.

Does the global scratch memory really need to be global? What if groups
of ALUs could share parts of their register files? An example: ALUs 0-3
run kernel 1, ALUs 4-7 run kernel 2, and so on. If kernel 2 needs many
registers, then (for 0 <= n < 4) ALUs n, n+8 and n+12 could each give half,
or maybe up to 3/4, of their registers to ALU n+4.
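To make the arithmetic of that example concrete, here is a small Python
sketch. The register-file size (REGS_PER_ALU = 32) and the function name
lending_plan are hypothetical, picked only for illustration. It only counts
registers; it says nothing about how the hardware would route them.

```python
# Hypothetical sketch of the example: 16 ALUs in four groups of four,
# one kernel per group (ALUs 0-3 run kernel 1, 4-7 kernel 2, and so on).
# For 0 <= n < 4, ALUs n, n+8 and n+12 lend part of their register files
# to ALU n+4, which runs the register-hungry kernel 2.

REGS_PER_ALU = 32  # assumed register-file size per ALU, for illustration only
NUM_ALUS = 16


def lending_plan(fraction):
    """Return {alu: registers available} when each donor ALU lends
    `fraction` of its registers to its partner in the kernel-2 group."""
    regs = {alu: REGS_PER_ALU for alu in range(NUM_ALUS)}
    lent = int(REGS_PER_ALU * fraction)  # registers moved per donor
    for n in range(4):
        receiver = n + 4
        for donor in (n, n + 8, n + 12):
            regs[donor] -= lent
            regs[receiver] += lent
    return regs


for fraction in (0.5, 0.75):
    plan = lending_plan(fraction)
    print(f"lend {fraction:.0%}: ALU 4 has {plan[4]} registers, "
          f"ALU 0 keeps {plan[0]}")
```

With 32 registers per ALU, lending half gives each kernel-2 ALU 80
registers (donors keep 16); lending 3/4 gives 104 (donors keep 8). The
total of 512 registers never changes; only their assignment moves.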

This could allow fast access, because RAM ports wouldn't be a bottleneck,
but I don't know how large the routing overhead would be compared with a
global scratch memory design. Maybe the ALUs could be organized as a matrix
in which all ALUs in a row share registers and all ALUs in a column run the
same kernel.
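The matrix idea can be modeled the same way. This is a hypothetical Python
sketch, not a proposed design: it assumes a 4x4 array numbered so that
column c holds ALUs 4c..4c+3 (one kernel, matching the groups in the
example above) and row r holds ALUs r, r+4, r+8, r+12 (one shared register
pool). The names alu_index, allocate and REGS_PER_ALU are made up here.

```python
# Hypothetical model of the matrix layout: ALU index = ROWS * col + row.
# Column c holds ALUs 4c..4c+3 and runs a single kernel; row r holds ALUs
# r, r+4, r+8, r+12, whose register files form one shared pool.

ROWS, COLS = 4, 4
REGS_PER_ALU = 32  # assumed size of each ALU's local register file


def alu_index(row, col):
    return ROWS * col + row


def row_members(row):
    """ALUs that pool their register files."""
    return [alu_index(row, c) for c in range(COLS)]


def column_members(col):
    """ALUs that run the same kernel."""
    return [alu_index(r, col) for r in range(ROWS)]


def allocate(demand):
    """demand[c] = registers each ALU in column c needs for its kernel.
    Every row sees the same demands (one kernel per column), so a single
    check against one row's pool decides feasibility for the whole array.
    Returns {alu: registers} or None if the demands do not fit."""
    if len(demand) != COLS:
        raise ValueError("need exactly one demand per column")
    if sum(demand) > COLS * REGS_PER_ALU:
        return None
    return {alu_index(r, c): demand[c]
            for r in range(ROWS) for c in range(COLS)}


print(row_members(0))             # [0, 4, 8, 12]
print(column_members(1))          # [4, 5, 6, 7]
print(allocate([32, 104, 8, 8]))  # None: 152 > 128 registers per row
```

For example, allocate([8, 104, 8, 8]) fits exactly in a row's 128-register
pool. In this layout each register bank only has to be reachable from the
four ALUs in its row rather than from all sixteen, which hints at why the
routing might stay cheaper than for a fully global scratch memory, though
the sketch says nothing about actual wire cost.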


- Viktor Pracht

_______________________________________________
Open-graphics mailing list
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
