Timothy Normand Miller wrote:
> The global scratch memory would have to be meted out in some
> semi-static way, depending on how many tasks are expected to be
> running simultaneously. This is the part that I'm debating about,
> whether this is adequate, or if we need to design the local register
> file for worst-case.

Does the global scratch memory really need to be global? What if groups of ALUs could share parts of their register files instead?

An example: ALUs 0-3 run kernel 1, ALUs 4-7 run kernel 2, and so on. If kernel 2 needs many registers, then (for 0 <= n < 4) ALUs n, n+8 and n+12 can each give half, or maybe up to 3/4, of their registers to ALU n+4.

This could enable fast access because RAM ports won't be a bottleneck, but I don't know how large the routing overhead would be compared to a global scratch memory design. Maybe the ALUs can be organized in a matrix where all ALUs in a row can share registers and all ALUs in a column run the same kernel.

- Viktor Pracht
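
To make the row/column idea above concrete, here is a rough Python sketch of the bookkeeping only. It assumes 16 ALUs arranged as 4 kernels by 4 lanes, and a register file of 32 entries per ALU. That size and all of the names (REGS_PER_ALU, alu_index, row_peers, share_registers) are made up for illustration and are not from any actual design. It says nothing about routing cost, which is the open question.

```python
from fractions import Fraction

NUM_KERNELS = 4    # columns: ALUs in a column run the same kernel
LANES = 4          # rows: ALUs in a row may share registers
REGS_PER_ALU = 32  # hypothetical register file size, chosen for illustration

def alu_index(kernel, lane):
    """ALU number for a given kernel column and lane row (ALUs 4-7 are column 1)."""
    return kernel * LANES + lane

def row_peers(alu):
    """Other ALUs in the same row, i.e. the possible register donors."""
    lane = alu % LANES
    peers = [alu_index(k, lane) for k in range(NUM_KERNELS)]
    return [p for p in peers if p != alu]

def share_registers(receiver_kernel, fraction):
    """Register count per ALU after every row peer lends `fraction`
    of its registers to the ALU in the receiving kernel's column."""
    budget = {a: REGS_PER_ALU for a in range(NUM_KERNELS * LANES)}
    lent = int(REGS_PER_ALU * Fraction(fraction))
    for lane in range(LANES):
        receiver = alu_index(receiver_kernel, lane)
        for donor in row_peers(receiver):
            budget[donor] -= lent
            budget[receiver] += lent
    return budget

if __name__ == "__main__":
    # Column 1 is "kernel 2" in the example: ALUs 4-7.
    print(row_peers(4))
    print(share_registers(1, Fraction(1, 2)))
```

With these assumed numbers, lending half leaves each donor with 16 registers and gives each ALU in 4-7 a total of 80. Lending 3/4 leaves donors with 8 and gives the receivers 104.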
