On Tue, Oct 6, 2009 at 4:41 PM, Viktor Pracht <[email protected]> wrote:
> Timothy Normand Miller wrote:
>
>> The global scratch memory would have to be meted out in some
>> semi-static way, depending on how many tasks are expected to be
>> running simultaneously. This is the part that I'm debating about,
>> whether this is adequate, or if we need to design the local register
>> file for worst-case.
>
> Does the global scratch memory really need to be global? What if groups
> of ALUs can share parts of their register files? An example: ALUs 0-3
> run kernel 1, ALUs 4-7 run kernel 2, and so on. If kernel 2 needs many
> registers, then (for 0<=n<4) ALUs n, n+8 and n+12 can give half or maybe
> up to 3/4 of their registers to ALU n+4.

Usually, only one kernel will be running at a time. That is, 1024
fragment tasks will be running (or scheduled) at the same time, all
running the same kernel, so all running tasks will require the same
resources. There are therefore no trade-offs to be had between
kernels. Or, you might say, the trade-off is in the number of tasks
that can be running simultaneously. See below.

>
> This can enable fast access because RAM ports won't be a bottleneck, but
> I don't know how much the routing overhead will be compared to a global
> scratch memory design. Maybe the ALUs can be organized in a matrix where
> all ALUs in a row can share registers and all ALUs in a column run the
> same kernel.

Say we provide 1024 words in the global memory. And say we provide 128
scalar registers to each shader, but each task needs 131 local
variables. The three variables that don't fit in the registers have to
live in global memory, so each task needs three global words. That
means we have to divide the 1024 globals by three, and therefore we
can only process 341 fragments simultaneously. You see where this is
going.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
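
The arithmetic above can be written out as a small sketch. This is only an illustration of the 1024 / 128 / 131 example, not a description of any actual scheduler; the function name and parameters are made up for the purpose, and it assumes every task runs the same kernel and that each variable beyond the register file takes exactly one global word.

```python
def max_simultaneous_tasks(global_words, regs_per_task, vars_per_task):
    """Hypothetical helper: how many tasks fit when excess locals spill
    into a shared global scratch memory.

    Assumes all tasks run the same kernel, so each one spills the same
    number of words. Returns None when nothing spills, since the global
    memory is then not the limiting resource.
    """
    # Variables that do not fit in the local register file.
    spill_per_task = max(0, vars_per_task - regs_per_task)
    if spill_per_task == 0:
        return None
    # Each task needs its own slice of the global memory.
    return global_words // spill_per_task


# The example from the message: 131 locals, 128 registers, 1024 globals.
print(max_simultaneous_tasks(1024, 128, 131))  # 341
```

Under the same assumptions, a kernel with one more local variable (132) would spill four words per task and drop the limit to 256 tasks, which shows how quickly occupancy falls once a kernel exceeds the register file.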
