How many local variables do we need per shader kernel? By local variables, I mean 32-bit registers.
If we reserve 6-bit fields, that gives us 64 scalars or 16 vectors. I think that's not enough. 8-bit fields give us 256 scalars or 64 vectors, but I'm afraid of them going unused, wasting tons of chip area. We need to design for the worst case, though.

Say we have 32 registers but also include "scratch memory", such as a global dcache that spills into graphics memory. The problem is that if demand for it exceeds its size, it'll start thrashing, and performance will bog down badly. Moreover, since we have a global dcache for surfaces anyhow, we might as well just use that. The overhead of all of these threads accessing their own dynamically allocated memory would be massive, though.

Another option is to have a limited global scratch memory that threads can semi-dynamically access. In other words, since all kernels have the same demand, take the number of words in the global scratch memory and divide by the number of threads; that is each thread's share. Or, looking at it the other way, if threads demand X variables beyond their local register file, then divide the number of words in scratch space by X, and that's the max number of kernels that can be running at one time.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
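P.S. To make the arithmetic concrete, here is a rough sketch in Python. The function names are made up for illustration, the 4-lanes-per-vector figure is just what 64 scalars / 16 vectors implies, and the 4096-word scratch size is a hypothetical number, not a proposal.

```python
def register_counts(field_bits, lanes_per_vector=4):
    """Registers addressable by an operand field of `field_bits` bits.

    lanes_per_vector=4 is inferred from "64 scalars or 16 vectors".
    """
    scalars = 1 << field_bits
    vectors = scalars // lanes_per_vector
    return scalars, vectors


def max_resident_kernels(scratch_words, extra_vars_per_thread):
    """Kernels that can run at once if each statically owns
    `extra_vars_per_thread` words of the shared scratch memory."""
    if extra_vars_per_thread == 0:
        return None  # no demand beyond the register file: scratch is not the limit
    return scratch_words // extra_vars_per_thread


if __name__ == "__main__":
    for bits in (6, 8):
        print(bits, register_counts(bits))
    # Hypothetical: 4096-word scratch memory, kernels needing 8 extra words each.
    print(max_resident_kernels(4096, 8))
```

The point of the second function is that the limit is fixed when the kernel is loaded, so nothing has to be allocated dynamically per thread while it runs.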
