On Tue, Oct 6, 2009 at 12:50 AM, Hugh Fisher <[email protected]> wrote:
> Timothy Normand Miller wrote:
>>
>> How many local variables should we need per shader kernel? And by
>> local variables, I mean 32-bit registers.
>>
>> If we reserve 6-bit fields, that gives us 64 scalars or 16 vectors. I
>> think that that's not enough.
>>
>> 8-bit fields gives us 256 scalars or 64 vectors, but I'm afraid of
>> them going unused, wasting tons of chip area.
>
> Do "local variables" include the vertex attributes / fragment shader
> varying values? (For those who are a bit rusty on GPU shaders, these
> are the "incoming argument values" for shader kernels: vertex coords,
> surface normals, tex coords, for vertex shaders; interpolated coords,
> tex coords, and colors for fragment shaders.)

Yes. The way I want to handle these arguments is something like this:

The rasterizer can handle some large number of arguments (not sure how
many), but it doesn't iterate all of them simultaneously. For
instance, it might do them in groups of 8 scalars, which should cover
really simple cases like Gouraud shading. With multiple textures,
etc., you can switch to 16 or other multiples of 8. This way, we don't
waste resources on counters that we don't always use. Besides,
there's the next part where the shader has to pull them out of a pipe,
which can take multiple cycles.

The shader would have some sort of queue of these arguments that it
pops from in order to retrieve them, and it would have to store them
in registers (or some sort of local variable).

> The OpenGL spec requires at least 16 4-way vector attributes for
> vertex shaders, and at least 32 4-way vector varying values for
> fragment shaders. So, 128 regs, just for arguments.
>
> Assuming that OGA2 is not intended for the high end gaming market,
> I suggest enough registers to handle the fixed function pipeline for
> shaded & textured polygons, and accept a performance penalty on
> more complex shaders with reads/writes of additional arguments from
> some form of per-kernel memory.

There are five types of "memory" that each task (a thread running a
kernel on a shader) would have access to:

- Local registers
- Vertex/fragment attributes
- Global constants
- Graphics memory (the slowest)
- Global scratch memory

The global constants could be stored in a block RAM and would be
read-only by tasks. The global scratch memory would have to be meted
out in some semi-static way, depending on how many tasks are expected
to be running simultaneously. This is the part that I'm debating:
whether this is adequate, or if we need to design the local register
file for the worst case.

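To put rough numbers on that trade-off, here is a small Python sketch
of the arithmetic discussed in this thread. It is only an illustration:
the function names are made up for this example, the argument counts
are the minimums quoted above (taken at face value, not checked against
the spec), and the group size of 8 is the tentative figure mentioned
for the rasterizer.

```python
SCALARS_PER_VECTOR = 4  # 4-way vectors of 32-bit scalars, as in the thread


def register_file_capacity(field_bits):
    """Return (scalars, vectors) addressable by a register field of field_bits bits."""
    scalars = 1 << field_bits
    return scalars, scalars // SCALARS_PER_VECTOR


def argument_scalars(vector_count):
    """Scalar registers needed to hold vector_count 4-way argument vectors."""
    return vector_count * SCALARS_PER_VECTOR


def rasterizer_groups(scalar_args, group_size=8):
    """Groups of group_size scalars needed to cover scalar_args (rounded up)."""
    return -(-scalar_args // group_size)


# Minimums as quoted in this thread (assumption: not verified here).
ARGUMENT_VECTORS = {"vertex attributes": 16, "fragment varyings": 32}

if __name__ == "__main__":
    for bits in (6, 8):
        scalars, vectors = register_file_capacity(bits)
        print(f"{bits}-bit field: {scalars} scalars / {vectors} vectors")
        for name, count in ARGUMENT_VECTORS.items():
            needed = argument_scalars(count)
            spare = scalars - needed  # negative means the arguments do not fit
            print(f"  {name}: {needed} scalars, "
                  f"{rasterizer_groups(needed)} groups of 8, spare {spare}")
```

Under those assumptions, a 6-bit field is filled completely by 16
attribute vectors and cannot hold 32 varying vectors at all, while an
8-bit field still has half the register file free in the varying case.
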
>
> For the fixed function pipeline the vertex shader is the more complex
> one, needing 4 argument vectors and enough working registers for a
> full matrix x vector transform and Gouraud lighting equation with one
> light source.

Even in this case, I'm not sure how many scalars it translates into.

-- 
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
