Sorry, replying to self. I just remembered some other benefits of the partitioned BRAM-based shader.
#1 You can save and restore frozen execution contexts to the framebuffer, allowing task switching by (re)using the shader's DMA unit.
#2 If you can reuse the same shader program and execution context for each fragment as you move across a scanline, I'm betting that the execution state will effectively cache the 'hot' section of the texture(s).
#3 If you then break each partition of the data memory into several 'pages', you can effectively pass arguments into an existing shader program by switching only one or two of the pages into the execution context.
#4 It would probably be possible to modify the instruction and data memories to be treated sort of like normal instruction and data caches; but I don't think doing that would be a good idea.
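To make #1 and #3 a bit more concrete, here is a rough software sketch in Python. It is only a model under assumptions of mine, not a hardware design: the page size, the number of pages per context, and every name in it (ExecContext, Framebuffer, dma_save, dma_restore, pass_arguments) are made up for illustration.

```python
from dataclasses import dataclass, field

PAGE_WORDS = 64      # hypothetical page size, in words
PAGES_PER_CTX = 4    # hypothetical number of pages per data partition


@dataclass
class ExecContext:
    """Frozen state of one shader: program counter plus paged data memory."""
    pc: int = 0
    pages: list = field(
        default_factory=lambda: [[0] * PAGE_WORDS for _ in range(PAGES_PER_CTX)]
    )


class Framebuffer:
    """Stand-in for framebuffer memory reached through the shader's DMA unit."""

    def __init__(self):
        self._slots = {}

    def dma_save(self, slot, ctx):
        # Copy out, so later changes to the live context don't alter the saved one.
        self._slots[slot] = (ctx.pc, [list(p) for p in ctx.pages])

    def dma_restore(self, slot):
        pc, pages = self._slots[slot]
        return ExecContext(pc=pc, pages=[list(p) for p in pages])


def pass_arguments(ctx, page_index, args):
    """Swap in a single page holding new arguments; other pages stay resident."""
    page = list(args) + [0] * (PAGE_WORDS - len(args))
    if len(page) != PAGE_WORDS:
        raise ValueError("too many arguments for one page")
    ctx.pages[page_index] = page
```

A task switch is then dma_save on the outgoing context followed by dma_restore on the incoming one, and a "call" into an already loaded program is a single pass_arguments on whichever page the program reads its inputs from.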
Basically, the overall shader architecture I was last working on was a network of simple CPUs, each with a few K of almost-zero-latency local data and program memory (the size being set by the BRAM size of the FPGA family). The programming model would be sort of like MPI, with a shared, incoherent, but fine-grained lockable framebuffer.
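As a rough illustration of that programming model, here is a small Python sketch with threads standing in for the simple CPUs. Everything in it is an assumption for the sake of the example: the tile size, the class and function names, and the choice of one lock per tile. It also does not model the incoherence, since Python threads see one shared memory.

```python
import threading

TILE = 4  # hypothetical lock granularity: pixels per lockable tile


class LockableFramebuffer:
    """Shared framebuffer with one lock per small tile of pixels."""

    def __init__(self, n_pixels):
        self.pixels = [0] * n_pixels
        n_tiles = (n_pixels + TILE - 1) // TILE
        self._locks = [threading.Lock() for _ in range(n_tiles)]

    def tile_lock(self, pixel):
        return self._locks[pixel // TILE]


def core(rank, n_cores, fb):
    """One simple CPU: works in private memory, locks only the tile it writes."""
    local = [0] * 16  # stand-in for the core's private BRAM data memory
    for px in range(rank, len(fb.pixels), n_cores):
        local[0] = px * 2            # compute in local memory
        with fb.tile_lock(px):       # fine-grained lock around the shared write
            fb.pixels[px] += local[0]


def run(n_cores=4, n_pixels=16):
    fb = LockableFramebuffer(n_pixels)
    threads = [
        threading.Thread(target=core, args=(rank, n_cores, fb))
        for rank in range(n_cores)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fb.pixels
```

Each core is addressed by a rank, MPI-style, and pixels are interleaved across cores so that every tile is touched by all of them, which is the case where the per-tile locks matter.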
This does not sound like the direction OGP is going to go in, but hey, typing this up got my juices going.
-John
--
John R. Culp [email protected]
