2012/11/12 Timothy Normand Miller <[email protected]>:
...
> So, what I envision giving you, for your embedded design, is a single
> streaming multiprocessor that can run 64 threads (4 wide, 16 deep) for
> fragments. All of those will share a single L1I, but that's completely
> separate from the CPU's L1 caches. And then a duplicate module for
> vertices, optionally. (And since the architecture is open, that division is
> merely a loose recommendation.) These will have to interface with your
> memory system, and if that's virtually addressed, that's just great. For
> your memory system, I'm expecting to be able to queue up read requests,
> where each read address comes along with a tag that indicates the requestor,
> and the read data then comes back in another queue with the identifying
> tags. Other analogous solutions can be implemented, but the main concern is
> that since there are many requestors, we need a way to identify where the
> read data has to go back to.
>
This looks like a "split transaction" in the AMBA world. But it would be nice to have every read split, to hide the latency of the memory access. This is the normal way of doing things in a network with "packet" transfer. You could do this at the CPU level, with one register where you write the address you want to read and another register from which you read the returned data. If you have 20 such register pairs, you could handle 20 memory streams at the same time. Only a read on the data register will block, and only if the data is not yet ready. You could also use the address register bank to do complex prefetch if you want.

_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
