On Thu, May 31, 2012 at 9:10 AM, Nicolas Boulay <[email protected]> wrote:
> 2012/5/31 Timothy Normand Miller <[email protected]>:
>> Nicolas, your idea regarding LIW instructions has merit except for one
>> problem: Register file bandwidth.
>
> Sure. In ASICs, register files with 8 read and 8 write ports exist, but
> as I understand it they are neither fast nor common. But is the extra
> logic to manage 4 read and 2 write ports a good deal?
>
>> With very clever pipeline organization, we can read two regs at once,
>> and when the write-back occurs, it is always timed with a
>> non-conflicting bank. Note that even in ASICs, SRAM blocks are almost
>> always dual-ported. Now, if we were to have separate FP and INT reg
>
> SRAM blocks are good for big register files. For 4 or 8 registers, you
> could even implement them with ordinary gates (or let the router do the job).
A small set of SRAMs is shared across 32 threads. The design will
become clearer when Andre posts his stuff.

>
>> files (which CPUs do for bandwidth reasons), that would be fine, but
>> now that doubles the SRAM resources we need for a core in a way that
>> isn't economical, since the int registers will go underutilized.
>
> It depends on the total number of registers. 8 is the minimum. 32+32 is
> the norm for RISC CPUs. 256 are used in the Fujitsu SPARC VII. Fermi
> has 32K registers, if I remember correctly.

Some GPUs have "scratch pad" memory, which might account for the 32K
you're talking about.

>
>> So for very practical reasons, every instruction must have two
>> register inputs and one register output. There are very few
>> combinations that would work otherwise. Only memory write and branch
>> have no target register. They could be combined, sometimes, with a
>> single-input ALU op, but we haven't decided that we'll have any of
>> those. How often is it useful to combine a memory write with an
>> FPNEG? Do we really want to spend the extra logic to optimize such a
>> low-probability case?
>>
>
> And what about using explicit register banks? You could split the file
> in half, with a lost cycle in case of conflict.

Oh, no. We don't want to do any dynamic timing. We don't want any
uncertainty about when results will be available that would force us to
hold up fetch. The only time we do this is on memory reads, where we
know way in advance that we're going to stall fetch until the data is
ready.

>
> You could also have fast registers (like an L1 cache) and slow ones (like
> L2). Imagine 8 "fast" registers built from logic gates with many ports,
> and a large 248-register file that is only dual-ported, but with enough
> registers to avoid RAM accesses.
>
> If a "many"-ported register file is too costly, you could have a specific
> register bank for the FPU only (only the FPU could write to it). It would
> look like the big A register of some TI DSPs.
>
> All of these solutions complicate the programming model, but enable 100%
> use of the FPU.

Let's get into the OGA2 design and then start seeing how we can make
tweaks to that.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project
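
To make the objection to "a lost cycle in case of conflict" concrete,
here is a minimal sketch of the split-bank proposal discussed above. It
is not the OGA2 design. The bank mapping (register number modulo the bank
count), the one-read-per-bank-per-cycle limit, and the names bank_of and
issue_cycles are all assumptions made up for illustration.

```python
from typing import List, Tuple

# One instruction: (dst, src_a, src_b) register numbers.
Instr = Tuple[int, int, int]


def bank_of(reg: int, num_banks: int = 2) -> int:
    """Hypothetical mapping: the low-order bits of the register number pick the bank."""
    return reg % num_banks


def issue_cycles(program: List[Instr], num_banks: int = 2) -> int:
    """Count issue cycles, assuming each bank serves one read per cycle.

    An instruction whose two distinct source registers live in the same
    bank costs one extra cycle (the "lost cycle" from the thread).
    """
    cycles = 0
    for _dst, src_a, src_b in program:
        cycles += 1
        same_bank = bank_of(src_a, num_banks) == bank_of(src_b, num_banks)
        if src_a != src_b and same_bank:
            cycles += 1  # second read has to wait for the bank
    return cycles


# Four instructions; two of them read both sources from one bank.
program = [(2, 0, 1), (4, 2, 6), (5, 3, 4), (7, 1, 5)]
print(issue_cycles(program))  # 6 cycles for 4 instructions
```

Under these assumptions the cycle count depends on which registers the
compiler happened to allocate, which is the kind of variable timing the
reply above wants to keep out of the pipeline.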
