2013/3/18 Timothy Normand Miller <[email protected]>

> On Mon, Mar 18, 2013 at 4:53 AM, Nicolas Boulay <[email protected]> wrote:
>
>> Registers are a very precious resource. Memory keeps getting slower
>> relative to the CPU (it's even worse from the latency point of view). So
>> spending a register code on /dev/null is a costly solution if we are
>> constrained on instruction size. A CPU with large code is under more
>> pressure to reduce code size than a GPU, where the code is smaller.
>>
>> The MSP430 uses 16- and 32-bit instruction sizes; 32-bit instructions use
>> the second 16-bit half as an immediate, which is quite clean.
>>
>> One of the new CPUs has a specific encoding for constants. It's like
>> having 3 bits that encode 8 values, including -1, 0, 1, 2, 4, 8, 16, the
>> most used constants, to avoid using larger code.
>>
>> - A large instruction word is costly only for large code
>>
>> - Dependencies between registers are always a plague for pipeline
>> performance
>>
>> - Registers and register address space are among the most precious
>> resources of a CPU
>
> All very true.
>
>> - Immediates could be coded as an enum or constant name for the most
>> used values
>
> Yes. This is equivalent to having a shared extension to the register file
> that contains constants.
>
> I have gaps in my knowledge about some architectures, so there are some
> features (such as a constant file) that I am more inclined to adopt because
> earlier architectures have proved them to be useful. Once I understand
> more of this, I'll be more willing to consider creative new features, and
> by that point I hope to have some infrastructure for testing.

If the instruction size is a problem, I think a large register bank whose contents can only be moved to and from the normal registers and memory could be useful. This kind of register could replace the write buffer and prefetching by preloading instead.
The idea is to fill 2 or 4 registers with a single load from main memory (a preload), or to write them back with a single store; partial writes should be impossible. Each loop could be split into groups of 2 or 4 iterations using this special register bank. This makes better use of DRAM bursts without the timing problems that come with prefetching.
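
To make the preload idea concrete, here is a minimal C sketch of how such a bank might behave. Everything in it is an assumption for illustration: the names `staging_bank`, `burst_load`, `burst_store` and `add_const_burst`, the burst width of 4 words, and the use of `memcpy` to stand in for a single memory transaction. It is a software model, not a proposed encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BURST_WORDS 4  /* assumed: one DRAM burst fills 4 staging registers */

/* Hypothetical staging bank: only whole-burst transfers touch memory. */
typedef struct {
    uint32_t r[BURST_WORDS];
} staging_bank;

/* Models one burst load that fills every staging register at once. */
static void burst_load(staging_bank *b, const uint32_t *mem)
{
    memcpy(b->r, mem, sizeof b->r);
}

/* Models one burst store; there is deliberately no partial-write variant. */
static void burst_store(const staging_bank *b, uint32_t *mem)
{
    memcpy(mem, b->r, sizeof b->r);
}

/* Adds k to every word of data, one burst-sized group at a time.
   n must be a multiple of BURST_WORDS.
   Returns the number of modeled memory transactions. */
static size_t add_const_burst(uint32_t *data, size_t n, uint32_t k)
{
    size_t transactions = 0;
    staging_bank bank;

    assert(n % BURST_WORDS == 0);
    for (size_t i = 0; i < n; i += BURST_WORDS) {
        burst_load(&bank, &data[i]);
        transactions++;
        for (size_t j = 0; j < BURST_WORDS; j++) {
            uint32_t tmp = bank.r[j];  /* move: staging -> normal register */
            tmp += k;                  /* compute on the normal register */
            bank.r[j] = tmp;           /* move: normal register -> staging */
        }
        burst_store(&bank, &data[i]);
        transactions++;
    }
    return transactions;
}
```

In this model, processing 8 words costs 4 memory transactions instead of 16 individual loads and stores, which is the burst-friendly behaviour described above.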
I would also like to see a load_load instruction, so that a variable access like "struct->struct.i" costs a single stall instead of two.

>> Nicolas
>
> --
> Timothy Normand Miller, PhD
> Assistant Professor of Computer Science, Binghamton University
> http://www.cs.binghamton.edu/~millerti/
> Open Graphics Project
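
For reference, the load_load idea mentioned above could have semantics along these lines. This is only a sketch under assumptions: the function name `load_load`, its (base, offset, offset) operand form, and the example structs are all hypothetical, and the access is read here as a pointer chase (load a pointer, then load a field through it). The C code models the result of the instruction only; it says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example types, invented for illustration. */
struct inner { int pad; int i; };
struct outer { int tag; struct inner *in; };

/* Hypothetical load_load: load a pointer from base + off1, then load an
   int from that pointer + off2. In hardware this would be one instruction
   with one stall; here it is just two dependent reads.
   Assumes all object pointers share one representation. */
static int load_load(const void *base, size_t off1, size_t off2)
{
    const char *p;
    int value;

    assert(base != NULL);
    memcpy(&p, (const char *)base + off1, sizeof p);  /* first load */
    assert(p != NULL);
    memcpy(&value, p + off2, sizeof value);           /* dependent second load */
    return value;
}

/* Equivalent of o->in->i expressed through the modeled instruction. */
static int read_i(const struct outer *o)
{
    return load_load(o, offsetof(struct outer, in),
                     offsetof(struct inner, i));
}
```

A compiler would emit the fused form wherever it sees two back-to-back dependent loads like `o->in->i`.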
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
