Timothy Normand Miller wrote:
> On 9/3/07, Patrick McNamara <[EMAIL PROTECTED]> wrote:
>
>> Timothy Normand Miller wrote:
>>
>>> Well, that's not a bad idea. It's also worth pondering architectures
>>> that have 512 local registers, unifying the scratch space with the
>>> register file. But that may be too radical.
>>>
>> This is pretty common in microcontrollers. In the 8-bit AVR series,
>> there are 32 registers that are the low 32 bytes of the memory space.
>> Same goes for the 8051 series, though the 8051 also supports a
>> completely separate external memory space as well. Personally, I think
>> it makes perfect sense to not distinguish between the register set,
>> scratch memory, and I/O space. It reduces the number of instruction
>> types, reduces the amount of decode logic, etc.
>>
>> While we obviously want to make the controller as easily programmable
>> as possible, efficiency in execution and efficiency in implementation
>> take precedence in my mind.
>>
>
> Can you give a little more detail on what you're envisioning here?
> The advantage to a REG-REG architecture is that the instructions are
> simple and fixed-size. Mapping registers into the memory space has
> some interesting theoretical advantages, but now you need more logic
> to distinguish, and you lose some of the benefits of the way a RISC
> processor is pipelined.
>

First, my apologies, as I have only been skimming the posts related to the nano-controller. I haven't had the time available to get deeply involved in them, which I tend to do with things that really interest me. So forgive me if I am rehashing prior discussions.

As a starting point for having a single memory space for registers and RAM, take for example the ATtiny45. This controller has 32 general purpose registers, 64 I/O registers, and 256 bytes of RAM. The memory map looks something like this:

  0x0000-0x001F: general purpose registers
  0x0020-0x005F: I/O registers
  0x0060-0x015F: RAM

I won't go into the AVR instruction set, but I can access any location within the memory map with a single instruction type. The AVR ISA does still preserve register syntax in a number of different instruction mnemonics, and we could as well. Nothing says we couldn't map several mnemonics to a single instruction.

All instructions now consist of two source address fields and one destination address field. If we allow for immediates, they replace one or both of the source addresses.

To go with Petter's example, the high bit selects IO space or memory space. Assuming we allow for more memory space than we need for IO space, it would be quite OK to mirror the IO space. Say, for example, you have a 128 byte memory space but only need 32 bytes of IO space: you can ignore bits 5 and 6 in the IO address and effectively replicate the IO space four times.

I'm afraid I haven't been paying close enough attention lately to have a good feel for how big a scratch RAM space is needed.

My assumption in all this is that the controller does not have direct access to the card memory space. Card memory access would be done through IO ports, or we would have explicit instructions for card memory access, a la the MOVX instruction on the Intel 8051 series. All working memory for the controller would be in the controller core.

IIRC, a BRAM is 512x36, correct? Since the BRAM is dual ported, we are allowed two reads and two writes per cycle, assuming you read on one clock edge and write on the other. We could break the BRAM in two, using one half for memory/registers and the upper half as dedicated stack space.

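To make the IO mirroring idea above concrete, here is a rough sketch of the address decode in Python. The 8-bit operand address, the use of bit 7 as the IO select, and the names decode, IO_SELECT_BIT, IO_REGS and SPACE_SIZE are all my own assumptions for illustration; only the 128 byte space, the 32 IO bytes, and ignoring bits 5 and 6 come from the example.

```python
# Hypothetical parameters, chosen only to match the numbers in the example.
ADDR_BITS = 8        # assumed operand address width
IO_SELECT_BIT = 7    # assumed: high bit set selects IO space
SPACE_SIZE = 128     # addressable bytes in each space
IO_REGS = 32         # IO bytes actually implemented


def decode(addr):
    """Return (space, index) for an operand address."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    offset = addr & (SPACE_SIZE - 1)  # low 7 bits
    if addr & (1 << IO_SELECT_BIT):
        # Bits 5 and 6 are ignored, so the 32 IO bytes repeat four times.
        return ("io", offset & (IO_REGS - 1))
    return ("mem", offset)
```

With these assumptions, addresses 0x83, 0xA3, 0xC3 and 0xE3 all land on IO byte 3, while anything below 0x80 goes straight to memory. In hardware this costs nothing: the two ignored bits simply aren't wired to the IO decoder.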
Even if the BRAM only gives you one read and one write per cycle, appropriately designing the pipeline could work around this.

Something else I was thinking about relates to using the same controller core for both PCI and VGA duties. We effectively have to be able to context switch to do this, and we have to be able to do it quickly to meet PCI timing requirements. I don't know what our BRAM budget is right now, but could we have two sets of memory/registers and stack for the core? When we need to context switch, we switch BRAMs. You could actually expand this to as many BRAMs as you want to use.

To keep from having to flush the pipe, add a context pipeline that marches in step with the normal processor pipeline. For two contexts this is just an n-bit shift register (where n is the number of stages in the pipeline). The value of the bit at any given stage in the pipeline selects the target BRAM for that stage. More than two hardware contexts means expanding the width, of course.

Context switching does of course bring us back to the problem of the multiplier. If multiplying doesn't stall the pipe waiting for the answer, then we really don't want to context switch (or interrupt) in the middle of a multiply. This causes all sorts of problems, though, since we are effectively working in a realtime environment. If we need to go service a PCI transaction, we can't wait 10-20 cycles for a pending multiply to finish. This means that the multiplier output has to be context (or interrupt) aware. If the multiplier is context aware, then the answer could be written to a separate output as necessary.

Which brings me to a question that has been tickling the back of my head for a bit. Why aren't we using the multipliers embedded in the FPGA? I know there are limitations on how the BRAMs can be configured and still use the multipliers, but I couldn't find anything quickly in my archive of list messages.

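For what it's worth, the context pipeline I described above is simple enough to model in a few lines of Python. This is only a behavioral sketch, not a proposal for the actual RTL; the class name ContextPipeline and its methods are made up for illustration, and I'm assuming stage 0 is the stage a new instruction enters.

```python
class ContextPipeline:
    """Shift register carrying one context value per pipeline stage."""

    def __init__(self, stages):
        # bits[0] belongs to the first stage, bits[-1] to the last.
        self.bits = [0] * stages

    def clock(self, issue_context):
        """Advance one cycle; the new instruction enters with issue_context."""
        self.bits = [issue_context] + self.bits[:-1]

    def bram_select(self, stage):
        """Which BRAM the instruction currently in this stage should use."""
        return self.bits[stage]


# Four-stage pipe: two instructions from context 0, then a switch to context 1.
pipe = ContextPipeline(4)
for ctx in (0, 0, 1, 1):
    pipe.clock(ctx)
```

After those four clocks the two newest stages select BRAM 1 while the two oldest are still finishing against BRAM 0, so nothing has to be flushed. For more than two contexts the values in the list would just be wider than one bit.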
I suppose I need to go take a look at what Petter has been working on and then ask some more questions. :)

Patrick M
