It won't be long before we'll have to design a nanocontroller for OGD1 to manage VGA and DMA. I may be able to just go off and design one myself, but I think that many of you would fancy observing and participating in the design process, and with more brains on it, we'd do a better job.
You don't need chip design knowledge for this. You just have to understand logic, have some familiarity with assembly programming, and have a sense for the parallelism that goes on in a chip. It's as though you wrote a C program where every function in your program runs simultaneously with every other function in numerous threads.

How about we get started with a high-level overview. A pipelined RISC processor is broken up into stages. A stage does its work and passes its results on to the next stage while at the same time accepting new work from the preceding stage. In steady state, as long as a stream of work is available, all stages in the pipeline are doing useful work, with earlier stages working on later instructions. If you want a good textbook on this, look for "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. Here's how we'll break up our processor pipeline, deviating slightly from the MIPS template described in that book.

(1) Instruction fetch

Here, you have an instruction pointer that indicates the address of the next instruction to execute. In our processor, instructions are stored in a local static RAM inside of the FPGA, so there is no need for any sort of "cache miss" logic. Given an address, you are guaranteed to get an instruction immediately on the next cycle. Our instructions are 32 bits wide. (We could go to 36 bits if we find it helpful.)

(2) Instruction decode and register access

One of the main principles behind RISC processors is making instruction decode absolutely trivial. The instruction is broken up into fixed fields holding register numbers, and all instructions are structured this way. What that means for us is that we can take the source operand fields straight out of the instruction and use those as indexes into our register file, with no logic in between. We'll have 32 registers, so we need 5-bit fields.
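To make "trivial decode" concrete, here's a minimal behavioral sketch in Python. The field layout below (6-bit opcode at the top, then three 5-bit register fields) is borrowed from the MIPS R-type format as an illustrative assumption; our actual encoding isn't decided yet.

```python
def decode(instr: int):
    """Slice fixed fields out of a 32-bit instruction word.

    No lookup tables, no conditionals: the register numbers can feed
    the register file directly, which is what makes RISC decode trivial.
    """
    opcode = (instr >> 26) & 0x3F  # 6-bit opcode
    rs     = (instr >> 21) & 0x1F  # 5-bit source register 1
    rt     = (instr >> 16) & 0x1F  # 5-bit source register 2
    rd     = (instr >> 11) & 0x1F  # 5-bit destination register
    return opcode, rs, rt, rd

# Example: pack opcode 0x20 with rs=3, rt=7, rd=12, then decode it back.
word = (0x20 << 26) | (3 << 21) | (7 << 16) | (12 << 11)
print(decode(word))  # → (32, 3, 7, 12)
```

In hardware these "shifts and masks" are just wires, which is the whole point: stage (2) costs essentially nothing in logic.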
We need fields in the instruction for two source operands and one destination operand (the destination gets used in a later pipeline stage). This is also where we need to deal with branches. If the instruction is a branch, the condition needs to be resolved, and the target address needs to be fed back to stage (1). This is why RISC processors typically have a delayed branch. The possible branch conditions are reg-value = 0 and reg-value != 0.

(3) ALU

Here, the numbers fetched from registers in stage (2) are combined based on an opcode in the instruction. ALU operations include add, subtract, shift, multiply (using dedicated multiplier logic), and bitwise logical operations.

We may implement what's called result forwarding. Since the flow of data through the processor is completely deterministic, we can figure out which pipeline stage has an ALU result before the result has made it to the register file in stage (5). This way, you can use as a source operand in one instruction what was the target of the immediately preceding instruction.

The MIPS processor stays simple by not having any result flags. That is, in an x86 processor, math instructions yield carry, zero, negative, and overflow flags (among others). MIPS doesn't do that, because flags cause all sorts of challenging dependencies. You're better off using a few extra instructions and having a processor that's simpler and faster for everything else.

Comparisons are done in the ALU. The subtract instruction is used for equal/not-equal comparisons. In addition, we'll provide signed and unsigned less-than instructions. With these three instructions, you can get any of the usual comparisons that you want to make. The result of the comparison is dumped into a register, just like the result of any math operation, and used by the conditional branch instruction that compares it to zero. That means we "waste" a whole 32-bit register for what is really only a single-bit result, but that approach saves us logic in the long run.
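The flag-free comparison scheme above can be sketched in a few lines. The mnemonics here (SLT for signed less-than, BNEZ for branch-if-not-zero) are illustrative placeholders, not a committed ISA; the point is that the comparison writes an ordinary register and the branch unit only ever tests a register against zero.

```python
regs = [0] * 32  # architectural register file

def slt(rd, rs, rt):
    """Signed less-than: a full 32-bit register holds a one-bit answer."""
    regs[rd] = 1 if regs[rs] < regs[rt] else 0

def bnez(rs):
    """Branch-if-not-zero: the only condition the branch logic resolves."""
    return regs[rs] != 0

# Lowering "if (r1 < r2) goto L" with no condition flags:
regs[1], regs[2] = 5, 9
slt(3, 1, 2)      # r3 = (r1 < r2), i.e. r3 = 1
print(bnez(3))    # → True: branch taken
```

Equal/not-equal works the same way with subtract (the difference is zero exactly when the operands are equal), so two branch conditions plus three ALU compares cover all the usual relational tests.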
(4) Memory access and I/O

This is the stage where we take an address computed above and read or write our local memory. Our "local" memory is actually another 512-word block RAM that we'll use as scratch space. I believe the MIPS processor uses the ALU to add the contents of one register to a short immediate value stored in the instruction, and that sum is used as the address. We should do the same. That makes reg-value + offset the only memory addressing mode.

In addition, this is also the stage where we'll want to do other I/O-related operations, such as providing access to real graphics memory and controlling other aspects of the GPU that are accessible by this processor. We'll make that available as another 512-word space (or more or less as necessary) of read-only and write-only "memory locations". We'll treat graphics memory access as though we're controlling some other device. Writes involve dropping a pair of words (address, data) into a queue. Reads involve dropping a word (address) into a queue and then, some time later, popping the read data out of another queue. Those queues will show up as "memory addresses" to the CPU. In fact, the CPU will control quite a number of things by writing and reading queues.

(5) Register write-back

The register file read in stage (2) is actually double-pumped: it runs at double the clock rate of the rest of the processor. On the first half clock cycle, we perform writes. On the second half, we perform reads. In fact, you might say that stages (2) and (5) are really parts of the same stage.

--
Timothy Miller
http://www.cse.ohio-state.edu/~millerti
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
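Here's a behavioral sketch of the queue-based graphics-memory interface, just to pin down the protocol: from the CPU's point of view, a write is one store of (address, data) into a write queue, and a read is a store of the address into a request queue followed, some cycles later, by a load from a reply queue. The queue names and the dictionary standing in for real graphics memory are my assumptions for illustration.

```python
from collections import deque

write_q = deque()       # (address, data) pairs the CPU has stored
read_req_q = deque()    # addresses the CPU wants read
read_reply_q = deque()  # data waiting to be popped back by the CPU
vram = {}               # stand-in for real graphics memory

def memory_controller_tick():
    """Drain one pending operation of each kind per cycle,
    the way the device on the other side of the queues would."""
    if write_q:
        addr, data = write_q.popleft()
        vram[addr] = data
    if read_req_q:
        read_reply_q.append(vram.get(read_req_q.popleft(), 0))

# CPU side: write 0xCAFE to address 0x10, then read it back.
write_q.append((0x10, 0xCAFE))  # looks like a store to a "queue address"
read_req_q.append(0x10)         # looks like another store
memory_controller_tick()        # device works in the background...
memory_controller_tick()
print(hex(read_reply_q.popleft()))  # → 0xcafe
```

The nice property of this scheme is that the CPU never stalls on the device's latency: it only blocks if it pops a reply queue that's still empty, and everything else in the GPU can be controlled through the same store/load-to-a-queue idiom.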
