Hi folks. I would like to narrow and simplify the interface between the CPU and the decoder. I think I have a pretty decent working plan at the moment, and wanted to run this by everybody.
*Existing design*

Right now, three logical components are bound together into the ISA-specific decoder class which the CPU instantiates: the predecoder, the actual decode function, and the decode cache. See this document I sent a while ago for an overview of what those are:

https://docs.google.com/document/d/1quwxZOPb181jVWAh_7nX7E6uJM-7d9VDU9KxKkGmIrY/edit#heading=h.ri2i2g2awbtr

This metacomponent was originally intended to have an interface with the following methods:

void process()
    Run the decoder, interpreting data and building up instructions.

void reset()
    Restart the decoder, throwing away any in-progress state.

void moreBytes(const PCState &pc, Addr fetchPC, MachInst data)
    Feed data from memory into the decoder.

bool needMoreBytes()
    Check if the decoder actually needs more memory to keep processing.

bool instReady()
    Whether the decoder has produced an instruction which can be retrieved later.

void updateNPC()
    Fix up the PC based on the size of the current instruction.

StaticInstPtr decode(PCState &nextPC)
    Retrieve the next instruction, and advance the PC for the CPU since it won't necessarily know how (I think).

Note that this API will produce either a regular instruction or a macroop. Microops need to be extracted separately within the CPU, either from a macroop or from the microcode ROM (recently changed to be via a method on the decoder).

The original intention was for the predecoder to accept an in-order stream of bytes and produce a stream of StaticInsts. Since the rate in and the rate out could be different, possibly substantially different, it let the CPU know what was going on through the needMoreBytes and instReady methods.

One major problem with this approach is that control may diverge and the instruction stream may not be linear in memory. We may need to abandon a partially processed chunk of memory and restart somewhere else. This made needMoreBytes a bit redundant, particularly in the simple CPU, where a change in control flow could happen at any time.
It's simplest to just always give the decoder more bytes when it reaches an instruction boundary.

*New design*

Some important concepts/design assumptions in the new design are that:

1. If instructions come from memory, they are contiguous.
2. The decoder handles extracting microops. Whatever it returns should be executed directly.
3. The only reason the decoder will ever not be able to return an instruction is that it needs the next block of memory.

The interface will have this more limited set of methods:

void reset()
    Force the decoder to reset. This may not actually be necessary, but I imagine this being used after a fault, etc.

StaticInstPtr decode(PCState &pc)
    Return the next instruction to execute if the decoder already has all the bytes it needs (possibly none), or nullptr otherwise.

StaticInstPtr decode(PCState &pc, MachInst data)
    The same, but simultaneously feed in the next chunk of memory.

In the CPU's main decode loop, the last two methods would be used like this:

    StaticInstPtr si = decode(pc);
    while (!si)
        si = decode(pc, data);

This may be structured a little differently depending on whether a block like data is already available, whether the CPU has to stop and go get more data, etc.

*X86 in the new design*

Much of the x86 decoder would stay the same, except:

Every time we enter the decoder, we'd check if the bytes we've received so far still go with the current PC. If not, we'd throw out the buffer, go to the reset state, and start over with the new PC. Note that this would check the current architectural PC. The "next" PC would change for branches, and the micropc would change for microcode control flow, but only the current architectural PC matters when deciding if the current buffer of bytes is usable.

Once enough bytes are gathered, the instruction would be decoded with the decode method. If it's a macroop, then it will get set aside and the decoder will enter the microcode state.
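To make the "do the buffered bytes still go with the current PC" check a bit more concrete, here is a rough sketch. It is not gem5 code and everything in it is an assumption for illustration: ToyBufferedDecoder, ToyInst, and the fetchAddr helper are hypothetical names, Addr is a plain integer, the chunk is a byte vector rather than a MachInst, and the toy encoding simply stores each instruction's total length in its first byte. The caller is assumed to advance the PC itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Addr = std::uint64_t;               // stand-in for gem5's Addr
using Bytes = std::vector<std::uint8_t>;  // stand-in for a chunk of memory

// Toy instruction: only records where it started and how long it is.
struct ToyInst { Addr pc; std::size_t len; };

class ToyBufferedDecoder
{
    Addr base = 0;  // address of buf[0]
    Bytes buf;      // contiguous bytes starting at base

    // Make the buffer start at pc, keeping bytes only if they still apply.
    void sync(Addr pc)
    {
        if (pc >= base && pc - base <= buf.size()) {
            // Sequential case: drop bytes that belonged to earlier insts.
            buf.erase(buf.begin(),
                      buf.begin() + static_cast<std::ptrdiff_t>(pc - base));
        } else {
            // Control went somewhere else: throw out the buffer.
            buf.clear();
        }
        base = pc;
    }

    // Toy encoding: the first byte is the total length (assumed >= 1).
    std::optional<ToyInst> tryDecode() const
    {
        if (buf.empty() || buf.size() < buf[0])
            return std::nullopt;
        return ToyInst{base, buf[0]};
    }

  public:
    void reset() { buf.clear(); }

    // Hypothetical helper: address of the next byte the decoder wants.
    Addr fetchAddr(Addr pc) { sync(pc); return base + buf.size(); }

    // No new bytes: succeed only if the buffer already holds a whole inst.
    std::optional<ToyInst> decode(Addr pc) { sync(pc); return tryDecode(); }

    // Same, but append the next contiguous chunk first.
    std::optional<ToyInst> decode(Addr pc, const Bytes &chunk)
    {
        sync(pc);
        buf.insert(buf.end(), chunk.begin(), chunk.end());
        return tryDecode();
    }
};
```

One thing the sketch makes visible is that the CPU has to know which address to fetch from after the decoder silently discards its buffer. The fetchAddr helper is one possible way to expose that, but it is not part of the interface proposed above.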
While in the microcode state or the ROM state, when the final microop is reached the decoder will go back to the reset state. While in the microcode state, if it sees a ROM micropc it will switch to the ROM state.

*SPARC in the new design*

When called with no MachInst, if in the middle of a macroop, verify the arch PC and return the right microop, else nullptr.

When called with a MachInst, decode it. If it's a macroop, store it so microops can be pulled from it now and later, and return its first microop; otherwise return the op itself.

*RISCV in the new design*

Or really, anything without microops. I assume RISCV doesn't use them.

When called with no MachInst, return nullptr.

When called with a MachInst, decode it and return the result.

*So...*

So, it's my hope that this will be simpler for CPUs to implement, and will pull a lot of the machinery that ISAs may or may not need into their decoders themselves. That should give them better control, and get a lot of machinery out from underfoot for ISAs that don't need it.

My hope is that this isn't *too* simplistic a design, one which will jam up in certain situations or not be able to do something it would otherwise need to. If you can think of a scenario where this might trip over its own feet, fall over and explode, please let me know.

Gabe
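A rough sketch of the two simpler cases described above (an ISA without microops, and a SPARC-like ISA whose macroops are held inside the decoder) might look like the following. This is not gem5 code: ToyInst, toyDecodeInst, SimpleDecoder, and MacroopDecoder are hypothetical names, Addr and MachInst are plain integers, and the rule that odd instruction words decode to a two-microop macroop is made up purely so there is something to exercise.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

using Addr = std::uint64_t;      // stand-in for the architectural PC
using MachInst = std::uint32_t;  // stand-in for the ISA's MachInst

struct ToyInst;
using ToyInstPtr = std::shared_ptr<ToyInst>;

struct ToyInst
{
    std::string name;
    std::vector<ToyInstPtr> microops;  // empty unless this is a macroop
};

// Made-up decode function: odd words become macroops with two microops.
ToyInstPtr toyDecodeInst(MachInst data)
{
    auto inst = std::make_shared<ToyInst>();
    inst->name = "op" + std::to_string(data);
    if (data % 2 == 1) {
        for (int i = 0; i < 2; i++) {
            auto uop = std::make_shared<ToyInst>();
            uop->name = inst->name + ".u" + std::to_string(i);
            inst->microops.push_back(uop);
        }
    }
    return inst;
}

// ISA without microops: nothing is ever buffered between calls.
class SimpleDecoder
{
  public:
    void reset() {}
    ToyInstPtr decode(Addr) { return nullptr; }
    ToyInstPtr decode(Addr, MachInst data) { return toyDecodeInst(data); }
};

// SPARC-like ISA: a macroop is kept and handed out one microop at a time.
class MacroopDecoder
{
    ToyInstPtr macroop;   // macroop currently being expanded, if any
    Addr macroPC = 0;     // arch PC the macroop was decoded at
    std::size_t upc = 0;  // index of the next microop to return

  public:
    void reset() { macroop = nullptr; upc = 0; }

    ToyInstPtr decode(Addr pc)
    {
        // Not inside a macroop, or the arch PC moved: nothing to return.
        if (!macroop || pc != macroPC) {
            reset();
            return nullptr;
        }
        ToyInstPtr uop = macroop->microops[upc++];
        if (upc == macroop->microops.size())
            reset();  // that was the final microop
        return uop;
    }

    ToyInstPtr decode(Addr pc, MachInst data)
    {
        ToyInstPtr inst = toyDecodeInst(data);
        if (inst->microops.empty())
            return inst;  // plain instruction, return it directly
        macroop = inst;
        macroPC = pc;
        upc = 0;
        return decode(pc);  // first microop
    }
};
```

With either class, the CPU side stays the same two-step pattern: call decode(pc) first, and only fetch and call decode(pc, data) when that returns nullptr.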
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org