Hi folks. I would like to narrow and simplify the interface between the CPU
and the decoder. I think I have a pretty decent working plan at the moment,
and wanted to run this by everybody.

*Existing design*

Right now, three logical components are bound together into the
ISA-specific decoder class which the CPU instantiates: the predecoder, the
actual decode function, and the decode cache. See this document I sent a
while ago for an overview of what those are:

https://docs.google.com/document/d/1quwxZOPb181jVWAh_7nX7E6uJM-7d9VDU9KxKkGmIrY/edit#heading=h.ri2i2g2awbtr

This metacomponent was originally intended to have an interface with the
following methods:


void process()
Run the decoder, interpreting data and building up instructions.

void reset()
Restart the decoder, throwing away any in progress state.

void moreBytes(const PCState &pc, Addr fetchPC, MachInst data)
Feed data from memory into the decoder.

bool needMoreBytes()
Check if the decoder actually needs more memory to keep processing.

bool instReady()
Whether the decoder has produced an instruction which can be retrieved
later.

void updateNPC()
Fix up the PC based on the size of the current instruction.

StaticInstPtr decode(PCState &nextPC)
Retrieve the next instruction, and advance the PC for the CPU since it
won't necessarily know how (I think).


Note that this API will produce either a regular instruction, or a macroop.
Microops need to be extracted separately within the CPU, either from a
macroop or from the microcode ROM (recently changed to be via a method on
the decoder).

The original intention was for the predecoder to accept an in order stream
of bytes and produce a stream of StaticInsts. Since the rate in and out
could be different, possibly substantially different, it let the CPU know
what was going on with the needMoreBytes and instReady methods.
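
To make the rate mismatch concrete, here is a toy sketch of a CPU driving a decoder through the original methods. Everything in it is an assumption for illustration, not the real code: the types are stand-ins, MachInst is shrunk to one byte, and the encoding (first byte of an instruction holds its total length) is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in types, NOT the real gem5 definitions.
using Addr = std::uint64_t;
using MachInst = std::uint8_t;   // assumption: one byte per chunk
struct PCState { Addr pc = 0; };
struct StaticInst { std::size_t length = 0; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Hypothetical toy encoding: the first byte of an instruction is its
// total length in bytes (a zero is treated as length one).
class OldStyleDecoder
{
  public:
    void reset() { have = 0; want = 0; pending = false; }
    void moreBytes(const PCState &, Addr, MachInst data)
    {
        byte = data;
        pending = true;
    }
    bool needMoreBytes() const { return !pending && !instReady(); }
    bool instReady() const { return want != 0 && have == want; }
    void process()
    {
        if (!pending)
            return;
        pending = false;
        if (want == 0)
            want = byte ? byte : 1;   // first byte carries the length
        ++have;
    }
    StaticInstPtr decode(PCState &nextPC)
    {
        if (!instReady())
            return nullptr;
        auto si = std::make_shared<StaticInst>();
        si->length = want;
        nextPC.pc += want;            // advance the PC on the CPU's behalf
        have = want = 0;
        return si;
    }

  private:
    std::size_t have = 0, want = 0;
    MachInst byte = 0;
    bool pending = false;
};

// The CPU side: feed bytes only when asked, stop when an instruction is ready.
StaticInstPtr fetchOne(OldStyleDecoder &dec, PCState &pc,
                       const std::vector<MachInst> &mem, std::size_t &cursor)
{
    while (!dec.instReady()) {
        if (dec.needMoreBytes()) {
            if (cursor >= mem.size())
                return nullptr;       // ran out of memory to feed
            dec.moreBytes(pc, cursor, mem[cursor]);
            ++cursor;
        }
        dec.process();
    }
    return dec.decode(pc);
}
```

One instruction may take one byte or several, so the CPU has to poll needMoreBytes and instReady on every step to know which side is waiting on the other.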

One major problem with this approach is that control may diverge and the
instruction stream may not be linear in memory. We may need to abandon a
partially processed chunk of memory and restart somewhere else. This made
needMoreBytes a bit redundant, particularly in the simple CPU, where
control flow could happen at any time. It's simplest to just always give
the decoder more bytes when it reaches an instruction boundary.

*New design*

Some important concepts/design assumptions in the new design are that:

1. If instructions come from memory, they are contiguous.
2. The decoder handles extracting microops. Whatever it returns should be
executed directly.
3. The only reason the decoder will ever not be able to return an
instruction is that it needs the next block of memory.

The interface will have this more limited set of methods:

void reset()
Force the decoder to reset. This may not actually be necessary, but I
imagine this being used after a fault, etc.

StaticInstPtr decode(PCState &pc)
Return the next instruction to execute if the decoder already has every
byte it needs (or needs none), or nullptr otherwise.

StaticInstPtr decode(PCState &pc, MachInst data)
The same, but simultaneously feed in the next chunk of memory.

In the CPU's main decode loop, the last two methods would be used like
this:

StaticInstPtr si = decode(pc);
while (!si)
    si = decode(pc, data);

This may be structured a little differently depending on whether a block
like data is already available, whether the CPU has to stop and go get
more data, etc.
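
As a slightly fuller sketch of that loop, here is a self-contained version with a toy decoder behind it. The types are stand-ins rather than the real gem5 ones, and TwoChunkDecoder, nextInst, the memory vector and the cursor are all hypothetical names invented for the example. The toy decoder needs two MachInst chunks per instruction, so the loop body runs more than once.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in types, NOT the real gem5 definitions.
using Addr = std::uint64_t;
using MachInst = std::uint32_t;
struct PCState { Addr pc = 0; };
struct StaticInst { std::uint64_t bits = 0; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Hypothetical toy decoder: every instruction is two chunks long.
class TwoChunkDecoder
{
  public:
    void reset() { haveLow = false; }

    // No microops and no buffered whole instruction, so nothing to return.
    StaticInstPtr decode(PCState &) { return nullptr; }

    StaticInstPtr decode(PCState &, MachInst data)
    {
        if (!haveLow) {
            low = data;               // first half, still incomplete
            haveLow = true;
            return nullptr;
        }
        auto si = std::make_shared<StaticInst>();
        si->bits = (std::uint64_t(data) << 32) | low;
        haveLow = false;
        return si;
    }

  private:
    MachInst low = 0;
    bool haveLow = false;
};

// The CPU side: ask without data first, then feed chunks until one comes out.
StaticInstPtr nextInst(TwoChunkDecoder &dec, PCState &pc,
                       const std::vector<MachInst> &mem, std::size_t &cursor)
{
    StaticInstPtr si = dec.decode(pc);
    while (!si) {
        if (cursor >= mem.size())
            return nullptr;           // a real CPU would go fetch here
        si = dec.decode(pc, mem[cursor++]);
    }
    return si;
}
```

The point of the shape is that the CPU never asks the decoder what it wants. A nullptr return is the only signal, and it always means "give me the next block".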

*X86 in the new design*

Much of the x86 decoder would stay the same, except:

Every time we entered the decoder, we'd check if the bytes we've received
so far still go with the current PC. If not, we'd throw out the buffer, go
to the reset state, and start over with the new PC. Note that this would
check the current architectural PC. The "next" PC would change for
branches, and the micropc would change for microcode control flow, but only
the current architectural PC matters when deciding if the current buffer of
bytes is usable.

Once enough bytes are gathered, the instruction would be decoded with the
decode method. If it's a macroop, then it will get set aside and the
decoder will enter the microcode state.

While in the microcode state or the ROM state, when the final microop is
reached the decoder will go back to the reset state. While in the microcode
state, if it sees a ROM micropc it will switch to the ROM state.
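
The transitions above can be sketched as a small transition function. This is a simplification under stated assumptions: the State and Event names are invented here, every check is reduced to a boolean, the final-microop rule is taken to apply in the microcode and ROM states, and a PC mismatch is assumed to take priority over everything else.

```cpp
// Hypothetical names; the real x86 decoder has more states than this.
enum class State { Reset, Microcode, Rom };

// What the decoder observed on this entry, boiled down to flags.
struct Event
{
    bool pcMatchesBuffer = true;   // buffered bytes still go with the arch PC
    bool decodedMacroop = false;   // the decode step just produced a macroop
    bool lastMicroop = false;      // the microop being returned is the final one
    bool romMicroPc = false;       // the micropc points into the microcode ROM
};

State nextState(State s, const Event &e)
{
    // Assumption: a stale buffer always wins, whatever state we were in.
    if (!e.pcMatchesBuffer)
        return State::Reset;

    switch (s) {
      case State::Reset:
        // A macroop gets set aside and we start handing out its microops.
        return e.decodedMacroop ? State::Microcode : State::Reset;
      case State::Microcode:
        // Assumption: the final-microop check comes before the ROM check.
        if (e.lastMicroop)
            return State::Reset;
        return e.romMicroPc ? State::Rom : State::Microcode;
      case State::Rom:
        return e.lastMicroop ? State::Reset : State::Rom;
    }
    return State::Reset;
}
```

Only pcMatchesBuffer depends on the architectural PC. The next PC and the micropc can change freely without invalidating the buffer.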

*SPARC in the new design*

When called with no MachInst, if in the middle of a macroop, verify the
arch PC and return the right microop, else nullptr.
When called with a MachInst, decode it. If it's a macroop, store it so
microops can be pulled from it now and on later calls, and return the first
microop from it, else return the op itself.
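
A minimal sketch of those two rules, with stand-in types and a made-up decode function (toyDecode treats odd encodings as two-microop macroops, which has nothing to do with real SPARC). It also assumes the micropc in the PCState is advanced by whoever executes the microop, not by the decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Stand-in types, NOT the real gem5 definitions.
using Addr = std::uint64_t;
using MachInst = std::uint32_t;
struct PCState { Addr pc = 0; std::size_t upc = 0; };

struct StaticInst;
using StaticInstPtr = std::shared_ptr<StaticInst>;
struct StaticInst
{
    std::string name;
    std::vector<StaticInstPtr> microops;   // non-empty only for a macroop
};

// Hypothetical decode function: odd encodings become two-microop macroops.
StaticInstPtr toyDecode(MachInst data)
{
    auto si = std::make_shared<StaticInst>();
    si->name = "op" + std::to_string(data);
    if (data & 1) {
        for (int i = 0; i < 2; ++i) {
            auto uop = std::make_shared<StaticInst>();
            uop->name = si->name + ".u" + std::to_string(i);
            si->microops.push_back(uop);
        }
    }
    return si;
}

class MicroopDecoder
{
  public:
    void reset() { macroop.reset(); }

    // No new bytes: only useful if we're mid-macroop at the same arch PC.
    StaticInstPtr decode(PCState &pc)
    {
        if (!macroop || pc.pc != macroopPc ||
                pc.upc >= macroop->microops.size()) {
            macroop.reset();
            return nullptr;
        }
        return macroop->microops[pc.upc];
    }

    // New bytes: decode, and remember the macroop if that's what we got.
    StaticInstPtr decode(PCState &pc, MachInst data)
    {
        StaticInstPtr si = toyDecode(data);
        if (si->microops.empty())
            return si;                     // plain op, return it directly
        macroop = si;
        macroopPc = pc.pc;
        return macroop->microops[0];
    }

  private:
    StaticInstPtr macroop;
    Addr macroopPc = 0;
};
```

Either way, whatever comes back is something the CPU can execute directly. It never sees the macroop itself.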

*RISCV in the new design*

Or really, anything without microops. I assume RISCV doesn't use them.

When called with no MachInst, return nullptr.
When called with a MachInst, decode it and return the result.
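
In code, the no-microop case collapses to almost nothing. This is a sketch with stand-in types. SimpleDecoder and the "inst_N" naming are made up for the example, and it leaves the PC alone since the text above doesn't say who advances it.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Stand-in types, NOT the real gem5 definitions.
using Addr = std::uint64_t;
using MachInst = std::uint32_t;
struct PCState { Addr pc = 0; };
struct StaticInst { std::string name; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Hypothetical decoder for an ISA with no microops.
class SimpleDecoder
{
  public:
    void reset() {}   // no state to throw away

    // Without bytes there is never anything to hand back.
    StaticInstPtr decode(PCState &) { return nullptr; }

    // With bytes, every chunk is exactly one instruction.
    StaticInstPtr decode(PCState &, MachInst data)
    {
        auto si = std::make_shared<StaticInst>();
        si->name = "inst_" + std::to_string(data);
        return si;
    }
};
```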


*So...*

So, it's my hope that this will be simpler for CPUs to implement, and pull
a lot of the machinery that ISAs may or may not need into their decoders
themselves. That should give them better control, and get a lot of
machinery out from underfoot for ISAs that don't need it. My hope is that
this isn't *too* simplistic of a design which will jam up in certain
situations, or not be able to do something it would otherwise need to. If
you can think of a scenario where this might trip over its own feet, fall
over and explode, please let me know.

Gabe
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org