Hi everyone,

I'm super excited to see all of the activity around RISC-V vector
instructions right now. However, it looks like there are a few different
implementations being worked on, and it's a good idea to try to unify
around a single implementation and work together to get to a point where
everyone in the gem5 community can benefit from this support.

Before going any further, I want to give a huge thanks to everyone who has
been working on this and has contributed to the various implementations.
I'm not going to try to name people (I'm certain I missed some in the cc
line!), but I hope everyone knows that we appreciate their contributions
to the project!

Before diving into details of the code, if there's interest from the
community I can set up a meeting time for us all to get together on zoom to
chat about details and the best way to work together.

Looking at the code (
https://gem5-review.googlesource.com/c/public/gem5/+/59789) and the
documentation (
https://docs.google.com/document/d/1yUDPU9NvpKo1WM1WYfdx20_aXLnlHssUUsDYR4lu95Q/edit)
recently submitted, I think there are many great things about this
approach, and a couple of places that we should discuss potential ways to
improve it.

First, I think that using microcode is definitely the right way to enable
configurable VLEN and to get timing memory accesses to work. Because of
this, I believe that the code posted to gerrit is probably the best
starting point for collaboration. Happy to hear other opinions, though.

Note that the Rivos implementation on github (
https://github.com/rivosinc/gem5/tree/rivos/dev/joy/initial_RVV_support)
does not use microcoded instructions, so it only works in atomic mode.
However, I believe this implementation may have more instructions
implemented than the one on gerrit. Also, in this implementation VLEN is
a parameter of the ISA, which allows users to configure the system
dynamically (which is great!). We should try to find a way to merge these
two implementations.

Second, we should integrate the tests (
https://github.com/huxuan0307/riscv-vector-tests) into gem5-resources ASAP.
This is a fabulous contribution! Having tests for vector insts will enable
much faster development.

I would like to discuss one design decision in the gerrit code:
specifically, how vtype/vl are set in the decoder. Stalling the decoder
to get the correct vtype/vl when vset*vl* is executed doesn't fit well with
gem5's execution model, and it feels like a bit of a hack.

I have an alternative proposal that I would like to hear your thoughts on.
Instead of storing vtype/vl in the decoder, we could store it in the
PCState. Then, the vset*vl* instruction would look a lot like a control
instruction. At decode time, the next PC state could be set with some
values (maybe wrong values, just like the next pc after a branch may be
wrong) or if it is a vsetivli, then the next PC state would have the
correct values. Then, the subsequent instructions could access the PC state
to get the current vtype/vl.
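
To make the decode-time half of this concrete, here is a minimal sketch. It
is not gem5's actual PCState class or decoder interface: VecConfig,
VecPCState and nextStateAtDecode are hypothetical names used only for
illustration. When decode can compute the configuration (the vsetivli case),
the caller passes it in; otherwise the current values are carried forward as
a guess.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical vector configuration carried alongside the PC.
struct VecConfig {
    uint64_t vtype = 0;
    uint64_t vl = 0;
};

inline bool
operator==(const VecConfig &a, const VecConfig &b)
{
    return a.vtype == b.vtype && a.vl == b.vl;
}

// Hypothetical stand-in for a PC state; NOT gem5's real PCState class.
struct VecPCState {
    uint64_t pc = 0;
    VecConfig vec;
};

// Decode-time next state for a vset*vl* instruction.
// `known` holds the configuration when decode can compute it;
// if it is empty, the current values are reused as a (possibly wrong) guess.
inline VecPCState
nextStateAtDecode(const VecPCState &cur,
                  const std::optional<VecConfig> &known)
{
    VecPCState next;
    next.pc = cur.pc + 4;  // vset*vl* uses a 4-byte encoding
    next.vec = known ? *known : cur.vec;
    return next;
}
```

Subsequent vector instructions would then read vtype/vl from the state they
were decoded with, rather than from the decoder itself.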

In the execute stage of the vset*vl*, it would set the next pc state
correctly. The CPU models already check to see if the next PC is the same
in execute as it was "predicted" in the decode stage (i.e., was the branch
predicted correctly). We can leverage this to check to see if vtype/vl are
correct. If not, the CPU models will simply squash and re-execute starting
at the correct next pc (i.e., the next vector instruction will execute with
the correct vtype/vl after vset*vl* is executed). If we extend the branch
predictor to predict the vtype/vl and use the "last" values, this should be
correct a huge percentage of the time. Smarter methods could also be
employed.
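
The execute-time check with a "last value" predictor could look roughly like
the sketch below. Again, every name here (VecConfig, LastValueVecPredictor,
resolveVset) is an assumption for illustration and not part of gem5's branch
predictor or CPU models. The return value stands in for "the CPU model must
squash and refetch".

```cpp
#include <cstdint>

// Hypothetical vector configuration carried in the PC state.
struct VecConfig {
    uint64_t vtype = 0;
    uint64_t vl = 0;
};

// Hypothetical predictor that guesses the most recently resolved values.
class LastValueVecPredictor
{
  public:
    VecConfig predict() const { return last; }
    void update(const VecConfig &actual) { last = actual; }

  private:
    VecConfig last;
};

// Called when vset*vl* executes. Compares the values guessed at decode
// with the values actually computed, trains the predictor, and returns
// true if younger instructions were decoded with the wrong vtype/vl.
inline bool
resolveVset(LastValueVecPredictor &pred, const VecConfig &predicted,
            const VecConfig &actual)
{
    const bool mispredicted = predicted.vtype != actual.vtype ||
                              predicted.vl != actual.vl;
    pred.update(actual);
    return mispredicted;
}
```

In this sketch a loop that re-executes the same vset*vl* with the same
operands mispredicts only on the first iteration, and again only when the
resulting vl or vtype changes (for example on a tail iteration).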

While this may not be a particularly realistic way to implement a hardware
version of RVV and vset*vl*, I think that it's probably the best way to
model it in gem5 without creating a separate vector engine object which is
decoupled from the CPU model.

We have been working on a proof-of-concept for this here at UC Davis (see
https://github.com/darchr/gem5/tree/hn/rvv-uop, though this is untested in
timing mode right now). Do you all think this is a good way forward? Or, is
there something that I'm missing about the decoder stalling?

Cheers,
Jason