Hi everyone, I'm super excited to see all of the activity around RISC-V vector instructions right now. However, it looks like there are a few different implementations being worked on, and it's a good idea to try to unify around a single implementation and work together to get to a point where everyone in the gem5 community can benefit from this support.
Before going any further, I want to give a huge thanks to everyone who has been working on this and has contributed to the various implementations. I'm not going to try to name people (I'm certain I missed some in the cc line!), but I hope everyone knows that we appreciate their contributions to the project!

Before diving into details of the code: if there's interest from the community, I can set up a meeting time for us all to get together on zoom to chat about details and the best way to work together.

Looking at the code (https://gem5-review.googlesource.com/c/public/gem5/+/59789) and the documentation (https://docs.google.com/document/d/1yUDPU9NvpKo1WM1WYfdx20_aXLnlHssUUsDYR4lu95Q/edit) recently submitted, I think there are many great things about this approach, and a couple of places where we should discuss potential improvements.

First, I think that using microcode is definitely the right way to enable configurable VLEN and to get timing memory accesses to work. Because of this, I believe that the code posted to gerrit is probably the best starting point for collaboration. Happy to hear other opinions, though.

Note that the Rivos implementation on github (https://github.com/rivosinc/gem5/tree/rivos/dev/joy/initial_RVV_support) does not use microcoded instructions, so it only works in atomic mode. However, I believe this implementation may have more instructions implemented than the one on gerrit. Also, in this implementation the VLEN is a parameter of the ISA, which allows users to configure the system dynamically (which is great!). We should try to find a way to merge these two implementations.

Second, we should integrate the tests (https://github.com/huxuan0307/riscv-vector-tests) into gem5-resources ASAP. This is a fabulous contribution! Having tests for vector insts will enable much faster development.

I would like to discuss one design decision in the gerrit code: specifically, how the vtype/vl is set in the decoder.
Stalling the decoder to get the correct vtype/vl when vset*vl* is executed doesn't fit well with gem5's execution model, and it feels like a bit of a hack. I have an alternative proposal that I would like to hear your thoughts on.

Instead of storing vtype/vl in the decoder, we could store it in the PCState. Then, the vset*vl* instruction would look a lot like a control instruction. At decode time, the next PC state could be set with some values (maybe wrong values, just like the next pc after a branch may be wrong), or if it is a vsetivli, then the next PC state would have the correct values. The subsequent instructions could then access the PC state to get the current vtype/vl.

In the execute stage of the vset*vl*, it would set the next PC state correctly. The CPU models already check whether the next PC in execute is the same as the one "predicted" in the decode stage (i.e., was the branch predicted correctly). We can leverage this to check whether vtype/vl are correct. If not, the CPU models will simply squash and re-execute starting at the correct next pc (i.e., the next vector instruction will execute with the correct vtype/vl after vset*vl* is executed).

If we extend the branch predictor to predict the vtype/vl and use the "last" values, this should be correct a huge percentage of the time. Smarter methods could also be employed.

While this may not be a particularly realistic way to implement a hardware version of RVV and vset*vl*, I think that it's probably the best way to model it in gem5 without creating a separate vector engine object which is decoupled from the CPU model.

We have been working on a proof-of-concept for this here at UC Davis (see https://github.com/darchr/gem5/tree/hn/rvv-uop, though this is untested in timing mode right now).

Do you all think this is a good way forward? Or is there something that I'm missing about the decoder stalling?

Cheers,
Jason
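
P.S. To make the PCState idea above a bit more concrete, here is a rough Python sketch of the predict-then-verify flow. This is not gem5 code: the names (PCState as a dataclass, LastValuePredictor, execute_vsetvl, run_vsetvl) are hypothetical and only for illustration, the 4-byte instruction size is assumed, and vl = min(AVL, VLMAX) is just one simple choice for the vl computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PCState:
    """Toy stand-in for a PC state extended with vtype/vl (hypothetical)."""
    pc: int
    vtype: int = 0
    vl: int = 0


class LastValuePredictor:
    """Predicts that a vset*vl* at a given PC yields the same vtype/vl as last time."""

    def __init__(self):
        # PC of the vset*vl* -> (vtype, vl) it produced the last time it executed
        self.table = {}

    def predict(self, cur):
        # With no history for this PC, guess that vtype/vl do not change.
        vtype, vl = self.table.get(cur.pc, (cur.vtype, cur.vl))
        return PCState(pc=cur.pc + 4, vtype=vtype, vl=vl)

    def update(self, pc, actual):
        self.table[pc] = (actual.vtype, actual.vl)


def execute_vsetvl(cur, vtype, avl, vlmax):
    """Execute stage: compute the correct next PC state (simplified vl rule)."""
    return PCState(pc=cur.pc + 4, vtype=vtype, vl=min(avl, vlmax))


def run_vsetvl(cur, predictor, vtype, avl, vlmax):
    """Decode-time prediction followed by execute-time check.

    Returns (next_state, squashed). squashed is True when the predicted next
    PC state differs from the executed one, which is the same comparison a CPU
    model would make for a mispredicted branch.
    """
    predicted = predictor.predict(cur)
    actual = execute_vsetvl(cur, vtype, avl, vlmax)
    predictor.update(cur.pc, actual)
    return actual, predicted != actual
```

In this toy model, the first time a given vset*vl* runs it will usually "mispredict" and squash; if the same instruction runs again with the same operands (e.g., inside a loop), the last-value prediction matches and nothing is squashed.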