On 2024/8/14 17:01, Richard Henderson wrote:
On 8/13/24 21:34, LIU Zhiwei wrote:
@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                        TCGReg arg1, intptr_t arg2)
 {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
     tcg_out_ldst(s, insn, arg, arg1, arg2);
tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been
done in tcg_out_vec_op.
OK.
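To make the spill/fill point concrete, here is a minimal host-side C model (not QEMU code) of keeping the vtype check inside the load emitter itself. `VtypeState`, `ensure_vtype` and `emit_vec_ld` are hypothetical names, a counter stands in for actually encoding a vsetvli, and the model assumes the backend always knows which vtype it last programmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the vtype a backend last programmed.
 * None of these names exist in QEMU. */
typedef struct {
    bool known;          /* false at the start of a block: state unknown */
    unsigned sew_bits;   /* element width (SEW) last programmed */
    unsigned len_bytes;  /* vector length last programmed, in bytes */
    unsigned n_vsetvli;  /* number of vsetvli "emitted" so far */
} VtypeState;

/* Program vtype only if it does not already match; incrementing the
 * counter stands in for emitting a vsetvli instruction. */
static void ensure_vtype(VtypeState *s, unsigned sew_bits, unsigned len_bytes)
{
    if (s->known && s->sew_bits == sew_bits && s->len_bytes == len_bytes) {
        return;
    }
    s->known = true;
    s->sew_bits = sew_bits;
    s->len_bytes = len_bytes;
    s->n_vsetvli++;
}

/* Vector load emitter. Register allocation spills and fills reach this
 * directly, so it establishes vtype itself instead of relying on
 * whatever the last vector op left behind. */
static void emit_vec_ld(VtypeState *s, unsigned sew_bits, unsigned len_bytes)
{
    assert(sew_bits == 8 || sew_bits == 16 || sew_bits == 32 || sew_bits == 64);
    ensure_vtype(s, sew_bits, len_bytes);
    /* ... a vle<sew>.v would be emitted here ... */
}
```

For example, two back-to-back fills of 16-byte registers with 64-bit elements need only one vsetvli, while a following 32-byte fill needs a second one.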
That said, with a little-endian host, the selected element size
doesn't matter *too* much. A write of 8 uint16_t or a write of 2
uint64_t produces the same bits in memory.
Therefore you can examine prev_vtype and adjust only if the vector
length changes.
OK.
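The little-endian argument can be checked with a small host-side C program (an illustration, not backend code). It stores the same 16-byte register image element by element with native stores, once as eight 16-bit elements and once as two 64-bit elements; on a little-endian host both give identical bytes. That is what lets a relaxed check compare only the vector length in bytes and keep whatever SEW is already programmed. `store_elems`, `host_is_little_endian` and `pick_sew` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Read element idx of width w bytes from a 16-byte register image whose
 * element 0 sits in the lowest bytes, least significant byte first. */
static uint64_t get_elem(const uint8_t reg[16], unsigned w, unsigned idx)
{
    uint64_t v = 0;
    for (unsigned k = 0; k < w; k++) {
        v |= (uint64_t)reg[idx * w + k] << (8 * k);
    }
    return v;
}

/* Store every element with a native (host-endian) store of width w,
 * as an element-wise vector store with SEW = 8 * w would. */
static void store_elems(uint8_t out[16], const uint8_t reg[16], unsigned w)
{
    for (unsigned i = 0; i < 16 / w; i++) {
        uint64_t v = get_elem(reg, w, i);
        uint8_t v8 = (uint8_t)v;
        uint16_t v16 = (uint16_t)v;
        uint32_t v32 = (uint32_t)v;
        switch (w) {
        case 1: memcpy(out + i, &v8, 1); break;
        case 2: memcpy(out + i * 2, &v16, 2); break;
        case 4: memcpy(out + i * 4, &v32, 4); break;
        case 8: memcpy(out + i * 8, &v, 8); break;
        default: assert(0);
        }
    }
}

static bool host_is_little_endian(void)
{
    uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Relaxed rule (little-endian host only): if the programmed length in
 * bytes already matches, keep the current SEW and use the vle/vse of
 * that width; otherwise a vsetvli is needed with the preferred SEW. */
static unsigned pick_sew(unsigned cur_sew, unsigned cur_len_bytes,
                         unsigned want_len_bytes, unsigned preferred_sew,
                         bool *need_vsetvli)
{
    *need_vsetvli = (cur_len_bytes != want_len_bytes);
    return *need_vsetvli ? preferred_sew : cur_sew;
}
```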
But we do that -- e.g. load V256, store V256, store V128 to perform
a 384-bit store for AArch64 SVE when VQ=3.
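The VQ=3 case is a greedy split of the operation size into the largest vector chunks: 48 bytes becomes one 32-byte (V256) store followed by one 16-byte (V128) store, and for a broadcast both can come from the same register. The simplified model below shows only the split (`split_store` is a hypothetical name, and real expansion must also check which vector types the backend supports). The second, shorter store is where a RISC-V backend would have to change vl.

```c
#include <assert.h>
#include <stddef.h>

/* Split an operation of oprsz bytes into the largest available vector
 * chunks (32, 16, then 8 bytes). Chunk sizes are written to out; the
 * return value is how many chunks were produced. Assumes oprsz is a
 * multiple of 8. */
static size_t split_store(unsigned oprsz, unsigned out[], size_t max_out)
{
    static const unsigned sizes[] = { 32, 16, 8 };
    size_t n = 0;
    unsigned off = 0;

    assert(oprsz % 8 == 0);
    for (size_t k = 0; k < sizeof(sizes) / sizeof(sizes[0]); k++) {
        while (off + sizes[k] <= oprsz) {
            assert(n < max_out);
            out[n++] = sizes[k];
            off += sizes[k];
        }
    }
    return n;
}
```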
Is there an advantage to using the vector load/store whole register
insns, if the requested length is not too small?
For vector types equal to or larger than the host VLEN, we will use the
whole-register instructions.
IIRC the NF field can be used to store multiples, but we can't store
half of a register with these.
For the shorter types, I think we can still use the unit-stride instructions.
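The selection discussed above can be sketched as a small decision function, assuming `vlenb` is the host vector register size in bytes (VLEN/8). RVV whole-register loads and stores (`vl1re8.v`, `vs1r.v` and their 2-, 4- and 8-register forms) transfer complete registers independently of vl, so they fit only when the length is an exact multiple of `vlenb` with a register count of 1, 2, 4 or 8. Anything shorter, such as half a register, falls back to unit-stride `vle`/`vse` with vl programmed. `LdstKind` and `choose_ldst` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical classification of a vector load/store of len_bytes. */
typedef enum { LDST_WHOLE_REG, LDST_UNIT_STRIDE } LdstKind;

/* vlenb: host vector register size in bytes (VLEN / 8).
 * *nregs receives the register count for the whole-register form,
 * or 0 when the unit-stride form must be used. */
static LdstKind choose_ldst(unsigned len_bytes, unsigned vlenb, unsigned *nregs)
{
    assert(vlenb != 0 && len_bytes != 0);
    if (len_bytes % vlenb == 0) {
        unsigned n = len_bytes / vlenb;
        /* Whole-register forms exist only for 1, 2, 4 and 8 registers. */
        if (n == 1 || n == 2 || n == 4 || n == 8) {
            *nregs = n;
            return LDST_WHOLE_REG;
        }
    }
    /* Part of a register, or a count with no single whole-register
     * instruction: use vle/vse with vl set to the requested length. */
    *nregs = 0;
    return LDST_UNIT_STRIDE;
}
```

With VLEN=128, for example, V128 and V256 map to one and two whole registers while V64 stays on the unit-stride path; with VLEN=256, V128 is half a register and also stays there.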
Thanks,
Zhiwei
r~