
LIU Zhiwei, Sun, 18 Aug 2024 18:42:31 -0700


On 2024/8/14 17:01, Richard Henderson wrote:

On 8/13/24 21:34, LIU Zhiwei wrote:

@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,

  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                         TCGReg arg1, intptr_t arg2)
  {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
      tcg_out_ldst(s, insn, arg, arg1, arg2);


tcg_out_ld/st are called directly from register allocation spill/fill.

You'll need to set vtype here, and cannot rely on this having been done in tcg_out_vec_op.
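A minimal sketch of that requirement, assuming a hypothetical per-function cache of the last emitted vector configuration (`VtypeState` and `ensure_vtype` are illustrative names, not part of the patch): the ld/st path checks whether its own SEW and vl are already in effect and emits a vsetivli only if they are not, instead of trusting whatever tcg_out_vec_op left behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the last vtype/vl the backend emitted. */
typedef struct {
    bool valid;        /* false when the configuration is unknown */
    unsigned sew_log2; /* element width: 0 = e8, 1 = e16, 2 = e32, 3 = e64 */
    unsigned vl;       /* number of elements */
} VtypeState;

/*
 * Returns true if a vsetivli must be emitted before accessing 'bytes'
 * bytes with element width (1 << sew_log2) bytes, and records the new
 * configuration.  Returns false if the cached configuration already fits.
 */
static bool ensure_vtype(VtypeState *st, unsigned bytes, unsigned sew_log2)
{
    assert(sew_log2 <= 3);
    unsigned vl = bytes >> sew_log2;

    if (st->valid && st->sew_log2 == sew_log2 && st->vl == vl) {
        return false;          /* already configured: emit nothing */
    }
    st->valid = true;
    st->sew_log2 = sew_log2;
    st->vl = vl;
    return true;               /* caller would emit vsetivli here */
}
```

In this sketch, clearing `valid` wherever the cached configuration can no longer be trusted is left to the caller.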

OK.

That said, with a little-endian host, the selected element size doesn't matter *too* much. A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.
Therefore you can examine prev_vtype and adjust only if the vector length changes.
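To illustrate why the element size is irrelevant, the following sketch models a 128-bit register as two uint64_t halves and performs a unit-stride store with a chosen element size. It assumes the RVV register layout in which element i occupies register bits [SEW*i, SEW*(i+1)), and little-endian memory. `store_elements` and `stores_match` are hypothetical helpers that only model the resulting byte image, not the instructions themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract element i of size 'esize' bytes from a 128-bit register (lo, hi). */
static uint64_t get_elem(uint64_t lo, uint64_t hi, unsigned esize, unsigned i)
{
    unsigned off = 8 * esize * i;          /* bit offset within the register */
    uint64_t word = off < 64 ? lo : hi;    /* elements never straddle halves */
    uint64_t v = word >> (off % 64);

    if (esize < 8) {
        v &= (UINT64_C(1) << (8 * esize)) - 1;
    }
    return v;
}

/* Store all 16/esize elements, each written little-endian at esize*i. */
static void store_elements(uint8_t mem[16], uint64_t lo, uint64_t hi,
                           unsigned esize)
{
    assert(esize == 1 || esize == 2 || esize == 4 || esize == 8);
    for (unsigned i = 0; i < 16 / esize; i++) {
        uint64_t v = get_elem(lo, hi, esize, i);
        for (unsigned b = 0; b < esize; b++) {
            mem[esize * i + b] = (uint8_t)(v >> (8 * b));
        }
    }
}

/* True if storing with element sizes e1 and e2 yields identical bytes. */
static int stores_match(uint64_t lo, uint64_t hi, unsigned e1, unsigned e2)
{
    uint8_t a[16], b[16];

    store_elements(a, lo, hi, e1);
    store_elements(b, lo, hi, e2);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

For any register contents, `stores_match(lo, hi, 2, 8)` holds: the byte at memory offset k always comes from register bits [8k, 8k+8), whatever SEW was selected, so only the total length needs to match.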

OK.

But we do that -- e.g. load V256, store V256, store V128 to perform a 384-bit store for AArch64 SVE when VQ=3.
Is there an advantage to using the vector load/store whole register insns, if the requested length is not too small?
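The 384-bit case is simple arithmetic: 48 bytes = 32 (V256) + 16 (V128), so consecutive accesses within one operation use different vector lengths, and a cached vtype does go stale. The sketch below, a simplified model rather than TCG's actual gvec expansion code, splits a byte length into descending V256/V128/V64 pieces; `split_pieces` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split an access of 'bytes' bytes into descending power-of-two pieces
 * (32 = V256, 16 = V128, 8 = V64).  Writes the piece sizes to 'out'
 * and returns how many were produced.
 */
static size_t split_pieces(size_t bytes, size_t out[], size_t max_out)
{
    size_t n = 0;

    for (size_t piece = 32; piece >= 8 && bytes != 0; piece /= 2) {
        /* 'while' lets lengths above 32 bytes use several V256 pieces. */
        while (bytes >= piece) {
            assert(n < max_out);
            out[n++] = piece;
            bytes -= piece;
        }
    }
    assert(bytes == 0);   /* this model requires a multiple of 8 bytes */
    return n;
}
```

For VQ=3 (48 bytes) this yields the pieces 32 and 16, matching the V256 + V128 sequence described above.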

For vector types equal to or larger than the host vlen, we will use the whole register instructions.

IIRC the NF field can be used to store multiples, but we can't store half of a register with these.


I think we can still use the unit-stride instructions for them.
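The policy in this exchange can be sketched as a selection function. This is a hypothetical helper (`choose_ldst` and `LdstChoice` are illustrative names), assuming vlenb = VLEN/8 is known at translation time and ignoring the register-number alignment that multi-register groups require. Whole-register forms are chosen when the TCG type covers exactly 1, 2, 4 or 8 host registers, since those are the only NFIELDS values the whole-register instructions encode; anything else, including half a register, falls back to unit-stride vle/vse with vl set.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool whole_reg;   /* true: vl<n>r.v / vs<n>r.v */
    unsigned nregs;   /* 1, 2, 4 or 8 when whole_reg is true, else 0 */
} LdstChoice;

/* Choose a load/store form for 'bytes' bytes on a host with 'vlenb'-byte registers. */
static LdstChoice choose_ldst(unsigned bytes, unsigned vlenb)
{
    LdstChoice c = { false, 0 };

    assert(vlenb != 0);
    if (bytes >= vlenb && bytes % vlenb == 0) {
        unsigned n = bytes / vlenb;
        /* Whole-register loads/stores only encode NFIELDS = 1, 2, 4, 8. */
        if (n == 1 || n == 2 || n == 4 || n == 8) {
            c.whole_reg = true;
            c.nregs = n;
        }
    }
    /* Otherwise use unit-stride vle<eew>.v / vse<eew>.v with vl set. */
    return c;
}
```

One attraction of the whole-register path is that those instructions use an effective length of NFIELDS*VLEN/EEW regardless of the current vtype and vl, so no vsetvli is needed before them.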

Thanks,
Zhiwei

r~

Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store
