On 8/13/24 21:34, LIU Zhiwei wrote:
+    if (cpuinfo & CPUINFO_ZVE64X) {
+        /* Get vlenb for the vector extension */
+        riscv_get_vlenb();
+        tcg_debug_assert(riscv_vlen >= 64 && is_power_of_2(riscv_vlen));
+
+        if (riscv_vlen >= 256) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+        } else if (riscv_vlen == 128) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
+        } else if (riscv_vlen == 64) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
+        } else {
+            g_assert_not_reached();
+        }
+    }

I think this is over-complicated, and it is perhaps the reason for patch 3.

What I believe you're missing with patch 3 is that when you write a register group under a larger LMUL, the adjacent vector registers are clobbered, and the TCG register allocator does not expect that. For example, an LMUL=2 group based at v2 also covers v3, so writing a V128 value to that group destroys a V64 value the allocator believes is still live in v3. This will result in incorrect register allocation.

You need to pick one size at startup, and expose *only* those registers.
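
Something along these lines, reusing the masks from your patch (a sketch only; the helper name is invented, and I'm assuming riscv_get_vlenb() has already populated riscv_vlen):

static void init_vector_available_regs(void)    /* hypothetical name */
{
    TCGRegSet avail;

    /* Pick the one grouping in which the largest type, V256, fits. */
    if (riscv_vlen >= 256) {
        avail = ALL_VECTOR_REGS;           /* LMUL=1: all 32 registers */
    } else if (riscv_vlen == 128) {
        avail = ALL_DVECTOR_REG_GROUPS;    /* LMUL=2: 16 group bases */
    } else {
        avail = ALL_QVECTOR_REG_GROUPS;    /* LMUL=4: 8 group bases */
    }

    /* The same set for every vector type, so no two exposed
       registers ever overlap. */
    tcg_target_available_regs[TCG_TYPE_V64] = avail;
    tcg_target_available_regs[TCG_TYPE_V128] = avail;
    tcg_target_available_regs[TCG_TYPE_V256] = avail;
}

With that, every exposed register number is a valid group base for the largest type, and the allocator never sees two names for the same host register.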

This won't affect code generation much, because we never have heavy vector register pressure. Most values go dead at the end of every guest instruction, so having only 8 or 16 visible host registers instead of 32 isn't a big deal.


r~
