On 2024/8/14 17:11, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> @@ -641,6 +645,13 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>>      case TCG_TYPE_I64:
>>          tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
>>          break;
>> +    case TCG_TYPE_V64:
>> +    case TCG_TYPE_V128:
>> +    case TCG_TYPE_V256:
>> +        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
>> +        tcg_target_set_vec_config(s, type, prev_vece);
>> +        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
>
> I suggest these asserts be in tcg_out_opc_*.
> That way you don't need to replicate them across all uses.

OK.
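
For concreteness, a minimal sketch of what that could look like. The
parameter order is inferred from the vmv.v.v call in the quoted hunk,
and encode_vv() stands in for whatever instruction-packing helper the
series actually uses:

static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc,
                           TCGReg vd, TCGReg vs2, TCGReg vs1, bool vm)
{
    /*
     * All operands must be vector registers.  Note >= rather than >:
     * the encoder must accept V0 itself, since e.g. the vmv.v.v call
     * above passes TCG_REG_V0 in the vs2 field.
     */
    tcg_debug_assert(vd >= TCG_REG_V0 && vs2 >= TCG_REG_V0 &&
                     vs1 >= TCG_REG_V0);
    tcg_out32(s, encode_vv(opc, vd, vs2, vs1, vm));
}

Per-call asserts such as the one in tcg_out_mov() above can then be
dropped.
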
>> +static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
>> +                                   TCGReg dst, TCGReg src)
>
> Oh, please drop all of the inline markup, from all patches.
> Let the compiler decide.

OK.
>> +static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
>> +                                    TCGReg dst, TCGReg base, intptr_t offset)
>> +{
>> +    tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
>> +    return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
>> +}
>
> Is this really better than using a strided load with rs2 = r0?

It depends on the microarchitecture. On our test board, it is.
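
For readers following along: the alternative Richard is pointing at is
a zero-strided vector load, vlse<eew>.v vd, (rs1), rs2 with rs2 = x0,
which reads the same address into every element, i.e. a broadcast
straight from memory. A rough sketch under that assumption; the
OPC_VLSE* names and the tcg_out_opc_vls() encoder are placeholders,
not code from this series:

static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg base, intptr_t offset)
{
    /* Placeholder opcode names, indexed by element size (8..64 bits). */
    static const RISCVInsn vlse_insn[4] = {
        OPC_VLSE8_V, OPC_VLSE16_V, OPC_VLSE32_V, OPC_VLSE64_V
    };

    if (offset != 0) {
        tcg_out_addi(s, TCG_TYPE_PTR, TCG_REG_TMP0, base, offset);
        base = TCG_REG_TMP0;
    }
    tcg_target_set_vec_config(s, type, vece);
    /* rs2 = x0 selects a zero stride: all elements read one address. */
    tcg_out_opc_vls(s, vlse_insn[vece], dst, base, TCG_REG_ZERO);
    return true;
}

Whether the single vlse beats the scalar load plus dup is
implementation dependent: cores that sequence strided loads element by
element can make the stride-0 form slower, which would match the
measurement above.
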
Thanks,
Zhiwei

> r~