Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-24 Thread Richard Henderson
On 3/24/20 3:51 AM, LIU Zhiwei wrote:
>> (3) It would be handy to have TCGv cpu_vl.
> Do you mean I should define cpu_vl as a global TCG variable like cpu_pc,
> so that I can check vl == 0 at translation time?

Yes.
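
For illustration only, a minimal sketch of what that could look like, mirroring
how cpu_pc is created in riscv_translate_init() (this assumes a file-scope
"static TCGv cpu_vl;" next to the other globals):

    /* in riscv_translate_init(), next to the cpu_pc setup */
    cpu_vl = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, vl), "vl");

With that in place, a translation-time check such as
tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over) becomes possible.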

>> vslide1up.vx:
>>  Ho hum, I forgot about masking.  Some options:
>>  (1) Call a helper just as you did in your original patch.
>>  (2) Call a helper only for !vm, for vm as below.
> 
> Sorry, I don't get why I need a helper for !vm.
> I think I can call vslideup w/1 whether !vm or vm, then do a store to vd[0].

That's right.  I didn't mean a helper specific to vslide1up, but any helper.


r~



Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-24 Thread LIU Zhiwei




On 2020/3/17 1:42, Richard Henderson wrote:

On 3/16/20 1:04 AM, LIU Zhiwei wrote:

As a preference, I think you can do away with this helper.
Simply use the slideup helper with argument 1, and then
afterwards store the integer register into element 0.  You should be able to
re-use code from vmv.s.x for that.

When I try it, I find it is somewhat difficult, because vmv.s.x will clear
the elements (0 < index < VLEN/SEW).

Well, two things about that:

(1) The 0.8 version of vmv.s.x does *not* zero the other elements, so we'll
want to be prepared for that.

(2) We have 8 insns that, in the end, come down to a direct element access,
possibly with some other processing.

So we'll want basic helper functions that can locate an element by immediate
offset and by variable offset:

/* Compute the offset of vreg[idx] relative to cpu_env.
   The index must be in range of VLMAX. */
int vec_element_ofsi(int vreg, int idx, int sew);

/* Compute a pointer to vreg[idx].
   If need_bound is true, mask idx into VLMAX,
   Otherwise we know a-priori that idx is already in bounds. */
void vec_element_ofsx(DisasContext *s, TCGv_ptr base,
                      TCGv idx, int sew, bool need_bound);

/* Load idx >= VLMAX ? 0 : vreg[idx] */
void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                       int vreg, int idx, int sew);
void vec_element_loadx(DisasContext *s, TCGv_i64 val,
                       int vreg, TCGv idx, int sew);

/* Store vreg[imm] = val.
   The index must be in range of VLMAX.  */
void vec_element_storei(DisasContext *s, int vreg, int imm,
                        TCGv_i64 val);
void vec_element_storex(DisasContext *s, int vreg,
                        TCGv idx, TCGv_i64 val);

(3) It would be handy to have TCGv cpu_vl.

Do you mean I should define cpu_vl as a global TCG variable like cpu_pc,
so that I can check vl == 0 at translation time?

Or just a temp variable?


Then:

vext.x.v:
 If rs1 == 0,
 Use vec_element_loadi(s, x[rd], vs2, 0, s->sew).
 else
 Use vec_element_loadx(s, x[rd], vs2, x[rs1], true).

vmv.s.x:
 over = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
 For 0.7.1:
 Use tcg_gen_dup8i to zero all VLMAX elements of vd.
 If rs1 == 0, goto done.
 Use vec_element_storei(s, vs2, 0, x[rs1]).
  done:
 gen_set_label(over);

vfmv.f.s:
 Use vec_element_loadi(s, f[rd], vs2, 0, s->sew).
 NaN-box f[rd] as necessary for SEW.

vfmv.s.f:
 tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
 For 0.7.1:
 Use tcg_gen_dup8i to zero all VLMAX elements of vd.
 Let tmp = f[rs1], nan-boxed as necessary for SEW.
 Use vec_element_storei(s, vs2, 0, tmp).
 gen_set_label(over);

vslide1up.vx:
 Ho hum, I forgot about masking.  Some options:
 (1) Call a helper just as you did in your original patch.
 (2) Call a helper only for !vm, for vm as below.


Sorry, I don't get why I need a helper for !vm.
I think I can call vslideup w/1 whether !vm or vm, then do a store to vd[0].

Zhiwei

 (3) Call vslideup w/1.
 tcg_gen_brcondi(TCG_COND_EQ, cpu_vl, 0, over);
 If !vm,
 // inline test for v0[0]
 vec_element_loadi(s, tmp, 0, 0, MO_8);
 tcg_gen_andi_i64(tmp, tmp, 1);
 tcg_gen_brcondi(TCG_COND_EQ, tmp, 0, over);
 Use vec_element_store(s, vd, 0, x[rs1]).
 gen_set_label(over);

vslide1down.vx:
 For !vm, this is complicated enough for a helper.
 If using option 3 for vslide1up, then the store becomes:
 tcg_gen_subi_tl(tmp, cpu_vl, 1);
 vec_element_storex(s, base, tmp, x[rs1]);

vrgather.vx:
 If !vm or !vl_eq_vlmax, use helper.
 vec_element_loadx(s, tmp, vs2, x[rs1]);
 Use tcg_gen_gvec_dup_i64 to store tmp to vd.

vrgather.vi:
 If !vm or !vl_eq_vlmax, use helper.
 If imm >= vlmax,
 Use tcg_gen_dup8i to zero vd;
 else,
 ofs = vec_element_ofsi(s, vs2, imm, s->sew);
 tcg_gen_gvec_dup_mem(sew, vreg_ofs(vd),
  ofs, vlmax, vlmax);


r~





Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-16 Thread Richard Henderson
On 3/16/20 1:04 AM, LIU Zhiwei wrote:
>> As a preference, I think you can do away with this helper.
>> Simply use the slideup helper with argument 1, and then
>> afterwards store the integer register into element 0.  You should be able to
>> re-use code from vmv.s.x for that.
> When I try it, I find it is somewhat difficult, because vmv.s.x will clear
> the elements (0 < index < VLEN/SEW).

Well, two things about that:

(1) The 0.8 version of vmv.s.x does *not* zero the other elements, so we'll
want to be prepared for that.

(2) We have 8 insns that, in the end, come down to a direct element access,
possibly with some other processing.

So we'll want basic helper functions that can locate an element by immediate
offset and by variable offset:

/* Compute the offset of vreg[idx] relative to cpu_env.
   The index must be in range of VLMAX. */
int vec_element_ofsi(int vreg, int idx, int sew);

/* Compute a pointer to vreg[idx].
   If need_bound is true, mask idx into VLMAX,
   Otherwise we know a-priori that idx is already in bounds. */
void vec_element_ofsx(DisasContext *s, TCGv_ptr base,
  TCGv idx, int sew, bool need_bound);

/* Load idx >= VLMAX ? 0 : vreg[idx] */
void vec_element_loadi(DisasContext *s, TCGv_i64 val,
   int vreg, int idx, int sew);
void vec_element_loadx(DisasContext *s, TCGv_i64 val,
   int vreg, TCGv idx, int sew);

/* Store vreg[imm] = val.
   The index must be in range of VLMAX.  */
void vec_element_storei(DisasContext *s, int vreg, int imm,
TCGv_i64 val);
void vec_element_storex(DisasContext *s, int vreg,
TCGv idx, TCGv_i64 val);
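
(For concreteness, a sketch of how the immediate-offset load could be built on
plain TCG loads.  It assumes the existing vreg_ofs() helper from
trans_rvv.inc.c and a little-endian host element layout; a big-endian host
needs the usual index adjustment, and the idx >= VLMAX case is left to the
caller here.)

static void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                              int vreg, int idx, int sew)
{
    /* byte offset of vreg[idx] relative to cpu_env */
    int ofs = vreg_ofs(s, vreg) + (idx << sew);

    switch (sew) {
    case MO_8:
        tcg_gen_ld8u_i64(val, cpu_env, ofs);
        break;
    case MO_16:
        tcg_gen_ld16u_i64(val, cpu_env, ofs);
        break;
    case MO_32:
        tcg_gen_ld32u_i64(val, cpu_env, ofs);
        break;
    default:
        tcg_gen_ld_i64(val, cpu_env, ofs);
        break;
    }
}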

(3) It would be handy to have TCGv cpu_vl.

Then:

vext.x.v:
If rs1 == 0,
Use vec_element_loadi(s, x[rd], vs2, 0, s->sew).
else
Use vec_element_loadx(s, x[rd], vs2, x[rs1], true).

vmv.s.x:
over = gen_new_label();
tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
For 0.7.1:
Use tcg_gen_dup8i to zero all VLMAX elements of vd.
If rs1 == 0, goto done.
Use vec_element_storei(s, vs2, 0, x[rs1]).
 done:
gen_set_label(over);
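
(Spelled out in TCG, that sequence might look roughly like the sketch below.
It assumes the cpu_vl global and the vec_element_storei() helper from above,
uses the gvec dup-immediate expander for the 0.7.1 clearing step, and writes
the register byte size as s->vlen / 8, i.e. LMUL=1.  A sketch, not the final
code.)

    TCGLabel *over = gen_new_label();

    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);        /* vl == 0: nop */

    /* 0.7.1 only: zero all VLMAX elements of vd first. */
    tcg_gen_gvec_dup8i(vreg_ofs(s, a->rd), s->vlen / 8, s->vlen / 8, 0);

    if (a->rs1 != 0) {                                        /* x0 reads as 0 */
        TCGv src = tcg_temp_new();
        TCGv_i64 val = tcg_temp_new_i64();

        gen_get_gpr(src, a->rs1);
        tcg_gen_extu_tl_i64(val, src);
        vec_element_storei(s, a->rd, 0, val);                 /* vd[0] = x[rs1] */

        tcg_temp_free_i64(val);
        tcg_temp_free(src);
    }
    gen_set_label(over);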

vfmv.f.s:
Use vec_element_loadi(s, f[rd], vs2, 0, s->sew).
NaN-box f[rd] as necessary for SEW.

vfmv.s.f:
tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
For 0.7.1:
Use tcg_gen_dup8i to zero all VLMAX elements of vd.
Let tmp = f[rs1], nan-boxed as necessary for SEW.
Use vec_element_storei(s, vs2, 0, tmp).
gen_set_label(over);
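
(The NaN-boxing mentioned for vfmv.f.s/vfmv.s.f just forces the bits above SEW
to all-ones.  For SEW=32 with a 64-bit FLEN that can be a one-liner along these
lines, assuming the usual cpu_fpr[] globals; SEW=64 needs no boxing:)

    /* NaN-box the single-precision result in f[rd] */
    tcg_gen_ori_i64(cpu_fpr[a->rd], cpu_fpr[a->rd], MAKE_64BIT_MASK(32, 32));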

vslide1up.vx:
Ho hum, I forgot about masking.  Some options:
(1) Call a helper just as you did in your original patch.
(2) Call a helper only for !vm, for vm as below.
(3) Call vslideup w/1.
tcg_gen_brcondi(TCG_COND_EQ, cpu_vl, 0, over);
If !vm,
// inline test for v0[0]
vec_element_loadi(s, tmp, 0, 0, MO_8);
tcg_gen_andi_i64(tmp, tmp, 1);
tcg_gen_brcondi(TCG_COND_EQ, tmp, 0, over);
Use vec_element_store(s, vd, 0, x[rs1]).
gen_set_label(over);
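
(Option 3's tail, written out as TCG, might look something like this sketch,
assuming cpu_vl and the vec_element_* helpers above; the vd[0] store itself is
the same gpr-to-element sequence as in the vmv.s.x sketch:)

    /* after emitting the vslideup-by-1 part */
    TCGLabel *over = gen_new_label();

    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    if (!a->vm) {
        /* inline test of mask bit v0[0]; skip the vd[0] store when clear */
        TCGv_i64 t = tcg_temp_new_i64();

        vec_element_loadi(s, t, 0, 0, MO_8);
        tcg_gen_andi_i64(t, t, 1);
        tcg_gen_brcondi_i64(TCG_COND_EQ, t, 0, over);
        tcg_temp_free_i64(t);
    }
    /* ... store x[rs1] into vd[0], as in the vmv.s.x sketch ... */
    gen_set_label(over);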

vslide1down.vx:
For !vm, this is complicated enough for a helper.
If using option 3 for vslide1up, then the store becomes:
tcg_gen_subi_tl(tmp, cpu_vl, 1);
vec_element_storex(s, base, tmp, x[rs1]);

vrgather.vx:
If !vm or !vl_eq_vlmax, use helper.
vec_element_loadx(s, tmp, vs2, x[rs1]);
Use tcg_gen_gvec_dup_i64 to store tmp to vd.

vrgather.vi:
If !vm or !vl_eq_vlmax, use helper.
If imm >= vlmax,
Use tcg_gen_dup8i to zero vd;
else,
ofs = vec_element_ofsi(s, vs2, imm, s->sew);
tcg_gen_gvec_dup_mem(sew, vreg_ofs(vd),
 ofs, vlmax, vlmax);
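
(A sketch of the vrgather.vi fast path, i.e. vm set and vl_eq_vlmax, using the
vec_element_ofsi() prototype above; "imm" stands for the decoded immediate,
and the byte sizes are written as s->vlen / 8 for LMUL=1:)

    if (imm >= vlmax) {
        /* out-of-range source index reads as zero */
        tcg_gen_gvec_dup8i(vreg_ofs(s, a->rd), s->vlen / 8, s->vlen / 8, 0);
    } else {
        /* splat vs2[imm] across vd */
        int ofs = vec_element_ofsi(a->rs2, imm, s->sew);

        tcg_gen_gvec_dup_mem(s->sew, vreg_ofs(s, a->rd), ofs,
                             s->vlen / 8, s->vlen / 8);
    }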


r~



Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-16 Thread LIU Zhiwei




On 2020/3/15 13:16, Richard Henderson wrote:

On 3/12/20 7:58 AM, LIU Zhiwei wrote:

+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+if (offset > vl) {\
+offset = vl;  \
+} \

This isn't right.


+for (i = 0; i < vl; i++) {\
+if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {\
+continue; \
+} \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
+} \
+if (i == 0) { \
+return;   \
+} \

You need to eliminate vl == 0 first, not last.
Then

 for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.

It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.



+#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+for (i = 0; i < vl; i++) {\
+if (!vm && !vext_elem_mask(v0, mlen, i)) {\
+continue; \
+} \
+if (i + offset < vlmax) { \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));  \

Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.

It is probably worth changing the loop bounds to

 if (offset >= vlmax) {
max = 0;
 } else {
max = MIN(vl, vlmax - offset);
 }
 for (i = 0; i < max; ++i)



+} else {  \
+*((ETYPE *)vd + H(i)) = 0;\
+}

Which lets these zeros merge into...


+for (; i < vlmax; i++) {  \
+CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));  \
+} \

These zeros.


+#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)   \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t i; 

Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-15 Thread Richard Henderson
On 3/14/20 11:49 PM, LIU Zhiwei wrote:
>>> +if (offset > vl) {\
>>> +offset = vl;  \
>>> +} \
>> This isn't right.
> That's to handle a corner case.  As you can see from Section 17.4.1, the
> behavior of vslideup.vx is:
> 
> 0 < i < max(vstart, OFFSET)      unchanged
> max(vstart, OFFSET) <= i < vl    vd[i] = vs2[i-OFFSET] if mask enabled, unchanged if not
> vl <= i < VLMAX                  tail elements, vd[i] = 0
> 
> 
> Neither the v0.7.1 nor the v0.8 spec specifies the behavior when OFFSET > vl.

Certainly it does, right there:

   offset <= i < vl.

If offset >= vl, then that range is empty of elements.

> Should the elements (vl <= i < OFFSET) be treated as tail elements, or left
> unchanged?

Tail elements.

> Here the elements (vl <= i < OFFSET) are treated as tail elements.

Exactly.

>> Again, eliminate vl == 0 first.  In fact, why don't we make that a global
>> request for all of the patches for the next revision. 
> I don't get it.
> 
> Check vl == 0 first for all patches. Is that right?

Yes.


r~



Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-15 Thread LIU Zhiwei



On 2020/3/15 13:16, Richard Henderson wrote:

On 3/12/20 7:58 AM, LIU Zhiwei wrote:

+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+if (offset > vl) {\
+offset = vl;  \
+} \

This isn't right.
That's to handle a corner case.  As you can see from Section 17.4.1, the
behavior of vslideup.vx is:

0 < i < max(vstart, OFFSET)      unchanged
max(vstart, OFFSET) <= i < vl    vd[i] = vs2[i-OFFSET] if mask enabled, unchanged if not
vl <= i < VLMAX                  tail elements, vd[i] = 0


Neither the v0.7.1 nor the v0.8 spec specifies the behavior when OFFSET > vl.

Should the elements (vl <= i < OFFSET) be treated as tail elements, or left
unchanged?


And that case is possible, because OFFSET comes from a scalar register.

Here the elements (vl <= i < OFFSET) are treated as tail elements.




+for (i = 0; i < vl; i++) {\
+if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {\
+continue; \
+} \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
+} \
+if (i == 0) { \
+return;   \
+} \

You need to eliminate vl == 0 first, not last.
Then

 for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.

Yes, I should.


It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.



+#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)  \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
+uint32_t vm = vext_vm(desc);  \
+uint32_t vl = env->vl;\
+uint32_t offset = s1, i;  \
+  \
+for (i = 0; i < vl; i++) {\
+if (!vm && !vext_elem_mask(v0, mlen, i)) {\
+continue; \
+} \
+if (i + offset < vlmax) { \
+*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));  \

Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.

I don't get it.

Check vl == 0 first for all patches. Is that right?

  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.




It is probably worth changing the loop bounds to

 if (offset >= vlmax) {
max = 0;
 } else {
max = MIN(vl, vlmax - offset);
 }
 for (i = 0; i < max; ++i)


Yes.

+} else {  \
+*((ETYPE *)vd + H(i)) = 0;\
+}

Which lets these zeros merge into...

It's a mistake here.



+for (; i < vlmax; i++) {  \
+CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));  \
+} \

These zeros.


+#define 

Re: [PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-14 Thread Richard Henderson
On 3/12/20 7:58 AM, LIU Zhiwei wrote:
> +#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)\
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> +CPURISCVState *env, uint32_t desc)\
> +{ \
> +uint32_t mlen = vext_mlen(desc);  \
> +uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
> +uint32_t vm = vext_vm(desc);  \
> +uint32_t vl = env->vl;\
> +uint32_t offset = s1, i;  \
> +  \
> +if (offset > vl) {\
> +offset = vl;  \
> +} \

This isn't right.

> +for (i = 0; i < vl; i++) {\
> +if (((i < offset)) || (!vm && !vext_elem_mask(v0, mlen, i))) {\
> +continue; \
> +} \
> +*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));  \
> +} \
> +if (i == 0) { \
> +return;   \
> +} \

You need to eliminate vl == 0 first, not last.
Then

for (i = offset; i < vl; i++)

The types of i and vl need to be extended to target_ulong, so that you don't
incorrectly crop the input offset.

It may be worth special-casing vm=1, or hoisting it out of the loop.  The
operation becomes a memcpy (at least for little-endian) at that point.  See
swap_memmove in arm/sve_helper.c.
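
(Putting those pieces together, the reworked body might look like the sketch
below; it keeps the posted patch's vext_elem_mask()/H()/CLEAR_FN parameters
and drops the macro backslashes for readability.  A sketch, not the final
code.)

    target_ulong offset = s1, i;

    if (vl == 0) {
        return;
    }
    /* for vm == 1 the loop body is a straight copy; on a little-endian host
       it could become a single memmove of (vl - offset) elements, as with
       swap_memmove */
    for (i = offset; i < vl; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i - offset));
    }
    CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));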


> +#define GEN_VEXT_VSLIDEDOWN_VX(NAME, ETYPE, H, CLEAR_FN)  \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> +CPURISCVState *env, uint32_t desc)\
> +{ \
> +uint32_t mlen = vext_mlen(desc);  \
> +uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
> +uint32_t vm = vext_vm(desc);  \
> +uint32_t vl = env->vl;\
> +uint32_t offset = s1, i;  \
> +  \
> +for (i = 0; i < vl; i++) {\
> +if (!vm && !vext_elem_mask(v0, mlen, i)) {\
> +continue; \
> +} \
> +if (i + offset < vlmax) { \
> +*((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));  \

Again, eliminate vl == 0 first.  In fact, why don't we make that a global
request for all of the patches for the next revision.  Checking for i == 0 last
is silly, and checks for the zero twice: once in the loop bounds and again at
the end.

It is probably worth changing the loop bounds to

if (offset >= vlmax) {
   max = 0;
} else {
   max = MIN(vl, vlmax - offset);
}
for (i = 0; i < max; ++i)


> +} else {  \
> +*((ETYPE *)vd + H(i)) = 0;\
> +}

Which lets these zeros merge into...

> +for (; i < vlmax; i++) {  \
> +CLEAR_FN(vd, vl, vl * sizeof(ETYPE), vlmax * sizeof(ETYPE));  \
> +} \

These zeros.
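
(In the same spirit, a sketch of the reworked vslidedown body along the lines
suggested above; elements at and beyond 'max' are then all zero, so one
clearing step covers both the out-of-range and the tail elements.  Again this
reuses the patch's macro parameters and is not the final code.)

    target_ulong offset = s1, i, max;

    if (vl == 0) {
        return;
    }
    max = offset >= vlmax ? 0 : MIN(vl, vlmax - offset);
    for (i = 0; i < max; i++) {
        if (!vm && !vext_elem_mask(v0, mlen, i)) {
            continue;
        }
        *((ETYPE *)vd + H(i)) = *((ETYPE *)vs2 + H(i + offset));
    }
    /* everything from 'max' up to vlmax is zero */
    CLEAR_FN(vd, max, max * sizeof(ETYPE), vlmax * sizeof(ETYPE));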

> +#define GEN_VEXT_VSLIDE1UP_VX(NAME, ETYPE, H, CLEAR_FN)   \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
> +CPURISCVState *env, uint32_t desc)\
> +{ \
> +uint32_t mlen = vext_mlen(desc);  \
> +uint32_t vlmax = env_archcpu(env)->cfg.vlen / mlen;   \
> +uint32_t vm = vext_vm(desc);  \
> +uint32_t vl = env->vl;\
> +uint32_t i;  

[PATCH v5 57/60] target/riscv: vector slide instructions

2020-03-12 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  17 +++
 target/riscv/insn32.decode  |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  17 +++
 target/riscv/vector_helper.c| 136 
 4 files changed, 177 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 7a689a5c07..e86df5b9e4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1120,3 +1120,20 @@ DEF_HELPER_3(vfmv_s_f_b, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_h, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_w, void, ptr, i64, env)
 DEF_HELPER_3(vfmv_s_f_d, void, ptr, i64, env)
+
+DEF_HELPER_6(vslideup_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslideup_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslidedown_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1up_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vslide1down_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bfdce0979c..e6ade9c68e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -72,6 +72,7 @@
 @r2_vm   .. vm:1 . . ... . ...  %rs2 %rd
 @r1_vm   .. vm:1 . . ... . ... %rd
 @r_nfvm  ... ... vm:1 . . ... . ...  %nf %rs2 %rs1 %rd
+@r2rd...   . . ... . ... %rs2 %rd
 @r_vm.. vm:1 . . ... . ...  %rs2 %rs1 %rd
 @r_wdvm  . wd:1 vm:1 . . ... . ...  %rs2 %rs1 %rd
 @r2_zimm . zimm:11  . ... . ... %rs1 %rd
@@ -559,6 +560,12 @@ vext_x_v001100 1 . . 010 . 1010111 @r
 vmv_s_x 001101 1 0 . 110 . 1010111 @r2
 vfmv_f_s001100 1 . 0 001 . 1010111 @r2rd
 vfmv_s_f001101 1 0 . 101 . 1010111 @r2
+vslideup_vx 001110 . . . 100 . 1010111 @r_vm
+vslideup_vi 001110 . . . 011 . 1010111 @r_vm
+vslide1up_vx001110 . . . 110 . 1010111 @r_vm
+vslidedown_vx   00 . . . 100 . 1010111 @r_vm
+vslidedown_vi   00 . . . 011 . 1010111 @r_vm
+vslide1down_vx  00 . . . 110 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index 99cd45b0aa..ef5960ba39 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -2316,3 +2316,20 @@ static bool trans_vfmv_s_f(DisasContext *s, arg_vfmv_s_f 
*a)
 }
 return false;
 }
+
+/* Vector Slide Instructions */
+static bool slideup_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+(a->rd != a->rs2));
+}
+GEN_OPIVX_TRANS(vslideup_vx, slideup_check)
+GEN_OPIVX_TRANS(vslide1up_vx, slideup_check)
+GEN_OPIVI_TRANS(vslideup_vi, 1, vslideup_vx, slideup_check)
+
+GEN_OPIVX_TRANS(vslidedown_vx, opivx_check)
+GEN_OPIVX_TRANS(vslide1down_vx, opivx_check)
+GEN_OPIVI_TRANS(vslidedown_vi, 1, vslidedown_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 3235c3fbe1..2219fdd6c5 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -4511,3 +4511,139 @@ uint64_t HELPER(vfmv_f_s_d)(void *vs2, CPURISCVState 
*env)
 return deposit64(*((uint64_t *)vs2), 32, 32, 0x);
 }
 }
+
+/* Vector Slide Instructions */
+/*
+ * the spec doesn't specify the behavior when offset is larger than vl,
+ * just truncate the offset to vl here.
+ */
+#define GEN_VEXT_VSLIDEUP_VX(NAME, ETYPE, H, CLEAR_FN)\
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2, \
+CPURISCVState *env, uint32_t desc)\
+{ \
+uint32_t mlen = vext_mlen(desc);  \
+uint32_t vlmax =