Re: [Qemu-devel] RISC-V: Vector && DSP Extension

2019-08-10 Thread LIU ZhiWei



On 8/8/19 6:48 AM, Chih-Min Chao wrote:



On Thu, Aug 8, 2019 at 7:29 PM Aleksandar Markovic wrote:


On Thu, Aug 8, 2019 at 11:52 AM liuzhiwei wrote:

> Hi all,
>
>     My workmate  and I have been working on Vector & Dsp
extension, and
> I'd like to share develop status  with folks.
>
>     The spec references for  Vector extension is
riscv-v-spec-0.7.1, and
> riscv-p-spec-0.5 for DSP extension.


Hello, Liu.

I will not answer your questions directly, however I want to bring
to you
and others another perspective on this situation.

First, please provide the link to the specifications. Via Google,
I found
that "riscv-v-spec-0.7.1" is titled "Working draft of the proposed
RISC-V V
vector extension". I could not find "riscv-p-spec-0.5".

I am not sure what the QEMU policy towards "working draft
proposal" type of
specification is. Peter, can you perhaps clarify that or any other
related
issue?


Hi Aleksandar,

As for riscv-v-spec 0.7.1, it is the first stable spec for target software 
development, though the name says working draft.  The architecture skeleton 
is fixed, and most of the work is focused on issues related to 
micro-architecture implementation complexity.
SiFive has released an open source implementation on the Spike simulator, 
and Imperas also provides another implementation with its binary simulator.  
I think it is worth including the extension in QEMU at this moment.

As for riscv-p-spec-0.5, I think Andes has fully supported this extension 
and should release a more detailed spec in the near future (as described in 
the RISC-V Technical Update, 2019/06).
They have implemented lots of DSP kernels based on this extension and 
also provided impressive performance results.  It is also worth reviewing 
(at least as an [RFC]) once the detailed spec is public.



ref:
     1. 
https://content.riscv.org/wp-content/uploads/2019/06/17.40-Vector_RISCV-20190611-Vectors.pdf
     2. 
https://content.riscv.org/wp-content/uploads/2019/06/17.20-P-ext-RVW-Zurich-20190611.pdf
     3. 
https://content.riscv.org/wp-content/uploads/2019/06/10.05-TechCommitteeUpdate-June-2019-Copy.pdf



chihmin


Hi chihmin,

Thank you for the detailed and informative response. You have a very 
good understanding of the specifications.


I will modify the code according to the latest spec (currently 
riscv-v-spec 0.7.2) from the ISA spec git repo, as Alistair advised.


Yours,

Zhiwei



I would advise some caution in these cases. The major issue is
backward compatibility, but there are other issues too. Let's say,
fairness. If we let emulation of a component based on a "working draft
proposal" be integrated into QEMU, this will set a precedent, and many
other developers would rightfully ask for their contributions based on
drafts to be integrated into QEMU. Our policy should be as equal as
possible to all contributions, large or small, riscv or alpha, cpu or
device, tcg or kvm - in my honest opinion. QEMU upstream should not be
a collecting place for all imaginable experimentations; certain
criteria on what is appropriate for upstreaming exist and must
continue to exist.

Yours,
Aleksandar




> The code of the vector extension is
> ready and under testing; the first patch will be sent about two
weeks
> later. After that we will move on to the DSP extension, and
send the
> first patch in mid-October.
>
>      Could the maintainers tell me whether the specs referenced are
> appropriate? Is anyone working on these extensions? I'd like to get
> your status, and maybe discuss questions and work together.
>
> Best Regards
>
> LIU Zhiwei
>
>
>
>



Re: [Qemu-devel] RISC-V: Vector && DSP Extension

2019-08-10 Thread LIU ZhiWei



On 8/9/19 6:54 PM, Alistair Francis wrote:

On Thu, Aug 8, 2019 at 2:52 AM liuzhiwei  wrote:

Hi all,

 My workmate and I have been working on the Vector & DSP extension, and
I'd like to share development status with folks.

Cool!


 The spec references are riscv-v-spec-0.7.1 for the Vector extension, and
riscv-p-spec-0.5 for the DSP extension. The code of the vector extension is
ready and under testing; the first patch will be sent about two weeks
later. After that we will move on to the DSP extension, and send the
first patch in mid-October.

What code are you talking about? Is this QEMU code?


Hi Alistair,

It's the QEMU code I have been working on these days, which implements the 
Vector extension. It is under testing,
and will be sent later.


  Could the maintainers tell me whether the specs referenced are
appropriate? Is anyone working on these extensions? I'd like to get
your status, and maybe discuss questions and work together.

Just use the latest (master) from the ISA spec git repo.


I will follow your advice. Thanks for your attention to this matter.

Best Regards,

Zhiwei



I don't know anyone doing vector work for QEMU. It would be very
useful, but everyone is busy with something at the moment
unfortunately.

Alistair


Best Regards

LIU Zhiwei





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-19 Thread LIU Zhiwei
rgument to vsetvli is a good choice, because it is constant,
relates directly to the compiled code, and is unrelated to the length of the
data being processed.

With that, you can verify at translation:

(1) vill
(2) v[n], for (n % lmul) != 0
(3) v[n] overlapping v[0] for masked/carry operations, with lmul > 1

and

(4) you can arrange the helpers so that instead of 1 helper that has to
 handle all SEW, you have N helpers, each handling a different SEW.

And with all of this done, I believe you no longer need to pass the register
number to the helper.  You can pass the address of v[n], which is much more
like how the tcg generic vector support works.

Whether or not to include VL in tb_flags is a harder choice.  Certainly not the
exact value of VL, as that would lead to different translations for every loop
tail.  But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
single bit.  Knowing that this condition is true would allow some use of the
tcg generic vector support.


The (ill, lmul, sew) fields of vtype will be placed within tb_flags, as 
will the bit for (VSTART == 0 && VL == VLMAX).


So it will take at least 8 bits of tb_flags for the vector extension.
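A minimal sketch of such a packing (field names and bit positions here are illustrative guesses from the discussion, not the final patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout: VL_EQ_VLMAX at bit 2, LMUL at bits 3-4,
 * SEW at bits 5-7, VILL at bit 8.  Bits 0-1 would stay reserved for
 * the existing mmu-index flags. */
enum {
    VL_EQ_VLMAX_SHIFT = 2,
    LMUL_SHIFT        = 3,
    SEW_SHIFT         = 5,
    VILL_SHIFT        = 8,
};

static inline uint32_t pack_vtype_flags(uint32_t vill, uint32_t sew,
                                        uint32_t lmul, uint32_t vl_eq_vlmax)
{
    /* Each field is shifted into its slot and OR-ed together. */
    return (vill << VILL_SHIFT) | (sew << SEW_SHIFT) |
           (lmul << LMUL_SHIFT) | (vl_eq_vlmax << VL_EQ_VLMAX_SHIFT);
}
```

QEMU itself would express this with the FIELD()/FIELD_DP32() helpers from hw/registerfields.h; the shifts above just show the bit budget.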


E.g. vadd.vv could be

 if (masked) {
 switch (SEW) {
 case MO_8:
 gen_helper_vadd8_mask(...);
 break;
 ...
 }
 } else if (vl_eq_vlmax) {
 tcg_gen_gvec_add(SEW, vreg_ofs(vd), vreg_ofs(vs2), vreg_ofs(vs1),
  VLEN * LMUL, VLEN * LMUL);
 } else {
 switch (SEW) {
 case MO_8:
 gen_helper_vadd8(...);
 break;
 ...
 }
 }

Or, equivalently, pack pointers to the actual generator functions into a
structure so that this code structure can be shared between many instructions.
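A schematic of that pointer-packing idea, with plain function pointers standing in for the real gen_helper_*/tcg_gen_gvec_* callbacks (all names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in signature; real TCG generators take TCGv/offset arguments. */
typedef int (*gen_fn)(int vd, int vs1, int vs2);

typedef struct {
    gen_fn fn_mask[4];   /* masked helper per SEW (MO_8..MO_64) */
    gen_fn fn[4];        /* unmasked helper per SEW */
    gen_fn fn_gvec;      /* fast path when vstart == 0 && vl == vlmax */
} GVecGenVV;

/* One shared dispatcher replaces the per-instruction switch. */
static int do_opivv(const GVecGenVV *g, int sew, int masked,
                    int vl_eq_vlmax, int vd, int vs1, int vs2)
{
    if (masked) {
        return g->fn_mask[sew](vd, vs1, vs2);
    } else if (vl_eq_vlmax) {
        return g->fn_gvec(vd, vs1, vs2);
    }
    return g->fn[sew](vd, vs1, vs2);
}

/* Dummy callbacks so the dispatch can be exercised. */
static int add8(int vd, int vs1, int vs2) { (void)vd; (void)vs1; (void)vs2; return 8; }
static int gvec(int vd, int vs1, int vs2) { (void)vd; (void)vs1; (void)vs2; return 0; }

static int demo(void)
{
    GVecGenVV g = { .fn_mask = { add8 }, .fn = { add8 }, .fn_gvec = gvec };
    /* masked path, then fast path */
    return do_opivv(&g, 0, 1, 0, 0, 1, 2) * 10 +
           do_opivv(&g, 0, 0, 1, 0, 1, 2);
}
```

Each instruction then only needs to fill in one such table, and trans_vadd_vv, trans_vsub_vv, etc. all call the same dispatcher.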


It's quicker to use the TCG generic vector support.

However, I have one problem with supporting both a command-line VLEN and vreg_ofs.

As in SVE, vreg_ofs is the offset from cpu_env. If the structure of the 
vector extension state (to support a command-line VLEN) is


struct {
union{
uint64_t *u64 ;
int64_t  *s64;
uint32_t *u32;
int32_t  *s32;
uint16_t *u16;
int16_t  *s16;
uint8_t  *u8;
int8_t   *s8;
} mem;
target_ulong vxrm;
target_ulong vxsat;
target_ulong vl;
target_ulong vstart;
target_ulong vtype;
} vext

I can't find a way to get a direct offset of vreg from cpu_env.

Maybe I should specify a max VLEN, the way SVE does?
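A minimal sketch of that SVE-style alternative, assuming a compile-time ceiling RV_VLEN_MAX and hypothetical type names:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define RV_VLEN_MAX 512   /* assumed compile-time ceiling, like SVE's max */

typedef struct {
    /* 32 vector registers as one flat in-struct block; a command-line
     * vlen <= RV_VLEN_MAX simply uses a prefix of each register's slot. */
    uint64_t vreg[32 * RV_VLEN_MAX / 64];
    /* vl, vtype, vstart, ... elided */
} DemoVExtState;

/* Byte offset of register n from the start of the state, computable at
 * translation time, analogous to SVE's vec_full_reg_offset(). */
static inline size_t vreg_ofs(int n)
{
    return offsetof(DemoVExtState, vreg) + n * (RV_VLEN_MAX / 8);
}
```

Because the storage is embedded in the state rather than behind a pointer, offsetof() gives a constant offset from cpu_env, which is exactly what the gvec expanders need.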

Best Regards,

LIU Zhiwei


Bear in mind that all tcg gvec operations operate strictly upon lanes.  I.e.

vd[x] = vs1[x] op vs2[x]

thus the actual arrangement of the elements in storage is irrelevant, and SLEN
need not be considered here.
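The lane-wise rule above, as a tiny self-contained sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise add: vd[x] = vs1[x] + vs2[x].  Only the lane index matters;
 * how lanes map to bytes in storage (SLEN striping) never appears. */
static void vadd_lanes32(uint32_t *vd, const uint32_t *vs1,
                         const uint32_t *vs2, int lanes)
{
    for (int x = 0; x < lanes; x++) {
        vd[x] = vs1[x] + vs2[x];
    }
}

static uint32_t demo(void)
{
    uint32_t a[4] = {1, 2, 3, 4};
    uint32_t b[4] = {10, 20, 30, 40};
    uint32_t d[4];

    vadd_lanes32(d, a, b, 4);
    return d[0] + d[3];
}
```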


r~




Re: [PATCH v4 3/4] target/riscv: support vector extension csr

2020-02-11 Thread LIU Zhiwei




On 2020/2/12 0:11, Richard Henderson wrote:

On 2/10/20 8:12 AM, LIU Zhiwei wrote:

+static int vs(CPURISCVState *env, int csrno)
+{
+return 0;
+}

This should at least be testing RVV, a-la smode().

Testing RVV is ok.

I don't quite understand "a-la smode()" here. Could you give more 
details? Thanks.

You probably want to have all of the other tests vs RVV in this file use this
function, since this will need to grow the system mode enable test.


@@ -158,8 +167,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
  return -1;
  }
  #endif
-*val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
-| (env->frm << FSR_RD_SHIFT);
+*val = (env->vext.vxrm << FSR_VXRM_SHIFT)
+| (env->vext.vxsat << FSR_VXSAT_SHIFT)
+| (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
+| (env->frm << FSR_RD_SHIFT);
  return 0;
  }

While we can be perfectly happy shifting 0's into place here, it would probably
be clearer to conditionalize on vs().

OK.

@@ -172,10 +183,60 @@ static int write_fcsr(CPURISCVState *env, int csrno, 
target_ulong val)
  env->mstatus |= MSTATUS_FS;
  #endif
  env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+env->vext.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vext.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
  riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
  return 0;
  }

You *must* test vs() here.

OK.



r~





Re: [PATCH v4 1/4] target/riscv: add vector extension field in CPURISCVState

2020-02-11 Thread LIU Zhiwei




On 2020/2/11 23:53, Richard Henderson wrote:

On 2/10/20 8:12 AM, LIU Zhiwei wrote:

The 32 vector registers will be viewed as a continuous memory block.
It avoids the conversion between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.

Signed-off-by: LIU Zhiwei 
---
  target/riscv/cpu.h | 13 +
  1 file changed, 13 insertions(+)

Reviewed-by: Richard Henderson 

I still don't think you need to put stuff into a sub-structure.  These register
names are unique in the manual, and not subdivided there.

OK. I will move these registers out of the sub-structure in the next patch.


r~





Re: [PATCH v4 2/4] target/riscv: configure and turn on vector extension from command line

2020-02-11 Thread LIU Zhiwei




On 2020/2/11 23:56, Richard Henderson wrote:

On 2/10/20 8:12 AM, LIU Zhiwei wrote:

+if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
+error_setg(errp,
+   "Vector extension implementation only supports VLEN "
+   "in the range [128, %d]", RV_VLEN_MAX);
+return;
+}
+if (!is_power_of_2(cpu->cfg.elen)) {
+error_setg(errp,
+   "Vector extension ELEN must be power of 2");
+return;
+}
+if (cpu->cfg.elen > 64) {
+error_setg(errp,
+   "Vector extension ELEN must <= 64");
+return;
+}

ELEN should use the same "only supports ELEN in the range" language as VLEN.

OK. I will print "only supports ELEN in the range [8, 64]".

Otherwise,
Reviewed-by: Richard Henderson 


r~





Re: [PATCH v4 4/4] target/riscv: add vector configure instruction

2020-02-12 Thread LIU Zhiwei




On 2020/2/12 0:56, Richard Henderson wrote:

On 2/10/20 8:12 AM, LIU Zhiwei wrote:

  static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t 
*pflags)
  {
+uint32_t flags = 0;
+uint32_t vlmax;
+uint8_t vl_eq_vlmax;

bool.

OK.

Is it clearer to use "bool" here? Or is it wrong to use "uint8_t"?



+
  *pc = env->pc;
  *cs_base = 0;
+
+if (env->misa & RVV) {
+vlmax = vext_get_vlmax(env_archcpu(env), env->vext.vtype);
+vl_eq_vlmax = (env->vext.vstart == 0) && (vlmax == env->vext.vl);

You might as well move the variable declarations inside this block.

OK.



+target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
+ target_ulong s2)

Indentation.

OK.



+{
+int vlmax, vl;
+RISCVCPU *cpu = env_archcpu(env);
+uint16_t sew = 1 << FIELD_EX64(s2, VTYPE, SEW);
+
+if (sew > cpu->cfg.elen) { /* only set vill bit. */
+env->vext.vtype = FIELD_DP64(0, VTYPE, VILL, 1);
+env->vext.vl = 0;
+env->vext.vstart = 0;
+return 0;
+}

You're missing checks against EDIV, VILL and the RESERVED field == 0.
This implementation does not support "Zvediv", so I did not check it. 
I'm not sure if I should check (ediv == 0).


I missed the check for the "VILL" field. I will fix it up in the next patch.

I'm not quite sure if I should set VILL when the RESERVED field != 0.




+
+vlmax = vext_get_vlmax(cpu, s2);
+if (s1 <= vlmax) {
+vl = s1;
+} else {
+vl = vlmax;
+}
+env->vext.vl = vl;
+env->vext.vtype = s2;
+env->vext.vstart = 0;
+return vl;
+}



r~





Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-12 Thread LIU Zhiwei

Hi, Richard

Thanks for comments.

On 2020/2/12 14:38, Richard Henderson wrote:

On 2/9/20 11:42 PM, LIU Zhiwei wrote:

+/*
+ * As simd_desc supports at most 256 bytes, and in this implementation,
+ * the max vector group length is 2048 bytes. So split it into two parts.
+ *
+ * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
+ * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
+ */
+static uint32_t maxsz_part1(uint32_t maxsz)
+{
+return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
+}
+
+static uint32_t maxsz_part2(uint32_t maxsz)
+{
+return (maxsz & 0x3f) >> 3;
+}

I would much rather adjust simd_desc to support 2048 bytes.

I've just posted a patch set that removes an assert in target/arm that would
trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.
Do you mean "assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS));" 
in tcg-op-gvec.c?

If it is removed, I can pass 2048 bytes by setting maxsz == 256.

Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
simd_desc at all, and just pass vlen, unencoded.

 Vlen is not enough; lmul is also needed in the helpers.
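One possible shape for passing both unencoded, as a hedged sketch (the field widths and helper names are assumptions, not QEMU's simd_desc layout):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: pack vlen (in bytes) and lmul together into the
 * single 32-bit immediate handed to the helper. */
static inline uint32_t vext_desc(uint32_t vlen_b, uint32_t lmul)
{
    /* low 16 bits: vlen in bytes (up to 64 KiB); high bits: lmul */
    return (lmul << 16) | vlen_b;
}

static inline uint32_t vext_vlen(uint32_t desc) { return desc & 0xffff; }
static inline uint32_t vext_lmul(uint32_t desc) { return desc >> 16; }
```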



+/* define check conditions data structure */
+struct vext_check_ctx {
+
+struct vext_reg {
+uint8_t reg;
+bool widen;
+bool need_check;
+} check_reg[6];
+
+struct vext_overlap_mask {
+uint8_t reg;
+uint8_t vm;
+bool need_check;
+} check_overlap_mask;
+
+struct vext_nf {
+uint8_t nf;
+bool need_check;
+} check_nf;
+target_ulong check_misa;
+
+} vchkctx;

You cannot use a global variable.  The data must be thread-safe.

If we're going to do the checks this way, with a structure, it needs to be on
the stack or within DisasContext.

+#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)\
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)  \
+{ \
+vchkctx.check_misa = RVV; \
+vchkctx.check_overlap_mask.need_check = true; \
+vchkctx.check_overlap_mask.reg = a->rd;   \
+vchkctx.check_overlap_mask.vm = a->vm;\
+vchkctx.check_reg[0].need_check = true;   \
+vchkctx.check_reg[0].reg = a->rd; \
+vchkctx.check_reg[0].widen = false;   \
+vchkctx.check_nf.need_check = true;   \
+vchkctx.check_nf.nf = a->nf;  \
+  \
+if (!vext_check(s)) { \
+return false; \
+} \
+return DO_OP(s, a, SEQ);  \
+}

I don't see the improvement from a pointer.  Something like

 if (vext_check_isa_ill(s) &&
 vext_check_overlap(s, a->rd, a->rm) &&
 vext_check_reg(s, a->rd, false) &&
 vext_check_nf(s, a->nf)) {
 return DO_OP(s, a, SEQ);
 }
 return false;

seems just as clear without the extra data.
I am not quite sure which is clearer. In my opinion, setting data is 
easier than calling different interfaces.

+#ifdef CONFIG_USER_ONLY
+#define MO_SB 0
+#define MO_LESW 0
+#define MO_LESL 0
+#define MO_LEQ 0
+#define MO_UB 0
+#define MO_LEUW 0
+#define MO_LEUL 0
+#endif

What is this for?  We already define these unconditionally.
Yes. I missed the header file "exec/memop.h". When I compile in user mode, 
some make errors appear.

I will remove these lines in the next patch.



+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+int idx = (index * mlen) / 8;
+int pos = (index * mlen) % 8;
+
+return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
+}

This is a little-endian indexing of the mask.  Just above we talk about using a
host-endian ordering of uint64_t.

Thus this must be based on uint64_t instead of uint8_t.
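A uint64_t-based variant along those lines might look like (a sketch, not the final code):

```c
#include <assert.h>
#include <stdint.h>

/* Mask bit lookup over host-endian uint64_t words instead of bytes;
 * mlen is the number of mask bits per element. */
static inline int vext_elem_mask64(const uint64_t *v0, int mlen, int index)
{
    int idx = (index * mlen) / 64;
    int pos = (index * mlen) % 64;

    return (v0[idx] >> pos) & 0x1;
}

static int demo(void)
{
    uint64_t m[2] = { 0x5, 0x1 };   /* bits 0 and 2 set; bit 64 set */

    return vext_elem_mask64(m, 1, 0) * 100 +
           vext_elem_mask64(m, 1, 1) * 10 +
           vext_elem_mask64(m, 1, 64);
}
```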


+/*
+ * This function checks watchpoint before really load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
+ * In user mode, there is no watchpoint support now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk can't fill the TLB entry. Then the guest
+ * software can return here after processing the exception, or never return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_

[PATCH v3 1/1] target/riscv: add vector integer operations

2020-02-25 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  395 +++
 target/riscv/insn32.decode  |  127 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  671 +++-
 target/riscv/vector_helper.c| 1308 ++-
 4 files changed, 2462 insertions(+), 39 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index cbe0d107c0..dee21b4128 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -296,3 +296,398 @@ DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, 
i32)
 DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_b, void, ptr, ptr, ptr, ptr, env, i32

Re: [PATCH v4 5/5] target/riscv: add vector amo operations

2020-02-29 Thread LIU Zhiwei




On 2020/2/29 2:46, Richard Henderson wrote:

On 2/28/20 1:19 AM, LIU Zhiwei wrote:

+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ETYPE, MTYPE, H, DO_OP, SUF)  \
+static void vext_##NAME##_noatomic_op(void *vs3, target_ulong addr,  \
+    uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr)\
+{    \
+    ETYPE ret;   \
+    target_ulong tmp;    \
+    int mmu_idx = cpu_mmu_index(env, false); \
+    tmp = cpu_ld##SUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);  \
+    ret = DO_OP((ETYPE)(MTYPE)tmp, *((ETYPE *)vs3 + H(idx)));    \
+    cpu_st##SUF##_mmuidx_ra(env, addr, ret, mmu_idx, retaddr);   \
+    if (wd) {    \
+    *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \

The target_long cast is wrong; should be ETYPE.

"If the AMO memory element width is less than SEW, the value returned from 
memory
  is sign-extended to fill SEW"

So I just used (target_long) to sign-extend. As you see, instructions like

vamominud

have uint64_t as ETYPE, and they can't sign-extend the value from memory by
(ETYPE)(MTYPE)tmp.

Casting to target_long doesn't help -- it becomes signed at a variable size,
possibly larger than MTYPE.

In addition, I think you're performing the operation at the wrong length.
The text of the ISA document could be clearer, but

   # If SEW > 32 bits, the value returned from memory
   # is sign-extended to fill SEW.

You are performing the operation in ETYPE, but it should be done in MTYPE and
only afterward extended to ETYPE.

Yes, I made a mistake. It should be MTYPE.

For minu/maxu, you're right that you need an unsigned for the operation.  But
then you need a signed type of the same width for the extension.

One possibility is to *always* make MTYPE a signed type, but for the two cases
that require an unsigned type, provide it.  E.g.

#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ESZ, MSZ, H, DO_OP, SUF)
static void vext_##NAME##_noatomic_op(void *vs3,
 target_ulong addr, uint32_t wd, uint32_t idx,
 CPURISCVState *env, uintptr_t retaddr)
{
 typedef int##ESZ##_t ETYPE;
 typedef int##MSZ##_t MTYPE;
 typedef uint##MSZ##_t UMTYPE;
 ETYPE *pe3 = (ETYPE *)vs3 + H(idx);
 MTYPE a = *pe3, b = cpu_ld##SUF##_data(env, addr);
 a = DO_OP(a, b);
 cpu_st##SUF##_data(env, addr, a);
 if (wd) {
 *pe3 = a;
 }
}

/* Signed min/max */
#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))

/* Unsigned min/max */
#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)

GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, 64, 32, H8, DO_MAXU, l)
GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, 64, 64, H8, DO_MAXU, q)
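The cast distinction this hinges on, as a tiny demonstration:

```c
#include <assert.h>
#include <stdint.h>

/* Widening a 32-bit memory value to a 64-bit element: going through the
 * *signed* 32-bit type sign-extends; going through the unsigned type
 * zero-extends.  This is why MTYPE stays signed even for minu/maxu. */
static int64_t widen_signed(uint32_t m)   { return (int32_t)m; }
static int64_t widen_unsigned(uint32_t m) { return m; }
```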

Perfect. I will try it.




The missing aligned address check is the only remaining exception that the
helper_atomic_* functions would raise, since you have properly checked for
read+write.  So it might be possible to get away with using the helpers, but I
don't like it.

Do you mean I should write my own helpers to implement the atomic operations?

What's the meaning of "but I don't like it"?

I don't like re-using helpers in an incorrect way.


But I do think it would be better to write your own helpers for the atomic
paths.  They need not check quite so much, since we have already done the
validation above.  You pretty much only need to use tlb_vaddr_to_host.

If that gets too ugly, we can talk about rearranging
accel/tcg/atomic_template.h so that it could be reused.

Good idea.  Perhaps use tlb_vaddr_to_host instead of atomic_mmu_lookup
to define another macro like GEN_ATOMIC_HELPER?

Alternately, we could simply *always* use the non-atomic helpers, and raise
exit_atomic if PARALLEL.

Yes, it's the simplest way.
However, I would prefer to define something like GEN_ATOMIC_HELPER in
vector_helper.c.

I'll think about this some more.
In the short-term, I think non-atomic is the best we can do.

I will accept your advice. Thanks.

Best Regards,
Zhiwei


r~





[PATCH v6 4/4] target/riscv: add vector configure instruction

2020-02-29 Thread LIU Zhiwei
vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
should update after configure instructions. The (ill, lmul, sew ) of vtype
and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.h  | 61 +++---
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +++
 7 files changed, 198 insertions(+), 11 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 748bd557f9..9b5daed878 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -98,6 +99,12 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, VLMUL, 0, 2)
+FIELD(VTYPE, VSEW, 2, 3)
+FIELD(VTYPE, VEDIV, 5, 2)
+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -302,16 +309,59 @@ void riscv_cpu_set_fflags(CPURISCVState *env, 
target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, VSEW);
+lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t 
*pflags)
 {
+uint32_t flags = 0;
+
 *pc = env->pc;
 *cs_base = 0;
+
+if (env->misa & RVV) {
+uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vtype, VTYPE, VILL));
+flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+FIELD_EX64(env->vtype, VTYPE, VSEW));
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+FIELD_EX64(env->vtype, VTYPE, VLMUL));
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+} else {
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+}
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags |= TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
+flags |= cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -352,9 +402,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations 
*ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
 #endif
+/* Vector functions */
+DEF_HELPER_3(vsetvl, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 77f794ed70..5dc009c3cd 100644
--- a/target/riscv/insn32.decode
+++ b/target/risc

[PATCH v6 3/4] target/riscv: support vector extension csr

2020-02-29 Thread LIU Zhiwei
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 75 -
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e99834856c..1f588ebc14 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM 0x002
 #define CSR_FCSR0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE   0xc00
 #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0e34c292c5..3f9e72b217 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
+/* loose check condition for fcsr in vector extension */
+if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+return 0;
+}
 if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
@@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
 return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+if (env->misa & RVV) {
+return 0;
+}
+return -1;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -160,6 +172,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
 #endif
 *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
 | (env->frm << FSR_RD_SHIFT);
+if (vs(env, csrno) >= 0) {
+*val |= (env->vxrm << FSR_VXRM_SHIFT)
+| (env->vxsat << FSR_VXSAT_SHIFT);
+}
 return 0;
 }
 
@@ -172,10 +188,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, 
target_ulong val)
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+if (vs(env, csrno) >= 0) {
+env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
+}
 riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
 return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxrm;
+return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxrm = val;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxsat;
+return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxsat = val;
+return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vstart;
+return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vstart = val;
+return 0;
+}
+
 /* User Timers and Counters */
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
@@ -877,7 +945,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_FFLAGS] =  { fs,   read_fflags,  write_fflags  },
 [CSR_FRM] = { fs,   read_frm, write_frm },
 [CSR_FCSR] ={ fs,   read_fcsr,write_fcsr},
-
+/* Vector CSRs */
+[CSR_VSTART] =  { vs,   read_vstart,  write_vstart  },
+[CSR_VXSAT] =   { vs,   read_vxsat,   write_vxsat   },
+[CSR_VXRM] ={ vs,   read_vxrm,write_vxrm},
+[CSR_VL] =  { vs,   read_vl },
+[CSR_VTYPE] =   { vs,   read_vtype  },
 /* User Timers and Counters */
 [CSR_CYCLE] =   { ctr,  read_instret},
 [CSR_INSTRET] = { ctr,  read_instret},
-- 
2.23.0




[PATCH v6 2/4] target/riscv: implementation-defined constant parameters

2020-02-29 Thread LIU Zhiwei
vlen is the vector register length in bits.
elen is the max element size in bits.
vext_spec is the vector specification version; the default value is v0.7.1.

Signed-off-by: LIU Zhiwei 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c | 7 +++
 target/riscv/cpu.h | 5 +
 2 files changed, 12 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8c86ebc109..6900714432 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -98,6 +98,11 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
 env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
 env->features |= (1ULL << feature);
@@ -320,6 +325,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = 0;
 Error *local_err = NULL;
 
@@ -345,6 +351,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 
 set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
 set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 2e8d01c155..748bd557f9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -83,6 +83,8 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
+#define VEXT_VERSION_0_07_1 0x0701
+
 #define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
@@ -117,6 +119,7 @@ struct CPURISCVState {
 target_ulong badaddr;
 
 target_ulong priv_ver;
+target_ulong vext_ver;
 target_ulong misa;
 target_ulong misa_mask;
 
@@ -231,6 +234,8 @@ typedef struct RISCVCPU {
 
 char *priv_spec;
 char *user_spec;
+uint16_t vlen;
+uint16_t elen;
 bool mmu;
 bool pmp;
 } cfg;
-- 
2.23.0




[PATCH v6 1/4] target/riscv: add vector extension field in CPURISCVState

2020-02-29 Thread LIU Zhiwei
The 32 vector registers will be viewed as a contiguous memory block.
This avoids the conversion between an element index and a (regno, offset)
pair, so elements can be accessed directly by their offset from the first
vector base address.

Signed-off-by: LIU Zhiwei 
Acked-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index de0a8d893a..2e8d01c155 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -93,9 +94,20 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 512
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
+
+/* vector coprocessor state. */
+uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
+target_ulong vxrm;
+target_ulong vxsat;
+target_ulong vl;
+target_ulong vstart;
+target_ulong vtype;
+
 target_ulong pc;
 target_ulong load_res;
 target_ulong load_val;
-- 
2.23.0




[PATCH v6 0/4] target-riscv: support vector extension part 1

2020-02-29 Thread LIU Zhiwei
This is the first part of the v6 patchset. The v6 changelog covers only
part 1.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * not support Zvediv as it is changing.
  * SLEN always equals VLEN.
  * element width support 8bit, 16bit, 32bit, 64bit.

Changelog:
v6
  * keep vector CSR in read/write order
  * define fields name of VTYPE just like specification.
v5
  * vector registers as direct fields in RISCVCPUState.
  * mov the properties to last patch.
  * check RVV in vs().
  * check if rs1 is x0 in vsetvl/vsetvli.
  * check VILL, EDIV, RESERVED fields in vsetvl.
v4
  * adjust max vlen to 512 bits.
  * check maximum on elen(64bits).
  * check minimum on vlen(128bits).
  * check if rs1 is x0 in vsetvl/vsetvli.
  * use gen_goto_tb in vsetvli instead of exit_tb.
  * fixup fetch vlmax from rs2, not env->vext.type.
v3
  * support VLEN configure from qemu command line.
  * support ELEN configure from qemu command line.
  * support vector specification version configure from qemu command line.
  * only default on for "any" cpu, others turn on from command line.
  * use a contiguous memory block for the vector register description.
V2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (4):
  target/riscv: add vector extension field in CPURISCVState
  target/riscv: implementation-defined constant parameters
  target/riscv: support vector extension csr
  target/riscv: add vector configure instruction

 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.c  |  7 +++
 target/riscv/cpu.h  | 78 ++---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 75 +++-
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 ++
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +
 10 files changed, 311 insertions(+), 12 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

-- 
2.23.0




Re: [PATCH v4 4/5] target/riscv: add fault-only-first unit stride load

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 4:03, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+GEN_VEXT_LD_ELEM(vlbff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlhff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlwff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwff_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vleff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vleff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlbuff_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlhuff_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlwuff_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwuff_v_d, uint32_t, uint64_t, H8, ldl)

We definitely should not have a 3rd copy of these.

Yes, I will remove it by adding a parameter to GEN_VEXT_LDFF.




+if (i == 0) {
+probe_read_access(env, addr, nf * msz, ra);
+} else {
+/* if it triggles an exception, no need to check watchpoint */

triggers.

Yes.



+offset = -(addr | TARGET_PAGE_MASK);
+remain = nf * msz;
+while (remain > 0) {
+host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmuidx);
+if (host) {
+#ifdef CONFIG_USER_ONLY
+if (page_check_range(addr, nf * msz, PAGE_READ) < 0) {
+vl = i;
+goto ProbeSuccess;
+}
+#else
+probe_read_access(env, addr, nf * msz, ra);
+#endif

Good job finding all of the corner cases.  I should invent a new cputlb
function that handles this better.  For now, this is the best we can do.

That will be better.

I learned a lot from the SVE and S390 code. Thanks a lot.

Best Regards,
Zhiwei


r~





Re: [PATCH v3 1/1] target/riscv: add vector integer operations

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 13:46, Richard Henderson wrote:

On 2/25/20 6:43 PM, LIU Zhiwei wrote:

Signed-off-by: LIU Zhiwei 
---
  target/riscv/helper.h   |  395 +++
  target/riscv/insn32.decode  |  127 +++
  target/riscv/insn_trans/trans_rvv.inc.c |  671 +++-
  target/riscv/vector_helper.c| 1308 ++-
  4 files changed, 2462 insertions(+), 39 deletions(-)

This patch is too large and needs splitting.

OK.

-static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
+bool widen)
  {
-return !(s->lmul > 1 && vm == 0 && vd == 0);
+return (vm != 0 || vd != 0) ? true : (!widen && (s->lmul == 0));
  }
  

Best to move the addition of widen back to the patch that introduced this 
function.

The "? true :" is a funny way to write ||.

Oh yes. I did not notice it.
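Spelled out as the disjunction (a standalone sketch; the one-field DisasContext here is a stand-in for the real translator context):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the translator context; only lmul matters here. */
typedef struct {
    int lmul;
} DisasContext;

/* "cond ? true : x" is just "cond || x", so the check collapses to a
 * plain disjunction with the same truth table as the patch version. */
static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm,
                                    bool widen)
{
    return vm != 0 || vd != 0 || (!widen && s->lmul == 0);
}
```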


r~





Re: [PATCH v4 3/5] target/riscv: add vector index load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 3:49, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+vsxb_v ... 011 . . . 000 . 0100111 @r_nfvm
+vsxh_v ... 011 . . . 101 . 0100111 @r_nfvm
+vsxw_v ... 011 . . . 110 . 0100111 @r_nfvm
+vsxe_v ... 011 . . . 111 . 0100111 @r_nfvm
+vsuxb_v... 111 . . . 000 . 0100111 @r_nfvm
+vsuxh_v... 111 . . . 101 . 0100111 @r_nfvm
+vsuxw_v... 111 . . . 110 . 0100111 @r_nfvm
+vsuxe_v... 111 . . . 111 . 0100111 @r_nfvm

These can be merged, with a comment, like

# Vector ordered-indexed and unordered-indexed store insns.
vsxb_v ... -11 . . . 000 . 0100111 @r_nfvm

which means you don't need these:

Good.

+static bool trans_vsuxb_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxb_v(s, a);
+}
+
+static bool trans_vsuxh_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxh_v(s, a);
+}
+
+static bool trans_vsuxw_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxw_v(s, a);
+}
+
+static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
+{
+return trans_vsxe_v(s, a);
+}
+static inline void vext_ld_index(void *vd, void *v0, target_ulong base,
+void *vs2, CPURISCVState *env, uint32_t desc,
+vext_get_index_addr get_index_addr,
+vext_ld_elem_fn ld_elem,
+vext_ld_clear_elem clear_elem,
+uint32_t esz, uint32_t msz, uintptr_t ra)

Similar comment about merging vext_ld_index and vext_st_index.

Good idea. Thanks.

Zhiwei


r~





Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 11:33, Richard Henderson wrote:

On 2/27/20 5:50 PM, LIU Zhiwei wrote:

This is not what I had in mind, and looks wrong as well.

 int idx = (index * mlen) / 64;
 int pos = (index * mlen) % 64;
 return (((uint64_t *)v0)[idx] >> pos) & 1;

You also might consider passing log2(mlen), so the multiplication could be
strength-reduced to a shift.

I don't think so. For example, when mlen is 8 bits and index is 0, it will
reduce to

return (((uint64_t *)v0)[0]) & 1

And it's not right.

The right bit is the first bit in vector register 0, and on a big-endian
host it will be the first bit of the seventh byte.

You've forgotten that we've just done an 8-byte big-endian load, which means
that we *are* looking at the first bit of the byte at offset 7.

It is right.

Yes, that's it.
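Putting Richard's fragment into a complete form (a standalone sketch: v0 is the raw mask register storage, and mlen is the number of mask bits each element occupies):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Treat the mask register v0 as an array of 64-bit units, so the same
 * idx/pos arithmetic works on little- and big-endian hosts (the units
 * themselves are read in host order).  mlen is the number of mask bits
 * per element (8, 16, 32 or 64 in the cases discussed above).
 */
static int vext_elem_mask(const void *v0, int mlen, int index)
{
    int idx = (index * mlen) / 64;
    int pos = (index * mlen) % 64;

    return (((const uint64_t *)v0)[idx] >> pos) & 1;
}
```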
  

You don't need to pass mlen, since it's

Yes.

I finally remembered all of the bits that go into mlen and thought I had
deleted that sentence -- apparently I only removed half.  ;-)


r~





Re: [PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-27 Thread LIU Zhiwei



On 2020/2/28 3:17, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
+{
+int legal = widen ? 2 << s->lmul : 1 << s->lmul;
+
+return !((s->lmul == 0x3 && widen) || (reg % legal));
+}
+
+static bool vext_check_overlap_mask(DisasContext *s, uint32_t vd, bool vm)
+{
+return !(s->lmul > 1 && vm == 0 && vd == 0);
+}
+
+static bool vext_check_nf(DisasContext *s, uint32_t nf)
+{
+return s->lmul * (nf + 1) <= 8;
+}

Some commentary would be good here, quoting the rule being applied.  E.g. "The
destination vector register group for a masked vector instruction can only
overlap the source mask register (v0) when LMUL=1. (Section 5.3)"

Good idea.

+static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+uint8_t nf = a->nf + 1;

Perhaps NF should have the +1 done during decode, so that it cannot be
forgotten here or elsewhere.


Perhaps not. It will not be used elsewhere, and it would need one more
bit in FIELD().

  E.g.

%nf  31:3  !function=ex_plus_1
@r2_nfvm ... ... vm:1 . . ... . ... \
   %nf %rs1 %rd

Where ex_plus_1 is the obvious modification of ex_shift_1().
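For reference, the hook could be as small as this (a sketch; in QEMU the DisasContext comes from translate.c and decodetree calls the function during decode, so the opaque typedef here is only for illustration):

```c
#include <assert.h>

/* Opaque stand-in; the real type lives in target/riscv/translate.c. */
typedef struct DisasContext DisasContext;

/*
 * The "obvious modification of ex_shift_1()": a decodetree !function
 * field translator that folds the +1 into decode, so every trans_*
 * function sees nf already adjusted.
 */
static int ex_plus_1(DisasContext *ctx, int imm)
{
    return imm + 1;
}
```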

+static inline uint32_t vext_nf(uint32_t desc)
+{
+return (simd_data(desc) >> 11) & 0xf;
+}
+
+static inline uint32_t vext_mlen(uint32_t desc)
+{
+return simd_data(desc) & 0xff;
+}
+
+static inline uint32_t vext_vm(uint32_t desc)
+{
+return (simd_data(desc) >> 8) & 0x1;
+}
+
+static inline uint32_t vext_lmul(uint32_t desc)
+{
+return (simd_data(desc) >> 9) & 0x3;
+}

You should use FIELD() to define the fields, and then use FIELD_EX32 and
FIELD_DP32 to reference them.

Nice, I will find some place to define the fields.
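The layout used by the accessors above, written out by hand so it can be checked standalone (in QEMU this would be FIELD()/FIELD_DP32()/FIELD_EX32() from hw/registerfields.h; the bit positions are the ones in the quoted code, while the vdata_* names are made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the per-instruction vector parameters into the simd desc word:
 * MLEN in bits 0-7, VM in bit 8, LMUL in bits 9-10, NF in bits 11-14,
 * matching the shifts and masks in the quoted vext_* accessors.
 */
static uint32_t vdata_pack(uint32_t mlen, uint32_t vm, uint32_t lmul,
                           uint32_t nf)
{
    return (mlen & 0xff) | ((vm & 0x1) << 8) | ((lmul & 0x3) << 9)
           | ((nf & 0xf) << 11);
}

static uint32_t vdata_mlen(uint32_t data) { return data & 0xff; }
static uint32_t vdata_vm(uint32_t data)   { return (data >> 8) & 0x1; }
static uint32_t vdata_lmul(uint32_t data) { return (data >> 9) & 0x3; }
static uint32_t vdata_nf(uint32_t data)   { return (data >> 11) & 0xf; }
```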

+/*
+ * This function checks watchpoint before real load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
+ * In user mode, there is no watchpoint support now.
+ *
+ * It will triggle an exception if there is no mapping in TLB

trigger.

Yes.

+ * and page table walk can't fill the TLB entry. Then the guest
+ * software can return here after process the exception or never return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+const target_ulong curlen = MIN(pagelen, len);
+
+probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
+addr += curlen;
+len -= curlen;
+}
+}
+
+static void probe_write_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+const target_ulong curlen = MIN(pagelen, len);
+
+probe_write(env, addr, curlen, cpu_mmu_index(env, false), ra);
+addr += curlen;
+len -= curlen;
+}
+}

A loop is overkill -- the access cannot span to 3 pages.

Yes, I will just do as you suggest!

In the unit-stride load, without a mask, the maximum access length is
checked; it is 512 bytes.

And the current target page is 4096 bytes:

#define TARGET_PAGE_BITS 12 /* 4 KiB Pages */


These two functions
can be merged using probe_access and MMU_DATA_{LOAD,STORE}.
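A merged helper might look like this (a standalone sketch: probe() stands in for QEMU's probe_access() with an MMU_DATA_{LOAD,STORE} argument, and the at-most-two-probes shape relies on the access being no larger than a page, per the 512-byte bound above):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t target_ulong;
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK ((target_ulong)-1 << TARGET_PAGE_BITS)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Records the probed ranges so the splitting arithmetic is checkable. */
static target_ulong probed_addr[2];
static target_ulong probed_len[2];
static int probed_cnt;

static void probe(target_ulong addr, target_ulong len)
{
    probed_addr[probed_cnt] = addr;
    probed_len[probed_cnt] = len;
    probed_cnt++;
}

/*
 * An access of at most one page length can span at most two pages, so
 * the while loop collapses to at most two probe calls: one for the part
 * up to the page boundary, one for the tail on the next page.
 */
static void probe_pages(target_ulong addr, target_ulong len)
{
    target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
    target_ulong curlen = MIN(pagelen, len);

    probe(addr, curlen);
    if (len > curlen) {
        probe(addr + curlen, len - curlen);
    }
}
```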


+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+/*
+ * Split the remaining range to two parts.
+ * The first part is in the last uint64_t unit.
+ * The second part start from the next uint64_t unit.
+ */
+int part1 = 0, part2 = tot - cnt;
+if (cnt % 64) {
+part1 = 64 - (cnt % 64);
+part2 = tot - cnt - part1;
+memset(tail & ~(63ULL), 0, part1);
+memset((tail + 64) & ~(63ULL), 0, part2);

You're confusing bit and byte offsets -- cnt and tot are both byte offsets.

Yes, I will fix it.

+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+
+int idx = (index * mlen) / 8;
+int pos = (index * mlen) % 8;
+
+switch (mlen) {
+case 8:
+return *((uint8_t *)v0 + H1(index)) & 0x1;
+case 16:
+return *((uint16_t *)v0 + H2(index)) & 0x1;
+case 32:
+return *((uint32_t *)v0 + H4(index)) & 0x1;
+case 64:
+return *((uint64_t *)v0 + index) & 0x1;
+default:
+return (*((uint8_t *)v0 + H1(idx)) >> pos) & 0x1;
+}

This is not what I had in mind, and looks wrong as well.

 int idx = (index * mlen) / 64;
 int pos = (index * mlen) % 64;
 return (((uint64_t *)v0)[idx] >> pos) & 1;

You also might consider passing log2(mlen), so the multiplication could be
strength-reduced to a shift.
I don't think so. For example, when mlen is 8 bits and 

Re: [PATCH v5 4/4] target/riscv: add vector configure instruction

2020-02-26 Thread LIU Zhiwei




On 2020/2/27 3:20, Alistair Francis wrote:

  On Fri, Feb 21, 2020 at 1:45 AM LIU Zhiwei  wrote:

vsetvl and vsetvli are two configuration instructions for vl and vtype.
TB flags should be updated after these configuration instructions. The
(ill, lmul, sew) fields of vtype and a (VSTART == 0 && VL == VLMAX) bit
will be placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
  MAINTAINERS |  1 +
  target/riscv/Makefile.objs  |  2 +-
  target/riscv/cpu.h  | 61 +++---
  target/riscv/helper.h   |  2 +
  target/riscv/insn32.decode  |  5 ++
  target/riscv/insn_trans/trans_rvv.inc.c | 69 +
  target/riscv/translate.c| 17 +-
  target/riscv/vector_helper.c| 53 +++
  8 files changed, 199 insertions(+), 11 deletions(-)
  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
  create mode 100644 target/riscv/vector_helper.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1740a4fddc..cd2e200db9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -266,6 +266,7 @@ M: Palmer Dabbelt 
  M: Alistair Francis 
  M: Sagar Karandikar 
  M: Bastian Koppelmann 
+M: LIU Zhiwei 

I don't think you should add yourself here. MAINTAINERS is more for
people doing active patch review.

OK.

RISC-V QEMU can really do with more maintainers though, so if you do
want to be involved you could help review patches.
Actually my main job is to maintain and develop QEMU code, so I'd like
to review target/riscv code; however, the vector upstream work takes a
lot of time.

  L: qemu-ri...@nongnu.org
  S: Supported
  F: target/riscv/
diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
vector_helper.o gdbstub.o
  obj-$(CONFIG_SOFTMMU) += pmp.o

  ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 748bd557f9..f7003edb86 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
  #define RISCV_CPU_H

  #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
  #include "exec/cpu-defs.h"
  #include "fpu/softfloat-types.h"

@@ -98,6 +99,12 @@ typedef struct CPURISCVState CPURISCVState;

  #define RV_VLEN_MAX 512

+FIELD(VTYPE, LMUL, 0, 2)

Shouldn't this be VLMUL?

OK. The same with VSEW and VEDIV.



+FIELD(VTYPE, SEW, 2, 3)

VSEW?


+FIELD(VTYPE, EDIV, 5, 2)

VEDIV?


+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
  struct CPURISCVState {
  target_ulong gpr[32];
  uint64_t fpr[32]; /* assume both F and D extensions */
@@ -302,16 +309,59 @@ void riscv_cpu_set_fflags(CPURISCVState *env, 
target_ulong);
  #define TB_FLAGS_MMU_MASK   3
  #define TB_FLAGS_MSTATUS_FS MSTATUS_FS

+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"

Why do you need this? Shouldn't the TB_FLAGS fields work without this.

Because env_archcpu in cpu_get_tb_cpu_state will use it.

+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)

These should probably be defined with the other TB_FLAGS (or if you
need them here you can move the others up here).

I'd like to put the other TB_FLAGS in a separate patch.



+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, SEW);
+lmul = FIELD_EX64(vtype, VTYPE, LMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);

Shouldn't we assert this isn't over RV_VLEN_MAX?

I don't think so. VLEN is the vector register length in bits. It is
checked against RV_VLEN_MAX in the cpu realize function. If it is over
RV_VLEN_MAX, QEMU exits before translating any TB.

Zhiwei
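As a sanity check of the simplification in the quoted comment, the formula can be recomputed standalone (vlen in bits; sew and lmul are the raw 3- and 2-bit vtype fields, so element width is 8 << sew bits and the group is 1 << lmul registers):

```c
#include <assert.h>
#include <stdint.h>

/*
 * VLMAX = (1 << LMUL) * VLEN / (8 * (1 << SEW))
 *       = VLEN >> (SEW + 3 - LMUL)
 */
static uint32_t vext_get_vlmax(uint32_t vlen, uint8_t sew, uint8_t lmul)
{
    return vlen >> (sew + 3 - lmul);
}
```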



Alistair


+}
+
  static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t 
*pflags)
  {
+uint32_t flags = 0;
+
  *pc = env->pc;
  *cs_base = 0;
+
+if (env->misa & RVV) {
+uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vtype, VTYPE, VILL));
+flags = F

Re: [PATCH v5 3/4] target/riscv: support vector extension csr

2020-02-26 Thread LIU Zhiwei




On 2020/2/27 2:42, Alistair Francis wrote:

On Fri, Feb 21, 2020 at 1:45 AM LIU Zhiwei  wrote:

The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Signed-off-by: LIU Zhiwei 
---
  target/riscv/cpu_bits.h | 15 +
  target/riscv/csr.c  | 75 -
  2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e99834856c..1f588ebc14 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
  #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
  #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)

+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)

Shouldn't these be FSCR_*?
Like the other fields in fcsr, they have all been named FSR_*, so I just
named these the same way.

+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)

Same here, FCSR_*


+
  /* Control and Status Registers */

  /* User Trap Setup */
@@ -48,6 +56,13 @@
  #define CSR_FRM 0x002
  #define CSR_FCSR0x003

+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
  /* User Timers and Counters */
  #define CSR_CYCLE   0xc00
  #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0e34c292c5..9cd2b418bf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
  static int fs(CPURISCVState *env, int csrno)
  {
  #if !defined(CONFIG_USER_ONLY)
+/* loose check condition for fcsr in vector extension */
+if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+return 0;
+}
  if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
  return -1;
  }
@@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
  return 0;
  }

+static int vs(CPURISCVState *env, int csrno)
+{
+if (env->misa & RVV) {
+return 0;
+}
+return -1;
+}
+
  static int ctr(CPURISCVState *env, int csrno)
  {
  #if !defined(CONFIG_USER_ONLY)
@@ -160,6 +172,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
  #endif
  *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
  | (env->frm << FSR_RD_SHIFT);
+if (vs(env, csrno) >= 0) {
+*val |= (env->vxrm << FSR_VXRM_SHIFT)
+| (env->vxsat << FSR_VXSAT_SHIFT);
+}
  return 0;
  }

@@ -172,10 +188,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, 
target_ulong val)
  env->mstatus |= MSTATUS_FS;
  #endif
  env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+if (vs(env, csrno) >= 0) {
+env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
+}
  riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
  return 0;
  }

+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxrm;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxsat;
+return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vstart;
+return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxrm = val;
+return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxsat = val;
+return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vstart = val;
+return 0;
+}

Can you keep these in read/write order? So read_vxrm() then
write_vxrm() for example.

OK.

Otherwise the patch looks good :)

Alistair


+
  /* User Timers and Counters */
  static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
  {
@@ -877,7 +945,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
  [CSR_FFLAGS] =  { fs,   read_fflags,  write_fflags  },
  [CSR_FRM] = { fs,   read_frm, write_frm },
  [CSR_FCSR] ={ fs,   read_fcsr,write_fcsr},
-
+/* Vector CSRs */
+[CSR_VSTART] =  { vs,   rea

Re: [PATCH v4 2/5] target/riscv: add vector stride load and store instructions

2020-02-27 Thread LIU Zhiwei




On 2020/2/28 3:36, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+GEN_VEXT_LD_ELEM(vlsb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlsh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlsw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlsw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlse_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlse_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlsbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlshu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)

Why do you need to define new functions identical to the old ones?
Are you
doing this just to make the names match up?

Yes, just to make the names match up, so I can use GEN_VEXT_ST_STRIDE
to generate the code. Perhaps adding a parameter to GEN_VEXT_ST_STRIDE
would be enough.




+GEN_VEXT_ST_ELEM(vssb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vssb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vssb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vssb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vssh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vssh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vssh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vssw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vssw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)

Likewise.


+static void vext_st_stride(void *vd, void *v0, target_ulong base,
+target_ulong stride, CPURISCVState *env, uint32_t desc,
+vext_st_elem_fn st_elem, uint32_t esz, uint32_t msz, uintptr_t ra)
+{
+uint32_t i, k;
+uint32_t nf = vext_nf(desc);
+uint32_t vm = vext_vm(desc);
+uint32_t mlen = vext_mlen(desc);
+uint32_t vlmax = vext_maxsz(desc) / esz;
+
+/* probe every access*/
+for (i = 0; i < env->vl; i++) {
+if (!vm && !vext_elem_mask(v0, mlen, i)) {
+continue;
+}
+probe_write_access(env, base + stride * i, nf * msz, ra);
+}
+/* store bytes to guest memory */
+for (i = 0; i < env->vl; i++) {
+k = 0;
+if (!vm && !vext_elem_mask(v0, mlen, i)) {
+continue;
+}
+while (k < nf) {
+target_ulong addr = base + stride * i + k * msz;
+st_elem(env, addr, i + k * vlmax, vd, ra);
+k++;
+}
+}
+}

Similar comments wrt unifying the load and store helpers.

I'll also note that vext_st_stride and vext_st_us_mask could be unified by
passing sizeof(ETYPE) as stride, and vm = true as a parameter.

Good idea. Thanks.

Zhiwei
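The unification could be sketched like this (standalone, with plain arrays in place of the env/desc/ra plumbing; st_stride_u32 and st_us_u32 are made-up names for the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * One strided store loop serves both forms: unit-stride is just
 * stride == sizeof(ETYPE) with the mask forced active (vm == true).
 */
static void st_stride_u32(const uint32_t *vd, const uint8_t *mask, bool vm,
                          uint8_t *base, ptrdiff_t stride, uint32_t vl)
{
    for (uint32_t i = 0; i < vl; i++) {
        if (!vm && !mask[i]) {
            continue;  /* element masked off */
        }
        memcpy(base + stride * i, &vd[i], sizeof(uint32_t));
    }
}

/* The unit-stride store falls out by fixing the stride and the mask. */
static void st_us_u32(const uint32_t *vd, uint8_t *base, uint32_t vl)
{
    st_stride_u32(vd, NULL, true, base, sizeof(uint32_t), vl);
}
```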



r~





Re: [PATCH v4 5/5] target/riscv: add vector amo operations

2020-02-28 Thread LIU Zhiwei




On 2020/2/28 13:38, Richard Henderson wrote:

On 2/25/20 2:35 AM, LIU Zhiwei wrote:

+if (s->sew < 2) {
+return false;
+}

This could just as easily be in amo_check?

Yes, it can be done in amo_check.

+
+if (tb_cflags(s->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC64
+fn = fns[0][seq][s->sew - 2];
+#else
+gen_helper_exit_atomic(cpu_env);
+s->base.is_jmp = DISAS_NORETURN;
+return true;
+#endif

Why are you raising exit_atomic without first checking that s->sew == 3?

Yes, it should be

    if (s->sew == 3) {

#ifdef CONFIG_ATOMIC64
fn = fns[0][seq][1];
#else
gen_helper_exit_atomic(cpu_env);
s->base.is_jmp = DISAS_NORETURN;
return true;
#endif

    } else {
#ifdef TARGET_RISCV64

fn = fns[0][seq][0];
#else
fn = fns[0][seq];
#endif

  }
Is it OK?

We
can do 32-bit atomic operations always.

Good.



+} else {
+fn = fns[1][seq][s->sew - 2];
+}
+if (fn == NULL) {
+return false;
+}
+
+return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+static bool amo_check(DisasContext *s, arg_rwdvm* a)
+{
+return (vext_check_isa_ill(s, RVV | RVA) &&
+(a->wd ? vext_check_overlap_mask(s, a->rd, a->vm) : 1) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false));
+}

I guess the "If SEW is greater than XLEN, an illegal instruction exception is
raised" requirement is currently in the column of NULLs in the !CONFIG_RISCV64
block.  But it might be better to have it explicit and save the column of NULLs.

Maybe add

    (1 << s->sew) <= sizeof(target_ulong) &&

in amo_check?

It makes sense to me to do both sew checks together, whether in amo_check or in
amo_op.


+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ETYPE, MTYPE, H, DO_OP, SUF)  \
+static void vext_##NAME##_noatomic_op(void *vs3, target_ulong addr,  \
+uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr)\
+{\
+ETYPE ret;   \
+target_ulong tmp;\
+int mmu_idx = cpu_mmu_index(env, false); \
+tmp = cpu_ld##SUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);  \
+ret = DO_OP((ETYPE)(MTYPE)tmp, *((ETYPE *)vs3 + H(idx)));\
+cpu_st##SUF##_mmuidx_ra(env, addr, ret, mmu_idx, retaddr);   \
+if (wd) {\
+*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \

The target_long cast is wrong; should be ETYPE.
"If the AMO memory element width is less than SEW, the value returned from memory is sign-extended to fill SEW."

So just use (target_long) to sign-extend. As you see, instructions like vamominud have uint64_t as ETYPE, and the value from memory can't be sign-extended by (ETYPE)(MTYPE)tmp.

You can use cpu_ldX/stX_data (no mmu_idx or retaddr argument).  There should be
no faults, since you've already checked for read+write.

Good idea.

+/* atomic operation for vector atomic instructions */
+#ifndef CONFIG_USER_ONLY
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)   \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,\
+uint32_t wd, uint32_t idx, CPURISCVState *env)   \
+{\
+target_ulong tmp;\
+int mem_idx = cpu_mmu_index(env, false); \
+tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx)),  \
+make_memop_idx(MO_ALIGN | MOFLAG, mem_idx)); \
+if (wd) {\
+*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \
+}\
+}
+#else
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)   \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,\
+uint32_t wd, uint32_t idx, CPURISCVState *env)   \
+{\
+target_ulong tmp;\
+tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx))); \
+if (wd) {\
+*((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;  \
+}\
+}
+#endif

This is not right.  It is not legal to call these helpers from another

[PATCH v4 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-25 Thread LIU Zhiwei
Vector unit-stride operations access elements stored contiguously in memory,
starting from the base effective address.

The Zvlsseg extension adds vector load/store segment instructions, which move
multiple contiguous fields in memory to and from consecutively numbered
vector registers.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  70 
 target/riscv/insn32.decode  |  17 +
 target/riscv/insn_trans/trans_rvv.inc.c | 188 +++
 target/riscv/translate.c|   2 +
 target/riscv/vector_helper.c| 404 
 5 files changed, 681 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..996639c0fa 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,73 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5dc009c3cd..dad3ed91c7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6 +43,7

[PATCH v4 5/5] target/riscv: add vector amo operations

2020-02-25 Thread LIU Zhiwei
Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  56 +
 target/riscv/insn32-64.decode   |  11 +
 target/riscv/insn32.decode  |  13 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 154 +
 target/riscv/vector_helper.c| 280 +++-
 5 files changed, 513 insertions(+), 1 deletion(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72ba4d9bdb..cbe0d107c0 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,59 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_a,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_a,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_a,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_a,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_a, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . . . 011 . 010 @atom_st
 amominu_d  11000 . . . . 011 . 010 @atom_st
 amomaxu_d  11100 . . . . 011 . 010 @atom_st
 
+#*** Vector AMO operations (in addition to Zvamo

[PATCH v4 2/5] target/riscv: add vector stride load and store instructions

2020-02-25 Thread LIU Zhiwei
Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  35 +
 target/riscv/insn32.decode  |  14 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 117 +
 target/riscv/vector_helper.c| 166 
 4 files changed, 332 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 996639c0fa..87dfa90609 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -148,3 +148,38 @@ DEF_HELPER_5(vse_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vse_v_w_mask, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vse_v_d, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vse_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_6(vlsb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlsbu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlshu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlswu_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssb_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssh_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vssw_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dad3ed91c7..2f2d3d13b3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
  shamt rs1 rd
 aq rl rs2 rs1 rd
 vm rd rs1 nf
+ vm rd rs1 rs2 nf
 
 # Formats 32:
@r   ...   . . ... . ... %rs2 %rs1 %rd
@@ -64,6 +65,7 @@
 @r2_rm   ...   . . ... . ... %rs1 %rm %rd
 @r2  ...   . . ... . ... %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 . . ... . ...  %rs1 %rd
+@r_nfvm  nf:3 ... vm:1 . . ... . ...  %rs2 %rs1 %rd
 @r2_zimm . zimm:11  . ... . ... %rs1 %rd
 
 @sfence_vma ... . .   ... . ... %rs2 %rs1
@@ -222,6 +224,18 @@ vsh_v  ... 000 . 0 . 101 . 0100111 @r2_nfvm
 vsw_v  ... 000 . 0 . 110 . 0100111 @r2_nfvm
 vse_v  ... 000 . 0 . 111 . 0100111 @r2_nfvm
 
+vlsb_v ... 110 . . . 000 . 111 @r_nfvm
+vlsh_v ... 110 . . . 101 . 111 @r_nfvm
+vlsw_v ... 110 . . . 110 . 111 @r_nfvm
+vlse_v ... 010 . . . 111 . 111 @r_nfvm
+vlsbu_v... 010 . . . 000 . 111 @r_nfvm
+vlshu_v... 010 . . . 101 . 111 @r_nfvm
+vlswu_v... 010 . . . 110 . 111 @r_nfvm
+vssb_v ... 010 . . . 000 . 0100111 @r_nfvm
+vssh_v ... 010 . . . 101 . 0100111 @r_nfvm
+vssw_v ... 010 . . . 110 . 0100111 @r_nfvm
+vsse_v ... 010 . . . 111 . 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index b0e97e7e06..1b627dc880 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -255,3 +255,120 @@ GEN_VEXT_TRANS

[PATCH v4 4/5] target/riscv: add fault-only-first unit stride load

2020-02-25 Thread LIU Zhiwei
The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions (while loops).
These instructions execute as a regular load except that they
will only take a trap on element 0.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  22 
 target/riscv/insn32.decode  |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  66 
 target/riscv/vector_helper.c| 136 
 4 files changed, 231 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f9b3da60ca..72ba4d9bdb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6a363a6b7e..973ac63fda 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -219,6 +219,13 @@ vle_v  ... 000 . 0 . 111 . 111 @r2_nfvm
 vlbu_v ... 000 . 0 . 000 . 111 @r2_nfvm
 vlhu_v ... 000 . 0 . 101 . 111 @r2_nfvm
 vlwu_v ... 000 . 0 . 110 . 111 @r2_nfvm
+vlbff_v... 100 . 1 . 000 . 111 @r2_nfvm
+vlhff_v... 100 . 1 . 101 . 111 @r2_nfvm
+vlwff_v... 100 . 1 . 110 . 111 @r2_nfvm
+vleff_v... 000 . 1 . 111 . 111 @r2_nfvm
+vlbuff_v   ... 000 . 1 . 000 . 111 @r2_nfvm
+vlhuff_v   ... 000 . 1 . 101 . 111 @r2_nfvm
+vlwuff_v   ... 000 . 1 . 110 . 111 @r2_nfvm
 vsb_v  ... 000 . 0 . 000 . 0100111 @r2_nfvm
 vsh_v  ... 000 . 0 . 101 . 0100111 @r2_nfvm
 vsw_v  ... 000 . 0 . 110 . 0100111 @r2_nfvm
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index c0d560d789..dda3ba555c 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -510,3 +510,69 @@ static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
 {
 return trans_vsxe_v(s, a);
 }
+
+/*
+ *** unit stride fault-only-first load
+ */
+static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+gen_helper_ldst_us *fn, DisasContext *s)
+{
+TCGv_ptr dest, mask;
+TCGv base;
+TCGv_i32 desc;
+
+dest = tcg_temp_new_ptr();
+mask = tcg_temp_new_ptr();
+base = tcg_temp_new();
+desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+gen_get_gpr(base, rs1);
+tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+fn(dest, mask, base, cpu_env, desc);
+
+tcg_temp_free_ptr(dest);
+tcg_temp_free_ptr(mask);
+tcg_temp_free(base);
+tcg_temp_free_i32(desc);
+return true;
+}
+
+static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+uint8_t nf = a->nf + 1;
+uint32_t data = s->mlen | (a->vm << 8) | (s->lmul << 9) | (nf << 11);
+gen_helper_ldst_us *fn;
+static gen_helper_ldst_us * const fns[7][4] = {
+{ gen_helper_vlbff_v_b,  gen_helper_vlbff_v_h,
+  gen_helper_vlbff_v_w,  gen_helper_vlbff_v_d },
+{ NULL,  gen_helper_vlhff_v_h,
+  gen_helper_vlhff_v_w,  gen_helper_vlhff_v_d },
+{ NULL,  NULL,
+  gen_helper_vlwff_v_w,  gen_helper_vlwff_v_d },
+{ gen_helper_vleff_v_b,  gen_helper_vleff_v_h,
+  gen_helper_vleff_v_w,  gen_helper_vleff_v_d },

[PATCH v4 0/5] target/riscv: support vector extension part 2

2020-02-25 Thread LIU Zhiwei
Features:
  * support specification riscv-v-spec-0.7.1.
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * not support Zvediv as it is changing.
  * fixed SLEN 128bit.
  * element width support 8bit, 16bit, 32bit, 64bit.

Changelog:
v4
  * remove check structure, use check function directly
  * use (s->vlen / 8) as maxsz in simd_maxsz
  * remove helper structure vext_ctx, pass args directly.
v3
  * move check code from execution time to translation time.
  * probe pages before real load or store access.
  * use probe_page_check for no-fault operations in linux user mode.
  * add atomic and noatomic operation for vector amo instructions.
V2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (5):
  target/riscv: add vector unit stride load and store instructions
  target/riscv: add vector stride load and store instructions
  target/riscv: add vector index load and store instructions
  target/riscv: add fault-only-first unit stride load
  target/riscv: add vector amo operations

 target/riscv/helper.h   |  218 
 target/riscv/insn32-64.decode   |   11 +
 target/riscv/insn32.decode  |   67 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  663 +
 target/riscv/translate.c|2 +
 target/riscv/vector_helper.c| 1203 +++
 6 files changed, 2164 insertions(+)

-- 
2.23.0




[PATCH v4 3/5] target/riscv: add vector index load and store instructions

2020-02-25 Thread LIU Zhiwei
Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  35 
 target/riscv/insn32.decode  |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 138 +++
 target/riscv/vector_helper.c| 219 
 4 files changed, 408 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 87dfa90609..f9b3da60ca 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2f2d3d13b3..6a363a6b7e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -236,6 +236,22 @@ vssh_v ... 010 . . . 101 . 0100111 @r_nfvm
 vssw_v ... 010 . . . 110 . 0100111 @r_nfvm
 vsse_v ... 010 . . . 111 . 0100111 @r_nfvm
 
+vlxb_v ... 111 . . . 000 . 111 @r_nfvm
+vlxh_v ... 111 . . . 101 . 111 @r_nfvm
+vlxw_v ... 111 . . . 110 . 111 @r_nfvm
+vlxe_v ... 011 . . . 111 . 111 @r_nfvm
+vlxbu_v... 011 . . . 000 . 111 @r_nfvm
+vlxhu_v... 011 . . . 101 . 111 @r_nfvm
+vlxwu_v... 011 . . . 110 . 111 @r_nfvm
+vsxb_v ... 011 . . . 000 . 0100111 @r_nfvm
+vsxh_v ... 011 . . . 101 . 0100111 @r_nfvm
+vsxw_v ... 011 . . . 110 . 0100111 @r_nfvm
+vsxe_v ... 011 . . . 111 . 0100111 @r_nfvm
+vsuxb_v... 111 . . . 000 . 0100111 @r_nfvm
+vsuxh_v... 111 . . . 101 . 0100111 @r_nfvm
+vsuxw_v... 111 . . . 110 . 0100111 @r_nfvm
+vsuxe_v... 111 . . . 111 . 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 1b627dc880..c0d560d789 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -372,3 +372,141 @@ GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
+
+/*
+ *** index load and store
+ */
+typedef void gen_helper_ldst_index(TCGv_ptr, TCGv_ptr

Re: [PATCH v4 2/4] target/riscv: configure and turn on vector extension from command line

2020-02-18 Thread LIU Zhiwei

Hi, Alistair

On 2020/2/19 6:34, Alistair Francis wrote:

On Mon, Feb 10, 2020 at 12:12 AM LIU Zhiwei  wrote:

The vector extension is on by default only for the "any" cpu. It can be turned
on by the command line "-cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1".

vlen is the vector register length; the default value is 128 bits.
elen is the max operator size in bits; the default value is 64 bits.
vext_spec is the vector specification version; the default value is v0.7.1.
These properties and the cpu can be specified with other values.

Signed-off-by: LIU Zhiwei 

This looks fine to me. Shouldn't this be the last patch though?

Yes, it should be the last patch.

As in
once the vector extension has been added to QEMU you can turn it on
from the command line. Right now this turns it on but it isn't
implemented.

Maybe I should just add the fields to the RISCVCPU structure, and not turn the
vector extension on or add configuration properties until the implementation
is ready.


It's still a little awkward as the reviewers will not be able to test the patch until the last patch.


Alistair


---
  target/riscv/cpu.c | 48 --
  target/riscv/cpu.h |  8 
  2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8c86ebc109..95fdb6261e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -98,6 +98,11 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
  env->priv_ver = priv_ver;
  }

+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
  static void set_feature(CPURISCVState *env, int feature)
  {
  env->features |= (1ULL << feature);
@@ -113,7 +118,7 @@ static void set_resetvec(CPURISCVState *env, int resetvec)
  static void riscv_any_cpu_init(Object *obj)
  {
CPURISCVState *env = &RISCV_CPU(obj)->env;
-set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
+set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU | RVV);
  set_priv_version(env, PRIV_VERSION_1_11_0);
  set_resetvec(env, DEFAULT_RSTVEC);
  }
@@ -320,6 +325,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
CPURISCVState *env = &cpu->env;
  RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
  int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
  target_ulong target_misa = 0;
  Error *local_err = NULL;

@@ -343,8 +349,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  return;
  }
  }
-
+if (cpu->cfg.vext_spec) {
+if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
+vext_version = VEXT_VERSION_0_07_1;
+} else {
+error_setg(errp,
+   "Unsupported vector spec version '%s'",
+   cpu->cfg.vext_spec);
+return;
+}
+}
  set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
  set_resetvec(env, DEFAULT_RSTVEC);

  if (cpu->cfg.mmu) {
@@ -409,6 +425,30 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  if (cpu->cfg.ext_u) {
  target_misa |= RVU;
  }
+if (cpu->cfg.ext_v) {
+target_misa |= RVV;
+if (!is_power_of_2(cpu->cfg.vlen)) {
+error_setg(errp,
+   "Vector extension VLEN must be power of 2");
+return;
+}
+if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
+error_setg(errp,
+   "Vector extension implementation only supports VLEN "
+   "in the range [128, %d]", RV_VLEN_MAX);
+return;
+}
+if (!is_power_of_2(cpu->cfg.elen)) {
+error_setg(errp,
+   "Vector extension ELEN must be power of 2");
+return;
+}
+if (cpu->cfg.elen > 64) {
+error_setg(errp,
+   "Vector extension ELEN must <= 64");
+return;
+}
+}

  set_misa(env, RVXLEN | target_misa);
  }
@@ -444,10 +484,14 @@ static Property riscv_cpu_properties[] = {
  DEFINE_PROP_BOOL("c", RISCVCPU, cfg.ext_c, true),
  DEFINE_PROP_BOOL("s", RISCVCPU, cfg.ext_s, true),
  DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
+DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
  DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
  DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
  DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
  DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_sp

Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-19 Thread LIU Zhiwei

Hi Richard,

Thanks for your informative comments. I'm addressing them now, though I'm a little confused by some of them.
On 2020/2/12 14:38, Richard Henderson wrote:

On 2/9/20 11:42 PM, LIU Zhiwei wrote:

+/*
+ * As simd_desc supports at most 256 bytes, and in this implementation,
+ * the max vector group length is 2048 bytes. So split it into two parts.
+ *
+ * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
+ * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
+ */
+static uint32_t maxsz_part1(uint32_t maxsz)
+{
+return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
+}
+
+static uint32_t maxsz_part2(uint32_t maxsz)
+{
+return (maxsz & 0x3f) >> 3;
+}

I would much rather adjust simd_desc to support 2048 bytes.

I've just posted a patch set that removes an assert in target/arm that would
trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.

Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
simd_desc at all, and just pass vlen, unencoded.


+/* define check conditions data structure */
+struct vext_check_ctx {
+
+struct vext_reg {
+uint8_t reg;
+bool widen;
+bool need_check;
+} check_reg[6];
+
+struct vext_overlap_mask {
+uint8_t reg;
+uint8_t vm;
+bool need_check;
+} check_overlap_mask;
+
+struct vext_nf {
+uint8_t nf;
+bool need_check;
+} check_nf;
+target_ulong check_misa;
+
+} vchkctx;

You cannot use a global variable.  The data must be thread-safe.

If we're going to do the checks this way, with a structure, it needs to be on
the stack or within DisasContext.


+#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)\
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)  \
+{ \
+vchkctx.check_misa = RVV; \
+vchkctx.check_overlap_mask.need_check = true; \
+vchkctx.check_overlap_mask.reg = a->rd;   \
+vchkctx.check_overlap_mask.vm = a->vm;\
+vchkctx.check_reg[0].need_check = true;   \
+vchkctx.check_reg[0].reg = a->rd; \
+vchkctx.check_reg[0].widen = false;   \
+vchkctx.check_nf.need_check = true;   \
+vchkctx.check_nf.nf = a->nf;  \
+  \
+if (!vext_check(s)) { \
+return false; \
+} \
+return DO_OP(s, a, SEQ);  \
+}

I don't see the improvement from a pointer.  Something like

 if (vext_check_isa_ill(s) &&
 vext_check_overlap(s, a->rd, a->rm) &&
 vext_check_reg(s, a->rd, false) &&
 vext_check_nf(s, a->nf)) {
 return DO_OP(s, a, SEQ);
 }
 return false;

seems just as clear without the extra data.


+#ifdef CONFIG_USER_ONLY
+#define MO_SB 0
+#define MO_LESW 0
+#define MO_LESL 0
+#define MO_LEQ 0
+#define MO_UB 0
+#define MO_LEUW 0
+#define MO_LEUL 0
+#endif

What is this for?  We already define these unconditionally.



+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+int idx = (index * mlen) / 8;
+int pos = (index * mlen) % 8;
+
+return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
+}

This is a little-endian indexing of the mask.  Just above we talk about using a
host-endian ordering of uint64_t.

Thus this must be based on uint64_t instead of uint8_t.


+/*
+ * This function checks for watchpoints before the actual load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for the watchpoint
+ * check. In user mode, there is no watchpoint support for now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk can't fill the TLB entry. The guest
+ * software can then return here after processing the exception,
+ * or never return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+target_ulong len, uintptr_t ra)
+{
+while (len) {
+const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+const target_ulong curlen = MIN(pagelen, len);
+
+probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);

The return value here is non-null when we can read directly from host memory.
It would be a shame to throw that work away.



+/* data structure and common functions for load and store */
+typedef void vext_ld_elem_fn(CPURI

[PATCH v5 4/4] target/riscv: add vector configure instruction

2020-02-21 Thread LIU Zhiwei
vsetvl and vsetvli are the two configuration instructions for vl and vtype.
TB flags must be updated after these instructions execute. The (vill, lmul,
sew) fields of vtype and a bit recording (VSTART == 0 && VL == VLMAX) are
placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
 MAINTAINERS |  1 +
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.h  | 61 +++---
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +++
 8 files changed, 199 insertions(+), 11 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 1740a4fddc..cd2e200db9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -266,6 +266,7 @@ M: Palmer Dabbelt 
 M: Alistair Francis 
 M: Sagar Karandikar 
 M: Bastian Koppelmann 
+M: LIU Zhiwei 
 L: qemu-ri...@nongnu.org
 S: Supported
 F: target/riscv/
diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 748bd557f9..f7003edb86 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -98,6 +99,12 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, LMUL, 0, 2)
+FIELD(VTYPE, SEW, 2, 3)
+FIELD(VTYPE, EDIV, 5, 2)
+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -302,16 +309,59 @@ void riscv_cpu_set_fflags(CPURISCVState *env, 
target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, SEW);
+lmul = FIELD_EX64(vtype, VTYPE, LMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t 
*pflags)
 {
+uint32_t flags = 0;
+
 *pc = env->pc;
 *cs_base = 0;
+
+if (env->misa & RVV) {
+uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vtype, VTYPE, VILL));
+flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+FIELD_EX64(env->vtype, VTYPE, SEW));
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+FIELD_EX64(env->vtype, VTYPE, LMUL));
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+} else {
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+}
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags |= TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
+flags |= cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -352,9 +402,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations 
*ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76

[PATCH v5 1/4] target/riscv: add vector extension field in CPURISCVState

2020-02-21 Thread LIU Zhiwei
The 32 vector registers are viewed as a single continuous memory block.
This avoids the conversion between an element index and a (regno, offset)
pair: elements can be accessed directly by their byte offset from the
first vector register's base address.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index de0a8d893a..2e8d01c155 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -93,9 +94,20 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 512
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
+
+/* vector coprocessor state. */
+uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
+target_ulong vxrm;
+target_ulong vxsat;
+target_ulong vl;
+target_ulong vstart;
+target_ulong vtype;
+
 target_ulong pc;
 target_ulong load_res;
 target_ulong load_val;
-- 
2.23.0




[PATCH v5 2/4] target/riscv: implementation-defined constant parameters

2020-02-21 Thread LIU Zhiwei
vlen is the vector register length in bits.
elen is the max element size in bits.
vext_spec is the vector specification version, default value is v0.7.1.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 7 +++
 target/riscv/cpu.h | 5 +
 2 files changed, 12 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8c86ebc109..6900714432 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -98,6 +98,11 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
 env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
 env->features |= (1ULL << feature);
@@ -320,6 +325,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = 0;
 Error *local_err = NULL;
 
@@ -345,6 +351,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 
 set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
 set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 2e8d01c155..748bd557f9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -83,6 +83,8 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
+#define VEXT_VERSION_0_07_1 0x0701
+
 #define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
@@ -117,6 +119,7 @@ struct CPURISCVState {
 target_ulong badaddr;
 
 target_ulong priv_ver;
+target_ulong vext_ver;
 target_ulong misa;
 target_ulong misa_mask;
 
@@ -231,6 +234,8 @@ typedef struct RISCVCPU {
 
 char *priv_spec;
 char *user_spec;
+uint16_t vlen;
+uint16_t elen;
 bool mmu;
 bool pmp;
 } cfg;
-- 
2.23.0




[PATCH v5 0/4] target-riscv: support vector extension part 1

2020-02-21 Thread LIU Zhiwei
This is the first part of the v5 patchset. The v5 changelog below covers
only part 1.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * not support Zvediv as it is changing.
  * SLEN always equals VLEN.
  * element width support 8bit, 16bit, 32bit, 64bit.

Changelog:

v5
  * vector registers as direct fields in RISCVCPUState.
  * mov the properties to last patch.
  * check RVV in vs().
  * check if rs1 is x0 in vsetvl/vsetvli.
  * check VILL, EDIV, RESERVED fields in vsetvl.
v4
  * adjust max vlen to 512 bits.
  * check maximum on elen(64bits).
  * check minimum on vlen(128bits).
  * check if rs1 is x0 in vsetvl/vsetvli.
  * use gen_goto_tb in vsetvli instead of exit_tb.
  * fix: fetch vlmax from rs2, not env->vext.vtype.
v3
  * support VLEN configure from qemu command line.
  * support ELEN configure from qemu command line.
  * support vector specification version configure from qemu command line.
  * only default on for "any" cpu, others turn on from command line.
  * use a continuous memory block for the vector register description.
V2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (4):
  target/riscv: add vector extension field in CPURISCVState
  target/riscv: implementation-defined constant parameters
  target/riscv: support vector extension csr
  target/riscv: add vector configure instruction

 MAINTAINERS |  1 +
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.c  |  7 +++
 target/riscv/cpu.h  | 78 ++---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 75 +++-
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 ++
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +
 11 files changed, 312 insertions(+), 12 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

-- 
2.23.0




[PATCH v5 3/4] target/riscv: support vector extension csr

2020-02-21 Thread LIU Zhiwei
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 75 -
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e99834856c..1f588ebc14 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector fixed-point rounding mode */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM 0x002
 #define CSR_FCSR0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE   0xc00
 #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0e34c292c5..9cd2b418bf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
+/* relaxed check for fcsr when the vector extension is present */
+if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+return 0;
+}
 if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
@@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
 return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+if (env->misa & RVV) {
+return 0;
+}
+return -1;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -160,6 +172,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
 #endif
 *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
 | (env->frm << FSR_RD_SHIFT);
+if (vs(env, csrno) >= 0) {
+*val |= (env->vxrm << FSR_VXRM_SHIFT)
+| (env->vxsat << FSR_VXSAT_SHIFT);
+}
 return 0;
 }
 
@@ -172,10 +188,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, 
target_ulong val)
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+if (vs(env, csrno) >= 0) {
+env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
+}
 riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
 return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxrm;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxsat;
+return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vstart;
+return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxrm = val;
+return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxsat = val;
+return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vstart = val;
+return 0;
+}
+
 /* User Timers and Counters */
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
@@ -877,7 +945,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_FFLAGS] =  { fs,   read_fflags,  write_fflags  },
 [CSR_FRM] = { fs,   read_frm, write_frm },
 [CSR_FCSR] ={ fs,   read_fcsr,write_fcsr},
-
+/* Vector CSRs */
+[CSR_VSTART] =  { vs,   read_vstart,  write_vstart  },
+[CSR_VXSAT] =   { vs,   read_vxsat,   write_vxsat   },
+[CSR_VXRM] ={ vs,   read_vxrm,write_vxrm},
+[CSR_VL] =  { vs,   read_vl },
+[CSR_VTYPE] =   { vs,   read_vtype  },
 /* User Timers and Counters */
 [CSR_CYCLE] =   { ctr,  read_instret},
 [CSR_INSTRET] = { ctr,  read_instret},
-- 
2.23.0




[PATCH v4 4/4] target/riscv: add vector configure instruction

2020-02-10 Thread LIU Zhiwei
vsetvl and vsetvli are the two configuration instructions for vl and vtype.
TB flags must be updated after these instructions execute. The (vill, lmul,
sew) fields of vtype and a bit recording (VSTART == 0 && VL == VLMAX) are
placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
 MAINTAINERS |  1 +
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.h  | 61 +++---
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 49 ++
 8 files changed, 195 insertions(+), 11 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e72b5e5f69..015e9239b5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -266,6 +266,7 @@ M: Palmer Dabbelt 
 M: Alistair Francis 
 M: Sagar Karandikar 
 M: Bastian Koppelmann 
+M: LIU Zhiwei 
 L: qemu-ri...@nongnu.org
 S: Supported
 F: target/riscv/
diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index bf2b4b55af..f857845285 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -98,6 +99,10 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, LMUL, 0, 2)
+FIELD(VTYPE, SEW, 2, 3)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -306,16 +311,61 @@ void riscv_cpu_set_fflags(CPURISCVState *env, 
target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, SEW);
+lmul = FIELD_EX64(vtype, VTYPE, LMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t 
*pflags)
 {
+uint32_t flags = 0;
+uint32_t vlmax;
+uint8_t vl_eq_vlmax;
+
 *pc = env->pc;
 *cs_base = 0;
+
+if (env->misa & RVV) {
+vlmax = vext_get_vlmax(env_archcpu(env), env->vext.vtype);
+vl_eq_vlmax = (env->vext.vstart == 0) && (vlmax == env->vext.vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vext.vtype, VTYPE, VILL));
+flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+FIELD_EX64(env->vext.vtype, VTYPE, SEW));
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+FIELD_EX64(env->vext.vtype, VTYPE, LMUL));
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+} else {
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+}
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags |= TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
+flags |= cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -356,9 +406,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations 
*ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76,5 @@ DEF_HELPER

[PATCH v3 4/5] target/riscv: add fault-only-first unit stride load

2020-02-09 Thread LIU Zhiwei
The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions (while loops).
These instructions execute as a regular load, except that they
will only take a trap on element 0.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  22 
 target/riscv/insn32.decode  |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  88 +++
 target/riscv/vector_helper.c| 138 
 4 files changed, 255 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 5ebd3d6ccd..893dfc0fb8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b_mask, void, ptr, tl, ptr, ptr, env, 
i32)
 DEF_HELPER_6(vsxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwuff_v_d_mask, void, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6a363a6b7e..973ac63fda 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -219,6 +219,13 @@ vle_v  ... 000 . 0 . 111 . 111 @r2_nfvm
 vlbu_v ... 000 . 0 . 000 . 111 @r2_nfvm
 vlhu_v ... 000 . 0 . 101 . 111 @r2_nfvm
 vlwu_v ... 000 . 0 . 110 . 111 @r2_nfvm
+vlbff_v... 100 . 1 . 000 . 111 @r2_nfvm
+vlhff_v... 100 . 1 . 101 . 111 @r2_nfvm
+vlwff_v... 100 . 1 . 110 . 111 @r2_nfvm
+vleff_v... 000 . 1 . 111 . 111 @r2_nfvm
+vlbuff_v   ... 000 . 1 . 000 . 111 @r2_nfvm
+vlhuff_v   ... 000 . 1 . 101 . 111 @r2_nfvm
+vlwuff_v   ... 000 . 1 . 110 . 111 @r2_nfvm
 vsb_v  ... 000 . 0 . 000 . 0100111 @r2_nfvm
 vsh_v  ... 000 . 0 . 101 . 0100111 @r2_nfvm
 vsw_v  ... 000 . 0 . 110 . 0100111 @r2_nfvm
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index 13033b3906..66caa16d18 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -663,3 +663,91 @@ static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
 {
 return trans_vsxe_v(s, a);
 }
+
+/* unit stride fault-only-first load */
+typedef void gen_helper_vext_ldff(TCGv_ptr, TCGv, TCGv_ptr,
+TCGv_env, TCGv_i32);
+
+static bool do_vext_ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+gen_helper_vext_ldff *fn, DisasContext *s)
+{
+TCGv_ptr dest, mask;
+TCGv base;
+TCGv_i32 desc;
+
+dest = tcg_temp_new_ptr();
+mask = tcg_temp_new_ptr();
+base = tcg_temp_new();
+desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+gen_get_gpr(base, rs1);
+tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+fn(dest, base, mask, cpu_env, desc);
+
+tcg_temp_free_ptr(dest);
+tcg_temp_free_ptr(mask);
+tcg_temp_free(base);
+tcg_temp_free_i32(desc);
+return true;
+}
+
+static bool vext_ldff_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+uint8_t nf = a->nf + 1;
+uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+| (nf << 12);
+gen_helper_vext_ldff *fn;
+static gen_helper_vext_ldff * const fns[7][4] = {
+/* masked unit stride fault-only-first load */
+{ gen_helper_vlbff_v_b_mask,  gen_helper_vlbff_v_h_mask,
+  gen_helper_vlbff_v_w_mask,  gen_helper_

[PATCH v3 5/5] target/riscv: add vector amo operations

2020-02-09 Thread LIU Zhiwei
Vector AMOs operate as if the aq and rl bits were zero on each element,
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  57 +
 target/riscv/insn32-64.decode   |  11 +
 target/riscv/insn32.decode  |  13 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 167 ++
 target/riscv/vector_helper.c| 292 
 5 files changed, 540 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 893dfc0fb8..3624a20262 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,60 @@ DEF_HELPER_5(vlhuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlhuff_v_d_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv

[PATCH v3 0/5] target/riscv: support vector extension part 2

2020-02-09 Thread LIU Zhiwei
This is the second part of the v3 patchset. The v3 changelog below covers
only part 2.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support the basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * Zvediv is not supported, as it is still changing.
  * SLEN is fixed at 128 bits.
  * supported element widths: 8, 16, 32, and 64 bits.

Changelog:
v3
  * move check code from execution time to translation time.
  * probe pages before real load or store access.
  * use probe_page_check for no-fault operations in linux user mode.
  * add atomic and non-atomic operations for vector AMO instructions.
v2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (5):
  target/riscv: add vector unit stride load and store instructions
  target/riscv: add vector stride load and store instructions
  target/riscv: add vector index load and store instructions
  target/riscv: add fault-only-first unit stride load
  target/riscv: add vector amo operations

 target/riscv/helper.h   |  219 
 target/riscv/insn32-64.decode   |   11 +
 target/riscv/insn32.decode  |   67 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  851 +++
 target/riscv/translate.c|2 +
 target/riscv/vector_helper.c| 1251 +++
 6 files changed, 2401 insertions(+)

-- 
2.23.0




[PATCH v3 3/5] target/riscv: add vector index load and store instructions

2020-02-09 Thread LIU Zhiwei
Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  35 
 target/riscv/insn32.decode  |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 164 ++
 target/riscv/vector_helper.c| 214 
 4 files changed, 429 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 19c1bfc317..5ebd3d6ccd 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2f2d3d13b3..6a363a6b7e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -236,6 +236,22 @@ vssh_v ... 010 . . . 101 . 0100111 @r_nfvm
 vssw_v ... 010 . . . 110 . 0100111 @r_nfvm
 vsse_v ... 010 . . . 111 . 0100111 @r_nfvm
 
+vlxb_v ... 111 . . . 000 . 111 @r_nfvm
+vlxh_v ... 111 . . . 101 . 111 @r_nfvm
+vlxw_v ... 111 . . . 110 . 111 @r_nfvm
+vlxe_v ... 011 . . . 111 . 111 @r_nfvm
+vlxbu_v... 011 . . . 000 . 111 @r_nfvm
+vlxhu_v... 011 . . . 101 . 111 @r_nfvm
+vlxwu_v... 011 . . . 110 . 111 @r_nfvm
+vsxb_v ... 011 . . . 000 . 0100111 @r_nfvm
+vsxh_v ... 011 . . . 101 . 0100111 @r_nfvm
+vsxw_v ... 011 . . . 110 . 0100111 @r_nfvm
+vsxe_v ... 011 . . . 111 . 0100111 @r_nfvm
+vsuxb_v... 111 . . . 000 . 0100111 @r_nfvm
+vsuxh_v... 111 . . . 101 . 0100111 @r_nfvm
+vsuxw_v... 111 . . . 110 . 0100111 @r_nfvm
+vsuxe_v... 111 . . . 111 . 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 5a7ea94c2d..13033b3906 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -499,3 +499,167 @@ GEN_VEXT_ST_STRIDE_TRANS(vssb_v, vext_st_stride_trans, 0)
 GEN_VEXT_ST_STRIDE_TRANS(vssh_v, vext_st_stride_trans, 1)
 GEN_VEXT_ST_STRIDE_TRANS(vssw_v

[PATCH v4 1/4] target/riscv: add vector extension field in CPURISCVState

2020-02-10 Thread LIU Zhiwei
The 32 vector registers are viewed as one contiguous memory block. This
avoids converting between an element index and a (regno, offset) pair,
so elements can be accessed directly by byte offset from the base
address of the first vector register.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index de0a8d893a..07e63016a7 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -93,9 +93,22 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 512
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
+
+/* vector coprocessor state. */
+struct {
+ uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
+ target_ulong vxrm;
+ target_ulong vxsat;
+ target_ulong vl;
+ target_ulong vstart;
+ target_ulong vtype;
+} vext;
+
 target_ulong pc;
 target_ulong load_res;
 target_ulong load_val;
-- 
2.23.0




[PATCH v4 3/4] target/riscv: support vector extension csr

2020-02-10 Thread LIU Zhiwei
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 72 +++--
 2 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index e99834856c..1f588ebc14 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM 0x002
 #define CSR_FCSR0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE   0xc00
 #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0e34c292c5..4696c8c180 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
+/* relax the FS check for fcsr when the vector extension is enabled */
+if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+return 0;
+}
 if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
@@ -53,6 +57,11 @@ static int fs(CPURISCVState *env, int csrno)
 return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+return 0;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -158,8 +167,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 return -1;
 }
 #endif
-*val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
-| (env->frm << FSR_RD_SHIFT);
+*val = (env->vext.vxrm << FSR_VXRM_SHIFT)
+| (env->vext.vxsat << FSR_VXSAT_SHIFT)
+| (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
+| (env->frm << FSR_RD_SHIFT);
 return 0;
 }
 
@@ -172,10 +183,60 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+env->vext.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vext.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
 riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
 return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vxrm;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vxsat;
+return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vstart;
+return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vext.vxrm = val;
+return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vext.vxsat = val;
+return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vext.vstart = val;
+return 0;
+}
+
 /* User Timers and Counters */
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
@@ -877,7 +938,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_FFLAGS] =  { fs,   read_fflags,  write_fflags  },
 [CSR_FRM] = { fs,   read_frm, write_frm },
 [CSR_FCSR] ={ fs,   read_fcsr,write_fcsr},
-
+/* Vector CSRs */
+[CSR_VSTART] =  { vs,   read_vstart,  write_vstart  },
+[CSR_VXSAT] =   { vs,   read_vxsat,   write_vxsat   },
+[CSR_VXRM] ={ vs,   read_vxrm,write_vxrm},
+[CSR_VL] =  { vs,   read_vl },
+[CSR_VTYPE] =   { vs,   read_vtype  },
 

[PATCH v4 0/4]target-riscv: support vector extension part 1

2020-02-10 Thread LIU Zhiwei
This is the first part of the v4 patchset. The v4 changelog below only
covers part 1.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support the basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * Zvediv is not supported, as it is still changing.
  * SLEN is fixed at 128 bits.
  * supported element widths: 8, 16, 32, and 64 bits.

Changelog:

v4
  * adjust max vlen to 512 bits.
  * check maximum on elen(64bits).
  * check minimum on vlen(128bits).
  * check if rs1 is x0 in vsetvl/vsetvli.
  * use gen_goto_tb in vsetvli instead of exit_tb.
  * fixup fetch vlmax from rs2, not env->vext.type.
v3
  * support VLEN configuration from the qemu command line.
  * support ELEN configuration from the qemu command line.
  * support vector specification version configuration from the qemu command line.
  * default on only for the "any" cpu; other cpus turn it on from the command line.
  * use a contiguous memory block for the vector register description.
v2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (4):
  target/riscv: add vector extension field in CPURISCVState
  target/riscv: configure and turn on vector extension from command line
  target/riscv: support vector extension csr
  target/riscv: add vector configure instruction

 MAINTAINERS |  1 +
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.c  | 48 ++-
 target/riscv/cpu.h  | 82 ++---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 72 +-
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 -
 target/riscv/vector_helper.c| 49 +++
 11 files changed, 346 insertions(+), 16 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

-- 
2.23.0




[PATCH v4 2/4] target/riscv: configure and turn on vector extension from command line

2020-02-10 Thread LIU Zhiwei
The vector extension is on by default only for the "any" cpu. For other
cpus it can be turned on from the command line, e.g.
"-cpu rv64,v=true,vlen=128,elen=64,vext_spec=v0.7.1".

vlen is the vector register length in bits; the default is 128.
elen is the maximum element width in bits; the default is 64.
vext_spec is the vector specification version; the default is v0.7.1.
These properties, and the cpu itself, can be given other values.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 48 --
 target/riscv/cpu.h |  8 
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8c86ebc109..95fdb6261e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -98,6 +98,11 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
 env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
 env->features |= (1ULL << feature);
@@ -113,7 +118,7 @@ static void set_resetvec(CPURISCVState *env, int resetvec)
 static void riscv_any_cpu_init(Object *obj)
 {
CPURISCVState *env = &RISCV_CPU(obj)->env;
-set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
+set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU | RVV);
 set_priv_version(env, PRIV_VERSION_1_11_0);
 set_resetvec(env, DEFAULT_RSTVEC);
 }
@@ -320,6 +325,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = 0;
 Error *local_err = NULL;
 
@@ -343,8 +349,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 }
-
+if (cpu->cfg.vext_spec) {
+if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
+vext_version = VEXT_VERSION_0_07_1;
+} else {
+error_setg(errp,
+   "Unsupported vector spec version '%s'",
+   cpu->cfg.vext_spec);
+return;
+}
+}
 set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
 set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
@@ -409,6 +425,30 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 if (cpu->cfg.ext_u) {
 target_misa |= RVU;
 }
+if (cpu->cfg.ext_v) {
+target_misa |= RVV;
+if (!is_power_of_2(cpu->cfg.vlen)) {
+error_setg(errp,
+   "Vector extension VLEN must be power of 2");
+return;
+}
+if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
+error_setg(errp,
+   "Vector extension implementation only supports VLEN "
+   "in the range [128, %d]", RV_VLEN_MAX);
+return;
+}
+if (!is_power_of_2(cpu->cfg.elen)) {
+error_setg(errp,
+   "Vector extension ELEN must be power of 2");
+return;
+}
+if (cpu->cfg.elen > 64) {
+error_setg(errp,
+   "Vector extension ELEN must <= 64");
+return;
+}
+}
 
 set_misa(env, RVXLEN | target_misa);
 }
@@ -444,10 +484,14 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("c", RISCVCPU, cfg.ext_c, true),
 DEFINE_PROP_BOOL("s", RISCVCPU, cfg.ext_s, true),
 DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
+DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
 DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
 DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
 DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
+DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
+DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
+DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
 DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
 DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 07e63016a7..bf2b4b55af 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -82,6

[PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions

2020-02-09 Thread LIU Zhiwei
Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address.

The Zvlsseg extension adds vector load/store segment instructions, which
move multiple contiguous fields in memory to and from consecutively
numbered vector registers.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  70 
 target/riscv/insn32.decode  |  17 +
 target/riscv/insn_trans/trans_rvv.inc.c | 294 
 target/riscv/translate.c|   2 +
 target/riscv/vector_helper.c| 438 
 5 files changed, 821 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..74c483ef9e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,73 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5dc009c3cd..dad3ed91c7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6

[PATCH v3 2/5] target/riscv: add vector stride load and store instructions

2020-02-09 Thread LIU Zhiwei
Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  35 +
 target/riscv/insn32.decode  |  14 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 138 +++
 target/riscv/vector_helper.c| 169 
 4 files changed, 356 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 74c483ef9e..19c1bfc317 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -148,3 +148,38 @@ DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dad3ed91c7..2f2d3d13b3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
  shamt rs1 rd
 aq rl rs2 rs1 rd
 vm rd rs1 nf
+ vm rd rs1 rs2 nf
 
 # Formats 32:
@r   ...   . . ... . ... %rs2 %rs1 %rd
@@ -64,6 +65,7 @@
 @r2_rm   ...   . . ... . ... %rs1 %rm %rd
 @r2  ...   . . ... . ... %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 . . ... . ...  %rs1 %rd
+@r_nfvm  nf:3 ... vm:1 . . ... . ...  %rs2 %rs1 %rd
 @r2_zimm . zimm:11  . ... . ... %rs1 %rd
 
 @sfence_vma ... . .   ... . ... %rs2 %rs1
@@ -222,6 +224,18 @@ vsh_v  ... 000 . 0 . 101 . 0100111 @r2_nfvm
 vsw_v  ... 000 . 0 . 110 . 0100111 @r2_nfvm
 vse_v  ... 000 . 0 . 111 . 0100111 @r2_nfvm
 
+vlsb_v ... 110 . . . 000 . 111 @r_nfvm
+vlsh_v ... 110 . . . 101 . 111 @r_nfvm
+vlsw_v ... 110 . . . 110 . 111 @r_nfvm
+vlse_v ... 010 . . . 111 . 111 @r_nfvm
+vlsbu_v... 010 . . . 000 . 111 @r_nfvm
+vlshu_v... 010 . . . 101 . 111 @r_nfvm
+vlswu_v... 010 . . . 110 . 111 @r_nfvm
+vssb_v ... 010 . . . 000 . 0100111 @r_nfvm
+vssh_v ... 010 . . . 101 . 0100111 @r_nfvm
+vssw_v ... 010 . . . 110 . 0100111 @r_nfvm
+vsse_v ... 010 . . . 111 . 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans

[PATCH 2/3] RISC-V: use FIELD macro to define tb flags

2020-01-10 Thread LIU Zhiwei
Use the FIELD macro to define the tb flags in a uniform way. This makes
it easier to add new fields to the tb flags.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h   | 15 +--
 target/riscv/translate.c |  5 +++--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index e59343e13c..8efd4c5904 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -282,22 +282,25 @@ void QEMU_NORETURN riscv_raise_exception(CPURISCVState *env,
 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 
-#define TB_FLAGS_MMU_MASK   3
-#define TB_FLAGS_MSTATUS_FS MSTATUS_FS
+FIELD(TB_FLAGS, MMU, 0, 2)
+FIELD(TB_FLAGS, FS, 13, 2)
 
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t *pflags)
 {
+uint32_t flags = 0;
 *pc = env->pc;
 *cs_base = 0;
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags = FIELD_DP32(flags, TB_FLAGS, FS, MSTATUS_FS);
 #else
-*flags = cpu_mmu_index(env, 0);
+flags = FIELD_DP32(flags, TB_FLAGS, MMU, cpu_mmu_index(env, 0));
 if (riscv_cpu_fp_enabled(env)) {
-*flags |= TB_FLAGS_MSTATUS_FS;
+flags = FIELD_DP32(flags, TB_FLAGS, FS, (env->mstatus & MSTATUS_FS));
 }
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index ab6a891dc3..5de2d11d5c 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -735,10 +735,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 DisasContext *ctx = container_of(dcbase, DisasContext, base);
 CPURISCVState *env = cs->env_ptr;
 RISCVCPU *cpu = RISCV_CPU(cs);
+uint32_t tb_flags = ctx->base.tb->flags;
 
 ctx->pc_succ_insn = ctx->base.pc_first;
-ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK;
-ctx->mstatus_fs = ctx->base.tb->flags & TB_FLAGS_MSTATUS_FS;
+ctx->mem_idx = FIELD_EX32(tb_flags, TB_FLAGS, MMU);
+ctx->mstatus_fs = FIELD_EX32(tb_flags, TB_FLAGS, FS);
 ctx->priv_ver = env->priv_ver;
 ctx->misa = env->misa;
 ctx->frm = -1;  /* unknown rounding mode */
-- 
2.23.0




[PATCH 1/3] select gdb fpu xml by single or double float extension

2020-01-10 Thread LIU Zhiwei
There is no reason why RISCV32 cannot use the RVD extension, or RISCV64
cannot use only the RVF extension. gdb selects flen according to the RVD
or RVF feature recorded in the ELF header.

Signed-off-by: LIU Zhiwei 
---
 configure  |  4 ++--
 target/riscv/gdbstub.c | 14 ++
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/configure b/configure
index 0ce2c0354a..2757c0a5a5 100755
--- a/configure
+++ b/configure
@@ -7679,13 +7679,13 @@ case "$target_name" in
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 mttcg=yes
-gdb_xml_files="riscv-32bit-cpu.xml riscv-32bit-fpu.xml riscv-32bit-csr.xml riscv-32bit-virtual.xml"
+gdb_xml_files="riscv-32bit-cpu.xml riscv-32bit-fpu.xml riscv-64bit-fpu.xml riscv-32bit-csr.xml riscv-32bit-virtual.xml"
   ;;
   riscv64)
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 mttcg=yes
-gdb_xml_files="riscv-64bit-cpu.xml riscv-64bit-fpu.xml riscv-64bit-csr.xml riscv-64bit-virtual.xml"
+gdb_xml_files="riscv-64bit-cpu.xml riscv-32bit-fpu.xml riscv-64bit-fpu.xml riscv-64bit-csr.xml riscv-64bit-virtual.xml"
   ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 1a7947e019..e3c9b320fb 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -403,23 +403,21 @@ void riscv_cpu_register_gdb_regs_for_features(CPUState *cs)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
CPURISCVState *env = &cpu->env;
-#if defined(TARGET_RISCV32)
-if (env->misa & RVF) {
+if (env->misa & RVD) {
+gdb_register_coprocessor(cs, riscv_gdb_get_fpu, riscv_gdb_set_fpu,
+ 36, "riscv-64bit-fpu.xml", 0);
+} else if (env->misa & RVF) {
 gdb_register_coprocessor(cs, riscv_gdb_get_fpu, riscv_gdb_set_fpu,
- 36, "riscv-32bit-fpu.xml", 0);
+  36, "riscv-32bit-fpu.xml", 0);
 }
 
+#if defined(TARGET_RISCV32)
 gdb_register_coprocessor(cs, riscv_gdb_get_csr, riscv_gdb_set_csr,
  240, "riscv-32bit-csr.xml", 0);
 
 gdb_register_coprocessor(cs, riscv_gdb_get_virtual, riscv_gdb_set_virtual,
  1, "riscv-32bit-virtual.xml", 0);
 #elif defined(TARGET_RISCV64)
-if (env->misa & RVF) {
-gdb_register_coprocessor(cs, riscv_gdb_get_fpu, riscv_gdb_set_fpu,
- 36, "riscv-64bit-fpu.xml", 0);
-}
-
 gdb_register_coprocessor(cs, riscv_gdb_get_csr, riscv_gdb_set_csr,
  240, "riscv-64bit-csr.xml", 0);
 
-- 
2.23.0




[PATCH 3/3] remove redundant check for fpu csr read and write interface

2020-01-10 Thread LIU Zhiwei
The read or write interface is only called after the predicate fs
returns 0. The predicate already checks
(!env->debugger && !riscv_cpu_fp_enabled(env)), so there is no need to
check again.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/csr.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index da02f9f0b1..0c2b8fc8f6 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -107,11 +107,6 @@ static int pmp(CPURISCVState *env, int csrno)
 /* User Floating-Point CSRs */
 static int read_fflags(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
-#endif
 *val = riscv_cpu_get_fflags(env);
 return 0;
 }
@@ -119,9 +114,6 @@ static int read_fflags(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fflags(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 riscv_cpu_set_fflags(env, val & (FSR_AEXC >> FSR_AEXC_SHIFT));
@@ -130,11 +122,6 @@ static int write_fflags(CPURISCVState *env, int csrno, 
target_ulong val)
 
 static int read_frm(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
-#endif
 *val = env->frm;
 return 0;
 }
@@ -142,9 +129,6 @@ static int read_frm(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_frm(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = val & (FSR_RD >> FSR_RD_SHIFT);
@@ -153,11 +137,6 @@ static int write_frm(CPURISCVState *env, int csrno, 
target_ulong val)
 
 static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
-#endif
 *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
 | (env->frm << FSR_RD_SHIFT);
 return 0;
@@ -166,9 +145,6 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
-- 
2.23.0




[PATCH v3 4/4] RISC-V: add vector extension configure instruction

2020-01-02 Thread LIU Zhiwei
vsetvl and vsetvli are two configure instructions for vl and vtype. TB flags
should be updated after these configure instructions. The (ill, lmul, sew) fields
of vtype and the bit for (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.c  |  1 +
 target/riscv/cpu.h  | 55 -
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 52 +++
 target/riscv/translate.c| 17 +++-
 target/riscv/vector_helper.c| 51 +++
 8 files changed, 172 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index b1c79bc1d1..d577cef9e0 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o pmp.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o pmp.o
 
 DECODETREE = $(SRC_PATH)/scripts/decodetree.py
 
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c2370a0a57..3ff7b50bff 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -347,6 +347,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 }
 if (cpu->cfg.vext_spec) {
+env->vext.vtype = ~((target_ulong)-1 >> 1);
 if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
 vext_version = VEXT_VERSION_0_07_1;
 } else {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index d0b106583a..152a96f1fa 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -23,6 +23,7 @@
 #include "qom/cpu.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat.h"
+#include "hw/registerfields.h"
 
 #define TCG_GUEST_DEFAULT_MO 0
 
@@ -98,6 +99,20 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 4096
 
+struct VTYPE {
+#ifdef HOST_WORDS_BIGENDIAN
+target_ulong vill:1;
+target_ulong reserved:sizeof(target_ulong) * 8 - 7;
+target_ulong sew:3;
+target_ulong lmul:2;
+#else
+target_ulong lmul:2;
+target_ulong sew:3;
+target_ulong reserved:sizeof(target_ulong) * 8 - 7;
+target_ulong vill:1;
+#endif
+};
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -309,19 +324,44 @@ void QEMU_NORETURN riscv_raise_exception(CPURISCVState 
*env,
 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 
-#define TB_FLAGS_MMU_MASK   3
-#define TB_FLAGS_MSTATUS_FS MSTATUS_FS
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, MMU, 0, 2)
+FIELD(TB_FLAGS, FS, 13, 2)
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 16, 1)
+FIELD(TB_FLAGS, LMUL, 17, 2)
+FIELD(TB_FLAGS, SEW, 19, 3)
+FIELD(TB_FLAGS, VILL, 22, 1)
 
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t *pflags)
 {
+RISCVCPU *cpu = env_archcpu(env);
+struct VTYPE *vtype = (struct VTYPE *)&env->vext.vtype;
+uint32_t vlmax;
+uint8_t vl_eq_vlmax;
+uint32_t flags = 0;
+
 *pc = env->pc;
 *cs_base = 0;
+vlmax = (1 << vtype->lmul) * cpu->cfg.vlen / (8 * (1 << vtype->sew));
+vl_eq_vlmax = (env->vext.vstart == 0) && (vlmax == env->vext.vl);
+
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, vtype->vill);
+flags = FIELD_DP32(flags, TB_FLAGS, SEW, vtype->sew);
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL, vtype->lmul);
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags = FIELD_DP32(flags, TB_FLAGS, FS, MSTATUS_FS);
 #else
-*flags = cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
+flags = FIELD_DP32(flags, TB_FLAGS, MMU, cpu_mmu_index(env, 0));
+flags = FIELD_DP32(flags, TB_FLAGS, FS, (env->mstatus & MSTATUS_FS));
 #endif
+*pflags = flags;
+*cs_base = 0;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -362,9 +402,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations 
*ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..000b5aa3d1 100644
--

[PATCH v3 2/4] RISC-V: configure and turn on vector extension from command line

2020-01-02 Thread LIU Zhiwei
The vector extension is on by default only for the "any" cpu. It can be turned
on from the command line with "-cpu rv64,vlen=128,elen=64,vext_spec=v0.7.1".

vlen is the vector register length, default value is 128 bit.
elen is the max operator size in bits, default value is 64 bit.
vext_spec is the vector specification version, default value is v0.7.1.
These properties and the cpu type can be specified with other values.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 42 --
 target/riscv/cpu.h |  8 
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f8d07bd20a..c2370a0a57 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -94,6 +94,11 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
 env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
 env->features |= (1ULL << feature);
@@ -109,7 +114,7 @@ static void set_resetvec(CPURISCVState *env, int resetvec)
 static void riscv_any_cpu_init(Object *obj)
 {
CPURISCVState *env = &RISCV_CPU(obj)->env;
-set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU);
+set_misa(env, RVXLEN | RVI | RVM | RVA | RVF | RVD | RVC | RVU | RVV);
 set_priv_version(env, PRIV_VERSION_1_11_0);
 set_resetvec(env, DEFAULT_RSTVEC);
 }
@@ -317,6 +322,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = 0;
 Error *local_err = NULL;
 
@@ -340,8 +346,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 }
-
+if (cpu->cfg.vext_spec) {
+if (!g_strcmp0(cpu->cfg.vext_spec, "v0.7.1")) {
+vext_version = VEXT_VERSION_0_07_1;
+} else {
+error_setg(errp,
+   "Unsupported vector spec version '%s'",
+   cpu->cfg.vext_spec);
+return;
+}
+}
 set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
 set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
@@ -406,6 +422,24 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 if (cpu->cfg.ext_u) {
 target_misa |= RVU;
 }
+if (cpu->cfg.ext_v) {
+target_misa |= RVV;
+if (!is_power_of_2(cpu->cfg.vlen)) {
+error_setg(errp,
+   "Vector extension VLEN must be power of 2");
+return;
+}
+if (cpu->cfg.vlen > RV_VLEN_MAX) {
+error_setg(errp,
+   "Vector extension VLEN must <= %d", RV_VLEN_MAX);
+return;
+}
+if (!is_power_of_2(cpu->cfg.elen)) {
+error_setg(errp,
+   "Vector extension ELEN must be power of 2");
+return;
+}
+}
 
 set_misa(env, RVXLEN | target_misa);
 }
@@ -441,10 +475,14 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("c", RISCVCPU, cfg.ext_c, true),
 DEFINE_PROP_BOOL("s", RISCVCPU, cfg.ext_s, true),
 DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
+DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
 DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
 DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
 DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
+DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
+DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
+DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
 DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
 DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index af66674461..d0b106583a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -82,6 +83,8 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
+#define VEXT_VERSION_0_07_1 0x0071
+
 #define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
@@ -119,6 +122,7 @@ struct CPURISCVState {
 target_ulong badaddr;
 
  

[PATCH v3 3/4] RISC-V: support vector extension csr

2020-01-02 Thread LIU Zhiwei
As of the v0.7.1 specification, vector status is still not defined in
mstatus.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu_bits.h | 15 +++
 target/riscv/csr.c  | 92 +
 2 files changed, 80 insertions(+), 27 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 11f971ad5d..9eb43ecc1e 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM 0x002
 #define CSR_FCSR0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE   0xc00
 #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index e0d4586760..506ad7b590 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -53,6 +53,11 @@ static int fs(CPURISCVState *env, int csrno)
 return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+return 0;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -107,11 +112,6 @@ static int pmp(CPURISCVState *env, int csrno)
 /* User Floating-Point CSRs */
 static int read_fflags(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
-#endif
 *val = riscv_cpu_get_fflags(env);
 return 0;
 }
@@ -119,9 +119,6 @@ static int read_fflags(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fflags(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 riscv_cpu_set_fflags(env, val & (FSR_AEXC >> FSR_AEXC_SHIFT));
@@ -130,11 +127,6 @@ static int write_fflags(CPURISCVState *env, int csrno, 
target_ulong val)
 
 static int read_frm(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
-#endif
 *val = env->frm;
 return 0;
 }
@@ -142,9 +134,6 @@ static int read_frm(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_frm(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = val & (FSR_RD >> FSR_RD_SHIFT);
@@ -153,29 +142,73 @@ static int write_frm(CPURISCVState *env, int csrno, 
target_ulong val)
 
 static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 {
-#if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
-#endif
-*val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
-| (env->frm << FSR_RD_SHIFT);
+*val = (env->vext.vxrm << FSR_VXRM_SHIFT)
+| (env->vext.vxsat << FSR_VXSAT_SHIFT)
+| (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
+| (env->frm << FSR_RD_SHIFT);
 return 0;
 }
 
 static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-return -1;
-}
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+env->vext.vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vext.vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
 riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
 return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vxrm;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vext.vxsat;
+return 0;
+}

[PATCH v3 1/4] RISC-V: add vector extension field in CPURISCVState

2020-01-02 Thread LIU Zhiwei
The 32 vector registers will be viewed as a continuous memory block.
It avoids the conversion between element index and (regno, offset).
Thus elements can be directly accessed by offset from the first vector
base address.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0adb307f32..af66674461 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -93,9 +93,23 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 4096
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
+
+/* vector coprocessor state.  */
+struct {
+ uint64_t vreg[32 * RV_VLEN_MAX / 64];
+ target_ulong vxrm;
+ target_ulong vxsat;
+ target_ulong vl;
+ target_ulong vstart;
+ target_ulong vtype;
+} vext;
+
+bool foflag;
 target_ulong pc;
 target_ulong load_res;
 target_ulong load_val;
-- 
2.23.0




[PATCH v3 0/4] RISC-V: support vector extension part 1

2020-01-02 Thread LIU Zhiwei
This is the first part of the v3 patchset. The changelog of v3 only covers
part 1.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * not support Zvediv as it is changing.
  * fixed SLEN 128bit.
  * element width support 8bit, 16bit, 32bit, 64bit.

Changelog:
v3
  * support VLEN configure from qemu command line.
  * support ELEN configure from qemu command line.
  * support vector specification version configure from qemu command line.
  * only default on for "any" cpu, others turn on from command line.
  * use a continuous memory block for vector register description.
V2
  * use float16_compare{_quiet}
  * only use GETPC() in outer most helper
  * add ctx.ext_v Property

LIU Zhiwei (4):
  RISC-V: add vector extension field in CPURISCVState
  RISC-V: configure and turn on vector extension from command line
  RISC-V: support vector extension csr
  RISC-V: add vector extension configure instruction

 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.c  | 43 +++-
 target/riscv/cpu.h  | 77 ++---
 target/riscv/cpu_bits.h | 15 
 target/riscv/csr.c  | 92 +
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 52 ++
 target/riscv/translate.c| 17 -
 target/riscv/vector_helper.c| 51 ++
 10 files changed, 314 insertions(+), 42 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

-- 
2.23.0




Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-25 Thread LIU Zhiwei


On 2019/12/20 4:38, Richard Henderson wrote:

On 12/18/19 11:11 PM, LIU Zhiwei wrote:

I'm sorry that it's really hard to absorb your opinion. I don't know why clang
will fail when an index goes beyond the end of vreg[n] into vreg[n+1].

I thought for sure one of the address sanitizer checks would detect an array
bounds overrun.  But it becomes irrelevant


As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be a
property which can be

specified from command line.  So the sub-struct maybe defined as

struct {
     union{
     uint64_t *u64 ;
     int64_t  *s64;
     uint32_t *u32;
     int32_t  *s32;
     uint16_t *u16;
     int16_t  *s16;
     uint8_t  *u8;
     int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext;

Will that be OK?

Pointers have consequences.  It can be done, but I don't think it is ideal.


The (ill, lmul, sew ) of vtype  will be placed within tb_flags, also the bit of
(VSTART == 0 && VL == VLMAX).

So it will take 8 bits of tb flags for vector extension at least.

Good.

However, I have one problem with supporting both a command-line VLEN and vreg_ofs.

As in SVE, vreg_ofs is the offset from cpu_env. If the structure of the vector
extension (to support a command-line VLEN) is

struct {
     union{
     uint64_t *u64 ;
     int64_t  *s64;
     uint32_t *u32;
     int32_t  *s32;
     uint16_t *u16;
     int16_t  *s16;
     uint8_t  *u8;
     int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext

I can't find a way to get the direct offset of vreg from cpu_env.

Maybe I should specify a max VLEN, as SVE does?

I think a maximum vlen is best.  A command-line option to adjust vlen is all
well and good, but there's no reason to have to support vlen=(1<<29).

Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
be aligned mod 16.

I think that all you need is

 uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);

which gives us

uint32_t vreg_ofs(DisasContext *ctx, int reg)
{
 return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
}


struct {

    uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
    target_ulong vxrm;
    target_ulong vxsat;
    target_ulong vl;
    target_ulong vstart;
    target_ulong vtype;
    } vext;

Is it OK?


I don't see the point of a union for vreg.  I don't think you'll find that you
actually use it at all.


I think I can move most of the execution checks to translate time, as SVE
does now. However, there are still some differences from SVE.


1) cpu_env must be used as a parameter for helper functions.

    The helpers need to use env->vext.vl and env->vext.vstart. Thus it
will be difficult to use the out-of-line tcg_gen_gvec_ool.


    void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,

    uint32_t oprsz, uint32_t maxsz, int32_t data,
    gen_helper_gvec_2 *fn)
    {
        ..
    fn(a0, a1, desc);
 ..
 }
    Maybe I have to write  something similar to tcg_gen_gvec_ool in 
trans_rvv.inc.c.  But it will be redundant.


2) simd_desc is not suitable.

    I also need to transfer some members of DisasContext to helpers.

    (Data, Vlmax, Mlen) is my current choice. Vlmax is the number of 
elements of this operation, so it will be defined as ctx->lmul * ctx->vlen 
/ ctx->sew;


Data is reserved for expansion. Mlen is the mask length for one element, so it 
will be defined as ctx->sew / ctx->lmul. With Mlen, an active element will
be selected by

   static inline int vext_elem_mask(void *v0, int mlen, int index)
   {
    int idx = (index * mlen) / 8;
    int pos = (index * mlen) % 8;

    return (v0[idx] >> pos) & 0x1;
   }

    So I may have to implement vext_desc instead of using simd_desc, 
which will be another redundancy. Is there a better way to mask elements?



You do need to document the element ordering that you're going to use for vreg.
  I.e. the mapping between the architectural vector register state and the
emulation state.  You have two choices:

(1) all bytes in host endianness (e.g. target/ppc)
(2) bytes within each uint64_t in host endianness,
 but each uint64_t is little-endian (e.g. target/arm).

Both require some fixup when running on a big-endian host.


Yes, I will take (2).


Best Regards,

Zhiwei



r~


Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-30 Thread LIU Zhiwei




On 2019/12/28 9:14, Richard Henderson wrote:

On 12/25/19 8:36 PM, LIU Zhiwei wrote:

struct {

     uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
     } vext;

Is it OK?

I don't think there's a good reason for the vext structure -- I would drop
that.  Otherwise it looks good.


However, there are still some differences from SVE.

1)cpu_env must be used as a parameter for helper function.

     The helpers need  use env->vext.vl and env->vext.vstart.  Thus it will be
difficult to use out of line tcg_gen_gvec_ool.

Sure.  That's also true of any of the fp operations, which will want to
accumulate ieee exceptions.

See tcg_gen_gvec_*_ptr(), which allows you to pass in cpu_env.

Thanks. The tcg_gen_gvec_*_ptr is good.



2)simd_desc is not proper.

     I also need to transfer some members of DisasContext to helpers.

     (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of
this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;

The oprsz & maxsz parameters to tcg_gen_gvec_* should be given (ctx->lmul *
ctx->vlen).  The sew parameter should be implied by the helper function called,
each helper function using a different type.  Therefore vlmax can be trivially
computed within the helper from oprsz / sizeof(type).
It's clear that the oprsz & maxsz parameters should be given (ctx->lmul 
* ctx->vlen) for tcg_gen_gvec_add.


However, it's not clear when to use tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool. 
I think the meaning of oprsz is the size of the active elements.
Therefore, oprsz is 8 * env->vext.vl in RISC-V, and it can't be fetched from
TB_FLAGS like in SVE.

Probably the oprsz field will not be used in the RISC-V vector extension.

Data is reserved to expand.  Mlen is mask length for one elment, so it will
defined as ctx->sew/ctx->lmul. As with Mlen, a active element will

be selected by

 static inline int vext_elem_mask(void *v0, int mlen, int index)
 {
     int idx = (index * mlen) / 8;
     int pos = (index * mlen) % 8;

     return (v0[idx] >> pos) & 0x1;
 }

     So I may have to implement vext_desc instead of use the simd_desc, which
will be another redundant. Maybe a better way to mask elements?

I think you will want to define your own vext_desc, building upon simd_desc,
such that lg2(mlen) is passed in the first N bits of simd_data.

Good. It's a good way to use the tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool API.

Best Regards,
Zhiwei


r~





Re: [PATCH v3 3/4] RISC-V: support vector extension csr

2020-01-06 Thread LIU Zhiwei



On 2020/1/7 6:00, Jim Wilson wrote:

On 1/2/20 7:33 PM, LIU Zhiwei wrote:

Until v0.7.1 specification, vector status is still not defined for
mstatus.


The v0.8 spec does define a VS bit in mstatus.


Yes, I will also support v0.8 spec after the v0.7.1 spec.

@@ -107,11 +112,6 @@ static int pmp(CPURISCVState *env, int csrno)
  /* User Floating-Point CSRs */
  static int read_fflags(CPURISCVState *env, int csrno, target_ulong 
*val)

  {
-#if !defined(CONFIG_USER_ONLY)
-    if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
-    return -1;
-    }
-#endif
  *val = riscv_cpu_get_fflags(env);
  return 0;
  }


This allows reads of fflags when it doesn't exist, and hence does not 
make much sense.  Instead of removing the code, you should add a check 
for the vector extension, since the vector extension requires that 
fcsr exist even if the base architecture doesn't include FP support.  
Ideally this should use the VS bit, but if you don't have it then you 
can just check to see if the vector extension was enabled as a command 
line option.


I'm sorry that there is some ambiguity here. The reason to remove this 
code is that it is redundant and has nothing to do with the vector 
extension.  I just deleted it by hand.


As you can see, all float CSRs have a predicate function.

static int fs(CPURISCVState *env, int csrno)
{
#if !defined(CONFIG_USER_ONLY)
    if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
    return -1;
    }
#endif
    return 0;
}

The read or write function must be called after the predicate returns 0, 
so there is no need to check (!env->debugger && !(env->mstatus & MSTATUS_FS)) again.
While the vector spec says that fcsr must exist, it doesn't specify 
that the FP fields in fcsr are necessarily readable or writable when 
there is no FP.  It also doesn't specify whether the other FP related 
shadows of fcsr exist, like fflags.  This appears to have been left 
unspecified.  I don't think that you should be making fflags reads and 
writes work for a target with vector but without float.  I think it 
would make more sense to have fcsr behave 3 different ways depending 
on whether we have only F, only V, or both F and V.  And then we can 
support reads and writes of only the valid fields.



Thanks. Maybe I should just loosen the check condition for fcsr.

Best Regards,
Zhiwei

Jim





Re: [PATCH v3 2/4] RISC-V: configure and turn on vector extension from command line

2020-01-06 Thread LIU Zhiwei


On 2020/1/7 5:48, Jim Wilson wrote:

On 1/2/20 7:33 PM, LIU Zhiwei wrote:

+    if (cpu->cfg.vlen > RV_VLEN_MAX) {
+    error_setg(errp,
+   "Vector extension VLEN must <= %d", 
RV_VLEN_MAX);

+    return;


There is no architectural maximum for VLEN.  This is simply an 
implementation choice so you can use static arrays instead of malloc.  
I think this error should be reworded to something like "Vector 
extension implementation only supports VLEN <= %d."

Thanks. It's good to reduce ambiguity.
Zhiwei

The other errors here are for architecture requirements and are OK.

Jim




Re: [PATCH v3 4/4] RISC-V: add vector extension configure instruction

2020-01-06 Thread LIU Zhiwei

Hi Richard,

Thanks for the comments of the part 1.  It's really very helpful.
I accept most of the comments.

On 2020/1/4 7:41, Richard Henderson wrote:

On 1/3/20 2:33 PM, LIU Zhiwei wrote:

vsetvl and vsetvli are two configure instructions for vl, vtype. TB flags
should update after configure instructions. The (ill, lmul, sew ) of vtype
and the bit of (VSTART == 0 && VL == VLMAX) will be placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
  target/riscv/Makefile.objs  |  2 +-
  target/riscv/cpu.c  |  1 +
  target/riscv/cpu.h  | 55 -
  target/riscv/helper.h   |  2 +
  target/riscv/insn32.decode  |  5 +++
  target/riscv/insn_trans/trans_rvv.inc.c | 52 +++
  target/riscv/translate.c| 17 +++-
  target/riscv/vector_helper.c| 51 +++
  8 files changed, 172 insertions(+), 13 deletions(-)
  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
  create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index b1c79bc1d1..d577cef9e0 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
gdbstub.o pmp.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o 
vector_helper.o gdbstub.o pmp.o
  
  DECODETREE = $(SRC_PATH)/scripts/decodetree.py
  
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c

index c2370a0a57..3ff7b50bff 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -347,6 +347,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  }
  }
  if (cpu->cfg.vext_spec) {
+env->vext.vtype = ~((target_ulong)-1 >> 1);

Better as FIELD_DP64(0, VTYPE, VILL, 1),



+struct VTYPE {
+#ifdef HOST_WORDS_BIGENDIAN
+target_ulong vill:1;
+target_ulong reserved:sizeof(target_ulong) * 8 - 7;
+target_ulong sew:3;
+target_ulong lmul:2;
+#else
+target_ulong lmul:2;
+target_ulong sew:3;
+target_ulong reserved:sizeof(target_ulong) * 8 - 7;
+target_ulong vill:1;
+#endif
+};

Do not use bit fields to describe target register layout.
Use FIELD().

OK. I think there is no need to handle endianness here.  FIELD() is good.


-#define TB_FLAGS_MMU_MASK   3
-#define TB_FLAGS_MSTATUS_FS MSTATUS_FS
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, MMU, 0, 2)
+FIELD(TB_FLAGS, FS, 13, 2)

The change to use FIELD for MMU and FS should be made separately from adding
the vector state.


+FIELD(TB_FLAGS, VL_EQ_VLMAX, 16, 1)
+FIELD(TB_FLAGS, LMUL, 17, 2)
+FIELD(TB_FLAGS, SEW, 19, 3)
+FIELD(TB_FLAGS, VILL, 22, 1)

Why are you leaving holes in TB_FLAGS?  I know why the original hole was there,
since it corresponded to simple masks on other registers.


+vlmax = (1 << vtype->lmul) * cpu->cfg.vlen / (8 * (1 << vtype->sew));

Wow, this can be simplified a lot.

(1 << LMUL) * VLEN / (8 * (1 << SEW))
  = (VLEN << LMUL) / (8 << SEW)
  = (VLEN << LMUL) >> (SEW + 3)
  = VLEN >> (SEW + 3 - LMUL)


Good.

+vl_eq_vlmax = (env->vext.vstart == 0) && (vlmax == env->vext.vl);
+
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, vtype->vill);
+flags = FIELD_DP32(flags, TB_FLAGS, SEW, vtype->sew);
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL, vtype->lmul);
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);

I wonder if perhaps this all ought to be nested under

   if (env->misa & RVV) {
   ...
   } else {
   flag = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
   }

so that, for the normal case when RVV is disabled, we don't bother computing
all of those bits.


+static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
+{
+TCGv s1, s2, d;
+d = tcg_temp_new();
+s1 = tcg_temp_new();
+s2 = tcg_temp_new();
+gen_get_gpr(s1, a->rs1);
+gen_get_gpr(s2, a->rs2);
+gen_helper_vector_vsetvli(d, cpu_env, s1, s2);
+tcg_gen_st_tl(d, cpu_env, offsetof(CPURISCVState, vext.vl));

Why are you performing the store to vl inline, as opposed to within the helper
function?


+exit_tb(ctx);

A normal exit is correct for vsetvl, because the new state is variable.


+static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
+{
+TCGv s1, s2, d;
+d = tcg_temp_new();
+s1 = tcg_temp_new();
+s2 = tcg_const_tl(a->zimm);
+gen_get_gpr(s1, a->rs1);
+gen_helper_vector_vsetvli(d, cpu_env, s1, s2);
+tcg_gen_st_tl(d, cpu_env, offsetof(CPURISCVState, vext.vl));
+exit_tb(ctx);

You could use

   gen_goto_tb(ctx, 0, ctx->base.pc_next)

here, because the new state is unknown but constant.  It will be the same every
time the instruction is executed, and thus can comp

Re: [Qemu-devel] [PATCH v2 05/17] RISC-V: add vector extension load and store instructions

2020-01-07 Thread LIU Zhiwei

Hi Richard,

Sorry for replying so late to this comment.  I will move forward on part 2.
On 2019/9/12 22:23, Richard Henderson wrote:

+static bool  vector_lmul_check_reg(CPURISCVState *env, uint32_t lmul,
+uint32_t reg, bool widen)
+{
+int legal = widen ? (lmul * 2) : lmul;
+
+if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) ||
+(lmul == 8 && widen)) {
+helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
+return false;
+}
+
+if (reg % legal != 0) {
+helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
+return false;
+}
+return true;
+}

These exceptions will not do the right thing.

You cannot call helper_raise_exception from another helper, or from something
called from another helper, as here.  You need to use riscv_raise_exception, as
you do elsewhere in this patch, with a GETPC() value passed down from the
outermost helper.

Ideally you would check these conditions at translate time.
I've mentioned how to do this in reply to your v1.

As discussed in part1,  I will check these conditions at translate time.

+} else if (i < vl) {
+switch (width) {
+case 8:
+if (vector_elem_mask(env, vm, width, lmul, i)) {
+while (k >= 0) {
+read = i * (nf + 1)  + k;
+env->vfp.vreg[dest + k * lmul].u8[j] =
+cpu_ldub_data(env, env->gpr[rs1] + read);

You must not modify vreg[x] before you've recognized all possible exceptions,
e.g. validating that a subsequent access will not trigger a page fault.
Otherwise you will have a partially modified register value when the exception
handler is entered.

There are two questions here.

1) How do we validate the access before actually writing the registers?

As pointed out in another comment on patchset v1:

"instructions that perform more than one host store must probe
  the entire range to be stored before performing any stores.
"

However, I didn't see page validation in SVE.  For example, sve_st1_r
directly uses the helper_ret_*_mmu functions, which may cause a page fault
exception or overlap a watchpoint, without probing the entire range to be
stored first.

2) Why not use the cpu_ld* API?

I see that in SVE, ld*_p is used to access host memory directly, and
helper_ret_*_mmu is used to access guest memory.  But from the definition
of cpu_ld*, it is the combination of ld*_p and helper_ret_*_mmu:

    entry = tlb_entry(env, mmu_idx, addr);
    if (unlikely(entry->ADDR_READ !=
                 (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
        oi = make_memop_idx(SHIFT, mmu_idx);
        res = glue(glue(helper_ret_ld, URETSUFFIX), MMUSUFFIX)(env, addr,
                                                               oi, retaddr);
    } else {
        uintptr_t hostaddr = addr + entry->addend;
        res = glue(glue(ld, USUFFIX), _p)((uint8_t *)hostaddr);
    }

So why not use the cpu_ld* API here?

Without a stride, and without a predicate mask, this can be done with at most
two calls to probe_access (one per page).  This is the simplification that
makes splitting the helper into two very helpful.

With a stride or with a predicate mask requires either
(1) temporary storage for the loads, and copy back to env at the end, or
(2) use probe_access for each load, and then perform the actual loads directly
into env.

FWIW, ARM SVE uses (1), as probe_access is very new.


+k--;
+}
+env->vfp.vstart++;
+}
+break;
+case 16:
+if (vector_elem_mask(env, vm, width, lmul, i)) {
+while (k >= 0) {
+read = i * (nf + 1)  + k;
+env->vfp.vreg[dest + k * lmul].u16[j] =
+cpu_ldub_data(env, env->gpr[rs1] + read);

I don't see anything in these assignments to vreg[x].uN[y] that take the
endianness of the host into account.

You need to think about how the architecture defines the overlap of elements --
particularly across vlset -- and make adjustments.

I can imagine, if you have explicit tests for this, your tests are passing
because the architecture defines a little-endian based indexing of the register
file, and you have only run tests on a little-endian host, like x86_64.

For ARM, we define the representation as a little-endian indexed array of
host-endian uint64_t.  This means that a big-endian host needs to adjust the
address of any element smaller than 64-bit.  E.g.

#ifdef HOST_WORDS_BIGENDIAN
#define H1(x)   ((x) ^ 7)
#define H2(x)   ((x) ^ 3)
#define H4(x)   ((x) ^ 1)
#else
#define H1(x)   (x)
#define H2(x)   (x)
#define H4(x)   (x)
#endif

 env->vfp.vreg[reg + k * lmul].u16[H2(j)]

I will take it.  However, I don't have a big-endian host to test this
feature on.



+if (base >= abs_off) {
+return base - abs_off;
+}
+} else {
+if ((target_ulong)((target_ulong)offset + base) >= base) 

[PATCH v4 23/60] target/riscv: vector single-width saturating add and subtract

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  33 +++
 target/riscv/insn32.decode  |  10 +
 target/riscv/insn_trans/trans_rvv.inc.c |  16 ++
 target/riscv/vector_helper.c| 278 
 4 files changed, 337 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 121e9e57e7..95da00d365 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -674,3 +674,36 @@ DEF_HELPER_6(vmerge_vxm_b, void, ptr, ptr, tl, ptr, env, 
i32)
 DEF_HELPER_6(vmerge_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmerge_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmerge_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsaddu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssubu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsaddu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssubu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bcb8273bcc..44baadf582 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -402,6 +402,16 @@ vwmaccus_vx 11 . . . 110 . 1010111 
@r_vm
 vmerge_vvm  010111 . . . 000 . 1010111 @r_vm
 vmerge_vxm  010111 . . . 100 . 1010111 @r_vm
 vmerge_vim  010111 . . . 011 . 1010111 @r_vm
+vsaddu_vv   100000 . . . 000 . 1010111 @r_vm
+vsaddu_vx   100000 . . . 100 . 1010111 @r_vm
+vsaddu_vi   100000 . . . 011 . 1010111 @r_vm
+vsadd_vv    100001 . . . 000 . 1010111 @r_vm
+vsadd_vx    100001 . . . 100 . 1010111 @r_vm
+vsadd_vi    100001 . . . 011 . 1010111 @r_vm
+vssubu_vv   100010 . . . 000 . 1010111 @r_vm
+vssubu_vx   100010 . . . 100 . 1010111 @r_vm
+vssub_vv100011 . . . 000 . 1010111 @r_vm
+vssub_vx100011 . . . 100 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index aff5ca8663..ad55766b98 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1505,3 +1505,19 @@ static bool opivx_vmerge_check(DisasContext *s, arg_rmrr 
*a)
 GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
 
 GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
+
+/*
+ *** Vector Fixed-Point Arithmetic Instructions
+ */
+
+/* Vector Single-Width Saturating Add and Subtract */
+GEN_OPIVV_GVEC_TRANS(vsaddu_vv, usadd)
+GEN_OPIVV_GVEC_TRANS(vsadd_vv,  ssadd)
+GEN_OPIVV_GVEC_TRANS(vssubu_vv, ussub)
+GEN_OPIVV_GVEC_TRANS(vssub_vv,  sssub)
+GEN_OPIVX_TRANS(vsaddu_vx,  opivx_check)
+GEN_OPIVX_TRANS(vsadd_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
+GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
+GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 273b705847..c7b8c1bff4 100644
--- a/target/riscv

[PATCH v4 24/60] target/riscv: vector single-width averaging add and subtract

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  17 
 target/riscv/insn32.decode  |   5 +
 target/riscv/insn_trans/trans_rvv.inc.c |   7 ++
 target/riscv/vector_helper.c| 129 
 4 files changed, 158 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 95da00d365..d3837d2ca4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -707,3 +707,20 @@ DEF_HELPER_6(vssub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vaadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vasub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vaadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vasub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 44baadf582..0227a16b16 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -412,6 +412,11 @@ vssubu_vv   100010 . . . 000 . 1010111 
@r_vm
 vssubu_vx   100010 . . . 100 . 1010111 @r_vm
 vssub_vv100011 . . . 000 . 1010111 @r_vm
 vssub_vx100011 . . . 100 . 1010111 @r_vm
+vaadd_vv    100100 . . . 000 . 1010111 @r_vm
+vaadd_vx    100100 . . . 100 . 1010111 @r_vm
+vaadd_vi    100100 . . . 011 . 1010111 @r_vm
+vasub_vv    100110 . . . 000 . 1010111 @r_vm
+vasub_vx    100110 . . . 100 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index ad55766b98..9988fad2fe 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1521,3 +1521,10 @@ GEN_OPIVX_TRANS(vssubu_vx,  opivx_check)
 GEN_OPIVX_TRANS(vssub_vx,  opivx_check)
 GEN_OPIVI_TRANS(vsaddu_vi, 1, vsaddu_vx, opivx_check)
 GEN_OPIVI_TRANS(vsadd_vi, 0, vsadd_vx, opivx_check)
+
+/* Vector Single-Width Averaging Add and Subtract */
+GEN_OPIVV_TRANS(vaadd_vv, opivv_check)
+GEN_OPIVV_TRANS(vasub_vv, opivv_check)
+GEN_OPIVX_TRANS(vaadd_vx,  opivx_check)
+GEN_OPIVX_TRANS(vasub_vx,  opivx_check)
+GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index c7b8c1bff4..b0a7a3b6e4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2291,3 +2291,132 @@ GEN_VEXT_VX_ENV(vssub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vssub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vssub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vssub_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Averaging Add and Subtract */
+static inline uint8_t get_round(CPURISCVState *env, uint64_t v, uint8_t shift)
+{
+uint8_t d = extract64(v, shift, 1);
+uint8_t d1;
+uint64_t D1, D2;
+int mod = env->vxrm;
+
+if (shift == 0 || shift > 64) {
+return 0;
+}
+
+d1 = extract64(v, shift - 1, 1);
+D1 = extract64(v, 0, shift);
+if (mod == 0) { /* round-to-nearest-up (add +0.5 LSB) */
+return d1;
+} else if (mod == 1) { /* round-to-nearest-even */
+if (shift > 1) {
+D2 = extract64(v, 0, shift - 1);
+return d1 & ((D2 != 0) | d);
+} else {
+return d1 & d;
+}
+} else if (mod == 3) { /* round-to-odd (OR bits into LSB, aka "jam") */
+return !d & (D1 != 0);
+}
+return 0; /* round-down (truncate) */
+}
+
+static inline int8_t aadd8(CPURISCVState *env, int8_t a, int8_t b)
+{
+int16_t res = (int16_t)a + (int16_t)b;
+uint8_t round = get_round(env, res, 1);
+res   = (res >> 1) + round;
+return res;
+}
+static inline int16_t aadd16(CPURISCVState *env, int16_t a, int16_t b)
+{
+int32_t res = (int32_t)a + (int32_t)b;
+uint8_t round = get_round(env, res, 1);
+res   = (res >> 1) + round;
+return res;
+}
+static inline

[PATCH v4 25/60] target/riscv: vector single-width fractional multiply with rounding and saturation

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |   9 +++
 target/riscv/insn32.decode  |   2 +
 target/riscv/insn_trans/trans_rvv.inc.c |   4 +
 target/riscv/vector_helper.c| 103 
 4 files changed, 118 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index d3837d2ca4..333eccca57 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -724,3 +724,12 @@ DEF_HELPER_6(vasub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vasub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0227a16b16..99f70924d6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -417,6 +417,8 @@ vaadd_vx100100 . . . 100 . 1010111 @r_vm
 vaadd_vi100100 . . . 011 . 1010111 @r_vm
 vasub_vv100110 . . . 000 . 1010111 @r_vm
 vasub_vx100110 . . . 100 . 1010111 @r_vm
+vsmul_vv    100111 . . . 000 . 1010111 @r_vm
+vsmul_vx    100111 . . . 100 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index 9988fad2fe..60e1e63b7b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1528,3 +1528,7 @@ GEN_OPIVV_TRANS(vasub_vv, opivv_check)
 GEN_OPIVX_TRANS(vaadd_vx,  opivx_check)
 GEN_OPIVX_TRANS(vasub_vx,  opivx_check)
 GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
+
+/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
+GEN_OPIVV_TRANS(vsmul_vv, opivv_check)
+GEN_OPIVX_TRANS(vsmul_vx,  opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index b0a7a3b6e4..74ad07743c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2420,3 +2420,106 @@ GEN_VEXT_VX_ENV(vasub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vasub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vasub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vasub_vx_d, 8, 8, clearq)
+
+/* Vector Single-Width Fractional Multiply with Rounding and Saturation */
+static inline int8_t vsmul8(CPURISCVState *env, int8_t a, int8_t b)
+{
+uint8_t round;
+int16_t res;
+
+res = (int16_t)a * (int16_t)b;
+round = get_round(env, res, 7);
+res   = (res >> 7) + round;
+
+if (res > INT8_MAX) {
+env->vxsat = 0x1;
+return INT8_MAX;
+} else if (res < INT8_MIN) {
+env->vxsat = 0x1;
+return INT8_MIN;
+} else {
+return res;
+}
+}
+static int16_t vsmul16(CPURISCVState *env, int16_t a, int16_t b)
+{
+uint8_t round;
+int32_t res;
+
+res = (int32_t)a * (int32_t)b;
+round = get_round(env, res, 15);
+res   = (res >> 15) + round;
+
+if (res > INT16_MAX) {
+env->vxsat = 0x1;
+return INT16_MAX;
+} else if (res < INT16_MIN) {
+env->vxsat = 0x1;
+return INT16_MIN;
+} else {
+return res;
+}
+}
+static int32_t vsmul32(CPURISCVState *env, int32_t a, int32_t b)
+{
+uint8_t round;
+int64_t res;
+
+res = (int64_t)a * (int64_t)b;
+round = get_round(env, res, 31);
+res   = (res >> 31) + round;
+
+if (res > INT32_MAX) {
+env->vxsat = 0x1;
+return INT32_MAX;
+} else if (res < INT32_MIN) {
+env->vxsat = 0x1;
+return INT32_MIN;
+} else {
+return res;
+}
+}
+static int64_t vsmul64(CPURISCVState *env, int64_t a, int64_t b)
+{
+uint8_t round;
+uint64_t hi_64, lo_64, Hi62;
+uint8_t hi62, hi63, lo63;
+
+muls64(&hi_64, &lo_64, a, b);
+hi62 = extract64(hi_64, 62, 1);
+lo63 = extract64(lo_64, 63, 1);
+hi63 = extract64(hi_64, 63, 1);
+Hi62 = extract64(hi_64, 0, 62);
+if (hi62 != hi63) {
+env->vxsat = 0x1;
+return INT64_MAX;
+}
+round = get_round(env, lo_64, 63);
+if (round && (Hi62 == 0x3fffffffffffffffull) && lo63) {
+env->vxsat = 0x1;
+return hi62 ? INT64_MIN : INT64_MAX;
+} else {
+if (lo63 && round) {
+return (hi_64 + 1) << 1;
+} else {

[PATCH v4 28/60] target/riscv: vector narrowing fixed-point clip instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  13 +++
 target/riscv/insn32.decode  |   6 ++
 target/riscv/insn_trans/trans_rvv.inc.c |   8 ++
 target/riscv/vector_helper.c| 128 
 4 files changed, 155 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index efc84fbd79..4cad8679ec 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -772,3 +772,16 @@ DEF_HELPER_6(vssra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vssra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vnclip_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclip_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclip_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclipu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnclip_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d6d111e04a..c7d589566f 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -432,6 +432,12 @@ vssrl_vi101010 . . . 011 . 1010111 
@r_vm
 vssra_vv101011 . . . 000 . 1010111 @r_vm
 vssra_vx101011 . . . 100 . 1010111 @r_vm
 vssra_vi101011 . . . 011 . 1010111 @r_vm
+vnclipu_vv  101110 . . . 000 . 1010111 @r_vm
+vnclipu_vx  101110 . . . 100 . 1010111 @r_vm
+vnclipu_vi  101110 . . . 011 . 1010111 @r_vm
+vnclip_vv   101111 . . . 000 . 1010111 @r_vm
+vnclip_vx   101111 . . . 100 . 1010111 @r_vm
+vnclip_vi   101111 . . . 011 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index 21f896ea26..11b4887275 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1549,3 +1549,11 @@ GEN_OPIVX_TRANS(vssrl_vx,  opivx_check)
 GEN_OPIVX_TRANS(vssra_vx,  opivx_check)
 GEN_OPIVI_TRANS(vssrl_vi, 1, vssrl_vx, opivx_check)
 GEN_OPIVI_TRANS(vssra_vi, 0, vssra_vx, opivx_check)
+
+/* Vector Narrowing Fixed-Point Clip Instructions */
+GEN_OPIVV_NARROW_TRANS(vnclipu_vv)
+GEN_OPIVV_NARROW_TRANS(vnclip_vv)
+GEN_OPIVX_NARROW_TRANS(vnclipu_vx)
+GEN_OPIVX_NARROW_TRANS(vnclip_vx)
+GEN_OPIVI_NARROW_TRANS(vnclipu_vi, 1, vnclipu_vx)
+GEN_OPIVI_NARROW_TRANS(vnclip_vi, 1, vnclip_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ec0f822fcf..7f61d4c0c4 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -869,6 +869,12 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, 
clearl)
 #define WOP_SSU_B int16_t, int8_t, uint8_t, int16_t, uint16_t
 #define WOP_SSU_H int32_t, int16_t, uint16_t, int32_t, uint32_t
 #define WOP_SSU_W int64_t, int32_t, uint32_t, int64_t, uint64_t
+#define NOP_SSS_B int8_t, int8_t, int16_t, int8_t, int16_t
+#define NOP_SSS_H int16_t, int16_t, int32_t, int16_t, int32_t
+#define NOP_SSS_W int32_t, int32_t, int64_t, int32_t, int64_t
+#define NOP_UUU_B uint8_t, uint8_t, uint16_t, uint8_t, uint16_t
+#define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
+#define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)\
@@ -2812,3 +2818,125 @@ GEN_VEXT_VX_ENV(vssra_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vssra_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vssra_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vssra_vx_d, 8, 8, clearq)
+
+/* Vector Narrowing Fixed-Point Clip Instructions */
+static int8_t vnclip8(CPURISCVState *env, int16_t a, int8_t b)
+{
+uint8_t round, shift = b & 0xf;
+int16_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+if (res > INT8_MAX) {
+env->vxsat = 0x1;
+return INT8_MAX;
+} else if (res < INT8_MIN) {
+env->vxsat = 0x1;
+return INT8_MIN;
+} else {
+return res;
+}
+}
+static int16_t vnclip16(CPURISCVState *env, int32_t a, int16_t b)
+{
+uint8_t round, shift = b & 0x1f;
+int32_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+if 

[PATCH v4 31/60] target/riscv: vector single-width floating-point multiply/divide instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 16 +
 target/riscv/insn32.decode  |  5 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  7 
 target/riscv/vector_helper.c| 48 +
 4 files changed, 76 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f242fa4e4b..a2d7ed19a8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -818,3 +818,19 @@ DEF_HELPER_6(vfwadd_wf_h, void, ptr, ptr, i64, ptr, env, 
i32)
 DEF_HELPER_6(vfwadd_wf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwsub_wf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwsub_wf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmul_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ec95541c6..050b2fd467 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -451,6 +451,11 @@ vfwsub_vv   110010 . . . 001 . 1010111 
@r_vm
 vfwsub_vf   110010 . . . 101 . 1010111 @r_vm
 vfwsub_wv   110110 . . . 001 . 1010111 @r_vm
 vfwsub_wf   110110 . . . 101 . 1010111 @r_vm
+vfmul_vv    100100 . . . 001 . 1010111 @r_vm
+vfmul_vf    100100 . . . 101 . 1010111 @r_vm
+vfdiv_vv    100000 . . . 001 . 1010111 @r_vm
+vfdiv_vf    100000 . . . 101 . 1010111 @r_vm
+vfrdiv_vf   100001 . . . 101 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
b/target/riscv/insn_trans/trans_rvv.inc.c
index ab04f469af..8dcbff6c64 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1795,3 +1795,10 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)   
\
 }
 GEN_OPFWF_WIDEN_TRANS(vfwadd_wf)
 GEN_OPFWF_WIDEN_TRANS(vfwsub_wf)
+
+/* Vector Single-Width Floating-Point Multiply/Divide Instructions */
+GEN_OPFVV_TRANS(vfmul_vv, opfvv_check)
+GEN_OPFVV_TRANS(vfdiv_vv, opfvv_check)
+GEN_OPFVF_TRANS(vfmul_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfdiv_vf,  opfvf_check)
+GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 0840c5d662..bd7ee4de18 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3106,3 +3106,51 @@ RVVCALL(OPFVF2, vfwsub_wf_h, WOP_WUUU_H, H4, H2, 
vfwsubw16)
 RVVCALL(OPFVF2, vfwsub_wf_w, WOP_WUUU_W, H8, H4, vfwsubw32)
 GEN_VEXT_VF(vfwsub_wf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwsub_wf_w, 4, 8, clearq)
+
+/* Vector Single-Width Floating-Point Multiply/Divide Instructions */
+RVVCALL(OPFVV2, vfmul_vv_h, OP_UUU_H, H2, H2, H2, float16_mul)
+RVVCALL(OPFVV2, vfmul_vv_w, OP_UUU_W, H4, H4, H4, float32_mul)
+RVVCALL(OPFVV2, vfmul_vv_d, OP_UUU_D, H8, H8, H8, float64_mul)
+GEN_VEXT_VV_ENV(vfmul_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfmul_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfmul_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfmul_vf_h, OP_UUU_H, H2, H2, float16_mul)
+RVVCALL(OPFVF2, vfmul_vf_w, OP_UUU_W, H4, H4, float32_mul)
+RVVCALL(OPFVF2, vfmul_vf_d, OP_UUU_D, H8, H8, float64_mul)
+GEN_VEXT_VF(vfmul_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfmul_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfmul_vf_d, 8, 8, clearq)
+
+RVVCALL(OPFVV2, vfdiv_vv_h, OP_UUU_H, H2, H2, H2, float16_div)
+RVVCALL(OPFVV2, vfdiv_vv_w, OP_UUU_W, H4, H4, H4, float32_div)
+RVVCALL(OPFVV2, vfdiv_vv_d, OP_UUU_D, H8, H8, H8, float64_div)
+GEN_VEXT_VV_ENV(vfdiv_vv_h, 2, 2, clearh)
+GEN_VEXT_VV_ENV(vfdiv_vv_w, 4, 4, clearl)
+GEN_VEXT_VV_ENV(vfdiv_vv_d, 8, 8, clearq)
+RVVCALL(OPFVF2, vfdiv_vf_h, OP_UUU_H, H2, H2, float16_div)
+RVVCALL(OPFVF2, vfdiv_vf_w, OP_UUU_W, H4, H4, float32_div)
+RVVCALL(OPFVF2, vfdiv_vf_d, OP_UUU_D, H8, H8, float64_div)
+GEN_VEXT_VF(vfdiv_vf_h, 2, 2, clearh)
+GEN_VEXT_VF(vfdiv_vf_w, 4, 4, clearl)
+GEN_VEXT_VF(vfdiv_vf_d, 8, 8, clearq)
+
+static uint16_t float16_rdiv(uint16_t a, uint16_t b, float_status *s

[PATCH v4 00/60] target/riscv: support vector extension v0.7.1

2020-03-10 Thread LIU Zhiwei
This patchset implements the vector extension for RISC-V on QEMU.

You can also find the patchset and all *test cases* in
my repo (https://github.com/romanheros/qemu.git, branch: vector-upstream-v3).
All the test cases are in the directory qemu/tests/riscv/vector/. They are
riscv64 linux user mode programs.

You can test the patchset by the script qemu/tests/riscv/vector/runcase.sh.

Features:
  * support specification riscv-v-spec-0.7.1
(https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1/)
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * not support Zvediv as it is changing.
  * SLEN always equals VLEN.
  * element width support 8bit, 16bit, 32bit, 64bit.

Changelog:
v4
  * no change
v3
  * move check code from execution-time to translation-time
  * use a continuous memory block for the vector register description.
  * vector registers as direct fields in RISCVCPUState.
  * support VLEN configure from qemu command line.
  * support ELEN configure from qemu command line.
  * support vector specification version configure from qemu command line.
  * probe pages before real load or store access.
  * use probe_page_check for no-fault operations in linux user mode.
  * generate an atomic exit exception when in a parallel environment.
  * fix a lot of concrete bugs.

V2
  * use float16_compare{_quiet}
  * only use GETPC() in the outermost helper
  * add ctx.ext_v Property


LIU Zhiwei (60):
  target/riscv: add vector extension field in CPURISCVState
  target/riscv: implementation-defined constant parameters
  target/riscv: support vector extension csr
  target/riscv: add vector configure instruction
  target/riscv: add vector stride load and store instructions
  target/riscv: add vector index load and store instructions
  target/riscv: add fault-only-first unit stride load
  target/riscv: add vector amo operations
  target/riscv: vector single-width integer add and subtract
  target/riscv: vector widening integer add and subtract
  target/riscv: vector integer add-with-carry / subtract-with-borrow
instructions
  target/riscv: vector bitwise logical instructions
  target/riscv: vector single-width bit shift instructions
  target/riscv: vector narrowing integer right shift instructions
  target/riscv: vector integer comparison instructions
  target/riscv: vector integer min/max instructions
  target/riscv: vector single-width integer multiply instructions
  target/riscv: vector integer divide instructions
  target/riscv: vector widening integer multiply instructions
  target/riscv: vector single-width integer multiply-add instructions
  target/riscv: vector widening integer multiply-add instructions
  target/riscv: vector integer merge and move instructions
  target/riscv: vector single-width saturating add and subtract
  target/riscv: vector single-width averaging add and subtract
  target/riscv: vector single-width fractional multiply with rounding
and saturation
  target/riscv: vector widening saturating scaled multiply-add
  target/riscv: vector single-width scaling shift instructions
  target/riscv: vector narrowing fixed-point clip instructions
  target/riscv: vector single-width floating-point add/subtract
instructions
  target/riscv: vector widening floating-point add/subtract instructions
  target/riscv: vector single-width floating-point multiply/divide
instructions
  target/riscv: vector widening floating-point multiply
  target/riscv: vector single-width floating-point fused multiply-add
instructions
  target/riscv: vector widening floating-point fused multiply-add
instructions
  target/riscv: vector floating-point square-root instruction
  target/riscv: vector floating-point min/max instructions
  target/riscv: vector floating-point sign-injection instructions
  target/riscv: vector floating-point compare instructions
  target/riscv: vector floating-point classify instructions
  target/riscv: vector floating-point merge instructions
  target/riscv: vector floating-point/integer type-convert instructions
  target/riscv: widening floating-point/integer type-convert
instructions
  target/riscv: narrowing floating-point/integer type-convert
instructions
  target/riscv: vector single-width integer reduction instructions
  target/riscv: vector widening integer reduction instructions
  target/riscv: vector single-width floating-point reduction
instructions
  target/riscv: vector widening floating-point reduction instructions
  target/riscv: vector mask-register logical instructions
  target/riscv: vector mask population count vmpopc
  target/riscv: vmfirst find-first-set mask bit
  target/riscv: set-X-first mask bit
  target/riscv: vector iota instruction
  target/riscv: vector element index instruction
  target/riscv: integer extract instruction
  target/riscv: integer scalar move instruction
  target/riscv: floating-point scalar move instructions
  target/riscv: vector slide instructions
  target/riscv: vector register gather instruction

[PATCH v4 07/60] target/riscv: add fault-only-first unit stride load

2020-03-10 Thread LIU Zhiwei
The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions (while loops).
These instructions execute as a regular load except that they
will only take a trap on element 0.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  22 +
 target/riscv/insn32.decode  |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  69 +++
 target/riscv/vector_helper.c| 111 
 4 files changed, 209 insertions(+)
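
The trap-only-on-element-0 behaviour can be sketched in plain C. This is a simplified model, not the QEMU implementation: `probe_read`, `mem_len`, and the trap reporting below are illustrative stand-ins for the real MMU probing done in the helper.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <assert.h>

/* Stand-in for the MMU access check: nonzero if byte idx is accessible.
 * Purely illustrative; the real helper probes the TLB/page tables. */
static int probe_read(size_t mem_len, size_t idx)
{
    return idx < mem_len;
}

/* Model of a unit-stride fault-only-first byte load: a fault on element 0
 * traps as usual; a fault on any later element merely shortens vl so the
 * loop's data-dependent exit condition can be detected without trapping. */
static size_t vlbff_model(uint8_t *vd, const uint8_t *mem, size_t mem_len,
                          size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        if (!probe_read(mem_len, i)) {
            if (i == 0) {
                fprintf(stderr, "trap on element 0\n");  /* real trap here */
                return 0;
            }
            return i;  /* truncate vl to the first faulting element */
        }
        vd[i] = mem[i];
    }
    return vl;
}
```

For example, requesting 8 elements from a 3-byte-accessible region returns a new vl of 3 rather than faulting.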

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f9b3da60ca..72ba4d9bdb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vleff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bc36df33b5..b76c09c8c0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -224,6 +224,13 @@ vle_v  ... 000 . 0 . 111 . 111 @r2_nfvm
 vlbu_v ... 000 . 0 . 000 . 111 @r2_nfvm
 vlhu_v ... 000 . 0 . 101 . 111 @r2_nfvm
 vlwu_v ... 000 . 0 . 110 . 111 @r2_nfvm
+vlbff_v    ... 100 . 1 . 000 . 111 @r2_nfvm
+vlhff_v    ... 100 . 1 . 101 . 111 @r2_nfvm
+vlwff_v    ... 100 . 1 . 110 . 111 @r2_nfvm
+vleff_v    ... 000 . 1 . 111 . 111 @r2_nfvm
+vlbuff_v   ... 000 . 1 . 000 . 111 @r2_nfvm
+vlhuff_v   ... 000 . 1 . 101 . 111 @r2_nfvm
+vlwuff_v   ... 000 . 1 . 110 . 111 @r2_nfvm
 vsb_v  ... 000 . 0 . 000 . 0100111 @r2_nfvm
 vsh_v  ... 000 . 0 . 101 . 0100111 @r2_nfvm
 vsw_v  ... 000 . 0 . 110 . 0100111 @r2_nfvm
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 5d1eeef323..9d9fc886d6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -531,3 +531,72 @@ GEN_VEXT_TRANS(vsxb_v, 0, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxh_v, 1, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxw_v, 2, rnfvm, st_index_op, st_index_check)
 GEN_VEXT_TRANS(vsxe_v, 3, rnfvm, st_index_op, st_index_check)
+
+/*
+ *** unit stride fault-only-first load
+ */
+static bool ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+                       gen_helper_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, s->vlen / 8, data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, mask, base, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint32_t data = 0;
+    gen_helper_ldst_us *fn;
+    static gen_helper_ldst_us * const fns[7][4] = {
+        { gen_helper_vlbff_v_b,  gen_helper_vlbff_v_h,
+          gen_helper_vlbff_v_w,  gen_helper_vlbff_v_d },
+        { NULL,                  gen_helper_vlhff_v_h,
+          gen_helper_vlhff_v_w,  gen_helper_vlhff_v_d },
+        { NULL,                  NULL,
+          gen_helper_vlwff_v_w,  gen_helper_vlwff_v_d },
+        { gen_helper_vleff_v_b,  gen_helper_vleff_

[PATCH v4 12/60] target/riscv: vector bitwise logical instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 25 
 target/riscv/insn32.decode  |  9 +
 target/riscv/insn_trans/trans_rvv.inc.c | 11 ++
 target/riscv/vector_helper.c| 51 +
 4 files changed, 96 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72c733bf49..4373e9e8c2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -372,3 +372,28 @@ DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vand_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vxor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vand_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vand_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e8ddf95d3d..29a505cede 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -310,6 +310,15 @@ vsbc_vvm    010010 1 . . 000 . 1010111 @r
 vsbc_vxm    010010 1 . . 100 . 1010111 @r
 vmsbc_vvm   010011 1 . . 000 . 1010111 @r
 vmsbc_vxm   010011 1 . . 100 . 1010111 @r
+vand_vv 001001 . . . 000 . 1010111 @r_vm
+vand_vx 001001 . . . 100 . 1010111 @r_vm
+vand_vi 001001 . . . 011 . 1010111 @r_vm
+vor_vv  001010 . . . 000 . 1010111 @r_vm
+vor_vx  001010 . . . 100 . 1010111 @r_vm
+vor_vi  001010 . . . 011 . 1010111 @r_vm
+vxor_vv 001011 . . . 000 . 1010111 @r_vm
+vxor_vx 001011 . . . 100 . 1010111 @r_vm
+vxor_vi 001011 . . . 011 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a1f2e84eb8..3a4696dbcd 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1212,3 +1212,14 @@ static bool trans_##NAME(DisasContext *s, arg_r *a)  \
 }
 GEN_OPIVI_R_TRANS(vadc_vim, 0, vadc_vxm, opivx_vadc_check)
 GEN_OPIVI_R_TRANS(vmadc_vim, 0, vmadc_vxm, opivx_vmadc_check)
+
+/* Vector Bitwise Logical Instructions */
+GEN_OPIVV_GVEC_TRANS(vand_vv, and)
+GEN_OPIVV_GVEC_TRANS(vor_vv,  or)
+GEN_OPIVV_GVEC_TRANS(vxor_vv, xor)
+GEN_OPIVX_GVEC_TRANS(vand_vx, ands)
+GEN_OPIVX_GVEC_TRANS(vor_vx,  ors)
+GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
+GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
+GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
+GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index dd85b94fe7..532b373f99 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1202,3 +1202,54 @@ GEN_VEXT_VMADC_VXM(vmsbc_vxm_b, uint8_t,  H1, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_h, uint16_t, H2, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_w, uint32_t, H4, DO_MSBC)
 GEN_VEXT_VMADC_VXM(vmsbc_vxm_d, uint64_t, H8, DO_MSBC)
+
+/* Vector Bitwise Logical Instructions */
+RVVCALL(OPIVV2, vand_vv_b, OP_SSS_B, H1, H1, H1, DO_AND)
+RVVCALL(OPIVV2, vand_vv_h, OP_SSS_H, H2, H2, H2, DO_AND)
+RVVCALL(OPIVV2, vand_vv_w, OP_SSS_W, H4, H4, H4, DO_AND)
+RVVCALL(OPIVV2, vand_vv_d, OP_SSS_D, H8, H8, H8, DO_AND)
+RVVCALL(OPIVV2, vor_vv_b, OP_SSS_B, H1, H1, H1, DO_OR)
+RVVCALL(OPIVV2, vor_vv_h, OP_SSS_H, H2, H2, H2, DO_OR)
+RVVCALL(OPIVV2, vor_vv_w, OP_SSS_W, H4, H4, H4

[PATCH v4 10/60] target/riscv: vector widening integer add and subtract

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  49 
 target/riscv/insn32.decode  |  16 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 154 
 target/riscv/vector_helper.c| 112 +
 4 files changed, 331 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e73701d4bb..1256defb6c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -290,3 +290,52 @@ DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwaddu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwaddu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsubu_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d1034a0e61..4bdbfd16fa 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -284,6 +284,22 @@ vsub_vv 10 . . . 000 . 1010111 @r_vm
 vsub_vx 10 . . . 100 . 1010111 @r_vm
 vrsub_vx    11 . . . 100 . 1010111 @r_vm
 vrsub_vi    11 . . . 011 . 1010111 @r_vm
+vwaddu_vv   11 . . . 010 . 1010111 @r_vm
+vwaddu_vx   11 . . . 110 . 1010111 @r_vm
vwadd_vv    110001 . . . 010 . 1010111 @r_vm
vwadd_vx    110001 . . . 110 . 1010111 @r_vm
vwsubu_vv   110010 . . . 010 . 1010111 @r_vm
vwsubu_vx   110010 . . . 110 . 1010111 @r_vm
vwsub_vv    110011 . . . 010 . 1010111 @r_vm
vwsub_vx    110011 . . . 110 . 1010111 @r_vm
vwaddu_wv   110100 . . . 010 . 1010111 @r_vm
vwaddu_wx   110100 . . . 110 . 1010111 @r_vm
vwadd_wv    110101 . . . 010 . 1010111 @r_vm
vwadd_wx    110101 . . . 110 . 1010111 @r_vm
+vwsubu_wv   110110 . . . 010 . 1010111 @r_vm
+vwsubu_wx   110110 . . . 110

[PATCH v4 13/60] target/riscv: vector single-width bit shift instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 25 
 target/riscv/insn32.decode  |  9 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 44 +
 target/riscv/vector_helper.c| 82 +
 4 files changed, 160 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4373e9e8c2..47284c7476 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -397,3 +397,28 @@ DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vsll_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsll_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsrl_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 29a505cede..dbbfa34b97 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -319,6 +319,15 @@ vor_vi  001010 . . . 011 . 1010111 @r_vm
 vxor_vv 001011 . . . 000 . 1010111 @r_vm
 vxor_vx 001011 . . . 100 . 1010111 @r_vm
 vxor_vi 001011 . . . 011 . 1010111 @r_vm
+vsll_vv 100101 . . . 000 . 1010111 @r_vm
+vsll_vx 100101 . . . 100 . 1010111 @r_vm
+vsll_vi 100101 . . . 011 . 1010111 @r_vm
+vsrl_vv 101000 . . . 000 . 1010111 @r_vm
+vsrl_vx 101000 . . . 100 . 1010111 @r_vm
+vsrl_vi 101000 . . . 011 . 1010111 @r_vm
+vsra_vv 101001 . . . 000 . 1010111 @r_vm
+vsra_vx 101001 . . . 100 . 1010111 @r_vm
+vsra_vi 101001 . . . 011 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3a4696dbcd..a60518e1df 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1223,3 +1223,47 @@ GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
 GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
 GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
 GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
+
+/* Vector Single-Width Bit Shift Instructions */
+GEN_OPIVV_GVEC_TRANS(vsll_vv,  shlv)
+GEN_OPIVV_GVEC_TRANS(vsrl_vv,  shrv)
+GEN_OPIVV_GVEC_TRANS(vsra_vv,  sarv)
+
+#define GEN_OPIVX_GVEC_SHIFT_TRANS(NAME, GVSUF)               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)        \
+{                                                             \
+    if (!opivx_check(s, a)) {                                 \
+        return false;                                         \
+    }                                                         \
+                                                              \
+    if (a->vm && s->vl_eq_vlmax) {                            \
+        TCGv_i32 src1 = tcg_temp_new_i32();                   \
+        TCGv tmp = tcg_temp_new();                            \
+        gen_get_gpr(tmp, a->rs1);                             \
+        tcg_gen_trunc_tl_i32(src1, tmp);                      \
+        tcg_gen_gvec_##GVSUF(8 << s->sew, vreg_ofs(s, a->rd), \
+                             vreg_ofs(s, a

[PATCH v4 15/60] target/riscv: vector integer comparison instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  57 +++
 target/riscv/insn32.decode  |  20 
 target/riscv/insn_trans/trans_rvv.inc.c |  66 
 target/riscv/vector_helper.c| 130 
 4 files changed, 273 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 0f36a8ce43..4e6c47c2d2 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -435,3 +435,60 @@ DEF_HELPER_6(vnsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmseq_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsne_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmslt_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsle_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmseq_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsne_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsltu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmslt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsleu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsle_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgtu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsgt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e21b3d6b5e..525b2fa442 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -334,6 +334,26 @@ vnsrl_vi    101100 . . . 011 . 1010111 @r_vm
 vnsra_vv    101101 . . . 000 . 1010111 @r_vm
 vnsra_vx    101101 . . . 100 . 1010111 @r_vm
 vnsra_vi    101101 . . . 011 . 1010111 @r_vm
+vmseq_vv    011000 . . . 000 . 1010111 @r_vm
+vmseq_vx    011000 . . . 100 . 1010111 @r_vm
+vmseq_vi    011000 . . . 011 . 1010111 @r_vm
+vmsne_vv    011001 . . . 000 . 1010111 @r_vm
+vmsne_vx    011001 . . . 100 . 1010111 @r_vm
+vmsne_vi    011001 . . . 011 . 1010111

[PATCH v4 32/60] target/riscv: vector widening floating-point multiply

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  5 +
 target/riscv/insn32.decode  |  2 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  4 
 target/riscv/vector_helper.c| 22 ++
 4 files changed, 33 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a2d7ed19a8..3ec2dcadd4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -834,3 +834,8 @@ DEF_HELPER_6(vfdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrdiv_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 050b2fd467..e0ee8f5a7c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -456,6 +456,8 @@ vfmul_vf    100100 . . . 101 . 1010111 @r_vm
 vfdiv_vv    10 . . . 001 . 1010111 @r_vm
 vfdiv_vf    10 . . . 101 . 1010111 @r_vm
 vfrdiv_vf   11 . . . 101 . 1010111 @r_vm
+vfwmul_vv   111000 . . . 001 . 1010111 @r_vm
+vfwmul_vf   111000 . . . 101 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 8dcbff6c64..b4d3797685 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1802,3 +1802,7 @@ GEN_OPFVV_TRANS(vfdiv_vv, opfvv_check)
 GEN_OPFVF_TRANS(vfmul_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfdiv_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
+
+/* Vector Widening Floating-Point Multiply */
+GEN_OPFVV_WIDEN_TRANS(vfwmul_vv, opfvv_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwmul_vf)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index bd7ee4de18..8bb6ac158f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3154,3 +3154,25 @@ RVVCALL(OPFVF2, vfrdiv_vf_d, OP_UUU_D, H8, H8, float64_rdiv)
 GEN_VEXT_VF(vfrdiv_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfrdiv_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfrdiv_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Multiply */
+static uint32_t vfwmul16(uint16_t a, uint16_t b, float_status *s)
+{
+    return float32_mul(float16_to_float32(a, true, s),
+                       float16_to_float32(b, true, s), s);
+}
+
+static uint64_t vfwmul32(uint32_t a, uint32_t b, float_status *s)
+{
+    return float64_mul(float32_to_float64(a, s),
+                       float32_to_float64(b, s), s);
+}
+RVVCALL(OPFVV2, vfwmul_vv_h, WOP_UUU_H, H4, H2, H2, vfwmul16)
+RVVCALL(OPFVV2, vfwmul_vv_w, WOP_UUU_W, H8, H4, H4, vfwmul32)
+GEN_VEXT_VV_ENV(vfwmul_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwmul_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF2, vfwmul_vf_h, WOP_UUU_H, H4, H2, vfwmul16)
+RVVCALL(OPFVF2, vfwmul_vf_w, WOP_UUU_W, H8, H4, vfwmul32)
+GEN_VEXT_VF(vfwmul_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwmul_vf_w, 4, 8, clearq)
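
The helpers above convert both operands to the wider format first and multiply there. A minimal scalar sketch (plain C arithmetic standing in for the softfloat calls; `fwmul32_model` is a hypothetical name) shows why this avoids the rounding a single-width multiply would incur:

```c
/* Sketch of a widening float32 multiply: convert, then multiply in the
 * wider type.  A double significand (53 bits) can hold the exact product
 * of two float significands (24 + 24 = 48 bits), so this step never
 * rounds, unlike a float-width multiply of the same operands. */
static double fwmul32_model(float a, float b)
{
    return (double)a * (double)b;
}
```

For instance, squaring 16777215.0f (2^24 - 1) yields the exact 281474943156225 in the widened form, while a float-width multiply must round the product.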
-- 
2.23.0




[PATCH v4 30/60] target/riscv: vector widening floating-point add/subtract instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  17 +++
 target/riscv/insn32.decode  |   8 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 131 
 target/riscv/vector_helper.c|  77 ++
 4 files changed, 233 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6b46677eeb..f242fa4e4b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -801,3 +801,20 @@ DEF_HELPER_6(vfsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfrsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwadd_wf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwsub_wf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 32918c4d11..5ec95541c6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -443,6 +443,14 @@ vfadd_vf    00 . . . 101 . 1010111 @r_vm
 vfsub_vv    10 . . . 001 . 1010111 @r_vm
 vfsub_vf    10 . . . 101 . 1010111 @r_vm
 vfrsub_vf   100111 . . . 101 . 1010111 @r_vm
+vfwadd_vv   11 . . . 001 . 1010111 @r_vm
+vfwadd_vf   11 . . . 101 . 1010111 @r_vm
+vfwadd_wv   110100 . . . 001 . 1010111 @r_vm
+vfwadd_wf   110100 . . . 101 . 1010111 @r_vm
+vfwsub_vv   110010 . . . 001 . 1010111 @r_vm
+vfwsub_vf   110010 . . . 101 . 1010111 @r_vm
+vfwsub_wv   110110 . . . 001 . 1010111 @r_vm
+vfwsub_wf   110110 . . . 101 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index af4dcb96c6..ab04f469af 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1664,3 +1664,134 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)   \
 GEN_OPFVF_TRANS(vfadd_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfsub_vf,  opfvf_check)
 GEN_OPFVF_TRANS(vfrsub_vf,  opfvf_check)
+
+/* Vector Widening Floating-Point Add/Subtract Instructions */
+static bool opfvv_widen_check(DisasContext *s, arg_rmrr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+            vext_check_overlap_mask(s, a->rd, a->vm, true) &&
+            vext_check_reg(s, a->rd, true) &&
+            vext_check_reg(s, a->rs2, false) &&
+            vext_check_reg(s, a->rs1, false) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs2,
+                                     1 << s->lmul) &&
+            vext_check_overlap_group(a->rd, 2 << s->lmul, a->rs1,
+                                     1 << s->lmul) &&
+            (s->lmul < 0x3) && (s->sew < 0x3) && (s->sew != 0));
+}
+
+/* OPFVV with WIDEN */
+#define GEN_OPFVV_WIDEN_TRANS(NAME, CHECK)                       \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)           \
+{                                                                \
+    if (CHECK(s, a)) {                                           \
+        uint32_t data = 0;                                       \
+        static gen_helper_gvec_4_ptr * const fns[2] = {          \
+            gen_helper_##NAME##_h, gen_helper_##NAME##_w,        \
+        };                                                       \
+        data = FIELD_DP32(data, VDATA, MLEN, s->mlen);           \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);               \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);           \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0)

[PATCH v4 34/60] target/riscv: vector widening floating-point fused multiply-add instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 17 +
 target/riscv/insn32.decode  |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
 target/riscv/vector_helper.c| 84 +
 4 files changed, 119 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3b6dd96918..57e0fee929 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -888,3 +888,20 @@ DEF_HELPER_6(vfmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfnmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfwmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfwnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9834091a86..b7cb116cf4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -474,6 +474,14 @@ vfmsub_vv   101010 . . . 001 . 1010111 @r_vm
 vfmsub_vf   101010 . . . 101 . 1010111 @r_vm
 vfnmsub_vv  101011 . . . 001 . 1010111 @r_vm
 vfnmsub_vf  101011 . . . 101 . 1010111 @r_vm
+vfwmacc_vv  00 . . . 001 . 1010111 @r_vm
+vfwmacc_vf  00 . . . 101 . 1010111 @r_vm
+vfwnmacc_vv 01 . . . 001 . 1010111 @r_vm
+vfwnmacc_vf 01 . . . 101 . 1010111 @r_vm
+vfwmsac_vv  10 . . . 001 . 1010111 @r_vm
+vfwmsac_vf  10 . . . 101 . 1010111 @r_vm
+vfwnmsac_vv 11 . . . 001 . 1010111 @r_vm
+vfwnmsac_vf 11 . . . 101 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 172de867ea..06d6e2625b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1824,3 +1824,13 @@ GEN_OPFVF_TRANS(vfmadd_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfnmadd_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfmsub_vf, opfvf_check)
 GEN_OPFVF_TRANS(vfnmsub_vf, opfvf_check)
+
+/* Vector Widening Floating-Point Fused Multiply-Add Instructions */
+GEN_OPFVV_WIDEN_TRANS(vfwmacc_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwnmacc_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwmsac_vv, opfvv_widen_check)
+GEN_OPFVV_WIDEN_TRANS(vfwnmsac_vv, opfvv_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2e0341adb0..9bff516a15 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3404,3 +3404,87 @@ RVVCALL(OPFVF3, vfnmsub_vf_d, OP_UUU_D, H8, H8, fnmsub64)
 GEN_VEXT_VF(vfnmsub_vf_h, 2, 2, clearh)
 GEN_VEXT_VF(vfnmsub_vf_w, 4, 4, clearl)
 GEN_VEXT_VF(vfnmsub_vf_d, 8, 8, clearq)
+
+/* Vector Widening Floating-Point Fused Multiply-Add Instructions */
+static uint32_t fwmacc16(uint16_t a, uint16_t b, uint32_t d, float_status *s)
+{
+return float32_muladd(float16_to_float32(a, true, s),
+float16_to_float32(b, true, s), d, 0, s);
+}
+static uint64_t fwmacc32(uint32_t a, uint32_t b, uint64_t d, float_status *s)
+{
+return float64_muladd(float32_to_float64(a, s),
+float32_to_float64(b, s), d, 0, s);
+}
+RVVCALL(OPFVV3, vfwmacc_vv_h, WOP_UUU_H, H4, H2, H2, fwmacc16)
+RVVCALL(OPFVV3, vfwmacc_vv_w, WOP_UUU_W, H8, H4, H4, fwmacc32)
+GEN_VEXT_VV_ENV(vfwmacc_vv_h, 2, 4, clearl)
+GEN_VEXT_VV_ENV(vfwmacc_vv_w, 4, 8, clearq)
+RVVCALL(OPFVF3, vfwmacc_vf_h, WOP_UUU_H, H4, H2, fwmacc16)
+RVVCALL(OPFVF3, vfwmacc_vf_w, WOP_UUU_W, H8, H4, fwmacc32)
+GEN_VEXT_VF(vfwmacc_vf_h, 2, 4, clearl)
+GEN_VEXT_VF(vfwmacc_vf_w, 4, 8, clearq)
+
+static uint32_t fwnmacc16(uint16_t a, uint16_t b

Questions about polluted mailing list archives

2020-03-10 Thread LIU Zhiwei

Hi, folks

When I sent vector extension patchset v3 (2020/03/09), my mail system
malfunctioned and only part of the patchset was sent. When I tried to send
it again, that did not work either.

Worse, I found that the mailing list archives were polluted: there are many
repetitions scattered across many threads, and no thread is complete.

Is this serious?
Is there any way to clean this up in the mailing list archives?
Can I send the patchset to the mailing list again?

Zhiwei





[PATCH v4 06/60] target/riscv: add vector index load and store instructions

2020-03-10 Thread LIU Zhiwei
Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  35 +++
 target/riscv/insn32.decode  |  13 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 124 
 target/riscv/vector_helper.c| 117 ++
 4 files changed, 289 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 87dfa90609..f9b3da60ca 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_h, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_w, void, ptr, ptr, tl, tl, env, i32)
 DEF_HELPER_6(vsse_v_d, void, ptr, ptr, tl, tl, env, i32)
+DEF_HELPER_6(vlxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ef521152c5..bc36df33b5 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -241,6 +241,19 @@ vssh_v ... 010 . . . 101 . 0100111 @r_nfvm
 vssw_v ... 010 . . . 110 . 0100111 @r_nfvm
 vsse_v ... 010 . . . 111 . 0100111 @r_nfvm
 
+vlxb_v ... 111 . . . 000 . 111 @r_nfvm
+vlxh_v ... 111 . . . 101 . 111 @r_nfvm
+vlxw_v ... 111 . . . 110 . 111 @r_nfvm
+vlxe_v ... 011 . . . 111 . 111 @r_nfvm
+vlxbu_v... 011 . . . 000 . 111 @r_nfvm
+vlxhu_v... 011 . . . 101 . 111 @r_nfvm
+vlxwu_v... 011 . . . 110 . 111 @r_nfvm
+# Vector ordered-indexed and unordered-indexed store insns.
+vsxb_v ... -11 . . . 000 . 0100111 @r_nfvm
+vsxh_v ... -11 . . . 101 . 0100111 @r_nfvm
+vsxw_v ... -11 . . . 110 . 0100111 @r_nfvm
+vsxe_v ... -11 . . . 111 . 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index d85f2aec68..5d1eeef323 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -407,3 +407,127 @@ GEN_VEXT_TRANS(vssb_v, 0, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssh_v, 1, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vssw_v, 2, rnfvm, st_stride_op, st_stride_check)
 GEN_VEXT_TRANS(vsse_v, 3, rnfvm, st_stride_op, st_stride_check)
+
+/*
+ *** index load and store
+ */
+typedef void gen_helper_ldst_index(TCGv_ptr, TCGv_ptr, TCGv,
+TCGv_ptr, TCGv_env, TCGv_i32);
+
+static bool ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+uint32_t data, gen_helper_ldst_index *fn

[PATCH v4 01/60] target/riscv: add vector extension field in CPURISCVState

2020-03-10 Thread LIU Zhiwei
The 32 vector registers are viewed as one contiguous memory block. This
avoids converting between an element index and a (regno, offset) pair,
so elements can be accessed directly by their byte offset from the base
address of the first vector register.

Signed-off-by: LIU Zhiwei 
Acked-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3dcdf92227..0c1f7bdd8b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,7 @@
 #define RVA RV('A')
 #define RVF RV('F')
 #define RVD RV('D')
+#define RVV RV('V')
 #define RVC RV('C')
 #define RVS RV('S')
 #define RVU RV('U')
@@ -94,9 +95,20 @@ typedef struct CPURISCVState CPURISCVState;
 
 #include "pmp.h"
 
+#define RV_VLEN_MAX 512
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
+
+/* vector coprocessor state. */
+uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
+target_ulong vxrm;
+target_ulong vxsat;
+target_ulong vl;
+target_ulong vstart;
+target_ulong vtype;
+
 target_ulong pc;
 target_ulong load_res;
 target_ulong load_val;
-- 
2.23.0




[PATCH v4 05/60] target/riscv: add vector stride load and store instructions

2020-03-10 Thread LIU Zhiwei
Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Vector unit-stride operations access elements stored contiguously in memory,
starting from the base effective address. They can be seen as a special
case of strided operations.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h  |   6 +
 target/riscv/helper.h   | 105 ++
 target/riscv/insn32.decode  |  32 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 340 
 target/riscv/translate.c|   7 +
 target/riscv/vector_helper.c| 406 
 6 files changed, 896 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 11fc573168..a6761f3838 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -369,6 +369,12 @@ typedef CPURISCVState CPUArchState;
 typedef RISCVCPU ArchCPU;
 #include "exec/cpu-all.h"
 
+/* share data between vector helpers and decode code */
+FIELD(VDATA, MLEN, 0, 8)
+FIELD(VDATA, VM, 8, 1)
+FIELD(VDATA, LMUL, 9, 2)
+FIELD(VDATA, NF, 11, 4)
+
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
 FIELD(TB_FLAGS, SEW, 5, 3)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..87dfa90609 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,108 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, ptr, tl, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, ptr, tl, env, i32)
+DE

[PATCH v4 08/60] target/riscv: add vector amo operations

2020-03-10 Thread LIU Zhiwei
Vector AMOs operate as if the aq and rl bits were zero on each element,
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
within the same vector AMO instruction.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu.h  |   1 +
 target/riscv/helper.h   |  29 +
 target/riscv/insn32-64.decode   |  11 ++
 target/riscv/insn32.decode  |  13 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 130 +
 target/riscv/vector_helper.c| 143 
 6 files changed, 327 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index a6761f3838..6fcaa5bc89 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -374,6 +374,7 @@ FIELD(VDATA, MLEN, 0, 8)
 FIELD(VDATA, VM, 8, 1)
 FIELD(VDATA, LMUL, 9, 2)
 FIELD(VDATA, NF, 11, 4)
+FIELD(VDATA, WD, 11, 1)
 
 FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
 FIELD(TB_FLAGS, LMUL, 3, 2)
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 72ba4d9bdb..70a4b05f75 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,32 @@ DEF_HELPER_5(vlhuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlhuff_v_d, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_w, void, ptr, ptr, tl, env, i32)
 DEF_HELPER_5(vlwuff_v_d, void, ptr, ptr, tl, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d, void, ptr, ptr, tl, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w,   void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . . . 011 . 010 @atom_st
 amominu_d  11000 . . . . 011 . 010 @atom_st
 amomaxu_d  11100 . . . . 011 . 010 @atom_st
 
+# *** Vector AMO operations (in addition to Zvamo) ***
+vamoswapd_v 1 . . . . 111 . 010 @r_wdvm
+vamoaddd_v  0 . . . . 111 . 010 @r_wdvm
+vamoxord_v  00100 . . . . 111 . 010 @r_wdvm
+vamoandd_v  01100 . . . . 111 . 010 @r_wdvm
+vamoord_v   01000 . . . . 111 . 010 @r_wdvm
+vamomind_v  1 . . . . 111 . 010 @r_wdvm
+vamomaxd_v  10100 . . . . 111 . 010 @r_wdvm
+vamominud_v 11000 . . . . 111 . 010 @r_wdvm
+vamomaxud_v 11100 . . . . 111 . 010 @r_wdvm
+
 # *** RV64F Standard Extension (in addition to RV32F) ***
 fcvt_l_s   110  00010 . ... . 1010011 @r2_rm
 fcvt_lu_s  110  00011 . ... . 1010011 @r2_rm
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b76c09c8c0..1330703720 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &u         imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
 
@@ -67,6 +68,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd

[PATCH v4 09/60] target/riscv: vector single-width integer add and subtract

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  21 +++
 target/riscv/insn32.decode  |  10 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 220 
 target/riscv/vector_helper.c| 122 +
 4 files changed, 373 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 70a4b05f75..e73701d4bb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -269,3 +269,24 @@ DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 1330703720..d1034a0e61 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &u         imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rmrr      vm rd rs1 rs2
 &rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
@@ -68,6 +69,7 @@
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r_nfvm  ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd
+@r_vm    ...... vm:1 ..... ..... ... ..... ....... &rmrr %rs2 %rs1 %rd
 @r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
@@ -275,5 +277,13 @@ vamominuw_v 11000 . . . . 110 . 010 @r_wdvm
 vamomaxuw_v 11100 . . . . 110 . 010 @r_wdvm
 
 # *** new major opcode OP-V ***
+vadd_vv 00 . . . 000 . 1010111 @r_vm
+vadd_vx 00 . . . 100 . 1010111 @r_vm
+vadd_vi 00 . . . 011 . 1010111 @r_vm
+vsub_vv 10 . . . 000 . 1010111 @r_vm
+vsub_vx 10 . . . 100 . 1010111 @r_vm
+vrsub_vx11 . . . 100 . 1010111 @r_vm
+vrsub_vi11 . . . 011 . 1010111 @r_vm
+
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 3c677160c5..00c7ec976f 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -730,3 +730,223 @@ GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
 GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
 GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
 #endif
+
+/*
+ *** Vector Integer Arithmetic Instructions
+ */
+#define MAXSZ(s) (s->vlen >> (3 - s->lmul))
+
+static bool opivv_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+vext_check_reg(s, a->rs1, false));
+}
+
+/* OPIVV with GVEC IR */
+#define GEN_OPIVV_GVEC_TRANS(NAME, GVSUF)  \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
+{  \
+if (!opivv_check(s, a)) {  \
+return false;  \
+}  \
+   \
+if (a->vm && s->vl_eq_vlmax) { \
+tcg_gen_gvec_##GVSUF(8 << s->se

[PATCH v4 02/60] target/riscv: implementation-defined constant parameters

2020-03-10 Thread LIU Zhiwei
vlen is the vector register length in bits.
elen is the maximum element size in bits.
vext_spec is the vector specification version; the default is v0.7.1.

Signed-off-by: LIU Zhiwei 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c | 7 +++
 target/riscv/cpu.h | 5 +
 2 files changed, 12 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c0b7023100..6e4135583d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -106,6 +106,11 @@ static void set_priv_version(CPURISCVState *env, int priv_ver)
 env->priv_ver = priv_ver;
 }
 
+static void set_vext_version(CPURISCVState *env, int vext_ver)
+{
+env->vext_ver = vext_ver;
+}
+
 static void set_feature(CPURISCVState *env, int feature)
 {
 env->features |= (1ULL << feature);
@@ -364,6 +369,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
+int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = 0;
 Error *local_err = NULL;
 
@@ -389,6 +395,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 
 set_priv_version(env, priv_version);
+set_vext_version(env, vext_version);
 set_resetvec(env, DEFAULT_RSTVEC);
 
 if (cpu->cfg.mmu) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0c1f7bdd8b..603715f849 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -84,6 +84,8 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
+#define VEXT_VERSION_0_07_1 0x0701
+
 #define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
@@ -119,6 +121,7 @@ struct CPURISCVState {
 target_ulong guest_phys_fault_addr;
 
 target_ulong priv_ver;
+target_ulong vext_ver;
 target_ulong misa;
 target_ulong misa_mask;
 
@@ -281,6 +284,8 @@ typedef struct RISCVCPU {
 
 char *priv_spec;
 char *user_spec;
+uint16_t vlen;
+uint16_t elen;
 bool mmu;
 bool pmp;
 } cfg;
-- 
2.23.0




[PATCH v4 22/60] target/riscv: vector integer merge and move instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  9 
 target/riscv/insn32.decode  |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 24 ++
 target/riscv/vector_helper.c| 58 +
 4 files changed, 94 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1f0d3d60e3..121e9e57e7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -665,3 +665,12 @@ DEF_HELPER_6(vwmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmerge_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmerge_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2a5b945139..bcb8273bcc 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -399,6 +399,9 @@ vwmacc_vx   01 . . . 110 . 1010111 @r_vm
 vwmaccsu_vv 10 . . . 010 . 1010111 @r_vm
 vwmaccsu_vx 10 . . . 110 . 1010111 @r_vm
 vwmaccus_vx 11 . . . 110 . 1010111 @r_vm
+vmerge_vvm  010111 . . . 000 . 1010111 @r_vm
+vmerge_vxm  010111 . . . 100 . 1010111 @r_vm
+vmerge_vim  010111 . . . 011 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 958737d097..aff5ca8663 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1481,3 +1481,27 @@ GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmacc_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx)
+
+/* Vector Integer Merge and Move Instructions */
+static bool opivv_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+vext_check_reg(s, a->rs1, false) &&
+((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVV_TRANS(vmerge_vvm, opivv_vmerge_check)
+
+static bool opivx_vmerge_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+((a->vm == 0) || (a->rs2 == 0)));
+}
+GEN_OPIVX_TRANS(vmerge_vxm, opivx_vmerge_check)
+
+GEN_OPIVI_TRANS(vmerge_vim, 0, vmerge_vxm, opivx_vmerge_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 5109654f9f..273b705847 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1955,3 +1955,61 @@ GEN_VEXT_VX(vwmaccsu_vx_w, 4, 8, clearq)
 GEN_VEXT_VX(vwmaccus_vx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwmaccus_vx_w, 4, 8, clearq)
+
+/* Vector Integer Merge and Move Instructions */
+#define GEN_VEXT_VMERGE_VV(NAME, ETYPE, H, CLEAR_FN) \
+void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,  \
+CPURISCVState *env, uint32_t desc)   \
+{\
+uint32_t mlen = vext_mlen(desc); \
+uint32_t vm = vext_vm(desc); \
+uint32_t vl = env->vl;   \
+uint32_t esz = sizeof(ETYPE);\
+uint32_t vlmax = vext_maxsz(desc) / esz; \
+uint32_t i;  \
+ \
+for (i = 0; i < vl; i++) {   \
+if (!vm && !vext_elem_mask(v0, mlen, i)) {   \
+ETYPE s2 = *((ETYPE *)vs2 + H(i));   \
+*((ETYPE *)vd + H1(i)) = s2; \
+} else { 

[PATCH v4 16/60] target/riscv: vector integer min/max instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 33 
 target/riscv/insn32.decode  |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 
 target/riscv/vector_helper.c| 71 +
 4 files changed, 122 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4e6c47c2d2..c7d4ff185a 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -492,3 +492,36 @@ DEF_HELPER_6(vmsgt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmsgt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vminu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmin_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmax_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vminu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmin_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmaxu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmax_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 525b2fa442..a7619f4e3d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -354,6 +354,14 @@ vmsgtu_vx   00 . . . 100 . 1010111 @r_vm
 vmsgtu_vi   00 . . . 011 . 1010111 @r_vm
 vmsgt_vx01 . . . 100 . 1010111 @r_vm
 vmsgt_vi01 . . . 011 . 1010111 @r_vm
+vminu_vv000100 . . . 000 . 1010111 @r_vm
+vminu_vx000100 . . . 100 . 1010111 @r_vm
+vmin_vv 000101 . . . 000 . 1010111 @r_vm
+vmin_vx 000101 . . . 100 . 1010111 @r_vm
+vmaxu_vv000110 . . . 000 . 1010111 @r_vm
+vmaxu_vx000110 . . . 100 . 1010111 @r_vm
+vmax_vv 000111 . . . 000 . 1010111 @r_vm
+vmax_vx 000111 . . . 100 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 078d275af6..4437a77878 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1424,3 +1424,13 @@ GEN_OPIVI_TRANS(vmsleu_vi, 1, vmsleu_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsle_vi, 0, vmsle_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsgtu_vi, 1, vmsgtu_vx, opivx_cmp_check)
 GEN_OPIVI_TRANS(vmsgt_vi, 0, vmsgt_vx, opivx_cmp_check)
+
+/* Vector Integer Min/Max Instructions */
+GEN_OPIVV_GVEC_TRANS(vminu_vv, umin)
+GEN_OPIVV_GVEC_TRANS(vmin_vv,  smin)
+GEN_OPIVV_GVEC_TRANS(vmaxu_vv, umax)
+GEN_OPIVV_GVEC_TRANS(vmax_vv,  smax)
+GEN_OPIVX_TRANS(vminu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
+GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmax_vx,  opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e7a4e99f46..03e001262f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -849,6 +849,10 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
 #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
 #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t

[PATCH v4 11/60] target/riscv: vector integer add-with-carry / subtract-with-borrow instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  33 ++
 target/riscv/insn32.decode  |  10 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 108 ++
 target/riscv/vector_helper.c| 140 
 4 files changed, 291 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1256defb6c..72c733bf49 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -339,3 +339,36 @@ DEF_HELPER_6(vwadd_wx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsub_wx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vvm_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4bdbfd16fa..e8ddf95d3d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -300,6 +300,16 @@ vwsubu_wv   110110 . . . 010 . 1010111 @r_vm
 vwsubu_wx   110110 . . . 110 . 1010111 @r_vm
 vwsub_wv110111 . . . 010 . 1010111 @r_vm
 vwsub_wx110111 . . . 110 . 1010111 @r_vm
+vadc_vvm01 1 . . 000 . 1010111 @r
+vadc_vxm01 1 . . 100 . 1010111 @r
+vadc_vim01 1 . . 011 . 1010111 @r
+vmadc_vvm   010001 1 . . 000 . 1010111 @r
+vmadc_vxm   010001 1 . . 100 . 1010111 @r
+vmadc_vim   010001 1 . . 011 . 1010111 @r
+vsbc_vvm010010 1 . . 000 . 1010111 @r
+vsbc_vxm010010 1 . . 100 . 1010111 @r
+vmsbc_vvm   010011 1 . . 000 . 1010111 @r
+vmsbc_vxm   010011 1 . . 100 . 1010111 @r
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 7f6fe82fb3..a1f2e84eb8 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1104,3 +1104,111 @@ GEN_OPIWX_WIDEN_TRANS(vwaddu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwadd_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsubu_wx)
 GEN_OPIWX_WIDEN_TRANS(vwsub_wx)
+
+/* OPIVV with UNMASKED */
+#define GEN_OPIVV_R_TRANS(NAME, CHECK) \
+static bool trans_##NAME(DisasContext *s, arg_r *a)\
+{  \
+if (CHECK(s, a)) { \
+uint32_t data = 0; \
+static gen_helper_gvec_4_ptr * const fns[4] = {\
+gen_helper_##NAME##_b, gen_helper_##NAME##_h,  \
+gen_helper_##NAME##_w, gen_helper_##NAME##_d,  \
+}; \
+   \
+data = FIELD_DP32(data, VDATA, MLEN, s->mlen); \
+data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
+tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
+vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),  \
+cpu_env, 0, s->vlen / 8, data, fns[s->sew]);   \
+return true;   \
+}  \
+return false;  \
+}

[PATCH v4 33/60] target/riscv: vector single-width floating-point fused multiply-add instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  49 +
 target/riscv/insn32.decode  |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  18 ++
 target/riscv/vector_helper.c| 228 
 4 files changed, 311 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3ec2dcadd4..3b6dd96918 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -839,3 +839,52 @@ DEF_HELPER_6(vfwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwmul_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_6(vfmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmacc_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsac_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfnmsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e0ee8f5a7c..9834091a86 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -458,6 +458,22 @@ vfdiv_vf10 . . . 101 . 1010111 @r_vm
 vfrdiv_vf   11 . . . 101 . 1010111 @r_vm
 vfwmul_vv   111000 . . . 001 . 1010111 @r_vm
 vfwmul_vf   111000 . . . 101 . 1010111 @r_vm
+vfmacc_vv   101100 . . . 001 . 1010111 @r_vm
+vfnmacc_vv  101101 . . . 001 . 1010111 @r_vm
+vfnmacc_vf  101101 . . . 101 . 1010111 @r_vm
+vfmacc_vf   101100 . . . 101 . 1010111 @r_vm
+vfmsac_vv   101110 . . . 001 . 1010111 @r_vm
+vfmsac_vf   101110 . . . 101 . 1010111 @r_vm
+vfnmsac_vv  10 . . . 001 . 1010111 @r_vm
+vfnmsac_vf  10 . . . 101 . 1010111 @r_vm
+vfmadd_vv   101000 . . . 001 . 1010111 @r_vm
+vfmadd_vf   101000 . . . 101 . 1010111 @r_vm
+vfnmadd_vv  101001 . . . 001 . 1010111 @r_vm
+vfnmadd_vf  101001 . . . 101 . 1010111 @r_vm
vfmsub_vv   101010 . . . 001 . 1010111 @r_vm
vfmsub_vf   101010 . . . 101 . 1010111 @r_vm
vfnmsub_vv  101011 . . . 001 . 1010111 @r_vm
vfnmsub_vf  101011 . . . 101 . 1010111 @r_vm

[PATCH v4 35/60] target/riscv: vector floating-point square-root instruction

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  4 +++
 target/riscv/insn32.decode  |  3 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 37 +++
 target/riscv/vector_helper.c| 40 +
 4 files changed, 84 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 57e0fee929..c2f9871490 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -905,3 +905,7 @@ DEF_HELPER_6(vfwmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwnmsac_vf_h, void, ptr, ptr, i64, ptr, env, i32)
 DEF_HELPER_6(vfwnmsac_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+
+DEF_HELPER_5(vfsqrt_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfsqrt_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vfsqrt_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7cb116cf4..fc9aebc6d6 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -45,6 +45,7 @@
  shamt rs1 rd
 aq rl rs2 rs1 rd
   vm rd rs1 rs2
+   vm rd rs2
  vm wd rd rs1 rs2
 vm rd rs1 nf
  vm rd rs1 rs2 nf
@@ -68,6 +69,7 @@
 @r2_rm   ...   . . ... . ... %rs1 %rm %rd
 @r2  ...   . . ... . ... %rs1 %rd
 @r2_nfvm ... ... vm:1 . . ... . ...  %nf %rs1 %rd
+@r2_vm   .. vm:1 . . ... . ...  %rs2 %rd
 @r_nfvm  ... ... vm:1 . . ... . ...  %nf %rs2 %rs1 %rd
 @r_vm.. vm:1 . . ... . ...  %rs2 %rs1 %rd
 @r_wdvm  . wd:1 vm:1 . . ... . ...  %rs2 %rs1 %rd
@@ -482,6 +484,7 @@ vfwmsac_vv  10 . . . 001 . 1010111 @r_vm
 vfwmsac_vf  10 . . . 101 . 1010111 @r_vm
 vfwnmsac_vv 11 . . . 001 . 1010111 @r_vm
 vfwnmsac_vf 11 . . . 101 . 1010111 @r_vm
+vfsqrt_v100011 . . 0 001 . 1010111 @r2_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 06d6e2625b..3e4f7de240 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1834,3 +1834,40 @@ GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf)
 GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf)
+
+/* Vector Floating-Point Square-Root Instruction */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised
+ */
+static bool opfv_check(DisasContext *s, arg_rmr *a)
+{
+    return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+(s->sew != 0));
+}
+
+#define GEN_OPFV_TRANS(NAME, CHECK)\
+static bool trans_##NAME(DisasContext *s, arg_rmr *a)  \
+{  \
+if (CHECK(s, a)) { \
+uint32_t data = 0; \
+static gen_helper_gvec_3_ptr * const fns[3] = {\
+gen_helper_##NAME##_h, \
+gen_helper_##NAME##_w, \
+gen_helper_##NAME##_d, \
+}; \
+data = FIELD_DP32(data, VDATA, MLEN, s->mlen); \
+data = FIELD_DP32(data, VDATA, VM, a->vm); \
+data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
+tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
+vreg_ofs(s, a->rs2), cpu_env, 0,   \
+s->vlen / 8, data, fns[s->sew - 1]);   \
+return true;   \
+}  \
+return false;  \
+}
+GEN_OPFV_TRANS(vfsqrt_v, opfv_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 9bff516a15..088bb51af0 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3488,3 +3488,43 @@ RVVCALL(OPFVF3, vfwnmsac_vf_h, WOP_UUU_H, H4, H2, fwnmsac16)
 RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32)
 GEN_VEXT_VF(vfwnmsac_vf_h, 2, 4, clearl)
 GEN_VEXT_VF(vfwnmsac_vf_w, 4, 8, clearq)
+
+/* Vector Floating-Point Square-Root Instruction */
+/* (TD, T2, TX2) */
+#define OP_UU_H uint16_t, uint16_t, uint16_t
+#define OP_UU_W uint32_t, uint32_t, uint32_t
+#define OP_UU_D uint64_t, uint64_t, uint64_t

[PATCH v4 04/60] target/riscv: add vector configure instruction

2020-03-10 Thread LIU Zhiwei
vsetvl and vsetvli are the two configuration instructions for vl and vtype.
TB flags must be updated after a configuration instruction executes. The
(vill, lmul, sew) fields of vtype and a bit for (VSTART == 0 && VL == VLMAX)
are placed within tb_flags.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.h  | 63 ++
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +++
 7 files changed, 199 insertions(+), 12 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 603715f849..11fc573168 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -99,6 +100,12 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, VLMUL, 0, 2)
+FIELD(VTYPE, VSEW, 2, 3)
+FIELD(VTYPE, VEDIV, 5, 2)
+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -358,19 +365,62 @@ void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, VSEW);
+lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t *pflags)
 {
+uint32_t flags = 0;
+
 *pc = env->pc;
 *cs_base = 0;
+
+if (env->misa & RVV) {
+uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vtype, VTYPE, VILL));
+flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+FIELD_EX64(env->vtype, VTYPE, VSEW));
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+FIELD_EX64(env->vtype, VTYPE, VLMUL));
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+} else {
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+}
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags |= TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0);
+flags |= cpu_mmu_index(env, 0);
 if (riscv_cpu_fp_enabled(env)) {
-*flags |= env->mstatus & MSTATUS_FS;
+flags |= env->mstatus & MSTATUS_FS;
 }
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -411,9 +461,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
 #endif
+/* Vector functions */
+DEF_HELPER_3(vsetvl, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b883672e

[PATCH v4 03/60] target/riscv: support vector extension csr

2020-03-10 Thread LIU Zhiwei
The v0.7.1 specification does not define vector status within mstatus.
A future revision will define the privileged portion of the vector status.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/cpu_bits.h | 15 +
 target/riscv/csr.c  | 75 -
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 7f64ee1174..8117e8b5a7 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -29,6 +29,14 @@
 #define FSR_NXA (FPEXC_NX << FSR_AEXC_SHIFT)
 #define FSR_AEXC(FSR_NVA | FSR_OFA | FSR_UFA | FSR_DZA | FSR_NXA)
 
+/* Vector Fixed-Point round model */
+#define FSR_VXRM_SHIFT  9
+#define FSR_VXRM(0x3 << FSR_VXRM_SHIFT)
+
+/* Vector Fixed-Point saturation flag */
+#define FSR_VXSAT_SHIFT 8
+#define FSR_VXSAT   (0x1 << FSR_VXSAT_SHIFT)
+
 /* Control and Status Registers */
 
 /* User Trap Setup */
@@ -48,6 +56,13 @@
 #define CSR_FRM 0x002
 #define CSR_FCSR0x003
 
+/* User Vector CSRs */
+#define CSR_VSTART  0x008
+#define CSR_VXSAT   0x009
+#define CSR_VXRM0x00a
+#define CSR_VL  0xc20
+#define CSR_VTYPE   0xc21
+
 /* User Timers and Counters */
 #define CSR_CYCLE   0xc00
 #define CSR_TIME0xc01
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 11d184cd16..d71c49dfff 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,6 +46,10 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
+/* Allow fcsr access when the vector extension is enabled, even if FP
+ * state checks would fail, since fcsr also holds vxrm/vxsat. */
+if ((csrno == CSR_FCSR) && (env->misa & RVV)) {
+return 0;
+}
 if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
@@ -53,6 +57,14 @@ static int fs(CPURISCVState *env, int csrno)
 return 0;
 }
 
+static int vs(CPURISCVState *env, int csrno)
+{
+if (env->misa & RVV) {
+return 0;
+}
+return -1;
+}
+
 static int ctr(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -174,6 +186,10 @@ static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 #endif
 *val = (riscv_cpu_get_fflags(env) << FSR_AEXC_SHIFT)
 | (env->frm << FSR_RD_SHIFT);
+if (vs(env, csrno) >= 0) {
+*val |= (env->vxrm << FSR_VXRM_SHIFT)
+| (env->vxsat << FSR_VXSAT_SHIFT);
+}
 return 0;
 }
 
@@ -186,10 +202,62 @@ static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 env->mstatus |= MSTATUS_FS;
 #endif
 env->frm = (val & FSR_RD) >> FSR_RD_SHIFT;
+if (vs(env, csrno) >= 0) {
+env->vxrm = (val & FSR_VXRM) >> FSR_VXRM_SHIFT;
+env->vxsat = (val & FSR_VXSAT) >> FSR_VXSAT_SHIFT;
+}
 riscv_cpu_set_fflags(env, (val & FSR_AEXC) >> FSR_AEXC_SHIFT);
 return 0;
 }
 
+static int read_vtype(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vtype;
+return 0;
+}
+
+static int read_vl(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vl;
+return 0;
+}
+
+static int read_vxrm(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxrm;
+return 0;
+}
+
+static int write_vxrm(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxrm = val;
+return 0;
+}
+
+static int read_vxsat(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vxsat;
+return 0;
+}
+
+static int write_vxsat(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vxsat = val;
+return 0;
+}
+
+static int read_vstart(CPURISCVState *env, int csrno, target_ulong *val)
+{
+*val = env->vstart;
+return 0;
+}
+
+static int write_vstart(CPURISCVState *env, int csrno, target_ulong val)
+{
+env->vstart = val;
+return 0;
+}
+
 /* User Timers and Counters */
 static int read_instret(CPURISCVState *env, int csrno, target_ulong *val)
 {
@@ -1269,7 +1337,12 @@ static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_FFLAGS] =  { fs,   read_fflags,  write_fflags  },
 [CSR_FRM] = { fs,   read_frm, write_frm },
 [CSR_FCSR] ={ fs,   read_fcsr,write_fcsr},
-
+/* Vector CSRs */
+[CSR_VSTART] =  { vs,   read_vstart,  write_vstart  },
+[CSR_VXSAT] =   { vs,   read_vxsat,   write_vxsat   },
+[CSR_VXRM] ={ vs,   read_vxrm,write_vxrm},
+[CSR_VL] =  { vs,   read_vl },
+[CSR_VTYPE] =   { vs,   read_vtype  },
 /* User Timers and Counters */
 [CSR_CYCLE] =   { ctr,  read_instret},
 [CSR_INSTRET] = { ctr,  read_instret},
-- 
2.23.0




[PATCH v4 29/60] target/riscv: vector single-width floating-point add/subtract instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  16 
 target/riscv/insn32.decode  |   5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 107 
 target/riscv/vector_helper.c|  89 
 4 files changed, 217 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4cad8679ec..6b46677eeb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -785,3 +785,19 @@ DEF_HELPER_6(vnclipu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnclip_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vfadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfadd_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_h, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_w, void, ptr, ptr, i64, ptr, env, i32)
+DEF_HELPER_6(vfrsub_vf_d, void, ptr, ptr, i64, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index c7d589566f..32918c4d11 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -438,6 +438,11 @@ vnclipu_vi  101110 . . . 011 . 1010111 @r_vm
 vnclip_vv   10 . . . 000 . 1010111 @r_vm
 vnclip_vx   10 . . . 100 . 1010111 @r_vm
 vnclip_vi   10 . . . 011 . 1010111 @r_vm
+vfadd_vv00 . . . 001 . 1010111 @r_vm
+vfadd_vf00 . . . 101 . 1010111 @r_vm
+vfsub_vv10 . . . 001 . 1010111 @r_vm
+vfsub_vf10 . . . 101 . 1010111 @r_vm
+vfrsub_vf   100111 . . . 101 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 11b4887275..af4dcb96c6 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1557,3 +1557,110 @@ GEN_OPIVX_NARROW_TRANS(vnclipu_vx)
 GEN_OPIVX_NARROW_TRANS(vnclip_vx)
 GEN_OPIVI_NARROW_TRANS(vnclipu_vi, 1, vnclipu_vx)
 GEN_OPIVI_NARROW_TRANS(vnclip_vi, 1, vnclip_vx)
+
+/*
+ *** Vector Floating-Point Arithmetic Instructions
+ */
+/* Vector Single-Width Floating-Point Add/Subtract Instructions */
+
+/*
+ * If the current SEW does not correspond to a supported IEEE floating-point
+ * type, an illegal instruction exception is raised.
+ */
+static bool opfvv_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, false) &&
+vext_check_reg(s, a->rs1, false) &&
+(s->sew != 0));
+}
+
+/* OPFVV without GVEC IR */
+#define GEN_OPFVV_TRANS(NAME, CHECK)   \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
+{  \
+if (CHECK(s, a)) { \
+uint32_t data = 0; \
+static gen_helper_gvec_4_ptr * const fns[3] = {\
+gen_helper_##NAME##_h, \
+gen_helper_##NAME##_w, \
+gen_helper_##NAME##_d, \
+}; \
+data = FIELD_DP32(data, VDATA, MLEN, s->mlen); \
+data = FIELD_DP32(data, VDATA, VM, a->vm); \
+data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
+tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
+vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),  \
+cpu_env, 0, s->vlen / 8, data, fns[s->sew - 1]);   \
+return true;   \
+}  \
+return false;  \
+}

[PATCH v4 20/60] target/riscv: vector single-width integer multiply-add instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 33 ++
 target/riscv/insn32.decode  |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 +++
 target/riscv/vector_helper.c| 88 +
 4 files changed, 139 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 1704b8c512..098288df76 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -610,3 +610,36 @@ DEF_HELPER_6(vwmulu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwmulsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmacc_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsac_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnmsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ceddfe4b6c..58de888afa 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -384,6 +384,14 @@ vwmulsu_vv  111010 . . . 010 . 1010111 @r_vm
 vwmulsu_vx  111010 . . . 110 . 1010111 @r_vm
 vwmul_vv111011 . . . 010 . 1010111 @r_vm
 vwmul_vx111011 . . . 110 . 1010111 @r_vm
+vmacc_vv101101 . . . 010 . 1010111 @r_vm
+vmacc_vx101101 . . . 110 . 1010111 @r_vm
+vnmsac_vv   10 . . . 010 . 1010111 @r_vm
+vnmsac_vx   10 . . . 110 . 1010111 @r_vm
+vmadd_vv101001 . . . 010 . 1010111 @r_vm
+vmadd_vx101001 . . . 110 . 1010111 @r_vm
+vnmsub_vv   101011 . . . 010 . 1010111 @r_vm
+vnmsub_vx   101011 . . . 110 . 1010111 @r_vm
 
 vsetvli 0 ... . 111 . 1010111  @r2_zimm
 vsetvl  100 . . 111 . 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 990433f866..05f7ae0bc4 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1462,3 +1462,13 @@ GEN_OPIVV_WIDEN_TRANS(vwmulsu_vv, opivv_widen_check)
 GEN_OPIVX_WIDEN_TRANS(vwmul_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmulu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx)
+
+/* Vector Single-Width Integer Multiply-Add Instructions */
+GEN_OPIVV_TRANS(vmacc_vv, opivv_check)
+GEN_OPIVV_TRANS(vnmsac_vv, opivv_check)
+GEN_OPIVV_TRANS(vmadd_vv, opivv_check)
+GEN_OPIVV_TRANS(vnmsub_vv, opivv_check)
+GEN_OPIVX_TRANS(vmacc_vx, opivx_check)
+GEN_OPIVX_TRANS(vnmsac_vx, opivx_check)
+GEN_OPIVX_TRANS(vmadd_vx, opivx_check)
+GEN_OPIVX_TRANS(vnmsub_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index beb84f9674..e5082c8adc 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1822,3 +1822,91 @@ GEN_VEXT_VX(vwmulu_vx_w, 4, 8, clearq)
 GEN_VEXT_VX(vwmulsu_vx_b, 1, 2, clearh)
 GEN_VEXT_VX(vwmulsu_vx_h, 2, 4, clearl)
 GEN_VEXT_VX(vwmulsu_vx_w, 4, 8, clearq)
+
+/* Vector Single-Width Integer Multiply-Add Instructions */
+#define
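[The vector_helper.c hunk is truncated by the archive. For reference, the per-element arithmetic of the four multiply-add forms follows the v0.7.1 spec; a plain-C sketch is below. The do_* names are illustrative, not from the patch.]

```c
#include <stdint.h>

/* Per-element models of the four single-width multiply-add forms.
 * vmacc/vnmsac accumulate a product into the old vd; vmadd/vnmsub
 * instead multiply by the old vd and add/subtract the other source. */
static int32_t do_macc(int32_t vd, int32_t vs1, int32_t vs2)
{
    return vs1 * vs2 + vd;      /* vmacc:  vd = vs1 * vs2 + vd */
}
static int32_t do_nmsac(int32_t vd, int32_t vs1, int32_t vs2)
{
    return -(vs1 * vs2) + vd;   /* vnmsac: vd = -(vs1 * vs2) + vd */
}
static int32_t do_madd(int32_t vd, int32_t vs1, int32_t vs2)
{
    return vs1 * vd + vs2;      /* vmadd:  vd = vs1 * vd + vs2 */
}
static int32_t do_nmsub(int32_t vd, int32_t vs1, int32_t vs2)
{
    return -(vs1 * vd) + vs2;   /* vnmsub: vd = -(vs1 * vd) + vs2 */
}
```

[In the patch these bodies are generated per element width by the OPIVV3/OPIVX3 macros whose definition is cut off above.]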

[PATCH v4 18/60] target/riscv: vector integer divide instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 33 +++
 target/riscv/insn32.decode  |  8 +++
 target/riscv/insn_trans/trans_rvv.inc.c | 10 
 target/riscv/vector_helper.c| 74 +
 4 files changed, 125 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index f42a12eef3..357f149198 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -558,3 +558,36 @@ DEF_HELPER_6(vmulhsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmulhsu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vdivu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdiv_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vremu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrem_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdivu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vdiv_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vremu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrem_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a8ac4e9e9d..2afe24dd34 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -370,6 +370,14 @@ vmulhu_vv       100100 . ..... ..... 010 ..... 1010111 @r_vm
 vmulhu_vx       100100 . ..... ..... 110 ..... 1010111 @r_vm
 vmulhsu_vv      100110 . ..... ..... 010 ..... 1010111 @r_vm
 vmulhsu_vx      100110 . ..... ..... 110 ..... 1010111 @r_vm
+vdivu_vv        100000 . ..... ..... 010 ..... 1010111 @r_vm
+vdivu_vx        100000 . ..... ..... 110 ..... 1010111 @r_vm
+vdiv_vv         100001 . ..... ..... 010 ..... 1010111 @r_vm
+vdiv_vx         100001 . ..... ..... 110 ..... 1010111 @r_vm
+vremu_vv        100010 . ..... ..... 010 ..... 1010111 @r_vm
+vremu_vx        100010 . ..... ..... 110 ..... 1010111 @r_vm
+vrem_vv         100011 . ..... ..... 010 ..... 1010111 @r_vm
+vrem_vx         100011 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a1ecc9f52d..9f0645a92b 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1444,3 +1444,13 @@ GEN_OPIVX_GVEC_TRANS(vmul_vx,  muls)
 GEN_OPIVX_TRANS(vmulh_vx, opivx_check)
 GEN_OPIVX_TRANS(vmulhu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmulhsu_vx, opivx_check)
+
+/* Vector Integer Divide Instructions */
+GEN_OPIVV_TRANS(vdivu_vv, opivv_check)
+GEN_OPIVV_TRANS(vdiv_vv, opivv_check)
+GEN_OPIVV_TRANS(vremu_vv, opivv_check)
+GEN_OPIVV_TRANS(vrem_vv, opivv_check)
+GEN_OPIVX_TRANS(vdivu_vx, opivx_check)
+GEN_OPIVX_TRANS(vdiv_vx, opivx_check)
+GEN_OPIVX_TRANS(vremu_vx, opivx_check)
+GEN_OPIVX_TRANS(vrem_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 93daafd5bd..6330f5882f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1697,3 +1697,77 @@ GEN_VEXT_VX(vmulhsu_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vmulhsu_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vmulhsu_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vmulhsu_vx_d, 8, 8, clearq)
+
+/* Vector Integer Divide Instructions */
+#define DO_DIVU(N, M) (unlikely(M == 0) ? (__typeof(N))(-1) : N / M)
+#define
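[The helper bodies are cut off above, but the DO_DIVU macro already shows the division-by-zero case. A scalar sketch of the full RISC-V divide/remainder corner cases, as defined by the base ISA (function names are illustrative, not from the patch):]

```c
#include <stdint.h>

/* RISC-V division never traps: x/0 yields all-ones (unsigned) or -1
 * (signed), x%0 yields x, and the lone signed overflow case
 * INT_MIN / -1 yields INT_MIN with remainder 0. */
static uint32_t do_divu32(uint32_t n, uint32_t m)
{
    return m == 0 ? UINT32_MAX : n / m;
}
static int32_t do_div32(int32_t n, int32_t m)
{
    if (m == 0) {
        return -1;
    }
    if (n == INT32_MIN && m == -1) {
        return INT32_MIN;       /* overflow: quotient wraps to dividend */
    }
    return n / m;               /* C division truncates toward zero */
}
static int32_t do_rem32(int32_t n, int32_t m)
{
    if (m == 0) {
        return n;               /* remainder of x/0 is x */
    }
    if (n == INT32_MIN && m == -1) {
        return 0;
    }
    return n % m;
}
```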

[PATCH v4 27/60] target/riscv: vector single-width scaling shift instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  17 
 target/riscv/insn32.decode  |   6 ++
 target/riscv/insn_trans/trans_rvv.inc.c |   8 ++
 target/riscv/vector_helper.c| 109 
 4 files changed, 140 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 74c1c695e0..efc84fbd79 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -755,3 +755,20 @@ DEF_HELPER_6(vwsmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vwsmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vssrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssra_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssrl_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vssra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8798919d3e..d6d111e04a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -426,6 +426,12 @@ vwsmacc_vx      111101 . ..... ..... 100 ..... 1010111 @r_vm
 vwsmaccsu_vv    111110 . ..... ..... 000 ..... 1010111 @r_vm
 vwsmaccsu_vx    111110 . ..... ..... 100 ..... 1010111 @r_vm
 vwsmaccus_vx    111111 . ..... ..... 100 ..... 1010111 @r_vm
+vssrl_vv        101010 . ..... ..... 000 ..... 1010111 @r_vm
+vssrl_vx        101010 . ..... ..... 100 ..... 1010111 @r_vm
+vssrl_vi        101010 . ..... ..... 011 ..... 1010111 @r_vm
+vssra_vv        101011 . ..... ..... 000 ..... 1010111 @r_vm
+vssra_vx        101011 . ..... ..... 100 ..... 1010111 @r_vm
+vssra_vi        101011 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 68bebd3c37..21f896ea26 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1541,3 +1541,11 @@ GEN_OPIVX_WIDEN_TRANS(vwsmaccu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmacc_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmaccsu_vx)
 GEN_OPIVX_WIDEN_TRANS(vwsmaccus_vx)
+
+/* Vector Single-Width Scaling Shift Instructions */
+GEN_OPIVV_TRANS(vssrl_vv, opivv_check)
+GEN_OPIVV_TRANS(vssra_vv, opivv_check)
+GEN_OPIVX_TRANS(vssrl_vx,  opivx_check)
+GEN_OPIVX_TRANS(vssra_vx,  opivx_check)
+GEN_OPIVI_TRANS(vssrl_vi, 1, vssrl_vx, opivx_check)
+GEN_OPIVI_TRANS(vssra_vi, 0, vssra_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 90c19577fa..ec0f822fcf 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2703,3 +2703,112 @@ RVVCALL(OPIVX3_ENV, vwsmaccus_vx_w, WOP_SUS_W, H8, H4, vwsmaccus32)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_b, 1, 2, clearh)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_h, 2, 4, clearl)
 GEN_VEXT_VX_ENV(vwsmaccus_vx_w, 4, 8, clearq)
+
+/* Vector Single-Width Scaling Shift Instructions */
+static uint8_t vssrl8(CPURISCVState *env, uint8_t a, uint8_t b)
+{
+uint8_t round, shift = b & 0x7;
+uint8_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+return res;
+}
+static uint16_t vssrl16(CPURISCVState *env, uint16_t a, uint16_t b)
+{
+uint8_t round, shift = b & 0xf;
+uint16_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+return res;
+}
+static uint32_t vssrl32(CPURISCVState *env, uint32_t a, uint32_t b)
+{
+uint8_t round, shift = b & 0x1f;
+uint32_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+return res;
+}
+static uint64_t vssrl64(CPURISCVState *env, uint64_t a, uint64_t b)
+{
+uint8_t round, shift = b & 0x3f;
+uint64_t res;
+
+round = get_round(env, a, shift);
+res   = (a >> shift)  + round;
+return res;
+}
+RVVCALL(OPIVV2_ENV, vssrl_vv_b, OP_UUU_B, H1, H1, H1, vssrl8)
+RVVCALL(OPIVV2_ENV, vssrl_vv_h, OP_UUU_H, H2, H2, H2, vssrl16)
+RVVCALL(OPIVV2_ENV, vssrl_vv_w, OP_UUU_W, H4, H4, H4, vssrl32)
+RVVCALL(OPIVV2_ENV, v
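[The message is truncated before the remaining RVVCALL/GEN_VEXT lines. The rounding increment produced by get_round depends on env->vxrm; a sketch of vssrl at SEW=8 for the round-to-nearest-up (RNU) mode only — the names below are illustrative, and get_round in the patch also handles the RNE/RDN/ROD modes:]

```c
#include <stdint.h>

/* RNU rounding: the increment is the bit just below the cut point,
 * i.e. bit (shift - 1) of the value being shifted. */
static uint8_t round_rnu(uint64_t v, unsigned shift)
{
    return shift ? (uint8_t)((v >> (shift - 1)) & 1) : 0;
}

/* Scaling right shift at SEW=8: shift by the low 3 bits of b,
 * then add the rounding increment. */
static uint8_t vssrl8_rnu(uint8_t a, uint8_t b)
{
    unsigned shift = b & 0x7;
    return (uint8_t)((a >> shift) + round_rnu(a, shift));
}
```

[E.g. 7 >> 1 truncates to 3, but the discarded low bit is 1, so RNU rounds up to 4.]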

[PATCH v4 21/60] target/riscv: vector widening integer multiply-add instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 22 
 target/riscv/insn32.decode  |  7 
 target/riscv/insn_trans/trans_rvv.inc.c |  9 +
 target/riscv/vector_helper.c| 45 +
 4 files changed, 83 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 098288df76..1f0d3d60e3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -643,3 +643,25 @@ DEF_HELPER_6(vnmsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vnmsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwmaccu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 58de888afa..2a5b945139 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -392,6 +392,13 @@ vmadd_vv        101001 . ..... ..... 010 ..... 1010111 @r_vm
 vmadd_vx        101001 . ..... ..... 110 ..... 1010111 @r_vm
 vnmsub_vv       101011 . ..... ..... 010 ..... 1010111 @r_vm
 vnmsub_vx       101011 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccu_vv      111100 . ..... ..... 010 ..... 1010111 @r_vm
+vwmaccu_vx      111100 . ..... ..... 110 ..... 1010111 @r_vm
+vwmacc_vv       111101 . ..... ..... 010 ..... 1010111 @r_vm
+vwmacc_vx       111101 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccsu_vv     111110 . ..... ..... 010 ..... 1010111 @r_vm
+vwmaccsu_vx     111110 . ..... ..... 110 ..... 1010111 @r_vm
+vwmaccus_vx     111111 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 05f7ae0bc4..958737d097 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1472,3 +1472,12 @@ GEN_OPIVX_TRANS(vmacc_vx, opivx_check)
 GEN_OPIVX_TRANS(vnmsac_vx, opivx_check)
 GEN_OPIVX_TRANS(vmadd_vx, opivx_check)
 GEN_OPIVX_TRANS(vnmsub_vx, opivx_check)
+
+/* Vector Widening Integer Multiply-Add Instructions */
+GEN_OPIVV_WIDEN_TRANS(vwmaccu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmacc_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmaccsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwmaccu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmacc_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmaccsu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmaccus_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e5082c8adc..5109654f9f 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -1910,3 +1910,48 @@ GEN_VEXT_VX(vnmsub_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vnmsub_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vnmsub_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vnmsub_vx_d, 8, 8, clearq)
+
+/* Vector Widening Integer Multiply-Add Instructions */
+RVVCALL(OPIVV3, vwmaccu_vv_b, WOP_UUU_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmaccu_vv_h, WOP_UUU_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmaccu_vv_w, WOP_UUU_W, H8, H4, H4, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_b, WOP_SSS_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_h, WOP_SSS_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmacc_vv_w, WOP_SSS_W, H8, H4, H4, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_b, WOP_SSU_B, H2, H1, H1, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_h, WOP_SSU_H, H4, H2, H2, DO_MACC)
+RVVCALL(OPIVV3, vwmaccsu_vv_w, WOP_SSU_W, H8, H4, H4, DO_MACC)
+GEN_VEXT_VV(vwmaccu_vv_b, 1, 2, clearh)
+GEN_VEXT_VV(vwmaccu_vv_h, 2, 4, clearl)
+GEN_VEXT_VV(vwmaccu_vv_w, 4, 8, clearq)
+GEN_VEXT_VV(vwmacc_vv_b, 1, 2, clearh)
+GEN_VEXT_VV

[PATCH v4 26/60] target/riscv: vector widening saturating scaled multiply-add

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  22 +++
 target/riscv/insn32.decode  |   7 +
 target/riscv/insn_trans/trans_rvv.inc.c |   9 ++
 target/riscv/vector_helper.c| 180 
 4 files changed, 218 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 333eccca57..74c1c695e0 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -733,3 +733,25 @@ DEF_HELPER_6(vsmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwsmaccu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmacc_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwsmaccus_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 99f70924d6..8798919d3e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -419,6 +419,13 @@ vasub_vv        100110 . ..... ..... 000 ..... 1010111 @r_vm
 vasub_vx        100110 . ..... ..... 100 ..... 1010111 @r_vm
 vsmul_vv        100111 . ..... ..... 000 ..... 1010111 @r_vm
 vsmul_vx        100111 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccu_vv     111100 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmaccu_vx     111100 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmacc_vv      111101 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmacc_vx      111101 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccsu_vv    111110 . ..... ..... 000 ..... 1010111 @r_vm
+vwsmaccsu_vx    111110 . ..... ..... 100 ..... 1010111 @r_vm
+vwsmaccus_vx    111111 . ..... ..... 100 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 60e1e63b7b..68bebd3c37 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1532,3 +1532,12 @@ GEN_OPIVI_TRANS(vaadd_vi, 0, vaadd_vx, opivx_check)
 /* Vector Single-Width Fractional Multiply with Rounding and Saturation */
 GEN_OPIVV_TRANS(vsmul_vv, opivv_check)
 GEN_OPIVX_TRANS(vsmul_vx,  opivx_check)
+
+/* Vector Widening Saturating Scaled Multiply-Add */
+GEN_OPIVV_WIDEN_TRANS(vwsmaccu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsmacc_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwsmaccsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmacc_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccsu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwsmaccus_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 74ad07743c..90c19577fa 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -2523,3 +2523,183 @@ GEN_VEXT_VX_ENV(vsmul_vx_b, 1, 1, clearb)
 GEN_VEXT_VX_ENV(vsmul_vx_h, 2, 2, clearh)
 GEN_VEXT_VX_ENV(vsmul_vx_w, 4, 4, clearl)
 GEN_VEXT_VX_ENV(vsmul_vx_d, 8, 8, clearq)
+
+/* Vector Widening Saturating Scaled Multiply-Add */
+static uint16_t vwsmaccu8(CPURISCVState *env, uint8_t a, uint8_t b,
+uint16_t c)
+{
+uint8_t round;
+uint16_t res = (uint16_t)a * (uint16_t)b;
+
+round = get_round(env, res, 4);
+res   = (res >> 4) + round;
+return saddu16(env, c, res);
+}
+static uint32_t vwsmaccu16(CPURISCVState *env, uint16_t a, uint16_t b,
+uint32_t c)
+{
+uint8_t round;
+uint32_t res = (uint32_t)a * (uint32_t)b;
+
+round = get_round(env, res, 8);
+res   = (res >> 8) + round;
+return saddu32(env, c, res);
+}
+static uint64_t vwsmaccu32(CPURISCVState *env, uint32_t a, uint32_t b,
+uint64_t c)
+{
+uint8_t round;
+uint64_t res
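[The message is truncated inside vwsmaccu32. The pattern shown in vwsmaccu8 above is: widen the product, shift out SEW/2 fraction bits with rounding, then saturating-add into the wide accumulator. A standalone model of the 8-bit case, assuming RNU rounding; saddu16_model stands in for the patch's saddu16, which additionally sets the vxsat flag in env:]

```c
#include <stdint.h>

/* Unsigned saturating 16-bit add: clamp to UINT16_MAX on wrap-around. */
static uint16_t saddu16_model(uint16_t a, uint16_t b)
{
    uint16_t res = (uint16_t)(a + b);
    return res < a ? UINT16_MAX : res;
}

/* One element of vwsmaccu at SEW=8: 8x8 -> 16 multiply, scale down by
 * SEW/2 = 4 bits with RNU rounding, saturating-accumulate into c. */
static uint16_t vwsmaccu8_model(uint8_t a, uint8_t b, uint16_t c)
{
    uint16_t res = (uint16_t)a * (uint16_t)b;
    uint16_t round = (uint16_t)((res >> 3) & 1);  /* RNU increment */
    res = (uint16_t)((res >> 4) + round);
    return saddu16_model(c, res);
}
```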

[PATCH v4 17/60] target/riscv: vector single-width integer multiply instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   |  33 ++
 target/riscv/insn32.decode  |   8 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  10 ++
 target/riscv/vector_helper.c| 147 
 4 files changed, 198 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c7d4ff185a..f42a12eef3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -525,3 +525,36 @@ DEF_HELPER_6(vmax_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vmax_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulh_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmul_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulh_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vmulhsu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index a7619f4e3d..a8ac4e9e9d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -362,6 +362,14 @@ vmaxu_vv        000110 . ..... ..... 000 ..... 1010111 @r_vm
 vmaxu_vx        000110 . ..... ..... 100 ..... 1010111 @r_vm
 vmax_vv         000111 . ..... ..... 000 ..... 1010111 @r_vm
 vmax_vx         000111 . ..... ..... 100 ..... 1010111 @r_vm
+vmul_vv         100101 . ..... ..... 010 ..... 1010111 @r_vm
+vmul_vx         100101 . ..... ..... 110 ..... 1010111 @r_vm
+vmulh_vv        100111 . ..... ..... 010 ..... 1010111 @r_vm
+vmulh_vx        100111 . ..... ..... 110 ..... 1010111 @r_vm
+vmulhu_vv       100100 . ..... ..... 010 ..... 1010111 @r_vm
+vmulhu_vx       100100 . ..... ..... 110 ..... 1010111 @r_vm
+vmulhsu_vv      100110 . ..... ..... 010 ..... 1010111 @r_vm
+vmulhsu_vx      100110 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 4437a77878..a1ecc9f52d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1434,3 +1434,13 @@ GEN_OPIVX_TRANS(vminu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
 GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
 GEN_OPIVX_TRANS(vmax_vx,  opivx_check)
+
+/* Vector Single-Width Integer Multiply Instructions */
+GEN_OPIVV_GVEC_TRANS(vmul_vv,  mul)
+GEN_OPIVV_TRANS(vmulh_vv, opivv_check)
+GEN_OPIVV_TRANS(vmulhu_vv, opivv_check)
+GEN_OPIVV_TRANS(vmulhsu_vv, opivv_check)
+GEN_OPIVX_GVEC_TRANS(vmul_vx,  muls)
+GEN_OPIVX_TRANS(vmulh_vx, opivx_check)
+GEN_OPIVX_TRANS(vmulhu_vx, opivx_check)
+GEN_OPIVX_TRANS(vmulhsu_vx, opivx_check)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 03e001262f..93daafd5bd 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -853,6 +853,10 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
 #define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
 #define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
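[The message is truncated after the OP_* typedef lists, which map each helper to its operand widths. As a concrete illustration of what the vmulh helpers compute: the high half of the full-width product. A one-element sketch at SEW=32 with an illustrative name:]

```c
#include <stdint.h>

/* One element of vmulh at SEW=32: form the signed 64-bit product and
 * keep its upper 32 bits (vmulhu/vmulhsu differ only in signedness). */
static int32_t vmulh32_model(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}
```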

[PATCH v4 19/60] target/riscv: vector widening integer multiply instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 19 +
 target/riscv/insn32.decode  |  6 +++
 target/riscv/insn_trans/trans_rvv.inc.c |  8 
 target/riscv/vector_helper.c| 51 +
 4 files changed, 84 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 357f149198..1704b8c512 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -591,3 +591,22 @@ DEF_HELPER_6(vrem_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrem_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vwmul_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmul_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vwmulsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2afe24dd34..ceddfe4b6c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -378,6 +378,12 @@ vremu_vv        100010 . ..... ..... 010 ..... 1010111 @r_vm
 vremu_vx        100010 . ..... ..... 110 ..... 1010111 @r_vm
 vrem_vv         100011 . ..... ..... 010 ..... 1010111 @r_vm
 vrem_vx         100011 . ..... ..... 110 ..... 1010111 @r_vm
+vwmulu_vv       111000 . ..... ..... 010 ..... 1010111 @r_vm
+vwmulu_vx       111000 . ..... ..... 110 ..... 1010111 @r_vm
+vwmulsu_vv      111010 . ..... ..... 010 ..... 1010111 @r_vm
+vwmulsu_vx      111010 . ..... ..... 110 ..... 1010111 @r_vm
+vwmul_vv        111011 . ..... ..... 010 ..... 1010111 @r_vm
+vwmul_vx        111011 . ..... ..... 110 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 9f0645a92b..990433f866 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1454,3 +1454,11 @@ GEN_OPIVX_TRANS(vdivu_vx, opivx_check)
 GEN_OPIVX_TRANS(vdiv_vx, opivx_check)
 GEN_OPIVX_TRANS(vremu_vx, opivx_check)
 GEN_OPIVX_TRANS(vrem_vx, opivx_check)
+
+/* Vector Widening Integer Multiply Instructions */
+GEN_OPIVV_WIDEN_TRANS(vwmul_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmulu_vv, opivv_widen_check)
+GEN_OPIVV_WIDEN_TRANS(vwmulsu_vv, opivv_widen_check)
+GEN_OPIVX_WIDEN_TRANS(vwmul_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmulu_vx)
+GEN_OPIVX_WIDEN_TRANS(vwmulsu_vx)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 6330f5882f..beb84f9674 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -857,6 +857,18 @@ GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, idx_w, clearl)
 #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
 #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
 #define OP_SUS_D int64_t, uint64_t, int64_t, uint64_t, int64_t
+#define WOP_UUU_B uint16_t, uint8_t, uint8_t, uint16_t, uint16_t
+#define WOP_UUU_H uint32_t, uint16_t, uint16_t, uint32_t, uint32_t
+#define WOP_UUU_W uint64_t, uint32_t, uint32_t, uint64_t, uint64_t
+#define WOP_SSS_B int16_t, int8_t, int8_t, int16_t, int16_t
+#define WOP_SSS_H int32_t, int16_t, int16_t, int32_t, int32_t
+#define WOP_SSS_W int64_t, int32_t, int32_t, int64_t, int64_t
+#define WOP_SUS_B int16_t, uint8_t, int8_t, uint16_t, int16_t
+#define WOP_SUS_H int32_t, uint16_t, int16_t, uint32_t, int32_t
+#define WOP_SUS_W int64_t, uint32_t, int32_t, uint64_t, int64_t
+#define WOP_SSU_B int16_t, int8_t, uint8_t, int16_t, uint16_t
+#define WOP_SSU_H int32_t, int16_t, uint16_t, int32_t, uint32_t
+#define WOP_SSU_W int64_t, int32_t, uint32_t, int64_t, uint64_t
 
 /* operation of two vector elements */
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)\
@@ -1771,3 +1783,42 @@ GEN_VEXT_VX(vrem_vx_b, 1, 1, clearb)
 GEN_VEXT_VX(vrem_vx_h, 2, 2, clearh)
 GEN_VEXT_VX(vrem_vx_w, 4, 4, clearl)
 GEN_VEXT_VX(vrem_vx_d, 8, 8
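[The message is truncated here. Per the WOP_SUS_B typedef list above, vwmulsu multiplies a signed element by an unsigned element and widens the product to 2*SEW bits. A one-element sketch at SEW=8 (illustrative name, not the patch's code):]

```c
#include <stdint.h>

/* One element of vwmulsu at SEW=8: signed x unsigned -> signed 16-bit.
 * The extreme case -128 * 255 = -32640 still fits in int16_t. */
static int16_t vwmulsu8_model(int8_t s, uint8_t u)
{
    return (int16_t)((int16_t)s * (int16_t)u);
}
```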

[PATCH v4 14/60] target/riscv: vector narrowing integer right shift instructions

2020-03-10 Thread LIU Zhiwei
Signed-off-by: LIU Zhiwei 
---
 target/riscv/helper.h   | 13 
 target/riscv/insn32.decode  |  6 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 91 +
 target/riscv/vector_helper.c| 14 
 4 files changed, 124 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 47284c7476..0f36a8ce43 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -422,3 +422,16 @@ DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vnsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vnsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dbbfa34b97..e21b3d6b5e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -328,6 +328,12 @@ vsrl_vi         101000 . ..... ..... 011 ..... 1010111 @r_vm
 vsra_vv         101001 . ..... ..... 000 ..... 1010111 @r_vm
 vsra_vx         101001 . ..... ..... 100 ..... 1010111 @r_vm
 vsra_vi         101001 . ..... ..... 011 ..... 1010111 @r_vm
+vnsrl_vv        101100 . ..... ..... 000 ..... 1010111 @r_vm
+vnsrl_vx        101100 . ..... ..... 100 ..... 1010111 @r_vm
+vnsrl_vi        101100 . ..... ..... 011 ..... 1010111 @r_vm
+vnsra_vv        101101 . ..... ..... 000 ..... 1010111 @r_vm
+vnsra_vx        101101 . ..... ..... 100 ..... 1010111 @r_vm
+vnsra_vi        101101 . ..... ..... 011 ..... 1010111 @r_vm
 
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index a60518e1df..7033eeaa4d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -1267,3 +1267,94 @@ GEN_OPIVX_GVEC_SHIFT_TRANS(vsra_vx,  sars)
 GEN_OPIVI_GVEC_TRANS(vsll_vi, 1, vsll_vx,  shli)
 GEN_OPIVI_GVEC_TRANS(vsrl_vi, 1, vsrl_vx,  shri)
 GEN_OPIVI_GVEC_TRANS(vsra_vi, 1, vsra_vx,  sari)
+
+/* Vector Narrowing Integer Right Shift Instructions */
+static bool opivv_narrow_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, true) &&
+vext_check_reg(s, a->rs1, false) &&
+vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
+2 << s->lmul) &&
+(s->lmul < 0x3) && (s->sew < 0x3));
+}
+
+/* OPIVV with NARROW */
+#define GEN_OPIVV_NARROW_TRANS(NAME)   \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
+{  \
+if (opivv_narrow_check(s, a)) {\
+uint32_t data = 0; \
+static gen_helper_gvec_4_ptr * const fns[3] = {\
+gen_helper_##NAME##_b, \
+gen_helper_##NAME##_h, \
+gen_helper_##NAME##_w, \
+}; \
+data = FIELD_DP32(data, VDATA, MLEN, s->mlen); \
+data = FIELD_DP32(data, VDATA, VM, a->vm); \
+data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
+tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
+vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),  \
+cpu_env, 0, s->vlen / 8, data, fns[s->sew]);   \
+return true;   \
+}  \
+return false;  \
+}
+GEN_OPIVV_NARROW_TRANS(vnsra_vv)
+GEN_OPIVV_NARROW_TRANS(vnsrl_vv)
+
+static bool opivx_narrow_check(DisasContext *s, arg_rmrr *a)
+{
+return (vext_check_isa_ill(s, RVV) &&
+vext_check_overlap_mask(s, a->rd, a->vm, false) &&
+vext_check_reg(s, a->rd, false) &&
+vext_check_reg(s, a->rs2, true) &&
+vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
+2 << s->lmul) &&
+(s->lmul < 0x3) && (s->sew < 0x3));
+}

[PATCH v5 04/60] target/riscv: add vector configure instruction

2020-03-12 Thread LIU Zhiwei
vsetvl and vsetvli are the two configuration instructions for vl and vtype. TB flags
must be updated after these configuration instructions execute. The (vill, lmul, sew)
fields of vtype and a bit for (VSTART == 0 && VL == VLMAX) are placed in tb_flags.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/Makefile.objs  |  2 +-
 target/riscv/cpu.h  | 63 ++
 target/riscv/helper.h   |  2 +
 target/riscv/insn32.decode  |  5 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 69 +
 target/riscv/translate.c| 17 +-
 target/riscv/vector_helper.c| 53 +++
 7 files changed, 199 insertions(+), 12 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
 create mode 100644 target/riscv/vector_helper.c

diff --git a/target/riscv/Makefile.objs b/target/riscv/Makefile.objs
index ff651f69f6..ff38df6219 100644
--- a/target/riscv/Makefile.objs
+++ b/target/riscv/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o gdbstub.o
+obj-y += translate.o op_helper.o cpu_helper.o cpu.o csr.o fpu_helper.o vector_helper.o gdbstub.o
 obj-$(CONFIG_SOFTMMU) += pmp.o
 
 ifeq ($(CONFIG_SOFTMMU),y)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 603715f849..505d1a8515 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -21,6 +21,7 @@
 #define RISCV_CPU_H
 
 #include "hw/core/cpu.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "fpu/softfloat-types.h"
 
@@ -99,6 +100,12 @@ typedef struct CPURISCVState CPURISCVState;
 
 #define RV_VLEN_MAX 512
 
+FIELD(VTYPE, VLMUL, 0, 2)
+FIELD(VTYPE, VSEW, 2, 3)
+FIELD(VTYPE, VEDIV, 5, 2)
+FIELD(VTYPE, RESERVED, 7, sizeof(target_ulong) * 8 - 9)
+FIELD(VTYPE, VILL, sizeof(target_ulong) * 8 - 2, 1)
+
 struct CPURISCVState {
 target_ulong gpr[32];
 uint64_t fpr[32]; /* assume both F and D extensions */
@@ -358,19 +365,62 @@ void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 #define TB_FLAGS_MMU_MASK   3
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 
+typedef CPURISCVState CPUArchState;
+typedef RISCVCPU ArchCPU;
+#include "exec/cpu-all.h"
+
+FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1)
+FIELD(TB_FLAGS, LMUL, 3, 2)
+FIELD(TB_FLAGS, SEW, 5, 3)
+FIELD(TB_FLAGS, VILL, 8, 1)
+
+/*
+ * A simplification for VLMAX
+ * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
+ * = (VLEN << LMUL) / (8 << SEW)
+ * = (VLEN << LMUL) >> (SEW + 3)
+ * = VLEN >> (SEW + 3 - LMUL)
+ */
+static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+{
+uint8_t sew, lmul;
+
+sew = FIELD_EX64(vtype, VTYPE, VSEW);
+lmul = FIELD_EX64(vtype, VTYPE, VLMUL);
+return cpu->cfg.vlen >> (sew + 3 - lmul);
+}
+
 static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
-target_ulong *cs_base, uint32_t *flags)
+target_ulong *cs_base, uint32_t *pflags)
 {
+uint32_t flags = 0;
+
 *pc = env->pc;
 *cs_base = 0;
+
+if (env->misa & RVV) {
+uint32_t vlmax = vext_get_vlmax(env_archcpu(env), env->vtype);
+bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl);
+flags = FIELD_DP32(flags, TB_FLAGS, VILL,
+FIELD_EX64(env->vtype, VTYPE, VILL));
+flags = FIELD_DP32(flags, TB_FLAGS, SEW,
+FIELD_EX64(env->vtype, VTYPE, VSEW));
+flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
+FIELD_EX64(env->vtype, VTYPE, VLMUL));
+flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
+} else {
+flags = FIELD_DP32(flags, TB_FLAGS, VILL, 1);
+}
+
 #ifdef CONFIG_USER_ONLY
-*flags = TB_FLAGS_MSTATUS_FS;
+flags |= TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0);
+flags |= cpu_mmu_index(env, 0);
 if (riscv_cpu_fp_enabled(env)) {
-*flags |= env->mstatus & MSTATUS_FS;
+flags |= env->mstatus & MSTATUS_FS;
 }
 #endif
+*pflags = flags;
 }
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
@@ -411,9 +461,4 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
 
 void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
 
-typedef CPURISCVState CPUArchState;
-typedef RISCVCPU ArchCPU;
-
-#include "exec/cpu-all.h"
-
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index debb22a480..3c28c7e407 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -76,3 +76,5 @@ DEF_HELPER_2(mret, tl, env, tl)
 DEF_HELPER_1(wfi, void, env)
 DEF_HELPER_1(tlb_flush, void, env)
 #endif
+/* Vector functions */
+DEF_HELPER_3(vsetvl, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b883
