Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2020-01-05 Thread Richard Henderson
On 12/30/19 6:11 PM, LIU Zhiwei wrote:
> 
> However It's not clear when use tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool. I 
> think
> the meaning of oprsz is the
> the bits of active elements.  Therefore , oprsz is  8 * env->vext.vl in RISC-V
> and it can't be fetched  from
> TB_FLAGS like SVE.
> 
> Probably oprsz field will be not be used in RISC-V vector extension.

Correct.  For those risc-v helpers that are called when VL != VLMAX, you would
ignore the oprsz field and fetch it from env.

It may still be handy to pass in vlmax as maxsz, even if you leave the oprsz
field 0.  You'll find that out as you do the coding, I suppose.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-30 Thread LIU Zhiwei




On 2019/12/28 9:14, Richard Henderson wrote:

On 12/25/19 8:36 PM, LIU Zhiwei wrote:

struct {

     uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
     } vext;

Is it OK?

I don't think there's a good reason for the vext structure -- I would drop
that.  Otherwise it looks good.


However, there are still some differences from SVE.

1)cpu_env must be used as a parameter for helper function.

     The helpers need  use env->vext.vl and env->vext.vstart.  Thus it will be
difficult to use out of line tcg_gen_gvec_ool.

Sure.  That's also true of any of the fp operations, which will want to
accumulate ieee exceptions.

See tcg_gen_gvec_*_ptr(), which allows you to pass in cpu_env.

Thanks. The tcg_gen_gvec_*_ptr is good.



2)simd_desc is not proper.

     I also need to transfer some members of DisasContext to helpers.

     (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of
this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;

The oprsz & maxsz parameters to tcg_gen_gvec_* should be given (ctx->lmul *
ctx->vlen).  The sew parameter should be implied by the helper function called,
each helper function using a different type.  Therefore vlmax can be trivially
computed within the helper from oprsz / sizeof(type).
It's clear that the oprsz & maxsz paramenters should be given (ctx->lmul 
* ctx->vlen) for tcg_gen_gvec_add.


However It's not clear when use tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool. 
I think the meaning of oprsz is the
the bits of active elements.  Therefore , oprsz is  8 * env->vext.vl in 
RISC-V and it can't be fetched  from

TB_FLAGS like SVE.

Probably oprsz field will be not be used in RISC-V vector extension.

Data is reserved to expand.  Mlen is mask length for one elment, so it will
defined as ctx->sew/ctx->lmul. As with Mlen, a active element will

be selected by

 static inline int vext_elem_mask(void *v0, int mlen, int index)
 {
     int idx = (index * mlen) / 8;
     int pos = (index * mlen) % 8;

     return (v0[idx] >> pos) & 0x1;
 }

     So I may have to implement vext_desc instead of use the simd_desc, which
will be another redundant. Maybe a better way to mask elements?

I think you will want to define your own vext_desc, building upon simd_desc,
such that lg2(mlen) is passed in the first N bits of simd_data.

Good. It's a good way to use the tcg_gen_gvec_*_ptr or tcg_gen_gvec_ool API.

Best Regards,
Zhiwei


r~





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-27 Thread Richard Henderson
On 12/25/19 8:36 PM, LIU Zhiwei wrote:
> struct {
> 
>     uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
>     target_ulong vxrm;
>     target_ulong vxsat;
>     target_ulong vl;
>     target_ulong vstart;
>     target_ulong vtype;
>     } vext;
> 
> Is it OK?
I don't think there's a good reason for the vext structure -- I would drop
that.  Otherwise it looks good.

> However, there are still some differences from SVE.
> 
> 1)cpu_env must be used as a parameter for helper function.
> 
>     The helpers need  use env->vext.vl and env->vext.vstart.  Thus it will be
> difficult to use out of line tcg_gen_gvec_ool.

Sure.  That's also true of any of the fp operations, which will want to
accumulate ieee exceptions.

See tcg_gen_gvec_*_ptr(), which allows you to pass in cpu_env.

> 2)simd_desc is not proper.
> 
>     I also need to transfer some members of DisasContext to helpers. 
> 
>     (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of elements of
> this operation, so it will defined as ctx->lmul * ctx->vlen / ctx->sew;

The oprsz & maxsz parameters to tcg_gen_gvec_* should be given (ctx->lmul *
ctx->vlen).  The sew parameter should be implied by the helper function called,
each helper function using a different type.  Therefore vlmax can be trivially
computed within the helper from oprsz / sizeof(type).

> Data is reserved to expand.  Mlen is mask length for one elment, so it will
> defined as ctx->sew/ctx->lmul. As with Mlen, a active element will
> 
> be selected by
> 
> static inline int vext_elem_mask(void *v0, int mlen, int index)
> {
>     int idx = (index * mlen) / 8;
>     int pos = (index * mlen) % 8;
> 
>     return (v0[idx] >> pos) & 0x1;
> }
> 
>     So I may have to implement vext_desc instead of use the simd_desc, which
> will be another redundant. Maybe a better way to mask elements?

I think you will want to define your own vext_desc, building upon simd_desc,
such that lg2(mlen) is passed in the first N bits of simd_data.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-25 Thread LIU Zhiwei


On 2019/12/20 4:38, Richard Henderson wrote:

On 12/18/19 11:11 PM, LIU Zhiwei wrote:

I'm sorry that it's really hard to absorb your opinion. I don't know why clang
will fail

when index beyond the end of vreg[n] into vreg[n+1].

I thought sure one of the address sanitizer checks would detect array bounds
overrun.  But it becomes irrelevant


As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be a
property which can be

specified from command line.  So the sub-struct maybe defined as

struct {
     union{
     uint64_t *u64 ;
     int64_t  *s64;
     uint32_t *u32;
     int32_t  *s32;
     uint16_t *u16;
     int16_t  *s16;
     uint8_t  *u8;
     int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext;

Will that be OK?

Pointers have consequences.  It can be done, but I don't think it is ideal.


The (ill, lmul, sew ) of vtype  will be placed within tb_flags, also the bit of
(VSTART == 0 && VL == VLMAX).

So it will take 8 bits of tb flags for vector extension at least.

Good.

However, I have one problem to support both command line VLEN and vreg_ofs.

As in SVE,  vreg ofs is the offset from cpu_env. If the structure of vector
extension (to support command line VLEN) is

struct {
     union{
     uint64_t *u64 ;
     int64_t  *s64;
     uint32_t *u32;
     int32_t  *s32;
     uint16_t *u16;
     int16_t  *s16;
     uint8_t  *u8;
     int8_t   *s8;
     } mem;
     target_ulong vxrm;
     target_ulong vxsat;
     target_ulong vl;
     target_ulong vstart;
     target_ulong vtype;
} vext

I can't find the way to get the direct offset of vreg from cpu_env.

Maybe I should specify a max VLEN like the way of SVE?

I think a maximum vlen is best.  A command-line option to adjust vlen is all
well and good, but there's no reason to have to support vlen=(1<<29).

Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
be aligned mod 16.

I think that all you need is

 uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);

which gives us

uint32_t vreg_ofs(DisasContext *ctx, int reg)
{
 return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
}


struct {

    uint64_t vreg[32 * RV_VLEN_MAX / 64] QEMU_ALIGNED(16);
    target_ulong vxrm;
    target_ulong vxsat;
    target_ulong vl;
    target_ulong vstart;
    target_ulong vtype;
    } vext;

Is it OK?


I don't see the point of a union for vreg.  I don't think you'll find that you
actually use it at all.


I think I can move most of execution check to translate time like SVE 
now. However, there are still some differences from SVE.


1)cpu_env must be used as a parameter for helper function.

    The helpers need  use env->vext.vl and env->vext.vstart.  Thus it 
will be difficult to use out of line tcg_gen_gvec_ool.


    void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,

    uint32_t oprsz, uint32_t maxsz, int32_t data,
    gen_helper_gvec_2 *fn)
    {
        ..
    fn(a0, a1, desc);
 ..
 }
    Maybe I have to write  something similar to tcg_gen_gvec_ool in 
trans_rvv.inc.c.  But it will be redundant.


2)simd_desc is not proper.

    I also need to transfer some members of DisasContext to helpers.

    (Data, Vlmax, Mlen) is my current choice. Vlmax is the num of 
elements of this operation, so it will defined as ctx->lmul * ctx->vlen 
/ ctx->sew;


Data is reserved to expand.  Mlen is mask length for one elment, so it 
will defined as ctx->sew/ctx->lmul. As with Mlen, a active element will


be selected by

   static inline int vext_elem_mask(void *v0, int mlen, int index)
   {
    int idx = (index * mlen) / 8;
    int pos = (index * mlen) % 8;

    return (v0[idx] >> pos) & 0x1;
   }

    So I may have to implement vext_desc instead of use the simd_desc, 
which will be another redundant. Maybe a better way to mask elements?



You do need to document the element ordering that you're going to use for vreg.
  I.e. the mapping between the architectural vector register state and the
emulation state.  You have two choices:

(1) all bytes in host endianness (e.g. target/ppc)
(2) bytes within each uint64_t in host endianness,
 but each uint64_t is little-endian (e.g. target/arm).

Both require some fixup when running on a big-endian host.


Yes, I will take (2).


Best Regards,

Zhiwei



r~


Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-19 Thread Richard Henderson
On 12/18/19 11:11 PM, LIU Zhiwei wrote:
> I'm sorry that it's really hard to absorb your opinion. I don't know why clang
> will fail
> 
> when index beyond the end of vreg[n] into vreg[n+1].

I thought sure one of the address sanitizer checks would detect array bounds
overrun.  But it becomes irrelevant

> As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be a
> property which can be
> 
> specified from command line.  So the sub-struct maybe defined as
> 
> struct {
>     union{
>     uint64_t *u64 ;
>     int64_t  *s64;
>     uint32_t *u32;
>     int32_t  *s32;
>     uint16_t *u16;
>     int16_t  *s16;
>     uint8_t  *u8;
>     int8_t   *s8;
>     } mem;
>     target_ulong vxrm;
>     target_ulong vxsat;
>     target_ulong vl;
>     target_ulong vstart;
>     target_ulong vtype;
> } vext;
> 
> Will that be OK?

Pointers have consequences.  It can be done, but I don't think it is ideal.

> The (ill, lmul, sew ) of vtype  will be placed within tb_flags, also the bit 
> of
> (VSTART == 0 && VL == VLMAX).
> 
> So it will take 8 bits of tb flags for vector extension at least.

Good.

> However, I have one problem to support both command line VLEN and vreg_ofs.
> 
> As in SVE,  vreg ofs is the offset from cpu_env. If the structure of vector
> extension (to support command line VLEN) is
> 
> struct {
>     union{
>     uint64_t *u64 ;
>     int64_t  *s64;
>     uint32_t *u32;
>     int32_t  *s32;
>     uint16_t *u16;
>     int16_t  *s16;
>     uint8_t  *u8;
>     int8_t   *s8;
>     } mem;
>     target_ulong vxrm;
>     target_ulong vxsat;
>     target_ulong vl;
>     target_ulong vstart;
>     target_ulong vtype;
> } vext
> 
> I can't find the way to get the direct offset of vreg from cpu_env.
> 
> Maybe I should specify a max VLEN like the way of SVE?

I think a maximum vlen is best.  A command-line option to adjust vlen is all
well and good, but there's no reason to have to support vlen=(1<<29).

Oh, and you probably need a minimum vlen of 16 bytes as well, otherwise you
will run afoul of the assert in tcg-op-gvec.c that requires gvec operations to
be aligned mod 16.

I think that all you need is

uint64_t vreg[32 * MAX_VLEN / 8] QEMU_ALIGNED(16);

which gives us

uint32_t vreg_ofs(DisasContext *ctx, int reg)
{
return offsetof(CPURISCVState, vreg) + reg * ctx->vlen;
}

I don't see the point of a union for vreg.  I don't think you'll find that you
actually use it at all.

You do need to document the element ordering that you're going to use for vreg.
 I.e. the mapping between the architectural vector register state and the
emulation state.  You have two choices:

(1) all bytes in host endianness (e.g. target/ppc)
(2) bytes within each uint64_t in host endianness,
but each uint64_t is little-endian (e.g. target/arm).

Both require some fixup when running on a big-endian host.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-12-19 Thread LIU Zhiwei

Hi Richard,

Sorry to reply so late.

Upstream is really difficult . I was really frustrated to recieve so 
many difficult comments.


It is hard for me to absorb them and will take a lot of time to fixup. 
Now I will move on.


On 2019/8/29 2:54, Richard Henderson wrote:

On 8/27/19 7:36 PM, liuzhiwei wrote:

Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
Signed-off-by: liuzhiwei 
---
  fpu/softfloat.c |   119 +
  include/fpu/softfloat.h | 4 +
  linux-user/riscv/cpu_loop.c | 8 +-
  target/riscv/Makefile.objs  | 2 +-
  target/riscv/cpu.h  |30 +
  target/riscv/cpu_bits.h |15 +
  target/riscv/cpu_helper.c   | 7 +
  target/riscv/csr.c  |65 +-
  target/riscv/helper.h   |   354 +
  target/riscv/insn32.decode  |   374 +-
  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
  target/riscv/translate.c| 1 +
  target/riscv/vector_helper.c| 26563 ++
  13 files changed, 28017 insertions(+), 9 deletions(-)
+/* vector coprocessor state.  */
+struct {
+union VECTOR {
+float64  f64[VUNIT(64)];
+float32  f32[VUNIT(32)];
+float16  f16[VUNIT(16)];
+target_ulong ul[VUNIT(sizeof(target_ulong))];
+uint64_t u64[VUNIT(64)];
+int64_t  s64[VUNIT(64)];
+uint32_t u32[VUNIT(32)];
+int32_t  s32[VUNIT(32)];
+uint16_t u16[VUNIT(16)];
+int16_t  s16[VUNIT(16)];
+uint8_t  u8[VUNIT(8)];
+int8_t   s8[VUNIT(8)];
+} vreg[32];
+target_ulong vxrm;
+target_ulong vxsat;
+target_ulong vl;
+target_ulong vstart;
+target_ulong vtype;
+float_status fp_status;
+} vfp;

You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
in the context of risc-v.

I'm not sure that vreg[].element[] really makes the most sense in the context
of how risc-v rearranges its elements.


Use vreg[].element[] is my gut feeling.  It will be easiest to 
understand the code.


As you said, view all vector registers as a single block of memory is 
good for programing.



It will almost certainly fail clang
validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
vreg[n+1].


I'm sorry that it's really hard to absorb your opinion. I don't know why 
clang will fail


when index beyond the end of vreg[n] into vreg[n+1].


It might be best to have a single array:

 union {
 uint64_t u64[32 * VLEN / 64];
 ...
 uint8_t u8[32 * VLEN / 8];
 } velt;

This is clearer to the compiler that this is a single block of memory that we
can index as we please.


As Chih-Min Chao said in another part of PATCH V2 thread,  VLEN will be 
a property which can be


specified from command line.  So the sub-struct maybe defined as

struct {
union{
uint64_t *u64 ;
int64_t  *s64;
uint32_t *u32;
int32_t  *s32;
uint16_t *u16;
int16_t  *s16;
uint8_t  *u8;
int8_t   *s8;
} mem;
target_ulong vxrm;
target_ulong vxsat;
target_ulong vl;
target_ulong vstart;
target_ulong vtype;
} vext;

Will that be OK?


+static inline bool vector_vtype_ill(CPURISCVState *env)
+{
+if ((env->vfp.vtype >> (sizeof(target_ulong) - 1)) & 0x1) {
+return true;
+}
+return false;
+}
+
+static inline void vector_vtype_set_ill(CPURISCVState *env)
+{
+env->vfp.vtype = ((target_ulong)1) << (sizeof(target_ulong) - 1);
+return;
+}
+
+static inline int vector_vtype_get_sew(CPURISCVState *env)
+{
+return (env->vfp.vtype >> 2) & 0x7;
+}
+
+static inline int vector_get_width(CPURISCVState *env)
+{
+return  8 * (1 << vector_vtype_get_sew(env));
+}
+
+static inline int vector_get_lmul(CPURISCVState *env)
+{
+return 1 << (env->vfp.vtype & 0x3);
+}
+
+static inline int vector_get_vlmax(CPURISCVState *env)
+{
+return vector_get_lmul(env) * VLEN / vector_get_width(env);
+}
+
+static inline int vector_elem_mask(CPURISCVState *env, uint32_t vm, int width,
+int lmul, int index)
+{
+int mlen = width / lmul;
+int idx = (index * mlen) / 8;
+int pos = (index * mlen) % 8;
+
+return vm || ((env->vfp.vreg[0].u8[idx] >> pos) & 0x1);
+}

I would strongly encourage you place the components of vtype within tb_flags
via cpu_get_tb_cpu_state.  This would allow you to move quite a few checks from
run-time to translation-time.

Recall that translation happens once (per configuration), whereas execution
happens many times.  Obviously, the more configurations that we create, the
more translation that must happen.

All check code will be moved from execution time to translation.

But the vtypei argument to vsetvli is a good choice, because it is constant,
relates directly to the 

Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-03 Thread Richard Henderson
On 9/2/19 2:43 AM, liuzhiwei wrote:
>> This is most definitely not the correct way to implement first-fault.
>>
>> You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
>> where we test pages for validity with tlb_vaddr_to_host.
> Why should  test pages for validity? If there is a page fault in running time,
> it just the case why it must use the fault-only-first instruction.

So that the helper does not fault for the Nth access, N > 1.

You test to see if the page has a mapping, and if it doesn't,
you end the instruction, without going through the exception
path that I have objections to.

Except for gather loads, you don't have to test for every
access, only at page boundaries.  And then you may also arrange
to use direct host access to the pages that you've validated.

Again, have a look at sve_ldff1_r.

> A single array is a good idea. But vreg[] will be better for understanding as 
> it preserve the register concepts. 

A function access to the registers would be just as good for that.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-03 Thread Richard Henderson
On 9/2/19 12:45 AM, liuzhiwei wrote:
> 
> On 2019/8/29 下午11:09, Richard Henderson wrote:
>> On 8/29/19 5:45 AM, liuzhiwei wrote:
>>> Even in qemu,  it may be some situations that VSTART != 0. For example, a 
>>> load
>>> instruction leads to a page fault exception in a middle position. If VSTART 
>>> ==
>>> 0,  some elements that had been loaded before the exception will be loaded 
>>> once
>>> again.
>> Alternately, you can validate all of the pages before performing any memory
>> operations.  At which point there will never be an exception in the middle.
> 
> As a vector instruction may access memory  across many pages,  is there any 
> way
> to validate the pages? Page table walk ?Or some TLB APIs?

Yes, there are TLB APIs.  Several of them, depending on what is needed.

> #0  cpu_watchpoint_address_matches (wp=0x56228110, addr=536871072, len=1)
> at qemu/exec.c:1094
> #1  0x5567204f in check_watchpoint (offset=160, len=1, attrs=...,
> flags=2) at qemu/exec.c:2803
> #2  0x55672379 in watch_mem_write (opaque=0x0, addr=536871072, 
> val=165,
> size=1, attrs=...) at qemu/exec.c:2878
> #3  0x556d44bb in memory_region_write_with_attrs_accessor
> (mr=0x561292e0 , addr=536871072, value=0x7fffedffe2c8,
> size=1, shift=0, mask=255, attrs=...)
>     at qemu/memory.c:553
> #4  0x556d45de in access_with_adjusted_size (addr=536871072,
> value=0x7fffedffe2c8, size=1, access_size_min=1, access_size_max=8,
> access_fn=0x556d43cd ,
>     mr=0x561292e0 , attrs=...) at qemu/memory.c:594
> #5  0x556d7247 in memory_region_dispatch_write (mr=0x561292e0
> , addr=536871072, data=165, size=1, attrs=...) at 
> qemu/memory.c:1480
> #6  0x556f0d13 in io_writex (env=0x561efb58,
> iotlbentry=0x561f5398, mmu_idx=1, val=165, addr=536871072, retaddr=0,
> recheck=false, size=1) at qemu/accel/tcg/cputlb.c:909
> #7  0x556f19a6 in io_writeb (env=0x561efb58, mmu_idx=1, index=0,
> val=165 '\245', addr=536871072, retaddr=0, recheck=false) at
> qemu/accel/tcg/softmmu_template.h:268
> #8  0x556f1b54 in helper_ret_stb_mmu (env=0x561efb58,
> addr=536871072, val=165 '\245', oi=1, retaddr=0) at
> qemu/accel/tcg/softmmu_template.h:304
> #9  0x55769f06 in cpu_stb_data_ra (env=0x561efb58, ptr=536871072,
> v=165, retaddr=0) at qemu/include/exec/cpu_ldst_template.h:182
> #10 0x55769f80 in cpu_stb_data (env=0x561efb58, ptr=536871072,
> v=165) at /qemu/include/exec/cpu_ldst_template.h:194
> #11 0x5576a913 in csky_cpu_stb_data (env=0x561efb58,
> vaddr=536871072, data=165 '\245') at qemu/target/csky/csky_ldst.c:48
> #12 0x5580ba7d in helper_vdsp2_vstru_n (env=0x561efb58,
> insn=4167183360) at qemu/target/csky/op_vdsp2.c:1317
> 
> The path is not related to probe_write in the patch().

Of course.  It wasn't supposed to be.

> Could you give more details or a test case where watchpoint doesn't work
> correctly?

If the store partially, but not completely, overlaps the watchpoint.  This is
obviously much easier to do with large vector operations than with normal
integer operations.

In this case, we may have completed some of the stores before encountering the
watchpoint.  Which, inside check_watchpoint(), will longjmp back to the cpu
main loop.  Now we have a problem: the store is partially complete and it
should not be.

Therefore, we now have patches queued in tcg-next that adjust probe_write to
perform both access and watchpoint tests.  There is still target-specific code
that must be adjusted to match, so there are not currently any examples in the
tree to show.

However, the idea is:
  (1) Instructions that perform more than one host store must probe
  the entire range to be stored before performing any stores.

  (2) Instructions that perform more than one host load must either
  probe the entire range to be loaded, or collect the data in
  temporary storage.  If not using probes, writeback to the
  register file must be delayed until after all loads are done.

  (3) Any one probe may not cross a page boundary; splitting of the
  access across pages must be done by the helper.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-02 Thread liuzhiwei



On 2019/8/29 上午2:54, Richard Henderson wrote:

On 8/27/19 7:36 PM, liuzhiwei wrote:

Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
Signed-off-by: liuzhiwei 
---
  fpu/softfloat.c |   119 +
  include/fpu/softfloat.h | 4 +
  linux-user/riscv/cpu_loop.c | 8 +-
  target/riscv/Makefile.objs  | 2 +-
  target/riscv/cpu.h  |30 +
  target/riscv/cpu_bits.h |15 +
  target/riscv/cpu_helper.c   | 7 +
  target/riscv/csr.c  |65 +-
  target/riscv/helper.h   |   354 +
  target/riscv/insn32.decode  |   374 +-
  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
  target/riscv/translate.c| 1 +
  target/riscv/vector_helper.c| 26563 ++
  13 files changed, 28017 insertions(+), 9 deletions(-)

As Alex mentioned, this is *far* too big to be presented as a single patch.

OK, split it into patch set in V2



diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3ff3fa5..3b0754c 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status 
*status);
  float16 float16_sqrt(float16, float_status *status);
  int float16_compare(float16, float16, float_status *status);
  int float16_compare_quiet(float16, float16, float_status *status);
+int float16_unordered_quiet(float16, float16, float_status *status);
+int float16_le(float16, float16, float_status *status);
+int float16_lt(float16, float16, float_status *status);
+int float16_eq_quiet(float16, float16, float_status *status);

As Alex mentioned, none of these changes are required, as all
functionality is provided by float16_compare{,_quiet}.

Yes, use float16_compare instead.

diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
index 12aa3c0..b01548a 100644
--- a/linux-user/riscv/cpu_loop.c
+++ b/linux-user/riscv/cpu_loop.c
@@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
  signum = 0;
  sigcode = 0;
  sigaddr = 0;
-
+if (env->foflag) {
+if (env->vfp.vl != 0) {
+env->foflag = false;
+env->pc += 4;
+continue;
+}

This is most definitely not the correct way to implement first-fault.

You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
where we test pages for validity with tlb_vaddr_to_host.
Why should  test pages for validity? If there is a page fault in running 
time, it just the case why it must use the fault-only-first instruction.

+/* vector coprocessor state.  */
+struct {
+union VECTOR {
+float64  f64[VUNIT(64)];
+float32  f32[VUNIT(32)];
+float16  f16[VUNIT(16)];
+target_ulong ul[VUNIT(sizeof(target_ulong))];
+uint64_t u64[VUNIT(64)];
+int64_t  s64[VUNIT(64)];
+uint32_t u32[VUNIT(32)];
+int32_t  s32[VUNIT(32)];
+uint16_t u16[VUNIT(16)];
+int16_t  s16[VUNIT(16)];
+uint8_t  u8[VUNIT(8)];
+int8_t   s8[VUNIT(8)];
+} vreg[32];
+target_ulong vxrm;
+target_ulong vxsat;
+target_ulong vl;
+target_ulong vstart;
+target_ulong vtype;
+float_status fp_status;
+} vfp;

You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
in the context of risc-v.
I'm not sure that vreg[].element[] really makes the most sense in the context
of how risc-v rearranges its elements.  It will almost certainly fail clang
validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
vreg[n+1].

It might be best to have a single array:

 union {
 uint64_t u64[32 * VLEN / 64];
 ...
 uint8_t u8[32 * VLEN / 8];
 } velt;

This is clearer to the compiler that this is a single block of memory that we
can index as we please.


A single array is a good idea. But vreg[] will be better for understanding as 
it preserve the register concepts.


Note that float64/float32/float16 are legacy.  They will always be equivalent
to the unsigned integer types of the same size.

Is there really any vector operation at all that is dependent on XLEN?  If not,
then there is no reason to confuse things by including target_ulong.


OK.

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index e32b612..405caf6 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
  [PRV_H] = RISCV_EXCP_H_ECALL,
  [PRV_M] = RISCV_EXCP_M_ECALL
  };
+if (env->foflag) {
+if (env->vfp.vl != 0) {
+env->foflag = false;
+env->pc += 4;
+return;
+}
+}

Again, not the way to implement first-fault.

In particular, you 

Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-02 Thread liuzhiwei



On 2019/8/29 下午11:09, Richard Henderson wrote:

On 8/29/19 5:45 AM, liuzhiwei wrote:

Even in qemu,  it may be some situations that VSTART != 0. For example, a load
instruction leads to a page fault exception in a middle position. If VSTART ==
0,  some elements that had been loaded before the exception will be loaded once
again.

Alternately, you can validate all of the pages before performing any memory
operations.  At which point there will never be an exception in the middle.


As a vector instruction may access memory  across many pages,  is there 
any way to validate the pages? Page table walk ?Or some TLB APIs?



As it turns out, you *must* do this in order to allow watchpoints to work
correctly.  David Hildebrand and I are at this moment fixing this aspect of
watchpoints for s390x.

See https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05979.html


I am interested in wathpoint implementation and  once implemented the 
user mode watchpoints in the wild.


A backtrace of watchpoint is like

#0  cpu_watchpoint_address_matches (wp=0x56228110, addr=536871072, 
len=1) at qemu/exec.c:1094
#1  0x5567204f in check_watchpoint (offset=160, len=1, 
attrs=..., flags=2) at qemu/exec.c:2803
#2  0x55672379 in watch_mem_write (opaque=0x0, addr=536871072, 
val=165, size=1, attrs=...) at qemu/exec.c:2878
#3  0x556d44bb in memory_region_write_with_attrs_accessor 
(mr=0x561292e0 , addr=536871072, value=0x7fffedffe2c8, 
size=1, shift=0, mask=255, attrs=...)

    at qemu/memory.c:553
#4  0x556d45de in access_with_adjusted_size (addr=536871072, 
value=0x7fffedffe2c8, size=1, access_size_min=1, access_size_max=8, 
access_fn=0x556d43cd ,

    mr=0x561292e0 , attrs=...) at qemu/memory.c:594
#5  0x556d7247 in memory_region_dispatch_write 
(mr=0x561292e0 , addr=536871072, data=165, size=1, 
attrs=...) at qemu/memory.c:1480
#6  0x556f0d13 in io_writex (env=0x561efb58, 
iotlbentry=0x561f5398, mmu_idx=1, val=165, addr=536871072, 
retaddr=0, recheck=false, size=1) at qemu/accel/tcg/cputlb.c:909
#7  0x556f19a6 in io_writeb (env=0x561efb58, mmu_idx=1, 
index=0, val=165 '\245', addr=536871072, retaddr=0, recheck=false) at 
qemu/accel/tcg/softmmu_template.h:268
#8  0x556f1b54 in helper_ret_stb_mmu (env=0x561efb58, 
addr=536871072, val=165 '\245', oi=1, retaddr=0) at 
qemu/accel/tcg/softmmu_template.h:304
#9  0x55769f06 in cpu_stb_data_ra (env=0x561efb58, 
ptr=536871072, v=165, retaddr=0) at 
qemu/include/exec/cpu_ldst_template.h:182
#10 0x55769f80 in cpu_stb_data (env=0x561efb58, 
ptr=536871072, v=165) at /qemu/include/exec/cpu_ldst_template.h:194
#11 0x5576a913 in csky_cpu_stb_data (env=0x561efb58, 
vaddr=536871072, data=165 '\245') at qemu/target/csky/csky_ldst.c:48
#12 0x5580ba7d in helper_vdsp2_vstru_n (env=0x561efb58, 
insn=4167183360) at qemu/target/csky/op_vdsp2.c:1317


The path is not related to probe_write in the patch().

Could you give more details or a test case where watchpoint doesn't work 
correctly?




r~





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-02 Thread liuzhiwei



On 2019/8/29 下午11:14, Richard Henderson wrote:

On 8/29/19 5:00 AM, liuzhiwei wrote:

Maybe there is some better test method or some forced test cases in QEMU. Could
you give me some advice for testing?

If you have hardware, or another simulator, RISU is very good
for testing these sorts of things.

See https://git.linaro.org/people/pmaydell/risu.git

You'll need to write new support for RISC-V, but it's not hard
and we can help out with that.


r~


Hi, Richard

Thank you for your advice.  I will run test cases in Spike for cross 
validation at first.


Best Regards,
Zhiwei





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-09-02 Thread liuzhiwei



On 2019/8/30 上午5:50, Alistair Francis wrote:

On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei  wrote:

On 2019/8/29 上午5:34, Alistair Francis wrote:

On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:

Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
Signed-off-by: liuzhiwei 
---
   fpu/softfloat.c |   119 +
   include/fpu/softfloat.h | 4 +
   linux-user/riscv/cpu_loop.c | 8 +-
   target/riscv/Makefile.objs  | 2 +-
   target/riscv/cpu.h  |30 +
   target/riscv/cpu_bits.h |15 +
   target/riscv/cpu_helper.c   | 7 +
   target/riscv/csr.c  |65 +-
   target/riscv/helper.h   |   354 +
   target/riscv/insn32.decode  |   374 +-
   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
   target/riscv/translate.c| 1 +
   target/riscv/vector_helper.c| 26563 
++
   13 files changed, 28017 insertions(+), 9 deletions(-)
   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
   create mode 100644 target/riscv/vector_helper.c


Hello,

Thanks for the patch!

As others have pointed out you will need to split the patch up into
multiple smaller patches, otherwise it is too hard to review almost
30,000 lines of code.

Hi, Alistair

I'm so sorry for the inconvenience. It will be a patch set with a cover
letter in V2.

No worries.


Can you also include a cover letter with your patch series describing
how you are testing this? AFAIK vector extension support isn't in any
compiler so I'm assuming you are handwriting the assembly or have
toolchain patches. Either way it will help if you can share that so
others can test your implementation.

Yes, it's handwriting assembly. The assembler in Binutils has support
Vector extension.  First define an function test_vadd_vv_8 in assembly
and then it can be called from a C program.

The function is something like

/* vadd.vv */
TEST_FUNC(test_vadd_vv_8)
  vsetvlit1, x0, e8, m2
  vlb.v   v6, (a4)
  vsb.v   v6, (a3)
  vsetvlit1, a0, e8, m2
  vlb.v   v0, (a1)
  vlb.v   v2, (a2)
  vadd.vv v4, v0, v2
  vsb.v  v4, (a3)
ret
  .size   test_vadd_vv_8, .-test_vadd_vv_8

If possible it might be worth releasing the code that you are using for testing.

Yes,  but I didn't find a good place to release these test codes currently.



It takes more time to test than to implement the instructions. Maybe
there is some better test method or some forced test cases in QEMU.
Could you give me some advice for testing?

Richard's idea of risu seems like a good option.
All the test cases will be validated in Spike,  which has supported the 
same vector specification. But this  cross validation work may delay 
until V3.
I will split the patch, and address comments as soon as possible, to 
ensure the patch V2 can be sent next week.

Would it be all right?


Thinking about it a bit more we are going to have other extensions in
the future that will need assembly testing so setting up a test
framework seems like a good idea. I am happy to help try and get this
going as well.

Alistair
There is usually a big difference between new a ISA extension and the 
others. I doubt there is an general framework. A very light framework  
includes
building, input aiding  generation, result validation, and report maybe 
OK .


Best Regards,
Zhiwei

Best Regards,

Zhiwei


Alex and Richard have kindly started the review. Once you have
addressed their comments and split this patch up into smaller patches
you can send a v2 and we can go from there.

Once again thanks for doing this implementation for QEMU!

Alistair





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-30 Thread Alistair Francis
On Fri, Aug 30, 2019 at 2:06 AM Alex Bennée  wrote:
>
>
> Alistair Francis  writes:
>
> > On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei  wrote:
> >>
> >> On 2019/8/29 上午5:34, Alistair Francis wrote:
> >> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:
> >> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> >> >> Signed-off-by: liuzhiwei 
> >> >> ---
> >> >>   fpu/softfloat.c |   119 +
> >> >>   include/fpu/softfloat.h | 4 +
> >> >>   linux-user/riscv/cpu_loop.c | 8 +-
> >> >>   target/riscv/Makefile.objs  | 2 +-
> >> >>   target/riscv/cpu.h  |30 +
> >> >>   target/riscv/cpu_bits.h |15 +
> >> >>   target/riscv/cpu_helper.c   | 7 +
> >> >>   target/riscv/csr.c  |65 +-
> >> >>   target/riscv/helper.h   |   354 +
> >> >>   target/riscv/insn32.decode  |   374 +-
> >> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
> >> >>   target/riscv/translate.c| 1 +
> >> >>   target/riscv/vector_helper.c| 26563 
> >> >> ++
> >> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
> >> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >> >>   create mode 100644 target/riscv/vector_helper.c
> >> >>
> >> > Hello,
> >> >
> >> > Thanks for the patch!
> >> >
> >> > As others have pointed out you will need to split the patch up into
> >> > multiple smaller patches, otherwise it is too hard to review almost
> >> > 30,000 lines of code.
> >>
> >> Hi, Alistair
> >>
> >> I'm so sorry for the inconvenience. It will be a patch set with a cover
> >> letter in V2.
> >
> > No worries.
> >
> >>
> >> > Can you also include a cover letter with your patch series describing
> >> > how you are testing this? AFAIK vector extension support isn't in any
> >> > compiler so I'm assuming you are handwriting the assembly or have
> >> > toolchain patches. Either way it will help if you can share that so
> >> > others can test your implementation.
> >>
> >> Yes, it's handwriting assembly. The assembler in Binutils has support
> >> Vector extension.  First define an function test_vadd_vv_8 in assembly
> >> and then it can be called from a C program.
> >>
> >> The function is something like
> >>
> >> /* vadd.vv */
> >> TEST_FUNC(test_vadd_vv_8)
> >>  vsetvlit1, x0, e8, m2
> >>  vlb.v   v6, (a4)
> >>  vsb.v   v6, (a3)
> >>  vsetvlit1, a0, e8, m2
> >>  vlb.v   v0, (a1)
> >>  vlb.v   v2, (a2)
> >>  vadd.vv v4, v0, v2
> >>  vsb.v  v4, (a3)
> >> ret
> >>  .size   test_vadd_vv_8, .-test_vadd_vv_8
> >
> > If possible it might be worth releasing the code that you are using for 
> > testing.
> >
> >>
> >> It takes more time to test than to implement the instructions. Maybe
> >> there is some better test method or some forced test cases in QEMU.
> >> Could you give me some advice for testing?
> >
> > Richard's idea of risu seems like a good option.
> >
> > Thinking about it a bit more we are going to have other extensions in
> > the future that will need assembly testing so setting up a test
> > framework seems like a good idea. I am happy to help try and get this
> > going as well.

Ah, I looked into this more and it compares it to hardware running the
same binary. In this case there is no hardware so that doesn't work
too well.

What we could do though, is compare it to Spike (which I think has the
vector instructions?) which would have the same effect.

>
> tests/tcg already has the bits you need for both linux-user and system
> based testing. The main problem is getting a version of gcc that is new
> enough to emit the newer instructions. I recently updated the images to
> buster so gcc is pretty recent now (8.3).

In this case there is no GCC with the new instructions.

>
> I did start down the road of a general "op" test frame work which tried
> to come up with a common framework/boilerplate so all you needed to do
> was supply a new function (possible with a hex encoded instruction) and
> a list of expected inputs and outputs:
>
>   https://github.com/stsquad/qemu/commits/testing/generic-op-tester
>
> I suspect it was over engineered but perhaps it would be worth reviving
> it (or something like it) to make adding a simple single instruction
> test case with minimal additional verbiage?

That would be interesting, I'll take a look.

Alistair

>
> >
> > Alistair
> >
> >>
> >> Best Regards,
> >>
> >> Zhiwei
> >>
> >> > Alex and Richard have kindly started the review. Once you have
> >> > addressed their comments and split this patch up into smaller patches
> >> > you can send a v2 and we can go from there.
> >> >
> >> > Once again thanks for doing this implementation for QEMU!
> >> >
> >> > Alistair
> >> >
>
>
> --
> Alex Bennée



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-30 Thread Alex Bennée


Alistair Francis  writes:

> On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei  wrote:
>>
>> On 2019/8/29 上午5:34, Alistair Francis wrote:
>> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:
>> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>> >> Signed-off-by: liuzhiwei 
>> >> ---
>> >>   fpu/softfloat.c |   119 +
>> >>   include/fpu/softfloat.h | 4 +
>> >>   linux-user/riscv/cpu_loop.c | 8 +-
>> >>   target/riscv/Makefile.objs  | 2 +-
>> >>   target/riscv/cpu.h  |30 +
>> >>   target/riscv/cpu_bits.h |15 +
>> >>   target/riscv/cpu_helper.c   | 7 +
>> >>   target/riscv/csr.c  |65 +-
>> >>   target/riscv/helper.h   |   354 +
>> >>   target/riscv/insn32.decode  |   374 +-
>> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>> >>   target/riscv/translate.c| 1 +
>> >>   target/riscv/vector_helper.c| 26563 
>> >> ++
>> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
>> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>> >>   create mode 100644 target/riscv/vector_helper.c
>> >>
>> > Hello,
>> >
>> > Thanks for the patch!
>> >
>> > As others have pointed out you will need to split the patch up into
>> > multiple smaller patches, otherwise it is too hard to review almost
>> > 30,000 lines of code.
>>
>> Hi, Alistair
>>
>> I'm so sorry for the inconvenience. It will be a patch set with a cover
>> letter in V2.
>
> No worries.
>
>>
>> > Can you also include a cover letter with your patch series describing
>> > how you are testing this? AFAIK vector extension support isn't in any
>> > compiler so I'm assuming you are handwriting the assembly or have
>> > toolchain patches. Either way it will help if you can share that so
>> > others can test your implementation.
>>
>> Yes, it's handwriting assembly. The assembler in Binutils has support
>> Vector extension.  First define an function test_vadd_vv_8 in assembly
>> and then it can be called from a C program.
>>
>> The function is something like
>>
>> /* vadd.vv */
>> TEST_FUNC(test_vadd_vv_8)
>>  vsetvlit1, x0, e8, m2
>>  vlb.v   v6, (a4)
>>  vsb.v   v6, (a3)
>>  vsetvlit1, a0, e8, m2
>>  vlb.v   v0, (a1)
>>  vlb.v   v2, (a2)
>>  vadd.vv v4, v0, v2
>>  vsb.v  v4, (a3)
>> ret
>>  .size   test_vadd_vv_8, .-test_vadd_vv_8
>
> If possible it might be worth releasing the code that you are using for 
> testing.
>
>>
>> It takes more time to test than to implement the instructions. Maybe
>> there is some better test method or some forced test cases in QEMU.
>> Could you give me some advice for testing?
>
> Richard's idea of risu seems like a good option.
>
> Thinking about it a bit more we are going to have other extensions in
> the future that will need assembly testing so setting up a test
> framework seems like a good idea. I am happy to help try and get this
> going as well.

tests/tcg already has the bits you need for both linux-user and system
based testing. The main problem is getting a version of gcc that is new
enough to emit the newer instructions. I recently updated the images to
buster so gcc is pretty recent now (8.3).

I did start down the road of a general "op" test frame work which tried
to come up with a common framework/boilerplate so all you needed to do
was supply a new function (possible with a hex encoded instruction) and
a list of expected inputs and outputs:

  https://github.com/stsquad/qemu/commits/testing/generic-op-tester

I suspect it was over engineered but perhaps it would be worth reviving
it (or something like it) to make adding a simple single instruction
test case with minimal additional verbiage?

>
> Alistair
>
>>
>> Best Regards,
>>
>> Zhiwei
>>
>> > Alex and Richard have kindly started the review. Once you have
>> > addressed their comments and split this patch up into smaller patches
>> > you can send a v2 and we can go from there.
>> >
>> > Once again thanks for doing this implementation for QEMU!
>> >
>> > Alistair
>> >


--
Alex Bennée



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread Alistair Francis
On Thu, Aug 29, 2019 at 5:05 AM liuzhiwei  wrote:
>
> On 2019/8/29 上午5:34, Alistair Francis wrote:
> > On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:
> >> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> >> Signed-off-by: liuzhiwei 
> >> ---
> >>   fpu/softfloat.c |   119 +
> >>   include/fpu/softfloat.h | 4 +
> >>   linux-user/riscv/cpu_loop.c | 8 +-
> >>   target/riscv/Makefile.objs  | 2 +-
> >>   target/riscv/cpu.h  |30 +
> >>   target/riscv/cpu_bits.h |15 +
> >>   target/riscv/cpu_helper.c   | 7 +
> >>   target/riscv/csr.c  |65 +-
> >>   target/riscv/helper.h   |   354 +
> >>   target/riscv/insn32.decode  |   374 +-
> >>   target/riscv/insn_trans/trans_rvv.inc.c |   484 +
> >>   target/riscv/translate.c| 1 +
> >>   target/riscv/vector_helper.c| 26563 
> >> ++
> >>   13 files changed, 28017 insertions(+), 9 deletions(-)
> >>   create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
> >>   create mode 100644 target/riscv/vector_helper.c
> >>
> > Hello,
> >
> > Thanks for the patch!
> >
> > As others have pointed out you will need to split the patch up into
> > multiple smaller patches, otherwise it is too hard to review almost
> > 30,000 lines of code.
>
> Hi, Alistair
>
> I'm so sorry for the inconvenience. It will be a patch set with a cover
> letter in V2.

No worries.

>
> > Can you also include a cover letter with your patch series describing
> > how you are testing this? AFAIK vector extension support isn't in any
> > compiler so I'm assuming you are handwriting the assembly or have
> > toolchain patches. Either way it will help if you can share that so
> > others can test your implementation.
>
> Yes, it's handwriting assembly. The assembler in Binutils has support
> Vector extension.  First define an function test_vadd_vv_8 in assembly
> and then it can be called from a C program.
>
> The function is something like
>
> /* vadd.vv */
> TEST_FUNC(test_vadd_vv_8)
>  vsetvlit1, x0, e8, m2
>  vlb.v   v6, (a4)
>  vsb.v   v6, (a3)
>  vsetvlit1, a0, e8, m2
>  vlb.v   v0, (a1)
>  vlb.v   v2, (a2)
>  vadd.vv v4, v0, v2
>  vsb.v  v4, (a3)
> ret
>  .size   test_vadd_vv_8, .-test_vadd_vv_8

If possible it might be worth releasing the code that you are using for testing.

>
> It takes more time to test than to implement the instructions. Maybe
> there is some better test method or some forced test cases in QEMU.
> Could you give me some advice for testing?

Richard's idea of risu seems like a good option.

Thinking about it a bit more we are going to have other extensions in
the future that will need assembly testing so setting up a test
framework seems like a good idea. I am happy to help try and get this
going as well.

Alistair

>
> Best Regards,
>
> Zhiwei
>
> > Alex and Richard have kindly started the review. Once you have
> > addressed their comments and split this patch up into smaller patches
> > you can send a v2 and we can go from there.
> >
> > Once again thanks for doing this implementation for QEMU!
> >
> > Alistair
> >



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread Aleksandar Markovic
29.08.2019. 15.02, "liuzhiwei"  је написао/ла:
>
>
> On 2019/8/29 上午3:20, Aleksandar Markovic wrote:
>>
>>
>>
>> > On Wed, Aug 28, 2019 at 9:04 AM liuzhiwei  wrote:
>>>
>>> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
>>> Signed-off-by: liuzhiwei 
>>> ---
>>
>>
>> Such large patch and "Change-Id:
I3cf891bc400713b95f47ecca82b1bf773f3dcb25" is its entire commit message??
Horrible.
>
> Hi,  Aleksandar
>
> I am so sorry. A patch set with cover letter will be sent later.
>
> Best Regards,
>
> Zhiwei

OK, Zhiwei,

You'll soon get more used  to participating in open source, and write much
better patches.

Try to follow guidelines described at
https://wiki.qemu.org/Contribute/SubmitAPatch
Thanks,
Aleksandar



>>
>> Aleksandar
>>
>>>
>>>  fpu/softfloat.c |   119 +
>>>  include/fpu/softfloat.h | 4 +
>>>  linux-user/riscv/cpu_loop.c | 8 +-
>>>  target/riscv/Makefile.objs  | 2 +-
>>>  target/riscv/cpu.h  |30 +
>>>  target/riscv/cpu_bits.h |15 +
>>>  target/riscv/cpu_helper.c   | 7 +
>>>  target/riscv/csr.c  |65 +-
>>>  target/riscv/helper.h   |   354 +
>>>  target/riscv/insn32.decode  |   374 +-
>>>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>>>  target/riscv/translate.c| 1 +
>>>  target/riscv/vector_helper.c| 26563
++
>>>  13 files changed, 28017 insertions(+), 9 deletions(-)
>>>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>>>  create mode 100644 target/riscv/vector_helper.c
>>>
>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>> index 2ba36ec..da155ea 100644
>>> --- a/fpu/softfloat.c
>>> +++ b/fpu/softfloat.c
>>> @@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
>>>  }
>>>
>>>
 /*
>>> +| Returns the sign bit of the half-precision floating-point value `a'.
>>>
+**/
>>> +
>>> +static inline flag extractFloat16Sign(float16 a)
>>> +{
>>> +return float16_val(a) >> 0xf;
>>> +}
>>> +
>>> +
>>>
+/*
>>>  | Returns the fraction bits of the single-precision floating-point
value `a'.
>>>
 **/
>>>
>>> @@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b,
float_status *status)
>>>  }
>>>
>>>
 /*
>>> +| Returns 1 if the half-precision floating-point value `a' is less than
>>> +| or equal to the corresponding value `b', and 0 otherwise.  The
invalid
>>> +| exception is raised if either operand is a NaN.  The comparison is
performed
>>> +| according to the IEC/IEEE Standard for Binary Floating-Point
Arithmetic.
>>>
+**/
>>> +
>>> +int float16_le(float16 a, float16 b, float_status *status)
>>> +{
>>> +flag aSign, bSign;
>>> +uint16_t av, bv;
>>> +a = float16_squash_input_denormal(a, status);
>>> +b = float16_squash_input_denormal(b, status);
>>> +
>>> +if (( ( extractFloat16Exp( a ) == 0x1F ) &&
extractFloat16Frac( a ) )
>>> + || ( ( extractFloat16Exp( b ) == 0x1F ) &&
extractFloat16Frac( b ) )
>>> +   ) {
>>> +float_raise(float_flag_invalid, status);
>>> +return 0;
>>> +}
>>> +aSign = extractFloat16Sign( a );
>>> +bSign = extractFloat16Sign( b );
>>> +av = float16_val(a);
>>> +bv = float16_val(b);
>>> +if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv
)<<1 ) == 0 );
>>> +return ( av == bv ) || ( aSign ^ ( av < bv ) );
>>> +
>>> +}
>>> +
>>>
+/*
>>>  | Returns 1 if the single-precision floating-point value `a' is less
than
>>>  | or equal to the corresponding value `b', and 0 otherwise.  The
invalid
>>>  | exception is raised if either operand is a NaN.  The comparison is
performed
>>> @@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b,
float_status *status)
>>>  | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>>>
 **/
>>>
>>> +int float16_lt(float16 a, float16 b, float_status *status)
>>> +{
>>> +flag aSign, bSign;
>>> +uint16_t av, bv;
>>> +a = float16_squash_input_denormal(a, status);
>>> +b = float16_squash_input_denormal(b, status);
>>> +
>>> +if (( ( extractFloat16Exp( a ) == 0x1F ) &&
extractFloat16Frac( a ) )
>>> + || ( ( extractFloat16Exp( b ) == 0x1F ) &&
extractFloat16Frac( b ) )
>>> +   ) {
>>> +float_raise(float_flag_invalid, status);
>>> +return 0;

Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread Richard Henderson
On 8/29/19 5:45 AM, liuzhiwei wrote:
> Even in qemu,  it may be some situations that VSTART != 0. For example, a load
> instruction leads to a page fault exception in a middle position. If VSTART ==
> 0,  some elements that had been loaded before the exception will be loaded 
> once
> again.

Alternately, you can validate all of the pages before performing any memory
operations.  At which point there will never be an exception in the middle.

As it turns out, you *must* do this in order to allow watchpoints to work
correctly.  David Hildebrand and I are at this moment fixing this aspect of
watchpoints for s390x.

See https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg05979.html


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread Richard Henderson
On 8/29/19 5:00 AM, liuzhiwei wrote:
> Maybe there is some better test method or some forced test cases in QEMU. 
> Could
> you give me some advice for testing?

If you have hardware, or another simulator, RISU is very good
for testing these sorts of things.

See https://git.linaro.org/people/pmaydell/risu.git

You'll need to write new support for RISC-V, but it's not hard
and we can help out with that.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread liuzhiwei

Hi,  Alex

On 2019/8/28 下午5:08, Alex Bennée wrote:

liuzhiwei  writes:


Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
Signed-off-by: liuzhiwei 
---
  fpu/softfloat.c |   119 +
  include/fpu/softfloat.h | 4 +

Changes to softfloat should be in a separate patch, but see bellow.


  linux-user/riscv/cpu_loop.c | 8 +-
  target/riscv/Makefile.objs  | 2 +-
  target/riscv/cpu.h  |30 +
  target/riscv/cpu_bits.h |15 +
  target/riscv/cpu_helper.c   | 7 +
  target/riscv/csr.c  |65 +-
  target/riscv/helper.h   |   354 +
  target/riscv/insn32.decode  |   374 +-
  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
  target/riscv/translate.c| 1 +
  target/riscv/vector_helper.c| 26563 ++

This is likely too big to be reviewed. Is it possible to split the patch
up into more discrete chunks, for example support pieces and then maybe
a class at a time?


Yes,  a patch set with cover letter will be sent later.




  13 files changed, 28017 insertions(+), 9 deletions(-)
  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
  create mode 100644 target/riscv/vector_helper.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2ba36ec..da155ea 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
  }

  /*
+| Returns the sign bit of the half-precision floating-point value `a'.
+**/
+
+static inline flag extractFloat16Sign(float16 a)
+{
+return float16_val(a) >> 0xf;
+}
+

We are trying to avoid this sort of bit fiddling for new code when we
already have generic decompose functions that can extract all the parts
into a common format.


+
+/*
  | Returns the fraction bits of the single-precision floating-point value `a'.
  
**/

@@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b, float_status 
*status)
  }

  /*
+| Returns 1 if the half-precision floating-point value `a' is less than
+| or equal to the corresponding value `b', and 0 otherwise.  The invalid
+| exception is raised if either operand is a NaN.  The comparison is performed
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+**/
+
+int float16_le(float16 a, float16 b, float_status *status)
+{
+flag aSign, bSign;
+uint16_t av, bv;
+a = float16_squash_input_denormal(a, status);
+b = float16_squash_input_denormal(b, status);
+
+if (( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
+ || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
+   ) {
+float_raise(float_flag_invalid, status);
+return 0;
+}
+aSign = extractFloat16Sign( a );
+bSign = extractFloat16Sign( b );
+av = float16_val(a);
+bv = float16_val(b);
+if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv )<<1 ) == 0 
);
+return ( av == bv ) || ( aSign ^ ( av < bv ) );
+
+}

What does this provide that:

   float16_compare(a, b, status) == float_relation_less;

doesn't?


+
+/*
  | Returns 1 if the single-precision floating-point value `a' is less than
  | or equal to the corresponding value `b', and 0 otherwise.  The invalid
  | exception is raised if either operand is a NaN.  The comparison is performed
@@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b, float_status 
*status)
  | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
  
**/

+int float16_lt(float16 a, float16 b, float_status *status)
+{
+flag aSign, bSign;
+uint16_t av, bv;
+a = float16_squash_input_denormal(a, status);
+b = float16_squash_input_denormal(b, status);
+
+if (( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
+ || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
+   ) {
+float_raise(float_flag_invalid, status);
+return 0;
+}
+aSign = extractFloat16Sign( a );
+bSign = extractFloat16Sign( b );
+av = float16_val(a);
+bv = float16_val(b);
+if ( aSign != bSign ) return aSign && ( (uint16_t) ( ( av | bv )<<1 ) != 0 
);
+return ( av != bv ) && ( aSign ^ ( av < bv ) );
+
+}
+

Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread liuzhiwei



On 2019/8/29 上午4:43, Richard Henderson wrote:

On 8/28/19 11:54 AM, Richard Henderson wrote:

But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
single bit.

BTW, it is reasonable to check VSTART == 0 always.  Quoting the spec:

# Implementations are permitted to raise illegal instruction exceptions
# when attempting to execute a vector instruction with a value of vstart
# that the implementation can never produce when executing that same
# instruction with the same vtype setting.

Since qemu will never interrupt a single instruction, each vector instruction
will always run to completion, which clears VSTART.  Since QEMU will never
produce a non-zero value of VSTART, it is allowed to trap on any non-zero
setting of VSTART.

I.e. it can be handled at translation time alongside VILL.


Hi, Richard

I am so sorry for the inconvenience. It is very kind of you to review 
the horrible long code and give so many comments.


Even in qemu,  it may be some situations that VSTART != 0. For example, 
a load instruction leads to a page fault exception in a middle position. 
If VSTART == 0,  some elements that had been loaded before the exception 
will be loaded once again.


Specially,  it may be a mistake if  the instruction restores execution 
with VSTART==  0.  When lmul == 1,


   "vlb v0 ,(a0), v0.t"

As v0 is the mask register,  if it is modified,  some part of it can't 
be used again.


It will take some time to address the other comments. After that I will 
split the patch into patch set with a cover letter in V2.


Thank you again for your review!

Best Regards,

Zhiwei




r~





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-29 Thread liuzhiwei

On 2019/8/29 上午5:34, Alistair Francis wrote:

On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:

Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
Signed-off-by: liuzhiwei 
---
  fpu/softfloat.c |   119 +
  include/fpu/softfloat.h | 4 +
  linux-user/riscv/cpu_loop.c | 8 +-
  target/riscv/Makefile.objs  | 2 +-
  target/riscv/cpu.h  |30 +
  target/riscv/cpu_bits.h |15 +
  target/riscv/cpu_helper.c   | 7 +
  target/riscv/csr.c  |65 +-
  target/riscv/helper.h   |   354 +
  target/riscv/insn32.decode  |   374 +-
  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
  target/riscv/translate.c| 1 +
  target/riscv/vector_helper.c| 26563 ++
  13 files changed, 28017 insertions(+), 9 deletions(-)
  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
  create mode 100644 target/riscv/vector_helper.c


Hello,

Thanks for the patch!

As others have pointed out you will need to split the patch up into
multiple smaller patches, otherwise it is too hard to review almost
30,000 lines of code.


Hi, Alistair

I'm so sorry for the inconvenience. It will be a patch set with a cover 
letter in V2.



Can you also include a cover letter with your patch series describing
how you are testing this? AFAIK vector extension support isn't in any
compiler so I'm assuming you are handwriting the assembly or have
toolchain patches. Either way it will help if you can share that so
others can test your implementation.


Yes, it's handwriting assembly. The assembler in Binutils has support 
Vector extension.  First define an function test_vadd_vv_8 in assembly 
and then it can be called from a C program.


The function is something like

/* vadd.vv */
TEST_FUNC(test_vadd_vv_8)
    vsetvli    t1, x0, e8, m2
    vlb.v   v6, (a4)
    vsb.v   v6, (a3)
    vsetvli    t1, a0, e8, m2
    vlb.v   v0, (a1)
    vlb.v   v2, (a2)
    vadd.vv v4, v0, v2
    vsb.v  v4, (a3)
ret
    .size   test_vadd_vv_8, .-test_vadd_vv_8

It takes more time to test than to implement the instructions. Maybe 
there is some better test method or some forced test cases in QEMU. 
Could you give me some advice for testing?


Best Regards,

Zhiwei


Alex and Richard have kindly started the review. Once you have
addressed their comments and split this patch up into smaller patches
you can send a v2 and we can go from there.

Once again thanks for doing this implementation for QEMU!

Alistair





Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-28 Thread Alistair Francis
On Wed, Aug 28, 2019 at 12:04 AM liuzhiwei  wrote:
>
> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei 
> ---
>  fpu/softfloat.c |   119 +
>  include/fpu/softfloat.h | 4 +
>  linux-user/riscv/cpu_loop.c | 8 +-
>  target/riscv/Makefile.objs  | 2 +-
>  target/riscv/cpu.h  |30 +
>  target/riscv/cpu_bits.h |15 +
>  target/riscv/cpu_helper.c   | 7 +
>  target/riscv/csr.c  |65 +-
>  target/riscv/helper.h   |   354 +
>  target/riscv/insn32.decode  |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c| 1 +
>  target/riscv/vector_helper.c| 26563 
> ++
>  13 files changed, 28017 insertions(+), 9 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
>

Hello,

Thanks for the patch!

As others have pointed out you will need to split the patch up into
multiple smaller patches, otherwise it is too hard to review almost
30,000 lines of code.

Can you also include a cover letter with your patch series describing
how you are testing this? AFAIK vector extension support isn't in any
compiler so I'm assuming you are handwriting the assembly or have
toolchain patches. Either way it will help if you can share that so
others can test your implementation.

Alex and Richard have kindly started the review. Once you have
addressed their comments and split this patch up into smaller patches
you can send a v2 and we can go from there.

Once again thanks for doing this implementation for QEMU!

Alistair



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-28 Thread Richard Henderson
On 8/28/19 11:54 AM, Richard Henderson wrote:
> But it might be reasonable to include (VSTART == 0 && VL == VLMAX) as a
> single bit.

BTW, it is reasonable to check VSTART == 0 always.  Quoting the spec:

# Implementations are permitted to raise illegal instruction exceptions
# when attempting to execute a vector instruction with a value of vstart
# that the implementation can never produce when executing that same
# instruction with the same vtype setting.

Since qemu will never interrupt a single instruction, each vector instruction
will always run to completion, which clears VSTART.  Since QEMU will never
produce a non-zero value of VSTART, it is allowed to trap on any non-zero
setting of VSTART.

I.e. it can be handled at translation time alongside VILL.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-28 Thread Richard Henderson
On 8/27/19 7:36 PM, liuzhiwei wrote:
> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei 
> ---
>  fpu/softfloat.c |   119 +
>  include/fpu/softfloat.h | 4 +
>  linux-user/riscv/cpu_loop.c | 8 +-
>  target/riscv/Makefile.objs  | 2 +-
>  target/riscv/cpu.h  |30 +
>  target/riscv/cpu_bits.h |15 +
>  target/riscv/cpu_helper.c   | 7 +
>  target/riscv/csr.c  |65 +-
>  target/riscv/helper.h   |   354 +
>  target/riscv/insn32.decode  |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c| 1 +
>  target/riscv/vector_helper.c| 26563 
> ++
>  13 files changed, 28017 insertions(+), 9 deletions(-)

As Alex mentioned, this is *far* too big to be presented as a single patch.

> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 3ff3fa5..3b0754c 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -293,6 +293,10 @@ float16 float16_maxnummag(float16, float16, float_status 
> *status);
>  float16 float16_sqrt(float16, float_status *status);
>  int float16_compare(float16, float16, float_status *status);
>  int float16_compare_quiet(float16, float16, float_status *status);
> +int float16_unordered_quiet(float16, float16, float_status *status);
> +int float16_le(float16, float16, float_status *status);
> +int float16_lt(float16, float16, float_status *status);
> +int float16_eq_quiet(float16, float16, float_status *status);

As Alex mentioned, none of these changes are required, as all
functionality is provided by float16_compare{,_quiet}.

> diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
> index 12aa3c0..b01548a 100644
> --- a/linux-user/riscv/cpu_loop.c
> +++ b/linux-user/riscv/cpu_loop.c
> @@ -40,7 +40,13 @@ void cpu_loop(CPURISCVState *env)
>  signum = 0;
>  sigcode = 0;
>  sigaddr = 0;
> -
> +if (env->foflag) {
> +if (env->vfp.vl != 0) {
> +env->foflag = false;
> +env->pc += 4;
> +continue;
> +}

This is most definitely not the correct way to implement first-fault.

You need to have a look at target/arm/sve_helper.c, e.g. sve_ldff1_r,
where we test pages for validity with tlb_vaddr_to_host.

> +/* vector coprocessor state.  */
> +struct {
> +union VECTOR {
> +float64  f64[VUNIT(64)];
> +float32  f32[VUNIT(32)];
> +float16  f16[VUNIT(16)];
> +target_ulong ul[VUNIT(sizeof(target_ulong))];
> +uint64_t u64[VUNIT(64)];
> +int64_t  s64[VUNIT(64)];
> +uint32_t u32[VUNIT(32)];
> +int32_t  s32[VUNIT(32)];
> +uint16_t u16[VUNIT(16)];
> +int16_t  s16[VUNIT(16)];
> +uint8_t  u8[VUNIT(8)];
> +int8_t   s8[VUNIT(8)];
> +} vreg[32];
> +target_ulong vxrm;
> +target_ulong vxsat;
> +target_ulong vl;
> +target_ulong vstart;
> +target_ulong vtype;
> +float_status fp_status;
> +} vfp;

You've obviously copied "vfp" from target/arm.  Drop that.  It makes no sense
in the context of risc-v.

I'm not sure that vreg[].element[] really makes the most sense in the context
of how risc-v rearranges its elements.  It will almost certainly fail clang
validators, if enabled, since you'll be indexing beyond the end of vreg[n] into
vreg[n+1].

It might be best to have a single array:

union {
uint64_t u64[32 * VLEN / 64];
...
uint8_t u8[32 * VLEN / 8];
} velt;

This is clearer to the compiler that this is a single block of memory that we
can index as we please.

Note that float64/float32/float16 are legacy.  They will always be equivalent
to the unsigned integer types of the same size.

Is there really any vector operation at all that is dependent on XLEN?  If not,
then there is no reason to confuse things by including target_ulong.


> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index e32b612..405caf6 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -521,6 +521,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>  [PRV_H] = RISCV_EXCP_H_ECALL,
>  [PRV_M] = RISCV_EXCP_M_ECALL
>  };
> +if (env->foflag) {
> +if (env->vfp.vl != 0) {
> +env->foflag = false;
> +env->pc += 4;
> +return;
> +}
> +}

Again, not the way to implement first-fault.

In particular, you haven't even verified that do_interrupt has been called on
behalf of a RISCV_EXCP_LOAD_PAGE_FAULT.  This could be a timer tick.

> +#define MAX_U8  ((uint8_t)0xff)
> +#define MIN_U8  ((uint8_t)0x0)
> +#define MAX_S8  ((int8_t)0x7f)
> 

Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-28 Thread Richard Henderson
On 8/28/19 2:08 AM, Alex Bennée wrote:
> If you want to do vectors I suggest you look at the TCGvec types for
> passing pointers to vector registers to helpers. In this case you will
> want to ensure your vector registers are properly aligned.

The risc-v vector extension is very different from any other existing vector
extension.  In particular, the locations of the vector elements vary
dynamically.  Except for certain special cases I doubt that risc-v can make
direct use of the generic TCG vector support.


r~



Re: [Qemu-devel] [PATCH] RISCV: support riscv vector extension 0.7.1

2019-08-28 Thread Alex Bennée


liuzhiwei  writes:

> Change-Id: I3cf891bc400713b95f47ecca82b1bf773f3dcb25
> Signed-off-by: liuzhiwei 
> ---
>  fpu/softfloat.c |   119 +
>  include/fpu/softfloat.h | 4 +

Changes to softfloat should be in a separate patch, but see bellow.

>  linux-user/riscv/cpu_loop.c | 8 +-
>  target/riscv/Makefile.objs  | 2 +-
>  target/riscv/cpu.h  |30 +
>  target/riscv/cpu_bits.h |15 +
>  target/riscv/cpu_helper.c   | 7 +
>  target/riscv/csr.c  |65 +-
>  target/riscv/helper.h   |   354 +
>  target/riscv/insn32.decode  |   374 +-
>  target/riscv/insn_trans/trans_rvv.inc.c |   484 +
>  target/riscv/translate.c| 1 +
>  target/riscv/vector_helper.c| 26563 
> ++

This is likely too big to be reviewed. Is it possible to split the patch
up into more discrete chunks, for example support pieces and then maybe
a class at a time?

>  13 files changed, 28017 insertions(+), 9 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvv.inc.c
>  create mode 100644 target/riscv/vector_helper.c
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 2ba36ec..da155ea 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -433,6 +433,16 @@ static inline int extractFloat16Exp(float16 a)
>  }
>
>  
> /*
> +| Returns the sign bit of the half-precision floating-point value `a'.
> +**/
> +
> +static inline flag extractFloat16Sign(float16 a)
> +{
> +return float16_val(a) >> 0xf;
> +}
> +

We are trying to avoid this sort of bit fiddling for new code when we
already have generic decompose functions that can extract all the parts
into a common format.

> +
> +/*
>  | Returns the fraction bits of the single-precision floating-point value `a'.
>  
> **/
>
> @@ -4790,6 +4800,35 @@ int float32_eq(float32 a, float32 b, float_status 
> *status)
>  }
>
>  
> /*
> +| Returns 1 if the half-precision floating-point value `a' is less than
> +| or equal to the corresponding value `b', and 0 otherwise.  The invalid
> +| exception is raised if either operand is a NaN.  The comparison is 
> performed
> +| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +**/
> +
> +int float16_le(float16 a, float16 b, float_status *status)
> +{
> +flag aSign, bSign;
> +uint16_t av, bv;
> +a = float16_squash_input_denormal(a, status);
> +b = float16_squash_input_denormal(b, status);
> +
> +if (( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> + || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +   ) {
> +float_raise(float_flag_invalid, status);
> +return 0;
> +}
> +aSign = extractFloat16Sign( a );
> +bSign = extractFloat16Sign( b );
> +av = float16_val(a);
> +bv = float16_val(b);
> +if ( aSign != bSign ) return aSign || ( (uint16_t) ( ( av | bv )<<1 ) == 
> 0 );
> +return ( av == bv ) || ( aSign ^ ( av < bv ) );
> +
> +}

What does this provide that:

  float16_compare(a, b, status) == float_relation_less;

doesn't?

> +
> +/*
>  | Returns 1 if the single-precision floating-point value `a' is less than
>  | or equal to the corresponding value `b', and 0 otherwise.  The invalid
>  | exception is raised if either operand is a NaN.  The comparison is 
> performed
> @@ -4825,6 +4864,35 @@ int float32_le(float32 a, float32 b, float_status 
> *status)
>  | to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>  
> **/
>
> +int float16_lt(float16 a, float16 b, float_status *status)
> +{
> +flag aSign, bSign;
> +uint16_t av, bv;
> +a = float16_squash_input_denormal(a, status);
> +b = float16_squash_input_denormal(b, status);
> +
> +if (( ( extractFloat16Exp( a ) == 0x1F ) && extractFloat16Frac( a ) )
> + || ( ( extractFloat16Exp( b ) == 0x1F ) && extractFloat16Frac( b ) )
> +   ) {
> +float_raise(float_flag_invalid, status);
> +return 0;
> +}
> +aSign = extractFloat16Sign( a );
> +bSign = extractFloat16Sign( b );
> +av = float16_val(a);
> +bv = float16_val(b);
> +if ( aSign != bSign ) return aSign && ( (uint16_t) ( ( av | bv )<<1 ) != 
> 0 );
> +return ( av != bv ) && ( aSign ^ ( av < bv )