Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-10-01 Thread Jim Wilson
On Wed, Sep 30, 2020 at 11:35 PM Richard Biener wrote:
> On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson wrote:
> > We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> > we are using for testing the vector support.
>
> That doesn't seem to exist (but maybe it's just not on trunk yet).

The vector extension is still in draft form, and they are still making
major compatibility breaks.  There was yet another one about 3-4 weeks
ago.  I don't want to upstream anything until we have an officially
accepted V extension, at which point they will stop allowing
compatibility breaks.  If we upstream now, we would need some protocol
for how to handle unsupported experimental patches in mainline, and I
don't think that we have one.

So for now, the vector support is on a branch in the RISC-V
International GitHub repo:
https://github.com/riscv/riscv-gnu-toolchain/tree/rvv-intrinsic
The gcc testcases specifically are here:
https://github.com/riscv/riscv-gcc/tree/riscv-gcc-10.1-rvv-dev/gcc/testsuite/gcc.target/riscv/rvv
A lot of the testcases use macros so we can test every variation of an
instruction, and there is a large number of variations for most
instructions, so most of these testcases aren't very readable.  They
are just to verify that we can generate the instructions we expect.
Only the algorithm ones are readable, like saxpy, memcpy, strcpy.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-10-01 Thread Richard Biener via Gcc
On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson wrote:
>
> On Tue, Sep 29, 2020 at 11:40 PM Richard Biener wrote:
> > But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
> > be a barrier for code motion if you make it __attribute__((returns_twice))
> > since then abnormal edges distort the CFG in a way preventing such motion.
>
> At the gimple level, all vector operations have an implicit vsetvl, so
> it doesn't matter much how they are ordered, as long as they don't get
> moved across an explicit vsetvl that they depend on.  But the normal
> way to use explicit vsetvl is to control a loop, and you can't move
> dependent operations out of the loop, so it tends to work.  Setting
> vsetvl in the middle of a basic block is less useful and less common,
> and very unlikely to work unless you really know what you are doing.
> Basically, RISC-V wasn't designed to work this way, and so you
> probably shouldn't be writing your code this way.  There might be edge
> cases where we aren't handling this right, as we aren't writing code
> this way, and hence we aren't testing this support.  This is still a
> work in progress.
>
> Good RVV code should look more like this:
>
> #include <riscv_vector.h>
> #include <stddef.h>
>
> void saxpy(size_t n, const float a, const float *x, float *y) {
>   size_t l;
>
>   vfloat32m8_t vx, vy;
>
>   for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
>     vx = vle32_v_f32m8(x);
>     x += l;
>     vy = vle32_v_f32m8(y);
>     // vfmacc
>     vy = a * vx + vy;
>     vse32_v_f32m8(y, vy);
>     y += l;
>   }
> }

Ah, ok - that makes sense.

> We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> we are using for testing the vector support.

That doesn't seem to exist (but maybe it's just not on trunk yet).

Richard.

> Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Jim Wilson
On Tue, Sep 29, 2020 at 11:40 PM Richard Biener wrote:
> But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
> be a barrier for code motion if you make it __attribute__((returns_twice))
> since then abnormal edges distort the CFG in a way preventing such motion.

At the gimple level, all vector operations have an implicit vsetvl, so
it doesn't matter much how they are ordered, as long as they don't get
moved across an explicit vsetvl that they depend on.  But the normal
way to use explicit vsetvl is to control a loop, and you can't move
dependent operations out of the loop, so it tends to work.  Setting
vsetvl in the middle of a basic block is less useful and less common,
and very unlikely to work unless you really know what you are doing.
Basically, RISC-V wasn't designed to work this way, and so you
probably shouldn't be writing your code this way.  There might be edge
cases where we aren't handling this right, as we aren't writing code
this way, and hence we aren't testing this support.  This is still a
work in progress.

Good RVV code should look more like this:

#include <riscv_vector.h>
#include <stddef.h>

void saxpy(size_t n, const float a, const float *x, float *y) {
  size_t l;

  vfloat32m8_t vx, vy;

  // vsetvl returns how many elements this iteration may process
  for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
    vx = vle32_v_f32m8(x);
    x += l;
    vy = vle32_v_f32m8(y);
    // vfmacc
    vy = a * vx + vy;
    vse32_v_f32m8(y, vy);
    y += l;
  }
}
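
For readers without an RVV toolchain, the loop shape can be tried in
plain C.  This is only a scalar sketch of the strip-mining pattern, not
the intrinsic code: model_vsetvl, MODEL_VLMAX and saxpy_model are
made-up names, and the "grant min(n, VLMAX)" policy is an assumption
chosen for simplicity.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware limit: the largest number of
   elements one vector operation may process in this toy model. */
#define MODEL_VLMAX 8

/* Toy model of vsetvl: grant min(requested, MODEL_VLMAX) elements.
   This policy is an assumption of the sketch. */
static size_t model_vsetvl(size_t requested)
{
    return requested < MODEL_VLMAX ? requested : MODEL_VLMAX;
}

/* Scalar saxpy with the same loop shape as the intrinsic version:
   ask how many elements may be processed, process exactly that many,
   advance the pointers, and repeat until nothing is left. */
void saxpy_model(size_t n, float a, const float *x, float *y)
{
    size_t l;

    for (; (l = model_vsetvl(n)) > 0; n -= l) {
        for (size_t i = 0; i < l; i++)   /* one "vector" operation */
            y[i] = a * x[i] + y[i];
        x += l;
        y += l;
    }
}
```

The point of the shape is that the vector length request sits in the
loop condition, so everything in the body depends on it and there is
nothing to reorder it against.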

We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
we are using for testing the vector support.

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Jim Wilson
On Tue, Sep 29, 2020 at 7:22 PM 夏 晋 wrote:
> vint16m1_t foo3(vint16m1_t a, vint16m1_t b){
>   vint16m1_t add = a+b;
>   vint16m1_t mul = a*b;
>   vsetvl_e8m1(32);
>   return add + mul;
> }

Taking another look at your example, you have type confusion.  Using
vsetvl to specify an element width of 8 does not magically convert
types into 8-bit vector types.  They are still 16-bit vector types and
will still result in 16-bit vector operations.  So your explicit
vsetvl_e8m1 is completely useless.

In the RISC-V V scheme, every vector operation emits an implicit
vsetvl instruction, and then we optimize away the redundant ones.  So
the add and mul at the start are emitting two vsetvl instructions.
Then you have an explicit vsetvl.  Then another add, which will emit
another implicit vsetvl.  The compiler reordered the arithmetic in
such a way that two of the implicit vsetvl instructions can be
optimized away.  That probably happened by accident.  But we don't
have support for optimizing away the useless explicit vsetvl, so it
remains.
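
To make the counting concrete, here is a toy model in plain C.  It is a
sketch under stated assumptions, not the pass in the port: struct vop
and count_implicit_kept are hypothetical names, only element width and
LMUL are tracked (the vector length itself is ignored), and an implicit
vsetvl is treated as redundant exactly when the current configuration
already matches.

```c
#include <stddef.h>

/* One entry per operation in program order.  An explicit vsetvl only
   changes the current configuration; a vector operation needs the
   configuration (sew, lmul) and brings its own implicit vsetvl. */
struct vop {
    int explicit_vsetvl;   /* 1 = explicit vsetvl, 0 = vector operation */
    int sew;               /* element width in bits */
    int lmul;              /* register group multiplier */
};

/* Count the implicit vsetvl instructions that survive after dropping
   every one whose configuration is already in effect. */
size_t count_implicit_kept(const struct vop *ops, size_t n)
{
    int cur_sew = 0, cur_lmul = 0;   /* configuration unknown on entry */
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i].explicit_vsetvl) {
            cur_sew = ops[i].sew;
            cur_lmul = ops[i].lmul;
        } else if (ops[i].sew != cur_sew || ops[i].lmul != cur_lmul) {
            cur_sew = ops[i].sew;
            cur_lmul = ops[i].lmul;
            kept++;                  /* this implicit vsetvl is needed */
        }
    }
    return kept;
}
```

In this model, the source order of foo3 (add, mul, explicit e8m1, add)
keeps two implicit vsetvl instructions, while putting all three 16-bit
operations on the same side of the explicit one keeps a single implicit
vsetvl.  The explicit vsetvl stays in both cases.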

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-30 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 9:46 PM Jim Wilson wrote:
>
> On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc wrote:
> > I tried to set the "vlen" after the add & multi, as shown in the following 
> > code:
>
> > vf32 x3,x4;
> > void foo1(float16_t* input, float16_t* output, int vlen){
> >   vf32 add = x3 + x4;
> >   vf32 mul = x3 * x4;
> >   __builtin_riscv_vlen(vlen);  //<
> >   storevf(&output[0], add);
> >   storevf(&output[4], mul);
> > }
>
> Not clear what __builtin_riscv_vlen is doing, or what exactly your
> target is, but the gcc port I did for the RISC-V draft V extension
> creates new fake vector type and vector length registers, like the
> existing fake fp and arg pointer registers, and the vsetvl{i}
> instruction sets the fake vector type and vector length registers, and
> all vector instructions read the fake vector type and vector length
> registers.  That creates the dependence between the instructions that
> prevents reordering.  It is a little more complicated than that, as
> you can have more than one vsetvl{i} instruction setting different
> vector type and/or vector length values, so we have to match on the
> expected values to make sure that vector instructions are tied to the
> right vsetvl{i} instruction.  This is a work in progress, but overall
> it is working pretty well.  This requires changes to the gcc port, as
> you have to add the new fake registers in gcc/config/riscv/riscv.h.
> This isn't something you can do with macros and extended asms.

But this also doesn't work on GIMPLE.  On GIMPLE riscv_vlen would
be a barrier for code motion if you make it __attribute__((returns_twice))
since then abnormal edges distort the CFG in a way preventing such motion.
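
A minimal sketch of what that declaration looks like, assuming the
vector length is set through an ordinary function rather than a
builtin.  set_vlen and current_vlen are hypothetical stand-ins written
for this example, and the sketch only shows the syntax; whether the
attribute is enough on a particular port is not something it can
demonstrate.

```c
/* Hypothetical stand-in for the state the real instruction would set. */
static int current_vlen;

/* returns_twice tells GCC the call may return more than once (like
   setjmp), so the compiler has to be conservative about moving code
   across it.  noinline keeps the call visible as a call. */
__attribute__((returns_twice, noinline))
void set_vlen(int vlen)
{
    current_vlen = vlen;
}

int scaled_sum(int a, int b, int vlen)
{
    int add = a + b;
    int mul = a * b;

    set_vlen(vlen);   /* intended to stay between the arithmetic and the use */

    return (add + mul) * current_vlen;
}
```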

> See for instance
> 
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ
>
> Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-29 Thread Jim Wilson
On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc wrote:
> I tried to set the "vlen" after the add & multi, as shown in the following 
> code:

> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __builtin_riscv_vlen(vlen);  //<
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }

Not clear what __builtin_riscv_vlen is doing, or what exactly your
target is, but the gcc port I did for the RISC-V draft V extension
creates new fake vector type and vector length registers, like the
existing fake fp and arg pointer registers, and the vsetvl{i}
instruction sets the fake vector type and vector length registers, and
all vector instructions read the fake vector type and vector length
registers.  That creates the dependence between the instructions that
prevents reordering.  It is a little more complicated than that, as
you can have more than one vsetvl{i} instruction setting different
vector type and/or vector length values, so we have to match on the
expected values to make sure that vector instructions are tied to the
right vsetvl{i} instruction.  This is a work in progress, but overall
it is working pretty well.  This requires changes to the gcc port, as
you have to add the new fake registers in gcc/config/riscv/riscv.h.
This isn't something you can do with macros and extended asms.

See for instance

https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ
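
The effect of the fake registers can be illustrated with a toy
dependence check in plain C.  This is a sketch of the general idea
only, not code from the port: struct insn, may_reorder and the REG_*
bits are invented for the example, and the real scheduler does far
more than compare read/write sets.

```c
#include <stdbool.h>

/* One bit per register in this toy model.  REG_VL plays the role of
   the fake vector length register. */
enum {
    REG_V0 = 1u << 0,
    REG_V1 = 1u << 1,
    REG_V2 = 1u << 2,
    REG_A2 = 1u << 3,
    REG_VL = 1u << 4
};

/* The registers an instruction reads and writes. */
struct insn {
    unsigned reads;
    unsigned writes;
};

/* Two adjacent instructions may be swapped only if there is no
   read-after-write, write-after-read, or write-after-write conflict. */
bool may_reorder(struct insn a, struct insn b)
{
    if (a.writes & b.reads)  return false;   /* b reads what a writes */
    if (a.reads & b.writes)  return false;   /* b overwrites a's input */
    if (a.writes & b.writes) return false;   /* both write one register */
    return true;
}
```

If the vector add is described as reading only v0 and v1, and the
length-setting instruction as reading only a2, nothing ties them
together and the check allows a swap.  Once the add also reads REG_VL
and the length-setting instruction writes REG_VL, the swap is refused.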

Jim


Re: Is there a way to tell GCC not to reorder a specific instruction?

2020-09-29 Thread Richard Biener via Gcc
On Tue, Sep 29, 2020 at 12:55 PM 夏 晋 via Gcc wrote:
>
> Hi everyone,
> I tried to set the "vlen" after the add & multi, as shown in the following 
> code:
> ➜
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __builtin_riscv_vlen(vlen);  //<
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }
> but after compilation, the "vlen" is reordered:
> ➜
> foo1:
>     lui     a5,%hi(.LANCHOR0)
>     addi    a5,a5,%lo(.LANCHOR0)
>     addi    a4,a5,64
>     vfld    v0,a5
>     vfld    v1,a4
>     csrw    vlen,a2  //<
>     vfadd   v2,v0,v1
>     addi    a5,a1,8
>     vfmul   v0,v0,v1
>     vfst    v2,a1
>     vfst    v0,a5
>     ret
> And I've tried to add some barrier code shown as the following:
> ➜
> #define barrier() __asm__ __volatile__("": : :"memory")
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   barrier();
>   __builtin_riscv_vlen(vlen);
>   barrier();
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }
> ➜
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __asm__ __volatile__ ("csrw\tvlen,%0" : : "rJ"(vlen) : "memory");
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }
> Both methods produced the same incorrect assembly.
> ===
> But if I tried the code like: (add & multi are using different operands)
> ➜
> vf32 x1,x2;
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x1 * x2;
>   __builtin_riscv_vlen(vlen);
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }
> the assembly will be right:
> ➜
> foo1:
>     lui     a5,%hi(.LANCHOR0)
>     addi    a5,a5,%lo(.LANCHOR0)
>     addi    a0,a5,64
>     addi    a3,a5,128
>     addi    a4,a5,192
>     vfld    v1,a5
>     vfld    v3,a0
>     vfld    v0,a3
>     vfld    v2,a4
>     vfadd   v1,v1,v3
>     vfmul   v0,v0,v2
>     csrw    vlen,a2  //<
>     addi    a5,a1,8
>     vfst    v1,a1
>     vfst    v0,a5
>     ret
>
> Is there another way to write the code, or a GCC option, to deal
> with this issue?
> Any suggestion would be appreciated. Thank you very much!

You need to present GCC with a data dependence that prevents the
re-ordering, for example by adding inputs/outputs for add/mul like

asm volatile ("csrw\tvlen, %2" : "=r" (add), "=r" (mul) : "0" (add),
"1" (mul), "rJ" (vlen));

Richard.
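
The same idea can be tried on any target with scalar values.  This is a
minimal sketch rather than the vector case: add_mul_with_pin is a
made-up name, the asm template is left empty as a stand-in for the real
instruction, and "r" is used only because the values here are plain
ints.  A vector value would need whatever constraint the port provides
for vector registers.

```c
/* Tie two values to an asm statement so the statement cannot be
   placed before they are computed, and their later uses cannot be
   placed before the statement. */
int add_mul_with_pin(int a, int b, int vlen)
{
    int add = a + b;
    int mul = a * b;

    /* "+r" makes add and mul both inputs and outputs of the asm.
       As inputs, they must be computed before it.  As outputs, every
       later use must read the values "produced" by it.  The template
       is empty, so the values pass through unchanged. */
    __asm__ volatile ("" : "+r" (add), "+r" (mul) : "r" (vlen));

    return add + mul;
}
```

Compared with a "memory" clobber, this names the exact values that must
not move, which is why it works for values held in registers.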

> Best,
> Jin