Re: Is there a way to tell GCC not to reorder a specific instruction?
On Wed, Sep 30, 2020 at 11:35 PM Richard Biener wrote:
> On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson wrote:
> > We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> > we are using for testing the vector support.
>
> That doesn't seem to exist (but maybe it's just not on trunk yet).

The vector extension is still in draft form, and they are still making major compatibility breaks. There was yet another one about 3-4 weeks ago. I don't want to upstream anything until we have an officially accepted V extension, at which point they will stop allowing compatibility breaks. If we upstream now, we would need some protocol for how to handle unsupported experimental patches in mainline, and I don't think that we have one.

So for now, the vector support is on a branch in the RISC-V International github repo.
https://github.com/riscv/riscv-gnu-toolchain/tree/rvv-intrinsic

The gcc testcases specifically are here
https://github.com/riscv/riscv-gcc/tree/riscv-gcc-10.1-rvv-dev/gcc/testsuite/gcc.target/riscv/rvv

A lot of the testcases use macros so we can test every variation of an instruction, and there is a large number of variations for most instructions, so most of these testcases aren't very readable. They are just to verify that we can generate the instructions we expect. Only the algorithm ones are readable, like saxpy, memcpy, strcpy.

Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson wrote:
>
> On Tue, Sep 29, 2020 at 11:40 PM Richard Biener wrote:
> > But this also doesn't work on GIMPLE. On GIMPLE riscv_vlen would
> > be a barrier for code motion if you make it __attribute__((returns_twice))
> > since then abnormal edges distort the CFG in a way preventing such motion.
>
> At the gimple level, all vector operations have an implicit vsetvl, so
> it doesn't matter much how they are sorted. As long as they don't get
> sorted across an explicit vsetvl that they depend on. But the normal
> way to use explicit vsetvl is to control a loop, and you can't move
> dependent operations out of the loop, so it tends to work. Setting
> vsetvl in the middle of a basic block is less useful and less common,
> and very unlikely to work unless you really know what you are doing.
> Basically, RISC-V wasn't designed to work this way, and so you
> probably shouldn't be writing your code this way. There might be edge
> cases where we aren't handling this right, as we aren't writing code
> this way, and hence we aren't testing this support. This is still a
> work in progress.
>
> Good RVV code should look more like this:
>
> #include <riscv_vector.h>
> #include <stddef.h>
>
> void saxpy(size_t n, const float a, const float *x, float *y) {
>   size_t l;
>
>   vfloat32m8_t vx, vy;
>
>   for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
>     vx = vle32_v_f32m8(x);
>     x += l;
>     vy = vle32_v_f32m8(y);
>     // vfmacc
>     vy = a * vx + vy;
>     vse32_v_f32m8(y, vy);
>     y += l;
>   }
> }

Ah, ok - that makes sense.

> We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> we are using for testing the vector support.

That doesn't seem to exist (but maybe it's just not on trunk yet).

Richard.

> Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 11:40 PM Richard Biener wrote:
> But this also doesn't work on GIMPLE. On GIMPLE riscv_vlen would
> be a barrier for code motion if you make it __attribute__((returns_twice))
> since then abnormal edges distort the CFG in a way preventing such motion.

At the gimple level, all vector operations have an implicit vsetvl, so it doesn't matter much how they are sorted. As long as they don't get sorted across an explicit vsetvl that they depend on. But the normal way to use explicit vsetvl is to control a loop, and you can't move dependent operations out of the loop, so it tends to work. Setting vsetvl in the middle of a basic block is less useful and less common, and very unlikely to work unless you really know what you are doing. Basically, RISC-V wasn't designed to work this way, and so you probably shouldn't be writing your code this way. There might be edge cases where we aren't handling this right, as we aren't writing code this way, and hence we aren't testing this support. This is still a work in progress.

Good RVV code should look more like this:

#include <riscv_vector.h>
#include <stddef.h>

void saxpy(size_t n, const float a, const float *x, float *y) {
  size_t l;

  vfloat32m8_t vx, vy;

  for (; (l = vsetvl_e32m8(n)) > 0; n -= l) {
    vx = vle32_v_f32m8(x);
    x += l;
    vy = vle32_v_f32m8(y);
    // vfmacc
    vy = a * vx + vy;
    vse32_v_f32m8(y, vy);
    y += l;
  }
}

We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that we are using for testing the vector support.

Jim
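For readers without an RVV toolchain, the loop structure in the saxpy example can be modelled in plain C. This is only a sketch: model_vsetvl and MODEL_VLMAX are hypothetical names, and the model assumes the simplest policy, in which each call grants min(remaining, VLMAX) elements. The inner scalar loop stands in for the vector load, multiply-add, and store.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed hardware limit on elements per iteration (hypothetical value). */
#define MODEL_VLMAX 8

/* Hypothetical stand-in for vsetvl_e32m8: returns how many elements the
   next iteration may process, never more than MODEL_VLMAX. */
size_t model_vsetvl(size_t remaining) {
    return remaining < MODEL_VLMAX ? remaining : MODEL_VLMAX;
}

/* Scalar model of the strip-mined loop: y[i] = a * x[i] + y[i]. */
void saxpy_model(size_t n, float a, const float *x, float *y) {
    size_t l;

    assert(n == 0 || (x != NULL && y != NULL));

    /* The vsetvl result controls the loop, so everything in the body
       depends on it and cannot be hoisted out. */
    for (; (l = model_vsetvl(n)) > 0; n -= l) {
        /* One "vector" iteration: l elements are handled together. */
        for (size_t i = 0; i < l; i++)
            y[i] = a * x[i] + y[i];
        x += l;
        y += l;
    }
}
```

With n = 19 and MODEL_VLMAX = 8 the loop runs three times, handling 8, 8, and 3 elements, and needs no separate scalar tail loop.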
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 7:22 PM 夏 晋 wrote:
> vint16m1_t foo3(vint16m1_t a, vint16m1_t b){
>   vint16m1_t add = a+b;
>   vint16m1_t mul = a*b;
>   vsetvl_e8m1(32);
>   return add + mul;
> }

Taking another look at your example, you have type confusion. Using vsetvl to specify an element width of 8 does not magically convert types into 8-bit vector types. They are still 16-bit vector types and will still result in 16-bit vector operations. So your explicit vsetvl_e8m1 is completely useless.

In the RISC-V V scheme, every vector operation emits an implicit vsetvl instruction, and then we optimize away the redundant ones. So the add and mul at the start are emitting two vsetvl instructions. Then you have an explicit vsetvl. Then another add, which will emit another implicit vsetvl. The compiler reordered the arithmetic in such a way that two of the implicit vsetvl instructions can be optimized away. That probably happened by accident. But we don't have support for optimizing away the useless explicit vsetvl, so it remains.

Jim
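The counting argument above can be checked with a toy model. This is not how the GCC port implements the optimization. It is a sketch that assumes a vsetvl request is redundant exactly when it asks for the configuration already in effect, and it ignores the vector length operand. struct vconfig and count_needed_vsetvl are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical vector configuration: element width in bits and LMUL. */
struct vconfig {
    int sew;
    int lmul;
};

/* Count the vsetvl instructions that survive when every operation
   requests its configuration and a request equal to the configuration
   already in effect is dropped. */
size_t count_needed_vsetvl(const struct vconfig *req, size_t n) {
    size_t emitted = 0;
    struct vconfig cur = {0, 0}; /* configuration unknown on entry */

    for (size_t i = 0; i < n; i++) {
        if (req[i].sew != cur.sew || req[i].lmul != cur.lmul) {
            cur = req[i];
            emitted++;
        }
    }
    return emitted;
}
```

In source order foo3 requests e16m1, e16m1, e8m1, e16m1, and the model keeps three of the four. If the explicit e8m1 is scheduled ahead of all three arithmetic operations (one ordering consistent with the description, assumed here), only two remain: the useless explicit one and a single implicit e16m1.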
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 9:46 PM Jim Wilson wrote:
>
> On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc wrote:
> > I tried to set the "vlen" after the add & multi, as shown in the following
> > code:
> >
> > vf32 x3,x4;
> > void foo1(float16_t* input, float16_t* output, int vlen){
> >   vf32 add = x3 + x4;
> >   vf32 mul = x3 * x4;
> >   __builtin_riscv_vlen(vlen); //<
> >   storevf([0], add);
> >   storevf([4], mul);
> > }
>
> Not clear what __builtin_riscv_vlen is doing, or what exactly your
> target is, but the gcc port I did for the RISC-V draft V extension
> creates new fake vector type and vector length registers, like the
> existing fake fp and arg pointer registers, and the vsetvl{i}
> instruction sets the fake vector type and vector length registers, and
> all vector instructions read the fake vector type and vector length
> registers. That creates the dependence between the instructions that
> prevents reordering. It is a little more complicated than that, as
> you can have more than one vsetvl{i} instruction setting different
> vector type and/or vector length values, so we have to match on the
> expected values to make sure that vector instructions are tied to the
> right vsetvl{i} instruction. This is a work in progress, but overall
> it is working pretty well. This requires changes to the gcc port, as
> you have to add the new fake registers in gcc/config/riscv/riscv.h.
> This isn't something you can do with macros and extended asms.

But this also doesn't work on GIMPLE. On GIMPLE riscv_vlen would be a barrier for code motion if you make it __attribute__((returns_twice)) since then abnormal edges distort the CFG in a way preventing such motion.

> See for instance
>
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ
>
> Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc wrote:
> I tried to set the "vlen" after the add & multi, as shown in the following
> code:
>
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __builtin_riscv_vlen(vlen); //<
>   storevf([0], add);
>   storevf([4], mul);
> }

Not clear what __builtin_riscv_vlen is doing, or what exactly your target is, but the gcc port I did for the RISC-V draft V extension creates new fake vector type and vector length registers, like the existing fake fp and arg pointer registers, and the vsetvl{i} instruction sets the fake vector type and vector length registers, and all vector instructions read the fake vector type and vector length registers. That creates the dependence between the instructions that prevents reordering.

It is a little more complicated than that, as you can have more than one vsetvl{i} instruction setting different vector type and/or vector length values, so we have to match on the expected values to make sure that vector instructions are tied to the right vsetvl{i} instruction. This is a work in progress, but overall it is working pretty well. This requires changes to the gcc port, as you have to add the new fake registers in gcc/config/riscv/riscv.h. This isn't something you can do with macros and extended asms.

See for instance
https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ

Jim
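Why the fake registers stop the reordering can be illustrated with a toy dependence check. This is a sketch under simplifying assumptions, not GCC's scheduler: instructions are reduced to bit masks of the registers they read and write, and every name below (REG_VL, REG_VTYPE, struct insn, can_reorder) is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical register ids as bit flags. REG_VL and REG_VTYPE play the
   role of the fake vector length and vector type registers. */
enum {
    REG_V0    = 1 << 0,
    REG_V1    = 1 << 1,
    REG_V2    = 1 << 2,
    REG_VL    = 1 << 3,
    REG_VTYPE = 1 << 4
};

/* An instruction reduced to the sets of registers it reads and writes. */
struct insn {
    unsigned reads;
    unsigned writes;
};

/* Two instructions may be swapped only if there is no read-after-write,
   write-after-read, or write-after-write overlap between them. */
bool can_reorder(struct insn a, struct insn b) {
    return !(a.writes & b.reads)
        && !(a.reads & b.writes)
        && !(a.writes & b.writes);
}
```

A vsetvl modelled as writing REG_VL | REG_VTYPE cannot be swapped with a vector add that reads those two registers, while an add described without them looks independent and is free to move, which matches the behaviour reported in the original question.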
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 12:55 PM 夏 晋 via Gcc wrote:
>
> Hi everyone,
> I tried to set the "vlen" after the add & multi, as shown in the following
> code:
>
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __builtin_riscv_vlen(vlen); //<
>   storevf([0], add);
>   storevf([4], mul);
> }
>
> but after compilation, the "vlen" is reordered:
>
> foo1:
>   lui a5,%hi(.LANCHOR0)
>   addi a5,a5,%lo(.LANCHOR0)
>   addi a4,a5,64
>   vfld v0,a5
>   vfld v1,a4
>   csrw vlen,a2 //<
>   vfadd v2,v0,v1
>   addi a5,a1,8
>   vfmul v0,v0,v1
>   vfst v2,a1
>   vfst v0,a5
>   ret
>
> And I've tried to add some barrier code shown as the following:
>
> #define barrier() __asm__ __volatile__("": : :"memory")
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   barrier();
>   __builtin_riscv_vlen(vlen);
>   barrier();
>   storevf([0], add);
>   storevf([4], mul);
> }
>
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __asm__ __volatile__ ("csrw\tvlen,%0" : : "rJ"(vlen) : "memory");
>   storevf([0], add);
>   storevf([4], mul);
> }
>
> Both methods compiled out the same false assembly.
> ===
> But if I tried the code like: (add & multi are using different operands)
>
> vf32 x1,x2;
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x1 * x2;
>   __builtin_riscv_vlen(vlen);
>   storevf([0], add);
>   storevf([4], mul);
> }
>
> the assembly will be right:
>
> foo1:
>   lui a5,%hi(.LANCHOR0)
>   addi a5,a5,%lo(.LANCHOR0)
>   addi a0,a5,64
>   addi a3,a5,128
>   addi a4,a5,192
>   vfld v1,a5
>   vfld v3,a0
>   vfld v0,a3
>   vfld v2,a4
>   vfadd v1,v1,v3
>   vfmul v0,v0,v2
>   csrw vlen,a2 <
>   addi a5,a1,8
>   vfst v1,a1
>   vfst v0,a5
>   ret
>
> Is there any other way for coding or other option for gcc compilation to deal
> with this issue.
> Any suggestion would be appreciated. Thank you very much!

You need to present GCC with a data dependence that prevents the re-ordering, for example by adding inputs/outputs for add/mul like

  asm volatile ("csrw\tvlen, %4" : "=r" (add), "=r" (mul) : "0" (add), "1" (mul), "rJ" (vlen));

Richard.

> Best,
> Jin
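The data-dependence idea can be tried on any target with a reduced version. This is a sketch that assumes GCC or Clang extended asm and uses plain int values. The asm template is left empty so that it builds everywhere, where a real port would place the csrw instruction, and sum_after_marker is a hypothetical name. The "r" constraint only covers general registers, so vector operands would need whatever register constraint the port defines.

```c
/* Assumes GCC or Clang extended asm syntax. */
int sum_after_marker(int a, int b, int vlen) {
    int add = a + b;
    int mul = a * b;

    /* "+r" marks add and mul as both read and written by the asm, so the
       compiler must compute them before this point and must take every
       later use from after it. The template is deliberately empty here. */
    __asm__ __volatile__("" : "+r"(add), "+r"(mul) : "r"(vlen));

    return add + mul;
}
```

Because the asm claims to modify add and mul, the compiler can neither sink the arithmetic below it nor hoist the later uses above it, which a "memory" clobber alone does not guarantee for values held in registers.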