[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #9 from Robin Dapp  ---
Argh, I actually just ran gcc -O3 -march=native pr114027.c
-fno-vect-cost-model on cfarm188 with a recent-ish GCC, but then realized that
I had used my slightly modified version and not the original test case.

long a;
int b[10][8] = {{},
                {},
                {},
                {},
                {},
                {},
                {0, 0, 0, 0, 0, 1, 1},
                {1, 1, 1, 1, 1, 1, 1},
                {1, 1, 1, 1, 1, 1, 1}};
int c;
int main() {
  int d;
  c = 0x;
  for (; a < 6; a++) {
    d = 0;
    for (; d < 6; d++) {
      c ^= -3L;
      if (b[a + 3][d])
        continue;
      c = 0;
    }
  }

  if (c == -3) {
    return 0;
  } else {
    return 1;
  }
}

This came from an initial attempt to minimize it further, but I didn't really
verify whether that breaks the test case (or introduces undefined behavior).

With that I get a "1" with default options and a "0" with -fno-tree-vectorize.
Maybe my snippet is broken then?

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #8 from Andrew Pinski  ---
(In reply to Robin Dapp from comment #6)
> Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model.  So it
> definitely looks generic.

I still can't reproduce it on x86 with `-O3 -fno-vect-cost-model` nor with
`-O3 -fno-vect-cost-model -march=skylake-avx512`. I tried the testcases in
comment #0 and comment #3. Did I miss something?

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #7 from Sam James  ---
(In reply to Robin Dapp from comment #6)
> Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model.  So it
> definitely looks generic.

What args did you use? I can't get it to fail.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

Robin Dapp  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-2-22
             Target|riscv                       |x86_64-*-* riscv*-*-*
                   |                            |aarch64-*-*

--- Comment #5 from Robin Dapp  ---
To me it looks like we treat e.g. c_53 = _43 ? prephitmp_13 : 0 as the only
reduction statement in the chain and simplify it to MAX, when we actually have
several reduction statements.
(See "condition expression based on compile time constant").

--- Comment #6 from Robin Dapp  ---
Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model.  So it
definitely looks generic.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #4 from Li Pan  ---
Just did a hack in the riscv backend: replace the expansion code of
reduc_smax_scal_ with that of reduc_xor_scal_.

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..58424baabd7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2107,10 +2107,8 @@ (define_expand "reduc_smax_scal_"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  int prec = GET_MODE_PRECISION (mode);
-  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, riscv_vector::REDUCE_OP,
-  operands, min);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_XOR, riscv_vector::REDUCE_OP,
+  operands, CONST0_RTX (mode));
   DONE;
 })

My idea is to show that the final standard name should be .REDUC_XOR.

With this change the tests (both the narrowed and the original one) pass. That
may indicate that we pick .REDUC_MAX by mistake somewhere. Let me try to figure
it out.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-21 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

Li Pan  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #3 from Li Pan  ---
Narrowed it down a little compared to the original test case.

---
int b[10][7] = {{}, // 0
                {}, // 1
                {}, // 2
                {}, // 3
                {}, // 4
                {}, // 5
                {0, 0, 0, 0, 0, 1}, // 6
                {2, 3, 4, 5, 6, 7}, // 7
                {8, 8, 8, 8, 8, 8}};// 8
               //0  1  2  3  4  5
int c;

int main() {
  int d = 0, a = 0;
  c = 0x;

  for (a = 0; a < 5; a++) {
    for (d = 0; d < 6; d++) {
      c ^= -3L;

      if (b[a + 3][d])
        continue;

      c = 0;
    }
  }

  if (c == -3) {
    return 0;
  } else {
    return 1;
  }
}
---

The semantics of the loop act on a 5 * 6 matrix. Upstream currently vectorizes
the first 4 * 6 part and then goes scalar for the last 6 elements. The
vectorized part may look like below.

  vect_array.16 = .MASK_LEN_LOAD_LANES (  [(void *) + 84B], 32B, { -1, ... }, POLY_INT_CST [4, 4], 0);
  vect__28.17_94 = vect_array.16[0];
  vect__28.18_95 = vect_array.16[1];
  vect__28.19_96 = vect_array.16[2];
  vect__28.20_97 = vect_array.16[3];
  vect__28.21_98 = vect_array.16[4];
  vect__28.22_99 = vect_array.16[5];
  vect_array.16 ={v} {CLOBBER};
  mask__70.24_102 = vect__28.17_94 != { 0, ... };
  vect_prephitmp_76.25_104 = .VCOND_MASK (mask__70.24_102, { -1, ... }, { -3, ... });
  mask__80.26_106 = vect__28.18_95 != { 0, ... };
  vect_c_lsm.27_108 = .VCOND_MASK (mask__80.26_106, vect_prephitmp_76.25_104, { 0, ... });
  mask__51.28_110 = vect__28.19_96 != { 0, ... };
  vect_prephitmp_66.29_112 = .VCOND_MASK (mask__51.28_110, vect_c_lsm.27_108, { -3, ... });
  mask__16.30_114 = vect__28.20_97 != { 0, ... };
  vect_c_lsm.31_116 = .VCOND_MASK (mask__16.30_114, vect_prephitmp_66.29_112, { 0, ... });
  mask__79.32_118 = vect__28.21_98 != { 0, ... };
  vect_prephitmp_56.33_120 = .VCOND_MASK (mask__79.32_118, vect_c_lsm.31_116, { -3, ... });
  mask__25.34_122 = vect__28.22_99 != { 0, ... };
  vect_c_lsm.35_124 = .VCOND_MASK (mask__25.34_122, vect_prephitmp_56.33_120, { 0, ... });
  _126 = .REDUC_MAX (vect_c_lsm.35_124);

The final .REDUC_MAX looks like a surprise here. It is hard to derive REDUC_MAX
semantics from the source code, because c actually depends on the previous
iteration.

For example, if the b condition is always 0, c stays 0. If the b condition is
always nonzero, c follows a sequence similar to [-3, 0, -3, 0...].

Not sure if my understanding is correct, will take a look into tree-vect.

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #2 from Andrew Pinski  ---
Works fine on aarch64 with SVE:
```
[apinski@xeond2 upstream-full-cross]$ ./install/bin/aarch64-linux-gnu-gcc -O3 t6.c -static -march=armv9-a+sve2
[apinski@xeond2 upstream-full-cross]$ ./install-qemu/bin/qemu-aarch64 a.out ;echo $?
0
[apinski@xeond2 upstream-full-cross]$ ./install/bin/aarch64-linux-gnu-gcc -O3 t6.c -static -march=armv9-a+sve2 -fno-vect-cost-model
[apinski@xeond2 upstream-full-cross]$ ./install-qemu/bin/qemu-aarch64 a.out ;echo $?
0
```

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #1 from Sam James  ---
When this is fixed, this is probably worth putting in the general torture test
suite, not just for riscv.