[Bug target/114027] [14] RISC-V vector: miscompile at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #9 from Robin Dapp ---
Argh, I actually just ran gcc -O3 -march=native pr114027.c -fno-vect-cost-model on cfarm188 with a recent-ish GCC, but realized that I had used my slightly modified version and not the original test case:

long a;
int b[10][8] = {{}, {}, {}, {}, {}, {}, {0, 0, 0, 0, 0, 1, 1},
                {1, 1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1, 1}};
int c;
int main() {
  int d;
  c = 0x;
  for (; a < 6; a++) {
    d = 0;
    for (; d < 6; d++) {
      c ^= -3L;
      if (b[a + 3][d])
        continue;
      c = 0;
    }
  }
  if (c == -3) {
    return 0;
  } else {
    return 1;
  }
}

This came from an initial attempt to minimize it further, but I didn't really verify whether I broke the test case that way (or introduced undefined behavior). With it I get "1" with default options and "0" with -fno-tree-vectorize. Maybe my snippet is broken then?
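Because the initializer of c is elided in the snippet (`c = 0x;`), here is a minimal scalar sketch, assuming plain C semantics, that replays the same loop nest with the initial value passed in as a parameter; `run_comment9_loop` and `b9` are hypothetical names. Since b[3][0] is 0, c is reset before the initial value can matter, and the loop should end with c == -3, i.e. main should return 0, which matches the -fno-tree-vectorize result. The bounds assert only rules out out-of-range accesses; it is not a complete check for undefined behavior.

```c
#include <assert.h>

/* Same table as the snippet above; rows 0-5 and missing columns are 0. */
static const int b9[10][8] = {{0}, {0}, {0}, {0}, {0}, {0},
                              {0, 0, 0, 0, 0, 1, 1},
                              {1, 1, 1, 1, 1, 1, 1},
                              {1, 1, 1, 1, 1, 1, 1}};

/* Hypothetical helper: replays the loop nest with plain scalar semantics.
   `init` stands in for the elided initializer of c. */
int run_comment9_loop(int init) {
  int c = init;
  for (long a = 0; a < 6; a++) {
    for (int d = 0; d < 6; d++) {
      assert(a + 3 < 10 && d < 8); /* every access stays inside b9 */
      c ^= -3;                     /* x becomes x ^ -3 */
      if (b9[a + 3][d])
        continue;                  /* non-zero element: keep the value */
      c = 0;                       /* zero element: reset c */
    }
  }
  return c;
}
```

The snippet's main returns 0 exactly when this value is -3.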
--- Comment #8 from Andrew Pinski ---
(In reply to Robin Dapp from comment #6)
> Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model. So it
> definitely looks generic.

I still can't reproduce it on x86 with `-O3 -fno-vect-cost-model` nor with `-O3 -fno-vect-cost-model -march=skylake-avx512`. I tried the testcases in comment #0 and comment #3. Did I miss something?
--- Comment #7 from Sam James ---
(In reply to Robin Dapp from comment #6)
> Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model. So it
> definitely looks generic.

What args did you use? I can't get it to fail.
Robin Dapp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-2-22
             Target|riscv                       |x86_64-*-* riscv*-*-*
                   |                            |aarch64-*-*

--- Comment #5 from Robin Dapp ---
To me it looks like we interpret e.g. c_53 = _43 ? prephitmp_13 : 0 as the only reduction statement and simplify the reduction to MAX, based on the wrong assumption that this is the only reduction statement in the chain, when we actually have several. (See "condition expression based on compile time constant".)

--- Comment #6 from Robin Dapp ---
Btw this fails on x86 and aarch64 for me with -fno-vect-cost-model. So it definitely looks generic.
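Regarding comment #5, a minimal plain-C model of why a MAX simplification needs a single-statement chain. It assumes the usual description of the constant-condition case: a reduction `c = cond ? c : K` with constant K only ever leaves c equal to its initial value or to K, so per-lane partial results can be merged with MAX (when K > init) or MIN. This is not GCC's implementation; `LANES`, `K` and all function names are hypothetical. Once `c ^= -3` is also in the chain, c can hold a third value such as init ^ -3, so the two-value premise behind MAX no longer holds.

```c
#include <assert.h>

#define K 0     /* the constant in "c = cond ? c : K" */
#define LANES 4 /* hypothetical vector length */

/* Scalar reference for the single-statement chain c = cond[i] ? c : K. */
int cond_reduc_scalar(int init, const int *cond, int n) {
  int c = init;
  for (int i = 0; i < n; i++)
    c = cond[i] ? c : K;
  return c;
}

/* Lane-wise model: lane l handles i = l, l + LANES, ...  Every lane starts
   at init and can only ever hold init or K, so MAX (K > init) or MIN
   (otherwise) over the lanes reproduces the scalar result. */
int cond_reduc_lanes_minmax(int init, const int *cond, int n) {
  int lane[LANES];
  for (int l = 0; l < LANES; l++)
    lane[l] = init;
  for (int i = 0; i < n; i++)
    lane[i % LANES] = cond[i] ? lane[i % LANES] : K;

  int r = lane[0];
  for (int l = 0; l < LANES; l++) {
    assert(lane[l] == init || lane[l] == K); /* the two-value premise */
    if (K > init)
      r = lane[l] > r ? lane[l] : r;
    else
      r = lane[l] < r ? lane[l] : r;
  }
  return r;
}

/* The chain in this PR has a second statement before the select:
     c ^= -3;  c = cond ? c : 0;
   so c can end up as init ^ -3, which is neither init nor K. */
int xor_chain_scalar(int init, const int *cond, int n) {
  int c = init;
  for (int i = 0; i < n; i++) {
    c ^= -3;
    c = cond[i] ? c : K;
  }
  return c;
}
```

With a single select the MAX/MIN merge agrees with the scalar loop; with the XOR in the chain, one non-zero condition already produces 5 ^ -3 == -8 from init 5.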
--- Comment #4 from Li Pan ---
I just did a quick hack in the riscv backend that replaces the expansion code of reduc_smax_scal_ with that of reduc_xor_scal_:

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..58424baabd7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2107,10 +2107,8 @@ (define_expand "reduc_smax_scal_"
    (match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  int prec = GET_MODE_PRECISION (mode);
-  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, riscv_vector::REDUCE_OP,
-                                  operands, min);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_XOR, riscv_vector::REDUCE_OP,
+                                  operands, CONST0_RTX (mode));
   DONE;
 })

The idea is to check whether the last standard name should be .REDUC_XOR. With this hack both tests (the narrowed one and the original one) pass, which may indicate that we pick .REDUC_MAX by mistake somewhere. Let me try to figure it out.
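The hack swaps one horizontal reduction for another, each seeded with its neutral element. A plain-C model of the two follows: scalar results only, not the RVV code the expander emits. The names `model_reduc_smax` and `model_reduc_xor` are hypothetical, and 32-bit int lanes are assumed. Signed max starts from the smallest representable value, which is what `wi::min_value (prec, SIGNED)` provides, and XOR starts from 0, as with `CONST0_RTX`. The two already disagree on small inputs such as {-3, -3}, where MAX gives -3 and XOR gives 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical scalar model of .REDUC_MAX on 32-bit signed lanes.
   INT_MIN is the neutral element of signed max. */
int model_reduc_smax(const int *v, size_t n) {
  assert(v != NULL || n == 0);
  int acc = INT_MIN;
  for (size_t i = 0; i < n; i++)
    acc = v[i] > acc ? v[i] : acc;
  return acc;
}

/* Hypothetical scalar model of .REDUC_XOR; 0 is the neutral element
   of XOR. */
int model_reduc_xor(const int *v, size_t n) {
  assert(v != NULL || n == 0);
  int acc = 0;
  for (size_t i = 0; i < n; i++)
    acc ^= v[i];
  return acc;
}
```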
Li Pan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pan2.li at intel dot com

--- Comment #3 from Li Pan ---
Narrowed down a little compared to the original test case.

---
int b[10][7] = {{},                  // 0
                {},                  // 1
                {},                  // 2
                {},                  // 3
                {},                  // 4
                {},                  // 5
                {0, 0, 0, 0, 0, 1},  // 6
                {2, 3, 4, 5, 6, 7},  // 7
                {8, 8, 8, 8, 8, 8}}; // 8
                //0 1 2 3 4 5
int c;
int main() {
  int d = 0, a = 0;
  c = 0x;
  for (a = 0; a < 5; a++) {
    for (d = 0; d < 6; d++) {
      c ^= -3L;
      if (b[a + 3][d])
        continue;
      c = 0;
    }
  }
  if (c == -3) {
    return 0;
  } else {
    return 1;
  }
}
---

The loop operates on a 5 * 6 matrix. Upstream currently vectorizes the first 4 * 6 elements and then falls back to scalar code for the last 6 elements. The vectorized part looks roughly like this:

vect_array.16 = .MASK_LEN_LOAD_LANES ( [(void *) + 84B], 32B, { -1, ... }, POLY_INT_CST [4, 4], 0);
vect__28.17_94 = vect_array.16[0];
vect__28.18_95 = vect_array.16[1];
vect__28.19_96 = vect_array.16[2];
vect__28.20_97 = vect_array.16[3];
vect__28.21_98 = vect_array.16[4];
vect__28.22_99 = vect_array.16[5];
vect_array.16 ={v} {CLOBBER};
mask__70.24_102 = vect__28.17_94 != { 0, ... };
vect_prephitmp_76.25_104 = .VCOND_MASK (mask__70.24_102, { -1, ... }, { -3, ... });
mask__80.26_106 = vect__28.18_95 != { 0, ... };
vect_c_lsm.27_108 = .VCOND_MASK (mask__80.26_106, vect_prephitmp_76.25_104, { 0, ... });
mask__51.28_110 = vect__28.19_96 != { 0, ... };
vect_prephitmp_66.29_112 = .VCOND_MASK (mask__51.28_110, vect_c_lsm.27_108, { -3, ... });
mask__16.30_114 = vect__28.20_97 != { 0, ... };
vect_c_lsm.31_116 = .VCOND_MASK (mask__16.30_114, vect_prephitmp_66.29_112, { 0, ... });
mask__79.32_118 = vect__28.21_98 != { 0, ... };
vect_prephitmp_56.33_120 = .VCOND_MASK (mask__79.32_118, vect_c_lsm.31_116, { -3, ... });
mask__25.34_122 = vect__28.22_99 != { 0, ... };
vect_c_lsm.35_124 = .VCOND_MASK (mask__25.34_122, vect_prephitmp_56.33_120, { 0, ... });
_126 = .REDUC_MAX (vect_c_lsm.35_124);

The final .REDUC_MAX looks kind of surprising here; it is hard to map REDUC_MAX back to the semantics of the source code. c actually depends on the previous iteration. For example, whenever the b element is 0, c is reset to 0. While the b element is non-zero, c follows a sequence like [-3, 0, -3, 0, ...]. Not sure if my understanding is correct; I will take a look into tree-vect.
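To make the per-iteration dependence concrete, here is a minimal sketch assuming plain C semantics. The names `toggle_step`, `replay_narrowed` and `b3` are hypothetical, and the initial value is a parameter because the initializer of c is elided above. Each element applies c ^= -3 and then either keeps the result or resets c to 0, so a run of non-zero elements alternates between x ^ -3 and x, while a zero element resets c. Replaying the narrowed test, c is 0 after rows 3, 4 and 5, becomes -3 at the last element of row 6, and the six non-zero elements of row 7 toggle it an even number of times. The scalar result is therefore c == -3 (main returns 0) whatever the initial value.

```c
#include <assert.h>
#include <stddef.h>

/* The narrowed table; only rows 3..7 and columns 0..5 are read. */
static const int b3[10][7] = {{0}, {0}, {0}, {0}, {0}, {0},
                              {0, 0, 0, 0, 0, 1},
                              {2, 3, 4, 5, 6, 7},
                              {8, 8, 8, 8, 8, 8}};

/* Hypothetical helper: one inner-loop step, c ^= -3 followed by either
   keeping the value (non-zero b element) or resetting c to 0. */
int toggle_step(int c, int b_elem) {
  c ^= -3;
  return b_elem ? c : 0;
}

/* Hypothetical helper: replays the loop nest from `init` (standing in for
   the elided initializer) and stores c after row a + 3 in row_end[a]. */
int replay_narrowed(int init, int row_end[5]) {
  assert(row_end != NULL);
  int c = init;
  for (int a = 0; a < 5; a++) {
    for (int d = 0; d < 6; d++)
      c = toggle_step(c, b3[a + 3][d]);
    row_end[a] = c;
  }
  return c;
}
```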
--- Comment #2 from Andrew Pinski ---
Works fine on aarch64 with SVE:
```
[apinski@xeond2 upstream-full-cross]$ ./install/bin/aarch64-linux-gnu-gcc -O3 t6.c -static -march=armv9-a+sve2
[apinski@xeond2 upstream-full-cross]$ ./install-qemu/bin/qemu-aarch64 a.out ;echo $?
0
[apinski@xeond2 upstream-full-cross]$ ./install/bin/aarch64-linux-gnu-gcc -O3 t6.c -static -march=armv9-a+sve2 -fno-vect-cost-model
[apinski@xeond2 upstream-full-cross]$ ./install-qemu/bin/qemu-aarch64 a.out ;echo $?
0
```
--- Comment #1 from Sam James ---
When this is fixed, this is probably worth putting in the general torture test suite, not just for riscv.