[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #9 from Andrew Pinski --- Reduced testcase:
```
#define vect128 __attribute__((vector_size(16)))

[[gnu::noinline]]
vect128 long f(vect128 long a)
{
  return a <= (vect128 long){0, 9223372036854775807};
}

int main()
{
  vect128 long t = (vect128 long){0, 0};
  t = f(t);
  if (!t[1])
    __builtin_abort();
}
```
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #8 from Jakub Jelinek --- Slightly simplified/cleaned up testcase:

  /* { dg-do run } */
  /* { dg-options "-O2 -fno-vect-cost-model" } */
  /* { dg-additional-options "-mavx" { target avx_runtime } } */

  struct S { int a; long b; int c; } s;

  __attribute__((noipa)) struct S *
  foo (void)
  {
    return &s;
  }

  int
  main ()
  {
    struct S r = *foo ();
    long t = 10412095L - r.b;
    struct { long d; int e; } f[4] = { {}, {}, {}, { __LONG_MAX__, 0 } };
    for (unsigned i = 0; i < 4; ++i)
      if (t <= f[i].d)
        return f[i].e;
    __builtin_abort ();
  }
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #7 from Andrew Pinski --- This looks wrong:
```
;; mask_patt_17.15_55 = vect_cst__53 <= { 0, 9223372036854775807 };

(insn 21 20 22 (set (reg:V2DI 111)
        (mem/u/c:V2DI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S16 A128])) -1
     (expr_list:REG_EQUAL (const_vector:V2DI [
                (const_int 1 [0x1])
                (const_int -9223372036854775808 [0x8000])
            ])
        (nil)))
(insn 22 21 23 (set (reg:V2DI 112)
        (gt:V2DI (reg:V2DI 111)
            (reg:V2DI 100 [ vect_cst__53 ]))) -1
     (nil))
(insn 23 22 0 (set (reg:V2DI 102 [ mask_patt_17.15D.3121 ])
        (reg:V2DI 112)) -1
     (nil))
```
We go from `a <= INT_MAX` to `INT_MIN > a`, which is basically going from `true` to `a != INT_MIN`.
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #6 from Tamar Christina --- The vectorizer generates:

  mask_patt_21.19_58 = vect_perm_even_49 >= vect_cst__57;
  mask_patt_21.19_59 = vect_perm_even_55 >= vect_cst__57;
  vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59;
  if (vexit_reduc_63 != { 0, 0 })
    goto ; [20.00%]
  else
    goto ; [80.00%]

This is changed at loopdone into:

  delays[3].nonprimary_delay = 129600;
  vect_cst__57 = {tdiff_6, tdiff_6};
  mask_patt_21.19_58 = vect_cst__57 <= { 0, 0 };
  mask_patt_21.19_59 = vect_cst__57 <= { 0, 0x7FFF };
  vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59;
  if (vexit_reduc_63 != { 0, 0 })
    goto ; [20.00%]
  else
    goto ; [80.00%]

or in other words, if there's any value where the compare succeeds, find it and return. This looks correct to me. It could be that my AVX is rusty, but this generates:

  vmovdqa     0xf9c(%rip),%xmm1      # 0x402010
  mov         $0x1,%eax
  vmovq       %rcx,%xmm3
  vmovdqa     %xmm0,(%rsp)
  vpunpcklqdq %xmm3,%xmm3,%xmm2
  vmovdqa     %xmm0,0x10(%rsp)
  vpcmpgtq    %xmm2,%xmm1,%xmm1      #
  vmovdqa     %xmm0,0x20(%rsp)
  vmovq       %rax,%xmm0
  vpunpcklqdq %xmm0,%xmm0,%xmm0
  movl        $0x1fa40,0x38(%rsp)
  vpcmpgtq    %xmm2,%xmm0,%xmm0
  vpor        %xmm1,%xmm0,%xmm0
  vptest      %xmm0,%xmm0

which looks off; in particular, for the second compare it looks like it doesn't do a load but instead just duplicates the constant 1. gdb seems to confirm this. At the first compare:

  (gdb) p $xmm2.v2_int64
  $4 = {10412095, 10412095}
  (gdb) p $xmm0.v2_int64
  $5 = {0, 0}

which is what's expected, but at the second compare:

  (gdb) p $xmm2.v2_int64
  $7 = {10412095, 10412095}
  (gdb) p $xmm0.v2_int64
  $6 = {1, 1}

At the second it's comparing {1, 1} instead of {0, 0x7FFF}.
On AArch64, where it doesn't fail, the comparison is:

  movi    v29.4s, 0
  add     x1, sp, 16
  ldr     x5, [x0, 8]
  mov     w0, 64064
  movk    w0, 0x1, lsl 16
  add     x3, sp, 48
  str     q29, [sp, 64]
  mov     x2, 57407
  mov     x4, 9223372036854775807
  str     x4, [sp, 64]
  movk    x2, 0x9e, lsl 16
  str     w0, [sp, 72]
  sub     x2, x2, x5
  stp     q29, q29, [x1]
  dup     v27.2d, x2
  ld2     {v30.2d - v31.2d}, [x1]
  str     q29, [sp, 48]
  ld2     {v28.2d - v29.2d}, [x3]
  cmge    v30.2d, v30.2d, v27.2d
  cmge    v28.2d, v28.2d, v27.2d
  orr     v30.16b, v30.16b, v28.16b
  umaxp   v30.4s, v30.4s, v30.4s
  fmov    x0, d30
  cbnz    x0, .L12

which has v30.2d being {0, 0} and v28.2d being {0, 0x7FFF} as expected...

On AArch64 we don't inline the constants because whatever is propagating the constants can't understand the LOAD_LANES:

  mask_patt_19.21_50 = vect__2.16_44 >= vect_cst__49;
  mask_patt_19.21_51 = vect__2.19_47 >= vect_cst__49;
  vexit_reduc_55 = mask_patt_19.21_50 | mask_patt_19.21_51;
  if (vexit_reduc_55 != { 0, 0 })
    goto ; [20.00%]
  else
    goto ; [80.00%]

so could this be another expansion bug?

Note that a simpler reproducer is this:

---
long tdiff = 10412095;

int main() {
  struct { long maximum; int nonprimary_delay; } delays[] = {
      {}, {}, {}, {9223372036854775807, 36 * 60 * 60}};
  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;
  __builtin_abort();
}
---

The key point is that we're not allowed to constprop tdiff at GIMPLE. If we do, e.g.:

int main() {
  struct { long maximum; int nonprimary_delay; } delays[] = {
      {}, {}, {}, {9223372036854775807, 36 * 60 * 60}};
  long tdiff = 10412095;
  for (unsigned i = 0; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;
  __builtin_abort();
}

then after vectorization, const prop evaluates the entire expression at GIMPLE and gets the right result. This makes me believe it's a target expansion bug.
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model since r14-6822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
            Summary|[14 regression] Tor         |[14 regression] Tor
                   |miscompiled with -O2 -mavx  |miscompiled with -O2 -mavx
                   |-fno-vect-cost-model        |-fno-vect-cost-model since
                   |                            |r14-6822
     Ever confirmed|0                           |1
           Priority|P3                          |P1
   Target Milestone|---                         |14.0
   Last reconfirmed|                            |2024-03-14
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek --- Started with r14-6822-g01f4251b8775c832a92d55e2df57c9ac72eaceef
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #4 from Sam James --- (In reply to Sam James from comment #3) > Created attachment 57705 [details] > larger.i > > Ah, wait, that might be a bad reduction. Let me attach a larger one, then I > can give the original if needed too. OK, no, I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339#c2 is fine.
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #3 from Sam James --- Created attachment 57705 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57705&action=edit larger.i

Ah, wait, that might be a bad reduction. Let me attach a larger one, then I can give the original if needed too.
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 Sam James changed: What|Removed |Added CC||tnfchris at gcc dot gnu.org --- Comment #2 from Sam James --- I've minimised it to:
```
struct entry_guard_t {
  int is_reachable;
  long failing_since;
  int is_primary;
} *create_guard() {
  struct entry_guard_t *guard = __builtin_malloc(sizeof *guard);
  guard->is_reachable = guard->failing_since = guard->is_primary = 0;
  return guard;
}

int main() {
  struct entry_guard_t guard = *create_guard();
  long tdiff = 10412095 - guard.failing_since;
  struct { long maximum; int nonprimary_delay; } delays[] = {
      {}, {}, {}, {9223372036854775807, 36 * 60 * 60}};
  unsigned i = 0;
  for (; i < sizeof(delays) / sizeof(delays[0]); ++i)
    if (tdiff <= delays[i].maximum)
      return delays[i].nonprimary_delay;
  __builtin_abort();
}
```
This fails for me with `-O2 -mavx -fno-vect-cost-model`.
[Bug tree-optimization/114339] [14 regression] Tor miscompiled with -O2 -mavx -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339 --- Comment #1 from Sam James --- The assert is at https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.8.10/src/feature/client/entrynodes.c#L2072
```
(gdb) p delays
$3 = {{maximum = 21600, primary_delay = 600, nonprimary_delay = 3600},
      {maximum = 345600, primary_delay = 5400, nonprimary_delay = 14400},
      {maximum = 604800, primary_delay = 14400, nonprimary_delay = 64800},
      {maximum = 9223372036854775807, primary_delay = 32400,
       nonprimary_delay = 129600}}
(gdb)
```
The bad loop (confirmed w/ novector pragma) is:

  for (i = 0; i < ARRAY_LENGTH(delays); ++i) {
    if (tdiff <= delays[i].maximum) {
      return is_primary ? delays[i].primary_delay
                        : delays[i].nonprimary_delay;
    }
  }