[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #24 from Tamar Christina --- The case I thought would go wrong with the above fix is: #include #include #include #define N 306 #define NEEDLE 135 __attribute__ ((noipa, noinline)) int use(int x[N]) { printf("res=%d\n", x[NEEDLE]); return x[NEEDLE]; } __attribute__ ((noipa, noinline)) int foo (int i, unsigned short parse_tables_n) { int table[N]; memset (table, -1, sizeof (table)); parse_tables_n >>= 9; parse_tables_n += 9; while (i < N && parse_tables_n--) table[i++] = 0; return use (table); } int main () { if (foo (0, 0x) != 0) abort (); return 0; } --- but this seems fine because of the bias_for_lowest which I now understand to be there to account for this. So starting a regtest for that patch.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #23 from Tamar Christina --- small standalone reducer: #include #include #include #define N 306 #define NEEDLE 136 __attribute__ ((noipa, noinline)) int use(int x[N]) { printf("res=%d\n", x[NEEDLE]); return x[NEEDLE]; } __attribute__ ((noipa, noinline)) int foo (int i, unsigned short parse_tables_n) { int table[N]; memset (table, -1, sizeof (table)); parse_tables_n >>= 9; parse_tables_n += 11; while (i < N && parse_tables_n--) table[i++] = 0; return use (table); } int main () { if (foo (0, 0x) != 0) abort (); return 0; } --- > gcc incorrect.c -O3 -o incorrect.exe; and ./incorrect.exe; echo $status res=-1 1 > gcc incorrect.c -O1 -o incorrect.exe; and ./incorrect.exe; echo $status res=0 0 > gcc incorrect.c -O3 -fdisable-tree-cunroll -o incorrect.exe; and > ./incorrect.exe; echo $status res=0 0
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #22 from Tamar Christina --- (In reply to Richard Biener from comment #21) > loop->nb_iterations_upper_bound exactly is an upper bound on the number of > latch executions, so maybe I'm missing the point here. When we update it it > as > well has to reflect an upper bound on that, whether the last exit (the one > before the latch) is the IV exit or a vectorized early exit. Yes, but the issue here is that the bounds limit is coming from an exit other than the one being chosen as the IV exit. So the latch iterates X times not X - 1. > > But yes, if the last exit is an early one that last iteration might be > partial > (so we drop the -1), but that's what we already do? I don't see where. This code all seems to remove -1 from the iteration count and assuming the latch iteration count + 1 == nbounds. I don't really think this is about partial loops at all. Change parse_tables_n >>= 9; parse_tables_n += 11; to parse_tables_n >>= 9; parse_tables_n += 10; then loop->nb_iterations_upper_bound is 136 and there is no partial iterations here. You do 17 full vector iterations with no residuals.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #21 from Richard Biener --- loop->nb_iterations_upper_bound exactly is an upper bound on the number of latch executions, so maybe I'm missing the point here. When we update it it as well has to reflect an upper bound on that, whether the last exit (the one before the latch) is the IV exit or a vectorized early exit. But yes, if the last exit is an early one that last iteration might be partial (so we drop the -1), but that's what we already do?
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #20 from Tamar Christina --- [local count: 21718864]: ... _54 = (short unsigned int) bits_106; _26 = _54 >> 9; _88 = _139 + 7; _89 = _88 & 7; _111 = _26 + 10; [local count: 181308616]: # i_66 = PHI # parse_tables_n_lsm.141_87 = PHI <_29(36), _111(21)> i_69 = i_66 + 1; table[i_66] = 0; _29 = parse_tables_n_lsm.141_87 + 65535; if (parse_tables_n_lsm.141_87 != 0) goto ; [94.50%] else goto ; [5.50%] [local count: 171336643]: if (i_69 != 306) goto ; [93.84%] else goto ; [6.16%] [local count: 160784290]: goto ; [100.00%] is the relevant part of the loop. At the start of vect we determine: misc.c:147:31: note:All loop exits successfully analyzed. misc.c:147:31: note: Symbolic number of iterations is 306 - (unsigned int) i_146 Creating dr for table[i_66] but when we get to vect_transform_loop we've updated loop->nb_iterations_upper_bound to be 137. >>> p (int)loop->nb_iterations_upper_bound $2 = 137 SCEV claims: result: # of iterations _26 + 10, bounded by 137 (instantiate_scev (instantiate_below = 21 -> 22) (evolution_loop = 4) (chrec = _26 + 10) (res = _26 + 10)) Statement (exit)if (parse_tables_n_lsm.141_87 != 0) is executed at most _26 + 10 (bounded by 137) + 1 times in loop 4. and further down Statement table[i_66] = 0; is executed at most 305 (bounded by 305) + 1 times in loop 4. So it looks like we determine the latch will execute a maximum of 305 times, but that the early exit is only executed 137 times. Which means the loop is bounded by 137 iterations. This looks like it's because _111 is unsigned short, so (0x >> 9) + 10 == 137 This is fine, but when during vect_transform_loop when we update the bounds we do: wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest, lowest_vf) - 1); so we end up doing ((137 + 1) / 8) - 1 which is bogus. This results in 16 iterations while we know the vector loop must do 17. The additional -1 is because the code assumes that the niters bounds is coming from the latch iteration count, however in this case it's coming from another exit and the latch never executes. I think for early exits we should take the ceil instead, since we have cases like this where an early exit restricts the loop's iteration count, but is not choosen as the main exit. diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 854e9d78bc7..0b1656fef2f 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -12171,7 +12171,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* True if the final iteration might not handle a full vector's worth of scalar iterations. */ bool final_iter_may_be_partial -= LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo); += LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo); /* The minimum number of iterations performed by the epilogue. This is 1 when peeling for gaps because we always need a final scalar iteration. */ Fixes it, but this doesn't feel entirely right to me. Because in particular it would have still been wrong if the nbounds on (parse_tables_n_lsm.141_87 != 0) was 136, since there's no decimal after the divide there we'd end up at 16 again. So I think we need to know whether the bounds is coming from the loop exit or an alternate exit and not decrement by -1 if the bounds limit does not come from the latch.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #19 from Tamar Christina --- Ok, removing all the noise shows that this is the same issue as I saw before. The code out of the vectorizer is correct, but cunroll does a dodgee unrolling. -fdisable-tree-cunroll confirms it's the unrolling. cunroll claims: Loop 4 iterates at most 16 times. Loop 4 likely iterates at most 16 times. Analyzing # of iterations of loop 4 exit condition [1, + , 1](no_overflow) < bnd.157_285 ... ;; Guessed iterations of loop 4 is 7.347979. New upper bound 16. Making edge 23->36 impossible by redistributing probability to other edges. Original probability: 93.8% (guessed) misc.c:147:31: optimized: loop with 16 iterations completely unrolled (header execution count 2859449) Last iteration exit edge was proved true. Not peeling: number of iterations is not estimated but the i is bounded by i < 306. and VF=8. so binding loop iteration to 16 means it just cut off half the loop. so it looks like the upper bounds is wrong. Checking what we write out during loop vect.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #18 from Tamar Christina --- Loop that gets miscompiled is the initialization loop: while (parse_tables_n-- && i < 306) table[i++] = 0; and indeed, the compiler seems to also be ignoring the #pragma GCC novector attribute on it. Will take a look at both.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #17 from Tamar Christina --- (In reply to Sam James from comment #16) > Created attachment 57393 [details] > test.c > > OK, all done now (I figured I'd let cvise finish). No more :) > > By the way, this fails on arm64 too (at least the reduced thing). Thanks! yeah I'm already debugging there :) I have a much better setup on aarch64 to track such things down. Thanks again for the reducers!
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 Sam James changed: What|Removed |Added Attachment #57390|0 |1 is obsolete|| --- Comment #16 from Sam James --- Created attachment 57393 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57393=edit test.c OK, all done now (I figured I'd let cvise finish). No more :) By the way, this fails on arm64 too (at least the reduced thing).
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #15 from Tamar Christina --- (In reply to Sam James from comment #14) > Created attachment 57390 [details] > test.c > > I'll try reducing it preprocessed now (couldn't do it before as checking w/ > clang as well in the reduction script to avoid introducing UB). Thanks! this is good enough. Thanks for the reducer!
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 Sam James changed: What|Removed |Added Attachment #57384|0 |1 is obsolete|| --- Comment #14 from Sam James --- Created attachment 57390 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57390=edit test.c I'll try reducing it preprocessed now (couldn't do it before as checking w/ clang as well in the reduction script to avoid introducing UB).
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 Sam James changed: What|Removed |Added Attachment #57379|0 |1 is obsolete|| --- Comment #13 from Sam James --- Created attachment 57384 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57384=edit test.c Not done reducing yet but just posting an update.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #12 from Sam James --- Created attachment 57379 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57379=edit test.c Here's an initial stab at a standalone testcase. I'm going to try reduce it now.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #11 from Richard Biener --- (In reply to Tamar Christina from comment #10) > (In reply to Richard Biener from comment #9) > > Another bug in the dependence checking code is > > > > if (dr_may_alias_p (dr_ref, dr_read, loop_nest)) > > > > which will end up using TBAA - dr_may_alias_p doesn't think you are ever > > going to move stores down across loads. To verify if that's possible > > you need to use > > > > if (dr_may_alias_p (dr_read, dr_ref, loop_nest)) > > > > instead. > > > > Note there's still my very original review consideration that you move > > stmts out-of-order but the main dependence checking the vectorizer does > > assumes the stores and loads appear in their original order. I'm not > > sure whether with the above we prove this doesn't matter. > > But in the original review I had it that way and you said: > > > + for (auto dr_read : bases) > > + if (dr_may_alias_p (dr_read, dr_ref, loop_nest)) > > I think you need to swap dr_read and dr_ref operands, since you > are walking stmts backwards and thus all reads from 'bases' are > after the write. > > so I'm somewhat confused.. I was confused.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #10 from Tamar Christina --- (In reply to Richard Biener from comment #9) > Another bug in the dependence checking code is > > if (dr_may_alias_p (dr_ref, dr_read, loop_nest)) > > which will end up using TBAA - dr_may_alias_p doesn't think you are ever > going to move stores down across loads. To verify if that's possible > you need to use > > if (dr_may_alias_p (dr_read, dr_ref, loop_nest)) > > instead. > > Note there's still my very original review consideration that you move > stmts out-of-order but the main dependence checking the vectorizer does > assumes the stores and loads appear in their original order. I'm not > sure whether with the above we prove this doesn't matter. But in the original review I had it that way and you said: > + for (auto dr_read : bases) > + if (dr_may_alias_p (dr_read, dr_ref, loop_nest)) I think you need to swap dr_read and dr_ref operands, since you are walking stmts backwards and thus all reads from 'bases' are after the write. so I'm somewhat confused..
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #9 from Richard Biener --- Another bug in the dependence checking code is if (dr_may_alias_p (dr_ref, dr_read, loop_nest)) which will end up using TBAA - dr_may_alias_p doesn't think you are ever going to move stores down across loads. To verify if that's possible you need to use if (dr_may_alias_p (dr_read, dr_ref, loop_nest)) instead. Note there's still my very original review consideration that you move stmts out-of-order but the main dependence checking the vectorizer does assumes the stores and loads appear in their original order. I'm not sure whether with the above we prove this doesn't matter.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #8 from Richard Biener --- (In reply to Tamar Christina from comment #6) > The reason for the miscompile popping up is this change from the previous > patch > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > index 109d4ce5192..df3eab2e8d5 100644 > --- a/gcc/tree-vect-data-refs.cc > +++ b/gcc/tree-vect-data-refs.cc > @@ -725,8 +725,7 @@ vect_analyze_early_break_dependences (loop_vec_info > loop_vinfo) > bounded by VF so accesses are within range. We only need to > check the > reads since writes are moved to a safe place where if we get > there we > know they are safe to perform. */ > - if (DR_IS_READ (dr_ref) > - && !ref_within_array_bound (stmt, DR_REF (dr_ref))) > + if (!ref_within_array_bound (stmt, DR_REF (dr_ref))) I think it can even be relaxed to if ((DR_IS_READ (dr_ref) && check_deps)) since for non-peeled the IV exit block will be only executed with a fully enabled vector.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #7 from Richard Biener --- (In reply to Tamar Christina from comment #6) > The reason for the miscompile popping up is this change from the previous > patch > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc > index 109d4ce5192..df3eab2e8d5 100644 > --- a/gcc/tree-vect-data-refs.cc > +++ b/gcc/tree-vect-data-refs.cc > @@ -725,8 +725,7 @@ vect_analyze_early_break_dependences (loop_vec_info > loop_vinfo) > bounded by VF so accesses are within range. We only need to > check the > reads since writes are moved to a safe place where if we get > there we > know they are safe to perform. */ > - if (DR_IS_READ (dr_ref) > - && !ref_within_array_bound (stmt, DR_REF (dr_ref))) > + if (!ref_within_array_bound (stmt, DR_REF (dr_ref))) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > but this should have bee safe, as the stores shouldn't be done until the > point we know for sure they would be safe to do. > > the code out of the vectorizer looks ok to me. Valgrind is saying we're > reading uninitialized values. But those values I think come from a previous > look which sets them to 0. Or is supposed to. So working my way up this > giant function. Hmm, but there isn't really a "safe" place, is there? If there's a safe place then it would be safe for reads as well, no? So I guess when you manage to massage the testcase to be based on decls then you instead (with the above suggested change) get spurious stores?
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #6 from Tamar Christina --- The reason for the miscompile popping up is this change from the previous patch diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index 109d4ce5192..df3eab2e8d5 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -725,8 +725,7 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo) bounded by VF so accesses are within range. We only need to check the reads since writes are moved to a safe place where if we get there we know they are safe to perform. */ - if (DR_IS_READ (dr_ref) - && !ref_within_array_bound (stmt, DR_REF (dr_ref))) + if (!ref_within_array_bound (stmt, DR_REF (dr_ref))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, but this should have bee safe, as the stores shouldn't be done until the point we know for sure they would be safe to do. the code out of the vectorizer looks ok to me. Valgrind is saying we're reading uninitialized values. But those values I think come from a previous look which sets them to 0. Or is supposed to. So working my way up this giant function.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #5 from Sam James --- (In reply to Sam James from comment #0) > /home/sam/data/libarchive/libarchive-3.7.2/libarchive/test/ > test_read_format_rar5.c:106: bytes_read != fsize > bytes_read=-30 (0xffe2, 0177742) > fsize=95 (0x5f, 0137) btw, this isn't overflow, it's a sentinel value to represent an invalid archive (ARCHIVE_FATAL) -- just that this test doesn't check for that, as it's expected pt ass.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #4 from Tamar Christina --- Narrowed down the change part that caused the failure, but it should have been correct to do. So looking into why the change caused the failure. Please hold..
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2024-02-03 Target Milestone|--- |14.0 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED --- Comment #3 from Andrew Pinski --- From https://gcc.gnu.org/pipermail/gcc-patches/2024-February/644888.html ``` FAIL: libgomp.fortran/non-rectangular-loop-1.f90 -O1 execution test ``` Maybe that failure is the same issue as this one too.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 Tamar Christina changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org --- Comment #2 from Tamar Christina --- Yeah that's enough info for me to go on. Will fix Monday morning.
[Bug middle-end/113734] [14 regression] libarchive miscompiled (fails libarchive_test_read_format_rar5_extra_field_version test) since r14-8768-g85094e2aa6dba7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113734 --- Comment #1 from Sam James --- Not made much progress yet. libarchive/archive_read_support_format_rar5.c.o is the miscompiled obj, using #pragma GCC optimize ("O0") on parse_tables fixes things.