[Bug c++/113687] -Warray-bounds is not emitted inside class method
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113687 Richard Biener changed: What|Removed |Added Blocks||56456 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-02-01 --- Comment #2 from Richard Biener --- (In reply to Andrew Pinski from comment #1) > The warning only happens if the vague linkage function is used. and IIRC > that is by design. Yeah, we try to avoid diagnosing things on "dead" code and here the whole functions are dead. IIRC even -fanalyzer runs after cgraph removes unreachable functions. It would be still nice to diagnose these kind of trivial cases. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456 [Bug 56456] [meta-bug] bogus/missing -Warray-bounds
[Bug testsuite/113685] [14 regression] gcc.dg/vect/vect-117.c fails profile checking with Invalid sum after r14-4089-gd45ddc2c04e471
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113685 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-02-01 CC||hubicka at gcc dot gnu.org Ever confirmed|0 |1 Target Milestone|--- |14.0 Keywords||testsuite-fail Summary|[14 regression] xxx fails |[14 regression] |after yyy |gcc.dg/vect/vect-117.c ||fails profile checking with ||Invalid sum after ||r14-4089-gd45ddc2c04e471 Status|UNCONFIRMED |NEW --- Comment #1 from Richard Biener --- As said in the other PR, this is more for Honza who thought checking we do not end with invalid profiles for all vect testcases is a good thing ;) Btw, the wrong count pops up in DOM3: t.c.203t.dom3:;; Invalid sum of incoming counts 138435014 (estimated locally, freq 3.0936), should be 134239200 (estimated locally, freq 2.) so it seems to be a jump threading issue. It's gone with -fno-thread-jumps. Very likely a latent issue, but of course the change triggering this does have an effect on jump threading. Confirmed.
[Bug target/113684] Cross compiler without assembler and linker should assume that all assembler and linker features are available
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113684 --- Comment #3 from Richard Biener --- I'm usually having cross assembler/linker around as they are easy to build.
[Bug tree-optimization/110176] [11/12/13 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176 Richard Biener changed: What|Removed |Added Known to work||14.0 Summary|[11/12/13/14 Regression]|[11/12/13 Regression] wrong |wrong code at -Os and above |code at -Os and above on |on x86_64-linux-gnu since |x86_64-linux-gnu since |r11-2446|r11-2446 --- Comment #11 from Richard Biener --- Fixed on trunk so far.
[Bug target/113641] [13/14 regression] 510.parest_r with PGO at O2 slower than GCC 12 (7% on Zen 3&2, 4% on CascadeLake) since r13-4272-g8caf155a3d6e23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113641 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug rtl-optimization/113546] [13/14 Regression] aarch64: bootstrap-debug-lean broken with -fcompare-debug failure since r13-2921-gf1adf45b17f7f1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113546 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug testsuite/113611] [14 Regression] gcc.dg/pr110279-1.c fails on cross build since gcc-14-5779-g746344dd538
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113611 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0
[Bug target/113542] [14 Regression] gcc.target/arm/bics_3.c regression after change for pr111267
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113542 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0
[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111170 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug rtl-optimization/110390] [13/14 regression] ICE on valid code on x86_64-linux-gnu with sel-scheduling: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609 since r13-3596-ge7310e24b1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110390 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.3
[Bug target/105275] [12/13/14 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.4
[Bug debug/92444] [11/12/13/14 regression] gcc generates wrong debug information at -O2 and -O3 since r10-4122-gf658ad3002a0af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92444 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.5
[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681 Richard Biener changed: What|Removed |Added Priority|P3 |P4
[Bug tree-optimization/113681] [14 Regression] ICE in tree_profiling, at tree-profile.cc:803 since r14-6201-gf0a90c7d7333fc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113681 Richard Biener changed: What|Removed |Added Keywords||error-recovery Target Milestone|--- |14.0
[Bug rtl-optimization/113682] Branches in branchless binary search rather than cmov/csel/csinc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Component|other |rtl-optimization Version|unknown |14.0 Target||aarch64, x86_64-*-* --- Comment #1 from Richard Biener --- Since there's a loop exit involved (and the loop has multiple exits) if-conversion is made difficult here. You could try rotating manually producing a do { } while loop with a "nicer" exit condition and see whether that helps.
[Bug tree-optimization/110176] [11/12/13/14 Regression] wrong code at -Os and above on x86_64-linux-gnu since r11-2446
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110176 --- Comment #9 from Richard Biener ---
With all VARYING we simplify

i_19 = (int) _2;
_6 = (int) _5;
Value numbering stmt = _7 = _6 <= i_19;
Applying pattern match.pd:6775, gimple-match-4.cc:1795
Match-and-simplified _6 <= i_19 to 1

where _5 is _Bool and _2 is unsigned int. We match zext <= (int) 4294967295u. Note that I see

Value numbering stmt = _2 = f$0_25;
Setting value number of _2 to 4294967295 (changed)
Value numbering stmt = i_19 = (int) _2;
Match-and-simplified (int) _2 to -1
RHS (int) _2 simplified to -1
Not changing value number of i_19 from VARYING to -1
Making available beyond BB6 i_19 for value i_19

so it's odd we see the constant here, but ... we go

(if (TREE_CODE (@10) == INTEGER_CST
     && INTEGRAL_TYPE_P (TREE_TYPE (@00))
     && !int_fits_type_p (@10, TREE_TYPE (@00)))
 (with
  {
    tree min = lower_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00));
    tree max = upper_bound_in_type (TREE_TYPE (@10), TREE_TYPE (@00));
    bool above = integer_nonzerop (const_binop (LT_EXPR, type, max, @10));
    bool below = integer_nonzerop (const_binop (LT_EXPR, type, @10, min));
  }
  (if (above || below)

failing to see that we deal with a relational compare and a sign-change. The original code from fold-const.cc had only INTEGER_TYPE support, r6-4300-gf6c1575958f7bf made it cover all integral types (it half-way supported BOOLEAN_TYPE already). But the issue was latent I think. One notable difference was that I think get_unwidened made sure to convert a constant to the wider type while here we have @10 != @1 and the conversion not applied. We're doing it correctly in earlier code:

/* ??? The special-casing of INTEGER_CST conversion was in the original code and here to avoid a spurious overflow flag on the resulting constant which fold_convert produces. */
(if (TREE_CODE (@1) == INTEGER_CST)

using @1 instead of @10. Correcting that avoids the pattern from triggering in this wrong way.
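The sign-change problem described above can be illustrated outside the compiler. The following is only a sketch with hypothetical function names, assuming a 32-bit unsigned int; it is not the match.pd logic itself. It contrasts the comparison as written (zero-extended _Bool against the constant after conversion to int, i.e. -1) with a comparison that treats the constant as if it still had its unsigned value 4294967295 and therefore lay above every _Bool value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The comparison as it appears in the IL: the _Bool is zero-extended
   to int and compared against the already converted (signed) value.  */
int compare_after_conversion(bool b, int i)
{
    int widened = (int) b;          /* 0 or 1 */
    return widened <= i;
}

/* What the faulty folding effectively assumes: the constant keeps its
   unsigned value, which is above the maximum of _Bool, so the
   comparison looks like it is always true.  */
int compare_ignoring_sign_change(bool b, unsigned int u)
{
    return (unsigned int) b <= u;
}
```

With i = -1, which is what (int) 4294967295u yields in GCC, the first function returns 0 for both values of b while the second returns 1, which matches the wrong "simplified to 1" result in the dump.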
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #12 from Richard Biener --- Fixed.
[Bug middle-end/113680] Missed optimization: Redundant cmp/test instructions when comparing (x - y) > 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113680 Richard Biener changed: What|Removed |Added Component|rtl-optimization|middle-end Status|UNCONFIRMED |NEW Last reconfirmed||2024-01-31 Keywords||easyhack, ||missed-optimization Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I don't think we have or had a (a - b) CMP 0 simplification pattern which this seems to be about. We have a +- CST CMP CST'. Note the reverse, a < b -> (a - b) < 0 isn't valid.
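A small counterexample may help to see why rewriting a < b into (a - b) < 0 is not valid: the subtraction can overflow even though the comparison itself is well defined. The sketch below uses hypothetical helper names and assumes a 32-bit int; the wrap-around is modeled with unsigned arithmetic so that the example itself does not invoke signed overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The plain comparison, defined for all inputs.  */
int less_than(int a, int b)
{
    return a < b;
}

/* (a - b) < 0 evaluated with wrap-around subtraction.  Unsigned
   arithmetic models the wrapping; the sign bit of the wrapped
   difference tells whether the result would be negative.  */
int wrapped_diff_negative(int a, int b)
{
    uint32_t d = (uint32_t) a - (uint32_t) b;
    return (d >> 31) != 0;
}
```

For a = INT_MIN and b = 1 the comparison a < b is true, but the wrapped difference is INT_MAX, so the two forms disagree. For inputs whose difference fits in int they agree.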
[Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134 --- Comment #20 from Richard Biener ---
I think we want split_loop () to handle this case. That means extending it to handle loops with multiple exits. OTOH after loop rotation to

if (i_21 == 1001) goto ; [1.00%] else goto ; [99.00%]
[local count: 1004539166]:
i_18 = i_21 + 1;
if (N_13(D) > i_18) goto ; [94.50%] else goto ; [5.50%]

it could also be IVCANON's job to rewrite the exit test so the bound is loop invariant and it becomes a single exit. There's another recent PR where an exit condition like i < N && i < M should become i < MIN(N,M).
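The last point, turning i < N && i < M into i < MIN(N,M), can be sketched at the source level. The function names below are made up for illustration and the loops are deliberately trivial; the point is only that both forms run the same number of iterations while the second has a single, loop-invariant bound.

```c
#include <assert.h>

/* Two exit tests evaluated on every iteration.  */
int count_two_exits(int n, int m)
{
    int i = 0;
    while (i < n && i < m)
        ++i;
    return i;
}

/* Same iteration count with one exit test against an invariant bound.  */
int count_single_exit(int n, int m)
{
    int bound = n < m ? n : m;      /* MIN (n, m) computed once */
    int i = 0;
    while (i < bound)
        ++i;
    return i;
}
```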
[Bug tree-optimization/113630] [11/12/13 Regression] -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Summary|[11/12/13/14 Regression]|[11/12/13 Regression] |-fno-strict-aliasing|-fno-strict-aliasing |introduces out-of-bounds|introduces out-of-bounds |memory access |memory access Known to work||14.0 --- Comment #7 from Richard Biener --- Fixed on trunk so far.
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #10 from Richard Biener --- Hmm, I have another fix.
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #9 from Richard Biener --- (In reply to Richard Biener from comment #8) > The best fix would likely be to pre-insert all the IPA-CP known constants > instead of trying to discover them "late". > > I'm testing the easy fix for now. Hmm. gcc.dg/ipa/pr92497-1.c FAILs because of that. We get __attribute__((noinline)) int bar.constprop (struct a a) { intD.6 a$aD.2808; intD.6 D.2807; struct a aD.2806; intD.6 _4; [local count: 1073741824]: # .MEM_5 = VDEF <.MEM_2(D)> aD.2806 = aD.2800; # VUSE <.MEM_5> a$a_3 = aD.2806.aD.2769; here and thus translate through the aggregate copy - the result should then be put on aD.2806 but of course only with .MEM_5. Maybe we can and should always use the default def here but I'm slightly uneasy with the ref adjustment, esp. since we're going to record for the saved operands (if those exist - the path where it goes wrong isn't translated).
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 --- Comment #8 from Richard Biener --- OK, so the issue is that we're recording the IPA result with the wrong VUSE since we're calling vn_reference_lookup_2 with !data->last_vuse_ptr but data->finish (vr->set, vr->base_set, v) inserts a hashtable entry with data->last_vuse. Note it's somewhat unexpected that vn_reference_lookup_2 performs hashtable insertion which is what causes the issue. It's also not as easy as using the updated vuse since if we're coming from translation through a memcpy that would be wrong. In fact we probably want to avoid doing any insertion if there's something fishy going on (!data->last_vuse_ptr). The best fix would likely be to pre-insert all the IPA-CP known constants instead of trying to discover them "late". I'm testing the easy fix for now.
[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670 Richard Biener changed: What|Removed |Added Known to fail|14.0| Target Milestone|--- |14.0 Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Known to work||14.0 --- Comment #5 from Richard Biener --- Fixed for trunk.
[Bug tree-optimization/113678] SLP misses up vec_concat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I think the SLP tree we discover is sound: t2.c:11:14: note: node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char t2.c:11:14: note: op template: *a_7(D) = _1; t2.c:11:14: note: stmt 0 *a_7(D) = _1; t2.c:11:14: note: stmt 1 MEM[(char *)a_7(D) + 1B] = _2; t2.c:11:14: note: stmt 2 MEM[(char *)a_7(D) + 2B] = _3; t2.c:11:14: note: stmt 3 MEM[(char *)a_7(D) + 3B] = _4; t2.c:11:14: note: stmt 4 MEM[(char *)a_7(D) + 4B] = _1; t2.c:11:14: note: stmt 5 MEM[(char *)a_7(D) + 5B] = _2; t2.c:11:14: note: stmt 6 MEM[(char *)a_7(D) + 6B] = _3; t2.c:11:14: note: stmt 7 MEM[(char *)a_7(D) + 7B] = _4; t2.c:11:14: note: children 0x5db7778 t2.c:11:14: note: node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char t2.c:11:14: note: op template: _1 = *b_6(D); t2.c:11:14: note: stmt 0 _1 = *b_6(D); t2.c:11:14: note: stmt 1 _2 = MEM[(char *)b_6(D) + 1B]; t2.c:11:14: note: stmt 2 _3 = MEM[(char *)b_6(D) + 2B]; t2.c:11:14: note: stmt 3 _4 = MEM[(char *)b_6(D) + 3B]; t2.c:11:14: note: stmt 4 _1 = *b_6(D); t2.c:11:14: note: stmt 5 _2 = MEM[(char *)b_6(D) + 1B]; t2.c:11:14: note: stmt 6 _3 = MEM[(char *)b_6(D) + 2B]; t2.c:11:14: note: stmt 7 _4 = MEM[(char *)b_6(D) + 3B]; t2.c:11:14: note: load permutation { 0 1 2 3 0 1 2 3 } the issue is as so often t2.c:11:14: note: ==> examining statement: _1 = *b_6(D); t2.c:11:14: missed: BB vectorization with gaps at the end of a load is not supported t2.c:3:19: missed: not vectorized: relevant stmt not supported: _1 = *b_6(D); t2.c:11:14: note: Building vector operands of 0x5db7778 from scalars instead where we are not applying much non-ad-hoc work to deal with those "out-of-bound" accesses. The choice here would be obvious in doing a single vector(4) load instead.
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #3 from Richard Biener --- Yeah, most of the code in forwprop/match doesn't deal with the "new" permutes where the result isn't the same length as the inputs.
[Bug tree-optimization/113676] [12 Regression] Miscompilation tree-vrp __builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113676 Richard Biener changed: What|Removed |Added Target||x86_64-*-* Summary|[11/12 Regression] |[12 Regression] |Miscompilation tree-vrp |Miscompilation tree-vrp |__builtin_unreachable |__builtin_unreachable --- Comment #1 from Richard Biener --- Needs -std=c++20. I can't reproduce locally.
[Bug c++/113674] [11/12/13/14 Regression] [[____attr____]] causes internal compiler error: in decl_attributes, at attribs.cc:776
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113674 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-01-31
[Bug tree-optimization/113673] [12/13/14 Regression] ICE: verify_flow_info failed: BB 5 cannot throw but has an EH edge with -Os -finstrument-functions -fnon-call-exceptions -ftrapv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113673 Richard Biener changed: What|Removed |Added Priority|P3 |P2 --- Comment #2 from Richard Biener --- Looks like an issue in bswap with regard to EH.
[Bug regression/113672] [14 Regression] FAIL: g++.dg/pch/line-map-3.C -g -I. -Dwith_PCH (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113672 Richard Biener changed: What|Removed |Added Keywords||testsuite-fail Target Milestone|--- |14.0
[Bug tree-optimization/113670] ICE with vectors in named registers and -fno-vect-cost-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113670 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2024-01-31 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #3 from Richard Biener --- I'll hunt it down.
[Bug middle-end/113669] -fsanitize=undefined failed to check a signed integer overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113669 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-31 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- So confirmed.
[Bug go/113668] [14 Regression] libgo soname bump needed for the GCC 14 release?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113668 Richard Biener changed: What|Removed |Added Keywords||ABI CC||rguenth at gcc dot gnu.org Target Milestone|--- |14.0
[Bug d/113667] [14 Regression] libgphobos symbols missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113667 Richard Biener changed: What|Removed |Added Keywords||ABI Priority|P3 |P1 Target Milestone|--- |14.0
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #13 from Richard Biener --- (In reply to JuzheZhong from comment #12) > OK. It seems it has data dependency issue: > > missed: not vectorized, possible dependence between data-refs a[i_15] and > a[_4] > > a[i_15] = _3; STMT 1 > _4 = i_15 + 2; > _5 = a[_4];STMT 2 > > STMT2 should not depend on STMT1. > > It's recognized as dependency in vect_analyze_data_ref_dependence. > > Is is reasonable to fix it in vect_analyze_data_ref_dependence ? t2.c:4:21: note: dependence distance = 1. t2.c:7:12: missed: not vectorized, possible dependence between data-refs a[i_15] and a[_4] t2.c:4:21: missed: bad data dependence. so there's a cross iteration dependence with distance 1 - that's (compute_affine_dependence ref_a: a[i_15], stmt_a: a[i_15] = _3; ref_b: a[_4], stmt_b: _5 = a[_4]; (analyze_overlapping_iterations (chrec_a = {0, +, 2}_1) (chrec_b = {2, +, 2}_1) (analyze_siv_subscript (analyze_subscript_affine_affine (overlaps_a = [1 + 1 * x_1]) (overlaps_b = [0 + 1 * x_1])) ) (overlap_iterations_a = [1 + 1 * x_1]) (overlap_iterations_b = [0 + 1 * x_1])) (build_classic_dist_vector dist_vector = (1 ) ) ) a read-after-write of a[i+2] after storing to a[i+1] in program order. This would be fine with a VF of 1 only, but we are not really considering that (a pure SLP vectorization w/o unrolling). Instead we start with the assumption of classical vectorization using interleaving which has a minimal VF of the number of lanes of the vector type with the largest number of lanes as determined by vect_analyze_data_refs. 
We can delay this all a bit but then the SLP build will fail anyway:

t2.c:4:21: missed: Build SLP failed: different interleaving chains in one node _5 = a[_4];

which is because we do

t2.c:4:21: note: === vect_analyze_data_ref_accesses ===
t2.c:4:21: note: Detected interleaving load a[i_15] and a[_1]
t2.c:4:21: note: Detected interleaving store a[i_15] and a[_1]
t2.c:4:21: note: Detected interleaving load of size 2
t2.c:4:21: note:   _2 = a[i_15];
t2.c:4:21: note:   tem_10 = a[_1];
t2.c:4:21: note: Detected single element interleaving a[_4] step 16

that is, we are splitting the chain because of the intermediate store (that's kind-of OK-ish, heuristically it works for more cases).

We'd usually handle the VF == 1 cases also during BB vectorization on the loop body, but we're only doing that when there was if-conversion and the later stand-alone BB vectorization is after predictive commoning which wrecks the loop. We should move predcom after BB vect for that.

That said, this PR is quite elaborate and it will touch some key design issues in the vectorizer. I'd rather finally finish getting us to work on the SLP representation only before touching all these delicate things.

The following allows the analysis to proceed a bit longer with VF == 1. Not adjusting min_vf early might have issues, but the change might work as-is and possibly allow some cases to be loop vectorized with SLP and a low VF that we now fail to.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index f592aeb8028..b16b4664e7b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -589,7 +589,7 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
        }
 
       unsigned int abs_dist = abs (dist);
-      if (abs_dist >= 2 && abs_dist < *max_vf)
+      if (abs_dist >= 1 && abs_dist < *max_vf)
        {
          /* The dependence distance requires reduction of the maximal
             vectorization factor.  */
@@ -4946,7 +4955,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
       /* Adjust the minimal vectorization factor according to the
          vector type.  */
       vf = TYPE_VECTOR_SUBPARTS (vectype);
-      *min_vf = upper_bound (*min_vf, vf);
+      //*min_vf = upper_bound (*min_vf, vf);
 
       /* Leave the BB vectorizer to pick the vector type later, based on
          the final dataref group size and SLP node size.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 30b90d99925..7eab3d4bebc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2719,7 +2719,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool ,
   opt_result ok = opt_result::success ();
   int res;
   unsigned int max_vf = MAX_VECTORIZATION_FACTOR;
-  poly_uint64 min_vf = 2;
+  poly_uint64 min_vf = 1;
   loop_vec_info orig_loop_vinfo = NULL;
 
   /* If we are dealing with an epilogue then orig_loop_vinfo points to the
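For readers unfamiliar with the dependence dump quoted in this comment, the distance of 1 can be reproduced by hand. The sketch below is not GCC code: affine_distance is a hypothetical helper implementing the textbook strong-SIV test for two accesses with the same stride, and limit_max_vf is modeled on the patched condition under the assumption (not shown in the quoted hunk) that the reduction sets the maximal VF to the distance.

```c
#include <assert.h>
#include <stdlib.h>

/* Strong SIV test for accesses a[base_a + step * i] and
   a[base_b + step * j] with a common nonzero step.  Returns 1 and
   stores the distance in iterations (i - j) if the accesses can touch
   the same element, 0 if they never overlap.  */
int affine_distance(long base_a, long base_b, long step, long *dist)
{
    long delta = base_b - base_a;
    if (delta % step != 0)
        return 0;                   /* no integer solution */
    *dist = delta / step;
    return 1;
}

/* Hypothetical model of the patched check: a nonzero distance smaller
   than the current maximum caps the vectorization factor.  */
unsigned limit_max_vf(long dist, unsigned max_vf)
{
    unsigned long abs_dist = (unsigned long) labs(dist);
    if (abs_dist >= 1 && abs_dist < max_vf)
        return (unsigned) abs_dist;
    return max_vf;
}
```

Plugging in the access functions from the dump, {0, +, 2} and {2, +, 2}, gives a distance of one iteration, and with the patched condition that distance caps the factor at 1, which is the VF == 1 case discussed above.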
[Bug ipa/111444] [14 Regression] Wrong code at -O2/3/s on x86_64-gnu since r14-3226-gd073e2d75d9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111444 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #7 from Richard Biener --- I will have a look then.
[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2 since r11-4987-g602c6cfc79ce4a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665 --- Comment #9 from Richard Biener --- (In reply to Jan Hubicka from comment #8) > > Honza - ICF seems to fixup points-to sets when merging variables, so there > > should be a way to kill off flow-sensitive info inside prevailing bodies > > as well. But would that happen before inlining the body? Can you work > > on that? I think comparing ranges would weaken ICF unnecessarily? > > AFAIK ICF does no changes to winning function body. It basically relies > on the fact that early optimizations are local and thus arrive to same > solutions for most of metadata. So only really easy fix is to make it > match value ranges, too. I will check how much that fire in practice - > I can only think of split funtions to diverge, which is probably not > that bad in practice. But is it possible to add a local transform stage and would that also affect which body we inline? But yes, inlining the original body would be so much better ...
[Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395 --- Comment #9 from Richard Biener --- (In reply to JuzheZhong from comment #8) > Hi, Richard. > > Now, I find the time to GCC vectorization optimization. > > I find this case: > > _2 = a[_1]; > ... > a[i_16] = _4; > ,,, > _7 = a[_1];---> This load should be eliminated and re-use _2. > > Am I right ? > > Could you guide me which pass should do this CSE optimization ? > > Thanks. In principle it's value-numbering. The reason it doesn't do this is compile-time cost of doing full data-ref analysis. In principle it's as "easy" as hooking that up into vn_reference_lookup_3 as part of the early work therein to disambiguate more defs. Iff we chose to refrain from valueizing any of the SSA uses we could cache both the data references and the dependence resolution. One could also think of doing very simple recognition of these single index expressions and / or integrating this with other cases. IIRC there's some warranting SCEV processing / niter analysis as well for example to figure that for (int i = 0; i < 128; ++i) a[i] = 1; return a[5]; returns 1.
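Both situations mentioned in this comment can be written down as small C functions. This is only an illustration with invented function names; the index relationship in the second function (a load from a[i + 1] around a store to a[i]) is an assumption about what _1 and i_16 stand for in the quoted snippet.

```c
#include <assert.h>

/* Every element is set to 1 by the loop, so the final load is known
   to yield 1 once the loop's stores are understood.  */
int store_then_load(void)
{
    int a[128];
    for (int i = 0; i < 128; ++i)
        a[i] = 1;
    return a[5];
}

/* The store to a[i] cannot change a[i + 1], so the second load could
   reuse the value of the first one.  */
int reload_after_disjoint_store(int *a, int i)
{
    int first = a[i + 1];
    a[i] = first * 2;               /* writes a different element */
    int second = a[i + 1];          /* same value as 'first' */
    return first + second;
}
```

Proving that the two references in the second function never overlap is the data-reference analysis whose compile-time cost is the concern raised above.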
[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #5 from Richard Biener --- Fixed.
[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059 --- Comment #21 from Richard Biener --- (In reply to Jakub Jelinek from comment #19) > Created attachment 57258 [details] > gcc14-pr113059.patch > > So in patch form like this. Untested so far. LGTM.
[Bug target/113059] [14 regression] fftw fails tests for -O3 -m32 -march=znver2 since r14-6210-ge44ed92dbbe9d4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113059 --- Comment #17 from Richard Biener --- (In reply to Jakub Jelinek from comment #16) > The question is revert what exactly? > If we revert r14-6210, we get back the other P1. Or do you mean revert > r14-5355? > I guess another option is move the vzeroupper pass one pass later, i.e. > after pass_gcse. I think moving mdreorg passes as late as possible esp. when they don't play well with DF/notes is a good thing. Maybe even after pass_rtl_dse2 and thus after shrink-wrapping?
[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #29 from Richard Biener --- (In reply to Hongtao Liu from comment #28) > I saw we already maskoff integral modes for vector mask in store_constructor > > /* Use sign-extension for uniform boolean vectors with > integer modes and single-bit mask entries. > Effectively "vec_duplicate" for bitmasks. */ > if (elt_size == 1 > && !TREE_SIDE_EFFECTS (exp) > && VECTOR_BOOLEAN_TYPE_P (type) > && SCALAR_INT_MODE_P (TYPE_MODE (type)) > && (elt = uniform_vector_p (exp)) > && !VECTOR_TYPE_P (TREE_TYPE (elt))) > { > rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)), >expand_normal (elt)); > rtx tmp = gen_reg_rtx (mode); > convert_move (tmp, op0, 0); > > /* Ensure no excess bits are set. > GCN needs this for nunits < 64. > x86 needs this for nunits < 8. */ > auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant (); > if (maybe_ne (GET_MODE_PRECISION (mode), nunits)) > tmp = expand_binop (mode, and_optab, tmp, > GEN_INT ((1 << nunits) - 1), target, > true, OPTAB_WIDEN); > if (tmp != target) > emit_move_insn (target, tmp); > break; > } But that's just for CONSTRUCTORs, we got the VIEW_CONVERT_EXPR path for VECTOR_CSTs. But yeah, that _might_ argue we should perform the same masking for VECTOR_CST expansion as well, instead of trying to fixup in do_compare_and_jump?
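The effect of the excess-bit masking in the quoted store_constructor code can be shown with plain integers. The sketch below is a simplified model, not the expander: it assumes an 8-bit integer mode holding a mask of nunits single-bit lanes, and uniform_mask is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Build a uniform boolean vector mask with 'nunits' lanes stored in an
   8-bit integer.  Sign-extending a single "true" bit sets all eight
   bits, so bits above the lane count are cleared again, in the spirit
   of the (1 << nunits) - 1 masking in the quoted code.  */
uint8_t uniform_mask(int value, unsigned nunits)
{
    uint8_t tmp = value ? 0xFF : 0x00;   /* sign-extended element */
    if (nunits < 8)
        tmp &= (uint8_t) ((1u << nunits) - 1);
    return tmp;
}
```

Without the masking, an all-true 4-lane mask would be 0xFF rather than 0x0F, and an integer comparison against a properly masked value would fail even though all four lanes agree. That is the kind of mismatch a consistent masking for VECTOR_CST expansion would avoid.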
[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659 --- Comment #3 from Richard Biener --- So the issue is similar to gcc.c-torture/execute/20150611-1.c, this time the main exit ends in a path without a virtual use (__builtin_unreachable ()). We can do the same as we do for the alternate exits here.
[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576 --- Comment #27 from Richard Biener --- (In reply to Hongtao Liu from comment #25) > (In reply to Tamar Christina from comment #24) > > Just to avoid confusion, are you still working on this one Richi? > > I'm working on a patch to add a target hook as #c18 mentioned. Not sure a target hook was suggested - I think it was suggested that do_compare_and_jump always masks excess bits for integer mode vector masks?
[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Priority|P3 |P2 --- Comment #6 from Richard Biener --- Honza - ICF seems to fixup points-to sets when merging variables, so there should be a way to kill off flow-sensitive info inside prevailing bodies as well. But would that happen before inlining the body? Can you work on that? I think comparing ranges would weaken ICF unnecessarily?
[Bug ipa/113665] [11/12/13/14 regression] Regular for Loop results in Endless Loop with -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113665 --- Comment #5 from Richard Biener --- Well, ICF figures out the other part of the partial inlined test() are equal and I think they are. The if (i >= S){ return false; } tests are inlined and eliminated (I think correctly so). -fno-partial-inlining also avoids the issue. The issue is that ICF doesn't wipe (or compare) range info so we get after inlining: [local count: 10737416]: goto ; [100.00%] [local count: 1063004409]: # RANGE [irange] long unsigned int [0, 591] NONZERO 0x3ff _5 = (long unsigned int) i_2; # RANGE [irange] unsigned int [0, 287] NONZERO 0x1ff _11 = (unsigned int) _5;
[Bug tree-optimization/113664] False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-30 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #4 from Richard Biener --- Confirmed. As usual it's jump-threading related where we isolate, in the -Warray-bounds case MEM[(char *)1B] = 48; we inline 'f' and then, when s == dot == NULL your code dereferences both NULL and NULL + 1. So the diagnostic messages leave a lot to be desired but in the end they point to a problem in your code which is a guard against a NULL 's'. The jump threading is different with -fwrapv-pointer, in particular without it we just get the NULL dereference which we seem to ignore during array-bound diagnostics. We later isolate the paths as unreachable but that happens after the diagnostic.
[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2024-01-30 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
[Bug tree-optimization/113659] [14 Regression] ICE Segmentation fault since r14-8355-g02e683894942da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113659 --- Comment #1 from Richard Biener --- I will have a look.
[Bug debug/113562] [14 Regression] FAIL: gcc.dg/guality/pr54796.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562 --- Comment #4 from Richard Biener ---
(In reply to Richard Biener from comment #3)
> Just to put it somewhere I ran dwlocstat on cc1plus before/after the
> offending change and it looks almost the same. We go from
>
> cov%      samples        cumul
> 0..10     1280217/38%    1280217/38%
> 11..20    55668/1%       1335885/40%
> 21..30    68004/2%       1403889/42%
> 31..40    70774/2%       1474663/44%
> 41..50    75554/2%       1550217/46%
> 51..60    91816/2%       1642033/49%
> 61..70    101139/3%      1743172/52%
> 71..80    135281/4%      1878453/56%
> 81..90    198470/5%      2076923/62%
> 91..100   1233822/37%    3310745/100%
>
> to
>
> cov%      samples        cumul
> 0..10     1280197/38%    1280197/38%
> 11..20    55669/1%       1335866/40%
> 21..30    68014/2%       1403880/42%
> 31..40    70773/2%       1474653/44%
> 41..50    75542/2%       1550195/46%
> 51..60    91800/2%       1641995/49%
> 61..70    101133/3%      1743128/52%
> 71..80    135259/4%      1878387/56%
> 81..90    198496/5%      2076883/62%
> 91..100   1233844/37%    3310727/100%

And with up-to-date elfutils to avoid some DWARF5 issues

cov%      samples        cumul
0..10     1280347/38%    1280347/38%
11..20    55720/1%       1336067/40%
21..30    68040/2%       1404107/42%
31..40    70805/2%       1474912/44%
41..50    75585/2%       1550497/46%
51..60    91850/2%       1642347/49%
61..70    101224/3%      1743571/52%
71..80    135406/4%      1878977/56%
81..90    198509/5%      2077486/62%
91..100   1233880/37%    3311366/100%

to

cov%      samples        cumul
0..10     1280327/38%    1280327/38%
11..20    55721/1%       1336048/40%
21..30    68050/2%       1404098/42%
31..40    70804/2%       1474902/44%
41..50    75573/2%       1550475/46%
51..60    91834/2%       1642309/49%
61..70    101218/3%      1743527/52%
71..80    135384/4%      1878911/56%
81..90    198535/5%      2077446/62%
91..100   1233902/37%    3311348/100%
[Bug debug/113562] [14 Regression] FAIL: gcc.dg/guality/pr54796.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113562

--- Comment #3 from Richard Biener ---
Just to put it somewhere I ran dwlocstat on cc1plus before/after the
offending change and it looks almost the same.  We go from

cov%      samples        cumul
0..10     1280217/38%    1280217/38%
11..20    55668/1%       1335885/40%
21..30    68004/2%       1403889/42%
31..40    70774/2%       1474663/44%
41..50    75554/2%       1550217/46%
51..60    91816/2%       1642033/49%
61..70    101139/3%      1743172/52%
71..80    135281/4%      1878453/56%
81..90    198470/5%      2076923/62%
91..100   1233822/37%    3310745/100%

to

cov%      samples        cumul
0..10     1280197/38%    1280197/38%
11..20    55669/1%       1335866/40%
21..30    68014/2%       1403880/42%
31..40    70773/2%       1474653/44%
41..50    75542/2%       1550195/46%
51..60    91800/2%       1641995/49%
61..70    101133/3%      1743128/52%
71..80    135259/4%      1878387/56%
81..90    198496/5%      2076883/62%
91..100   1233844/37%    3310727/100%
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

Richard Biener changed:

What|Removed |Added
Attachment #57214|0 |1
is obsolete||

--- Comment #13 from Richard Biener ---
Created attachment 57252
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57252&action=edit
prototype fix

Note when I extended the patch to also cover a PARM_DECL base to extend
coverage I see

FAIL: gcc.dg/torture/pr70421.c -O1 execution test
FAIL: gcc.dg/torture/pr70421.c -O2 execution test
FAIL: gcc.dg/torture/pr70421.c -O3 -g execution test
FAIL: gcc.dg/torture/pr70421.c -Os execution test
FAIL: gcc.dg/torture/pr70421.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr70421.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

on x86_64.  It seems that arg_base_value isn't the correct thing to use
but it eventually should have been unique_base_value
(UNIQUE_BASE_VALUE_ARGP)?

I'm not sure whether all the different unique base values mean we'll not
be able to derive exactly those classes from MEM_EXPRs.
[Bug tree-optimization/113622] [11/12/13 Regression] ICE with vectors in named registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622 Richard Biener changed: What|Removed |Added Known to work||14.0 Summary|[11/12/13/14 Regression]|[11/12/13 Regression] ICE |ICE with vectors in named |with vectors in named |registers |registers --- Comment #19 from Richard Biener --- Should be fixed on trunk, not sure to what extent backporting is suitable.
[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652 Richard Biener changed: What|Removed |Added Target||powerpc Target Milestone|--- |14.0 --- Comment #1 from Richard Biener --- What's the version of binutils you are using?
[Bug middle-end/113651] The GCC optimizer performs poorly on a very simple code snippet.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113651

Richard Biener changed:

What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed||2024-01-29
Ever confirmed|0 |1

--- Comment #1 from Richard Biener ---
Confirmed.  This is a missed phiopt (or operation sinking) of

  if (r.1_90 < 0)
    goto ; [41.00%]
  else
    goto ; [59.00%]

  [local count: 391324129]:
  _91 = _89 ^ 79764919;

  [local count: 954449104]:
  # prephitmp_92 = PHI <_91(6), _89(5)>

to something like

  if (r.1_90 < 0)
    goto ; [41.00%]
  else
    goto ; [59.00%]

  [local count: 391324129]:

  [local count: 954449104]:
  # prephitmp_91 = PHI <79764919(6), 0(5)>
  _92 = _89 ^ prephitmp_xx;

On some archs the conditional constant might be generated by a
conditional add of 79764919 to zero.  Whether this is better suited for
GIMPLE or RTL if-conversion remains to be seen.

That splitting the expression helps is just luck.
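The sinking described in the comment above can be pictured at source level. The following is only a sketch: the function names `step_branchy` and `step_sunk` and the 32-bit operand types are assumptions made for illustration and do not come from the bug report; only the constant 79764919 is taken from the GIMPLE.

```c
#include <stdint.h>

/* Branchy form: the XOR sits on one path only, so the join point needs
   a PHI that merges the XORed and the untouched value.  */
uint32_t step_branchy (uint32_t v, int32_t r)
{
  if (r < 0)
    v ^= 79764919u;
  return v;
}

/* Sunk form: the branch merely selects a constant (79764919 or 0) and
   one unconditional XOR follows the join.  */
uint32_t step_sunk (uint32_t v, int32_t r)
{
  uint32_t k = r < 0 ? 79764919u : 0u;
  return v ^ k;
}
```

Both functions compute the same value for every input, since XOR with zero is the identity; the second shape is what leaves the target free to materialize the constant conditionally.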
[Bug c/113650] __builtin_nonlocal_goto ICEs when passed 0 as arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113650 --- Comment #1 from Richard Biener --- I don't think these are supposed to be used by the user ...
[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622

--- Comment #16 from Richard Biener ---
typedef double __attribute__ ((vector_size (16))) vec;

void
test (void)
{
  register vec a asm("xmm1"), b asm("xmm2"), c asm("xmm3");

  for (int i = 0; i < 2; i++)
    c[i] = a[i] < b[i] ? 0.1 : 0.2;
}

also ICEs with -O0 -msse.
[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622 --- Comment #15 from Richard Biener --- (In reply to Jakub Jelinek from comment #11) > I think it is most important we don't ICE and generate correct code. I > doubt this is used too much in real-world code, otherwise it would have been > reported years ago, so how efficient it will be is less important. We do spill on the read side already. On the write side the ICE is because of r0-71337-g1e188d1e130034. Note we're spilling parts of bitpos to offset: /* Otherwise, split it up. */ if (offset) { /* Avoid returning a negative bitpos as this may wreak havoc later. */ if (!bit_offset.to_shwi (pbitpos) || maybe_lt (*pbitpos, 0)) { *pbitpos = num_trailing_bits (bit_offset.force_shwi ()); poly_offset_int bytes = bits_to_bytes_round_down (bit_offset); offset = size_binop (PLUS_EXPR, offset, build_int_cst (sizetype, bytes.force_shwi ())); } *poffset = offset; but it can also be large positive when the bit amount doesn't fit a HWI. The flow of 'to' expansion is a bit awkward, but the following properly spills in case of variable offset and non-MEM_P: diff --git a/gcc/expr.cc b/gcc/expr.cc index ee822c11dce..f54d0b1474e 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -6061,6 +6061,7 @@ expand_assignment (tree to, tree from, bool nontemporal) to_rtx = adjust_address (to_rtx, BLKmode, 0); } + rtx stemp = NULL_RTX, old_to_rtx = NULL_RTX; if (offset != 0) { machine_mode address_mode; @@ -6070,9 +6071,24 @@ expand_assignment (tree to, tree from, bool nontemporal) { /* We can get constant negative offsets into arrays with broken user code. Translate this to a trap instead of ICEing. */ - gcc_assert (TREE_CODE (offset) == INTEGER_CST); - expand_builtin_trap (); - to_rtx = gen_rtx_MEM (BLKmode, const0_rtx); + if (TREE_CODE (offset) == INTEGER_CST) + { + expand_builtin_trap (); + to_rtx = gen_rtx_MEM (BLKmode, const0_rtx); + } + /* Else spill for variable offset to the destination. 
*/ + else + { + gcc_assert (!TREE_CODE (from) == CALL_EXPR + && COMPLETE_TYPE_P (TREE_TYPE (from)) + && (TREE_CODE (TYPE_SIZE (TREE_TYPE (from))) + != INTEGER_CST)); + stemp = assign_stack_temp (GET_MODE (to_rtx), +GET_MODE_SIZE (GET_MODE (to_rtx))); + emit_move_insn (stemp, to_rtx); + old_to_rtx = to_rtx; + to_rtx = stemp; + } } offset_rtx = expand_expr (offset, NULL_RTX, VOIDmode, EXPAND_SUM); @@ -6305,6 +6321,9 @@ expand_assignment (tree to, tree from, bool nontemporal) bitregion_start, bitregion_end, mode1, from, get_alias_set (to), nontemporal, reversep); + /* Move the temporary storage back to the non-MEM_P. */ + if (stemp) + emit_move_insn (old_to_rtx, stemp); } if (result)
[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622 --- Comment #10 from Richard Biener --- (In reply to Jakub Jelinek from comment #8) > Guess for an rvalue (if even that crashes) we want to expand it to some > permutation or whole vector shift which moves the indexed elements first and > then extract it, for lvalue we need to insert it similarly. If we can we should match this up with .VEC_SET / .VEC_EXTRACT, otherwise we should go "simple" and spill. diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc index 7e2392ecd38..e94f292dd38 100644 --- a/gcc/gimple-isel.cc +++ b/gcc/gimple-isel.cc @@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun, machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0)); machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref)); - if (auto_var_in_fn_p (view_op0, fun->decl) + if ((auto_var_in_fn_p (view_op0, fun->decl) + || DECL_HARD_REGISTER (view_op0)) && !TREE_ADDRESSABLE (view_op0) && ((!is_extract && can_vec_set_var_idx_p (outermode)) || (is_extract ensures the former and fixes the ICE on x86_64 on trunk. The comment#5 testcase then results in the following loop: .L3: movslq %eax, %rdx vmovaps %zmm2, -56(%rsp) vmovaps %zmm0, -120(%rsp) vmovss -120(%rsp,%rdx,4), %xmm4 vmovss -56(%rsp,%rdx,4), %xmm3 vcmpltss%xmm4, %xmm3, %xmm3 vpbroadcastd%eax, %zmm4 addl$1, %eax vpcmpd $0, %zmm7, %zmm4, %k1 vblendvps %xmm3, %xmm5, %xmm6, %xmm3 vbroadcastss%xmm3, %zmm1{%k1} cmpl$8, %eax jne .L3 this isn't optimal of course, for optimality we need vectorization. But we still need to avoid the ICEs since vectorization can be disabled. That said, I'm quite sure in code using hard registers people are not doing such stupid things so I wonder how important it is to avoid "regressing" the vectorization here.
[Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646 Richard Biener changed: What|Removed |Added Keywords||missed-optimization --- Comment #1 from Richard Biener --- Did you try with -fprofile-partial-training (is that default on? it probably should ...). Can you please try training with the rate data instead of train to rule out a mismatch?
[Bug c/113631] FAIL: gcc.dg/pr7356.c, fix still fails with #pragma
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113631 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-29 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Version|unknown |14.0 --- Comment #1 from Richard Biener --- :1:2: error: expected ';' before 'typedef' 1 | a a.h:1:9: error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma' 1 | #pragma message "foo" | ^~~ as it's a different message it's likely using a different location to highlight the issue. In general it's difficult to tell whether pointing to the first token sequence in the #included file or the last token before the #include directive is better here. Of course the pragma location should underline either #pragma or the whole #pragma, not just 'message'. Btw, same issue without the #include: a #pragma message "foo" vs. a typedef int b; I'm not sure it makes sense to special case the situation we've switched files?
[Bug c++/113644] [14 regression] ICE when building libcxxabi-16.0.6 since r14-6520
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113644 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug tree-optimization/113630] [11/12/13/14 Regression] -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

Richard Biener changed:

What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener ---
(In reply to Andrew Pinski from comment #2)
> Confirmed.
>
> I really think what PRE does is correct here since we have an aliasing set
> of 0 for both. Now what is incorrect is hoist_adjacent_loads which cannot do
> either of any of the aliasing sets are 0 ...
>
>
>
> I think even the function below is valid for non-strict aliasing:
> ```
> int __attribute__((noipa,noinline))
> f(struct S *p, int c, int d)
> {
>   int r;
>   if (c)
>     {
>       r = ((struct M*)p)->a;
>     }
>   else
>     r = ((struct M*)p)->b;
>   return r;
> }
> ```
>
> That is hoist_adjacent_loads is broken for non-strict-aliasing in general
> and has been since 4.8.0 when it was added (r0-117275-g372a6eb8d991eb).

It looks like it relies on

      /* The zeroth operand of the two component references must be
         identical.  It is not sufficient to compare get_base_address of
         the two references, because this could allow for different
         elements of the same array in the two trees.  It is not safe to
         assume that the existence of one array element implies the
         existence of a different one.  */
      if (!operand_equal_p (TREE_OPERAND (ref1, 0), TREE_OPERAND (ref2, 0), 0))
        continue;

for the correctness test.  Note the MEM accesses are of size
sizeof (struct M).  With -fno-strict-aliasing we're not wiping that detail
so I think it _is_ a bug in PRE that it merges the two accesses.

I'll have a more detailed look.
[Bug tree-optimization/113630] [11/12/13/14 Regression] -fno-strict-aliasing introduces out-of-bounds memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113630

--- Comment #4 from Richard Biener ---
(In reply to Andrew Pinski from comment #3)
> Note LLVM produces decent code here by only using one load:
> ```
>         xor     eax, eax
>         test    esi, esi
>         sete    al
>         mov     eax, dword ptr [rdi + 4*rax]
> ```
>
> Maybe GCC could do the same ...

IIRC there are duplicate bugs about this - phiprop does kind-of the
reverse.  The sink pass can now sink two identical stores but doesn't try
sinking a "compatible" store by introducing a PHI for the address.

      /* ??? We could handle differing SSA uses in the LHS by inserting
         PHIs for them.  */
      else if (! operand_equal_p (gimple_assign_lhs (first_store),
                                  gimple_assign_lhs (def), 0)
               || (gimple_clobber_p (first_store)
                   != gimple_clobber_p (def)))
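The "PHI for the address" idea can be sketched in C for the load case discussed in the quoted comment. This is an illustration under assumptions: the layout of `struct M` (two `int` members `a` and `b`) and the names `load_branchy` and `load_selected` are hypothetical and only modelled on the testcase shape quoted in this PR.

```c
/* Assumed layout for illustration; the real testcase may differ.  */
struct M { int a; int b; };

/* One load per branch, merged by a PHI of the loaded values.  */
int load_branchy (const struct M *p, int c)
{
  int r;
  if (c)
    r = p->a;
  else
    r = p->b;
  return r;
}

/* The branch selects an address instead (a PHI of pointers at the
   GIMPLE level), followed by a single load.  */
int load_selected (const struct M *p, int c)
{
  const int *q = c ? &p->a : &p->b;
  return *q;
}
```

Only the member that the original code would have read is dereferenced, so the second form does not assume that the other member exists or is readable.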
[Bug target/113625] Interesting behavior with and without -mcpu=generic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113625 --- Comment #1 from Richard Biener --- Other targets (x86_64) default to -mtune=generic. Maybe configure time selection somehow interferes with this on aarch64?
[Bug tree-optimization/113622] [11/12/13/14 Regression] ICE with vectors in named registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #9 from Richard Biener --- I will have a look.
[Bug target/113618] [14 Regression] AArch64: memmove idiom regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113618

--- Comment #2 from Richard Biener ---
It might be good to recognize this pattern in strlenopt or a related pass.

A purely local transform would turn it into

  memcpy (temp, a, 64);
  memmove (b, a, 64);

relying on DSE to eliminate the copy to temp if possible.  Not sure if that
possibly would be a bad transform if copying to temp is required.

        stp     q30, q31, [sp]
        ldp     q30, q31, [sp]

why is CSE not able to catch this?
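For readers without the testcase at hand, the transform can be sketched as follows. The shape of the idiom (two memcpy calls through a 64-byte temporary) is an assumption inferred from the comment above, and the function names `copy_via_temp` and `copy_memmove` are hypothetical.

```c
#include <string.h>

/* Assumed shape of the idiom: bounce through a temporary so that
   overlapping source and destination are handled correctly.  */
void copy_via_temp (char *b, const char *a)
{
  char temp[64];
  memcpy (temp, a, 64);
  memcpy (b, temp, 64);
}

/* What a recognizing pass could emit once the temporary is dead:
   a single memmove, which is defined for overlapping regions.  */
void copy_memmove (char *b, const char *a)
{
  memmove (b, a, 64);
}
```

The two functions are interchangeable only as long as nothing else reads `temp`, which is why the comment leans on DSE to remove the first copy.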
[Bug debug/103047] Inconsistent arguments ordering for inlined subroutine
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103047 Richard Biener changed: What|Removed |Added Known to work||14.0 Target Milestone|--- |14.0 Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #3 from Richard Biener --- Fixed for GCC 14.
[Bug debug/29461] inconsistent variable output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29461 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Richard Biener --- <2><6c>: Abbrev Number: 8 (DW_TAG_formal_parameter) <6d> DW_AT_name: s_p <71> DW_AT_decl_file : 1 <72> DW_AT_decl_line : 3 <73> DW_AT_decl_column : 19 <74> DW_AT_type: <0x8a> <78> DW_AT_location: 2 byte block: 91 58 (DW_OP_fbreg: -40) <2><7b>: Abbrev Number: 9 (DW_TAG_variable) <7c> DW_AT_name: ss <7f> DW_AT_decl_file : 1 <80> DW_AT_decl_line : 5 <81> DW_AT_decl_column : 20 <82> DW_AT_type: <0x46> <86> DW_AT_location: 2 byte block: 91 68 (DW_OP_fbreg: -24) so we can and do now make those equal. With -O1 we have <2><6c>: Abbrev Number: 9 (DW_TAG_formal_parameter) <6d> DW_AT_name: s_p <71> DW_AT_decl_file : 1 <72> DW_AT_decl_line : 3 <73> DW_AT_decl_column : 19 <74> DW_AT_type: <0xc0> <78> DW_AT_location: 0x12 (location list) <7c> DW_AT_GNU_locviews: 0xc <2><80>: Abbrev Number: 10 (DW_TAG_variable) <81> DW_AT_name: ss <84> DW_AT_decl_file : 1 <85> DW_AT_decl_line : 5 <86> DW_AT_decl_column : 20 <87> DW_AT_type: <0x46> <8b> DW_AT_location: 0x2b (location list) <8f> DW_AT_GNU_locviews: 0x25 0012 v000 v000 views at 000c for: 0008 (DW_OP_reg5 (rdi)) 0017 v000 v000 views at 000e for: 0008 0012 (DW_OP_reg3 (rbx)) 001c v000 v000 views at 0010 for: 0012 0013 (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value) 0024 002b v001 v000 views at 0025 for: 0004 0008 (DW_OP_reg5 (rdi)) 0030 v000 v000 views at 0027 for: 0008 0012 (DW_OP_reg3 (rbx)) 0035 v000 v000 views at 0029 for: 0012 0013 (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value) 003d which is nearly equivalent and I suppose "correct" in that we're not showing it live during the prologue before the declaration/assignment func2: .LVL0: .LFB0: .file 1 "t.c" .loc 1 4 1 view -0 .cfi_startproc .loc 1 4 1 is_stmt 0 view .LVU1 pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq%rdi, %rbx .loc 1 5 3 is_stmt 1 view .LVU2 .LVL1: 
        .loc 1 6 3 view .LVU3
        call    func

-O0 vs -O is also -fno-var-tracking vs. -fvar-tracking of course.
[Bug debug/27672] C frontend does not generate line information for multi-line conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27672 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Richard Biener --- Yes, this particular form seems fixed.
[Bug ada/26827] "GNAT BUG DETECTED" on compile GPS 1.3.1/gtkada
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26827 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-01-26 CC||dkm at gcc dot gnu.org Component|debug |ada Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 --- Comment #6 from Richard Biener --- Is this still an issue?
[Bug debug/103047] Inconsistent arguments ordering for inlined subroutine
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103047 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2024-01-26 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Known to fail||13.2.1 Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Richard Biener --- Confirmed, still happens. But maybe this is a also gdb issue as we have for the similar case static inline int foo (int a, int b) { volatile int x = a + b; return x; } int main() { int c = 1; int d = 2; int res = foo (c, d); return res; } <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram) <2e> DW_AT_external: 1 <2e> DW_AT_name: (indirect string, offset: 0x6f): main ... <2><79>: Abbrev Number: 5 (DW_TAG_inlined_subroutine) <7a> DW_AT_abstract_origin: <0xca> <7e> DW_AT_entry_pc: 0 <86> DW_AT_GNU_entry_view: 4 <87> DW_AT_low_pc : 0 <8f> DW_AT_high_pc : 0xc <97> DW_AT_call_file : 1 <98> DW_AT_call_line : 11 <99> DW_AT_call_column : 13 <3><9a>: Abbrev Number: 6 (DW_TAG_formal_parameter) <9b> DW_AT_abstract_origin: <0xe1> <9f> DW_AT_location: 0x27 (location list) DW_AT_GNU_locviews: 0x25 <3>: Abbrev Number: 6 (DW_TAG_formal_parameter) DW_AT_abstract_origin: <0xd7> DW_AT_location: 0x4d (location list) DW_AT_GNU_locviews: 0x4b ... <1>: Abbrev Number: 10 (DW_TAG_subprogram) DW_AT_name: foo DW_AT_decl_file : 1 DW_AT_decl_line : 1 DW_AT_decl_column : 19 DW_AT_prototyped : 1 DW_AT_type: <0xbe> DW_AT_inline : 3(declared as inline and inlined) <2>: Abbrev Number: 11 (DW_TAG_formal_parameter) DW_AT_name: a DW_AT_decl_file : 1 DW_AT_decl_line : 1 DW_AT_decl_column : 28 DW_AT_type: <0xbe> <2>: Abbrev Number: 11 (DW_TAG_formal_parameter) DW_AT_name: b DW_AT_decl_file : 1 DW_AT_decl_line : 1 DW_AT_decl_column : 35 DW_AT_type: <0xbe> so it could look at the actual function for determining the order. The order of the formal parameters are reversed because the fake scope BLOCK the inliner adds has those as variables in that reverse order. We output them via decls_for_scope. 
static gimple * setup_one_parameter (copy_body_data *id, tree p, tree value, tree fn, basic_block bb, tree *vars) { ... /* Declare this new variable. */ DECL_CHAIN (var) = *vars; *vars = var; I have a patch.
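Why the two assignments quoted above reverse the parameter order can be seen with a toy chain. This is a sketch, not GCC code: `struct decl`, its `chain` field and `push_decl` are hypothetical stand-ins for a tree decl, `DECL_CHAIN` and the quoted statements.

```c
#include <stddef.h>

/* Toy stand-in for a decl with a DECL_CHAIN-like link.  */
struct decl
{
  const char *name;
  struct decl *chain;
};

/* Same shape as "DECL_CHAIN (var) = *vars; *vars = var;": the new
   variable is pushed onto the front of the list.  */
void push_decl (struct decl **vars, struct decl *var)
{
  var->chain = *vars;
  *vars = var;
}
```

Pushing `a` and then `b` yields the chain `b -> a`, so anything that walks the chain front to back sees the parameters in reverse declaration order.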
[Bug debug/23551] dwarf records for inlines appear incomplete
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23551 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #21 from Richard Biener --- There is PR103047 for that now.
[Bug debug/19954] Compiler emits incomplete structure type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19954 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Known to work||13.2.1, 7.5.0 Status|NEW |RESOLVED --- Comment #4 from Richard Biener --- (gdb) n 6float* pt = d1.getData(1); /* set breakpoint here */ (gdb) ptype d1 type = class Derived1 { private: int mySize; int myId; float *myPointer; public: Derived1(int); ~Derived1(); virtual int getId(void); virtual float * getData(int); } works now as also verified in PR12385. Verified with GCC 7 and GCC 13.
[Bug debug/14169] Unneeded base types output in dwarf2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14169 Richard Biener changed: What|Removed |Added Last reconfirmed|2005-12-28 06:11:40 |2024-1-26 --- Comment #2 from Richard Biener --- Re-confirmed. With -fno-eliminate-unused-debug-types we output everything. I think we never prune unused base types, nor do we prune unused namespaces.
[Bug tree-optimization/113539] [14 Regression] perlbench miscompiled on aarch64 since r14-8223-g1c1853a70f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113539 --- Comment #8 from Richard Biener --- Does this still happen after r14-8413-g578c7b91f418eb?
[Bug tree-optimization/113467] [14 regression] libgcrypt-1.10.3 is miscompiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467 --- Comment #22 from Richard Biener --- Is this fixed meanwhile?
[Bug c/85800] A miscompilation bug with unsigned char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800 Richard Biener changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #6 from Richard Biener --- Yeah. I think we have enough duplicates that show cases where conditional equivalence propagation introduces these issues. Here it's already present in the source.
[Bug target/113615] New: internal compiler error: in extract_insn, at recog.cc:2812
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113615

Bug ID: 113615
Summary: internal compiler error: in extract_insn, at recog.cc:2812
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

I'm seeing a lot of ICEs like this when running libgomp testsuite with
offloading for gfx1030.

/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90: In function 'accum_._omp_fn.1':
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: error: unrecognizable insn:
(insn 108 107 109 6 (set (reg:V8SF 849)
        (unspec:V8SF [
                (reg:V8SF 844 [ vect__43.12_106 ]) repeated x2
                (const_int 1 [0x1])
            ] UNSPEC_PLUS_DPP_SHR)) "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90":22:29 discrim 1 -1
     (nil))
during RTL pass: vregs
/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/declare_target-4.f90:20:38: internal compiler error: in extract_insn, at recog.cc:2812

other ones:

(insn 93 92 94 7 (set (reg:V64DF 805)
        (unspec:V64DF [
                (reg:V64DF 802 [ vect__31.53_89 ])
                (const_int 1 [0x1])
            ] UNSPEC_MOV_DPP_SHR)) "/space/rguenther/src/gcc-autopar_devel/libgomp/testsuite/libgomp.fortran/examples-4/target_data-3.f90":51:41 -1
[Bug tree-optimization/113602] ICE: in vn_reference_maybe_forwprop_address, at tree-ssa-sccvn.cc:1426 with invalid _BitInt() register asm with -O2 -fno-tree-loop-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113602 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Richard Biener --- Fixed.
[Bug c++/113612] [13/14 Regression] ICE: SIGSEGV in get_template_info (pt.cc:378) or tree_check (tree.h:3611) with invalid -fpreprocessed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113612 Richard Biener changed: What|Removed |Added Priority|P3 |P4
[Bug tree-optimization/113602] ICE: in vn_reference_maybe_forwprop_address, at tree-ssa-sccvn.cc:1426 with invalid _BitInt() register asm with -O2 -fno-tree-loop-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113602 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #2 from Richard Biener --- (gdb) p tem.last () $2 = (vn_reference_op_struct &) @0x7fffc820: {opcode = VAR_DECL, clique = 0, base = 0, reverse = 0, align = 0, off = {coeffs = {-1}}, type = , op0 = , op1 = , op2 = } (gdb) p debug_vn_reference_ops (tem) {array_ref<_4,0,1>,view_convert_expr,r} (gdb) p debug_generic_expr (addr) _CONVERT_EXPR(r)[_4] We're valueizing MEM [(_BitInt(503) *)vectp.5_18] trying to forward the vectp.5_18 def vectp.5_18 = _CONVERT_EXPR(r)[_4]; but we're not anticipating this shape of a non-invariant ADDR_EXPR. We do wrap all VAR_DECLs but DECL_HARD_REGISTER inside a MEM_REF but then a DECL_HARD_REGISTER shouldn't be addressable so the IL is actually invalid, generated by vectorization (but not diagnosed by IL checking). I'm not sure to what extent we should try to paper over this though ... The following works for me: diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc index ae55bf6aa48..f37734b5340 100644 --- a/gcc/tree-data-ref.cc +++ b/gcc/tree-data-ref.cc @@ -1182,7 +1182,12 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref, base = TREE_OPERAND (base, 0); } else -base = build_fold_addr_expr (base); +{ + if (may_be_nonaddressable_p (base)) + return opt_result::failure_at (stmt, + "failed: base not addressable.\n"); + base = build_fold_addr_expr (base); +} if (in_loop) {
[Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #3 from Richard Biener ---
I'll note that two-lane reductions in particular (or two-lane BB
vectorization in general) are hardly profitable on modern x86 uarchs unless
the vectorized code is interleaved with other non-vectorized code that can
execute at the same time.  Vectorizing two lanes only makes them dependent
on each other, while modern uarchs have no difficulty executing the
non-vectorized lanes in parallel (without the tied dependences).  It's only
when there's sufficient benefit, aka more lanes, approaching the issue width
or the number of available ports for the ops, or the whole SLP mostly
consisting of loads/stores, that BB vectorization is going to be profitable.

Note the cost model only ever looks at the stmts participating in the
vectorization, not the "surrounding" code, and it would be difficult to
include that since the schedule on GIMPLE isn't even close to what we get
later.

The reduction op is also a serialization point on the scalar side of
course; whether that means that BB reductions with two lanes are possibly
better candidates than grouped BB stores with two lanes is another question.
The BB reduction op itself is costed properly.

So the 525.x264_r case might be loop vectorization; OTOH the epilogue cost
is hardly ever a knob that decides whether a vectorization is profitable.
I think we need to figure out what exactly gets slower (and hope it's not
scattered all over the place).
[Bug middle-end/113596] Stack memory leakage caused by inline alloca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113596 --- Comment #10 from Richard Biener --- (In reply to Jakub Jelinek from comment #9) > Created attachment 57215 [details] > gcc14-pr113596.patch > > Untested patch to do that. > The disadvantage of doing that is that it may penalize inline calls which > just use VLAs, because calls_alloca covers even those functions. For simple: > static inline __attribute__((always_inline)) void > foo (int n) > { > char p[n]; > bar (p, n); > } > the fab1 pass actually removes redundant pair of stack_save/stack_restore, > but > bet if it would be something like { call (); { char p[n]; bar (p, n); } call > (); } then it wouldn't. > Anyway, this isn't a regression, so I think it is stage1 material for GCC 15. Most definitely. We can make ->calls_alloca more precise though of course we usually also do not want to inline functions with VLAs. IIRC a VLA forces a frame pointer for the caller then.
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #12 from Richard Biener ---
Created attachment 57214
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57214&action=edit
prototype fix

The attached prototype fixes the testcase for me.
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597 --- Comment #11 from Richard Biener --- In DSE the only differences is fbt (0x751a1a50: (plus:DI (reg/v/f:DI 117 [ u ]) -(reg:DI 146 [ _44 ]))) == (address 0) +(reg:DI 146 [ _44 ]))) == (nil) fbt (0x7700b3c0: (reg/f:DI 64 sfp)) == (address:DI -3) -bac false +bac true that's for (mem:BLK (reg/f:DI 64 sfp) [0 A8]) vs (mem:V4SF (plus:DI (reg/v/f:DI 117 [ u ]) (reg:DI 146 [ _44 ])) [0 MEM <__Float32x4_t> [(float * {ref-all})_42]+0 S16 A32]) from #0 0x02ff3796 in scan_reads (insn_info=0x5e5b680, gen=0x5ec2338, kill=0x5ec2358) at /space/rguenther/src/gcc/gcc/dse.cc:3156 #1 0x02ff39b1 in dse_step3_scan (bb=) at /space/rguenther/src/gcc/gcc/dse.cc:3238 processing (insn 62 61 64 5 (set (reg:V4SF 147 [ MEM <__Float32x4_t> [(float * {ref-all})_42] ]) (mem:V4SF (plus:DI (reg/v/f:DI 117 [ u ]) (reg:DI 146 [ _44 ])) [0 MEM <__Float32x4_t> [(float * {ref-all})_42]+0 S16 A32])) "include/arm_neon.h":12531:36 1274 {*aarch64_simd_movv4sf} (expr_list:REG_DEAD (reg:DI 146 [ _44 ]) (nil))) in this case we have _44 point to NONLOCAL only. It got arg_base_value as base value (from the MEM_EXPR and that points-to set we could eventually derive this very same base term as well). But I'll note that (mem:BLK (reg/f:DI 64 sfp) [0 A8]) is artificial, generated by DSE get_group_info via record_store on (insn 13 12 14 2 (set (mem/c:V2x16QI (reg/f:DI 119) [0 +0 S32 A128]) (unspec:V2x16QI [ (reg:V16QI 121) repeated x2 ] UNSPEC_STP)) "t.cc":12:10 discrim 1 92 {*store_pair_16} (nil)) which is figured to be const_or_frame_p () based. That notably lacks a MEM_EXPR (though the bare MEM means only base_alias_check would ever be able to disambiguate here).
[Bug other/113575] [14 Regression] memory hog building insn-opinit.o (i686-linux-gnu -> riscv64-linux-gnu)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575 --- Comment #13 from Richard Biener --- (In reply to Robin Dapp from comment #12) > Created attachment 57209 [details] > Tentative > > I tested the attached "fix". On my machine with 13.2 host compiler it > reduced the build time for insn-opinit.cc from > 4 mins to < 2 mins and the > memory usage from >1G to 600ish M. I didn't observe 3.5G before, though. > > For now I just went with an arbitrary threshold of 5000 patterns and > splitting into 10 functions. After testing on x86 and aarch64 I realized > that both have <3000 patterns so right now it would only split riscv's init > function. > > Or rather the other way, i.e. splitting into fixed-size chunks (of 1000) > instead? Yeah, I'd simplify it by doing exactly that.
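The fixed-size chunking agreed on above comes down to a ceiling division plus a clamped range per generated function. The sketch below is not taken from the generator program; `num_chunks` and `chunk_range` are hypothetical helper names, and the chunk size of 1000 from the discussion is passed in as a parameter.

```c
#include <stddef.h>

/* Number of init functions needed when each one handles at most CHUNK
   patterns (ceiling division).  CHUNK must be nonzero.  */
size_t num_chunks (size_t n_patterns, size_t chunk)
{
  return (n_patterns + chunk - 1) / chunk;
}

/* Half-open range [*begin, *end) of pattern indices handled by the
   function with index IDX; the last range is clamped to N_PATTERNS.  */
void chunk_range (size_t n_patterns, size_t chunk, size_t idx,
                  size_t *begin, size_t *end)
{
  *begin = idx * chunk;
  if (*begin > n_patterns)
    *begin = n_patterns;
  *end = *begin + chunk;
  if (*end > n_patterns)
    *end = n_patterns;
}
```

Compared with a threshold of 5000 patterns and a fixed count of 10 functions, this keeps every generated function bounded in size regardless of how many patterns a target has, and targets with fewer than 1000 patterns still get a single function.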
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #10 from Richard Biener ---
Created attachment 57212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57212&action=edit
patch for debugging

Btw, I've used the attached to investigate other issues with the change.  It
will show the outcome of base_alias_check and find_base_term in dumps.

One issue is that we're much more dependent on MEM_EXPRs being present.
Before figuring there wouldn't be many important regressions the idea was
to, instead of doing find_base_term, have a known base value recorded in the
MEM_ATTRs, and as the only important ones should be the special ones for
argument frame and stack-based, represent that by an enum (rather than the
other possibility of using ADDRESS).

I'll also note that for spill slots we get around to use spill_slot_decl and
set_mem_attrs_for_spill.  I've not yet convinced myself that the other
special bases we have really form a completely separate memory class.  But
if they do then accesses should do something similar there (but mind
scheduling of frame related instructions ...).

Argument stack slots are one important class, set up by
init_alias_analysis.  But those are also backed by regular decls at times
(but not always)?

assign_stack_temp "allocated" memory is another class, we're reusing slots
during RTL expansion and they get (even if shared) a specific alias set.  I
don't think we ever release those temps and say re-use the space for
spilling so assigning a different decl to each slot should eventually work.
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-25
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
[Bug rtl-optimization/113597] [14 Regression] aarch64: Significant code quality regression since r14-8346-ga98d5130a6dcff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113597

--- Comment #1 from Richard Biener ---
I will have a look - but can you explain to me what I see? I suppose the testcase was reduced from something? Is the assembly diff complete? That is, do we really have more fmla or are they just moved?

+ stp q31, q31, [sp, 256]

that's a store? A paired store? Aka, the sequence fills a stack(?) region with replications of q31?
[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #15 from Richard Biener ---
(In reply to Richard Sandiford from comment #13)
> I don't think there's any principle that upper bits must be zero.
> How do we end up with a pattern that depends on that being the case?

I think the problem is the cbranch pattern which looks at all of the QImode mask - but of course it doesn't know it's really V4BImode it's working on ...

If there's no principle that the upper bits should be zero I think we need a way for the target to say so.
[Bug middle-end/113596] Stack memory leakage caused by inline alloca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113596 --- Comment #7 from Richard Biener --- In theory, if somebody really wanted it, we could replace alloca with __builtin_stack_save/restore during inlining (not sure if it would simply work, and be efficient, by just putting save at the start of the function and restore at the end). We could also warn when (forced-)inlining a function calling alloca.
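As a rough illustration of the save/restore idea, here is what the transformation would look like if written by hand around an alloca that ended up inside a loop. This is only a sketch: it assumes a compiler that accepts `__builtin_stack_save`, `__builtin_stack_restore` and `__builtin_alloca` in user code (GCC and Clang do), and `sum_with_save_restore` is a made-up name, not anything from the bug report.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical example: the loop body stands in for an inlined callee that
// calls alloca.  Without the save/restore pair every iteration would grow
// the caller's frame, since alloca'ed memory is only released when the
// enclosing function returns.
int sum_with_save_restore (int iterations, int n)
{
  assert (iterations >= 0 && n > 0);
  int total = 0;
  for (int it = 0; it < iterations; ++it)
    {
      // Remember the stack pointer before the "inlined" body runs.
      void *saved_sp = __builtin_stack_save ();

      char *buf = static_cast<char *> (__builtin_alloca (n));
      std::memset (buf, 1, n);
      for (int k = 0; k < n; ++k)
        total += buf[k];

      // Release the alloca'ed block; buf must not be used after this.
      __builtin_stack_restore (saved_sp);
    }
  return total;
}
```

The restore is only valid because nothing allocated in the body is referenced afterwards, which is the same condition the inliner would have to establish.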
[Bug tree-optimization/113592] missed partial sum optimization in vectorizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #4 from Richard Biener ---
The vectorizer for the original testcase generates

  # vect_sum_20.8_49 = PHI ...
  vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
  vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
  vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
  vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
  _9 = _5 * _8;
  vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
  vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
  vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
  vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
  sum_16 = _9 + sum_20;

the adds are from the optimization to reduce the number of reduction IVs (we could alternatively keep them independent with 4 IVs and handle the reducing in the epilogue). This is to reduce register pressure. But this also shows that, if the issue isn't the multiple IVs, this could be handled by reassoc + FMA forming, given the vectorizer itself doesn't produce FMAs here.
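The two reduction shapes discussed above can be sketched in scalar C++. This is only an analogy: in the dump each `vect_` name is a whole vector, whereas here each product is a single float, and the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Chained form: a single reduction variable is threaded through all four
// adds, like vect_sum_16.21_72 .. _75 in the dump.  Fewer live
// accumulators, but each add depends on the previous one.
float dot_chained (const float *a, const float *b, std::size_t n)
{
  assert (n % 4 == 0);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; i += 4)
    {
      sum = a[i] * b[i] + sum;
      sum = a[i + 1] * b[i + 1] + sum;
      sum = a[i + 2] * b[i + 2] + sum;
      sum = a[i + 3] * b[i + 3] + sum;
    }
  return sum;
}

// Independent form: four reduction variables that are only combined in
// the epilogue.  More registers are live, but the adds in one iteration
// do not depend on each other.
float dot_independent (const float *a, const float *b, std::size_t n)
{
  assert (n % 4 == 0);
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  for (std::size_t i = 0; i < n; i += 4)
    {
      s0 = a[i] * b[i] + s0;
      s1 = a[i + 1] * b[i + 1] + s1;
      s2 = a[i + 2] * b[i + 2] + s2;
      s3 = a[i + 3] * b[i + 3] + s3;
    }
  return (s0 + s1) + (s2 + s3);
}
```

The two forms associate the floating-point adds differently, so in general they only agree exactly when no rounding occurs.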
[Bug tree-optimization/113590] The vectorizer introduces signed overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113590

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2024-01-25
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
Confirmed. Should be reasonably easy to fix - we either move all induction variable updates to the latch or compute them with unsigned arithmetic (we usually prefer an empty latch).
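The "compute them with unsigned arithmetic" option can be illustrated at source level. The helper below is hypothetical and is not what the vectorizer emits; it only shows why doing the update in the unsigned type avoids introducing undefined signed overflow for an update whose result may never be used. The conversion back to `int` wraps modulo 2^N, which is guaranteed since C++20 and is what GCC documents for earlier standards.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: advance a signed induction variable by `step`
// without ever performing a signed addition.  Unsigned addition wraps,
// so an "extra" update past the last used value is harmless.
int advance_iv (int iv, int step)
{
  unsigned int next
    = static_cast<unsigned int> (iv) + static_cast<unsigned int> (step);
  return static_cast<int> (next);
}
```

For in-range values this is identical to `iv + step`; the only difference is that the out-of-range case wraps instead of being undefined.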
[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583

--- Comment #8 from Richard Biener ---
(In reply to JuzheZhong from comment #7)
>
> But I wonder if we see it is beneficial on some boards, could you teach us
> how we can enable vectorization for such case according to uarchs ?

If you figure out how to optimally vectorize this for a given uarch I can definitely guide you.
[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #11 from Richard Biener ---
(In reply to Hongtao Liu from comment #8)
> maybe
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 1fd957288d4..6d321f9baef 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8035,6 +8035,9 @@ native_encode_vector_part (const_tree expr, unsigned
> char *ptr, int len,
>        unsigned int extract_elts = extract_bytes * elts_per_byte;
>        for (unsigned int i = 0; i < extract_elts; ++i)
>         {
> +         /* Don't encode any bit beyond the range of the vector.  */
> +         if (first_elt + i > count)
> +           break;

Hmm. I think that VECTOR_CST_ELT should have ICEd for out-of-bound element queries but it seems to make up elements for us here. Richard?

But yes, we do

  unsigned int extract_elts = extract_bytes * elts_per_byte;

and since native_encode_* and native_interpret_* operate on bytes we have difficulties dealing with bit-precision entities with padding. There's either the possibility to fail encoding when that happens or do something else. Note that RTL expansion will do

    case VECTOR_CST:
      {
        tree tmp = NULL_TREE;
        if (VECTOR_MODE_P (mode))
          return const_vector_from_tree (exp);
        scalar_int_mode int_mode;
        if (is_int_mode (mode, &int_mode))
          {
            tree type_for_mode = lang_hooks.types.type_for_mode (int_mode, 1);
            if (type_for_mode)
              tmp = fold_unary_loc (loc, VIEW_CONVERT_EXPR,
                                    type_for_mode, exp);

which I think should always succeed (otherwise it falls back to expanding a CTOR). That means failing to encode/interpret might get into store_constructor which I think will zero a register destination and thus fill padding with zeros.

So yeah, something like this looks OK, but I think instead of only testing against 'count' we should also test against TYPE_VECTOR_SUBPARTS (that might be variable, so with known_gt).

Would be interesting to see whether this fixes the issue without the now installed patch.
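To make the padding problem concrete, here is a small standalone sketch of byte-wise encoding of a vector of one-bit elements. It does not use any GCC internals: `encode_bool_vector` and its `std::vector<bool>` input are stand-ins invented for this example, and the exact bound check is the sketch's own choice. The point is that a four-element boolean vector occupies only the low four bits of its byte, and an encoder that walks whole bytes has to stop at the element count so that the padding bits stay zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for encoding a vector of 1-bit elements into
// bytes, least significant bit first.
std::vector<std::uint8_t> encode_bool_vector (const std::vector<bool> &elts)
{
  const std::size_t count = elts.size ();
  const std::size_t n_bytes = (count + 7) / 8;
  std::vector<std::uint8_t> bytes (n_bytes, 0);
  for (std::size_t byte = 0; byte < n_bytes; ++byte)
    for (std::size_t bit = 0; bit < 8; ++bit)
      {
        const std::size_t idx = byte * 8 + bit;
        // Do not encode any bit beyond the range of the vector; the
        // remaining bits of the last byte are padding and stay zero.
        if (idx >= count)
          break;
        if (elts[idx])
          bytes[byte]
            = static_cast<std::uint8_t> (bytes[byte] | (1u << bit));
      }
  return bytes;
}
```

For an all-true four-element vector this yields the single byte 0x0f rather than 0xff, which is the property a consumer comparing the whole byte against zero or all-ones would depend on.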
[Bug target/105275] [12/13/14 regression] 525.x264_r and 538.imagick_r regressed on x86_64 at -O2 with PGO after r12-7319-g90d693bdc9d718
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105275 --- Comment #4 from Richard Biener --- Since this was a costing change I wonder if we identified the code change responsible and thus have a testcase? I realize that for maximum assurance one would need to have a debug counter for switching the patch on/off to have it apply more selectively (possibly per SLP attempt rather than per cost hook invocation which would be even more tricky to do). Feeding another parameter to the hook via a new flag in the vinfo might be possible (and set that from a dbg_cnt call) for example.
[Bug ipa/113520] ICE with mismatched types with LTO (tree check: expected array_type, have integer_type in array_ref_low_bound)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113520

--- Comment #9 from Richard Biener ---
(In reply to Jan Hubicka from comment #8)
> I think the ipa-cp summaries should be used only when types match. At least
> Martin added type streaming for all the jump functions. So we are missing
> some check?

I don't think this applies here, we're having

  foo (&b[5]);

with b being int vs int[], so it's not about the argument types matching or the type of the JF but instead the value effectively changing during streaming due to varpool node "merging".

As said elsewhere we avoid the issue by preserving the type of possibly merged decls by wrapping it with a MEM_REF (for rvalues a V_C_E would be possible as well). And we unwrap it later when possible (but that's of course optional).

I think any summary streaming referencing decls subject to WPA merging needs to do the same - it's not possible to recover after the fact since the original type is lost (for the ARRAY_REF case it might be possible to infer a type that would be good enough of course).
[Bug c++/113581] Ignoring GCC unroll loop annotation for loops with increment in condition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113581

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-24
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener ---
Confirmed. The reason is we're seeing

  :
  i.2_3 = i;
  i = i.2_3 + 1;
  _4 = i.2_3 <= 2;
  D.2811 = .ANNOTATE (_4, 1, 16);
  retval.1 = D.2811;
  if (retval.1 != 0) goto ; [INV] else goto ; [INV]

and the .ANNOTATE does not directly precede the condition; the assign is in the way. The FE generates

  if (<>>) goto ; else goto ;

for some reason this forces an extra temporary via voidify_wrapper_expr () during gimplification. Possibly the frontend simply lacks knowledge of ANNOTATE_EXPR when checking whether it needs this cleanup_point at all (but I see it's too simplistic, checking for side-effects only).

We could walk simple assigns like those of course, but the extra temporary looks superfluous.
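The report's testcase is not quoted in this comment, so the following is only a guess at the kind of loop involved, reconstructed from the dump: a global `i` that is post-incremented inside the loop condition, under a `#pragma GCC unroll`. The unroll factor 16 and the bound 2 are read off the `.ANNOTATE (_4, 1, 16)` and `i.2_3 <= 2` statements; the function name and loop body are invented.

```cpp
#include <cassert>

// Global counter, so the condition has to load and store it, as in the dump.
int i;

// Hypothetical reproducer shape: the increment lives in the condition, so
// the condition has side effects.
int count_iterations ()
{
  int iterations = 0;
  i = 0;
#pragma GCC unroll 16
  while (i++ <= 2)
    ++iterations;
  return iterations;
}
```

The semantics are unaffected by whether the pragma is honoured; what the bug is about is that the unroll request is dropped for this loop shape.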
[Bug tree-optimization/113576] [14 regression] 502.gcc_r hangs r14-8223-g1c1853a70f9422169190e65e568dcccbce02d95c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576

--- Comment #5 from Richard Biener ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fe631252dc2..28ad03e0b8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -991,8 +991,12 @@ vec_init_loop_exit_info (class loop *loop)
        {
          tree may_be_zero = niter_desc.may_be_zero;
          if ((integer_zerop (may_be_zero)
-              || integer_nonzerop (may_be_zero)
-              || COMPARISON_CLASS_P (may_be_zero))
+              /* As we are handling may_be_zero that's not false by
+                 rewriting niter to may_be_zero ? 0 : niter we require
+                 an empty latch.  */
+              || (exit->src == single_pred (loop->latch)
+                  && (integer_nonzerop (may_be_zero)
+                      || COMPARISON_CLASS_P (may_be_zero))))
              && (!candidate
                  || dominated_by_p (CDI_DOMINATORS, exit->src,
                                     candidate->src)))

fixes it, I'm testing this.