Re: [PATCH, rs6000] Correct match pattern in pr56605.c
Hi, On 9/4/2022 上午 3:36, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 28, 2022 at 11:17:27AM +0800, HAO CHEN GUI wrote: >> This patch corrects the match pattern in pr56605.c. The former pattern >> is wrong and test case fails with GCC11. It should match following insn on >> each subtarget after mode promotion is disabled. The patch need to be >> backported to GCC11. >> >> //gimple >> _17 = (unsigned int) _20; >> prolog_loop_niters.4_23 = _17 & 3; >> >> //rtl >> (insn 19 18 20 2 (parallel [ >> (set (reg:CC 208) >> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0) >> (const_int 3 [0x3])) >> (const_int 0 [0]))) >> (set (reg:SI 129 [ prolog_loop_niters.5 ]) >> (and:SI (subreg:SI (reg:DI 207) 0) >> (const_int 3 [0x3]))) >> ]) 197 {*andsi3_imm_mask_dot2} >> >> >> Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no >> regressions. >> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot. >> >> ChangeLog >> 2022-02-28 Haochen Gui >> >> gcc/testsuite/ >> PR target/102146 >> * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass. >> >> >> patch.diff >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c >> b/gcc/testsuite/gcc.target/powerpc/pr56605.c >> index fdedbfc573d..231d808aa99 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c >> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c >> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia) >> ia[i] = (int) sb[i]; >> } >> >> -/* { dg-final { scan-rtl-dump-times {\(compare:CC >> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */ >> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI >> \(reg:DI} 1 "combine" } } */ > > The old pattern uses non-capturing braces here, which are required for > ...-times to work correctly. The zero_extend alternative is required as > well, as is making the subreg optional (we have an actual reg in one of > the cases currently). What do you consider wrong about the old pattern, > what in the generated code is different from what you expect? > > It works correctly on p7 etc. btw; where do you see it fail? p10? > > I saw it failed with GCC11. FAIL: gcc.target/powerpc/pr56605.c scan-rtl-dump-times combine "\\(compare:CC \\((?:and|zero_extend):(?:DI) \\((?:sub)?reg:[SD]I" 1 On ppc64le with GCC11, it should match following insn. (compare:CC (and:SI (subreg:SI (reg:DI 208) 0) With GCC12, it should match following insn. (compare:CC (and:SI (subreg:SI (reg:DI 207) 0) With GCC12 the pattern actually matches: (compare:CC (and:DI (subreg:DI (reg:SI 136 [ niters.6 ]) 0) So GCC12 doesn't fail the case. But it actually match wrong insn. There is no such insn in GCC11 combine dump. So GCC11 hits the problem. Thanks. > Segher
[PATCH v2, pushed] rs6000/test: Adjust p9-vec-length-{full, epil}-7.c [PR103196]
on 2022/4/8 10:34 PM, Segher Boessenkool wrote: > Hi! > > Thanks for investigating. > > On Fri, Apr 08, 2022 at 03:25:51PM +0800, Kewen.Lin wrote: >> on 2022/4/8 3:29 AM, Segher Boessenkool wrote: >>> On Thu, Apr 07, 2022 at 09:19:51AM -0500, will schmidt wrote: On Mon, 2022-02-28 at 13:37 +0800, Kewen.Lin via Gcc-patches wrote: > As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted as the > complete unrolling can happen on some of its loops. This patch is to > use pragma "GCC unroll 0" to disable all possible loop unrollings. > Hope it can help the case not that fragile. ok Is the lack of effectiveness of "-fno-unroll-loops" otherwise understood, or is there further issue behind that option? >>> >>> There is -fpeel-loops as well, and cunroll is independent of all of >>> these as well? >> >> Yes, kind of. As below code, unroll-loops and peel-loops only affect >> whether it's acceptable to grow size. >> >> /* Allow cunroll to grow size accordingly. */ >> if (!opts_set->x_flag_cunroll_grow_size) >> opts->x_flag_cunroll_grow_size >> = (opts->x_flag_unroll_loops >> || opts->x_flag_peel_loops >> || opts->x_optimize >= 3); >> >> This flag_cunroll_grow_size doesn't gate the pass, it's only for >> "may_increase_size". > > Yuck. > >> unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size, >> true); >> >> The cunroll still can take effect if its transformation doesn't grow >> size even if we specify -fno-unroll-loops and -fno-peel-loops. > > Yeah. And the pragma disables it, so we really should have a command > line option that does the same (but globally), too. > I would expect the effect of the option, versus the pragma, two to roughly equivalent. Obviously it is not. :-) >>> >>> Yes, me too. And I do not see what makes the difference, if it isn't >>> the peel thing :-( >>> >>> Ke Wen, can you try with -fno-peel-loops please? >> >> I had a try and it didn't help as the cunroll pass doesn't grow size >> for this case. By the way, -fdisable-tree-cunroll does work. But I >> thought using pragma looks better as it's not an internal thing like >> the disabling option and I also expect other future unrolling related >> changes would respect it. >> >> Do you also prefer pragma, or want that disabling option? > > That flag please. And we should have a -fno-unroll that disables all > separate kinds of unrolling, imo -- but that is a separate problem, not > something to hold up your patch for. > > Okay for trunk without the pragma. Thanks! > Thanks Segher! I just committed the attached patch in r12-8075 which follows your preference with -fdisable-tree-cunroll. > (The pragma is a good solution if you want to disable unrolling for a > very specific piece of code only, which is not the case here: the > problem currently shows up in one place, but it could happen elsewhere!) > > Yeah, but for this failure, all loops of the cases use the same macro on which the proposed pragma is supposed to be used, so all loops should be affected (I would expect it's the same as global effect). :) BR, Kewen From e2ecbeeb47d20d6679993a74064bc9efdb827eef Mon Sep 17 00:00:00 2001 From: Kewen Lin Date: Sun, 10 Apr 2022 21:31:09 -0500 Subject: [PATCH] rs6000/test: Adjust p9-vec-length-{full,epil}-7.c [PR103196] As PR103196 shows, complete unrolling pass still takes effect even if we have specified the option "-fno-unroll-loops". The loops in that case are not expected to be transformed by it, otherwise the expected counts change. This patch is to add the disabling option to make them not sensitive to complete unrolling. PR testsuite/103196 gcc/testsuite/ChangeLog: * gcc.target/powerpc/p9-vec-length-epil-7.c: Add option -fdisable-tree-cunroll. * gcc.target/powerpc/p9-vec-length-full-7.c: Likewise. --- gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c | 4 +++- gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c index a27ee347ca1..011b731f7c5 100644 --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c @@ -1,5 +1,7 @@ /* { dg-do compile { target { lp64 && powerpc_p9vector_ok } } } */ -/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -ffast-math" } */ +/* Pass cunroll isn't disabled by -fno-unroll-loops, so use explicit + disabling option for it. */ +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -ffast-math -fdisable-tree-cunroll" } */ /* { dg-additional-options "--param=vect-partial-vector-usage=1" } */ diff --git
[committed] Minor bfin codegen bugfix
builtin-arith-overflow-3 has been failing on bfin-elf for a while. I've had a workaround installed on the tester for a few months and I finally got some time to dig into it over the weekend. Ultimately I was able to nail the problem down to the representation of the rotate left by one position instruction. I'm actually quite happy with that result as I was briefly worried it was a deeper problem in combine. Essentially we're trying to synthesize a DImode shift using an SImode rotates through CC. The key backend insn looks like: (define_insn "rol_one" [(set (match_operand:SI 0 "register_operand" "+d") (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "d") (const_int 1)) (zero_extend:SI (reg:BI REG_CC (set (reg:BI REG_CC) (zero_extract:BI (match_dup 1) (const_int 31) (const_int 0)))] "" "%0 = ROT %1 BY 1%!" [(set_attr "type" "dsp32shiftimm")]) What the 2nd set in the pattern wants to do is indicate that CC is set to the high bit from operand 1. But the RTL pattern is completely bogus as it's asking for a 31 bit wide field starting at offset 0, which is just silly for BImode. That bogus representation led combine to do some "unexpected" transformations. The obvious fix is to use (const_int 1) (const_int 31) in the 2nd set. That indicates we want one bit starting at position 31. With that pattern fixed, bfin returns to a normal state without needing my workaround. Installed on the trunk, Jeff commit 8d331aab65488b3998bd106205bbe6cab5df31b5 Author: Jeff Law Date: Sun Apr 10 23:02:48 2022 -0400 [committed] Minor bfin codegen bugfix gcc/ * config/bfin/bfin.md (rol_one): Fix pattern to indicate the sign bit of the source ends up in CC. diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md index 0e44653d7cb..56b24726bc2 100644 --- a/gcc/config/bfin/bfin.md +++ b/gcc/config/bfin/bfin.md @@ -1741,7 +1741,7 @@ (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "d") (const_int 1)) (zero_extend:SI (reg:BI REG_CC (set (reg:BI REG_CC) - (zero_extract:BI (match_dup 1) (const_int 31) (const_int 0)))] + (zero_extract:BI (match_dup 1) (const_int 1) (const_int 31)))] "" "%0 = ROT %1 BY 1%!" [(set_attr "type" "dsp32shiftimm")])
Re: [PATCH, rs6000] Correct match pattern in pr56605.c
Hi, On 9/4/2022 上午 12:48, will schmidt wrote: > On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote: >> Hi, >> This patch corrects the match pattern in pr56605.c. The former pattern >> is wrong and test case fails with GCC11. It should match following insn on >> each subtarget after mode promotion is disabled. The patch need to be >> backported to GCC11. >> > > Hi, > > I note This patch appears to (partially?) address the P1 [11 regression] pr. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146 There are two issues left in this PR. One is pr56605.c. My patch fixes it. Another is prefix-no-update.c. The patch Segher proposed in 103197 could fix it. Thanks. > > > The issue makes reference to a different proposed patch > in issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197 > titled ppc inline expansion of memcpy/memmove should not use lxsibzx/stxsibx > for a single byte > proposed patch named > rs6000: Disparage lfiwzx and similar > > I can't address any of the background or history there. :-) > > >> //gimple >> _17 = (unsigned int) _20; >> prolog_loop_niters.4_23 = _17 & 3; >> >> //rtl >> (insn 19 18 20 2 (parallel [ >> (set (reg:CC 208) >> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0) >> (const_int 3 [0x3])) >> (const_int 0 [0]))) >> (set (reg:SI 129 [ prolog_loop_niters.5 ]) >> (and:SI (subreg:SI (reg:DI 207) 0) >> (const_int 3 [0x3]))) >> ]) 197 {*andsi3_imm_mask_dot2} >> >> >> Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no >> regressions. >> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot. >> >> ChangeLog >> 2022-02-28 Haochen Gui >> >> gcc/testsuite/ >> PR target/102146 >> * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass. >> >> >> patch.diff >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c >> b/gcc/testsuite/gcc.target/powerpc/pr56605.c >> index fdedbfc573d..231d808aa99 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c >> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c >> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia) >> ia[i] = (int) sb[i]; >> } >> >> -/* { dg-final { scan-rtl-dump-times {\(compare:CC >> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */ >> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI >> \(reg:DI} 1 "combine" } } */ > > > SO with the update, (i squint so this is an approximate handwave) this > drops the zero_extend and changes the destination type to be DI for the > scan-rtl.This appears to match the rtl as mentioned in the patch > comments. > > >> >
Avoid overflow in ipa-modref-tree.cc
Hi, the testcase triggers ICE since computation overflows on two accesses that are very far away d->b[-144115188075855873] and d->b[144678138029277184]. This patch makes the relevant part of modref to use poly_offset_int. It is kind of weird to store bit offsets into poly_int64 but it is what alias oracle does. There are some other places where computations are done that may overflow, I will fix them incrementally (it is not completely mechanical due to signed/unsigned info). Bootstrapped/regtested x86_64-linux. I plan to commit it tomorrow if there are no complains. gcc/ChangeLog: 2022-04-11 Jan Hubicka * ipa-modref-tree.cc (modref_access_node::closer_pair_p): Use poly_offset_int to avoid overflow. (modref_access_node::update2): Likewise. gcc/testsuite/ChangeLog: 2022-04-11 Jan Hubicka * gcc.c-torture/compile/103818.c: New test. diff --git a/gcc/ipa-modref-tree.cc b/gcc/ipa-modref-tree.cc index f19af8c2b55..44cb645954f 100644 --- a/gcc/ipa-modref-tree.cc +++ b/gcc/ipa-modref-tree.cc @@ -267,34 +267,42 @@ modref_access_node::closer_pair_p (const modref_access_node , /* Now compute distance of the intervals. */ - poly_int64 dist1, dist2; + poly_offset_int dist1, dist2; if (known_le (offseta1, offsetb1)) { if (!known_size_p (a1.max_size)) dist1 = 0; else - dist1 = offsetb1 - offseta1 - a1.max_size; + dist1 = (poly_offset_int)offsetb1 + - (poly_offset_int)offseta1 + - (poly_offset_int)a1.max_size; } else { if (!known_size_p (b1.max_size)) dist1 = 0; else - dist1 = offseta1 - offsetb1 - b1.max_size; + dist1 = (poly_offset_int)offseta1 +- (poly_offset_int)offsetb1 +- (poly_offset_int)b1.max_size; } if (known_le (offseta2, offsetb2)) { if (!known_size_p (a2.max_size)) dist2 = 0; else - dist2 = offsetb2 - offseta2 - a2.max_size; + dist2 = (poly_offset_int)offsetb2 + - (poly_offset_int)offseta2 + - (poly_offset_int)a2.max_size; } else { if (!known_size_p (b2.max_size)) dist2 = 0; else - dist2 = offseta2 - offsetb2 - b2.max_size; + dist2 = offseta2 + - (poly_offset_int)offsetb2 + - (poly_offset_int)b2.max_size; } /* It may happen that intervals overlap in case size is different. Prefer the overlap to non-overlap. */ @@ -380,9 +388,16 @@ modref_access_node::update2 (poly_int64 parm_offset1, new_max_size = max_size2; else { - new_max_size = max_size2 + offset2 - offset1; - if (known_le (new_max_size, max_size1)) - new_max_size = max_size1; + poly_offset_int s = (poly_offset_int)max_size2 + + (poly_offset_int)offset2 + - (poly_offset_int)offset1; + if (s.to_shwi (_max_size)) + { + if (known_le (new_max_size, max_size1)) + new_max_size = max_size1; + } + else + new_max_size = -1; } update (parm_offset1, offset1, diff --git a/gcc/testsuite/gcc.c-torture/compile/103818.c b/gcc/testsuite/gcc.c-torture/compile/103818.c new file mode 100644 index 000..8a0148e243a --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/103818.c @@ -0,0 +1,12 @@ +/* { dg-do compile { target lp64 } } */ +struct a { + int b[0]; +} c(struct a *d) { + d->b[0] = d->b[-144115188075855873] + d->b[11] * d->b[2] + +d->b[0] % d->b[1025] + d->b[5]; + d->b[0] = + d->b[144678138029277184] + d->b[0] & d->b[-3] * d->b[053] + d->b[7] ^ + d->b[-9] + d->b[14] + d->b[9] % d->b[49] + d->b[024] + d->b[82] & + d->b[4096]; +} +void main() {}
Re: [PATCH v3] MIPS: IPL is 8bit in Cause and Status registers if TARGET_MCU
On Tue, 15 Mar 2022, YunQiang Su wrote: > If MIPS MCU extension is enable, the IPL section in Cause and Status > registers has been expand to 8bit instead of 6bit. > > In Cause: the bits are 10-17. > In Status: the bits are 10-16 and 18. > > MD00834-2B-MUCON-AFP-01.03.pdf: P49 and P61. I can see you have committed this change, but I cannot see an approval posted to the mailing list. Who has approved your change? > diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc > index a1c4b437cd4..91e1e964f94 100644 > --- a/gcc/config/mips/mips.cc > +++ b/gcc/config/mips/mips.cc > @@ -12254,10 +12254,22 @@ mips_expand_prologue (void) > /* Insert the RIPL into our copy of SR (k1) as the new IPL. */ > if (!cfun->machine->keep_interrupts_masked_p > && cfun->machine->int_mask == INT_MASK_EIC) > - emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM), > -GEN_INT (6), > + { > + emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM), > +TARGET_MCU ? GEN_INT (7) : GEN_INT (6), > GEN_INT (SR_IPL), > gen_rtx_REG (SImode, K0_REG_NUM))); > + if (TARGET_MCU) > + { > + emit_insn (gen_lshrsi3 (gen_rtx_REG (SImode, K0_REG_NUM), > + gen_rtx_REG (SImode, K0_REG_NUM), > + GEN_INT (7))); > + emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM), > +GEN_INT (1), > +GEN_INT (SR_IPL+8), > +gen_rtx_REG (SImode, K0_REG_NUM))); > + } > + } While code generation has been corrected your change has code formatting issues which should have been addressed before committing. A test case should have been made too. Maciej
New Swedish PO file for 'gcc' (version 12.1-b20220403)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: https://translationproject.org/latest/gcc/sv.po (This file, 'gcc-12.1-b20220403.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH, v2] PR fortran/105184 - ICE in gfc_array_init_size, at fortran/trans-array.cc:5841
Hi Harald, Regtested again with no new failures. OK for mainline? Looks good to me. Thanks for the patch! Best regards Thomas
Re: [PATCH, v2] PR fortran/105184 - ICE in gfc_array_init_size, at fortran/trans-array.cc:5841
Hi Harald, It looks good to me - OK for mainline. Thanks Paul On Fri, 8 Apr 2022 at 21:45, Harald Anlauf via Fortran wrote: > Dear all, > > Am 06.04.22 um 22:30 schrieb Harald Anlauf via Fortran: > > Dear all, > > > > the logic for checking the allocate-coshape-spec in an ALLOCATE > > statement was sort of sideways, and allowed to pass invalid > > specifications to the code generation. > > > > The fix seems obvious (to me). > > after submitting the previous patch, I found another invalid case with > a missing lower bound which was still silently accepted. The attached > revised patch adds this check, improves the original error message to > actually point to the coarray specification, and renames the testcase > to align better with existing coarray testcases. > > > Regtested on x86_64-pc-linux-gnu. OK for mainline? > > (12 or wait for 13?). > > Regtested again with no new failures. OK for mainline? > > Thanks, > Harald > -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein