Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/28/19 8:53 PM, Gunther Nikl wrote:> Bernd Schmidt : >> On 11/23/19 9:53 PM, Bernd Schmidt wrote: >> move.w %a4,%d0 >> - tst.b %d0 >> - jeq .L352 >> + jeq .L353 >> >> And the reason - that's a movqi using move.w. > > Can this problem also happen on older (pre-ccmode) GCC versions? Or was > this only an issue of the ccmode conversion? This was an error in the conversion (I think). > What about the compare constraint errors? Are those also present on > older GCC versions but never surfaced? Those were present, but presumably nothing ever tried to rerecognize a compare insn after reload (unlike jumps) so you wouldn't get a crash. I haven't checked whether it could have produced invalid assembly - I'm guessing probably not. Bernd
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/26/19 3:21 AM, Joseph Myers wrote: > > The soft-float ColdFire build (--with-arch=cf --with-cpu=54455 > --disable-multilib) successfully built libgcc and glibc, but ran into an > ICE building the glibc tests. Again, I've not bisected but this commit > seems likely to be responsible. Compile the attached preprocessed source > with -O2. Try the following. This seems to be the same (preexisting) problem which got fixed for the normal 68k compare patterns, where an operand with nonimmediate_operand predicate allows constants in its constraints. Bernd diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index 25e0b73741f..d6df385787a 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -496,7 +496,7 @@ ;; needs to be reloaded. (define_mode_attr scc0_cf_constraints [(QI "=d") (HI "=d") (SI "=d,d,d")]) -(define_mode_attr cmp1_cf_constraints [(QI "dm") (HI "dm") (SI "mrKs,r,rm")]) +(define_mode_attr cmp1_cf_constraints [(QI "dm") (HI "dm") (SI "mr,r,rm")]) (define_mode_attr cmp2_cf_constraints [(QI "C0") (HI "C0") (SI "r,mrKs,C0")]) (define_mode_attr cmp2_cf_predicate [(QI "const0_operand") (HI "const0_operand") (SI "general_operand")])
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/26/19 1:36 AM, Joseph Myers wrote: > I'm seeing a libgcc build failure for coldfire in my build-many-glibcs.py > bot (m68k-linux-gnu configured --with-arch=cf --disable-multilib). That's > building _mulsc3.o; I get assembler errors: I overlooked a difference in the 68881 vs coldfire patterns when I combined them. They use different suffixes for register compares (I only spotted the different constraints). The following seems to fix the assembler failures. I cannot properly test Coldfire however due to lack of hardware. Bernd * config/m68k/,68k.c (m68k_output_compare_fp): Use .d instead of .x for register compares on Coldfire. diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index 8d010ebe6e9..4b30c401e80 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -4501,7 +4501,12 @@ m68k_output_compare_fp (rtx op0, rtx op1, rtx_code code) if (op1 == CONST0_RTX (GET_MODE (op0))) { if (FP_REG_P (op0)) - output_asm_insn ("ftst%.x %0", ops); + { + if (TARGET_COLDFIRE_FPU) + output_asm_insn ("ftst%.d %0", ops); + else + output_asm_insn ("ftst%.x %0", ops); + } else output_asm_insn (("ftst%." + prec + " %0").c_str (), ops); return code; @@ -4510,7 +4515,10 @@ m68k_output_compare_fp (rtx op0, rtx op1, rtx_code code) switch (which_alternative) { case 0: - output_asm_insn ("fcmp%.x %1,%0", ops); + if (TARGET_COLDFIRE_FPU) + output_asm_insn ("fcmp%.d %1,%0", ops); + else + output_asm_insn ("fcmp%.x %1,%0", ops); break; case 1: output_asm_insn (("fcmp%." + prec + " %f1,%0").c_str (), ops);
Autoinc vs reload and LRA
So I was curious what would happen if I turned on LRA for m68k. It turns out my autoinc patches from the cc0 patch set expose a bug in how LRA handles autoincrement. While it copies the logic from reload's inc_for_reload, it appears to be missing the find_reloads_address code to ensure an autoinc address is reloaded entirely if it is part of a jump. LRA can reload just the register inside a POST_INC, which leads to creating an output reload. Hence, a new version of the autoinc changes, below. Since reload is known to work, we allow autoinc in jumps unless targetm.lra_p. One part of the patch is a fix for the earlier combine patch which was already checked in, the other part is a new version of the auto-inc-dec patch. Bootstrapped and tested on the gcc135 machine (powerpc64le-unknown-linux-gnu). OK? Bernd * auto-inc-dec.c (merge_in_block): Allow autoinc in jumps unless LRA is enabled. * combine.c (can_combine_p): Disallow autoinc in jumps unless LRA is disabled. diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c index bdb6efab520..1b224cc9777 100644 --- a/gcc/auto-inc-dec.c +++ b/gcc/auto-inc-dec.c @@ -1441,10 +1441,9 @@ merge_in_block (int max_reg, basic_block bb) continue; } - /* This continue is deliberate. We do not want the uses of the - jump put into reg_next_use because it is not considered safe to - combine a preincrement with a jump. */ - if (JUMP_P (insn)) + /* Reload should handle auto-inc within a jump correctly, while LRA + is known to have issues with autoinc. */ + if (JUMP_P (insn) && targetm.lra_p ()) continue; if (dump_file) diff --git a/gcc/combine.c b/gcc/combine.c index 2e21459f504..3fbd84fcb80 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2117,12 +2117,16 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED, /* If INSN contains an autoincrement or autodecrement, make sure that register is not used between there and I3, and not already used in - I3 either. Neither must it be used in PRED or SUCC, if they exist. */ + I3 either. Neither must it be used in PRED or SUCC, if they exist. + Also insist that I3 not be a jump if using LRA; if it were one + and the incremented register were spilled, we would lose. + Reload handles this correctly. */ if (AUTO_INC_DEC) for (link = REG_NOTES (insn); link; link = XEXP (link, 1)) if (REG_NOTE_KIND (link) == REG_INC - && (reg_used_between_p (XEXP (link, 0), insn, i3) + && ((JUMP_P (i3) && targetm.lra_p ()) + || reg_used_between_p (XEXP (link, 0), insn, i3) || (pred != NULL_RTX && reg_overlap_mentioned_p (XEXP (link, 0), PATTERN (pred))) || (pred2 != NULL_RTX
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/25/19 1:38 PM, Tobias Burnus wrote: > Thanks for the m68k work! Can you also update > https://gcc.gnu.org/backends.html ? Committed as obvious. Bernd commit f42834ad5e77c05cb6bc0908b8fc9282fec7fc19 Author: Bernd Schmidt Date: Mon Nov 25 13:48:08 2019 +0100 Change backends table to show m68k does not use cc0 diff --git a/htdocs/backends.html b/htdocs/backends.html index c9449065..c4b916d3 100644 --- a/htdocs/backends.html +++ b/htdocs/backends.html @@ -89,7 +89,7 @@ iq2000 | ??? FICB b g t lm32 | F g m32c |L FIlb gs m32r | FI b s -m68k | ?cpb i +m68k | ? pb i mcore | ?FIpb mgs mep| F Cb g t s microblaze | CB i s
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/25/19 1:34 PM, John Paul Adrian Glaubitz wrote: > Are all 4 + 2 patches in now? Thus, can we close the bug? We're missing one piece for better autoinc generation, but that's a small optimization issue. The cc0 conversion is complete. Bernd
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/23/19 6:36 PM, Jeff Law wrote: > Not really. I've already indicated to Bernd that he should go ahead and > commit the changes and we can iterate on any problems that arise. After the last fix, I did some more testing and since I feel confident that it really is in good shape now, I committed it. Thanks! Bernd
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/25/19 12:26 PM, Andreas Schwab wrote: > On Nov 24 2019, Bernd Schmidt wrote: > >> Whew, I think I have it. One tst instruction eliminated when it >> shouldn't have been: >> >> move.w %a4,%d0 >> - tst.b %d0 >> - jeq .L352 >> + jeq .L353 >> >> And the reason - that's a movqi using move.w. The following should fix >> it. > > Apparently that also fixed the testsuite regressions. I still wish I knew how you managed to reproduce those. That would have been easier to debug than messing around with a three-stage cross-to-native setup. Bernd
Re: [PATCH 4/4] Fix autoinc cbranch
On 11/24/19 8:43 PM, Segher Boessenkool wrote: > But. Allowing autoinc into jump insns means those jump insns may then > eventually need an output reload; it may just have been because of that? That's almost certainly the reasoning, but as I pointed out in my original mail - reload is careful around autoincs and emits the reload sequence before the reloaded insn. For example, see inc_for_reload: /* Postincrement. Because this might be a jump insn or a compare, and because RELOADREG may not be available after the insn in an input reload, we must do the incrementation before the insn being reloaded for. On m68k with cc0 this is already necessary, because even a cmp insn cannot have output reloads (they would overwrite the flags). This is why I think it should be safe to allow them in jumps too. Bernd
Re: [PATCH 4/4] Fix autoinc cbranch
On 11/19/19 1:27 AM, Segher Boessenkool wrote: > The combine parts are okay for trunk, if you keep an eye out :-) Thanks, now committed. That leaves the auto-inc-dec part. Since we're being adventurous, I've also bootstrapped and tested the following in the meantime (on the gcc135 machine). This just deletes the tests for jumps entirely rather than hedging bets. Bernd Index: gcc/auto-inc-dec.c === --- gcc/auto-inc-dec.c (revision 278653) +++ gcc/auto-inc-dec.c (working copy) @@ -1441,12 +1441,6 @@ merge_in_block (int max_reg, basic_block continue; } - /* This continue is deliberate. We do not want the uses of the - jump put into reg_next_use because it is not considered safe to - combine a preincrement with a jump. */ - if (JUMP_P (insn)) - continue; - if (dump_file) dump_insn_slim (dump_file, insn);
Re: [PATCH ix86] Fix rtx_costs for flag-setting adds
On 11/22/19 6:05 PM, Uros Bizjak wrote: > Indeed, this is a different case, an overflow test that results in one > CMP insn. I think, we should check if the second operand is either 0 > (then proceed as it is now), or if the second operand equals first > operand of PLUS insn, then we actually emit CMP insn (please see > PR30315). Here's what I committed - preapproved by Uros off-list (I forgot Reply-All again). Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 278653) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2019-11-24 Bernd Schmidt + + * config/i386/i386.c (ix86_rtx_costs): Handle care of a PLUS in a + COMPARE, representing an overflow detection. + 2019-11-23 Jan Hubicka * cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT): Remove. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 278653) +++ gcc/config/i386/i386.c (working copy) @@ -19501,6 +19501,15 @@ ix86_rtx_costs (rtx x, machine_mode mode return true; } + if (GET_CODE (XEXP (x, 0)) == PLUS + && rtx_equal_p (XEXP (XEXP (x, 0), 0), XEXP (x, 1))) + { + /* This is an overflow detection, count it as a normal compare. */ + *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), + COMPARE, 0, speed); + return true; + } + /* The embedded comparison operand is completely free. */ if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0))) && XEXP (x, 1) == const0_rtx)
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/23/19 9:53 PM, Bernd Schmidt wrote: > I'll spend a few more days trying to see if I can do something about the > bootstrap failure Mikael saw (currently trying to do a two-stage cross > build rather than a really slow bootstrap). Whew, I think I have it. One tst instruction eliminated when it shouldn't have been: move.w %a4,%d0 - tst.b %d0 - jeq .L352 + jeq .L353 And the reason - that's a movqi using move.w. The following should fix it. I'll run a few more tests, and then check it all in as Jeff suggested if things looks OK. Bernd diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index c72325fa4ab..8d010ebe6e9 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -3314,7 +3314,11 @@ output_move_qimode (rtx *operands) /* 68k family (including the 5200 ColdFire) does not support byte moves to from address registers. */ if (ADDRESS_REG_P (operands[0]) || ADDRESS_REG_P (operands[1])) -return "move%.w %1,%0"; +{ + if (ADDRESS_REG_P (operands[1])) + CC_STATUS_INIT; + return "move%.w %1,%0"; +} return "move%.b %1,%0"; }
Re: [PATCH 2/4] The main m68k cc0 conversion
On 11/23/19 6:36 PM, Jeff Law wrote: > Not really. I've already indicated to Bernd that he should go ahead and > commit the changes and we can iterate on any problems that arise. In the meantime I've made an aranym setup in addition to the qemu setup I had, and I've not been able to reproduce failures like the ones Andreas reported. I'll spend a few more days trying to see if I can do something about the bootstrap failure Mikael saw (currently trying to do a two-stage cross build rather than a really slow bootstrap). Bernd
Re: [PATCH ix86] Fix rtx_costs for flag-setting adds
On 11/22/19 3:04 PM, Uros Bizjak wrote: > On Fri, Nov 22, 2019 at 1:58 PM Bernd Schmidt wrote: >> >> A patch I posted recently fixes combine to take costs of JUMP_INSNs into >> account. That causes the pr30315 test to fail with -m32, since the cost >> of an add that sets the flags is estimated too high. >> >> The following seems to fix it. Bootstrapped and tested on x86_64-linux, ok? > > I think that the intention of the code below "The embedded comparison > operand is completely free." comment is to handle this case. It looks > that it should return the cost of the inside operation of COMPARE rtx. There seem to be two problems with that. We're dealing with patterns such as: (set (reg:CCC 17 flags) (compare:CCC (plus:SI (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32]) (reg/v:SI 87 [ b ])) (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32]))) If I remove the test for const0_rtx, it still doesn't work - I think setting *total to zero is ineffective, since we'll still count the MEM twice. So, how about the following? Bernd @@ -19502,9 +19502,12 @@ } /* The embedded comparison operand is completely free. */ - if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0))) - && XEXP (x, 1) == const0_rtx) - *total = 0; + if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0 + { + *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), +COMPARE, 0, speed); + return true; + } return false;
[PATCH ix86] Fix rtx_costs for flag-setting adds
A patch I posted recently fixes combine to take costs of JUMP_INSNs into account. That causes the pr30315 test to fail with -m32, since the cost of an add that sets the flags is estimated too high. The following seems to fix it. Bootstrapped and tested on x86_64-linux, ok? Bernd * config/i386/i386.c (ix86_rtx_costs): For a PLUS inside a COMPARE, representing an add that sets the flags, count just the PLUS. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7115ec44c2a..6e48f5ccbde 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19500,6 +19500,11 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, + rtx_cost (const1_rtx, mode, outer_code, opno, speed)); return true; } + if (GET_CODE (XEXP (x, 0)) == PLUS) + { + *total = rtx_cost (XEXP (x, 0), mode, COMPARE, 0, speed); + return true; + } /* The embedded comparison operand is completely free. */ if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
Re: [PATCH 3/4] Set costs for jumps in combine
On 11/22/19 1:42 AM, Segher Boessenkool wrote: > On Thu, Nov 21, 2019 at 02:36:53PM +0100, Bernd Schmidt wrote: >> Thanks. Just FYI, this is held up a little. I decided I'd also test on >> x86, and there it shows a case where ix86_rtx_cost misses something: the >> i386/pr30315.c testcase wants to combine compares into addition+jump on >> carry, but the rtx_costs show too high a cost for (compare (plus)). I'm >> testing a fix for that in i386.c. > > Maybe i386 should implement the insn_cost hook as well? For most targets > that is a lot simpler to get right than rtx_cost, but allowing memory in > many insns and all the different insn lengths complicates matters. At > least insn_cost isn't inside-out, that should make it easier to deal with > already. That kind of thing is up to the x86 maintainers. I think the problem at hand can be fixed quite simply by detecting PLUS inside COMPARE and just counting it like we would a normal PLUS. Patch will follow once testing is complete. Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/21/19 1:30 PM, Matthias Klose wrote: > > that would be apt build-dep gcc-9. The former would only install the build > dependencies of the gcc-defaults package. That gets me E: You must put some 'source' URIs in your sources.list where /etc/apt/sources.list looks like deb http://ftp.ports.debian.org/debian-ports sid main deb-src http://ftp.ports.debian.org/debian-ports sid main which googling suggests is what I want? Bernd
Re: [PATCH 3/4] Set costs for jumps in combine
On 11/13/19 5:16 PM, Segher Boessenkool wrote: > On Wed, Nov 13, 2019 at 02:13:48PM +0100, Bernd Schmidt wrote: >> Also, it does not compute costs for jump >> insns, so they are always set to zero. As a consequence, any possible >> substitution is performed if a combination into a jump is possible, >> which turns out isn't really desirable on m68k with cbranch patterns. >> >> This patch simply removes a test for NONJUMP_INSN_P. Bootstrapped and >> tested on the gcc135 machine (powerpc64le-unknown-linux-gnu). > > I wonder why that test was there. It was added in r84513, which is where > insn_rtx_cost was made from combine_insn_cost, which didn't have that > non-jump thing yet. > > It is still stage 1, so we'll find out if any target breaks I guess. > Okay for trunk. Thanks! Thanks. Just FYI, this is held up a little. I decided I'd also test on x86, and there it shows a case where ix86_rtx_cost misses something: the i386/pr30315.c testcase wants to combine compares into addition+jump on carry, but the rtx_costs show too high a cost for (compare (plus)). I'm testing a fix for that in i386.c. Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/20/19 8:27 PM, Mikael Pettersson wrote: > On Wed, Nov 20, 2019 at 3:16 PM Bernd Schmidt wrote: >> Probably best to just run tests on stage1 and hope something shows up. > > Ok, how do I did that? I've always just done 'make -k check' after > full bootstraps. > I assume the stage 1 artifacts are the ones in the prev-* directories. There's a --disable-bootstrap configure option. >> What distro are you using for native builds? The m68k debian I'm using >> does not have an installable gcc package. > > I run a bespoke distro on m68k and sparc64, derived from Fedora but > massively cut down in size, with target patches done by myself or > ported from Debian and Gentoo as necessary. > > Debian/m68k not having a gcc package? That sounds odd; I see e.g. a > gcc-9 in http://ftp.ports.debian.org/debian-ports/pool-m68k/main/g/ Turns out I wasn't sufficiently familiar with how debian works. I was missing an "apt update" step. I kind of assumed a package like gcc would install out of the box (others did, things like emacs). Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/20/19 2:50 PM, Mikael Pettersson wrote: > On Mon, Nov 18, 2019 at 9:57 PM Mikael Pettersson > wrote: >> >> On Mon, Nov 18, 2019 at 8:31 PM Bernd Schmidt wrote: >>> >>> Hi Mikael, >>> >>>> This fixed the problem, thanks. >>> >>> Could you also run the testsuite to see if you can reproduce the >>> g++.old-deja failures Andreas reported? >> >> Yes, but it will probably take another week before the native >> bootstrap (on aranym) and test suite run is finished. It's currently >> in stage 2. Ugh, that suggests the stage2 compiler was miscompiled. That would be nasty to track down. Probably best to just run tests on stage1 and hope something shows up. What distro are you using for native builds? The m68k debian I'm using does not have an installable gcc package. Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
Hi Mikael, > This fixed the problem, thanks. Could you also run the testsuite to see if you can reproduce the g++.old-deja failures Andreas reported? Bernd
Re: [PATCH 2/4] The main m68k cc0 conversion
(Apologies to Jeff who's getting this twice because I didn't hit reply-all the first time.) On 11/17/19 6:56 PM, Jeff Law wrote: > While scanning this patch I did notice the introduction of > CC_STATUS_INIT in output_{and,ior,xor}si. You might want to check that. That is intentional. CC_STATUS_INIT can be used without cc0, and there's precedent in (IIRC) ARM. We need it to get final to tell us when it encounters a label - those don't make it to the postscan_insn hook. > So unless there's objections over the next say 48-72 hrs, let's get the > kit in and we can iterate if there's further issues that need resolving. Cool. I'll wait for a final confirmation. Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/16/19 9:18 AM, Andreas Schwab wrote: > On Nov 16 2019, Bernd Schmidt wrote: > >> Well, there has to be some difference between what you are doing and >> what I am doing, because: >> >> Running /local/src/egcs/git/gcc/testsuite/g++.old-deja/old-deja.exp ... >> >> === g++ Summary === >> >> # of expected passes 26826 >> # of expected failures 82 >> # of unsupported tests 157 >> /local/src/egcs/bm68k-test/gcc/xg++ version 10.0.0 20191101 >> (experimental) (GCC) > > === g++ Summary === > > # of expected passes 170041 > # of unexpected failures 74 > # of expected failures708 > # of unresolved testcases 2 > # of unsupported tests7419 > /daten/aranym/gcc/gcc-20191115/Build/gcc/xg++ version 10.0.0 20191114 > (experimental) [trunk revision 278266] (GCC) Once again, that doesn't help me track things down. A bug report without instructions to reproduce is useless. Could you please provide me the generated assembly files with -fverbose-asm that I asked for? Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/15/19 11:50 PM, Andreas Schwab wrote: > On Nov 15 2019, Bernd Schmidt wrote: > >> I meant the compiler command line of course... for any -mcpu flags that >> might differ from my test run. > > There are none. Well, there has to be some difference between what you are doing and what I am doing, because: Running /local/src/egcs/git/gcc/testsuite/g++.old-deja/old-deja.exp ... === g++ Summary === # of expected passes26826 # of expected failures 82 # of unsupported tests 157 /local/src/egcs/bm68k-test/gcc/xg++ version 10.0.0 20191101 (experimental) (GCC) Is there anything you think you can give me to help reproduce this? Before/after assembly files, generated with -fverbose-asm? Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/15/19 10:58 PM, Andreas Schwab wrote: > On Nov 15 2019, Bernd Schmidt wrote: > >> Any chance you could show the command lines from the log files or some >> other way of reproducing the issue? > > Executing on aranym: OMP_NUM_THREADS=2 > LD_LIBRARY_PATH=.:/daten/aranym/gcc/gcc-20191115/Build/m68k-linux/./libstdc++-v3/src/.libs:/daten/aranym/gcc/gcc-20191115/Build/m68k-linux/./libstdc++-v3/src/.libs:/daten/aranym/gcc/gcc-20191115/Build/gcc:/daten/aranym/gcc/gcc-20191115/Build/gcc > timeout 1200 ./dyncast1.exe (timeout = 300) > Executed ./dyncast1.exe, status 1 > Output: Error 25 > child process exited abnormally > FAIL: g++.old-deja/g++.other/dyncast1.C -std=c++17 execution test I meant the compiler command line of course... for any -mcpu flags that might differ from my test run. Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/15/19 5:34 PM, Andreas Schwab wrote: > On Nov 15 2019, Bernd Schmidt wrote: > >> Are these with the patch? > > Yes. > >> Are you on real hardware > > No, I'm using aranym. Any chance you could show the command lines from the log files or some other way of reproducing the issue? Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/15/19 2:48 PM, Andreas Schwab wrote: > Here are the results of running the testsuite on m68k-linux: > > http://gcc.gnu.org/ml/gcc-testresults/2019-11/msg00908.html > > This is a list of regressions: Are these with the patch? I'm not seeing any of these in my testing with qemu. Are you on real hardware (and on which hardware?), and can you do anything to help narrow down what's going wrong? Bernd
Re: [PATCH 1/4] Preliminary m68k patches
On 11/13/19 9:03 PM, Jeff Law wrote: > OK. I'd actually recommend this go ahead and get installed. My tester > will bootstrap it overnight. Alright, let me know how that turns out. What kind of machine do you have for that? Bernd
Re: [PATCH 0/4] Eliminate cc0 from m68k
On 11/13/19 7:16 PM, Segher Boessenkool wrote: > I tried this out with a kernel build (just the defconfig). > during RTL pass: jump2 > /home/segher/src/kernel/fs/binfmt_elf.c: In function 'elf_core_dump': > /home/segher/src/kernel/fs/binfmt_elf.c:2409:1: internal compiler error: in > patch_jump_insn, at cfgrtl.c:1290 > Can you reproduce that? Yes. It's actually an issue I spotted at one point, but I thought to myself "I'll just leave it, I want to make the minimum amount of changes". Should have thought harder. The constraints for comparison patterns in m68k.md allow constants as the first operand of a comparison. They also use nonimmediate_operand. Hence, the internal error you saw when something tries to rerecognize the instruction. The following should fix it, but it's only very lightly tested so far. I'll merge it into patch 2. Bernd * config/m68k/m68k.md (cmp1_constraints): Don't allow constants. diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index bb46e5880e2..56685db0c72 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -488,7 +488,7 @@ ;; and operand predicates. So to be safe, just don't allow the PC-rel (define_mode_attr scc0_constraints [(QI "=d,d,d") (HI "=d,d,d,d,d") (SI "=d,d,d,d,d,d")]) -(define_mode_attr cmp1_constraints [(QI "dn,dm,>") (HI "rnm,d,n,m,>") (SI "r,rKT,rKs,mr,ma,>")]) +(define_mode_attr cmp1_constraints [(QI "dn,dm,>") (HI "rnm,d,n,m,>") (SI "r,r,r,mr,ma,>")]) (define_mode_attr cmp2_constraints [(QI "dm,nd,>") (HI "d,rnm,m,n,>") (SI "mrC0,mr,ma,KTrC0,Ksr,>")]) ;; Note that operand 0 of an SCC insn is supported in the hardware as
[PATCH 4/4] Fix autoinc cbranch
After the m68k cc0 conversion, there is one code quality regression that I can see: we no longer generate autoinc addressing modes in comparisons. This is because the parts of the compiler that generate autoinc are unwilling to substitute into jumps. If you look at the code in reload, you'll see that it's careful around jumps at find_reload time, and the code to perform autoinc reloads does try to put all the extra code before the instruction. LRA seems to have copied most of that code. Also, in the former cc0 reality, a compare wasn't really any different from a jump on m68k: we can't have a reload after the instruction in either case. Any kind of move or arithmetic would clobber the flags. That leads me to believe that there is no issue with autoinc in jumps, hence this patch. Bootstrapped and tested on the gcc135 machine (powerpc64le-unknown-linux-gnu). I don't really expect this to get approved; alternatively I could write some peepholes which would generate the same code as long as register pressure doesn't get too high. Bernd * auto-inc-dec.c (merge_in_block): Allow jumps. * combine.c (can_combine_p): Allow jumps in autoinc. diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c index bdb6efa..6dab135 100644 --- a/gcc/auto-inc-dec.c +++ b/gcc/auto-inc-dec.c @@ -1441,10 +1441,11 @@ merge_in_block (int max_reg, basic_block bb) continue; } - /* This continue is deliberate. We do not want the uses of the -jump put into reg_next_use because it is not considered safe to -combine a preincrement with a jump. */ - if (JUMP_P (insn)) + /* We used to skip jump insns, but both reload and LRA seem to +take precautions not to perform autoinc reloads after a jump or +a comparison. Allow them for regular autoinc only (for test +coverage reasons more than anything). */ + if ((HAVE_PRE_MODIFY_REG || HAVE_POST_MODIFY_REG) && JUMP_P (insn)) continue; if (dump_file) diff --git a/gcc/combine.c b/gcc/combine.c index 857ea30..e9e1464 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2119,15 +2118,12 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED, /* If INSN contains an autoincrement or autodecrement, make sure that register is not used between there and I3, and not already used in - I3 either. Neither must it be used in PRED or SUCC, if they exist. - Also insist that I3 not be a jump; if it were one - and the incremented register were spilled, we would lose. */ + I3 either. Neither must it be used in PRED or SUCC, if they exist. */ if (AUTO_INC_DEC) for (link = REG_NOTES (insn); link; link = XEXP (link, 1)) if (REG_NOTE_KIND (link) == REG_INC - && (JUMP_P (i3) - || reg_used_between_p (XEXP (link, 0), insn, i3) + && (reg_used_between_p (XEXP (link, 0), insn, i3) || (pred != NULL_RTX && reg_overlap_mentioned_p (XEXP (link, 0), PATTERN (pred))) || (pred2 != NULL_RTX
[PATCH 3/4] Set costs for jumps in combine
The combiner is somewhat strange about how it uses costs. If any of the insns involved in a comparison have a cost of 0, it does not verify that the substitution is cheaper. Also, it does not compute costs for jump insns, so they are always set to zero. As a consequence, any possible substitution is performed if a combination into a jump is possible, which turns out isn't really desirable on m68k with cbranch patterns. This patch simply removes a test for NONJUMP_INSN_P. Bootstrapped and tested on the gcc135 machine (powerpc64le-unknown-linux-gnu). Bernd * combine.c (combine_instructions): Record costs for jumps. diff --git a/gcc/combine.c b/gcc/combine.c index 857ea30dafd..9446d2769ab 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -1234,8 +1234,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) insn); /* Record the current insn_cost of this instruction. */ - if (NONJUMP_INSN_P (insn)) - INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p); + INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p); if (dump_file) { fprintf (dump_file, "insn_cost %d for ", INSN_COST (insn));
[PATCH 2/4] The main m68k cc0 conversion
This achieves the conversion by using combined cbranch/cstore patterns, and using a mechanism similar to the cc_status tracking to elide certain comparisons. Unlike cc_status, this is opt-in and requires a flags_valid attribute to be set for suitable instructions. Due to lack of test hardware, this conversion is omitted for a number of coldfire patterns as opposed to normal m68k. For DImode comparisons, scc_di and beq0_di/bne0_di patterns already existed and are reused. The bgt0_di/ble0_di patterns are replaced with expander code to test just the high word, along with some smarts in m68k_find_flags_value. Bernd
Re: [PATCH 1/4] Preliminary m68k patches
This tidies up a few spots in the m68k backend in preparation for the large patch to follow. This is purely for review purposes: this patch has not been tested independently, and will be committed together with the following one. Noteworthy changes: Some patterns and peepholes were unified through mode iterators. The m68k_subword_comparison_operator predicate was adapted to also work with SImode. There are already scc_di patterns, so there is no need to generate a cc0 set/use pair in cstoredi. Without HAVE_cc0, combine sometimes substitutes a stack push into the destination of a divmod instruction, and then gets confused because it doesn't seem to expect it in a PARALLEL. Since the instruction only works on registers anyway, use register_operand. There are patterns that use register_operand with "do" constraints which allow memory. This works at reload time, but the instruction can not be rerecognized later on. This becomes a problem if such operands occur in a jump instruction, as subsequent passes will try to redirect branches and thus attempt to rerecognize the pattern. movqi/movhi do not accept constants that are not CONST_INT. The code to output them would not set flags correctly and was changed to gcc_unreachable. Comments were added to some patterns which are not being generated due to incorrect tests/predicates. Fixing these is out of scope for this work, but the problems are at least documented. All the passes working on conditional traps seem to assume const_true_rtx is used for unconditional ones, rather than const1_rtx. Bernd * config/m68k/m68k.c (output_move_himode, output_move_qimode): Replace code for non-CONST_INT constants with gcc_unreachable. * config/m68k/m68k.md (cbranchdi): Don't generate individual compare and test. (CMPMODE): New mode_iterator. (cbranchsi4, cbranchqi4, cbranchhi4): Replace expanders with cbranch4. (cstoresi4, cstoreqi4, cstorehi4): Replace expanders with cstore4. (cmp_68881): Remove 'F' constraint from first comparison operand. (bit test insns patterns): Use nonimmediate_operand, not register_operand, for source operands that allow memory in their constraints. (divmodsi4, udivmodsi4, divmodhi4 and related unnamed patterns): Use register_operand, not nonimmediate_operand, for the destinations. (DBCC): New mode_iterator. (dbcc peepholes): Use it to reduce duplication. (trap): Use const_true_rtx, not const1_rtx. * config/m68k/predicates.md (m68k_comparison_operand): Renamed from m68k_subword_comparison_operand and changed to handle SImode. diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index 1030dfa5957..4f3503b9118 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -3072,7 +3072,7 @@ output_move_simode (rtx *operands) const char * output_move_himode (rtx *operands) { - if (GET_CODE (operands[1]) == CONST_INT) + if (GET_CODE (operands[1]) == CONST_INT) { if (operands[1] == const0_rtx && (DATA_REG_P (operands[0]) @@ -3094,7 +3094,7 @@ output_move_himode (rtx *operands) return "move%.w %1,%0"; } else if (CONSTANT_P (operands[1])) -return "move%.l %1,%0"; +gcc_unreachable (); return "move%.w %1,%0"; } @@ -3103,7 +3103,7 @@ output_move_qimode (rtx *operands) { /* 68k family always modifies the stack pointer by at least 2, even for byte pushes. The 5200 (ColdFire) does not do this. */ - + /* This case is generated by pushqi1 pattern now. */ gcc_assert (!(GET_CODE (operands[0]) == MEM && GET_CODE (XEXP (operands[0], 0)) == PRE_DEC @@ -3134,7 +3134,7 @@ output_move_qimode (rtx *operands) if (operands[1] == const0_rtx && ADDRESS_REG_P (operands[0])) return "sub%.l %0,%0"; if (GET_CODE (operands[1]) != CONST_INT && CONSTANT_P (operands[1])) -return "move%.l %1,%0"; +gcc_unreachable (); /* 68k family (including the 5200 ColdFire) does not support byte moves to from address registers. */ if (ADDRESS_REG_P (operands[0]) || ADDRESS_REG_P (operands[1])) diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index 31e8767e7e3..e60978150d1 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -456,19 +456,14 @@ (match_operand:DI 3 "general_operand")]))] "" { - if (operands[3] == const0_rtx) -emit_insn (gen_tstdi (operands[2])); - else -emit_insn (gen_cmpdi (operands[2], operands[3])); - operands[2] = cc0_rtx; - operands[3] = const0_rtx; }) +(define_mode_iterator CMPMODE [QI HI SI]) -(define_expand "cbranchsi4" +(define_expand "cbranch4" [(set (cc0) - (compare (match_operand:SI 1 "nonimmediate_operand" "") - (match_operand:SI 2 "general_operand" ""))) + (compare (match_operand:CMPMODE 1 "nonimmediate_operand" "") +
[PATCH 0/4] Eliminate cc0 from m68k
This is a set of patches to convert m68k so that it no longer uses cc0. The approach is to combine cc0 setter/user pairs into cbranch and cstore patterns. It does not expose the flag register directly. Since m68k is a target that is not under active development, and probably receives very limited testing, I felt it was important to make it generate as close to the same code as previously. Also, given that the target clobbers the flags for pretty much every move, it seems unlikely that there's much value to be had from anything more complex. Trying to model every instruction's effect on the flags would be too error-prone for not nearly enough gain. The cc0 machinery allows for eliminating unnecessary comparisons by examining the effect instructions have on the flags registers. I have replicated that mechanism with a relatively modest amount of code based on a final_postscan_insn hook, but it is now opt-in: an instruction pattern can set the "flags_valid" attribute to a number of possible values to indicate what effect it has. That should be more reliable (try git log m68k.md to see recent sprinkling of CC_STATUS_INIT to squash bugs with the previous mechanism). We can remember either values where the flags indicate a comparison against zero (after practically all arithmetic and move insns), or alternatively record two comparison operands to eliminate identical compares. I stopped adding optimizations once I found it hard to find any meaningful differences in generated code. In particular, the m68k.exp tests which verify that these optimizations are performed all still pass. Testing was done with the qemu-system-m68k/debian combination. I do not have access to Coldfire hardware, and I tried to be somewhat conservative, for example by not adding "flags_valid" everywhere it would probably be possible. For someone with access to the hardware, it should be trivial to add such attributes and test that everything still works. I'll have to rerun my final tests because test_summary made a mess of things, but as far as I am able to tell, there are no regressions, and the patch set even fixes some failures in libstdc++. The first and second patch contain the m68k changes. They are separated only to make the review easier, they were not tested separately (since the time for a test run is measured in days). The first patch contains preliminary cleanup and fixes, the second the main cc0 conversion. After that, there are some changes in the rest of the compiler. Bernd
Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations
On 08/27/2017 09:36 PM, Jon Beniston wrote: > I have an out-of-tree GCC port and it is struggling supporting > auto-vectorization on some dot product instructions. For example, I have an > instruction that takes three operands which are all 32-bit general > registers. The second and third operands will be treated as V2HI then do dot > product, and then generate an SI result which is then added to the first > operand which is SI as well. This seems to ring a bell. I think I submitted a patch for this here - is this the same problem? https://gcc.gnu.org/ml/gcc-patches/2010-12/msg01724.html Bernd
Re: MAINTAINERS update
On 06/11/2017 08:03 PM, Gerald Pfeifer wrote: > On Tue, 30 May 2017, Bernd Schmidt wrote: >> On 05/30/2017 09:05 AM, Richard Biener wrote: >>> This leaves the nvptx and c6x ports without a maintainer. Do >>> you have any recommendations for a successor here? >> Not really. It would be a shame to lose the C6X port though. If I'm >> CC'd on any bug reports I'm prepared to keep it working - if that's >> considered sufficient, I can readd myself as maintainer. > > I think that would be preferrable. Even if practically it may > not make a huge difference, people with less background/involvement > will know who to contact, and having an entire port without maintainer > just doesn't feel right. I've done that now. Bernd Index: MAINTAINERS === --- MAINTAINERS (revision 249919) +++ MAINTAINERS (working copy) @@ -49,6 +49,7 @@ arm port Richard Earnshaw avr port Denis Chertykov <cherty...@gmail.com> bfin port Jie Zhang <jzhang...@gmail.com> +c6x port Bernd Schmidt <bernds_...@t-online.de> cris port Hans-Peter Nilsson <h...@axis.com> epiphany port Joern Rennecke <g...@amylaar.uk> fr30 port Nick Clifton <ni...@redhat.com> Index: ChangeLog === --- ChangeLog (revision 249919) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2017-07-03 Bernd Schmidt <bschm...@redhat.com> + + * MAINTAINERS: Readd myself for c6x. + 2017-06-28 Martin Liska <mli...@suse.cz> PR bootstrap/81217
Re: MAINTAINERS update
On 05/30/2017 09:05 AM, Richard Biener wrote: > This leaves the nvptx and c6x ports without a maintainer. Do you have > any recommendations for a successor here? Not really. It would be a shame to lose the C6X port though. If I'm CC'd on any bug reports I'm prepared to keep it working - if that's considered sufficient, I can readd myself as maintainer. It hasn't required much attention over the years anyway. Bernd
Re: MAINTAINERS update
On 05/27/2017 12:52 PM, Bernd Schmidt wrote: I am no longer working for Red Hat, so I've updated my email address. Also, I don't expect to be around very much in the near future, so I've removed myself as maintainer for some areas. Judging by a reply I got, I may have been too terse. No need to worry, I'm just choosing to do something else for a while and reading gcc mailing lists will not be a priority in the near term. Bernd
MAINTAINERS update
I am no longer working for Red Hat, so I've updated my email address. Also, I don't expect to be around very much in the near future, so I've removed myself as maintainer for some areas. Bernd Index: ChangeLog === --- ChangeLog (revision 248535) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2017-05-27 Bernd Schmidt <bschm...@redhat.com> + + * MAINTAINERS: Update my email address, and remove myself as + maintainer in some areas. + 2017-05-25 Eric Gallager <eg...@gwmail.gwu.edu> * MAINTAINERS: Add self to Write After Approval Index: MAINTAINERS === --- MAINTAINERS (revision 248535) +++ MAINTAINERS (working copy) @@ -30,7 +30,7 @@ Michael Meissner<gnu@the-meissners.o Jason Merrill <ja...@redhat.com> David S. Miller <da...@redhat.com> Joseph Myers <jos...@codesourcery.com> -Bernd Schmidt <bschm...@redhat.com> +Bernd Schmidt <bernds_...@t-online.de> Ian Lance Taylor<i...@airs.com> Jim Wilson <wil...@tuliptree.org> @@ -48,9 +48,7 @@ arm port Nick Clifton <ni...@redhat.co arm port Richard Earnshaw <richard.earns...@arm.com> arm port Ramana Radhakrishnan <ramana.radhakrish...@arm.com> avr port Denis Chertykov <cherty...@gmail.com> -bfin port Bernd Schmidt <bschm...@redhat.com> bfin port Jie Zhang <jzhang...@gmail.com> -c6x port Bernd Schmidt <bschm...@redhat.com> cris port Hans-Peter Nilsson <h...@axis.com> epiphany port Joern Rennecke <g...@amylaar.uk> fr30 port Nick Clifton <ni...@redhat.com> @@ -85,7 +83,6 @@ nds32 port Chung-Ju Wu <jasonwucj@gmai nds32 port Shiva Chen <shiva0...@gmail.com> nios2 port Chung-Lin Tang <clt...@codesourcery.com> nios2 port Sandra Loosemore <san...@codesourcery.com> -nvptx port Bernd Schmidt <bschm...@redhat.com> pdp11 port Paul Koning <n...@arrl.net> picochip port Daniel Towner <d...@picochip.com> riscv port Kito Cheng <kito.ch...@gmail.com> @@ -231,7 +228,6 @@ tree browser/unparser Sebastian Pop profile feedback Jan Hubicka <hubi...@ucw.cz> reload Ulrich Weigand <uweig...@de.ibm.com> -reload Bernd Schmidt <bschm...@redhat.com> dfp.c, related Ben Elliston <b...@gnu.org> RTL optimizers Eric Botcazou <ebotca...@libertysurf.fr> instruction combiner Segher Boessenkool <seg...@kernel.crashing.org>
PR78972, 80283: Extend TER with scheduling
If you look at certain testcases like the one for PR78972, you'll find that the code generated by TER is maximally pessimal in terms of register pressure: we can generate a large number of intermediate results, and defer all the statements that use them up. Another observation one can make is that doing TER doesn't actually buy us anything for a large subset of the values it finds: only a handful of places in the expand phase actually make use of the information. In cases where we know we aren't going to be making use of it, we could move expressions freely without doing TER-style substitution. This patch uses the information collected by TER about the moveability of statements and performs a mini scheduling pass with the aim of reducing register pressure. The heuristic is fairly simple: something that consumes more values than it produces is preferred. This could be tuned further, but it already produces pretty good results: for the 78972 testcase, the stack size is reduced from 2448 bytes to 288, and for PR80283, the stackframe of 496 bytes vanishes with the pass enabled. In terms of benchmarks I've run SPEC a few times, and the last set of results showed not much of a change. Getting reproducible results has been tricky but all runs I made have been within 0%-1% improvement. In this patch, the changed behaviour is gated with a -fschedule-ter option which is off by default; with that default it bootstraps and tests without regressions. The compiler also bootstraps with the option enabled, in that case there are some optimization issues. I'll address some of them with two followup patches, the remaining failures are: * a handful of guality/PR43077.c failures Debug insn generation is somewhat changed, and the peephole2 pass drops one of them on the floor. * three target/i386/bmi-* tests fail. These expect the combiner to build certain instruction patterns, and that turns out to be a little fragile. It would be nice to be able to use match.pd to produce target-specific patterns during expand. Thoughts? Ok to apply? Bernd PR middle-end/80283 PR tree-optimization/78972 * cfgexpand.c (should_forward_stmt_p, decide_schedule_stmt, resolve_deps, rank_ready_insns, add_dep, set_bb_uids, set_prios, floating_point_op_p, simple_schedule, expand_one_gimple_stmt): New static functions. (struct gimple_dep, struct gimple_sched_state): New structs. (sched_map): New static variable. (expand_gimple_basic_block): Use expand_one_gimple_state, and use the results from simple_schedule if flag_schedule_ter. * common.opt (fschedule-ter): New option. * toplev.c (process_options): Set flag_tree_ter if flag_schedule_ter. * doc/invoke.texi (Optimization Options): Add -fschedule-ter. (-fschedule-ter): Document. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 66af699..e1b4898 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -5337,6 +5337,345 @@ expand_debug_locations (void) flag_strict_aliasing = save_strict_alias; } +/* Determine whether TER decided that a statment should be forwarded. */ + +static bool +should_forward_stmt_p (gimple *stmt) +{ + if (!SA.values) +return false; + + def_operand_p def_p; + def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF); + if (def_p == NULL) +return false; + + /* Forward this stmt if it is in the list of + replaceable expressions. */ + if (bitmap_bit_p (SA.values, + SSA_NAME_VERSION (DEF_FROM_PTR (def_p +return true; + + return false; +} + +/* Decide whether STMT should be scheduled, or left as a TER replacement. + Clear it as a replacement if we decide to schedule it. */ + +static bool +decide_schedule_stmt (gimple *stmt) +{ + if (!SA.values) +return false; + + if (is_gimple_debug (stmt)) +return true; + + def_operand_p def_p; + def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF); + + if (def_p == NULL) +return false; + + tree def = DEF_FROM_PTR (def_p); + use_operand_p use_p; + gimple *use_stmt; + if (!single_imm_use (def, _p, _stmt)) +return false; + + /* There are only a few cases where expansion can actually make use of + replaceable SSA names. Since TER is pessimal for scheduling in + general, we try to limit what we do to cases where we know it is + beneficial, and schedule insns for the other cases. */ + tree_code def_c = ERROR_MARK; + if (gimple_code (stmt) == GIMPLE_ASSIGN) +def_c = gimple_assign_rhs_code (stmt); + + if (gimple_code (use_stmt) == GIMPLE_ASSIGN) +{ + tree_code c = gimple_assign_rhs_code (use_stmt); + if (TREE_CODE_CLASS (c) != tcc_comparison + && c != FMA_EXPR + && c != SSA_NAME + && c != MEM_REF + && c != TARGET_MEM_REF + && def_c != VIEW_CONVERT_EXPR) + { + if (bitmap_clear_bit (SA.values, SSA_NAME_VERSION (def))) + return true; + } +} + return false; +} + +struct gimple_dep; + +/* Record state for a single statement during simple_sched. */ +struct gimple_sched_state +{ + gimple
Re: [PATCH 1/5] nvptx: implement SIMT enter/exit insns
On 03/27/2017 12:56 PM, Alexander Monakov wrote: Hello Bernd, Can you have a look at this patch (unchanged from previous posting in January)? The rest of the patches in the set are reviewed. On Wed, 22 Mar 2017, Alexander Monakov wrote: This patch adds handling of new omp_simt_enter/omp_simt_exit named insns in the NVPTX backend. * config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare. (nvptx_output_simt_exit): Declare. * config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use cfun->machine->unisimt_location. Handle NULL unisimt_predicate. (init_softstack_frame): Move initialization of crtl->is_leaf to... (nvptx_declare_function_name): ...here. Emit declaration of local memory space buffer for omp_simt_enter insn. (nvptx_output_unisimt_switch): New. (nvptx_output_softstack_switch): New. (nvptx_output_simt_enter): New. (nvptx_output_simt_exit): New. * config/nvptx/nvptx.h (struct machine_function): New fields has_simtreg, unisimt_location, simt_stack_size, simt_stack_align. * config/nvptx/nvptx.md (UNSPECV_SIMT_ENTER): New unspec. (UNSPECV_SIMT_EXIT): Ditto. (omp_simt_enter_insn): New insn. (omp_simt_enter): New expansion. (omp_simt_exit): New insn. * config/nvptx/nvptx.opt (msoft-stack-reserve-local): New option. Technically this whole series isn't a regression fix, but since Jakub has acked the rest, this is OK too. Bernd
LRA fix for 80160
This fixes two PRs; we shouldn't try to avoid spilling a reg if it has an alternate class it can use. Bootstrapped and tested on x86_64-linux, approved by Vlad, committed. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 246472) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2017-03-25 Bernd Schmidt <bschm...@redhat.com> + + PR rtl-optimization/80160 + PR rtl-optimization/80159 + * lra-assigns.c (must_not_spill_p): Tighten new test to also take + reg_alternate_class into account. + 2017-03-24 Vladimir Makarov <vmaka...@redhat.com> PR target/80148 Index: gcc/lra-assigns.c === --- gcc/lra-assigns.c (revision 246472) +++ gcc/lra-assigns.c (working copy) @@ -908,7 +908,8 @@ must_not_spill_p (unsigned spill_regno) does not solve the general case where existing reloads fully cover a limited register class. */ if (!bitmap_bit_p (_reload_pseudos, spill_regno) - && reg_class_size [reg_preferred_class (spill_regno)] == 1) + && reg_class_size [reg_preferred_class (spill_regno)] == 1 + && reg_alternate_class (spill_regno) == NO_REGS) return true; return false; } Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 246472) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2017-03-25 Bernd Schmidt <bschm...@redhat.com> + + PR rtl-optimization/80160 + PR rtl-optimization/80159 + + * gcc.target/i386/pr80160.c: New test. + 2017-03-24 Jakub Jelinek <ja...@redhat.com> PR sanitizer/79904 Index: gcc/testsuite/gcc.target/i386/pr80160.c === --- gcc/testsuite/gcc.target/i386/pr80160.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr80160.c (working copy) @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-omit-frame-pointer -w" } */ +/* { dg-additional-options "-march=pentium-mmx" { target ia32 } } */ + +typedef struct { long long a; } a_t; +int *a, b; +a_t *e, c; +long long f; +void fn (int); +void fn2 (void); +int fn3 (a_t); +void fn4 (a_t); +a_t foo (long long val) { return (a_t){val}; } +static void +bar (int ka) +{ + unsigned i; + for (i = 0; i < 512; i++) { +long d; +c = (a_t){d}; +fn2 (); + } + fn (ka); +} +void +test (void) +{ + a_t g; + a_t *h, j; + h = e; + j = *h; + if (e == (a_t *) 1) { +a_t k = {fn3 (j)}; +fn4 (j); +long l; +g = foo((long long)b << 2 | l); +k = g; +if (j.a != k.a) { + a_t m = g; + int n = m.a, o = m.a >> 32; + asm ("# %0 %1 %2 %3" : "=m"(*a), "+A"(f) : "b"(n), "c"(o)); +} + } + bar ((int) h); +}
Re: [PATCH] Decrease compile time memory with heavy find_base_{value,term} on i?86/x86_64 (PR rtl-optimization/63191, take 2)
On 03/22/2017 04:38 PM, Uros Bizjak wrote: LGTM, but I don't want to step on Bernd's toes, so let's wait for his opinion. I was waiting for yours really, that's the one that counts. Bernd
Re: [PATCH] Fix tree-prof/pr66295.c
On 03/15/2017 09:59 PM, Segher Boessenkool wrote: This testcase can only ever be built on x86 (it needs the "avx*" attributes). This patch skips the test elsewhere. Is this okay for trunk? Ok. Bernd
Re: [PATCH] Remove dead stores and initializations
On 03/16/2017 01:31 PM, Markus Trippelsdorf wrote: clang --analyze pointed out a number of dead stores and initializations. Tested on ppc64le. Ok for trunk? I'd say - not now. Ideally someone would delve into the commit history to figure out what happened with each of these, and whether any of these indicate bugs. Bernd
Document PR79806 as a non-bug
I suggest we apply the following and close the PR as INVALID (not a bug). Ok? Bernd Index: pr65693.c === --- pr65693.c (revision 245685) +++ pr65693.c (working copy) @@ -2,6 +2,11 @@ /* { dg-do compile } */ /* { dg-options "-O2" } */ +/* This test relies on -O2 to optimize away the division operation + that's expanded as part of the alloca. With -O0, the explicit use + of edx causes a compilation failure, which is expected + behaviour. */ + int a; void
Re: Combiner fix for PR79910
On 03/15/2017 04:00 PM, Bernd Schmidt wrote: On 03/15/2017 12:09 AM, Bernd Schmidt wrote: I'll retest with your suggestion and with the bitmap creation conditional on i1 being nonnull. Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok? Oops, that one also has dbg_cnt stuff in it which I was going to remove. If you want to approve that along with the rest, the following bit is also needed: Index: gcc/dbgcnt.def === --- gcc/dbgcnt.def (revision 245685) +++ gcc/dbgcnt.def (working copy) @@ -145,6 +145,7 @@ DEBUG_COUNTER (asan_use_after_scope) DEBUG_COUNTER (auto_inc_dec) DEBUG_COUNTER (ccp) DEBUG_COUNTER (cfg_cleanup) +DEBUG_COUNTER (combine) DEBUG_COUNTER (cprop) DEBUG_COUNTER (cse2_move2add) DEBUG_COUNTER (dce) Bernd
Re: Combiner fix for PR79910
On 03/15/2017 12:09 AM, Bernd Schmidt wrote: I'll retest with your suggestion and with the bitmap creation conditional on i1 being nonnull. Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok? Bernd Index: gcc/combine.c === --- gcc/combine.c (revision 245685) +++ gcc/combine.c (working copy) @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. #include "valtrack.h" #include "rtl-iter.h" #include "print-rtl.h" +#include "dbgcnt.h" /* Number of attempts to combine instructions in this function. */ @@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in return true; } +/* Set up a set of registers used in an insn. Called through note_uses, + arguments as described for that function. */ + +static void +record_used_regs (rtx *xptr, void *data) +{ + bitmap set = (bitmap)data; + int i, j; + enum rtx_code code; + const char *fmt; + rtx x = *xptr; + + /* repeat is used to turn tail-recursion into iteration since GCC + can't do it when there's no return value. */ + repeat: + if (x == 0) +return; + + code = GET_CODE (x); + if (REG_P (x)) +{ + unsigned regno = REGNO (x); + unsigned end_regno = END_REGNO (x); + while (regno < end_regno) + bitmap_set_bit (set, regno++); + return; +} + + /* Recursively scan the operands of this expression. */ + + for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--) +{ + if (fmt[i] == 'e') + { + /* If we are about to do the last recursive call + needed at this level, change it into iteration. + This function is called enough to be worth it. */ + if (i == 0) + { + x = XEXP (x, 0); + goto repeat; + } + + record_used_regs ( (x, i), data); + } + else if (fmt[i] == 'E') + for (j = 0; j < XVECLEN (x, i); j++) + record_used_regs ( (x, i, j), data); +} +} + /* Try to combine the insns I0, I1 and I2 into I3. Here I0, I1 and I2 appear earlier than I3. I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into @@ -2742,6 +2794,27 @@ try_combine (rtx_insn *i3, rtx_insn *i2, added_links_insn = 0; + /* For combinations that may result in two insns, we have to gather + some extra information about registers used, so that we can + update all relevant LOG_LINKS later. */ + auto_bitmap i2_regset, i3_regset, links_regset; + if (i1) +{ + note_uses ( (i2), record_used_regs, (bitmap)i2_regset); + note_uses ( (i3), record_used_regs, (bitmap)i3_regset); + insn_link *ll; + FOR_EACH_LOG_LINK (ll, i3) + bitmap_set_bit (links_regset, ll->regno); + FOR_EACH_LOG_LINK (ll, i2) + bitmap_set_bit (links_regset, ll->regno); + if (i1) + FOR_EACH_LOG_LINK (ll, i1) + bitmap_set_bit (links_regset, ll->regno); + if (i0) + FOR_EACH_LOG_LINK (ll, i0) + bitmap_set_bit (links_regset, ll->regno); +} + /* First check for one important special case that the code below will not handle. Namely, the case where I1 is zero, I2 is a PARALLEL and I3 is a SET whose SET_SRC is a SET_DEST in I2. In that case, @@ -4004,6 +4077,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, } } + if (!dbg_cnt (combine)) +{ + undo_all (); + return 0; +} + /* If it still isn't recognized, fail and change things back the way they were. */ if ((insn_code_number < 0 @@ -4051,6 +4130,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2, return 0; } + auto_bitmap new_regs_in_i2; + if (newi2pat) +{ + /* We need to discover situations where we introduce a use of a + register into I2, where none of the existing LOG_LINKS contain + a reference to it. This can happen if previously I3 referenced + the reg, and there is an additional use between I2 and I3. We + must remove the LOG_LINKS entry from that additional use and + distribute it along with our own ones. */ + note_uses (, record_used_regs, (bitmap)new_regs_in_i2); + bitmap_and_compl_into (new_regs_in_i2, i2_regset); + bitmap_and_compl_into (new_regs_in_i2, links_regset); + + /* Here, we first look for situations where a hard register use + moved, and just give up. This should happen approximately + never, and it's not worth it to deal with possibilities like + multi-word registers. Later, when fixing up LOG_LINKS, we + deal with the case where a pseudo use moved. */ + if (!bitmap_empty_p (new_regs_in_i2) + && prev_nonnote_insn (i3) != i2 + && bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER) + { + undo_all (); + return 0; + } +} + if (MAY_HAVE_DEBUG_INSNS) { struct undo *undo; @@ -4494,6 +4600,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, NULL_RTX, NULL_RTX, NULL_RTX); } +if (newi2pat) + { + bitmap_itera
Fix C6X hwloop issue
This fixes a failure in the testsuite. When we transform the doloop pattern, the decrement of the old iteration register goes away, which is a problem if it's used after the loop (where it should have the value zero). Committed. Bernd * config/c6x/c6x.c (predicate_insn): Avoid rtl sharing failure. (hwloop_optimize): Handle case where the old iteration reg is used after the loop. Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 245685) +++ gcc/config/c6x/c6x.c (working copy) @@ -5788,8 +5789,9 @@ hwloop_optimize (hwloop_info loop) start_sequence (); insn = emit_insn (gen_mvilc (loop->iter_reg)); + if (loop->iter_reg_used_outside) +insn = emit_move_insn (loop->iter_reg, const0_rtx); insn = emit_insn (gen_sploop (GEN_INT (sp_ii))); - seq = get_insns (); if (!single_succ_p (entry_bb) || vec_safe_length (loop->incoming) > 1)
Reload fix for an old aarch64 issue
This triggered a kernel miscompilation with an old (4.8 I think) aarch64 toolchain. Here's the reloads for the insn where things go wrong: Reloads for insn # 210 Reload 0: reload_in (DI) = (reg/v/f:DI 80 [ pgdata ]) GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 0) reload_in_reg: (reg/v/f:DI 80 [ pgdata ]) reload_reg_rtx: (reg:DI 5 x5) Reload 1: reload_in (DI) = (plus:DI (reg/v/f:DI 80 [ pgdata ]) (const_int 7172 [0x1c04])) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 2 reload_in_reg: (plus:DI (reg/v/f:DI 80 [ pgdata ]) (const_int 7172 [0x1c04])) reload_reg_rtx: (reg:DI 5 x5) Note how the input and input_address reloads use the same reload register. This is intended as the two categories aren't supposed to conflict. The problem is that reload 1 is a PLUS expression, and when reloading these, gen_reload may have to resort to tricks, such as loading one of the operands (the constant 7172) into the reload register, and then adding the other operand to it. In effect that's an earlyclobber of the reload register, and that is not represented in the for_input/for_input_address classification. Most likely we haven't tripped over this earlier because on other machines the addition can be done in a single insn. The following fixes this by using RELOAD_OTHER in this case. This is pessimistic, but at that point I don't think we can really know what gen_reload will have to do. Bootstrapped and tested on x86_64-linux on the 4.7 branch - that's the best method of testing that I can think of (but I think I'll also run some c6x tests with trunk). If no one objects, I'll check this in soonish. Bernd * reload.c (find_reloads): When reloading a nonoffsettable address, use RELOAD_OTHER for it and its address reloads. Index: gcc/reload.c === --- gcc/reload.c (revision 211480) +++ gcc/reload.c (working copy) @@ -3989,14 +3989,14 @@ find_reloads (rtx insn, int replace, int (recog_data.operand[i], 0), (rtx*) 0, base_reg_class (VOIDmode, as, MEM, SCRATCH), address_mode, - VOIDmode, 0, 0, i, RELOAD_FOR_INPUT); + VOIDmode, 0, 0, i, RELOAD_OTHER); rld[operand_reloadnum[i]].inc = GET_MODE_SIZE (GET_MODE (recog_data.operand[i])); /* If this operand is an output, we will have made any reloads for its address as RELOAD_FOR_OUTPUT_ADDRESS, but now we are treating part of the operand as an input, so - we must change these to RELOAD_FOR_INPUT_ADDRESS. */ + we must change these to RELOAD_FOR_OTHER_ADDRESS. */ if (modified[i] == RELOAD_WRITE) { @@ -4005,10 +4005,10 @@ find_reloads (rtx insn, int replace, int if (rld[j].opnum == i) { if (rld[j].when_needed == RELOAD_FOR_OUTPUT_ADDRESS) - rld[j].when_needed = RELOAD_FOR_INPUT_ADDRESS; + rld[j].when_needed = RELOAD_FOR_OTHER_ADDRESS; else if (rld[j].when_needed == RELOAD_FOR_OUTADDR_ADDRESS) - rld[j].when_needed = RELOAD_FOR_INPADDR_ADDRESS; + rld[j].when_needed = RELOAD_FOR_OTHER_ADDRESS; } } }
Re: Combiner fix for PR79910
On 03/15/2017 12:03 AM, Jeff Law wrote: On 03/10/2017 04:24 PM, Bernd Schmidt wrote: PR rtl-optimization/79910 * combine.c (record_used_regs): New static function. (try_combine): Handle situations where there is an additional instruction between I2 and I3 which needs to have a LOG_LINK updated. PR rtl-optimization/79910 * gcc.dg/torture/pr79910.c: New test. What a nasty little problem. I don't like that we have to build these bitmaps due to the compile-time cost. Would it make sense to only build them when i0/i1 exist? I suppose at the moment we don't do 2->2 combinations, so we could conditionalize this on having an i1. We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the i2pattern where we might need to conjure up a LOG_LINK, right? We don't do 4->3 combinations, right? So we only have to care about when there's going to be an newi2pat, right (3->2 or 4->2). We don't ever create a newi1pat (for a 4->3 combination), right? So we only have to worry about testing when there's a newi2pat, right? Yes to all, there isn't a newi1pat, only 4->2 and 3->2 can be an issue. +if (prev_nonnote_insn (i3) != i2) + for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++) +if (bitmap_bit_p (new_regs_in_i2, r)) if (prev_nonnote_insn (i3) != i2 && bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER) Ah. I had wondered about the loop but only thought in the direction of intersecting this bitmap with one of all hard regs (and I think there isn't such a bitmap, so I kept the loop). I'll retest with your suggestion and with the bitmap creation conditional on i1 being nonnull. Bernd
Combiner fix for PR79910
In this PR, we have a few insns involved in two instruction combinations: insn 16: set r100 insn 27: some calculation insn 28: some calculation insn 32: using r100 insn 33: using r100 insn 35: some calculation Then we combine insns 27, 28 and 33, producing two output insns, As a result, insn 28 (i2) now obtains a use of r100. But insn 32, which is not involved in this combination, has the LOG_LINKS entry for that register, and we don't know that we need to update it. As a result, the second combination, involving regs 16, 32 and 35 (based on the remaining LOG_LINK for r100), produces incorrect code, as we don't realize there's a use of r100 before insn 32. The following fixes it. Bootstrapped and tested on x86_64-linux, ok (on all branches)? Bernd PR rtl-optimization/79910 * combine.c (record_used_regs): New static function. (try_combine): Handle situations where there is an additional instruction between I2 and I3 which needs to have a LOG_LINK updated. PR rtl-optimization/79910 * gcc.dg/torture/pr79910.c: New test. Index: gcc/combine.c === --- gcc/combine.c (revision 245685) +++ gcc/combine.c (working copy) @@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in return true; } +/* Set up a set of registers used in an insn. Called through note_uses, + arguments as described for that function. */ + +static void +record_used_regs (rtx *xptr, void *data) +{ + bitmap set = (bitmap)data; + int i, j; + enum rtx_code code; + const char *fmt; + rtx x = *xptr; + + /* repeat is used to turn tail-recursion into iteration since GCC + can't do it when there's no return value. */ + repeat: + if (x == 0) +return; + + code = GET_CODE (x); + if (REG_P (x)) +{ + unsigned regno = REGNO (x); + unsigned end_regno = END_REGNO (x); + while (regno < end_regno) + bitmap_set_bit (set, regno++); + return; +} + + /* Recursively scan the operands of this expression. */ + + for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--) +{ + if (fmt[i] == 'e') + { + /* If we are about to do the last recursive call + needed at this level, change it into iteration. + This function is called enough to be worth it. */ + if (i == 0) + { + x = XEXP (x, 0); + goto repeat; + } + + record_used_regs ( (x, i), data); + } + else if (fmt[i] == 'E') + for (j = 0; j < XVECLEN (x, i); j++) + record_used_regs ( (x, i, j), data); +} +} + /* Try to combine the insns I0, I1 and I2 into I3. Here I0, I1 and I2 appear earlier than I3. I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into @@ -2742,6 +2794,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, added_links_insn = 0; + auto_bitmap i2_regset, i3_regset, links_regset; + note_uses ( (i2), record_used_regs, (bitmap)i2_regset); + note_uses ( (i3), record_used_regs, (bitmap)i3_regset); + insn_link *ll; + FOR_EACH_LOG_LINK (ll, i3) +bitmap_set_bit (links_regset, ll->regno); + FOR_EACH_LOG_LINK (ll, i2) +bitmap_set_bit (links_regset, ll->regno); + if (i1) + FOR_EACH_LOG_LINK (ll, i1) + bitmap_set_bit (links_regset, ll->regno); + if (i0) +FOR_EACH_LOG_LINK (ll, i0) + bitmap_set_bit (links_regset, ll->regno); + /* First check for one important special case that the code below will not handle. Namely, the case where I1 is zero, I2 is a PARALLEL and I3 is a SET whose SET_SRC is a SET_DEST in I2. In that case, @@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2, return 0; } + auto_bitmap new_regs_in_i2; + if (newi2pat) +{ + /* We need to discover situations where we introduce a use of a + register into I2, where none of the existing LOG_LINKS contain + a reference to it. This can happen if previously I3 referenced + the reg, and there is an additional use between I2 and I3. We + must remove the LOG_LINKS entry from that additional use and + distribute it along with our own ones. */ + note_uses (, record_used_regs, (bitmap)new_regs_in_i2); + bitmap_and_compl_into (new_regs_in_i2, i2_regset); + bitmap_and_compl_into (new_regs_in_i2, links_regset); + + /* Here, we first look for situations where a hard register use + moved, and just give up. This should happen approximately + never, and it's not worth it to deal with possibilities like + multi-word registers. Later, when fixing up LOG_LINKS, we + deal with the case where a pseudo use moved. */ + if (prev_nonnote_insn (i3) != i2) + for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++) + if (bitmap_bit_p (new_regs_in_i2, r)) + { + undo_all (); + return 0; + } +} + if (MAY_HAVE_DEBUG_INSNS) { struct undo *undo; @@ -4494,6 +4594,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2, NULL_RTX, NULL_RTX, NULL_RTX); } +if (newi2pat) + { + bitmap_iterator
Re: Fix IRA issue, PR79728
Ping (minus the require-effective-target line, as Uros pointed out). Bernd On 03/03/2017 02:51 PM, Bernd Schmidt wrote: This is an ICE where setup_pressure_classes fails if xmm0 is a global reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only SSE_FIRST_REG as the third register class. The problem is that the costs for moving between SSE_FIRST_REG and SSE_REGS are inflated because we think we have no available registers in SSE_FIRST_REG (since the only register in that class is global), and that somewhat confuses the algorithm. The following fixes it by tweaking contains_regs_of_mode. Out of caution, I've retained the old meaning for code in reload which uses this. Bootstrapped and tested on x86_64-linux. Ok? Bernd
Re: [PATCH] Fix out-of-bounds write in RTL function reader (PR bootstrap/79952)
On 03/10/2017 08:03 PM, David Malcolm wrote: print-rtl.c:rtx_writer::print_rtx_operand_code_0 has some special -casing for SYMBOL_REF, but if I'm reading things right we don't yet dump SYMBOL_REF_BLOCK and SYMBOL_REF_BLOCK_OFFSET, so we'd need to dump these somehow. Yeah. Perhaps as an extra table or so to show the various blocks and the symbols in them sorted by offset - could be useful information for RTL dumps too. Bernd
Re: [PATCH] Decrease compile time memory with heavy find_base_{value,term} on i?86/x86_64 (PR rtl-optimization/63191)
On 03/10/2017 06:53 PM, Jakub Jelinek wrote: + +template +static inline rtx +ix86_delegitimize_address_tmpl (rtx x) { Why is this a template and not a function arg? Bernd
Re: [PATCH] Fix out-of-bounds write in RTL function reader (PR bootstrap/79952)
On 03/09/2017 08:28 PM, David Malcolm wrote: The root cause is an out-of-bounds memory write in the RTL dump reader when handling SYMBOL_REFs with SYMBOL_FLAG_HAS_BLOCK_INFO set. Such SYMBOL_REFs are normally created by varasm.c:create_block_symbol, which has: Hmm, I don't actually recall seeing this stuff. It's for section anchors apparently. OK for trunk in stage 4? gcc/ChangeLog: PR bootstrap/79952 * read-rtl-function.c (function_reader::read_rtx_operand): Update x with result of extra_parsing_for_operand_code_0. (function_reader::extra_parsing_for_operand_code_0): Convert return type from void to rtx, returning x. When reading SYMBOL_REF with SYMBOL_FLAG_HAS_BLOCK_INFO, reallocate x to the larger size containing struct block_symbol. Looks OK for now, but longer term I think we should make it possible to reconstruct this data. Bernd
Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)
On 03/03/2017 02:33 PM, Marek Polacek wrote: 2017-03-03 Marek PolacekPR c/79758 * c-decl.c (store_parm_decls_oldstyle): Check if the element of current_function_prototype_arg_types is error_mark_node. Fix formatting. Use TREE_VALUE instead of TREE_TYPE. * gcc.dg/noncompile/pr79758.c: New test. Ok. Bernd
Fix IRA issue, PR79728
This is an ICE where setup_pressure_classes fails if xmm0 is a global reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only SSE_FIRST_REG as the third register class. The problem is that the costs for moving between SSE_FIRST_REG and SSE_REGS are inflated because we think we have no available registers in SSE_FIRST_REG (since the only register in that class is global), and that somewhat confuses the algorithm. The following fixes it by tweaking contains_regs_of_mode. Out of caution, I've retained the old meaning for code in reload which uses this. Bootstrapped and tested on x86_64-linux. Ok? Bernd PR rtl-optimization/79728 * regs.h (struct target_regs): New field x_contains_allocatable_regs_of_mode. (contains_allocatable_regs_of_mode): New macro. * reginfo.c (init_reg_sets_1): Initialize it, and change contains_reg_of_mode so it includes global regs as well. * reload.c (push_reload): Use contains_allocatable_regs_of_mode rather thanc ontains_regs_of_mode. PR rtl-optimization/79728 * gcc.target/i386/sse-globalreg.c: New test. Index: gcc/reginfo.c === --- gcc/reginfo.c (revision 245685) +++ gcc/reginfo.c (working copy) @@ -468,19 +468,29 @@ init_reg_sets_1 (void) memset (contains_reg_of_mode, 0, sizeof (contains_reg_of_mode)); for (m = 0; m < (unsigned int) MAX_MACHINE_MODE; m++) { - HARD_REG_SET ok_regs; + HARD_REG_SET ok_regs, ok_regs2; CLEAR_HARD_REG_SET (ok_regs); + CLEAR_HARD_REG_SET (ok_regs2); for (j = 0; j < FIRST_PSEUDO_REGISTER; j++) - if (!fixed_regs [j] && HARD_REGNO_MODE_OK (j, (machine_mode) m)) - SET_HARD_REG_BIT (ok_regs, j); + if (!TEST_HARD_REG_BIT (fixed_nonglobal_reg_set, j) + && HARD_REGNO_MODE_OK (j, (machine_mode) m)) + { + SET_HARD_REG_BIT (ok_regs, j); + if (!fixed_regs[j]) + SET_HARD_REG_BIT (ok_regs2, j); + } for (i = 0; i < N_REG_CLASSES; i++) if ((targetm.class_max_nregs ((reg_class_t) i, (machine_mode) m) <= reg_class_size[i]) && hard_reg_set_intersect_p (ok_regs, reg_class_contents[i])) { - contains_reg_of_mode [i][m] = 1; - have_regs_of_mode [m] = 1; + contains_reg_of_mode[i][m] = 1; + if (hard_reg_set_intersect_p (ok_regs2, reg_class_contents[i])) + { + have_regs_of_mode[m] = 1; + contains_allocatable_reg_of_mode[i][m] = 1; + } } } } Index: gcc/regs.h === --- gcc/regs.h (revision 245685) +++ gcc/regs.h (working copy) @@ -218,6 +218,10 @@ struct target_regs { /* 1 if the corresponding class contains a register of the given mode. */ char x_contains_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE]; + /* 1 if the corresponding class contains a register of the given mode + which is not global and can therefore be allocated. */ + char x_contains_allocatable_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE]; + /* Record for each mode whether we can move a register directly to or from an object of that mode in memory. If we can't, we won't try to use that mode directly when accessing a field of that mode. */ @@ -243,6 +247,8 @@ extern struct target_regs *this_target_r (this_target_regs->x_have_regs_of_mode) #define contains_reg_of_mode \ (this_target_regs->x_contains_reg_of_mode) +#define contains_allocatable_reg_of_mode \ + (this_target_regs->x_contains_allocatable_reg_of_mode) #define direct_load \ (this_target_regs->x_direct_load) #define direct_store \ Index: gcc/reload.c === --- gcc/reload.c (revision 245685) +++ gcc/reload.c (working copy) @@ -1055,7 +1055,7 @@ push_reload (rtx in, rtx out, rtx *inloc #ifdef CANNOT_CHANGE_MODE_CLASS && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass) #endif - && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))] + && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (in))] && (CONSTANT_P (SUBREG_REG (in)) || GET_CODE (SUBREG_REG (in)) == PLUS || strict_low @@ -1164,7 +1164,7 @@ push_reload (rtx in, rtx out, rtx *inloc #ifdef CANNOT_CHANGE_MODE_CLASS && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass) #endif - && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))] + && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (out))] && (CONSTANT_P (SUBREG_REG (out)) || strict_low || (((REG_P (SUBREG_REG (out)) Index: gcc/testsuite/gcc.target/i386/sse-globalreg.c === --- gcc/testsuite/gcc.target/i386/sse-globalreg.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/sse-globalreg.c (working copy) @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2 -w" } */ +/* { dg-require-effective-target sse2 } */ + +register
Fix (work around) LRA infinite loop, PR78911
In this PR, we have an endless cycle in LRA, generating ever more instructions. The relevant portions of the dump seem to be these: ** Local #9: ** Creating newreg=130 from oldreg=128, assigning class CREG to r130 18: {r117:DI=unspec/v[[r129:SI*0x8+r123:SI],r117:DI,r122:SI,r130:SI,0x8005] 62;[r129:SI*0x8+r123:SI]=unspec/v[0] 62;flags:CCZ=unspec/v[0] 62;} ** Assignment #7: ** Assigning to 130 (cl=CREG, orig=114, freq=1148, tfirst=130, tfreq=1148)... 2nd iter for reload pseudo assignments: Reload r130 assignment failure Spill r87(hr=5, freq=4296) Spill r123(hr=4, freq=1148) Spill reload r129(hr=2, freq=1148) ** Local #11: ** Spilling non-eliminable hard regs: 6 Creating newreg=131, assigning class GENERAL_REGS to address r131 Change to class INDEX_REGS for r131 ** Assignment #8: ** Assigning to 131 (cl=INDEX_REGS, orig=131, freq=1148, tfirst=131, tfreq=1148)... Trying 0: spill 117(freq=1722) Now best 0(cost=574, bad_spills=0, insn_pseudos=1) Trying 1: spill 117(freq=1722) Trying 2: spill 130(freq=1148) Now best 2(cost=0, bad_spills=0, insn_pseudos=1) Trying 3: spill 122(freq=1148) Trying 4: spill 129(freq=1148) ** Local #12: ** Creating newreg=133 from oldreg=130, assigning class CREG to r133 What seems to happen is that in the "Assignemnt #8" phase, we're considering one new reload reg only, and we end up spilling r130, which only allows a register from a single class. Then we reload r130 again and make r133, and the cycle starts. Reload is designed in a way to avoid cycles and to process all reloads for an insn in order of ascending class so as to avoid this kind of issue. With LRA I'm really not sure how to fix this properly, but the following patch seems to cure the PR at least, by recognizing when we're about to spill a reload reg whose assigned class contains only contains a single register. Bootstrapped and tested on x86_64-linux, ok? Bernd PR rtl-optimization/78911 * lra-assigns.c (must_not_spill_p): New function. (spill_for): Use it. PR rtl-optimization/78911 * gcc.target/i386/pr78911-1.c: New test. * gcc.target/i386/pr78911-2.c: New test. Index: gcc/lra-assigns.c === --- gcc/lra-assigns.c (revision 245685) +++ gcc/lra-assigns.c (working copy) @@ -889,6 +889,30 @@ assign_temporarily (int regno, int hard_ live_pseudos_reg_renumber[regno] = hard_regno; } +/* Return true iff there is a reason why pseudo SPILL_REGNO should not + be spilled. */ +static bool +must_not_spill_p (unsigned spill_regno) +{ + if ((pic_offset_table_rtx != NULL + && spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno >= lra_constraint_new_regno_start + && ! bitmap_bit_p (_inheritance_pseudos, spill_regno) + && ! bitmap_bit_p (_split_regs, spill_regno) + && ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno) + && ! bitmap_bit_p (_optional_reload_pseudos, spill_regno))) +return true; + /* A reload pseudo that requires a singleton register class should + not be spilled. + FIXME: this mitigates the issue on certain i386 patterns, but + does not solve the general case where existing reloads fully + cover a limited register class. */ + if (!bitmap_bit_p (_reload_pseudos, spill_regno) + && reg_class_size [reg_preferred_class (spill_regno)] == 1) +return true; + return false; +} + /* Array used for sorting reload pseudos for subsequent allocation after spilling some pseudo. */ static int *sorted_reload_pseudos; @@ -960,13 +984,7 @@ spill_for (int regno, bitmap spilled_pse /* Spill pseudos. */ static_p = false; EXECUTE_IF_SET_IN_BITMAP (_pseudos_bitmap, 0, spill_regno, bi) - if ((pic_offset_table_rtx != NULL - && spill_regno == REGNO (pic_offset_table_rtx)) - || ((int) spill_regno >= lra_constraint_new_regno_start - && ! bitmap_bit_p (_inheritance_pseudos, spill_regno) - && ! bitmap_bit_p (_split_regs, spill_regno) - && ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno) - && ! bitmap_bit_p (_optional_reload_pseudos, spill_regno))) + if (must_not_spill_p (spill_regno)) goto fail; else if (non_spilled_static_chain_regno_p (spill_regno)) static_p = true; Index: gcc/testsuite/gcc.target/i386/pr78911-1.c === --- gcc/testsuite/gcc.target/i386/pr78911-1.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr78911-1.c (working copy) @@ -0,0 +1,22 @@ +/* PR rtl-optimization/78911 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fno-strict-aliasing -fno-omit-frame-pointer" } */ +/* { dg-additional-options "-fPIC" { target fpic } } */ +/* { dg-additional-options "-march=pentium-m" { target ia32 } } */ + +int a, b, d, e; +long long *c; + +static
Re: GCSE: Use HOST_WIDE_INT instead of int (PR, rtl-optimization/79574).
On 03/02/2017 06:50 PM, Martin Liška wrote: Hello. This is second part of fixes needed to not trigger integer overflow in gcse pass. So, how is this intended to work? The min/max stored in the param is an int, and by using a HOST_WIDE_INT here, we expect that it is a larger type and therefore won't overflow? { expr = flat_table[i]; fprintf (file, "Index %d (hash value %d; max distance %d)\n ", -expr->bitmap_index, hash_val[i], expr->max_distance); +expr->bitmap_index, hash_val[i], (int)expr->max_distance); print_rtl (file, expr->expr); fprintf (file, "\n"); Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess. Bernd
Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)
On 03/02/2017 06:35 PM, Marek Polacek wrote: While at it, I fixed wrong formatting in the nearby code. Also use NULL_TREE instead of 0 where appropriate. I really dislike those zeros-as-trees; one day I'll just go and turn them into NULL_TREEs. I sympathize, but it makes it harder to see which parts of the patch are actual changes. Generally it's best to separate functional and formatting changes. @@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) declared for the arg. ISO C says we take the unqualified type for parameters declared with qualified type. */ if (TREE_TYPE (parm) != error_mark_node - && TREE_TYPE (type) != error_mark_node + && TREE_TYPE (TREE_VALUE (type)) != error_mark_node && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm)) != TYPE_ATOMIC (TREE_VALUE (type))) || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)), Isn't this most likely intended to be TREE_VALUE (type) != error_mark_node? @@ -9017,7 +9020,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct c_arg_info *arg_info) if (targetm.calls.promote_prototypes (TREE_TYPE (current_function_decl)) && INTEGRAL_TYPE_P (TREE_TYPE (parm)) && TYPE_PRECISION (TREE_TYPE (parm)) - < TYPE_PRECISION (integer_type_node)) +< TYPE_PRECISION (integer_type_node)) Should add the necessary parens while fixing formatting. Bernd
Re: [PATCH] Fix ICE with multiple conditional traps turned into unconditional in one bb (PR rtl-optimization/79780)
On 03/02/2017 10:23 AM, Jakub Jelinek wrote: 2017-03-02 Jakub JelinekPR rtl-optimization/79780 * cprop.c (one_cprop_pass): When second and further conditional trap in a single basic block is turned into an unconditional trap, turn it into a deleted note to avoid RTL verification failures. * gcc.c-torture/compile/pr79780.c: New test. Ok. Bernd
Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)
On 02/23/2017 10:27 PM, Jakub Jelinek wrote: Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? LGTM. Bernd
Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)
On 02/23/2017 02:36 PM, Jakub Jelinek wrote: and both UNLT and GE can be reversed. But if the arguments of the condition are canonicalized, we run into: /* Test for an integer condition, or a floating-point comparison in which NaNs can be ignored. */ if (CONST_INT_P (arg0) || (GET_MODE (arg0) != VOIDmode && GET_MODE_CLASS (mode) != MODE_CC && !HONOR_NANS (mode))) return reverse_condition (code); and thus always return UNKNOWN. So... do you think we could add (in gcc-8, probably, although if it fixes this regression...) else if (GET_MODE (arg0) != VOIDmode && GET_MODE_CLASS (mode) != MODE_CC && HONOR_NANS (mode)) return reverse_condition_maybe_unordered (code); to make this work? Bernd
Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)
On 02/23/2017 12:46 PM, Jakub Jelinek wrote: But as soon as we only have the (unlt (reg:DF 100) (reg:DF 97)), reversed_comparison_code fails on it: case UNLT: case UNLE: case UNGT: case UNGE: /* We don't have safe way to reverse these yet. */ return UNKNOWN; I do have to wonder why. The reverse_condition_maybe_unordered function knows how to reverse these, and I can't quite figure out what the problem is here. The comment isn't super helpful. - && (reversed_comparison_code (if_info->cond, if_info->jump) - != UNKNOWN)) + && (if_info->rev_cond + || reversed_comparison_code (if_info->cond, + if_info->jump) != UNKNOWN)) Maybe have a reversed_cond_code (if_info) function? This pattern seems to occur a few times. @@ -4676,7 +4713,12 @@ find_cond_trap (basic_block test_bb, edg { code = reversed_comparison_code (cond, jump); if (code == UNKNOWN) - return FALSE; + { + cond = noce_get_condition (jump, _earliest, true); + if (!cond) + return FALSE; + code = GET_CODE (cond); + } This one is an extra optimization, similar but unrelated to the others, right? Can't you figure out the need to reverse a bit earlier and pass it into the first noce_get_condition call? Otherwise ok. Bernd
Re: [PATCH 1/6] c6x: Fix for RTL checking
On 02/21/2017 03:48 PM, Segher Boessenkool wrote: 2017-02-21 Segher Boessenkool* config/c6x/c6x.c (predicate_insn): Do not incorrectly share RTL. Ok, thanks. Bernd
Re: fwprop fix for PR79405
On 02/17/2017 10:11 AM, Richard Biener wrote: Index: gcc/fwprop.c === --- gcc/fwprop.c(revision 245501) +++ gcc/fwprop.c(working copy) @@ -1478,7 +1478,8 @@ fwprop (void) Do not forward propagate addresses into loops until after unrolling. CSE did so because it was able to fix its own mess, but we are not. */ - for (i = 0; i < DF_USES_TABLE_SIZE (); i++) + unsigned sz = DF_USES_TABLE_SIZE (); + for (i = 0; i < sz; i++) { df_ref use = DF_USES_GET (i); if (use) might work? (not knowing too much about this detail of the DF data structures - can the table shrink?) This would probably work to fix the bug, but this behaviour is explicitly documented as intentional (in the comment the second half of which you've quoted). I assume it enables additional substitutions. Bernd
Re: Improving code generation in the nvptx back end
On 02/17/2017 02:09 PM, Thomas Schwinge wrote: Hi! On Fri, 17 Feb 2017 14:00:09 +0100, I wrote: [...] for "normal" functions there is no reason to use the ".param" space for passing arguments in and out of functions. We can then get rid of the boilerplate code to move ".param %in_ar*" into ".reg %ar*", and the other way round for "%value_out"/"%value". This will then also simplify the call sites, where all that code "evaporates". That's actually something I started to look into, many months ago, and I now just dug out those changes, and will post them later. (Very likely, the PTX "JIT" compiler will do the very same thing without difficulty, but why not directly generate code that is less verbose to read?) Using my WIP patch, the generated PTX code changes/is simplified as follows: It's probably a good idea to run cuobjdump -sass to see whether this has any effect at all. The most important issue that needs solving is probably still the old issue that ptxas isn't reliable. Looks like the llvm folks ran into the same problem, as I discovered last week: https://bugs.llvm.org//show_bug.cgi?id=27738 Bernd
fwprop fix for PR79405
We have two registers being assigned to each other: (set (reg 213) (reg 209)) (set (reg 209) (reg 213)) These being the only definitions, we are happy to forward propagate reg 209 for reg 213 into a third insn, making a new use for reg 209. We are then happy to forward propagate reg 213 for it in the same insn... ending up in an infinite loop. I don't really see an elegant way to prevent this, so the following just tries to detect the situation (and more general ones) by brute force. Bootstrapped and tested on x86_64-linux, verified that the test passes with a ppc cross, ok? Bernd PR rtl-optimization/79405 * fwprop.c (forward_propagate_into): Detect potentially cyclic replacements and bail out for them. PR rtl-optimization/79405 * gcc.dg/torture/pr79405.c: New test. Index: gcc/fwprop.c === --- gcc/fwprop.c (revision 244815) +++ gcc/fwprop.c (working copy) @@ -1374,13 +1374,42 @@ forward_propagate_into (df_ref use) /* Only consider uses that have a single definition. */ def = get_def_for_use (use); - if (!def) + if (!def || DF_REF_INSN_INFO (def) == NULL) return false; if (DF_REF_FLAGS (def) & DF_REF_READ_WRITE) return false; if (DF_REF_IS_ARTIFICIAL (def)) return false; + df_ref tmp_def = def; + /* There is a problematic case where a chain of assignments + rA = rB; rB = rC; ; rM = rN; rN = rA. + can cause us to replace these registers in an infinite cycle. + Walk backwards until we can guarantee that this situation is + not present. */ + for (;;) +{ + rtx_insn *insn = DF_REF_INSN (tmp_def); + rtx set = single_set (insn); + if (set == NULL_RTX) + break; + rtx src = SET_SRC (set); + rtx dst = SET_DEST (set); + if (GET_CODE (src) != REG || GET_CODE (dst) != REG) + break; + if (rtx_equal_p (src, DF_REF_REG (use))) + return false; + df_ref tmp_use = df_single_use (DF_REF_INSN_INFO (tmp_def)); + if (!tmp_use) + break; + tmp_def = get_def_for_use (tmp_use); + if (!tmp_def || DF_REF_INSN_INFO (tmp_def) == NULL) + break; + if (DF_REF_FLAGS (tmp_def) & DF_REF_READ_WRITE) + break; + if (DF_REF_IS_ARTIFICIAL (tmp_def)) + break; +} /* Do not propagate loop invariant definitions inside the loop. */ if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father) return false; Index: gcc/testsuite/gcc.dg/torture/pr79405.c === --- gcc/testsuite/gcc.dg/torture/pr79405.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr79405.c (working copy) @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +char cz; +long long int xx, u2; + +void +qv (int js, int wl) +{ + if (js != 0) +{ + short int sc; + int *at = (int *) + long long int gx = 0; + + for (;;) +{ + *at = 0; + js /= sc; + + for (wl = 0; wl < 2; ++wl) +{ + xx = gx; + u2 %= xx > 0; + cz /= u2; + + fa: + if (cz != u2) +{ + gx |= js; + cz = gx / js; +} +} +} + + yq: + wl /= 0x8000; + u2 = wl; + u2 |= (wl != 0) | (wl != 0 && gx != 0); + js = u2; + goto fa; +} + goto yq; +}
Re: C PATCH to fix ICE with -Wdouble-promotion (PR c/79515)
On 02/15/2017 12:49 PM, Marek Polacek wrote: We ICEd on this testcase in do_warn_double_promotion because an invalid conversion had produced an error result type and accessing that via TYPE_MAIN_VARIANT crashes. Fixed in an obvious way. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2017-02-15 Marek PolacekPR c/79515 * c-warn.c (do_warn_double_promotion): Don't warn if an invalid conversion has occured. * gcc.dg/dfp/pr79515.c: New. Ok. Bernd
Re: [PATCH] Replace XALLOCAVEC with XCNEWVEC (PR c/79471).
On 02/13/2017 02:06 PM, Martin Liška wrote: On 02/13/2017 01:58 PM, Bernd Schmidt wrote: On 02/13/2017 11:15 AM, Martin Liška wrote: In order to not cause a stack overflow, lets use a vector allocated on heap instead of the one created by XALLOCVEC. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ok. I'm surprised this is marked as a regression, but it's simple enough that it probably ought to be fixed in any case. Yep. I've just removed that. Is it also fine to both active branches after it finishes regression tests? Sure. Bernd
Re: [PATCH] Invalidate combiner's cached last value upon insn removal (PR rtl-optimization/79388, PR rtl-optimization/79450)
On 02/10/2017 08:50 PM, Jakub Jelinek wrote: 2017-02-10 Jakub JelinekPR rtl-optimization/79388 PR rtl-optimization/79450 * combine.c (distribute_notes): When removing TEM_INSN for which corresponding dest has last value recorded, invalidate that last value. * gcc.c-torture/execute/pr79388.c: New test. * gcc.c-torture/execute/pr79450.c: New test. Ok. Bernd
Re: [PATCH] Replace XALLOCAVEC with XCNEWVEC (PR c/79471).
On 02/13/2017 11:15 AM, Martin Liška wrote: In order to not cause a stack overflow, lets use a vector allocated on heap instead of the one created by XALLOCVEC. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ok. I'm surprised this is marked as a regression, but it's simple enough that it probably ought to be fixed in any case. Bernd
Re: [patch] Fix PR middle-end/78468
On 01/27/2017 01:02 PM, Eric Botcazou wrote: The attached patch is a middle ground between the previously working and currently broken situations: if the back-end defines STACK_DYNAMIC_OFFSET, then the middle-end assumes that STACK_DYNAMIC_OFFSET maintains the alignment; if it doesn't, which means the default value of STACK_DYNAMIC_OFFSET computed in function.c is used instead, then the middle-end maintains the alignment itself. I'd say let's not have a middle ground - this stuff is sufficiently brain-twisting that I'd rather go back to a known working state. If there was an error in the previous patch, let's roll it back until we fully understand the situation. Bernd
One more cprop trap_if fix, PR79194
This PR seems to be curable by fixing up the CFG a little earlier. Bootstrapped and tested on x86_64-linux, and it seems to cure the testcase with a ppc cross. I'd appreciate if someone ran full ppc tests with this though. Ok? Bernd PR rtl-optimization/79194 * cprop.c (one_cprop_pass): Move deletion of code after unconditional traps before call to bypass_conditional_jumps. PR rtl-optimization/79194 * gcc.dg/torture/pr79194.c: New test. Index: gcc/cprop.c === --- gcc/cprop.c (revision 244817) +++ gcc/cprop.c (working copy) @@ -1863,8 +1863,6 @@ one_cprop_pass (void) } } - changed |= bypass_conditional_jumps (); - while (!uncond_traps.is_empty ()) { rtx_insn *insn = uncond_traps.pop (); @@ -1873,6 +1871,8 @@ one_cprop_pass (void) emit_barrier_after_bb (to_split); } + changed |= bypass_conditional_jumps (); + FREE_REG_SET (reg_set_bitmap); free_cprop_mem (); } Index: gcc/testsuite/gcc.dg/torture/pr79194.c === --- gcc/testsuite/gcc.dg/torture/pr79194.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr79194.c (working copy) @@ -0,0 +1,42 @@ +/* { dg-do compile } */ + +int iw, vr; + +void +d9 (unsigned int j3, long long int f5, int kp) +{ + int *qb = + + if (kp != 0) +{ + long long int oq; + unsigned int tl = 0; + + for (j3 = 0; j3 < 1; ++j3) +qb = + goto ed; + + l7: + oq = 1; + while (oq < 2) +oq *= j3; + + ed: + do +{ + oq -= *qb; + if (oq != 0) +{ + long long int ie = j3 & f5; + int ws = (j3 != 0 && kp != 0); + + tl = ie > ws; + iw = vr = tl; +} + else +tl = (kp != 0 && (0 % 0) != 0); /* { dg-warning "division by zero" } */ +} + while (tl != 0); +} + goto l7; +}
Re: [RFC] sched: Do not move expensive insns speculatively (PR68664)
On 01/27/2017 02:19 AM, Segher Boessenkool wrote: But what is "insn cost"? Latency is no good at all -- we *want* insns with higher latency to be earlier. fsqrt is not pipelined, and that is what makes it so costly. (This isn't modeled in the scheduling description btw: that would make the automata sizes explode, the usual problem). On other machines sqrt might be pipelined, so the patch as-is clearly isn't suitable. rtx_cost/insn_cost probably also won't do, since it doesn't model that property either. Maybe instead of a hook you could have an insn attribute that the scheduler looks for. Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/25/2017 08:46 PM, Segher Boessenkool wrote: It turns out my patch (see the PR) causes (or at least triggers) miscompilations on tilegx. I will drop it for now. Curious, it looked very reasonable. What's needed to reproduce this? Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/25/2017 10:18 AM, Kyrill Tkachov wrote: The test is supposed to test the generation of the vsel instruction. I believe adding an -mcpu=cortex-a57 to the testcases would be best, as VSEL isn't actually available on Cortex-A5, it's just enabled by the -mfpu=fp-armv8 option. A more realistic configuration would target an ARMv8-A CPU like the Cortex-A57. Ok, let me know if there's anything else you need from my side. Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/24/2017 06:03 PM, Christophe Lyon wrote: Ha... the regression occurred between r 244818 and r 244816, and I read r 244816 ChangeLog too quickly and did not notice it was modifying ifcvt.c in addition to x86-only files. So it's likely that it's your other patch for pr78634 that caused the regression I mentioned. Does it make more sense? That's possible. That added a missing cost check, so the question becomes - is the change in generated assembly sensible, given the selected CPU type? Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/24/2017 05:50 PM, Kyrill Tkachov wrote: Actually trying it out with an explicit -mcpu=cortex-a5 (so -O2 -S -mfpu=fp-armv8 -mcpu=cortex-a57 -mfloat-abi=hard) I get the test failing before and after the patch. The code generated is vcmp.f64d0, d1 vmrsAPSR_nzcv, FPSCR vmovvs.f64 d0, d1 bx lr whereas the desired (e.g. with -mcpu=cortex-a57) is: vcmp.f64d0, d1 vmrsAPSR_nzcv, FPSCR vselvs.f64 d0, d1, d0 bx lr Yes, I've seen both of these generated with different options, but the patch did not make a difference here either. For the moment I'll assume this was a false alarm, i.e. Christophe misidentified the patch and something else went wrong. Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/24/2017 05:30 PM, Kyrill Tkachov wrote: The -mfpu is overridden in the testcase to add the ARMv8 instructions. So to reproduce the compilation in that testcase you'd want -mfpu=fp-armv8 or something equivalent rather than vfpv3-d16-fp16. Exact steps please. No one who's not well-versed in all the ARM variants will be able to figure this out. I've been able to generate identical before/after code, both with and without vselvs.f64 instructions, after trying out a number of switch combinations, but I've not been able to find a way to show where the patch makes a difference. Bernd
Re: Improve things for PR71724, in combine/if_then_else_cond
On 01/24/2017 09:38 AM, Christophe Lyon wrote: It seems that Bernd's patch causes regressions on arm-linux-gnueabihf --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16: gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1 gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1 gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1 gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1 Maybe the upcoming patch from Segher intends to address this? Not really. How exactly do I reproduce this - can you give me a command line and expected assembly output (I'm having some trouble figuring out the right set of switches)? Bernd
Re: [PATCH v5] add -fprolog-pad=N,M option
There's still a a few details that need addressing, and some questions I have. Also Ccing Jakub to have another pair of eyes on the name of the section - I don't know if we want some sort of .gnu.something name. On 01/13/2017 01:19 PM, Torsten Duwe wrote: 2017-01-13 Torsten Duwe: First, some people seem to care, so I'll have to ask to please check the correct formatting of this line (spacing and all) by looking at existing entries. Also remove some of the blank lines in the ChangeLog, you could still group the doc entries separately from the code ones. * c-family/c-attribs.c : introduce prolog_pad attribute and create a handler for it. * lto/lto-lang.c : Likewise. > * testsuite/c-c++-common/attribute-prolog_pad-1.c : New test. These three directories have separate ChangeLogs. The top-level directory name should be stripped and the entries added to the appropriate file. + for (it = _ATTRIBUTES (newdecl); *it; it = _CHAIN(*it)) + { + if (IDENTIFIER_LENGTH (get_attribute_name (*it)) != + strlen("prolog_pad")) Still several instances of bad formatting, in the entire block of code. Please refer to my previous email. +static tree +handle_prolog_pad_attribute (tree *, tree, tree, int, +bool *) +{ + /* Nothing to be done here. */ + return NULL_TREE; +} Should still add at least one-liner comments before these empty functions to say what interface they're implementing. Also, probably don't need to wrap the line containing the arguments. diff --git a/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c b/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c new file mode 100644 index 000..2236aa8 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-fprolog-pad=3,1" } */ This test does not actually seem to verify that the padding has the expected size. /* Do not use IPA optimizations for register allocation if profiler is active +or prolog pads are inserted for run-time instrumentation or port does not emit prologue and epilogue as RTL. */ - if (profile_flag || !targetm.have_prologue () || !targetm.have_epilogue ()) + if (profile_flag || prolog_nop_pad_size + || !targetm.have_prologue () || !targetm.have_epilogue ()) flag_ipa_ra = 0; Was this explained? Why would ipa-ra be a problem? + if (pad_size > pad_entry) +targetm.asm_out.print_prolog_pad (asm_out_file, pad_size-pad_entry, + (pad_entry == 0)); Unnecessary parens around the last argument, and missing spaces around operators. Bernd
Improve things for PR71724, in combine/if_then_else_cond
The PR is about infinite recursion in combine_simplify_rtx, because if_then_else_cond does strange things to an expression, and we end up simplifying something to itself. The patch below tries to address this by improving that function a little. As stated in the PR, the situation is that we have (plus:SI (if_then_else:SI (eq (reg:CC 185) (const_int 0 [0])) (reg:SI 165) (reg:SI 174 [ t9.0_1+4 ])) (reg:SI 165)) Reg 165 is known to be zero or one, so it gets turned into a condition, and we have two different conditions on the operands. That causes us to fail to make the fairly obvious transformation to cond = reg:CC 185 true_rtx = (plus r165 r165) false_rtx = (plus r174 r165) So, when looking for situations where we have only one condition, we can try to undo the conversion of a plain REG into a condition, on the grounds that this is probably less helpful. This seems to cure the testcase, but Segher also has a patch in the PR that looks like a good and more direct approach. IMO both should be applied. This one was bootstrapped and tested on x86_64-linux. Ok? Bernd PR rtl-optimization/71724 * combine.c (if_then_else_cond): Look for situations where it is beneficial to undo the work of one of the recursive calls. Index: gcc/combine.c === --- gcc/combine.c (revision 244528) +++ gcc/combine.c (working copy) @@ -9044,11 +9044,31 @@ if_then_else_cond (rtx x, rtx *ptrue, rt the same value, compute the new true and false values. */ else if (BINARY_P (x)) { - cond0 = if_then_else_cond (XEXP (x, 0), , ); - cond1 = if_then_else_cond (XEXP (x, 1), , ); + rtx op0 = XEXP (x, 0); + rtx op1 = XEXP (x, 1); + cond0 = if_then_else_cond (op0, , ); + cond1 = if_then_else_cond (op1, , ); + + if ((cond0 != 0 && cond1 != 0 && !rtx_equal_p (cond0, cond1)) + && (REG_P (op0) || REG_P (op1))) + { + /* Try to enable a simplification by undoing work done by + if_then_else_cond if it converted a REG into something more + complex. */ + if (REG_P (op0)) + { + cond0 = 0; + true0 = false0 = op0; + } + else + { + cond1 = 0; + true1 = false1 = op1; + } + } if ((cond0 != 0 || cond1 != 0) - && ! (cond0 != 0 && cond1 != 0 && ! rtx_equal_p (cond0, cond1))) + && ! (cond0 != 0 && cond1 != 0 && !rtx_equal_p (cond0, cond1))) { /* If if_then_else_cond returned zero, then true/false are the same rtl. We must copy one of them to prevent invalid rtl
Another cprop trap_if fix, PR79125
This is essentially the same patch I sent for the previous instance of this problem, but this time applied to local_cprop_pass. Bootstrapped and tested on x86_64-linux, and it seems to fix the testcase with a ppc cross. Ok? Bernd PR rtl-optimization/79125 * cprop.c (local_cprop_pass): Handle cases where we make an unconditional trap. PR rtl-optimization/79125 * gcc.dg/torture/pr79125.c: New test. Index: gcc/cprop.c === --- gcc/cprop.c (revision 244528) +++ gcc/cprop.c (working copy) @@ -1248,6 +1248,8 @@ local_cprop_pass (void) bool changed = false; unsigned i; + auto_vec uncond_traps; + cselib_init (0); FOR_EACH_BB_FN (bb, cfun) { @@ -1255,6 +1257,9 @@ local_cprop_pass (void) { if (INSN_P (insn)) { + bool was_uncond_trap + = (GET_CODE (PATTERN (insn)) == TRAP_IF + && XEXP (PATTERN (insn), 0) == const1_rtx); rtx note = find_reg_equal_equiv_note (insn); do { @@ -1273,6 +1278,13 @@ local_cprop_pass (void) break; } } + if (!was_uncond_trap + && GET_CODE (PATTERN (insn)) == TRAP_IF + && XEXP (PATTERN (insn), 0) == const1_rtx) + { + uncond_traps.safe_push (insn); + break; + } if (insn->deleted ()) break; } @@ -1287,6 +1299,14 @@ local_cprop_pass (void) cselib_finish (); + while (!uncond_traps.is_empty ()) +{ + rtx_insn *insn = uncond_traps.pop (); + basic_block to_split = BLOCK_FOR_INSN (insn); + remove_edge (split_block (to_split, insn)); + emit_barrier_after_bb (to_split); +} + return changed; } Index: gcc/testsuite/gcc.dg/torture/pr79125.c === --- gcc/testsuite/gcc.dg/torture/pr79125.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr79125.c (working copy) @@ -0,0 +1,32 @@ +int za; + +void +hl (void) +{ + short int o8 = 0; + short int m6 = 1; + short int *ni = + + for (;;) +{ + int af; + short int *fd = (short int *) + + if (ni != 0) +{ + if (m6 != 0) +*ni = 0; + else +za = 0; + af = (o8 * o8) || o8; + if (af == 0) +m6 /= 0; /* { dg-warning "division" } */ + while (za != 0) +{ +} +} + *fd = /* { dg-warning "without a cast" } */ + for (af = 0; af < 2; ++af) +af = za; +} +}
Re: PR78634: ifcvt/i386 cost updates
On 12/09/2016 12:49 PM, Bernd Schmidt wrote: On 12/03/2016 10:49 AM, Uros Bizjak wrote: Based on the above explanation, the patch is OK. I'll be treating the ifcvt part of it as obvious. However, testing showed an issue with the i386 funcspec-11 test: /* PR target/36936 */ /* { dg-do compile } */ /* { dg-require-effective-target ia32 } */ /* { dg-options "-O2 -march=i386" } */ /* { dg-final { scan-assembler "cmov" } } */ extern int foo (int) __attribute__((__target__("arch=i686"))); int foo (int x) { if (x < 0) x = 1; return x; } It wants to test that we can temporarily switch the arch, but we're still using the i386 cost table, with a branch cost of 1. This means the cmov now won't get generated for cost reasons. It seems like whether we're generating cmov when tuning for i386 isn't the most critical thing to worry about, so I propose amending the test to also pass in -mtune=i686. Ok with that change? Ping? Bernd
Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization
On 01/16/2017 08:26 PM, Jeff Law wrote: On 01/13/2017 11:19 AM, Thomas Preudhomme wrote: Ping? I'm not sure if an ok from Valdimir is enough or if I also need RM approval. Vlad's approval is all you need. Is that a general rule? I'm never too certain on that. Bernd
[i386] New lea patterns for PR71321
The problem here is that we don't have complete coverage of lea patterns for HImode/QImode: the combiner can't recognize a (plus (ashift reg 2) reg) pattern it builds. My first idea was to canonicalize ASHIFT by constant inside PLUS to MULT. The docs say that this is done only inside a MEM context, but that seems misguided considering the existence of lea patterns. Maybe that's something to revisit for gcc-8. In the meantime, the patch below is more like a minimal fix, and it seems to produce better results at the moment anyway. Bootstrapped and tested on x86_64-linux. Ok? Bernd PR target/71321 * config/i386/i386.md (lea_general_2b, lea_general_3b): New patterns. * config/i386/predicates.md (const123_operand): New. PR target/71321 * gcc.target/i386/pr71321.c: New test. Index: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 243548) +++ gcc/config/i386/i386.md (working copy) @@ -6266,6 +6266,27 @@ (define_insn_and_split "*lea_gener [(set_attr "type" "lea") (set_attr "mode" "SI")]) +(define_insn_and_split "*lea_general_2b" + [(set (match_operand:SWI12 0 "register_operand" "=r") + (plus:SWI12 + (ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l") + (match_operand 2 "const123_operand" "n")) + (match_operand:SWI12 3 "nonmemory_operand" "ri")))] + "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)" + "#" + "&& reload_completed" + [(set (match_dup 0) + (plus:SI + (ashift:SI (match_dup 1) (match_dup 2)) + (match_dup 3)))] +{ + operands[0] = gen_lowpart (SImode, operands[0]); + operands[1] = gen_lowpart (SImode, operands[1]); + operands[3] = gen_lowpart (SImode, operands[3]); +} + [(set_attr "type" "lea") + (set_attr "mode" "SI")]) + (define_insn_and_split "*lea_general_3" [(set (match_operand:SWI12 0 "register_operand" "=r") (plus:SWI12 @@ -6284,6 +6305,32 @@ (define_insn_and_split "*lea_gener (match_dup 3)) (match_dup 4)))] { + operands[0] = gen_lowpart (SImode, operands[0]); + operands[1] = gen_lowpart (SImode, operands[1]); + operands[3] = gen_lowpart (SImode, operands[3]); + operands[4] = gen_lowpart (SImode, operands[4]); +} + [(set_attr "type" "lea") + (set_attr "mode" "SI")]) + +(define_insn_and_split "*lea_general_3b" + [(set (match_operand:SWI12 0 "register_operand" "=r") + (plus:SWI12 + (plus:SWI12 + (ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l") + (match_operand 2 "const123_operand" "n")) + (match_operand:SWI12 3 "register_operand" "r")) + (match_operand:SWI12 4 "immediate_operand" "i")))] + "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)" + "#" + "&& reload_completed" + [(set (match_dup 0) + (plus:SI + (plus:SI + (ashift:SI (match_dup 1) (match_dup 2)) + (match_dup 3)) + (match_dup 4)))] +{ operands[0] = gen_lowpart (SImode, operands[0]); operands[1] = gen_lowpart (SImode, operands[1]); operands[3] = gen_lowpart (SImode, operands[3]); Index: gcc/config/i386/predicates.md === --- gcc/config/i386/predicates.md (revision 243548) +++ gcc/config/i386/predicates.md (working copy) @@ -765,6 +765,14 @@ (define_predicate "const248_operand" return i == 2 || i == 4 || i == 8; }) +;; Match 1, 2, or 3. Used for lea shift amounts. +(define_predicate "const123_operand" + (match_code "const_int") +{ + HOST_WIDE_INT i = INTVAL (op); + return i == 1 || i == 2 || i == 3; +}) + ;; Match 2, 3, 6, or 7 (define_predicate "const2367_operand" (match_code "const_int") Index: gcc/testsuite/gcc.target/i386/pr71321.c === --- gcc/testsuite/gcc.target/i386/pr71321.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr71321.c (working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +typedef unsigned char uint8_t; +typedef unsigned int uint32_t; + +unsigned cvt_to_2digit(uint8_t i, uint8_t base) +{ + return ((i / base) | (uint32_t)(i % base)<<8); +} +unsigned cvt_to_2digit_ascii(uint8_t i) +{ + return cvt_to_2digit(i, 10) + 0x0a3030; +} +/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */ +/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */
Re: [PATCH v3] add -fprolog-pad=N,M option
I'll consider myself agnostic as to whether this is a feature we want or need, so I'll just comment on some style questions. There's a fair amount of coding style violations, I'll point some of them out but please read the documents we have linked on this page: https://gcc.gnu.org/contribute.html On 12/16/2016 03:14 PM, Torsten Duwe wrote: Signed-off-by: Torsten DuweThis is meaningless for the GCC project. We require a copyright assignment; I assume SuSE has a blanket one that covers you. You should also write a ChangeLog entry. diff --git a/gcc/attribs.c b/gcc/attribs.c index e66349a..6ff81a8 100644 --- a/gcc/attribs.c +++ b/gcc/attribs.c @@ -365,6 +365,28 @@ decl_attributes (tree *node, tree attributes, int flags) if (!attributes_initialized) init_attributes (); + /* If we're building NOP pads because of a command line arg, note the size + for LTO builds, unless the attribute has already been overridden. */ Two spaces at the end of a sentence, including at the end of a comment. + if (TREE_CODE (*node) == FUNCTION_DECL && + prolog_nop_pad_size > 0) Operators go on the next line when wrapping. +), Don't put closing parens on their own line. diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index db293fe..7f3e558 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -2322,6 +2322,34 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype) TREE_ASM_WRITTEN (olddecl) = 0; } + /* Prolog pad size may be set wrongly by a forward declaration; + fix it up by pulling the final value in front. + */ The "*/" should go on the previous line. + for (it = _ATTRIBUTES (newdecl); *it; it = _CHAIN(*it)) Space before opening parentheses (many occurrences). +void +default_print_prolog_pad (FILE *file, unsigned HOST_WIDE_INT pad_size, + bool record_p) Every function needs a comment describing its purpose and that of its arguments. Look around for examples. There are cases in this patch of hook implementations which ignore all their args, there I believe we can relax this rule a little (but a comment saying which hook is implemented would still be good). +extern void default_print_prolog_pad (FILE *, unsigned HOST_WIDE_INT , bool); Careful with stray whitespace. + if (tree_fits_uhwi_p (prolog_pad_value1) ) Here too. + { + pad_size = tree_to_uhwi (prolog_pad_value1); + } Lose { } braces around single statements. Several cases. Am I missing it or is there no actual implementation of the target hook anywhere? Bernd
Re: [PATCH 1/2] print-rtl.c: use '<' and '>' rather than % for pseudos in compact mode
On 12/16/2016 09:18 PM, David Malcolm wrote: The following patch implements the change for print-rtl.c. OK for trunk assuming it passes bootstrap? Yes. Bernd
Re: [PATCH] Fix assertions along default switch labels (PR tree-optimization/78819)
On 12/16/2016 12:49 PM, Marek Polacek wrote: But as this testcase shows, this breaks when the default label shares a label with another case. On this testcase, when we reach the switch, we know that argc is either 1, 2, or 3. So by the time we reach vrp2, the IR will have been optimized to switch (argc) { default: case 3: // argc will be considered 1 despite the case 3 break; case 2: ... } Shouldn't we just remove the "case 3:" from the switch in this case? Would that fix things? Bernd
Re: Problem with pseudo-reg syntax in RTL frontend
On 12/14/2016 05:57 PM, David Malcolm wrote: Any preferences? (or other syntax ideas?). My preference is one of the currently-unused sigils e.g. "@3", or to wrap them in braces "{3}". Either might work, I'd vaguely prefer <3> over {3} but they're equivalent really. Maybe using "@" is simplest if it's unused. Bernd
Re: [PATCH] Formatting and spelling fixes for ipa-cp.c
On 12/15/2016 05:51 PM, Jakub Jelinek wrote: This patch fixes what I've found quickly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Bernd
Re: cprop fix for PR78626
On 12/12/2016 03:21 PM, Bernd Schmidt wrote: On 12/10/2016 08:58 PM, Segher Boessenkool wrote: On Thu, Dec 08, 2016 at 01:21:04PM +0100, Bernd Schmidt wrote: This is another case where an optimization turns a trap_if unconditional. We have to defer changing the CFG, since the rest of cprop seems to blow up when we modify things while scanning. The problem for PR78727 is that we also need to do this for insns that already are the last insn in the block: + while (!uncond_traps.is_empty ()) +{ + rtx_insn *insn = uncond_traps.pop (); + basic_block to_split = BLOCK_FOR_INSN (insn); + remove_edge (split_block (to_split, insn)); + emit_barrier_after_bb (to_split); +} We need that barrier, and we also need the successor edges removed (which split_block+remove_edge does). (PR78727 works fine with just that BB_END test deleted). Ah, ok. In that case I'll probably also add a test to make sure this is only done for insns that weren't an unconditional trap before. Retesting... That would be this patch. Tested as before. The two new testcases seem to pass with a ppc cross (but I would appreciate if someone were to run full tests on ppc). Bernd PR rtl-optimization/78626 PR rtl-optimization/78727 * cprop.c (one_cprop_pass): Collect unconditional traps in the middle of a block, and split such blocks after everything else is finished. PR rtl-optimization/78626 PR rtl-optimization/78727 * gcc.dg/torture/pr78626.c: New test. * gcc.dg/torture/pr78727.c: New test. Index: gcc/cprop.c === --- gcc/cprop.c (revision 243548) +++ gcc/cprop.c (working copy) @@ -1794,7 +1794,7 @@ one_cprop_pass (void) if (set_hash_table.n_elems > 0) { basic_block bb; - rtx_insn *insn; + auto_vec uncond_traps; alloc_cprop_mem (last_basic_block_for_fn (cfun), set_hash_table.n_elems); @@ -1810,6 +1810,9 @@ one_cprop_pass (void) EXIT_BLOCK_PTR_FOR_FN (cfun), next_bb) { + bool seen_uncond_trap = false; + rtx_insn *insn; + /* Reset tables used to keep track of what's still valid [since the start of the block]. */ reset_opr_set_tables (); @@ -1817,6 +1820,10 @@ one_cprop_pass (void) FOR_BB_INSNS (bb, insn) if (INSN_P (insn)) { + bool was_uncond_trap + = (GET_CODE (PATTERN (insn)) == TRAP_IF + && XEXP (PATTERN (insn), 0) == const1_rtx); + changed |= cprop_insn (insn); /* Keep track of everything modified by this insn. */ @@ -1825,11 +1832,27 @@ one_cprop_pass (void) insn into a NOTE, or deleted the insn. */ if (! NOTE_P (insn) && ! insn->deleted ()) mark_oprs_set (insn); + + if (!was_uncond_trap && !seen_uncond_trap + && GET_CODE (PATTERN (insn)) == TRAP_IF + && XEXP (PATTERN (insn), 0) == const1_rtx) + { + seen_uncond_trap = true; + uncond_traps.safe_push (insn); + } } } changed |= bypass_conditional_jumps (); + while (!uncond_traps.is_empty ()) + { + rtx_insn *insn = uncond_traps.pop (); + basic_block to_split = BLOCK_FOR_INSN (insn); + remove_edge (split_block (to_split, insn)); + emit_barrier_after_bb (to_split); + } + FREE_REG_SET (reg_set_bitmap); free_cprop_mem (); } Index: gcc/testsuite/gcc.dg/torture/pr78626.c === --- gcc/testsuite/gcc.dg/torture/pr78626.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr78626.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ + +int qs; + +void +ms (int g1) +{ + int cy; + int *fr = + + for (;;) +{ + *fr = 1; + fr = + + while (qs != 0) +{ + if (qs | cy) +qs = g1 / 0; /* { dg-warning "division" } */ + ++qs; +} + + cy = 1; + while (cy != 0) +cy = 2; +} +} Index: gcc/testsuite/gcc.dg/torture/pr78727.c === --- gcc/testsuite/gcc.dg/torture/pr78727.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr78727.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ + +int +dd (int gj, unsigned int o7) +{ + long long int e8 = gj; + + e8 |= gj + 1u; + if (e8 != 0) +{ + short int *mn = (short int *) + int pv; + + e8 &= e8 > gj; + gj = o7 > e8; + pv = ((gj != 0) ? gj : *mn) && e8; + gj |= *mn / pv; +} + + return gj; +}
Re: PR target/78213 revisited (was Re: [PATCH 5/9] Introduce selftest::locate_file (v4))
On 12/09/2016 08:32 PM, David Malcolm wrote: Thanks. Unfortunately, applying the "locate_file" patch https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01186.html would now introduce a regression in a recently-added test case: The problem is that this DejaGnu test case uses -fself-test, and doesn't provide any arguments. With the locate_file patch, we need to pass the path to $(srcdir)/testsuite/selftests as an argument to -fself -test, and it's not clear to me how to do that sanely in a DejaGnu test case; if I pass in a dummy value (like for pr71591.c), then the selftests that use locate_file fail. I think this part of the selftest framework (how to locate files and tests) is possibly going to need some rethinking. But for now, I'm ok with just adding dg-skip-if *-*-* to the problematic testcase, with a comment describing the situation. Bernd
Re: cprop fix for PR78626
On 12/10/2016 08:58 PM, Segher Boessenkool wrote: On Thu, Dec 08, 2016 at 01:21:04PM +0100, Bernd Schmidt wrote: This is another case where an optimization turns a trap_if unconditional. We have to defer changing the CFG, since the rest of cprop seems to blow up when we modify things while scanning. The problem for PR78727 is that we also need to do this for insns that already are the last insn in the block: + while (!uncond_traps.is_empty ()) + { + rtx_insn *insn = uncond_traps.pop (); + basic_block to_split = BLOCK_FOR_INSN (insn); + remove_edge (split_block (to_split, insn)); + emit_barrier_after_bb (to_split); + } We need that barrier, and we also need the successor edges removed (which split_block+remove_edge does). (PR78727 works fine with just that BB_END test deleted). Ah, ok. In that case I'll probably also add a test to make sure this is only done for insns that weren't an unconditional trap before. Retesting... Bernd
Re: [nvptx] propagating conditionals in worker-vector partitioned loops
On 10/27/2016 12:29 AM, Cesar Philippidis wrote: Currently, the nvptx backend is only neutering the worker axis when propagating variables used in conditional expressions across the worker and vector axes. That's a problem with the worker-state spill and fill propagation implementation because all of the vector threads in worker 0 all write the the same address location being spilled. As the attached test case demonstrates, this might cause an infinite loop depending on the values in the vector threads being propagated. This patch fixes this issue by introducing a new worker-vector predicate, so that both the worker and vector threads can be predicated together, not separately. I.e., instead of first neutering worker axis, then neutering the vector axis, this patch uses a single predicate for tid.x == 0 && tid.y == 0. Is this patch ok for trunk? This is more of an OpenACC patch than an nvptx patch. Nathan would be the best person to review it, but if he is disinclined, I'll just approve it on the grounds that you're probably in the best position to know. Bernd
Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE
On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote: Regardless, 'reload_cse_simplify' would never perform the opposite transformation. It checks whether it can replace anything within the first argument INSN, with the second argument TESTREG. As the name implies this will always be a register. I double checked, the function is only called in 'reload_cse_regs' and 'testreg' is created using 'gen_rtx_REG'. Ok, let's go ahead with it. Bernd
Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE
On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote: Regardless, the other testcases I add in this patch show a sub-optimal transformation done by postreload, turning direct calls into indirect calls, for targets which have specifically pointed out that no CSE should be done on functions through 'NO_FUNCTION_CSE'. What I'm wondering about is whether the patch wouldn't also prevent the opposite transformation. Is there a reason not to do that one? Can the problem be modeled by tweaking costs? Would you prefer I create a new PR for the problem this is actually fixing and refile this PATCH under that PR? Well, as long as you're working on fixing it I see no reason to clutter the bug database for the function cse issue, but do keep the existing PR open if there also ought to be register class changes. Bernd
Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE
On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote: This patch fixes the issue reported in PR78255 by making postreload aware it should not be performing CSE on functions if NO_FUNCTION_CSE is defined to true. Bootstrap and full regression on arm-none-linux-gnueabihf and aarch64-unknown-linux-gnu. Also checked this fixed the reported issue on arm-none-eabi. Is this OK for trunk? Hmm, it probably doesn't hurt, but looking at the PR I think the originally reported problem suggests you need a different fix: a separate register class to be used for indirect sibling calls. I remember seeing similar issues on other targets. Bernd