Re: Fix PR77881: combine improvement
Hi, On Fri, 18 Nov 2016, Bin.Cheng wrote: > On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwabwrote: > > On Nov 14 2016, Michael Matz wrote: > > > >> PR missed-optimization/77881 > >> * combine.c (simplify_comparison): Remove useless subregs > >> also inside the loop, not just after it. > >> (make_compound_operation): Recognize some subregs as being > >> masking as well. > > > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. > Hi, > I can confirm that, also new PR opened for tracking. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 See PR78390 for a patch (comment #8) fixing the aarch64 problem. Ciao, Michael.
Re: Fix PR77881: combine improvement
On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwabwrote: > On Nov 14 2016, Michael Matz wrote: > >> PR missed-optimization/77881 >> * combine.c (simplify_comparison): Remove useless subregs >> also inside the loop, not just after it. >> (make_compound_operation): Recognize some subregs as being >> masking as well. > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. Hi, I can confirm that, also new PR opened for tracking. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 Thanks, bin > > Andreas. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
Re: Fix PR77881: combine improvement
On Nov 14 2016, Michael Matzwrote: > PR missed-optimization/77881 > * combine.c (simplify_comparison): Remove useless subregs > also inside the loop, not just after it. > (make_compound_operation): Recognize some subregs as being > masking as well. This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: Fix PR77881: combine improvement
On Mon, Nov 14, 2016 at 05:56:49AM +0100, Michael Matz wrote: > With this patch there are now no regressions on x86-64-linux (bootstrapped > with all languages+ada). Okay for trunk? I build cross-compilers for this for a whole bunch of archs, and built Linux with that, to see what effect this has. This is the code size generated before and after the patch; "0" means something failed to build (either the compiler, for the mips targets and tilegx after the patch, or Linux): beforeafter alpha 5410232 5410264 arc 3624274 3624338 arm 0 0 arm64 9086689 9082593 blackfin 1963170 1963226 c6x 2086879 2086911 cris 2186162 2186130 frv 3623264 3623264 h8300 1052810 1052850 i386 9723021 9721407 ia64 15243432 15244136 m32r 3415580 3415580 m68k 3221030 3221070 microblaze 0 0 mips 0 0 mips64 0 0 mn10300 2349253 2349237 nios2 3172110 3172182 parisc 8241147 8241147 parisc64 7197909 7196853 powerpc 8396871 8396863 powerpc64 14908442 14907866 s390 12579952 12579568 sh 2819700 2819716 shnommu 1360512 1360512 sparc 3734865 3734881 sparc64 5932081 5932249 tilegx 10839527 0 tilepro 10092546 10092610 x86_64 10349451 10349038 xtensa 1766572 1766572 So the patch helps nicely on many targets. I looked into the regressions; they all seem to be just unlucky, noise, or bad rtx_cost. The tilegx build fail is a target bug building _negvsi2.o -- the backend accepts shifts by 70 or 87 bits, but the assembler doesn't ;-) > @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, > rtx *pop1) > if (subreg_lowpart_p (op0) > && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width) > ; > + else if (subreg_lowpart_p (op0) Many of these lines start with a space before the tab, please fix. Okay for trunk with that fixed. Thank you! Segher
Re: Fix PR77881: combine improvement
Hi, On Sat, 12 Nov 2016, Segher Boessenkool wrote: > Hi Michael, > > On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote: > > PR missed-optimization/77881 > > * combine.c (simplify_comparison): Remove useless subregs > > also inside the loop, not just after it. > > > > testsuite/ > > * gcc.target/i386/pr77881.c: New test. > > This isn't checked in yet, do you still want it? Gnah, fell through the cracks. I had to fix something else in combine to make it not regress in the testsuite. The problem is that removing the subregs enables further simplifications which in turn might not be expected down-stream. The particular problem was that originally the loop was left with an (subreg:QI (and:SI (lrshift X 24) 255) 0), whose inner op in turn was recognized by make_compound_operation and transformed into an extract. With the patch we leave the loop now with essentially (subreg:QI (lrshift X 24) 0) which of course is just the same masking and hence extract, but make_compound_operation didn't know. With the amended patch it does. I didn't come around making the AND and SUBREG handling a bit more common (which I initially wanted to do before posting), so for now I'm handling only the specific case I hit. With this patch there are now no regressions on x86-64-linux (bootstrapped with all languages+ada). Okay for trunk? Ciao, Michael. PR missed-optimization/77881 * combine.c (simplify_comparison): Remove useless subregs also inside the loop, not just after it. (make_compound_operation): Recognize some subregs as being masking as well. testsuite/ * gcc.target/i386/pr77881.c: New test. diff --git a/gcc/combine.c b/gcc/combine.c index 6ffa387..0210685 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -8102,6 +8102,18 @@ make_compound_operation (rtx x, enum rtx_code in_code) rtx inner = SUBREG_REG (x), simplified; enum rtx_code subreg_code = in_code; + /* If the SUBREG is masking of a logical right shift, + make an extraction. */ + if (GET_CODE (inner) == LSHIFTRT + && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner)) + && subreg_lowpart_p (x)) + { + new_rtx = make_compound_operation (XEXP (inner, 0), next_code); + new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1), + mode_width, 1, 0, in_code == COMPARE); + break; + } + /* If in_code is COMPARE, it isn't always safe to pass it through to the recursive make_compound_operation call. */ if (subreg_code == COMPARE @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1) if (subreg_lowpart_p (op0) && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width) ; + else if (subreg_lowpart_p (op0) + && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT + && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT + && (code == NE || code == EQ) + && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) + <= HOST_BITS_PER_WIDE_INT) + && !paradoxical_subreg_p (op0) + && (nonzero_bits (SUBREG_REG (op0), +GET_MODE (SUBREG_REG (op0))) + & ~GET_MODE_MASK (GET_MODE (op0))) == 0) + { + /* Remove outer subregs that don't do anything. */ + tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1); + + if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0))) + & ~GET_MODE_MASK (GET_MODE (op0))) == 0) + { + op0 = SUBREG_REG (op0); + op1 = tem; + continue; + } + break; + } else break; diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c b/gcc/testsuite/gcc.target/i386/pr77881.c new file mode 100644 index 000..80d143f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr77881.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pie } */ +/* { dg-options "-O2" } */ +extern void baz(void); +int +foo (long long int a, long long int a2, int b) +{ +if (a < 0 || b) + baz (); +} +/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */ +/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */
Re: Fix PR77881: combine improvement
Hi Michael, On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote: > PR missed-optimization/77881 > * combine.c (simplify_comparison): Remove useless subregs > also inside the loop, not just after it. > > testsuite/ > * gcc.target/i386/pr77881.c: New test. This isn't checked in yet, do you still want it? Thanks, Segher
Re: Fix PR77881: combine improvement
On 10/20/2016 08:20 AM, Michael Matz wrote: Hello, like analyzed in the PR, combine is able to remove outer subregs that don't do anything interesting in the context they are used (simplify_comparison). But that currently happens outside of the loop that retries simplifications if changes occurred. When we do that inside the loop as well we get secondary simplifications that currently only happen when calling the simplifiers multiple time, like when we start from three rather than from two instructions. So right now we're in the curious position that more complicated code is optimized better than simpler code and the patch fixes this. (FWIW: this replicates parts of rather than moves the responsible code, because between the loop and the original place of simplification other things happen that might itself generate subregs). Regstrapping on x86-64, all languages in process. Okay if that passes? Ciao, Michael. PR missed-optimization/77881 * combine.c (simplify_comparison): Remove useless subregs also inside the loop, not just after it. testsuite/ * gcc.target/i386/pr77881.c: New test. LGTM. I was a bit curious why you duplicated rather than factoring, but it looks like you simplified the copy a bit by not handling the paradoxical subreg case. There's an outside chance this might help a couple BZs that I've poked at in the past. I'll make a point to test them again this release cycle to see if your change improves them (I don't have the #s handy, I think they're P4/P5 regressions). jeff