Re: Fix PR77881: combine improvement

2016-11-18 Thread Michael Matz
Hi,

On Fri, 18 Nov 2016, Bin.Cheng wrote:

> On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab  wrote:
> > On Nov 14 2016, Michael Matz  wrote:
> >
> >>   PR missed-optimization/77881
> >>   * combine.c (simplify_comparison): Remove useless subregs
> >>   also inside the loop, not just after it.
> >>   (make_compound_operation): Recognize some subregs as being
> >>   masking as well.
> >
> > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
> Hi,
> I can confirm that, also new PR opened for tracking.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

See PR78390 for a patch (comment #8) fixing the aarch64 problem.


Ciao,
Michael.


Re: Fix PR77881: combine improvement

2016-11-18 Thread Bin.Cheng
On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwab  wrote:
> On Nov 14 2016, Michael Matz  wrote:
>
>>   PR missed-optimization/77881
>>   * combine.c (simplify_comparison): Remove useless subregs
>>   also inside the loop, not just after it.
>>   (make_compound_operation): Recognize some subregs as being
>>   masking as well.
>
> This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.
Hi,
I can confirm that, also new PR opened for tracking.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422

Thanks,
bin
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: Fix PR77881: combine improvement

2016-11-16 Thread Andreas Schwab
On Nov 14 2016, Michael Matz  wrote:

>   PR missed-optimization/77881
>   * combine.c (simplify_comparison): Remove useless subregs
>   also inside the loop, not just after it.
>   (make_compound_operation): Recognize some subregs as being
>   masking as well.

This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Fix PR77881: combine improvement

2016-11-14 Thread Segher Boessenkool
On Mon, Nov 14, 2016 at 05:56:49AM +0100, Michael Matz wrote:
> With this patch there are now no regressions on x86-64-linux (bootstrapped 
> with all languages+ada).  Okay for trunk?

I build cross-compilers for this for a whole bunch of archs, and built
Linux with that, to see what effect this has.  This is the code size
generated before and after the patch; "0" means something failed to
build (either the compiler, for the mips targets and tilegx after the
patch, or Linux):

beforeafter
   alpha   5410232   5410264
 arc   3624274   3624338
 arm 0 0
   arm64   9086689   9082593
blackfin   1963170   1963226
 c6x   2086879   2086911
cris   2186162   2186130
 frv   3623264   3623264
   h8300   1052810   1052850
i386   9723021   9721407
ia64  15243432  15244136
m32r   3415580   3415580
m68k   3221030   3221070
  microblaze 0 0
mips 0 0
  mips64 0 0
 mn10300   2349253   2349237
   nios2   3172110   3172182
  parisc   8241147   8241147
parisc64   7197909   7196853
 powerpc   8396871   8396863
   powerpc64  14908442  14907866
s390  12579952  12579568
  sh   2819700   2819716
 shnommu   1360512   1360512
   sparc   3734865   3734881
 sparc64   5932081   5932249
  tilegx  10839527 0
 tilepro  10092546  10092610
  x86_64  10349451  10349038
  xtensa   1766572   1766572

So the patch helps nicely on many targets.  I looked into the regressions;
they all seem to be just unlucky, noise, or bad rtx_cost.  The tilegx build
fail is a target bug building _negvsi2.o -- the backend accepts shifts by
70 or 87 bits, but the assembler doesn't ;-)

> @@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
> rtx *pop1)
> if (subreg_lowpart_p (op0)
> && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
>   ;
> +   else if (subreg_lowpart_p (op0)

Many of these lines start with a space before the tab, please fix.
Okay for trunk with that fixed.  Thank you!


Segher


Re: Fix PR77881: combine improvement

2016-11-13 Thread Michael Matz
Hi,

On Sat, 12 Nov 2016, Segher Boessenkool wrote:

> Hi Michael,
> 
> On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
> > PR missed-optimization/77881
> > * combine.c (simplify_comparison): Remove useless subregs
> > also inside the loop, not just after it.
> > 
> > testsuite/
> > * gcc.target/i386/pr77881.c: New test.
> 
> This isn't checked in yet, do you still want it?

Gnah, fell through the cracks.  I had to fix something else in combine to 
make it not regress in the testsuite.  The problem is that removing the 
subregs enables further simplifications which in turn might not be 
expected down-stream.  The particular problem was that originally the loop 
was left with an (subreg:QI (and:SI (lrshift X 24) 255) 0), whose inner op 
in turn was recognized by make_compound_operation and transformed into an 
extract.

With the patch we leave the loop now with essentially
  (subreg:QI (lrshift X 24) 0)
which of course is just the same masking and hence extract, but 
make_compound_operation didn't know.  With the amended patch it does.  I 
didn't come around making the AND and SUBREG handling a bit more common 
(which I initially wanted to do before posting), so for now I'm handling 
only the specific case I hit.

With this patch there are now no regressions on x86-64-linux (bootstrapped 
with all languages+ada).  Okay for trunk?


Ciao,
Michael.
PR missed-optimization/77881
* combine.c (simplify_comparison): Remove useless subregs
also inside the loop, not just after it.
(make_compound_operation): Recognize some subregs as being
masking as well.

testsuite/
* gcc.target/i386/pr77881.c: New test.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6ffa387..0210685 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8102,6 +8102,18 @@ make_compound_operation (rtx x, enum rtx_code in_code)
rtx inner = SUBREG_REG (x), simplified;
enum rtx_code subreg_code = in_code;
 
+   /* If the SUBREG is masking of a logical right shift,
+  make an extraction.  */
+   if (GET_CODE (inner) == LSHIFTRT
+   && GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (inner))
+   && subreg_lowpart_p (x))
+ {
+   new_rtx = make_compound_operation (XEXP (inner, 0), next_code);
+   new_rtx = make_extraction (mode, new_rtx, 0, XEXP (inner, 1),
+  mode_width, 1, 0, in_code == COMPARE);
+   break;
+ }
+
/* If in_code is COMPARE, it isn't always safe to pass it through
   to the recursive make_compound_operation call.  */
if (subreg_code == COMPARE
@@ -11994,6 +12006,29 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
  if (subreg_lowpart_p (op0)
  && GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))) < mode_width)
;
+ else if (subreg_lowpart_p (op0)
+  && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+  && GET_MODE_CLASS (GET_MODE (SUBREG_REG (op0))) == MODE_INT
+  && (code == NE || code == EQ)
+  && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0)))
+  <= HOST_BITS_PER_WIDE_INT)
+  && !paradoxical_subreg_p (op0)
+  && (nonzero_bits (SUBREG_REG (op0),
+GET_MODE (SUBREG_REG (op0)))
+  & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+   {
+ /* Remove outer subregs that don't do anything.  */
+ tem = gen_lowpart (GET_MODE (SUBREG_REG (op0)), op1);
+ 
+ if ((nonzero_bits (tem, GET_MODE (SUBREG_REG (op0)))
+  & ~GET_MODE_MASK (GET_MODE (op0))) == 0)
+   {
+ op0 = SUBREG_REG (op0);
+ op1 = tem;
+ continue;
+   }
+ break;
+   }
  else
break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr77881.c 
b/gcc/testsuite/gcc.target/i386/pr77881.c
new file mode 100644
index 000..80d143f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr77881.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2" } */
+extern void baz(void);
+int
+foo (long long int a, long long int a2, int b)
+{
+if (a < 0 || b)
+  baz ();
+}
+/* { dg-final { scan-assembler "js\[ \t\]\.L" } } */
+/* { dg-final { scan-assembler "jne\[ \t\]\.L" } } */


Re: Fix PR77881: combine improvement

2016-11-12 Thread Segher Boessenkool
Hi Michael,

On Thu, Oct 20, 2016 at 04:20:09PM +0200, Michael Matz wrote:
>   PR missed-optimization/77881
>   * combine.c (simplify_comparison): Remove useless subregs
>   also inside the loop, not just after it.
> 
> testsuite/
>   * gcc.target/i386/pr77881.c: New test.

This isn't checked in yet, do you still want it?

Thanks,


Segher


Re: Fix PR77881: combine improvement

2016-10-20 Thread Jeff Law

On 10/20/2016 08:20 AM, Michael Matz wrote:

Hello,

like analyzed in the PR, combine is able to remove outer subregs that
don't do anything interesting in the context they are used
(simplify_comparison).  But that currently happens outside of the loop
that retries simplifications if changes occurred.

When we do that inside the loop as well we get secondary simplifications
that currently only happen when calling the simplifiers multiple time,
like when we start from three rather than from two instructions.  So
right now we're in the curious position that more complicated code is
optimized better than simpler code and the patch fixes this.

(FWIW: this replicates parts of rather than moves the responsible code,
because between the loop and the original place of simplification other
things happen that might itself generate subregs).

Regstrapping on x86-64, all languages in process.  Okay if that passes?


Ciao,
Michael.
PR missed-optimization/77881
* combine.c (simplify_comparison): Remove useless subregs
also inside the loop, not just after it.

testsuite/
* gcc.target/i386/pr77881.c: New test.
LGTM.  I was a bit curious why you duplicated rather than factoring, but 
it looks like you simplified the copy a bit by not handling the 
paradoxical subreg case.



There's an outside chance this might help a couple BZs that I've poked 
at in the past.  I'll make a point to test them again this release cycle 
to see if your change improves them (I don't have the #s handy, I think 
they're P4/P5 regressions).


jeff