PING: RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-06-02 Thread Yangfei (Felix)
Gentle ping ...

> -Original Message-
> From: Yangfei (Felix)
> Sent: Wednesday, May 27, 2020 11:52 AM
> To: 'Segher Boessenkool' 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: RE: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi,
> 
> > -Original Message-
> > From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> > Sent: Tuesday, May 26, 2020 11:32 PM
> > To: Yangfei (Felix) 
> > Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A)
> > 
> > Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> > comparisons with zero
> 
> Snip...
> 
> >
> > Yes, please try to get this sorted somehow.  Maybe you can ask other
> > people in your company that have this same problem?
> 
> Will try and see.
> 
> > > > > +   new_rtx = gen_rtx_AND (mode, new_rtx,
> > > > > +  gen_int_mode (mask << real_pos,
> mode));
> > > > > + }
> > > >
> > > > So this changes
> > > >   ((X >> C) & M) == ...
> > > > to
> > > >   (X & (M << C)) == ...
> > > > ?
> > > >
> > > > Where then does it check what ... is?  This is only valid like
> > > > this if that is
> > zero.
> > > >
> > > > Why should this go in combine and not in simplify-rtx instead?
> > >
> > > True.  This is only valid when ... is zero.
> > > That's why we need the "&& equality_comparison " condition here.
> >
> > But that doesn't test if the other side of the comparison is 0.
> 
> Well, the caller has ensured that.
> 
> Here, local variable "equality_comparison" in
> make_compound_operation_int depends on parameter "in_code":
>  8088   if (in_code == EQ)
>  8089 {
>  8090   equality_comparison = true;
>  8091   in_code = COMPARE;
>  8092 }
> 
> The only caller of make_compound_operation_int is
> make_compound_operation.
> The comment of the caller says something about " in_code ":
> 
>  8512IN_CODE says what kind of expression we are processing.  Normally, it
> is
>  8513SET.  In a memory address it is MEM.  When processing the arguments
> of
>  8514a comparison or a COMPARE against zero, it is COMPARE, or EQ if
> more
>  8515precisely it is an equality comparison against zero.  */
> 
> For the given test case, we have a call trace of:
> (gdb) bt
> #0  make_compound_operation_int (mode=..., x_ptr=0xbd08,
> in_code=COMPARE, next_code_ptr=0xbd1c) at ../../gcc-
> git/gcc/combine.c:8248
> #1  0x0208983c in make_compound_operation (x=0xb211c768,
> in_code=EQ) at ../../gcc-git/gcc/combine.c:8539
> #2  0x020970fc in simplify_comparison (code=NE,
> pop0=0xc1e8, pop1=0xc1e0) at ../../gcc-
> git/gcc/combine.c:13032
> #3  0x02084544 in simplify_set (x=0xb211c240) at ../../gcc-
> git/gcc/combine.c:6932
> #4  0x02082688 in combine_simplify_rtx (x=0xb211c240,
> op0_mode=E_VOIDmode, in_dest=0, in_cond=0) at ../../gcc-
> git/gcc/combine.c:6445
> #5  0x0208025c in subst (x=0xb211c240, from=0xb211c138,
> to=0xb211c150, in_dest=0, in_cond=0, unique_copy=0)
> at ../../gcc-git/gcc/combine.c:5724
> #6  0x02079110 in try_combine (i3=0xb22cc3c0, i2=0xb22cc340,
> i1=0x0, i0=0x0, new_direct_jump_p=0xceb4,
> last_combined_insn=0xb22cc3c0) at ../../gcc-git/gcc/combine.c:3413
> #7  0x02073004 in combine_instructions (f=0xb211d038,
> nregs=103) at ../../gcc-git/gcc/combine.c:1305
> #8  0x0209cc50 in rest_of_handle_combine () at ../../gcc-
> git/gcc/combine.c:15088
> 
> In simplify_comparison (combine.c:13032):
> 
> 13028   rtx_code op0_mco_code = SET;
> 13029   if (op1 == const0_rtx)
> 13030 op0_mco_code = code == NE || code == EQ ? EQ : COMPARE;
> 13031
> 13032   op0 = make_compound_operation (op0, op0_mco_code);
> 13033   op1 = make_compound_operation (op1, SET);
> 
> 
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr94026.c
> > > > > @@ -0,0 +1,21 @@
> > > > > +/* { dg-do compile { target aarch64*-*-* i?86-*-* x86_64-*-* }
> > > > > +} */
> > > >
> > > > Why restrict this to only some targets?
> > >
> > > That's because I only have these targets for verification.
> > > But I think this can work on other targets.  Removed from the v4 patch.
> > > Could you please help check

RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-26 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Tuesday, May 26, 2020 11:32 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero

Snip...

> 
> Yes, please try to get this sorted somehow.  Maybe you can ask other people
> in your company that have this same problem?

Will try and see.

> > > > + new_rtx = gen_rtx_AND (mode, new_rtx,
> > > > +gen_int_mode (mask << real_pos, mode));
> > > > +   }
> > >
> > > So this changes
> > >   ((X >> C) & M) == ...
> > > to
> > >   (X & (M << C)) == ...
> > > ?
> > >
> > > Where then does it check what ... is?  This is only valid like this if 
> > > that is
> zero.
> > >
> > > Why should this go in combine and not in simplify-rtx instead?
> >
> > True.  This is only valid when ... is zero.
> > That's why we need the "&& equality_comparison " condition here.
> 
> But that doesn't test if the other side of the comparison is 0.

Well, the caller has ensured that.

Here, local variable "equality_comparison" in make_compound_operation_int 
depends on parameter "in_code":
 8088   if (in_code == EQ)
 8089 {
 8090   equality_comparison = true;
 8091   in_code = COMPARE;
 8092 }

The only caller of make_compound_operation_int is make_compound_operation.
The comment of the caller says something about " in_code ":

 8512IN_CODE says what kind of expression we are processing.  Normally, it 
is
 8513SET.  In a memory address it is MEM.  When processing the arguments of
 8514a comparison or a COMPARE against zero, it is COMPARE, or EQ if more
 8515precisely it is an equality comparison against zero.  */

For the given test case, we have a call trace of:
(gdb) bt
#0  make_compound_operation_int (mode=..., x_ptr=0xbd08, 
in_code=COMPARE, next_code_ptr=0xbd1c) at 
../../gcc-git/gcc/combine.c:8248
#1  0x0208983c in make_compound_operation (x=0xb211c768, 
in_code=EQ) at ../../gcc-git/gcc/combine.c:8539
#2  0x020970fc in simplify_comparison (code=NE, pop0=0xc1e8, 
pop1=0xc1e0) at ../../gcc-git/gcc/combine.c:13032
#3  0x02084544 in simplify_set (x=0xb211c240) at 
../../gcc-git/gcc/combine.c:6932
#4  0x02082688 in combine_simplify_rtx (x=0xb211c240, 
op0_mode=E_VOIDmode, in_dest=0, in_cond=0) at ../../gcc-git/gcc/combine.c:6445
#5  0x0208025c in subst (x=0xb211c240, from=0xb211c138, 
to=0xb211c150, in_dest=0, in_cond=0, unique_copy=0)
at ../../gcc-git/gcc/combine.c:5724
#6  0x02079110 in try_combine (i3=0xb22cc3c0, i2=0xb22cc340, 
i1=0x0, i0=0x0, new_direct_jump_p=0xceb4,
last_combined_insn=0xb22cc3c0) at ../../gcc-git/gcc/combine.c:3413
#7  0x02073004 in combine_instructions (f=0xb211d038, nregs=103) at 
../../gcc-git/gcc/combine.c:1305
#8  0x0209cc50 in rest_of_handle_combine () at 
../../gcc-git/gcc/combine.c:15088

In simplify_comparison (combine.c:13032):

13028   rtx_code op0_mco_code = SET;
13029   if (op1 == const0_rtx)
13030 op0_mco_code = code == NE || code == EQ ? EQ : COMPARE;
13031
13032   op0 = make_compound_operation (op0, op0_mco_code);
13033   op1 = make_compound_operation (op1, SET);


> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr94026.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-do compile { target aarch64*-*-* i?86-*-* x86_64-*-* } }
> > > > +*/
> > >
> > > Why restrict this to only some targets?
> >
> > That's because I only have these targets for verification.
> > But I think this can work on other targets.  Removed from the v4 patch.
> > Could you please help check the other ports?
> 
> In general, you should never restrict anything to some targets simply
> because you haven't tested it on other targets.
> 
> If it is a good test it will just work on those other targets.  Traffic on 
> gcc-
> testresults@ will show you if it actually does.

OK.  Thanks for pointing this out  :-)

> > > > +/* { dg-options "-O2 -fdump-rtl-combine" } */
> > > > +
> > > > +int
> > > > +foo (int c)
> > > > +{
> > > > +  int a = (c >> 8) & 7;
> > > > +
> > > > +  if (a >= 2) {
> > > > +return 1;
> > > > +  }
> > > > +
> > > > +  return 0;
> > > > +}
> > > > +
> > > > +/* The combine phas

Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-26 Thread Segher Boessenkool
Hi!

On Tue, May 26, 2020 at 03:45:05AM +, Yangfei (Felix) wrote:
> > > I am using Outlook and I didn't find the place to change the MIME type
> > > : - (
> > 
> > The simplest option is to use a different email client, one that plays 
> > nicely
> > with others.  You use git, maybe you could even use git-send-email?
> 
> The bad news is that it would be hard to switch to a different email client 
> with my company's IT policy  :-( 
> But I think I can ask IT if that is possible. Sorry for the trouble.

Yes, please try to get this sorted somehow.  Maybe you can ask other
people in your company that have this same problem?

> > > +   new_rtx = gen_rtx_AND (mode, new_rtx,
> > > +  gen_int_mode (mask << real_pos, mode));
> > > + }
> > 
> > So this changes
> >   ((X >> C) & M) == ...
> > to
> >   (X & (M << C)) == ...
> > ?
> > 
> > Where then does it check what ... is?  This is only valid like this if that 
> > is zero.
> > 
> > Why should this go in combine and not in simplify-rtx instead?
> 
> True.  This is only valid when ... is zero.
> That's why we need the "&& equality_comparison " condition here.

But that doesn't test if the other side of the comparison is 0.

> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr94026.c
> > > @@ -0,0 +1,21 @@
> > > +/* { dg-do compile { target aarch64*-*-* i?86-*-* x86_64-*-* } } */
> > 
> > Why restrict this to only some targets?
> 
> That's because I only have these targets for verification.
> But I think this can work on other targets.  Removed from the v4 patch.
> Could you please help check the other ports?

In general, you should never restrict anything to some targets simply
because you haven't tested it on other targets.

If it is a good test it will just work on those other targets.  Traffic
on gcc-testresults@ will show you if it actually does.

> > > +/* { dg-options "-O2 -fdump-rtl-combine" } */
> > > +
> > > +int
> > > +foo (int c)
> > > +{
> > > +  int a = (c >> 8) & 7;
> > > +
> > > +  if (a >= 2) {
> > > +return 1;
> > > +  }
> > > +
> > > +  return 0;
> > > +}
> > > +
> > > +/* The combine phase should transform (compare (and (lshiftrt x 8) 6) 0)
> > > +   to (compare (and (x 1536)) 0). We look for the *attempt* to match this
> > > +   RTL pattern, regardless of whether an actual insn may be found on the
> > > +   platform.  */
> > > +
> > > +/* { dg-final { scan-rtl-dump "\\(const_int 1536" "combine" } } */
> > 
> > That is a very fragile test.
> 
> For this specific test case, (const_int 1536) is calculated from 
> subexpression (M << C) in (X & (M << C)).
> I also see some similar checkings in gcc.dg/asr_div1.c.  Suggesions?

Maybe it is better to test that the non-optimal code you saw before is
not generated anymore?  That could be a target-specific test of course.
The advantage is that the test will not break for all kinds of unrelated
reasons (like this test) -- that simply does not scale with the number
of tests we have.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-25 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Tuesday, May 26, 2020 12:27 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero

Snip...

> > I am using Outlook and I didn't find the place to change the MIME type
> > : - (
> 
> The simplest option is to use a different email client, one that plays nicely
> with others.  You use git, maybe you could even use git-send-email?

The bad news is that it would be hard to switch to a different email client 
with my company's IT policy  :-( 
But I think I can ask IT if that is possible. Sorry for the trouble.

> I'll paste things manually...
> 
> > From a19238c02c1e6ab9593a14a13e1e3dff90ed Mon Sep 17 00:00:00
> 2001
> > From: Fei Yang 
> > Date: Mon, 25 May 2020 10:19:30 +0800
> > Subject: [PATCH] combine: missed opportunity to simplify comparisons
> > with zero  [PR94026]
> 
> (Capital "M" on "Missed" please)
> 
> But, the subject should say what the patch *does*.  So maybe
>   combine: Simplify more comparisons with zero (PR94026)

OK. 

> > If we have (and (lshiftrt X C) M) and M is a constant that would
> > select a field of bits within an item, but not the entire word, fold
> > this into a simple AND if we are in an equality comparison against zero.
> 
> But that subject doesn't really describe what the patch does, anyway?

OK.  Modified in the v4 patch.  Does it look better?

> > gcc/
> > PR rtl-optimization/94026
> > * combine.c (make_compound_operation_int): If we have (and
> > (lshiftrt X C) M) and M is a constant that would select a field
> > of bits within an item, but not the entire word, fold this into
> > a simple AND if we are in an equality comparison.
> >
> > gcc/testsuite/
> > PR rtl-optimization/94026
> > * gcc.dg/pr94026.c: New test.
> 
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,11 @@
> > +2020-05-25  Felix Yang  
> > +
> > +   PR rtl-optimization/94026
> > +   * combine.c (make_compound_operation_int): If we have (and
> > +   (lshiftrt X C) M) and M is a constant that would select a field
> > +   of bits within an item, but not the entire word, fold this into
> > +   a simple AND if we are in an equality comparison.
> 
> Don't put the changelog in the patch.

OK.  I paste it here:

gcc/ChangeLog

+2020-05-26  Felix Yang  
+
+   PR rtl-optimization/94026
+   * combine.c (make_compound_operation_int): If we have (and
+   (lshiftrt X C) M) and M is a constant that would select a field
+   of bits within an item, but not the entire word, fold this into
+   a simple AND if we are in an equality comparison.

gcc/testsuite/ChangeLog

+2020-05-26  Felix Yang  
+
+   PR rtl-optimization/94026
+   * gcc.dg/pr94026.c: New test.

> > diff --git a/gcc/combine.c b/gcc/combine.c index
> > b044f29fd36..76d62b0bd17 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -8178,6 +8178,10 @@ make_compound_operation_int
> (scalar_int_mode mode, rtx *x_ptr,
> >if (!CONST_INT_P (XEXP (x, 1)))
> > break;
> >
> > +  HOST_WIDE_INT pos;
> > +  unsigned HOST_WIDE_INT len;
> > +  pos = get_pos_from_mask (UINTVAL (XEXP (x, 1)), );
> 
>   unsigned HOST_WIDE_INT len;
>   HOST_WIDE_INT pos = get_pos_from_mask (UINTVAL (XEXP (x, 1)), );
> 
> > @@ -8231,6 +8235,22 @@ make_compound_operation_int
> (scalar_int_mode mode, rtx *x_ptr,
> >   new_rtx = make_compound_operation (new_rtx, in_code);
> > }
> >
> > +  /* If we have (and (lshiftrt X C) M) and M is a constant that would
> select
> > +a field of bits within an item, but not the entire word, this might be
> > +representable by a simple AND if we are in an equality comparison.
> */
> > +  else if (pos > 0 && equality_comparison
> 
> That "&& equality_comparison" should be on a separate line as well.

OK.

> > +  && GET_CODE (XEXP (x, 0)) == LSHIFTRT
> > +  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
> > +  && pos + UINTVAL (XEXP (XEXP (x, 0), 1))
> > + <= GET_MODE_BITSIZE (mode))
> > +   {
> > + new_rtx = make_compound_operation (XEXP (XEXP (x, 0), 0),
> next_code);
> > + HOST_WIDE_INT real_pos = pos + UINTVAL (XEXP (XEXP (x, 0), 1));
> > + unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 <<
> len) -
> > +1;
> 
> Space after cast

Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-25 Thread Segher Boessenkool
On Mon, May 25, 2020 at 02:59:30AM +, Yangfei (Felix) wrote:
> > It creates better code on all targets :-)  A quite small improvement, but 
> > not
> > entirely trivial.
> 
> Thanks for the effort.  It's great to hear that :- )

Yes :-)

> > > > p.s.  Please use a correct mime type?  application/octet-stream
> > > > isn't something I can reply to.  Just text/plain is fine :-)
> > >
> > > I have using plain text now, hope that works for you.  :-)
> > 
> > Nope:
> > 
> > [-- Attachment #2: pr94026-v2.diff --]
> > [-- Type: application/octet-stream, Encoding: base64, Size: 5.9K --]
> 
> This time I switched to use UUEncode type for the attachment.  Does it work?

No:

[-- Attachment #2: pr94026-v3.diff --]
[-- Type: application/octet-stream, Encoding: base64, Size: 5.8K --]

> I am using Outlook and I didn't find the place to change the MIME type : - (

The simplest option is to use a different email client, one that plays
nicely with others.  You use git, maybe you could even use git-send-email?

I'll paste things manually...

> From a19238c02c1e6ab9593a14a13e1e3dff90ed Mon Sep 17 00:00:00 2001
> From: Fei Yang 
> Date: Mon, 25 May 2020 10:19:30 +0800
> Subject: [PATCH] combine: missed opportunity to simplify comparisons with zero
>  [PR94026]

(Capital "M" on "Missed" please)

But, the subject should say what the patch *does*.  So maybe
  combine: Simplify more comparisons with zero (PR94026)

> If we have (and (lshiftrt X C) M) and M is a constant that would select
> a field of bits within an item, but not the entire word, fold this into
> a simple AND if we are in an equality comparison against zero.

But that subject doesn't really describe what the patch does, anyway?

> gcc/
> PR rtl-optimization/94026
> * combine.c (make_compound_operation_int): If we have (and
> (lshiftrt X C) M) and M is a constant that would select a field
> of bits within an item, but not the entire word, fold this into
> a simple AND if we are in an equality comparison.
> 
> gcc/testsuite/
> PR rtl-optimization/94026
> * gcc.dg/pr94026.c: New test.

> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,11 @@
> +2020-05-25  Felix Yang  
> +
> + PR rtl-optimization/94026
> + * combine.c (make_compound_operation_int): If we have (and
> + (lshiftrt X C) M) and M is a constant that would select a field
> + of bits within an item, but not the entire word, fold this into
> + a simple AND if we are in an equality comparison.

Don't put the changelog in the patch.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index b044f29fd36..76d62b0bd17 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -8178,6 +8178,10 @@ make_compound_operation_int (scalar_int_mode mode, rtx 
> *x_ptr,
>if (!CONST_INT_P (XEXP (x, 1)))
>   break;
>  
> +  HOST_WIDE_INT pos;
> +  unsigned HOST_WIDE_INT len;
> +  pos = get_pos_from_mask (UINTVAL (XEXP (x, 1)), );

  unsigned HOST_WIDE_INT len;
  HOST_WIDE_INT pos = get_pos_from_mask (UINTVAL (XEXP (x, 1)), );

> @@ -8231,6 +8235,22 @@ make_compound_operation_int (scalar_int_mode mode, rtx 
> *x_ptr,
> new_rtx = make_compound_operation (new_rtx, in_code);
>   }
>  
> +  /* If we have (and (lshiftrt X C) M) and M is a constant that would 
> select
> +  a field of bits within an item, but not the entire word, this might be
> +  representable by a simple AND if we are in an equality comparison.  */
> +  else if (pos > 0 && equality_comparison

That "&& equality_comparison" should be on a separate line as well.

> +&& GET_CODE (XEXP (x, 0)) == LSHIFTRT
> +&& CONST_INT_P (XEXP (XEXP (x, 0), 1))
> +&& pos + UINTVAL (XEXP (XEXP (x, 0), 1))
> +   <= GET_MODE_BITSIZE (mode))
> + {
> +   new_rtx = make_compound_operation (XEXP (XEXP (x, 0), 0), next_code);
> +   HOST_WIDE_INT real_pos = pos + UINTVAL (XEXP (XEXP (x, 0), 1));
> +   unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << len) - 1;

Space after cast.

> +   new_rtx = gen_rtx_AND (mode, new_rtx,
> +  gen_int_mode (mask << real_pos, mode));
> + }

So this changes
  ((X >> C) & M) == ...
to
  (X & (M << C)) == ...
?

Where then does it check what ... is?  This is only valid like this if
that is zero.

Why should this go in combine and not in simplify-rtx instead?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr94026.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target aarch64*-*-* i?86-*-* x86_64-*-* } } */

Why restrict this to only some targets?

> +/* { dg-options "-O2 -fdump-rtl-combine" } */
> +
> +int
> +foo (int c)
> +{
> +  int a = (c >> 8) & 7;
> +
> +  if (a >= 2) {
> +return 1;
> +  }
> +
> +  return 0;
> +}
> +
> +/* The combine phase should transform (compare (and (lshiftrt x 8) 6) 0)
> +   to (compare (and (x 1536)) 0). We look for the *attempt* to match this
> +   RTL pattern, regardless of whether an actual insn may be found 

RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-24 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Saturday, May 23, 2020 10:57 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi!
> 
> Sorry this is taking so long.
> 
> On Wed, May 06, 2020 at 08:57:52AM +, Yangfei (Felix) wrote:
> > > On Tue, Mar 24, 2020 at 06:30:12AM +, Yangfei (Felix) wrote:
> > > > I modified combine emitting a simple AND operation instead of
> > > > making one
> > > zero_extract for this scenario.
> > > > Attached please find the new patch.  Hope this solves both of our
> concerns.
> > >
> > > This looks promising.  I'll try it out, see what it does on other
> > > targets.  (It will have to wait for GCC 11 stage 1, of course).
> 
> It creates better code on all targets :-)  A quite small improvement, but not
> entirely trivial.

Thanks for the effort.  It's great to hear that :- )
Attached please find the v3 patch.  Rebased on the latest trunk. 
Bootstrapped and tested on aarch64-linux-gnu.  Could you please help install it?

> > > p.s.  Please use a correct mime type?  application/octet-stream
> > > isn't something I can reply to.  Just text/plain is fine :-)
> >
> > I have using plain text now, hope that works for you.  :-)
> 
> Nope:
> 
> [-- Attachment #2: pr94026-v2.diff --]
> [-- Type: application/octet-stream, Encoding: base64, Size: 5.9K --]

This time I switched to use UUEncode type for the attachment.  Does it work?
I am using Outlook and I didn't find the place to change the MIME type : - (

Felix


pr94026-v3.diff
Description: pr94026-v3.diff


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-23 Thread Segher Boessenkool
Hi!

Sorry this is taking so long.

On Wed, May 06, 2020 at 08:57:52AM +, Yangfei (Felix) wrote:
> > On Tue, Mar 24, 2020 at 06:30:12AM +, Yangfei (Felix) wrote:
> > > I modified combine emitting a simple AND operation instead of making one
> > zero_extract for this scenario.
> > > Attached please find the new patch.  Hope this solves both of our 
> > > concerns.
> > 
> > This looks promising.  I'll try it out, see what it does on other targets.  
> > (It will
> > have to wait for GCC 11 stage 1, of course).

It creates better code on all targets :-)  A quite small improvement, but
not entirely trivial.

> > p.s.  Please use a correct mime type?  application/octet-stream isn't
> > something I can reply to.  Just text/plain is fine :-)
> 
> I have using plain text now, hope that works for you.  :-)

Nope:

[-- Attachment #2: pr94026-v2.diff --]
[-- Type: application/octet-stream, Encoding: base64, Size: 5.9K --]


Segher


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-07 Thread Segher Boessenkool
Hi!

On Wed, May 06, 2020 at 08:57:52AM +, Yangfei (Felix) wrote:
> > This looks promising.  I'll try it out, see what it does on other targets.  
> > (It will
> > have to wait for GCC 11 stage 1, of course).
> 
> I see GCC11 stage 1 opens for commits now.
> I have rebased the patch on the latest repo.  Attached please find the v2 
> patch.
> Bootstrapped and tested on x86-64-linux-gnu and aarch64-linux-gnu.
> Is this good to go?

I'll try it out.  Takes a day or so...  Stay tuned.

> > p.s.  Please use a correct mime type?  application/octet-stream isn't
> > something I can reply to.  Just text/plain is fine :-)
> 
> I have using plain text now, hope that works for you.  :-)

It is still application/octet-stream.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-05-06 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Tuesday, March 24, 2020 10:58 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> On Tue, Mar 24, 2020 at 06:30:12AM +, Yangfei (Felix) wrote:
> > I modified combine emitting a simple AND operation instead of making one
> zero_extract for this scenario.
> > Attached please find the new patch.  Hope this solves both of our concerns.
> 
> This looks promising.  I'll try it out, see what it does on other targets.  
> (It will
> have to wait for GCC 11 stage 1, of course).

I see GCC11 stage 1 opens for commits now.
I have rebased the patch on the latest repo.  Attached please find the v2 patch.
Bootstrapped and tested on x86-64-linux-gnu and aarch64-linux-gnu.
Is this good to go?
 
> 
> p.s.  Please use a correct mime type?  application/octet-stream isn't
> something I can reply to.  Just text/plain is fine :-)

I have using plain text now, hope that works for you.  :-)

Thanks,
Felix


pr94026-v2.diff
Description: pr94026-v2.diff


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-24 Thread Segher Boessenkool
On Tue, Mar 24, 2020 at 06:30:12AM +, Yangfei (Felix) wrote:
> I modified combine emitting a simple AND operation instead of making one 
> zero_extract for this scenario.  
> Attached please find the new patch.  Hope this solves both of our concerns.  

This looks promising.  I'll try it out, see what it does on other
targets.  (It will have to wait for GCC 11 stage 1, of course).

Thanks!


Segher


p.s.  Please use a correct mime type?  application/octet-stream isn't
something I can reply to.  Just text/plain is fine :-)


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-24 Thread Yangfei (Felix)
Hi!

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Monday, March 23, 2020 8:10 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Yeah, maybe not in simplify-rtx.c, hrm.  There is SELECT_CC_MODE for these
> things, and combine knows about that (not many passes do).

I modified combine emitting a simple AND operation instead of making one 
zero_extract for this scenario.  
Attached please find the new patch.  Hope this solves both of our concerns.  

gcc
+2020-03-24  Felix Yang   
+
+   PR rtl-optimization/94026
+   * combine.c (make_compound_operation_int): If we are have (and
+   (lshiftrt X C) M) and M is a constant that would select a field
+   of bits within an item, but not the entire word, fold this into
+   a simple AND if we are in an equality comparison.

gcc/testsuite
+2020-03-24  Felix Yang   
+
+   PR rtl-optimization/94026
+   * gcc.dg/pr94026.c: New test.


Thanks,
Felix


pr94026-v1.patch
Description: pr94026-v1.patch


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-23 Thread Segher Boessenkool
On Mon, Mar 23, 2020 at 07:46:17AM +, Yangfei (Felix) wrote:
> I think the problem here is how to make sure we are doing a ***equality*** 
> comparison with zero.  
> We can only do the transformation under this condition.  
> Then I see combine tries the following pattern: 
> 
> 173 Failed to match this instruction:
> 174 (set (reg:SI 101)
> 175 (ne:SI (and:SI (lshiftrt:SI (reg:SI 102)
> 176 (const_int 8 [0x8]))
> 177 (const_int 6 [0x6]))
> 178 (const_int 0 [0])))

Like I said, this should be simplified to
(set (reg:SI 10)
 (ne:SI (and:SI (reg:SI 102)
(const_int 1536))
(const_int 0)))

> But this cannot match a 'tst' instruction as the above pattern does not 
> clobber the CC flag register.  

combine can (and will) add such clobbers itself, if the pattern matches
otherwise, so that is not an issue.

> > This should be simplified to
> > (set (reg:CC_NZ 66 cc)
> >  (compare:CC_NZ (and:SI (reg:SI 103)
> > (const_int 1536))
> > (const_int 0)))
> > (but it isn't), and that is just *and3nr_compare0, which is a "tst"
> > instruction.  If this is fixed (in simplify-rtx.c), it will work as you 
> > want.
> 
> But I don't think it's correct for logic in simplify-rtx.c to further 
> simplify this rtl:  
> 
> (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 102)
> (const_int 8 [0x8]))
> (const_int 6 [0x6]))
> (const_int 0 [0]))

> The reason is that it knows nothing about CC_NZ.  

Yeah, maybe not in simplify-rtx.c, hrm.  There is SELECT_CC_MODE for
these things, and combine knows about that (not many passes do).


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-23 Thread Yangfei (Felix)
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Friday, March 20, 2020 9:38 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> On Thu, Mar 19, 2020 at 01:43:40AM +, Yangfei (Felix) wrote:
> > 2. Given that the patterns for ubfx and ubfiz are already not simple, I am
> afraid the pattern we got by combining the three would be much complex.
> >   And even more complex when further merged with insn 14 here in order to
> make sure that we are doing a equality comparison with zero.
> 
> It will be just as simple as with the other approach:

I think the problem here is how to make sure we are doing a ***equality*** 
comparison with zero.  
We can only do the transformation under this condition.  
Then I see combine tries the following pattern: 

173 Failed to match this instruction:
174 (set (reg:SI 101)
175 (ne:SI (and:SI (lshiftrt:SI (reg:SI 102)
176 (const_int 8 [0x8]))
177 (const_int 6 [0x6]))
178 (const_int 0 [0])))

But this cannot match a 'tst' instruction as the above pattern does not clobber 
the CC flag register.  
Also this means that we depend on specific uses cases, so may need different 
patterns to match all possible cases.  

> > > Another approach:
> > >
> > > Trying 7 -> 9:
> > > 7: r99:SI=r103:SI>>0x8
> > >   REG_DEAD r103:SI
> > > 9: cc:CC_NZ=cmp(r99:SI&0x6,0)
> > >   REG_DEAD r99:SI
> > > Failed to match this instruction:
> > > (set (reg:CC_NZ 66 cc)
> > > (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
> > > (const_int 8 [0x8]))
> > > (const_int 6 [0x6]))
> > > (const_int 0 [0])))
> > >
> > > This can be recognised as just that "tst" insn, no?  But combine (or
> > > simplify-rtx) should get rid of the shift here, just the "and" is
> > > simpler after all (it just needs to change the constant for that).
> >
> > No, this does not mean an equality comparison with zero.  I have mentioned
> this in my previous mail.
> 
> This should be simplified to
> (set (reg:CC_NZ 66 cc)
>  (compare:CC_NZ (and:SI (reg:SI 103)
> (const_int 1536))
> (const_int 0)))
> (but it isn't), and that is just *and3nr_compare0, which is a "tst"
> instruction.  If this is fixed (in simplify-rtx.c), it will work as you want.

But I don't think it's correct for logic in simplify-rtx.c to further simplify 
this rtl:  

(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 102)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0]))

The reason is that it knows nothing about CC_NZ.  
CC_NZ is aarch64-port specific and it does not necessarily mean a equality 
comparison with zero.  
Correct me if I missed anything.  

Thanks,
Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-19 Thread Segher Boessenkool
On Thu, Mar 19, 2020 at 01:43:40AM +, Yangfei (Felix) wrote:
> 2. Given that the patterns for ubfx and ubfiz are already not simple, I am 
> afraid the pattern we got by combining the three would be much complex.
>   And even more complex when further merged with insn 14 here in order to 
> make sure that we are doing a equality comparison with zero.  

It will be just as simple as with the other approach:

> > Another approach:
> > 
> > Trying 7 -> 9:
> > 7: r99:SI=r103:SI>>0x8
> >   REG_DEAD r103:SI
> > 9: cc:CC_NZ=cmp(r99:SI&0x6,0)
> >   REG_DEAD r99:SI
> > Failed to match this instruction:
> > (set (reg:CC_NZ 66 cc)
> > (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
> > (const_int 8 [0x8]))
> > (const_int 6 [0x6]))
> > (const_int 0 [0])))
> > 
> > This can be recognised as just that "tst" insn, no?  But combine (or
> > simplify-rtx) should get rid of the shift here, just the "and" is simpler 
> > after all (it
> > just needs to change the constant for that).
> 
> No, this does not mean an equality comparison with zero.  I have mentioned 
> this in my previous mail.  

This should be simplified to
(set (reg:CC_NZ 66 cc)
 (compare:CC_NZ (and:SI (reg:SI 103)
(const_int 1536))
(const_int 0)))
(but it isn't), and that is just *and3nr_compare0, which is a
"tst" instruction.  If this is fixed (in simplify-rtx.c), it will work
as you want.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-18 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Thursday, March 19, 2020 7:52 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi!
> 
> On Tue, Mar 17, 2020 at 02:05:19AM +, Yangfei (Felix) wrote:
> > > Trying 7 -> 8:
> > > 7: r99:SI=r103:SI>>0x8
> > >   REG_DEAD r103:SI
> > > 8: r100:SI=r99:SI&0x6
> > >   REG_DEAD r99:SI
> > > Failed to match this instruction:
> > > (set (reg:SI 100)
> > > (and:SI (lshiftrt:SI (reg:SI 103)
> > > (const_int 8 [0x8]))
> > > (const_int 6 [0x6])))
> > >
> > > That should match already, perhaps with a splitter.  aarch64 does
> > > not have very generic rotate-and-mask (or -insert) instructions, so
> > > the
> > > aarch64 backend needs to help combine with the less trivial cases.
> > >
> > > If you have a splitter for *this* one, all else will probably work
> > > "automatically": you split it to two ubfm, and the second of those
> > > can then merge into the compare instruction, and everything works out.
> >
> > Do you mean splitting the above pattern into a combination of ubfx and 
> > ubfiz?
> (Both are aliases of ubfm).
> 
> Sure.  The problem with aarch's bitfield instruction is that either the 
> source or
> the dest has to be right-aligned, which isn't natural for the compiler.
> 
> > I still don't see how the benefit can be achieved.
> > The following is the expected assembly for the test case:
> > tst x0, 1536
> > csetw0, ne
> > ret
> > This may not happen when the remaining ubfx is there.  Also what
> instruction be matched when ubfiz is merged into the compare?
> > Anything I missed?
> 
> The second insn could combine with the compare, and then that can combine
> back further.

Let me paste the RTL input to the combine phase:
/
(insn 6 3 7 2 (set (reg:SI 98)
(ashiftrt:SI (reg:SI 102)
(const_int 8 [0x8]))) "foo.c":3:16 742 
{*aarch64_ashr_sisd_or_int_si3}
 (expr_list:REG_DEAD (reg:SI 102)
(nil)))
(note 7 6 8 2 NOTE_INSN_DELETED)
(insn 8 7 9 2 (set (reg:CC_NZ 66 cc)
(compare:CC_NZ (and:SI (reg:SI 98)
(const_int 6 [0x6]))
(const_int 0 [0]))) "foo.c":5:8 698 {*andsi3nr_compare0}
 (expr_list:REG_DEAD (reg:SI 98)
(nil)))
(note 9 8 14 2 NOTE_INSN_DELETED)
(insn 14 9 15 2 (set (reg/i:SI 0 x0)
(ne:SI (reg:CC_NZ 66 cc)
(const_int 0 [0]))) "foo.c":10:1 494 {aarch64_cstoresi}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
*/

Two issues that I can see here:
1. When the ubfiz is combined with the compare, the combined insn does not 
necessarily mean a equality comparison with zero.  
  This is also the case when all the three insns (ubfx & ubfiz & compare) are 
combined together.  

2. Given that the patterns for ubfx and ubfiz are already not simple, I am 
afraid the pattern we got by combining the three would be much complex.
  And even more complex when further merged with insn 14 here in order to make 
sure that we are doing a equality comparison with zero.  

So it looks difficult when we go this port-specific way without matching a 
"zero_extact".  

> Another approach:
> 
> Trying 7 -> 9:
> 7: r99:SI=r103:SI>>0x8
>   REG_DEAD r103:SI
> 9: cc:CC_NZ=cmp(r99:SI&0x6,0)
>   REG_DEAD r99:SI
> Failed to match this instruction:
> (set (reg:CC_NZ 66 cc)
> (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
> (const_int 8 [0x8]))
> (const_int 6 [0x6]))
> (const_int 0 [0])))
> 
> This can be recognised as just that "tst" insn, no?  But combine (or
> simplify-rtx) should get rid of the shift here, just the "and" is simpler 
> after all (it
> just needs to change the constant for that).

No, this does not mean an equality comparison with zero.  I have mentioned this 
in my previous mail.  

Thanks,
Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-18 Thread Segher Boessenkool
Hi!

On Tue, Mar 17, 2020 at 02:05:19AM +, Yangfei (Felix) wrote:
> > Trying 7 -> 8:
> > 7: r99:SI=r103:SI>>0x8
> >   REG_DEAD r103:SI
> > 8: r100:SI=r99:SI&0x6
> >   REG_DEAD r99:SI
> > Failed to match this instruction:
> > (set (reg:SI 100)
> > (and:SI (lshiftrt:SI (reg:SI 103)
> > (const_int 8 [0x8]))
> > (const_int 6 [0x6])))
> > 
> > That should match already, perhaps with a splitter.  aarch64 does not have
> > very generic rotate-and-mask (or -insert) instructions, so the
> > aarch64 backend needs to help combine with the less trivial cases.
> > 
> > If you have a splitter for *this* one, all else will probably work
> > "automatically": you split it to two ubfm, and the second of those can then
> > merge into the compare instruction, and everything works out.
> 
> Do you mean splitting the above pattern into a combination of ubfx and ubfiz? 
>  (Both are aliases of ubfm).  

Sure.  The problem with aarch's bitfield instruction is that either the
source or the dest has to be right-aligned, which isn't natural for the
compiler.

> I still don't see how the benefit can be achieved.  
> The following is the expected assembly for the test case:  
> tst x0, 1536
> csetw0, ne
> ret
> This may not happen when the remaining ubfx is there.  Also what instruction 
> be matched when ubfiz is merged into the compare?  
> Anything I missed?  

The second insn could combine with the compare, and then that can combine
back further.

Another approach:

Trying 7 -> 9:
7: r99:SI=r103:SI>>0x8
  REG_DEAD r103:SI
9: cc:CC_NZ=cmp(r99:SI&0x6,0)
  REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:CC_NZ 66 cc)
(compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 103)
(const_int 8 [0x8]))
(const_int 6 [0x6]))
(const_int 0 [0])))

This can be recognised as just that "tst" insn, no?  But combine (or
simplify-rtx) should get rid of the shift here, just the "and" is
simpler after all (it just needs to change the constant for that).

> Also it's interesting to see how this may affect on those archs.  

Yes.  Which is why "canonicalisation" rules that are really just "this
works better on targets A and B" do not usually work well.  Rules that
are predictable and that actually simplify the code might still need
all targets to update (and target maintainers will grumble, myself
included), but at least that is a way forwards (and not backwards or
sideways).


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-16 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Tuesday, March 17, 2020 1:58 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> On Mon, Mar 16, 2020 at 06:29:39AM +, Yangfei (Felix) wrote:
> > Sorry for not getting your point here.
> 
> Trying 7 -> 8:
> 7: r99:SI=r103:SI>>0x8
>   REG_DEAD r103:SI
> 8: r100:SI=r99:SI&0x6
>   REG_DEAD r99:SI
> Failed to match this instruction:
> (set (reg:SI 100)
> (and:SI (lshiftrt:SI (reg:SI 103)
> (const_int 8 [0x8]))
> (const_int 6 [0x6])))
> 
> That should match already, perhaps with a splitter.  aarch64 does not have
> very generic rotate-and-mask (or -insert) instructions, so the
> aarch64 backend needs to help combine with the less trivial cases.
> 
> If you have a splitter for *this* one, all else will probably work
> "automatically": you split it to two ubfm, and the second of those can then
> merge into the compare instruction, and everything works out.

Do you mean splitting the above pattern into a combination of ubfx and ubfiz?  
(Both are aliases of ubfm).  
I still don't see how the benefit can be achieved.  
The following is the expected assembly for the test case:  
tst x0, 1536
csetw0, ne
ret
This may not happen when the remaining ubfx is there.  Also what instruction be 
matched when ubfiz is merged into the compare?  
Anything I missed?  

> > Also, this issue is there for ports like x86.  If we go that way, then we 
> > need
> to handle each port affected.
> 
> Yes, you need to do target-specific stuff in every backend separately.
> 
> > So I am inclined to handle this in an arch-independent way.
> 
> But you don't.  The transformation you do looks to be actively harmful on
> many targets.  (I haven't tested it yet, that takes 8h currently).

Now I know your concern about zero_extract.  Maybe this should be mentioned in 
docs like gccint.  
Also it's interesting to see how this may affect on those archs.  

Thanks,
Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-16 Thread Segher Boessenkool
On Mon, Mar 16, 2020 at 06:29:39AM +, Yangfei (Felix) wrote:
> Sorry for not getting your point here. 

Trying 7 -> 8:
7: r99:SI=r103:SI>>0x8
  REG_DEAD r103:SI
8: r100:SI=r99:SI&0x6
  REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:SI 100)
(and:SI (lshiftrt:SI (reg:SI 103)
(const_int 8 [0x8]))
(const_int 6 [0x6])))

That should match already, perhaps with a splitter.  aarch64 does not
have very generic rotate-and-mask (or -insert) instructions, so the
aarch64 backend needs to help combine with the less trivial cases.

If you have a splitter for *this* one, all else will probably work
"automatically": you split it to two ubfm, and the second of those can
then merge into the compare instruction, and everything works out.

> Also, this issue is there for ports like x86.  If we go that way, then we 
> need to handle each port affected.  

Yes, you need to do target-specific stuff in every backend separately.

> So I am inclined to handle this in an arch-independent way.  

But you don't.  The transformation you do looks to be actively harmful
on many targets.  (I haven't tested it yet, that takes 8h currently).

> > What should be tested is what new combinations are done, and which are *no
> > longer* done.
> 
> In theory, we won't lose but emit more zero_extract with my patch.  

Which is a bad idea.  How many machine insns are there matters; the cost
of those matters (assuming that matches reality well, which it has to
anyway); latency matters.  How many insns are zero_extract instead of
something else is not a good indicator of performance.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-16 Thread Yangfei (Felix)
Hi,

> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Saturday, March 14, 2020 12:07 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> On Fri, Mar 13, 2020 at 03:21:18AM +, Yangfei (Felix) wrote:
> > > On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
> > > >   This is a simple fix for PR94026.
> > > >   With this fix, combine will try make an extraction if we are in
> > > > a equality
> > > comparison and this is an AND
> > > >   with a constant which is power of two minus one.  Shift here
> > > > should be an
> > > constant.  For example, combine
> > > >   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare
> > > > (zero_extract
> > > (x 2 9)) 0).
> > >
> > > Why is that a good thing?
> >
> > The reported test case is reduced from spec2017 541.leela_r.  I have pasted
> original code snippet on the bugzilla.
> > We found other compilers like aocc/llvm can catch this pattern and simplify 
> > it.
> 
> That wasn't my question, let me rephrase: why would writing it as zero_extract
> (instead of as a more canonical form) be wanted?

Sorry for not getting your point here. 

> The aarch backend only has zero_extract formulations for most of the bitfield
> instructions.  If you fix that problem, all of this should go away?  Like, the
> testcase in the PR starts with
> 
> Trying 7 -> 8:
> 7: r99:SI=r103:SI>>r104:SI#0
>   REG_DEAD r104:SI
>   REG_DEAD r103:SI
> 8: r100:SI=r99:SI&0x6
>   REG_DEAD r99:SI
> Failed to match this instruction:
> (set (reg:SI 100)
> (and:SI (ashiftrt:SI (reg:SI 103)
> (subreg:QI (reg:SI 104) 0))
> (const_int 6 [0x6])))
> 
> and that should match already (that's an ubfm (ubfx))?

For aarch64, if we use "ubfm/ubfx" for the reduced test case, then we still 
need to do a compare with zero.  Then we won't get the benefit.  
For aarch64, we need to emit a "tst" instruction here.  So we need to catch 
something like:  

149 (set (reg:CC_NZ 66 cc)
150 (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 102)
151 (const_int 8 [0x8]))
152 (const_int 6 [0x6]))
153 (const_int 0 [0])))

But this pattern is not accurate enough: we can only accept equality comparison 
with zero here (as indicated by the checking of equality_comparison in my 
original patch).  
Also, this issue is there for ports like x86.  If we go that way, then we need 
to handle each port affected.  
So I am inclined to handle this in an arch-independent way.  
I looked into tree phases like fwprop & fold-const before, but didn't see an 
appropriate point to catch this opportunity.  
Then I came to the combine phase.  

> 
> > > (There should be thorough tests on many archs, showing it helps on
> > > average, and it doesn't regress anything.  I can do that for you, but not
> right now).
> >
> > I only have aarch64 & x86_64 linux available and have tested this patch with
> spec17 on both platforms.
> > No obvious improvement & regression witnessed.  This is expected as only
> one instruction is reduced here.
> 
> What should be tested is what new combinations are done, and which are *no
> longer* done.

In theory, we won't lose but emit more zero_extract with my patch.  

> > > In general, we should have *fewer* zero_extract, not more.
> 
> Some reasons for that:
> 
> 1) All those can be expressed with simpler operations as well;
> 2) Most very similar expressions cannot be expressed as zero_extract,
> although many architectures can handle (some of) those just fine;
> 3) The optimizers do not handle zero_extract very well at all (this includes
> simplify-rtx, to start with).
> 
> sign_extract is nastier -- we really want to have a sign_extend that works on
> separate bits, not as coarse as address units as we have now -- but it 
> currently
> isn't handled much either.

Thanks for explaining this.  I have to admit that I didn't realize this issue 
when I was creating my original patch.  


Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-13 Thread Segher Boessenkool
On Fri, Mar 13, 2020 at 03:21:18AM +, Yangfei (Felix) wrote:
> > On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
> > >   This is a simple fix for PR94026.
> > >   With this fix, combine will try make an extraction if we are in a 
> > > equality
> > comparison and this is an AND
> > >   with a constant which is power of two minus one.  Shift here should be 
> > > an
> > constant.  For example, combine
> > >   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare 
> > > (zero_extract
> > (x 2 9)) 0).
> > 
> > Why is that a good thing?
> 
> The reported test case is reduced from spec2017 541.leela_r.  I have pasted 
> original code snippet on the bugzilla.  
> We found other compilers like aocc/llvm can catch this pattern and simplify 
> it.  

That wasn't my question, let me rephrase: why would writing it as
zero_extract (instead of as a more canonical form) be wanted?

The aarch backend only has zero_extract formulations for most of the
bitfield instructions.  If you fix that problem, all of this should go
away?  Like, the testcase in the PR starts with

Trying 7 -> 8:
7: r99:SI=r103:SI>>r104:SI#0
  REG_DEAD r104:SI
  REG_DEAD r103:SI
8: r100:SI=r99:SI&0x6
  REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:SI 100)
(and:SI (ashiftrt:SI (reg:SI 103)
(subreg:QI (reg:SI 104) 0))
(const_int 6 [0x6])))

and that should match already (that's an ubfm (ubfx))?


> > (There should be thorough tests on many archs, showing it helps on average,
> > and it doesn't regress anything.  I can do that for you, but not right now).
> 
> I only have aarch64 & x86_64 linux available and have tested this patch with 
> spec17 on both platforms.  
> No obvious improvement & regression witnessed.  This is expected as only one 
> instruction is reduced here.  

What should be tested is what new combinations are done, and which are
*no longer* done.

> > In general, we should have *fewer* zero_extract, not more.

Some reasons for that:

1) All those can be expressed with simpler operations as well;
2) Most very similar expressions cannot be expressed as zero_extract,
although many architectures can handle (some of) those just fine;
3) The optimizers do not handle zero_extract very well at all (this
includes simplify-rtx, to start with).

sign_extract is nastier -- we really want to have a sign_extend that
works on separate bits, not as coarse as address units as we have now --
but it currently isn't handled much either.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-12 Thread Yangfei (Felix)
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Friday, March 13, 2020 7:50 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi!
> 
> Please Cc: me on combine patches; you sent it nine days ago, and I didn't see 
> it
> until now.

OK.  

> On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
> >   This is a simple fix for PR94026.
> >   With this fix, combine will try make an extraction if we are in a equality
> comparison and this is an AND
> >   with a constant which is power of two minus one.  Shift here should be an
> constant.  For example, combine
> >   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare 
> > (zero_extract
> (x 2 9)) 0).
> 
> Why is that a good thing?

The reported test case is reduced from spec2017 541.leela_r.  I have pasted 
original code snippet on the bugzilla.  
We found other compilers like aocc/llvm can catch this pattern and simplify it. 
 

> (There should be thorough tests on many archs, showing it helps on average,
> and it doesn't regress anything.  I can do that for you, but not right now).

I only have aarch64 & x86_64 linux available and have tested this patch with 
spec17 on both platforms.  
No obvious improvement & regression witnessed.  This is expected as only one 
instruction is reduced here.  
It's appreciated if this can be tested on other archs.  

> The code needs more comments, and the commit message should say what is
> done and why you made those choices.
> In general, we should have *fewer* zero_extract, not more.

OK.  I can add more comments & commit message if finally we are inclined to go 
with this patch.  

Thanks,
Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-12 Thread Segher Boessenkool
Hi!

Please Cc: me on combine patches; you sent it nine days ago, and I didn't
see it until now.

On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
>   This is a simple fix for PR94026.  
>   With this fix, combine will try make an extraction if we are in a equality 
> comparison and this is an AND
>   with a constant which is power of two minus one.  Shift here should be an 
> constant.  For example, combine
>   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare (zero_extract 
> (x 2 9)) 0).  

Why is that a good thing?

(There should be thorough tests on many archs, showing it helps on
average, and it doesn't regress anything.  I can do that for you, but
not right now).

The code needs more comments, and the commit message should say what is
done and why you made those choices.

In general, we should have *fewer* zero_extract, not more.


Segher


RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-05 Thread Yangfei (Felix)
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Thursday, March 5, 2020 11:37 PM
> To: Yangfei (Felix) ; gcc-patches@gcc.gnu.org
> Cc: Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> On Wed, 2020-03-04 at 08:39 +, Yangfei (Felix) wrote:
> > Hi,
> >
> >   This is a simple fix for PR94026.
> >   With this fix, combine will try make an extraction if we are in a
> > equality comparison and this is an AND
> >   with a constant which is power of two minus one.  Shift here should
> > be an constant.  For example, combine
> >   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare
> > (zero_extract (x 2 9)) 0).
> >
> >   Added one test case for this.  Bootstrap and tested on both x86_64
> > and
> > aarch64 Linux platform.
> >   Any suggestion?
> >
> > Thanks,
> > Felix
> >
> > gcc:
> > +2020-03-04  Felix Yang  
> > +
> > +   PR rtl-optimization/94026
> > +   * combine.c (make_compound_operation_int): Make an extraction
> > + if we are in a equality comparison and this is an AND with a
> > + constant which is power of two minus one.
> > +
> >
> > gcc/testsuite:
> > +2020-03-04  Felix Yang  
> > +
> > +   PR rtl-optimization/94026
> > +   * gcc.dg/pr94026.c: New test.
> Just a note.  We're in stage4 of our development cycle, meaning we focus on
> regression bugfixes.  I've queued this for evaluation in gcc-11.
> jeff

Sure, this is intended for 11.  Thanks for doing that : - ) 

Best regards,
Felix


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-05 Thread Jeff Law
On Wed, 2020-03-04 at 08:39 +, Yangfei (Felix) wrote:
> Hi,
> 
>   This is a simple fix for PR94026.  
>   With this fix, combine will try make an extraction if we are in a equality
> comparison and this is an AND
>   with a constant which is power of two minus one.  Shift here should be an
> constant.  For example, combine
>   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare (zero_extract
> (x 2 9)) 0).  
> 
>   Added one test case for this.  Bootstrap and tested on both x86_64 and
> aarch64 Linux platform.  
>   Any suggestion?  
> 
> Thanks,
> Felix
> 
> gcc:
> +2020-03-04  Felix Yang  
> +
> +   PR rtl-optimization/94026
> +   * combine.c (make_compound_operation_int): Make an extraction
> + if we are in a equality comparison and this is an AND with a
> + constant which is power of two minus one.
> +
> 
> gcc/testsuite:
> +2020-03-04  Felix Yang  
> +
> +   PR rtl-optimization/94026
> +   * gcc.dg/pr94026.c: New test.
Just a note.  We're in stage4 of our development cycle, meaning we focus on
regression bugfixes.  I've queued this for evaluation in gcc-11.
jeff