Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-10-01 Thread Richard Sandiford via Gcc-patches
Alex Coplan  writes:
> Hi Christophe,
>
> On 08/09/2020 10:14, Christophe Lyon wrote:
>> On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
>> >
>> > gcc/ChangeLog:
>> >
>> > * config/aarch64/aarch64.md
>> > (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
>> > agrees with width of extension specifier.
>> > (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
>> > (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
>> > (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
>> > (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
>> > (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
>> > (*add_uxt<mode>_shift2): Likewise.
>> > (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
>> > (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
>> > (*sub_uxt<mode>_shift2): Likewise.
>> > (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
>> > (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
>> >
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
>> > * gcc.target/aarch64/cmp.c: Likewise.
>> > * gcc.target/aarch64/subs3.c: Likewise.
>> > * gcc.target/aarch64/subsp.c: Likewise.
>> > * gcc.target/aarch64/extend-syntax.c: New test.
>> >
>> 
>> Hi,
>> 
>> I've noticed some of the new tests fail with -mabi=ilp32:
>> gcc.target/aarch64/extend-syntax.c check-function-bodies add1
>> gcc.target/aarch64/extend-syntax.c check-function-bodies add3
>> gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
>> gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
>> gcc.target/aarch64/extend-syntax.c scan-assembler-times
>> subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
>> gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n
>> 
>> Christophe
>
> AFAICT the second scan-assembler in that subsp test failed on ILP32
> before my commit. This is because we generate slightly suboptimal code
> here. On LP64 with -O, we get:
>
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> sub sp, sp, x1, sxtw 4
> mov x0, sp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
>
> On ILP32, we get:
>
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> lsl w1, w1, 4
> sub sp, sp, x1
> mov w0, wsp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
>
> And we see similar results all the way back to GCC 6. So AFAICT this
> scan-assembler has never worked. The attached patch disables it on ILP32
> since this isn't a code quality regression.
>
> This patch also fixes up the DejaGnu directives in extend-syntax.c to
> work on ILP32: we change the check-function-bodies directive to only run
> on LP64, adding scan-assembler directives for ILP32 where required.
>
> OK for trunk?

OK, thanks.  Sorry for the slow review.

Richard


Re: [PING][PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-30 Thread Christophe Lyon via Gcc-patches
On Wed, 30 Sep 2020 at 16:03, Alex Coplan  wrote:
>
> Ping. Are these testsuite fixes for ILP32 OK?
>
LGTM, judging by the patch (I didn't run it in ILP32 mode).

Thanks
Christophe


> On 18/09/2020 17:15, Alex Coplan wrote:
> > Hi Christophe,
> >
> > On 08/09/2020 10:14, Christophe Lyon wrote:
> > > On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/aarch64/aarch64.md
> > > > (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
> > > > agrees with width of extension specifier.
> > > > (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > > (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > > > (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > > > (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > > (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > > > (*add_uxt<mode>_shift2): Likewise.
> > > > (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > > (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > > > (*sub_uxt<mode>_shift2): Likewise.
> > > > (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
> > > > (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > > >
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > > > * gcc.target/aarch64/cmp.c: Likewise.
> > > > * gcc.target/aarch64/subs3.c: Likewise.
> > > > * gcc.target/aarch64/subsp.c: Likewise.
> > > > * gcc.target/aarch64/extend-syntax.c: New test.
> > > >
> > >
> > > Hi,
> > >
> > > I've noticed some of the new tests fail with -mabi=ilp32:
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> > > gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> > > gcc.target/aarch64/extend-syntax.c scan-assembler-times
> > > subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> > > gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 
> > > 4\n
> > >
> > > Christophe
> >
> > AFAICT the second scan-assembler in that subsp test failed on ILP32
> > before my commit. This is because we generate slightly suboptimal code
> > here. On LP64 with -O, we get:
> >
> > f2:
> > stp x29, x30, [sp, -16]!
> > mov x29, sp
> > add w1, w1, 1
> > sub sp, sp, x1, sxtw 4
> > mov x0, sp
> > bl  foo
> > mov sp, x29
> > ldp x29, x30, [sp], 16
> > ret
> >
> > On ILP32, we get:
> >
> > f2:
> > stp x29, x30, [sp, -16]!
> > mov x29, sp
> > add w1, w1, 1
> > lsl w1, w1, 4
> > sub sp, sp, x1
> > mov w0, wsp
> > bl  foo
> > mov sp, x29
> > ldp x29, x30, [sp], 16
> > ret
> >
> > And we see similar results all the way back to GCC 6. So AFAICT this
> > scan-assembler has never worked. The attached patch disables it on ILP32
> > since this isn't a code quality regression.
> >
> > This patch also fixes up the DejaGnu directives in extend-syntax.c to
> > work on ILP32: we change the check-function-bodies directive to only run
> > on LP64, adding scan-assembler directives for ILP32 where required.
> >
> > OK for trunk?
> >
> > Thanks,
> > Alex
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c 
> > b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > index 23fa9f4ffc5..1bfcdb59dde 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> > @@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned 
> > x)
> >  */
> >  unsigned long long add2(unsigned long long x, unsigned y)
> >  {
> > +  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
> > ilp32 } } } */
> >return x + y;
> >  }
> >
> > @@ -34,6 +35,9 @@ double *add3(double *p, int x)
> >return p + x;
> >  }
> >
> > +// add1 and add3 should both generate this on ILP32:
> > +/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target 
> > ilp32 } } } */
> > +
> >  // Hits *sub_zero_extendsi_di (*sub_<optab><ALLX:mode>_<GPI:mode>).
> >  /*
> >  ** sub1:
> > @@ -42,6 +46,7 @@ double *add3(double *p, int x)
> >  */
> >  unsigned long long sub1(unsigned long long x, unsigned n)
> >  {
> > +/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { 
> > target ilp32 } } } */
> >  return x - n;
> >  }
> >
> > @@ -67,6 +72,9 @@ double *sub3(double *p, int n)
> >return p - n;
> >  }
> >
> > +// sub2 and sub3 should both generate this on ILP32:
> > +/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target 
> > ilp32 } } } */
> > +
> >  // Hits *adds_zero_extendsi_di (*adds_<optab><ALLX:mode>_<GPI:mode>).
> >  int adds1(unsigned long long x, unsigned y)
> >  {
> > @@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
> >  unsigned long long *w;
> >  int subs2(unsigned long long *x, int y)
> >  {
> > -  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> > w\[0-9\]+, 

[PING][PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-30 Thread Alex Coplan via Gcc-patches
Ping. Are these testsuite fixes for ILP32 OK?

On 18/09/2020 17:15, Alex Coplan wrote:
> Hi Christophe,
> 
> On 08/09/2020 10:14, Christophe Lyon wrote:
> > On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64.md
> > > (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
> > > agrees with width of extension specifier.
> > > (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > > (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > > (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > > (*add_uxt<mode>_shift2): Likewise.
> > > (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > > (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > > (*sub_uxt<mode>_shift2): Likewise.
> > > (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
> > > (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > > * gcc.target/aarch64/cmp.c: Likewise.
> > > * gcc.target/aarch64/subs3.c: Likewise.
> > > * gcc.target/aarch64/subsp.c: Likewise.
> > > * gcc.target/aarch64/extend-syntax.c: New test.
> > >
> > 
> > Hi,
> > 
> > I've noticed some of the new tests fail with -mabi=ilp32:
> > gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> > gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> > gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> > gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> > gcc.target/aarch64/extend-syntax.c scan-assembler-times
> > subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> > gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n
> > 
> > Christophe
> 
> AFAICT the second scan-assembler in that subsp test failed on ILP32
> before my commit. This is because we generate slightly suboptimal code
> here. On LP64 with -O, we get:
> 
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> sub sp, sp, x1, sxtw 4
> mov x0, sp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
> 
> On ILP32, we get:
> 
> f2:
> stp x29, x30, [sp, -16]!
> mov x29, sp
> add w1, w1, 1
> lsl w1, w1, 4
> sub sp, sp, x1
> mov w0, wsp
> bl  foo
> mov sp, x29
> ldp x29, x30, [sp], 16
> ret
> 
> And we see similar results all the way back to GCC 6. So AFAICT this
> scan-assembler has never worked. The attached patch disables it on ILP32
> since this isn't a code quality regression.
> 
> This patch also fixes up the DejaGnu directives in extend-syntax.c to
> work on ILP32: we change the check-function-bodies directive to only run
> on LP64, adding scan-assembler directives for ILP32 where required.
> 
> OK for trunk?
> 
> Thanks,
> Alex

> diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c 
> b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> index 23fa9f4ffc5..1bfcdb59dde 100644
> --- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> +++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
> @@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned x)
>  */
>  unsigned long long add2(unsigned long long x, unsigned y)
>  {
> +  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
> ilp32 } } } */
>return x + y;
>  }
>  
> @@ -34,6 +35,9 @@ double *add3(double *p, int x)
>return p + x;
>  }
>  
> +// add1 and add3 should both generate this on ILP32:
> +/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target 
> ilp32 } } } */
> +
>  // Hits *sub_zero_extendsi_di (*sub_<optab><ALLX:mode>_<GPI:mode>).
>  /*
>  ** sub1:
> @@ -42,6 +46,7 @@ double *add3(double *p, int x)
>  */
>  unsigned long long sub1(unsigned long long x, unsigned n)
>  {
> +/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { target 
> ilp32 } } } */
>  return x - n;
>  }
>  
> @@ -67,6 +72,9 @@ double *sub3(double *p, int n)
>return p - n;
>  }
>  
> +// sub2 and sub3 should both generate this on ILP32:
> +/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target 
> ilp32 } } } */
> +
>  // Hits *adds_zero_extendsi_di (*adds_<optab><ALLX:mode>_<GPI:mode>).
>  int adds1(unsigned long long x, unsigned y)
>  {
> @@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
>  unsigned long long *w;
>  int subs2(unsigned long long *x, int y)
>  {
> -  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> w\[0-9\]+, sxtw 3" 1 } } */
> +  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, 
> w\[0-9\]+, sxtw 3" 1 { target lp64 } } } */
> +  /* { dg-final { scan-assembler-times "subs\tw\[0-9\]+, w\[0-9\]+, 
> w\[0-9\]+, lsl 3" 1 { target ilp32 } } } */
>unsigned long long *t = x - y;
>w = t;
>return !!t;
> @@ -117,4 +126,4 @@ int cmp2(unsigned long long x, int y)
>return x == ((unsigned 

Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-18 Thread Alex Coplan
Hi Christophe,

On 08/09/2020 10:14, Christophe Lyon wrote:
> On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md
> > (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
> > agrees with width of extension specifier.
> > (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > (*add_uxt<mode>_shift2): Likewise.
> > (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > (*sub_uxt<mode>_shift2): Likewise.
> > (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
> > (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > * gcc.target/aarch64/cmp.c: Likewise.
> > * gcc.target/aarch64/subs3.c: Likewise.
> > * gcc.target/aarch64/subsp.c: Likewise.
> > * gcc.target/aarch64/extend-syntax.c: New test.
> >
> 
> Hi,
> 
> I've noticed some of the new tests fail with -mabi=ilp32:
> gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> gcc.target/aarch64/extend-syntax.c scan-assembler-times
> subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n
> 
> Christophe

AFAICT the second scan-assembler in that subsp test failed on ILP32
before my commit. This is because we generate slightly suboptimal code
here. On LP64 with -O, we get:

f2:
stp x29, x30, [sp, -16]!
mov x29, sp
add w1, w1, 1
sub sp, sp, x1, sxtw 4
mov x0, sp
bl  foo
mov sp, x29
ldp x29, x30, [sp], 16
ret

On ILP32, we get:

f2:
stp x29, x30, [sp, -16]!
mov x29, sp
add w1, w1, 1
lsl w1, w1, 4
sub sp, sp, x1
mov w0, wsp
bl  foo
mov sp, x29
ldp x29, x30, [sp], 16
ret

And we see similar results all the way back to GCC 6. So AFAICT this
scan-assembler has never worked. The attached patch disables it on ILP32
since this isn't a code quality regression.

This patch also fixes up the DejaGnu directives in extend-syntax.c to
work on ILP32: we change the check-function-bodies directive to only run
on LP64, adding scan-assembler directives for ILP32 where required.

OK for trunk?

Thanks,
Alex
diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
index 23fa9f4ffc5..1bfcdb59dde 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
@@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned x)
 */
 unsigned long long add2(unsigned long long x, unsigned y)
 {
+  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
ilp32 } } } */
   return x + y;
 }
 
@@ -34,6 +35,9 @@ double *add3(double *p, int x)
   return p + x;
 }
 
+// add1 and add3 should both generate this on ILP32:
+/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target ilp32 
} } } */
+
 // Hits *sub_zero_extendsi_di (*sub_<optab><ALLX:mode>_<GPI:mode>).
 /*
 ** sub1:
@@ -42,6 +46,7 @@ double *add3(double *p, int x)
 */
 unsigned long long sub1(unsigned long long x, unsigned n)
 {
+/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { target 
ilp32 } } } */
 return x - n;
 }
 
@@ -67,6 +72,9 @@ double *sub3(double *p, int n)
   return p - n;
 }
 
+// sub2 and sub3 should both generate this on ILP32:
+/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target ilp32 
} } } */
+
 // Hits *adds_zero_extendsi_di (*adds_<optab><ALLX:mode>_<GPI:mode>).
 int adds1(unsigned long long x, unsigned y)
 {
@@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
 unsigned long long *w;
 int subs2(unsigned long long *x, int y)
 {
-  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, w\[0-9\]+, sxtw 3" 1 } } */
+  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, w\[0-9\]+, sxtw 3" 1 { target lp64 } } } */
+  /* { dg-final { scan-assembler-times "subs\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" 1 { target ilp32 } } } */
   unsigned long long *t = x - y;
   w = t;
   return !!t;
@@ -117,4 +126,4 @@ int cmp2(unsigned long long x, int y)
   return x == ((unsigned long long)y << 3);
 }
 
-/* { dg-final { check-function-bodies "**" "" "" } } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/subsp.c b/gcc/testsuite/gcc.target/aarch64/subsp.c
index 341b83dca86..e7f61e0799b 100644
--- a/gcc/testsuite/gcc.target/aarch64/subsp.c
+++ b/gcc/testsuite/gcc.target/aarch64/subsp.c

RE: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-08 Thread Alex Coplan
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: 08 September 2020 09:15
> To: Alex Coplan 
> Cc: gcc Patches ; Richard Earnshaw
> ; Marcus Shawcroft 
> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
> syntax
> 
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md
> > (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
> > agrees with width of extension specifier.
> > (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> > (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > (*add_uxt<mode>_shift2): Likewise.
> > (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> > (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> > (*sub_uxt<mode>_shift2): Likewise.
> > (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
> > (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > * gcc.target/aarch64/cmp.c: Likewise.
> > * gcc.target/aarch64/subs3.c: Likewise.
> > * gcc.target/aarch64/subsp.c: Likewise.
> > * gcc.target/aarch64/extend-syntax.c: New test.
> >
> 
> Hi,
> 
> I've noticed some of the new tests fail with -mabi=ilp32:
> gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> gcc.target/aarch64/extend-syntax.c scan-assembler-times
> subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw
> 4\n

Thanks for catching these.

The failures in extend-syntax.c just need the assertions tweaking, I have a
patch to fix those.

The failure in subsp.c is more interesting: looks like a missed optimisation on
ILP32, I'm taking a look.

> 
> Christophe

Thanks,
Alex


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-08 Thread Christophe Lyon via Gcc-patches
On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
>
> Hello,
>
> Given the following C function:
>
> double *f(double *p, unsigned x)
> {
> return p + x;
> }
>
> prior to this patch, GCC at -O2 would generate:
>
> f:
> add x0, x0, x1, uxtw 3
> ret
>
> but this add instruction uses architecturally-invalid syntax: the width
> of the third operand conflicts with the width of the extension
> specifier. The third operand is only permitted to be an x register when
> the extension specifier is (u|s)xtx.
>
> This instruction, and analogous insns for adds, sub, subs, and cmp, are
> rejected by clang, but accepted by binutils. Assembling and
> disassembling such an insn with binutils gives the architecturally-valid
> version in the disassembly:
>
>    0:   8b214c00    add x0, x0, w1, uxtw #3
>
> This patch fixes several patterns in the AArch64 backend to use the
> standard syntax as specified in the Arm ARM such that GCC's output can
> be assembled by assemblers other than GAS.
>
> Note that an obvious omission here is that this patch does not touch the
> mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
> that I couldn't hit these patterns with C code since multiplications by
> powers of two always get turned into shifts by earlier RTL passes. If
> there's a way to reliably hit these patterns, then perhaps these should
> be updated as well.
>
> Testing:
>  * New test which checks for the correct syntax in all updated
>patterns (fails before and passes after the aarch64.md change).
>  * New test can be assembled by both GAS and llvm-mc following the
>change.
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
>
> OK for master?
>
> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md
> (*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
> agrees with width of extension specifier.
> (*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> (*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> (*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
> (*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> (*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> (*add_uxt<mode>_shift2): Likewise.
> (*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
> (*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
> (*sub_uxt<mode>_shift2): Likewise.
> (*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
> (*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
>
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> * gcc.target/aarch64/cmp.c: Likewise.
> * gcc.target/aarch64/subs3.c: Likewise.
> * gcc.target/aarch64/subsp.c: Likewise.
> * gcc.target/aarch64/extend-syntax.c: New test.
>

Hi,

I've noticed some of the new tests fail with -mabi=ilp32:
gcc.target/aarch64/extend-syntax.c check-function-bodies add1
gcc.target/aarch64/extend-syntax.c check-function-bodies add3
gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
gcc.target/aarch64/extend-syntax.c scan-assembler-times
subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n

Christophe


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-21 Thread Richard Sandiford
Alex Coplan  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: 18 August 2020 09:35
>> To: Alex Coplan 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
>> Marcus Shawcroft ; Kyrylo Tkachov
>> 
>> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
>> syntax
>> 
>> Alex Coplan  writes:
>> > Note that an obvious omission here is that this patch does not touch
>> the
>> > mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
>> > that I couldn't hit these patterns with C code since multiplications by
>> > powers of two always get turned into shifts by earlier RTL passes. If
>> > there's a way to reliably hit these patterns, then perhaps these should
>> > be updated as well.
>> 
>> Hmm.  Feels like we should either update them or delete them.  E.g.:
>> 
>>   *adds_<optab><mode>_multp2
>>   *subs_<optab><mode>_multp2
>> 
>> were added alongside the adds3.c and subs3.c tests that you're updating,
>> so if the tests don't/no longer need the multp2 patterns to pass,
>> there's a good chance that the patterns are redundant.
>> 
>> For reasons I never understood, the canonical representation is to use
>> (mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
>> So perhaps the patterns were originally for address calculations that had
>> been moved outside of a (mem …) and not updated to shifts instead of
>> mults.
>
> Thanks for the review, and for clarifying this. I tried removing these
> together with the *_mul_imm_<mode> patterns (e.g. *adds_mul_imm_<mode>) and
> the only failure was gcc/testsuite/gcc.dg/torture/pr34330.c when
> compiled with -Os -ftree-vectorize which appears to depend on the
> *add_mul_imm_di pattern. Without this pattern, we ICE in LRA on this
> input.
>
> In this case, GCC appears to have done exactly what you described: we
> have the (plus (mult ...) ...) nodes inside (mem)s prior to register
> allocation, and then we end up pulling these out without converting them
> to shifts.
>
> Seeing this behaviour (and in particular seeing the ICE) makes me
> hesitant to just go ahead and remove the other patterns. That said, I
> have a patch to remove the following patterns:
>
>  *adds__multp2
>  *subs__multp2
>  *add__mult_
>  *add__mult_si_uxtw
>  *add__multp2
>  *add_si_multp2_uxtw
>  *add_uxt_multp2
>  *add_uxtsi_multp2_uxtw
>  *sub__multp2
>  *sub_si_multp2_uxtw
>  *sub_uxt_multp2
>  *sub_uxtsi_multp2_uxtw
>  *sub_uxtsi_multp2_uxtw
>
> (together with the predicate aarch64_pwr_imm3 which is only used in
> these patterns) and this bootstraps/regtests just fine.
>
> So, I have a couple of questions:
>
> (1) Should it be considered a bug if we pull (plus (mult (power of 2)
>...) ...) out of a (mem) RTX without re-writing the (mult) as a
>shift?

IMO, yes.  But if we have an example in which it happens, we have
to fix it before removing the patterns.  That could end up being
a bit of a rabbit hole, and could affect other targets too.

If we keep the patterns, we should fix the [su]xtw problem in:

  *adds_<optab><mode>_multp2
  *subs_<optab><mode>_multp2
  *add_<optab><ALLX:mode>_mult_<GPI:mode>
  *add_uxt<mode>_multp2
  *sub_uxt<mode>_multp2

too.  (Plus any others I missed, if that isn't the full list.)

Thanks,
Richard


RE: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-19 Thread Alex Coplan
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 18 August 2020 09:35
> To: Alex Coplan 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> 
> Subject: Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend
> syntax
> 
> Alex Coplan  writes:
> > Note that an obvious omission here is that this patch does not touch
> the
> > mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
> > that I couldn't hit these patterns with C code since multiplications by
> > powers of two always get turned into shifts by earlier RTL passes. If
> > there's a way to reliably hit these patterns, then perhaps these should
> > be updated as well.
> 
> Hmm.  Feels like we should either update them or delete them.  E.g.:
> 
>   *adds_<optab><mode>_multp2
>   *subs_<optab><mode>_multp2
> 
> were added alongside the adds3.c and subs3.c tests that you're updating,
> so if the tests don't/no longer need the multp2 patterns to pass,
> there's a good chance that the patterns are redundant.
> 
> For reasons I never understood, the canonical representation is to use
> (mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
> So perhaps the patterns were originally for address calculations that had
> been moved outside of a (mem …) and not updated to shifts instead of
> mults.

Thanks for the review, and for clarifying this. I tried removing these
together with the *_mul_imm_<mode> patterns (e.g. *adds_mul_imm_<mode>) and
the only failure was gcc/testsuite/gcc.dg/torture/pr34330.c when
compiled with -Os -ftree-vectorize which appears to depend on the
*add_mul_imm_di pattern. Without this pattern, we ICE in LRA on this
input.

In this case, GCC appears to have done exactly what you described: we
have the (plus (mult ...) ...) nodes inside (mem)s prior to register
allocation, and then we end up pulling these out without converting them
to shifts.

Seeing this behaviour (and in particular seeing the ICE) makes me
hesitant to just go ahead and remove the other patterns. That said, I
have a patch to remove the following patterns:

 *adds__multp2
 *subs__multp2
 *add__mult_
 *add__mult_si_uxtw
 *add__multp2
 *add_si_multp2_uxtw
 *add_uxt_multp2
 *add_uxtsi_multp2_uxtw
 *sub__multp2
 *sub_si_multp2_uxtw
 *sub_uxt_multp2
 *sub_uxtsi_multp2_uxtw
 *sub_uxtsi_multp2_uxtw

(together with the predicate aarch64_pwr_imm3 which is only used in
these patterns) and this bootstraps/regtests just fine.

So, I have a couple of questions:

(1) Should it be considered a bug if we pull (plus (mult (power of 2)
   ...) ...) out of a (mem) RTX without re-writing the (mult) as a
   shift?
(2) If not, can we otherwise justify the removal of the patterns here?

I'm happy to go ahead with this if either (1) or (2) are true.

Thanks,
Alex


Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-18 Thread Iain Sandoe via Gcc-patches

Richard Sandiford  wrote:


Alex Coplan  writes:

Note that an obvious omission here is that this patch does not touch the
mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
that I couldn't hit these patterns with C code since multiplications by
powers of two always get turned into shifts by earlier RTL passes. If
there's a way to reliably hit these patterns, then perhaps these should
be updated as well.


Hmm.  Feels like we should either update them or delete them.  E.g.:

 *adds_<optab><mode>_multp2
 *subs_<optab><mode>_multp2


FWIW add_extvdi_multp2 seems to fire for me, building libstdc++ (c++11
cow-wstring-inst) on my [very experimental] initial attempts at a Darwin  
port.


(I see these failures too, because the platform assembler is based off the
LLVM backend, which complains.)

Iain


were added alongside the adds3.c and subs3.c tests that you're updating,
so if the tests don't/no longer need the multp2 patterns to pass,
there's a good chance that the patterns are redundant.

For reasons I never understood, the canonical representation is to use
(mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
So perhaps the patterns were originally for address calculations that had
been moved outside of a (mem …) and not updated to shifts instead of mults.

AFAICT the full list of affected patterns is:

 *adds_<optab><mode>_multp2
 *subs_<optab><mode>_multp2
 *add_<optab><ALLX:mode>_mult_<GPI:mode>
 *add_uxt<mode>_multp2
 *sub_uxt<mode>_multp2

Is that right?  If so, I think we should consider a follow-on patch
to delete them.


Testing:
* New test which checks for the correct syntax in all updated
  patterns (fails before and passes after the aarch64.md change).
* New test can be assembled by both GAS and llvm-mc following the
  change.
* Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for master?


OK as-is if paired with a follow-on patch to delete the patterns above
(preapproved if it passes testing).  Also OK without a follow-on patch
if the fix is extended to the patterns above too (but the first option
is better :-)).

Thanks for taking the time to find a test for each pattern.

Richard





Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-18 Thread Richard Sandiford
Alex Coplan  writes:
> Note that an obvious omission here is that this patch does not touch the
> mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
> that I couldn't hit these patterns with C code since multiplications by
> powers of two always get turned into shifts by earlier RTL passes. If
> there's a way to reliably hit these patterns, then perhaps these should
> be updated as well.

Hmm.  Feels like we should either update them or delete them.  E.g.:

  *adds_<optab><mode>_multp2
  *subs_<optab><mode>_multp2

were added alongside the adds3.c and subs3.c tests that you're updating,
so if the tests don't/no longer need the multp2 patterns to pass,
there's a good chance that the patterns are redundant.

For reasons I never understood, the canonical representation is to use
(mult …) for powers of 2 inside a (mem …) but shifts outside of (mem …)s.
So perhaps the patterns were originally for address calculations that had
been moved outside of a (mem …) and not updated to shifts instead of mults.
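
A small sketch of the two canonical forms (an editorial illustration; the
function names are made up, and the RTL in the comments simply instantiates
the canonicalization rule described above for a power-of-2 scale of 8):

    double load_scaled (double *p, long i)
    {
      /* The address stays inside the (mem ...), so the scaled index is
         canonically (plus (mult (reg i) (const_int 8)) (reg p)).  */
      return p[i];
    }

    double *addr_scaled (double *p, long i)
    {
      /* The same computation outside a (mem ...) is canonically
         (plus (ashift (reg i) (const_int 3)) (reg p)).  */
      return p + i;
    }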

AFAICT the full list of affected patterns is:

  *adds_<optab><mode>_multp2
  *subs_<optab><mode>_multp2
  *add_<optab><ALLX:mode>_mult_<GPI:mode>
  *add_uxt<mode>_multp2
  *sub_uxt<mode>_multp2

Is that right?  If so, I think we should consider a follow-on patch
to delete them.

> Testing:
>  * New test which checks for the correct syntax in all updated
>patterns (fails before and passes after the aarch64.md change).
>  * New test can be assembled by both GAS and llvm-mc following the
>change.
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
>
> OK for master?

OK as-is if paired with a follow-on patch to delete the patterns above
(preapproved if it passes testing).  Also OK without a follow-on patch
if the fix is extended to the patterns above too (but the first option
is better :-)).

Thanks for taking the time to find a test for each pattern.

Richard


[PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-08-17 Thread Alex Coplan
Hello,

Given the following C function:

double *f(double *p, unsigned x)
{
return p + x;
}

prior to this patch, GCC at -O2 would generate:

f:
add x0, x0, x1, uxtw 3
ret

but this add instruction uses architecturally-invalid syntax: the width
of the third operand conflicts with the width of the extension
specifier. The third operand is only permitted to be an x register when
the extension specifier is (u|s)xtx.
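
As a hedged illustration (along the same lines as the new test, not taken
verbatim from it), the sign-extend case trips over the same rule:

    double *g (double *p, int x)
    {
      /* The index is sign-extended from 32 to 64 bits and scaled by 8.
         Before the patch GCC printed "add x0, x0, x1, sxtw 3"; the
         architecturally-valid spelling uses a w register for the extended
         operand: "add x0, x0, w1, sxtw 3".  */
      return p + x;
    }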

This instruction, and analogous insns for adds, sub, subs, and cmp, are
rejected by clang, but accepted by binutils. Assembling and
disassembling such an insn with binutils gives the architecturally-valid
version in the disassembly:

   0:   8b214c00    add x0, x0, w1, uxtw #3
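
The compare case can be seen with something like the following sketch
(only an illustration with a made-up function name, not part of the patch):

    int cmp_example (unsigned long long x, unsigned y)
    {
      /* y is zero-extended for the comparison, so the valid syntax is
         "cmp x0, w1, uxtw" rather than "cmp x0, x1, uxtw".  */
      return x == y;
    }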

This patch fixes several patterns in the AArch64 backend to use the
standard syntax as specified in the Arm ARM such that GCC's output can
be assembled by assemblers other than GAS.

Note that an obvious omission here is that this patch does not touch the
mult patterns such as *add_<optab><ALLX:mode>_mult_<GPI:mode>. I found
that I couldn't hit these patterns with C code since multiplications by
powers of two always get turned into shifts by earlier RTL passes. If
there's a way to reliably hit these patterns, then perhaps these should
be updated as well.

Testing:
 * New test which checks for the correct syntax in all updated
   patterns (fails before and passes after the aarch64.md change).
 * New test can be assembled by both GAS and llvm-mc following the
   change.
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64.md
(*adds_<optab><ALLX:mode>_<GPI:mode>): Ensure extended operand
agrees with width of extension specifier.
(*subs_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*adds_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*subs_<optab><ALLX:mode>_shift_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*add_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*add_uxt<mode>_shift2): Likewise.
(*sub_<optab><ALLX:mode>_<GPI:mode>): Likewise.
(*sub_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.
(*sub_uxt<mode>_shift2): Likewise.
(*cmp_swp_<optab><ALLX:mode>_reg<GPI:mode>): Likewise.
(*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>): Likewise.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
* gcc.target/aarch64/cmp.c: Likewise.
* gcc.target/aarch64/subs3.c: Likewise.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.target/aarch64/extend-syntax.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9b20dd0b1a0..b1e83dfda78 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2383,7 +2383,7 @@
(set (match_operand:GPI 0 "register_operand" "=r")
(plus:GPI (ANY_EXTEND:GPI (match_dup 1)) (match_dup 2)))]
   ""
-  "adds\\t%0, %2, %1, xt"
+  "adds\\t%0, %2, %w1, xt"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2397,7 +2397,7 @@
(set (match_operand:GPI 0 "register_operand" "=r")
(minus:GPI (match_dup 1) (ANY_EXTEND:GPI (match_dup 2))))]
   ""
-  "subs\\t%0, %1, %2, xt"
+  "subs\\t%0, %1, %w2, xt"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2415,7 +2415,7 @@
  (match_dup 2))
  (match_dup 3)))]
   ""
-  "adds\\t%0, %3, %1, xt %2"
+  "adds\\t%0, %3, %w1, xt %2"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2433,7 +2433,7 @@
   (ashift:GPI (ANY_EXTEND:GPI (match_dup 2))
   (match_dup 3))))]
   ""
-  "subs\\t%0, %1, %2, xt %3"
+  "subs\\t%0, %1, %w2, xt %3"
   [(set_attr "type" "alus_ext")]
 )
 
@@ -2549,7 +2549,7 @@
(plus:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 "register_operand" "r"))
  (match_operand:GPI 2 "register_operand" "r")))]
   ""
-  "add\\t%0, %2, %1, xt"
+  "add\\t%0, %2, %w1, xt"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -2571,7 +2571,7 @@
  (match_operand 2 "aarch64_imm3" "Ui3"))
  (match_operand:GPI 3 "register_operand" "r")))]
   ""
-  "add\\t%0, %3, %1, xt %2"
+  "add\\t%0, %3, %w1, xt %2"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -2819,7 +2819,7 @@
   "*
   operands[3] = GEN_INT (aarch64_uxt_size (INTVAL(operands[2]),
   INTVAL (operands[3])));
-  return \"add\t%0, %4, %1, uxt%e3 %2\";"
+  return \"add\t%0, %4, %w1, uxt%e3 %2\";"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3305,7 +3305,7 @@
   (ANY_EXTEND:GPI
(match_operand:ALLX 2 "register_operand" "r"]
   ""
-  "sub\\t%0, %1, %2, xt"
+  "sub\\t%0, %1, %w2, xt"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3328,7 +3328,7 @@
(match_operand:ALLX 2 "register_operand" "r"))
   (match_operand 3 "aarch64_imm3" "Ui3"]
   ""
-  "sub\\t%0, %1, %2, xt %3"
+  "sub\\t%0, %1, %w2, xt %3"
   [(set_attr "type" "alu_ext")]
 )
 
@@ -3607,7 +3607,7 @@
   "*
   operands[3] = GEN_INT (aarch64_uxt_size (INTVAL (operands[2]),
   INTVAL (operands[3])));
-  return \"sub\t%0, %4,