Re: [PATCH][ARM] Fix broken shift patterns

2011-10-07 Thread Andrew Stubbs

On 06/10/11 18:17, Paul Brook wrote:

I believe this patch to be nothing but an improvement over the current
state, and that a fix to the constraint problem should be a separate patch.

On that basis, am I OK to commit?


One minor nit:


(define_special_predicate "shift_operator"
...
+   (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
+		     && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")
+	(match_test "REG_P (XEXP (op, 1))"))


We're already enforcing the REG_P elsewhere, and it's only valid in some
contexts, so I'd change this to:
 (match_test "GET_CODE (XEXP (op, 1)) != CONST_INT
	      || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")


Done, and attached.


3. Consistently accept both power-of-two and 0..31 for shifts.  Large shift
counts give undefined results[1], so replace them with an arbitrary value
(e.g. 0) during assembly output.  Arguably not an entirely proper fix, but I
think it'll keep everything happy.


I think we need to be careful not to change the behaviour between 
different optimization levels and/or perturbations caused by minor code 
changes. I know this isn't a hard requirement for undefined behaviour, 
but I think it's still good practice.


In this case, I believe the hardware simply shifts the value clean 
out of the register, and always returns a zero (or maybe -1 for 
ashiftrt?). I'm not sure what it does for rotate.


Anyway, my point is that I don't think that we could insert an immediate 
that had the same effect in all cases.



For bonus points we should probably disallow MULT in the arm_shiftsi3 pattern,
to stop it interacting with the regular mulsi3 pattern in undesirable ways.


How might that be a problem? Is it not the case that canonical forms 
already deal with this?


Anyway, it's easily achieved with an extra predicate.

Andrew
2011-10-07  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/predicates.md (shift_amount_operand): Remove constant
	range check.
	(shift_operator): Check range of constants for all shift operators.

	gcc/testsuite/
	* gcc.dg/pr50193-1.c: New file.
	* gcc.target/arm/shiftable.c: New file.

--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -129,11 +129,12 @@
   (ior (match_operand 0 "arm_rhs_operand")
       (match_operand 0 "memory_operand")))
 
+;; This doesn't have to do much because the constant is already checked
+;; in the shift_operator predicate.
 (define_predicate "shift_amount_operand"
   (ior (and (match_test "TARGET_ARM")
	     (match_operand 0 "s_register_operand"))
-       (and (match_code "const_int")
-	    (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) < 32"))))
+       (match_operand 0 "const_int_operand")))
 
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
@@ -219,13 +220,20 @@
	(match_test "mode == GET_MODE (op)")))
 
 ;; True for shift operators.
+;; Notes:
+;;  * mult is only permitted with a constant shift amount
+;;  * patterns that permit register shift amounts only in ARM mode use
+;;    shift_amount_operand, patterns that always allow registers do not,
+;;    so we don't have to worry about that sort of thing here.
 (define_special_predicate "shift_operator"
   (and (ior (ior (and (match_code "mult")
		       (match_test "power_of_two_operand (XEXP (op, 1), mode)"))
		  (and (match_code "rotate")
		       (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
				    && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
-	    (match_code "ashift,ashiftrt,lshiftrt,rotatert"))
+	    (and (match_code "ashift,ashiftrt,lshiftrt,rotatert")
+		 (match_test "GET_CODE (XEXP (op, 1)) != CONST_INT
			      || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
	(match_test "mode == GET_MODE (op)")))
 
 ;; True for shift operators which can be used with saturation instructions.
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr50193-1.c
@@ -0,0 +1,10 @@
+/* PR 50193: ARM: ICE on a | (b << negative-constant) */
+/* Ensure that the compiler doesn't ICE.  */
+
+/* { dg-options "-O2" } */
+
+int
+foo(int a, int b)
+{
+  return a | (b << -3); /* { dg-warning "left shift count is negative" } */
+}
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shiftable.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm32 } */
+
+int
+plus (int a, int b)
+{
+  return (a * 64) + b;
+}
+
+/* { dg-final { scan-assembler "add.*\[al]sl #6" } } */
+
+int
+minus (int a, int b)
+{
+  return a - (b * 64);
+}
+
+/* { dg-final { scan-assembler "sub.*\[al]sl #6" } } */
+
+int
+ior (int a, int b)
+{
+  return (a * 64) | b;
+}
+
+/* { dg-final { scan-assembler "orr.*\[al]sl #6" } } */
+
+int
+xor (int a, int b)
+{
+  return (a * 64) ^ b;
+}
+
+/* { dg-final { scan-assembler "eor.*\[al]sl #6" } } */
+
+int
+and (int a, int b)
+{
+  return (a * 64) & b;
+}
+
+/* { dg-final { scan-assembler "and.*\[al]sl #6" } } */


Re: [PATCH][ARM] Fix broken shift patterns

2011-10-07 Thread Andrew Stubbs

On 06/10/11 16:01, Andrew Stubbs wrote:

  (define_special_predicate "shift_operator"
    (and (ior (ior (and (match_code "mult")
			(match_test "power_of_two_operand (XEXP (op, 1), mode)"))
		   (and (match_code "rotate")
			(match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
				     && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
-	      (match_code "ashift,ashiftrt,lshiftrt,rotatert"))
+	      (and (match_code "ashift,ashiftrt,lshiftrt,rotatert")
+		   (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
+				     && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")
+			(match_test "REG_P (XEXP (op, 1))"))))
	 (match_test "mode == GET_MODE (op)")))


Oh, I forgot to say: I don't understand why the rotate operator is 
special-cased.


If I understand it correctly, the effect of the (existing) rotate is 
both to check the constant range, AND to disallow registers as the shift 
amount. This difference has no effect on Thumb, but might cause ARM mode 
some troubles?


Is this likely to be deliberate, or an oversight? I can't see any reason 
in the ARM ARM why this should be the case.


Andrew


Re: [PATCH][ARM] Fix broken shift patterns

2011-10-07 Thread Paul Brook
 Oh, I forgot to say, I don't understand why the rotate operator is
 special cased?
 
 If I understand it correctly, the effect of the (existing) rotate is
 both to check the constant range, AND to disallow registers as the shift
 amount. This difference has no effect on Thumb, but might cause ARM mode
 some troubles?
 
 Is this likely to be deliberate, or an oversight? I can't see any reason
 in the ARM ARM why this should be the case.

Deliberate. ARM only has rotatert (which for immediate operands can be 
substituted at assembly generation time).

Paul


Re: [PATCH][ARM] Fix broken shift patterns

2011-10-07 Thread Paul Brook
 Done, and attached.

Ok.

Two changes to the testcase that I'll pre-approve:
- Add a comment along the lines of
  /* ARM has shift-and-alu insns.  Depending on the ALU op GCC represents some
   of these as a left shift, others as a multiply.  Check that we match the
   right one.  */
- Also test (a * 64) - b [rsb] and ~(a * 64) [mvn]

  3. Consistently accept both power-of-two and 0..31 for shifts.  Large
  shift counts give undefined results[1], so replace them with an
  arbitrary value (e.g. 0) during assembly output.  Arguably not an
  entirely proper fix, but I think it'll keep everything happy.
 
 I think we need to be careful not to change the behaviour between
 different optimization levels and/or perturbations caused by minor code
 changes. I know this isn't a hard requirement for undefined behaviour,
 but I think it's still good practice.

 In this case, I believe the hardware simply shifts the value clean
 out of the register, and always returns a zero (or maybe -1 for
 ashiftrt?).

I'm not convinced.  ARM instructions shift modulo 256
(i.e. a << (b & 0xff)).  I'm pretty sure anything that gets constant-folded by 
earlier passes will not honor these semantics.
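Paul's point can be sketched in C. The following is an illustrative model of what an ARM register-controlled LSL computes (it is not GCC code, and `arm_lsl_reg` is a name I made up): the shifter consults only the low byte of the count register, and any count of 32 or more pushes every bit out of the register.

```c
#include <stdint.h>

/* Illustrative model of ARM's register-controlled logical shift left:
   only the bottom byte of the count register is consulted (hence the
   "modulo 256" behaviour), and a count of 32..255 shifts every bit
   clean out of the register, yielding 0.  */
static uint32_t arm_lsl_reg(uint32_t value, uint32_t count)
{
  uint32_t c = count & 0xff;   /* hardware looks at the low byte only */
  if (c >= 32)
    return 0;                  /* value shifted entirely out */
  return value << c;
}
```

Note the mismatch with C's own semantics: `1 << 256` is undefined in C, so a constant fold done by an earlier pass need not agree with `arm_lsl_reg(1, 256)`, which wraps to a shift of 0.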
 
  For bonus points we should probably disallow MULT in the arm_shiftsi3
  pattern, stop it interacting with the regular mulsi3 pattern in
  undesirable ways.
 
 How might that be a problem? Is it not the case that canonical forms
 already deals with this?

Mainly just general principle that having two insns with the same pattern is 
wrong - reload can't flip between them like it can different alternatives, and 
there are obscure rules about which one wins when both match.

In this case the shiftsi variant is very restricted and should never occur in 
the first place so it probably doesn't matter.

Paul


Re: [PATCH][ARM] Fix broken shift patterns

2011-10-07 Thread Andrew Stubbs

On 07/10/11 13:37, Paul Brook wrote:

Done, and attached.


Ok.

Two changes to the testcase that I'll pre-approve:
- Add a comment along the lines of
   /* ARM has shift-and-alu insns.  Depending on the ALU op GCC represents some
of these as a left shift, others as a multiply.  Check that we match the
right one.  */
- Also test (a * 64) - b [rsb] and ~(a * 64) [mvn]


Thanks, I've committed the attached.


In this case, I believe the hardware simply shifts the value clean
out of the register, and always returns a zero (or maybe -1 for
ashiftrt?).


I'm not convinced.  ARM instructions shift modulo 256
(i.e. a << (b & 0xff)).  I'm pretty sure anything that gets constant-folded by
earlier passes will not honor these semantics.


True, but I'm still not sure what the least wrong way to do this might be.


For bonus points we should probably disallow MULT in the arm_shiftsi3
pattern, stop it interacting with the regular mulsi3 pattern in
undesirable ways.


How might that be a problem? Is it not the case that canonical forms
already deals with this?


Mainly just general principle that having two insns with the same pattern is
wrong - reload can't flip between them like it can different alternatives, and
there are obscure rules about which one wins when both match.


OK, so we'd just need a shift_operator_that_isnt_mult predicate, 
(probably not with that name).


Andrew
2011-10-07  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/predicates.md (shift_amount_operand): Remove constant
	range check.
	(shift_operator): Check range of constants for all shift operators.

	gcc/testsuite/
	* gcc.dg/pr50193-1.c: New file.
	* gcc.target/arm/shiftable.c: New file.

--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -129,11 +129,12 @@
   (ior (match_operand 0 "arm_rhs_operand")
       (match_operand 0 "memory_operand")))
 
+;; This doesn't have to do much because the constant is already checked
+;; in the shift_operator predicate.
 (define_predicate "shift_amount_operand"
   (ior (and (match_test "TARGET_ARM")
	     (match_operand 0 "s_register_operand"))
-       (and (match_code "const_int")
-	    (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) < 32"))))
+       (match_operand 0 "const_int_operand")))
 
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
@@ -219,13 +220,20 @@
	(match_test "mode == GET_MODE (op)")))
 
 ;; True for shift operators.
+;; Notes:
+;;  * mult is only permitted with a constant shift amount
+;;  * patterns that permit register shift amounts only in ARM mode use
+;;    shift_amount_operand, patterns that always allow registers do not,
+;;    so we don't have to worry about that sort of thing here.
 (define_special_predicate "shift_operator"
   (and (ior (ior (and (match_code "mult")
		       (match_test "power_of_two_operand (XEXP (op, 1), mode)"))
		  (and (match_code "rotate")
		       (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
				    && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
-	    (match_code "ashift,ashiftrt,lshiftrt,rotatert"))
+	    (and (match_code "ashift,ashiftrt,lshiftrt,rotatert")
+		 (match_test "GET_CODE (XEXP (op, 1)) != CONST_INT
			      || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
	(match_test "mode == GET_MODE (op)")))
 
 ;; True for shift operators which can be used with saturation instructions.
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr50193-1.c
@@ -0,0 +1,10 @@
+/* PR 50193: ARM: ICE on a | (b << negative-constant) */
+/* Ensure that the compiler doesn't ICE.  */
+
+/* { dg-options "-O2" } */
+
+int
+foo(int a, int b)
+{
+  return a | (b << -3); /* { dg-warning "left shift count is negative" } */
+}
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shiftable.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm32 } */
+
+/* ARM has shift-and-alu insns.  Depending on the ALU op GCC represents some
+   of these as a left shift, others as a multiply.  Check that we match the
+   right one.  */
+
+int
+plus (int a, int b)
+{
+  return (a * 64) + b;
+}
+
+/* { dg-final { scan-assembler "add.*\[al]sl #6" } } */
+
+int
+minus (int a, int b)
+{
+  return a - (b * 64);
+}
+
+/* { dg-final { scan-assembler "sub.*\[al]sl #6" } } */
+
+int
+ior (int a, int b)
+{
+  return (a * 64) | b;
+}
+
+/* { dg-final { scan-assembler "orr.*\[al]sl #6" } } */
+
+int
+xor (int a, int b)
+{
+  return (a * 64) ^ b;
+}
+
+/* { dg-final { scan-assembler "eor.*\[al]sl #6" } } */
+
+int
+and (int a, int b)
+{
+  return (a * 64) & b;
+}
+
+/* { dg-final { scan-assembler "and.*\[al]sl #6" } } */
+
+int
+rsb (int a, int b)
+{
+  return (a * 64) - b;
+}
+
+/* { dg-final { scan-assembler "rsb.*\[al]sl #6" } } */
+
+int
+mvn (int a, int b)
+{
+  return ~(a * 64);
+}
+
+/* { dg-final { scan-assembler "mvn.*\[al]sl #6" } } */


[PATCH][ARM] Fix broken shift patterns

2011-10-06 Thread Andrew Stubbs

This patch is a follow-up both to my patches here:

  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00049.html

and Paul Brook's patch here:

  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01076.html

The patch fixes both the original problem, in which negative shift 
constants caused an ICE (pr50193), and the problem introduced by Paul's 
patch, in which a*64+b is not properly optimized.


However, it does not attempt to fix Richard Sandiford's observation that 
there may be a latent problem with the 'M' constraint which could lead 
reload to cause a recog ICE.


I believe this patch to be nothing but an improvement over the current 
state, and that a fix to the constraint problem should be a separate patch.


On that basis, am I OK to commit?



Now, let me explain the other problem:

As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit 
a shift to be expressed as either a shift type and amount (register or 
constant), or as a multiply and power-of-two constant.


This is necessary because the canonical form of (plus (ashift x y) z) 
appears to be (plus (mult x 2^y) z), presumably for the benefit of 
multiply-and-accumulate optimizations. (Minus is similarly affected, but 
other shiftable operations are unaffected, and this only applies to left 
shifts, of course.)
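The equivalence that canonicalization relies on is easy to check in plain C. This is a minimal sketch of the two forms of the same computation (function names are mine, for illustration): the multiply-and-accumulate form combine produces, and the shift-and-add form the ARM instruction actually implements.

```c
#include <stdint.h>

/* The canonical RTL form: (plus (mult a 64) b).  */
static uint32_t madd_form(uint32_t a, uint32_t b)
{
  return a * 64 + b;
}

/* The form the hardware executes, e.g. "add r0, r1, r2, lsl #6":
   (plus (ashift a 6) b).  The two are identical for every input.  */
static uint32_t shift_add_form(uint32_t a, uint32_t b)
{
  return (a << 6) + b;
}
```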


The (possible) problem is that the meanings of the constants for mult 
and ashift are very different, but the arm.md file has these unified 
into a single pattern using a single 'M' constraint that must allow both 
types of constant unconditionally. This is safe for the vast majority of 
passes because they check recog before they make a change, and anyway 
don't make changes without understanding the logic. But, reload has a 
feature where it can pull a constant from a register, and convert it to 
an immediate, if the constraints allow, but crucially, it doesn't check 
the predicates; no doubt it shouldn't need to, but the ARM port appears 
to be breaking the rules.


Problem scenario 1:

  Consider pattern (plus (mult r1 r2) r3).

  It so happens that reload knows that r2 contains a constant, say 20,
  so reload checks to see if that could be converted to an immediate.
  Now, 20 is not a power of two, so recog would reject it, but it is in
  the range 0..31 so it does match the 'M' constraint. Oops!

Problem scenario 2:

  Consider pattern (ashiftrt r1 r2).

  Again, it so happens that reload knows that r2 contains a constant, in
  this case let's say 64, so again reload checks to see if that could
  be converted to an immediate. This time, 64 is not in the range
  0..31, so recog would reject it, but it is a power of two, so it does
  match the 'M' constraint. Again, oops!
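Both scenarios come down to the 'M' constraint being the union of two sets that the predicates keep separate. A hedged C sketch of the checks involved (the function names are hypothetical, not GCC's; 'M' here means "0..31 or a power of two" as described above):

```c
#include <stdbool.h>

/* What the 'M' constraint accepts: either a valid shift count
   or a power of two (for the mult form).  */
static bool matches_M(long n)
{
  bool in_shift_range = n >= 0 && n < 32;
  bool power_of_two   = n > 0 && (n & (n - 1)) == 0;
  return in_shift_range || power_of_two;
}

/* What the mult form of the pattern actually requires.  */
static bool mult_operand_ok(long n)
{
  return n > 0 && (n & (n - 1)) == 0;   /* power of two only */
}

/* What a plain shift actually requires.  */
static bool shift_operand_ok(long n)
{
  return n >= 0 && n < 32;              /* 0..31 only */
}
```

Scenario 1 is `matches_M(20)` true but `mult_operand_ok(20)` false; scenario 2 is `matches_M(64)` true but `shift_operand_ok(64)` false. In both cases reload, which checks only the constraint, can produce an operand the predicate (and hence recog) rejects.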

I see two ways to fix this properly:

 1. Duplicate all the patterns in the machine description, once for the
mult case, and once for the other cases. This could probably be
done with a code iterator, if preferred.

 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it
always uses the (presumably cheaper) shift-and-add option. However,
this would require all other targets where madd really is the best
option to fix it up. (I'd imagine that two instructions for shift
and add would be cheaper speed wise, if properly scheduled, on most
targets? That doesn't help the size optimization though.)

However, it's not obvious to me that this needs fixing:
 * The failure mode would be an ICE, and we've not seen any.
 * There's a comment in arm.c:shift_op that suggests that this can't
   happen, somehow, at least in the mult case.
   - I'm not sure exactly how reload works, but it seems reasonable
 that it will never try to convert a register to an immediate
 because the pattern does not allow registers in the first place.
   - This logic doesn't hold in the opposite case though.

Have I explained all that clearly?

My conclusion after studying all this is that we don't need to do 
anything until somebody reports an ICE, at which point it becomes worth 
the effort of fixing it. Other opinions welcome! :)


Andrew
2011-10-06  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/predicates.md (shift_amount_operand): Remove constant
	range check.
	(shift_operator): Check range of constants for all shift operators.

	gcc/testsuite/
	* gcc.dg/pr50193-1.c: New file.
	* gcc.target/arm/shiftable.c: New file.

---
 src/gcc-mainline/gcc/config/arm/predicates.md  |   15 ++-
 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c  |   10 +
 .../gcc/testsuite/gcc.target/arm/shiftable.c   |   43 
 3 files changed, 65 insertions(+), 3 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/shiftable.c

diff --git a/src/gcc-mainline/gcc/config/arm/predicates.md b/src/gcc-mainline/gcc/config/arm/predicates.md
index 27ba603..7307fd5 100644
--- a/src/gcc-mainline/gcc/config/arm/predicates.md
+++ 

Re: [PATCH][ARM] Fix broken shift patterns

2011-10-06 Thread Paul Brook
 I believe this patch to be nothing but an improvement over the current
 state, and that a fix to the constraint problem should be a separate patch.
 
 On that basis, am I OK to commit?

One minor nit:

 (define_special_predicate "shift_operator"
...
+  (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
+		    && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")
+       (match_test "REG_P (XEXP (op, 1))"))
 
We're already enforcing the REG_P elsewhere, and it's only valid in some 
contexts, so I'd change this to:
(match_test "GET_CODE (XEXP (op, 1)) != CONST_INT
	     || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")
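The unsigned cast in that match_test does double duty: a single comparison rejects both over-large and negative counts (the PR 50193 case), because a negative HOST_WIDE_INT converts to a huge unsigned value. A standalone C illustration of the idiom, using `long long` to stand in for HOST_WIDE_INT:

```c
#include <stdbool.h>

/* Mirror of the predicate's range test: one unsigned comparison
   accepts exactly the shift counts 0..31.  A negative n wraps to a
   huge unsigned value, so it fails the same test as counts >= 32.  */
static bool shift_count_in_range(long long n)
{
  return (unsigned long long) n < 32;
}
```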


 Now, let me explain the other problem:
 
 As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit
 a shift to be expressed as either a shift type and amount (register or
 constant), or as a multiply and power-of-two constant.

Added complication is that only ARM mode accepts a register.
 
 Problem scenario 1:
 
Consider pattern (plus (mult r1 r2) r3).
 
It so happens that reload knows that r2 contains a constant, say 20,
so reload checks to see if that could be converted to an immediate.
Now, 20 is not a power of two, so recog would reject it, but it is in
the range 0..31 so it does match the 'M' constraint. Oops!

Though as you mention below, the predicate doesn't allow the second operand to 
be a register, so this can never happen.  Reload may do unexpected things, but 
if it starts randomly changing valid const_int values then we have much bigger 
problems.
 
 Problem scenario 2:
 
Consider pattern (ashiftrt r1 r2).
 
Again, it so happens that reload knows that r2 contains a constant, in
this case let's say 64, so again reload checks to see if that could
be converted to an immediate. This time, 64 is not in the range
0..31, so recog would reject it, but it is a power of two, so it does
match the 'M' constraint. Again, oops!
 
 I see two ways to fix this properly:
 
   1. Duplicate all the patterns in the machine description, once for the
  mult case, and once for the other cases. This could probably be
  done with a code iterator, if preferred.
 
   2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it
  always uses the (presumably cheaper) shift-and-add option. However,
  this would require all other targets where madd really is the best
  option to fix it up. (I'd imagine that two instructions for shift
  and add would be cheaper speed wise, if properly scheduled, on most
  targets? That doesn't help the size optimization though.)

3. Consistently accept both power-of-two and 0..31 for shifts.  Large shift 
counts give undefined results[1], so replace them with an arbitrary value
(e.g. 0) during assembly output.  Arguably not an entirely proper fix, but I 
think it'll keep everything happy.
 
 However, it's not obvious to me that this needs fixing:
   * The failure mode would be an ICE, and we've not seen any.

Then again no one noticed the negative-shift ICE until recently :-/

   * There's a comment in arm.c:shift_op that suggests that this can't
 happen, somehow, at least in the mult case.
 - I'm not sure exactly how reload works, but it seems reasonable
   that it will never try to convert a register to an immediate
   because the pattern does not allow registers in the first place.
 - This logic doesn't hold in the opposite case though.
 Have I explained all that clearly?

I think you've covered most of it.

For bonus points we should probably disallow MULT in the arm_shiftsi3 pattern, 
to stop it interacting with the regular mulsi3 pattern in undesirable ways.

Paul

[1] Or at least not any result gcc will be expecting.