Re: [patch][ARM] Fix 16-bit - 64-bit multiply and accumulate

2011-05-03 Thread Bernd Schmidt
On 04/15/2011 12:54 PM, Andrew Stubbs wrote:
 Ping.

Trying to unblock this...

I think the point is that both examples:

long long foolong (long long x, short *a, short *b)
   {
   return x + (long long)*a * (long long)*b;
   }

and


  long long foolong (long long x, short *a, short *b)
  {
  return x + *a * *b;
  }

should produce the same code using the maddhidi pattern, and for that to
happen, simplify_rtx must produce a canonical form of the 16-64 bit
widening multiply. The form that already exists in arm.md looks better
so we should pick that.

I tried to fix it with the patch below, which unfortunately doesn't work
since during combine we don't see the SIGN_EXTEND operations inside the
MULT, but two shift operations instead. Maybe you can complete it from here?


Bernd
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 173242)
+++ gcc/simplify-rtx.c  (working copy)
@@ -1000,6 +1000,23 @@ simplify_unary_operation_1 (enum rtx_cod
   GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
return XEXP (op, 0);
 
+  /* Extending a widening multiplication should be canonicalized to
+a wider widening multiplication.  */
+  if (GET_CODE (op) == MULT
+  GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+  GET_CODE (XEXP (op, 1)) == SIGN_EXTEND)
+   {
+ rtx op0 = XEXP (XEXP (op, 0), 0);
+ rtx op1 = XEXP (XEXP (op, 1), 0);
+ enum machine_mode op0_mode = GET_MODE (op0);
+ enum machine_mode op1_mode = GET_MODE (op1);
+ return simplify_gen_binary (MULT, mode,
+ simplify_gen_unary (SIGN_EXTEND, mode,
+ op0, op0_mode),
+ simplify_gen_unary (SIGN_EXTEND, mode,
+ op1, op1_mode));
+   }
+
   /* Check for a sign extension of a subreg of a promoted
 variable, where the promotion is sign-extended, and the
 target mode is the same as the variable's promotion.  */
@@ -1071,6 +1088,23 @@ simplify_unary_operation_1 (enum rtx_cod
   GET_MODE_SIZE (mode) = GET_MODE_SIZE (GET_MODE (XEXP (op, 0
return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+  /* Extending a widening multiplication should be canonicalized to
+a wider widening multiplication.  */
+  if (GET_CODE (op) == MULT
+  GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+  GET_CODE (XEXP (op, 1)) == ZERO_EXTEND)
+   {
+ rtx op0 = XEXP (XEXP (op, 0), 0);
+ rtx op1 = XEXP (XEXP (op, 1), 0);
+ enum machine_mode op0_mode = GET_MODE (op0);
+ enum machine_mode op1_mode = GET_MODE (op1);
+ return simplify_gen_binary (MULT, mode,
+ simplify_gen_unary (ZERO_EXTEND, mode,
+ op0, op0_mode),
+ simplify_gen_unary (ZERO_EXTEND, mode,
+ op1, op1_mode));
+   }
+
   /* (zero_extend:M (zero_extend:N X)) is (zero_extend:M X).  */
   if (GET_CODE (op) == ZERO_EXTEND)
return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),


Re: [patch][ARM] Fix 16-bit - 64-bit multiply and accumulate

2011-05-03 Thread Andrew Stubbs

On 03/05/11 10:07, Bernd Schmidt wrote:

I tried to fix it with the patch below, which unfortunately doesn't work
since during combine we don't see the SIGN_EXTEND operations inside the
MULT, but two shift operations instead. Maybe you can complete it from here?


I'll take a look.

Thanks for the help! :)

Andrew


Re: [patch][ARM] Fix 16-bit - 64-bit multiply and accumulate

2011-04-15 Thread Andrew Stubbs

Ping.

On 25/03/11 16:19, Andrew Stubbs wrote:

On 28/01/11 15:20, Richard Earnshaw wrote:


On Fri, 2011-01-28 at 15:13 +, Andrew Stubbs wrote:

On 28/01/11 14:12, Richard Earnshaw wrote:

So what happens to a variation of your testcase:

long long foolong (long long x, short *a, short *b)
{
return x + (long long)*a * (long long)*b;
}

With your patch? This should generate identical code to your original
test-case.



The patch has no effect on that testcase - it's broken in some other
way, I think, and the same with and without my patch:

ldrsh r3, [r3, #0]
ldrsh r2, [r2, #0]
push {r4, r5}
asrs r4, r3, #31
asrs r5, r2, #31
mul r4, r2, r4
mla r4, r3, r5, r4
umull r2, r3, r2, r3
adds r3, r4, r3
adds r0, r0, r2
adc r1, r1, r3
pop {r4, r5}
bx lr

Hmmm, that probably doesn't add anything useful to the discussion. :(

I'll add that one to the todo list ...

Andrew



Ouch! I though that used to work :-(



I looked at this one again, but on a second inspection I'm not sure
there's much wrong with it?

When I wrote the above I thought that there was a 64-bit multiply
instruction, but now I look more closely I see there isn't, hence the
above. It does two 16-bit loads, sign-extends the inputs to 64-bit, does
a 64-bit - 64-bit multiply, and then adds 'x'.

Can the umull/add/add/adc be optimized using umlal? It's too late on a
Friday to workout what's going on with the carries 

Also, I don't fully understand why the compiler can't just disregard the
casts and use maddhidi4? Isn't that mathematically equivalent in this case?

When you say it used to work, what did it use to look like?

Thanks

Andrew