Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-21 Thread Jeff Law
On 05/02/2018 04:05 PM, Jim Wilson wrote:
> This improves the code for a switch statement on targets that sign-extend
> function arguments, such as RISC-V.  Given a simple testcase
> 
> extern void asdf(int);
> void foo(int x) {
>   switch (x) {
>   case 0: asdf(10); break;
>   case 1: asdf(11); break;
>   case 2: asdf(12); break;
>   case 3: asdf(13); break;
>   case 4: asdf(14); break;
>   }
> }
> 
> Compiled for a 64-bit target, we get for the tablejump
> 
>   li  a5,4
>   bgtua0,a5,.L1
>   sllia0,a0,32
>   lui a5,%hi(.L4)
>   addia5,a5,%lo(.L4)
>   srlia0,a0,30
>   add a0,a0,a5
>   lw  a5,0(a0)
>   jr  a5
> 
> There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then
> shifted right by 30 to zero-extend it and multiply by 4 for the table index.
> However, after the unsigned greater than branch, we know the value is between
> 0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended
> long for this target.  Thus we get the same exact value if we sign-extend
> instead of zero-extend, and the code is one instruction shorter.  We get a 
> slli
> by 2 instead of the slli 32/srli 30.
> 
> The following patch implements this optimization.  It checks for a range that
> does not have the sign-bit set, and an index value that is already sign
> extended, and then does a sign extend instead of an zero extend.
> 
> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite 
> runs.
> There were no regressions.  It was also tested with an x86_64-linux build and
> testsuite run.
> 
> Ok?
> 
> Jim
> 
>   gcc/
>   * expr.c (do_tablejump): When converting index to Pmode, if we have a
>   sign extended promoted subreg, and the range does not have the sign bit
>   set, then do a sign extend.
OK.

Jeff


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-17 Thread Jim Wilson
On Thu, May 17, 2018 at 12:25 AM, Eric Botcazou  wrote:
> The patch looks OK to me, modulo:
>
>> +   && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1
>
> I'd use UINTVAL instead of INTVAL here.

Thanks.  Committed with that change.

Jim


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-17 Thread Eric Botcazou
> The following patch implements this optimization.  It checks for a range
> that does not have the sign-bit set, and an index value that is already
> sign extended, and then does a sign extend instead of an zero extend.
> 
> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite
> runs. There were no regressions.  It was also tested with an x86_64-linux
> build and testsuite run.
> 
> Ok?
> 
> Jim
> 
>   gcc/
>   * expr.c (do_tablejump): When converting index to Pmode, if we have a
>   sign extended promoted subreg, and the range does not have the sign bit
>   set, then do a sign extend.

Richard dragged me into this so I feel somewhat entitled to step up...

The patch looks OK to me, modulo:

> +  /* We know the value of INDEX is between 0 and RANGE.  If we have a
> +  sign-extended subreg, and RANGE does not have the sign bit set, then
> +  we have a value that is valid for both sign and zero extension.  In
> +  this case, we get better code if we sign extend.  */
> +  if (GET_CODE (index) == SUBREG
> +   && SUBREG_PROMOTED_VAR_P (index)
> +   && SUBREG_PROMOTED_SIGNED_P (index)
> +   && ((width = GET_MODE_PRECISION (as_a  (mode)))
> +   <= HOST_BITS_PER_WIDE_INT)
> +   && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1

I'd use UINTVAL instead of INTVAL here.

-- 
Eric Botcazou


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-16 Thread Jim Wilson
On Wed, May 9, 2018 at 2:20 PM, Jim Wilson  wrote:
> On Wed, May 2, 2018 at 3:05 PM, Jim Wilson  wrote:
>> This improves the code for a switch statement on targets that sign-extend
>> function arguments, such as RISC-V.  Given a simple testcase
>> ...
>> gcc/
>> * expr.c (do_tablejump): When converting index to Pmode, if we have a
>> sign extended promoted subreg, and the range does not have the sign 
>> bit
>> set, then do a sign extend.

ping^2

https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00118.html

Jim


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-09 Thread Jim Wilson
On Wed, May 2, 2018 at 3:05 PM, Jim Wilson  wrote:
> This improves the code for a switch statement on targets that sign-extend
> function arguments, such as RISC-V.  Given a simple testcase
> ...
> gcc/
> * expr.c (do_tablejump): When converting index to Pmode, if we have a
> sign extended promoted subreg, and the range does not have the sign 
> bit
> set, then do a sign extend.

Ping.

Jim


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-04 Thread Richard Biener
On Thu, May 3, 2018 at 11:29 PM, Jim Wilson  wrote:
> On Thu, May 3, 2018 at 12:29 AM, Richard Biener
>  wrote:
>> Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile
>> - I remember
>> Eric fixing things up a bit but some verification would be nice to
>> have (instrumentation
>> at RTL level that for SUBREG_PROMOTED_* the bits are as expected).
>
> If you are using SUBREG_PROMOTED_* in a late optimization pass like
> combine, then this requires that all earlier optimization passes
> propagate the info correctly.  I suppose there could be issues there.
> But that isn't what this patch is doing.  This is code called during
> initial RTL generation.  The SUBREG_PROMOTED_* bits are set during
> this process because we know that arguments are passed sign-extended
> to full register size.  We are then consuming the info while still in
> the RTL generation phase.  I think that there is little that can go
> wrong here.

Indeed.  But IIRC the info stays around and people might be tempted to use it...
I do see various uses in later RTL optimizers.

> Verifying this info in RTL generation would effectively be verifying
> that the argument passing conventions are implemented correctly, and
> we already have other ways to do that.
>
> It might be useful to try to verify this info before combine where it
> is more likely to be wrong.  I don't think there is any easy way to
> verify this at compile time.  This would probably require emitting
> code to check at application run-time that a promoted subreg actually
> has a properly promoted value, and call abort if it doesn't.  This
> would likely be an expensive check that we don't want enabled by
> default, but might be useful for debugging purposes.  I don't think we
> have any --enable-checking code like this at present.  We have
> compiler compile-time checking and compiler run-time checking, but I
> don't think that we have application run-time checking.  This would be
> more like a sanitizer option, except to validate info in the RTL.  Is
> this what you are asking for?

Yes, that's what I was suggesting.  I guess similarly "sanitizing"
on-the-side info we have at GIMPLE level would be interesting, like
verifying range-info.

Just an idea - I'm not actually expecting you to implemen this,
esp. since doing the actual instrumentation on RTL can be a bit
tricky (best not emit a call to abort but use the targets trap
instruction for simplicity).

Richard.

>
> Jim


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-03 Thread Jim Wilson
On Thu, May 3, 2018 at 12:29 AM, Richard Biener
 wrote:
> Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile
> - I remember
> Eric fixing things up a bit but some verification would be nice to
> have (instrumentation
> at RTL level that for SUBREG_PROMOTED_* the bits are as expected).

If you are using SUBREG_PROMOTED_* in a late optimization pass like
combine, then this requires that all earlier optimization passes
propagate the info correctly.  I suppose there could be issues there.
But that isn't what this patch is doing.  This is code called during
initial RTL generation.  The SUBREG_PROMOTED_* bits are set during
this process because we know that arguments are passed sign-extended
to full register size.  We are then consuming the info while still in
the RTL generation phase.  I think that there is little that can go
wrong here.

Verifying this info in RTL generation would effectively be verifying
that the argument passing conventions are implemented correctly, and
we already have other ways to do that.

It might be useful to try to verify this info before combine where it
is more likely to be wrong.  I don't think there is any easy way to
verify this at compile time.  This would probably require emitting
code to check at application run-time that a promoted subreg actually
has a properly promoted value, and call abort if it doesn't.  This
would likely be an expensive check that we don't want enabled by
default, but might be useful for debugging purposes.  I don't think we
have any --enable-checking code like this at present.  We have
compiler compile-time checking and compiler run-time checking, but I
don't think that we have application run-time checking.  This would be
more like a sanitizer option, except to validate info in the RTL.  Is
this what you are asking for?

Jim


Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-03 Thread Richard Biener
On Thu, May 3, 2018 at 12:05 AM, Jim Wilson  wrote:
> This improves the code for a switch statement on targets that sign-extend
> function arguments, such as RISC-V.  Given a simple testcase
>
> extern void asdf(int);
> void foo(int x) {
>   switch (x) {
>   case 0: asdf(10); break;
>   case 1: asdf(11); break;
>   case 2: asdf(12); break;
>   case 3: asdf(13); break;
>   case 4: asdf(14); break;
>   }
> }
>
> Compiled for a 64-bit target, we get for the tablejump
>
> li  a5,4
> bgtua0,a5,.L1
> sllia0,a0,32
> lui a5,%hi(.L4)
> addia5,a5,%lo(.L4)
> srlia0,a0,30
> add a0,a0,a5
> lw  a5,0(a0)
> jr  a5
>
> There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then
> shifted right by 30 to zero-extend it and multiply by 4 for the table index.
> However, after the unsigned greater than branch, we know the value is between
> 0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended
> long for this target.  Thus we get the same exact value if we sign-extend
> instead of zero-extend, and the code is one instruction shorter.  We get a 
> slli
> by 2 instead of the slli 32/srli 30.
>
> The following patch implements this optimization.  It checks for a range that
> does not have the sign-bit set, and an index value that is already sign
> extended, and then does a sign extend instead of an zero extend.
>
> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite 
> runs.
> There were no regressions.  It was also tested with an x86_64-linux build and
> testsuite run.
>
> Ok?

Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile
- I remember
Eric fixing things up a bit but some verification would be nice to
have (instrumentation
at RTL level that for SUBREG_PROMOTED_* the bits are as expected).

Richard.

> Jim
>
> gcc/
> * expr.c (do_tablejump): When converting index to Pmode, if we have a
> sign extended promoted subreg, and the range does not have the sign 
> bit
> set, then do a sign extend.
> ---
>  gcc/expr.c | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 9dd0e60d24d..919e20a22f7 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11782,11 +11782,26 @@ do_tablejump (rtx index, machine_mode mode, rtx 
> range, rtx table_label,
>  emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
>  default_label, default_probability);
>
> -
>/* If index is in range, it must fit in Pmode.
>   Convert to Pmode so we can index with it.  */
>if (mode != Pmode)
> -index = convert_to_mode (Pmode, index, 1);
> +{
> +  unsigned int width;
> +
> +  /* We know the value of INDEX is between 0 and RANGE.  If we have a
> +sign-extended subreg, and RANGE does not have the sign bit set, then
> +we have a value that is valid for both sign and zero extension.  In
> +this case, we get better code if we sign extend.  */
> +  if (GET_CODE (index) == SUBREG
> + && SUBREG_PROMOTED_VAR_P (index)
> + && SUBREG_PROMOTED_SIGNED_P (index)
> + && ((width = GET_MODE_PRECISION (as_a  (mode)))
> + <= HOST_BITS_PER_WIDE_INT)
> + && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1
> +   index = convert_to_mode (Pmode, index, 0);
> +  else
> +   index = convert_to_mode (Pmode, index, 1);
> +}
>
>/* Don't let a MEM slip through, because then INDEX that comes
>   out of PIC_CASE_VECTOR_ADDRESS won't be a valid address,
> --
> 2.14.1
>


[PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-02 Thread Jim Wilson
This improves the code for a switch statement on targets that sign-extend
function arguments, such as RISC-V.  Given a simple testcase

extern void asdf(int);
void foo(int x) {
  switch (x) {
  case 0: asdf(10); break;
  case 1: asdf(11); break;
  case 2: asdf(12); break;
  case 3: asdf(13); break;
  case 4: asdf(14); break;
  }
}

Compiled for a 64-bit target, we get for the tablejump

li  a5,4
bgtua0,a5,.L1
sllia0,a0,32
lui a5,%hi(.L4)
addia5,a5,%lo(.L4)
srlia0,a0,30
add a0,a0,a5
lw  a5,0(a0)
jr  a5

There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then
shifted right by 30 to zero-extend it and multiply by 4 for the table index.
However, after the unsigned greater than branch, we know the value is between
0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended
long for this target.  Thus we get the same exact value if we sign-extend
instead of zero-extend, and the code is one instruction shorter.  We get a slli
by 2 instead of the slli 32/srli 30.

The following patch implements this optimization.  It checks for a range that
does not have the sign-bit set, and an index value that is already sign
extended, and then does a sign extend instead of an zero extend.

This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite runs.
There were no regressions.  It was also tested with an x86_64-linux build and
testsuite run.

Ok?

Jim

gcc/
* expr.c (do_tablejump): When converting index to Pmode, if we have a
sign extended promoted subreg, and the range does not have the sign bit
set, then do a sign extend.
---
 gcc/expr.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 9dd0e60d24d..919e20a22f7 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11782,11 +11782,26 @@ do_tablejump (rtx index, machine_mode mode, rtx 
range, rtx table_label,
 emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 default_label, default_probability);
 
-
   /* If index is in range, it must fit in Pmode.
  Convert to Pmode so we can index with it.  */
   if (mode != Pmode)
-index = convert_to_mode (Pmode, index, 1);
+{
+  unsigned int width;
+
+  /* We know the value of INDEX is between 0 and RANGE.  If we have a
+sign-extended subreg, and RANGE does not have the sign bit set, then
+we have a value that is valid for both sign and zero extension.  In
+this case, we get better code if we sign extend.  */
+  if (GET_CODE (index) == SUBREG
+ && SUBREG_PROMOTED_VAR_P (index)
+ && SUBREG_PROMOTED_SIGNED_P (index)
+ && ((width = GET_MODE_PRECISION (as_a  (mode)))
+ <= HOST_BITS_PER_WIDE_INT)
+ && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1
+   index = convert_to_mode (Pmode, index, 0);
+  else
+   index = convert_to_mode (Pmode, index, 1);
+}
 
   /* Don't let a MEM slip through, because then INDEX that comes
  out of PIC_CASE_VECTOR_ADDRESS won't be a valid address,
-- 
2.14.1