[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2016-04-29 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #8 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #1)
> 
> void test2_2 (unsigned int x, unsigned int* y)
> {
>   unsigned int xx = x >> 1;
>   unsigned int p = x & 1;
>   if (p != 0)
> foo (xx);
> }
> 

And of course also in the opposite direction:

void test4_2 (unsigned int x)
{
  if (x & (1 << 31))
((void(*)(void))(x << 1)) ();
}

[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2015-07-26 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #7 from Oleg Endo olegendo at gcc dot gnu.org ---
Another example:

unsigned int
count_trailing_nonzero_bits (unsigned int v, unsigned int c)
{
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  c += v & 1;
  v >>= 1;
  return c;
}

ideally should compile to:
  mov   #0,r1
  shlr  r4
  movt  r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0
  shlr  r4
  addc  r1,r0


[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2015-01-11 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org ---
The shll/shlr insns effectively perform two operations:
  T = zero_extract single bit 0 / 31 from reg
  reg = reg >> 1 / reg << 1.

The other shift insns as in comment #5 perform only a single operation.  Thus
those two things should probably be handled slightly differently.

With my current patchset for handling single bit zero_extract (PR 64345), code
like

void test4 (unsigned int x, unsigned int* y)
{
  y[0] = (x >> 0) & 1;
  y[1] = (x >> 1) & 1;
  y[2] = (x >> 2) & 1;
  y[3] = (x >> 3) & 1;
}

results in the following insns right after the combine pass:

(insn 7 4 8 2 (set (reg:SI 171 [ D.1733 ])
(and:SI (reg/v:SI 169 [ x ])
(const_int 1 [0x1]))) sh_tmp.cpp:432 115 {*andsi_compact}
 (nil))

...

(insn 10 9 11 2 (parallel [
(set (reg:SI 173 [ D.1733 ])
(zero_extract:SI (reg/v:SI 169 [ x ])
(const_int 1 [0x1])
(const_int 1 [0x1])))
(clobber (reg:SI 147 t))
]) sh_tmp.cpp:433 409 {any_treg_expr_to_reg}
 (expr_list:REG_UNUSED (reg:SI 147 t)
(nil)))

...

(insn 13 12 14 2 (parallel [
(set (reg:SI 175 [ D.1733 ])
(zero_extract:SI (reg/v:SI 169 [ x ])
(const_int 1 [0x1])
(const_int 2 [0x2])))
(clobber (reg:SI 147 t))
]) sh_tmp.cpp:434 409 {any_treg_expr_to_reg}
 (expr_list:REG_UNUSED (reg:SI 147 t)
(nil)))

...

(insn 16 15 17 2 (parallel [
(set (reg:SI 177 [ D.1733 ])
(zero_extract:SI (reg/v:SI 169 [ x ])
(const_int 1 [0x1])
(const_int 3 [0x3])))
(clobber (reg:SI 147 t))
]) sh_tmp.cpp:435 409 {any_treg_expr_to_reg}
 (expr_list:REG_UNUSED (reg:SI 147 t)
(expr_list:REG_DEAD (reg/v:SI 169 [ x ])
(nil))))

Those pseudo-insns are then split into tst/bld/movt/movrt sequences in the
split1 pass.  If a special shll/shlr pass is done right after combine and
before split1, it's possible to identify potential good shll/shlr sequences
rather easily and rewrite the code to use shll/shlr instead.


[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2014-12-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

Oleg Endo olegendo at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-12-02
 CC||segher at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Oleg Endo olegendo at gcc dot gnu.org ---
Combine recently received some updates which improve handling of multiple-set
parallel insns.  Applying the following:

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md(revision 218250)
+++ gcc/config/sh/sh.md(working copy)
@@ -5156,6 +5156,12 @@
   DONE;
 }

+  if (operands[2] == const1_rtx)
+{
+  emit_insn (gen_shlr (operands[0], operands[1]));
+  DONE;
+}
+
   /* If the lshrsi3_* insn is going to clobber the T_REG it must be
  expanded here.  */
   if (CONST_INT_P (operands[2])


will always expand the multiple-set shlr insn and combine will be able to
utilize this.  The test case

void test2_1 (unsigned int x, unsigned int* y)
{
  y[0] = x >> 1;
  y[1] = x & 1;
}

will compile to the desired sequence:
        shlr    r4
        movt    r1
        mov.l   r4,@r5
        rts
        mov.l   r1,@(4,r5)


However, in the context of e.g. pointer tagging use cases, the tag bits are
usually used with conditional branches:

void test2_2 (unsigned int x, unsigned int* y)
{
  unsigned int xx = x >> 1;
  unsigned int p = x & 1;
  if (p != 0)
foo (xx);
}

Combine can't handle this, because the shift and test insns end up in different
basic blocks.  Moreover, in order to utilize the shlr insn, the branch
condition needs to be inverted.  This could be done by emitting a movt-tst
sequence and letting the sh_treg_combine pass optimize it away by inverting
the branch condition.


[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2014-12-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #2 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #1)
 
> void test2_1 (unsigned int x, unsigned int* y)
> {
>   y[0] = x >> 1;
>   y[1] = x & 1;
> }
> 
> will compile to the desired sequence:
>         shlr    r4
>         movt    r1
>         mov.l   r4,@r5
>         rts
>         mov.l   r1,@(4,r5)


Changing the order of the operations to:

void test2_1 (unsigned int x, unsigned int* y)
{
  y[0] = x & 1;
  y[1] = x >> 1;
}

will make it fail to combine the insns though.


[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2014-12-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #3 from Oleg Endo olegendo at gcc dot gnu.org ---
A more advanced example:

void test4 (unsigned int x, unsigned int* y)
{
  y[0] = (x >> 0) & 1;
  y[1] = (x >> 1) & 1;
  y[2] = x >> 2;
}

currently compiles to:
mov r4,r0
and #1,r0
mov.l   r0,@r5
mov r4,r0
shlr    r0
and #1,r0
shlr2   r4
mov.l   r0,@(4,r5)
rts
mov.l   r4,@(8,r5)

better:
shlr    r4
movt    r0
shlr    r4
mov.l   r0,@r5
movt    r1
mov.l   r4,@(8,r5)
rts
mov.l   r1,@(4,r5)


[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2014-12-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #4 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #1)
> Combine recently received some updates which improve handling of
> multiple-set parallel insns.  Applying the following:
> 
> Index: gcc/config/sh/sh.md
> ===================================================================
> --- gcc/config/sh/sh.md   (revision 218250)
> +++ gcc/config/sh/sh.md   (working copy)
> @@ -5156,6 +5156,12 @@
>        DONE;
>      }
>  
> +  if (operands[2] == const1_rtx)
> +    {
> +      emit_insn (gen_shlr (operands[0], operands[1]));
> +      DONE;
> +    }
> +
>    /* If the lshrsi3_* insn is going to clobber the T_REG it must be
>       expanded here.  */
>    if (CONST_INT_P (operands[2])
> 
> 
> will always expand the multiple-set shlr insn and combine will be able to
> utilize this.

Doing that for the shlr insn is OK, since there is no alternative way to do a
1-bit right shift without touching the T bit.  However, since there is a
non-T-bit-clobbering shll alternative (add x,x), doing the same for shll might
have negative side effects on other sequences.

[Bug target/63321] [SH] Unused T bit result of shll / shlr insns

2014-12-02 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63321

--- Comment #5 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #3)
> A more advanced example:
> 
> void test4 (unsigned int x, unsigned int* y)
> {
>   y[0] = (x >> 0) & 1;
>   y[1] = (x >> 1) & 1;
>   y[2] = x >> 2;
> }

Which is just another example of re-using intermediate results of stitched
shifts, only a bit more complex due to the multiple-set insns.

void test5 (unsigned int x, unsigned int* y)
{
  y[0] = x << (2);
  y[1] = x << (2 + 2);
  y[2] = x << (2 + 2 + 8);
}

currently compiles to:
mov r4,r1
shll2   r1
mov.l   r1,@r5
mov r4,r1
shll2   r1
shll2   r1
mov.l   r1,@(4,r5)
mov #12,r1
shld    r1,r4
rts
mov.l   r4,@(8,r5)

better:
shll2   r4
mov.l   r4,@r5
shll2   r4
mov.l   r4,@(4,r5)
shll8   r4
rts
mov.l   r4,@(8,r5)

See also some examples in PR 54089.