[Bug rtl-optimization/63259] Detecting byteswap sequence

2015-01-06 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #21 from thopre01 at gcc dot gnu.org ---
Author: thopre01
Date: Tue Jan  6 11:51:16 2015
New Revision: 219256

URL: https://gcc.gnu.org/viewcvs?rev=219256root=gccview=rev
Log:
2015-01-06  Thomas Preud'homme  thomas.preudho...@arm.com

gcc/
PR tree-optimization/63259
* tree-ssa-math-opts.c (pass_optimize_bswap::execute): Stop checking
if optab exists for 16bit byteswap.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-math-opts.c


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-22 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #19 from thopre01 at gcc dot gnu.org ---
Yeah, when doing something like (x[0]  8) | x[1])  8) | x[2])  8) |
x[3] there is already a depth proportional to the size of the value being byte
swapped with a coefficient due to casting. But I need to evaluate the impact of
increasing the limit in terms of compilation time. If the impact is noticeable,
it might be necessary to do the refactoring suggested by Richard Biener in [1]
first.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00616.html
From the performance side the pass could be re-structured to populate
a lattice, thus work from def to use instead of the other way around.  Which
means we visit each stmt exactly once, compute its value symbolically
and check it against a rotate.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-22 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #20 from Oleg Endo olegendo at gcc dot gnu.org ---
BTW, PR 42587 contains a few other cases.  I've created PR 64376 for the SH
specific parts.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-21 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #18 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #17)
 (In reply to thopre01 from comment #16)
  
  Did we? All I can find is you and Andreas mentionning that it should work
  because it will be sign extended to int when doing the bitwise AND with
  0xFF00.
  
  What did I miss?
 
 Ah sorry ... me bad.  I haven't tried increasing the 'level' with the
 bswaphi expander pattern in place.  Will do that later.

So I did.  Increasing the limit to

  limit += 2 + (int) ceil_log2 ...

works for me for the signed short case.

However, the following function from the CSiBE set:

void
_nrrdSwap32Endian(void *_data, size_t N) {
  int *data, w, fix;
  size_t I;

  if (_data) {
data = (int *)_data;
for (I=0; IN; I++) {
  w = data[I];
  fix =  (w  0x00FF);
  fix = ((w  0xFF00)  0x08) | (fix  0x08);
  fix = ((w  0x00FF)  0x10) | (fix  0x08);
  fix = ((w  0xFF00)  0x18) | (fix  0x08);
  data[I] = fix;
}
  }
}

seems to require 'limit += 3 + (int) ...' for the bswap insn to be detected.

Then, this fine function (from the same set)

void
_nrrdSwap64Endian(void *_data, size_t N) {
  airLLong *data, l, fix;
  size_t I;

  if (_data) {
data = (airLLong *)_data;
for (I=0; IN; I++) {
  l = data[I];
  fix =  (l  0x00FF);
  fix = ((l  0xFF00)  0x08) | (fix  0x08);
  fix = ((l  0x00FF)  0x10) | (fix  0x08);
  fix = ((l  0xFF00)  0x18) | (fix  0x08);
#if defined(_WIN32)
  fix = ((l  0x00FFi64)  0x20) | (fix  0x08);
  fix = ((l  0xFF00i64)  0x28) | (fix  0x08);
  fix = ((l  0x00FFi64)  0x30) | (fix  0x08);
  fix = ((l  0xFF00i64)  0x38) | (fix  0x08);
#else
  fix = ((l  0x00FFLL)  0x20) | (fix  0x08);
  fix = ((l  0xFF00LL)  0x28) | (fix  0x08);
  fix = ((l  0x00FFLL)  0x30) | (fix  0x08);
  fix = ((l  0xFF00LL)  0x38) | (fix  0x08);
#endif
  data[I] = fix;
}
  }
}

requires 'limit += 6 + (int) ...'

(luckily, there's no _nrrdSwap128Endian in the set)


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-19 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #14 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #13)
 (In reply to thopre01 from comment #12)
  
  That's good, it means the pattern is recognized. Is there an optab defined
  for bswap16?
 
 Nope.  Just this:
 
 (define_insn rotlhi3_8
   [(set (match_operand:HI 0 arith_reg_dest =r)
   (rotate:HI (match_operand:HI 1 arith_reg_operand r)
  (const_int 8)))]
   TARGET_SH1
   swap.b %1,%0
   [(set_attr type arith)])
 
 If I remember correctly, there was something that would check whether it can
 use the rotate pattern instead of a bswaphi2.

It's rather the contrary. The bswap pass will replace the statements by a 8 bit
rotation if the value is 16bit and the expander will choose a bswaphi pattern
for that if the backend has one, otherwise it will keep the rotation.

The problem is that currently the bswap pass still bails out if there is no 16
bit bswap available. I shall fix that.

 After adding the following
 pattern:
 
 (define_expand bswaphi2
   [(set (match_operand:HI 0 arith_reg_dest)
   (bswap:HI (match_operand:HI 1 arith_reg_operand)))]
   TARGET_SH1
 {
   if (!can_create_pseudo_p ())
 FAIL;
   else
 {
   emit_insn (gen_rotlhi3_8 (operands[0], operands[1]));
   DONE;
 }
 })
 
 it looks much better.  The cases above work, except for the signed short. 

You mean with the added bswaphi2 pattern the pattern is still unchanged?

 On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the
 32 bit register.  What it does is:
 
 unsigned int test_0999 (unsigned int a, unsigned int b)
 {
   return (a  0x) | ((a  0xFF00)  8) | ((a  0xFF)  8);
 }
 
 I was afraid that using a bswap:HI will result in unnecessary code around it:
 
 
   mov.l   .L6,r0  ! r0 = 0x
   and r4,r0
   extu.w  r4,r4
   swap.b  r4,r4
   rts
   or  r4,r0

Looks good to me, what exactly is the problem?

 
 In my case, combine is looking for the pattern:
 
 Failed to match this instruction:
 (set (reg:SI 172 [ D.1356 ])
 (ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ])
 (const_int 8 [0x8]))
 (const_int 65280 [0xff00]))
 (zero_extract:SI (reg/v:SI 170 [ a ])
 (const_int 8 [0x8])
 (const_int 8 [0x8])))
 (and:SI (reg/v:SI 170 [ a ])
 (const_int -65536 [0x]
 
 Which should then work.

If you have a bswap instruction it seems better to define a pattern for that
which the expander will use. That's the job of the bswap pass to detect a
bswap, it shouldn't be done by combine.

 
  
  Same as the other pattern, can you try to print n-n in hex with this new
  limit? My guess is that the pattern is now recognized but fails later for
  the same reason as above.
 
 With increased limit, the number is/was the same.  The bswap:HI expander was
 missing.  Thanks!

Ok so the limit ought to be increased and the check for bswaphi removed. I'll
take a stab at that but that will probably make it only after Christmas.

[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-19 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #15 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to thopre01 from comment #14)
 
 It's rather the contrary. The bswap pass will replace the statements by a 8
 bit rotation if the value is 16bit and the expander will choose a bswaphi
 pattern for that if the backend has one, otherwise it will keep the rotation.
 
 The problem is that currently the bswap pass still bails out if there is no
 16 bit bswap available. I shall fix that.

Ah, I see.

 You mean with the added bswaphi2 pattern the pattern is still unchanged?
 

After adding bswaphi2, the bswap pass does the transformation.  Except for the
non-working 'signed short' mentioned above.  But we already figured that out
earlier.


  On SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the
  32 bit register.  What it does is:
  
  unsigned int test_0999 (unsigned int a, unsigned int b)
  {
return (a  0x) | ((a  0xFF00)  8) | ((a  0xFF)  8);
  }
  
  I was afraid that using a bswap:HI will result in unnecessary code around 
  it:
  
  
  mov.l   .L6,r0  ! r0 = 0x
  and r4,r0
  extu.w  r4,r4
  swap.b  r4,r4
  rts
  or  r4,r0
 
 Looks good to me, what exactly is the problem?

The expected sequence for the function above is:

rts
swap.b  r4,r0

i.e. no anding and oring of lower/higher 16 bit word, since the swap.b insn
operates on a SImode value and does not alter the high 16 bits.

 
 If you have a bswap instruction it seems better to define a pattern for that
 which the expander will use. That's the job of the bswap pass to detect a
 bswap, it shouldn't be done by combine.

The combine parts I was talking about are to eliminate the anding and oring of
higher 16 bits when a 16 bit byte swap is done on a 32 bit value.


 Ok so the limit ought to be increased and the check for bswaphi removed.
 I'll take a stab at that but that will probably make it only after Christmas.

No problem.

[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-19 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #16 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #15)
 (In reply to thopre01 from comment #14)
 
  You mean with the added bswaphi2 pattern the pattern is still unchanged?
  
 
 After adding bswaphi2, the bswap pass does the transformation.  Except for
 the non-working 'signed short' mentioned above.  But we already figured that
 out earlier.

Did we? All I can find is you and Andreas mentionning that it should work
because it will be sign extended to int when doing the bitwise AND with 0xFF00.

What did I miss?

 
 The expected sequence for the function above is:
 
 rts
 swap.b  r4,r0
 
 i.e. no anding and oring of lower/higher 16 bit word, since the swap.b insn
 operates on a SImode value and does not alter the high 16 bits.

Oh yeah right.

 
  
  If you have a bswap instruction it seems better to define a pattern for that
  which the expander will use. That's the job of the bswap pass to detect a
  bswap, it shouldn't be done by combine.
 
 The combine parts I was talking about are to eliminate the anding and oring
 of higher 16 bits when a 16 bit byte swap is done on a 32 bit value.

I'm surprised that it's not the semantic of a bswaphi.

Best regards,

Thomas


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-19 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #17 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to thopre01 from comment #16)
 
 Did we? All I can find is you and Andreas mentionning that it should work
 because it will be sign extended to int when doing the bitwise AND with
 0xFF00.
 
 What did I miss?

Ah sorry ... me bad.  I haven't tried increasing the 'level' with the bswaphi
expander pattern in place.  Will do that later.

   If you have a bswap instruction it seems better to define a pattern for 
   that
   which the expander will use. That's the job of the bswap pass to detect a
   bswap, it shouldn't be done by combine.
  
  The combine parts I was talking about are to eliminate the anding and oring
  of higher 16 bits when a 16 bit byte swap is done on a 32 bit value.
 
 I'm surprised that it's not the semantic of a bswaphi.

As far as I know...
bswaphi's inputs/output modes are HImode and it operates on the whole reg, not
on a subreg.  If applied to a SImode subreg the other bits need to be
saved/restored, which is what happens wiit the anding/oring.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-18 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #13 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to thopre01 from comment #12)
 
 That's good, it means the pattern is recognized. Is there an optab defined
 for bswap16?

Nope.  Just this:

(define_insn rotlhi3_8
  [(set (match_operand:HI 0 arith_reg_dest =r)
(rotate:HI (match_operand:HI 1 arith_reg_operand r)
   (const_int 8)))]
  TARGET_SH1
  swap.b%1,%0
  [(set_attr type arith)])

If I remember correctly, there was something that would check whether it can
use the rotate pattern instead of a bswaphi2.  After adding the following
pattern:

(define_expand bswaphi2
  [(set (match_operand:HI 0 arith_reg_dest)
(bswap:HI (match_operand:HI 1 arith_reg_operand)))]
  TARGET_SH1
{
  if (!can_create_pseudo_p ())
FAIL;
  else
{
  emit_insn (gen_rotlhi3_8 (operands[0], operands[1]));
  DONE;
}
})

it looks much better.  The cases above work, except for the signed short.  On
SH the bswap:HI HW insn actually doesn't modify the upper 16 bit of the 32 bit
register.  What it does is:

unsigned int test_0999 (unsigned int a, unsigned int b)
{
  return (a  0x) | ((a  0xFF00)  8) | ((a  0xFF)  8);
}

I was afraid that using a bswap:HI will result in unnecessary code around it:


mov.l.L6,r0  ! r0 = 0x
andr4,r0
extu.wr4,r4
swap.br4,r4
rts
orr4,r0

In my case, combine is looking for the pattern:

Failed to match this instruction:
(set (reg:SI 172 [ D.1356 ])
(ior:SI (ior:SI (and:SI (ashift:SI (reg/v:SI 170 [ a ])
(const_int 8 [0x8]))
(const_int 65280 [0xff00]))
(zero_extract:SI (reg/v:SI 170 [ a ])
(const_int 8 [0x8])
(const_int 8 [0x8])))
(and:SI (reg/v:SI 170 [ a ])
(const_int -65536 [0x]

Which should then work.

 
 Same as the other pattern, can you try to print n-n in hex with this new
 limit? My guess is that the pattern is now recognized but fails later for
 the same reason as above.

With increased limit, the number is/was the same.  The bswap:HI expander was
missing.  Thanks!


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-17 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #12 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #11)
 (In reply to thopre01 from comment #10)
  
  I have the same gimple and for me the bswap is correctly detected. Can you
  break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if
  (!source_stmt) and show me the output of p/x n-n ?
 
 n-n = 0x0102  limit = 4

That's good, it means the pattern is recognized. Is there an optab defined for
bswap16?

 
 For both, test_099 and test_08.
 
  Indeed, my mistake. Ok I tested a bit and found that the problem is the
  depth at which it's looking. Try to recompile tree-ssa-math-opts.c after
  increasing the limit number in find_bswap_or_nop. Right now the limit will
  evaluate to 4 and the gimple I have has a depth of 5.
 
 I've tried ...
 
   limit += 10 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
 
 ... but it doesn't change anything here.

Same as the other pattern, can you try to print n-n in hex with this new
limit? My guess is that the pattern is now recognized but fails later for the
same reason as above.

Best regards.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-16 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #11 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to thopre01 from comment #10)
 
 I have the same gimple and for me the bswap is correctly detected. Can you
 break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if
 (!source_stmt) and show me the output of p/x n-n ?

n-n = 0x0102  limit = 4

For both, test_099 and test_08.

 Indeed, my mistake. Ok I tested a bit and found that the problem is the
 depth at which it's looking. Try to recompile tree-ssa-math-opts.c after
 increasing the limit number in find_bswap_or_nop. Right now the limit will
 evaluate to 4 and the gimple I have has a depth of 5.

I've tried ...

  limit += 10 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);

... but it doesn't change anything here.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-15 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #10 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #9)
 (In reply to thopre01 from comment #7)
  
  Strange, could you show the output of -fdump-tree-bswap?
 
 Not so strange at all. 

What is strange is that it should detect such bswap pattern.

 After looking at the RTL dumps, I've noticed that
 the swap.b insn is generated by the combine pass.  I've got a few combine
 patterns for matching byte swaps on SH.  The pattern for swap.b doesn't
 combine well with other ops around/on it.  -fdump-tree-bswap says:
 
 ;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1,
 symbol_order=1)
 
 test_08 (short unsigned int a, short unsigned int b)
 {
   short unsigned int _2;
   signed short _3;
   int _4;
   int _5;
   signed short _6;
   signed short _7;
   short unsigned int _8;
   short unsigned int _10;
 
   bb 2:
   _2 = a_1(D)  8;
   _3 = (signed short) _2;
   _4 = (int) a_1(D);
   _5 = _4  8;
   _6 = (signed short) _5;
   _7 = _3 | _6;
   _8 = (short unsigned int) _7;
   _10 = _8 + b_9(D);
   return _10;
 
 }

I have the same gimple and for me the bswap is correctly detected. Can you
break at find_bswap_or_nop just after calling find_bswap_or_nop_1 on the if
(!source_stmt) and show me the output of p/x n-n ?

 
 
   Byte swapping of signed short types seems to be not working:
   
   short test_func_111 (short a, short b, short c)
   {
 return (((a  0xFF00)  8) | ((a  0xFF)  8));
   }
   
   exts.w  r4,r4
   mov r4,r0
   shlr8   r0
   extu.b  r0,r0
   shll8   r4
   or  r0,r4
   rts
   exts.w  r4,r0
  
  That's expected. Think about what happens if a = 0x8001. Doing a right shift
  by 8 bit would give 0xFF80 (due to the most significant bit being 1). The
  right part of the bitwise OR would give 0x0100 as expected and the result
  would be 0xFF80, so not a byte swap. It would work with an int though as the
  highest bit would then be 0, or with unsigned short as a right shift would
  introduce 0 in the most significant bits.
 
 As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap.

Indeed, my mistake. Ok I tested a bit and found that the problem is the depth
at which it's looking. Try to recompile tree-ssa-math-opts.c after increasing
the limit number in find_bswap_or_nop. Right now the limit will evaluate to 4
and the gimple I have has a depth of 5.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-14 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #6 from Oleg Endo olegendo at gcc dot gnu.org ---
With r218705 on SH (-O2 -m4 -ml) I get the following:

unsigned short test_099 (unsigned short a, unsigned short b)
{
  return (((a  0xFF00)  8) | ((a  0xFF)  8));
}

compiles to:
extu.wr4,r4
rts
swap.br4,r0


unsigned short test_08 (unsigned short a, unsigned short b)
{
  return b + (((a  0xFF00)  8) | ((a  0xFF)  8));
}

compiles to:
extu.w  r4,r4
mov r4,r0
shll8   r4
shlr8   r0
or  r0,r4
add r4,r5
rts
extu.w  r5,r0


Byte swapping of signed short types seems to be not working:

short test_func_111 (short a, short b, short c)
{
  return (((a  0xFF00)  8) | ((a  0xFF)  8));
}

exts.w  r4,r4
mov r4,r0
shlr8   r0
extu.b  r0,r0
shll8   r4
or  r0,r4
rts
exts.w  r4,r0


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-14 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #7 from thopre01 at gcc dot gnu.org ---
(In reply to Oleg Endo from comment #6)
 With r218705 on SH (-O2 -m4 -ml) I get the following:
 
 unsigned short test_099 (unsigned short a, unsigned short b)
 {
   return (((a  0xFF00)  8) | ((a  0xFF)  8));
 }
 
 compiles to:
 extu.wr4,r4
 rts
 swap.br4,r0

This one looks ok except for the zero-extension. Is the swap.b instruction
limited to 32 bit values?

 
 
 unsigned short test_08 (unsigned short a, unsigned short b)
 {
   return b + (((a  0xFF00)  8) | ((a  0xFF)  8));
 }
 
 compiles to:
 extu.w  r4,r4
 mov r4,r0
 shll8   r4
 shlr8   r0
 or  r0,r4
 add r4,r5
 rts
 extu.w  r5,r0

Strange, could you show the output of -fdump-tree-bswap?

 
 
 Byte swapping of signed short types seems to be not working:
 
 short test_func_111 (short a, short b, short c)
 {
   return (((a  0xFF00)  8) | ((a  0xFF)  8));
 }
 
 exts.w  r4,r4
 mov r4,r0
 shlr8   r0
 extu.b  r0,r0
 shll8   r4
 or  r0,r4
 rts
 exts.w  r4,r0

That's expected. Think about what happens if a = 0x8001. Doing a right shift by
8 bit would give 0xFF80 (due to the most significant bit being 1). The right
part of the bitwise OR would give 0x0100 as expected and the result would be
0xFF80, so not a byte swap. It would work with an int though as the highest bit
would then be 0, or with unsigned short as a right shift would introduce 0 in
the most significant bits.


Best regards,

Thomas

[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-14 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #8 from Andreas Schwab sch...@linux-m68k.org ---
(a  0xFF00)  8 with short a = 0x8001 evaluates to 0x80, since all operands
are first promoted to int.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-12-14 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #9 from Oleg Endo olegendo at gcc dot gnu.org ---
(In reply to thopre01 from comment #7)
 (In reply to Oleg Endo from comment #6)
  With r218705 on SH (-O2 -m4 -ml) I get the following:
  
  unsigned short test_099 (unsigned short a, unsigned short b)
  {
return (((a  0xFF00)  8) | ((a  0xFF)  8));
  }
  
  compiles to:
  extu.w  r4,r4
  rts
  swap.b  r4,r0
 
 This one looks ok except for the zero-extension. Is the swap.b instruction
 limited to 32 bit values?

Yes, in sh.md swap.b is defined for SImode values only.  But never mind the
extu.w, it's an SH ABI thing (PR 52441).

 
  
  
  unsigned short test_08 (unsigned short a, unsigned short b)
  {
return b + (((a  0xFF00)  8) | ((a  0xFF)  8));
  }
  
  compiles to:
  extu.w  r4,r4
  mov r4,r0
  shll8   r4
  shlr8   r0
  or  r0,r4
  add r4,r5
  rts
  extu.w  r5,r0
 
 Strange, could you show the output of -fdump-tree-bswap?

Not so strange at all.  After looking at the RTL dumps, I've noticed that the
swap.b insn is generated by the combine pass.  I've got a few combine patterns
for matching byte swaps on SH.  The pattern for swap.b doesn't combine well
with other ops around/on it.  -fdump-tree-bswap says:

;; Function test_08 (test_08, funcdef_no=1, decl_uid=1333, cgraph_uid=1,
symbol_order=1)

test_08 (short unsigned int a, short unsigned int b)
{
  short unsigned int _2;
  signed short _3;
  int _4;
  int _5;
  signed short _6;
  signed short _7;
  short unsigned int _8;
  short unsigned int _10;

  bb 2:
  _2 = a_1(D)  8;
  _3 = (signed short) _2;
  _4 = (int) a_1(D);
  _5 = _4  8;
  _6 = (signed short) _5;
  _7 = _3 | _6;
  _8 = (short unsigned int) _7;
  _10 = _8 + b_9(D);
  return _10;

}


  Byte swapping of signed short types seems to be not working:
  
  short test_func_111 (short a, short b, short c)
  {
return (((a  0xFF00)  8) | ((a  0xFF)  8));
  }
  
  exts.w  r4,r4
  mov r4,r0
  shlr8   r0
  extu.b  r0,r0
  shll8   r4
  or  r0,r4
  rts
  exts.w  r4,r0
 
 That's expected. Think about what happens if a = 0x8001. Doing a right shift
 by 8 bit would give 0xFF80 (due to the most significant bit being 1). The
 right part of the bitwise OR would give 0x0100 as expected and the result
 would be 0xFF80, so not a byte swap. It would work with an int though as the
 highest bit would then be 0, or with unsigned short as a right shift would
 introduce 0 in the most significant bits.

As Andreas mentioned, 'a' is promoted to int, so this should be a byte swap.

[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-10-31 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #5 from thopre01 at gcc dot gnu.org ---
Author: thopre01
Date: Fri Oct 31 11:55:07 2014
New Revision: 216971

URL: https://gcc.gnu.org/viewcvs?rev=216971root=gccview=rev
Log:
2014-10-31  Thomas Preud'homme  thomas.preudho...@arm.com

gcc/
PR tree-optimization/63259
* tree-ssa-math-opts.c (bswap_replace): Replace expression by a
rotation left if it is a 16 bit byte swap.
(pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR
and RROTATE_EXPR statements if it is a byte rotation.

gcc/testsuite/
PR tree-optimization/63259
* optimize-bswapsi-1.c (swap32_f): New bswap pass test.
* optimize-bswaphi-1.c: Drop useless SIType definition and fix typo in
following comment.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
trunk/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
trunk/gcc/tree-ssa-math-opts.c


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-09-29 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

--- Comment #4 from thopre01 at gcc dot gnu.org ---
I detect no noticeable difference when bootstrapping gcc with or without the
patch so I think we're in for a fix. :-)


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-09-28 Thread thopre01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

thopre01 at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #3 from thopre01 at gcc dot gnu.org ---
The reason is that the last gimple statement of this bswap is a rotation right
on the result of a bitwise OR. Right now bswap are searched only from bitwise
OR statement doing recursion until the source. Therefore the analysis is only
done on a subset of the statements making up this bswap.

The fix is easy but needs compilation benchmarking to make sure this doesn't
increase the cost of compiling in a noticeable way. The more statement are
considered as finishing a bswap, the closer we are to a O(n²) algorithm.

[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-09-25 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

Oleg Endo olegendo at gcc dot gnu.org changed:

   What|Removed |Added

 CC||olegendo at gcc dot gnu.org,
   ||thopre01 at gcc dot gnu.org

--- Comment #2 from Oleg Endo olegendo at gcc dot gnu.org ---
Thomas just recently did some bswap patterns work, maybe he's got an idea.


[Bug rtl-optimization/63259] Detecting byteswap sequence

2014-09-19 Thread hp at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63259

Hans-Peter Nilsson hp at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2014-09-19
 CC||hp at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Hans-Peter Nilsson hp at gcc dot gnu.org ---
Confirmed the general observation; observed (as far back as) 4.7 era and trunk
at r215401 for cris-elf (-march=v8 or higher needed).