[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2017-09-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot 
gnu.org

--- Comment #12 from Segher Boessenkool  ---
I have a patch.

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2017-09-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #11 from Segher Boessenkool  ---
> Basically combine is going funny.

Hey, don't blame combine, it's all simplify-rtx's doing.

If in simplify_binary_operation_1 you disable the blocks after

  /* Minimize the number of bits set in C1, i.e. C1 := C1 & ~C2.  */

and after

  /* If we have (ior (and (X C1) C2)), simplify this by making
 C1 as small as possible if C1 actually changes.  */

all works as you want (this is on rs6000, pretty much the same thing):

===
Trying 10, 11 -> 12:
Failed to match this instruction:
(set (reg:SI 131)
(ior:SI (zero_extend:SI (mem:QI (reg/v/f:DI 128 [ string ]) [0
*string_9(D)+0 S1 A8]))
(const_int 96 [0x60])))
Successfully matched this instruction:
(set (reg:SI 130)
(zero_extend:SI (mem:QI (reg/v/f:DI 128 [ string ]) [0 *string_9(D)+0 S1
A8])))
Successfully matched this instruction:
(set (reg:SI 131)
(ior:SI (reg:SI 130)
(const_int 96 [0x60])))
allowing combination of insns 10, 11 and 12
original costs 8 + 4 + 4 = 16
replacement costs 8 + 4 = 12
deferring deletion of insn with uid = 10.
modifying insn i211: r130:SI=zero_extend([r128:DI])
deferring rescan insn with uid = 11.
modifying insn i312: r131:SI=r130:SI|0x60
  REG_DEAD r130:SI
deferring rescan insn with uid = 12.
===

giving

lbz 9,0(3)
ori 9,9,0x60
cmpwi 7,9,116
bne 7,.L5

etc.

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-09-25
 Ever confirmed|0   |1
   Severity|normal  |enhancement

--- Comment #10 from Andrew Pinski  ---
This is how we expand it on aarch64:
(insn 10 9 11 (set (reg:QI 81)
(mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) t.c:4 -1
 (nil))

(insn 11 10 12 (set (reg:SI 82)
(ior:SI (subreg:SI (reg:QI 81) 0)
(const_int 32 [0x20]))) t.c:4 -1
 (nil))

(insn 12 11 13 (set (reg:SI 83)
(zero_extend:SI (subreg:QI (reg:SI 82) 0))) t.c:4 -1
 (nil))

(insn 13 12 14 (set (reg:CC 66 cc)
(compare:CC (reg:SI 83)
(const_int 116 [0x74]))) t.c:4 -1
 (nil))


-
(set (reg:SI 83)
(ior:SI (and:SI (subreg:SI (mem:QI (reg/v/f:DI 80 [ string ]) [0
*string_9(D)+0 S1 A8]) 0)
(const_int 223 [0xdf]))
(const_int 32 [0x20])))

Notice how the and there is 223, but really that can be still a zero_extend. 
Basically combine is going funny.

 CUT 

Note for 33, orr does not accept 33 so combine does not see 33 and does not
change the and part around the subreg.

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #9 from Julian Andres Klode  ---
(In reply to Andrew Pinski from comment #8)
> This looks like missing removal of casts.
> 
> Note in C, char_var|32 is really the same as ((int)char_var)|32

Well. The loads of the byte are already zero-extend loads. The current code is
like saying:

   (int) ((int)char|32)

:)

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #8 from Andrew Pinski  ---
This looks like missing removal of casts.

Note in C, char_var|32 is really the same as ((int)char_var)|32

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

Florian Weimer  changed:

   What|Removed |Added

 CC||fw at gcc dot gnu.org

--- Comment #7 from Florian Weimer  ---
Something similar happens on x86_64 with -Os.  The OR instruction operates on
%eax instead of %al:

 :
   0:   8a 07   mov(%rdi),%al
   2:   83 c8 20or $0x20,%eax
   5:   3c 74   cmp$0x74,%al

(Byte-wise OR would be 0x0c 0x20, one byte shorter.)

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #6 from Julian Andres Klode  ---
(In reply to Andrew Pinski from comment #4)
> Note this testcase needs to be improved as I have a patch which converts a
> switch with just one case and a default into anew if statement.

FWIW, The issue is exactly the same with if statements, like in:

int TrieCase3(const char *string)
{
if((string[0] | 32) == 't') {
if((string[1] | 32) == 'a') {
if((string[2] | 32) == 'g') {
return 42;
}
}
}
return -1;
}

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #5 from Julian Andres Klode  ---
Created attachment 39678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39678=edit
ppc64le

Hmm, AFAICT the same seems to happen on powerpc64le:

lbz 9,0(3)  # Load zero
ori 9,9,0x20# ors in 32
rlwinm 9,9,0,0xff   # zero extend value AFAICT
cmpwi 7,9,116


So far tested:

good: mipsel, x86_64, armhf (thumb2)
bad: aarch64, powerpc64le

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

Andrew Pinski  changed:

   What|Removed |Added

 CC||pinskia at gcc dot gnu.org

--- Comment #4 from Andrew Pinski  ---
Note this testcase needs to be improved as I have a patch which converts a
switch with just one case and a default into anew if statement.

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #3 from Julian Andres Klode  ---
Created attachment 39677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39677=edit
arm (thumb2) output at -O2

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #2 from Julian Andres Klode  ---
Created attachment 39676
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39676=edit
Aarch64 output at -O2

[Bug target/77729] aarch64 inserts unneeded uxtb after ldrb, orr ...32

2016-09-25 Thread j...@jak-linux.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77729

--- Comment #1 from Julian Andres Klode  ---
Created attachment 39675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39675=edit
C reproducer