[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2023-04-11 Thread aagarwa at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

Ajit Kumar Agarwal  changed:

   What    |Removed |Added
   --------+--------+----------
   Status  |NEW     |ASSIGNED

[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2021-08-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

--- Comment #8 from Segher Boessenkool  ---
(In reply to Segher Boessenkool from comment #7)
> (In reply to Peter Cordes from comment #6)
> > # power64 GCC 9.2.1 (ATI13.0)
> > rlwimi 3,4,0,255  # bit-blend according to mask, rotate count=0
> > rldicl 3,3,0,32 # Is this zero-extension to 64-bit redundant?
> 
> It is: the rlwinm does an AND with 0xff already, so that clears the top 32
> bits for sure.

Wow I cannot read, it is an rlwimi, so scratch that.

The rlwimi here keeps the top 56 bits intact, and the top 32 of those already
were 0, so the rldicl insn still is redundant.
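
A rough C model of that argument, as a sketch only. The helper names are hypothetical, the rlwimi model only covers rotate count 0 with a non-wrapping mask, and it assumes the incoming 32-bit arguments arrive zero-extended in the 64-bit registers, as comment #7 says they must be.

```c
#include <stdint.h>

/* Hypothetical model of "rlwimi RA,RS,0,mask" on a 64-bit register:
   bits selected by mask come from RS, every other bit of RA
   (including the upper 32) is left unchanged. */
static uint64_t model_rlwimi_sh0(uint64_t ra, uint64_t rs, uint32_t mask)
{
    uint64_t m = mask;              /* mask lies within the low 32 bits */
    return (rs & m) | (ra & ~m);
}

/* Hypothetical model of "rldicl RA,RS,0,32": clear the upper 32 bits. */
static uint64_t model_rldicl_0_32(uint64_t rs)
{
    return rs & 0xFFFFFFFFu;
}

/* Returns 1 when the zero-extension after the blend changes nothing. */
static int zext_is_redundant(uint32_t a, uint32_t b)
{
    uint64_t r3 = a;                /* assumed zero-extended on entry */
    uint64_t r4 = b;
    uint64_t blended = model_rlwimi_sh0(r3, r4, 0xFFu);
    return model_rldicl_0_32(blended) == blended;
}
```

Under those assumptions zext_is_redundant() returns 1 for every pair of inputs, because the blend never touches the upper 32 bits of r3.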

[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2021-08-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

--- Comment #7 from Segher Boessenkool  ---
(In reply to Peter Cordes from comment #6)
> # power64 GCC 9.2.1 (ATI13.0)
> rlwimi 3,4,0,255  # bit-blend according to mask, rotate count=0
> rldicl 3,3,0,32 # Is this zero-extension to 64-bit redundant?

It is: the rlwinm does an AND with 0xff already, so that clears the top 32
bits for sure.

> But ppc64 GCC does zero-extension of the result from 32 to 64-bit, which is
> probably not needed unless the calling convention has different requirements
> for return values than for incoming args.  (I don't know PPC well enough.)

Return values have to be properly (sign- or zero-) extended for their type,
just like function arguments.

[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2021-08-22 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

Peter Cordes  changed:

   What    |Removed |Added
   --------+--------+----------------------
   CC      |        |peter at cordes dot ca

--- Comment #6 from Peter Cordes  ---
For a simpler test case, GCC 4.8.5 did redundantly mask before using
bitfield-insert, but GCC 9.2.1 doesn't.


unsigned merge2(unsigned a, unsigned b){
    return (a & 0xFFFFFF00u) | (b & 0xFFu);
}

https://godbolt.org/z/froExaPxe
# PowerPC (32-bit) GCC 4.8.5
rlwinm 4,4,0,0xff # b &= 0xFF is totally redundant
rlwimi 3,4,0,24,31
blr

# power64 GCC 9.2.1 (ATI13.0)
rlwimi 3,4,0,255  # bit-blend according to mask, rotate count=0
rldicl 3,3,0,32 # Is this zero-extension to 64-bit redundant?
blr

But ppc64 GCC does zero-extension of the result from 32 to 64-bit, which is
probably not needed unless the calling convention has different requirements
for return values than for incoming args.  (I don't know PPC well enough.)

So for at least some cases, modern GCC does ok.

Also, when the blend isn't split at a byte boundary, even GCC4.8.5 manages to
avoid redundant masking before the bitfield-insert.

unsigned merge2(unsigned a, unsigned b){
    return (a & 0xFFFFFF80u) | (b & 0x7Fu);
}

rlwimi 3,4,0,25,31   # GCC4.8.5, 32-bit so no zero-extension
blr
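
To illustrate why masking b before the bitfield-insert is redundant, here is a minimal C sketch. The names blend and premask_is_redundant are made up for this example, and blend only models an insert with rotate count 0.

```c
#include <stdint.h>

/* Hypothetical helper: take the bits selected by mask from b and all
   other bits from a.  This is what a rotate-count-0 rlwimi computes. */
static uint32_t blend(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Returns 1 when masking b first makes no difference to the result. */
static int premask_is_redundant(uint32_t a, uint32_t b, uint32_t mask)
{
    return blend(a, b & mask, mask) == blend(a, b, mask);
}
```

Since the insert itself only reads the bits of b that lie under the mask, premask_is_redundant() returns 1 for any a, b and mask, whether or not the mask ends on a byte boundary.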

[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2021-08-01 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

--- Comment #5 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #4)
> As long as nothing on the RTL level (combine) messes this up, it
> should produce the best code.

combine cannot ever create worse code than it had as input :-)

[Bug middle-end/82940] Suboptimal code for (a & 0x7f) | (b & 0x80) on powerpc

2021-08-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940

Andrew Pinski  changed:

   What    |Removed |Added
   --------+--------+--------------------------
   Severity|normal  |enhancement
   CC      |        |pinskia at gcc dot gnu.org

--- Comment #4 from Andrew Pinski  ---
I have a set of patches that I am going to clean up next year for GCC 13 where
on the gimple level GCC produces:
  _13 = v_9(D) & 127;
  _1 = (sizetype) _13;
  _2 = t_10(D) + _1;
  _3 = *_2;
  _14 = () _3;
  _12 = BIT_INSERT_EXPR ;
  return _12;

As long as nothing on the RTL level (combine) messes this up, it should
produce the best code.