[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-05-07 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #19 from Roger Sayle  ---
This should now be fixed on mainline.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:650c36ec461a722d9c65e82512b4c3aeec2ffee1

commit r14-335-g650c36ec461a722d9c65e82512b4c3aeec2ffee1
Author: Roger Sayle 
Date:   Fri Apr 28 14:21:53 2023 +0100

PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (reg/v:HI 49 [ b ])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
 (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

2023-04-28  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/109476
* lower-subreg.cc: Include explow.h for force_reg.
(find_decomposable_shift_zext): Pass an additional SPEED_P
argument.
If decomposing a suitable LSHIFTRT and we're not splitting
ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
instead of setting a high part SUBREG to zero, which helps combine.
(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
PR rtl-optimization/109476
* gcc.target/avr/mmcu/pr109476.c: New test case.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-23 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com

--- Comment #17 from Roger Sayle  ---
I've submitted an improved version of my patch for review:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616527.html

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-14 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #16 from Wilhelm M  ---
(In reply to Roger Sayle from comment #14)
> My apologies for the delay/issues.  My bootstrap and regression testing of
> this patch (on x86_64-pc-linux-gnu) revealed an issue or two (including the
> reported ICE).  My plan was to fix/resolve all these before posting a
> concrete submission to gcc-patches.  

We all appreciate your great effort in this case! Please don't hesitate to send
here some patches to test with. I'll be happy to test your patches!

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-14 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #15 from Wilhelm M  ---
Just checked actual gcc 13.0.1 without the patch: then no ICE accurs.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-13 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #14 from Roger Sayle  ---
My apologies for the delay/issues.  My bootstrap and regression testing of this
patch (on x86_64-pc-linux-gnu) revealed an issue or two (including the reported
ICE).  My plan was to fix/resolve all these before posting a concrete
submission to gcc-patches.  The general approach is solid (thanks to everyone
that agreed this was the correct place to fix things) but it's the corner cases
(such as RTL sharing) that all need to be addressed prior to a reviewable
submission.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-13 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #13 from Wilhelm M  ---
Yes, the ICE is with the patch. I'm pretty sure that this does not happen
without the patch, but I will that check again.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #12 from Segher Boessenkool  ---
With the modified compiler?  Does it ICE with an unmodified compiler as well?

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-13 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #11 from Wilhelm M  ---
After testing some more code, I got this ICE:

/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc: In static member
function 'static void GlobalFsm::ratePeriodic() [with BusDevs
=BusDevs > >]':
/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc:426:5: error:
unrecognizable insn:
426 | }
| ^
(insn 1584 1583 25 3 (set (concatn:HI [
(reg:QI 800)
(reg:QI 801 [+1 ])
])
(reg:HI 826)) "../../include0/external/sbus/sbus.h":226:69 -1
(nil))
during RTL pass: subreg1
/home/lmeier/Projekte/wmucpp/boards/rcfoc/gimbal_sbus_01.cc:426:5: internal
compiler error: in extract_insn, at recog.cc:2791
0x79f2cb _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../gcc/rtl-error.cc:108
0x79f2e7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc/rtl-error.cc:116
0x79dc77 extract_insn(rtx_insn*)
../../gcc/recog.cc:2791
0x1a43d91 decompose_multiword_subregs
../../gcc/lower-subreg.cc:1651
0x1a447ca execute
../../gcc/lower-subreg.cc:1790
Please submit a full bug report, with preprocessed source (by using
-freport-bug).

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #10 from Richard Biener  ---
(In reply to Roger Sayle from comment #7)
> Created attachment 54843 [details]
> proposed patch
> 
> Proposed patch.  Does this look reasonable?

Yes, but doesn't this work for all GET_CODE (op) != ASHIFTRT?

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #9 from Segher Boessenkool  ---
That patch looks fine :-)

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #8 from Wilhelm M  ---
Yes. Looks like the patch does its job.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Roger Sayle  changed:

   What|Removed |Added

 CC||roger at nextmovesoftware dot 
com

--- Comment #7 from Roger Sayle  ---
Created attachment 54843
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54843=edit
proposed patch

Proposed patch.  Does this look reasonable?

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #6 from Wilhelm M  ---
The following "solution"

constexpr uint16_t mul(const uint8_t a, const uint16_t b) {
const auto aa = std::bit_cast>(b);
return aa[1] * a;
}

or 

constexpr uint16_t mul(const uint8_t a, const uint16_t b) {
uint8_t aa[2];
std::memcpy(aa, , 2);
return aa[1] * a;
}

gives optimal result ;-)

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

--- Comment #5 from Segher Boessenkool  ---
Correct, this certainly can not be done by combine, it see two independent
pseudos here.  For hard registers it *can* do many tricks, but not for
pseudos like this.

[Bug rtl-optimization/109476] Missing optimization for 8bit/8bit multiplication / regression

2023-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109476

Richard Biener  changed:

   What|Removed |Added

  Component|target  |rtl-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2023-04-12
 Status|UNCONFIRMED |NEW
 CC||iant at google dot com,
   ||law at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
In the good case (with + 1) combine succeeds:

Trying 9 -> 11:
9: r55:HI=zero_extend(r54:QI)
  REG_DEAD r54:QI
   11: r52:HI=r55:HI*r56:HI
  REG_DEAD r56:HI
  REG_DEAD r55:HI
Successfully matched this instruction:
(set (reg:HI 52)
(mult:HI (zero_extend:HI (reg:QI 54))
(reg:HI 56 [ a ])))
allowing combination of insns 9 and 11
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 9.
modifying insn i311: r52:HI=zero_extend(r54:QI)*r56:HI
  REG_DEAD r54:QI
  REG_DEAD r56:HI
deferring rescan insn with uid = 11.

and then

Trying 10 -> 11:
   10: r56:HI=zero_extend(r61:QI)
  REG_DEAD r61:QI
   11: r52:HI=zero_extend(r54:QI)*r56:HI
  REG_DEAD r54:QI
  REG_DEAD r56:HI
Successfully matched this instruction:
(set (reg:HI 52)
(mult:HI (zero_extend:HI (reg:QI 54))
(zero_extend:HI (reg:QI 61
allowing combination of insns 10 and 11
original costs 4 + 20 = 24
replacement cost 12
deferring deletion of insn with uid = 10.
modifying insn i311: r52:HI=zero_extend(r54:QI)*zero_extend(r61:QI)
  REG_DEAD r61:QI
  REG_DEAD r54:QI
deferring rescan insn with uid = 11.


in the bad case instead

Trying 8 -> 9:
8: r52:HI=zero_extend(r55:QI)
  REG_DEAD r55:QI
9: r50:HI=r51:HI*r52:HI
  REG_DEAD r52:HI
  REG_DEAD r51:HI
Successfully matched this instruction:
(set (reg:HI 50)
(mult:HI (zero_extend:HI (reg:QI 55))
(reg:HI 51 [ b+1 ])))
allowing combination of insns 8 and 9
original costs 4 + 28 = 32
replacement cost 20
deferring deletion of insn with uid = 8.
modifying insn i3 9: r50:HI=zero_extend(r55:QI)*r51:HI
  REG_DEAD r55:QI
  REG_DEAD r51:HI
deferring rescan insn with uid = 9.

Trying 20 -> 9:
   20: r51:HI#1=0
9: r50:HI=zero_extend(r55:QI)*r51:HI
  REG_DEAD r55:QI
  REG_DEAD r51:HI
Can't combine i2 into i3

that's because the RTL into combine in the bad case is

(insn 19 22 20 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
 (expr_list:REG_DEAD (reg:QI 54 [ b+1 ])
(nil)))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51 [ b+1 ]) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))
(insn 8 20 9 2 (set (reg:HI 52 [ a ])
(zero_extend:HI (reg/v:QI 48 [ a ]))) "t.ii":4:49 635
{zero_extendqihi2}
 (expr_list:REG_DEAD (reg/v:QI 48 [ a ])
(nil)))
(insn 9 8 14 2 (set (reg:HI 50)
(mult:HI (reg:HI 51 [ b+1 ])
(reg:HI 52 [ a ]))) "t.ii":4:47 328 {*mulhi3_enh_split}
 (expr_list:REG_DEAD (reg:HI 52 [ a ])
(expr_list:REG_DEAD (reg:HI 51 [ b+1 ])
(nil

so the 'b' operand of the multiplication is now not a zero_extend:HI
but instead a two instruction set.  The first subreg pass produces
this IL, turning

(insn 3 2 4 2 (set (reg/v:HI 49 [ b ])
(reg:HI 22 r22 [ b ])) "t.ii":3:49 101 {*movhi_split}
 (nil))
(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (reg/v:HI 49 [ b ])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
 (nil))

into

(insn 17 2 18 2 (set (reg:QI 53 [ b ])
(reg:QI 22 r22 [ b ])) "t.ii":3:49 86 {movqi_insn_split}
 (nil))
(insn 18 17 4 2 (set (reg:QI 54 [ b+1 ])
(reg:QI 23 r23 [ b+1 ])) "t.ii":3:49 86 {movqi_insn_split}
 (nil))
(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
(reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
(const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
 (nil))

with -fno-split-wide-types the rotate doesn't get a zero_extend so the
multiplication pattern doesn't match either.

I think that possibly the lower subreg pass should more optimally
handle the situation, creating

(insn ... (set (zero_extend:HI (reg:QI 54 [ b + 1])))

here.  I'm quite sure combine/forwprop cannot combine the seemingly
unrelated subreg sets.  resolve_shift_zext seems to be supposed to
handle this and it receives

(insn 7 4 8 2 (set (reg:HI 51)
(lshiftrt:HI (concatn/v:HI [
(reg:QI 53 [ b ])
(reg:QI 54 [ b+1 ])
])
(const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
 (nil))

maybe for GET_CODE (op) != ASHIFTRT && offset1 == 0