[Bug tree-optimization/122749] [16 Regression] signed to unsigned type conversions inserted during vectorization blocking MLA since r16-3328-g3182e95eda4

2026-01-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Tamar Christina  ---
Fixed.

[Bug tree-optimization/122749] [16 Regression] signed to unsigned type conversions inserted during vectorization blocking MLA since r16-3328-g3182e95eda4

2026-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:cb40e813b8f09f9d3a6000901f1373b476a20886

commit r16-7057-gcb40e813b8f09f9d3a6000901f1373b476a20886
Author: Tamar Christina 
Date:   Tue Jan 27 09:12:16 2026 +

middle-end: teach convert_mult_to_fma handle casts between addend and multiplicant [PR122749]

The following example

int foo2 (char *buf, int len) {
    int x;
    for (int i = 0; i < len; i++) {
        x += (int) i * buf[i];
    }
    return x;
}

compiled with -O3 -mcpu=neoverse-v2 used to generate a 4x unrolled MLA
sequence

mla z29.s, p7/m, z2.s, z0.s
mla z27.s, p7/m, z4.s, z26.s
mla z30.s, p7/m, z1.s, z0.s
mla z28.s, p7/m, z23.s, z3.s

but now generates MUL + ADD

mul z2.s, z2.s, z1.s
mul z4.s, z4.s, z26.s
mul z1.s, z24.s, z1.s
mul z3.s, z23.s, z3.s
add z29.s, z2.s, z29.s
add z30.s, z1.s, z30.s
add z28.s, z3.s, z28.s
add z0.s, z4.s, z0.s

This happens because, since r16-3328-g3182e95eda4, we now insert casts around
the reduction addend.  These casts cause convert_mult_to_fma to miss the
mul + add sequence.

This patch teaches it to look through the casts on the operands and to accept
the conversions only if they are essentially sign-changing operations.
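To make "essentially only a sign change" concrete, here is a minimal sketch in
C.  It is not the code from tree-ssa-math-opts.cc: struct int_type and the
function names are hypothetical, the width test is a simplified assumption
about what such a check looks like, and int is assumed to be 32 bits.  The two
multiply-add helpers show why a same-width sign change is harmless: the low 32
bits of acc + a * b are the same whether the operands are viewed as signed or
unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified description of an integer type.  */
struct int_type
{
  unsigned precision;   /* width in bits, e.g. 32 */
  bool is_unsigned;
};

/* Return true if converting FROM to TO keeps the width, so that the
   conversion can at most change the signedness.  */
static bool
sign_change_only_p (struct int_type from, struct int_type to)
{
  return from.precision == to.precision;
}

/* acc + a * b in unsigned arithmetic, wrapping modulo 2^32.  Assumes int
   is 32 bits, so the operands are not promoted to a wider signed type.  */
static uint32_t
mla_unsigned (uint32_t acc, uint32_t a, uint32_t b)
{
  return acc + a * b;
}

/* The same multiply-add on signed inputs, computed exactly in 64 bits
   and then truncated to the low 32 bits.  */
static uint32_t
mla_signed_bits (int32_t acc, int32_t a, int32_t b)
{
  int64_t wide = (int64_t) acc + (int64_t) a * (int64_t) b;
  return (uint32_t) (uint64_t) wide;
}
```

For any inputs, mla_signed_bits (acc, a, b) equals mla_unsigned applied to the
same values cast to uint32_t.  A widening conversion such as char to int
changes the precision and would not pass sign_change_only_p.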

Concretely, it converts:

  # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
  vect__3.8_38 = MEM  [(char *)_16];
  vect__4.12_45 = (vector([4,4]) int) vect__3.8_38;
  vect__5.14_54 = vect__4.12_45 * vect_vec_iv_.13_49;
  vect_x_12.17_62 = VIEW_CONVERT_EXPR(vect__5.14_54);
  vect_x_12.17_63 = VIEW_CONVERT_EXPR(vect_x_16.15_58);
  vect_x_12.17_64 = vect_x_12.17_62 + vect_x_12.17_63;
  vect_x_12.16_65 = VIEW_CONVERT_EXPR(vect_x_12.17_64);

into:

  # vect_vec_iv_.13_49 = PHI <_50(5), { 0, 1, 2, ... }(4)>
  vect__3.8_38 = MEM  [(charD.8 *)_16];
  vect__4.12_45 = (vector([4,4]) intD.7) vect__3.8_38;
  vect_x_12.17_63 = VIEW_CONVERT_EXPR(vect_x_16.15_58);
  _2 = (vector([4,4]) unsigned int) vect_vec_iv_.13_49;
  _1 = (vector([4,4]) unsigned int) vect__4.12_45;
  vect_x_12.17_64 = .FMA (_1, _2, vect_x_12.17_63);
  vect_x_12.16_65 = VIEW_CONVERT_EXPR(vect_x_12.17_64);

thus restoring FMAs on reductions.
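As a rough scalar model of the rewritten sequence, the sketch below compares
the original signed loop with a version that casts both multiplication
operands to unsigned and accumulates with a single multiply-add, mirroring
_1, _2 and the .FMA call in the dump.  The function names are hypothetical,
and the accumulator is initialized to 0 here (an assumption, unlike foo2) so
that the result is well defined.

```c
/* Reference loop: same shape as foo2, accumulating in signed int.  */
static int
dot_index_signed (const char *buf, int len)
{
  int x = 0;
  for (int i = 0; i < len; i++)
    x += i * buf[i];
  return x;
}

/* Scalar model of the rewritten form: both multiplication operands are
   cast to unsigned, the multiply-add is done in unsigned arithmetic, and
   the accumulator is viewed as int again at the end.  */
static int
dot_index_unsigned (const char *buf, int len)
{
  unsigned int acc = 0;
  for (int i = 0; i < len; i++)
    {
      unsigned int a = (unsigned int) (int) buf[i];   /* like _1 */
      unsigned int b = (unsigned int) i;              /* like _2 */
      acc = a * b + acc;                              /* like .FMA (_1, _2, acc) */
    }
  return (int) acc;
}
```

Both functions return the same value whenever the signed loop does not
overflow, for example 20 for the buffer { 1, 2, 3, 4 }.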

gcc/ChangeLog:

PR tree-optimization/122749
* tree-ssa-math-opts.cc (convert_mult_to_fma_1, convert_mult_to_fma):
Unwrap converts around addend.

gcc/testsuite/ChangeLog:

PR tree-optimization/122749
* gcc.target/aarch64/pr122749_1.c: New test.
* gcc.target/aarch64/pr122749_2.c: New test.
* gcc.target/aarch64/pr122749_3.c: New test.
* gcc.target/aarch64/pr122749_4.c: New test.
* gcc.target/aarch64/pr122749_5.c: New test.
* gcc.target/aarch64/pr122749_6.c: New test.
* gcc.target/aarch64/pr122749_8.c: New test.
* gcc.target/aarch64/pr122749_9.c: New test.
* gcc.target/aarch64/sve/pr122749_1.c: New test.
* gcc.target/aarch64/sve/pr122749_11.c: New test.
* gcc.target/aarch64/sve/pr122749_12.c: New test.
* gcc.target/aarch64/sve/pr122749_13.c: New test.
* gcc.target/aarch64/sve/pr122749_14.c: New test.
* gcc.target/aarch64/sve/pr122749_2.c: New test.
* gcc.target/aarch64/sve/pr122749_3.c: New test.
* gcc.target/aarch64/sve/pr122749_4.c: New test.
* gcc.target/aarch64/sve/pr122749_5.c: New test.
* gcc.target/aarch64/sve/pr122749_6.c: New test.
* gcc.target/aarch64/sve/pr122749_8.c: New test.
* gcc.target/aarch64/sve/pr122749_9.c: New test.

[Bug tree-optimization/122749] [16 Regression] signed to unsigned type conversions inserted during vectorization blocking MLA since r16-3328-g3182e95eda4

2025-12-24 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122749

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[16 Regression] signed to   |[16 Regression] signed to
                   |unsigned type conversions   |unsigned type conversions
                   |inserted during             |inserted during
                   |vectorization blocking MLA  |vectorization blocking MLA
                   |                            |since r16-3328-g3182e95eda4

--- Comment #3 from Tamar Christina  ---
Bisected to

commit 3182e95eda4a1d612b910b3248c997c8acc7add3
Author: Richard Biener 
Date:   Thu Jan 23 14:29:26 2025 +0100

tree-optimization/111494 - reduction vectorization with signed UB

The following makes sure to pun arithmetic that's used in vectorized
reduction to unsigned when overflow invokes undefined behavior.
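To illustrate the motivation, here is a scalar sketch (not GCC code; the
function name is hypothetical) of why a reassociated reduction is carried out
in unsigned arithmetic.  A vectorized reduction adds elements in a different
order than the scalar loop, so a partial sum can overflow even when the
in-order sum does not.  Accumulating in unsigned makes that wraparound well
defined.

```c
/* Sum V[0..N-1] using two partial sums, the way a two-lane vector
   reduction would.  Accumulating in unsigned int makes intermediate
   wraparound well defined; only the final value is viewed as int.  */
static int
sum_two_lanes (const int *v, int n)
{
  unsigned int lane0 = 0, lane1 = 0;
  int i;

  for (i = 0; i + 1 < n; i += 2)
    {
      lane0 += (unsigned int) v[i];
      lane1 += (unsigned int) v[i + 1];
    }
  if (i < n)
    lane0 += (unsigned int) v[i];   /* odd trailing element */

  /* On GCC this conversion is modular, so it yields the true sum
     whenever that sum fits in int.  */
  return (int) (lane0 + lane1);
}
```

With v = { INT_MAX, -1, 1, -1 } the in-order signed sum never overflows, but
lane0 computes INT_MAX + 1, which would be undefined in signed arithmetic.
The unsigned version wraps and still returns INT_MAX - 1.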

PR tree-optimization/111494
* gimple-fold.h (arith_code_with_undefined_signed_overflow):
Declare.
* gimple-fold.cc (arith_code_with_undefined_signed_overflow):
Export.
* tree-vect-stmts.cc (vectorizable_operation): Use unsigned
arithmetic for operations participating in a reduction.

 gcc/gimple-fold.cc |  2 +-
 gcc/gimple-fold.h  |  1 +
 gcc/tree-vect-stmts.cc | 54 ++
 3 files changed, 56 insertions(+), 1 deletion(-)

Will take a look after Christmas.