[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2023-02-17 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #11 from Gabriel Ravier  ---
It appears that this has been fixed on trunk, I think?

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:5ca2400270e985f9b33d93007f4d831299b9bda7

commit r11-6471-g5ca2400270e985f9b33d93007f4d831299b9bda7
Author: Jakub Jelinek 
Date:   Tue Jan 5 16:33:29 2021 +0100

match.pd: Improve (A / (1 << B)) -> (A >> B) optimization [PR96930]

The following patch improves the A / (1 << B) -> A >> B simplification,
as seen in the testcase, if there is unnecessary widening for the division,
we just optimize it into a shift on the widened type, but if the lshift
is widened too, there is no reason to do that, we can just shift it in the
original type and convert after.  The tree_nonzero_bits & wi::mask check
already ensures it is fine even for signed values.

I've split the vr-values optimization into a separate patch as it causes
a small regression on two testcases, but this patch fixes what has been
reported in the PR alone.

2021-01-05  Jakub Jelinek  

PR tree-optimization/96930
* match.pd ((A / (1 << B)) -> (A >> B)): If A is extended
from narrower value which has the same type as 1 << B, perform
the right shift on the narrower value followed by extension.

* g++.dg/tree-ssa/pr96930.C: New test.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Created attachment 49873
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49873&action=edit
gcc11-pr96930.patch

Untested fix.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #8 from Jakub Jelinek  ---
The optimization is there, but just has different conditions:
/* Although it would be tempting to shorten always here, that
   loses on some targets, since the modulo instruction is
   undefined if the quotient can't be represented in the
   computation mode.  We shorten only if unsigned or if
   dividing by something we know != -1.  */
shorten = (TYPE_UNSIGNED (TREE_TYPE (orig_op0))
           || (TREE_CODE (op1) == INTEGER_CST
               && !integer_all_onesp (op1)));
in C, and
  /* When dividing two signed integers, we have to promote to int.
 unless we divide by a constant != -1.  Note that default
 conversion will have been performed on the operands at this
 point, so we have to dig out the original type to find out if
 it was unsigned.  */
  tree stripped_op1 = tree_strip_any_location_wrapper (op1);
  shorten = ((TREE_CODE (op0) == NOP_EXPR
              && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
             || (TREE_CODE (stripped_op1) == INTEGER_CST
                 && ! integer_all_onesp (stripped_op1)));
So, in C++ we only try to shorten divisions (and modulo) if the first operand
has been extended, rather than the second one.
Anyway, if we optimize this in the middle-end, there will be no need to change
the FEs.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #7 from Jakub Jelinek  ---
Oh, C++, I was trying C.  Apparently this optimization is done by the C FE
only.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-03 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #6 from Gabriel Ravier  ---
For this exact code:

unsigned f(unsigned a, unsigned b)
{
return a / (unsigned long long)(1U << b);
}

compiled with a trunk-based GCC built yesterday for x86-64-linux-gnu configured
with: ../gcc-trunk-20210102/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build

with the specific compiler command line of `g++ -g -o
/tmp/compiler-explorer-compiler202103-4524-15mdddx.f4b4g/output.s -masm=intel
-S -fdiagnostics-color=always -O3
/tmp/compiler-explorer-compiler202103-4524-15mdddx.f4b4g/example.cpp`, I get:

f(unsigned int, unsigned int):
  mov eax, edi
  mov ecx, esi
  shr rax, cl
  ret

whereas LLVM uses `shr eax, cl` instead.

See also https://godbolt.org/z/Gqza7v for the exact setup I used (in case I
missed something and you'd rather not wait for me to answer).

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #5 from Jakub Jelinek  ---
I can't reproduce that.
I get:
movl    %edi, %eax
movl    %esi, %ecx
shrl    %cl, %eax
ret
for that function, and LLVM emits the same code with the first two insns
swapped.
Only in baz above I get:
movl    %edi, %eax
movl    %esi, %ecx
shrq    %cl, %rax
ret
which is the 64-bit shift instead (but I doubt there is any difference on
x86_64 beyond one byte of code size).

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-03 Thread gabravier at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #4 from Gabriel Ravier  ---
> The testcase seems to be optimized into return a >> b; and already e.g. GCC 
> 4.4 does that.
> So it is unclear why this has been reported and what difference you found.

What I observed is that it is optimized into `return (uint64_t)a >> b;` (where
`unsigned` is 32-bit), not `return a >> b;`.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #3 from Jakub Jelinek  ---
I guess it could be either optimized using a match.pd pattern like we have for:
/* Convert (outertype)((innertype0)a+(innertype1)b)
   into ((newtype)a+(newtype)b) where newtype
   is the widest mode from all of these.  */
(for op (plus minus mult rdiv)
 (simplify
   (convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2)))
but for the *_div and trunc_mod/floor_mod cases the C/C++ FEs optimize using
shorten_binary_op without the outer convert, instead requiring that at least
one of the operands is actually converted from a narrower type, that the
other one fits into the range of that narrower type, and, for signed div/mod,
that the second operand is known not to be -1.  Or we could do it instead in:
simplify_using_ranges::simplify_div_or_mod_using_ranges
with the same rules.  Any preferences?

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2021-01-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
The testcase seems to be optimized into return a >> b; and already e.g. GCC 4.4
does that.
So it is unclear why this has been reported and what difference you found.

That said, given:
unsigned
foo (unsigned a, unsigned b)
{
  return a / (unsigned long long) (1U << b);
}

unsigned
bar (unsigned a, unsigned b)
{
  return a / (1U << b);
}

unsigned
baz (unsigned a, unsigned b)
{
  unsigned long long c = 1U << b;
  return a / c;
}

I see that we optimize foo and bar into a >> b, via the match.pd pattern:
/* (A / (1 << B)) -> (A >> B).
   Only for unsigned A.  For signed A, this would not preserve rounding
   toward zero.
   For example: (-1 / ( 1 << B)) !=  -1 >> B.
   Also handle widening conversions, like:
   (A / (unsigned long long) (1U << B)) -> (A >> B)
   or
   (A / (unsigned long long) (1 << B)) -> (A >> B).
   If the left shift is signed, it can be done only if the upper bits
   of A starting from shift's type sign bit are zero, as
   (unsigned long long) (1 << 31) is -2147483648ULL, not 2147483648ULL,
   so it is valid only if A >> 31 is zero.  */
but for baz we actually perform the shift in the wider mode unnecessarily,
because both operands are zero-extended from 32 bits.

Given:
unsigned
qux (unsigned a, unsigned b)
{
  unsigned long long c = a;
  unsigned long long d = b;
  return c / d;
}

unsigned
corge (unsigned a, unsigned b)
{
  return ((unsigned long long) a) / (unsigned long long) b;
}
we only optimize it in corge to return a / b; and not in qux, so some
fold-const optimization is not done on GIMPLE.

[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift

2020-09-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-09-04
 Status|UNCONFIRMED |NEW
 Blocks||19987

--- Comment #1 from Richard Biener  ---
Thanks for the report.  Note there's the PR19987 meta-bug that all your
expression-simplification reports should 'block' (not so much the
target-specific ones; maybe case by case).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
[Bug 19987] [meta-bug] fold missing optimizations in general