[Bug tree-optimization/96930] Failure to optimize out arithmetic with bigger size when it can't matter with division transformed into right shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #11 from Gabriel Ravier ---
It appears this is fixed on trunk, I think?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #10 from CVS Commits ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:5ca2400270e985f9b33d93007f4d831299b9bda7

commit r11-6471-g5ca2400270e985f9b33d93007f4d831299b9bda7
Author: Jakub Jelinek 
Date:   Tue Jan 5 16:33:29 2021 +0100

    match.pd: Improve (A / (1 << B)) -> (A >> B) optimization [PR96930]

    The following patch improves the A / (1 << B) -> A >> B simplification.
    As seen in the testcase, if there is unnecessary widening for the
    division, we just optimize it into a shift on the widened type, but if
    the lshift is widened too, there is no reason to do that: we can just
    shift in the original type and convert afterwards.  The
    tree_nonzero_bits & wi::mask check already ensures it is fine even for
    signed values.
    I've split the vr-values optimization into a separate patch as it
    causes a small regression on two testcases, but this patch alone fixes
    what has been reported in the PR.

    2021-01-05  Jakub Jelinek  

    PR tree-optimization/96930
    * match.pd ((A / (1 << B)) -> (A >> B)): If A is extended from a
    narrower value which has the same type as 1 << B, perform the right
    shift on the narrower value followed by extension.

    * g++.dg/tree-ssa/pr96930.C: New test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Jakub Jelinek changed:
           What    |Removed                       |Added
           Status  |NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek ---
Created attachment 49873
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49873&action=edit
gcc11-pr96930.patch

Untested fix.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #8 from Jakub Jelinek ---
The optimization is there, but just has different conditions:

	  /* Although it would be tempting to shorten always here, that
	     loses on some targets, since the modulo instruction is
	     undefined if the quotient can't be represented in the
	     computation mode.  We shorten only if unsigned or if
	     dividing by something we know != -1.  */
	  shorten = (TYPE_UNSIGNED (TREE_TYPE (orig_op0))
		     || (TREE_CODE (op1) == INTEGER_CST
			 && !integer_all_onesp (op1)));

in C, and

	/* When dividing two signed integers, we have to promote to int.
	   unless we divide by a constant != -1.  Note that default
	   conversion will have been performed on the operands at this
	   point, so we have to dig out the original type to find out if
	   it was unsigned.  */
	tree stripped_op1 = tree_strip_any_location_wrapper (op1);
	shorten = ((TREE_CODE (op0) == NOP_EXPR
		    && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (op0, 0))))
		   || (TREE_CODE (stripped_op1) == INTEGER_CST
		       && ! integer_all_onesp (stripped_op1)));

in C++.  So, in C++ we only try to shorten divisions (and modulo) if the first
operand has been extended, rather than the second one.  Anyway, if we optimize
this in the middle-end, there will be no need to change the FEs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #7 from Jakub Jelinek ---
Oh, C++, I was trying C.  Apparently this optimization is done by the C FE
only.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #6 from Gabriel Ravier ---
For this exact code:

unsigned f(unsigned a, unsigned b)
{
    return a / (unsigned long long)(1U << b);
}

compiled with a trunk-based GCC built yesterday for x86-64-linux-gnu,
configured with:

../gcc-trunk-20210102/configure --prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap --enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d --enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto --enable-plugins --enable-threads=posix --with-pkgversion=Compiler-Explorer-Build

and the specific compiler command line

g++ -g -o /tmp/compiler-explorer-compiler202103-4524-15mdddx.f4b4g/output.s -masm=intel -S -fdiagnostics-color=always -O3 /tmp/compiler-explorer-compiler202103-4524-15mdddx.f4b4g/example.cpp

I get:

f(unsigned int, unsigned int):
        mov     eax, edi
        mov     ecx, esi
        shr     rax, cl
        ret

whereas LLVM uses `shr eax, cl` instead.  See also https://godbolt.org/z/Gqza7v
for the exact setup I used (in case I missed something and you would rather not
wait for me to answer).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #5 from Jakub Jelinek ---
I can't reproduce that.  I get:

        movl    %edi, %eax
        movl    %esi, %ecx
        shrl    %cl, %eax
        ret

for that function, and LLVM emits the same code with the first two insns
swapped.  Only in baz above do I get:

        movl    %edi, %eax
        movl    %esi, %ecx
        shrq    %cl, %rax
        ret

which is the 64-bit shift instead (but I doubt there is any difference on
x86_64 except the 1 byte of code size).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #4 from Gabriel Ravier ---
> The testcase seems to be optimized into return a >> b; and already e.g.
> GCC 4.4 does that.
> So it is unclear why this has been reported and what difference you found.

What I observed is that it is optimized into `return (uint64_t)a >> b;`
(where `unsigned` is 32-bit), not `return a >> b;`.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

--- Comment #3 from Jakub Jelinek ---
I guess it could be optimized either using a match.pd pattern like the one we
have for:

/* Convert (outertype)((innertype0)a+(innertype1)b) into ((newtype)a+(newtype)b)
   where newtype is the widest mode from all of these.  */
(for op (plus minus mult rdiv)
 (simplify
  (convert (op:s@0 (convert1?@3 @1) (convert2?@4 @2)))

but for the *_div and trunc_mod/floor_mod cases the C/C++ FEs optimize using
shorten_binary_op without the outer convert, instead requiring that at least
one of the operands is actually converted from a narrower type, that the other
one fits into the range of that narrower type, and that for signed div/mod the
second operand is known not to be -1; or it could be done instead in
simplify_using_ranges::simplify_div_or_mod_using_ranges with the same rules.
Any preferences?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Jakub Jelinek changed:
           What|Removed |Added
           CC  |        |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
The testcase seems to be optimized into return a >> b; and already e.g. GCC
4.4 does that.  So it is unclear why this has been reported and what
difference you found.

That said, given:

unsigned foo (unsigned a, unsigned b)
{
  return a / (unsigned long long) (1U << b);
}

unsigned bar (unsigned a, unsigned b)
{
  return a / (1U << b);
}

unsigned baz (unsigned a, unsigned b)
{
  unsigned long long c = 1U << b;
  return a / c;
}

I see that while we optimize foo and bar into a >> b, by the:

/* (A / (1 << B)) -> (A >> B).
   Only for unsigned A.  For signed A, this would not preserve rounding
   toward zero.  For example: (-1 / ( 1 << B)) != -1 >> B.
   Also handle widening conversions, like:
   (A / (unsigned long long) (1U << B)) -> (A >> B)
   or
   (A / (unsigned long long) (1 << B)) -> (A >> B).
   If the left shift is signed, it can be done only if the upper bits
   of A starting from shift's type sign bit are zero, as
   (unsigned long long) (1 << 31) is -2147483648ULL, not 2147483648ULL,
   so it is valid only if A >> 31 is zero.  */

match.pd simplification, for baz we actually perform the shift in the wider
mode unnecessarily, because both operands are zero-extended from 32 bits.

Given:

unsigned qux (unsigned a, unsigned b)
{
  unsigned long long c = a;
  unsigned long long d = b;
  return c / d;
}

unsigned corge (unsigned a, unsigned b)
{
  return ((unsigned long long) a) / (unsigned long long) b;
}

we only optimize it to return a / b; in corge and not in qux, so some
fold-const optimization is not done on GIMPLE.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96930

Richard Biener changed:
           What           |Removed    |Added
           Ever confirmed |0          |1
           Last reconfirmed|          |2020-09-04
           Status         |UNCONFIRMED|NEW
           Blocks         |           |19987

--- Comment #1 from Richard Biener ---
Thanks for the report.  Note there's the PR19987 meta-bug that all your
expression-simplification reports should 'block' (not so much the
target-specific ones; maybe case-by-case).

Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
[Bug 19987] [meta-bug] fold missing optimizations in general