[Bug c/94392] New: Infinite loops are optimized away for C99
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94392

            Bug ID: 94392
           Summary: Infinite loops are optimized away for C99
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 48141
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48141&action=edit
Source code reproducing the issue

John Regehr noticed on Twitter
(https://twitter.com/johnregehr/status/1244335355509129216) that trunk GCC
removes infinite loops for C99, as can be seen by compiling with

    gcc -O3 -std=c99 fermat.c

This behavior was introduced when -ffinite-loops was enabled at -O2. This is
fine for C11, but infinite loops do not invoke undefined behavior in C99, so
the optimization should not be enabled by default for -std=c99.
[Bug tree-optimization/81388] Incorrect code generation with -O1 -fno-strict-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388

krister.walfridsson at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krister.walfridsson at gmail dot com

--- Comment #5 from krister.walfridsson at gmail dot com ---
Comment #1 says that "-fno-strict-overflow Does not effect Pointers", but the
manual says for -fstrict-overflow:

  "This option also allows the compiler to assume strict pointer semantics:
  given a pointer to an object, if adding an offset to that pointer does not
  produce a pointer to the same object, the addition is undefined. This
  permits the compiler to conclude that p + u > p is always true for a
  pointer p and unsigned integer u. This assumption is only valid because
  pointer wraparound is undefined, as the expression is false if p + u
  overflows using twos complement arithmetic."

At least, I read this as saying that -fno-strict-overflow permits pointer
overflow too. Is that incorrect? If so, then I think the manual should be
corrected or clarified.
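
For readers who want to avoid depending on either reading of the manual, here is a minimal sketch of a bounds check that never forms p + u in the first place. The function name range_fits and its precondition are assumptions made for illustration; they do not come from the bug report or from GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if len bytes starting at p fit before end.
   Assumed precondition: p and end point into the same array and p <= end. */
bool range_fits(const char *p, const char *end, size_t len)
{
    /* Compare lengths instead of pointers. end - p is well defined for two
       pointers into the same array, whereas p + len may leave the array and
       is then undefined regardless of how the hardware would wrap. */
    return len <= (size_t)(end - p);
}
```

Written this way, the check gives the same answer whether or not the compiler assumes that pointer arithmetic cannot wrap.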
[Bug c/80852] Optimisation fails to recognise sum computed by loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852

krister.walfridsson at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krister.walfridsson at gmail dot com

--- Comment #3 from krister.walfridsson at gmail dot com ---
This is related to (or a dup of) tree-optimization/46186
[Bug target/80600] hidden symbol `__cpu_model' is referenced by DSO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80600

krister.walfridsson at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krister.walfridsson at gmail dot com

--- Comment #7 from krister.walfridsson at gmail dot com ---
Yes, it works with GCC 6, and it used to work with GCC 7. My guess is that it
started to fail with r243219.

I'm at a conference the rest of this week, but I'll fix this as soon as I'm
back.
[Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

--- Comment #5 from krister.walfridsson at gmail dot com ---
I have extracted a smaller test case. The loops are generated from

    typedef mersenne_twister_engine<
        uint_fast32_t,
        32, 624, 397, 31,
        0x9908b0dfUL, 11,
        0xffffffffUL, 7,
        0x9d2c5680UL, 15,
        0xefc60000UL, 18, 1812433253UL> mt19937;

and the expansion of the template ends up with loops like

    void foo(unsigned long *M)
    {
      for (unsigned long k = 0; k < 227; ++k)
        {
          unsigned long y = ((M[k] & 0x8000) | (M[k + 1] & 0x7fff));
          M[k] = (M[k + 397] ^ (y >> 1) ^ ((y & 1) ? 2567483615 : 0));
        }
    }

which generates the dump described in the bug report.
[Bug tree-optimization/80520] [7/8 Regression] Performance regression from missing if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

--- Comment #3 from krister.walfridsson at gmail dot com ---
You can see the issue in the generated code with

    int foo(std::mt19937 &gen)
    {
        std::uniform_int_distribution<int> dist(0,99);
        return dist(gen);
    }

too. That is, it is not just an artifact of the uninteresting use in the
benchmarking loop.
[Bug tree-optimization/80520] New: Performance regression from missing if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520

            Bug ID: 80520
           Summary: Performance regression from missing if-conversion
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---
            Target: x86_64-linux-gnu

Created attachment 41266
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41266&action=edit
Test case demonstrating the problem

The following test case, from a CppCon 2016 talk benchmarking different
randomization constructs,

    #include <random>

    void foo(std::mt19937 &gen)
    {
        for (int i = 0; i < 10; ++i)
        {
            std::uniform_int_distribution<int> dist(0,99);
            volatile auto x = dist(gen);
        }
    }

runs much slower when compiled with gcc 8.0 (r247084) than with gcc 6.3:

    gcc 6.3.0: 3.9s
    gcc 8.0.0: 7.7s

(compiled as "g++ -O3" on x86_64-linux-gnu). The benchmark is silly, but it
indicates that the heuristics for the branch optimizations could be improved.

The difference is that the .optimized dump generated by gcc 6.3.0 contains
code segments of the form

    _32 = __y_27 & 1;
    iftmp.1_33 = _32 != 0 ? 2567483615 : 0;
    _34 = _31 ^ iftmp.1_33;
    MEM[base: _97, offset: 0B] = _34;
    ivtmp.35_100 = ivtmp.35_101 + 8;
    if (_94 == ivtmp.35_100)
      goto ;
    else
      goto ;

where iftmp.1_33 is generated as a cmov, while the same code compiled by
gcc 8.0.0 looks like

    _102 = __y_60 & 1;
    if (_102 != 0)
      goto ; [50.00%]
    else
      goto ; [50.00%]

     [49.50%]:
    _98 = _103 ^ 2567483615;
    MEM[base: _105, offset: 0B] = _98;
    ivtmp.33_91 = ivtmp.33_43 + 8;
    if (_47 == ivtmp.33_91)
      goto ; [1.01%]
    else
      goto ; [98.99%]

     [49.50%]:
    MEM[base: _105, offset: 0B] = _103;
    ivtmp.33_44 = ivtmp.33_43 + 8;
    if (ivtmp.33_44 == _47)
      goto ; [1.01%]
    else
      goto ; [98.99%]

and the CPU mispredicts the branch generated from "if (_102 != 0)".
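
To make the difference between the two dumps concrete, here is a small sketch of the select in question written both as a conditional expression and as an explicit mask. The function names twist_branchy and twist_masked are made up for this illustration, and nothing here claims to be what GCC emits for either form; the compiler is free to turn either one into a cmov, a mask, or a branch.

```c
#include <stdint.h>

/* Conditional form, as in the source: may become a cmov or a branch. */
uint32_t twist_branchy(uint32_t x, uint32_t y)
{
    return x ^ ((y & 1u) ? 2567483615u : 0u);
}

/* Mask form: -(y & 1) is all ones when the low bit of y is set and zero
   otherwise, so the constant is selected without a data-dependent branch
   in the source. */
uint32_t twist_masked(uint32_t x, uint32_t y)
{
    uint32_t mask = -(y & 1u);  /* converts to 0xffffffff or 0 */
    return x ^ (mask & 2567483615u);
}
```

Both functions compute the same value for every input, which is what makes if-conversion a legal transformation here; whether it is profitable depends on how predictable the low bit of y is.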
[Bug tree-optimization/79721] New: Scalar evolution introduces signed overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79721

            Bug ID: 79721
           Summary: Scalar evolution introduces signed overflow
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

The function

    int foo(int a, int b)
    {
      int sum = 0;
      for (int i = 0; i < 6; i++)
        {
          sum += a + i * b;
        }
      return sum;
    }

is transformed to

    int _11;
    int _12;
    int _13;
    int _16;

    [...]

    _16 = b_7(D) + a_8(D);
    _13 = _16 * 5;
    _12 = b_7(D) * 1799910001;
    _11 = _12 + _13;
    sum_17 = _11 + a_8(D);
    return sum_17;

by scalar evolution when compiled as "gcc -O3 -c bug.c".

The original function could calculate foo(-3, 2) without any signed integer
overflow, but the transformed function will overflow in the multiplication
_12.
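
As a sketch of how a closed form can avoid the problem: summing a + i*b for i = 0..5 gives 6*a + 15*b, and evaluating that in unsigned arithmetic cannot invoke undefined behavior because unsigned operations wrap. The function foo_closed_form below is a hypothetical illustration of that idea, not the transformation GCC performs and not the eventual fix.

```c
/* The loop from the report, unchanged. */
int foo(int a, int b)
{
    int sum = 0;
    for (int i = 0; i < 6; i++) {
        sum += a + i * b;
    }
    return sum;
}

/* Hypothetical closed form: sum over i = 0..5 of (a + i*b) = 6*a + 15*b.
   The arithmetic is done on unsigned values, which wrap instead of
   overflowing. Converting an out-of-range result back to int is
   implementation-defined (GCC reduces it modulo 2^N), not undefined. */
int foo_closed_form(int a, int b)
{
    unsigned ua = (unsigned)a;
    unsigned ub = (unsigned)b;
    return (int)(6u * ua + 15u * ub);
}
```

For the input from the report, foo(-3, 2), both functions return 12, and the closed form gets there without any signed intermediate that could overflow.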
[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390 --- Comment #3 from krister.walfridsson at gmail dot com --- Correction: -fno-split-paths does not help the trunk compiler. But it restores the result when using the r242550 compiler...
[Bug tree-optimization/79390] 10% performance drop in SciMark2 LU after r242550
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390 --- Comment #2 from krister.walfridsson at gmail dot com --- No, I get the same reduced performance when using -fno-split-paths
[Bug tree-optimization/79390] New: 10% performance drop in SciMark2 LU after r242550
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79390

            Bug ID: 79390
           Summary: 10% performance drop in SciMark2 LU after r242550
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 40677
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40677&action=edit
The relevant source code and generated asm before/after this change

The dense LU matrix factorization test from the old SciMark2
(http://math.nist.gov/scimark) used in the Phoronix compiler test suite has
regressed 10% compared to the November trunk when run on Intel i7 6800K
Broadwell (compiled with "-O3 -march=native"). GCC 6 generated much slower
code, so this is not a regression compared to released versions of the
compiler.

The regression was introduced in r242550:

    r242550 | wschmidt | 2016-11-17 15:22:17 +0100 (tor, 17 nov 2016) | 18 lines

    [gcc]

    2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
                Richard Biener  <rguent...@suse.de>

            PR tree-optimization/77848
            * tree-if-conv.c (tree_if_conversion): Always version loops
            unless the user specified -ftree-loop-if-convert.

    [gcc/testsuite]

    2016-11-17  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
                Richard Biener  <rguent...@suse.de>

            PR tree-optimization/77848
            * gfortran.dg/vect/pr77848.f: New test.

and has the effect that the pivot-finding loop

    int LU_factor(int M, int N, double **A, int *pivot)
    {
        int minMN = M < N ? M : N;
        int j=0;

        for (j=0; j<minMN; j++)
        {
            /* find pivot in column j and test for singularity. */
            int jp=j;
            int i;

            double t = fabs(A[j][j]);
            for (i=j+1; i<M; i++)
            {
                double ab = fabs(A[i][j]);
                if ( ab > t)
                {
                    jp = i;
                    t = ab;
                }
            }

            pivot[j] = jp;
            ...

is transformed.

The perf output seems to say that this is due to bad branch prediction, but I
do not understand x86 assembler well enough to determine its cause (or to say
whether it really is a bug or just some random thing the compiler cannot know
about...)
[Bug tree-optimization/79389] New: 30% performance regression in SciMark2 MonteCarlo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79389

            Bug ID: 79389
           Summary: 30% performance regression in SciMark2 MonteCarlo
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

Created attachment 40676
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40676&action=edit
The relevant source code and generated assembler before/after this change

The MonteCarlo test from the old SciMark2 (http://math.nist.gov/scimark) used
in the Phoronix compiler test suite has regressed 30% compared to GCC 6.3 when
run on Intel i7 6800K Broadwell (compiled with "-O3 -march=native").

The regression was introduced in r238005:

    r238005 | rguenth | 2016-07-05 15:25:47 +0200 (tis, 05 jul 2016) | 7 lines

    2016-07-05  Richard Biener  <rguent...@suse.de>

            * gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):
            Handle empty else block.
            (is_feasible_trace): Likewise.
            (split_paths): Likewise.

and has the effect that the if-statement in the loop

    for (count=0; count<Num_samples; count++)
    {
        double x= Random_nextDouble(R);
        double y= Random_nextDouble(R);

        if ( x*x + y*y <= 1.0)
            under_curve ++;
    }

is changed from a cmov to a branch, which mispredicts.
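
For illustration, the counting loop can also be written so that the comparison result is added directly, which leaves no conditional in the source. This is only a sketch under stated assumptions: the arrays xs and ys are hypothetical stand-ins for the Random_nextDouble(R) calls so that the code is self-contained, and the function names are invented here. It says nothing about which instructions a given GCC version selects.

```c
/* Branching form, mirroring the SciMark2 loop. xs and ys are hypothetical
   pre-generated samples replacing the Random_nextDouble(R) calls. */
int count_under_curve_branchy(const double *xs, const double *ys, int n)
{
    int under_curve = 0;
    for (int count = 0; count < n; count++) {
        double x = xs[count];
        double y = ys[count];
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return under_curve;
}

/* Branch-free form in the source: a comparison yields 0 or 1 in C, so it can
   be added to the counter directly. */
int count_under_curve_branchless(const double *xs, const double *ys, int n)
{
    int under_curve = 0;
    for (int count = 0; count < n; count++) {
        double x = xs[count];
        double y = ys[count];
        under_curve += (x * x + y * y <= 1.0);
    }
    return under_curve;
}
```

With uniformly random samples the condition is true roughly 78.5% of the time (pi/4), which is the kind of pattern a branch predictor handles poorly and where a cmov or an add of the comparison result tends to do better.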
[Bug c++/79205] New: ICE in create_tmp_var, at gimple-expr.c:473
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79205

            Bug ID: 79205
           Summary: ICE in create_tmp_var, at gimple-expr.c:473
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

gcc version 7.0.1 20170124 (r244846) ICEs when compiling the following (using
the command line "g++ -c -std=c++1z bug.cpp"):

    #include <tuple>

    int foo(std::tuple<int> t)
    {
        auto [x0] = t;
        return x0;
    }
[Bug middle-end/78847] New: pointer arithmetic from c++ ranged-based for loop not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78847

            Bug ID: 78847
           Summary: pointer arithmetic from c++ ranged-based for loop not
                    optimized
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

GCC has some problems eliminating overhead from C++ range-based for loops.
Consider the program

    #include
    #include
    #include
    using string_view = std::experimental::string_view;

    class Foo {
        constexpr static size_t Length = 9;
        char ascii_[Length];
    public:
        Foo();
        string_view view() const {
            return string_view(ascii_, Length);
        }
    };

    void testWithLoopValue(const Foo foo, size_t ptr, char *buf_) {
        for (auto c : foo.view())
            buf_[ptr++] = c;
    }

compiled as

    g++ -O3 -S -std=c++1z k.cpp

ldist determines that this is a memcpy whose length is expressed as _14:

    _18 = (unsigned long) [(void *) + 9B];
    _4 = _ + 1;
    _3 = (unsigned long) _4;
    _16 = _18 + 1;
    _14 = _16 - _3;

and dom3 improves this to

    _18 = (unsigned long) [(void *) + 9B];
    _3 = (unsigned long) [(void *) + 1B];
    _16 = _18 + 1;
    _14 = _16 - _3;

But this is not further simplified to 9 until combine, where it is too late,
and a call to memcpy is generated instead of the expected inlined version.
[Bug tree-optimization/78343] New: Loop is not eliminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78343

            Bug ID: 78343
           Summary: Loop is not eliminated
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krister.walfridsson at gmail dot com
  Target Milestone: ---

GCC 6 and trunk generate inefficient code for the loop

    unsigned int test(unsigned int quant)
    {
      unsigned int sum = 0;
      for (unsigned int i = 0; i < quant; ++i){
        sum += quant;
      }
      return sum;
    }

as noted in the tweet
https://twitter.com/lefticus/status/797593368037642244?s=09.

This is a regression introduced in r233207; GCC used to generate

    test:
            movl    %edi, %eax
            imull   %edi, %eax
            ret

before r233207, and it generates a meaningless loop

    test:
            testl   %edi, %edi
            je      .L4
            xorl    %edx, %edx
    .L3:
            addl    $1, %edx
            cmpl    %edx, %edi
            jne     .L3
            movl    %edi, %eax
            imull   %edi, %eax
            ret
    .L4:
            xorl    %eax, %eax
            ret

after that change.
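
The expected result of the optimization can be sketched at the source level: adding quant to the sum quant times is the same as quant * quant in unsigned (wrapping) arithmetic. The name test_closed_form is made up for this illustration.

```c
/* The loop from the report, unchanged. */
unsigned int test(unsigned int quant)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < quant; ++i) {
        sum += quant;
    }
    return sum;
}

/* Hypothetical closed form: quant additions of quant equal quant * quant,
   and both forms wrap identically because unsigned arithmetic is modular. */
unsigned int test_closed_form(unsigned int quant)
{
    return quant * quant;
}
```

This matches the two-instruction movl/imull sequence GCC produced before r233207; the loop left behind after that change computes nothing that the final multiply does not already provide.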
[Bug tree-optimization/78035] Inconsistency between address comparison and alias analysis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78035

krister.walfridsson at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krister.walfridsson at gmail dot com

--- Comment #9 from krister.walfridsson at gmail dot com ---
Doesn't this just introduce more inconsistencies in the compiler? For example,

    extern int a;
    extern int b;

    int foo(void)
    {
        a = 1;
        b = 5;
        a++;
        return &a != &b;
    }

optimizes to

    foo:
            movl    $a, %eax
            movl    $5, b(%rip)
            movl    $2, a(%rip)
            cmpq    $b, %rax
            setne   %al
            movzbl  %al, %eax
            ret

That is, the accesses to a and b are optimized as if they are distinct, even
though the compiler keeps the comparison of the addresses.

I cannot think of a reasonable use case where you must handle comparisons of
the addresses as currently implemented while allowing other optimizations as
if the objects are distinct, so I'd say the bug from the original description
is that we were "being too conservative in bar"...
[Bug fortran/48244] iso-c-binding support missing on NetBSD (with patch)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48244

--- Comment #2 from krister.walfridsson at gmail dot com ---
> --- Comment #1 from Dominique d'Humieres dominiq at lps dot ens.fr ---
> Is there still maintainers/users of NetBSD?

There are still users. But my paperwork is not in order since I changed
employer some years ago, so I am not allowed to commit anything... :(

   /Krister