[Bug c++/68763] [6 Regression] ICE: in verify_unstripped_args, at cp/pt.c:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68763

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
         Resolution|WORKSFORME                  |---

--- Comment #7 from Marek Polacek ---
Yeah, that ICEs.

1123 static void
1124 verify_unstripped_args (tree args)
1125 {
1126   ++processing_template_decl;
1127   if (!any_dependent_template_arguments_p (args))
1128     {
1129       tree inner = INNERMOST_TEMPLATE_ARGS (args);
1130       for (int i = 0; i < TREE_VEC_LENGTH (inner); ++i)
1131         {
1132           tree arg = TREE_VEC_ELT (inner, i);
1133           if (TREE_CODE (arg) == TEMPLATE_DECL)
1134             /* OK */;
1135           else if (TYPE_P (arg))
1136             gcc_assert (strip_typedefs (arg, NULL) == arg);

strip_typedefs (arg, NULL) is:

struct
{
  const struct details_t & account_t:: (const struct account_t *, bool) * __pfn;
  long int __delta;
}

and arg is:

struct
{
  const struct details_t & account_t:: (const struct account_t *, bool) * __pfn;
  long int __delta;
}
[Bug c/68908] inefficient code for _Atomic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #6 from Martin Sebor ---
(In reply to Jakub Jelinek from comment #2)
> Doesn't seem to be ppc64le specific in any way, and doesn't affect just
> preincrement.

The inefficiency I was pointing out was the redundant syncs above the loop on powerpc64. The x86_64 assembly looks fairly efficient both ways.

I also intentionally focused the bug on the increment expression and didn't mention others like compound assignment because I expected the former to be more common. But I suppose ++a really should be equally as efficient as a += 1 which shouldn't be any less efficient than a += X for any arbitrary X.

If it's preferable to treat this as a generic opportunity to improve the efficiency of all atomic expressions (perhaps along with those discussed on the Wiki: https://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations) that sounds great.
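For readers who want to compare the generated code themselves, here is a minimal sketch of the three spellings mentioned in the comment. It is an assumption-laden analogue: the bug is about C `_Atomic`, while this uses C++ `std::atomic<int>`, and the function name `increment_three_ways` is made up for illustration. Semantically all three statements are sequentially consistent read-modify-write operations, which is why one would hope for equally efficient code from each.

```cpp
#include <atomic>

// Hypothetical helper: performs the same atomic update in three spellings.
// Each statement is a seq_cst atomic read-modify-write on 'a'.
int increment_three_ways(std::atomic<int> &a)
{
    ++a;          // pre-increment
    a += 1;       // compound assignment with a constant
    int x = 1;
    a += x;       // compound assignment with an arbitrary value
    return a.load();
}
```

Compiling this with -O2 -S on the target of interest and comparing the three sequences should show whether any of them carries extra barriers.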
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248 --- Comment #6 from Jon Beniston --- -fstrict-overflow (which is the default at -O2) tells us that we can assume it will not overflow. Even if it did, on most targets it makes no difference to the result.
[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616

--- Comment #15 from H.J. Lu ---
(In reply to H.J. Lu from comment #14)
> (In reply to H.J. Lu from comment #13)
> > I got
> >
> > FAIL: g++.dg/ipa/pr66616.C -std=gnu++11 execution test
> > FAIL: g++.dg/ipa/pr66616.C -std=gnu++14 execution test
> > FAIL: g++.dg/ipa/pr66616.C -std=gnu++98 execution test
> >
> > on trunk/x86-64.
>
> It fails with -m32 on x86-64 for trunk and gcc-5-branch:
>

It also fails on i686.
[Bug rtl-optimization/67736] Wrong optimization with -fexpensive-optimizations on mips64el
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67736

Steve Ellcey changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |sje at gcc dot gnu.org
      Known to work|                            |5.3.0, 6.0
         Resolution|---                         |FIXED

--- Comment #8 from Steve Ellcey ---
Patch checked in on ToT for 6.0 and on 5.* branch.
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #8 from Andrew Pinski ---
Couldn't it be optimized as:

short func(short *a, int y)
{
  short ret = 0;
  unsigned int tmp = 0;
  int i;
  for(i = 0; i < y; i++)
    tmp += (unsigned int)(int)a[i];
  return (short)tmp;
}

Such that the addition happens in unsigned (so there is only wrapping, which is well defined) and only one truncation happens at the end of the loop.
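The claimed equivalence can be checked with a small sketch. The function names `sum_truncate_each` and `sum_truncate_once` are hypothetical, and the code assumes a 16-bit `short`, a 32-bit `int`, and a compiler such as GCC where narrowing an out-of-range `int` to `short` wraps modulo 2^16. Under those assumptions both loops should return the same value, because the high bits of the running sum never influence the low 16 bits.

```cpp
#include <cstddef>

// Truncate to short on every iteration, as in the loop under discussion.
short sum_truncate_each(const short *a, std::size_t n)
{
    short ret = 0;
    for (std::size_t i = 0; i < n; ++i)
        ret = static_cast<short>(ret + a[i]);  // int addition, then narrowing
    return ret;
}

// Accumulate in unsigned int (which wraps and never overflows),
// then truncate once after the loop.
short sum_truncate_once(const short *a, std::size_t n)
{
    unsigned int tmp = 0;
    for (std::size_t i = 0; i < n; ++i)
        tmp += static_cast<unsigned int>(static_cast<int>(a[i]));
    return static_cast<short>(tmp);
}
```

Feeding both functions an input whose running sum leaves the range of `short`, such as {32767, 1, 32767, -5}, is a quick way to compare them.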
[Bug libfortran/68867] numeric formatting problem in the fortran library
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

--- Comment #13 from Steve Kargl ---
On Tue, Dec 15, 2015 at 06:03:55PM +, seurer at linux dot vnet.ibm.com wrote:
>
> FAIL: gfortran.dg/default_format_denormal_2.f90 -O0 execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90 -O1 execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90 -O2 execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90 -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90 -O3 -g execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90 -Os execution test
>
> I checked with the revision previous to this patch and the revision for this
> patch and the only differences were fmt_g0_7 succeeding and
> default_format_denormal_2 failing.

% svn diff default_format_denormal_2.f90
Index: default_format_denormal_2.f90
===================================================================
--- default_format_denormal_2.f90 (revision 231661)
+++ default_format_denormal_2.f90 (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run { xfail powerpc*-apple-darwin* } }
+! { dg-do run { xfail powerpc*-*-* } }
 ! { dg-require-effective-target fortran_large_real }
 ! Test XFAILed on this platform because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
[Bug c++/68763] [6 Regression] ICE: in verify_unstripped_args, at cp/pt.c:1132
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68763

David Binderman changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dcb314 at hotmail dot com

--- Comment #6 from David Binderman ---
Created attachment 37043
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37043&action=edit
C++ source code, compressed with xz

I can reproduce the problem with the attached C++ source code, using gcc trunk from 20151214.
[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

--- Comment #3 from Yuri Rumyantsev ---
I've prepared a simple fix which cures the ICE. I will send it for review tomorrow.

2015-12-15 12:50 GMT+03:00 jakub at gcc dot gnu.org:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906
>
> Jakub Jelinek changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
>
> --- Comment #2 from Jakub Jelinek ---
> This doesn't look to me like a mere omission to invalidate debug stmts after
> some stmt move that (correctly) has not considered debug stmts when
> determining
> if they should be moved or not, but it looks to me like wrong-code
> transformation.
> Before unswitch, if c is non-zero, we have endless loop, but during
> unswitching
> it is wrongly changed to branch to the bb that returns instead.
> Say if you compile with -O3 (no -g):
> int a;
> volatile int b;
> short c, d;
> int
> fn1 ()
> {
>   int e;
>   for (;;)
>     {
>       a = 3;
>       if (c)
>         continue;
>       e = 0;
>       for (; e > -30; e--)
>         if (b)
>           {
>             int f = e;
>             return d;
>           }
>     }
> }
>
> int
> main ()
> {
>   c = 1;
>   asm volatile ("" : : "m" (c) : "memory");
>   fn1 ();
>   __builtin_abort ();
> }
>
> then before the change this would just hang (expected), now it aborts instead.
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #4 from Jon Beniston ---
Well if it is just truncating the higher bits, why can't it be done at the end of the loop? What do you think will be different if it is done at the end of the loop? Can you think of an example where the value of ret will differ? The MSBs in an add don't affect the LSBs.
[Bug libfortran/68867] numeric formatting problem in the fortran library
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

Jerry DeLisle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Jerry DeLisle ---
Revision 231639 committed to trunk.

2015-12-14  Jerry DeLisle

        PR libfortran/pr68867
        * io/write.c (set_fnode_default): For kind=16, set the decimal
        precision depending on the platform binary precision, 106 or 113.

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231639

Fixed on trunk.
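As a rough illustration of why the two binary precisions call for different decimal precisions, the sketch below applies the usual round-trip rule, ceil(1 + p * log10(2)), to a significand of p bits. This is not the libgfortran code and may not be the rule the patch uses; `round_trip_digits` is a hypothetical helper that only shows how a 106-bit (IBM double-double) and a 113-bit (IEEE binary128) significand end up needing different numbers of decimal digits.

```cpp
#include <cmath>

// Hypothetical helper: number of decimal digits that is enough to print a
// binary floating-point value with 'binary_precision' significand bits and
// read it back unchanged (the same rule that defines max_digits10).
int round_trip_digits(int binary_precision)
{
    return static_cast<int>(std::ceil(1.0 + binary_precision * std::log10(2.0)));
}
```

With this rule, 106 bits gives 33 digits and 113 bits gives 36 digits, compared with 17 for an ordinary 53-bit double.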
[Bug c++/63628] [c++1y] cannot use decltype on captured arg-pack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63628

--- Comment #4 from Jason Merrill ---
(In reply to Paolo Carlini from comment #3)
> The second and third variants work in mainline.

Yes, they were fixed by the patch for bug 68309. We need a further fix to handle the original testcase.
[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

--- Comment #1 from Jonathan Wakely ---
This fixes it:

--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -52,7 +52,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     // we will fall back to spin-waiting. The only thing we could do
     // here on errors is abort.
     int ret __attribute__((unused));
-    ret = syscall (SYS_futex, __addr, futex_wait_op, __val);
+    ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
     _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
     return true;
   }
[Bug c/68908] inefficient code for _Atomic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #7 from Andrew Macleod ---
(In reply to Richard Henderson from comment #4)
> I think we should rather handle this in the front end than with
> quite complex pattern matching.
>
> If we want to do any complex logic with atomics in the middle-end,
> we should change their representation so that we don't have to
> struggle with a sequence of builtins. Which is clearly a subject
> for gcc7 at minimum.

Yes, I think anything more complex than this should be part of an atomics optimization framework using a new set of ATOMIC gimple ops rather than builtins.

For the purpose of this PR we ought to just fix it in the FE.
[Bug middle-end/56934] ICE folding a COND_EXPR involving vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56934

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Marek Polacek ---
Works with all active branches.
[Bug middle-end/62069] [GCC-5] ICE: in int_cst_value, at tree.c:10625
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62069

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Marek Polacek ---
This now passes for me.
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #3 from Steve Ellcey ---
My understanding (I don't have a C/C++ standard handy) is that the addition done by 'ret + a[i]' is done in integer mode (not as short). This results in an integer value that may be outside the range of a short, but in the range of a normal integer. So this is not really an overflow. Then the integer result is assigned to ret, which is short. I believe that the truncation of an integer value (with a value outside the range of a short) to a short is not undefined by the C and C++ standards but has a specific way that it needs to work (truncate off the higher bits). This is the truncation that needs to be done on each loop iteration.
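The promotion described here can be seen directly in a few lines of C++. This is only a sketch: `add_then_narrow` is a made-up name, and the wrap-around result of the narrowing step is implementation-defined before C++20 (GCC documents it as reduction modulo 2^N), so the behaviour for out-of-range sums is stated as an assumption about GCC-like compilers with a 16-bit short.

```cpp
#include <type_traits>

// Hypothetical helper mirroring one iteration of 'ret = ret + a[i]'.
short add_then_narrow(short ret, short x)
{
    // Both operands are promoted to int before the addition, so the sum of
    // two shorts cannot overflow the type it is computed in.
    static_assert(std::is_same<decltype(ret + x), int>::value,
                  "short + short is computed as int");
    int wide = ret + x;
    // The only lossy step is this conversion back to short.
    return static_cast<short>(wide);
}
```

For example, add_then_narrow(32767, 1) computes 32768 as an int without overflow and only then narrows it, which on GCC yields -32768.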
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248 --- Comment #5 from Steve Ellcey --- If we did not truncate ret on each loop iteration then ret could get large enough to overflow the maximum integer value before we truncate it at the end, leading to undefined results. But if we truncate ret on each loop iteration then ret will not overflow and the result is defined.
[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #35 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #26) > Another analysis by Jake in PR54037: Eh, PR 54073.
[Bug libstdc++/68921] New: [5/6 Regression] std::future::wait() makes invalid futex calls and spins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

            Bug ID: 68921
           Summary: [5/6 Regression] std::future::wait() makes invalid futex calls and spins
           Product: gcc
           Version: 5.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
                CC: torvald at gcc dot gnu.org
  Target Milestone: ---
            Target: i?86-*linux*

On 32-bit linux the following spins in a tight loop until it times out:

#include <chrono>
#include <future>
#include <thread>

int main()
{
  std::promise<void> p;
  auto f = p.get_future();
  std::thread t([&p](){
    std::this_thread::sleep_for(std::chrono::seconds(10));
    p.set_value();
  });
  f.wait();
  t.join();
}

strace shows thousands of invalid calls:

futex(0x8cf2a24, FUTEX_WAIT, 2147483648, {4289120584, 134527555}) = -1 EINVAL (Invalid argument)

It's called from the infinite loop in __atomic_futex_unsigned::_M_load_and_test_until in
[Bug rtl-optimization/68920] [6 Regression] Undesirable if-conversion for a rarely taken branch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68920 --- Comment #1 from Uroš Bizjak --- Another incarnation of PR 56309 ?
[Bug middle-end/57348] [TM] ICE for transaction expression in gimplify_expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57348

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #2 from Marek Polacek ---
Still ICEs.
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

Steve Ellcey changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #7 from Steve Ellcey ---
I am still unconvinced but I will change it back to unconfirmed and leave it there in case someone else wants to look at it and/or propose a patch.
[Bug middle-end/63383] internal compiler error: in expand_expr_real_1, at expr.c:9389
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63383

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |mpolacek at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #5 from Marek Polacek ---
This shouldn't ICE anymore, because the testcase is rejected due to:

fatal error: definition of std::initializer_list does not match #include <initializer_list>
[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616

--- Comment #14 from H.J. Lu ---
(In reply to H.J. Lu from comment #13)
> I got
>
> FAIL: g++.dg/ipa/pr66616.C -std=gnu++11 execution test
> FAIL: g++.dg/ipa/pr66616.C -std=gnu++14 execution test
> FAIL: g++.dg/ipa/pr66616.C -std=gnu++98 execution test
>
> on trunk/x86-64.

It fails with -m32 on x86-64 for trunk and gcc-5-branch:

[hjl@gnu-6 gcc]$ /export/build/gnu/gcc-x32-5/build-x86_64-linux/gcc/testsuite/g++/../../xg++ -B/export/build/gnu/gcc-x32-5/build-x86_64-linux/gcc/testsuite/g++/../../ /export/gnu/import/git/sources/gcc-release/gcc/testsuite/g++.dg/ipa/pr66616.C -m32 -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -I/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/include -I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/libsupc++ -I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/include/backward -I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++14 -O2 -fipa-cp-clone -L/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -B/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -L/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs -lm -o ./pr66616.exe
[hjl@gnu-6 gcc]$ ./pr66616.exe
Aborted
[hjl@gnu-6 gcc]$
[Bug libfortran/68867] numeric formatting problem in the fortran library
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

Bill Seurer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #12 from Bill Seurer ---
FAIL: gfortran.dg/fmt_g0_7.f08 -O0 execution test
FAIL: gfortran.dg/fmt_g0_7.f08 -O1 execution test
FAIL: gfortran.dg/fmt_g0_7.f08 -O2 execution test
FAIL: gfortran.dg/fmt_g0_7.f08 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gfortran.dg/fmt_g0_7.f08 -O3 -g execution test
FAIL: gfortran.dg/fmt_g0_7.f08 -Os execution test

The above tests were fixed by the patch but the following tests now fail:

FAIL: gfortran.dg/default_format_denormal_2.f90 -O0 execution test
FAIL: gfortran.dg/default_format_denormal_2.f90 -O1 execution test
FAIL: gfortran.dg/default_format_denormal_2.f90 -O2 execution test
FAIL: gfortran.dg/default_format_denormal_2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gfortran.dg/default_format_denormal_2.f90 -O3 -g execution test
FAIL: gfortran.dg/default_format_denormal_2.f90 -Os execution test

I checked with the revision previous to this patch and the revision for this patch and the only differences were fmt_g0_7 succeeding and default_format_denormal_2 failing.
[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
      Known to work|                            |4.9.3
     Ever confirmed|0                           |1
[Bug tree-optimization/16107] missed optimization with some math function builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16107

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |6.0
         Resolution|---                         |FIXED

--- Comment #8 from Marc Glisse ---
Fixed a few months ago.
[Bug tree-optimization/55180] Missed optimization abs(-x) -> abs(x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55180

Bug 55180 depends on bug 16107, which changed state.

Bug 16107 Summary: missed optimization with some math function builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16107

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug tree-optimization/57600] Turn 2 comparisons into 1 with the min
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57600

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |6.0
         Resolution|---                         |FIXED

--- Comment #7 from Marc Glisse ---
Fixed during stage 1.
[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

Carlos O'Donell changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #2 from Carlos O'Donell ---
(In reply to Jonathan Wakely from comment #1)
> This fixes it:
>
> --- a/libstdc++-v3/src/c++11/futex.cc
> +++ b/libstdc++-v3/src/c++11/futex.cc
> @@ -52,7 +52,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      // we will fall back to spin-waiting. The only thing we could do
>      // here on errors is abort.
>      int ret __attribute__((unused));
> -    ret = syscall (SYS_futex, __addr, futex_wait_op, __val);
> +    ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
>      _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
>      return true;
>    }

That is correct.

futex.2 from draft_futex upstream branch of linux man pages project:
~~~
If the timeout argument is non-NULL, its contents specify a relative timeout for the wait, measured according to the CLOCK_MONOTONIC clock. (This interval will be rounded up to the system clock granularity, and is guaranteed not to expire early.) If timeout is NULL, the call blocks indefinitely.
~~~

I assume you want to block indefinitely.
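To make the role of the fourth argument concrete, here is a minimal Linux-only sketch of a FUTEX_WAIT call that passes the timeout explicitly. The wrapper name `futex_wait_forever` is hypothetical and this is not the libstdc++ code; it only illustrates that, because syscall() is variadic, omitting the timeout leaves whatever happens to be in the argument register, while passing nullptr requests an indefinite wait.

```cpp
#include <cerrno>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical wrapper: wait on *addr while it still holds 'expected'.
// The explicit nullptr is the timeout argument, meaning "block indefinitely".
// Returns 0 on wake-up, or -1 with errno set (EAGAIN if *addr != expected).
long futex_wait_forever(int *addr, int expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, nullptr);
}
```

A safe way to exercise it without blocking is to pass an 'expected' value that differs from the current contents of the word, in which case the kernel returns immediately with EAGAIN.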
[Bug target/68923] New: SSE/AVX movq load (_mm_cvtsi64_si128) not being folded into pmovzx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923

            Bug ID: 68923
           Summary: SSE/AVX movq load (_mm_cvtsi64_si128) not being folded into pmovzx
           Product: gcc
           Version: 5.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

context and background:
http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f

Using intrinsics, I can't find a way to get gcc to emit

    VPMOVZXBD (%rsi), %ymm0   ; 64b load
    VCVTDQ2PS %ymm0, %ymm0

without using _mm_loadu_si128, which will compile to an actual 128b load with -O0. (not counting evil use of #ifndef __OPTIMIZE__ to do it two different ways, of course).

Since there is no intrinsic for PMOVSX / PMOVZX as a load from a narrower memory location, the only way I can see to correctly write this with intrinsics involves _mm_cvtsi64_si128 (MOVQ), which I don't even want the compiler to emit. clang3.6 and ICC13 compile this to the optimal sequence, still folding the load into VPMOVZXBD, but gcc doesn't.

#include <immintrin.h>
#include <stdint.h>
#define USE_MOVQ
__m256 load_bytes_to_m256(uint8_t *p)
{
#ifdef USE_MOVQ
  // compiles to an actual movq then pmovzx xmm,xmm with gcc -O3
  __m128i small_load = _mm_cvtsi64_si128( *(uint64_t*)p );
#else
  // loadu compiles to a 128b load with gcc -O0, potentially segfaulting
  __m128i small_load = _mm_loadu_si128( (__m128i*)p );
#endif
  __m256i intvec = _mm256_cvtepu8_epi32( small_load );
  return _mm256_cvtepi32_ps(intvec);
}

Problem 1: g++ -O3 -march=haswell emits (gcc 5.3.0 on godbolt)

load_bytes_to_m256(unsigned char*):
        vmovq   (%rdi), %xmm0
        vpmovzxbd       %xmm0, %ymm0
        vcvtdq2ps       %ymm0, %ymm0
        ret

Problem 2: gcc and clang don't even provide that movq intrinsic in 32bit mode. (Split into a separate bug, since it's totally separate from the missing optimization issue).
[Bug c++/68922] New: g++ fails to generate code for catch clause with specific optimizations enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68922

            Bug ID: 68922
           Summary: g++ fails to generate code for catch clause with specific optimizations enabled
           Product: gcc
           Version: 5.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alban.lefebvre at gmail dot com
  Target Milestone: ---

Hello,

The following code fails to generate a try/catch clause on g++ 5.2.1, 4.9.2, 4.8.4, 4.8.2:

#include <exception>
#include <iostream>

class Base {
public:
  virtual ~Base() {}
};

class C1 : public virtual Base { };

class C2 : public virtual Base {
public:
  virtual void foo() = 0;
};

class D : public C1, public C2 {
public:
  virtual void foo() { throw std::exception(); }
};

int main() {
  C2 * c2 = new D();
  try {
    c2->foo();
  } catch (...) {
    std::cout << "Caught some exception" << std::endl;
  }
  return 0;
}

when compiled with O2 optimization:

g++ main.cpp -Wall -Wextra -pedantic -O2 && ./a.out

I get the following error when executing it:

terminate called after throwing an instance of 'std::exception'
  what():  std::exception
Aborted (core dumped)

It seems that __cxa_begin_catch, __cxa_end_catch calls do not get generated:

main:
  sub    $0x8,%rsp
  mov    $0x10,%edi
  callq  0x400850 <_Znwm@plt>
  lea    0x8(%rax),%rdi
  movq   $0x400ea0,(%rax)
  movq   $0x400ed8,0x8(%rax)
  callq  0x400b10 <_ZThn8_N1D3fooEv>
  xor    %eax,%eax
  add    $0x8,%rsp
  retq

It looks similar to, but is possibly not the same as, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68184

When we remove the virtual inheritance (by removing Base), the bug doesn't occur anymore.

As for which flags seem to trigger the issue, here is what I've tried:

-O1 => OK
-O2 => BUG
-O3 => BUG
-O1 -ftree-pre -ftree-vrp => BUG
-ftree-pre -ftree-vrp => OK

Thank you
[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921 --- Comment #3 from torvald at gcc dot gnu.org --- LGTM, thanks. Would be nice to backport this.
[Bug target/68924] New: No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

            Bug ID: 68924
           Summary: No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.
           Product: gcc
           Version: 5.3.0
               URL: http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f
            Status: UNCONFIRMED
          Keywords: missed-optimization, ssemmx
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---
            Target: i386-linux-gnu

context and background:
http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923

gcc and clang don't even provide the _mm_cvtsi64_si128 intrinsic for movq in 32bit mode (ICC does, see below). They still provide __m128i _mm_move_epi64(__m128i a), but at -O0 the load of the source __m128i won't fold into the movq, so you'd get an undesired 128b load that could cross a page boundary and segfault.

The lack of this, and lack of an intrinsic for PMOVZX as a load from a narrower source, is a design flaw in the intrinsics, IMO. I think it's super dumb to be forced to use an intrinsic for an instruction I don't want (movq), even if it didn't cause a portability issue for x86-32bit.

Consider trying to get gcc to emit `VPMOVZXBD (%src), %ymm0` for 32bit mode:

#include <immintrin.h>
#include <stdint.h>
__m256 load_bytes_to_m256(uint8_t *p)
{
  __m128i small_load = _mm_cvtsi64_si128( *(uint64_t*)p );
  __m256i intvec = _mm256_cvtepu8_epi32( small_load );
  return _mm256_cvtepi32_ps(intvec);
}

That's the same code as in the other bug report (about the failure to fold the load into a memory source operand for vpmovzx: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923 ), but with the #ifdefs taken out.

_mm_cvtsi64_si128 is the intrinsic for the MOVQ %r/m64, %xmm form of MOVQ. (This is the MOVD/MOVQ entry in Intel's manual). Its non-VEX encoding includes a REX prefix, and even the VEX encoding of it is illegal in 32bit mode (prob. because it couldn't decide if the insn was legal or not until it checked the mod/rm byte to see if it encoded a 64b register source, instead of a 64b memory location). Since the other MOVQ gives identical results, and has a shorter non-VEX encoding, there's no reason to bother with that complexity.

The other MOVQ (the one Intel's insn ref lists under just MOVQ), which can be used for %mm,%mm reg moves, or the low half of %xmm,%xmm regs, only has a __m128i to __m128i intrinsic: __m128i _mm_move_epi64(__m128i a), not a load form (same problem as the pmovz/sx intrinsics).

Other than this design-flaw in the intrinsics, you could see it as only a bug in gcc/clang's implementation, since Intel's own implementation does still make it possible to get MOVQ m64, %xmm emitted in 32bit mode.

ICC13 still provides _mm_cvtsi64_si128 in 32bit mode, and will use the MOVQ xmm, m64 form as a load. If it has a uint64_t in two 32bit registers, it emulates it with 2xMOVD %r32, %xmm and a PUNPCKLDQ. http://goo.gl/LQkVJL. Two 32b stores then a movq load would cause a store-forwarding failure stall. vmovd/vpinsrd would be fewer instructions, but pinsrd is a 2-uop instruction on Intel SnB-family CPUs, so as far as uops they're equal: 3 uops for the shuffle port (port5).

At -O0, ICC emulates it that way even if the value is in memory, with 2x MOVD m32, %xmm and a PUNPCK, so even Intel's compiler "thinks of" the intrinsic as normally being the MOVQ %r/m64, %xmm form, not the MOVQ %xmm/m64, %xmm form.
[Bug target/68923] SSE/AVX movq load (_mm_cvtsi64_si128) not being folded into pmovzx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923

Peter Cordes changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ssemmx
             Target|                            |x86_64-linux-gnu
                URL|                            |http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f

--- Comment #1 from Peter Cordes ---
The other issue (that there's no intrinsic to generate a movq m64, %xmm in 32bit mode), is addressed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924. This bug is just about the optimization failure to fold the load into pmovzx.
[Bug libstdc++/61347] std::distance(list.first(),list.end()) in O(1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61347

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Marc Glisse ---
Not sure why I didn't close it at the time. Probably because of debug mode, but I am pretty sure François made a pass on that later.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947

Bug 53947 depends on bug 57600, which changed state.

Bug 57600 Summary: Turn 2 comparisons into 1 with the min
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57600

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

Marek Polacek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |debug
   Target Milestone|---                         |6.0
            Summary|ICE on valid code at -O3 on |[6 Regression] ICE on valid
                   |x86_64-linux-gnu in         |code at -O3 on
                   |maybe_record_trace_start,   |x86_64-linux-gnu in
                   |at dwarf2cfi.c:2297         |maybe_record_trace_start,
                   |                            |at dwarf2cfi.c:2297
[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek ---
This doesn't look to me like a mere omission to invalidate debug stmts after some stmt move that (correctly) has not considered debug stmts when determining if they should be moved or not, but it looks to me like wrong-code transformation.
Before unswitch, if c is non-zero, we have endless loop, but during unswitching it is wrongly changed to branch to the bb that returns instead.
Say if you compile with -O3 (no -g):

int a;
volatile int b;
short c, d;
int
fn1 ()
{
  int e;
  for (;;)
    {
      a = 3;
      if (c)
        continue;
      e = 0;
      for (; e > -30; e--)
        if (b)
          {
            int f = e;
            return d;
          }
    }
}

int
main ()
{
  c = 1;
  asm volatile ("" : : "m" (c) : "memory");
  fn1 ();
  __builtin_abort ();
}

then before the change this would just hang (expected), now it aborts instead.
[Bug c++/53223] [c++0x] auto&& and operator* don't mix inside templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53223

Paolo Carlini changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |6.0

--- Comment #14 from Paolo Carlini ---
Fixed.
[Bug target/68910] New: SPARC/cypress: Poor code generation, huge stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910

            Bug ID: 68910
           Summary: SPARC/cypress: Poor code generation, huge stack frame
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sebastian.hu...@embedded-brains.de
  Target Milestone: ---

Created attachment 37036
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37036&action=edit
Test case.

The code for the SHA512_Transform() function is very poor for the SPARC cypress target.

sparc-rtems4.12-gcc -c -O2 sha512c.i -mcpu=cypress

sha512c.o: file format elf32-sparc

Disassembly of section .text:

   0: 9d e3 b0 58  save %sp, -4008, %sp
   4: 94 10 20 80  mov 0x80, %o2
   8: 92 10 00 19  mov %i1, %o1
   c: 90 07 bd 80  add %fp, -640, %o0
[...]
 10c: 40 00 00 00  call 10c
 110: 90 07 bd 40  add %fp, -704, %o0
 114: c0 27 bd 20  clr [ %fp + -736 ]
 118: c0 27 bd 24  clr [ %fp + -732 ]
 11c: c0 27 bd 10  clr [ %fp + -752 ]
 120: c0 27 bd 14  clr [ %fp + -748 ]
 124: c0 27 bd 08  clr [ %fp + -760 ]
 128: c0 27 bd 0c  clr [ %fp + -756 ]
 12c: c0 27 bd 00  clr [ %fp + -768 ]
 130: c0 27 bd 04  clr [ %fp + -764 ]
 134: c0 27 bc f8  clr [ %fp + -776 ]
 138: c0 27 bc fc  clr [ %fp + -772 ]
[...]

Compared to v8:

sparc-rtems4.12-gcc -c -O2 sha512c.i -mcpu=v8

   0: 9d e3 bc b8  save %sp, -840, %sp
   4: 94 10 20 80  mov 0x80, %o2
   8: 92 10 00 19  mov %i1, %o1
   c: 90 07 bd 80  add %fp, -640, %o0
  10: 40 00 00 00  call 10
  14: f0 27 a0 44  st %i0, [ %fp + 0x44 ]
[...]

No massive clr instructions.
[Bug rtl-optimization/66248] subreg truncation not hoisted from loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #2 from Jon Beniston ---
Hi Steve. I'm not sure I follow your explanation.

As I understand it, signed overflow is undefined behaviour (http://www.airs.com/blog/archives/120), so I'm not sure why we need to worry about changing the overflow behaviour (as the 16 LSBs should be the same).

Even if not, -fstrict-overflow should be enabled at -O2, so the compiler should be able to assume that overflow will not occur anyway.
[Bug debug/58315] [4.9/5 Regression] Excessive memory use with -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58315 Bug 58315 depends on bug 66688, which changed state. Bug 66688 Summary: [6 Regression] compare debug failure building Linux kernel on ppc64le https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66688 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug debug/66688] [6 Regression] compare debug failure building Linux kernel on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66688 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Jakub Jelinek --- Fixed.
[Bug testsuite/68629] FAIL: c-c++-common/attr-simd-3.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68629 --- Comment #4 from Christophe Lyon --- (In reply to Thomas Preud'homme from comment #3) > Hi Christophe, > > Could you paste the output of arm linux when compiling the testcase in > cilkplus effective target with -fcilkplus? The output is now: xgcc: error: libcilkrts.spec: No such file or directory Before your patch, compiling attr-simd-3 produced an error message, but the test passed nonetheless: error: '#pragma omp declare simd' or 'simd' attribute cannot be used in the same function marked as a Cilk Plus SIMD-enabled function
[Bug c++/68782] [6 regression] bad reference member formed with constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68782 --- Comment #4 from Jakub Jelinek --- (In reply to Jason Merrill from comment #3) > Hmm, any element without TREE_CONSTANT should have caused us to return > the original CONSTRUCTOR. Perhaps the TREE_SIDE_EFFECTS stuff is not needed, but for TREE_CONSTANT perhaps the reason is that constexpr.c has different POV on what is a constant compared to ../tree.c - at least it seems that cxx_eval_constant_expression happily accepts >c as *non_constant_p = false, but if CONSTRUCTOR containing that is marked TREE_CONSTANT (which is IMHO wrong, because in that case all elements should be TREE_CONSTANT), then we e.g. trigger: case CONSTRUCTOR: if (TREE_CONSTANT (t)) /* Don't re-process a constant CONSTRUCTOR, but do fold it to VECTOR_CST if applicable. */ return fold (t); r = cxx_eval_bare_aggregate (ctx, t, lval, non_constant_p, overflow_p); break; and just fold it instead of calling cxx_eval_bare_aggregate on it. > I thought there was already a function to recompute these flags, but I'm > not finding it. I can't find it either, we have that only for ADDR_EXPR it seems - recompute_tree_invariant_for_addr_expr.
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

--- Comment #3 from Marek Polacek ---
(In reply to Chengnian Sun from comment #2)
> Is it related to this recently fixed bug?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67778

Doesn't look like it, this one has been caused by:

commit a1965220d5ae62c617abfe40e1dc5c03bb7aa38f
Author: law
Date: Sat Nov 7 06:31:14 2015 +

    [PATCH] Remove more backedge threading support

    * tree-ssa-threadedge.c (dummy_simplify): Remove.
    (thread_around_empty_blocks): Remove backedge_seen_p argument.
    If we thread to a backedge, then return false. Update recursive
    call to eliminate backedge_seen_p argument.
    (thread_through_normal_block): Remove backedge_seen_p argument.
    Remove backedge_seen_p argument from calls to
    thread_around_empty_blocks. Remove checks on backedge_seen_p.
    If we thread to a backedge, then return 0.
    (thread_across_edge): Remove bookkeeping for backedge_seen. Don't
    pass it to thread_through_normal_block or
    thread_through_empty_blocks. For joiner handling, if we see a
    backedge, do not try normal threading.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229911 138bc75d-0d04-0410-961f-82ee72b054a4
[Bug c/68845] -Werror=array-bounds=[12] doesn't turn warning into error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68845 --- Comment #3 from Franz Sirl --- Created attachment 37035 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37035=edit Alias -Warray-bounds to Warray-bounds= Tentative patch, no regressions. Please commit if OK, I don't have valid credentials anymore.
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909 --- Comment #5 from Jakub Jelinek --- This started with r229911, but it must be some RTL optimization bug instead.
[Bug target/68910] SPARC/cypress: Poor code generation, huge stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910 Sebastian Huber <sebastian.hu...@embedded-brains.de> changed: What|Removed |Added Known to fail||6.0 --- Comment #2 from Sebastian Huber <sebastian.hu...@embedded-brains.de> --- sparc-rtems4.12-gcc (GCC) 6.0.0 20151215 (experimental)
[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506 Paolo Carlini changed: What|Removed |Added CC||paolo.carlini at oracle dot com --- Comment #7 from Paolo Carlini --- This is fixed in mainline. I'm adding the reduced testcases and closing the bug.
[Bug c/68909] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909 Marek Polacek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 CC||mpolacek at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Marek Polacek --- The backtrace looks the same as in PR65496.
[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

Marek Polacek changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
                 CC|                            |ienkovich at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org
          Component|c                           |tree-optimization
   Target Milestone|---                         |6.0
            Summary|ICE at -O3 on               |[6 Regression] ICE at -O3
                   |x86_64-linux-gnu:           |on x86_64-linux-gnu:
                   |verify_ssa failed           |verify_ssa failed
     Ever confirmed|0                           |1

--- Comment #1 from Marek Polacek ---
Confirmed, started with:

commit a361141865247626a73c0f2257a95bc7d4f274c9
Author: ienkovich
Date: Thu Oct 8 13:14:09 2015 +

    gcc/

    * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
    "cfghooks.h", add prototypes for introduced new functions.
    (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
    checks on ability of loop unswitching to tree_unswitch_single_loop;
    invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
    on innermost loop check.
    (tree_unswitch_single_loop): Add all required checks on ability of
    loop unswitching under zero recursive level guard.
    (tree_unswitch_outer_loop): New function.
    (find_loop_guard): Likewise.
    (empty_bb_without_guard_p): Likewise.
    (used_outside_loop_p): Likewise.
    (get_vop_from_header): Likewise.
    (hoist_guard): Likewise.
    (check_exit_phi): Likewise.

    gcc/testsuite/

    * gcc.dg/loop-unswitch-2.c: New test.
    * gcc.dg/loop-unswitch-3.c: Likewise.
    * gcc.dg/loop-unswitch-4.c: Likewise.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228599 138bc75d-0d04-0410-961f-82ee72b054a4
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909 --- Comment #4 from Marek Polacek --- Thus, not a dup of PR65496.
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909 Chengnian Sun changed: What|Removed |Added CC||chengniansun at gmail dot com --- Comment #2 from Chengnian Sun --- (In reply to Marek Polacek from comment #1) > The backtrace looks the same as in PR65496. Is it related to this recently fixed bug? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67778
[Bug target/68908] inefficient code for _Atomic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

Jakub Jelinek changed:

           What    |Removed                     |Added
             Target|powerpc64                   |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-12-15
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |jsm28 at gcc dot gnu.org,
                   |                            |mpolacek at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org
            Summary|inefficient code for an     |inefficient code for
                   |atomic preincrement on      |_Atomic operations
                   |powerpc64le                 |
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek ---
Doesn't seem to be ppc64le specific in any way, and doesn't affect just preincrement.
Try:

typedef _Atomic int AI;
AI i;

void fn1 (AI * ai) { ++*ai; }
void fn2 (AI * ai) { (*ai)++; }
void fn3 (AI * ai) { *ai += 6; }
void fn4 (void) { ++i; }
void fn5 (void) { i++; }
void fn6 (void) { i += 2; }

and you'll see even on x86_64-linux that all the sequences use the generic CAS instructions instead of __atomic_fetch_add etc.

The comment above build_atomic_assign even says this:
"Also note that the compiler is simply issuing the generic form of the atomic operations."

So, the question is, should we add smarts to the FE to optimize the cases already when emitting them (this would be similar to what omp-low.c does when expanding #pragma omp atomic, see:

  /* When possible, use specialized atomic update functions. */
  if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
      && store_bb == single_succ (load_bb)
      && expand_omp_atomic_fetch_op (load_bb, addr, loaded_val, stored_val, index))
    return;

), or should we add some pattern matching in some pass that would try to detect these rather complicated patterns like:

  :
  _5 = __atomic_load_4 (ai_3(D), 5);
  _6 = (int) _5;
  D.1768 = _6;

  :
  # prephitmp_17 = PHI <_6(2), pretmp_16(4)>
  _9 = prephitmp_17 + 1;
  _10 = (unsigned int) _9;
  _12 = __atomic_compare_exchange_4 (ai_3(D), , _10, 0, 5, 5);
  if (_12 != 0)
    goto ;
  else
    goto ;

  :
  pretmp_16 = D.1768;
  goto ;

(with the casts in there optional) and convert those to the more efficient __atomic_* calls if possible?

Note one issue is that the pattern involves non-SSA loads/stores (the D.1768 var above) and we'd need to prove that the var is used only in those two places and nowhere else.
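For readers less familiar with the two forms being compared, here is a rough C++ sketch using std::atomic rather than C11 _Atomic. The function names are invented for illustration and this is not what the front end emits verbatim. The first function mirrors the generic load / compare-exchange / retry loop from the dump (memory order 5 there is __ATOMIC_SEQ_CST); the second is the specialized fetch-and-add that the comment would prefer:

```cpp
#include <atomic>

// Generic form: load, compute, compare-exchange, retry on failure.
// Assumes the value is not INT_MAX, since 'old + 1' is plain signed
// arithmetic here.
int preinc_cas(std::atomic<int>& a) {
    int old = a.load(std::memory_order_seq_cst);
    while (!a.compare_exchange_weak(old, old + 1, std::memory_order_seq_cst)) {
        // On failure 'old' has been refreshed with the current value; retry.
    }
    return old + 1;
}

// Specialized form: a single atomic fetch-and-add.
int preinc_fetch_add(std::atomic<int>& a) {
    return a.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```

Both return the incremented value, so they are interchangeable for ++*ai; the difference is only in the instruction sequence the compiler can pick.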
[Bug c++/21802] Two-stage name lookup fails for operators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21802 Paolo Carlini changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |6.0 --- Comment #7 from Paolo Carlini --- Fixed then.
[Bug testsuite/68629] FAIL: c-c++-common/attr-simd-3.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68629 --- Comment #5 from Christophe Lyon --- After discussion on IRC, it seems better to keep your patch as-is, since cilk-plus is not supported on arm anyway.
[Bug target/68910] SPARC/cypress: Poor code generation, huge stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910 --- Comment #1 from Sebastian Huber --- Code generation for leon3 is also quite bad.
[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506

--- Comment #8 from paolo at gcc dot gnu.org ---
Author: paolo
Date: Tue Dec 15 10:18:13 2015
New Revision: 231646

URL: https://gcc.gnu.org/viewcvs?rev=231646=gcc=rev
Log:
2015-12-15 Paolo Carlini

        PR c++/63506
        * g++.dg/cpp0x/pr63506-1.C: New.
        * g++.dg/cpp0x/pr63506-2.C: Likewise.

Added:
    trunk/gcc/testsuite/g++.dg/cpp0x/pr63506-1.C
    trunk/gcc/testsuite/g++.dg/cpp0x/pr63506-2.C
Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506 Paolo Carlini changed: What|Removed |Added Status|NEW |RESOLVED CC|paolo.carlini at oracle dot com| Resolution|--- |FIXED Target Milestone|--- |6.0 --- Comment #9 from Paolo Carlini --- Done.
[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||segher at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- I think this testcase just shows that the PR67778 fix is insufficient. We again have a complex cfg full of various loops, followed by a single bb (bb10 in this case) that needs frame pointer. Again, we see: Attempting shrink-wrapping optimization. Block 10 needs the prologue. After wrapping required blocks, PRO is now 10 Avoiding non-duplicatable blocks, PRO is now 10 Bumping back to anticipatable blocks, PRO is now 6 where putting prologue at the entry of bb10 is fine, but putting it at the entry of bb6 (shrink-wrapping actually creates bb11 with the prologue and redirects edges from bb8 and bb5 to the new bb11 and bb11 then falls through to bb6) is wrong. While bb5 is only reachable from bbs before the prologue, so that is fine, bb8 is reachable both from bb2 (i.e. from bbs before the prologue), but also from bb9, which is dominated by bb6. So, by incorrectly putting the prologue at the start of bb6 (well, that bb self-loops, so it is put on the other edges), we then can take path from ENTRY -> bb2 -> bb3 -> bb5 -> bb11[prologue] -> bb6 -> bb7 -> bb9 -> bb8 -> bb11[prologue] and enter the prologue 2 times (or more times).
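The "enter the prologue 2 times" problem boils down to the prologue block sitting on a cycle. A minimal sketch of that check follows; it is not GCC's shrink-wrapping code. The CFG contains only the edges named in the comment above, with block 0 standing in for ENTRY, and the helper names are made up:

```cpp
#include <map>
#include <set>
#include <vector>

using Cfg = std::map<int, std::vector<int>>;

// True if some path of one or more edges leads from 'start' back to 'start'.
bool reaches_itself(const Cfg& cfg, int start) {
    std::set<int> seen;
    std::vector<int> work;
    auto first = cfg.find(start);
    if (first != cfg.end())
        work = first->second;  // begin with the successors of 'start'
    while (!work.empty()) {
        int bb = work.back();
        work.pop_back();
        if (bb == start)
            return true;
        if (!seen.insert(bb).second)
            continue;  // already explored
        auto it = cfg.find(bb);
        if (it != cfg.end())
            work.insert(work.end(), it->second.begin(), it->second.end());
    }
    return false;
}

// Only the edges mentioned in the comment; 0 is ENTRY, 11 is the new
// prologue block created by shrink-wrapping.
Cfg edges_from_comment() {
    return {
        {0, {2}}, {2, {3, 8}}, {3, {5}}, {5, {11}}, {11, {6}},
        {6, {6, 7}}, {7, {9}}, {9, {8}}, {8, {11}},
    };
}
```

With these edges, reaches_itself(edges_from_comment(), 11) is true via bb11 -> bb6 -> bb7 -> bb9 -> bb8 -> bb11, which is the path that executes the prologue more than once.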
[Bug rtl-optimization/67715] [6 Regression][ARM] ICE in cselib.c during reload_cse_regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67715 Jakub Jelinek changed: What|Removed |Added Status|NEW |RESOLVED CC||jakub at gcc dot gnu.org Resolution|--- |FIXED --- Comment #3 from Jakub Jelinek --- Assuming fixed then.
[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477 Jakub Jelinek changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED --- Comment #6 from Jakub Jelinek --- Assuming fixed.
[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477 Jakub Jelinek changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- And presumably fixed for real with r228662 ?
[Bug libstdc++/68863] Regular expressions: Backreferences don't work in negative lookahead
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68863 Jonathan Wakely changed: What|Removed |Added Target Milestone|--- |4.9.4 Known to fail||4.9.3, 5.3.0
[Bug c/68911] [6 Regression] wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911

--- Comment #2 from Jakub Jelinek ---
This goes wrong during vrp1.

Analyzing # of iterations of loop 2
  exit condition [e_6, + , 1] <= 93
  bounds on difference of bases: -4294967202 ... 93
  result:
    zero if e_6 > 94
    # of iterations 94 - e_6, bounded by 94

looks wrong to me, e_6 as well as the additions and comparison are performed in unsigned type, therefore 94 - e_6 is I believe not bounded by 94. The value ranges for e_6 clearly allow (and in the testcase are) some very large unsigned numbers, so 94 - e_6.

If assuming the value of f is arbitrary (it is not), then the possible values of e before entering the while (e < 94) e++; loop are either 2, 94, 0xffffffffU or 0xfffffffeU (of course f is not arbitrary and as b and d are both 0, it will be actually 0xffffffffU each time). But from those 4 numbers the number of iterations would be bound by 96.
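The wrap-around is easy to see in isolation. The sketch below is illustrative only: the function names are invented and a 32-bit unsigned type is assumed. It compares the real trip count of while (e < 94) e++; with the expression 94 - e_6 evaluated in the unsigned type:

```cpp
#include <cstdint>

// Actual number of iterations of: while (e < 94) e++;
std::uint32_t count_iterations(std::uint32_t e) {
    std::uint32_t n = 0;
    while (e < 94) {
        ++e;
        ++n;
    }
    return n;
}

// The closed form from the dump, computed in the unsigned type.
// It wraps modulo 2^32 whenever e > 94.
std::uint32_t closed_form(std::uint32_t e) {
    return 94u - e;
}
```

For e = 2 both give 92. For e = ~0u the loop runs zero times while the closed form wraps to 95, and for e = ~1u it wraps to 96, so the expression exceeds 94 for exactly the large values that ~(b && d) can produce.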
[Bug c/68911] New: wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911

            Bug ID: 68911
           Summary: wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chengniansun at gmail dot com
  Target Milestone: ---

The current gcc trunk miscompiles the following code on x86_64-linux-gnu in both 32- and 64-bit modes at -Os and above.

$: gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 6.0.0 20151214 (experimental) [trunk revision 231607] (GCC)
$:
$: gcc-trunk -Os -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O2 -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O3 -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O0 -w -m32 small.c ; timeout -s 9 10 ./a.out
$: gcc-trunk -O1 -w -m32 small.c ; timeout -s 9 10 ./a.out
$:
$: gcc-trunk -Os -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O2 -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O3 -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$:
$: cat small.c
char a;
int b, c;
short d;

int main() {
  unsigned e = 2;
  for (; c < 2; c++) {
    int f = ~e / 7;
    if (f)
      a = e = ~(b && d);
    while (e < 94)
      e++;
  }
  return 0;
}
$:
[Bug libstdc++/68912] New: Wrong value category used in _Bind functor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68912

            Bug ID: 68912
           Summary: Wrong value category used in _Bind functor
           Product: gcc
           Version: 4.9.4
            Status: UNCONFIRMED
          Keywords: rejects-valid
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

From https://gcc.gnu.org/ml/libstdc++/2015-12/msg00035.html

The _Bind class function-call operator looks something like this:

  template()(_Mu<_Bound_args>()( std::declval<_Bound_args&>(), std::declval&>() )... ) )>
  _Result operator()(_Args&&... __args) { ... }

The problem is that std::declval returns an rvalue reference, but the functor is invoked in an lvalue context. As a result, the following (valid) code will fail to compile:

#include <functional>

struct B {};
struct C {};

struct A {
  B operator()(int, double, char) & { return B(); }
  C operator()(int, double, char) && { return C(); }
};

int main() {
  A a;
  auto bound = std::bind(a, 5, 4.3, 'c');
  auto res = bound();
}
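The value-category mismatch can be shown without std::bind at all. The sketch below does not reproduce libstdc++'s _Bind; it reuses the A/B/C types from the report and adds two hypothetical aliases to show which overload each form of declval selects:

```cpp
#include <type_traits>
#include <utility>

struct B {};
struct C {};

struct A {
    B operator()(int, double, char) &  { return B(); }
    C operator()(int, double, char) && { return C(); }
};

// declval<A>() is an rvalue, so the &&-qualified overload is chosen.
using RvalueResult = decltype(std::declval<A>()(5, 4.3, 'c'));

// declval<A&>() is an lvalue, matching how the stored functor is invoked.
using LvalueResult = decltype(std::declval<A&>()(5, 4.3, 'c'));

static_assert(std::is_same<RvalueResult, C>::value, "rvalue call yields C");
static_assert(std::is_same<LvalueResult, B>::value, "lvalue call yields B");
```

A return type computed from the rvalue form (C) therefore disagrees with what the lvalue call in the body actually returns (B), which is consistent with the rejects-valid keyword on the report.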
[Bug c++/63628] [c++1y] cannot use decltype on captured arg-pack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63628 --- Comment #3 from Paolo Carlini --- The second and third variants work in mainline.
[Bug c++/68071] Generic lambda variadic argument pack cannot be empty
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68071 Paolo Carlini changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 CC|vittorio.romeo at outlook dot com | Ever confirmed|0 |1
[Bug tree-optimization/68862] [6 Regression] g++.dg/torture/pr59163.C FAILs with -flive-range-shrinkage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68862 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 CC||jakub at gcc dot gnu.org, ||uros at gcc dot gnu.org, ||vmakarov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Jakub Jelinek --- Started with r229086. That said, I think it looks like an i386 backend problem. I believe for pre-AVX we rely on unaligned loads/stores to be done with unspecs (UNSPEC_LOADU/UNSPEC_STOREU and that way make sure those don't leak into arithmetic instructions which pre-AVX can't handle unaligned memory operands. But on this testcase those aren't used, because the load and store aren't performed in some vector mode, but in TImode instead (as that is the mode of the structure). So we have: (insn 6 2 8 2 (set (reg:TI 90 [ *a_4(D) ]) (mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32])) pr68862.c:15 84 {*movti_internal} (expr_list:REG_EQUIV (mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32]) (nil))) (insn 8 6 9 2 (set (reg:V4SF 92) (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S16 A128])) pr68862.c:17 1221 {*movv4sf_internal} (expr_list:REG_EQUIV (const_vector:V4SF [ (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) ]) (nil))) (insn 9 8 12 2 (set (reg:V4SF 91 [ vect__7.7 ]) (mult:V4SF (reg:V4SF 92) (subreg:V4SF (reg:TI 90 [ *a_4(D) ]) 0))) pr68862.c:17 1436 {*mulv4sf3} (expr_list:REG_DEAD (reg:V4SF 92) (expr_list:REG_DEAD (reg:TI 90 [ *a_4(D) ]) (nil (insn 12 9 17 2 (set (mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32]) (subreg:TI (reg:V4SF 91 [ vect__7.7 ]) 0)) pr68862.c:18 84 {*movti_internal} (expr_list:REG_DEAD (reg:V4SF 91 [ vect__7.7 ]) (expr_list:REG_DEAD (reg/v/f:DI 89 [ a ]) (nil in *.ira, which is still not invalid according to the current rules, but then LRA changes it into: (insn 8 6 9 2 (set (reg:V4SF 21 xmm0 [92]) (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 
0x2]) [2 S16 A128])) pr68862.c:17 1221 {*movv4sf_internal} (expr_list:REG_EQUIV (const_vector:V4SF [ (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) (const_double:SF 6.0e+0 [0x0.cp+3]) ]) (nil))) (insn 9 8 12 2 (set (reg:V4SF 21 xmm0 [orig:91 vect__7.7 ] [91]) (mult:V4SF (reg:V4SF 21 xmm0 [92]) (mem:V4SF (reg/v/f:DI 5 di [orig:89 a ] [89]) [1 *a_4(D)+0 S16 A32]))) pr68862.c:17 1436 {*mulv4sf3} (nil)) (insn 12 9 17 2 (set (mem:TI (reg/v/f:DI 5 di [orig:89 a ] [89]) [1 *a_4(D)+0 S16 A32]) (reg:TI 21 xmm0 [orig:91 vect__7.7 ] [91])) pr68862.c:18 84 {*movti_internal} (nil)) Not sure what to do about this though, most of the SSE* arithmetic instructions use nonimmediate_operand or similar predicates, we'd have to switch all of them to use some other predicate that for pre-AVX would disallow misaligned_operand.
[Bug target/24012] [4.9/5/6 regression] #define _POSIX_C_SOURCE breaks #include
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24012 Jonathan Wakely changed: What|Removed |Added Target Milestone|5.4 |7.0 --- Comment #18 from Jonathan Wakely --- The (fairly large) changes needed to fix this didn't happen for gcc 5, or gcc6, adjusting target milestone.
[Bug tree-optimization/68862] [6 Regression] g++.dg/torture/pr59163.C FAILs with -flive-range-shrinkage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68862 --- Comment #3 from Zdenek Sojka --- (In reply to Jakub Jelinek from comment #2) > Started with r229086. > That said, I think it looks like an i386 backend problem. True, I have 7 FAILs of pr59163.C on x86 (x86_64 and *x32), but none on other architectures.
[Bug middle-end/68870] [6 Regression] ICE on valid code at -O1, -O2 and -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68870

--- Comment #8 from Arseny Solokha ---
I believe the following reproducer is for the issue reported here. Its further minimization yields backtrace listed in #c0. The only difference is that w/ the following not minimized snippet gcc ICEs in tree_nop_conversion_p(tree_node const*, tree_node const*) when compiling it at -O1:

int w3, ao, k9, nl;

static int
oy(void)
{
  static int pe;
  int ht;

  for (ao = 0; ao < 1; ++ao)
    for (ht = 0; ht < 1; ++ht)
      for (w3 = 0; w3 < 1; ++w3)
        for (k9 = 0; k9 < 1; ++k9)
          if (pe < 1)
            return 0;
  return (ht > 1) || nl;
}

int
ct(void)
{
  return oy();
}
[Bug target/66171] [6 Regression]: gcc.target/cris/biap.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66171 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P4 CC||jakub at gcc dot gnu.org
[Bug c++/68903] missing default initialization of member when combined with virtual inheritance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68903 Jonathan Wakely changed: What|Removed |Added Keywords||wrong-code Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 Summary|missing default |missing default |initialization of member|initialization of member |when combined with virtual |when combined with virtual |imheritance |inheritance Ever confirmed|0 |1 Known to fail||4.9.3, 5.3.0, 6.0
[Bug target/68896] [ARM] target attribute ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68896

--- Comment #2 from chrbr at gcc dot gnu.org ---
Currently not a bug, or rather implementation-specified behaviour. According to the documentation:

6.61.15 Function Specific Option Pragmas
#pragma GCC target ("string"...)
...
Each function that is defined after this point is as if attribute((target("STRING"))) was specified for that function

So here we have

#pragma GCC target ("fpu=vfp")
...
int8x8_t __attribute__ ((target("fpu=neon"))) my

so "my" is defined as if attribute((target("fpu=vfp"))) was specified.

Now, IMHO this is not intuitive: since the target attribute has a smaller scope, it should have a higher priority. And the doc doesn't say whether the target attribute is inserted before or after the existing ones in case of conflict.

So literally not a bug, but I'd like to specify the order of insertion to solve your current issue.
[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616 --- Comment #12 from Martin Jambor --- No, I'm still in the process of testing a slightly modified patch for 4.9.
[Bug c++/68903] missing default initialization of member when combined with virtual imheritance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68903 Jonathan Wakely changed: What|Removed |Added Severity|blocker |normal
[Bug c/68911] [6 Regression] wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 CC||amker at gcc dot gnu.org, ||jakub at gcc dot gnu.org Target Milestone|--- |6.0 Summary|wrong code at -Os and above |[6 Regression] wrong code |on x86-64-linux-gnu (in 32- |at -Os and above on |and 64-bit modes) |x86-64-linux-gnu (in 32- ||and 64-bit modes) Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek --- Started with r224020.
[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477

--- Comment #7 from Renlin Li ---
(In reply to Jakub Jelinek from comment #4)
> The ICE has been on
> (insn 105 746 971 5 (parallel [
>     (set (reg:V16QI 60 d22 [720])
>         (unspec:V16QI [
>             (reg:V16QI 60 d22 [720])
>             (reg:V16QI 60 d22 [720])
>         ] UNSPEC_VTRN1))
>     (set (reg:V16QI 60 d22 [720])
>         (unspec:V16QI [
>             (reg:V16QI 60 d22 [720])
>             (reg:V16QI 60 d22 [720])
>         ] UNSPEC_VTRN2))
>     ]) pr67477.c:63 1972 {*neon_vtrnv16qi_insn}
>     (nil))
> which was clearly invalid RTL, multiple sets of the same register. The insn
> was still ok in the *.ira dump and broken in *.reload dump.
> (define_insn "*neon_vtrn_insn"
>   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
>         (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
>                       (match_operand:VDQW 3 "s_register_operand" "2")]
>          UNSPEC_VTRN1))
>    (set (match_operand:VDQW 2 "s_register_operand" "=w")
>         (unspec:VDQW [(match_dup 1) (match_dup 3)]
>          UNSPEC_VTRN2))]
>   "TARGET_NEON"
>   "vtrn.\t%0, %2"
>   [(set_attr "type" "neon_permute")]
> doesn't look like a target bug that would allow 2 same set destinations.

That's exactly what I have observed. r228662 fixes that by adding an early-clobber modifier to the operand, so that the register allocator can assign a different register.
[Bug c++/68905] [DR496] __is_trivially_copyable returns True for volatile class types.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68905 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 Summary|__is_trivially_copyable |[DR496] |returns True for volatile |__is_trivially_copyable |class types.|returns True for volatile ||class types. Ever confirmed|0 |1 --- Comment #1 from Jonathan Wakely --- This is http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#496 Moved to DR at the April, 2013 meeting.
[Bug c++/68819] Invalid "-Wmisleading-indentation" warning if location_t >=LINE_MAP_MAX_LOCATION_WITH_COLS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68819 Markus Trippelsdorf changed: What|Removed |Added CC||trippels at gcc dot gnu.org --- Comment #8 from Markus Trippelsdorf --- Another similar example: int main() { int i = 0; do i++; while (i < 3); }
[Bug c/68845] -Werror=array-bounds=[12] doesn't turn warning into error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68845 --- Comment #4 from Manuel López-Ibáñez --- (In reply to Franz Sirl from comment #3) > Created attachment 37035 [details] > Alias -Warray-bounds to Warray-bounds= > > Tentative patch, no regressions. Please commit if OK, I don't have valid > credentials anymore. I cannot approve your patch, and the people who can, probably do not read this report. The best chance to get a patch approved is to send it to gcc-patc...@gcc.gnu.org, CCing people who can review it (in this case, middle-end people, see MAINTAINERS), and explaining how you did bootstrap & regression testing.
[Bug c++/58796] throw nullptr not caught by catch(type*)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58796 --- Comment #9 from Jonathan Wakely --- Yes, it's on my list. That's why I changed the target milestone to 6.0 a week ago.
[Bug libstdc++/68912] Wrong value category used in _Bind functor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68912 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2015-12-15 Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org Ever confirmed|0 |1
[Bug c/68908] inefficient code for _Atomic operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908 --- Comment #8 from joseph at codesourcery dot com --- I'm fine with making the front end smarter. Note that if either side of the assignment is of floating-point type, you need to keep the existing logic; if you're adding to / subtracting from a pointer, you need to ensure the multiplication by the size of the pointer target type still occurs; and if the arithmetic operation might be sanitized, you probably need to keep the existing logic as well (but otherwise, if the __atomic_fetch_* operations never have undefined overflow, it should be safe to do the operation in the type of the LHS).
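The pointer caveat can be illustrated from the library side. The sketch below uses std::atomic<int*> from C++ rather than the C front end, and the function name is invented: adding n to an atomic pointer has to advance it by n * sizeof(int) bytes. As far as the GCC manual describes them, the raw __atomic_fetch_add builtins treat pointer operands like uintptr_t and do not scale, which is why the front end would have to keep the multiplication itself; that is worth double-checking against the manual for the release in question.

```cpp
#include <atomic>
#include <cstddef>

// Returns how many bytes an atomic int* advances for fetch_add(n).
// Only meaningful for 0 <= n <= 8 so the pointer stays within 'buf'.
std::ptrdiff_t bytes_advanced(std::ptrdiff_t n) {
    static int buf[8];
    std::atomic<int*> p{buf};
    int* before = p.fetch_add(n);  // the library scales n by sizeof(int)
    int* after = p.load();
    return reinterpret_cast<char*>(after) - reinterpret_cast<char*>(before);
}
```

So bytes_advanced(2) equals 2 * sizeof(int), not 2.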
[Bug tree-optimization/63185] Improve DSE with branches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63185 --- Comment #6 from Marc Glisse --- In addition to the issues already described, it seems that we generate better code if I replace the VLAs with calls to alloca. Indeed, we assume that alloca returns 16-aligned memory, while with __builtin_alloca_with_align(..., 64), we don't seem to have code to turn it into __builtin_alloca_with_align(..., 128) so we could avoid all the loop adjustment code.
[Bug libstdc++/68925] New: uniform_int_distribution needs not to be thread_local in std::experimental::randint
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68925

            Bug ID: 68925
           Summary: uniform_int_distribution needs not to be thread_local in std::experimental::randint
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lichray at gmail dot com
  Target Milestone: ---

libstdc++'s uniform_int_distribution is stateless, so just

  return _Dist(a, b)(_S_randint_engine());

will do the work and produce a more compact binary.
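A minimal sketch of the suggested shape, under stated assumptions: sketch_engine and randint_sketch are hypothetical names standing in for _S_randint_engine and randint, and this is not the libstdc++ source. Only the engine is kept per thread; the distribution is a temporary built on each call:

```cpp
#include <random>

// Hypothetical stand-in for the per-thread engine. Real code would
// presumably seed it; it is left default-seeded to keep the sketch small.
inline std::default_random_engine& sketch_engine() {
    thread_local std::default_random_engine eng;
    return eng;
}

// Build the distribution on each call instead of keeping it thread_local.
inline int randint_sketch(int a, int b) {
    return std::uniform_int_distribution<int>(a, b)(sketch_engine());
}
```

This relies on the report's premise that the distribution carries no state between calls beyond its parameters.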
[Bug inline-asm/10396] Constraint alternatives cause error " `asm' operand requires impossible reload"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10396

Bernd Edlinger changed:

           What    |Removed                     |Added
                 CC|                            |bernd.edlinger at hotmail dot de

--- Comment #23 from Bernd Edlinger ---
(In reply to David from comment #22)
> Despite the impression you may get from comments 17-21, gcc DOES support
> multi-alternatives with inline asm (see
> https://gcc.gnu.org/ml/gcc/2015-10/msg00249.html).
>
> I do not have an arm build with which to test, but using 5.2 on x64, the
> samples in this bug do not produce errors. Perhaps in the 7-12 years since
> they were added, something got fixed? Or maybe this problem is
> platform-specific.

you can build a cross-compiler out of nothing, if you want.

cd binutils-build-arm
../binutils-2.25.1/configure --prefix=../arm-eabi --target=arm-unknown-eabi
make && make install
cd ../gcc-build-arm
../gcc-trunk/configure --prefix=/home/ed/gnu/arm-eabi --target=arm-unknown-eabi --enable-languages=c --disable-libssp
make && make install
[Bug lto/68799] lto ICE on powerpc64le-linux-gnu builing python 2.7.x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68799 Bill Schmidt changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |wschmidt at gcc dot gnu.org Target Milestone|--- |6.0
[Bug target/61298] redundant compare instructions for powerpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61298 Peter Bergner changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Peter Bergner --- (In reply to Segher Boessenkool from comment #7) > Fixed (or hidden) on trunk with r222855. Given that, I'm going to mark this as fixed.
[Bug c++/68929] New: GCC hangs in nested template instantiations even after static_assert fails.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68929

Bug ID: 68929
Summary: GCC hangs in nested template instantiations even after static_assert fails.
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: eric at efcs dot ca
Target Milestone: ---

GCC currently hangs when compiling the attached reproducer. The reproducer is a stripped-down libc++ test that ensures that "std::make_integer_sequence" causes a static assertion. GCC will emit the assertion but then continue to run and consume more memory until it's killed for being OOM.
[Bug other/66250] Can't adjust complex nor decimal floating point modes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66250 Bernd Schmidt changed: What|Removed |Added CC||bernds at gcc dot gnu.org --- Comment #2 from Bernd Schmidt --- The motivation for this patch seems unclear. What is this fixing?
[Bug target/68256] [6 regression] switching constant pools to rodata sections causes go bootstrap failure.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68256 Bernd Schmidt changed: What|Removed |Added CC||bernds at gcc dot gnu.org --- Comment #3 from Bernd Schmidt --- Can this be closed?
[Bug rtl-optimization/63491] Ice in LRA with simple vector test case on power
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63491 Bill Schmidt changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2015-12-15 CC||wschmidt at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #15 from Bill Schmidt --- Obviously confirmed at this point. Vlad, do you plan to backport this to GCC 5? We should get this closed if this is fixed.
[Bug target/68928] New: AVX loops on unaligned arrays could generate more efficient startup/cleanup code when peeling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68928

Bug ID: 68928
Summary: AVX loops on unaligned arrays could generate more efficient startup/cleanup code when peeling
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Keywords: missed-optimization, ssemmx
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86-64-*-*

I have some suggestions for better code that gcc could use for the prologue/epilogue when vectorizing loops over unaligned buffers. I haven't looked at gcc's code, just the output, so IDK how one might get gcc to implement these.

Consider the following code:

    #include
    typedef float float_align32 __attribute__ ((aligned (32)));
    void floatmul_aligned(float_align32 *a) {
      for (int i=0; i<1024 ; i++)
        a[i] *= 2;
    }
    void floatmul(float *a) {
      for (int i=0; i<1024 ; i++)
        a[i] *= 2;
    }

g++ 5.3.0 -O3 -march=sandybridge emits what you'd expect for the aligned version:

    floatmul_aligned(float*):
            leaq    4096(%rdi), %rax
    .L2:
            vmovaps (%rdi), %ymm0
            addq    $32, %rdi
            vaddps  %ymm0, %ymm0, %ymm0
            vmovaps %ymm0, -32(%rdi)
            cmpq    %rdi, %rax
            jne     .L2
            vzeroupper
            ret

*** off-topic ***

It unfortunately uses 5 uops in the loop, meaning it can only issue one iteration per 2 clocks. Other than unrolling, it would prob. be more efficient to get 2.0f broadcast into %ymm1 and use vmulps (%rdi), %ymm1, %ymm0, avoiding the separate load.

Doing the loop in reverse order, with an indexed addressing mode counting an index down to zero, would also keep the loop overhead down to one decrement-and-branch uop. I know compilers are allowed to re-order memory accesses, so I assume this would be allowed. However, this wouldn't actually help on Sandybridge since it seems that two-register addressing modes might not micro-fuse on SnB-family CPUs: (http://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes. Agner Fog says he tested and found 2-reg addressing modes did micro-fuse.
Agner Fog is probably right, but IDK what's wrong with my experiment using perf counters.) That would make the store 2 uops.

*** back on topic ***

Anyway, that wasn't even what I meant to report. The unaligned case peels off the potentially-unaligned start/end iterations, and unrolls them into a giant amount of code. This is unlikely to be optimal outside of microbenchmarks, since CPUs with a uop-cache suffer from excessive unrolling.

    floatmul(float*):
            movq    %rdi, %rax
            andl    $31, %eax
            shrq    $2, %rax
            negq    %rax
            andl    $7, %eax
            je      .L12
            vmovss  (%rdi), %xmm0
            vaddss  %xmm0, %xmm0, %xmm0
            vmovss  %xmm0, (%rdi)
            cmpl    $1, %eax
            je      .L13
            vmovss  4(%rdi), %xmm0
            vaddss  %xmm0, %xmm0, %xmm0
            vmovss  %xmm0, 4(%rdi)
            cmpl    $2, %eax
            je      .L14
            vmovss  8(%rdi), %xmm0
            ... repeated up to cmpl $6, %eax
            ... some loop setup
    .L9:
            vmovaps (%rcx,%rax), %ymm0
            addl    $1, %edx
            vaddps  %ymm0, %ymm0, %ymm0
            vmovaps %ymm0, (%rcx,%rax)
            addq    $32, %rax
            cmpl    %esi, %edx
            jb      .L9
            ... another fully-unrolled up-to-7 iteration cleanup loop

Notice that the vectorized part of the loop now has 6 uops. (Or 7, if the store can't micro-fuse.) So gcc is even farther from getting this loop to run at one cycle per iteration. (Which should be possible on Haswell. On SnB/IvB (and AMD Bulldozer-family), a 256b store takes two cycles anyway.)

Is there any experimental evidence that fully unrolling to make this much code is beneficial? The most obvious way to improve on this would be to use a 128b xmm vector for the first 4 iterations of the prologue/epilogue loops. Even simply not unrolling the 7-iteration alignment loops might be a win. Every unrolled iteration still has a compare-and-branch. By counting down to zero, the loop could have the same overhead. All that changes is branch prediction (one taken branch and many not-taken, vs. a single loop branch taken n times.)

AVX introduces a completely different way to handle this, though: VMASKMOVPS is usable now, since it doesn't have the non-temporal hint that makes the SSE version of it nearly useless.
According to Agner Fog's insn tables, vpmaskmov %ymm, %ymm, m256 is only 4 uops, and has a throughput of one per 2 cycles (SnB/IvB/Haswell). It's quite slow (as a store) on AMD Bulldozer-family CPUs, though, so this might only be appropriate with -mtune=something other than AMD. The trouble is turning a misalignment count into a mask. Most of the useful instructions (like PSRLDQ to use on a
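The "simply not unrolling the alignment loops" idea from this report can be sketched in portable C++. This is only an illustration under stated assumptions, not what gcc emits and not a proposed patch: the names `peel_count` and `floatmul_rolled` are hypothetical, the pointer is assumed to be at least 4-byte aligned, and the main loop is left for the compiler to vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of leading floats to handle one at a time before `p` reaches a
// 32-byte boundary. Follows the and/shr/neg/and sequence at the top of the
// floatmul listing in the report. Assumes `p` is at least 4-byte aligned.
inline std::size_t peel_count(const float* p) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t misaligned_floats = (addr & 31u) >> 2;
    return static_cast<std::size_t>((0 - misaligned_floats) & 7u);
}

// Hypothetical variant of floatmul with rolled (not unrolled) prologue and
// epilogue loops around a main loop over 32-byte aligned chunks of 8 floats.
void floatmul_rolled(float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    std::size_t i = 0;
    std::size_t peel = peel_count(a);
    if (peel > n) peel = n;

    // Rolled prologue: at most 7 scalar iterations.
    for (; i < peel; ++i) a[i] *= 2;

    // Main body: whole 8-float chunks starting at an aligned address.
    std::size_t body_end = i + ((n - i) / 8) * 8;
    for (; i < body_end; i += 8)
        for (std::size_t j = 0; j < 8; ++j) a[i + j] *= 2;

    // Rolled epilogue: at most 7 scalar iterations.
    for (; i < n; ++i) a[i] *= 2;
}
```

Compared with the fully unrolled form, this trades several not-taken branches for one short loop branch in each of the prologue and epilogue. Whether that is faster on a given CPU would need measuring.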
[Bug debug/68904] DWARF for class ios_base says it's a declaration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68904

ivan.soleimanipour at oracle dot com changed:
What|Removed |Added
CC||ivan.soleimanipour at oracle dot com

--- Comment #5 from ivan.soleimanipour at oracle dot com ---
I see now why Andrew was asking the questions he was asking. What we failed to notice is that the definition of ios_base in t.o is abbreviated. It contains only nested classes, 'static const' members and typedefs. There is no member function or data member information. There is a more complete definition in libstdc++.so.6.0.18.

So now the questions become:
- How does gcc decide to emit this abbreviated form?
- Why is it then not _fully_ abbreviated? Why bother with the typedefs and such?

FWIW -fno-eliminate-unused-debug-types seems to make no difference.