[Bug c++/68763] [6 Regression] ICE: in verify_unstripped_args, at cp/pt.c:1132

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68763

Marek Polacek  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|WORKSFORME  |---

--- Comment #7 from Marek Polacek  ---
Yeah, that ICEs.

 1123 static void
 1124 verify_unstripped_args (tree args)
 1125 {
 1126   ++processing_template_decl;
 1127   if (!any_dependent_template_arguments_p (args))
 1128     {
 1129       tree inner = INNERMOST_TEMPLATE_ARGS (args);
 1130       for (int i = 0; i < TREE_VEC_LENGTH (inner); ++i)
 1131         {
 1132           tree arg = TREE_VEC_ELT (inner, i);
 1133           if (TREE_CODE (arg) == TEMPLATE_DECL)
 1134             /* OK */;
 1135           else if (TYPE_P (arg))
 1136             gcc_assert (strip_typedefs (arg, NULL) == arg);

strip_typedefs (arg, NULL) is:
struct 
{
  const struct details_t & account_t:: (const struct account_t *, bool)
* __pfn;
  long int __delta;
}
and arg is:
struct 
{
  const struct details_t & account_t:: (const struct account_t *, bool)
* __pfn;
  long int __delta;
}

[Bug c/68908] inefficient code for _Atomic operations

2015-12-15 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #6 from Martin Sebor  ---
(In reply to Jakub Jelinek from comment #2)
> Doesn't seem to be ppc64le specific in any way, and doesn't affect just
> preincrement.

The inefficiency I was pointing out was the redundant syncs above the loop on
powerpc64.  The x86_64 assembly looks fairly efficient both ways.

I also intentionally focused the bug on the increment expression and didn't
mention others like compound assignment because I expected the former to be
more common.  But I suppose ++a really should be as efficient as a += 1,
which shouldn't be any less efficient than a += X for any arbitrary X.
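
A minimal sketch of that equivalence, assuming C++ std::atomic as a stand-in
for C11 _Atomic (the bug itself concerns the C front end).  The wrapper names
are hypothetical.  All three spellings are the same atomic read-modify-write
and return the updated value, so nothing at the language level suggests one
should cost more than another:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrappers, one per spelling discussed above.
// Each performs a sequentially consistent atomic add and returns the new value.
int preincrement(std::atomic<int> &a) { return ++a; }
int add_one(std::atomic<int> &a) { return a += 1; }
int add_x(std::atomic<int> &a, int x) { return a += x; }
```

Comparing the code generated for these three functions on a given target is
one way to see whether the redundant barriers show up for every form.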

If it's preferable to treat this as a generic opportunity to improve the
efficiency of all atomic expressions (perhaps along with those discussed on the
Wiki: https://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations) that sounds great.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread jon at beniston dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #6 from Jon Beniston  ---
-fstrict-overflow (which is the default at -O2) tells us that we can assume it
will not overflow.

Even if it did, on most targets it makes no difference to the result.

[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk

2015-12-15 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616

--- Comment #15 from H.J. Lu  ---
(In reply to H.J. Lu from comment #14)
> (In reply to H.J. Lu from comment #13)
> > I got
> > 
> > FAIL: g++.dg/ipa/pr66616.C  -std=gnu++11 execution test
> > FAIL: g++.dg/ipa/pr66616.C  -std=gnu++14 execution test
> > FAIL: g++.dg/ipa/pr66616.C  -std=gnu++98 execution test
> > 
> > on trunk/x86-64.
> 
> It fails with -m32 on x86-64 for trunk and gcc-5-branch:
> 

It also fails on i686.

[Bug rtl-optimization/67736] Wrong optimization with -fexpensive-optimizations on mips64el

2015-12-15 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67736

Steve Ellcey  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||sje at gcc dot gnu.org
  Known to work||5.3.0, 6.0
 Resolution|--- |FIXED

--- Comment #8 from Steve Ellcey  ---
Patch checked in on ToT for 6.0 and on 5.* branch.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #8 from Andrew Pinski  ---
Couldn't it be optimized as:
short func(short *a, int y)
{
 short ret = 0;
 unsigned int tmp = 0;
 int i;
 for(i = 0; i < y; i++) 
   tmp += (unsigned int)(int)a[i];

 return (short)tmp;
}

Such that the addition happens in unsigned (so there is only wrapping, which is
well defined) and only one truncation happens at the end of the loop.
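
The claim can be checked with a small sketch.  The function names are
hypothetical, and it assumes the narrowing conversion to short wraps modulo
2^16 (implementation-defined before C++20, and what GCC documents).  The first
function truncates on every iteration, the second accumulates in unsigned int
and truncates once:

```cpp
#include <cassert>

// Source-level semantics: the int sum is narrowed to short on every iteration.
short sum_truncate_each(const short *a, int y) {
    short ret = 0;
    for (int i = 0; i < y; i++)
        ret = static_cast<short>(ret + a[i]);  // ret + a[i] is computed in int
    return ret;
}

// Proposed form: accumulate in unsigned int (wrapping is well defined),
// then narrow to short a single time after the loop.
short sum_truncate_once(const short *a, int y) {
    unsigned int tmp = 0;
    for (int i = 0; i < y; i++)
        tmp += static_cast<unsigned int>(static_cast<int>(a[i]));
    return static_cast<short>(tmp);
}
```

Both functions keep the low 16 bits of the same mathematical sum, so under the
stated assumption they should agree for any input, including inputs whose
running total leaves the range of short.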

[Bug libfortran/68867] numeric formatting problem in the fortran library

2015-12-15 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

--- Comment #13 from Steve Kargl  ---
On Tue, Dec 15, 2015 at 06:03:55PM +0000, seurer at linux dot vnet.ibm.com
wrote:
> 
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
> FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test
> 
> I checked with the revision previous to this patch and the revision for this
> patch and the only differences were fmt_g0_7 succeeding and
> default_format_denormal_2 failing.

% svn diff default_format_denormal_2.f90
Index: default_format_denormal_2.f90
===================================================================
--- default_format_denormal_2.f90   (revision 231661)
+++ default_format_denormal_2.f90   (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run { xfail powerpc*-apple-darwin* } }
+! { dg-do run { xfail powerpc*-*-* } }
 ! { dg-require-effective-target fortran_large_real }
 ! Test XFAILed on this platform because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685

[Bug c++/68763] [6 Regression] ICE: in verify_unstripped_args, at cp/pt.c:1132

2015-12-15 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68763

David Binderman  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #6 from David Binderman  ---
Created attachment 37043
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37043&action=edit
C++ source code, compressed with xz

I can reproduce the problem with the attached C++ source code.

gcc trunk from 20151214.

[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed

2015-12-15 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

--- Comment #3 from Yuri Rumyantsev  ---
I've prepared a simple fix which cures the ICE. I will send it for review tomorrow.

2015-12-15 12:50 GMT+03:00 jakub at gcc dot gnu.org :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906
>
> Jakub Jelinek  changed:
>
>What|Removed |Added
> 
>  CC||jakub at gcc dot gnu.org
>
> --- Comment #2 from Jakub Jelinek  ---
> This doesn't look to me like a mere omission to invalidate debug stmts after
> some stmt move that (correctly) has not considered debug stmts when 
> determining
> if they should be moved or not, but it looks to me like wrong-code
> transformation.
> Before unswitch, if c is non-zero, we have endless loop, but during 
> unswitching
> it is wrongly changed to branch to the bb that returns instead.
> Say if you compile with -O3 (no -g):
> int a;
> volatile int b;
> short c, d;
> int
> fn1 ()
> {
>   int e;
>   for (;;)
> {
>   a = 3;
>   if (c)
> continue;
>   e = 0;
>   for (; e > -30; e--)
> if (b)
>   {
> int f = e;
> return d;
>   }
> }
> }
>
> int
> main ()
> {
>   c = 1;
>   asm volatile ("" : : "m" (c) : "memory");
>   fn1 ();
>   __builtin_abort ();
> }
>
> then before the change this would just hang (expected), now it aborts instead.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread jon at beniston dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #4 from Jon Beniston  ---
Well if it is just truncating the higher bits, why can't it be done at the end
of the loop?

What do you think will be different if it is done at the end of the loop? Can
you think of an example where the value of ret will differ?

The MSBs in an add don't affect the LSBs.

[Bug libfortran/68867] numeric formatting problem in the fortran library

2015-12-15 Thread jvdelisle at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

Jerry DeLisle  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Jerry DeLisle  ---
Revision 231639 committed to trunk.

2015-12-14  Jerry DeLisle  

PR libfortran/pr68867
* io/write.c (set_fnode_default): For kind=16, set the decimal precision
depending on the platform binary precision, 106 or 113.

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231639

Fixed on trunk.

[Bug c++/63628] [c++1y] cannot use decltype on captured arg-pack

2015-12-15 Thread jason at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63628

--- Comment #4 from Jason Merrill  ---
(In reply to Paolo Carlini from comment #3)
> The second and third variants work in mainline.

Yes, they were fixed by the patch for bug 68309.  We need a further fix to
handle the original testcase.

[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

--- Comment #1 from Jonathan Wakely  ---
This fixes it:

--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -52,7 +52,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        // we will fall back to spin-waiting.  The only thing we could do
        // here on errors is abort.
        int ret __attribute__((unused));
-       ret = syscall (SYS_futex, __addr, futex_wait_op, __val);
+       ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
        _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
        return true;
       }

[Bug c/68908] inefficient code for _Atomic operations

2015-12-15 Thread amacleod at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #7 from Andrew Macleod  ---
(In reply to Richard Henderson from comment #4)
> I think we should rather handle this in the front end than with
> quite complex pattern matching.
> 
> If we want to do any complex logic with atomics in the middle-end,
> we should change their representation so that we don't have to
> struggle with a sequence of builtins.  Which is clearly a subject
> for gcc7 at minimum.

Yes, I think anything more complex than this should be part of an atomics
optimization framework using a new set of ATOMIC gimple ops rather than
builtins.

For the purpose of this PR we ought to just fix it in the FE.

[Bug middle-end/56934] ICE folding a COND_EXPR involving vectors

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56934

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Marek Polacek  ---
Works with all active branches.

[Bug middle-end/62069] [GCC-5] ICE: in int_cst_value, at tree.c:10625

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62069

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Marek Polacek  ---
This now passes for me.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #3 from Steve Ellcey  ---
My understanding (I don't have a C/C++ standard handy) is that the addition
done by 'ret + a[i]' is done in integer mode (not as short).  This results in
an integer value that may be outside the range of a short, but in the
range of a normal integer.  So this is not really an overflow.   Then the
integer result is assigned to ret, which is short.  I believe that the
truncation of an integer value (with a value outside the range of a short)
to a short is not undefined by the C and C++ standards but has a specific
way that it needs to work (truncate off the higher bits).  This is the
truncation that needs to be done on each loop iteration.
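
The promotion step described above can be shown with a short sketch.  It
assumes int is wider than 16 bits, and promoted_sum is a hypothetical helper
name:

```cpp
#include <cassert>
#include <type_traits>

// short + short is evaluated after integer promotion, so the result type is int.
static_assert(std::is_same<decltype(short{1} + short{1}), int>::value,
              "short operands are promoted to int before the addition");

// Hypothetical helper: returns the un-truncated int sum of two shorts.
int promoted_sum(short x, short y) { return x + y; }
```

With the largest short as one operand the sum no longer fits in a short, but
it is an ordinary in-range int; only the later assignment back to a short
narrows it.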

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #5 from Steve Ellcey  ---
If we did not truncate ret on each loop iteration then ret could get large
enough to overflow the maximum integer value before we truncate it at the end,
leading to undefined results.  But if we truncate ret on each loop iteration
then ret will not overflow and the result is defined.

[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code

2015-12-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

--- Comment #35 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #26)
> Another analysis by Jake in PR54037:

Eh, PR 54073.

[Bug libstdc++/68921] New: [5/6 Regression] std::future::wait() makes invalid futex calls and spins

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

Bug ID: 68921
   Summary: [5/6 Regression] std::future::wait() makes invalid
futex calls and spins
   Product: gcc
   Version: 5.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
CC: torvald at gcc dot gnu.org
  Target Milestone: ---
Target: i?86-*linux*

On 32-bit linux the following spins in a tight loop until it times out:

#include <future>
#include <thread>

int main() {
  std::promise<void> p;
  auto f = p.get_future();

  std::thread t([&p](){
    std::this_thread::sleep_for(std::chrono::seconds(10));
    p.set_value();
  });

  f.wait();

  t.join();
}

strace shows thousands of invalid calls:

futex(0x8cf2a24, FUTEX_WAIT, 2147483648, {4289120584, 134527555}) = -1 EINVAL
(Invalid argument)

It's called from the infinite loop in
__atomic_futex_unsigned::_M_load_and_test_until in 

[Bug rtl-optimization/68920] [6 Regression] Undesirable if-conversion for a rarely taken branch

2015-12-15 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68920

--- Comment #1 from Uroš Bizjak  ---
Another incarnation of PR 56309 ?

[Bug middle-end/57348] [TM] ICE for transaction expression in gimplify_expr

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57348

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org

--- Comment #2 from Marek Polacek  ---
Still ICEs.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

Steve Ellcey  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #7 from Steve Ellcey  ---
I am still unconvinced but I will change it back to unconfirmed and leave it
there in case someone else wants to look at it and/or propose a patch.

[Bug middle-end/63383] internal compiler error: in expand_expr_real_1, at expr.c:9389

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63383

Marek Polacek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||mpolacek at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #5 from Marek Polacek  ---
This shouldn't ICE anymore, because the testcase is rejected due to:
fatal error: definition of std::initializer_list does not match #include <initializer_list>


[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk

2015-12-15 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616

--- Comment #14 from H.J. Lu  ---
(In reply to H.J. Lu from comment #13)
> I got
> 
> FAIL: g++.dg/ipa/pr66616.C  -std=gnu++11 execution test
> FAIL: g++.dg/ipa/pr66616.C  -std=gnu++14 execution test
> FAIL: g++.dg/ipa/pr66616.C  -std=gnu++98 execution test
> 
> on trunk/x86-64.

It fails with -m32 on x86-64 for trunk and gcc-5-branch:

[hjl@gnu-6 gcc]$ 
/export/build/gnu/gcc-x32-5/build-x86_64-linux/gcc/testsuite/g++/../../xg++
-B/export/build/gnu/gcc-x32-5/build-x86_64-linux/gcc/testsuite/g++/../../
/export/gnu/import/git/sources/gcc-release/gcc/testsuite/g++.dg/ipa/pr66616.C
-m32 -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++
-I/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/include
-I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/libsupc++
-I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/include/backward
-I/export/gnu/import/git/sources/gcc-release/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=gnu++14 -O2 -fipa-cp-clone
-L/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-B/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-L/export/build/gnu/gcc-x32-5/build-x86_64-linux/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-lm -o ./pr66616.exe
[hjl@gnu-6 gcc]$ ./pr66616.exe
Aborted
[hjl@gnu-6 gcc]$

[Bug libfortran/68867] numeric formatting problem in the fortran library

2015-12-15 Thread seurer at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68867

Bill Seurer  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #12 from Bill Seurer  ---
FAIL: gfortran.dg/fmt_g0_7.f08   -O0  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O1  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O2  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -O3 -g  execution test
FAIL: gfortran.dg/fmt_g0_7.f08   -Os  execution test

The above tests were fixed by the patch but the following tests now fail

FAIL: gfortran.dg/default_format_denormal_2.f90   -O0  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O1  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O2  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -O3 -g  execution test
FAIL: gfortran.dg/default_format_denormal_2.f90   -Os  execution test

I checked with the revision previous to this patch and the revision for this
patch and the only differences were fmt_g0_7 succeeding and
default_format_denormal_2 failing.

[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
  Known to work||4.9.3
 Ever confirmed|0   |1

[Bug tree-optimization/16107] missed optimization with some math function builtins

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16107

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||6.0
 Resolution|--- |FIXED

--- Comment #8 from Marc Glisse  ---
Fixed a few months ago.

[Bug tree-optimization/55180] Missed optimization abs(-x) -> abs(x)

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55180
Bug 55180 depends on bug 16107, which changed state.

Bug 16107 Summary: missed optimization with some math function builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16107

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/57600] Turn 2 comparisons into 1 with the min

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57600

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
  Known to work||6.0
 Resolution|--- |FIXED

--- Comment #7 from Marc Glisse  ---
Fixed during stage 1.

[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins

2015-12-15 Thread carlos at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

Carlos O'Donell  changed:

   What|Removed |Added

 CC||carlos at redhat dot com

--- Comment #2 from Carlos O'Donell  ---
(In reply to Jonathan Wakely from comment #1)
> This fixes it:
> 
> --- a/libstdc++-v3/src/c++11/futex.cc
> +++ b/libstdc++-v3/src/c++11/futex.cc
> @@ -52,7 +52,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> // we will fall back to spin-waiting.  The only thing we could do
> // here on errors is abort.
> int ret __attribute__((unused));
> -   ret = syscall (SYS_futex, __addr, futex_wait_op, __val);
> +   ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
> _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
> return true;
>}

That is correct.

futex.2 from draft_futex upstream branch of linux man pages project:
~~~
If the timeout argument is non-NULL, its contents specify a relative timeout
for the wait, measured according to the CLOCK_MONOTONIC clock.  (This interval
will be rounded up to the system clock granularity, and is guaranteed not to
expire early.)  If timeout is NULL, the call blocks indefinitely.
~~~

I assume you want to block indefinitely.
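
A minimal sketch of a wait with an explicit null timeout, assuming Linux and
the raw syscall(2) interface.  futex_wait_no_timeout is a hypothetical helper
for illustration, not the libstdc++ code:

```cpp
#include <cassert>
#include <cerrno>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: block while *addr == expected, with no timeout.
// The explicit nullptr means the kernel never reads an unspecified
// fourth argument as a timespec pointer.
// Returns 0 when woken, or -1 with errno set (for example EAGAIN or EINTR).
long futex_wait_no_timeout(unsigned *addr, unsigned expected) {
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, nullptr);
}
```

Calling the helper with an expected value that differs from the current
contents of the word makes FUTEX_WAIT fail immediately with EAGAIN, which is a
convenient way to exercise the call without blocking.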

[Bug target/68923] New: SSE/AVX movq load (_mm_cvtsi64_si128) not being folded into pmovzx

2015-12-15 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923

Bug ID: 68923
   Summary: SSE/AVX movq load (_mm_cvtsi64_si128) not being folded
into pmovzx
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: peter at cordes dot ca
  Target Milestone: ---

context and background:
http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f

Using intrinsics, I can't find a way to get gcc to emit

VPMOVZXBD   (%rsi), %ymm0   ; 64b load
VCVTDQ2PS   %ymm0,  %ymm0

without using _mm_loadu_si128, which will compile to an actual 128b load with
-O0.  (not counting evil use of #ifndef __OPTIMIZE__ to do it two different
ways, of course).


Since there is no intrinsic for PMOVSX / PMOVZX as a load from a narrower
memory location, the only way I can see to correctly write this with intrinsics
involves _mm_cvtsi64_si128 (MOVQ), which I don't even want the compiler to
emit.  clang3.6 and ICC13 compile this to the optimal sequence, still folding
the load into VPMOVZXBD, but gcc doesn't.


#include <immintrin.h>
#include <stdint.h>
#define USE_MOVQ
__m256 load_bytes_to_m256(uint8_t *p)
{
#ifdef  USE_MOVQ  // compiles to an actual movq then pmovzx xmm,xmm with gcc -O3
    __m128i small_load = _mm_cvtsi64_si128( *(uint64_t*)p );
#else  // loadu compiles to a 128b load with gcc -O0, potentially segfaulting
    __m128i small_load = _mm_loadu_si128( (__m128i*)p );
#endif

    __m256i intvec = _mm256_cvtepu8_epi32( small_load );
    return _mm256_cvtepi32_ps(intvec);
}



Problem 1: g++ -O3 -march=haswell emits (gcc 5.3.0 on godbolt)

load_bytes_to_m256(unsigned char*):
vmovq   (%rdi), %xmm0
vpmovzxbd   %xmm0, %ymm0
vcvtdq2ps   %ymm0, %ymm0
ret


Problem 2:
 gcc and clang don't even provide that movq intrinsic in 32bit mode.

(Split into a separate bug, since it's totally separate from the missing
optimization issue).

[Bug c++/68922] New: g++ fails to generate code for catch clause with specific optimizations enabled

2015-12-15 Thread alban.lefebvre at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68922

Bug ID: 68922
   Summary: g++ fails to generate code for catch clause with
specific optimizations enabled
   Product: gcc
   Version: 5.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alban.lefebvre at gmail dot com
  Target Milestone: ---

Hello,

The following code fails to generate a try/catch clause on g++ 5.2.1, 4.9.2,
4.8.4, 4.8.2:

#include <iostream>
#include <exception>

class Base
{
public:
    virtual ~Base() {}
};

class C1 : public virtual Base
{
};

class C2 : public virtual Base
{
public:
    virtual void foo() = 0;
};

class D : public C1, public C2
{
public:
    virtual void foo()
    {
        throw std::exception();
    }
};

int main()
{
    C2 * c2 = new D();

    try
    {
        c2->foo();
    }
    catch (...)
    {
        std::cout << "Caught some exception" << std::endl;
    }

    return 0;
}

when compiled with O2 optimization

  g++ main.cpp -Wall -Wextra -pedantic -O2 && ./a.out

I get the following error when executing it:

terminate called after throwing an instance of 'std::exception' 
  what():  std::exception
Aborted (core dumped)

It seems that __cxa_begin_catch, __cxa_end_catch calls do not get generated:

main:
sub    $0x8,%rsp
mov    $0x10,%edi
callq  0x400850 <_Znwm@plt>
lea    0x8(%rax),%rdi
movq   $0x400ea0,(%rax)
movq   $0x400ed8,0x8(%rax)
callq  0x400b10 <_ZThn8_N1D3fooEv>
xor    %eax,%eax
add    $0x8,%rsp
retq

It looks similar but possibly not the same as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68184
When we remove the virtual inheritance (by removing Base), the bug doesn't
occur anymore.

As for which flags seem to trigger the issue, here is what I've tried:
-O1   => OK 
-O2   => BUG 
-O3   => BUG
-O1 -ftree-pre -ftree-vrp => BUG 
-ftree-pre -ftree-vrp => OK

Thank you

[Bug libstdc++/68921] [5/6 Regression] std::future::wait() makes invalid futex calls and spins

2015-12-15 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68921

--- Comment #3 from torvald at gcc dot gnu.org ---
LGTM, thanks.  Would be nice to backport this.

[Bug target/68924] New: No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2015-12-15 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

Bug ID: 68924
   Summary: No intrinsic for x86  `MOVQ m64, %xmm`  in 32bit mode.
   Product: gcc
   Version: 5.3.0
   URL: http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f
Status: UNCONFIRMED
  Keywords: missed-optimization, ssemmx
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: peter at cordes dot ca
  Target Milestone: ---
Target: i386-linux-gnu

context and background:
http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923


gcc and clang don't even provide the _mm_cvtsi64_si128 intrinsic for movq in
32bit mode (ICC does, see below).  They still provide __m128i
_mm_move_epi64(__m128i a), but at -O0 the load of the source __m128i won't fold
into the movq, so you'd get an undesired 128b load that could cross a page
boundary and segfault.


The lack of this, and lack of an intrinsic for PMOVZX as a load from a narrower
source, is a design flaw in the intrinsics, IMO.  I think it's super dumb to be
forced to use an intrinsic for an instruction I don't want (movq), even if it
didn't cause a portability issue for x86-32bit.


Consider trying to get gcc to emit `VPMOVZXBD  (%src), %ymm0` for 32bit mode:

#include <immintrin.h>
#include <stdint.h>
__m256 load_bytes_to_m256(uint8_t *p)
{
    __m128i small_load = _mm_cvtsi64_si128( *(uint64_t*)p );
    __m256i intvec = _mm256_cvtepu8_epi32( small_load );
    return _mm256_cvtepi32_ps(intvec);
}

That's the same code as in the other bug report (about the failure to fold the
load into a memory source operand for vpmovzx:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923 ), but with the #ifdefs
taken out.





_mm_cvtsi64_si128 is the intrinsic for the MOVQ %r/m64, %xmm form of MOVQ. 
(This is the MOVD/MOVQ entry in Intel's manual).  Its non-VEX encoding includes
a REX prefix, and even the VEX encoding of it is illegal in 32bit mode (prob.
because it couldn't decide if the insn was legal or not until it checked the
mod/rm byte to see if it encoded a 64b register source, instead of a 64b memory
location).  Since the other MOVQ gives identical results, and has a shorter
non-VEX encoding, there's no reason to bother with that complexity.

The other MOVQ (the one Intel's insn ref lists under just MOVQ), which can be
used for %mm,%mm reg moves, or the low half of %xmm,%xmm regs, only has a
__m128i to __m128i intrinsic:  __m128i _mm_move_epi64(__m128i a), not a load
form (same problem as the pmovzx/pmovsx intrinsics).





Other than this design-flaw in the intrinsics, you could see it as only a bug
in gcc/clang's implementation, since Intel's own implementation does still make
it possible to get MOVQ m64, %xmm emitted in 32bit mode.


ICC13 still provides _mm_cvtsi64_si128 in 32bit mode, and will use the MOVQ
xmm, m64 form as a load.  If it has a uint64_t in two 32bit registers, it
emulates it with 2xMOVD %r32, %xmm and a PUNPCKLDQ.  http://goo.gl/LQkVJL.  Two
32b stores then a movq load would cause a store-forwarding failure stall.   
vmovd/vpinsrd would be fewer instructions, but pinsrd is a 2-uop instruction on
Intel SnB-family CPUs, so as far as uops they're equal: 3 uops for the shuffle
port (port5).

At -O0, ICC emulates it that way even if the value is in memory, with 2x MOVD
m32, %xmm and a PUNPCK, so even Intel's compiler "thinks of" the intrinsic as
normally being the MOVQ %r/m64, %xmm form, not the MOVQ %xmm/m64, %xmm form.
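
For comparison, SSE2 also has _mm_loadl_epi64, which Intel's intrinsics guide
lists as a 64-bit MOVQ load from memory into the low half of an XMM register.
The sketch below assumes an x86 target with SSE2 enabled; load8 and store16
are hypothetical helper names.  Whether the affected gcc versions fold this
load into vpmovzx is not something this report establishes:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Loads exactly 8 bytes from p into the low half of the result;
// the upper 8 bytes of the register are zeroed.
__m128i load8(const uint8_t *p) {
    return _mm_loadl_epi64(reinterpret_cast<const __m128i *>(p));
}

// Inspection helper: copies all 16 bytes of v into out.
void store16(__m128i v, uint8_t out[16]) {
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), v);
}
```

Because only 8 bytes are read, this form avoids the 128b load that could cross
a page boundary, independent of optimization level.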

[Bug target/68923] SSE/AVX movq load (_mm_cvtsi64_si128) not being folded into pmovzx

2015-12-15 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68923

Peter Cordes  changed:

   What|Removed |Added

   Keywords||missed-optimization, ssemmx
 Target||x86_64-linux-gnu
URL||http://stackoverflow.com/questions/34279513/loading-8-chars-from-memory-into-an-m256-variable-as-packed-single-precision-f

--- Comment #1 from Peter Cordes  ---
The other issue (that there's no intrinsic to generate a movq m64, %xmm in
32bit mode), is addressed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924.

This bug is just about the optimization failure to fold the load into pmovzx.

[Bug libstdc++/61347] std::distance(list.first(),list.end()) in O(1)

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61347

Marc Glisse  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Marc Glisse  ---
Not sure why I didn't close it at the time. Probably because of debug mode, but
I am pretty sure François made a pass on that later.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 57600, which changed state.

Bug 57600 Summary: Turn 2 comparisons into 1 with the min
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57600

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

Marek Polacek  changed:

   What|Removed |Added

  Component|c   |debug
   Target Milestone|--- |6.0
Summary|ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297 |[6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
This doesn't look to me like a mere omission to invalidate debug stmts after
some stmt move that (correctly) has not considered debug stmts when determining
if they should be moved or not, but it looks to me like wrong-code
transformation.
Before unswitch, if c is non-zero, we have endless loop, but during unswitching
it is wrongly changed to branch to the bb that returns instead.
Say if you compile with -O3 (no -g):
int a;
volatile int b;
short c, d;
int
fn1 ()
{
  int e;
  for (;;)
{
  a = 3;
  if (c)
continue;
  e = 0;
  for (; e > -30; e--)
if (b)
  {
int f = e;
return d;
  }
}
}

int
main ()
{
  c = 1;
  asm volatile ("" : : "m" (c) : "memory");
  fn1 ();
  __builtin_abort ();
}

then before the change this would just hang (expected), now it aborts instead.

[Bug c++/53223] [c++0x] auto&& and operator* don't mix inside templates

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53223

Paolo Carlini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |6.0

--- Comment #14 from Paolo Carlini  ---
Fixed.

[Bug target/68910] New: SPARC/cypress: Poor code generation, huge stack frame

2015-12-15 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910

Bug ID: 68910
   Summary: SPARC/cypress: Poor code generation, huge stack frame
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.hu...@embedded-brains.de
  Target Milestone: ---

Created attachment 37036
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37036&action=edit
Test case.

The code for the SHA512_Transform() function is very poor for the SPARC cypress
target.

sparc-rtems4.12-gcc -c -O2 sha512c.i -mcpu=cypress

sha512c.o: file format elf32-sparc


Disassembly of section .text:

 :
   0:   9d e3 b0 58 save  %sp, -4008, %sp
   4:   94 10 20 80 mov  0x80, %o2
   8:   92 10 00 19 mov  %i1, %o1
   c:   90 07 bd 80 add  %fp, -640, %o0
[...]
 10c:   40 00 00 00 call  10c 
 110:   90 07 bd 40 add  %fp, -704, %o0
 114:   c0 27 bd 20 clr  [ %fp + -736 ]
 118:   c0 27 bd 24 clr  [ %fp + -732 ]
 11c:   c0 27 bd 10 clr  [ %fp + -752 ]
 120:   c0 27 bd 14 clr  [ %fp + -748 ]
 124:   c0 27 bd 08 clr  [ %fp + -760 ]
 128:   c0 27 bd 0c clr  [ %fp + -756 ]
 12c:   c0 27 bd 00 clr  [ %fp + -768 ]
 130:   c0 27 bd 04 clr  [ %fp + -764 ]
 134:   c0 27 bc f8 clr  [ %fp + -776 ]
 138:   c0 27 bc fc clr  [ %fp + -772 ]
[...]

Compared to v8:

sparc-rtems4.12-gcc -c -O2 sha512c.i -mcpu=v8

 :
   0:   9d e3 bc b8 save  %sp, -840, %sp
   4:   94 10 20 80 mov  0x80, %o2
   8:   92 10 00 19 mov  %i1, %o1
   c:   90 07 bd 80 add  %fp, -640, %o0
  10:   40 00 00 00 call  10 
  14:   f0 27 a0 44 st  %i0, [ %fp + 0x44 ]
[...]

No massive clr instructions.

[Bug rtl-optimization/66248] subreg truncation not hoisted from loop

2015-12-15 Thread jon at beniston dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66248

--- Comment #2 from Jon Beniston  ---
Hi Steve. I'm not sure I follow your explanation.

As I understand it, signed overflow is undefined behaviour
(http://www.airs.com/blog/archives/120), so I'm not sure why we need to worry
about changing the overflow behaviour (as the 16 LSBs should be the same). Even
if not, -fstrict-overflow should be enabled at -O2, so the compiler should be
able to assume that overflow will not occur anyway.
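
To illustrate the point about the 16 LSBs, here is a minimal sketch. It is not
the testcase from this PR, and the function names are made up for
illustration. It accumulates in unsigned arithmetic so that wraparound is well
defined, and truncates either on every iteration or once after the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: truncate to 16 bits on every iteration. */
uint16_t sum_truncate_each_step(const uint16_t *v, size_t n)
{
  assert(v != NULL || n == 0);
  uint16_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc = (uint16_t) (acc + v[i]);   /* truncation stays inside the loop */
  return acc;
}

/* Hypothetical helper: accumulate in 32 bits, truncate once at the end. */
uint16_t sum_truncate_once(const uint16_t *v, size_t n)
{
  assert(v != NULL || n == 0);
  uint32_t acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += v[i];                     /* unsigned: wraps modulo 2^32 */
  return (uint16_t) acc;             /* truncation hoisted out of the loop */
}
```

Both functions return the same low 16 bits for any input, because addition
modulo 2^32 followed by reduction modulo 2^16 agrees with addition modulo 2^16.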

[Bug debug/58315] [4.9/5 Regression] Excessive memory use with -g

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58315
Bug 58315 depends on bug 66688, which changed state.

Bug 66688 Summary: [6 Regression] compare debug failure building Linux kernel on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66688

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug debug/66688] [6 Regression] compare debug failure building Linux kernel on ppc64le

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66688

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Jakub Jelinek  ---
Fixed.

[Bug testsuite/68629] FAIL: c-c++-common/attr-simd-3.c

2015-12-15 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68629

--- Comment #4 from Christophe Lyon  ---
(In reply to Thomas Preud'homme from comment #3)
> Hi Christophe,
> 
> Could you paste the output of arm linux when compiling the testcase in
> cilkplus effective target with -fcilkplus?

The output is now:
xgcc: error: libcilkrts.spec: No such file or directory

Before your patch, compiling attr-simd-3 produced an error message, but the
test passed nonetheless:
error: '#pragma omp declare simd' or 'simd' attribute cannot be used in the
same function marked as a Cilk Plus SIMD-enabled function

[Bug c++/68782] [6 regression] bad reference member formed with constexpr

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68782

--- Comment #4 from Jakub Jelinek  ---
(In reply to Jason Merrill from comment #3)
> Hmm, any element without TREE_CONSTANT should have caused us to return 
> the original CONSTRUCTOR.

Perhaps the TREE_SIDE_EFFECTS stuff is not needed, but for TREE_CONSTANT
perhaps the reason is that constexpr.c has different POV on what is a constant
compared to ../tree.c - at least it seems that cxx_eval_constant_expression
happily accepts >c as *non_constant_p = false, but if CONSTRUCTOR containing
that is marked TREE_CONSTANT (which is IMHO wrong, because in that case all
elements should be TREE_CONSTANT), then we e.g. trigger:
case CONSTRUCTOR:
  if (TREE_CONSTANT (t))
/* Don't re-process a constant CONSTRUCTOR, but do fold it to
   VECTOR_CST if applicable.  */
return fold (t);
  r = cxx_eval_bare_aggregate (ctx, t, lval,
   non_constant_p, overflow_p);
  break;
and just fold it instead of calling cxx_eval_bare_aggregate on it.

> I thought there was already a function to recompute these flags, but I'm 
> not finding it.

I can't find it either, we have that only for ADDR_EXPR it seems -
recompute_tree_invariant_for_addr_expr.

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

--- Comment #3 from Marek Polacek  ---
(In reply to Chengnian Sun from comment #2)
> Is it related to this recently fixed bug?
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67778

Doesn't look like it, this one has been caused by:

commit a1965220d5ae62c617abfe40e1dc5c03bb7aa38f
Author: law 
Date:   Sat Nov 7 06:31:14 2015 +

[PATCH] Remove more backedge threading support

* tree-ssa-threadedge.c (dummy_simplify): Remove.
(thread_around_empty_blocks): Remove backedge_seen_p argument.
If we thread to a backedge, then return false.  Update recursive
call to eliminate backedge_seen_p argument.
(thread_through_normal_block): Remove backedge_seen_p argument.
Remove backedge_seen_p argument from calls to
thread_around_empty_blocks.  Remove checks on backedge_seen_p.
If we thread to a backedge, then return 0.
(thread_across_edge): Remove bookkeeping for backedge_seen.  Don't
pass it to thread_through_normal_block or thread_through_empty_blocks.
For joiner handling, if we see a backedge, do not try normal
threading.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229911
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug c/68845] -Werror=array-bounds=[12] doesn't turn warning into error

2015-12-15 Thread sirl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68845

--- Comment #3 from Franz Sirl  ---
Created attachment 37035
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37035&action=edit
Alias -Warray-bounds to Warray-bounds=

Tentative patch, no regressions. Please commit if OK, I don't have valid
credentials anymore.

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

--- Comment #5 from Jakub Jelinek  ---
This started with r229911, but it must be some RTL optimization bug instead.

[Bug target/68910] SPARC/cypress: Poor code generation, huge stack frame

2015-12-15 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910

Sebastian Huber <sebastian.hu...@embedded-brains.de> changed:

   What|Removed |Added

  Known to fail||6.0

--- Comment #2 from Sebastian Huber <sebastian.hu...@embedded-brains.de> ---
sparc-rtems4.12-gcc (GCC) 6.0.0 20151215 (experimental)

[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506

Paolo Carlini  changed:

   What|Removed |Added

 CC||paolo.carlini at oracle dot com

--- Comment #7 from Paolo Carlini  ---
This is fixed in mainline. I'm adding the reduced testcases and closing the
bug.

[Bug c/68909] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||mpolacek at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Marek Polacek  ---
The backtrace looks the same as in PR65496.

[Bug tree-optimization/68906] [6 Regression] ICE at -O3 on x86_64-linux-gnu: verify_ssa failed

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68906

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||ienkovich at gcc dot gnu.org,
   ||mpolacek at gcc dot gnu.org
  Component|c   |tree-optimization
   Target Milestone|--- |6.0
Summary|ICE at -O3 on   |[6 Regression] ICE at -O3
   |x86_64-linux-gnu:   |on x86_64-linux-gnu:
   |verify_ssa failed   |verify_ssa failed
 Ever confirmed|0   |1

--- Comment #1 from Marek Polacek  ---
Confirmed, started with:

commit a361141865247626a73c0f2257a95bc7d4f274c9
Author: ienkovich 
Date:   Thu Oct 8 13:14:09 2015 +

gcc/

* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(get_vop_from_header): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.

gcc/testsuite/

* gcc.dg/loop-unswitch-2.c: New test.
* gcc.dg/loop-unswitch-3.c: Likewise.
* gcc.dg/loop-unswitch-4.c: Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228599
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

--- Comment #4 from Marek Polacek  ---
Thus, not a dup of PR65496.

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread chengniansun at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

Chengnian Sun  changed:

   What|Removed |Added

 CC||chengniansun at gmail dot com

--- Comment #2 from Chengnian Sun  ---
(In reply to Marek Polacek from comment #1)
> The backtrace looks the same as in PR65496.

Is it related to this recently fixed bug?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67778

[Bug target/68908] inefficient code for _Atomic operations

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

Jakub Jelinek  changed:

   What|Removed |Added

 Target|powerpc64   |
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||jakub at gcc dot gnu.org,
   ||jsm28 at gcc dot gnu.org,
   ||mpolacek at gcc dot gnu.org,
   ||rth at gcc dot gnu.org
Summary|inefficient code for an |inefficient code for
   |atomic preincrement on  |_Atomic operations
   |powerpc64le |
 Ever confirmed|0   |1

--- Comment #2 from Jakub Jelinek  ---
Doesn't seem to be ppc64le specific in any way, and doesn't affect just
preincrement.
Try:
typedef _Atomic int AI;
AI i;

void
fn1 (AI * ai)
{
  ++*ai;
}

void
fn2 (AI * ai)
{
  (*ai)++;
}

void
fn3 (AI * ai)
{
  *ai += 6;
}

void
fn4 (void)
{
  ++i;
}

void
fn5 (void)
{
  i++;
}

void
fn6 (void)
{
  i += 2;
}
and you'll see even on x86_64-linux that all the sequences use the generic CAS
instructions instead of __atomic_fetch_add etc.

The comment above build_atomic_assign even says this:
"Also note that the compiler is simply issuing the generic form of the atomic
operations."

So, the question is, should we add smarts to the FE to optimize the cases
already when emitting them (this would be similar to what omp-low.c does when
expanding #pragma omp atomic, see:
  /* When possible, use specialized atomic update functions.  */
  if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
  && store_bb == single_succ (load_bb)
  && expand_omp_atomic_fetch_op (load_bb, addr,
 loaded_val, stored_val, index))
return;
), or should we add some pattern matching in some pass that would try to detect
these rather complicated patterns like:
  <bb 2>:
  _5 = __atomic_load_4 (ai_3(D), 5);
  _6 = (int) _5;
  D.1768 = _6;

  <bb 3>:
  # prephitmp_17 = PHI <_6(2), pretmp_16(4)>
  _9 = prephitmp_17 + 1;
  _10 = (unsigned int) _9;
  _12 = __atomic_compare_exchange_4 (ai_3(D), &D.1768, _10, 0, 5, 5);
  if (_12 != 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:
  pretmp_16 = D.1768;
  goto <bb 3>;

(with the casts in there optional) and convert those to the more efficient
__atomic_* calls if possible?  Note one issue is that the pattern involves
non-SSA loads/stores (the D.1768 var above) and we'd need to prove that the var
is used only in those two places and nowhere else.
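
For comparison, here is a rough sketch of the two shapes at the source level,
written with the GCC __atomic builtins. The function names are hypothetical,
and unsigned int is used instead of _Atomic int so the sketch itself has no
signed-overflow question; memory order __ATOMIC_SEQ_CST corresponds to the 5
in the dump above:

```c
#include <assert.h>

/* Generic form: load, then retry a compare-and-swap until it succeeds.
   Roughly the shape of the load / compare_exchange loop in the dump. */
unsigned int increment_with_cas_loop(unsigned int *p)
{
  assert(p != 0);
  unsigned int expected = __atomic_load_n(p, __ATOMIC_SEQ_CST);
  unsigned int desired;
  do
    desired = expected + 1;   /* on failure, expected was refreshed */
  while (!__atomic_compare_exchange_n(p, &expected, desired, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
  return desired;
}

/* Specialized form: a single read-modify-write builtin. */
unsigned int increment_with_add_fetch(unsigned int *p)
{
  assert(p != 0);
  return __atomic_add_fetch(p, 1, __ATOMIC_SEQ_CST);
}
```

Both return the incremented value; the second gives the backend the chance to
emit one fetch-and-add style instruction instead of a retry loop.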

[Bug c++/21802] Two-stage name lookup fails for operators

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21802

Paolo Carlini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |6.0

--- Comment #7 from Paolo Carlini  ---
Fixed then.

[Bug testsuite/68629] FAIL: c-c++-common/attr-simd-3.c

2015-12-15 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68629

--- Comment #5 from Christophe Lyon  ---
After discussion on IRC, it seems better to keep your patch as-is, since
cilk-plus is not supported on arm anyway.

[Bug target/68910] SPARC/cypress: Poor code generation, huge stack frame

2015-12-15 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68910

--- Comment #1 from Sebastian Huber  ---
Code generation for leon3 is also quite bad.

[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions

2015-12-15 Thread paolo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506

--- Comment #8 from paolo at gcc dot gnu.org  ---
Author: paolo
Date: Tue Dec 15 10:18:13 2015
New Revision: 231646

URL: https://gcc.gnu.org/viewcvs?rev=231646&root=gcc&view=rev
Log:
2015-12-15  Paolo Carlini  

PR c++/63506
* g++.dg/cpp0x/pr63506-1.C: New.
* g++.dg/cpp0x/pr63506-2.C: Likewise.

Added:
trunk/gcc/testsuite/g++.dg/cpp0x/pr63506-1.C
trunk/gcc/testsuite/g++.dg/cpp0x/pr63506-2.C
Modified:
trunk/gcc/testsuite/ChangeLog

[Bug c++/63506] GCC deduces wrong return type of operator*() inside template functions

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63506

Paolo Carlini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC|paolo.carlini at oracle dot com|
 Resolution|--- |FIXED
   Target Milestone|--- |6.0

--- Comment #9 from Paolo Carlini  ---
Done.

[Bug debug/68909] [6 Regression] ICE on valid code at -O3 on x86_64-linux-gnu in maybe_record_trace_start, at dwarf2cfi.c:2297

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68909

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
I think this testcase just shows that the PR67778 fix is insufficient.
We again have a complex cfg full of various loops, followed by a single bb
(bb10 in this case) that needs frame pointer.
Again, we see:
Attempting shrink-wrapping optimization.
Block 10 needs the prologue.
After wrapping required blocks, PRO is now 10
Avoiding non-duplicatable blocks, PRO is now 10
Bumping back to anticipatable blocks, PRO is now 6
where putting prologue at the entry of bb10 is fine, but putting it at the
entry of bb6 (shrink-wrapping actually creates bb11 with the prologue and
redirects edges from bb8 and bb5 to the new bb11 and bb11 then falls through to
bb6) is wrong.  While bb5 is only reachable from bbs before the prologue, so
that is fine, bb8 is reachable both from bb2 (i.e. from bbs before the
prologue), but also from bb9, which is dominated by bb6.  So, by incorrectly
putting the prologue at the start of bb6 (well, that bb self-loops, so it is
put on the other edges), we then can take path from ENTRY -> bb2 -> bb3 -> bb5
-> bb11[prologue] -> bb6 -> bb7 -> bb9 -> bb8 -> bb11[prologue] and enter the
prologue 2 times (or more times).
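
The condition being violated can be phrased as "the block holding the prologue
must not lie on a cycle". A small sketch of that check follows. The edge list
used with it would contain only the edges named on the path above (the real
CFG has more), and the struct and function names are invented for
illustration, not taken from shrink-wrap.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BB 16

struct edge { int src, dest; };

/* Return true if block bb can be reached again after leaving it,
   i.e. bb lies on a cycle of the given edge list. */
bool block_on_cycle(const struct edge *edges, size_t n_edges, int bb)
{
  bool seen[MAX_BB] = { false };
  int stack[MAX_BB];
  size_t top = 0;

  assert(bb >= 0 && bb < MAX_BB);
  stack[top++] = bb;
  while (top > 0)
    {
      int cur = stack[--top];
      for (size_t i = 0; i < n_edges; i++)
        {
          if (edges[i].src != cur)
            continue;
          int next = edges[i].dest;
          assert(next >= 0 && next < MAX_BB);
          if (next == bb)
            return true;          /* came back to the starting block */
          if (!seen[next])
            {
              seen[next] = true;  /* each block is pushed at most once */
              stack[top++] = next;
            }
        }
    }
  return false;
}
```

Fed the edges 2->3, 3->5, 5->11, 11->6, 6->6, 6->7, 7->9, 9->8 and 8->11, the
check reports that bb11 (the prologue block) is on a cycle, while bb5 is not.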

[Bug rtl-optimization/67715] [6 Regression][ARM] ICE in cselib.c during reload_cse_regs

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67715

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Jakub Jelinek  ---
Assuming fixed then.

[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477

Jakub Jelinek  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jakub Jelinek  ---
Assuming fixed.

[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477

Jakub Jelinek  changed:

   What|Removed |Added

 CC||renlin at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
And presumably fixed for real with r228662 ?

[Bug libstdc++/68863] Regular expressions: Backreferences don't work in negative lookahead

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68863

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |4.9.4
  Known to fail||4.9.3, 5.3.0

[Bug c/68911] [6 Regression] wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911

--- Comment #2 from Jakub Jelinek  ---
This goes wrong during vrp1.
Analyzing # of iterations of loop 2
  exit condition [e_6, + , 1] <= 93
  bounds on difference of bases: -4294967202 ... 93
  result:
zero if e_6 > 94
# of iterations 94 - e_6, bounded by 94
looks wrong to me, e_6 as well as the additions and comparison are performed in
unsigned type, therefore 94 - e_6 is I believe not bounded by 94.  The value
ranges for e_6 clearly allow (and in the testcase are) some very large unsigned
numbers, so 94 - e_6.  If assuming the value of f is arbitrary (it is not),
then the possible values of e before entering the
while (e < 94)
  e++;
loop are either 2, 94, 0xffffffffU or 0xfffffffeU (of course f is not arbitrary
and as b and d are both 0, it will be actually 0xffffffffU each time).
But from those 4 numbers the number of iterations would be bound by 96.
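
A quick sketch of why the unsigned type matters here. The function names are
made up for illustration; the first simply runs the loop from the testcase,
and the second evaluates the "94 - e_6" expression from the dump without the
"zero if e_6 > 94" guard:

```c
#include <assert.h>

/* Count iterations of `while (e < 94) e++;` by actually running it. */
unsigned count_iterations(unsigned e)
{
  unsigned n = 0;
  while (e < 94)
    {
      e++;
      n++;
    }
  return n;
}

/* The closed form from the dump, evaluated in unsigned arithmetic
   without the guard for large e. */
unsigned unguarded_closed_form(unsigned e)
{
  return 94u - e;   /* wraps around for e > 94 */
}
```

For e = 2 both give 92. For very large unsigned values such as ~0u or ~1u the
loop runs zero times, but the unguarded expression wraps around to 95 and 96
respectively, so it exceeds 94.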

[Bug c/68911] New: wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)

2015-12-15 Thread chengniansun at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911

Bug ID: 68911
   Summary: wrong code at -Os and above on x86-64-linux-gnu (in
32- and 64-bit modes)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chengniansun at gmail dot com
  Target Milestone: ---

The current gcc trunk miscompiles the following code on x86_64-linux-gnu in
both 32- and 64-bit modes at -Os and above. 

$: gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk
--enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 6.0.0 20151214 (experimental) [trunk revision 231607] (GCC) 
$: 
$: gcc-trunk -Os -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O2 -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O3 -w -m32 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O0 -w -m32 small.c ; timeout -s 9 10 ./a.out
$: gcc-trunk -O1 -w -m32 small.c ; timeout -s 9 10 ./a.out
$: 
$: gcc-trunk -Os -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O2 -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$: gcc-trunk -O3 -w -m64 small.c ; timeout -s 9 10 ./a.out
Killed
$: 
$: cat small.c
char a;
int b, c;
short d;
int main() {
  unsigned e = 2;
  for (; c < 2; c++) {
int f = ~e / 7;
if (f)
  a = e = ~(b && d);
while (e < 94)
  e++;
  }
  return 0;
}
$:

[Bug libstdc++/68912] New: Wrong value category used in _Bind functor

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68912

Bug ID: 68912
   Summary: Wrong value category used in _Bind functor
   Product: gcc
   Version: 4.9.4
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

From https://gcc.gnu.org/ml/libstdc++/2015-12/msg00035.html

The _Bind class function-call operator looks something like this:
template<typename... _Args, typename _Result
= decltype( std::declval<_Functor>()(_Mu<_Bound_args>()(
std::declval<_Bound_args&>(), std::declval<tuple<_Args...>&>() )... )
)>
_Result operator()(_Args&&... __args) { ... }
The problem is that std::declval returns an rvalue reference, but the
functor is invoked in an lvalue context. As a result, the following
(valid) code will fail to compile:
#include <functional>
struct B {};
struct C {};
struct A {
B operator()(int, double, char) & { return B(); }
C operator()(int, double, char) && {return C(); }
};
int main() {
A a;
auto bound = std::bind(a, 5, 4.3, 'c');
auto res = bound();
}

[Bug c++/63628] [c++1y] cannot use decltype on captured arg-pack

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63628

--- Comment #3 from Paolo Carlini  ---
The second and third variants work in mainline.

[Bug c++/68071] Generic lambda variadic argument pack cannot be empty

2015-12-15 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68071

Paolo Carlini  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC|vittorio.romeo at outlook dot com  |
 Ever confirmed|0   |1

[Bug tree-optimization/68862] [6 Regression] g++.dg/torture/pr59163.C FAILs with -flive-range-shrinkage

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68862

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||jakub at gcc dot gnu.org,
   ||uros at gcc dot gnu.org,
   ||vmakarov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Jakub Jelinek  ---
Started with r229086.
That said, I think it looks like an i386 backend problem.
I believe for pre-AVX we rely on unaligned loads/stores to be done with unspecs
(UNSPEC_LOADU/UNSPEC_STOREU) and that way make sure those don't leak into
arithmetic instructions, which pre-AVX can't handle unaligned memory operands.
But on this testcase those aren't used, because the load and store aren't
performed in some vector mode, but in TImode instead (as that is the mode of
the structure).  So we have:
(insn 6 2 8 2 (set (reg:TI 90 [ *a_4(D) ])
(mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32])) pr68862.c:15 84
{*movti_internal}
 (expr_list:REG_EQUIV (mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32])
(nil)))
(insn 8 6 9 2 (set (reg:V4SF 92)
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2  S16 A128]))
pr68862.c:17 1221 {*movv4sf_internal}
 (expr_list:REG_EQUIV (const_vector:V4SF [
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
])
(nil)))
(insn 9 8 12 2 (set (reg:V4SF 91 [ vect__7.7 ])
(mult:V4SF (reg:V4SF 92)
(subreg:V4SF (reg:TI 90 [ *a_4(D) ]) 0))) pr68862.c:17 1436
{*mulv4sf3}
 (expr_list:REG_DEAD (reg:V4SF 92)
(expr_list:REG_DEAD (reg:TI 90 [ *a_4(D) ])
(nil))))
(insn 12 9 17 2 (set (mem:TI (reg/v/f:DI 89 [ a ]) [1 *a_4(D)+0 S16 A32])
(subreg:TI (reg:V4SF 91 [ vect__7.7 ]) 0)) pr68862.c:18 84
{*movti_internal}
 (expr_list:REG_DEAD (reg:V4SF 91 [ vect__7.7 ])
(expr_list:REG_DEAD (reg/v/f:DI 89 [ a ])
(nil))))
in *.ira, which is still not invalid according to the current rules, but then
LRA changes it into:
(insn 8 6 9 2 (set (reg:V4SF 21 xmm0 [92])
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2  S16 A128]))
pr68862.c:17 1221 {*movv4sf_internal}
 (expr_list:REG_EQUIV (const_vector:V4SF [
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
(const_double:SF 6.0e+0 [0x0.cp+3])
])
(nil)))
(insn 9 8 12 2 (set (reg:V4SF 21 xmm0 [orig:91 vect__7.7 ] [91])
(mult:V4SF (reg:V4SF 21 xmm0 [92])
(mem:V4SF (reg/v/f:DI 5 di [orig:89 a ] [89]) [1 *a_4(D)+0 S16
A32]))) pr68862.c:17 1436 {*mulv4sf3}
 (nil))
(insn 12 9 17 2 (set (mem:TI (reg/v/f:DI 5 di [orig:89 a ] [89]) [1 *a_4(D)+0
S16 A32])
(reg:TI 21 xmm0 [orig:91 vect__7.7 ] [91])) pr68862.c:18 84
{*movti_internal}
 (nil))

Not sure what to do about this though, most of the SSE* arithmetic instructions
use nonimmediate_operand or similar predicates, we'd have to switch all of them
to use some other predicate that for pre-AVX would disallow misaligned_operand.

[Bug target/24012] [4.9/5/6 regression] #define _POSIX_C_SOURCE breaks #include <iostream>

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24012

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|5.4 |7.0

--- Comment #18 from Jonathan Wakely  ---
The (fairly large) changes needed to fix this didn't happen for gcc 5 or gcc 6;
adjusting target milestone.

[Bug tree-optimization/68862] [6 Regression] g++.dg/torture/pr59163.C FAILs with -flive-range-shrinkage

2015-12-15 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68862

--- Comment #3 from Zdenek Sojka  ---
(In reply to Jakub Jelinek from comment #2)
> Started with r229086.
> That said, I think it looks like an i386 backend problem.

True, I have 7 FAILs of pr59163.C on x86 (x86_64 and *x32), but none on other
architectures.

[Bug middle-end/68870] [6 Regression] ICE on valid code at -O1, -O2 and -O3 on x86_64-linux-gnu

2015-12-15 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68870

--- Comment #8 from Arseny Solokha  ---
I believe the following reproducer is for the issue reported here. Its further
minimization yields backtrace listed in #c0.

The only difference is that w/ the following not minimized snippet gcc ICEs in
tree_nop_conversion_p(tree_node const*, tree_node const*) when compiling it at
-O1:

int w3, ao, k9, nl;

static int
oy(void)
{
  static int pe;
  int ht;
  for (ao = 0; ao < 1; ++ao)
for (ht = 0; ht < 1; ++ht)
  for (w3 = 0; w3 < 1; ++w3)
for (k9 = 0; k9 < 1; ++k9)
  if (pe < 1)
return 0;
  return (ht > 1) || nl;
}

int
ct(void)
{
  return oy();
}

[Bug target/66171] [6 Regression]: gcc.target/cris/biap.c

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66171

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P4
 CC||jakub at gcc dot gnu.org

[Bug c++/68903] missing default initialization of member when combined with virtual inheritance

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68903

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||wrong-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
Summary|missing default |missing default
   |initialization of member|initialization of member
   |when combined with virtual  |when combined with virtual
   |imheritance |inheritance
 Ever confirmed|0   |1
  Known to fail||4.9.3, 5.3.0, 6.0

[Bug target/68896] [ARM] target attribute ignored

2015-12-15 Thread chrbr at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68896

--- Comment #2 from chrbr at gcc dot gnu.org ---
Currently not a bug, or rather implementation specified.

According to the documentation

6.61.15 Function Specific Option Pragmas

#pragma GCC target ("string"...)
...
Each function that is defined after this point is as if
attribute((target("STRING"))) was specified for that function

So here we have 

#pragma GCC target ("fpu=vfp")
...
int8x8_t __attribute__ ((target("fpu=neon"))) my

so "my" is defined as if attribute((target("fpu=vfp"))) was specified.

Now, IMHO this is not intuitive: since the target attribute has a smaller
scope, it should have a higher priority. And the doc doesn't say whether the
target attribute is inserted before or after the existing ones in case of
conflict.
So literally not a bug, but I'd like to specify the order of insertion to solve
your current issue.

[Bug ipa/66616] [4.9/5/6 regression] fipa-cp-clone ignores thunk

2015-12-15 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66616

--- Comment #12 from Martin Jambor  ---
No, I'm still in the process of testing a slightly modified patch for 4.9.

[Bug c++/68903] missing default initialization of member when combined with virtual imheritance

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68903

Jonathan Wakely  changed:

   What|Removed |Added

   Severity|blocker |normal

[Bug c/68911] [6 Regression] wrong code at -Os and above on x86-64-linux-gnu (in 32- and 64-bit modes)

2015-12-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68911

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||amker at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
   Target Milestone|--- |6.0
Summary|wrong code at -Os and above |[6 Regression] wrong code
   |on x86-64-linux-gnu (in 32- |at -Os and above on
   |and 64-bit modes)   |x86-64-linux-gnu (in 32-
   ||and 64-bit modes)
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Started with r224020.

[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388

2015-12-15 Thread renlin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477

--- Comment #7 from Renlin Li  ---
(In reply to Jakub Jelinek from comment #4)
> The ICE has been on
> (insn 105 746 971 5 (parallel [
> (set (reg:V16QI 60 d22 [720])
> (unspec:V16QI [
> (reg:V16QI 60 d22 [720])
> (reg:V16QI 60 d22 [720])
> ] UNSPEC_VTRN1))
> (set (reg:V16QI 60 d22 [720])
> (unspec:V16QI [
> (reg:V16QI 60 d22 [720])
> (reg:V16QI 60 d22 [720])
> ] UNSPEC_VTRN2))
> ]) pr67477.c:63 1972 {*neon_vtrnv16qi_insn}
>  (nil))
> which was clearly invalid RTL, multiple sets of the same register.  The insn
> was still ok in the *.ira dump and broken in *.reload dump.
> (define_insn "*neon_vtrn_insn"
>   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
> (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
>   (match_operand:VDQW 3 "s_register_operand" "2")]
>  UNSPEC_VTRN1))
>(set (match_operand:VDQW 2 "s_register_operand" "=w")
>  (unspec:VDQW [(match_dup 1) (match_dup 3)]
>  UNSPEC_VTRN2))]
>   "TARGET_NEON"
>   "vtrn.\t%0, %2"
>   [(set_attr "type" "neon_permute")]
> doesn't look like a target bug that would allow 2 same set destinations.

That's exactly what I have observed. r228662 fixes that by adding an early-clobber
modifier to the operand, so that the register allocator can assign a different register.

[Bug c++/68905] [DR496] __is_trivially_copyable returns True for volatile class types.

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68905

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
Summary|__is_trivially_copyable |[DR496]
   |returns True for volatile   |__is_trivially_copyable
   |class types.|returns True for volatile
   ||class types.
 Ever confirmed|0   |1

--- Comment #1 from Jonathan Wakely  ---
This is http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#496

Moved to DR at the April, 2013 meeting.

[Bug c++/68819] Invalid "-Wmisleading-indentation" warning if location_t >=LINE_MAP_MAX_LOCATION_WITH_COLS

2015-12-15 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68819

Markus Trippelsdorf  changed:

   What|Removed |Added

 CC||trippels at gcc dot gnu.org

--- Comment #8 from Markus Trippelsdorf  ---
Another similar example:

int main() {
  int i = 0;
  do i++; while (i < 3);
}

[Bug c/68845] -Werror=array-bounds=[12] doesn't turn warning into error

2015-12-15 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68845

--- Comment #4 from Manuel López-Ibáñez  ---
(In reply to Franz Sirl from comment #3)
> Created attachment 37035 [details]
> Alias -Warray-bounds to Warray-bounds=
> 
> Tentative patch, no regressions. Please commit if OK, I don't have valid
> credentials anymore.

I cannot approve your patch, and the people who can, probably do not read this
report. The best chance to get a patch approved is to send it to
gcc-patc...@gcc.gnu.org, CCing people who can review it (in this case,
middle-end people, see MAINTAINERS), and explaining how you did bootstrap &
regression testing.

[Bug c++/58796] throw nullptr not caught by catch(type*)

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58796

--- Comment #9 from Jonathan Wakely  ---
Yes, it's on my list. That's why I changed the target milestone to 6.0 a week
ago.

[Bug libstdc++/68912] Wrong value category used in _Bind functor

2015-12-15 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68912

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2015-12-15
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug c/68908] inefficient code for _Atomic operations

2015-12-15 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68908

--- Comment #8 from joseph at codesourcery dot com  ---
I'm fine with making the front end smarter.  Note that if either side of 
the assignment is of floating-point type, you need to keep the existing 
logic; if you're adding to / subtracting from a pointer, you need to 
ensure the multiplication by the size of the pointer target type still 
occurs; and if the arithmetic operation might be sanitized, you probably 
need to keep the existing logic as well (but otherwise, if the 
__atomic_fetch_* operations never have undefined overflow, it should be 
safe to do the operation in the type of the LHS).

[Bug tree-optimization/63185] Improve DSE with branches

2015-12-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63185

--- Comment #6 from Marc Glisse  ---
In addition to the issues already described, it seems that we generate better
code if I replace the VLAs with calls to alloca. Indeed, we assume that alloca
returns 16-aligned memory, while with __builtin_alloca_with_align(..., 64), we
don't seem to have code to turn it into __builtin_alloca_with_align(..., 128)
so we could avoid all the loop adjustment code.

[Bug libstdc++/68925] New: uniform_int_distribution needs not to be thread_local in std::experimental::randint

2015-12-15 Thread lichray at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68925

Bug ID: 68925
   Summary: uniform_int_distribution needs not to be thread_local
in std::experimental::randint
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: lichray at gmail dot com
  Target Milestone: ---

libstdc++'s uniform_int_distribution is stateless, thus just

  return _Dist(a, b)(_S_randint_engine());

will do the work and produces a more compact binary.
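
A minimal sketch of that suggestion, for illustration only. The names randint_engine and randint below are hypothetical stand-ins, not libstdc++'s actual internals (_S_randint_engine and _Dist are the names used in the report). It assumes, as the report states, that the distribution carries no state worth caching, so only the engine is kept per thread.

```cpp
#include <random>
#include <type_traits>

// Hypothetical stand-in for the library's per-thread engine accessor.
inline std::default_random_engine& randint_engine()
{
    static thread_local std::default_random_engine eng{std::random_device{}()};
    return eng;
}

// Hypothetical randint: builds a temporary distribution on every call
// instead of keeping a thread_local one around.
template <typename IntType>
IntType randint(IntType a, IntType b)
{
    static_assert(std::is_integral<IntType>::value,
                  "randint requires an integral type");
    using Dist = std::uniform_int_distribution<IntType>;
    return Dist(a, b)(randint_engine());
}
```

Calling randint(1, 6) returns a value in the closed range [1, 6]; whether the binary actually gets smaller would have to be measured.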

[Bug inline-asm/10396] Constraint alternatives cause error " `asm' operand requires impossible reload"

2015-12-15 Thread bernd.edlinger at hotmail dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10396

Bernd Edlinger  changed:

   What|Removed |Added

 CC||bernd.edlinger at hotmail dot de

--- Comment #23 from Bernd Edlinger  ---
(In reply to David from comment #22)
> Despite the impression you may get from comments 17-21, gcc DOES support
> multi-alternatives with inline asm (see
> https://gcc.gnu.org/ml/gcc/2015-10/msg00249.html).
> 
> I do not have an arm build with which to test, but using 5.2 on x64, the
> samples in this bug do not produce errors.  Perhaps in the 7-12 years since
> they were added, something got fixed?  Or maybe this problem is
> platform-specific.

You can build a cross-compiler out of nothing, if you want.

cd binutils-build-arm
../binutils-2.25.1/configure --prefix=../arm-eabi --target=arm-unknown-eabi
make && make install
cd ../gcc-build-arm
../gcc-trunk/configure --prefix=/home/ed/gnu/arm-eabi --target=arm-unknown-eabi \
--enable-languages=c --disable-libssp
make && make install

[Bug lto/68799] lto ICE on powerpc64le-linux-gnu building python 2.7.x

2015-12-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68799

Bill Schmidt  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wschmidt at gcc dot gnu.org
   Target Milestone|--- |6.0

[Bug target/61298] redundant compare instructions for powerpc64

2015-12-15 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61298

Peter Bergner  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #7)
> Fixed (or hidden) on trunk with r222855.

Given that, I'm going to mark this as fixed.

[Bug c++/68929] New: GCC hangs in nested template instantiations even after static_assert fails.

2015-12-15 Thread eric at efcs dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68929

Bug ID: 68929
   Summary: GCC hangs in nested template instantiations even after
static_assert fails.
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: eric at efcs dot ca
  Target Milestone: ---

GCC currently hangs when compiling the attached reproducer. The reproducer is a
stripped-down libc++ test that ensures that "std::make_integer_sequence" causes a static assertion.

GCC emits the assertion but then continues to run and consume more memory
until it is killed for running out of memory.

[Bug other/66250] Can't adjust complex nor decimal floating point modes

2015-12-15 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66250

Bernd Schmidt  changed:

   What|Removed |Added

 CC||bernds at gcc dot gnu.org

--- Comment #2 from Bernd Schmidt  ---
The motivation for this patch seems unclear. What is this fixing?

[Bug target/68256] [6 regression] switching constant pools to rodata sections causes go bootstrap failure.

2015-12-15 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68256

Bernd Schmidt  changed:

   What|Removed |Added

 CC||bernds at gcc dot gnu.org

--- Comment #3 from Bernd Schmidt  ---
Can this be closed?

[Bug rtl-optimization/63491] Ice in LRA with simple vector test case on power

2015-12-15 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63491

Bill Schmidt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-12-15
 CC||wschmidt at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #15 from Bill Schmidt  ---
Obviously confirmed at this point.  Vlad, do you plan to backport this to GCC
5?  We should get this closed if this is fixed.

[Bug target/68928] New: AVX loops on unaligned arrays could generate more efficient startup/cleanup code when peeling

2015-12-15 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68928

Bug ID: 68928
   Summary: AVX loops on unaligned arrays could generate more
efficient startup/cleanup code when peeling
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization, ssemmx
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: peter at cordes dot ca
  Target Milestone: ---
Target: x86-64-*-*

I have some suggestions for better code that gcc could use for the
prologue/epilogue when vectorizing loops over unaligned buffers.  I haven't
looked at gcc's code, just the output, so IDK how one might get gcc to
implement these.

-

Consider the following code:

#include 

typedef float float_align32 __attribute__ ((aligned (32)));
void floatmul_aligned(float_align32 *a) {
  for (int i=0; i<1024 ; i++)
    a[i] *= 2;
}


void floatmul(float *a) {
  for (int i=0; i<1024 ; i++)
    a[i] *= 2;
}

g++ 5.3.0 -O3 -march=sandybridge emits what you'd expect for the aligned
version: 

floatmul_aligned(float*):
        leaq    4096(%rdi), %rax
.L2:
        vmovaps (%rdi), %ymm0
        addq    $32, %rdi
        vaddps  %ymm0, %ymm0, %ymm0
        vmovaps %ymm0, -32(%rdi)
        cmpq    %rdi, %rax
        jne     .L2
        vzeroupper
        ret

*** off-topic ***

It unfortunately uses 5 uops in the loop, meaning it can only issue one
iteration per 2 clocks.  Other than unrolling, it would prob. be more efficient
to get 2.0f broadcast into %ymm1 and use vmulps (%rdi), %ymm1, %ymm0, avoiding
the separate load.

Doing the loop in reverse order, with an indexed addressing mode counting an
index down to zero, would also keep the loop overhead down to one
decrement-and-branch uop.  I know compilers are allowed to re-order memory
accesses, so I assume this would be allowed.  However, this wouldn't actually
help on Sandybridge since it seems that two-register addressing modes might not
micro-fuse on SnB-family CPUs:
(http://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes.
 Agner Fog says he tested and found 2-reg addressing modes did micro-fuse. 
Agner Fog is probably right, but IDK what's wrong with my experiment using perf
counters.)  That would make the store 2 uops.
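
For reference, the reversed loop described above would look roughly like this at the source level. This is only a sketch: floatmul_reverse is a hypothetical name, and nothing guarantees that gcc turns it into a single decrement-and-branch with an indexed addressing mode.

```cpp
#include <cstddef>

// Same work as floatmul(), but walking the array from the end so that the
// loop counter doubles as the index and the exit test is "reached zero".
void floatmul_reverse(float *a)
{
    for (std::size_t i = 1024; i != 0; --i)
        a[i - 1] *= 2;
}
```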


*** back on topic ***

Anyway, that wasn't even what I meant to report.  The unaligned case peels off
the potentially-unaligned start/end iterations, and unrolls them into a giant
amount of code.  This is unlikely to be optimal outside of microbenchmarks,
since CPUs with a uop-cache suffer from excessive unrolling.

floatmul(float*):
        movq    %rdi, %rax
        andl    $31, %eax
        shrq    $2, %rax
        negq    %rax
        andl    $7, %eax
        je      .L12
        vmovss  (%rdi), %xmm0
        vaddss  %xmm0, %xmm0, %xmm0
        vmovss  %xmm0, (%rdi)
        cmpl    $1, %eax
        je      .L13
        vmovss  4(%rdi), %xmm0
        vaddss  %xmm0, %xmm0, %xmm0
        vmovss  %xmm0, 4(%rdi)
        cmpl    $2, %eax
        je      .L14
        vmovss  8(%rdi), %xmm0
        ...

        repeated up to  cmpl    $6, %eax
        ...
        some loop setup
.L9:
        vmovaps (%rcx,%rax), %ymm0
        addl    $1, %edx
        vaddps  %ymm0, %ymm0, %ymm0
        vmovaps %ymm0, (%rcx,%rax)
        addq    $32, %rax
        cmpl    %esi, %edx
        jb      .L9

        ...
        another fully-unrolled up-to-7 iteration cleanup loop
Notice that the vectorized part of the loop now has 6 uops.  (Or 7, if the
store can't micro-fuse.)  So gcc is even farther from getting this loop to run
at one cycle per iteration.  (Which should be possible on Haswell.  On SnB/IvB
(and AMD Bulldozer-family), a 256b store takes two cycles anyway.)


Is there any experimental evidence that fully unrolling to make this much code
is beneficial?

The most obvious way to improve on this would be to use a 128b xmm vector for
the first 4 iterations of the prologue/epilogue loops.

Even simply not unrolling the 7-iteration alignment loops might be a win. 
Every unrolled iteration still has a compare-and-branch.  By counting down to
zero, the loop could have the same overhead.  All that changes is branch
prediction (one taken branch and many not-taken, vs. a single loop branch taken
n times.)
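
To make the non-unrolled alternative concrete, here is a source-level sketch of the same peeling scheme with rolled startup and cleanup loops. The names peel_count and floatmul_rolled are hypothetical, the explicit length parameter is added for illustration, and the peel computation follows the and/shr/neg/and sequence in the asm above. It assumes the pointer is at least 4-byte aligned; what gcc would emit for it is not something this sketch claims.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of leading floats to handle one at a time so
// that a + peel is 32-byte aligned (assumes a is at least 4-byte aligned).
inline std::size_t peel_count(const float *a)
{
    // Floats past the previous 32-byte boundary: (addr & 31) >> 2.
    std::uintptr_t past = (reinterpret_cast<std::uintptr_t>(a) & 31) >> 2;
    // Negate and mask to get the floats remaining to the next boundary.
    return static_cast<std::size_t>((0 - past) & 7);
}

// Hypothetical rolled variant of floatmul() with an explicit length.
void floatmul_rolled(float *a, std::size_t n)
{
    std::size_t peel = peel_count(a);
    if (peel > n)
        peel = n;

    std::size_t i = 0;
    for (; i < peel; ++i)              // rolled alignment prologue
        a[i] *= 2;

    // Whole groups of 8 floats; a + i is 32-byte aligned in this loop.
    std::size_t body_end = peel + ((n - peel) & ~static_cast<std::size_t>(7));
    for (; i < body_end; i += 8)
        for (std::size_t j = 0; j < 8; ++j)
            a[i + j] *= 2;

    for (; i < n; ++i)                 // rolled cleanup
        a[i] *= 2;
}
```

The prologue and cleanup loops each run at most 7 times, so the trade-off is purely code size and branch prediction against loop overhead, as discussed above.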

AVX introduces a completely different way to handle this, though: VMASKMOVPS is
usable now, since it doesn't have the non-temporal hint that makes the SSE
version of it nearly useless.  According to Agner Fog's insn tables, vmaskmovps
%ymm, %ymm, m256 is only 4 uops, and has a throughput of one per 2 cycles
(SnB/IvB/Haswell).  It's quite slow (as a store) on AMD Bulldozer-family CPUs,
though, so this might only be appropriate with -mtune=something other than AMD.

The trouble is turning a misalignment count into a mask.  Most of the useful
instructions (like PSRLDQ to use on a 

[Bug debug/68904] DWARF for class ios_base says it's a declaration

2015-12-15 Thread ivan.soleimanipour at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68904

ivan.soleimanipour at oracle dot com changed:

   What|Removed |Added

 CC||ivan.soleimanipour at oracle dot com

--- Comment #5 from ivan.soleimanipour at oracle dot com ---
I see now why Andrew was asking the questions he was asking.

What we failed to notice is that the definition of ios_base in t.o
is abbreviated. It contains only nested classes, 'static const' members and
typedefs. There is no member function or data member information.
There is a more complete definition in libstdc++.so.6.0.18.

So now the questions become:
- How does gcc decide to emit this abbreviated form?
- Why is it then not _fully_ abbreviated? Why bother with the typedefs and
such?

FWIW -fno-eliminate-unused-debug-types seems to make no difference.
