Re: [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Ludovic Courtès
Hi Shea,

Shea Levy skribis:

> Unlike the traditional approach of installing system libraries into one
> central location like /usr/{lib,include}, the nix package manager [1]
> installs each package into its own prefix
> (e.g. /nix/store/mn9kqag3d24v6q41x747zd7n5qnalch7-zlib-1.2.8-dev). Moreover,
> each package is built in its own environment determined from its
> explicitly listed dependencies, regardless of what else is installed on
> the system. Because not all package build scripts properly respect
> CFLAGS etc., we currently wrap the compiler [2] to respect custom
> environment variables like NIX_CFLAGS_COMPILE, so the build of a
> package that depends on zlib and Xlib might have NIX_CFLAGS_COMPILE set
> to "-isystem 
> /nix/store/bl0rz2xinsm9yslghd7n5vaba86zxknh-libX11-1.6.3-dev/include -isystem 
> /nix/store/mn9kqag3d24v6q41x747zd7n5qnalch7-zlib-1.2.8-dev/include".
>
> Unfortunately, as you can see if you click through the link or look
> through the git history, the wrapper is quite complex (frankly, hacky)

> [2]: 
> https://github.com/NixOS/nixpkgs/blob/8cbdd9d0c290e294a9d783c8868e738db05c9ce2/pkgs/build-support/cc-wrapper/cc-wrapper.sh

Guix avoids the compiler wrapper altogether like this:

  • We use C_INCLUDE_PATH, LIBRARY_PATH, and friends.

  • We have a simple linker wrapper aimed at adding -Wl,-rpath flags.
    The comment in that file explains why the other options considered
    were unsuitable.

  • We modify the built-in “lib” spec of GCC to add the necessary -L and
    -rpath flags.

  • Likewise, we tell Clang where to find libc and friends.

This is not too intrusive and more robust than wrapping everything.
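
To make the first bullet a bit more concrete, here is a minimal sketch of how a build environment might compute the value of such a variable from per-package prefixes. The helper name join_search_path is hypothetical and is not taken from Guix or Nix; the only assumption about GCC is that C_INCLUDE_PATH and LIBRARY_PATH are colon-separated directory lists on POSIX hosts.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Guix or Nix): append `subdir` to every
// package prefix and join the results with ':', the separator used by
// C_INCLUDE_PATH and LIBRARY_PATH on POSIX hosts.
std::string join_search_path(const std::vector<std::string> &prefixes,
                             const std::string &subdir)
{
  std::string result;
  for (const std::string &prefix : prefixes)
    {
      if (!result.empty())
        result += ':';
      result += prefix + "/" + subdir;
    }
  return result;
}
```

The result for "include" would be exported as C_INCLUDE_PATH and the result for "lib" as LIBRARY_PATH before the compiler is started, so no per-invocation flags are needed.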

I suppose GCC and Clang could facilitate this by providing configure
options to augment the “lib” spec, specify the location of libc alone,
or something along these lines.

Thoughts?

Ludo’.


Re: [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Shea Levy
Hi Ludo’,

Your patches look good! My biggest concern is how the ld wrapper behaves
in the presence of response files. Have you tested that?

Thanks,
Shea





C++17 std::launder and aliasing

2016-10-18 Thread Jakub Jelinek
Hi!

http://wg21.link/p0137
adds std::launder which is supposed to be some kind of aliasing optimization
barrier.

What is unclear to me is if we really need compiler support for that.
I have unfortunately not found many examples:

http://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder
mentions something like:
#include <new>
int
foo ()
{
  struct X { const int n; };
  union U { X x; float f; };
  U u = {{ 1 }};
  int a = u.x.n;
  X *p = new (&u.x) X {2};
  int b = u.x.n; // UB, needs std::launder()
  return a + b;
}
but g++ handles it as returning 3 even without that.
So, do we need to do anything here even in the current gcc aliasing model,
which is very permissive (my understanding is that usually we treat all
writes as possibly placement new-ish changes)?
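
For comparison, a minimal sketch of the well-defined variant, assuming a C++17 compiler (-std=c++17). The function name and the use of a raw byte buffer instead of the union are choices made for this illustration only; they are not from the paper or the Stack Overflow post.

```cpp
#include <new>

struct X { const int n; };

// Construct two X objects one after the other in the same storage and read
// the const member of each.
int sum_old_and_new()
{
  alignas(X) unsigned char storage[sizeof(X)];

  X *first = new (storage) X{1};
  int a = first->n;                 // reads the first object

  new (storage) X{2};               // reuses the storage; because X has a
                                    // const member, `first` must not be used
                                    // to reach the new object (C++17 rules)

  // std::launder yields a pointer that does refer to the new object.
  int b = std::launder(reinterpret_cast<X *>(storage))->n;
  return a + b;
}
```

Here the second read goes through std::launder rather than through a pointer or name that referred to the old object, which is the situation the paper is trying to make expressible.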

Then I found something like:
https://groups.google.com/a/isocpp.org/forum/#!msg/std-discussion/XYvVlTc3-to/HbhebSRnAgAJ
which of course doesn't work with -flifetime-dse=2, I'd strongly hope that
all those attempts there are UB even with std::launder.

Adding __builtin_launder as void * -> void * builtin (ECF_CONST) or perhaps
typegeneric one that returns the same pointer as given to it (and not
teaching alias analysis about what that builtin does) is certainly possible,
the question is how to expand it at RTL time (does it also need to be some
kind of opt barrier, say like __asm ("" : "+g" (ptr));, or not)?
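
To illustrate what such a barrier looks like at the source level, here is a small sketch using the GNU extended asm syntax quoted above. The wrapper name opaque_pointer is hypothetical, and this is not a claim about how __builtin_launder would actually be expanded; it only shows a function that returns its argument while the compiler has to assume the value may have been changed.

```cpp
// GNU extension (GCC and Clang): an empty asm statement that claims to read
// and write `ptr`.  No instruction is emitted, but the optimizer can no
// longer assume anything about where the returned pointer came from.
// `opaque_pointer` is a hypothetical name used only for this sketch.
template <typename T>
inline T *opaque_pointer(T *ptr)
{
  __asm__ ("" : "+g" (ptr));
  return ptr;
}
```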

Jakub


Re: C++17 std::launder and aliasing

2016-10-18 Thread Richard Biener
On Tue, Oct 18, 2016 at 1:06 PM, Jakub Jelinek  wrote:
> Hi!
>
> http://wg21.link/p0137
> adds std::launder which is supposed to be some kind of aliasing optimization
> barrier.
>
> What is unclear to me is if we really need compiler support for that.
> I have unfortunately not found many examples:
>
> http://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder
> mentions something like:
> #include <new>
> int
> foo ()
> {
>   struct X { const int n; };
>   union U { X x; float f; };
>   U u = {{ 1 }};
>   int a = u.x.n;
>   X *p = new (&u.x) X {2};
>   int b = u.x.n; // UB, needs std::launder()
>   return a + b;
> }
> but g++ handles it as returning 3 even without that.
> So, do we need to do anything here even in the current gcc aliasing model,
> which is very permissive (my understanding is that usually we treat all
> writes as possibly placement new-ish changes)?

The standard mentions that apparently const and reference "sub-objects" are
unchanging when you access them.  Crucially, they changed 3.8/1 to

The lifetime of an object o of type T ends when...

 * the storage which the object occupies is released, or is reused by
   an object that is not nested within o ([intro.object])

The subobject notion is new.  I suppose this was done to formally allow
construction of objects in char[] members w/o ending the containing
object's lifetime.  I think this is somewhat of a mistake, as obviously two
(sub-)objects can't be live at the same time at the same memory location
(which the undefined behavior above implies).

And yes, GCC doesn't need anything special as it handles sub-object
lifetime properly (any store may end it) and it doesn't exploit the
"constness" of const declared members or reference members.

> Then I found something like:
> https://groups.google.com/a/isocpp.org/forum/#!msg/std-discussion/XYvVlTc3-to/HbhebSRnAgAJ
> which of course doesn't work with -flifetime-dse=2, I'd strongly hope that
> all those attempts there are UB even with std::launder.

Obviously std::launder now invites people to invent fancy things (all UB),
similar to how reinterpret_cast<>'s name invited people to think it has
anything to do with TBAA.

std::launder is about object lifetime, nothing else IIUC.

> Adding __builtin_launder as void * -> void * builtin (ECF_CONST) or perhaps
> typegeneric one that returns the same pointer as given to it (and not
> teaching alias analysis about what that builtin does) is certainly possible,
> the question is how to expand it at RTL time (does it also need to be some
> kind of opt barrier, say like __asm ("" : "+g" (ptr));, or not)?

As said, nothing needed for the middle-end.

Richard.

> Jakub


Re: [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Ludovic Courtès
Hi!

Shea Levy  skribis:

> Your patches look good! My biggest concern is how the ld wrapper behaves
> in the presence of response files. Have you tested that?

It surely doesn’t (yet?).

However, GCC does not pass “@file” arguments when it invokes ‘ld’, and
the bug report you mentioned¹ talks about GHC invoking ‘gcc’, not ‘ld’,
so I guess it’s fine to ignore response files in the ld wrapper.

Ludo’.

¹ 
https://github.com/NixOS/nixpkgs/commit/a421e7bd4a28c69bded8b17888325e31554f61a1


Re: [cfe-dev] [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Nathan Froyd
On Tue, Oct 18, 2016 at 8:59 AM, Ludovic Courtès via cfe-dev
 wrote:
> Shea Levy  skribis:
>
>> Your patches look good! My biggest concern is how the ld wrapper behaves
>> in the presence of response files. Have you tested that?
>
> It surely doesn’t (yet?).
>
> However, GCC does not pass “@file” arguments when it invokes ‘ld’, and
> the bug report you mentioned¹ talks about GHC invoking ‘gcc’, not ‘ld’,
> so I guess it’s fine to ignore response files in the ld wrapper.

GCC will pass response files to ld when response files were used in
the invocation of GCC.
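
For reference, a minimal sketch of what handling response files in a wrapper involves. It assumes the simplest possible format (options separated by whitespace) and keeps an "@file" argument literally when the file cannot be read, which is how the @file option is documented for GCC. Quoting, backslash escapes and nested response files are deliberately left out, and the function name is hypothetical.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Simplified sketch: expand "@file" arguments one level deep.  Real
// response-file parsing also handles quoting, backslash escapes and nested
// "@file" references, which are omitted here.
std::vector<std::string>
expand_response_files(const std::vector<std::string> &args)
{
  std::vector<std::string> expanded;
  for (const std::string &arg : args)
    {
      if (arg.size() > 1 && arg[0] == '@')
        {
          std::ifstream file(arg.substr(1));
          if (file)
            {
              std::string word;
              while (file >> word)        // whitespace-separated options
                expanded.push_back(word);
              continue;
            }
          // Unreadable file: fall through and keep the argument literally.
        }
      expanded.push_back(arg);
    }
  return expanded;
}
```

A wrapper that scans for -L or -l options would run something like this first, so that options hidden inside a response file are seen as well.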

-Nathan


Re: [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-18 Thread Shea Levy
Hey Ludo’,

Amazing, more than a decade of close working with these tools and I
never knew about C_INCLUDE_PATH et al! It looks like those will solve a
huge portion of the problem.

Will look at your gcc and clang patches as well, thank you!

~Shea





Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-18 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 15:38:50 +0200, I wrote:
> On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener 
>  wrote:
> > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> >  wrote:
> > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
> > >  wrote:
> > >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> > >>  wrote:
> > >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> > >> >  wrote:
> > >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  
> > >> >> wrote:
> > >> >> > On 10/14/16 05:28, Richard Biener wrote:
> > >> >> >
> > >> >> >> The BB_VISITED flag has indeterminate state at the beginning of a 
> > >> >> >> pass.
> > >> >> >> You have to ensure it is cleared yourself.
> > >> >> >
> > >> >> >
> > >> >> > In that case the openacc (?) passes should be modified to 
> > >> >> > clear the
> > >> >> > flags at their start, rather than at their end.

This already is a "conceptual acknowledgement" of my patch, so...

> > >> > OK to commit the following?  Is such a test case appropriate (which 
> > >> > would
> > >> > have caught this issue right away), in particular the dg-final
> > >> > scan-tree-dump line?
> > >>
> > >> Ugh.  Not worse than what we do in various dwarf scanning I guess.
> > >
> > > ;-|
> > >
> > >> Doesn't failure lead to a miscompile eventually?  So you could formulate
> > >> this as a dg-do run test with a check for the desired outcome?
> > >
> > > No, unfortunately.  In this case the error is "benign" such that the
> > > OpenACC loop processing machinery will decide to not parallelize loops
> > > that ought to be parallelized.
> > 
> > So you can scan for "loop parallelized" instead?
> 
> The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
> have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
> seems likewise fragile (for false negatives), and less useful than
> scanning for the complete pattern.
> 
> > I fear your pattern
> > is quite fragile
> > to maintain over time.
> 
> Agreed -- but then, that's intentional: my idea for this new test case
> has been to have it actually verify the expected OpenACC loop processing,
> so it's clear that this pattern will need to be adjusted if changing the
> OpenACC loop processing.
> 
> > >  This won't generally cause any problem
> > > (apart from performance regression, obviously); it just caused problems
> > > in a few libgomp test cases that actually at run time test for
> > > parallelized execution -- which will/did trigger only with nvptx
> > > offloading enabled, which not too many people are testing.  The test case
> > > I propose below will trigger also for non-offloading configurations.
> 
> On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
> backslashes]' -- thanks, done.

If you don't like the test case as-is (do we need multi-line tree dump
scanning, just like we recently got for compiler diagnostics?), can I at
least commit the OpenACC loops processing fix?  Here is the latest
version, simplified after your r241296 IRA vs. BB_VISITED fixes:

commit 766cf9959b15a17e17e89a50e905b4c546893823
Author: Thomas Schwinge 
Date:   Mon Oct 17 15:33:09 2016 +0200

Clear basic block flags before using BB_VISITED for OpenACC loops processing

gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and
don't clear BB_VISITED after processing.

gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c  |  8 +++-
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 77f89d5..3ef796f 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19245,10 +19247,6 @@ oacc_loop_discovery ()
  that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
-
   return top;
 }
 
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, 

gcc-5-20161018 is now available

2016-10-18 Thread gccadmin
Snapshot gcc-5-20161018 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20161018/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 241321

You'll find:

 gcc-5-20161018.tar.bz2   Complete GCC

  MD5=a3caabb2ffaf0d52cf1cd668621b7a56
  SHA1=9d476e5a81beffccb76aaf95c97d5a12abc8aa02

Diffs from 5-20161011 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug fortran/78033] Internal Compiler Error in enforce_single_undo_checkpoint

2016-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78033

--- Comment #2 from kargl at gcc dot gnu.org ---
(In reply to kargl from comment #1)
> Reduced testcase.
> 
> function f(n, x)
>integer, intent(in) :: n 
>complex, intent(in) :: x(1:n)
>real :: f
>f = g([real(x(1:n)), aimag(x(1:n))])
> end function f
> 
> If the array sections are removed in favor of the whole array,
> the code compiles.

Even further reduction.

subroutine f(n, x)
   integer, intent(in) :: n 
   complex, intent(in) :: x(1:n)
   real :: y(2*n)
   y = [real(x(1:n)), aimag(x(1:n))]
end subroutine f

The constructor with array sections is going south.

[Bug fortran/78033] Internal Compiler Error in enforce_single_undo_checkpoint

2016-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78033

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #1 from kargl at gcc dot gnu.org ---
Reduced testcase.

function f(n, x)
   integer, intent(in) :: n 
   complex, intent(in) :: x(1:n)
   real :: f
   f = g([real(x(1:n)), aimag(x(1:n))])
end function f

If the array sections are removed in favor of the whole array,
the code compiles.

Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Prathamesh Kulkarni
On 19 October 2016 at 03:03, Jeff Law  wrote:
> On 10/17/2016 11:23 PM, Prathamesh Kulkarni wrote:
>>
>> The divmod transform isn't enabled if target supports hardware div in
>> the same or wider mode even if divmod libfunc is available for the
>> given mode.
>
> Good.  That seems like the right thing to do.
>
>> Thanks. I had erroneously assumed __udivmoddi4() was available for
>> all targets because it was defined in libgcc2.c and generated call to
>> it as "last resort" for unsigned DImode case if target didn't support
>> hardware div and divmod insn and didn't have target-specific divmod
>> libfunc for DImode. The reason why it generated undefined reference
>> on aarch64-linux-gnu was because I didn't properly check in the patch
>> if target supported hardware div, and ended up generating call to
>> __udivmoddi4() even though aarch64 has hardware div. I rectified
>> that before posting the patch.
>
> Understood.  From a design standpoint, it seems to me like the path where we
> emit a call to udivmod without knowing if it's supported by libgcc is broken.
> But if I understand correctly, that's not affected by your changes -- it's
> simply a historical poor decision.
Err no, that poor decision was entirely mine -;)  I had wrongly
assumed __udivmoddi4 to be always available
and got surprised when it gave undefined reference error on aarch64
and hence brought it up for discussion.
I removed those parts of the patch that generated call to
__udivmoddi4() before posting the patch upstream.
>
>>>
>>> I don't even think we have a way of knowing in the compiler if the
>>> target has enabled divmod support in libgcc.
>>
>> Well the arm and c6x backends register target-specific divmod libfunc
>> via set_optab_libfunc(). I suppose that's sufficient to know if
>> target has divmod enabled in libgcc ?
>
> It's probably a pretty reasonable assumption that if the target has
> registered a libfunc, then the libfunc ought to be available.
>
>>>
>>> ISTM that for now we have to limit to cases where we have a divmod
>>> insn/libcall defined.
>>
>> Indeed. The transform is enabled only if the target has hardware
>> divmod insn or divmod libfunc (in the libfunc case, no hardware div
>> insn). Please see divmod_candidate_p() in the patch for cases when
>> the transform gets enabled.
>
> Great.  Thanks.  Hoping to make some progress on the actual patch in the
> next couple days.
Thanks!

Regards,
Prathamesh
>
> jeff


[Bug fortran/68649] [6/7 Regression] note: code may be misoptimized unless -fno-strict-aliasing is used

2016-10-18 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68649

--- Comment #18 from Joost VandeVondele ---
This PR and the related PR77278 can presumably only be fixed by changing the
libgfortran ABI (at least if I understand Richard's suggestion for fixing
this), so the announced major version bump of libgfortran
(https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01376.html) could be a good
opportunity for this change. It is the major thing holding back the use of LTO
with Fortran projects, I think.

Re: [PATCH] Implement P0084R2, Emplace return type, for C++17

2016-10-18 Thread Christophe Lyon
Hi Jonathan,


On 17 October 2016 at 13:56, Jonathan Wakely  wrote:
> In C++17 the emplace_front and emplace_back members return a
> reference. There isn't a very neat way to implement this, so it's just
> lots of conditional compilation.
>
> This isn't an ABI break, because these member functions are templates
> and so the return type is part of the mangled name. We can't apply it
> retroactively to older standards though, because it breaks code that
> is valid in C++14, such as:
>
>  void f(std::vector<int>& v) { return v.emplace_back(1); }
>
>
> * doc/xml/manual/status_cxx2017.xml: Update status.
> * doc/html/*: Regenerate.
> * include/bits/deque.tcc (deque::emplace_front,
> deque::emplace_back):
> Return a reference in C++17 mode.
> * include/bits/forward_list.h (forward_list::emplace_front):
> Likewise.
> * include/bits/stl_bvector.h (vector::emplace_back): Likewise.
> * include/bits/stl_deque.h (deque::emplace_front,
> deque::emplace_back):
> Likewise.
> * include/bits/stl_list.h (list::emplace_front, list::emplace_back):
> Likewise.
> * include/bits/stl_queue.h (queue::emplace): Likewise.
> * include/bits/stl_stack.h (stack::emplace): Likewise.
> * include/bits/stl_vector.h (vector::emplace_back): Likewise.
> * include/bits/vector.tcc (vector::emplace_back): Likewise.
> * include/debug/deque (__gnu_debug::deque::emplace_front)
> (__gnu_debug::deque::emplace_back): Likewise.
> * include/debug/vector (__gnu_debug::vector::emplace_back):
> Likewise.
> * testsuite/23_containers/deque/modifiers/emplace/cxx17_return.cc:
> New.
> * testsuite/23_containers/forward_list/modifiers/
> emplace_cxx17_return.cc: New.
> * testsuite/23_containers/list/modifiers/emplace/cxx17_return.cc:
> New.
> * testsuite/23_containers/queue/members/emplace_cxx17_return.cc:
> New.
> * testsuite/23_containers/stack/members/emplace_cxx17_return.cc:
> New.
> * testsuite/23_containers/vector/bool/emplace_cxx17_return.cc: New.
> * testsuite/23_containers/vector/modifiers/emplace/cxx17_return.cc:
> New.
>
> Tested powerpc64le-linux, committed to trunk.
>
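
A short sketch of what the new return type allows, assuming the code is compiled as C++17 or later. The function name is made up for this example.

```cpp
#include <deque>
#include <string>
#include <vector>

// Requires -std=c++17 or later: emplace_back and emplace_front return a
// reference to the newly constructed element.
int emplace_and_modify()
{
  std::vector<int> v;
  int &last = v.emplace_back(1);    // before C++17 this returned void
  last += 41;                       // modifies v.back() in place

  std::deque<std::string> d;
  std::string &front = d.emplace_front("gcc");
  front += "-7";                    // modifies d.front() in place

  return v.back() + static_cast<int>(d.front().size());
}
```

In C++14 mode the two reference initializations do not compile, which is the flip side of the void-returning example quoted above.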

The new tests (except for
23_containers/forward_list/modifiers/emplace_cxx17_return.cc) fail on
arm-none-eabi using default cpu/fpu/arch/mode:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(future.o):
In function `__future_category_instance':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++11/future.cc:64:
undefined reference to `__sync_synchronize'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(locale.o):
In function `get_locale_cache_mutex':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++98/locale.cc:36:
undefined reference to `__sync_synchronize'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/src/.libs/libstdc++.a(locale_init.o):
In function `get_locale_mutex':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/src/c++98/locale_init.cc:66:
undefined reference to `__sync_synchronize'

Christophe


Re: [PATCH, libfortran] PR 48587 Newunit allocator

2016-10-18 Thread Steven Bosscher
On Thu, Oct 13, 2016 at 5:16 PM, Janne Blomqvist wrote:
> +static bool *newunits;

You could make this a bitmap (like sbitmap). A bit more code but makes
a potentially quadratic search (when opening many units) less time
consuming.
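
A rough sketch of the bitmap idea, written in C++ for brevity (libgfortran itself is C). The class and its interface are hypothetical and unrelated to sbitmap or to the actual patch; the point is only that a full word lets the search skip 64 slots with one comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap of in-use unit slots: one bit per slot, 64 per word.
class UnitBitmap
{
public:
  // Return the lowest free slot and mark it used, growing the map if needed.
  std::size_t allocate()
  {
    for (std::size_t w = 0; w < words_.size(); ++w)
      {
        if (words_[w] == ~std::uint64_t{0})
          continue;                          // 64 used slots skipped at once
        for (unsigned bit = 0; bit < 64; ++bit)
          {
            const std::uint64_t mask = std::uint64_t{1} << bit;
            if (!(words_[w] & mask))
              {
                words_[w] |= mask;
                return w * 64 + bit;
              }
          }
      }
    words_.push_back(1);                     // all full: new word, take bit 0
    return (words_.size() - 1) * 64;
  }

  // Mark a previously allocated slot as free again.
  void release(std::size_t slot)
  {
    words_[slot / 64] &= ~(std::uint64_t{1} << (slot % 64));
  }

private:
  std::vector<std::uint64_t> words_;
};
```

The sketch assumes release() is only called with slots returned by allocate(); a real implementation would also need the locking that the unit table already uses.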

Ciao!
Steven


[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #3 from Richard Biener  ---
Created attachment 39827
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39827&action=edit
untested patch

Mostly untested prototype.  For -mavx2 we get from the testcase innermost loop

.L6:
        vmovdqa     (%r9,%rdx), %ymm0
        addl        $1, %r8d
        vperm2i128  $0, %ymm0, %ymm0, %ymm0
        vpshufb     %ymm1, %ymm0, %ymm0
        vmovdqa     %ymm0, (%r9,%rdx)
        addq        $32, %rdx
        cmpl        %r11d, %r8d
        jb          .L6

with -msse4:

.L6:
        movdqa      (%rax,%rdx), %xmm0
        addl        $1, %r8d
        pshufb      %xmm1, %xmm0
        movaps      %xmm0, (%rax,%rdx)
        addq        $16, %rdx
        cmpl        %r10d, %r8d
        jb          .L6

not sure if I got the bswap permutation vector constant correct either ;) 
(quick hack)

  vect_load_dst_8.13_63 = MEM[(u32 *)vectp_b.11_61];
  load_dst_8 = *_3;
  _64 = VIEW_CONVERT_EXPR(vect_load_dst_8.13_63);
  _65 = VEC_PERM_EXPR <_64, _64, { 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1,
0 }>;
  _66 = VIEW_CONVERT_EXPR(_65);
  _13 = __builtin_bswap32 (load_dst_8);
  MEM[(u32 *)vectp_b.14_69] = _66;
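
Since the permutation constant is flagged as uncertain above, it can be checked outside the compiler. The sketch below assumes a single-input byte permute where result byte i is input byte sel[i], which is my reading of how the selector is applied; all names are made up for illustration. Under that assumption a repeating { 3, 2, 1, 0 } selector would copy the first 32-bit lane into every lane, whereas a per-lane byte swap needs { 3, 2, 1, 0, 7, 6, 5, 4, ... }.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<std::uint8_t, 16>;

// Selector that reverses the bytes inside each 32-bit lane:
// { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }.
Bytes16 bswap32_selector()
{
  Bytes16 sel{};
  for (unsigned i = 0; i < 16; ++i)
    sel[i] = static_cast<std::uint8_t>((i & ~3u) | (3u - (i & 3u)));
  return sel;
}

// Single-input byte permute: result[i] = in[sel[i]].
Bytes16 permute(const Bytes16 &in, const Bytes16 &sel)
{
  Bytes16 out{};
  for (unsigned i = 0; i < 16; ++i)
    out[i] = in[sel[i] & 15u];
  return out;
}

// Byte-swap four 32-bit values by permuting their 16 bytes.
std::array<std::uint32_t, 4>
bswap32x4(const std::array<std::uint32_t, 4> &v)
{
  Bytes16 bytes{};
  std::memcpy(bytes.data(), v.data(), sizeof bytes);
  bytes = permute(bytes, bswap32_selector());
  std::array<std::uint32_t, 4> out{};
  std::memcpy(out.data(), bytes.data(), sizeof bytes);
  return out;
}
```

Reversing the bytes of each lane in memory gives the byte-swapped value on both little- and big-endian hosts, so the check does not depend on the host byte order.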

[Bug libgcc/78017] New: weak reference usage in gthr-posix.h (__gthread*) is broken

2016-10-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78017

Bug ID: 78017
   Summary: weak reference usage in gthr-posix.h (__gthread*) is
broken
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

there seems to be no open bug for the issue discussed in
https://gcc.gnu.org/ml/libstdc++/2014-11/msg00122.html

i think libstdc++, libgfortran and libgcov are affected.

libgcc/gthr-posix.h uses weak references for the pthread api to
1) detect single-threadedness
2) use pthread api without adding -lpthread dependency.

this does not work

a) with static linking:

because 1) fails on the target:

single threaded execution can be assumed if pthread_create and
thrd_create are not referenced (on targets without libpthread
dlopen support), but the gthr logic only checks pthread_create
(on bionic) or pthread_cancel (on some targets as a misguided
attempt to avoid detecting threads because of ldpreloaded pthread
wrappers which "seem unlikely" to define pthread_cancel).

symbols required by libstdc++ may be missing because of 2),
(causing hard to debug runtime crashes):

redhat puts all of libpthread into a single .o, others use
 -Wl,--whole-archive -lpthread -Wl,--no-whole-archive
linker flags as a workaround, but this should not be needed:
if libstdc++.a semantically requires strong references then
those references must be strong (the calls may be elided
using the detection above).

b) if there is dlopen support for libpthread

then single-threadedness can change at runtime, so any check
would break code like

  std::mutex m;
  m.lock();
  dlopen(.. something that starts threads and use m ..)
  m.unlock();


various targets explicitly opt out from the weak ref hacks,
(using gcc configure time hacks), i think instead the gthr
logic should be safe by default:

assume multi-threaded execution by default and only try
to optimize the single threaded case when it is guaranteed
to be safe.
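
for illustration, a minimal sketch of the weak-reference detection pattern described above. it assumes GCC or Clang on an ELF target, and it uses a made-up symbol name instead of pthread_create or pthread_cancel so that the outcome does not depend on which libc is installed; it is not the code from gthr-posix.h.

```cpp
// GNU extension on ELF targets: an undefined weak symbol resolves to a null
// address instead of causing a link error.  `hypothetical_thread_entry` is a
// made-up name standing in for pthread_create/pthread_cancel.
extern "C" void hypothetical_thread_entry() __attribute__((weak));

// True only if some linked object or library defines the symbol.
bool thread_library_linked()
{
  return &hypothetical_thread_entry != nullptr;
}
```

the check only says whether the symbol was resolved at link or load time, which is why it is unreliable with static linking (a) and cannot account for threads that show up later via dlopen (b).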

Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 07:06 PM, Alexander Monakov wrote:

> I've just pushed two commits to the branch to fix this issue.  Before those, the
> last commit left the branch in a state where an incremental build seemed ok
> (because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
> build was broken like you've shown.  LULESH is known to work.  I also intend to
> perform a trunk merge soon.

Ok that did work, however...

>> I think before merging this work we'll need to have some idea of how well it
>> works on real-world code.
>
> This patchset and the branch lay the foundation, there's more work to be
> done, in particular on the performance improvements side. There should be
> an agreement on these fundamental bits first, before moving on to fine-tuning.


The performance I saw was lower by a factor of 80 or so compared to 
their CUDA version, and even lower than OpenMP on the host. Does this 
match what you are seeing? Do you have a clear plan how this can be 
improved?


To me this kind of performance doesn't look like something that will be 
fixed by fine-tuning; it leaves me undecided whether the chosen approach 
(what you call the fundamentals) is viable at all. Performance is still 
better than the OpenACC version of the benchmark, but then I think we 
shouldn't repeat the mistakes we made with OpenACC and avoid merging 
something until we're sure it's ready and of benefit to users.



Bernd


[Bug tree-optimization/77943] [5/6 Regression] Optimization incorrectly commons noexcept calls with non-noexcept calls

2016-10-18 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77943

--- Comment #13 from Martin Liška  ---
The replacement you described makes complete sense to me! As I mentioned
earlier, I'm not a C++ expert, so I can't come up with more counterexamples
that would be worth testing.

However, we'll fix further issues ;)

Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Richard Biener
On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
 wrote:
>
> Richard Biener writes:
>
>> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
>>  wrote:
>>> Hi,
>>>
>>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
>>>   same issue on a gcc 5.x and found that the fix didn't get backported.
>>>
>>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
>>>   backport to gcc-5-branch?
>>
>> Ok with me but please double-check there was no fallout.
>
> I bootstrapped and ran against x86_64-pc-linux again, just to be sure.
> No regressions.

I meant fallout only fixed with followup patches.  ISTR some in that area
but I might confuse it with another patch.  Marek might remember.

Richard.

> I'll run the reg tests against arm-none-eabi. Can I commit it if that
> passes?
>
> Regards
> Senthil
>>
>> Richard.
>>
>>> Regards
>>> Senthil
>>>
>>> gcc/c/ChangeLog
>>>
>>> 2016-10-17  Senthil Kumar Selvaraj  
>>>
>>>   Backport from mainline
>>> 2015-04-25  Marek Polacek  
>>> PR c/52085
>>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
>>> "mode"
>>> attribute.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2016-10-17  Senthil Kumar Selvaraj  
>>>
>>> Backport from mainline
>>> 2015-04-25  Marek Polacek  
>>> PR c/52085
>>> * gcc.dg/enum-incomplete-2.c: New test.
>>> * gcc.dg/enum-mode-1.c: New test.
>>>
>>>
>>> diff --git gcc/c/c-decl.c gcc/c/c-decl.c
>>> index d1e7444..c508e7f 100644
>>> --- gcc/c/c-decl.c
>>> +++ gcc/c/c-decl.c
>>> @@ -8050,7 +8050,7 @@ finish_enum (tree enumtype, tree values, tree 
>>> attributes)
>>>
>>>/* If the precision of the type was specified with an attribute and it
>>>   was too small, give an error.  Otherwise, use it.  */
>>> -  if (TYPE_PRECISION (enumtype))
>>> +  if (TYPE_PRECISION (enumtype) && lookup_attribute ("mode", attributes))
>>>  {
>>>if (precision > TYPE_PRECISION (enumtype))
>>> {
>>> @@ -8078,6 +8078,7 @@ finish_enum (tree enumtype, tree values, tree 
>>> attributes)
>>>TYPE_MIN_VALUE (enumtype) = TYPE_MIN_VALUE (tem);
>>>TYPE_MAX_VALUE (enumtype) = TYPE_MAX_VALUE (tem);
>>>TYPE_UNSIGNED (enumtype) = TYPE_UNSIGNED (tem);
>>> +  TYPE_ALIGN (enumtype) = TYPE_ALIGN (tem);
>>>TYPE_SIZE (enumtype) = 0;
>>>TYPE_PRECISION (enumtype) = TYPE_PRECISION (tem);
>>>
>>> diff --git gcc/testsuite/gcc.dg/enum-incomplete-2.c 
>>> gcc/testsuite/gcc.dg/enum-incomplete-2.c
>>> new file mode 100644
>>> index 000..5970551
>>> --- /dev/null
>>> +++ gcc/testsuite/gcc.dg/enum-incomplete-2.c
>>> @@ -0,0 +1,41 @@
>>> +/* PR c/52085 */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "" } */
>>> +
>>> +#define SA(X) _Static_assert((X),#X)
>>> +
>>> +enum e1;
>>> +enum e1 { A } __attribute__ ((__packed__));
>>> +enum e2 { B } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e1) == sizeof (enum e2));
>>> +SA (_Alignof (enum e1) == _Alignof (enum e2));
>>> +
>>> +enum e3;
>>> +enum e3 { C = 256 } __attribute__ ((__packed__));
>>> +enum e4 { D = 256 } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e3) == sizeof (enum e4));
>>> +SA (_Alignof (enum e3) == _Alignof (enum e4));
>>> +
>>> +enum e5;
>>> +enum e5 { E = __INT_MAX__ } __attribute__ ((__packed__));
>>> +enum e6 { F = __INT_MAX__ } __attribute__ ((__packed__));
>>> +SA (sizeof (enum e5) == sizeof (enum e6));
>>> +SA (_Alignof (enum e5) == _Alignof (enum e6));
>>> +
>>> +enum e7;
>>> +enum e7 { G } __attribute__ ((__mode__(__byte__)));
>>> +enum e8 { H } __attribute__ ((__mode__(__byte__)));
>>> +SA (sizeof (enum e7) == sizeof (enum e8));
>>> +SA (_Alignof (enum e7) == _Alignof (enum e8));
>>> +
>>> +enum e9;
>>> +enum e9 { I } __attribute__ ((__packed__, __mode__(__byte__)));
>>> +enum e10 { J } __attribute__ ((__packed__, __mode__(__byte__)));
>>> +SA (sizeof (enum e9) == sizeof (enum e10));
>>> +SA (_Alignof (enum e9) == _Alignof (enum e10));
>>> +
>>> +enum e11;
>>> +enum e11 { K } __attribute__ ((__mode__(__word__)));
>>> +enum e12 { L } __attribute__ ((__mode__(__word__)));
>>> +SA (sizeof (enum e11) == sizeof (enum e12));
>>> +SA (_Alignof (enum e11) == _Alignof (enum e12));
>>> diff --git gcc/testsuite/gcc.dg/enum-mode-1.c 
>>> gcc/testsuite/gcc.dg/enum-mode-1.c
>>> new file mode 100644
>>> index 000..a701123
>>> --- /dev/null
>>> +++ gcc/testsuite/gcc.dg/enum-mode-1.c
>>> @@ -0,0 +1,10 @@
>>> +/* { dg-do compile } */
>>> +
>>> +enum e1 { A = 256 } __attribute__((__mode__(__byte__))); /* { dg-error 
>>> "specified mode too small for enumeral values" } */
>>> +enum e2 { B = 256 } __attribute__((__packed__, __mode__(__byte__))); /* { 
>>> dg-error "specified mode too small for enumeral values" } */
>>> +
>>> +enum e3 { C = __INT_MAX__ } __attribute__((__mode__(__QI__))); /* { 

[Bug libstdc++/78015] pthread_cancel while some exception is pending results in std::terminate ()

2016-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78015

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
CCing also Richard as the author of the PR10570 changes.

Re: [patch] Fix PHI optimization issue with boolean types

2016-10-18 Thread Richard Biener
On Tue, Oct 18, 2016 at 8:35 AM, Eric Botcazou  wrote:
> Hi,
>
> this is a regression present on the mainline and 6 branch: the compiler now
> generates wrong code for the attached testcase at -O because of an internal
> conflict about boolean types.  The sequence is as follows.  In .mergephi3:
>
>   boolean _22;
>   p__enum res;
>
>   :
>   if (_22 != 0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # res_17 = PHI <2(8), 0(9), 1(10)>
>
> is turned into:
>
> COND_EXPR in block 9 and PHI in block 11 converted to straightline code.
> PHI res_17 changed to factor conversion out from COND_EXPR.
> New stmt with CAST that defines res_17.
>
>   boolean _33;
>
>   :
>   # _33 = PHI <2(8), _22(9)>
>   res_17 = (p__enum) _33;
>
> [...]
>
>   :
>   if (res_17 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _12 = res_17 == 2;
>   _13 = (integer) _12;
>
> in .phiopt1.  So boolean _33 can have value 2.  Later forwprop3 propagates _33
> into the uses of res_17:
>
>   :
>   if (_33 != 0)
> goto ;
>   else
> goto ;
>
>   :
>   _12 = _33 == 2;
>   _13 = (integer) _12;
>
> and DOM3 deduces:
>
>   :
>   _12 = 0;
>   _13 = 0;
>
> because it computes that _33 has value 1 in BB 13 since it's a boolean.
>
> The problem was introduced by the new factor_out_conditional_conversion:
>
>   /* If arg1 is an INTEGER_CST, fold it to new type.  */
>   if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
>   && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
> {
>   if (gimple_assign_cast_p (arg0_def_stmt))
> new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
>   else
> return NULL;
> }
>   else
>   return NULL;
>
> int_fits_type_p is documented as taking only INTEGER_TYPE, but is invoked
> on constant 2 and a BOOLEAN_TYPE and returns true.
>
> BOOLEAN_TYPE is special in Ada: it has precision 8 and range [0;255] so the
> outcome of int_fits_type_p is not unreasonable.  But this goes against the
> various transformations applied to boolean types in the compiler, which all
> assume that they can only take values 0 or 1.
>
> Hence the attached fix (which should be a no-op except for Ada), tested on
> x86_64-suse-linux, OK for mainline and 6 branch?

Hmm, I wonder if the patch is a good approach, as you are likely papering
over only one of possibly very many affected places (wi::fits_to_tree_p comes
immediately to my mind).  I suppose a better way would be for Ada to not
lie about those types and use INTEGER_TYPE rather than BOOLEAN_TYPE,
because BOOLEAN_TYPE types only have two values, as documented in
tree.def:

/* Boolean type (true or false are the only values).  Looks like an
   INTEGRAL_TYPE.  */
DEFTREECODE (BOOLEAN_TYPE, "boolean_type", tcc_type, 0)

There are not many references to BOOLEAN_TYPE in ada/gcc-interface
thus it shouldn't be hard to change ... (looks like Ada might even prefer
ENUMERAL_TYPE here).
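
To make the conflict concrete, here is a minimal sketch of the two possible
answers for "does constant 2 fit an Ada boolean?".  It deliberately does not
use GCC's tree API: struct toy_type and both toy_fits_* functions are
hypothetical stand-ins, and the second function only approximates the idea of
accepting just 0 and 1 for boolean types.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a type node: only the fields this sketch needs.
   All names here are hypothetical, not GCC's.  */
struct toy_type
{
  bool is_boolean;   /* models a BOOLEAN_TYPE */
  long min_value;    /* models the type's minimum value */
  long max_value;    /* models the type's maximum value */
};

/* Pure range check: a precision-8 boolean with range [0;255] accepts 2.  */
bool
toy_fits_by_range (long value, const struct toy_type *type)
{
  return value >= type->min_value && value <= type->max_value;
}

/* Range check that additionally restricts boolean types to 0 and 1.  */
bool
toy_fits_strict (long value, const struct toy_type *type)
{
  if (type->is_boolean)
    return value == 0 || value == 1;
  return toy_fits_by_range (value, type);
}
```

With an Ada-like boolean described as { true, 0, 255 }, the range-only check
accepts 2 while the strict check rejects it, which is the mismatch that lets
a later pass assume the value can only be 0 or 1.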

Thanks,
Richard.

>
> 2016-10-18  Eric Botcazou  
>
> * tree.c (int_fits_type_p): Accept only 0 and 1 for boolean types.
>
>
> 2016-10-18  Eric Botcazou  
>
> * gnat.dg/opt59.adb: New test.
> * gnat.dg/opt59_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou


[Bug target/78007] Important loop from 482.sphinx3 is not vectorized

2016-10-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78007

--- Comment #4 from Richard Biener  ---
Probably the handling should be moved after the
targetm.vectorize.builtin_vectorized_function handling to allow ARM's
builtin-bswap vectorization via vrev to apply (not sure if its permutation
handling selects vrev for a bswap permutation).

Re: [PATCH] rs6000: Fix separate shrink-wrapping for TARGET_MULTIPLE

2016-10-18 Thread Segher Boessenkool
On Tue, Oct 18, 2016 at 12:17:32AM +0100, Iain Sandoe wrote:
> > Bootstrapped and tested on powerpc64-linux {-m64,-m32}.  I'll commit it
> > if Iain's testing (on darwin) also succeeds.
> 
> thanks!
> 
> All-langs bootstrap was restored with the patch (and others in progress for 
> existing known issues); 
> 
> I can’t see any evidence of the assert firing in the test-suite, so it all 
> looks good to me (there’s quite a bit of stage-1-ish testsuite noise at 
> present, however).

Thanks for testing!  I committed the slightly simpler patch below.


Segher


2016-10-18  Segher Boessenkool  

* config/rs6000/rs6000.c (rs6000_savres_strategy): Do not select
{SAVE,REST}_MULTIPLE if shrink-wrapping separate components.
(rs6000_get_separate_components): Assert we do not have those
strategies selected.

---
 gcc/config/rs6000/rs6000.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 613af48..1b67592 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25511,7 +25511,10 @@ rs6000_savres_strategy (rs6000_stack_t *info,
   if (TARGET_MULTIPLE
   && !TARGET_POWERPC64
   && !(TARGET_SPE_ABI && info->spe_64bit_regs_used)
-  && info->first_gp_reg_save < 31)
+  && info->first_gp_reg_save < 31
+  && !(flag_shrink_wrap
+  && flag_shrink_wrap_separate
+  && optimize_function_for_speed_p (cfun)))
 {
   /* Prefer store multiple for saves over out-of-line routines,
 since the store-multiple instruction will always be smaller.  */
@@ -27445,6 +27448,9 @@ rs6000_get_separate_components (void)
   sbitmap components = sbitmap_alloc (32);
   bitmap_clear (components);
 
+  gcc_assert (!(info->savres_strategy & SAVE_MULTIPLE)
+ && !(info->savres_strategy & REST_MULTIPLE));
+
   /* The GPRs we need saved to the frame.  */
   if ((info->savres_strategy & SAVE_INLINE_GPRS)
   && (info->savres_strategy & REST_INLINE_GPRS))
-- 
1.9.3



Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Senthil Kumar Selvaraj

Jakub Jelinek writes:

> On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
>> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>>  wrote:
>> >
>> > Richard Biener writes:
>> >
>> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
>> >>  wrote:
>> >>> Hi,
>> >>>
>> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
>> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
>> >>>
>> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
>> >>>   backport to gcc-5-branch?
>> >>
>> >> Ok with me but please double-check there was no fallout.
>> >
>> > I bootstrapped and ran against x86_64-pc-linux again, just to be sure.
>> > No regressions.
>> 
>> I meant fallout only fixed with followup patches.  ISTR some in that area
>> but I might confuse it with another patch.  Marek might remember.
>
> I'm not convinced it is desirable to backport such changes, it affects ABI,
> people are used to deal with minor ABI changes in between major GCC
> releases, but we'd need a strong reason to change it between minor releases.

Hmm, I tracked this down from an (internal) bug reported on arm-none-eabi, where
the inconsistent enum size (used in sizeof to malloc) was eventually causing
heap corruption.

When debugging the issue, I noticed you'd already backported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69669, so I thought this
should be good.

Regards
Senthil

>
>> >>> 2016-10-17  Senthil Kumar Selvaraj  
>> >>>
>> >>>   Backport from mainline
>> >>> 2015-04-25  Marek Polacek  
>> >>> PR c/52085
>> >>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
>> >>> "mode"
>> >>> attribute.
>> >>>
>> >>> gcc/testsuite/ChangeLog
>> >>> 2016-10-17  Senthil Kumar Selvaraj  
>> >>>
>> >>> Backport from mainline
>> >>> 2015-04-25  Marek Polacek  
>> >>> PR c/52085
>> >>> * gcc.dg/enum-incomplete-2.c: New test.
>> >>> * gcc.dg/enum-mode-1.c: New test.
>
>   Jakub



Re: [PATCH 1/7] make LABEL_REF_LABEL a rtx_insn *

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


+static inline void
+set_label_ref_label (rtx ref, rtx_insn *label)
+{
+  XCEXP (ref, 0, LABEL_REF) = label;
+}


I guess I have to ask for a brief function comment for this. Otherwise OK.


Bernd



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov

> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
> 
> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>>> attempted to scale down the register limit used by -fsched-pressure for the 
>>> case where the block in question executes as frequently as the entry block 
>>> to just the call_clobbered (i.e. call_used) regs. But the code is actually 
>>> scaling toward call_saved registers. The following patch corrects that by 
>>> computing call_saved regs per class and subtracting out some scaled portion 
>>> of that.
>>>
>>> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>> Hi Pat,
>> 
>> I stared at your patch and the current code for a good 30 minutes, and I still 
>> don't see what is wrong with the current code.
>> 
>> With your patch the number of registers from class CL that scheduler has at 
>> its disposal for a single-basic-block function will be:
>> 
>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>> call_saved_regs_num[CL];
>> 
>> where call_saved_regs_num is number of registers in class CL that need to be 
>> saved in the prologue (i.e., "free" registers).  I can see some logic in 
>> setting
>> 
>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>> 
>> but not in subtracting number of such registers from the number of total 
>> available hard registers.
>> 
>> Am I missing something?
>> 
> 
> Your original patch gave the following reasoning:
> 
> "At the moment the scheduler does not account for spills in the prologues and 
> restores in the epilogue, which occur from use of call-used registers.  The 
> current state is, essentially, optimized for case when there is a hot loop 
> inside the function, and the loop executes significantly more often than the 
> prologue/epilogue.  However, on the opposite end, we have a case when the 
> function is just a single non-cyclic basic block, which executes just as 
> often as prologue / epilogue, so spills in the prologue hurt performance as 
> much as spills in the basic block itself.  In such a case the scheduler 
> should throttle-down on the number of available registers and try to not go 
> beyond call-clobbered registers."
> 
> But the misunderstanding is that call-used registers do NOT cause any 
> save/restore. That is to say, call-used == call-clobbered. Your last sentence 
> explains the goal for a single block function, to not go beyond 
> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
> patch implements that goal by subtracting out call_saved_regs_num (those that 
> require prolog/epilog save/restore) from the total regs, and using that as 
> the target # of registers to be used for the block.

I see your point and agree that current code isn't optimal.  However, I don't 
think your patch is accurate either.  Consider 
https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
that FIXED_REGISTERS in class CL is set for a third of the registers, and 
CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have a 
third available for zero-cost allocation (CALL_USED_REGISTERS-FIXED_REGISTERS), 
a third available for spill-cost allocation (ALL_REGISTERS-CALL_USED_REGISTERS) 
and a third non-available (FIXED_REGISTERS).

For a non-loop-single-basic-block function we should be targeting only the 
third of registers available at zero cost -- correct?  This is what is done by 
the current code, but, apparently, by accident.  It seems that the right 
register count can be obtained with:

  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
-   if (call_used_regs[ira_class_hard_regs[cl][i]])
- ++call_used_regs_num[cl];
+   if (!call_used_regs[ira_class_hard_regs[cl][i]]
+   || fixed_regs[ira_class_hard_regs[cl][i]])
+ ++call_saved_regs_num[cl];

Does this look correct to you?
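
As a sanity check of the arithmetic, here is a small self-contained sketch of
the "thirds" example.  The arrays and function names are hypothetical toy
versions, not the real fixed_regs / call_used_regs tables; the sketch assumes
a class of 12 registers and, as the CALL_USED_REGISTERS documentation
requires, marks fixed registers as call-used too.

```c
#include <assert.h>

#define TOY_NUM_REGS 12

/* Hypothetical register class of 12 registers, split in thirds:
   regs 0-3 are fixed, regs 4-7 are call-clobbered, regs 8-11 are call-saved.
   Fixed registers are also marked call-used, as in CALL_USED_REGISTERS.  */
const char toy_fixed_regs[TOY_NUM_REGS]     = { 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };
const char toy_call_used_regs[TOY_NUM_REGS] = { 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0 };

/* Count registers that are not free for a single-block function:
   call-saved ones (they need a prologue/epilogue save/restore)
   plus fixed ones (they are not allocatable at all).  */
int
toy_count_costly_regs (void)
{
  int n = 0;
  for (int i = 0; i < TOY_NUM_REGS; ++i)
    if (!toy_call_used_regs[i] || toy_fixed_regs[i])
      ++n;
  return n;
}

/* Registers available at zero cost: everything that is left.  */
int
toy_count_zero_cost_regs (void)
{
  return TOY_NUM_REGS - toy_count_costly_regs ();
}
```

In this toy setup the condition from the loop above counts 8 registers (4
call-saved plus 4 fixed), leaving 4 zero-cost registers, i.e. the expected
third of the class.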

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH 2/7] make tablejump_p return the label as a rtx_insn *

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


* cfgcleanup.c (merge_blocks_move_successor_nojumps): Adjust.
(outgoing_edges_match): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgrtl.c (try_redirect_by_replacing_jump): Likewise.
(rtl_tidy_fallthru_edge): Likewise.
* rtl.h (tablejump_p): Adjust prototype.
* rtlanal.c (tablejump_p): Return the label as a rtx_insn *.


Ok.


Bernd


[Bug libstdc++/78015] New: pthread_cancel while some exception is pending results in std::terminate ()

2016-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78015

Bug ID: 78015
   Summary: pthread_cancel while some exception is pending results
in std::terminate ()
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

#include <pthread.h>
#include <unistd.h>

namespace __cxxabiv1
{
  class __forced_unwind
  {
    virtual ~__forced_unwind() throw();
    virtual void __pure_dummy() = 0;
  };
}

int a;

extern "C" void *
fun (void *)
{
#ifdef WORKAROUND
  try
  {
    throw 1;
  }
  catch (int &)
#endif
  {
    try
    {
      char buf[10];
      for (;;)
        read (4, buf, 0);
    }
    catch (__cxxabiv1::__forced_unwind &)
    {
      a = 5;
      throw;
    }
  }

  return NULL;
}

int
main ()
{
  pthread_t thread;
  pthread_create (&thread, NULL, fun, NULL);
  pthread_cancel (thread);
  pthread_join (thread, NULL);
}

This fails with std::terminate (), but works with -DWORKAROUND.
The problem is in __cxa_begin_catch:
    // Note that this use of "header" is a lie.  It's fine so long as we only
    // examine header->unwindHeader though.
    if (!__is_gxx_exception_class(header->unwindHeader.exception_class))
      {
        if (prev != 0)
          std::terminate ();

With -DWORKAROUND, prev is NULL and thus we handle it fine as a forced unwinding
exception.
But without that, prev is non-NULL here.  I don't know what exactly we should
do instead if prev is non-NULL: whether to somehow destruct the previous
exception, or just continue normally and destruct it when the forced unwinding
finishes, or at some other point.

Re: [PATCH] Clear BB_VISITED in bb-reorder

2016-10-18 Thread Richard Biener
On Mon, 17 Oct 2016, Andrew Pinski wrote:

> On Mon, Oct 17, 2016 at 5:26 AM, Richard Biener  wrote:
> >
> > $subject, applied as obvious.
> 
> I think you should do the same for the vectorizer too.  I noticed that
> when testing the patch for loop splitting.

Can't see where BB_VISITED is used by the vectorizer - can you point
me to that?

Thanks,
Richard.
 
> Thanks,
> Andrew
> 
> >
> > Richard.
> >
> > 2016-10-17  Richard Biener  
> >
> > * bb-reorder.c (reorder_basic_blocks_simple): Clear BB_VISITED
> > before using it.
> >
> > Index: gcc/bb-reorder.c
> > ===
> > --- gcc/bb-reorder.c(revision 241228)
> > +++ gcc/bb-reorder.c(working copy)
> > @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
> >   To start with, everything points to itself, nothing is assigned yet.  
> > */
> >
> >FOR_ALL_BB_FN (bb, cfun)
> > -bb->aux = bb;
> > +{
> > +  bb->aux = bb;
> > +  bb->flags &= ~BB_VISITED;
> > +}
> >
> >EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
> >


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>  wrote:
> >
> > Richard Biener writes:
> >
> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
> >>  wrote:
> >>> Hi,
> >>>
> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
> >>>
> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
> >>>   backport to gcc-5-branch?
> >>
> >> Ok with me but please double-check there was no fallout.
> >
> > I bootstrapped and ran against x86_64-pc-linux again, just to be sure.
> > No regressions.
> 
> I meant fallout only fixed with followup patches.  ISTR some in that area
> but I might confuse it with another patch.  Marek might remember.

I'm not convinced it is desirable to backport such changes, it affects ABI,
people are used to deal with minor ABI changes in between major GCC
releases, but we'd need a strong reason to change it between minor releases.

> >>> 2016-10-17  Senthil Kumar Selvaraj  
> >>>
> >>>   Backport from mainline
> >>> 2015-04-25  Marek Polacek  
> >>> PR c/52085
> >>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
> >>> "mode"
> >>> attribute.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>> 2016-10-17  Senthil Kumar Selvaraj  
> >>>
> >>> Backport from mainline
> >>> 2015-04-25  Marek Polacek  
> >>> PR c/52085
> >>> * gcc.dg/enum-incomplete-2.c: New test.
> >>> * gcc.dg/enum-mode-1.c: New test.

Jakub


Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Prathamesh Kulkarni
On 18 October 2016 at 13:55, Richard Biener  wrote:
> On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 18 October 2016 at 02:46, Jeff Law  wrote:
>> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> >>
>> >> This patch is mostly the same as previous one, except it drops
>> >> targeting __udivmoddi4() because it gave undefined reference link
>> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
>> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
>> >> needed for the target (it was a bug in my patch that called
>> >> __udivmoddi4() even though aarch64 supported hardware div).
>> >
>> > This touches on the one high level question I had.  Namely what is the code
>> > generation strategy if the hardware has a div, but not divmod.
>> The divmod transform isn't enabled if target supports hardware div in the
>> same or wider mode even if divmod libfunc is available for the given mode.
>> >
>> > ISTM in that case I think we want to use the div instruction and synthesize
>> > mod from that result rather than relying on a software divmod.  So it looks
>> > like you ought to be doing the right thing for that case now based on your
>> > comment above.
>> >>
>> >>
>> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> >> will be available for a target if it doesn't have hardware div and
>> >> divmod insn and doesn't have target-specific libfunc for DImode
>> >> divmod ? To be conservative, the attached patch doesn't generate call
>> >> to __udivmoddi4.
>> >
>> > I don't think that's a safe assumption.  Addition of the divmod routines
>> > into libgcc is controlled by the target and have to be explicitly added
>> > AFAICT.
>> >
>> > So on a target like the fr30 which has no div or mod insn and doesn't
>> > define the right bits in libgcc, there is no divmod libcall available.
>> > (On these targets there's a div libcall and a mod libcall, but not a
>> > combined one).
>> Thanks. I had erroneously assumed __udivmoddi4() was available for all
>> targets because it was defined in libgcc2.c, and generated a call to it as a
>> "last resort" for the unsigned DImode case if the target didn't support
>> hardware div and divmod insns and didn't have a target-specific divmod
>> libfunc for DImode.
>> The reason it generated an undefined reference on aarch64-linux-gnu was that
>> I didn't properly check in the patch whether the target supported hardware
>> div, and ended up generating a call to __udivmoddi4() even though aarch64 has
>> hardware div. I rectified that before posting the patch.
>> >
>> > I don't even think we have a way of knowing in the compiler if the target
>> > has enabled divmod support in libgcc.
>
> Yeah, that's what bothers me with the current optab libfunc query
> setup -- it isn't reliable.
>
>> Well the arm and c6x backends register target-specific divmod libfunc via
>> set_optab_libfunc(). I suppose that's sufficient to know if target has
>> divmod enabled in libgcc ?
>> >
>> > ISTM that for now we have to limit to cases where we have a divmod
>> > insn/libcall defined.
>> Indeed. The transform is enabled only if the target has hardware divmod insn
>> or divmod libfunc (in the libfunc case, no hardware div insn).
>> Please see divmod_candidate_p() in the patch for cases when the
>> transform gets enabled.
>
> But after your patch the divmod libfunc is never available?
After the patch, the divmod libfunc is not available if the target
doesn't explicitly set it.
So by default optab_libfunc ([us]divmod_optab, mode) would return NULL
instead of a bogus libfunc constructed with gen_int_libfunc, but if the
target has registered it with set_optab_libfunc, then the target-specific
libfunc name would be returned.
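
The rule discussed in this thread can be sketched in a few lines of plain C.
This is only a toy model: the toy_* names, the two-entry mode enum and the
per-mode table are hypothetical and do not mirror the optabs machinery, and
the "same or wider mode" check for hardware div is simplified to a single
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical machine modes for the sketch.  */
enum toy_mode { TOY_SImode, TOY_DImode, TOY_NUM_MODES };

/* Per-mode divmod libfunc names; NULL means the target registered nothing.  */
static const char *toy_divmod_libfunc[TOY_NUM_MODES];

/* Models a target explicitly registering its divmod routine.  */
void
toy_set_divmod_libfunc (enum toy_mode mode, const char *name)
{
  toy_divmod_libfunc[mode] = name;
}

/* Models the query: no default name is made up, so an unregistered
   mode yields NULL.  */
const char *
toy_get_divmod_libfunc (enum toy_mode mode)
{
  return toy_divmod_libfunc[mode];
}

/* Enable the transform if there is a hardware divmod insn, or if there is
   a registered divmod libfunc and no hardware div insn.  */
bool
toy_divmod_candidate_p (enum toy_mode mode, bool has_hw_divmod, bool has_hw_div)
{
  if (has_hw_divmod)
    return true;
  if (has_hw_div)
    return false;
  return toy_get_divmod_libfunc (mode) != NULL;
}
```

The point of returning NULL by default is that the caller can tell "the
target really provides this routine" apart from "a name could be
constructed", which avoids emitting a call that fails at link time.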

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > jeff
>> >
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


[Bug middle-end/78016] New: REG_NOTE order is not kept during insn copy

2016-10-18 Thread jiwang at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78016

Bug ID: 78016
   Summary: REG_NOTE order is not kept during insn copy
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jiwang at gcc dot gnu.org
CC: ebotcazou at gcc dot gnu.org, jakub at redhat dot com
  Target Milestone: ---

Created attachment 39826
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39826&action=edit
keep REG_NOTE order during insn copy

I recently noticed that when copying an insn, GCC does not actually keep the
order of REG_NOTEs.  This seems OK for some types of REG_NOTEs, for example
UNUSED, NORETURN etc.  But it seems not OK for DWARF annotation REG_NOTEs.

The reason is that GCC's DWARF module does a sanity check to make sure the CFI
rules coming from multiple paths are identical.


  A       B
   \     /
    \   /
     \ /
      C

For example, if A and B are predecessors of C, then the following code in
"maybe_record_trace_start" in dwarf2cfi.c does a sanity check to make sure the
DWARF rules coming from A and B are identical.

  /* We ought to have the same state incoming to a given trace no
 matter how we arrive at the trace.  Anything else means we've
 got some kind of optimization error.  */
  gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row));


As we don't keep the order of REG_NOTEs, it's possible that the final DWARF
rules for each register won't be identical.
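
One way note order can be lost during a copy is sketched below, under the
assumption that notes live in a singly linked list and that the "add note"
helper prepends.  The toy_note type and toy_* functions are hypothetical and
only model the list handling, not the real REG_NOTES representation.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy note: a kind plus a link to the next note (hypothetical type).  */
struct toy_note
{
  int kind;
  struct toy_note *next;
};

/* Prepend a new note to HEAD and return the new head.  */
struct toy_note *
toy_add_note (struct toy_note *head, int kind)
{
  struct toy_note *n = malloc (sizeof *n);
  if (n == NULL)
    abort ();
  n->kind = kind;
  n->next = head;
  return n;
}

/* Copy by walking SRC and prepending each note: the copy is reversed.  */
struct toy_note *
toy_copy_notes_reversing (const struct toy_note *src)
{
  struct toy_note *copy = NULL;
  for (; src != NULL; src = src->next)
    copy = toy_add_note (copy, src->kind);
  return copy;
}

/* Copy by appending through a tail pointer: the order is preserved.  */
struct toy_note *
toy_copy_notes_in_order (const struct toy_note *src)
{
  struct toy_note *copy = NULL;
  struct toy_note **tail = &copy;
  for (; src != NULL; src = src->next)
    {
      *tail = toy_add_note (NULL, src->kind);
      tail = &(*tail)->next;
    }
  return copy;
}
```

If the notes are interpreted in list order, the reversed copy can describe a
different sequence of CFA updates than the original insn, which is the kind
of mismatch the sanity check would catch.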

It seems this issue has not been exposed because:

  * Normally, only prologue and epilogue basic blocks contain DWARF
    annotation info.
  * Very few passes (bb-reorder) after the pro/epi insertion pass do insn
    copy.
  * The DWARF sanity check happens before CFI auto-deduction, and backends
    normally only generate simple DWARF info.  So normally we simply mark the
    insn as RTX_FRAME_RELATED_P to let GCC auto-deduce the CFI annotation;
    this happens after the sanity check.

I came across the ICE when I was playing with some DWARF generation.

  0x98c9f4 maybe_record_trace_start
../../gcc-svn/gcc/dwarf2cfi.c:2285
  0x98cd15 create_trace_edges
../../gcc-svn/gcc/dwarf2cfi.c:2379
  0x98d59e scan_trace
../../gcc-svn/gcc/dwarf2cfi.c:2593
  0x98d686 create_cfi_notes
../../gcc-svn/gcc/dwarf2cfi.c:2619
  0x98e172 execute_dwarf2_frame
../../gcc-svn/gcc/dwarf2cfi.c:2977
  0x98ee3e execute
../../gcc-svn/gcc/dwarf2cfi.c:3457

You can reproduce this ICE by

step 1  (patch aarch64.c to generate DWARF manually)
===
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 241233)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -2958,9 +2958,26 @@

   insn = emit_insn (aarch64_gen_storewb_pair (mode, stack_pointer_rtx, reg1,
  reg2, adjustment));
+#if 0
   RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 2)) = 1;
   RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1;
+#else
+  /* cfi_offset for reg1.  */
+  rtx mem =
+gen_rtx_MEM (Pmode, plus_constant (Pmode, stack_pointer_rtx, -adjustment));
+  add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (mem, reg1));
+
+  /* cfi_offset for reg2.  */
+  mem =
+gen_rtx_MEM (Pmode,
+plus_constant (Pmode, stack_pointer_rtx, -adjustment + 8));
+  add_reg_note (insn, REG_CFA_OFFSET, gen_rtx_SET (mem, reg2));
+
+  /* .cfi_def_cfa_offset for sp adjustment.  */
+  add_reg_note (insn, REG_CFA_DEF_CFA,
+   plus_constant (Pmode, stack_pointer_rtx, -adjustment));
   RTX_FRAME_RELATED_P (insn) = 1;
+#endif
 }

 static rtx

step 2
===
./configure --target=aarch64-linux --enable-languages=c,c++
make all-gcc -j10
./gcc/cc1 ./gcc/testsuite/gcc.c-torture/execute/20031204-1.c -g -O3 
-funroll-loops -fpeel-loops -ftracer -finline-functions 20031204-1.i
(please remove the #include  in the head of 20031204-1.c)

I attached a simple fix to keep REG_NOTE order during insn copy.

Any comments?

(The only relevant discussion I can find is at
https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00546.html, where
REG_NOTE order was thought not to matter.)

[Bug libfortran/66756] libgfortran: ThreadSanitizer: lock-order-inversion

2016-10-18 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66756

Dominique d'Humieres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #5 from Dominique d'Humieres  ---
> I also obtain this with GCC 6.1.0.

So moved to NEW.

Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Jakub Jelinek
On Tue, Oct 18, 2016 at 02:46:29PM +0530, Senthil Kumar Selvaraj wrote:
> > I'm not convinced it is desirable to backport such changes, it affects ABI,
> > people are used to deal with minor ABI changes in between major GCC
> > releases, but we'd need a strong reason to change it between minor releases.
> 
> Hmm, I tracked this down from an (internal) bug reported on arm-none-eabi,
> where the inconsistent enum size (used in sizeof to malloc) was eventually
> causing heap corruption.
> 
> When debugging the issue, I noticed you'd already backported
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69669, so I thought this
> should be good.

That one was a regression; older GCCs handled it the same way the 5 branch
does now.  Is that the case here?

Jakub


Re: [PATCH 5/7] remove cast in delete_insn_chain

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* cfgrtl.c (delete_insn_chain): Change argument type to rtx_insn *
and adjust for that.
* cfgrtl.h (delete_insn_chain): Adjust prototype.


Ok.


Bernd



Re: [PATCH 6/7] remove cast from prev_nonnote_insn_bb

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* emit-rtl.c (prev_nonnote_insn_bb): Change argument type to
rtx_insn *.
* rtl.h (prev_nonnote_insn_bb): Adjust prototype.


Ok.


Bernd



[PATCH] Fix BB_VISITED clearing in IRA, remove substitue-and-fold dce flag

2016-10-18 Thread Richard Biener

This fixes the BB_VISITED bug in IRA I ran into earlier this year, 
removing the superfluous clearing in VRP and the SSA propagator as well
as removing the now always true do_dce flag from substitute-and-fold.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
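
For readers unfamiliar with this class of bug, here is a minimal sketch of
why a pass that relies on a "visited" bit has to make sure the bit is clear
before it starts.  The toy_block type, TOY_BB_VISITED and both functions are
hypothetical; they only model a per-block flag word.

```c
#include <assert.h>

#define TOY_BB_VISITED 0x1

/* Toy basic block: just a flag word (hypothetical type).  */
struct toy_block
{
  int flags;
};

/* Count blocks reached for the first time, marking each as visited.
   This is only correct if every block starts with TOY_BB_VISITED clear.  */
int
toy_visit_all (struct toy_block *blocks, int n)
{
  int fresh = 0;
  for (int i = 0; i < n; ++i)
    if (!(blocks[i].flags & TOY_BB_VISITED))
      {
        blocks[i].flags |= TOY_BB_VISITED;
        ++fresh;
      }
  return fresh;
}

/* Clear the flag on every block before a pass relies on it.  */
void
toy_clear_visited (struct toy_block *blocks, int n)
{
  for (int i = 0; i < n; ++i)
    blocks[i].flags &= ~TOY_BB_VISITED;
}
```

If an earlier pass left the bit set on one block, the walk silently skips
that block; clearing the flag at the start of the consumer makes the pass
independent of what ran before it.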

Richard.

2016-10-18  Richard Biener  

* tree-ssa-propagate.h (substitute_and_fold): Adjust prototype.
* tree-ssa-propagate.c (ssa_prop_fini): Remove final BB_VISITED
clearing.
(substitute_and_fold_dom_walker): Adjust constructor.
(substitute_and_fold_dom_walker::before_dom_children): Remove
do_dce flag and handling (always true).
(substitute_and_fold): Likewise.
* tree-vrp.c (vrp_finalize): Adjust.
(execute_early_vrp): Remove final BB_VISITED clearing.
* tree-ssa-ccp.c (ccp_finalize): Adjust.
* tree-ssa-copy.c (fini_copy_prop): Likewise.
* ira.c (ira): Call clear_bb_flags.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241294)
+++ gcc/tree-vrp.c  (working copy)
@@ -10622,8 +10622,7 @@ vrp_finalize (bool warn_array_bounds_p)
  vr_value[i]->max);
   }
 
-  substitute_and_fold (op_with_constant_singleton_value_range,
-  vrp_fold_stmt, true);
+  substitute_and_fold (op_with_constant_singleton_value_range, vrp_fold_stmt);
 
   if (warn_array_bounds && warn_array_bounds_p)
 check_all_array_refs ();
@@ -10954,8 +10953,6 @@ execute_early_vrp ()
   vrp_free_lattice ();
   scev_finalize ();
   loop_optimizer_finalize ();
-  FOR_EACH_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
   return 0;
 }
 
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  (revision 241294)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -953,8 +953,7 @@ ccp_finalize (bool nonzero_p)
 }
 
   /* Perform substitutions based on the known constant values.  */
-  something_changed = substitute_and_fold (get_constant_value,
-  ccp_fold_stmt, true);
+  something_changed = substitute_and_fold (get_constant_value, ccp_fold_stmt);
 
   free (const_val);
   const_val = NULL;
Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c  (revision 241294)
+++ gcc/tree-ssa-propagate.c  (working copy)
@@ -479,9 +479,6 @@ ssa_prop_fini (void)
   free (cfg_order_to_bb);
   BITMAP_FREE (ssa_edge_worklist);
   uid_to_stmt.release ();
-  basic_block bb;
-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
-bb->flags &= ~BB_VISITED;
 }
 
 
@@ -972,10 +969,9 @@ class substitute_and_fold_dom_walker : p
 public:
 substitute_and_fold_dom_walker (cdi_direction direction,
ssa_prop_get_value_fn get_value_fn_,
-   ssa_prop_fold_stmt_fn fold_fn_,
-   bool do_dce_)
+   ssa_prop_fold_stmt_fn fold_fn_)
: dom_walker (direction), get_value_fn (get_value_fn_),
-  fold_fn (fold_fn_), do_dce (do_dce_), something_changed (false)
+  fold_fn (fold_fn_), something_changed (false)
 {
   stmts_to_remove.create (0);
   stmts_to_fixup.create (0);
@@ -993,7 +989,6 @@ public:
 
 ssa_prop_get_value_fn get_value_fn;
 ssa_prop_fold_stmt_fn fold_fn;
-bool do_dce;
 bool something_changed;
 vec<gimple *> stmts_to_remove;
 vec<gimple *> stmts_to_fixup;
@@ -1012,8 +1007,7 @@ substitute_and_fold_dom_walker::before_d
   tree res = gimple_phi_result (phi);
   if (virtual_operand_p (res))
continue;
-  if (do_dce
- && res && TREE_CODE (res) == SSA_NAME)
+  if (res && TREE_CODE (res) == SSA_NAME)
{
  tree sprime = get_value_fn (res);
  if (sprime
@@ -1039,8 +1033,7 @@ substitute_and_fold_dom_walker::before_d
   /* No point propagating into a stmt we have a value for we
  can propagate into all uses.  Mark it for removal instead.  */
   tree lhs = gimple_get_lhs (stmt);
-  if (do_dce
- && lhs && TREE_CODE (lhs) == SSA_NAME)
+  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
  tree sprime = get_value_fn (lhs);
  if (sprime
@@ -1180,8 +1173,7 @@ substitute_and_fold_dom_walker::before_d
 
 bool
 substitute_and_fold (ssa_prop_get_value_fn get_value_fn,
-ssa_prop_fold_stmt_fn fold_fn,
-bool do_dce)
+ssa_prop_fold_stmt_fn fold_fn)
 {
   gcc_assert (get_value_fn);
 
@@ -1192,7 +1184,7 @@ substitute_and_fold (ssa_prop_get_value_
 
   calculate_dominance_info (CDI_DOMINATORS);
   substitute_and_fold_dom_walker walker(CDI_DOMINATORS,
-   get_value_fn, fold_fn, do_dce);
+   get_value_fn, fold_fn);

Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

2016-10-18 Thread Christophe Lyon
Hi,


On 17 October 2016 at 18:47, Kyrill Tkachov  wrote:
>
> On 30/09/16 14:34, Bernd Edlinger wrote:
>>
>> On 09/30/16 12:14, Bernd Edlinger wrote:
>>>
>>> Eric Botcazou wrote:
>>>>>
>>>>> A comment before the SETs and a testcase would be nice.  IIRC
>>>>> we do have stack size testcases via using -fstack-usage.
>>>>
>>>> Or -Wstack-usage, which might be more appropriate here.
>>>
>>> Yes.  good idea.  I was not aware that we already have that kind of
>>> tests.
>>>
>>> When trying to write this test, I noticed that I had not tried -Os so far.
>>> But for -Os the stack is still the unchanged 3500 bytes.
>>>
>>> However for embedded targets I am often inclined to use -Os, and
>>> would certainly not expect the stack to explode...
>>>
>>> I see in arm.md there are places like
>>>
>>> /* If we're optimizing for size, we prefer the libgcc calls.  */
>>> if (optimize_function_for_size_p (cfun))
>>>   FAIL;
>>>
>> Oh, yeah.  The comment is completely misleading.
>>
>> If this pattern fails, expmed.c simply expands some
>> less efficient rtl, which also results in two shifts
>> and one or-op.  No libgcc calls at all.
>>
>> So in simple cases without spilling the resulting
>> assembler is the same, regardless if this pattern
>> fails or not.  But the half-defined out registers
>> make a big difference when it has to be spilled.
>>
>>> /* Expand operation using core-registers.
>>>'FAIL' would achieve the same thing, but this is a bit
>>> smarter.  */
>>> scratch1 = gen_reg_rtx (SImode);
>>> scratch2 = gen_reg_rtx (SImode);
>>> arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0],
>>> operands[1],
>>>operands[2], scratch1, scratch2);
>>>
>>>
>>> .. that explains why this happens.  I think it would be better to
>>> use the emit_coreregs for shift count >= 32, because these are
>>> effectively 32-bit shifts.
>>>
>>> Will try if that can be improved, and come back with the
>>> results.
>>>
>> The test case with -Os has 3520 bytes stack usage.
>> When only shift count >= 32 are handled we
>> have still 3000 bytes stack usage.
>> And when arm_emit_coreregs_64bit_shift is always
>> allowed to run, we have 2360 bytes stack usage.
>>
>> Also for the code size it is better not to fail this
>> pattern.  So I propose to remove this exception in all
>> three expansions.
>>
>> Here is an improved patch with the test case from the PR.
>> And a comment on the redundant SET why it is better to clear
>> the out register first.
>>
>>
>> Bootstrap and reg-testing on arm-linux-gnueabihf.
>> Is it OK for trunk?
>
>
> This looks ok to me.
> Thanks,
> Kyrill
>

I am seeing a lot of regressions since this patch was committed:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241273/report-build-info.html

(you can click on "REGRESSED" to see the list of regressions, "sum"
and "log" to download the corresponding .sum/.log)

Thanks,

Christophe

>>
>> Thanks
>> Bernd.
>
>
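
For readers following the discussion of the 64-bit shift expansion, the remark that the fallback yields two shifts and one or-op can be pictured with a small C sketch of a logical right shift built from 32-bit halves. The function name `lshr64_parts` is hypothetical, and this is not the RTL that expmed.c or arm.md emits; it only shows the arithmetic. For counts of 32 or more the operation reduces to a single 32-bit shift, which matches the observation that those counts are effectively 32-bit shifts.

```c
#include <stdint.h>

/* Logical right shift of the 64-bit value HI:LO by COUNT (0..63),
   using only 32-bit operations.  Results are stored through the pointers.  */
void
lshr64_parts (uint32_t hi, uint32_t lo, unsigned count,
              uint32_t *out_hi, uint32_t *out_lo)
{
  if (count == 0)
    {
      *out_hi = hi;
      *out_lo = lo;
    }
  else if (count < 32)
    {
      /* Two shifts and one OR produce the low word.  */
      *out_lo = (lo >> count) | (hi << (32 - count));
      *out_hi = hi >> count;
    }
  else
    {
      /* Effectively a 32-bit shift: the high word becomes zero.  */
      *out_lo = hi >> (count - 32);
      *out_hi = 0;
    }
}
```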


[Bug libstdc++/78015] pthread_cancel while some exception is pending results in std::terminate ()

2016-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78015

--- Comment #2 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #0)
> #ifdef WORKAROUND
>   try
>   {
> throw 1;
>   }
>   catch (int &)
> #endif

Oops, of course I meant #ifndef WORKAROUND.  The problem is when there is an
outstanding normal exception and then we try to rethrow forced unwind exception
that we catch in the catch handler.

Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Senthil Kumar Selvaraj
Ping!

Regards
Senthil

Senthil Kumar Selvaraj writes:

> Bernd Schmidt writes:
>
>> On 09/16/2016 09:02 PM, Senthil Kumar Selvaraj wrote:
>>>   Does this make sense? I ran a reg test for the avr target with a
>>>   slightly older version of this patch, it did not show any regressions.
>>>   If this is the right fix, I'll make sure to run reg tests on x86_64
>>>   after backporting to a gcc version where that target used reload.
>>
>> It's hard to say, and could have different effects on different targets.
>> One thing though, at the very least the reg_class_size test would have 
>> to be adapted - the idea is to find the largest class, and there's a 
>> risk here of ending up with a large class that only has one valid register.
>
> Agreed - I've updated the patch to compute rclass sizes based on regno
> availability i.e., only if in_hard_reg_set_p and HARD_REGNO_MODE_OK, and
> then use the computed sizes when calculating best_size.
>
>>
>> You'll also want to verify this against
>>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54814
>
> Yes, this patch doesn't break the fix for PR54814. The change to
> in_hard_reg_set_p was what fixed that, and that remains unmodified.
>
> Reg tested this on top of trunk@190252 with the in_hard_reg_set_p
> backport. x86_64-pc-linux bootstrapped and regtested ok. avr showed
> no regressions either.
>
> Ok for trunk?
>
> Regards
> Senthil
>
> gcc/ChangeLog:
>
> 2016-10-13  Senthil Kumar Selvaraj  
>
>   * reload.c (find_valid_class_1): Allow regclass if at least one
>   regno in class is ok. Compute and use rclass size based on
>   actually available regnos for mode in rclass.
>
> gcc/testsuite/ChangeLog:
>
> 2016-10-13  Senthil Kumar Selvaraj  
>   
>   * gcc.target/avr/pr71627.c: New.
>
>
> Index: gcc/reload.c
> ===
> --- gcc/reload.c  (revision 240989)
> +++ gcc/reload.c  (working copy)
> @@ -711,31 +711,36 @@
>enum reg_class best_class = NO_REGS;
>unsigned int best_size = 0;
>int cost;
> +  unsigned int computed_rclass_sizes[N_REG_CLASSES] = { 0 };
>  
>for (rclass = 1; rclass < N_REG_CLASSES; rclass++)
>  {
> -  int bad = 0;
> -  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && !bad; regno++)
> - {
> -   if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
> -   && !HARD_REGNO_MODE_OK (regno, mode))
> - bad = 1;
> - }
> -  
> -  if (bad)
> - continue;
> +  int atleast_one_regno_ok = 0;
>  
> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +{
> +  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
> +{
> +  atleast_one_regno_ok = 1;
> +  if (HARD_REGNO_MODE_OK (regno, mode))
> +computed_rclass_sizes[rclass]++;
> +}
> +}
> +
> +  if (!atleast_one_regno_ok)
> +continue;
> +
>cost = register_move_cost (outer, (enum reg_class) rclass, dest_class);
>  
> -  if ((reg_class_size[rclass] > best_size
> -&& (best_cost < 0 || best_cost >= cost))
> -   || best_cost > cost)
> - {
> -   best_class = (enum reg_class) rclass;
> -   best_size = reg_class_size[rclass];
> -   best_cost = register_move_cost (outer, (enum reg_class) rclass,
> -   dest_class);
> - }
> +  if ((computed_rclass_sizes[rclass] > best_size
> + && (best_cost < 0 || best_cost >= cost))
> +|| best_cost > cost)
> +  {
> +best_class = (enum reg_class) rclass;
> +best_size = computed_rclass_sizes[rclass];
> +best_cost = register_move_cost (outer, (enum reg_class) rclass,
> +dest_class);
> +  }
>  }
>  
>gcc_assert (best_size != 0);
>
> Index: gcc/testsuite/gcc.target/avr/pr71627.c
> ===
> --- gcc/testsuite/gcc.target/avr/pr71627.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/avr/pr71627.c  (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +
> +extern volatile __memx const long  a, b, c, d, e, f;
> +extern volatile long result;
> +
> +extern void vfunc (const char*, ...);
> +
> +void foo (void)
> +{
> +   result = a + b + c + d + e + f;
> +   vfunc ("text", a, b, c, d, e, f, result);
> +}
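
The idea of sizing each class by the registers that are actually usable for the mode can be sketched outside of GCC as follows. This is a toy model under stated assumptions: register classes are plain bitmasks over at most 32 hard registers, `pick_largest_usable_class` and `count_bits` are hypothetical names, and the move-cost comparison done by find_valid_class_1 is left out entirely.

```c
#include <stdint.h>

/* Count the set bits in X.  */
static unsigned
count_bits (uint32_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Toy model: each class is a bitmask of hard registers and MODE_OK is
   the mask of registers valid for the mode.  Return the index of the
   class with the most usable registers, or -1 if no class has any.
   Move costs are ignored here.  */
int
pick_largest_usable_class (const uint32_t *class_masks, int n_classes,
                           uint32_t mode_ok)
{
  int best_class = -1;
  unsigned best_size = 0;

  for (int rclass = 0; rclass < n_classes; rclass++)
    {
      unsigned usable = count_bits (class_masks[rclass] & mode_ok);
      if (usable > best_size)
        {
          best_size = usable;
          best_class = rclass;
        }
    }
  return best_class;
}
```

With an eight-register class that has only one valid register and a two-register class where both are valid, the smaller class wins, which is the risk Bernd raised about ranking by raw class size.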



[PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
The name of Etc/GMT+1 timezone is "-01", as evident from:

$ TZ=Etc/GMT+1 date +%Z
-01

Attached patch fixes the testsuite failure.

Uros.
diff --git a/libgo/go/time/time_test.go b/libgo/go/time/time_test.go
index b7ebb37..694e311 100644
--- a/libgo/go/time/time_test.go
+++ b/libgo/go/time/time_test.go
@@ -939,8 +939,8 @@ func TestLoadFixed(t *testing.T) {
// but Go and most other systems use "east is positive".
// So GMT+1 corresponds to -3600 in the Go zone, not +3600.
name, offset := Now().In(loc).Zone()
-   if name != "GMT+1" || offset != -1*60*60 {
-   t.Errorf("Now().In(loc).Zone() = %q, %d, want %q, %d", name, offset, "GMT+1", -1*60*60)
+   if name != "-01" || offset != -1*60*60 {
+   t.Errorf("Now().In(loc).Zone() = %q, %d, want %q, %d", name, offset, "-01", -1*60*60)
}
 }
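
The sign convention in the test comment is easy to get backwards, so here is a small C sketch of the mapping. It assumes the numeric naming shown by the date output above applies to the other Etc/GMT+N zones in the same way, and the helper name `etc_gmt_zone` is hypothetical. It is meant for nonzero N only.

```c
#include <stdio.h>

/* For the zone "Etc/GMT+N" (POSIX sign convention, N hours west of
   Greenwich), store the numeric abbreviation such as "-01" in ABBREV
   and return the "east is positive" UTC offset in seconds.  */
int
etc_gmt_zone (int n, char *abbrev, size_t size)
{
  int hours_east = -n;
  /* Sign, then two zero-padded digits.  */
  snprintf (abbrev, size, "%+03d", hours_east);
  return hours_east * 60 * 60;
}
```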
 


Re: [PATCH] Fix PR77916

2016-10-18 Thread Christophe Lyon
On 18 October 2016 at 05:18, Markus Trippelsdorf  wrote:
> On 2016.10.18 at 05:13 +0200, Markus Trippelsdorf wrote:
>> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
>> > Hi,
>> >
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
>> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
>> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
>> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
>> > of pointer type.  This patch recognizes when pointer arithmetic is taking
>> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
>> > the case of the PR, this occurs in the logic where the stride S is a known
>> > constant value, but the same problem could occur when it is an SSA_NAME
>> > without known value.  Both possibilities are handled here.
>> >
>> > Fixing the code to ensure that the unknown stride case always uses an
>> > initializer for a negative increment allows us to remove the stopgap fix
>> > added for PR77937 as well.
>> >
>> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> > regressions, committed.
>>
>> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
>> X86_64 before committing these patches, because you broke it for the
>> third time in the last couple of days.
>>
>> markus@x4 ffmpeg % cat h264dsp.i
>> extern int fn2(int);
>> extern int fn3(int);
>> int a, b, c;
>> void fn1(long p1) {
>>   char *d;
>>   for (;; d += p1) {
>> d[0] = fn2(1 >> c >> 1);
>> fn2(c >> a);
>> d[1] = fn3(d[1]) >> 1;
>> d[6] = fn3(d[6] * b + 1) >> 1;
>> d[7] = fn3(d[7] * b + 1) >> 1;
>> d[8] = fn3(d[8] * b + 1) >> 1;
>>   }
>> }
>>
>> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
>> h264dsp.i: In function ‘fn1’:
>> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
>> gimple-ssa-strength-reduction.c:3375
>>  void fn1(long p1) {
>>   ^~~
>> 0x12773a9 replace_one_candidate
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
>> 0x127af77 replace_profitable_candidates
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
>> 0x127aeeb replace_profitable_candidates
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
>> 0x127f3ee analyze_candidates_and_replace
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
>> 0x127f3ee execute
>> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
>
> Just figured out that this testcase is identical to:
> gcc/testsuite/gcc.dg/torture/pr77937-2.c
>
> So please run the testsuite on X86_64 in the future.
>


I'm not sure whether Markus means that pr77937-2 fails since this commit?

I'm seeing ICEs on pr77937-2 on some arm targets:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241285/report-build-info.html

but I'm not 100% sure it was caused by this patch (the regression
happened between r241273 and r241285); I haven't looked in detail yet.

Christophe

> --
> Markus


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Segher Boessenkool
[ sorry for losing track of this patch ]

On Sun, Oct 09, 2016 at 10:32:51AM +0200, Eric Botcazou wrote:
> > Use "mode" instead of "Pmode" here?
> 
> No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.

I still don't see it, could you explain a bit more?


Segher


Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Eric Botcazou
> > No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.
> 
> I still don't see it, could you explain a bit more?

MODE is the mode of operands[1] before:

  operands[1] = force_const_mem (mode, operands[1]);

and after.  But the test is on the address of the MEM, not on the MEM itself:

  if (TARGET_TOC
  && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
  && use_toc_relative_ref (XEXP (operands[1], 0), Pmode))

because it's the mode of SYMBOL_REF we are interested in (and force_const_mem 
guarantees that it's Pmode).  IOW you could theoretically have mode == SImode 
and we would still need to pass Pmode to use_toc_relative_ref (of course the 
whole thing is guarded with mode == Pmode so that's a little artificial).

-- 
Eric Botcazou


Re: [PATCH 7/7] make targetm.gen_ccmp{first,next} take rtx_insn **

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* ccmp.c (expand_ccmp_expr_1): Adjust.
(expand_ccmp_expr): Likewise.
(expand_ccmp_next): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_ccmp_next): Likewise.
(aarch64_gen_ccmp_first): Likewise.
* doc/tm.texi: Regenerate.
* target.def (gen_ccmp_first): Change argument types to rtx_insn *.
(gen_ccmp_next): Likewise.


Looks reasonable, but has this been tested on aarch64? I think that's a 
prerequisite for this patch.



Bernd


Re: [PATCH] Simplify conditions in EVRP, handle taken edge

2016-10-18 Thread Richard Biener
On Mon, 17 Oct 2016, Richard Biener wrote:

> 
> This refactors propagation vs. substitution and handles condition
> simplification properly as well as passing a known taken edge down
> to the DOM walker (avoiding useless work and properly handling PHIs).
> 
> If we do all the work it's stupid to not fold away dead code...
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

The following is what I applied, also fixing a spelling mistake noticed
by Bernhard.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Handle
not visited but non-executable predecessors.  Return taken edge.
Simplify conditions and refactor propagation vs. folding step.

* gcc.dg/tree-ssa/pr20318.c: Disable EVRP.
* gcc.dg/tree-ssa/pr21001.c: Likewise.
* gcc.dg/tree-ssa/pr21090.c: Likewise.
* gcc.dg/tree-ssa/pr21294.c: Likewise.
* gcc.dg/tree-ssa/pr21563.c: Likewise.
* gcc.dg/tree-ssa/pr23744.c: Likewise.
* gcc.dg/tree-ssa/pr25382.c: Likewise.
* gcc.dg/tree-ssa/pr68431.c: Likewise.
* gcc.dg/tree-ssa/vrp03.c: Likewise.
* gcc.dg/tree-ssa/vrp06.c: Likewise.
* gcc.dg/tree-ssa/vrp07.c: Likewise.
* gcc.dg/tree-ssa/vrp09.c: Likewise.
* gcc.dg/tree-ssa/vrp19.c: Likewise.
* gcc.dg/tree-ssa/vrp20.c: Likewise.
* gcc.dg/tree-ssa/vrp92.c: Likewise.
* gcc.dg/pr68217.c: Likewise.
* gcc.dg/predict-9.c: Likewise.
* gcc.dg/tree-prof/val-prof-5.c: Adjust.
* gcc.dg/predict-1.c: Likewise.



Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241242)
+++ gcc/tree-vrp.c  (working copy)
@@ -10741,12 +10741,13 @@ evrp_dom_walker::before_dom_children (ba
   gimple_stmt_iterator gsi;
   edge e;
   edge_iterator ei;
-  bool has_unvisived_preds = false;
+  bool has_unvisited_preds = false;
 
   FOR_EACH_EDGE (e, ei, bb->preds)
-if (!(e->src->flags & BB_VISITED))
+if (e->flags & EDGE_EXECUTABLE
+   && !(e->src->flags & BB_VISITED))
   {
-   has_unvisived_preds = true;
+   has_unvisited_preds = true;
break;
   }
 
@@ -10756,7 +10757,7 @@ evrp_dom_walker::before_dom_children (ba
   gphi *phi = gpi.phi ();
   tree lhs = PHI_RESULT (phi);
   value_range vr_result = VR_INITIALIZER;
-  if (!has_unvisived_preds
+  if (!has_unvisited_preds
  && stmt_interesting_for_vrp (phi))
extract_range_from_phi_node (phi, &vr_result);
   else
@@ -10764,81 +10765,90 @@ evrp_dom_walker::before_dom_children (ba
   update_value_range (lhs, &vr_result);
 }
 
+  edge taken_edge = NULL;
+
   /* Visit all other stmts and discover any new VRs possible.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
-  edge taken_edge;
   tree output = NULL_TREE;
   gimple *old_stmt = stmt;
   bool was_noreturn = (is_gimple_call (stmt)
   && gimple_call_noreturn_p (stmt));
 
-  /* TODO, if found taken_edge, we should visit (return it) and travel
-again to improve VR as done in DOM/SCCVN optimizations.  It should
-be done carefully as stmts might prematurely leave a BB like
-in EH.  */
-  if (stmt_interesting_for_vrp (stmt))
+  if (gcond *cond = dyn_cast <gcond *> (stmt))
+   {
+ vrp_visit_cond_stmt (cond, &taken_edge);
+ if (taken_edge)
+   {
+ if (taken_edge->flags & EDGE_TRUE_VALUE)
+   gimple_cond_make_true (cond);
+ else if (taken_edge->flags & EDGE_FALSE_VALUE)
+   gimple_cond_make_false (cond);
+ else
+   gcc_unreachable ();
+   }
+   }
+  else if (stmt_interesting_for_vrp (stmt))
{
+ edge taken_edge;
  value_range vr = VR_INITIALIZER;
  extract_range_from_stmt (stmt, &taken_edge, &output, &vr);
  if (output
  && (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE))
-   update_value_range (output, &vr);
- else
-   set_defs_to_varying (stmt);
-
- /* Try folding stmts with the VR discovered.  */
- bool did_replace
-   = replace_uses_in (stmt,
-  op_with_constant_singleton_value_range);
- if (fold_stmt (&gsi, follow_single_use_edges)
- || did_replace)
-   update_stmt (gsi_stmt (gsi));
-
- if (did_replace)
{
- /* If we cleaned up EH information from the statement,
-remove EH edges.  */
- if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-   bitmap_set_bit (need_eh_cleanup, bb->index);
-
- /* If we turned a not noreturn call into a noreturn one
-schedule it for fixup.  */
- if (!was_noreturn
- && is_gimple_call (stmt)
- 

Re: RFC [1/3] divmod transform v2

2016-10-18 Thread Richard Biener
On Tue, 18 Oct 2016, Prathamesh Kulkarni wrote:

> On 18 October 2016 at 02:46, Jeff Law  wrote:
> > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
> >>
> >> This patch is mostly the same as previous one, except it drops
> >> targeting __udivmoddi4() because it gave undefined reference link
> >> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
> >> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
> >> needed for the target (it was a bug in my patch that called
> >> __udivmoddi4() even though aarch64 supported hardware div).
> >
> > This touches on the one high level question I had.  Namely what is the code
> > generation strategy if the hardware has a div, but not divmod.
> The divmod transform isn't enabled if target supports hardware div in the same
> or wider mode even if divmod libfunc is available for the given mode.
> >
> > ISTM in that case I think we want to use the div instruction and synthesize
> > mod from that result rather than relying on a software divmod.  So it looks
> > like you ought to be doing the right thing for that case now based on your
> > comment above.
> >>
> >>
> >> However this makes me wonder if it's guaranteed that __udivmoddi4()
> >> will be available for a target if it doesn't have hardware div and
> >> divmod insn and doesn't have target-specific libfunc for DImode
> >> divmod ? To be conservative, the attached patch doesn't generate call
> >> to __udivmoddi4.
> >
> > I don't think that's a safe assumption.  Addition of the divmod routines
> > into libgcc is controlled by the target and have to be explicitly added
> > AFAICT.
> >
> > So on a target like the fr30 which has no div or mod insn and doesn't define
> > the right bits in libgcc, there is no divmod libcall available. (On these
> > targets there's a div libcall and a mod libcall, but not a combined one).
> Thanks. I had erroneously assumed __udivmoddi4() was available for all
> targets because it was defined in libgcc2.c and generated call to it as
> "last resort" for unsigned DImode case if target didn't support hardware
> div and divmod insn and didn't have target-specific divmod libfunc for
> DImode.
> The reason why it generated undefined reference on aarch64-linux-gnu
> was because I didn't properly check in the patch if target supported
> hardware div, and ended up generating call to __udivmoddi4() even though
> aarch64 has hardware div. I rectified that before posting the patch.
> >
> > I don't even think we have a way of knowing in the compiler if the target
> > has enabled divmod support in libgcc.

Yeah, that's what bothers me with the current optab libfunc query
setup -- it isn't reliable.

> Well the arm and c6x backends register target-specific divmod libfunc via
> set_optab_libfunc(). I suppose that's sufficient to know if target has
> divmod enabled in libgcc ?
> >
> > ISTM that for now we have to limit to cases where we have a divmod
> > insn/libcall defined.
> Indeed. The transform is enabled only if the target has hardware divmod insn
> or divmod libfunc (in the libfunc case, no hardware div insn).
> Please see divmod_candidate_p() in the patch for cases when the
> transform gets enabled.

But after your patch the divmod libfunc is never available?

Richard.

> Thanks,
> Prathamesh
> >
> > jeff
> >
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
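
For the case Jeff raises, where the hardware has a div instruction but no divmod, the remainder can be synthesized from the quotient. A minimal C sketch of that arithmetic follows; `struct udivmod` and `udivmod_via_div` are hypothetical names and this is not code from the patch.

```c
/* Quotient and remainder pair returned by the helper below.  */
struct udivmod { unsigned long long quot, rem; };

/* Hypothetical helper: compute both results with a single division.
   The remainder is synthesized as a - q * b, so neither a separate mod
   operation nor a divmod libcall is needed.  B must be nonzero.  */
struct udivmod
udivmod_via_div (unsigned long long a, unsigned long long b)
{
  struct udivmod r;
  r.quot = a / b;
  r.rem = a - r.quot * b;
  return r;
}
```

The cost is one multiply and one subtract on top of the division, which is typically far cheaper than a second division or a library call.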


Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov
> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkov  wrote:
> 
>> 
>> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
>> 
>> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>>>> attempted to scale down the register limit used by -fsched-pressure for 
>>>> the case where the block in question executes as frequently as the entry 
>>>> block to just the call_clobbered (i.e. call_used) regs. But the code is 
>>>> actually scaling toward call_saved registers. The following patch corrects 
>>>> that by computing call_saved regs per class and subtracting out some 
>>>> scaled portion of that.
> 
> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>>> Hi Pat,
>>> 
>>> I stared at your patch and current code for a good 30 minutes, and I still 
>>> don't see what is wrong with the current code.
>>> 
>>> With your patch the number of registers from class CL that scheduler has at 
>>> its disposal for a single-basic-block function will be:
>>> 
>>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>>> call_saved_regs_num[CL];
>>> 
>>> where call_saved_regs_num is number of registers in class CL that need to 
>>> be saved in the prologue (i.e., "free" registers).  I can see some logic in 
>>> setting
>>> 
>>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>>> 
>>> but not in subtracting number of such registers from the number of total 
>>> available hard registers.
>>> 
>>> Am I missing something?
>>> 
>> 
>> Your original patch gave the following reasoning:
>> 
>> "At the moment the scheduler does not account for spills in the prologues 
>> and restores in the epilogue, which occur from use of call-used registers.  
>> The current state is, essentially, optimized for case when there is a hot 
>> loop inside the function, and the loop executes significantly more often 
>> than the prologue/epilogue.  However, on the opposite end, we have a case 
>> when the function is just a single non-cyclic basic block, which executes 
>> just as often as prologue / epilogue, so spills in the prologue hurt 
>> performance as much as spills in the basic block itself.  In such a case the 
>> scheduler should throttle-down on the number of available registers and try 
>> to not go beyond call-clobbered registers."
>> 
>> But the misunderstanding is that call-used registers do NOT cause any 
>> save/restore. That is to say, call-used == call-clobbered. Your last 
>> sentence explains the goal for a single block function, to not go beyond 
>> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
>> patch implements that goal by subtracting out call_saved_regs_num (those 
>> that require prolog/epilog save/restore) from the total regs, and using that 
>> as the target # of registers to be used for the block.
> 
> I see your point and agree that current code isn't optimal.  However, I don't 
> think your patch is accurate either.  Consider 
> https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
> that FIXED_REGISTERS in class CL is set for a third of the registers, and 
> CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have 
> a third available for zero-cost allocation 
> (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
> allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
> (FIXED_REGISTERS).
> 
> For a non-loop-single-basic-block function we should be targeting only the 
> third of registers available at zero-cost -- correct?  This is what is done by 
> the current code, but, apparently, by accident.  It seems that the right 
> register count can be obtained with:
> 
> for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> - if (call_used_regs[ira_class_hard_regs[cl][i]])
> -   ++call_used_regs_num[cl];
> + if (!call_used_regs[ira_class_hard_regs[cl][i]]
> +   || fixed_regs[ira_class_hard_regs[cl][i]])
> +   ++call_saved_regs_num[cl];
> 
> Does this look correct to you?

Thinking some more, it seems like fixed_regs should not be available to the 
scheduler no matter what.  Therefore, the number of fixed registers should be 
subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
bb_freq).

--
Maxim Kuvyrkov
www.linaro.org
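
One possible reading of the adjustment discussed in this thread, written as a self-contained C sketch rather than as scheduler code: fixed registers are removed outright, and call-saved registers are withheld in proportion to entry_freq / bb_freq. The function `sched_pressure_limit` and its flag arrays are hypothetical stand-ins for the per-class data in IRA, and the sketch assumes 0 < entry_freq <= bb_freq.

```c
/* Toy model of one register class.  CALL_USED and FIXED hold a 0/1 flag
   per register.  Returns the number of registers the scheduler would
   target for a block with frequency BB_FREQ.  */
int
sched_pressure_limit (const char *call_used, const char *fixed, int n_regs,
                      int entry_freq, int bb_freq)
{
  int n_fixed = 0, n_call_saved = 0;

  for (int i = 0; i < n_regs; i++)
    {
      if (fixed[i])
        n_fixed++;
      else if (!call_used[i])
        n_call_saved++;
    }

  /* Fixed registers are never available.  Call-saved ones are withheld
     in proportion to how often the block runs relative to the entry.  */
  return n_regs - n_fixed - n_call_saved * entry_freq / bb_freq;
}
```

With the thirds example above, a single-block function is limited to the zero-cost third, while a block that runs ten times as often as the entry gets everything except the fixed registers.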



Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:

 {
-  rtx r0, r16, eqv, tga, tp, insn, dest, seq;
+  rtx r0, r16, eqv, tga, tp, dest, seq;
+  rtx_insn *insn;

   switch (tls_symbolic_operand_type (x))
{
@@ -1025,66 +1026,70 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, machine_mode mode)
  break;

case TLS_MODEL_GLOBAL_DYNAMIC:
- start_sequence ();
+ {
+   start_sequence ();

- r0 = gen_rtx_REG (Pmode, 0);
- r16 = gen_rtx_REG (Pmode, 16);
- tga = get_tls_get_addr ();
- dest = gen_reg_rtx (Pmode);
- seq = GEN_INT (alpha_next_sequence_number++);
+   r0 = gen_rtx_REG (Pmode, 0);
+   r16 = gen_rtx_REG (Pmode, 16);
+   tga = get_tls_get_addr ();
+   dest = gen_reg_rtx (Pmode);
+   seq = GEN_INT (alpha_next_sequence_number++);

- emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
- insn = gen_call_value_osf_tlsgd (r0, tga, seq);
- insn = emit_call_insn (insn);
- RTL_CONST_CALL_P (insn) = 1;
- use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
+   emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
+   rtx val = gen_call_value_osf_tlsgd (r0, tga, seq);


Since this doesn't consistently declare variables at the point of 
initialization, might as well put val into the list of variables at the 
top, and avoid reindentation that way. There are several such reindented 
blocks, and the patch would be a lot easier to review without this.


Alternatively, split it up a bit more into obvious/nonobvious parts.


diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 21bba0c..8e8fff4 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -4829,7 +4829,6 @@ static rtx
 arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
 {
   rtx r0 = gen_rtx_REG (Pmode, R0_REG);
-  rtx insns;
   rtx call_fusage = NULL_RTX;

   start_sequence ();
@@ -4846,7 +4845,7 @@ arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
   RTL_PURE_CALL_P (call_insn) = 1;
   add_function_usage_to (call_insn, call_fusage);

-  insns = get_insns ();
+  rtx_insn *insns = get_insns ();
   end_sequence ();


For example, stuff like this looks obvious enough that it can go in.


Bernd


Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-18 Thread Marek Polacek
On Tue, Oct 18, 2016 at 10:12:24AM +0200, Richard Biener wrote:
> On Mon, Oct 17, 2016 at 6:57 PM, Senthil Kumar Selvaraj
>  wrote:
> >
> > Richard Biener writes:
> >
> >> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
> >>  wrote:
> >>> Hi,
> >>>
> >>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
> >>>   same issue on a gcc 5.x and found that the fix didn't get backported.
> >>>
> >>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
> >>>   backport to gcc-5-branch?
> >>
> >> Ok with me but please double-check there was no fallout.
> >
> > I bootstrapped and ran against x86_64-pc-linux again, just to be sure.
> > No regressions.
> 
> I meant fallout only fixed with followup patches.  ISTR some in that area
> but I might confuse it with another patch.  Marek might remember.

I don't remember any fallout here (and a quick look at the ML around that
time doesn't reveal any).

Marek


Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Richard Biener
On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
>>
>> Mike Stump:
>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>>>> Therefore, it is better to emit it in all circumstances, in case the
>>>> reader needs to know what the working directory was at compile-time.
>>>
>>> I can't help but wonder if this would break ccache some?
>>>
>>
>> Could you explain this in some more detail? At the moment, GCC will already 
>> emit DW_AT_name with an absolute path (in the scenario that this patch is 
>> relevant to). How does ccache work around this at the moment? (Does it use 
>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
>> patch should be fine.)
>
> If you compile the same file, but in a different directory, I wonder if cwd 
> will cause the cache entry to not be reused.
>
> I expect one of the ccache people that are around would just know if it will 
> care at all.

I believe ccache compares preprocessed source, definitely _not_ DWARF
output, so this shouldn't break anything there.
It might result in different object file output but as the reporter
figured due to a bug in dwarf2out.c we end up generating
DW_AT_comp_dir in almost all cases already.

I think the patch is ok but it misses a ChangeLog entry.  How did you
test the patch? (bootstrapped and tested on ...?)

Thanks,
Richard.


[Bug fortran/78009] [OOP] polymorphic component of derived type array slice handling error

2016-10-18 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78009

Dominique d'Humieres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-10-18
     Ever confirmed|0                           |1

--- Comment #1 from Dominique d'Humieres  ---
Confirmed from 4.8 up to trunk (7.0).

[Bug c++/78019] New: Local class with lambda in default member initializer cannot default-capture this

2016-10-18 Thread colu...@gmx-topmail.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78019

Bug ID: 78019
   Summary: Local class with lambda in default member initializer
cannot default-capture this
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: colu...@gmx-topmail.de
  Target Milestone: ---

int main() {
struct A {
int x, i = [&] { return x; }();
} a{0};
}

-
> error: 'this' was not captured for this lambda function 

Making the default-capture `=' doesn't help, but explicitly capturing `this'
works.
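
The workaround mentioned above can be sketched as follows. This is only an
illustration of the explicit `this' capture, assuming C++14 aggregate
initialization; the function name local_class_value and the "+ 1" are made up
for the example and are not part of the report.

```cpp
// Sketch of the workaround from the report: capture `this` explicitly
// instead of relying on a default capture ([&] or [=]).
// local_class_value is a hypothetical name used only for this example.
int local_class_value(int seed) {
    struct A {
        int x;
        // Explicit [this] capture; the report states this form is accepted.
        int i = [this] { return x + 1; }();
    } a{seed};  // x is initialized from seed, i from its default initializer
    return a.i;
}
```

Members are initialized in declaration order, so x already holds its value
when the initializer of i runs.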

Re: [PATCH] Fix PR77916

2016-10-18 Thread Markus Trippelsdorf
On 2016.10.18 at 11:19 +0200, Christophe Lyon wrote:
> On 18 October 2016 at 05:18, Markus Trippelsdorf  
> wrote:
> > On 2016.10.18 at 05:13 +0200, Markus Trippelsdorf wrote:
> >> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> >> > Hi,
> >> >
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> >> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> >> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> >> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> >> > of pointer type.  This patch recognizes when pointer arithmetic is taking
> >> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> >> > the case of the PR, this occurs in the logic where the stride S is a 
> >> > known
> >> > constant value, but the same problem could occur when it is an SSA_NAME
> >> > without known value.  Both possibilities are handled here.
> >> >
> >> > Fixing the code to ensure that the unknown stride case always uses an
> >> > initializer for a negative increment allows us to remove the stopgap fix
> >> > added for PR77937 as well.
> >> >
> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> > regressions, committed.
> >>
> >> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> >> X86_64 before committing these patches, because you broke it for the
> >> third time in the last couple of days.
> >>
> >> markus@x4 ffmpeg % cat h264dsp.i
> >> extern int fn2(int);
> >> extern int fn3(int);
> >> int a, b, c;
> >> void fn1(long p1) {
> >>   char *d;
> >>   for (;; d += p1) {
> >> d[0] = fn2(1 >> c >> 1);
> >> fn2(c >> a);
> >> d[1] = fn3(d[1]) >> 1;
> >> d[6] = fn3(d[6] * b + 1) >> 1;
> >> d[7] = fn3(d[7] * b + 1) >> 1;
> >> d[8] = fn3(d[8] * b + 1) >> 1;
> >>   }
> >> }
> >>
> >> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
> >> h264dsp.i: In function ‘fn1’:
> >> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
> >> gimple-ssa-strength-reduction.c:3375
> >>  void fn1(long p1) {
> >>   ^~~
> >> 0x12773a9 replace_one_candidate
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
> >> 0x127af77 replace_profitable_candidates
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
> >> 0x127aeeb replace_profitable_candidates
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
> >> 0x127f3ee analyze_candidates_and_replace
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
> >> 0x127f3ee execute
> >> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
> >
> > Just figured out that this testcase is identical to:
> > gcc/testsuite/gcc.dg/torture/pr77937-2.c
> >
> > So please run the testsuite on X86_64 in the future.
> >
> 
> 
> I'm not sure whether Markus means that pr77937-2 fails since this
> commit?
> 
> I'm seeing ICEs on pr77937-2 on some arm targets:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241285/report-build-info.html
> 
> but I'm not 100% sure it was caused by this patch (the regression
> happened between r241273 and r241285); I haven't looked in detail
> yet.

Yes, this is caused by this patch.

-- 
Markus


Re: [PATCH, libfortran] PR 48587 Newunit allocator

2016-10-18 Thread Janne Blomqvist
On Tue, Oct 18, 2016 at 12:09 PM, Steven Bosscher  wrote:
> On Thu, Oct 13, 2016 at 5:16 PM, Janne Blomqvist wrote:
>> +static bool *newunits;
>
> You could make this a bitmap (like sbitmap). A bit more code but makes
> a potentially quadratic search (when opening many units) less time
> consuming.

I did think about that, yes, but decided that it wasn't worth the
extra complexity since

a) The OS typically limits the number of fd's per process to a
relatively small number (typically 1024 by default).

b) For better or worse, in libgfortran a unit is a quite big
structure, not to mention the 8 kB buffer. So obsessing over wasting
an extra 7 bits per unit seemed pointless.

c) Due to the newunit_lwi, in many scenarios it should be able to skip
scanning over, if not all then at least most of, the in-use units. Of
course, it's possible to design a scenario which defeats the lwi, but,
is that something real software does? And even if it does, due to a)
above I think the effect would be quite modest anyway.
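
To make point c) concrete, here is a small sketch of a flag-per-slot allocator
with a low-water index. It is not the libgfortran code: SlotAllocator, used_
and lwi_ are hypothetical names, and the real patch may differ in detail. The
idea it shows is only that every slot below the index is known to be in use,
so a search can start there instead of at zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: one flag per slot plus a low-water index (lwi_).
// Invariant: every slot with index < lwi_ is in use.
class SlotAllocator {
public:
    // Return the lowest free slot, growing the table if none is free.
    std::size_t allocate() {
        for (std::size_t i = lwi_; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = 1;
                lwi_ = i + 1;   // slots 0..i are now all in use
                return i;
            }
        }
        used_.push_back(1);
        lwi_ = used_.size();    // the whole table is in use
        return used_.size() - 1;
    }

    // Free a slot (assumed to be a previously allocated one) and lower the
    // index so the next search starts no later than this slot.
    void release(std::size_t slot) {
        used_[slot] = 0;
        if (slot < lwi_)
            lwi_ = slot;
    }

private:
    std::vector<char> used_;    // one byte per slot, like a plain bool array
    std::size_t lwi_ = 0;
};
```

With units opened and closed mostly in order, the loop in allocate() starts
at or near the first free slot, which is why the scan is rarely long in
practice.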



-- 
Janne Blomqvist


Re: [PATCH 4/7] remove cast to rtx_insn * in remove_note

2016-10-18 Thread Bernd Schmidt

On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:


2016-10-17  Trevor Saunders  

* config/rl78/rl78.c (gen_and_emit_move): Change argument type
to rtx_insn *.
(transcode_memory_rtx): Likewise.
(move_to_acc): Likewise.
(move_from_acc): Likewise.
(move_acc_to_reg): Likewise.
(move_to_x): Likewise.
(move_to_hl): Likewise.
(move_to_de): Likewise.
* config/rs6000/rs6000.c (emit_frame_save): Likewise.
(rs6000_emit_savres_rtx): Likewise.
(rs6000_emit_prologue): Likewise.
* reorg.c (update_reg_unused_notes): Likewise.
* rtl.h (remove_note): Adjust prototype.
* rtlanal.c (remove_note): Make argument type rtx_insn *.


Ok.


Bernd



[Bug c++/78018] New: [C++14] "internal compiler error: Segmentation fault" with templates and lambdas

2016-10-18 Thread andipeer at gmx dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78018

Bug ID: 78018
   Summary: [C++14] "internal compiler error: Segmentation fault"
with templates and lambdas
   Product: gcc
   Version: 6.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andipeer at gmx dot net
  Target Milestone: ---

Created attachment 39828
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39828&action=edit
Minimal working example for triggering the crash

Trying to compile the attached minimal working example crashes the compiler
with a segfault. The exact output is given below. The code should be
correct, as other compilers like Clang have no problem compiling it.
Note that the bug does not occur if all methods of A are declared inline. It
also does not occur if one element of the template-lambda call chain is
removed.

Command line: "g++ -std=c++14 -c error.cpp"

The error that is thrown is:

error.cpp: In member function ‘void A::f2(F) [with F =
A::f1()::]’:
error.cpp:24:6: internal compiler error: Segmentation fault
 void A::f2(F f)
  ^
0xad48ef crash_signal
../../src/gcc/toplev.c:333
0xd286ba wi::extended_tree<128>::get_len() const
../../src/gcc/tree.h:5268
0xd286ba wi::int_traits >
>::decompose(long*, unsigned int, generic_wide_int >
const&)
../../src/gcc/wide-int.h:898
0xd286ba
wide_int_ref_storage::wide_int_ref_storage
> >(generic_wide_int > const&, unsigned int)
../../src/gcc/wide-int.h:945
0xd286ba generic_wide_int::generic_wide_int >
>(generic_wide_int > const&, unsigned int)
../../src/gcc/wide-int.h:722
0xd286ba wi::unary_traits >
>::result_type wi::lshift >,
int>(generic_wide_int > const&, int const&)
../../src/gcc/wide-int.h:2847
0xd286ba int_bit_position(tree_node const*)
../../src/gcc/tree.h:5377
0xd286ba classify_argument
../../src/gcc/config/i386/i386.c:8095
0xd28b89 examine_argument
../../src/gcc/config/i386/i386.c:8409
0xd28de6 function_arg_advance_64
../../src/gcc/config/i386/i386.c:8822
0xd28de6 ix86_function_arg_advance
../../src/gcc/config/i386/i386.c:8915
0x8afd94 gimplify_parameters()
../../src/gcc/function.c:3999
0x8e5ef4 gimplify_body(tree_node*, bool)
../../src/gcc/gimplify.c:11522
0x8e6097 gimplify_function_tree(tree_node*)
../../src/gcc/gimplify.c:11682
0x7c5087 cgraph_node::analyze()
../../src/gcc/cgraphunit.c:625
0x7c7a2f analyze_functions
../../src/gcc/cgraphunit.c:1086
0x7c81a8 symbol_table::finalize_compilation_unit()
../../src/gcc/cgraphunit.c:2542

[PATCH] Don't define uses-allocator variable templates in C++11

2016-10-18 Thread Jonathan Wakely

These variable templates give warnings in C++11 mode when
-Wsystem-headers is used:

In file included from /home/jwakely/gcc/7/include/c++/7.0.0/memory:77:0,
from vt.cc:1:
/home/jwakely/gcc/7/include/c++/7.0.0/bits/uses_allocator.h:130:20: warning: 
variable templates only available with -std=c++14 or -std=gnu++14
constexpr bool __is_uses_allocator_constructible_v =
   ^~~
/home/jwakely/gcc/7/include/c++/7.0.0/bits/uses_allocator.h:141:20: warning: 
variable templates only available with -std=c++14 or -std=gnu++14
constexpr bool __is_nothrow_uses_allocator_constructible_v =
   ^~~

We don't need them, so let's only define them for C++14 and up.

* include/bits/uses_allocator.h (__is_uses_allocator_constructible_v)
(__is_nothrow_uses_allocator_constructible_v): Only define for C++14
and later.

Tested powerpc64le-linux, committed to trunk.
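
For readers unfamiliar with the pattern, the guard can be sketched outside
libstdc++ like this. The trait is_small and the helper char_is_small are
hypothetical stand-ins, not names from uses_allocator.h; only the
__cplusplus >= 201402L check mirrors the patch.

```cpp
#include <type_traits>

// Hypothetical trait standing in for __is_uses_allocator_constructible.
template<typename T>
struct is_small : std::integral_constant<bool, (sizeof(T) <= 4)> { };

#if __cplusplus >= 201402L
// Variable templates are a C++14 feature, so the _v shorthand is only
// defined when compiling as C++14 or later.
template<typename T>
constexpr bool is_small_v = is_small<T>::value;
#endif // C++14

// Code that must also build as C++11 keeps using the ::value form.
constexpr bool char_is_small() { return is_small<char>::value; }
```

Nothing that has to compile in C++11 mode may refer to the _v form, which
matches the "we don't need them" remark above.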

commit b19fd14727318d5d6f3a411a2a600f89d07ab28a
Author: Jonathan Wakely 
Date:   Tue Oct 18 12:31:11 2016 +0100

Don't define uses-allocator variable templates in C++11

* include/bits/uses_allocator.h (__is_uses_allocator_constructible_v)
(__is_nothrow_uses_allocator_constructible_v): Only define for C++14
and later.

diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index c7d14f3..612c53c 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -126,9 +126,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : __is_uses_allocator_predicate
 { };
 
+#if __cplusplus >= 201402L
   template
 constexpr bool __is_uses_allocator_constructible_v =
   __is_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
+#endif // C++14
 
   template
 struct __is_nothrow_uses_allocator_constructible
@@ -137,9 +139,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
 
+#if __cplusplus >= 201402L
   template
 constexpr bool __is_nothrow_uses_allocator_constructible_v =
   __is_nothrow_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
+#endif // C++14
 
   template
 void __uses_allocator_construct_impl(__uses_alloc0 __a, _Tp* __ptr,


[Bug c++/78019] Local class with lambda in default member initializer cannot default-capture this

2016-10-18 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78019

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-10-18
     Ever confirmed|0                           |1

[Bug c++/78018] [C++14] "internal compiler error: Segmentation fault" with templates and lambdas

2016-10-18 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78018

Markus Trippelsdorf  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-10-18
                 CC|                            |trippels at gcc dot gnu.org
     Ever confirmed|0                           |1
      Known to fail|                            |5.4.0, 6.2.0, 7.0

--- Comment #1 from Markus Trippelsdorf  ---
Confirmed. It is not a regression, because it never worked.

Even icc crashes:

markus@x4 tmp % icpc -c error.cpp
error.cpp(20): internal error: bad pointer
  f2([&] (auto t) { f3(t); } );
^
compilation aborted for error.cpp (code 4)

Clang accepts the code.

Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Senthil Kumar Selvaraj

Bernd Schmidt writes:

> On 10/13/2016 08:57 AM, Senthil Kumar Selvaraj wrote:
>>
>> 2016-10-13  Senthil Kumar Selvaraj  
>>
>>  * reload.c (find_valid_class_1): Allow regclass if at least one
>>  regno in class is ok. Compute and use rclass size based on
>>  actually available regnos for mode in rclass.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-10-13  Senthil Kumar Selvaraj  
>>  
>>  * gcc.target/avr/pr71627.c: New.
>>
>>
>> Index: gcc/reload.c
>> ===
>> --- gcc/reload.c (revision 240989)
>> +++ gcc/reload.c (working copy)
>> @@ -711,31 +711,36 @@
>>enum reg_class best_class = NO_REGS;
>>unsigned int best_size = 0;
>>int cost;
>> +  unsigned int computed_rclass_sizes[N_REG_CLASSES] = { 0 };
>
> As far as I can tell you're only accessing this as
> computed_rclass_sizes[rclass], i.e. with the current class in the loop.
> So I don't think you need the array at all, just a computed_size
> variable in the loop?

Yes - I mechanically replaced the original array with the computed one.
A variable would suffice.
>
>> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +{
>> +  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno))
>> +{
>> +  atleast_one_regno_ok = 1;
>> +  if (HARD_REGNO_MODE_OK (regno, mode))
>> +computed_rclass_sizes[rclass]++;
>> +}
>> +}
>
> Don't you want to also ensure HARD_REGNO_MODE_OK before claiming that 
> atleast_one_regno_ok? Maybe I'm forgetting the motivation but this seems 
> odd. If so, the variable becomes unnecessary, just check the computed size.

True again - the original intention was to prevent the best_xxx
variables from getting set if no regno was in_hard_reg_set. Now the
computed class size would be zero, so the variable is unnecessary.
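
A rough sketch of the simplified counting, under stated assumptions: this is
not reload.c code, and RegInfo and usable_class_size are hypothetical
stand-ins for the in_hard_reg_set_p and HARD_REGNO_MODE_OK checks on each
hard register. It only shows why a single counter can replace both the array
and the atleast_one_regno_ok flag.

```cpp
#include <vector>

// Hypothetical per-register facts for one (class, mode) pair.
struct RegInfo {
    bool in_class;  // stands in for in_hard_reg_set_p (...)
    bool mode_ok;   // stands in for HARD_REGNO_MODE_OK (regno, mode)
};

// Count only registers that are in the class and can hold the mode.
// A result of zero means "no usable register", so no separate flag is needed.
unsigned usable_class_size(const std::vector<RegInfo>& regs) {
    unsigned size = 0;
    for (const RegInfo& r : regs)
        if (r.in_class && r.mode_ok)
            ++size;
    return size;
}
```

A caller would skip the class when the result is zero and otherwise compare
the result against the best size seen so far.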

Will do both the changes and re-run the reg tests. Ok for trunk if the
tests pass for x86_64-pc-linux and avr?

Regards
Senthil


[PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Richard Biener

The following patch makes EVRP remove stmts that will become dead
after propagation.  For this to work we have to propagate into PHIs
(something we missed as well).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
stmts_to_remove.
(evrp_dom_walker::~evrp_dom_walker): Free it.
(evrp_dom_walker::stmts_to_remove): Add.
(evrp_dom_walker::before_dom_children): Mark PHIs and stmts
whose output we fully propagate for removal.  Propagate
into BB destination PHI arguments.
(execute_early_vrp): Remove queued stmts.  Dump value ranges
before stmt removal.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241302)
+++ gcc/tree-vrp.c  (working copy)
@@ -10643,11 +10643,13 @@ public:
 : dom_walker (CDI_DOMINATORS), stack (10)
 {
   stmts_to_fixup.create (0);
+  stmts_to_remove.create (0);
   need_eh_cleanup = BITMAP_ALLOC (NULL);
 }
   ~evrp_dom_walker ()
 {
   stmts_to_fixup.release ();
+  stmts_to_remove.release ();
   BITMAP_FREE (need_eh_cleanup);
 }
   virtual edge before_dom_children (basic_block);
@@ -10660,6 +10662,7 @@ public:
   auto_vec > stack;
   bitmap need_eh_cleanup;
   vec stmts_to_fixup;
+  vec stmts_to_remove;
 };
 
 
@@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
   else
set_value_range_to_varying (_result);
   update_value_range (lhs, _result);
+
+  /* Mark PHIs whose lhs we fully propagate for removal.  */
+  tree val;
+  if ((val = op_with_constant_singleton_value_range (lhs))
+ && may_propagate_copy (lhs, val))
+   {
+ stmts_to_remove.safe_push (phi);
+ continue;
+   }
 }
 
   edge taken_edge = NULL;
@@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
  update_value_range (output, );
  vr = *get_value_range (output);
 
-
  /* Set the SSA with the value range.  */
  if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
{
@@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
   && range_includes_zero_p (vr.min,
 vr.max) == 1)))
set_ptr_nonnull (output);
+
+ /* Mark stmts whose output we fully propagate for removal.  */
+ tree val;
+ if ((val = op_with_constant_singleton_value_range (output))
+ && may_propagate_copy (output, val)
+ && !stmt_could_throw_p (stmt)
+ && !gimple_has_side_effects (stmt))
+   {
+ stmts_to_remove.safe_push (stmt);
+ continue;
+   }
}
  else
set_defs_to_varying (stmt);
@@ -10860,6 +10882,24 @@ evrp_dom_walker::before_dom_children (ba
}
}
 }
+
+  /* Visit BB successor PHI nodes and replace PHI args.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+{
+  for (gphi_iterator gpi = gsi_start_phis (e->dest);
+  !gsi_end_p (gpi); gsi_next (&gpi))
+   {
+ gphi *phi = gpi.phi ();
+ use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, e);
+ tree arg = USE_FROM_PTR (use_p);
+ if (TREE_CODE (arg) != SSA_NAME
+ || virtual_operand_p (arg))
+   continue;
+ if (tree val = op_with_constant_singleton_value_range (arg))
+   propagate_value (use_p, val);
+   }
+}
+ 
   bb->flags |= BB_VISITED;
 
   return taken_edge;
@@ -10941,6 +10981,34 @@ execute_early_vrp ()
   evrp_dom_walker walker;
   walker.walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
+  if (dump_file)
+{
+  fprintf (dump_file, "\nValue ranges after Early VRP:\n\n");
+  dump_all_value_ranges (dump_file);
+  fprintf (dump_file, "\n");
+}
+
+  /* Remove stmts in reverse order to make debug stmt creation possible.  */
+  while (! walker.stmts_to_remove.is_empty ())
+{
+  gimple *stmt = walker.stmts_to_remove.pop ();
+  if (dump_file && dump_flags & TDF_DETAILS)
+   {
+ fprintf (dump_file, "Removing dead stmt ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ fprintf (dump_file, "\n");
+   }
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (gimple_code (stmt) == GIMPLE_PHI)
+   remove_phi_node (&gsi, true);
+  else
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+ release_defs (stmt);
+   }
+}
+
   if (!bitmap_empty_p (walker.need_eh_cleanup))
 gimple_purge_all_dead_eh_edges (walker.need_eh_cleanup);
 
@@ -10954,12 +11022,6 @@ execute_early_vrp ()
   fixup_noreturn_call (stmt);
 }
 
-  if (dump_file)
-{
-  fprintf (dump_file, "\nValue ranges after Early 

Re: [PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 02:34:58PM +0200, Richard Biener wrote:
> 
> The following patch makes EVRP remove stmts that will become dead
> after propagation.  For this to work we have to propagate into PHIs
> (something we missed as well).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2016-10-18  Richard Biener  
> 
>   * tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
>   stmts_to_remove.
>   (evrp_dom_walker::~evrp_dom_walker): Free it.
>   (evrp_dom_walker::stmts_to_remove): Add.
>   (evrp_dom_walker::before_dom_children): Mark PHIs and stmts
>   whose output we fully propagate for removal.  Propagate
>   into BB destination PHI arguments.
>   (execute_early_vrp): Remove queued stmts.  Dump value ranges
>   before stmt removal.
> 
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c(revision 241302)
> +++ gcc/tree-vrp.c(working copy)
> @@ -10643,11 +10643,13 @@ public:
>  : dom_walker (CDI_DOMINATORS), stack (10)
>  {
>stmts_to_fixup.create (0);
> +  stmts_to_remove.create (0);
>need_eh_cleanup = BITMAP_ALLOC (NULL);
>  }
>~evrp_dom_walker ()
>  {
>stmts_to_fixup.release ();
> +  stmts_to_remove.release ();
>BITMAP_FREE (need_eh_cleanup);
>  }
>virtual edge before_dom_children (basic_block);
> @@ -10660,6 +10662,7 @@ public:
>auto_vec > stack;
>bitmap need_eh_cleanup;
>vec stmts_to_fixup;
> +  vec stmts_to_remove;

That might as well be an auto_vec, right?

>  };
>  
>  
> @@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
>else
>   set_value_range_to_varying (_result);
>update_value_range (lhs, _result);
> +
> +  /* Mark PHIs whose lhs we fully propagate for removal.  */
> +  tree val;
> +  if ((val = op_with_constant_singleton_value_range (lhs))
> +   && may_propagate_copy (lhs, val))

wouldn't it be clearer to write that as

tree val = op_with_constant_singleton_value_range (lhs);
if (val && may_propagate_copy (lhs, val))

> + {
> +   stmts_to_remove.safe_push (phi);
> +   continue;
> + }
>  }
>  
>edge taken_edge = NULL;
> @@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
> update_value_range (output, );
> vr = *get_value_range (output);
>  
> -
> /* Set the SSA with the value range.  */
> if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
>   {
> @@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
>  && range_includes_zero_p (vr.min,
>vr.max) == 1)))
>   set_ptr_nonnull (output);
> +
> +   /* Mark stmts whose output we fully propagate for removal.  */
> +   tree val;
> +   if ((val = op_with_constant_singleton_value_range (output))
> +   && may_propagate_copy (output, val)
> +   && !stmt_could_throw_p (stmt)
> +   && !gimple_has_side_effects (stmt))

similar.

Thanks!

Trev



[Bug middle-end/77964] [7 Regression] Linux kernel firmware loader miscompiled

2016-10-18 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77964

--- Comment #14 from Jiri Slaby  ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Markus Trippelsdorf from comment #9)
> > Is subtracting undefined, too?
> Yes.  Comparing two unrelated arrays or subtracting them is undefined.

But they are not unrelated arrays. So what in the C standard actually makes
gcc treat them as unrelated (and allows it to)?

And given that GCC 7 is yet to be released, can we have a switch to disable
this optimization?

[Bug middle-end/77964] [7 Regression] Linux kernel firmware loader miscompiled

2016-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77964

--- Comment #15 from Jakub Jelinek  ---
(In reply to Jiri Slaby from comment #14)
> (In reply to Andrew Pinski from comment #10)
> > (In reply to Markus Trippelsdorf from comment #9)
> > > Is subtracting undefined, too?
> > Yes.  Comparing two unrelated arrays or subtracting them is undefined.
> 
> But they are not unrelated arrays. So what from the C standard actually
> makes (and allows) gcc think they are unrelated?

C doesn't have any notion of "related" declarations.

> And given that GCC 7 is yet to be released, can we have a switch to disable
> this optimization?

This is nothing new in GCC 7; you've most likely just been extremely lucky in
the past that it happened to work as you expected.  Other projects had to
change similar UB code years ago.  It isn't just a single optimization, but
lots of them that rely on pointer arithmetic being defined only within the
same object.
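
One commonly used way to avoid this kind of pointer arithmetic is to do the
arithmetic on integers instead. The sketch below is an illustration only:
span_count is a hypothetical helper, and whether it suits the kernel code in
question is an assumption. The pointer-to-integer conversion is
implementation-defined rather than undefined, and the subtraction then
happens on plain unsigned integers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of T-sized elements between two addresses,
// computed on integers so no pointer subtraction across objects takes place.
// Assumes a flat address space and that last is not below first.
template<typename T>
std::size_t span_count(const T* first, const T* last) {
    std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(first);
    std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(last);
    return static_cast<std::size_t>((hi - lo) / sizeof(T));
}
```

For two pointers into the same array this gives the same result as
last - first.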

[Bug libitm/63907] libitm/config/posix/rwlock.cc doesn't compile

2016-10-18 Thread lts-rudolph at gmx dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63907

--- Comment #10 from Klaus Rudolph  ---
Created attachment 39830
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39830&action=edit
preprocessed file rwlock.ii

Add rwlock.ii file as requested.

[Bug libgcc/78017] weak reference usage in gthr-posix.h (__gthread*) is broken

2016-10-18 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78017

--- Comment #1 from Andrew Pinski  ---
IIRC this was declared a libc bug or a user error (not using the full
archive). There is another thread on the glibc side if you want to read up on
that.

[Patch,testsuite] Fix sso.exp not calling torture-finish for avr

2016-10-18 Thread Senthil Kumar Selvaraj
Hi,

  When analyzing reg test failures for the avr target, I noticed that the
  torture options used when running dg-torture.exp differed from those on
  x86_64-pc-linux-gnu, resulting in additional failures. I also found
  that a bunch of "torture-without-loops not empty as expected" errors
  show up for a few .exp files.

  These did not occur when the .exp files were run in isolation. On
  further debugging, I found that sso.exp calls dg-init and torture-init,
  and then returns if !effective_target_int32. On targets like avr, where
  that early return is taken, the corresponding finish functions are never
  called. This leaves torture-options set, which causes the errors and the
  differing options.

  The below patch makes the return occur earlier - before calling the
  init functions.

  Committed to trunk.

Regards
Senthil

2016-10-18  Senthil Kumar Selvaraj  

* gcc.dg/sso/sso.exp: Return early if not
effective_target_int32.


Index: gcc.dg/sso/sso.exp
===
--- gcc.dg/sso/sso.exp  (revision 241299)
+++ gcc.dg/sso/sso.exp  (working copy)
@@ -18,6 +18,10 @@
 load_lib gcc-dg.exp
 load_lib torture-options.exp
 
+if { ![check_effective_target_int32] } {
+return
+}
+
 # Initialize `dg'.
 torture-init
 dg-init
@@ -32,10 +36,6 @@
 
 set-torture-options $SSO_TORTURE_OPTIONS
 
-if { ![check_effective_target_int32] } {
-return
-}
-
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
 


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
> The name of Etc/GMT+1 timezone is "-01", as evident from:
>
> $ TZ=Etc/GMT+1 date +%Z
> -01
>
> Attached patch fixes the testsuite failure.

Forgot to say that the patch was tested with tzdata2016g on Fedora 24
and CentOS 5.11.

Uros.


Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
On Tue, 2016-10-18 at 05:13 +0200, Markus Trippelsdorf wrote:
> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> > Hi,
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> > of pointer type.  This patch recognizes when pointer arithmetic is taking
> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> > the case of the PR, this occurs in the logic where the stride S is a known
> > constant value, but the same problem could occur when it is an SSA_NAME
> > without known value.  Both possibilities are handled here.
> > 
> > Fixing the code to ensure that the unknown stride case always uses an 
> > initializer for a negative increment allows us to remove the stopgap fix
> > added for PR77937 as well.
> > 
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions, committed.
> 
> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> X86_64 before committing these patches, because you broke it for the
> third time in the last couple of days.

Sorry, sorry.  I did intend to build another stage1 cross and try this,
but it just slipped my mind.  Looks like I'll need to put the stopgap
fix back in.  I'll do that shortly.

Meantime, -fno-slsr is a workaround.  If I have trouble building ffmpeg
with a cross, I'll check with you on testing a future patch.

Bill

> 
> markus@x4 ffmpeg % cat h264dsp.i
> extern int fn2(int);
> extern int fn3(int);
> int a, b, c;
> void fn1(long p1) {
>   char *d;
>   for (;; d += p1) {
> d[0] = fn2(1 >> c >> 1);
> fn2(c >> a);
> d[1] = fn3(d[1]) >> 1;
> d[6] = fn3(d[6] * b + 1) >> 1;
> d[7] = fn3(d[7] * b + 1) >> 1;
> d[8] = fn3(d[8] * b + 1) >> 1;
>   }
> }
> 
> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
> h264dsp.i: In function ‘fn1’:
> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
> gimple-ssa-strength-reduction.c:3375
>  void fn1(long p1) {
>   ^~~
> 0x12773a9 replace_one_candidate
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
> 0x127af77 replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
> 0x127aeeb replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
> 0x127f3ee analyze_candidates_and_replace
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
> 0x127f3ee execute
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648
> 
> 
> 
> 




Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool

2016-10-18 Thread Segher Boessenkool
On Tue, Oct 18, 2016 at 01:09:24PM +0200, Eric Botcazou wrote:
> > > No, "mode" is the mode of the MEM, not that of the SYMBOL_REF.
> > 
> > I still don't see it, could you explain a bit more?
> 
> MODE is the mode of operands[1] before:
> 
> operands[1] = force_const_mem (mode, operands[1]);
> 
> and after.  But the test is on the address of the MEM, not on the MEM itself:
> 
> if (TARGET_TOC
> && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
> && use_toc_relative_ref (XEXP (operands[1], 0), Pmode))
> 
> because it's the mode of the SYMBOL_REF we are interested in (and
> force_const_mem guarantees that it's Pmode).

We need to pass the mode of the actual datum we would put in the TOC to
the use_toc_relative_ref function, not the mode of its address.

I must be missing something...


Segher


Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Richard Biener
On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
> Richard Biener:
>> On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
>>> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:

>>>> Mike Stump:
>>>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>>>>>> Therefore, it is better to emit it in all circumstances, in case the
>>>>>> reader needs to know what the working
>>>>>> directory was at compile-time.
>>>>>
>>>>> I can't help but wonder if this would break ccache some?
>>>>>
>>>>
>>>> Could you explain this in some more detail? At the moment, GCC will
>>>> already emit DW_AT_name with an absolute path (in the scenario that this
>>>> patch is relevant to). How does ccache work around this at the moment?
>>>> (Does it use debug-prefix-map? In which case, this also affects
>>>> DW_AT_comp_dir, so my patch should be fine.)
>>>
>>> If you compile the same file, but in a different directory, I wonder if cwd 
>>> will cause the cache entry to not be reused.
>>>
>>> I expect one of the ccache people that are around would just know if it 
>>> will care at all.
>>
>> I believe ccache compares preprocessed source, definitely _not_ DWARF
>> output, so this shouldn't break anything there.
>> It might result in different object file output but as the reporter
>> figured due to a bug in dwarf2out.c we end up generating
>> DW_AT_comp_dir in almost all cases already.
>>
>> I think the patch is ok but it misses a ChangeLog entry.  How did you
>> test the patch? (bootstrapped and tested on ...?)
>>
>
> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
> didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
> testing/unstable. I'll find some time to bootstrap it and test it fully over 
> the next few days.
>
> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a 
> bit more, my patch basically obsoletes the need for this so I can delete that 
> as well.

That would be nice.

Richard.

> X
>
> --
> GPG: ed25519/56034877E1F87C35
> GPG: rsa4096/1318EFAC5FBBDBCE
> https://github.com/infinity0/pubkeys.git


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Uros Bizjak
On Tue, Oct 18, 2016 at 2:10 PM, Uros Bizjak  wrote:
> On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
>> The name of Etc/GMT+1 timezone is "-01", as evident from:
>>
>> $ TZ=Etc/GMT+1 date +%Z
>> -01
>>
>> Attached patch fixes the testsuite failure.
>
> Forgot to say that the patch was tested with tzdata2016g on Fedora 24
> and CentOS 5.11.

FYI, tzdata2016g ChangeLog says:

  Changes to past and future time zone abbreviations

The Factory zone now uses the time zone abbreviation -00 instead
of a long English-language string, as -00 is now the normal way to
represent an undefined time zone.

Several zones in Antarctica and the former Soviet Union, along
with zones intended for ships at sea that cannot use POSIX TZ
strings, now use numeric time zone abbreviations instead of
invented or obsolete alphanumeric abbreviations.  The affected
zones are Antarctica/Casey, Antarctica/Davis,
Antarctica/DumontDUrville, Antarctica/Mawson, Antarctica/Rothera,
Antarctica/Syowa, Antarctica/Troll, Antarctica/Vostok,
Asia/Anadyr, Asia/Ashgabat, Asia/Baku, Asia/Bishkek, Asia/Chita,
Asia/Dushanbe, Asia/Irkutsk, Asia/Kamchatka, Asia/Khandyga,
Asia/Krasnoyarsk, Asia/Magadan, Asia/Omsk, Asia/Sakhalin,
Asia/Samarkand, Asia/Srednekolymsk, Asia/Tashkent, Asia/Tbilisi,
Asia/Ust-Nera, Asia/Vladivostok, Asia/Yakutsk, Asia/Yekaterinburg,
Asia/Yerevan, Etc/GMT-14, Etc/GMT-13, Etc/GMT-12, Etc/GMT-11,
Etc/GMT-10, Etc/GMT-9, Etc/GMT-8, Etc/GMT-7, Etc/GMT-6, Etc/GMT-5,
Etc/GMT-4, Etc/GMT-3, Etc/GMT-2, Etc/GMT-1, Etc/GMT+1, Etc/GMT+2,
Etc/GMT+3, Etc/GMT+4, Etc/GMT+5, Etc/GMT+6, Etc/GMT+7, Etc/GMT+8,
Etc/GMT+9, Etc/GMT+10, Etc/GMT+11, Etc/GMT+12, Europe/Kaliningrad,
Europe/Minsk, Europe/Samara, Europe/Volgograd, and
Indian/Kerguelen.  For Europe/Moscow the invented abbreviation MSM
was replaced by +05, whereas MSK and MSD were kept as they are not
our invention and are widely used.

Uros.


Re: [PATCH, libgo]: Fix FAIL: time testsuite failure

2016-10-18 Thread Rainer Orth
Hi Uros,

> On Tue, Oct 18, 2016 at 11:19 AM, Uros Bizjak  wrote:
>> The name of Etc/GMT+1 timezone is "-01", as evident from:
>>
>> $ TZ=Etc/GMT+1 date +%Z
>> -01
>>
>> Attached patch fixes the testsuite failure.
>
> Forgot to say that the patch was tested with tzdata2016g on Fedora 24
> and CentOS 5.11.

but Fedora 20 still returns GMT+1 here, and Solaris 10 to 12 even
Etc/GMT (where Solaris 12 also has 2016g).
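
Since the abbreviation evidently depends on the system's tzdata and C library, here is a minimal sketch for checking what a given machine reports. It assumes a POSIX system (setenv, tzset, localtime_r); the helper name zone_abbreviation is made up for illustration and is not part of the patch or of libgo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <stdlib.h>  // setenv (POSIX)
#include <time.h>    // tzset, localtime_r (POSIX), strftime

// Returns the abbreviation the C library reports for `zone` at time `when`.
// Hypothetical helper: it changes TZ for the whole process, so it is not
// thread-safe and is only meant for quick experiments.
std::string zone_abbreviation(const char* zone, time_t when) {
    setenv("TZ", zone, 1);
    tzset();
    struct tm local = {};
    localtime_r(&when, &local);
    char buf[64];
    std::size_t n = strftime(buf, sizeof buf, "%Z", &local);
    return std::string(buf, n);
}
```

Calling zone_abbreviation("Etc/GMT+1", time(nullptr)) is the programmatic equivalent of the `TZ=Etc/GMT+1 date +%Z` command quoted above; what it returns depends on the installed tzdata, as the differing results in this thread show.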

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


libgo patch committed: scan caller-saved regs for non-split-stack

2016-10-18 Thread Ian Lance Taylor
While testing a libgo patch on Solaris, which does not support
split-stack, I ran across a bug in the handling of caller-saved
registers for the garbage collector.  For non-split-stack systems,
runtime_mcall is responsible for saving all caller-saved registers on
the stack so that the GC stack scan will see them.  It does this by
calling __builtin_unwind_init and setting the g's gcnextsp field to
point to the current stack.  The garbage collector then scans the
stack from gcnextsp to the top of stack.

Unfortunately, the code was setting gcnextsp to point to
runtime_mcall's argument, which meant that even though runtime_mcall
was careful to store all caller-saved registers on the stack, the GC
never saw them.  This is, of course, only a problem if a value lives
only in a caller-saved register, and not anywhere else on the stack or
heap.  And it is only a problem if that caller-saved register manages
to make it all the way down to runtime_mcall without being saved by
any function on the way.  This is moderately unlikely but it turns out
that the recent changes to keep values on the stack when compiling the
runtime package caused it to happen for the local variable `s` in
`notifyListWait` in runtime/sema.go.  That function calls goparkunlock
which is simple enough to not require all registers, and itself calls
runtime_mcall.  So it was possible for `s` to be released by the GC
before the goroutine returned from goparkunlock, which eventually
caused a dangling pointer to be passed to releaseSudog.

This is not a problem on split-stack systems, which use
__splitstack_get_context, which saves a stack pointer low enough on
the stack to scan the registers saved by runtime_mcall.

This patch fixes the problem by introducing a local variable which
should be on the stack below the saved registers.
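
To make the failure mode concrete, here is a toy model of the scan, not the libgo code. It assumes a downward-growing stack represented as an array of words, with the layout hand-picked so that the spilled registers sit below the argument slot; the names Stack, McallFrame, make_frame and scan_finds are invented for this sketch. A real compiler decides the actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: word i of the vector is the stack word at address base + i,
// so a lower index means "lower on the stack" (more recently pushed).
using Stack = std::vector<std::uintptr_t>;

// Conservative scan from index gcnextsp up to the top of the stack:
// reports whether any word in that range equals `target`.
bool scan_finds(const Stack& stack, std::size_t gcnextsp,
                std::uintptr_t target) {
    for (std::size_t i = gcnextsp; i < stack.size(); ++i)
        if (stack[i] == target)
            return true;
    return false;
}

struct McallFrame {
    Stack stack;
    std::size_t afterregs_slot;  // local assumed to sit below the saved registers
    std::size_t arg_slot;        // slot holding the function's argument
};

// Builds a frame in which the only copy of `live_ptr` is a spilled register.
McallFrame make_frame(std::uintptr_t live_ptr) {
    McallFrame f;
    f.stack = {
        0,         // [0] the new local variable
        live_ptr,  // [1] caller-saved register spilled by the callee
        0,         // [2] another spilled register
        0,         // [3] the function's argument
        0,         // [4] caller frames
    };
    f.afterregs_slot = 0;
    f.arg_slot = 3;
    return f;
}
```

In this model a scan that starts at arg_slot skips slot 1 and misses the pointer, while a scan that starts at afterregs_slot covers it. That is the difference between the old and new values of gcnextsp described above.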

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu and
i386-sun-solaris.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241261)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-314ba28067383516c213ba84c931f93325a48c39
+0a49b1dadd862215bdd38b9725a6e193b0d8fd0b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/proc.c
===
--- libgo/runtime/proc.c(revision 241261)
+++ libgo/runtime/proc.c(working copy)
@@ -283,6 +283,9 @@ runtime_mcall(void (*pfn)(G*))
 {
M *mp;
G *gp;
+#ifndef USING_SPLIT_STACK
+   void *afterregs;
+#endif
 
// Ensure that all registers are on the stack for the garbage
// collector.
@@ -298,7 +301,9 @@ runtime_mcall(void (*pfn)(G*))
 #ifdef USING_SPLIT_STACK
__splitstack_getcontext(>stackcontext[0]);
 #else
-   gp->gcnextsp = &pfn;
+   // We have to point to an address on the stack that is
+   // below the saved registers.
+   gp->gcnextsp = &afterregs;
 #endif
gp->fromgogo = false;
getcontext(ucontext_arg(>context[0]));


[Bug libitm/63907] libitm/config/posix/rwlock.cc doesn't compile

2016-10-18 Thread lts-rudolph at gmx dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63907

--- Comment #9 from Klaus Rudolph  ---
hi all,

> Sent: Friday, 14 October 2016 at 10:32
> From: "redi at gcc dot gnu.org" 
> To: lts-rudo...@gmx.de
> Subject: [Bug libitm/63907] libitm/config/posix/rwlock.cc doesn't compile
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63907
> 
> --- Comment #6 from Jonathan Wakely  ---
> See https://gcc.gnu.org/bugs/
> 


How can I add "-save-temps" to the GCC build process itself? The failure occurs
while compiling the libitm library as part of building the compiler. Can this
be done with "./configure", or must the Makefiles be patched? I have no idea
how to set this additional CFLAG/CXXFLAG or similar.

Regards
 Klaus

Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
Hi,

The previous solution for PR77916 was inadequately tested, for which I
sincerely apologize.  I've reinstated the stopgap fix previously
reverted, as follows.

Thanks for your patience,
Bill


2016-10-18  Bill Schmidt  

PR tree-optimization/77916
* gimple-ssa-strength-reduction.c (analyze_increments): Reinstate
stopgap fix, as pointers with -1 increment are still broken.

Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 241302)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2825,6 +2825,10 @@ analyze_increments (slsr_cand_t first_dep, machine
   && !POINTER_TYPE_P (first_dep->cand_type)))
incr_vec[i].cost = COST_NEUTRAL;
 
+  /* FIXME: Still having trouble with pointers with a -1 increment.  */
+  else if (incr == -1 && POINTER_TYPE_P (first_dep->cand_type))
+   incr_vec[i].cost = COST_INFINITE;
+
   /* FORNOW: If we need to add an initializer, give up if a cast from
 the candidate's type to its stride's type can lose precision.
 This could eventually be handled better by expressly retaining the




[Bug middle-end/77964] [7 Regression] Linux kernel firmware loader miscompiled

2016-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77964

--- Comment #17 from Jakub Jelinek  ---
(In reply to Jiri Slaby from comment #16)
> (In reply to Jakub Jelinek from comment #15)
> > lots of them that rely on pointer arithmetics being defined only within the
> > same object.
> 
> Sure, but the two pointers (taken implicitly of the arrays) are within the
> same object. So I do not see why it wouldn't work. I.e., where exactly does
> this break the C spec?

No.  In C
extern struct builtin_fw __start_builtin_fw[];
extern struct builtin_fw __end_builtin_fw[];
declares two external arrays, thus they are two independent objects.  It is
as if you had:
int a[10];
int b[10];
in your program: although they might be allocated adjacently, such that
int *p = &a[10]; int *q = &b[0]; memcmp (&p, &q, sizeof (p)) == 0;
&b[0] - &a[0] is still UB.
What you do with __start_*/__end_* symbols is nothing you can define in C; you
need linker support or asm for that, and to use it without UB you also need to
use an optimization barrier, as has been suggested.
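
A small sketch of the distinction, using the a/b arrays from above. The function names are made up for illustration. The integer-based helper is only one way to avoid subtracting unrelated pointers; it is not the optimization barrier mentioned above and says nothing about what the kernel should do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

int a[10];
int b[10];

// Defined: both pointers point into (or one past the end of) the same array.
std::ptrdiff_t elements_in_a() {
    int* first = &a[0];
    int* last = &a[10];  // one past the end is a valid pointer value
    return last - first;
}

// Byte distance computed on integers rather than on pointers. The numeric
// result of the pointer-to-integer conversion is implementation-defined,
// but no pointer subtraction across objects takes place.
std::uintptr_t byte_distance(const void* from, const void* to) {
    return reinterpret_cast<std::uintptr_t>(to) -
           reinterpret_cast<std::uintptr_t>(from);
}
```

Writing `&b[0] - &a[0]` directly is the case the standard leaves undefined, even when the two arrays happen to be adjacent in memory.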

Re: [PATCH] PR77990 refactor unique_ptr to encapsulate tuple

2016-10-18 Thread Jonathan Wakely

On 17/10/16 14:37 +0100, Jonathan Wakely wrote:

We are incorrectly requiring unique_ptr deleters to be copyable here:

explicit
unique_ptr(pointer __p) noexcept
: _M_t(__p, deleter_type())
{ }

We could just do:

explicit
unique_ptr(pointer __p) noexcept
: _M_t()
{ std::get<0>(_M_t) = __p; }

But having to deal directly with the std::tuple inside unique_ptr has
been bothering me for some time. The tuple is used so we get the empty
base-class optimisation for the deleter, but that implementation
detail


Oops, a dangling sentence. I meant to say that the tuple implementation
detail leaks into the definition of lots of members, which have to
say std::get<0>(_M_t) or std::get<1>(_M_t) instead of using more
natural member names.


This patch refactors unique_ptr to put the std::tuple member into a
new type which provides named accessors for the tuple elements, so we
can stop using get<0> and get<1>. That new type can also provide a
single-argument constructor to fix the copyable requirement for
deleters. This also removes the code for deducing the pointer type
which is duplicated in unique_ptr<T, D> and unique_ptr<T[], D>, and while in
the neighbourhood I changed it from old-school SFINAE using overloaded
functions to the new hotness with __void_t<>.
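
A rough sketch of the idea, not the libstdc++ code: the type and member names below (ptr_and_deleter, ptr(), deleter()) are invented for illustration. It shows a wrapper that hides the tuple behind named accessors and whose single-argument constructor value-initialises the deleter in place, so the deleter does not have to be copyable.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical stand-in for the new internal type.
template <typename Ptr, typename Deleter>
class ptr_and_deleter {
    std::tuple<Ptr, Deleter> t_;  // the tuple stays an implementation detail

public:
    ptr_and_deleter() = default;

    // The deleter is value-initialised inside the tuple, so it only needs
    // to be default constructible, not copyable.
    explicit ptr_and_deleter(Ptr p) : t_() { ptr() = p; }

    Ptr& ptr() { return std::get<0>(t_); }
    Deleter& deleter() { return std::get<1>(t_); }
};

// A deleter that can be neither copied nor moved.
struct counting_deleter {
    int calls = 0;
    counting_deleter() = default;
    counting_deleter(const counting_deleter&) = delete;
    counting_deleter& operator=(const counting_deleter&) = delete;
    void operator()(int* p) { ++calls; delete p; }
};

// Creates a holder, runs the deleter once, and returns how often it ran.
int demo_delete_count() {
    ptr_and_deleter<int*, counting_deleter> holder(new int(42));
    holder.deleter()(holder.ptr());
    holder.ptr() = nullptr;
    return holder.deleter().calls;
}
```

With the `_M_t(__p, deleter_type())` form quoted above, a deleter like counting_deleter would be rejected because the temporary has to be copied into the tuple.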

I intend to commit this to trunk, but on the branches I'll just fix
the constructor as shown above, as it's a smaller change.


I'll wait a bit longer for any objections, as the refactoring could be
seen as unnecessary churn, but I think it's valuable housekeeping.




Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-18 Thread Bernd Schmidt

On 10/18/2016 02:15 PM, Senthil Kumar Selvaraj wrote:

Will do both the changes and re-run the reg tests. Ok for trunk if the
tests pass for x86_64-pc-linux and avr?


Probably but let's see the patch first.


Bernd



[Bug tree-optimization/77916] [6/7 Regression] ICE in verify_gimple_in_cfg: invalid (pointer) operands to plus/minus

2016-10-18 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916

--- Comment #5 from Bill Schmidt  ---
Author: wschmidt
Date: Tue Oct 18 13:35:19 2016
New Revision: 241305

URL: https://gcc.gnu.org/viewcvs?rev=241305&root=gcc&view=rev
Log:
2016-10-18  Bill Schmidt  

PR tree-optimization/77916
* gimple-ssa-strength-reduction.c (analyze_increments): Reinstate
stopgap fix, as pointers with -1 increment are still broken.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-ssa-strength-reduction.c

Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-18 Thread Ximin Luo
Richard Biener:
> On Mon, Oct 17, 2016 at 11:44 PM, Mike Stump  wrote:
>> On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
>>>
>>> Mike Stump:
>>>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>>>>> Therefore, it is better to emit it in all circumstances, in case the
>>>>> reader needs to know what the working
>>>>> directory was at compile-time.
>>>>
>>>> I can't help but wonder if this would break ccache some?
>>>>
>>>
>>> Could you explain this in some more detail? At the moment, GCC will already 
>>> emit DW_AT_name with an absolute path (in the scenario that this patch is 
>>> relevant to). How does ccache work around this at the moment? (Does it use 
>>> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
>>> patch should be fine.)
>>
>> If you compile the same file, but in a different directory, I wonder if cwd 
>> will cause the cache entry to not be reused.
>>
>> I expect one of the ccache people that are around would just know if it will 
>> care at all.
> 
> I believe ccache compares preprocessed source, definitely _not_ DWARF
> output, so this shouldn't break anything there.
> It might result in different object file output but as the reporter
> figured due to a bug in dwarf2out.c we end up generating
> DW_AT_comp_dir in almost all cases already.
> 
> I think the patch is ok but it misses a ChangeLog entry.  How did you
> test the patch? (bootstrapped and tested on ...?)
> 

Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
testing/unstable. I'll find some time to bootstrap it and test it fully over 
the next few days.

Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it a 
bit more, my patch basically obsoletes the need for this so I can delete that 
as well.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: [PATCH 7/7] make targetm.gen_ccmp{first,next} take rtx_insn **

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 01:25:55PM +0200, Bernd Schmidt wrote:
> On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > gcc/ChangeLog:
> > 
> > 2016-10-17  Trevor Saunders  
> > 
> > * ccmp.c (expand_ccmp_expr_1): Adjust.
> > (expand_ccmp_expr): Likewise.
> > (expand_ccmp_next): Likewise.
> > * config/aarch64/aarch64.c (aarch64_gen_ccmp_next): Likewise.
> > (aarch64_gen_ccmp_first): Likewise.
> > * doc/tm.texi: Regenerate.
> > * target.def (gen_ccmp_first): Change argument types to rtx_insn *.
> > (gen_ccmp_next): Likewise.
> 
> Looks reasonable, but has this been tested on aarch64? I think that's a
> prerequisite for this patch.

So far I've only checked that I can build a compiler targeting aarch64,
which given the changes in the patch theoretically should be enough.
However it shouldn't be hard to actually test it on an aarch64 machine
so I'll do that.

Thanks!

Trev

> 
> 
> Bernd


Re: [PATCH] Fix PR77916

2016-10-18 Thread Markus Trippelsdorf
On 2016.10.18 at 08:15 -0500, Bill Schmidt wrote:
> On Tue, 2016-10-18 at 05:13 +0200, Markus Trippelsdorf wrote:
> > On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> > > Hi,
> > > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> > > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> > > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> > > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> > > of pointer type.  This patch recognizes when pointer arithmetic is taking
> > > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> > > the case of the PR, this occurs in the logic where the stride S is a known
> > > constant value, but the same problem could occur when it is an SSA_NAME
> > > without known value.  Both possibilities are handled here.
> > > 
> > > Fixing the code to ensure that the unknown stride case always uses an 
> > > initializer for a negative increment allows us to remove the stopgap fix
> > > added for PR77937 as well.
> > > 
> > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > > regressions, committed.
> > 
> > Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> > X86_64 before committing these patches, because you broke it for the
> > third time in the last couple of days.
> 
> Sorry, sorry.  I did intend to build another stage1 cross and try this,
> but it just slipped my mind.  Looks like I'll need to put the stopgap
> fix back in.  I'll do that shortly.
> 
> Meantime, -fno-slsr is a workaround.  If I have trouble building ffmpeg
> with a cross, I'll check with you on testing a future patch.

If you wish I can send you a tarball with the preprocessed *.i files from
ffmpeg, so that you can use a stage1 cross on them.

-- 
Markus


Re: [PATCH] Fix PR77916

2016-10-18 Thread Bill Schmidt
On Tue, 2016-10-18 at 15:30 +0200, Markus Trippelsdorf wrote:

> If you wish I can send you a tarball with the preprocessed *.i files from
> ffmpeg, so that you can use a stage1 cross on them.
> 

That would be very helpful, thanks!

Bill



Re: [PATCH] Make EVRP propagate into PHIs and remove dead stmts

2016-10-18 Thread Richard Biener
On Tue, 18 Oct 2016, Trevor Saunders wrote:

> On Tue, Oct 18, 2016 at 02:34:58PM +0200, Richard Biener wrote:
> > 
> > The following patch makes EVRP remove stmts that will become dead
> > after propagation.  For this to work we have to propagate into PHIs
> > (sth we missed as well).
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2016-10-18  Richard Biener  
> > 
> > * tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
> > stmts_to_remove.
> > (evrp_dom_walker::~evrp_dom_walker): Free it.
> > (evrp_dom_walker::stmts_to_remove): Add.
> > (evrp_dom_walker::before_dom_children): Mark PHIs and stmts
> > whose output we fully propagate for removal.  Propagate
> > into BB destination PHI arguments.
> > (execute_early_vrp): Remove queued stmts.  Dump value ranges
> > before stmt removal.
> > 
> > Index: gcc/tree-vrp.c
> > ===
> > --- gcc/tree-vrp.c  (revision 241302)
> > +++ gcc/tree-vrp.c  (working copy)
> > @@ -10643,11 +10643,13 @@ public:
> >  : dom_walker (CDI_DOMINATORS), stack (10)
> >  {
> >stmts_to_fixup.create (0);
> > +  stmts_to_remove.create (0);
> >need_eh_cleanup = BITMAP_ALLOC (NULL);
> >  }
> >~evrp_dom_walker ()
> >  {
> >stmts_to_fixup.release ();
> > +  stmts_to_remove.release ();
> >BITMAP_FREE (need_eh_cleanup);
> >  }
> >virtual edge before_dom_children (basic_block);
> > @@ -10660,6 +10662,7 @@ public:
> >auto_vec > stack;
> >bitmap need_eh_cleanup;
> >vec stmts_to_fixup;
> > +  vec stmts_to_remove;
> 
> That might as well be an auto_vec right?
> 
> >  };
> >  
> >  
> > @@ -10769,6 +10772,15 @@ evrp_dom_walker::before_dom_children (ba
> >else
> > set_value_range_to_varying (_result);
> >update_value_range (lhs, _result);
> > +
> > +  /* Mark PHIs whose lhs we fully propagate for removal.  */
> > +  tree val;
> > +  if ((val = op_with_constant_singleton_value_range (lhs))
> > + && may_propagate_copy (lhs, val))
> 
> wouldn't it be clearer to write that as
> 
> tree val = op_with_constant_singleton_value_range (lhs);
> if (val && may_propagate_copy (lhs, val))
> 
> > +   {
> > + stmts_to_remove.safe_push (phi);
> > + continue;
> > +   }
> >  }
> >  
> >edge taken_edge = NULL;
> > @@ -10806,7 +10818,6 @@ evrp_dom_walker::before_dom_children (ba
> >   update_value_range (output, );
> >   vr = *get_value_range (output);
> >  
> > -
> >   /* Set the SSA with the value range.  */
> >   if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
> > {
> > @@ -10824,6 +10835,17 @@ evrp_dom_walker::before_dom_children (ba
> >&& range_includes_zero_p (vr.min,
> >  vr.max) == 1)))
> > set_ptr_nonnull (output);
> > +
> > + /* Mark stmts whose output we fully propagate for removal.  */
> > + tree val;
> > + if ((val = op_with_constant_singleton_value_range (output))
> > + && may_propagate_copy (output, val)
> > + && !stmt_could_throw_p (stmt)
> > + && !gimple_has_side_effects (stmt))
> 
> similar.

Fixed.  Testing the following.

Richard.

2016-10-18  Richard Biener  

* tree-vrp.c (evrp_dom_walker::evrp_dom_walker): Initialize
stmts_to_remove.
(evrp_dom_walker::~evrp_dom_walker): Free it.
(evrp_dom_walker::stmts_to_remove): Add.
(evrp_dom_walker::before_dom_children): Mark PHIs and stmts
whose output we fully propagate for removal.  Propagate
into BB destination PHI arguments.
(execute_early_vrp): Remove queued stmts.  Dump value ranges
before stmt removal.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241302)
+++ gcc/tree-vrp.c  (working copy)
@@ -10642,12 +10642,10 @@ public:
   evrp_dom_walker ()
 : dom_walker (CDI_DOMINATORS), stack (10)
 {
-  stmts_to_fixup.create (0);
   need_eh_cleanup = BITMAP_ALLOC (NULL);
 }
   ~evrp_dom_walker ()
 {
-  stmts_to_fixup.release ();
   BITMAP_FREE (need_eh_cleanup);
 }
   virtual edge before_dom_children (basic_block);
@@ -10659,7 +10657,8 @@ public:
   /* Cond_stack holds the old VR.  */
   auto_vec > stack;
   bitmap need_eh_cleanup;
-  vec stmts_to_fixup;
+  auto_vec stmts_to_fixup;
+  auto_vec stmts_to_remove;
 };
 
 
@@ -10769,6 +10768,11 @@ evrp_dom_walker::before_dom_children (ba
   else
set_value_range_to_varying (_result);
   update_value_range (lhs, _result);
+
+  /* Mark PHIs whose lhs we fully propagate for removal.  */
+  tree val = op_with_constant_singleton_value_range (lhs);
+  if (val && may_propagate_copy (lhs, val))
+   

[PATCH] Use RPO order for domwalk dominator children sort

2016-10-18 Thread Richard Biener

For

extern void baz ();
extern void boo ();
extern void bla ();
int a[100];
void foo (int n)
{
  for (int j = 0; j < n; ++j)
{
  if (a[j+5])
{
  if (a[j])
break;
  baz ();
}
  else
bla ();
  boo ();
}
}

we happen to visit BBs in an unfortunate order so that we do not
have all predecessors visited when visiting the BB of boo().  This
is because domwalk uses a postorder on the inverted graph to
order dominator children -- that doesn't play well with loops
(as we've figured elsewhere before).  The following makes us use
RPO order instead.

This should help EVRP and DOM.
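
For illustration only, here is an independent sketch of reverse postorder on a hand-built CFG that approximates foo() above. It is not GCC's pre_and_rev_post_order_compute, and the block numbering is my own assumption: 0 = loop header, 1 = test of a[j+5], 2 = test of a[j], 3 = baz, 4 = bla, 5 = boo plus increment, 6 = loop exit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // successor list per block

static void dfs_postorder(const Graph& g, int bb, std::vector<bool>& seen,
                          std::vector<int>& post) {
    seen[bb] = true;
    for (int succ : g[bb])
        if (!seen[succ])
            dfs_postorder(g, succ, seen, post);
    post.push_back(bb);  // emitted after everything reachable from bb
}

// Returns the blocks reachable from `entry` in reverse postorder.
std::vector<int> reverse_postorder(const Graph& g, int entry) {
    std::vector<bool> seen(g.size(), false);
    std::vector<int> post;
    dfs_postorder(g, entry, seen, post);
    return std::vector<int>(post.rbegin(), post.rend());
}

// Position of each block in `order` (-1 if the block is unreachable).
std::vector<int> positions(const std::vector<int>& order, std::size_t n) {
    std::vector<int> pos(n, -1);
    for (std::size_t i = 0; i < order.size(); ++i)
        pos[order[i]] = static_cast<int>(i);
    return pos;
}

// Hand-built approximation of the CFG of foo().
Graph example_cfg() {
    return {
        {1, 6},  // 0: loop header, j < n
        {2, 4},  // 1: if (a[j+5])
        {6, 3},  // 2: if (a[j]) break; else fall through to baz
        {5},     // 3: baz ()
        {4 + 1}, // 4: bla ()  (successor is block 5)
        {0},     // 5: boo (); ++j; back edge to the header
        {},      // 6: loop exit
    };
}
```

In reverse postorder every block comes after all of its predecessors except those reached through back edges, so the block containing boo() is ordered after both the baz() and the bla() blocks.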

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-10-18  Richard Biener  

* domwalk.c (dom_walker::walk): Use RPO order.

Index: gcc/domwalk.c
===
--- gcc/domwalk.c   (revision 241300)
+++ gcc/domwalk.c   (working copy)
@@ -243,7 +243,7 @@ dom_walker::walk (basic_block bb)
   if (m_dom_direction == CDI_DOMINATORS)
 {
   postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
-  postorder_num = inverted_post_order_compute (postorder);
+  postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
   bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
   for (int i = 0; i < postorder_num; ++i)
bb_postorder[postorder[i]] = i;


Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Trevor Saunders
On Tue, Oct 18, 2016 at 01:18:42PM +0200, Bernd Schmidt wrote:
> On 10/17/2016 09:46 PM, tbsaunde+...@tbsaunde.org wrote:
> >  {
> > -  rtx r0, r16, eqv, tga, tp, insn, dest, seq;
> > +  rtx r0, r16, eqv, tga, tp, dest, seq;
> > +  rtx_insn *insn;
> > 
> >switch (tls_symbolic_operand_type (x))
> > {
> > @@ -1025,66 +1026,70 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, 
> > machine_mode mode)
> >   break;
> > 
> > case TLS_MODEL_GLOBAL_DYNAMIC:
> > - start_sequence ();
> > + {
> > +   start_sequence ();
> > 
> > - r0 = gen_rtx_REG (Pmode, 0);
> > - r16 = gen_rtx_REG (Pmode, 16);
> > - tga = get_tls_get_addr ();
> > - dest = gen_reg_rtx (Pmode);
> > - seq = GEN_INT (alpha_next_sequence_number++);
> > +   r0 = gen_rtx_REG (Pmode, 0);
> > +   r16 = gen_rtx_REG (Pmode, 16);
> > +   tga = get_tls_get_addr ();
> > +   dest = gen_reg_rtx (Pmode);
> > +   seq = GEN_INT (alpha_next_sequence_number++);
> > 
> > - emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
> > - insn = gen_call_value_osf_tlsgd (r0, tga, seq);
> > - insn = emit_call_insn (insn);
> > - RTL_CONST_CALL_P (insn) = 1;
> > - use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
> > +   emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
> > +   rtx val = gen_call_value_osf_tlsgd (r0, tga, seq);
> 
> Since this doesn't consistently declare variables at the point of
> initialization, might as well put val into the list of variables at the top,
> and avoid reindentation that way. There are several such reindented blocks,
> and the patch would be a lot easier to review without this.

I do really prefer reading code where variables are declared at first
use, but I agree that with the tools we are using this can be hard to
review, sorry about that.

> Alternatively, split it up a bit more into obvious/nonobvious parts.

yeah, I'll try to get that done soon.  fwiw a -b diff is below if you
find that better.

> > diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> > index 21bba0c..8e8fff4 100644
> > --- a/gcc/config/arc/arc.c
> > +++ b/gcc/config/arc/arc.c
> > @@ -4829,7 +4829,6 @@ static rtx
> >  arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
> >  {
> >rtx r0 = gen_rtx_REG (Pmode, R0_REG);
> > -  rtx insns;
> >rtx call_fusage = NULL_RTX;
> > 
> >start_sequence ();
> > @@ -4846,7 +4845,7 @@ arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx 
> > eqv)
> >RTL_PURE_CALL_P (call_insn) = 1;
> >add_function_usage_to (call_insn, call_fusage);
> > 
> > -  insns = get_insns ();
> > +  rtx_insn *insns = get_insns ();
> >end_sequence ();
> 
> For example, stuff like this looks obvious enough that it can go in.

yeah, I think Jeff preapproved stuff like that a while back, but I just
lumped it in, which really wasn't very nice to review.  There isn't
really a reason for a human to review what a compiler can check for us.

Thanks!

Trev

> 
> 
> Bernd


[Bug middle-end/77964] [7 Regression] Linux kernel firmware loader miscompiled

2016-10-18 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77964

--- Comment #16 from Jiri Slaby  ---
(In reply to Jakub Jelinek from comment #15)
> lots of them that rely on pointer arithmetics being defined only within the
> same object.

Sure, but the two pointers (taken implicitly of the arrays) are within the
same object. So I do not see why it wouldn't work. I.e., where exactly does
this break the C spec?

Re: [PATCH 3/7] use rtx_insn * more

2016-10-18 Thread Bernd Schmidt

On 10/18/2016 03:54 PM, Trevor Saunders wrote:


I do really prefer reading code where variables are declared at first
use


In general, so do I, but in this case it's one variable out of a whole 
bunch, which makes the entire thing look a little inconsistent.



Bernd


[Bug middle-end/65950] exit in main is causing the path to it to become unlikely.

2016-10-18 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65950

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |7.0

--- Comment #13 from Andrew Pinski  ---
Fixed.

[Bug target/78023] New: ice in replace_one_candidate with -O3 and -march=native

2016-10-18 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78023

Bug ID: 78023
   Summary: ice in replace_one_candidate with -O3 and
-march=native
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Created attachment 39831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39831&action=edit
C source code after creduce

The attached C code, when compiled by gcc trunk dated 20161018,
and compiler flags -O3 -march=native, does this:

$ ../results/bin/gcc -c -O3 -march=native bug312.c
../../src/H5Tconv.c: In function ‘H5T__conv_int_float’:
../../src/H5Tconv.c:7558:1: internal compiler error: in replace_one_candidate,
at gimple-ssa-strength-reduction.c:3375
0x139c70e replace_one_candidate
../../trunk/gcc/gimple-ssa-strength-reduction.c:3375
0x13a1359 replace_profitable_candidates
../../trunk/gcc/gimple-ssa-strength-reduction.c:3486
0x13a13a5 replace_profitable_candidates
../../trunk/gcc/gimple-ssa-strength-reduction.c:3495
0x13a504f analyze_candidates_and_replace
../../trunk/gcc/gimple-ssa-strength-reduction.c:3574

The processor has this in /proc/cpuinfo

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 21
model   : 2
model name  : AMD FX(tm)-8350 Eight-Core Processor
stepping: 0
microcode   : 0x600084f
cpu MHz : 4000.000
cache size  : 2048 KB
physical id : 0
siblings: 8
core id : 0
cpu cores   : 4
apicid  : 16
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf
eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave
avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext
perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold
bugs: fxsave_leak sysret_ss_attrs null_seg
bogomips: 8026.96
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

[Bug target/78023] ice in replace_one_candidate with -O3 and -march=native

2016-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78023

--- Comment #3 from Uroš Bizjak  ---
(In reply to David Binderman from comment #0)
> Created attachment 39831 [details]
> C source code after creduce
> 
> The attached C code, when compiled by gcc trunk dated 20161018,
> and compiler flags -O3 -march=native, does this:

BTW: Please note that the compiler driver replaces -march=native with the
true -march=... value and a bunch of -mABI options.

You can see the options passed by the driver by adding -### to the compiler
flags. Please report the -march=... value determined by the driver. Usually
this is enough to reproduce the bug; for some unusual targets (e.g. emulators)
please also report the other -m... options.

> The processor has this in /proc/cpuinfo

This info is useless and redundant when true -march= is reported.

[C++ Patch/RFC] PR 67980 ("left shift count is negative [-Wshift-count-negative] generated for unreachable code")

2016-10-18 Thread Paolo Carlini

Hi,

In terms of our implementation details, the submitter noticed that we
handle warnings differently for COND_EXPRs in tsubst_copy_and_build,
where we use fold_non_dependent_expr and integer_zerop to suppress
undesired warnings by bumping c_inhibit_evaluation_warnings, and for
IF_STMTs in tsubst_expr, where we don't. My patch below, which passes
testing, tries in a rather straightforward way to adopt the same
mechanism in the latter. There are quite a few details I'm not sure
about: whether we should use fold_non_dependent_expr only for the
purpose of suppressing the warnings, thus never touching 'tmp' in the
pt.c code handling IF_STMTs, which would be completely conservative in
terms of code generation; and whether there are subtle interactions with
the new if constexpr, which I'm missing at the moment.


Thanks!

Paolo.

//

Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 241297)
+++ cp/pt.c (working copy)
@@ -15403,26 +15403,46 @@
       break;
 
     case IF_STMT:
-      stmt = begin_if_stmt ();
-      IF_STMT_CONSTEXPR_P (stmt) = IF_STMT_CONSTEXPR_P (t);
-      tmp = RECUR (IF_COND (t));
-      tmp = finish_if_stmt_cond (tmp, stmt);
-      if (IF_STMT_CONSTEXPR_P (t) && integer_zerop (tmp))
-        /* Don't instantiate the THEN_CLAUSE. */;
-      else
-        RECUR (THEN_CLAUSE (t));
-      finish_then_clause (stmt);
+      {
+        tree folded_tmp;
+        bool zerop, nonzerop;
 
-      if (IF_STMT_CONSTEXPR_P (t) && integer_nonzerop (tmp))
-        /* Don't instantiate the ELSE_CLAUSE. */;
-      else if (ELSE_CLAUSE (t))
-        {
-          begin_else_clause (stmt);
-          RECUR (ELSE_CLAUSE (t));
-          finish_else_clause (stmt);
-        }
+        stmt = begin_if_stmt ();
+        IF_STMT_CONSTEXPR_P (stmt) = IF_STMT_CONSTEXPR_P (t);
+        tmp = RECUR (IF_COND (t));
+        folded_tmp = fold_non_dependent_expr (tmp);
+        if (TREE_CODE (folded_tmp) == INTEGER_CST)
+          tmp = folded_tmp;
+        tmp = finish_if_stmt_cond (tmp, stmt);
+        zerop = integer_zerop (tmp);
+        nonzerop = integer_nonzerop (tmp);
+        if (IF_STMT_CONSTEXPR_P (t) && zerop)
+          /* Don't instantiate the THEN_CLAUSE. */;
+        else
+          {
+            if (zerop)
+              ++c_inhibit_evaluation_warnings;
+            RECUR (THEN_CLAUSE (t));
+            if (zerop)
+              --c_inhibit_evaluation_warnings;
+          }
+        finish_then_clause (stmt);
 
-      finish_if_stmt (stmt);
+        if (IF_STMT_CONSTEXPR_P (t) && nonzerop)
+          /* Don't instantiate the ELSE_CLAUSE. */;
+        else if (ELSE_CLAUSE (t))
+          {
+            begin_else_clause (stmt);
+            if (nonzerop)
+              ++c_inhibit_evaluation_warnings;
+            RECUR (ELSE_CLAUSE (t));
+            if (nonzerop)
+              --c_inhibit_evaluation_warnings;
+            finish_else_clause (stmt);
+          }
+
+        finish_if_stmt (stmt);
+      }
       break;
 
     case BIND_EXPR:
Index: testsuite/g++.dg/cpp1y/pr67980.C
===================================================================
--- testsuite/g++.dg/cpp1y/pr67980.C (revision 0)
+++ testsuite/g++.dg/cpp1y/pr67980.C (working copy)
@@ -0,0 +1,23 @@
+// { dg-do compile { target c++14 } }
+
+template <int Y, class T>
+constexpr T cpp14_constexpr_then(T value) {
+  if (Y < 0)
+    return (value << -Y);
+  else
+    return 0;
+}
+
+template <int Y, class T>
+constexpr T cpp14_constexpr_else(T value) {
+  if (Y > 0)
+    return 0;
+  else
+    return (value << -Y);
+}
+
+int main()
+{
+  cpp14_constexpr_then<1>(0);
+  cpp14_constexpr_else<1>(0);
+}
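
For readers unfamiliar with the counter idiom the patch relies on, here is a
minimal standalone sketch of it. Everything in it is hypothetical toy code:
only the counter's name echoes c_inhibit_evaluation_warnings, and the Cond
enum, warn() and instantiate_if() are illustrative stand-ins, not GCC
functions. It assumes the condition has already been folded to "known false",
"known true" or "unknown".

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins; nothing here is GCC code.
static int inhibit_evaluation_warnings = 0;
static std::vector<std::string> emitted_warnings;

// Record a diagnostic unless a known-dead arm is being processed.
void warn(const std::string& msg) {
  if (inhibit_evaluation_warnings == 0)
    emitted_warnings.push_back(msg);
}

// What the folded condition is known to be at instantiation time.
enum class Cond { Unknown, False, True };

// "Instantiating" an arm may call warn().
using Arm = std::function<void()>;

void instantiate_if(Cond cond, const Arm& then_arm, const Arm& else_arm) {
  const bool zerop = (cond == Cond::False);
  const bool nonzerop = (cond == Cond::True);

  // The then-arm is dead when the condition folds to zero.
  if (zerop) ++inhibit_evaluation_warnings;
  then_arm();
  if (zerop) --inhibit_evaluation_warnings;

  // The else-arm is dead when the condition folds to nonzero.
  if (nonzerop) ++inhibit_evaluation_warnings;
  else_arm();
  if (nonzerop) --inhibit_evaluation_warnings;
}
```

Both arms are still processed, so code generation is unaffected; only the
diagnostics raised while walking the dead arm are dropped. The counter is
incremented and decremented rather than set and cleared so that nested dead
regions compose.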


[Bug tree-optimization/78005] [7 Regression] 172.mgrid and 450.soplex miscompare

2016-10-18 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78005

--- Comment #4 from amker at gcc dot gnu.org ---
Hmm, some code got lost while iterating on the patch during testing...  Will
send a patch soon.

Re: [PATCH][AArch64] Align FP callee-saves

2016-10-18 Thread James Greenhalgh
On Mon, Oct 17, 2016 at 12:40:18PM +, Wilco Dijkstra wrote:
> 
> ping
>
> If the number of integer callee-saves is odd, the FP callee-saves use 8-byte
> aligned LDP/STP.  Since 16-byte alignment may be faster on some CPUs, align
> the FP callee-saves to 16 bytes and use the alignment gap for the last FP
> callee-save when possible. Besides slightly different offsets for FP
> callee-saves, the generated code doesn't change.
> 
> Bootstrap and regression pass, OK for commit?

This looks OK to me.

Thanks for the patch.

James

> ChangeLog:
> 2016-09-08  Wilco Dijkstra  
> 
>     * config/aarch64/aarch64.c (aarch64_layout_frame):
>     Align FP callee-saves.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index fed3b6e803821392194dc34a6c3df5f653d2e33e..075b3802c72a68f63b47574e19186e7ce3440b28 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2735,7 +2735,7 @@ static void
>  aarch64_layout_frame (void)
>  {
>    HOST_WIDE_INT offset = 0;
> -  int regno;
> +  int regno, last_fp_reg = INVALID_REGNUM;
>  
>    if (reload_completed && cfun->machine->frame.laid_out)
>      return;
> @@ -2781,7 +2781,10 @@ aarch64_layout_frame (void)
>    for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
>      if (df_regs_ever_live_p (regno)
>          && !call_used_regs[regno])
> -      cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
> +      {
> +        cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
> +        last_fp_reg = regno;
> +      }
>  
>    if (cfun->machine->frame.emit_frame_chain)
>      {
> @@ -2805,9 +2808,21 @@ aarch64_layout_frame (void)
>          offset += UNITS_PER_WORD;
>        }
>  
> +  HOST_WIDE_INT max_int_offset = offset;
> +  offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
> +  bool has_align_gap = offset != max_int_offset;
> +
>    for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
>      if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
>        {
> +        /* If there is an alignment gap between integer and fp callee-saves,
> +           allocate the last fp register to it if possible.  */
> +        if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
> +          {
> +            cfun->machine->frame.reg_offset[regno] = max_int_offset;
> +            break;
> +          }
> +
>          cfun->machine->frame.reg_offset[regno] = offset;
>          if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
>            cfun->machine->frame.wb_candidate1 = regno;
> 
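
The offset arithmetic in the quoted patch can be modelled outside GCC. The
sketch below is only an illustration under simplifying assumptions: every
callee-save occupies one 8-byte slot, there is no frame chain, and the
16-byte boundary is hard-coded. FrameLayout, round_up and layout_callee_saves
are hypothetical names, not GCC code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ROUND_UP; align must be a power of two.
constexpr std::int64_t round_up(std::int64_t x, std::int64_t align) {
  return (x + align - 1) & ~(align - 1);
}

struct FrameLayout {
  std::vector<std::int64_t> int_offsets;  // one slot per integer callee-save
  std::vector<std::int64_t> fp_offsets;   // one slot per FP callee-save
  std::int64_t size = 0;                  // save area size, 16-byte aligned
};

FrameLayout layout_callee_saves(int n_int, int n_fp) {
  constexpr std::int64_t kWord = 8;    // assumed slot size
  constexpr std::int64_t kAlign = 16;  // assumed stack boundary in bytes
  FrameLayout f;
  std::int64_t offset = 0;

  for (int i = 0; i < n_int; ++i) {
    f.int_offsets.push_back(offset);
    offset += kWord;
  }

  // Start the FP saves on a 16-byte boundary; an odd number of integer
  // saves leaves one unused 8-byte slot just below that boundary.
  const std::int64_t max_int_offset = offset;
  offset = round_up(offset, kAlign);
  const bool has_align_gap = offset != max_int_offset;

  for (int i = 0; i < n_fp; ++i) {
    const bool is_last = (i == n_fp - 1);
    // A last FP register that would otherwise start a new 16-byte pair on
    // its own goes into the gap instead.
    if (is_last && has_align_gap && (offset & 8) == 0) {
      f.fp_offsets.push_back(max_int_offset);
      break;
    }
    f.fp_offsets.push_back(offset);
    offset += kWord;
  }

  f.size = round_up(offset, kAlign);
  return f;
}
```

With three integer and three FP saves, for instance, the first two FP
registers form an aligned pair at offsets 32 and 40 and the third fills the
gap at offset 24, so the save area stays at 48 bytes.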



[Bug libstdc++/41861] [DR 887][C++0x] <condition_variable> does not use monotonic_clock

2016-10-18 Thread mac at mcrowe dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861

--- Comment #14 from Mike Crowe  ---
(In reply to Jonathan Wakely from comment #13)
> (In reply to Roman Fietze from comment #12)
> > Sorry if it is inappropriate to ask for any changes, but how can it be, that
> > there is no fix for this bug for years in any of the GCC releases? 
> 
> Because it's not possible to implement the C++ requirements purely in terms
> of POSIX, so it requires a new API in the C library, which is complicated.
> All the information you need to investigate that is provided in this bug
> report and the enclosed links.

I submitted an RFC glibc patch last year:
http://patchwork.ozlabs.org/project/glibc/list/?submitter=66786

There were some objections, but no one seemed to say no outright. Unfortunately,
it is blocked until Torvald Riegel's removal of the assembly "optimisation" for
x86 and x86_64 goes in. This seems to be taking longer than I expected when I
wrote the patch.

> > With this bug condition_variable::wait_until is completely unusable on many
> 
> I find that hard to believe.

Well, it does make it unsafe to use in an environment where CLOCK_REALTIME can
change arbitrarily, which some may consider equivalent to "unusable".

However, in the intervening time I've become aware of the std::synchronic
proposal ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0126r2.pdf
) and I've been wondering whether libstdc++ would be better off implementing
support for that and then using it to build its own condition_variable
implementation. I'm not sure whether I'm qualified to do that though. :)

If we wanted to do something in the short term then we could consider flipping
the default clock for std::condition_variable to be std::chrono::steady_clock
where it is available. I suspect that most code is using a relative timeout
anyway and isn't really expecting the timeout to change when the system clock
changes.
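
As a rough illustration of the "relative timeout" usage described above, a
caller can already express its deadline on std::chrono::steady_clock through
the standard wait_until overload. The Event type and its member names below
are hypothetical. Whether the wait is really immune to CLOCK_REALTIME jumps
still depends on how the library implements it underneath, which is exactly
what this bug is about.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper type, not part of libstdc++.
struct Event {
  std::mutex m;
  std::condition_variable cv;
  bool ready = false;

  // Mark the event as set and wake every waiter.
  void set() {
    {
      std::lock_guard<std::mutex> lock(m);
      ready = true;
    }
    cv.notify_all();
  }

  // Wait for set() with a relative timeout, expressed as a deadline on
  // steady_clock so the caller's intent does not depend on the wall clock.
  // Returns false if the timeout expired first.
  bool wait_for_ready(std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::unique_lock<std::mutex> lock(m);
    return cv.wait_until(lock, deadline, [this] { return ready; });
  }
};
```

The predicate overload is used so that spurious wakeups do not end the wait
early.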

Re: Go patch committed: copy print code from Go 1.7 runtime

2016-10-18 Thread Uros Bizjak
Hello!

> This patch copies the code that implements the print and println
> predeclared functions from the Go 1.7 runtime.  The compiler is
> changed to use the new names, and to call the printlock and
> printunlock functions around a sequence of print calls.  The writebuf
> field in the g struct changes to a slice.  Bootstrapped and ran Go
> testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

This patch probably introduced the recent regression on the 32-bit x86 multilib:

Running target unix/-m32
FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
FAIL: go.test/test/printbig.go -O (test for excess errors)
FAIL: go.test/test/printbig.go execution

=== go Summary for unix/-m32 ===

# of expected passes            6875
# of unexpected failures        3
# of expected failures          1
# of untested testcases         12
# of unsupported tests          2

e.g.:

/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:27: error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:15:45: error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:38: error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/fixedbugs/bug114.go:19:56: error: integer constant overflow

FAIL: go.test/test/fixedbugs/bug114.go compilation,  -O2 -g
UNTESTED: go.test/test/fixedbugs/bug114.go execution,  -O2 -g

FAIL: go.test/test/printbig.go -O (test for excess errors)
Excess errors:
/home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:12:8: error: integer constant overflow
/home/uros/git/gcc/gcc/testsuite/go.test/test/printbig.go:13:15: error: integer constant overflow

./printbig.exe >printbig.p 2>&1
couldn't execute "./printbig.exe": no such file or directory
FAIL: go.test/test/printbig.go execution
UNTESTED: go.test/test/printbig.go compare

Uros.


[Bug middle-end/78016] REG_NOTE order is not kept during insn copy

2016-10-18 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78016

Eric Botcazou  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-10-18
     Ever confirmed|0                           |1

--- Comment #1 from Eric Botcazou  ---
> I attached a simple fix to keep REG_NOTE order during insn copy.
> 
> Any comments?

This seems reasonable if you need it for the DWARF CFI stuff, but note that
emit_copy_of_insn_after is not the only place where notes are copied; e.g.
try_split and create_copy_of_insn_rtx do that too.
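
To make the ordering issue concrete, here is a small model that treats the
notes as a singly linked list. It assumes the copy loop walks the source list
and prepends each copied note, which reverses the list; that assumption, the
NoteList alias and both function names are illustrative and are not GCC's
RTL API.

```cpp
#include <forward_list>

// Each int stands for one note kind; the list order is the note order.
using NoteList = std::forward_list<int>;

// Prepending each copied note reverses the order of the source list.
NoteList copy_by_prepending(const NoteList& src) {
  NoteList dst;
  for (int kind : src)
    dst.push_front(kind);
  return dst;
}

// Appending through a tail iterator keeps the source order.
NoteList copy_preserving_order(const NoteList& src) {
  NoteList dst;
  auto tail = dst.before_begin();
  for (int kind : src)
    tail = dst.insert_after(tail, kind);
  return dst;
}
```

Any other routine that copies the list by prepending would show the same
reversal, which is why a fix applied in one place may not be sufficient.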
