Re: [Committed] [PATCH 2/4] (v4) On-demand locations within string-literals

2016-08-05 Thread Prathamesh Kulkarni
On 6 August 2016 at 11:16, Markus Trippelsdorf  wrote:
> On 2016.08.05 at 14:16 -0400, David Malcolm wrote:
>> Successfully bootstrapped the updated patch on x86_64-pc
>> -linux-gnu, and successfully ran the stage 1 selftests on powerpc-ibm
>> -aix7.1.3.0 (gcc111)
>>
>> Committed to trunk as r239175; I'm attaching the final version of the
>> patch for reference.
>
> It breaks the build on ppc64le (gcc112):
FWIW I am observing the same error on x86_64-unknown-linux-gnu:
http://pastebin.com/63k4CRVY

Thanks,
Prathamesh
>
> /home/trippels/gcc_build_dir/./gcc/xgcc -B/home/trippels/gcc_build_dir/./gcc/ 
> -xc -S -c /dev/null -fself-test
> cc1: internal compiler error: Segmentation fault
> 0x1088b293 crash_signal
> ../../gcc/gcc/toplev.c:335
> 0x1115c694 cpp_string_location_reader::get_next()
> ../../gcc/libcpp/charset.c:2143
> 0x1115c694 _cpp_valid_ucn
> ../../gcc/libcpp/charset.c:1079
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> Makefile:1898: recipe for target 's-selftest' failed
>
> --
> Markus


Re: [Committed] [PATCH 2/4] (v4) On-demand locations within string-literals

2016-08-05 Thread Markus Trippelsdorf
On 2016.08.05 at 14:16 -0400, David Malcolm wrote:
> Successfully bootstrapped the updated patch on x86_64-pc
> -linux-gnu, and successfully ran the stage 1 selftests on powerpc-ibm
> -aix7.1.3.0 (gcc111)
> 
> Committed to trunk as r239175; I'm attaching the final version of the
> patch for reference.

It breaks the build on ppc64le (gcc112):

/home/trippels/gcc_build_dir/./gcc/xgcc -B/home/trippels/gcc_build_dir/./gcc/ 
-xc -S -c /dev/null -fself-test
cc1: internal compiler error: Segmentation fault
0x1088b293 crash_signal
../../gcc/gcc/toplev.c:335
0x1115c694 cpp_string_location_reader::get_next()
../../gcc/libcpp/charset.c:2143
0x1115c694 _cpp_valid_ucn
../../gcc/libcpp/charset.c:1079
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
Makefile:1898: recipe for target 's-selftest' failed

-- 
Markus


Re: [Patch] Implement std::experimental::variant

2016-08-05 Thread Tim Shen
On Fri, Aug 5, 2016 at 4:08 AM, Jonathan Wakely  wrote:
>> --- a/libstdc++-v3/include/bits/uses_allocator.h
>> +++ b/libstdc++-v3/include/bits/uses_allocator.h
>> @@ -113,6 +113,51 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> constexpr bool uses_allocator_v = uses_allocator<_Tp, _Alloc>::value;
>> #endif // C++17
>>
>> +  template
>> +struct __is_uses_allocator_constructible
>> +: conditional::value,
>> +  __or_,
>> +   is_constructible<_Tp, _Args..., _Alloc>>,
>> +  is_constructible<_Tp, _Args...>>::type { };
>> +
>> +  template
>> +static constexpr bool __is_uses_allocator_constructible_v =
>> +  __is_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
>
>
> This doesn't need to be 'static'

Done.

>
>
>> +  template
>> +struct __is_nothrow_uses_allocator_constructible
>> +: conditional::value,
>> +  __or_> _Args...>,
>> +   is_nothrow_constructible<_Tp, _Args..., _Alloc>>,
>> +  is_nothrow_constructible<_Tp, _Args...>>::type { };
>
>
> I wonder if there's any benefit to removing the duplication in the
> definitions of __is_nothrow_uses_allocator_constructible and
> __is_uses_allocator_constructible by defining a single template that
> can be instantation with either is_constructible or
> is_nothrow_constructible as needed:
>
>  template class _Trait, typename _Tp,
>   typename _Alloc, typename... _Args>
>struct __is_uses_allocator_constructible_impl
>: conditional::value,
>  __or_<_Trait<_Tp, allocator_arg_t, _Alloc, _Args...>,
>  _Trait<_Tp, _Args..., _Alloc>>,
>  _Trait<_Tp, _Args...>>::type { };
>
>  template
>using __is_uses_allocator_constructible
>  = __is_uses_allocator_constructible_impl   _Tp, _Alloc, _Args...>;
>
>  template
>using __is_nothrow_uses_allocator_constructible
>  = __is_uses_allocator_constructible_impl   _Tp, _Alloc, _Args...>;
>
> What do you think?
>
> (The variable templates would be unchanged).

Done. I don't have strong opinion on this.

>
>
>> +
>> +  template
>> +static constexpr bool __is_nothrow_uses_allocator_constructible_v =
>> +  __is_uses_allocator_constructible<_Tp, _Alloc, _Args...>::value;
>
>
> This should be using __is_nothrow_uses_allocator_constructible
> (and doesn't need to be static)

Good catch. :)

>
>
>
>> +
>> +  template
>> +void __uses_allocator_construct_impl(__uses_alloc0 __a, _Tp* __ptr,
>> +_Args&&... __args)
>> +{ new (__ptr) _Tp(forward<_Args>(__args)...); }
>> +
>> +  template
>> +void __uses_allocator_construct_impl(__uses_alloc1<_Alloc> __a, _Tp*
>> __ptr,
>> +_Args&&... __args)
>> +{ new (__ptr) _Tp(allocator_arg, *__a._M_a,
>> forward<_Args>(__args)...); }
>> +
>> +  template
>> +void __uses_allocator_construct_impl(__uses_alloc2<_Alloc> __a, _Tp*
>> __ptr,
>> +_Args&&... __args)
>> +{ new (__ptr) _Tp(forward<_Args>(__args)..., *__a._M_a); }
>
>
> I think these all need to use ::new (__ptr) with qualification, see
> below.

Done. I didn't add a testcase for this. Do you think that we need one/some?

>
>
>> +/** @file variant
>> + *  This is a TS C++ Library header.
>
>
> This should be updated.

Done.

>
>> + */
>> +
>> +#ifndef _GLIBCXX_VARIANT
>> +#define _GLIBCXX_VARIANT 1
>> +
>> +#pragma GCC system_header
>> +
>> +#if __cplusplus <= 201103L
>
>
> This should be 201402L

Done.

>
>> +# include 
>> +#else
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +namespace std _GLIBCXX_VISIBILITY(default)
>> +{
>> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +
>> +  template class variant;
>> +
>> +  template
>> +struct variant_size;
>> +
>> +  template
>> +struct variant_size : variant_size<_Variant> {};
>> +
>> +  template
>> +struct variant_size : variant_size<_Variant> {};
>> +
>> +  template
>> +struct variant_size : variant_size<_Variant>
>> {};
>> +
>> +  template
>> +struct variant_size>
>> +: std::integral_constant {};
>> +
>> +  template
>> +static constexpr size_t variant_size_v =
>> variant_size<_Variant>::value;
>
>
> We don't need 'static' here.

Done.

>
>> +
>> +  template
>> +struct variant_alternative;
>> +
>> +  template
>> +struct variant_alternative<_Np, variant<_First, _Rest...>>
>> +: variant_alternative<_Np-1, variant<_Rest...>> {};
>> +
>> +  template
>> +struct variant_alternative<0, variant<_First, _Rest...>>
>> +{ using type = _First; };
>> +
>> +  template
>> +   

Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-05 Thread ayush goel
On 5 August 2016 at 4:09:00 AM, Pedro Alves (pal...@redhat.com) wrote:
> On 08/02/2016 12:38 AM, Manuel López-Ibáñez wrote:
> >
> > If there is something wrong or missing, ideally we would like to know
> > so that Ayush can work on fixing it before the Summer of Code is over
> > in less than two weeks.
>
> I couldn't see anywhere gnulib's config.h file is included.
>
> gnulib's replacement headers and (and .c files) depend on
> that being included.
>
> Did I simply miss it?
>

gnulib’s config.h is created on compile time. After building the
library it is present inside gnulib build folder.

> Thanks,
> Pedro Alves
>
>


Re: [RS6000] PR72802 part 2, reload ICE

2016-08-05 Thread Segher Boessenkool
On Sat, Aug 06, 2016 at 10:53:33AM +0930, Alan Modra wrote:
> On Fri, Aug 05, 2016 at 06:01:47PM -0500, Segher Boessenkool wrote:
> > On Fri, Aug 05, 2016 at 02:20:40PM +0930, Alan Modra wrote:
> > > Here are the reload costs for the various alternatives of
> > > movsf_hardfloat:
> > > "=!r, !r,  m,  f, ww, ww, !r,  f, wb,  m, wY, wu,  Z,?wn, ?r,*c*l, !r, *h"
> > >   "r,  m,  r,  f, ww,  j,  j,  m, wY,  f, wb,  Z, wu,  r, wn,   r, *h,  0"
> > >  617 609  17  17   8   8 617   9   8  17  17   8  17  23  23   17 617  17
> > 
> > I agree reg-reg moves should come after reg-mem moves, but is it such
> > a good idea to put e.g. f->f after r->*c*l?
> 
> I doubt it matters but I agree that it's unnecessary to move the
> reg-reg moves to the last alternative.  They just need to be after
> equivalent class reg-mem moves.  I'll move them earlier, which is
> better since exact matches cut short the alternative evaluation.

Thanks!

> > The costs look in pretty bad shape anyway.
> 
> They make sense to me..  At least, they did when I was looking at them
> in reload.  :)  One thing I didn't show is that alternatives like ww,j
> are rejected regardless of the cost ("bad" is set by find_reloads).

For some reason I forgot ww is a superset of f.  Yeah it's not so bad.
Perhaps :-)


Segher


Re: [RS6000] PR72802 part 2, reload ICE

2016-08-05 Thread Alan Modra
On Fri, Aug 05, 2016 at 06:01:47PM -0500, Segher Boessenkool wrote:
> On Fri, Aug 05, 2016 at 02:20:40PM +0930, Alan Modra wrote:
> > Here are the reload costs for the various alternatives of
> > movsf_hardfloat:
> > "=!r, !r,  m,  f, ww, ww, !r,  f, wb,  m, wY, wu,  Z,?wn, ?r,*c*l, !r, *h"
> >   "r,  m,  r,  f, ww,  j,  j,  m, wY,  f, wb,  Z, wu,  r, wn,   r, *h,  0"
> >  617 609  17  17   8   8 617   9   8  17  17   8  17  23  23   17 617  17
> 
> I agree reg-reg moves should come after reg-mem moves, but is it such
> a good idea to put e.g. f->f after r->*c*l?

I doubt it matters but I agree that it's unnecessary to move the
reg-reg moves to the last alternative.  They just need to be after
equivalent class reg-mem moves.  I'll move them earlier, which is
better since exact matches cut short the alternative evaluation.

> The costs look in pretty bad shape anyway.

They make sense to me..  At least, they did when I was looking at them
in reload.  :)  One thing I didn't show is that alternatives like ww,j
are rejected regardless of the cost ("bad" is set by find_reloads).

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH,rs6000] Add built-in function support Power9 binary floating point operations

2016-08-05 Thread Segher Boessenkool
On Fri, Aug 05, 2016 at 04:27:36PM -0500, Pat Haugen wrote:
> On 08/02/2016 03:15 PM, Segher Boessenkool wrote:
> > On Tue, Aug 02, 2016 at 03:03:42PM -0500, Pat Haugen wrote:
> >> On 07/29/2016 10:47 AM, Kelvin Nilsen wrote:
> >>> +  "xsxexpdp %0,%x1"
> >>> +  [(set_attr "type" "fp")])
> >>
> >> Type should be 'integer'.
> > 
> > It has VSX regs as input, integer is worse than fpsimple here, I think?
> > Or vecsimple, that seems better yes.
> 
> It's a 2 cycle op that executes in the ALU pipe, just like other integer ops. 
> vecsimple would be worse since those consume a superslice, which this op 
> doesn't.
> 

Ah yes.  But why not fpsimple?  It schedules the same as e.g. fmr and
fabs?


Segher


[PATCH] c-format.c: cleanup of check_format_info_main

2016-08-05 Thread David Malcolm
On Thu, 2016-08-04 at 14:22 -0600, Jeff Law wrote:
> On 08/04/2016 01:24 PM, David Malcolm wrote:
>
> > > Do you realize that this isn't used for ~700 lines after this
> > > point?
> > >  Is
> > > there any sensible way to factor some code here to avoid the
> > > coding
> > > disconnect.  I realize the function was huge before you got in
> > > here,
> > > but
> > > if at all possible, I'd like to see a bit of cleanup.
> > >
> > > I think this is OK after that cleanup.
> >
> > format_chars can get modified in numerous places in the intervening
> > lines, which is why I stash the value there.
> Yea, I figured that was the case.  I first noticed the stashed value,
> but didn't see where it was used for far longer than I expected.
>
> >
> > I can do some kind of cleanup of check_format_info_main, maybe
> > splitting out the things in the body of loop, moving them to
> > support
> > functions.
> That's essentially what I was thinking.
>
> >
> > That said, I note that Martin's sprintf patch:
> >   https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00056.html
> > also touches those ~700 lines in check_format_info_main in over a
> > dozen
> > places.  Given that, would you prefer I do the cleanup before or
> > after
> > the substring_loc patch?
> I think you should go first with the cleanup.  It'll cause Martin
> some
> heartburn, but that happens sometimes.
>
> And FWIW, if you hadn't needed to stash away that value I probably
> wouldn't have noticed how badly that function (and the loop in
> particular) needed some refactoring.
>
> jeff

Here's a cleanup of check_format_info_main, which introduces three
new classes to hold state, and moves code from the loop into
methods of those classes, reducing the loop from ~700 lines to
~100 lines.

Unfortunately, so much changes in this patch that the before/after
diff is hard to read.  If you like the end-result, but would prefer
better history I could try to split this up into a more readable set
of patches.  (I have a version of that, but they're messy)

Successfully bootstrapped the updated patch on
x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
* c-format.c (class flag_chars_t): New class.
(struct length_modifier): New struct.
(class argument_parser): New class.
(flag_chars_t::flag_chars_t): New ctor.
(flag_chars_t::has_char_p): New method.
(flag_chars_t::add_char): New method.
(flag_chars_t::validate): New method.
(flag_chars_t::get_alloc_flag): New method.
(flag_chars_t::assignment_suppression_p): New method.
(argument_parser::argument_parser): New ctor.
(argument_parser::read_any_dollar): New method.
(argument_parser::read_format_flags): New method.
(argument_parser::read_any_format_width): New method.
(argument_parser::read_any_format_left_precision): New method.
(argument_parser::read_any_format_precision): New method.
(argument_parser::handle_alloc_chars): New method.
(argument_parser::read_any_length_modifier): New method.
(argument_parser::read_any_other_modifier): New method.
(argument_parser::find_format_char_info): New method.
(argument_parser::validate_flag_pairs): New method.
(argument_parser::give_y2k_warnings): New method.
(argument_parser::parse_any_scan_set): New method.
(argument_parser::handle_conversions): New method.
(argument_parser::check_argument_type): New method.
(check_format_info_main): Introduce classes argument_parser
and flag_chars_t, moving the code within the loop into methods
of these classes.  Make various locals "const".
---
 gcc/c-family/c-format.c | 1655 +--
 1 file changed, 1019 insertions(+), 636 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index c19c411..92d2c1c 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -1688,740 +1688,1123 @@ check_format_arg (void *ctx, tree format_tree,
  params, arg_num, fwt_pool);
 }
 
+/* Support class for argument_parser and check_format_info_main.
+   Tracks any flag characters that have been applied to the
+   current argument.  */
 
-/* Do the main part of checking a call to a format function.  FORMAT_CHARS
-   is the NUL-terminated format string (which at this point may contain
-   internal NUL characters); FORMAT_LENGTH is its length (excluding the
-   terminating NUL character).  ARG_NUM is one less than the number of
-   the first format argument to check; PARAMS points to that format
-   argument in the list of arguments.  */
+class flag_chars_t
+{
+ public:
+  flag_chars_t ();
+  bool has_char_p (char ch) const;
+  void add_char (char ch);
+  void validate (const format_kind_info *fki,
+const format_char_info *fci,
+const format_flag_spec *flag_specs,
+const char * const format_chars,
+

libgo patch committed: Change build procedure to use build tags

2016-08-05 Thread Ian Lance Taylor
Go packages use build tags (see the section on Build Constraints at
https://golang.org/pkg/go/build/) to select which files to build on
specific systems.

Previously the libgo Makefile explicitly listed the set of files to
compile for each package.  For packages that use build tags, this
required a lot of awkward automake conditionals in the Makefile.

This patch changes the build to look at the build tags in the files.
The new shell script libgo/match.sh does the matching.  This required
adjusting a lot of build tags, and removing some files that are never
used.  I verified that the exact same sets of files are compiled on
x86_64-pc-linux-gnu.  I also tested the build on i386-sun-solaris
(building for both 32-bit and 64-bit).

Writing match.sh revealed some bugs in the build tag handling that
already exists, in a slightly different form, in the gotest shell
script.  This patch fixes those problems as well.

The old code used automake conditionals to handle systems that were
missing strerror_r and wait4.  Rather than deal with those in Go,
those functions are now implemented in runtime/go-nosys.c when
necessary, so the Go code can simply assume that they exist.

The os testsuite looked for dir_unix.go, which was never built for
gccgo and has now been removed.  I changed the testsuite to look for
dir.go instead.

Note that if you have an existing build directory, you will have to
remove all the .dep files in TARGET/libgo after updating to this
patch.  There isn't anything that will force them to update
automatically.

Bootstrapped on x86_64-pc-linux-gnu and i386-sun-solaris.  Ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian


patch.txt.bz2
Description: BZip2 compressed data


[PATCH] Add std::apply for C++17

2016-08-05 Thread Jonathan Wakely

To avoid a circular dependency between  and  I've
moved std::__invoke to its own header.

In order to use std::__invoke in std::apply it needs to be constexpr,
which is easy, but the standard says that std::invoke isn't constexpr.
So we have to intentionally disallow std::invoke in constant
expressions even though it's implemented purely in terms of a
constexpr function.  Expect a ballot comment about that.

This also fixes a bug in the exception specifications of std::__invoke
and std::invoke, where the function type used as the template argument
to __is_nothrow_callable would have decayed its argument types, so we
need && there to prevent that.

* doc/xml/manual/status_cxx2017.xml: Add missing LFTSv2 features.
* doc/html/manual/status.html: Regenerate.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/invoke.h: New header.
(__invoke): Make constexpr. Add && to types in exception specification.
* include/experimental/tuple (apply, __apply_impl): Fix non-reserved
names. Include  and use std::__invoke.
* include/std/functional (__invfwd, __invoke_impl, __invoke): Move to
new header.
(invoke): Add && to types in exception specification.
* include/std/tuple (apply, __apply_impl): Define for C++17.
* testsuite/20_util/tuple/apply/1.cc: New test.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust dg-error
lineno.

Tested powerpc64-linux, committed to trunk.

commit 031d50fec36dd04c1e086ab7060d497a441f0d8a
Author: Jonathan Wakely 
Date:   Fri Aug 5 17:27:46 2016 +0100

Add std::apply for C++17

* doc/xml/manual/status_cxx2017.xml: Add missing LFTSv2 features.
* doc/html/manual/status.html: Regenerate.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/invoke.h: New header.
(__invoke): Make constexpr. Add && to types in exception specification.
* include/experimental/tuple (apply, __apply_impl): Fix non-reserved
names. Include  and use std::__invoke.
* include/std/functional (__invfwd, __invoke_impl, __invoke): Move to
new header.
(invoke): Add && to types in exception specification.
* include/std/tuple (apply, __apply_impl): Define for C++17.
* testsuite/20_util/tuple/apply/1.cc: New test.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust dg-error
lineno.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 02aec25..55e3ff5 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -149,6 +149,41 @@ Feature-testing recommendations for C++.
 
 
 
+   Library Fundamentals V1 TS Components: apply 

+  
+   http://www.w3.org/1999/xlink; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0220r1.html;>
+   P0220R1
+   
+  
+   7 
+   __cpp_lib_apply >= 201603 
+
+
+
+  
+   Library Fundamentals V1 TS Components: 
shared_ptrT[] 
+  
+   http://www.w3.org/1999/xlink; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0220r1.html;>
+   P0220R1
+   
+  
+   No 
+   __cpp_lib_shared_ptr_arrays >= 201603 
+
+
+
+  
+   Library Fundamentals V1 TS Components: Searchers 
+  
+   http://www.w3.org/1999/xlink; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0220r1.html;>
+   P0220R1
+   
+  
+   No 
+   __cpp_lib_boyer_moore_searcher >= 201603 
+
+
+
Constant View: A proposal for a std::as_const 
helper function template  
   
http://www.w3.org/1999/xlink; xlink:href="">
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index e2c4f63..ea992f0 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -112,6 +112,7 @@ bits_headers = \
${bits_srcdir}/hashtable.h \
${bits_srcdir}/hashtable_policy.h \
${bits_srcdir}/indirect_array.h \
+   ${bits_srcdir}/invoke.h \
${bits_srcdir}/ios_base.h \
${bits_srcdir}/istream.tcc \
${bits_srcdir}/list.tcc \
diff --git a/libstdc++-v3/include/bits/invoke.h 
b/libstdc++-v3/include/bits/invoke.h
new file mode 100644
index 000..60405b5
--- /dev/null
+++ b/libstdc++-v3/include/bits/invoke.h
@@ -0,0 +1,104 @@
+// Implementation of INVOKE -*- C++ -*-
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.

Re: [RS6000] PR72802 part 2, reload ICE

2016-08-05 Thread Segher Boessenkool
On Fri, Aug 05, 2016 at 02:20:40PM +0930, Alan Modra wrote:
> Here are the reload costs for the various alternatives of
> movsf_hardfloat:
> "=!r, !r,  m,  f, ww, ww, !r,  f, wb,  m, wY, wu,  Z,?wn, ?r,*c*l, !r, *h"
>   "r,  m,  r,  f, ww,  j,  j,  m, wY,  f, wb,  Z, wu,  r, wn,   r, *h,  0"
>  617 609  17  17   8   8 617   9   8  17  17   8  17  23  23   17 617  17

I agree reg-reg moves should come after reg-mem moves, but is it such
a good idea to put e.g. f->f after r->*c*l?

The costs look in pretty bad shape anyway.

I'm not comfortable acking this patch without some more investigation,
sorry.


Segher


Re: [RS6000] PR72802 part 1, fix constraints for lxssp/stxssp

2016-08-05 Thread Segher Boessenkool
On Fri, Aug 05, 2016 at 11:35:11AM +0930, Alan Modra wrote:
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 779ba1f..c59d07a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -7717,8 +7717,34 @@ mem_operand_gpr (rtx op, machine_mode mode)
>if (TARGET_POWERPC64 && (offset & 3) != 0)
>  return false;
>  
> -  if (mode_supports_vsx_dform_quad (mode)
> -  && !quad_address_offset_p (offset))
> +  extra = GET_MODE_SIZE (mode) - UNITS_PER_WORD;
> +  if (extra < 0)
> +extra = 0;
> +
> +  if (GET_CODE (addr) == LO_SUM)
> +/* For lo_sum addresses, we must allow any offset except one that
> +   causes a wrap, so test only the low 16 bits.  */
> +offset = ((offset & 0x) ^ 0x8000) - 0x8000;

You could use sext_hwi instead?   offset = sext_hwi (offset, 16);
There are many other explicit manipulations in the code already of course,
just FYI.

This seems like an improvement; okay for trunk, thanks.  I don't think
we're there yet though, there is more wrong / suboptimal than just the
PR :-/


Segher


Re: [PATCH], Improve vector int/long initialization on PowerPC

2016-08-05 Thread Pat Haugen
On 08/03/2016 11:33 PM, Michael Meissner wrote:
>  {
> -  if (BYTES_BIG_ENDIAN)
> -return "xxpermdi %x0,%x1,%x2,0";
> +  if (which_alternative == 0)
> +return (BYTES_BIG_ENDIAN
> + ? "xxpermdi %x0,%x1,%x2,0"
> + : "xxpermdi %x0,%x2,%x1,0");
> +
> +  else if (which_alternative == 1)
> +return (BYTES_BIG_ENDIAN
> + ? "mtvsrdd %x0,%1,%2"
> + : "mtvsrdd %x0,%2,%1");
> +
>else
> -return "xxpermdi %x0,%x2,%x1,0";
> +gcc_unreachable ();
>  }
> -  [(set_attr "type" "vecperm")])
> +  [(set_attr "type" "vecperm,mftgpr")
> +   (set_attr "length" "4")])

mtvsrdd actually behaves like a permute, so vecperm would be best insn type for 
it.

-Pat



Re: [PR49366] emit loc exprs for C++ non-virtual pmf template value parms

2016-08-05 Thread Alexandre Oliva
On Aug  5, 2016, Jason Merrill  wrote:

>> With this patch, when we can't emit a DW_AT_const_value, we emit each
>> "member" of the pointer to member function "record" as a
>> DW_OP_stack_value DW_OP_piece, as long as the referenced member
>> function is output in the same translation unit, otherwise we'd get
>> relocations to external symbols, something to avoid in debug sections.

> I wonder if it would make sense to use weak references...

I found the chunk of code that caused us to drop references to symbols
not defined in the current translation unit.  The comments at the end of
const_ok_for_output_1 don't sound not exactly hopeful:

  /* Avoid references to external symbols in debug info, on several targets
 the linker might even refuse to link when linking a shared library,
 and in many other cases the relocations for .debug_info/.debug_loc are
 dropped, so the address becomes zero anyway.  Hidden symbols, guaranteed
 to be defined within the same shared library or executable are fine.  */

Anyway, this change is something that could certainly be addressed as a
separate patch, since it would have effects over a lot more than just
pointers to member functions.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH v2,rs6000] Add built-in function support for Power9 binary floating point operations

2016-08-05 Thread Pat Haugen
On 08/04/2016 10:27 AM, Kelvin Nilsen wrote:
> +  "xsxexpdp %0,%x1"
> +  [(set_attr "type" "vecsimple")])

> +  "xsxsigdp %0,%x1"
> +  [(set_attr "type" "vecsimple")])

'integer' for both, reasoning in response on initial thread for this patch.

> +  "xvxexp %x0,%x1"
> +  [(set_attr "type" "fpsimple")])

> +  "xvxsig %x0,%x1"
> +  [(set_attr "type" "fpsimple")])

'vecsimple' for both.

-Pat



Re: [PR63240] generate debug info for defaulted member functions

2016-08-05 Thread Alexandre Oliva
On Aug  4, 2016, Jason Merrill  wrote:

> On Mon, Aug 1, 2016 at 10:57 PM, Alexandre Oliva  wrote:
>> * langhooks.h (struct lang_hooks_for_decls): Add
>> function_decl_defaulted_inclass_p and
>> function_decl_defaulted_outofclass_p.

> Maybe one hook that returns unsigned rather than two bool hooks?

There's a suitable lhd that returns int, so I used that instead.  I
noticed we should be using const_tree for these, so I adjusted them and,
while at that, I adjusted _deleted and _explicit too.  Finally, having
been informed of the constants for _defaulted and _deleted in DwARF v5
drafts, I've dropped the _GNU attributes for these and started using the
values from the draft, adjusting for them to be emitted even for strict
DWARF v5.

Here's what I'm testing.  Ok if it regstraps successfully?

[PR63240] generate debug info for defaulted member functions

From: Alexandre Oliva 

This implements , a
proposal already accepted for inclusion in DWARF-5, but using
DW_AT_GNU_defaulted instead of DW_AT_defaulted as the attribute name,
because the attribute id for DW_AT_defaulted is not yet publicly
available.

for  include/ChangeLog

	PR debug/63240
	* dwarf2.def (DW_AT_deleted, DW_AT_defaulted): New.
	* dwarf2.h (enu dwarf_defaulted_attribute): New.

for  gcc/ChangeLog

	PR debug/63240
	* langhooks-def.h
	(LANG_HOOKS_FUNCTION_DECL_EXPLICIT_P): Const_tree-ify.
	(LANG_HOOKS_FUNCTION_DECL_DELETED_P): Likewise.
	(LANG_HOOKS_FUNCTION_DECL_DEFAULTED): Set default.
	(LANG_HOOKS_DECLS): Add it.
	* langhooks.h (struct lang_hooks_for_decls): Add
	function_decl_defaulted.  Const_tree-ify
	function_decl_explicit_p and function_decl_deleted_p.
	* dwarf2out.c (gen_subprogram_die): Add DW_AT_defaulted
	attribute.  Add DW_AT_deleted instead of DW_AT_GNU_deleted,
	also at strict DWARF v5.

for  gcc/cp/ChangeLog

	PR debug/63240
	* cp-objcp-common.c (cp_function_decl_defaulted): New.
	(cp_function_decl_explicit_p): Const_tree-ify.
	(cp_function_decl_deleted_p): Likewise.
	* cp-objcp-common.h (cp_function_decl_defaulted): Declare.
	(cp_function_decl_explicit_p): Const_tree-ify.
	(cp_function_decl_deleted_p): Likewise.
	(LANG_HOOKS_FUNCTION_DECL_DEFAULTED): Redefine.

for  gcc/testsuite/ChangeLog

	PR debug/63240
	* g++.dg/debug/dwarf2/defaulted-member-function-1.C: New.
	* g++.dg/debug/dwarf2/defaulted-member-function-2.C: New.
	* g++.dg/debug/dwarf2/defaulted-member-function-3.C: New.
	* g++.dg/debug/dwarf2/deleted-member-function.C: Expect
	DW_AT_deleted.
---
 gcc/cp/cp-objcp-common.c   |   25 +-
 gcc/cp/cp-objcp-common.h   |7 ++-
 gcc/dwarf2out.c|   49 +++-
 gcc/langhooks-def.h|6 ++
 gcc/langhooks.h|9 +++-
 .../debug/dwarf2/defaulted-member-function-1.C |   14 ++
 .../debug/dwarf2/defaulted-member-function-2.C |   16 +++
 .../debug/dwarf2/defaulted-member-function-3.C |   13 +
 .../g++.dg/debug/dwarf2/deleted-member-function.C  |2 -
 include/dwarf2.def |2 +
 include/dwarf2.h   |8 +++
 11 files changed, 139 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/defaulted-member-function-1.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/defaulted-member-function-2.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/defaulted-member-function-3.C

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index f7ddb00..9cb9dd7 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -133,7 +133,7 @@ cxx_types_compatible_p (tree x, tree y)
 /* Return true if DECL is explicit member function.  */
 
 bool
-cp_function_decl_explicit_p (tree decl)
+cp_function_decl_explicit_p (const_tree decl)
 {
   return (decl
 	  && DECL_LANG_SPECIFIC (STRIP_TEMPLATE (decl))
@@ -143,13 +143,34 @@ cp_function_decl_explicit_p (tree decl)
 /* Return true if DECL is deleted special member function.  */
 
 bool
-cp_function_decl_deleted_p (tree decl)
+cp_function_decl_deleted_p (const_tree decl)
 {
   return (decl
 	  && DECL_LANG_SPECIFIC (STRIP_TEMPLATE (decl))
 	  && DECL_DELETED_FN (decl));
 }
 
+/* Returns 0 if DECL is NOT a C++11 defaulted special member function,
+   1 if it is explicitly defaulted within the class body, or 2 if it
+   is explicitly defaulted outside the class body.  */
+
+int
+cp_function_decl_defaulted (const_tree decl)
+{
+  if (decl
+  && DECL_LANG_SPECIFIC (STRIP_TEMPLATE (decl))
+  && DECL_DEFAULTED_FN (decl))
+{
+  if (DECL_DEFAULTED_IN_CLASS_P (decl))
+	return 1;
+
+  if (DECL_DEFAULTED_OUTSIDE_CLASS_P (decl))
+	return 2;
+}
+
+  return 0;
+}
+
 /* Stubs to keep c-opts.c happy.  */
 void
 push_file_scope (void)
diff --git a/gcc/cp/cp-objcp-common.h 

one more patch for PR69847

2016-08-05 Thread Vladimir N Makarov

  This is a patch to fix some testsuite failures reported for arm:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69847

  The patch was bootstrapped and tested on x86-64 and ppc64.

Committed as rev. 239180.

Index: ChangeLog
===
--- ChangeLog	(revision 239179)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2016-08-05  Vladimir Makarov  
+
+	PR rtl-optimization/69847
+	* lra-constraints.c (process_invariant_for_inheritance): Save
+	pattern instead of src.
+	(remove_inheritance_pseudos): Use the pattern.  Add assert.
+
 2016-08-05  David Malcolm  
 
 	* input.c (string_concat::string_concat): New constructor.
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 238991)
+++ lra-constraints.c	(working copy)
@@ -5475,7 +5475,7 @@ process_invariant_for_inheritance (rtx d
 	cl, "invariant inheritance");
 	  bitmap_set_bit (_inheritance_pseudos, REGNO (new_reg));
 	  bitmap_set_bit (_only_regs, REGNO (new_reg));
-	  lra_reg_info[REGNO (new_reg)].restore_rtx = invariant_rtx;
+	  lra_reg_info[REGNO (new_reg)].restore_rtx = PATTERN (insn);
 	  start_sequence ();
 	  lra_emit_move (new_reg, dst_reg);
 	  new_insns = get_insns ();
@@ -6343,9 +6343,11 @@ remove_inheritance_pseudos (bitmap remov
 		  start_sequence ();
 		  /* We can not just change the source.  It might be
 		 an insn different from the move.  */
-		  lra_emit_move (SET_DEST (set), lra_reg_info[sregno].restore_rtx);
+		  emit_insn (lra_reg_info[sregno].restore_rtx);
 		  rtx_insn *new_insns = get_insns ();
 		  end_sequence ();
+		  lra_assert (single_set (new_insns) != NULL
+			  && SET_DEST (set) == SET_DEST (single_set (new_insns)));
 		  lra_process_new_insns (curr_insn, NULL, new_insns,
 	 "Changing reload<-invariant inheritance");
 		  delete_move_and_clobber (curr_insn, dregno);


Re: backward threading heuristics tweek

2016-08-05 Thread Jeff Law

On 06/06/2016 04:19 AM, Jan Hubicka wrote:

Hi,
while looking into profile mismatches introduced by the backward threading pass
I noticed that the heuristics seems quite simplistics.  First it should be
profile sensitive and disallow duplication when optimizing cold paths. Second
it should use estimate_num_insns because gimple statement count is not really
very realistic estimate of final code size effect and third there seems to be
no reason to disable the pass for functions optimized for size.
I've never been happy with the size estimation code -- it's a series of 
hacks to address a couple pathological codesize regressions that were 
seen when comparing gcc-6 to prior versions.  I won't lose any sleep if 
we move to estimate_num_insns and rely more on profile data.





If we block duplication for more than 1 insns for size optimized paths the pass
is able to do majority of threading decisions that are for free and improve 
codegen.
The code size benefit was between 0.5% to 2.7% on testcases I tried (tramp3d,
GCC modules, xlanancbmk and some other stuff around my hd).

Bootstrapped/regtested x86_64-linux, seems sane?

The pass should also avoid calling cleanup_cfg when no trheading was done
and i do not see why it is guarded by expensive_optimizations. What are the
main compile time complexity limitations?
The pass essentially walks up the use-def links in the CFG.  WHen it 
encounters a PHI, we walk up every PHI argument.  That's where it gets 
expensive.  I think there's a better algorithm to do those walks that 
doesn't start at the beginning each time, but I haven't had time to 
experiment with coding it up.





Honza

* tree-ssa-threadbackward.c: Include tree-inline.h
(profitable_jump_thread_path): Use estimate_num_insns to estimate
size of copied block; for cold paths reduce duplication.
(find_jump_threads_backwards): Remove redundant tests.
(pass_thread_jumps::gate): Enable for -Os.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
Index: tree-ssa-threadbackward.c
===
--- tree-ssa-threadbackward.c   (revision 237101)
+++ tree-ssa-threadbackward.c   (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
+#include "tree-inline.h"
Probably an indication we want estimate_num_insns broken out into a more 
generic place that could be used by the threader, inlining, etc.  Please 
consider that as a follow-up.






 static int max_threaded_paths;

@@ -210,7 +211,7 @@ profitable_jump_thread_path (vec
Whitespace nit here before the "-=".

I think this is fine with the whitespace fix.  And again, consider 
moving estimate_num_insns to a new location outside the inliner.


jeff




Re: Go patch committed: Avoid non-ASCII characters in asm identifiers

2016-08-05 Thread Ian Lance Taylor
On Fri, Aug 5, 2016 at 2:13 PM, Joseph Myers  wrote:
> On Fri, 5 Aug 2016, Ian Lance Taylor wrote:
>
>> PR 72812 points out that Go can generate non-ASCII characters in
>> assembly code.  This is a consequence of the fact that Go permits
>> identifiers to contain non-ASCII Unicode code points.  The GNU
>> assembler doesn't seem to mind, but the Solaris assembler does.
>
> Are Go identifiers meant to interoperate at the object code level with C
> and C++ code using the same identifiers (with UCNs, for non-ASCII
> identifiers in C and C++)?  If so, this change would break such
> interoperation, since C and C++ use UTF-8 in the compiler output for such
> identifiers (and the tests are skipped/XFAILed for systems where the
> assembler doesn't support this).

Interesting.  In any case, no, they are not.  Go identifiers are of
the form p.N, where p is a package name, so they never interoperate
with C/C++ code.  There is a mechanism for specifying the external
link name for a Go identifier, and for that case the compiler will not
apply any encoding--it will just the name as written.

Ian


Re: [PATCH,rs6000] Add built-in function support Power9 binary floating point operations

2016-08-05 Thread Pat Haugen
On 08/02/2016 03:15 PM, Segher Boessenkool wrote:
> On Tue, Aug 02, 2016 at 03:03:42PM -0500, Pat Haugen wrote:
>> On 07/29/2016 10:47 AM, Kelvin Nilsen wrote:
>>> +  "xsxexpdp %0,%x1"
>>> +  [(set_attr "type" "fp")])
>>
>> Type should be 'integer'.
> 
> It has VSX regs as input, integer is worse than fpsimple here, I think?
> Or vecsimple, that seems better yes.

It's a 2 cycle op that executes in the ALU pipe, just like other integer ops. 
vecsimple would be worse since those consume a superslice, which this op 
doesn't.




Re: [PATCH] Improve backwards threading

2016-08-05 Thread Jeff Law

On 08/05/2016 07:36 AM, Richard Biener wrote:

@@ -560,6 +562,14 @@ fsm_find_control_statement_thread_paths
  edge taken_edge = profitable_jump_thread_path (path, bbi, name,
arg);
  if (taken_edge)
{
+ /* If the taken edge is a loop entry avoid mashing two
+loops into one with multiple latches by splitting
+the edge.  This only works if that block won't become
+a latch of this loop.  */
+ if ((bb_loop_depth (taken_edge->src)
+  < bb_loop_depth (taken_edge->dest))
+ && ! single_succ_p (bbi))
+   split_edge (taken_edge);
  if (bb_loop_depth (taken_edge->src)
  >= bb_loop_depth (taken_edge->dest))
convert_and_register_jump_thread_path (path, taken_edge);

note you need the PR72772 fix to trigger all this.
I'm a little confused here.  In the case where the taken edge goes into 
a deeper loop nest you're splitting the edge -- to what end?  The 
backwards threader isn't going to register that jump thread.  So if this 
is fixing something, then we've got the fix in the wrong place.



FWIW, I the anonymous name fix ought to go in separately, it looks like 
the right thing to do independent of this stuff.


jeff



Re: Go patch committed: Avoid non-ASCII characters in asm identifiers

2016-08-05 Thread Joseph Myers
On Fri, 5 Aug 2016, Ian Lance Taylor wrote:

> PR 72812 points out that Go can generate non-ASCII characters in
> assembly code.  This is a consequence of the fact that Go permits
> identifiers to contain non-ASCII Unicode code points.  The GNU
> assembler doesn't seem to mind, but the Solaris assembler does.

Are Go identifiers meant to interoperate at the object code level with C 
and C++ code using the same identifiers (with UCNs, for non-ASCII 
identifiers in C and C++)?  If so, this change would break such 
interoperation, since C and C++ use UTF-8 in the compiler output for such 
identifiers (and the tests are skipped/XFAILed for systems where the 
assembler doesn't support this).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Improve forward jump threading of switch statements (PR18046)

2016-08-05 Thread Jeff Law

On 07/28/2016 09:46 PM, Patrick Palka wrote:

This patch improves the forward jump threader's ability to thread
GIMPLE_SWITCHes by making the VRP simplification callback attempt to
determine which case label will be taken.

For example, if the index operand of a switch has a value range ~[5,6]
along some edge and the switch statement has no "case 5" or "case 6"
label then this patch recognizes such a scenario as an opportunity for
threading through the switch and to the switch's default bb.

This improvement is necessary for the code in comment #1 of PR18046 to
get threaded.  There we have (in the VRP2 dump):

bar ()
{
  int i.0_1;
  int i.0_2;
  int pretmp_8;
  int prephitmp_9;
  int i.0_10;

  :
  i.0_1 = i;
  switch (i.0_1) , case 0: >

:
  i.0_10 = ASSERT_EXPR ;
  goto  ();  // <-- this can be threaded to 

:
  i.0_2 = ASSERT_EXPR ;
  foo ();
  pretmp_8 = i;

  # prephitmp_9 = PHI 
:
  switch (prephitmp_9) , case 0: >

:
  foo ();

:
  return;

}

The FSM threader can't thread the default edge of the 1st switch through
to the default edge of the 2nd switch because i.0_1 doesn't have a known
constant value along that path -- it could have any non-zero value.  For
this scenario and others like it, it is necessary to consider value
ranges.

During bootstrap this simplification triggered about 1000 times in
total.  It's not an impressive amount but the simplification itself is
cheap.

Bootstrap + regtest in progress on x86_64-pc-linux-gnu.  Does this look
OK to commit if there are no new regressions?

gcc/ChangeLog:

PR tree-optimization/18046
* tree-ssa-threadedge.c: Include cfganal.h.
(simplify_control_statement_condition): If simplifying a
GIMPLE_SWITCH, replace the index operand of the GIMPLE_SWITCH
with the dominating ASSERT_EXPR before handing it off to VRP.
Mention that a CASE_LABEL_EXPR may be returned.
(thread_around_empty_blocks): Adjust to handle
simplify_control_statement_condition() returning a
CASE_LABEL_EXPR.
(thread_through_normal_block): Likewise.
* tree-vrp.c (simplify_stmt_for_jump_threading): Simplify
a switch statement by trying to determine which case label
will be taken.

gcc/testsuite/ChangeLog:

PR tree-optimization/18046
* gcc.dg/tree-ssa/vrp105.c: New test.
* gcc.dg/tree-ssa/vrp106.c: New test.

OK.

Thanks for your patience,
jeff



Re: [PATCH/AARCH64] Improve ThunderX code generation slightly with load/store pair

2016-08-05 Thread Jim Wilson

On 08/05/2016 12:18 AM, Andrew Pinski wrote:

This patch disables the forming of the load/store pairs for SImode if
we are tuning for ThunderX.  I used the tuning flags route so it can
be overridden if needed later on or if someone else wants to use the
same method for their core.


+  if (mode == SImode
+  && AARCH64_EXTRA_TUNE_SLOW_LDPW
+  && !optimize_size)
+return false;

AARCH64_EXTRA_TUNE_SLOW_LDPW is a non-zero bit-mask.  That will always 
be true.  This is present in two places in the patch.  You need 
something more like


   && (aarch64_tune_params.extra_tuning_flags
   & AARCH64_EXTRA_TUNE_SLOW_LDPW)

You should verify that the patch disables the optimization for ThunderX 
but does not disable it for other targets.


Jim



Re: [PATCH] Remove unnecessary jump threading restriction

2016-08-05 Thread Jeff Law

On 08/05/2016 01:45 AM, Richard Biener wrote:


There is no need to avoid threading to a loop header, the threading
code can cope with this just fine.  Noticed when working on PR72772.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-08-05  Richard Biener  

* tree-ssa-threadupdate.c (thread_block_1): Remove unnecessary
restriction on threading to a loop header.
This restriction exists to help prevent threading through a loop header 
into the loop which can easily create an irreducible region.


In fact what you've done is revert the fix for 58343.  Now there is of 
course a test for 58343 in the regression suite.


It's entirely possible that 58343 is just latent now or other changes in 
tree-ssa-threadupdate.c prevent the problems that we saw with 58343.


Jeff



Re: [PATCH] gcov tool: Implement Hawick's algorithm for cycle detection, (PR gcov-profile/67992)

2016-08-05 Thread Michael Meissner
On Fri, Aug 05, 2016 at 08:27:39AM -0700, Andrew Pinski wrote:
> On Fri, Aug 5, 2016 at 12:32 AM, Martin Liška  wrote:
> > On 08/05/2016 09:30 AM, Martin Liška wrote:
> >> Hi.
> >>
> >> Sorry for the mistake with the enum, that was silly ;)
> >> New patch version also handles the unnecessary braces.
> >>
> >> Martin
> >
> > I attached a wrong patch, sending the new one.
> 
> 
> This broke cross aarch64-elf-gnu building.
> 
> /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/gcov.c:
> In function ‘loop_type handle_cycle(const arc_vector_t&, int64_t&)’:
> /home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/gcov.c:468:25:
> error: ‘INT64_MAX’ was not declared in this scope
> 
> 
> See 
> http://stackoverflow.com/questions/3233054/error-int32-max-was-not-declared-in-this-scope
> .

We are seeing the same thing on RHEL 7.2 PowerPC systems and RHEL 6.7 x86_64
systems when building a cross compiler.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [RS6000] Force source of fix_truncsi2 to reg

2016-08-05 Thread Michael Meissner
On Wed, Aug 03, 2016 at 12:02:54AM +0930, Alan Modra wrote:
> This is a hack I tried to avoid having to poke at lra code for
> pr71680..
> 
> The idea of adding force_reg here was that it removes subregs from
> fix_trunc, emitting the subreg mode conversion in a separate set insn.
> That avoids the underlying lra issue, by virtue of combine merging the
> SF subreg with the SI mem load, at least for -m64.
> 
> For -m32 combine rejects the combination due to the mem address being
> a lo_sum which is a mode dependent address.  Of course even for -m64,
> combine probably shouldn't allow this combination, and wouldn't if the
> rs6000 rtx_costs function said that SLOW_UNALIGNED_ACCESS mems
> actually cost more.
> 
> So this patch isn't a particularly good solution to pr71680, but
> a) force_reg for an operand that must be a reg is 100% safe, and
> b) it's good to expose more combine opportunities.
> 
> Bootstrapped and regression tested powerpc64le-linux and
> powerpc64-linux.
> 
>   * config/rs6000/rs6000.md (fix_truncsi2): Force source operand
>   to a reg.  Localize vars.
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 5afae92..45ad661 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -5357,15 +5357,15 @@
>  {
>if (!)
>  {
> -  rtx tmp, stack;
> +  rtx src = force_reg (SFmode, operands[1]);
> 
>if (TARGET_STFIWX)
> - emit_insn (gen_fix_truncsi2_stfiwx (operands[0], operands[1]));
> + emit_insn (gen_fix_truncsi2_stfiwx (operands[0], src));
>else
>   {
> -   tmp = gen_reg_rtx (DImode);
> -   stack = rs6000_allocate_stack_temp (DImode, true, false);
> -   emit_insn (gen_fix_truncsi2_internal (operands[0], operands[1],
> +   rtx tmp = gen_reg_rtx (DImode);
> +   rtx stack = rs6000_allocate_stack_temp (DImode, true, false);
> +   emit_insn (gen_fix_truncsi2_internal (operands[0], src,
> tmp, stack));
>   }
>DONE;

Ummm, this patch looks wrong.  Because the insn uses the SFDF iterator, the
mode of operands[1] could be either SFmode or DFmode.  I think it should be:

-  rtx tmp, stack;
+  rtx src = force_reg (mode, operands[1]);


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH][PR64971]Convert function pointer to Pmode when emit call

2016-08-05 Thread Jeff Law

On 08/04/2016 08:32 AM, Renlin Li wrote:

Hi all,

In the case of PR64971
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971),
the compiler ICE when compiling gcc.c-torture/compile/pr37433.c with
ilp32 abi.

As we know, in aarch64 ilp32, the ptr_mode is SImode while Pmode is
still DImode. It means all address should be DImode, and the backend
defines the patterns with this assumption.

The generic part expand_expr_addr_expr () function however generates a
SYMBOL_REF with SImode, it's later used as the address of a MEM rtx
pattern in a call expression. There is no matching pattern for this
SImode address, that's why gcc ICEs.
(symbol_ref/f:SI ("*.LC0") [flags 0x82] )

But here, I think what expand_expr_addr_expr does is correct. In this
particular case, expand_expr_addr_expr is not generating an address.
According to the source code, it's generating a function pointer, and
later this pointer is used in a call expression. So SImode should be
right in this case.

The behavior of the test case is, get the address of a piece of memory,
cast it into a function pointer, and call the function. IIUC, the flow
is like this:
CALL_EXPR ( NOP_EXPR (ADDR_EXPR ()))

NOP_EXPR here is to convert the address into a function pointer which
should be ptr_mode (SImode). So it's the responsibility of call expander
to convert the pointer into Pmode to create legal call rtx patern.

In the test case, there are two functions. The first function generates
function calls with a SYMBOL_REF as address, the second one generates a
REG as address. They are all of ptr_mode.
However, prepare_call_address () will convert the REG into Pmode to make
it as a legal address while SYMBOL_REF is missed. That's why I add the
code there.

And I want to change the PR64971 into a middle-end issue. The ICE
manifests in aarch64 target, but I believe this should be a generic
problem for targets which define ptr_mode different from Pmode.

There is a test case already, so I didn't add one.
aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
But I believe this may not help as the default abi is LP64.

It will be great if Andrew you can help to do regression test in your
aarch64 ilp32 environment.

And I double checked that, the backend fix can be removed without any
problem. It's good to expose middle-end bugs.

Okay for trunk and backport to branch 6?

gcc/ChangeLog:

2016-08-04  Renlin Li  

PR middle-end/64971
* calls.c (prepare_call_address): Convert funexp to Pmode when
necessary.
* config/aarch64/aarch64.md (sibcall): Remove fix for PR 64971.
(sibcall_value): Likewise.

OK.

jeff



Go patch committed: Avoid non-ASCII characters in asm identifiers

2016-08-05 Thread Ian Lance Taylor
PR 72812 points out that Go can generate non-ASCII characters in
assembly code.  This is a consequence of the fact that Go permits
identifiers to contain non-ASCII Unicode code points.  The GNU
assembler doesn't seem to mind, but the Solaris assembler does.

This patch changes the GCC Go interface to encode non-ASCII characters
in identifier names by setting DECL_ASSEMBLER_NAME to an encoded
value.  This fixes the problem with the Solaris assembler without
disturbing the debug info.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Ran reflect
package tests on Solaris and confirmed that they now pass.  Committed
to mainline.

Ian


2016-08-05  Ian Lance Taylor  

PR go/72812
* go-gcc.cc (char_needs_encoding): New static function.
(needs_encoding, fetch_utf8_char): New static functions.
(encode_id): New static function.
(Gcc_backend::global_variable): Set asm name if the name is not
simple ASCII.
(Gcc_backend::implicit_variable): Likewise.
(Gcc_backend::implicit_variable_reference): Likewise.
(Gcc_backend::immutable_struct): Likewise.
(Gcc_backend::immutable_struct_reference): Likewise.
(Gcc_backend::function): Likewise.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 238653)
+++ gcc/go/go-gcc.cc(working copy)
@@ -541,7 +541,7 @@ private:
   std::map builtin_functions_;
 };
 
-// A helper function.
+// A helper function to create a GCC identifier from a C++ string.
 
 static inline tree
 get_identifier_from_string(const std::string& str)
@@ -549,6 +549,102 @@ get_identifier_from_string(const std::st
   return get_identifier_with_length(str.data(), str.length());
 }
 
+// Return whether the character c is OK to use in the assembler.
+
+static bool
+char_needs_encoding(char c)
+{
+  switch (c)
+{
+case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+case 'Y': case 'Z':
+case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+case 'y': case 'z':
+case '0': case '1': case '2': case '3': case '4':
+case '5': case '6': case '7': case '8': case '9':
+case '_': case '.': case '$': case '/':
+  return false;
+default:
+  return true;
+}
+}
+
+// Return whether the identifier needs to be translated because it
+// contains non-ASCII characters.
+
+static bool
+needs_encoding(const std::string& str)
+{
+  for (std::string::const_iterator p = str.begin();
+   p != str.end();
+   ++p)
+if (char_needs_encoding(*p))
+  return true;
+  return false;
+}
+
+// Pull the next UTF-8 character out of P and store it in *PC.  Return
+// the number of bytes read.
+
+static size_t
+fetch_utf8_char(const char* p, unsigned int* pc)
+{
+  unsigned char c = *p;
+  if ((c & 0x80) == 0)
+{
+  *pc = c;
+  return 1;
+}
+  size_t len = 0;
+  while ((c & 0x80) != 0)
+{
+  ++len;
+  c <<= 1;
+}
+  unsigned int rc = *p & ((1 << (7 - len)) - 1);
+  for (size_t i = 1; i < len; i++)
+{
+  unsigned int u = p[i];
+  rc <<= 6;
+  rc |= u & 0x3f;
+}
+  *pc = rc;
+  return len;
+}
+
+// Encode an identifier using ASCII characters.
+
+static std::string
+encode_id(const std::string id)
+{
+  std::string ret;
+  const char* p = id.c_str();
+  const char* pend = p + id.length();
+  while (p < pend)
+{
+  unsigned int c;
+  size_t len = fetch_utf8_char(p, );
+  if (len == 1 && !char_needs_encoding(c))
+   ret += c;
+  else
+   {
+ ret += "$U";
+ char buf[30];
+ snprintf(buf, sizeof buf, "%x", c);
+ ret += buf;
+ ret += "$";
+   }
+  p += len;
+}
+  return ret;
+}
+
 // Define the built-in functions that are exposed to GCCGo.
 
 Gcc_backend::Gcc_backend()
@@ -2454,8 +2550,14 @@ Gcc_backend::global_variable(const std::
   std::string asm_name(pkgpath);
   asm_name.push_back('.');
   asm_name.append(name);
+  if (needs_encoding(asm_name))
+   asm_name = encode_id(asm_name);
   SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(asm_name));
 }
+  else if (needs_encoding(var_name))
+SET_DECL_ASSEMBLER_NAME(decl,
+   get_identifier_from_string(encode_id(var_name)));
+
   TREE_USED(decl) = 1;
 
   if (in_unique_section)
@@ -2690,6 +2792,8 @@ Gcc_backend::implicit_variable(const std
   SET_DECL_ALIGN(decl, alignment * BITS_PER_UNIT);
   DECL_USER_ALIGN(decl) = 1;
 }
+  if (needs_encoding(name))
+SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));

Re: protected alloca class for malloc fallback

2016-08-05 Thread Richard Biener
On August 5, 2016 8:15:54 PM GMT+02:00, Oleg Endo  wrote:
>On Fri, 2016-08-05 at 19:55 +0200, Richard Biener wrote:
>
>> 
>> Please don't use std::string.  For string building you can use
>> obstacks.
>> 
>
>Just out of curiosity ... why?  I remember there was some discussion
>about it, what was the conclusion?  Is that now a general rule or does
>it depend on the context where strings are used?

Because you make a messy mix of string handling variants.  Std::string is not 
powerful enough to capture all uses, it is vastly more expensive to embed into 
structs and it pulls in too much headers.
(Oh, and I hate I/o streams even more)

Richard.

>Cheers,
>Oleg




Re: [C++/68724] ICE with TRAIT_EXPR

2016-08-05 Thread Nathan Sidwell

On 08/05/16 12:06, Jason Merrill wrote:

On Fri, Aug 5, 2016 at 8:24 AM, Nathan Sidwell  wrote:

I've committed this patch to fix a failed assertion.  unify's (pt.c) default
case asserts EXPR_P(parm), but TRAIT_EXPR is tcc_exceptional, so fails.
However from unify's POV it's no more exceptional than, say, SIZEOF_EXPR, so
should be allowed at that point.


Please also apply this to the relevant release branches.


committed to 5 & 6 branches.

nathan


Re: [PR49366] emit loc exprs for C++ non-virtual pmf template value parms

2016-08-05 Thread Alexandre Oliva
On Aug  5, 2016, Jason Merrill  wrote:

> On Sat, Jul 23, 2016 at 10:30 AM, Alexandre Oliva  wrote:
>> With this patch, when we can't emit a DW_AT_const_value, we emit each
>> "member" of the pointer to member function "record" as a
>> DW_OP_stack_value DW_OP_piece, as long as the referenced member
>> function is output in the same translation unit, otherwise we'd get
>> relocations to external symbols, something to avoid in debug sections.

> I wonder if it would make sense to use weak references...

Hmm...  That certainly wouldn't avoid the relocation, but if the problem
is just the possibility that the symbol is undefined, that would work.
OTOH, it would hide the error one would expect out of a hard reference
in the same translation unit.  Could we ever have a weak reference in
debug info without a corresponding hard reference in the same
translation unit?  We'd have to take that into account.  I'll think
about it some more and maybe give that a shot.

>> + if (pos > offset)
>> +   {
>> + ret1 = new_loc_descr (DW_OP_piece, pos - offset, 0);
>> + add_loc_descr (, ret1);
>> + offset = pos;
>> +   }

> This seems like it will emit a DW_OP_piece with nothing before it,
> which I think is invalid DWARF.

If so, there'd be a long-standing bug in the DW_OP examples in the DWARF
standard:

  DW_OP_reg0 DW_OP_piece 4 DW_OP_piece 4 DW_OP_fbreg -12 DW_OP_piece 4
A twelve byte value whose first four bytes reside in register zero,
whose middle four bytes are unavailable (perhaps due to
optimization), and whose last four bytes are in memory, 12 bytes
before the frame base.

However, this use seems consistent with the description of DW_OP_piece,
that says it takes an operand that describes the size of the preceding
simple location description.  An empty location description is-a simple
location description.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [wwwdocs] Add branch description for new branch unified-autovect

2016-08-05 Thread Gerald Pfeifer
On Fri, 8 Jul 2016, Sameera Deshpande wrote:
> I have created new branch unified-autovect based on ToT.
> 
> Please find attached the patch adding information about new branch 
> "unified-autovect" in the documentation. Is it ok to commit?

Yes, please!

(Once a branch exists, you don't need to ask for approval to
document it, in fact. ;-)

Gerald

PS: I have been away for most of the month of July, still catching up.



Re: protected alloca class for malloc fallback

2016-08-05 Thread Oleg Endo
On Fri, 2016-08-05 at 19:55 +0200, Richard Biener wrote:

> 
> Please don't use std::string.  For string building you can use
> obstacks.
> 

Just out of curiosity ... why?  I remember there was some discussion
about it, what was the conclusion?  Is that now a general rule or does
it depend on the context where strings are used?

Cheers,
Oleg


Re: protected alloca class for malloc fallback

2016-08-05 Thread Richard Biener
On August 5, 2016 4:42:48 PM GMT+02:00, Aldy Hernandez  wrote:
>
>> I was surprised by the always_inline trick.  I suppose it makes
>> sense but I wouldn't have expected to be able to rely on it.  Out
>> of curiosity I tested it with other compilers.  It turns out that
>> it only works with some like GCC and IBM XLC++, but not others
>> like Clang or Sun CC.  In recursive calls, they don't seem to
>> hold on to the memory allocated via alloca by the class ctor in
>> the caller.
>
>Well, it was surprising to me as well, hence the comment.  I expected
>it 
>to just work, and when it didn't I had to hunt in the inliner code to 
>find out why it was selectively inlining:
>
> case GIMPLE_CALL:
>/* Refuse to inline alloca call unless user explicitly forced so as
>this may change program's memory overhead drastically when the
>function using alloca is called in loop.  In GCC present in
>SPEC2000 inlining into schedule_block cause it to require 2GB of
>RAM instead of 256MB.  Don't do so for alloca calls emitted for
>VLA objects as those can't cause unbounded growth (they're always
>wrapped inside stack_save/stack_restore regions.  */
>
>As Richi pointed out, if the constructor doesn't get inlined (as you're
>
>seeing in Clang and Sun CC), we could potentially clobber freed memory.
>
>  So much for that idea...
>
>>
>> FWIW, rather than creating a new class around alloca and putting
>> effort into changing code to use it I think that investing time
>> into replacing the function's uses with an existing C++ container
>> class (either GCC's own or one from the STL) might be a better
>> way to go.
>
>Yes, I suggested a combination of auto_vec<> (gcc's) and std::string 
>down thread.

Please don't use std::string.  For string building you can use obstacks.

Richard.

>Thanks for checking out the result from other compilers.  Much
>appreciated.
>
>Aldy




Re: protected alloca class for malloc fallback

2016-08-05 Thread Richard Biener
On August 5, 2016 6:23:27 PM GMT+02:00, Jeff Law  wrote:
>On 08/05/2016 08:37 AM, Aldy Hernandez wrote:
>>
>> After looking at all this code, I realize no size will fit all if we
>> want something clean.  So, I take my approach back.  I think we're
>best
>> served by a couple different approaches, depending on usage.
>Yup.
>
>>
>> My question is, how wed are we to an alloca based approach?  Can we
>> replace the non-critical uses by std::string (diagnostics,
>> BLAH_check_failed(), building a string to manipulate the output of
>> env("SOME_VAR"), etc?).
>I don't think we are at all wed to alloca.
>
>>
>> Then for things like the GCC driver wanting to create a vector of
>passed
>> commands, we could use auto_vec<> (*gasp*, using a vector to
>represent a
>> vector ;-)).  Note that auto_vec<> uses malloc (or GC) but creating a
>> malloc'd vector at startup is hardly on the fast path.
>>
>> For the remaining things we could use either alloca with a malloc
>> fallback for trivial things like set_user_assembler_name() that only
>> have one exit point, or take it on a case by case basis.
>>
>> But it seems to me that we can use an STL container for the non
>critical
>> things (which are the majority of them), and something else (perhaps
>an
>> RAII thinggie TBD later).
>>
>> Is this reasonable?  I'd like to know before I spend any time
>converting
>> anything to std::string and auto_vec<>.
>Yes, this is all reasonable to me.  I'm a big believer in moving
>towards 
>standard library implementations of things.  In this case we get to 
>remove the allocas *and* make the code easier to grok for those that
>are 
>familiar with the C++ standard library.

But at the same time please consider that each malloc costs 1000x more than an 
alloca.  So if you replace one alloca that is executed once during compile, 
fine.  If you replace one that is done for each symbol (the get_identifier 
case) it is not.

Richard.

>Jeff




Re: [PR49366] emit loc exprs for C++ non-virtual pmf template value parms

2016-08-05 Thread Jason Merrill
On Sat, Jul 23, 2016 at 10:30 AM, Alexandre Oliva  wrote:
> We used to emit, in debug information, the values bound to pointer to
> member function template parameters only when they were NULL or
> virtual member functions, because those can be represented with
> DW_AT_const_value.
>
> In order to represent the symbolic pointer to member function
> constants for non-virtual member functions, we'd need to be able to
> emit relocations for part of DW_AT_const_value, which we don't. The
> more viable alternative is to use DW_AT_location to represent such
> values, as slated for inclusion in DWARFv5, according to
> .

> With this patch, when we can't emit a DW_AT_const_value, we emit each
> "member" of the pointer to member function "record" as a
> DW_OP_stack_value DW_OP_piece, as long as the referenced member
> function is output in the same translation unit, otherwise we'd get
> relocations to external symbols, something to avoid in debug sections.

I wonder if it would make sense to use weak references...

> + if (pos > offset)
> +   {
> + ret1 = new_loc_descr (DW_OP_piece, pos - offset, 0);
> + add_loc_descr (, ret1);
> + offset = pos;
> +   }

This seems like it will emit a DW_OP_piece with nothing before it,
which I think is invalid DWARF.  Perhaps we should use DW_OP_lit0
first, or fail the expansion.

> + if (offset != size)
> +   {
> + ret1 = new_loc_descr (DW_OP_piece, size - offset, 0);
> + add_loc_descr (, ret1);
> + offset = size;
> +   }

Likewise.

Jason


Re: [PATCH] gcov tool: Implement Hawick's algorithm for cycle detection, (PR gcov-profile/67992)

2016-08-05 Thread Gerald Pfeifer
On Fri, 5 Aug 2016, Andrew Pinski wrote:
>> I attached a wrong patch, sending the new one.
> This broke cross aarch64-elf-gnu building.

It also broke native i586-unknown-freebsd10.3 (which features 
clang as the system compiler).

/scratch/tmp/gerald/gcc-HEAD/gcc/gcov.c:468:25: error: use of undeclared 
identifier 'INT64_MAX'
  int64_t cycle_count = INT64_MAX;
^

Gerald


Re: [PATCH] aarch64: Add split-stack initial support

2016-08-05 Thread Adhemerval Zanella
Ping.

On 28/07/2016 17:36, Adhemerval Zanella wrote:
> From: Adhemerval Zanella 
> 
> This patch adds the split-stack support on aarch64 (PR #67877).  As for
> other ports this patch should be used along with glibc and gold support.
> 
> The support is done similar to other architectures: a __private_ss field is
> added on TCB in glibc, a target-specific __morestack implementation and
> helper functions are added in libgcc and compiler supported adjustments
> (split-stack prologue, va_start for argument handling).  I also plan to
> send the gold support to adjust stack allocation acrosss split-stack
> and default code calls.
> 
> Current approach is similar to powerpc one: at most 2 GB of stack allocation
> is support so stack adjustments can be done with 2 instructions (either just
> a movn plus nop or a movn followed by movk).  The morestack call is non
> standard with x10 hollding the requested stack pointer and x11 the required
> stack size to be copied.  Also function arguments on the old stack are
> accessed based on a value relative to the stack pointer, so x10 is used to
> hold theold stack value.  Unwinding is handled by a personality routine that
> knows how to find stack segments.
> 
> Split-stack prologue on function entry is as follow (this goes before the
> usual function prologue):
> 
>   mrsx9, tpidr_el0
>   movx10, -
>   nop/movk
>   addx10, sp, x10
>   ldrx9, [x9, 16]
>   cmpx10, x9
>   b.csenough
>   stpx30, [sp, -16]movx11, 
>   movx11, 
>   bl __morestack
>   ldpx30, [sp], 16
>   ret
> enough:
>   # usual function prologue, modified a little at the end to set up the
>   # arg_pointer in x10, starts here.  The arg_pointer is initialized,
>   # if it is used, with
>   mov x11, 
>   add x10, x29, x11
>   b.csfunction
>   mov x10, x28
> function:
> 
> Notes:
>  1. Even if a function does not allocate a stack frame, a split-stack prologue
> is created.  It is to avoid issues with tail call for external symbols
> which might require linker adjustment (libgo/runtime/go-varargs.c).
> 
>  2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
> to after the required stack calculation.
> 
>  3. Similar to powerpc, When the linker detects a call from split-stack to
> non-split-stack code, it adds 16k (or more) to the value found in 
> "allocate"
> instructions (so non-split-stack code gets a larger stack).  The amount is
> tunable by a linker option.  The edit means aarch64 does not need to
> implement __morestack_non_split, necessary on x86 because insufficient
> space is available there to edit the stack comparison code.  This feature
> is only implemented in the GNU gold linker.
> 
>  4. AArch64 does not handle >2G stack initially and although it is possible
> to implement it, limiting to 2G allows to materize the allocation with
> only 2 instructions (mov + movk) and thus simplifying the linker
> adjustments required.  Supporting multiple threads each requiring more
> than 2G of stack is probably not that important, and likely to OOM at
> run time.
> 
>  5. The TCB support on GLIBC is meant to be included in version 2.25 [1].
> 
> I tested bootstrapping on aarch64-linux-gnu and although still digesting
> the results I saw no regression.  All cgo tests are passing, although based
> on previous reports in other archs gold support should be present to avoid
> issues on split calling non-split code.
> 
> libgcc/ChangeLog:
> 
>   * libgcc/config.host: Use t-stack and t-statck-aarch64 for
>   aarch64*-*-linux.
>   * libgcc/config/aarch64/morestack-c.c: New file.
>   * libgcc/config/aarch64/morestack.S: Likewise.
>   * libgcc/config/aarch64/t-stack-aarch64: Likewise.
>   * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
>   code.
> 
> gcc/ChangeLog:
> 
>   * common/config/aarch64/aarch64-common.c
>   (aarch64_supports_split_stack): New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
>   macro.
>   * gcc/config/aarch64/aarch64-protos.h: Add
>   aarch64_expand_split_stack_prologue and
>   aarch64_split_stack_space_check.
>   * gcc/config/aarch64/aarch64.c (aarch64_expand_prologue): Setup the
>   argument pointer (x10) for split-stack.
>   (aarch64_expand_builtin_va_start): Use internal argument pointer
>   instead of virtual_incoming_args_rtx.
>   (aarch64_expand_split_stack_prologue): New function.
>   (aarch64_file_end): Emit the split-stack note sections.
>   (aarch64_internal_arg_pointer): Likewise.
>   (aarch64_live_on_entry): Set the argument pointer for split-stack.
>   (aarch64_split_stack_space_check): Likewise.
>   (TARGET_ASM_FILE_END): New macro.
>  

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-05 Thread Jiong Wang

Andrew Pinski writes:

> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>  wrote:
>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote:
>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang  wrote:
>>> >
>>> > Andrew Pinski writes:
>>> >
>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang  wrote:
>>> >>>
>>> >>> James Greenhalgh writes:
>>> >>>
>>> > On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> > Current IRA still use both target macros in a few places.
>>> >
>>> > Tell IRA to use the order we defined rather than with it's own cost
>>> > calculation. Allocate caller saved first, then callee saved.
>>> >
>>> > This is especially useful for LR/x30, as it's free to allocate and is
>>> > pure caller saved when used in leaf function.
>>> >
>>> > Haven't noticed significant impact on benchmarks, but by grepping some
>>> > keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the 
>>> > number
>>> > is smaller.
>>> >
>>> > OK for trunk?
>>> 
>>>  OK, sorry for the delay.
>>> 
>>>  It might be mail client mangling, but please check that the trailing 
>>>  slashes
>>>  line up in the version that gets committed.
>>> 
>>>  Thanks,
>>>  James
>>> 
>>> > 2015-05-19  Jiong. Wang  
>>> >
>>> > gcc/
>>> >  PR 63521
>>> >  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>> >  (HONOR_REG_ALLOC_ORDER): Define.
>>> >>>
>>> >>> Patch reverted.
>>> >>
>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>> >> missing an email or something.
>>> >
>>> > There are several execution regressions under gcc testsuite, although as
>>> > far as I can see it's this patch exposed hidding bugs in those
>>> > testcases, but there might be one other issue, so to be conservative, I
>>> > temporarily reverted this patch.
>>>
>>> If you are talking about:
>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>> Etc.
>>>
>>> These test cases are too dependent on the original register allocation order
>>> and really can be safely ignored. Really these three tests should be moved 
>>> or
>>> written in a more sane way.
>>
>> Yup, completely agreed - but the testcases do throw up something
>> interesting. If we are allocating registers to hold 128-bit values, and
>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>> I think we probably see the same thing if the first thing we do in a
>> function is a structure copy through a back-end expanded movmem, which
>> will likely begin with a 128-bit LDP using x7, x8.
>>
>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>> followed by x8, then we've potentially made a sub-optimal decision, our
>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>
>> My hunch is that we *might* get better code generation in this corner case
>> out of some permutation of the allocation order for argument
>> registers. I'm thinking something along the lines of
>>
>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>
>> I asked Jiong to take a look at that, and I agree with his decision to
>> reduce the churn on trunk and just revert the patch until we've come to
>> a conclusion based on some evidence - rather than just my hunch! I agree
>> that it would be harmless on trunk from a testing point of view, but I
>> think Jiong is right to revert the patch until we better understand the
>> code-generation implications.
>>
>> Of course, it might be that I am completely wrong! If you've already taken
>> a look at using a register allocation order like the example I gave and
>> have something to share, I'd be happy to read your advice!
>
> Any news on this patch?  It has been a year since it was reverted for
> a bad test that was failing.

Hi Andrew,

  Yeah, those tests actually are expected to fail once register
  allocation order changed, it's clearly documented in the comments:

  gcc.target/aarch64/aapcs64/abitest-2.h:
  
/* ...

   Note that for value that is returned in the caller-allocated memory
   block, we get the address from the saved x8 register.  x8 is saved
   just after the callee is returned; we assume that x8 has not been
   clobbered at then, although there is no requirement for the callee
   preserve the value stored in x8.  Luckily, all test cases here are
   simple enough that x8 doesn't normally get clobbered (although not
   guaranteed).  */

  I had a local fix which use the redundant value returned to x0 to
  repair the clobbered value in x8 as they will be identical for
  structure type return, however that trick can't play anymore as we
  recently defined TARGET_OMIT_STRUCT_RETURN_REG to true which will
  remove that redundant x8 to x0 copy.
  
  Anyway, I will come back with some benchmark results of this patch on
  top of latest trunk after the weekend run, and also answers 

Re: protected alloca class for malloc fallback

2016-08-05 Thread Jeff Law

On 08/05/2016 08:37 AM, Aldy Hernandez wrote:


After looking at all this code, I realize no size will fit all if we
want something clean.  So, I take my approach back.  I think we're best
served by a couple different approaches, depending on usage.

Yup.



My question is, how wed are we to an alloca based approach?  Can we
replace the non-critical uses by std::string (diagnostics,
BLAH_check_failed(), building a string to manipulate the output of
env("SOME_VAR"), etc?).

I don't think we are at all wed to alloca.



Then for things like the GCC driver wanting to create a vector of passed
commands, we could use auto_vec<> (*gasp*, using a vector to represent a
vector ;-)).  Note that auto_vec<> uses malloc (or GC) but creating a
malloc'd vector at startup is hardly on the fast path.

For the remaining things we could use either alloca with a malloc
fallback for trivial things like set_user_assembler_name() that only
have one exit point, or take it on a case by case basis.

But it seems to me that we can use an STL container for the non critical
things (which are the majority of them), and something else (perhaps an
RAII thinggie TBD later).

Is this reasonable?  I'd like to know before I spend any time converting
anything to std::string and auto_vec<>.
Yes, this is all reasonable to me.  I'm a big believer in moving towards 
standard library implementations of things.  In this case we get to 
remove the allocas *and* make the code easier to grok for those that are 
familiar with the C++ standard library.


Jeff



Re: RFC: Make diagnostics for C++ reference binding more consistent

2016-08-05 Thread Jason Merrill
OK.

On Fri, Aug 5, 2016 at 5:08 AM, Jonathan Wakely  wrote:
> On 27/07/16 18:06 -0400, Jason Merrill wrote:
>>
>> On Wed, Jul 27, 2016 at 8:05 AM, Jonathan Wakely 
>> wrote:
>>>
>>> Consider:
>>>
>>> template T declval();
>>>
>>> int& r1 = declval();
>>> int&& r2 = declval();
>>> int& r3 = declval();
>>>
>>>
>>> This produces three quite different errors:
>>>
>>>
>>> refs.cc:3:25: error: invalid initialization of non-const reference of
>>> type
>>> 'int&' from an rvalue of type 'int'
>>> int& r1 = declval();
>>>   ~~^~
>>> refs.cc:4:25: error: cannot bind 'int' lvalue to 'int&&'
>>> int&& r2 = declval();
>>>~^~
>>> refs.cc:5:30: error: binding 'const int' to reference of type 'int&'
>>> discards qualifiers
>>> int& r3 = declval();
>>>   ~~~^~
>>>
>>>
>>> The first one uses the order to/from, but the other two are from/to.
>>>
>>> The first and second are saying the same thing (wrong value category)
>>> but very differently.
>>>
>>> The first and third mention references but the second doesn't.
>>>
>>> The standard talks about binding a reference to something, but the
>>> second and third diagnostics talk about binding something to a
>>> reference (and the first doesn't mention binding at all).
>>>
>>> The first diagnostic is specific to lvalue references (it wouldn't be
>>> invalid if it was binding a non-const _rvalue_ reference to an
>>> rvalue), but doesn't use the word "lvalue".
>>>
>>> (FWIW Clang and EDG both have their own inconsistencies for the same
>>> example, but that shouldn't stop us trying to improve.)
>>>
>>>
>>> IMHO the first should use the words "bind" and "lvalue reference":
>>>
>>> refs.cc:3:25: error: cannot bind non-const lvalue reference of type
>>> 'int&'
>>> to an rvalue of type 'int'
>>> int& r1 = declval();
>>>   ~~^~
>>>
>>> The second should use the phrasing "bind ref to value" instead of
>>> "bind value to ref", and mention "rvalue reference":
>>>
>>> refs.cc:4:25: error: cannot bind rvalue reference of type 'int&&' to an
>>> lvalue of type 'int'
>>> int&& r2 = declval();
>>>~^~
>>>
>>> The third should use the same phrasing (but doesn't need to mention
>>> lvalue/rvalue because the problem is related to cv-qualifiers not
>>> value categories):
>>>
>>> refs.cc:5:30: error: binding reference of type 'int&' to 'const int'
>>> discards qualifiers
>>> int& r3 = declval();
>>>   ~~~^~
>>>
>>>
>>> I've only considered trivial examples here, is there some reason to
>>> prefer to current diagnostics for more complex cases?
>>>
>>> The attached patch makes that change, but there are probably lots of
>>> tests that would need updating which I haven't done until I know if
>>> the change is worthwhile.
>>
>>
>> Sounds good to me.
>
>
> Here's the full patch then. There weren't too many tests to adjust.
>
> Tested x86_64-linux. OK for trunk?
>
>


Re: [C++/68724] ICE with TRAIT_EXPR

2016-08-05 Thread Jason Merrill
On Fri, Aug 5, 2016 at 8:24 AM, Nathan Sidwell  wrote:
> I've committed this patch to fix a failed assertion.  unify's (pt.c) default
> case asserts EXPR_P(parm), but TRAIT_EXPR is tcc_exceptional, so fails.
> However from unify's POV it's no more exceptional than, say, SIZEOF_EXPR, so
> should be allowed at that point.

Please also apply this to the relevant release branches.

Thanks,
Jason


Re: [PATCH RFC] do not throw in std::make_exception_ptr

2016-08-05 Thread Jason Merrill
On Wed, Aug 3, 2016 at 8:36 AM, Jonathan Wakely  wrote:
> On 03/08/16 15:02 +0300, Gleb Natapov wrote:
>>
>> On Wed, Aug 03, 2016 at 12:47:30PM +0100, Jonathan Wakely wrote:
>>> Huh. If I'm reading the ABI spec correctly, we should terminate if the
>>> copy constructor throws.
>>>
>> I'll make it terminate like you've suggested then.
>
> Let's leave it as you had it originally. I'm not sure if my reading of
> the ABI is correct, so let's keep the behaviour consistent for now.
>
> We can always revisit it later if we decide terminating is correct.

Terminating is not correct, the ABI document is out of date: CWG 475
(wg21.link/cwg475) clarified that terminate is not called if the copy
constructor throws while initializing the exception object.

Jason


Re: [PATCH] gcov tool: Implement Hawick's algorithm for cycle detection, (PR gcov-profile/67992)

2016-08-05 Thread Andrew Pinski
On Fri, Aug 5, 2016 at 12:32 AM, Martin Liška  wrote:
> On 08/05/2016 09:30 AM, Martin Liška wrote:
>> Hi.
>>
>> Sorry for the mistake with the enum, that was silly ;)
>> New patch version also handles the unnecessary braces.
>>
>> Martin
>
> I attached a wrong patch, sending the new one.


This broke cross aarch64-elf-gnu building.

/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/gcov.c:
In function ‘loop_type handle_cycle(const arc_vector_t&, int64_t&)’:
/home/jenkins/workspace/BuildToolchainAARCH64_thunder_elf_upstream/toolchain/scripts/../src/gcc/gcov.c:468:25:
error: ‘INT64_MAX’ was not declared in this scope


See 
http://stackoverflow.com/questions/3233054/error-int32-max-was-not-declared-in-this-scope
.


Thanks,
Andrew




>
> Martin


Re: [GCC Steering Committee attention] [PING] [PING] [PING] libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to te

2016-08-05 Thread Bernd Schmidt

On 08/04/2016 04:49 PM, Thomas Schwinge wrote:

Global Reviewers are welcome to review OpenACC/OpenMP/offloading patches.
But that doesn't help if that's then not happening in reality.  (With the
exception of Bernd, who then did review such patches for a while, but
also seems to have stopped with that again.)


As for this particular patch, I saw it go by, but there were a number of 
things that stopped me from looking at it which I'll try to explain. The 
one quoted seems to be just a resubmission of one that was posted 
towards the end of this thread:



.


(The resubmission should have contained a pointer to that thread).

That's a submission of three patches. If you follow that thread, you'll 
see that Jakub objected to one of them, asking you to provide a smaller 
one with just the necessary functional changes. In response you posted 
what appears to be just a merge of all three patches together. From my 
point of view, I see no reason to go near it if Jakub has unaddressed 
objections.


It's possible that I've misunderstood something, but that leads me to 
one piece of advice I could give (and I'm repeating myself here): write 
clearer patch introductions. After reading this:



On Wed, 13 Jul 2016 12:37:07 +0200, I wrote:

As discussed before, "offloading compilation is slow; I suppose because
of having to invoke several tools (LTO streaming -> mkoffload -> offload
compilers, assemblers, linkers -> combine the resulting images; but I
have not done a detailed analysis on that)".  For this reason it is
beneficial (that is, it is measurable in libgomp testing wall time) to
limit offload compilation to the one (in the OpenACC case) offload target
that we're actually going to test (that is, execute).  Another reason is
that -foffload=-fdump-tree-[...] produces clashes (that is,
unpredicatable outcome) in the file names of offload compilations' dump
files' names.  Here is a patch to implement that, to specify
-foffload=[...] during libgomp OpenACC testing.  As that has been
challenged before:

| [...] there actually is a difference between offload_plugins and
| offload_targets (for example, "intelmic"
| vs. "x86_64-intelmicemul-linux-gnu"), and I'm using both variables --
| to avoid having to translate the more specific
| "x86_64-intelmicemul-linux-gnu" (which we required in the test harness)
| into the less specific "intelmic" (for plugin loading) in
| libgomp/target.c.  I can do that, so that we can continue to use just a
| single offload_targets variable, but I consider that a less elegant
| solution.


I'm already somewhat confused about what you want to achieve and how. It 
sounds like you're trying to fix a dejagnu issue but then I look at the 
patch and see lots of libgomp configury changes.


Avoid writing information that is irrelevant to the patch at hand, such 
as most of the first sentence in the block quoted above. Don't quote 
yourself from past discussions, write something new that's specific and 
relevant to someone who's going to try and understand your patch. Try 
not to have three or four parenthetical comments in one sentence.


A better way to write the first paragraph might be:

"When testing libgomp for a compiler that supports more than one offload 
target, we iterate over the targets and run the test for each. That 
currently produces binaries for all offload targets, each time, which is 
wasted effort. Hence, this patch, which sets -foffload=xyz as 
appropriate.  This speeds up the compiler and also avoids name clashes 
between dump files."


That even adds information that was missing from your submission: I had 
to look at the patch to see that we iterate over targets, and that's 
relevant to the question of whether we need the patch or not.


Since the patch does way more, that needs to be adequately explained, 
and your original idea of a multi-part submission was in fact correct. 
You need to somehow deal with the fact that one of them was rejected, 
either using persuasion or reworking the other patches (possibly showing 
that such a reworking would be significantly worse).



Bernd


Re: [PATCH, ARM] Add a new target hook to compute the frame layout

2016-08-05 Thread Bernd Edlinger
On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>
> Thanks, that's pretty much what I expected would be the case.
>
> Could I suggest:
>
> This target hook is called once each time the frame layout needs to be
> recalculated.  The calculations can be cached by the target and can then
> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
> layout on every invocation of that hook.  This is particularly useful
> for targets that have an expensive frame layout function.  Implementing
> this callback is optional.
>
> R.
>

Excellent!  I like this wording very much.


Thanks
Bernd.


Re: protected alloca class for malloc fallback

2016-08-05 Thread Pedro Alves
On 08/05/2016 03:37 PM, Aldy Hernandez wrote:

> My question is, how wed are we to an alloca based approach?  Can we
> replace the non-critical uses by std::string (diagnostics,
> BLAH_check_failed(), building a string to manipulate the output of
> env("SOME_VAR"), etc?).

My suggestion too.

( I hope you'll pardon my sticking my nose into this.  I'm hoping that
down the road gcc and gdb/binutils end up finding a common ground
for basic C++ utilities, and grow closer together.  And you did say
you're fixing gdb's alloca problems as well.  :-) )

> Then for things like the GCC driver wanting to create a vector of passed
> commands, we could use auto_vec<> (*gasp*, using a vector to represent a
> vector ;-)).  Note that auto_vec<> uses malloc (or GC) but creating a
> malloc'd vector at startup is hardly on the fast path.

Note that that's why I specifically suggested auto_vec
for critical-path things.

The second template parameter of auto_vec specifies the size of the
internal storage:

 /* auto_vec is a subclass of vec that automatically manages creating and
releasing the internal vector. If N is non zero then it has N elements of
internal storage.  The default is no internal storage, and you probably only
want to ask for internal storage for vectors on the stack because if the
size of the vector is larger than the internal storage that space is wasted.
*/
 template
 class auto_vec : public vec
 {
 ...

So for fast-path cases where you know off hand the usual scenario is "less
than N", you could use an auto_vec variable on the stack.  This
does not malloc unless you end up needing more than N.  It actually ends up
being quite similar to your protected_alloca.  E.g.,

yours:

 protected_alloca chunk;
 str = (char *) chunk.pointer (); 

vs auto_vec:

 typedef auto_vec protected_alloca;

 protected_alloca chunk;
 str = chunk.address (); 

(Of course a similar approach can be taken with other data
structures, stack -> set/map/hash, etc., but auto_vec is there
already, and vectors should be the default container.)

> 
> For the remaining things we could use either alloca with a malloc
> fallback for trivial things like set_user_assembler_name() that only
> have one exit point, or take it on a case by case basis.

See above.

Thanks,
Pedro Alves

> 
> But it seems to me that we can use an STL container for the non critical
> things (which are the majority of them), and something else (perhaps an
> RAII thinggie TBD later).
> 
> Is this reasonable?  I'd like to know before I spend any time converting
> anything to std::string and auto_vec<>.
> 
> Thanks for everyone's input.
> Aldy
> 
> p.s. Yeah yeah, both std::string and auto_vec<> malloc stuff and they're
> bloated, but do we care on non-critical things?  We do get cleaner code
> as a result...



Re: [PATCH GCC]Simplify interface for simplify_using_initial_conditions

2016-08-05 Thread Bin.Cheng
On Thu, Aug 4, 2016 at 1:48 PM, Richard Biener
 wrote:
> On Thu, Aug 4, 2016 at 10:40 AM, Bin.Cheng  wrote:
>> On Wed, Aug 3, 2016 at 11:17 PM, Jeff Law  wrote:
>>> On 08/03/2016 10:35 AM, Bin Cheng wrote:

 Hi,
 When I introduced parameter STOP for expand_simple_operations, I also
 added it for simplify_using_initial_conditions.  The STOP argument is also
 passed to simplify_using_initial_conditions in
 simple_iv_with_niters/loop_exits_before_overflow.  After analyzing case
 reported by PR72772, I think STOP expanding is only needed for
 expand_simple_operations when handling IV.step in tree-ssa-loop-ivopts.c.
 For other cases like calls to simplify_using_initial_condition, both cond
 and expr should be expanded to check tree expression equality.  This patch
 does so.  It simplifies interface by removing parameter STOP, also moves
 expand_simple_operations from tree_simplify_using_condition_1 to its 
 caller.

 Bootstrap and test on x86_64 and AArch64.  Is it OK?

 Thanks,
 bin

 2016-08-02  Bin Cheng  

 PR tree-optimization/72772
 * tree-ssa-loop-niter.h (simplify_using_initial_conditions):
 Delete
 parameter STOP.
 * tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Delete
 parameter STOP and update calls.  Move expand_simple_operations
 function call from here...
 (simplify_using_initial_conditions): ...to here.  Delete parameter
 STOP.
 (tree_simplify_using_condition): Delete parameter STOP.
 * tree-scalar-evolution.c (simple_iv_with_niters): Update call to
 simplify_using_initial_conditions.

>>> OK.
>>> jeff
>>
>> Thanks for reviewing.  Now I have a question about behavior of the
>> interface.  Although by expanding both cond and expr, this patch
>> catches more equality cases, it always returns expanded expr even it's
>> not simplified, while the original behavior only returns simplified
>> expr (not expanded).  For most use cases, it doesn't matter because we
>> only care if the simplified result is TRUE or FALSE, but in
>> computation of niter->assumption and niter->may_be_zeor, we may result
>> in different (expanded) expressions.  Not sure how much this
>> difference matters.  I can work on another version patch keeping the
>> old behavior if it worth keeping.
>
> It might result in additional redundant code to be generated when generating
> versioning conditions from assumption or maybe_zero?  So yes, I think
Yes, that's the case it matters.

Hi Jeff, Richard,
Attachment is updated patch preserving the old behavior.  Bootstrap
and test again.  Is it OK?

Thanks,
bin
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 7c5cefd..b8bfe51 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3484,7 +3484,7 @@ simple_iv_with_niters (struct loop *wrto_loop, struct 
loop *use_loop,
   bool allow_nonconstant_step)
 {
   enum tree_code code;
-  tree type, ev, base, e, stop;
+  tree type, ev, base, e;
   wide_int extreme;
   bool folded_casts, overflow;
 
@@ -3601,8 +3601,7 @@ simple_iv_with_niters (struct loop *wrto_loop, struct 
loop *use_loop,
 return true;
   e = fold_build2 (code, boolean_type_node, base,
   wide_int_to_tree (type, extreme));
-  stop = (TREE_CODE (base) == SSA_NAME) ? base : NULL;
-  e = simplify_using_initial_conditions (use_loop, e, stop);
+  e = simplify_using_initial_conditions (use_loop, e);
   if (!integer_zerop (e))
 return true;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 191a071..5efba3b 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1880,10 +1880,10 @@ expand_simple_operations (tree expr, tree stop)
expression (or EXPR unchanged, if no simplification was possible).  */
 
 static tree
-tree_simplify_using_condition_1 (tree cond, tree expr, tree stop)
+tree_simplify_using_condition_1 (tree cond, tree expr)
 {
   bool changed;
-  tree e, te, e0, e1, e2, notcond;
+  tree e, e0, e1, e2, notcond;
   enum tree_code code = TREE_CODE (expr);
 
   if (code == INTEGER_CST)
@@ -1895,17 +1895,17 @@ tree_simplify_using_condition_1 (tree cond, tree expr, 
tree stop)
 {
   changed = false;
 
-  e0 = tree_simplify_using_condition_1 (cond, TREE_OPERAND (expr, 0), 
stop);
+  e0 = tree_simplify_using_condition_1 (cond, TREE_OPERAND (expr, 0));
   if (TREE_OPERAND (expr, 0) != e0)
changed = true;
 
-  e1 = tree_simplify_using_condition_1 (cond, TREE_OPERAND (expr, 1), 
stop);
+  e1 = tree_simplify_using_condition_1 (cond, TREE_OPERAND (expr, 1));
   if (TREE_OPERAND (expr, 1) != e1)
changed = true;
 
   if (code == COND_EXPR)
{
- e2 = 

Re: protected alloca class for malloc fallback

2016-08-05 Thread Aldy Hernandez



I was surprised by the always_inline trick.  I suppose it makes
sense but I wouldn't have expected to be able to rely on it.  Out
of curiosity I tested it with other compilers.  It turns out that
it only works with some like GCC and IBM XLC++, but not others
like Clang or Sun CC.  In recursive calls, they don't seem to
hold on to the memory allocated via alloca by the class ctor in
the caller.


Well, it was surprising to me as well, hence the comment.  I expected it 
to just work, and when it didn't I had to hunt in the inliner code to 
find out why it was selectively inlining:


case GIMPLE_CALL:
  /* Refuse to inline alloca call unless user explicitly forced so as
 this may change program's memory overhead drastically when the
 function using alloca is called in loop.  In GCC present in
 SPEC2000 inlining into schedule_block cause it to require 2GB of
 RAM instead of 256MB.  Don't do so for alloca calls emitted for
 VLA objects as those can't cause unbounded growth (they're always
 wrapped inside stack_save/stack_restore regions.  */

As Richi pointed out, if the constructor doesn't get inlined (as you're 
seeing in Clang and Sun CC), we could potentially clobber freed memory. 
 So much for that idea...




FWIW, rather than creating a new class around alloca and putting
effort into changing code to use it I think that investing time
into replacing the function's uses with an existing C++ container
class (either GCC's own or one from the STL) might be a better
way to go.


Yes, I suggested a combination of auto_vec<> (gcc's) and std::string 
down thread.


Thanks for checking out the result from other compilers.  Much appreciated.

Aldy


Re: [AArch64] Handle HFAs of float16 types properly

2016-08-05 Thread Richard Earnshaw (lists)
On 26/07/16 14:55, James Greenhalgh wrote:
> 
> Hi,
> 
> It looks like we've not been handling structures of 16-bit floating-point
> data correctly for AArch64. For some reason we end up passing them
> packed in to integer registers. That is to say, on trunk and GCC 6, for:
> 
>   struct x {
> __fp16 x[4];
>   };
> 
>   __fp16
>   foo1 (struct x x)
>   {
> return x.x[1];
>   }
> 
> We generate:
> 
>   foo1:
>   sbfxx0, x0, 16, 16
>   mov v0.h[0], w0
>   ret
> 
> Which is wrong.
> 
> This patch fixes that, so now we generate:
> 
>   foo1:
>   umovw0, v1.h[0]
>   sxthx0, w0
>   mov v0.h[0], w0
>   ret
> 
> Far from optimal (I'll work on that...) but at least getting the data from
> the right register bank!
> 
> To do this we need to keep around a reference to the fp16 type after we
> construct it. I've moved this initialisation to a new function
> aarch64_init_fp16_types in aarch64-builtins.c and made the references
> available through arm_neon.h.
> 
> After that, we want to remove the #if 0 wrapping HFmode support in
> aarch64_gimplify_va_arg_expr in aarch64.c, and add HFmode to the
> REAL_TYPE and COMPLEX_TYPE support in aapcs_vfp_sub_candidate.
> 
> Strictly speaking, we don't need the hunk regarding COMPLEX_TYPE.
> We can't build complex forms of __fp16. But, were we ever to support the
> _Float16 type we'd need this. Rather than leave the chance it will be
> forgotten about, I've just added it here. If the maintainers would prefer,
> I can change this to a TODO and put a sticky-note somewhere near my desk.
> 
> With those simple changes, we fix the argument passing. The rest of the
> patch is an update to the various testcases in aapcs64.exp to fully cover
> various __fp16 cases (both naked, and within an HFA).
> 
> Bootstrapped on aarch64-none-linux-gnu and tested with no issues. Also
> tested on aarch64_be-none-elf. All test came back clean.
> 
> OK? As this is an ABI break, I'm not proposing for it to go back to GCC 6,
> though it will apply cleanly there if the maintainers support that.
> 

Can you please file a PR for this and use that when committing.  As
previously discussed, since this was new for 6.1 having a PR makes it
easier if we do decide to have a back-port.

OK on that basis.

R.

> Thanks,
> James
> 
> ---
> 
> gcc/
> 
> 2016-07-26  James Greenhalgh  
> 
>   * config/aarch64/aarch64.h (aarch64_fp16_type_node): Declare.
>   (aarch64_fp16_ptr_type_node): Likewise.
>   * config/aarch64/aarch64-simd-builtins.c
>   (aarch64_fp16_ptr_type_node): Define.
>   (aarch64_init_fp16_types): New, refactored out of...
>   (aarch64_init_builtins): ...here, update to call
>   aarch64_init_fp16_types.
>   * config/aarch64/aarch64.c (aarch64_gimplify_va_arg_expr): Handle
>   HFmode.
>   (aapcs_vfp_sub_candidate): Likewise.
> 
> gcc/testsuite/
> 
> 2016-07-26  James Greenhalgh  
> 
>   * gcc.target/aarch64/aapcs64/abitest-common.h: Define half-precision
>   registers.
>   * gcc.target/aarch64/aapcs64/abitest.S (dumpregs): Add assembly for
>   saving the half-precision registers.
>   * gcc.target/aarch64/aapcs64/func-ret-1.c: Test that an __fp16
>   value is returned in h0.
>   * gcc.target/aarch64/aapcs64/test_2.c: Check that __FP16 arguments
>   are passed in FP/SIMD registers.
>   * gcc.target/aarch64/aapcs64/test_27.c: New, test that __fp16 HFA
>   passing works corrcetly.
>   * gcc.target/aarch64/aapcs64/type-def.h (hfa_f16x1_t): New.
>   (hfa_f16x2_t): Likewise.
>   (hfa_f16x3_t): Likewise.
>   * gcc.target/aarch64/aapcs64/va_arg-1.c: Check that __fp16 values
>   are promoted to double and passed in a double register.
>   * gcc.target/aarch64/aapcs64/va_arg-2.c: Check that __fp16 values
>   are promoted to double and stacked.
>   * gcc.target/aarch64/aapcs64/va_arg-4.c: Check stacking of HFA of
>   __fp16 data types.
>   * gcc.target/aarch64/aapcs64/va_arg-5.c: Likewise.
>   * gcc.target/aarch64/aapcs64/va_arg-16.c: New, check HFAs of
>   __fp16 first get passed in FP/SIMD registers, then stacked.
> 
> 
> 0001-AArch64-Handle-HFAs-of-float16-types-properly.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index ca91d91..1de325a 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -443,13 +443,15 @@ static struct aarch64_simd_type_info aarch64_simd_types 
> [] = {
>  };
>  #undef ENTRY
>  
> -/* This type is not SIMD-specific; it is the user-visible __fp16.  */
> -static tree aarch64_fp16_type_node = NULL_TREE;
> -
>  static tree aarch64_simd_intOI_type_node = NULL_TREE;
>  static tree aarch64_simd_intCI_type_node = NULL_TREE;
>  static tree aarch64_simd_intXI_type_node = NULL_TREE;
>  
> +/* The user-visible __fp16 type, and a pointer to that type.  Used
> +   

Re: protected alloca class for malloc fallback

2016-08-05 Thread Aldy Hernandez



I think only thing you're missing is that we probably don't want to be
using alloca in new code anyway.  And if we're going through the trouble
of fixing old code, we ought to just remove alloca.  So building up this
kind of class is probably taking folks in the wrong direction.


After spending some time combing over all alloca uses in the compiler, I 
am very much inclined to agree :).


Pedro's comments up-thread ring true as well.  Most of the alloca uses 
in the compiler are actually ad-hoc allocation for a string that will be 
quickly discarded or passed to get_identifier().  Very few, if any are 
actually on a critical path.  Most uses are things like building 
diagnostics.


However, there are a few potentially problematic ones like alloca()ing 
chunks of memory while processing GIMPLE_ASM (chunk * noutputs).  This 
may require something more efficient than malloc().  Though I really 
wonder how prevalent inline asms are in that it would affect performance 
in any measurable way if we replaced these with malloc or GCC's 
auto_vec<> (which also uses malloc (or GGC) BTW).


After looking at all this code, I realize no size will fit all if we 
want something clean.  So, I take my approach back.  I think we're best 
served by a couple different approaches, depending on usage.


My question is, how wed are we to an alloca based approach?  Can we 
replace the non-critical uses by std::string (diagnostics, 
BLAH_check_failed(), building a string to manipulate the output of 
env("SOME_VAR"), etc?).


Then for things like the GCC driver wanting to create a vector of passed 
commands, we could use auto_vec<> (*gasp*, using a vector to represent a 
vector ;-)).  Note that auto_vec<> uses malloc (or GC) but creating a 
malloc'd vector at startup is hardly on the fast path.


For the remaining things we could use either alloca with a malloc 
fallback for trivial things like set_user_assembler_name() that only 
have one exit point, or take it on a case by case basis.


But it seems to me that we can use an STL container for the non critical 
things (which are the majority of them), and something else (perhaps an 
RAII thinggie TBD later).


Is this reasonable?  I'd like to know before I spend any time converting 
anything to std::string and auto_vec<>.


Thanks for everyone's input.
Aldy

p.s. Yeah yeah, both std::string and auto_vec<> malloc stuff and they're 
bloated, but do we care on non-critical things?  We do get cleaner code 
as a result...


Re: [PATCH][expr.c] PR middle-end/71700: zero-extend sub-word value when widening constructor element

2016-08-05 Thread Kyrill Tkachov


On 01/08/16 15:31, Kyrill Tkachov wrote:


On 11/07/16 18:55, Bernd Schmidt wrote:

On 07/11/2016 04:52 PM, Kyrill Tkachov wrote:

Based on that, I think that code block is a useful optimisation, we just
need
to take care with immediates.

What do you think?


Yeah, I think the patch is ok.



This patch (https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00017.html) has been 
in trunk
for a month with no reported problems.

I'd like to backport it to the GCC 5 and 6 branches.
I've bootstrapped and tested it there on arm-none-linux-gnueabihf.



Sorry for the early ping but given that GCC 6.2 is due shortly and I'm away 
next week,
could I please backport this patch to the GCC 6 branch?

Thanks,
Kyrill


Ok?

Thanks,
Kyrill



Bernd







Re: [AArch64] Handle HFAs of float16 types properly

2016-08-05 Thread Richard Earnshaw (lists)
On 05/08/16 15:17, James Greenhalgh wrote:
> On Fri, Aug 05, 2016 at 11:15:24AM +0100, James Greenhalgh wrote:
>> On Fri, Aug 05, 2016 at 11:00:39AM +0100, Yao Qi wrote:
>>> On Tue, Jul 26, 2016 at 2:55 PM, James Greenhalgh
>>>  wrote:

 OK? As this is an ABI break, I'm not proposing for it to go back to GCC 6,
 though it will apply cleanly there if the maintainers support that.

>>>
>>> What do you mean by "ABI break"?  AFAICS, with this patch, it conforms to
>>> AAPCS.
>>
>> After this patch code generated for GCC 4.9/5/6 will not be ABI
> 
> Note that the __fp16 type was only added for AArch64 for GCC 6, so there
> would be no break going back to the earlier branches.
> 
> The only released compiler we would potentially have an ABI break against
> would be GCC 6.1 (and any vendor/distibution compilers that had backported
> the __fp16 support).
> 
> __fp16 is a fairly corner-case type anyway, so the actual impact of this
> break should be reasonably well limited. Especially if we backport the fix
> such that GCC 6.2 contains the fix.

I agree.  Given this was a new feature we should fix it and be done.
6.1 was buggy, 6.2 is it!

R.

> 
> Thanks,
> James
> 
>> compatible with code generated for GCC 7 for HFAs of __fp16. The new
>> generated code will conform to AAPCS64, but the old code didn't so there has
>> been an ABI change between the GCC versions. We don't like doing that for
>> minor releases, so the patch is not really suitable for backporting.
>>
>>> The subject leads me thinking about the handling of HVA of float16.
>>
>> These are handled like any other vector, the code looking at HVA's doesn't
>> care about the inner mode of the vector just the bitsize:
>>



Re: [AArch64] Handle HFAs of float16 types properly

2016-08-05 Thread James Greenhalgh
On Fri, Aug 05, 2016 at 11:15:24AM +0100, James Greenhalgh wrote:
> On Fri, Aug 05, 2016 at 11:00:39AM +0100, Yao Qi wrote:
> > On Tue, Jul 26, 2016 at 2:55 PM, James Greenhalgh
> >  wrote:
> > >
> > > OK? As this is an ABI break, I'm not proposing for it to go back to GCC 6,
> > > though it will apply cleanly there if the maintainers support that.
> > >
> > 
> > What do you mean by "ABI break"?  AFAICS, with this patch, it conforms to
> > AAPCS.
> 
> After this patch code generated for GCC 4.9/5/6 will not be ABI

Note that the __fp16 type was only added for AArch64 for GCC 6, so there
would be no break going back to the earlier branches.

The only released compiler we would potentially have an ABI break against
would be GCC 6.1 (and any vendor/distibution compilers that had backported
the __fp16 support).

__fp16 is a fairly corner-case type anyway, so the actual impact of this
break should be reasonably well limited. Especially if we backport the fix
such that GCC 6.2 contains the fix.

Thanks,
James

> compatible with code generated for GCC 7 for HFAs of __fp16. The new
> generated code will conform to AAPCS64, but the old code didn't so there has
> been an ABI change between the GCC versions. We don't like doing that for
> minor releases, so the patch is not really suitable for backporting.
> 
> > The subject leads me thinking about the handling of HVA of float16.
> 
> These are handled like any other vector, the code looking at HVA's doesn't
> care about the inner mode of the vector just the bitsize:
> 



Re: [PATCH, ARM] Add a new target hook to compute the frame layout

2016-08-05 Thread Richard Earnshaw (lists)
On 05/08/16 13:49, Bernd Edlinger wrote:
> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this patch introduces a new target hook that allows the target's
>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>> re-computing the frame layout every time.
>>>
>>> I have updated the documentation a bit and hope it is clearer this time.
>>>
>>> It still needs a review by ARM port maintainers.
>>>
>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>
>>
>> I need to look into this more, but my first thought was that the
>> documentation is confusing, or there is a real problem in this patch.
>>
> 
> Thanks for your quick response.
> 
> The documentation is actually the most difficult part for me.
> 
>> As I understand it the frame has to be re-laid out each time the
>> contents of the frame changes (an extra register becomes live or another
>> spill slot is needed).  So saying that it is laid out once can't be
>> right if (as I read it initially) you mean 'once per function' since I
>> think it needs to be 'once for each time the frame contents changes'.
>>
>> Of course, things might be a bit different with LRA compared with
>> reload, but I strongly suspect this is never going to be an 'exactly
>> once per function' operation.
>>
> 
> Right.  It will be done 2 or 3 times for each function.
> LRA and reload behave identical in that respect.
> 
> But each time reload changes something in the input data the
> INITIAL_EMIMINATION_OFFSET is called several times, and the results
> have to be consistent in each iteration.
> 
> The frame layout function has no way to know if the frame layout
> is expected to change or not.
> 
> Many targets use reload_completed as an indication when the frame layout
> may not change at all, but that is only an approximation.
> 
>> Can you clarify your meaning in the documentation please?
>>
> 
> I meant 'once' in the sense of 'once for each time the frame contents 
> change'.
> 
> Thus I'd change that sentence to:
> 
> "This target hook allows the target to compute the frame layout once for
> each time the frame contents change and make use of the cached frame
> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
> on every invocation.  This is particularly useful for targets that have
> an expensive frame layout function.  Implementing this callback is
> optional."
> 

Thanks, that's pretty much what I expected would be the case.

Could I suggest:

This target hook is called once each time the frame layout needs to be
recalculated.  The calculations can be cached by the target and can then
be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
layout on every invocation of that hook.  This is particularly useful
for targets that have an expensive frame layout function.  Implementing
this callback is optional.

R.

> 
> Thanks
> Bernd.
> 
> 
>> R.
>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>> On 06/21/16 23:29, Jeff Law wrote:
 On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
> Hi!
>
>
> By the design of the target hook INITIAL_ELIMINATION_OFFSET
> it is necessary to call this function several times with
> different register combinations.
> Most targets use a cached data structure that describes the
> exact frame layout of the current function.
>
> It is safe to skip the computation when reload_completed = true,
> and most targets do that already.
>
> However while reload is doing its work, it is not clear when to
> do the computation and when not.  This results in unnecessary
> work.  Computing the frame layout can be a simple function or an
> arbitrarily complex one, that walks all instructions of the current
> function for instance, which is more or less the common case.
>
>
> This patch adds a new optional target hook that can be used
> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
> into a O(n) computation part, and a O(1) result function.
>
> The patch implements a compute_frame_layout target hook just
> for ARM in the moment, to show the principle.
> Other targets may also implement that hook, if it seems appropriate.
>
>
> Boot-strapped and reg-tested on arm-linux-gnueabihf.
> OK for trunk?
>
>
> Thanks
> Bernd.
>
>
> changelog-frame-layout.txt
>
>
> 2016-06-16  Bernd Edlinger  
>
>  * target.def (compute_frame_layout): New optional target hook.
>  * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>  * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>  * lra-eliminations.c (update_reg_eliminate): Call
> compute_frame_layout
>  target hook.
>  * reload1.c (verify_initial_elim_offsets): Likewise.
>  * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.

Re: [PATCH] Define feature-test macro for std::enable_shared_from_this

2016-08-05 Thread Jonathan Wakely

On 04/08/16 13:33 +0100, Jonathan Wakely wrote:

On 03/08/16 20:11 +0100, Jonathan Wakely wrote:

Another feature we already support, so just define the macro.

* include/bits/shared_ptr_base.h (__cpp_lib_enable_shared_from_this):
Define feature-test macro.
* testsuite/20_util/enable_shared_from_this/members/reinit.cc: Test
for the macro.

Tested x86_64-linux, committed to trunk.


I realised we don't actually implement the whole feature, because we
don't have the new weak_from_this() members (careless of me to forget
the contents of my own proposal!)

This adds them for C++17, or gnu++1*, and only defines the
feature-test macro when those members are present.


And this corrects the status in the manual.

commit bdf5a745f3f98517fc0efe3f5daef9da0b8b53c7
Author: Jonathan Wakely 
Date:   Fri Aug 5 14:56:34 2016 +0100

Correct status of __cpp_lib_enable_shared_from_this

	* doc/xml/manual/status_cxx2017.xml: Correct shared_from_this status.
	* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 4d098d1..02aec25 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -261,7 +261,7 @@ Feature-testing recommendations for C++.
 	P0033R1
 	
   
-   6.1 
+   7 
   __cpp_lib_enable_shared_from_this >= 201603
 
 


[PATCH] Implement C++17 rounding functions for std::chrono (P0092R1)

2016-08-05 Thread Jonathan Wakely

* include/std/chrono (floor, ceil, round, abs): New for C++17.
* testsuite/20_util/duration_cast/rounding.cc: New test.
* testsuite/20_util/time_point_cast/rounding.cc: New test.
* doc/xml/manual/status_cxx2017.xml: Update status table.
* doc/html/manual/status.html: Regenerate.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Adjust
dg-error lineno.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
* testsuite/20_util/duration/literals/range.cc: Likewise.

Tested powerpc64-linux, committed to trunk.

commit ac54a375293a58fc9b42e7bd8e8b0518c5988bd2
Author: Jonathan Wakely 
Date:   Fri Aug 5 13:54:57 2016 +0100

Implement C++17 rounding functions for std::chrono (P0092R1)

* include/std/chrono (floor, ceil, round, abs): New for C++17.
* testsuite/20_util/duration_cast/rounding.cc: New test.
* testsuite/20_util/time_point_cast/rounding.cc: New test.
* doc/xml/manual/status_cxx2017.xml: Update status table.
* doc/html/manual/status.html: Regenerate.
* testsuite/20_util/duration/requirements/typedefs_neg1.cc: Adjust
dg-error lineno.
* testsuite/20_util/duration/requirements/typedefs_neg2.cc: Likewise.
* testsuite/20_util/duration/requirements/typedefs_neg3.cc: Likewise.
* testsuite/20_util/duration/literals/range.cc: Likewise.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 8391758..4d098d1 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -437,14 +437,13 @@ Feature-testing recommendations for C++.
 
 
 
-  
Polishing chrono 
   
http://www.w3.org/1999/xlink; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0092r1.html;>
P0092R1

   
-   No 
+   7 
__cpp_lib_chrono >= 201510 
 
 
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index fdb21b3..f29d8e1 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -208,11 +208,69 @@ _GLIBCXX_END_NAMESPACE_VERSION
   struct treat_as_floating_point
   : is_floating_point<_Rep>
   { };
+
 #if __cplusplus > 201402L
 template 
   constexpr bool treat_as_floating_point_v =
 treat_as_floating_point<_Rep>::value;
 #endif // C++17
+
+#if __cplusplus > 201402L
+# define __cpp_lib_chrono 201510
+
+template
+  constexpr enable_if_t<__is_duration<_ToDur>::value, _ToDur>
+  floor(const duration<_Rep, _Period>& __d)
+  {
+   auto __to = chrono::duration_cast<_ToDur>(__d);
+   if (__to > __d)
+ --__to;
+   return __to;
+  }
+
+template
+  constexpr enable_if_t<__is_duration<_ToDur>::value, _ToDur>
+  ceil(const duration<_Rep, _Period>& __d)
+  {
+   auto __to = chrono::duration_cast<_ToDur>(__d);
+   if (__to < __d)
+ return __to + _ToDur{1};
+   return __to;
+  }
+
+template 
+  constexpr enable_if_t<
+   __and_<__is_duration<_ToDur>,
+  __not_>::value,
+   _ToDur>
+  round(const duration<_Rep, _Period>& __d)
+  {
+   _ToDur __t0 = chrono::floor<_ToDur>(__d);
+   _ToDur __t1 = __t0 + _ToDur{1};
+   auto __diff0 = __d - __t0;
+   auto __diff1 = __t1 - __d;
+   if (__diff0 == __diff1)
+   {
+   if (__t0.count() & 1)
+   return __t1;
+   return __t0;
+   }
+   else if (__diff0 < __diff1)
+   return __t0;
+   return __t1;
+  }
+
+template
+  constexpr
+  enable_if_t::is_signed, duration<_Rep, _Period>>
+  abs(duration<_Rep, _Period> __d)
+  {
+   if (__d >= __d.zero())
+ return __d;
+   return -__d;
+  }
+#endif // C++17
+
 /// duration_values
 template
   struct duration_values
@@ -610,6 +668,37 @@ _GLIBCXX_END_NAMESPACE_VERSION
return __time_point(duration_cast<_ToDur>(__t.time_since_epoch()));
   }
 
+#if __cplusplus > 201402L
+template
+  constexpr
+  enable_if_t<__is_duration<_ToDur>::value, time_point<_Clock, _ToDur>>
+  floor(const time_point<_Clock, _Dur>& __tp)
+  {
+   return time_point<_Clock, _ToDur>{
+   chrono::floor<_ToDur>(__tp.time_since_epoch())};
+  }
+
+template
+  constexpr
+  enable_if_t<__is_duration<_ToDur>::value, time_point<_Clock, _ToDur>>
+  ceil(const time_point<_Clock, _Dur>& __tp)
+  {
+   return time_point<_Clock, _ToDur>{
+   chrono::ceil<_ToDur>(__tp.time_since_epoch())};
+  }
+
+template
+  constexpr enable_if_t<
+   __and_<__is_duration<_ToDur>,
+  

Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-08-05 Thread Richard Biener
On Fri, Aug 5, 2016 at 3:28 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Here is updated patch which implements your proposal - I pass loop
> instead of stmt to determine either REF is defined inside LOOP nest or
> not. I checked that for pr70729-nest.cc the reference this->S_n  for
> statements which are out of innermost loop are  not considered as
> independent as you pointed out.
>
> Regression testing did not show any new failures and both failed tests
> from libgomp.fortran suite now passed.
>
> Is it OK for trunk?

I don't quite understand

+  /* Ignore dependence for loops having greater safelen.  */
+  if (new_safelen == safelen
+  && bitmap_bit_p (>dep_loop, LOOP_DEP_BIT (loop->num, stored_p)))
 return false;

this seems to suggest (correctly I think) that we cannot rely on the caching
for safelen, neither for optimal results (you seem to address that) but also
not for correctness (we cache the no-dep result from a safelen run and
then happily re-use that info for a ref that is not safe for safelen).

It seems to me we need to avoid any caching if we made things independent
because of safelen and simply not do the dep test afterwards.  this means
inlining ref_indep_loop_p_1 partly into _2 (not sure if there's a great way
to do this w/o confusing the control flow).

Richard.

> ChangeLog:
> 2016-08-05  Yuri Rumyantsev  
>
> PR tree-optimization/71734
> * tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument REF_LOOP.
> (outermost_indep_loop): Pass LOOP argumnet where REF was defined to
> ref_indep_loop_p.
> (ref_indep_loop_p_1): Fix commentary.
> (ref_indep_loop_p_2): Add additional argument REF_LOOP, introduce new
> variable NEW_SAFELEN which may have new value for SAFELEN, ignore
> dependencde for loop having greater safelen value, pass REF_LOOP in
> recursive call.
> (can_sm_ref_p): Pass LOOP as additional argument to ref_indep_loop_p.
>
> 2016-08-03 16:44 GMT+03:00 Richard Biener :
>> On Fri, Jul 29, 2016 at 4:00 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard.
>>>
>>> It turned out that the fix proposed by you does not work for liggomp
>>> tests simd3 and simd4.
>>> The reason is that we can't change safelen value for references not
>>> defined inside loop. So I add missed check on it to patch.
>>> Is it OK for trunk?
>>
>> Hmm, I don't like the walk of all subloops in ref_defined_in_loop_p as
>> that operation can end up being quadratic in the loop depth/width.
>>
>> But I also wonder about correctness given that LIM "commons"
>> references.  So we can have
>>
>>   for (;;)
>> .. = ref;  (1)
>> for (;;) // safelen == 2  (2)
>>   ... = ref;
>>
>> and when looking at the ref at (1) which according to you should not
>> have safelen applied your function will happily return that ref is defined
>> in the inner loop.
>>
>> So it looks like to be able to apply safelen the caller of ref_indep_loop_p
>> needs to pass down a ref plus a location (a stmt).  In which case your
>> function can simply use flow_loop_nested_p (loop, gimple_bb
>> (stmt)->loop_father);
>>
>> Richard.
>>
>>> ChangeLog:
>>> 2016-07-29  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/71734
>>> * tree-ssa-loop-im.c (ref_defined_in_loop_p): New function.
>>> (ref_indep_loop_p_2): Change SAFELEN value for REF defined inside LOOP.
>>>
>>> 2016-07-29 13:08 GMT+03:00 Yuri Rumyantsev :
 Sorry H.J.

 I checked both these tests manually but forgot to pass "-fopenmp" option.
 I will fix the issue asap.

 2016-07-29 0:33 GMT+03:00 H.J. Lu :
> On Thu, Jul 28, 2016 at 6:49 AM, Yuri Rumyantsev  
> wrote:
>> Richard,
>>
>> I prepare a patch which is based on yours. New test is also included.
>> Bootstrapping and regression testing did not show any new failures.
>> Is it OK for trunk?
>>
>> Thanks.
>> ChangeLog:
>> 2016-07-28  Yuri Rumyantsev  
>>
>> PR tree-optimization/71734
>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Pass value of safelen
>> attribute instead of REF_LOOP and use it.
>> (ref_indep_loop_p_2): Use SAFELEN argument instead of REF_LOOP and
>> set it for Loops having non-zero safelen attribute.
>> (ref_indep_loop_p): Pass zero as initial value for safelen.
>> gcc/testsuite/ChangeLog:
>> * g++.dg/vect/pr70729-nest.cc: New test.
>>
>
> Does this cause
>
> FAIL: libgomp.fortran/pr71734-1.f90   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> test
> FAIL: libgomp.fortran/pr71734-1.f90   -O3 -g  execution test
> FAIL: libgomp.fortran/pr71734-2.f90   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> test
> FAIL: libgomp.fortran/pr71734-2.f90   -O3 -g  execution test
>
> on AVX 

Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-05 Thread Martin Liška
On 08/05/2016 03:14 PM, Nathan Sidwell wrote:
> On 08/05/16 08:48, Martin Liška wrote:
> 
>> Ok, after all the experimenting and inventing "almost" thread-safe code, I 
>> incline to not to include __gcov_one_value_profiler_body_atomic
>> counter. The final solution is cumbersome and probably does not worth doing. 
>> Moreover, even having a thread-safe implementation, result of an indirect 
>> call counter
>> is not going to be stable among different runs (due to a single value 
>> storage capability).
>>
>> If you agree, I'll prepare a final version of patch?
> 
> Agreed.
> 
> nathan
> 

Great, attaching install candidate.

Martin
>From 0b3ac8636ef34b02e301f22c86dde0602f9969ef Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 28 Jul 2016 14:32:47 +0200
Subject: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9
 branch

gcc/ChangeLog:

2016-08-05  Martin Liska  

	Cherry picked (and modified) from google-4_7 branch
	2012-12-26  Rong Xu  
	* common.opt (fprofile-update): Add new flag.
	* coretypes.h: Define enum profile_update.
	* doc/invoke.texi: Document -fprofile-update.
	* gcov-io.h: Declare GCOV_TYPE_ATOMIC_FETCH_ADD and
	GCOV_TYPE_ATOMIC_FETCH_ADD_FN.
	* tree-profile.c (gimple_init_edge_profiler): Generate
	also atomic profiler update.
	(gimple_gen_edge_profiler): Likewise.

gcc/testsuite/ChangeLog:

2016-08-05  Martin Liska  

	* g++.dg/gcov/gcov-threads-1.C: New test.
---
 gcc/common.opt | 13 
 gcc/coretypes.h|  6 
 gcc/doc/invoke.texi| 12 +++
 gcc/gcov-io.h  |  8 +
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C | 46 ++
 gcc/tree-profile.c | 53 --
 6 files changed, 120 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..44adae8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1916,6 +1916,19 @@ fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
 
+fprofile-update=
+Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
+-fprofile-update=[single|atomic]	Set the profile update method.
+
+Enum
+Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
+
+EnumValue
+Enum(profile_update) String(single) Value(PROFILE_UPDATE_SINGLE)
+
+EnumValue
+Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC)
+
 fprofile-generate
 Common
 Enable common options for generating profile info for profile feedback directed optimizations.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index b3a91a6..fe1e984 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -174,6 +174,12 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of profile update methods.  */
+enum profile_update {
+  PROFILE_UPDATE_SINGLE,
+  PROFILE_UPDATE_ATOMIC
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 22001f9..1cfaae7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9933,6 +9933,18 @@ the profile feedback data files. See @option{-fprofile-dir}.
 To optimize the program based on the collected profile information, use
 @option{-fprofile-use}.  @xref{Optimize Options}, for more information.
 
+@item -fprofile-update=@var{method}
+@opindex fprofile-update
+
+Alter the update method for an application instrumented for profile
+feedback based optimization.  The @var{method} argument should be one of
+@samp{single} or @samp{atomic}.  The first one is useful for single-threaded
+applications, while the second one prevents profile corruption by emitting
+thread-safe code.
+
+@strong{Warning:} When an application does not properly join all threads
+(or creates an detached thread), a profile file can be still corrupted.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bbf013a..afd00ac 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -164,6 +164,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifndef GCC_GCOV_IO_H
 #define GCC_GCOV_IO_H
 
+#if LONG_LONG_TYPE_SIZE > 32
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
+#else
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
+#endif
+
 #ifndef IN_LIBGCOV
 /* About the host */
 
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
new file mode 100644
index 000..a4a6f0a
--- /dev/null
+++ 

Re: [PATCH] Fix PR tree-optimization/72810

2016-08-05 Thread Richard Biener
On Fri, Aug 5, 2016 at 3:08 PM, Patrick Palka  wrote:
> Sometimes the case labels of a switch have a different type than the
> switch operand, e.g. in the test case below, the case labels have type t
> and the switch operand has type int. Because the verifier expects that
> case labels of a switch each have the exact same type, it's important to
> avoid changing their types when truncating their range.
>
> Does this patch look OK to commit after bootstrap + regtesting?

Ok, though the verifier should possibly be made more forgiving and
allow all types_compatible_p.

Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/72810
> * tree-vrp.c (simplify_switch_using_ranges): Avoid changing
> the types of the case labels when truncating.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/72810
> * tree-ssa/vrp110.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/vrp110.c | 24 +++
>  gcc/tree-vrp.c | 43 
> +++---
>  2 files changed, 48 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp110.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c
> new file mode 100644
> index 000..34d7eb4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c
> @@ -0,0 +1,24 @@
> +/* { dg-options "-O2" }  */
> +
> +extern void foo (void);
> +extern void bar (void);
> +
> +void
> +test (int i)
> +{
> +  if (i == 1)
> +return;
> +
> +  typedef int t;
> +  t j = i;
> +  switch (j)
> +{
> +case 1:
> +case 2:
> +  foo ();
> +  break;
> +case 7:
> +  bar ();
> +  break;
> +}
> +}
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index 7e3b513..2c166db 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -9643,61 +9643,66 @@ simplify_switch_using_ranges (gswitch *stmt)
>tree min_label = gimple_switch_label (stmt, min_idx);
>tree max_label = gimple_switch_label (stmt, max_idx);
>
> +  /* Avoid changing the types of the case labels when truncating.  */
> +  tree case_label_type = TREE_TYPE (CASE_LOW (min_label));
> +  tree vr_min = fold_convert (case_label_type, vr->min);
> +  tree vr_max = fold_convert (case_label_type, vr->max);
> +
>if (vr->type == VR_RANGE)
> {
>   /* If OP's value range is [2,8] and the low label range is
>  0 ... 3, truncate the label's range to 2 .. 3.  */
> - if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) < 0
> + if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) < 0
>   && CASE_HIGH (min_label) != NULL_TREE
> - && tree_int_cst_compare (CASE_HIGH (min_label), vr->min) >= 0)
> -   CASE_LOW (min_label) = vr->min;
> + && tree_int_cst_compare (CASE_HIGH (min_label), vr_min) >= 0)
> +   CASE_LOW (min_label) = vr_min;
>
>   /* If OP's value range is [2,8] and the high label range is
>  7 ... 10, truncate the label's range to 7 .. 8.  */
> - if (tree_int_cst_compare (CASE_LOW (max_label), vr->max) <= 0
> + if (tree_int_cst_compare (CASE_LOW (max_label), vr_max) <= 0
>   && CASE_HIGH (max_label) != NULL_TREE
> - && tree_int_cst_compare (CASE_HIGH (max_label), vr->max) > 0)
> -   CASE_HIGH (max_label) = vr->max;
> + && tree_int_cst_compare (CASE_HIGH (max_label), vr_max) > 0)
> +   CASE_HIGH (max_label) = vr_max;
> }
>else if (vr->type == VR_ANTI_RANGE)
> {
> - tree one_cst = build_one_cst (TREE_TYPE (op));
> + tree one_cst = build_one_cst (case_label_type);
>
>   if (min_label == max_label)
> {
>   /* If OP's value range is ~[7,8] and the label's range is
>  7 ... 10, truncate the label's range to 9 ... 10.  */
> - if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) == 0
> + if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) == 0
>   && CASE_HIGH (min_label) != NULL_TREE
> - && tree_int_cst_compare (CASE_HIGH (min_label), vr->max) > 
> 0)
> + && tree_int_cst_compare (CASE_HIGH (min_label), vr_max) > 0)
> CASE_LOW (min_label)
> - = int_const_binop (PLUS_EXPR, vr->max, one_cst);
> + = int_const_binop (PLUS_EXPR, vr_max, one_cst);
>
>   /* If OP's value range is ~[7,8] and the label's range is
>  5 ... 8, truncate the label's range to 5 ... 6.  */
> - if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) < 0
> + if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) < 0
>   && CASE_HIGH (min_label) != NULL_TREE
> - && tree_int_cst_compare (CASE_HIGH (min_label), vr->max) == 
> 0)
> + && tree_int_cst_compare 

Re: [PATCH] Improve backwards threading

2016-08-05 Thread Richard Biener
On Fri, 5 Aug 2016, Richard Biener wrote:

> On Fri, 5 Aug 2016, Richard Biener wrote:
> 
> > 
> > This avoids regressing gcc.dg/tree-ssa/pr21417.c with the fix for
> > PR72772 where after it a forwarder block no longer is present.
> > It's easy to simply create it when FSM threading faces the situation
> > that the edge ending the path enters a loop.
> > 
> > I also fixed the costs for obviously related anon SSA names (when
> > they are the same).
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, I'll apply
> > if that finishes successfully.
> 
> Ok, so that re-introduces the c-c++-common/ubsan/pr71403-2.c miscompile
> where we mash two loops retaining a bogus loop->nb_iterations_upper_bound.
> 
> I think those two testcases show that it should be the dominance
> relation between the entered loop header and the threaded path
> that decides whether we can thread or not -- the path may not
> become another latch of this loop.  I can't seem to figure out
> a correct condition for this based on the threading path but the
> following works (maybe by accident as I only have those two
> testcases):
> 
> Index: gcc/tree-ssa-threadbackward.c
> ===
> --- gcc/tree-ssa-threadbackward.c   (revision 239164)
> +++ gcc/tree-ssa-threadbackward.c   (working copy)
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-pass.h"
>  #include "gimple-ssa.h"
>  #include "tree-phinodes.h"
> +#include "cfghooks.h"
>  
>  static int max_threaded_paths;
>  
> @@ -560,6 +562,14 @@ fsm_find_control_statement_thread_paths
>   edge taken_edge = profitable_jump_thread_path (path, bbi, name, 
> arg);
>   if (taken_edge)
> {
> + /* If the taken edge is a loop entry avoid mashing two
> +loops into one with multiple latches by splitting
> +the edge.  This only works if that block won't become
> +a latch of this loop.  */
> + if ((bb_loop_depth (taken_edge->src)
> +  < bb_loop_depth (taken_edge->dest))
> + && ! single_succ_p (bbi))
> +   split_edge (taken_edge);
>   if (bb_loop_depth (taken_edge->src)
>   >= bb_loop_depth (taken_edge->dest))
> convert_and_register_jump_thread_path (path, taken_edge);
> 
> note you need the PR72772 fix to trigger all this.

I think it also shows that a thread path ending not in the inner loop
header but on the preheader block might be miscompiled as well given
said block can become latch as well if other paths leading into it
vanish.  Thus there is a latent wrong-code issue with the backward
threading (and maybe it really should invalidate 
nb_iterations_upper_bound of all possibly participating loops,
which I guess is the loop the backedge we thread through plus all
inner loops).

> Any idea?
> 
> Thanks,
> Richard.
> 
> > Richard.
> > 
> > 2016-08-05  Richard Biener  
> > 
> > * tree-ssa-threadbackward.c: Include cfghooks.h.
> > (profitable_jump_thread_path): Treat same SSA names related.
> > (fsm_find_control_statement_thread_paths): When the taken edge
> > enters a loop split it instead of giving up.
> > 
> > Index: gcc/tree-ssa-threadbackward.c
> > ===
> > --- gcc/tree-ssa-threadbackward.c   (revision 239164)
> > +++ gcc/tree-ssa-threadbackward.c   (working copy)
> > @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
> >  #include "tree-pass.h"
> >  #include "gimple-ssa.h"
> >  #include "tree-phinodes.h"
> > +#include "cfghooks.h"
> >  
> >  static int max_threaded_paths;
> >  
> > @@ -206,8 +207,9 @@ profitable_jump_thread_path (vec >   /* Note that if both NAME and DST are anonymous
> >  SSA_NAMEs, then we do not have enough information
> >  to consider them associated.  */
> > - if ((SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
> > -  || !SSA_NAME_VAR (dst))
> > + if (dst != name
> > + && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
> > + || !SSA_NAME_VAR (dst))
> >   && !virtual_operand_p (dst))
> > ++n_insns;
> > }
> > @@ -560,9 +562,13 @@ fsm_find_control_statement_thread_paths
> >   edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg);
> >   if (taken_edge)
> > {
> > + /* If the taken edge is a loop entry avoid mashing two
> > +loops into one with multiple latches by splitting
> > +the edge.  */
> >   if (bb_loop_depth (taken_edge->src)
> > - >= bb_loop_depth (taken_edge->dest))
> > -   convert_and_register_jump_thread_path (path, taken_edge);
> > + < bb_loop_depth (taken_edge->dest))
> > +   split_edge (taken_edge);
> > + 

Re: [PATCH] Improve backwards threading

2016-08-05 Thread Richard Biener
On Fri, 5 Aug 2016, Richard Biener wrote:

> 
> This avoids regressing gcc.dg/tree-ssa/pr21417.c with the fix for
> PR72772 where after it a forwarder block no longer is present.
> It's easy to simply create it when FSM threading faces the situation
> that the edge ending the path enters a loop.
> 
> I also fixed the costs for obviously related anon SSA names (when
> they are the same).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, I'll apply
> if that finishes successfully.

Ok, so that re-introduces the c-c++-common/ubsan/pr71403-2.c miscompile
where we mash two loops retaining a bogus loop->nb_iterations_upper_bound.

I think those two testcases show that it should be the dominance
relation between the entered loop header and the threaded path
that decides whether we can thread or not -- the path may not
become another latch of this loop.  I can't seem to figure out
a correct condition for this based on the threading path but the
following works (maybe by accident as I only have those two
testcases):

Index: gcc/tree-ssa-threadbackward.c
===
--- gcc/tree-ssa-threadbackward.c   (revision 239164)
+++ gcc/tree-ssa-threadbackward.c   (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
+#include "cfghooks.h"
 
 static int max_threaded_paths;
 
@@ -560,6 +562,14 @@ fsm_find_control_statement_thread_paths
  edge taken_edge = profitable_jump_thread_path (path, bbi, name, 
arg);
  if (taken_edge)
{
+ /* If the taken edge is a loop entry avoid mashing two
+loops into one with multiple latches by splitting
+the edge.  This only works if that block won't become
+a latch of this loop.  */
+ if ((bb_loop_depth (taken_edge->src)
+  < bb_loop_depth (taken_edge->dest))
+ && ! single_succ_p (bbi))
+   split_edge (taken_edge);
  if (bb_loop_depth (taken_edge->src)
  >= bb_loop_depth (taken_edge->dest))
convert_and_register_jump_thread_path (path, taken_edge);

note you need the PR72772 fix to trigger all this.

Any idea?

Thanks,
Richard.

> Richard.
> 
> 2016-08-05  Richard Biener  
> 
>   * tree-ssa-threadbackward.c: Include cfghooks.h.
>   (profitable_jump_thread_path): Treat same SSA names related.
>   (fsm_find_control_statement_thread_paths): When the taken edge
>   enters a loop split it instead of giving up.
> 
> Index: gcc/tree-ssa-threadbackward.c
> ===
> --- gcc/tree-ssa-threadbackward.c (revision 239164)
> +++ gcc/tree-ssa-threadbackward.c (working copy)
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-pass.h"
>  #include "gimple-ssa.h"
>  #include "tree-phinodes.h"
> +#include "cfghooks.h"
>  
>  static int max_threaded_paths;
>  
> @@ -206,8 +207,9 @@ profitable_jump_thread_path (vec /* Note that if both NAME and DST are anonymous
>SSA_NAMEs, then we do not have enough information
>to consider them associated.  */
> -   if ((SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
> -|| !SSA_NAME_VAR (dst))
> +   if (dst != name
> +   && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
> +   || !SSA_NAME_VAR (dst))
> && !virtual_operand_p (dst))
>   ++n_insns;
>   }
> @@ -560,9 +562,13 @@ fsm_find_control_statement_thread_paths
> edge taken_edge = profitable_jump_thread_path (path, bbi, name, arg);
> if (taken_edge)
>   {
> +   /* If the taken edge is a loop entry avoid mashing two
> +  loops into one with multiple latches by splitting
> +  the edge.  */
> if (bb_loop_depth (taken_edge->src)
> -   >= bb_loop_depth (taken_edge->dest))
> - convert_and_register_jump_thread_path (path, taken_edge);
> +   < bb_loop_depth (taken_edge->dest))
> + split_edge (taken_edge);
> +   convert_and_register_jump_thread_path (path, taken_edge);
> path->pop ();
>   }
>   }
> @@ -586,9 +592,13 @@ fsm_find_control_statement_thread_paths
>name, arg);
> if (taken_edge)
>   {
> +   /* If the taken edge is a loop entry avoid mashing two
> +  loops into one with multiple latches by splitting
> +  the edge.  */
> if (bb_loop_depth (taken_edge->src)
> -   >= bb_loop_depth (taken_edge->dest))
> - convert_and_register_jump_thread_path (path, taken_edge);
> +   < bb_loop_depth 

Re: [PATCH][COMMITTED] Revert r238497 because of PR 71961.

2016-08-05 Thread Renlin Li

Hi Joost,

I am not familiar with fortran code.
Maybe Thomas can do something in his new patch?

Regards,
Renlin

On 28/07/16 12:34, VandeVondele  Joost wrote:

Thanks.. I wonder if you could add the testcase in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961#c11

to the testsuite, as it catches the underlying issue.

Regards,

Joost VandeVondele



Re: [PING**2] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-08-05 Thread Bernd Edlinger
On 08/04/16 22:27, Jeff Law wrote:
> On 07/21/2016 10:29 AM, Bernd Edlinger wrote:
 But I think we have a use case where "X" means really more possible
 registers (i.e. includes ss2, mmx etc.) than "g" (only general
 registers).  Otherwise, in the test cases of pr59155 we would not
 have any benefit for using "+X" instead of "+g" or "+r".

 Does that sound reasonable?
>>> If it's the case that the real benefit of +X is that it's allowing more
>>> registers, then that argues that the backend ought to be providing
>>> another (larger) register class.
>>>
>>
>> X allows more different registers than r, and it is already documented.
>> In the cases where it is already used, the patch should not break
>> anything.  I would not understand, why we should forbid the use of X and
>> waste another letter of the alphabet for a slightly modified version
>> of X.
> Doing so essentially allows us to deprecate "X" to used by target
> patterns only -- where what's acceptable is limited by the operand
> predicates.  Those limits ultimately protect the rest of the routines
> from having to handle arbitrary RTL.
>
> Meanwhile asms can use the new letter to say "I'll take any register of
> any class".  Which is, AFAICT, what's desired here.
>
> jeff

Yes.  To be useful it should be a target independent letter.

While "g" implies a general register of class GENERAL_REGS
"X" implies any register of class ALL_REGS.

I have looked for uses of "X" and actually found some of them in glibc:

./sysdeps/powerpc/powerpc64/dl-machine.h:

   /* GCC 4.9+ eliminates the branch as dead code, force the odp set
  dependency.  */
   asm ("" : "=r" (value) : "0" (), "X" (opd));

./sysdeps/mach/hurd/i386/init-first.c:

   *--newsp = *((int *) __builtin_frame_address (0) + 1);
   /* GCC 4.4.6 also wants us to force loading *NEWSP already here.  */
   asm volatile ("# %0" : : "X" (*newsp));


and same file:

   usercode = *((int *) __builtin_frame_address (0) + 1);
   /* GCC 4.4.6 also wants us to force loading USERCODE already 
here.  */
   asm volatile ("# %0" : : "X" (usercode));


So in is mostly used for obfuscating the data flow.

The documentation of "g" at md.texi says

@cindex @samp{g} in constraint
@item @samp{g}
Any register, memory or immediate integer operand is allowed, except for
registers that are not general registers.

and "X" says:

@ifset INTERNALS
Any operand whatsoever is allowed, even if it does not satisfy
@code{general_operand}.  This is normally used in the constraint of
a @code{match_scratch} when certain alternatives will not actually
require a scratch register.
@end ifset
@ifclear INTERNALS
Any operand whatsoever is allowed.
@end ifclear

The part ifset INTERNALS describes the rules for target patterns,
while the ifclear INTERNALS part describes the rules for asms.

This is exactly what we want.  Not saying "except for
registers that are not general registers" is a hint that there
are more registers in the ALL_REGS class.  We could make it more
explicit by adding "Including registers that are not general
registers".

And "whatsoever" means anything you can write down at the source code
level IMO.


So I think restricting "X" in asms to this definition while keeping
the current meaning of "X" in target patterns is consistent with the
current documentation and compatible to the current uses of the
"X" constraint elsewhere.


Bernd.


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-08-05 Thread Yuri Rumyantsev
Richard,

Here is updated patch which implements your proposal - I pass loop
instead of stmt to determine either REF is defined inside LOOP nest or
not. I checked that for pr70729-nest.cc the reference this->S_n  for
statements which are out of innermost loop are  not considered as
independent as you pointed out.

Regression testing did not show any new failures and both failed tests
from libgomp.fortran suite now passed.

Is it OK for trunk?
ChangeLog:
2016-08-05  Yuri Rumyantsev  

PR tree-optimization/71734
* tree-ssa-loop-im.c (ref_indep_loop_p): Add new argument REF_LOOP.
(outermost_indep_loop): Pass LOOP argumnet where REF was defined to
ref_indep_loop_p.
(ref_indep_loop_p_1): Fix commentary.
(ref_indep_loop_p_2): Add additional argument REF_LOOP, introduce new
variable NEW_SAFELEN which may have new value for SAFELEN, ignore
dependencde for loop having greater safelen value, pass REF_LOOP in
recursive call.
(can_sm_ref_p): Pass LOOP as additional argument to ref_indep_loop_p.

2016-08-03 16:44 GMT+03:00 Richard Biener :
> On Fri, Jul 29, 2016 at 4:00 PM, Yuri Rumyantsev  wrote:
>> Hi Richard.
>>
>> It turned out that the fix proposed by you does not work for liggomp
>> tests simd3 and simd4.
>> The reason is that we can't change safelen value for references not
>> defined inside loop. So I add missed check on it to patch.
>> Is it OK for trunk?
>
> Hmm, I don't like the walk of all subloops in ref_defined_in_loop_p as
> that operation can end up being quadratic in the loop depth/width.
>
> But I also wonder about correctness given that LIM "commons"
> references.  So we can have
>
>   for (;;)
> .. = ref;  (1)
> for (;;) // safelen == 2  (2)
>   ... = ref;
>
> and when looking at the ref at (1) which according to you should not
> have safelen applied your function will happily return that ref is defined
> in the inner loop.
>
> So it looks like to be able to apply safelen the caller of ref_indep_loop_p
> needs to pass down a ref plus a location (a stmt).  In which case your
> function can simply use flow_loop_nested_p (loop, gimple_bb
> (stmt)->loop_father);
>
> Richard.
>
>> ChangeLog:
>> 2016-07-29  Yuri Rumyantsev  
>>
>> PR tree-optimization/71734
>> * tree-ssa-loop-im.c (ref_defined_in_loop_p): New function.
>> (ref_indep_loop_p_2): Change SAFELEN value for REF defined inside LOOP.
>>
>> 2016-07-29 13:08 GMT+03:00 Yuri Rumyantsev :
>>> Sorry H.J.
>>>
>>> I checked both these tests manually but forgot to pass "-fopenmp" option.
>>> I will fix the issue asap.
>>>
>>> 2016-07-29 0:33 GMT+03:00 H.J. Lu :
 On Thu, Jul 28, 2016 at 6:49 AM, Yuri Rumyantsev  
 wrote:
> Richard,
>
> I prepare a patch which is based on yours. New test is also included.
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?
>
> Thanks.
> ChangeLog:
> 2016-07-28  Yuri Rumyantsev  
>
> PR tree-optimization/71734
> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Pass value of safelen
> attribute instead of REF_LOOP and use it.
> (ref_indep_loop_p_2): Use SAFELEN argument instead of REF_LOOP and
> set it for Loops having non-zero safelen attribute.
> (ref_indep_loop_p): Pass zero as initial value for safelen.
> gcc/testsuite/ChangeLog:
> * g++.dg/vect/pr70729-nest.cc: New test.
>

 Does this cause

 FAIL: libgomp.fortran/pr71734-1.f90   -O3 -fomit-frame-pointer
 -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
 test
 FAIL: libgomp.fortran/pr71734-1.f90   -O3 -g  execution test
 FAIL: libgomp.fortran/pr71734-2.f90   -O3 -fomit-frame-pointer
 -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
 test
 FAIL: libgomp.fortran/pr71734-2.f90   -O3 -g  execution test

 on AVX machines and

 FAIL: libgomp.fortran/simd3.f90   -O3 -fomit-frame-pointer
 -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
 test
 FAIL: libgomp.fortran/simd3.f90   -O3 -g  execution test
 FAIL: libgomp.fortran/simd4.f90   -O3 -fomit-frame-pointer
 -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
 test
 FAIL: libgomp.fortran/simd4.f90   -O3 -g  execution test

 on non-AVX machines?

 --
 H.J.


71734.patch.4
Description: Binary data


Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-05 Thread Nathan Sidwell

On 08/05/16 08:48, Martin Liška wrote:


Ok, after all the experimenting and inventing "almost" thread-safe code, I 
incline to not to include __gcov_one_value_profiler_body_atomic
counter. The final solution is cumbersome and probably does not worth doing. 
Moreover, even having a thread-safe implementation, result of an indirect call 
counter
is not going to be stable among different runs (due to a single value storage 
capability).

If you agree, I'll prepare a final version of patch?


Agreed.

nathan



[PATCH] Fix PR tree-optimization/72810

2016-08-05 Thread Patrick Palka
Sometimes the case labels of a switch have a different type than the
switch operand, e.g. in the test case below, the case labels have type t
and the switch operand has type int. Because the verifier expects that
case labels of a switch each have the exact same type, it's important to
avoid changing their types when truncating their range.

Does this patch look OK to commit after bootstrap + regtesting?

gcc/ChangeLog:

PR tree-optimization/72810
* tree-vrp.c (simplify_switch_using_ranges): Avoid changing
the types of the case labels when truncating.

gcc/testsuite/ChangeLog:

PR tree-optimization/72810
* tree-ssa/vrp110.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp110.c | 24 +++
 gcc/tree-vrp.c | 43 +++---
 2 files changed, 48 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp110.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c
new file mode 100644
index 000..34d7eb4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp110.c
@@ -0,0 +1,24 @@
+/* { dg-options "-O2" }  */
+
+extern void foo (void);
+extern void bar (void);
+
+void
+test (int i)
+{
+  if (i == 1)
+return;
+
+  typedef int t;
+  t j = i;
+  switch (j)
+{
+case 1:
+case 2:
+  foo ();
+  break;
+case 7:
+  bar ();
+  break;
+}
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 7e3b513..2c166db 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9643,61 +9643,66 @@ simplify_switch_using_ranges (gswitch *stmt)
   tree min_label = gimple_switch_label (stmt, min_idx);
   tree max_label = gimple_switch_label (stmt, max_idx);
 
+  /* Avoid changing the types of the case labels when truncating.  */
+  tree case_label_type = TREE_TYPE (CASE_LOW (min_label));
+  tree vr_min = fold_convert (case_label_type, vr->min);
+  tree vr_max = fold_convert (case_label_type, vr->max);
+
   if (vr->type == VR_RANGE)
{
  /* If OP's value range is [2,8] and the low label range is
 0 ... 3, truncate the label's range to 2 .. 3.  */
- if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) < 0
+ if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) < 0
  && CASE_HIGH (min_label) != NULL_TREE
- && tree_int_cst_compare (CASE_HIGH (min_label), vr->min) >= 0)
-   CASE_LOW (min_label) = vr->min;
+ && tree_int_cst_compare (CASE_HIGH (min_label), vr_min) >= 0)
+   CASE_LOW (min_label) = vr_min;
 
  /* If OP's value range is [2,8] and the high label range is
 7 ... 10, truncate the label's range to 7 .. 8.  */
- if (tree_int_cst_compare (CASE_LOW (max_label), vr->max) <= 0
+ if (tree_int_cst_compare (CASE_LOW (max_label), vr_max) <= 0
  && CASE_HIGH (max_label) != NULL_TREE
- && tree_int_cst_compare (CASE_HIGH (max_label), vr->max) > 0)
-   CASE_HIGH (max_label) = vr->max;
+ && tree_int_cst_compare (CASE_HIGH (max_label), vr_max) > 0)
+   CASE_HIGH (max_label) = vr_max;
}
   else if (vr->type == VR_ANTI_RANGE)
{
- tree one_cst = build_one_cst (TREE_TYPE (op));
+ tree one_cst = build_one_cst (case_label_type);
 
  if (min_label == max_label)
{
  /* If OP's value range is ~[7,8] and the label's range is
 7 ... 10, truncate the label's range to 9 ... 10.  */
- if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) == 0
+ if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) == 0
  && CASE_HIGH (min_label) != NULL_TREE
- && tree_int_cst_compare (CASE_HIGH (min_label), vr->max) > 0)
+ && tree_int_cst_compare (CASE_HIGH (min_label), vr_max) > 0)
CASE_LOW (min_label)
- = int_const_binop (PLUS_EXPR, vr->max, one_cst);
+ = int_const_binop (PLUS_EXPR, vr_max, one_cst);
 
  /* If OP's value range is ~[7,8] and the label's range is
 5 ... 8, truncate the label's range to 5 ... 6.  */
- if (tree_int_cst_compare (CASE_LOW (min_label), vr->min) < 0
+ if (tree_int_cst_compare (CASE_LOW (min_label), vr_min) < 0
  && CASE_HIGH (min_label) != NULL_TREE
- && tree_int_cst_compare (CASE_HIGH (min_label), vr->max) == 0)
+ && tree_int_cst_compare (CASE_HIGH (min_label), vr_max) == 0)
CASE_HIGH (min_label)
- = int_const_binop (MINUS_EXPR, vr->min, one_cst);
+ = int_const_binop (MINUS_EXPR, vr_min, one_cst);
}
  else
{
  /* If OP's value range is ~[2,8] and the low label range is
 0 ... 3, truncate the label's range to 0 ... 1.  */
-

Re: [PATCH, ARM] Add a new target hook to compute the frame layout

2016-08-05 Thread Bernd Edlinger
On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
> On 04/08/16 22:16, Bernd Edlinger wrote:
>> Hi,
>>
>> this patch introduces a new target hook that allows the target's
>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>> re-computing the frame layout every time.
>>
>> I have updated the documentation a bit and hope it is clearer this time.
>>
>> It still needs a review by ARM port maintainers.
>>
>> If the ARM port maintainers find this patch useful, that would be fine.
>>
>
> I need to look into this more, but my first thought was that the
> documentation is confusing, or there is a real problem in this patch.
>

Thanks for your quick response.

The documentation is actually the most difficult part for me.

> As I understand it the frame has to be re-laid out each time the
> contents of the frame changes (an extra register becomes live or another
> spill slot is needed).  So saying that it is laid out once can't be
> right if (as I read it initially) you mean 'once per function' since I
> think it needs to be 'once for each time the frame contents changes'.
>
> Of course, things might be a bit different with LRA compared with
> reload, but I strongly suspect this is never going to be an 'exactly
> once per function' operation.
>

Right.  It will be done 2 or 3 times for each function.
LRA and reload behave identical in that respect.

But each time reload changes something in the input data the
INITIAL_EMIMINATION_OFFSET is called several times, and the results
have to be consistent in each iteration.

The frame layout function has no way to know if the frame layout
is expected to change or not.

Many targets use reload_completed as an indication when the frame layout
may not change at all, but that is only an approximation.

> Can you clarify your meaning in the documentation please?
>

I meant 'once' in the sense of 'once for each time the frame contents 
change'.

Thus I'd change that sentence to:

"This target hook allows the target to compute the frame layout once for
each time the frame contents change and make use of the cached frame
layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
on every invocation.  This is particularly useful for targets that have
an expensive frame layout function.  Implementing this callback is
optional."


Thanks
Bernd.


> R.
>
>>
>> Thanks
>> Bernd.
>>
>> On 06/21/16 23:29, Jeff Law wrote:
>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
 Hi!


 By the design of the target hook INITIAL_ELIMINATION_OFFSET
 it is necessary to call this function several times with
 different register combinations.
 Most targets use a cached data structure that describes the
 exact frame layout of the current function.

 It is safe to skip the computation when reload_completed = true,
 and most targets do that already.

 However while reload is doing its work, it is not clear when to
 do the computation and when not.  This results in unnecessary
 work.  Computing the frame layout can be a simple function or an
 arbitrarily complex one, that walks all instructions of the current
 function for instance, which is more or less the common case.


 This patch adds a new optional target hook that can be used
 by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
 into a O(n) computation part, and a O(1) result function.

 The patch implements a compute_frame_layout target hook just
 for ARM in the moment, to show the principle.
 Other targets may also implement that hook, if it seems appropriate.


 Boot-strapped and reg-tested on arm-linux-gnueabihf.
 OK for trunk?


 Thanks
 Bernd.


 changelog-frame-layout.txt


 2016-06-16  Bernd Edlinger  

  * target.def (compute_frame_layout): New optional target hook.
  * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
  * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
  * lra-eliminations.c (update_reg_eliminate): Call
 compute_frame_layout
  target hook.
  * reload1.c (verify_initial_elim_offsets): Likewise.
  * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
  (use_simple_return_p): Call arm_compute_frame_layout if needed.
  (arm_get_frame_offsets): Split up into this ...
  (arm_compute_frame_layout): ... and this function.
>>> The ARM maintainers would need to chime in on the ARM specific changes
>>> though.
>>>
>>>
>>>
 Index: gcc/target.def
 ===
 --- gcc/target.def(Revision 233176)
 +++ gcc/target.def(Arbeitskopie)
 @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
unsigned int, (void),
default_case_values_threshold)

 -/* Retutn true if a function must have and 

Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-05 Thread Martin Liška
On 08/05/2016 02:38 PM, Nathan Sidwell wrote:
> On 08/05/16 04:55, Martin Liška wrote:
> 
>> Thank you for very intensive brainstorming ;) Well I still believe that 
>> following code
>> is not thread safe, let's image following situation:
>>
> 
> yeah, you're right.
> 
>>> we could do better by using compare_exchange storing value, and detect the 
>>> race I mentioned:
>>>
>>> gcov_t expected, val;
>>> atomic_load ([0],  , ...);
>>
>> [thread 1]: value == 1, read val == 1 // scheduled here
>>
>>> gcov_t delta = val == value ? 1 : -1;
>>> atomic_add ([1], delta);
>>> if (delta < 0) {
>>>retry:
>>> /* can we set counter[0]? */
>>> atomic_load ([1], , ...);
>>> if (expected < 0) {
>>>   bool stored = atomic_compare_exchange ([0], , , 
>>> ...);
>>>   if (!stored && val != value)
>>> goto retry;
>> [thread 2]: value == 2, just updated counter[0] to 2
>> // after that [thread 1] continue, but wrongly does counter[1]++, but value 
>> != counter[0]
>>>   atomic_add ([1], 2, ...);
> 
> Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 
> handling to store a count of 1 solve it?: (answer: no)
> 
> gcov_t expected, val;
> A:atomic_load ([0],  , ...);
> gcov_t delta = val == value ? 1 : -1;
> B:atomic_add ([1], delta);
> 
> if (delta < 0) {
>  /* can we set counter[0]? */
>  C:atomic_load ([1], , ...);
>  if (expected < 0) {
>D:atomic_store ([0], );
>E: atomic_store ([1], 1);
>   }
> atomic_add ([1], 2, ...);
> 
> 
> thread-1: value = 1, reads '1' at A
> thread-2: value = 2, reads '1' at A
> thread-2: decrements count @ B
> thread-2: reads -1 at C
> thread-2: write 2 at D
> thread-2: stores 1 at E
> thread-1: increments count @ B (finally)
> 
> So we still can go awry.  But the code's simpler.  Like you said, I don't 
> think it's possible to solve without an atomic update to both counter[0] & 
> counter[1].
> 
> 
>> Well, I wrote attached test-case which should trigger a data-race, but TSAN 
>> is silent:
> 
> I'm not too surprised.  The race window is tiny and you put a printf in the 
> middle of one path.  I suspect if you put a sleep/printf on the counter[1] 
> increment path you'll see it more often.
> 
> nathan

Ok, after all the experimenting and inventing "almost" thread-safe code, I 
incline to not to include __gcov_one_value_profiler_body_atomic
counter. The final solution is cumbersome and probably does not worth doing. 
Moreover, even having a thread-safe implementation, result of an indirect call 
counter
is not going to be stable among different runs (due to a single value storage 
capability).

If you agree, I'll prepare a final version of patch?

Thanks,
Martin


Re: [PATCH] gcov: rename line_next to next_file_fn in function_info

2016-08-05 Thread Nathan Sidwell

On 08/05/16 08:22, Martin Liška wrote:

Hello.

This corrects a record field name where comment does not correspond to it's 
name.
make check -k RUNTESTFLAGS="gcov.exp" works fine.

Ready to install?


heh, 'line_next' is perfectly good name for the next function.  No, wait, it's 
stupid!


your patch is fine, thanks!

nathan



Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-05 Thread Nathan Sidwell

On 08/05/16 04:55, Martin Liška wrote:


Thank you for very intensive brainstorming ;) Well I still believe that 
following code
is not thread safe, let's image following situation:



yeah, you're right.


we could do better by using compare_exchange storing value, and detect the race 
I mentioned:

gcov_t expected, val;
atomic_load ([0],  , ...);


[thread 1]: value == 1, read val == 1 // scheduled here


gcov_t delta = val == value ? 1 : -1;
atomic_add ([1], delta);
if (delta < 0) {
   retry:
/* can we set counter[0]? */
atomic_load ([1], , ...);
if (expected < 0) {
  bool stored = atomic_compare_exchange ([0], , , ...);
  if (!stored && val != value)
goto retry;

[thread 2]: value == 2, just updated counter[0] to 2
// after that [thread 1] continue, but wrongly does counter[1]++, but value != 
counter[0]

  atomic_add ([1], 2, ...);


Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 
handling to store a count of 1 solve it?: (answer: no)


gcov_t expected, val;
A:atomic_load ([0],  , ...);
gcov_t delta = val == value ? 1 : -1;
B:atomic_add ([1], delta);

if (delta < 0) {
 /* can we set counter[0]? */
 C:atomic_load ([1], , ...);
 if (expected < 0) {
   D:atomic_store ([0], );
   E: atomic_store ([1], 1);
  }
atomic_add ([1], 2, ...);


thread-1: value = 1, reads '1' at A
thread-2: value = 2, reads '1' at A
thread-2: decrements count @ B
thread-2: reads -1 at C
thread-2: write 2 at D
thread-2: stores 1 at E
thread-1: increments count @ B (finally)

So we still can go awry.  But the code's simpler.  Like you said, I don't think 
it's possible to solve without an atomic update to both counter[0] & counter[1].




Well, I wrote attached test-case which should trigger a data-race, but TSAN is 
silent:


I'm not too surprised.  The race window is tiny and you put a printf in the 
middle of one path.  I suspect if you put a sleep/printf on the counter[1] 
increment path you'll see it more often.


nathan


Re: [RFC] ipa bitwise constant propagation

2016-08-05 Thread Martin Jambor
Hi,

generally speaking, the ipa-cp.c and ipa-cp.[hc] bits look reasonable,
but I have a few comments:

On Thu, Aug 04, 2016 at 12:06:18PM +0530, Prathamesh Kulkarni wrote:
> Hi,
> This is a prototype patch for propagating known/unknown bits 
> inter-procedurally.
> for integral types which propagates info obtained from get_nonzero_bits ().
> 
> Patch required making following changes:
> a) To make info from get_nonzero_bits() available to ipa

in IPA phase, you should not be looking at SSA_NAMEs, those will not
be available with LTO when you do not have access to function bodies
and thus also not to SSA_NAMES.  In IPA, you should only rely on hat
you have in jump functions.

> , I had to remove
> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
> in get_ptr_info() for default_none.f95 (and several other fortran tests)
> with options: -fopenacc -O2
> ICE: http://pastebin.com/KjD7HMQi
> I confirmed with Richard that this was a latent issue.
> 
> 
> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?

Yes, definitely.

> I haven't yet gated it on the flag, will do in next version of patch.
> I have added some very simple test-cases, I will try to add more
> meaningful ones.


> 
> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
> of some fortran tests failing due to above ICE.
> 
> As next steps, I am planning to extend it to handle alignment propagation,
> and do further testing (lto-bootstrap, chromium).
> I would be grateful for feedback on the current patch.
> 
> Thanks,
> Prathamesh

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 5b6cb9a..b770f6a 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -120,6 +120,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
> +#include "tree-ssa-ccp.h"
>  
>  template  class ipcp_value;
>  
> @@ -266,6 +267,40 @@ private:
>bool meet_with_1 (unsigned new_align, unsigned new_misalign);
>  };
>  
> +/* Lattice of known bits, only capable of holding one value.
> +   Similar to ccp_prop_value_t, mask represents which bits of value are 
> constant.
> +   If a bit in mask is set to 0, then the corresponding bit in
> +   value is known to be constant.  */
> +
> +class ipcp_bits_lattice
> +{
> +public:
> +  bool bottom_p () { return lattice_val == IPA_BITS_VARYING; }
> +  bool top_p () { return lattice_val == IPA_BITS_UNDEFINED; }
> +  bool constant_p () { return lattice_val == IPA_BITS_CONSTANT; }
> +  bool set_to_bottom ();
> +  bool set_to_constant (widest_int, widest_int, signop, unsigned);
> + 
> +  widest_int get_value () { return value; }
> +  widest_int get_mask () { return mask; }
> +  signop get_sign () { return sgn; }
> +  unsigned get_precision () { return precision; }
> +
> +  bool meet_with (ipcp_bits_lattice& other, enum tree_code, tree);
> +  bool meet_with (widest_int, widest_int, signop, unsigned);
> +
> +  void print (FILE *);
> +
> +private:
> +  enum { IPA_BITS_UNDEFINED, IPA_BITS_CONSTANT, IPA_BITS_VARYING } 
> lattice_val;
> +  widest_int value, mask;
> +  signop sgn;
> +  unsigned precision;
> +
> +  bool meet_with_1 (widest_int, widest_int); 
> +  void get_value_and_mask (tree, widest_int *, widest_int *);
> +}; 
> +
>  /* Structure containing lattices for a parameter itself and for pieces of
> aggregates that are passed in the parameter or by a reference in a 
> parameter
> plus some other useful flags.  */
> @@ -281,6 +316,8 @@ public:
>ipcp_agg_lattice *aggs;
>/* Lattice describing known alignment.  */
>ipcp_alignment_lattice alignment;
> +  /* Lattice describing known bits.  */
> +  ipcp_bits_lattice bits_lattice;
>/* Number of aggregate lattices */
>int aggs_count;
>/* True if aggregate data were passed by reference (as opposed to by
> @@ -458,6 +495,21 @@ ipcp_alignment_lattice::print (FILE * f)
>  fprintf (f, " Alignment %u, misalignment %u\n", align, misalign);
>  }
>  
> +void
> +ipcp_bits_lattice::print (FILE *f)
> +{
> +  if (top_p ())
> +fprintf (f, " Bits unknown (TOP)\n");
> +  else if (bottom_p ())
> +fprintf (f, " Bits unusable (BOTTOM)\n");
> +  else
> +{
> +  fprintf (f, " Bits: value = "); print_hex (get_value (), f);
> +  fprintf (f, ", mask = "); print_hex (get_mask (), f);
> +  fprintf (f, "\n");
> +}
> +}
> +
>  /* Print all ipcp_lattices of all functions to F.  */
>  
>  static void
> @@ -484,6 +536,7 @@ print_all_lattices (FILE * f, bool dump_sources, bool 
> dump_benefits)
> fprintf (f, " ctxs: ");
> plats->ctxlat.print (f, dump_sources, dump_benefits);
> plats->alignment.print (f);
> +   plats->bits_lattice.print (f);
> if (plats->virt_call)
>   fprintf (f, "virt_call flag set\n");
>  
> @@ -911,6 +964,161 @@ ipcp_alignment_lattice::meet_with (const 
> 

Re: [PATCH] gcov tool: Implement Hawick's algorithm for cycle detection, (PR gcov-profile/67992)

2016-08-05 Thread Nathan Sidwell

On 08/05/16 03:32, Martin Liška wrote:

On 08/05/2016 09:30 AM, Martin Liška wrote:

Hi.

Sorry for the mistake with the enum, that was silly ;)
New patch version also handles the unnecessary braces.

Martin


I attached a wrong patch, sending the new one.


This one looks good, thanks!


nathan


[C++/68724] ICE with TRAIT_EXPR

2016-08-05 Thread Nathan Sidwell
I've committed this patch to fix a failed assertion.  unify's (pt.c) default 
case asserts EXPR_P(parm), but TRAIT_EXPR is tcc_exceptional, so fails.  However 
from unify's POV it's no more exceptional than, say, SIZEOF_EXPR, so should be 
allowed at that point.


nathan
2016-08-05  Nathan Sidwell  

	PR c++/68724
	* pt.c (unify): TRAIT_EXPR is an expr.

	PR c++/68724
	* g++.dg/cpp0x/pr68724.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 239166)
+++ cp/pt.c	(working copy)
@@ -20269,7 +20269,7 @@ unify (tree tparms, tree targs, tree par
   /* An unresolved overload is a nondeduced context.  */
   if (is_overloaded_fn (parm) || type_unknown_p (parm))
 	return unify_success (explain_p);
-  gcc_assert (EXPR_P (parm));
+  gcc_assert (EXPR_P (parm) || TREE_CODE (parm) == TRAIT_EXPR);
 expr:
   /* We must be looking at an expression.  This can happen with
 	 something like:
Index: testsuite/g++.dg/cpp0x/pr68724.C
===
--- testsuite/g++.dg/cpp0x/pr68724.C	(nonexistent)
+++ testsuite/g++.dg/cpp0x/pr68724.C	(working copy)
@@ -0,0 +1,15 @@
+// PR 68724 ICE in  unificiation
+// { dg-do compile { target c++11 } }
+
+template 
+struct integral_constant
+{
+};
+
+integral_constant inst;
+
+template 
+struct integral_constant // { dg-error "" }
+{
+};
+


[PATCH] gcov: rename line_next to next_file_fn in function_info

2016-08-05 Thread Martin Liška
Hello.

This corrects a record field name where comment does not correspond to it's 
name.
make check -k RUNTESTFLAGS="gcov.exp" works fine.

Ready to install?
Martin
>From 1733e26c619e65ef95604f04f026be0006e9c4e0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 5 Aug 2016 14:18:47 +0200
Subject: [PATCH] gcov: rename line_next to next_file_fn in function_info

gcc/ChangeLog:

2016-08-05  Martin Liska  

	* gcov.c (output_intermediate_file): Rename
	function_info::line_next to next_file_fn.
	(process_file): Likewise.
	(read_graph_file): Likewise.
	(accumulate_line_counts): Likewise.
	(output_lines): Likewise.
---
 gcc/gcov.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 417b4f4..e1b3372 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -194,7 +194,7 @@ typedef struct function_info
   unsigned src;
 
   /* Next function in same source file.  */
-  struct function_info *line_next;
+  struct function_info *next_file_fn;
 
   /* Next function.  */
   struct function_info *next;
@@ -667,7 +667,7 @@ output_intermediate_file (FILE *gcov_file, source_t *src)
 
   fprintf (gcov_file, "file:%s\n", src->name);/* source file name */
 
-  for (fn = src->functions; fn; fn = fn->line_next)
+  for (fn = src->functions; fn; fn = fn->next_file_fn)
 {
   /* function:,, */
   fprintf (gcov_file, "function:%d,%s,%s\n", fn->line,
@@ -737,10 +737,10 @@ process_file (const char *file_name)
 	 ascending order, so a simple scan is quick.  Note we're
 	 building this list in reverse order.  */
 	  for (prev = [src].functions;
-	   (probe = *prev); prev = >line_next)
+	   (probe = *prev); prev = >next_file_fn)
 	if (probe->line <= line)
 	  break;
-	  fn->line_next = probe;
+	  fn->next_file_fn = probe;
 	  *prev = fn;
 
 	  /* Mark last line in files touched by function.  */
@@ -1222,7 +1222,7 @@ read_graph_file (void)
 	  fn->src = src_idx;
 	  fn->line = lineno;
 
-	  fn->line_next = NULL;
+	  fn->next_file_fn = NULL;
 	  fn->next = NULL;
 	  *fns_end = fn;
 	  fns_end = >next;
@@ -2155,8 +2155,8 @@ accumulate_line_counts (source_t *src)
   for (fn = src->functions, fn_p = NULL; fn;
fn_p = fn, fn = fn_n)
 {
-  fn_n = fn->line_next;
-  fn->line_next = fn_p;
+  fn_n = fn->next_file_fn;
+  fn->next_file_fn = fn_p;
 }
   src->functions = fn_p;
 
@@ -2433,7 +2433,7 @@ output_lines (FILE *gcov_file, const source_t *src)
   for (line_num = 1, line = >lines[line_num];
line_num < src->num_lines; line_num++, line++)
 {
-  for (; fn && fn->line == line_num; fn = fn->line_next)
+  for (; fn && fn->line == line_num; fn = fn->next_file_fn)
 	{
 	  arc_t *arc = fn->blocks[EXIT_BLOCK].pred;
 	  gcov_type return_count = fn->blocks[EXIT_BLOCK].count;
-- 
2.9.2



[RFT PATCH, i386]: Optimize zero-extensions from mask registers

2016-08-05 Thread Uros Bizjak
Hello!

Attached patch was inspired by assembly from PR 72805 testcase.
Currently, the compiler generates:

test:
vpternlogd  $0xFF, %zmm0, %zmm0, %zmm0
vpxord  %zmm1, %zmm1, %zmm1
vpcmpd  $1, %zmm1, %zmm0, %k1
kmovw   %k1, %eax
movzwl  %ax, %eax
ret

Please note that kmovw already zero-extended from a mask register.

Attached patch allows ree pass to propagate mask registers to zext
insn patterns, resulting in:

test:
vpternlogd  $0xFF, %zmm0, %zmm0, %zmm0  # 24
movv16si_internal/2 [length = 6]
vpxord  %zmm1, %zmm1, %zmm1 # 25movv16si_internal/1
 [length = 6]
vpcmpd  $1, %zmm1, %zmm0, %k1   # 13avx512f_cmpv16si3
 [length = 7]
kmovw   %k1, %eax   # 27*zero_extendhisi2/2 [length = 4]
ret # 30simple_return_internal  [length = 1]

2016-08-05  Uros Bizjak  

* config/i386/i386.md (*zero_extendsidi2): Add (*r,*k) alternative.
(zero_extenddi2): Ditto.
(*zero_extendsi2): Ditto.
(*zero_extendqihi2): Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

The patch is in RFT state, since I have no means to test AVX512 stuff.
Kirill, can someone from Intel please test the patch?

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 239166)
+++ config/i386/i386.md (working copy)
@@ -3688,10 +3688,10 @@
 
 (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-   "=r,?r,?o,r   ,o,?*Ym,?!*y,?r ,?r,?*Yi,?*x")
+   "=r,?r,?o,r   ,o,?*Ym,?!*y,?r ,?r,?*Yi,?*x,*r")
(zero_extend:DI
 (match_operand:SI 1 "x86_64_zext_operand"
-   "0 ,rm,r ,rmWz,0,r   ,m   ,*Yj,*x,r   ,m")))]
+   "0 ,rm,r ,rmWz,0,r   ,m   ,*Yj,*x,r   ,m  ,*k")))]
   ""
 {
   switch (get_attr_type (insn))
@@ -3717,6 +3717,9 @@
 
   return "%vmovd\t{%1, %0|%0, %1}";
 
+case TYPE_MSKMOV:
+  return "kmovd\t{%1, %k0|%k0, %1}";
+
 default:
   gcc_unreachable ();
 }
@@ -3724,7 +3727,7 @@
   [(set (attr "isa")
  (cond [(eq_attr "alternative" "0,1,2")
  (const_string "nox64")
-   (eq_attr "alternative" "3,7")
+   (eq_attr "alternative" "3,7,11")
  (const_string "x64")
(eq_attr "alternative" "8")
  (const_string "x64_sse4")
@@ -3741,6 +3744,8 @@
  (const_string "ssemov")
(eq_attr "alternative" "8")
  (const_string "sselog1")
+   (eq_attr "alternative" "11")
+ (const_string "mskmov")
   ]
   (const_string "imovx")))
(set (attr "prefix_extra")
@@ -3792,12 +3797,14 @@
   "split_double_mode (DImode, [0], 1, [3], [4]);")
 
 (define_insn "zero_extenddi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,*r")
(zero_extend:DI
-(match_operand:SWI12 1 "nonimmediate_operand" "m")))]
+(match_operand:SWI12 1 "nonimmediate_operand" "m,*k")))]
   "TARGET_64BIT"
-  "movz{l|x}\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "imovx")
+  "@
+   movz{l|x}\t{%1, %k0|%k0, %1}
+   kmov\t{%1, %k0|%k0, %1}"
+  [(set_attr "type" "imovx,mskmov")
(set_attr "mode" "SI")])
 
 (define_expand "zero_extendsi2"
@@ -3841,13 +3848,15 @@
(set_attr "mode" "SI")])
 
 (define_insn "*zero_extendsi2"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,*r")
(zero_extend:SI
- (match_operand:SWI12 1 "nonimmediate_operand" "m")))]
+ (match_operand:SWI12 1 "nonimmediate_operand" "m,*k")))]
   "!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
-  "movz{l|x}\t{%1, %0|%0, %1}"
-  [(set_attr "type" "imovx")
-   (set_attr "mode" "SI")])
+  "@
+   movz{l|x}\t{%1, %0|%0, %1}
+   kmov\t{%1, %0|%0, %1}"
+  [(set_attr "type" "imovx,mskmov")
+   (set_attr "mode" "SI,")])
 
 (define_expand "zero_extendqihi2"
   [(set (match_operand:HI 0 "register_operand")
@@ -3890,12 +3899,14 @@
 
 ; zero extend to SImode to avoid partial register stalls
 (define_insn "*zero_extendqihi2"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
+  [(set (match_operand:HI 0 "register_operand" "=r,*r")
+   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm,*k")))]
   "!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
-  "movz{bl|x}\t{%1, %k0|%k0, %1}"
-  [(set_attr "type" "imovx")
-   (set_attr "mode" "SI")])
+  "@
+   movz{bl|x}\t{%1, %k0|%k0, %1}
+   kmovb\t{%1, %k0|%k0, %1}"
+  [(set_attr "type" "imovx,mskmov")
+   (set_attr "mode" "SI,QI")])
 
 (define_insn_and_split "*zext_doubleword_and"
   [(set (match_operand:DI 0 "register_operand" "=&")


Re: [ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-08-05 Thread Nathan Sidwell

On 08/05/16 00:56, Irfan Ahmad wrote:


1. In the absence of lazy binding (that is almost always the case in embedded
systems), GOT is practically read-only - it needs to be modified only during


'practically' as in 'almost', or is it really read only?


linking by the dynamic linker, after that it can be considered and marked
read-only (e.g. read-only attribute set to be enforced by some MMU or MPU).


True, but I don't see how that's related to the -mNPDITR flag.  (Your statement 
is about lazy binding).



2. If you only need a simple dynamic object model - where you just need dynamic
loading and dynamic linking - but do not need to maintain multiple data states
for the object like you need in a traditional shared object model, then one
instance of GOT per dynamic object is enough.


Again, correct, but unrelated to the flag.  You seem to be claiming that 
unix-like ABIS require more than one GOT per dynamic object?



From #1: GOT is read-only so keeping it with CODE segment is a natural choice.
Now as there is no need to move it to RAM so there is no need for a mechanism
(-mSPB) that would enable controlling the GOT location independently of CODE
segment.


That's correct, but again unrelated to the -mNPDITR flag.



2. If you only need a simple dynamic object model - where you just need dynamic 
loading and dynamic linking - but do not need to maintain multiple data states 
for the object like you need in a traditional shared object model, then one 
instance of GOT per dynamic object is enough.


What 'multiple data states?'  Are you talking about lazy binfing again 
(unrelated to -mNPDITR), or something else (what?)



So when both #1 and #2 are met, you only need -mno-pic-data-is-text-relative.


Your conclusion doesn't follow.  You can satisfy them with regular GOT behavior.

nathan


Re: [RFC] ipa bitwise constant propagation

2016-08-05 Thread Jan Hubicka
> Hi Honza,
> 
> On 04/08/16 23:05, Jan Hubicka wrote:
> >>I didn't look at the propagation part but eventually the IPA-CP
> >>lattice gets quite big.  Also the alignment lattice is very
> >>similar to the bits lattice so why not merge those two?  But
> >
> >This was always the original idea to replace alignment propagation by bitwise
> >ccp.  I suppose we only have issue here because nonzero bits are not tracked 
> >for
> >pointers so we need to feed the original lattices by hand?
> 
> I also raised this one with Prathamesh off line. With the early-vrp,
> we should have nonzero_bits for non pointers. For pointers we should
> feed the lattices with get_pointer_alignment_1 as it is done in
> ipa-cpalignment propagation.

Yes, that is the general idea. Note that also for pointers it would be
very useful to track what pointers are non-NULL (C++ multiple inheritance 
inserts
a lot of NULL pointer checks that confuse us in later analysis and it would
be nice to optimize them out). I am not very convinced saving a pointer is
worth to make difference between pointers/nonpointers for all the local
tracking.
> 
> >We could also make use of VR ranges and bits while evaultaing predicates
> >in ipa-inline-analysis. I can look into it after returning from Leeds.
> 
> Indeed. With ealrly dom based VRP (non iterative at this point),
> some of the ranges can be pessimistic and can impact the estimation.
> Let me have a look at this.

Yes, but those are independent issues - size/time estimates should
take into account the new info we have and we should work on getting
it better when we can ;)

I will try to revisit the size/time code after returning from
my vacation and turn it into sreals for time + cleanup/generalize the APIs
a bit. I tried to do it last stage1 but got stuck on some of ugly side cases
+ gengtype not liking sreal type.

Honza
> 
> Thanks,
> Kugan
> 


Re: [patch] New libstdc++ docs on testing and library versioning

2016-08-05 Thread Jonathan Wakely

On 05/08/16 01:37 +0100, Jonathan Wakely wrote:

The core of the new guidelines for writing tests is this:

[begin]
Test cases that use features of a particular C++ standard should
specify the minimum required standard as an effective target:

  // { dg-do run { target c++11 } }

or

  // { dg-require-effective-target c++11 }

Specifying the minimum required standard for a test allows it to be
run using later standards, so that we can verify that C++11
components still work correctly when compiled as C++14 or later.


I forgot to say that this is the main reason for all the testsuite
work I've been doing for the last couple of weeks. We have hundreds of
tests using { dg-options "-std=gnu++11" } which means that we're not
testing huge chunks of the library with the default -std=gnu++14
option.

By using { dg-do what { target c++11 } } the default testsuite options
will run the test with the default -std setting, but we can optionally
also run them with -std=gnu++11, and -std=gnu++17, and other
variations (which I'm going to be doing nightly).

So I want the docs to describe how I think we should be writing tests.



[PATCH] Fix (parts of) PR68273

2016-08-05 Thread Richard Biener

The following patch avoids overaligned types created from IPA parameter
replacement.  It is said to help mipsel which still suffers from the
backend-looks-at-type-alignment-for-parameter-passing-ABI bug.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

mips testing appreciated.

Richard.

2016-08-05  Richard Biener  

PR ipa/68273
* ipa-prop.c (ipa_modify_formal_parameters): Build
parameter types with natural alignment also for the
over-aligned case.

Index: gcc/ipa-prop.c
===
--- gcc/ipa-prop.c  (revision 239164)
+++ gcc/ipa-prop.c  (working copy)
@@ -3910,7 +3909,7 @@ ipa_modify_formal_parameters (tree fndec
  if (is_gimple_reg_type (ptype))
{
  unsigned malign = GET_MODE_ALIGNMENT (TYPE_MODE (ptype));
- if (TYPE_ALIGN (ptype) < malign)
+ if (TYPE_ALIGN (ptype) != malign)
ptype = build_aligned_type (ptype, malign);
}
}



[PATCH] Fix PR72772

2016-08-05 Thread Richard Biener

This fixes PR72772 by avoing placing a degenerate PHI in each
forwarder block loop init creates when creating simple preheaders.
The solution is to simply split the single loop entry edge which
is also way cheaper than using the forwarder block creation path.

You've seen a load of fallout fixes already, so this is the final
patch adjusting two testcases (for gcc.dg/tree-ssa/pr59597.c we
no longer register the unwanted threadings as the forwarders
no longer contain PHIs).

This patch will cause

+FAIL: gcc.dg/graphite/scop-dsyr2k.c scan-tree-dump-times graphite "number 
of SCoPs: 1" 1
+FAIL: gcc.dg/graphite/scop-dsyrk.c scan-tree-dump-times graphite "number 
of SCoPs: 1" 1

on x86_64 with -m32 as niter analysis is confused by us now generating
an optimized loop nest via threading that has all redundant checks 
removed.  We no longer can prove that the loops do not eventually
iterate zero times.  I will open a PR for this (it is a latent issue).
The tests would pass with VRP disabled but I choose to leave them
FAILing for now.

Bootstrapped / tested many times on x86_64-unknown-linux-gnu, re-doing
this after the latest fallout fix now.

Richard.

2016-08-05  Richard Biener  

PR tree-optimization/72772
* cfgloopmanip.c (create_preheader): Use split_edge if there
is a single loop entry, avoiding degenerate PHIs.

* gcc.dg/tree-ssa/ldist-24.c: New testcase.
* gcc.dg/graphite/pr35356-1.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.

Index: gcc/cfgloopmanip.c
===
*** gcc/cfgloopmanip.c  (revision 239117)
--- gcc/cfgloopmanip.c  (working copy)
*** has_preds_from_loop (basic_block block,
*** 1497,1503 
  basic_block
  create_preheader (struct loop *loop, int flags)
  {
!   edge e, fallthru;
basic_block dummy;
int nentry = 0;
bool irred = false;
--- 1497,1503 
  basic_block
  create_preheader (struct loop *loop, int flags)
  {
!   edge e;
basic_block dummy;
int nentry = 0;
bool irred = false;
*** create_preheader (struct loop *loop, int
*** 1544,1552 
  
mfb_kj_edge = loop_latch_edge (loop);
latch_edge_was_fallthru = (mfb_kj_edge->flags & EDGE_FALLTHRU) != 0;
!   fallthru = make_forwarder_block (loop->header, mfb_keep_just, NULL);
!   dummy = fallthru->src;
!   loop->header = fallthru->dest;
  
/* Try to be clever in placing the newly created preheader.  The idea is to
   avoid breaking any "fallthruness" relationship between blocks.
--- 1544,1557 
  
mfb_kj_edge = loop_latch_edge (loop);
latch_edge_was_fallthru = (mfb_kj_edge->flags & EDGE_FALLTHRU) != 0;
!   if (nentry == 1)
! dummy = split_edge (single_entry);
!   else
! {
!   edge fallthru = make_forwarder_block (loop->header, mfb_keep_just, 
NULL);
!   dummy = fallthru->src;
!   loop->header = fallthru->dest;
! }
  
/* Try to be clever in placing the newly created preheader.  The idea is to
   avoid breaking any "fallthruness" relationship between blocks.
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c(revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c(working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -fdump-tree-ldist" } */
+ 
+ int foo (int flag, char *a)
+ {
+   short i, j;
+   short l = 0;
+   if (flag == 1)
+ l = 3;
+ 
+   for (i = 0; i < 4; i++)
+ {
+   for (j = l - 1; j > 0; j--)
+   a[j] = a[j - 1];
+   a[0] = i;
+ }
+ }
+ 
+ /* { dg-final { scan-tree-dump "memmove" "ldist" } } */
Index: gcc/testsuite/gcc.dg/graphite/pr35356-1.c
===
--- gcc/testsuite/gcc.dg/graphite/pr35356-1.c   (revision 239117)
+++ gcc/testsuite/gcc.dg/graphite/pr35356-1.c   (working copy)
@@ -34,4 +34,4 @@ if (n >= k + 1 && k >= 0) {
 
 */
 
-/* { dg-final { scan-tree-dump "if \\\(P_9 >= P_10 \\\+ 1 && P_10 >= 0\\\) 
\\\{" "graphite" } } */
+/* { dg-final { scan-tree-dump "if \\\(P_8 >= P_9 \\\+ 1 && P_9 >= 0\\\) \\\{" 
"graphite" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr59597.c (revision 239164)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr59597.c (working copy)
@@ -54,5 +54,6 @@ main (int argc, char argv[])
   return crc;
 }
 
-/* { dg-final { scan-tree-dump "Cancelling" "vrp1" } } */
-
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 3 "vrp1" } } */
+/* { dg-final { scan-tree-dump-not "joiner" "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp1" } } */


[PATCH] Improve backwards threading

2016-08-05 Thread Richard Biener

This avoids regressing gcc.dg/tree-ssa/pr21417.c with the fix for
PR72772 where after it a forwarder block no longer is present.
It's easy to simply create it when FSM threading faces the situation
that the edge ending the path enters a loop.

I also fixed the costs for obviously related anon SSA names (when
they are the same).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, I'll apply
if that finishes successfully.

Richard.

2016-08-05  Richard Biener  

* tree-ssa-threadbackward.c: Include cfghooks.h.
(profitable_jump_thread_path): Treat same SSA names related.
(fsm_find_control_statement_thread_paths): When the taken edge
enters a loop split it instead of giving up.

Index: gcc/tree-ssa-threadbackward.c
===
--- gcc/tree-ssa-threadbackward.c   (revision 239164)
+++ gcc/tree-ssa-threadbackward.c   (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
+#include "cfghooks.h"
 
 static int max_threaded_paths;
 
@@ -206,8 +207,9 @@ profitable_jump_thread_path (vecsrc)
- >= bb_loop_depth (taken_edge->dest))
-   convert_and_register_jump_thread_path (path, taken_edge);
+ < bb_loop_depth (taken_edge->dest))
+   split_edge (taken_edge);
+ convert_and_register_jump_thread_path (path, taken_edge);
  path->pop ();
}
}
@@ -586,9 +592,13 @@ fsm_find_control_statement_thread_paths
 name, arg);
  if (taken_edge)
{
+ /* If the taken edge is a loop entry avoid mashing two
+loops into one with multiple latches by splitting
+the edge.  */
  if (bb_loop_depth (taken_edge->src)
- >= bb_loop_depth (taken_edge->dest))
-   convert_and_register_jump_thread_path (path, taken_edge);
+ < bb_loop_depth (taken_edge->dest))
+   split_edge (taken_edge);
+ convert_and_register_jump_thread_path (path, taken_edge);
  path->pop ();
}
 


Re: [PATCH] Teach VRP to truncate the case ranges of a switch

2016-08-05 Thread Markus Trippelsdorf
On 2016.08.03 at 15:47 +0200, Richard Biener wrote:
> On Wed, Aug 3, 2016 at 6:00 AM, Patrick Palka  wrote:
> > VRP currently has functionality to eliminate case labels that lie
> > completely outside of the switch operand's value range.  This patch
> > complements this functionality by teaching VRP to also truncate the case
> > label ranges that partially overlap with the operand's value range.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.  Does this look like
> > a reasonable optimization?  Admittedly, its effect will almost always be
> > negligible except in cases where a case label range spans a large number
> > of values which is a pretty rare thing.  The optimization triggered
> > about 250 times during bootstrap.
> 
> I think it's most useful when the range collapses to a single value.
> 
> Ok.

Apparently typedefs aren't handled correctly, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72810

-- 
Markus


Re: [AArch64] Handle HFAs of float16 types properly

2016-08-05 Thread James Greenhalgh
On Fri, Aug 05, 2016 at 11:00:39AM +0100, Yao Qi wrote:
> On Tue, Jul 26, 2016 at 2:55 PM, James Greenhalgh
>  wrote:
> >
> > OK? As this is an ABI break, I'm not proposing for it to go back to GCC 6,
> > though it will apply cleanly there if the maintainers support that.
> >
> 
> What do you mean by "ABI break"?  AFAICS, with this patch, it conforms to
> AAPCS.

After this patch code generated for GCC 4.9/5/6 will not be ABI
compatible with code generated for GCC 7 for HFAs of __fp16. The new
generated code will conform to AAPCS64, but the old code didn't so there has
been an ABI change between the GCC versions. We don't like doing that for
minor releases, so the patch is not really suitable for backporting.

> The subject leads me thinking about the handling of HVA of float16.

These are handled like any other vector, the code looking at HVA's doesn't
care about the inner mode of the vector just the bitsize:

  config/aarch64/aarch64.c::aapcs_vfp_sub_candidate

case VECTOR_TYPE:
  /* Use V2SImode and V4SImode as representatives of all 64-bit
 and 128-bit vector types.  */
  size = int_size_in_bytes (type);
  switch (size)
{
case 8:
  mode = V2SImode;
  break;
case 16:
  mode = V4SImode;
  break;
default:
  return -1;
}

  if (*modep == VOIDmode)
*modep = mode;

  /* Vector modes are considered to be opaque: two vectors are
 equivalent for the purposes of being homogeneous aggregates
 if they are the same size.  */
  if (*modep == mode)
return 1;

  break;

Thanks,
James



Re: [PATCH][AArch64] Improve stack adjustment

2016-08-05 Thread Richard Earnshaw (lists)
On 04/08/16 16:56, Richard Earnshaw (lists) wrote:
> On 04/08/16 12:06, Wilco Dijkstra wrote:
>> Improve stack adjustment by reusing a temporary move immediate 
>> from the epilog if the register is still valid in the epilog.  This generates
>> smaller code for leaf functions:
>>
>> mov x16, 4
>> sub sp, sp, x16
>> ldr w0, [sp, w0, sxtw 2]
>> add sp, sp, x16
>> ret
>>
>> Passes GCC regression tests.
>>
>> ChangeLog:
>> 2016-08-04  Wilco Dijkstra  
>>
>> gcc/
>>  * config/aarch64/aarch64.c (aarch64_add_constant):
>>  Add extra argument to allow emitting the move immediate.
>>  Use add/sub with positive immediate.
>>  (aarch64_expand_epilogue): Decide when to leave out move.
>>
>> testsuite/
>>  * gcc.target/aarch64/test_frame_17.c: New test.
>> --
>>
> 
> I see you've added a default argument for your new parameter.  I think
> doing that is fine, but I have two comments about how we might use that
> in this case.
> 
> Firstly, if this parameter is suitable for having a default value, then
> I think the preceding one should also be treated in the same way.
> 
> Secondly, I think (due to these parameters being BOOL with no useful
> context to make it clear which is which) that having wrapper functions
> (inlined, of course) that describe the intended behaviour more clearly
> would be useful.  So, defining
> 
> static inline void
> aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> {
>aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);

The inner function should be aarch64_add_constant of course!

R.

> }
> 
> would make it clearer at the call point than having a lot of true and
> false parameters scattered round the code.
> 
> Alternatively we might remove all the default parameters and require
> wrappers like the above to make it more explicit which API is intended -
> this might make more sense if not all combinations make sense.
> 
> R.
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> ce2cc5ae3e1291f4ef4a8408461678c9397b06bd..5b59e4dd157351f301fc563a724cefe8a9be132c
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1941,15 +1941,21 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>>  }
>>  
>>  /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
>> -   intermediate value if necessary.
>> +   intermediate value if necessary.  FRAME_RELATED_P should be true if
>> +   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
>> +   to the generated instructions.  If SCRATCHREG is known to hold
>> +   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
>> +   immediate again.
>>  
>> -   This function is sometimes used to adjust the stack pointer, so we must
>> -   ensure that it can never cause transient stack deallocation by writing an
>> -   invalid value into REGNUM.  */
>> +   Since this function may be used to adjust the stack pointer, we must
>> +   ensure that it cannot cause transient stack deallocation (for example
>> +   by first incrementing SP and then decrementing when adjusting by a
>> +   large immediate).  */
>>  
>>  static void
>>  aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
>> -  HOST_WIDE_INT delta, bool frame_related_p)
>> +  HOST_WIDE_INT delta, bool frame_related_p,
>> +  bool emit_move_imm = true)
>>  {
>>HOST_WIDE_INT mdelta = abs_hwi (delta);
>>rtx this_rtx = gen_rtx_REG (mode, regnum);
>> @@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, 
>> int scratchreg,
>>return;
>>  }
>>  
>> -  /* We need two add/sub instructions, each one performing part of the
>> - calculation.  Don't do this if the addend can be loaded into register 
>> with
>> - a single instruction, in that case we prefer a move to a scratch 
>> register
>> - following by an addition.  */
>> -  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
>> +  /* We need two add/sub instructions, each one perform part of the
>> + addition/subtraction, but don't this if the addend can be loaded into
>> + register by single instruction, in that case we prefer a move to 
>> scratch
>> + register following by addition.  */
>> +  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))
>>  {
>>HOST_WIDE_INT low_off = mdelta & 0xfff;
>>  
>> @@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, 
>> int scratchreg,
>>  
>>/* Otherwise use generic function to handle all other situations.  */
>>rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
>> -  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
>> -  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
>> +  if (emit_move_imm)
>> +aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, 

Re: [AArch64] Handle HFAs of float16 types properly

2016-08-05 Thread Yao Qi
On Tue, Jul 26, 2016 at 2:55 PM, James Greenhalgh
 wrote:
>
> OK? As this is an ABI break, I'm not proposing for it to go back to GCC 6,
> though it will apply cleanly there if the maintainers support that.
>

What do you mean by "ABI break"?  AFAICS, with this patch, it conforms to
AAPCS.

The subject leads me thinking about the handling of HVA of float16.

-- 
Yao (齐尧)


Re: [PATCH, ARM] Add a new target hook to compute the frame layout

2016-08-05 Thread Richard Earnshaw (lists)
On 04/08/16 22:16, Bernd Edlinger wrote:
> Hi,
> 
> this patch introduces a new target hook that allows the target's
> INITIAL_ELIMINATION_OFFSET function to use cached values instead of 
> re-computing the frame layout every time.
> 
> I have updated the documentation a bit and hope it is clearer this time.
> 
> It still needs a review by ARM port maintainers.
> 
> If the ARM port maintainers find this patch useful, that would be fine.
> 

I need to look into this more, but my first thought was that the
documentation is confusing, or there is a real problem in this patch.

As I understand it the frame has to be re-laid out each time the
contents of the frame changes (an extra register becomes live or another
spill slot is needed).  So saying that it is laid out once can't be
right if (as I read it initially) you mean 'once per function' since I
think it needs to be 'once for each time the frame contents changes'.

Of course, things might be a bit different with LRA compared with
reload, but I strongly suspect this is never going to be an 'exactly
once per function' operation.

Can you clarify your meaning in the documentation please?

R.

> 
> Thanks
> Bernd.
> 
> On 06/21/16 23:29, Jeff Law wrote:
>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>> it is necessary to call this function several times with
>>> different register combinations.
>>> Most targets use a cached data structure that describes the
>>> exact frame layout of the current function.
>>>
>>> It is safe to skip the computation when reload_completed = true,
>>> and most targets do that already.
>>>
>>> However while reload is doing its work, it is not clear when to
>>> do the computation and when not.  This results in unnecessary
>>> work.  Computing the frame layout can be a simple function or an
>>> arbitrarily complex one, that walks all instructions of the current
>>> function for instance, which is more or less the common case.
>>>
>>>
>>> This patch adds a new optional target hook that can be used
>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>> into a O(n) computation part, and a O(1) result function.
>>>
>>> The patch implements a compute_frame_layout target hook just
>>> for ARM in the moment, to show the principle.
>>> Other targets may also implement that hook, if it seems appropriate.
>>>
>>>
>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>> OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>> changelog-frame-layout.txt
>>>
>>>
>>> 2016-06-16  Bernd Edlinger  
>>>
>>> * target.def (compute_frame_layout): New optional target hook.
>>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>> * lra-eliminations.c (update_reg_eliminate): Call
>>> compute_frame_layout
>>> target hook.
>>> * reload1.c (verify_initial_elim_offsets): Likewise.
>>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>> (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>> (arm_get_frame_offsets): Split up into this ...
>>> (arm_compute_frame_layout): ... and this function.
>> The ARM maintainers would need to chime in on the ARM specific changes
>> though.
>>
>>
>>
>>> Index: gcc/target.def
>>> ===
>>> --- gcc/target.def(Revision 233176)
>>> +++ gcc/target.def(Arbeitskopie)
>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>   unsigned int, (void),
>>>   default_case_values_threshold)
>>>
>>> -/* Retutn true if a function must have and use a frame pointer.  */
>> s/Retutn/Return
>>
>>> +/* Optional callback to advise the target to compute the frame
>>> layout.  */
>>>  DEFHOOK
>>> +(compute_frame_layout,
>>> + "This target hook is called immediately before reload wants to call\n\
>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>> frame\n\
>>> +layout instead of re-computing it on every invocation.  This is
>>> particularly\n\
>>> +useful for targets that have an O(n) frame layout function.
>>> Implementing\n\
>>> +this callback is optional.",
>>> + void, (void),
>>> + hook_void_void)
>> So the docs say "immediately before", but that's not actually reality in
>> lra-eliminations.  I think you can just say "This target hook is called
>> before reload or lra-eliminations calls
>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>
>>
>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>
>> I'm OK with this conceptually.  I think you need a minor doc update and
>> OK from the ARM maintainers before it can be installed though.
>>
>> jeff
>>
>> changelog-frame-layout.txt
>>
>>
>> 2016-06-16  Bernd Edlinger  
>>
>>  * target.def (compute_frame_layout): New optional target hook.
>>  * doc/tm.texi.in 

RE: [ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-08-05 Thread Joey Ye
Irfan,

Thanks for the case sharing. It is a persuasive reason not to error out 
-mno-SPB. 

Nathan's patch changes default behaviour that -mSPB will be implied when 
-mno-PDITR is provided. So with this patch your project need to explicitly 
specify -mno-SPB to make it work as before. IMHO default behaviour should 
reflect the typical usage. Now I not so sure whether -mSPB should be typically 
used with -mno-PDITR. Irfan what's your opinion?

Thanks,
Joey

> -Original Message-
> From: Irfan Ahmad [mailto:h.irfanah...@gmail.com]
> Sent: 05 August 2016 07:06
> To: Ramana Radhakrishnan; Nathan Sidwell; Joey Ye; GCC Patches
> Cc: Richard Earnshaw
> Subject: Re: [ARM] mno-pic-data-is-text-relative & msingle-pic-base
> 
> Ramana,
> 
> I saw some correspondence between you and Nathan on his patch
> [https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00630.html] (after I sent this
> email) going in a direction that may eventually result in too tight than 
> necessary
> coupling between these two switches, as your response hints at:
> 
> > I am also slightly inclined to go further and error out if someone uses 
> > -mno-
> PDITR with -mno-SPB on the command line, after all as you say -mno-PDITR
> implies a non-fixed mapping while -mno-SPB implies there is some fixed mapping
> some where currently in the compiler. I don't see how the twain can meet. That
> can happen as a follow-up - the current patch is by itself a step improvement.
> 
> Please see the alternative perspective as described below.
> 
> Irfan Ahmad
> [p.s. Sorry about repeat send. I accidentally sent it earlier in HTML format]
> --
> -
> On 05/08/2016 09:56, Irfan Ahmad wrote:
> 
> Nathan,
> 
> Sorry for jumping in a relatively old thread. I saw this in mailing list 
> archives
> during a web search (I wasn't on this mailing list before). I decided to 
> speak up as
> I would be an affected party if this patch (or some similar future patch) gets
> accepted or worse yet, the feature involved gets disabled.
> 
> > Apparently there are legitimate reasons one might want the -mno-PDITR
> behaviour without -mSPB. I don't know what those are, perhaps Joey could
> clarify?
> 
> Yes, there are some practical use cases that require -mno-pic-data-is-text-
> relative (-mno-PDITR) without -msingle-pic-base (-mSPB).
> 
> These are based on two primary principles:
> 
> 1. In the absence of lazy binding (that is almost always the case in embedded
> systems), GOT is practically read-only - it needs to be modified only during
> linking by the dynamic linker, after that it can be considered and marked 
> read-
> only (e.g. read-only attribute set to be enforced by some MMU or MPU).
> 
> 2. If you only need a simple dynamic object model - where you just need
> dynamic loading and dynamic linking - but do not need to maintain multiple 
> data
> states for the object like you need in a traditional shared object model, 
> then one
> instance of GOT per dynamic object is enough.
> 
> From #1: GOT is read-only so keeping it with CODE segment is a natural choice.
> Now as there is no need to move it to RAM so there is no need for a mechanism
> (-mSPB) that would enable controlling the GOT location independently of CODE
> segment.
> 
> From #2: Only one instance of GOT is required per dynamic object so there is 
> no
> need for a mechanism (-mSPB) that would enable switching GOTs.
> 
> So when both #1 and #2 are met, you only need -mno-pic-data-is-text-relative.
> 
> Irfan Ahmad




Re: RFC: Make diagnostics for C++ reference binding more consistent

2016-08-05 Thread Jonathan Wakely

On 27/07/16 18:06 -0400, Jason Merrill wrote:

On Wed, Jul 27, 2016 at 8:05 AM, Jonathan Wakely  wrote:

Consider:

template T declval();

int& r1 = declval();
int&& r2 = declval();
int& r3 = declval();


This produces three quite different errors:


refs.cc:3:25: error: invalid initialization of non-const reference of type
'int&' from an rvalue of type 'int'
int& r1 = declval();
  ~~^~
refs.cc:4:25: error: cannot bind 'int' lvalue to 'int&&'
int&& r2 = declval();
   ~^~
refs.cc:5:30: error: binding 'const int' to reference of type 'int&'
discards qualifiers
int& r3 = declval();
  ~~~^~


The first one uses the order to/from, but the other two are from/to.

The first and second are saying the same thing (wrong value category)
but very differently.

The first and third mention references but the second doesn't.

The standard talks about binding a reference to something, but the
second and third diagnostics talk about binding something to a
reference (and the first doesn't mention binding at all).

The first diagnostic is specific to lvalue references (it wouldn't be
invalid if it was binding a non-const _rvalue_ reference to an
rvalue), but doesn't use the word "lvalue".

(FWIW Clang and EDG both have their own inconsistencies for the same
example, but that shouldn't stop us trying to improve.)


IMHO the first should use the words "bind" and "lvalue reference":

refs.cc:3:25: error: cannot bind non-const lvalue reference of type 'int&'
to an rvalue of type 'int'
int& r1 = declval();
  ~~^~

The second should use the phrasing "bind ref to value" instead of
"bind value to ref", and mention "rvalue reference":

refs.cc:4:25: error: cannot bind rvalue reference of type 'int&&' to an
lvalue of type 'int'
int&& r2 = declval();
   ~^~

The third should use the same phrasing (but doesn't need to mention
lvalue/rvalue because the problem is related to cv-qualifiers not
value categories):

refs.cc:5:30: error: binding reference of type 'int&' to 'const int'
discards qualifiers
int& r3 = declval();
  ~~~^~


I've only considered trivial examples here, is there some reason to
prefer to current diagnostics for more complex cases?

The attached patch makes that change, but there are probably lots of
tests that would need updating which I haven't done until I know if
the change is worthwhile.


Sounds good to me.


Here's the full patch then. There weren't too many tests to adjust.

Tested x86_64-linux. OK for trunk?


commit c54df5dc2010513aeda6b065da843b4a582248e3
Author: Jonathan Wakely 
Date:   Fri Aug 5 09:51:54 2016 +0100

Harmonize diagnostics for invalid reference binding

gcc/cp:

	* call.c (convert_like_real): Harmonize diagnostics for invalid
	reference binding.

gcc/testsuite:

	* call.c (convert_like_real): Harmonize diagnostics for invalid
	reference binding.
	* g++.dg/conversion/pr16333.C: Adjust dg-error regexp.
	* g++.dg/conversion/pr41426.C: Likewise.
	* g++.dg/conversion/pr66211.C: Likewise.
	* g++.dg/cpp1y/lambda-init9.C: Likewise.
	* g++.dg/init/ref8.C: Likewise.
	* g++.old-deja/g++.law/cvt20.C: Likewise.
	* g++.old-deja/g++.mike/p9732c.C: Likewise.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 802c325..918661a 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6710,15 +6710,15 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 	tree extype = TREE_TYPE (expr);
 	if (TYPE_REF_IS_RVALUE (ref_type)
 		&& lvalue_p (expr))
-	  error_at (loc, "cannot bind %qT lvalue to %qT",
-			extype, totype);
+	  error_at (loc, "cannot bind rvalue reference of type %qT to "
+"lvalue of type %qT", totype, extype);
 	else if (!TYPE_REF_IS_RVALUE (ref_type) && !lvalue_p (expr)
 		 && !CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (ref_type)))
-	  error_at (loc, "invalid initialization of non-const reference of "
-			"type %qT from an rvalue of type %qT", totype, extype);
+	  error_at (loc, "cannot bind non-const lvalue reference of "
+			"type %qT to an rvalue of type %qT", totype, extype);
 	else if (!reference_compatible_p (TREE_TYPE (totype), extype))
-	  error_at (loc, "binding %qT to reference of type %qT "
-			"discards qualifiers", extype, totype);
+	  error_at (loc, "binding reference of type %qT to %qT "
+			"discards qualifiers", totype, extype);
 	else
 	  gcc_unreachable ();
 	maybe_print_user_conv_context (convs);
diff --git a/gcc/testsuite/g++.dg/conversion/pr16333.C b/gcc/testsuite/g++.dg/conversion/pr16333.C
index 810c12a..a00bc5c 100644
--- a/gcc/testsuite/g++.dg/conversion/pr16333.C
+++ b/gcc/testsuite/g++.dg/conversion/pr16333.C
@@ -7,4 +7,4 @@ struct X {
 int a[3];
 X foo1 () { return a; }
 const X  () { return a; } // { 

Re: make streaming routines for widest_int non-static

2016-08-05 Thread kugan

Hi Richard and Prathamesh,

On 05/08/16 18:23, Richard Biener wrote:

On Fri, 5 Aug 2016, Prathamesh Kulkarni wrote:


Hi,
This patch makes streamer_read_wi and streamer_write_wi non-static,
and exports them from lto-streamer.h. I suppose this hunk could be committed
independently of the ipa-bitwise-cp patch ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.
OK for trunk ?


I think I approved a better patch from Kugan.


Sorry, I was waiting for the other patches to get approved. Since this 
is needed for Prathamesh too, I will commit after testing this patch alone.


Thanks.
Kugan


[PATCH] Use __invoke in std::function internals

2016-08-05 Thread Jonathan Wakely

This fixes a bug in the _Callable SFINAE constraint of std::function,
where it was using an rvalue _Func in the result_of expression, when
the target is actually invoked as an lvalue.

I'm also making std::function use std::__invoke() directly, instead of
going via __callable_functor() and/or mem_fn().  __callable_functor
conditionally calls std::mem_fn, and mem_fn uses std::__invoke anyway,
so we might as well just use that directly.

I'm also removing the TODO comments about allocators, because I fixed
that by removing them from the standard :-) http://wg21.link/p0302r1

* include/std/functional (__callable_functor): Remove.
(_Function_handler::_M_invoke): Use __invoke instead of
__callable_functor or mem_fn.
(function::_Callable): Use lvalue in result_of expression.
(function): Remove TODO comments about allocators.
* testsuite/20_util/function/cons/refqual.cc: New test.

Tested powerpc64-linux, committed to trunk.


commit 259d510ed2d371dab3f38c0f425980c5902c1df6
Author: Jonathan Wakely 
Date:   Thu Aug 4 19:24:49 2016 +0100

Use __invoke in std::function internals

* include/std/functional (__callable_functor): Remove.
(_Function_handler::_M_invoke): Use __invoke instead of
__callable_functor or mem_fn.
(function::_Callable): Use lvalue in result_of expression.
(function): Remove TODO comments about allocators.
* testsuite/20_util/function/cons/refqual.cc: New test.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 843dc83..4ca32c3 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1483,33 +1483,6 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 : __is_location_invariant<_Tp>
 { };
 
-  // Converts a reference to a function object into a callable
-  // function object.
-  template
-inline _Functor&
-__callable_functor(_Functor& __f)
-{ return __f; }
-
-  template
-inline _Mem_fn<_Member _Class::*>
-__callable_functor(_Member _Class::* &__p)
-{ return std::mem_fn(__p); }
-
-  template
-inline _Mem_fn<_Member _Class::*>
-__callable_functor(_Member _Class::* const &__p)
-{ return std::mem_fn(__p); }
-
-  template
-inline _Mem_fn<_Member _Class::*>
-__callable_functor(_Member _Class::* volatile &__p)
-{ return std::mem_fn(__p); }
-
-  template
-inline _Mem_fn<_Member _Class::*>
-__callable_functor(_Member _Class::* const volatile &__p)
-{ return std::mem_fn(__p); }
-
   template
 class function;
 
@@ -1731,8 +1704,8 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   static _Res
   _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
   {
-   return std::__callable_functor(**_Base::_M_get_pointer(__functor))(
- std::forward<_ArgTypes>(__args)...);
+   return std::__invoke(**_Base::_M_get_pointer(__functor),
+std::forward<_ArgTypes>(__args)...);
   }
 };
 
@@ -1746,8 +1719,8 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   static void
   _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
   {
-   std::__callable_functor(**_Base::_M_get_pointer(__functor))(
-   std::forward<_ArgTypes>(__args)...);
+   std::__invoke(**_Base::_M_get_pointer(__functor),
+ std::forward<_ArgTypes>(__args)...);
   }
 };
 
@@ -1763,8 +1736,8 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   static _Res
   _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
   {
-   return std::mem_fn(_Base::_M_get_pointer(__functor)->__value)(
-   std::forward<_ArgTypes>(__args)...);
+   return std::__invoke(_Base::_M_get_pointer(__functor)->__value,
+std::forward<_ArgTypes>(__args)...);
   }
 };
 
@@ -1803,8 +1776,8 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   static void
   _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
   {
-   std::mem_fn(_Base::_M_get_pointer(__functor)->__value)(
-   std::forward<_ArgTypes>(__args)...);
+   std::__invoke(_Base::_M_get_pointer(__functor)->__value,
+ std::forward<_ArgTypes>(__args)...);
   }
 };
 
@@ -1826,7 +1799,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
   typedef _Res _Signature_type(_ArgTypes...);
 
   template::type>
+  typename _Res2 = typename result_of<_Func&(_ArgTypes...)>::type>
struct _Callable : __check_func_return_type<_Res2, _Res> { };
 
   // Used so the return type convertibility checks aren't done when
@@ -1878,8 +1851,6 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
__x.swap(*this);
   }
 
-  // TODO: needs allocator_arg_t
-
   /**
*  @brief Builds a %function that targets a copy of the incoming
*  function 

Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-05 Thread Martin Liška
On 08/04/2016 07:03 PM, Nathan Sidwell wrote:
> On 08/04/16 12:43, Nathan Sidwell wrote:
> 
>> How about:
>> gcov_t expected;
>> atomic_load ([0],  val, ...);
>> gcov_t delta = val == value ? 1 : -1;
>> atomic_add ([1], delta);   <-- or atomic_add_fetch
>> if (delta < 0) {
>>   /* can we set counter[0]? */
>>   atomic_load ([1], , ...);
>>   if (expected < 0) {
>> atomic_store ([0], value, ...);
>> atomic_add ([1], 2, ...);
>>   }
>> }
>> atomic_add ([2], 1, ...);
> 

Hi.

Thank you for very intensive brainstorming ;) Well I still believe that 
following code
is not thread safe, let's image following situation:

> we could do better by using compare_exchange storing value, and detect the 
> race I mentioned:
> 
> gcov_t expected, val;
> atomic_load ([0],  , ...);

[thread 1]: value == 1, read val == 1 // scheduled here

> gcov_t delta = val == value ? 1 : -1;
> atomic_add ([1], delta);
> if (delta < 0) {
>retry:
> /* can we set counter[0]? */
> atomic_load ([1], , ...);
> if (expected < 0) {
>   bool stored = atomic_compare_exchange ([0], , , ...);
>   if (!stored && val != value)
> goto retry;
[thread 2]: value == 2, just updated counter[0] to 2
// after that [thread 1] continue, but wrongly does counter[1]++, but value != 
counter[0]
>   atomic_add ([1], 2, ...);
>   }
> }
> atomic_add ([2], 1, ...);
> 
> This  corrects the off-by one issue.
> 
> nathan

Well, I wrote attached test-case which should trigger a data-race, but TSAN is 
silent:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread 
-fprofile-update=atomic
$ ./a.out

In main: creating thread 0
In main: creating thread 1
new counter[1] value, N:0
In main: creating thread 2
new counter[1] value, N:1
new counter[1] value, N:2
new counter[1] value, N:3
new counter[1] value, N:4
new counter[1] value, N:5
new counter[1] value, N:6
new counter[1] value, N:7
new counter[1] value, N:8
new counter[1] value, N:9
new counter[1] value, N:10
new counter[1] value, N:11
new counter[1] value, N:12
new counter[1] value, N:12
new counter[1] value, N:13
new counter[1] value, N:14
new counter[1] value, N:15
new counter[1] value, N:16
In main: creating thread 3
In main: creating thread 4
In main: creating thread 5
In main: creating thread 6
In main: creating thread 7
In main: creating thread 8
In main: creating thread 9
In main: creating thread 10
In main: creating thread 11
In main: creating thread 12
In main: creating thread 13
In main: creating thread 14
In main: creating thread 15

However, not updating arc counters with atomic operations causes really many 
races:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread
$ ./a.out 2>&1 | grep 'data race' | wc -l
110

Sample:
WARNING: ThreadSanitizer: data race (pid=11424)
  Read of size 8 at 0x00606718 by thread T4:
#0 A::foo() /tmp/race.cc:10 (a.out+0x00401e78)

  Previous write of size 8 at 0x00606718 by thread T1:
[failed to restore the stack]

  Location is global '__gcov0._ZN1A3fooEv' of size 16 at 0x00606710 
(a.out+0x00606718)

  Thread T4 (tid=11429, running) created by main thread at:
#0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 
(libtsan.so.0+0x0002ad2d)
#1 main /tmp/race.cc:43 (a.out+0x00401afb)

  Thread T1 (tid=11426, finished) created by main thread at:
#0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 
(libtsan.so.0+0x0002ad2d)
#1 main /tmp/race.cc:43 (a.out+0x00401afb)

Maybe I miss something and my tester sample is wrong (please apply attached 
patch to use original __gcov_one_value_profiler_body)?
Thanks,
Martin

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 6e96fa9..42b780f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -188,8 +188,8 @@ gimple_init_edge_profiler (void)
   profiler_fn_name = "__gcov_indirect_call_profiler_v2";
   if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
 	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
-	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
+//  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+//	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
 
   tree_indirect_call_profiler_fn
 	  = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index c1e287d..d43932e 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -70,6 +70,8 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 
In any case, COUNTERS[2] is incremented.  */
 
+static int counter = 0;
+
 static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
 {
@@ -77,6 +79,7 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
 counters[1]++;
   else if (counters[1] == 0)
 {
+  fprintf (stderr, "new counter[1] value, N:%d\n", 

[PATCH] Make gcc.dg/tree-ssa/ivopt_5.c more robust

2016-08-05 Thread Richard Biener

If edges are swapped so are PHI args, the following adjusts the pattern
for this case.

Committed.

Richard.

2016-08-05  Richard Biener  

* gcc.dg/tree-ssa/ivopt_5.c: Make robust against edge swapping.

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_5.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 239164)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_5.c (working copy)
@@ -20,4 +20,4 @@ void foo1 (char *pstart, int n)
 *p = 1;
 }
 
-/* { dg-final { scan-tree-dump-times "ivtmp.\[0-9_\]* = PHI <\[^0\]" 0 
"ivopts"} } */
+/* { dg-final { scan-tree-dump-times "ivtmp.\[0-9_\]* = PHI <\[^0\]\[^,\]*, 
\[^0\]" 0 "ivopts"} } */


Re: libgo patch committed: Update to 1.7rc3

2016-08-05 Thread Uros Bizjak
On Thu, Aug 4, 2016 at 6:48 PM, Ian Lance Taylor  wrote:
> On Thu, Aug 4, 2016 at 1:11 AM, Uros Bizjak  wrote:
>> On Thu, Aug 4, 2016 at 12:53 AM, Ian Lance Taylor  wrote:
>>> On Thu, Jul 28, 2016 at 4:24 AM, Uros Bizjak  wrote:

 A new testsuite failure is introduced:

 FAIL: text/template

 on both, x86_64-linux-gnu and alpha-linux-gnu.

 The testcase corrupts stack with a too deep recursion.

 There is a part in libgo/go/text/template/exec.go that should handle
 this situaiton:

 // maxExecDepth specifies the maximum stack depth of templates within
 // templates. This limit is only practically reached by accidentally
 // recursive template invocations. This limit allows us to return
 // an error instead of triggering a stack overflow.
 const maxExecDepth = 10

 but the limit is either set too high, or the error handling code is
 inefficient on both, split-stack (x86_64) and non-split-stack (alpha)
 targets. Lowering this value to 1 "fixes" the testcase on both
 targets.
>>>
>>> I can not recreate this problem on x86 or x86_64.
>>>
>>> Does this patch work around the problem on Alpha?
>>
>> Yes, the patch "fixes" the problem on alpha, but I still see the
>> failure on x86_64, even with the unlimited stack.
>
> OK, I was able to recreate this by using GNU ld rather than gold.  I
> have committed the appended patch to reduce the number of recursive
> template invocations, since you said that 1 did let the test pass
> for you, and it works for me using GNU ld.  This number is still high
> enough to not cut off any reasonable template execution.
>
> For this patch bootstrapped and ran Go testsuite on
> x86_64-pc-linux-gnu, using both GNU ld and gold.  Committed to
> mainline.

The fixed test now passes reliably on alpha, *but* fails sporadically
on x86_64-linux-gnu with split-stack (GNU ld).

This is what I get on x86_64 Fedora 24:

$ while true; do LD_LIBRARY_PATH=../../.libs ./a.out; done
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
Segmentation fault (core dumped)
PASS
PASS
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
PASS
Segmentation fault (core dumped)
PASS
...

Running the test under gdb, the segfault is at different places, but
usually involves some insn that accesses the stack, e.g.

mov %rdi,0x8(%rsp)
push $0
call ...

Uros.


Re: make streaming routines for widest_int non-static

2016-08-05 Thread Richard Biener
On Fri, 5 Aug 2016, Prathamesh Kulkarni wrote:

> Hi,
> This patch makes streamer_read_wi and streamer_write_wi non-static,
> and exports them from lto-streamer.h. I suppose this hunk could be committed
> independently of the ipa-bitwise-cp patch ?
> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> OK for trunk ?

I think I approved a better patch from Kugan.

Richard.


Re: [tree-ssa-ccp] modify extend_mask to extend bits based on signop

2016-08-05 Thread Richard Biener
On Fri, 5 Aug 2016, Prathamesh Kulkarni wrote:

> Ah, the mail failed to be delivered to gcc-patches, sorry for the double-post.
> On 5 August 2016 at 01:26, Prathamesh Kulkarni
>  wrote:
> > Hi,
> > Is the attached patch OK ?
> > Since we want to extend based on signop, I removed ORing with wi::mask().
> > Bootstrap+test passes on x86_64-unknown-linux-gnu.
> > Cross-test in progress on arm*-*-* and aarch64*-*-*.
> > Ok for trunk ?

Ok if you adjust the extend_mask function comment accordingly.

Thanks,
Richard.


Re: fix fallout of pr22051-2.c on arm

2016-08-05 Thread Richard Biener
On Fri, 5 Aug 2016, Prathamesh Kulkarni wrote:

> On 4 August 2016 at 12:39, Richard Biener  wrote:
> > On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> The attached patch fixes pr22051-2.c which regressed due to
> >> r238754. Matthew, could you please confirm if this patch fixes the
> >> test-case for you ?
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> Cross tested on arm*-*-*.
> >> OK for trunk ?
> >
> > Note that if function pointer types are really the issue then
> > you also need to handle METHOD_TYPE, thus sth like
> >
> >   && ! FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@0))
> >
> > please also add a comment for this non-obvious thing.  I believe
> > we should simply apply function pointer canonicalization for
> > comparisons early during gimplification - exposing this target
> > detail only during RTL expansion makes generic optimization hard.
> >
> > Ok with those changes.
> Is the attached version OK ?

Ok.

Richard.

> Bootstrapped on x86_64-unknown-linux-gnu, cross-tested on arm*-*-*.
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> 


Re: protected alloca class for malloc fallback

2016-08-05 Thread Richard Biener
On Thu, Aug 4, 2016 at 5:19 PM, Aldy Hernandez  wrote:
> On 08/04/2016 08:57 AM, Richard Biener wrote:
>>
>> On Thu, Aug 4, 2016 at 1:30 PM, Aldy Hernandez  wrote:
>>>
>>> Howdy!
>>>
>>> As part of my -Walloca-larger-than=life work, I've been running said pass
>>> over gcc, binutils, and gdb, and trying to fix things along the way.
>>>
>>> Particularly irritating and error prone is having to free malloc'd
>>> pointers
>>> on every function exit point.  We end up with a lot of:
>>>
>>> foo(size_t len)
>>> {
>>>void *p, *m_p = NULL;
>>>if (len < HUGE)
>>>  p = alloca(len);
>>>else
>>>  p = m_p = malloc(len);
>>>if (something)
>>>  goto out;
>>>stuff();
>>> out:
>>>free (m_p);
>>> }
>>>
>>> ...which nobody really likes.
>>>
>>> I've been thinking that for GCC we could have a protected_alloca class
>>> whose
>>> destructor frees any malloc'd memory:
>>>
>>> void foo()
>>> {
>>>char *p;
>>>protected_alloca chunk(5);
>>>p = (char *) chunk.pointer();
>>>f(p);
>>> }
>>>
>>> This would generate:
>>>
>>> void foo() ()
>>> {
>>>void * _3;
>>>
>>>:
>>>_3 = malloc (5);
>>>f (_3);
>>>
>>>:
>>>free (_3); [tail call]
>>>return;
>>> }
>>>
>>> Now the problem with this is that the memory allocated by chunk is freed
>>> when it goes out of scope, which may not be what you want.  For example:
>>>
>>>   func()
>>>   {
>>> char *str;
>>> {
>>>   protected_alloca chunk ();
>>>   // malloc'd pointer will be freed when chunk goes out of scope.
>>>   str = (char *) chunk.pointer ();
>>> }
>>> use (str);  // BAD!  Use after free.
>>>   }
>>
>>
>> But how's that an issue if the chunk is created at the exact place where
>> there
>> previously was an alloca?
>
>
> The pointer can escape if you assign it to a variable outside the scope of
> chunk?  Take for instance the following snippet in tree.c:
>
> {
> ...
> ...
>   q = (char *) alloca (9 + 17 + len + 1);
>   memcpy (q, file, len + 1);
>
>   snprintf (q + len, 9 + 17 + 1, "_%08X_" HOST_WIDE_INT_PRINT_HEX,
> crc32_string (0, name), get_random_seed (false));
>
>   p = q;
> }
>
> clean_symbol_name (q);
>
> If you define `protected_alloca chunk(9 + 17 + len + 1)' at the alloca()
> call, chunk will be destroyed at the "}", whereas `q' is still being used
> outside of that scope.
>
> What I am suggesting for this escaping case is to define "protected_alloca
> chunk()" at function scope, and then do chunk.alloc(N) in the spot where the
> alloca() call was previously at.
>
> Or am I missing something?

Ah, ok - alloca memory is only freed at the end of the function.  Only VLAs
are freed at scope boundary.

>>
>> Your class also will not work when internal_alloc is not inlined and
>> the alloca path
>> is taken like when using non-GCC host compilers.
>
>
> Does not work, or is not optimal?  Because defining _ALLOCA_INLINE_ to
> nothing and forcing no-inline with:
>
> g++ -c b.cc -fno-inline -fdump-tree-all  -O1 -fno-exceptions
>
> I still see correct code.  It's just that it's inefficient, which we
> shouldn't care because bootstrap fixes the non-GCC inlining problem :).

As alloca() frees memory at function return code cannot be "correct".

>   protected_alloca chunk(123);
>   str = (char *) chunk.pointer();
>   use(str);
>
> becomes:
>
>   :
>   protected_alloca::protected_alloca (, 123);
>   str_3 = protected_alloca::pointer ();
>   use (str_3);
>   protected_alloca::~protected_alloca ();
>   return;
>
> What am I missing?

The memory is released after internal_alloc returns.  So if you do

  protected_alloca::protected_alloca (, 123);
  ...
  foo (); // function needing some stack space or GCC spilling
  ...

you'll corrupt memory.

>>
>>> In the attached patch implementing this class I have provided another
>>> idiom
>>> for avoiding this problem:
>>>
>>>   func()
>>>   {
>>> void *ptr;
>>> protected_alloca chunk;
>>> {
>>>   chunk.alloc (999);
>>>   str = (char *) chunk.pointer ();
>>> }
>>> // OK, pointer will be freed on function exit.
>>> use (str);
>>>   }
>>>
>>> So I guess it's between annoying gotos and keeping track of multiple exit
>>> points to a function previously calling alloca, or making sure the
>>> protected_alloca object always resides in the scope where the memory is
>>> going to be used.
>>>
>>> Is there a better blessed C++ way?  If not, is this OK?
>>
>>
>> It looks like you want to replace _all_ alloca uses?  What's the point
>> in doing this
>> at all?  Just to be able to enable the warning during bootstrap?
>
>
> Well, it did cross my mind to nix anything that had 0 bounds checks, but I
> was mostly interested in things like this:
>
> gcc.c:
>   temp = env.get ("COMPILER_PATH");
>   if (temp)
> {
>   const char *startp, 

[PATCH] Remove unnecessary jump threading restriction

2016-08-05 Thread Richard Biener

There is no need to avoid threading to a loop header, the threading
code can cope with this just fine.  Noticed when working on PR72772.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-08-05  Richard Biener  

* tree-ssa-threadupdate.c (thread_block_1): Remove unnecessary
restriction on threading to a loop header.

Index: gcc/tree-ssa-threadupdate.c
===
--- gcc/tree-ssa-threadupdate.c (revision 239117)
+++ gcc/tree-ssa-threadupdate.c (working copy)
@@ -1531,10 +1531,8 @@ thread_block_1 (basic_block bb, bool nol
 threading path that crosses loop boundaries.  We do not try
 and thread this elsewhere, so just cancel the jump threading
 request by clearing the AUX field now.  */
- if ((bb->loop_father != e2->src->loop_father
-  && !loop_exit_edge_p (e2->src->loop_father, e2))
- || (e2->src->loop_father != e2->dest->loop_father
- && !loop_exit_edge_p (e2->src->loop_father, e2)))
+ if (bb->loop_father != e2->src->loop_father
+ && !loop_exit_edge_p (e2->src->loop_father, e2))
{
  /* Since this case is not handled by our special code
 to thread through a loop header, we must explicitly


  1   2   >