Re: [PATCH] Add constexpr to iterator adaptors, array and range access

2016-07-15 Thread Oleg Endo
On Sat, 2016-07-16 at 00:09 +0100, Jonathan Wakely wrote:
> This patch implements http://wg21.link/p0031 which adds constexpr to
> most operations on std::reverse_iterator, std::move_iterator,
> std::array, as well as std::advance, std::distance, and the
> range-access functions std::begin, std::rbegin etc.
> 
> Strictly speaking, those functions should only be constexpr in C++17
> and not before, but this patch makes them constexpr whenever
> possible.
> That means the changes are fully implemented for C++14 (and the
> feature-test macro is defined) but only partially implemented for
> C++11, because some of the functions can't be constexpr in C++11.
> 
> My thinking is that if the committee has decided that these functions
> *should* be constexpr, and changed them for C++17, then it doesn't
> serve users to make them non-constexpr in C++11 and C++14 just
> because
> the standard says so.
> 
> How do other people feel about that?
> 
The alternative would be to define a new _GLIBCXX17_CONSTEXPR macro
> and use it in all these places, so they're only constexpr in C++17
> (and probably for -std=gnu++14 too, but not -std=c++14).
> 
> How strict do we want to be about obeying the "implementors can't add
> constexpr" rule in these cases?

Other std libs are even more ignorant regarding this and do stuff like
defining several C++11 features and functionalities even if C++11 is
not enabled, which can be annoying at times.

One thing that could happen with those extra constexpr, is having
people accidentally writing non-C++11/14-compliant code on a newer
C++17 toolchain, although they actually wanted to write C++11/14
compliant code.

GCC itself, with its conservative toolchain requirements, is probably a
good example.  If GCC's compiler requirement is bumped from C++98/03 to
C++11 some day, most GCC developers are probably going to use a pretty
recent GCC version for GCC development.  As a consequence, silent non
-compliant constexpr related constructs might slip in over and over
again simply because people don't notice.  So everyone would need to
keep some older C++11-only toolchain version just for testing the
build.  I can imagine this to be a source of frustration.  If users
want to write C++11 or C++14 code, the toolchain should assist them in
doing so as much as possible.

I'd vote for the _GLIBCXX17_CONSTEXPR macro.

Cheers,
Oleg


[PATCH] Add constexpr to iterator adaptors, array and range access

2016-07-15 Thread Jonathan Wakely

This patch implements http://wg21.link/p0031 which adds constexpr to
most operations on std::reverse_iterator, std::move_iterator,
std::array, as well as std::advance, std::distance, and the
range-access functions std::begin, std::rbegin etc.

Strictly speaking, those functions should only be constexpr in C++17
and not before, but this patch makes them constexpr whenever possible.
That means the changes are fully implemented for C++14 (and the
feature-test macro is defined) but only partially implemented for
C++11, because some of the functions can't be constexpr in C++11.

My thinking is that if the committee has decided that these functions
*should* be constexpr, and changed them for C++17, then it doesn't
serve users to make them non-constexpr in C++11 and C++14 just because
the standard says so.

How do other people feel about that?

The alternative would be to define a new _GLIBCXX17_CONSTEXPR macro
and use it in all these places, so they're only constexpr in C++17
(and probably for -std=gnu++14 too, but not -std=c++14).

How strict do we want to be about obeying the "implementors can't add
constexpr" rule in these cases?

Here's the patch, but I'm not committing it yet.

* include/bits/range_access.h (begin, end, rbegin, rend, crbegin)
(crend): Add constexpr specifier as per P0031R0.
* include/bits/stl_iterator.h (reverse_iterator, move_iterator)
(__make_reverse_iterator, make_reverse_iterator, make_move_iterator):
Add _GLIBCXX_CONSTEXPR or _GLIBCXX14_CONSTEXPR as appropriate.
* include/bits/stl_iterator_base_funcs.h (__distance, distance)
(__advance, advance, next, prev): Likewise.
* include/std/array (array::begin, array::end, array::rbegin)
(array::rend, array::cbegin, array:cend, array::crbegin)
(array::crend, array::operator[], array::at, array::front)
(array::back, array::data): Likewise.
* testsuite/24_iterators/headers/iterator/range_access.cc: Add
constexpr.
* testsuite/24_iterators/headers/iterator/synopsis.cc: Add constexpr
to reverse_iterator operations, advance, and distance.
commit 7e46198cc511bcc8876ab9d3f11e938dbe4cfea0
Author: Jonathan Wakely 
Date:   Fri Jul 15 22:31:16 2016 +0100

Add constexpr to iterator adaptors, array and range access

* include/bits/range_access.h (begin, end, rbegin, rend, crbegin)
(crend): Add constexpr specifier as per P0031R0.
* include/bits/stl_iterator.h (reverse_iterator, move_iterator)
(__make_reverse_iterator, make_reverse_iterator, make_move_iterator):
Add _GLIBCXX_CONSTEXPR or _GLIBCXX14_CONSTEXPR as appropriate.
* include/bits/stl_iterator_base_funcs.h (__distance, distance)
(__advance, advance, next, prev): Likewise.
* include/std/array (array::begin, array::end, array::rbegin)
(array::rend, array::cbegin, array:cend, array::crbegin)
(array::crend, array::operator[], array::at, array::front)
(array::back, array::data): Likewise.
* testsuite/24_iterators/headers/iterator/range_access.cc: Add
constexpr.
* testsuite/24_iterators/headers/iterator/synopsis.cc: Add constexpr
to reverse_iterator operations, advance, and distance.

diff --git a/libstdc++-v3/include/bits/range_access.h 
b/libstdc++-v3/include/bits/range_access.h
index e2ec072..257122e 100644
--- a/libstdc++-v3/include/bits/range_access.h
+++ b/libstdc++-v3/include/bits/range_access.h
@@ -44,7 +44,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template
-inline auto
+constexpr auto
 begin(_Container& __cont) -> decltype(__cont.begin())
 { return __cont.begin(); }
 
@@ -54,7 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template
-inline auto
+constexpr auto
 begin(const _Container& __cont) -> decltype(__cont.begin())
 { return __cont.begin(); }
 
@@ -64,7 +64,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template
-inline auto
+constexpr auto
 end(_Container& __cont) -> decltype(__cont.end())
 { return __cont.end(); }
 
@@ -74,7 +74,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template
-inline auto
+constexpr auto
 end(const _Container& __cont) -> decltype(__cont.end())
 { return __cont.end(); }
 
@@ -83,7 +83,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __arr  Array.
*/
   template
-inline _GLIBCXX14_CONSTEXPR _Tp*
+constexpr _Tp*
 begin(_Tp (&__arr)[_Nm])
 { return __arr; }
 
@@ -93,7 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __arr  Array.
*/
   template
-inline _GLIBCXX14_CONSTEXPR _Tp*
+constexpr _Tp*
 end(_Tp (&__arr)[_Nm])
 { return __arr + _Nm; }
 
@@ -112,7 +112,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __cont  Container.
*/
   template

Re: [PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-15 Thread Jonathan Wakely

On 15/07/16 22:53 +0100, Jonathan Wakely wrote:

On 15/07/16 23:36 +0200, Jakub Jelinek wrote:

On Fri, Jul 15, 2016 at 10:07:03PM +0100, Jonathan Wakely wrote:

+  if (typeid (*this) == typeid(__pointer_type_info))
+{
+  *thr_obj = nullptr;
+  return true;


But you have the above store too.


That doesn't write to the exception object, it only does a single
dereference (compared to the double dereference of the racy write), so
it writes to the local variable in the PERSONALITY_FUNCTION in
eh_personality.cc

So that shouldn't race with other threads. I think.



TSan agrees. When I compile my test and yours (and include
libsupc++/pbase_type_info.cc in the executable, so the writes are also
instrumented by tsan) then I see races for the *(ptrdiff_t*)*thr_obj
writes but not the *thr_obj ones.

It's only the ptr-to-data-member case that scribbles in the actual
exception object.




Re: [PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-15 Thread Jonathan Wakely

On 15/07/16 23:36 +0200, Jakub Jelinek wrote:

On Fri, Jul 15, 2016 at 10:07:03PM +0100, Jonathan Wakely wrote:

>+  if (typeid (*this) == typeid(__pointer_type_info))
>+{
>+  *thr_obj = nullptr;
>+  return true;


But you have the above store too.


That doesn't write to the exception object, it only does a single
dereference (compared to the double dereference of the racy write), so
it writes to the local variable in the PERSONALITY_FUNCTION in
eh_personality.cc

So that shouldn't race with other threads. I think.




Re: [PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-15 Thread Jakub Jelinek
On Fri, Jul 15, 2016 at 10:07:03PM +0100, Jonathan Wakely wrote:
> >+  if (typeid (*this) == typeid(__pointer_type_info))
> >+{
> >+  *thr_obj = nullptr;
> >+  return true;

But you have the above store too.
Can't you rethrow_exception in certain threads trying to catch it as pointer
to data member and in other threads as a pointer?

#include 
#include 
#include 

std::exception_ptr p;

struct A { };

void t()
{
 try {
   std::rethrow_exception(p);
 } catch (int A::* const& e) {
   assert( e == nullptr );
 }
}

void h()
{
 try {
   std::rethrow_exception(p);
 } catch (int *& e) {
   assert( e == nullptr );
 }
}

int main()
{
 try {
   throw nullptr;
 } catch (...) {
   p = std::current_exception();
 }
 std::thread t1(t);
 std::thread t2(h);
 std::thread t3(t);
 std::thread t4(h);
 t1.join();
 t2.join();
 t3.join();
 t4.join();
}

Jakub


Re: [PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-15 Thread Jonathan Wakely

On 15/07/16 12:47 +0100, Jonathan Wakely wrote:

diff --git a/libstdc++-v3/libsupc++/pbase_type_info.cc 
b/libstdc++-v3/libsupc++/pbase_type_info.cc
index d47fb23..a2993e4 100644
--- a/libstdc++-v3/libsupc++/pbase_type_info.cc
+++ b/libstdc++-v3/libsupc++/pbase_type_info.cc
@@ -38,6 +38,31 @@ __do_catch (const type_info *thr_type,
return true;  // same type

#if __cpp_rtti
+  if (*thr_type == typeid (nullptr))
+{
+  // A catch handler for any pointer type matches nullptr_t.
+  if (typeid (*this) == typeid(__pointer_type_info))
+{
+  *thr_obj = nullptr;
+  return true;
+}
+  else if (typeid (*this) == typeid(__pointer_to_member_type_info))
+{
+  if (__pointee->__is_function_p ())
+{
+  // A pointer-to-member-function is two words  but the
+  // nullptr_t exception object at *(nullptr_t*)*thr_obj is only
+  // one word, so we can't safely return it as a PMF. FIXME.
+  return false;
+}
+  else
+{
+  *(ptrdiff_t*)*thr_obj = -1; // null pointer to data member


This store could race with accesses in other threads, if the same
exception object gets thrown in multiple threads and the value of the
pointer-to-data-member is read concurrently with the write:

#include 
#include 
#include 

std::exception_ptr p;

struct A { };

void t()
{
 try {
   std::rethrow_exception(p);
 } catch (int A::* const& e) {
   assert( e == nullptr );
 }
}

int main()
{
 try {
   throw nullptr;
 } catch (...) {
   p = std::current_exception();
 }
 std::thread t1(t);
 std::thread t2(t);
 t1.join();
 t2.join();
}


We could use __atomic_store to do the write, but the reads would still
be non-atomic. The attached patch makes __cxa_throw do that write on
the initial throw, so that a thrown nullptr_t always has the right
content to act as a null pointer-to-member.  However that adds
overhead for every throw. A more efficient solution might be for the
front-end to do that write when it sees a nullptr_t being thrown.


diff --git a/libstdc++-v3/libsupc++/eh_throw.cc b/libstdc++-v3/libsupc++/eh_throw.cc
index 9aac218..e6b155f 100644
--- a/libstdc++-v3/libsupc++/eh_throw.cc
+++ b/libstdc++-v3/libsupc++/eh_throw.cc
@@ -62,6 +62,11 @@ __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
 {
   PROBE2 (throw, obj, tinfo);
 
+#if __cpp_rtti
+  if (*tinfo == typeid(nullptr))
+*(ptrdiff_t*)obj = -1; // make it act as a null pointer-to-data-member
+#endif
+
   __cxa_eh_globals *globals = __cxa_get_globals ();
   globals->uncaughtExceptions += 1;
 
diff --git a/libstdc++-v3/libsupc++/pbase_type_info.cc b/libstdc++-v3/libsupc++/pbase_type_info.cc
index a2993e4..4cb626c 100644
--- a/libstdc++-v3/libsupc++/pbase_type_info.cc
+++ b/libstdc++-v3/libsupc++/pbase_type_info.cc
@@ -57,7 +57,7 @@ __do_catch (const type_info *thr_type,
 }
   else
 {
-  *(ptrdiff_t*)*thr_obj = -1; // null pointer to data member
+  // __cxa_throw already wrote (ptrdiff_t)-1 to the object.
   return true;
 }
 }


[PATCH, rs6000, preapproved] Remove XFAILs from g++.dg/pr70098.C and gcc.target/powerpc/pr71763.c

2016-07-15 Thread Bill Schmidt
Hi,

A recent patch from Alan Modra removed the need to XFAIL these two
test cases.  Tested on powerpc64[le]-unknown-linux-gnu, committed
to trunk and gcc-6-branch.

Thanks,
Bill


2016-07-15  Bill Schmidt  

* g++.dg/pr70098.C: Remove XFAIL for powerpc64_no_dm.
* gcc.target/powerpc/pr71763.c: Likewise.


Index: gcc/testsuite/g++.dg/pr70098.C
===
--- gcc/testsuite/g++.dg/pr70098.C  (revision 238389)
+++ gcc/testsuite/g++.dg/pr70098.C  (working copy)
@@ -2,8 +2,6 @@
 // { dg-do compile }
 // { dg-options -O2 }
 // { dg-require-effective-target c++11 }
-// { dg-xfail-if "PR70098" { lp64 && powerpc64_no_dm } }
-// { dg-prune-output ".*internal compiler error.*" }
 
 template < typename > struct traits;
 template < typename, int _Rows, int _Cols, int = 0, int = _Rows,
Index: gcc/testsuite/gcc.target/powerpc/pr71763.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71763.c  (revision 238389)
+++ gcc/testsuite/gcc.target/powerpc/pr71763.c  (working copy)
@@ -1,8 +1,6 @@
 // PR target/71763
 // { dg-do compile }
 // { dg-options "-O1 -mvsx" }
-// { dg-xfail-if "PR70098" { lp64 && powerpc64_no_dm } }
-// { dg-prune-output ".*internal compiler error.*" }
 
 int a, b;
 float c;



Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-15 Thread Andi Kleen
> >Not sure what the status for autofdo is in this case.  "make check -k"
> >is stable for me, but "make check -k -j#" gives unstable result in
> >tree-prof.exp tests.  Anything I did wrong?
> I'm seeing unstable results as well, but haven't dug into it at all.
> It's definitely autofdo testing though -- If I remove the
> gcc-auto-profile script I get consistent results from one run to the
> next.

Ok. I'll investigate.
-Andi


[PATCH] Replace references to C++0x with C++11 in comments

2016-07-15 Thread Jonathan Wakely

* include/bits/algorithmfwd.h: Change C++0x to C++11 in comments.
* include/bits/move.h: Likewise.
* include/bits/postypes.h: Likewise.
* include/debug/bitset: Likewise.
* include/ext/pb_ds/detail/type_utils.hpp: Likewise.
* include/ext/string_conversions.h: Change C++0x to __cxx11 in
comment.
* testsuite/27_io/fpos/14320-1.cc: Change C++0x to C++11 in comment.
* testsuite/util/thread/all.h: Likewise.

Committed to trunk.

commit 1eb7eb4d0abcccfaf49ebf523666e83c857a8eca
Author: redi 
Date:   Fri Jul 15 20:23:08 2016 +

Replace references to C++0x with C++11 in comments

* include/bits/algorithmfwd.h: Change C++0x to C++11 in comments.
* include/bits/move.h: Likewise.
* include/bits/postypes.h: Likewise.
* include/debug/bitset: Likewise.
* include/ext/pb_ds/detail/type_utils.hpp: Likewise.
* include/ext/string_conversions.h: Change C++0x to __cxx11 in
comment.
* testsuite/27_io/fpos/14320-1.cc: Change C++0x to C++11 in comment.
* testsuite/util/thread/all.h: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238402 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/algorithmfwd.h 
b/libstdc++-v3/include/bits/algorithmfwd.h
index 1defb1d..c648169 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -45,14 +45,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /*
 adjacent_find
-all_of (C++0x)
-any_of (C++0x)
+all_of (C++11)
+any_of (C++11)
 binary_search
 clamp (C++17)
 copy
 copy_backward
-copy_if (C++0x)
-copy_n (C++0x)
+copy_if (C++11)
+copy_n (C++11)
 count
 count_if
 equal
@@ -63,17 +63,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 find_end
 find_first_of
 find_if
-find_if_not (C++0x)
+find_if_not (C++11)
 for_each
 generate
 generate_n
 includes
 inplace_merge
-is_heap (C++0x)
-is_heap_until (C++0x)
-is_partitioned (C++0x)
-is_sorted (C++0x)
-is_sorted_until (C++0x)
+is_heap (C++11)
+is_heap_until (C++11)
+is_partitioned (C++11)
+is_sorted (C++11)
+is_sorted_until (C++11)
 iter_swap
 lexicographical_compare
 lower_bound
@@ -83,17 +83,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 merge
 min
 min_element
-minmax (C++0x)
-minmax_element (C++0x)
+minmax (C++11)
+minmax_element (C++11)
 mismatch
 next_permutation
-none_of (C++0x)
+none_of (C++11)
 nth_element
 partial_sort
 partial_sort_copy
 partition
-partition_copy (C++0x)
-partition_point (C++0x)
+partition_copy (C++11)
+partition_point (C++11)
 pop_heap
 prev_permutation
 push_heap
@@ -116,7 +116,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 set_intersection
 set_symmetric_difference
 set_union
-shuffle (C++0x)
+shuffle (C++11)
 sort
 sort_heap
 stable_partition
diff --git a/libstdc++-v3/include/bits/move.h b/libstdc++-v3/include/bits/move.h
index afcea2d..9deec42 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -1,4 +1,4 @@
-// Move, forward and identity for C++0x + swap -*- C++ -*-
+// Move, forward and identity for C++11 + swap -*- C++ -*-
 
 // Copyright (C) 2007-2016 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/include/bits/postypes.h 
b/libstdc++-v3/include/bits/postypes.h
index f63571e..bdd1f4a 100644
--- a/libstdc++-v3/include/bits/postypes.h
+++ b/libstdc++-v3/include/bits/postypes.h
@@ -41,7 +41,7 @@
 
 // XXX If  is really needed, make sure to define the macros
 // before including it, in order not to break  (and 
-// in C++0x).  Reconsider all this as soon as possible...
+// in C++11).  Reconsider all this as soon as possible...
 #if (defined(_GLIBCXX_HAVE_INT64_T) && !defined(_GLIBCXX_HAVE_INT64_T_LONG) \
  && !defined(_GLIBCXX_HAVE_INT64_T_LONG_LONG))
 
diff --git a/libstdc++-v3/include/debug/bitset 
b/libstdc++-v3/include/debug/bitset
index 1353aa3..55d3281 100644
--- a/libstdc++-v3/include/debug/bitset
+++ b/libstdc++-v3/include/debug/bitset
@@ -50,7 +50,7 @@ namespace __debug
   typedef _GLIBCXX_STD_C::bitset<_Nb> _Base;
 
 public:
-  // In C++0x we rely on normal reference type to preserve the property
+  // In C++11 we rely on normal reference type to preserve the property
   // of bitset to be use as a literal.
   // TODO: Find another solution.
 #if __cplusplus >= 201103L
diff --git a/libstdc++-v3/include/ext/pb_ds/detail/type_utils.hpp 
b/libstdc++-v3/include/ext/pb_ds/detail/type_utils.hpp
index f0c3983..602c28b 100644
--- a/libstdc++-v3/include/ext/pb_ds/detail/type_utils.hpp
+++ b/libstdc++-v3/include/ext/pb_ds/detail/type_utils.hpp
@@ -131,7 +131,7 @@ namespace __gnu_pbds
};
 };
 
-// Use 

Re: [PATCH] Remove redundant std::move in std::for_each

2016-07-15 Thread Daniel Krügler
2016-07-15 21:53 GMT+02:00 Jonathan Wakely :
> Returning a parameter will try to move anyway, so using std::move here
> is redundant (and clang even warns about it being redundant).
>
> * include/bits/stl_algo.h (for_each): Remove redundant _GLIBCXX_MOVE
> and adjust comment.

Actually I think this should also be reported as an LWG issue.

- Daniel


[PATCH] Remove redundant std::move in std::for_each

2016-07-15 Thread Jonathan Wakely

Returning a parameter will try to move anyway, so using std::move here
is redundant (and clang even warns about it being redundant).

* include/bits/stl_algo.h (for_each): Remove redundant _GLIBCXX_MOVE
and adjust comment.

Tested x86_64-linux, committed to trunk.

commit 54ee8abfe62dc97f725843f58fe896eebcc52db0
Author: redi 
Date:   Fri Jul 15 19:51:33 2016 +

Remove redundant std::move in std::for_each

* include/bits/stl_algo.h (for_each): Remove redundant _GLIBCXX_MOVE
and adjust comment.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238399 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index a0820d4..ea0b56c 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3791,7 +3791,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  @param  __first  An input iterator.
*  @param  __last   An input iterator.
*  @param  __f  A unary function object.
-   *  @return   @p __f (std::move(@p __f) in C++0x).
+   *  @return   @p __f
*
*  Applies the function object @p __f to each element in the range
*  @p [first,last).  @p __f must not modify the order of the sequence.
@@ -3806,7 +3806,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   __glibcxx_requires_valid_range(__first, __last);
   for (; __first != __last; ++__first)
__f(*__first);
-  return _GLIBCXX_MOVE(__f);
+  return __f; // N.B. [alg.foreach] says std::move(f) but it's redundant.
 }
 
   /**


Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-15 Thread Jeff Law

On 07/15/2016 02:37 AM, Bin.Cheng wrote:

On Thu, Jul 14, 2016 at 11:33 PM, Andi Kleen  wrote:

I haven't seen that. Unstable in what way?

For GCC doesn't support FDO, it run below tests as you said:

PASS: gcc.dg/tree-prof/20041218-1.c compilation,  -g
UNSUPPORTED: gcc.dg/tree-prof/20041218-1.c: Cannot run create_gcov
--binary /data/work/trunk/build/gcc/testsuite/gcc/20041218-1.gcda

Normally, it doesn't create gcov data file, thus the next test is
unsupported.  I guess in parallel test, the second test picks up gcov
data files from other process, which results in random pass.
Is it possible to not have these when fdo is supported?

Hmm, typo:  s/supported/not supported/.


I don't see how this can happen: as you can see the .gcda file name
is unique for each test case.

I think there may be problems with the perf.data, could add a postfix.
But you should only see that with real autofdo.

Don't know.  I only configured/built GCC on x86_64 with below command lines:

$ ../gcc/configure --prefix=...
--enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++
--with-build-config=bootstrap-O3 --disable-werror
$ make && make install

Not sure what the status for autofdo is in this case.  "make check -k"
is stable for me, but "make check -k -j#" gives unstable result in
tree-prof.exp tests.  Anything I did wrong?
I'm seeing unstable results as well, but haven't dug into it at all. 
It's definitely autofdo testing though -- If I remove the 
gcc-auto-profile script I get consistent results from one run to the next.


jeff


Re: [patch, fortran] PR62125 Nested select type not accepted (rejects valid)

2016-07-15 Thread Jerry DeLisle
On 07/15/2016 01:35 AM, Mikael Morin wrote:
> Le 15/07/2016 à 03:26, Jerry DeLisle a écrit :
>> Hit send too soon and forgot to fill out the subject line. Correctly
>> given here.
>> On 07/14/2016 10:47 AM, Jerry DeLisle wrote:
>>> This simple patch kindly provided by Marco solves the problem.
>>>
>>> Regression tested on x86_64-Linux, Test case provided also.
>>>
>>> OK to commit to trunk?
>>>
> Yes. Thanks.

Thanks for review Mikael!

Jerry


Re: [PATCH testsuite]XFAIL gcc.dg/tree-ssa/pr71347 on some targets

2016-07-15 Thread Jeff Law

On 07/15/2016 08:02 AM, Bin.Cheng wrote:

On Thu, Jul 14, 2016 at 6:14 PM, Jeff Law  wrote:

On 07/14/2016 10:11 AM, Bin Cheng wrote:


Hi, Test gcc.dg/tree-ssa/pr71347 failed on some targets if the two
memory references are re-written into different forms by IVOPT.  This
could be because of various reasons, for example, auto-increment
addressing mode.  Since the address expressions are of different
form, DOM fails to eliminate the redundant load at the moment.  I
think DOM should be improved to handle address expressions appearing
in different forms (at least for simple cases).  For now, I will
XFAIL this test indicating a possible enhancement.


FWIW, Adam Lawrence did some work in this area a few months ago. Essentially
enhancing DOM to be able to detect certain MEM_REF and ARRAY_REF accesses
were in fact equivalent.

Hi Jeff,
Do you recall any PR/patch of this?  I didn't find it in mail searching.

The commit references 63679.

Jeff



Re: [PATCH, rs6000] Fix {div,mul}kc3-1.c for older hardware

2016-07-15 Thread Segher Boessenkool
On Fri, Jul 15, 2016 at 11:40:34AM -0500, Bill Schmidt wrote:
> For the subject tests, I neglected to ensure they would be skipped on
> older hardware.  This patch fixes that.  Tested on
> powerpc64-unknown-linux-gnu on POWER7 to verify the tests are
> unsupported there, and on powerpc64le-unknown-linux-gnu on POWER8
> to verify they still run there.  Ok for trunk and gcc-6-branch?

[ This shows up as the __mulkc3 etc. functions not existing at all ].

Okay.  Thanks,


Segher


[C++ PATCH] Allow frexp etc. builtins in c++14 constexpr (PR c++/50060)

2016-07-15 Thread Jakub Jelinek
Hi!

While in C++11, builtins returning two results, one of them by dereferencing
a pointer argument can't be constexpr, in my understanding in C++14
generalized constexprs they can.

So, this patch tweaks cxx_eval_builtin_function_call so that it handles how
builtins.c folds these builtins (i.e. COMPOUND_EXPR with first operand
being *arg = const1 and second operand const2, optionally all wrapped into a
NON_LVALUE_EXPR.

In addition, I've noticed that the lval argument is passed down to
evaluation of arguments, that doesn't make sense to me, IMHO arguments
should be always evakyated as rvalues (and for non-builtins they are).

sincos (which has stores 2 results through pointers) is folded earlier into
cexpi and thus worked already before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-07-15  Jakub Jelinek  

PR c++/50060
* constexpr.c (cxx_eval_builtin_function_call): Pass false as lval
when evaluating call arguments, make the lval argument nameless.  For
C++14 and later, if new_call is COMPOUND_EXPR with a assignment and
constant, evaluate the assignment and return the constant.

* g++.dg/cpp1y/constexpr-50060.C: New test.

--- gcc/cp/constexpr.c.jj   2016-07-11 22:18:01.0 +0200
+++ gcc/cp/constexpr.c  2016-07-15 15:27:50.820085561 +0200
@@ -1078,8 +1078,7 @@ get_nth_callarg (tree t, int n)
 
 static tree
 cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, tree fun,
-   bool lval,
-   bool *non_constant_p, bool *overflow_p)
+   bool, bool *non_constant_p, bool *overflow_p)
 {
   const int nargs = call_expr_nargs (t);
   tree *args = (tree *) alloca (nargs * sizeof (tree));
@@ -1105,7 +1104,7 @@ cxx_eval_builtin_function_call (const co
   for (i = 0; i < nargs; ++i)
 {
   args[i] = cxx_eval_constant_expression (_ctx, CALL_EXPR_ARG (t, i),
- lval, , );
+ false, , );
   if (bi_const_p)
/* For __built_in_constant_p, fold all expressions with constant values
   even if they aren't C++ constant-expressions.  */
@@ -1119,6 +1118,27 @@ cxx_eval_builtin_function_call (const co
   /* Fold away the NOP_EXPR from fold_builtin_n.  */
   new_call = fold (new_call);
   force_folding_builtin_constant_p = save_ffbcp;
+
+  if (cxx_dialect >= cxx14)
+{
+  tree r = new_call;
+  if (TREE_CODE (r) == NON_LVALUE_EXPR)
+   r = TREE_OPERAND (r, 0);
+  if (TREE_CODE (r) == COMPOUND_EXPR
+ && TREE_CODE (TREE_OPERAND (r, 0)) == MODIFY_EXPR
+ && reduced_constant_expression_p (TREE_OPERAND (TREE_OPERAND (r, 0),
+ 1)))
+   {
+ /* The frexp, modf, remquo and lgamma_r builtins (and their variants)
+with  as last argument are folded into
+(var = const1), const2, sometimes wrapped into
+NON_LVALUE_EXPR.  */
+ cxx_eval_constant_expression (_ctx, TREE_OPERAND (r, 0),
+   false, non_constant_p, overflow_p);
+ new_call = TREE_OPERAND (r, 1);
+   }
+}
+
   VERIFY_CONSTANT (new_call);
   return new_call;
 }
--- gcc/testsuite/g++.dg/cpp1y/constexpr-50060.C.jj 2016-07-15 
15:34:12.469124944 +0200
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-50060.C2016-07-15 
15:59:36.285303078 +0200
@@ -0,0 +1,100 @@
+// PR c++/50060
+// { dg-do compile { target c++14 } }
+
+// sincos and lgamma_r aren't available in -std=c++14,
+// only in -std=gnu++14.  Use __builtin_* in that case.
+extern "C" void sincos (double, double *, double *);
+extern "C" double frexp (double, int *);
+extern "C" double modf (double, double *);
+extern "C" double remquo (double, double, int *);
+extern "C" double lgamma_r (double, int *);
+
+constexpr double
+f0 (double x)
+{
+  double y {};
+  double z {};
+  __builtin_sincos (x, , );
+  return y;
+}
+
+constexpr double
+f1 (double x)
+{
+  double y {};
+  double z {};
+  __builtin_sincos (x, , );
+  return z;
+}
+
+constexpr double
+f2 (double x)
+{
+  int y {};
+  return frexp (x, );
+}
+
+constexpr int
+f3 (double x)
+{
+  int y {};
+  frexp (x, );
+  return y;
+}
+
+constexpr double
+f4 (double x)
+{
+  double y {};
+  return modf (x, );
+}
+
+constexpr double
+f5 (double x)
+{
+  double y {};
+  modf (x, );
+  return y;
+}
+
+constexpr double
+f6 (double x, double y)
+{
+  int z {};
+  return remquo (x, y, );
+}
+
+constexpr int
+f7 (double x, double y)
+{
+  int z {};
+  remquo (x, y, );
+  return z;
+}
+
+constexpr double
+f8 (double x)
+{
+  int y {};
+  return __builtin_lgamma_r (x, );
+}
+
+constexpr int
+f9 (double x)
+{
+  int y {};
+  __builtin_lgamma_r (x, );
+  return y;
+}
+
+static_assert (f0 (0.0) == 0.0, "");
+static_assert (f1 (0.0) == 1.0, "");
+static_assert (f2 (6.5) == 0.8125, "");

Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-15 Thread Segher Boessenkool
On Mon, Jul 11, 2016 at 06:54:44AM -0400, David Edelsohn wrote:
> Should we backport this?  At least Alan's UNSPEC_DOLOOP part?

Alan backported this to 6 (I unfortunately removed gcc-patches
from cc:).


Segher


C++ PATCH for c++/71092 (ICE with array and constexpr)

2016-07-15 Thread Jason Merrill
Another instance of the problem with our handling of VEC_INIT_EXPR: it
isn't expanded until gimplification time, at which point some of the
front end information has started to be thrown away.  This really
needs to be fixed, but in the meantime we can work around the problem
like this.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 89a2f27053165b07898390a4312b0d47336d9128
Author: Jason Merrill 
Date:   Fri Jul 15 00:52:58 2016 -0400

PR c++/71092 - ICE with array and constexpr.

* constexpr.c (cxx_eval_call_expression): Fail quietly when cgraph
threw away DECL_SAVED_TREE.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b9834a7..cb8ece0 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1504,9 +1504,19 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 }
   else
 {
-  if (!result || result == error_mark_node)
+  if (result && result != error_mark_node)
+   /* OK */;
+  else if (!DECL_SAVED_TREE (fun))
+   {
+ /* When at_eof >= 2, cgraph has started throwing away
+DECL_SAVED_TREE, so fail quietly.  FIXME we get here because of
+late code generation for VEC_INIT_EXPR, which needs to be
+completely reconsidered.  */
+ gcc_assert (at_eof >= 2 && ctx->quiet);
+ *non_constant_p = true;
+   }
+  else
{
- gcc_assert (DECL_SAVED_TREE (fun));
  tree body, parms, res;
 
  /* Reuse or create a new unshared copy of this function's body.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array17.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-array17.C
new file mode 100644
index 000..c6afa50
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array17.C
@@ -0,0 +1,61 @@
+// PR c++/71092
+// { dg-do compile { target c++11 } }
+
+template  struct A { using type = _Default; };
+template  class>
+using __detected_or = A<_Default>;
+template  class _Op>
+using __detected_or_t = typename __detected_or<_Default, _Op>::type;
+template  struct B { typedef _Tp value_type; };
+struct C {
+  template  using __pointer = typename _Tp::pointer;
+};
+template  struct J : C {
+  using pointer = __detected_or_t;
+};
+template  void _Construct(_T1 *) { new _T1; }
+struct D {
+  template 
+  static _ForwardIterator __uninit_default_n(_ForwardIterator p1, _Size) {
+_Construct(p1);
+  }
+};
+template 
+void __uninitialized_default_n(_ForwardIterator p1, _Size) {
+  D::__uninit_default_n(p1, 0);
+}
+template 
+void __uninitialized_default_n_a(_ForwardIterator p1, _Size, _Tp) {
+  __uninitialized_default_n(p1, 0);
+}
+template  struct __shared_ptr {
+  constexpr __shared_ptr() : _M_ptr(), _M_refcount() {}
+  int _M_ptr;
+  int _M_refcount;
+};
+template  struct F {
+  typedef _Alloc _Tp_alloc_type;
+  struct G {
+typename J<_Tp_alloc_type>::pointer _M_start;
+G(_Tp_alloc_type);
+  };
+  F(int, _Alloc p2) : _M_impl(p2) {}
+  G _M_impl;
+};
+template > struct K : F<_Alloc> {
+  typedef _Alloc allocator_type;
+  K(int, allocator_type p2 = allocator_type()) : F<_Alloc>(0, p2) {
+__uninitialized_default_n_a(this->_M_impl._M_start, 0, 0);
+  }
+};
+struct H {
+  H();
+  struct I {
+__shared_ptr trigger[1];
+  };
+  __shared_ptr resetTrigger_;
+  K states_;
+  __shared_ptr triggerManager_;
+};
+__shared_ptr a;
+H::H() : states_(0), triggerManager_(a) {}


C++ PATCH for c++/71117 (wrong overload resolution with generic lambda)

2016-07-15 Thread Jason Merrill
My fix for add_template_conv_candidate, implementing my suggestion for
core 2189, broke calls to a function object that inherits from a
generic lambda.  The committee hasn't had a chance to consider this
issue yet, but for now let's disable the template conversion candidate
when there's a viable operator() candidate.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit b56248900900e3a9f85670d3cf0bf4acdfae34f4
Author: Jason Merrill 
Date:   Thu Jul 14 16:32:27 2016 -0400

PR c++/71117 - core 2189 and generic lambda

* call.c (add_template_conv_candidate): Disable if there are
viable candidates.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index a3c5008..889852f 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3204,6 +3204,12 @@ add_template_conv_candidate (struct z_candidate 
**candidates, tree tmpl,
 tree return_type, tree access_path,
 tree conversion_path, tsubst_flags_t complain)
 {
+  /* Making this work broke PR 71117, so until the committee resolves core
+ issue 2189, let's disable this candidate if there are any viable call
+ operators.  */
+  if (any_strictly_viable (*candidates))
+return NULL;
+
   return
 add_template_candidate_real (candidates, tmpl, NULL_TREE, NULL_TREE,
 NULL_TREE, arglist, return_type, access_path,
diff --git a/gcc/testsuite/g++.dg/cpp0x/conv-tmpl1.C 
b/gcc/testsuite/g++.dg/cpp0x/conv-tmpl1.C
index 7f866da..eb40dd6 100644
--- a/gcc/testsuite/g++.dg/cpp0x/conv-tmpl1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/conv-tmpl1.C
@@ -1,3 +1,4 @@
+// Test for Core 2189.
 // { dg-do compile { target c++11 } }
 
 template 
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-conv2.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-conv2.C
new file mode 100644
index 000..5528455
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-conv2.C
@@ -0,0 +1,26 @@
+// PR c++/71117
+// { dg-do compile { target c++14 } }
+
+template  T&& declval() noexcept;
+template 
+constexpr bool is_same = false;
+template 
+constexpr bool is_same = true;
+
+template 
+struct indirected : F {
+indirected(F f) : F(f) {}
+template 
+auto operator()(I i) -> decltype(declval()(*i)) {
+return static_cast(*this)(*i);
+}
+};
+
+int main() {
+auto f = [](auto rng) {
+static_assert(is_same, "");
+return 42;
+};
+indirected i(f);
+static_assert(is_same())), int>, "");
+}


Re: [libstdc++] Add C++17clamp

2016-07-15 Thread Ed Smith-Rowland



OK for trunk, thanks.


I didn't see a feature test in any of the SD-6 papers or P0025.


p0096r3 proposes __cpp_lib_clamp = 201603.

I added the feature macro and committed the attached as 238383.

Thanks,

Ed

2016-07-15  Edward Smith-Rowland  <3dw...@verizon.net>

Implement C++17 P0025 clamp.
* include/bits/algorithmfwd.h: Declare clamp overloads.
* include/bits/stl_algo.h: Implement clamp.  Feature __cpp_lib_clamp.
* testsuite/25_algorithms/clamp/1.cc: New test.
* testsuite/25_algorithms/clamp/2.cc: New test.
* testsuite/25_algorithms/clamp/constexpr.cc: New test.
* testsuite/25_algorithms/clamp/requirements/explicit_instantiation/
1.cc: New test.
* testsuite/25_algorithms/clamp/requirements/explicit_instantiation/
pod.cc: New test.

Index: include/bits/algorithmfwd.h
===
--- include/bits/algorithmfwd.h (revision 238382)
+++ include/bits/algorithmfwd.h (revision 238383)
@@ -48,6 +48,7 @@
 all_of (C++0x)
 any_of (C++0x)
 binary_search
+clamp (C++17)
 copy
 copy_backward
 copy_if (C++0x)
@@ -208,6 +209,18 @@
 bool 
 binary_search(_FIter, _FIter, const _Tp&, _Compare);
 
+#if __cplusplus > 201402L
+  template
+_GLIBCXX14_CONSTEXPR
+const _Tp&
+clamp(const _Tp&, const _Tp&, const _Tp&);
+
+  template
+_GLIBCXX14_CONSTEXPR
+const _Tp&
+clamp(const _Tp&, const _Tp&, const _Tp&, _Compare);
+#endif
+
   template
 _OIter 
 copy(_IIter, _IIter, _OIter);
Index: include/bits/stl_algo.h
===
--- include/bits/stl_algo.h (revision 238382)
+++ include/bits/stl_algo.h (revision 238383)
@@ -3698,8 +3698,47 @@
   return std::__is_permutation(__first1, __last1, __first2, __last2,
   __gnu_cxx::__ops::__iter_comp_iter(__pred));
 }
-#endif
 
+#if __cplusplus > 201402L
+
+#define __cpp_lib_clamp 201603
+
+  /**
+   *  @brief  Returns the value clamped between lo and hi.
+   *  @ingroup sorting_algorithms
+   *  @param  __val  A value of arbitrary type.
+   *  @param  __lo   A lower limit of arbitrary type.
+   *  @param  __hi   An upper limit of arbitrary type.
+   *  @return max(__val, __lo) if __val < __hi or min(__val, __hi) otherwise.
+   */
+  template
+constexpr const _Tp&
+clamp(const _Tp& __val, const _Tp& __lo, const _Tp& __hi)
+{
+  __glibcxx_assert(!(__hi < __lo));
+  return (__val < __lo) ? __lo : (__hi < __val) ? __hi : __val;
+}
+
+  /**
+   *  @brief  Returns the value clamped between lo and hi.
+   *  @ingroup sorting_algorithms
+   *  @param  __val   A value of arbitrary type.
+   *  @param  __loA lower limit of arbitrary type.
+   *  @param  __hiAn upper limit of arbitrary type.
+   *  @param  __comp  A comparison functor.
+   *  @return max(__val, __lo, __comp) if __comp(__val, __hi)
+   * or min(__val, __hi, __comp) otherwise.
+   */
+  template
+constexpr const _Tp&
+clamp(const _Tp& __val, const _Tp& __lo, const _Tp& __hi, _Compare __comp)
+{
+  __glibcxx_assert(!__comp(__hi, __lo));
+  return __comp(__val, __lo) ? __lo : __comp(__hi, __val) ? __hi : __val;
+}
+#endif // C++17
+#endif // C++14
+
 #ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /**
*  @brief Shuffle the elements of a sequence using a uniform random
Index: testsuite/25_algorithms/clamp/1.cc
===
--- testsuite/25_algorithms/clamp/1.cc  (nonexistent)
+++ testsuite/25_algorithms/clamp/1.cc  (revision 238383)
@@ -0,0 +1,48 @@
+// { dg-options "-std=gnu++17" }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+
+void test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  const int x = std::clamp(1, 2, 4);
+  const int y = std::clamp(3, 2, 4);
+  const int z = std::clamp(5, 2, 4);
+  VERIFY( x == 2 );
+  VERIFY( y == 3 );
+  VERIFY( z == 4 );
+
+  const int xc = std::clamp(1, 2, 4, std::greater());
+  const int yc = std::clamp(3, 2, 4, std::greater());
+  const int zc = std::clamp(5, 2, 4, std::greater());
+  VERIFY( xc == 4 );
+  VERIFY( yc == 2 );
+  

Re: [Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-07-15 Thread Alessandro Fanfarillo
Third *PING*

2016-07-04 16:46 GMT-06:00 Alessandro Fanfarillo :
> * PING *
>
> 2016-06-21 10:59 GMT-06:00 Alessandro Fanfarillo :
>> * PING *
>>
>> 2016-06-06 15:05 GMT-06:00 Alessandro Fanfarillo :
>>> Dear all,
>>>
>>> please find in attachment the first patch (of n) for the FAILED IMAGES
>>> capability defined in the coarray TS 18508.
>>> The patch adds support for three new intrinsic functions defined in
>>> the TS for simulating a failure (fail image), checking an image status
>>> (image_status) and getting the list of failed images (failed_images).
>>> The patch has been built and regtested on x86_64-pc-linux-gnu.
>>>
>>> Ok for trunk?
>>>
>>> Alessandro


Re: [PATCH GCC]Remove support for -funsafe-loop-optimizations

2016-07-15 Thread Richard Biener
On July 15, 2016 7:16:42 PM GMT+02:00, Bernd Schmidt  
wrote:
>On 07/15/2016 07:07 PM, Bin Cheng wrote:
>
>> Bootstrap and test on x86_64.  Is it OK?
>
>If you do this you'll also need to remove the use in config/bfin.

OK with that change.

Richard.

>
>Bernd




Re: [PATCH GCC]Remove support for -funsafe-loop-optimizations

2016-07-15 Thread Bernd Schmidt

On 07/15/2016 07:07 PM, Bin Cheng wrote:


Bootstrap and test on x86_64.  Is it OK?


If you do this you'll also need to remove the use in config/bfin.


Bernd


[PATCH GCC]Remove support for -funsafe-loop-optimizations

2016-07-15 Thread Bin Cheng
Hi,
This patch removes support for -funsafe-loop-optimizations, as well as 
-Wunsafe-loop-optimizations.  By its name, this option does unsafe 
optimizations by assuming all loops must terminate and doesn't wrap.  
Unfortunately, it's not as useful as expected because:
1) Simply assuming loop must terminate isn't enough.  What we really want is to 
analyze scalar evolution and loop niter bound under such assumptions.  This 
option does nothing in this aspect.
2) IIRC, this option generates bogus code for some common programs, that's why 
it's disabled by default even at Ofast level.

After I sent patches handling possible infinite loops in both (scev/niter) 
analyzer and vectorizer, it's a natural step to remove such options in GCC.  
This patch does so by deleting code for -funsafe-loop-optimizations, as well as 
-Wunsafe-loop-optimizations.  It also deletes the two now useless tests, while 
the option interface is preserved for backward compatibility purpose.

Bootstrap and test on x86_64.  Is it OK?

Thanks,
bin

2016-07-14  Bin Cheng  

* common.opt (funsafe-loop-optimizations): Mark ignore.
(Wunsafe-loop-optimizations): Ditto.
* doc/invoke.texi (funsafe-loop-optimizations): Remove.
(Wunsafe-loop-optimizations): Ditto.
* loop-iv.c (get_simple_loop_desc): Remove support for
funsafe-loop-optimizations.
* tree-ssa-loop-niter.h (number_of_iterations_exit): Remove param.
* tree-ssa-loop-niter.c (number_of_iterations_exit): Remove param.
(finite_loop_p): Remove support for funsafe-loop-optimizations.
(find_loop_niter): Remove arg for number_of_iterations_exit call.
(estimate_numbers_of_iterations_loop): Ditto.
* ipa-inline-analysis.c (estimate_function_body_sizes): Ditto.
* predict.c (predict_loops): Ditto.
* tree-parloops.c (try_get_loop_niter): Ditto.
* tree-scalar-evolution.c (number_of_latch_executions): Ditto.
* tree-ssa-loop-ivcanon.c (remove_redundant_iv_tests): Ditto.
* tree-ssa-loop-ivopts.c (niter_for_exit): Ditto.
* tree-ssa-loop-manip.c (can_unroll_loop_p): Ditto.

gcc/testsuite/ChangeLog
2016-07-14  Bin Cheng  

* gcc.dg/tree-ssa/pr19210-1.c: Delete.
* gcc.dg/tree-ssa/pr19210-2.c: Delete.
diff --git a/gcc/common.opt b/gcc/common.opt
index a7c5125..331e1da 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -625,8 +625,8 @@ Common Var(warn_null_dereference) Warning
 Warn if dereferencing a NULL pointer may lead to erroneous or undefined 
behavior.
 
 Wunsafe-loop-optimizations
-Common Var(warn_unsafe_loop_optimizations) Warning
-Warn if the loop cannot be optimized due to nontrivial assumptions.
+Common Ignore
+Does nothing.  Preserved for backward compatibility.
 
 Wmissing-noreturn
 Common Warning Alias(Wsuggest-attribute=noreturn)
@@ -2500,8 +2500,8 @@ Perform loop unrolling for all loops.
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
 funsafe-loop-optimizations
-Common Report Var(flag_unsafe_loop_optimizations) Optimization
-Allow loop optimizations to assume that the loops behave in normal way.
+Common Ignore
+Does nothing.  Preserved for backward compatibility.
 
 fassociative-math
 Common Report Var(flag_associative_math) SetByCombined Optimization
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca8c1b4..4241956 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -302,8 +302,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wswitch  -Wswitch-bool  -Wswitch-default  -Wswitch-enum @gol
 -Wswitch-unreachable  -Wsync-nand @gol
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
--Wtype-limits  -Wundef @gol
--Wuninitialized  -Wunknown-pragmas  -Wunsafe-loop-optimizations @gol
+-Wtype-limits  -Wundef -Wuninitialized  -Wunknown-pragmas @gol
 -Wunsuffixed-float-constants  -Wunused  -Wunused-function @gol
 -Wunused-label  -Wunused-local-typedefs -Wunused-parameter @gol
 -Wno-unused-result -Wunused-value @gol -Wunused-variable @gol
@@ -414,7 +413,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
 -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
--funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
+-funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
 -fweb -fwhole-program -fwpa -fuse-linker-plugin @gol
 --param @var{name}=@var{value}
@@ -4986,14 +4985,6 @@ If the stack usage is (partly) dynamic and not bounded, 
it's:
 @end smallexample
 @end itemize
 
-@item -Wunsafe-loop-optimizations
-@opindex Wunsafe-loop-optimizations
-@opindex Wno-unsafe-loop-optimizations
-Warn if the loop cannot be optimized because the compiler cannot
-assume anything on the bounds of the loop indices.  With

New Swedish PO file for 'gcc' (version 6.1.0)

2016-07-15 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-6.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: RFA: new pass to warn on questionable uses of alloca() and VLAs

2016-07-15 Thread Aldy Hernandez

On 07/11/2016 01:56 PM, Martin Sebor wrote:

On 07/11/2016 09:40 AM, Aldy Hernandez wrote:

On 07/11/2016 11:08 AM, Martin Sebor wrote:


Hey, I'm all for different options.  It would definitely be easier for
me :).  I wast trying really hard to use -Wvla= and keep the current
-Wvla meaning the same.  Though I see your point about using -Wvla* but
using different variables for representing -Wvla and -Wvla=blah.

The easiest thing would be:

-Wvla: Current behavior (warn on everything).

-Wvla-larger-than=: Warn on unbounded VLA and bounded VLAs > .

-Walloca: Warn on every use of alloca (analogous to -Wvla).

-Walloca-larger-than=: Warn on unbounded alloca and bounded allocas
 > .

This seems straightforward and avoids all the overloading you see.
Would
this be ok with everyone?


I like the consistency.

Given the common root with -Wlarger-than=N, what do envision the effect
of setting -Wlarger-than=N to be on the two new options?

FWIW, I tend to view -Wlarger-than= and certainly -Wframe-larger-than=
not warning on alloca larger than N as a defect.  It could be fixed by
simply hooking -Walloca-larger-than= up to it.


I'm not touching any of these independent options.  -Wvla-larger-than=
and -Walloca-larger-than= will be implemented independently of whatever
-Wlarger-than= and -Wframe-larger-than=.  Any problems with these last
two options can be treated indpendently (PRs).


Strictly speaking there is no defect in -Wframe-larger-than= because
it's documented not to warn for alloca or VLAs.  I asked because it
seems like an unnecessary (and IMO undesirable) limitation that could
be easily removed as part of this patch.  Not removing it, OTOH, and
providing a separate solution, feels like highlighting the limitation.
With the new options, users interested in detecting all forms of
excessive stack allocation will be able to do so but not using
a single option.  Instead, they will need to use all three.


I'll address this as a follow-up patch.

Aldy



Re: RFA: new pass to warn on questionable uses of alloca() and VLAs

2016-07-15 Thread Aldy Hernandez

On 07/10/2016 06:09 PM, Martin Sebor wrote:

On 07/08/2016 05:48 AM, Aldy Hernandez wrote:

[New thread now that I actually have a tested patch :)].

...

I have overhauled the options and added extensive documentation to
invoke.texi explaining them.  See the included testcases.  I have tried
to add a testcase for everything the pass currently handles.

In the interest of keeping a consistent relationship with -Wvla, we now
have:

-Walloca:Warn on every use of alloca (not VLAs).
-Walloca=999:Warn on unbounded uses of alloca, bounded uses with
 no known limit, and bounded uses where the number of
 bytes is greater than 999.
-Wvla:Behaves as currently (warn on every use of VLAs).
-Wvla=999:Similar to -Walloca=999, but for -Wvla.

Notice plain -Walloca doesn't have a default, and just warns on every
use of alloca, just like -Wvla does for VLAs.  This way we can be
consistent with -Wvla, and just add the -Wvla=999 variant.


I like it!


This patch depends on range information, which is less than stellar past
the VRP pass.  To get better range information, we'd have to incorporate
this somehow into the VRP pass and use the ASSERT_EXPRs therein.  And
still, VRP gets awfully confused with PHI merge points.  Andrew Macleod
is working on a new fancy range info infrastructure that should
eliminate a lot of the false positives we may get and/or help us
diagnose a wider range of cases.  Hence the reason that I'm not bending
over backwards to incorporate this pass into tree-vrp.c.

FYI, I have a few XFAILed tests in this patch, to make sure we don't
lose track of possible improvements (which in this case should be
handled by better range info).  What's the blessed way of doing this?
Are adding new XFAILed tests acceptable?

I have regstrapped this patch on x86-64 Linux, and have tested the
resulting compiler by building glibc with:

/source/glibc/configure --prefix=/usr CC="/path/bin/gcc -Walloca=5000"
--disable-werror

This of course, pointed out all sorts of interesting things!

Fire away!


I've played with the patch a bit over the weekend and have a few
comments and suggestions (I hope you won't regret encouraging me :)


Not at all.  Your feedback is quite valuable.


I like the consistency between -Walloca and -Wvla! (And despite
the volume of my remaining comments, the rest of the patch too!


Well, after Manuel's comments I decided to split things up as previously 
discussed:


-Wvla: warn on any VLA uses (as currently on mainline).
-Wvla-larger-than=N: warn on unbounded use of VLAs, etc.
-Walloca: same as -Wvla but for alloca.
-Walloca-larger-than=N: same as -Wvla-larger-than=N but for alloca.



1) Optimization.  Without -O2 GCC prints:

   sorry, unimplemented: -Walloca ignored without -O2


Changed to only sorry() on the -Wvla-larger-than=* and 
-Walloca-larger-than=* options.  The -Wvla and -Walloca warnings work 
without optimization as you suggest below.




It seems that a warning would be more appropriate here than
a hard error, but the checker could, and I would say should, be
made available (in a limited form) without optimization because
  -Walloca with no argument doesn't rely on it.  I suspect in this
form, -Walloca is probably mainly going to be useful as a mechanism
to enforce a "thou shall not use alloca" policy, though not much
more beyond that.  Some development processes only require that
code build without optimization in order to allow a commit and
do more extensive testing with optimization during continuous
integration, and not enabling it at -O0 would make it difficult
to adopt the warning on projects that use such a process.


Done.  -Walloca and -Wvla warn on any use of alloca and VLAs 
accordingly, with or without optimization.  I sorry() on the bounded cases.




2) When passed an argument of a signed type, GCC prints

   warning: cast from signed type in alloca

even though there is no explicit cast in the code.  It may not
be obvious why the conversion is a problem in this context.  I
would suggest to rephrase the warning along the lines of
-Wsign-conversion which prints:

   conversion to ‘long unsigned int’ from ‘int’ may change the sign of
the result

and add why it's a potential problem.  Perhaps something like:

   argument to alloca may be too large due to conversion from
   'int to 'long unsigned int'


Fixed:

void
g2 (short int n)
{
  void *p;
  p = __builtin_alloca (n);
  f (p);
}

b.c:9:5: warning: argument to alloca may be too large due to conversion 
from 'short int' to 'long unsigned int' [-Walloca-larger-than=]

   p = __builtin_alloca (n);
   ~~^~

I wonder whether we could do all this in the front-ends as in 
-Wsign-conversion, but this is something that can be done as a follow-up 
if we really want it.




(In addition, assuming one accepts the lack of range checking and
constant propagation, this aspect of -Walloca also doesn't need
optimization.)


I did not do this.  It is technically possible, but I 

C++ PATCH for c++/71495 (bogus note in SFINAE)

2016-07-15 Thread Jason Merrill
"complain" can be non-zero but not call for any diagnostics.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 63a7ba56bc105e90e64a8c273d0422d96752ad73
Author: Jason Merrill 
Date:   Thu Jul 14 14:02:57 2016 -0400

PR c++/71495 - spurious note during SFINAE.

* call.c (convert_like_real): Mask complain.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 9b02814..a3c5008 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6634,7 +6634,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, 
int argnum,
   expr = decay_conversion (expr, complain);
   if (expr == error_mark_node)
{
- if (complain)
+ if (complain & tf_error)
{
  maybe_print_user_conv_context (convs);
  if (fn)
diff --git a/gcc/testsuite/g++.dg/cpp0x/sfinae57.C 
b/gcc/testsuite/g++.dg/cpp0x/sfinae57.C
new file mode 100644
index 000..975a330
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/sfinae57.C
@@ -0,0 +1,16 @@
+// PR c++/71495
+// { dg-do compile { target c++11 } }
+
+struct A;
+template  void f(T);  // { dg-bogus "initializing" }
+template  T&& declval();
+struct B
+{
+  template  static decltype(f(declval())) g(int);
+  template  void g(...);
+} b;
+
+int main()
+{
+  b.g(42);
+}


C++ PATCH for c++/71511 (ICE on decltype scope in declaration)

2016-07-15 Thread Jason Merrill
Using decltype as the scope of a declaration is ill-formed like using
a template parameter, so we just need to add the case.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8e3fc6fef25c9ccddea0097216bf2989e10d9cb5
Author: Jason Merrill 
Date:   Thu Jul 14 12:34:28 2016 -0400

PR c++/71511 - ICE on decltype scope in declaration.

* typeck2.c (cxx_incomplete_type_diagnostic): Handle DECLTYPE_TYPE.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index b1206c0..b9dc56d 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -537,6 +537,7 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
   break;
 
 case TYPENAME_TYPE:
+case DECLTYPE_TYPE:
   emit_diagnostic (diag_kind, loc, 0,
   "invalid use of dependent type %qT", type);
   break;
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype65.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype65.C
new file mode 100644
index 000..85f635a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype65.C
@@ -0,0 +1,11 @@
+// PR c++/71511
+// { dg-do compile { target c++11 } }
+
+template < typename T > 
+class A 
+{   
+  static int i;   
+};
+
+//okay: template < typename T > int A ::i = 100;
+template < typename T > int decltype (A < T > ())::i = 100; // { dg-error 
"decltype" }


C++ PATCH for c++/71513 (alignas on member enum)

2016-07-15 Thread Jason Merrill
Thinko in the tsubst_attributes logic.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 9460ce06baf4d47d44aa7111f991bb50e6644f67
Author: Jason Merrill 
Date:   Thu Jul 14 10:56:25 2016 -0400

PR c++/71513 - alignas on member enum in template

* pt.c (tsubst_attributes): Fix loop logic.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index de70fb2..1fbf546 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9713,20 +9713,23 @@ tsubst_attributes (tree attributes, tree args,
   }
 
   if (last_dep)
-for (tree *p =  *p; p = _CHAIN (*p))
+for (tree *p =  *p; )
   {
tree t = *p;
if (ATTR_IS_DEPENDENT (t))
  {
tree subst = tsubst_attribute (t, NULL, args, complain, in_decl);
-   if (subst == t)
- continue;
-   *p = subst;
-   do
- p = _CHAIN (*p);
-   while (*p);
-   *p = TREE_CHAIN (t);
+   if (subst != t)
+ {
+   *p = subst;
+   do
+ p = _CHAIN (*p);
+   while (*p);
+   *p = TREE_CHAIN (t);
+   continue;
+ }
  }
+   p = _CHAIN (*p);
   }
 
   return attributes;
diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas7.C 
b/gcc/testsuite/g++.dg/cpp0x/alignas7.C
new file mode 100644
index 000..a209250
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alignas7.C
@@ -0,0 +1,13 @@
+// PR c++/71513
+// { dg-do compile { target c++11 } }
+
+template < int N, typename T >
+struct A
+{ 
+  enum alignas (N) E : T;
+};
+
+#define SA(X) static_assert((X), #X)
+
+constexpr int al = alignof(double);
+SA(alignof(A::E) == al);


C++ PATCH for c++/71604 (type definition in range for)

2016-07-15 Thread Jason Merrill
This was crashing because the binding hijinks we do to handle the
range-for-declaration weren't dealing well with a class definition.
This patch fixes that, and also adds a diagnostic since that turns out
to be ill-formed.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit adb66c0085a034eddcb7d7467af372331c8db1be
Author: Jason Merrill 
Date:   Thu Jul 14 00:50:14 2016 -0400

PR c++/71604 - type definition in range-based for

PR c++/54430
* parser.c (cp_parser_range_for): Modify IDENTIFIER_BINDING directly.
(cp_parser_simple_declaration): Diagnose type definition in
for-range-declaration.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ef35aa9..69fb060 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -11187,11 +11187,17 @@ cp_parser_range_for (cp_parser *parser, tree scope, 
tree init, tree range_decl,
 bool ivdep)
 {
   tree stmt, range_expr;
+  cxx_binding *binding = NULL;
+  tree name = NULL_TREE;
 
   /* Get the range declaration momentarily out of the way so that
  the range expression doesn't clash with it. */
   if (range_decl != error_mark_node)
-pop_binding (DECL_NAME (range_decl), range_decl);
+{
+  name = DECL_NAME (range_decl);
+  binding = IDENTIFIER_BINDING (name);
+  IDENTIFIER_BINDING (name) = binding->previous;
+}
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
 {
@@ -11203,7 +11209,10 @@ cp_parser_range_for (cp_parser *parser, tree scope, 
tree init, tree range_decl,
 
   /* Put the range declaration back into scope. */
   if (range_decl != error_mark_node)
-push_binding (DECL_NAME (range_decl), range_decl, current_binding_level);
+{
+  binding->previous = IDENTIFIER_BINDING (name);
+  IDENTIFIER_BINDING (name) = binding;
+}
 
   /* If in template, STMT is converted to a normal for-statement
  at instantiation. If not, it is done just ahead. */
@@ -12437,8 +12446,15 @@ cp_parser_simple_declaration (cp_parser* parser,
   if (token->type == CPP_COMMA)
/* will be consumed next time around */;
   /* If it's a `;', we are done.  */
-  else if (token->type == CPP_SEMICOLON || maybe_range_for_decl)
+  else if (token->type == CPP_SEMICOLON)
break;
+  else if (maybe_range_for_decl)
+   {
+ if (declares_class_or_enum && token->type == CPP_COLON)
+   permerror (decl_specifiers.locations[ds_type_spec],
+  "types may not be defined in a for-range-declaration");
+ break;
+   }
   /* Anything else is an error.  */
   else
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/range-for31.C 
b/gcc/testsuite/g++.dg/cpp0x/range-for31.C
new file mode 100644
index 000..833f510
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/range-for31.C
@@ -0,0 +1,9 @@
+// PR c++/71604
+// { dg-do compile { target c++11 } }
+
+void foo ()
+{
+  int a[2] = { 1, 2 }; 
+  for (struct S { S (int) {} } S : a) // { dg-error "types may not be defined" 
}
+;
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/range-for8.C 
b/gcc/testsuite/g++.dg/cpp0x/range-for8.C
index a389f66..38fe456 100644
--- a/gcc/testsuite/g++.dg/cpp0x/range-for8.C
+++ b/gcc/testsuite/g++.dg/cpp0x/range-for8.C
@@ -7,9 +7,9 @@
 
 void test()
 {
-for (struct S { } *x : { (S*)0, (S*)0 } )
+for (struct S { } *x : { (S*)0, (S*)0 } ) // { dg-error "types may not be 
defined" }
 ;
 
-for (struct S { } x : { S(), S() } )
+for (struct S { } x : { S(), S() } ) // { dg-error "types may not be 
defined" }
 ;
 }


C++ PATCHes to mangling of sizeof... and fold-expressions

2016-07-15 Thread Jason Merrill
Investigating the ICE in 71814, I found that we never implemented the
specified mangling for sizeof...; sZ for a simple sizeof...(pack) and
sP for a more complicated form involving an alias template.  So the
first patch implements mangling and demangling those forms.

Similarly, 71711 shows that we never implemented mangling of C++17
fold-expressions, or partial instantiation of them.

The last patch bumps -fabi-version to 11 to deal with the change in
sizeof... mangling.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 81272a102f30521982d4e6442d1ef7b868f2d1fa
Author: Jason Merrill 
Date:   Mon Jul 11 16:02:42 2016 -0400

PR c++/71814 - mangling sizeof... (sP and sZ)

gcc/cp/
* mangle.c (write_expression): Handle sizeof... an argument pack.
libiberty/
* cp-demangle.c (cplus_demangle_operators): Add sP and sZ.
(d_print_comp_inner): Handle them.
(d_template_args_1): Split out from d_template_args.
(d_args_length): New.

diff --git a/gcc/common.opt b/gcc/common.opt
index 2b68fa7..b56ba47 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -882,6 +882,9 @@ Driver Undocumented
 ; identity, such as ia32 calling convention attributes (stdcall, etc.)
 ; Default in G++ 6 (set in c_common_post_options).
 ;
+; 11: The version of the ABI that corrects mangling of sizeof... expressions.
+; Default in G++ 7.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become the default version of the ABI.
 fabi-version=
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 0e44409..8205da9 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2767,17 +2767,67 @@ write_expression (tree expr)
   write_mangled_name (expr, false);
   write_char ('E');
 }
-  else if (TREE_CODE (expr) == SIZEOF_EXPR
-  && SIZEOF_EXPR_TYPE_P (expr))
+  else if (TREE_CODE (expr) == SIZEOF_EXPR)
 {
-  write_string ("st");
-  write_type (TREE_TYPE (TREE_OPERAND (expr, 0)));
-}
-  else if (TREE_CODE (expr) == SIZEOF_EXPR
-  && TYPE_P (TREE_OPERAND (expr, 0)))
-{
-  write_string ("st");
-  write_type (TREE_OPERAND (expr, 0));
+  tree op = TREE_OPERAND (expr, 0);
+
+  if (PACK_EXPANSION_P (op))
+   {
+ if (abi_warn_or_compat_version_crosses (11))
+   G.need_abi_warning = true;
+ if (abi_version_at_least (11))
+   {
+ /* sZ rather than szDp.  */
+ write_string ("sZ");
+ write_expression (PACK_EXPANSION_PATTERN (op));
+ return;
+   }
+   }
+
+  if (SIZEOF_EXPR_TYPE_P (expr))
+   {
+ write_string ("st");
+ write_type (TREE_TYPE (op));
+   }
+  else if (ARGUMENT_PACK_P (op))
+   {
+ tree args = ARGUMENT_PACK_ARGS (op);
+ int length = TREE_VEC_LENGTH (args);
+ if (abi_warn_or_compat_version_crosses (10))
+   G.need_abi_warning = true;
+ if (abi_version_at_least (10))
+   {
+ /* sP * E # sizeof...(T), size of a captured
+template parameter pack from an alias template */
+ write_string ("sP");
+ for (int i = 0; i < length; ++i)
+   write_template_arg (TREE_VEC_ELT (args, i));
+ write_char ('E');
+   }
+ else
+   {
+ /* In GCC 5 we represented this sizeof wrong, with the effect
+that we mangled it as the last element of the pack.  */
+ tree arg = TREE_VEC_ELT (args, length-1);
+ if (TYPE_P (op))
+   {
+ write_string ("st");
+ write_type (arg);
+   }
+ else
+   {
+ write_string ("sz");
+ write_expression (arg);
+   }
+   }
+   }
+  else if (TYPE_P (TREE_OPERAND (expr, 0)))
+   {
+ write_string ("st");
+ write_type (TREE_OPERAND (expr, 0));
+   }
+  else
+   goto normal_expr;
 }
   else if (TREE_CODE (expr) == ALIGNOF_EXPR
   && TYPE_P (TREE_OPERAND (expr, 0)))
@@ -2947,6 +2997,7 @@ write_expression (tree expr)
 }
   else
 {
+normal_expr:
   int i, len;
   const char *name;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1.C
new file mode 100644
index 000..51f9581
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1.C
@@ -0,0 +1,11 @@
+// Test for sZ mangling.
+// { dg-do compile { target c++11 } }
+// { dg-final { scan-assembler "_Z1fIJidEEv1AIXsZT_EE" } }
+
+template  struct A { };
+template  void f(A);
+
+int main()
+{
+  f(A<2>());
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1a.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1a.C
new file mode 100644
index 000..b230ffa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-mangle1a.C

C++ PATCH for c++/71718 (ICE with infinite instantiation and alias template)

2016-07-15 Thread Jason Merrill
When we're trying to die due to excessive template recursion, we
shouldn't try to instantiate more templates when printing diagnostics.
The code in error.c already checks at_eof to prevent instantiation, so
let's just set that appropriately.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 72b9617f4d2ec4af8ab9fdcd29b3c9d065059429
Author: Jason Merrill 
Date:   Mon Jul 11 15:36:04 2016 -0400

PR c++/71718 - infinite recursion and alias template

* pt.c (push_tinst_level_loc): Set at_eof before fatal_error.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a1b0ca9..830ff0a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9132,10 +9132,12 @@ push_tinst_level_loc (tree d, location_t loc)
 
   if (tinst_depth >= max_tinst_depth)
 {
+  /* Tell error.c not to try to instantiate any templates.  */
+  at_eof = 2;
   fatal_error (input_location,
   "template instantiation depth exceeds maximum of %d"
-   " (use -ftemplate-depth= to increase the maximum)",
-   max_tinst_depth);
+  " (use -ftemplate-depth= to increase the maximum)",
+  max_tinst_depth);
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-55.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-55.C
new file mode 100644
index 000..c6d7ae6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-55.C
@@ -0,0 +1,23 @@
+// PR c++/71718
+// { dg-do compile { target c++11 } }
+
+template 
+class A : T{};
+
+template 
+using sp = A;
+
+struct Base {};
+
+template 
+const sp
+rec()  // { dg-error "depth" }
+{
+  return rec();  
+}
+
+static void f(void) {
+  rec();
+}
+
+// { dg-prune-output "compilation terminated" }


C++ PATCH for c++/70824 (initializer list in template)

2016-07-15 Thread Jason Merrill
Instantiating the compound literal for the underlying array of the
initializer list was creating a new variable, so then we would pull
out that variables initializer and instantiate that, ad infinitum.  We
shouldn't need to instantiate the initializer for any artificial
variable.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 2fb3877338f276ea8f9a8871b823a359672cbc99
Author: Jason Merrill 
Date:   Mon Jul 11 15:01:13 2016 -0400

PR c++/70824 - initializer_list in template

* init.c (constant_value_1): Don't instantiated DECL_INITIAL of
artificial variables.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index b4a4388..6047639 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2073,8 +2073,13 @@ constant_value_1 (tree decl, bool strict_p, bool 
return_aggregate_cst_ok_p)
  && TREE_CODE (init) == TREE_LIST
  && TREE_CHAIN (init) == NULL_TREE)
init = TREE_VALUE (init);
-  /* Instantiate a non-dependent initializer.  */
-  init = instantiate_non_dependent_or_null (init);
+  /* Instantiate a non-dependent initializer for user variables.  We
+mustn't do this for the temporary for an array compound literal;
+trying to instatiate the initializer will keep creating new
+temporaries until we crash.  Probably it's not useful to do it for
+other artificial variables, either.  */
+  if (!DECL_ARTIFICIAL (decl))
+   init = instantiate_non_dependent_or_null (init);
   if (!init
  || !TREE_TYPE (init)
  || !TREE_CONSTANT (init)
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-template1.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist-template1.C
new file mode 100644
index 000..a24e205
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-template1.C
@@ -0,0 +1,15 @@
+// PR c++/70824
+// { dg-do compile { target c++11 } }
+
+#include 
+
+constexpr
+int
+max(std::initializer_list __l)
+{ return *__l.begin(); }
+
+template 
+void
+a() {
+  const int v =  max({1});
+}


[PATCH, rs6000] Fix {div,mul}kc3-1.c for older hardware

2016-07-15 Thread Bill Schmidt
Hi,

For the subject tests, I neglected to ensure they would be skipped on
older hardware.  This patch fixes that.  Tested on
powerpc64-unknown-linux-gnu on POWER7 to verify the tests are
unsupported there, and on powerpc64le-unknown-linux-gnu on POWER8
to verify they still run there.  Ok for trunk and gcc-6-branch?

Thanks,
Bill


2016-07-15  Bill Schmidt  

* gcc.target/powerpc/divkc3-1.c: Require p8vector support.
* gcc.target/powerpc/mulkc3-1.c: Likewise.


Index: gcc/testsuite/gcc.target/powerpc/divkc3-1.c
===
--- gcc/testsuite/gcc.target/powerpc/divkc3-1.c (revision 238375)
+++ gcc/testsuite/gcc.target/powerpc/divkc3-1.c (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mfloat128 -mvsx" } */
 
 void abort ();
Index: gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
===
--- gcc/testsuite/gcc.target/powerpc/mulkc3-1.c (revision 238375)
+++ gcc/testsuite/gcc.target/powerpc/mulkc3-1.c (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mfloat128 -mvsx" } */
 
 void abort ();



[PATCH] Add qsort comparator consistency checking (PR71702)

2016-07-15 Thread Alexander Monakov
Hi,

this patch adds internal checking for comparator functions that GCC passes to
qsort.  PR71702 describes an ICE that happens because comparator
'dr_group_sort_cmp' used to be non-transitive in some cases until GCC 6.  This
patch would uncover that issue on a number of small testcases in GCC's
testsuite, so it should be useful to catch similar issues in the future as well.

Although the meat of the implementation is not tied to vec<>, this patch adds
verification only to vec::qsort.  I see there's a number of other ::qsort uses
in GCC; it should be useful to have such checking there as well.  I'd appreciate
input on that (I'd go with '#define qsort(b, n, s, c) qsort_chk (b, n, s, c)'
in system.h and implement qsort_chk as a checking wrapper around libc qsort).

Bootstrapped and regtested on x86_64, OK to apply?

Alexander

* vec.c (vec_check_sort_cmp): New.  Use it...
* vec.h (vec::qsort): ...here to verify comparator
consistency when checking is enabled.

diff --git a/gcc/vec.c b/gcc/vec.c
index fd200ea..b4ac0b4 100644
--- a/gcc/vec.c
+++ b/gcc/vec.c
@@ -190,6 +190,44 @@ dump_vec_loc_statistics (void)
   vec_mem_desc.dump (VEC_ORIGIN);
 }
 
+/* Verify consistency of comparator CMP on array BASE of N SIZE-sized elements.
+   Assuming the array should be sorted according to CMP, any assertion failure
+   here implies that CMP is not transitive, or is not anti-commutative.  */
+
+void
+vec_check_sort_cmp (const void *base, size_t n, size_t size,
+   int (*cmp) (const void *, const void *))
+{
+  const char *cbase = (const char *) base;
+  size_t i1, i2, i, j;
+  /* The following loop nest has O(n^2) time complexity.  Limit n to avoid
+ slowness on artificial testcases.  */
+  if (n > 100)
+n = 100;
+#define CMP(i, j) cmp (cbase + (i) * size, cbase + (j) * size)
+  /* The outer loop iterates over maximum spans [i1, i2) such that elements
+ within each span compare equal.  */
+  for (i1 = 0; i1 < n; i1 = i2)
+{
+  /* Position i2 past the last element that compares equal to i1'th.  */
+  for (i2 = i1 + 1; i2 < n; i2++)
+   if (CMP (i1, i2))
+ break;
+   else
+ gcc_assert (!CMP (i2, i1));
+  /* Verify that all remaining pairs within current span compare equal.  */
+  for (i = i1 + 1; i + 1 < i2; i++)
+   for (j = i + 1; j < i2; j++)
+ gcc_assert (!CMP (i, j) && !CMP (j, i));
+  /* Verify that all elements within current span compare less than any
+ element beyond the span.  */
+  for (i = i1; i < i2; i++)
+   for (j = i2; j < n; j++)
+ gcc_assert (CMP (i, j) < 0 && CMP (j, i) > 0);
+}
+#undef CMP
+}
+
 #ifndef GENERATOR_FILE
 #if CHECKING_P
 
diff --git a/gcc/vec.h b/gcc/vec.h
index eb8c270..ff1e37e 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -182,6 +182,9 @@ extern void *ggc_realloc (void *, size_t MEM_STAT_DECL);
 /* Support function for statistics.  */
 extern void dump_vec_loc_statistics (void);
 
+extern void vec_check_sort_cmp (const void *, size_t, size_t,
+   int (*) (const void *, const void *));
+
 /* Hashtable mapping vec addresses to descriptors.  */
 extern htab_t vec_mem_usage_hash;
 
@@ -947,8 +950,11 @@ template
 inline void
 vec::qsort (int (*cmp) (const void *, const void *))
 {
-  if (length () > 1)
-::qsort (address (), length (), sizeof (T), cmp);
+  if (length () <= 1)
+return;
+  ::qsort (address (), length (), sizeof (T), cmp);
+  if (CHECKING_P)
+vec_check_sort_cmp (address (), length (), sizeof (T), cmp);
 }
 
 


[patch,avr] make progmem work on AVR_TINY, use TARGET_ADDR_SPACE_DIAGNOSE_USAGE

2016-07-15 Thread Georg-Johann Lay

This patch needs new hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00839.html

This patch turns attribute progmem into a working feature for AVR_TINY cores.

It boils down to adding 0x4000 to all symbols with progmem:  Flash memory can 
be seen in the RAM address space starting at 0x4000, i.e. data in flash can be 
read by means of LD instruction if we add offsets of 0x4000.  There is no need 
for special access macros like pgm_read_* or special address spaces as there is 
nothing like a LPM instruction.


This is simply achieved by setting a respective symbol_ref_flag, and when such 
a symbol has to be printed, then plus_constant with 0x4000 is used.


Diagnosing of unsupported address spaces is now performed by 
TARGET_ADDR_SPACE_DIAGNOSE_USAGE which has exact location information.  Hence 
there is no need to scan all decls for invalid address spaces.


For AVR_TINY, alls address spaces have been disabled.  They are of no use. 
Supporting __flash would just make the backend more complicated without any gains.



Ok for trunk?

Johann


gcc/
* doc/extend.texi (AVR Variable Attributes) [progmem]: Add
documentation how it works on reduced Tiny cores.
(AVR Named Address Spaces): No support for reduced Tiny.
* avr-protos.h (avr_addr_space_supported_p): New prototype.
* avr.c (AVR_SYMBOL_FLAG_TINY_PM): New macro.
(avr_address_tiny_pm_p): New static function.
(avr_print_operand_address) [AVR_TINY]: Add AVR_TINY_PM_OFFSET
if the address is in progmem.
(avr_assemble_integer): Same.
(avr_encode_section_info) [AVR_TINY]: Set AVR_SYMBOL_FLAG_TINY_PM
for symbol_ref in progmem.
(TARGET_ADDR_SPACE_DIAGNOSE_USAGE): New hook define...
(avr_addr_space_diagnose_usage): ...and implementation.
(avr_addr_space_supported_p): New function.
(avr_nonconst_pointer_addrspace, avr_pgm_check_var_decl): Only
report bad address space usage if that space is supported.
(avr_insert_attributes): Same.  No more complain about unsupported
address spaces.
* avr.h (AVR_TINY_PM_OFFSET): New macro.
* avr-c.c (tm_p.h): Include it.
(avr_cpu_cpp_builtins) [__AVR_TINY_PM_BASE_ADDRESS__]: Use
AVR_TINY_PM_OFFSET instead of magic 0x4000 when built-in def'ing.
Only define addr-space related built-in macro if
avr_addr_space_supported_p.
gcc/testsuite/
* gcc.target/avr/torture/tiny-progmem.c: New test.

Index: config/avr/avr-c.c
===
--- config/avr/avr-c.c	(revision 237643)
+++ config/avr/avr-c.c	(working copy)
@@ -26,7 +26,7 @@
 #include "c-family/c-common.h"
 #include "stor-layout.h"
 #include "langhooks.h"
-
+#include "tm_p.h"
 
 /* IDs for all the AVR builtins.  */
 
@@ -253,7 +253,10 @@ avr_register_target_pragmas (void)
   gcc_assert (ADDR_SPACE_GENERIC == ADDR_SPACE_RAM);
 
   /* Register address spaces.  The order must be the same as in the respective
- enum from avr.h (or designated initializers must be used in avr.c).  */
+ enum from avr.h (or designated initializers must be used in avr.c).
+ We always register all address spaces even if some of them make no
+ sense for some targets.  Diagnose for non-supported spaces will be
+ emit by TARGET_ADDR_SPACE_DIAGNOSE_USAGE.  */
 
   for (i = 0; i < ADDR_SPACE_COUNT; i++)
 {
@@ -293,7 +296,7 @@ avr_cpu_cpp_builtins (struct cpp_reader
   builtin_define_std ("AVR");
 
   /* __AVR_DEVICE_NAME__ and  avr_mcu_types[].macro like __AVR_ATmega8__
-	 are defined by -D command option, see device-specs file.  */
+ are defined by -D command option, see device-specs file.  */
 
   if (avr_arch->macro)
 cpp_define_formatted (pfile, "__AVR_ARCH__=%s", avr_arch->macro);
@@ -334,7 +337,8 @@ start address.  This macro shall be used
  it has been mapped to the data memory.  For AVR_TINY devices
  (ATtiny4/5/9/10/20 and 40) mapped program memory starts at 0x4000. */
 
-  cpp_define (pfile, "__AVR_TINY_PM_BASE_ADDRESS__=0x4000");
+  cpp_define_formatted (pfile, "__AVR_TINY_PM_BASE_ADDRESS__=0x%x",
+AVR_TINY_PM_OFFSET);
 }
 
   if (AVR_HAVE_EIJMP_EICALL)
@@ -391,10 +395,7 @@ /* Define builtin macros so that the use
 /* Only supply __FLASH macro if the address space is reasonable
for this target.  The address space qualifier itself is still
supported, but using it will throw an error.  */
-&& avr_addrspace[i].segment < avr_n_flash
-	/* Only support __MEMX macro if we have LPM.  */
-	&& (AVR_HAVE_LPM || avr_addrspace[i].pointer_size <= 2))
-
+&& avr_addr_space_supported_p ((addr_space_t) i))
   {
 const char *name = avr_addrspace[i].name;
 char *Name = (char*) alloca (1 + strlen (name));
Index: config/avr/avr-protos.h

[PATCH][RFC] PR middle-end/22141 GIMPLE store widening pass

2016-07-15 Thread Kyrill Tkachov

Hi all,

This is a GIMPLE pass to implement PR middle-end/22141. that is merge narrow 
stores of constants
into fewer wider stores.  A 2009 patch from Jakub [1] contains many testcases 
but a simple motivating
case can be:

struct bar {
  int a;
  char b;
  char c;
  char d;
  char e;
}; // packed 64-bit structure

void bar (struct bar *);

void
foo (struct bar *p)
{
  p->b = 0;
  p->a = 0;
  p->c = 0;
  p->d = 1;
  p->e = 0;
}

Currently on aarch64 this will generate:
foo:
mov w1, 1
str wzr, [x0]
strbwzr, [x0, 4]
strbwzr, [x0, 5]
strbw1, [x0, 6]
strbwzr, [x0, 7]
ret

With this patch this can be improved into a single unaligned store:
foo:
mov x1, 0x1
str x1, [x0]
ret

or, if compiled with -mstrict-align:
foo:
mov w1, 0x1
stp wzr, w1, [x0]
ret

The pass is a tree-ssa pass that runs fairly late in the pipeline, after 
pass_optimize_widening_mul.
I explain the approach taken in the comments in the new 
tree-ssa-store-widening.c file but essentially
it has 3 phases applied to each basic block:

1) Scan through the statements recording assignments of constants to 
destinations like ARRAY_REF,
COMPONENT_REF, MEM_REF which are determined to write to an ultimate common 
destination. get_inner_reference
is used to decompose these destinations. Continue recording these until we 
encounter a statement that may
interfere with the stores we've been recording (load or store that may alias, 
volatile access etc).
These assignments of interest are recorded as store_immediate_info objects in 
the m_store_info vector.

2) Analyse the stores recorded in phase one (they all write to a destination 
offset from a common base)
and merge them into wider assignments up to BITS_PER_WORD bits wide. These 
widened assignments are represented
as merged_store_group objects and they are recorded in the 
m_merged_store_groups vector. This is the
coalesce_immediate_stores function. It sorts the stores by the bitposition they 
write to and iterates through
them, merging consecutive stores (it fails the transformation on overlapping 
stores, I don't think that case
appears often enough to warrant extra logic) up to BITS_PER_WORD-wide accesses.

3) Go through the merged stores recorded in m_merged_store_groups and output 
each widened store. Widened stores
that are not of a bitsize that is a power of two (for example 48 bits wide) are 
output as multiple stores of decreasing
power-of-two width. So, for a widened store 48-bits wide this phase would a 
emit a 32-bit store followed by a
16-bit store. The new sequence is only emitted if it contains fewer statements 
than the original sequence that it
will replace.  This phase also avoids outputting unaligned stores for 
STRICT_ALIGNMENT targets or targets where
SLOW_UNALIGNED_ACCESS forbids it. Since some configurations/targets may want to 
avoid generation of unaligned
stores even when it is legal I've added the new param 
PARAM_STORE_WIDENING_ALLOW_UNALIGNED that can be used
to disallow unaligned store generation.  Its default setting is to allow them 
(assuming that STRICT_ALIGNMENT
and SLOW_UNALIGNED_ACCESS allows it).

This is my first GIMPLE-level pass so please do point out places where I'm not 
using the interfaces correctly.
This patch handles bitfields as well, but only if they are a multiple of 
BITS_PER_UNIT. It should be easily
extensible to handle other bitfields as well, but I'm not entirely sure of the 
rules for laying out such bitfields
and in particular the byteswap logic that needs to be applied for big-endian 
targets. If someone can shed some light
on how they should be handed I'll be happy to try it out, but I believe this 
patch is an improvement as it is.

This has been bootstrapped and tested on aarch64-none-linux-gnu, 
arm-none-linux-gnueabihf and x86_64-unknown-linux-gnu.
I've also tested it on the big-endian targets: armeb-none-eabi, 
aarch64_be-none-elf. Also tested aarch64-none-elf/-mabi=ilp32.

I've benchmarked it on SPEC2006 on AArch64 on a Cortex-A72 and there were no 
regressions, the overall score improved a bit
(about 0.1%). The interesting improvements were:
458.sjeng (+0.8%)
483.xalancbmk (+1.1%)
416.gamess(+1.0%)
454.calculix  (+1.1%)

An interesting effect was in BZ2_decompress from bzip2 where at -Ofast it 
transformed a long sequence of constant
byte stores into a much shorter sequence of word-size stores (from ~550 
instructions to ~190).

On x86_64 SPECINT there was no change in the overall score. The code size at 
-Ofast is consistently smaller
with this patch but the preformance differences on sub-benchmarks are in the 
noise.

I've included the testcases from Jakub's patch [1] and added a few of my own.

Is this direction acceptable for the problem this is trying to solve?

Thanks,
Kyrill

N.B. I'm going on vacation until August so I won't be able to respond to any 
feedback until 

Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-15 Thread Yuri Rumyantsev
Richard!

Did you have a chance to look at this patch?

Thanks.
Yuri.

2016-07-08 17:07 GMT+03:00 Yuri Rumyantsev :
> Hi Richard,
>
> Thanks for your help - your patch looks much better.
> Here is new patch in which additional argument was added to determine
> source loop of reference.
>
> Bootstrap and regression testing did not show any new failures.
>
> Is it OK for trunk?
> ChangeLog:
> 2016-07-08  Yuri Rumyantsev  
>
> PR tree-optimization/71734
> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
> contains REF, use it to check safelen, assume that safelen value
> must be greater 1, fix style.
> (ref_indep_loop_p_2): Add REF_LOOP argument.
> (ref_indep_loop_p): Pass LOOP as additional argument to
> ref_indep_loop_p_2.
> gcc/testsuite/ChangeLog:
> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.
>
> 2016-07-08 11:18 GMT+03:00 Richard Biener :
>> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev  wrote:
>>> I checked simd3.f90 and found out that my additional check reject
>>> independence of references
>>>
>>> REF is independent in loop#3
>>> .istart0.19, .iend0.20
>>> which are defined in loop#1 which is outer for loop#3.
>>> Note that these references are defined by
>>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
>>> which is in loop#1.
>>> It is clear that both these references can not be independent for loop#3.
>>
>> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner loops
>> of LOOP to catch memory references in those as well.  So the issue is really
>> that we look at the wrong loop for safelen and we _do_ want to apply safelen
>> to inner loops as well.
>>
>> So better track the loop we are ultimately asking the question for, like in 
>> the
>> attached patch (fixes the testcase for me).
>>
>> Richard.
>>
>>
>>
>>> 2016-07-07 17:11 GMT+03:00 Richard Biener :
 On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev  wrote:
> I Added this check because of new failures in libgomp.fortran suite.
> Here is copy of Jakub message:
> --- Comment #29 from Jakub Jelinek  ---
> The #c27 r237844 change looks bogus to me.
> First of all, IMNSHO you can argue this way only if ref is a reference 
> seen in
> loop LOOP,

 or inner loops of LOOP I guess.  I _think_ we never call 
 ref_indep_loop_p_1 with
 a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would not 
 make
 sense to do that, it would be a waste of time).

 So only if "or inner loops of LOOP" is not correct the check would be 
 needed
 but then my issue with unrolling an inner loop and turning a ref that 
 safelen
 does not apply to into a ref that it now applies to arises.

 I don't fully get what Jakub is hinting at.

 Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can you
 explain that bitmap check with a simple testcase?

 Thanks,
 Richard.

> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2
> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p - 
> the
> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the outer 
> loop
> obviously can be dependent on many of the loads and/or stores in the 
> loop, be
> it "omp simd array" or not.
> Say for
> void
> foo (int *p, int *q)
> {
>   #pragma omp simd
>   for (int i = 0; i < 1024; i++)
> p[i] += q[0];
> }
> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could 
> write
> something that changes its value, and then it would behave differently 
> from
> using VF = 1024, where everything is performed in parallel.
> Though, actually, it can alias, just it would have to write the same 
> value as
> was there.  So, if this is used to determine if it is safe to hoist the 
> load
> before the loop, it is fine, if it is used to determine if [0] >= [0] 
> &&
> [0] <= [1023], then it is not fine.
>
> For aliasing of q[0] and p[1023], I don't see why they couldn't alias in a
> valid program.  #pragma omp simd I think guarantees that the last 
> iteration is
> executed last, it isn't necessarily executed last alone, it could be, or
> together with one before last iteration, or (for simdlen INT_MAX) even all
> iterations can be done concurrently, in hw or sw, so it is fine if it is
> transformed into:
>   int temp[1024], temp2[1024], temp3[1024];
>   for (int i = 0; i < 1024; i++)
> temp[i] = p[i];
>   for (int i = 0; i < 1024; i++)
> temp2[i] = q[0];
>   /* The above two loops can be also swapped, or intermixed.  */
>   for (int i = 0; i < 1024; i++)
> temp3[i] = temp[i] + temp2[i];
>   for (int 

Re: [PATCH testsuite]XFAIL gcc.dg/tree-ssa/pr71347 on some targets

2016-07-15 Thread Bin.Cheng
On Thu, Jul 14, 2016 at 6:14 PM, Jeff Law  wrote:
> On 07/14/2016 10:11 AM, Bin Cheng wrote:
>>
>> Hi, Test gcc.dg/tree-ssa/pr71347 failed on some targets if the two
>> memory references are re-written into different forms by IVOPT.  This
>> could be because of various reasons, for example, auto-increment
>> addressing mode.  Since the address expressions are of different
>> form, DOM fails to eliminate the redundant load at the moment.  I
>> think DOM should be improved to handle address expressions appearing
>> in different forms (at least for simple cases).  For now, I will
>> XFAIL this test indicating a possible enhancement.
>
> FWIW, Adam Lawrence did some work in this area a few months ago. Essentially
> enhancing DOM to be able to detect certain MEM_REF and ARRAY_REF accesses
> were in fact equivalent.
Hi Jeff,
Do you recall any PR/patch of this?  I didn't find it in mail searching.

>
>
>>
>> Test checked on affected targets.  Is it OK?
Applied.

Thanks,
bin
>
> Yes.  This is fine.
> jeff
>


Re: [PATCH] Amend dump expectation in slsr-8.c (PR, tree-optimization/71490)

2016-07-15 Thread Bill Schmidt

> On Jul 15, 2016, at 2:24 AM, Richard Biener  
> wrote:
> 
> On Thu, Jul 14, 2016 at 6:10 PM, Martin Liška  wrote:
>> On 07/14/2016 01:21 PM, Richard Biener wrote:
>>> On Thu, Jul 14, 2016 at 1:06 PM, Martin Liška  wrote:
 On 07/13/2016 07:21 PM, Jeff Law wrote:
> Isn't that a code quality regression?  So instead shouldn't we be keeping 
> the same expectation, but xfailing the test?
> 
> jeff
 
 Hello.
 
 Disabling a pass before slsr makes the test to catch both opportunities.
 Is the patch fine?
>>> 
>>> No, this is still a code quality regression.  What happens is that for
>>> some reason we fail to sink for GCC 6.
>> 
>> So should I just mark the test-case as a xfail?
> 
> Leave it FAIL and open a bug.  We need to fix SLSR to handle the situation.

Please CC me on the bug (wschm...@gcc.gnu.org).

Thanks,
Bill

> 
> You can try going back to the point where the testcase was added and look at 
> the
> IL that it was supposed to test, on the GCC 6 branch we sink into one
> arm but not
> the other, on trunk we sink into both.  Iff the original IL was
> without any sinking
> then adding a testcase variant with sinking turned off might be good as well.
> 
> I'll also note that if we'd do these kind of tests as unit-tests we'd
> never notice
> that in real-world the testcase would have started failing due to
> previous passes
> messing up the IL.
> 
> Richard.
> 
>> M.
>> 
>>> 
>>> Richard.
>>> 
 Thanks,
 Martin
>> 
> 



[PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate

2016-07-15 Thread Bill Schmidt
Hi,

This patch is a follow-up to Richard's patch of
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
vec_construct (initialization of an N-way vector by N scalars) is too low,
which can cause too-aggressive vectorization in particular for N=8 or
higher.  Richard changed the default cost to N-1, which is generally
sensible.  For powerpc I am going with a slightly higher cost of N, which
will keep us from being less conservative than the previous values when N=2.

The whole cost model for powerpc needs more work (in particular we need
to distinguish among processor models), but that's beyond the scope of
this patch.  One thing that I've called out in the comments is that a
vec_construct can have wildly different costs depending on the scalar
elements.  If they are all the same small constant, then we only need
a single splat-immediate instruction; but for V4SF the cost is potentially
higher because of the need to do converts.  For the splat case, we might
want to teach the vectorizer in general to estimate the cost as just
a vector_stmt rather than a vec_construct, but that requires some target
knowledge of which constants can be duplicated with a splat-immediate.

In any case, the purpose of this patch is simply to avoid vectorizing
things we shouldn't when we've undercounted the cost of a vec_construct.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions (hence the vectorization decisions in the test suite have
not changed).  Is this ok for trunk?

Thanks,
Bill


2016-07-15  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
Improve vec_construct estimate.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 238312)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
tree vectype, int misalign)
 {
   unsigned elements;
-  tree elem_type;
 
   switch (type_of_cost)
 {
@@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 return 2;
 
   case vec_construct:
-   elements = TYPE_VECTOR_SUBPARTS (vectype);
-   elem_type = TREE_TYPE (vectype);
-   /* 32-bit vectors loaded into registers are stored as double
-  precision, so we need n/2 converts in addition to the usual
-  n/2 merges to construct a vector of short floats from them.  */
-   if (SCALAR_FLOAT_TYPE_P (elem_type)
-   && TYPE_PRECISION (elem_type) == 32)
- return elements + 1;
-   else
- return elements / 2 + 1;
+   /* This is a rough approximation assuming non-constant elements
+  constructed into a vector via element insertion.  FIXME:
+  vec_construct is not granular enough for uniformly good
+  decisions.  If the initialization is a splat, this is
+  cheaper than we estimate.  If we want to form four SF
+  values into a vector, it's more expensive (we need to
+  copy the four elements into two vector registers,
+  perform two conversions to single precision, and merge
+  the two result vectors).  Improve this someday.  */
+   return TYPE_VECTOR_SUBPARTS (vectype);
 
   default:
 gcc_unreachable ();



[PATCH] Fix PR71893

2016-07-15 Thread Richard Biener

This PR shows that array_ref_element_size may apply spurious casts which
in turn end up confusing VN/PRE.

Fixed as follows, bootstrapped on x86_64-unknown-linux-gnu, testing in
progress.

Richard.

2016-07-15  Richard Biener  

PR tree-optimization/71893
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Compensate
for sizetype cast added by array_ref_element_size.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 238370)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_component_ref_by_pieces_1 (basic_
*** 2576,2581 
--- 2587,2595 
  {
genop3 = size_binop (EXACT_DIV_EXPR, genop3,
 size_int (TYPE_ALIGN_UNIT (elmt_type)));
+   /* We may have a useless conversion added by
+  array_ref_element_size via copy_reference_opts_from_ref.  */
+   STRIP_USELESS_TYPE_CONVERSION (genop3);
genop3 = find_or_generate_expression (block, genop3, stmts);
if (!genop3)
  return NULL_TREE;
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 238370)
--- gcc/tree-ssa-sccvn.c(working copy)
*** copy_reference_ops_from_ref (tree ref, v
*** 810,815 
--- 810,818 
  /* Always record lower bounds and element size.  */
  temp.op1 = array_ref_low_bound (ref);
  temp.op2 = array_ref_element_size (ref);
+ /* array_ref_element_size forces the result to sizetype
+even if that is the same as bitsizetype.  */
+ STRIP_USELESS_TYPE_CONVERSION (temp.op2);
  if (TREE_CODE (temp.op0) == INTEGER_CST
  && TREE_CODE (temp.op1) == INTEGER_CST
  && TREE_CODE (temp.op2) == INTEGER_CST)



Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue

2016-07-15 Thread Bernd Schmidt

On 07/15/2016 02:22 PM, Dominik Vogt wrote:

Final version 6 with the stray comment removed (was a harmless
oversight).


Ok, let's put it in.


Bernd


Re: C, C++: Fix PR 69733 (bad location for ignored qualifiers warning)

2016-07-15 Thread Bernd Schmidt

On 06/22/2016 05:37 AM, Jeff Law wrote:

It looks like this stalled...

Anyway, it's fine for the trunk.


Some of the surrounding code was changed a bit to produce different 
errors for different C standards, so I had to make an adjustment to the 
patch. While I was here, I added cdw_atomic to the list of qualifiers to 
look for, and added a C-specific testcase for that case.


Retested as before. Ok for this version too?


Bernd
c/
	PR c++/69733
	* c-decl.c (smallest_type_quals_location): New static function.
	(grokdeclarator): Try to find the correct location for an ignored
	qualifier.
cp/
	PR c++/69733
	* decl.c (grokdeclarator): Try to find the correct location for an
	ignored qualifier.
testsuite/
	PR c++/69733
	* c-c++-common/pr69733.c: New test.
	* gcc.dg/pr69733.c: New test.
	* gcc.target/i386/pr69733.c: New test.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c	(revision 238040)
+++ gcc/c/c-decl.c	(working copy)
@@ -5463,6 +5463,27 @@ warn_defaults_to (location_t location, i
   va_end (ap);
 }
 
+/* Returns the smallest location != UNKNOWN_LOCATION in LOCATIONS,
+   considering only those c_declspec_words found in LIST, which
+   must be terminated by cdw_number_of_elements.  */
+
+static location_t
+smallest_type_quals_location (const location_t *locations,
+			  const c_declspec_word *list)
+{
+  location_t loc = UNKNOWN_LOCATION;
+  while (*list != cdw_number_of_elements)
+{
+  location_t newloc = locations[*list];
+  if (loc == UNKNOWN_LOCATION
+	  || (newloc != UNKNOWN_LOCATION && newloc < loc))
+	loc = newloc;
+  list++;
+}
+
+  return loc;
+}
+
 /* Given declspecs and a declarator,
determine the name and type of the object declared
and construct a ..._DECL node for it.
@@ -6277,7 +6298,19 @@ grokdeclarator (const struct c_declarato
 	   qualify the return type, not the function type.  */
 	if (type_quals)
 	  {
-		int quals_used = type_quals;
+		const enum c_declspec_word ignored_quals_list[] =
+		  {
+		cdw_const, cdw_volatile, cdw_restrict, cdw_address_space,
+		cdw_atomic, cdw_number_of_elements
+		  };
+		location_t specs_loc
+		  = smallest_type_quals_location (declspecs->locations,
+		  ignored_quals_list);
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = declspecs->locations[cdw_typedef];
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = loc;
+
 		/* Type qualifiers on a function return type are
 		   normally permitted by the standard but have no
 		   effect, so give a warning at -Wreturn-type.
@@ -6287,13 +6320,14 @@ grokdeclarator (const struct c_declarato
 		   DR#423 means qualifiers (other than _Atomic) are
 		   actually removed from the return type when
 		   determining the function type.  */
+		int quals_used = type_quals;
 		if (flag_isoc11)
 		  quals_used &= TYPE_QUAL_ATOMIC;
 		if (quals_used && VOID_TYPE_P (type) && really_funcdef)
-		  pedwarn (loc, 0,
+		  pedwarn (specs_loc, 0,
 			   "function definition has qualified void return type");
 		else
-		  warning_at (loc, OPT_Wignored_qualifiers,
+		  warning_at (specs_loc, OPT_Wignored_qualifiers,
 			   "type qualifiers ignored on function return type");
 
 		/* Ensure an error for restrict on invalid types; the
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 238040)
+++ gcc/cp/decl.c	(working copy)
@@ -10089,8 +10089,15 @@ grokdeclarator (const cp_declarator *dec
 	if (type_quals != TYPE_UNQUALIFIED)
 	  {
 		if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
-		  warning (OPT_Wignored_qualifiers,
-			   "type qualifiers ignored on function return type");
+		  {
+		location_t loc;
+		loc = smallest_type_quals_location (type_quals,
+			declspecs->locations);
+		if (loc == UNKNOWN_LOCATION)
+		  loc = declspecs->locations[ds_type_spec];
+		warning_at (loc, OPT_Wignored_qualifiers, "type "
+"qualifiers ignored on function return type");
+		  }
 		/* We now know that the TYPE_QUALS don't apply to the
 		   decl, but to its return type.  */
 		type_quals = TYPE_UNQUALIFIED;
Index: gcc/testsuite/c-c++-common/pr69733.c
===
--- gcc/testsuite/c-c++-common/pr69733.c	(revision 0)
+++ gcc/testsuite/c-c++-common/pr69733.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-W -fdiagnostics-show-caret" } */
+
+typedef const double cd;
+double val;
+
+const double val0() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ const double val0() {return val;}
+ ^
+{ dg-end-multiline-output "" } */
+
+volatile double val1() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ volatile double val1() {return val;}
+ ^~~~
+{ dg-end-multiline-output "" } */
+
+cd val2() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { 

[PATCH] Fix PR71881

2016-07-15 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-15  Richard Biener  

PR tree-optimization/71881
* tree-loop-distribution.c (destroy_loop): Remove blocks in
reverse DOM order to make debug temp generation happy.

* gcc.dg/torture/pr71881.c: New testcase.

Index: gcc/tree-loop-distribution.c
===
*** gcc/tree-loop-distribution.c(revision 238364)
--- gcc/tree-loop-distribution.c(working copy)
*** destroy_loop (struct loop *loop)
*** 888,900 
cancel_loop_tree (loop);
rescan_loop_exit (exit, false, true);
  
!   for (i = 0; i < nbbs; i++)
  {
/* We have made sure to not leave any dangling uses of SSA
   names defined in the loop.  With the exception of virtuals.
 Make sure we replace all uses of virtual defs that will remain
 outside of the loop with the bare symbol as delete_basic_block
 will release them.  */
for (gphi_iterator gsi = gsi_start_phis (bbs[i]); !gsi_end_p (gsi);
   gsi_next ())
{
--- 891,905 
cancel_loop_tree (loop);
rescan_loop_exit (exit, false, true);
  
!   i = nbbs;
!   do
  {
/* We have made sure to not leave any dangling uses of SSA
   names defined in the loop.  With the exception of virtuals.
 Make sure we replace all uses of virtual defs that will remain
 outside of the loop with the bare symbol as delete_basic_block
 will release them.  */
+   --i;
for (gphi_iterator gsi = gsi_start_phis (bbs[i]); !gsi_end_p (gsi);
   gsi_next ())
{
*** destroy_loop (struct loop *loop)
*** 912,917 
--- 917,924 
}
delete_basic_block (bbs[i]);
  }
+   while (i != 0);
+ 
free (bbs);
  
set_immediate_dominator (CDI_DOMINATORS, dest,
Index: gcc/testsuite/gcc.dg/torture/pr71881.c
===
*** gcc/testsuite/gcc.dg/torture/pr71881.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71881.c  (working copy)
***
*** 0 
--- 1,25 
+ /* { dg-do compile } */
+ /* { dg-additional-options "-g" } */
+ 
+ int a, b, c, d, *e, f, g;
+ 
+ int fn1 ()
+ {
+   char h[2];
+   int i = 0;
+   for (; i < 2; i++)
+ {
+   if (c)
+   for (*e = 0; *e;)
+ {
+   int j[f];
+   i = *e;
+ }
+   h[i] = 0;
+ }
+   for (; a;)
+ return h[0];
+   for (b = 0; b;)
+ i = g = (1 & i) < d;
+   return 0;
+ }


[PATCH] Fix PR71887

2016-07-15 Thread Richard Biener

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-15  Richard Biener  

PR tree-optimization/71887
* tree-ssa-phiopt.c (absorbing_element_p): Add rhs arg and
verify it is not zero for division / modulo handling.
(value_replacement): Adjust.

* gcc.dg/torture/pr71887.c: New testcase.

Index: gcc/tree-ssa-phiopt.c
===
*** gcc/tree-ssa-phiopt.c   (revision 238364)
--- gcc/tree-ssa-phiopt.c   (working copy)
*** neutral_element_p (tree_code code, tree
*** 812,818 
  /* Returns true if ARG is an absorbing element for operation CODE.  */
  
  static bool
! absorbing_element_p (tree_code code, tree arg, bool right)
  {
switch (code)
  {
--- 812,818 
  /* Returns true if ARG is an absorbing element for operation CODE.  */
  
  static bool
! absorbing_element_p (tree_code code, tree arg, bool right, tree rval)
  {
switch (code)
  {
*** absorbing_element_p (tree_code code, tre
*** 827,832 
--- 827,834 
  case RSHIFT_EXPR:
  case LROTATE_EXPR:
  case RROTATE_EXPR:
+   return !right && integer_zerop (arg);
+ 
  case TRUNC_DIV_EXPR:
  case CEIL_DIV_EXPR:
  case FLOOR_DIV_EXPR:
*** absorbing_element_p (tree_code code, tre
*** 836,842 
  case CEIL_MOD_EXPR:
  case FLOOR_MOD_EXPR:
  case ROUND_MOD_EXPR:
!   return !right && integer_zerop (arg);
  
  default:
return false;
--- 838,846 
  case CEIL_MOD_EXPR:
  case FLOOR_MOD_EXPR:
  case ROUND_MOD_EXPR:
!   return (!right
! && integer_zerop (arg)
! && tree_single_nonzero_warnv_p (rval, NULL));
  
  default:
return false;
*** value_replacement (basic_block cond_bb,
*** 1010,1018 
  && neutral_element_p (code_def, cond_rhs, false))
  || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
  && ((operand_equal_for_phi_arg_p (rhs2, cond_lhs)
!  && absorbing_element_p (code_def, cond_rhs, true))
  || (operand_equal_for_phi_arg_p (rhs1, cond_lhs)
! && absorbing_element_p (code_def, cond_rhs, false))
  {
gsi = gsi_for_stmt (cond);
if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
--- 1014,1023 
  && neutral_element_p (code_def, cond_rhs, false))
  || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
  && ((operand_equal_for_phi_arg_p (rhs2, cond_lhs)
!  && absorbing_element_p (code_def, cond_rhs, true, rhs2))
  || (operand_equal_for_phi_arg_p (rhs1, cond_lhs)
! && absorbing_element_p (code_def,
! cond_rhs, false, rhs2))
  {
gsi = gsi_for_stmt (cond);
if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
Index: gcc/testsuite/gcc.dg/torture/pr71887.c
===
*** gcc/testsuite/gcc.dg/torture/pr71887.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71887.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do run } */
+ 
+ char a;
+ int b;
+ 
+ int main ()
+ {
+   unsigned char c = a, d = a;
+   b = d == 0 ? c : c % d;
+   return 0;
+ }


Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns

2016-07-15 Thread Bernd Schmidt
I still have misgivings about all the changes needed to the following 
passes, but I guess there's no choice but to live with it. So, I'm 
trying to look at this patch, but I'm finding it fairly impenetrable and 
underdocumented.



+  /* The concerns for which we want a prologue placed on this BB.  */
+  sbitmap pro_concern;
+
+  /* The concerns for which we placed code at the start of the BB.  */
+  sbitmap head_concern;


What's the difference?


+  /* The frequency of executing the prologue for this BB and all BBs
+ dominated by it.  */
+  gcov_type cost;


Is this frequency consideration the only thing that attempts to prevent 
placing prologue insns into loops?



+
+/* Destroy the pass-specific data.  */
+static void
+fini_separate_shrink_wrap (void)
+{
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+if (bb->aux)
+  {
+   sbitmap_free (SW (bb)->has_concern);
+   sbitmap_free (SW (bb)->pro_concern);
+   sbitmap_free (SW (bb)->head_concern);
+   sbitmap_free (SW (bb)->tail_concern);
+   free (bb->aux);
+   bb->aux = 0;
+  }
+}


Almost makes me want to ask for an sbitmap variant allocated on obstacks.


+  /* If this block does have the concern itself, or it is cheaper to
+ put the prologue here than in all the descendants that need it,
+mark it so.  If it is the same cost, put it here if there is no
+block reachable from this block that does not need the prologue.
+The actual test is a bit more stringent but catches most cases.  */


There's some oddness here with the leading whitespace.


+/* Mark HAS_CONCERN for every block dominated by at least one block with
+   PRO_CONCERN set, starting at HEAD.  */


I see a lot of code dealing with the placement of prologue 
parts/concerns/components, but very little dealing with how to place 
epilogues, leading me to wonder whether we could do better wrt the 
latter. Shouldn't there be some mirror symmetry, i.e. 
spread_concerns_backwards?



+{
+  if (first_visit)
+   {
+ bitmap_ior (SW (bb)->has_concern, SW (bb)->pro_concern, concern);
+
+ if (first_dom_son (CDI_DOMINATORS, bb))
+   {
+ concern = SW (bb)->has_concern;
+ bb = first_dom_son (CDI_DOMINATORS, bb);
+ continue;
+   }


Calling first_dom_son twice with the same args? More importantly, this 
first_visit business seems very confusing. I'd try to find a way to 
merge this if with the places that set first_visit to true. Also - 
instead of having a "continue;" here it seems the code inside the if 
represents an inner loop that should be written explicitly. There are 
two loops with such a structure.



+/* If we cannot handle placing some concern's prologues or epilogues where
+   we decided we should place them, unmark that concern in CONCERNS so
+   that it is not wrapped separately.  */
+static void
+disqualify_problematic_concerns (sbitmap concerns)
+{
+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   {
+ bitmap_and_compl (epi, SW (e->src)->has_concern,
+   SW (e->dest)->has_concern);
+ bitmap_and_compl (pro, SW (e->dest)->has_concern,
+   SW (e->src)->has_concern);


What is the purpose of this?


+/* Place code for prologues and epilogues for CONCERNS where we can put
+   that code at the start of basic blocks.  */
+static void
+do_common_heads_for_concerns (sbitmap concerns)


The function name should probably have some combination of the strings 
emit_ and _at or _into to make it clear what it's doing. This and the 
following function have some logical operations on the bitmaps which are 
not explained anywhere. In general a much better overview of the 
intended operation of this pass is needed.



+   {
+ bitmap_and_compl (epi, SW (e->src)->has_concern,
+   SW (e->dest)->has_concern);
+ bitmap_and_compl (pro, SW (e->dest)->has_concern,
+   SW (e->src)->has_concern);
+ bitmap_and (epi, epi, concerns);
+ bitmap_and (pro, pro, concerns);
+ bitmap_and_compl (epi, epi, SW (e->dest)->head_concern);
+ bitmap_and_compl (pro, pro, SW (e->dest)->head_concern);
+ bitmap_and_compl (epi, epi, SW (e->src)->tail_concern);
+ bitmap_and_compl (pro, pro, SW (e->src)->tail_concern);


Likewise here.


Bernd


Re: [RFC][IPA-VRP] Add support for IPA VRP in ipa-cp/ipa-prop

2016-07-15 Thread Martin Jambor
Hi,

thanks for working on extending IPA-CP in this way.  I do have a few
comments though:

On Fri, Jul 15, 2016 at 02:46:50PM +1000, kugan wrote:
> Hi,
> 
> This patch extends ipa-cp/ipa-prop infrastructure to handle propagation of
> VR.
> Thanks,
> 
> Kugan
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-07-14  Kugan Vivekanandarajah  
> 
> * gcc.dg/ipa/vrp1.c: New test.
> * gcc.dg/ipa/vrp2.c: New test.
> * gcc.dg/ipa/vrp3.c: New test.
> 
> gcc/ChangeLog:
> 
> 2016-07-14  Kugan Vivekanandarajah  
> 
> * common.opt: New option -fipa-vrp.
> * ipa-cp.c (ipa_get_vr_lat): New.
> (ipcp_vr_lattice::print): Likewise.
> (print_all_lattices): Call ipcp_vr_lattice::print.
> (ipcp_vr_lattice::meet_with): New.
> (ipcp_vr_lattice::meet_with_1): Likewise.
> (ipcp_vr_lattice::top_p): Likewise.
> (ipcp_vr_lattice::bottom_p): Likewsie.
> (ipcp_vr_lattice::set_to_bottom): Likewise.
> (set_all_contains_variable): Call VR set_to_bottom.
> (initialize_node_lattices): Init VR lattices.
> (propagate_vr_accross_jump_function): New.
> (propagate_constants_accross_call): Call
> propagate_vr_accross_jump_function.
> (ipcp_store_alignment_results): Rename to
> ipcp_store_alignment_and_vr_results and handke VR.
> * ipa-prop.c (ipa_set_jf_unknown):
> (ipa_compute_jump_functions_for_edge): Handle Value Range.
> (ipa_node_params_t::duplicate): Likewise.
> (ipa_write_jump_function): Likewise.
> (ipa_read_jump_function): Likewise.
> (write_ipcp_transformation_info): Likewise.
> (read_ipcp_transformation_info): Likewise.
> (ipcp_update_alignments): Rename to ipcp_update_vr_and_alignments
> and handle VR.
> 

> From 092cbccd79c3859ff24846bb0e1892ef5d8086bc Mon Sep 17 00:00:00 2001
> From: Kugan Vivekanandarajah 
> Date: Tue, 21 Jun 2016 12:43:01 +1000
> Subject: [PATCH 5/6] Add ipa vrp
> 
> ---
>  gcc/common.opt  |   4 +
>  gcc/ipa-cp.c| 220 
> +++-
>  gcc/ipa-prop.c  | 110 ++--
>  gcc/ipa-prop.h  |  16 +++
>  gcc/testsuite/gcc.dg/ipa/vrp1.c |  32 ++
>  gcc/testsuite/gcc.dg/ipa/vrp2.c |  35 +++
>  gcc/testsuite/gcc.dg/ipa/vrp3.c |  30 ++
>  7 files changed, 433 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp1.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp2.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp3.c
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 29d0e4d..7bf7305 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2475,6 +2475,10 @@ ftree-evrp
>  Common Report Var(flag_tree_early_vrp) Init(1) Optimization
>  Perform Early Value Range Propagation on trees.
>  
> +fipa-vrp
> +ommon Report Var(flag_ipa_vrp) Init(1) Optimization

Common

> +Perform IPA Value Range Propagation on trees.

I think that nowadays we should omit the "on trees" part, they are not
particularly useful.

> +
>  fsplit-paths
>  Common Report Var(flag_split_paths) Init(0) Optimization
>  Split paths leading to loop backedges.
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 4b7f6bb..97cd04b 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -120,6 +120,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "params.h"
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
> +#include "tree-vrp.h"
>  
>  template  class ipcp_value;
>  
> @@ -266,6 +267,25 @@ private:
>bool meet_with_1 (unsigned new_align, unsigned new_misalign);
>  };
>  
> +/* Lattice of value ranges.  */
> +
> +class ipcp_vr_lattice
> +{
> +public:
> +  value_range vr;
> +
> +  inline bool bottom_p () const;
> +  inline bool top_p () const;
> +  inline bool set_to_bottom ();
> +  bool meet_with (const value_range *vr);

Please do not call the parameter the same name as that of a member
variable, that might become very confusing in future.

> +  bool meet_with (const ipcp_vr_lattice );
> +  void init () { vr.type = VR_UNDEFINED; }
> +  void print (FILE * f);
> +
> +private:
> +  bool meet_with_1 (const value_range *vr);

Likewise.

I know that no other classes in the file do, but if you want to
strictly follow the GCC coding style, member vr should be called m_vr.
Perhaps I should add the m_ prefixes to other classes as well, I am
becoming to appreciate them.

> +};
> +
>  /* Structure containing lattices for a parameter itself and for pieces of
> aggregates that are passed in the parameter or by a reference in a 
> parameter
> plus some other useful flags.  */
> @@ -281,6 +301,8 @@ public:
>ipcp_agg_lattice *aggs;
>/* Lattice describing known alignment.  */
>ipcp_alignment_lattice alignment;
> +  /* Lattice describing value range.  */
> +  ipcp_vr_lattice vr;
>/* Number of 

Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue

2016-07-15 Thread Dominik Vogt
Final version 6 with the stray comment removed (was a harmless
oversight).

Initial description of the patch:
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* cfgexpand.c (expand_stack_vars): Implement synamic stack space
allocation in the prologue.
* explow.c (get_dynamic_stack_base): New function to return an address
expression for the dynamic stack base.
(get_dynamic_stack_size): New function to do the required dynamic stack
space size calculations.
(allocate_dynamic_stack_space): Use new functions.
(align_dynamic_address): Move some code from
allocate_dynamic_stack_space to new function.
* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

* gcc.target/s390/warn-dynamicstack-1.c: New test.
* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
stack-layout-dynamic-1.c: New test.
>From 03d09e6adacf582d52f53ca36cfa3c001ed444e5 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.
---
 gcc/cfgexpand.c|  22 +-
 gcc/explow.c   | 225 ++---
 gcc/explow.h   |   8 +
 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c  |  14 ++
 gcc/testsuite/gcc.dg/stack-usage-2.c   |   4 +-
 .../gcc.target/s390/warn-dynamicstack-1.c  |  17 ++
 6 files changed, 206 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/warn-dynamicstack-1.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index e4ddb3a..5046d61 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1053,6 +1053,7 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
   HOST_WIDE_INT large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
   unsigned large_align = 0;
+  bool large_allocation_done = false;
   tree decl;
 
   /* Determine if there are any variables requiring "large" alignment.
@@ -1096,11 +1097,6 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
  large_size &= -(HOST_WIDE_INT)alignb;
  large_size += stack_vars[i].size;
}
-
-  /* If there were any, allocate space.  */
-  if (large_size > 0)
-   large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-  large_align, true);
 }
 
   for (si = 0; si < n; ++si)
@@ -1186,6 +1182,22 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
  /* Large alignment is only processed in the last pass.  */
  if (pred)
continue;
+
+ /* If there were any variables requiring "large" alignment, allocate
+space.  */
+ if (large_size > 0 && ! large_allocation_done)
+   {
+ HOST_WIDE_INT loffset;
+ rtx large_allocsize;
+
+ large_allocsize = GEN_INT (large_size);
+ get_dynamic_stack_size (_allocsize, 0, large_align, NULL);
+ loffset = alloc_stack_frame_space
+   (INTVAL (large_allocsize),
+PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+ large_base = get_dynamic_stack_base (loffset, large_align);
+ large_allocation_done = true;
+   }
  gcc_assert (large_base != NULL);
 
  large_alloc += alignb - 1;
diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..a345690 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1146,82 +1146,55 @@ record_new_stack_level (void)
 update_sjlj_context ();
 }
 
-/* Return an rtx representing the address of an area of memory dynamically
-   pushed on the stack.
+/* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
+static rtx
+align_dynamic_address (rtx target, unsigned required_align)
+{
+  /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
+ but we know it can't.  So add ourselves and then do
+ TRUNC_DIV_EXPR.  */
+  target = expand_binop (Pmode, add_optab, target,
+gen_int_mode (required_align / BITS_PER_UNIT - 1,
+  Pmode),
+NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  target = expand_divmod (0, 

Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue

2016-07-15 Thread Dominik Vogt
On Fri, Jul 15, 2016 at 01:51:51PM +0200, Bernd Schmidt wrote:
> On 07/14/2016 11:10 AM, Dominik Vogt wrote:
> 
> >-  if (flag_stack_usage_info)
> >-stack_usage_size += extra;
> >+  /*!!!*/
> >+  if (flag_stack_usage_info && pstack_usage_size)
> >+*pstack_usage_size += extra;
> 
> What does the comment mean?

It means that I wanted to look at that code location for some
reason and forgot to do so.  Need to check why I've put it in
there in the first place.

> Other than that it looks ok, although I still feel uneasy about the
> unexplained movement of code since v1 - next time you fix a bug in a
> patch, please document what happened.

Yep.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany


Re: [PATCH v5] Allocate constant size dynamic stack space in the prologue

2016-07-15 Thread Bernd Schmidt

On 07/14/2016 11:10 AM, Dominik Vogt wrote:


-  if (flag_stack_usage_info)
-stack_usage_size += extra;
+  /*!!!*/
+  if (flag_stack_usage_info && pstack_usage_size)
+*pstack_usage_size += extra;


What does the comment mean? Whatever it is needs to be addressed and the 
comment removed.


Other than that it looks ok, although I still feel uneasy about the 
unexplained movement of code since v1 - next time you fix a bug in a 
patch, please document what happened.



Bernd


[PATCH] c++/58796 Make nullptr match exception handlers of pointer type

2016-07-15 Thread Jonathan Wakely

This allows exception handlers of pointer and pointer to data members
to match exceptions of type std::nullptr_t.

Pointers to member functions don't work yet, because a nullptr_t
exception object is only a single word, and we need two words for a
pointer to member function, and we don't have anywhere to put those
two words in the exception object. I'm going to raise this with the
C++ ABI group to coordinate with other implementations.

Tested x86_64-linux, I plan to commit this to trunk soon.


libstdc++-v3:

PR c++/58796
* libsupc++/pbase_type_info.cc (__pbase_type_info::__do_catch): Make
nullptr match handlers of pointer type.

gcc/testsuite:

PR c++/58796
* g++.dg/cpp0x/nullptr21.C: Remove void* handlers.
* g++.dg/cpp0x/nullptr35.C: New test.



commit 2e6bb9fbc1d47915bdfe487efd7e870600b4b442
Author: Jonathan Wakely 
Date:   Thu Jul 14 14:33:24 2016 +0100

c++/58796 Make nullptr match exception handlers of pointer type

libstdc++-v3:

	PR c++/58796
	* libsupc++/pbase_type_info.cc (__pbase_type_info::__do_catch): Make
	nullptr match handlers of pointer type.

gcc/testsuite:

	PR c++/58796
	* g++.dg/cpp0x/nullptr21.C: Remove void* handlers.
	* g++.dg/cpp0x/nullptr35.C: New test.

diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr21.C b/gcc/testsuite/g++.dg/cpp0x/nullptr21.C
index 89884b9..c6f560e 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nullptr21.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr21.C
@@ -18,8 +18,6 @@ int main()
 {
   try {
 throw nullptr;
-  } catch (void*) {
-foo (0, 1);
   } catch (bool) {
 foo (0, 2);
   } catch (int) {
@@ -35,8 +33,6 @@ int main()
   nullptr_t mynull = 0;
   try {
 throw mynull;
-  } catch (void*) {
-foo (1, 1);
   } catch (bool) {
 foo (1, 2);
   } catch (int) {
diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr35.C b/gcc/testsuite/g++.dg/cpp0x/nullptr35.C
new file mode 100644
index 000..2f93ce1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr35.C
@@ -0,0 +1,52 @@
+// { dg-do run { target c++11 } }
+
+// Test catching as pointer and pointer to member types, [except.handle] p3.
+
+extern "C" void abort (void);
+
+typedef decltype(nullptr) nullptr_t;
+
+int result = 0;
+
+void __attribute__((noinline))
+caught(int bit)
+{
+  result &= bit;
+}
+
+struct A { };
+
+int main()
+{
+  try {
+try {
+  try {
+try {
+  try {
+throw nullptr;
+  } catch (void* p) {
+if (p == nullptr)
+  caught(1);
+throw;
+  }
+} catch (void(*pf)()) {
+  if (pf == nullptr)
+caught(2);
+  throw;
+}
+  } catch (int A::*pm) {
+if (pm == nullptr)
+  caught(4);
+throw;
+  }
+} catch (int (A::*pmf)()) {  // FIXME: currently unsupported
+  if (pmf == nullptr)
+caught(8);
+  throw;
+}
+  } catch (nullptr_t) {
+  }
+
+  if (result != 7) // should be 15
+abort ();
+}
diff --git a/libstdc++-v3/libsupc++/pbase_type_info.cc b/libstdc++-v3/libsupc++/pbase_type_info.cc
index d47fb23..a2993e4 100644
--- a/libstdc++-v3/libsupc++/pbase_type_info.cc
+++ b/libstdc++-v3/libsupc++/pbase_type_info.cc
@@ -38,6 +38,31 @@ __do_catch (const type_info *thr_type,
 return true;  // same type
 
 #if __cpp_rtti
+  if (*thr_type == typeid (nullptr))
+{
+  // A catch handler for any pointer type matches nullptr_t.
+  if (typeid (*this) == typeid(__pointer_type_info))
+{
+  *thr_obj = nullptr;
+  return true;
+}
+  else if (typeid (*this) == typeid(__pointer_to_member_type_info))
+{
+  if (__pointee->__is_function_p ())
+{
+  // A pointer-to-member-function is two words  but the
+  // nullptr_t exception object at *(nullptr_t*)*thr_obj is only
+  // one word, so we can't safely return it as a PMF. FIXME.
+  return false;
+}
+  else
+{
+  *(ptrdiff_t*)*thr_obj = -1; // null pointer to data member
+  return true;
+}
+}
+}
+
   if (typeid (*this) != typeid (*thr_type))
 return false; // not both same kind of pointers
 #endif


Re: [PATCH/AARCH64] Add rtx_costs routine for vulcan.

2016-07-15 Thread James Greenhalgh
On Fri, Jul 15, 2016 at 10:08:11AM +0530, Virendra Pathak wrote:
> Hi James/Julian,
> 
> Kindly merge this patch to the trunk.

Done as r238372.

Thanks,
James

> >>> gcc/ChangeLog:
> >>>
> >>> Virendra Pathak  
> >>> Julian Brown  
> >>>
> >>> * config/aarch64/aarch64-cores.def: Update vulcan COSTS.
> >>> * config/aarch64/aarch64-cost-tables.h
> >>> (vulcan_extra_costs): New variable.
> >>> * config/aarch64/aarch64.c
> >>> (vulcan_addrcost_table): Likewise.
> >>> (vulcan_regmove_cost): Likewise.
> >>> (vulcan_vector_cost): Likewise.
> >>> (vulcan_branch_cost): Likewise.
> >>> (vulcan_tunings): Likewise.



Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-15 Thread Kyrill Tkachov


On 14/07/16 14:31, Rainer Orth wrote:

Hi Richard,


On Thu, 14 Jul 2016, Kyrill Tkachov wrote:


Hi all,

On 07/07/16 17:16, Kyrill Tkachov wrote:

On 06/07/16 13:40, Kyrill Tkachov wrote:

On 06/07/16 13:31, Rainer Orth wrote:

Hi Kyrill,


On 05/07/16 12:24, Rainer Orth wrote:

Marc Glisse  writes:


On Tue, 5 Jul 2016, Kyrill Tkachov wrote:


As for testing I've bootstrapped and tested the patch on aarch64
and
x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked
to be
always true to exercise the paths that synthesize the shift by
additions. Marc, could you test this on the sparc targets you
care about?

I don't have access to Sparc, you want Rainer here (added in Cc:).

As is, the patch doesn't even build on current mainline:

/vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error:
'mult_variant' has not been declared
target_supports_mult_synth_alg (struct algorithm *alg,
mult_variant var,
^

enum mult_variant is only declared in expmed.c.

Ah, sorry I forgot to mention that this is patch 2/2.
It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
which
is already approved
but I was planning to commit it together with this one.
Can you please try applying
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
as well as this?

sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
that the patch fixes PR tree-optimization/70923 (you should mention that
in the ChangeLog or close it as a duplicate), with the same caveat as
about Marc's latest patch for that:

Thanks! Much appreciated.


+FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1
loops" 1

The message appears twice, not once, on sparc, so the testcase should be
updated to accomodate that.

Ok.


Besides, the new testcase FAILs:

+FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 2
+FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1
loops" 2

The dump contains

gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt:
_4 = *_3;
gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt:
_4 = *_3;
gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function
calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function
calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

I see. I suppose SPARC doesn't have vector shifts operating on 64-bit
data?
I believe the testcase should be updated to use just "int" arrays rather
than "long long".

I'll respin the testcases

Ok, here's the patch with pr65951.c updated to use int rather than long long
arrays and vect-iv-9.c updated to scan for the
"Vectorized 1 loops" string twice. I've verified by hand by building a
sparc-sun-solaris2.10 cc1 that compiling with
-mcpu=ultrasparc -mvis gives the right strings in the vectoriser dump.

So is this version ok?


So richi was ok with the implementation part of the patch [1]
Are the testuite changes made since ok?

[1] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00177.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00331.html

Ok with me - I wondered if Rainer wanted to double-check.

I just ran a sparc-sun-solaris2.12 bootstrap with the patch(es) applied
and results look good: the remaining failures are unrelated and have
already been reported separately.


Thanks for confirming.
I've committed the patches to trunk.

Kyrill



Thanks.
 Rainer





[PATCH] Some PRE cleanups

2016-07-15 Thread Richard Biener

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-15  Richard Biener  

* tree-ssa-pre.c (get_representative_for): Make sure to return
the value number of SSA names.
(phi_translate_1): get_representative_for cannot return NULL.
(do_pre_regular_insertion): Remove redundant call to
fully_constant_expression.
(do_pre_partial_partial_insertion): Likewise.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 238369)
--- gcc/tree-ssa-pre.c  (working copy)
*** get_representative_for (const pre_expr e
*** 1365,1371 
switch (e->kind)
  {
  case NAME:
!   return PRE_EXPR_NAME (e);
  case CONSTANT:
return PRE_EXPR_CONSTANT (e);
  case NARY:
--- 1365,1371 
switch (e->kind)
  {
  case NAME:
!   return VN_INFO (PRE_EXPR_NAME (e))->valnum;
  case CONSTANT:
return PRE_EXPR_CONSTANT (e);
  case NARY:
*** get_representative_for (const pre_expr e
*** 1380,1386 
  {
pre_expr rep = expression_for_id (i);
if (rep->kind == NAME)
! return PRE_EXPR_NAME (rep);
else if (rep->kind == CONSTANT)
  return PRE_EXPR_CONSTANT (rep);
  }
--- 1380,1386 
  {
pre_expr rep = expression_for_id (i);
if (rep->kind == NAME)
! return VN_INFO (PRE_EXPR_NAME (rep))->valnum;
else if (rep->kind == CONSTANT)
  return PRE_EXPR_CONSTANT (rep);
  }
*** phi_translate_1 (pre_expr expr, bitmap_s
*** 1448,1459 
leader = find_leader_in_sets (op_val_id, set1, set2);
  result = phi_translate (leader, set1, set2, pred, phiblock);
if (result && result != leader)
! {
!   tree name = get_representative_for (result);
!   if (!name)
! return NULL;
!   newnary->op[i] = name;
! }
else if (!result)
  return NULL;
  
--- 1448,1454 
leader = find_leader_in_sets (op_val_id, set1, set2);
  result = phi_translate (leader, set1, set2, pred, phiblock);
if (result && result != leader)
! newnary->op[i] = get_representative_for (result);
else if (!result)
  return NULL;
  
*** phi_translate_1 (pre_expr expr, bitmap_s
*** 1543,1561 
  }
op_val_id = VN_INFO (op[n])->value_id;
leader = find_leader_in_sets (op_val_id, set1, set2);
-   if (!leader)
- break;
opresult = phi_translate (leader, set1, set2, pred, phiblock);
!   if (!opresult)
! break;
!   if (opresult != leader)
  {
tree name = get_representative_for (opresult);
-   if (!name)
- break;
changed |= name != op[n];
op[n] = name;
  }
  }
if (n != 3)
  {
--- 1538,1552 
  }
op_val_id = VN_INFO (op[n])->value_id;
leader = find_leader_in_sets (op_val_id, set1, set2);
opresult = phi_translate (leader, set1, set2, pred, phiblock);
!   if (opresult && opresult != leader)
  {
tree name = get_representative_for (opresult);
changed |= name != op[n];
op[n] = name;
  }
+   else if (!opresult)
+ break;
  }
if (n != 3)
  {
*** do_pre_regular_insertion (basic_block bl
*** 3198,3204 
  break;
}
  
- eprime = fully_constant_expression (eprime);
  vprime = get_expr_value_id (eprime);
  edoubleprime = bitmap_find_leader (AVAIL_OUT (bprime),
 vprime);
--- 3189,3194 
*** do_pre_partial_partial_insertion (basic_
*** 3357,3363 
  break;
}
  
- eprime = fully_constant_expression (eprime);
  vprime = get_expr_value_id (eprime);
  edoubleprime = bitmap_find_leader (AVAIL_OUT (bprime), vprime);
  avail[pred->dest_idx] = edoubleprime;
--- 3347,3352 


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-07-15 Thread Richard Biener
On Fri, Jun 17, 2016 at 4:33 PM, Ilya Enkovich  wrote:
> 2016-06-16 9:00 GMT+03:00 Jeff Law :
>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces changes required to run vectorizer on loop epilogue.
>>> This also enables epilogue vectorization using a vector of smaller size.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2016-05-19  Ilya Enkovich  
>>>
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>> try to enhance alignment for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>> created loop.
>>> * tree-vect-loop.c: include tree-if-conv.h.
>>> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>> loop->aux.
>>> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>> loop->aux.
>>> (vect_analyze_loop): Reset loop->aux.
>>> (vect_transform_loop): Check if created epilogue should be
>>> returned
>>> for further vectorization.  If-convert epilogue if required.
>>> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>> process and insert vectorized loop epilogues into this queue.
>>> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>>> created
>>> loop.
>>> (vect_transform_loop): Return created loop.
>>
>> As Richi noted, the additional calls into the if-converter are unfortunate.
>> I'm not sure how else to avoid them though.  It looks like we can run
>> if-conversion on just the epilogue, so maybe that's not too bad.
>>
>>
>>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>>> bool clean_stmts)
>>>destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>>loop_vinfo->scalar_cost_vec.release ();
>>>
>>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>>free (loop_vinfo);
>>> -  loop->aux = NULL;
>>>  }
>>
>> Hmm, there seems to be a level of indirection I'm missing here.  We're
>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>> the original loop to the vectorized epilogue.  What am I missing?  Rather
>> than smuggling around in the aux field, is there some inherent reason why we
>> can't just copy the info from the original loop directly into
>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>
> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
>  - mark this loop as epilogue
>  - get VF of original loop (required for both mask and nomask modes)
>  - get decision about epilogue masking
>
> That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
> vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
> loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
> LOOP_VINFO_ORIG_LOOP_INFO.  When Loop has no LOOP_VINFO associated I have no
> place to bind it with the original loop and therefore I use vacant loop->aux
> for that.  Any other way to bind epilogue with its original loop would work
> as well.  I just chose loop->aux to avoid new fields and data structures.

Maybe simply changing the way the vectorizer iterates over loops like
re-cursing on the generated epilogue and passing down its origin.

>>
>>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>>> + Don't try to vectorize epilogue because it will require
>>> + additional alias checks.  */
>>
>> Are the alias checks here redundant with the ones done for the original
>> loop?  If so won't DOM eliminate them?
>
> I revisited this part recently and thought it should actually be safe to
> assume we have no aliasing in epilogue because we are dominated by alias
> checks of the original loop.  So I prepared a patch to remove this restriction
> and avoid alias checks generation for epilogues (so we compute aliases checks
> required but don't emit them).  I didn't send this patch yet.
> Do you think it is a valid assumption?
>
>>
>>
>> And something just occurred to me -- is there some inherent reason why SLP
>> doesn't vectorize the epilogue, particularly for the cases where we can
>> vectorize the epilogue using smaller vectors?  Sorry if you've already
>> answered this somewhere or it's a dumb question.
>
> IIUC this may happen only if we unroll epilogue into a single BB which happens
> only when epilogue iterations count is known. Right?
>
>>
>>
>>
>>>
>>> +   /* Add new loop to a processing queue.  To make it easier
>>> +  to match loop and its epilogue vectorization in dumps
>>> +  put new loop as the next loop to process.  */
>>> +   if (new_loop)
>>> + {
>>> +   loops.safe_insert (i + 1, new_loop->num);
>>> +   

Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-07-15 Thread Richard Biener
On Thu, Jun 16, 2016 at 8:00 AM, Jeff Law  wrote:
> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces changes required to run vectorizer on loop epilogue.
>> This also enables epilogue vectorization using a vector of smaller size.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * tree-if-conv.c (tree_if_conversion): Make public.
>> * tree-if-conv.h: New file.
>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>> try to enhance alignment for epilogues.
>> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>> created loop.
>> * tree-vect-loop.c: include tree-if-conv.h.
>> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>> loop->aux.
>> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>> loop->aux.
>> (vect_analyze_loop): Reset loop->aux.
>> (vect_transform_loop): Check if created epilogue should be
>> returned
>> for further vectorization.  If-convert epilogue if required.
>> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>> process and insert vectorized loop epilogues into this queue.
>> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>> created
>> loop.
>> (vect_transform_loop): Return created loop.
>
> As Richi noted, the additional calls into the if-converter are unfortunate.
> I'm not sure how else to avoid them though.  It looks like we can run
> if-conversion on just the epilogue, so maybe that's not too bad.

We could use the if-converted loop as source when doing the loop copy
for the epilogue...  (and do it similar to if-conversion when it inserts a
__builtin_vectorized_loop () check, that is, create two versions for
the epilogue).

>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>> bool clean_stmts)
>>destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>loop_vinfo->scalar_cost_vec.release ();
>>
>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>free (loop_vinfo);
>> -  loop->aux = NULL;
>>  }
>
> Hmm, there seems to be a level of indirection I'm missing here.  We're
> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
> the original loop to the vectorized epilogue.  What am I missing?  Rather
> than smuggling around in the aux field, is there some inherent reason why we
> can't just copy the info from the original loop directly into
> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>
>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>> + Don't try to vectorize epilogue because it will require
>> + additional alias checks.  */
>
> Are the alias checks here redundant with the ones done for the original
> loop?  If so won't DOM eliminate them?

They are too complex for this.  But the epilogue could be annotated with ivdep
pragma / safelen in some way?

> And something just occurred to me -- is there some inherent reason why SLP
> doesn't vectorize the epilogue, particularly for the cases where we can
> vectorize the epilogue using smaller vectors?  Sorry if you've already
> answered this somewhere or it's a dumb question.

It usually can but only if we unroll the epilogue later (and thus when the
number of iterations is known at compile-time).

>
>
>>
>> +   /* Add new loop to a processing queue.  To make it easier
>> +  to match loop and its epilogue vectorization in dumps
>> +  put new loop as the next loop to process.  */
>> +   if (new_loop)
>> + {
>> +   loops.safe_insert (i + 1, new_loop->num);
>> +   vect_loops_num = number_of_loops (cfun);
>> + }
>> +
>
> So just to be clear, the only reason to do this is for dumps -- other than
> processing the loop before it's epilogue, there's no other inherently
> necessary ordering of the loops, right?
>
>
> Jeff


Re: [C PATCH] For implicit function declaration suggestions only consider fns and fn pointers (PR c/71858)

2016-07-15 Thread Bernd Schmidt

On 07/14/2016 04:53 PM, Jakub Jelinek wrote:


2016-07-14  Jakub Jelinek  

PR c/71858
* c-common.h (enum lookup_name_fuzzy_kind): Add
FUZZY_LOOKUP_FUNCTION_NAME.

* c-decl.c (implicit_decl_warning): Use FUZZY_LOOKUP_FUNCTION_NAME
instead of FUZZY_LOOKUP_NAME.
(lookup_name_fuzzy): For FUZZY_LOOKUP_FUNCTION_NAME consider
FUNCTION_DECLs, {VAR,PARM}_DECLs function pointers and macros.

* gcc.dg/spellcheck-identifiers-3.c: New test.


Ok.


Bernd



Re: [RFC, v2] Test coverage for --param boundary values

2016-07-15 Thread Andreas Schwab
Martin Liška  writes:

> This is my second attempt of the patch where I generate all tests on fly.
> Firstly, params-options.h is used to generate a list of options in form of:
>
> "predictable-branch-outcome"=2,0,50
> "inline-min-speedup"=10,0,0
> "max-inline-insns-single"=400,0,0
> ...
>
> The list is then loaded in params.ext which triggers dg-runtest.
> I've also swapped the tested source file to a part of bzip2 compression 
> algorithm.
>
> Now the testing runs 1m20s (w/ -O3) and 0m58s (w/ -O2).
> Hope it's much better?

That generates 231 identically named tests, which is bad.  All tests
must have unique titles.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, vec-tails 10/10] Tests

2016-07-15 Thread Ilya Enkovich
2016-07-14 20:32 GMT+03:00 Jeff Law :
> On 07/05/2016 09:44 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch adds several tests to check tails vectorization functionality.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/testsuite/
>>
>> 2016-07-05  Ilya Enkovich  
>>
>> * lib/target-supports.exp (check_avx2_hw_available): New.
>> (check_effective_target_avx2_runtime): New.
>> * gcc.dg/vect/vect-tail-combine-1.c: New test.
>> * gcc.dg/vect/vect-tail-combine-2.c: New test.
>> * gcc.dg/vect/vect-tail-combine-3.c: New test.
>> * gcc.dg/vect/vect-tail-combine-4.c: New test.
>> * gcc.dg/vect/vect-tail-combine-5.c: New test.
>> * gcc.dg/vect/vect-tail-combine-6.c: New test.
>> * gcc.dg/vect/vect-tail-combine-7.c: New test.
>> * gcc.dg/vect/vect-tail-combine-9.c: New test.
>> * gcc.dg/vect/vect-tail-mask-1.c: New test.
>> * gcc.dg/vect/vect-tail-mask-2.c: New test.
>> * gcc.dg/vect/vect-tail-mask-3.c: New test.
>> * gcc.dg/vect/vect-tail-mask-4.c: New test.
>> * gcc.dg/vect/vect-tail-mask-5.c: New test.
>> * gcc.dg/vect/vect-tail-mask-6.c: New test.
>> * gcc.dg/vect/vect-tail-mask-7.c: New test.
>> * gcc.dg/vect/vect-tail-mask-8.c: New test.
>> * gcc.dg/vect/vect-tail-mask-9.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-2.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-3.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-4.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-5.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-6.c: New test.
>> * gcc.dg/vect/vect-tail-nomask-7.c: New test.
>
> This is fine when the rest of the patches go in.
>
>
>> + unsigned int eax, ebx, ecx, edx;
>> + if (!__get_cpuid (1, , , , )
>> + || ((ecx & bit_OSXSAVE) != bit_OSXSAVE))
>> +   return 1;
>> +
>> + if (__get_cpuid_max (0, NULL) < 7)
>> +   return 1;
>> +
>> + __cpuid_count (7, 0, eax, ebx, ecx, edx);
>> +
>> + return (ebx & bit_AVX2) != bit_AVX2;
>
> Ugh.  I'm going to trust this is correct.  I vaguely recall mucking around
> with this stuff for the original AVX in glibc several years ago.

Actually I just copied some code from avx2-check.h.  Kirill should be able
to review this piece of code.

Thanks,
Ilya

>
> jeff
>


GCC testsuite maintenance (was: [PATCH] Fix OpenACC vector_length parsing in fortran)

2016-07-15 Thread Thomas Schwinge
Hi!

Including  in case others also have opinions on "GCC
testsuite maintenance".

On Fri, 15 Jul 2016 10:21:13 +0200, Jakub Jelinek  wrote:
> On Fri, Jul 15, 2016 at 09:44:25AM +0200, Thomas Schwinge wrote:
> > Does that me we (erroneously) accept any random junk after [...]
> 
> No.  The thing is, [...]

Thanks for the explanation!


> > On Thu, 14 Jul 2016 19:05:52 -0700, Cesar Philippidis 
> >  wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gfortran.dg/goacc/vector_length.f90
> > > @@ -0,0 +1,11 @@
> > > +program t
> > > +  implicit none
> > > +  integer, parameter :: n = 100
> > > +  integer a(n), i
> > > +
> > > +  !$acc parallel loop num_gangs(100) num_workers(1) vector_length(32)
> > > +  do i = 1, n
> > > + a(i) = i
> > > +  enddo
> > > +  !$acc end parallel loop
> > > +end program t
> > 
> > My preference is that such tests are added to existing test files, for
> > example where "!$acc parallel loop" is tested, instead of adding such
> > mini files.

One of the problem I have with this specific and similar other test cases
is, that when coming back to such a test case in a few weeks' time, it is
not obvious at all what this is testing, whereas if this were inside a
file that generally/already tests all kinds of "!$acc parallel loop"
variants, this would be obvious.

> No, we want to have as little churn as possible in existing tests, the
> general policy is to add new tests (not just for OpenACC/OpenMP, but for
> all functionality).

Hmm, that's something I had not been aware of, and I can't find this
covered in the documentation.  So, you're basically saying that files in
the testsuite are write-once, and should not be maintained aside from
fixing errors, adjusting due to optimization changes, or due to changed
diagnostics, and the like?  (Of course, I do agree that we shouldn't
"randomly" remove/modify test cases that test stuff that is still
relevant, but that doesn't apply here, obviously.)

In my opinion, the testsuite should be maintained, just like "regular"
code, and so related test cases should be consolidated, obsolete ones be
removed, and so on, every once in a while, or on demand.  So that the GCC
testsuite doesn't just always grow (in number of files), but also "stays
relevant".  Also, the wall-time overhead is much lower if, taking this
specific example, that one OpenACC directive test case is tested next to
others in an existing testsuite file, as opposed to requiring a separate
compiler invocation, being in a separate file.


Grüße
 Thomas


Re: [Fortran, Patch, PR71807, v1] [5/6/7 Regression] Internal compiler error with NULL() reference in structure constructor

2016-07-15 Thread Andre Vehreschild
Hi Jerry,

thanks for the fast review. Committed to trunk as r238368.

Regards,
Andre

On Thu, 14 Jul 2016 10:49:37 -0700
Jerry DeLisle  wrote:

> On 07/14/2016 09:17 AM, Andre Vehreschild wrote:
> > Hi all,
> > 
> > attached patch fixes the ICE when assigning null() to an allocatable
> > component in an initializer.
> > 
> > Bootstrapped and regtested on x86_64-linux/F23 ok for trunk,
> > gcc-6-branch and gcc-5-branch.
> > 
> > Ok for trunk and one week later for the other branches?
> > 
> > Regards,
> > Andre
> >   
> 
> Yes, OK and thanks for patch.
> 
> Jerry


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 238367)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2016-07-15  Andre Vehreschild  
+
+	PR fortran/71807
+	* trans-expr.c (gfc_trans_subcomponent_assign): Special casing
+	when allocatable component is set to null() in initializer.
+
 2016-07-14  Steven G. Kargl  
 
 	PR fortran/29819
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(Revision 238367)
+++ gcc/fortran/trans-expr.c	(Arbeitskopie)
@@ -7200,6 +7200,12 @@
   tmp = gfc_trans_alloc_subarray_assign (tmp, cm, expr);
   gfc_add_expr_to_block (, tmp);
 }
+  else if (init && cm->attr.allocatable && expr->expr_type == EXPR_NULL)
+{
+  /* NULL initialization for allocatable components.  */
+  gfc_add_modify (, dest, fold_convert (TREE_TYPE (dest),
+		  null_pointer_node));
+}
   else if (init && (cm->attr.allocatable
 	   || (cm->ts.type == BT_CLASS && CLASS_DATA (cm)->attr.allocatable
 	   && expr->ts.type != BT_CLASS)))
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 238367)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2016-07-15  Andre Vehreschild  
+
+	PR fortran/71807
+	* gfortran.dg/null_9.f90: New test.
+
 2016-07-15  Bin Cheng  
 
 	* gcc.dg/tree-ssa/loop-41.c: New test.
Index: gcc/testsuite/gfortran.dg/null_9.f90
===
--- gcc/testsuite/gfortran.dg/null_9.f90	(nicht existent)
+++ gcc/testsuite/gfortran.dg/null_9.f90	(Arbeitskopie)
@@ -0,0 +1,30 @@
+! { dg-do run }
+
+MODULE fold_convert_loc_ice
+  IMPLICIT NONE
+  PRIVATE
+
+  TYPE, PUBLIC :: ta
+PRIVATE
+INTEGER :: a_comp
+  END TYPE ta
+
+  TYPE, PUBLIC :: tb
+TYPE(ta), ALLOCATABLE :: b_comp
+  END TYPE tb
+
+  PUBLIC :: proc
+CONTAINS
+  SUBROUTINE proc
+TYPE(tb) :: b
+
+b = tb(null())
+if (allocated( b%b_comp )) call abort()
+  END SUBROUTINE proc
+END MODULE fold_convert_loc_ice
+
+  USE fold_convert_loc_ice
+
+  call proc()
+END
+


Re: [PATCH GCC]Improve loop-niter to handle possible infinite loop.

2016-07-15 Thread Bin.Cheng
On Thu, Jul 14, 2016 at 12:42 PM, Richard Biener
 wrote:
> On Wed, Jul 13, 2016 at 6:09 PM, Bin.Cheng  wrote:
>> On Fri, Jul 1, 2016 at 11:33 AM, Richard Biener
>>  wrote:
>>> On Tue, Jun 28, 2016 at 8:18 AM, Bin Cheng  wrote:
 Hi,
 At the moment, loop niter analyzer depends on simple_iv to understand 
 control induction variable in order to do further niter analysis.  For 
 cases reported in PR57558 (comment #4), the control variable is not an 
 SCEV because it's converted from an smaller type and that type could 
 overflow/wrap.  As a result, niter information for such loops won't be 
 analyzed because number_of_iterations_exit exits immediately once 
 simple_iv fails.  As a matter of fact, niter analyzer can be improved by 
 computing an additional assumption under which the loop won't be infinite 
 (or the corresponding iv doesn't overflow).  This patch does so by 
 changing both scev and niter analyzer.  It introduces a new interface 
 simple_iv_with_niters which is used in niter analyzer.  The new interface 
 returns an IV as well as a possible niter expression, the expression 
 indicates the maximum number the IV can iterate before the returned 
 simple_iv becoming invalid.  Given below example:

   short unsigned int i;
   int _1;

   :
   goto ;

   :
   arr[_1] = -1;
   i_6 = i_2 + 1;

   :
   # i_2 = PHI <1(2), i_6(3)>
   _1 = (int) i_2;
   if (_1 != 199)
 goto ;
   else
 goto ;

   :
   return;

 When analyzing variable "_1", the old interface simple_iv returns false, 
 while the new interface returns <{1, 1}_loop, 65534>.  It means "_1" is a 
 valid SCEV as long as it doesn't iterate more than 65534 times.
 This patch also introduces a new interface in niter analyzer 
 number_of_iterations_exit_assumptions.  The new interface further derives 
 assumption from the result of simple_iv_with_niters, and the assumption 
 can be used by other optimizers.  As for this loop, it's an unexpected 
 gain because assumption (198 < 65534) is naturally TRUE.
 For loops that could be infinite, the assumption will be an expression.  
 This improvement is still good, for example, the next patch to will 
 vectorize such loops with help of this patch.

 This patch also changes how assumptions is handled in niter analyzer.  At 
 the moment, (non-true) assumptions are not supported unless 
 -funsafe-loop-optimizations are specified, after this patch, the new 
 interface exposes assumptions to niter user and let them make their own 
 decision.  In effect, this patch discards -funsafe-loop-optimizations on 
 GIMPLE level, which we think is not a very useful option anyway.  Next 
 patch will xfails tests for this option.  Well, the option is not totally 
 discarded because it requires RTL changes too.  I will follow up that 
 after gimple part change.

 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Please make simple_iv_with_niters accept NULL as iv_niters and pass
>>> NULL from simple_iv to avoid useless work.
>>>
>>> You have chosen to make the flags per loop instead of say, flags on
>>> the global loop state.  The patch itself doesn't set the flag
>>> on any loop thus it doesn't really belong into this patch IMHO, so
>>> maybe you can split it out.  We do already have a plethora
>>> of "flags" (badly packed) in struct loop and while I see the interface
>>> is sth very specific adding another 4 bytes doesn't look
>>> too nice here (but maybe we don't care, struct loop isn't allocated
>>> that much hopefully).  I'd like to see a better comment
>>> before the flags part in cfgloop.h though noting that those are not
>>> flags computed by the compiler but flags that are set
>>> by the consumer that affect semantics of certain (document which)
>>> APIs.  And that consumers are expected to re-set
>>> flags back!  (failing to do that might be a source of hard to track down 
>>> bugs?)
>>>
>>> Anyway, the _assumtions part of the patch is ok with the suggested change.
>>>
>> Hi Richard,
>> According to your suggestion, I split the original patch into two, and
>> this is the first part.  It improves scalar evolution and loop niter
>> analyzer so that (possible) infinite loop can be handled.  As a bonus
>> point, it also picks up normal loops which were missed before, for
>> example, loop in the added test.
>>
>> Bootstrap and test on x86_64 ongoing, Is it OK?
>
> Ok.
Applied as revision 238367, failure of gcc.dg/tree-ssa/pr19210-*.c
will be handled by following patch removing
-funsafe-loop-optimizations.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-07-13  Bin Cheng  
>>
>> * tree-scalar-evolution.c (simple_iv_with_niters): New 

Re: [RFC][IPA-VRP] Disable setting param of __builtin_constant_p to null

2016-07-15 Thread Jan Hubicka
> Hi,
> 
> 
> 
> VRP assumes that it is run after inlining. Therefore, if there is a
> call to __builtin_constant_p with function parameter, it resolve it
> to false to avoid bogus warnings. Since we use this as an early vrp
> before inling, it leads to  wrong code. As a workaround I have
> disabled it for the time being. That is, this patch is not intended
> for committing but just to get the VRP tested.
> 
> 
> 
> Original patch which introduced this also talks about doing it earlier.
> 
> 
> 
> 
> 
> Thanks,
> 
> Kugan

> >From 99f8e7884d582cfae2d7cb50ad59dab7ac6772fc Mon Sep 17 00:00:00 2001
> From: Kugan Vivekanandarajah 
> Date: Sat, 25 Jun 2016 11:52:57 +1000
> Subject: [PATCH 1/6] Hack-Prevent setting __builtin_constant_p of param to
>  null before inlining in Early VRP
> 
> ---
>  gcc/tree-vrp.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index ecfab1f..23c12b5 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -3759,8 +3759,10 @@ extract_range_basic (value_range *vr, gimple *stmt)
> && SSA_NAME_IS_DEFAULT_DEF (arg)
> && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
>   {
> +#if 0
> set_value_range_to_null (vr, type);
> return;
> +#endif

It is not cleanest either, but better to test cfun->after_inlining

Honza


Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-07-15 Thread Bin.Cheng
On Thu, Jul 14, 2016 at 11:33 PM, Andi Kleen  wrote:
>> >> I haven't seen that. Unstable in what way?
>> > For GCC doesn't support FDO, it run below tests as you said:
>> >
>> > PASS: gcc.dg/tree-prof/20041218-1.c compilation,  -g
>> > UNSUPPORTED: gcc.dg/tree-prof/20041218-1.c: Cannot run create_gcov
>> > --binary /data/work/trunk/build/gcc/testsuite/gcc/20041218-1.gcda
>> >
>> > Normally, it doesn't create gcov data file, thus the next test is
>> > unsupported.  I guess in parallel test, the second test picks up gcov
>> > data files from other process, which results in random pass.
>> > Is it possible to not have these when fdo is supported?
>> Hmm, typo:  s/supported/not supported/.
>
> I don't see how this can happen: as you can see the .gcda file name
> is unique for each test case.
>
> I think there may be problems with the perf.data, could add a postfix.
> But you should only see that with real autofdo.
Don't know.  I only configured/built GCC on x86_64 with below command lines:

$ ../gcc/configure --prefix=...
--enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++
--with-build-config=bootstrap-O3 --disable-werror
$ make && make install

Not sure what the status for autofdo is in this case.  "make check -k"
is stable for me, but "make check -k -j#" gives unstable result in
tree-prof.exp tests.  Anything I did wrong?

Thanks,
bin


Re: [patch, fortran] PR62125 Nested select type not accepted (rejects valid)

2016-07-15 Thread Mikael Morin

Le 15/07/2016 à 03:26, Jerry DeLisle a écrit :

Hit send too soon and forgot to fill out the subject line. Correctly
given here.
On 07/14/2016 10:47 AM, Jerry DeLisle wrote:

This simple patch kindly provided by Marco solves the problem.

Regression tested on x86_64-Linux, Test case provided also.

OK to commit to trunk?


Yes. Thanks.


Re: [PATCH] zero-length arrays in OpenACC

2016-07-15 Thread Jakub Jelinek
On Wed, Jun 01, 2016 at 02:35:44PM -0700, Cesar Philippidis wrote:
> This patch teaches c and c++ front ends and omp-low how to deal with
> subarray involving GOMP_MAP_FORCE_{PRESENT,TO,FROM,TOFROM} data
> mappings. As the libgomp test case shows, it might be possible for a
> subarray to have zero length. This patch fixes that.
> 
> I also updated *parser_oacc_declare not to handle GOMP_MAP_POINTER,
> because subarray pointers are now firstprivate.
> 
> Is this OK for trunk?

Ok with me if that is the semantics OpenACC wants for zero length array
sections (which would surprise me, but it is up to you to discuss the
semantics you want).
Basically, in OpenMP 4.5, it has been voted in that zero length array
sections don't cause mapping, but rather just resolve its address to
either some device pointer if the array section points into an already
mapped object, or NULL if it corresponds to something not mapped yet.
As GNU extension, GCC still handles zero length objects (not array sections)
differently (by mapping them).

Jakub


Re: [PATCH PR71503/PR71683]Fix ICE in tree-if-conv.c

2016-07-15 Thread Bin.Cheng
On Thu, Jul 14, 2016 at 6:49 PM, Jeff Law  wrote:
> On 07/14/2016 10:12 AM, Bin Cheng wrote:
>>
>> Hi,
>> This is a simple patch fixing ICE in tree-if-conv.c.  Existing code does
>> not setup a variable (cond) when predicate of basic block is true and it
>> asserts on the variable.  Interesting thing is dead code is not cleaned up
>> before ifcvt, but that's another story.
>> Bootstrap and test on x86_64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-07-13  Bin Cheng  
>>
>> PR tree-optimization/71503
>> PR tree-optimization/71683
>> * tree-if-conv.c (gen_phi_arg_condition): Set cond when predicate
>> is true.
>
> Maybe I'm missing something, but in the case where we COND is already set
> and we encounter a true predicate later, shouldn't that make the result true
> as well?
Yes, this is my understanding too, if the case does exist.  To my
understanding, conditions for phi arguments should be complimentary to
others.  If one argument has true predicate, then other must? be
false, then the case doesn't exist.

>
> I don't think the code will though -- we just throw away the true condition.
> ISTM that the right thing to do is
>
> if (is_true_predicate (c))
>   {
> cond = c;
> continue;
>   }
Anyway, I think we can make it more robust/efficient by using:
if (is_true_predicate (c))
 {
   cond = c;
   break;
  }
What do you think?  Actually we should discard other arguments if one
has true predicate.

>
> Can you see if you can trigger a case where we have an existing cond, then
> later find a true condition to verify the right thing happens?
Failed to do so.  It should be quite rare, if no impossible,

Thanks,
bin


Re: [PATCH] Fix OpenACC vector_length parsing in fortran

2016-07-15 Thread Jakub Jelinek
On Thu, Jul 14, 2016 at 07:05:52PM -0700, Cesar Philippidis wrote:
> Is this OK for trunk and gcc6?

See other mail for trunk approval, just want to comment on the "and gcc6",
it doesn't apply to gcc6, you can add the testcase there,
but gcc 6 matches vector_length before vector, the bug has been introduced
by my gfc_match_omp_clauses alphabetical sorting.

I went through the whole function and this case is the only one that
needs the non-alphabetical match.

Jakub


Re: [PATCH] Fix OpenACC vector_length parsing in fortran

2016-07-15 Thread Jakub Jelinek
On Fri, Jul 15, 2016 at 09:44:25AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 14 Jul 2016 19:05:52 -0700, Cesar Philippidis 
>  wrote:
> > The fortran FE is currently scanning for the vector clause before
> > vector_length. That's a problem match_oacc_clause_gwv matches 'vector'
> > without looking for whatever follows it. The correction I made here was
> > to scan for vector_length before vector.
> 
> Does that me we (erroneously) accept any random junk after "vector",
> "vector_length", and likely other clauses?

No.  The thing is, for clauses where the initial condition starts with
gfc_match ("clause_name (") and longer, we can try to match them in any
order and thus using alphabetical order is ok.
If it is like for a bunch of OpenACC clauses with optional arguments
matched just with gfc_match ("clause_name"), then matching order matters.
vector_length ( arg ) in the source right now would satisfy
&& gfc_match ("vector") == MATCH_YES
and then match_oacc_clause_gwv should yield MATCH_NO, which sets
needs_space = true; and thus we reject it later on, because there is no
whitespace nor comma after vector (there is underscore).
> 
> > --- a/gcc/fortran/openmp.c
> > +++ b/gcc/fortran/openmp.c
> > @@ -1338,6 +1338,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, 
> > uint64_t mask,
> > continue;
> >   break;
> > case 'v':
> > + if ((mask & OMP_CLAUSE_VECTOR_LENGTH)
> > + && c->vector_length_expr == NULL
> > + && (gfc_match ("vector_length ( %e )", >vector_length_expr)
> > + == MATCH_YES))
> > +   continue;
> >   if ((mask & OMP_CLAUSE_VECTOR)
> >   && !c->vector
> >   && gfc_match ("vector") == MATCH_YES)
> > @@ -1353,11 +1358,6 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, 
> > uint64_t mask,
> > needs_space = true;
> >   continue;
> > }
> > - if ((mask & OMP_CLAUSE_VECTOR_LENGTH)
> > - && c->vector_length_expr == NULL
> > - && (gfc_match ("vector_length ( %e )", >vector_length_expr)
> > - == MATCH_YES))
> > -   continue;
> >   break;

The patch is ok for trunk, but please add there a short comment that
/* vector_length has to be matched before vector, because the latter
   doesn't unconditionally match '('.  */

> 
> Unless there's a more generic problem here (per my comment above), this
> at least needs a source code comment added why "vector_length" needs to
> be handled before "vector".
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/goacc/vector_length.f90
> > @@ -0,0 +1,11 @@
> > +program t
> > +  implicit none
> > +  integer, parameter :: n = 100
> > +  integer a(n), i
> > +
> > +  !$acc parallel loop num_gangs(100) num_workers(1) vector_length(32)
> > +  do i = 1, n
> > + a(i) = i
> > +  enddo
> > +  !$acc end parallel loop
> > +end program t
> 
> My preference is that such tests are added to existing test files, for
> example where "!$acc parallel loop" is tested, instead of adding such
> mini files.

No, we want to have as little churn as possible in existing tests, the
general policy is to add new tests (not just for OpenACC/OpenMP, but for
all functionality).

Jakub


Re: [PATCH] Fix OpenACC vector_length parsing in fortran

2016-07-15 Thread Thomas Schwinge
Hi!

On Thu, 14 Jul 2016 19:05:52 -0700, Cesar Philippidis  
wrote:
> The fortran FE is currently scanning for the vector clause before
> vector_length. That's a problem match_oacc_clause_gwv matches 'vector'
> without looking for whatever follows it. The correction I made here was
> to scan for vector_length before vector.

Does that me we (erroneously) accept any random junk after "vector",
"vector_length", and likely other clauses?

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -1338,6 +1338,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>   continue;
> break;
>   case 'v':
> +   if ((mask & OMP_CLAUSE_VECTOR_LENGTH)
> +   && c->vector_length_expr == NULL
> +   && (gfc_match ("vector_length ( %e )", >vector_length_expr)
> +   == MATCH_YES))
> + continue;
> if ((mask & OMP_CLAUSE_VECTOR)
> && !c->vector
> && gfc_match ("vector") == MATCH_YES)
> @@ -1353,11 +1358,6 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>   needs_space = true;
> continue;
>   }
> -   if ((mask & OMP_CLAUSE_VECTOR_LENGTH)
> -   && c->vector_length_expr == NULL
> -   && (gfc_match ("vector_length ( %e )", >vector_length_expr)
> -   == MATCH_YES))
> - continue;
> break;

Unless there's a more generic problem here (per my comment above), this
at least needs a source code comment added why "vector_length" needs to
be handled before "vector".

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/vector_length.f90
> @@ -0,0 +1,11 @@
> +program t
> +  implicit none
> +  integer, parameter :: n = 100
> +  integer a(n), i
> +
> +  !$acc parallel loop num_gangs(100) num_workers(1) vector_length(32)
> +  do i = 1, n
> + a(i) = i
> +  enddo
> +  !$acc end parallel loop
> +end program t

My preference is that such tests are added to existing test files, for
example where "!$acc parallel loop" is tested, instead of adding such
mini files.


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [v3 PATCH] Fix the constraints for any's assignment operator template to properly reject assignment from a non-copyable lvalue.

2016-07-15 Thread Jonathan Wakely

On 14/07/16 20:27 +0300, Ville Voutilainen wrote:

   Fix the constraints for any's assignment operator template to properly
   reject assignment from a non-copyable lvalue.
   * include/std/any (operator=(_ValueType&&)): Constrain the decayed
   type for is_copy_constructible,
   * testsuite/20_util/any/requirements.cc: Add a test for
   non-copyable lvalues.


OK for trunk, thanks.



Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-07-15 Thread Richard Biener
On Tue, 12 Jul 2016, Richard Biener wrote:

> On Tue, 12 Jul 2016, Uros Bizjak wrote:
> 
> > On Tue, Jul 12, 2016 at 10:58 AM, Richard Biener  wrote:
> > > On Sun, 10 Jul 2016, Uros Bizjak wrote:
> > >
> > >> On Wed, Jul 6, 2016 at 3:18 PM, Richard Biener  wrote:
> > >>
> > >> >> > 2016-07-04  Richard Biener  
> > >> >> >
> > >> >> > PR rtl-optimization/68961
> > >> >> > * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and 
> > >> >> > CONCAT
> > >> >> > to simplify to a non-constant.
> > >> >> >
> > >> >> > * gcc.target/i386/pr68961.c: New testcase.
> > >> >>
> > >> >> Thanks, LGTM.
> > >> >
> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, it causes
> > >> >
> > >> > FAIL: gcc.target/i386/sse2-load-multi.c scan-assembler-times movup 2
> > >> >
> > >> > as the peephole created for that testcase no longer applies as fwprop
> > >> > does
> > >> >
> > >> > In insn 10, replacing
> > >> >  (vec_concat:V2DF (vec_select:DF (reg:V2DF 91)
> > >> > (parallel [
> > >> > (const_int 0 [0])
> > >> > ]))
> > >> > (mem:DF (reg/f:DI 95) [0  S8 A128]))
> > >> >  with (vec_concat:V2DF (reg:DF 93 [ MEM[(const double *) + 8B] ])
> > >> > (mem:DF (reg/f:DI 95) [0  S8 A128]))
> > >> > Changed insn 10
> > >> >
> > >> > resulting in
> > >> >
> > >> > movsd   a+8(%rip), %xmm0
> > >> > movhpd  a+16(%rip), %xmm0
> > >> >
> > >> > again rather than movupd.
> > >> >
> > >> > Uros, there is probably a missing peephole for the new form - can you
> > >> > fix this as a followup or should I hold on this patch for a bit longer?
> > >>
> > >> No, please proceed with the patch, I'll fix this fallout with a
> > >> followup patch in a couple of days.
> > >
> > > Applied as r238238.  Is the following x86 change ok then which
> > > adjusts the vectorizer vector construction cost to sth more sensible?
> > > I have adjusted the generic implementation in targhooks.c this way
> > > a few weeks ago already.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2016-07-12  Richard Biener  
> > >
> > > * targhooks.c (default_builtin_vectorization_cost): Adjust
> > > vec_construct cost.
> > > * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
> > 
> > Looks OK to me, but let's also give Intel chance to comment.
> 
> Btw, the motivation is that the cost of large initializers like for
> v16qi or v32qi is underestimated currently.  You end up with
> 15 or 31 vinsert calls (or similar with other ISAs) and you can't do
> better than elements - 1 operations.  It doesn't really matter
> for smaller vectors of course (seen for CPU v6 x264)

I've applied the patch now given no further comments from Intel.

Richard.


Re: [libstdc++] Add C++17clamp

2016-07-15 Thread Jonathan Wakely

On 14/07/16 19:50 -0400, Ed Smith-Rowland wrote:

Here is an implementation of P0025
An algorithm to "clamp" a value between a pair of boundary values.

Testing is almost finished - looks good so far.

OK if testing passes?


OK for trunk, thanks.


I didn't see a feature test in any of the SD-6 papers or P0025.


p0096r3 proposes __cpp_lib_clamp = 201603.



Re: [RFC][IPA-VRP] Early VRP Implementation

2016-07-15 Thread kugan

Hi Andrew,

On 15/07/16 17:28, Andrew Pinski wrote:

On Fri, Jul 15, 2016 at 12:08 AM, kugan
 wrote:

Hi Andrew,


Why separate out early VRP from tree-vrp?  Just a little curious.



It is based on the discussion in
https://gcc.gnu.org/ml/gcc/2016-01/msg00069.html.
In summary, conclusion (based on my understanding) was to implement a
simplified VRP algorithm that doesn't require ASSERT_EXPR insertion.


But I don't see why you are moving it from tree-vrp.c .  That was my
question, you pointing to that discussion does not say to split it
into a new file and expose these interfaces.



Are you saying that I should keep this part of tree-vrp.c. I am happy to 
do that if this is considered the best approach.


Thanks,
Kugan


Re: [RFC][IPA-VRP] Check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO in tree-inline

2016-07-15 Thread Richard Biener
On Fri, Jul 15, 2016 at 6:43 AM, kugan
 wrote:
>
> Hi,
>
>
>
> This patch adds check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO
> in remap_ssa_name in gcc/tree-inline.c. This is not related to IPA_VRP but
> was exposed by that.

Ok.  (before your IPA VRP nothing sets RANGE_INFO on SSA vars before
inlining is complete)

Thanks,
Richard.

>
>
> Thanks,
>
> Kugan
>
>
>
>
>
> gcc/ChangeLog:
>
>
>
> 2016-07-14  Kugan Vivekanandarajah  
>
>
>
> * tree-inline.c (remap_ssa_name): Check for POINTER_TYPE_P before
>
> accessing SSA_NAME_PTR_INFO.
>
>
>
>
>


Re: [RFC][IPA-VRP] Early VRP Implementation

2016-07-15 Thread Andrew Pinski
On Fri, Jul 15, 2016 at 12:08 AM, kugan
 wrote:
> Hi Andrew,
>
>> Why separate out early VRP from tree-vrp?  Just a little curious.
>
>
> It is based on the discussion in
> https://gcc.gnu.org/ml/gcc/2016-01/msg00069.html.
> In summary, conclusion (based on my understanding) was to implement a
> simplified VRP algorithm that doesn't require ASSERT_EXPR insertion.

But I don't see why you are moving it from tree-vrp.c .  That was my
question, you pointing to that discussion does not say to split it
into a new file and expose these interfaces.

Thanks,
Andrew Pinski

>
>
>> Also it seems like if you are going to do that, putting the following
>> functions in a class would be better:
>> +extern void vrp_initialize (void);
>> +extern void vrp_finalize (bool update, bool warn_array_bounds_p);
>> +extern void vrp_intersect_ranges (value_range *vr0, value_range *vr1);
>> +extern void vrp_meet (value_range *vr0, value_range *vr1);
>> +extern enum ssa_prop_result vrp_visit_stmt (gimple *stmt,
>> +edge *taken_edge_p,
>> +tree *output_p);
>> +extern enum ssa_prop_result vrp_visit_phi_node (gphi *phi);
>> +extern bool stmt_interesting_for_vrp (gimple *stmt);
>> +
>> +extern void extract_range_from_assert (value_range *vr_p, tree expr);
>> +extern bool update_value_range (const_tree var, value_range *vr);
>> +extern value_range *get_value_range (const_tree var);
>> +extern void set_value_range (value_range *vr, enum value_range_type t,
>> + tree min, tree max, bitmap equiv);
>> +extern void change_value_range (const_tree var, value_range *new_vr);
>> +
>> +extern void dump_value_range (FILE *, value_range *);
>>
>> That is vrp_initialize becomes the constructor and vrp_finalize
>> becomes the deconstructor.
>> With both jump_thread and warn_array_bounds_p being variables you set
>> while running your pass.
>
>
> I will give this a try.
>
> Thanks,
> Kugan


Re: [PATCH] Amend dump expectation in slsr-8.c (PR, tree-optimization/71490)

2016-07-15 Thread Richard Biener
On Thu, Jul 14, 2016 at 6:10 PM, Martin Liška  wrote:
> On 07/14/2016 01:21 PM, Richard Biener wrote:
>> On Thu, Jul 14, 2016 at 1:06 PM, Martin Liška  wrote:
>>> On 07/13/2016 07:21 PM, Jeff Law wrote:
 Isn't that a code quality regression?  So instead shouldn't we be keeping 
 the same expectation, but xfailing the test?

 jeff
>>>
>>> Hello.
>>>
>>> Disabling a pass before slsr makes the test to catch both opportunities.
>>> Is the patch fine?
>>
>> No, this is still a code quality regression.  What happens is that for
>> some reason we fail to sink for GCC 6.
>
> So should I just mark the test-case as a xfail?

Leave it FAIL and open a bug.  We need to fix SLSR to handle the situation.

You can try going back to the point where the testcase was added and look at the
IL that it was supposed to test, on the GCC 6 branch we sink into one
arm but not
the other, on trunk we sink into both.  Iff the original IL was
without any sinking
then adding a testcase variant with sinking turned off might be good as well.

I'll also note that if we'd do these kind of tests as unit-tests we'd
never notice
that in real-world the testcase would have started failing due to
previous passes
messing up the IL.

Richard.

> M.
>
>>
>> Richard.
>>
>>> Thanks,
>>> Martin
>


Re: [RFC, v2] Test coverage for --param boundary values

2016-07-15 Thread Thomas Schwinge
Hi!

On Fri, 8 Jul 2016 14:47:46 +0200, Martin Liška  wrote:
> From f84ce7be4a998089541fb4512e19f54a4ec25cf6 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 8 Jul 2016 10:59:24 +0200
> Subject: [PATCH] Add tests that test boundary values of params

This became r238249.  Yay for pushing testing boundaries; next is some
fuzzy testing for GCC?  ;-D

> gcc/ChangeLog:
> 
> 2016-07-08  Martin Liska  
> 
>   * Makefile.in: Append rule for params-options.h.
>   * params-options.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-07-08  Martin Liska  
> 
>   * gcc.dg/params/blocksort-part.c: New test.
>   * gcc.dg/params/params.exp: New file.

:-/ (GNU-style ChangeLogs are just so useful... not.)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/params/blocksort-part.c
> @@ -0,0 +1,706 @@
> +
> +/*-*/
> +/*--- Block sorting machinery   ---*/
> +/*---   blocksort.c ---*/
> +/*-*/
> +
> +/* --
> +   This file is part of bzip2/libbzip2, a program and library for
> +   lossless, block-sorting data compression.
> +
> +   bzip2/libbzip2 version 1.0.6 of 6 September 2010
> +   Copyright (C) 1996-2010 Julian Seward 
> +
> +   Please read the WARNING, DISCLAIMER and PATENTS sections in the 
> +   README file.
> +
> +   This program is released under the terms of the license contained
> +   in the file LICENSE.
> +[...]

Are there any issues with including that file without the
accompanying/referenced README and LICENSE files?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/params/params.exp
> @@ -0,0 +1,64 @@
> +[...]
> +# GCC testsuite that uses the `dg.exp' driver.
> +
> +# Load support procs.
> +load_lib gcc-dg.exp
> +
> +# Initialize `dg'.
> +dg-init
> +
> +proc param_run_test { param_name param_value } {
> +global srcdir
> +global subdir
> +
> +dg-runtest $srcdir/$subdir/blocksort-part.c "" "-O3 --param 
> $param_name=$param_value"
> +}
> +
> +set fd [open "$objdir/../../params.options" r]

(I do understand what you're doing there, but) is it kosher to refer to a
file from GCC's build tree inside the test suite?  I thought the idea was
that you could run the testsuite without the build tree being available
-- as much that's possible.  As far as I remember there are a few
exception to this rule already, so maybe adding one more is not much of a
problem.  ;-) (I do know that for our internal testing we have a list of
files that need to be preserved from the GCC build directory for later
testing without the build directory being available; so I'll add this
file to the list; Joseph CCed in case he has some additional comments
after returning from his vacations.)


shows one instance of the problem, that I could quickly find:

[...]
ERROR: tcl error sourcing 
/home/jenkins/workspace/BuildToolchainThunder_elf_test_upstream/toolchain/src/gcc/testsuite/gcc.dg/params/params.exp.
ERROR: couldn't open 
"/home/jenkins/workspace/BuildToolchainThunder_elf_test_upstream/toolchain/testresults/aarch64-thunderx-elf/../../params.options":
 no such file or directory
[...]

> +set text [read $fd]
> +close $fd
> +
> +# Main loop.
> +foreach params [split $text "\n"] {
> +set parts [split $params "="]
> +set name [string trim [lindex $parts 0] '"']
> +set values [split [lindex $parts 1] ","]
> +if { [llength $values] == 3 } {
> + set default [lindex $values 0]
> + set min [lindex $values 1]
> + set max [lindex $values 2]
> + set int_max "INT_MAX"
> +
> + if { $min != -1 } {
> + param_run_test $name $min
> + }
> + if { $max != $min && $max > 0 && $max != $int_max } {
> + param_run_test $name $max
> + }
> +}
> +if { [llength $values] == 5 } {
> + foreach v $values {
> + param_run_test $name $v
> + }
> +}
> +}

This produces ambiguous test result lines:

Running 
/scratch/tschwing/nvidiak20-2/gcc/trunk-light/source-gcc/gcc/testsuite/gcc.dg/params/params.exp
 ...
PASS: gcc.dg/params/blocksort-part.c (test for excess errors)
PASS: gcc.dg/params/blocksort-part.c (test for excess errors)
PASS: gcc.dg/params/blocksort-part.c (test for excess errors)
PASS: gcc.dg/params/blocksort-part.c (test for excess errors)
[many more]

OK to fix this as follows:

PASS: gcc.dg/params/blocksort-part.c -O3 --param 
predictable-branch-outcome=0 (test for excess errors)
PASS: gcc.dg/params/blocksort-part.c -O3 --param 
predictable-branch-outcome=50 (test for excess errors)
PASS: gcc.dg/params/blocksort-part.c -O3 --param inline-min-speedup=0 (test 
for 

Re: [RFC][IPA-VRP] Early VRP Implementation

2016-07-15 Thread kugan

Hi Andrew,


Why separate out early VRP from tree-vrp?  Just a little curious.


It is based on the discussion in 
https://gcc.gnu.org/ml/gcc/2016-01/msg00069.html.
In summary, conclusion (based on my understanding) was to implement a 
simplified VRP algorithm that doesn't require ASSERT_EXPR insertion.




Also it seems like if you are going to do that, putting the following
functions in a class would be better:
+extern void vrp_initialize (void);
+extern void vrp_finalize (bool update, bool warn_array_bounds_p);
+extern void vrp_intersect_ranges (value_range *vr0, value_range *vr1);
+extern void vrp_meet (value_range *vr0, value_range *vr1);
+extern enum ssa_prop_result vrp_visit_stmt (gimple *stmt,
+edge *taken_edge_p,
+tree *output_p);
+extern enum ssa_prop_result vrp_visit_phi_node (gphi *phi);
+extern bool stmt_interesting_for_vrp (gimple *stmt);
+
+extern void extract_range_from_assert (value_range *vr_p, tree expr);
+extern bool update_value_range (const_tree var, value_range *vr);
+extern value_range *get_value_range (const_tree var);
+extern void set_value_range (value_range *vr, enum value_range_type t,
+ tree min, tree max, bitmap equiv);
+extern void change_value_range (const_tree var, value_range *new_vr);
+
+extern void dump_value_range (FILE *, value_range *);

That is vrp_initialize becomes the constructor and vrp_finalize
becomes the deconstructor.
With both jump_thread and warn_array_bounds_p being variables you set
while running your pass.


I will give this a try.

Thanks,
Kugan


Re: [RFC][IPA-VRP] Check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO in tree-inline

2016-07-15 Thread Jakub Jelinek
On Thu, Jul 14, 2016 at 09:47:03PM -0700, Andrew Pinski wrote:
> On Thu, Jul 14, 2016 at 9:43 PM, kugan
>  wrote:
> > This patch adds check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO
> > in remap_ssa_name in gcc/tree-inline.c. This is not related to IPA_VRP but
> > was exposed by that.
> 
> SSA_NAME_PTR_INFO should be NULL for non POINTER_TYPE ssa names?  Why
> is it not null in your case?

??
  /* Value range information.  */
  union ssa_name_info_type {
/* Pointer attributes used for alias analysis.  */
struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
/* Value range attributes used for zero/sign extension elimination.  */
struct GTY ((tag ("1"))) range_info_def *range_info;
  } GTY ((desc ("%1.typed.type ?" \
"!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;

SSA_NAME_PTR_INFO and SSA_NAME_RANGE_INFO share space, so all uses of
SSA_NAME_PTR_INFO should be only used for pointer types and
SSA_NAME_RANGE_INFO only for integral types.

Jakub


Re: [RFC][IPA-VRP] Check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO in tree-inline

2016-07-15 Thread kugan



Hi Andrew,


This patch adds check for POINTER_TYPE_P before accessing SSA_NAME_PTR_INFO
in remap_ssa_name in gcc/tree-inline.c. This is not related to IPA_VRP but
was exposed by that.


SSA_NAME_PTR_INFO should be NULL for non POINTER_TYPE ssa names?  Why
is it not null in your case?
In both cases there is a check for SSA_NAME_PTR_INFO being NULL before using it.


As per tree.h:

#define SSA_NAME_PTR_INFO(N) \
   SSA_NAME_CHECK (N)->ssa_name.info.ptr_info

#define SSA_NAME_RANGE_INFO(N) \
SSA_NAME_CHECK (N)->ssa_name.info.range_info

ptr_info and range_info are unions (see below). Unless I am missing 
something, now that we set range_info, we should check it is 
POINTER_TYPE_P before accessing.


  /* Value range information.  */
  union ssa_name_info_type {
/* Pointer attributes used for alias analysis.  */
struct GTY ((tag ("0"))) ptr_info_def *ptr_info;
/* Value range attributes used for zero/sign extension elimination.  */
struct GTY ((tag ("1"))) range_info_def *range_info;
  } GTY ((desc ("%1.typed.type ?" \
"!POINTER_TYPE_P (TREE_TYPE ((tree)&%1)) : 2"))) info;


Thanks,
Kugan