Re: [PATCH 0/9] start converting POINTER_SIZE to a hook

2015-07-29 Thread Trevor Saunders
On Wed, Jul 29, 2015 at 09:11:21AM +0100, Richard Sandiford wrote:
> Trevor Saunders  writes:
> > On Tue, Jul 28, 2015 at 09:24:17PM +0100, Richard Sandiford wrote:
> >> Trevor Saunders  writes:
> >> > On Mon, Jul 27, 2015 at 09:05:08PM +0100, Richard Sandiford wrote:
> >> >> Alternatively we could have a new target_globals structure that is
> >> >> initialised with the result of calling the hook.  If we do that though,
> >> >> it might make sense to consolidate the hooks rather than have one for
> >> >> every value.  E.g. having one function for UNITS_PER_WORD, one for
> >> >> POINTER_SIZE, one for Pmode, etc., would lead to some very verbose
> >> >> target code.
> >> >
> >> > so something like
> >> >
> >> > struct target_types
> >> > {
> >> >   unsigned long pointer_size;
> >> >   ...
> >> > };
> >> >
> >> > const target_types &targetm.get_type_data ()
> >> >
> >> > ? that seems pretty reasonable, and I wouldn't expect too many ordering
> >> > issues, but who knows.  Its too bad nobody has taken on the big job of
> >> > turning targetm into a class so we can hope for some devirt help from
> >> > the compiler.
> >> 
> >> I was thinking more:
> >> 
> >>   void targetm.get_type_data (target_types *);
> >> 
> >> The caller could then initialise or post-process the defaults.  The
> >> target_types would eventually end up in some target_globals structure.
> >
> > but wouldn't that mean the hook would need to initialize all the fields
> > every time the hook was called?
> 
> Yeah, but the idea was that the hook would only be called once per
> target initialisation and the result would be stored in a target_globals
> structure.  Then places that use POINTER_SIZE would instead use the
> cached target_globals structure rather than targetm.

ok, personally I've always found the "have global state and update it
appropriately" approach a little distasteful, but I have to admit it
makes getting values faster than anything else, and otherwise works.  I
guess if we ever care about threaded compilation for the jit or whatever
that's no longer true, but that's not a bridge we need to cross now.

> For SWITCHABLE_TARGETs, the hook would be called only once for each
> subtarget used by the TU.  For other targets it would be called
> once for each change in subtarget (which is already very expensive without
> SWITCHABLE_TARGET -- targets that want it to be fast should move to
> SWITCHABLE_TARGET).
> 
> The disadvantage of:
> 
>const target_types &targetm.get_type_data ()
> 
> is that it pushes the caching logic into targetm rather than sharing
> it between all ports.  This could be a particular problem for targets
> like MIPS that support a lot of variations.

true, though I was expecting that for most targets you'd just have a couple
of static const structs and choose which one to return based on the
subtarget, which wouldn't be too bad.
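Something along these lines, purely as a sketch (the names, the fields and
the 64-bit test are illustrative, not existing GCC interfaces):

struct target_types
{
  unsigned long pointer_size;
  unsigned long units_per_word;
  /* ... one field per value we currently get from a target macro ...  */
};

static const struct target_types ilp32_types = { 32, 4 };
static const struct target_types lp64_types  = { 64, 8 };

/* Hypothetical hook implementation in a port; the caching into a
   target_globals structure would live in shared code.  */
static const struct target_types &
example_get_type_data (void)
{
  return subtarget_is_64bit () ? lp64_types : ilp32_types;
}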

Trev

> 
> Thanks,
> Richard
> 


C++ PATCH for c++/67021 (dependent alias template specialization)

2015-07-29 Thread Jason Merrill
In this testcase, having previously determined that "int" is not 
dependent was confusing us into thinking that ValueType was not 
dependent.  But under DR 1558, it is.
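A minimal illustration of the situation (hypothetical code, distinct from the
committed testcase below):

template <typename T>
using ValueType = int;   // the underlying type is plain 'int', never dependent

template <typename T>
void f ()
{
  ValueType<T> v = 0;    // but under DR 1558 ValueType<T> is still dependent,
  (void) v;              // since T must be substituted even though it is unused
}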


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4f4b8497f404ab8f8d641878ee03ee91e6dcf6fb
Author: Jason Merrill 
Date:   Thu Jul 30 00:45:44 2015 -0400

	DR 1558
	PR c++/67021
	* pt.c (tsubst_decl) [TYPE_DECL]: Clear TYPE_DEPENDENT_P_VALID.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e92fefb..6bf3d23 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11570,6 +11570,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	  {
 	DECL_ORIGINAL_TYPE (r) = NULL_TREE;
 	set_underlying_type (r);
+	if (TYPE_DECL_ALIAS_P (r) && type != error_mark_node)
+	  /* An alias template specialization can be dependent
+		 even if its underlying type is not.  */
+	  TYPE_DEPENDENT_P_VALID (TREE_TYPE (r)) = false;
 	  }
 
 	layout_decl (r, 0);
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-52.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-52.C
new file mode 100644
index 000..2734075
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-52.C
@@ -0,0 +1,24 @@
+// PR c++/67021
+// { dg-do compile { target c++11 } }
+
+template struct Dummy;
+template<> struct Dummy {};
+
+template 
+struct all_same { static constexpr bool value = true; };
+template 
+struct all_same : all_same {};
+template 
+struct all_same { static constexpr bool value = false; };
+
+template 
+using ValueType = int;
+
+template 
+constexpr bool A(I i) {
+  return all_same, ValueType>::value;
+}
+
+int main() {
+  static_assert(A(42), "");
+}


RE: [PATCH v3][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-07-29 Thread Kumar, Venkataramanan
Hi Benedikt, 

I ran SPEC2006 FP with your previous patch (v2) for cortex-a57.  Gromacs
gains ~5% with -mcpu=cortex-a57 -Ofast and ~11% with -mcpu=cortex-a57
-Ofast -mlow-precision-recip-sqrt.
Other FP benchmarks were within noise.

However, I will leave it to the AArch64 maintainers to decide on the default
tuning.

Regards,
Venkat.
 
> -Original Message-
> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
> Sent: Wednesday, July 29, 2015 11:18 PM
> To: gcc-patches@gcc.gnu.org
> Cc: philipp.toms...@theobroma-systems.com; Kumar, Venkataramanan;
> pins...@gmail.com; e.mene...@samsung.com; Benedikt Huber
> Subject: [PATCH v3][aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
> 
> This third revision of the patch:
>  * makes -mrecip default value specified per core.
>  * disables rsqrt when -Os is given.
> 
> Ok for check in.
> 
> Benedikt Huber (1):
>   2015-07-29  Benedikt Huber  
> Philipp Tomsich  
> 
>  gcc/ChangeLog  |  19 
>  gcc/config/aarch64/aarch64-builtins.c  | 103 
>  gcc/config/aarch64/aarch64-opts.h  |   7 ++
>  gcc/config/aarch64/aarch64-protos.h|   3 +
>  gcc/config/aarch64/aarch64-simd.md |  27 ++
>  gcc/config/aarch64/aarch64.c   |  81 ++--
>  gcc/config/aarch64/aarch64.md  |   3 +
>  gcc/config/aarch64/aarch64.opt |   8 ++
>  gcc/doc/invoke.texi|  19 
>  gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
>  gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107
> +
>  11 files changed, 434 insertions(+), 6 deletions(-)  create mode 100644
> gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> 
> --
> 1.9.1



Re: [Patch] Small refactor on _State<>

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 9:21 PM, Tim Shen  wrote:
> On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely  wrote:
>> Yes, that makes sense. See the code in  for how
>> to set the alignment of the buffer appropriately. You can use the size
>> and alignment of std::function even though it will
>> sometimes be a different std::function specialization.
>
> Done.

Oops, fixed one typo: s/_Matcher/_Matcher/.


-- 
Regards,
Tim Shen
commit d9fb4e3ec5eb9fcaf08f757c2a9ddcf57289684f
Author: Tim Shen 
Date:   Wed Jul 29 21:08:43 2015 -0700

* include/bits/regex_automaton.h (_State_base, _State<>):
Remove _TraitsT dependency from _State<>; Make matcher member
into the union to reduce struct size.
* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
_State_base<>::_M_dot, _StateSeq<>::_M_clone):
Adjust to fit the interface. Factor out common parts in
_M_clone as _State<>::_M_has_alt.
* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
Only pass state id instead of the whole state.
* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
_Executer<>::_M_lookahead): Adjust to fit the interface.
* include/std/regex: Include 

diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..e153d42 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
 _Opcode  _M_opcode;   // type of outgoing transition
+
+  public:
 _StateIdT_M_next; // outgoing transition
 union // Since they are mutually exclusive.
 {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// quantifiers (ungreedy if set true)
bool   _M_neg;
   };
+  // For _S_opcode_match
+  __gnu_cxx::__aligned_membuf<_Matcher> _M_matcher_storage;
 };
 
+  protected:
 explicit _State_base(_Opcode __opcode)
 : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
 { }
 
-  protected:
-~_State_base() = default;
-
   public:
+bool
+_M_has_alt()
+{
+  return _M_opcode == _S_opcode_alternative
+   || _M_opcode == _S_opcode_repeat
+   || _M_opcode == _S_opcode_subexpr_lookahead;
+}
+
 #ifdef _GLIBCXX_DEBUG
 std::ostream&
 _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template
+  template
 struct _State : _State_base
 {
-  typedef _Matcher _MatcherT;
+  typedef _Matcher<_Char_type> _MatcherT;
+  static_assert(sizeof(_MatcherT) == sizeof(_Matcher),
+   "The aussmption std::function has "
+   "the same size as std::function is violated");
+  static_assert(alignof(_MatcherT) == alignof(_Matcher),
+   "The aussmption std::function has "
+   "the same alignment as std::function is violated");
+
+  explicit
+  _State(_Opcode __opcode) : _State_base(__opcode)
+  {
+   if (_M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr()) _MatcherT();
+  }
+
+  _State(const _State& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(__rhs._M_get_matcher());
+  }
+
+  _State(_State&& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(std::move(__rhs._M_get_matcher()));
+  }
+
+  _State&
+  operator=(const _State&) = delete;
+
+  ~_State()
+  {
+   if (_M_opcode() == _S_opcode_match)
+ _M_get_matcher().~_MatcherT();
+  }
+
+  // Since correct ctor and dtor rely on _M_opcode, it's better not to
+  // change it over time.
+  _Opcode
+  _M_opcode() const
+  { return _State_base::_M_opcode; }
+
+  bool
+  _M_matches(_Char_type __char) const
+  { return _M_get_matcher()(__char); }
 
-  _MatcherT  _M_matches;// for _S_opcode_match
+  _MatcherT&
+  _M_get_matcher()
+  { return 
*reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-  explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+  const _MatcherT&
+  _M_get_matcher() const
+  { return *reinterpret_cast(this->_M_matcher_storage._M_addr()); }
 };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct _NFA
-: _NFA_base, std::vector<_State<_TraitsT>>
+: _NFA_base, std::vector<_State>
 {
-  typedef _State<_TraitsT> _StateT;
-  typedef _Matcher   _MatcherT;
+  typedef typename _TraitsT::char_type _Char_type;
+  typedef _State<_Char_type>   _StateT;
+ 

Re: [Patch] Small refactor on _State<>

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely  wrote:
> Yes, that makes sense. See the code in  for how
> to set the alignment of the buffer appropriately. You can use the size
> and alignment of std::function even though it will
> sometimes be a different std::function specialization.

Done.

Also change _Executor::_M_lookahead(_State<>) to
_Executor::_M_lookahead(_StateIdT __next_state).
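For reference, a stand-alone sketch of the aligned-buffer idea described
above (illustrative names only; the actual patch below uses
__gnu_cxx::__aligned_membuf):

#include <functional>
#include <new>

struct state
{
  // Raw storage sized and aligned for one std::function specialization; the
  // patch's static_asserts check the assumption that every specialization we
  // place here has the same size and alignment.
  alignas (std::function<bool (char)>)
    unsigned char _M_buf[sizeof (std::function<bool (char)>)];

  template <typename _Char>
    void
    _M_construct_matcher ()
    { ::new (static_cast<void *> (_M_buf)) std::function<bool (_Char)> (); }

  template <typename _Char>
    std::function<bool (_Char)> &
    _M_matcher ()
    { return *reinterpret_cast<std::function<bool (_Char)> *> (_M_buf); }
};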


-- 
Regards,
Tim Shen
commit 52c70e70bdbef15c787f81e83722bfc119543ff0
Author: Tim Shen 
Date:   Wed Jul 29 21:08:43 2015 -0700

* include/bits/regex_automaton.h (_State_base, _State<>):
Remove _TraitsT dependency from _State<>; Make matcher member
into the union to reduce struct size.
* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
_State_base<>::_M_dot, _StateSeq<>::_M_clone):
Adjust to fit the interface. Factor out common parts in
_M_clone as _State<>::_M_has_alt.
* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
Only pass state id instead of the whole state.
* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
_Executer<>::_M_lookahead): Adjust to fit the interface.
* include/std/regex: Include 

diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..0ef3896 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
 _Opcode  _M_opcode;   // type of outgoing transition
+
+  public:
 _StateIdT_M_next; // outgoing transition
 union // Since they are mutually exclusive.
 {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// quantifiers (ungreedy if set true)
bool   _M_neg;
   };
+  // For _S_opcode_match
+  __gnu_cxx::__aligned_membuf<_Matcher> _M_matcher_storage;
 };
 
+  protected:
 explicit _State_base(_Opcode __opcode)
 : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
 { }
 
-  protected:
-~_State_base() = default;
-
   public:
+bool
+_M_has_alt()
+{
+  return _M_opcode == _S_opcode_alternative
+   || _M_opcode == _S_opcode_repeat
+   || _M_opcode == _S_opcode_subexpr_lookahead;
+}
+
 #ifdef _GLIBCXX_DEBUG
 std::ostream&
 _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template
+  template
 struct _State : _State_base
 {
-  typedef _Matcher _MatcherT;
+  typedef _Matcher<_Char_type> _MatcherT;
+  static_assert(sizeof(_MatcherT) == sizeof(_Matcher),
+   "The aussmption std::function has "
+   "the same size as std::function is violated");
+  static_assert(alignof(_MatcherT) == alignof(_Matcher),
+   "The aussmption std::function has "
+   "the same alignment as std::function is violated");
+
+  explicit
+  _State(_Opcode __opcode) : _State_base(__opcode)
+  {
+   if (_M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr()) _MatcherT();
+  }
+
+  _State(const _State& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(__rhs._M_get_matcher());
+  }
+
+  _State(_State&& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(std::move(__rhs._M_get_matcher()));
+  }
+
+  _State&
+  operator=(const _State&) = delete;
+
+  ~_State()
+  {
+   if (_M_opcode() == _S_opcode_match)
+ _M_get_matcher().~_MatcherT();
+  }
+
+  // Since correct ctor and dtor rely on _M_opcode, it's better not to
+  // change it over time.
+  _Opcode
+  _M_opcode() const
+  { return _State_base::_M_opcode; }
+
+  bool
+  _M_matches(_Char_type __char) const
+  { return _M_get_matcher()(__char); }
 
-  _MatcherT  _M_matches;// for _S_opcode_match
+  _MatcherT&
+  _M_get_matcher()
+  { return 
*reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-  explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+  const _MatcherT&
+  _M_get_matcher() const
+  { return *reinterpret_cast(this->_M_matcher_storage._M_addr()); }
 };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct _NFA
-: _NFA_base, std::vector<_State<_TraitsT>>
+: _NFA_base, std::vector<_State>
 {
-  typedef _State<_TraitsT> _StateT;
-  typedef _Matcher   _MatcherT;
+  typedef typename _TraitsT::char_type _Char_type;
+  typedef _State<_Char_type>   _StateT;
+  

[PATCH] rs6000: Fix PR67045

2015-07-29 Thread Segher Boessenkool
Paper bag time.  Committing as obvious fix.  Bootstrapped and regression
checked on powerpc64-linux and powerpc64le-linux; also bootstrapped the
latter with --enable-checking=release and -O3 (the PR67045 case).  Will
do an --enable-checking=yes,rtl as well.


Segher


2015-07-29  Segher Boessenkool  

PR target/66217
PR target/67045
* config/rs6000/rs6000.md (and3): Put a CONST_INT_P check
around those cases that need one.

---
 gcc/config/rs6000/rs6000.md | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index f7fa399..527ad98 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2898,26 +2898,29 @@ (define_expand "and3"
   DONE;
 }
 
-  if (rs6000_is_valid_and_mask (operands[2], mode))
+  if (CONST_INT_P (operands[2]))
 {
-  emit_insn (gen_and3_mask (operands[0], operands[1], operands[2]));
-  DONE;
-}
+  if (rs6000_is_valid_and_mask (operands[2], mode))
+   {
+ emit_insn (gen_and3_mask (operands[0], operands[1], 
operands[2]));
+ DONE;
+   }
 
-  if (logical_const_operand (operands[2], mode)
-  && rs6000_gen_cell_microcode)
-{
-  emit_insn (gen_and3_imm (operands[0], operands[1], operands[2]));
-  DONE;
-}
+  if (logical_const_operand (operands[2], mode)
+ && rs6000_gen_cell_microcode)
+   {
+ emit_insn (gen_and3_imm (operands[0], operands[1], 
operands[2]));
+ DONE;
+   }
 
-  if (rs6000_is_valid_2insn_and (operands[2], mode))
-{
-  rs6000_emit_2insn_and (mode, operands, true, 0);
-  DONE;
-}
+  if (rs6000_is_valid_2insn_and (operands[2], mode))
+   {
+ rs6000_emit_2insn_and (mode, operands, true, 0);
+ DONE;
+   }
 
-  operands[2] = force_reg (mode, operands[2]);
+  operands[2] = force_reg (mode, operands[2]);
+}
 })
 
 
-- 
1.8.1.4



Re: [PATCH 0/9] start converting POINTER_SIZE to a hook

2015-07-29 Thread Segher Boessenkool
On Wed, Jul 29, 2015 at 11:16:40AM +0100, Richard Earnshaw wrote:
> I'm getting a bit worried about the potential performance impact from
> all these indirect function call hooks.  This is a good example of when
> it's probably somewhat unnecessary.  I doubt that the compiler could
> function correctly if this ever changed in the middle of a function.

It is also very ugly and much harder to read: it is longer, with more
useless punctuation, and there is nothing that makes clear it is a
constant.


Segher


[gomp4.1] fold ordered depend(sink) clauses

2015-07-29 Thread Aldy Hernandez
The attached patch canonicalizes sink dependence clauses into one folded 
clause if possible (as discussed in the paper "Expressing DOACROSS Loop 
Dependences in OpenMP").


The basic algorithm is to create a sink vector whose first element is 
the GCD of all the first elements, and whose remaining elements are the 
minimum of the subsequent columns.  Further explanations are included in 
the code.
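As a rough model of the folding itself (illustrative only -- not the
gimplify.c implementation, and ignoring the direction/sign handling the
patch adds):

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

// Fold several depend(sink:) offset vectors into one canonical vector:
// GCD of the first elements, column-wise minimum of the remaining ones.
std::vector<long>
fold_sink_vectors (const std::vector<std::vector<long>> &sinks)
{
  std::vector<long> folded = sinks.front ();
  for (std::size_t v = 1; v < sinks.size (); ++v)
    {
      folded[0] = std::gcd (std::abs (folded[0]), std::abs (sinks[v][0]));
      for (std::size_t i = 1; i < folded.size (); ++i)
        folded[i] = std::min (folded[i], sinks[v][i]);
    }
  return folded;
}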


I have also added further warnings/errors for incompatible and 
nonsensical sink offsets.


I suggest you start with the tests and first see if you agree with the 
folded cases.


How does this look?
commit 3f0a9dc6ceca690a77c74549a42040c52bc02fdc
Author: Aldy Hernandez 
Date:   Wed Jul 29 13:39:06 2015 -0700

* wide-int.h (wi::gcd): New.
* gimplify.c (struct gimplify_omp_ctx): Rename iter_vars to
loop_iter_var.
Add loop_dir and loop_const_step fields.
(delete_omp_context): Free loop_dir and loop_const_step.
(gimplify_omp_for): Set loop_dir and loop_const_step.
(gimplify_expr): Move code handling OMP_ORDERED into...
(gimplify_omp_ordered): ...here.  New.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2331001..5262233 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -153,7 +153,18 @@ struct gimplify_omp_ctx
   splay_tree variables;
   hash_set *privatized_types;
   /* Iteration variables in an OMP_FOR.  */
-  vec iter_vars;
+  vec loop_iter_var;
+
+  /* Direction of loop in an OMP_FOR.  */
+  enum dir {
+DIR_UNKNOWN,
+DIR_FORWARD,
+DIR_BACKWARD
+  };
+  vec loop_dir;
+
+  /* Absolute value of step.  NULL_TREE if non-constant.  */
+  vec loop_const_step;
   location_t location;
   enum omp_clause_default_kind default_kind;
   enum omp_region_type region_type;
@@ -392,7 +403,9 @@ delete_omp_context (struct gimplify_omp_ctx *c)
 {
   splay_tree_delete (c->variables);
   delete c->privatized_types;
-  c->iter_vars.release ();
+  c->loop_iter_var.release ();
+  c->loop_dir.release ();
+  c->loop_const_step.release ();
   XDELETE (c);
 }
 
@@ -7490,8 +7503,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
  == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
  == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
-  gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH
-  (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH
+  (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH
+ (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH
+(OMP_FOR_INIT (for_stmt)));
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
 {
   t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
@@ -7501,10 +7518,10 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
  || POINTER_TYPE_P (TREE_TYPE (decl)));
   if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))
-   gimplify_omp_ctxp->iter_vars.quick_push
+   gimplify_omp_ctxp->loop_iter_var.quick_push
  (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i));
   else
-   gimplify_omp_ctxp->iter_vars.quick_push (decl);
+   gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
 
   /* Make sure the iteration variable is private.  */
   tree c = NULL_TREE;
@@ -7670,6 +7687,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   t = TREE_VEC_ELT (OMP_FOR_COND (for_stmt), i);
   gcc_assert (COMPARISON_CLASS_P (t));
   gcc_assert (TREE_OPERAND (t, 0) == decl);
+  switch (TREE_CODE (t))
+   {
+   case LT_EXPR:
+   case LE_EXPR:
+ gimplify_omp_ctxp->loop_dir.quick_push
+   (gimplify_omp_ctx::DIR_FORWARD);
+ break;
+   case GT_EXPR:
+   case GE_EXPR:
+ gimplify_omp_ctxp->loop_dir.quick_push
+   (gimplify_omp_ctx::DIR_BACKWARD);
+ break;
+   default:
+ gimplify_omp_ctxp->loop_dir.quick_push
+   (gimplify_omp_ctx::DIR_UNKNOWN);
+ break;
+   }
 
   tret = gimplify_expr (&TREE_OPERAND (t, 1), &for_pre_body, NULL,
is_gimple_val, fb_rvalue);
@@ -7687,6 +7721,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   called to massage things appropriately.  */
gcc_assert (!POINTER_TYPE_P (TREE_TYPE (decl)));
 
+   /* Pointer increments have been adjusted by now, so p++
+  should be p += SIZE, and handled by the MODIFY_EXPR
+  case below.  */
+   gimplify_omp_ctxp->loop_const_step.quick_push (integer_one_node);
+
if (orig_for_stmt != for_stmt)
  break;
t = build_int_cst (TREE_TYPE (decl), 1);
@@ -7703,6 +7742,7 @@ g

[gomp4] openacc default handling

2015-07-29 Thread Nathan Sidwell
I've committed this to gomp4.  When I broke out the oacc_default_clause
function from omp_notice_variable, I was puzzled by the ordering of the lookups,
but couldn't quite figure out what was wrong.  Further investigation and a
reading of the standard showed that for OpenACC we look in outer lexical scopes
to find a data clause for the object before applying the default behaviour.
This patch reorganizes things to make that the case.


nathan
2015-07-29  Nathan Sidwell  

	gcc/
	* gimplify.c (oacc_default_clause): Outer scope searching moved to
	omp_notice_variable.
	(omp_notice_variable): For OpenACC search enclosing scopes before
	applying default.

	gcc/testsuite/
	* c-c++-common/goacc/default-2.c: New.

Index: gcc/testsuite/c-c++-common/goacc/default-2.c
===
--- gcc/testsuite/c-c++-common/goacc/default-2.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/default-2.c	(revision 0)
@@ -0,0 +1,21 @@
+void Foo ()
+{
+  int ary[10];
+  
+#pragma acc parallel default(none) /* { dg-error "enclosing" } */
+  {
+ary[0] = 5; /* { dg-error "not specified" }  */
+  }
+}
+
+void Baz ()
+{
+  int ary[10];
+#pragma acc data copy (ary)
+  {
+#pragma acc parallel default(none)
+{
+  ary[0] = 5;
+}
+  }
+}
Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 226334)
+++ gcc/gimplify.c	(working copy)
@@ -5934,30 +5934,10 @@ oacc_default_clause (struct gimplify_omp
 
 case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
   {
-	if (struct gimplify_omp_ctx *octx = ctx->outer_context)
-	  {
-	omp_notice_variable (octx, decl, in_code);
-	
-	for (; octx; octx = octx->outer_context)
-	  {
-		if (octx->region_type & ORT_HOST_DATA)
-		  continue;
-		if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
-		  break;
-	  splay_tree_node n2
-		= splay_tree_lookup (octx->variables, (splay_tree_key) decl);
-	  if (n2)
-		{
-		  flags |= GOVD_MAP;
-		  goto found_outer;
-		}
-	  }
-	  }
-
 	if (is_global_var (decl) && device_resident_p (decl))
 	  flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
-	/* Scalars under kernels are default 'copy'.  */
 	else if (ctx->acc_region_kind == ARK_KERNELS)
+	  /* Scalars under kernels are default 'copy'.  */
 	  flags |= GOVD_FORCE_MAP | GOVD_MAP;
 	else if (ctx->acc_region_kind == ARK_PARALLEL)
 	  {
@@ -5968,16 +5948,14 @@ oacc_default_clause (struct gimplify_omp
 	  type = TREE_TYPE (type);
 	
 	if (AGGREGATE_TYPE_P (type))
-	  /* Aggregates default to 'copy'.  This should really
-		 include GOVD_FORCE_MAP.  */
+	  /* Aggregates default to 'present_or_copy'.  */
 	  flags |= GOVD_MAP;
 	else
-	  /* Scalars default tp 'firstprivate'.  */
+	  /* Scalars default to 'firstprivate'.  */
 	  flags |= GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY | GOVD_MAP;
 	  }
 	else
 	  gcc_unreachable ();
-  found_outer:;
   }
   break;
 }
@@ -6020,21 +5998,49 @@ omp_notice_variable (struct gimplify_omp
   if (ctx->region_type == ORT_TARGET)
 {
   ret = lang_hooks.decls.omp_disregard_value_expr (decl, true);
-  if (n == NULL)
+  bool is_oacc = ctx->region_kind == ORK_OACC;
+
+  if (!n)
 	{
-	  bool is_oacc = ctx->region_kind == ORK_OACC;
+	  struct gimplify_omp_ctx *octx = ctx->outer_context;
 
-	  if (is_oacc)
-	flags = oacc_default_clause (ctx, decl, in_code, flags);
-	  else
-	flags |= GOVD_MAP;
+	  /*  OpenMP doesn't look in outer contexts to find an
+	  enclosing data clause.  */
+	  if (is_oacc && octx)
+	{
+	  omp_notice_variable (octx, decl, in_code);
+	  
+	  for (; octx; octx = octx->outer_context)
+		{
+		  if (octx->region_type & ORT_HOST_DATA)
+		continue;
+		  if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
+		break;
+		  splay_tree_node n2
+		= splay_tree_lookup (octx->variables,
+	 (splay_tree_key) decl);
+		  if (n2)
+		{
+		  flags |= GOVD_MAP;
+		  goto found_outer;
+		}
+		}
+	}
 
-	  if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (decl), is_oacc))
+	  if (!lang_hooks.types.omp_mappable_type
+	  (TREE_TYPE (decl), ctx->region_kind == ORK_OACC))
 	{
 	  error ("%qD referenced in target region does not have "
 		 "a mappable type", decl);
 	  flags |= GOVD_EXPLICIT;
 	}
+
+	  if (is_oacc)
+	flags = oacc_default_clause (ctx, decl, in_code, flags);
+	  else
+	flags |= GOVD_MAP;
+
+	found_outer:;
 	  omp_add_variable (ctx, decl, flags);
 	}
   else


lto wrapper verboseness

2015-07-29 Thread Nathan Sidwell

Jakub,
this patch augments the LTO wrapper to print out the arguments to spawned
commands when running in verbose mode.  I found this useful while debugging
recent development work.


ok for trunk?

nathan
2015-07-29  Nathan Sidwell  

	* lto-wrapper.c (verbose_exec): New.
	(compile_offload_image, run_gcc): Call it.

Index: gcc/lto-wrapper.c
===
--- gcc/lto-wrapper.c	(revision 226372)
+++ gcc/lto-wrapper.c	(working copy)
@@ -115,6 +115,15 @@ maybe_unlink (const char *file)
 fprintf (stderr, "[Leaving LTRANS %s]\n", file);
 }
 
+/* Print command being executed.  */
+static void
+verbose_exec (const char *const *argv)
+{
+  while (*argv)
+fprintf (stderr, " %s", *argv++);
+  fprintf (stderr, "\n");
+}
+
 /* Template of LTRANS dumpbase suffix.  */
 #define DUMPBASE_SUFFIX ".ltrans18446744073709551615"
 
@@ -693,6 +702,8 @@ compile_offload_image (const char *targe
 
   obstack_ptr_grow (&argv_obstack, NULL);
   argv = XOBFINISH (&argv_obstack, char **);
+  if (verbose)
+	verbose_exec (argv);
   fork_execute (argv[0], argv, true);
   obstack_free (&argv_obstack, NULL);
 }
@@ -1156,6 +1167,8 @@ run_gcc (unsigned argc, char *argv[])
 
   new_argv = XOBFINISH (&argv_obstack, const char **);
   argv_ptr = &new_argv[new_head_argc];
+  if (verbose)
+verbose_exec (new_argv);
   fork_execute (new_argv[0], CONST_CAST (char **, new_argv), true);
 
   if (lto_mode == LTO_MODE_LTO)
@@ -1264,6 +1277,8 @@ cont:
 	}
 	  else
 	{
+	  if (verbose)
+		verbose_exec (new_argv);
 	  fork_execute (new_argv[0], CONST_CAST (char **, new_argv),
 			true);
 	  maybe_unlink (input_name);


Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-07-29 Thread Michael Meissner
On Wed, Jul 29, 2015 at 05:46:42PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> > On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > > +;; Return constant 0x8000 in an Altivec 
> > > > register.
> > > > +
> > > > +(define_expand "altivec_high_bit"
> > > > +  [(set (match_dup 1)
> > > > +   (vec_duplicate:V16QI (const_int 7)))
> > > > +   (set (match_dup 2)
> > > > +   (ashift:V16QI (match_dup 1)
> > > > + (match_dup 1)))
> > > > +   (set (match_dup 3)
> > > > +   (match_dup 4))
> > > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > > +   (unspec:V16QI [(match_dup 2)
> > > > +  (match_dup 3)
> > > > +  (const_int 15)] UNSPEC_VSLDOI))]
> > > > +  "TARGET_ALTIVEC"
> > > > +{
> > > > +  if (can_create_pseudo_p ())
> > > > +{
> > > > +  operands[1] = gen_reg_rtx (V16QImode);
> > > > +  operands[2] = gen_reg_rtx (V16QImode);
> > > > +  operands[3] = gen_reg_rtx (V16QImode);
> > > > +}
> > > > +  else
> > > > +operands[1] = operands[2] = operands[3] = operands[0];
> > > 
> > > This won't work (in the pattern you write to op 3 before reading from op 
> > > 2).
> > > Do you ever call this expander late, anyway?
> > 
> > I'm not sure I follow you.
> 
> I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
> one.  If that is executed operands[2] will be the same reg as operands[3],
> and things fall apart.

Yes, you are right. I'll put an abort in there if we can't allocate
pseudos. But since it is called during RTL expansion of abskf2, negkf2, etc. we
won't run into it.  Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[gomp4] PTX target format

2015-07-29 Thread Nathan Sidwell
I've committed this to the gomp4 branch.  It changes the PTX target data format from
a string array with embedded NULs to an array of pointers & sizes to separate
strings for each object file.  This avoids the use of strlen when loading onto
the PTX device.


Not incrementing the PTX version number, as that just got incremented for the 
launch API change.


nathan
2015-07-29  Nathan Sidwell  

gcc/
* config/nvptx/mkoffload.c (process): Reimplement emission of ptx
objects to set of arrays.

libgomp/
* plugin/plugin-nvptx.c (struct targ_ptx_obj): New.
(struct nvptx_tdata): Move earlier, adjust.
(link_ptx): Take targ_ptx_obj array and adjust.
(GOMP_OFFLOAD_load_image_ver): Adjust link_ptx call.

Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c   (revision 226371)
+++ libgomp/plugin/plugin-nvptx.c   (working copy)
@@ -290,6 +290,28 @@ struct targ_fn_launch
   unsigned short dim[GOMP_DIM_MAX];
 };
 
+/* Target PTX object information.  */
+
+struct targ_ptx_obj
+{
+  const char *code;
+  size_t size;
+};
+
+/* Target data image information.  */
+
+typedef struct nvptx_tdata
+{
+  const struct targ_ptx_obj *ptx_objs;
+  unsigned ptx_num;
+
+  const char *const *var_names;
+  unsigned var_num;
+
+  const struct targ_fn_launch *fn_descs;
+  unsigned fn_num;
+} nvptx_tdata_t;
+
 /* Descriptor of a loaded function.  */
 
 struct targ_fn_descriptor
@@ -824,7 +846,8 @@ nvptx_get_num_devices (void)
 
 
 static void
-link_ptx (CUmodule *module, char const *ptx_code, size_t length)
+link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
+ unsigned num_objs)
 {
   CUjit_option opts[7];
   void *optvals[7];
@@ -838,8 +861,6 @@ link_ptx (CUmodule *module, char const *
   void *linkout;
   size_t linkoutsize __attribute__ ((unused));
 
-  GOMP_PLUGIN_debug (0, "attempting to load:\n---\n%s\n---\n", ptx_code);
-
   opts[0] = CU_JIT_WALL_TIME;
   optvals[0] = &elapsed;
 
@@ -865,25 +886,22 @@ link_ptx (CUmodule *module, char const *
   if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal ("cuLinkCreate error: %s", cuda_error (r));
 
-  size_t off = 0;
-  while (off < length)
+  for (; num_objs--; ptx_objs++)
 {
-  int l = strlen (ptx_code + off);
   /* cuLinkAddData's 'data' argument erroneously omits the const
 qualifier.  */
-  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, (char*)ptx_code + off, l 
+ 1,
-0, 0, 0, 0);
+  GOMP_PLUGIN_debug (0, "Loading:\n---\n%s\n---\n", ptx_objs->code);
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, (char*)ptx_objs->code,
+ptx_objs->size, 0, 0, 0, 0);
   if (r != CUDA_SUCCESS)
{
  GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
- GOMP_PLUGIN_fatal ("cuLinkAddData (ptx_code) error: %s", cuda_error 
(r));
+ GOMP_PLUGIN_fatal ("cuLinkAddData (ptx_code) error: %s",
+cuda_error (r));
}
-
-  off += l;
-  while (off < length && ptx_code[off] == '\0')
-   off++;
 }
 
+  GOMP_PLUGIN_debug (0, "Linking\n");
   r = cuLinkComplete (linkstate, &linkout, &linkoutsize);
 
   GOMP_PLUGIN_debug (0, "Link complete: %fms\n", elapsed);
@@ -1619,18 +1637,6 @@ GOMP_OFFLOAD_fini_device (int n)
   pthread_mutex_unlock (&ptx_dev_lock);
 }
 
-typedef struct nvptx_tdata
-{
-  const char *ptx_src;
-  size_t ptx_len;
-
-  const char *const *var_names;
-  size_t var_num;
-
-  const struct targ_fn_launch *fn_descs;
-  size_t fn_num;
-} nvptx_tdata_t;
-
 /* Return the libgomp version number we're compatible with.  There is
no requirement for cross-version compatibility.  */
 
@@ -1670,7 +1676,7 @@ GOMP_OFFLOAD_load_image_ver (unsigned ve
   
   nvptx_attach_host_thread_to_device (ord);
 
-  link_ptx (&module, img_header->ptx_src, img_header->ptx_len);
+  link_ptx (&module, img_header->ptx_objs, img_header->ptx_num);
 
   /* The mkoffload utility emits a struct of pointers/integers at the
  start of each offload image.  The array of kernel names and the
Index: gcc/config/nvptx/mkoffload.c
===
--- gcc/config/nvptx/mkoffload.c(revision 226371)
+++ gcc/config/nvptx/mkoffload.c(working copy)
@@ -229,51 +229,68 @@ process (FILE *in, FILE *out)
   const char *input = read_file (in, &len);
   const char *comma;
   id_map const *id;
+  unsigned obj_count = 0;
+  size_t i;
 
-  fprintf (out, "static const char ptx_code[] = \n \"");
-  for (size_t i = 0; i < len; i++)
+  /* Dump out char arrays for each PTX object file.  These are
+ terminated by a NUL.  */
+  for (i = 0; i != len;)
 {
-  char c = input[i];
-  bool nl = false;
-  switch (c)
+  char c;
+  
+  fprintf (out, "static const char ptx_code_%u[] =\n\t\"", obj_count++);
+  while ((c = input[i++]))
 

Re: C++ delayed folding branch review

2015-07-29 Thread Kai Tietz
2015-07-29 19:48 GMT+02:00 Jason Merrill :
> On 07/28/2015 04:10 PM, Kai Tietz wrote:
>>
>> 2015-07-28 1:14 GMT+02:00 Kai Tietz :
>>
>>> 2015-07-27 18:51 GMT+02:00 Jason Merrill :

 I've trimmed this to the previously mentioned issues that still need to
 be
 addressed; I'll do another full review after these are dealt with.
>>>
>>>
>>> Thanks for doing this summary of missing parts of prior review.
>>>
 On 06/13/2015 12:15 AM, Jason Merrill wrote:
>
>
> On 06/12/2015 12:11 PM, Kai Tietz wrote:


 @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp)
{
  if (TREE_TYPE (temp) == type)
return temp;
 +  STRIP_NOPS (temp);
 +  if (TREE_TYPE (temp) == type)
 +return temp;
 @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx
 *ctx,
 tree t,
bool
reduced_constant_expression_p (tree t)
{
 +  /* Make sure we remove useless initial NOP_EXPRs.  */
 +  STRIP_NOPS (t);
>
> ^
>

Checked, and removing those STRIP_NOPS causes regressions involving
vector casts.  At least the STRIP_NOPS in
reduced_constant_expression_p seems to be required.  See
g++.dg/ext/vector20.C as an example testcase.
It considers '(vec)(const __vector(2) long int){3l, 4l}' not to be a
constant expression.

The change to adjust_temp_type no longer seems to be necessary (I am
still running tests on that).

 @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const
 constexpr_ctx
 *ctx, tree t,
 && is_dummy_object (x))
   {
 x = ctx->object;
 - x = cp_build_addr_expr (x, tf_warning_or_error);
 + if (x)
 +   x = cp_build_addr_expr (x, tf_warning_or_error);
 + else
 +   x = get_nth_callarg (t, i);
>>>
>>>
>>>
>>> This still should not be necessary.

Replaced the x = get_nth_callarg (t, i); with a gcc_unreachable (),
just to be sure we notice the issue if it ever occurs.

>>
>>
>> Yeah, most likely.  But I got initially here some issues, so I don't
>> see that this code would worsen things.
>
>
>
> If this code path is hit, that means something has broken my design,
> and
> I don't want to just paper over that.  Please revert this change.
>
>
> ^
>
case SIZEOF_EXPR:
 +  if (processing_template_decl
 + && (!COMPLETE_TYPE_P (TREE_TYPE (t))
 + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST))
 +   return t;
>>>
>>>
>>>
>>> Why is this necessary?

The issue is that with delayed folding we don't fold sizeof expressions
until the folding after the genericize pass.  So those expressions
remain, and within a template we can run into sizeof operators on incomplete
types if we invoke variants of the constexpr code here.  So this
pattern simply verifies that the sizeof operand can be determined.  We
could simply avoid resolving sizeof operators in template declarations at all,
but my idea here was to try to resolve them if the type of the
operand is already complete (and has a constant size).
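A tiny hypothetical example of what I mean:

template <typename T>
struct S
{
  // Inside the template, T may still be incomplete and sizeof (T) has no
  // constant value until instantiation, so constexpr evaluation must not
  // try to fold it here.
  static const int size = sizeof (T);
};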

>>
>>
>> We don't want to resolve SIZEOF_EXPR within template-declarations for
>> incomplete types, of if its size isn't fixed.  Issue is that we
>> otherwise get issues about expressions without existing type (as usual
>> within template-declarations for some expressions).
>
>
>
> Yes, but we shouldn't have gotten this far with a dependent sizeof;
> maybe_constant_value just returns if
> instantiation_dependent_expression_p is true.
>
> ^

Well, but we could get here via routines other than
maybe_constant_value.  For example, cxx_constant_value doesn't do those
checks here.

>
 @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const
 constexpr_ctx
 *ctx, tree t,
case CONVERT_EXPR:
case VIEW_CONVERT_EXPR:
case NOP_EXPR:
 +case UNARY_PLUS_EXPR:
  {
 +   enum tree_code tcode = TREE_CODE (t);
   tree oldop = TREE_OPERAND (t, 0);
 +
 +   if (tcode == NOP_EXPR && TREE_TYPE (t) == TREE_TYPE (oldop)
 &&
 TREE_OVERFLOW_P (oldop))
 + {
 +   if (!ctx->quiet)
 + permerror (input_location, "overflow in constant
 expression");
 +   /* If we're being permissive (and are in an enforcing
 +   context), ignore the overflow.  */
 +   if (!flag_permissive)
 + *overflow_p = true;
 +   *non_constant_p = true;
 +
 +   return t;
 + }
   tree op = cxx_eval_constant_expression (ctx, oldop,
>>>
>>>
>>>
>>>

Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-07-29 Thread Jeff Law

On 07/27/2015 09:47 PM, Trevor Saunders wrote:

On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:

On Sat, Jul 25, 2015 at 4:37 AM,   wrote:

From: Trevor Saunders 

 * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
 config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h,
 config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
 config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
 config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
the name of a function.
 * output.h (default_output_label): New prototype.
 * varasm.c (default_output_label): New function.
 * vmsdbgout.c: Include tm_p.h.
 * xcoffout.c: Likewise.


Just a general remark - the GCC output machinery is known to be slow,
adding indirect calls might be not the very best idea without refactoring
some of it.


ah, I wasn't aware of that.  On the other hand, some of these hooks tend
to be big, so the call overhead might not matter that much.  I
suppose if this is something we really care about we might want to
consider pushing the libgas project farther so we can avoid all this
text formatting altogether.
They're definitely slow, but I've always considered that overhead as in 
the noise relative to what it takes to run through the optimizer passes.


I'm tentatively supportive of libgas, but I have concerns as well.  I've 
been burned by (for example) vendor assemblers which didn't have a means 
(assembler directives) to correctly handle certain programs.  The 
vendor's compiler didn't have the problem because it just blasted .o 
files and didn't pass them through the assembler.  I'd hate to see us 
repeating those mistakes.


Jeff


Re: [PATCH][2/N] Replace the pattern GET_MODE_SIZE (GET_MODE_INNER (m)) with GET_MODE_UNIT_SIZE (m)

2015-07-29 Thread Jeff Law

On 07/29/2015 02:00 AM, David Sherwood wrote:

Hi,

This patch follows on from

[PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode

It is another tidy up, replacing the pattern GET_MODE_SIZE (GET_MODE_INNER (m))
with GET_MODE_UNIT_SIZE (m).

Tested:
aarch64 and aarch64_be - no regressions in gcc testsuite
x86_64 - bootstrap build, no testsuite regressions
arm-none-eabi - no regressions in gcc testsuite
Run contrib/config-list.mk - no regressions

Good to go?

Thanks,
David.

ChangeLog:

2015-07-29  David Sherwood  

 gcc/config/
 * aarch64/aarch64-simd.md (aarch64_ext): Replace call to
 GET_MODE_SIZE (GET_MODE_INNER (m)) with GET_MODE_UNIT_SIZE (m).
 * aarch64/aarch64.c (aarch64_simd_valid_immediate): Likewise.
 * arm/arm.c (neon_valid_immediate): Likewise.
 * i386/i386.c (classify_argument, ix86_expand_int_vcond): Likewise.
 (expand_vec_perm_blend, expand_vec_perm_pshufb): Likewise.
 (expand_vec_perm_pshufb2, expand_vec_perm_vpshufb2_vpermq): Likewise.
 (expand_vec_perm_vpshufb2_vpermq): Likewise.
 (expand_vec_perm_vpshufb2_vpermq_even_odd): Likewise.
 (expand_vec_perm_vpshufb4_vpermq2): Likewise.
 * i386/sse.md
 (_vinsert_mask): Likewise.
 (*ssse3_palignr_perm): Likewise.
 * rs6000/rs6000.c (rs6000_complex_function_value): Likewise.
 * spu/spu.c (arith_immediate_p): Likewise.
 gcc/
 * simplify-rtx.c (simplify_const_unary_operation): Likewise.
 (simplify_binary_operation_1, simplify_ternary_operation): Likewise.


OK.
jeff


Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-07-29 Thread Segher Boessenkool
On Wed, Jul 29, 2015 at 06:38:45PM -0400, Michael Meissner wrote:
> On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > > +;; Return constant 0x8000 in an Altivec 
> > > register.
> > > +
> > > +(define_expand "altivec_high_bit"
> > > +  [(set (match_dup 1)
> > > + (vec_duplicate:V16QI (const_int 7)))
> > > +   (set (match_dup 2)
> > > + (ashift:V16QI (match_dup 1)
> > > +   (match_dup 1)))
> > > +   (set (match_dup 3)
> > > + (match_dup 4))
> > > +   (set (match_operand:V16QI 0 "register_operand" "")
> > > + (unspec:V16QI [(match_dup 2)
> > > +(match_dup 3)
> > > +(const_int 15)] UNSPEC_VSLDOI))]
> > > +  "TARGET_ALTIVEC"
> > > +{
> > > +  if (can_create_pseudo_p ())
> > > +{
> > > +  operands[1] = gen_reg_rtx (V16QImode);
> > > +  operands[2] = gen_reg_rtx (V16QImode);
> > > +  operands[3] = gen_reg_rtx (V16QImode);
> > > +}
> > > +  else
> > > +operands[1] = operands[2] = operands[3] = operands[0];
> > 
> > This won't work (in the pattern you write to op 3 before reading from op 2).
> > Do you ever call this expander late, anyway?
> 
> I'm not sure I follow you.

I'm sorry, I meant that very last line I quoted, the !can_create_pseudo_p ()
one.  If that is executed operands[2] will be the same reg as operands[3],
and things fall apart.


Segher


Re: [PATCH] -Wtautological-compare should be quiet on floats

2015-07-29 Thread Jeff Law

On 07/29/2015 08:08 AM, Marek Polacek wrote:

As discussed elsewhere, -Wtautological-compare shouldn't warn about
floating-point types because of the way NaN behave.
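(For reference, the NaN behaviour in question, as a minimal example rather
than anything from the patch:

  #include <cmath>
  #include <cstdio>

  int main ()
  {
    double d = NAN;
    std::printf ("%d\n", d == d);   // prints 0: NaN compares unequal to itself
    return 0;
  }

so with NaNs honored, a floating-point self-comparison is not actually
tautological.)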

I've been meaning to commit this one as obvious, but I'm not sure
whether I should also use HONOR_NANS or whether I can safely ignore
that here.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-29  Marek Polacek  

* c-common.c (warn_tautological_cmp): Bail for float types.

* c-c++-common/Wtautological-compare-3.c: New test.
I think it comes down to what we think users are going to expect when 
compiling code with NaNs disabled.


One camp would probably say "in my code X == X is always true since I 
don't have NaNs."  The other might say "whether or not to warn on X == X 
should not be dependent on flags such as -ffinite-math-only".


I  could easily make a case for either.  I'd personally tend to lean 
towards the latter.


Jeff



Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-07-29 Thread Michael Meissner
On Wed, Jul 29, 2015 at 04:59:23PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> > +;; Return constant 0x8000 in an Altivec 
> > register.
> > +
> > +(define_expand "altivec_high_bit"
> > +  [(set (match_dup 1)
> > +   (vec_duplicate:V16QI (const_int 7)))
> > +   (set (match_dup 2)
> > +   (ashift:V16QI (match_dup 1)
> > + (match_dup 1)))
> > +   (set (match_dup 3)
> > +   (match_dup 4))
> > +   (set (match_operand:V16QI 0 "register_operand" "")
> > +   (unspec:V16QI [(match_dup 2)
> > +  (match_dup 3)
> > +  (const_int 15)] UNSPEC_VSLDOI))]
> > +  "TARGET_ALTIVEC"
> > +{
> > +  if (can_create_pseudo_p ())
> > +{
> > +  operands[1] = gen_reg_rtx (V16QImode);
> > +  operands[2] = gen_reg_rtx (V16QImode);
> > +  operands[3] = gen_reg_rtx (V16QImode);
> > +}
> > +  else
> > +operands[1] = operands[2] = operands[3] = operands[0];
> 
> This won't work (in the pattern you write to op 3 before reading from op 2).
> Do you ever call this expander late, anyway?

I'm not sure I follow you.  Without the patch lines the insns are as follows (I
put in blank lines to separate the insns):

(define_expand "altivec_high_bit"
  [(set (match_dup 1)
(vec_duplicate:V16QI (const_int 7)))

   (set (match_dup 2)
(ashift:V16QI (match_dup 1)
  (match_dup 1)))

   (set (match_dup 3)
(match_dup 4))

   (set (match_operand:V16QI 0 "register_operand" "")
(unspec:V16QI [(match_dup 2)
   (match_dup 3)
   (const_int 15)] UNSPEC_VSLDOI))]
  "TARGET_ALTIVEC"
{
  if (can_create_pseudo_p ())
{
  operands[1] = gen_reg_rtx (V16QImode);
  operands[2] = gen_reg_rtx (V16QImode);
  operands[3] = gen_reg_rtx (V16QImode);
}
  else
operands[1] = operands[2] = operands[3] = operands[0];

  operands[4] = CONST0_RTX (V16QImode);
})

The first insn sets operands[1] to be 0x07070707070707070707070707070707LL.

The second insn sets operands[2] to be operands[1] << operands[1], i.e.
0x80808080808080808080808080808080LL.

The third insn sets operands[3] to be 0.

The fourth does a double vector shift left 15 bytes, filing in 0's in the
bottom bits, which leaves the following in the register:
0x8000LL

This is negative -0.0 in IEEE 128-bit, which is used to flip the sign bit.
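As a purely illustrative scalar model of those three steps (treating byte 0
as the most significant byte; this is not the rs6000 code):

#include <array>
#include <cstdint>

std::array<std::uint8_t, 16> altivec_high_bit_model ()
{
  std::array<std::uint8_t, 16> splat;
  splat.fill (0x07);                            // vec_duplicate (const_int 7)

  std::array<std::uint8_t, 16> shifted;
  for (int i = 0; i < 16; i++)                  // vslb: 0x07 << 7 == 0x80
    shifted[i] = std::uint8_t (splat[i] << (splat[i] & 7));

  std::array<std::uint8_t, 16> result = {};     // vsldoi by 15 bytes with zero:
  result[0] = shifted[15];                      // only the top 0x80 byte survives
  return result;                                // i.e. 0x80 followed by 15 zero bytes
}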

The code is used for negate and absolute value (which is done during rtl
expansion). Here is the negate use case.

(define_insn_and_split "ieee_128bit_vsx_neg2"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
(neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (clobber (match_scratch:V16QI 2 "=v"))]
  "TARGET_FLOAT128 && FLOAT128_IEEE_P (mode)"
  "#"
  ""
  [(parallel [(set (match_dup 0)
   (neg:TFIFKF (match_dup 1)))
  (use (match_dup 2))])]
{
  if (GET_CODE (operands[2]) == SCRATCH)
operands[2] = gen_reg_rtx (V16QImode);

  operands[3] = gen_reg_rtx (V16QImode);
  emit_insn (gen_altivec_high_bit (operands[2]));
}
  [(set_attr "length" "8")
   (set_attr "type" "vecsimple")])

(define_insn "*ieee_128bit_vsx_neg2_internal"
  [(set (match_operand:TFIFKF 0 "register_operand" "=wa")
(neg:TFIFKF (match_operand:TFIFKF 1 "register_operand" "wa")))
   (use (match_operand:V16QI 2 "register_operand" "=v"))]
  "TARGET_FLOAT128"
  "xxlxor %x0,%x1,%x2"
  [(set_attr "length" "4")
   (set_attr "type" "vecsimple")])


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-07-29 Thread Jeff Law

On 07/29/2015 07:49 AM, Kyrill Tkachov wrote:

Hi all,

This patch improves RTL if-conversion on sequences that perform a
conditional select on integer constants.
Most of the smart logic to do that already exists in the
noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two
immediates and performs a conditional
select between them. It returns true and then the caller
noce_process_if_block doesn't try any other transformations,
completely skipping the more aggressive transformations that
noce_try_store_flag_constants allows!

Calling noce_try_store_flag_constants before noce_try_cmove allows for
the smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for
the C code:
int
foo (int a, int b)
{
   return ((a & (1 << 25)) ? 5 : 4);
}

i.e. test a bit in a and return 5 or 4. Currently on aarch64 this
generates the naive:
 and w2, w0, 33554432  // mask away all bits except bit 25
 mov w1, 4
 cmp w2, wzr
 mov w0, 5
 csel w0, w0, w1, ne


whereas with this patch this can be transformed into the much better:
 ubfx x0, x0, 25, 1  // extract bit 25
 add w0, w0, 4
I suspect the PA would benefit from this as well, probably several 
other architectures with reasonable bitfield extraction capabilities.




Another issue I encountered is that the code that claims to perform the
transformation:
   /* if (test) x = 3; else x = 4;
  =>   x = 3 + (test == 0);  */

doesn't seem to do exactly that in all cases. In fact for that case it
will try something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional
increment operation.
I vaguely recall targets that don't have const - X insns, but do have X 
+ const style insns.  And more generally I think we're better off 
generating the PLUS version.




This patch tweaks that code to always try to generate an addition of the
condition rather than
a subtraction.

Anyway, for code:
int
fooinc (int x)
{
   return x ? 1025 : 1026;
}

we currently generate:
 mov w2, 1025
 mov w1, 1026
 cmp w0, wzr
 csel w0, w2, w1, ne

whereas with this patch we will generate:
 cmp w0, wzr
 cset w0, eq
 add w0, w0, 1025

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

P.S. noce_try_store_flag_constants is currently gated on
!targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For
example, I think the first example above
would show a benefit on arm if it was enabled there. But that can be a
separate investigation.

2015-07-29  Kyrylo Tkachov 

 * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
 -STORE_FLAG and condition is reversable.  Prefer to add to the
 flag value.
 (noce_process_if_block): Try noce_try_store_flag_constants before
 noce_try_cmove.

2015-07-29  Kyrylo Tkachov 

 * gcc.target/aarch64/csel_bfx_1.c: New test.
 * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.

ifcvt-csel-imms.patch


commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov
Date:   Tue Jul 28 14:59:46 2015 +0100

 [RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info 
*if_info)

reversep = 0;
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-   normalize = 0;
+   normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
We generally avoid using a ',' operator like this.  However, I can see 
that you're just following existing convention in that code.  So I won't 
object.





else if (ifalse == 0 && exact_log2 (itrue) >= 0
   && (STORE_FLAG_VALUE == 1
   || if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info 
*if_info)
 =>   x = 3 + (test == 0);  */
if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
{
- target = expand_simple_binop (mode,
-   (diff == STORE_FLAG_VALUE
-? PLUS : MINUS),
-   gen_int_mode (ifalse, mode), target,
+ rtx_code code = reversep ? PLUS :
+   (diff == STORE_FLAG_VALUE ? PLUS
+  : MINUS);
+ HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+ target = expand_simple_binop (mode, code,
+   gen_int_mode (to_add, mode), target,

Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-07-29 Thread Jeff Law

On 07/28/2015 12:18 PM, Alex Velenko wrote:

On 21/04/15 06:27, Jeff Law wrote:

On 04/20/2015 01:09 AM, Shiva Chen wrote:

Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original  jump2 rtl dump

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump
after patch  can_replace_by.patch

Could you help me to review the patch?

Thanks.  This looks pretty good.

I expanded the comment for the new function a bit and renamed the
function in an effort to clarify its purpose.  From reviewing
can_replace_by, it seems it should have been handling this case, but
clearly wasn't due to implementation details.

I then bootstrapped and regression tested the patch on x86_64-linux-gnu
where it passed.  I also instrumented that compiler to see how often
this code triggers.  During a bootstrap it triggers a couple hundred
times (which is obviously a proxy for cross jumping improvements).  So
it's triggering regularly on x86_64, which is good.

I also verified that this fixes BZ64916 for an arm-non-eabi toolchain
configured with --with-arch=armv7.

Installed on the trunk.  No new testcase as it's covered by existing
tests.

Thanks,,
jeff



Hi,
I see this patch has been committed in r56 on trunk. Is it okay to port
this to fsf-5?
It's not a regression, so backporting it would be generally frowned 
upon.  If you feel strongly about it, you should ask Jakub, Joseph or 
Richi (the release managers) for an exception to the general policy.


jeff



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-07-29 Thread Michael Collison

Richard and Jeff,

Any conclusion to this discussion? Is this okay in match.pd or would you 
like to see it implemented elsewhere?


On 7/28/2015 12:41 AM, Richard Biener wrote:

On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law  wrote:

On 07/27/2015 03:25 AM, Richard Biener wrote:

On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
 wrote:

This patch is designed to optimize end-of-loop conditions of the form
   i < x && i < y into i < min (x, y).  Loop conditions involving '>' are
handled similarly using max (x, y).
As an example:

#define N 1024

int  a[N], b[N], c[N];

void add (unsigned int m, unsigned int n)
{
  unsigned int i, bound = (m < n) ? m : n;
  for (i = 0; i < m && i < n; ++i)
  a[i] = b[i] + c[i];
}


Performed bootstrap and make check on: x86_64_unknown-linux-gnu,
arm-linux-gnueabihf, and aarch64-linux-gnu.
Okay for trunk?


So this works only for && that has been lowered to non-CFG form
(I suppose phiopt would catch that?  If not, ifcombine would be the
place to implement it I guess).

phiopt is supposed to be generating MIN/MAX expressions for us.  If it isn't
it'd be good to see any testcases where it isn't.

I think that raises a general question though.  Does it make more sense to
capture MIN/MAX (and others) in phiopt or in the match.pd framework?

match.pd is good for pattern recognition - patterns of fixed size.  There are
cases that are done in fold-const.c for example that doesn't fit very well
and should be done as separate pass, like for example figuring out whether
an expression can be easily negated or whether there are sign-changes that
can be stripped.  Basically all cases where fold currently recurses (unbound).

The above case is a corner case I think - the number of && you can change
into (multiple) MIN/MAX is unbound but we might only care about the case
where there will be one MIN/MAX operation.

Generally phiopt and other patterns that match the CFG are not yet well
supported by match.pd (though I outlined how matching PHI nodes when
facing (simplify (cond ...) ...) would be possible).

So while putting something into match.pd is easy I'd like people to
think if doing the same thing elsewhere is better - that is, if this is really
a pattern transform operation or if you are just implementing a special-case
of a general transform as a pattern.

Richard.


Jeff





Re: [PATCH], PowerPC IEEE 128-bit patch #4

2015-07-29 Thread Segher Boessenkool
On Wed, Jul 29, 2015 at 04:04:28PM -0400, Michael Meissner wrote:
> +;; Return constant 0x8000 in an Altivec register.
> +
> +(define_expand "altivec_high_bit"
> +  [(set (match_dup 1)
> + (vec_duplicate:V16QI (const_int 7)))
> +   (set (match_dup 2)
> + (ashift:V16QI (match_dup 1)
> +   (match_dup 1)))
> +   (set (match_dup 3)
> + (match_dup 4))
> +   (set (match_operand:V16QI 0 "register_operand" "")
> + (unspec:V16QI [(match_dup 2)
> +(match_dup 3)
> +(const_int 15)] UNSPEC_VSLDOI))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (can_create_pseudo_p ())
> +{
> +  operands[1] = gen_reg_rtx (V16QImode);
> +  operands[2] = gen_reg_rtx (V16QImode);
> +  operands[3] = gen_reg_rtx (V16QImode);
> +}
> +  else
> +operands[1] = operands[2] = operands[3] = operands[0];

This won't work (in the pattern you write to op 3 before reading from op 2).
Do you ever call this expander late, anyway?


Segher


Re: [gomp4] Redesign oacc_parallel launch API

2015-07-29 Thread Nathan Sidwell

On 07/29/15 08:24, Nathan Sidwell wrote:

On 07/29/15 05:22, Thomas Schwinge wrote:



Likewise for the other torture testing flags.



Investigating ...  (I've seen those failures be intermittent)


Interestingly the fails go away with an unoptimized libgomp.  I've observed 
something vaguely like that before.  The observed failure mode was getting stuck 
inside the driver library opening the device.  Which is very strange.



Anyway, I've committed the attached to the gomp4 branch, which separates the ASYNC 
and WAIT tags, for a slightly better interface.  It doesn't fix up the failure, 
though.  Still thinking about that.


nathan
2015-07-29  Nathan Sidwell  

	include/
	* gomp-constants.h (GOMP_LAUNCH_ASYNC_WAIT): Replace with ...
	(GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): ... these.
	(GOMP_LAUNCH_OP_MAX): New.

	libgomp/
	* plugin/plugin-nvptx.c (nvptx_wait): Add debug print.
	* oacc-parallel.c (GOACC_parallel_keyed): Process separate ASYNC
	and WAIT tags.
	(GOACC_parallel): Adjust forwarding.

	gcc/
	* omp-low.c (expand_omp_target): Emit separate ASYNC and WAIT tags.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 226346)
+++ gcc/omp-low.c	(working copy)
@@ -9875,23 +9875,35 @@ expand_omp_target (struct omp_region *re
   integer_type_node,
   build_int_cst (integer_type_node,
 		 GOMP_ASYNC_SYNC));
-	if (t_async && !tagging)
+	if (tagging && t_async)
 	  {
-	args.safe_push (t_async);
-	t_async = NULL_TREE;
+	unsigned HOST_WIDE_INT i_async;
+
+	if (TREE_CODE (t_async) == INTEGER_CST)
+	  {
+		/* See if we can pack the async arg in to the tag's
+		   operand.  */
+		i_async = TREE_INT_CST_LOW (t_async);
+
+		if (i_async < GOMP_LAUNCH_OP_MAX)
+		  t_async = NULL_TREE;
+	  }
+	if (t_async)
+	  i_async = GOMP_LAUNCH_OP_MAX;
+	args.safe_push (oacc_launch_pack
+			(GOMP_LAUNCH_ASYNC, NULL_TREE, i_async));
 	  }
+	if (t_async)
+	  args.safe_push (t_async);
 
 	/* Save the argument index, and... */
 	unsigned t_wait_idx = args.length ();
 	unsigned num_waits = 0;
 	c = find_omp_clause (clauses, OMP_CLAUSE_WAIT);
-	if (!tagging || c || t_async)
+	if (!tagging || c)
 	  /* ... push a placeholder.  */
 	  args.safe_push (integer_zero_node);
 
-	if (tagging && t_async)
-	  args.safe_push (t_async);
-	
 	for (; c; c = OMP_CLAUSE_CHAIN (c))
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WAIT)
 	{
@@ -9901,13 +9913,13 @@ expand_omp_target (struct omp_region *re
 	  num_waits++;
 	}
 
-	if (!tagging || num_waits || t_async)
+	if (!tagging || num_waits)
 	  {
 	tree len;
 
 	/* Now that we know the number, update the placeholder.  */
 	if (tagging)
-	  len = oacc_launch_pack (GOMP_LAUNCH_ASYNC_WAIT,
+	  len = oacc_launch_pack (GOMP_LAUNCH_WAIT,
   NULL_TREE, num_waits);
 	else
 	  len = build_int_cst (integer_type_node, num_waits);
Index: libgomp/oacc-parallel.c
===
--- libgomp/oacc-parallel.c	(revision 226346)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -268,11 +268,20 @@ GOACC_parallel_keyed (int device, void (
 	  }
 	  break;
 
-	case GOMP_LAUNCH_ASYNC_WAIT:
+	case GOMP_LAUNCH_ASYNC:
+	  {
+	/* Small constant values are encoded in the operand.  */
+	async = GOMP_LAUNCH_OP (tag);
+
+	if (async == GOMP_LAUNCH_OP_MAX)
+	  async = va_arg (ap, unsigned);
+	break;
+	  }
+
+	case GOMP_LAUNCH_WAIT:
 	  {
 	unsigned num_waits = GOMP_LAUNCH_OP (tag);
 
-	async = va_arg (ap, unsigned);
 	if (num_waits)
 	  goacc_wait (async, num_waits, &ap);
 	break;
@@ -357,8 +366,9 @@ GOACC_parallel (int device, void (*fn) (
 			GOMP_LAUNCH_PACK (GOMP_LAUNCH_DIM, 0,
 	  GOMP_DIM_MASK (GOMP_DIM_MAX) - 1),
 			num_gangs, num_workers, vector_length,
-			GOMP_LAUNCH_PACK (GOMP_LAUNCH_ASYNC_WAIT,
-	  0, num_waits),
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_ASYNC, 0,
+	  GOMP_LAUNCH_OP_MAX), async,
+			GOMP_LAUNCH_PACK (GOMP_LAUNCH_WAIT, 0, num_waits),
 			async, waits[0], waits[1], waits[2], waits[3],
 			waits[4], waits[5], waits[6], waits[7], waits[8]);
 }
Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 226346)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -1346,6 +1346,8 @@ nvptx_wait (int async)
   if (!s)
 GOMP_PLUGIN_fatal ("unknown async %d", async);
 
+  GOMP_PLUGIN_debug (0, "  %s: waiting on async=%d\n", __FUNCTION__, async);
+
   r = cuStreamSynchronize (s->stream);
   if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal ("cuStreamSynchronize error: %s", cuda_error (r));
Index: include/gomp-constants.h
===
--- include/gomp-constants.h	(revision 226346)
+++ include/gomp-constants.h	(working copy)
@@ -131,7 +131,8 @@ enum gomp_map_kind
 /* Varadic launch arguments.  */
 #define GOMP_LAUNCH_END 	0

libgo patch committed: Add missing spaces to last mksysinfo commit

2015-07-29 Thread Ian Lance Taylor
The last change to libgo/mksysinfo.sh was missing some spaces, which
apparently caused mksysinfo to hang on some systems.  This is
https://golang.org/issue/11924.  This patch from Lynn Boger fixes the
problem.  Bootstrapped on x86_64-unknown-linux-gnu, where it made no
difference.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 226196)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3aa95d96181dc4525b1b8ec189f9104afa6d7609
+9931f2c150e2da4b7d468db332823d8ef4fb8c34
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 226196)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1026,19 +1026,19 @@ if ! grep '^const TUNDETACHFILTER' ${OUT
   fi
 fi
 
-if ! grep '^const TUNGETVNETHDRSZ'${OUT} >/dev/null 2>&1; then
+if ! grep '^const TUNGETVNETHDRSZ' ${OUT} >/dev/null 2>&1; then
   if grep '^const _TUNGETVNETHDRSZ_val' ${OUT} >/dev/null 2>&1; then
 echo 'const TUNGETVNETHDRSZ = _TUNGETVNETHDRSZ_val' >> ${OUT}
   fi
 fi
 
-if ! grep '^const TUNSETVNETHDRSZ'${OUT} >/dev/null 2>&1; then
+if ! grep '^const TUNSETVNETHDRSZ' ${OUT} >/dev/null 2>&1; then
   if grep '^const _TUNSETVNETHDRSZ_val' ${OUT} >/dev/null 2>&1; then
 echo 'const TUNSETVNETHDRSZ = _TUNSETVNETHDRSZ_val' >> ${OUT}
   fi
 fi
 
-if ! grep '^const TUNSETQUEUE'${OUT} >/dev/null 2>&1; then
+if ! grep '^const TUNSETQUEUE' ${OUT} >/dev/null 2>&1; then
   if grep '^const _TUNSETQUEUE_val' ${OUT} >/dev/null 2>&1; then
 echo 'const TUNSETQUEUE = _TUNSETQUEUE_val' >> ${OUT}
   fi


Re: [PATCH] [graphite] Reduce the number of params in a scop to 3

2015-07-29 Thread Sebastian Pop
Sebastian Pop wrote:
> Aditya Kumar wrote:
> > More than 3 params consumes too much memory while bootstrapping gcc
> > with graphite enabled.
> 
> Ok.  I will commit the patch.
> Thanks for fixing bootstrap with graphite enabled.

We will increase the max when we use ISL's mechanism to count the number of
polyhedral operations.  Until then I think we'd better lower the max nb_params
to avoid long compile times and reduce memory requirements to bootstrap with
graphite.
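
For anyone who needs the previous behaviour, the limit can still be raised
explicitly on the command line, e.g.:

  gcc -O2 -fgraphite-identity --param graphite-max-nb-scop-params=10 test.c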


[PATCH] [graphite] Reduce the number of params in a scop to 3

2015-07-29 Thread Aditya Kumar
More than 3 params consumes too much memory while bootstrapping gcc
with graphite enabled.

BOOT_CFLAGS="-g -O2 -fgraphite-identity -floop-block -floop-interchange 
-floop-strip-mine"
---
 gcc/params.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.def b/gcc/params.def
index 0ca3451..1f6e40e 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -850,7 +850,7 @@ DEFPARAM (PARAM_LOOP_UNROLL_JAM_DEPTH,
 DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
  "graphite-max-nb-scop-params",
  "maximum number of parameters in a SCoP",
- 10, 0, 0)
+ 3, 0, 0)
 
 /* Maximal number of basic blocks in the functions analyzed by Graphite.  */
 
-- 
2.1.0.243.g30d45f7



Re: [PATCH] [graphite] Reduce the number of params in a scop to 3

2015-07-29 Thread Sebastian Pop
Aditya Kumar wrote:
> More than 3 params consumes too much memory while bootstrapping gcc
> with graphite enabled.

Ok.  I will commit the patch.
Thanks for fixing bootstrap with graphite enabled.

Sebastian


[committed, PATCH] Define DBX_REGISTER_NUMBER for IA MCU

2015-07-29 Thread H.J. Lu
Since IA MCU uses the same debug register map as Linux/x86, we copy
DBX_REGISTER_NUMBER together with TARGET_ASM_FILE_START_FILE_DIRECTIVE
and ASM_COMMENT_START from i386/gnu-user.h to i386/iamcu.h.

* config/i386/iamcu.h (TARGET_ASM_FILE_START_FILE_DIRECTIVE):
New.  Copied from config/i386/gnu-user.h.
(ASM_COMMENT_START): Likewise.
(DBX_REGISTER_NUMBER): Likewise.
---
 gcc/config/i386/iamcu.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/i386/iamcu.h b/gcc/config/i386/iamcu.h
index 1e2fbe4..c20c2db 100644
--- a/gcc/config/i386/iamcu.h
+++ b/gcc/config/i386/iamcu.h
@@ -26,6 +26,17 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #undef TARGET_SUBTARGET_DEFAULT
 #define TARGET_SUBTARGET_DEFAULT MASK_IAMCU
 
+/* Output at beginning of assembler file.  */
+/* The .file command should always begin the output.  */
+#define TARGET_ASM_FILE_START_FILE_DIRECTIVE true
+
+#undef ASM_COMMENT_START
+#define ASM_COMMENT_START "#"
+
+#undef DBX_REGISTER_NUMBER
+#define DBX_REGISTER_NUMBER(n) \
+  (TARGET_64BIT ? dbx64_register_map[n] : svr4_dbx_register_map[n])
+
 #undef ASM_SPEC
 #define ASM_SPEC "--32 -march=iamcu"
 
-- 
2.4.3



Re: [PATCH 0/4] S390 -march=native related fixes

2015-07-29 Thread Richard Sandiford
Dominik Vogt  writes:
> With that problem fixed I still see one minor glitch.  Maybe
> someone knows how to fix the following:
>
> * With a cross compiler that generates i686 binaries on s390x:
>
>$ i686-elf-gcc -c ~/foo.c -march=native
>/home/vogt/foo.c:1:0: error: bad value (native) for -march= switch
>
>   This is all right because the x86 compiler just emits a brief
>   error message because the argument to -march= is a string.
>
> * The other way round, generating s390x binaries on i686:
>
>$ s390x-linux-gcc -c ~/foo.c -march=native
>cc1: error: unrecognized argument in option '-march=native'
>cc1: note: valid arguments to '-march=' are: g5 g6 native z10 z13 z196 
> z9-109 z9-ec z900 z990 zEC12
>
>   So, the error message complains about "native" not being a valid
>   argument to -march=, and then lists it as valid in the next line.
>   This is because s390x uses an Enum option, and "native" is one
>   of the allowed values.

Nice spot :-)  One way would be to have:

  %{mtune=native:%e-mtune=native is only supported for native configurations}

in the non-native MARCH_MTUNE_NATIVE_SPECS, and similarly for -march.
The problem is that that won't quote -mtune=native in the canonical
way and I don't know how well it would handle intl (probably not very well).
Maybe we should have a new .opt tag for native-only enum options.

Thanks,
Richard


Re: [Patch, fortran] PR64921 class_allocate_18 failure

2015-07-29 Thread Steve Kargl
On Wed, Jul 29, 2015 at 09:50:16PM +0200, Uros Bizjak wrote:
> 
> Please also add the testcase from the PR, Comment #14 [1]. The failure
> mode of this problem is a difficult to detect invalid read in
> class_allocate_18, but the testcase from the PR outright crashes.
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64921#c14
> 

Patch is OK with the testcase.

-- 
Steve


Re: [PR64164] drop copyrename, integrate into expand

2015-07-29 Thread H.J. Lu
On Wed, Jul 29, 2015 at 1:13 PM, Alexandre Oliva  wrote:
> On Jul 23, 2015, "H.J. Lu"  wrote:
>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66978
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66983
>
> Thanks, both of these are also fixed (I merged your patch for x32, and I
> verified manually that another fix I just wrote fixes all the -m32
> -msse2 regressions) in the git branch aoliva/pr64164, but I'm going to
> investigate a few more issues affecting other targets before I start
> full regression tests all over the build farm ;-)
>

I am building x32 on aoliva/pr64164 now.

-- 
H.J.


Re: [PR64164] drop copyrename, integrate into expand

2015-07-29 Thread Alexandre Oliva
On Jul 23, 2015, "H.J. Lu"  wrote:

>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66978

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66983

Thanks, both of these are also fixed (I merged your patch for x32, and I
verified manually that another fix I just wrote fixes all the -m32
-msse2 regressions) in the git branch aoliva/pr64164, but I'm going to
investigate a few more issues affecting other targets before I start
full regression tests all over the build farm ;-)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PR64164] drop copyrename, integrate into expand

2015-07-29 Thread Alexandre Oliva
On Jul 23, 2015, Segher Boessenkool  wrote:

> On Thu, Jul 23, 2015 at 12:29:14PM -0300, Alexandre Oliva wrote:
>> Yeah.  Thanks, I've tested it with this change, and I'm now checking
>> this in (full patch first; adjusted incremental patch at the end):

> Unfortunately it causes about a thousand test fails on powerpc64-linux
> (at least, it seems to be this patch, I haven't actually checked).

> Some representative backtraces:

Thanks, both of these are now fixed (at least in that they don't ICE any
more) in the git branch aoliva/pr64164, but I'm going to investigate a
few more issues before I start a regression test on gcc110 and gcc112.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Add irreflexive comparison debug check

2015-07-29 Thread François Dumont
Hi

Here is a patch to add irreflexive debug check.

Standard algorithm signatures are such that there is no guarantee that
the operator < or the predicate used to compare the iterator value type will
exist.  So I had to check whether the call is valid using the C++11 features
declval and decltype.
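
Roughly, the detection idiom looks like the following minimal sketch
(illustrative only, it is not the actual _Irreflexive_checker code from
the patch):

  #include <iterator>
  #include <type_traits>
  #include <utility>

  // Detect at compile time whether the iterator's value type supports
  // operator<, so the irreflexive assertion !(x < x) is only instantiated
  // when the expression is valid.
  template<typename It, typename = void>
    struct has_less : std::false_type { };

  template<typename It>
    struct has_less<It,
      decltype(void(std::declval<typename std::iterator_traits<It>::value_type&>()
                    < std::declval<typename std::iterator_traits<It>::value_type&>()))>
    : std::true_type { };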

* include/debug/formatter.h (_Debug_msg_id::__msg_irreflexive_ordering):
New enum entry.
* include/debug/functions.h (_Irreflexive_checker): New.
(__is_irreflexive, __is_irreflexive_pred): New.
* include/debug/macros.h (__glibcxx_check_irreflexive,
__glibcxx_check_irreflexive_pred): New macros.
* include/debug/debug.h (__glibcxx_requires_irreflexive,
__glibcxx_requires_irreflexive_pred): New macros, use latter.
* include/bits/stl_algo.h
(partial_sort_copy): Add irreflexive debug check.
(partial_sort_copy): Likewise.
(lower_bound): Likewise.
(upper_bound): Likewise.
(equal_range): Likewise.
(binary_search): Likewise.
(inplace_merge): Likewise.
(includes): Likewise.
(next_permutation): Likewise.
(prev_permutation): Likewise.
(is_sorted_until): Likewise.
(minmax_element): Likewise.
(partial_sort): Likewise.
(nth_element): Likewise.
(sort): Likewise.
(merge): Likewise.
(stable_sort): Likewise.
(set_union): Likewise.
(set_intersection): Likewise.
(set_difference): Likewise.
(set_symmetric_difference): Likewise.
(min_element): Likewise.
(max_element): Likewise.
* include/bits/stl_algobase.h
(lower_bound): Likewise.
(lexicographical_compare): Likewise.
* include/bits/stl_heap.h
(push_heap): Likewise.
(pop_heap): Likewise.
(make_heap): Likewise.
(sort_heap): Likewise.
(is_heap_until): Likewise.

Ok to commit ?

François

diff --git libstdc++-v3/include/bits/stl_algo.h libstdc++-v3/include/bits/stl_algo.h
index 93e834a..89f5d36 100644
--- libstdc++-v3/include/bits/stl_algo.h
+++ libstdc++-v3/include/bits/stl_algo.h
@@ -1750,6 +1750,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _OutputValueType>)
   __glibcxx_function_requires(_LessThanComparableConcept<_OutputValueType>)
   __glibcxx_requires_valid_range(__first, __last);
+  __glibcxx_requires_irreflexive(__first, __last);
   __glibcxx_requires_valid_range(__result_first, __result_last);
 
   return std::__partial_sort_copy(__first, __last,
@@ -1803,6 +1804,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
   _OutputValueType, _OutputValueType>)
   __glibcxx_requires_valid_range(__first, __last);
+  __glibcxx_requires_irreflexive_pred(__first, __last, __comp);
   __glibcxx_requires_valid_range(__result_first, __result_last);
 
   return std::__partial_sort_copy(__first, __last,
@@ -2027,6 +2029,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _ValueType, _Tp>)
   __glibcxx_requires_partitioned_lower_pred(__first, __last,
 		__val, __comp);
+  __glibcxx_requires_irreflexive_pred(__first, __last, __comp);
 
   return std::__lower_bound(__first, __last, __val,
 __gnu_cxx::__ops::__iter_comp_val(__comp));
@@ -2082,6 +2085,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
   __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
+  __glibcxx_requires_irreflexive(__first, __last);
 
   return std::__upper_bound(__first, __last, __val,
 __gnu_cxx::__ops::__val_less_iter());
@@ -2116,6 +2120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Tp, _ValueType>)
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
 		__val, __comp);
+  __glibcxx_requires_irreflexive_pred(__first, __last, __comp);
 
   return std::__upper_bound(__first, __last, __val,
 __gnu_cxx::__ops::__val_comp_iter(__comp));
@@ -2189,7 +2194,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_LessThanOpConcept<_ValueType, _Tp>)
   __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
   __glibcxx_requires_partitioned_lower(__first, __last, __val);
-  __glibcxx_requires_partitioned_upper(__first, __last, __val);  
+  __glibcxx_requires_partitioned_upper(__first, __last, __val);
+  __glibcxx_requires_irreflexive(__first, __last);
 
   return std::__equal_range(__first, __last, __val,
 __gnu_cxx::__ops::__iter_less_val(),
@@ -2231,6 +2237,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		__val, __comp);
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
 		__val, __comp);
+  __glibcxx_requires_irreflexive_pred(__first, __last, __comp);
 
   return std::__equal_range(__first, __last, __val,
 __gnu_cxx::__ops::__iter_comp_val(__comp),
@@ -2262,6 +2269,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
   __glib

[PATCH], PowerPC IEEE 128-bit patch #4

2015-07-29 Thread Michael Meissner
This is another intermediate patch to get IEEE 128-bit support on the PowerPC
into the GCC compiler. This patch adds a lot of the support to allow IEEE
128-bit support in VSX registers. Note, it will need future patches that
updates rs6000.c and rs6000.md to enable the basic IEEE 128-bit support.

This patch bootstraps and has no test suite regressions on a big endian Power7
system and a little endian system. Is it ok to install?

The expected future patches are:

  #5  Finish the enablement of the basic support (rs6000.c & rs6000.md
changes);

  #6  Add support for using different names for the 64/128-bit integer
conversion to IBM extended double, to allow a future version to
switch the default for what long double is. It is not expected that GCC
6.x will make this switch, but we would like to eventually use the
standard TF names for the library when the default change is made. If
this isn't clear, the following names use 'tf' in them, when they use
IBM extended double:

__dpd_extendddtf
__dpd_extendsdtf
__dpd_extendtftd
__dpd_trunctdtf
__dpd_trunctfdd
__dpd_trunctfsd
__fixtfdi
__fixtfti
__fixunstfti
__floattitf
__floatuntitf
__powitf2
__floatditf
__floatunditf
__fixunstfdi

  #7  Basic patches to enable libgcc support. It is anticipated that these
patches may be temporary changes, to allow for the glibc team to do the
soft-float emulator changes that are shared with libgcc (but they can't
really start until there is basic support in there).

  #8  Enable IEEE 128-bit floating point in VSX registers by default for VSX
systems, add tests for IEEE 128-bit support and correct calling
sequence.

  #9  ... Various fixes of things I haven't yet covered (complex, libquadmath,
etc.).

2015-07-28  Michael Meissner  

* config/rs6000/vector.md (VEC_L): Add KFmode and TFmode.
(VEC_M): Likewise.
(VEC_N): Likewise.
(mov, VEC_M iterator): Add support for IEEE 128-bit floating
point in VSX registers.

* config/rs6000/constraints.md (wb constraint): Document unused
w constraint.
(we constraint): Likewise.
(wo constraint): Likewise.
(wp constraint): New constraint for IEEE 128-bit floating point in
VSX registers.
(wq constraint): Likewise.

* config/rs6000/predicates.md (easy_fp_constant): Add support for
IEEE 128-bit floating point in VSX registers.
(easy_scalar_constant): Likewise.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add new
constraints (wp, wq) for IEEE 128-bit floating point in VSX
registers.
(rs6000_init_hard_regno_mode_ok): Likewise.

* config/rs6000/vsx.md (VSX_LE_128): Add support for IEEE 128-bit
floating point in VSX registers.
(VSX_L): Likewise.
(VSX_M): Likewise.
(VSX_M2): Likewise.
(VSm): Likewise.
(VSs): Likewise.
(VSr): Likewise.
(VSa): Likewise.
(VSv): Likewise.
(vsx_le_permute_): Add support to properly swap bytes for
IEEE 128-bit floating point in VSX registers on little endian.
(vsx_le_undo_permute_): Likewise.
(vsx_le_perm_load_): Likewise.
(vsx_le_perm_store_): Likewise.
(splitters for IEEE 128-bit fp moves): Likewise.

* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wp and
wq constraints.

* config/rs6000/altivec.md (VM): Add support for IEEE 128-bit
floating point in VSX registers.
(VM2): Likewise.
(altivec_high_bit): New insn to set just the high bit in an
altivec register.

* doc/md.text (Machine Constraints): Document wp and wq
constraints on PowerPC.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 226275)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -36,13 +36,14 @@ (define_mode_iterator VEC_A [V16QI V8HI 
 (define_mode_iterator VEC_K [V16QI V8HI V4SI V4SF])
 
 ;; Vector logical modes
-(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI])
+(define_mode_iterator VEC_L [V16QI V8HI V4SI V2DI V4SF V2DF V1TI TI KF TF])
 
-;; Vector modes for moves.  Don't do TImode here.
-(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI])
+;; Vector modes for moves.  Don't do TImode or TFmode here, since their
+;; moves are handled elsewhere.
+(define_mode_iterator VEC_M [V16QI V8HI V4SI V2DI V4SF V2DF V1TI KF])
 
 ;; Vec

Re: [Patch, fortran] PR64921 class_allocate_18 failure

2015-07-29 Thread Uros Bizjak
Hello!

> I submit the PR64921 fix I posted two days ago on bugzilla.
>
> The problem comes from the finalization wrapper not having the 
> always_explicit attribute set, so
> that when it is called, the array argument is passed without descriptor, but 
> the argument
> declaration is a descriptor array. Boom! Normally the always_explicit 
> attribute is set automatically
> during resolution, but the code generated by finalization is not passed 
> through resolution. As
> resolution also takes care of error reporting,
>
> I prefer to not pass the generated code through resolution.
>
> So the attached patch just sets the flag.
> I think it can be backported.

Please also add the testcase from the PR, Comment #14 [1]. The failure
mode of this problem is a difficult to detect invalid read in
class_allocate_18, but the testcase from the PR outright crashes.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64921#c14

Uros.


Re: [PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself.

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 10:19:52PM +0300, Ville Voutilainen wrote:
> Although the commit message for this supposedly obvious change is f**ked up
> since it claims I modified MAINTAINERS, whereas I modified ChangeLog.
> Such fun.

The commit message doesn't matter that much.  Your last change looks ok.

Marek


Re: [PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself.

2015-07-29 Thread Ville Voutilainen
On 29 July 2015 at 22:18, Ville Voutilainen  wrote:
> On 29 July 2015 at 21:48, Marek Polacek  wrote:
>> On Wed, Jul 29, 2015 at 08:29:34PM +0300, Ville Voutilainen wrote:
>>> Fyi.
>>>
>>> 2015-07-29  Ville Voutilainen  
>>> * MAINTAINERS (Write After Approval): Add myself.
>>
>> There should be a blank line between these two lines.
>
> Ok, I added the blank line and committed the change as obvious. :)

Although the commit message for this supposedly obvious change is f**ked up
since it claims I modified MAINTAINERS, whereas I modified ChangeLog.
Such fun.


Re: [PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself.

2015-07-29 Thread Ville Voutilainen
On 29 July 2015 at 21:48, Marek Polacek  wrote:
> On Wed, Jul 29, 2015 at 08:29:34PM +0300, Ville Voutilainen wrote:
>> Fyi.
>>
>> 2015-07-29  Ville Voutilainen  
>> * MAINTAINERS (Write After Approval): Add myself.
>
> There should be a blank line between these two lines.

Ok, I added the blank line and committed the change as obvious. :)


Re: [gomp4.1] Support #pragma omp target {enter,exit} data

2015-07-29 Thread Ilya Verbin
On Mon, Jul 06, 2015 at 22:42:10 +0200, Jakub Jelinek wrote:
> As has been clarified on omp-lang, we actually shouldn't be mapping or
> unmapping the pointer and/or reference, only the array slice itself, except
> in target construct (and even for that it is changing from mapping to
> private + pointer assignment).

I've updated this patch.  make check-target-libgomp passed.


libgomp/
* target.c (gomp_map_vars_existing): Fix target address for 'always to'
array sections.
(gomp_unmap_vars): Decrement k->refcount when it is 1 and
k->async_refcount is 0.
(gomp_offload_image_to_device): Set tgt's refcount to infinity.
(gomp_exit_data): New static function.
(GOMP_target_enter_exit_data): Support mapping/unmapping.
* testsuite/libgomp.c/target-11.c: Extend for testing 'always to' array
sections.
* testsuite/libgomp.c/target-20.c: New test.
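
For reference, the kind of code these directives describe looks roughly like
the following (array name and size invented for illustration):

  #define N 1024
  float a[N];

  void
  example (void)
  {
    #pragma omp target enter data map(to: a[0:N])

    #pragma omp target
    for (int i = 0; i < N; i++)
      a[i] *= 2.0f;

    #pragma omp target exit data map(from: a[0:N])
  }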


diff --git a/libgomp/target.c b/libgomp/target.c
index ef74d43..ad375c9 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -191,7 +191,8 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep, 
splay_tree_key oldn,
 
   if (GOMP_MAP_ALWAYS_TO_P (kind))
 devicep->host2dev_func (devicep->target_id,
-   (void *) (oldn->tgt->tgt_start + oldn->tgt_offset),
+   (void *) (oldn->tgt->tgt_start + oldn->tgt_offset
+ + newn->host_start - oldn->host_start),
(void *) newn->host_start,
newn->host_end - newn->host_start);
   if (oldn->refcount != REFCOUNT_INFINITY)
@@ -664,15 +665,18 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
do_copyfrom)
continue;
 
   bool do_unmap = false;
-  if (k->refcount > 1)
+  if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
+   k->refcount--;
+  else if (k->refcount == 1)
{
- if (k->refcount != REFCOUNT_INFINITY)
-   k->refcount--;
+ if (k->async_refcount > 0)
+   k->async_refcount--;
+ else
+   {
+ k->refcount--;
+ do_unmap = true;
+   }
}
-  else if (k->async_refcount > 0)
-   k->async_refcount--;
-  else
-   do_unmap = true;
 
   if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
  || tgt->list[i].always_copy_from)
@@ -798,7 +802,7 @@ gomp_offload_image_to_device (struct gomp_device_descr 
*devicep,
   /* Insert host-target address mapping into splay tree.  */
   struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
   tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
-  tgt->refcount = 1;
+  tgt->refcount = REFCOUNT_INFINITY;
   tgt->tgt_start = 0;
   tgt->tgt_end = 0;
   tgt->to_free = NULL;
@@ -1241,6 +1245,62 @@ GOMP_target_update (int device, const void *unused, 
size_t mapnum,
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, false);
 }
 
+static void
+gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
+   void **hostaddrs, size_t *sizes, unsigned short *kinds)
+{
+  const int typemask = 0xff;
+  size_t i;
+  gomp_mutex_lock (&devicep->lock);
+  for (i = 0; i < mapnum; i++)
+{
+  struct splay_tree_key_s cur_node;
+  unsigned char kind = kinds[i] & typemask;
+  switch (kind)
+   {
+   case GOMP_MAP_FROM:
+   case GOMP_MAP_ALWAYS_FROM:
+   case GOMP_MAP_DELETE:
+   case GOMP_MAP_RELEASE:
+ cur_node.host_start = (uintptr_t) hostaddrs[i];
+ cur_node.host_end = cur_node.host_start + sizes[i];
+ splay_tree_key k = splay_tree_lookup (&devicep->mem_map, &cur_node);
+ if (!k)
+   continue;
+
+ if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
+   k->refcount--;
+ if (kind == GOMP_MAP_DELETE && k->refcount != REFCOUNT_INFINITY)
+   k->refcount = 0;
+
+ if ((kind == GOMP_MAP_FROM && k->refcount == 0)
+ || kind == GOMP_MAP_ALWAYS_FROM)
+   devicep->dev2host_func (devicep->target_id,
+   (void *) cur_node.host_start,
+   (void *) (k->tgt->tgt_start + k->tgt_offset
+ + cur_node.host_start
+ - k->host_start),
+   cur_node.host_end - cur_node.host_start);
+ if (k->refcount == 0)
+   {
+ splay_tree_remove (&devicep->mem_map, k);
+ if (k->tgt->refcount > 1)
+   k->tgt->refcount--;
+ else
+   gomp_unmap_tgt (k->tgt);
+   }
+
+ break;
+   default:
+ gomp_mutex_unlock (&devicep->lock);
+ gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x",
+ kind);
+   }
+}
+
+  gomp_mutex_unlock (&devicep->lock);
+}
+
 void
 GOMP_target_enter_exit_data 

Re: [PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself.

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 08:29:34PM +0300, Ville Voutilainen wrote:
> Fyi.
> 
> 2015-07-29  Ville Voutilainen  
> * MAINTAINERS (Write After Approval): Add myself.

There should be a blank line between these two lines.

Marek


Re: patch fortran, pr 59746, internal compiler error : segmentation fault

2015-07-29 Thread Mikael Morin

Hello,

I'm unburying the patch from the thread starting at:
https://gcc.gnu.org/ml/gcc-patches/2014-03/msg00439.html

I provide the patch in two flavors: read-only (without whitespace 
changes) and write-only (with them).

This has been tested on x86_64-unknown-linux-gnu.  OK for trunk?

Mikael



2015-07-29  Bud Davis  
Mikael Morin  

PR fortran/59746
* symbol.c (gfc_restore_last_undo_checkpoint): Delete a common block
symbol if it was put in the list.

2015-07-29  Bud Davis  

PR fortran/59746
* gfortran.dg/common_22.f90: New.

*** /tmp/ro4P6U_symbol.c	2015-07-29 20:08:48.675970662 +0200
--- gcc/fortran/symbol.c	2015-07-29 19:48:25.580979685 +0200
*** gfc_restore_last_undo_checkpoint (void)
*** 3168,3177 
  
FOR_EACH_VEC_ELT (latest_undo_chgset->syms, i, p)
  {
!   if (p->gfc_new)
! 	{
! 	  /* Symbol was new.  */
! 	  if (p->attr.in_common && p->common_block && p->common_block->head)
  	{
  	  /* If the symbol was added to any common block, it
  		 needs to be removed to stop the resolver looking
--- 3168,3177 
  
FOR_EACH_VEC_ELT (latest_undo_chgset->syms, i, p)
  {
!   /* Symbol was new. Or was old and just put in common */
!   if ((p->gfc_new
! 	   || (p->attr.in_common && !p->old_symbol->attr.in_common ))
! 	  && p->attr.in_common && p->common_block && p->common_block->head)
  	{
  	  /* If the symbol was added to any common block, it
  	 needs to be removed to stop the resolver looking
*** gfc_restore_last_undo_checkpoint (void)
*** 3206,3216 
  		}
  
  		  gcc_assert(cparent->common_next == p);
- 
  		  cparent->common_next = csym->common_next;
  		}
  	}
! 
  	  /* The derived type is saved in the symtree with the first
  	 letter capitalized; the all lower-case version to the
  	 derived type contains its associated generic function.  */
--- 3206,3216 
  		}
  
  	  gcc_assert(cparent->common_next == p);
  	  cparent->common_next = csym->common_next;
  	}
  	}
!   if (p->gfc_new)
! 	{
  	  /* The derived type is saved in the symtree with the first
  	 letter capitalized; the all lower-case version to the
  	 derived type contains its associated generic function.  */
*** /dev/null	2015-07-28 11:36:43.193098438 +0200
--- gcc/testsuite/gfortran.dg/common_22.f90	2015-07-29 19:59:59.864974563 +0200
***
*** 0 
--- 1,24 
+ ! { dg-do compile }
+ !
+ ! PR fortran/59746
+ ! Check that symbols present in common block are properly cleaned up
+ ! upon error.
+ !
+ ! Contributed by Bud Davis  
+ 
+   CALL RCCFL (NVE,IR,NU3,VE (1,1,1,I))
+   COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+   COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+ !  the PR only contained the two above.
+ !  success is no segfaults or infinite loops.
+ !  let's check some combinations
+  CALL ABC (INTG)
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  CALL DEF (NT1)
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  CALL GHI (NRESL)
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  COMMON /CCFILE/ INTG,NT1,NT2,NT3,NVM,NVE,NFRLE,NRESF,NRESL !{ dg-error "Unexpected COMMON" }
+  END


Index: fortran/symbol.c
===
*** fortran/symbol.c	(révision 226157)
--- fortran/symbol.c	(copie de travail)
***
*** 3168,3216 
  
FOR_EACH_VEC_ELT (latest_undo_chgset->syms, i, p)
  {
!   if (p->gfc_new)
  	{
! 	  /* Symbol was new.  */
! 	  if (p->attr.in_common && p->common_block && p->common_block->head)
! 	{
! 	  /* If the symbol was added to any common block, it
! 		 needs to be removed to stop the resolver looking
! 		 for a (possibly) dead symbol.  */
  
! 	  if (p->common_block->head == p && !p->common_next)
  		{
! 		  gfc_symtree st, *st0;
! 		  st0 = find_common_symtree (p->ns->common_root,
! 	 p->common_block);
! 		  if (st0)
! 		{
! 		  st.name = st0->name;
! 		  gfc_delete_bbt (&p->ns->common_root, &st, compare_symtree);
! 		  free (st0);
! 		}
  		}
  
! 	  if (p->common_block->head == p)
! 	p->common_block->head = p->common_next;
! 	  else
! 		{
! 		  gfc_symbol *cparent, *csym;
! 
! 		  cparent = p->common_block->head;
! 		  csym = cparent->common_next;
! 
! 		  while (csym != p)
! 		{
! 		  cparent = csym;
! 		  csym = csym->common_next;
! 		}
  
! 		  gcc_assert(cparent->common_next == p);
  
! 		  cparent->comm

Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-07-29 Thread Martin Sebor

On 07/28/2015 09:38 PM, Jason Merrill wrote:

Sorry about the slow response on IRC today, I got distracted onto
another issue and forgot to check back.  What I started to write:


No problem.




I'm exploring your suggestion to see if the back end could emit the
diagnostics. But I'm not sure it has sufficient context (location
information) to point to the line of code that uses the function.


Hmm, that's a good point.  I think it would make sense for the ADDR_EXPR
to carry location information as long as we're dealing with trees, but I
suspect we don't currently set the location of an ADDR_EXPR.  So that
would need to be fixed as part of this approach.


Okay, let me look into this.




I suspect the back end or even the middle end route isn't going to
work even if there was enough context to diagnose the problem
expressions because some of them will have been optimized away by then
(e.g., 'if (& __builtin_foo != 0)' is optimized into if (1) by gimple).


I was thinking that if they're optimized away, they aren't problematic
anymore; that was part of the attraction for me of handling this lower
down.


Yes, they're not problematic in that they don't cause linker unsats.
I have a few concerns with the approach of accepting or rejecting
constructs based on optimization (e.g., that it might lead to
similar problems as with -Wmaybe-uninitialized; or that it could
be perceived as inconsistent). But it may not be as bad as it seems
-- let me look into it if only as a learning exercise and see how
it goes.
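
For reference, a minimal example of the kind of construct at issue, using
__builtin_trap purely for illustration:

  /* The builtin has no out-of-line definition, so accepting the
     address-of silently leads to a link-time failure instead of a
     front-end diagnostic, or the use is simply folded away.  */
  void (*handler) (void) = &__builtin_trap;

  int
  f (void)
  {
    if (&__builtin_trap != 0)   /* gimple folds this to if (1).  */
      return 1;
    return 0;
  }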




The second question is about your suggestion to consolidate the code
into mark_rvalue_use. The problem I'm running into there is that
mark_rvalue_use is called for calls to builtins as well as for other
uses and doesn't have enough context to tell one from the other.


Ah, true.  But special-casing call uses is still fewer places than
special-casing all non-call uses.


This will be moot if I can implement it in the middle end.  If not,
I'll give it a try to see if this alternative ends up reducing
the footprint of the patch.

Thanks
Martin



[PATCH] 2015-07-29 Benedikt Huber Philipp Tomsich

2015-07-29 Thread Benedikt Huber
* config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
rsqrtf.
* config/aarch64/aarch64-opts.h: -mrecip has a default value
depending on the core.
* config/aarch64/aarch64-protos.h: Declare.
* config/aarch64/aarch64-simd.md: Matching expressions for
frsqrte and frsqrts.
* config/aarch64/aarch64.c: New functions. Emit rsqrt
estimation code in fast math mode.
* config/aarch64/aarch64.md: Added enum entries.
* config/aarch64/aarch64.opt: Added options -mrecip and
-mlow-precision-recip-sqrt.
* testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
for frsqrte and frsqrts
* testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.

Signed-off-by: Philipp Tomsich 
---
 gcc/ChangeLog  |  19 
 gcc/config/aarch64/aarch64-builtins.c  | 103 
 gcc/config/aarch64/aarch64-opts.h  |   7 ++
 gcc/config/aarch64/aarch64-protos.h|   3 +
 gcc/config/aarch64/aarch64-simd.md |  27 ++
 gcc/config/aarch64/aarch64.c   |  81 ++--
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   8 ++
 gcc/doc/invoke.texi|  19 
 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
 gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107 +
 11 files changed, 434 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3432adb..ac63f70 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,22 @@
+2015-07-29  Benedikt Huber  
+   Philipp Tomsich  
+
+   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and
+   rsqrtf.
+   * config/aarch64/aarch64-opts.h: -mrecip has a default value
+   depending on the core.
+   * config/aarch64/aarch64-protos.h: Declare.
+   * config/aarch64/aarch64-simd.md: Matching expressions for
+   frsqrte and frsqrts.
+   * config/aarch64/aarch64.c: New functions. Emit rsqrt
+   estimation code in fast math mode.
+   * config/aarch64/aarch64.md: Added enum entries.
+   * config/aarch64/aarch64.opt: Added options -mrecip and
+   -mlow-precision-recip-sqrt.
+   * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans
+   for frsqrte and frsqrts
+   * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt.
+
 2015-07-08  Jiong Wang  
 
* config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index b6c89b9..adcea07 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -335,6 +335,11 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_RSQRT_DF,
+  AARCH64_BUILTIN_RSQRT_SF,
+  AARCH64_BUILTIN_RSQRT_V2DF,
+  AARCH64_BUILTIN_RSQRT_V2SF,
+  AARCH64_BUILTIN_RSQRT_V4SF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include "aarch64-simd-builtins.def"
@@ -824,6 +829,42 @@ aarch64_init_crc32_builtins ()
 }
 
 void
+aarch64_add_builtin_rsqrt (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+
+  tree V2SF_type_node = build_vector_type (float_type_node, 2);
+  tree V2DF_type_node = build_vector_type (double_type_node, 2);
+  tree V4SF_type_node = build_vector_type (float_type_node, 4);
+
+  ftype = build_function_type_list (double_type_node, double_type_node, 
NULL_TREE);
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
+ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
+
+  ftype = build_function_type_list (float_type_node, float_type_node, 
NULL_TREE);
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_sf",
+ftype, AARCH64_BUILTIN_RSQRT_SF, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_SF] = fndecl;
+
+  ftype = build_function_type_list (V2DF_type_node, V2DF_type_node, NULL_TREE);
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_v2df",
+ftype, AARCH64_BUILTIN_RSQRT_V2DF, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_V2DF] = fndecl;
+
+  ftype = build_function_type_list (V2SF_type_node, V2SF_type_node, NULL_TREE);
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_v2sf",
+ftype, AARCH64_BUILTIN_RSQRT_V2SF, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_V2SF] = fndecl;
+
+  ftype = build_function_type_list (V4SF_type_node, V4SF_type_node, NULL_TREE);
+  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_v4sf",

Re: C++ delayed folding branch review

2015-07-29 Thread Jason Merrill

On 07/28/2015 04:10 PM, Kai Tietz wrote:

2015-07-28 1:14 GMT+02:00 Kai Tietz :

2015-07-27 18:51 GMT+02:00 Jason Merrill :

I've trimmed this to the previously mentioned issues that still need to be
addressed; I'll do another full review after these are dealt with.


Thanks for doing this summary of missing parts of prior review.


On 06/13/2015 12:15 AM, Jason Merrill wrote:


On 06/12/2015 12:11 PM, Kai Tietz wrote:


@@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp)
   {
 if (TREE_TYPE (temp) == type)
   return temp;
+  STRIP_NOPS (temp);
+  if (TREE_TYPE (temp) == type)
+return temp;
@@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx
*ctx,
tree t,
   bool
   reduced_constant_expression_p (tree t)
   {
+  /* Make sure we remove useless initial NOP_EXPRs.  */
+  STRIP_NOPS (t);



Within the constexpr code we should be folding away NOPs as they are
generated, they shouldn't live this long.



Well, we might see them on overflows ...



We shouldn't within the constexpr code.  NOPs for expressions that are
non-constant due to overflow are added in
cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle
of constexpr evaluation.


^


@@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx
*ctx, tree t,
&& is_dummy_object (x))
  {
x = ctx->object;
- x = cp_build_addr_expr (x, tf_warning_or_error);
+ if (x)
+   x = cp_build_addr_expr (x, tf_warning_or_error);
+ else
+   x = get_nth_callarg (t, i);



This still should not be necessary.



Yeah, most likely.  But I got initially here some issues, so I don't
see that this code would worsen things.



If this code path is hit, that means something has broken my design, and
I don't want to just paper over that.  Please revert this change.


^


   case SIZEOF_EXPR:
+  if (processing_template_decl
+ && (!COMPLETE_TYPE_P (TREE_TYPE (t))
+ || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST))
+   return t;



Why is this necessary?



We don't want to resolve SIZEOF_EXPR within template-declarations for
incomplete types, of if its size isn't fixed.  Issue is that we
otherwise get issues about expressions without existing type (as usual
within template-declarations for some expressions).



Yes, but we shouldn't have gotten this far with a dependent sizeof;
maybe_constant_value just returns if
instantiation_dependent_expression_p is true.


^


@@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const
constexpr_ctx
*ctx, tree t,
   case CONVERT_EXPR:
   case VIEW_CONVERT_EXPR:
   case NOP_EXPR:
+case UNARY_PLUS_EXPR:
 {
+   enum tree_code tcode = TREE_CODE (t);
  tree oldop = TREE_OPERAND (t, 0);
+
+   if (tcode == NOP_EXPR && TREE_TYPE (t) == TREE_TYPE (oldop) &&
TREE_OVERFLOW_P (oldop))
+ {
+   if (!ctx->quiet)
+ permerror (input_location, "overflow in constant
expression");
+   /* If we're being permissive (and are in an enforcing
+   context), ignore the overflow.  */
+   if (!flag_permissive)
+ *overflow_p = true;
+   *non_constant_p = true;
+
+   return t;
+ }
  tree op = cxx_eval_constant_expression (ctx, oldop,



Why doesn't the call to cxx_eval_constant_expression at the bottom here
handle oldop having TREE_OVERFLOW set?



I just handled the case that we see here a wrapping NOP_EXPR around an
overflow.  As this isn't handled by cxx_eval_constant_expression.



How does it need to be handled?  A NOP_EXPR wrapped around an overflow
is there to indicated that the expression is non-constant, and it can't
be simplified any farther.

Please give an example of what was going wrong.


^


@@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p,
gimple_seq *post_p)

 switch (code)
   {
+case SIZEOF_EXPR:
+  if (SIZEOF_EXPR_TYPE_P (*expr_p))
+   *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND
(*expr_p,
+
0)),
+ SIZEOF_EXPR, false);
+  else if (TYPE_P (TREE_OPERAND (*expr_p, 0)))
+   *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p,
0),
+ SIZEOF_EXPR, false);
+  else
+   *expr_p = cxx_sizeof_or_alignof_expr (TREE_OPERAND (*expr_p,
0),
+ SIZEOF_EXPR, false);
+  if (*expr_p == error_mark_node)
+   *expr_p = size_one_node;
+
+  *expr_p = maybe_constant_value (*expr_p);
+  ret = GS_OK;
+  break;



Why are these surviving until gimplification time?



This might still be necessary.  I will retest when bootstrap works.
As we now added SIZEOF_EXPR folding to cp_fold, and if we catch all
expressions a sizeof can occur in, this shouldn't be necessary anymore.
AFAIR I saw some issues here about initialization of global variables,
which we

[PATCH v3][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-07-29 Thread Benedikt Huber
This third revision of the patch:
 * makes -mrecip default value specified per core.
 * disables rsqrt when -Os is given.
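
For background, the emitted sequence follows the usual Newton-Raphson
refinement for 1/sqrt(d); roughly (illustrative C sketch, not the patch's
code; initial_estimate is a hypothetical stand-in for the FRSQRTE seed,
and each step multiplies by the (3 - d*x*x)/2 factor that FRSQRTS supplies):

  extern float initial_estimate (float);

  float
  rsqrt_newton (float d)
  {
    float x = initial_estimate (d);
    x = x * (3.0f - d * x * x) * 0.5f;   /* first refinement step */
    x = x * (3.0f - d * x * x) * 0.5f;   /* second step for single precision */
    return x;
  }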

Ok for check in.

Benedikt Huber (1):
  2015-07-29  Benedikt Huber  
Philipp Tomsich  

 gcc/ChangeLog  |  19 
 gcc/config/aarch64/aarch64-builtins.c  | 103 
 gcc/config/aarch64/aarch64-opts.h  |   7 ++
 gcc/config/aarch64/aarch64-protos.h|   3 +
 gcc/config/aarch64/aarch64-simd.md |  27 ++
 gcc/config/aarch64/aarch64.c   |  81 ++--
 gcc/config/aarch64/aarch64.md  |   3 +
 gcc/config/aarch64/aarch64.opt |   8 ++
 gcc/doc/invoke.texi|  19 
 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c |  63 
 gcc/testsuite/gcc.target/aarch64/rsqrt.c   | 107 +
 11 files changed, 434 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

-- 
1.9.1



[Patch, fortran] PR64921 class_allocate_18 failure

2015-07-29 Thread Mikael Morin

Hello,

I submit the PR64921 fix I posted two days ago on bugzilla.
The problem comes from the finalization wrapper not having the 
always_explicit attribute set, so that when it is called, the array 
argument is passed without descriptor, but the argument declaration is a 
descriptor array.  Boom!


Normally the always_explicit attribute is set automatically during 
resolution, but the code generated by finalization is not passed through 
resolution.  As resolution also takes care of error reporting,

I prefer to not pass the generated code through resolution.

So the attached patch just sets the flag.
I think it can be backported.

The test is already present on the trunk and 5 branch. I plan to add it 
to 4.9 as well.


Regression tested on x86_64-unknown-linux-gnu.  OK for 6/5/4.9?

Mikael


2015-07-29  Mikael Morin  

PR fortran/64921
* class.c (generate_finalization_wrapper): Set finalization
procedure symbol's always_explicit attribute.

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 218973d..7a9e275 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -1599,6 +1599,7 @@ generate_finalization_wrapper (gfc_symbol *derived, gfc_namespace *ns,
   final->ts.type = BT_INTEGER;
   final->ts.kind = 4;
   final->attr.artificial = 1;
+  final->attr.always_explicit = 1;
   final->attr.if_source = expr_null_wrapper ? IFSRC_IFBODY : IFSRC_DECL;
   if (ns->proc_name->attr.flavor == FL_MODULE)
 final->module = ns->proc_name->name;




[PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself.

2015-07-29 Thread Ville Voutilainen
Fyi.

2015-07-29  Ville Voutilainen  
* MAINTAINERS (Write After Approval): Add myself.
diff --git a/ChangeLog b/ChangeLog
index 4fcf016..bf49729 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,6 @@
+2015-07-29  Ville Voutilainen  
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2012-12-29  Ben Elliston  
 
* config.sub, config.guess: Import from upstream.
diff --git a/MAINTAINERS b/MAINTAINERS
index bdfd2be..1e9211a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -592,6 +592,7 @@ Andre Vehreschild   
 Alex Velenko   
 Ilya Verbin
 Kugan Vivekanandarajah 
+Ville Voutilainen  
 Tom de Vries   
 Nenad Vukicevic
 Feng Wang  


Re: [PATCH][21/n] Remove GENERIC stmt combining from SCCVN

2015-07-29 Thread Richard Biener
On July 29, 2015 6:14:58 PM GMT+02:00, Paolo Carlini  
wrote:
>Hi,
>
>On 07/28/2015 09:20 AM, Richard Biener wrote:
>> This moves/merges the equality folding of decl addresses from
>> fold_comparison with that from fold_binary in match.pd.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
>trunk.
>>
>> Richard.
>>
>> 2015-07-28  Richard Biener  
>>
>>  * fold-const.c (fold_comparison): Remove equality folding
>>  of decl addresses ...
>>  * match.pd: ... here and merge with existing pattern.
>I didn't double check with r226298, but I'm pretty sure this change of 
>yours has to do with the FAIL:
>
>FAIL: experimental/optional/constexpr/make_optional.cc (test for excess
>
>errors)
>UNRESOLVED: experimental/optional/constexpr/make_optional.cc
>compilation 
>failed to produce executable
>
>which now we are all seeing. Certainly the FAIL is there with r226299 
>and the library was clean a few revisions before. Note that the
>testcase 
>involves comparisons of decl addresses ;)

Yeah, fix is in testing.

Richard.

>Thanks,
>Paolo.




[gomp4] Some additional OpenACC reduction tests

2015-07-29 Thread Julian Brown
Hi,

This is a set of 19 new tests for OpenACC reductions, covering several
ways of performing reductions over the parallel and loop directives
using gang or worker/vector level parallelism. (The semantics are quite
subtle in some places, but I believe the tests follow the specification
to the letter at least, E&OE.)

Several of these do not pass yet, so have been marked with XFAILs.

I will apply to gomp4 branch shortly.

Cheers,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-reduction-*.c: New tests.
* testsuite/par-reduction-*.c: New tests.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-*.c:
New tests.

commit d6cb22b11bbe6f536bd0f6d5ce8349266040
Author: Julian Brown 
Date:   Wed Jul 29 10:04:36 2015 -0700

Some new OpenACC reduction tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
new file mode 100644
index 000..52f9a8f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
@@ -0,0 +1,43 @@
+#include 
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang reduction(*:res)
+for (i = 0; i < 12; i++)
+  res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+hres *= arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
new file mode 100644
index 000..b5e3b2f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
@@ -0,0 +1,28 @@
+#include 
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang vector reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
new file mode 100644
index 000..d724680
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
@@ -0,0 +1,28 @@
+#include 
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang worker reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
new file mode 100644
index 000..d610373
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
@@ -0,0 +1,28 @@
+#include 
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+int
+main (int argc, char *argv[])
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		   copy(res)
+  {
+#pragma acc loop gang worker vector reduction(+:res)
+for (i = 0; i < 1024; i++)
+  res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+hres += arr[i];
+
+  assert (res == hres);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
new file mode 100644
index 000..3e5c707
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -0,0 +1,36 @@
+/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
+
+#include 
+
+/* Test of reduction on loop directive (ga

Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-29 Thread Tom de Vries

On 29/07/15 14:00, Richard Biener wrote:

On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries  wrote:

On 29/07/15 10:09, Richard Biener wrote:


On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries 
wrote:


On 28/07/15 09:59, Richard Biener wrote:



On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries 

wrote:



Hi,

this patch allows parallelization and vectorization of reduction
operators
that are guaranteed to not overflow (such as min and max operators),
independent of the overflow behaviour of the type.

Bootstrapped and reg-tested on x86_64.

OK for trunk?
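
For illustration, a minimal standalone example of the distinction at stake
(illustrative only, not part of the patch): a min reduction can never
overflow, so reassociating or parallelizing it is safe even with -ftrapv,
whereas reordering the signed sum below could change whether and where an
overflow trap fires.

#include <limits.h>
#include <stdio.h>

int
min_reduction (const int *a, int n)
{
  int m = INT_MAX;
  for (int i = 0; i < n; i++)
    m = a[i] < m ? a[i] : m;  /* never overflows, safe to reassociate */
  return m;
}

int
sum_reduction (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];                /* may overflow; traps under -ftrapv */
  return s;
}

int
main (void)
{
  int a[4] = { 3, -7, 42, 0 };
  printf ("%d %d\n", min_reduction (a, 4), sum_reduction (a, 4));
  return 0;
}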




Hmm, I don't like that no_overflow_tree_code function.  We have a much
more
clear understanding which codes may overflow or trap.  Thus please add
a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED}
like



Done.


bool
operation_overflow_traps (tree type, enum tree_code code)
{
 if (!ANY_INTEGRAL_TYPE_P (type)




I've changed this test into a gcc_checking_assert.



|| !TYPE_OVERFLOW_TRAPS (type))
   return false;
 switch (code)
   {
   case PLUS_EXPR:
   case MINUS_EXPR:
   case MULT_EXPR:
   case LSHIFT_EXPR:
  /* Can overflow in various ways */
   case TRUNC_DIV_EXPR:
   case EXACT_DIV_EXPR:
   case FLOOR_DIV_EXPR:
   case CEIL_DIV_EXPR:
  /* For INT_MIN / -1 */
   case NEGATE_EXPR:
   case ABS_EXPR:
  /* For -INT_MIN */
  return true;
   default:
  return false;
  }
}

and similar variants for _wraps and _undefined.  I think we decided at
some point
the compiler should not take advantage of the fact that lshift or
*_div have undefined
behavior on signed integer overflow, similar we only take advantage of
integral-type
overflow behavior, not vector or complex.  So we could reduce the
number of cases
the functions return true if we document that it returns true only for
the cases where
the compiler needs to / may assume wrapping behavior does not take
place.
As for _traps for example we only have optabs and libfuncs for
plus,minus,mult,negate
and abs.




I've tried to capture all of this in the three new functions:
- operation_overflows_and_traps
- operation_no_overflow_or_wraps
- operation_overflows_and_undefined (unused atm)

I've also added the graphite bit.

OK for trunk, if bootstrap and reg-test succeeds?



+/* Returns true if CODE operating on operands of type TYPE can overflow,
and
+   fwrapv generates trapping insns for CODE.  */

ftrapv



Done.


+bool
+operation_overflows_and_traps (tree type, enum tree_code code)
+{

operation_overflow_traps

is better wording.  Meaning that when the operation overflows then it
traps.



AFAIU, the purpose of the function is to enable optimizations when it
returns false, that is:
- if the operation doesn't overflow, or
- if the operation overflows, but doesn't trap.

The name operation_overflow_traps does not make clear what it returns when
the operation doesn't overflow. If the name doesn't make it clear, you need
to be conservative, that is, return true. Which defies the purpose of the
function.

I've changed the name to operation_no_trapping_overflow (and inverted logic
in the function).

But perhaps you want operation_overflow_traps with a conservative return for
non-overflow operations, and use it like this:
...
   else if (INTEGRAL_TYPE_P (type) && check_reduction)
 {
   if (operation_overflows (type, code)
   && operation_overflow_traps (type, code))
 {
   /* Changing the order of operations changes the semantics.  */
...
?


I think operation_no_trapping_overflow has the same wording issue as
operation_overflow_traps but I'm not a native speaker


Hmm, I'm also not a native speaker. I think I understand what you mean, 
if operation_no_trapping_overflow is read with stress on 'trapping', we 
have the same ambiguity issue.


[ Possibility: a more verbose variant, but I hope no ambiguity: 
operation_can_overflow_and_trap ? ]


> so I'll take your

word that operation_no_trapping_overflow is non-ambiguous iff the
operation cannot overflow.

And no, I didn't mean to use it in combination with operation_overflows.


+  /* We don't take advantage of integral type overflow behaviour for
complex and
+ vector types.  */

We don't generate instructions that trap on overflow for complex or vector
types



Done.


+  if (!INTEGRAL_TYPE_P (type))
+return true;

+  switch (code)
+{
+case PLUS_EXPR:
+case MINUS_EXPR:
+case MULT_EXPR:
+case LSHIFT_EXPR:
+  /* Can overflow in various ways.  */

we don't have a trapping optab for lshift

+case TRUNC_DIV_EXPR:
+case EXACT_DIV_EXPR:
+case FLOOR_DIV_EXPR:
+case CEIL_DIV_EXPR:

nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return
true
for those.



Before the logic inversion, we return false for these (And also for
operators that do not overflow).


+/* Returns true if CODE operating on operands of type TYPE cannot
overflow, or
+   wraps on ove

[gomp4.1] Various accelerator updates from OpenMP 4.1

2015-07-29 Thread Jakub Jelinek
On Fri, Jul 24, 2015 at 10:04:57PM +0200, Jakub Jelinek wrote:
> Another version.
> What to do with zero-length array sections vs. objects is still under heated
> debates, so target8.f90 keeps failing intermittently.

Here is a new version of the patch, with various additions (implemented
GOMP_MAP_FIRSTPRIVATE_INT I've talked about, it now handles use_device_ptr
and handles is_device_ptr with array decls (silly, but seems the accel folks
want it for some strange reason), etc.) and it special cases zero length
array sections rather than all zero length mappings.
The heated debates continue, so perhaps that part -
GOMP_MAP_ZERO_LEN_ARRAY_SECTION - will need reversion and replacement with
something else, we'll see.  This lets the testsuite pass for now except
for the two LTO ICEs, both without offloading (host fallback only) and with
Intel MIC offloading.  Committed to gomp-4_1-branch.

Ilya, I think now is the time to update your enter data/exit data patch.

2015-07-29  Jakub Jelinek  

gcc/
* tree.h (OMP_TARGET_COMBINED): Define.
(OMP_CLAUSE_SET_MAP_KIND): Cast to unsigned int rather than unsigned
char.
(OMP_CLAUSE_MAP_PRIVATE,
OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION): Define.
* tree-core.h (struct tree_omp_clause): Change type of map_kind
from unsigned char to unsigned int.
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_0LEN_ARRAY.
(enum omp_region_type): Add ORT_COMBINED_TARGET.
(struct gimplify_omp_ctx): Add target_map_scalars_firstprivate,
target_map_pointers_as_0len_arrays and
target_firstprivatize_array_bases fields.
(maybe_fold_stmt): Adjust check for ORT_TARGET for the addition of
ORT_COMBINED_TARGET.
(omp_notice_threadprivate_variable): Likewise.
(omp_firstprivatize_variable): Likewise. 
If ctx->target_map_scalars_firstprivate is set, firstprivatize
as GOVD_FIRSTPRIVATE.
(omp_add_variable): Allow map clause together with data sharing
clauses.  For data sharing clause with VLA decl
on omp target/target data don't add firstprivate for the pointer.
(omp_notice_variable): Adjust check for ORT_TARGET for the addition
of ORT_COMBINED_TARGET.  Handle implicit mapping of pointers
as zero length array sections and
ctx->target_map_scalars_firstprivate mapping of scalars as
firstprivate data sharing.
(gimplify_scan_omp_clauses): Initialize
ctx->target_map_scalars_firstprivate,
ctx->target_firstprivatize_array_bases and
ctx->target_map_pointers_as_0len_arrays.  Add firstprivate for
linear clause even to target region if combined.  Remove
map clauses with GOMP_MAP_FIRSTPRIVATE_POINTER kind from
OMP_TARGET_{,ENTER_,EXIT_}DATA.  For GOMP_MAP_FIRSTPRIVATE_POINTER
map kind with non-INTEGER_CST OMP_CLAUSE_SIZE firstprivatize
the bias.
(gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_0LEN_ARRAY.
If gimplify_omp_ctxp->target_firstprivatize_array_bases, use
GOMP_MAP_FIRSTPRIVATE_POINTER map kind instead of
GOMP_MAP_POINTER.
(gimplify_adjust_omp_clauses): Adjust check for ORT_TARGET for the
addition of ORT_COMBINED_TARGET.  Use
GOMP_MAP_FIRSTPRIVATE_POINTER instead of GOMP_MAP_POINTER if
ctx->target_firstprivatize_array_bases for VLAs.  Set
OMP_CLAUSE_MAP_PRIVATE if both data sharing and map clause
appear together.
(gimplify_omp_workshare): Adjust check for ORT_TARGET for the
addition of ORT_COMBINED_TARGET.  Use ORT_COMBINED_TARGET if
OMP_TARGET_COMBINED.
* omp-low.c (lookup_sfield): Change first argument to
splay_tree_key, add overload with tree first argument.
(maybe_lookup_field): Likewise.
(build_sender_ref): Likewise.
(scan_sharing_clauses): Handle VLAs in target firstprivate and
is_device_ptr clauses.  Fix up variable shadowing.  Handle
OMP_CLAUSE_USE_DEVICE_PTR.  Handle OMP_CLAUSE_MAP_PRIVATE.  Handle
GOMP_MAP_FIRSTPRIVATE_POINTER map kind.
(handle_simd_reference): Use get_name.
(lower_rec_input_clauses): Likewise.  Use BUILT_IN_ALLOCA_WITH_ALIGN
instead of BUILT_IN_ALLOCA.
(lower_send_clauses): Use new lookup_sfield overload.
(lower_omp_target): Handle GOMP_MAP_FIRSTPRIVATE_POINTER map kind.
Handle OMP_CLAUSE_PRIVATE VLAs.  Handle OMP_CLAUSE_USE_DEVICE_PTR,
handle arrays and references to arrays in OMP_CLAUSE_IS_DEVICE_PTR
clause.  Handle OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_FIRSTPRIVATE_POINTER.
gcc/c/
* c-tree.h (c_finish_omp_clauses): Add is_omp argument.
* c-parser.c (c_parser_oacc_all_clauses, c_parser_omp_all_clauses,
c_parser_oacc_cache, omp_split_clauses, c_parser_cilk_for): Adj

Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-29 Thread Steve Kargl
On Wed, Jul 29, 2015 at 02:04:12PM +0200, Richard Biener wrote:
> On Wed, Jul 29, 2015 at 1:59 PM, Mikael Morin  wrote:
> > Le 29/07/2015 13:22, Richard Biener a écrit :
> >>
> >> On Wed, Jul 29, 2015 at 11:34 AM, Mikael Morin 
> >> wrote:
> >>>
> >>> Le 29/07/2015 10:26, Richard Biener a écrit :
> >>
> >>
> >> Did you try using vec_safe_splice?
> 
> 
> 
>  That handles NULL retargs, not NULL or empty arglist.
> 
> >>> I think retargs is NULL.
> >>
> >>
> >> Not if the patch fixes anything.
> >>
> > The case retargs == NULL is the case arglen == 0, which means every vector
> > pointer we are about to splice is NULL.
> > So the patch fixes it.
> 
> Ok, that wasn't obvious from reading the patch.
> 

This builds and passes regression testing on x86_64-*-freebsd.
OP found the problem by running the sanitizers.  I don't
know how to build gcc with this as a build option.  I'll
commit whichever diff you recommend.
 
Index: trans-expr.c
===
--- trans-expr.c(revision 226328)
+++ trans-expr.c(working copy)
@@ -5921,18 +5921,18 @@ gfc_conv_procedure_call (gfc_se * se, gf
   vec_safe_reserve (retargs, arglen);
 
   /* Add the return arguments.  */
-  retargs->splice (arglist);
+  vec_safe_splice (retargs, arglist);
 
   /* Add the hidden present status for optional+value to the arguments.  */
-  retargs->splice (optionalargs);
+  vec_safe_splice (retargs, optionalargs);
 
   /* Add the hidden string length parameters to the arguments.  */
-  retargs->splice (stringargs);
+  vec_safe_splice (retargs, stringargs);
 
   /* We may want to append extra arguments here.  This is used e.g. for
  calls to libgfortran_matmul_??, which need extra information.  */
-  if (!vec_safe_is_empty (append_args))
-retargs->splice (append_args);
+  vec_safe_splice (retargs, append_args);
+
   arglist = retargs;
 
   /* Generate the actual call.  */

-- 
Steve


Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-29 Thread Abe

Well.  We don't generally introduce regressions with changes.


Understood.  Regressions are bad, of course.  TTBOMK the
regressions in question are temporary.  Once they are gone,
I think we can then look at whether or not we still
need to keep the old if converter in trunk.  Ideally,
it eventually becomes redundant and unneeded.



(well, the patch still needs review -

> I hope to get to that this week).

After I`ve done the SPEC-based analysis, my next planned steps
on this work are to disable the code that [in my WIP] currently
causes conversion to be enabled by default when autovectorization
is enabled, then to re-integrate the old converter and implement
the switches that will give GCC users access to the modes I described
in a recent email from me.  You might prefer to delay your code review
until I have that all done and a new version of the patch submitted.

Regards,

Abe


Re: [PATCH][21/n] Remove GENERIC stmt combining from SCCVN

2015-07-29 Thread Paolo Carlini

Hi,

On 07/28/2015 09:20 AM, Richard Biener wrote:

This moves/merges the equality folding of decl addresses from
fold_comparison with that from fold_binary in match.pd.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-28  Richard Biener  

* fold-const.c (fold_comparison): Remove equality folding
of decl addresses ...
* match.pd: ... here and merge with existing pattern.
I didn't double check with r226298, but I'm pretty sure this change of 
yours has to do with the FAIL:


FAIL: experimental/optional/constexpr/make_optional.cc (test for excess 
errors)
UNRESOLVED: experimental/optional/constexpr/make_optional.cc compilation 
failed to produce executable


which now we are all seeing. Certainly the FAIL is there with r226299 
and the library was clean a few revisions before. Note that the testcase 
involves comparisons of decl addresses ;)


Thanks,
Paolo.


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-07-29 Thread Paul Richard Thomas
Dear All,

My reply is the same as FX's. However, I am perfectly happy to
eliminate the initialization. The correct state is ensured by
gfc_dump_module.

Cheers

Paul

On 29 July 2015 at 17:31, FX  wrote:
>> Why do you initialize a static variable to false?
>
> You mean because false is equal to zero and it will be the default 
> initialization anyway?
> I quite like that the default value is explicit.
>
> FX



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 05:31:57PM +0200, FX wrote:
> > Why do you initialize a static variable to false?
> 
> You mean because false is equal to zero and it will be the default 
> initialization anyway?

Yes.

> I quite like that the default value is explicit.

Ok then. 

Marek


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-07-29 Thread FX
> Why do you initialize a static variable to false?

You mean because false is equal to zero and it will be the default 
initialization anyway?
I quite like that the default value is explicit.

FX

Re: [PATCH, libgfortran]: Avoid left shift of negative value warning

2015-07-29 Thread FX
> 2015-07-29  Uros Bizjak  
> 
>PR libgfortran/66650
>* libgfortran.h (GFC_DTYPE_SIZE_MASK): Rewrite to avoid
>"left shift of negative value" warning.

OK. Thanks for the patch.

FX


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 05:08:19PM +0200, Paul Richard Thomas wrote:
> Index: gcc/fortran/module.c
> ===
> *** gcc/fortran/module.c  (revision 226054)
> --- gcc/fortran/module.c  (working copy)
> *** read_module (void)
> *** 5283,5291 
> --- 5283,5296 
>  PRIVATE, then private, and otherwise it is public unless the default
>  access in this context has been declared PRIVATE.  */
> 
> + static bool dump_smod = false;

Why do you initialize a static variable to false?

Marek


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-07-29 Thread Paul Richard Thomas
Dear All,

On 24 July 2015 at 10:08, Damian Rouson  wrote:
> I love this idea and had similar thoughts as well.
>
> :D
>
> Sent from my iPhone
>
>> On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas 
>>  wrote:
>>
>> Dear Mikael,
>>
>> It had crossed my mind also that a .mod and a .smod file could be
>> written. Normally, the .smod files are produced by the submodules
>> themselves, so that their descendants can pick up the symbols that
>> they generate. There is no reason at all why this could not be
>> implemented; early on in the development I did just this, although I
>> think that it would now be easier to modify this patch.
>>
>> One huge advantage of proceeding in this way is that any resulting
>> library can be distributed with the .mod file alone so that the
>> private entities are never exposed. The penalty is that a second file
>> is output.
>>
>> With best regards
>>
>> Paul
>>

Please find attached the implementation of this suggestion.

Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the
original preferred?

Cheers

Paul

2015-07-29  Paul Thomas  

PR fortran/52846
* module.c (check_access): Return true if new static flag
'dump_smod' is true.
(gfc_dump_module): Rename original 'dump_module' and call from
new version. Use 'dump_smod' rather than the stack state to
determine if a submodule is being processed. The new version of
this procedure sets 'dump_smod' depending on the stack state and
then writes both the mod and smod files if a module is being
processed or just the smod for a submodule.
(gfc_use_module): Eliminate the check for module_name and
submodule_name being the same.
* trans-decl.c (gfc_finish_var_decl, gfc_build_qualified_array,
get_proc_pointer_decl): Set TREE_PUBLIC unconditionally and use
the conditions to set DECL_VISIBILITY as hidden and to set as
true DECL_VISIBILITY_SPECIFIED.

2015-07-29  Paul Thomas  

PR fortran/52846

* lib/fortran-modules.exp: Call cleanup-submodules from
cleanup-modules.
* gfortran.dg/public_private_module_2.f90: Add two XFAILS to
cover the cases where private entities are no longer optimized
away.
* gfortran.dg/public_private_module_6.f90: Add an XFAIL for the
same reason.
* gfortran.dg/submodule_1.f08: Change cleanup module names.
* gfortran.dg/submodule_5.f08: The same.
* gfortran.dg/submodule_9.f08: The same.
* gfortran.dg/submodule_10.f08: New test
Index: gcc/fortran/module.c
===
*** gcc/fortran/module.c(revision 226054)
--- gcc/fortran/module.c(working copy)
*** read_module (void)
*** 5283,5291 
--- 5283,5296 
 PRIVATE, then private, and otherwise it is public unless the default
 access in this context has been declared PRIVATE.  */

+ static bool dump_smod = false;
+
  static bool
  check_access (gfc_access specific_access, gfc_access default_access)
  {
+   if (dump_smod)
+ return true;
+
if (specific_access == ACCESS_PUBLIC)
  return TRUE;
if (specific_access == ACCESS_PRIVATE)
*** read_crc32_from_module_file (const char*
*** 5961,5968 
 processing the module, dump_flag will be set to zero and we delete
 the module file, even if it was already there.  */

! void
! gfc_dump_module (const char *name, int dump_flag)
  {
int n;
char *filename, *filename_tmp;
--- 5966,5973 
 processing the module, dump_flag will be set to zero and we delete
 the module file, even if it was already there.  */

! static void
! dump_module (const char *name, int dump_flag)
  {
int n;
char *filename, *filename_tmp;
*** gfc_dump_module (const char *name, int d
*** 5970,5976 

module_name = gfc_get_string (name);

!   if (gfc_state_stack->state == COMP_SUBMODULE)
  {
name = submodule_name;
n = strlen (name) + strlen (SUBMODULE_EXTENSION) + 1;
--- 5975,5981 

module_name = gfc_get_string (name);

!   if (dump_smod)
  {
name = submodule_name;
n = strlen (name) + strlen (SUBMODULE_EXTENSION) + 1;
*** gfc_dump_module (const char *name, int d
*** 5991,5997 
strcpy (filename, name);
  }

!   if (gfc_state_stack->state == COMP_SUBMODULE)
  strcat (filename, SUBMODULE_EXTENSION);
else
strcat (filename, MODULE_EXTENSION);
--- 5996,6002 
strcpy (filename, name);
  }

!   if (dump_smod)
  strcat (filename, SUBMODULE_EXTENSION);
else
strcat (filename, MODULE_EXTENSION);
*** gfc_dump_module (const char *name, int d
*** 6060,6065 
--- 6065,6091 
  }


+ void
+ gfc_dump_module (const char *name, int dump_flag)
+ {
+   if (gfc_state_stack->state == COMP_SUBMODULE)
+ dump_smod = true;
+   else
+ dump_smod =false;
+
+   dump_module (name, dump_flag);
+
+   if (dump_smod)
+ return;
+
+   /* Write a submodule file from a module.  The

[patch] libstdc++/66829 use -std=gnu++98 to compile testsuite_shared.cc

2015-07-29 Thread Jonathan Wakely

This file contains some C++98 instantiations that are then compared to
similar instantiations in C++11 tests, so it needs to be compiled with
an explicit -std option now that g++ defaults to gnu++14.

Tested i686-linux, x86_64-linux and ppc64le-linux.

Committed to trunk.

commit 641f48dbc01bb134de1b99414b3ebb4f114d92c0
Author: Jonathan Wakely 
Date:   Wed Jul 29 14:00:51 2015 +0100

	PR libstdc++/66829
	* testsuite/lib/libstdc++.exp (v3-build_support): Compile
	testsuite_shared.cc with -std=gnu++98.

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index d60062d..88738b7 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -668,7 +668,7 @@ proc v3-build_support { } {
 	# Compile with "-w" so that warnings issued by the compiler
 	# do not prevent compilation.
 	if { [v3_target_compile $srcdir/util/$f $object_file "sharedlib" \
-	 [list "incdir=$srcdir" "additional_flags=-fno-inline -w -shared -fPIC -DPIC"]]
+	 [list "incdir=$srcdir" "additional_flags=-fno-inline -w -shared -fPIC -DPIC -std=gnu++98"]]
 		 != "" } {
 		error "could not compile $f"
 	}


Re: [PATCH 3/15][ARM] Add V8HFmode and float16x8_t type

2015-07-29 Thread Kyrill Tkachov

Hi Alan,

On 28/07/15 12:24, Alan Lawrence wrote:

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00477.html.
The only change is to publish float16x8_t only if we actually have a scalar
__fp16 type.

gcc/ChangeLog:

  * config/arm/arm.h (VALID_NEON_QREG_MODE): Add V8HFmode.

  * config/arm/arm.c (arm_vector_mode_supported_p): Support V8HFmode.

  * config/arm/arm-builtins.c (v8hf_UP): New.
  (arm_init_simd_builtin_types): Initialise Float16x8_t.

  * config/arm/arm-simd-builtin-types.def (Float16x8_t): New.

  * config/arm/arm_neon.h (float16x8_t): New typedef.


@@ -822,6 +823,7 @@ arm_init_simd_builtin_types (void)
  we have a scalar __fp16 type.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
Please put the Float16x8_t intialisation right after the Float16x4_t one.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
6e074ea3d3910e1d7abf0299f441973259023606..0faa46ceea51ef6c524c8ff8c063f329a524c11d
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26251,7 +26251,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == 
V2DImode))
+  || mode ==V4HFmode || mode == V16QImode || mode == V4SFmode
+  || mode == V2DImode || mode == V8HFmode))

Space between == and V4HFmode.

Ok with those changes.

Thanks,
Kyrill



Re: [PATCH 2/15][ARM] float16x4_t intrinsics in arm_neon.h

2015-07-29 Thread Kyrill Tkachov

Hi Alan,
On 28/07/15 12:23, Alan Lawrence wrote:

This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00476.html.
The change is to provide all the new float16 intrinsics only if we actually have
a scalar __fp16 type. (This covers the intrinsics whose implementation is
entirely within arm_neon.h; those requiring .md changes follow in patch 7).


You mean patch 5?



gcc/ChangeLog (unchanged):

* config/arm/arm_neon.h (float16_t, vget_lane_f16, vset_lane_f16,
vcreate_f16, vld1_lane_f16, vld1_dup_f16, vreinterpret_p8_f16,
vreinterpret_p16_f16, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpret_f16_f32, vreinterpret_f16_p64, vreinterpret_f16_s64,
vreinterpret_f16_u64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_u8, vreinterpret_f16_u16,
vreinterpret_f16_u32, vreinterpret_f32_f16, vreinterpret_p64_f16,
vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16): New.


Ok.

Thanks,
Kyrill



Re: [PATCH 1/15][ARM] Hide existing float16 intrinsics unless we have a scalar __fp16 type

2015-07-29 Thread Kyrill Tkachov

Hi Alan,

On 28/07/15 12:23, Alan Lawrence wrote:

This makes the existing float16 vector intrinsics available only when we have an
__fp16 type (i.e. when one of the ARM_FP16_FORMAT_... macros is defined).

Thus, we also rearrange the float16x[48]_t types to use the same type as __fp16
for the element type (ACLE says that __fp16 should be an alias).

To keep the existing gcc.target/arm/neon/vcvt{f16_f32,f32_f16} tests working, as
these do not specify an -mfp16-format, I've modified
check_effective_target_arm_neon_fp16_ok to add in -mfp16-format=ieee *if
necessary* (hence still allowing an explicit -mfp16-format=alternative). A
documentation fix for this follows in the last patch.

gcc/ChangeLog:

* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Move
initialization of HFmode scalar type (float16_t) to...
(arm_init_fp16_builtins): ...here, combining with previous __fp16.


I'd say: "... Here.  Combine with __fp16 initialization code"


(arm_init_builtins): Call arm_init_fp16_builtins earlier and always.

* config/arm/arm_neon.h (vcvt_f16_f32, vcvt_f32_f16): Condition on
having an -mfp16-format.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_neon_fp16_ok_nocache): Add flag variants
with -mfp16-format=ieee.


@@ -1752,12 +1749,11 @@ arm_init_builtins (void)
   if (TARGET_REALLY_IWMMXT)
 arm_init_iwmmxt_builtins ();
 
+  arm_init_fp16_builtins ();

+
   if (TARGET_NEON)
 arm_init_neon_builtins ();
 
-  if (arm_fp16_format)

-arm_init_fp16_builtins ();
-
   if (TARGET_CRC32)
 arm_init_crc32_builtins ();


Can you please add a comment above arm_init_fp16_builtins ();
saying that it needs to be called before arm_init_neon_builtins
so that arm_simd_floatHF_type_node gets initialised properly?
(Or words to that effect).

Ok with the comment.

Thanks,
Kyrill



[PATCH] Fix PR67053

2015-07-29 Thread Richard Biener

The following fixes PR67053 by more closely mirroring what fold_binary()'s
STRIP_NOPS does, to avoid regressing the C++ FE constexpr code.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Yes, I'm thinking about an automated way to more closely mirror
STRIP_[SIGN_]NOPS behavior (on toplevel args).

Richard.

2015-07-29  Richard Biener  

PR middle-end/67053
* match.pd: Allow both operands to independently have conversion
when simplifying compares of addresses.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 226345)
+++ gcc/match.pd(working copy)
@@ -1814,7 +1814,7 @@ (define_operator_list CBRT BUILT_IN_CBRT
enough to make fold_stmt not regress when not dispatching to fold_binary.  
*/
 (for cmp (simple_comparison)
  (simplify
-  (cmp (convert?@2 addr@0) (convert? addr@1))
+  (cmp (convert1?@2 addr@0) (convert2? addr@1))
   (with
{
  HOST_WIDE_INT off0, off1;


Re: [PATCH] Work around host compiler placement new aliasing bug

2015-07-29 Thread Richard Biener
On Wed, Jul 29, 2015 at 3:57 PM, Ulrich Weigand  wrote:
> Hello,
>
> this patch is a workaround for the problem discussed here:
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html
>
> The problem is that the new pool allocator code relies on C++ aliasing
> rules related to placement new (basically, that placement new changes
> the dynamic type of the referenced memory).  GCC compilers prior to
> version 4.3 did not implement this rule correctly (PR 29286).
>
> When building current GCC with a host compiler that is affected by this
> bug, and we build with optimization enabled (this typically only happens
> when building a cross-compiler), the resulting compiler binary may be
> miscompiled.
>
> The patch below attempts to detect this situation by checking whether
> the host compiler is a version of GCC prior to 4.3 (but stil accepts
> the -fno-strict-aliasing flag).  If so, -fno-strict-aliasing is added
> to the flags when building the compiler binary.
>
> Tested on i686-linux, and when building an SPU cross-compiler using
> a gcc 4.1 powerpc64-linux host compiler.
>
> OK for mainline?

Ok if nobody objects.

Thanks,
Richard.

> Bye,
> Ulrich
>
> gcc/ChangeLog:
>
> * configure.ac: Set aliasing_flags to -fno-strict-aliasing if
> the host compiler is affected by placement new aliasing bug.
> * configure: Regenerate.
> * Makefile.in (ALIASING_FLAGS): New variable.
> (ALL_CXXFLAGS): Add $(ALIASING_FLAGS).
>
> Index: gcc/configure
> ===
> --- gcc/configure   (revision 226312)
> +++ gcc/configure   (working copy)
> @@ -789,6 +789,7 @@ c_strict_warn
>  strict_warn
>  c_loose_warn
>  loose_warn
> +aliasing_flags
>  CPP
>  EGREP
>  GREP
> @@ -6526,6 +6527,42 @@ fi
>  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>  fi
>
> +# Check whether compiler is affected by placement new aliasing bug (PR 
> 29286).
> +# If the host compiler is affected by the bug, and we build with optimization
> +# enabled (which happens e.g. when cross-compiling), the pool allocator may
> +# get miscompiled.  Use -fno-strict-aliasing to work around this problem.
> +# Since there is no reliable feature check for the presence of this bug,
> +# we simply use a GCC version number check.  (This should never trigger for
> +# stages 2 or 3 of a native bootstrap.)
> +aliasing_flags=
> +if test "$GCC" = yes; then
> +  saved_CXXFLAGS="$CXXFLAGS"
> +
> +  # The following test compilation will succeed if and only if $CXX accepts
> +  # -fno-strict-aliasing *and* is older than GCC 4.3.
> +  CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX is affected 
> by placement new aliasing bug" >&5
> +$as_echo_n "checking whether $CXX is affected by placement new aliasing 
> bug... " >&6; }
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
> +#error compiler not affected by placement new aliasing bug
> +#endif
> +
> +_ACEOF
> +if ac_fn_cxx_try_compile "$LINENO"; then :
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
> +$as_echo "yes" >&6; }; aliasing_flags='-fno-strict-aliasing'
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +
> +  CXXFLAGS="$saved_CXXFLAGS"
> +fi
> +
>
>
>
> @@ -18301,7 +18338,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18304 "configure"
> +#line 18341 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> @@ -18407,7 +18444,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18410 "configure"
> +#line 18447 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> Index: gcc/configure.ac
> ===
> --- gcc/configure.ac(revision 226312)
> +++ gcc/configure.ac(working copy)
> @@ -416,6 +416,32 @@ struct X { typedef long long
>  ]], [[X::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses 
> long long])])
>  fi
>
> +# Check whether compiler is affected by placement new aliasing bug (PR 
> 29286).
> +# If the host compiler is affected by the bug, and we build with optimization
> +# enabled (which happens e.g. when cross-compiling), the pool allocator may
> +# get miscompiled.  Use -fno-strict-aliasing to work around this problem.
> +# Since there is no reliable feature check for the presence of this bug,
> +# we simply use a GCC version number check.  (This should never trigger for
> +# stages 2 or 3 of a native bootstrap.)
> +aliasing_flags=
> +if test "$GCC" = yes; then
> +  saved_CXXFLAGS="$CXXFLAGS"
> +
> +  # The following test compilation will succeed if and only

Re: [PATCH] -Wtautological-compare should be quiet on floats

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 04:08:22PM +0200, Marek Polacek wrote:
> As discussed elsewhere, -Wtautological-compare shouldn't warn about
> floating-point types because of the way NaNs behave.
> 
> I've been meaning to commit this one as obvious, but I'm not sure
> whether I should also use HONOR_NANS or whether I can safely ignore
> that here.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-07-29  Marek Polacek  
> 
>   * c-common.c (warn_tautological_cmp): Bail for float types.
> 
>   * c-c++-common/Wtautological-compare-3.c: New test.
> 
> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
> index caa801e..9456729 100644
> --- gcc/c-family/c-common.c
> +++ gcc/c-family/c-common.c
> @@ -1910,6 +1910,12 @@ warn_tautological_cmp (location_t loc, enum tree_code 
> code, tree lhs, tree rhs)
>|| (CONVERT_EXPR_P (rhs) || TREE_CODE (rhs) == NON_LVALUE_EXPR))
>  return;
>  
> +  /* Don't warn if either LHS or RHS has an IEEE floating point-type.

Eh, make this "floating-point type".

> + It could be a NaN, and NaN never compares equal to anything, even
> + itself.  */
> +  if (FLOAT_TYPE_P (TREE_TYPE (lhs)) || FLOAT_TYPE_P (TREE_TYPE (rhs)))
> +return;
> +
>if (operand_equal_p (lhs, rhs, 0))
>  {
>/* Don't warn about array references with constant indices;
> diff --git gcc/testsuite/c-c++-common/Wtautological-compare-3.c 
> gcc/testsuite/c-c++-common/Wtautological-compare-3.c
> index e69de29..64807b0 100644
> --- gcc/testsuite/c-c++-common/Wtautological-compare-3.c
> +++ gcc/testsuite/c-c++-common/Wtautological-compare-3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wtautological-compare" } */
> +/* Test we don't warn for floats.  */
> +
> +struct S { double d; float f; };
> +
> +void
> +fn1 (int i, float f, double d, struct S *s, float *fp)
> +{
> +  if (f == f);
> +  if (f != f);
> +  if (d == d);
> +  if (d != d);
> +  if (fp[i] == fp[i]);
> +  if (fp[i] != fp[i]);
> +  if (s->f == s->f);
> +  if (s->f != s->f);
> +  if (s->d == s->d);
> +  if (s->d != s->d);
> +}
> 
>   Marek

Marek


[PATCH] -Wtautological-compare should be quiet on floats

2015-07-29 Thread Marek Polacek
As discussed elsewhere, -Wtautological-compare shouldn't warn about
floating-point types because of the way NaNs behave.

I've been meaning to commit this one as obvious, but I'm not sure
whether I should also use HONOR_NANS or whether I can safely ignore
that here.
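
For reference, a tiny standalone example of the NaN behaviour in question
(illustration only, not part of the patch):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  double d = NAN;
  /* Prints "0 1": a NaN compares unequal to everything, even itself,
     so x == x is not a tautology for floating-point operands.  */
  printf ("%d %d\n", d == d, d != d);
  return 0;
}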

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-29  Marek Polacek  

* c-common.c (warn_tautological_cmp): Bail for float types.

* c-c++-common/Wtautological-compare-3.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index caa801e..9456729 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1910,6 +1910,12 @@ warn_tautological_cmp (location_t loc, enum tree_code 
code, tree lhs, tree rhs)
   || (CONVERT_EXPR_P (rhs) || TREE_CODE (rhs) == NON_LVALUE_EXPR))
 return;
 
+  /* Don't warn if either LHS or RHS has an IEEE floating point-type.
+ It could be a NaN, and NaN never compares equal to anything, even
+ itself.  */
+  if (FLOAT_TYPE_P (TREE_TYPE (lhs)) || FLOAT_TYPE_P (TREE_TYPE (rhs)))
+return;
+
   if (operand_equal_p (lhs, rhs, 0))
 {
   /* Don't warn about array references with constant indices;
diff --git gcc/testsuite/c-c++-common/Wtautological-compare-3.c 
gcc/testsuite/c-c++-common/Wtautological-compare-3.c
index e69de29..64807b0 100644
--- gcc/testsuite/c-c++-common/Wtautological-compare-3.c
+++ gcc/testsuite/c-c++-common/Wtautological-compare-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Wtautological-compare" } */
+/* Test we don't warn for floats.  */
+
+struct S { double d; float f; };
+
+void
+fn1 (int i, float f, double d, struct S *s, float *fp)
+{
+  if (f == f);
+  if (f != f);
+  if (d == d);
+  if (d != d);
+  if (fp[i] == fp[i]);
+  if (fp[i] != fp[i]);
+  if (s->f == s->f);
+  if (s->f != s->f);
+  if (s->d == s->d);
+  if (s->d != s->d);
+}

Marek


[PATCH] Work around host compiler placement new aliasing bug

2015-07-29 Thread Ulrich Weigand
Hello,

this patch is a workaround for the problem discussed here:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html

The problem is that the new pool allocator code relies on C++ aliasing
rules related to placement new (basically, that placement new changes
the dynamic type of the referenced memory).  GCC compilers prior to
version 4.3 did not implement this rule correctly (PR 29286).
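
A minimal standalone sketch of the kind of storage reuse this rule covers
(illustration only, not the pool allocator code): after the second placement
new, the storage's dynamic type is B, so the read of b->y must not be
satisfied from state cached under A's type.

#include <new>
#include <cstdio>

struct A { int x; };
struct B { int y; };

static int
reuse_storage (void *p)
{
  A *a = new (p) A;   /* storage now holds an A */
  a->x = 1;
  a->~A ();
  B *b = new (p) B;   /* placement new changes the dynamic type to B */
  b->y = 2;
  return b->y;        /* must be 2; an optimizer that wrongly assumes *a and
                         *b can never alias may produce a stale value */
}

int
main ()
{
  void *p = ::operator new (sizeof (A) > sizeof (B) ? sizeof (A) : sizeof (B));
  std::printf ("%d\n", reuse_storage (p));
  ::operator delete (p);
  return 0;
}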

When building current GCC with a host compiler that is affected by this
bug, and we build with optimization enabled (this typically only happens
when building a cross-compiler), the resulting compiler binary may be
miscompiled.

The patch below attempts to detect this situation by checking whether
the host compiler is a version of GCC prior to 4.3 (but still accepts
the -fno-strict-aliasing flag).  If so, -fno-strict-aliasing is added
to the flags when building the compiler binary.

Tested on i686-linux, and when building an SPU cross-compiler using
a gcc 4.1 powerpc64-linux host compiler.

OK for mainline?

Bye,
Ulrich

gcc/ChangeLog:

* configure.ac: Set aliasing_flags to -fno-strict-aliasing if
the host compiler is affected by placement new aliasing bug.
* configure: Regenerate.
* Makefile.in (ALIASING_FLAGS): New variable.
(ALL_CXXFLAGS): Add $(ALIASING_FLAGS).

Index: gcc/configure
===
--- gcc/configure   (revision 226312)
+++ gcc/configure   (working copy)
@@ -789,6 +789,7 @@ c_strict_warn
 strict_warn
 c_loose_warn
 loose_warn
+aliasing_flags
 CPP
 EGREP
 GREP
@@ -6526,6 +6527,42 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 
+# Check whether compiler is affected by placement new aliasing bug (PR 29286).
+# If the host compiler is affected by the bug, and we build with optimization
+# enabled (which happens e.g. when cross-compiling), the pool allocator may
+# get miscompiled.  Use -fno-strict-aliasing to work around this problem.
+# Since there is no reliable feature check for the presence of this bug,
+# we simply use a GCC version number check.  (This should never trigger for
+# stages 2 or 3 of a native bootstrap.)
+aliasing_flags=
+if test "$GCC" = yes; then
+  saved_CXXFLAGS="$CXXFLAGS"
+
+  # The following test compilation will succeed if and only if $CXX accepts
+  # -fno-strict-aliasing *and* is older than GCC 4.3.
+  CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX is affected 
by placement new aliasing bug" >&5
+$as_echo_n "checking whether $CXX is affected by placement new aliasing bug... 
" >&6; }
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
+#error compiler not affected by placement new aliasing bug
+#endif
+
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }; aliasing_flags='-fno-strict-aliasing'
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+  CXXFLAGS="$saved_CXXFLAGS"
+fi
+
 
 
 
@@ -18301,7 +18338,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18304 "configure"
+#line 18341 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18407,7 +18444,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18410 "configure"
+#line 18447 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gcc/configure.ac
===
--- gcc/configure.ac(revision 226312)
+++ gcc/configure.ac(working copy)
@@ -416,6 +416,32 @@ struct X { typedef long long 
 ]], [[X::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses long 
long])])
 fi
 
+# Check whether compiler is affected by placement new aliasing bug (PR 29286).
+# If the host compiler is affected by the bug, and we build with optimization
+# enabled (which happens e.g. when cross-compiling), the pool allocator may
+# get miscompiled.  Use -fno-strict-aliasing to work around this problem.
+# Since there is no reliable feature check for the presence of this bug,
+# we simply use a GCC version number check.  (This should never trigger for
+# stages 2 or 3 of a native bootstrap.)
+aliasing_flags=
+if test "$GCC" = yes; then
+  saved_CXXFLAGS="$CXXFLAGS"
+
+  # The following test compilation will succeed if and only if $CXX accepts
+  # -fno-strict-aliasing *and* is older than GCC 4.3.
+  CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
+  AC_MSG_CHECKING([whether $CXX is affected by placement new aliasing bug])
+  AC_COMPILE_IFELSE([
+#if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
+#error compiler not affected by placement new aliasing bug
+#

[PATCH][RTL-ifcvt] Improve conditional select ops on immediates

2015-07-29 Thread Kyrill Tkachov

Hi all,

This patch improves RTL if-conversion on sequences that perform a conditional 
select on integer constants.
Most of the smart logic to do that already exists in the 
noce_try_store_flag_constants function.
However, currently that function is tried after noce_try_cmove.
noce_try_cmove is a simple catch-all function that just loads the two 
immediates and performs a conditional
select between them. It returns true and then the caller noce_process_if_block 
doesn't try any other transformations,
completely skipping the more aggressive transformations that 
noce_try_store_flag_constants allows!

Calling noce_try_store_flag_constants before noce_try_cmove allows for the 
smarter if-conversion transformations
to be used. An example that occurs a lot in the gcc code itself is for the C 
code:
int
foo (int a, int b)
{
  return ((a & (1 << 25)) ? 5 : 4);
}

i.e. test a bit in a and return 5 or 4. Currently on aarch64 this generates the 
naive:
and w2, w0, 33554432  // mask away all bits except bit 25
mov w1, 4
cmp w2, wzr
mov w0, 5
csel w0, w0, w1, ne


whereas with this patch this can be transformed into the much better:
ubfx x0, x0, 25, 1  // extract bit 25
add w0, w0, 4

Another issue I encountered is that the code that claims to perform the 
transformation:
  /* if (test) x = 3; else x = 4;
 =>   x = 3 + (test == 0);  */

doesn't seem to do exactly that in all cases. In fact for that case it will try 
something like:
x = 4 - (test == 0)
which is suboptimal for targets like aarch64 which have a conditional increment 
operation.
This patch tweaks that code to always try to generate an addition of the 
condition rather than
a subtraction.
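
For reference, a quick standalone check of the identity being used here
(illustration only, not part of the patch):

#include <stdio.h>

int
main (void)
{
  for (int test = 0; test <= 1; test++)
    {
      int a = test ? 3 : 4;        /* original form */
      int b = 3 + (test == 0);     /* store-flag form */
      printf ("test=%d -> %d %d\n", test, a, b);
    }
  return 0;
}

Both forms agree for test in {0, 1}, which is the equivalence the
transformation relies on.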

Anyway, for code:
int
fooinc (int x)
{
  return x ? 1025 : 1026;
}

we currently generate:
mov w2, 1025
mov w1, 1026
cmp w0, wzr
csel w0, w2, w1, ne

whereas with this patch we will generate:
cmp w0, wzr
cset w0, eq
add w0, w0, 1025

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

P.S. noce_try_store_flag_constants is currently gated on 
!targetm.have_conditional_execution () but I don't see
any reason to restrict it on targets with conditional execution. For example, I 
think the first example above
would show a benefit on arm if it was enabled there. But that can be a separate 
investigation.

2015-07-29  Kyrylo Tkachov  

* ifcvt.c (noce_try_store_flag_constants): Reverse when diff is
-STORE_FLAG and condition is reversable.  Prefer to add to the
flag value.
(noce_process_if_block): Try noce_try_store_flag_constants before
noce_try_cmove.

2015-07-29  Kyrylo Tkachov  

* gcc.target/aarch64/csel_bfx_1.c: New test.
* gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d
Author: Kyrylo Tkachov 
Date:   Tue Jul 28 14:59:46 2015 +0100

[RTL-ifcvt] Improve conditional increment ops on immediates

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a57d78c..80d0285 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 
   reversep = 0;
   if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
-	normalize = 0;
+	normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse;
   else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	   && (STORE_FLAG_VALUE == 1
 		   || if_info->branch_cost >= 2))
@@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	 =>   x = 3 + (test == 0);  */
   if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
-	  target = expand_simple_binop (mode,
-	(diff == STORE_FLAG_VALUE
-	 ? PLUS : MINUS),
-	gen_int_mode (ifalse, mode), target,
+	  rtx_code code = reversep ? PLUS :
+(diff == STORE_FLAG_VALUE ? PLUS
+			   : MINUS);
+	  HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse;
+
+	  target = expand_simple_binop (mode, code,
+	gen_int_mode (to_add, mode), target,
 	if_info->x, 0, OPTAB_WIDEN);
 	}
 
@@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info)
 goto success;
   if (noce_try_abs (if_info))
 goto success;
+  if (!targetm.have_conditional_execution ()
+  && noce_try_store_flag_constants (if_info))
+goto success;
   if (HAVE_conditional_move
   && noce_try_cmove (if_info))
 goto success;
   if (! targetm.have_conditional_execution ())
 {
-  if (noce_try_store_flag_constants (if_info))
-	goto success;
   if (noce_try_addcc (if_info))
 	goto success;
   if (noce_try_store_flag_mask (if_info))
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
new file mode 100644
index 000..c20597f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_1.c
@@ -0,0 +1,11 @@
+/* { dg-do 

Re: [PATCH 12/15][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-07-29 Thread James Greenhalgh
On Tue, Jul 28, 2015 at 12:26:35PM +0100, Alan Lawrence wrote:
> commit 214fcc00475a543a79ed444f9a64061215397cc8
> Author: Alan Lawrence 
> Date:   Wed Jan 28 13:01:31 2015 +
> 
> AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing 
> bigendian indices)
> 
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 8bcab72..9869b73 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -361,11 +361,11 @@
>BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
>BUILTIN_VDQF (UNOP, abs, 2)
>  
> -  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
> +  VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)

Should this not use the appropriate "BUILTIN_..." iterator?

>VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
>VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
>  
> -  VAR1 (UNOP, float_extend_lo_, 0, v2df)
> +  VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf)

Likewise.

This looks OK to me with that fixed.

Thanks,
James



https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html

2015-07-29 Thread Kenneth Zadeck

I had the following conversation with richi about this patch.

Sorry to reply off thread, but i do net read this group in my mailer.


[09:00] <zadeck> richi: i am reviewing a patch and i have a couple of
questions, do you have a second to look at something?
[09:00] <richi> zadeck: sure
[09:01] <zadeck> the patch is
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html
[09:01] <zadeck> he has set up the data flow problem correctly, but i
worry that this really is not the right way to solve this problem.
[09:02] <richi> let me look at the problem
[09:03] <zadeck> in particular, the only way he can demonstrate this
problem in c is with an uninitialized variable.  It seems that a
normal correct program would not normally have this kind of issue
unless there was some upstream bug coming out of the middle end.
[09:04] <richi> even for his Ada case it's not initialized it seems
[09:06] <richi> zadeck: I've added a comment to the PR and requested
info
[09:06] <zadeck> my ada is as good as my German.
[09:07] <zadeck> the thing is that if you turn this into a truly must
problem, it will disable a lot of legit transformations.
[09:08] <richi> yep
[09:09] <zadeck> thanks
[09:09] <richi> lets see if he can point to a different issue
[09:09] <richi> or produce a C testcase for us
[09:13] <jakub> richi: yeah; if it is Ada only thing and Ada
uninitialized variable must have some special properties, then they'd
better use some on the side flag for whether it is initialized, or
zero initialize or whatever Ada expects
[09:15] <richi> jakub: from what I understand of Ada the testcase
doesn't look like such a case
[09:16] <richi> jakub: so the actual bug is likely somewhere else




Kenny


Re: [v3 PATCH] Implement N4280, Non-member size() and more (Revision 2)

2015-07-29 Thread Jonathan Wakely

On 15/07/15 02:49 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64.


Committed to trunk with some whitespace changes.



Re: [v3 PATCH] PR libstdc++/60970, implement LWG 2148

2015-07-29 Thread Jonathan Wakely

On 25/07/15 22:01 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64.

The proposed resolution of the issue doesn't really say whether our
regression test for PR libstdc++/52931 should remain valid. However,
it doesn't say that we shouldn't keep it valid, either. This approach
keeps it valid, but provides support for hashing enums. It took a while
to figure out suitable jiggery-pokery to make it so, but this approach
passes the testsuite without regressions. I considered an alternative
alias-template-based approach, but while that attempt would've worked
with our current front-end, it would not have worked on clang (and
it's thus far unclear whether it was intended to work by the language
rules).
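
Concretely, with this in place something like the following compiles (with
-std=gnu++11) and is consistent with hashing the underlying type; a minimal
usage sketch along the lines of the new test, not part of the patch:

#include <functional>

enum class E2 : int { THIRD = 42, FOURTH = 666 };

int
main ()
{
  /* std::hash<E2> now hashes via the underlying type, so this returns 0;
     previously this did not compile for enum types.  */
  return std::hash<E2>{}(E2::THIRD) == std::hash<int>{}(42) ? 0 : 1;
}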


Committed with a minor change to put is_enum<_Tp>::value as a default
template argument of __hash_enum and add a comment.
commit 7d23fa744a6eb0ee4d0df9f044b72cb172336915
Author: Jonathan Wakely 
Date:   Wed Jul 29 11:09:52 2015 +0100

2015-07-29  Ville Voutilainen  

	PR libstdc++/60970
	* include/bits/functional_hash.h (__hash_enum): New.
	(hash): Derive from __hash_enum.
	* testsuite/20_util/hash/60970.cc: New.

diff --git a/libstdc++-v3/include/bits/functional_hash.h b/libstdc++-v3/include/bits/functional_hash.h
index 3c962fc..88937bd 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -57,6 +57,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  template<typename _Tp>
 struct hash;
 
+  // Helper struct for SFINAE-poisoning non-enum types.
+  template<typename _Tp, bool = is_enum<_Tp>::value>
+struct __hash_enum
+{
+private:
+  // Private rather than deleted to be non-trivially-copyable.
+  __hash_enum(__hash_enum&&);
+  ~__hash_enum();
+};
+
+  // Helper struct for hash with enum types.
+  template<typename _Tp>
+    struct __hash_enum<_Tp, true> : public __hash_base<size_t, _Tp>
+{
+  size_t
+  operator()(_Tp __val) const noexcept
+  {
+   using __type = typename underlying_type<_Tp>::type;
+   return hash<__type>{}(static_cast<__type>(__val));
+  }
+};
+
+  /// Primary class template hash, usable for enum types only.
+  // Use with non-enum types still SFINAES.
+  template<typename _Tp>
+struct hash : __hash_enum<_Tp>
+{ };
+
   /// Partial specializations for pointer types.
  template<typename _Tp>
    struct hash<_Tp*> : public __hash_base<size_t, _Tp*>
diff --git a/libstdc++-v3/testsuite/20_util/hash/60970.cc b/libstdc++-v3/testsuite/20_util/hash/60970.cc
new file mode 100644
index 000..ddc626f
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/hash/60970.cc
@@ -0,0 +1,36 @@
+// { dg-options "-std=gnu++11" }
+// { dg-do run }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+
+#include <functional>
+#include <testsuite_hooks.h>
+
+using namespace std;
+
+enum E1 : int {FIRST=1, SECOND=2};
+enum class E2 : int {THIRD=42, FOURTH=666};
+
+int main()
+{
+  VERIFY(hash<int>{}(1) == hash<E1>{}(FIRST));
+  VERIFY(hash<int>{}(2) == hash<E1>{}(SECOND));
+  VERIFY(hash<int>{}(42) == hash<E2>{}(E2::THIRD));
+  VERIFY(hash<int>{}(666) == hash<E2>{}(E2::FOURTH));
+}


Re: [PATCH 11/15][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-07-29 Thread James Greenhalgh
On Tue, Jul 28, 2015 at 12:26:22PM +0100, Alan Lawrence wrote:
> gcc/ChangeLog:
> 
>   * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
>   vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
>   vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
>   vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
>   vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
>   vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
>   vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
>   vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
>   vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
>   vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
>   vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
>   vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
>   vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
>   vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
>   vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
>   vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
>   vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
>   vld1q_dup_f16): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
>   * gcc.target/aarch64/vget_low_1.c: Likewise.

> @@ -14871,6 +15171,13 @@ vld1q_u64 (const uint64_t *a)
>  
>  /* vld1_dup  */
>  
> +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
> +vld1_dup_f16 (const float16_t* __a)
> +{
> +  float16_t __f = *__a;
> +  return (float16x4_t) { __f, __f, __f, __f };
> +}
> +
>  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
>  vld1_dup_f32 (const float32_t* __a)
>  {
> @@ -14945,6 +15252,13 @@ vld1_dup_u64 (const uint64_t* __a)
>  
>  /* vld1q_dup  */
>  
> +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
> +vld1q_dup_f16 (const float16_t* __a)
> +{
> +  float16_t __f = *__a;
> +  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
> +}
> +

Did you check that these actually emit the expected instruction?

Applying your patch set I see some fairly unpleasant code generation,
but I might have made an error, or perhaps you have another patch in
waiting?

Thanks,
James



Re: [gomp4] Redesign oacc_parallel launch API

2015-07-29 Thread Nathan Sidwell

On 07/29/15 05:22, Thomas Schwinge wrote:

Hi Nathan!

On Tue, 28 Jul 2015 12:52:02 -0400, Nathan Sidwell  wrote:

I've committed this patch to the gomp4 branch to redo the launch API.  I'll post
a version for trunk once the versioning patch gets approved & committed.


Thanks!


(I have not yet looked at the patch in detail.)  There is one regression:

 PASS: libgomp.oacc-fortran/asyncwait-2.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/asyncwait-2.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test

 libgomp: Trying to map into device [0x10f7930..0x10f7a30) object when 
[0x10f7930..0x10f7a30) is already mapped

Likewise for the other torture testing flags.



Investigating ...  (I've seen those failures be intermittent)

nathan


[PATCH][1/2] Remove GENERIC comparison folding from fold_stmt

2015-07-29 Thread Richard Biener

This removes folding of conditions in GIMPLE_CONDs using fold_binary
from fold_stmt.  All cases appearing during bootstrap and regtest on
x86_64-unknown-linux-gnu are now handled by gimple_simplify and
match.pd patterns (remember this is just two bare operand cases).
I've verified this using the 2nd patch below which passes bootstrap &
regtest.

Boostrapped and tested on x86_64-unknown-linux-gnu.

[2/2] will do the same for comparisons on the RHS of assignments
(note fold_stmt never folded the comparison embedded in RHS
[VEC_]COND_EXPRs, only forward_propagate_into_cond does right now).

Richard.

2015-07-29  Richard Biener  

* gimple-fold.c (fold_gimple_cond): Remove.
(fold_stmt_1): Do not call it.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 226340)
+++ gcc/gimple-fold.c   (working copy)
@@ -529,33 +529,6 @@ fold_gimple_assign (gimple_stmt_iterator
   return NULL_TREE;
 }
 
-/* Attempt to fold a conditional statement. Return true if any changes were
-   made. We only attempt to fold the condition expression, and do not perform
-   any transformation that would require alteration of the cfg.  It is
-   assumed that the operands have been previously folded.  */
-
-static bool
-fold_gimple_cond (gcond *stmt)
-{
-  tree result = fold_binary_loc (gimple_location (stmt),
-gimple_cond_code (stmt),
- boolean_type_node,
- gimple_cond_lhs (stmt),
- gimple_cond_rhs (stmt));
-
-  if (result)
-{
-  STRIP_USELESS_TYPE_CONVERSION (result);
-  if (is_gimple_condexpr (result))
-{
-  gimple_cond_set_condition_from_tree (stmt, result);
-  return true;
-}
-}
-
-  return false;
-}
-
 
 /* Replace a statement at *SI_P with a sequence of statements in STMTS,
adjusting the replacement stmts location and virtual operands.
@@ -3711,10 +3684,6 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
break;
   }
 
-case GIMPLE_COND:
-  changed |= fold_gimple_cond (as_a  (stmt));
-  break;
-
 case GIMPLE_CALL:
   changed |= gimple_fold_call (gsi, inplace);
   break;


Testing patch:

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 226340)
+++ gcc/gimple-fold.c   (working copy)
@@ -548,6 +551,13 @@ fold_gimple_cond (gcond *stmt)
   STRIP_USELESS_TYPE_CONVERSION (result);
   if (is_gimple_condexpr (result))
 {
+ /* Folding changes 1 != 0 to 1 thus avoid false changed
+reporting if the condition didn't really change.  */
+ if (is_gimple_val (result)
+ && gimple_cond_code (stmt) == NE_EXPR
+ && integer_zerop (gimple_cond_rhs (stmt))
+ && operand_equal_p (gimple_cond_lhs (stmt), result, 0))
+   return false;
   gimple_cond_set_condition_from_tree (stmt, result);
   return true;
 }
@@ -3712,8 +3708,11 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
   }
 
 case GIMPLE_COND:
-  changed |= fold_gimple_cond (as_a  (stmt));
-  break;
+  {
+   bool cg = fold_gimple_cond (as_a  (stmt));
+   gcc_assert (!cg);
+   break;
+  }
 
 case GIMPLE_CALL:
   changed |= gimple_fold_call (gsi, inplace);


[PATCH][23/n] Remove GENERIC stmt combining from SCCVN

2015-07-29 Thread Richard Biener

This merges the recently introduced address comparison patterns
and makes them handle more cases in that process.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-29  Richard Biener  

* match.pd: Merge address comparison patterns and make them
handle some more cases.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 226344)
+++ gcc/match.pd(working copy)
@@ -1802,26 +1802,6 @@ (define_operator_list CBRT BUILT_IN_CBRT
   (cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2)
   (if (tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@0)))
(cmp @0 (bit_xor @1 (convert @2)
-   
- /* If this is an equality comparison of the address of two non-weak,
-unaliased symbols neither of which are extern (since we do not
-have access to attributes for externs), then we know the result.  */
- (simplify
-  (cmp (convert? addr@0) (convert? addr@1))
-  (if (DECL_P (TREE_OPERAND (@0, 0))
-   && DECL_P (TREE_OPERAND (@1, 0)))
-   (if (decl_in_symtab_p (TREE_OPERAND (@0, 0))
-   && decl_in_symtab_p (TREE_OPERAND (@1, 0)))
-(with
- {
-   int equal = symtab_node::get_create (TREE_OPERAND (@0, 0))
-   ->equal_address_to (symtab_node::get_create (TREE_OPERAND (@1, 0)));
- }
- (if (equal != 2)
-  { constant_boolean_node (equal
-  ? cmp == EQ_EXPR : cmp != EQ_EXPR, type); }))
-(if (TREE_OPERAND (@0, 0) != TREE_OPERAND (@1, 0))
- { constant_boolean_node (cmp == EQ_EXPR ? false : true, type); }
 
  (simplify
   (cmp (convert? addr@0) integer_zerop)
@@ -1834,7 +1814,7 @@ (define_operator_list CBRT BUILT_IN_CBRT
enough to make fold_stmt not regress when not dispatching to fold_binary.  
*/
 (for cmp (simple_comparison)
  (simplify
-  (cmp (convert? addr@0) (convert? addr@1))
+  (cmp (convert?@2 addr@0) (convert? addr@1))
   (with
{
  HOST_WIDE_INT off0, off1;
@@ -1851,23 +1831,48 @@ (define_operator_list CBRT BUILT_IN_CBRT
  base1 = TREE_OPERAND (base1, 0);
}
}
-   (if (base0 && base1
-   && operand_equal_p (base0, base1, 0)
-   && (cmp == EQ_EXPR || cmp == NE_EXPR
-   || POINTER_TYPE_OVERFLOW_UNDEFINED))
-(switch
- (if (cmp == EQ_EXPR)
-  { constant_boolean_node (off0 == off1, type); })
- (if (cmp == NE_EXPR)
-  { constant_boolean_node (off0 != off1, type); })
- (if (cmp == LT_EXPR)
-  { constant_boolean_node (off0 < off1, type); })
- (if (cmp == LE_EXPR)
-  { constant_boolean_node (off0 <= off1, type); })
- (if (cmp == GE_EXPR)
-  { constant_boolean_node (off0 >= off1, type); })
- (if (cmp == GT_EXPR)
-  { constant_boolean_node (off0 > off1, type); }))
+   (if (base0 && base1)
+(with
+ {
+   int equal;
+   if (decl_in_symtab_p (base0)
+  && decl_in_symtab_p (base1))
+ equal = symtab_node::get_create (base0)
+  ->equal_address_to (symtab_node::get_create (base1));
+   else
+ equal = operand_equal_p (base0, base1, 0);
+ }
+ (if (equal == 1
+ && (cmp == EQ_EXPR || cmp == NE_EXPR
+ /* If the offsets are equal we can ignore overflow.  */
+ || off0 == off1
+ || POINTER_TYPE_OVERFLOW_UNDEFINED
+ /* Or if we compare using pointers to decls.  */
+ || (POINTER_TYPE_P (TREE_TYPE (@2))
+ && DECL_P (base0
+  (switch
+   (if (cmp == EQ_EXPR)
+   { constant_boolean_node (off0 == off1, type); })
+   (if (cmp == NE_EXPR)
+   { constant_boolean_node (off0 != off1, type); })
+   (if (cmp == LT_EXPR)
+   { constant_boolean_node (off0 < off1, type); })
+   (if (cmp == LE_EXPR)
+   { constant_boolean_node (off0 <= off1, type); })
+   (if (cmp == GE_EXPR)
+   { constant_boolean_node (off0 >= off1, type); })
+   (if (cmp == GT_EXPR)
+   { constant_boolean_node (off0 > off1, type); }))
+  (if (equal == 0
+  && DECL_P (base0) && DECL_P (base1)
+  /* If we compare this as integers require equal offset.  */
+  && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
+  || off0 == off1))
+   (switch
+   (if (cmp == EQ_EXPR)
+{ constant_boolean_node (false, type); })
+   (if (cmp == NE_EXPR)
+{ constant_boolean_node (true, type); })
 
 /* Non-equality compare simplifications from fold_binary  */
 (for cmp (lt gt le ge)


Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-29 Thread Richard Biener
On Wed, Jul 29, 2015 at 1:59 PM, Mikael Morin  wrote:
> Le 29/07/2015 13:22, Richard Biener a écrit :
>>
>> On Wed, Jul 29, 2015 at 11:34 AM, Mikael Morin 
>> wrote:
>>>
>>> Le 29/07/2015 10:26, Richard Biener a écrit :
>>
>>
>> Did you try using vec_safe_splice?



 That handles NULL retargs, not NULL or empty arglist.

>>> I think retargs is NULL.
>>
>>
>> Not if the patch fixes anything.
>>
> The case retargs == NULL is the case arglen == 0, which means every vector
> pointer we are about to splice is NULL.
> So the patch fixes it.

Ok, that wasn't obvious from reading the patch.

Richard.


Re: [PATCH, PR66846] Mark inner loop for fixup in parloops

2015-07-29 Thread Richard Biener
On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries  wrote:
> On 28/07/15 12:11, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries 
>> wrote:
>>>
>>> On 20/07/15 15:04, Tom de Vries wrote:


 On 16/07/15 12:15, Richard Biener wrote:
>
>
> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>  wrote:
>>
>>
>> On 16/07/15 10:44, Richard Biener wrote:
>>>
>>>
>>>
>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries
>>> 
>>> wrote:



 Hi,

 I.

 In openmp expansion of loops, we make some effort to try to create
 matching
 loops in the loop state of the child function, f.i. in

 expand_omp_for_generic:
 ...
  struct loop *outer_loop;
  if (seq_loop)
outer_loop = l0_bb->loop_father;
  else
{
  outer_loop = alloc_loop ();
  outer_loop->header = l0_bb;
  outer_loop->latch = l2_bb;
  add_loop (outer_loop, l0_bb->loop_father);
}

  if (!gimple_omp_for_combined_p (fd->for_stmt))
{
  struct loop *loop = alloc_loop ();
  loop->header = l1_bb;
  /* The loop may have multiple latches.  */
  add_loop (loop, outer_loop);
}
 ...

 And if that doesn't work out, we try to mark the loop state for
 fixup, in
 expand_omp_taskreg and expand_omp_target:
 ...
  /* When the OMP expansion process cannot guarantee an
 up-to-date
 loop tree arrange for the child function to fixup
 loops.  */
  if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
 ...

 and expand_omp_for:
 ...
  else
/* If there isn't a continue then this is a degenerate case
 where
   the introduction of abnormal edges during lowering will
 prevent
   original loops from being detected.  Fix that up.  */
loops_state_set (LOOPS_NEED_FIXUP);
 ...

 However, loops are fixed up anyway, because the first pass we
 execute
 with
 the new child function is pass_fixup_cfg.

 The new child function contains a function call to
 __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
 execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
 subsequently
 the loops with LOOPS_NEED_FIXUP.


 II.

 This patch adds a verification that at the end of the omp-expand
 processing
 of the child function, either the loop structure is ok, or marked
 for
 fixup.

 This verification triggered a failure in parloops. When an outer
 loop is
 being parallelized, both the outer and inner loop are cancelled.
 Then
 during
 omp-expansion, we create a loop in the loop state for the outer
 loop (the
 one that is transformed), but not for the inner, which causes the
 verification failure:
 ...
 outer-1.c:11:3: error: loop with header 5 not in loop tree
 ...

 [ I ran into this verification failure with an openacc kernels
 testcase
 on
 the gomp-4_0-branch, where parloops is called additionally from a
 different
 location, and pass_fixup_cfg is not the first pass that the child
 function
 is processed by. ]

 The patch contains a bit that makes sure that the loop state of the
 child
 function is marked for fixup in parloops. The bit is non-trivial
 since it
 creates a loop state and sets the fixup flag on the loop state, but
 postpones
 the init_loops_structure call till move_sese_region_to_fn, where it
 can
 succeed.



 

> Can we fix the root-cause of the issue instead?  That is, build a
> valid loop
> structure in the first place?
>

 This patch manages to keep the loop structure, that is, to not cancel
 the loop tree in parloops, and guarantee a valid loop structure at the
 end of parloops.

 The transformation to insert the omp_for invalidates the loop state
 properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
 we drop those in parloops.

 In expand_omp_for_static_nochunk, we detect the existing loop struct of
 the omp_for, and keep it.

 Then by calling

Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-29 Thread Richard Biener
On Wed, Jul 29, 2015 at 1:22 PM, Tom de Vries  wrote:
> On 29/07/15 10:09, Richard Biener wrote:
>>
>> On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries 
>> wrote:
>>>
>>> On 28/07/15 09:59, Richard Biener wrote:


 On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries 

 wrote:
>
>
> Hi,
>
> this patch allows parallelization and vectorization of reduction
> operators
> that are guaranteed to not overflow (such as min and max operators),
> independent of the overflow behaviour of the type.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?



 Hmm, I don't like that no_overflow_tree_code function.  We have a much
 more
 clear understanding which codes may overflow or trap.  Thus please add
 a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED}
 like

>>>
>>> Done.
>>>
 bool
 operation_overflow_traps (tree type, enum tree_code code)
 {
 if (!ANY_INTEGRAL_TYPE_P (type)
>>>
>>>
>>>
>>> I've changed this test into a gcc_checking_assert.
>>>
>>>
|| !TYPE_OVERFLOW_TRAPS (type))
   return false;
 switch (code)
   {
   case PLUS_EXPR:
   case MINUS_EXPR:
   case MULT_EXPR:
   case LSHIFT_EXPR:
  /* Can overflow in various ways */
   case TRUNC_DIV_EXPR:
   case EXACT_DIV_EXPR:
   case FLOOR_DIV_EXPR:
   case CEIL_DIV_EXPR:
  /* For INT_MIN / -1 */
   case NEGATE_EXPR:
   case ABS_EXPR:
  /* For -INT_MIN */
  return true;
   default:
  return false;
  }
 }

 and similar variants for _wraps and _undefined.  I think we decided at
 some point
 the compiler should not take advantage of the fact that lshift or
 *_div have undefined
 behavior on signed integer overflow, similar we only take advantage of
 integral-type
 overflow behavior, not vector or complex.  So we could reduce the
 number of cases
 the functions return true if we document that it returns true only for
 the cases where
 the compiler needs to / may assume wrapping behavior does not take
 place.
 As for _traps for example we only have optabs and libfuncs for
 plus,minus,mult,negate
 and abs.
>>>
>>>
>>>
>>> I've tried to capture all of this in the three new functions:
>>> - operation_overflows_and_traps
>>> - operation_no_overflow_or_wraps
>>> - operation_overflows_and_undefined (unused atm)
>>>
>>> I've also added the graphite bit.
>>>
>>> OK for trunk, if bootstrap and reg-test succeeds?
>>
>>
>> +/* Returns true if CODE operating on operands of type TYPE can overflow,
>> and
>> +   fwrapv generates trapping insns for CODE.  */
>>
>> ftrapv
>>
>
> Done.
>
>> +bool
>> +operation_overflows_and_traps (tree type, enum tree_code code)
>> +{
>>
>> operation_overflow_traps
>>
>> is better wording.  Meaning that when the operation overflows then it
>> traps.
>>
>
> AFAIU, the purpose of the function is to enable optimizations when it
> returns false, that is:
> - if the operation doesn't overflow, or
> - if the operation overflows, but doesn't trap.
>
> The name operation_overflow_traps does not make clear what it returns when
> the operation doesn't overflow. If the name doesn't make it clear, you need
> to be conservative, that is, return true. Which defies the purpose of the
> function.
>
> I've changed the name to operation_no_trapping_overflow (and inverted logic
> in the function).
>
> But perhaps you want operation_overflow_traps with a conservative return for
> non-overflow operations, and use it like this:
> ...
>   else if (INTEGRAL_TYPE_P (type) && check_reduction)
> {
>   if (operation_overflows (type, code)
>   && operation_overflow_traps (type, code))
> {
>   /* Changing the order of operations changes the semantics.  */
> ...
> ?

I think operation_no_trapping_overflow has the same wording issue as
operation_overflow_traps but I'm not a native speaker so I'll take your
word that operation_no_trapping_overflow is non-ambiguous iff the
operation cannot overflow.

And no, I didn't mean to use it in combination with operation_overflows.

>> +  /* We don't take advantage of integral type overflow behaviour for
>> complex and
>> + vector types.  */
>>
>> We don't generate instructions that trap on overflow for complex or vector
>> types
>>
>
> Done.
>
>> +  if (!INTEGRAL_TYPE_P (type))
>> +return true;
>>
>> +  switch (code)
>> +{
>> +case PLUS_EXPR:
>> +case MINUS_EXPR:
>> +case MULT_EXPR:
>> +case LSHIFT_EXPR:
>> +  /* Can overflow in various ways.  */
>>
>> we don't have a trapping optab for lshift
>>
>> +case TRUNC_DIV_EXPR:
>> +case EXACT_DIV_EXPR:
>> +case FLOOR_DIV_EXPR:
>> +case CEIL_DIV_EXPR:
>>
>> nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return
>

Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-29 Thread Mikael Morin

Le 29/07/2015 13:22, Richard Biener a écrit :

On Wed, Jul 29, 2015 at 11:34 AM, Mikael Morin  wrote:

Le 29/07/2015 10:26, Richard Biener a écrit :


Did you try using vec_safe_splice?



That handles NULL retargs, not NULL or empty arglist.


I think retargs is NULL.


Not if the patch fixes anything.

The case retargs == NULL is the case arglen == 0, which means every 
vector pointer we are about to splice is NULL.

So the patch fixes it.


[PATCH, libgfortran]: Avoid left shift of negative value warning

2015-07-29 Thread Uros Bizjak
Attached patch rewrites the GFC_DTYPE_SIZE_MASK definition to avoid a "left
shift of negative value" warning during the libgfortran build.

2015-07-29  Uros Bizjak  

PR libgfortran/66650
* libgfortran.h (GFC_DTYPE_SIZE_MASK): Rewrite to avoid
"left shift of negative value" warning.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and release branches?

Uros.

---
Index: libgfortran.h
===
--- libgfortran.h   (revision 226339)
+++ libgfortran.h   (working copy)
@@ -404,8 +404,7 @@

/* Macros to get both the size and the type with a single masking operation  */

-#define GFC_DTYPE_SIZE_MASK \
-  ((~((index_type) 0) >> GFC_DTYPE_SIZE_SHIFT) << GFC_DTYPE_SIZE_SHIFT)
+#define GFC_DTYPE_SIZE_MASK (-((index_type) 1 << GFC_DTYPE_SIZE_SHIFT))
#define GFC_DTYPE_TYPE_SIZE_MASK (GFC_DTYPE_SIZE_MASK | GFC_DTYPE_TYPE_MASK)

#define GFC_DTYPE_TYPE_SIZE(desc) ((desc)->dtype & GFC_DTYPE_TYPE_SIZE_MASK)
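
Both definitions produce the same "all ones at and above GFC_DTYPE_SIZE_SHIFT"
mask on two's-complement targets.  A quick standalone check (a sketch: the
64-bit index_type and the shift value of 6 are assumptions, and the old form
is evaluated through an unsigned type here precisely to sidestep the warning
the patch removes):

#include <assert.h>
#include <stdint.h>

typedef int64_t index_type;
#define GFC_DTYPE_SIZE_SHIFT 6

int
main (void)
{
  /* Old definition, computed without any signed-shift issues.  */
  index_type old_mask
    = (index_type) ((~(uint64_t) 0 >> GFC_DTYPE_SIZE_SHIFT)
                    << GFC_DTYPE_SIZE_SHIFT);
  /* New definition from the patch.  */
  index_type new_mask = -((index_type) 1 << GFC_DTYPE_SIZE_SHIFT);
  assert (old_mask == new_mask);
  return 0;
}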


Re: [PATCH, PR66846] Mark inner loop for fixup in parloops

2015-07-29 Thread Tom de Vries

On 28/07/15 12:11, Richard Biener wrote:

On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries  wrote:

On 20/07/15 15:04, Tom de Vries wrote:


On 16/07/15 12:15, Richard Biener wrote:


On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
 wrote:


On 16/07/15 10:44, Richard Biener wrote:



On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries 
wrote:



Hi,

I.

In openmp expansion of loops, we make some effort to try to create
matching
loops in the loop state of the child function, f.i. in
expand_omp_for_generic:
...
 struct loop *outer_loop;
 if (seq_loop)
   outer_loop = l0_bb->loop_father;
 else
   {
 outer_loop = alloc_loop ();
 outer_loop->header = l0_bb;
 outer_loop->latch = l2_bb;
 add_loop (outer_loop, l0_bb->loop_father);
   }

 if (!gimple_omp_for_combined_p (fd->for_stmt))
   {
 struct loop *loop = alloc_loop ();
 loop->header = l1_bb;
 /* The loop may have multiple latches.  */
 add_loop (loop, outer_loop);
   }
...

And if that doesn't work out, we try to mark the loop state for
fixup, in
expand_omp_taskreg and expand_omp_target:
...
 /* When the OMP expansion process cannot guarantee an
up-to-date
loop tree arrange for the child function to fixup
loops.  */
 if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
   child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
...

and expand_omp_for:
...
 else
   /* If there isn't a continue then this is a degenerate case where
  the introduction of abnormal edges during lowering will
prevent
  original loops from being detected.  Fix that up.  */
   loops_state_set (LOOPS_NEED_FIXUP);
...

However, loops are fixed up anyway, because the first pass we execute
with
the new child function is pass_fixup_cfg.

The new child function contains a function call to
__builtin_omp_get_num_threads, which is marked with ECF_CONST, so
execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
subsequently
the loops with LOOPS_NEED_FIXUP.


II.

This patch adds a verification that at the end of the omp-expand
processing
of the child function, either the loop structure is ok, or marked for
fixup.

This verification triggered a failure in parloops. When an outer
loop is
being parallelized, both the outer and inner loop are cancelled. Then
during
omp-expansion, we create a loop in the loop state for the outer
loop (the
one that is transformed), but not for the inner, which causes the
verification failure:
...
outer-1.c:11:3: error: loop with header 5 not in loop tree
...

[ I ran into this verification failure with an openacc kernels
testcase
on
the gomp-4_0-branch, where parloops is called additionally from a
different
location, and pass_fixup_cfg is not the first pass that the child
function
is processed by. ]

The patch contains a bit that makes sure that the loop state of the
child
function is marked for fixup in parloops. The bit is non-trivial
since it
creates a loop state and sets the fixup flag on the loop state, but
postpones
the init_loops_structure call till move_sese_region_to_fn, where it
can
succeed.







Can we fix the root-cause of the issue instead?  That is, build a
valid loop
structure in the first place?



This patch manages to keep the loop structure, that is, to not cancel
the loop tree in parloops, and guarantee a valid loop structure at the
end of parloops.

The transformation to insert the omp_for invalidates the loop state
properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
we drop those in parloops.

In expand_omp_for_static_nochunk, we detect the existing loop struct of
the omp_for, and keep it.

Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
the loop state properties LOOPS_HAVE_RECORDED_EXITS and
LOOPS_HAVE_SIMPLE_LATCHES back.



This updated patch tries a more minimal approach.

Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
exit instead.

And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.

Bootstrapped and reg-tested on x86_64.

OK for trunk?


I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp).  The parloops code lacks comments and
the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?



It's not difficult to do. It's just that the omp-for that is generated 
via the normal route (omp-annotated source code) doesn't have this 
simple latch, and parloops just mimics that cfg shape.


Attached updated patch preserves simple latches in parloops. We need 
some minor adjustments in omp-expand to handle that.



If execute_expand_omp is not performed in a loop pipeline where 

Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-29 Thread Tom de Vries

On 29/07/15 10:09, Richard Biener wrote:

On Tue, Jul 28, 2015 at 2:08 PM, Tom de Vries  wrote:

On 28/07/15 09:59, Richard Biener wrote:


On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries 
wrote:


Hi,

this patch allows parallelization and vectorization of reduction
operators
that are guaranteed to not overflow (such as min and max operators),
independent of the overflow behaviour of the type.

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Hmm, I don't like that no_overflow_tree_code function.  We have a much
more
clear understanding which codes may overflow or trap.  Thus please add
a operation specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like



Done.


bool
operation_overflow_traps (tree type, enum tree_code code)
{
if (!ANY_INTEGRAL_TYPE_P (type)



I've changed this test into a gcc_checking_assert.



   || !TYPE_OVERFLOW_TRAPS (type))
  return false;
switch (code)
  {
  case PLUS_EXPR:
  case MINUS_EXPR:
  case MULT_EXPR:
  case LSHIFT_EXPR:
 /* Can overflow in various ways */
  case TRUNC_DIV_EXPR:
  case EXACT_DIV_EXPR:
  case FLOOR_DIV_EXPR:
  case CEIL_DIV_EXPR:
 /* For INT_MIN / -1 */
  case NEGATE_EXPR:
  case ABS_EXPR:
 /* For -INT_MIN */
 return true;
  default:
 return false;
 }
}

and similar variants for _wraps and _undefined.  I think we decided at
some point
the compiler should not take advantage of the fact that lshift or
*_div have undefined
behavior on signed integer overflow, similar we only take advantage of
integral-type
overflow behavior, not vector or complex.  So we could reduce the
number of cases
the functions return true if we document that it returns true only for
the cases where
the compiler needs to / may assume wrapping behavior does not take place.
As for _traps for example we only have optabs and libfuncs for
plus,minus,mult,negate
and abs.



I've tried to capture all of this in the three new functions:
- operation_overflows_and_traps
- operation_no_overflow_or_wraps
- operation_overflows_and_undefined (unused atm)

I've also added the graphite bit.

OK for trunk, if bootstrap and reg-test succeeds?


+/* Returns true if CODE operating on operands of type TYPE can overflow, and
+   fwrapv generates trapping insns for CODE.  */

ftrapv



Done.


+bool
+operation_overflows_and_traps (tree type, enum tree_code code)
+{

operation_overflow_traps

is better wording.  Meaning that when the operation overflows then it traps.



AFAIU, the purpose of the function is to enable optimizations when it 
returns false, that is:

- if the operation doesn't overflow, or
- if the operation overflows, but doesn't trap.

The name operation_overflow_traps does not make clear what it returns 
when the operation doesn't overflow. If the name doesn't make it clear, 
you need to be conservative, that is, return true. Which defies the 
purpose of the function.


I've changed the name to operation_no_trapping_overflow (and inverted 
logic in the function).


But perhaps you want operation_overflow_traps with a conservative return 
for non-overflow operations, and use it like this:

...
  else if (INTEGRAL_TYPE_P (type) && check_reduction)
{
  if (operation_overflows (type, code)
  && operation_overflow_traps (type, code))
{
  /* Changing the order of operations changes the semantics.  */
...
?


+  /* We don't take advantage of integral type overflow behaviour for
complex and
+ vector types.  */

We don't generate instructions that trap on overflow for complex or vector types



Done.


+  if (!INTEGRAL_TYPE_P (type))
+return true;

+  switch (code)
+{
+case PLUS_EXPR:
+case MINUS_EXPR:
+case MULT_EXPR:
+case LSHIFT_EXPR:
+  /* Can overflow in various ways.  */

we don't have a trapping optab for lshift

+case TRUNC_DIV_EXPR:
+case EXACT_DIV_EXPR:
+case FLOOR_DIV_EXPR:
+case CEIL_DIV_EXPR:

nor division.  See optabs.c:optab_for_tree_code.  I suggest to only return true
for those.



Before the logic inversion, we return false for these (And also for 
operators that do not overflow).



+/* Returns true if CODE operating on operands of type TYPE cannot overflow, or
+   wraps on overflow.  */
+
+bool
+operation_no_overflow_or_wraps (tree type, enum tree_code code)
+{
+  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));

operation_overflow_wraps ()

is still my preferred name.



The name operation_overflow_wraps doesn't make clear what it returns if 
the operation doesn't overflow. And I didn't manage to come up with a 
better name so far.


Btw, I wonder about something like vector max operation. The current 
implementation of operation_no_overflow_or_wraps returns false. Could we do:

...
 /* We don't take advantage of integral type overflow behaviour for
complex and vector types.  */
  if (!INTEGRAL_TYPE_P (type))
return !operation_overflows (type, code);
...
?


+bool
+operati

Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-29 Thread Richard Biener
On Wed, Jul 29, 2015 at 11:34 AM, Mikael Morin  wrote:
> Le 29/07/2015 10:26, Richard Biener a écrit :

 Did you try using vec_safe_splice?
>>
>>
>> That handles NULL retargs, not NULL or empty arglist.
>>
> I think retargs is NULL.

Not if the patch fixes anything.

Richard.


Re: [PATCH 10/15][AArch64] Implement vcvt_{,high_}f16_f32

2015-07-29 Thread James Greenhalgh
On Wed, Jul 29, 2015 at 10:10:09AM +0100, Alan Lawrence wrote:
> James Greenhalgh wrote:
> > On Tue, Jul 28, 2015 at 12:26:09PM +0100, Alan Lawrence wrote:
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf):
> >>Reparameterize to...
> >>(aarch64_float_truncate_lo_): ...this, for both V2SF and V4HF.
> >>(aarch64_float_truncate_hi_v4sf): Reparameterize to...
> >>(aarch64_float_truncate_hi_): ...this, for both V4SF and V8HF.
> >>
> >>* config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add
> >>v8hf variant.
> >>(float_truncate_lo_): Use BUILTIN_VDF iterator.
> >>
> >>* config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New.
> >>
> >>* config/aarch64/iterators.md (VDF, Vdtype): New.
> >>(VWIDE, Vmwtype): Add cases for V4HF and V2SF.
> >>
> > 
> > Hi Alan,
> > 
> > I don't see a patch attached to this one, could you repost with the intended
> > patch for review please?
> > 
> > Thanks,
> > James
> > 
> 
> Oops, sorry, here it is. (FWIW, not changed since previous version of series.)

OK.

Thanks,
James



[PATCH] Handle bogus captures by erroring

2015-07-29 Thread Richard Biener

This makes us error on

(simplify
 (plus @0 integer_zerop)
 @1)

rather than silently emitting "uninitialized" code.  Likewise it
avoids crashing on

(simplify
 (plus @0 @1)
 (if (@2)
  @0))
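
(For contrast, a well-formed version of the first pattern captures the operand
it returns -- a sketch of what was presumably intended:)

(simplify
 (plus @0 integer_zerop)
 @0)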

Bootstrap pending on x86_64-unknown-linux-gnu.

Richard.

2015-07-29  Richard Biener  

* genmatch.c (c_expr::gen_transform): Error on unknown captures.
(parser::parse_capture): Add bool argument on whether to reject
unknown captures.
(parser::parse_expr): Adjust.
(parser::parse_op): Likewise.
(parser::parse_pattern): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226339)
+++ gcc/genmatch.c  (working copy)
@@ -2048,7 +2061,10 @@ c_expr::gen_transform (FILE *f, int inde
id = (const char *)n->val.str.text;
  else
id = (const char *)CPP_HASHNODE (n->val.node.node)->ident.str;
- fprintf (f, "captures[%u]", *capture_ids->get(id));
+ unsigned *cid = capture_ids->get (id);
+ if (!cid)
+   fatal_at (token, "unknown capture id");
+ fprintf (f, "captures[%u]", *cid);
  ++i;
  continue;
}
@@ -3153,7 +3197,7 @@ private:
   const char *get_number ();
 
   id_base *parse_operation ();
-  operand *parse_capture (operand *);
+  operand *parse_capture (operand *, bool);
   operand *parse_expr ();
   c_expr *parse_c_expr (cpp_ttype);
   operand *parse_op ();
@@ -3383,7 +3427,7 @@ parser::parse_operation ()
  capture = '@'  */
 
 struct operand *
-parser::parse_capture (operand *op)
+parser::parse_capture (operand *op, bool require_existing)
 {
   source_location src_loc = eat_token (CPP_ATSIGN)->src_loc;
   const cpp_token *token = peek ();
@@ -3398,7 +3442,11 @@ parser::parse_capture (operand *op)
   bool existed;
   unsigned &num = capture_ids->get_or_insert (id, &existed);
   if (!existed)
-num = next_id;
+{
+  if (require_existing)
+   fatal_at (src_loc, "unknown capture id");
+  num = next_id;
+}
   return new capture (src_loc, num, op);
 }
 
@@ -3452,7 +3500,7 @@ parser::parse_expr ()
 
   if (token->type == CPP_ATSIGN
   && !(token->flags & PREV_WHITE))
-op = parse_capture (e);
+op = parse_capture (e, !parsing_match_operand);
   else if (force_capture)
 {
   unsigned num = capture_ids->elements ();
@@ -3604,7 +3652,7 @@ parser::parse_op ()
   if (token->type == CPP_COLON)
fatal_at (token, "not implemented: predicate on leaf operand");
   if (token->type == CPP_ATSIGN)
-   op = parse_capture (op);
+   op = parse_capture (op, !parsing_match_operand);
 }
 
   return op;
@@ -4074,7 +4122,7 @@ parser::parse_pattern ()
  capture_ids = new cid_map_t;
  e = new expr (p, e_loc);
  while (peek ()->type == CPP_ATSIGN)
-   e->append_op (parse_capture (NULL));
+   e->append_op (parse_capture (NULL, false));
  eat_token (CPP_CLOSE_PAREN);
}
   if (p->nargs != -1



Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-29 Thread Marek Polacek
On Wed, Jul 29, 2015 at 10:59:43AM +0100, Richard Earnshaw wrote:
> On 22/07/15 19:43, Martin Sebor wrote:
> > On 07/14/2015 09:18 AM, Marek Polacek wrote:
> >> Code such as "if (i == i)" is hardly ever desirable, so we should be able
> >> to warn about this to prevent dumb mistakes.
> > 
> > I haven't tried the patch or even studied it very carefully but
> > I wonder if this is also the case when i is declared volatile.
> > I.e., do we want to issue a warning there? (If we do, the text
> > of the warning would need to be adjusted in those cases since
> > the expression need not evaluate to true.)
> > 
> > Martin
> > 
> 
> It's also not true if i is an IEEE floating point type with a NaN value.
>  In that case this is a standard idiom for testing for a NaN.

Yep.  Steve already raised this yesterday:

I'm going to fix it.

Thanks,

Marek


Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2015-07-29 Thread Kyrill Tkachov

Hi Prathamesh,

This is probably not appropriate for -Os optimisation.
And for speed optimisation I imagine it can vary a lot depending on the target
the code is run on.
Do you have any benchmark results for this patch?

Thanks,
Kyrill

On 29/07/15 11:09, Prathamesh Kulkarni wrote:

Hi,
This patch tries to implement division with multiplication by
reciprocal using vrecpe/vrecps
with -funsafe-math-optimizations and -freciprocal-math enabled.
Tested on arm-none-linux-gnueabihf using qemu.
OK for trunk ?

Thank you,
Prathamesh

+/* Perform 2 iterations of Newton-Raphson method for better accuracy */
+for (int i = 0; i < 2; i++)
+  {
+emit_insn (gen_neon_vrecps (vrecps_temp, rec, operands[2]));
+emit_insn (gen_mul3 (rec, rec, vrecps_temp));
+  }
+
+/* We now have reciprocal in rec, perform operands[0] = operands[1] * rec 
*/
+emit_insn (gen_mul3 (operands[0], operands[1], rec));
+DONE;
+  }
+)
+

Full stop and two spaces at the end of the comments.



Re: [PATCH 8/15][AArch64] Add support for float14x{4,8}_t vectors/builtins

2015-07-29 Thread James Greenhalgh
On Tue, Jul 28, 2015 at 12:25:40PM +0100, Alan Lawrence wrote:
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support
>   V4HFmode and V8HFmode.
>   (aarch64_split_simd_move): Add case for V8HFmode.
>   * config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define.
>   (aarch64_simd_builtin_std_type): Handle HFmode.
>   (aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t.
> 
>   * config/aarch64/aarch64-simd.md (mov, aarch64_get_lane,
>   aarch64_ld1, aarch64_st1   (aarch64_be_ld1, aarch64_be_st1): Use VALLDI_F16 iterator.
> 
>   * config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t,
>   Float16x8_t.
> 
>   * config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16.
>   * config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t):
>   New typedefs.
>   (vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16,
>   vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16,
>   vst1q_lane_f16): New.
>   * config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode.
>   (VALLDI_F16, VALL_F16): New.
>   (Vmtype, VEL, VCONQ, VHALF, VRL3, VRL4, V_TWO_ELEM, V_THREE_ELEM,
>   V_FOUR_ELEM, q): Add cases for V4HF and V8HF.
>   (VDBL, VRL2): Add V4HF case.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and
>   float16x8_t.
>   * gcc.target/aarch64/vset_lane_1.c: Likewise.

>   * gcc.target/aarch64/vld1-vst1_1.c: Likewise, also missing float32x4_t.
>   * gcc.target/aarch64/vld1_lane.c: Remove unused constants; add cases
>   for float16x4_t and float16x8_t.

I'd have preferred the unrelated changes here as separate patches. If you
pull them out, they are OK to commit independent of this patch.

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index a6b351b..a7aaa52 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -97,12 +97,20 @@
>  ;; Vector Float modes with 2 elements.
>  (define_mode_iterator V2F [V2SF V2DF])
>  
> -;; All modes.
> +;; All vector modes on which we support any arithmetic operations.
>  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF 
> V2DF])
>  
> -;; All vector modes and DI.
> +;; All vector modes, including HF modes on which we cannot operate

The wording here is a bit off; we can operate on them - for a limited set
of operations (and you are missing a full stop). How
about something like:

  All vector modes suitable for moving, loading and storing.

> +(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> + V4HF V8HF V2SF V4SF V2DF])
> +
> +;; All vector modes barring F16, plus DI.

"barring HF modes" for consistency with the above comment.

>  (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF 
> V2DF DI])
>  
> +;; All vector modes and DI.
> +(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
> +   V4HF V8HF V2SF V4SF V2DF DI])
> +
>  ;; All vector modes and DI and DF.

Except HF modes.

>  (define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI")
> + (V4HF "V16HF")
>   (V2SI "V8SI")  (V2SF "V8SF")
>   (DI   "V4DI")  (DF   "V4DF")
>   (V16QI "V32QI") (V8HI "V16HI")

V8HF missing?

> @@ -549,16 +563,20 @@
>   (V2DI "V4DI")  (V2DF "V4DF")])
>  
>  (define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI")
> + (V4HF "V24HF")
>   (V2SI "V12SI")  (V2SF "V12SF")
>   (DI   "V6DI")  (DF   "V6DF")
>   (V16QI "V48QI") (V8HI "V24HI")
> + (V8HF "V48HF")

This should be V24HF?

>   (V4SI "V12SI")  (V4SF "V12SF")
>   (V2DI "V6DI")  (V2DF "V6DF")])
>  
>  (define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI")
> + (V4HF "V32HF")
>   (V2SI "V16SI")  (V2SF "V16SF")
>   (DI   "V8DI")  (DF   "V8DF")
>   (V16QI "V64QI") (V8HI "V32HI")
> + (V8HF "V32HF")
>   (V4SI "V16SI")  (V4SF "V16SF")
>   (V2DI "V8DI")  (V2DF "V8DF")])

Ah ok, I see what is going on here... None of these are actually used for
the 128-bit vector modes, so missing entries or incorrect entries for
V8HFmode don't matter.

However... We should do something consistent, so I think we should either
add the correct mappings for V8HF, or remove all 128-bit modes from these
three mode attributes.

Thanks,
James



Re: [PATCH 0/9] start converting POINTER_SIZE to a hook

2015-07-29 Thread Richard Earnshaw
On 27/07/15 04:10, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> Hi,
> 
> $subject.
> 
> patches individually bootstrapped + regtested on x86_64-linux-gnu, and run
> through config-list.mk with more patches removing usage of the macro.  Ok?
> 
> Trev
> 
> Trevor Saunders (9):
>   remove POINTER_SIZE_UNITS macro
>   add pointer_size target hook
>   target.h: change to use targetm.pointer_size instead of POINTER_SIZE
>   varasm.c: switch from POINTER_SIZE to targetm.pointer_size ()
>   ubsan.c: switch from POINTER_SIZE to targetm.pointer_size ()
>   tree-chkp.c: switch to targetm.pointer_size ()
>   stor-layout.c: switch to targetm.pointer_size ()
>   tree.c: switch to targetm.pointer_size ()
>   emit-rtl.c: switch to targetm.pointer_size ()
> 
>  gcc/c-family/c-cppbuiltin.c |  2 +-
>  gcc/defaults.h  |  3 ---
>  gcc/doc/tm.texi |  7 +++
>  gcc/doc/tm.texi.in  |  2 ++
>  gcc/dwarf2asm.c |  4 ++--
>  gcc/emit-rtl.c  |  5 +++--
>  gcc/lto/lto-object.c|  3 ++-
>  gcc/stor-layout.c   |  9 +
>  gcc/target.def  |  8 
>  gcc/target.h|  8 
>  gcc/targhooks.c |  8 
>  gcc/targhooks.h |  1 +
>  gcc/tree-chkp.c | 14 --
>  gcc/tree.c  |  3 ++-
>  gcc/ubsan.c |  3 ++-
>  gcc/varasm.c| 12 ++--
>  16 files changed, 65 insertions(+), 27 deletions(-)
> 

I'm getting a bit worried about the potential performance impact from
all these indirect function call hooks.  This is a good example of when
it's probably somewhat unnecessary.  I doubt that the compiler could
function correctly if this ever changed in the middle of a function.

It would be much better if targetm.pointer_size was a variable that
could be modified by back-end code on those few occasions when that
might be really necessary.
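
A sketch of what that variable-style alternative could look like (the names
and placement are assumptions; the real gcc_target structure is generated
from target.def):

/* In the target structure: a plain data member instead of a hook.  */
struct gcc_target
{
  /* ... existing function-pointer hooks ... */
  unsigned int pointer_size;   /* read directly, no indirect call */
};

/* Generic code then just reads a field:  */
unsigned int psize = targetm.pointer_size;

/* and a backend that really needs to change it does so explicitly,
   e.g. when switching subtargets:  */
targetm.pointer_size = 64;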

R.


[ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2015-07-29 Thread Prathamesh Kulkarni
Hi,
This patch tries to implement division with multiplication by
reciprocal using vrecpe/vrecps
with -funsafe-math-optimizations and -freciprocal-math enabled.
Tested on arm-none-linux-gnueabihf using qemu.
OK for trunk ?
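
The scalar computation the expander mimics is roughly the following (a sketch
only: initial_estimate stands in for vrecpe and is hypothetical, and vrecps is
assumed to compute 2 - a*b, i.e. the Newton-Raphson step for a reciprocal):

float
recip_divide (float num, float den)
{
  float rec = initial_estimate (den);   /* vrecpe: rough 1/den */
  for (int i = 0; i < 2; i++)
    rec = rec * (2.0f - den * rec);     /* vrecps step, then multiply */
  return num * rec;                     /* num / den ~= num * (1/den) */
}

Each step roughly doubles the number of correct bits, which is presumably why
two iterations after the initial estimate are enough for single precision.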

Thank you,
Prathamesh
2015-07-28  Prathamesh Kulkarni  
Charles Baylis  

* config/arm/neon.md (div3): New pattern.

testsuite/
* gcc.target/arm/vect-div-1.c: New test-case.
* gcc.target/arm/vect-div-2.c: Likewise.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..28c2e2a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -548,6 +548,32 @@
 (const_string "neon_mul_")))]
 )
 
+(define_expand "div3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+(div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
+ (match_operand:VCVTF 2 "s_register_operand" "w")))]
+  "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal_math"
+  {
+rtx rec = gen_reg_rtx (mode);
+rtx vrecps_temp = gen_reg_rtx (mode);
+
+/* Reciprocal estimate */
+emit_insn (gen_neon_vrecpe (rec, operands[2]));
+
+/* Perform 2 iterations of Newton-Raphson method for better accuracy */
+for (int i = 0; i < 2; i++)
+  {
+   emit_insn (gen_neon_vrecps (vrecps_temp, rec, operands[2]));
+   emit_insn (gen_mul3 (rec, rec, vrecps_temp));
+  }
+
+/* We now have reciprocal in rec, perform operands[0] = operands[1] * rec 
*/
+emit_insn (gen_mul3 (operands[0], operands[1], rec));
+DONE;
+  }
+)
+
+
 (define_insn "mul3add_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
 (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c 
b/gcc/testsuite/gcc.target/arm/vect-div-1.c
new file mode 100644
index 000..e562ef3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize 
-fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/vect-div-2.c 
b/gcc/testsuite/gcc.target/arm/vect-div-2.c
new file mode 100644
index 000..8e15d0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-div-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fno-reciprocal-math 
-ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */


Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-29 Thread Richard Earnshaw
On 22/07/15 19:43, Martin Sebor wrote:
> On 07/14/2015 09:18 AM, Marek Polacek wrote:
>> Code such as "if (i == i)" is hardly ever desirable, so we should be able
>> to warn about this to prevent dumb mistakes.
> 
> I haven't tried the patch or even studied it very carefully but
> I wonder if this is also the case when i is declared volatile.
> I.e., do we want to issue a warning there? (If we do, the text
> of the warning would need to be adjusted in those cases since
> the expression need not evaluate to true.)
> 
> Martin
> 

It's also not true if i is an IEEE floating point type with a NaN value.
 In that case this is a standard idiom for testing for a NaN.
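
The idiom in question, for reference (a sketch):

static inline int
value_is_nan (double x)
{
  /* For IEEE floating point, the comparison is false exactly when x is a
     NaN, so this is the usual self-comparison NaN test.  */
  return x != x;
}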

R.


Re: [PATCH 7/15][ARM/AArch64 Testsuite] Add basic fp16 tests

2015-07-29 Thread James Greenhalgh
On Tue, Jul 28, 2015 at 12:25:26PM +0100, Alan Lawrence wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/fp16/fp16.exp: New.
>   * gcc.target/aarch64/fp16/f16_convs_1.c: New.
>   * gcc.target/aarch64/fp16/f16_convs_2.c: New.

OK.

Thanks,
James




Re: [PATCH 6/15][AArch64] Add basic FP16 support

2015-07-29 Thread James Greenhalgh
On Tue, Jul 28, 2015 at 12:25:06PM +0100, Alan Lawrence wrote:
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
>   (aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.
> 
>   * config/aarch64/aarch64-modes.def: Add HFmode.
> 
>   * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
>   __ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP.
> 
>   * config/aarch64/aarch64.c (aarch64_init_libfuncs,
>   aarch64_promoted_type): New.
> 
>   (aarch64_float_const_representable_p): Disable HFmode.
>   (aarch64_mangle_type): Mangle half-precision floats to "Dh".
>   (TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
>   (TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.
> 
>   * config/aarch64/aarch64.md (mov): Include HFmode using GPF_F16.
>   (movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.
> 
>   * config/aarch64/iterators.md (GPF_F16): New.
> 
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/f16_movs_1.c: New test.

OK with some minor tweaks to some comments.

> diff --git a/gcc/config/aarch64/aarch64-modes.def 
> b/gcc/config/aarch64/aarch64-modes.def
> index b17b90d..c30059b 100644
> --- a/gcc/config/aarch64/aarch64-modes.def
> +++ b/gcc/config/aarch64/aarch64-modes.def
> @@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
>  CC_MODE (CC_DGEU);
>  CC_MODE (CC_DGTU);
>  
> +/* Half-precision floating point for arm_neon.h float16_t.  */
> +FLOAT_MODE (HF, 2, 0);
> +ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
> +

Minor technicality. This is for the __fp16 type, which arm_neon.h aliases
to float16_t. As __fp16 is the name given in the AAPCS64 for the type,
I'd prefer if we used that in the comments documenting the feature.

So, s/arm_neon.h float16_t/__fp16/

> +/* Implement TARGET_PROMOTED_TYPE to promote float16 to 32 bits.  */

Reword as something like:

To promote "__fp16" to "float"

>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index a22c6e4..44fe4f9 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -57,7 +57,9 @@
>if (TARGET_FLOAT) \
>  {   \
>builtin_define ("__ARM_FEATURE_FMA"); \
> -  builtin_define_with_int_value ("__ARM_FP", 0x0C); \
> +   builtin_define_with_int_value ("__ARM_FP", 0x0E); \
> +   builtin_define ("__ARM_FP16_FORMAT_IEEE");\
> +   builtin_define ("__ARM_FP16_ARGS");   \

You'll be in competition with Kyrill's changes for target attributes here,
he moves these all to a new file.

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 498358a..a6b351b 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -38,6 +38,9 @@
>  ;; Iterator for General Purpose Floating-point registers (32- and 64-bit 
> modes)
>  (define_mode_iterator GPF [SF DF])
>  
> +;; Iterator for General Purpose Float regs, inc float16_t.
> +(define_mode_iterator GPF_F16 [HF SF DF])
> +

s/float16_t/__fp16/

Thanks,
James



Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-29 Thread Mikael Morin

Le 29/07/2015 10:26, Richard Biener a écrit :

Did you try using vec_safe_splice?


That handles NULL retargs, not NULL or empty arglist.


I think retargs is NULL.



Re: [gomp4] Redesign oacc_parallel launch API

2015-07-29 Thread Thomas Schwinge
Hi Nathan!

On Tue, 28 Jul 2015 12:52:02 -0400, Nathan Sidwell  wrote:
> I've committed this patch to the gomp4 branch to redo the launch API.  I'll 
> post 
> a version for trunk once the versioning patch gets approved & committed.

Thanks!


(I have not yet looked at the patch in detail.)  There is one regression:

PASS: libgomp.oacc-fortran/asyncwait-2.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0  -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/asyncwait-2.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test

libgomp: Trying to map into device [0x10f7930..0x10f7a30) object when 
[0x10f7930..0x10f7a30) is already mapped

Likewise for the other torture testing flags.


Grüße,
 Thomas




Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-29 Thread Richard Biener
On Tue, Jul 28, 2015 at 7:57 PM, Abe  wrote:
> [Richard wrote:]
>>
>> Note the store to *pointer can be done unconditionally
>
>
> Yes; if I`m mapping things correctly in my mind, this is
> something that Sebastian [and Alan, via email?] and I have
> already discussed and which we plan to fix in good time.
>
> Please note that this is a minor problem at most,
> if/when it is safe to assume that the target can handle
> two vectorized conditional operations in the same loop,
> since anything remotely resembling an expensive
> operation in the [pure] condition should be [is being?]
> computed once per index and stored in a temporary.
> For example: if the source code looks something like:
>
>   if ( condition(index) )  A[index] = foo(index);
>   // important: no else here
>
> ... then the new converter currently converts it to something like:
>
>   /* an appropriate type goes here */ __compiler_temp_1 = condition(index);
>   /* the type of the scalar goes here */ * pointer = __compiler_temp_1 ?
> &A[index] : &scratchpad;
>   /* an appropriate type goes here */ __compiler_temp_2 = foo(index);
>   *pointer = __compiler_temp_1 ? __compiler_temp_2 : scratchpad;
>
> … so "condition(index)" is being evaluated only once
> per {evaluation that exists in the source code}.
>
> The fix for this would/will therefore be a minor optimization IMO;
> the benefit would/will be that in/for iterations/columns
> for which the condition is false, the scratchpad will not be
> needlessly read from in order to derive the value to throw away.
> Always throwing away the unneeded result of evaluating "foo(index)"
> is good enough, and by removing an unneeded conditional expression
> the burden on the vectorizer is reduced: it now only needs:
>   {vectorized decision followed by vectorized store}
> in each such loop, not:
>   {vectorized decision followed by vectorized decision followed by
> vectorized store}.
> [intentionally omitting whatever else it must do
>  in a vectorized manner in the same loop]
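
A sketch of the intended post-fix form of the example above (illustrative
only; this follows the description of the planned fix, it is not code taken
from the converter):

  /* an appropriate type goes here */ __compiler_temp_1 = condition(index);
  /* the type of the scalar goes here */ * pointer = __compiler_temp_1 ?
&A[index] : &scratchpad;
  /* store unconditionally: the scratchpad absorbs the not-taken case and is
     never read back */
  *pointer = foo(index);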
>
> This is something we [Sebastian and I] plan on fixing eventually anyway,
> i.e. regardless of whether or not it fixes a test case we already have.
>
>
> [Richard wrote:]
>>
>> and note that another performance issue of if-conversion
>> is  that foo(bar) is executed unconditionally.
>
>
> AFAIK this is a fundamental limitation/necessity of if conversion.
>
> A fundamental assumption/requirement here is that "foo(bar)"/"foo(index)"
> is/are both pure and low-cost.  [I`ve renamed the example to "foo(index)"
> to show that it`s not loop-invariant, since if it were then LICM should
> make multiple evaluations of it unneeded and probably not going to happen
> unless you are targeting a VLIW ISA and have an unused slot in the
> instruction word if you do LICM on the sub-instruction in question.]
>
> If "foo(index)" is not being checked for purity,
> then we have a correctness bug.
>
> If "foo(index)" is not being checked for low evaluation cost,
> then we have a performance bug IMO.  The compiler should use its
> existing estimation mechanism[s] to make an educated guess on
> the cost of "foo(index)" and intentionally not do if conversion
> if/when {the predicted cost of evaluating "foo(index)"
>  for each iteration regardless of the condition bits}
> is too high even in the presence of vectorization.
>
>
> [Richard wrote:]
>>
>> We have a bugreport that
>>if (C[index]) A[index] = exp (x);
>> massively slows down things if C[index] is almost never true.
>
>
> Quite understandable.  However, unfortunately I cannot think of
> any mechanism that already exists in GCC [or any other compiler
> the internals of which I am even slightly familiar] to estimate
> the probability of the elements of an arbitrary array --
> or [worse yet] of the probability of an arbitrary expression`s
> evaluation result -- being convertible to either particular
> Boolean value.  Perhaps this is feasible if/when "C[...]" is
> truly an array, i.e. not a pointer, and the array`s contents
> are known at compile time.  Otherwise, it seems to require
> pointer analysis at best, and is infeasible at worst
> [e.g. a pointer received from another translation unit].
>
> I think the only thing we can do about this, other than alter our
> plans for defaulting the if conversion, is to endeavor to make profiling
> [e.g. "gprof"] able to "understand" that a certain piece of code has been
> if-converted and able to suggest -- based on profiling -- that the
> conversion should be undone b/c it is "costing" more than it is "saving",
> even with vectorization, which IMO should be an extremely rare occurrence
> if/once we are checking e.g. "exp(x)" [assuming it`s not loop-invariant]
> for low cost of evaluation.
>
> IOW, whatever we have [or will set] the threshold on evaluation cost of
> the RHS expression for if conversion of source code like the above example
> should, IMO, solve most instances of the abovementioned problem.
> The remaining problem cases will likely be some
