RE: [PATCH] Enhance phiopt to handle BIT_AND_EXPR

2013-10-08 Thread Zhenqiang Chen

> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, October 09, 2013 5:00 AM
> To: Andrew Pinski; Zhenqiang Chen
> Cc: GCC Patches
> Subject: Re: [PATCH] Enhance phiopt to handle BIT_AND_EXPR
> 
> On 09/30/13 09:57, Andrew Pinski wrote:
> > On Mon, Sep 30, 2013 at 2:29 AM, Zhenqiang Chen
>  wrote:
> >> Hi,
> >>
> >> The patch enhances phiopt to handle cases like:
> >>
> >>if (a == 0 && (...))
> >>  return 0;
> >>return a;
> >>
> >> Bootstrapped with no make-check regressions on X86-64 and ARM.
> >>
> >> Is it OK for trunk?
> >
> >  From someone who wrote a lot of this code (value_replacement in fact),
> > this looks good, though I would pull:
> > +  if (TREE_CODE (gimple_assign_rhs1 (def)) == SSA_NAME)
> > +{
> > +  gimple def1 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def));
> > +  if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == EQ_EXPR)
> > + {
> > +  tree op0 = gimple_assign_rhs1 (def1);
> > +  tree op1 = gimple_assign_rhs2 (def1);
> > +  if ((operand_equal_for_phi_arg_p (arg0, op0)
> > +   && operand_equal_for_phi_arg_p (arg1, op1))
> > +  || (operand_equal_for_phi_arg_p (arg0, op1)
> > +   && operand_equal_for_phi_arg_p (arg1, op0)))
> > +{
> > +  *code = gimple_assign_rhs_code (def1);
> > +  return 1;
> > +}
> > + }
> > +}
> >
> > Out into its own function since it is repeated again for
> > gimple_assign_rhs2 (def).
> Agreed.  Repeated blobs of code like that should be pulled out into their
> own subroutines.
> 
> It's been 10 years since we wrote that code, Andrew, and in looking at it
> again, I wonder if/why DOM doesn't handle the stuff in value_replacement.  It
> really just looks like propagation of an edge equivalence.  Do you recall
> the motivation behind value_replacement and why we didn't just let DOM
> handle it?
> 
> 
> 
> 
>   Also what about cascading BIT_AND_EXPR
> > like:
> > if((a == 0) & (...) & (...))
> >
> > I notice you don't handle that either.
> Well, given the structure of how that code is going to look, it'd just be
> repeated walking through the input chains.  Do-able, yes.  But I don't think
> handling that should be a requirement for integration.
> 
> I'll go ahead and pull the common bits into a single function and commit on
> Zhenqiang's behalf.

Thank you!
-Zhenqiang 






Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-08 Thread Jason Merrill

On 10/07/2013 05:14 AM, Adam Butcher wrote:

+ /* Forbid ambiguous implicit pack expansions by only allowing
+a single generic type in such a parameter.

+XXX: Maybe allow if an explicitly specified template
+XXX: 'typename...' accounts for all expansions?  Though this
+XXX: could be tricky or slow.


This seems wrong.  The standard says,


The invented type template-parameter is a parameter pack if the corresponding
parameter-declaration declares a function parameter pack (8.3.5).


So if we have a function parameter pack, any generic type parameters in 
the type are packs.



+ /* If there is only one generic type in the parameter, tentatively
+assume that it is a parameter pack.  If it turns out, after
+grokdeclarator, that the parameter does not contain a pack
+expansion, then reset it to be a non-pack type.  */
+ if (cp_num_implicit_template_type_parms == 1)
+   TEMPLATE_PARM_PARAMETER_PACK
+ (TEMPLATE_TYPE_PARM_INDEX
+   (cp_last_implicit_template_type_parm)) = true;


This will cause problems with type comparison, since TYPE_CANONICAL of 
the implicit parm doesn't have TEMPLATE_PARM_PARAMETER_PACK set.  That's 
why I was talking about using tsubst to replace a non-pack with a pack.



+   parser->implicit_template_scope = class_scope;
+  else
+   parser->implicit_template_scope = fn_parms_scope;
+  current_binding_level = parser->implicit_template_scope->level_chain;


Why not make implicit_template_scope the actual template scope, rather 
than the function/class?  It looks like all the users immediately take 
the level_chain.



+/* Nonzero if parsing a context where 'auto' in a parameter list should not
+   trigger an implicit template parameter.  Specifically, 'auto' should not
+   introduce a new template type parameter in explicit specializations,
+   trailing return types or exception specifiers.  */
+int cp_disable_auto_as_implicit_function_template_parm;


Can we put this in cp_parser, invert the sense of the flag, and only set 
it during cp_parser_parameter_declaration_clause?


Jason


Re: Add a param to decide stack slot sharing at -O0

2013-10-08 Thread Andi Kleen
Easwaran Raman  writes:

> In cfgexpand.c, variables in non-overlapping lexical scopes are
> assigned same stack locations at -O1 and above. At -O0, this is
> attempted only if the size of the stack objects is above a threshold
> (32). The rationale is at -O0, more variables are going to be in the
> stack and the O(n^2) stack slot sharing algorithm will increase the
> compilation time. This patch replaces the constant with a param which
> is set to 32 by default. We ran into a case where the presence of
> always_inline attribute triggered Wframe-larger-than warnings at -O0
> but not at -O2 since the different inlined copies share the stack. We
> are ok with a slight increase in compilation time to get smaller stack
> frames even at -O0 and this patch would allow us do that easily.

Seems like an odd thing for a param.  If the compile-time increase is very
small (< 1%?) I would just make the new threshold the default.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[rl78] tweaks to movqi/movhi expanders

2013-10-08 Thread DJ Delorie

Minor fix.  Committed.

* config/rl78/rl78-expand.md (movqi): Use operands[] not operandN.
(movhi): Likewise.

Index: config/rl78/rl78-expand.md
===
--- config/rl78/rl78-expand.md  (revision 203298)
+++ config/rl78/rl78-expand.md  (working copy)
@@ -22,55 +22,55 @@
 
 (define_expand "movqi"
   [(set (match_operand:QI 0 "nonimmediate_operand")
(match_operand:QI 1 "general_operand"))]
   ""
   {
-if (MEM_P (operand0) && MEM_P (operand1))
-  operands[1] = copy_to_mode_reg (QImode, operand1);
-if (rl78_far_p (operand0) && rl78_far_p (operand1))
-  operands[1] = copy_to_mode_reg (QImode, operand1);
+if (MEM_P (operands[0]) && MEM_P (operands[1]))
+  operands[1] = copy_to_mode_reg (QImode, operands[1]);
+if (rl78_far_p (operands[0]) && rl78_far_p (operands[1]))
+  operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
 /* FIXME: Not sure how GCC can generate (SUBREG (SYMBOL_REF)),
but it does.  Since this makes no sense, reject it here.  */
-if (GET_CODE (operand1) == SUBREG
-&& GET_CODE (XEXP (operand1, 0)) == SYMBOL_REF)
+if (GET_CODE (operands[1]) == SUBREG
+&& GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF)
   FAIL;
 /* Similarly for (SUBREG (CONST (PLUS (SYMBOL_REF.
cf. g++.dg/abi/packed.C.  */
-if (GET_CODE (operand1) == SUBREG
-   && GET_CODE (XEXP (operand1, 0)) == CONST
-&& GET_CODE (XEXP (XEXP (operand1, 0), 0)) == PLUS
-&& GET_CODE (XEXP (XEXP (XEXP (operand1, 0), 0), 0)) == SYMBOL_REF)
+if (GET_CODE (operands[1]) == SUBREG
+   && GET_CODE (XEXP (operands[1], 0)) == CONST
+&& GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == PLUS
+&& GET_CODE (XEXP (XEXP (XEXP (operands[1], 0), 0), 0)) == SYMBOL_REF)
   FAIL;
 
-if (CONST_INT_P (operand1) && ! IN_RANGE (INTVAL (operand1), (-1 << 8) + 1, (1 << 8) - 1))
+if (CONST_INT_P (operands[1]) && ! IN_RANGE (INTVAL (operands[1]), (-1 << 8) + 1, (1 << 8) - 1))
   FAIL;
   }
 )
 
 (define_expand "movhi"
   [(set (match_operand:HI 0 "nonimmediate_operand")
(match_operand:HI 1 "general_operand"))]
   ""
   {
-if (MEM_P (operand0) && MEM_P (operand1))
-  operands[1] = copy_to_mode_reg (HImode, operand1);
-if (rl78_far_p (operand0) && rl78_far_p (operand1))
-  operands[1] = copy_to_mode_reg (HImode, operand1);
+if (MEM_P (operands[0]) && MEM_P (operands[1]))
+  operands[1] = copy_to_mode_reg (HImode, operands[1]);
+if (rl78_far_p (operands[0]) && rl78_far_p (operands[1]))
+  operands[1] = copy_to_mode_reg (HImode, operands[1]);
 
 /* FIXME: Not sure how GCC can generate (SUBREG (SYMBOL_REF)),
but it does.  Since this makes no sense, reject it here.  */
-if (GET_CODE (operand1) == SUBREG
-&& GET_CODE (XEXP (operand1, 0)) == SYMBOL_REF)
+if (GET_CODE (operands[1]) == SUBREG
+&& GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF)
   FAIL;
 /* Similarly for (SUBREG (CONST (PLUS (SYMBOL_REF.  */
-if (GET_CODE (operand1) == SUBREG
-   && GET_CODE (XEXP (operand1, 0)) == CONST
-&& GET_CODE (XEXP (XEXP (operand1, 0), 0)) == PLUS
-&& GET_CODE (XEXP (XEXP (XEXP (operand1, 0), 0), 0)) == SYMBOL_REF)
+if (GET_CODE (operands[1]) == SUBREG
+   && GET_CODE (XEXP (operands[1], 0)) == CONST
+&& GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == PLUS
+&& GET_CODE (XEXP (XEXP (XEXP (operands[1], 0), 0), 0)) == SYMBOL_REF)
   FAIL;
   }
 )
 
 (define_insn_and_split "movsi"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=vYS,v,Wfr")


RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-08 Thread Bernd Edlinger
Hi,

On Mon, 30 Sep 2013 16:18:30, DJ Delorie wrote:
>
> As per my previous comments on this patch, I will not approve the
> changes to the m32c backend, as they will cause real bugs in real
> hardware, and violate the hardware's ABI. The user may use
> -fno-strict-volatile-bitfields if they do not desire this behavior and
> understand the consequences.
>
> I am not a maintainer for the rx and h8300 ports, but they are in the
> same situation.
>
> To reiterate my core position: if the user defines a proper "volatile
> int" bitfield, and the compiler does anything other than an int-sized
> access, the compiler is WRONG. Any optimization that changes volatile
> accesses to something other than what the user specified is a bug that
> needs to be fixed before this option can be non-default.

hmm, I just tried to use the latest 4.9 trunk to compile the example from
the AAPCS document:

struct s
{
  volatile int a:8;
  volatile char b:2;
};

struct s ss;

int
main ()
{
  ss.a=1;
  ss.b=1;
  return 0;
}

and the resulting code is completely against the written AAPCS specification:

main:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r3, .L2
    ldrh    r2, [r3]
    bic r2, r2, #254
    orr r2, r2, #1
    strh    r2, [r3]    @ movhi
    ldrh    r2, [r3]
    bic r2, r2, #512
    orr r2, r2, #256
    strh    r2, [r3]    @ movhi
    mov r0, #0
    bx  lr

two half-word accesses, to my total surprise!

As it looks, -fstrict-volatile-bitfields is already totally broken,
apparently in favour of the C++ memory model, at least on the write side.

These are aligned accesses, not packed structures, which were the only
case where this used to work.

This must be fixed.  I do not understand why we cannot agree that
at least the bug-fix part of Sandra's patch needs to be applied.

Regards
Bernd.

rl78: fix %c modifier

2013-10-08 Thread DJ Delorie

namespaces are so small... committed.

* config/rl78/rl78.c (rl78_print_operand_1): Change %c to %C to
avoid conflict with the MI use of %c.
* config/rl78/rl78-real.md: Change %c to %C throughout.
* config/rl78/rl78-virt.md: Likewise.

Index: config/rl78/rl78-real.md
===
--- config/rl78/rl78-real.md(revision 203295)
+++ config/rl78/rl78-real.md(working copy)
@@ -318,69 +318,69 @@
  [(match_operand:QI 1 "general_operand" "A,A,A")
   (match_operand:QI 2 "general_operand" "ISqi,i,v")])
   (label_ref (match_operand 3 "" ""))
  (pc)))]
   "rl78_real_insns_ok ()"
   "@
-   cmp\t%1, %2 \;xor1 CY,%1.7\;not1 CY\;sk%c0 \;br\t!!%3
-   cmp\t%1, %2 \;xor1 CY,%1.7\;sk%c0 \;br\t!!%3
-   cmp\t%1, %2 \;xor1 CY,%1.7\;xor1 CY,%2.7\;sk%c0 \;br\t!!%3"
+   cmp\t%1, %2 \;xor1 CY,%1.7\;not1 CY\;sk%C0 \;br\t!!%3
+   cmp\t%1, %2 \;xor1 CY,%1.7\;sk%C0 \;br\t!!%3
+   cmp\t%1, %2 \;xor1 CY,%1.7\;xor1 CY,%2.7\;sk%C0 \;br\t!!%3"
   )
 
 (define_insn "*cbranchqi4_real"
   [(set (pc) (if_then_else
  (match_operator 0 "rl78_cmp_operator_real"
  [(match_operand:QI 1 "general_operand" "Wabvaxbc,a,  v,bcdehl")
   (match_operand:QI 2 "general_operand" "M,   irvWabWhlWh1Whb,i,a")])
   (label_ref (match_operand 3 "" ""))
  (pc)))]
   "rl78_real_insns_ok ()"
   "@
-   cmp0\t%1 \;sk%c0 \;br\t!!%3
-   cmp\t%1, %2 \;sk%c0 \;br\t!!%3
-   cmp\t%1, %2 \;sk%c0 \;br\t!!%3
-   cmp\t%1, %2 \;sk%c0 \;br\t!!%3"
+   cmp0\t%1 \;sk%C0 \;br\t!!%3
+   cmp\t%1, %2 \;sk%C0 \;br\t!!%3
+   cmp\t%1, %2 \;sk%C0 \;br\t!!%3
+   cmp\t%1, %2 \;sk%C0 \;br\t!!%3"
   )
 
 (define_insn "*cbranchhi4_real_signed"
   [(set (pc) (if_then_else
  (match_operator 0 "rl78_cmp_operator_signed"
  [(match_operand:HI 1 "general_operand" "A,A,A,vR")
   (match_operand:HI 2 "general_operand" "IShi,i,v,1")])
   (label_ref (match_operand 3))
  (pc)))]
   "rl78_real_insns_ok ()"
   "@
-   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;not1 CY\;sk%c0 \;br\t!!%3
-   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;sk%c0 \;br\t!!%3
-   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;xor1 CY,%Q2.7\;sk%c0 \;br\t!!%3
+   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;not1 CY\;sk%C0 \;br\t!!%3
+   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;sk%C0 \;br\t!!%3
+   cmpw\t%1, %2 \;xor1 CY,%Q1.7\;xor1 CY,%Q2.7\;sk%C0 \;br\t!!%3
   %z0\t!!%3"
   )
 
 (define_insn "cbranchhi4_real"
   [(set (pc) (if_then_else
  (match_operator 0 "rl78_cmp_operator_real"
  [(match_operand:HI 1 "general_operand" "A,vR")
   (match_operand:HI 2 "general_operand" "iBDTvWabWhlWh1,1")])
   (label_ref (match_operand  3 "" ""))
  (pc)))]
   "rl78_real_insns_ok ()"
   "@
-  cmpw\t%1, %2 \;sk%c0 \;br\t!!%3
+  cmpw\t%1, %2 \;sk%C0 \;br\t!!%3
   %z0\t!!%3"
   )
 
 (define_insn "cbranchhi4_real_inverted"  
   [(set (pc) (if_then_else
  (match_operator 0 "rl78_cmp_operator_real"
  [(match_operand:HI 1 "general_operand" "A")
   (match_operand:HI 2 "general_operand" "iBDTvWabWhlWh1")])
  (pc)
  (label_ref (match_operand 3 "" ""))))]
   "rl78_real_insns_ok ()"
-  "cmpw\t%1, %2 \;sk%c0 \;br\t!!%3"
+  "cmpw\t%1, %2 \;sk%C0 \;br\t!!%3"
   )
 
 (define_insn "*cbranchsi4_real_lt"
   [(set (pc) (if_then_else
  (lt (match_operand:SI 0 "general_operand" "U,vWabWhlWh1")
  (const_int 0))
@@ -416,28 +416,28 @@
   (label_ref (match_operand 3 "" ""))
  (pc)))
(clobber (reg:HI AX_REG))
]
   "rl78_real_insns_ok ()"
   "@
-   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;not1 CY\;  movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%c0 \;br\t!!%3
-   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;   movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%c0 \;br\t!!%3
-   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;xor1 CY,%E2.7\;movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%c0 \;br\t!!%3
+   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;not1 CY\;  movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%C0 \;br\t!!%3
+   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;   movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%C0 \;br\t!!%3
+   movw ax,%H1 \;cmpw  ax, %H2 \;xor1 CY,a.7\;xor1 CY,%E2.7\;movw ax,%h1 \;sknz \;cmpw  ax, %h2 \;sk%C0 \;br\t!!%3"
   )
 
 (define_insn "*cbranchsi4_real"
   [(set (pc) (if_then_else
  (match_operator 0 "rl78_cmp_operator_real"
  [(match_operand:SI 1 "general_operand" "vUi")
   (match_operand:SI 2 "general_operand" "iWhlWh1v")])
   (label_ref (match_operand 3 "" ""))
  (pc)))
 

libgo patch committed: Do not report thunks/recover in backtrace

2013-10-08 Thread Ian Lance Taylor
This patch to libgo avoids returning thunk or recover functions in a
backtrace or when calling runtime.Caller or friends.  This gives a stack
backtrace more like that generated by the gc compiler; in particular,
runtime.Caller(n) will return a stack trace that more closely corresponds
to the one returned by the gc toolchain.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.8 branch.

Ian

diff -r 17075de125b7 libgo/runtime/go-callers.c
--- a/libgo/runtime/go-callers.c	Tue Oct 08 16:51:52 2013 -0700
+++ b/libgo/runtime/go-callers.c	Tue Oct 08 16:54:42 2013 -0700
@@ -53,6 +53,21 @@
 	return 0;
 }
 
+  /* Skip thunks and recover functions.  There is no equivalent to
+ these functions in the gc toolchain, so returning them here means
+ significantly different results for runtime.Caller(N).  */
+  if (function != NULL)
+{
+  const char *p;
+
+  p = __builtin_strchr (function, '.');
+  if (p != NULL && __builtin_strncmp (p + 1, "$thunk", 6) == 0)
+	return 0;
+  p = __builtin_strrchr (function, '$');
+  if (p != NULL && __builtin_strcmp(p, "$recover") == 0)
+	return 0;
+}
+
   if (arg->skip > 0)
 {
   --arg->skip;


Go patch committed: Error for qualified ID in struct composite lit

2013-10-08 Thread Ian Lance Taylor
This patch to the Go frontend gives an error if a qualified identifier
is used as a field name in a struct composite literal.  This is trickier
than one might expect since qualified identifiers are fine as keys in a
slice or map composite literal.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Ian

diff -r 29fef9fa5a41 go/expressions.cc
--- a/go/expressions.cc	Mon Oct 07 08:31:41 2013 -0700
+++ b/go/expressions.cc	Tue Oct 08 16:48:26 2013 -0700
@@ -11293,7 +11293,7 @@
 }
 
   Expression* e = Expression::make_composite_literal(array_type, 0, false,
-		 bytes, loc);
+		 bytes, false, loc);
 
   Variable* var = new Variable(array_type, e, true, false, false, loc);
 
@@ -13236,9 +13236,11 @@
 {
  public:
   Composite_literal_expression(Type* type, int depth, bool has_keys,
-			   Expression_list* vals, Location location)
+			   Expression_list* vals, bool all_are_names,
+			   Location location)
 : Parser_expression(EXPRESSION_COMPOSITE_LITERAL, location),
-  type_(type), depth_(depth), vals_(vals), has_keys_(has_keys)
+  type_(type), depth_(depth), vals_(vals), has_keys_(has_keys),
+  all_are_names_(all_are_names)
   { }
 
  protected:
@@ -13256,6 +13258,7 @@
 	(this->vals_ == NULL
 	 ? NULL
 	 : this->vals_->copy()),
+	this->all_are_names_,
 	this->location());
   }
 
@@ -13285,6 +13288,9 @@
   // If this is true, then VALS_ is a list of pairs: a key and a
   // value.  In an array initializer, a missing key will be NULL.
   bool has_keys_;
+  // If this is true, then HAS_KEYS_ is true, and every key is a
+  // simple identifier.
+  bool all_are_names_;
 };
 
 // Traversal.
@@ -13387,6 +13393,8 @@
   std::vector vals(field_count);
   std::vector* traverse_order = new(std::vector);
   Expression_list::const_iterator p = this->vals_->begin();
+  Expression* external_expr = NULL;
+  const Named_object* external_no = NULL;
   while (p != this->vals_->end())
 {
   Expression* name_expr = *p;
@@ -13492,6 +13500,12 @@
 
   if (no != NULL)
 	{
+	  if (no->package() != NULL && external_expr == NULL)
+	{
+	  external_expr = name_expr;
+	  external_no = no;
+	}
+
 	  name = no->name();
 
 	  // A predefined name won't be packed.  If it starts with a
@@ -13541,6 +13555,23 @@
   traverse_order->push_back(index);
 }
 
+  if (!this->all_are_names_)
+{
+  // This is a weird case like bug462 in the testsuite.
+  if (external_expr == NULL)
+	error_at(this->location(), "unknown field in %qs literal",
+		 (type->named_type() != NULL
+		  ? type->named_type()->message_name().c_str()
+		  : "unnamed struct"));
+  else
+	error_at(external_expr->location(), "unknown field %qs in %qs",
+		 external_no->message_name().c_str(),
+		 (type->named_type() != NULL
+		  ? type->named_type()->message_name().c_str()
+		  : "unnamed struct"));
+  return Expression::make_error(location);
+}
+
   Expression_list* list = new Expression_list;
   list->reserve(field_count);
   for (size_t i = 0; i < field_count; ++i)
@@ -13830,11 +13861,11 @@
 
 Expression*
 Expression::make_composite_literal(Type* type, int depth, bool has_keys,
-   Expression_list* vals,
+   Expression_list* vals, bool all_are_names,
    Location location)
 {
   return new Composite_literal_expression(type, depth, has_keys, vals,
-	  location);
+	  all_are_names, location);
 }
 
 // Return whether this expression is a composite literal.
diff -r 29fef9fa5a41 go/expressions.h
--- a/go/expressions.h	Mon Oct 07 08:31:41 2013 -0700
+++ b/go/expressions.h	Tue Oct 08 16:48:26 2013 -0700
@@ -291,10 +291,13 @@
   make_unsafe_cast(Type*, Expression*, Location);
 
   // Make a composite literal.  The DEPTH parameter is how far down we
-  // are in a list of composite literals with omitted types.
+  // are in a list of composite literals with omitted types.  HAS_KEYS
+  // is true if the expression list has keys alternating with values.
+  // ALL_ARE_NAMES is true if all the keys could be struct field
+  // names.
   static Expression*
   make_composite_literal(Type*, int depth, bool has_keys, Expression_list*,
-			 Location);
+			 bool all_are_names, Location);
 
   // Make a struct composite literal.
   static Expression*
diff -r 29fef9fa5a41 go/parse.cc
--- a/go/parse.cc	Mon Oct 07 08:31:41 2013 -0700
+++ b/go/parse.cc	Tue Oct 08 16:48:26 2013 -0700
@@ -2690,15 +2690,17 @@
 {
   this->advance_token();
   return Expression::make_composite_literal(type, depth, false, NULL,
-		location);
+		false, location);
 }
 
   bool has_keys = false;
+  bool all_are_names = true;
   Expression_list* vals = new Expression_list;
   while (true)
 {
   Expression* val;
   bool is_type_omitted = false;
+  bool is_name = false;
 
   const Token* token = this->peek_token();
 
@@ -2719,6 +2721,7 @@
 	  val = this->id_to_expression(gogo->pack_h

Re: Enable SSE math on i386 with -Ofast

2013-10-08 Thread Jan Hubicka
Hi,
this is the patch I ended up committing after some further testing.  The
difference from the initial version is that it now enables SSE math with
-ffast-math too, and it does so outside the ugly target macro.

Bootstrapped/regtested x86_64-linux, tested with -m32
Honza

* config/i386/i386.c (ix86_option_override_internal): Switch
to SSE math for -ffast-math when target ISA supports SSE2.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 203252)
+++ config/i386/i386.c  (working copy)
@@ -3769,6 +3769,19 @@ ix86_option_override_internal (bool main
}
}
 }
+  /* For all chips supporting SSE2, -mfpmath=sse performs better than
+ fpmath=387.  The second is however default at many targets since the
+ extra 80bit precision of temporaries is considered to be part of ABI.
+ Overwrite the default at least for -ffast-math. 
+ TODO: -mfpmath=both seems to produce same performing code with bit
+ smaller binaries.  It is however not clear if register allocation is
+ ready for this setting.
+ Also -mfpmath=387 is overall a lot more compact (about 4-5%) than SSE
+ codegen.  We may switch to 387 with -ffast-math for size optimized
+ functions. */
+  else if (fast_math_flags_set_p (&global_options)
+  && TARGET_SSE2)
+ix86_fpmath = FPMATH_SSE;
   else
 ix86_fpmath = TARGET_FPMATH_DEFAULT;
 


Re: [patch] The remainder of tree-flow.h refactored.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 07:44 AM, Andrew MacLeod wrote:

On 10/08/2013 06:22 AM, Richard Biener wrote:

graphite.h should be unnecessary with moving the pass struct like you
did for other loop opts.  Likewise tree-parloops.h (well, ok, maybe
you need parallelized_function_p, even though its implementation is
gross ;)).  Likewise tree-predcom.h.


fair enough.  Yes, I've already seen a few things that made my skin
crawl and I had to resist going down a rathole :-)


unvisit_body isn't generic enough to warrant moving out of gimplify.c
(the only user).

The force_gimple_operand_gsi... routines are in gimplify.c because 
they ...

gimplify!  And you moved them but not force_gimple_operand[_1]!?


OK, let me make the above adjustments, and I'll recreate a patch 
without the gimple/gimplfy parts, and re-address that separately. I 
forget the details of my include issues there at the moment. 


Here's the adjusted patch which doesn't contain the ugly gimple, 
gimplify, and tree stuff.  I'll deal with that once everything else 
settles.
I removed tree-predcom.h and graphite.h and also moved the
parallel_loops pass into tree-parloops.c... but we still need predcom.h
:-P.  Oh well.  I think most of it is pretty straightforward.


Bootstraps on x86_64-unknown-linux-gnu, and running regressions. 
Assuming no issues, OK?


Andrew



	* tree-flow.h: Remove all remaining prototypes, enums and structs that
	are not related to tree-cfg.c.
	* tree-ssa-address.h: New file.  Relocate prototypes.
	* tree-ssa-address.c (struct mem_address): Relocate from tree-flow.h.
	(addr_for_mem_ref): New.  Combine call to get_address_description and
	return addr_for_mem_ref.
	* expr.c (expand_expr_real_1): Use new addr_for_mem_ref routine.
	* tree-ssa-live.h: Adjust prototypes.
	* passes.c: Include tree-ssa-live.h.
	* gimple-pretty-print.h (gimple_dump_bb): Add prototype.
	* graphite.c (graphite_transform_loops): Make static.
	(graphite_transforms, gate_graphite_transforms, pass_data_graphite,
	make_pass_graphite, pass_data_graphite_transforms, 
	make_pass_graphite_transforms): Relocate here from tree-ssa-loop.c.
	* ipa-pure-const.c (warn_function_noreturn): Make static.
	(execute_warn_function_noreturn, gate_warn_function_noreturn,
	class pass_warn_function_noreturn, make_pass_warn_function_noreturn):
	Relocate from tree-cfg.c
	* tree-cfg.c (tree_node_can_be_shared, gimple_empty_block_p): Make
	static.
	(execute_warn_function_noreturn, gate_warn_function_noreturn,
	class pass_warn_function_noreturn, make_pass_warn_function_noreturn):
	Move to ipa-pure-const.c.
	(execute_fixup_cfg, class pass_fixup_cfg, make_pass_fixup_cfg): Relocate
	from tree-optimize.c.
	* tree-optimize.c (execute_fixup_cfg, class pass_fixup_cfg,
	make_pass_fixup_cfg): Move to tree-cfg.c.
	* tree-chrec.h: (enum ev_direction): Relocate here from tree-flow.h.
	Relocate some prototypes.
	* tree-data-ref.h (tree_check_data_deps): Add prototype.
	* tree-dump.c (dump_function_to_file): Remove prototype.
	Add tree-flow.h to the include file.
	* tree-dump.h: Remove prototype.
	* tree-parloops.h: New File.  Add prototypes.
	* tree-parloops.c (gate_tree_parallelize_loops, tree_parallelize_loops,
	pass_data_parallelize_loops,  make_pass_parallelize_loops): Relocate
	from tree-ssa-loop.c.
	* tree-predcom.c (run_tree_predictive_commoning,
	gate_tree_predictive_commoning, pass_data_predcom, make_pass_predcom):
	Relocate here from tree-ssa-loop.c.
	* tree-ssa-dom.c (tree_ssa_dominator_optimize) Don't call 
	ssa_name_values.release ().
	* tree-ssa-threadedge.h: New File.  Relocate prototypes here.
	(ssa_name_values): Relocate from tree-flow.h.
	* tree-ssa.h: Include tree-ssa-threadedge.h and tree-ssa-address.h.
	* tree-ssa-loop.c (run_tree_predictive_commoning,
	gate_tree_predictive_commoning, pass_data_predcom, make_pass_predcom,
	graphite_transforms, gate_graphite_transforms, pass_data_graphite,
	make_pass_graphite, pass_data_graphite_transforms,
	make_pass_graphite_transforms, gate_tree_parallelize_loops,
	tree_parallelize_loops, pass_data_parallelize_loops,
	make_pass_parallelize_loops): Move to other files.
	* tree-vectorizer.h (lpeel_tree_duplicate_loop_to_edge_cfg): Prototype
	moved here.
	* tree.h: Remove prototypes from tree-address.c.


*** R/tree-flow.h	2013-10-08 18:26:20.692433161 -0400
--- tree-flow.h	2013-10-08 18:29:31.000721329 -0400
*** extern void free_omp_regions (void);
*** 87,206 
  void omp_expand_local (basic_block);
  tree copy_var_decl (tree, tree, tree);
  
- /*---
- 			  Function prototypes
- ---*/
- /* In tree-cfg.c  */
- 
  /* Location to track pending stmt for edge insertion.  */
  #define PENDING_STMT(e)	((e)->insns.g)
  
! extern void delete_tree_cfg_annotations (void);
! extern bool stmt_ends_bb_p (gimple);
! extern bool is_ctrl_stmt (gimple);
! extern bool is_ctrl_altering_stmt (gimple);
! e

Re: [C++ Patch] PR 58633

2013-10-08 Thread Paolo Carlini
.. a curiosity: the cp_parser_commit_to_tentative_parse at the end of
cp_parser_pseudo_destructor_name, which didn't exist in 4.6.x and which
we can consider the root of this issue, is also my fault:


http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02246.html

From a different angle, I'm happy with the outcome of this detective
work, because it means that the parser_commit isn't there for
correctness: not performing it in some cases shouldn't be a big issue.


Paolo.


RE: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-10-08 Thread Bernd Edlinger
Hi,

On Tue, 8 Oct 2013 22:50:21, Eric Botcazou wrote:
>
>> I agree that assigning a non-BLKmode to structures with zero-sized arrays
>> should be considered a bug.
>
> Fine, then let's apply Martin's patch, on mainline at least.
>

That would definitely be a good move. Maybe someone should approve it?

> But this testcase is invalid on STRICT_ALIGNMENT platforms: xx is pointer to a
> type with 4-byte alignment so its value must be a multiple of 4.

Then you probably win. But I still have some doubts.

I had to use this silly alignment/pack(4) to circumvent this statement
in compute_record_mode:

  /* If structure's known alignment is less than what the scalar
 mode would need, and it matters, then stick with BLKmode.  */
  if (TYPE_MODE (type) != BLKmode
      && STRICT_ALIGNMENT
      && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
	    || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type))))
    {
      /* If this is the only reason this type is BLKmode, then
	 don't force containing types to be BLKmode.  */
      TYPE_NO_FORCE_BLK (type) = 1;
      SET_TYPE_MODE (type, BLKmode);
    }

But there are at least two targets where STRICT_ALIGNMENT = 0
and SLOW_UNALIGNED_ACCESS != 0: rs6000 and alpha.

This example with a byte-aligned structure will on one of these targets
likely execute this code path in  expand_expr_real_1/case MEM_REF:

    else if (SLOW_UNALIGNED_ACCESS (mode, align))
  temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
    0, TYPE_UNSIGNED (TREE_TYPE (exp)),
    (modifier == EXPAND_STACK_PARM
 ? NULL_RTX : target),
    mode, mode);

This looks wrong, but unfortunately I cannot test on these targets...

Regards
Bernd.

Re: [C++ Patch] PR 58448

2013-10-08 Thread Paolo Carlini

On 10/04/2013 02:04 PM, Paolo Carlini wrote:

... and this is a more straightforward approach. Also tested x86_64-linux.

I reverted this for now; it was causing problems (c++/58665).

Thanks,
Paolo.


Re: [PATCH] Enhance phiopt to handle BIT_AND_EXPR

2013-10-08 Thread Jeff Law

On 09/30/13 09:57, Andrew Pinski wrote:

On Mon, Sep 30, 2013 at 2:29 AM, Zhenqiang Chen  wrote:

Hi,

The patch enhances phiopt to handle cases like:

   if (a == 0 && (...))
 return 0;
   return a;

Bootstrapped with no make-check regressions on X86-64 and ARM.

Is it OK for trunk?


 From someone who wrote a lot of this code (value_replacement in fact),
this looks good, though I would pull:
+  if (TREE_CODE (gimple_assign_rhs1 (def)) == SSA_NAME)
+{
+  gimple def1 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def));
+  if (is_gimple_assign (def1) && gimple_assign_rhs_code (def1) == EQ_EXPR)
+ {
+  tree op0 = gimple_assign_rhs1 (def1);
+  tree op1 = gimple_assign_rhs2 (def1);
+  if ((operand_equal_for_phi_arg_p (arg0, op0)
+   && operand_equal_for_phi_arg_p (arg1, op1))
+  || (operand_equal_for_phi_arg_p (arg0, op1)
+   && operand_equal_for_phi_arg_p (arg1, op0)))
+{
+  *code = gimple_assign_rhs_code (def1);
+  return 1;
+}
+ }
+}

Out into its own function since it is repeated again for
gimple_assign_rhs2 (def).
Agreed.  Repeated blobs of code like that should be pulled out into their
own subroutines.


It's been 10 years since we wrote that code, Andrew, and in looking at it
again, I wonder if/why DOM doesn't handle the stuff in
value_replacement.  It really just looks like propagation of an
edge equivalence.  Do you recall the motivation behind value_replacement
and why we didn't just let DOM handle it?





 Also what about cascading BIT_AND_EXPR

like:
if((a == 0) & (...) & (...))

I notice you don't handle that either.
Well, given the structure of how that code is going to look, it'd just 
be repeated walking through the input chains.  Do-able, yes.  But I 
don't think handling that should be a requirement for integration.


I'll go ahead and pull the common bits into a single function and commit 
on Zhenqiang's behalf.


Thanks!

jeff




Re: Patch: Add #pragma ivdep support to the ME and C FE (was: Re: RFC patch for #pragma ivdep)

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 10:51:29PM +0200, Tobias Burnus wrote:
> +   return false;
> + }
> +  c_parser_for_statement (parser, true);
> +  return false;
> +
>  case PRAGMA_GCC_PCH_PREPROCESS:
>c_parser_error (parser, "%<#pragma GCC pch_preprocess%> must be 
> first");
>c_parser_skip_until_found (parser, CPP_PRAGMA_EOL, NULL);
> diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
> index f39b194..5979a4a 100644
> --- a/gcc/cfgloop.c
> +++ b/gcc/cfgloop.c
> @@ -507,6 +507,37 @@ flow_loops_find (struct loops *loops)
> loop->latch = latch;
>   }
>   }
> +  /* Search for ANNOTATE call with annot_expr_ivdep_kind; if found, 
> remove
> +  it and set loop->safelen to INT_MAX.  We assume that the annotation
> + comes immediately before the condition.  */

Mixing tabs with spaces above.

> +  if (loop->latch)
> + FOR_EACH_EDGE (e, ei, loop->latch->succs)
> +   {
> + if (e->dest->flags & BB_RTL)
> +   break;

I'd prefer Richard to review this (and probably Joseph the C FE part).
You can't really have loop->latch in GIMPLE and the successors
in RTL, so perhaps you can check that in the if (loop->latch) check
already.

> + gimple_stmt_iterator gsi = gsi_last_nondebug_bb (e->dest);

GIMPLE_COND must be the last in the bb, can't be followed by
debug stmts, so you can safely use just gsi_last_bb (e->dest) instead.

> @@ -7378,6 +7388,22 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
> ret = gimplify_addr_expr (expr_p, pre_p, post_p);
> break;
>  
> + case ANNOTATE_EXPR:
> +   {
> + tree cond = TREE_OPERAND (*expr_p, 0);
> + tree id = build_int_cst (integer_type_node,
> +  ANNOTATE_EXPR_ID (*expr_p));
> + tree tmp = create_tmp_var_raw (TREE_TYPE(cond), NULL);
> + gimplify_arg (&cond, pre_p, EXPR_LOCATION (*expr_p));
> + gimple call = gimple_build_call_internal (IFN_ANNOTATE, 2,
> +   cond, id);
> +gimple_call_set_lhs (call, tmp);
> + gimplify_seq_add_stmt (pre_p, call);
> +*expr_p = tmp;

Again, mixing tabs with spaces, tabs should be used always.

> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -591,6 +591,11 @@ extern void omp_clause_range_check_failed (const_tree, 
> const char *, int,
>  #define PREDICT_EXPR_PREDICTOR(NODE) \
>((enum br_predictor)tree_low_cst (TREE_OPERAND (PREDICT_EXPR_CHECK (NODE), 
> 0), 0))
>  
> +#define ANNOTATE_EXPR_ID(NODE) \
> +  ((enum annot_expr_kind) ANNOTATE_EXPR_CHECK(NODE)->base.u.version)

Missing space between CHECK and (.

> +#define SET_ANNOTATE_EXPR_ID(NODE, ID) \
> +  (ANNOTATE_EXPR_CHECK(NODE)->base.u.version = ID)

Likewise.  Shouldn't it be = (ID) ?

Jakub


[Patch, Fortran] Use ANNOTATE_EXPR annot_expr_ivdep_kind for DO CONCURRENT

2013-10-08 Thread Tobias Burnus
This patch requires my pending ME and C FE patch: 
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00514.html


Using C/C++'s #pragma ivdep or (with the attached Fortran patch) "do 
concurrent", the loop condition is annotated so that the loop's 
vectorization safelen is later set to infinity (well, INT_MAX). The 
main purpose is to tell the compiler that the result is independent of 
the order in which the loop iterations are executed. The typical case 
is pointer aliasing, where the compiler either doesn't vectorize or 
adds a run-time aliasing check (loop versioning). With the annotation, 
the compiler simply assumes that there is no aliasing and avoids the 
versioning. Compared with C++ (which does not even have the "restrict" 
qualifier, though gcc and others support __restrict) and with cases 
where C/C++'s __restrict qualifier isn't sufficient or applicable, the 
effect on typical Fortran code should be smaller, as most variables 
cannot alias. Still, in some cases it can help. (See the test case for 
an example.)
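As a minimal C-side sketch (the posted C FE patch registers the pragma as plain `#pragma ivdep`; the spelling GCC ultimately adopted, used below, is `#pragma GCC ivdep`):

```c
/* The annotation asserts there are no loop-carried dependences between
   a and b, so the vectorizer can skip the run-time aliasing check
   (loop versioning) it would otherwise emit for the two pointers.  */
void
scale (float *a, float *b, int n)
{
#pragma GCC ivdep
  for (int i = 0; i < n; i++)
    a[i] = b[i] * 2.0f;
}
```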


There is a lower-level alternative to ivdep [1]: OpenMPv4's "omp simd" 
(with safelen=) for C/C++/Fortran and, for C/C++, Cilk Plus's 
#pragma simd.


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

PS: I think the same annotation could also be used with FORALL and 
with the implied loops of whole-array/array-section assignments, when 
the FE knows that there is no aliasing between the LHS and the RHS. 
(In some cases, the FE knows this while the ME doesn't.)


PPS: My personal motivation is my long-standing wish to pass this 
information to the middle end for DO CONCURRENT, but also to use the 
pragma in some specific C++ code.


[1] The OpenMPv4 support for C/C++ will be merged soon, for Fortran it 
will take a while (maybe still in 4.9, maybe only later). See 
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00502.html / The relevant 
Cilk Plus patch has been posted at 
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01626.html
2013-10-08  Tobias Burnus  

	PR fortran/44646
	* trans-stmt.c (struct forall_info): Add do_concurrent field.
	(gfc_trans_forall_1): Set it for do concurrent.
	(gfc_trans_forall_loop): Mark those as annot_expr_ivdep_kind.

2013-10-08  Tobias Burnus  

	PR fortran/44646
	* gfortran.dg/vect/vect-do-concurrent-1.f90: New.

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index edd2dac..b44d2c1 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -53,6 +53,7 @@ typedef struct forall_info
   int nvar;
   tree size;
   struct forall_info  *prev_nest;
+  bool do_concurrent;
 }
 forall_info;
 
@@ -2759,6 +2760,12 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tree body,
   /* The exit condition.  */
   cond = fold_build2_loc (input_location, LE_EXPR, boolean_type_node,
 			  count, build_int_cst (TREE_TYPE (count), 0));
+  if (forall_tmp->do_concurrent)
+	{
+	  cond = build1 (ANNOTATE_EXPR, TREE_TYPE (cond), cond);
+	  SET_ANNOTATE_EXPR_ID (cond, annot_expr_ivdep_kind);
+	}
+
   tmp = build1_v (GOTO_EXPR, exit_label);
   tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node,
 			 cond, tmp, build_empty_stmt (input_location));
@@ -3842,6 +3849,7 @@ gfc_trans_forall_1 (gfc_code * code, forall_info * nested_forall_info)
 	}
 
   tmp = gfc_finish_block (&body);
+  nested_forall_info->do_concurrent = true;
   tmp = gfc_trans_nested_forall_loop (nested_forall_info, tmp, 1);
   gfc_add_expr_to_block (&block, tmp);
   goto done;
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 b/gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90
new file mode 100644
index 000..7d56241
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-require-effective-target vect_float }
+! { dg-options "-O3 -fopt-info-vec-optimized" }
+
+subroutine test(n, a, b, c)
+  integer, value :: n
+  real, contiguous,  pointer :: a(:), b(:), c(:)
+  integer :: i
+  do concurrent (i = 1:n)
+a(i) = b(i) + c(i)
+  end do
+end subroutine test
+
+! { dg-message "loop vectorized" "" { target *-*-* } 0 }
+! { dg-bogus "version" "" { target *-*-* } 0 }
+! { dg-bogus "alias" "" { target *-*-* } 0 }
+! { dg-final { cleanup-tree-dump "vect" } }


Add a param to decide stack slot sharing at -O0

2013-10-08 Thread Easwaran Raman
In cfgexpand.c, variables in non-overlapping lexical scopes are
assigned the same stack location at -O1 and above. At -O0, this is
attempted only if the size of the stack objects is above a threshold
(32). The rationale is that at -O0 more variables live on the stack,
and the O(n^2) stack slot sharing algorithm would increase compilation
time. This patch replaces the constant with a param that defaults to
32. We ran into a case where the presence of the always_inline
attribute triggered -Wframe-larger-than warnings at -O0 but not at
-O2, since the different inlined copies share the stack. We are OK
with a slight increase in compilation time to get smaller stack frames
even at -O0, and this patch would allow us to do that easily.
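A minimal example of the kind of sharing involved (hypothetical code; buffer sizes chosen to exceed the default threshold of 32):

```c
#include <string.h>

/* buf1 and buf2 live in non-overlapping lexical scopes, so cfgexpand
   can give them the same stack slot at -O1 and above; at -O0 the
   sharing is attempted only for objects above the threshold that this
   patch turns into a --param.  */
int
f (int sel)
{
  if (sel)
    {
      char buf1[64];
      memset (buf1, 'a', sizeof buf1);
      return buf1[0];
    }
  else
    {
      char buf2[64];
      memset (buf2, 'b', sizeof buf2);
      return buf2[0];
    }
}
```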

Bootstraps on x86_64/linux. Is this ok for trunk?

Thanks,
Easwaran


2013-10-08  Easwaran Raman 

* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
* cfgexpand.c (defer_stack_allocation): ...use it here.


cfgexpand.patch
Description: Binary data


Re: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-10-08 Thread Eric Botcazou
> I agree, that assigning a non-BLKmode to structures with zero-sized arrays
> should be considered a bug.

Fine, then let's apply Martin's patch, on mainline at least.

> And again, this is not only a problem of structures with zero-sized
> arrays at the end. Remember my previous example code:
> On ARM (or anything with STRICT_ALIGNMENT) this union has the
> same problems:
> 
> /* PR middle-end/57748 */
> /* arm-eabi-gcc -mcpu=cortex-a9 -O3 */
> #include <stdlib.h>
> 
> union  x
> {
>   short a[2];
>   char x[4];
> } __attribute__((packed, aligned(4))) ;
> typedef volatile union  x *s;
> 
> void __attribute__((noinline, noclone))
> check (void)
> {
>   s xx=(s)(0x8002);
>   /* although volatile xx->x[3] reads 4 bytes here */
>   if (xx->x[3] != 3)
> abort ();
> }
> 
> void __attribute__((noinline, noclone))
> foo (void)
> {
>   s xx=(s)(0x8002);
>   xx->x[3] = 3;
> }
> 
> int
> main ()
> {
>   foo ();
>   check ();
>   return 0;
> }

But this testcase is invalid on STRICT_ALIGNMENT platforms: xx is a pointer to 
a type with 4-byte alignment, so its value must be a multiple of 4.

-- 
Eric Botcazou


Patch: Add #pragma ivdep support to the ME and C FE (was: Re: RFC patch for #pragma ivdep)

2013-10-08 Thread Tobias Burnus

Jakub Jelinek wrote:

On Tue, Oct 08, 2013 at 08:51:50AM +0200, Tobias Burnus wrote:

+  if (loop->latch && loop->latch->next_bb != EXIT_BLOCK_PTR
+  && bb_seq_addr (loop->latch->next_bb))

Why this bb_seq_addr guard?


Without it, I get a segfault in the stage 1 (prev-gcc/xg++) compiler when 
building stage 2's cfgloop.c. I now fixed it more properly using:

if (e->dest->flags & BB_RTL)
  break;


GIMPLE_COND must be the last stmt in a bb.  So, instead of the walk just
do
   gimple stmt = last_stmt (loop->latch->next_bb);
   if (stmt && gimple_code (stmt) == GIMPLE_COND)

Also, not sure if you really want loop->latch->next_bb rather than
look through succs of loop->latch or similar, next_bb is really chaining
of bb's together in some order, doesn't imply there is an edge in between
the previous and next bb and what the edge kind is.


I now did so. I also added a test case and documentation. I will post 
the Fortran and C++ parser patches as follow up.


Attached is the new patch, freshly bootstrapped and regtested on 
x86-64-gnu-linux.

Comments/suggestions? Or is it already OK for the trunk?

Tobias
2013-08-10  Tobias Burnus  

	PR other/33426
* c-pragma.c (init_pragma) Add #pragma ivdep handling.
* c-pragma.h (pragma_kind): Add PRAGMA_IVDEP.

	PR other/33426
* c-parser.c (c_parser_pragma, c_parser_for_statement):
Handle PRAGMA_IVDEP.
(c_parser_statement_after_labels): Update call.

	PR other/33426
* cfgloop.c (flow_loops_find): Search for IFN_ANNOTATE
and set safelen.
* gimplify.c (gimple_boolify, gimplify_expr): Handle ANNOTATE_EXPR.
* internal-fn.c (expand_ANNOTATE): New function.
* internal-fn.def (ANNOTATE): Define as new internal function.
* tree-core.h (tree_node_kind): Add annot_expr_ivdep_kind.
(tree_base) Update a comment.
* tree-pretty-print.c (dump_generic_node): Handle ANNOTATE_EXPR.
* tree.def (ANNOTATE_EXPR): New DEFTREECODE.
* tree.h (ANNOTATE_EXPR_ID, SET_ANNOTATE_EXPR_ID): New macros.
	* doc/extend.texi (Pragmas): Document #pragma ivdep.

	PR other/33426
	* testsuite/gcc.dg/vect/vect-ivdep-1.c: New.

diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 309859f..06dbf17 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1353,6 +1353,8 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "pch_preprocess",
   PRAGMA_GCC_PCH_PREPROCESS, false, false);
 
+  cpp_register_deferred_pragma (parse_in, 0, "ivdep", PRAGMA_IVDEP, false,
+false);
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 41215db..c826fbd 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -46,6 +46,7 @@ typedef enum pragma_kind {
   PRAGMA_OMP_THREADPRIVATE,
 
   PRAGMA_GCC_PCH_PREPROCESS,
+  PRAGMA_IVDEP,
 
   PRAGMA_FIRST_EXTERNAL
 } pragma_kind;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b612e29..6bf9fbf 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1150,7 +1150,7 @@ static void c_parser_if_statement (c_parser *);
 static void c_parser_switch_statement (c_parser *);
 static void c_parser_while_statement (c_parser *);
 static void c_parser_do_statement (c_parser *);
-static void c_parser_for_statement (c_parser *);
+static void c_parser_for_statement (c_parser *, bool);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -4495,7 +4495,7 @@ c_parser_statement_after_labels (c_parser *parser)
 	  c_parser_do_statement (parser);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser);
+	  c_parser_for_statement (parser, false);
 	  break;
 	case RID_GOTO:
 	  c_parser_consume_token (parser);
@@ -4948,7 +4948,7 @@ c_parser_do_statement (c_parser *parser)
 */
 
 static void
-c_parser_for_statement (c_parser *parser)
+c_parser_for_statement (c_parser *parser, bool ivdep)
 {
   tree block, cond, incr, save_break, save_cont, body;
   /* The following are only used when parsing an ObjC foreach statement.  */
@@ -5054,8 +5054,17 @@ c_parser_for_statement (c_parser *parser)
 	{
 	  if (c_parser_next_token_is (parser, CPP_SEMICOLON))
 	{
-	  c_parser_consume_token (parser);
-	  cond = NULL_TREE;
+	  if (ivdep)
+		{
+	  c_parser_error (parser, "missing loop condition in loop with "
+  "IVDEP pragma");
+		  cond = error_mark_node;
+		}
+	  else
+		{
+		  c_parser_consume_token (parser);
+		  cond = NULL_TREE;
+		}
 	}
 	  else
 	{
@@ -5069,6 +5078,12 @@ c_parser_for_statement (c_parser *parser)
 	  c_parser_skip_until_found (parser, CPP_SEMICOLON,
 	 "expected %<;%>");
 	}
+	  if (ivdep)
+	{
+	  cond = build1 (ANNOTATE_EXPR, TREE_TYPE (cond), cond);
+	  SET_ANNOTATE_EXPR_ID (cond, annot

[4/6] OpenMP 4.0 gcc testsuite

2013-10-08 Thread Jakub Jelinek
Hi!

Apparently the 3/6 and 4/6 patches didn't make it through to gcc-patches,
although they aren't the largest.  Trying to post them bzip2ed now
to see if I have more luck getting them through.

2013-10-08  Jakub Jelinek  

gcc/testsuite/
* c-c++-common/gomp/atomic-15.c: Adjust for C diagnostics.
Remove error test that is now valid in OpenMP 4.0.
* c-c++-common/gomp/atomic-16.c: New test.
* c-c++-common/gomp/cancel-1.c: New test.
* c-c++-common/gomp/depend-1.c: New test.
* c-c++-common/gomp/depend-2.c: New test.
* c-c++-common/gomp/map-1.c: New test.
* c-c++-common/gomp/pr58472.c: New test.
* c-c++-common/gomp/sections1.c: New test.
* c-c++-common/gomp/simd1.c: New test.
* c-c++-common/gomp/simd2.c: New test.
* c-c++-common/gomp/simd3.c: New test.
* c-c++-common/gomp/simd4.c: New test.
* c-c++-common/gomp/simd5.c: New test.
* c-c++-common/gomp/single1.c: New test.
* g++.dg/gomp/block-0.C: Adjust for stricter #pragma omp sections
parser.
* g++.dg/gomp/block-3.C: Likewise.
* g++.dg/gomp/clause-3.C: Adjust error messages.
* g++.dg/gomp/declare-simd-1.C: New test.
* g++.dg/gomp/declare-simd-2.C: New test.
* g++.dg/gomp/depend-1.C: New test.
* g++.dg/gomp/depend-2.C: New test.
* g++.dg/gomp/target-1.C: New test.
* g++.dg/gomp/target-2.C: New test.
* g++.dg/gomp/taskgroup-1.C: New test.
* g++.dg/gomp/teams-1.C: New test.
* g++.dg/gomp/udr-1.C: New test.
* g++.dg/gomp/udr-2.C: New test.
* g++.dg/gomp/udr-3.C: New test.
* g++.dg/gomp/udr-4.C: New test.
* g++.dg/gomp/udr-5.C: New test.
* gcc.dg/autopar/outer-1.c: Expect 4 instead of 5 loopfn matches.
* gcc.dg/autopar/outer-2.c: Likewise.
* gcc.dg/autopar/outer-3.c: Likewise.
* gcc.dg/autopar/outer-4.c: Likewise.
* gcc.dg/autopar/outer-5.c: Likewise.
* gcc.dg/autopar/outer-6.c: Likewise.
* gcc.dg/autopar/parallelization-1.c: Likewise.
* gcc.dg/gomp/block-3.c: Adjust for stricter #pragma omp sections
parser.
* gcc.dg/gomp/clause-1.c: Adjust error messages.
* gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime
instead of GOMP_parallel_loop_runtime_start.
* gcc.dg/gomp/declare-simd-1.c: New test.
* gcc.dg/gomp/declare-simd-2.c: New test.
* gcc.dg/gomp/nesting-1.c: Adjust for stricter #pragma omp sections
parser.  Add further #pragma omp sections nesting tests.
* gcc.dg/gomp/target-1.c: New test.
* gcc.dg/gomp/target-2.c: New test.
* gcc.dg/gomp/taskgroup-1.c: New test.
* gcc.dg/gomp/teams-1.c: New test.
* gcc.dg/gomp/udr-1.c: New test.
* gcc.dg/gomp/udr-2.c: New test.
* gcc.dg/gomp/udr-3.c: New test.
* gcc.dg/gomp/udr-4.c: New test.
* gfortran.dg/gomp/appendix-a/a.35.5.f90: Add dg-error.


Jakub


gomp4-merge-4.patch.bz2
Description: BZip2 compressed data


Re: New attribute: returns_nonnull

2013-10-08 Thread Jeff Law

On 10/08/13 13:41, Marc Glisse wrote:

Glad to see you checked this and have a test for it.


I am slowly starting to understand how reviewers think ;-)
That's a huge part of the submission process.  Once you know what folks 
are looking for, the path to approval gets appropriately short :-)






Not required for approval, but an "extra credit" -- a warning if a
NULL value flows into a return statement in a function with this marking.

Similarly, not required for approval, but it'd be real cool if we
could back-propagate the non-null return value attribute.  ie, any
value flowing into the return statement of one of these functions can
be assumed to be non-zero, which may help eliminate more null pointer
checks in the decorated function.  I guess ultimately we'd have to see
if noting this actually helps any real code.

Also not required for approval, but adding returns_nonnull markups to
appropriate functions in gcc itself.


I completely agree that returns_nonnull has more uses. The warning is
the first one (not sure where to add such a check though), and maybe we
should have a sanitizer option to check the nonnull and returns_nonnull
attributes, although I don't know if that should be in the caller, in
the callee, or both.
Well, it seems to me there are both compile-time and runtime (sanitizer) 
possibilities.  I'm much more familiar with the compile-time analysis 
and can comment on that much more substantially.


In a function with the attribute, what we want to know is if any NULL 
values reach the return statement.  Obviously if CCP/VRP/DOM manage to 
propagate a NULL into a return statement, then we issue a "returns NULL" 
warning, much like we do for "is used uninitialized".


The more interesting case is if a value is defined by a PHI.  If a NULL 
is found in the PHI argument list, then we'd want to use a "may return 
NULL warning" much like we do for "may be used uninitialized".  Note 
this is true anytime we have a NULL value in a PHI arg and the result of 
the PHI can ultimately reach a return statement.
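In code, that PHI case might look like this (hypothetical example):

```c
#include <stddef.h>

static int slot;

/* NULL flows into one PHI argument and the PHI result reaches the
   return statement; in a function carrying returns_nonnull this would
   justify a "may return NULL" warning along the c != 0 path.  */
void *
maybe_null (int c)
{
  void *p;
  if (c)
    p = NULL;
  else
    p = &slot;
  return p;
}
```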



If the set of all values flowing to the return are SSA_NAMEs and range 
information indicates some might be NULL, then the question is do we 
issue a warning or not.  My inclination would be yes, but we'd want it 
to be separate from the earlier two cases due to the high noise ratio.


Note there is significant overlap with a queued enhancement of mine to 
warn about potential null pointer dereferences -- which needs to work in 
precisely the same manner and has the same problems with signal to noise 
ratios, particularly in the 3rd case.  Given that, I'd be happy to pull 
that patch out of my stash and let you play with it if you're 
interested.  It hasn't been hacked on in a couple years, so it'd need 
some updating.



In terms of exploiting the non-nullness, lots of fun here too.

For example, if CCP/VRP/DOM propagate a NULL into a return statement, a 
program executing that code should be declared as having undefined 
behaviour.  Assuming we do that, then we change the return 0 into a trap.


If there is a PHI argument with a NULL value that then feeds a return 
statement, we should isolate that path by block duplication.  Once the 
path is isolated, it turns into the former case and we apply the same 
transformation.



As it turns out I'm working right now on cleaning up a change to do 
precisely that for cases where a NULL value feeds into a pointer 
dereference :-)



Finally, backpropagation.  Any SSA_NAME that directly or indirectly 
reaches a return statement in one of these decorated functions has a 
known non-zero value.  It shouldn't be too hard to teach that to VRP 
which will then use that information to do more aggressive NULL pointer 
elimination.




 From an optimization POV, I guess there is also the inlining that could
be improved so it doesn't lose the nonnull property.

The main request I am aware of is PR21856, but I am not familiar enough
with java to handle it. Do you have suggestions of functions which
should get the attribute? memcpy could, but that's really a special case
since it returns its argument, which is itself non-null.
The most obvious are things like xmalloc and wrappers around it, 
probably the GC allocation routines and wrappers around those.  Maybe 
some of the str* and mem* functions as well, I'd have to look at the ISO 
specs to be sure though.
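A sketch of the suggested markup on an xmalloc-style wrapper (hypothetical code; the name and abort-on-failure behavior follow the usual libiberty convention):

```c
#include <stdlib.h>

/* The wrapper aborts instead of returning NULL, so with the attribute
   visible the compiler may delete callers' NULL checks on its result.  */
__attribute__ ((returns_nonnull))
void *
xmalloc (size_t size)
{
  void *p = malloc (size ? size : 1);
  if (p == NULL)
    abort ();
  return p;
}
```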


You end up wanting to do a transitive closure on whatever initial set of 
non-null functions you come up with and any functions which call them 
and return those results unconditionally.  You could even build a 
warning around that -- ie, "is missing non-null return attribute" kind 
of warning when you can prove that all values feeding the return 
statement are non-null by whatever means are available to you.


With regard to 21856, the caveat I would mention for that is java as a 
GCC front-end is becoming less and less important over time.   So I'd 
suggest focusing on how this can be used for the more impor

Re: Cleanup patches

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 09:17:35AM +0200, Thomas Schwinge wrote:
> Here are a few cleanup patches, mostly in the realm of OpenMP, so Jakub
> gets a CC.  OK to commit?

They look ok to me, but I'd prefer if they could go in after the merge
I've just posted.

Jakub


Re: [1/6] OpenMP 4.0 C FE support

2013-10-08 Thread Jakub Jelinek
Hi!

Sorry for the subject, that was meant to be [2/6], not [1/6].

Jakub


[0/6] Merge of gomp-4_0-branch to trunk

2013-10-08 Thread Jakub Jelinek
Hi!

I'd like to merge (most of) gomp-4_0-branch to trunk.
Except for unknown bugs and known outstanding unclear spots in
the standard, the branch right now implements OpenMP 4.0 standard
with the following caveats that I hope can be dealt with incrementally
later on:
1) Fortran front-end support of OpenMP 4.0 isn't written, so Fortran
   will for now only support OpenMP 3.1 and earlier, except for
   library only features.
2) offloading isn't implemented; for the merge I've actually taken out
   the splay tree and plugin support from target.c, because so far it
   was only useful for hack testing, there aren't any plugins yet,
   and when there are, target.c will need to be modified.
   #pragma omp target{, data, update} is supported, but for now will
   always fall back to host execution.
3) #pragma omp declare simd is right now parsed and diagnosed, but
   we don't actually generate the elemental function entry points
   (an ABI issue for public functions, but we have yet to decide on
   the ABI) and the vectorizer doesn't use them (only an optimization
   issue).
4) C++ tasks with firstprivate variables that need copy constructors
   can't be discarded right now during cancellation, because there is
   no way yet to call corresponding destructors.  They will be therefore
   started and if they hit some early cancellation point, they will be
   cancelled there, otherwise they will simply run all the way through.
   Shouldn't be that hard to handle this, but IMHO can be done
   incrementally.

I've bootstrapped/regtested the merge on x86_64-linux and i686-linux.
I think most of the patch if not all of it falls under my OpenMP
maintainership and the vectorization part of the OpenMP 4.0 support
has been merged earlier, still I'd appreciate acks from the front-end
maintainers because the patch is very large and affects the frontends
quite a lot.  And of course, if anybody has further comments, they'll
be appreciated.

Because the patch is too large for gcc-patches limits (1.15MB), I've
split it into 6 parts:
[1/6] gcc/ support other than gcc/{c/,cp/,testsuite/}
[2/6] C FE changes
[3/6] C++ FE changes
[4/6] gcc/testsuite
[5/6] libgomp/ (other than libgomp/testsuite/)
[6/6] libgomp/testsuite/

 gcc/ada/gcc-interface/utils.c|6 
 gcc/builtin-types.def|   35 
 gcc/c-family/c-common.c  |   32 
 gcc/c-family/c-common.h  |  148 
 gcc/c-family/c-cppbuiltin.c  |2 
 gcc/c-family/c-omp.c |  408 +
 gcc/c-family/c-pragma.c  |9 
 gcc/c-family/c-pragma.h  |   39 
 gcc/c/c-decl.c   |  176 
 gcc/c/c-lang.h   |3 
 gcc/c/c-parser.c | 4017 
 gcc/c/c-tree.h   |9 
 gcc/c/c-typeck.c | 1015 
 gcc/cp/cp-array-notation.c   |1 
 gcc/cp/cp-gimplify.c |   22 
 gcc/cp/cp-objcp-common.h |2 
 gcc/cp/cp-tree.h |   35 
 gcc/cp/decl.c|   52 
 gcc/cp/decl2.c   |   63 
 gcc/cp/parser.c  | 4552 +--
 gcc/cp/parser.h  |   18 
 gcc/cp/pt.c  |  289 +
 gcc/cp/semantics.c   | 1523 ++
 gcc/fortran/f95-lang.c   |   18 
 gcc/fortran/trans-openmp.c   |6 
 gcc/fortran/types.def|   36 
 gcc/gimple-low.c |3 
 gcc/gimple-pretty-print.c|  119 
 gcc/gimple.c |   79 
 gcc/gimple.def   |   24 
 gcc/gimple.h |  266 +
 gcc/gimplify.c   |  603 ++
 gcc/langhooks-def.h  |3 
 gcc/langhooks.c  |9 
 gcc/langhooks.h  |3 
 gcc/lto/lto-lang.c   |6 
 gcc/omp-builtins.def |   76 
 gcc/omp-low.c| 2646 +--
 gcc/testsuite/c-c++-common/gomp/atomic-15.c  |   34 
 gcc/testsuite/c-c++-common/gomp/atomic-16.c  |   34 
 gcc/testsuite/c-c++-common/gomp/cancel-1.c   |  396 +
 gcc/testsuite/c-c++-common/gomp/depend-1.c   |   79 
 gcc/testsuite/c-c++-common/gomp/depend-2.c   |   19 
 gcc/testsuite/c-c++-common/gomp/map-1.c 

Re: New attribute: returns_nonnull

2013-10-08 Thread Marc Glisse

On Tue, 8 Oct 2013, Jeff Law wrote:


On 10/07/13 08:17, Marc Glisse wrote:

Hello,

this patch adds an attribute to let the compiler know that a function
never returns NULL. I saw some ECF_* flags, but the attribute seems
sufficient. I considered using nonnull(0), but then it would have been
confusing that the version of nonnull without arguments applies only to
parameters and not the return value.

2013-10-08  Marc Glisse 

 PR tree-optimization/20318
gcc/c-family/
 * c-common.c (handle_returns_nonnull_attribute): New function.
 (c_common_attribute_table): Add returns_nonnull.

gcc/
 * doc/extend.texi (returns_nonnull): New function attribute.
 * fold-const.c (tree_expr_nonzero_warnv_p): Look for returns_nonnull
 attribute.
 * tree-vrp.c (gimple_stmt_nonzero_warnv_p): Likewise.
 (stmt_interesting_for_vrp): Accept all GIMPLE_CALL.

gcc/testsuite/
 * c-c++-common/pr20318.c: New file.
 * gcc.dg/tree-ssa/pr20318.c: New file.

--
Marc Glisse

p12


Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 203241)
+++ c-family/c-common.c (working copy)
@@ -740,20 +741,22 @@ const struct attribute_spec c_common_att
{ "*tm regparm",0, 0, false, true, true,
  ignore_attribute, false },
{ "no_split_stack", 0, 0, true,  false, false,
  handle_no_split_stack_attribute, false },
/* For internal use (marking of builtins and runtime functions) only.
   The name contains space to prevent its usage in source code.  */
{ "fn spec",1, 1, false, true, true,
  handle_fnspec_attribute, false },
{ "warn_unused",0, 0, false, false, false,
  handle_warn_unused_attribute, false },
+  { "returns_nonnull",0, 0, false, true, true,
+ handle_returns_nonnull_attribute, false },
{ NULL, 0, 0, false, false, false, NULL, false }
  };
I'm going to assume this is correct -- it looks sane, but I've never really 
done much with the attribute tables.


I looked at nonnull and noreturn, and the second one says that it is wrong 
and should be like nonnull, so I mostly copied from nonnull. I can't say I 
really understand what it is doing, but I was happy that everything worked 
so nicely.



+
+/* Handle a "returns_nonnull" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_nonnull_attribute (tree *node, tree, tree, int,
+ bool *no_add_attrs)
+{
+  // Even without a prototype we still have a return type we can check.
+  if (TREE_CODE (TREE_TYPE (*node)) != POINTER_TYPE)
+{
+  error ("returns_nonnull attribute on a function not returning a pointer");

+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}

Glad to see you checked this and have a test for it.


I am slowly starting to understand how reviewers think ;-)

Not required for approval, but an "extra credit" -- a warning if a NULL value 
flows into a return statement in a function with this marking.


Similarly, not required for approval, but it'd be real cool if we could 
back-propagate the non-null return value attribute.  ie, any value flowing 
into the return statement of one of these functions can be assumed to be 
non-zero, which may help eliminate more null pointer checks in the decorated 
function.  I guess ultimately we'd have to see if noting this actually helps 
any real code.


Also not required for approval, but adding returns_nonnull markups to 
appropriate functions in gcc itself.


I completely agree that returns_nonnull has more uses. The warning is the 
first one (not sure where to add such a check though), and maybe we should 
have a sanitizer option to check the nonnull and returns_nonnull 
attributes, although I don't know if that should be in the caller, in the 
callee, or both.


From an optimization POV, I guess there is also the inlining that could be 
improved so it doesn't lose the nonnull property.

The main request I am aware of is PR21856, but I am not familiar enough 
with java to handle it. Do you have suggestions of functions which should 
get the attribute? memcpy could, but that's really a special case since it 
returns its argument, which is itself non-null.


I'll open a new PR about all these. I mostly implemented returns_nonnull 
because I'd just done the same for operator new and thus I knew exactly 
where the optimization was, I can't promise I'll manage much of the rest.




Index: testsuite/gcc.dg/tree-ssa/pr20318.c
===
--- testsuite/gcc.dg/tree-ssa/pr20318.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr20318.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
+/* { dg-options "-O2 -fd

Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (1).

2013-10-08 Thread Richard Sandiford
Chung-Ju Wu  writes:
> On 10/6/13 5:36 PM, Richard Sandiford wrote:
>> Thanks for the updates.
>> 
>> Chung-Ju Wu  writes:
>>>
>>> Now we remove all "use"s and "clobber"s from parallel rtx and
>>> use predicate function to check stack push/pop operation.
>>> Furthermore, once I remove unspec rtx as you suggested,
>>> I notice gcc is aware of the def-use dependency and it wouldn't
>>> perform unexpected register renaming.  So I think it was my
>>> faulty design of placing unspec/use/clobber within parallel rtx.
>> 
>> FWIW, it'd probably be better for nds32_valid_stack_push_pop to check
>> all the SETs in the PARALLEL, like nds32_valid_multiple_load_store does.
>> They could share a subroutine that checks for consecutive loads and stores.
>> I.e. something like:
>> 
>> static bool
>> nds32_valid_multiple_load_store_1 (rtx op, bool load_p, int start, int end)
>> {
>>   ...
>> }
>> 
>> bool
>> nds32_valid_multiple_load_store (rtx op, bool load_p)
>> {
>>   return nds32_valid_multiple_load_store_1 (op, load_p, 0, XVECLEN (op, 0));
>> }
>> 
>> bool
>> nds32_valid_multiple_push_pop (rtx op, bool load_p)
>> {
>>   ...handle $lp, $gp and $fp cases, based on cframe->...
>>   if (!nds32_valid_multiple_load_store_1 (op, load_p, i, count - 1))
>> return false;
>>   ...check last element...
>> }
>> 
>
> Now I follow your suggestion to implement a subroutine
> nds32_consecutive_registers_load_store_p().
> Both nds32_valid_multiple_load_store() and nds32_valid_stack_push_pop()
> can share this subroutine to check consecutive load/store behavior.
>
> A revised-3 patch for nds32.c is attached.

Looks good to me, thanks.  FWIW, I was wondering if we could simplify things
by making the order of the SETs in the PARALLEL the same for push and pop,
but it's probably not much of a win.

Thanks,
Richard


[C++ Patch] PR 58633

2013-10-08 Thread Paolo Carlini

Hi,

in this ICE on valid, 4.7/4.8/4.9 Regression, the ICE happens in the 
second half of cp_parser_decltype_expr, when 
cp_parser_abort_tentative_parse is called in an inconsistent state: 
tentative parsing has been committed and no errors have occurred. The 
reason is the following: in its main loop cp_parser_postfix_expression 
calls cp_parser_postfix_dot_deref_expression, which in turn calls 
cp_parser_pseudo_destructor_name, and everything appears to be fine, thus 
the latter commits the tentative parse; but member_access_only_p is 
*true* in the first half of cp_parser_decltype_expr, so even if 
cp_parser_postfix_expression later finds a valid () following the 
pseudo-destructor expression, it has to return error_mark_node anyway.


A possible way to resolve the inconsistency is to propagate 
member_access_only_p through cp_parser_postfix_expression down to 
cp_parser_pseudo_destructor_name and, in the latter, not commit the 
tentative parse when the flag is true.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2013-10-08  Paolo Carlini  

PR c++/58633
* parser.c (cp_parser_postfix_dot_deref_expression,
cp_parser_pseudo_destructor_name): Add bool parameter.
(cp_parser_postfix_expression, cp_parser_builtin_offsetof):
Adjust.

/testsuite
2013-10-08  Paolo Carlini  

PR c++/58633
* g++.dg/cpp0x/decltype57.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 203274)
+++ cp/parser.c (working copy)
@@ -1861,13 +1861,13 @@ static tree cp_parser_postfix_expression
 static tree cp_parser_postfix_open_square_expression
   (cp_parser *, tree, bool, bool);
 static tree cp_parser_postfix_dot_deref_expression
-  (cp_parser *, enum cpp_ttype, tree, bool, cp_id_kind *, location_t);
+  (cp_parser *, enum cpp_ttype, tree, bool, bool, cp_id_kind *, location_t);
 static vec *cp_parser_parenthesized_expression_list
   (cp_parser *, int, bool, bool, bool *);
 /* Values for the second parameter of cp_parser_parenthesized_expression_list. 
 */
 enum { non_attr = 0, normal_attr = 1, id_attr = 2 };
 static void cp_parser_pseudo_destructor_name
-  (cp_parser *, tree, tree *, tree *);
+  (cp_parser *, tree, tree *, tree *, bool);
 static tree cp_parser_unary_expression
   (cp_parser *, bool, bool, cp_id_kind *);
 static enum tree_code cp_parser_unary_operator
@@ -6028,7 +6028,9 @@ cp_parser_postfix_expression (cp_parser *parser, b
  postfix_expression
= cp_parser_postfix_dot_deref_expression (parser, token->type,
  postfix_expression,
- false, &idk, loc);
+ false,
+ member_access_only_p,
+ &idk, loc);
 
   is_member_access = true;
  break;
@@ -6261,7 +6263,9 @@ static tree
 cp_parser_postfix_dot_deref_expression (cp_parser *parser,
enum cpp_ttype token_type,
tree postfix_expression,
-   bool for_offsetof, cp_id_kind *idk,
+   bool for_offsetof,
+   bool member_access_only_p,
+   cp_id_kind *idk,
location_t location)
 {
   tree name;
@@ -6337,7 +6341,7 @@ cp_parser_postfix_dot_deref_expression (cp_parser
   /* Parse the pseudo-destructor-name.  */
   s = NULL_TREE;
   cp_parser_pseudo_destructor_name (parser, postfix_expression,
-   &s, &type);
+   &s, &type, member_access_only_p);
   if (dependent_p
  && (cp_parser_error_occurred (parser)
  || !SCALAR_TYPE_P (type)))
@@ -6610,7 +6614,8 @@ static void
 cp_parser_pseudo_destructor_name (cp_parser* parser,
  tree object,
  tree* scope,
- tree* type)
+ tree* type,
+ bool member_access_only_p)
 {
   bool nested_name_specifier_p;
 
@@ -6692,7 +6697,15 @@ cp_parser_pseudo_destructor_name (cp_parser* parse
   cp_parser_require (parser, CPP_COMPL, RT_COMPL);
 
   /* Once we see the ~, this has to be a pseudo-destructor.  */
-  if (!processing_template_decl && !cp_parser_error_occurred (parser))
+  if (!processing_template_decl && !cp_parser_error_occurred (parser)
+  /* ... but we don't want to prematurely commit when only
+member access expressions are allowed.  This happens
+when parsing a decltype, because pseudo destructor
+calls (5.2.4) are handled by cp_parser_decltype_expr
+via the final cp_parser_expression (which eventu

Re: New attribute: returns_nonnull

2013-10-08 Thread Jeff Law

On 10/07/13 08:17, Marc Glisse wrote:

Hello,

this patch adds an attribute to let the compiler know that a function
never returns NULL. I saw some ECF_* flags, but the attribute seems
sufficient. I considered using nonnull(0), but then it would have been
confusing that the version of nonnull without arguments applies only to
parameters and not the return value.

2013-10-08  Marc Glisse 

 PR tree-optimization/20318
gcc/c-family/
 * c-common.c (handle_returns_nonnull_attribute): New function.
 (c_common_attribute_table): Add returns_nonnull.

gcc/
 * doc/extend.texi (returns_nonnull): New function attribute.
 * fold-const.c (tree_expr_nonzero_warnv_p): Look for returns_nonnull
 attribute.
 * tree-vrp.c (gimple_stmt_nonzero_warnv_p): Likewise.
 (stmt_interesting_for_vrp): Accept all GIMPLE_CALL.

gcc/testsuite/
 * c-c++-common/pr20318.c: New file.
 * gcc.dg/tree-ssa/pr20318.c: New file.

--
Marc Glisse



Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 203241)
+++ c-family/c-common.c (working copy)
@@ -740,20 +741,22 @@ const struct attribute_spec c_common_att
{ "*tm regparm",0, 0, false, true, true,
  ignore_attribute, false },
{ "no_split_stack", 0, 0, true,  false, false,
  handle_no_split_stack_attribute, false },
/* For internal use (marking of builtins and runtime functions) only.
   The name contains space to prevent its usage in source code.  */
{ "fn spec",1, 1, false, true, true,
  handle_fnspec_attribute, false },
{ "warn_unused",0, 0, false, false, false,
  handle_warn_unused_attribute, false },
+  { "returns_nonnull",0, 0, false, true, true,
+ handle_returns_nonnull_attribute, false },
{ NULL, 0, 0, false, false, false, NULL, false }
  };
I'm going to assume this is correct -- it looks sane, but I've never 
really done much with the attribute tables.



+
+/* Handle a "returns_nonnull" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_nonnull_attribute (tree *node, tree, tree, int,
+ bool *no_add_attrs)
+{
+  // Even without a prototype we still have a return type we can check.
+  if (TREE_CODE (TREE_TYPE (*node)) != POINTER_TYPE)
+{
+  error ("returns_nonnull attribute on a function not returning a pointer");
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}

Glad to see you checked this and have a test for it.

Not required for approval, but an "extra credit" -- a warning if a NULL 
value flows into a return statement in a function with this marking.


Similarly, not required for approval, but it'd be real cool if we could 
back-propagate the non-null return value attribute.  ie, any value 
flowing into the return statement of one of these functions can be 
assumed to be non-zero, which may help eliminate more null pointer 
checks in the decorated function.  I guess ultimately we'd have to see 
if noting this actually helps any real code.


Also not required for approval, but adding returns_nonnull markups to 
appropriate functions in gcc itself.





Index: testsuite/gcc.dg/tree-ssa/pr20318.c
===
--- testsuite/gcc.dg/tree-ssa/pr20318.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr20318.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
+/* { dg-options "-O2 -fdump-tree-original -fdump-tree-vrp1" } */
+
+extern int* f(int) __attribute__((returns_nonnull));
+extern void eliminate ();
+void g () {
+  if (f (2) == 0)
+eliminate ();
+}
+void h () {
+  int *p = f (2);
+  if (p == 0)
+eliminate ();
+}
+
+/* { dg-final { scan-tree-dump-times "== 0" 1 "original" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate\[^\\n\]*to 0" 1 "vrp1" 
} } */
+/* { dg-final { cleanup-tree-dump "original" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
Presumably g() is testing the fold-const.c change and h() tests the tree-vrp 
changes, right?


This is OK for the trunk, please install.

Jeff


[gomp4] Fix thread-limit-{1,2}.c tests

2013-10-08 Thread Jakub Jelinek
Hi!

The last problem I've occasionally seen FAILing under high load
was a thinko in the thread-limit-1.c test.

The test meant to verify that no more than omp_get_thread_limit ()
total threads are running in the same contention group, but foolishly
assumed that all the nested GOMP_parallels will happen at about the same
time and thus there will be at most 6 threads running in total.
But, if some threads run through quickly and finish, while others are slow
to reach GOMP_barrier, then the ThreadsBusy counter might be already
decremented before deciding how many threads are allowed to be started,
so we could increment cnt more than 6 times.

This patch rewrites it such that it really tests if no more than 6
threads are running at the same time.

2013-10-08  Jakub Jelinek  

* testsuite/libgomp.c/thread-limit-1.c (main): Check if
cnt isn't bigger than 6 at any point in time, sleep 10ms after
incrementing it and then atomically decrement.
* testsuite/libgomp.c/thread-limit-2.c (main): Likewise.

--- libgomp/testsuite/libgomp.c/thread-limit-1.c.jj 2013-10-04 
15:41:48.0 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-1.c2013-10-08 
20:30:47.841114157 +0200
@@ -27,9 +27,15 @@ main ()
   #pragma omp parallel num_threads (5)
   #pragma omp parallel num_threads (5)
   #pragma omp parallel num_threads (2)
-  #pragma omp atomic
-  cnt++;
-  if (cnt > 6)
-abort ();
+  {
+int v;
+#pragma omp atomic capture
+v = ++cnt;
+if (v > 6)
+  abort ();
+usleep (1);
+#pragma omp atomic
+--cnt;
+  }
   return 0;
 }
--- libgomp/testsuite/libgomp.c/thread-limit-2.c.jj 2013-10-04 
15:48:28.0 +0200
+++ libgomp/testsuite/libgomp.c/thread-limit-2.c2013-10-08 
20:32:34.281578026 +0200
@@ -41,10 +41,16 @@ main ()
#pragma omp parallel num_threads (5)
#pragma omp parallel num_threads (5)
#pragma omp parallel num_threads (2)
-   #pragma omp atomic
-   cnt++;
-   if (cnt > 6)
- abort ();
+   {
+ int v;
+ #pragma omp atomic capture
+ v = ++cnt;
+ if (v > 6)
+   abort ();
+ usleep (1);
+ #pragma omp atomic
+ --cnt;
+   }
   }
   }
   return 0;

Jakub


[PATCH]: Fix PR58542, Arguments of __atomic_* functions are converted in unsigned mode

2013-10-08 Thread Uros Bizjak
Hello!

As shown in the attached testcase, arguments of various __atomic
builtins should be converted as signed, so the immediates get properly
extended.

2013-10-08  Uros Bizjak  

* optabs.c (maybe_emit_atomic_exchange): Convert operands as signed.
(maybe_emit_sync_lock_test_and_set): Ditto.
(expand_atomic_compare_and_swap): Ditto.
(maybe_emit_op): Ditto.

testsuite/ChangeLog:

2013-10-08  Uros Bizjak  

* g++.dg/ext/atomic-2.C: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

OK for mainline and release branches?

Uros.
Index: optabs.c
===
--- optabs.c(revision 203285)
+++ optabs.c(working copy)
@@ -7041,7 +7041,7 @@ maybe_emit_atomic_exchange (rtx target, rtx mem, r
   create_output_operand (&ops[0], target, mode);
   create_fixed_operand (&ops[1], mem);
   /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
-  create_convert_operand_to (&ops[2], val, mode, true);
+  create_convert_operand_to (&ops[2], val, mode, false);
   create_integer_operand (&ops[3], model);
   if (maybe_expand_insn (icode, 4, ops))
return ops[0].value;
@@ -7081,7 +7081,7 @@ maybe_emit_sync_lock_test_and_set (rtx target, rtx
   create_output_operand (&ops[0], target, mode);
   create_fixed_operand (&ops[1], mem);
   /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
-  create_convert_operand_to (&ops[2], val, mode, true);
+  create_convert_operand_to (&ops[2], val, mode, false);
   if (maybe_expand_insn (icode, 3, ops))
return ops[0].value;
 }
@@ -7336,8 +7336,8 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool,
   create_output_operand (&ops[0], target_bool, bool_mode);
   create_output_operand (&ops[1], target_oval, mode);
   create_fixed_operand (&ops[2], mem);
-  create_convert_operand_to (&ops[3], expected, mode, true);
-  create_convert_operand_to (&ops[4], desired, mode, true);
+  create_convert_operand_to (&ops[3], expected, mode, false);
+  create_convert_operand_to (&ops[4], desired, mode, false);
   create_integer_operand (&ops[5], is_weak);
   create_integer_operand (&ops[6], succ_model);
   create_integer_operand (&ops[7], fail_model);
@@ -7358,8 +7358,8 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool,
 
   create_output_operand (&ops[0], target_oval, mode);
   create_fixed_operand (&ops[1], mem);
-  create_convert_operand_to (&ops[2], expected, mode, true);
-  create_convert_operand_to (&ops[3], desired, mode, true);
+  create_convert_operand_to (&ops[2], expected, mode, false);
+  create_convert_operand_to (&ops[3], desired, mode, false);
   if (!maybe_expand_insn (icode, 4, ops))
return false;
 
@@ -7788,7 +7788,7 @@ maybe_emit_op (const struct atomic_op_functions *o
 
   create_fixed_operand (&ops[op_counter++], mem);
   /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
-  create_convert_operand_to (&ops[op_counter++], val, mode, true);
+  create_convert_operand_to (&ops[op_counter++], val, mode, false);
 
   if (maybe_expand_insn (icode, num_ops, ops))
 return (target == const0_rtx ? const0_rtx : ops[0].value);
Index: testsuite/g++.dg/ext/atomic-2.C
===
--- testsuite/g++.dg/ext/atomic-2.C (revision 0)
+++ testsuite/g++.dg/ext/atomic-2.C (working copy)
@@ -0,0 +1,18 @@
+// { dg-do run { target c++11 } }
+// { dg-require-effective-target sync_int_128_runtime }
+// { dg-options "-O2" }
+// { dg-additional-options "-mcx16" { target { i?86-*-* x86_64-*-* } } }
+
+#include 
+#include 
+
+int
+main (int, char **)
+{
+  std::atomic < __int128_t > i;
+
+  i = -1;
+
+  assert (int64_t (i >> 64) == -1);
+  assert (int64_t (i) == -1);
+}


[gomp4] Fix task cancellation

2013-10-08 Thread Jakub Jelinek
Hi!

I've noticed that occasionally on a larger box under high load the
cancel-taskgroup-{2.c,2.C,3.C} tests would time out.
The problem is that sometimes gomp_team_barrier_clear_task_pending
wasn't called when GOMP_taskgroup_end (or possibly GOMP_taskwait)
ate the last few threads.
We have task_count counter (number of GOMP_TASK_{WAITING,TIED} tasks
that haven't finished yet) and then task_running_count, which is only
incremented in gomp_barrier_handle_tasks and decremented there too,
and we use those for two purposes:
1) to find out if it makes sense to wake some threads possibly sleeping
   on a barrier
2) to find out if it is ok to call gomp_team_barrier_clear_task_pending
Pre-gomp-4_0-branch, we were actually incrementing task_running_count
in both gomp_barrier_handle_tasks and GOMP_taskwait (GOMP_taskgroup_end
didn't exist yet), and thus task_running_count was actually usable for 2),
but could be way off for 1) - task_running_count could be much bigger than
nthreads, even when say all the running tasks were on one or two threads,
all caused by GOMP_taskwait inside of task spawned from GOMP_taskwait,
etc.  I've changed it to only increment task_running_count in
gomp_barrier_handle_tasks, so it is actually guaranteed to be <= nthreads
and can be reliably used for 1), but that apparently broke the spots
using it for 2).  So, this patch introduces another counter, number of
GOMP_TASK_WAITING tasks ready to be scheduled, but not scheduled yet,
and we simply clear pending tasks bits when that counter decrements to zero.

Tested on x86_64-linux and i686-linux.

2013-10-08  Jakub Jelinek  

* libgomp.h (struct gomp_team): Add task_queued_count field.
Add comments about task_{,queued_,running_}count.
* team.c (gomp_new_team): Clear task_queued_count.
* task.c (GOMP_task): Increment task_queued_count.
(gomp_task_run_pre): Decrement task_queued_count.  If it is
decremented to zero, call gomp_team_barrier_clear_task_pending.
(gomp_task_run_post_handle_dependers): Increment task_queued_count.
(gomp_barrier_handle_tasks): Don't call
gomp_team_barrier_clear_task_pending here.

--- libgomp/libgomp.h.jj2013-10-04 13:48:39.0 +0200
+++ libgomp/libgomp.h   2013-10-08 19:17:12.303460999 +0200
@@ -385,8 +385,18 @@ struct gomp_team
 
   gomp_mutex_t task_lock;
   struct gomp_task *task_queue;
-  int task_count;
-  int task_running_count;
+  /* Number of all GOMP_TASK_{WAITING,TIED} tasks in the team.  */
+  unsigned int task_count;
+  /* Number of GOMP_TASK_WAITING tasks currently waiting to be scheduled.  */
+  unsigned int task_queued_count;
+  /* Number of GOMP_TASK_{WAITING,TIED} tasks currently running
+ directly in gomp_barrier_handle_tasks; tasks spawned
+ from e.g. GOMP_taskwait or GOMP_taskgroup_end don't count, even when
+ that is called from a task run from gomp_barrier_handle_tasks.
+ task_running_count should be always <= team->nthreads,
+ and if current task isn't in_tied_task, then it will be
+ even < team->nthreads.  */
+  unsigned int task_running_count;
   int work_share_cancelled;
   int team_cancelled;
 
--- libgomp/team.c.jj   2013-10-04 18:44:31.0 +0200
+++ libgomp/team.c  2013-10-08 19:03:25.067640468 +0200
@@ -172,6 +172,7 @@ gomp_new_team (unsigned nthreads)
   gomp_mutex_init (&team->task_lock);
   team->task_queue = NULL;
   team->task_count = 0;
+  team->task_queued_count = 0;
   team->task_running_count = 0;
   team->work_share_cancelled = 0;
   team->team_cancelled = 0;
--- libgomp/task.c.jj   2013-09-27 12:04:12.0 +0200
+++ libgomp/task.c  2013-10-08 19:50:46.057284830 +0200
@@ -407,6 +407,7 @@ GOMP_task (void (*fn) (void *), void *da
  team->task_queue = task;
}
   ++team->task_count;
+  ++team->task_queued_count;
   gomp_team_barrier_set_task_pending (&team->barrier);
   do_wake = team->task_running_count + !parent->in_tied_task
< team->nthreads;
@@ -434,6 +435,8 @@ gomp_task_run_pre (struct gomp_task *chi
team->task_queue = NULL;
 }
   child_task->kind = GOMP_TASK_TIED;
+  if (--team->task_queued_count == 0)
+gomp_team_barrier_clear_task_pending (&team->barrier);
   if ((gomp_team_barrier_cancelled (&team->barrier)
|| (taskgroup && taskgroup->cancelled))
   && !child_task->copy_ctors_done)
@@ -538,6 +541,7 @@ gomp_task_run_post_handle_dependers (str
  team->task_queue = task;
}
   ++team->task_count;
+  ++team->task_queued_count;
   ++ret;
 }
   free (child_task->dependers);
@@ -670,8 +674,6 @@ gomp_barrier_handle_tasks (gomp_barrier_
}
  team->task_running_count++;
  child_task->in_tied_task = true;
- if (team->task_count == team->task_running_count)
-   gomp_team_barrier_clear_task_pending (&team->barrier);
}
   gomp_mutex_unlock (&team->task_lock);
   if (do_wake)

Jakub


Re: [c++-concepts] friends regression

2013-10-08 Thread Andrew Sutton
No, any current_template_reqs are reset (set to null) before parsing
any trailing requirements and restored after the fact.

Andrew Sutton


On Mon, Oct 7, 2013 at 3:05 PM, Jason Merrill  wrote:
> OK.
>
> If we have a friend declaration inside a constrained partial specialization,
> will that still get a false positive?
>
> Jason


Re: [patch] Fix PR middle-end/58570

2013-10-08 Thread Eric Botcazou
> Probably because the actual accesses may overlap if we choose to
> perform a bigger access.

Nope, simply because they share a byte.

> The same can happen if we for struct { char c1; char c2; } perform
> an HImode access in case the target doesn't support QImode accesses.
> Basically anytime we go through the bitfield expansion path.  Thus, doesn't
> that mean that MEM_EXPR is wrong on the MEMs?  Maybe we used to
> strip all DECL_BIT_FIELD component-refs at some point (adjusting
> MEM_OFFSET accordingly)?

Yes, we used to strip the MEM_EXPRs as soon as we go through the bitfield 
expansion path until last year, when I changed it:

2012-09-14  Eric Botcazou  

PR rtl-optimization/44194
* calls.c (expand_call): In the PARALLEL case, copy the return value
into pseudos instead of spilling it onto the stack.
* emit-rtl.c (adjust_address_1): Rename ADJUST into ADJUST_ADDRESS and
add new ADJUST_OBJECT parameter.
If ADJUST_OBJECT is set, drop the underlying object if it cannot be
proved that the adjusted memory access is still within its bounds.
(adjust_automodify_address_1): Adjust call to adjust_address_1.
(widen_memory_access): Likewise.
* expmed.c (store_bit_field_1): Call adjust_bitfield_address instead
of adjust_address.  Do not drop the underlying object of a MEM.
(store_fixed_bit_field): Likewise.
(extract_bit_field_1): Likewise.  Fix oversight in recursion.
(extract_fixed_bit_field): Likewise.
* expr.h (adjust_address_1): Adjust prototype.
(adjust_address): Adjust call to adjust_address_1.
(adjust_address_nv): Likewise.
(adjust_bitfield_address): New macro.
(adjust_bitfield_address_nv): Likewise.
* expr.c (expand_assignment): Handle a PARALLEL in more cases.
(store_expr): Likewise.
(store_field): Likewise.

But this was done carefully, i.e. we still drop the MEM_EXPRs if we cannot 
prove that they are still valid.  Now the granularity of memory accesses at 
the RTL level is the byte, so everything is rounded up to byte boundaries; 
that's why bitfields sharing a byte need to be dealt with specially.

> Your patch seems to paper over this issue in the wrong place ...

No, it's the proper, albeit conservative, fix in my opinion.

-- 
Eric Botcazou


[PATCH][AArch64] NEON vclz intrinsic modified

2013-10-08 Thread Alex Velenko

Hi,

This patch implements the behavior of, and regression
tests for, the NEON intrinsics vclz[q]_[s,u][8,16,32].
No problems found when running aarch64-none-elf
regression tests.

Is patch OK?

Thanks,
Alex

gcc/testsuite/

2013-10-08  Alex Velenko  

* gcc.target/aarch64/vclz.c: New testcase.

gcc/

2013-10-08  Alex Velenko  

* config/aarch64/arm_neon.h (vclz_s8): Asm replaced with C
  (vclz_s16): Likewise.
  (vclz_s32): Likewise.
  (vclzq_s8): Likewise.
  (vclzq_s16): Likewise.
  (vclzq_s32): Likewise.
  (vclz_u8): Likewise.
  (vclz_u16): Likewise.
  (vclz_u32): Likewise.
  (vclzq_u8): Likewise.
  (vclzq_u16): Likewise.
  (vclzq_u32): Likewise.

* config/aarch64/aarch64.h (CLZ_DEFINED_VALUE_AT_ZERO): 
Macro fixed for clz.


* config/aarch64/aarch64-simd-builtins.def (VAR1 (UNOP, 
clz, 0, v4si)): Replaced with iterator.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 35897f3939556d7bb804d4b4ae692a300b103681..c18b150a1f5f2131deb54e3f66f93330c43bcefd 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -45,7 +45,7 @@
   BUILTIN_VDQF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
-  VAR1 (UNOP, clz, 2, v4si)
+  BUILTIN_VDQ_BHSI (UNOP, clz, 2)
 
   BUILTIN_VALL (GETLANE, get_lane, 0)
   VAR1 (GETLANE, get_lane, 0, di)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index da2b46d14cf02814f93aeda1535461c242174aae..7a80e96385f935e032bc0421d1aeea52de7bcd1d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -739,7 +739,7 @@ do {	 \
: reverse_condition (CODE))
 
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE))
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index db9bf28227e87072b48f5dca8835be8007c6b93d..482d7d03ed4995d46bef14a0c2c42903aafc6986 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5158,138 +5158,6 @@ vclsq_s32 (int32x4_t a)
   return result;
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vclz_s8 (int8x8_t a)
-{
-  int8x8_t result;
-  __asm__ ("clz %0.8b,%1.8b"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vclz_s16 (int16x4_t a)
-{
-  int16x4_t result;
-  __asm__ ("clz %0.4h,%1.4h"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vclz_s32 (int32x2_t a)
-{
-  int32x2_t result;
-  __asm__ ("clz %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vclz_u8 (uint8x8_t a)
-{
-  uint8x8_t result;
-  __asm__ ("clz %0.8b,%1.8b"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vclz_u16 (uint16x4_t a)
-{
-  uint16x4_t result;
-  __asm__ ("clz %0.4h,%1.4h"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vclz_u32 (uint32x2_t a)
-{
-  uint32x2_t result;
-  __asm__ ("clz %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vclzq_s8 (int8x16_t a)
-{
-  int8x16_t result;
-  __asm__ ("clz %0.16b,%1.16b"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vclzq_s16 (int16x8_t a)
-{
-  int16x8_t result;
-  __asm__ ("clz %0.8h,%1.8h"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vclzq_s32 (int32x4_t a)
-{
-  int32x4_t result;
-  __asm__ ("clz %0.4s,%1.4s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vclzq_u8 (uint8x16_t a)
-{
-  uint8x16_t result;
-  __asm__ ("clz %0.16b,%1.16b"
-   : "=w"(result)
-   : "w"(a)
-   : /*

[PATCH][AArch64] NEON vadd_f64 and vsub_f64 intrinsics modified

2013-10-08 Thread Alex Velenko


Hi,

This patch implements the behavior of vadd_f64 and
vsub_f64 NEON intrinsics. Regression tests are added.
Regression tests for aarch64-none-elf completed with no
regressions.

OK?

Thanks,
Alex

gcc/testsuite/

2013-10-08  Alex Velenko  

* gcc.target/aarch64/vadd_f64.c: New testcase.
* gcc.target/aarch64/vsub_f64.c: New testcase.

gcc/

2013-10-08  Alex Velenko  

* config/aarch64/arm_neon.h (vadd_f64): Implementation added.
(vsub_f64): Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 1bd098d2a9c3a204c0fb57ee3ef31cbb5f328d8e..b8791b7b5dd7123b6d708aeb2321986673a0c0cd 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -634,6 +634,12 @@ vadd_f32 (float32x2_t __a, float32x2_t __b)
   return __a + __b;
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vadd_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return __a + __b;
+}
+
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vadd_u8 (uint8x8_t __a, uint8x8_t __b)
 {
@@ -1824,6 +1830,12 @@ vsub_f32 (float32x2_t __a, float32x2_t __b)
   return __a - __b;
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vsub_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return __a - __b;
+}
+
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vsub_u8 (uint8x8_t __a, uint8x8_t __b)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/vadd_f64.c b/gcc/testsuite/gcc.target/aarch64/vadd_f64.c
new file mode 100644
index ..c3bf7349597aa9b75e0bc34cfd4cde4dc16b95f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vadd_f64.c
@@ -0,0 +1,114 @@
+/* Test vadd works correctly.  */
+/* { dg-do run } */
+/* { dg-options "--save-temps" } */
+
+#include 
+
+#define FLT_EPSILON __FLT_EPSILON__
+#define DBL_EPSILON __DBL_EPSILON__
+
+#define TESTA0 0.3
+#define TESTA1 -1.
+#define TESTA2 0
+#define TESTA3 1.23456
+/* 2^54, double has 53 significand bits
+   according to Double-precision floating-point format.  */
+#define TESTA4 18014398509481984
+#define TESTA5 (1.0 / TESTA4)
+
+#define TESTB0 0.7
+#define TESTB1 2
+#define TESTB2 0
+#define TESTB3 -2
+#define TESTB4 1.0
+#define TESTB5 2.0
+
+#define ANSW0 1
+#define ANSW1 0.2223
+#define ANSW2 0
+#define ANSW3 -0.76544
+#define ANSW4 TESTA4
+#define ANSW5 2.0
+
+extern void abort (void);
+
+#define EPSILON __DBL_EPSILON__
+#define ABS(a) __builtin_fabs (a)
+#define ISNAN(a) __builtin_isnan (a)
+#define FP_equals(a, b, epsilon)			\
+  (			\
+   ((a) == (b))		\
+|| (ISNAN (a) && ISNAN (b))\
+|| (ABS (a - b) < epsilon)\
+   )
+
+int
+test_vadd_f64 ()
+{
+  float64x1_t a;
+  float64x1_t b;
+  float64x1_t c;
+
+  a = TESTA0;
+  b = TESTB0;
+  c = ANSW0;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  a = TESTA1;
+  b = TESTB1;
+  c = ANSW1;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  a = TESTA2;
+  b = TESTB2;
+  c = ANSW2;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  a = TESTA3;
+  b = TESTB3;
+  c = ANSW3;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  a = TESTA4;
+  b = TESTB4;
+  c = ANSW4;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  a = TESTA5;
+  b = TESTB5;
+  c = ANSW5;
+
+  a = vadd_f64 (a, b);
+  if (!FP_equals (a, c, EPSILON))
+return 1;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "fadd\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 6 } } */
+
+int
+main (int argc, char **argv)
+{
+  if (test_vadd_f64 ())
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vsub_f64.c b/gcc/testsuite/gcc.target/aarch64/vsub_f64.c
new file mode 100644
index ..abf4fc42d49dc695f435b1e0f331737c8e9367b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vsub_f64.c
@@ -0,0 +1,116 @@
+/* Test vsub works correctly.  */
+/* { dg-do run } */
+/* { dg-options "--save-temps" } */
+
+#include 
+
+#define FLT_EPSILON __FLT_EPSILON__
+#define DBL_EPSILON __DBL_EPSILON__
+
+#define TESTA0 1
+#define TESTA1 0.2223
+#define TESTA2 0
+#define TESTA3 -0.76544
+/* 2^54, double has 53 significand bits
+   according to Double-precision floating-point format.  */
+#define TESTA4 18014398509481984
+#define TESTA5 2.0
+
+#define TESTB0 0.7
+#define TESTB1 2
+#define TESTB2 0
+#define TESTB3 -2
+#define TESTB4 1.0
+#define TESTB5 (1.0 / TESTA4)
+
+#define ANSW0 0.3
+#define ANSW1 -1.
+#define ANSW2 0
+#define ANSW3 1.23456
+#define ANSW4 TESTA4
+#define ANSW5 2.0
+
+extern void abort (void);
+
+#define EPSILON __DBL_EPSILON__
+#define ISNAN(a) __builtin_isnan (a)
+/* FP_equals is implemented like this to execute subtraction
+   exactly once during a single test run.  */
+#defi

Re: [PATCH] alternative hit rate for builtin_expect

2013-10-08 Thread Dehao Chen
Thanks for applying the patch. Backported to google-4_8

I still have some concerns about inlining a .part function into its
original function: basically, the gimple_block for that call may be
NULL, but it does not make sense to clear all block info for all stmts
in the .part function.

Dehao

On Tue, Oct 8, 2013 at 1:35 AM, Ramana Radhakrishnan
 wrote:
>>> Can someone comment / approve it quickly so that we get AArch32 and AArch64
>>> linux cross-builds back up ?
>>
>> Ok.
>
> Applied for Dehao as r203269 . Tests on arm came back ok.
>
> Ramana
>
>>
>> Thanks,
>> Richard.
>>
>>>
>>> regards
>>> Ramana
>>>

 Honza

>
> Dehao
>
>>
>> Honza


>>>
>>>


[PATCH][AARCH64] Vdiv NEON intrinsic

2013-10-08 Thread Alex Velenko

Hi,

This patch implements the behavior of vdiv_f64 intrinsic
and adds regression tests for vdiv[q]_f[32,64] NEON intrinsics.

Full aarch64-none-elf regression test ran with no regressions.

Is it OK?

Thanks,
Alex

gcc/testsuite/

2013-09-10  Alex Velenko  

* gcc.target/aarch64/vdiv_f.c: New testcase.

gcc/

2013-09-10  Alex Velenko  

* config/aarch64/arm_neon.h (vdiv_f64): Added.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b8791b7b5dd7123b6d708aeb2321986673a0c0cd..db9bf28227e87072b48f5dca8835be8007c6b93d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -1210,6 +1210,12 @@ vdiv_f32 (float32x2_t __a, float32x2_t __b)
   return __a / __b;
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vdiv_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return __a / __b;
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdivq_f32 (float32x4_t __a, float32x4_t __b)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/vdiv_f.c b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
new file mode 100644
index ..cc3a9570c0fac0dcbf38f38314a416cca5e58c6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
@@ -0,0 +1,361 @@
+/* Test vdiv works correctly.  */
+/* { dg-do run } */
+/* { dg-options "-O3 --save-temps" } */
+
+#include <arm_neon.h>
+
+#define FLT_INFINITY (__builtin_inff ())
+#define DBL_INFINITY (__builtin_inf ())
+
+#define NAN (0.0 / 0.0)
+
+#define PI 3.141592653589793
+#define PI_4 0.7853981633974483
+#define SQRT2 1.4142135623730951
+#define SQRT1_2 0.7071067811865475
+
+#define TESTA0 PI
+#define TESTA1 -PI
+#define TESTA2 PI
+#define TESTA3 -PI
+#define TESTA4 1.0
+#define TESTA5 -1.0
+#define TESTA6 1.0
+#define TESTA7 -1.0
+/* 2^25+1, float has 24 significand bits
+   according to Single-precision floating-point format.  */
+#define TESTA8_FLT 33554433
+/* 2^54+1, double has 53 significand bits
+   according to Double-precision floating-point format.  */
+#define TESTA8_DBL 18014398509481985
+#define TESTA9 -TESTA8
+#define TESTA10 TESTA8
+#define TESTA11 -TESTA8
+#define TESTA12 NAN
+#define TESTA13 1.0
+#define TESTA14 INFINITY
+#define TESTA15 -INFINITY
+#define TESTA16 INFINITY
+#define TESTA17 9.0
+#define TESTA18 11.0
+#define TESTA19 13.0
+
+#define TESTB0 4.0
+#define TESTB1 4.0
+#define TESTB2 -4.0
+#define TESTB3 -4.0
+#define TESTB4 SQRT2
+#define TESTB5 SQRT2
+#define TESTB6 -SQRT2
+#define TESTB7 -SQRT2
+#define TESTB8 2.0
+#define TESTB9 2.0
+#define TESTB10 -2.0
+#define TESTB11 -2.0
+#define TESTB12 3.0
+#define TESTB13 NAN
+#define TESTB14 5.0
+#define TESTB15 7.0
+#define TESTB16 INFINITY
+#define TESTB17 INFINITY
+#define TESTB18 -INFINITY
+#define TESTB19 0
+
+#define ANSW0 PI_4
+#define ANSW1 -PI_4
+#define ANSW2 -PI_4
+#define ANSW3 PI_4
+#define ANSW4 SQRT1_2
+#define ANSW5 -SQRT1_2
+#define ANSW6 -SQRT1_2
+#define ANSW7 SQRT1_2
+#define ANSW8_FLT 16777216
+#define ANSW8_DBL 9007199254740992
+#define ANSW9 -ANSW8
+#define ANSW10 -ANSW8
+#define ANSW11 ANSW8
+#define ANSW12 NAN
+#define ANSW13 NAN
+#define ANSW14 INFINITY
+#define ANSW15 -INFINITY
+#define ANSW16 NAN
+#define ANSW17 0
+#define ANSW18 0
+#define ANSW19 INFINITY
+
+#define CONCAT(a, b) a##b
+#define CONCAT1(a, b) CONCAT (a, b)
+#define REG_INFEX64 _
+#define REG_INFEX128 q_
+#define REG_INFEX(reg_len) REG_INFEX##reg_len
+#define POSTFIX(reg_len, data_len) \
+  CONCAT1 (REG_INFEX (reg_len), f##data_len)
+
+#define DATA_TYPE_32 float
+#define DATA_TYPE_64 double
+#define DATA_TYPE(data_len) DATA_TYPE_##data_len
+
+#define EPSILON_32 __FLT_EPSILON__
+#define EPSILON_64 __DBL_EPSILON__
+#define EPSILON(data_len) EPSILON_##data_len
+
+#define INDEX64_32 [i]
+#define INDEX64_64
+#define INDEX128_32 [i]
+#define INDEX128_64 [i]
+#define INDEX(reg_len, data_len) \
+  CONCAT1 (INDEX, reg_len##_##data_len)
+
+#define LOAD_INST(reg_len, data_len) \
+  CONCAT1 (vld1, POSTFIX (reg_len, data_len))
+#define DIV_INST(reg_len, data_len) \
+  CONCAT1 (vdiv, POSTFIX (reg_len, data_len))
+
+#define ABS(a) __builtin_fabs (a)
+#define ISNAN(a) __builtin_isnan (a)
+#define FP_equals(a, b, epsilon)			\
+  (			\
+   ((a) == (b))		\
+|| (ISNAN (a) && ISNAN (b))\
+|| (ABS (a - b) < epsilon)\
+  )
+
+#define INHIB_OPTIMIZATION asm volatile ("" : : : "memory")
+
+#define RUN_TEST(a, b, c, testseta, testsetb, answset, count,		\
+		 reg_len, data_len, n)	\
+{	\
+  int i;\
+  INHIB_OPTIMIZATION;			\
+  (a) = LOAD_INST (reg_len, data_len) (testseta[count]);		\
+  (b) = LOAD_INST (reg_len, data_len) (testsetb[count]);		\
+  (c) = LOAD_INST (reg_len, data_len) (answset[count]);			\
+  INHIB_OPTIMIZATION;			\
+  (a) = DIV_INST (reg_len, data_len) (a, b);\
+  for (i = 0; i < n; i++)		\
+  {	\
+INHIB_OPTIMIZATION;			\
+if (!FP_equals ((a) INDEX (reg_len, data_len),			\
+		(c) INDEX (reg_len, 

Re: [C++ Patch] PR 58568

2013-10-08 Thread Jason Merrill

OK.

Jason


[PATCH][AArch64] Vneg NEON intrinsics modified

2013-10-08 Thread Alex Velenko

Hi,

This patch implements the behavior of the following
neon intrinsics using C:
vneg[q]_f[32,64]
vneg[q]_s[8,16,32,64]

Regression tests for listed intrinsics included.
I ran a full regression test for aarch64-none-elf
with no regressions.

Ok?

Thanks,
Alex

gcc/testsuite/

2013-10-08  Alex Velenko  

* gcc.target/aarch64/vneg_f.c: New testcase.
* gcc.target/aarch64/vneg_s.c: New testcase.

gcc/

2013-10-08  Alex Velenko  

* config/aarch64/arm_neon.h (vneg_f32): Asm replaced with C.
(vneg_f64): New intrinsic.
(vneg_s8): Asm replaced with C.
(vneg_s16): Likewise.
(vneg_s32): Likewise.
(vneg_s64): New intrinsic.
(vnegq_f32): Asm replaced with C.
(vnegq_f64): Likewise.
(vnegq_s8): Likewise.
(vnegq_s16): Likewise.
(vnegq_s32): Likewise.
(vnegq_s64): Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index cb5860206a1812f347a77d4a6e06519f8c3a696f..1bd098d2a9c3a204c0fb57ee3ef31cbb5f328d8e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -9785,115 +9785,6 @@ vmvnq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vneg_f32 (float32x2_t a)
-{
-  float32x2_t result;
-  __asm__ ("fneg %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vneg_s8 (int8x8_t a)
-{
-  int8x8_t result;
-  __asm__ ("neg %0.8b,%1.8b"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vneg_s16 (int16x4_t a)
-{
-  int16x4_t result;
-  __asm__ ("neg %0.4h,%1.4h"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vneg_s32 (int32x2_t a)
-{
-  int32x2_t result;
-  __asm__ ("neg %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vnegq_f32 (float32x4_t a)
-{
-  float32x4_t result;
-  __asm__ ("fneg %0.4s,%1.4s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vnegq_f64 (float64x2_t a)
-{
-  float64x2_t result;
-  __asm__ ("fneg %0.2d,%1.2d"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vnegq_s8 (int8x16_t a)
-{
-  int8x16_t result;
-  __asm__ ("neg %0.16b,%1.16b"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vnegq_s16 (int16x8_t a)
-{
-  int16x8_t result;
-  __asm__ ("neg %0.8h,%1.8h"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vnegq_s32 (int32x4_t a)
-{
-  int32x4_t result;
-  __asm__ ("neg %0.4s,%1.4s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vnegq_s64 (int64x2_t a)
-{
-  int64x2_t result;
-  __asm__ ("neg %0.2d,%1.2d"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpadal_s8 (int16x4_t a, int8x8_t b)
@@ -21241,6 +21132,80 @@ vmulq_laneq_u32 (uint32x4_t __a, uint32x4_t __b, const int __lane)
   return __a * __aarch64_vgetq_lane_u32 (__b, __lane);
 }
 
+/* vneg  */
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vneg_f32 (float32x2_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vneg_f64 (float64x1_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vneg_s8 (int8x8_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vneg_s16 (int16x4_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vneg_s32 (int32x2_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vneg_s64 (int64x1_t __a)
+{
+  return -__a;
+}
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vnegq_f32 (float32x4_t __a)
+{
+  return -__a;
+

[gomp4] Small libgomp testsuite tweaks

2013-10-08 Thread Jakub Jelinek
Hi!

I've noticed that udr-8.C failed on i686-linux, apparently because i
was uninitialized.  udr-3.c had the same bug, but didn't FAIL because
of that on either x86_64-linux or i686-linux, and udr-2.c just had an
unused variable.

2013-10-08  Jakub Jelinek  

* testsuite/libgomp.c/udr-2.c (main): Remove unused variable i.
* testsuite/libgomp.c/udr-3.c (main): Initialize i to 0.
* testsuite/libgomp.c++/udr-8.C (main): Likewise.

--- libgomp/testsuite/libgomp.c/udr-2.c.jj  2013-09-19 09:12:43.0 
+0200
+++ libgomp/testsuite/libgomp.c/udr-2.c 2013-10-08 17:28:54.687344973 +0200
@@ -11,7 +11,7 @@ struct S { int s; };
 int
 main ()
 {
-  int i, u = 0, q = 0;
+  int u = 0, q = 0;
   struct S s, t;
   s.s = 0; t.s = 0;
   #pragma omp parallel reduction(+:s, q) reduction(foo:t, u)
--- libgomp/testsuite/libgomp.c/udr-3.c.jj  2013-09-19 09:12:43.0 
+0200
+++ libgomp/testsuite/libgomp.c/udr-3.c 2013-10-08 17:29:24.958191565 +0200
@@ -17,7 +17,7 @@ int
 main ()
 {
   struct S s;
-  int i;
+  int i = 0;
   s.s = 0;
   #pragma omp parallel reduction (+:s, i)
   {
--- libgomp/testsuite/libgomp.c++/udr-8.C.jj2013-09-18 12:43:23.0 
+0200
+++ libgomp/testsuite/libgomp.c++/udr-8.C   2013-10-08 17:29:51.235059721 
+0200
@@ -25,7 +25,7 @@ int
 main ()
 {
   S s, t;
-  int i;
+  int i = 0;
   #pragma omp parallel reduction (+:s, i) reduction (*:t)
   {
 if (s.s != 0 || t.s != 0)

Jakub


[gomp4] Adjust some gcc.dg/autopar/ tests

2013-10-08 Thread Jakub Jelinek
Hi!

These tests were expecting 5 loopfn matches: 3 in the fn definition, one
as a GOMP_parallel_start argument and one called in between
GOMP_parallel_start and GOMP_parallel_end.  But the new API is
to call GOMP_parallel with the function and not to call the outlined
function or GOMP_parallel_end directly; GOMP_parallel will call it
indirectly.

2013-10-08  Jakub Jelinek  

* gcc.dg/autopar/outer-1.c: Expect 4 instead of 5 loopfn matches.
* gcc.dg/autopar/outer-2.c: Likewise.
* gcc.dg/autopar/outer-3.c: Likewise.
* gcc.dg/autopar/outer-4.c: Likewise.
* gcc.dg/autopar/outer-5.c: Likewise.
* gcc.dg/autopar/outer-6.c: Likewise.
* gcc.dg/autopar/parallelization-1.c: Likewise.

--- gcc/testsuite/gcc.dg/autopar/outer-1.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-1.c  2013-10-08 17:18:55.710385102 
+0200
@@ -28,6 +28,6 @@ int main(void)
 
 /* Check that outer loop is parallelized.  */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } 
} */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/outer-2.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-2.c  2013-10-08 17:18:57.659374373 
+0200
@@ -28,6 +28,6 @@ int main(void)
 }
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } 
} */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/outer-3.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-3.c  2013-10-08 17:18:59.151368202 
+0200
@@ -28,6 +28,6 @@ int main(void)
 
 /* Check that outer loop is parallelized.  */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } 
} */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/outer-4.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-4.c  2013-10-08 17:19:00.700358700 
+0200
@@ -32,6 +32,6 @@ int main(void)
 
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { 
xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/outer-5.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-5.c  2013-10-08 17:19:02.402350051 
+0200
@@ -45,6 +45,6 @@ int main(void)
 }
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { 
xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/outer-6.c.jj   2013-03-20 10:06:18.0 
+0100
+++ gcc/testsuite/gcc.dg/autopar/outer-6.c  2013-10-08 17:19:04.018341866 
+0200
@@ -46,6 +46,6 @@ int main(void)
 /* Check that outer loop is parallelized.  */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } 
} */
 /* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } 
} */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/autopar/parallelization-1.c.jj 2013-03-20 
10:06:18.0 +0100
+++ gcc/testsuite/gcc.dg/autopar/parallelization-1.c2013-10-08 
17:19:05.71280 +0200
@@ -28,6 +28,6 @@ int main(void)
 /* Check that the first loop in parloop got parallelized.  */
 
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 
"parloops" } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

Jakub


Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib

2013-10-08 Thread Marcus Shawcroft
On 1 October 2013 12:40, Marcus Shawcroft  wrote:

> Patch attached.
>
> /Marcus
>
> 2013-10-01  Marcus Shawcroft  
>
> * configure.ac (AC_CHECK_FUNCS_ONCE): Add for exit() then make
> existing AC_CHECK_FUNCS_ONCE dependent on outcome.

Ping.


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 09:18 AM, Richard Biener wrote:

On Tue, Oct 8, 2013 at 2:58 PM, Andrew MacLeod  wrote:



I just took a quick stab at it...  I think it's pretty involved, and someone
with better loop comprehension should probably look at the followup of
removing that requirement. estimate_numbers_of_iterations_loop() in
particular uses last_stmt(), so it requires gimple... and there is a sprinkling
of gimple-specific stuff all through it...  I have no idea how this is
supposed to work for rtl.

This is the way it is now, so at least by including that header, it
exposes the hidden problem and either I can revisit it later, or someone
else can tackle that.  It seems *really* messy.

OK as is for now?

Andrew

heh, make it available, and it will get used :-)  It hasn't been this
way that long.

Well, that's just accessing already computed and preserved max-iteration.

That is, the accessors used by loop-*.c should probably be moved to
cfgloop.[ch].


huh, deeper than I intended to go, but not too bad.   OK...  how about
this as an add-on to the previous patch?  I'd just check in the
combination all at once. A couple of the functions needed to be
separated... so I created get_estimated_loop_iterations() and
get_max_loop_iterations()... hopefully I did that right. Basically I
pulled out everything but the scev estimating, and left that in the
original.   This bootstraps, and regression tests are running.


Assuming it all works out, is this ok?

Andrew


	* cfgloop.c (record_niter_bound, estimated_loop_iterations_int,
	max_stmt_executions_int): Move from tree-ssa-loop-niter.c.
	(get_estimated_loop_iterations): Factor out accessor from
	estimated_loop_iterations in tree-ssa-loop-niter.c.
	(get_max_loop_iterations): Factor out accessor from max_loop_iterations
	in tree-ssa-loop-niter.c.
	* cfgloop.h: Add prototypes.
	(gcov_type_to_double_int): Relocate from tree-ssa-loop-niter.c.
	* loop-iv.c: Don't include tree-ssa-loop-niter.h.
	* loop-unroll.c: Don't include tree-ssa-loop-niter.h.
	(decide_unroll_constant_iterations, decide_unroll_runtime_iterations,
	decide_peel_simple, decide_unroll_stupid): Use new get_* accessors.
	* loop-unswitch.c: Don't include tree-ssa-loop-niter.h.
	* tree-ssa-loop-niter.c (record_niter_bound): Move to cfgloop.c.
	(gcov_type_to_double_int): Move to cfgloop.h.
	(estimated_loop_iterations): Factor out get_estimated_loop_iterations.
	(max_loop_iterations): Factor out get_max_loop_iterations.
	(estimated_loop_iterations_int, max_stmt_executions_int): Move to
	cfgloop.c.
	* tree-ssa-loop-niter.h: Remove a few prototypes.

diff -cNp L/cfgloop.c ./cfgloop.c
*** L/cfgloop.c	2013-10-08 09:28:03.609369504 -0400
--- ./cfgloop.c	2013-10-08 09:48:17.867054642 -0400
*** get_loop_location (struct loop *loop)
*** 1781,1783 
--- 1781,1893 
return DECL_SOURCE_LOCATION (current_function_decl);
  }
  
+ /* Records that every statement in LOOP is executed I_BOUND times.
+REALISTIC is true if I_BOUND is expected to be close to the real number
+of iterations.  UPPER is true if we are sure the loop iterates at most
+I_BOUND times.  */
+ 
+ void
+ record_niter_bound (struct loop *loop, double_int i_bound, bool realistic,
+ 		bool upper)
+ {
+   /* Update the bounds only when there is no previous estimation, or when the
+  current estimation is smaller.  */
+   if (upper
+   && (!loop->any_upper_bound
+ 	  || i_bound.ult (loop->nb_iterations_upper_bound)))
+ {
+   loop->any_upper_bound = true;
+   loop->nb_iterations_upper_bound = i_bound;
+ }
+   if (realistic
+   && (!loop->any_estimate
+ 	  || i_bound.ult (loop->nb_iterations_estimate)))
+ {
+   loop->any_estimate = true;
+   loop->nb_iterations_estimate = i_bound;
+ }
+ 
+   /* If an upper bound is smaller than the realistic estimate of the
+  number of iterations, use the upper bound instead.  */
+   if (loop->any_upper_bound
+   && loop->any_estimate
+   && loop->nb_iterations_upper_bound.ult (loop->nb_iterations_estimate))
+ loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
+ }
+ 
+ /* Similar to estimated_loop_iterations, but returns the estimate only
+if it fits to HOST_WIDE_INT.  If this is not the case, or the estimate
+on the number of iterations of LOOP could not be derived, returns -1.  */
+ 
+ HOST_WIDE_INT
+ estimated_loop_iterations_int (struct loop *loop)
+ {
+   double_int nit;
+   HOST_WIDE_INT hwi_nit;
+ 
+   if (!estimated_loop_iterations (loop, &nit))
+ return -1;
+ 
+   if (!nit.fits_shwi ())
+ return -1;
+   hwi_nit = nit.to_shwi ();
+ 
+   return hwi_nit < 0 ? -1 : hwi_nit;
+ }
+ 
+ /* Returns an upper bound on the number of executions of statements
+in the LOOP.  For statements before the loop exit, this exceeds
+the number of execution of the latch by one.  */
+ 
+ HOST_WIDE_INT
+ max_stmt_executions_int (struct loop *loop)
+ {
+   HOST_WIDE_INT nit = max_loop_iterations_int (loop);
+   HOST_WIDE_INT snit;
+ 

Re: [Patch] Fix the testcases that use bind_pic_locally

2013-10-08 Thread Vidya Praveen
On Tue, Oct 08, 2013 at 10:30:22AM +0100, Jakub Jelinek wrote:
> On Tue, Oct 08, 2013 at 10:14:59AM +0100, Vidya Praveen wrote:
> > There are several tests that use "dg-add-options bind_pic_locally" in order 
> > to
> > add -fPIE or -fpie when -fPIC or -fpic are used respectively, with the
> > expectation that -fPIE/-fpie will override -fPIC/-fpic. But this doesn't
> > happen since -fPIE/-fpie will be added before the -fPIC/-fpic (whether -fPIC/-fpic
> > is
> > added as a multilib option or through cflags). This is essentially due to 
> > the
> > fact that cflags and multilib flags are added after the options are added 
> > through dg-options, dg-add-options, et al. in default_target_compile 
> > function.
> > 
> > Assuming dg-options or dg-add-options should always win, we can fix this by
> > modifying the order in which they are concatenated at 
> > default_target_compile in
> > target.exp. But this is not recommended since it depends on everyone who 
> > tests
> > upgrading their dejagnu (refer [1]). 
> 
> This looks like a big step backwards and I'm afraid it can break targets
> where -fpic/-fPIC is the default. 

I agree; I didn't think of this. Since the -fPIC/-fpic comes before the
-fPIE/-fpie, this will work here. In other words, bind_pic_locally is not
broken in this case.

(This is assuming the -fPIC/-fpic as default option is passed through 
DRIVER_SELF_SPECS or similar).

> If dg-add-options bind_pic_locally must
> add options to the end of command line, then can't you just push the options
> that must go last to some variable other than dg-extra-tool-flags and as we
> override dejagnu's dg-test, put it in our override last (or in whatever
> other method that already added the multilib options)?

Well, multilib options are added in default_target_compile, which is in
target.exp.
If I store the flags in some variable in add_options_for_bind_pic_locally and
add them later, they are still going to be added before default_target_compile
is called.

Hope I understood your suggestion right.

Cheers
VP




[patch] Tweak some libstdc++ tests.

2013-10-08 Thread Jonathan Wakely
2013-10-08  Jonathan Wakely  

* testsuite/*: Remove stray semi-colons after function definitions.

Tested x86_64-linux, committed to trunk.
commit 2c5f00a242d9aaf12cfcad0d217c4cad5b25b711
Author: Jonathan Wakely 
Date:   Tue Oct 8 14:16:59 2013 +0100

* testsuite/*: Remove stray semi-colons after function definitions.

diff --git a/libstdc++-v3/testsuite/26_numerics/valarray/subset_assignment.cc 
b/libstdc++-v3/testsuite/26_numerics/valarray/subset_assignment.cc
index 6b598c4..ed339fd 100644
--- a/libstdc++-v3/testsuite/26_numerics/valarray/subset_assignment.cc
+++ b/libstdc++-v3/testsuite/26_numerics/valarray/subset_assignment.cc
@@ -75,4 +75,4 @@ int main()
   VERIFY(check_array(val_g, ans4));
 
   return 0;
-};
+}
diff --git 
a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/bitmask_operators.cc 
b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/bitmask_operators.cc
index 3fd9649..74c3f99 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/bitmask_operators.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/bitmask_operators.cc
@@ -26,4 +26,4 @@
 int main()
 {
   __gnu_test::bitmask_operators();
-};
+}
diff --git 
a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/bitmask_operators.cc 
b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/bitmask_operators.cc
index 3931db4..acfa45a 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/bitmask_operators.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/bitmask_operators.cc
@@ -26,4 +26,4 @@
 int main()
 {
   __gnu_test::bitmask_operators();
-};
+}
diff --git 
a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/bitmask_operators.cc 
b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/bitmask_operators.cc
index cdb9b1b..c2e40bc 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/bitmask_operators.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/bitmask_operators.cc
@@ -26,4 +26,4 @@
 int main()
 {
   __gnu_test::bitmask_operators();
-};
+}
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring.cc
index 1b6c532..7340e7e 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring.cc
@@ -41,4 +41,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --git 
a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring_op.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring_op.cc
index e7dfe8d..a28e95e 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring_op.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/cstring_op.cc
@@ -40,4 +40,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --git 
a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/moveable.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/moveable.cc
index 18b3d5e..0897a0c 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/moveable.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/moveable.cc
@@ -30,16 +30,16 @@ void test01()
 {
   bool test __attribute__((unused)) = true;
 
-   std::regex src_re("aaba");
+  std::regex src_re("aaba");
   const unsigned mark_count = src_re.mark_count();
-   const std::regex::flag_type flags = src_re.flags();
+  const std::regex::flag_type flags = src_re.flags();
 
-   std::regex target_re;
+  std::regex target_re;
   
   target_re.assign(std::move(src_re));
   
-   VERIFY( target_re.flags() == flags );
-   VERIFY( target_re.mark_count() == mark_count );
+  VERIFY( target_re.flags() == flags );
+  VERIFY( target_re.mark_count() == mark_count );
 }
 
 int
@@ -47,4 +47,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/pstring.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/pstring.cc
index f5c6ff2..63f93c2 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/pstring.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/pstring.cc
@@ -40,4 +40,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/range.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/range.cc
index f871ff1..c08ac06 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/range.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/range.cc
@@ -40,4 +40,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/string.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/string.cc
index 2f75c7b..ed3fcdf 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/string.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/string.cc
@@ -41,4 +41,4 @@ main()
 { 
   test01();
   return 0;
-};
+}
diff --

Fix ARM/Thumb non-interworking problem in libgcc

2013-10-08 Thread Jonathan Larmour
This is forwarded from PR58660. Please CC me on
replies as I am not on the list.
replies as I am not on the list.

In doing some testing of GCC 4.7.3 on a target with an ARMv4T CPU (e.g.
ARM7T or ARM9) I encountered a problem if using Thumb mode with
interworking disabled. For that, obviously I had to create a libgcc
multilib for this combination; for example: -mcpu=arm9 -mthumb
-mno-thumb-interwork.

However this fails to work in GCC 4.7.3 because of a problem in libgcc
that makes it fail to handle modes correctly for non-interworking operation. A
specific example is aeabi_uldivmod in bpabi.S, which is an ARM (not Thumb)
function that includes a bl to __gnu_uldivmod_helper.
__gnu_uldivmod_helper is from bpabi.c and is in Thumb mode.

The linker helpfully adds a little trampoline to switch to Thumb mode
with the bx instruction (gnu_uldivmod_helper_from_arm). However GCC
returns from __gnu_uldivmod_helper with:

  pop {r3, r4, r5, r6, r7, pc}

This means that no mode switch happens on the return from the Thumb mode
function to the ARM mode aeabi_uldivmod function.

I am attaching a potential fix for the problem with aeabi_ldivmod and
aeabi_uldivmod (against 4.7.3), so it would be good for that to be applied
at least, including on the 4.7 and 4.8 branches.

2013-10-07  Jonathan Larmour  

* config/arm/bpabi.S (aeabi_ldivmod, aeabi_uldivmod): Allow for
non-interworking Thumb builds.

Someone may also want to double-check whether this sort of problem may
apply to other functions (although I haven't found any others yet myself,
but I'm not sure about how functions like those in unaligned-funcs.c would
be called).

Thanks,

Jifl
-- 
eCosCentric Limited  http://www.eCosCentric.com/ The eCos experts
Barnwell House, Barnwell Drive, Cambridge, UK.   Tel: +44 1223 245571
Registered in England and Wales: Reg No 4422071.
--["Si fractum non sit, noli id reficere"]--   Opinions==mine
diff -x CVS -x .svn -x '*~' -x '.#*' -x autom4te.cache -urpN 
gcc-4.7.3.pre/libgcc/config/arm/bpabi.S gcc-4.7.3/libgcc/config/arm/bpabi.S
--- gcc-4.7.3.pre/libgcc/config/arm/bpabi.S 2011-11-02 15:03:19.0 +0000
+++ gcc-4.7.3/libgcc/config/arm/bpabi.S 2013-10-07 23:39:05.508083589 +0100
@@ -133,10 +133,25 @@ ARM_FUNC_START aeabi_ldivmod
 #else
do_push {sp, lr}
 #endif
+#if defined(__INTERWORKING_STUBS__)
+   /* In this case, __gnu_ldivmod_helper is compiled in Thumb mode, but
+  without interworking. This means it needs to be called in Thumb
+  mode, and will return in Thumb mode, not ARM. So we handle that.  */
+   orr ip, pc, #1
+   bx ip
+   .code 16
+   bl SYM(__gnu_ldivmod_helper) __PLT__
+   ldr r2, [sp, #4]
+   mov lr, r2
+   ldr r2, [sp, #8]
+   ldr r3, [sp, #12]
+   add sp, sp, #16
+#else
bl SYM(__gnu_ldivmod_helper) __PLT__
ldr lr, [sp, #4]
add sp, sp, #8
do_pop {r2, r3}
+#endif
RET

 #endif /* L_aeabi_ldivmod */
@@ -153,10 +168,25 @@ ARM_FUNC_START aeabi_uldivmod
 #else
do_push {sp, lr}
 #endif
+#if defined(__INTERWORKING_STUBS__)
+   /* In this case, __gnu_uldivmod_helper is compiled in Thumb mode, but
+  without interworking. This means it needs to be called in Thumb
+  mode, and will return in Thumb mode, not ARM. So we handle that.  */
+   orr ip, pc, #1
+   bx ip
+   .code 16
+   bl SYM(__gnu_uldivmod_helper) __PLT__
+   ldr r2, [sp, #4]
+   mov lr, r2
+   ldr r2, [sp, #8]
+   ldr r3, [sp, #12]
+   add sp, sp, #16
+#else
bl SYM(__gnu_uldivmod_helper) __PLT__
ldr lr, [sp, #4]
add sp, sp, #8
do_pop {r2, r3}
+#endif
RET

 #endif /* L_aeabi_divmod */


Re: [patch] fix libstdc++/58659

2013-10-08 Thread Jonathan Wakely
On 8 October 2013 13:33, Jonathan Wakely wrote:
> PR libstdc++/58659
> * include/bits/shared_ptr_base.h 
> (__shared_count::__shared_count(P,D)):
> Delegate to constructor taking allocator.
> (__shared_count::_S_create_from_up): Inline into ...
> (__shared_count::__shared_count(unique_ptr<_Tp, _Del>&&): Here. Use
> std::conditional instead of constrained overloads. Allocate memory
> using the allocator type that will be used for deallocation.
> * testsuite/20_util/shared_ptr/cons/58659.cc: New.
> * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust.
>
> Tested x86_64-linux, committed to trunk.

I've committed the same change to the 4.8 branch, except that the
dummy allocator type passed by the delegating constructor to the
target constructor is std::allocator<int> instead of
std::allocator<void>:

+   __shared_count(_Ptr __p, _Deleter __d)
+   : __shared_count(__p, std::move(__d), allocator<int>())
+   { }

That's the type that was already used, so users will get the same
specialization of _Sp_counted_deleter with 4.8.2 as with 4.8.1.

On the trunk I changed it to std::allocator<void> to be consistent
with the type used when constructing from a unique_ptr. That should
mean slightly smaller executables in cases like this:

std::shared_ptr<X> sp1(new X, std::default_delete<X>());
std::shared_ptr<X> sp2(std::unique_ptr<X>(new X));

On trunk the same _Sp_counted_deleter specialization will be used by
sp1 and sp2.


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Richard Biener
On Tue, Oct 8, 2013 at 2:58 PM, Andrew MacLeod  wrote:
> On 10/08/2013 08:35 AM, Andrew MacLeod wrote:
>>
>> On 10/08/2013 07:38 AM, Andrew MacLeod wrote:
>>>
>>> On 10/08/2013 05:57 AM, Richard Biener wrote:

 Hm.

 Index: loop-iv.c
 ===
 *** loop-iv.c   (revision 203243)
 --- loop-iv.c   (working copy)
 *** along with GCC; see the file COPYING3.
 *** 62,67 
 --- 62,68 
#include "df.h"
#include "hash-table.h"
#include "dumpfile.h"
 + #include "tree-ssa-loop-niter.h"

 loop-iv.c is RTL land (likewise loop-unroll.c and loop-unswitch.c),
 why do they need tree-ssa-loop-niter.h?

 Apart from that the patch is ok.

>>> we've got a bit of a mess in a number of places... I've cleaned up a few
>>> of the easy ones I found.
>>>
>>> This file is required for  record_niter_bound(),
>>> max_loop_iterations_int() and estimated_loop_iterations_int().. so all
>>> bounds estimations.  There was enough of an infrastructural requirement for
>>> these routines within the file that I decided it was beyond the scope of
>>> what I'm doing at the moment to split them out.
>>>
>>> Eventually I'd like to break this up into modules and make sure that
>>> things aren't creeping in from the wrong places, and restructure a bit...
>>> Loops suffer from that (like here), cfg has a couple of places where either
>>> rtl or generic cfg routines are calling into a routine that can understand
>>> SSA. I noted one in one of the earlier patches that I'll have to get back
>>> to. This sort of thing has in fact prompted me to start looking now at our
>>> include web and what should be within the logical modules so we can more
>>> easily identify these sorts of things.
>>>
>>> Too many things to clean up!! I think I could be doing this sort of thing
>>> for the next 8 months easily :-)
>>>
>>> I'd prefer to come back and revisit this with a followup to try to split
>>> tree-ssa-loop-niter.c into another component, maybe loop-niter.[ch]...
>>>
>>> Andrew
>>>
>>>
>> I just took a quick stab at it...  I think it's pretty involved and someone
>> with better loop comprehension should probably look at the followup of
>> removing that requirement. estimate_numbers_of_iterations_loop() in
>> particular uses last_stmt(), so it requires gimple.. and there is a sprinkling
>> of gimple-specific stuff all through it...  I have no idea how this is
>> supposed to work for rtl.
>>
>> This is the way it is now, so at least by including that header, it
>> exposes the hidden problem and either I can revisit it later, or someone
>> else can tackle that.  It seems *really* messy.
>>
>> OK as is for now?
>>
>> Andrew
>
> heh, make it available, and it will get used :-)  It hasn't been that
> way that long.

Well, that's just accessing already computed and preserved max-iteration.

That is, the accessors used by loop-*.c should probably be moved to
cfgloop.[ch].

Richard.

>   2012-04-18  Richard Guenther  
>
> PR tree-optimization/44688
> * cfgloop.h (record_niter_bound): Declare.
> * tree-ssa-loop-niter.c (record_niter_bound): Export.
> Update the estimation with the upper bound here...
> (estimate_numbers_of_iterations_loop): ... instead of here.
> Do not forcefully reset a recorded upper bound.
>
> <...>
> 2012-10-08  Jan Hubicka  
>
> * loop-unswitch.c (unswitch_single_loop): Use
> estimated_loop_iterations_int to prevent unswitching when loop
> is known to not roll.
> * loop-unroll.c (decide_peel_once_rolling): Use
> max_loop_iterations_int.
> (unroll_loop_constant_iterations): Update
> nb_iterations_upper_bound and nb_iterations_estimate.
> (decide_unroll_runtime_iterations): Use
> estimated_loop_iterations or max_loop_iterations;
> (unroll_loop_runtime_iterations): fix profile updating.
> (decide_peel_simple): Use estimated_loop_iterations
> and max_loop_iterations.
> (decide_unroll_stupid): Use estimated_loop_iterations
> ad max_loop_iterations.
> * loop-doloop.c (doloop_modify): Use max_loop_iterations_int.
> (doloop_optimize): Likewise.
> * loop-iv.c (iv_number_of_iterations): Use record_niter_bound.
> (find_simple_exit): Likewise.
> * cfgloop.h (struct niter_desc): Remove niter_max.


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 08:35 AM, Andrew MacLeod wrote:

On 10/08/2013 07:38 AM, Andrew MacLeod wrote:

On 10/08/2013 05:57 AM, Richard Biener wrote:

Hm.

Index: loop-iv.c
===
*** loop-iv.c   (revision 203243)
--- loop-iv.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 62,67 
--- 62,68 
   #include "df.h"
   #include "hash-table.h"
   #include "dumpfile.h"
+ #include "tree-ssa-loop-niter.h"

loop-iv.c is RTL land (likewise loop-unroll.c and loop-unswitch.c),
why do they need tree-ssa-loop-niter.h?

Apart from that the patch is ok.

we've got a bit of a mess in a number of places... I've cleaned up a 
few of the easy ones I found.


This file is required for  record_niter_bound(), 
max_loop_iterations_int() and estimated_loop_iterations_int().. so 
all bounds estimations.  There was enough of an infrastructural 
requirement for these routines within the file that I decided it was 
beyond the scope of what I'm doing at the moment to split them out.


Eventually I'd like to break this up into modules and make sure that 
things aren't creeping in from the wrong places, and restructure a 
bit...  Loops suffer from that (like here), cfg has a couple of places 
where either rtl or generic cfg routines are calling into a routine 
that can understand SSA. I noted one in one of the earlier patches 
that I'll have to get back to. This sort of thing has in fact 
prompted me to start looking now at our include web and what should 
be within the logical modules so we can more easily identify these 
sorts of things.


Too many things to clean up!! I think I could be doing this sort of 
thing for the next 8 months easily :-)


I'd prefer to come back and revisit this with a followup to try to 
split tree-ssa-loop-niter.c into another component, maybe 
loop-niter.[ch]...


Andrew


I just took a quick stab at it...  I think it's pretty involved and 
someone with better loop comprehension should probably look at the 
followup of removing that requirement. 
estimate_numbers_of_iterations_loop() in particular uses last_stmt(), 
so it requires gimple.. and there is a sprinkling of gimple-specific 
stuff all through it...  I have no idea how this is supposed to work 
for rtl.


This is the way it is now, so at least by including that header, it 
exposes the hidden problem and either I can revisit it later, or 
someone else can tackle that.  It seems *really* messy.


OK as is for now?

Andrew
heh, make it available, and it will get used :-)  It hasn't been 
that way that long.


  2012-04-18  Richard Guenther  

PR tree-optimization/44688
* cfgloop.h (record_niter_bound): Declare.
* tree-ssa-loop-niter.c (record_niter_bound): Export.
Update the estimation with the upper bound here...
(estimate_numbers_of_iterations_loop): ... instead of here.
Do not forcefully reset a recorded upper bound.

<...>
2012-10-08  Jan Hubicka  

* loop-unswitch.c (unswitch_single_loop): Use
estimated_loop_iterations_int to prevent unswitching when loop
is known to not roll.
* loop-unroll.c (decide_peel_once_rolling): Use
max_loop_iterations_int.
(unroll_loop_constant_iterations): Update
nb_iterations_upper_bound and nb_iterations_estimate.
(decide_unroll_runtime_iterations): Use
estimated_loop_iterations or max_loop_iterations;
(unroll_loop_runtime_iterations): fix profile updating.
(decide_peel_simple): Use estimated_loop_iterations
and max_loop_iterations.
(decide_unroll_stupid): Use estimated_loop_iterations
ad max_loop_iterations.
* loop-doloop.c (doloop_modify): Use max_loop_iterations_int.
(doloop_optimize): Likewise.
* loop-iv.c (iv_number_of_iterations): Use record_niter_bound.
(find_simple_exit): Likewise.
* cfgloop.h (struct niter_desc): Remove niter_max.


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 07:38 AM, Andrew MacLeod wrote:

On 10/08/2013 05:57 AM, Richard Biener wrote:

Hm.

Index: loop-iv.c
===
*** loop-iv.c   (revision 203243)
--- loop-iv.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 62,67 
--- 62,68 
   #include "df.h"
   #include "hash-table.h"
   #include "dumpfile.h"
+ #include "tree-ssa-loop-niter.h"

loop-iv.c is RTL land (likewise loop-unroll.c and loop-unswitch.c),
why do they need tree-ssa-loop-niter.h?

Apart from that the patch is ok.

we've got a bit of a mess in a number of places... I've cleaned up a 
few of the easy ones I found.


This file is required for  record_niter_bound(), 
max_loop_iterations_int() and estimated_loop_iterations_int().. so all 
bounds estimations.  There was enough of an infrastructural 
requirement for these routines within the file that I decided it was 
beyond the scope of what I'm doing at the moment to split them out.


Eventually I'd like to break this up into modules and make sure that 
things aren't creeping in from the wrong places, and restructure a 
bit...  Loops suffer from that (like here), cfg has a couple of places 
where either rtl or generic cfg routines are calling into a routine 
that can understand SSA. I noted one in one of the earlier patches 
that I'll have to get back to. This sort of thing has in fact 
prompted me to start looking now at our include web and what should be 
within the logical modules so we can more easily identify these sorts 
of things.


Too many things to clean up!! I think I could be doing this sort of 
thing for the next 8 months easily :-)


I'd prefer to come back and revisit this with a followup to try to 
split tree-ssa-loop-niter.c into another component, maybe 
loop-niter.[ch]...


Andrew


I just took a quick stab at it...  I think it's pretty involved and 
someone with better loop comprehension should probably look at the 
followup of removing that requirement. 
estimate_numbers_of_iterations_loop() in particular uses last_stmt(), so 
it requires gimple.. and there is a sprinkling of gimple-specific stuff 
all through it...  I have no idea how this is supposed to work for rtl.


This is the way it is now, so at least by including that header, it 
exposes the hidden problem and either I can revisit it later, or someone 
else can tackle that.  It seems *really* messy.


OK as is for now?

Andrew


[patch] fix libstdc++/58659

2013-10-08 Thread Jonathan Wakely
PR libstdc++/58659
* include/bits/shared_ptr_base.h (__shared_count::__shared_count(P,D)):
Delegate to constructor taking allocator.
(__shared_count::_S_create_from_up): Inline into ...
(__shared_count::__shared_count(unique_ptr<_Tp, _Del>&&): Here. Use
std::conditional instead of constrained overloads. Allocate memory
using the allocator type that will be used for deallocation.
* testsuite/20_util/shared_ptr/cons/58659.cc: New.
* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust.

Tested x86_64-linux, committed to trunk.
commit a6857ab59da2f7001495982dbcc2b58adcbe84e5
Author: Jonathan Wakely 
Date:   Tue Oct 8 01:02:19 2013 +0100

PR libstdc++/58659
* include/bits/shared_ptr_base.h (__shared_count::__shared_count(P,D)):
Delegate to constructor taking allocator.
(__shared_count::_S_create_from_up): Inline into ...
(__shared_count::__shared_count(unique_ptr<_Tp, _Del>&&): Here. Use
std::conditional instead of constrained overloads. Allocate memory
using the allocator type that will be used for deallocation.
* testsuite/20_util/shared_ptr/cons/58659.cc: New.
* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust.

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index f4bff77..911dd92 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -495,29 +495,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 
   template<typename _Ptr, typename _Deleter>
-   __shared_count(_Ptr __p, _Deleter __d) : _M_pi(0)
-   {
- // The allocator's value_type doesn't matter, will rebind it anyway.
- typedef std::allocator<int> _Alloc;
- typedef _Sp_counted_deleter<_Ptr, _Deleter, _Alloc, _Lp> _Sp_cd_type;
- typedef typename allocator_traits<_Alloc>::template
-   rebind_traits<_Sp_cd_type> _Alloc_traits;
- typename _Alloc_traits::allocator_type __a;
- _Sp_cd_type* __mem = 0;
- __try
-   {
- __mem = _Alloc_traits::allocate(__a, 1);
- _Alloc_traits::construct(__a, __mem, __p, std::move(__d));
- _M_pi = __mem;
-   }
- __catch(...)
-   {
- __d(__p); // Call _Deleter on __p.
- if (__mem)
-   _Alloc_traits::deallocate(__a, __mem, 1);
- __throw_exception_again;
-   }
-   }
+   __shared_count(_Ptr __p, _Deleter __d)
+   : __shared_count(__p, std::move(__d), allocator<void>())
+   { }
 
   template<typename _Ptr, typename _Deleter, typename _Alloc>
__shared_count(_Ptr __p, _Deleter __d, _Alloc __a) : _M_pi(0)
@@ -576,16 +556,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Special case for unique_ptr<_Tp,_Del> to provide the strong guarantee.
   template<typename _Tp, typename _Del>
 explicit
-   __shared_count(std::unique_ptr<_Tp, _Del>&& __r)
-   : _M_pi(_S_create_from_up(std::move(__r)))
-   { __r.release(); }
+   __shared_count(std::unique_ptr<_Tp, _Del>&& __r) : _M_pi(0)
+   {
+ using _Ptr = typename unique_ptr<_Tp, _Del>::pointer;
+ using _Del2 = typename conditional<is_reference<_Del>::value,
+ reference_wrapper<typename remove_reference<_Del>::type>,
+ _Del>::type;
+ using _Sp_cd_type
+   = _Sp_counted_deleter<_Ptr, _Del2, allocator<void>, _Lp>;
+ using _Alloc = allocator<_Sp_cd_type>;
+ using _Alloc_traits = allocator_traits<_Alloc>;
+ _Alloc __a;
+ _Sp_cd_type* __mem = _Alloc_traits::allocate(__a, 1);
+ _Alloc_traits::construct(__a, __mem, __r.release(),
+  __r.get_deleter());  // non-throwing
+ _M_pi = __mem;
+   }
 
   // Throw bad_weak_ptr when __r._M_get_use_count() == 0.
   explicit __shared_count(const __weak_count<_Lp>& __r);
 
   ~__shared_count() noexcept
   {
-   if (_M_pi != 0)
+   if (_M_pi != nullptr)
  _M_pi->_M_release();
   }
 
@@ -647,28 +640,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   friend class __weak_count<_Lp>;
 
-  template<typename _Tp, typename _Del>
-   static _Sp_counted_base<_Lp>*
-   _S_create_from_up(std::unique_ptr<_Tp, _Del>&& __r,
- typename std::enable_if<!std::is_reference<_Del>::value>::type* = 0)
-   {
- typedef typename unique_ptr<_Tp, _Del>::pointer _Ptr;
- return new _Sp_counted_deleter<_Ptr, _Del, std::allocator<void>,
-   _Lp>(__r.get(), __r.get_deleter());
-   }
-
-  template<typename _Tp, typename _Del>
-   static _Sp_counted_base<_Lp>*
-   _S_create_from_up(std::unique_ptr<_Tp, _Del>&& __r,
- typename std::enable_if<std::is_reference<_Del>::value>::type* = 0)
-   {
- typedef typename unique_ptr<_Tp, _Del>::pointer _Ptr;
- typedef typename std::remove_reference<_Del>::type _Del1;
- typedef std::reference_wrapper<_Del1> _Del2;
- return new _Sp_counted_deleter<_Ptr, _Del2, std::allocator<void>,
-   _Lp>(__r.get(), std::ref(__r.get_deleter()));
-   }
-
   _Sp_counted_base<_Lp>*  _M_pi;
 

Re: [patch] The remainder of tree-flow.h refactored.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 06:22 AM, Richard Biener wrote:

On Fri, Oct 4, 2013 at 6:52 PM, Andrew MacLeod  wrote:

This patch clears the rest of the improperly located prototypes out of
tree-flow.h.  A bit larger than the last few, but I was pushing to clear
this up, and it's not quite as bad as it seems :-)

Of interest:

* tree-flow.h now contains *only* prototypes for tree-cfg.c. Next week
I'll rename it to tree-cfg.h when I  visit all the .c files which include
tree-flow.h to see who the true consumers are now.

* tree-ssa-address.c only had one client for get_address_description, and
that required exposing struct mem_address in tree-flow.h.  Since that call
was followed by a call to  addr_for_mem_ref with the structure, i figured it
was better to create a new function which combined that functionality. Now
the struct is internal only.

* ipa_pure_const.c has only a single export... warn_function_noreturn()
which was only used in one place, tree-cfg.c to implement the
warn_function_noreturn pass.  Although the name of the file is not really
appropriate, I moved the pass structural stuff into ipa_pure_const.c and now
there are no exports.  I looked at numerous other options, and in the end,
this one is cleanest.  There is no other good home for the bits and pieces
that can be shuffled.   Maybe ipa-func-attr.c would be a better file name
:-P but that is a minuscule issue :-)

* tree-cfg.c had the warn_function_noreturn pass removed from it, but I
relocated the execute_fixup_cfg routine/pass from tree-optimize.c here. It
makes more sense here and those were the only routines being exported from
tree-optimize.c, and tree-cfg was the only consumer.

* tree-dump.h was explicitly prototyping dump_function_to_file() from
tree-cfg.c, presumably to avoid including all of tree-flow.h. I removed that
and include tree-flow.h from tree-dump.c.   Seems like a backwards step
until one realizes that tree-flow.h is now the header file for tree-cfg.c.

* tree-ssa-dom.c::tree_ssa_dominator_optimize()  was calling
ssa_name_values.release() from tree-ssa-threadedge.c, and was the only
consumer.  The call immediately before is to threadedge_finalize_values()
which performs that operation, so I deleted the use from tree-ssa-dom.c.   I
would like to have made the vec ssa_name_values static within the file
too, but then the SSA_NAME_VALUE macro to access it would have to make a
call into a function in the .c file to access it, and I don't know what the
performance hit would be... I expect it would be un-noticeable to abstract
that out... but I left it for the moment.

* There was a bit of a mess between gimplify.c, gimple.c, and tree.c. In the
end, I
 - relocated the tree copying and unsharing routines from gimplify.c to
tree.c. The uses were sporadic throughout the compiler, and weren't always
used by things that should be gimple aware.  They were pure tree routines,
so that seemed like an appropriate place for them.  Maybe something like
tree-share.[ch] would be better...  that could be dealt with during a
tree.[ch] examination.
 -  force_gimple_operand_gsi_1 and force_gimple_operand_gsi were the only
routines in gimplify.c that operated with gimple statement iterators from
gimple.c.  The include files had this chicken and egg problem where I
couldn't resolve the ordering very well.  In the end, I decided to relocate
those 2 routines into gimple.c where there is a good understanding of
gimple-iterators.  I expect the gimple iterators will be split from
gimple.[ch] when I process gimple.h, so they may be relocatable again later.
 -  gimple.h declared some typedefs and enums that were gimplification
specific, so I moved those to the new gimplify.h as well.


This bootstraps on x86_64-unknown-linux-gnu and has no new regressions.  OK?

graphite.h should be unnecessary with moving the pass struct like you
did for other loop opts.  Likewise tree-parloops.h (well, ok, maybe
you need parallelized_function_p, even though its implementation is
gross ;)).  Likewise tree-predcom.h.


fair enough.  Yes, I've already seen a few things that made my skin 
crawl and I had to resist going down a rathole for :-)


unvisit_body isn't generic enough to warrant moving out of gimplify.c
(the only user).

The force_gimple_operand_gsi... routines are in gimplify.c because they ...
gimplify!  And you moved them but not force_gimple_operand[_1]!?


OK, let me make the above adjustments, and I'll recreate a patch without 
the gimple/gimplify parts, and re-address that separately.  I forget the 
details of my include issues there at the moment.


thanks
Andrew


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Andrew MacLeod

On 10/08/2013 05:57 AM, Richard Biener wrote:

On Mon, Oct 7, 2013 at 3:39 PM, Andrew MacLeod  wrote:

On 10/07/2013 04:44 AM, Richard Biener wrote:

On Thu, Oct 3, 2013 at 4:11 AM, Andrew MacLeod 
wrote:

this patch consolidates tree-ssa-loop*.c files with new .h files as
required
(8 in total)

A number of the prototypes were in tree-flow.h, but there were also a few
in
cfgloop.h.  tree-ssa-loop.h was created to contain a couple of common
structs and act as the gathering file for any generally applicable
tree-ssa-loop includes. tree-flow.h includes this file for now.

There is a bit of a criss-cross mess between the cfg-* and tree-ssa-loop*
routines, but I'm not touching that for now.  Some of that might have to
get
resolved when I try to remove tree-flow.h as a standard include file from
the .c files.. we'll see.

In particular, tree-ssa-loop-niter.h exports a lot of more generally used
routines. loop-iv.c, loop-unroll.c and loop-unswitch.c needed to include
it.

A few routines weren't referenced outside their file so I made those
static,
and there was one routine, stmt_invariant_in_loop_p, which was actually
unused.

bootstraps on x86_64-unknown-linux-gnu and passes with no new
regressions.
OK?

+   enum tree_code cmp;
+ };
+
+ #include "tree-ssa-loop-im.h"
+ #include "tree-ssa-loop-ivcanon.h"

what's the particular reason to not do includes first?  That looks really
odd (and maybe rather than doing that these includes should be done
elsewhere, like in .c files or in a header that includes tree-ssa-loop.h).


I had sent a followup patch because I didn't like it either :-). It required
just a little shuffling since a couple of them used the typedefs declared
before them.


You seem to export things like movement_possibility that are not
used outside of tree-ssa-loop-im.c (in my tree at least).


Hmm, seems I missed that one... I've been checking every function
and enum, but managed to overlook that one somehow. Easy to fix.

Yes, both of those should go into tree-ssa-loop-im.c



Other than that the single reason we have to have all these exports
is that loop passes have their pass structures and gate / entries
defined in tree-ssa-loop.c instead of in the files the passes reside.
Consider changing that as a preparatory patch - it should cut down
the number of needed new .h files.


Yeah, good observation.   And by shuffling a few other exported routines
which are more generally used into tree-ssa-loop.c:
   for_each_index, lsm_tmp_name_add, gen_lsm_tmp_name, get_lsm_tmp_name,
tree_num_loop_insns and enhancing get_lsm_tmp_name() to accept a suffix
parameter, loop-im and loop-ivcanon no longer need a .h file either.

I moved the passes out of tree-ssa-loop.c  that would cause new .h files
unnecessarily, but left the others since that is orthogonal and could be
followed up later.

I just bundled it all together since it changes the original 2 patches a
bit.

Bootstraps on x86_64-unknown-linux-gnu, testsuite regressions running.
Assuming they pass fine, OK?

Hm.

Index: loop-iv.c
===
*** loop-iv.c   (revision 203243)
--- loop-iv.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 62,67 
--- 62,68 
   #include "df.h"
   #include "hash-table.h"
   #include "dumpfile.h"
+ #include "tree-ssa-loop-niter.h"

loop-iv.c is RTL land (likewise loop-unroll.c and loop-unswitch.c),
why do they need tree-ssa-loop-niter.h?

Apart from that the patch is ok.

we've got a bit of a mess in a number of places... I've cleaned up a few 
of the easy ones I found.


This file is required for  record_niter_bound(), 
max_loop_iterations_int() and estimated_loop_iterations_int().. so all 
bounds estimations.  There was enough of an infrastructural requirement 
for these routines within the file that I decided it was beyond the 
scope of what I'm doing at the moment to split them out.


Eventually I'd like to break this up into modules and make sure that 
things aren't creeping in from the wrong places, and restructure a 
bit...  Loops suffer from that (like here), cfg has a couple of places 
where either rtl or generic cfg routines are calling into a routine 
that can understand SSA. I noted one in one of the earlier patches that 
I'll have to get back to. This sort of thing has in fact prompted me 
to start looking now at our include web and what should be within the 
logical modules so we can more easily identify these sorts of things.


Too many things to clean up!! I think I could be doing this sort of 
thing for the next 8 months easily :-)


I'd prefer to come back and revisit this with a followup to try to split 
tree-ssa-loop-niter.c into another component, maybe loop-niter.[ch]...


Andrew




Re: RFA: Add news item for ARC port contribution

2013-10-08 Thread Gerald Pfeifer

On Mon, 7 Oct 2013, Joern Rennecke wrote:

OK to commit?


Yes, this looks good to me, thanks!

While you are at it, your entry in gcc/doc/contrib.texi could
do with an update as well. :-)

Gerald


Re: [patch] The remainder of tree-flow.h refactored.

2013-10-08 Thread Richard Biener
On Fri, Oct 4, 2013 at 6:52 PM, Andrew MacLeod  wrote:
> This patch clears the rest of the improperly located prototypes out of
> tree-flow.h.  A bit larger than the last few, but I was pushing to clear
> this up, and it's not quite as bad as it seems :-)
>
> Of interest:
>
> * tree-flow.h now contains *only* prototypes for tree-cfg.c. Next week
> I'll rename it to tree-cfg.h when I  visit all the .c files which include
> tree-flow.h to see who the true consumers are now.
>
> * tree-ssa-address.c only had one client for get_address_description, and
> that required exposing struct mem_address in tree-flow.h.  Since that call
> was followed by a call to  addr_for_mem_ref with the structure, i figured it
> was better to create a new function which combined that functionality. Now
> the struct is internal only.
>
> * ipa_pure_const.c has only a single export... warn_function_noreturn()
> which was only used in one place, tree-cfg.c to implement the
> warn_function_noreturn pass.  Although the name of the file is not really
> appropriate, I moved the pass structural stuff into ipa_pure_const.c and now
> there are no exports.  I looked at numerous other options, and in the end,
> this one is cleanest.  There is no other good home for the bits and pieces
> that can be shuffled.   Maybe ipa-func-attr.c would be a better file name
> :-P but that is a minuscule issue :-)
>
> * tree-cfg.c had the warn_function_noreturn pass removed from it, but I
> relocated the execute_fixup_cfg routine/pass from tree-optimize.c here. It
> makes more sense here and those were the only routines being exported from
> tree-optimize.c, and tree-cfg was the only consumer.
>
> * tree-dump.h was explicitly prototyping dump_function_to_file() from
> tree-cfg.c, presumably to avoid including all of tree-flow.h. I removed that
> and include tree-flow.h from tree-dump.c.   Seems like a backwards step
> until one realizes that tree-flow.h is now the header file for tree-cfg.c.
>
> * tree-ssa-dom.c::tree_ssa_dominator_optimize()  was calling
> ssa_name_values.release() from tree-ssa-threadedge.c, and was the only
> consumer.  The call immediately before is to threadedge_finalize_values()
> which performs that operation, so I deleted the use from tree-ssa-dom.c.   I
> would like to have made the vec ssa_name_values static within the file
> too, but then the SSA_NAME_VALUE macro to access it would have to make a
> call into a function in the .c file to access it, and I don't know what the
> performance hit would be... I expect it would be un-noticeable to abstract
> that out... but I left it for the moment.
>
> * There was a bit of a mess between gimplify.c, gimple.c, and tree.c. In the
> end, I
> - relocated the tree copying and unsharing routines from gimplify.c to
> tree.c. The uses were sporadic throughout the compiler, and weren't always
> used by things that should be gimple aware.  They were pure tree routines,
> so that seemed like an appropriate place for them.  Maybe something like
> tree-share.[ch] would be better...  that could be dealt with during a
> tree.[ch] examination.
> - force_gimple_operand_gsi_1 and force_gimple_operand_gsi were the only
> routines in gimplify.c that operated with gimple statement iterators from
> gimple.c.  The include files had this chicken and egg problem where I
> couldn't resolve the ordering very well.  In the end, I decided to relocate
> those 2 routines into gimple.c where there is a good understanding of
> gimple-iterators.  I expect the gimple iterators will be split from
> gimple.[ch] when I process gimple.h, so they may be relocatable again later.
> -  gimple.h declared some typedefs and enums that were gimplification
> specific, so I moved those to the new gimplify.h as well.
>
>
> This bootstraps on x86_64-unknown-linux-gnu and has no new regressions.  OK?

graphite.h should be unnecessary with moving the pass struct like you
did for other loop opts.  Likewise tree-parloops.h (well, ok, maybe
> you need parallelized_function_p, even though its implementation is
gross ;)).  Likewise tree-predcom.h.

unvisit_body isn't generic enough to warrant moving out of gimplify.c
(the only user).

The force_gimple_operand_gsi... routines are in gimplify.c because they ...
gimplify!  And you moved them but not force_gimple_operand[_1]!?

The rest looks reasonable (though the patch is big and I only had
a quick look).

Thanks,
Richard.



> Andrew
>
>
>


Re: [patch] tree-eh.c prototypes

2013-10-08 Thread Richard Biener
On Wed, Oct 2, 2013 at 7:54 PM, Andrew MacLeod  wrote:
> This patch moves the prototypes for tree-eh.c into a new file tree-eh.h.
> This file is in fact really gimple-eh.. we'll rename that later with the
> other tree->gimple renaming that is needed.
>
> however, using_eh_for_cleanups() is in fact a front end routine which is
> called when eh regions are used for cleanups. It sets a static flag in
> tree-eh.c and is only examined from one place in tree-eh.c.  I think 4 or 5
> of the front ends call this routine.
>
> Since this is really a front end interface routine, I kept the name and
> moved it and the static variable to tree.[ch] for now and added a query
> routine. This prevents the front ends from having to include any of this
> gimple stuff.
>
> Bootstraps on x86_64-unknown-linux-gnu and has no new regressions. OK?

Ok.

Thanks,
Richard.

> Andrew
>
> PS.  do we want to put debug routines in the .h file?  I ask because I see a
> few are, but in many other cases there are a number of them in the .c file
> which are not explicitly exported.  Often their names aren't very useful
> either and sometimes utilize structs or types that are specific to
> that .c file.   Mostly I think they are not static simply because the
> debugger needs them so the compiler wont throw them away.
>
> for instance, tree-ssa-pre.c has 3 of them, including a very common form:
> debug_bitmap_sets_for_bb(basic_block bb)...  This prints bitmaps based on
> internal meanings of the bits. I see numerous other files which have
> similar, if slightly different, names to do a similar function.
> And in fact, tree-ssa-pre.c will have no header file, unless we need a place
> to put these 3 debug functions.
>
> My personal preference is to simply leave them in the .c file, mostly
> because they can have internal types.  Ideally, all the prototypes would be
> listed early in the .c file in one place so anyone trying to debug something
> can find them easily.

The implementation should be marked with DEBUG_FUNCTION; the
header declarations are in place since in C days we required
strict prototypes.  I'd vote for removing the declarations in header files.

Richard.


Re: [patch] Fix PR bootstrap/58509

2013-10-08 Thread Richard Biener
On Fri, Sep 27, 2013 at 1:17 PM, Eric Botcazou  wrote:
> Hi,
>
> this fixes the ICE during the build of the Ada runtime on the SPARC, a fallout
> of the recent inliner changes:
>   http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01033.html
>
> The ICE is triggered because the ldd peephole merges a MEM with MEM_NOTRAP_P
> and a contiguous MEM without MEM_NOTRAP_P, keeping the MEM_NOTRAP_P flag on
> the result.  As a consequence, an EH edge is eliminated and a BB is orphaned.
>
> I think this shows that my above inliner patch was too gross: when you have
> successive inlining, you can quickly end up with a mess of trapping and non-
> trapping memory accesses for the same object.  So the attached seriously
> refines it, restricting it to parameters with reference type and leaning
> towards being less conservative.  Again, this should only affect Ada.
>
> Tested on x86_64-suse-linux, OK for the mainline?

This is getting somewhat gross ... what about clearing all TREE_NO_TRAPs
on inlining?

Otherwise I think the "proper" way is to teach passes that moving loads/stores
eventually has to clear TREE_NO_TRAP ... (a reason that for example
VRP cannot set TREE_NO_TRAP on dereferences of pointers that have
a non-NULL range...).

Richard.


> 2013-09-27  Eric Botcazou  
>
> PR bootstrap/58509
> * ipa-prop.h (get_ancestor_addr_info): Declare.
> * ipa-prop.c (get_ancestor_addr_info): Make public.
> * tree-inline.c (is_parm): Rename into...
> (is_ref_parm): ...this.
> (is_based_on_ref_parm): New predicate.
> (remap_gimple_op_r): Do not propagate TREE_THIS_NOTRAP on MEM_REF if
> a parameter with reference type has been remapped and the result is
> not based on another parameter with reference type.
> (copy_tree_body_r): Likewise on INDIRECT_REF and MEM_REF.
>
>
> 2013-09-27  Eric Botcazou  
>
> * gnat.dg/specs/opt1.ads: New test.
>
>
> --
> Eric Botcazou


Re: [patch] Fix PR middle-end/58570

2013-10-08 Thread Richard Biener
On Tue, Oct 8, 2013 at 10:19 AM, Eric Botcazou  wrote:
> Hi,
>
> this is a regression on the mainline introduced by my tree-ssa-alias.c change:
>
> 2013-04-17  Eric Botcazou  
>
> * tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): New.
> (decl_refs_may_alias_p): Add REF1 and REF2 parameters.
> Use nonoverlapping_component_refs_of_decl_p to disambiguate component
> references.
> (refs_may_alias_p_1): Adjust call to decl_refs_may_alias_p.
> * tree-streamer.c (record_common_node): Adjust reference in comment.
>
> Unlike its model nonoverlapping_component_refs_p from alias.c, the predicate
> nonoverlapping_component_refs_of_decl_p considers that different fields in the
> same structure cannot overlap.  While that's true in GIMPLE, that's false in
> RTL for bitfields and tree-ssa-alias.c is also queried from RTL nowadays...

Probably because the actual accesses may overlap if we choose to
perform a bigger access.

The same can happen if, for struct { char c1; char c2; }, we perform
an HImode access in case the target doesn't support QImode accesses.
Basically anytime we go through the bitfield expansion path.  Thus, doesn't
that mean that MEM_EXPR is wrong on the MEMs?  Maybe we used to
strip all DECL_BIT_FIELD component-refs at some point (adjusting
MEM_OFFSET accordingly)?

Your patch seems to paper over this issue in the wrong place ...

Richard.

> Therefore the attached patch just copies the missing bits from the former to
> the latter.  Tested on x86_64-suse-linux, OK for the mainline?
>
>
> 2013-10-08  Eric Botcazou  
>
> PR middle-end/58570
> * tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Return
> false if both components are bitfields.
>
>
> 2013-10-08  Eric Botcazou  
>
> * gcc.c-torture/execute/pr58570.c: New test.
>
>
> --
> Eric Botcazou


Re: Optimize callers using nonnull attribute

2013-10-08 Thread Richard Biener
On Mon, Oct 7, 2013 at 3:52 PM, Marc Glisse  wrote:
> On Mon, 7 Oct 2013, Richard Biener wrote:
>
>> On Mon, Oct 7, 2013 at 12:33 AM, Marc Glisse  wrote:
>>>
>>> Hello,
>>>
>>> this patch asserts that when we call a function with the nonnull
>>> attribute,
>>> the corresponding argument is not zero, just like when we dereference a
>>> pointer. Everything is under a check for flag_delete_null_pointer_checks.
>>>
>>> Note that this function currently gives up if the statement may throw
>>> (because in some languages indirections may throw?), but this could
>>> probably
>>> be relaxed a bit so my example would still work when compiled with g++,
>>> without having to mark f1 and f2 as throw().
>>>
>>> Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.
>>
>>
>> Can you please restructure it in a way to not require a goto?  That is,
>> move searching for a non-null opportunity into a helper function?
>
>
> Thanks. I wasn't sure what to put in the helper and what to keep in the
> original function, so I'll wait a bit for comments before the commit.

Works for me.

Richard.

>
> Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.
>
> 2013-10-08  Marc Glisse  
>
> PR tree-optimization/58480
> gcc/
> * tree-vrp.c (infer_nonnull_range): New function.
> (infer_value_range): Call infer_nonnull_range.
>
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/pr58480.c: New file.
>
> --
> Marc Glisse
>
> Index: testsuite/gcc.dg/tree-ssa/pr58480.c
> ===
> --- testsuite/gcc.dg/tree-ssa/pr58480.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/pr58480.c (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +
> +extern void eliminate (void);
> +extern void* f1 (void *a, void *b) __attribute__((nonnull));
> +extern void* f2 (void *a, void *b) __attribute__((nonnull(2)));
> +void g1 (void*p, void*q){
> +  f1 (q, p);
> +  if (p == 0)
> +eliminate ();
> +}
> +void g2 (void*p, void*q){
> +  f2 (q, p);
> +  if (p == 0)
> +eliminate ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Folding predicate\[^\\n\]*to 0" 2
> "vrp1" } } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
>
> Property changes on: testsuite/gcc.dg/tree-ssa/pr58480.c
> ___
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Index: tree-vrp.c
> ===
> --- tree-vrp.c  (revision 203241)
> +++ tree-vrp.c  (working copy)
> @@ -4455,63 +4455,103 @@ build_assert_expr_for (tree cond, tree v
>
>  static inline bool
>  fp_predicate (gimple stmt)
>  {
>GIMPLE_CHECK (stmt, GIMPLE_COND);
>
>return FLOAT_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)));
>  }
>
>
> +/* If OP can be inferred to be non-zero after STMT executes, return true.
> */
> +
> +static bool
> +infer_nonnull_range (gimple stmt, tree op)
> +{
> +  /* We can only assume that a pointer dereference will yield
> + non-NULL if -fdelete-null-pointer-checks is enabled.  */
> +  if (!flag_delete_null_pointer_checks
> +  || !POINTER_TYPE_P (TREE_TYPE (op))
> +  || gimple_code (stmt) == GIMPLE_ASM)
> +return false;
> +
> +  unsigned num_uses, num_loads, num_stores;
> +
> +  count_uses_and_derefs (op, stmt, &num_uses, &num_loads, &num_stores);
> +  if (num_loads + num_stores > 0)
> +return true;
> +
> +  if (gimple_code (stmt) == GIMPLE_CALL)
> +{
> +  tree fntype = gimple_call_fntype (stmt);
> +  tree attrs = TYPE_ATTRIBUTES (fntype);
> +  for (; attrs; attrs = TREE_CHAIN (attrs))
> +   {
> + attrs = lookup_attribute ("nonnull", attrs);
> +
> + /* If "nonnull" wasn't specified, we know nothing about
> +the argument.  */
> + if (attrs == NULL_TREE)
> +   return false;
> +
> + /* If "nonnull" applies to all the arguments, then ARG
> +is non-null.  */
> + if (TREE_VALUE (attrs) == NULL_TREE)
> +   return true;
> +
> + /* Now see if op appears in the nonnull list.  */
> + for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t))
> +   {
> + int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1;
> + tree arg = gimple_call_arg (stmt, idx);
> + if (op == arg)
> +   return true;
> +   }
> +   }
> +}
> +
> +  return false;
> +}
> +
>  /* If the range of values taken by OP can be inferred after STMT executes,
> return the comparison code (COMP_CODE_P) and value (VAL_P) that
> describes the inferred range.  Return true if a range could be
> inferred.  */
>
>  static bool
>  infer_value_range (gimple stmt, tree op, enum tree_code *co

Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-08 Thread Richard Biener
On Mon, Oct 7, 2013 at 3:39 PM, Andrew MacLeod  wrote:
> On 10/07/2013 04:44 AM, Richard Biener wrote:
>>
>> On Thu, Oct 3, 2013 at 4:11 AM, Andrew MacLeod 
>> wrote:
>>>
>>> this patch consolidates tree-ssa-loop*.c files with new .h files as
>>> required
>>> (8 in total)
>>>
>>> A number of the prototypes were in tree-flow.h, but there were also a few
>>> in
>>> cfgloop.h.  tree-ssa-loop.h was created to contain a couple of common
>>> structs and act as the gathering file for any generally applicable
>>> tree-ssa-loop includes. tree-flow.h includes this file for now.
>>>
>>> There is a bit of a criss-cross mess between the cfg-* and tree-ssa-loop*
>>> routines, but I'm not touching that for now.  Some of that might have to
>>> get
>>> resolved when I try to remove tree-flow.h as a standard include file from
>>> the .c files.. we'll see.
>>>
>>> In particular, tree-ssa-loop-niter.h exports a lot of more generally used
>>> routines. loop-iv.c, loop-unroll.c and loop-unswitch.c needed to include
>>> it.
>>>
>>> A few routines weren't referenced outside their file so I made those
>>> static,
>>> and there was one routine, stmt_invariant_in_loop_p, which was actually
>>> unused.
>>>
>>> bootstraps on x86_64-unknown-linux-gnu and passes with no new
>>> regressions.
>>> OK?
>>
>> +   enum tree_code cmp;
>> + };
>> +
>> + #include "tree-ssa-loop-im.h"
>> + #include "tree-ssa-loop-ivcanon.h"
>>
>> what's the particular reason to not do includes first?  That looks really
>> odd (and maybe rather than doing that these includes should be done
>> elsewhere, like in .c files or in a header that includes tree-ssa-loop.h).
>
>
> I had sent a followup patch because I didn't like it either :-). required
> just a little shuffling since a couple of them used the typedefs declared
> before them.
>
>>
>> You seem to export things like movement_possibility that are not
>> used outside of tree-ssa-loop-im.c (in my tree at least).
>
>
> Hmm, seems I missed that one... I've been checking every function
> and enum, but somehow managed to miss it. Easy to fix.
>
> Yes, both those should go into the tree-ssa-loop-im.c
>
>
>>
>> Other than that the single reason we have to have all these exports
>> is that loop passes have their pass structures and gate / entries
>> defined in tree-ssa-loop.c instead of in the files the passes reside.
>> Consider changing that as a preparatory patch - it should cut down
>> the number of needed new .h files.
>>
> Yeah, good observation.   And by shuffling a few other exported routines
> which are more generally used into tree-ssa-loop.c:
>   for_each_index, lsm_tmp_name_add, gen_lsm_tmp_name, get_lsm_tmp_name,
> tree_num_loop_insns and enhancing get_lsm_tmp_name() to accept a suffix
> parameter, loop-im and loop-ivcanon no longer need a .h file either.
>
> I moved the passes out of tree-ssa-loop.c  that would cause new .h files
> unnecessarily, but left the others since that is orthogonal and could be
> followed up later.
>
> I just bundled it all together since it changes the original 2 patches a
> bit.
>
> Bootstraps on x86_64-unknown-linux-gnu, testsuite regressions running.
> Assuming they pass fine, OK?

Hm.

Index: loop-iv.c
===
*** loop-iv.c   (revision 203243)
--- loop-iv.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 62,67 
--- 62,68 
  #include "df.h"
  #include "hash-table.h"
  #include "dumpfile.h"
+ #include "tree-ssa-loop-niter.h"

loop-iv.c is RTL land (likewise loop-unroll.c and loop-unswitch.c),
why do they need tree-ssa-loop-niter.h?

Apart from that the patch is ok.

Thanks,
Richard.


> Andrew


[2nd PING] [C++ PATCH] demangler fix (take 2)

2013-10-08 Thread Gary Benson
Hi all,

This is a resubmission of my previous demangler fix [1] rewritten
to avoid using hashtables and other libiberty features.

From the above referenced email:

d_print_comp maintains a certain amount of scope across calls (namely
a stack of templates) which is used when evaluating references in
template argument lists.  If such a reference is later used from a
substitution then the scope in force at the time of the substitution is
used.  This appears to be wrong (I say appears because I couldn't find
anything in the API [2] to clarify this).

The attached patch causes the demangler to capture the scope the first
time such a reference is traversed, and to use that captured scope on
subsequent traversals.  This fixes GDB PR 14963 [3] whereby a
reference is resolved against the wrong template, causing an infinite
loop and eventual stack overflow and segmentation fault.

I've added the result to the demangler test suite, but I know of no
way to check the validity of the demangled symbol other than by
inspection (and I am no expert here!)  If anybody knows a way to
check this then please let me know!  Otherwise, I hope this
not-really-checked demangled version is acceptable.

Thanks,
Gary

[1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00215.html
[2] http://mentorembedded.github.io/cxx-abi/abi.html#mangling
[3] http://sourceware.org/bugzilla/show_bug.cgi?id=14963

-- 
http://gbenson.net/


diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 89e108a..2ff8216 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,20 @@
+2013-09-17  Gary Benson  
+
+   * cp-demangle.c (struct d_saved_scope): New structure.
+   (struct d_print_info): New fields saved_scopes and
+   num_saved_scopes.
+   (d_print_init): Initialize the above.
+   (d_print_free): New function.
+   (cplus_demangle_print_callback): Call the above.
+   (d_copy_templates): New function.
+   (d_print_comp): New variables saved_templates and
+   need_template_restore.
+   [DEMANGLE_COMPONENT_REFERENCE,
+   DEMANGLE_COMPONENT_RVALUE_REFERENCE]: Capture scope the first
+   time the component is traversed, and use the captured scope for
+   subsequent traversals.
+   * testsuite/demangle-expected: Add regression test.
+
 2013-09-10  Paolo Carlini  
 
PR bootstrap/58386
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 70f5438..a199f6d 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -275,6 +275,18 @@ struct d_growable_string
   int allocation_failure;
 };
 
+/* A demangle component and some scope captured when it was first
+   traversed.  */
+
+struct d_saved_scope
+{
+  /* The component whose scope this is.  */
+  const struct demangle_component *container;
+  /* The list of templates, if any, that was current when this
+ scope was captured.  */
+  struct d_print_template *templates;
+};
+
 enum { D_PRINT_BUFFER_LENGTH = 256 };
 struct d_print_info
 {
@@ -302,6 +314,10 @@ struct d_print_info
   int pack_index;
   /* Number of d_print_flush calls so far.  */
   unsigned long int flush_count;
+  /* Array of saved scopes for evaluating substitutions.  */
+  struct d_saved_scope *saved_scopes;
+  /* Number of saved scopes in the above array.  */
+  int num_saved_scopes;
 };
 
 #ifdef CP_DEMANGLE_DEBUG
@@ -3665,6 +3681,30 @@ d_print_init (struct d_print_info *dpi, 
demangle_callbackref callback,
   dpi->opaque = opaque;
 
   dpi->demangle_failure = 0;
+
+  dpi->saved_scopes = NULL;
+  dpi->num_saved_scopes = 0;
+}
+
+/* Free a print information structure.  */
+
+static void
+d_print_free (struct d_print_info *dpi)
+{
+  int i;
+
+  for (i = 0; i < dpi->num_saved_scopes; i++)
+{
+  struct d_print_template *ts, *tn;
+
+  for (ts = dpi->saved_scopes[i].templates; ts != NULL; ts = tn)
+   {
+ tn = ts->next;
+ free (ts);
+   }
+}
+
+  free (dpi->saved_scopes);
 }
 
 /* Indicate that an error occurred during printing, and test for error.  */
@@ -3749,6 +3789,7 @@ cplus_demangle_print_callback (int options,
demangle_callbackref callback, void *opaque)
 {
   struct d_print_info dpi;
+  int success;
 
   d_print_init (&dpi, callback, opaque);
 
@@ -3756,7 +3797,9 @@ cplus_demangle_print_callback (int options,
 
   d_print_flush (&dpi);
 
-  return ! d_print_saw_error (&dpi);
+  success = ! d_print_saw_error (&dpi);
+  d_print_free (&dpi);
+  return success;
 }
 
 /* Turn components into a human readable string.  OPTIONS is the
@@ -3913,6 +3956,36 @@ d_print_subexpr (struct d_print_info *dpi, int options,
 d_append_char (dpi, ')');
 }
 
+/* Return a shallow copy of the current list of templates.
+   On error d_print_error is called and a partial list may
+   be returned.  Whatever is returned must be freed.  */
+
+static struct d_print_template *
+d_copy_templates (struct d_print_info *dpi)
+{
+  struct d_print_template *src, *result, **link = &result;
+
+  for (

Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips

2013-10-08 Thread Jan Hubicka
> Hi Honza,
> 
> I am planning to update the scheduler descriptions for bdver3 first.
> Attached is the patch. Please let me know your comments if any.
> 
> Though I agree on merging bdver1/2 and bdver3 on most parts, the FP lines and 
> decoding schemes are different. So, let me know how I can approach merging
> these.

Yep, I think we need to merge only those automata that are the same for both:
(define_automaton "bdver3,bdver3_ieu,bdver3_load,bdver3_fp,bdver3_agu")
probably can become
(define_automaton "bdver3,bdver3_fp")
with the corresponding reservations using bdver3_ieu,bdver3_load,bdver3_agu
changed to the bdver1 automaton.  I think it should result in a smaller
binary - the fact that all conditionals are physically duplicated in
bdver1/bdver3.md should be optimized away by genautomata.

I also played a bit with the decoders and I am attaching my version - that
seems SPEC neutral though.
Your version has the problem that it does not model the fact that the two
decoders work sequentially.

I removed the bdver1-decodev unit and instead simply reserve all three
decoders.  I added a presence set requiring the second decoder bank to be
taken only after the first one, and changed the presence set requiring
decoder 2 to be taken only after decoders 0+1 into a final presence set, so
the decodev reservation has a chance to pass.
Finally I added use-decodera, which consumes all of the first decoder bank
as soon as we start to allocate the second one - we cannot really allocate
them in interchanging order.

I also noticed that push/pop instructions are modeled as being vector, while
the manual says they generate one micro-op unless a memory operand is used.

I did not have much time to play further with this, except for manual
inspection of the schedules, which seem better now and in rare cases reach
4-5 instructions per cycle.

We also should enable ia32_multipass_dfa_lookahead - with that, the scheduler
should be able to put double-decoded and vector-decoded insns in the proper
places.

We can also experiment with defining TARGET_SCHED_VARIABLE_ISSUE to get more
realistic estimates of what still can be issued - the value of 6 is
unrealistically high.

Seems like with the addition of Atom the scheduler macros became a very
twisty maze of passages.
I will work on replacing most of the CPU cases into tuning flags + costs.

What do you think?
Honza


Index: bdver1.md
===
--- bdver1.md   (revision 203204)
+++ bdver1.md   (working copy)
@@ -41,7 +41,9 @@
 (define_cpu_unit "bdver1-decode0" "bdver1")
 (define_cpu_unit "bdver1-decode1" "bdver1")
 (define_cpu_unit "bdver1-decode2" "bdver1")
-(define_cpu_unit "bdver1-decodev" "bdver1")
+(define_cpu_unit "bdver1-decode0b" "bdver1")
+(define_cpu_unit "bdver1-decode1b" "bdver1")
+(define_cpu_unit "bdver1-decode2b" "bdver1")
 
 ;; Model the fact that double decoded instruction may take 2 cycles
 ;; to decode when decoder2 and decoder0 in next cycle
@@ -57,18 +59,26 @@
 ;; too.  Vector decoded instructions then can't be issued when modeled
 ;; as consuming decoder0+decoder1+decoder2.
 ;; We solve that by specialized vector decoder unit and exclusion set.
-(presence_set "bdver1-decode2" "bdver1-decode0")
-(exclusion_set "bdver1-decodev" "bdver1-decode0,bdver1-decode1,bdver1-decode2")
-
-(define_reservation "bdver1-vector" "nothing,bdver1-decodev")
-(define_reservation "bdver1-direct1" "nothing,bdver1-decode1")
+(final_presence_set "bdver1-decode2" "bdver1-decode0,bdver1-decode1")
+(presence_set "bdver1-decode0b,bdver1-decode1b,bdver1-decode2b" 
"bdver1-decode0,bdver1-decode1")
+(final_presence_set "bdver1-decode2b" "bdver1-decode0b,bdver1-decode1b")
+
+(define_reservation "use-decodera" "((bdver1-decode0 | nothing)
++ (bdver1-decode1 | nothing)
++ (bdver1-decode2 | nothing))")
+(define_reservation "bdver1-vector" 
"nothing,((bdver1-decode0+bdver1-decode1+bdver1-decode2)
+ 
|(use-decodera+bdver1-decode0b+bdver1-decode1b+bdver1-decode2b))")
+(define_reservation "bdver1-direct1" 
"nothing,(bdver1-decode1|(use-decodera+bdver1-decode1b))")
 (define_reservation "bdver1-direct" "nothing,
 (bdver1-decode0 | bdver1-decode1
-| bdver1-decode2)")
+| bdver1-decode2 | 
(use-decodera+bdver1-decode0b)
+| (use-decodera+bdver1-decode1b) | 
(use-decodera+bdver1-decode2b))")
 ;; Double instructions behaves like two direct instructions.
 (define_reservation "bdver1-double" "((bdver1-decode2,bdver1-decode0)
 | (nothing,(bdver1-decode0 + 
bdver1-decode1))
-| (nothing,(bdver1-decode1 + 
bdver1-decode2)))")
+| (nothing,(bdver1-decode1 + 
bdver1-decode2))
+| (nothing,(use-decodera + bdver1-decode0b 

Re: [Patch] Fix the testcases that use bind_pic_locally

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 10:14:59AM +0100, Vidya Praveen wrote:
> There are several tests that use "dg-add-options bind_pic_locally" in order to
> add -fPIE or -fpie when -fPIC or -fpic are used respectively, with the
> expectation that -fPIE/-fpie will override -fPIC/-fpic. But this doesn't
> happen, since -fPIE/-fpie will be added before the -fPIC/-fpic (whether -fPIC/-fpic is
> added as a multilib option or through cflags). This is essentially due to the
> fact that cflags and multilib flags are added after the options are added 
> through dg-options, dg-add-options, et al. in default_target_compile function.
> 
> Assuming dg-options or dg-add-options should always win, we can fix this by
> modifying the order in which they are concatenated at default_target_compile 
> in
> target.exp. But this is not recommended since it depends on everyone who tests
> upgrading their dejagnu (refer [1]). 

This looks like a big step backwards and I'm afraid it can break targets
where -fpic/-fPIC is the default.  If dg-add-options bind_pic_locally must
add options to the end of command line, then can't you just push the options
that must go last to some variable other than dg-extra-tool-flags and as we
override dejagnu's dg-test, put it in our override last (or in whatever
other method that already added the multilib options)?

Jakub


RE: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-10-08 Thread Bernd Edlinger
Hi,

On Tue, 8 Oct 2013 10:01:37, Eric Botcazou wrote:
>
>> OK, what do you think of it now?
>
> My take on this is that the Proper Fix(tm) has been posted by Martin:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00082.html
> IMO it's a no-brainer, modulo the ABI concern. Everything else is more or
> less clever stuff to paper over this original compute_record_mode bug.
>
> We are lying to the expander by pretending that the object has V2DImode since
> it's larger and thus cannot be manipulated in this mode; everything is clearly
> downhill from there. If we don't want to properly fix the bug then let's put
> a hack in the expander, possibly using EXPAND_MEMORY, but it should trigger
> _only_ for structures with zero-sized arrays and non-BLKmode and be preceded
> by a big ??? comment explaining why it is deemed necessary.
>
> --
> Eric Botcazou

I agree that assigning a non-BLKmode to structures with zero-sized arrays
should be considered a bug.

And it should be checked that there is only ONE zero-sized array.
And it should be checked that it is only allowed at the end of the structure.

Otherwise we have something like a union instead of a structure,
which will break the code in tree-ssa-alias.c !

And again, this is not only a problem of structures with zero-sized
arrays at the end. Remember my previous example code:
On ARM (or anything with STRICT_ALIGNMENT) this union has the
same problems:
 
/* PR middle-end/57748 */
/* arm-eabi-gcc -mcpu=cortex-a9 -O3 */
#include 
 
union  x
{
  short a[2];
  char x[4];
} __attribute__((packed, aligned(4))) ;
typedef volatile union  x *s;
 
void __attribute__((noinline, noclone))
check (void)
{
  s xx=(s)(0x8002);
  /* although volatile xx->x[3] reads 4 bytes here */
  if (xx->x[3] != 3)
    abort ();
}
 
void __attribute__((noinline, noclone))
foo (void)
{
  s xx=(s)(0x8002);
  xx->x[3] = 3;
}
 
int
main ()
{
  foo ();
  check ();
  return 0;
}


This union has a UINT32 mode, look at compute_record_mode!
Because of this example I still think that the expander should know what
we intend to do with the base object.


Regards
Bernd.

[Patch] Fix the testcases that use bind_pic_locally

2013-10-08 Thread Vidya Praveen
Hello,

There are several tests that use "dg-add-options bind_pic_locally" in order to
add -fPIE or -fpie when -fPIC or -fpic are used respectively, with the
expectation that -fPIE/-fpie will override -fPIC/-fpic. But this doesn't
happen, since -fPIE/-fpie will be added before the -fPIC/-fpic (whether -fPIC/-fpic is
added as a multilib option or through cflags). This is essentially due to the
fact that cflags and multilib flags are added after the options are added 
through dg-options, dg-add-options, et al. in default_target_compile function.

Assuming dg-options or dg-add-options should always win, we can fix this by
modifying the order in which they are concatenated at default_target_compile in
target.exp. But this is not recommended since it depends on everyone who tests
upgrading their dejagnu (refer [1]). 

So this patch replaces:

/* { dg-add-options bind_pic_locally } */

with 

/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */

in all the applicable test files. 

NOTE: There are many files that use bind_pic_locally but they do PASS whether
or not -fPIE/-fpie is passed. Nevertheless, I've replaced it in all the files
that use bind_pic_locally.

add_options_for_bind_pic_locally should IMO be removed or deprecated since it
is misleading. I can post a separate patch for this if everyone agrees to it.

References:
[1] http://gcc.gnu.org/ml/gcc/2013-07/msg00281.html
[2] http://gcc.gnu.org/ml/gcc/2013-09/msg00207.html

This issue is, for obvious reasons, common to all targets.

Tested for aarch64-none-elf. OK for trunk?

Cheers
VP

---

gcc/testsuite/ChangeLog:

2013-10-08  Vidya Praveen  

* gcc.dg/inline-33.c: Remove bind_pic_locally and skip if -fPIC/-fpic
is used.
* gcc.dg/ipa/ipa-3.c: Likewise.
* gcc.dg/ipa/ipa-5.c: Likewise.
* gcc.dg/ipa/ipa-7.c: Likewise.
* gcc.dg/ipa/ipcp-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-1.c: Likewise.
* gcc.dg/ipa/ipcp-agg-2.c: Likewise.
* gcc.dg/ipa/ipcp-agg-6.c: Likewise.
* gcc.dg/ipa/ipa-1.c: Likewise.
* gcc.dg/ipa/ipa-2.c: Likewise.
* gcc.dg/ipa/ipa-4.c: Likewise.
* gcc.dg/ipa/ipa-8.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Likewise.
* gcc.dg/ipa/ipcp-1.c: Likewise.
* gcc.dg/ipa/ipcp-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-3.c: Likewise.
* gcc.dg/ipa/ipcp-agg-4.c: Likewise.
* gcc.dg/ipa/ipcp-agg-5.c: Likewise.
* gcc.dg/ipa/ipcp-agg-7.c: Likewise.
* gcc.dg/ipa/ipcp-agg-8.c: Likewise.
* gcc.dg/ipa/pr56988.c: Likewise.
* g++.dg/ipa/iinline-1.C: Likewise.
* g++.dg/ipa/iinline-2.C: Likewise.
* g++.dg/ipa/iinline-3.C: Likewise.
* g++.dg/ipa/inline-1.C: Likewise.
* g++.dg/ipa/inline-2.C: Likewise.
* g++.dg/ipa/inline-3.C: Likewise.
* g++.dg/other/first-global.C: Likewise.
* g++.dg/parse/attr-externally-visible-1.C: Likewise.
* g++.dg/torture/pr40323.C: Likewise.
* g++.dg/torture/pr55260-1.C: Likewise.
* g++.dg/torture/pr55260-2.C: Likewise.
* g++.dg/tree-ssa/inline-1.C: Likewise.
* g++.dg/tree-ssa/inline-2.C: Likewise.
* g++.dg/tree-ssa/inline-3.C: Likewise.
* g++.dg/tree-ssa/nothrow-1.C: Likewise.
* gcc.dg/tree-ssa/inline-3.c: Likewise.
* gcc.dg/tree-ssa/inline-4.c: Likewise.
* gcc.dg/tree-ssa/ipa-cp-1.c: Likewise.
* gcc.dg/tree-ssa/local-pure-const.c: Likewise.
* gfortran.dg/whole_file_5.f90: Likewise.
* gfortran.dg/whole_file_6.f90: Likewise.
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-1.C b/gcc/testsuite/g++.dg/ipa/iinline-1.C
index 9f99893..e4daa8c 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-1.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-1.C
@@ -2,7 +2,7 @@
inlining..  */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining"  } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 extern void non_existent (const char *, int);
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-2.C b/gcc/testsuite/g++.dg/ipa/iinline-2.C
index 670a5dd..64a4dce 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-2.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-2.C
@@ -2,7 +2,7 @@
inlining..  */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining"  } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 extern void non_existent (const char *, int);
 
diff --git a/gcc/testsuite/g++.dg/ipa/iinline-3.C b/gcc/testsuite/g++.dg/ipa/iinline-3.C
index 3daee9a..0d59969 100644
--- a/gcc/testsuite/g++.dg/ipa/iinline-3.C
+++ b/gcc/testsuite/g++.dg/ipa/iinline-3.C
@@ -2,7 +2,7 @@
parameters which have been modified.  */
 /* { dg-do run } */
 /* { dg-options "-O3 -fno-early-inlining"  } */
-/* { dg-add-options bind_pic_locally } */
+/* { dg-skip-if "" { *-*-* } { "-fPIC" "-fpic" } { "" } } */
 
 ex

RE: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips

2013-10-08 Thread Gopalasubramanian, Ganesh
Hi Honza,

I am planning to update the scheduler descriptions for bdver3 first.
Attached is the patch. Please let me know your comments if any.

Though I agree on merging bdver1/2 and bdver3 on most parts, the FP lines and
decoding schemes are different. So, let me know how I can approach merging
these.

Regards
Ganesh

-Original Message-
From: Jan Hubicka [mailto:hubi...@ucw.cz] 
Sent: Monday, September 30, 2013 4:47 PM
To: gcc-patches@gcc.gnu.org; Gopalasubramanian, Ganesh; hjl.to...@gmail.com
Subject: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 chips


Hi,
while looking into schedules produced for Bulldozer and Core I noticed that
they do not seem to match reality.  This is because ix86_issue_rate limits
those CPUs to 3 instructions per cycle, while they are designed to do 4, and
this somewhat confused ix86_adjust_cost.

I also added the stack engine to modern chips, even though the scheduler doesn't 
really understand that multiple push operations can happen in one cycle. At 
least it gets the stack updates right in sequences of push/pop operations.

I did not update the Bulldozer issue rates yet.  The current scheduler model 
won't allow it to execute more than 3 instructions per cycle (and 2 for version 3).  
I think bdver1.md/bdver3.md needs to be updated first.

I am testing x86_64-linux and will commit if there are no complaints.

Honza

* i386.c (ix86_issue_rate): Pentium4/Nocona issue 2 instructions
per cycle, Core/CoreI7/Haswell 4 instructions per cycle.
(ix86_adjust_cost): Add stack engine to modern AMD chips;
fix for core; remove Atom that mistakenly shared code with AMD.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 203011)
+++ config/i386/i386.c  (working copy)
@@ -24435,17 +24435,14 @@ ix86_issue_rate (void)
 case PROCESSOR_SLM:
 case PROCESSOR_K6:
 case PROCESSOR_BTVER2:
+case PROCESSOR_PENTIUM4:
+case PROCESSOR_NOCONA:
   return 2;
 
 case PROCESSOR_PENTIUMPRO:
-case PROCESSOR_PENTIUM4:
-case PROCESSOR_CORE2:
-case PROCESSOR_COREI7:
-case PROCESSOR_HASWELL:
 case PROCESSOR_ATHLON:
 case PROCESSOR_K8:
 case PROCESSOR_AMDFAM10:
-case PROCESSOR_NOCONA:
 case PROCESSOR_GENERIC:
 case PROCESSOR_BDVER1:
 case PROCESSOR_BDVER2:
@@ -24453,6 +24450,11 @@ ix86_issue_rate (void)
 case PROCESSOR_BTVER1:
   return 3;
 
+case PROCESSOR_CORE2:
+case PROCESSOR_COREI7:
+case PROCESSOR_HASWELL:
+  return 4;
+
 default:
   return 1;
 }
@@ -24709,10 +24711,15 @@ ix86_adjust_cost (rtx insn, rtx link, rt
 case PROCESSOR_BDVER3:
 case PROCESSOR_BTVER1:
 case PROCESSOR_BTVER2:
-case PROCESSOR_ATOM:
 case PROCESSOR_GENERIC:
   memory = get_attr_memory (insn);
 
+  /* The stack engine allows push and pop instructions to execute in parallel.  */
+  if (((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
+  && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
+ && (ix86_tune != PROCESSOR_ATHLON && ix86_tune != PROCESSOR_K8))
+   return 0;
+
   /* Show ability of reorder buffer to hide latency of load by executing
 in parallel with previous instruction in case
 previous instruction is not needed to compute the address.  */
@@ -24737,6 +24744,29 @@ ix86_adjust_cost (rtx insn, rtx link, rt
  else
cost = 0;
}
+  break;
+
+case PROCESSOR_CORE2:
+case PROCESSOR_COREI7:
+case PROCESSOR_HASWELL:
+  memory = get_attr_memory (insn);
+
+  /* The stack engine allows push and pop instructions to execute in parallel.  */
+  if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
+ && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
+   return 0;
+
+  /* Show ability of reorder buffer to hide latency of load by executing
+in parallel with previous instruction in case
+previous instruction is not needed to compute the address.  */
+  if ((memory == MEMORY_LOAD || memory == MEMORY_BOTH)
+ && !ix86_agi_dependent (dep_insn, insn))
+   {
+ if (cost >= 4)
+   cost -= 4;
+ else
+   cost = 0;
+   }
   break;
 
 case PROCESSOR_SLM:


issue_rate_bdver3.patch
Description: issue_rate_bdver3.patch


Re: [PATCH] alternative hirate for builtin_expert

2013-10-08 Thread Ramana Radhakrishnan
>> Can someone comment / approve it quickly so that we get AArch32 and AArch64
>> linux cross-builds back up ?
>
> Ok.

Applied for Dehao as r203269.  Tests on ARM came back OK.

Ramana

>
> Thanks,
> Richard.
>
>>
>> regards
>> Ramana
>>
>>>
>>> Honza
>>>

 Dehao

>
> Honza
>>>
>>>
>>
>>


[ping] Fix PR bootstrap/58509

2013-10-08 Thread Eric Botcazou
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg02000.html

Thanks in advance.

-- 
Eric Botcazou


[patch] Fix PR middle-end/58570

2013-10-08 Thread Eric Botcazou
Hi,

this is a regression on the mainline introduced by my tree-ssa-alias.c change:

2013-04-17  Eric Botcazou  

* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): New.
(decl_refs_may_alias_p): Add REF1 and REF2 parameters.
Use nonoverlapping_component_refs_of_decl_p to disambiguate component
references.
(refs_may_alias_p_1): Adjust call to decl_refs_may_alias_p.
* tree-streamer.c (record_common_node): Adjust reference in comment.

Unlike its model nonoverlapping_component_refs_p from alias.c, the predicate 
nonoverlapping_component_refs_of_decl_p considers that different fields in the 
same structure cannot overlap.  While that's true in GIMPLE, it's false in 
RTL for bitfields, and tree-ssa-alias.c is also queried from RTL nowadays...

Therefore the attached patch just copies the missing bits from the former to 
the latter.  Tested on x86_64-suse-linux, OK for the mainline?


2013-10-08  Eric Botcazou  

PR middle-end/58570
* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Return
false if both components are bitfields.


2013-10-08  Eric Botcazou  

* gcc.c-torture/execute/pr58570.c: New test.


-- 
Eric Botcazou

#pragma pack(1)
struct S
{
  int f0:15;
  int f1:29;
};

int e = 1, i;
static struct S d[6];

int
main (void)
{
  if (e)
{
  d[i].f0 = 1;
  d[i].f1 = 1;
}
  if (d[0].f1 != 1)
__builtin_abort ();
  return 0;
}

Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c	(revision 203241)
+++ tree-ssa-alias.c	(working copy)
@@ -803,12 +803,13 @@ nonoverlapping_component_refs_of_decl_p
   if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
 	 goto may_overlap;
 
-  /* Different fields of the same record type cannot overlap.  */
+  /* Different fields of the same record type cannot overlap, unless they
+	 are both bitfields and we are at the RTL level.  */
   if (field1 != field2)
 	{
 	  component_refs1.release ();
 	  component_refs2.release ();
-	  return true;
+	  return !(DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2));
 	}
 }
 

Re: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-10-08 Thread Eric Botcazou
> OK, what do you think of it now?

My take on this is that the Proper Fix(tm) has been posted by Martin:
  http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00082.html
IMO it's a no-brainer, modulo the ABI concern.  Everything else is more or 
less clever stuff to paper over this original compute_record_mode bug.

We are lying to the expander by pretending that the object has V2DImode, when it 
is actually larger and thus cannot be manipulated in this mode; everything is 
clearly downhill from there.  If we don't want to properly fix the bug then let's put 
a hack in the expander, possibly using EXPAND_MEMORY, but it should trigger 
_only_ for structures with zero-sized arrays and non-BLKmode and be preceded 
by a big ??? comment explaining why it is deemed necessary.
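For concreteness, here is a minimal sketch (my own reconstruction, not the
actual PR 57748 testcase) of the kind of type being discussed: the declared
size of the structure equals that of its vector member, so the record can end
up with V2DImode even though real objects extend past the vector into the
zero-sized trailing array.

```c
/* Hypothetical reconstruction of the problematic shape.  Assumes a
   GCC-style compiler (the vector_size attribute is a GNU extension).  */
typedef long long v2di __attribute__ ((vector_size (16)));

struct outer
{
  v2di v;      /* 16 bytes: the entire declared size of the struct */
  int tail[];  /* flexible tail: objects may extend beyond the vector */
};

/* The declared size covers only the vector member, which is what lets
   compute_record_mode pick a non-BLKmode for the whole record.  */
_Static_assert (sizeof (struct outer) == 16,
                "declared size equals the vector member's size");
```

Accesses to `tail` then touch memory that the V2DImode view of the object
cannot describe, which is the lie to the expander described above.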

-- 
Eric Botcazou


[gomp4] Another affinity-1.c testcase fix

2013-10-08 Thread Jakub Jelinek
Hi!

As reported by Vincenzo, the testcase didn't try to verify affinity behavior
if the value returned by _SC_NPROCESSORS_CONF was smaller than the kernel's
internal mask size (e.g. because of CPU hotplug support).  Fixed thusly:

2013-10-08  Jakub Jelinek  

* testsuite/libgomp.c/affinity-1.c (min_cpusetsize): New variable.
(pthread_getaffinity_np): Set it when setting contig_cpucount.
(print_affinity): Use it for size if set; otherwise use the sysconf
value, and if that is smaller than sizeof (cpu_set_t), use
sizeof (cpu_set_t).

--- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2013-10-07 14:09:52.0 
+0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c2013-10-08 09:10:26.439380189 
+0200
@@ -65,6 +65,7 @@ struct places
 };
 
 unsigned long contig_cpucount;
+unsigned long min_cpusetsize;
 
 #if defined (HAVE_PTHREAD_AFFINITY_NP) && defined (_SC_NPROCESSORS_CONF) \
 && defined (CPU_ALLOC_SIZE)
@@ -94,6 +95,7 @@ pthread_getaffinity_np (pthread_t thread
if (!CPU_ISSET_S (i, cpusetsize, cpuset))
  break;
   contig_cpucount = i;
+  min_cpusetsize = cpusetsize;
 }
   return ret;
 }
@@ -105,8 +107,15 @@ print_affinity (struct place p)
   static unsigned long size;
   if (size == 0)
 {
-  size = sysconf (_SC_NPROCESSORS_CONF);
-  size = CPU_ALLOC_SIZE (size);
+  if (min_cpusetsize)
+   size = min_cpusetsize;
+  else
+   {
+ size = sysconf (_SC_NPROCESSORS_CONF);
+ size = CPU_ALLOC_SIZE (size);
+ if (size < sizeof (cpu_set_t))
+   size = sizeof (cpu_set_t);
+   }
 }
   cpu_set_t *cpusetp = (cpu_set_t *) alloca (size);
   if (pthread_getaffinity_np (pthread_self (), size, cpusetp) == 0)

Jakub


[gomp4] Fix bootstrap

2013-10-08 Thread Jakub Jelinek
Hi!

Got a warning turned into an error during bootstrap that name might be used
uninitialized.  Fixed by not using it at all; there is no point in duplicating
the clause names when we have the omp_clause_code_name array.

2013-10-08  Jakub Jelinek  

* c-typeck.c (c_finish_omp_clauses): Remove name variable, use
omp_clause_code_name[OMP_CLAUSE_CODE (c)] instead.

* semantics.c (finish_omp_clauses): Remove name variable, use
omp_clause_code_name[OMP_CLAUSE_CODE (c)] instead.

--- gcc/c/c-typeck.c.jj 2013-09-25 09:51:45.0 +0200
+++ gcc/c/c-typeck.c2013-10-08 08:45:30.653081155 +0200
@@ -11233,7 +11233,6 @@ c_finish_omp_clauses (tree clauses)
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
   bitmap_head aligned_head;
   tree c, t, *pc = &clauses;
-  const char *name;
   bool branch_seen = false;
   bool copyprivate_seen = false;
   tree *nowait_clause = NULL;
@@ -11253,18 +11252,15 @@ c_finish_omp_clauses (tree clauses)
   switch (OMP_CLAUSE_CODE (c))
{
case OMP_CLAUSE_SHARED:
- name = "shared";
  need_implicitly_determined = true;
  goto check_dup_generic;
 
case OMP_CLAUSE_PRIVATE:
- name = "private";
  need_complete = true;
  need_implicitly_determined = true;
  goto check_dup_generic;
 
case OMP_CLAUSE_REDUCTION:
- name = "reduction";
  need_implicitly_determined = true;
  t = OMP_CLAUSE_DECL (c);
  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
@@ -11377,7 +11373,6 @@ c_finish_omp_clauses (tree clauses)
  goto check_dup_generic;
 
case OMP_CLAUSE_COPYPRIVATE:
- name = "copyprivate";
  copyprivate_seen = true;
  if (nowait_clause)
{
@@ -11390,7 +11385,6 @@ c_finish_omp_clauses (tree clauses)
  goto check_dup_generic;
 
case OMP_CLAUSE_COPYIN:
- name = "copyin";
  t = OMP_CLAUSE_DECL (c);
  if (TREE_CODE (t) != VAR_DECL || !DECL_THREAD_LOCAL_P (t))
{
@@ -11401,7 +11395,6 @@ c_finish_omp_clauses (tree clauses)
  goto check_dup_generic;
 
case OMP_CLAUSE_LINEAR:
- name = "linear";
  t = OMP_CLAUSE_DECL (c);
  if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
  && TREE_CODE (TREE_TYPE (t)) != POINTER_TYPE)
@@ -11429,7 +11422,8 @@ c_finish_omp_clauses (tree clauses)
  if (TREE_CODE (t) != VAR_DECL && TREE_CODE (t) != PARM_DECL)
{
  error_at (OMP_CLAUSE_LOCATION (c),
-   "%qE is not a variable in clause %qs", t, name);
+   "%qE is not a variable in clause %qs", t,
+   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  remove = true;
}
  else if (bitmap_bit_p (&generic_head, DECL_UID (t))
@@ -11445,7 +11439,6 @@ c_finish_omp_clauses (tree clauses)
  break;
 
case OMP_CLAUSE_FIRSTPRIVATE:
- name = "firstprivate";
  t = OMP_CLAUSE_DECL (c);
  need_complete = true;
  need_implicitly_determined = true;
@@ -11467,7 +11460,6 @@ c_finish_omp_clauses (tree clauses)
  break;
 
case OMP_CLAUSE_LASTPRIVATE:
- name = "lastprivate";
  t = OMP_CLAUSE_DECL (c);
  need_complete = true;
  need_implicitly_determined = true;
@@ -11694,7 +11686,8 @@ c_finish_omp_clauses (tree clauses)
{
  error_at (OMP_CLAUSE_LOCATION (c),
"%qE is predetermined %qs for %qs",
-   t, share_name, name);
+   t, share_name,
+   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  remove = true;
}
}
--- gcc/cp/semantics.c.jj   2013-10-07 23:29:44.0 +0200
+++ gcc/cp/semantics.c  2013-10-08 08:44:15.766475978 +0200
@@ -5085,7 +5085,6 @@ finish_omp_clauses (tree clauses)
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
   bitmap_head aligned_head;
   tree c, t, *pc = &clauses;
-  const char *name;
   bool branch_seen = false;
   bool copyprivate_seen = false;
 
@@ -5102,23 +5101,17 @@ finish_omp_clauses (tree clauses)
   switch (OMP_CLAUSE_CODE (c))
{
case OMP_CLAUSE_SHARED:
- name = "shared";
  goto check_dup_generic;
case OMP_CLAUSE_PRIVATE:
- name = "private";
  goto check_dup_generic;
case OMP_CLAUSE_REDUCTION:
- name = "reduction";
  goto check_dup_generic;
case OMP_CLAUSE_COPYPRIVATE:
- name = "copyprivate";
  copyprivate_seen = true;
  goto check_dup_generic;
case OMP_CLAUSE_COPYIN:
- name = "copyin";
  goto check_dup_generic;
case OMP_CLAUSE_LINEAR:
- name = "linear";
  t = OMP_CLAUSE_DECL (c);
  if (!type_dependent_expression_p (t)
  && !IN

Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP

2013-10-08 Thread Kugan
Ping~

Thanks,
Kugan

+2013-09-25  Kugan Vivekanandarajah  
+
+   * dojump.c (do_compare_and_jump): Generate rtl without
+   zero/sign extension if redundant.
+   * cfgexpand.c (expand_gimple_stmt_1): Likewise.
+   * gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
+   function.
+   * gimple.h (gimple_assign_is_zero_sign_ext_redundant) : Declare.
+


On 26/09/13 18:04, Kugan Vivekanandarajah wrote:
> Hi,
> 
> This is the updated patch for expanding gimple stmts without zero/sign
> extensions when it is safe to do that. This is based on the
>  latest changes to propagating value range information to SSA_NAMEs
> and addresses review comments from Eric.
> 
> Bootstrapped and regtested on x86_64-unknown-linux-gnu and arm-none
> linux-gnueabi. Is this OK ?
> 
> Thanks,
> Kugan
> 

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 88e48c2..6a22f8b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2311,6 +2311,20 @@ expand_gimple_stmt_1 (gimple stmt)
 
 	if (temp == target)
 	  ;
+	/* If the value in the SUBREG of temp fits that SUBREG (does not
+	   overflow) and is assigned to the target SUBREG of the same mode
+	   without sign conversion, we can skip the SUBREG
+	   and extension.  */
+	else if (promoted
+		 && gimple_assign_is_zero_sign_ext_redundant (stmt)
+		 && (GET_CODE (temp) == SUBREG)
+		 && (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (temp)))
+			 >= GET_MODE_PRECISION (GET_MODE (target)))
+		 && (GET_MODE (SUBREG_REG (target))
+			 == GET_MODE (SUBREG_REG (temp))))
+	  {
+		emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+	  }
 	else if (promoted)
 	  {
 		int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
diff --git a/gcc/dojump.c b/gcc/dojump.c
index 3f04eac..9ea5995 100644
--- a/gcc/dojump.c
+++ b/gcc/dojump.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ggc.h"
 #include "basic-block.h"
 #include "tm_p.h"
+#include "gimple.h"
 
 static bool prefer_and_bit_test (enum machine_mode, int);
 static void do_jump_by_parts_greater (tree, tree, int, rtx, rtx, int);
@@ -1108,6 +1109,64 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
 
   type = TREE_TYPE (treeop0);
   mode = TYPE_MODE (type);
+
+  /* Is the zero/sign extension redundant?  */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+op0_ext_redundant =
+  gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+ it is a candidate for extension elimination.  */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+op1_ext_redundant =
+  gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));
+
+  /* If zero/sign extension is redundant, generate RTL
+ for operands without zero/sign extension.  */
+  if ((op0_ext_redundant || TREE_CODE (treeop0) == INTEGER_CST)
+  && (op1_ext_redundant || TREE_CODE (treeop1) == INTEGER_CST))
+{
+  if ((TREE_CODE (treeop1) == INTEGER_CST)
+	  && (!mode_signbit_p (GET_MODE (op1), op1)))
+	{
+	  /* The second operand is constant and its signbit is not set (not
+	 represented in RTL as a negative constant).  */
+	  rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+	  emit_move_insn (new_op0, SUBREG_REG (op0));
+	  op0 = new_op0;
+	}
+  else if ((TREE_CODE (treeop0) == INTEGER_CST)
+	   && (!mode_signbit_p (GET_MODE (op0), op0)))
+	{
+	  /* The first operand is constant and its signbit is not set (not
+	 represented in RTL as a negative constant).  */
+	  rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+	  emit_move_insn (new_op1, SUBREG_REG (op1));
+	  op1 = new_op1;
+	}
+  else if ((TREE_CODE (treeop0) != INTEGER_CST)
+	   && (TREE_CODE (treeop1) != INTEGER_CST)
+	   && (GET_MODE (op0) == GET_MODE (op1))
+	   && (GET_MODE (SUBREG_REG (op0)) == GET_MODE (SUBREG_REG (op1))))
+	{
+	  /* Both compare operands fit their SUBREGs and are of the
+	 same mode.  */
+	  rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+	  rtx new_op1 = gen_reg_rtx (GET_MODE (SUBREG_REG (op1)));
+
+	  emit_move_insn (new_op0, SUBREG_REG (op0));
+	  emit_move_insn (new_op1, SUBREG_REG (op1));
+	  op0 = new_op0;
+	  op1 = new_op1;
+	}
+}
+
   if (TREE_CODE (treeop0) == INTEGER_CST
   && (TREE_CODE (treeop1) != INTEGER_CST
   || (GET_MODE_BITSIZE (mode)
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 59fcf43..7bb93a6 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -200,6 +200,102 @@ gimple_call_reset_alias_info (gimple s)
 pt_solution_reset (gimple_call_clobber_set (s));
 }
 
+/* Check gimple assign stmt and see if zero/sign extension is
+   redundant.  i.e.  if an assignment 

Cleanup patches

2013-10-08 Thread Thomas Schwinge
Hi!

Here are a few cleanup patches, mostly in the realm of OpenMP, so Jakub
gets a CC.  OK to commit?


libgomp/
* omp.h.in: Don't touch the user's namespace.

---
 libgomp/omp.h.in | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git libgomp/omp.h.in libgomp/omp.h.in
index 5db4407..ce68163 100644
--- libgomp/omp.h.in
+++ libgomp/omp.h.in
@@ -22,8 +22,8 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
-#ifndef OMP_H
-#define OMP_H 1
+#ifndef _OMP_H
+#define _OMP_H 1
 
 #ifndef _LIBGOMP_OMP_LOCK_DEFINED
 #define _LIBGOMP_OMP_LOCK_DEFINED 1
@@ -104,4 +104,4 @@ int omp_in_final (void) __GOMP_NOTHROW;
 }
 #endif
 
-#endif /* OMP_H */
+#endif /* _OMP_H */
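The hazard the rename avoids can be shown with a self-contained sketch (the
`HEADER_BODY_SEEN` marker below is invented for the demonstration): a guard
macro that lives in the user's namespace, like `OMP_H`, may legitimately be
defined by user code first, which then silently suppresses the header body.
Reserved names such as `_OMP_H` belong to the implementation and cannot
collide this way.

```c
/* User code legitimately defines a macro named OMP_H for its own
   purposes -- unlike _OMP_H, the name is not reserved for the
   implementation.  */
#define OMP_H 1

/* Simulated header body, guarded the old way omp.h.in was: */
#ifndef OMP_H
#define OMP_H 1
#define HEADER_BODY_SEEN 1
#endif

/* Because the user's macro defeated the guard, the header body
   (normally the omp_* declarations) was skipped entirely.  */
#ifdef HEADER_BODY_SEEN
const int header_body_included = 1;
#else
const int header_body_included = 0;
#endif
```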


gcc/
* doc/gimple.texi (is_gimple_omp): Move into the correct section.

---
 gcc/doc/gimple.texi | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git gcc/doc/gimple.texi gcc/doc/gimple.texi
index 896aea3..7bd9fd5 100644
--- gcc/doc/gimple.texi
+++ gcc/doc/gimple.texi
@@ -711,6 +711,10 @@ Return true if g is a @code{GIMPLE_DEBUG} that binds the 
value of an
 expression to a variable.
 @end deftypefn
 
+@deftypefn {GIMPLE function} bool is_gimple_omp (gimple g)
+Return true if g is any of the OpenMP codes.
+@end deftypefn
+
 @node Manipulating GIMPLE statements
 @section Manipulating GIMPLE statements
 @cindex Manipulating GIMPLE statements
@@ -1846,11 +1850,6 @@ Return a pointer to the data argument for 
@code{OMP_PARALLEL} @code{G}.
 Set @code{DATA_ARG} to be the data argument for @code{OMP_PARALLEL} @code{G}.
 @end deftypefn
 
-@deftypefn {GIMPLE function} bool is_gimple_omp (gimple stmt)
-Returns true when the gimple statement @code{STMT} is any of the OpenMP
-types.
-@end deftypefn
-
 
 @node @code{GIMPLE_OMP_RETURN}
 @subsection @code{GIMPLE_OMP_RETURN}


gcc/
* doc/generic.texi (Adding new DECL node types): Explain *_CHECK
macros.

---
 gcc/doc/generic.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git gcc/doc/generic.texi gcc/doc/generic.texi
index cacab01..07e3f5a 100644
--- gcc/doc/generic.texi
+++ gcc/doc/generic.texi
@@ -924,6 +924,11 @@ structures, something like the following should be used
(BASE_STRUCT_CHECK(NODE)->base_struct.fieldname
 @end smallexample
 
+During GCC's build, @file{gencheck.c} reads the tree codes from the
+generated @file{all-tree.def} file (which in turn includes all the
+@file{tree.def} files) and generates the @code{*_CHECK} macros for
+all tree codes.
+
 @end table
 
 


The following two patches change the documentation/sources to
consistently talk about subcodes.

gcc/
* doc/generic.texi (OpenMP): OMP_CLAUSE_* are subcodes, not
sub-codes.

---
 gcc/doc/generic.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git gcc/doc/generic.texi gcc/doc/generic.texi
index 07e3f5a..ccecd6e 100644
--- gcc/doc/generic.texi
+++ gcc/doc/generic.texi
@@ -2204,7 +2204,7 @@ regular critical section around the expression is used.
 @item OMP_CLAUSE
 
 Represents clauses associated with one of the @code{OMP_} directives.
-Clauses are represented by separate sub-codes defined in
+Clauses are represented by separate subcodes defined in
 @file{tree.h}.  Clauses codes can be one of:
 @code{OMP_CLAUSE_PRIVATE}, @code{OMP_CLAUSE_SHARED},
 @code{OMP_CLAUSE_FIRSTPRIVATE},

gcc/
* gimple.c: GIMPLE statements have subcodes, not sub-codes.
* gimple.h: Likewise.

---
 gcc/gimple.c | 4 ++--
 gcc/gimple.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git gcc/gimple.c gcc/gimple.c
index dbcfa3a..4776a7d 100644
--- gcc/gimple.c
+++ gcc/gimple.c
@@ -157,7 +157,7 @@ gimple_set_subcode (gimple g, unsigned subcode)
 
 
 /* Build a tuple with operands.  CODE is the statement to build (which
-   must be one of the GIMPLE_WITH_OPS tuples).  SUBCODE is the sub-code
+   must be one of the GIMPLE_WITH_OPS tuples).  SUBCODE is the subcode
for the new tuple.  NUM_OPS is the number of operands to allocate.  */
 
 #define gimple_build_with_ops(c, s, n) \
@@ -429,7 +429,7 @@ gimple_build_assign_stat (tree lhs, tree rhs MEM_STAT_DECL)
 }
 
 
-/* Build a GIMPLE_ASSIGN statement with sub-code SUBCODE and operands
+/* Build a GIMPLE_ASSIGN statement with subcode SUBCODE and operands
OP1 and OP2.  If OP2 is NULL then SUBCODE must be of class
GIMPLE_UNARY_RHS or GIMPLE_SINGLE_RHS.  */
 
diff --git gcc/gimple.h gcc/gimple.h
index e7021a4..636b9e2 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -81,7 +81,7 @@ enum gimple_rhs_class
 
 /* Specific flags for individual GIMPLE statements.  These flags are
always stored in gimple_statement_base.subcode and they may only be
-   defined for statement codes that do not use sub-codes.
+   defined for statement codes that do not use subcodes.
 
Values for the masks can overlap as long as the overlapping values
are never used in the same statement class.


And then, I noticed that the gcc.dg/gomp testsu

Re: RFC patch for #pragma ivdep

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 08:51:50AM +0200, Tobias Burnus wrote:
> --- a/gcc/cfgloop.c
> +++ b/gcc/cfgloop.c
> @@ -507,6 +507,39 @@ flow_loops_find (struct loops *loops)
> loop->latch = latch;
>   }
>   }
> +  /* Search for ANNOTATE call with annot_expr_ivdep_kind; if found, 
> remove
> +  it and set loop->safelen to INT_MAX.  */
> +  if (loop->latch && loop->latch->next_bb != EXIT_BLOCK_PTR
> +  && bb_seq_addr (loop->latch->next_bb))

Why this bb_seq_addr guard?

> + {
> +   gimple_stmt_iterator gsi;
> +   for (gsi = gsi_start_bb (loop->latch->next_bb);
> + gsi.bb && gsi.seq && !gsi_end_p (gsi);
> + gsi_next (&gsi))
> +  {
> +gimple stmt = gsi_stmt (gsi);
> + if (gimple_code (stmt) == GIMPLE_COND)

GIMPLE_COND must be the last stmt in a bb.  So, instead of the walk just
do
  gimple stmt = last_stmt (loop->latch->next_bb);
  if (stmt && gimple_code (stmt) == GIMPLE_COND)

Also, not sure if you really want loop->latch->next_bb rather than
looking through the succs of loop->latch or similar; next_bb is really just the
chaining of bb's together in some order, and doesn't imply that there is an edge
between the previous and next bb, or what the edge kind is.
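For concreteness, the succ-based variant suggested above might look roughly
like this sketch (GCC-internal API, shown for illustration only; it is not
compilable on its own):

```c
/* Sketch: walk the successor edges of the latch instead of relying on
   next_bb, which only reflects block chaining order, not CFG edges.  */
edge e;
edge_iterator ei;
FOR_EACH_EDGE (e, ei, loop->latch->succs)
  {
    gimple stmt = last_stmt (e->dest);
    if (stmt && gimple_code (stmt) == GIMPLE_COND)
      {
        /* Look for the annot_expr_ivdep_kind ANNOTATE call feeding
           this condition; if found, remove it and set
           loop->safelen to INT_MAX.  */
      }
  }
```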

Jakub