Go patch committed: Fix a few compiler crashes

2012-04-27 Thread Ian Lance Taylor
This patch to the Go frontend fixes a few cases where the compiler was
crashing on invalid code.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian

diff -r cc46c1bb0278 go/expressions.cc
--- a/go/expressions.cc	Fri Apr 27 17:28:13 2012 -0700
+++ b/go/expressions.cc	Fri Apr 27 21:51:48 2012 -0700
@@ -9225,7 +9225,7 @@
   ref->set_is_lvalue();
   tree temp_tree = ref->get_tree(context);
   if (temp_tree == error_mark_node)
-	continue;
+	return error_mark_node;
 
   tree val_tree = build3_loc(loc.gcc_location(), COMPONENT_REF,
  TREE_TYPE(field), call_tree, field, NULL_TREE);
diff -r cc46c1bb0278 go/types.cc
--- a/go/types.cc	Fri Apr 27 17:28:13 2012 -0700
+++ b/go/types.cc	Fri Apr 27 21:51:48 2012 -0700
@@ -5450,6 +5450,11 @@
   mpz_t val;
   if (this->length_->numeric_constant_value(&nc) && nc.to_int(&val))
 	{
+	  if (mpz_sgn(val) < 0)
+	{
+	  this->length_tree_ = error_mark_node;
+	  return this->length_tree_;
+	}
 	  Type* t = nc.type();
 	  if (t == NULL)
 	t = Type::lookup_integer_type("int");
@@ -6551,7 +6556,11 @@
 Interface_type::is_identical(const Interface_type* t,
 			 bool errors_are_identical) const
 {
-  go_assert(this->methods_are_finalized_ && t->methods_are_finalized_);
+  // If methods have not been finalized, then we are asking whether
+  // func redeclarations are the same.  This is an error, so for
+  // simplicity we say they are never the same.
+  if (!this->methods_are_finalized_ || !t->methods_are_finalized_)
+return false;
 
   // We require the same methods with the same types.  The methods
   // have already been sorted.


Re: [RFH / Patch] PR 51222

2012-04-27 Thread Jason Merrill

On 04/27/2012 09:42 PM, Paolo Carlini wrote:

In particular, what is the exact meaning of the FIXME? More generally, is
the issue here a matter of compile-time optimization? That is, do we want to
save work when we actually *know* the type even in template context? (Then it
looks like type_dependent_expression_p isn't really what we want...)


It's more of a mangling issue.  If the type is instantiation-dependent, 
we should mangle it as written; if not, we mangle the real type. 
Currently we don't test for instantiation-dependence, so we get this 
wrong; your change would go too far in the other direction.


Jason


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-27 Thread Maxim Kuvyrkov
On 27/04/2012, at 1:17 AM, Richard Guenther wrote:

> On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov  
> wrote:
> 
...
> +ppre_n_insert_for_speed_p (pre_expr expr, basic_block block,
> +  unsigned int inserts_needed)
> +{
> +  /* The more expensive EXPR is, the more we should be prepared to insert
> + in the predecessors of BLOCK to make EXPR fully redundant.
> + For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use
> + SSA_NAME with a constant as cheap.  */
> 
> the function implementation is odd.  cost is always 1 when used, and
> both memory loads and calls are always cheap, but for example casts are not?
> Isn't
> 
> return EDGE_COUNT (block->preds) * cost >= inserts_needed;
> 
> always true?  Or is inserts_needed not what it suggests?

It /almost/ is what it suggests.  To account for the fact that inserts will not 
benefit some of the paths (50% in the initial version of the patch, and a more 
precise estimate in a later version), we tell ppre_n_insert_for_speed_p that we 
need a greater number of inserts (e.g., 2x the real number) than we will 
actually perform.

> 
> + /* Insert only if we can remove a later expression on a path
> +that we want to optimize for speed.
> +The phi node that we will be inserting in BLOCK is not free,
> +and inserting it for the sake of !optimize_for_speed 
> successor
> +may cause regressions on the speed path.  */
> + FOR_EACH_EDGE (succ, ei, block->succs)
> +   {
> + if (bitmap_set_contains_value (PA_IN (succ->dest), val))
> +   {
> + if (optimize_edge_for_speed_p (succ))
> +   do_insertion = true;
> +
> + ++pa_succs;
> +   }
> +   }
> 
> the change up to here looks good to me - can you factor out the command-line
> switch plus this optimize_edge_for_speed_p test into a separate patch
> (hereby approved)?

Attached is what I checked in after retesting on a fresh mainline.  This part 
of the patch by itself turns out to fix the bitmnp01 regression in what seems 
to be a reliable way.  Therefore, I'll mark PR38785 as fixed.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




pr38785-2.ChangeLog
Description: Binary data


pr38785-2.patch
Description: Binary data


[RFH / Patch] PR 51222

2012-04-27 Thread Paolo Carlini

Hi,

I'm having a look at this PR filed by Daniel, which is again about 
SFINAE for combined delete / new expressions.  A simple example could 
be (Daniel provided dozens):


template
auto g(int) -> char;

template
auto g(...) -> char(&)[2];

static_assert(sizeof(g(0)) == 2, "Ouch");

We handle all such cases incorrectly; all the static_asserts for the 
testcases provided by Daniel fire, for a simple reason: in 
finish_decltype_type the check:


   /* FIXME instantiation-dependent  */
  if (type_dependent_expression_p (expr)
  /* In a template, a COMPONENT_REF has an IDENTIFIER_NODE for op1 even
if it isn't dependent, so that we can check access control at
instantiation time, so defer the decltype as well (PR 42277).  */
  || (id_expression_or_member_access_p
&& processing_template_decl
&& TREE_CODE (expr) == COMPONENT_REF))
 {

is false; thus we don't produce a DECLTYPE_TYPE and we don't tsubst into 
it, so nothing can possibly work.  Now, I did the trivial experiment of 
replacing the above with the usual:


  if (processing_template_decl)
{

and *the whole* testsuite passes (all the tests provided by Daniel 
included), with the only exception of nullptr26.C, i.e.:


template 
void f(T, U);

template 
void f(T, decltype(nullptr));

int main()
{
  f(1, nullptr);
}

which we reject as ambiguous (of course it would be easy to handle it 
specially; after all, std::nullptr_t *is* special!).


Thus, I suspect the fix for this issue could turn out to be pretty 
simple: any help?


In particular, what is the exact meaning of the FIXME?  More generally, is 
the issue here a matter of compile-time optimization?  That is, do we want to 
save work when we actually *know* the type even in template context?  (Then it 
looks like type_dependent_expression_p isn't really what we want...)


Thanks!
Paolo.


Go patch committed: Use less memory for some array/slice literals

2012-04-27 Thread Ian Lance Taylor
This patch to the Go frontend changes the representation of array/slice
literals to use less memory when the literal uses index keys.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.7 branch.

Ian

diff -r ebdbe2ad3ef6 go/expressions.cc
--- a/go/expressions.cc	Fri Apr 27 09:34:39 2012 -0700
+++ b/go/expressions.cc	Fri Apr 27 17:24:14 2012 -0700
@@ -6,6 +6,8 @@
 
 #include "go-system.h"
 
+#include 
+
 #include 
 
 #ifndef ENABLE_BUILD_WITH_CXX
@@ -11392,11 +11394,12 @@
 {
  protected:
   Array_construction_expression(Expression_classification classification,
-Type* type, Expression_list* vals,
-Location location)
+Type* type,
+const std::vector* indexes,
+Expression_list* vals, Location location)
 : Expression(classification, location),
-  type_(type), vals_(vals)
-  { }
+  type_(type), indexes_(indexes), vals_(vals)
+  { go_assert(indexes == NULL || indexes->size() == vals->size()); }
 
  public:
   // Return whether this is a constant initializer.
@@ -11425,6 +11428,11 @@
   void
   do_export(Export*) const;
 
+  // The indexes.
+  const std::vector*
+  indexes()
+  { return this->indexes_; }
+
   // The list of values.
   Expression_list*
   vals()
@@ -11440,7 +11448,10 @@
  private:
   // The type of the array to construct.
   Type* type_;
-  // The list of values.
+  // The list of indexes into the array, one for each value.  This may
+  // be NULL, in which case the indexes start at zero and increment.
+  const std::vector* indexes_;
+  // The list of values.  This may be NULL if there are no values.
   Expression_list* vals_;
 };
 
@@ -11523,18 +11534,6 @@
 	  this->set_is_error();
 	}
 }
-
-  Expression* length = at->length();
-  Numeric_constant nc;
-  unsigned long val;
-  if (length != NULL
-  && !length->is_error_expression()
-  && length->numeric_constant_value(&nc)
-  && nc.to_unsigned_long(&val) == Numeric_constant::NC_UL_VALID)
-{
-  if (this->vals_->size() > val)
-	this->report_error(_("too many elements in composite literal"));
-}
 }
 
 // Get a constructor tree for the array values.
@@ -11552,12 +11551,22 @@
   if (this->vals_ != NULL)
 {
   size_t i = 0;
+  std::vector::const_iterator pi;
+  if (this->indexes_ != NULL)
+	pi = this->indexes_->begin();
   for (Expression_list::const_iterator pv = this->vals_->begin();
 	   pv != this->vals_->end();
 	   ++pv, ++i)
 	{
+	  if (this->indexes_ != NULL)
+	go_assert(pi != this->indexes_->end());
 	  constructor_elt* elt = VEC_quick_push(constructor_elt, values, NULL);
-	  elt->index = size_int(i);
+
+	  if (this->indexes_ == NULL)
+	elt->index = size_int(i);
+	  else
+	elt->index = size_int(*pi);
+
 	  if (*pv == NULL)
 	{
 	  Gogo* gogo = context->gogo();
@@ -11578,7 +11587,11 @@
 	return error_mark_node;
 	  if (!TREE_CONSTANT(elt->value))
 	is_constant = false;
-	}
+	  if (this->indexes_ != NULL)
+	++pi;
+	}
+  if (this->indexes_ != NULL)
+	go_assert(pi == this->indexes_->end());
 }
 
   tree ret = build_constructor(type_tree, values);
@@ -11596,13 +11609,28 @@
   exp->write_type(this->type_);
   if (this->vals_ != NULL)
 {
+  std::vector::const_iterator pi;
+  if (this->indexes_ != NULL)
+	pi = this->indexes_->begin();
   for (Expression_list::const_iterator pv = this->vals_->begin();
 	   pv != this->vals_->end();
 	   ++pv)
 	{
 	  exp->write_c_string(", ");
+
+	  if (this->indexes_ != NULL)
+	{
+	  char buf[100];
+	  snprintf(buf, sizeof buf, "%lu", *pi);
+	  exp->write_c_string(buf);
+	  exp->write_c_string(":");
+	}
+
 	  if (*pv != NULL)
 	(*pv)->export_expression(exp);
+
+	  if (this->indexes_ != NULL)
+	++pi;
 	}
 }
   exp->write_c_string(")");
@@ -11614,8 +11642,7 @@
 Array_construction_expression::do_dump_expression(
 Ast_dump_context* ast_dump_context) const
 {
-  Expression* length = this->type_->array_type() != NULL ?
-			 this->type_->array_type()->length() : NULL;
+  Expression* length = this->type_->array_type()->length();
 
   ast_dump_context->ostream() << "[" ;
   if (length != NULL)
@@ -11625,7 +11652,22 @@
   ast_dump_context->ostream() << "]" ;
   ast_dump_context->dump_type(this->type_);
   ast_dump_context->ostream() << "{" ;
-  ast_dump_context->dump_expression_list(this->vals_);
+  if (this->indexes_ == NULL)
+ast_dump_context->dump_expression_list(this->vals_);
+  else
+{
+  Expression_list::const_iterator pv = this->vals_->begin();
+  for (std::vector::const_iterator pi =
+	 this->indexes_->begin();
+	   pi != this->indexes_->end();
+	   ++pi, ++pv)
+	{
+	  if (pi != this->indexes_->begin())
+	ast_dump_context->ostream() << ", ";
+	  ast_dump_context->ostream() << *pi << ':';
+	  ast_dump_context->dump_expression(*pv);
+	}
+}
   ast_dump_context->ostream() << "}" ;
 
 }
@@ -11636,20 +11678,19 @@
   public Array_construction_expression
 {
  public:
-  Fixed_array_constru

Re: [PATCH, diagnostics] Add -Wvarargs option

2012-04-27 Thread Dodji Seketeli
Dodji Seketeli  writes:


> Tested on x86_64-unknown-linux-gnu against trunk.  Bootstrap for all
> languages is still underway.
>
> gcc/c-family/
>
>   * c.opt (Wvarargs):  Define new option.
>
> gcc/
>   * builtins.c (fold_builtin_next_arg):  Use OPT_Wvarargs as an
>   argument for the various warning_at calls.
>
> gcc/doc/
>
>   * invoke.texi: Update the documentation.
>
> gcc/testsuite/
>
>   * c-c++-common/Wvarargs.c: New test case.
>   * c-c++-common/Wvarargs-2.c: Likewise.

FWIW, this completed bootstrap and testing fine.

-- 
Dodji


Re: [PATCH 11/13] Fix va_start related location

2012-04-27 Thread Dodji Seketeli
Dodji Seketeli  writes:


> Tested on x86_64-unknown-linux-gnu against trunk.  Bootstrap is still
> running ...
>
>   * builtins.c (fold_builtin_next_arg): Unwinds to the first
>   location in real source code.
> ---
>  gcc/builtins.c |   23 +++
>  1 files changed, 19 insertions(+), 4 deletions(-)

FWIW, this completed bootstrap and testing fine.

-- 
Dodji


[ARM] Add atomic_loaddi pattern

2012-04-27 Thread Richard Henderson
We can perform a single-copy atomic load with an ldrexd insn.
If the load is all we care about, we need not pair this with
a strexd.

Ok?


r~
* config/arm/arm.md (UNSPEC_LL): New.
* config/arm/sync.md (atomic_loaddi, atomic_loaddi_1): New.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 79eff0e..75d2565 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -117,6 +117,7 @@
; that.
   UNSPEC_UNALIGNED_STORE ; Same for str/strh.
   UNSPEC_PIC_UNIFIED; Create a common pic addressing form.
+  UNSPEC_LL; Represent an unpaired load-register-exclusive.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 03838f5..de2da3b 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -65,6 +65,40 @@
(set_attr "conds" "unconditional")
(set_attr "predicable" "no")])
 
+;; Note that ldrd and vldr are *not* guaranteed to be single-copy atomic,
+;; even for a 64-bit aligned address.  Instead we use a ldrexd unpaired
+;; with a store.
+(define_expand "atomic_loaddi"
+  [(match_operand:DI 0 "s_register_operand")   ;; val out
+   (match_operand:DI 1 "mem_noofs_operand");; memory
+   (match_operand:SI 2 "const_int_operand")]   ;; model
+  "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN"
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+  expand_mem_thread_fence (model);
+  emit_insn (gen_atomic_loaddi_1 (operands[0], operands[1]));
+  if (model == MEMMODEL_SEQ_CST)
+expand_mem_thread_fence (model);
+  DONE;
+})
+
+(define_insn "atomic_loaddi_1"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+   (unspec:DI [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
+  UNSPEC_LL))]
+  "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN"
+  {
+rtx target = operands[0];
+/* The restrictions on target registers in ARM mode are that the two
+   registers are consecutive and the first one is even; Thumb is
+   actually more flexible, but DI should give us this anyway.
+   Note that the 1st register always gets the lowest word in memory.  */
+gcc_assert ((REGNO (target) & 1) == 0);
+operands[2] = gen_rtx_REG (SImode, REGNO (target) + 1);
+return "ldrexd%?\t%0, %2, %C1";
+  }
+  [(set_attr "predicable" "yes")])
+
 (define_expand "atomic_compare_and_swap"
   [(match_operand:SI 0 "s_register_operand" "");; bool out
(match_operand:QHSD 1 "s_register_operand" "")  ;; val out


Re: [PATCH] Proper use of decl_function_context in dwar2out.c

2012-04-27 Thread Jason Merrill

On 04/27/2012 07:56 AM, Martin Jambor wrote:

PR lto/53138
* dwarf2out.c (dwarf2out_decl): Only lookup die representing context
of a variable when the context is a function.


OK.

Jason



[committed] Fix section conflict compiling rtld.c

2012-04-27 Thread John David Anglin
The attached change fixes a problem compiling the linux dynamic loader
for the PA target.

Putting function labels (plabels) in the constant pool results
in a section flag conflict compiling rtld.c.  Other targets don't
do this, and testing indicates that it is not necessary.

Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and
hppa64-hp-hpux11.11 with no regressions.

Committed to trunk.

Dave
-- 
J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada  (613) 990-0752 (FAX: 952-6602)

2012-04-27  John David Anglin  

PR target/52999
* config/pa/pa.c (pa_legitimate_constant_p): Don't put function labels
in constant pool.

Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 186918)
+++ config/pa/pa.c  (working copy)
@@ -10332,9 +10332,6 @@
   && !pa_cint_ok_for_move (INTVAL (x)))
 return false;
 
-  if (function_label_operand (x, mode))
-return false;
-
   return true;
 }
 


Re: [PATCH] teach phi-opt to produce -(a COND b)

2012-04-27 Thread Paolo Bonzini
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53144

Looks like a PPRE bug.

Paolo


[PATCH] libatomic, v2

2012-04-27 Thread Richard Henderson
This tree is also available from

  git://repo.or.cz/gcc/rth.git rth/libatomic

Changes since v1:
  * I believe I've addressed all of Torvald's feedback especially
re barrier problems.  I have not changed the lock hash function,
as there were no concrete suggestions, and certainly that ought
not be an issue for correctness.

  * I've increased the set of objects that can be handled lock-free
in the generic routines.  If there is a supported aligned integer
that overlaps the object, we'll use compare-exchange.

  * "Tuning", aka enabling multiple paths to elide memory barriers,
for ia64 and ppc targets.

Tested on armv4, armv7, x86_64, ia64, sparc, and ppc64 linux.

I think the library is ready for merge back to mainline, but
I will wait until at least Monday for feedback and proximity
to my desk to deal with potential fallout.


r~


d-libatomic-2.gz
Description: GNU Zip compressed data


Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-04-27 Thread Mark Wielaard
On Fri, 2012-04-27 at 15:43 +0200, Jakub Jelinek wrote:
> On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote:
> > But even without this, I think the patch is worth it just to get rid of
> > all the relocations necessary otherwise.
> 
> IMHO we should defer applying this by a few months, given that GDB support
> is only being added these days and -gdwarf-4 is now the default.
> Applying it in August/September when there is already GDB version with
> the support would be better.

OK, I'll try to remember and ping as soon as a new GDB release is out
with the patch in.

> > +  attr.dw_attr = DW_AT_high_pc;
> > +  if (dwarf_version < 3)
> 
> Shouldn't this be < 4?  I think DWARF 3 doesn't have constant class
> DW_AT_high_pc.

Oops, yes, thanks for spotting.

Cheers,

Mark


Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-27 Thread Rong Xu
On Fri, Apr 27, 2012 at 5:06 AM, Jan Hubicka  wrote:
>> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou  
>> wrote:
>> >> We do not depend on the block structure any more when dealing with
>> >> stack layout for variables in GCC 4.7.0 and above.  I am not saying
>> >> your patch is incorrect or not needed.  Just it will not have an
>> >> effect on variable stack layout.
>> >
>> > It might be worth backporting to the 4.6 branch though, these stack layout
>> > issues are very nasty.
>>
>> What about not setting a BLOCK on the call stmt?  That said, I can't see
>
> I recall that not seetting the block did lead to ICE...
>
>> how outlining a SESE region that starts at the beginning of the function
>> is not conservatively using the outermost BLOCK ... so, is the bug not
>> elsewhere?
>
> ... so I made the exactly same conclussion.
> The problem is with stack vars allocated in header that survive till the
> tail part?

A SESE region does not necessarily start at the beginning of the function;
it can be anywhere.  In the example I attached, it is at the end of the
function.

Even if the outlined region is at the beginning of the function, setting
the call_stmt's block to the outermost block is also incorrect.
For C++ programs, the block for function-level local variables is not
DECL_INITIAL(current_function_decl); it is a subblock of
DECL_INITIAL(current_function_decl).
This is different from C programs.

-Rong

>
> Honza
>>
>> Richard.
>>
>> > --
>> > Eric Botcazou


Re: RFA: consolidate DWARF strings into libiberty

2012-04-27 Thread Tom Tromey
> "Tom" == Tom Tromey  writes:

HJ> You should add extern "C" for C++ on those functions moved to
HJ> libiberty.

Tom> Yeah, sorry about that.
Tom> I'm testing the fix.

Here is what I am checking in.

Tom

ChangeLog:
2012-04-27  Tom Tromey  

* dwarf2.h: Wrap function declarations in extern "C".

Index: dwarf2.h
===
--- dwarf2.h(revision 186908)
+++ dwarf2.h(working copy)
@@ -361,6 +361,10 @@
 #define DW_EH_PE_indirect  0x80
 
 
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
 /* Return the name of a DW_TAG_ constant, or NULL if the value is not
recognized.  */
 extern const char *get_DW_TAG_name (unsigned int tag);
@@ -385,4 +389,8 @@
recognized.  */
 extern const char *get_DW_CFA_name (unsigned int opc);
 
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
 #endif /* _DWARF2_H */


Re: [PATCH] teach phi-opt to produce -(a COND b)

2012-04-27 Thread H.J. Lu
On Fri, Apr 27, 2012 at 3:01 AM, Paolo Bonzini  wrote:
> This patch teaches phiopt to look at phis whose arguments are -1 and 0,
> and produce negated setcc statements.
>
> Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch
> for pr53138.  Ok for mainline?
>
> Paolo
>
> 2012-04-27  Paolo Bonzini  
>
>        * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs
>        whose arguments are -1 and 0, by negating the result of the
>        conditional.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53144

H.J.


libgo patch committed: Provide i386 long double math fns if needed

2012-04-27 Thread Ian Lance Taylor
This patch to libgo provides the long double math functions if they are
not in the system libm.  This is needed on i386 because the Go frontend
automatically converts calls to these functions from float64 (aka C
double) to C long double, as guided by express_precision_type.  This is
done so that the compiler can use the x87 instructions when available.
However, when not optimizing, the compiler will still emit a call to the
function.  This occurs in particular when running the libgo math test;
I'm not sure it actually happens at any other time.  This patch provides
a sufficient implementation of these functions for use by Go.  This
fixes another part of PR 52358.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu and i386-sun-solaris2.9.  Committed to mainline
and 4.7 branch.

Ian

diff -r a253ccfa3489 libgo/configure.ac
--- a/libgo/configure.ac	Fri Apr 27 09:30:57 2012 -0700
+++ b/libgo/configure.ac	Fri Apr 27 09:33:21 2012 -0700
@@ -489,6 +489,11 @@
 AC_TYPE_OFF_T
 AC_CHECK_TYPES([loff_t])
 
+LIBS_hold="$LIBS"
+LIBS="$LIBS -lm"
+AC_CHECK_FUNCS(cosl expl logl sinl tanl acosl asinl atanl atan2l expm1l ldexpl log10l log1pl)
+LIBS="$LIBS_hold"
+
 CFLAGS_hold="$CFLAGS"
 CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
 LIBS_hold="$LIBS"
diff -r a253ccfa3489 libgo/runtime/go-nosys.c
--- a/libgo/runtime/go-nosys.c	Fri Apr 27 09:30:57 2012 -0700
+++ b/libgo/runtime/go-nosys.c	Fri Apr 27 09:33:21 2012 -0700
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -239,3 +240,116 @@
   return -1;
 }
 #endif
+
+/* Long double math functions.  These are needed on old i386 systems
+   that don't have them in libm.  The compiler translates calls to
+   these functions on float64 to call an 80-bit floating point
+   function instead, because when optimizing that function can be
+   executed as an x87 instruction.  However, when not optimizing, this
+   translates into a call to the math function.  So on systems that
+   don't provide these functions, we provide a version that just calls
+   the float64 version.  */
+
+#ifndef HAVE_COSL
+long double
+cosl (long double a)
+{
+  return (long double) cos ((double) a);
+}
+#endif
+
+#ifndef HAVE_EXPL
+long double
+expl (long double a)
+{
+  return (long double) exp ((double) a);
+}
+#endif
+
+#ifndef HAVE_LOGL
+long double
+logl (long double a)
+{
+  return (long double) log ((double) a);
+}
+#endif
+
+#ifndef HAVE_SINL
+long double
+sinl (long double a)
+{
+  return (long double) sin ((double) a);
+}
+#endif
+
+#ifndef HAVE_TANL
+long double
+tanl (long double a)
+{
+  return (long double) tan ((double) a);
+}
+#endif
+
+#ifndef HAVE_ACOSL
+long double
+acosl (long double a)
+{
+  return (long double) acos ((double) a);
+}
+#endif
+
+#ifndef HAVE_ASINL
+long double
+asinl (long double a)
+{
+  return (long double) asin ((double) a);
+}
+#endif
+
+#ifndef HAVE_ATANL
+long double
+atanl (long double a)
+{
+  return (long double) atan ((double) a);
+}
+#endif
+
+#ifndef HAVE_ATAN2L
+long double
+atan2l (long double a, long double b)
+{
+  return (long double) atan2 ((double) a, (double) b);
+}
+#endif
+
+#ifndef HAVE_EXPM1L
+long double
+expm1l (long double a)
+{
+  return (long double) expm1 ((double) a);
+}
+#endif
+
+#ifndef HAVE_LDEXPL
+long double
+ldexpl (long double a, int exp)
+{
+  return (long double) ldexp ((double) a, exp);
+}
+#endif
+
+#ifndef HAVE_LOG10L
+long double
+log10l (long double a)
+{
+  return (long double) log10 ((double) a);
+}
+#endif
+
+#ifndef HAVE_LOG1PL
+long double
+log1pl (long double a)
+{
+  return (long double) log1p ((double) a);
+}
+#endif


libgo patch committed: Work around bug in Solaris 9 ldexp

2012-04-27 Thread Ian Lance Taylor
The Solaris 9 ldexp function has a bug: ldexp(-1, -1075) returns
positive zero when it should return negative zero.  This patch works
around this bug in the Go math package.  This fixes part of PR 52358.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and
i386-sun-solaris2.9.  Committed to mainline and 4.7 branch.

Ian

diff -r c1d1d764b7a6 libgo/go/math/ldexp.go
--- a/libgo/go/math/ldexp.go	Fri Apr 27 09:26:59 2012 -0700
+++ b/libgo/go/math/ldexp.go	Fri Apr 27 09:29:05 2012 -0700
@@ -16,7 +16,18 @@
 func libc_ldexp(float64, int) float64
 
 func Ldexp(frac float64, exp int) float64 {
-	return libc_ldexp(frac, exp)
+ 	r := libc_ldexp(frac, exp)
+
+	// Work around a bug in the implementation of ldexp on Solaris
+	// 9.  If multiplying a negative number by 2 raised to a
+	// negative exponent underflows, we want to return negative
+	// zero, but the Solaris 9 implementation returns positive
+	// zero.  This workaround can be removed when and if we no
+	// longer care about Solaris 9.
+	if r == 0 && frac < 0 && exp < 0 {
+		r = Copysign(0, frac)
+	}
+	return r
 }
 
 func ldexp(frac float64, exp int) float64 {


libgo patch committed: Correct syscall.Setenv if no setenv

2012-04-27 Thread Ian Lance Taylor
This patch to libgo corrects the implementation of syscall.Setenv for
systems that do not have the setenv library call, but only have putenv.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and
i386-sun-solaris2.9.  Committed to mainline and 4.7 branch.

Ian

diff -r 1ed95295e00b libgo/runtime/go-setenv.c
--- a/libgo/runtime/go-setenv.c	Wed Apr 25 21:25:31 2012 -0700
+++ b/libgo/runtime/go-setenv.c	Fri Apr 27 09:25:43 2012 -0700
@@ -50,7 +50,7 @@
 
 #else /* !defined(HAVE_SETENV) */
 
-  kn = malloc (k.__length + v.__length + 2);
+  kn = __go_alloc (k.__length + v.__length + 2);
   __builtin_memcpy (kn, ks, k.__length);
   kn[k.__length] = '=';
   __builtin_memcpy (kn + k.__length + 1, vs, v.__length);


Re: Fix find_moveable_pseudos, PR52997

2012-04-27 Thread Ulrich Weigand
Bernd Schmidt wrote:

> We're creating new pseudos, and while we're resizing some data 
> structures, we aren't doing it for everything.

> @@ -3983,7 +3983,8 @@ find_moveable_pseudos (void)
>  
>last_moveable_pseudo = max_reg_num ();
>  
> -  fix_reg_equiv_init();
> +  fix_reg_equiv_init ();
> +  resize_reg_info ();
>regstat_free_n_sets_and_refs ();
>regstat_free_ri ();
>regstat_init_n_sets_and_refs ();

This causes a bootstrap failure on s390 when enabling
-fsched-pressure -fsched-pressure-algorithm=model by default
(not sure why this doesn't show up without that change).

The problem is that resize_reg_info only resizes the data
structure, but leaves it uninitialized.  All other uses
of resize_reg_info subsequently fill in that data,
e.g. in ira-costs.c:find_costs_and_classes.

  if (!resize_reg_info () && allocno_p
  && pseudo_classes_defined_p && flag_expensive_optimizations)
{
[...]

But because the structure has now already been resized in
find_moveable_pseudos, the next call to resize_reg_info returns false, and so
find_costs_and_classes assumes the info is actually present.  It is
uninitialized, however, which causes a crash in the subsequent code:

  pref[ALLOCNO_NUM (a)] = reg_preferred_class (ALLOCNO_REGNO (a));
  setup_regno_cost_classes_by_aclass
(ALLOCNO_REGNO (a), pref[ALLOCNO_NUM (a)]);
  max_cost_classes_num
= MAX (max_cost_classes_num,
   regno_cost_classes[ALLOCNO_REGNO (a)]->num);

since reg_preferred_class returns 255 for the new pseudo.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: RFA: consolidate DWARF strings into libiberty

2012-04-27 Thread Tom Tromey
HJ> You should add extern "C" for C++ on those functions moved to
HJ> libiberty.

Yeah, sorry about that.
I'm testing the fix.

Tom


Re: RFA: consolidate DWARF strings into libiberty

2012-04-27 Thread H.J. Lu
On Fri, Apr 27, 2012 at 9:01 AM, H.J. Lu  wrote:
> On Fri, Apr 27, 2012 at 7:11 AM, Tom Tromey  wrote:
>>> "Jakub" == Jakub Jelinek  writes:
>>
>> Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote:

 I will not oppose adding more unrelated stuff to libiberty, but
 neither will I approve it.  I will let one of the other maintainers or
 a global maintainer approve it.
>>
>> Jakub> The original libiberty patch is ok for trunk then.
>>
>> Thanks, Jakub.  I am checking it in to the gcc tree.
>> I'll send the gdb and binutils follow-ups later.
>>
>> Tom
>
> This breaks GCC bootstrap:
>
> http://gcc.gnu.org/ml/gcc-regression/2012-04/msg00500.html
>

You should add extern "C" for C++ on those functions moved to
libiberty.


-- 
H.J.


Re: RFA: consolidate DWARF strings into libiberty

2012-04-27 Thread H.J. Lu
On Fri, Apr 27, 2012 at 7:11 AM, Tom Tromey  wrote:
>> "Jakub" == Jakub Jelinek  writes:
>
> Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote:
>>>
>>> I will not oppose adding more unrelated stuff to libiberty, but
>>> neither will I approve it.  I will let one of the other maintainers or
>>> a global maintainer approve it.
>
> Jakub> The original libiberty patch is ok for trunk then.
>
> Thanks, Jakub.  I am checking it in to the gcc tree.
> I'll send the gdb and binutils follow-ups later.
>
> Tom

This breaks GCC bootstrap:

http://gcc.gnu.org/ml/gcc-regression/2012-04/msg00500.html

-- 
H.J.


Re: [PATCH, i386, middle-end, testsuite] Intel TSX's HLE.

2012-04-27 Thread Richard Henderson
On 04/27/12 05:49, Kirill Yukhin wrote:
> +  if (targetm.memmodel_check)
> +val = targetm.memmodel_check (val);
> +  else if (val & ~MEMMODEL_MASK)
> +
> +{

Incorrect vertical whitespace.

> +  if ( (failure & MEMMODEL_MASK) == MEMMODEL_RELEASE
> +   || (failure & MEMMODEL_MASK) == MEMMODEL_ACQ_REL)

Incorrect horizontal whitespace.

>{"generic32", PROCESSOR_GENERIC32, CPU_PENTIUMPRO,
> - 0 /* flags are only used for -march switch.  */ },
> + PTA_HLE /* flags are only used for -march switch.  */ },
>{"generic64", PROCESSOR_GENERIC64, CPU_GENERIC64,
> - PTA_64BIT /* flags are only used for -march switch.  */ },
> + PTA_64BIT
> +| PTA_HLE /* flags are only used for -march switch.  */ },

Adding to the generic tunings is what I suggested.  Removing the HLE
bit from core-avx2, or whatever CPU actually has the bit, I did
not suggest.

Otherwise, this looks good.


r~



Re: [PATCH] Default to -gdwarf-4

2012-04-27 Thread H.J. Lu
On Wed, Apr 25, 2012 at 6:47 AM, Jakub Jelinek  wrote:
> Hi!
>
> For reasonable debugging experience recent GCC versions need
> GDB >= 7.0 for quite some time, and DWARF4 is almost 2 years old now,
> and offers lots of improvements over DWARF2 we still default to.
>
> So, I'd like to make -gdwarf-4 (just the version, of course -g
> is needed to enable debug info generation) the default, together
> with -fno-debug-types-section (as .debug_types isn't right now always a win,
> see the data in the dwz-0.1 announcement plus not all DWARF consumers can
> handle it yet or are gaining support only very recently (e.g. valgrind))
> and -grecord-gcc-switches which is IMHO worth the few extra bytes per CU
> (unless every CU is compiled with different code generation affecting
> options usually just 4 extra bytes).  In Fedora we default to this combo
> already for some time.  Targets that have tools that are upset by
> any extensions that defaulted to -gno-strict-dwarf previously will now
> default to -gdwarf-2 as before.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2012-04-25  Jakub Jelinek  
>
>        * common.opt (flag_debug_types_section): Default to 0.
>        (dwarf_version): Default to 4.
>        (dwarf_record_gcc_switches): Default to 1.
>        (dwarf_strict): Default to 0.
>        * toplev.c (process_options): Don't handle dwarf_strict
>        or dwarf_version here.
>        * config/vxworks.c (vxworks_override_options): Don't
>        test whether dwarf_strict or dwarf_version are negative,
>        instead test !global_options_set.x_dwarf_*.
>        * config/darwin.c (darwin_override_options): Default to
>        dwarf_version 2.
>        * doc/invoke.texi: Note that -gdwarf-4, -grecord-gcc-switches
>        and -fno-debug-types-section are now the default.
>

This caused:

FAIL: gcc.dg/pch/save-temps-1.c  -O0 -g assembly comparison
FAIL: gcc.dg/pch/save-temps-1.c   -O3 -g  assembly comparison

on Linux/x86-64 with --target_board='unix{-m32}'.

-- 
H.J.


Re: [PATCH] teach phi-opt to produce -(a COND b)

2012-04-27 Thread Andrew Pinski
On Fri, Apr 27, 2012 at 3:01 AM, Paolo Bonzini  wrote:
> This patch teaches phiopt to look at phis whose arguments are -1 and 0,
> and produce negated setcc statements.
>
> Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch
> for pr53138.  Ok for mainline?

I came up with this same patch independently.  I was going to submit
it but I never got around to it.
Thanks for submitting this.

-- Andrew Pinski

>
> Paolo
>
> 2012-04-27  Paolo Bonzini  
>
>        * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs
>        whose arguments are -1 and 0, by negating the result of the
>        conditional.
>
> 2012-04-27  Paolo Bonzini  
>
>        * gcc.c-torture/execute/20120427-2.c: New testcase.
>        * gcc.dg/tree-ssa/phi-opt-10.c: New testcase.
>
>
> Index: tree-ssa-phiopt.c
> ===
> --- tree-ssa-phiopt.c   (revision 186859)
> +++ tree-ssa-phiopt.c   (working copy)
> @@ -536,17 +536,21 @@
>   gimple_stmt_iterator gsi;
>   edge true_edge, false_edge;
>   tree new_var, new_var2;
> +  bool neg;
>
>   /* FIXME: Gimplification of complex type is too hard for now.  */
>   if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE
>       || TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE)
>     return false;
>
> -  /* The PHI arguments have the constants 0 and 1, then convert
> -     it to the conditional.  */
> +  /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
> +     convert it to the conditional.  */
>   if ((integer_zerop (arg0) && integer_onep (arg1))
>       || (integer_zerop (arg1) && integer_onep (arg0)))
> -    ;
> +    neg = false;
> +  else if ((integer_zerop (arg0) && integer_all_onesp (arg1))
> +          || (integer_zerop (arg1) && integer_all_onesp (arg0)))
> +    neg = true;
>   else
>     return false;
>
> @@ -558,7 +562,7 @@
>      falls through into BB.
>
>      There is a single PHI node at the join point (BB) and its arguments
> -     are constants (0, 1).
> +     are constants (0, 1) or (0, -1).
>
>      So, given the condition COND, and the two PHI arguments, we can
>      rewrite this PHI into non-branching code:
> @@ -585,12 +589,20 @@
>      edge so that we know when to invert the condition below.  */
>   extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
>   if ((e0 == true_edge && integer_zerop (arg0))
> -      || (e0 == false_edge && integer_onep (arg0))
> +      || (e0 == false_edge && !integer_zerop (arg0))
>       || (e1 == true_edge && integer_zerop (arg1))
> -      || (e1 == false_edge && integer_onep (arg1)))
> +      || (e1 == false_edge && !integer_zerop (arg1)))
>     cond = fold_build1_loc (gimple_location (stmt),
> -                           TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
> +                            TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
>
> +  if (neg)
> +    {
> +      cond = fold_convert_loc (gimple_location (stmt),
> +                               TREE_TYPE (result), cond);
> +      cond = fold_build1_loc (gimple_location (stmt),
> +                              NEGATE_EXPR, TREE_TYPE (cond), cond);
> +    }
> +
>   /* Insert our new statements at the end of conditional block before the
>      COND_STMT.  */
>   gsi = gsi_for_stmt (stmt);
> Index: testsuite/gcc.c-torture/execute/20120427-2.c
> ===
> --- testsuite/gcc.c-torture/execute/20120427-2.c        (revision 0)
> +++ testsuite/gcc.c-torture/execute/20120427-2.c        (revision 0)
> @@ -0,0 +1,38 @@
> +typedef struct sreal
> +{
> +  unsigned sig;                /* Significant.  */
> +  int exp;             /* Exponent.  */
> +} sreal;
> +
> +sreal_compare (sreal *a, sreal *b)
> +{
> +  if (a->exp > b->exp)
> +    return 1;
> +  if (a->exp < b->exp)
> +    return -1;
> +  if (a->sig > b->sig)
> +    return 1;
> +  if (a->sig < b->sig)
> +    return -1;
> +  return 0;
> +}
> +
> +sreal a[] = {
> +   { 0, 0 },
> +   { 1, 0 },
> +   { 0, 1 },
> +   { 1, 1 }
> +};
> +
> +int main()
> +{
> +  int i, j;
> +  for (i = 0; i <= 3; i++) {
> +    for (j = 0; j < 3; j++) {
> +      if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort();
> +      if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort();
> +      if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort();
> +    }
> +  }
> +  return 0;
> +}
> Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c
> =
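The PHI replacement introduced by the patch above turns a branch selecting between 0 and -1 into a negated comparison. In C terms (function names are illustrative only):

```c
/* Before the optimization: a conditional selecting -1 or 0.  */
static int select_branchy (int a, int b)
{
  return a < b ? -1 : 0;
}

/* After: the branch-free form phiopt now produces, -(a COND b).
   The comparison yields 0 or 1; negation maps 1 to -1.  */
static int select_negated (int a, int b)
{
  return -(a < b);
}
```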

Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread Sriraman Tallam
On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu  wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam  wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu  wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam  
>>> wrote:
 On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu  wrote:
> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  
> wrote:
>> Hi,
>>
>>   I have made the following changes in this new patch which is attached:
>>
>> * Use target attribute itself to create function versions.
>> * Handle any number of ISA names and arch=  args to target attribute,
>> generating the right dispatchers.
>> * Integrate with the CPU runtime detection checked in this week.
>> * Overload resolution: If the caller's target matches any of the
>> version function's target, then a direct call to the version is
>> generated, no need to go through the dispatching.
>>
>> Patch also available for review here:
>> http://codereview.appspot.com/5752064
>>
>
> Does it work with
>
> int foo ();
> int foo () __attribute__ ((targetv("arch=corei7")));
>
> int (*foo_p) () = foo?

 Yes, this will work. foo_p will be the address of the dispatcher
 function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((target ("arch=corei7")))
>> foo ()
>> {
>>  return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>  return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> a have a particular case in mind.
>>
>
> That is what I meant.  But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.

Ok, i will update the patch.

Thanks,
-Sri.

>
>
> Thanks.
>
>
> --
> H.J.


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread H.J. Lu
On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam  wrote:
> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu  wrote:
>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam  wrote:
>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu  wrote:
 On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  
 wrote:
> Hi,
>
>   I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch=  args to target attribute,
> generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
> version function's target, then a direct call to the version is
> generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064
>

 Does it work with

 int foo ();
 int foo () __attribute__ ((targetv("arch=corei7")));

 int (*foo_p) () = foo?
>>>
>>> Yes, this will work. foo_p will be the address of the dispatcher
>>> function and hence doing (*foo_p)() will call the right version.
>>
>> Even when foo_p is a global variable and compiled with -fPIC?
>
> I am not sure I understand what the complication is here, but FWIW, I
> tried this example and it works
>
> int foo ()
> {
>  return 0;
> }
>
> int __attribute__ ((target ("arch=corei7")))
> foo ()
> {
>  return 1;
> }
>
> int (*foo_p)() = foo;
> int main ()
> {
>  return (*foo_p)();
> }
>
> g++ -fPIC -O2 example.cc
>
>
> Did you have something else in mind? Could you please elaborate if you
> a have a particular case in mind.
>

That is what I meant.  But I didn't see it in your testcase.
Can you add it to your testcase?

Also you should verify the correct function is called in
your testcase at run-time.


Thanks.


-- 
H.J.


Re: [libcpp] maybe canonicalize system paths in line-map

2012-04-27 Thread Dodji Seketeli
Manuel López-Ibáñez wrote:

> Another drawback I didn't realize until now is that in this way we
> canonicalize every path, instead of only touching those that belong to
> system headers.

Ah.  Good catch.

I guess file->dir->sysp should tell us if we are in a system directory,
so that we can avoid canonicalizing in that case.

> I am not sure if it is important or not.

I don't know either.  But I guess we could err on the safe side by
limiting the change to system just headers like you were initially
proposing.

-- 
Dodji


Re: [PATCH][ARM] NEON DImode immediate constants

2012-04-27 Thread Richard Earnshaw
On 30/03/12 12:15, Andrew Stubbs wrote:
> On 28/02/12 16:20, Andrew Stubbs wrote:
>> Hi all,
>>
>> This patch implements 64-bit immediate constant loads in NEON.
>>
>> The current state is that you can load const_vector, but not const_int.
>> This is clearly not ideal. The result is a constant pool entry when it's
>> not necessary.
>>
>> The patch disables the movdi_vfp patterns for loading DImode values, if
>> the operand is const_int and NEON is enabled, and extends the neon_mov
>> pattern to include DImode const_int, as well as the const_vector
>> operands. I've modified neon_valid_immediate only enough to accept
>> const_int input - the logic remains untouched.
> 
> That patch failed to bootstrap successfully, but this updated patch 
> bootstraps and tests with no regressions.
> 
> OK?

Sorry for the delay.

This is OK.

It would be good to merge all the target32 movdi variants into one
pattern and then use alternative enabling to deal with the different
valid alternatives.

R.
> 
> Andrew
> 
> 
> neon-loadimm64.patch
> 
> 
> 2012-03-27  Andrew Stubbs  
> 
>   gcc/
>   * config/arm/arm.c (neon_valid_immediate): Allow const_int.
>   (arm_print_operand): Add 'x' format.
>   * config/arm/constraints.md (Dn): Allow const_int.
>   * config/arm/neon.md (neon_mov<mode>): Use VDX to allow DImode.
>   Use 'x' format to print constants.
>   * config/arm/predicates.md (imm_for_neon_mov_operand): Allow const_int.
>   * config/arm/vfp.md (movdi_vfp): Disable for const_int when neon
>   is enabled.
>   (movdi_vfp_cortexa8): Likewise.
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 0bded8d..492ddde 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -8873,11 +8873,25 @@ neon_valid_immediate (rtx op, enum machine_mode mode, 
> int inverse,
>break; \
>  }
>  
> -  unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op);
> -  unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode));
> +  unsigned int i, elsize = 0, idx = 0, n_elts;
> +  unsigned int innersize;
>unsigned char bytes[16];
>int immtype = -1, matches;
>unsigned int invmask = inverse ? 0xff : 0;
> +  bool vector = GET_CODE (op) == CONST_VECTOR;
> +
> +  if (vector)
> +{
> +  n_elts = CONST_VECTOR_NUNITS (op);
> +  innersize = GET_MODE_SIZE (GET_MODE_INNER (mode));
> +}
> +  else
> +{
> +  n_elts = 1;
> +  if (mode == VOIDmode)
> + mode = DImode;
> +  innersize = GET_MODE_SIZE (mode);
> +}
>  
>/* Vectors of float constants.  */
>if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
> @@ -8913,7 +8927,7 @@ neon_valid_immediate (rtx op, enum machine_mode mode, 
> int inverse,
>/* Splat vector constant out into a byte vector.  */
>for (i = 0; i < n_elts; i++)
>  {
> -  rtx el = CONST_VECTOR_ELT (op, i);
> +  rtx el = vector ? CONST_VECTOR_ELT (op, i) : op;
>unsigned HOST_WIDE_INT elpart;
>unsigned int part, parts;
>  
> @@ -17230,6 +17244,19 @@ arm_print_operand (FILE *stream, rtx x, int code)
>   }
>return;
>  
> +/* An integer that we want to print in HEX.  */
> +case 'x':
> +  switch (GET_CODE (x))
> + {
> + case CONST_INT:
> +   fprintf (stream, "#" HOST_WIDE_INT_PRINT_HEX, INTVAL (x));
> +   break;
> +
> + default:
> +   output_operand_lossage ("Unsupported operand for code '%c'", code);
> + }
> +  return;
> +
>  case 'B':
>if (GET_CODE (x) == CONST_INT)
>   {
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 7d0269a..68979c1 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -255,9 +255,9 @@
>  
>  (define_constraint "Dn"
>   "@internal
> -  In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov
> -  immediate instruction."
> - (and (match_code "const_vector")
> +  In ARM/Thumb-2 state a const_vector or const_int which can be loaded with a
> +  Neon vmov immediate instruction."
> + (and (match_code "const_vector,const_int")
>(match_test "TARGET_32BIT
>  && imm_for_neon_mov_operand (op, GET_MODE (op))")))
>  
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index d7caa37..3c88568 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -152,9 +152,9 @@
>  (define_attr "vqh_mnem" "vadd,vmin,vmax" (const_string "vadd"))
>  
> (define_insn "*neon_mov<mode>"
> -  [(set (match_operand:VD 0 "nonimmediate_operand"
> +  [(set (match_operand:VDX 0 "nonimmediate_operand"
> "=w,Uv,w, w,  ?r,?w,?r,?r, ?Us")
> - (match_operand:VD 1 "general_operand"
> + (match_operand:VDX 1 "general_operand"
> " w,w, Dn,Uvi, w, r, r, Usi,r"))]
>"TARGET_NEON
> && (register_operand (operands[0], <MODE>mode)
> @@ -173,7 +173,7 @@
>if (width == 0)
>  return "vmov.f32\t%P0, %1  @ <mode>";
>else
> -sprintf 
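The "splat" loop touched in the hunk above decomposes each constant element into bytes before matching them against the vmov immediate encodings; with the new const_int support there is a single 8-byte element. A stand-alone sketch of that decomposition (not the compiler's code):

```c
#include <stdint.h>

/* Split a 64-bit immediate into little-endian bytes, in the spirit
   of what neon_valid_immediate does before pattern-matching the
   bytes against the NEON vmov immediate forms.  Illustrative only.  */
static void splat_imm64 (uint64_t imm, unsigned char bytes[8])
{
  for (int i = 0; i < 8; i++)
    bytes[i] = (imm >> (i * 8)) & 0xff;
}
```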

[PATCH, diagnostics] Add -Wvarargs option

2012-04-27 Thread Dodji Seketeli
Hello Gabriel,

Following your request[1], please find below the implementation for the
-Wvarargs option, as well as its introductory text.  It applies on top
the changes to enable -ftrack-macro-expansion by default[2].

[1]: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01604.html
[2]: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00532.html

---
Several warnings about questionable uses of the variadic-function
macros (like va_start) could not be controlled by any warning option.
Fix that by introducing the -Wvarargs option.

Tested on x86_64-unknown-linux-gnu against trunk.  Bootstrap for all
languages is still underway.

gcc/c-family/

* c.opt (Wvarargs):  Define new option.

gcc/
* builtins.c (fold_builtin_next_arg): Use OPT_Wvarargs as an
argument for the various warning_at calls.

gcc/doc/

* invoke.texi: Update the documentation.

gcc/testsuite/

* c-c++-common/Wvarargs.c: New test case.
* c-c++-common/Wvarargs-2.c: Likewise.
---
 gcc/builtins.c  |8 ++--
 gcc/c-family/c.opt  |4 ++
 gcc/doc/invoke.texi |7 
 gcc/testsuite/c-c++-common/Wvarargs-2.c |   33 +++
 gcc/testsuite/c-c++-common/Wvarargs.c   |   54 +++
 5 files changed, 102 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wvarargs-2.c
 create mode 100644 gcc/testsuite/c-c++-common/Wvarargs.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 5ddc47b..41a052b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -12127,8 +12127,8 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
  /* Evidently an out of date version of <stdarg.h>; can't validate
 va_start's second argument, but can still work as intended.  */
  warning_at (current_location,
- 0,
- "%<__builtin_next_arg%> called without an argument");
+ OPT_Wvarargs,
+  "%<__builtin_next_arg%> called without an argument");
  return true;
}
   else if (nargs > 1)
@@ -12164,7 +12164,7 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
 argument so that we will get wrong-code because of
 it.  */
  warning_at (current_location,
- 0,
+ OPT_Wvarargs,
  "second parameter of %<va_start%> not last named argument");
}
 
@@ -12177,7 +12177,7 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
   else if (DECL_REGISTER (arg))
{
  warning_at (current_location,
- 0,
+ OPT_Wvarargs,
  "undefined behaviour when second parameter of "
  "%<va_start%> is declared with %<register%> storage");
}
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index db8ca81..a038e57 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -685,6 +685,10 @@ Wvariadic-macros
 C ObjC C++ ObjC++ Warning
 Do not warn about using variadic macros when -pedantic
 
+Wvarargs
+C ObjC C++ ObjC++ Warning Var(warn_varargs) Init(1)
+Warn about questionable usage of the macros used to retrieve variable arguments
+
 Wvla
 C ObjC C++ ObjC++ Var(warn_vla) Init(-1) Warning
 Warn if a variable length array is used
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 280fac3..47100b0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4657,6 +4657,13 @@ Warn if variadic macros are used in pedantic ISO C90 
mode, or the GNU
 alternate syntax when in pedantic ISO C99 mode.  This is default.
 To inhibit the warning messages, use @option{-Wno-variadic-macros}.
 
+@item -Wvarargs
+@opindex Wvarargs
+@opindex Wno-varargs
+Warn upon questionable usage of the macros used to handle variable
+arguments like @samp{va_start}.  This is default.  To inhibit the
+warning messages, use @option{-Wno-varargs}.
+
 @item -Wvector-operation-performance
 @opindex Wvector-operation-performance
 @opindex Wno-vector-operation-performance
diff --git a/gcc/testsuite/c-c++-common/Wvarargs-2.c 
b/gcc/testsuite/c-c++-common/Wvarargs-2.c
new file mode 100644
index 000..a2e031f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wvarargs-2.c
@@ -0,0 +1,33 @@
+/*
+   { dg-options "-Wno-varargs" }
+   { dg-do compile }
+ */
+
+#include <stdarg.h>
+
+void
+err (int a)
+{
+  va_list vp;
+  va_start (vp, a); // { dg-error "used in function with fixed args" }
+}
+
+void
+foo0 (int a, int b, ...)
+{
+va_list vp;
+/* 'a' is not the last argument of the enclosing function, but
+   don't warn because we are ignoring -Wvarargs.  */
+va_start (vp, a);
+va_end (vp);
+}
+
+void
+foo1 (int a, register int b, ...)
+{
+va_list vp;
+/* 'b' is declared with register storage, but don't warn
+   because we are ignoring -Wvarargs.  */
+va_start (vp, b);
+va_end (vp);
+}
diff --git a/gcc/testsuite/c-c++-common/Wvarargs.c 
b/gcc/testsuite/c-c++-c
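For contrast with the warned-about cases in the test above, a well-formed variadic function passes its last named parameter to va_start and triggers none of the new -Wvarargs diagnostics:

```c
#include <stdarg.h>

/* Correct usage: 'n' is the last named argument, is not declared
   with register storage, and the function is genuinely variadic,
   so none of the -Wvarargs warnings fire.  */
static int sum_ints (int n, ...)
{
  va_list ap;
  int s = 0;
  va_start (ap, n);
  for (int i = 0; i < n; i++)
    s += va_arg (ap, int);
  va_end (ap);
  return s;
}
```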

Re: [PATCH 11/13] Fix va_start related location

2012-04-27 Thread Dodji Seketeli
Gabriel Dos Reis  writes:

> On Wed, Apr 25, 2012 at 10:20 AM, Dodji Seketeli  wrote:
>> Gabriel Dos Reis  writes:
>>
>>> On Wed, Apr 25, 2012 at 9:04 AM, Dodji Seketeli  wrote:
 In gcc/testsuite/gcc.dg/pr30457.c, the first warning was not being
 emitted because the relevant location was inside the va_start macro
 defined in a system header.  It can even point to a token for a
 builtin macro there.  This patch unwinds to the first token in real
 source code in that case.
>>>
>>> While you are at it, could you also use a non-zero value for the second
>>> argument argument to warning_at?
>>
>> I couldn't find any obvious value for it.  I am thinking maybe it would
>> make sense to introduce a new -Wva_start to warn about possible
>> dangerous uses of the va_start macro and use that as the second argument
>> for the relevant warnings emitted by fold_builtin_next_arg.  What do you
>> think?
>
> or -Wvarargs?

OK, I have cooked up a patch for this that I will send in a separate
thread shortly.

>>
>> In any case, this topic seems logically unrelated to the purpose of
>> enabling -ftrack-macro-expansion by default, so IMHO it would be better to
>> do this in a separate self-contained patch.  Among other things, this
>> would ease possible future back-ports.  Would you agree?
>
> OK.

While testing the separate patch, I realized that this one was missing
adjusting the location in another spot.  So I have updated this patch
accordingly.  The patch that adds -Wvarargs will come on top of it, and
will add some needed regression tests for the whole.

Tested on x86_64-unknown-linux-gnu against trunk.  Bootstrap is still
running ...

* builtins.c (fold_builtin_next_arg): Unwinds to the first
location in real source code.
---
 gcc/builtins.c |   23 +++
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index b47f218..5ddc47b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -12095,6 +12095,13 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
   tree fntype = TREE_TYPE (current_function_decl);
   int nargs = call_expr_nargs (exp);
   tree arg;
+  /* There is good chance the current input_location points inside the
+ definition of the va_start macro (perhaps on the token for
+ builtin) in a system header, so warnings will not be emitted.
+ Use the location in real source code.  */
+  source_location current_location =
+linemap_unwind_to_first_non_reserved_loc (line_table, input_location,
+ NULL);
 
   if (!stdarg_p (fntype))
 {
@@ -12119,7 +12126,9 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
{
  /* Evidently an out of date version of <stdarg.h>; can't validate
 va_start's second argument, but can still work as intended.  */
- warning (0, "%<__builtin_next_arg%> called without an argument");
+ warning_at (current_location,
+ 0,
+ "%<__builtin_next_arg%> called without an argument");
  return true;
}
   else if (nargs > 1)
@@ -12154,7 +12163,9 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
 argument.  We just warn and set the arg to be the last
 argument so that we will get wrong-code because of
 it.  */
- warning (0, "second parameter of %<va_start%> not last named argument");
+ warning_at (current_location,
+ 0,
+ "second parameter of %<va_start%> not last named argument");
}
 
   /* Undefined by C99 7.15.1.4p4 (va_start):
@@ -12164,8 +12175,12 @@ fold_builtin_next_arg (tree exp, bool va_start_p)
  the default argument promotions, the behavior is undefined."
   */
   else if (DECL_REGISTER (arg))
-warning (0, "undefined behaviour when second parameter of "
- "%<va_start%> is declared with %<register%> storage");
+   {
+ warning_at (current_location,
+ 0,
+ "undefined behaviour when second parameter of "
+ "%<va_start%> is declared with %<register%> storage");
+   }
 
   /* We want to verify the second parameter just once before the tree
 optimizers are run and then avoid keeping it in the tree,
-- 
Dodji


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread Sriraman Tallam
On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu  wrote:
> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam  wrote:
>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu  wrote:
>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  
>>> wrote:
 Hi,

   I have made the following changes in this new patch which is attached:

 * Use target attribute itself to create function versions.
 * Handle any number of ISA names and arch=  args to target attribute,
 generating the right dispatchers.
 * Integrate with the CPU runtime detection checked in this week.
 * Overload resolution: If the caller's target matches any of the
 version function's target, then a direct call to the version is
 generated, no need to go through the dispatching.

 Patch also available for review here:
 http://codereview.appspot.com/5752064

>>>
>>> Does it work with
>>>
>>> int foo ();
>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>
>>> int (*foo_p) () = foo?
>>
>> Yes, this will work. foo_p will be the address of the dispatcher
>> function and hence doing (*foo_p)() will call the right version.
>
> Even when foo_p is a global variable and compiled with -fPIC?

I am not sure I understand what the complication is here, but FWIW, I
tried this example and it works

int foo ()
{
 return 0;
}

int __attribute__ ((target ("arch=corei7")))
foo ()
{
 return 1;
}

int (*foo_p)() = foo;
int main ()
{
 return (*foo_p)();
}

g++ -fPIC -O2 example.cc


Did you have something else in mind? Could you please elaborate if you
a have a particular case in mind.

The way I handle function pointers is straightforward. When the
front-end sees a pointer to a function that is versioned, it returns
the pointer to the dispatcher instead.
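A hand-written analogue of what the front end does here: taking the address of a versioned function really takes the address of a dispatcher that picks a version at run time. All names and the CPU check below are made up for illustration; the real compiler wires this to the CPU runtime detection mentioned earlier in the thread.

```c
/* Two versions of a function and a dispatcher choosing between
   them, mimicking what the compiler emits for a versioned foo.
   cpu_has_corei7 is a stand-in for real runtime CPU detection;
   here it is hard-wired to 0 so the generic version is chosen.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void)  { return 1; }

static int cpu_has_corei7 (void) { return 0; /* assumed result */ }

static int foo_dispatch (void)
{
  return cpu_has_corei7 () ? foo_corei7 () : foo_default ();
}

/* Taking "foo"'s address yields the dispatcher's address.  */
static int (*foo_p) (void) = foo_dispatch;
```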

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread H.J. Lu
On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam  wrote:
> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu  wrote:
>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  
>> wrote:
>>> Hi,
>>>
>>>   I have made the following changes in this new patch which is attached:
>>>
>>> * Use target attribute itself to create function versions.
>>> * Handle any number of ISA names and arch=  args to target attribute,
>>> generating the right dispatchers.
>>> * Integrate with the CPU runtime detection checked in this week.
>>> * Overload resolution: If the caller's target matches any of the
>>> version function's target, then a direct call to the version is
>>> generated, no need to go through the dispatching.
>>>
>>> Patch also available for review here:
>>> http://codereview.appspot.com/5752064
>>>
>>
>> Does it work with
>>
>> int foo ();
>> int foo () __attribute__ ((targetv("arch=corei7")));
>>
>> int (*foo_p) () = foo?
>
> Yes, this will work. foo_p will be the address of the dispatcher
> function and hence doing (*foo_p)() will call the right version.

Even when foo_p is a global variable and compiled with -fPIC?

Thanks.

-- 
H.J.


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread Sriraman Tallam
On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu  wrote:
> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  wrote:
>> Hi,
>>
>>   I have made the following changes in this new patch which is attached:
>>
>> * Use target attribute itself to create function versions.
>> * Handle any number of ISA names and arch=  args to target attribute,
>> generating the right dispatchers.
>> * Integrate with the CPU runtime detection checked in this week.
>> * Overload resolution: If the caller's target matches any of the
>> version function's target, then a direct call to the version is
>> generated, no need to go through the dispatching.
>>
>> Patch also available for review here:
>> http://codereview.appspot.com/5752064
>>
>
> Does it work with
>
> int foo ();
> int foo () __attribute__ ((targetv("arch=corei7")));
>
> int (*foo_p) () = foo?

Yes, this will work. foo_p will be the address of the dispatcher
function and hence doing (*foo_p)() will call the right version.

>
> Does it support C++?

Partially, no support for virtual function versioning yet. I will add
it in the next iteration.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.


Re: PR c++/52538 Extend C++11 UDLs to be compatible with inttypes.h macros (issue6109043)

2012-04-27 Thread Ollie Wild
On Thu, Apr 26, 2012 at 8:35 AM, Tom Tromey  wrote:
>
> This is ok with this change.

Thanks.  Updated and submitted to trunk.

Ollie


Re: RFA: consolidate DWARF strings into libiberty

2012-04-27 Thread Tom Tromey
> "Jakub" == Jakub Jelinek  writes:

Jakub> On Thu, Apr 26, 2012 at 01:52:31PM -0400, DJ Delorie wrote:
>> 
>> I will not oppose adding more unrelated stuff to libiberty, but
>> neither will I approve it.  I will let one of the other maintainers or
>> a global maintainer approve it.

Jakub> The original libiberty patch is ok for trunk then.

Thanks, Jakub.  I am checking it in to the gcc tree.
I'll send the gdb and binutils follow-ups later.

Tom


Re: [PATCH] reload: Try alternative with swapped operands before going to the next

2012-04-27 Thread H.J. Lu
On Wed, Apr 25, 2012 at 6:23 AM, Ulrich Weigand  wrote:
> Andreas Krebbel wrote:
>
>> 2011-11-17  Andreas Krebbel  
>>
>>       * reload.c (find_reloads): Change the loop nesting when trying an
>>       alternative with swapped operands.
>
> This is OK.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53141

-- 
H.J.


Re: [libcpp] maybe canonicalize system paths in line-map

2012-04-27 Thread Manuel López-Ibáñez
On 26 April 2012 12:12, Dodji Seketeli  wrote:
> Manuel López-Ibáñez wrote:
>
>> On 21 April 2012 14:56, Jason Merrill  wrote:
>>> It seems like we'll do this for every line in the header, which could lead
>>> to a lot of leaked memory.  Instead, we should canonicalize when setting
>>> ORDINARY_MAP_FILE_NAME.
>>
>> Hum, my understanding of the code is that this is exactly what I
>> implemented. That is, at most we leak every time
>> ORDINARY_MAP_FILE_NAME is set to a system header file (if the realpath
>> is shorter than the original path). This is a bit inefficient because
>> this happens two times per #include (when entering and when leaving).
>> But I honestly don't know how to avoid this.
>
> My understanding is that by the time we are about to deal with the
> ORDINARY_MAP_FILE_NAME property of a given map, it's too late; we have
> lost the "memory leak game" because that property just points to the
> path carried by the instance _cpp_file::path that is current when the
> map is built.  So the lifetime of that property is tied to the lifetime
> of the relevant instance of _cpp_file.
>
> So maybe it'd be better to canonicalize the _cpp_file::path when it's
> first built?  One drawback of that approach would be that
> _cpp_file::path will then permanently lose the information about the
> current directory, that is indirectly encoded into the way the file path
> is originally built.  But do we really need that information?

Another drawback I didn't realize until now is that in this way we
canonicalize every path, instead of only touching those that belong to
system headers.

I am not sure if it is important or not.

Comments?

Cheers,

Manuel.


Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-04-27 Thread Jakub Jelinek
On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote:
> But even without this, I think the patch is worth it just to get rid of
> all the relocations necessary otherwise.

IMHO we should defer applying this by a few months, given that GDB support
is only being added these days and -gdwarf-4 is now the default.
Applying it in August/September when there is already GDB version with
the support would be better.

> +  attr.dw_attr = DW_AT_high_pc;
> +  if (dwarf_version < 3)

Shouldn't this be < 4?  I think DWARF 3 doesn't have constant class
DW_AT_high_pc.

Jakub


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-04-27 Thread H.J. Lu
On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam  wrote:
> Hi,
>
>   I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch=  args to target attribute,
> generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
> version function's target, then a direct call to the version is
> generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064
>

Does it work with

int foo ();
int foo () __attribute__ ((targetv("arch=corei7")));

int (*foo_p) () = foo?

Does it support C++?

Thanks.

-- 
H.J.


dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-04-27 Thread Mark Wielaard
Hi,

The DWARF spec says (since version 4) that DW_AT_high_pc can be
represented by a constant form.

If the value of the DW_AT_high_pc is of class address, it is the
relocated address of the first location past the last
instruction associated with the entity; if it is of class
constant, the value is an unsigned integer offset which when
added to the low PC gives the address of the first location past
the last instruction associated with the entity.

Encoding DW_AT_high_pc as constant offset saves a lot of relocations.
Which is what this patch does. I also have patches for elfutils,
binutils and gdb to handle the new form. Jakub has a patch for dwz to
make the format encoding optimal, which saves some space too. I couldn't
figure out a way to do this from dwarf2out.c even though in almost all
cases using a full DW_FORM_data4/8 is way too much. Any ideas?

But even without this, I think the patch is worth it just to get rid of
all the relocations necessary otherwise.

At first I thought of using a dw_val_class_delta class that held two
labels (a generic form of dw_val_class_vms_delta), but that seemed
overkill since the first label is always associated with the
DW_AT_low_pc attribute of the DIE. So I just made it a special new
class.

2012-04-27  Mark Wielaard  

* dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
* dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
(add_AT_low_high_pc): New function.
(AT_lbl): Handle dw_val_class_high_pc.
(print_die): Likewise.
(attr_checksum): Likewise.
(attr_checksum_ordered): Likewise.
(same_dw_val_p): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(gen_subprogram_die): Use add_AT_low_high_pc.
(add_high_low_attributes): Likewise.
(dwarf2out_finish): Likewise.

Tested on x86_64-unknown-linux-gnu.

Cheers,

Mark
From 5e601bc28e283511b3f5d1bb1d2904251d909563 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Fri, 27 Apr 2012 14:27:14 +0200
Subject: [PATCH] dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant
 offset.

* dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
* dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
(add_AT_low_high_pc): New function.
(AT_lbl): Handle dw_val_class_high_pc.
(print_die): Likewise.
(attr_checksum): Likewise.
(attr_checksum_ordered): Likewise.
(same_dw_val_p): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(gen_subprogram_die): Use add_AT_low_high_pc.
(add_high_low_attributes): Likewise.
(dwarf2out_finish): Likewise.
---
 gcc/ChangeLog   |   17 +++
 gcc/dwarf2out.c |   87 +++
 gcc/dwarf2out.h |3 +-
 3 files changed, 81 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a93d3cd..66a32ad 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,20 @@
+2012-04-27  Mark Wielaard  
+
+	* dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
+	* dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
+	(add_AT_low_high_pc): New function.
+	(AT_lbl): Handle dw_val_class_high_pc.
+	(print_die): Likewise.
+	(attr_checksum): Likewise.
+	(attr_checksum_ordered): Likewise.
+	(same_dw_val_p): Likewise.
+	(size_of_die): Likewise.
+	(value_format): Likewise.
+	(output_die): Likewise.
+	(gen_subprogram_die): Use add_AT_low_high_pc.
+	(add_high_low_attributes): Likewise.
+	(dwarf2out_finish): Likewise.
+
 2012-04-25  Jakub Jelinek  
 
 	PR target/53110
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 766edba..fa2963b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1647,6 +1647,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b)
     case dw_val_class_fde_ref:
       return a->v.val_fde_index == b->v.val_fde_index;
     case dw_val_class_lbl_id:
+    case dw_val_class_high_pc:
       return strcmp (a->v.val_lbl_id, b->v.val_lbl_id) == 0;
     case dw_val_class_str:
       return a->v.val_str == b->v.val_str;
@@ -4350,6 +4351,26 @@ add_AT_data8 (dw_die_ref die, enum dwarf_attribute attr_kind,
   add_dwarf_attr (die, &attr);
 }
 
+/* Add DW_AT_low_pc and DW_AT_high_pc to a DIE.  */
+static inline void
+add_AT_low_high_pc (dw_die_ref die, const char *lbl_low, const char *lbl_high)
+{
+  dw_attr_node attr;
+
+  attr.dw_attr = DW_AT_low_pc;
+  attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_low);
+  add_dwarf_attr (die, &attr);
+
+  attr.dw_attr = DW_AT_high_pc;
+  if (dwarf_version < 3)
+    attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  else
+    attr.dw_attr_val.val_class = dw_val_class_high_pc;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_high);
+  add_dwarf_attr (die, &attr);
+}
+
 /* Hash and equality f

Re: [C11-atomic] [patch] gimple atomic statements

2012-04-27 Thread Andrew MacLeod

On 04/27/2012 04:52 AM, Richard Guenther wrote:

hmm, yeah they always return a value. I was just copying the gimple_call
code...  Why would we need to do this processing  for a GIMPLE_CALL lhs and
not a GIMPLE_ATOMIC lhs?
GIMPLE_CALL lhs can be memory if the call returns an aggregate, similar
GIMPLE_CALL arguments can be memory if they are aggregate.

Can this be the case for atomics as well?


Ah, point taken.  No.  No aggregates can ever be returned by an
atomic, so yes, I can trim that out then.
When generic atomics are used to handle aggregates, it's all handled by 
pointer parameters and the function is usually  void, except for 
compare_exchange which returns a boolean.



   And the RHS processing is the same as for a
is_gimple_call as well...  it was lifted from the code immediately
following...  I tried to write all this code to return the same values as
would have been returned had it actually been a  built-in __sync or __atomic
call like we have today.  Once I had them all, then we could actually make
any improvements based on our known side effect limits.

I guess I don't understand the rest of the comment about why I need to do
something different here than with a call...

Well, all callers that use walk_stmt_load_store/addr_ops need to handle
non-explicit loads/stores represented by the stmt.  For calls this includes
the loads and stores the callee may perform.  For atomics this includes  ?
the only implicit thing a normal atomic operation can do is load or 
store the TARGET.
When the type is an aggregate, then the EXPR (soon to be op :-) and/or 
EXPECTED field may also be loaded or stored indrectly as all the 
parameters become pointers.  When I add that code I will reflect that 
properly.

(depends on whether the operand of an atomic-load is a pointer or an object,
I suppose it is a pointer for the following) For atomic this includes the
implicit load of *{address operand} which will _not_ be returned by
walk_stmt_load_store/addr_ops.  Thus callers that expect to see all loads/stores
(like ipa-pure-const.c) need to explicitly handle atomics similar to how they
handle calls and asms (if only in a very conservative way).  Similar the
alias oracle needs to handle them (obviously).

OK, I understand better :-)

yes, locals can do anything they want since they aren't visible to other
processes.  at the moment, we'll leave those fences in because we dont
optimize atomics at all, but  "in the fullness of time" this will be
optimized to:
int foo()
{
  atomic_fence()
  return 1;
}

at the moment we produce:

int foo()
{
  atomic_fence()
  atomic_fence()
  return 1;

}

Which is inconsistent with the alias-oracle implementation.  Please fix it
to at _least_ not consider non-aliased locals.  You can look at the call
handling for how to do this - where you can also see how to do even better
by using points-to information from the pointer operand of the atomics
that specify the memory loaded/stored.  You don't want to hide all your
implementation bugs by making the alias oracle stupid ;)

I'm too transparently lazy obviously :-)  I'll look at it :-)


All atomic operations will have a VDEF so in theory it should be fine to
ignore.
Ignore?  Returning 'false' would be ignoring them.  Why do all atomic
operations have a VDEF?  At least atomic loads are not considered
stores, are they?
very true, and neither do fences...  But we can't move any shared memory 
load or store past them, so making them all VDEFS prevents that 
activity.  Once gimple_atomic is working properly, it might be possible 
to go back and adjust places. In particular, we may be able to allow 
directional movement based on the memory order.  seq_cst will always 
block motion in both directions, but the acquire model allows shared 
memory accesses to be sunk, and the release model allows shared memory 
access to be hoisted. relaxed allows both..  We may be able to get this 
behaviour through appropriate uses of VDEFs and VUSES...  or maybe it 
will be through a more basic understanding of GIMPLE_ATOMIC in the 
appropriate places.  I will visit that once everything is working 
conservatively.



  There are only 2 uses:
  DCE already has code added to handle them, and
  DSE handles it through ref_maybe_used_by_stmt_p.

Yes.  And eventually what matters for both callers should be folded into them
(let me put that somewhere ontop of my TODO ...)

For example DCE would happily remove atomic loads if they look like

  result-SSA-name = ATOMIC-LOAD;

if result-SSA-name is not used anywhere.  And I don't see why we should
not be allowed to do this?  Returning true from the above function will
make DCE not remove it.
at the moment, we are NOT allowed to remove any atomic operation. There 
are synchronization side effects and the only way we can remove such a 
statement is by analyzing the atomic operations in relation to each 
other. (ie, 2 loads from the same atomic with no other intervening 
atomic operation may make the first on

[PATCH] x86: emit tzcnt unconditionally

2012-04-27 Thread Paolo Bonzini
tzcnt is encoded as "rep;bsf" and unlike lzcnt is a drop-in replacement
if we don't care about the flags (it has the same semantics for non-zero
values).

Since bsf is usually slower, just emit tzcnt unconditionally.  However,
write it as rep;bsf unless -mbmi is in use, to cater for old assemblers.

Bootstrapped on a non-BMI x86_64-linux host, regtest in progress.
Ok for mainline?

Paolo

2012-04-27  Paolo Bonzini  

* config/i386/i386.md (ctz2): Emit tzcnt (as rep;bsf)
instead of bsf.

Index: i386/i386.md
===
--- i386/i386.md (revision 186904)
+++ i386/i386.md (local copy)
@@ -12082,14 +12082,15 @@ (define_insn "ctz2"
(clobber (reg:CC FLAGS_REG))]
   ""
 {
+  /* tzcnt expands to rep;bsf and we can use it even if !TARGET_BMI.  */
   if (TARGET_BMI)
 return "tzcnt{}\t{%1, %0|%0, %1}";
   else
-    return "bsf{}\t{%1, %0|%0, %1}";
+    return "rep;bsf{}\t{%1, %0|%0, %1}";
 }
   [(set_attr "type" "alu1")
(set_attr "prefix_0f" "1")
-   (set (attr "prefix_rep") (symbol_ref "TARGET_BMI"))
+   (set_attr "prefix_rep" "1")
(set_attr "mode" "")])
 
 (define_expand "clz2"



Re: [PATCH, i386, middle-end, tessuite] Intel TSX's HLE.

2012-04-27 Thread Kirill Yukhin
Hello guys,
After conversation in IRC with Richard, I've slightly updated the patch.
1. According to Richard's suggestion I moved PTA_HLE to the `generic` -march.
2. Applied and updated Andi's patch (see [1]).
3. Updated tests to use proper memory model combinations
4. Added a 1-sentence description to extend.texi to mention

Since I've changed the i386 part, I think Uros's OK is also needed.

ChangeLog entry:
2012-04-27  Kirill Yukhin  
Andi Kleen 

* coretypes (MEMMODEL_MASK): New.
* builtins.c (get_memmodel): Add val. Call target.memmodel_check
and return new variable.
(expand_builtin_atomic_exchange):  Mask memmodel values.
(expand_builtin_atomic_compare_exchange): Ditto.
(expand_builtin_atomic_load): Ditto.
(expand_builtin_atomic_store): Ditto.
(expand_builtin_atomic_clear): Ditto.
* doc/extend.texi: Mention port-dependent memory model flags.
* config/i386/cpuid.h (bit_HLE): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
HLE support.
* config/i386/i386-protos.h (ix86_generate_hle_prefix): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Set
HLE defines.
(ix86_target_string)<-mhle>: New.
(ix86_valid_target_attribute_inner_p): Ditto.
* config/i386/i386.c (ix86_target_string):
New.
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): New switch, set it
enabled for generic and generic64.
(ix86_print_operand): Generate HLE lock prefixes.
(ix86_memmodel_check): New.
(TARGET_MEMMODEL_CHECK): Ditto.
* config/i386/i386.h (OPTION_ISA_HLE): Ditto.
(IX86_HLE_ACQUIRE): Ditto.
(IX86_HLE_RELEASE): Ditto.
* config/i386/i386.h (ix86_generate_hle_prefix): Ditto.
* config/i386/i386.opt (mhle): Ditto.
* config/i386/sync.md(atomic_compare_and_swap): Pass
success model to instruction emitter.
(atomic_fetch_add): Ditto.
(atomic_exchange): Ditto.
(atomic_add): Ditto.
(atomic_sub): Ditto.
(atomic_): Ditto.
(*atomic_compare_and_swap_doubledi_pic): Ditto.
(atomic_compare_and_swap_single): Define and use argument
for success model.
(atomic_compare_and_swap_double): Ditto.
* configure.ac: Check if assembler support HLE prefixes.
* configure: Regenerate.
* config.in: Ditto.

testsuite/ChangeLog entry:
2012-04-27  Kirill Yukhin  

* gcc.target/i386/hle-cmpxchg-acq-1.c: New.
* gcc.target/i386/hle-cmpxchg-rel-1.c: Ditto.
* gcc.target/i386/hle-add-acq-1.c: Ditto.
* gcc.target/i386/hle-add-rel-1.c: Ditto.
* gcc.target/i386/hle-and-acq-1.c: Ditto.
* gcc.target/i386/hle-and-rel-1.c: Ditto.
* gcc.target/i386/hle-or-acq-1.c: Ditto.
* gcc.target/i386/hle-or-rel-1.c: Ditto.
* gcc.target/i386/hle-sub-acq-1.c: Ditto.
* gcc.target/i386/hle-sub-rel-1.c: Ditto.
* gcc.target/i386/hle-xadd-acq-1.c: Ditto.
* gcc.target/i386/hle-xadd-rel-1.c: Ditto.
* gcc.target/i386/hle-xchg-acq-1.c: Ditto.
* gcc.target/i386/hle-xchg-rel-1.c: Ditto.
* gcc.target/i386/hle-xor-acq-1.c: Ditto.
* gcc.target/i386/hle-xor-rel-1.c: Ditto.

bootstrap passing. HLE tests passing. Patch attached.

OK?

[1] - http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01538.html.

Thanks, K

On Thu, Apr 19, 2012 at 9:18 PM, Uros Bizjak  wrote:
> On Thu, Apr 19, 2012 at 5:21 PM, Kirill Yukhin  
> wrote:
>> Folks,
>> Thanks a lot for prompts!
>> I've updated my patch, so cmparing to previous it is:
>>  - have dedicated hook var, to define target dep. bits in mem model.
>> Check (and warning) is performed in get_memmodel
>>   - prefix emit performed through dedicated operand type (%K)
>
> +#else
> +         fputs ("\n" ASM_BYTE "0xf3\n\t");
> +#endif
>
> Ouch, my bad, you need to add file argument here (no need to repost
> patch because of this). FYI, you have wrong indentation here.
>
> Otherwise, OK as far as x86 is concerned, but you will need separate
> approval for middle-end part.
>
> Thanks,
> Uros.


hle-5.gcc.patch
Description: Binary data


Re: [PATCH] teach emit_store_flag to use clz/ctz

2012-04-27 Thread Paolo Bonzini
On 27/04/2012 13:16, Richard Guenther wrote:
> In optabs.c we compare the CTZ_DEFINED_VALUE_AT_ZERO against two,
> is != 0 really what you want here?  The docs suggest to me
> that as you are using the optab below you should compare against two, too.

Interesting, first time I hear about this...

... except that no target sets the macros to 2, and all of them could
(as far as I could see).  Looks like the code trumps the documentation;
how does this look?

2012-04-27  Paolo Bonzini  

* optabs.c (expand_ffs): Check CTZ_DEFINED_VALUE_AT_ZERO
against 1.
* doc/tm.texi (Misc): Invert meaning of 1 and 2 for
CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO.

Index: optabs.c
===
--- optabs.c (revision 186903)
+++ optabs.c (local copy)
@@ -2764,7 +2764,7 @@ expand_ffs
   if (!temp)
goto fail;
 
-  defined_at_zero = (CTZ_DEFINED_VALUE_AT_ZERO (mode, val) == 2);
+  defined_at_zero = (CTZ_DEFINED_VALUE_AT_ZERO (mode, val) == 1);
 }
   else if (optab_handler (clz_optab, mode) != CODE_FOR_nothing)
 {
@@ -2773,7 +2773,7 @@ expand_ffs
   if (!temp)
goto fail;
 
-  if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) == 2)
+  if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) == 1)
{
  defined_at_zero = true;
  val = (GET_MODE_PRECISION (mode) - 1) - val;
Index: doc/tm.texi
===
--- doc/tm.texi (revision 186903)
+++ doc/tm.texi (local copy)
@@ -10640,9 +10640,9 @@
 for @code{clz} or @code{ctz} with a zero operand.
 A result of @code{0} indicates the value is undefined.
 If the value is defined for only the RTL expression, the macro should
-evaluate to @code{1}; if the value applies also to the corresponding optab
+evaluate to @code{2}; if the value applies also to the corresponding optab
 entry (which is normally the case if it expands directly into
-the corresponding RTL), then the macro should evaluate to @code{2}.
+the corresponding RTL), then the macro should evaluate to @code{1}.
 In the cases where the value is defined, @var{value} should be set to
 this value.
 
plus tweaking this patch to reject 2 as well.

> What about cost considerations?  We only seem to have the general
> "branches are expensive" metric - but ctz/clz may be prohibitely expensive
> themselves, no?

Yeah, that's a general problem with this kind of tricks.  In general
however clz/ctz is getting less and less expensive, so I don't think
it is worrisome (at least at the beginning of stage 1).  We can add
rtx_costs checks later.

Among architectures I know, only i386 has an expensive bsf/bsr but
it also has sete/setne which GCC will use instead of this trick.

Looking at rtx_costs, nothing seems to mark clz/ctz as prohibitively
expensive (Xtensa does, but only in the case when the optab handler
will not exist).  I realize though that this is not a particularly
good statistic, since the compiler would not generate them out of
its hat until now.

Paolo


Re: [PATCH, tree-optimization] Fix for PR 52868

2012-04-27 Thread Igor Zamyatin
On Wed, Apr 25, 2012 at 6:41 PM, Richard Guenther
 wrote:
> On Wed, Apr 25, 2012 at 4:32 PM, Igor Zamyatin  wrote:
>> On Wed, Apr 25, 2012 at 1:14 PM, Richard Guenther
>>  wrote:
>>> On Wed, Apr 25, 2012 at 10:56 AM, Igor Zamyatin  wrote:
 Hi all!

 I'd like to post for review the patch which makes some cost adjustments
 in the get_computation_cost_at routine in the ivopts part.
 As mentioned in the PR, the changes also fix the bwaves regression from PR
 52272.
 Also the changes introduce no degradations on spec2000/2006 and
 EEMBC1.1/2.0 (this was measured on Atom) on x86.


 Bootstrapped and regtested on x86. Ok to commit?
>>>
>>> I can't make sense of the patch and the comment does not help.
>>>
>>> +      diff_cost = cost.cost;
>>>       cost.cost /= avg_loop_niter (data->current_loop);
>>> +      add_cost_val = add_cost (TYPE_MODE (ctype), data->speed);
>>> +      /* Do cost correction if address cost is small enough
>>> +         and difference cost is high enough.  */
>>> +      if (address_p && diff_cost > add_cost_val
>>> +          && get_address_cost (symbol_present, var_present,
>>> +                               offset, ratio, cstepi,
>>> +                               TYPE_MODE (TREE_TYPE (utype)),
>>> +                               TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> +                               speed, stmt_is_after_inc,
>>> +                               can_autoinc).cost <= add_cost_val)
>>> +        cost.cost += add_cost_val;
>>>
>>> Please explain more thoroughly.  It also would seem to be better to add
>>> an extra case, as later code does
>>
>> For example for such code
>>
>>   for (j=0; j>       for (i=0; i>           sum += ptr->a[j][i] * ptr->c[k][i];
>>   }
>>  we currently have following gimple on x86 target (I provided a piece
>> of all phase output):
>>
>>           # ivtmp.13_30 = PHI 
>>           D.1748_34 = (void *) ivtmp.13_30;
>>           D.1722_7 = MEM[base: D.1748_34, offset: 0B];
>>           D.1750_36 = ivtmp.27_28;
>>           D.1751_37 = D.1750_36 + ivtmp.13_30; <-- we got
>> non-invariant add which is not taken into account currently in cost
>> model
>>           D.1752_38 = (void *) D.1751_37;
>>           D.1753_39 = (sizetype) k_8(D);
>>           D.1754_40 = D.1753_39 * 800;
>>           D.1723_9 = MEM[base: D.1752_38, index: D.1754_40, offset: 16000B];
>>           ...
>>
>>  With proposed fix we produce:
>>
>>           # ivtmp.14_30 = PHI 
>>           D.1749_34 = (struct S *) ivtmp.25_28;
>>           D.1722_7 = MEM[base: D.1749_34, index: ivtmp.14_30, offset: 0B];
>>           D.1750_35 = (sizetype) k_8(D);
>>           D.1751_36 = D.1750_35 * 800;
>>           D.1752_37 = ptr_6(D) + D.1751_36;
>>           D.1723_9 = MEM[base: D.1752_37, index: ivtmp.14_30, offset: 
>> 16000B];
>>
>> which is more effective on platforms where address cost is cheaper
>> than cost of addition operation. That's basically what this adjustment
>> is for.
>
> If we generally miss to account for the add then why is the adjustment
> conditional on diff_cost > add_cost and address_cost <= add_cost?
>
> Is this a new heuristic or a fix for not accurately computing the cost for the
> stmts we generate?

I'd say this is closer to a heuristic, since diff_cost > add_cost is an
attempt to catch the case with a non-invariant add produced by the pointer
difference, and address_cost <= add_cost keeps only the cases with cheap
address operations.

>
> Richard.
>
>> So comment in the source code now looks as follows
>>
>> /* Do cost correction when address difference produces
>>   additional non-invariant add operation which is less
>>   profitable if address cost is cheaper than cost of add.  */
>>
>>>
>>>  /* Now the computation is in shape symbol + var1 + const + ratio * var2.
>>>     (symbol/var1/const parts may be omitted).  If we are looking for an
>>>     address, find the cost of addressing this.  */
>>>  if (address_p)
>>>    return add_costs (cost,
>>>                      get_address_cost (symbol_present, var_present,
>>>                                        offset, ratio, cstepi,
>>>                                        TYPE_MODE (TREE_TYPE (utype)),
>>>                                        TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>>                                        speed, stmt_is_after_inc,
>>>                                        can_autoinc));
>>>
>>> thus refactoring the code a bit would make it possible to CSE the
>>> get_address_cost
>>> call and eventually make it clearer what the code does.
>>
>> 'offset' could be changed beetween two calls of get_address_cost so
>> such refactoring looks useless.
>>
>> New patch (only the comment was changed) attached. Changelog was
>> changed as well.
>>
>>>
>>> Richard.
>>>
>>
>> Changelog:
>>
>>  2012-04-26  Yuri Rumyantsev  
>>
>>         * tree-ssa-loop-ivopts.c (get_computation_cost_at): Adjust
>>        cost model when address difference produces additional
>>        non-invariant add operation whi

Re: [C11-atomic] [patch] gimple atomic statements

2012-04-27 Thread Andrew MacLeod

On 04/27/2012 04:37 AM, Richard Guenther wrote:

Since integral atomics are always of an unsigned type, I could switch over
and use 'unsigned size' instead of 'tree fntype' for them (I will rename
it), but then things may  be more complicated when dealing with generic
atomics...  those can be structure or array types and I was planning to
allow leaving the type in case I discover something useful I can do with it.
  It may ultimately turn out that the real type isn't going to matter, in
which case I will remove it and replace it with an unsigned int for size.
So it eventually will support variable-size types?
we support arbitrary sized objects now for exchange, compare_exchange, 
load, and store. I just haven't added the support to gimple_atomic yet.
That's next on my list.

And the reason memmodel is a tree is because, as ridiculous as it seems, it
can ultimately be a runtime value.  Even barring that, it shows up as a
variable after inlining before various propagation engines run, especially
in C++ templates.  So it must be a tree.

Ick.  That sounds gross.  So, if it ends up a variable at the time we generate
assembly we use a "conservative" constant value for it?
Indeed,  ICK,  and yes, I make it SEQ_CST at compile time if a variable 
value gets all the way through to codegen.
Hmm, ok. So you are changing GENERIC in fact. I suppose _Atomic is 
restricted to global variables. Can I attach _Atomic to allocated 
storage via pointer casts? Consider writing up semantics of _Atomic 
into generic.texi please.


Yes, I have a start on implementing _Atomic which adds the bit to the 
type.  And no, it isn't restricted to globals... it just indicates said 
hunk of memory must be accessed in an atomic manner.   You can use it on 
automatics if you want...


I believe _Atomic can also be used in casts as well.  so an atomic call 
may have to be injected into a dereference expression as well to load or 
update memory.


I'll update .texi along with the patch when I get to it.



Well, they are calls with a very well-defined set of side-effects. 
Otherwise not representing them as calls would be a waste of time. 
Thus, no - they do not need to be considered "calls" everywhere (well, 
treating them the same as calls should be conservative) but treating 
them as "atomics" even if they were expanded as calls needs to be 
correct. 


Yeah, I was starting with them as calls just to be conservative and get 
the same behaviour, then refine them by examining each GIMPLE_ATOMIC use 
in detail.

Looks like a hieararchy "no target or anything else" ->  "target" ->
"expr" ->  "memorder" would work, with only "lhs" being optional and
present everywhere.
Of course the hierarchy only makes sense for things that are not trees
(I was thinking of memorder originally - but that thought fell apart).
  So in the
end apart from abstracting a base for FENCE the flat hieararchy makes sense
(all but FENCE have a type / size).
Is it really worth it to have 2 different types of gimple_atomic nodes 
in order to avoid having an unused type field in atomic_fence?  
Atomic_fence has an order field, so it requires the use of op[0] in 
order for gimple_atomic_order() to work properly, we can't have all the 
other gimple nodes inherit from that, so we'd need a base which 
basically has just the KIND in it, and target/no-target nodes inherit 
from that.  So we'd have:


              / gimple_atomic_fence
atomic_base <
              \ gimple_atomic_all_others

then all the checks for is_gimple_atomic() have to look for both 
gimple_atomic_all_others and gimple_atomic_fence.



Perhaps a better option is to shift all the operands by one since the 
type is a tree... ie, if there is a a target, there is also a type entry, so

op[0] remains ORDER,
op[1] is the newly inserted TYPE
op[2] then becomes the TARGET,
op[3...] ,when present, are all shifted over by one.

the type can be removed from the structure and now there would be no 
wastage.  Is that reasonable?


Andrew


Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-27 Thread Jan Hubicka
> On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou  wrote:
> >> We do not depend on the block structure any more when dealing with
> >> stack layout for variables in GCC 4.7.0 and above.  I am not saying
> >> your patch is incorrect or not needed.  Just it will not have an
> >> effect on variable stack layout.
> >
> > It might be worth backporting to the 4.6 branch though, these stack layout
> > issues are very nasty.
> 
> What about not setting a BLOCK on the call stmt?  That said, I can't see

I recall that not setting the block did lead to an ICE...

> how outlining a SESE region that starts at the beginning of the function
> is not conservatively using the outermost BLOCK ... so, is the bug not
> elsewhere?

... so I made exactly the same conclusion.
The problem is with stack vars allocated in header that survive till the
tail part?

Honza
> 
> Richard.
> 
> > --
> > Eric Botcazou


Re: [PATCH] Support for known unknown alignment

2012-04-27 Thread Martin Jambor
Hi,

On Tue, Apr 24, 2012 at 12:31:38PM +0200, Martin Jambor wrote:
> Hi,
> 
> On Mon, Apr 23, 2012 at 03:30:19PM +0200, Richard Guenther wrote:
> > On Mon, 23 Apr 2012, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > On Mon, Apr 23, 2012 at 12:50:51PM +0200, Richard Guenther wrote:
> > > > On Fri, 20 Apr 2012, Martin Jambor wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > two days ago I talked to Richi on IRC about the functions to determine
> > > > > the expected alignment of objects and pointers we have and he
> > > > > suggested that get_object_alignment_1 and get_pointer_alignment_1
> > > > > should return whether the alignment is actually known and return the
> > > > > actual alignment in a reference variable (as they currently return
> > > > > bitpos).
> > > > > 
> 
...

> The testsuite differences I got on Friday were probably noise, tonight
> (on an updated svn tree) I did not get any on ppc64-linux,
> x86_64-linux or i686-linux.  Considering that and the OK above, I'm
> going to bootstrap and test also on sparc64-linux and ia64-linux and
> if those pass too, I'll commit the patch tomorrow, unless there are
> any objections.
> 

There is a Fortran testsuite case that, on sparc64-linux, showed that my
clever simplification of get_object_alignment_1's handling of
TARGET_MEM_REFs was actually very dumb.  So this is the working patch,
successfully bootstrapped and tested on x86_64-linux (all languages
and Ada), i686-linux (likewise), sparc64-linux (all languages - Java +
Ada), ia64-linux (all languages) and ppc64-linux (likewise) and tested
on hppa-linux (C and C++ only).

I still consider the patch approved because it actually changes less
stuff but do not want to commit it before a weekend so intend to do it
next week.

Thanks,

Martin


2012-04-25  Martin Jambor  

* builtins.c (get_object_alignment_1): Return whether we can determine
the alignment or conservatively assume byte alignment.  Return the
alignment by reference.  Use get_pointer_alignment_1 for dereference
alignment.
(get_pointer_alignment_1): Return whether we can determine the
alignment or conservatively assume byte alignment.  Return the
alignment by reference.  Use get_ptr_info_alignment to get SSA name
alignment.
(get_object_alignment): Update call to get_object_alignment_1.
(get_object_or_type_alignment): Likewise, fall back to type alignment
only when it returned false.
(get_pointer_alignment): Update call to get_pointer_alignment_1.
* fold-const.c (get_pointer_modulus_and_residue): Update call to
get_object_alignment_1.
* ipa-prop.c (ipa_modify_call_arguments): Update call to
get_pointer_alignment_1.
* tree-sra.c (build_ref_for_offset): Likewise, fall back to the type
of MEM_REF or TARGET_MEM_REF only when it returns false.
* tree-ssa-ccp.c (get_value_from_alignment): Update call to
get_object_alignment_1.
(ccp_finalize): Use set_ptr_info_alignment.
* tree.h (get_object_alignment_1): Update declaration.
(get_pointer_alignment_1): Likewise.
* gimple-pretty-print.c (dump_gimple_phi): Use get_ptr_info_alignment.
(dump_gimple_stmt): Likewise.
* tree-flow.h (ptr_info_def): Updated comments of fields align and
misalign.
(get_ptr_info_alignment): Declared.
(mark_ptr_info_alignment_unknown): Likewise.
(set_ptr_info_alignment): Likewise.
(adjust_ptr_info_misalignment): Likewise.
* tree-ssa-address.c (copy_ref_info): Use new access functions to get
and set alignment of SSA names.
* tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Call
mark_ptr_info_alignment_unknown.
* tree-ssanames.c (get_ptr_info_alignment): New function.
(mark_ptr_info_alignment_unknown): Likewise.
(set_ptr_info_alignment): Likewise.
(adjust_ptr_info_misalignment): Likewise.
(get_ptr_info): Call mark_ptr_info_alignment_unknown.
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Likewise.
(bump_vector_ptr): Likewise.
* tree-vect-stmts.c (create_array_ref): Use set_ptr_info_alignment.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.


Index: src/gcc/builtins.c
===
*** src.orig/gcc/builtins.c
--- src/gcc/builtins.c
*** called_as_built_in (tree node)
*** 263,270 
return is_builtin_name (name);
  }
  
! /* Compute values M and N such that M divides (address of EXP - N) and
!such that N < M.  Store N in *BITPOSP and return M.
  
 Note that the address (and thus the alignment) computed here is based
 on the address to which a symbol resolves, whereas DECL_ALIGN is based
--- 263,272 
return is_builtin_name (name);
  }
  
! /* Compute values M and N such that M divides (address of EXP - N

Re: [PATCH] Proper use of decl_function_context in dwarf2out.c

2012-04-27 Thread Martin Jambor
Hi,

I'd like to draw some attention to this bug again: it is the only ICE
when LTO-building Firefox with debug info, and the problem also happens
with 4.7, so it would be nice to have this fixed for 4.7.1.

On Mon, Mar 12, 2012 at 11:51:05AM +0100, Richard Guenther wrote:
> On Thu, Mar 8, 2012 at 12:18 PM, Jakub Jelinek  wrote:
> > On Thu, Mar 08, 2012 at 12:06:46PM +0100, Martin Jambor wrote:
> >>        /* For local statics lookup proper context die.  */
> >> -      if (TREE_STATIC (decl) && decl_function_context (decl))
> >> -     context_die = lookup_decl_die (DECL_CONTEXT (decl));
> >> +      if (TREE_STATIC (decl) &&
> >> +       (ctx_fndecl = decl_function_context (decl)) != NULL_TREE)
> >> +     context_die = lookup_decl_die (ctx_fndecl);
> >
> > The formatting is wrong, && shouldn't be at the end of line.
> > For the rest I'll defer to Jason, not sure what exactly we want to do there.
> > This hunk has been added by Honza:
> 
> I don't think the patch is right.  Suppose you have a function-local
> class declaration with a static member.  Surely the context DIE
> you want to use is still the class one, not the function one.
> 

Let me just briefly re-iterate what I have already written before.
The above cannot happen because function-local classes are not allowed
to have static members.  Also, the actual problem happens when we
attempt to generate debug info for a VMT of a class defined within a
function.  I do not think it really matters whether or what context
it gets...

> So, I'm not sure why we should not be able to create a testcase
> that fails even without LTO.  I think using get_context_die here
> would be more appropriate.  Or restrict this to DECL_CONTEXT (decl)
> == FUNCTION_DECL.

...and thus I am fine with this as well, which is what the patch below
does.  It now also has a testcase, LTO builds the testcase and I am
currently bootstrapping it and LTOing Firefox with it.

But I would be equally happy with my original patch, feeding
lookup_decl_die with the result of decl_function_context.

OK for trunk and the 4.7 branch?

Thanks,

Martin


2012-04-27  Martin Jambor  

PR lto/53138
* dwarf2out.c (dwarf2out_decl): Only lookup die representing context
of a variable when the context is a function.

* gcc/testsuite/g++.dg/lto/pr52605_0.C: New test.


Index: src/gcc/dwarf2out.c
===
--- src.orig/gcc/dwarf2out.c
+++ src/gcc/dwarf2out.c
@@ -19880,7 +19880,9 @@ dwarf2out_decl (tree decl)
return;
 
   /* For local statics lookup proper context die.  */
-  if (TREE_STATIC (decl) && decl_function_context (decl))
+  if (TREE_STATIC (decl)
+ && DECL_CONTEXT (decl)
+ && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
context_die = lookup_decl_die (DECL_CONTEXT (decl));
 
   /* If we are in terse mode, don't generate any DIEs to represent any
Index: src/gcc/testsuite/g++.dg/lto/pr52605_0.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/lto/pr52605_0.C
@@ -0,0 +1,39 @@
+// { dg-lto-do link }
+// { dg-lto-options {{-flto -g}} }
+
+extern "C" void abort (void);
+
+class A
+{
+public:
+  virtual int foo (int i);
+};
+
+int A::foo (int i)
+{
+  return i + 1;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+int main (int argc, char *argv[])
+{
+
+  class B : public A
+  {
+  public:
+int bar (int i)
+{
+  return foo (i) + 2;
+}
+  };
+  class B b;
+
+  if (b.bar (get_input ()) != 4)
+abort ();
+  return 0;
+}
+


[PATCH] Remove is_hidden_global_store

2012-04-27 Thread Richard Guenther

This removes is_hidden_global_store in favor of two functions
with clearer semantics.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2012-04-27  Richard Guenther  

* tree-flow.h (is_hidden_global_store): Remove.
* tree-ssa-sink.c (is_hidden_global_store): Likewise.
* tree-ssa-alias.h (ref_may_alias_global_p): Declare.
(stmt_may_clobber_global_p): Likewise.
* tree-ssa-alias.c (ref_may_alias_global_p): New function.
(stmt_may_clobber_global_p): Likewise.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Call
stmt_may_clobber_global_p.
* tree-ssa-dse.c (dse_possible_dead_store_p): Likewise.

Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186897)
--- gcc/tree-flow.h (working copy)
*** extern void maybe_remove_unreachable_han
*** 795,803 
  /* In tree-ssa-pre.c  */
  void debug_value_expressions (unsigned int);
  
- /* In tree-ssa-sink.c  */
- bool is_hidden_global_store (gimple);
- 
  /* In tree-loop-linear.c  */
  extern void linear_transform_loops (void);
  extern unsigned perfect_loop_nest_depth (struct loop *);
--- 795,800 
Index: gcc/tree-ssa-alias.h
===
*** gcc/tree-ssa-alias.h(revision 186897)
--- gcc/tree-ssa-alias.h(working copy)
*** extern tree ao_ref_base (ao_ref *);
*** 99,109 
--- 99,111 
  extern alias_set_type ao_ref_alias_set (ao_ref *);
  extern bool ptr_deref_may_alias_global_p (tree);
  extern bool ptr_derefs_may_alias_p (tree, tree);
+ extern bool ref_may_alias_global_p (tree);
  extern bool refs_may_alias_p (tree, tree);
  extern bool refs_may_alias_p_1 (ao_ref *, ao_ref *, bool);
  extern bool refs_anti_dependent_p (tree, tree);
  extern bool refs_output_dependent_p (tree, tree);
  extern bool ref_maybe_used_by_stmt_p (gimple, tree);
+ extern bool stmt_may_clobber_global_p (gimple);
  extern bool stmt_may_clobber_ref_p (gimple, tree);
  extern bool stmt_may_clobber_ref_p_1 (gimple, ao_ref *);
  extern bool call_may_clobber_ref_p (gimple, tree);
Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 186897)
--- gcc/tree-ssa-alias.c(working copy)
*** ptr_deref_may_alias_ref_p_1 (tree ptr, a
*** 328,333 
--- 328,379 
return true;
  }
  
+ /* Return true whether REF may refer to global memory.  */
+ 
+ bool
+ ref_may_alias_global_p (tree ref)
+ {
+   tree base = get_base_address (ref);
+   if (DECL_P (base))
+ return is_global_var (base);
+   else if (TREE_CODE (base) == MEM_REF
+  || TREE_CODE (base) == TARGET_MEM_REF)
+ return ptr_deref_may_alias_global_p (TREE_OPERAND (base, 0));
+   return true;
+ }
+ 
+ /* Return true whether STMT may clobber global memory.  */
+ 
+ bool
+ stmt_may_clobber_global_p (gimple stmt)
+ {
+   tree lhs;
+ 
+   if (!gimple_vdef (stmt))
+ return false;
+ 
+   /* ???  We can ask the oracle whether an artificial pointer
+  dereference with a pointer with points-to information covering
+  all global memory (what about non-address taken memory?) maybe
+  clobbered by this call.  As there is at the moment no convenient
+  way of doing that without generating garbage do some manual
+  checking instead.
+  ???  We could make a NULL ao_ref argument to the various
+  predicates special, meaning any global memory.  */
+ 
+   switch (gimple_code (stmt))
+ {
+ case GIMPLE_ASSIGN:
+   lhs = gimple_assign_lhs (stmt);
+   return (TREE_CODE (lhs) != SSA_NAME
+ && ref_may_alias_global_p (lhs));
+ case GIMPLE_CALL:
+   return true;
+ default:
+   return true;
+ }
+ }
+ 
  
  /* Dump alias information on FILE.  */
  
Index: gcc/tree-ssa-dce.c
===
*** gcc/tree-ssa-dce.c  (revision 186897)
--- gcc/tree-ssa-dce.c  (working copy)
*** mark_stmt_if_obviously_necessary (gimple
*** 370,376 
return;
  }
  
!   if (is_hidden_global_store (stmt))
  {
mark_stmt_necessary (stmt, true);
return;
--- 370,376 
return;
  }
  
!   if (stmt_may_clobber_global_p (stmt))
  {
mark_stmt_necessary (stmt, true);
return;
Index: gcc/tree-ssa-dse.c
===
*** gcc/tree-ssa-dse.c  (revision 186897)
--- gcc/tree-ssa-dse.c  (working copy)
*** dse_possible_dead_store_p (gimple stmt,
*** 169,175 
 just pretend the stmt makes itself dead.  Otherwise fail.  */
if (!temp)
{
! if (is_hidden_global_store (stmt))
return false;
  
  temp = stmt;
--- 169,175 
 just pretend the stmt makes itself dead.  Otherwise fail.  */
if (!temp)

Re: [PATCH] teach emit_store_flag to use clz/ctz

2012-04-27 Thread Richard Guenther
On Fri, 27 Apr 2012, Paolo Bonzini wrote:

> If the value at zero is outside the range [0, GET_MODE_BITSIZE (mode)),
> "A != 0" and "A == 0" can be compiled to clz/ctz followed by a subtraction
> or one's complement (only for A != 0) and a shift.  This trick can be
> used effectively on PPC (though there are other problems in the machine
> description to fix first).
> 
> Bootstrapped/regtested x86_64-pc-linux-gnu, and tested manually on PPC
> with some changes to the machine description.  Ok for mainline?

See below.

> Paolo
> 
> 2012-04-27  Paolo Bonzini  
> 
>   PR 53087
>   * expmed.c (emit_store_flag): Generalize to support cases
>   where the result is not in the sign bit.  Add a trick
>   involving the value at zero of clz or ctz.
> 
> Index: expmed.c
> ===
> --- expmed.c  (revision 186859)
> +++ expmed.c  (working copy)
> @@ -5419,6 +5419,7 @@
>enum rtx_code rcode;
>rtx subtarget;
>rtx tem, last, trueval;
> +  int shift;
>  
>tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, 
> normalizep,
>  target_mode);
> @@ -5612,10 +5613,11 @@
>  false) <= 1 || (code != LE && code != GT
>  return 0;
>  
> -  /* Try to put the result of the comparison in the sign bit.  Assume we 
> can't
> - do the necessary operation below.  */
> +  /* Assume we can't do the necessary operation below, so try to put the
> + result of the comparison in a known bit, given by SHIFT.  */
>  
>tem = 0;
> +  shift = GET_MODE_BITSIZE (mode) - 1;
>  
>/* To see if A <= 0, compute (A | (A - 1)).  A <= 0 iff that result has
>   the sign bit set.  */
> @@ -5650,25 +5652,22 @@
>  
>if (code == EQ || code == NE)
>  {
> -  /* For EQ or NE, one way to do the comparison is to apply an operation
> -  that converts the operand into a positive number if it is nonzero
> -  or zero if it was originally zero.  Then, for EQ, we subtract 1 and
> -  for NE we negate.  This puts the result in the sign bit.  Then we
> -  normalize with a shift, if needed.
> +  /* Look for an operation that converts the operand into a positive
> +  number if it is nonzero or zero if it was originally zero.  Then,
> +  for EQ, we subtract 1 and for NE we negate.  This puts the result
> +  in the sign bit.  Then we normalize with a shift, if needed.
>  
> -  Two operations that can do the above actions are ABS and FFS, so try
> -  them.  If that doesn't work, and MODE is smaller than a full word,
> -  we can use zero-extension to the wider mode (an unsigned conversion)
> -  as the operation.  */
> -
> -  /* Note that ABS doesn't yield a positive number for INT_MIN, but
> +  ABS can do this; it doesn't yield a positive number for INT_MIN, but
>that is compensated by the subsequent overflow when subtracting
> -  one / negating.  */
> +  one / negating.  If that doesn't work, and MODE is smaller than
> +  a full word, we can use zero-extension to the wider mode (an
> +  unsigned conversion) as the operation.
>  
> +  FFS could also do this, but we use a more versatile trick with
> +  CLZ/CTZ below.  */
> +
>if (optab_handler (abs_optab, mode) != CODE_FOR_nothing)
>   tem = expand_unop (mode, abs_optab, op0, subtarget, 1);
> -  else if (optab_handler (ffs_optab, mode) != CODE_FOR_nothing)
> - tem = expand_unop (mode, ffs_optab, op0, subtarget, 1);
>else if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
>   {
> tem = convert_modes (word_mode, mode, op0, 1);
> @@ -5684,6 +5683,69 @@
>   tem = expand_unop (mode, neg_optab, tem, subtarget, 0);
>   }
>  
> +  if (tem == 0)
> +{
> +  int clz_zero, ctz_zero, if_zero;
> +
> +  /* For EQ or NE, clz and ctz may give a unique value for zero that 
> is
> + not in the [0, GET_MODE_BITSIZE (mode)) range.  The result can 
> be
> + massaged to put the result in a known bit.  */
> +
> +  if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, clz_zero))
> +clz_zero = 0;
> +  if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, ctz_zero))
> +ctz_zero = 0;

In optabs.c we compare CTZ_DEFINED_VALUE_AT_ZERO against two;
is != 0 really what you want here?  The docs suggest to me that,
as you are using the optab below, you should compare against two, too.

What about cost considerations?  We only seem to have the general
"branches are expensive" metric - but ctz/clz may be prohibitively expensive
themselves, no?

Richard.

> +  /* Prefer a negative result if normalize == -1, since that does not
> + require an additional left shift.  */
> +
> +  if (normalizep == -1)
> +{
> +  if (ctz_zero < 0 && clz_zero >= GET_MODE_BITSIZE (mode))
> +clz_zero = 0;
> +  if (clz_zero < 0 && ctz_zero >= G

Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-27 Thread Richard Guenther
On Fri, Apr 27, 2012 at 12:50 PM, Eric Botcazou  wrote:
>> We do not depend on the block structure any more when dealing with
>> stack layout for variables in GCC 4.7.0 and above.  I am not saying
>> your patch is incorrect or not needed.  Just it will not have an
>> effect on variable stack layout.
>
> It might be worth backporting to the 4.6 branch though, these stack layout
> issues are very nasty.

What about not setting a BLOCK on the call stmt?  That said, I can't see
how outlining a SESE region that starts at the beginning of the function
is not conservatively using the outermost BLOCK ... so, is the bug not
elsewhere?

Richard.

> --
> Eric Botcazou


[Patch,4.7,committed]: Fix PR c/51527

2012-04-27 Thread Georg-Johann Lay
http://gcc.gnu.org/viewcvs?view=revision&revision=186899

Applied as approved in
http://gcc.gnu.org/ml/gcc/2012-04/msg00843.html

Johann

PR c/51527
* convert.c (convert_to_integer): Avoid infinite recursion for
target-defined built-in types.

Index: convert.c
===
--- convert.c	(revision 186872)
+++ convert.c	(working copy)
@@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
 		   (Otherwise would recurse infinitely in convert.  */
 		if (TYPE_PRECISION (typex) != inprec)
 		  {
+		tree otypex = typex;
 		/* Don't do unsigned arithmetic where signed was wanted,
 		   or vice versa.
 		   Exception: if both of the original operands were
@@ -806,10 +807,12 @@ convert_to_integer (tree type, tree expr
 		  typex = unsigned_type_for (typex);
 		else
 		  typex = signed_type_for (typex);
-		return convert (type,
-fold_build2 (ex_form, typex,
-		 convert (typex, arg0),
-		 convert (typex, arg1)));
+
+		if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
+		  return convert (type,
+  fold_build2 (ex_form, typex,
+		   convert (typex, arg0),
+		   convert (typex, arg1)));
 		  }
 	  }
 	  }


Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-27 Thread Eric Botcazou
> We do not depend on the block structure any more when dealing with
> stack layout for variables in GCC 4.7.0 and above.  I am not saying
> your patch is incorrect or not needed.  Just it will not have an
> effect on variable stack layout.

It might be worth backporting to the 4.6 branch though, these stack layout 
issues are very nasty.

-- 
Eric Botcazou


Re: [PATCH] teach phi-opt to produce -(a COND b)

2012-04-27 Thread Richard Guenther
On Fri, 27 Apr 2012, Paolo Bonzini wrote:

> This patch teaches phiopt to look at phis whose arguments are -1 and 0,
> and produce negated setcc statements.
> 
> Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch
> for pr53138.  Ok for mainline?

Ok.

Thanks,
Richard.

> Paolo
> 
> 2012-04-27  Paolo Bonzini  
> 
> * tree-ssa-phiopt.c (conditional_replacement): Replace PHIs
>   whose arguments are -1 and 0, by negating the result of the
>   conditional.
> 
> 2012-04-27  Paolo Bonzini  
> 
> * gcc.c-torture/execute/20120427-2.c: New testcase.
> * gcc.dg/tree-ssa/phi-opt-10.c: New testcase.
> 
> 
> Index: tree-ssa-phiopt.c
> ===
> --- tree-ssa-phiopt.c (revision 186859)
> +++ tree-ssa-phiopt.c (working copy)
> @@ -536,17 +536,21 @@
>gimple_stmt_iterator gsi;
>edge true_edge, false_edge;
>tree new_var, new_var2;
> +  bool neg;
>  
>/* FIXME: Gimplification of complex type is too hard for now.  */
>if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE
>|| TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE)
>  return false;
>  
> -  /* The PHI arguments have the constants 0 and 1, then convert
> - it to the conditional.  */
> +  /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
> + convert it to the conditional.  */
>if ((integer_zerop (arg0) && integer_onep (arg1))
>|| (integer_zerop (arg1) && integer_onep (arg0)))
> -;
> +neg = false;
> +  else if ((integer_zerop (arg0) && integer_all_onesp (arg1))
> +|| (integer_zerop (arg1) && integer_all_onesp (arg0)))
> +neg = true;
>else
>  return false;
>  
> @@ -558,7 +562,7 @@
>   falls through into BB.
>  
>   There is a single PHI node at the join point (BB) and its arguments
> - are constants (0, 1).
> + are constants (0, 1) or (0, -1).
>  
>   So, given the condition COND, and the two PHI arguments, we can
>   rewrite this PHI into non-branching code:
> @@ -585,12 +589,20 @@
>   edge so that we know when to invert the condition below.  */
>extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
>if ((e0 == true_edge && integer_zerop (arg0))
> -  || (e0 == false_edge && integer_onep (arg0))
> +  || (e0 == false_edge && !integer_zerop (arg0))
>|| (e1 == true_edge && integer_zerop (arg1))
> -  || (e1 == false_edge && integer_onep (arg1)))
> +  || (e1 == false_edge && !integer_zerop (arg1)))
>  cond = fold_build1_loc (gimple_location (stmt),
> - TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
> +TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
>  
> +  if (neg)
> +{
> +  cond = fold_convert_loc (gimple_location (stmt),
> +   TREE_TYPE (result), cond);
> +  cond = fold_build1_loc (gimple_location (stmt),
> +  NEGATE_EXPR, TREE_TYPE (cond), cond);
> +}
> +
>/* Insert our new statements at the end of conditional block before the
>   COND_STMT.  */
>gsi = gsi_for_stmt (stmt);
> Index: testsuite/gcc.c-torture/execute/20120427-2.c
> ===
> --- testsuite/gcc.c-torture/execute/20120427-2.c  (revision 0)
> +++ testsuite/gcc.c-torture/execute/20120427-2.c  (revision 0)
> @@ -0,0 +1,38 @@
> +typedef struct sreal
> +{
> +  unsigned sig;  /* Significant.  */
> +  int exp;   /* Exponent.  */
> +} sreal;
> +
> +sreal_compare (sreal *a, sreal *b)
> +{
> +  if (a->exp > b->exp)
> +return 1;
> +  if (a->exp < b->exp)
> +return -1;
> +  if (a->sig > b->sig)
> +return 1;
> +  if (a->sig < b->sig)
> +return -1;
> +  return 0;
> +}
> +
> +sreal a[] = {
> +   { 0, 0 },
> +   { 1, 0 },
> +   { 0, 1 },
> +   { 1, 1 }
> +};
> +
> +int main()
> +{
> +  int i, j;
> +  for (i = 0; i <= 3; i++) {
> +for (j = 0; j < 3; j++) {
> +  if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort();
> +  if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort();
> +  if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort();
> +}
> +  }
> +  return 0;
> +}
> Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c
> ===
> --- testsuite/gcc.dg/tree-ssa/phi-opt-10.c(revision 0)
> 

Re: [patch] Fix cygwin ada install [was Re: Yet another issue with gcc current trunk with ada on cygwin]

2012-04-27 Thread Pascal Obry
On 06/04/2012 17:27, Pascal Obry wrote:
> 
> Back on this! It turns out that this breaks the shared Ada runtime.
> Indeed, exported variables of Ada packages in a DLL are only accessible
> when linking against the DLL (thanks to runtime pseudo relocs).
> 
> With the patch applied it is not possible to build any Ada application
> using the shared runtime. This is a very serious problem, the patch
> should be reverted.
> 
> Sorry for not having remembered this before, and why linking
> against the DLL was so important!
> 

PING?

-- 
  Pascal Obry
  --
  gpg --keyserver keys.gnupg.net --recv-key F949BD3B


Re: [PATCH] teach emit_store_flag to use clz/ctz

2012-04-27 Thread Paolo Bonzini

> Index: config/rs6000/rs6000.md
> ===
> --- config/rs6000/rs6000.md   (revision 186859)
> +++ config/rs6000/rs6000.md   (working copy)
> @@ -2129,7 +2129,7 @@
>  (define_expand "abssi2"
>[(set (match_operand:SI 0 "gpc_reg_operand" "")
>   (abs:SI (match_operand:SI 1 "gpc_reg_operand" "")))]
> -  ""
> +  "TARGET_ISEL || TARGET_POWER"
>"
>  {
>if (TARGET_ISEL)
> @@ -2137,11 +2137,6 @@
>emit_insn (gen_abssi2_isel (operands[0], operands[1]));
>DONE;
>  }
> -  else if (! TARGET_POWER)
> -{
> -  emit_insn (gen_abssi2_nopower (operands[0], operands[1]));
> -  DONE;
> -}
>  }")
>  
>  (define_insn "*abssi2_power"
> @@ -2188,36 +2183,12 @@
> (match_dup 2)))]
>"")
>  
> -(define_insn_and_split "abssi2_nopower"
> -  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,r")
> -(abs:SI (match_operand:SI 1 "gpc_reg_operand" "r,0")))
> -   (clobber (match_scratch:SI 2 "=&r,&r"))]
> -  "! TARGET_POWER && ! TARGET_ISEL"
> -  "#"
> -  "&& reload_completed"
> -  [(set (match_dup 2) (ashiftrt:SI (match_dup 1) (const_int 31)))
> -   (set (match_dup 0) (xor:SI (match_dup 2) (match_dup 1)))
> -   (set (match_dup 0) (minus:SI (match_dup 0) (match_dup 2)))]
> -  "")
> -
>  (define_insn "*nabs_power"
>[(set (match_operand:SI 0 "gpc_reg_operand" "=r")
>   (neg:SI (abs:SI (match_operand:SI 1 "gpc_reg_operand" "r"))))]
>"TARGET_POWER"
>"nabs %0,%1")
>  
> -(define_insn_and_split "*nabs_nopower"
> -  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,r")
> -(neg:SI (abs:SI (match_operand:SI 1 "gpc_reg_operand" "r,0"))))
> -   (clobber (match_scratch:SI 2 "=&r,&r"))]
> -  "! TARGET_POWER"
> -  "#"
> -  "&& reload_completed"
> -  [(set (match_dup 2) (ashiftrt:SI (match_dup 1) (const_int 31)))
> -   (set (match_dup 0) (xor:SI (match_dup 2) (match_dup 1)))
> -   (set (match_dup 0) (minus:SI (match_dup 2) (match_dup 0)))]
> -  "")
> -
>  (define_expand "neg2"
>[(set (match_operand:SDI 0 "gpc_reg_operand" "")
>   (neg:SDI (match_operand:SDI 1 "gpc_reg_operand" "")))]
> @@ -7710,40 +7681,13 @@
>  (define_expand "absdi2"
>[(set (match_operand:DI 0 "gpc_reg_operand" "")
>   (abs:DI (match_operand:DI 1 "gpc_reg_operand" "")))]
> -  "TARGET_POWERPC64"
> +  "TARGET_POWERPC64 && TARGET_ISEL"
>"
>  {
> -  if (TARGET_ISEL)
> -emit_insn (gen_absdi2_isel (operands[0], operands[1]));
> -  else
> -emit_insn (gen_absdi2_internal (operands[0], operands[1]));
> +  emit_insn (gen_absdi2_isel (operands[0], operands[1]));
>DONE;
>  }")
>  
> -(define_insn_and_split "absdi2_internal"
> -  [(set (match_operand:DI 0 "gpc_reg_operand" "=&r,r")
> -(abs:DI (match_operand:DI 1 "gpc_reg_operand" "r,0")))
> -   (clobber (match_scratch:DI 2 "=&r,&r"))]
> -  "TARGET_POWERPC64 && !TARGET_ISEL"
> -  "#"
> -  "&& reload_completed"
> -  [(set (match_dup 2) (ashiftrt:DI (match_dup 1) (const_int 63)))
> -   (set (match_dup 0) (xor:DI (match_dup 2) (match_dup 1)))
> -   (set (match_dup 0) (minus:DI (match_dup 0) (match_dup 2)))]
> -  "")
> -
> -(define_insn_and_split "*nabsdi2"
> -  [(set (match_operand:DI 0 "gpc_reg_operand" "=&r,r")
> -(neg:DI (abs:DI (match_operand:DI 1 "gpc_reg_operand" "r,0"))))
> -   (clobber (match_scratch:DI 2 "=&r,&r"))]
> -  "TARGET_POWERPC64 && !TARGET_ISEL"
> -  "#"
> -  "&& reload_completed"
> -  [(set (match_dup 2) (ashiftrt:DI (match_dup 1) (const_int 63)))
> -   (set (match_dup 0) (xor:DI (match_dup 2) (match_dup 1)))
> -   (set (match_dup 0) (minus:DI (match_dup 2) (match_dup 0)))]
> -  "")
> -
>  (define_insn "muldi3"
>[(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
>  (mult:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
> @@ -13100,13 +13044,9 @@
>   operands[2], operands[3]);
>  }
>  
> -  /* For SNE, we would prefer that the xor/abs sequence be used for integers.
> - For SEQ, likewise, except that comparisons with zero should be done
> - with an scc insns.  However, due to the order that combine see the
> - resulting insns, we must, in fact, allow SEQ for integers.  Fail in
> - the cases we don't want to handle or are best handled by portable
> - code.  */
> -  if (GET_CODE (operands[1]) == NE)
> +  /* Fail in the cases we don't want to handle or are best handled by
> + portable code.  */
> +  if (GET_CODE (operands[1]) == NE || GET_CODE (operands[1]) == EQ)
>  FAIL;
>if ((GET_CODE (operands[1]) == LT || GET_CODE (operands[1]) == LE
> || GET_CODE (operands[1]) == GT || GET_CODE (operands[1]) == GE)
> 

I included the RS6000 bits for completeness, but I'm not seeking
approval now for them.  I'll resubmit with proper testcases.

Paolo


Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-27 Thread Eric Botcazou
> Ah, and all ACATS fails and
>
> -FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
> -FAIL: gnat.dg/loop_optimization3.adb execution test
> -FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
> -FAIL: gnat.dg/test_8bitlong_overflow.adb execution test
>
> are fixed by for example
>
> [...]
>
> thus are because array TYPE_DOMAIN is built using unsigned sizetype
> but these Ada testcases have array domains which really need signed
> types.  The above is of course a hack, but one that otherwise survives
> bootstrap / test of all languages.

Kind of a miracle if you ask me, but probably a reasonable way out for Ada.
Thanks a lot for devising it.

> Thus, we arrive at the following Ada regression status if the patch series
> is applied (plus the above incremental patch):
>
> === acats tests ===
>
> === acats Summary ===
> # of expected passes2320
> # of unexpected failures0
> Native configuration is x86_64-unknown-linux-gnu
>
> === gnat tests ===
>
>
> Running target unix/
> FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
> FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
> FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized "goto" 2
> FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
>
> === gnat Summary for unix/ ===
>
> # of expected passes1093
> # of unexpected failures4
> # of expected failures  13
> # of unsupported tests  2
>
> Running target unix//-m32
> FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
> FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
> FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized "goto" 2
> FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
>
> === gnat Summary for unix//-m32 ===
>
> # of expected passes1093
> # of unexpected failures4
> # of expected failures  13
> # of unsupported tests  2
>
> === gnat Summary ===
>
> # of expected passes2186
> # of unexpected failures8
> # of expected failures  26
> # of unsupported tests  4
>
>
> Which I consider reasonable?

Sure, no opposition by me to applying the whole set of patches.

-- 
Eric Botcazou


Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-27 Thread Eric Botcazou
> To followup myself - bootstrap with just the 2nd patch is still
> broken:
>
> /abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/
> -B/usr/local/x86_64-unknown-linux-gnu/bin/
> -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem
> /usr/local/x86_64-unknown-linux-gnu/include -isystem
> /usr/local/x86_64-unknown-linux-gnu/sys-include-c -g -O2 -m32 -fpic
> -W -Wall -gnatpg -nostdinc -m32  s-secsta.adb -o s-secsta.o
> s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk'
> is too large
> Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size);
> ^
> make[9]: *** [s-secsta.o] Error 1

Yes, because of a spurious TREE_OVERFLOW set on a sizetype calculation that 
cannot overflow.  Not really fixable in isolation, so I didn't pursue.

-- 
Eric Botcazou


Re: [PATCH] pr53138 - miscompilation of spaceship operator

2012-04-27 Thread Uros Bizjak
On Fri, Apr 27, 2012 at 11:49 AM, Paolo Bonzini  wrote:
> The testcase is miscompiled to this:
>
>        ...
>        movl    (%rsi), %ecx
>        cmpl    %ecx, (%rdi)
>        sbbl    %edx, %edx
>        cmovbe  %edx, %eax
>        ret
>
> but sbbl only preserves the carry flag, not the zero flag.  I suppose
> one could use different patterns with different CC modes, but this is
> the safest fix.
>
> Bootstrapped/regtested x86_64-pc-linux-gnu, ok for mainline?

OK.

> I also reproduced this at least on 4.7; help would be welcome on
> committing it to the branches.  Thanks in advance!

I can take care of this, but I won't be able to do this until Monday.

Thanks,
Uros.


[PATCH] teach phi-opt to produce -(a COND b)

2012-04-27 Thread Paolo Bonzini
This patch teaches phiopt to look at phis whose arguments are -1 and 0,
and produce negated setcc statements.

Bootstrapped/regtested x86_64-pc-linux-gnu, together with the patch
for pr53138.  Ok for mainline?

Paolo

2012-04-27  Paolo Bonzini  

* tree-ssa-phiopt.c (conditional_replacement): Replace PHIs
whose arguments are -1 and 0, by negating the result of the
conditional.

2012-04-27  Paolo Bonzini  

* gcc.c-torture/execute/20120427-2.c: New testcase.
* gcc.dg/tree-ssa/phi-opt-10.c: New testcase.


Index: tree-ssa-phiopt.c
===
--- tree-ssa-phiopt.c   (revision 186859)
+++ tree-ssa-phiopt.c   (working copy)
@@ -536,17 +536,21 @@
   gimple_stmt_iterator gsi;
   edge true_edge, false_edge;
   tree new_var, new_var2;
+  bool neg;
 
   /* FIXME: Gimplification of complex type is too hard for now.  */
   if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE
   || TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE)
 return false;
 
-  /* The PHI arguments have the constants 0 and 1, then convert
- it to the conditional.  */
+  /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
+ convert it to the conditional.  */
   if ((integer_zerop (arg0) && integer_onep (arg1))
   || (integer_zerop (arg1) && integer_onep (arg0)))
-;
+neg = false;
+  else if ((integer_zerop (arg0) && integer_all_onesp (arg1))
+  || (integer_zerop (arg1) && integer_all_onesp (arg0)))
+neg = true;
   else
 return false;
 
@@ -558,7 +562,7 @@
  falls through into BB.
 
  There is a single PHI node at the join point (BB) and its arguments
- are constants (0, 1).
+ are constants (0, 1) or (0, -1).
 
  So, given the condition COND, and the two PHI arguments, we can
  rewrite this PHI into non-branching code:
@@ -585,12 +589,20 @@
  edge so that we know when to invert the condition below.  */
   extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
   if ((e0 == true_edge && integer_zerop (arg0))
-  || (e0 == false_edge && integer_onep (arg0))
+  || (e0 == false_edge && !integer_zerop (arg0))
   || (e1 == true_edge && integer_zerop (arg1))
-  || (e1 == false_edge && integer_onep (arg1)))
+  || (e1 == false_edge && !integer_zerop (arg1)))
 cond = fold_build1_loc (gimple_location (stmt),
-   TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
+TRUTH_NOT_EXPR, TREE_TYPE (cond), cond);
 
+  if (neg)
+{
+  cond = fold_convert_loc (gimple_location (stmt),
+   TREE_TYPE (result), cond);
+  cond = fold_build1_loc (gimple_location (stmt),
+  NEGATE_EXPR, TREE_TYPE (cond), cond);
+}
+
   /* Insert our new statements at the end of conditional block before the
  COND_STMT.  */
   gsi = gsi_for_stmt (stmt);
Index: testsuite/gcc.c-torture/execute/20120427-2.c
===
--- testsuite/gcc.c-torture/execute/20120427-2.c(revision 0)
+++ testsuite/gcc.c-torture/execute/20120427-2.c(revision 0)
@@ -0,0 +1,38 @@
+typedef struct sreal
+{
+  unsigned sig;/* Significant.  */
+  int exp; /* Exponent.  */
+} sreal;
+
+sreal_compare (sreal *a, sreal *b)
+{
+  if (a->exp > b->exp)
+return 1;
+  if (a->exp < b->exp)
+return -1;
+  if (a->sig > b->sig)
+return 1;
+  if (a->sig < b->sig)
+return -1;
+  return 0;
+}
+
+sreal a[] = {
+   { 0, 0 },
+   { 1, 0 },
+   { 0, 1 },
+   { 1, 1 }
+};
+
+int main()
+{
+  int i, j;
+  for (i = 0; i <= 3; i++) {
+for (j = 0; j < 3; j++) {
+  if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort();
+  if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort();
+  if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort();
+}
+  }
+  return 0;
+}
Index: testsuite/gcc.dg/tree-ssa/phi-opt-10.c
===
--- testsuite/gcc.dg/tree-ssa/phi-opt-10.c  (revisione 0)
+++ testsuite/gcc.dg/tree-ssa/phi-opt-10.c  (revisione 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+int nem1_phi (unsigned long a) { return a ? -1 : 0; }
+int eqm1_phi (unsigned long a) { return a ? 0 : -1; }
+
+int spaceship1 (long a) { return a > 0 ? 1 : a < 0 ? -1 : 0; }
+int spaceship2 (long a) { return a > 0 ? 1 : a == 0 ? 0 : -1; }
+
+/* { dg-final { scan-tree-dump-times " = -D" 4 "optimized"} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */


[PATCH] pr53138 - miscompilation of spaceship operator

2012-04-27 Thread Paolo Bonzini
The testcase is miscompiled to this:

...
movl    (%rsi), %ecx
cmpl    %ecx, (%rdi)
sbbl    %edx, %edx
cmovbe  %edx, %eax
ret

but sbbl only preserves the carry flag, not the zero flag.  I suppose
one could use different patterns with different CC modes, but this is
the safest fix.

Bootstrapped/regtested x86_64-pc-linux-gnu, ok for mainline?

I also reproduced this at least on 4.7; help would be welcome on
committing it to the branches.  Thanks in advance!

Paolo

2012-04-27  Paolo Bonzini  

PR target/53138
* config/i386/i386.md (x86_movcc_0_m1_neg): Add clobber.

2012-04-27  Paolo Bonzini  

PR target/53138
* gcc.c-torture/execute/20120427-1.c: New testcase.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revisione 186859)
+++ config/i386/i386.md (copia locale)
@@ -16439,7 +16439,8 @@
 (define_insn "*x86_movcc_0_m1_neg"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(neg:SWI48 (match_operator 1 "ix86_carry_flag_operator"
-   [(reg FLAGS_REG) (const_int 0)])))]
+   [(reg FLAGS_REG) (const_int 0)])))
+   (clobber (reg:CC FLAGS_REG))]
   ""
  "sbb{<imodesuffix>}\t%0, %0"
   [(set_attr "type" "alu")
Index: testsuite/gcc.c-torture/execute/20120427-1.c
===
--- testsuite/gcc.c-torture/execute/20120427-1.c(revisione 0)
+++ testsuite/gcc.c-torture/execute/20120427-1.c(revisione 0)
@@ -0,0 +1,36 @@
+typedef struct sreal
+{
+  unsigned sig;/* Significant.  */
+  int exp; /* Exponent.  */
+} sreal;
+
+sreal_compare (sreal *a, sreal *b)
+{
+  if (a->exp > b->exp)
+return 1;
+  if (a->exp < b->exp)
+return -1;
+  if (a->sig > b->sig)
+return 1;
+  return -(a->sig < b->sig);
+}
+
+sreal a[] = {
+   { 0, 0 },
+   { 1, 0 },
+   { 0, 1 },
+   { 1, 1 }
+};
+
+int main()
+{
+  int i, j;
+  for (i = 0; i <= 3; i++) {
+for (j = 0; j < 3; j++) {
+  if (i < j && sreal_compare(&a[i], &a[j]) != -1) abort();
+  if (i == j && sreal_compare(&a[i], &a[j]) != 0) abort();
+  if (i > j && sreal_compare(&a[i], &a[j]) != 1) abort();
+}
+  }
+  return 0;
+}


[PATCH] teach emit_store_flag to use clz/ctz

2012-04-27 Thread Paolo Bonzini
If the value at zero is outside the range [0, GET_MODE_BITSIZE (mode)),
"A != 0" and "A == 0" can be compiled to clz/ctz followed by a subtraction
or one's complement (only for A != 0) and a shift.  This trick can be
used effectively on PPC (though there are other problems in the machine
description to fix first).

Bootstrapped/regtested x86_64-pc-linux-gnu, and tested manually on PPC
with some changes to the machine description.  Ok for mainline?

Paolo

2012-04-27  Paolo Bonzini  

PR 53087
* expmed.c (emit_store_flag): Generalize to support cases
where the result is not in the sign bit.  Add a trick
involving the value at zero of clz or ctz.

Index: expmed.c
===
--- expmed.c(revisione 186859)
+++ expmed.c(copia locale)
@@ -5419,6 +5419,7 @@
   enum rtx_code rcode;
   rtx subtarget;
   rtx tem, last, trueval;
+  int shift;
 
   tem = emit_store_flag_1 (target, code, op0, op1, mode, unsignedp, normalizep,
   target_mode);
@@ -5612,10 +5613,11 @@
   false) <= 1 || (code != LE && code != GT
 return 0;
 
-  /* Try to put the result of the comparison in the sign bit.  Assume we can't
- do the necessary operation below.  */
+  /* Assume we can't do the necessary operation below, so try to put the
+ result of the comparison in a known bit, given by SHIFT.  */
 
   tem = 0;
+  shift = GET_MODE_BITSIZE (mode) - 1;
 
   /* To see if A <= 0, compute (A | (A - 1)).  A <= 0 iff that result has
  the sign bit set.  */
@@ -5650,25 +5652,22 @@
 
   if (code == EQ || code == NE)
 {
-  /* For EQ or NE, one way to do the comparison is to apply an operation
-that converts the operand into a positive number if it is nonzero
-or zero if it was originally zero.  Then, for EQ, we subtract 1 and
-for NE we negate.  This puts the result in the sign bit.  Then we
-normalize with a shift, if needed.
+  /* Look for an operation that converts the operand into a positive
+number if it is nonzero or zero if it was originally zero.  Then,
+for EQ, we subtract 1 and for NE we negate.  This puts the result
+in the sign bit.  Then we normalize with a shift, if needed.
 
-Two operations that can do the above actions are ABS and FFS, so try
-them.  If that doesn't work, and MODE is smaller than a full word,
-we can use zero-extension to the wider mode (an unsigned conversion)
-as the operation.  */
-
-  /* Note that ABS doesn't yield a positive number for INT_MIN, but
+ABS can do this; it doesn't yield a positive number for INT_MIN, but
 that is compensated by the subsequent overflow when subtracting
-one / negating.  */
+one / negating.  If that doesn't work, and MODE is smaller than
+a full word, we can use zero-extension to the wider mode (an
+unsigned conversion) as the operation.
 
+FFS could also do this, but we use a more versatile trick with
+CLZ/CTZ below.  */
+
   if (optab_handler (abs_optab, mode) != CODE_FOR_nothing)
tem = expand_unop (mode, abs_optab, op0, subtarget, 1);
-  else if (optab_handler (ffs_optab, mode) != CODE_FOR_nothing)
-   tem = expand_unop (mode, ffs_optab, op0, subtarget, 1);
   else if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
{
  tem = convert_modes (word_mode, mode, op0, 1);
@@ -5684,6 +5683,69 @@
tem = expand_unop (mode, neg_optab, tem, subtarget, 0);
}
 
+  if (tem == 0)
+{
+  int clz_zero, ctz_zero, if_zero;
+
+  /* For EQ or NE, clz and ctz may give a unique value for zero that is
+ not in the [0, GET_MODE_BITSIZE (mode)) range.  The result can be
+ massaged to put the result in a known bit.  */
+
+  if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, clz_zero))
+clz_zero = 0;
+  if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, ctz_zero))
+ctz_zero = 0;
+
+  /* Prefer a negative result if normalize == -1, since that does not
+ require an additional left shift.  */
+
+  if (normalizep == -1)
+{
+  if (ctz_zero < 0 && clz_zero >= GET_MODE_BITSIZE (mode))
+clz_zero = 0;
+  if (clz_zero < 0 && ctz_zero >= GET_MODE_BITSIZE (mode))
+ctz_zero = 0;
+}
+
+  if ((clz_zero < 0 || clz_zero >= GET_MODE_BITSIZE (mode))
+  && optab_handler (clz_optab, mode) != CODE_FOR_nothing)
+{
+  tem = expand_unop (mode, clz_optab, op0, subtarget, 1);
+  if_zero = clz_zero;
+}
+  if (tem == 0
+  && (ctz_zero < 0 || ctz_zero >= GET_MODE_BITSIZE (mode))
+  && optab_handler (ctz_optab, mode) != CODE_FOR_nothing)
+{
+  tem = expand_unop (mode, ctz_opt

Re: [patch] Obvious: Fix DF solution dirty marking in cfg.c:disconnect_src

2012-04-27 Thread Paolo Bonzini
Il 27/04/2012 11:18, Steven Bosscher ha scritto:
> Hello,
> 
> It makes no sense to mark DF solutions dirty on the gcc_unreachable()
> path but not on the return path.
> 
> Bootstrapped&tested on x86_64-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu. I'll commit this, as obvious, some time late
> next week unless I hear objections.

Ok.

Paolo



[patch] Obvious: Fix DF solution dirty marking in cfg.c:disconnect_src

2012-04-27 Thread Steven Bosscher
Hello,

It makes no sense to mark DF solutions dirty on the gcc_unreachable()
path but not on the return path.

Bootstrapped&tested on x86_64-unknown-linux-gnu and
powerpc64-unknown-linux-gnu. I'll commit this, as obvious, some time late
next week unless I hear objections.

Ciao!
Steven


* cfg.c (disconnect_src): Do df_mark_solutions_dirty in the right place.

Index: cfg.c
===
--- cfg.c   (revision 186897)
+++ cfg.c   (working copy)
@@ -242,13 +242,13 @@ disconnect_src (edge e)
   if (tmp == e)
{
  VEC_unordered_remove (edge, src->succs, ei.index);
+ df_mark_solutions_dirty ();
  return;
}
   else
ei_next (&ei);
 }

-  df_mark_solutions_dirty ();
   gcc_unreachable ();
 }


Request new specs string token for multilib_os_dir

2012-04-27 Thread Steven Drake
In the GCC FAQ under "Dynamic linker is unable to find GCC libraries", one
suggestion is to add a '-R' or '-rpath' linker option to the *link or *lib
specs so that the GCC libraries can be found.

E.g., the following line is added to the DRIVER_DEFINES when building gcc
via pkgsrc ('$(LINKER_RPATH_FLAG)' comes from the environment):

  -DLINK_LIBGCC_SPEC="\"%D $(LINKER_RPATH_FLAG)$(libdir) \""

This is needed as the prefix is normally something like '/usr/pkg/gcc47'.

The problem is that this does not allow for multilib os directories, and
there is currently no simple way of doing so.

My solution is to add the '%M' token that expands to multilib_os_dir.

The above line can then be easily changed to handle multilib directories:

  -DLINK_LIBGCC_SPEC="\"%D $(LINKER_RPATH_FLAG)$(libdir)/%M \""


2012-04-27  Steven Drake 

* gcc.c (do_spec_1): Add %M spec token to output multilib_os_dir.

--- gcc/gcc.c.orig  2012-02-28 17:31:38.0 +
+++ gcc/gcc.c
@@ -5115,6 +5115,13 @@ do_spec_1 (const char *spec, int inswitc
  return value;
break;
 
+ case 'M':
+   if (multilib_os_dir == NULL)
+ obstack_1grow (&obstack, '.');
+   else
+ obstack_grow (&obstack, multilib_os_dir, strlen(multilib_os_dir));
+   break;
+
  case 'G':
value = do_spec_1 (libgcc_spec, 0, NULL);
if (value != 0)

-- 
Steven


Re: Continue strict-volatile-bitfields fixes

2012-04-27 Thread Jakub Jelinek
On Fri, Apr 27, 2012 at 11:00:19AM +0200, Richard Guenther wrote:
> But won't it re-introduce bugs like PR52080, 52097 or 48124?  Also the

No.  All those are about bitfield stores, not reads.  All extract_bit*
functions currently pass 0, 0 as bitrange_{start,end}.

> proper place for this optimization is lowering and CSEing of bit-field loads
> (not that this lowering already exists).

I agree, but we don't have that yet.  When it is added,
optimize_bitfield* should certainly be moved there, but if we do that
before, we regress generated code quality.

Jakub


Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-27 Thread Richard Guenther
On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries  wrote:
> On 26/04/12 12:20, Richard Guenther wrote:
>> On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries  
>> wrote:
>>> On 25/04/12 11:57, Richard Guenther wrote:
>>> 
 Hmm.  I'm not sure we can conclude that they have the same value!

 +int bar (int);
 +void baz (int);
 +void bla (int);
 +
 +void
 +foo (int y)
 +{
 +  int a;
 +  if (y == 6)
 +    {
 +      bla (5);
 +      a = bar (7);
 +    }
 +  else
 +    {
 +      bla (5);
 +      a = bar (7);
 +    }
 +  baz (a);
 +}

 I can implement bla as

 void bla(int) { a = random (); }

 thus a would not have the same value on both paths
>>
>> I think it's hard to decide whether they have the same value or not, 
>> since we
>> cannot write a test-case that compares the results of calls from 2 
>> branches of
>> an if statement.
 void *foo ()
 {
   return __builtin_return_address (0);
 }

 void *bar (_Bool b)
 {
   if (b)
     return foo ();
   else
     return foo ();
 }

 int main()
 {
   if (bar(true) == bar(false))
     abort ();
 }

 ok ... outside of the scope of standard "C", but we certainly _can_ do 
 this.
 Which would question tail-merging the above at all, of course.

>>>
>>> I added noinline to make sure there's no variation from inlining.
>>> ...
>>> extern void abort (void) __attribute__ ((__nothrow__)) __attribute__
>>> ((__noreturn__));
>>>
>>> __attribute__((noinline))
>>> void *foo ()
>>> {
>>>  return __builtin_return_address (0);
>>> }
>>>
>>> __attribute__((noinline))
>>> void *bar (int b)
>>> {
>>>  if (b)
>>>    return foo ();
>>>  else
>>>    return foo ();
>>> }
>>>
>>> int main()
>>> {
>>>  void *a, *b;
>>>  a = bar (1);
>>>  b = bar (0);
>>>  if (a == b)
>>>    abort ();
>>>  return 0;
>>> }
>>> ...
>>>
>>> This test-case passes with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
>>> -fno-crossjumping
>>> ...
>>>
>>> and fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge 
>>> -fcrossjumping
>>> ...
>>>
>>> with the proposed patch, it also fails with:
>>> ...
>>> gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge 
>>> -fno-crossjumping
>>> ...
>>>
>>> Tree-tail-merge performs the same transformation as cross-jumping for this
>>> test-case. I think that the transformation is valid, and that the test-case
>>> behaves as expected. Subsequent calls to foo may or may not return the same
>>> value, depending on the optimizations, we can't assert a != b, that's just 
>>> the
>>> nature of __builtin_return_address.
>>>
>>> Btw, this example is not what I meant when I said 'a test-case that 
>>> compares the
>>> results of calls from 2 branches of an if statement'. I tried to say that 
>>> the
>>> semantics of C is defined in terms of taking either 1 branch or the other,
>>> rather than executing both in parallel, after which both results would be
>>> available. From that point of view, this example compares subsequent calls 
>>> to
>>> foo, not parallel.
>>
>> Ah, I see.  Btw, I was not suggesting that we should not optimize the above.
>>
>> I started looking at the problem in terms of subsequent calls. Consider 
>> the
>> imaginary attribute nosideeffect, similar to const and pure.
>>
>> const:
>> - no effects except the return value
>> - the return value depends only on the parameters
>>
>> pure:
>> - no effects except the return value
>> - the return value depends only on the parameters and/or global variables
>>
>> nosideeffect:
>> - no effects except the return value
>>
>> The code fragment:
>> ...
>> extern int f (int) __attribute ((nosideeffect));
>> int g()
>> {
>>  int a;
>>  int b;
>>  a = f (5);
>>  b = f (5);
>>  return a + b;
>> }
>> ...
>>
>> would result in a gimple sequence more or less like this:
>> ...
>>  # VUSE <.MEMD.1712_4(D)>
>>  aD.1707_1 = fD.1704 (5);
>>  # VUSE <.MEMD.1712_4(D)>
>>  bD.1708_2 = fD.1704 (5);
>>  D.1710_3 = aD.1707_1 + bD.1708_2;
>>  # VUSE <.MEMD.1712_6>
>>  return D.1710_3;
>> ...
>>
>> The value numbering modification I'm proposing would say that a and b 
>> have the
>> same value, which is not true. I updated the patch to ensure in this 
>> case that a
>> and b would be seen as different.
>> Note that things won't break if the attribute is missing on a function, 
>> since
>> that just means that the function would have a vdef, and there would be 
>> no 2
>> subsequent function calls with the same vuse.
>>
 - but that is not what
>>>

Re: Continue strict-volatile-bitfields fixes

2012-04-27 Thread Richard Guenther
On Fri, Apr 27, 2012 at 10:29 AM, Jakub Jelinek  wrote:
> On Fri, Apr 27, 2012 at 12:42:41PM +0800, Thomas Schwinge wrote:
>> > > GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
>> > > VOIDmode.)
>> > >
>> > > Is emmitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
>> > > or should a later optimization pass be able to figure out that
>> > > »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
>> > > common->code, and then be able to conflate these?  Any suggestions
>> > > where/how to tackle this?
>> >
>> > The BIT_FIELD_REF is somewhat of a red-herring.  It is created by 
>> > fold-const.c
>> > in optimize_bit_field_compare, code that I think should be removed 
>> > completely.
>> > Or it needs to be made aware of strict-volatile bitfield and C++ memory 
>> > model
>> > details.
>
> I'd actually very much prefer the latter, just disable
> optimize_bit_field_compare for strict-volatile bitfield mode and when
> avoiding load data races in C++ memory model (that isn't going to be
> default, right?).  This optimization is useful, and it is solely about
> loads, so even C++ memory model usually shouldn't care.

But won't it re-introduce bugs like PR52080, 52097 or 48124?  Also the
proper place for this optimization is lowering and CSEing of bit-field loads
(not that this lowering already exists).

Well, I'm obviously biased here - fold doing this kind of transform looks
out-of-place.

Richard.

>        Jakub


Re: [C11-atomic] [patch] gimple atomic statements

2012-04-27 Thread Richard Guenther
On Thu, Apr 26, 2012 at 10:07 PM, Andrew MacLeod  wrote:
> On 04/05/2012 05:14 AM, Richard Guenther wrote:
>>
>> + static inline bool
>> + gimple_atomic_has_fail_order (const_gimple gs)
>> + {
>> +   return gimple_atomic_kind (gs) == GIMPLE_ATOMIC_COMPARE_EXCHANGE;
>> + }
>>
>> btw, these kind of predicates look superfluous to me - if they are true
>> exactly for one atomic kind then users should use the predicate to
>> test for that specific atomic kind, not for some random field presence.
>
>
> yeah, a previous incarnation artifact, removed.
>
>>
>> All asserts in inline functions should be gcc_checking_asserts.
>
> indeed.
>
>>
>>
>> you should have a gimple_build_atomic_raw function that takes all
>> operands and the atomic kind, avoiding the need for all the repeated
>> calls of gimple_atomic_set_* as well as avoid all the repeated checking
>> this causes.
>
> as well. done.
>
>
>>
>> +   else if (is_gimple_atomic (stmt))
>> +     {
>> +       tree t;
>> +       if (visit_store)
>> +         {
>> +         for (i = 0; i<  gimple_atomic_num_lhs (stmt); i++)
>> +           {
>> +             t = gimple_atomic_lhs (stmt, i);
>> +             if (t)
>> +               {
>> +                 t = get_base_loadstore (t);
>> +                 if (t)
>> +                   ret |= visit_store (stmt, t, data);
>> +               }
>>
>> I thought results are always registers?  The walk_stmt_load_store_addr_ops
>> looks wrong to me.  As the loads/stores are implicit (you only have
>> addresses)
>> you have to adjust all callers of walk_stmt_load_store_addr_ops to handle
>> atomics specially as they expect to come along all loads/stores that way.
>> Those also handle calls and asms (for "memory" clobber) specially.
>
>
> hmm, yeah they always return a value.     I was just copying the gimple_call
> code...  Why would we need to do this processing  for a GIMPLE_CALL lhs and
> not a GIMPLE_ATOMIC lhs?

GIMPLE_CALL lhs can be memory if the call returns an aggregate, similar
GIMPLE_CALL arguments can be memory if they are aggregate.

Can this be the case for atomics as well?

>   And the RHS processing is the same as for a
> is_gimple_call as well...  it was lifted from the code immediately
> following.,..  I tried to write all this code to return the same values as
> would have been returned had it actually been a  built-in __sync or __atomic
> call like we have today.  Once I had them all, then we could actually make
> any improvements based on our known side effect limits.
>
> I guess I don't understand the rest of the comment about why I need to do
> something different here than with a call...

Well, all callers that use walk_stmt_load_store/addr_ops need to handle
non-explicit loads/stores represented by the stmt.  For calls this includes
the loads and stores the callee may perform.  For atomics this includes  ?
(depends on whether the operand of an atomic-load is a pointer or an object,
I suppose it is a pointer for the following) For atomic this includes the
implicit load of *{address operand} which will _not_ be returned by
walk_stmt_load_store/addr_ops.  Thus callers that expect to see all loads/stores
(like ipa-pure-const.c) need to explicitly handle atomics similar to how they
handle calls and asms (if only in a very conservative way).  Similar the
alias oracle needs to handle them (obviously).

>> Index: tree-ssa-alias.c
>> ===
>> *** tree-ssa-alias.c    (revision 186098)
>> --- tree-ssa-alias.c    (working copy)
>> *** ref_maybe_used_by_stmt_p (gimple stmt, t
>> *** 1440,1445 
>> --- 1440,1447 
>>       }
>>     else if (is_gimple_call (stmt))
>>       return ref_maybe_used_by_call_p (stmt, ref);
>> +   else if (is_gimple_atomic (stmt))
>> +     return true;
>>
>> please add a comment before these atomic handlings that we assume
>> they are using/clobbering all refs because they are considered memory
>> optimization barriers.  Btw, does this apply to non-address-taken
>> automatic
>> references?  I suppose not.  Consider:
>>
>> int foo()
>> {
>>   struct s;
>>   atomic_fence();
>>   s.a = 1;
>>   atomic_fence();
>>   return s.a;
>> }
>>
>> we still should be able to optimize this to return 1, no?  At least SRA
>> will
>
> yes, locals can do anything they want since they aren't visible to other
> processes.  At the moment, we'll leave those fences in because we don't
> optimize atomics at all, but  "in the fullness of time" this will be
> optimized to:
> int foo()
> {
>  atomic_fence()
>  return 1;
> }
>
> at the moment we produce:
>
> int foo()
> {
>  atomic_fence()
>  atomic_fence()
>  return 1;
>
> }

Which is inconsistent with the alias-oracle implementation.  Please fix it
to at _least_ not consider non-aliased locals.  You can look at the call
handling for how to do this - where you can also see how to do even better
by using points-to information from the pointer operand of the atomics
that specify the memory lo

Re: [C11-atomic] [patch] gimple atomic statements

2012-04-27 Thread Richard Guenther
On Thu, Apr 26, 2012 at 7:53 PM, Andrew MacLeod  wrote:
> On 04/05/2012 05:14 AM, Richard Guenther wrote:
>>
>>
>> Ok.  Remember that you should use non-tree things if you can in GIMPLE
>> land.  This probably means that both the size and the memmodel "operands"
>> should be
>>
>> + struct GTY(()) gimple_statement_atomic
>> + {
>> +   /* [ WORD 1-8 ]  */
>> +   struct gimple_statement_with_memory_ops_base membase;
>> +
>> +   /* [ WORD 9 ] */
>> +   enum gimple_atomic_kind kind;
>> +
>>      enum gimple_atomic_memmodel memmodel;
>>
>>      unsigned size;
>>
>> and not be trees in the ops array.  Even more, both kind and memmodel
>> should probably go in membase.base.subcode
>
>
> I'm using subcode now for the fetch_op and op_fetch operations when we
> actually need a tree code for plus, sub, and, etc...  so I am utilizing that
> field. Since the 'kind' field is present in ALL nodes, and the tree code was
> only needed in some, it looked cleaner to utilize the subcode field for the
> 'sometimes' field so that it wouldn't be an obvious wasted field in all the
> non-fetch nodes.  In the end it amounts to the same thing, but just looked
> cleaner :-)   I could change it if you felt strongly about it and use
> subcode for the kind and create a tree_code field in the object for the
> operation.
>
> Since integral atomics are always of an unsigned type, I could switch over
> and use 'unsigned size' instead of 'tree fntype' for them (I will rename
> it), but then things may  be more complicated when dealing with generic
> atomics...  those can be structure or array types and I was planning to
> allow leaving the type in case I discover something useful I can do with it.
>  It may ultimately turn out that the real type isn't going to matter, in
> which case I will remove it and replace it with an unsigned int for size.

So it eventually will support variable-size types?

> And the reason memmodel is a tree is because, as ridiculous as it seems, it
> can ultimately be a runtime value.    Even barring that, it shows up as a
> variable after inlining before various propagation engines run, especially
> in the  C++ templates.  So it must be a tree.

Ick.  That sounds gross.  So, if it ends up a variable at the time we generate
assembly we use a "conservative" constant value for it?

>
>>> I was actually thinking about doing it during gimplification... I hadnt
>>> gotten as far as figuring out what to do with the functions from the
>>> front
>>> end yet.  I dont know that code well, but I was in fact hoping there was
>>> a
>>> way to 'recognize' the function names easily and avoid built in functions
>>> completely...
>>
>> Heh ... you'd still need a GENERIC representation then.  Possibly
>> a ATOMIC_OP tree may do.
>
> possibly...   or maybe a single generic atomic_builtin with a kind and a
> variable list of  parameters.
>
>
>> well, the names must remain exposed and recognizable since they are 'out
>> there'.  Maybe under the covers I can just leave them as normal calls and
>> then during gimplification simply recognize the names and generate
>> GIMPLE_ATOMIC statements directly from the CALL_EXPR.  That would be
>> ideal.
>>  That way there are no builtins any more.
>> I suppose my question was whether the frontends need to do sth about the
>> __atomic keyword or if that is simply translated to some type flag - or
>> is that keyword applying to operations, not to objects or types?
>>
> The _Atomic keyword is a type modifier like const or volatile.  So during
> gimplification I'll also look for all occurrences of variables in normal
> expressions which have that bit set in the type,  then translate the
> expression to utilize the new gimple atomic node.  so
>
> _Atomic int var = 0;
>  var += 4;
>  foo (var);
>
> would become
>
>  __atomic_add_fetch (&var, 4, SEQ_CST);
>  D.123 = __atomic_load (&var, SEQ_CST);
> foo (D.123);

Hmm, ok.  So you are changing GENERIC in fact.  I suppose _Atomic is
restricted to global variables.  Can I attach _Atomic to allocated storage
via pointer casts?

Consider writing up semantics of _Atomic into generic.texi please.

>
>
>
>
>>>
> So bottom line, a GIMPLE_ATOMIC statement is just an object that is
> much
> easier to work with.

 Yes, I see that it is easier to work with for you.  All other statements
 will
 see GIMPLE_ATOMICs as blockers for their work though, even if they
 already deal with calls just fine - that's why I most of the time
 suggest
 to use builtins (or internal fns) to do things (I bet you forgot to
 update
 enough predicates ...).  Can GIMPLE_ATOMICs throw with
 -fnon-call-exceptions?
 I suppose yes.  One thing you missed at least ;)
>>>
>>>
>>> Not that I am aware of, they are 'noexcept'.  But I'm sure I've missed
>>> more
>>> than a few things so far.  I'm pretty early in the process :-)
>>
>> What about compare-exchange on a pointer dereference? The pointer
>> dereference surely can trap, so it can throw w

Re: [PATCH][ARM] NEON DImode immediate constants

2012-04-27 Thread Andrew Stubbs

Ping.

On 10/04/12 14:00, Andrew Stubbs wrote:

Ping.

On 30/03/12 12:15, Andrew Stubbs wrote:

On 28/02/12 16:20, Andrew Stubbs wrote:

Hi all,

This patch implements 64-bit immediate constant loads in NEON.

The current state is that you can load const_vector, but not const_int.
This is clearly not ideal. The result is a constant pool entry when it's
not necessary.

The patch disables the movdi_vfp patterns for loading DImode values, if
the operand is const_int and NEON is enabled, and extends the neon_mov
pattern to include DImode const_int, as well as the const_vector
operands. I've modified neon_valid_immediate only enough to accept
const_int input - the logic remains untouched.


That patch failed to bootstrap successfully, but this updated patch
bootstraps and tests with no regressions.

OK?

Andrew







Re: combine_conversions int->double->int

2012-04-27 Thread Marc Glisse

On Fri, 27 Apr 2012, Richard Guenther wrote:


Do you have a copyright assignment on file?


Yes.

--
Marc Glisse


Re: Continue strict-volatile-bitfields fixes

2012-04-27 Thread Jakub Jelinek
On Fri, Apr 27, 2012 at 12:42:41PM +0800, Thomas Schwinge wrote:
> > > GET_MODE_BITSIZE (lmode)« (8 bits).  (With the current sources, lmode is
> > > VOIDmode.)
> > >
> > > Is emmitting »BIT_FIELD_REF <*common, 32, 0> & 255« wrong in this case,
> > > or should a later optimization pass be able to figure out that
> > > »BIT_FIELD_REF <*common, 32, 0> & 255« is in fact the same as
> > > common->code, and then be able to conflate these?  Any suggestions
> > > where/how to tackle this?
> > 
> > The BIT_FIELD_REF is somewhat of a red-herring.  It is created by 
> > fold-const.c
> > in optimize_bit_field_compare, code that I think should be removed 
> > completely.
> > Or it needs to be made aware of strict-volatile bitfield and C++ memory 
> > model
> > details.

I'd actually very much prefer the latter, just disable
optimize_bit_field_compare for strict-volatile bitfield mode and when
avoiding load data races in C++ memory model (that isn't going to be
default, right?).  This optimization is useful, and it is solely about
loads, so even C++ memory model usually shouldn't care.

Jakub


Re: [C] improve missing initializers diagnostics

2012-04-27 Thread Manuel López-Ibáñez
On 25 April 2012 18:08, Manuel López-Ibáñez  wrote:
> On 25 April 2012 16:46, H.J. Lu  wrote:
>> On Sat, Apr 21, 2012 at 4:58 AM, Manuel López-Ibáñez
>>  wrote:
>>> This patch improves missing initializers diagnostics. From:
>>>
>>> pr36446.c:13:3: warning: missing initializer [-Wmissing-field-initializers]
>>>   .h = {1},
>>>   ^
>>> pr36446.c:13:3: warning: (near initialization for ‘m0.h.b’)
>>> [-Wmissing-field-initializers]
>>>   .h = {1},
>>>   ^
>>>
>>> to:
>>>
>>> pr36446.c:13:3: warning: missing initializer for field ‘b’ of ‘struct
>>> h’ [-Wmissing-field-initializers]
>>>   .h = {1},
>>>   ^
>>> pr36446.c:3:7: note: ‘b’ declared here
>>>   int b;
>>>       ^
>>>
>>> Bootstrapped/regression tested.
>>>
>>> OK?
>>>
>>>
>>> 2012-04-19  Manuel López-Ibáñez  
>>>
>>>        * c-typeck.c (pop_init_level): Improve diagnostics.
>>> testsuite/
>>>        * gcc.dg/m-un-2.c: Update.
>>>        * gcc.dg/20011021-1.c: Update.
>>
>> On Linux/x86, revision 186808 gave me:
>>
>> FAIL: gcc.dg/20011021-1.c  (test for warnings, line 34)
>> FAIL: gcc.dg/20011021-1.c  (test for warnings, line 41)
>> FAIL: gcc.dg/20011021-1.c  (test for warnings, line 44)
>> FAIL: gcc.dg/20011021-1.c (test for excess errors)
>> FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 27)
>> FAIL: gcc.dg/20011021-1.c near init (test for warnings, line 30)
>> FAIL: gcc.dg/m-un-2.c (test for excess errors)
>> FAIL: gcc.dg/m-un-2.c warning regression 2 (test for warnings, line 12)
>> FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 14)
>> FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 7)
>> FAIL: gcc.dg/missing-field-init-2.c  (test for warnings, line 8)
>> FAIL: gcc.dg/missing-field-init-2.c (test for excess errors)
>>
>> Revision 186806 is OK.
>
> Somehow I committed a broken version of the patch. It should have been this:
>
>
> --- gcc/c-typeck.c      (revision 186821)
> +++ gcc/c-typeck.c      (working copy)
> @@ -7063,11 +7063,11 @@ pop_init_level (int implicit, struct obs
>            if (warning_at (input_location, OPT_Wmissing_field_initializers,
>                            "missing initializer for field %qD of %qT",
>                            constructor_unfilled_fields,
>                            constructor_type))
>              inform (DECL_SOURCE_LOCATION (constructor_unfilled_fields),
> -                     "%qT declared here", constructor_unfilled_fields);
> +                     "%qD declared here", constructor_unfilled_fields);
>          }
>     }
>
>
> I'll commit as soon as it finishes bootstrapping+testing.

Committed as revision 186896.

Cheers,

Manuel.


Re: combine_conversions int->double->int

2012-04-27 Thread Richard Guenther
On Thu, Apr 26, 2012 at 8:43 PM, Marc Glisse  wrote:
> On Thu, 26 Apr 2012, Richard Guenther wrote:
>
>> On Wed, Apr 25, 2012 at 3:58 PM, Marc Glisse  wrote:
>>>
>>> Here is take 2 on this patch, which seems cleaner. Bootstrapped and
>>> regression tested.
>>>
>>> gcc/ChangeLog
>>>
>>> 2012-04-25  Marc Glisse  
>>>
>>>        PR middle-end/27139
>>>        * tree-ssa-forwprop.c (combine_conversions): Handle INT->FP->INT.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> 2012-04-25  Marc Glisse  
>>>
>>>        PR middle-end/27139
>>>        * gcc.dg/tree-ssa/forwprop-18.c: New test.
>>>
>>>
>>> In my patch, the lines with gimple_assign_* are vaguely guessed from what
>>> is
>>> around, I don't pretend to understand them.
>>
>>
>> ;)
>>
>> The patch looks good to me - on second thought, folding the case back
>> into the if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
>> block to benefit from the local vars therein makes sense (but as a
>> separate sub-if (), as it is now).
>
>
> Like the attached? (c,c++ bootstrapped, make -k check, no regression)
>
> I changed the types in the testcase to make it really unlikely that a
> platform needs to skip it (it would require a long double that can't
> represent all signed char, or a float that can represent all unsigned long
> long).

Yes.  Do you have a copyright assignment on file?

Thanks,
Richard.

> --
> Marc Glisse


Re: [patch] Simplify tree-switch-conversion.c a bit - prep work for gimple switch lowering

2012-04-27 Thread Richard Guenther
On Thu, Apr 26, 2012 at 7:48 PM, Steven Bosscher  wrote:
> Hello,
>
> The attached patch re-organizes some code in tree-switch-conversion.c.
> All information about a GIMPLE_SWITCH is now collected by one
> function, so that my switch lowering code can use the same
> switch_conv_info. Bootstrapped & tested on x86_64-unknown-linux-gnu.
> OK for trunk?

Ok.

Thanks,
Richard.

> Ciao!
> Steven


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-27 Thread Xinliang David Li
On Fri, Apr 27, 2012 at 12:07 AM, Igor Zamyatin  wrote:
> Are you sure that tree-level unrollers are turned on at O2? My
> impression was that they work only at O3 or with f[unroll,peel]-loops
> flags.

Yes, they are on, but they only have an effect on tiny loops with very
small trip counts. With -O3 or with -funroll-loops/-fpeel-loops, the
size is allowed to grow.

David

>
> On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen  wrote:
>> tejohn...@google.com (Teresa Johnson) writes:
>>
>>> This patch adds heuristics to limit unrolling in loops with branches
>>> that may increase branch mispredictions. It affects loops that are
>>> not frequently iterated, and that are nested within a hot region of code 
>>> that already contains many branch instructions.
>>>
>>> Performance tested with both internal benchmarks and with SPEC
>>> 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a 
>>> couple of different AMD Opteron systems.
>>> This improves performance of an internal search indexing benchmark by
>>> close to 2% on all the tested Intel platforms.  It also consistently
>>> improves 445.gobmk (with FDO feedback where unrolling kicks in) by
>>> close to 1% on AMD Opteron. Other performance effects are neutral.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?
>>
>> One problem with any unrolling heuristics is currently that gcc has
>> both the tree level and the rtl level unroller. The tree one is even
>> on at -O3.  So if you tweak anything for one you have to affect both,
>> otherwise the other may still do the wrong thing(tm).
>
> Tree-level unrollers (cunrolli and cunroll) do complete unrolling. At -O2,
> both of them are turned on, but gcc does not allow any code growth --
> which makes them pretty useless at -O2 (very few loops qualify). The
> default max complete peel iteration count is also too low compared with
> both icc and llvm.  This needs to be tuned.
>
> David
>
>>
>> For some other tweaks I looked into a shared cost model some time ago.
>> May be still needed.
>>
>> -Andi
>>
>> --
>> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-27 Thread Igor Zamyatin
Are you sure that tree-level unrollers are turned on at O2? My
impression was that they work only at O3 or with f[unroll,peel]-loops
flags.

On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen  wrote:
> tejohn...@google.com (Teresa Johnson) writes:
>
>> This patch adds heuristics to limit unrolling in loops with branches
>> that may increase branch mispredictions. It affects loops that are
>> not frequently iterated, and that are nested within a hot region of code 
>> that already contains many branch instructions.
>>
>> Performance tested with both internal benchmarks and with SPEC
>> 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge) and a 
>> couple of different AMD Opteron systems.
>> This improves performance of an internal search indexing benchmark by
>> close to 2% on all the tested Intel platforms.  It also consistently
>> improves 445.gobmk (with FDO feedback where unrolling kicks in) by
>> close to 1% on AMD Opteron. Other performance effects are neutral.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?
>
> One problem with any unrolling heuristics is currently that gcc has
> both the tree level and the rtl level unroller. The tree one is even
> on at -O3.  So if you tweak anything for one you have to affect both,
> otherwise the other may still do the wrong thing(tm).

Tree-level unrollers (cunrolli and cunroll) do complete unrolling. At -O2,
both of them are turned on, but gcc does not allow any code growth --
which makes them pretty useless at -O2 (very few loops qualify). The
default max complete peel iteration count is also too low compared with
both icc and llvm.  This needs to be tuned.

David

>
> For some other tweaks I looked into a shared cost model some time ago.
> May be still needed.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only