Re: trans-mem: virtual ops for gimple_transaction

2012-02-10 Thread Richard Guenther
On Thu, 9 Feb 2012, Richard Henderson wrote:

  From: rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org
  the __transaction_atomic  // SUBCODE=[ GTMA_HAVE_STORE ] statement
  looks like an overly optimistic way to start a transaction, at a quick glance.
 
 Indeed.  At some point this worked, but this may have gotten lost
 during one of the merges.  Now,
 
  # .MEM_8 = VDEF <.MEM_7(D)>
   __transaction_relaxed  // SUBCODE=[ ... ]
 
 Bootstrapped and tested on x86_64.  Ok?

Yes.

But ...

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index ccdf14a..ace9ef9 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -965,6 +965,13 @@ propagate_necessity (struct edge_list *el)
mark_aliased_reaching_defs_necessary (stmt, op);
}
}
+ else if (gimple_code (stmt) == GIMPLE_TRANSACTION)
+   {
+ /* The beginning of a transaction is a memory barrier.  */
+ /* ??? If we were really cool, we'd only be a barrier
+for the memories touched within the transaction.  */
+ mark_all_reaching_defs_necessary (stmt);
+   }
  else
gcc_unreachable ();

hints that this code might not expect virtual operands on this ...

What is the reason to keep a GIMPLE_TRANSACTION stmt after
TM lowering instead of lowering it to a builtin function call?  It seems
the body is empty after lowering (what's the label thing?).
I'd have arranged to lower

  __transaction_atomic
{
  {
x[9] = 1;
  }
}

to

  __builtin_transaction_atomic (GTMA_HAVE_STORE, label);
  try
{
  x[9] = 1;
}
  finally
{
  __builtin__ITM_commitTransaction ();
}

Eventually returning/passing a token to link a transaction start
to its commit / abort.
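
A sketch of that token idea (invented names - tm_start_token and
tm_commit_token are hypothetical stand-ins, not existing GCC builtins):

  /* Hypothetical illustration of the token-passing scheme.  */
  extern void *tm_start_token (unsigned subcode);  /* start, returns token */
  extern void tm_commit_token (void *token);       /* commit consumes it */

  int x[16];

  void f (void)
  {
    void *token = tm_start_token (1u /* GTMA_HAVE_STORE */);
    x[9] = 1;
    tm_commit_token (token);
  }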

That said, I expect some more fallout from the patch, but I suppose
TM is still experimental enough that we can fixup things later.

Richard.


Re: Go patch committed: Build math library with -funsafe-math-optimizations

2012-02-10 Thread Richard Guenther
On Thu, Feb 9, 2012 at 6:32 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:

 On Wed, Feb 8, 2012 at 8:38 PM, Ian Lance Taylor i...@google.com wrote:
 The master Go math library uses assembler code on 386 processors to take
 advantage of 387 instructions.  This patch lets gccgo do the same thing,
 by compiling the math library with -funsafe-math-optimizations.  I also
 pass -mfancy-math-387, although that is the default.  It would not be
 appropriate to compile all Go code with -funsafe-math-optimizations, of
 course, but the math library is designed to handle it.

 Huh ... I'd rather not do that if I were you.  Instead I'd say we lack a
 machine specific flag to enable the fancy-x87-math patterns which
 then -funsafe-math-optimizations should enable.  The x87 math
 routines are the only thing you are after, right?  No math-library
 can be _safe_ against -funsafe-math-optimizations I believe.

 Yes, that approach would make sense, but this doesn't seem like the
 right time to do it.

 The -funsafe-math-optimizations option does not permit arbitrary
 behaviour.  It merely permits a set of optimizations which violate
 strict IEEE conformance.  I believe the Go math library can be safe in
 the presence of those optimizations, because the library does explicit
 checks for NaN and infinity, where necessary, before it does the actual
 operation.  The math library has a fairly extensive set of tests,
 including tests of exceptional conditions, and it passes the tests when
 using -funsafe-math-optimizations.  Note that I'm only using
 -funsafe-math-optimizations on x86.

I see.  -funsafe-math-optimizations affects precision though (it doesn't
assume NaNs or Infinities do not happen), see the docs - it enables
-fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
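
As a concrete illustration of the precision point (a minimal sketch;
the function scale is hypothetical), -freciprocal-math alone already
licenses this rewrite:

  /* With -funsafe-math-optimizations (which enables -freciprocal-math),
     GCC may turn the division below into a[i] * (1.0 / d) - the same
     math on paper, but not bit-identical to IEEE division.  */
  void
  scale (double *a, int n, double d)
  {
    for (int i = 0; i < n; i++)
      a[i] = a[i] / d;
  }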

Richard.

 Ian


PATCH: break lines in announce_function

2012-02-10 Thread Basile Starynkevitch

Hello All,

When running cc1 in verbose mode, announce_function from gcc/toplev.c
displays the name of each function.  However, it never emits any newline
characters.  As a result, the stderr stream may contain very long lines,
which is visually unpleasant and tends to block Emacs when running cc1 -v
inside an Emacs buffer (or a gdb session inside Emacs).

The attached patch (backported from the MELT branch, where it is very
useful) fixes this issue by adding a newline once every eight identifiers.

 gcc/ChangeLog entry

2012-02-10  Basile Starynkevitch  bas...@starynkevitch.net

* toplev.c (announce_function): Emit newline periodically.




Ok for trunk?

Thanks

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 184084)
+++ gcc/toplev.c	(working copy)
@@ -234,6 +234,12 @@ announce_function (tree decl)
 {
   if (!quiet_flag)
 {
+
+  static long count;
+  count++;
+  if (count % 8 == 0)
+    putc ('\n', stderr);
+
   if (rtl_dump_and_exit)
  	fprintf (stderr, "%s ",
  		 identifier_to_locale (IDENTIFIER_POINTER (DECL_NAME (decl))));


Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result

2012-02-10 Thread Richard Guenther
On Fri, Feb 10, 2012 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/2/9 Richard Guenther richard.guent...@gmail.com:
 Works apart from

 Running target unix/
 FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test
 FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test

 Maybe invalid testcases.  Who knows ... same fails happen with your patch.

 Richard.

 Richard.

 Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some
 uses of '*__i++ = *__i;' and '*__i-- = *__i;', which seem to cause
 part of this failure.

 I am not sure whether such constructs have well-defined behavior in
 C++, but they look like undefined behavior to me.

 A C-testcase for the issue would be:

 int *foo (int *p)
 {
  *p++ = *p;
  return p;
 }

 which, with the patch, now produces:

 foo (int * p)
 {
  int * D.1363;
  int D.1364;
  int * D.1365;

  D.1363 = p;
  p = D.1363 + 4;
  D.1364 = *p;
  *D.1363 = D.1364;
  D.1365 = p;
  return D.1365;
 }

 but in the old variant we produced:

 foo (int * p)
 {
  int D.1363;
  int * D.1364;

  D.1363 = *p;
  *p = D.1363;
  p = p + 4;
  D.1364 = p;
  return D.1364;
 }

 So maybe the real solution for this issue is to swap the order of
 lhs/rhs gimplification for assignments instead.

Well, that would certainly not be suitable for stage4 (though I have a working
patch for that as well).  The post-modify gimplification change looks better
as it also fixes the volatile aggregate-return case which would not be fixed
by re-ordering of the gimplification.

libstdc++ folks - can you investigate the testsuite failure?

The patch in question is at
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00435/fix-pr48814

Note that the _Mutable_ForwardIteratorConcept isn't the problem I think,
it's not executed code.

Thanks,
Richard.


 Regards,
 Kai


[PATCH] Fix PR52193

2012-02-10 Thread Richard Guenther

Committed as obvious.

Richard.

2012-02-10  Richard Guenther  rguent...@suse.de

PR translation/52193
* cgraphunit.c (cgraph_mark_functions_to_output): Fix typo.

Index: cgraphunit.c
===
--- cgraphunit.c(revision 184085)
+++ cgraphunit.c(working copy)
@@ -1430,14 +1430,16 @@ cgraph_mark_functions_to_output (void)
 	  tree decl = node->decl;
 	  if (!node->global.inlined_to
 	      && gimple_has_body_p (decl)
-	      /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
-		 are inside partition, we can end up not removing the body since we no longer
-		 have analyzed node pointing to it.  */
+	      /* FIXME: in an ltrans unit when the offline copy is outside a
+		 partition but inline copies are inside a partition, we can
+		 end up not removing the body since we no longer have an
+		 analyzed node pointing to it.  */
 	      && !node->in_other_partition
 	      && !DECL_EXTERNAL (decl))
 	    {
 	      dump_cgraph_node (stderr, node);
-	      internal_error ("failed to reclaim unneeded functionin same comdat group");
+	      internal_error ("failed to reclaim unneeded function in same "
+			      "comdat group");
 	    }
}
 #endif


[PATCH, ARM] Properly count number of instructions emitted.

2012-02-10 Thread Roman Zhuykov
This week I found that GCC trunk fails to compile the test 186.crafty
from SPECINT2000 on ARM.  Compiling the file validate.c leads to this:
/tmp/ccXFsLlG.s: Assembler messages:
/tmp/ccXFsLlG.s:3411: Error: bad immediate value for offset (4112)
/tmp/ccXFsLlG.s:7069: Error: bad immediate value for offset (4096)

I found the patch which introduced the problem:
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01249.html
Ramana, you forgot to set count = 2 in one case.
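
(For context, a sketch of why the stale count matters - hypothetical
caller shape, not the actual arm.c code: the routine is also called
with emit == false purely to count instructions for insn length
computation, and a path that forgets to set *count leaves the caller
with a stale length, which eventually yields out-of-range offsets like
the ones above.)

  /* Hypothetical measuring caller (GCC-internal types, not standalone).  */
  static int
  move_double_length (rtx *operands)
  {
    int count = 0;
    output_move_double (operands, /*emit=*/false, &count);
    return count * 4;  /* length in bytes; wrong if *count stayed stale */
  }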

This patch fixes the problem successfully.

Cross-compiler regtest showed no new failures. All SPECINT2000 tests
work correctly.

Ok for trunk?

2012-02-10  Roman Zhuykov  zhr...@ispras.ru

* config/arm/arm.c (output_move_double): In one case properly
count number of instructions that will be emitted.
---
 gcc/config/arm/arm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e2ab102..7f0dc6b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14205,6 +14205,9 @@ output_move_double (rtx *operands, bool emit,
int *count)
output_asm_insn ("sub%?\t%0, %1, %2", otherops);
}

+ if (count)
+   *count = 2;
+
  if (TARGET_LDRD)
return "ldr%(d%)\t%0, [%1]";

-- 
Roman Zhuykov
zhr...@ispras.ru


Re: [PATCH, ARM] Properly count number of instructions emitted.

2012-02-10 Thread Richard Earnshaw
On 10/02/12 11:39, Roman Zhuykov wrote:
 This week I found that GCC trunk fails to compile the test 186.crafty
 from SPECINT2000 on ARM.  Compiling the file validate.c leads to this:
 /tmp/ccXFsLlG.s: Assembler messages:
 /tmp/ccXFsLlG.s:3411: Error: bad immediate value for offset (4112)
 /tmp/ccXFsLlG.s:7069: Error: bad immediate value for offset (4096)
 
 I found the patch which introduced the problem:
 http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01249.html
 Ramana, you forgot to set count = 2 in one case.
 
 This patch fixes the problem successfully.
 
 Cross-compiler regtest showed no new failures. All SPECINT2000 tests
 work correctly.
 
 Ok for trunk?
 
 2012-02-10  Roman Zhuykov  zhr...@ispras.ru
 
 * config/arm/arm.c (output_move_double): In one case properly
 count number of instructions that will be emitted.

OK.

R.



PR middle-end/48600

2012-02-10 Thread Jan Hubicka
Hi,
this PR is about a bug in my update to predict_paths_for_bb when handling
abnormal edges.  Abnormal edges cannot be predicted, so the function
instead tries to predict the paths leading to the source BB of the edge.
This, however, may lead to an infinite loop when two regions are mutually
reachable through abnormal edges.
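
The fix threads a visited bitmap through the recursion; a minimal
sketch of the idea (hypothetical types, not the GCC code):

  #include <stdbool.h>

  struct block { int index; int n_preds; struct block **preds; };

  /* Mark each block before recursing into it, so two regions that
     reach each other only through abnormal edges cannot send the
     walk into an endless ping-pong.  */
  static void
  walk_paths (struct block *bb, bool *visited)
  {
    for (int i = 0; i < bb->n_preds; i++)
      {
        struct block *src = bb->preds[i];
        if (visited[src->index])
          continue;               /* already seen: break the cycle */
        visited[src->index] = true;
        walk_paths (src, visited);
      }
  }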

Bootstrapped/regtested x86_64-linux. Will commit it shortly.
My apologies for ignoring the PR for so long.

Honza

PR middle-end/48600
* predict.c (predict_paths_for_bb): Prevent looping.
(predict_paths_leading_to_edge, predict_paths_leading_to): Update.
* g++.dg/torture/pr48600.C: New testcase.
Index: gcc/testsuite/g++.dg/torture/pr48600.C
===
*** gcc/testsuite/g++.dg/torture/pr48600.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr48600.C  (revision 0)
***
*** 0 
--- 1,16 
+ /* { dg-do compile } */
+ 
+ class mx {
+ public:
+ mx();
+ };
+ 
+ int main()
+ {
+   while (true) {
+   mx *bar = new mx;
+   mx *baz = new mx;
+   continue;
+   }
+   return 0;
+ }
Index: gcc/predict.c
===
*** gcc/predict.c   (revision 184016)
--- gcc/predict.c   (working copy)
*** tree_estimate_probability_driver (void)
*** 1827,1833 
  static void
  predict_paths_for_bb (basic_block cur, basic_block bb,
  enum br_predictor pred,
! enum prediction taken)
  {
edge e;
edge_iterator ei;
--- 1827,1834 
  static void
  predict_paths_for_bb (basic_block cur, basic_block bb,
  enum br_predictor pred,
! enum prediction taken,
! bitmap visited)
  {
edge e;
edge_iterator ei;
*** predict_paths_for_bb (basic_block cur, b
*** 1848,1854 
continue;
gcc_assert (bb == cur || dominated_by_p (CDI_POST_DOMINATORS, cur, bb));
  
!   /* See if there is how many edge from e->src that is not abnormal
 and does not lead to BB.  */
FOR_EACH_EDGE (e2, ei2, e->src->succs)
if (e2 != e
--- 1849,1855 
continue;
gcc_assert (bb == cur || dominated_by_p (CDI_POST_DOMINATORS, cur, bb));
  
!   /* See if there is an edge from e->src that is not abnormal
 and does not lead to BB.  */
FOR_EACH_EDGE (e2, ei2, e->src->succs)
if (e2 != e
*** predict_paths_for_bb (basic_block cur, b
*** 1861,1876 
  
/* If there is non-abnormal path leaving e->src, predict edge
 using predictor.  Otherwise we need to look for paths
!leading to e->src.  */
if (found)
  predict_edge_def (e, pred, taken);
!   else
!   predict_paths_for_bb (e->src, e->src, pred, taken);
  }
for (son = first_dom_son (CDI_POST_DOMINATORS, cur);
 son;
 son = next_dom_son (CDI_POST_DOMINATORS, son))
! predict_paths_for_bb (son, bb, pred, taken);
  }
  
  /* Sets branch probabilities according to PREDiction and
--- 1862,1881 
  
/* If there is non-abnormal path leaving e->src, predict edge
 using predictor.  Otherwise we need to look for paths
!leading to e->src.
! 
!The second may lead to infinite loop in the case we are predicting
!regions that are only reachable by abnormal edges.  We simply
!prevent visiting given BB twice.  */
if (found)
  predict_edge_def (e, pred, taken);
!   else if (bitmap_set_bit (visited, e->src->index))
!   predict_paths_for_bb (e->src, e->src, pred, taken, visited);
  }
for (son = first_dom_son (CDI_POST_DOMINATORS, cur);
 son;
 son = next_dom_son (CDI_POST_DOMINATORS, son))
! predict_paths_for_bb (son, bb, pred, taken, visited);
  }
  
  /* Sets branch probabilities according to PREDiction and
*** static void
*** 1880,1886 
  predict_paths_leading_to (basic_block bb, enum br_predictor pred,
  enum prediction taken)
  {
!   predict_paths_for_bb (bb, bb, pred, taken);
  }
  
  /* Like predict_paths_leading_to but take edge instead of basic block.  */
--- 1885,1893 
  predict_paths_leading_to (basic_block bb, enum br_predictor pred,
  enum prediction taken)
  {
!   bitmap visited = BITMAP_ALLOC (NULL);
!   predict_paths_for_bb (bb, bb, pred, taken, visited);
!   BITMAP_FREE (visited);
  }
  
  /* Like predict_paths_leading_to but take edge instead of basic block.  */
*** predict_paths_leading_to_edge (edge e, e
*** 1903,1909 
break;
}
if (!has_nonloop_edge)
! predict_paths_for_bb (bb, bb, pred, taken);
else
  predict_edge_def (e, pred, taken);
  }
--- 1910,1920 
break;
}
if (!has_nonloop_edge)
! {
!   bitmap visited = BITMAP_ALLOC (NULL);
!   predict_paths_for_bb (bb, bb, pred, taken, visited);

Re: PR middle-end/48600

2012-02-10 Thread Jakub Jelinek
On Fri, Feb 10, 2012 at 01:00:16PM +0100, Jan Hubicka wrote:

Does this belong to the patch?  It isn't mentioned in the ChangeLog,
nor do I see how it is related...

 *** libcpp/macro.c(revision 184016)
 --- libcpp/macro.c(working copy)
 *** tokens_buff_last_token_ptr (_cpp_buff *b
 *** 1878,1884 
  If VIRT_LOCS_BUFF is non-NULL,  it should point at the buffer
  containing the virtual locations of the tokens in TOKENS_BUFF; in
  which case the function updates that buffer as well.   */
 ! static inline void
   tokens_buff_remove_last_token (_cpp_buff *tokens_buff)
   
   {
 --- 1878,1884 
  If VIRT_LOCS_BUFF is non-NULL,  it should point at the buffer
  containing the virtual locations of the tokens in TOKENS_BUFF; in
  which case the function updates that buffer as well.   */
 ! static void
   tokens_buff_remove_last_token (_cpp_buff *tokens_buff)
   
   {

Jakub


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Richard Guenther
On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Following is a revision of yesterday's PR50031 patch submission,
 modified per Richard's comments.  Bootstrapped and tested with no
 regressions on powerpc64-linux.  I've confirmed the same performance
 improvements in SPEC.  OK for trunk?

Some more questions - maybe Jakub can clarify as well given he worked
on patterns recently ...

 Thanks,
 Bill


 2012-02-09  Bill Schmidt  wschm...@linux.vnet.ibm.com
            Ira Rosen  i...@il.ibm.com

        PR tree-optimization/50031
        * targhooks.c (default_builtin_vectorization_cost): Handle
        vec_promote_demote.
        * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
        * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
        all types of reduction and pattern statements.
        (vect_estimate_min_profitable_iters): Likewise.
        * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
        (vect_get_load_cost): Use vec_perm for permutations; add dump logic
        for explicit realigns.
        (vectorizable_conversion): Call vect_model_promotion_demotion_cost.
        * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
        vec_promote_demote.
        * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
        vec_perm for VSX and handle vec_promote_demote.


 Index: gcc/targhooks.c
 ===
 --- gcc/targhooks.c     (revision 183944)
 +++ gcc/targhooks.c     (working copy)
 @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost
       case scalar_to_vec:
       case cond_branch_not_taken:
       case vec_perm:
 +      case vec_promote_demote:
         return 1;

       case unaligned_load:
 Index: gcc/target.h
 ===
 --- gcc/target.h        (revision 183944)
 +++ gcc/target.h        (working copy)
 @@ -145,7 +145,8 @@ enum vect_cost_for_stmt
   scalar_to_vec,
   cond_branch_not_taken,
   cond_branch_taken,
 -  vec_perm
 +  vec_perm,
 +  vec_promote_demote
  };

  /* The target structure.  This holds all the backend hooks.  */
 Index: gcc/tree-vect-loop.c
 ===
 --- gcc/tree-vect-loop.c        (revision 183944)
 +++ gcc/tree-vect-loop.c        (working copy)
 @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf
           if (stmt_info
                && !STMT_VINFO_RELEVANT_P (stmt_info)
                && (!STMT_VINFO_LIVE_P (stmt_info)
  -                  || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def))
  +                  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
  +              && !STMT_VINFO_IN_PATTERN_P (stmt_info))
             continue;

Why would we exclude !relevant stmts when they are part of a pattern?
We are looking at a scalar iteration cost, so all stmts that are not dead
count, no?


           if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
 @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info
        {
          gimple stmt = gsi_stmt (si);
          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
 +
 +         /* Translate the last statement in a pattern into the
 +            related replacement statement.  */
 +         if (STMT_VINFO_IN_PATTERN_P (stmt_info))
 +           {
 +             stmt = STMT_VINFO_RELATED_STMT (stmt_info);
 +             stmt_info = vinfo_for_stmt (stmt);
 +           }

So here we are translating stmt to the main scalar pattern stmt - and thus
count it as many times as we have stmts in that pattern?  That looks wrong.
More like

  if (STMT_VINFO_IN_PATTERN_P (stmt_info)
      && STMT_VINFO_RELATED_STMT (stmt_info) != stmt)
    continue;

?  Does the main stmt have the flag set and point to itself?

          /* Skip stmts that are not vectorized inside the loop.  */
          if (!STMT_VINFO_RELEVANT_P (stmt_info)
               (!STMT_VINFO_LIVE_P (stmt_info)
 -                 || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def))
 +                 || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE 
 (stmt_info
            continue;
 +

... and then, what does STMT_VINFO_INSIDE_OF_LOOP_COST of
that main pattern stmt represent?  Shouldn't it represent the cost
of the whole sequence, and thus ...

          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
          /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
             some of the outside costs are generated inside the outer-loop.  */
          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
  +          if (is_pattern_stmt_p (stmt_info)
  +              && STMT_VINFO_PATTERN_DEF_SEQ (stmt_info))
 +            {
 +             

Re: [SMS] Support new loop pattern

2012-02-10 Thread Roman Zhuykov
Ping.
Ayal, please review this patch and these three patches too:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

--
Roman Zhuykov
zhr...@ispras.ru


Re: PR middle-end/48600

2012-02-10 Thread Jan Hubicka
 On Fri, Feb 10, 2012 at 01:00:16PM +0100, Jan Hubicka wrote:
 
 Does this belong to the patch?  It isn't mentioned in the ChangeLog,
 nor do I see how it is related...

No, it does not.  I diffed the toplevel dir and this is a hack for some warning
that got stuck there for months.  It is not what I committed.  Sorry for the mix-up.

Honza


Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result

2012-02-10 Thread Andreas Schwab
Kai Tietz ktiet...@googlemail.com writes:

 Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some
 use of '*__i++ = *__i;' and '*__i-- = *__i;', which seems to cause
 part of this failure.

I don't think those __constraints functions are ever executed.  They are
only present to assert the presence of certain operations at compile
time.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result

2012-02-10 Thread Jonathan Wakely
On 10 February 2012 10:35, Richard Guenther wrote:
 Works apart from

 Running target unix/
 FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test
 FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test

What does libstdc++.log show for those failures?


Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result

2012-02-10 Thread Richard Guenther
On Fri, Feb 10, 2012 at 2:12 PM, Jonathan Wakely jwakely@gmail.com wrote:
 On 10 February 2012 10:35, Richard Guenther wrote:
 Works apart from

 Running target unix/
 FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test
 FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test

 What does libstdc++.log show for those failures?

spawn [open ...]^M
<?xml version = "1.0"?>
<test>
<sd value = "1328880244">
</sd>
<n value = "5000">
</n>
<m value = "1">
</m>
<tp value = "0.2">
</tp>
<ip value = "0.6">
</ip>
<ep value = "0.2">
</ep>
<cp value = "0.001">
</cp>
<mp value = "0.25">
</mp>
</test>
<cntnr name = "pat_trie_map">
<desc>
<type value = "trie">
<Tag value = "pat_trie_tag">
</Tag>
<Node_Update value = "null_node_update">
</Node_Update>
</type>
</desc>
<progress>

*FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test

spawn [open ...]^M
<?xml version = "1.0"?>
<test>
<sd value = "1328880487">
</sd>
<n value = "5000">
</n>
<m value = "1">
</m>
<tp value = "0.2">
</tp>
<ip value = "0.6">
</ip>
<ep value = "0.2">
</ep>
<cp value = "0.001">
</cp>
<mp value = "0.25">
</mp>
</test>
<cntnr name = "pat_trie_set">
<desc>
<type value = "trie">
<Tag value = "pat_trie_tag">
</Tag>
<Node_Update value = "null_node_update">
</Node_Update>
</type>
</desc>
<progress>

**FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread William J. Schmidt
Richard, thanks.  I can answer most of your questions, but for the last
one I will have to ask Ira to weigh in.

On Fri, 2012-02-10 at 13:06 +0100, Richard Guenther wrote:
 On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  Following is a revision of yesterday's PR50031 patch submission,
  modified per Richard's comments.  Bootstrapped and tested with no
  regressions on powerpc64-linux.  I've confirmed the same performance
  improvements in SPEC.  OK for trunk?
 
 Some more questions - maybe Jakub can clarify as well given he worked
 on patterns recently ...
 
  Thanks,
  Bill
 
 
  2012-02-09  Bill Schmidt  wschm...@linux.vnet.ibm.com
 Ira Rosen  i...@il.ibm.com
 
 PR tree-optimization/50031
 * targhooks.c (default_builtin_vectorization_cost): Handle
 vec_promote_demote.
 * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
 * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
 all types of reduction and pattern statements.
 (vect_estimate_min_profitable_iters): Likewise.
 * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New 
  function.
 (vect_get_load_cost): Use vec_perm for permutations; add dump logic
 for explicit realigns.
 (vectorizable_conversion): Call vect_model_promotion_demotion_cost.
 * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
 vec_promote_demote.
 * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
 * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
 vec_perm for VSX and handle vec_promote_demote.
 
 
  Index: gcc/targhooks.c
  ===
  --- gcc/targhooks.c (revision 183944)
  +++ gcc/targhooks.c (working copy)
  @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost
case scalar_to_vec:
case cond_branch_not_taken:
case vec_perm:
  +  case vec_promote_demote:
  return 1;
 
case unaligned_load:
  Index: gcc/target.h
  ===
  --- gcc/target.h(revision 183944)
  +++ gcc/target.h(working copy)
  @@ -145,7 +145,8 @@ enum vect_cost_for_stmt
scalar_to_vec,
cond_branch_not_taken,
cond_branch_taken,
  -  vec_perm
  +  vec_perm,
  +  vec_promote_demote
   };
 
   /* The target structure.  This holds all the backend hooks.  */
  Index: gcc/tree-vect-loop.c
  ===
  --- gcc/tree-vect-loop.c(revision 183944)
  +++ gcc/tree-vect-loop.c(working copy)
  @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf
  if (stmt_info
      && !STMT_VINFO_RELEVANT_P (stmt_info)
      && (!STMT_VINFO_LIVE_P (stmt_info)
  -          || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def))
  +          || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
  +      && !STMT_VINFO_IN_PATTERN_P (stmt_info))
  continue;
 
 Why would we exclude !relevant stmts when they are part of a pattern?
 We are looking at a scalar iteration cost, so all stmts that are not dead
 count, no?

As I understand it, we're at a point where a statement replacing the
pattern exists but has not yet been inserted in the code stream.  All of
the statements in the pattern are marked irrelevant, but the related
statement of the main (last) statement of the pattern is relevant.  Thus
we need to allow the main statement through this check so the
replacement statement can be found and counted.

 
 
if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
  @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info
 {
   gimple stmt = gsi_stmt (si);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  +
  + /* Translate the last statement in a pattern into the
  +related replacement statement.  */
  + if (STMT_VINFO_IN_PATTERN_P (stmt_info))
  +   {
  + stmt = STMT_VINFO_RELATED_STMT (stmt_info);
  + stmt_info = vinfo_for_stmt (stmt);
  +   }
 
 So here we are translating stmt to the main scalar pattern stmt - and thus
 count it as many times as we have stmts in that pattern?  That looks wrong.
 More like
 
   if (STMT_VINFO_IN_PATTERN_P (stmt_info)
       && STMT_VINFO_RELATED_STMT (stmt_info) != stmt)
     continue;
 
 ?  Does the main stmt have the flag set and point to itself?

From the commentary at the end of tree-vect-patterns.c, only the main
statement in the pattern (the last one) is flagged as
STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
statement which has been created but not yet inserted in the code.  It
only gets counted once.

 
   /* 

Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Jakub Jelinek
On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote:
 From the commentary at the end of tree-vect-patterns.c, only the main
 statement in the pattern (the last one) is flagged as
 STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
 statement which has been created but not yet inserted in the code.  It
 only gets counted once.

STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not
just the last, but all original stmts that are being replaced by the
pattern).  Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt
(last one corresponding to that original stmt).  If needed, further
pattern stmts for that original stmts are put into
STMT_VINFO_PATTERN_DEF_SEQ.  The pattern matcher could e.g. match
3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT
on the first original stmt, then say 2 pattern stmts into
STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one
STMT_VINFO_RELATED_STMT and finally one pattern stmt through
STMT_VINFO_RELATED_STMT on the third original stmt.
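
(A toy model of that linkage with invented names - not GCC's
stmt_vec_info - three original stmts replaced, the second one carrying
two extra pattern stmts in its def-seq:)

  #include <stdio.h>

  struct stmt_info
  {
    const char *name;
    int in_pattern_p;          /* models STMT_VINFO_IN_PATTERN_P */
    const char *related;       /* models STMT_VINFO_RELATED_STMT */
    const char *def_seq[2];    /* models STMT_VINFO_PATTERN_DEF_SEQ */
  };

  int main (void)
  {
    struct stmt_info orig[3] = {
      { "orig1", 1, "pat1", { 0, 0 } },
      { "orig2", 1, "pat4", { "pat2", "pat3" } },
      { "orig3", 1, "pat5", { 0, 0 } },
    };
    for (int i = 0; i < 3; i++)
      {
        printf ("%s -> related %s", orig[i].name, orig[i].related);
        for (int j = 0; j < 2 && orig[i].def_seq[j]; j++)
          printf (", def-seq %s", orig[i].def_seq[j]);
        printf ("\n");
      }
    return 0;
  }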

 Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
 this section has to be omitted if we backport it (which is desirable
 since the degradation was introduced in 4.6).  Removing it apparently
 does not affect the sphinx3 degradation.

Yeah, when backporting you can basically assume that
STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead
of itt (and simplify), because no pattern recognizers needed more than one
pattern stmt for each original stmt back then.

Jakub


Re: Go patch committed: Multiplex goroutines onto OS threads

2012-02-10 Thread Rainer Orth
Ian Lance Taylor i...@google.com writes:

 Ian Lance Taylor i...@google.com writes:

 Right now it looks like there is a bug, or at least an incompatibility,
 in the 64-bit versions of getcontext and setcontext.  It looks like
 calling setcontext on the 32-bit version does not change the value of
 TLS variables, which is also the case on GNU/Linux.  In the 64-bit
 version, calling setcontext does change the value of TLS variables.
 That is, on the 64-bit version, getcontext preserves the value of TLS
 variables and setcontext restores the old value.  (Of course it's really
 the pointer, not the TLS variables themselves).  This incompatibility
 has to be a bug, and of course I would prefer that the incompatibility
 be resolved in favor of the implementation used on GNU/Linux.

 Well, I thought I could fix this in a sensible way, but I really
 couldn't.  It's a fairly serious bug.  Calling setcontext in thread T2
 when getcontext was called in thread T1 causes T2 to start using T1's
 TLS variables, T1's pthread_getspecific values, and even T1's
 pthread_self.  I couldn't figure out any way that T2 could do anything
 thread-specific.

 So I cheated and wrote a system-specific hack.  With this patch applied,
 I see only 4 failures running the testsuite on x86/x86_64 Solaris.
 Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu, where the
 code is not used.  Committed to mainline.

Excellent, works like a charm, thanks.  The patches for Solaris 10 and
11 are expected within the next month or two.  I only noticed two
problems:

* On Solaris 8/x86 with native TLS, SETCONTEXT_CLOBBERS_TLS was
  incorrectly defined, causing a proc.c compilation failure:

/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:105:4: error: #error unknown 
case for SETCONTEXT_CLOBBERS_TLS
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_gogo':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:299:2: error: implicit 
declaration of function 'fixcontext' [-Werror=implicit-function-declaration]
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 
'runtime_schedinit':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:366:2: error: implicit 
declaration of function 'initcontext' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[4]: *** [proc.lo] Error 1

  The configure test failed like this:

configure:14894: ./conftest
ld.so.1: conftest: fatal: conftest: object requires TLS, but TLS failed to initi
alize
/vol/gcc/src/hg/trunk/local/libgo/configure[20]: 28491 Killed

  The problem is that on Solaris 8 one needs to link with -pthread which
  takes care of finding the correct libthread.so.  The following patch
  takes care of that and gives decent testsuite results.

* In the cross-compilation case, one needs to test both the
  i?86-*-solaris2.1[01] and x86_64-*-solaris2.1[01] target triplets, but
  only for 64-bit.  The first is the default, while one can configure
  for the second if need be.  The patch contains the necessary change,
  adapted from libgcc/configure.ac's type size detection.  Besides, the
  bug affects both Solaris 10 and 11, so the workaround should trigger
  on either, too.

Rainer


diff --git a/libgo/configure.ac b/libgo/configure.ac
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -584,8 +584,12 @@ fi
 dnl See whether setcontext changes the value of TLS variables.
 AC_CACHE_CHECK([whether setcontext clobbers TLS variables],
 [libgo_cv_lib_setcontext_clobbers_tls],
-[LIBS_hold=$LIBS
+[CFLAGS_hold=$CFLAGS
+CFLAGS="$PTHREAD_CFLAGS"
+LIBS_hold=$LIBS
 LIBS="$LIBS $PTHREAD_LIBS"
+AC_CHECK_SIZEOF([void *])
+AS_VAR_ARITH([ptr_type_size], [$ac_cv_sizeof_void_p \* 8])
 AC_RUN_IFELSE(
   [AC_LANG_SOURCE([
 #include <pthread.h>
@@ -650,11 +654,14 @@ main ()
 ])],
 [libgo_cv_lib_setcontext_clobbers_tls=no],
 [libgo_cv_lib_setcontext_clobbers_tls=yes],
-[case $target in
-  x86_64*-*-solaris2.10) libgo_cv_lib_setcontext_clobbers_tls=yes ;;
-  *) libgo_cv_lib_setcontext_clobbers_tls=no ;;
+[case $target:$ptr_type_size in
+  i?86-*-solaris2.1[[01]]:64 | x86_64*-*-solaris2.1[[01]]:64)
+libgo_cv_lib_setcontext_clobbers_tls=yes ;;
+  *)
+libgo_cv_lib_setcontext_clobbers_tls=no ;;
  esac
 ])
+CFLAGS="$CFLAGS_hold"
 LIBS="$LIBS_hold"
 ])
 if test $libgo_cv_lib_setcontext_clobbers_tls = yes; then

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Release virtual SSA names properly

2012-02-10 Thread Richard Guenther

I've played with

Index: tree-ssa.c
===
--- tree-ssa.c  (revision 184088)
+++ tree-ssa.c  (working copy)
@@ -944,6 +944,11 @@ verify_ssa (bool check_modified_stmt)
  if (!gimple_nop_p (stmt))
{
  basic_block bb = gimple_bb (stmt);
+ if (!bb)
+   {
+ error ("Unreleased SSA name");
+ goto err;
+   }
  verify_def (bb, definition_block,
  name, stmt, !is_gimple_reg (name));
 
and noticed we do not properly free a lot of virtual SSA names, leaking
both SSA names, stmts and operands they point to over time.

The following patch cures the cases I hit with running the tree-ssa.exp
testsuite.  More issues exist, and I guess we should make a more
verbose variant of the above the default in 4.8 to catch these issues
(there is a slight issue with this when verify_ssa is invoked during
loop transforms and the vectorizer has pattern stmts built that are
not in a basic block ... so maybe we should put the above into some
other verifier).

Bootstrapped on x86_64-unknown-linux-gnu, testing is currently in
progress.

I'll commit this unless somebody thinks it's a really bad idea
at this point (I do not have numbers for the amount of memory
we save - once we have fully compiled a function and we do not
need its body for further inlining we throw it away and with it
all leaked SSA names).
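
For reference, the idiom the patch installs at each site (a sketch
using GCC-internal API, not standalone code):

  static void
  remove_stmt_releasing_ssa (gimple_stmt_iterator *gsi)
  {
    gimple stmt = gsi_stmt (*gsi);
    unlink_stmt_vdef (stmt);  /* detach the .MEM_x = VDEF <...> chain */
    gsi_remove (gsi, true);   /* remove the stmt from its block */
    release_defs (stmt);      /* recycle the SSA names it defined */
  }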

Thanks,
Richard.

2012-02-10  Richard Guenther  rguent...@suse.de

* tree-nrv.c (tree_nrv): Release VDEFs.
* tree-inline.c (expand_call_inline): Likewise.
* tree-sra.c (sra_modify_constructor_assign): Likewise.
(sra_modify_assign): Likewise.
* tree-vect-stmts.c (vect_remove_stores): Likewise.
* tree-vect-loop.c (vect_transform_loop): Likewise.
* tree-ssa-dom.c (optimize_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp): Likewise.
* tree-ssa-math-opts.c (execute_cse_sincos): Likewise.

Index: gcc/tree-nrv.c
===
--- gcc/tree-nrv.c  (revision 184088)
+++ gcc/tree-nrv.c  (working copy)
@@ -244,6 +244,7 @@ tree_nrv (void)
{
  unlink_stmt_vdef (stmt);
  gsi_remove (gsi, true);
+ release_defs (stmt);
}
  else
{
Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 184088)
+++ gcc/tree-inline.c   (working copy)
@@ -4002,6 +4002,9 @@ expand_call_inline (basic_block bb, gimp
 
   /* Unlink the calls virtual operands before replacing it.  */
   unlink_stmt_vdef (stmt);
+  if (gimple_vdef (stmt)
+      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
+    release_ssa_name (gimple_vdef (stmt));
 
   /* If the inlined function returns a result that we care about,
  substitute the GIMPLE_CALL with an assignment of the return
@@ -4043,7 +4046,7 @@ expand_call_inline (basic_block bb, gimp
}
}
   else
-gsi_remove (stmt_gsi, true);
+   gsi_remove (stmt_gsi, true);
 }
 
   if (purge_dead_abnormal_edges)
Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  (revision 184088)
+++ gcc/tree-sra.c  (working copy)
@@ -2858,6 +2858,7 @@ sra_modify_constructor_assign (gimple *s
{
  unlink_stmt_vdef (*stmt);
  gsi_remove (gsi, true);
+ release_defs (*stmt);
  return SRA_AM_REMOVED;
}
   else
@@ -2881,6 +2882,7 @@ sra_modify_constructor_assign (gimple *s
   init_subtree_with_zero (acc, gsi, false, loc);
   unlink_stmt_vdef (*stmt);
   gsi_remove (gsi, true);
+  release_defs (*stmt);
   return SRA_AM_REMOVED;
 }
   else
@@ -3125,6 +3127,7 @@ sra_modify_assign (gimple *stmt, gimple_
  gsi_next (gsi);
  unlink_stmt_vdef (*stmt);
  gsi_remove (orig_gsi, true);
+ release_defs (*stmt);
  sra_stats.deleted++;
  return SRA_AM_REMOVED;
}
@@ -3145,6 +3148,7 @@ sra_modify_assign (gimple *stmt, gimple_
  gcc_assert (*stmt == gsi_stmt (*gsi));
  unlink_stmt_vdef (*stmt);
  gsi_remove (gsi, true);
+ release_defs (*stmt);
  sra_stats.deleted++;
  return SRA_AM_REMOVED;
}
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 184088)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -5596,7 +5596,9 @@ vect_remove_stores (gimple first_stmt)
next = STMT_VINFO_RELATED_STMT (stmt_info);
   /* Free the attached stmt_vec_info and remove the stmt.  */
   next_si = gsi_for_stmt (next);
+  unlink_stmt_vdef (next);
   gsi_remove (next_si, true);
+  

Re: Go patch committed: Multiplex goroutines onto OS threads

2012-02-10 Thread Ian Lance Taylor
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 * On Solaris 8/x86 with native TLS, SETCONTEXT_CLOBBERS_TLS was
   incorrectly defined, causing a proc.c compilation failure:

 /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:105:4: error: #error unknown 
 case for SETCONTEXT_CLOBBERS_TLS
 /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_gogo':
 /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:299:2: error: implicit 
 declaration of function 'fixcontext' [-Werror=implicit-function-declaration]
 /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 
 'runtime_schedinit':
 /vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:366:2: error: implicit 
 declaration of function 'initcontext' [-Werror=implicit-function-declaration]
 cc1: all warnings being treated as errors
 make[4]: *** [proc.lo] Error 1

   The configure test failed like this:

 configure:14894: ./conftest
 ld.so.1: conftest: fatal: conftest: object requires TLS, but TLS failed to 
 initi
 alize
 /vol/gcc/src/hg/trunk/local/libgo/configure[20]: 28491 Killed

   The problem is that on Solaris 8 one needs to link with -pthread which
   takes care of finding the correct libthread.so.  The following patch
   takes care of that and gives decent testsuite results.

 * In the cross-compilation case, one needs to test both the
   i?86-*-solaris2.1[01] and x86_64-*-solaris2.1[10] target triplets, but
   only for 64-bit.  The first is the default, while one can configure
   for the second if need be.  The patch contains the necessary change,
   adapted from libgcc/configure.ac's type size detection.  Besides, the
   bug affects both Solaris 10 and 11, so the workaround should trigger
   on either, too.

Thanks.  Committed.

Ian


Re: Go patch committed: Build math library with -funsafe-math-optimizations

2012-02-10 Thread Ian Lance Taylor
Richard Guenther richard.guent...@gmail.com writes:

 On Thu, Feb 9, 2012 at 6:32 PM, Ian Lance Taylor i...@google.com wrote:
 Richard Guenther richard.guent...@gmail.com writes:

 On Wed, Feb 8, 2012 at 8:38 PM, Ian Lance Taylor i...@google.com wrote:
 The master Go math library uses assembler code on 386 processors to take
 advantage of 387 instructions.  This patch lets gccgo do the same thing,
 by compiling the math library with -funsafe-math-optimizations.  I also
 pass -mfancy-math-387, although that is the default.  It would not be
 appropriate to compile all Go code with -funsafe-math-optimizations, of
 course, but the math library is designed to handle it.

 Huh ... I'd rather not do that if I were you.  Instead I'd say we lack a
 machine specific flag to enable the fancy-x87-math patterns which
 then -funsafe-math-optimizations should enable.  The x87 math
 routines are the only thing you are after, right?  No math-library
 can be _safe_ against -funsafe-math-optimizations I believe.

 Yes, that approach would make sense, but this doesn't seem like the
 right time to do it.

 The -funsafe-math-optimizations option does not permit arbitrary
 behaviour.  It merely permits a set of optimizations which violate
 strict IEEE conformance.  I believe the Go math library can be safe in
 the presence of those optimizations, because the library does explicit
 checks for NaN and infinity, where necessary, before it does the actual
 operation.  The math library has a fairly extensive set of tests,
 including tests of exceptional conditions, and it passes the tests when
 using -funsafe-math-optimizations.  Note that I'm only using
 -funsafe-math-optimizations on x86.

 I see.  -funsafe-math-optimizations affects precision though (it doesn't
 assume NaNs or Infinities do not happen), see the docs - it enables
 -fno-signed-zeros, -fno-trapping-math, -fassociative-math and 
 -freciprocal-math.


Affecting precision is OK for the Go math library, since the master
version of the library is written to use the 80-bit 80387 registers on
386, in order to use the magic 80387 instructions.

Ian


[PATCH] Fix __atomic_{always,is}_lock_free folding (PR middle-end/52177)

2012-02-10 Thread Jakub Jelinek
Hi!

These builtins are:
DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE,
  __atomic_always_lock_free,
  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)

DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_IS_LOCK_FREE,
  __atomic_is_lock_free,
  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
therefore return bool rather than int, but apparently the
folders returned integer constants anyway, which e.g. on the following
testcase leads to checking ICEs.
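
For context, a minimal use of the two builtins (both take a size and a
pointer and yield a boolean result; the first folds to a constant):

  #include <stdbool.h>

  int x;

  bool always_free (void)
  {
    /* Compile-time query: always lock-free at x's alignment?  */
    return __atomic_always_lock_free (sizeof x, &x);
  }

  bool is_free (void)
  {
    /* May become a runtime (libatomic) call if unknown statically.  */
    return __atomic_is_lock_free (sizeof x, &x);
  }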

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2012-02-10  Jakub Jelinek  ja...@redhat.com

PR middle-end/52177
* builtins.c (fold_builtin_atomic_always_lock_free,
expand_builtin_atomic_always_lock_free,
fold_builtin_atomic_is_lock_free,
expand_builtin_atomic_is_lock_free): Return and/or test
boolean_true_node/boolean_false_node instead of
integer_one_node/integer_zero_node.

* c-c++-common/pr52177.c: New test.

--- gcc/builtins.c.jj   2012-01-30 00:10:01.0 +0100
+++ gcc/builtins.c  2012-02-10 09:37:37.719936106 +0100
@@ -5639,15 +5639,15 @@ fold_builtin_atomic_always_lock_free (tr
   /* If the object has smaller alignment, the the lock free routines cannot
  be used.  */
   if (type_align < mode_align)
-return integer_zero_node;
+return boolean_false_node;
 
   /* Check if a compare_and_swap pattern exists for the mode which represents
  the required size.  The pattern is not allowed to fail, so the existence
  of the pattern indicates support is present.  */
   if (can_compare_and_swap_p (mode, true))
-return integer_one_node;
+return boolean_true_node;
   else
-return integer_zero_node;
+return boolean_false_node;
 }
 
 /* Return true if the parameters to call EXP represent an object which will
@@ -5671,7 +5671,7 @@ expand_builtin_atomic_always_lock_free (
 }
 
   size = fold_builtin_atomic_always_lock_free (arg0, arg1);
-  if (size == integer_one_node)
+  if (size == boolean_true_node)
 return const1_rtx;
   return const0_rtx;
 }
@@ -5686,8 +5686,8 @@ fold_builtin_atomic_is_lock_free (tree a
 return NULL_TREE;
   
   /* If it isn't always lock free, don't generate a result.  */
-  if (fold_builtin_atomic_always_lock_free (arg0, arg1) == integer_one_node)
-return integer_one_node;
+  if (fold_builtin_atomic_always_lock_free (arg0, arg1) == boolean_true_node)
+return boolean_true_node;
 
   return NULL_TREE;
 }
@@ -5717,7 +5717,7 @@ expand_builtin_atomic_is_lock_free (tree
 
   /* If the value is known at compile time, return the RTX for it.  */
   size = fold_builtin_atomic_is_lock_free (arg0, arg1);
-  if (size == integer_one_node)
+  if (size == boolean_true_node)
 return const1_rtx;
 
   return NULL_RTX;
--- gcc/testsuite/c-c++-common/pr52177.c.jj 2012-02-10 09:46:27.175770110 
+0100
+++ gcc/testsuite/c-c++-common/pr52177.c2012-02-10 09:46:01.0 
+0100
@@ -0,0 +1,23 @@
+/* PR middle-end/52177 */
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-ccp" } */
+
+int *s;
+
+static inline int
+foo ()
+{
+  return sizeof (int);
+}
+
+int
+bar ()
+{
+  return __atomic_always_lock_free (foo (), s);
+}
+
+int
+baz ()
+{
+  return __atomic_is_lock_free (foo (), s);
+}

Jakub


[PATCH] Fix cgraph verification (PR middle-end/51929)

2012-02-10 Thread Jakub Jelinek
Hi!

As discussed in the PR, this is a verification ICE: if a call stmt
calls a same_body_alias (__comp_ctor) and IPA-CP decides to clone the
ctor, it clones the function with the body (__base_ctor), but
before the call stmts are updated the checking fails, because
the clone isn't a clone or former clone of the called function,
but of its alias.

Fixed by adjusting the verifier, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2012-02-10  Jakub Jelinek  ja...@redhat.com

PR middle-end/51929
* cgraphunit.c (verify_edge_corresponds_to_fndecl): If node is
a same_body_alias, also test whether e-callee isn't a former
or current clone of the decl this is a same body alias of.
(verify_cgraph_node): Fix a typo in error message.

* g++.dg/ipa/pr51929.C: New test.

--- gcc/cgraphunit.c.jj 2012-02-03 13:31:41.0 +0100
+++ gcc/cgraphunit.c2012-02-10 12:35:44.682099203 +0100
@@ -471,11 +471,17 @@ verify_edge_corresponds_to_fndecl (struc
 return false;
   node = cgraph_function_or_thunk_node (node, NULL);
 
-  if ((e->callee->former_clone_of != node->decl)
+  if ((e->callee->former_clone_of != node->decl
+       && (!node->same_body_alias
+	   || e->callee->former_clone_of != node->thunk.alias))
       /* IPA-CP sometimes redirect edge to clone and then back to the former
-	 function.  This ping-pong has to go, eventaully.  */
+	 function.  This ping-pong has to go, eventually.  */
       && (node != cgraph_function_or_thunk_node (e->callee, NULL))
-      && !clone_of_p (node, e->callee))
+      && !clone_of_p (node, e->callee)
+      /* If decl is a same body alias of some other decl, allow e->callee to be
+	 a clone of a clone of that other decl too.  */
+      && (!node->same_body_alias
+	  || !clone_of_p (cgraph_get_node (node->thunk.alias), e->callee)))
     return true;
   else
     return false;
@@ -667,7 +673,7 @@ verify_cgraph_node (struct cgraph_node *
   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
 if (ref->use != IPA_REF_ALIAS)
  {
-   error ("Alias has non-alias refernece");
+   error ("Alias has non-alias reference");
error_found = true;
  }
else if (ref_found)
--- gcc/testsuite/g++.dg/ipa/pr51929.C.jj   2012-02-10 12:36:55.792748549 
+0100
+++ gcc/testsuite/g++.dg/ipa/pr51929.C  2012-02-09 19:42:10.0 +0100
@@ -0,0 +1,32 @@
+// PR middle-end/51929
+// { dg-do compile }
+// { dg-options "-O -fno-guess-branch-probability -fipa-cp -fipa-cp-clone --param=max-inline-insns-single=25" }
+
+struct A
+{
+  A (A &, unsigned);
+  A (const char *);
+  ~A () { a1 (a4 ()); }
+  void a1 (int);
+  unsigned a2 ();
+  char *a3 ();
+  int a4 ();
+};
+
+template <typename T>
+struct B
+{
+  A b;
+  B (A &x, int y = 1) : b (x.a3 (), x.a2 ()) { if (y & 1) b.a2 (); }
+};
+
+extern template struct B<char>;
+A a1 ("foo"), a2 ("bar");
+B<char> b1 (a1), b2 (a2, 8);
+
+void
+foo ()
+{
+  A a3 ("baz");
+  B<char> b3 (a1), b4 (a3);
+}

Jakub


Re: [PATCH] Fix sibcall argument overlap checking if pretend_args_size (PR target/52129)

2012-02-10 Thread Carrot Wei
On Fri, Feb 10, 2012 at 2:13 PM, Jing Yu jin...@google.com wrote:
 On Thu, Feb 9, 2012 at 12:54 AM, Carrot Wei car...@google.com wrote:
 Hi Richard and Jakub

 Since 4.6 contains the same bug, I would like to back port it to 4.6
 branch. Could you approve it for 4.6?

 Jing and Doug

 Could you approve it for google/gcc-4_6-mobile branch?


 OK for google/gcc-4_6-mobile and gcc-4_6_2-mobile

 Jing


Bootstrapped/regtested on x86_64-linux and regtested on arm qemu.
Committed to both google/gcc-4_6-mobile and google/gcc-4_6_2-mobile.

thanks
Carrot


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread William J. Schmidt
Jakub, thanks!  Based on this, I believe the patch is correct in its
handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double
counting.

I misinterpreted what the commentary for vect_pattern_recog was saying:
I thought that all replacements were hung off of the last pattern
statement, but this was just true for the particular example, where only
one replacement statement was created.  Sorry for any confusion!

Based on the discussion, is the patch OK for trunk?

Thanks,
Bill

On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote:
 On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote:
  From the commentary at the end of tree-vect-patterns.c, only the main
  statement in the pattern (the last one) is flagged as
  STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
  statement which has been created but not yet inserted in the code.  It
  only gets counted once.
 
 STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not
 just the last, but all original stmts that are being replaced by the
 pattern).  Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt
 (last one corresponding to that original stmt).  If needed, further
 pattern stmts for that original stmt are put into
 STMT_VINFO_PATTERN_DEF_SEQ.  The pattern matcher could e.g. match
 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT
 on the first original stmt, then say 2 pattern stmts into
 STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one
 STMT_VINFO_RELATED_STMT and finally one pattern stmt through
 STMT_VINFO_RELATED_STMT on the third original stmt.
 
  Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
  this section has to be omitted if we backport it (which is desirable
  since the degradation was introduced in 4.6).  Removing it apparently
  does not affect the sphinx3 degradation.
 
 Yeah, when backporting you can basically assume that
 STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead
 of it (and simplify), because no pattern recognizers needed more than one
 pattern stmt for each original stmt back then.
 
   Jakub
 



Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Rainer Orth
Jonathan Wakely jwakely@gmail.com writes:

 On 6 February 2012 19:24, Mike Stump wrote:
 On Feb 5, 2012, at 12:26 PM, Jonathan Wakely wrote:
 PRs libstdc++/51296 and libstdc++/51906 are both caused by problems
 with the Pthreads static initializer macros such as
 PTHREAD_MUTEX_INITIALIZER.

 On Mac OS X 10.7 the PTHREAD_RECURSIVE_MUTEX_INITIALIZER is buggy.

 Thanks for all you work on this.

 Well I broke it so I had to fix it ;)

 I'm pleased to say we should now have an almost complete C++11 thread
 implementation for most POSIX platforms, with hundreds of existing
 libstdc++ tests moving from UNSUPPORTED to PASS on some secondary
 platforms (aix and darwin, maybe hpux too.)

Indeed, thanks for your hard work on this.  The following patch on top
of your previous one allows the 30_threads tests to PASS on Tru64 UNIX.

Bootstrapped without regressions on alpha-dec-osf5.1b, ok for mainline?

I've also noticed one feature of your patch that I don't like:
__GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of
pthread_{mutex,cond}_init instead of passing them on as all
other gthr-posix.h functions do.  This might lead to bad error handling,
and my previous patch (based on an older version of yours you had
attached to the PR) changed them to return int instead.  I suppose
changing this now is out of the question, so this is left as 4.8 material.

Rainer


2012-02-03  Rainer Orth  r...@cebitec.uni-bielefeld.de

libstdc++-v3:
PR libstdc++/51296
* config/os/osf/ctype_base.h,
config/os/osf/ctype_configure_char.cc,
config/os/osf/ctype_inline.h, config/os/osf/error_constants.h:
Copy from config/os/generic.
* config/os/osf/os_defines.h: Likewise.
(_GTHREAD_USE_MUTEX_INIT_FUNC, _GTHREAD_USE_COND_INIT_FUNC):
Define.
* configure.host osf*: Use os/osf for os_include_dir.

# HG changeset patch
# Parent badf67959a1e06e2f8c98df3797878e3123aad8e
Use __GTHREAD_MUTEX_INIT_FUNCTION on Tru64 UNIX (PR libstdc++/51296)

diff --git a/libstdc++-v3/config/os/generic/ctype_base.h b/libstdc++-v3/config/os/osf/ctype_base.h
copy from libstdc++-v3/config/os/generic/ctype_base.h
copy to libstdc++-v3/config/os/osf/ctype_base.h
diff --git a/libstdc++-v3/config/os/generic/ctype_configure_char.cc b/libstdc++-v3/config/os/osf/ctype_configure_char.cc
copy from libstdc++-v3/config/os/generic/ctype_configure_char.cc
copy to libstdc++-v3/config/os/osf/ctype_configure_char.cc
diff --git a/libstdc++-v3/config/os/generic/ctype_inline.h b/libstdc++-v3/config/os/osf/ctype_inline.h
copy from libstdc++-v3/config/os/generic/ctype_inline.h
copy to libstdc++-v3/config/os/osf/ctype_inline.h
diff --git a/libstdc++-v3/config/os/generic/error_constants.h b/libstdc++-v3/config/os/osf/error_constants.h
copy from libstdc++-v3/config/os/generic/error_constants.h
copy to libstdc++-v3/config/os/osf/error_constants.h
diff --git a/libstdc++-v3/config/os/generic/os_defines.h b/libstdc++-v3/config/os/osf/os_defines.h
copy from libstdc++-v3/config/os/generic/os_defines.h
copy to libstdc++-v3/config/os/osf/os_defines.h
--- a/libstdc++-v3/config/os/generic/os_defines.h
+++ b/libstdc++-v3/config/os/osf/os_defines.h
@@ -1,6 +1,6 @@
-// Specific definitions for generic platforms  -*- C++ -*-
+// Specific definitions for Tru64 UNIX  -*- C++ -*-
 
-// Copyright (C) 2000, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2000, 2009, 2010, 2012 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -33,4 +33,9 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
+// Tru64 UNIX requires using pthread_mutex_init()/pthread_cond_init() to
+// initialize non-statically allocated mutexes/condvars.
+#define _GTHREAD_USE_MUTEX_INIT_FUNC
+#define _GTHREAD_USE_COND_INIT_FUNC
+
 #endif
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -280,7 +280,7 @@ case ${host_os} in
     os_include_dir="os/bsd/netbsd"
     ;;
   osf*)
-    os_include_dir="os/generic"
+    os_include_dir="os/osf"
 # libstdc++.so relies on emutls on Tru64 UNIX, which only works with the
 # real functions implemented in libpthread.so, not with the dummies in
 # libgcc, so always pass -lpthread.


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result

2012-02-10 Thread Kai Tietz
2012/2/10 Richard Guenther richard.guent...@gmail.com:
 On Fri, Feb 10, 2012 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/2/9 Richard Guenther richard.guent...@gmail.com:
 Works apart from

 Running target unix/
 FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test
 FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test

 Maybe invalid testcases.  Who knows ... same fails happen with your patch.

 Richard.

 Richard.

 Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some
 use of '*__i++ = *__i;' and '*__i-- = *__i;', which seems to cause
 part of this failure.

 This might lead to this failure here.  I am not sure whether such
 constructs have defined behavior in C++, but it looks to me like
 undefined behavior.

 A C-testcase for the issue would be:

 int *foo (int *p)
 {
  *p++ = *p;
  return p;
 }

 which produces with patch now:

 foo (int * p)
 {
  int * D.1363;
  int D.1364;
  int * D.1365;

  D.1363 = p;
  p = D.1363 + 4;
  D.1364 = *p;
  *D.1363 = D.1364;
  D.1365 = p;
  return D.1365;
 }

 but in old variant we were producing:

 foo (int * p)
 {
  int D.1363;
  int * D.1364;

  D.1363 = *p;
  *p = D.1363;
  p = p + 4;
  D.1364 = p;
  return D.1364;
 }

 So maybe the real solution for this issue is to swap the order of
 lhs/rhs gimplification for assignments instead.

 Well, that would certainly not be suitable for stage4 (though I have a working
 patch for that as well).  The post-modify gimplification change looks better
 as it also fixes the volatile aggregate-return case which would not be fixed
 by re-ordering of the gimplification.

 libstdc++ folks - can you investigate the testsuite failure?

 The patch in question is at
 http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00435/fix-pr48814

 Note that the _Mutable_ForwardIteratorConcept isn't the problem I think,
 it's not executed code.

 Thanks,
 Richard.

Hmm, we might need the lhs/rhs flip plus the post-modify change here.  At
least that way we would avoid code differences from current behavior for
the common cases where the lhs has a post-inc/dec?
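
[Editorial aside: in '*p++ = *p' the read of p on the right-hand side is
unsequenced with respect to the increment, so the C testcase above has
undefined behavior and both gimplification orders are permitted.  A
well-defined spelling of the old ordering would be:]

  int *foo (int *p)
  {
    int tmp = *p;   /* read through the un-incremented pointer first */
    *p = tmp;
    return p + 1;
  }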

But you might be right that this patch is not suitable for stage 4.

Regards,
Kai


Re: [patch boehm-gc]: Fix PR/48514

2012-02-10 Thread Tom Tromey
 Kai == Kai Tietz ktiet...@googlemail.com writes:

Kai * include/gc_config_macros.h (GC_DLL): Define it for
Kai mingw-targets only, if
Kai we are actually in boehm-gc's build and DLL_EXPORT macro is defined.

This is ok.  Thanks.

Tom


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Richard Guenther
On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Jakub, thanks!  Based on this, I believe the patch is correct in its
 handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double
 counting.

 I misinterpreted what the commentary for vect_pattern_recog was saying:
 I thought that all replacements were hung off of the last pattern
 statement, but this was just true for the particular example, where only
 one replacement statement was created.  Sorry for any confusion!

 Based on the discussion, is the patch OK for trunk?

But you still count for each of the matched scalar stmts the cost of the
whole pattern sequence.  No?

Richard.

 Thanks,
 Bill

 On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote:
 On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote:
  From the commentary at the end of tree-vect-patterns.c, only the main
  statement in the pattern (the last one) is flagged as
  STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
  statement which has been created but not yet inserted in the code.  It
  only gets counted once.

 STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not
 just the last, but all original stmts that are being replaced by the
 pattern).  Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt
 (last one corresponding to that original stmt).  If needed, further
 pattern stmts for that original stmt are put into
 STMT_VINFO_PATTERN_DEF_SEQ.  The pattern matcher could e.g. match
 3 original stmts, and stick a single pattern stmt into 
 STMT_VINFO_RELATED_STMT
 on the first original stmt, then say 2 pattern stmts into
 STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one
 STMT_VINFO_RELATED_STMT and finally one pattern stmt through
 STMT_VINFO_RELATED_STMT on the third original stmt.

  Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
  this section has to be omitted if we backport it (which is desirable
  since the degradation was introduced in 4.6).  Removing it apparently
  does not affect the sphinx3 degradation.

 Yeah, when backporting you can basically assume that
 STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead
 of it (and simplify), because no pattern recognizers needed more than one
 pattern stmt for each original stmt back then.

       Jakub




Re: [PATCH] Release virtual SSA names properly

2012-02-10 Thread Richard Guenther
On Fri, 10 Feb 2012, Richard Guenther wrote:

 
 I've played with
 
 Index: tree-ssa.c
 ===
 --- tree-ssa.c  (revision 184088)
 +++ tree-ssa.c  (working copy)
 @@ -944,6 +944,11 @@ verify_ssa (bool check_modified_stmt)
   if (!gimple_nop_p (stmt))
 {
   basic_block bb = gimple_bb (stmt);
 + if (!bb)
 +   {
 + error ("Unreleased SSA name");
 + goto err;
 +   }
   verify_def (bb, definition_block,
   name, stmt, !is_gimple_reg (name));
  
 and noticed we do not properly free a lot of virtual SSA names, leaking
 both SSA names, stmts and operands they point to over time.
 
 The following patch cures the cases I hit with running the tree-ssa.exp
 testsuite.  More issues exist, and I guess we should make a more
 verbose variant of the above the default in 4.8 to catch these issues
 (there is a slight issue with this when verify_ssa is invoked during
 loop transforms and the vectorizer has pattern stmts built that are
 not in a basic block ... so maybe we should put the above into some
 other verifier).
 
 Bootstrapped on x86_64-unknown-linux-gnu, testing in currently in
 progress.
 
 I'll commit this unless somebody thinks it's a really bad idea
 at this point (I do not have numbers for the amount of memory
 we save - once we have fully compiled a function and we do not
 need its body for further inlining we throw it away and with it
 all leaked SSA names).

I'm not installing this now because I hit PR52198 with this change.
I'm instead queuing it for 4.8 where I can XFAIL the testcases
that are affected.
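
[Editorial summary of the idiom the queued patch adds at each
statement-removal site:]

  /* Detach the virtual operands, remove the statement, then release
     its defs so the SSA names are recycled instead of leaked.  */
  unlink_stmt_vdef (stmt);
  gsi_remove (&gsi, true);
  release_defs (stmt);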

Richard.

 2012-02-10  Richard Guenther  rguent...@suse.de
 
   * tree-nrv.c (tree_nrv): Release VDEFs.
   * tree-inline.c (expand_call_inline): Likewise.
   * tree-sra.c (sra_modify_constructor_assign): Likewise.
   (sra_modify_assign): Likewise.
   * tree-vect-stmts.c (vect_remove_stores): Likewise.
   * tree-vect-loop.c (vect_transform_loop): Likewise.
   * tree-ssa-dom.c (optimize_stmt): Likewise.
   * tree-vect-slp.c (vect_schedule_slp): Likewise.
   * tree-ssa-math-opts.c (execute_cse_sincos): Likewise.
 
 Index: gcc/tree-nrv.c
 ===
 --- gcc/tree-nrv.c(revision 184088)
 +++ gcc/tree-nrv.c(working copy)
 @@ -244,6 +244,7 @@ tree_nrv (void)
   {
 unlink_stmt_vdef (stmt);
 gsi_remove (gsi, true);
 +   release_defs (stmt);
   }
 else
   {
 Index: gcc/tree-inline.c
 ===
 --- gcc/tree-inline.c (revision 184088)
 +++ gcc/tree-inline.c (working copy)
 @@ -4002,6 +4002,9 @@ expand_call_inline (basic_block bb, gimp
  
/* Unlink the calls virtual operands before replacing it.  */
unlink_stmt_vdef (stmt);
 +  if (gimple_vdef (stmt)
 +      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
 +release_ssa_name (gimple_vdef (stmt));
  
/* If the inlined function returns a result that we care about,
   substitute the GIMPLE_CALL with an assignment of the return
 @@ -4043,7 +4046,7 @@ expand_call_inline (basic_block bb, gimp
   }
   }
else
 -gsi_remove (stmt_gsi, true);
 + gsi_remove (stmt_gsi, true);
  }
  
if (purge_dead_abnormal_edges)
 Index: gcc/tree-sra.c
 ===
 --- gcc/tree-sra.c(revision 184088)
 +++ gcc/tree-sra.c(working copy)
 @@ -2858,6 +2858,7 @@ sra_modify_constructor_assign (gimple *s
   {
 unlink_stmt_vdef (*stmt);
 gsi_remove (gsi, true);
 +   release_defs (*stmt);
 return SRA_AM_REMOVED;
   }
else
 @@ -2881,6 +2882,7 @@ sra_modify_constructor_assign (gimple *s
init_subtree_with_zero (acc, gsi, false, loc);
unlink_stmt_vdef (*stmt);
gsi_remove (gsi, true);
 +  release_defs (*stmt);
return SRA_AM_REMOVED;
  }
else
 @@ -3125,6 +3127,7 @@ sra_modify_assign (gimple *stmt, gimple_
 gsi_next (gsi);
 unlink_stmt_vdef (*stmt);
 gsi_remove (orig_gsi, true);
 +   release_defs (*stmt);
 sra_stats.deleted++;
 return SRA_AM_REMOVED;
   }
 @@ -3145,6 +3148,7 @@ sra_modify_assign (gimple *stmt, gimple_
 gcc_assert (*stmt == gsi_stmt (*gsi));
 unlink_stmt_vdef (*stmt);
 gsi_remove (gsi, true);
 +   release_defs (*stmt);
 sra_stats.deleted++;
 return SRA_AM_REMOVED;
   }
 Index: gcc/tree-vect-stmts.c
 ===
 --- gcc/tree-vect-stmts.c (revision 184088)
 +++ gcc/tree-vect-stmts.c (working copy)
 @@ -5596,7 +5596,9 @@ vect_remove_stores (gimple first_stmt)
   next = 

Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread William J. Schmidt


On Fri, 2012-02-10 at 16:22 +0100, Richard Guenther wrote:
 On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  Jakub, thanks!  Based on this, I believe the patch is correct in its
  handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double
  counting.
 
  I misinterpreted what the commentary for vect_pattern_recog was saying:
  I thought that all replacements were hung off of the last pattern
  statement, but this was just true for the particular example, where only
  one replacement statement was created.  Sorry for any confusion!
 
  Based on the discussion, is the patch OK for trunk?
 
 But you still count for each of the matched scalar stmts the cost of the
 whole pattern sequence.  No?

I need to change the commentary to make this more clear now.  My
comment: "Translate the last statement..." is incorrect; this is done
for each statement in a pattern.

Per Jakub's explanation, the replacement statements are distributed over
the original pattern statements.  Visiting STMT_VINFO_RELATED_STMT for a
statement marked STMT_VINFO_IN_PATTERN_P will find zero or one
replacement statements that should be examined.  Visiting
STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement
statements that didn't fit in the 1-1 mapping.  The point is that all of
these related statements and pattern-def-seq statements are disjoint,
and Ira's logic is ensuring that they all get examined once.

It's not a very clean way to represent the replacement of a pattern --
not how you'd do it if designing from scratch -- but I guess I can see
how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto
the existing 1-1 mapping.

Bill

 
 Richard.
 
  Thanks,
  Bill
 
  On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote:
  On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote:
   From the commentary at the end of tree-vect-patterns.c, only the main
   statement in the pattern (the last one) is flagged as
   STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
   statement which has been created but not yet inserted in the code.  It
   only gets counted once.
 
  STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not
  just the last, but all original stmts that are being replaced by the
  pattern).  Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt
  (last one corresponding to that original stmt).  If needed, further
  pattern stmts for that original stmt are put into
  STMT_VINFO_PATTERN_DEF_SEQ.  The pattern matcher could e.g. match
  3 original stmts, and stick a single pattern stmt into 
  STMT_VINFO_RELATED_STMT
  on the first original stmt, then say 2 pattern stmts into
  STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one
  STMT_VINFO_RELATED_STMT and finally one pattern stmt through
  STMT_VINFO_RELATED_STMT on the third original stmt.
 
   Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
   this section has to be omitted if we backport it (which is desirable
   since the degradation was introduced in 4.6).  Removing it apparently
   does not affect the sphinx3 degradation.
 
  Yeah, when backporting you can basically assume that
  STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL 
  instead
  of it (and simplify), because no pattern recognizers needed more than one
  pattern stmt for each original stmt back then.
 
Jakub
 
 
 



Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Jakub Jelinek
On Fri, Feb 10, 2012 at 04:22:32PM +0100, Richard Guenther wrote:
 On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  Jakub, thanks!  Based on this, I believe the patch is correct in its
  handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double
  counting.
 
  I misinterpreted what the commentary for vect_pattern_recog was saying:
  I thought that all replacements were hung off of the last pattern
  statement, but this was just true for the particular example, where only
  one replacement statement was created.  Sorry for any confusion!
 
  Based on the discussion, is the patch OK for trunk?
 
 But you still count for each of the matched scalar stmts the cost of the
 whole pattern sequence.  No?

For each stmt you should count the original stmt if it is relevant
(I think there are cases where the original stmt and the pattern stmt for
it are both relevant and emitted, didn't understand it exactly though,
but usually only the orig or only the pattern stmts are relevant),
and if STMT_VINFO_IN_PATTERN_P, count the STMT_VINFO_RELATED_STMT (if any)
and all STMT_VINFO_PATTERN_DEF_SEQ sequence stmts (if any), again, not sure
if STMT_VINFO_RELEVANT needs to be tested for each or not.
I guess look at what we actually emit if the cost model says it should be
emitted.
Neither the STMT_VINFO_PATTERN_DEF_SEQ nor STMT_VINFO_RELATED_STMT
are shared in between any orig stmts, each has its own set of stmts.
I believe if some original stmt should map to no pattern stmts, it isn't
marked as STMT_VINFO_IN_PATTERN_P at all, but as nothing uses its lhs
with the pattern stmts in place, it won't be relevant and will be ignored.
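
[Editorial note: a sketch of the counting rule described above, using the
4.7-era accessors; count_stmt_cost stands in for the relevance test plus
cost accounting and is hypothetical:]

  stmt_vec_info vinfo = vinfo_for_stmt (stmt);
  gimple_stmt_iterator si;

  /* Cost the original stmt only if it is itself relevant.  */
  if (STMT_VINFO_RELEVANT_P (vinfo))
    count_stmt_cost (stmt);

  if (STMT_VINFO_IN_PATTERN_P (vinfo))
    {
      /* The main pattern stmt hangs off STMT_VINFO_RELATED_STMT...  */
      if (STMT_VINFO_RELATED_STMT (vinfo))
        count_stmt_cost (STMT_VINFO_RELATED_STMT (vinfo));
      /* ...and any additional pattern stmts form their own sequence.  */
      for (si = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (vinfo));
           !gsi_end_p (si); gsi_next (&si))
        count_stmt_cost (gsi_stmt (si));
    }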

Jakub


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Jakub Jelinek
On Fri, Feb 10, 2012 at 09:36:01AM -0600, William J. Schmidt wrote:
 Per Jakub's explanation, the replacement statements are distributed over
 the original pattern statements.  Visiting STMT_VINFO_RELATED_STMT for a
 statement marked STMT_VINFO_IN_PATTERN_P will find zero or one
 replacement statements that should be examined.  Visiting
 STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement
 statements that didn't fit in the 1-1 mapping.  The point is that all of
 these related statements and pattern-def-seq statements are disjoint,
 and Ira's logic is ensuring that they all get examined once.
 
 It's not a very clean way to represent the replacement of a pattern --
 not how you'd do it if designing from scratch -- but I guess I can see
 how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto
 the existing 1-1 mapping.

Still in 4.6 we have just a 1-1 or 1-0 mapping (as I wrote in the previous
mail, I think those orig stmts that don't need any replacements, e.g.
if you have 2 orig stmts mapping to just one pattern stmt, are just marked
as not relevant).  Then 4.7 first added (Ira's changes)
STMT_VINFO_PATTERN_DEF_STMT, i.e. in addition to 1-1 and 1-0 there was
a possibility for 1-2 mapping.  And then I needed more than 2, but it was
too late in the game to do large changes, so we end up with 1-1+N scheme
by changing the DEF_STMT into DEF_SEQ. For 4.8 we definitely at least should
remove STMT_VINFO_RELATED_STMT as a way to represent pattern stmts,
even for 1-1 mapping the pattern stmt should go into a sequence.

Jakub


libgo patch committed: Save all registers on stack for GC scan

2012-02-10 Thread Ian Lance Taylor
This patch to the libgo garbage collector ensures that we save all the
registers on the stack when starting a GC.  Otherwise we could sometimes
miss a value held in a register while scanning the stack of the
goroutine running the GC.  This never happened on x86, but it did happen
on SPARC.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.
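
[Editorial sketch of the mechanism; scan_region and stack_base are
hypothetical, __builtin_unwind_init and __builtin_frame_address are the
GCC builtins:]

  static void
  start_gc_scan (void)
  {
    /* Force all call-saved registers to be stored in this frame so a
       conservative scan of the stack cannot miss a pointer that was
       live only in a register.  */
    __builtin_unwind_init ();
    scan_region (__builtin_frame_address (0), stack_base);
  }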

Ian

diff -r 1f6696441eeb libgo/runtime/mgc0.c
--- a/libgo/runtime/mgc0.c	Fri Feb 10 06:07:07 2012 -0800
+++ b/libgo/runtime/mgc0.c	Fri Feb 10 07:50:20 2012 -0800
@@ -936,6 +936,10 @@
 	const byte *p;
 	bool extra;
 
+	// Make sure all registers are saved on stack so that
+	// scanstack sees them.
+	__builtin_unwind_init();
+
 	// The gc is turned off (via enablegc) until
 	// the bootstrap has completed.
 	// Also, malloc gets called in the guts


[contrib] Add a commonly used flag to repro_fail (issue5650057)

2012-02-10 Thread Diego Novillo

This patch adds two common shortcuts to the failure reproducer
script.

Committed to trunk.

2012-02-10   Diego Novillo  dnovi...@google.com

* repro_fail: Add --debug and --debug-tui flags.

diff --git a/contrib/repro_fail b/contrib/repro_fail
index 8100456..c55d080 100755
--- a/contrib/repro_fail
+++ b/contrib/repro_fail
@@ -31,14 +31,26 @@
 # command, it asks which one you want.
 
 if [ $# -lt 2 ] ; then
-    echo "usage: $0 pattern file.log [additional-args]"
+    echo "usage: $0 [--debug|--debug-tui] pattern file.log [additional-args]"
     echo
     echo "Finds the 'spawn' line matching PATTERN in FILE.LOG and executes"
     echo "the command with any arguments in ADDITIONAL-ARGS."
     echo
+    echo "If --debug is used, the compiler is invoked with -wrapper gdb,--args"
+    echo "If --debug-tui is used, the compiler is invoked with -wrapper \
+gdb,--tui,--args"
     exit 1
 fi
 
+if [ "$1" == "--debug" ] ; then
+    debug_args="-wrapper gdb,--args"
+    shift
+elif [ "$1" == "--debug-tui" ] ; then
+    debug_args="-wrapper gdb,--tui,--args"
+    shift
+else
+    debug_args=""
+fi
 pattern="$1"
 logf="$2"
 shift 2
@@ -77,6 +89,6 @@ IFS=$old_IFS
 for cmd_num in $cmds_to_run ; do
     cmd="${commands[$cmd_num]}"
     set -x +e
-    $cmd "$@"
+    $cmd $debug_args "$@"
     set +x -e
 done
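
[Editorial usage sketch; the test pattern and log path are hypothetical:]

    # Re-run the spawn command that matches 'pr12345' under gdb:
    $ contrib/repro_fail --debug pr12345 gcc/testsuite/gcc/gcc.log

The matching compiler invocation is re-executed with -wrapper gdb,--args
appended, so the driver starts cc1 under gdb instead of merely
reproducing the failure.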

--
This patch is available for review at http://codereview.appspot.com/5650057


libgo patch committed: Record g0 top of stack correctly when not split

2012-02-10 Thread Ian Lance Taylor
This patch to libgo records the top of the g0 goroutine stack correctly
when not using -fsplit-stack.  Without this patch the garbage collector
would scan beyond the top of the stack when looking at the g0 stack,
which could cause segmentation violations.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.
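
[Editorial sketch of how the collector would interpret the marker; field
names follow the patch, the bounds computation is schematic:]

	byte *top;
	if(gp->gcstack_size == 0)
		// gcinitial_sp already points at the top of the stack.
		top = (byte*)gp->gcinitial_sp;
	else
		// gcinitial_sp is the bottom; the top is bottom + size.
		top = (byte*)gp->gcinitial_sp + gp->gcstack_size;
	// Conservatively scan [sp, top).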

Ian

diff -r 971ec6345e64 libgo/runtime/proc.c
--- a/libgo/runtime/proc.c	Fri Feb 10 07:50:54 2012 -0800
+++ b/libgo/runtime/proc.c	Fri Feb 10 07:52:48 2012 -0800
@@ -909,7 +909,9 @@
 	__splitstack_getcontext(&g->stack_context[0]);
 #else
 	g->gcinitial_sp = &mp;
-	g->gcstack_size = StackMin;
+	// Setting gcstack_size to 0 is a marker meaning that gcinitial_sp
+	// is the top of the stack, not the bottom.
+	g->gcstack_size = 0;
 	g->gcnext_sp = &mp;
 #endif
 	getcontext(&g->context);
@@ -1267,6 +1269,8 @@
 #else
 		sp = newg->gcinitial_sp;
 		spsize = newg->gcstack_size;
+		if(spsize == 0)
+			runtime_throw("bad spsize in __go_go");
 		newg->gcnext_sp = sp;
 #endif
 	} else {


Re: [PATCH] Fix PR50031, take 2

2012-02-10 Thread Richard Guenther
On Fri, Feb 10, 2012 at 4:36 PM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:


 On Fri, 2012-02-10 at 16:22 +0100, Richard Guenther wrote:
 On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  Jakub, thanks!  Based on this, I believe the patch is correct in its
  handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double
  counting.
 
  I misinterpreted what the commentary for vect_pattern_recog was saying:
  I thought that all replacements were hung off of the last pattern
  statement, but this was just true for the particular example, where only
  one replacement statement was created.  Sorry for any confusion!
 
  Based on the discussion, is the patch OK for trunk?

 But you still count for each of the matched scalar stmts the cost of the
 whole pattern sequence.  No?

 I need to change the commentary to make this more clear now.  My
 comment:  Translate the last statement... is incorrect; this is done
 for each statement in a pattern.

 Per Jakub's explanation, the replacement statements are distributed over
 the original pattern statements.

Ok, looking at the code in tree-vect-patterns.c it seems that
STMT_VINFO_IN_PATTERN_P is only set for the stmt that contains
the first replacement stmt in STMT_VINFO_RELATED_STMT and further
stmts in that stmts STMT_VINFO_PATTERN_DEF_SEQ.  Other
stmts that were matched do not have STMT_VINFO_IN_PATTERN_P set
(but their STMT_VINFO_RELATED_STMT points to the stmt that has
STMT_VINFO_IN_PATTERN_P set).  Possibly the other scalar stmts
participating in the pattern are marked as not relevant (couldn't spot
this quickly).

So it seems that your patch should indeed work as-is.

Thus, ok, if Jakub doesn't spot another error.

Thanks,
Richard.

  Visiting STMT_VINFO_RELATED_STMT for a
 statement marked STMT_VINFO_IN_PATTERN_P will find zero or one
 replacement statements that should be examined.  Visiting
 STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement
 statements that didn't fit in the 1-1 mapping.  The point is that all of
 these related statements and pattern-def-seq statements are disjoint,
 and Ira's logic is ensuring that they all get examined once.

 It's not a very clean way to represent the replacement of a pattern --
 not how you'd do it if designing from scratch -- but I guess I can see
 how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto
 the existing 1-1 mapping.

 Bill


 Richard.

  Thanks,
  Bill
 
  On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote:
  On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote:
   From the commentary at the end of tree-vect-patterns.c, only the main
   statement in the pattern (the last one) is flagged as
   STMT_VINFO_IN_PATTERN_P.  So this is finding the new replacement
   statement which has been created but not yet inserted in the code.  It
   only gets counted once.
 
  STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not
  just the last, but all original stmts that are being replaced by the
  pattern).  Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt
  (last one corresponding to that original stmt).  If needed, further
  pattern stmts for that original stmt are put into
  STMT_VINFO_PATTERN_DEF_SEQ.  The pattern matcher could e.g. match
  3 original stmts, and stick a single pattern stmt into 
  STMT_VINFO_RELATED_STMT
  on the first original stmt, then say 2 pattern stmts into
  STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one
  STMT_VINFO_RELATED_STMT and finally one pattern stmt through
  STMT_VINFO_RELATED_STMT on the third original stmt.
 
   Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so
   this section has to be omitted if we backport it (which is desirable
   since the degradation was introduced in 4.6).  Removing it apparently
   does not affect the sphinx3 degradation.
 
  Yeah, when backporting you can basically assume that
  STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL 
  instead
  of it (and simplify), because no pattern recognizers needed more than one
  pattern stmt for each original stmt back then.
 
        Jakub
 
 




[PATCH] c/52190 update __atomic documentation

2012-02-10 Thread Andrew MacLeod


Updated the extra parameter to __atomic_{is,always}_lock_free, and bkoz
noted that the __atomic_compare_exchange documentation did not document 
the 'weak' parameter.


Fixed as such, seems to format fine.  OK for mainline?

Andrew


PR c/52190
* doc/extend.texi : Update comments for __atomic_compare_exchange and
__atomic_{is,always}_lock_free.

Index: doc/extend.texi
===
*** doc/extend.texi (revision 183969)
--- doc/extend.texi (working copy)
*************** This built-in function implements an ato
*** 7146,7152 ****
  This compares the contents of @code{*@var{ptr}} with the contents of
  @code{*@var{expected}} and if equal, writes @var{desired} into
  @code{*@var{ptr}}.  If they are not equal, the current contents of
! @code{*@var{ptr}} is written into @code{*@var{expected}}.
  
  True is returned if @code{*@var{desired}} is written into
  @code{*@var{ptr}} and the execution is considered to conform to the
--- 7146,7155 ----
  This compares the contents of @code{*@var{ptr}} with the contents of
  @code{*@var{expected}} and if equal, writes @var{desired} into
  @code{*@var{ptr}}.  If they are not equal, the current contents of
! @code{*@var{ptr}} is written into @code{*@var{expected}}.  @var{weak} is true
! for weak compare_exchange, and false for the strong variation.  Many targets 
! only offer the strong variation and ignore the parameter.  When in doubt, use
! the strong variation.
  
  True is returned if @code{*@var{desired}} is written into
  @code{*@var{ptr}} and the execution is considered to conform to the
*************** All memory orders are valid.
*** 7242,7268 ****
  
  @end deftypefn
  
! @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size)
  
! This built-in function returns true if objects of size bytes will always
! generate lock free atomic instructions for the target architecture.
! Otherwise false is returned.
  
! size must resolve to a compile time constant.
  
  @smallexample
! if (_atomic_always_lock_free (sizeof (long long)))
  @end smallexample
  
  @end deftypefn
  
! @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size)
  
! This built-in function returns true if objects of size bytes will always
  generate lock free atomic instructions for the target architecture.  If
  it is not known to be lock free a call is made to a runtime routine named
  @code{__atomic_is_lock_free}.
  
  @end deftypefn
  
  @node Object Size Checking
--- 7245,7276 ----
  
  @end deftypefn
  
! @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size, void *ptr)
  
! This built-in function returns true if objects of @var{size} bytes will always
! generate lock free atomic instructions for the target architecture.  
! @var{size} must resolve to a compile time constant and the result also
! resolves to a compile time constant.
  
! @var{ptr} is an optional pointer to the object which may be used to determine
! alignment.  A value of 0 indicates typical alignment should be used.  The 
! compiler may also ignore this parameter.
  
  @smallexample
! if (_atomic_always_lock_free (sizeof (long long), 0))
  @end smallexample
  
  @end deftypefn
  
! @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size, void *ptr)
  
! This built-in function returns true if objects of @var{size} bytes will always
  generate lock free atomic instructions for the target architecture.  If
  it is not known to be lock free a call is made to a runtime routine named
  @code{__atomic_is_lock_free}.
  
+ @var{ptr} is an optional pointer to the object which may be used to determine
+ alignment.  A value of 0 indicates typical alignment should be used.  The 
+ compiler may also ignore this parameter.
  @end deftypefn
  
  @node Object Size Checking
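
[Editorial usage sketch of the documented built-ins; the retry loop is a
standard idiom, not taken from the patch:]

  static int
  fetch_and_add (int *p, int v)
  {
    int old = __atomic_load_n (p, __ATOMIC_RELAXED);
    /* weak == 1: the exchange may fail spuriously; 'old' is refreshed
       with the current contents on failure, so simply retry.  */
    while (!__atomic_compare_exchange_n (p, &old, old + v, 1,
                                         __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
      ;
    return old;
  }

  /* The size/pointer form can resolve at compile time:
     if (__atomic_always_lock_free (sizeof (long long), 0)) ... */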


PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF

2012-02-10 Thread H.J. Lu
Hi,

Since a constant address in x32 is sign-extended to 64 bits, a negative
displacement without base or index is out of range.  OK for trunk?
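
[Editorial illustration of the sign-extension arithmetic:]

  void
  test_sign_extension (void)
  {
    /* 0xfee00080 has bit 31 set: as a signed 32-bit displacement it is
       -18874240 (hence the scan-assembler-not pattern in the testcase),
       and the hardware sign-extends it to 0xfffffffffee00080, which is
       not the intended x32 address.  */
    volatile int *apic_tpr_addr = (volatile int *) 0xfee00080;
    *apic_tpr_addr = 0;   /* must not be emitted with a bare disp32 */
  }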

Thanks.


H.J.
---
gcc/

2012-02-10  H.J. Lu  hongjiu...@intel.com

PR target/52146
* config/i386/i386.c (ix86_legitimate_address_p): Disallow
negative constant address for x32.

gcc/testsuite/

2012-02-10  H.J. Lu  hongjiu...@intel.com

PR target/52146
* gcc.target/i386/pr52146.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 009dd53..0bb94a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12107,6 +12107,15 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
   || !ix86_legitimate_constant_p (Pmode, disp)))
/* Displacement is not constant.  */
return false;
+  else if (TARGET_X32
+	   && !base
+	   && !index
+	   && CONST_INT_P (disp)
+	   && INTVAL (disp) < 0)
+    /* Since constant address in x32 is signed extended to 64bit,
+       negative displacement without base nor index is out of
+       range.  */
+    return false;
   else if (TARGET_64BIT
	   && !x86_64_immediate_operand (disp, VOIDmode))
/* Displacement is out of range.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
new file mode 100644
index 000..68bdeff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52146.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */
+/* { dg-options "-O2 -mx32" } */
+
+void test1() {
+  int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr += 4;
+}
+void test2() {
+  volatile int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr = 0;
+}
+void test3() {
+  volatile int* apic_tpr_addr = (int *)0x7fff;
+  *apic_tpr_addr = 0;
+}
+
+/* { dg-final { scan-assembler-not "-18874240" } } */


Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF

2012-02-10 Thread H.J. Lu
On Fri, Feb 10, 2012 at 09:25:06AM -0800, H.J. Lu wrote:
 Hi,
 
 Since a constant address in x32 is sign-extended to 64 bits, a negative
 displacement without base or index is out of range.  OK for trunk?
 

Here is a different patch.

H.J.
---
gcc/

2012-02-10  Uros Bizjak  ubiz...@gmail.com

PR target/52146
* config/i386/i386.c (ix86_legitimate_address_p): Disallow
negative constant address for x32.

gcc/testsuite/

2012-02-10  H.J. Lu  hongjiu...@intel.com

PR target/52146
* gcc.target/i386/pr52146.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 009dd53..8f4e72e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
   rtx base, index, disp;
   HOST_WIDE_INT scale;
 
+  /* Since constant address in x32 is signed extended to 64bit,
+     we have to prevent addresses from 0x80000000 to 0xFFFFFFFF.  */
+  if (TARGET_X32
+      && CONST_INT_P (addr)
+      && val_signbit_known_set_p (SImode, INTVAL (addr)))
+    return false;
+
   if (ix86_decompose_address (addr, &parts) <= 0)
     /* Decomposition failed.  */
     return false;
diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
new file mode 100644
index 000..68bdeff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52146.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */
+/* { dg-options "-O2 -mx32" } */
+
+void test1() {
+  int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr += 4;
+}
+void test2() {
+  volatile int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr = 0;
+}
+void test3() {
+  volatile int* apic_tpr_addr = (int *)0x7fff;
+  *apic_tpr_addr = 0;
+}
+
+/* { dg-final { scan-assembler-not "-18874240" } } */


Re: [PATCH] c/52190 update __atomic documentation

2012-02-10 Thread Joseph S. Myers
On Fri, 10 Feb 2012, Andrew MacLeod wrote:

 Updated the extra parameter to __atomic_{is,always}_lock_free , and bkoz noted
 that the __atomic_compare_exchange documentation did not document the 'weak'
 parameter.
 
 Fixed as such, seems to format fine.  OK for mainline?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: trans-mem: virtual ops for gimple_transaction

2012-02-10 Thread Richard Henderson
On 02/10/2012 01:44 AM, Richard Guenther wrote:
 What is the reason to keep a GIMPLE_TRANSACTION stmt after
 TM lowering and not lower it to a builtin function call?

Because real optimization hasn't happened yet, and we hold
out hope that we'll be able to delete stuff as unreachable.
Especially all instances of transaction_cancel.

 It seems the body is empty after lowering (what's the label thing?)

The label is the transaction cancel label.

When we finally convert GIMPLE_TRANSACTION to a builtin, we'll
generate different code layouts with and without a cancel.


r~


Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF

2012-02-10 Thread Uros Bizjak
On Fri, Feb 10, 2012 at 6:42 PM, H.J. Lu hongjiu...@intel.com wrote:

 Since a constant address in x32 is sign-extended to 64 bits, a negative
 displacement without base or index is out of range.  OK for trunk?


 Here is a different patch.

 H.J.
 ---
 gcc/

 2012-02-10  Uros Bizjak  ubiz...@gmail.com

        PR target/52146
        * config/i386/i386.c (ix86_legitimate_address_p): Disallow
        negative constant address for x32.

 gcc/testsuite/

 2012-02-10  H.J. Lu  hongjiu...@intel.com

        PR target/52146
        * gcc.target/i386/pr52146.c: New.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 009dd53..8f4e72e 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
   rtx base, index, disp;
   HOST_WIDE_INT scale;
 
 +  /* Since constant address in x32 is signed extended to 64bit,
 +     we have to prevent addresses from 0x80000000 to 0xFFFFFFFF.  */
 +  if (TARGET_X32
 +      && CONST_INT_P (addr)
 +      && val_signbit_known_set_p (SImode, INTVAL (addr)))

As said in the PR, val_signbit_known_set_p is a bit overkill. Please
use INTVAL (addr) < 0, it works as well.

 +++ b/gcc/testsuite/gcc.target/i386/pr52146.c
 @@ -0,0 +1,17 @@
 +/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */

we _are_ in x86 directory, so:

/* { dg-do compile { target { ! { ia32 } } } }  */

 +/* { dg-options "-O2 -mx32" } */
 +
 +void test1() {
 +  int* apic_tpr_addr = (int *)0xfee00080;
 +  *apic_tpr_addr += 4;
 +}
 +void test2() {
 +  volatile int* apic_tpr_addr = (int *)0xfee00080;
 +  *apic_tpr_addr = 0;

No need for volatile.

 +}
 +void test3() {
 +  volatile int* apic_tpr_addr = (int *)0x7fff;
 +  *apic_tpr_addr = 0;
 +}

test2 is enough. No need to test what worked OK.

 +
 +/* { dg-final { scan-assembler-not "-18874240" } } */

Please also reformat the test to GNU coding standards.
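
[Editorial sketch of the requested cleanup -- GNU style, single test,
volatile and the known-good case dropped:]

  void
  test2 (void)
  {
    int *apic_tpr_addr = (int *) 0xfee00080;
    *apic_tpr_addr = 0;
  }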

Patch is OK for 4.7 and 4.6 after bootstrap and regression test on x32 target.

Thanks,
Uros.


Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Jonathan Wakely
On 10 February 2012 14:48, Rainer Orth wrote:
 I've also noticed one feature of your patch that I don't like:
 __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of
 pthread_{mutex,cond}_init_function instead of passing them on as all
 other gthr-posix.h functions do.  This might lead to bad error handling,
 and my previous patch (based on an older version of yours you had
 attached to the PR) changed them to return int instead.  I suppose
 changing this now is out of question, and this is left as 4.8 material.

I didn't like that feature of the patch much either, but the signature
of the mutex_init function is specified in gthr.h:

 __GTHREAD_MUTEX_INIT_FUNCTION
some systems can't initialize a mutex without a
function call.  On such systems, define this to a
function which looks like this:
  void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *)

The recursive one says it should have the same signature, and I (maybe
wrongly) assumed the cond_init one should too.


Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Jonathan Wakely
On 10 February 2012 14:48, Rainer Orth wrote:

 Bootstrapped without regressions on alpha-dec-osf5.1b, ok for mainline?

Yes, this is OK, thanks for your help testing and getting all of this working.


Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Rainer Orth
Jon,

 On 10 February 2012 14:48, Rainer Orth wrote:
 I've also noticed one feature of your patch that I don't like:
 __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of
 pthread_{mutex,cond}_init_function instead of passing them on as all
 other gthr-posix.h functions do.  This might lead to bad error handling,
 and my previous patch (based on an older version of yours you had
 attached to the PR) changed them to return int instead.  I suppose
 changing this now is out of question, and this is left as 4.8 material.

 I didn't like that feature of the patch much either, but the signature
 of the mutex_init function is specified in gthr.h:

  __GTHREAD_MUTEX_INIT_FUNCTION
 some systems can't initialize a mutex without a
 function call.  On such systems, define this to a
 function which looks like this:
   void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *)

 The recursive one says it should have the same signature, and I (maybe

I don't see this, neither in gthr.h nor in gthr-posix.h.

 wrongly) assumed the cond_init one should too.

I don't think this is cast in stone: the gthr*.h headers aren't
installed in a public place and aren't considered an external interface
to the best of my knowledge, so it should be possible to change.

Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc,
libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in
libstdc++), so changing the 13 calls is doable.
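
[Editorial sketch of the 4.8 change discussed above -- an int-returning
init wrapper; the __gthrw_ indirection follows gthr-posix.h:]

  static inline int
  __gthread_mutex_init_function (__gthread_mutex_t *__mutex)
  {
    /* Forward the pthread_mutex_init result instead of discarding it,
       as the other gthr-posix.h wrappers do.  */
    return __gthrw_(pthread_mutex_init) (__mutex, NULL);
  }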

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF

2012-02-10 Thread H.J. Lu
On Fri, Feb 10, 2012 at 9:55 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Feb 10, 2012 at 6:42 PM, H.J. Lu hongjiu...@intel.com wrote:

 Since a constant address in x32 is sign-extended to 64 bits, a negative
 displacement without base or index is out of range.  OK for trunk?


 Here is a different patch.

 H.J.
 ---
 gcc/

 2012-02-10  Uros Bizjak  ubiz...@gmail.com

        PR target/52146
        * config/i386/i386.c (ix86_legitimate_address_p): Disallow
        negative constant address for x32.

 gcc/testsuite/

 2012-02-10  H.J. Lu  hongjiu...@intel.com

        PR target/52146
        * gcc.target/i386/pr52146.c: New.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 009dd53..8f4e72e 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
   rtx base, index, disp;
   HOST_WIDE_INT scale;
 
 +  /* Since constant address in x32 is signed extended to 64bit,
 +     we have to prevent addresses from 0x80000000 to 0xFFFFFFFF.  */
 +  if (TARGET_X32
 +      && CONST_INT_P (addr)
 +      && val_signbit_known_set_p (SImode, INTVAL (addr)))

 As said in the PR, val_signbit_known_set_p is a bit overkill. Please
 use INTVAL (addr) < 0, it works as well.

 +++ b/gcc/testsuite/gcc.target/i386/pr52146.c
 @@ -0,0 +1,17 @@
 +/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */

 we _are_ in x86 directory, so:

 /* { dg-do compile { target { ! { ia32 } } } }  */

 +/* { dg-options "-O2 -mx32" } */
 +
 +void test1() {
 +  int* apic_tpr_addr = (int *)0xfee00080;
 +  *apic_tpr_addr += 4;
 +}
 +void test2() {
 +  volatile int* apic_tpr_addr = (int *)0xfee00080;
 +  *apic_tpr_addr = 0;

 No need for volatile.

 +}
 +void test3() {
 +  volatile int* apic_tpr_addr = (int *)0x7fff;
 +  *apic_tpr_addr = 0;
 +}

 test2 is enough. No need to test what worked OK.

 +
 +/* { dg-final { scan-assembler-not "-18874240" } } */

 Please also reformat the test to GNU coding standards.

 Patch is OK for 4.7 and 4.6 after bootstrap and regression test on x32 target.

I checked it into 4.7 and will backport to my x32 4.6 branch.

Thanks.

-- 
H.J.


Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Jonathan Wakely
On 10 February 2012 18:20, Rainer Orth wrote:
 Jon,

 On 10 February 2012 14:48, Rainer Orth wrote:
 I've also noticed one feature of your patch that I don't like:
 __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of
 pthread_{mutex,cond}_init_function instead of passing them on as all
 other gthr-posix.h functions do.  This might lead to bad error handling,
 and my previous patch (based on an older version of yours you had
 attached to the PR) changed them to return int instead.  I suppose
 changing this now is out of question, and this is left as 4.8 material.

 I didn't like that feature of the patch much either, but the signature
 of the mutex_init function is specified in gthr.h:

      __GTHREAD_MUTEX_INIT_FUNCTION
             some systems can't initialize a mutex without a
         function call.  On such systems, define this to a
         function which looks like this:
           void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *)

 The recursive one says it should have the same signature, and I (maybe

 I don't see this, neither in gthr.h nor in gthr-posix.h.

lines 54-62 in http://gcc.gnu.org/viewcvs/trunk/libgcc/gthr.h?view=markup

 wrongly) assumed the cond_init one should too.

 I don't think this is cast in stone: the gthr*.h headers aren't
 installed in a public place and aren't considered an external interface
 to the best of my knowledge, so it should be possible to change.

 Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc,
 libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in
 libstdc++), so changing the 13 calls is doable.


Private ports which provide their own gthr header (I think they're
rare, but do exist) would need to modify their functions to return an
integer - but at least it would be a compile-time failure so they'd
know to change it.


Re: Gthreads patch to disable static initializer macros

2012-02-10 Thread Rainer Orth
Jonathan Wakely jwakely@gmail.com writes:

 I didn't like that feature of the patch much either, but the signature
 of the mutex_init function is specified in gthr.h:

      __GTHREAD_MUTEX_INIT_FUNCTION
             some systems can't initialize a mutex without a
         function call.  On such systems, define this to a
         function which looks like this:
           void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *)

 The recursive one says it should have the same signature, and I (maybe

 I don't see this, neither in gthr.h nor in gthr-posix.h.

 lines 54-62 in http://gcc.gnu.org/viewcvs/trunk/libgcc/gthr.h?view=markup

I think that's overinterpreting the comment.
__gthread_recursive_mutex_init_function happily returns errors with no
indication it should be otherwise.

 wrongly) assumed the cond_init one should too.

 I don't think this is cast in stone: the gthr*.h headers aren't
 installed in a public place and aren't considered an external interface
 to the best of my knowledge, so it should be possible to change.

 Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc,
 libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in
 libstdc++), so changing the 13 calls is doable.

 Private ports which provide their own gthr header (I think they're
 rare, but do exist) would need to modify their functions to return an
 integer - but at least it would be a compile-time failure so they'd
 know to change it.

Right, and that's the cost of keeping a port out of the FSF tree.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[patch libffi]: Add support for ffi_prep_raw_closure_loc for X86_WIN32 for fixing finally PR 51500

2012-02-10 Thread Kai Tietz
Hi,

ChangeLog

2012-02-10  Kai Tietz  kti...@redhat.com

* src/x86/ffi.c (ffi_prep_raw_closure_loc): Add thiscall
support for X86_WIN32.

Tested for i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-unknown-linux-gnu.

Ok for apply?

Regards,
Kai

Index: src/x86/ffi.c
===
--- src/x86/ffi.c   (revision 184105)
+++ src/x86/ffi.c   (working copy)
@@ -720,6 +720,9 @@
   int i;

   if (cif->abi != FFI_SYSV) {
+#ifdef X86_WIN32
+    if (cif->abi != FFI_THISCALL)
+#endif
     return FFI_BAD_ABI;
   }

@@ -734,10 +737,20 @@
       FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_LONGDOUBLE);
     }
 
-
+#ifdef X86_WIN32
+  if (cif->abi == FFI_SYSV)
+    {
+#endif
       FFI_INIT_TRAMPOLINE (&closure->tramp[0], &ffi_closure_raw_SYSV,
                            codeloc);
-
+#ifdef X86_WIN32
+    }
+  else if (cif->abi == FFI_THISCALL)
+    {
+      FFI_INIT_TRAMPOLINE_THISCALL (&closure->tramp[0], &ffi_closure_raw_SYSV,
+                                    codeloc, cif->bytes);
+    }
+#endif
   closure->cif  = cif;
   closure->user_data = user_data;
   closure->fun  = fun;


Re: haifa-sched: Fix problems with cc0 in prune_ready_list

2012-02-10 Thread Vladimir Makarov

On 02/08/2012 10:01 AM, Bernd Schmidt wrote:

We found a scheduler problem while testing Coldfire targets. The
prune_ready_list function I introduced doesn't take SCHED_GROUPs into
account, which can lead to a random insn being scheduled between a cc0
setter and user. The patch below fixes it, OK? (Bootstrapped & tested
i686-linux).



Ok. Thanks, Bernd.



Re: [libitm] Port to ARM

2012-02-10 Thread Eric Botcazou
 I have no immediate plans.  If you have time to knock something
 together for Sparc, that would be fantastic.

Btw, any particular reason why the libgomp-copied linking scheme seems to be 
only half-implemented?

# Set up the set of libraries that we need to link against for libitm.
# Note that the GOMP_SELF_SPEC in gcc.c will force -pthread for -fopenmp,
# which will force linkage against -lpthread (or equivalent for the system).
# That's not 100% ideal, but about the best we can do easily.
if test $enable_shared = yes; then
  link_itm="-litm %{static: $LIBS}"
else
  link_itm="-litm $LIBS"
fi
AC_SUBST(link_itm)

but AFAICS nobody uses libitm.spec (and so you need to force -litm for the 
testsuite)?  

-- 
Eric Botcazou


[PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations

2012-02-10 Thread Michael Meissner
I was looking at the routelookup EEMBC benchmark and it has code of the form:

   while ( this_node->cmpbit > next_node->cmpbit )
     {
       this_node = next_node;
 
       if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
          next_node = this_node->rlink;
       else
          next_node = this_node->llink;
}

This is where you have a binary tree/trie and you are iterating going down
either the right link or left link until you find a stopping condition.  The
code in ifcvt.c does not handle optimizing these cases for conditional move
since the load might trap, and generates code that does if-then-else with loads
and jumps.

However, since the two elements are next to each other in memory, they are
likely in the same cache line, particularly with aligned stacks and malloc
returning aligned pointers.  Except in unusual circumstances where the pointer
is not aligned, this means it is much faster to optimize it as:

   while ( this_node->cmpbit > next_node->cmpbit )
     {
       this_node = next_node;
 
       rtmp = this_node->rlink;
       ltmp = this_node->llink;
       if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
          next_node = rtmp;
       else
          next_node = ltmp;
}

So I wrote some patches to do this optimization.  In ifcvt.c I added a new hook
that allows the backend to try and do conditional moves if the machine
independent code doesn't handle the special cases that the machine might have.

Then in rs6000.c I used that hook to see if the conditional moves are adjacent,
and do the optimization.

I will note that this type of code comes up quite frequently since binary trees
and tries are common data structures.  The file splay-tree.c in libiberty is one
place in the compiler tree that has conditional adjacent memory moves.

So I would like comments on the patch before the 4.8 tree opens up.  I feel
even if we decide not to add the adjacent memory move patch, the hook is
useful, and I have some other ideas for using it for the powerpc.

I was thinking about rewriting the rs6000 dependent parts to make it a normal
optimization available to all ports.  Is this something we want as a normal
option?  At the moment, I'm not sure it should be part of -O3 because it is
possible for a trap to occur if the pointer straddles a page boundary and the
test condition would guard against loading up the second value.  However,
-Ofast might be an appropriate place to do this optimization.

At this time I don't have test cases, but I would add them for the real
submission.  I have bootstrapped the compiler on powerpc with this option
enabled and it passed the bootstrap and had no regressions in make check.  I
will do a spec run over the weekend as well.

In addition to libibery/splay-tree.c the following files in gcc have adjacent
conditional moves that this code would optimize:

cfg.c
c-typeck.c
df-scan.c
fold-const.c
graphds.c
ira-emit.c
omp-low.c
rs6000.c
tree-cfg.c
tree-ssa-dom.c
tree-ssa-loop-ivopts.c
tree-ssa-phiopt.c
tree-ssa-uncprop.c

2012-02-10  Michael Meissner  meiss...@linux.vnet.ibm.com

* target.def (cmove_md_extra): New hook that is called from
ifcvt.c to allow the backend to generate additional conditional
moves that aren't handled by the machine independent code.  Add
support to call the hook at the appropriate places.
* targhooks.h (default_cmove_md_extra): Likewise.
* targhooks.c (default_cmove_md_extra): Likewise.
* target.h (enum ifcvt_pass): Likewise.
* ifcvt.c (find_if_header): Likewise.
(noce_find_if_block): Likewise.
(struct noce_if_info): Likewise.
(noce_process_if_block): Likewise.
(cond_move_process_if_block): Likewise.
(if_convert): Likewise.
(rest_of_handle_if_conversion): Likewise.
(rest_of_handle_if_after_combine): Likewise.
(rest_of_handle_if_after_reload): Likewise.
* doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
* doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.

* config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
for a new switch to optimize conditional moves where each side is
a memory reference and the memory locations are adjacent.
* config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
(TARGET_CMOVE_MD_EXTRA): Likewise.
(rs6000_decompose_offsettable_memref): Likewise.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 184104)
+++ gcc/doc/tm.texi (working copy)
@@ -10871,6 +10871,16 @@ conditional execution instructions inste
 1 if it does use cc0.
 @end defmac
 
+@deftypefn {Target Hook} rtx 

[PATCH, 4.6] Backport fixes for PR50031, PR50969

2012-02-10 Thread William J. Schmidt
This patch backports the two recent trunk fixes for powerpc64
vectorization degradations.  The fixes are largely identical to their
4.7 counterparts except that (a) the logic for
STMT_VINFO_PATTERN_DEF_SEQ does not apply in 4.6, and (b) the changes to
vectorizable_conversion in 4.7 correspond to changes in
vectorizable_type_demotion and vectorizable_type_promotion in 4.6.

Bootstrapped and tested for regressions and performance for
powerpc64-linux.  OK to commit after the trunk patch has a few days of
burn-in?

Thanks,
Bill


2012-02-10  Bill Schmidt wschm...@linux.vnet.ibm.com
Ira Rosen i...@il.ibm.com

PR tree-optimization/50031
PR tree-optimization/50969
* targhooks.c (default_builtin_vectorization_cost): Handle
vec_promote_demote.
* target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
* tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
all types of reduction and pattern statements.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
(vect_model_store_cost): Use vec_perm rather than vector_stmt for
statement cost.
(vect_model_load_cost): Likewise.
(vect_get_load_cost): Likewise; add dump logic for explicit realigns.
(vectorizable_type_demotion): Call vect_model_promotion_demotion_cost.
(vectorizable_type_promotion): Likewise.
* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
vec_promote_demote.
* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
vec_perm for VSX and handle vec_promote_demote.


Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 184047)
+++ gcc/targhooks.c (working copy)
@@ -529,6 +529,7 @@ default_builtin_vectorization_cost (enum vect_cost
   case scalar_to_vec:
   case cond_branch_not_taken:
   case vec_perm:
+  case vec_promote_demote:
 return 1;
 
   case unaligned_load:
Index: gcc/target.h
===
--- gcc/target.h(revision 184047)
+++ gcc/target.h(working copy)
@@ -128,7 +128,8 @@ enum vect_cost_for_stmt
   scalar_to_vec,
   cond_branch_not_taken,
   cond_branch_taken,
-  vec_perm
+  vec_perm,
+  vec_promote_demote
 };
 
 /* Sets of optimization levels at which an option may be enabled by
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 184047)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2104,7 +2104,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf
       if (stmt_info
           && !STMT_VINFO_RELEVANT_P (stmt_info)
           && (!STMT_VINFO_LIVE_P (stmt_info)
-              || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def))
+              || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
+          && !STMT_VINFO_IN_PATTERN_P (stmt_info))
 continue;
 
   if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
@@ -2251,11 +2252,19 @@ vect_estimate_min_profitable_iters (loop_vec_info
{
  gimple stmt = gsi_stmt (si);
  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+
+ if (STMT_VINFO_IN_PATTERN_P (stmt_info))
+   {
+ stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+ stmt_info = vinfo_for_stmt (stmt);
+   }
+
  /* Skip stmts that are not vectorized inside the loop.  */
	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
	      && (!STMT_VINFO_LIVE_P (stmt_info)
-		  || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def))
+		  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))))
continue;
+
 	  vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
 	  /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
 	     some of the outside costs are generated inside the outer-loop.  */
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 184047)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -623,6 +623,46 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i
 }
 
 
+/* Model cost for type demotion and promotion operations.  PWR is normally
+   zero for single-step promotions and demotions.  It will be one if 
+   two-step promotion/demotion is required, and so on.  Each additional
+   step doubles the number of instructions required.  */
+
+static void
+vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
+   enum vect_def_type *dt, int pwr)
+{
+  int i, tmp;
+  int inside_cost = 0, outside_cost = 0, 

Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations

2012-02-10 Thread Andrew Pinski
On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 I was looking at the routelookup EEMBC benchmark and it has code of the form:

   while ( this_node->cmpbit > next_node->cmpbit )
    {
      this_node = next_node;

      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
         next_node = this_node->rlink;
      else
         next_node = this_node->llink;
    }

Hmm, this looks like something we could do better at the tree level, as we
have more information about this_node there.  We know that we load from
this_node->cmpbit before we take either branch, so we can move both of the
link loads before the branch and then we get the ifcvt for free.

Thanks,
Andrew Pinski




 This is where you have a binary tree/trie and you are iterating down either
 the right link or the left link until you find a stopping condition.  The
 code in ifcvt.c does not handle optimizing these cases into conditional
 moves since the loads might trap, so it generates code that does
 if-then-else with loads and jumps.

 However, since the two elements are next to each other in memory, they are
 likely in the same cache line, particularly with aligned stacks and malloc
 returning aligned pointers.  Except in unusual circumstances where the pointer
 is not aligned, this means it is much faster to optimize it as:

   while ( this_node->cmpbit > next_node->cmpbit )
    {
      this_node = next_node;

      rtmp = this_node->rlink;
      ltmp = this_node->llink;
      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
         next_node = rtmp;
      else
         next_node = ltmp;
    }

 So I wrote some patches to do this optimization.  In ifcvt.c I added a new 
 hook
 that allows the backend to try and do conditional moves if the machine
 independent code doesn't handle the special cases that the machine might have.

 Then in rs6000.c I used that hook to see if the conditional moves are 
 adjacent,
 and do the optimization.

 I will note that this type of code comes up quite frequently, since binary
 trees and tries are common data structures.  The file splay-tree.c in
 libiberty is one place in the compiler tree that has conditional adjacent
 memory moves.  (A distilled sketch of the pattern follows below.)
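
A minimal C distillation of the pattern in question, using hypothetical
names; both arms of the conditional are loads of struct members that sit
next to each other in memory:

    struct node { struct node *llink, *rlink; };

    static struct node *
    step (struct node *n, int go_right)
    {
      /* After the transformation both members are loaded unconditionally
         and the result is selected with a conditional move.  This is safe
         when the two fields share a cache line; it can only trap if the
         struct straddles into an unmapped page.  */
      struct node *ltmp = n->llink;
      struct node *rtmp = n->rlink;
      return go_right ? rtmp : ltmp;
    }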

 So I would like comments on the patch before the 4.8 tree opens up.  I feel
 that even if we decide not to add the adjacent memory move patch, the hook is
 useful, and I have some other ideas for using it on the powerpc.

 I was thinking about rewriting the rs6000-dependent parts to make it a normal
 optimization available to all ports.  Is this something we want as a normal
 option?  At the moment, I'm not sure it should be part of -O3, because it is
 possible for a trap to occur if the pointer straddles a page boundary and the
 test condition would have guarded against loading the second value.  However,
 -Ofast might be an appropriate place to do this optimization.

 At this time I don't have test cases, but I would add them for the real
 submission.  I have bootstrapped the compiler on powerpc with this option
 enabled and it passed the bootstrap and had no regressions in make check.  I
 will do a spec run over the weekend as well.

 In addition to libiberty/splay-tree.c the following files in gcc have adjacent
 conditional moves that this code would optimize:

            cfg.c
            c-typeck.c
            df-scan.c
            fold-const.c
            graphds.c
            ira-emit.c
            omp-low.c
            rs6000.c
            tree-cfg.c
            tree-ssa-dom.c
            tree-ssa-loop-ivopts.c
            tree-ssa-phiopt.c
            tree-ssa-uncprop.c

 2012-02-10  Michael Meissner  meiss...@linux.vnet.ibm.com

        * target.def (cmove_md_extra): New hook that is called from
        ifcvt.c to allow the backend to generate additional conditional
        moves that aren't handled by the machine independent code.  Add
        support to call the hook at the appropriate places.
        * targhooks.h (default_cmove_md_extra): Likewise.
        * targhooks.c (default_cmove_md_extra): Likewise.
        * target.h (enum ifcvt_pass): Likewise.
        * ifcvt.c (find_if_header): Likewise.
        (noce_find_if_block): Likewise.
        (struct noce_if_info): Likewise.
        (noce_process_if_block): Likewise.
        (cond_move_process_if_block): Likewise.
        (if_convert): Likewise.
        (rest_of_handle_if_conversion): Likewise.
        (rest_of_handle_if_after_combine): Likewise.
        (rest_of_handle_if_after_reload): Likewise.
        * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise.
        * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise.

        * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support
        for a new switch to optimize conditional moves where each side is
        a memory reference and the memory locations are adjacent.
        * config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise.
        (TARGET_CMOVE_MD_EXTRA): Likewise.
        (rs6000_decompose_offsettable_memref): Likewise.

 --
 

[lra] patch to fix bootstrap on ARM

2012-02-10 Thread Vladimir Makarov
The following patch fixes the ARM bootstrap for LRA.  The problem was in
pseudo live range splitting and wrong assignment of multi-register pseudos
with the same value.


The patch was successfully bootstrapped on x86/x86-64 and ARM.

Committed as rev. 184109.

2012-02-10 Vladimir Makarov vmaka...@redhat.com

* lra-int.h (struct lra_reg): New member call_p.

* lra-lives.c (lra_create_live_ranges): Clear call_p for the pseudo.
(process_bb_lives): Setup call_p.

* lra-assigns.c (find_hard_regno_for): Add code to setup and use
impossible_start_hard_regs.
(setup_live_pseudos_and_spill_after_risky): Consider
conflict_hard_regs too.
(lra_assign): Add checking pseudos crossing calls and using call
used hard regs.

* lra-constraints.c (extract_loc_address_regs): Fix the typo in
using code1 instead of code0.
(exchange_plus_ops): New function.
(process_address): Use it.
(split_pseudo): Rename and change semantics of the first
parameter.
(inherit_in_ebb): Process only output non-subreg pseudos as
a definition for the split. In this case don't invalidate usages
for splitting. Always add usage insn for splitting in case of
used pseudos or defined subregs of multi-reg pseudos.

Index: lra-assigns.c
===
--- lra-assigns.c	(revision 183639)
+++ lra-assigns.c	(working copy)
@@ -304,11 +304,14 @@ find_hard_regno_for (int regno, int *cos
   int best_cost = INT_MAX, best_bank = INT_MAX, best_usage = INT_MAX;
   lra_live_range_t r;
   int p, i, j, rclass_size, best_hard_regno, bank, hard_regno;
+  int hr, conflict_hr, nregs;
+  enum machine_mode biggest_mode;
   unsigned int k, conflict_regno;
   int val;
   enum reg_class rclass;
   bitmap_iterator bi;
   bool *rclass_intersect_p;
+  HARD_REG_SET impossible_start_hard_regs;
 
   COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
   rclass = regno_allocno_class_array[regno];
@@ -317,6 +320,7 @@ find_hard_regno_for (int regno, int *cos
   sparseset_clear (conflict_reload_and_inheritance_pseudos);
   sparseset_clear (live_range_hard_reg_pseudos);
   IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs);
+  biggest_mode = lra_reg_info[regno].biggest_mode;
   for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
 {
   EXECUTE_IF_SET_IN_BITMAP (live_hard_reg_pseudos[r-start], 0, k, bi)
@@ -370,8 +374,26 @@ find_hard_regno_for (int regno, int *cos
 #endif
   sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
   val = lra_reg_info[regno].val;
+  CLEAR_HARD_REG_SET (impossible_start_hard_regs);
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-if (val != lra_reg_info[conflict_regno].val)
+if (val == lra_reg_info[conflict_regno].val)
+  {
+	conflict_hr = live_pseudos_reg_renumber[conflict_regno];
+	nregs = (hard_regno_nregs[conflict_hr]
+		 [lra_reg_info[conflict_regno].biggest_mode]);
+	/* Remember about multi-register pseudos.  For example, 2 hard
+	   register pseudos can start on the same hard register but can
+	   not start on HR and HR+1/HR-1.  */ 
+	for (hr = conflict_hr + 1;
+	 hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
+	 hr++)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+	for (hr = conflict_hr - 1;
+	 hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
+	 hr--)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+  }
+else
   {
 	lra_add_hard_reg_set (live_pseudos_reg_renumber[conflict_regno],
 			  lra_reg_info[conflict_regno].biggest_mode,
@@ -424,7 +446,8 @@ find_hard_regno_for (int regno, int *cos
 	 conflict_set)
 	  /* We can not use prohibited_class_mode_regs because it is
 	 defined not for all classes.  */
-	  && HARD_REGNO_MODE_OK (hard_regno, PSEUDO_REGNO_MODE (regno)))
+	  && HARD_REGNO_MODE_OK (hard_regno, PSEUDO_REGNO_MODE (regno))
+	  && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno))
 	{
 	  if (hard_regno_costs_check[hard_regno]
 	  != curr_hard_regno_costs_check)
@@ -881,9 +904,13 @@ setup_live_pseudos_and_spill_after_risky
 	}
 	}
   COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+  IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs);
   val = lra_reg_info[regno].val;
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-	if (val != lra_reg_info[conflict_regno].val)
+	if (val != lra_reg_info[conflict_regno].val
+	/* Multi-register pseudos with the same value should still start
+	   on the same hard register.  */
+	|| hard_regno != reg_renumber[conflict_regno])
 	  lra_add_hard_reg_set (reg_renumber[conflict_regno],
 lra_reg_info[conflict_regno].biggest_mode,
 conflict_set);
@@ -893,7 +920,6 @@ setup_live_pseudos_and_spill_after_risky
 	  continue;
 	}
   bitmap_set_bit (spilled_pseudo_bitmap, regno);
-  hard_regno = reg_renumber[regno];
   for (j = 0;
 	   j  hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)];
 	   j++)

Re: libgo patch committed: Update to weekly.2011-12-22

2012-02-10 Thread Ian Lance Taylor
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 Ian Lance Taylor i...@google.com writes:

 Fixed with the appended patch, which also gathers up all the possibly
 missing functions that I noticed.  Bootstrapped and ran Go testsuite on
 x86_64-unknown-linux-gnu, which proves little as the system I tested on
 has all these functions anyhow.  Committed to mainline.

 Results are way better now, but still a couple of libgo FAILs:

 Running target unix
 FAIL: net
 FAIL: websocket
 FAIL: compress/flate
 FAIL: exp/ssh
 FAIL: image/jpeg
 FAIL: log/syslog
 FAIL: net/http
 FAIL: net/http/httputil
 FAIL: net/rpc
 FAIL: os/exec

 Most of them are like

 Start pollServer: epoll_ctl: Bad file descriptor
 panic: runtime error: invalid memory address or nil pointer dereference
 FAIL: net

That sort of problem should be fixed now, by

http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00110.html

Ian


Re: [Ada] Do not pass -Werror during linking

2012-02-10 Thread Eric Botcazou
 Can you try to extract a testcase (assuming it's just a single case?).
 We shouldn't
 warn for layout-compatible types (but we may do so if for example struct
 nesting differs).

It's more basic than that: for example, we map pointers on the C side to 
addresses (integers) on the Ada side.

-- 
Eric Botcazou


libgo patch committed: Fix channels for big-endian strict-alignment

2012-02-10 Thread Ian Lance Taylor
This libgo patch fixes the channel code to work correctly on big-endian
systems.  The older version of the code did this right, but I forgot to
handle it in the new version.  This patch also fixes the memory
structure built for the select statement to work correctly on strict
alignment systems.  The code was putting a uint16_t array before a
pointer array, which would of course cause a SIGBUS on a strict
alignment system.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Bootstrapped and ran channel tests on
sparc-sun-solaris2.11; the Go testsuite on that system still has some
failures but they appear to be due to other problems, such as PR 52205
which I just filed.  Committed to mainline.

Ian

diff -r 93764d4d4aca libgo/runtime/chan.c
--- a/libgo/runtime/chan.c	Fri Feb 10 07:53:37 2012 -0800
+++ b/libgo/runtime/chan.c	Fri Feb 10 15:54:46 2012 -0800
@@ -409,11 +409,20 @@
 void
 __go_send_small(ChanType *t, Hchan* c, uint64 val)
 {
-	byte b[sizeof(uint64)];
+	union
+	{
+		byte b[sizeof(uint64)];
+		uint64 v;
+	} u;
+	byte *p;
 
-	runtime_memclr(b, sizeof(uint64));
-	__builtin_memcpy(b, &val, t->__element_type->__size);
-	runtime_chansend(t, c, b, nil);
+	u.v = val;
+#ifndef WORDS_BIGENDIAN
+	p = u.b;
+#else
+	p = u.b + sizeof(uint64) - t->__element_type->__size;
+#endif
+	runtime_chansend(t, c, p, nil);
 }
 
 // The compiler generates a call to __go_send_big to send a value
@@ -433,9 +442,15 @@
 		byte b[sizeof(uint64)];
 		uint64 v;
 	} u;
+	byte *p;
 
 	u.v = 0;
-	runtime_chanrecv(t, c, u.b, nil, nil);
+#ifndef WORDS_BIGENDIAN
+	p = u.b;
+#else
+	p = u.b + sizeof(uint64) - t->__element_type->__size;
+#endif
+	runtime_chanrecv(t, c, p, nil, nil);
 	return u.v;
 }
 
@@ -654,8 +669,8 @@
 
 	sel->tcase = size;
 	sel->ncase = 0;
-	sel->pollorder = (void*)(sel->scase + size);
-	sel->lockorder = (void*)(sel->pollorder + size);
+	sel->lockorder = (void*)(sel->scase + size);
+	sel->pollorder = (void*)(sel->lockorder + size);
 	*selp = sel;
 
 	if(debug)


[SPARC] Remove useless new code

2012-02-10 Thread Eric Botcazou
It was introduced in the course of the -mflat rewrite, but can be replaced by a 
single line.

Tested on SPARC/Solaris w/ and w/o -mflat, applied on the mainline.


2012-02-10  Eric Botcazou  ebotca...@adacore.com

* config/sparc/sparc.c (sparc_flat_expand_prologue): Use emit_use.
* config/sparc/sparc.md (UNSPECV_GOTO): Delete.
(nonlocal_goto_internal): Likewise.
(nonlocal_goto): Emit a use and an indirect jump directly.


-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 183864)
+++ config/sparc/sparc.c	(working copy)
@@ -5131,7 +5131,7 @@ sparc_flat_expand_prologue (void)
 
 	  /* Prevent this instruction from ever being considered dead,
 	 even if this function has no epilogue.  */
-	  emit_insn (gen_rtx_USE (VOIDmode, i7));
+	  emit_use (i7);
 	}
 }
 
Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 183864)
+++ config/sparc/sparc.md	(working copy)
@@ -99,7 +99,6 @@ (define_constants
 (define_constants
   [(UNSPECV_BLOCKAGE		0)
(UNSPECV_FLUSHW		1)
-   (UNSPECV_GOTO		2)
(UNSPECV_FLUSH		4)
(UNSPECV_SAVEW		6)
(UNSPECV_CAS			8)
@@ -6524,6 +6523,7 @@ (define_expand "nonlocal_goto"
    (match_operand 3 "memory_operand" "")]
   ""
 {
+  rtx i7 = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
   rtx r_label = copy_to_reg (operands[1]);
   rtx r_sp = adjust_address_nv (operands[2], Pmode, 0);
   rtx r_fp = operands[3];
@@ -6540,44 +6540,19 @@ (define_expand nonlocal_goto
   /* Restore frame pointer for containing function.  */
   emit_move_insn (hard_frame_pointer_rtx, r_fp);
   emit_stack_restore (SAVE_NONLOCAL, r_sp);
+  emit_move_insn (i7, r_i7);
 
   /* USE of hard_frame_pointer_rtx added for consistency;
  not clear if really needed.  */
   emit_use (hard_frame_pointer_rtx);
   emit_use (stack_pointer_rtx);
+  emit_use (i7);
 
-  /* We need to smuggle the load of %i7 as it is a fixed register.  */
-  emit_jump_insn (gen_nonlocal_goto_internal (r_label, r_i7));
+  emit_jump_insn (gen_indirect_jump (r_label));
   emit_barrier ();
   DONE;
 })
 
-(define_insn "nonlocal_goto_internal"
-  [(unspec_volatile [(match_operand 0 "register_operand" "r")
-		     (match_operand 1 "memory_operand" "m")] UNSPECV_GOTO)]
-  "GET_MODE (operands[0]) == Pmode && GET_MODE (operands[1]) == Pmode"
-{
-  if (flag_delayed_branch)
-    {
-      if (TARGET_ARCH64)
-	return "jmp\t%0\n\t ldx\t%1, %%i7";
-      else
-	return "jmp\t%0\n\t ld\t%1, %%i7";
-    }
-  else
-    {
-      if (TARGET_ARCH64)
-	return "ldx\t%1, %%i7\n\tjmp\t%0\n\t nop";
-      else
-	return "ld\t%1, %%i7\n\tjmp\t%0\n\t nop";
-    }
-}
-  [(set (attr "type") (const_string "multi"))
-   (set (attr "length")
-	(if_then_else (eq_attr "delayed_branch" "true")
-		      (const_int 2)
-		      (const_int 3)))])
-
 (define_expand "builtin_setjmp_receiver"
   [(label_ref (match_operand 0 "" ""))]
   "flag_pic"


libgo patch committed: Tweak can-recover heuristic for SPARC

2012-02-10 Thread Ian Lance Taylor
The gccgo implementation of panic/recover uses the return address of a
function which called recover in order to see if that function was
invoked directly by defer.  It does this by passing the address of a
label which immediately follows the function call.  This requires a
small heuristic, because there may be some machine instructions between
the function call and the label, even though they are adjacent in
GIMPLE.  On SPARC I needed to tweak that heuristic further, because on
SPARC __builtin_return_address returns the address of the function call
instruction, which is at least 8 bytes before the label.  This patch
implements that tweak.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu and sparc-sun-solaris2.11 (Solaris has other
unrelated failures).  Committed to mainline.

Ian

Index: libgo/runtime/go-recover.c
===
--- libgo/runtime/go-recover.c	(revision 183650)
+++ libgo/runtime/go-recover.c	(working copy)
@@ -43,6 +43,14 @@ __go_can_recover (const void* retaddr)
  such as an instruction to adjust the stack pointer.  */
 
   ret = (const char *) retaddr;
+
+#ifdef __sparc__
+  /* On SPARC the address we get, from __builtin_return_address, is
+ the address of the call instruction.  Adjust forward, also
+ skipping the delayed instruction following the call.  */
+  ret += 8;
+#endif
+
   dret = (const char *) d->__retaddr;
   return ret <= dret && ret + 16 >= dret;
 }


Re: [PATCH, rtl-optimization]: Fix post-reload compare elimination pre-pass

2012-02-10 Thread Richard Henderson
On 02/09/2012 03:47 PM, Uros Bizjak wrote:
 2012-02-10  Uros Bizjak  ubiz...@gmail.com
 
   * compare-elim.c (find_comparisons_in_bb): Eliminate only compares
   having mode compatible with the mode of previous compare.  Substitute
   compare mode of previous compare with the mode, compatible
   with eliminated and previous compare.

This patch is ok for 4.8.

For 4.6 and 4.7 I would prefer that we simply not eliminate the compare.  I.e.

+	  enum machine_mode src_mode = GET_MODE (src);
+
 	  /* Eliminate a compare that's redundant with the previous.  */
 	  if (last_cmp_valid
+	      && src_mode == last_cmp->orig_mode
 	      && rtx_equal_p (last_cmp->in_a, XEXP (src, 0))
 	      && rtx_equal_p (last_cmp->in_b, XEXP (src, 1)))
 	    {
 ...
-	  last_cmp->orig_mode = GET_MODE (SET_DEST (single_set (insn)));
+	  last_cmp->orig_mode = src_mode;


For 4.6 and 4.7, there are only two extant users of this pass and neither
of them use anything besides CCmode before compare-elim.c does its own
manipulation of the modes later.


r~


Re: [v3] libstdc++/51798

2012-02-10 Thread Richard Henderson
On 02/09/2012 03:24 PM, Benjamin Kosnik wrote:
 This is the rest of 51798, completing the conversion to C++11 atomics
 in libstdc++. This is now a complete transition, modulo documentation
 which I plan to finish as a separate patch once I am back from the ISO
 C++ meeting.
 
 tested x86_64/linux
 
 -benjamin
 
 
 20120209-2.patch
 
 
 2012-02-09  Benjamin Kosnik  b...@redhat.com
 Jonathan Wakely  jwakely@gmail.com
 
   PR libstdc++/51798 continued.
   * acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Use __atomic_*
   builtins instead of __sync_* builtins for atomic functionality.
   * include/bits/shared_ptr_base.h: Same.
   * include/parallel/compatibility.h: Same.
   * include/profile/impl/profiler_state.h: Same.
   * include/tr1/shared_ptr.h: Same.
   * libsupc++/eh_ptr.cc: Same.
   * libsupc++/eh_throw.cc: Same.
   * libsupc++/eh_tm.cc: Same.
   * libsupc++/guard.cc: Same.
   * configure: Regenerated.
   * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust line numbers.
   * testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc: Same.

The patch uses the weak version of compare_exchange universally, which
is incorrect in a number of cases.  You wouldn't see this on x86_64;
you'd have to use an ll/sc target such as powerpc.
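
As a sketch of the distinction (not code from the patch; "guard" is a
hypothetical one-shot flag): a weak compare-exchange may fail spuriously
on an ll/sc target, so it is only correct inside a retry loop, while a
single attempt needs the strong form:

    /* A spurious weak failure here would wrongly report the guard as
       already taken, so weak must be false.  */
    static int guard;

    static int
    try_acquire (void)
    {
      int expected = 0;
      return __atomic_compare_exchange_n (&guard, &expected, 1,
					  /*weak=*/false,
					  __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    }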

In addition to changing several uses to strong compare_exchange, I also
optimize the idiom

do
  {
    var = *m;
    newval = ...;
  }
while (!atomic_compare_exchange (m, &var, newval, ...));

With the new builtins, VAR is updated with the current value of the 
memory (regardless of the weak setting), so the initial read from *M
can be hoisted outside the loop.
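
That is, the loop can be rewritten along these lines (a sketch, not the
patch itself; the increment is a placeholder update):

    /* __atomic_compare_exchange_n writes the value it observed back
       into VAR on failure, so the reload inside the loop is redundant.  */
    static int
    increment (int *m)
    {
      int var = __atomic_load_n (m, __ATOMIC_RELAXED);  /* hoisted load */
      int newval;
      do
	newval = var + 1;
      while (!__atomic_compare_exchange_n (m, &var, newval, /*weak=*/true,
					   __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
      return newval;
    }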

Ok?


r~
* include/bits/shared_ptr_base.h
(_Sp_counted_base_S_atomic::_M_add_ref_lock): Hoist initial load
outside compare_exchange loop.
* include/tr1/shared_ptr.h: Same.
* include/parallel/compatibility.h (__compare_and_swap_32): Use strong
version of compare_exchange.
(__compare_and_swap_64): Same.
* include/profile/impl/profiler_state.h (__gnu_profile::__turn): Same.
* libsupc++/guard.cc (__cxa_guard_acquire): Same.



diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index ebdc7ed..c48c18e 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -236,13 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_add_ref_lock()
 {
   // Perform lock-free add-if-not-zero operation.
-  _Atomic_word __count;
+  _Atomic_word __count = _M_use_count;
   do
{
- __count = _M_use_count;
  if (__count == 0)
__throw_bad_weak_ptr();
- 
  // Replace the current counter value with the old value + 1, as
  // long as it's not changed meanwhile. 
}
diff --git a/libstdc++-v3/include/parallel/compatibility.h 
b/libstdc++-v3/include/parallel/compatibility.h
index 460345e..8a65c9e 100644
--- a/libstdc++-v3/include/parallel/compatibility.h
+++ b/libstdc++-v3/include/parallel/compatibility.h
@@ -252,8 +252,9 @@ namespace __gnu_parallel
__replacement, __comparand)
  == __comparand;
 #elif defined(__GNUC__)
-    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true,
-				       __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement,
+				       false, __ATOMIC_ACQ_REL,
+				       __ATOMIC_ACQ_REL);
 #elif defined(__SUNPRO_CC) && defined(__sparc)
 return atomic_cas_32((volatile unsigned int*)__ptr, __comparand,
  __replacement) == __comparand;
@@ -299,13 +300,15 @@ namespace __gnu_parallel
 #endif
 
 #elif defined(__GNUC__) && defined(__x86_64)
-    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true,
-				       __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement,
+				       false, __ATOMIC_ACQ_REL,
+				       __ATOMIC_ACQ_REL);
 #elif defined(__GNUC__) && defined(__i386)				\
    && (defined(__i686) || defined(__pentium4) || defined(__athlon)	\
	|| defined(__k8) || defined(__core2))
-    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true,
-				       __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
+    return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement,
+				       false, __ATOMIC_ACQ_REL,
+				       __ATOMIC_ACQ_REL);
 #elif defined(__SUNPRO_CC)  defined(__sparc)
 return atomic_cas_64((volatile unsigned long long*)__ptr,
  __comparand, 

Re: [PATCH][ARM] Improve 64-bit shifts (non-NEON)

2012-02-10 Thread Richard Henderson
On 02/08/2012 08:28 AM, Andrew Stubbs wrote:
 Unfortunately, these don't work in Thumb mode (no IT block), and I'd have to 
 add arith-shift variants, I think, for ARM mode to work.
 
 Hmmm ... I'll try again.

Does it work to simply use branches initially, and rely on post-reload
ifcvt to transform to cond_exec when possible?


r~


Re: [PATCH] MIPS16 TLS support for GCC

2012-02-10 Thread Richard Henderson
On 02/04/2012 02:06 AM, Richard Sandiford wrote:
   Actually I had that idea of a link-once function too, but it turned out 
  quite complicated to do without rewriting some generic parts of GCC as it 
  is currently not prepared to emit link-once functions outside C++ 
  compilations.  It's been a while and I did lots of other stuff meanwhile, 
  so please excuse me if I got anything wrong here.
 Hmm, OK, I wouldn't have expected that.  But if you've tried making
 __mips16_rdhwr link-once and had a bad experience with it, then yeah,
 let's go with the hidden libgcc function.  It's just a shame that we're
 having to force static linking of libgcc for this one case.

The i386 target does it all the time for its __x86.get_pc_thunk.?x thingys.
So I really don't believe the "not prepared" comment.


r~


Re: [v3] Update alpha-linux baselines for GCC 4.7

2012-02-10 Thread Richard Henderson
On 02/06/2012 05:39 AM, Uros Bizjak wrote:
 2012-02-06  Uros Bizjak  ubiz...@gmail.com
 
 * config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Regenerated.

Ok.


r~


Re: [PATCH] Fix reg-stack DEBUG_INSN adjustments (PR debug/52132)

2012-02-10 Thread Richard Henderson
On 02/06/2012 05:06 AM, Jakub Jelinek wrote:
 2012-02-06  Jakub Jelinek  ja...@redhat.com
 
   PR debug/52132
   * reg-stack.c (subst_stack_regs_in_debug_insn): Don't use
   get_true_reg.
 
   * gcc.dg/pr52132.c: New test.

Ok.


r~


Re: Reinstalled patch for 37273

2012-02-10 Thread H.J. Lu
On Thu, Mar 17, 2011 at 1:09 PM, Jeff Law l...@redhat.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1


 I had pulled the patch for 37273 during the 4.6 cycle due to exposing
 several latent problems.  I've just reinstalled it and will (of course)
 keep an eye out for any problems.



This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52208


-- 
H.J.


Re: [Ada] Do not pass -Werror during linking

2012-02-10 Thread Duncan Sands

Hi Eric,


Can you try to extract a testcase (assuming it's just a single case?).
We shouldn't
warn for layout-compatible types (but we may do so if for example struct
nesting differs).


It's more basic than that: for example, we map pointers on the C side to
addresses (integers) on the Ada side.


is having Address be an integer useful any more?  Nowadays it should be possible
to declare Address to be an access type with no associated storage pool.  Even
nicer might be to have it be turned into void* when lowering to GCC types.
After all, Address is really used much like void*; see how GNAT declares malloc
for example.  Both of these possibilities would probably play better with GCC's
middle end type system, which considers integers to be different to pointers.

Ciao, Duncan.

PS: I first thought of this when I noticed (when using the dragonegg plugin to
compile Ada) that some of LLVM's optimizations bail out when they see integers
being used where they expect a pointer.  I tried tweaking the declaration of
Address to be an access type as mentioned above, but unfortunately that crashes
the compiler pretty quickly.


libgo patch committed: FFI promotes result types

2012-02-10 Thread Ian Lance Taylor
The FFI documentation doesn't mention it, but it turns out that the FFI
library promotes integer result types.  That is, if a function is
declared as returning short, FFI will actually store the result as an
int.  The code in libgo was getting away with not supporting this
because allocating one or two bytes would always permit storing four or
eight bytes, and because I was only testing on little-endian systems.
On big-endian systems this needs to be handled correctly.  This patch
does that.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Ran some but not all tests on
sparc-sun-solaris2.11.  Committed to mainline.
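
In caller terms the rule looks like this (a sketch, not code from the
patch; the cif, fn and args are assumed to have been prepared elsewhere
with ffi_prep_cif):

    #include <ffi.h>

    /* libffi stores small integer results as a full ffi_arg, so the
       result buffer must be word-sized and narrowed after the call.  */
    static short
    call_short (ffi_cif *cif, void (*fn) (void), void **args)
    {
      ffi_arg raw;                  /* full word, not a bare short */
      ffi_call (cif, fn, &raw, args);
      return (short) raw;           /* narrow to the declared type */
    }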

Ian

diff -r ff8b35faf485 libgo/runtime/go-reflect-call.c
--- a/libgo/runtime/go-reflect-call.c	Fri Feb 10 16:12:23 2012 -0800
+++ b/libgo/runtime/go-reflect-call.c	Fri Feb 10 23:03:30 2012 -0800
@@ -5,6 +5,7 @@
license that can be found in the LICENSE file.  */
 
 #include <stdio.h>
+#include <stdint.h>
 #include <stdlib.h>
 
 #include "ffi.h"
@@ -326,6 +327,28 @@
 
   types = (const struct __go_type_descriptor **) func->__out.__values;
 
+  /* A single integer return value is always promoted to a full
+ word.  */
+  if (count == 1)
+{
+      switch (types[0]->__code & GO_CODE_MASK)
+	{
+	case GO_BOOL:
+	case GO_INT8:
+	case GO_INT16:
+	case GO_INT32:
+	case GO_UINT8:
+	case GO_UINT16:
+	case GO_UINT32:
+	case GO_INT:
+	case GO_UINT:
+	  return sizeof (ffi_arg);
+
+	default:
+	  break;
+	}
+}
+
   off = 0;
   maxalign = 0;
   for (i = 0; i < count; ++i)
@@ -362,6 +385,81 @@
 
   types = (const struct __go_type_descriptor **) func->__out.__values;
 
+  /* A single integer return value is always promoted to a full
+ word.  */
+  if (count == 1)
+{
+      switch (types[0]->__code & GO_CODE_MASK)
+	{
+	case GO_BOOL:
+	case GO_INT8:
+	case GO_INT16:
+	case GO_INT32:
+	case GO_UINT8:
+	case GO_UINT16:
+	case GO_UINT32:
+	case GO_INT:
+	case GO_UINT:
+	  {
+	union
+	{
+	  unsigned char buf[sizeof (ffi_arg)];
+	  ffi_arg v;
+	} u;
+	ffi_arg v;
+
+	__builtin_memcpy (u.buf, call_result, sizeof (ffi_arg));
+	v = u.v;
+
+	    switch (types[0]->__size)
+	  {
+	  case 1:
+		{
+		  uint8_t b;
+
+		  b = (uint8_t) v;
+		  __builtin_memcpy (results[0], &b, 1);
+		}
+		break;
+
+	  case 2:
+		{
+		  uint16_t s;
+
+		  s = (uint16_t) v;
+		  __builtin_memcpy (results[0], &s, 2);
+		}
+		break;
+
+	  case 4:
+		{
+		  uint32_t w;
+
+		  w = (uint32_t) v;
+		  __builtin_memcpy (results[0], &w, 4);
+		}
+		break;
+
+	  case 8:
+		{
+		  uint64_t d;
+
+		  d = (uint64_t) v;
+		  __builtin_memcpy (results[0], &d, 8);
+		}
+		break;
+
+	  default:
+		abort ();
+	  }
+	  }
+	  return;
+
+	default:
+	  break;
+	}
+}
+
   off = 0;
   for (i = 0; i < count; ++i)
 {
@@ -388,7 +486,7 @@
   ffi_cif cif;
   unsigned char *call_result;
 
-  __go_assert (func_type->__common.__code == GO_FUNC);
+  __go_assert ((func_type->__common.__code & GO_CODE_MASK) == GO_FUNC);
   go_func_to_cif (func_type, is_interface, is_method, &cif);
 
   call_result = (unsigned char *) malloc (go_results_size (func_type));