Re: trans-mem: virtual ops for gimple_transaction
On Thu, 9 Feb 2012, Richard Henderson wrote:

From: rguenth at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org

the __transaction_atomic // SUBCODE=[ GTMA_HAVE_STORE ] statement looks like an overly optimistic way to start a transaction in my quick view.

Indeed. At some point this worked, but this may have gotten lost during one of the merges. Now,

# .MEM_8 = VDEF <.MEM_7(D)>
__transaction_relaxed // SUBCODE=[ ... ]

Bootstrapped and tested on x86_64. Ok?

Yes. But ...

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index ccdf14a..ace9ef9 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -965,6 +965,13 @@ propagate_necessity (struct edge_list *el)
 		    mark_aliased_reaching_defs_necessary (stmt, op);
 		}
 	    }
+	  else if (gimple_code (stmt) == GIMPLE_TRANSACTION)
+	    {
+	      /* The beginning of a transaction is a memory barrier.  */
+	      /* ??? If we were really cool, we'd only be a barrier
+		 for the memories touched within the transaction.  */
+	      mark_all_reaching_defs_necessary (stmt);
+	    }
 	  else
 	    gcc_unreachable ();

hints that this code might not expect virtual operands on this ... What is the reason to keep a GIMPLE_TRANSACTION stmt after TM lowering and not lower it to a builtin function call? It seems the body is empty after lowering (what's the label thing?). I'd have arranged it to lower

__transaction_atomic { { x[9] = 1; } }

to

__builtin_transaction_atomic (GTMA_HAVE_STORE, label);
try { x[9] = 1; }
finally { __builtin__ITM_commitTransaction (); }

eventually returning/passing a token to link a transaction start to its commit / abort. That said, I expect some more fallout from the patch, but I suppose TM is still experimental enough that we can fixup things later. Richard.
Re: Go patch committed: Build math library with -funsafe-math-optimizations
On Thu, Feb 9, 2012 at 6:32 PM, Ian Lance Taylor i...@google.com wrote: Richard Guenther richard.guent...@gmail.com writes: On Wed, Feb 8, 2012 at 8:38 PM, Ian Lance Taylor i...@google.com wrote: The master Go math library uses assembler code on 386 processors to take advantage of 387 instructions. This patch lets gccgo do the same thing, by compiling the math library with -funsafe-math-optimizations. I also pass -mfancy-math-387, although that is the default. It would not be appropriate to compile all Go code with -funsafe-math-optimizations, of course, but the math library is designed to handle it. Huh ... I'd rather not do that if I were you. Instead I'd say we lack a machine specific flag to enable the fancy-x87-math patterns which then -funsafe-math-optimizations should enable. The x87 math routines are the only thing you are after, right? No math-library can be _safe_ against -funsafe-math-optimizations I believe. Yes, that approach would make sense, but this doesn't seem like the right time to do it. The -funsafe-math-optimizations option does not permit arbitrary behaviour. It merely permits a set of optimizations which violate strict IEEE conformance. I believe the Go math library can be safe in the presence of those optimizations, because the library does explicit checks for NaN and infinity, where necessary, before it does the actual operation. The math library has a fairly extensive set of tests, including tests of exceptional conditions, and it passes the tests when using -funsafe-math-optimizations. Note that I'm only using -funsafe-math-optimizations on x86. I see. -funsafe-math-optimizations affects precision though (it doesn't assume NaNs or Infinities do not happen), see the docs - it enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math. Richard. Ian
PATCH: break lines in announce_function
Hello All,

When running cc1 in verbose mode, announce_function from gcc/toplev.c displays the name of each function. However, it never emits any newline characters. As a result, the stderr stream may contain very long lines, which is visually unpleasant and tends to block Emacs when running cc1 -v inside an Emacs buffer (or a gdb session inside Emacs). The attached patch (backported from the MELT branch, where it is very useful) fixes this issue by adding a newline once every eight identifiers.

gcc/ChangeLog entry

2012-02-10  Basile Starynkevitch  bas...@starynkevitch.net

	* toplev.c (announce_function): Emit newline periodically.

Ok for trunk? Thanks

-- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mines, sont seulement les miennes} ***

Index: gcc/toplev.c
===
--- gcc/toplev.c	(revision 184084)
+++ gcc/toplev.c	(working copy)
@@ -234,6 +234,12 @@ announce_function (tree decl)
 {
   if (!quiet_flag)
     {
+
+      static long count;
+      count++;
+      if (count % 8 == 0)
+	putc ('\n', stderr);
+
       if (rtl_dump_and_exit)
 	fprintf (stderr, "%s ", identifier_to_locale (IDENTIFIER_POINTER (DECL_NAME (decl))));
Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result
On Fri, Feb 10, 2012 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote: 2012/2/9 Richard Guenther richard.guent...@gmail.com: Works apart from Running target unix/ FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test Maybe invalid testcases. Who knows ... same fails happen with your patch. Richard.

Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some use of '*__i++ = *__i;' and '*__i-- = *__i;', which seems to cause part of this failure. I am not sure whether such constructs have defined behavior in C++, but it looks to me like undefined behavior. A C testcase for the issue would be:

int *foo (int *p)
{
  *p++ = *p;
  return p;
}

which with the patch now produces:

foo (int * p)
{
  int * D.1363;
  int D.1364;
  int * D.1365;

  D.1363 = p;
  p = D.1363 + 4;
  D.1364 = *p;
  *D.1363 = D.1364;
  D.1365 = p;
  return D.1365;
}

but the old variant produced:

foo (int * p)
{
  int D.1363;
  int * D.1364;

  D.1363 = *p;
  *p = D.1363;
  p = p + 4;
  D.1364 = p;
  return D.1364;
}

So maybe the real solution for this issue is to swap the order of lhs/rhs gimplification for assignments instead.

Well, that would certainly not be suitable for stage4 (though I have a working patch for that as well). The post-modify gimplification change looks better as it also fixes the volatile aggregate-return case which would not be fixed by re-ordering the gimplification. libstdc++ folks - can you investigate the testsuite failure? The patch in question is at http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00435/fix-pr48814 Note that the _Mutable_ForwardIteratorConcept isn't the problem I think, it's not executed code. Thanks, Richard. Regards, Kai
[PATCH] Fix PR52193
Committed as obvious. Richard.

2012-02-10  Richard Guenther  rguent...@suse.de

	PR translation/52193
	* cgraphunit.c (cgraph_mark_functions_to_output): Fix typo.

Index: cgraphunit.c
===
--- cgraphunit.c	(revision 184085)
+++ cgraphunit.c	(working copy)
@@ -1430,14 +1430,16 @@ cgraph_mark_functions_to_output (void)
 	  tree decl = node->decl;
 	  if (!node->global.inlined_to
 	      && gimple_has_body_p (decl)
-	      /* FIXME: in ltrans unit when offline copy is outside partition but inline copies
-		 are inside partition, we can end up not removing the body since we no longer
-		 have analyzed node pointing to it.  */
+	      /* FIXME: in an ltrans unit when the offline copy is outside a
+		 partition but inline copies are inside a partition, we can
+		 end up not removing the body since we no longer have an
+		 analyzed node pointing to it.  */
 	      && !node->in_other_partition
 	      && !DECL_EXTERNAL (decl))
 	    {
 	      dump_cgraph_node (stderr, node);
-	      internal_error ("failed to reclaim unneeded functionin same comdat group");
+	      internal_error ("failed to reclaim unneeded function in same "
+			      "comdat group");
 	    }
 	}
 #endif
[PATCH, ARM] Properly count number of instructions emitted.
This week I investigated why GCC trunk fails to compile test 186.crafty from SPECINT2000 on ARM. File validate.c compilation leads to this:

/tmp/ccXFsLlG.s: Assembler messages:
/tmp/ccXFsLlG.s:3411: Error: bad immediate value for offset (4112)
/tmp/ccXFsLlG.s:7069: Error: bad immediate value for offset (4096)

I found the patch which causes the problem: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01249.html Ramana, you forgot to set count = 2 in one case. This patch fixes the problem successfully. Cross-compiler regtest showed no new failures. All SPECINT2000 tests work correctly. Ok for trunk?

2012-02-10  Roman Zhuykov  zhr...@ispras.ru

	* config/arm/arm.c (output_move_double): In one case properly
	count number of instructions that will be emitted.
---
 gcc/config/arm/arm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e2ab102..7f0dc6b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14205,6 +14205,9 @@ output_move_double (rtx *operands, bool emit, int *count)
 	    output_asm_insn ("sub%?\t%0, %1, %2", otherops);
 	}
 
+      if (count)
+	*count = 2;
+
       if (TARGET_LDRD)
 	return "ldr%(d%)\t%0, [%1]";

-- Roman Zhuykov zhr...@ispras.ru
Re: [PATCH, ARM] Properly count number of instructions emitted.
On 10/02/12 11:39, Roman Zhuykov wrote: This weak I investigated GCC trunk fails to compile test 186.crafty from SPECINT2000 on ARM. File validate.c compilation leads to this: /tmp/ccXFsLlG.s: Assembler messages: /tmp/ccXFsLlG.s:3411: Error: bad immediate value for offset (4112) /tmp/ccXFsLlG.s:7069: Error: bad immediate value for offset (4096) I find a patch, which causes the problem. http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01249.html Ramana, you forgot to set count = 2 in one case. This patch fixes the problem successfully. Cross-compiler regtest showed no new failures. All SPECINT2000 tests work correctly. Ok for trunk? 2012-02-10 Roman Zhuykov zhr...@ispras.ru * config/arm/arm.c (output_move_double): In one case properly count number of instructions that will be emitted. OK. R.
PR middle-end/48600
Hi, this PR is about a bug in my update to predict_paths_for_bb when handling abnormal edges. Abnormal edges cannot be predicted, and thus the function tries to predict paths leading to the source BB of the edge instead. This however may lead to an infinite loop when two regions are mutually reachable by abnormals. Bootstrapped/regtested x86_64-linux. Will commit it shortly. My apologies for ignoring the PR for so long. Honza PR middle-end/48600 * predict.c (predict_paths_for_bb): Prevent looping. (predict_paths_leading_to_edge, predict_paths_leading_to): Update. * g++.dg/torture/pr48600.C: New testcase. Index: gcc/testsuite/g++.dg/torture/pr48600.C === *** gcc/testsuite/g++.dg/torture/pr48600.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr48600.C (revision 0) *** *** 0 --- 1,16 + /* { dg-do compile } */ + + class mx { + public: + mx(); + }; + + int main() + { + while (true) { + mx *bar = new mx; + mx *baz = new mx; + continue; + } + return 0; + } Index: gcc/predict.c === *** gcc/predict.c (revision 184016) --- gcc/predict.c (working copy) *** tree_estimate_probability_driver (void) *** 1827,1833 static void predict_paths_for_bb (basic_block cur, basic_block bb, enum br_predictor pred, ! enum prediction taken) { edge e; edge_iterator ei; --- 1827,1834 static void predict_paths_for_bb (basic_block cur, basic_block bb, enum br_predictor pred, ! enum prediction taken, ! bitmap visited) { edge e; edge_iterator ei; *** predict_paths_for_bb (basic_block cur, b *** 1848,1854 continue; gcc_assert (bb == cur || dominated_by_p (CDI_POST_DOMINATORS, cur, bb)); ! /* See if there is how many edge from e-src that is not abnormal and does not lead to BB. */ FOR_EACH_EDGE (e2, ei2, e-src-succs) if (e2 != e --- 1849,1855 continue; gcc_assert (bb == cur || dominated_by_p (CDI_POST_DOMINATORS, cur, bb)); ! /* See if there is an edge from e-src that is not abnormal and does not lead to BB. 
*/ FOR_EACH_EDGE (e2, ei2, e-src-succs) if (e2 != e *** predict_paths_for_bb (basic_block cur, b *** 1861,1876 /* If there is non-abnormal path leaving e-src, predict edge using predictor. Otherwise we need to look for paths !leading to e-src. */ if (found) predict_edge_def (e, pred, taken); ! else ! predict_paths_for_bb (e-src, e-src, pred, taken); } for (son = first_dom_son (CDI_POST_DOMINATORS, cur); son; son = next_dom_son (CDI_POST_DOMINATORS, son)) ! predict_paths_for_bb (son, bb, pred, taken); } /* Sets branch probabilities according to PREDiction and --- 1862,1881 /* If there is non-abnormal path leaving e-src, predict edge using predictor. Otherwise we need to look for paths !leading to e-src. ! !The second may lead to infinite loop in the case we are predicitng !regions that are only reachable by abnormal edges. We simply !prevent visiting given BB twice. */ if (found) predict_edge_def (e, pred, taken); ! else if (!bitmap_set_bit (visited, e-src-index)) ! predict_paths_for_bb (e-src, e-src, pred, taken, visited); } for (son = first_dom_son (CDI_POST_DOMINATORS, cur); son; son = next_dom_son (CDI_POST_DOMINATORS, son)) ! predict_paths_for_bb (son, bb, pred, taken, visited); } /* Sets branch probabilities according to PREDiction and *** static void *** 1880,1886 predict_paths_leading_to (basic_block bb, enum br_predictor pred, enum prediction taken) { ! predict_paths_for_bb (bb, bb, pred, taken); } /* Like predict_paths_leading_to but take edge instead of basic block. */ --- 1885,1893 predict_paths_leading_to (basic_block bb, enum br_predictor pred, enum prediction taken) { ! bitmap visited = BITMAP_ALLOC (NULL); ! predict_paths_for_bb (bb, bb, pred, taken, visited); ! BITMAP_FREE (visited); } /* Like predict_paths_leading_to but take edge instead of basic block. */ *** predict_paths_leading_to_edge (edge e, e *** 1903,1909 break; } if (!has_nonloop_edge) ! 
predict_paths_for_bb (bb, bb, pred, taken); else predict_edge_def (e, pred, taken); } --- 1910,1920 break; } if (!has_nonloop_edge) ! { ! bitmap visited = BITMAP_ALLOC (NULL); ! predict_paths_for_bb (bb, bb, pred, taken, visited);
Re: PR middle-end/48600
On Fri, Feb 10, 2012 at 01:00:16PM +0100, Jan Hubicka wrote: Does this belong to the patch? It isn't mentioned in the ChangeLog, nor do I see how it is related... *** libcpp/macro.c(revision 184016) --- libcpp/macro.c(working copy) *** tokens_buff_last_token_ptr (_cpp_buff *b *** 1878,1884 If VIRT_LOCS_BUFF is non-NULL, it should point at the buffer containing the virtual locations of the tokens in TOKENS_BUFF; in which case the function updates that buffer as well. */ ! static inline void tokens_buff_remove_last_token (_cpp_buff *tokens_buff) { --- 1878,1884 If VIRT_LOCS_BUFF is non-NULL, it should point at the buffer containing the virtual locations of the tokens in TOKENS_BUFF; in which case the function updates that buffer as well. */ ! static void tokens_buff_remove_last_token (_cpp_buff *tokens_buff) { Jakub
Re: [PATCH] Fix PR50031, take 2
On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Following is a revision of yesterday's PR50031 patch submission, modified per Richard's comments. Bootstrapped and tested with no regressions on powerpc64-linux. I've confirmed the same performance improvements in SPEC. OK for trunk? Some more questions - maybe Jakub can clarify as well given he worked on patterns recently ... Thanks, Bill 2012-02-09 Bill Schmidt wschm...@linux.vnet.ibm.com Ira Rosen i...@il.ibm.com PR tree-optimization/50031 * targhooks.c (default_builtin_vectorization_cost): Handle vec_promote_demote. * target.h (enum vect_cost_for_stmt): Add vec_promote_demote. * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle all types of reduction and pattern statements. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function. (vect_get_load_cost): Use vec_perm for permutations; add dump logic for explicit realigns. (vectorizable_conversion): Call vect_model_promotion_demotion_cost. * config/spu/spu.c (spu_builtin_vectorization_cost): Handle vec_promote_demote. * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update vec_perm for VSX and handle vec_promote_demote. Index: gcc/targhooks.c === --- gcc/targhooks.c (revision 183944) +++ gcc/targhooks.c (working copy) @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost case scalar_to_vec: case cond_branch_not_taken: case vec_perm: + case vec_promote_demote: return 1; case unaligned_load: Index: gcc/target.h === --- gcc/target.h (revision 183944) +++ gcc/target.h (working copy) @@ -145,7 +145,8 @@ enum vect_cost_for_stmt scalar_to_vec, cond_branch_not_taken, cond_branch_taken, - vec_perm + vec_perm, + vec_promote_demote }; /* The target structure. This holds all the backend hooks. 
*/ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 183944) +++ gcc/tree-vect-loop.c (working copy) @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf if (stmt_info !STMT_VINFO_RELEVANT_P (stmt_info) (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) + !STMT_VINFO_IN_PATTERN_P (stmt_info)) continue; Why would we exclude !relevant stmts when they are part of a pattern? We are looking at a scalar iteration cost, so all stmts that are not dead count, no? if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info { gimple stmt = gsi_stmt (si); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + /* Translate the last statement in a pattern into the + related replacement statement. */ + if (STMT_VINFO_IN_PATTERN_P (stmt_info)) + { + stmt = STMT_VINFO_RELATED_STMT (stmt_info); + stmt_info = vinfo_for_stmt (stmt); + } So here we are tanslating stmt to the main scalar pattern stmt - and thus count it as many times as we have stmts in that pattern? That looks wrong. More like if (STMT_VINFO_IN_PATTERN_P (stmt_info) STMT_VINFO_RELATED_STMT (stmt_info) != stmt) continue; ? Does the main stmt has the flag set and points to itself? /* Skip stmts that are not vectorized inside the loop. */ if (!STMT_VINFO_RELEVANT_P (stmt_info) (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info continue; + ... and then, what does STMT_VINFO_INSIDE_OF_LOOP_COST of that main pattern stmt represent? Shouldn't it represent the cost of the whole sequence, and thus ... vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor; /* FIXME: for stmts in the inner-loop in outer-loop vectorization, some of the outside costs are generated inside the outer-loop. 
*/ vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info); + if (is_pattern_stmt_p (stmt_info) + STMT_VINFO_PATTERN_DEF_SEQ (stmt_info)) + { +
Re: [SMS] Support new loop pattern
Ping. Ayal, please review this patch and these three patches too: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html -- Roman Zhuykov zhr...@ispras.ru
Re: PR middle-end/48600
On Fri, Feb 10, 2012 at 01:00:16PM +0100, Jan Hubicka wrote: Does this belong to the patch? It isn't mentioned in the ChangeLog, nor do I see how it is related... No, it does not. I diffed the toplevel dir and this is a hack for some warning that got stuck there for months. It is not what I committed. Sorry for the mess-up. Honza
Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result
Kai Tietz ktiet...@googlemail.com writes: Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some use of '*__i++ = *__i;' and '*__i-- = *__i;', which seems to cause part of this failure. I don't think those __constraints functions are ever executed. They are only present to assert the presence of certain operations at compile time. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result
On 10 February 2012 10:35, Richard Guenther wrote: Works apart from Running target unix/ FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test What does libstdc++.log show for those failures?
Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result
On Fri, Feb 10, 2012 at 2:12 PM, Jonathan Wakely jwakely@gmail.com wrote: On 10 February 2012 10:35, Richard Guenther wrote: Works apart from Running target unix/ FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test What does libstdc++.log show for those failures? spawn [open ...]^M ?xml version = 1.0? test sd value = 1328880244 /sd n value = 5000 /n m value = 1 /m tp value = 0.2 /tp ip value = 0.6 /ip ep value = 0.2 /ep cp value = 0.001 /cp mp value = 0.25 /mp /test cntnr name = pat_trie_map desc type value = trie Tag value = pat_trie_tag /Tag Node_Update value = null_node_update /Node_Update /type /desc progress *FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test spawn [open ...]^M ?xml version = 1.0? test sd value = 1328880487 /sd n value = 5000 /n m value = 1 /m tp value = 0.2 /tp ip value = 0.6 /ip ep value = 0.2 /ep cp value = 0.001 /cp mp value = 0.25 /mp /test cntnr name = pat_trie_set desc type value = trie Tag value = pat_trie_tag /Tag Node_Update value = null_node_update /Node_Update /type /desc progress **FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test
Re: [PATCH] Fix PR50031, take 2
Richard, thanks. I can answer most of your questions, but for the last one I will have to ask Ira to weigh in. On Fri, 2012-02-10 at 13:06 +0100, Richard Guenther wrote: On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Following is a revision of yesterday's PR50031 patch submission, modified per Richard's comments. Bootstrapped and tested with no regressions on powerpc64-linux. I've confirmed the same performance improvements in SPEC. OK for trunk? Some more questions - maybe Jakub can clarify as well given he worked on patterns recently ... Thanks, Bill 2012-02-09 Bill Schmidt wschm...@linux.vnet.ibm.com Ira Rosen i...@il.ibm.com PR tree-optimization/50031 * targhooks.c (default_builtin_vectorization_cost): Handle vec_promote_demote. * target.h (enum vect_cost_for_stmt): Add vec_promote_demote. * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle all types of reduction and pattern statements. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function. (vect_get_load_cost): Use vec_perm for permutations; add dump logic for explicit realigns. (vectorizable_conversion): Call vect_model_promotion_demotion_cost. * config/spu/spu.c (spu_builtin_vectorization_cost): Handle vec_promote_demote. * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update vec_perm for VSX and handle vec_promote_demote. 
Index: gcc/targhooks.c === --- gcc/targhooks.c (revision 183944) +++ gcc/targhooks.c (working copy) @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost case scalar_to_vec: case cond_branch_not_taken: case vec_perm: + case vec_promote_demote: return 1; case unaligned_load: Index: gcc/target.h === --- gcc/target.h(revision 183944) +++ gcc/target.h(working copy) @@ -145,7 +145,8 @@ enum vect_cost_for_stmt scalar_to_vec, cond_branch_not_taken, cond_branch_taken, - vec_perm + vec_perm, + vec_promote_demote }; /* The target structure. This holds all the backend hooks. */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 183944) +++ gcc/tree-vect-loop.c(working copy) @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf if (stmt_info !STMT_VINFO_RELEVANT_P (stmt_info) (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) + !STMT_VINFO_IN_PATTERN_P (stmt_info)) continue; Why would we exclude !relevant stmts when they are part of a pattern? We are looking at a scalar iteration cost, so all stmts that are not dead count, no? As I understand it, we're at a point where a statement replacing the pattern exists but has not yet been inserted in the code stream. All of the statements in the pattern are marked irrelevant, but the related statement of the main (last) statement of the pattern is relevant. Thus we need to allow the main statement through this check so the replacement statement can be found and counted. if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info { gimple stmt = gsi_stmt (si); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + /* Translate the last statement in a pattern into the +related replacement statement. 
*/ + if (STMT_VINFO_IN_PATTERN_P (stmt_info)) + { + stmt = STMT_VINFO_RELATED_STMT (stmt_info); + stmt_info = vinfo_for_stmt (stmt); + } So here we are tanslating stmt to the main scalar pattern stmt - and thus count it as many times as we have stmts in that pattern? That looks wrong. More like if (STMT_VINFO_IN_PATTERN_P (stmt_info) STMT_VINFO_RELATED_STMT (stmt_info) != stmt) continue; ? Does the main stmt has the flag set and points to itself? From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once. /*
Re: [PATCH] Fix PR50031, take 2
On Fri, 2012-02-10 at 07:31 -0600, William J. Schmidt wrote: Richard, thanks. I can answer most of your questions, but for the last one I will have to ask Ira to weigh in. On Fri, 2012-02-10 at 13:06 +0100, Richard Guenther wrote: On Thu, Feb 9, 2012 at 5:56 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Following is a revision of yesterday's PR50031 patch submission, modified per Richard's comments. Bootstrapped and tested with no regressions on powerpc64-linux. I've confirmed the same performance improvements in SPEC. OK for trunk? Some more questions - maybe Jakub can clarify as well given he worked on patterns recently ... Thanks, Bill 2012-02-09 Bill Schmidt wschm...@linux.vnet.ibm.com Ira Rosen i...@il.ibm.com PR tree-optimization/50031 * targhooks.c (default_builtin_vectorization_cost): Handle vec_promote_demote. * target.h (enum vect_cost_for_stmt): Add vec_promote_demote. * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle all types of reduction and pattern statements. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function. (vect_get_load_cost): Use vec_perm for permutations; add dump logic for explicit realigns. (vectorizable_conversion): Call vect_model_promotion_demotion_cost. * config/spu/spu.c (spu_builtin_vectorization_cost): Handle vec_promote_demote. * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update vec_perm for VSX and handle vec_promote_demote. 
Index: gcc/targhooks.c === --- gcc/targhooks.c (revision 183944) +++ gcc/targhooks.c (working copy) @@ -514,6 +514,7 @@ default_builtin_vectorization_cost (enum vect_cost case scalar_to_vec: case cond_branch_not_taken: case vec_perm: + case vec_promote_demote: return 1; case unaligned_load: Index: gcc/target.h === --- gcc/target.h(revision 183944) +++ gcc/target.h(working copy) @@ -145,7 +145,8 @@ enum vect_cost_for_stmt scalar_to_vec, cond_branch_not_taken, cond_branch_taken, - vec_perm + vec_perm, + vec_promote_demote }; /* The target structure. This holds all the backend hooks. */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 183944) +++ gcc/tree-vect-loop.c(working copy) @@ -2417,7 +2417,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf if (stmt_info !STMT_VINFO_RELEVANT_P (stmt_info) (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) + !STMT_VINFO_IN_PATTERN_P (stmt_info)) continue; Why would we exclude !relevant stmts when they are part of a pattern? We are looking at a scalar iteration cost, so all stmts that are not dead count, no? As I understand it, we're at a point where a statement replacing the pattern exists but has not yet been inserted in the code stream. All of the statements in the pattern are marked irrelevant, but the related statement of the main (last) statement of the pattern is relevant. Thus we need to allow the main statement through this check so the replacement statement can be found and counted. if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) @@ -2564,15 +2565,48 @@ vect_estimate_min_profitable_iters (loop_vec_info { gimple stmt = gsi_stmt (si); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + /* Translate the last statement in a pattern into the +related replacement statement. 
*/ + if (STMT_VINFO_IN_PATTERN_P (stmt_info)) + { + stmt = STMT_VINFO_RELATED_STMT (stmt_info); + stmt_info = vinfo_for_stmt (stmt); + } So here we are tanslating stmt to the main scalar pattern stmt - and thus count it as many times as we have stmts in that pattern? That looks wrong. More like if (STMT_VINFO_IN_PATTERN_P (stmt_info) STMT_VINFO_RELATED_STMT (stmt_info) != stmt) continue; ? Does the main stmt has the flag set and points to itself? From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is
Re: [PATCH] Fix PR50031, take 2
On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote: From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once. STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not just the last, but all original stmts that are being replaced by the pattern). Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt (last one corresponding to that original stmt). If needed, further pattern stmts for that original stmt are put into STMT_VINFO_PATTERN_DEF_SEQ. The pattern matcher could e.g. match 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT on the first original stmt, then say 2 pattern stmts into STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one STMT_VINFO_RELATED_STMT and finally one pattern stmt through STMT_VINFO_RELATED_STMT on the third original stmt. Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so this section has to be omitted if we backport it (which is desirable since the degradation was introduced in 4.6). Removing it apparently does not affect the sphinx3 degradation. Yeah, when backporting you can basically assume that STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead of it (and simplify), because no pattern recognizers needed more than one pattern stmt for each original stmt back then. Jakub
Re: Go patch committed: Multiplex goroutines onto OS threads
Ian Lance Taylor i...@google.com writes: Ian Lance Taylor i...@google.com writes: Right now it looks like there is a bug, or at least an incompatibility, in the 64-bit versions of getcontext and setcontext. It looks like calling setcontext on the 32-bit version does not change the value of TLS variables, which is also the case on GNU/Linux. In the 64-bit version, calling setcontext does change the value of TLS variables. That is, on the 64-bit version, getcontext preserves the value of TLS variables and setcontext restores the old value. (Of course it's really the pointer, not the TLS variables themselves). This incompatibility has to be a bug, and of course I would prefer that the incompatibility be resolved in favor of the implementation used on GNU/Linux. Well, I thought I could fix this in a sensible way, but I really couldn't. It's a fairly serious bug. Calling setcontext in thread T2 when getcontext was called in thread T1 causes T2 to start using T1's TLS variables, T1's pthread_getspecific values, and even T1's pthread_self. I couldn't figure out any way that T2 could do anything thread-specific. So I cheated and wrote a system-specific hack. With this patch applied, I see only 4 failures running the testsuite on x86/x86_64 Solaris. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu, where the code is not used. Committed to mainline. Excellent, works like a charm, thanks. The patches for Solaris 10 and 11 are expected within the next month or two. 
I only noticed two problems: * On Solaris 8/x86 with native TLS, SETCONTEXT_CLOBBERS_TLS was incorrectly defined, causing a proc.c compilation failure:

/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:105:4: error: #error unknown case for SETCONTEXT_CLOBBERS_TLS
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_gogo':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:299:2: error: implicit declaration of function 'fixcontext' [-Werror=implicit-function-declaration]
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_schedinit':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:366:2: error: implicit declaration of function 'initcontext' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[4]: *** [proc.lo] Error 1

The configure test failed like this:

configure:14894: ./conftest
ld.so.1: conftest: fatal: conftest: object requires TLS, but TLS failed to initialize
/vol/gcc/src/hg/trunk/local/libgo/configure[20]: 28491 Killed

The problem is that on Solaris 8 one needs to link with -pthread, which takes care of finding the correct libthread.so. The following patch takes care of that and gives decent testsuite results. * In the cross-compilation case, one needs to test both the i?86-*-solaris2.1[01] and x86_64-*-solaris2.1[01] target triplets, but only for 64-bit. The first is the default, while one can configure for the second if need be. The patch contains the necessary change, adapted from libgcc/configure.ac's type size detection. Besides, the bug affects both Solaris 10 and 11, so the workaround should trigger on either, too. Rainer

diff --git a/libgo/configure.ac b/libgo/configure.ac
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -584,8 +584,12 @@ fi
 
 dnl See whether setcontext changes the value of TLS variables.
 AC_CACHE_CHECK([whether setcontext clobbers TLS variables],
 [libgo_cv_lib_setcontext_clobbers_tls],
-[LIBS_hold=$LIBS
+[CFLAGS_hold=$CFLAGS
+CFLAGS=$PTHREAD_CFLAGS
+LIBS_hold=$LIBS
 LIBS=$LIBS $PTHREAD_LIBS
+AC_CHECK_SIZEOF([void *])
+AS_VAR_ARITH([ptr_type_size], [$ac_cv_sizeof_void_p \* 8])
 AC_RUN_IFELSE(
 [AC_LANG_SOURCE([
 #include <pthread.h>
@@ -650,11 +654,14 @@ main ()
 ])],
 [libgo_cv_lib_setcontext_clobbers_tls=no],
 [libgo_cv_lib_setcontext_clobbers_tls=yes],
-[case $target in
-  x86_64*-*-solaris2.10) libgo_cv_lib_setcontext_clobbers_tls=yes ;;
-  *) libgo_cv_lib_setcontext_clobbers_tls=no ;;
+[case $target:$ptr_type_size in
+  i?86-*-solaris2.1[[01]]:64 | x86_64*-*-solaris2.1[[01]]:64)
+    libgo_cv_lib_setcontext_clobbers_tls=yes ;;
+  *)
+    libgo_cv_lib_setcontext_clobbers_tls=no ;;
 esac
 ])
+CFLAGS=$CFLAGS_hold
 LIBS=$LIBS_hold
 ])
 if test $libgo_cv_lib_setcontext_clobbers_tls = yes; then

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Release virtual SSA names properly
I've played with

Index: tree-ssa.c
===
--- tree-ssa.c	(revision 184088)
+++ tree-ssa.c	(working copy)
@@ -944,6 +944,11 @@ verify_ssa (bool check_modified_stmt)
       if (!gimple_nop_p (stmt))
 	{
 	  basic_block bb = gimple_bb (stmt);
+	  if (!bb)
+	    {
+	      error ("Unreleased SSA name");
+	      goto err;
+	    }
 	  verify_def (bb, definition_block, name, stmt, !is_gimple_reg (name));

and noticed we do not properly free a lot of virtual SSA names, leaking both SSA names, stmts and operands they point to over time. The following patch cures the cases I hit with running the tree-ssa.exp testsuite. More issues exist, and I guess we should make a more verbose variant of the above the default in 4.8 to catch these issues (there is a slight issue with this when verify_ssa is invoked during loop transforms and the vectorizer has pattern stmts built that are not in a basic block ... so maybe we should put the above into some other verifier). Bootstrapped on x86_64-unknown-linux-gnu, testing is currently in progress. I'll commit this unless somebody thinks it's a really bad idea at this point (I do not have numbers for the amount of memory we save - once we have fully compiled a function and we do not need its body for further inlining we throw it away and with it all leaked SSA names). Thanks, Richard.

2012-02-10  Richard Guenther  rguent...@suse.de

	* tree-nrv.c (tree_nrv): Release VDEFs.
	* tree-inline.c (expand_call_inline): Likewise.
	* tree-sra.c (sra_modify_constructor_assign): Likewise.
	(sra_modify_assign): Likewise.
	* tree-vect-stmts.c (vect_remove_stores): Likewise.
	* tree-vect-loop.c (vect_transform_loop): Likewise.
	* tree-ssa-dom.c (optimize_stmt): Likewise.
	* tree-vect-slp.c (vect_schedule_slp): Likewise.
	* tree-ssa-math-opts.c (execute_cse_sincos): Likewise.
Index: gcc/tree-nrv.c
===
--- gcc/tree-nrv.c	(revision 184088)
+++ gcc/tree-nrv.c	(working copy)
@@ -244,6 +244,7 @@ tree_nrv (void)
 	    {
 	      unlink_stmt_vdef (stmt);
 	      gsi_remove (&gsi, true);
+	      release_defs (stmt);
 	    }
 	  else
 	    {
Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c	(revision 184088)
+++ gcc/tree-inline.c	(working copy)
@@ -4002,6 +4002,9 @@ expand_call_inline (basic_block bb, gimp
   /* Unlink the calls virtual operands before replacing it.  */
   unlink_stmt_vdef (stmt);
+  if (gimple_vdef (stmt)
+      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
+    release_ssa_name (gimple_vdef (stmt));
 
   /* If the inlined function returns a result that we care about,
      substitute the GIMPLE_CALL with an assignment of the return
@@ -4043,7 +4046,7 @@ expand_call_inline (basic_block bb, gimp
 	    }
 	}
       else
-        gsi_remove (&stmt_gsi, true);
+	gsi_remove (&stmt_gsi, true);
     }
 
   if (purge_dead_abnormal_edges)
Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c	(revision 184088)
+++ gcc/tree-sra.c	(working copy)
@@ -2858,6 +2858,7 @@ sra_modify_constructor_assign (gimple *s
 	{
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  return SRA_AM_REMOVED;
 	}
       else
@@ -2881,6 +2882,7 @@ sra_modify_constructor_assign (gimple *s
 	  init_subtree_with_zero (acc, gsi, false, loc);
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  return SRA_AM_REMOVED;
 	}
       else
@@ -3125,6 +3127,7 @@ sra_modify_assign (gimple *stmt, gimple_
 	  gsi_next (gsi);
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (&orig_gsi, true);
+	  release_defs (*stmt);
 	  sra_stats.deleted++;
 	  return SRA_AM_REMOVED;
 	}
@@ -3145,6 +3148,7 @@ sra_modify_assign (gimple *stmt, gimple_
 	  gcc_assert (*stmt == gsi_stmt (*gsi));
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  sra_stats.deleted++;
 	  return SRA_AM_REMOVED;
 	}
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c	(revision 184088)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -5596,7 +5596,9 @@ vect_remove_stores (gimple first_stmt)
       next = STMT_VINFO_RELATED_STMT (stmt_info);
       /* Free the attached stmt_vec_info and remove the stmt.  */
       next_si = gsi_for_stmt (next);
+      unlink_stmt_vdef (next);
       gsi_remove (&next_si, true);
+
Re: Go patch committed: Multiplex goroutines onto OS threads
Rainer Orth r...@cebitec.uni-bielefeld.de writes: * On Solaris 8/x86 with native TLS, SETCONTEXT_CLOBBERS_TLS was incorrectly defined, causing a proc.c compilation failure:

/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:105:4: error: #error unknown case for SETCONTEXT_CLOBBERS_TLS
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_gogo':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:299:2: error: implicit declaration of function 'fixcontext' [-Werror=implicit-function-declaration]
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c: In function 'runtime_schedinit':
/vol/gcc/src/hg/trunk/local/libgo/runtime/proc.c:366:2: error: implicit declaration of function 'initcontext' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[4]: *** [proc.lo] Error 1

The configure test failed like this:

configure:14894: ./conftest
ld.so.1: conftest: fatal: conftest: object requires TLS, but TLS failed to initialize
/vol/gcc/src/hg/trunk/local/libgo/configure[20]: 28491 Killed

The problem is that on Solaris 8 one needs to link with -pthread, which takes care of finding the correct libthread.so. The following patch takes care of that and gives decent testsuite results. * In the cross-compilation case, one needs to test both the i?86-*-solaris2.1[01] and x86_64-*-solaris2.1[01] target triplets, but only for 64-bit. The first is the default, while one can configure for the second if need be. The patch contains the necessary change, adapted from libgcc/configure.ac's type size detection. Besides, the bug affects both Solaris 10 and 11, so the workaround should trigger on either, too. Thanks. Committed. Ian
Re: Go patch committed: Build math library with -funsafe-math-optimizations
Richard Guenther richard.guent...@gmail.com writes: On Thu, Feb 9, 2012 at 6:32 PM, Ian Lance Taylor i...@google.com wrote: Richard Guenther richard.guent...@gmail.com writes: On Wed, Feb 8, 2012 at 8:38 PM, Ian Lance Taylor i...@google.com wrote: The master Go math library uses assembler code on 386 processors to take advantage of 387 instructions. This patch lets gccgo do the same thing, by compiling the math library with -funsafe-math-optimizations. I also pass -mfancy-math-387, although that is the default. It would not be appropriate to compile all Go code with -funsafe-math-optimizations, of course, but the math library is designed to handle it. Huh ... I'd rather not do that if I were you. Instead I'd say we lack a machine specific flag to enable the fancy-x87-math patterns which then -funsafe-math-optimizations should enable. The x87 math routines are the only thing you are after, right? No math-library can be _safe_ against -funsafe-math-optimizations I believe. Yes, that approach would make sense, but this doesn't seem like the right time to do it. The -funsafe-math-optimizations option does not permit arbitrary behaviour. It merely permits a set of optimizations which violate strict IEEE conformance. I believe the Go math library can be safe in the presence of those optimizations, because the library does explicit checks for NaN and infinity, where necessary, before it does the actual operation. The math library has a fairly extensive set of tests, including tests of exceptional conditions, and it passes the tests when using -funsafe-math-optimizations. Note that I'm only using -funsafe-math-optimizations on x86. I see. -funsafe-math-optimizations affects precision though (it doesn't assume NaNs or Infinities do not happen), see the docs - it enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math. 
Affecting precision is OK for the Go math library, since the master version of the library is written to use the 80-bit 80387 registers on 386, in order to use the magic 80387 instructions. Ian
[PATCH] Fix __atomic_{always,is}_lock_free folding (PR middle-end/52177)
Hi! These builtins are:

DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_ALWAYS_LOCK_FREE,
		  "__atomic_always_lock_free",
		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_IS_LOCK_FREE,
		  "__atomic_is_lock_free",
		  BT_FN_BOOL_SIZE_CONST_VPTR, ATTR_CONST_NOTHROW_LEAF_LIST)

therefore return bool rather than int, but apparently the folders returned integer constants anyway, which e.g. on the following testcase leads to checking ICEs. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-02-10  Jakub Jelinek  ja...@redhat.com

	PR middle-end/52177
	* builtins.c (fold_builtin_atomic_always_lock_free,
	expand_builtin_atomic_always_lock_free,
	fold_builtin_atomic_is_lock_free,
	expand_builtin_atomic_is_lock_free): Return and/or test
	boolean_true_node/boolean_false_node instead of
	integer_one_node/integer_zero_node.

	* c-c++-common/pr52177.c: New test.

--- gcc/builtins.c.jj	2012-01-30 00:10:01.000000000 +0100
+++ gcc/builtins.c	2012-02-10 09:37:37.719936106 +0100
@@ -5639,15 +5639,15 @@ fold_builtin_atomic_always_lock_free (tr
   /* If the object has smaller alignment, the lock free routines cannot
      be used.  */
   if (type_align < mode_align)
-    return integer_zero_node;
+    return boolean_false_node;
 
   /* Check if a compare_and_swap pattern exists for the mode which represents
      the required size.  The pattern is not allowed to fail, so the existence
      of the pattern indicates support is present.  */
   if (can_compare_and_swap_p (mode, true))
-    return integer_one_node;
+    return boolean_true_node;
   else
-    return integer_zero_node;
+    return boolean_false_node;
 }
 
 /* Return true if the parameters to call EXP represent an object which will
@@ -5671,7 +5671,7 @@ expand_builtin_atomic_always_lock_free (
     }
 
   size = fold_builtin_atomic_always_lock_free (arg0, arg1);
-  if (size == integer_one_node)
+  if (size == boolean_true_node)
     return const1_rtx;
   return const0_rtx;
 }
@@ -5686,8 +5686,8 @@ fold_builtin_atomic_is_lock_free (tree a
     return NULL_TREE;
 
   /* If it isn't always lock free, don't generate a result.  */
-  if (fold_builtin_atomic_always_lock_free (arg0, arg1) == integer_one_node)
-    return integer_one_node;
+  if (fold_builtin_atomic_always_lock_free (arg0, arg1) == boolean_true_node)
+    return boolean_true_node;
   return NULL_TREE;
 }
@@ -5717,7 +5717,7 @@ expand_builtin_atomic_is_lock_free (tree
   /* If the value is known at compile time, return the RTX for it.  */
   size = fold_builtin_atomic_is_lock_free (arg0, arg1);
-  if (size == integer_one_node)
+  if (size == boolean_true_node)
     return const1_rtx;
 
   return NULL_RTX;
--- gcc/testsuite/c-c++-common/pr52177.c.jj	2012-02-10 09:46:27.175770110 +0100
+++ gcc/testsuite/c-c++-common/pr52177.c	2012-02-10 09:46:01.000000000 +0100
@@ -0,0 +1,23 @@
+/* PR middle-end/52177 */
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-ccp" } */
+
+int *s;
+
+static inline int
+foo ()
+{
+  return sizeof (int);
+}
+
+int
+bar ()
+{
+  return __atomic_always_lock_free (foo (), s);
+}
+
+int
+baz ()
+{
+  return __atomic_is_lock_free (foo (), s);
+}

Jakub
[PATCH] Fix cgraph verification (PR middle-end/51929)
Hi! As discussed in the PR, this is a verification ICE: if a call stmt calls a same_body_alias (__comp_ctor) and IPA-CP decides to clone the ctor, it clones the function with the body (__base_ctor), but before the call stmts are updated, the checking fails because the clone isn't a clone or former clone of the called function, but its alias. Fixed by adjusting the verifier, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-02-10  Jakub Jelinek  ja...@redhat.com

	PR middle-end/51929
	* cgraphunit.c (verify_edge_corresponds_to_fndecl): If node is
	a same_body_alias, also test whether e->callee isn't a former
	or current clone of the decl this is a same body alias of.
	(verify_cgraph_node): Fix a typo in error message.

	* g++.dg/ipa/pr51929.C: New test.

--- gcc/cgraphunit.c.jj	2012-02-03 13:31:41.000000000 +0100
+++ gcc/cgraphunit.c	2012-02-10 12:35:44.682099203 +0100
@@ -471,11 +471,17 @@ verify_edge_corresponds_to_fndecl (struc
     return false;
   node = cgraph_function_or_thunk_node (node, NULL);
 
-  if ((e->callee->former_clone_of != node->decl)
+  if ((e->callee->former_clone_of != node->decl
+       && (!node->same_body_alias
+	   || e->callee->former_clone_of != node->thunk.alias))
       /* IPA-CP sometimes redirect edge to clone and then back to the former
-	 function.  This ping-pong has to go, eventaully.  */
+	 function.  This ping-pong has to go, eventually.  */
      && (node != cgraph_function_or_thunk_node (e->callee, NULL))
-      && !clone_of_p (node, e->callee))
+      && !clone_of_p (node, e->callee)
+      /* If decl is a same body alias of some other decl, allow e->callee to be
+	 a clone of a clone of that other decl too.  */
+      && (!node->same_body_alias
+	  || !clone_of_p (cgraph_get_node (node->thunk.alias), e->callee)))
    return true;
  else
    return false;
@@ -667,7 +673,7 @@ verify_cgraph_node (struct cgraph_node *
   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
     if (ref->use != IPA_REF_ALIAS)
       {
-	error ("Alias has non-alias refernece");
+	error ("Alias has non-alias reference");
	error_found = true;
      }
    else if (ref_found)
--- gcc/testsuite/g++.dg/ipa/pr51929.C.jj	2012-02-10 12:36:55.792748549 +0100
+++ gcc/testsuite/g++.dg/ipa/pr51929.C	2012-02-09 19:42:10.000000000 +0100
@@ -0,0 +1,32 @@
+// PR middle-end/51929
+// { dg-do compile }
+// { dg-options "-O -fno-guess-branch-probability -fipa-cp -fipa-cp-clone --param=max-inline-insns-single=25" }
+
+struct A
+{
+  A (A, unsigned);
+  A (const char *);
+  ~A () { a1 (a4 ()); }
+  void a1 (int);
+  unsigned a2 ();
+  char *a3 ();
+  int a4 ();
+};
+
+template <typename T>
+struct B
+{
+  A b;
+  B (A x, int y = 1) : b (x.a3 (), x.a2 ()) { if (y > 1) b.a2 (); }
+};
+
+extern template struct B<char>;
+A a1 ("foo"), a2 ("bar");
+B<char> b1 (a1), b2 (a2, 8);
+
+void
+foo ()
+{
+  A a3 ("baz");
+  B<char> b3 (a1), b4 (a3);
+}

Jakub
Re: [PATCH] Fix sibcall argument overlap checking if pretend_args_size (PR target/52129)
On Fri, Feb 10, 2012 at 2:13 PM, Jing Yu jin...@google.com wrote: On Thu, Feb 9, 2012 at 12:54 AM, Carrot Wei car...@google.com wrote: Hi Richard and Jakub, Since 4.6 contains the same bug, I would like to back-port it to the 4.6 branch. Could you approve it for 4.6? Jing and Doug, could you approve it for the google/gcc-4_6-mobile branch? OK for google/gcc-4_6-mobile and gcc-4_6_2-mobile. Jing Bootstrapped/regtested on x86_64-linux and regtested on arm qemu. Committed to both google/gcc-4_6-mobile and google/gcc-4_6_2-mobile. thanks Carrot
Re: [PATCH] Fix PR50031, take 2
Jakub, thanks! Based on this, I believe the patch is correct in its handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double counting. I misinterpreted what the commentary for vect_pattern_recog was saying: I thought that all replacements were hung off of the last pattern statement, but this was just true for the particular example, where only one replacement statement was created. Sorry for any confusion! Based on the discussion, is the patch OK for trunk? Thanks, Bill On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote: On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote: From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once. STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not just the last, but all original stmts that are being replaced by the pattern). Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt (last one corresponding to that original stmt). If needed, further pattern stmts for that original stmt are put into STMT_VINFO_PATTERN_DEF_SEQ. The pattern matcher could e.g. match 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT on the first original stmt, then say 2 pattern stmts into STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one STMT_VINFO_RELATED_STMT and finally one pattern stmt through STMT_VINFO_RELATED_STMT on the third original stmt. Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so this section has to be omitted if we backport it (which is desirable since the degradation was introduced in 4.6). Removing it apparently does not affect the sphinx3 degradation.
Yeah, when backporting you can basically assume that STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead of it (and simplify), because no pattern recognizers needed more than one pattern stmt for each original stmt back then. Jakub
Re: Gthreads patch to disable static initializer macros
Jonathan Wakely jwakely@gmail.com writes: On 6 February 2012 19:24, Mike Stump wrote: On Feb 5, 2012, at 12:26 PM, Jonathan Wakely wrote: PRs libstdc++/51296 and libstdc++/51906 are both caused by problems with the Pthreads static initializer macros such as PTHREAD_MUTEX_INITIALIZER. On Mac OS X 10.7 the PTHREAD_RECURSIVE_MUTEX_INITIALIZER is buggy. Thanks for all your work on this. Well I broke it so I had to fix it ;) I'm pleased to say we should now have an almost complete C++11 thread implementation for most POSIX platforms, with hundreds of existing libstdc++ tests moving from UNSUPPORTED to PASS on some secondary platforms (aix and darwin, maybe hpux too.) Indeed, thanks for your hard work on this. The following patch on top of your previous one allows the 30_threads tests to PASS on Tru64 UNIX. Bootstrapped without regressions on alpha-dec-osf5.1b, ok for mainline? I've also noticed one feature of your patch that I don't like: __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of pthread_{mutex,cond}_init instead of passing them on as all other gthr-posix.h functions do. This might lead to bad error handling, and my previous patch (based on an older version of yours you had attached to the PR) changed them to return int instead. I suppose changing this now is out of the question, and this is left as 4.8 material. Rainer

2012-02-03  Rainer Orth  r...@cebitec.uni-bielefeld.de

	libstdc++-v3:
	PR libstdc++/51296
	* config/os/osf/ctype_base.h, config/os/osf/ctype_configure_char.cc,
	config/os/osf/ctype_inline.h, config/os/osf/error_constants.h:
	Copy from config/os/generic.
	* config/os/osf/os_defines.h: Likewise.
	(_GTHREAD_USE_MUTEX_INIT_FUNC, _GTHREAD_USE_COND_INIT_FUNC): Define.
	* configure.host <osf*>: Use os/osf for os_include_dir.
# HG changeset patch
# Parent badf67959a1e06e2f8c98df3797878e3123aad8e
Use __GTHREAD_MUTEX_INIT_FUNCTION on Tru64 UNIX (PR libstdc++/51296)

diff --git a/libstdc++-v3/config/os/generic/ctype_base.h b/libstdc++-v3/config/os/osf/ctype_base.h
copy from libstdc++-v3/config/os/generic/ctype_base.h
copy to libstdc++-v3/config/os/osf/ctype_base.h
diff --git a/libstdc++-v3/config/os/generic/ctype_configure_char.cc b/libstdc++-v3/config/os/osf/ctype_configure_char.cc
copy from libstdc++-v3/config/os/generic/ctype_configure_char.cc
copy to libstdc++-v3/config/os/osf/ctype_configure_char.cc
diff --git a/libstdc++-v3/config/os/generic/ctype_inline.h b/libstdc++-v3/config/os/osf/ctype_inline.h
copy from libstdc++-v3/config/os/generic/ctype_inline.h
copy to libstdc++-v3/config/os/osf/ctype_inline.h
diff --git a/libstdc++-v3/config/os/generic/error_constants.h b/libstdc++-v3/config/os/osf/error_constants.h
copy from libstdc++-v3/config/os/generic/error_constants.h
copy to libstdc++-v3/config/os/osf/error_constants.h
diff --git a/libstdc++-v3/config/os/generic/os_defines.h b/libstdc++-v3/config/os/osf/os_defines.h
copy from libstdc++-v3/config/os/generic/os_defines.h
copy to libstdc++-v3/config/os/osf/os_defines.h
--- a/libstdc++-v3/config/os/generic/os_defines.h
+++ b/libstdc++-v3/config/os/osf/os_defines.h
@@ -1,6 +1,6 @@
-// Specific definitions for generic platforms  -*- C++ -*-
+// Specific definitions for Tru64 UNIX  -*- C++ -*-
 
-// Copyright (C) 2000, 2009, 2010 Free Software Foundation, Inc.
+// Copyright (C) 2000, 2009, 2010, 2012 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -33,4 +33,9 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
+// Tru64 UNIX requires using pthread_mutex_init()/pthread_cond_init() to
+// initialize non-statically allocated mutexes/condvars.
+#define _GTHREAD_USE_MUTEX_INIT_FUNC
+#define _GTHREAD_USE_COND_INIT_FUNC
+
 #endif
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -280,7 +280,7 @@ case ${host_os} in
     os_include_dir="os/bsd/netbsd"
     ;;
   osf*)
-    os_include_dir="os/generic"
+    os_include_dir="os/osf"
     # libstdc++.so relies on emutls on Tru64 UNIX, which only works with the
     # real functions implemented in libpthread.so, not with the dummies in
     # libgcc, so always pass -lpthread.

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Ping: Re: [patch middle-end]: Fix PR/48814 - [4.4/4.5/4.6/4.7 Regression] Incorrect scalar increment result
2012/2/10 Richard Guenther richard.guent...@gmail.com: On Fri, Feb 10, 2012 at 9:36 AM, Kai Tietz ktiet...@googlemail.com wrote: 2012/2/9 Richard Guenther richard.guent...@gmail.com: Works apart from Running target unix/ FAIL: ext/pb_ds/regression/trie_map_rand.cc execution test FAIL: ext/pb_ds/regression/trie_set_rand.cc execution test Maybe invalid testcases. Who knows ... same fails happen with your patch. Richard. Richard. Hmm, I see in libstdc++'s file include/bits/boost_concept_check.h some use of '*__i++ = *__i;' and '*__i-- = *__i;', which seems to cause part of this failure. This might lead here to this failure. I am not sure if such constructs have defined behavior in C++, but it looks to me like undefined behavior. A C testcase for the issue would be:

int *foo (int *p)
{
  *p++ = *p;
  return p;
}

which produces with the patch now:

foo (int * p)
{
  int * D.1363;
  int D.1364;
  int * D.1365;

  D.1363 = p;
  p = D.1363 + 4;
  D.1364 = *p;
  *D.1363 = D.1364;
  D.1365 = p;
  return D.1365;
}

but in the old variant we were producing:

foo (int * p)
{
  int D.1363;
  int * D.1364;

  D.1363 = *p;
  *p = D.1363;
  p = p + 4;
  D.1364 = p;
  return D.1364;
}

So, maybe the real solution for this issue is to swap the order of lhs/rhs gimplification for assignments instead. Well, that would certainly not be suitable for stage4 (though I have a working patch for that as well). The post-modify gimplification change looks better as it also fixes the volatile aggregate-return case which would not be fixed by re-ordering of the gimplification. libstdc++ folks - can you investigate the testsuite failure? The patch in question is at http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00435/fix-pr48814 Note that the _Mutable_ForwardIteratorConcept isn't the problem I think, it's not executed code. Thanks, Richard. Hmm, we might need the lhs/rhs flip plus the post-modify here. At least that way we would avoid differences to the current behavior for the common cases where the lhs has a post-inc/dec? But you might be right that this patch is not suitable for stage 4. Regards, Kai
Re: [patch boehm-gc]: Fix PR/48514
Kai == Kai Tietz ktiet...@googlemail.com writes:

Kai> * include/gc_config_macros.h (GC_DLL): Define it for
Kai> mingw-targets only, if
Kai> we are actually in boehm-gc's build and DLL_EXPORT macro is defined.

This is ok. Thanks. Tom
Re: [PATCH] Fix PR50031, take 2
On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Jakub, thanks! Based on this, I believe the patch is correct in its handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double counting. I misinterpreted what the commentary for vect_pattern_recog was saying: I thought that all replacements were hung off of the last pattern statement, but this was just true for the particular example, where only one replacement statement was created. Sorry for any confusion! Based on the discussion, is the patch OK for trunk? But you still count for each of the matched scalar stmts the cost of the whole pattern sequence. No? Richard. Thanks, Bill On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote: On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote: From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once. STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not just the last, but all original stmts that are being replaced by the pattern). Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt (last one corresponding to that original stmt). If needed, further pattern stmts for that original stmt are put into STMT_VINFO_PATTERN_DEF_SEQ. The pattern matcher could e.g. match 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT on the first original stmt, then say 2 pattern stmts into STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one STMT_VINFO_RELATED_STMT and finally one pattern stmt through STMT_VINFO_RELATED_STMT on the third original stmt. Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so this section has to be omitted if we backport it (which is desirable since the degradation was introduced in 4.6).
Removing it apparently does not affect the sphinx3 degradation. Yeah, when backporting you can basically assume that STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead of it (and simplify), because no pattern recognizers needed more than one pattern stmt for each original stmt back then. Jakub
Re: [PATCH] Release virtual SSA names properly
On Fri, 10 Feb 2012, Richard Guenther wrote: I've played with

Index: tree-ssa.c
===
--- tree-ssa.c	(revision 184088)
+++ tree-ssa.c	(working copy)
@@ -944,6 +944,11 @@ verify_ssa (bool check_modified_stmt)
       if (!gimple_nop_p (stmt))
 	{
 	  basic_block bb = gimple_bb (stmt);
+	  if (!bb)
+	    {
+	      error ("Unreleased SSA name");
+	      goto err;
+	    }
 	  verify_def (bb, definition_block, name, stmt, !is_gimple_reg (name));

and noticed we do not properly free a lot of virtual SSA names, leaking both SSA names, stmts and operands they point to over time. The following patch cures the cases I hit with running the tree-ssa.exp testsuite. More issues exist, and I guess we should make a more verbose variant of the above the default in 4.8 to catch these issues (there is a slight issue with this when verify_ssa is invoked during loop transforms and the vectorizer has pattern stmts built that are not in a basic block ... so maybe we should put the above into some other verifier). Bootstrapped on x86_64-unknown-linux-gnu, testing in currently in progress. I'll commit this unless somebody thinks it's a really bad idea at this point (I do not have numbers for the amount of memory we save - once we have fully compiled a function and we do not need its body for further inlining we throw it away and with it all leaked SSA names). I'm not installing this now because I hit PR52198 with this change. I'm instead queuing it for 4.8 where I can XFAIL the testcases that are affected. Richard.

2012-02-10  Richard Guenther  rguent...@suse.de

	* tree-nrv.c (tree_nrv): Release VDEFs.
	* tree-inline.c (expand_call_inline): Likewise.
	* tree-sra.c (sra_modify_constructor_assign): Likewise.
	(sra_modify_assign): Likewise.
	* tree-vect-stmts.c (vect_remove_stores): Likewise.
	* tree-vect-loop.c (vect_transform_loop): Likewise.
	* tree-ssa-dom.c (optimize_stmt): Likewise.
	* tree-vect-slp.c (vect_schedule_slp): Likewise.
	* tree-ssa-math-opts.c (execute_cse_sincos): Likewise.

Index: gcc/tree-nrv.c
===
--- gcc/tree-nrv.c	(revision 184088)
+++ gcc/tree-nrv.c	(working copy)
@@ -244,6 +244,7 @@ tree_nrv (void)
 	    {
 	      unlink_stmt_vdef (stmt);
 	      gsi_remove (&gsi, true);
+	      release_defs (stmt);
 	    }
 	  else
 	    {
Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c	(revision 184088)
+++ gcc/tree-inline.c	(working copy)
@@ -4002,6 +4002,9 @@ expand_call_inline (basic_block bb, gimp
   /* Unlink the calls virtual operands before replacing it.  */
   unlink_stmt_vdef (stmt);
+  if (gimple_vdef (stmt)
+      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
+    release_ssa_name (gimple_vdef (stmt));
 
   /* If the inlined function returns a result that we care about,
      substitute the GIMPLE_CALL with an assignment of the return
@@ -4043,7 +4046,7 @@ expand_call_inline (basic_block bb, gimp
 	    }
 	}
       else
-        gsi_remove (&stmt_gsi, true);
+	gsi_remove (&stmt_gsi, true);
     }
 
   if (purge_dead_abnormal_edges)
Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c	(revision 184088)
+++ gcc/tree-sra.c	(working copy)
@@ -2858,6 +2858,7 @@ sra_modify_constructor_assign (gimple *s
 	{
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  return SRA_AM_REMOVED;
 	}
       else
@@ -2881,6 +2882,7 @@ sra_modify_constructor_assign (gimple *s
 	  init_subtree_with_zero (acc, gsi, false, loc);
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  return SRA_AM_REMOVED;
 	}
       else
@@ -3125,6 +3127,7 @@ sra_modify_assign (gimple *stmt, gimple_
 	  gsi_next (gsi);
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (&orig_gsi, true);
+	  release_defs (*stmt);
 	  sra_stats.deleted++;
 	  return SRA_AM_REMOVED;
 	}
@@ -3145,6 +3148,7 @@ sra_modify_assign (gimple *stmt, gimple_
 	  gcc_assert (*stmt == gsi_stmt (*gsi));
 	  unlink_stmt_vdef (*stmt);
 	  gsi_remove (gsi, true);
+	  release_defs (*stmt);
 	  sra_stats.deleted++;
 	  return SRA_AM_REMOVED;
 	}
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c	(revision 184088)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -5596,7 +5596,9 @@ vect_remove_stores (gimple first_stmt)
       next =
Re: [PATCH] Fix PR50031, take 2
On Fri, 2012-02-10 at 16:22 +0100, Richard Guenther wrote: On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Jakub, thanks! Based on this, I believe the patch is correct in its handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double counting. I misinterpreted what the commentary for vect_pattern_recog was saying: I thought that all replacements were hung off of the last pattern statement, but this was just true for the particular example, where only one replacement statement was created. Sorry for any confusion! Based on the discussion, is the patch OK for trunk? But you still count for each of the matched scalar stmts the cost of the whole pattern sequence. No? I need to change the commentary to make this more clear now. My comment: Translate the last statement... is incorrect; this is done for each statement in a pattern. Per Jakub's explanation, the replacement statements are distributed over the original pattern statements. Visiting STMT_VINFO_RELATED_STMT for a statement marked STMT_VINFO_IN_PATTERN_P will find zero or one replacement statements that should be examined. Visiting STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement statements that didn't fit in the 1-1 mapping. The point is that all of these related statements and pattern-def-seq statements are disjoint, and Ira's logic is ensuring that they all get examined once. It's not a very clean way to represent the replacement of a pattern -- not how you'd do it if designing from scratch -- but I guess I can see how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto the existing 1-1 mapping. Bill Richard. Thanks, Bill On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote: On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote: From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. 
So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once.

STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not just the last, but all original stmts that are being replaced by the pattern). Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt (last one corresponding to that original stmt). If needed, further pattern stmts for that original stmt are put into STMT_VINFO_PATTERN_DEF_SEQ. The pattern matcher could e.g. match 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT on the first original stmt, then say 2 pattern stmts into STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one STMT_VINFO_RELATED_STMT, and finally one pattern stmt through STMT_VINFO_RELATED_STMT on the third original stmt.

Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so this section has to be omitted if we backport it (which is desirable since the degradation was introduced in 4.6). Removing it apparently does not affect the sphinx3 degradation.

Yeah, when backporting you can basically assume that STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead of it (and simplify), because no pattern recognizers needed more than one pattern stmt for each original stmt back then.

Jakub
Re: [PATCH] Fix PR50031, take 2
On Fri, Feb 10, 2012 at 04:22:32PM +0100, Richard Guenther wrote: On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Jakub, thanks! Based on this, I believe the patch is correct in its handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double counting. I misinterpreted what the commentary for vect_pattern_recog was saying: I thought that all replacements were hung off of the last pattern statement, but this was just true for the particular example, where only one replacement statement was created. Sorry for any confusion! Based on the discussion, is the patch OK for trunk? But you still count for each of the matched scalar stmts the cost of the whole pattern sequence. No? For each stmt you should count the original stmt if it is relevant (I think there are cases where both the original stmt and pattern stmt for it are both relevant and emitted, didn't understand it exactly though, but usually only the orig or only the pattern stmts are relevant), and if STMT_VINFO_IN_PATTERN_P, count the STMT_VINFO_RELATED_STMT (if any) and all STMT_VINFO_PATTERN_DEF_SEQ sequence stmts (if any), again, not sure if STMT_VINFO_RELEVANT needs to be tested for each or not. I guess look at what we actually emit if the cost model says it should be emitted. Neither the STMT_VINFO_PATTERN_DEF_SEQ nor STMT_VINFO_RELATED_STMT are shared in between any orig stmts, each has its own set of stmts. I believe if some original stmt should map to no pattern stmts, it isn't marked as STMT_VINFO_IN_PATTERN_P at all, but as nothing uses its lhs with the pattern stmts in place, it won't be relevant and will be ignored. Jakub
Re: [PATCH] Fix PR50031, take 2
On Fri, Feb 10, 2012 at 09:36:01AM -0600, William J. Schmidt wrote: Per Jakub's explanation, the replacement statements are distributed over the original pattern statements. Visiting STMT_VINFO_RELATED_STMT for a statement marked STMT_VINFO_IN_PATTERN_P will find zero or one replacement statements that should be examined. Visiting STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement statements that didn't fit in the 1-1 mapping. The point is that all of these related statements and pattern-def-seq statements are disjoint, and Ira's logic is ensuring that they all get examined once. It's not a very clean way to represent the replacement of a pattern -- not how you'd do it if designing from scratch -- but I guess I can see how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto the existing 1-1 mapping. Still in 4.6 we have just a 1-1 or 1-0 mapping (as I wrote in the previous mail, I think those orig stmts that don't need any replacements, e.g. if you have 2 orig stmts mapping to just one pattern stmt, are just marked as not relevant). Then 4.7 first added (Ira's changes) STMT_VINFO_PATTERN_DEF_STMT, i.e. in addition to 1-1 and 1-0 there was a possibility for 1-2 mapping. And then I needed more than 2, but it was too late in the game to do large changes, so we end up with 1-1+N scheme by changing the DEF_STMT into DEF_SEQ. For 4.8 we definitely at least should remove STMT_VINFO_RELATED_STMT as a way to represent pattern stmts, even for 1-1 mapping the pattern stmt should go into a sequence. Jakub
libgo patch committed: Save all registers on stack for GC scan
This patch to the libgo garbage collector ensures that we save all the registers on the stack when starting a GC. Otherwise we could sometimes miss a value held in a register while scanning the stack of the goroutine running the GC. This never happened on x86, but it did happen on SPARC. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline.

Ian

diff -r 1f6696441eeb libgo/runtime/mgc0.c
--- a/libgo/runtime/mgc0.c	Fri Feb 10 06:07:07 2012 -0800
+++ b/libgo/runtime/mgc0.c	Fri Feb 10 07:50:20 2012 -0800
@@ -936,6 +936,10 @@
 	const byte *p;
 	bool extra;

+	// Make sure all registers are saved on stack so that
+	// scanstack sees them.
+	__builtin_unwind_init();
+
 	// The gc is turned off (via enablegc) until
 	// the bootstrap has completed.
 	// Also, malloc gets called in the guts
[contrib] Add a commonly used flag to repro_fail (issue5650057)
This patch adds two common shortcuts to the failure reproducer script. Committed to trunk.

2012-02-10  Diego Novillo  dnovi...@google.com

	* repro_fail: Add --debug and --debug-tui flags.

diff --git a/contrib/repro_fail b/contrib/repro_fail
index 8100456..c55d080 100755
--- a/contrib/repro_fail
+++ b/contrib/repro_fail
@@ -31,14 +31,26 @@
 # command, it asks which one you want.

 if [ $# -lt 2 ] ; then
-    echo "usage: $0 pattern file.log [additional-args]"
+    echo "usage: $0 [--debug|--debug-tui] pattern file.log [additional-args]"
     echo
     echo "Finds the 'spawn' line matching PATTERN in FILE.LOG and executes"
     echo "the command with any arguments in ADDITIONAL-ARGS."
     echo
+    echo "If --debug is used, the compiler is invoked with -wrapper gdb,--args"
+    echo "If --debug-tui is used, the compiler is invoked with -wrapper \
+         gdb,--tui,--args"
     exit 1
 fi

+if [ "$1" == "--debug" ] ; then
+    debug_args="-wrapper gdb,--args"
+    shift
+elif [ "$1" == "--debug-tui" ] ; then
+    debug_args="-wrapper gdb,--tui,--args"
+    shift
+else
+    debug_args=""
+fi
 pattern="$1"
 logf="$2"
 shift 2
@@ -77,6 +89,6 @@ IFS=$old_IFS
 for cmd_num in $cmds_to_run ; do
     cmd=${commands[$cmd_num]}
     set -x +e
-    $cmd "$@"
+    $cmd $debug_args "$@"
     set +x -e
 done
--
This patch is available for review at http://codereview.appspot.com/5650057
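The flag handling above is a plain consume-the-leading-option idiom. A standalone sketch of it (not the repro_fail script itself; the sample pattern and log name are made up, and portable `=` is used where the patch uses bash's `==`):

```shell
# Simulated command line: optional flag, then pattern and log file.
set -- --debug 'spawn .*foo' build.log

# Consume an optional leading --debug/--debug-tui flag, recording the
# extra compiler arguments it implies, then fall through to the
# positional parameters.
if [ "$1" = "--debug" ] ; then
    debug_args="-wrapper gdb,--args"
    shift
elif [ "$1" = "--debug-tui" ] ; then
    debug_args="-wrapper gdb,--tui,--args"
    shift
else
    debug_args=""
fi

echo "debug_args=$debug_args pattern=$1 log=$2"
# prints: debug_args=-wrapper gdb,--args pattern=spawn .*foo log=build.log
```

With the flag consumed, the rest of the script sees exactly the positional arguments it saw before the patch, which is why no other code needed to change.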
libgo patch committed: Record g0 top of stack correctly when not split
This patch to libgo records the top of the g0 goroutine stack correctly when not using -fsplit-stack. Without this patch the garbage collector would scan beyond the top of the stack when looking at the g0 stack, which could cause segmentation violations. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline.

Ian

diff -r 971ec6345e64 libgo/runtime/proc.c
--- a/libgo/runtime/proc.c	Fri Feb 10 07:50:54 2012 -0800
+++ b/libgo/runtime/proc.c	Fri Feb 10 07:52:48 2012 -0800
@@ -909,7 +909,9 @@
 	__splitstack_getcontext(&g->stack_context[0]);
 #else
 	g->gcinitial_sp = &mp;
-	g->gcstack_size = StackMin;
+	// Setting gcstack_size to 0 is a marker meaning that gcinitial_sp
+	// is the top of the stack, not the bottom.
+	g->gcstack_size = 0;
 	g->gcnext_sp = &mp;
 #endif
 	getcontext(&g->context);
@@ -1267,6 +1269,8 @@
 #else
 		sp = newg->gcinitial_sp;
 		spsize = newg->gcstack_size;
+		if(spsize == 0)
+			runtime_throw("bad spsize in __go_go");
 		newg->gcnext_sp = sp;
 #endif
 	} else {
Re: [PATCH] Fix PR50031, take 2
On Fri, Feb 10, 2012 at 4:36 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: On Fri, 2012-02-10 at 16:22 +0100, Richard Guenther wrote: On Fri, Feb 10, 2012 at 3:39 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Jakub, thanks! Based on this, I believe the patch is correct in its handling of the STMT_VINFO_PATTERN_DEF_SEQ logic, without any double counting. I misinterpreted what the commentary for vect_pattern_recog was saying: I thought that all replacements were hung off of the last pattern statement, but this was just true for the particular example, where only one replacement statement was created. Sorry for any confusion! Based on the discussion, is the patch OK for trunk? But you still count for each of the matched scalar stmts the cost of the whole pattern sequence. No? I need to change the commentary to make this more clear now. My comment: Translate the last statement... is incorrect; this is done for each statement in a pattern. Per Jakub's explanation, the replacement statements are distributed over the original pattern statements. Ok, looking at the code in tree-vect-patterns.c it seems that STMT_VINFO_IN_PATTERN_P is only set for the stmt that contains the first replacement stmt in STMT_VINFO_RELATED_STMT and further stmts in that stmts STMT_VINFO_PATTERN_DEF_SEQ. Other stmts that were matched do not have STMT_VINFO_IN_PATTERN_P set (but their STMT_VINFO_RELATED_STMT points to the stmt that has STMT_VINFO_IN_PATTERN_P set). Possibly the other scalar stmts participating in the pattern are marked as not relevant (couldn't spot this quickly). So it seems that your patch should indeed work as-is. Thus, ok, if Jakub doesn't spot another error. Thanks, Richard. Visiting STMT_VINFO_RELATED_STMT for a statement marked STMT_VINFO_IN_PATTERN_P will find zero or one replacement statements that should be examined. Visiting STMT_VINFO_PATTERN_DEF_SEQ may pick up some leftover replacement statements that didn't fit in the 1-1 mapping. 
The point is that all of these related statements and pattern-def-seq statements are disjoint, and Ira's logic is ensuring that they all get examined once. It's not a very clean way to represent the replacement of a pattern -- not how you'd do it if designing from scratch -- but I guess I can see how it got this way when the STMT_VINFO_PATTERN_DEF_SEQ was glued onto the existing 1-1 mapping. Bill Richard. Thanks, Bill On Fri, 2012-02-10 at 14:44 +0100, Jakub Jelinek wrote: On Fri, Feb 10, 2012 at 07:31:01AM -0600, William J. Schmidt wrote: From the commentary at the end of tree-vect-patterns.c, only the main statement in the pattern (the last one) is flagged as STMT_VINFO_IN_PATTERN_P. So this is finding the new replacement statement which has been created but not yet inserted in the code. It only gets counted once. STMT_VINFO_IN_PATTERN_P is set on the vinfo of the original stmt (and not just the last, but all original stmts that are being replaced by the pattern). Each of these has STMT_VINFO_RELATED_STMT set to a pattern stmt (last one corresponding to that original stmt). If needed, further pattern stmts for that original stmts are put into STMT_VINFO_PATTERN_DEF_SEQ. The pattern matcher could e.g. match 3 original stmts, and stick a single pattern stmt into STMT_VINFO_RELATED_STMT on the first original stmt, then say 2 pattern stmts into STMT_VINFO_PATTERN_DEF_SEQ of the second original stmt plus one STMT_VINFO_RELATED_STMT and finally one pattern stmt through STMT_VINFO_RELATED_STMT on the third original stmt. Note that STMT_VINFO_PATTERN_DEF_SEQ doesn't exist in the 4.6 branch, so this section has to be omitted if we backport it (which is desirable since the degradation was introduced in 4.6). Removing it apparently does not affect the sphinx3 degradation. 
Yeah, when backporting you can basically assume that STMT_VINFO_PATTERN_DEF_SEQ is always NULL in 4.6 and just write NULL instead of it (and simplify), because no pattern recognizers needed more than one pattern stmt for each original stmt back then. Jakub
[PATCH] c/52190 update __atomic documentation
Updated the extra parameter to __atomic_{is,always}_lock_free , and bkoz noted that the __atomic_compare_exchange documentation did not document the 'weak' parameter. Fixed as such, seems to format fine. OK for mainline? Andrew PR c/52190 * doc/extend.texi : Update comments for __atomic_compare_exchange and __atomic_{is,always}_lock_free. Index: doc/extend.texi === *** doc/extend.texi (revision 183969) --- doc/extend.texi (working copy) *** This built-in function implements an ato *** 7146,7152 This compares the contents of @code{*@var{ptr}} with the contents of @code{*@var{expected}} and if equal, writes @var{desired} into @code{*@var{ptr}}. If they are not equal, the current contents of ! @code{*@var{ptr}} is written into @code{*@var{expected}}. True is returned if @code{*@var{desired}} is written into @code{*@var{ptr}} and the execution is considered to conform to the --- 7146,7155 This compares the contents of @code{*@var{ptr}} with the contents of @code{*@var{expected}} and if equal, writes @var{desired} into @code{*@var{ptr}}. If they are not equal, the current contents of ! @code{*@var{ptr}} is written into @code{*@var{expected}}. @var{weak} is true ! for weak compare_exchange, and false for the strong variation. Many targets ! only offer the strong variation and ignore the parameter. When in doubt, use ! the strong variation. True is returned if @code{*@var{desired}} is written into @code{*@var{ptr}} and the execution is considered to conform to the *** All memory orders are valid. *** 7242,7268 @end deftypefn ! @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size) ! This built-in function returns true if objects of size bytes will always ! generate lock free atomic instructions for the target architecture. ! Otherwise false is returned. ! size must resolve to a compile time constant. @smallexample ! if (_atomic_always_lock_free (sizeof (long long))) @end smallexample @end deftypefn ! 
@deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size) ! This built-in function returns true if objects of size bytes will always generate lock free atomic instructions for the target architecture. If it is not known to be lock free a call is made to a runtime routine named @code{__atomic_is_lock_free}. @end deftypefn @node Object Size Checking --- 7245,7276 @end deftypefn ! @deftypefn {Built-in Function} bool __atomic_always_lock_free (size_t size, void *ptr) ! This built-in function returns true if objects of @var{size} bytes will always ! generate lock free atomic instructions for the target architecture. ! @var{size} must resolve to a compile time constant and the result also resolves to compile time constant. ! @var{ptr} is an optional pointer to the object which may be used to determine ! alignment. A value of 0 indicates typical alignment should be used. The ! compiler may also ignore this parameter. @smallexample ! if (_atomic_always_lock_free (sizeof (long long), 0)) @end smallexample @end deftypefn ! @deftypefn {Built-in Function} bool __atomic_is_lock_free (size_t size, void *ptr) ! This built-in function returns true if objects of @var{size} bytes will always generate lock free atomic instructions for the target architecture. If it is not known to be lock free a call is made to a runtime routine named @code{__atomic_is_lock_free}. + @var{ptr} is an optional pointer to the object which may be used to determine + alignment. A value of 0 indicates typical alignment should be used. The + compiler may also ignore this parameter. @end deftypefn @node Object Size Checking
PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF
Hi,

Since a constant address in x32 is sign-extended to 64 bits, a negative displacement without base nor index is out of range. OK for trunk?

Thanks.

H.J.
---
gcc/

2012-02-10  H.J. Lu  hongjiu...@intel.com

	PR target/52146
	* config/i386/i386.c (ix86_legitimate_address_p): Disallow
	negative constant address for x32.

gcc/testsuite/

2012-02-10  H.J. Lu  hongjiu...@intel.com

	PR target/52146
	* gcc.target/i386/pr52146.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 009dd53..0bb94a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12107,6 +12107,15 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
	   || !ix86_legitimate_constant_p (Pmode, disp)))
     /* Displacement is not constant.  */
     return false;
+  else if (TARGET_X32
+	   && !base
+	   && !index
+	   && CONST_INT_P (disp)
+	   && INTVAL (disp) < 0)
+    /* Since constant address in x32 is signed extended to 64bit,
+       negative displacement without base nor index is out of
+       range.  */
+    return false;
   else if (TARGET_64BIT
	   && !x86_64_immediate_operand (disp, VOIDmode))
     /* Displacement is out of range.  */
     return false;
diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
new file mode 100644
index 000..68bdeff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52146.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */
+/* { dg-options "-O2 -mx32" } */
+
+void test1() {
+  int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr += 4;
+}
+void test2() {
+  volatile int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr = 0;
+}
+void test3() {
+  volatile int* apic_tpr_addr = (int *)0x7fffffff;
+  *apic_tpr_addr = 0;
+}
+
+/* { dg-final { scan-assembler-not "-18874240" } } */
Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF
On Fri, Feb 10, 2012 at 09:25:06AM -0800, H.J. Lu wrote:

Hi, Since a constant address in x32 is sign-extended to 64 bits, a negative displacement without base nor index is out of range. OK for trunk?

Here is a different patch.

H.J.
---
gcc/

2012-02-10  Uros Bizjak  ubiz...@gmail.com

	PR target/52146
	* config/i386/i386.c (ix86_legitimate_address_p): Disallow
	negative constant address for x32.

gcc/testsuite/

2012-02-10  H.J. Lu  hongjiu...@intel.com

	PR target/52146
	* gcc.target/i386/pr52146.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 009dd53..8f4e72e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
   rtx base, index, disp;
   HOST_WIDE_INT scale;

+  /* Since constant address in x32 is signed extended to 64bit,
+     we have to prevent addresses from 0x80000000 to 0xffffffff.  */
+  if (TARGET_X32
+      && CONST_INT_P (addr)
+      && val_signbit_known_set_p (SImode, INTVAL (addr)))
+    return false;
+
   if (ix86_decompose_address (addr, &parts) <= 0)
     /* Decomposition failed.  */
     return false;
diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
new file mode 100644
index 000..68bdeff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52146.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */
+/* { dg-options "-O2 -mx32" } */
+
+void test1() {
+  int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr += 4;
+}
+void test2() {
+  volatile int* apic_tpr_addr = (int *)0xfee00080;
+  *apic_tpr_addr = 0;
+}
+void test3() {
+  volatile int* apic_tpr_addr = (int *)0x7fffffff;
+  *apic_tpr_addr = 0;
+}
+
+/* { dg-final { scan-assembler-not "-18874240" } } */
Re: [PATCH] c/52190 update __atomic documentation
On Fri, 10 Feb 2012, Andrew MacLeod wrote: Updated the extra parameter to __atomic_{is,always}_lock_free , and bkoz noted that the __atomic_compare_exchange documentation did not document the 'weak' parameter. Fixed as such, seems to format fine. OK for mainline? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: trans-mem: virtual ops for gimple_transaction
On 02/10/2012 01:44 AM, Richard Guenther wrote: What is the reason to keep a GIMPLE_TRANSACTION stmt after TM lowering and not lower it to a builtin function call? Because real optimization hasn't happened yet, and we hold out hope that we'll be able to delete stuff as unreachable. Especially all instances of transaction_cancel. It seems the body is empty after lowering (what's the label thing?) The label is the transaction cancel label. When we finally convert GIMPLE_TRANSACTION a builtin, we'll generate different code layouts with and without a cancel. r~
Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF
On Fri, Feb 10, 2012 at 6:42 PM, H.J. Lu hongjiu...@intel.com wrote:

  Since constant address in x32 is sign-extended to 64 bits, negative displacement without base nor index is out of range. OK for trunk? Here is a different patch.

  H.J.
  ---
  gcc/

  2012-02-10  Uros Bizjak  ubiz...@gmail.com

	PR target/52146
	* config/i386/i386.c (ix86_legitimate_address_p): Disallow
	negative constant address for x32.

  gcc/testsuite/

  2012-02-10  H.J. Lu  hongjiu...@intel.com

	PR target/52146
	* gcc.target/i386/pr52146.c: New.

  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index 009dd53..8f4e72e 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
     rtx base, index, disp;
     HOST_WIDE_INT scale;

  +  /* Since constant address in x32 is signed extended to 64bit,
  +     we have to prevent addresses from 0x80000000 to 0xffffffff.  */
  +  if (TARGET_X32
  +      && CONST_INT_P (addr)
  +      && val_signbit_known_set_p (SImode, INTVAL (addr)))

As said in the PR, val_signbit_known_set_p is a bit overkill. Please use INTVAL (addr) < 0, it works as well.

  +++ b/gcc/testsuite/gcc.target/i386/pr52146.c
  @@ -0,0 +1,17 @@
  +/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } && { ! { ia32 } } } } } */

we _are_ in the x86 directory, so:

  /* { dg-do compile { target { ! { ia32 } } } } */

  +/* { dg-options "-O2 -mx32" } */
  +
  +void test1() {
  +  int* apic_tpr_addr = (int *)0xfee00080;
  +  *apic_tpr_addr += 4;
  +}
  +void test2() {
  +  volatile int* apic_tpr_addr = (int *)0xfee00080;
  +  *apic_tpr_addr = 0;

No need for volatile.

  +}
  +void test3() {
  +  volatile int* apic_tpr_addr = (int *)0x7fffffff;
  +  *apic_tpr_addr = 0;
  +}

test2 is enough. No need to test what worked OK.

  +
  +/* { dg-final { scan-assembler-not "-18874240" } } */

Please also reformat the test to GNU coding standards.

Patch is OK for 4.7 and 4.6 after bootstrap and regression test on x32 target.

Thanks,
Uros.
Re: Gthreads patch to disable static initializer macros
On 10 February 2012 14:48, Rainer Orth wrote: I've also noticed one feature of your patch that I don't like: __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of pthread_{mutex,cond}_init_function instead of passing them on as all other gthr-posix.h functions do. This might lead to bad error handling, and my previous patch (based on an older version of yours you had attached to the PR) changed them to return int instead. I suppose changing this now is out of question, and this is left as 4.8 material. I didn't like that feature of the patch much either, but the signature of the mutex_init function is specified in gthr.h: __GTHREAD_MUTEX_INIT_FUNCTION some systems can't initialize a mutex without a function call. On such systems, define this to a function which looks like this: void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *) The recursive one says it should have the same signature, and I (maybe wrongly) assumed the cond_init one should too.
Re: Gthreads patch to disable static initializer macros
On 10 February 2012 14:48, Rainer Orth wrote: Bootstrapped without regressions on alpha-dec-osf5.1b, ok for mainline? Yes, this is OK, thanks for your help testing and getting all of this working.
Re: Gthreads patch to disable static initializer macros
Jon, On 10 February 2012 14:48, Rainer Orth wrote: I've also noticed one feature of your patch that I don't like: __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of pthread_{mutex,cond}_init_function instead of passing them on as all other gthr-posix.h functions do. This might lead to bad error handling, and my previous patch (based on an older version of yours you had attached to the PR) changed them to return int instead. I suppose changing this now is out of question, and this is left as 4.8 material. I didn't like that feature of the patch much either, but the signature of the mutex_init function is specified in gthr.h: __GTHREAD_MUTEX_INIT_FUNCTION some systems can't initialize a mutex without a function call. On such systems, define this to a function which looks like this: void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *) The recursive one says it should have the same signature, and I (maybe I don't see this, neither in gthr.h nor in gthr-posix.h. wrongly) assumed the cond_init one should too. I don't think this is cast in stone: the gthr*.h headers aren't installed in a public place and aren't considered an external interface to the best of my knowledge, so it should be possible to change. Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc, libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in libstdc++), so changing the 13 calls is doable. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PATCH: PR target/52146: [x32] - Wrong code to access addresses 0x80000000 to 0xFFFFFFFF
On Fri, Feb 10, 2012 at 9:55 AM, Uros Bizjak ubiz...@gmail.com wrote: On Fri, Feb 10, 2012 at 6:42 PM, H.J. Lu hongjiu...@intel.com wrote: Since constant address in x32 is signed extended to 64bit, negative displacement without base nor index is out of range. OK for trunk? Here is a different patch. H.J. --- gcc/ 2012-02-10 Uros Bizjak ubiz...@gmail.com PR target/52146 * config/i386/i386.c (ix86_legitimate_address_p): Disallow negative constant address for x32. gcc/testsuite/ 2012-02-10 H.J. Lu hongjiu...@intel.com PR target/52146 * gcc.target/i386/pr52146.c: New. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 009dd53..8f4e72e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11932,6 +11932,13 @@ ix86_legitimate_address_p (enum machine_mode mode ATTRIBUTE_UNUSED, rtx base, index, disp; HOST_WIDE_INT scale; + /* Since constant address in x32 is signed extended to 64bit, + we have to prevent addresses from 0x8000 to 0x. */ + if (TARGET_X32 + CONST_INT_P (addr) + val_signbit_known_set_p (SImode, INTVAL (addr))) As said in the PR, val_signbit_known_set_p is a bit overkill. Please use INTVAL (addr) 0, it works as well. +++ b/gcc/testsuite/gcc.target/i386/pr52146.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { { i?86-*-linux* x86_64-*-linux* } { ! { ia32 } } } } } */ we _are_ in x86 directory, so: /* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options -O2 -mx32 } */ + +void test1() { + int* apic_tpr_addr = (int *)0xfee00080; + *apic_tpr_addr += 4; +} +void test2() { + volatile int* apic_tpr_addr = (int *)0xfee00080; + *apic_tpr_addr = 0; No need for volatile. +} +void test3() { + volatile int* apic_tpr_addr = (int *)0x7fff; + *apic_tpr_addr = 0; +} test2 is enough. No need to test what worked OK. + +/* { dg-final { scan-assembler-not -18874240 } } */ Please also reformat the test to GNU coding standards. Patch is OK for 4.7 and 4.6 after bootstrap and regression test on x32 target. 
I checked it into 4.7 and will backport to my x32 4.6 branch. Thanks. -- H.J.
Re: Gthreads patch to disable static initializer macros
On 10 February 2012 18:20, Rainer Orth wrote: Jon, On 10 February 2012 14:48, Rainer Orth wrote: I've also noticed one feature of your patch that I don't like: __GTHREAD_{MUTEX,COND}_INIT_FUNCTION ignore the return values of pthread_{mutex,cond}_init_function instead of passing them on as all other gthr-posix.h functions do. This might lead to bad error handling, and my previous patch (based on an older version of yours you had attached to the PR) changed them to return int instead. I suppose changing this now is out of question, and this is left as 4.8 material. I didn't like that feature of the patch much either, but the signature of the mutex_init function is specified in gthr.h: __GTHREAD_MUTEX_INIT_FUNCTION some systems can't initialize a mutex without a function call. On such systems, define this to a function which looks like this: void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *) The recursive one says it should have the same signature, and I (maybe I don't see this, neither in gthr.h nor in gthr-posix.h. lines 54-62 in http://gcc.gnu.org/viewcvs/trunk/libgcc/gthr.h?view=markup wrongly) assumed the cond_init one should too. I don't think this is cast in stone: the gthr*.h headers aren't installed in a public place and aren't considered an external interface to the best of my knowledge, so it should be possible to change. Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc, libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in libstdc++), so changing the 13 calls is doable. Private ports which provide their own gthr header (I think they're rare, but do exist) would need to modify their functions to return an integer - but at least it would be a compile-time failure so they'd know to change it.
Re: Gthreads patch to disable static initializer macros
Jonathan Wakely jwakely@gmail.com writes: I didn't like that feature of the patch much either, but the signature of the mutex_init function is specified in gthr.h: __GTHREAD_MUTEX_INIT_FUNCTION some systems can't initialize a mutex without a function call. On such systems, define this to a function which looks like this: void __GTHREAD_MUTEX_INIT_FUNCTION (__gthread_mutex_t *) The recursive one says it should have the same signature, and I (maybe I don't see this, neither in gthr.h nor in gthr-posix.h. lines 54-62 in http://gcc.gnu.org/viewcvs/trunk/libgcc/gthr.h?view=markup I think that's overinterpreting the comment. __gthread_recursive_mutex_init_function happily returns errors with no indication it should be otherwise. wrongly) assumed the cond_init one should too. I don't think this is cast in stone: the gthr*.h headers aren't installed in a public place and aren't considered an external interface to the best of my knowledge, so it should be possible to change. Currently, __GTHREAD_MUTEX_INIT_FUNCTION is only used in libgcc, libgfortran, and libstdc++ (and __GTHREAD_COND_INIT_FUNCTION only in libstdc++), so changing the 13 calls is doable. Private ports which provide their own gthr header (I think they're rare, but do exist) would need to modify their functions to return an integer - but at least it would be a compile-time failure so they'd know to change it. Right, and that's the cost of keeping a port out of the FSF tree. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[patch libffi]: Add support for ffi_prep_raw_closure_loc for X86_WIN32 for fixing finally PR 51500
Hi, ChangeLog 2012-02-10 Kai Tietz kti...@redhat.com * src/x86/ffi.c (ffi_prep_raw_closure_loc): Add thiscall support for X86_WIN32. Tested for i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: src/x86/ffi.c === --- src/x86/ffi.c (revision 184105) +++ src/x86/ffi.c (working copy) @@ -720,6 +720,9 @@ int i; if (cif->abi != FFI_SYSV) { +#ifdef X86_WIN32 +if (cif->abi != FFI_THISCALL) +#endif return FFI_BAD_ABI; } @@ -734,10 +737,20 @@ FFI_ASSERT (cif->arg_types[i]->type != FFI_TYPE_LONGDOUBLE); } - +#if X86_WIN32 + if (cif->abi == FFI_SYSV) +{ +#endif FFI_INIT_TRAMPOLINE (closure->tramp[0], ffi_closure_raw_SYSV, codeloc); - +#if X86_WIN32 +} + else if (cif->abi == FFI_THISCALL) +{ + FFI_INIT_TRAMPOLINE_THISCALL (closure->tramp[0], ffi_closure_raw_SYSV, + codeloc, cif->bytes); +} +#endif closure->cif = cif; closure->user_data = user_data; closure->fun = fun;
Re: haifa-sched: Fix problems with cc0 in prune_ready_list
On 02/08/2012 10:01 AM, Bernd Schmidt wrote: We found a scheduler problem while testing Coldfire targets. The prune_ready_list function I introduced doesn't take SCHED_GROUPs into account, which can lead to a random insn being scheduled between a cc0 setter and user. The patch below fixes it, OK? (Bootstrapped & tested i686-linux). Ok. Thanks, Bernd.
Re: [libitm] Port to ARM
I have no immediate plans. If you have time to knock something together for Sparc, that would be fantastic. Btw, any particular reason why the libgomp-copied linking scheme seems to be only half-implemented? # Set up the set of libraries that we need to link against for libitm. # Note that the GOMP_SELF_SPEC in gcc.c will force -pthread for -fopenmp, # which will force linkage against -lpthread (or equivalent for the system). # That's not 100% ideal, but about the best we can do easily. if test $enable_shared = yes; then link_itm="-litm %{static: $LIBS}" else link_itm="-litm $LIBS" fi AC_SUBST(link_itm) but AFAICS nobody uses libitm.spec (and so you need to force -litm for the testsuite)? -- Eric Botcazou
[PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
I was looking at the routelookup EEMBC benchmark and it has code of the form: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = this_node->rlink; else next_node = this_node->llink; } This is where you have a binary tree/trie and you are iterating going down either the right link or left link until you find a stopping condition. The code in ifcvt.c does not handle optimizing these cases for conditional move since the load might trap, and generates code that does if-then-else with loads and jumps. However, since the two elements are next to each other in memory, they are likely in the same cache line, particularly with aligned stacks and malloc returning aligned pointers. Except in unusual circumstances where the pointer is not aligned, this means it is much faster to optimize it as: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; rtmp = this_node->rlink; ltmp = this_node->llink; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = rtmp; else next_node = ltmp; } So I wrote some patches to do this optimization. In ifcvt.c I added a new hook that allows the backend to try and do conditional moves if the machine independent code doesn't handle the special cases that the machine might have. Then in rs6000.c I used that hook to see if the conditional moves are adjacent, and do the optimization. I will note that this type of code comes up quite frequently since binary trees and tries are common data structures. The file splay-tree.c in libiberty is one place in the compiler tree that has conditional adjacent memory moves. So I would like comments on the patch before the 4.8 tree opens up. I feel even if we decide not to add the adjacent memory move patch, the hook is useful, and I have some other ideas for using it for the powerpc. I was thinking about rewriting the rs6000 dependent parts to make it a normal optimization available to all ports.
Is this something we want as a normal option? At the moment, I'm not sure it should be part of -O3 because it is possible for a trap to occur if the pointer straddles a page boundary and the test condition would guard against loading up the second value. However, -Ofast might be an appropriate place to do this optimization. At this time I don't have test cases, but I would add them for the real submission. I have bootstrapped the compiler on powerpc with this option enabled and it passed the bootstrap and had no regressions in make check. I will do a spec run over the weekend as well. In addition to libiberty/splay-tree.c the following files in gcc have adjacent conditional moves that this code would optimize: cfg.c c-typeck.c df-scan.c fold-const.c graphds.c ira-emit.c omp-low.c rs6000.c tree-cfg.c tree-ssa-dom.c tree-ssa-loop-ivopts.c tree-ssa-phiopt.c tree-ssa-uncprop.c 2012-02-10 Michael Meissner meiss...@linux.vnet.ibm.com * target.def (cmove_md_extra): New hook that is called from ifcvt.c to allow the backend to generate additional conditional moves that aren't handled by the machine independent code. Add support to call the hook at the appropriate places. * targhooks.h (default_cmove_md_extra): Likewise. * targhooks.c (default_cmove_md_extra): Likewise. * target.h (enum ifcvt_pass): Likewise. * ifcvt.c (find_if_header): Likewise. (noce_find_if_block): Likewise. (struct noce_if_info): Likewise. (noce_process_if_block): Likewise. (cond_move_process_if_block): Likewise. (if_convert): Likewise. (rest_of_handle_if_conversion): Likewise. (rest_of_handle_if_after_combine): Likewise. (rest_of_handle_if_after_reload): Likewise. * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise. * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise. * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support for a new switch to optimize conditional moves where each side is a memory reference and the memory locations are adjacent.
* config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise. (TARGET_CMOVE_MD_EXTRA): Likewise. (rs6000_decompose_offsettable_memref): Likewise. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899 Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 184104) +++ gcc/doc/tm.texi (working copy) @@ -10871,6 +10871,16 @@ conditional execution instructions inste 1 if it does use cc0. @end defmac +@deftypefn {Target Hook} rtx
[PATCH, 4.6] Backport fixes for PR50031, PR50969
This patch backports the two recent trunk fixes for powerpc64 vectorization degradations. The fixes are largely identical to their 4.7 counterparts except that (a) the logic for STMT_VINFO_PATTERN_DEF_SEQ does not apply in 4.6, and (b) the changes to vectorizable_conversion in 4.7 correspond to changes in vectorizable_type_demotion and vectorizable_type_promotion in 4.6. Bootstrapped and tested for regressions and performance for powerpc64-linux. OK to commit after the trunk patch has a few days of burn-in? Thanks, Bill 2012-02-10 Bill Schmidt wschm...@linux.vnet.ibm.com Ira Rosen i...@il.ibm.com PR tree-optimization/50031 PR tree-optimization/50969 * targhooks.c (default_builtin_vectorization_cost): Handle vec_promote_demote. * target.h (enum vect_cost_for_stmt): Add vec_promote_demote. * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle all types of reduction and pattern statements. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function. (vect_model_store_cost): Use vec_perm rather than vector_stmt for statement cost. (vect_model_load_cost): Likewise. (vect_get_load_cost): Likewise; add dump logic for explicit realigns. (vectorizable_type_demotion): Call vect_model_promotion_demotion_cost. (vectorizable_type_promotion): Likewise. * config/spu/spu.c (spu_builtin_vectorization_cost): Handle vec_promote_demote. * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise. * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update vec_perm for VSX and handle vec_promote_demote. 
Index: gcc/targhooks.c === --- gcc/targhooks.c (revision 184047) +++ gcc/targhooks.c (working copy) @@ -529,6 +529,7 @@ default_builtin_vectorization_cost (enum vect_cost case scalar_to_vec: case cond_branch_not_taken: case vec_perm: + case vec_promote_demote: return 1; case unaligned_load: Index: gcc/target.h === --- gcc/target.h(revision 184047) +++ gcc/target.h(working copy) @@ -128,7 +128,8 @@ enum vect_cost_for_stmt scalar_to_vec, cond_branch_not_taken, cond_branch_taken, - vec_perm + vec_perm, + vec_promote_demote }; /* Sets of optimization levels at which an option may be enabled by Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 184047) +++ gcc/tree-vect-loop.c(working copy) @@ -2104,7 +2104,8 @@ vect_get_single_scalar_iteraion_cost (loop_vec_inf if (stmt_info && !STMT_VINFO_RELEVANT_P (stmt_info) && (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) + && !STMT_VINFO_IN_PATTERN_P (stmt_info)) continue; if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) @@ -2251,11 +2252,19 @@ vect_estimate_min_profitable_iters (loop_vec_info { gimple stmt = gsi_stmt (si); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + if (STMT_VINFO_IN_PATTERN_P (stmt_info)) + { + stmt = STMT_VINFO_RELATED_STMT (stmt_info); + stmt_info = vinfo_for_stmt (stmt); + } + /* Skip stmts that are not vectorized inside the loop. */ if (!STMT_VINFO_RELEVANT_P (stmt_info) && (!STMT_VINFO_LIVE_P (stmt_info) - || STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))) continue; + vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor; /* FIXME: for stmts in the inner-loop in outer-loop vectorization, some of the outside costs are generated inside the outer-loop. 
*/ Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 184047) +++ gcc/tree-vect-stmts.c (working copy) @@ -623,6 +623,46 @@ vect_model_simple_cost (stmt_vec_info stmt_info, i } +/* Model cost for type demotion and promotion operations. PWR is normally + zero for single-step promotions and demotions. It will be one if + two-step promotion/demotion is required, and so on. Each additional + step doubles the number of instructions required. */ + +static void +vect_model_promotion_demotion_cost (stmt_vec_info stmt_info, + enum vect_def_type *dt, int pwr) +{ + int i, tmp; + int inside_cost = 0, outside_cost = 0,
Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: I was looking at the routelookup EEMBC benchmark and it has code of the form: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = this_node->rlink; else next_node = this_node->llink; } Hmm, this looks like we could do this on the tree level better as we have more information about this_node there. Like we know that we load from this_node->cmpbit before we do either of the branches. So we can move both of those loads before the branch and then we get the ifcvt for free. Thanks, Andrew Pinski This is where you have a binary tree/trie and you are iterating going down either the right link or left link until you find a stopping condition. The code in ifcvt.c does not handle optimizing these cases for conditional move since the load might trap, and generates code that does if-then-else with loads and jumps. However, since the two elements are next to each other in memory, they are likely in the same cache line, particularly with aligned stacks and malloc returning aligned pointers. Except in unusual circumstances where the pointer is not aligned, this means it is much faster to optimize it as: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; rtmp = this_node->rlink; ltmp = this_node->llink; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = rtmp; else next_node = ltmp; } So I wrote some patches to do this optimization. In ifcvt.c I added a new hook that allows the backend to try and do conditional moves if the machine independent code doesn't handle the special cases that the machine might have. Then in rs6000.c I used that hook to see if the conditional moves are adjacent, and do the optimization. I will note that this type of code comes up quite frequently since binary trees and tries are common data structures. 
The file splay-tree.c in libiberty is one place in the compiler tree that has conditional adjacent memory moves. So I would like comments on the patch before the 4.8 tree opens up. I feel even if we decide not to add the adjacent memory move patch, the hook is useful, and I have some other ideas for using it for the powerpc. I was thinking about rewriting the rs6000 dependent parts to make it a normal optimization available to all ports. Is this something we want as a normal option? At the moment, I'm not sure it should be part of -O3 because it is possible for a trap to occur if the pointer straddles a page boundary and the test condition would guard against loading up the second value. However, -Ofast might be an appropriate place to do this optimization. At this time I don't have test cases, but I would add them for the real submission. I have bootstrapped the compiler on powerpc with this option enabled and it passed the bootstrap and had no regressions in make check. I will do a spec run over the weekend as well. In addition to libiberty/splay-tree.c the following files in gcc have adjacent conditional moves that this code would optimize: cfg.c c-typeck.c df-scan.c fold-const.c graphds.c ira-emit.c omp-low.c rs6000.c tree-cfg.c tree-ssa-dom.c tree-ssa-loop-ivopts.c tree-ssa-phiopt.c tree-ssa-uncprop.c 2012-02-10 Michael Meissner meiss...@linux.vnet.ibm.com * target.def (cmove_md_extra): New hook that is called from ifcvt.c to allow the backend to generate additional conditional moves that aren't handled by the machine independent code. Add support to call the hook at the appropriate places. * targhooks.h (default_cmove_md_extra): Likewise. * targhooks.c (default_cmove_md_extra): Likewise. * target.h (enum ifcvt_pass): Likewise. * ifcvt.c (find_if_header): Likewise. (noce_find_if_block): Likewise. (struct noce_if_info): Likewise. (noce_process_if_block): Likewise. (cond_move_process_if_block): Likewise. (if_convert): Likewise. 
(rest_of_handle_if_conversion): Likewise. (rest_of_handle_if_after_combine): Likewise. (rest_of_handle_if_after_reload): Likewise. * doc/tm.texi (TARGET_CMOVE_MD_EXTRA): Likewise. * doc/tm.texi.in (TARGET_CMOVE_MD_EXTRA): Likewise. * config/rs6000/rs6000.opt (-mcmove-adjacent-memory): Add support for a new switch to optimize conditional moves where each side is a memory reference and the memory locations are adjacent. * config/rs6000/rs6000.c (rs6000_cmove_md_extra): Likewise. (TARGET_CMOVE_MD_EXTRA): Likewise. (rs6000_decompose_offsettable_memref): Likewise. --
[lra] patch to fix bootstrap on ARM
The following patch fixes arm bootstrap for LRA. The problem was in pseudo live range splitting and wrong assignment to multi-register pseudos with the same value. The patch was successfully bootstrapped on x86/x86-64 and ARM. Committed as rev. 184109. 2012-02-10 Vladimir Makarov vmaka...@redhat.com * lra-int.h (struct lra_reg): New member call_p. * lra-lives.c (lra_create_live_ranges): Clear call_p for the pseudo. (process_bb_lives): Setup call_p. * lra-assigns.c (find_hard_regno_for): Add code to setup and use impossible_start_hard_regs. (setup_live_pseudos_and_spill_after_risky): Consider conflict_hard_regs too. (lra_assign): Add checking pseudos crossing calls and using call used hard regs. * lra-constraints.c (extract_loc_address_regs): Fix the typo in using code1 instead of code0. (exchange_plus_ops): New function. (process_address): Use it. (split_pseudo): Rename and change semantics of the first parameter. (inherit_in_ebb): Process only output non-subreg pseudos as a definition for the split. In this case don't invalidate usages for splitting. Always add usage insn for splitting in case of used pseudos or defined subregs of multi-reg pseudos. 
Index: lra-assigns.c === --- lra-assigns.c (revision 183639) +++ lra-assigns.c (working copy) @@ -304,11 +304,14 @@ find_hard_regno_for (int regno, int *cos int best_cost = INT_MAX, best_bank = INT_MAX, best_usage = INT_MAX; lra_live_range_t r; int p, i, j, rclass_size, best_hard_regno, bank, hard_regno; + int hr, conflict_hr, nregs; + enum machine_mode biggest_mode; unsigned int k, conflict_regno; int val; enum reg_class rclass; bitmap_iterator bi; bool *rclass_intersect_p; + HARD_REG_SET impossible_start_hard_regs; COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs); rclass = regno_allocno_class_array[regno]; @@ -317,6 +320,7 @@ find_hard_regno_for (int regno, int *cos sparseset_clear (conflict_reload_and_inheritance_pseudos); sparseset_clear (live_range_hard_reg_pseudos); IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs); + biggest_mode = lra_reg_info[regno].biggest_mode; for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next) { EXECUTE_IF_SET_IN_BITMAP (live_hard_reg_pseudos[r->start], 0, k, bi) @@ -370,8 +374,26 @@ find_hard_regno_for (int regno, int *cos #endif sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); val = lra_reg_info[regno].val; + CLEAR_HARD_REG_SET (impossible_start_hard_regs); EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) -if (val != lra_reg_info[conflict_regno].val) +if (val == lra_reg_info[conflict_regno].val) + { + conflict_hr = live_pseudos_reg_renumber[conflict_regno]; + nregs = (hard_regno_nregs[conflict_hr] + [lra_reg_info[conflict_regno].biggest_mode]); + /* Remember about multi-register pseudos. For example, 2 hard + register pseudos can start on the same hard register but can + not start on HR and HR+1/HR-1. 
*/ + for (hr = conflict_hr + 1; + hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs; + hr++) + SET_HARD_REG_BIT (impossible_start_hard_regs, hr); + for (hr = conflict_hr - 1; + hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr; + hr--) + SET_HARD_REG_BIT (impossible_start_hard_regs, hr); + } +else { lra_add_hard_reg_set (live_pseudos_reg_renumber[conflict_regno], lra_reg_info[conflict_regno].biggest_mode, @@ -424,7 +446,8 @@ find_hard_regno_for (int regno, int *cos conflict_set) /* We can not use prohibited_class_mode_regs because it is defined not for all classes. */ - && HARD_REGNO_MODE_OK (hard_regno, PSEUDO_REGNO_MODE (regno))) + && HARD_REGNO_MODE_OK (hard_regno, PSEUDO_REGNO_MODE (regno)) + && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)) { if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check) @@ -881,9 +904,13 @@ setup_live_pseudos_and_spill_after_risky } } COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs); + IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs); val = lra_reg_info[regno].val; EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) - if (val != lra_reg_info[conflict_regno].val) + if (val != lra_reg_info[conflict_regno].val + /* If it is multi-register pseudos they should start on + the same hard register. */ + || hard_regno != reg_renumber[conflict_regno]) lra_add_hard_reg_set (reg_renumber[conflict_regno], lra_reg_info[conflict_regno].biggest_mode, conflict_set); @@ -893,7 +920,6 @@ setup_live_pseudos_and_spill_after_risky continue; } bitmap_set_bit (spilled_pseudo_bitmap, regno); - hard_regno = reg_renumber[regno]; for (j = 0; j < hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)]; j++)
Re: libgo patch committed: Update to weekly.2011-12-22
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Ian Lance Taylor i...@google.com writes: Fixed with the appended patch, which also gathers up all the possibly missing functions that I noticed. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu, which proves little as the system I tested on has all these functions anyhow. Committed to mainline. Results are way better now, but still a couple of libgo FAILs: Running target unix FAIL: net FAIL: websocket FAIL: compress/flate FAIL: exp/ssh FAIL: image/jpeg FAIL: log/syslog FAIL: net/http FAIL: net/http/httputil FAIL: net/rpc FAIL: os/exec Most of them are like Start pollServer: epoll_ctl: Bad file descriptor panic: runtime error: invalid memory address or nil pointer dereference FAIL: net That sort of problem should be fixed now, by http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00110.html Ian
Re: [Ada] Do not pass -Werror during linking
Can you try to extract a testcase (assuming it's just a single case?). We shouldn't warn for layout-compatible types (but we may do so if for example struct nesting differs). It's more basic than that: for example, we map pointers on the C side to addresses (integers) on the Ada side. -- Eric Botcazou
libgo patch committed: Fix channels for big-endian strict-alignment
This libgo patch fixes the channel code to work correctly on big-endian systems. The older version of the code did this right, but I forgot to handle it in the new version. This patch also fixes the memory structure built for the select statement to work correctly on strict alignment systems. The code was putting a uint16_t array before a pointer array, which would of course cause a SIGBUS on a strict alignment system. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Bootstrapped and ran channel tests on sparc-sun-solaris2.11; the Go testsuite on that system still has some failures but they appear to be due to other problems, such as PR 52205 which I just filed. Committed to mainline. Ian diff -r 93764d4d4aca libgo/runtime/chan.c --- a/libgo/runtime/chan.c Fri Feb 10 07:53:37 2012 -0800 +++ b/libgo/runtime/chan.c Fri Feb 10 15:54:46 2012 -0800 @@ -409,11 +409,20 @@ void __go_send_small(ChanType *t, Hchan* c, uint64 val) { - byte b[sizeof(uint64)]; + union + { + byte b[sizeof(uint64)]; + uint64 v; + } u; + byte *p; - runtime_memclr(b, sizeof(uint64)); - __builtin_memcpy(b, &val, t->__element_type->__size); - runtime_chansend(t, c, b, nil); + u.v = val; +#ifndef WORDS_BIGENDIAN + p = u.b; +#else + p = u.b + sizeof(uint64) - t->__element_type->__size; +#endif + runtime_chansend(t, c, p, nil); } // The compiler generates a call to __go_send_big to send a value @@ -433,9 +442,15 @@ byte b[sizeof(uint64)]; uint64 v; } u; + byte *p; u.v = 0; - runtime_chanrecv(t, c, u.b, nil, nil); +#ifndef WORDS_BIGENDIAN + p = u.b; +#else + p = u.b + sizeof(uint64) - t->__element_type->__size; +#endif + runtime_chanrecv(t, c, p, nil, nil); return u.v; } @@ -654,8 +669,8 @@ sel->tcase = size; sel->ncase = 0; - sel->pollorder = (void*)(sel->scase + size); - sel->lockorder = (void*)(sel->pollorder + size); + sel->lockorder = (void*)(sel->scase + size); + sel->pollorder = (void*)(sel->lockorder + size); *selp = sel; if(debug)
[SPARC] Remove useless new code
It was introduced in the course of the -mflat rewrite, but can be replaced by a single line. Tested on SPARC/Solaris w/ and w/o -mflat, applied on the mainline. 2012-02-10 Eric Botcazou ebotca...@adacore.com * config/sparc/sparc.c (sparc_flat_expand_prologue): Use emit_use. * config/sparc/sparc.md (UNSPECV_GOTO): Delete. (nonlocal_goto_internal): Likewise. (nonlocal_goto): Emit a use and an indirect jump directly. -- Eric Botcazou Index: config/sparc/sparc.c === --- config/sparc/sparc.c (revision 183864) +++ config/sparc/sparc.c (working copy) @@ -5131,7 +5131,7 @@ sparc_flat_expand_prologue (void) /* Prevent this instruction from ever being considered dead, even if this function has no epilogue. */ - emit_insn (gen_rtx_USE (VOIDmode, i7)); + emit_use (i7); } } Index: config/sparc/sparc.md === --- config/sparc/sparc.md (revision 183864) +++ config/sparc/sparc.md (working copy) @@ -99,7 +99,6 @@ (define_constants (define_constants [(UNSPECV_BLOCKAGE 0) (UNSPECV_FLUSHW 1) - (UNSPECV_GOTO 2) (UNSPECV_FLUSH 4) (UNSPECV_SAVEW 6) (UNSPECV_CAS 8) @@ -6524,6 +6523,7 @@ (define_expand "nonlocal_goto" (match_operand 3 "memory_operand" "")] { + rtx i7 = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM); rtx r_label = copy_to_reg (operands[1]); rtx r_sp = adjust_address_nv (operands[2], Pmode, 0); rtx r_fp = operands[3]; @@ -6540,44 +6540,19 @@ (define_expand "nonlocal_goto" /* Restore frame pointer for containing function. */ emit_move_insn (hard_frame_pointer_rtx, r_fp); emit_stack_restore (SAVE_NONLOCAL, r_sp); + emit_move_insn (i7, r_i7); /* USE of hard_frame_pointer_rtx added for consistency; not clear if really needed. */ emit_use (hard_frame_pointer_rtx); emit_use (stack_pointer_rtx); + emit_use (i7); - /* We need to smuggle the load of %i7 as it is a fixed register. 
*/ - emit_jump_insn (gen_nonlocal_goto_internal (r_label, r_i7)); + emit_jump_insn (gen_indirect_jump (r_label)); emit_barrier (); DONE; }) -(define_insn "nonlocal_goto_internal" - [(unspec_volatile [(match_operand 0 "register_operand" "r") - (match_operand 1 "memory_operand" "m")] UNSPECV_GOTO)] - "GET_MODE (operands[0]) == Pmode && GET_MODE (operands[1]) == Pmode" -{ - if (flag_delayed_branch) -{ - if (TARGET_ARCH64) - return "jmp\t%0\n\t ldx\t%1, %%i7"; - else - return "jmp\t%0\n\t ld\t%1, %%i7"; -} - else -{ - if (TARGET_ARCH64) - return "ldx\t%1, %%i7\n\tjmp\t%0\n\t nop"; - else - return "ld\t%1, %%i7\n\tjmp\t%0\n\t nop"; -} -} - [(set (attr "type") (const_string "multi")) - (set (attr "length") - (if_then_else (eq_attr "delayed_branch" "true") - (const_int 2) - (const_int 3)))]) - (define_expand "builtin_setjmp_receiver" [(label_ref (match_operand 0 "" ""))] "flag_pic"
libgo patch committed: Tweak can-recover heuristic for SPARC
The gccgo implementation of panic/recover uses the return address of a function which called recover in order to see if that function was invoked directly by defer. It does this by passing the address of a label which immediately follows the function call. This requires a small heuristic, because there may be some machine instructions between the function call and the label, even though they are adjacent in GIMPLE. On SPARC I needed to tweak that heuristic further, because on SPARC __builtin_return_address returns the address of the function call instruction, which is at least 8 bytes before the label. This patch implements that tweak. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu and sparc-sun-solaris2.11 (Solaris has other unrelated failures). Committed to mainline. Ian Index: libgo/runtime/go-recover.c === --- libgo/runtime/go-recover.c (revision 183650) +++ libgo/runtime/go-recover.c (working copy) @@ -43,6 +43,14 @@ __go_can_recover (const void* retaddr) such as an instruction to adjust the stack pointer. */ ret = (const char *) retaddr; + +#ifdef __sparc__ + /* On SPARC the address we get, from __builtin_return_address, is + the address of the call instruction. Adjust forward, also + skipping the delayed instruction following the call. */ + ret += 8; +#endif + dret = (const char *) d->__retaddr; return ret <= dret && ret + 16 >= dret; }
Re: [PATCH, rtl-optimization]: Fix post-reload compare elimination pre-pass
On 02/09/2012 03:47 PM, Uros Bizjak wrote: 2012-02-10 Uros Bizjak ubiz...@gmail.com * compare-elim.c (find_comparisons_in_bb): Eliminate only compares having mode compatible with the mode of previous compare. Substitute compare mode of previous compare with the mode, compatible with eliminated and previous compare. This patch is ok for 4.8. For 4.6 and 4.7 I would prefer that we simply not eliminate the compare. I.e. + enum machine_mode src_mode = GET_MODE (src); + /* Eliminate a compare that's redundant with the previous. */ if (last_cmp_valid + && src_mode == last_cmp->orig_mode && rtx_equal_p (last_cmp->in_a, XEXP (src, 0)) && rtx_equal_p (last_cmp->in_b, XEXP (src, 1))) { ... - last_cmp->orig_mode = GET_MODE (SET_DEST (single_set (insn))); + last_cmp->orig_mode = src_mode; For 4.6 and 4.7, there are only two extant users of this pass and neither of them use anything besides CCmode before compare-elim.c does its own manipulation of the modes later. r~
Re: [v3] libstdc++/51798
On 02/09/2012 03:24 PM, Benjamin Kosnik wrote: This is the rest of 51798, completing the conversion to C++11 atomics in libstdc++. This is now a complete transition, modulo documentation which I plan to finish as a separate patch once I am back from the ISO C++ meeting. tested x86_64/linux -benjamin 20120209-2.patch 2012-02-09 Benjamin Kosnik b...@redhat.com Jonathan Wakely jwakely@gmail.com PR libstdc++/51798 continued. * acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Use __atomic_* builtins instead of __sync_* builtins for atomic functionality. * include/bits/shared_ptr_base.h: Same. * include/parallel/compatibility.h: Same. * include/profile/impl/profiler_state.h: Same. * include/tr1/shared_ptr.h: Same. * libsupc++/eh_ptr.cc: Same. * libsupc++/eh_throw.cc: Same. * libsupc++/eh_tm.cc: Same. * libsupc++/guard.cc: Same. * configure: Regenerated. * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust line numbers. * testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc: Same. The patch uses the weak version of compare_exchange universally, which is incorrect in a number of cases. You wouldn't see this on x86_64; you'd have to use a ll/sc target such as powerpc. In addition to changing several uses to strong compare_exchange, I also optimize the idiom do { var = *m; newval = ...; } while (!atomic_compare_exchange(m, &var, newval, ...)); With the new builtins, VAR is updated with the current value of the memory (regardless of the weak setting), so the initial read from *M can be hoisted outside the loop. Ok? r~ * include/bits/shared_ptr_base.h (_Sp_counted_base<_S_atomic>::_M_add_ref_lock): Hoist initial load outside compare_exchange loop. * include/tr1/shared_ptr.h: Same. * include/parallel/compatibility.h (__compare_and_swap_32): Use strong version of compare_exchange. (__compare_and_swap_64): Same. * include/profile/impl/profiler_state.h (__gnu_profile::__turn): Same. * libsupc++/guard.cc (__cxa_guard_acquire): Same. 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h index ebdc7ed..c48c18e 100644 --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -236,13 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_add_ref_lock() { // Perform lock-free add-if-not-zero operation. - _Atomic_word __count; + _Atomic_word __count = _M_use_count; do { - __count = _M_use_count; if (__count == 0) __throw_bad_weak_ptr(); - // Replace the current counter value with the old value + 1, as // long as it's not changed meanwhile. } diff --git a/libstdc++-v3/include/parallel/compatibility.h b/libstdc++-v3/include/parallel/compatibility.h index 460345e..8a65c9e 100644 --- a/libstdc++-v3/include/parallel/compatibility.h +++ b/libstdc++-v3/include/parallel/compatibility.h @@ -252,8 +252,9 @@ namespace __gnu_parallel __replacement, __comparand) == __comparand; #elif defined(__GNUC__) -return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true, - __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); +return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, + false, __ATOMIC_ACQ_REL, + __ATOMIC_ACQ_REL); #elif defined(__SUNPRO_CC) && defined(__sparc) return atomic_cas_32((volatile unsigned int*)__ptr, __comparand, __replacement) == __comparand; @@ -299,13 +300,15 @@ namespace __gnu_parallel #endif #elif defined(__GNUC__) && defined(__x86_64) -return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true, - __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); +return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, + false, __ATOMIC_ACQ_REL, + __ATOMIC_ACQ_REL); #elif defined(__GNUC__) && defined(__i386) && \ (defined(__i686) || defined(__pentium4) || defined(__athlon) \ || defined(__k8) || defined(__core2)) -return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, true, - __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); +return __atomic_compare_exchange_n(__ptr, &__comparand, __replacement, + false, 
__ATOMIC_ACQ_REL, + __ATOMIC_ACQ_REL); #elif defined(__SUNPRO_CC) defined(__sparc) return atomic_cas_64((volatile unsigned long long*)__ptr, __comparand,
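The distinction being enforced in the diff above: on a load-linked/store-conditional target, a weak exchange may fail spuriously even though the value matched, so a single-shot CAS with no surrounding retry loop must use the strong form. A hedged illustration with the portable API (function names are mine, not from the patch):

```cpp
#include <atomic>
#include <cassert>

// Single-shot CAS, as in __compare_and_swap_32: there is no retry loop,
// so a spurious weak failure would be reported as "value differed" even
// when it did not.  The strong form retries internally on ll/sc targets.
bool cas_once(std::atomic<int>& v, int comparand, int replacement)
{
  return v.compare_exchange_strong(comparand, replacement,
                                   std::memory_order_acq_rel);
}

// Inside a retry loop, a spurious failure only costs one extra
// iteration, so the weak form is acceptable (and cheaper on ll/sc).
void fetch_max(std::atomic<int>& v, int x)
{
  int cur = v.load(std::memory_order_relaxed);
  while (cur < x
         && !v.compare_exchange_weak(cur, x, std::memory_order_acq_rel,
                                     std::memory_order_relaxed))
    ;  // `cur` has been refreshed; the loop re-tests the condition
}
```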
Re: [PATCH][ARM] Improve 64-bit shifts (non-NEON)
On 02/08/2012 08:28 AM, Andrew Stubbs wrote:

> Unfortunately, these don't work in Thumb mode (no IT block), and I'd have
> to add arith-shift variants, I think, for ARM mode to work. Hmm ... I'll
> try again.

Does it work to simply use branches initially, and rely on post-reload
ifcvt to transform to cond_exec when possible?

r~
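For context on what the patterns under discussion compute: a 64-bit shift on a 32-bit target is composed from two word shifts plus a cross-word term, with a separate case for counts of 32 or more. A branchy sketch of the left shift (illustrative only; whether those cases end up as branches, IT blocks, or cond_exec is exactly the question above):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative expansion of a 64-bit left shift using only 32-bit
// operations, as a 32-bit target without NEON must do.  n is in [0, 63].
uint64_t shl64(uint32_t lo, uint32_t hi, unsigned n)
{
  uint32_t out_lo, out_hi;
  if (n < 32)
    {
      out_lo = lo << n;
      // Bits shifted out of the low word carry into the high word.
      // Guard n == 0 to avoid the undefined 32-bit shift by 32.
      out_hi = (hi << n) | (n ? lo >> (32 - n) : 0);
    }
  else
    {
      out_lo = 0;
      out_hi = lo << (n - 32);
    }
  return ((uint64_t) out_hi << 32) | out_lo;
}
```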
Re: [PATCH] MIPS16 TLS support for GCC
On 02/04/2012 02:06 AM, Richard Sandiford wrote:

> > Actually I had that idea of a link-once function too, but it turned out
> > quite complicated to do without rewriting some generic parts of GCC, as
> > it is currently not prepared to emit link-once functions outside C++
> > compilations. It's been a while and I did lots of other stuff meanwhile,
> > so please excuse me if I got anything wrong here.
>
> Hmm, OK, I wouldn't have expected that. But if you've tried making
> __mips16_rdhwr link-once and had a bad experience with it, then yeah,
> let's go with the hidden libgcc function. It's just a shame that we're
> having to force static linking of libgcc for this one case.

The i386 target does it all the time for its __x86.get_pc_thunk.?x thingys.
So I really don't believe the "not prepared" comment.

r~
Re: [v3] Update alpha-linux baselines for GCC 4.7
On 02/06/2012 05:39 AM, Uros Bizjak wrote:

> 2012-02-06  Uros Bizjak  <ubiz...@gmail.com>
>
>         * config/abi/post/alpha-linux-gnu/baseline_symbols.txt:
>         Regenerated.

Ok.

r~
Re: [PATCH] Fix reg-stack DEBUG_INSN adjustments (PR debug/52132)
On 02/06/2012 05:06 AM, Jakub Jelinek wrote:

> 2012-02-06  Jakub Jelinek  <ja...@redhat.com>
>
>         PR debug/52132
>         * reg-stack.c (subst_stack_regs_in_debug_insn): Don't use
>         get_true_reg.
>
>         * gcc.dg/pr52132.c: New test.

Ok.

r~
Re: Reinstalled patch for 37273
On Thu, Mar 17, 2011 at 1:09 PM, Jeff Law <l...@redhat.com> wrote:

> I had pulled the patch for 37273 during the 4.6 cycle due to exposing
> several latent problems. I've just reinstalled it and will (of course)
> keep an eye out for any problems.

This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52208

--
H.J.
Re: [Ada] Do not pass -Werror during linking
Hi Eric,

> > Can you try to extract a testcase (assuming it's just a single case?).
> > We shouldn't warn for layout-compatible types (but we may do so if, for
> > example, struct nesting differs).
>
> It's more basic than that: for example, we map pointers on the C side to
> addresses (integers) on the Ada side.

Is having Address be an integer useful any more? Nowadays it should be
possible to declare Address to be an access type with no associated storage
pool. Even nicer might be to have it be turned into void* when lowering to
GCC types. After all, Address is really used much like void*; see how GNAT
declares malloc for example. Both of these possibilities would probably
play better with GCC's middle-end type system, which considers integers to
be different from pointers.

Ciao,

Duncan.

PS: I first thought of this when I noticed (when using the dragonegg plugin
to compile Ada) that some of LLVM's optimizations bail out when they see
integers being used where they expect a pointer. I tried tweaking the
declaration of Address to be an access type as mentioned above, but
unfortunately that crashes the compiler pretty quickly.
libgo patch committed: FFI promotes result types
The FFI documentation doesn't mention it, but it turns out that the FFI
library promotes integer result types. That is, if a function is declared
as returning short, FFI will actually store the result as an int. The code
in libgo was getting away with not supporting this because allocating one
or two bytes would always permit storing four or eight bytes, and because I
was only testing on little-endian systems. On big-endian systems this needs
to be handled correctly. This patch does that.

Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Ran some but
not all tests on sparc-sun-solaris2.11. Committed to mainline.

Ian

diff -r ff8b35faf485 libgo/runtime/go-reflect-call.c
--- a/libgo/runtime/go-reflect-call.c	Fri Feb 10 16:12:23 2012 -0800
+++ b/libgo/runtime/go-reflect-call.c	Fri Feb 10 23:03:30 2012 -0800
@@ -5,6 +5,7 @@
    license that can be found in the LICENSE file.  */
 
 #include <stdio.h>
+#include <stdint.h>
 #include <stdlib.h>
 
 #include "ffi.h"
@@ -326,6 +327,28 @@
   types = (const struct __go_type_descriptor **) func->__out.__values;
 
+  /* A single integer return value is always promoted to a full
+     word.  */
+  if (count == 1)
+    {
+      switch (types[0]->__code & GO_CODE_MASK)
+	{
+	case GO_BOOL:
+	case GO_INT8:
+	case GO_INT16:
+	case GO_INT32:
+	case GO_UINT8:
+	case GO_UINT16:
+	case GO_UINT32:
+	case GO_INT:
+	case GO_UINT:
+	  return sizeof (ffi_arg);
+
+	default:
+	  break;
+	}
+    }
+
   off = 0;
   maxalign = 0;
   for (i = 0; i < count; ++i)
@@ -362,6 +385,81 @@
   types = (const struct __go_type_descriptor **) func->__out.__values;
 
+  /* A single integer return value is always promoted to a full
+     word.  */
+  if (count == 1)
+    {
+      switch (types[0]->__code & GO_CODE_MASK)
+	{
+	case GO_BOOL:
+	case GO_INT8:
+	case GO_INT16:
+	case GO_INT32:
+	case GO_UINT8:
+	case GO_UINT16:
+	case GO_UINT32:
+	case GO_INT:
+	case GO_UINT:
+	  {
+	    union
+	    {
+	      unsigned char buf[sizeof (ffi_arg)];
+	      ffi_arg v;
+	    } u;
+	    ffi_arg v;
+
+	    __builtin_memcpy (u.buf, call_result, sizeof (ffi_arg));
+	    v = u.v;
+
+	    switch (types[0]->__size)
+	      {
+	      case 1:
+		{
+		  uint8_t b;
+
+		  b = (uint8_t) v;
+		  __builtin_memcpy (results[0], &b, 1);
+		}
+		break;
+
+	      case 2:
+		{
+		  uint16_t s;
+
+		  s = (uint16_t) v;
+		  __builtin_memcpy (results[0], &s, 2);
+		}
+		break;
+
+	      case 4:
+		{
+		  uint32_t w;
+
+		  w = (uint32_t) v;
+		  __builtin_memcpy (results[0], &w, 4);
+		}
+		break;
+
+	      case 8:
+		{
+		  uint64_t d;
+
+		  d = (uint64_t) v;
+		  __builtin_memcpy (results[0], &d, 8);
+		}
+		break;
+
+	      default:
+		abort ();
+	      }
+	  }
+	  return;
+
+	default:
+	  break;
+	}
+    }
+
   off = 0;
   for (i = 0; i < count; ++i)
     {
@@ -388,7 +486,7 @@
   ffi_cif cif;
   unsigned char *call_result;
 
-  __go_assert (func_type->__common.__code == GO_FUNC);
+  __go_assert ((func_type->__common.__code & GO_CODE_MASK) == GO_FUNC);
 
   go_func_to_cif (func_type, is_interface, is_method, &cif);
 
   call_result = (unsigned char *) malloc (go_results_size (func_type));
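The narrowing step the patch performs can be modeled without libffi (here `ffi_arg` is stood in for by `unsigned long`, and the names are illustrative): the whole promoted word is loaded first and then cast down, rather than copying the first bytes of the buffer, which is exactly what breaks on big-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of the patch's narrowing: libffi stores a small integer result
// widened to a full word.  The correct narrow result is obtained by
// loading the whole word and casting; a raw copy of the buffer's first
// two bytes would only happen to work on little-endian systems.
using ffi_arg_model = unsigned long;  // stand-in for libffi's ffi_arg

uint16_t narrow_short_result(const unsigned char* call_result)
{
  ffi_arg_model v;
  std::memcpy(&v, call_result, sizeof v);  // load the promoted word
  return (uint16_t) v;                     // endian-safe narrowing
}
```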