Re: [Patch, fortran] PR50981 correctly handle absent arrays as actual argument to elemental procedures
Mikael Morin wrote: there was no specific handling for absent arrays passed as argument to elemental procedures. So, because of scalarisation, we were passing an array element reference of a NULL pointer which was failing. These patches add a conditional to pass NULL when the data pointer is NULL. Regression tested on x86_64-unknown-freebsd9.0. OK for trunk? OK. Thanks for the patch. Tobias
Re: trans-mem: virtual ops for gimple_transaction
On Fri, 10 Feb 2012, Richard Henderson wrote: On 02/10/2012 01:44 AM, Richard Guenther wrote: What is the reason to keep a GIMPLE_TRANSACTION stmt after TM lowering and not lower it to a builtin function call? Because real optimization hasn't happened yet, and we hold out hope that we'll be able to delete stuff as unreachable. Especially all instances of transaction_cancel. It seems the body is empty after lowering (what's the label thing?) The label is the transaction cancel label. When we finally convert GIMPLE_TRANSACTION to a builtin, we'll generate different code layouts with and without a cancel. Ah, I see. But wouldn't a placeholder builtin function be effectively the same as using a new GIMPLE stmt kind? Richard.
RE: [PATCH] Improve SCEV for array element
-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
Sent: Friday, January 20, 2012 5:07 PM
To: 'Richard Guenther'
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] Improve SCEV for array element

It's definitely not ok at this stage but at most for next stage1.

OK. I may wait until next stage1.

This is a very narrow pattern-match. It doesn't allow for &a[i].x for example, even if a[i] is a one-element structure. I think the canonical way of handling ADDR_EXPR is to use sth like

  base = get_inner_reference (TREE_OPERAND (rhs1, 0), ..., &offset, ...);
  base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
  chrec1 = analyze_scalar_evolution (loop, base);
  chrec2 = analyze_scalar_evolution (loop, offset);
  chrec1 = chrec_convert (type, chrec1, at_stmt);
  chrec2 = chrec_convert (TREE_TYPE (offset), chrec2, at_stmt);
  res = chrec_fold_plus (type, chrec1, chrec2);

where you probably need to handle scev_not_known when analyzing offset (which might be NULL). You also need to add bitpos to the base address (in bytes, of course). Note that the MEM_REF case would naturally work with this as well.

OK. New patch is like below, and bootstrapped on x86-32.

ChangeLog:

2012-01-20  Jiangning Liu  jiangning@arm.com

	* tree-scalar-evolution.c (interpret_rhs_expr): Generate chrec
	for array reference and component reference.

ChangeLog for testsuite:

2012-01-20  Jiangning Liu  jiangning@arm.com

	* gcc.dg/tree-ssa/scev-3.c: New.
	* gcc.dg/tree-ssa/scev-4.c: New.

Richard, PING... Is this patch OK after branch 4.7 is created and trunk is open again?
Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
new file mode 100644
index 000..28d5c93
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
new file mode 100644
index 000..6c1e530
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef struct {
+  int x;
+  int y;
+} S;
+
+int *a_p;
+S a[1000];
+
+f(int k)
+{
+  int i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i].y;
+    *a_p = 100;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 2077c8d..4e06b75
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1712,16 +1712,61 @@ interpret_rhs_expr (struct loop *loop, gimple at_stmt,
   switch (code)
     {
     case ADDR_EXPR:
-      /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) != MEM_REF)
-	{
-	  res = chrec_dont_know;
-	  break;
-	}
+      if (TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == MEM_REF
+	  || TREE_CODE (TREE_OPERAND (rhs1, 0)) == COMPONENT_REF)
+	{
+	  enum machine_mode mode;
+	  HOST_WIDE_INT bitsize, bitpos;
+	  int unsignedp;
+	  int volatilep = 0;
+	  tree base, offset;
+	  tree chrec3;
+
+	  base = get_inner_reference (TREE_OPERAND (rhs1, 0),
+				      &bitsize, &bitpos, &offset,
+				      &mode, &unsignedp, &volatilep, false);
+
+	  if (TREE_CODE (base) == MEM_REF)
+	    {
+	      rhs2 = TREE_OPERAND (base, 1);
+	      rhs1 = TREE_OPERAND (base, 0);
+
+	      chrec1 = analyze_scalar_evolution (loop, rhs1);
+	      chrec2 = analyze_scalar_evolution (loop, rhs2);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      chrec2 = chrec_convert (TREE_TYPE (rhs2), chrec2, at_stmt);
+	      res = chrec_fold_plus (type, chrec1, chrec2);
+	    }
+	  else
+	    {
+	      base = build1 (ADDR_EXPR, TREE_TYPE (rhs1), base);
+	      chrec1 = analyze_scalar_evolution (loop, base);
+	      chrec1 = chrec_convert (type, chrec1, at_stmt);
+	      res = chrec1;
+	    }
-      rhs2 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 1);
-      rhs1 = TREE_OPERAND (TREE_OPERAND (rhs1, 0), 0);
-      /* Fall through.  */
+	  if (offset != NULL_TREE)
+
Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
On Fri, Feb 10, 2012 at 10:02 PM, Andrew Pinski pins...@gmail.com wrote: On Fri, Feb 10, 2012 at 12:46 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: I was looking at the routelookup EEMBC benchmark and it has code of the form: while ( this_node->cmpbit < next_node->cmpbit ) { this_node = next_node; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = this_node->rlink; else next_node = this_node->llink; } Hmm, this looks like we could do this on the tree level better as we have more information about this_node there. Like we know that we load from this_node->cmpbit before we do either of the branches. So we can move both of those loads before the branch and then we get the ifcvt for free. Indeed. But note that the transform is not valid as *this_node may cross a page boundary and thus either pointer load may trap if the other does not (well, unless the C standard (and thus our middle-end) would require that if ptr->component does not trap then *ptr does not trap either - we would require an operand_equal_p (get_base_address ()) for both addresses). Joseph, can you clarify what the C standard specifies here? Thanks, Richard. Thanks, Andrew Pinski This is where you have a binary tree/trie and you are iterating going down either the right link or left link until you find a stopping condition. The code in ifcvt.c does not handle optimizing these cases for conditional move since the load might trap, and generates code that does if-then-else with loads and jumps. However, since the two elements are next to each other in memory, they are likely in the same cache line, particularly with aligned stacks and malloc returning aligned pointers. 
Except in unusual circumstances where the pointer is not aligned, this means it is much faster to optimize it as: while ( this_node->cmpbit < next_node->cmpbit ) { this_node = next_node; rtmp = this_node->rlink; ltmp = this_node->llink; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = rtmp; else next_node = ltmp; } So I wrote some patches to do this optimization. In ifcvt.c I added a new hook that allows the backend to try and do conditional moves if the machine independent code doesn't handle the special cases that the machine might have. Then in rs6000.c I used that hook to see if the conditional moves are adjacent, and do the optimization. I will note that this type of code comes up quite frequently, since binary trees and tries are a common data structure. The file splay-tree.c in libiberty is one place in the compiler tree that has conditional adjacent memory moves. So I would like comments on the patch before the 4.8 tree opens up. I feel even if we decide not to add the adjacent memory move patch, the hook is useful, and I have some other ideas for using it for the powerpc. I was thinking about rewriting the rs6000 dependent parts to make it a normal optimization available to all ports. Is this something we want as a normal option? At the moment, I'm not sure it should be part of -O3 because it is possible for a trap to occur if the pointer straddles a page boundary and the test condition would guard against loading up the second value. However, -Ofast might be an appropriate place to do this optimization. At this time I don't have test cases, but I would add them for the real submission. I have bootstrapped the compiler on powerpc with this option enabled and it passed the bootstrap and had no regressions in make check. I will do a spec run over the weekend as well. 
In addition to libiberty/splay-tree.c the following files in gcc have adjacent conditional moves that this code would optimize: cfg.c c-typeck.c df-scan.c fold-const.c graphds.c ira-emit.c omp-low.c rs6000.c tree-cfg.c tree-ssa-dom.c tree-ssa-loop-ivopts.c tree-ssa-phiopt.c tree-ssa-uncprop.c 2012-02-10 Michael Meissner meiss...@linux.vnet.ibm.com * target.def (cmove_md_extra): New hook that is called from ifcvt.c to allow the backend to generate additional conditional moves that aren't handled by the machine independent code. Add support to call the hook at the appropriate places. * targhooks.h (default_cmove_md_extra): Likewise. * targhooks.c (default_cmove_md_extra): Likewise. * target.h (enum ifcvt_pass): Likewise. * ifcvt.c (find_if_header): Likewise. (noce_find_if_block): Likewise. (struct noce_if_info): Likewise. (noce_process_if_block): Likewise. (cond_move_process_if_block): Likewise. (if_convert): Likewise. (rest_of_handle_if_conversion): Likewise. (rest_of_handle_if_after_combine): Likewise.
Re: [PATCH] Fix signed bitfield BIT_NOT_EXPR expansion (PR middle-end/52209)
On Sat, Feb 11, 2012 at 12:49 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! In July Richard changed reduce_bit_field BIT_NOT_EXPR expansion from NOT unop to XOR with all the bits in the bitfield's precision set. Unfortunately that is correct for unsigned bitfields only; for signed bitfields, where op0 is already sign-extended to its mode before this, expanding this as NOT is the right thing. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard.

2012-02-11  Jakub Jelinek  ja...@redhat.com

	PR middle-end/52209
	* expr.c (expand_expr_real_2) <case BIT_NOT_EXPR>: Only expand using
	XOR for reduce_bit_field if type is unsigned.

	* gcc.c-torture/execute/pr52209.c: New test.

--- gcc/expr.c.jj	2012-02-07 16:05:51.0 +0100
+++ gcc/expr.c	2012-02-11 10:08:44.162924423 +0100
@@ -8582,8 +8582,9 @@ expand_expr_real_2 (sepops ops, rtx targ
       if (modifier == EXPAND_STACK_PARM)
	target = 0;
       /* In case we have to reduce the result to bitfield precision
-	 expand this as XOR with a proper constant instead.  */
-      if (reduce_bit_field)
+	 for unsigned bitfield expand this as XOR with a proper constant
+	 instead.  */
+      if (reduce_bit_field && TYPE_UNSIGNED (type))
	temp = expand_binop (mode, xor_optab, op0,
			     immed_double_int_const
			     (double_int_mask (TYPE_PRECISION (type)), mode),
--- gcc/testsuite/gcc.c-torture/execute/pr52209.c.jj	2012-02-11 10:09:46.080571803 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr52209.c	2012-02-11 10:09:28.0 +0100
@@ -0,0 +1,14 @@
+/* PR middle-end/52209 */
+
+extern void abort (void);
+struct S0 { int f2 : 1; } c;
+int b;
+
+int
+main ()
+{
+  b = -1 ^ c.f2;
+  if (b != -1)
+    abort ();
+  return 0;
+}

Jakub
Re: [PATCH] Improve SCEV for array element
On Mon, Feb 13, 2012 at 10:54 AM, Jiangning Liu jiangning@arm.com wrote: Richard, PING... Is this patch OK after branch 4.7 is created and trunk is open again? It's on my (rather large) list of things to review for 4.8. Be patient ... Richard.
[PATCH] Fix PR52211
Committed as obvious. Richard.

2012-02-13  Richard Guenther  rguent...@suse.de

	PR translation/52211
	* passes.c (enable_disable_pass): Fix typo.

Index: gcc/passes.c
===
--- gcc/passes.c	(revision 184151)
+++ gcc/passes.c	(working copy)
@@ -709,7 +709,7 @@ enable_disable_pass (const char *arg, bo
       if (is_enable)
	error ("unknown pass %s specified in -fenable", phase_name);
       else
-	error ("unknown pass %s specified in -fdisble", phase_name);
+	error ("unknown pass %s specified in -fdisable", phase_name);
       free (argstr);
       return;
     }
Re: [PATCH ARM] backport r174803 from trunk to 4.6 branch
On 08/02/12 08:29, Bin Cheng wrote: Hi, Julian Brown once posted a patch fixing an ARM EABI violation, which I think is also essential for the 4.6 branch. I created a patch against the 4.6 branch as attached. Is it ok to back port? You can refer to the following link for the original patch. http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00260.html Thanks gcc/ChangeLog: 2012-02-08 Bin Cheng bin.ch...@arm.com Backport from mainline 2011-06-08 Julian Brown jul...@codesourcery.com * config/arm/arm.c (arm_libcall_uses_aapcs_base): Use correct ABI for double-precision helper functions in hard-float mode if only single-precision arithmetic is supported in hardware. OK. Can you also back-port it to 4.5 as well, please. R.
Re: [PATCH] Fix for PR52081 - Missed tail merging with pure calls
On Thu, Feb 2, 2012 at 11:44 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, this patch fixes PR52081. Consider test-case pr51879-12.c: ... __attribute__((pure)) int bar (int); __attribute__((pure)) int bar2 (int); void baz (int); int x, z; void foo (int y) { int a = 0; if (y == 6) { a += bar (7); a += bar2 (6); } else { a += bar2 (6); a += bar (7); } baz (a); } ... When compiling at -O2, pr51879-12.c.094t.pre looks like this: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1717_4 = barD.1703 (7); # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1718_6 = bar2D.1705 (6); aD.1713_7 = D.1717_4 + D.1718_6; goto <bb 5>; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1720_8 = bar2D.1705 (6); # VUSE <.MEMD.1722_12(D)> # USE = nonlocal escaped D.1721_10 = barD.1703 (7); aD.1713_11 = D.1720_8 + D.1721_10; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1713_1 = PHI <aD.1713_7(3), aD.1713_11(4)> # .MEMD.1722_13 = VDEF <.MEMD.1722_12(D)> # USE = nonlocal # CLB = nonlocal bazD.1707 (aD.1713_1); # VUSE <.MEMD.1722_13> return; ... block 3 and 4 can be tail-merged. Value numbering numbers the two phi arguments a_7 and a_11 the same, so the problem is not in value numbering: ... Setting value number of a_11 to a_7 (changed) ... There are 2 reasons that tail_merge_optimize doesn't optimize this: 1. The clause is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt)) && !gimple_has_side_effects (stmt) used in both same_succ_hash and gsi_advance_bw_nondebug_nonlocal evaluates to false for pure calls. This is fixed by replacing is_gimple_assign with gimple_has_lhs. 2. In same_succ_equal we check gimples from the 2 bbs side-by-side: ... 
gsi1 = gsi_start_nondebug_bb (bb1); gsi2 = gsi_start_nondebug_bb (bb2); while (!(gsi_end_p (gsi1) || gsi_end_p (gsi2))) { s1 = gsi_stmt (gsi1); s2 = gsi_stmt (gsi2); if (gimple_code (s1) != gimple_code (s2)) return 0; if (is_gimple_call (s1) && !gimple_call_same_target_p (s1, s2)) return 0; gsi_next_nondebug (&gsi1); gsi_next_nondebug (&gsi2); } ... and we'll be comparing 'bar (7)' and 'bar2 (6)', and gimple_call_same_target_p will return false. This is fixed by ignoring local defs in this comparison, by using gsi_advance_fw_nondebug_nonlocal on the iterators. bootstrapped and reg-tested on x86_64. ok for stage1? Sorry for responding so late ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and use value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. Richard. Thanks, - Tom 2012-02-02 Tom de Vries t...@codesourcery.com * tree-ssa-tail-merge.c (local_def): Move up. (stmt_local_def): New function, factored out of same_succ_hash. Use gimple_has_lhs instead of is_gimple_assign. (gsi_advance_nondebug_nonlocal): New function, factored out of gsi_advance_bw_nondebug_nonlocal. Use stmt_local_def. (gsi_advance_fw_nondebug_nonlocal): New function. (gsi_advance_bw_nondebug_nonlocal): Use gsi_advance_nondebug_nonlocal. Move up. (same_succ_hash): Use stmt_local_def. (same_succ_equal): Use gsi_advance_fw_nondebug_nonlocal. * gcc.dg/pr51879-12.c: New test.
Re: [committed] Remove myself as vectorizer maintainer
On Tue, 7 Feb 2012 15:44:04 +0200 Ira Rosen i...@il.ibm.com wrote: Hi, I am starting to work on a new project and won't be able to continue with vectorizer maintenance. I'd like to thank all the people I had a chance to work with for making my GCC experience so enjoyable. Thanks for all the hard work on auto-vectorization over the years! I'm sure your contributions will be missed. Cheers, Julian
Re: [PATCH] Fix for PR52081 - Missed tail merging with pure calls
On 13/02/12 12:54, Richard Guenther wrote: On Thu, Feb 2, 2012 at 11:44 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, this patch fixes PR52081. [...] bootstrapped and reg-tested on x86_64. ok for stage1? Sorry for responding so late ... no problem :) I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and use value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. Your comment is relevant for the other recent tail-merge related fixes I submitted, but I think not for this one. In this case, value numbering manages to value number the 2 phi alternatives equal. It's tail merging that doesn't take advantage of this, by treating pure function calls the same as non-pure function calls. The fixes are therefore in tail merging, not in value numbering. So, ok for stage1? Thanks, - Tom
RE: [Patch,wwwdocs,AVR]: AVR release notes
-Original Message- From: Gerald Pfeifer Sent: Sunday, February 12, 2012 3:17 PM To: Georg-Johann Lay Cc: gcc-patches@gcc.gnu.org; Denis Chertykov; Weddington, Eric Subject: Re: [Patch,wwwdocs,AVR]: AVR release notes This looks like an impressive release for AVR! Gerald Johann has been doing some excellent work for the AVR backend. It's been very much appreciated. Eric Weddington
[PR52001] too many cse reverse equiv exprs (take2)
Jakub asked to have a closer look at the problem, and I found we could do somewhat better. The first thing I noticed was that the problem was that, in each block that computed a (base+const), we created a new VALUE for the expression (with the same const and global base), and a new reverse operation. This was wrong. Clearly we should reuse the same expression. I had to arrange for the expression to be retained across basic blocks, for it was function invariant. I split out the code to detect invariants from the function that removes entries from the cselib hash table across blocks, and made it recursive so that a VALUE equivalent to (plus (value) (const_int)) will be retained, if the base value fits (maybe recursively) the definition of invariant. An earlier attempt to address this issue remained in cselib: using the canonical value to build the reverse expression. I believe it has a potential of avoiding the creation of redundant reverse expressions, for expressions involving equivalent but different VALUEs will evaluate to different hashes. I haven't observed effects WRT the given testcase, before or after the change that actually fixed the problem, because we now find the same base expression and thus reuse the reverse_op as well, but I figured I'd keep it in for it is very cheap and possibly useful. Regstrapped on x86_64-linux-gnu and i686-pc-linux-gnu. Ok to install? for gcc/ChangeLog from Alexandre Oliva aol...@redhat.com PR debug/52001 * cselib.c (invariant_p): Split out of... (preserve_only_constants): ... this. Preserve plus expressions of invariant values and constants. * var-tracking.c (reverse_op): Don't drop equivs of constants. Use canonical value to build reverse operation. 
Index: gcc/cselib.c
===
--- gcc/cselib.c.orig	2012-02-12 06:13:40.676385499 -0200
+++ gcc/cselib.c	2012-02-12 09:07:00.653579375 -0200
@@ -383,22 +383,29 @@ cselib_clear_table (void)
   cselib_reset_table (1);
 }

-/* Remove from hash table all VALUEs except constants
-   and function invariants.  */
+/* Return TRUE if V is a constant or a function invariant, FALSE
+   otherwise.  */

-static int
-preserve_only_constants (void **x, void *info ATTRIBUTE_UNUSED)
+static bool
+invariant_p (cselib_val *v)
 {
-  cselib_val *v = (cselib_val *)*x;
   struct elt_loc_list *l;

+  if (v == cfa_base_preserved_val)
+    return true;
+
+  /* Keep VALUE equivalences around.  */
+  for (l = v->locs; l; l = l->next)
+    if (GET_CODE (l->loc) == VALUE)
+      return true;
+
   if (v->locs != NULL && v->locs->next == NULL)
     {
       if (CONSTANT_P (v->locs->loc)
	  && (GET_CODE (v->locs->loc) != CONST
	      || !references_value_p (v->locs->loc, 0)))
-	return 1;
+	return true;
       /* Although a debug expr may be bound to different expressions,
	 we can preserve it as if it was constant, to get unification
	 and proper merging within var-tracking.  */
@@ -406,24 +413,29 @@ preserve_only_constants (void **x, void
	  || GET_CODE (v->locs->loc) == DEBUG_IMPLICIT_PTR
	  || GET_CODE (v->locs->loc) == ENTRY_VALUE
	  || GET_CODE (v->locs->loc) == DEBUG_PARAMETER_REF)
-	return 1;
-      if (cfa_base_preserved_val)
-	{
-	  if (v == cfa_base_preserved_val)
-	    return 1;
-	  if (GET_CODE (v->locs->loc) == PLUS
-	      && CONST_INT_P (XEXP (v->locs->loc, 1))
-	      && XEXP (v->locs->loc, 0) == cfa_base_preserved_val->val_rtx)
-	    return 1;
-	}
+	return true;
+
+      /* (plus (value V) (const_int C)) is invariant iff V is invariant.  */
+      if (GET_CODE (v->locs->loc) == PLUS
+	  && CONST_INT_P (XEXP (v->locs->loc, 1))
+	  && GET_CODE (XEXP (v->locs->loc, 0)) == VALUE
+	  && invariant_p (CSELIB_VAL_PTR (XEXP (v->locs->loc, 0))))
+	return true;
     }

-  /* Keep VALUE equivalences around.  */
-  for (l = v->locs; l; l = l->next)
-    if (GET_CODE (l->loc) == VALUE)
-      return 1;
+  return false;
+}
+
+/* Remove from hash table all VALUEs except constants
+   and function invariants.  */
+
+static int
+preserve_only_constants (void **x, void *info ATTRIBUTE_UNUSED)
+{
+  cselib_val *v = (cselib_val *)*x;

-  htab_clear_slot (cselib_hash_table, x);
+  if (!invariant_p (v))
+    htab_clear_slot (cselib_hash_table, x);

   return 1;
 }
Index: gcc/var-tracking.c
===
--- gcc/var-tracking.c.orig	2012-02-12 06:13:38.633412886 -0200
+++ gcc/var-tracking.c	2012-02-12 10:09:49.0 -0200
@@ -5298,7 +5298,6 @@ reverse_op (rtx val, const_rtx expr, rtx
 {
   rtx src, arg, ret;
   cselib_val *v;
-  struct elt_loc_list *l;
   enum rtx_code code;

   if (GET_CODE (expr) != SET)
@@ -5334,13 +5333,9 @@ reverse_op (rtx val, const_rtx expr, rtx
   if (!v || !cselib_preserved_value_p (v))
     return;

-  /* Adding a reverse op isn't useful if V already has an always valid
-     location.  Ignore ENTRY_VALUE, while it is always constant, we should
-     prefer non-ENTRY_VALUE locations whenever possible.  */
-  for (l = v->locs;
Re: [PR52001] too many cse reverse equiv exprs (take2)
On Mon, Feb 13, 2012 at 12:27:35PM -0200, Alexandre Oliva wrote: Jakub asked to have a closer look at the problem, and I found we could do somewhat better. The first thing I noticed was that the problem was that, in each block that computed a (base+const), we created a new VALUE for the expression (with the same const and global base), and a new reverse operation. I'm not convinced you want the + /* Keep VALUE equivalences around. */ + for (l = v->locs; l; l = l->next) + if (GET_CODE (l->loc) == VALUE) + return true; hunk in invariant_p, I'd say it should stay in preserve_only_constants; a value equivalence isn't necessarily invariant. Otherwise the cselib.c changes look ok to me, but I don't understand why you are removing the var-tracking.c loop. While cselib will with your changes handle the situation better, for values that are already invariant (guess canonical_cselib_val should be called before that loop, and perhaps instead of testing CONSTANT_P it could test invariant_p if you rename it to cselib_invariant_p and export it) adding any reverse ops for it is really just a waste of resources, because we have a better location already in the list. Adding the extra loc doesn't improve it in any way. Jakub
[PATCH] Fix PR52178
This fixes PR52178, the failure to bootstrap Ada with LTO (well, until you hit the next problem). A self-referential DECL_QUALIFIER makes us think that a QUAL_UNION_TYPE type is of variable-size which makes us stream that type locally, wrecking type merging and later ICEing in the type verifier. While it looks that variably_modified_type_p should not inspect DECL_QUALIFIER a less intrusive patch for 4.7 notices that DECL_QUALIFIER is unused after gimplification and thus clears it and does not stream it instead. LTO bootstrapped until I hit an optimization ICE when optimizing gnat1, a regular bootstrap regtest is pending on x86_64-unknown-linux-gnu. Richard. 2012-02-13 Richard Guenther rguent...@suse.de PR lto/52178 * tree-streamer-in.c (lto_input_ts_field_decl_tree_pointers): Do not stream DECL_QUALIFIER. * tree-streamer-out.c (write_ts_field_decl_tree_pointers): Likewise. * tree.c (free_lang_data_in_decl): Free DECL_QUALIFIER. (find_decls_types_r): Do not walk DECL_QUALIFIER. Index: gcc/tree-streamer-in.c === --- gcc/tree-streamer-in.c (revision 184151) +++ gcc/tree-streamer-in.c (working copy) @@ -640,7 +640,7 @@ lto_input_ts_field_decl_tree_pointers (s { DECL_FIELD_OFFSET (expr) = stream_read_tree (ib, data_in); DECL_BIT_FIELD_TYPE (expr) = stream_read_tree (ib, data_in); - DECL_QUALIFIER (expr) = stream_read_tree (ib, data_in); + /* Do not stream DECL_QUALIFIER, it is useless after gimplification. */ DECL_FIELD_BIT_OFFSET (expr) = stream_read_tree (ib, data_in); DECL_FCONTEXT (expr) = stream_read_tree (ib, data_in); } Index: gcc/tree-streamer-out.c === --- gcc/tree-streamer-out.c (revision 184151) +++ gcc/tree-streamer-out.c (working copy) @@ -552,7 +552,7 @@ write_ts_field_decl_tree_pointers (struc { stream_write_tree (ob, DECL_FIELD_OFFSET (expr), ref_p); stream_write_tree (ob, DECL_BIT_FIELD_TYPE (expr), ref_p); - stream_write_tree (ob, DECL_QUALIFIER (expr), ref_p); + /* Do not stream DECL_QUALIFIER, it is useless after gimplification. 
*/ stream_write_tree (ob, DECL_FIELD_BIT_OFFSET (expr), ref_p); stream_write_tree (ob, DECL_FCONTEXT (expr), ref_p); } Index: gcc/tree.c === --- gcc/tree.c (revision 184151) +++ gcc/tree.c (working copy) @@ -4596,7 +4596,10 @@ free_lang_data_in_decl (tree decl) free_lang_data_in_one_sizepos (DECL_SIZE (decl)); free_lang_data_in_one_sizepos (DECL_SIZE_UNIT (decl)); if (TREE_CODE (decl) == FIELD_DECL) -free_lang_data_in_one_sizepos (DECL_FIELD_OFFSET (decl)); +{ + free_lang_data_in_one_sizepos (DECL_FIELD_OFFSET (decl)); + DECL_QUALIFIER (decl) = NULL_TREE; +} if (TREE_CODE (decl) == FUNCTION_DECL) { @@ -4800,7 +4803,6 @@ find_decls_types_r (tree *tp, int *ws, v { fld_worklist_push (DECL_FIELD_OFFSET (t), fld); fld_worklist_push (DECL_BIT_FIELD_TYPE (t), fld); - fld_worklist_push (DECL_QUALIFIER (t), fld); fld_worklist_push (DECL_FIELD_BIT_OFFSET (t), fld); fld_worklist_push (DECL_FCONTEXT (t), fld); }
Re: [PATCH] Re: New atomics not mentioned in /gcc-4.7/changes.html
On 02/12/2012 04:48 PM, Gerald Pfeifer wrote: On Wed, 8 Feb 2012, Andrew MacLeod wrote: Checked in the shortened version and code changes. How's that? seems better :-) Yep, thanks! There is just a minor grammar issue I went ahead fixing. On the title page, I was thinking to refer to the release notes entry (gcc-4.7/changes.html), and would make this change for you if you agree. If not, we can leave it as is. I'm happy with your changes :-) Andrew
Re: [PING] New port resubmission for TILEPro and TILE-Gx
Ping. Can someone please review these ports? Here is a summary of the submission. Summary of changes in latest submit: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01854.html Latest submit: 1/6 toplevel: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01860.html 2/6 contrib: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01855.html 3/6 gcc: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01494.html 4/6 libcpp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01857.html 5/6 libgcc: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01858.html 6/6 libgomp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01859.html 1st round review comments: http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01385.html http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01387.html 2nd round review comments: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01232.html http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01247.html Thanks, Walter Lee
[Committed] S/390: Adjust gcc.c-torture/execute/pr51933.c testcase
Committed to mainline. 2012-02-13 Andreas Krebbel andreas.kreb...@de.ibm.com * gcc.c-torture/execute/pr51933.c: Modify for s390 31 bit. --- gcc/testsuite/gcc.c-torture/execute/pr51933.c |8 1 file changed, 8 insertions(+) Index: gcc/testsuite/gcc.c-torture/execute/pr51933.c === *** gcc/testsuite/gcc.c-torture/execute/pr51933.c.orig --- gcc/testsuite/gcc.c-torture/execute/pr51933.c *** static unsigned char v2[256], v3[256]; *** 6,12 --- 6,20 __attribute__((noclone, noinline)) void foo (void) { + #if defined(__s390__) && !defined(__zarch__) + /* S/390 31 bit cannot deal with more than one literal pool + reference per insn. */ + asm volatile ("" : : "g" (v1) : "memory"); + asm volatile ("" : : "g" (v2[0])); + asm volatile ("" : : "g" (v3[0])); + #else asm volatile ("" : : "g" (v1), "g" (v2[0]), "g" (v3[0]) : "memory"); + #endif } __attribute__((noclone, noinline)) int
[Patch,AVR]: Built-in for non-contiguous port layouts
This patch set removes the __builtin_avr_map8 and __builtin_avr_map16 built-ins and implements a built-in __builtin_avr_insert_bits instead. This has several reasons: * From user feedback I learned that speed matters more than size here * I found that the new built-in has better usability and fits better to the intended use cases. * Better code is generated by implementing hook TARGET_FOLD_BUILTIN. * The implementation is simpler (except the new folding part). * There were issues with __builtin_avr_map*. Instead of fixing these I went ahead and removed them altogether * The new built-in is generic enough to provide the old ones' functionalities easily. There are 2 new test programs for this built-in that all pass fine. Ok for trunk? Johann gcc/doc/ * extend.texi (AVR Built-in Functions): Remove doc for __builtin_avr_map8, __builtin_avr_map16. Document __builtin_avr_insert_bits. gcc/testsuite/ * gcc.target/avr/torture/builtin_insert_bits-1.c: New test. * gcc.target/avr/torture/builtin_insert_bits-2.c: New test. gcc/ * config/avr/avr.md (map_bitsqi, map_bitshi): Remove. (insert_bits): New insn. (adjust_len.map_bits): Rename to insert_bits. (UNSPEC_MAP_BITS): Rename to UNSPEC_INSERT_BITS. * avr-protos.h (avr_out_map_bits): Remove. (avr_out_insert_bits, avr_has_nibble_0xf): New. * config/avr/constraints.md (Cxf,C0f): New. * config/avr/avr.c (avr_cpu_cpp_builtins): Remove built-in defines __BUILTIN_AVR_MAP8, __BUILTIN_AVR_MAP16. New built-in define __BUILTIN_AVR_INSERT_BITS. * config/avr/avr.c (TARGET_FOLD_BUILTIN): New define. (enum avr_builtin_id): Add AVR_BUILTIN_INSERT_BITS. (avr_move_bits): Rewrite. (avr_fold_builtin, avr_map_metric, avr_map_decompose): New static functions. (avr_map_op_t): New typedef. (avr_map_op): New static variable. (avr_out_insert_bits, avr_has_nibble_0xf): New functions. (adjust_insn_length): Handle ADJUST_LEN_INSERT_BITS. (avr_init_builtins): Add definition for __builtin_avr_insert_bits. (bdesc_3arg, avr_expand_triop_builtin): New. 
(avr_expand_builtin): Use them. And handle AVR_BUILTIN_INSERT_BITS. (avr_revert_map, avr_swap_map, avr_id_map, avr_sig_map): Remove. (avr_map_hamming_byte, avr_map_hamming_nonstrict): Remove. (avr_map_equal_p, avr_map_sig_p): Remove. (avr_out_swap_bits, avr_out_revert_bits, avr_out_map_bits): Remove. (bdesc_2arg): Remove AVR_BUILTIN_MAP8, AVR_BUILTIN_MAP16. (adjust_insn_length): Remove handling for ADJUST_LEN_MAP_BITS. (enum avr_builtin_id): Remove AVR_BUILTIN_MAP8, AVR_BUILTIN_MAP16. (avr_init_builtins): Remove __builtin_avr_map8, __builtin_avr_map16. (avr_expand_builtin): Remove AVR_BUILTIN_MAP8, AVR_BUILTIN_MAP16. Index: doc/extend.texi === --- doc/extend.texi (revision 184156) +++ doc/extend.texi (working copy) @@ -8810,33 +8810,53 @@ might increase delay time. @code{ticks} integer constant; delays with a variable number of cycles are not supported. @smallexample - unsigned char __builtin_avr_map8 (unsigned long map, unsigned char val) + unsigned char __builtin_avr_insert_bits (unsigned long map, unsigned char bits, unsigned char val) @end smallexample @noindent -Each bit of the result is copied from a specific bit of @code{val}. -@code{map} is a compile time constant that represents a map composed -of 8 nibbles (4-bit groups): -The @var{n}-th nibble of @code{map} specifies which bit of @code{val} -is to be moved to the @var{n}-th bit of the result. -For example, @code{map = 0x76543210} represents identity: The MSB of -the result is read from the 7-th bit of @code{val}, the LSB is -read from the 0-th bit to @code{val}, etc. -Two more examples: @code{0x01234567} reverses the bit order and -@code{0x32107654} is equivalent to a @code{swap} instruction. +Insert bits from @var{bits} into @var{val} and return the resulting +value. The nibbles of @var{map} determine how the insertion is +performed: Let @var{X} be the @var{n}-th nibble of @var{map} +@enumerate +@item If @var{X} is @code{0xf}, +then the @var{n}-th bit of @var{val} is returned unaltered. 
+ +@item If X is in the range 0@dots{}7, +then the @var{n}-th result bit is set to the @var{X}-th bit of @var{bits} + +@item If X is in the range 8@dots{}@code{0xe}, +then the @var{n}-th result bit is undefined. +@end enumerate @noindent -One typical use case for this and the following built-in is adjusting input and -output values to non-contiguous port layouts. +One typical use case for this built-in is adjusting input and +output values to non-contiguous port layouts. Some examples: @smallexample - unsigned int __builtin_avr_map16 (unsigned long long map, unsigned int val) +// same as val, bits is unused
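The nibble-map semantics documented above can be modelled in plain host-side C. This is an illustrative sketch of the documented behaviour, not the AVR implementation; model_insert_bits is an invented name, and the undefined nibble values 8...0xe are arbitrarily modelled as producing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of __builtin_avr_insert_bits: nibble n of MAP says
   where result bit n comes from.  0xf keeps bit n of VAL, 0..7 inserts
   that bit of BITS, and 8..0xe is undefined (modelled here as 0).  */
static uint8_t
model_insert_bits (uint32_t map, uint8_t bits, uint8_t val)
{
  uint8_t result = 0;
  for (int n = 0; n < 8; n++)
    {
      unsigned x = (map >> (4 * n)) & 0xf;
      unsigned bit;
      if (x == 0xf)
        bit = (val >> n) & 1;      /* keep bit n of VAL unaltered */
      else if (x <= 7)
        bit = (bits >> x) & 1;     /* insert bit X of BITS */
      else
        bit = 0;                   /* undefined by the built-in */
      result |= (uint8_t) (bit << n);
    }
  return result;
}
```

With map 0xffffffff the result is val unchanged, and with map 0x76543210 every result bit is taken from the same-numbered bit of bits, matching the identity examples in the documentation patch.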
Re: trans-mem: virtual ops for gimple_transaction
On 02/13/2012 01:35 AM, Richard Guenther wrote: On Fri, 10 Feb 2012, Richard Henderson wrote: On 02/10/2012 01:44 AM, Richard Guenther wrote: What is the reason to keep a GIMPLE_TRANSACTION stmt after TM lowering and not lower it to a builtin function call? Because real optimization hasn't happened yet, and we hold out hope that we'll be able to delete stuff as unreachable. Especially all instances of transaction_cancel. It seems the body is empty after lowering (what's the label thing?) The label is the transaction cancel label. When we finally convert GIMPLE_TRANSACTION to a builtin, we'll generate different code layouts with and without a cancel. Ah, I see. But wouldn't a placeholder builtin function be effectively the same as using a new GIMPLE stmt kind? Except for the whole need to hold on to a label thing. Honestly, think about that for 10 seconds and tell me that a builtin is better than simply re-tasking the gimple code that we already have around. r~
New German PO file for 'gcc' (version 4.7-b20120128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the German team of translators. The file is available at: http://translationproject.org/latest/gcc/de.po (This file, 'gcc-4.7-b20120128.de.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
On Mon, 13 Feb 2012, Richard Guenther wrote: Indeed. But note that the transform is not valid as *this_node may cross a page boundary and thus either pointer load may trap if the other does not (well, unless the C standard (and thus our middle-end) would require that iff ptr->component does not trap that *ptr does not trap either - we would require an operand_equal_p (get_base_address ()) for both addresses). Joseph, can you clarify what the C standard specifies here? The question of what the relevant objects for an access are isn't well-defined in general, but it seems doubtful that accessing via a structure type is valid if the whole structure isn't in accessible memory. (Whereas you can't speculatively load from x[1] just because x[0] was accessed - x might point to an array of size 1. And of course this applies with flexible array members - access to any bit of the structure means the part before the flexible array member is available, but the flexible array member may not extend beyond the part accessed.) -- Joseph S. Myers jos...@codesourcery.com
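A contrived C sketch of the pattern under discussion (names invented; this is not code from the patch): the transformed form loads both adjacent fields unconditionally and then selects, which is only valid when the whole structure is known to be accessible.

```c
#include <assert.h>

struct node { long a, b; };

/* Source form: exactly one of the two adjacent fields is loaded.  */
static long
pick (const struct node *n, int cond)
{
  return cond ? n->a : n->b;
}

/* Transformed form the RFC aims for: load both fields, then select
   (a candidate for a conditional move).  Functionally equal, but the
   extra load is only safe when all of *n is known accessible; if *n
   straddles a page boundary, the load the source never performed
   could touch an unmapped page -- the objection raised above.  */
static long
pick_both_loads (const struct node *n, int cond)
{
  long a = n->a;
  long b = n->b;
  return cond ? a : b;
}
```

For a fully mapped object the two forms agree for either condition value; the debate is purely about when the compiler may assume that.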
Re: Documenting the MIPS changes in 4.7
Gerald Pfeifer ger...@pfeifer.com writes: On Sun, 5 Feb 2012, Richard Sandiford wrote: I've committed this patch to describe the MIPS changes in GCC 4.7. Corrections, comments, and help with wordsmithing are all welcome. Nice! How about the small follow-up below? The first definitely looks good, thanks. Not sure either way about the second; I'll leave it up to you. Richard
[Patch, libfortran] RFC: Shared vtables, constification
Hi, the attached patch changes the low-level libgfortran IO dispatching mechanism to use shared vtables for each stream type, instead of all the function pointers being replicated for each unit. This is similar to e.g. how the C++ frontend implements vtables. The benefits are: - Slightly smaller heap memory overhead for each unit as only the vtable pointer needs to be stored, and slightly faster unit initialization as only the vtable pointer needs to be set up instead of all the function pointers in the stream struct. - Looking at unix.o with readelf, one sees Relocation section '.rela.data.rel.ro.local.mem_vtable' at offset 0x15550 contains 8 entries: and similarly for the other vtables; according to http://www.airs.com/blog/archives/189 this means that after relocation the page where this data resides may be marked read-only. The downside is that the sizes of the .text and .data sections are increased. Before: text data bss dec hex filename 1116991 6664 592 1124247 112797 ./x86_64-unknown-linux-gnu/libgfortran/.libs/libgfortran.so After: text data bss dec hex filename 1117487 6936 592 1125015 112a97 ./x86_64-unknown-linux-gnu/libgfortran/.libs/libgfortran.so The data section increase is due to the vtables, the text increase is, I guess, due to the extra pointer dereference when calling the IO functions. Regtested on x86_64-unknown-linux-gnu, Ok for trunk, or 4.8? 2012-02-13 Janne Blomqvist j...@gcc.gnu.org * io/unix.h (struct stream): Rename to stream_vtable. (struct stream): New struct definition. (sread): Dereference vtable pointer. (swrite): Likewise. (sseek): Likewise. (struncate): Likewise. (sflush): Likewise. (sclose): Likewise. * io/unix.c (raw_vtable): New variable. (buf_vtable): Likewise. (mem_vtable): Likewise. (mem4_vtable): Likewise. (raw_init): Assign vtable pointer. (buf_init): Likewise. (open_internal): Likewise. (open_internal4): Likewise. 
-- Janne Blomqvist diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c index 6eef3f9..978c3ff 100644 --- a/libgfortran/io/unix.c +++ b/libgfortran/io/unix.c @@ -401,17 +401,21 @@ raw_close (unix_stream * s) return retval; } +static const struct stream_vtable raw_vtable = { + .read = (void *) raw_read, + .write = (void *) raw_write, + .seek = (void *) raw_seek, + .tell = (void *) raw_tell, + .size = (void *) raw_size, + .trunc = (void *) raw_truncate, + .close = (void *) raw_close, + .flush = (void *) raw_flush +}; + static int raw_init (unix_stream * s) { - s->st.read = (void *) raw_read; - s->st.write = (void *) raw_write; - s->st.seek = (void *) raw_seek; - s->st.tell = (void *) raw_tell; - s->st.size = (void *) raw_size; - s->st.trunc = (void *) raw_truncate; - s->st.close = (void *) raw_close; - s->st.flush = (void *) raw_flush; + s->st.vptr = &raw_vtable; s->buffer = NULL; return 0; @@ -619,17 +623,21 @@ buf_close (unix_stream * s) return raw_close (s); } +static const struct stream_vtable buf_vtable = { + .read = (void *) buf_read, + .write = (void *) buf_write, + .seek = (void *) buf_seek, + .tell = (void *) buf_tell, + .size = (void *) buf_size, + .trunc = (void *) buf_truncate, + .close = (void *) buf_close, + .flush = (void *) buf_flush +}; + static int buf_init (unix_stream * s) { - s->st.read = (void *) buf_read; - s->st.write = (void *) buf_write; - s->st.seek = (void *) buf_seek; - s->st.tell = (void *) buf_tell; - s->st.size = (void *) buf_size; - s->st.trunc = (void *) buf_truncate; - s->st.close = (void *) buf_close; - s->st.flush = (void *) buf_flush; + s->st.vptr = &buf_vtable; s->buffer = get_mem (BUFFER_SIZE); return 0; @@ -872,6 +880,31 @@ mem_close (unix_stream * s) return 0; } +static const struct stream_vtable mem_vtable = { + .read = (void *) mem_read, + .write = (void *) mem_write, + .seek = (void *) mem_seek, + .tell = (void *) mem_tell, + /* buf_size is not a typo, we just reuse an identical + implementation. 
*/ + .size = (void *) buf_size, + .trunc = (void *) mem_truncate, + .close = (void *) mem_close, + .flush = (void *) mem_flush +}; + +static const struct stream_vtable mem4_vtable = { + .read = (void *) mem_read4, + .write = (void *) mem_write4, + .seek = (void *) mem_seek, + .tell = (void *) mem_tell, + /* buf_size is not a typo, we just reuse an identical + implementation. */ + .size = (void *) buf_size, + .trunc = (void *) mem_truncate, + .close = (void *) mem_close, + .flush = (void *) mem_flush +}; /* Public functions -- A reimplementation of this module needs to @@ -895,16 +928,7 @@ open_internal (char *base, int length, gfc_offset offset) s->logical_offset = 0; s->active = s->file_length = length; - s->st.close = (void *) mem_close; - s->st.seek =
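The dispatch scheme the patch adopts can be illustrated with a minimal standalone C sketch (names invented for illustration; the real libgfortran structs carry many more function pointers):

```c
#include <assert.h>

struct stream;

/* One shared, read-only dispatch table per stream type, instead of a
   copy of every function pointer inside each stream object.  */
struct stream_vtable
{
  int (*read) (struct stream *);
};

struct stream
{
  const struct stream_vtable *vptr;  /* single pointer per object */
  int state;
};

static int
raw_read (struct stream *s)
{
  return s->state + 1;
}

/* The table lives once in (relocated, then read-only) data, like
   raw_vtable/mem_vtable in the patch above.  */
static const struct stream_vtable raw_vtable = { .read = raw_read };

static void
raw_init (struct stream *s, int state)
{
  s->vptr = &raw_vtable;  /* one store replaces N pointer assignments */
  s->state = state;
}
```

Each call now costs one extra dereference (s->vptr->read) in exchange for the smaller per-object footprint and cheaper initialization discussed above.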
[PATCH] Fix up vectorizer cost model use of uninitialized value (PR tree-optimization/52210)
Hi! The PR50912 changed vect_get_and_check_slp_defs dt from array into scalar, which fails when calling vect_model_simple_cost which looks at two array members. I believe even 4.6 checked just the first operand, as it called it when processing the first operand, so IMHO this patch doesn't regress (the very incomplete) cost model handling and doesn't introduce undefined behavior. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-02-13 Jakub Jelinek ja...@redhat.com PR tree-optimization/52210 * tree-vect-slp.c (vect_get_and_check_slp_defs): Call vect_model_simple_cost with two entry vect_def_type array instead of an address of dt. * gcc.dg/pr52210.c: New test. --- gcc/tree-vect-slp.c.jj 2012-02-07 16:05:51.0 +0100 +++ gcc/tree-vect-slp.c 2012-02-13 10:14:28.017357662 +0100 @@ -321,10 +321,15 @@ vect_get_and_check_slp_defs (loop_vec_in vect_model_store_cost (stmt_info, ncopies_for_cost, false, dt, slp_node); else - /* Not memory operation (we don't call this function for - loads). */ - vect_model_simple_cost (stmt_info, ncopies_for_cost, &dt, - slp_node); + { + enum vect_def_type dts[2]; + dts[0] = dt; + dts[1] = vect_uninitialized_def; + /* Not memory operation (we don't call this function for +loads). */ + vect_model_simple_cost (stmt_info, ncopies_for_cost, dts, + slp_node); + } } } else --- gcc/testsuite/gcc.dg/pr52210.c.jj 2012-02-13 10:27:46.692809216 +0100 +++ gcc/testsuite/gcc.dg/pr52210.c 2012-02-13 10:25:31.0 +0100 @@ -0,0 +1,12 @@ +/* PR tree-optimization/52210 */ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +void +foo (long *x, long y, long z) +{ + long a = x[0]; + long b = x[1]; + x[0] = a & ~y; + x[1] = b & ~z; +} Jakub
[PATCH] Fix __atomic_compare_exchange handling (PR c++/52215)
Hi! As the testcase shows, deciding on whether to convert an argument or not based on TYPE_SIZE is wrong. While the old __sync_* builtins in the _[1248]/_16 variants only had a VPTR as first argument and optionally I[1248]/I16 argument or arguments that should be converted, the new __atomic_* builtins also have PTR arguments (e.g. the expected pointer), BOOL (e.g. weak argument) or INT (e.g. the *memmodel arguments). Those have invariant types that shouldn't be adjusted based on what type the first pointer points to. I[1248]/I16 arguments are unsigned integers, the arguments that we don't want to adjust are BOOLEAN_TYPE/POINTER_TYPE or signed integers, so I think we should convert only unsigned INTEGER_TYPEs. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-02-13 Jakub Jelinek ja...@redhat.com PR c++/52215 * c-common.c (sync_resolve_params): Don't decide whether to convert or not based on TYPE_SIZE comparison, convert whenever arg_type is unsigned INTEGER_TYPE. * g++.dg/ext/atomic-1.C: New test. --- gcc/c-family/c-common.c.jj 2012-01-26 09:22:17.0 +0100 +++ gcc/c-family/c-common.c 2012-02-13 14:49:15.204685590 +0100 @@ -9336,10 +9336,12 @@ sync_resolve_params (location_t loc, tre return false; } - /* Only convert parameters if the size is appropriate with new format -sync routines. */ - if (orig_format - || tree_int_cst_equal (TYPE_SIZE (ptype), TYPE_SIZE (arg_type))) + /* Only convert parameters if arg_type is unsigned integer type with +new format sync routines, i.e. don't attempt to convert pointer +arguments (e.g. EXPECTED argument of __atomic_compare_exchange_n), +bool arguments (e.g. WEAK argument) or signed int arguments (memmodel +kinds). 
*/ + if (TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type)) { /* Ideally for the first conversion we'd use convert_for_assignment so that we get warnings for anything that doesn't match the pointer --- gcc/testsuite/g++.dg/ext/atomic-1.C.jj 2012-02-13 14:54:33.337864794 +0100 +++ gcc/testsuite/g++.dg/ext/atomic-1.C 2012-02-13 14:53:13.0 +0100 @@ -0,0 +1,12 @@ +// PR c++/52215 +// { dg-do compile } + +enum E { ZERO }; + +int +main () +{ + E e = ZERO; + __atomic_compare_exchange_n (&e, &e, e, true, __ATOMIC_ACQ_REL, + __ATOMIC_RELAXED); +} Jakub
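For comparison, the expected-pointer convention of __atomic_compare_exchange_n that the fix preserves matches C11's portable <stdatomic.h> interface: on failure the observed value is written back through the expected pointer, which is why that argument must stay a pointer and not be converted. A small sketch (the helper name is invented):

```c
#include <assert.h>
#include <stdatomic.h>

/* Returns what EXPECTED holds after one strong compare-exchange:
   unchanged on success, overwritten with the observed value on
   failure -- the same write-back the GCC builtin performs.  */
static int
expected_after_cas (int initial, int expected, int desired)
{
  atomic_int v;
  atomic_init (&v, initial);
  int exp = expected;
  atomic_compare_exchange_strong (&v, &exp, desired);
  return exp;
}
```

Converting that pointer argument as if it were an integer operand (the pre-patch TYPE_SIZE-based behaviour) breaks exactly this contract.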
[ping 5] [patch] attribute to reverse bitfield allocations
Ping 5... Ping 4... Ping 3? It's been months with no feedback... Ping 2 ? http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01889.html http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02555.html http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00529.html http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01246.html
[PATCH] Fix cselib dump ICE
Hi! While debugging PR52172, I've noticed ICE when dumping RTL, all of cselib seems to test setting_insn for NULL, but this spot doesn't. Ok for trunk? 2012-02-13 Jakub Jelinek ja...@redhat.com * cselib.c (dump_cselib_val): Don't assume l->setting_insn is non-NULL. --- gcc/cselib.c.jj 2012-01-26 09:22:21.0 +0100 +++ gcc/cselib.c2012-02-13 11:07:15.109023769 +0100 @@ -2688,8 +2688,11 @@ dump_cselib_val (void **x, void *info) fputs (" locs:", out); do { - fprintf (out, "\n from insn %i ", - INSN_UID (l->setting_insn)); + if (l->setting_insn) + fprintf (out, "\n from insn %i ", +INSN_UID (l->setting_insn)); + else + fprintf (out, "\n "); print_inline_rtx (out, l->loc, 4); } while ((l = l->next)); Jakub
Re: [PATCH] Prevent cselib substitution of FP, SP, SFP
On Wed, Jan 04, 2012 at 05:21:38PM +0000, Marcus Shawcroft wrote: Alias analysis by DSE based on CSELIB expansion assumes that references to the stack frame from different base registers (ie FP, SP) never alias. The comment block in cselib explains that cselib does not allow substitution of FP, SP or SFP specifically in order not to break DSE. Looks reasonable, apart from coding style (no spaces around -> and no {} around return p->loc;), I just wonder if having a separate loop in expand_loc just for this isn't too expensive. On sane targets IMHO hard frame pointer in the prologue should be initialized from sp, not the other way around, thus hard frame pointer based VALUEs should have hard frame pointer earlier in the locs list (when there is hfp = sp (+ optionally some const) insn, we first cselib_lookup_from_insn the rhs and add to locs of the new VALUE (plus (VALUE of sp) (const_int)), then process the lhs and add it to locs, moving the plus to locs->next). So I think the following patch could be enough (bootstrapped/regtested on x86_64-linux and i686-linux). There is AVR though, which has really weirdo prologue - PR50063, but I think it should just use UNSPEC for that or something similar, setting sp from hfp seems unnecessary and especially for values with long locs chains could make cselib more expensive. Richard, what do you think about this? 2012-02-13 Jakub Jelinek ja...@redhat.com * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right away if seen. --- gcc/cselib.c.jj 2012-02-13 11:07:15.0 +0100 +++ gcc/cselib.c2012-02-13 18:15:17.531776145 +0100 @@ -1372,8 +1372,18 @@ expand_loc (struct elt_loc_list *p, stru unsigned int regno = UINT_MAX; struct elt_loc_list *p_in = p; - for (; p; p = p -> next) + for (; p; p = p->next) { + /* Return these right away to avoid returning stack pointer based +expressions for frame pointer and vice versa, which is something +that would confuse DSE. See the comment in cselib_expand_value_rtx_1 +for more details. 
*/ + if (REG_P (p->loc) + && (REGNO (p->loc) == STACK_POINTER_REGNUM + || REGNO (p->loc) == FRAME_POINTER_REGNUM + || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM + || REGNO (p->loc) == cfa_base_preserved_regno)) + return p->loc; /* Avoid infinite recursion trying to expand a reg into a the same reg. */ if ((REG_P (p->loc)) Jakub
[PATCH, go]: Disable TestListenMulticastUDP on alpha linux
Hello! alpha linux does not have expected /proc/net/igmp and /proc/net/igmp6 files, so func interfaceMulticastAddrTable(ifindex int) from interface_linux.go always returns (nil, nil), failing net/test with: --- FAIL: net.TestListenMulticastUDP (4.71 seconds) ???:1: IPv4 multicast interface: nil ???:1: IPv4 multicast TTL: 1 ???:1: IPv4 multicast loopback: false ???:1: 224.0.0.254:12345 not found in RIB FAIL Attached patch skips this sub-test in the same way as for ARM arch. Tested on alphaev6-pc-linux-gnu, where it fixes failing net test. Uros. Index: go/net/multicast_test.go === --- go/net/multicast_test.go(revision 184156) +++ go/net/multicast_test.go(working copy) @@ -33,7 +33,7 @@ case "netbsd", "openbsd", "plan9", "windows": return case "linux": - if runtime.GOARCH == "arm" { + if runtime.GOARCH == "arm" || runtime.GOARCH == "alpha" { return } }
[committed] Fix invalid GOMP_loop_static_start call (PR middle-end/52230)
Hi! If the omp for loop body doesn't fall through (which doesn't make much sense), then we would call GOMP_loop_static_start with the wrong number of arguments if collapse is 1, static scheduling without chunk size and no ordered clause. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2012-02-13 Jakub Jelinek ja...@redhat.com PR middle-end/52230 * omp-low.c (expand_omp_for): If a static schedule without chunk size has NULL region->cont, force fd.chunk_size to be integer_zero_node. --- gcc/omp-low.c.jj2012-01-13 21:47:35.0 +0100 +++ gcc/omp-low.c 2012-02-13 12:54:55.137590443 +0100 @@ -4664,6 +4664,9 @@ expand_omp_for (struct omp_region *regio { int fn_index, start_ix, next_ix; + if (fd.chunk_size == NULL + && fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC) + fd.chunk_size = integer_zero_node; gcc_assert (fd.sched_kind != OMP_CLAUSE_SCHEDULE_AUTO); fn_index = (fd.sched_kind == OMP_CLAUSE_SCHEDULE_RUNTIME) ? 3 : fd.sched_kind; Jakub
Re: [PING] New port resubmission for TILEPro and TILE-Gx
On 02/13/2012 07:42 AM, Walter Lee wrote: 1/6 toplevel: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01860.html 2/6 contrib: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01855.html 3/6 gcc: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01494.html 4/6 libcpp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01857.html 5/6 libgcc: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01858.html 6/6 libgomp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01859.html Ok. r~
Re: [PATCH] Fix cselib dump ICE
On 02/13/2012 11:43 AM, Jakub Jelinek wrote: * cselib.c (dump_cselib_val): Don't assume l-setting_insn is non-NULL. Ok. r~
Re: [PATCH] Prevent cselib substitution of FP, SP, SFP
On 02/13/2012 11:54 AM, Jakub Jelinek wrote: * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right away if seen. Looks good. r~
Re: [PATCH] Fix __atomic_compare_exchange handling (PR c++/52215)
On 02/13/2012 11:42 AM, Jakub Jelinek wrote: 2012-02-13 Jakub Jelinek ja...@redhat.com PR c++/52215 * c-common.c (sync_resolve_params): Don't decide whether to convert or not based on TYPE_SIZE comparison, convert whenever arg_type is unsigned INTEGER_TYPE. * g++.dg/ext/atomic-1.C: New test. Ok. r~
Re: [libitm] Add SPARC bits
On 02/12/2012 12:15 PM, Eric Botcazou wrote: 2012-02-12 Eric Botcazou ebotca...@adacore.com * configure.tgt (target_cpu): Handle sparc and sparc64 sparcv9. * config/sparc/cacheline.h: New file. * config/sparc/target.h: Likewise. * config/sparc/sjlj.S: Likewise. * config/linux/sparc/futex_bits.h: Likewise. Ok. Thanks for this. r~
[patch, testsuite] PR 52229, testsuite failure
Hello world, the attached patch xfails the offending test case on architectures which do not allow unaligned access for vectorization. OK for trunk? Any other architectures which should be XFAILed? Regression-tested on powerpc64-unknown-linux-gnu. OK for trunk? Thomas 2012-02-13 Thomas Koenig tkoe...@gcc.gnu.org PR testsuite/52229 PR fortran/32380 * gfortran.dg/vect/pr32380.f: XFAIL on PowerPC and ia-64. Index: pr32380.f === --- pr32380.f (Revision 184166) +++ pr32380.f (Arbeitskopie) @@ -259,5 +259,5 @@ return end -! { dg-final { scan-tree-dump-times "vectorized 7 loops" 1 "vect" } } +! { dg-final { scan-tree-dump-times "vectorized 7 loops" 1 "vect" { xfail powerpc*-*-* ia64-*-*-* } } } ! { dg-final { cleanup-tree-dump "vect" } }
Re: [PR52001] too many cse reverse equiv exprs (take2)
Alexandre Oliva aol...@redhat.com writes: Jakub asked to have a closer look at the problem, and I found we could do somewhat better. The first thing I noticed was that the problem was that, in each block that computed a (base+const), we created a new VALUE for the expression (with the same const and global base), and a new reverse operation. This was wrong. Clearly we should reuse the same expression. I had to arrange for the expression to be retained across basic blocks, for it was function invariant. I split out the code to detect invariants from the function that removes entries from the cselib hash table across blocks, and made it recursive so that a VALUE equivalent to (plus (value) (const_int)) will be retained, if the base value fits (maybe recursively) the definition of invariant. An earlier attempt to address this issue remained in cselib: using the canonical value to build the reverse expression. I believe it has a potential of avoiding the creation of redundant reverse expressions, for expressions involving equivalent but different VALUEs will evaluate to different hashes. I haven't observed effects WRT the given testcase, before or after the change that actually fixed the problem, because we now find the same base expression and thus reuse the reverse_op as well, but I figured I'd keep it in for it is very cheap and possibly useful. Thanks for looking at this. Just to be sure: does this avoid the kind of memrefs_conflict_p cycle I was seeing in: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01051.html (in theory, I mean). Richard
Re: [Patch, libfortran] RFC: Shared vtables, constification
On Mon, Feb 13, 2012 at 7:20 PM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: Hi, the attached patch changes the low-level libgfortran IO dispatching mechanism to use shared vtables for each stream type, instead of all the function pointers being replicated for each unit. This is similar to e.g. how the C++ frontend implements vtables. The benefits are: - Slightly smaller heap memory overhead for each unit as only the vtable pointer needs to be stored, and slightly faster unit initialization as only the vtable pointer needs to be setup instead of all the function pointers in the stream struct. - Looking at unix.o with readelf, one sees Relocation section '.rela.data.rel.ro.local.mem_vtable' at offset 0x15550 contains 8 entries: and similarly for the other vtables; according to http://www.airs.com/blog/archives/189 this means that after relocation the page where this data resides may be marked read-only. The downside is that the sizes of the .text and .data sections are increased. Before: text data bss dec hex filename 1116991 6664 592 1124247 112797 ./x86_64-unknown-linux-gnu/libgfortran/.libs/libgfortran.so After: text data bss dec hex filename 1117487 6936 592 1125015 112a97 ./x86_64-unknown-linux-gnu/libgfortran/.libs/libgfortran.so The data section increase is due to the vtables, the text increase is, I guess, due to the extra pointer dereference when calling the IO functions. Regtested on x86_64-unknown-linux-gnu, Ok for trunk, or 4.8? Certainly not for trunk at this stage. For 4.8: So the trade-off is between faster initialization and smaller heap vs. fewer pointer dereferences? Does this patch fix an actual problem? Does it bring a killer feature? Otherwise, I'd say if it ain't broke, don't fix it! Ciao! Steven
[pph] Re-factor streaming of binding levels (issue5663043)
This patch re-writes the streaming of binding levels to guarantee that the whole tree of binding levels in each file is written and merged-in before anything else. With this re-factoring, we now write all the binding levels, the merge keys for symbols/types and their other contents at the start of the PPH image. Additionally, we do not skip any namespaces when traversing the binding level tree (we used to skip over builtin namespaces, which caused problems when looking up things like std::ptrdiff_t). After all the binding levels have been merged-in, every other read of a binding level is expected to be read as a reference (so that we don't materialize a new binding level that has not been merged in). With this change, I get significantly fewer name lookup failures in our internal code base. But this is still incomplete. In chasing down other failures, I found out that we should also be writing out the table of canonical types (type_hash_table). I'm getting a new ICE, because two different types that compare the same fail the TYPE_CANONICAL identity test. Lawrence, to avoid too many merge conflicts with the patch you are working with, I will be fixing this new failure in a subsequent patch. Most of this patch is moving code around. The old pph_out/in_binding_level is now simply expecting a reference to a binding level. The merging into existing binding levels (e.g., the global binding scope or binding levels for already existing namespaces) is done by pph_in_binding_level_start. This routine will take an existing binding level as parameter and use it in two ways: 1- If the record read from STREAM is a reference, the binding level in that reference must be identical to EXISTING_BL. 2- If the record read from STREAM is a new instance, the binding level given in EXISTING_BL is registered in the cache at the slot location given by this record. This way, subsequent internal references to EXISTING_BL will resolve to EXISTING_BL. 
This is used for binding levels that are already set in the compilation (e.g., scope_chain->bindings). 2012-02-13 Diego Novillo dnovi...@google.com cp/ChangeLog.pph * pph-in.c (pph_in_binding_level_start): Move earlier into the file. Change return type to cp_binding_level *. Add argument EXISTING_BL and EXISTED_P. If EXISTING_BL is given, and a reference is read from STREAM, the reference read should be the same as EXISTING_BL. If EXISTING_BL is given and a new reference is started, do not allocate a new instance. Rather, register EXISTING_BL in the cache. (pph_in_binding_level_ref): Rename from pph_in_binding_level. Assert that it always reads a reference record. Update all users. (pph_in_binding_level_1): Move body inside pph_in_merge_body_binding_level. (pph_in_merge_key_binding_level): Move earlier in the file. (pph_in_merge_body_binding_level): Move earlier in the file. (pph_in_merge_body_binding_level_1): Move body into pph_in_merge_body_binding_level. (pph_in_ld_ns): Call pph_in_binding_level_ref. (pph_ensure_namespace_binding_level): Move body into pph_in_merge_key_namespace_decl. Update all users. (pph_in_merge_key_namespace_decl): Fix comment. (pph_in_global_binding): Call pph_in_binding_level_start. * pph-out.c (pph_out_tree_vec_unchain): Remove. (pph_out_chain_filtered): Remove. (pph_out_binding_level_ref): Rename from pph_out_binding_level. Always expect to write a reference record. Update all users. (pph_out_cxx_binding_1): Embed inside pph_out_merge_body_binding_level. Update all users. (pph_out_merge_key_binding_level): Do not filter BL->NAMESPACES. (pph_out_merge_body_binding_level): Likewise. testsuite/ChangeLog.pph * g++.dg/pph/x7dynarray5.cc: Add expected failure due to type canonical mismatch. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/pph@184170 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/cp/ChangeLog.pph| 36 gcc/cp/pph-in.c | 323 ++- gcc/cp/pph-out.c| 183 +++--- gcc/testsuite/ChangeLog.pph |5 + gcc/testsuite/g++.dg/pph/x7dynarray5.cc |2 + 5 files changed, 260 insertions(+), 289 deletions(-) diff --git a/gcc/cp/ChangeLog.pph b/gcc/cp/ChangeLog.pph index b078607..2fa3153 100644 --- a/gcc/cp/ChangeLog.pph +++ b/gcc/cp/ChangeLog.pph @@ -1,3 +1,39 @@ +2012-02-13 Diego Novillo dnovi...@google.com + + * pph-in.c (pph_in_binding_level_start): Move earlier into the + file. + Change return type to cp_binding_level *. + Add argument EXISTING_BL and EXISTED_P. + If EXISTING_BL is given, and a reference is read from STREAM, + the reference read
Re: [libitm] Link with -litm and -pthread
On 02/11/2012 06:14 AM, Eric Botcazou wrote: 2012-02-11 Eric Botcazou ebotca...@adacore.com * gcc.c (LINK_COMMAND_SPEC): Deal with -fgnu-tm. (GTM_SELF_SPECS): Define if not already defined. (driver_self_specs): Add GTM_SELF_SPECS. * config/darwin.h (GTM_SELF_SPECS): Define. * config/i386/cygwin.h (GTM_SELF_SPECS): Likewise. * config/i386/mingw32.h (GTM_SELF_SPECS): Likewise. 2012-02-11 Eric Botcazou ebotca...@adacore.com * configure.ac (link_itm): Fix comment. * configure: Regenerate. * testsuite/lib/libitm.exp: Do not pass -litm for the link. Ok with the darwin followup-patch. r~
Re: [PING] New port resubmission for TILEPro and TILE-Gx
On 2/13/2012 3:02 PM, Richard Henderson wrote: On 02/13/2012 07:42 AM, Walter Lee wrote: 1/6 toplevel: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01860.html 2/6 contrib: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01855.html 3/6 gcc: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01494.html 4/6 libcpp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01857.html 5/6 libgcc: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01858.html 6/6 libgomp: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01859.html Ok. r~ Hi Richard, Thanks for the review. Do I have permission to commit, or is there anything else I need to do? I will update the copyright notices with the year 2012. The assignment of copyright paperwork was filed on May 26, 2011 by Tilera Corporation. The gcc steering committee has approved my maintainership: http://gcc.gnu.org/ml/gcc/2012-02/msg00123.html. I have an account at sourceware.org. Can I use your name to get commit rights to gcc? Thanks, Walter
Re: [wwwdocs] deprecation of access declarations
2012/2/12 Gerald Pfeifer ger...@pfeifer.com: On Fri, 27 Jan 2012, Fabien Chêne wrote: I get back to you for the snippet about deprecated access declarations. I would also find it sensible to advertise about the fix of c++/14258, a popular bug I have hit myself many times. OK to commit the below ? Yes, thank you. One suggestion: where it reads c++/14258, how about making this bug c++/14258, for those who are less familiar how we name things? I have committed it with the change that you have suggested. Do we need an update for http://gcc.gnu.org/gcc-4.7/porting_to.html as well? I don't know. The deprecation of access declarations only raises a warning -- unless -Werror is used. Is porting_to.html appropriate to describe the way to fix the warning ? Concerning the other changes related to using declarations, I don't expect them to (massively) break some existing code. -- Fabien
[patch] libitm: Add multi-lock, write-through TM method.
This patch adds a new TM method, ml_wt, which uses an array of locks with version numbers and runs a write-through algorithm with time-based validations and snapshot time extensions. patch1 adds xcalloc as a helper function for allocations (used in the new TM method). patch2 improves TM method reinitialization (helps ml_wt avoid reallocation of the lock array) and adds a hook to TM methods so that they can report back whether they can deal with the current runtime situation (e.g., a the current number of threads). patch3 is the actual TM method. Tested on ppc64 with up to 64 threads with both microbenchmarks and STAMP. OK for trunk? commit c0d1d1778b18f3dfc4a136e5a807c2fecbeb64e4 Author: Torvald Riegel trie...@redhat.com Date: Thu Feb 9 13:44:38 2012 +0100 libitm: Add xcalloc. libitm/ * util.cc (GTM::xcalloc): New. * common.h (GTM::xcalloc): Declare. diff --git a/libitm/common.h b/libitm/common.h index 14d0efb..b1ef2d4 100644 --- a/libitm/common.h +++ b/libitm/common.h @@ -54,6 +54,8 @@ namespace GTM HIDDEN { // cache lines that are not shared with any object used by another thread. extern void * xmalloc (size_t s, bool separate_cl = false) __attribute__((malloc, nothrow)); +extern void * xcalloc (size_t s, bool separate_cl = false) + __attribute__((malloc, nothrow)); extern void * xrealloc (void *p, size_t s, bool separate_cl = false) __attribute__((malloc, nothrow)); diff --git a/libitm/util.cc b/libitm/util.cc index afd37e4..48a1bf8 100644 --- a/libitm/util.cc +++ b/libitm/util.cc @@ -71,6 +71,18 @@ xmalloc (size_t size, bool separate_cl) } void * +xcalloc (size_t size, bool separate_cl) +{ + // TODO Use posix_memalign if separate_cl is true, or some other allocation + // method that will avoid sharing cache lines with data used by other + // threads. 
+ void *r = calloc (1, size); + if (r == 0) +GTM_fatal ("Out of memory allocating %lu bytes", (unsigned long) size); + return r; +} + +void * xrealloc (void *old, size_t size, bool separate_cl) { // TODO Use posix_memalign if separate_cl is true, or some other allocation
+ virtual bool supports(unsigned number_of_threads) { return true; } bool read_only () const { return m_read_only; } bool write_through() const { return m_write_through; } diff --git a/libitm/retry.cc b/libitm/retry.cc index decd773..6e05f5f 100644 --- a/libitm/retry.cc +++ b/libitm/retry.cc @@ -58,11 +58,8 @@ GTM::gtm_thread::decide_retry_strategy (gtm_restart_reason r) serial_lock.read_unlock(this); serial_lock.write_lock(); if (disp->get_method_group() == default_dispatch->get_method_group()) - { - // Still the same method group. - disp->get_method_group()->fini(); - disp->get_method_group()->init(); - } + // Still the same method group. + disp->get_method_group()->reinit(); serial_lock.write_unlock(); serial_lock.read_lock(this); if (disp->get_method_group() != default_dispatch->get_method_group()) @@ -72,11 +69,8 @@ GTM::gtm_thread::decide_retry_strategy (gtm_restart_reason r) } } else - { - // We are a serial transaction already, which makes things simple. - disp->get_method_group()->fini(); - disp->get_method_group()->init(); - } + // We are a serial transaction already, which makes things simple. + disp->get_method_group()->reinit(); } bool retry_irr = (r == RESTART_SERIAL_IRR); @@ -249,7 +243,7 @@ GTM::gtm_thread::number_of_threads_changed(unsigned previous, unsigned now)
[lra] fixing x86 gcc testsuite regressions
The following tiny patch fixes testsuite regressions on x86-64 that occurred after the latest merge (this weekend). The patch was successfully bootstrapped on x86/x86-64. Committed as rev. 184173. 2012-02-13 Vladimir Makarov vmaka...@redhat.com * lra.c (check_rtl): Ignore addr with UNSPEC. Index: lra.c === --- lra.c (revision 184156) +++ lra.c (working copy) @@ -1940,6 +1940,7 @@ check_rtl (bool final_p) legitimate if they satisfies the constraints and will be checked by insn constraints which we ignore here. */ + && GET_CODE (XEXP (op, 0)) != UNSPEC && GET_CODE (XEXP (op, 0)) != PRE_DEC && GET_CODE (XEXP (op, 0)) != PRE_INC && GET_CODE (XEXP (op, 0)) != POST_DEC
Re: [lra] fixing x86 gcc testsuite regressions
On Mon, Feb 13, 2012 at 10:50 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following tiny patch fixes testsuite regressions on x86-64 occurred after latest merge (this weekend). Hello Vladimir, Could you please also update http://gcc.gnu.org/svn.html#devbranches ? It still mentions ira as an active development branch, but lra isn't mentioned yet. Ciao! Steven
[wwwdocs] Use dependent instead of dependant
Per http://gcc.gnu.org/codingconventions.html we should use dependent, not dependant. This fixes this for the new GCC 4.7 porting notes as well as one old news entry. Committed. Gerald Index: gcc-4.7/porting_to.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/porting_to.html,v retrieving revision 1.7 diff -u -3 -p -r1.7 porting_to.html --- gcc-4.7/porting_to.html 27 Jan 2012 01:04:13 - 1.7 +++ gcc-4.7/porting_to.html 13 Feb 2012 22:17:27 - @@ -106,7 +106,7 @@ Instead, use the POSIX macro code_REEN p The C++ compiler no longer performs some extra unqualified lookups it had performed in the past, namely -a href=http://gcc.gnu.org/PR24163;dependant base class scope lookups/a +a href=http://gcc.gnu.org/PR24163;dependent base class scope lookups/a and a href=http://gcc.gnu.org/PR29131;unqualified template function/a lookups. /p Index: news/ia32.html === RCS file: /cvs/gcc/wwwdocs/htdocs/news/ia32.html,v retrieving revision 1.6 diff -u -3 -p -r1.6 ia32.html --- news/ia32.html 21 Jan 2002 10:24:45 - 1.6 +++ news/ia32.html 13 Feb 2012 22:17:27 - @@ -36,7 +36,7 @@ and focused on better optimization for t free registers and allocate them as scratches. This is a generalization of the PGCC -friscify pass./li -liRecognition of extension-dependant GIVs. This shows up in a loop like +liRecognition of extension-dependent GIVs. This shows up in a loop like pre short s; for (s = 0; s lt; 10; ++s) @@ -48,7 +48,7 @@ and focused on better optimization for t liRecognition of certain forms of loop-carried post-decrement. Primarily, pre -while (a--) { /* nothing dependant on a */ } +while (a--) { /* nothing dependent on a */ } /pre becomes pre
Re: [lra] fixing x86 gcc testsuite regressions
On 02/13/2012 05:07 PM, Steven Bosscher wrote: On Mon, Feb 13, 2012 at 10:50 PM, Vladimir Makarovvmaka...@redhat.com wrote: The following tiny patch fixes testsuite regressions on x86-64 occurred after latest merge (this weekend). Hello Vladimir, Could you please also update http://gcc.gnu.org/svn.html#devbranches ? It still mentions ira as an active development branch, but lra isn't mentioned yet. Ok. I've just done that. The patch is in the attachment. Index: svn.html === RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v retrieving revision 1.167 diff -u -r1.167 svn.html --- svn.html 1 Feb 2012 19:55:33 - 1.167 +++ svn.html 13 Feb 2012 22:27:18 - @@ -210,30 +210,6 @@ is maintained by a href=mailto:berg...@vnet.ibm.comPeter Bergner/a./dd - dtira/dt - ddThis branch contains the Integrated Register Allocator (IRA). It is - based on work done on yara-branch. The latter is more of a research - branch because one of its goals (removing reload) is too remote. The - ira branch is focused to prepare some code for GCC mainline, hopefully - in time for GCC 4.4. IRA still uses reload; it is called integrated - because register coalescing and register live range splitting are done - on-the-fly during coloring. The branch is maintained by Vladimir - Makarov lt; a - href=mailto:vmaka...@redhat.comvmaka...@redhat.com/agt; and - will be merged with mainline from time to time. Patches will be - marked with the tag code[ira]/code in the subject line./dd - - dtira-merge/dt - ddThis branch contains bug fixes for the Integrated Register Allocator - (IRA). It is branched from trunk at revision 139590 when IRA was - merged into trunk. It is used to track IRA related regressions. - Only IRA fixes from trunk will be applied to this branch. Its goal is - there should be no make check and performance regressions against - trunk at revision 139589. The branch is maintained by H.J. 
Lu lt;a - href=mailto:hjl.to...@gmail.comhjl.to...@gmail.com/agt; and - Vladimir Makarov lt; - a href=mailto:vmaka...@redhat.comvmaka...@redhat.com/agt;./dd - dtsel-sched-branch/dt ddThis branch contains the implementation of the selective scheduling approach. The goal of the branch is to provide more aggressive scheduler @@ -336,6 +312,15 @@ maintained by Richard Guenther and H.J. Lu. Patches should be marked with the tag code[vect256]/code in the subject line./dd + + dtlra/dt + ddThis branch contains the Local Register Allocator (LRA). LRA is + focused to replace GCC reload pass. The branch is maintained by + Vladimir Makarov + lt; a href=mailto:vmaka...@redhat.comvmaka...@redhat.com/agt; + and will be merged with mainline from time to time. Patches will be + marked with the tag code[lra]/code in the subject line./dd + /dl h4Architecture-specific/h4
Re: [Patch, fortran] PR50981 absent polymorphic scalar actual arguments
Mikael, This is OK for trunk with one proviso; could you move is_class_container_ref to gfc_is_class_container_ref in class.c? Thanks for the patch Paul On Sun, Feb 12, 2012 at 10:11 PM, Mikael Morin mikael.mo...@sfr.fr wrote: Hello, this is the next PR50981 fix: when passing polymorphic scalar actual arguments to elemental procedures, we were not adding the _data component reference. The fix is straightforward; checking that the expression's type is BT_CLASS was introducing regressions, so this patch uses a helper function to check the type without impacting the testsuite. Regression tested on x86_64-unknown-freebsd9.0. OK for trunk? Mikael -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: [wwwdocs] deprecation of access declarations
On Mon, 13 Feb 2012, Fabien Chêne wrote: Do we need an update for http://gcc.gnu.org/gcc-4.7/porting_to.html as well? I don't know. The deprecation of access declarations only raises a warning -- unless -Werror is used. Is porting_to.html appropriate to describe the way to fix the warning ? Hmm, I guess if it's only a warning we do not need to document it yet. Let's see whether Jason or other C++ affine developers think differently. Gerald
Re: [patch] libitm: Add multi-lock, write-through TM method.
On 02/13/2012 01:47 PM, Torvald Riegel wrote: + else { Watch the formatting. + // Location-to-orec mapping. Stripes of 16B mapped to 2^19 orecs. + static const gtm_word L2O_ORECS = 1 << 19; + static const gtm_word L2O_SHIFT = 4; Is it just easier to say 16B or did we really want CACHELINE_SIZE? Otherwise ok. r~
[PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
Hello! We can jump indirectly from a memory address, sparing a couple of cycles. 2012-02-14 Uros Bizjak ubiz...@gmail.com * config/x86/target.h (GTM_longjmp): Jump indirect from memory address. Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline? Uros. Index: config/x86/sjlj.S === --- config/x86/sjlj.S (revision 184177) +++ config/x86/sjlj.S (working copy) @@ -119,23 +119,19 @@ SYM(GTM_longjmp): movq 32(%rsi), %r13 movq 40(%rsi), %r14 movq 48(%rsi), %r15 - movq 56(%rsi), %rdx movl %edi, %eax cfi_def_cfa(%rcx, 0) - cfi_register(%rip, %rdx) movq %rcx, %rsp - jmp *%rdx + jmp *56(%rsi) #else movl (%edx), %ecx movl 4(%edx), %ebx movl 8(%edx), %esi movl 12(%edx), %edi movl 16(%edx), %ebp - movl 20(%edx), %edx cfi_def_cfa(%ecx, 0) - cfi_register(%eip, %edx) movl %ecx, %esp - jmp *%edx + jmp *20(%edx) #endif cfi_endproc
Re: [PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
On 02/13/2012 02:54 PM, Uros Bizjak wrote: - movq56(%rsi), %rdx movl%edi, %eax cfi_def_cfa(%rcx, 0) - cfi_register(%rip, %rdx) movq%rcx, %rsp - jmp *%rdx + jmp *56(%rsi) If you're going to do that, the correct fix for the unwind info is - cfi_register(%rip, %rdx) + cfi_offset(%rip, 56) Otherwise ok. r~
Re: [PATCH 4.8 v2, i386]: Make CCZ mode compatible with CCGOC and CCGO modes
On 02/11/2012 12:56 AM, Uros Bizjak wrote: FWIW, the mode of flags in users doesn't matter at all on x86, but which way is correct? As far as I know, it doesn't matter anywhere. We don't even bother to have perfect harmony between integer modes in hard registers -- think about what happens when we drop all the subregs on the floor post-reload. Yes, it's probably an error if we don't have compatible modes between def and use, but nothing is going to check for that. r~
[patch] libitm: Fix race condition in dispatch choice at transaction begin.
This patch fixes a race condition in how transactions previously chose the dispatch at transaction begin: default_dispatch in retry.cc was read by transactions before they became either serial or nonserial transactions (with the serial_lock). A concurrent change of default_dispatch could lead to a transaction starting with an out-of-date dispatch that was potentially incompatible with the actual current dispatch, leading in turn to all sorts of synchronization failures. This patch fixes that by moving the serial_lock acquisition into the dispatch choice code. Tested on ppc64 with microbenchmarks. OK for trunk? commit ce52924dedca632b24ea931329e060959782f89a Author: Torvald Riegel trie...@redhat.com Date: Mon Feb 13 23:49:55 2012 +0100 libitm: Fix race condition in dispatch choice at transaction begin. libitm/ * beginend.cc (GTM::gtm_thread::begin_transaction): Move serial lock acquisition to ... * retry.cc (GTM::gtm_thread::decide_begin_dispatch): ... here. (default_dispatch): Make atomic. (GTM::gtm_thread::decide_retry_strategy, GTM::gtm_thread::set_default_dispatch): Access atomically. (GTM::gtm_thread::number_of_threads_changed): Initialize default_dispatch here. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 08c2174..e6a84de 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -233,16 +233,6 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) { // Outermost transaction disp = tx->decide_begin_dispatch (prop); - if (disp == dispatch_serialirr() || disp == dispatch_serial()) - { - tx->state = STATE_SERIAL; - if (disp == dispatch_serialirr()) - tx->state |= STATE_IRREVOCABLE; - serial_lock.write_lock (); - } - else - serial_lock.read_lock (tx); - set_abi_disp (disp); } diff --git a/libitm/retry.cc b/libitm/retry.cc index d57bba0..08c5d80 100644 --- a/libitm/retry.cc +++ b/libitm/retry.cc @@ -27,8 +27,9 @@ #include <ctype.h> #include "libitm_i.h" -// The default TM method used when starting a new transaction. 
-static GTM::abi_dispatch* default_dispatch = 0; +// The default TM method used when starting a new transaction. Initialized +// in number_of_threads_changed() below. +static std::atomic<GTM::abi_dispatch*> default_dispatch; // The default TM method as requested by the user, if any. static GTM::abi_dispatch* default_dispatch_user = 0; @@ -57,14 +58,18 @@ GTM::gtm_thread::decide_retry_strategy (gtm_restart_reason r) // given that re-inits should be very infrequent. serial_lock.read_unlock(this); serial_lock.write_lock(); - if (disp->get_method_group() == default_dispatch->get_method_group()) + if (disp->get_method_group() + == default_dispatch.load(memory_order_relaxed) + ->get_method_group()) // Still the same method group. disp->get_method_group()->reinit(); serial_lock.write_unlock(); serial_lock.read_lock(this); - if (disp->get_method_group() != default_dispatch->get_method_group()) + if (disp->get_method_group() + != default_dispatch.load(memory_order_relaxed) + ->get_method_group()) { - disp = default_dispatch; + disp = default_dispatch.load(memory_order_relaxed); set_abi_disp(disp); } } @@ -124,48 +129,81 @@ GTM::gtm_thread::decide_retry_strategy (gtm_restart_reason r) // Decides which TM method should be used on the first attempt to run this -// transaction. +// transaction. Acquires the serial lock and sets transaction state +// according to the chosen TM method. GTM::abi_dispatch* GTM::gtm_thread::decide_begin_dispatch (uint32_t prop) { + abi_dispatch* dd; // TODO Pay more attention to prop flags (eg, *omitted) when selecting // dispatch. + // ??? We go irrevocable eagerly here, which is not always good for + // performance. Don't do this? if ((prop & pr_doesGoIrrevocable) || !(prop & pr_instrumentedCode)) -return dispatch_serialirr(); +dd = dispatch_serialirr(); - // If we might need closed nesting and the default dispatch has an - // alternative that supports closed nesting, use it. - // ??? 
We could choose another TM method that we know supports closed - nesting but isn't the default (e.g., dispatch_serial()). However, we - assume that aborts that need closed nesting are infrequent, so don't - choose a non-default method until we have to actually restart the - transaction. - if (!(prop & pr_hasNoAbort) && !default_dispatch->closed_nesting() - && default_dispatch->closed_nesting_alternative()) -return default_dispatch->closed_nesting_alternative(); + else +{ + // Load the default dispatch. We're not an active transaction and so it + // can change concurrently but will
Re: Ping: Fix MIPS va_arg regression
On 02/02/2012 11:01 AM, Richard Sandiford wrote: Ping for: http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01564.html which fixes a MIPS va_arg regression (admittedly a long-standing one) on zero-sized types. There are no functional changes to other targets and I'm as confident as I can be that it's safe for MIPS. Ok. r~
Re: [PATCH 4.8 v2, i386]: Make CCZ mode compatible with CCGOC and CCGO modes
On Tue, Feb 14, 2012 at 12:00 AM, Richard Henderson r...@redhat.com wrote: On 02/11/2012 12:56 AM, Uros Bizjak wrote: FWIW, the mode of flags in users doesn't matter at all on x86, but which way is correct? As far as I know, it doesn't matter anywhere. We don't even bother to have perfect harmony between integer modes in hard registers -- think about what happens when we drop all the subregs on the floor post-reload. Yes, it's probably an error if we don't have compatible modes between def and use, but nothing is going to check for that. cse.c says some relaxing words related to this issue: /* If the following assertion was triggered, there is most probably something wrong with the cc_modes_compatible back end function. CC modes only can be considered compatible if the insn - with the mode replaced by any of the compatible modes - can still be recognized. */ It looks to me that correct definition of cc_modes_compatible guarantees that insn is still valid, no matter if the mode of flags remains in the wrong mode. In any case, I will add the comment to avoid confusion. Thanks, Uros.
Re: [patch] libitm: Fix race condition in dispatch choice at transaction begin.
On 02/13/2012 03:03 PM, Torvald Riegel wrote: -// The default TM method used when starting a new transaction. -static GTM::abi_dispatch* default_dispatch = 0; +// The default TM method used when starting a new transaction. Initialized +// in number_of_threads_changed() below. +static std::atomic<GTM::abi_dispatch*> default_dispatch; I see nothing but memory_order_relaxed uses of default_dispatch? r~
[PATCH] Fix cselib -fcompare-debug problem (PR bootstrap/52172)
Hi! To avoid -fcompare-debug failures, we promote_debug_loc VALUEs looked up from DEBUG_INSNs when they are looked from some other insns. Unfortunately, the scheduler after cselib_lookup_from_insn from DEBUG_INSN calls cselib_subst_to_values, which may e.g. cselib_lookup_mem (with create=0). As that is with cselib_current_insn == NULL, promote_debug_loc considers it being a non-DEBUG_INSN lookup and promotes it to non-debug, which on the testcase in the PR (too large and too hard to further reduce) results in different n_useless_values and remove_useless_values being triggered at different times between -g and -g0 compilations. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, tested on the testcase with ia64-linux cross. Ok for trunk? 2012-02-13 Jakub Jelinek ja...@redhat.com PR bootstrap/52172 * cselib.h (cselib_subst_to_values_from_insn): New prototype. * cselib.c (cselib_subst_to_values_from_insn): New function. * sched-deps.c (add_insn_mem_dependence, sched_analyze_1, sched_analyze_2): Use it. --- gcc/cselib.h.jj 2012-01-01 19:54:46.0 +0100 +++ gcc/cselib.h2012-02-13 21:29:21.792483236 +0100 @@ -88,6 +88,7 @@ extern rtx cselib_expand_value_rtx_cb (r extern bool cselib_dummy_expand_value_rtx_cb (rtx, bitmap, int, cselib_expand_callback, void *); extern rtx cselib_subst_to_values (rtx, enum machine_mode); +extern rtx cselib_subst_to_values_from_insn (rtx, enum machine_mode, rtx); extern void cselib_invalidate_rtx (rtx); extern void cselib_reset_table (unsigned int); --- gcc/cselib.c.jj 2012-02-13 18:15:17.0 +0100 +++ gcc/cselib.c2012-02-13 21:33:37.019088486 +0100 @@ -1905,6 +1905,19 @@ cselib_subst_to_values (rtx x, enum mach return copy; } +/* Wrapper for cselib_subst_to_values, that indicates X is in INSN. 
*/ + +rtx +cselib_subst_to_values_from_insn (rtx x, enum machine_mode memmode, rtx insn) +{ + rtx ret; + gcc_assert (!cselib_current_insn); + cselib_current_insn = insn; + ret = cselib_subst_to_values (x, memmode); + cselib_current_insn = NULL; + return ret; +} + /* Look up the rtl expression X in our tables and return the value it has. If CREATE is zero, we return NULL if we don't know the value. Otherwise, we create a new one if possible, using mode MODE if X --- gcc/sched-deps.c.jj 2012-01-26 09:22:21.0 +0100 +++ gcc/sched-deps.c2012-02-13 21:30:40.235054596 +0100 @@ -1728,7 +1728,8 @@ add_insn_mem_dependence (struct deps_des if (sched_deps_info-use_cselib) { mem = shallow_copy_rtx (mem); - XEXP (mem, 0) = cselib_subst_to_values (XEXP (mem, 0), GET_MODE (mem)); + XEXP (mem, 0) = cselib_subst_to_values_from_insn (XEXP (mem, 0), + GET_MODE (mem), insn); } link = alloc_EXPR_LIST (VOIDmode, canon_rtx (mem), *mem_list); *mem_list = link; @@ -2449,7 +2450,9 @@ sched_analyze_1 (struct deps_desc *deps, t = shallow_copy_rtx (dest); cselib_lookup_from_insn (XEXP (t, 0), address_mode, 1, GET_MODE (t), insn); - XEXP (t, 0) = cselib_subst_to_values (XEXP (t, 0), GET_MODE (t)); + XEXP (t, 0) + = cselib_subst_to_values_from_insn (XEXP (t, 0), GET_MODE (t), + insn); } t = canon_rtx (t); @@ -2609,7 +2612,9 @@ sched_analyze_2 (struct deps_desc *deps, t = shallow_copy_rtx (t); cselib_lookup_from_insn (XEXP (t, 0), address_mode, 1, GET_MODE (t), insn); - XEXP (t, 0) = cselib_subst_to_values (XEXP (t, 0), GET_MODE (t)); + XEXP (t, 0) + = cselib_subst_to_values_from_insn (XEXP (t, 0), GET_MODE (t), + insn); } if (!DEBUG_INSN_P (insn)) Jakub
Re: [PATCH] Fix cselib -fcompare-debug problem (PR bootstrap/52172)
On 02/13/2012 03:17 PM, Jakub Jelinek wrote: 2012-02-13 Jakub Jelinek ja...@redhat.com PR bootstrap/52172 * cselib.h (cselib_subst_to_values_from_insn): New prototype. * cselib.c (cselib_subst_to_values_from_insn): New function. * sched-deps.c (add_insn_mem_dependence, sched_analyze_1, sched_analyze_2): Use it. Ok. r~
Re: [PING] New port resubmission for TILEPro and TILE-Gx
On Feb 13, 2012, at 1:43 PM, Walter Lee wrote: Thanks for the review. Do I have permission to commit, Yes, you do. Richard can approve this, and when he says, Ok., you're good to go. or is there anything else I need to do? Nope. (Assuming you have write after approval to the tree.)
PR middle-end/52214
Hi, this patch fixes a typo I introduced in my patch fixing infinite recursion of predict_paths_for_bb. While converting the check from aux pointers to bitmaps, I got bitmap_set_bit wrong. Bootstrapped/regtested x86_64-linux, committed. PR middle-end/52214 * predict.c (predict_paths_for_bb): Fix thinko in previous patch. Index: predict.c === *** predict.c (revision 184179) --- predict.c (working copy) *** predict_paths_for_bb (basic_block cur, b *** 1869,1875 prevent visiting given BB twice. */ if (found) predict_edge_def (e, pred, taken); ! else if (!bitmap_set_bit (visited, e->src->index)) predict_paths_for_bb (e->src, e->src, pred, taken, visited); } for (son = first_dom_son (CDI_POST_DOMINATORS, cur); --- 1869,1875 prevent visiting given BB twice. */ if (found) predict_edge_def (e, pred, taken); ! else if (bitmap_set_bit (visited, e->src->index)) predict_paths_for_bb (e->src, e->src, pred, taken, visited); } for (son = first_dom_son (CDI_POST_DOMINATORS, cur);
Re: [RFC, 4.8] Magic matching for flags clobbering and setting
On Sat, Feb 11, 2012 at 1:12 AM, Richard Henderson r...@redhat.com wrote: Seeing as how Uros is starting to go down the path of cleaning up the flags handling for x86, I thought I'd go ahead and knock up the idea that I've been tossing around to help automate the process of building patterns that match both clobbering the flags and setting the flags to a comparison. This is far from complete, but it at least shows the direction. What I know is missing off the top of my head are: (0) Documentation in some .texi file; atm there's only what's in rtl.def. (1) Generate (clobber (reg flags)) from genemit, should this construct be used in a named insn pattern. (2) Can't be usefully used with define_insn_and_split, and no way to tell. This problem should simply be documented in the .texi file as user error. (3) Can't be used for x86 add patterns, as the clobber version wants the freedom to use lea and the set flags version cannot. And there are different sets of constraints if lea may be used or not. What would be nice, however, is exposing the targetm.cc_modes_compatible thing in such a way that the x86 add patterns could use that, for the separate insn that does do the set flags. Exposing the targetm.cc_modes_compatible thing separately might also clean up some of the evil magic in genrecog.c too. Comments? To see if I understand what a cc0 port conversion would look like with match_flags, I tried to apply this magic to convert one of the pet ports, the mighty pdp11. The pdp11 port doesn't have any define_insn_and_splits, so I didn't run into the problem you mentioned in (2). What is the problem here? I suppose it has to do with finding out what the flags setter is after the split? If so, then couldn't that be resolved with some rules about how the post-split patterns should be constructed? 
Other than that: To convert a port, there is still a lot of work to be done to define and handle the various CC modes properly (well, not for the pdp11, because it writes out 1 insn for most define_insns), but it is great not having to define all the pairs of clobber-flags and set-flags insns. At least, I didn't end up rewriting the complete .md file. It was relatively easy. Less book-keeping involved, etc. Hope this goes in for GCC 4.8. Ciao! Steven
Re: [PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
On Mon, Feb 13, 2012 at 11:57 PM, Richard Henderson r...@redhat.com wrote: On 02/13/2012 02:54 PM, Uros Bizjak wrote: - movq 56(%rsi), %rdx movl %edi, %eax cfi_def_cfa(%rcx, 0) - cfi_register(%rip, %rdx) movq %rcx, %rsp - jmp *%rdx + jmp *56(%rsi) If you're going to do that, the correct fix for the unwind info is - cfi_register(%rip, %rdx) + cfi_offset(%rip, 56) Hm, we just defined the new CFA as rcx+0, so we should define the location of rip relative to the new CFA. Since the CFA points to the stack slot just before the return address was pushed, the new rip lies at CFA-8 for 64-bit and CFA-4 for x86_32, respectively. Did I get these .cfi directives correctly? SYM(GTM_longjmp): cfi_startproc #ifdef __x86_64__ movq (%rsi), %rcx movq 8(%rsi), %rbx movq 16(%rsi), %rbp movq 24(%rsi), %r12 movq 32(%rsi), %r13 movq 40(%rsi), %r14 movq 48(%rsi), %r15 movl %edi, %eax cfi_def_cfa(%rcx, 0) cfi_offset(%rip, -8) movq %rcx, %rsp jmp *56(%rsi) #else movl (%edx), %ecx movl 4(%edx), %ebx movl 8(%edx), %esi movl 12(%edx), %edi movl 16(%edx), %ebp cfi_def_cfa(%ecx, 0) cfi_offset(%eip, -4) movl %ecx, %esp jmp *20(%edx) #endif cfi_endproc Uros.
Re: [PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
On 02/13/2012 04:09 PM, Uros Bizjak wrote: On Mon, Feb 13, 2012 at 11:57 PM, Richard Henderson r...@redhat.com wrote: On 02/13/2012 02:54 PM, Uros Bizjak wrote: - movq 56(%rsi), %rdx movl %edi, %eax cfi_def_cfa(%rcx, 0) - cfi_register(%rip, %rdx) movq %rcx, %rsp - jmp *%rdx + jmp *56(%rsi) If you're going to do that, the correct fix for the unwind info is - cfi_register(%rip, %rdx) + cfi_offset(%rip, 56) Hm, we just defined the new CFA as rcx+0, so we should define the location of rip relative to the new CFA. Since the CFA points to the stack slot just before the return address was pushed, the new rip lies at CFA-8 for 64-bit and CFA-4 for x86_32, respectively. Did I get these .cfi directives correctly? No. The value at %rcx-8 is total garbage. There's no guarantee that the call stack leading to this abort has anything in common with the call stack that created the jmpbuf, except *above* %rcx, the new CFA. The new rip is at rsi+56. You can see that in that you jump to it. r~
Re: [PATCH, go]: Disable TestListenMulticastUDP on alpha linux
Uros Bizjak ubiz...@gmail.com writes: alpha linux does not have the expected /proc/net/igmp and /proc/net/igmp6 files, so func interfaceMulticastAddrTable(ifindex int) from interface_linux.go always returns (nil, nil), failing the net test with: --- FAIL: net.TestListenMulticastUDP (4.71 seconds) ???:1: IPv4 multicast interface: nil ???:1: IPv4 multicast TTL: 1 ???:1: IPv4 multicast loopback: false ???:1: 224.0.0.254:12345 not found in RIB FAIL The attached patch skips this sub-test in the same way as for the ARM arch. Tested on alphaev6-pc-linux-gnu, where it fixes the failing net test. Thanks. Committed. Ian
libgo patch committed: Reload m and g if necessary
PR 50654 points out that many Go tests fail on systems that use emutls. This turns out to be a subtle issue involving the use of setcontext and getcontext. When a particular invocation is moved to run on a different thread via getcontext and setcontext, it must reload the thread-local variables m and g. This happens naturally, because the function call makes gcc think that they might have changed (as indeed they might have). However, gcc knows that the address of the thread-local variables can not change. Or at least it thinks it does; if setcontext causes the function to start running on a different thread, then the address actually does change. This means that gcc may cache the address on the stack in some cases where it must not. The same issue arises for ordinary TLS, of course, and I have already fixed most cases. However, I missed one case. That case was working for ordinary TLS because the function refers to both m and g, and gcc compiles the code such that it holds a pointer to the thread-specific area and references m and g off that pointer. This happens to work even if the function starts running on a different thread. However, it does not work when using emutls, for which gcc uses a different compilation strategy. This patch fixes the problem. Bootstrapped and ran the Go testsuite on x86_64-unknown-linux-gnu, with both regular TLS and emutls. Committed to mainline. Ian diff -r 5b77b481d6f9 libgo/runtime/proc.c --- a/libgo/runtime/proc.c Mon Feb 13 16:29:13 2012 -0800 +++ b/libgo/runtime/proc.c Mon Feb 13 16:30:31 2012 -0800 @@ -309,6 +309,8 @@ static void runtime_mcall(void (*pfn)(G*)) { + M *mp; + G *gp; #ifndef USING_SPLIT_STACK int i; #endif @@ -317,28 +319,45 @@ // collector.
__builtin_unwind_init(); - if(g == m->g0) + mp = m; + gp = g; + if(gp == mp->g0) runtime_throw("runtime: mcall called on m->g0 stack"); - if(g != nil) { + if(gp != nil) { #ifdef USING_SPLIT_STACK __splitstack_getcontext(&g->stack_context[0]); #else - g->gcnext_sp = &i; + gp->gcnext_sp = &i; #endif - g->fromgogo = false; - getcontext(&g->context); + gp->fromgogo = false; + getcontext(&gp->context); + + // When we return from getcontext, we may be running + // in a new thread. That means that m and g may have + // changed. They are global variables so we will + // reload them, but the addresses of m and g may be + // cached in our local stack frame, and those + // addresses may be wrong. Call functions to reload + // the values for this thread. + mp = runtime_m(); + gp = runtime_g(); } - if (g == nil || !g->fromgogo) { + if (gp == nil || !gp->fromgogo) { #ifdef USING_SPLIT_STACK - __splitstack_setcontext(&m->g0->stack_context[0]); + __splitstack_setcontext(&mp->g0->stack_context[0]); #endif - m->g0->entry = (byte*)pfn; - m->g0->param = g; - g = m->g0; - fixcontext(&m->g0->context); - setcontext(&m->g0->context); + mp->g0->entry = (byte*)pfn; + mp->g0->param = gp; + + // It's OK to set g directly here because this case + // can not occur if we got here via a setcontext to + // the getcontext call just above. + g = mp->g0; + + fixcontext(&mp->g0->context); + setcontext(&mp->g0->context); runtime_throw("runtime: mcall function returned"); } }
Re: [C/C++ PATCH] Fix merge_decls/duplicate_decls DECL_USER_ALIGN/DECL_ALIGN handling (PR c/52181)
OK. Jason
Re: [C++ Patch] PR 51494 (and 52183)
This patch fixes this particular bug, but there are some issues. First, non_static_member_function_p only checks the first function in the overload set, which may not be representative of all of them. It really shouldn't look through OVERLOADs, we need to defer this decision until build_over_call. Second, the uses of maybe_dummy_object in build_offset_ref, finish_qualified_id_expr and finish_id_expression could also be dealing with static member functions. The underlying problem here is that we're only supposed to capture a variable/this when it is odr-used, which we can't know until we finish overload resolution. Jason
Re: [libitm] Add SPARC bits
From: Eric Botcazou ebotca...@adacore.com Date: Sun, 12 Feb 2012 21:15:26 +0100 + load [%o1 + OFFSET (JB_CFA)], %fp + cfi_def_cfa(%fp, 0) +#if STACK_BIAS + sub %fp, STACK_BIAS, %fp + cfi_def_cfa_offset(STACK_BIAS) +#endif I think you really need to put the proper value into the %fp register atomically here. If an interrupt comes in before you STACK_BIAS-adjust the %fp, a debugger or similar could see a corrupt frame pointer.
Re: [libitm] Add SPARC bits
From: Eric Botcazou ebotca...@adacore.com Date: Sun, 12 Feb 2012 21:15:26 +0100 +static inline void +cpu_relax (void) +{ + __asm volatile ("" : : : "memory"); +} We probably want to do some nop'ish thing here which will yield the cpu thread on Niagara cpus, I'd recommend something along the lines of "rd %ccr, %g0" or "rd %y, %g0"
Re: [libitm] Link with -litm and -pthread
On Sat, 11 Feb 2012, Eric Botcazou wrote: Hi, this completes the half-implemented linking scheme of libitm and makes it mimic that of libgomp entirely. We need the -pthread thing on Solaris 8. It broke all targets that don't implement threads and as such don't support -pthread. And you need to gate *all* tm-related tests on something like check_effective_target_pthread. I see regress-155 for cris-elf. Can't you just limit adding -pthread to Solaris 8 or something? brgds, H-P
[google/integration] Add support for powerpc64-grtev2-linux-gnu (issue5659050)
Hi, This patch adds support for powerpc*-grtev2-linux-gnu. The changes include: 1. Relocating the dynamic linker using a run-time root prefix. 2. Using a different library setting in static linking. This is tested by building PowerPC64 and PowerPC toolchains and running some tests with the resulting toolchain. This is used by Google and is not meant to be sent to trunk. -Doug 2012-02-13 Doug Kwan dougk...@google.com * gcc/config.gcc (powerpc*-*-linux): Pull in GRTEv2 spec changes if target matches *-grtev2-*. * gcc/config/rs6000/linux64.h (GLIBC_DYNAMIC_LINKER{32,64}): Add runtime root prefix to glibc's dynamic linker. * gcc/config/rs6000/linux-grtev2.h: New file. * gcc/config/rs6000/sysv4.h (GLIBC_DYNAMIC_LINKER): Add runtime root prefix to glibc's dynamic linker. (LINUX_GRTE_EXTRA_SPECS): Define to be empty if no definition found. (SUBTARGET_EXTRA_SPECS): Include LINUX_GRTE_EXTRA_SPECS. Index: gcc/config.gcc === --- gcc/config.gcc (revision 184150) +++ gcc/config.gcc (working copy) @@ -2040,6 +2040,12 @@ powerpc-*-linux* | powerpc64-*-linux*) if test x${enable_secureplt} = xyes; then tm_file=rs6000/secureplt.h ${tm_file} fi + # Pull in spec changes for GRTEv2 configurations.
+ case ${target} in + *-grtev2-*) + tm_file=${tm_file} rs6000/linux-grtev2.h + ;; + esac ;; powerpc-wrs-vxworks|powerpc-wrs-vxworksae) tm_file=${tm_file} elfos.h freebsd-spec.h rs6000/sysv4.h Index: gcc/config/rs6000/linux64.h === --- gcc/config/rs6000/linux64.h (revision 184150) +++ gcc/config/rs6000/linux64.h (working copy) @@ -367,8 +367,8 @@ extern int dot_symbols; #undef LINK_OS_DEFAULT_SPEC #define LINK_OS_DEFAULT_SPEC %(link_os_linux) -#define GLIBC_DYNAMIC_LINKER32 /lib/ld.so.1 -#define GLIBC_DYNAMIC_LINKER64 /lib64/ld64.so.1 +#define GLIBC_DYNAMIC_LINKER32 RUNTIME_ROOT_PREFIX /lib/ld.so.1 +#define GLIBC_DYNAMIC_LINKER64 RUNTIME_ROOT_PREFIX /lib64/ld64.so.1 #define UCLIBC_DYNAMIC_LINKER32 /lib/ld-uClibc.so.0 #define UCLIBC_DYNAMIC_LINKER64 /lib/ld64-uClibc.so.0 #if DEFAULT_LIBC == LIBC_UCLIBC Index: gcc/config/rs6000/linux-grtev2.h === --- gcc/config/rs6000/linux-grtev2.h(revision 0) +++ gcc/config/rs6000/linux-grtev2.h(revision 0) @@ -0,0 +1,43 @@ +/* Definitions for Linux-based GRTE (Google RunTime Environment) version 2. + Copyright (C) 2009,2010,2011,2012 Free Software Foundation, Inc. + Contributed by Chris Demetriou and Ollie Wild. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. 
+ +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + +/* Overrides LIB_LINUX_SPEC from sysv4.h. */ +#undef LIB_LINUX_SPEC +#define LIB_LINUX_SPEC \ + %{pthread:-lpthread} \ + %{shared:-lc} \ + %{!shared:%{mieee-fp:-lieee} %{profile:%(libc_p)}%{!profile:%(libc)}} + +/* When GRTE links statically, it needs its NSS and resolver libraries + linked in as well. Note that when linking statically, these are + enclosed in a group by LINK_GCC_C_SEQUENCE_SPEC. */ +#undef LINUX_GRTE_EXTRA_SPECS +#define LINUX_GRTE_EXTRA_SPECS \ + { libc, %{static:%(libc_static);:-lc} }, \ + { libc_p, %{static:%(libc_p_static);:-lc_p} }, \ + { libc_static, \ +-lc -lnss_borg -lnss_cache -lnss_dns -lnss_files -lresolv }, \ + { libc_p_static, \ +-lc_p -lnss_borg_p -lnss_cache_p -lnss_dns_p -lnss_files_p -lresolv_p }, Index: gcc/config/rs6000/sysv4.h === --- gcc/config/rs6000/sysv4.h (revision 184150) +++ gcc/config/rs6000/sysv4.h (working copy) @@ -803,7 +803,10 @@ extern int fixuplabelno; #define LINK_START_LINUX_SPEC -#define GLIBC_DYNAMIC_LINKER /lib/ld.so.1 +#ifndef RUNTIME_ROOT_PREFIX +#define RUNTIME_ROOT_PREFIX +#endif +#define GLIBC_DYNAMIC_LINKER RUNTIME_ROOT_PREFIX /lib/ld.so.1 #define UCLIBC_DYNAMIC_LINKER /lib/ld-uClibc.so.0 #if DEFAULT_LIBC == LIBC_UCLIBC #define
Re: [google/integration] Add support for powerpc64-grtev2-linux-gnu (issue5659050)
On Mon, Feb 13, 2012 at 6:41 PM, Doug Kwan dougk...@google.com wrote: [...] + { libc_static, \ +-lc -lnss_borg -lnss_cache -lnss_dns -lnss_files -lresolv }, \ + { libc_p_static, \ +-lc_p -lnss_borg_p -lnss_cache_p -lnss_dns_p -lnss_files_p -lresolv_p }, Really can't you fix glibc so that libnss.a is not needed. See http://sourceware.org/bugzilla/show_bug.cgi?id=6528 for those fixes. Thanks, Andrew Pinski
Re: [google/integration] Add support for powerpc64-grtev2-linux-gnu (issue5659050)
Thanks Andrew. I will take a look at that. -Doug On Mon, Feb 13, 2012 at 6:45 PM, Andrew Pinski pins...@gmail.com wrote: [...] Really can't you fix glibc so that libnss.a is not needed. See http://sourceware.org/bugzilla/show_bug.cgi?id=6528 for those fixes. Thanks, Andrew Pinski
Re: [v3] libstdc++/51798
The patch uses the weak version of compare_exchange universally, which is incorrect in a number of cases. You wouldn't see this on x86_64; you'd have to use an ll/sc target such as powerpc. In addition to changing several uses to strong compare_exchange, I also optimize the idiom do { var = *m; newval = ...; } while (!atomic_compare_exchange(m, &var, newval, ...)); With the new builtins, VAR is updated with the current value of the memory (regardless of the weak setting), so the initial read from *M can be hoisted outside the loop. nice! Ok? cool, thanks for reviewing this. I fixed up the line numbers for the header file edits. -benjamin 2012-02-13 Benjamin Kosnik b...@redhat.com * testsuite/20_util/shared_ptr/cons/43820_neg.cc: Adjust line numbers. * testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc: Same. diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/43820_neg.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/43820_neg.cc index 0d51663..39f9ce3 100644 --- a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/43820_neg.cc +++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/43820_neg.cc @@ -32,9 +32,9 @@ void test01() { X* px = 0; std::shared_ptr<X> p1(px); // { dg-error "here" } - // { dg-error "incomplete" { target *-*-* } 773 } + // { dg-error "incomplete" { target *-*-* } 771 } std::shared_ptr<X> p9(ap()); // { dg-error "here" } - // { dg-error "incomplete" { target *-*-* } 867 } + // { dg-error "incomplete" { target *-*-* } 865 } } diff --git a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc index ae902dc..0309f8f 100644 --- a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc +++ b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc @@ -1,6 +1,6 @@ // { dg-do compile } -// Copyright (C) 2010 Free Software Foundation +// Copyright (C) 2010, 2012 Free Software Foundation // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -30,9 +30,9 @@ void test01() { X* px = 0; std::tr1::shared_ptr<X> p1(px); // { dg-error "here" } - // { dg-error "incomplete" { target *-*-* } 565 } + // { dg-error "incomplete" { target *-*-* } 563 } std::tr1::shared_ptr<X> p9(ap()); // { dg-error "here" } - // { dg-error "incomplete" { target *-*-* } 604 } + // { dg-error "incomplete" { target *-*-* } 602 } }
RE: [PATCH ARM] backport r174803 from trunk to 4.6 branch
-Original Message- From: Richard Earnshaw Sent: Monday, February 13, 2012 7:37 PM To: Bin Cheng Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH ARM] backport r174803 from trunk to 4.6 branch On 08/02/12 08:29, Bin Cheng wrote: Hi, Julian Brown once posted a patch fixing an ARM EABI violation, which I think is also essential for the 4.6 branch. I created a patch against the 4.6 branch, as attached. Is it OK to back port? You can refer to the following link for the original patch. http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00260.html Thanks gcc/ChangeLog: 2012-02-08 Bin Cheng bin.ch...@arm.com Backport from mainline 2011-06-08 Julian Brown jul...@codesourcery.com * config/arm/arm.c (arm_libcall_uses_aapcs_base): Use correct ABI for double-precision helper functions in hard-float mode if only single-precision arithmetic is supported in hardware. OK. Can you also back-port it to 4.5 as well, please. Hi, Thanks for approving; I will back-port this and r183734 (from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51835) to the 4.5 branch. Thanks.
RE: [Ping] RE: CR16 Port addition
Hello Gerald, Thank you for this suggestion. I have not worked on these changes before, so could you please review the attached patch and let me know if any changes are required? Thanks and Regards, Jayant Sonar [KPIT Cummins, Pune] Attachment: cr16-htdocs2.diff
MAINTAINERS: add myself
Committed. 2012-02-14 Walter Lee w...@tilera.com * MAINTAINERS (Write After Approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 184193) +++ MAINTAINERS (working copy) @@ -428,6 +428,7 @@ Asher Langton langt...@llnl.gov Chris Lattner sa...@nondot.org Terry Laurenzo tlaure...@gmail.com Georg-Johann Lay a...@gjlay.de +Walter Lee w...@tilera.com Marc Lehmann p...@goof.com James Lemkejwle...@codesourcery.com Kriang Lerdsuwanakij lerds...@users.sourceforge.net
[PATCH] Prefer reg as first operand in commutative operator
Hi, This patch was submitted as part of PR 52235. It increases the preference of a register for the first operand of a commutative operator. 2012-02-13 Paulo Matos paulo.ma...@csr.com * gcc/rtlanal.c: Increase preference of a register for the first operand in a commutative operator. --- gcc46/gcc/rtlanal.c (gcc 4.6.2) +++ gcc46/gcc/rtlanal.c (working copy) @@ -3047,11 +3047,11 @@ /* Constants always come the second operand. Prefer nice constants. */ if (code == CONST_INT) +return -9; + if (code == CONST_DOUBLE) return -8; - if (code == CONST_DOUBLE) -return -7; if (code == CONST_FIXED) -return -7; +return -8; op = avoid_constant_pool_reference (op); code = GET_CODE (op); @@ -3059,26 +3059,28 @@ { case RTX_CONST_OBJ: if (code == CONST_INT) +return -7; + if (code == CONST_DOUBLE) return -6; - if (code == CONST_DOUBLE) -return -5; if (code == CONST_FIXED) -return -5; - return -4; +return -6; + return -5; case RTX_EXTRA: /* SUBREGs of objects should come second. */ if (code == SUBREG && OBJECT_P (SUBREG_REG (op))) -return -3; +return -4; return 0; case RTX_OBJ: /* Complex expressions should be the first, so decrease priority of objects. Prefer pointer objects over non pointer objects. */ - if ((REG_P (op) && REG_POINTER (op)) - || (MEM_P (op) && MEM_POINTER (op))) - return -1; - return -2; + if (REG_P (op)) + return -1; + else if ((REG_P (op) && REG_POINTER (op)) + || (MEM_P (op) && MEM_POINTER (op))) + return -2; + return -3; case RTX_COMM_ARITH: /* Prefer operands that are themselves commutative to be first. -- PMatos
Re: [PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
On Tue, Feb 14, 2012 at 1:15 AM, Richard Henderson r...@redhat.com wrote: [...] No. The value at %rcx-8 is total garbage. There's no guarantee that the call stack leading to this abort has anything in common with the call stack that created the jmpbuf, except *above* %rcx, the new CFA. The new rip is at rsi+56. You can see that in that you jump to it. Thanks for the explanation, I will commit the patch with your suggested change. Uros.
Re: [PATCH, libitm]: GTM_longjmp: Jump indirect from memory address
On Tue, Feb 14, 2012 at 8:39 AM, Uros Bizjak ubiz...@gmail.com wrote: [...] The new rip is at rsi+56. You can see that in that you jump to it. Thanks for the explanation, I will commit the patch with your suggested change. Now with the patch attached... (please also note that rip is now defined by its offset relative to the old CFA, before the CFA is updated to the new register). Uros. Index: ChangeLog === --- ChangeLog (revision 184197) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2012-02-15 Uros Bizjak ubiz...@gmail.com + + * config/x86/target.h (GTM_longjmp): Jump indirect from memory address. + 2012-02-13 Eric Botcazou ebotca...@adacore.com * configure.tgt (target_cpu): Handle sparc and sparc64 sparcv9. Index: config/x86/sjlj.S === --- config/x86/sjlj.S (revision 184150) +++ config/x86/sjlj.S (working copy) @@ -119,23 +119,21 @@ movq 32(%rsi), %r13 movq 40(%rsi), %r14 movq 48(%rsi), %r15 - movq 56(%rsi), %rdx movl %edi, %eax + cfi_offset(%rip, 56) cfi_def_cfa(%rcx, 0) - cfi_register(%rip, %rdx) movq %rcx, %rsp - jmp *%rdx + jmp *56(%rsi) #else movl (%edx), %ecx movl 4(%edx), %ebx movl 8(%edx), %esi movl 12(%edx), %edi movl 16(%edx), %ebp - movl 20(%edx), %edx + cfi_offset(%eip, 20) cfi_def_cfa(%ecx, 0) - cfi_register(%eip, %edx) movl %ecx, %esp - jmp *%edx + jmp *20(%edx) #endif cfi_endproc