[Patch ARM] Fix off by one error in neon_evpc_vrev.
Hi, There is an off by one error in neon_evpc_vrev which means it rarely gets triggerred. The problem is that if you are looking at d-perm [i +j] and you increment i by just diff you end up starting looking where you looked at the end of the last place where you checked. Given this I think this patch should make it to trunk and to the 4.7 branch after appropriate testing and if the release managers don't object to such a change. The point is that this helps generate proper vect_reverse style instructions for arm_neon. There is scope for a follow up patch that fixes up the intrinsics essentially identical to all the functions in the test that I've added (except for a couple of missing v2sf and v4sf type operations which should boil down to the same implementation) . It's a permute so the type really doesn't matter. However it requires some ML magic with permutations for the masks which will prove to be good fun on a plane trip. Testing currently in progress and if there are no regressions I intend to commit this to trunk and (after a while I would like to backport this to the 4.7 branch if there are no objections from the release managers) RichardH , since you did the original patch would you mind having a quick look to see if I haven't missed anything obvious. * config/arm/arm.c (arm_evpc_neon_vrev): Fix off by one error. * gcc.target/arm/neon-vrev.c: New. regards, Ramana 1. For one example from the testcase you can see the effects before and after for this : (left is new compiler and right is old compiler) vrev16q2_u8:vrev16q2_u8: @ args = 0, pretend = 0, frame = 0 @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated.@ link register save eliminated. vmovd16, r0, r1 @ v16qi | vmov d20, r0, r1 @ v16qi vmovd17, r2, r3 | vmov d21, r2, r3 vrev16.8q8, q8| vldr d16, .L41 vmovr0, r1, d16 @ v16qi | vldr d17, .L41+8 vmovr2, r3, d17 | vtbl.8 d18, {d20, d21}, d16 bx lr| vtbl.8 d19, {d20, d21}, d17 vmov r0, r1, d18 @ v16qi vmov r2, r3, d19 bx lr .L42: .align 3 .L41: .byte 1 .byte 0 .byte 3 .byte 2 .byte 5 .byte 4 .byte 7 .byte 6 .byte 9 .byte 8 .byte 11 .byte 10 .byte 13 .byte 12 .byte 15 .byte 14 .size vrev16q2_u8, .-vrev16q2_u8 .size vrev16q2_u8, .-vrev16q2_u8 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 321e6b5..bcad0b9 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25668,10 +25668,18 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d) return false; } - for (i = 0; i nelt; i += diff) + for (i = 0; i nelt ; i += (diff + 1)) for (j = 0; j = diff; j += 1) - if (d-perm[i + j] != i + diff - j) - return false; + { + /* This is guaranteed to be true as the value of diff +
[Ada] Small tweak to type derivation machinery
To have the name of the types of the variant part and the fields therein be unique instead of mere duplicates of those of the base type, which makes it easier to debug type merging issues in LTO mode. Tested on i586-suse-linux, applied on the mainline, 4.7 and 4.6 branches. 2012-05-26 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (variant_desc): Rename 'record' to 'new_type'. (build_variant_list): Adjust to above renaming. (gnat_to_gnu_entity) E_Record_Subtype: Likewise. Give a unique name to the type of the variant containers. (create_variant_part_from): Likewise. Give a unique name to the type of the variant part. -- Eric Botcazou Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 187833) +++ gcc-interface/decl.c (working copy) @@ -119,8 +119,8 @@ typedef struct variant_desc_d { /* The value of the qualifier. */ tree qual; - /* The record associated with this variant. */ - tree record; + /* The type of the variant after transformation. */ + tree new_type; } variant_desc; DEF_VEC_O(variant_desc); @@ -3318,11 +3318,16 @@ gnat_to_gnu_entity (Entity_Id gnat_entit { tree old_variant = v-type; tree new_variant = make_node (RECORD_TYPE); + tree suffix + = concat_name (DECL_NAME (gnu_variant_part), + IDENTIFIER_POINTER + (DECL_NAME (v-field))); TYPE_NAME (new_variant) - = DECL_NAME (TYPE_NAME (old_variant)); + = concat_name (TYPE_NAME (gnu_type), + IDENTIFIER_POINTER (suffix)); copy_and_substitute_in_size (new_variant, old_variant, gnu_subst_list); - v-record = new_variant; + v-new_type = new_variant; } } else @@ -3426,7 +3431,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit if (selected_variant) gnu_cont_type = gnu_type; else - gnu_cont_type = v-record; + gnu_cont_type = v-new_type; } else /* The front-end may pass us ghost components if @@ -7562,7 +7567,7 @@ build_variant_list (tree qual_union_type v-type = variant_type; v-field = gnu_field; v-qual = qual; - v-record = NULL_TREE; + v-new_type = NULL_TREE; /* Recurse on the variant subpart of the variant, if any. */ variant_subpart = get_variant_part (variant_type); @@ -8238,7 +8243,9 @@ create_variant_part_from (tree old_varia /* First create the type of the variant part from that of the old one. */ new_union_type = make_node (QUAL_UNION_TYPE); - TYPE_NAME (new_union_type) = DECL_NAME (TYPE_NAME (old_union_type)); + TYPE_NAME (new_union_type) += concat_name (TYPE_NAME (record_type), + IDENTIFIER_POINTER (DECL_NAME (old_variant_part))); /* If the position of the variant part is constant, subtract it from the size of the type of the parent to get the new size. This manual CSE @@ -8272,7 +8279,7 @@ create_variant_part_from (tree old_varia continue; /* Retrieve the list of fields already added to the new variant. */ - new_variant = v-record; + new_variant = v-new_type; field_list = TYPE_FIELDS (new_variant); /* If the old variant had a variant subpart, we need to create a new
Fix gnat.dg/renaming5.adb regression
There is one more goto in the .optimized dump because the latch of a loop is now preserved. Tested on i586-suse-linux, applied on the mainline. 2012-05-26 Eric Botcazou ebotca...@adacore.com * gnat.dg/renaming5.adb: Adjust dg-final directive. -- Eric Botcazou Index: gnat.dg/renaming5.adb === --- gnat.dg/renaming5.adb (revision 187833) +++ gnat.dg/renaming5.adb (working copy) @@ -26,5 +26,5 @@ package body Renaming5 is end Renaming5; --- { dg-final { scan-tree-dump-times goto 2 optimized } } +-- { dg-final { scan-tree-dump-times goto 3 optimized } } -- { dg-final { cleanup-tree-dump optimized } }
[Ada] Fix gnat.dg/return3.adb regression
The problem is that the new call to cleanup_cfg in gimple_expand_cfg has short-circuited the machinery that emits nops to carry goto locus at -O0. The machinery works in CFGLAYOUT mode, but here we're still in CFGRTL. The attached patch makes it so that forwarder blocks are not deleted by try_optimize_cfg if CLEANUP_NO_INSN_DEL and partially extends the above machinery to the CFGRTL mode. Bootstrapped/regtested on x86_64-suse-linux, applied on the mainline. 2012-05-26 Eric Botcazou ebotca...@adacore.com * cfgcleanup.c (try_optimize_cfg): Do not delete forwarder blocks if CLEANUP_NO_INSN_DEL. * cfgrtl.c (unique_locus_on_edge_between_p): New function extracted from cfg_layout_merge_blocks. (emit_nop_for_unique_locus_between): New function. (rtl_merge_blocks): Invoke emit_nop_for_unique_locus_between. (cfg_layout_merge_blocks): Likewise. -- Eric Botcazou Index: cfgcleanup.c === --- cfgcleanup.c (revision 187833) +++ cfgcleanup.c (working copy) @@ -2644,7 +2644,7 @@ try_optimize_cfg (int mode) } /* If we fall through an empty block, we can remove it. */ - if (!(mode CLEANUP_CFGLAYOUT) + if (!(mode (CLEANUP_CFGLAYOUT | CLEANUP_NO_INSN_DEL)) single_pred_p (b) (single_pred_edge (b)-flags EDGE_FALLTHRU) !LABEL_P (BB_HEAD (b)) Index: cfgrtl.c === --- cfgrtl.c (revision 187833) +++ cfgrtl.c (working copy) @@ -603,6 +603,56 @@ rtl_split_block (basic_block bb, void *i return new_bb; } +/* Return true if the single edge between blocks A and B is the only place + in RTL which holds some unique locus. */ + +static bool +unique_locus_on_edge_between_p (basic_block a, basic_block b) +{ + const int goto_locus = EDGE_SUCC (a, 0)-goto_locus; + rtx insn, end; + + if (!goto_locus) +return false; + + /* First scan block A backward. */ + insn = BB_END (a); + end = PREV_INSN (BB_HEAD (a)); + while (insn != end (!NONDEBUG_INSN_P (insn) || INSN_LOCATOR (insn) == 0)) +insn = PREV_INSN (insn); + + if (insn != end locator_eq (INSN_LOCATOR (insn), goto_locus)) +return false; + + /* Then scan block B forward. */ + insn = BB_HEAD (b); + if (insn) +{ + end = NEXT_INSN (BB_END (b)); + while (insn != end !NONDEBUG_INSN_P (insn)) + insn = NEXT_INSN (insn); + + if (insn != end INSN_LOCATOR (insn) != 0 + locator_eq (INSN_LOCATOR (insn), goto_locus)) + return false; +} + + return true; +} + +/* If the single edge between blocks A and B is the only place in RTL which + holds some unique locus, emit a nop with that locus between the blocks. */ + +static void +emit_nop_for_unique_locus_between (basic_block a, basic_block b) +{ + if (!unique_locus_on_edge_between_p (a, b)) +return; + + BB_END (a) = emit_insn_after_noloc (gen_nop (), BB_END (a), a); + INSN_LOCATOR (BB_END (a)) = EDGE_SUCC (a, 0)-goto_locus; +} + /* Blocks A and B are to be merged into a single block A. The insns are already contiguous. */ @@ -681,15 +731,25 @@ rtl_merge_blocks (basic_block a, basic_b /* Delete everything marked above as well as crap that might be hanging out between the two blocks. */ - BB_HEAD (b) = NULL; + BB_END (a) = a_end; + BB_HEAD (b) = b_empty ? NULL_RTX : b_head; delete_insn_chain (del_first, del_last, true); + /* When not optimizing CFG and the edge is the only place in RTL which holds + some unique locus, emit a nop with that locus in between. */ + if (!optimize) +{ + emit_nop_for_unique_locus_between (a, b); + a_end = BB_END (a); +} + /* Reassociate the insns of B with A. */ if (!b_empty) { update_bb_for_insn_chain (a_end, b_debug_end, a); - a_end = b_debug_end; + BB_END (a) = b_debug_end; + BB_HEAD (b) = NULL_RTX; } else if (b_end != b_debug_end) { @@ -701,11 +761,10 @@ rtl_merge_blocks (basic_block a, basic_b reorder_insns_nobb (NEXT_INSN (a_end), PREV_INSN (b_debug_start), b_debug_end); update_bb_for_insn_chain (b_debug_start, b_debug_end, a); - a_end = b_debug_end; + BB_END (a) = b_debug_end; } df_bb_delete (b-index); - BB_END (a) = a_end; /* If B was a forwarder block, propagate the locus on the edge. */ if (forwarder_p !EDGE_SUCC (b, 0)-goto_locus) @@ -2853,33 +2912,10 @@ cfg_layout_merge_blocks (basic_block a, try_redirect_by_replacing_jump (EDGE_SUCC (a, 0), b, true); gcc_assert (!JUMP_P (BB_END (a))); - /* When not optimizing and the edge is the only place in RTL which holds + /* When not optimizing CFG and the edge is the only place in RTL which holds some unique locus, emit a nop with that locus in between. */ - if (!optimize EDGE_SUCC (a, 0)-goto_locus) -{ - rtx insn = BB_END (a), end = PREV_INSN (BB_HEAD (a)); - int goto_locus = EDGE_SUCC (a,
Re: divide 64-bit by constant for 32-bit target machines
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 2cecf45..9d6983b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed) *total = COSTS_N_INSNS (2); else if (TARGET_HARD_FLOAT mode == DFmode !TARGET_VFP_SINGLE) *total = COSTS_N_INSNS (4); + else if (mode == DImode) +*total = COSTS_N_INSNS (50); else *total = COSTS_N_INSNS (20); return false; diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index d48a465..b5627c2 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED, *total = COSTS_N_INSNS (mips_idiv_insns ()); } else if (mode == DImode) -*total = mips_cost-int_div_di; +{ + if (!TARGET_64BIT) +/* Divide double integer library call is expensive. */ +*total = COSTS_N_INSNS (200); + else +*total = mips_cost-int_div_di; +} else *total = mips_cost-int_div_si; return false; diff --git a/gcc/expmed.c b/gcc/expmed.c index aa24fbf..5f4c921 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1, } } + if (unsignedp (!optimize_size (optimize1)) + (size - 1 BITS_PER_WORD + BITS_PER_WORD == 32 GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD) These references to 32-bits are still wrong (and unnecessary, just remove them). + (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode] + + shift_cost[speed][mode][31] max_cost)) +{ + unsigned HOST_WIDE_INT d; + rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result; + + d = (INTVAL (op1) GET_MODE_MASK (mode)); This could be a CONST_DOUBLE. But you don't need d, because you can... + /* Extracting the higher part of the 64-bit multiplier. */ + x1 = gen_highpart (word_mode, op0); + x1 = force_reg (word_mode, x1); + + /* Extracting the lower part of the 64-bit multiplier. */ + x0 = gen_lowpart (word_mode, op0); + x0 = force_reg (word_mode, x0); + + /* Splitting the 64-bit constant for the higher and the lower parts. */ + y0 = gen_int_mode(d UINT32_MAX, word_mode); + y1 = gen_int_mode(d 32, word_mode); ... use gen_lowpart and gen_highpart directly on op1. + + z2 = gen_reg_rtx (mode); + u0 = gen_reg_rtx (mode); + + /* Unsigned multiplication of the higher multiplier part + and the higher constant part. */ + z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab); + /* Unsigned multiplication of the lower multiplier part + and the higher constant part. */ + u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab); + + z0 = gen_reg_rtx (mode); + u1 = gen_reg_rtx (mode); + + /* Unsigned multiplication of the lower multiplier part + and the lower constant part. */ + z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab); + + /* Unsigned multiplication of the higher multiplier part + the lower constant part. */ + u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab); Up to here the comments are not necessary. + /* Getting the higher part of multiplication between the lower multiplier + part and the lower constant part, the lower part is not interesting + for the final result. */ + u0tmp = gen_highpart (word_mode, z0); + u0tmp = force_reg (word_mode, u0tmp); + u0tmp = convert_to_mode (mode, u0tmp, 1); + + /* Adding the higher part of multiplication between the lower multiplier + part and the lower constant part to the result of multiplication between + the lower multiplier part and the higher constant part. Please note, + that we couldn't get overflow here since in the worst case + (0x*0x)+0x we get 0xL. */ The command can simply be compute the middle word of the three-word intermediate result. Also it's not overflow, it's carry. + expand_inc (u0, u0tmp); + tmp = gen_reg_rtx (mode); + + /* Adding multiplication between the lower multiplier part and the higher + constant part with the higher part of multiplication between the lower + multiplier part and the lower constant part to the result of multiplication + between the higher multiplier part and the lower constant part. */ Here you have to explain: /* We have to return z2 + ((u0 + u1) GET_MODE_BITSIZE (word_mode)). u0 + u1 are the upper two words of the three-word intermediate result and they could
Re: divide 64-bit by constant for 32-bit target machines
Il 25/05/2012 12:20, Dinar Temirbulatov ha scritto: + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1); + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1); + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN); + if (!result) + return 0; Ah, you don't need the or. u0 tmp is already giving the overflow. Paolo
Re: divide 64-bit by constant for 32-bit target machines
Il 26/05/2012 14:35, Paolo Bonzini ha scritto: /* We have to return z2 + ((u0 + u1) GET_MODE_BITSIZE (word_mode)). u0 + u1 are the upper two words of the three-word intermediate result and they could have up to 2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision. We compute the extra bit by checking for carry, and add 1 GET_MODE_BITSIZE (word_mode) to z2 if there is carry. */ Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD. + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN); + if (!tmp) + return 0; /* We have to return z2 + (tmp 32). We need + /* Checking for overflow. */ This is not overflow, it's carry (see above). + c = gen_reg_rtx (mode); + c1 = gen_reg_rtx (mode); + cres = gen_reg_rtx (mode); + + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1); + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1); + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN); + if (!result) + return 0; + + ccst = gen_reg_rtx (mode); + ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1); This 32 should be GET_MODE_BITSIZE (word_mode). Here, too. Paolo
[C++ Patch] PR 25137
Hi, I found the time to return to this issue, where -Wmissing-braces is often overeager to warn, thus annoying, for example, people using -Wall together with std::array: std::arrayint, 3 s = { 1, 2, 3 }; I handle the issue following the letter of the suggestion given by Ian at the time: do not warn when the class type has only one field and that field is an aggregate (recursively, of course). Indeed, that seems to me quite conservative. I also make sure to change nothing wrt the designated initializers extension. Bootstrapped and tested x86_64-linux. Thanks, Paolo. /cp 2012-05-26 Paolo Carlini paolo.carl...@oracle.com PR c++/25137 * decl.c (reshape_init_r): Add bool parameter. (reshape_init_class): If the struct has only one field and that field is an aggregate, don't warn if there is only one set of braces in the initializer. (reshape_init_array_1, reshape_init_class, reshape_init): Adjust. /testsuite 2012-05-26 Paolo Carlini paolo.carl...@oracle.com PR c++/25137 * g++.dg/warn/Wbraces3.C: New. Index: testsuite/g++.dg/warn/Wbraces3.C === --- testsuite/g++.dg/warn/Wbraces3.C(revision 0) +++ testsuite/g++.dg/warn/Wbraces3.C(revision 0) @@ -0,0 +1,34 @@ +// PR c++/25137 +// { dg-options -Wmissing-braces } + +struct S { int s[3]; }; +S s1 = { 1, 1, 1 }; + +struct S1 { int s[3]; }; +struct S2 { struct S1 a; }; +S2 s21 = { 1, 1, 1 }; + +struct S3 { int s[3]; }; +struct S4 { struct S3 a; int b; }; +S4 s41 = { 1, 1, 1, 1 }; // { dg-warning missing braces around initializer for 'S3' } + +struct S5 { int s[3]; }; +struct S6 { struct S5 a; int b; }; +S6 s61 = { { 1, 1, 1 }, 1 }; + +struct S7 { int s[3]; }; +struct S8 { int a; struct S7 b; }; +S8 s81 = { 1, { 1, 1, 1 } }; + +struct S9 { int s[2]; }; +struct S10 { struct S9 a; struct S9 b; }; +S10 s101 = { { 1, 1 }, 1, 1 }; // { dg-warning missing braces around initializer for 'S9' } + +struct S11 { int s[2]; }; +struct S12 { struct S11 a; struct S11 b; }; +S12 s121 = { { 1, 1 }, { 1, 1 } }; + +struct S13 { int i; }; +struct S14 { struct S13 a; }; +struct S15 { struct S14 b; }; +S15 s151 = { 1 }; Index: cp/decl.c === --- cp/decl.c (revision 187907) +++ cp/decl.c (working copy) @@ -4954,7 +4954,7 @@ typedef struct reshape_iterator_t constructor_elt *end; } reshape_iter; -static tree reshape_init_r (tree, reshape_iter *, bool, tsubst_flags_t); +static tree reshape_init_r (tree, reshape_iter *, bool, bool, tsubst_flags_t); /* FIELD is a FIELD_DECL or NULL. In the former case, the value returned is the next FIELD_DECL (possibly FIELD itself) that can be @@ -5014,7 +5014,7 @@ reshape_init_array_1 (tree elt_type, tree max_inde check_array_designated_initializer (d-cur, index); elt_init = reshape_init_r (elt_type, d, /*first_initializer_p=*/false, -complain); +/*no_warn_missing_braces=*/false, complain); if (elt_init == error_mark_node) return error_mark_node; CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (new_init), @@ -5082,6 +5082,7 @@ reshape_init_class (tree type, reshape_iter *d, bo { tree field; tree new_init; + bool no_warn_missing_braces; gcc_assert (CLASS_TYPE_P (type)); @@ -5105,6 +5106,12 @@ reshape_init_class (tree type, reshape_iter *d, bo return new_init; } + /* If the struct has only one field and that field is an aggregate, + don't warn if there is only one set of braces in the initializer. */ + no_warn_missing_braces += (next_initializable_field (DECL_CHAIN (field)) == NULL_TREE +CP_AGGREGATE_TYPE_P (TREE_TYPE (field))); + /* Loop through the initializable fields, gathering initializers. */ while (d-cur != d-end) { @@ -5125,7 +5132,10 @@ reshape_init_class (tree type, reshape_iter *d, bo /* We already reshaped this. */ gcc_assert (d-cur-index == field); else - field = lookup_field_1 (type, d-cur-index, /*want_type=*/false); + { + field = lookup_field_1 (type, d-cur-index, /*want_type=*/false); + no_warn_missing_braces = false; + } if (!field || TREE_CODE (field) != FIELD_DECL) { @@ -5141,7 +5151,8 @@ reshape_init_class (tree type, reshape_iter *d, bo break; field_init = reshape_init_r (TREE_TYPE (field), d, - /*first_initializer_p=*/false, complain); + /*first_initializer_p=*/false, + no_warn_missing_braces, complain); if (field_init == error_mark_node) return error_mark_node; @@ -5183,11 +5194,12 @@ has_designator_problem (reshape_iter *d, tsubst_fl a CONSTRUCTOR). TYPE is the type of
[ARM Patch 1/n] PR53447: optimizations of 64bit ALU operation with constant
Hi, As described in PR53447, many 64bit ALU operations with constant can be optimized to use corresponding 32bit instructions with immediate operands. This is the first part of the patches that deals with 64bit add. It directly extends the patterns adddi3, arm_adddi3 and adddi3_neon to handle constant operands. Tested on arm qemu without regression. OK for trunk? thanks Carrot 2012-05-26 Wei Guozhi car...@google.com PR target/53447 * gcc.target/arm/pr53447-1.c: New testcase. 2012-05-26 Wei Guozhi car...@google.com PR target/53447 * config/arm/arm-protos.h (const_ok_for_adddi): New prototype. * config/arm/arm.c (const_ok_for_adddi): New function. * config/arm/constraints.md (Dd): New constraint. * config/arm/arm.md (adddi3): Extend it to handle constants. (arm_adddi3): Likewise. * config/arm/neon.md (adddi3_neon): Likewise. Index: testsuite/gcc.target/arm/pr53447-1.c === --- testsuite/gcc.target/arm/pr53447-1.c(revision 0) +++ testsuite/gcc.target/arm/pr53447-1.c(revision 0) @@ -0,0 +1,8 @@ +/* { dg-options -O2 } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-final { scan-assembler-not mov } } */ + +void t0p(long long * p) +{ + *p += 0x10001; +} Index: config/arm/arm.c === --- config/arm/arm.c(revision 187751) +++ config/arm/arm.c(working copy) @@ -2497,6 +2497,17 @@ } } +/* Return TRUE if int I is a valid immediate constant used by pattern + arm_adddi3. */ +int +const_ok_for_adddi (HOST_WIDE_INT i) +{ + HOST_WIDE_INT high = (i 32) 0x; + HOST_WIDE_INT low = i 0x; + return (const_ok_for_arm (high) + (const_ok_for_arm (low) || const_ok_for_arm (-low))); +} + /* Emit a sequence of insns to handle a large constant. CODE is the code of the operation required, it can be any of SET, PLUS, IOR, AND, XOR, MINUS; Index: config/arm/arm-protos.h === --- config/arm/arm-protos.h (revision 187751) +++ config/arm/arm-protos.h (working copy) @@ -47,6 +47,7 @@ extern bool arm_small_register_classes_for_mode_p (enum machine_mode); extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode); extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode); +extern int const_ok_for_adddi (HOST_WIDE_INT); extern int const_ok_for_arm (HOST_WIDE_INT); extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code); extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx, Index: config/arm/neon.md === --- config/arm/neon.md (revision 187751) +++ config/arm/neon.md (working copy) @@ -588,9 +588,9 @@ ) (define_insn adddi3_neon - [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w) -(plus:DI (match_operand:DI 1 s_register_operand %w,0,0,w) - (match_operand:DI 2 s_register_operand w,r,0,w))) + [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w,?r,?r,?r) +(plus:DI (match_operand:DI 1 s_register_operand %w,0,0,w,r,0,r) + (match_operand:DI 2 reg_or_int_operand w,r,0,w,r,Dd,Dd))) (clobber (reg:CC CC_REGNUM))] TARGET_NEON { @@ -600,13 +600,16 @@ case 3: return vadd.i64\t%P0, %P1, %P2; case 1: return #; case 2: return #; +case 4: return #; +case 5: return #; +case 6: return #; default: gcc_unreachable (); } } - [(set_attr neon_type neon_int_1,*,*,neon_int_1) - (set_attr conds *,clob,clob,*) - (set_attr length *,8,8,*) - (set_attr arch nota8,*,*,onlya8)] + [(set_attr neon_type neon_int_1,*,*,neon_int_1,*,*,*) + (set_attr conds *,clob,clob,*,clob,clob,clob) + (set_attr length *,8,8,*,*,*,*) + (set_attr arch nota8,*,*,onlya8,*,*,*)] ) (define_insn *submode3_neon Index: config/arm/constraints.md === --- config/arm/constraints.md (revision 187751) +++ config/arm/constraints.md (working copy) @@ -29,7 +29,7 @@ ;; in Thumb-1 state: I, J, K, L, M, N, O ;; The following multi-letter normal constraints have been used: -;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz +;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py @@ -251,6 +251,12 @@ (match_test TARGET_32BIT arm_const_double_inline_cost (op) == 4 !(optimize_size || arm_ld_sched +(define_constraint Dd + @internal + In ARM/Thumb-2 state a const_int that can be used by insn adddi. + (and (match_code const_int) + (match_test TARGET_32BIT const_ok_for_adddi (ival + (define_constraint Di @internal In ARM/Thumb-2 state a const_int or const_double where both the high
Re: [C++ Patch] PR 53491
I think I would rather fix stabilize_expr to handle void arguments properly: basically just stick the whole argument in *initp and return void_zero_node. Jason
Re: RFC: IRA patch to reduce lifetimes
On Mon, May 21, 2012 at 9:33 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Apr 11, 2012 at 7:35 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 12/23/2011 05:31 PM, Vladimir Makarov wrote: On 12/21/2011 09:09 AM, Bernd Schmidt wrote: This patch was an experiment to see if we can get the same improvement with modifications to IRA, making it more tolerant to over-aggressive scheduling. THe idea is that if an instruction sets a register A, and all its inputs are live and unmodified for the lifetime of A, then moving the instruction downwards towards its first use is going to be beneficial from a register pressure point of view. That alone, however, turns out to be too aggressive, performance drops presumably because we undo too many scheduling decisions. So, the patch detects such situations, and splits the pseudo; a new pseudo is introduced in the original setting instruction, and a copy is added before the first use. If the new pseudo does not get a hard register, it is removed again and instead the setting instruction is moved to the point of the copy. This gets up to 6.5% on 456.hmmer on the mips target I was working on; an embedded benchmark suite also seems to have a (small) geomean improvement. On x86_64, I've tested spec2k, where specint is unchanged and specfp has a tiny performance regression. All these tests were done with a gcc-4.6 based tree. Thoughts? Currently the patch feels somewhat bolted on to the side of IRA, maybe there's a nicer way to achieve this? I think that is an excellent idea. I used analogous approach for splitting pseudo in IRA on loop bounds even if it gets hard register inside and outside loops. The copies are removed if the live ranges were not spilled in reload. I have no problem with this patch. It is just a small change in IRA. Sounds like you're happier with the patch than I am, so who am I to argue. Here's an updated version against current trunk, with some cc0 bugfixes that I've since discovered to be necessary. Bootstrapped and tested (but not benchmarked again) on i686-linux. Ok? Bernd This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53411 This also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53495 -- H.J.
Re: [C++ Patch] PR 53491
On 05/26/2012 04:21 PM, Jason Merrill wrote: I think I would rather fix stabilize_expr to handle void arguments properly: basically just stick the whole argument in *initp and return void_zero_node. Ok. Like this it works, if I understand your suggestion. Thanks, Paolo. /cp 2012-05-26 Paolo Carlini paolo.carl...@oracle.com PR c++/53491 * tree.c (stabilize_expr): Handle exp of void type. /testsuite 2012-05-26 Paolo Carlini paolo.carl...@oracle.com PR c++/53491 * g++.dg/parse/crash60.C: New. Index: testsuite/g++.dg/parse/crash60.C === --- testsuite/g++.dg/parse/crash60.C(revision 0) +++ testsuite/g++.dg/parse/crash60.C(revision 0) @@ -0,0 +1,14 @@ +// PR c++/53491 + +struct M +{ + void pop(); +}; + +void foo() +{ + int result = 0; + M m; + + result += m.pop(); // { dg-error invalid operands|in evaluation } +} Index: cp/tree.c === --- cp/tree.c (revision 187915) +++ cp/tree.c (working copy) @@ -3279,7 +3279,12 @@ stabilize_expr (tree exp, tree* initp) { tree init_expr; - if (!TREE_SIDE_EFFECTS (exp)) + if (VOID_TYPE_P (TREE_TYPE (exp))) +{ + *initp = exp; + return void_zero_node; +} + else if (!TREE_SIDE_EFFECTS (exp)) init_expr = NULL_TREE; /* There are no expressions with REFERENCE_TYPE, but there can be call arguments with such a type; just treat it as a pointer. */
Re: [C++ Patch] PR 53491
On 05/26/2012 11:31 AM, Paolo Carlini wrote: Ok. Like this it works, if I understand your suggestion. Yep, that's what I had in mind. But let's put it after the !TREE_SIDE_EFFECTS case. OK with that change. Jason
Re: RFC (c): PATCH for c++/53220 (array compound literals and C++)
On Sat, 26 May 2012, Jason Merrill wrote: In C++, C99 a compound literal creates a temporary object, unlike C, where it creates an automatic or static object. As a result, using an array compound literal to initialize a pointer variable leads to undefined behavior, as the array's lifetime ends after the declaration statement. This patch changes the C++ front end to reject decay of temporary array compound literals to pointers, and the C front end to warn about it with -Wc++-compat. Tested x86_64-pc-linux-gnu. Is the C front-end change OK? The C front-end change is OK. -- Joseph S. Myers jos...@codesourcery.com
[Ada] Fix some typos in comments
Hello, I happened to notice these typos but I don't have commit rights. Thanks to whoever might pick it up. -- Oliver 2012-05-26 Oliver Kellogg okell...@users.sourceforge.net * alloc.ads, exp_dbug.adb, gcc-interface/misc.c, lib.ads, lib-xref.adb, lib-xref.ads, sem_ch10.adb, sem_ch12.adb, sem_res.ads, table.ads: Fix typo. Index: lib.ads === --- lib.ads (revision 187915) +++ lib.ads (working copy) @@ -66,7 +66,7 @@ --(a) Corresponding spec for a body --(b) Parent spec of a child library spec - --(d) With'ed specs + --(c) With'ed specs --(d) Parent body of a subunit --(e) Subunits corresponding to any specified stubs --(f) Bodies of inlined subprograms that are called @@ -227,7 +227,7 @@ -- A subprogram body (in an adb file) may stand for both a spec and a body. -- A simple model (and one that was adopted through version 2.07) is simply -- to assume that such an adb file acts as its own spec if no ads file is - -- is present. + -- present. -- However, this is not correct. RM 10.1.4(4) requires that such a body -- act as a spec unless a subprogram declaration of the same name is Index: table.ads === --- table.ads (revision 187915) +++ table.ads (working copy) @@ -144,7 +144,7 @@ function Last return Table_Index_Type; pragma Inline (Last); - -- Returns the current value of the last used entry in the table, which + -- Returns the current index of the last used entry in the table, which -- can then be used as a subscript for Table. Note that the only way to -- modify Last is to call the Set_Last procedure. Last must always be -- used to determine the logically last entry. Index: sem_ch10.adb === --- sem_ch10.adb (revision 187915) +++ sem_ch10.adb (working copy) @@ -767,7 +767,7 @@ -- If the subprogram body is a child unit, we must create a -- declaration for it, in order to properly load the parent(s). --- After this, the original unit does not acts as a spec, because +-- After this, the original unit does not act as a spec, because -- there is an explicit one. If this unit appears in a context -- clause, then an implicit with on the parent will be added when -- installing the context. If this is the main unit, there is no Index: sem_ch12.adb === --- sem_ch12.adb (revision 187915) +++ sem_ch12.adb (working copy) @@ -6612,7 +6612,7 @@ -- If we are not instantiating, then this is where we load and -- analyze subunits, i.e. at the point where the stub occurs. A -- more permissive system might defer this analysis to the point - -- of instantiation, but this seems to complicated for now. + -- of instantiation, but this seems too complicated for now. if not Instantiating then declare Index: alloc.ads === --- alloc.ads (revision 187915) +++ alloc.ads (working copy) @@ -30,7 +30,7 @@ -- -- This package contains definitions for initial sizes and growth increments --- for the various dynamic arrays used for principle compiler data strcutures. +-- for the various dynamic arrays used for principal compiler data strcutures. -- The indicated initial size is allocated for the start of each file, and -- the increment factor is a percentage used to increase the table size when -- it needs expanding (e.g. a value of 100 = 100% increase = double) Index: gcc-interface/misc.c === --- gcc-interface/misc.c (revision 187915) +++ gcc-interface/misc.c (working copy) @@ -348,7 +348,7 @@ void gnat_init_gcc_eh (void) { - /* We shouldn't do anything if the No_Exceptions_Handler pragma is set, + /* We shouldn't do anything if the No_Exception_Handlers pragma is set, though. This could for instance lead to the emission of tables with references to symbols (such as the Ada eh personality routine) within libraries we won't link against. */ Index: sem_res.ads === --- sem_res.ads (revision 187915) +++ sem_res.ads (working copy) @@ -45,7 +45,7 @@ -- Since in practice a lot of semantic analysis has to be postponed until -- types are known (e.g. static folding, setting of suppress flags), the -- Resolve routines also complete the semantic analysis, and call the - -- expander for possibly expansion of the completely type resolved node. + -- expander for possibly
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote: On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote: On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 5:0 BTW, I noticed: [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ Why is __cpu_model in both libgcc.a and libgcc_s.o? How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is wrong but I cannot figure out the fix. Why don't you want it in libgcc_s.so? I thought libgcc.a is always linked in for static and dynamic builds. So having it in libgcc_s.so is redundant. [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_ 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_ 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 223: 2b60 560 FUNC LOCAL DEFAULT 11 __cpu_indicator_init 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so? Ok, I am elaborating so that I understand the issue clearly. The dynamic symbol table of libgcc_s.so: $ objdump -T libgcc_s.so | grep __cpu 00015fd0 gDO .bss 0010 GCC_4.8.0 __cpu_model It only has __cpu_model, not __cpu_indicator_init just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files. Do you see any other issues here? I dont get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected right? In libgcc.a: readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu 20: 001016 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNCGLOBAL HIDDEN4 __cpu_indicator_init libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking? Further thoughts: * It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here? Example: file:test.c int main () { return (int) __builtin_cpu_is (corei7); } Case I : Use gcc to build dynamic $ gcc test.c -Wl,-y,__cpu_model libgcc.a(cpuinfo.o): reference to __cpu_model libgcc_s.so: definition of __cpu_model Case II: Use g++ to build dynamic $ g++ test.c -Wl,-y,__cpu_model fe1.o: reference to __cpu_model libgcc_s.so: definition of __cpu_model Case III: Use gcc to link static $ gcc test.c -Wl,-y,__cpu_model -static fe1.o: reference to __cpu_model libgcc.a(cpuinfo.o): reference to __cpu_model Please note that in all 3 cases, libgcc.a was linked in. Hence, removing these symbols from the dynamic symbol table of libgcc_s.so should have no issues. Thanks, -Sri. -- H.J.
[C++ Patch] PR 25137 (no -Wmissing-braces in -Wall version)
... and this is the version of the patch which simply takes -Wmissing-braces out of -Wall in C++. Bootstrapped and tested all C-family languages x86-64-linux. Paolo. /// Index: doc/invoke.texi === --- doc/invoke.texi (revision 187916) +++ doc/invoke.texi (working copy) @@ -3090,7 +3090,7 @@ Options} and @ref{Objective-C and Objective-C++ Di -Wformat @gol -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)} @gol -Wmaybe-uninitialized @gol --Wmissing-braces @gol +-Wmissing-braces @r{(only for C/ObjC)} @gol -Wnonnull @gol -Wparentheses @gol -Wpointer-sign @gol @@ -3401,7 +3401,8 @@ or @option{-Wpedantic}. @opindex Wno-missing-braces Warn if an aggregate or union initializer is not fully bracketed. In the following example, the initializer for @samp{a} is not fully -bracketed, but that for @samp{b} is fully bracketed. +bracketed, but that for @samp{b} is fully bracketed. This warning is +enabled by @option{-Wall} in C. @smallexample int a[2][2] = @{ 0, 1, 2, 3 @}; Index: c-family/c-opts.c === --- c-family/c-opts.c (revision 187916) +++ c-family/c-opts.c (working copy) @@ -370,7 +370,6 @@ c_common_handle_option (size_t scode, const char * c_family_lang_mask, kind, loc, handlers, global_dc); warn_char_subscripts = value; - warn_missing_braces = value; warn_parentheses = value; warn_return_type = value; warn_sequence_point = value; /* Was C only. */ @@ -402,6 +401,8 @@ c_common_handle_option (size_t scode, const char * done in c_common_post_options. */ if (warn_enum_compare == -1) warn_enum_compare = value; + + warn_missing_braces = value; } else { Index: testsuite/g++.dg/warn/Wbraces4.C === --- testsuite/g++.dg/warn/Wbraces4.C(revision 0) +++ testsuite/g++.dg/warn/Wbraces4.C(revision 0) @@ -0,0 +1,34 @@ +// PR c++/25137 +// { dg-options -Wmissing-braces } + +struct S { int s[3]; }; +S s1 = { 1, 1, 1 }; // { dg-warning missing braces } + +struct S1 { int s[3]; }; +struct S2 { struct S1 a; }; +S2 s21 = { 1, 1, 1 }; // { dg-warning missing braces } + +struct S3 { int s[3]; }; +struct S4 { struct S3 a; int b; }; +S4 s41 = { 1, 1, 1, 1 };// { dg-warning missing braces } + +struct S5 { int s[3]; }; +struct S6 { struct S5 a; int b; }; +S6 s61 = { { 1, 1, 1 }, 1 };// { dg-warning missing braces } + +struct S7 { int s[3]; }; +struct S8 { int a; struct S7 b; }; +S8 s81 = { 1, { 1, 1, 1 } };// { dg-warning missing braces } + +struct S9 { int s[2]; }; +struct S10 { struct S9 a; struct S9 b; }; +S10 s101 = { { 1, 1 }, 1, 1 }; // { dg-warning missing braces } + +struct S11 { int s[2]; }; +struct S12 { struct S11 a; struct S11 b; }; +S12 s121 = { { 1, 1 }, { 1, 1 } }; // { dg-warning missing braces } + +struct S13 { int i; }; +struct S14 { struct S13 a; }; +struct S15 { struct S14 b; }; +S15 s151 = { 1 }; // { dg-warning missing braces } Index: testsuite/g++.dg/warn/Wbraces3.C === --- testsuite/g++.dg/warn/Wbraces3.C(revision 0) +++ testsuite/g++.dg/warn/Wbraces3.C(revision 0) @@ -0,0 +1,34 @@ +// PR c++/25137 +// { dg-options -Wall } + +struct S { int s[3]; }; +S s1 = { 1, 1, 1 }; + +struct S1 { int s[3]; }; +struct S2 { struct S1 a; }; +S2 s21 = { 1, 1, 1 }; + +struct S3 { int s[3]; }; +struct S4 { struct S3 a; int b; }; +S4 s41 = { 1, 1, 1, 1 }; + +struct S5 { int s[3]; }; +struct S6 { struct S5 a; int b; }; +S6 s61 = { { 1, 1, 1 }, 1 }; + +struct S7 { int s[3]; }; +struct S8 { int a; struct S7 b; }; +S8 s81 = { 1, { 1, 1, 1 } }; + +struct S9 { int s[2]; }; +struct S10 { struct S9 a; struct S9 b; }; +S10 s101 = { { 1, 1 }, 1, 1 }; + +struct S11 { int s[2]; }; +struct S12 { struct S11 a; struct S11 b; }; +S12 s121 = { { 1, 1 }, { 1, 1 } }; + +struct S13 { int i; }; +struct S14 { struct S13 a; }; +struct S15 { struct S14 b; }; +S15 s151 = { 1 };
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote: On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote: On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 5:0 BTW, I noticed: [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ Why is __cpu_model in both libgcc.a and libgcc_s.o? How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is wrong but I cannot figure out the fix. Why don't you want it in libgcc_s.so? I thought libgcc.a is always linked in for static and dynamic builds. So having it in libgcc_s.so is redundant. [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_ 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_ 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 223: 2b60 560 FUNC LOCAL DEFAULT 11 __cpu_indicator_init 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so? Ok, I am elaborating so that I understand the issue clearly. The dynamic symbol table of libgcc_s.so: $ objdump -T libgcc_s.so | grep __cpu 00015fd0 g DO .bss 0010 GCC_4.8.0 __cpu_model It only has __cpu_model, not __cpu_indicator_init just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files. That will be great. Do you see any other issues here? I dont get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected right? Duplication comes from static and dynamic symbol tables. In libgcc.a: readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking? Further thoughts: * It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here? You don't want one copy of those 2 symbols in each DSO where they are used. -- H.J.
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote: On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote: On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 5:0 BTW, I noticed: [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ Why is __cpu_model in both libgcc.a and libgcc_s.o? How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is wrong but I cannot figure out the fix. Why don't you want it in libgcc_s.so? I thought libgcc.a is always linked in for static and dynamic builds. So having it in libgcc_s.so is redundant. [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_ 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_ 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 223: 2b60 560 FUNC LOCAL DEFAULT 11 __cpu_indicator_init 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so? Ok, I am elaborating so that I understand the issue clearly. The dynamic symbol table of libgcc_s.so: $ objdump -T libgcc_s.so | grep __cpu 00015fd0 g DO .bss 0010 GCC_4.8.0 __cpu_model It only has __cpu_model, not __cpu_indicator_init just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files. That will be great. Do you see any other issues here? I dont get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected right? Duplication comes from static and dynamic symbol tables. In libgcc.a: readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking? Further thoughts: * It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here? You don't want one copy of those 2 symbols in each DSO where they are used. Right, I agree. But this problem exists right now even if libgcc_s.so is provided with these symbols. Please see example below: Example: dso.c --- int some_func () { return (int) __builtin_cpu_is (corei7); } Build with gcc driver: $ gcc dso.c -fPIC -shared -o dso.so $ nm dso.so | grep __cpu 0780 t __cpu_indicator_init 1e00 b __cpu_model This DSO is getting its own local copy of __cpu_model. This is fine functionally but this is not the behaviour you have in mind. whereas, if I build with g++ driver: $ g++ dso.c -fPIC -shared dso.so $ nm dso.so | grep __cpu U __cpu_model This is as we would like, __cpu_model is undefined. The difference is that with the gcc driver, the link line is -lgcc -lgcc_s, whereas with the g++ driver -lgcc is not even present! Should I fix the gcc driver instead? This double-standard is not clear to me. Thanks, -Sri. -- H.J.
RFA: temp slot TLC [1/3]
Hi, I still had some cleanups for age-old code lying around, and thought to bring it up to date. The whole dealing of slots for temporary stack space in function.c was never really updated to the way we're meanwhile expanding statements. It has facilities that aren't useful anymore. In the old days it worked like this: after a statement was expanded temporary slots that were allocated during that expansion would be freed. The problem was that statements could be arbitrarily complex, including calls as call arguments or statement expressions. So there had to be a way that some of these slots could be marked as to be kept until the next scope would be popped (the 'keep' flag to assign_temp), or at least that the slot might be used in the containing expression even after a free_temp_slots (the addr_taken flag for a slot). Meanwhile the situation is like so: 1) There are only three places that still allocate a 'kept' slot (i.e. keep==1 in a call to assign_temp, assign_stack_temp and assign_stack_temp_for_type): a) expand_expr_real_1, when expanding a reference from a constant object, which must be in memory but isn't; this is meanwhile useless, the so allocated temporary will not be freed until the full statement is done, at which point the expanded reference will either be consumed into the real objects, or will be ignored; it's not necessary to artificially mark it kept b) expand_asm_operands, for handling mem inputs on non-lvalues; this is dead code, we rewrite all such asms at gimple level already. c) in expand_decl; this routine I'm going to remove (with removal of the reference from Ada it's all dead code) 2) The single place that still sometimes marks temp slots as addr_taken is in expand_call. It does so when the callee returns an aggregate and no target is given (or no return slot optimization should happen). The temp slot will not go away until the next free_temp_slots at which point it must already be copied out into the real LHS of the (gimple) call statement or it'll be ignored, so marking it to be kept isn't necessary. expand_decl is dead code because it only does things for CONST_DECLs (for Ada, but I'm removing that use) and automatic vars. But automatic vars are (supposed to be) allocated since quite some time in cfgexpand. One call to expand_decl that still did something is for expansion of COMPOUND_LITERAL_EXPR, which we can have only in global ctors (usual problem of not gimplifying those). That's easy to deal with, though, as in the patch. The other expand_decl call that does something is for generating RTL for the nonlocal_goto_save_area in case it doesn't have RTL yet. The latter shouldn't (and after the patch doesn't) happen: the underlying variable is supposed to be in local_decls and hence should have gotten RTL during cfgexpand. This is only not true when it was unused, removed from local_decls, but the reference from cfun remains. In that case we'd expand (unused) stack space. Instead I simply remove the reference from cfun when the variable is removed. As the data flow regarding temporaries in the old RTL code is a bit complicated I reassured myself regarding the above points with various gcc_unreachable and commenting code. That is this first patch. The second and third patch contain the more or less obvious cleanup followups. I'm asking for approval for all three patches, but I have bootstrapped the first separately, so if approved I'm going to commit them as separate revisions so that in case I got something wrong with this touchy area it's easy to play with adjusting just parts of the patch, without always having to redo all the cleanup adjustments. So, regstrapped on x86_64-linux all languages (+Ada,+obj-c++), this patch alone and in connection with [2/3] and [3/3]. Okay for trunk (I have approval for the Ada part already). Ciao, Michael. --- * expr.c (expand_expr_real_1 normal_inner_ref): Don't allocate a kept temp. (expand_expr_real_1 COMPOUND_LITERAL_EXPR): Make unreachable. * gimple-fold.c (canonicalize_constructor_val): Canonicalize COMPOUND_LITERAL_EXPR. * function.c (expand_function_start): Don't call expand_decl, instead assert that we have RTL assigned. * tree-ssa-live.c (remove_unused_locals): Clear nonlocal_goto_save_area if its backing variable is removed. * stmt.c (expand_asm_operands): Remove handling of non-lvalues as mem inputs. (expand_decl): Assert that this does nothing. * calls.c (expand_call): Don't call mark_temp_addr_taken. * c-tree.h (c_expand_decl): Remove prototype. c-family/ * c-common.h (c_expand_decl): Remove prototype. ada/ * gcc-interface/utils.c (create_var_decl_1): Don't call expand_decl. Index: expr.c === ---
RFA: temp slot TLC [2/3]
Hi, and this is the large cleanup, removing dead code introduced by [1/3] for real, and removing the 'keep' parameter from assign_temp and friends, which now is always zero (in the compiler proper and in the backends). That latter is the largest churn. If [1/3] is approved this whole patch is large but obvious. Regstrapped together with [1/3] on x86_64-linux, all languages (+Ada,+obj-c++). Ciao, Michael. * rtl.h (assign_stack_temp, assign_stack_temp_for_type, assign_temp): Remove 'keep' argument. (mark_temp_addr_taken): Remove prototype. * tree.h (expand_decl): Remove prototype. * function.c (struct temp_slot): Remove addr_taken and keep member. (assign_stack_temp_for_type) Don't initialize above, remove keep argument. (assign_stack_temp, assign_temp): Remove keep argument. (mark_temp_addr_taken): Remove. (preserve_temp_slots): Remove handling of addr_taken and keep members. (free_temp_slots): Ditto. * expr.c (expand_expr_real_1 COMPOUND_LITERAL_EXPR): Remove handling. * stmt.c (expand_asm_operands): Remove handling of non-lvalues as mem inputs. (expand_decl): Remove. * c-decl.c (finish_struct): Don't call expand_decl. * builtins.c (expand_builtin_cexpi): Adjust calls to assign_temp and assign_stack_temp. * calls.c (save_fixed_argument_area, initialize_argument_information, expand_call, emit_library_call_value_1, store_one_arg): Ditto. * expmed.c (extract_bit_field_1): Ditto. * expr.c (emit_group_load_1, emit_group_store, copy_blkmode_from_reg, emit_push_insn, expand_assignment, store_field, expand_constructor, expand_cond_expr_using_cmove, expand_expr_real_2, expand_expr_real_1): Ditto. * stmt.c (expand_asm_operands, expand_return): Ditto. * config/arm/arm.c (neon_expand_vector_init): Ditto. * config/i386/i386.c (ix86_expand_vector_set): Ditto. (ix86_expand_vector_extract): Ditto. * config/ia64/ia64.c (spill_xfmode_rfmode_operand, ia64_expand_movxf_movrf): Ditto. * config/mips/mips.c (mips_expand_vi_general): Ditto. * config/mmix/mmix.md (floatdisf2, floatunsdisf2, truncdfsf2, extendsfdf2): Ditto. * config/rs6000/rs6000.c (rs6000_expand_vector_init, rs6000_expand_vector_set, rs6000_expand_vector_extract, rs6000_allocate_stack_temp): Ditto. * config/rs6000/rs6000.md (fix_trunctfsi2_fprs): Ditto. * config/sparc/sparc.c (emit_soft_tfmode_libcall, sparc_emit_float_lib_cmp, sparc_emit_float_lib_cmp, sparc_expand_vector_init): Ditto. Index: gcc/builtins.c === --- gcc.orig/builtins.c 2012-05-26 21:46:58.0 +0200 +++ gcc/builtins.c 2012-05-26 21:47:02.0 +0200 @@ -2650,8 +2650,8 @@ expand_builtin_cexpi (tree exp, rtx targ else gcc_unreachable (); - op1 = assign_temp (TREE_TYPE (arg), 0, 1, 1); - op2 = assign_temp (TREE_TYPE (arg), 0, 1, 1); + op1 = assign_temp (TREE_TYPE (arg), 1, 1); + op2 = assign_temp (TREE_TYPE (arg), 1, 1); op1a = copy_addr_to_reg (XEXP (op1, 0)); op2a = copy_addr_to_reg (XEXP (op2, 0)); top1 = make_tree (build_pointer_type (TREE_TYPE (arg)), op1a); Index: gcc/calls.c === --- gcc.orig/calls.c2012-05-26 21:47:02.0 +0200 +++ gcc/calls.c 2012-05-26 21:47:02.0 +0200 @@ -933,7 +933,7 @@ save_fixed_argument_area (int reg_parm_s set_mem_align (stack_area, PARM_BOUNDARY); if (save_mode == BLKmode) { - save_area = assign_stack_temp (BLKmode, num_to_save, 0); + save_area = assign_stack_temp (BLKmode, num_to_save); emit_block_move (validize_mem (save_area), stack_area, GEN_INT (num_to_save), BLOCK_OP_CALL_PARM); } @@ -1258,7 +1258,7 @@ initialize_argument_information (int num set_mem_attributes (copy, type, 1); } else - copy = assign_temp (type, 0, 1, 0); + copy = assign_temp (type, 1, 0); store_expr (args[i].tree_value, copy, 0, false); @@ -2404,7 +2404,7 @@ expand_call (tree exp, rtx target, int i /* For variable-sized objects, we must be called with a target specified. If we were to allocate space on the stack here, we would have no way of knowing when to free it. */ - rtx d = assign_temp (rettype, 0, 1, 1); + rtx d = assign_temp (rettype, 1, 1); structure_value_addr = XEXP (d, 0); target = 0; } @@ -3278,7 +3278,7 @@ expand_call (tree exp, rtx target, int i
RFA: temp slot TLC [3/3]
Hi, and this is a further small cleanup. pop_temp_slots is now the same as free_temp_slots (modulo the level-- of course), so there's no need in calling both. (and preserve_temp_slots(NULL) is useless now). Regstrapped on x86_64-linux together with [1/3] and [2/3], all languages (+Ada,+obj-c++). Okay for trunk? Ciao, Michael. -- * function.c (pop_temp_slots): Call free_temp_slots. * calls.c (store_one_arg): Don't call preserve_temp_slots or free_temp_slots. * expr.c (expand_assignment): Don't call free_temp_slots. Index: calls.c === --- calls.c.orig2012-05-26 21:47:02.0 +0200 +++ calls.c 2012-05-26 23:42:57.0 +0200 @@ -4721,11 +4721,7 @@ store_one_arg (struct arg_data *arg, rtx be deferred during the rest of the arguments. */ NO_DEFER_POP; - /* Free any temporary slots made in processing this argument. Show - that we might have taken the address of something and pushed that - as an operand. */ - preserve_temp_slots (NULL_RTX); - free_temp_slots (); + /* Free any temporary slots made in processing this argument. */ pop_temp_slots (); return sibcall_failure; Index: expr.c === --- expr.c.orig 2012-05-26 21:47:02.0 +0200 +++ expr.c 2012-05-26 23:43:23.0 +0200 @@ -4836,7 +4836,6 @@ expand_assignment (tree to, tree from, b if (result) preserve_temp_slots (result); - free_temp_slots (); pop_temp_slots (); return; } @@ -4884,7 +4883,6 @@ expand_assignment (tree to, tree from, b emit_move_insn (to_rtx, value); } preserve_temp_slots (to_rtx); - free_temp_slots (); pop_temp_slots (); return; } @@ -4911,7 +4909,6 @@ expand_assignment (tree to, tree from, b emit_move_insn (to_rtx, temp); preserve_temp_slots (to_rtx); - free_temp_slots (); pop_temp_slots (); return; } @@ -4941,7 +4938,6 @@ expand_assignment (tree to, tree from, b TYPE_MODE (sizetype)); preserve_temp_slots (to_rtx); - free_temp_slots (); pop_temp_slots (); return; } @@ -4951,7 +4947,6 @@ expand_assignment (tree to, tree from, b push_temp_slots (); result = store_expr (from, to_rtx, 0, nontemporal); preserve_temp_slots (result); - free_temp_slots (); pop_temp_slots (); return; } Index: function.c === --- function.c.orig 2012-05-26 23:41:39.0 +0200 +++ function.c 2012-05-26 23:42:27.0 +0200 @@ -1200,22 +1200,7 @@ push_temp_slots (void) void pop_temp_slots (void) { - struct temp_slot *p, *next; - bool some_available = false; - - for (p = *temp_slots_at_level (temp_slot_level); p; p = next) -{ - next = p-next; - make_slot_available (p); - some_available = true; -} - - if (some_available) -{ - remove_unused_temp_slot_addresses (); - combine_temp_slots (); -} - + free_temp_slots (); temp_slot_level--; }
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam tmsri...@google.com wrote: On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote: On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote: On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 5:0 BTW, I noticed: [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ Why is __cpu_model in both libgcc.a and libgcc_s.o? How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is wrong but I cannot figure out the fix. Why don't you want it in libgcc_s.so? I thought libgcc.a is always linked in for static and dynamic builds. So having it in libgcc_s.so is redundant. [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_ 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_ 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 223: 2b60 560 FUNC LOCAL DEFAULT 11 __cpu_indicator_init 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so? Ok, I am elaborating so that I understand the issue clearly. The dynamic symbol table of libgcc_s.so: $ objdump -T libgcc_s.so | grep __cpu 00015fd0 g DO .bss 0010 GCC_4.8.0 __cpu_model It only has __cpu_model, not __cpu_indicator_init just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files. That will be great. Do you see any other issues here? I dont get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected right? Duplication comes from static and dynamic symbol tables. In libgcc.a: readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking? Further thoughts: * It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here? You don't want one copy of those 2 symbols in each DSO where they are used. Right, I agree. But this problem exists right now even if libgcc_s.so is provided with these symbols. Please see example below: Example: dso.c --- int some_func () { return (int) __builtin_cpu_is (corei7); } Build with gcc driver: $ gcc dso.c -fPIC -shared -o dso.so $ nm dso.so | grep __cpu 0780 t __cpu_indicator_init 1e00 b __cpu_model This DSO is getting its own local copy of __cpu_model. This is fine functionally but this is not the behaviour you have in mind. whereas, if I build with g++ driver: $ g++ dso.c -fPIC -shared dso.so $ nm dso.so | grep __cpu U __cpu_model This is as we would like, __cpu_model is undefined. The difference is that with the gcc driver, the link line is -lgcc -lgcc_s, whereas with the g++ driver -lgcc is not even present! Should I fix the gcc driver instead? This double-standard is not clear to me. That is because libgcc_s.so is preferred by g++. We can do one of 3 things: 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_eh.a. 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_static.a. 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_ini from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to libgcc_eh.a. -- H.J.
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Sat, May 26, 2012 at 7:06 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam tmsri...@google.com wrote: On Sat, May 26, 2012 at 4:56 PM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 10:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam tmsri...@google.com wrote: On May 25, 2012 7:15 PM, H.J. Lu hjl.to...@gmail.com wrote: On May 25, 2012 6:54 PM, Sriraman Tallam tmsri...@google.com wrote: On Fri, May 25, 2012 at 5:0 BTW, I noticed: [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ Why is __cpu_model in both libgcc.a and libgcc_s.o? How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is wrong but I cannot figure out the fix. Why don't you want it in libgcc_s.so? I thought libgcc.a is always linked in for static and dynamic builds. So having it in libgcc_s.so is redundant. [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_ 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_ 82: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model@@GCC_4.8.0 223: 2b60 560 FUNC LOCAL DEFAULT 11 __cpu_indicator_init 310: 00214ff0 16 OBJECT GLOBAL DEFAULT 24 __cpu_model [hjl@gnu-6 pr14170]$ I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so? Ok, I am elaborating so that I understand the issue clearly. The dynamic symbol table of libgcc_s.so: $ objdump -T libgcc_s.so | grep __cpu 00015fd0 g DO .bss 0010 GCC_4.8.0 __cpu_model It only has __cpu_model, not __cpu_indicator_init just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files. That will be great. Do you see any other issues here? I dont get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected right? Duplication comes from static and dynamic symbol tables. In libgcc.a: readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu 20: 0010 16 OBJECT GLOBAL HIDDEN COM __cpu_model 21: 0110 612 FUNC GLOBAL HIDDEN 4 __cpu_indicator_init libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking? Further thoughts: * It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here? You don't want one copy of those 2 symbols in each DSO where they are used. Right, I agree. But this problem exists right now even if libgcc_s.so is provided with these symbols. Please see example below: Example: dso.c --- int some_func () { return (int) __builtin_cpu_is (corei7); } Build with gcc driver: $ gcc dso.c -fPIC -shared -o dso.so $ nm dso.so | grep __cpu 0780 t __cpu_indicator_init 1e00 b __cpu_model This DSO is getting its own local copy of __cpu_model. This is fine functionally but this is not the behaviour you have in mind. whereas, if I build with g++ driver: $ g++ dso.c -fPIC -shared dso.so $ nm dso.so | grep __cpu U __cpu_model This is as we would like, __cpu_model is undefined. The difference is that with the gcc driver, the link line is -lgcc -lgcc_s, whereas with the g++ driver -lgcc is not even present! Should I fix the gcc driver instead? This double-standard is not clear to me. That is because libgcc_s.so is preferred by g++. We can do one of 3 things: 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_eh.a. 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_static.a. 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_ini from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to libgcc_eh.a. Any reason why gcc should not be made to prefer libgcc_s.so too like g++? Thanks for clearing this up. I will
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On Sat, May 26, 2012 at 7:23 PM, Sriraman Tallam tmsri...@google.com wrote: That is because libgcc_s.so is preferred by g++. We can do one of 3 things: 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_eh.a. 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_static.a. 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_ini from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to libgcc_eh.a. Any reason why gcc should not be made to prefer libgcc_s.so too like g++? Thanks for clearing this up. I will take a stab at it. This is a long story. The short answer is people didn't want to add libgcc_s.so to DT_NEEDED for C programs. But it is no longer an issue since we now pass -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed to linker. -- H.J.
RFA: Speedup expand_used_vars by 30 times (PR38474)
Hi, [for certain test cases :-) ] the temp slot cleanups I just sent where actually motivated by PR38474. It exposes many slownesses in the compiler, but at -O0 the only remaining one is the expand phase. expanding variables is the one thing (on my machine here, with -O0 -g f951): 30 seconds for the reduced testcase. expand itself (the statements) then needs 5.5 seconds and is the other thing. Overall 77 seconds compile time. The problem with expanding variables in this case is the add_alias_set_conflicts routine. It walks all NxN/2 stack variable pairs, looks if pair (i,j) can share the same stack slot from a type perspective (canshare(i,j)) and adds a conflict if not so. Before asking that question it doesn't look if pair (i,j) maybe already conflict for other reasons. Now, we could simply add the conflicts(i,j) test in that loop, before asking canshare(i,j) and adding a new conflict. But adding conflicts comes with a cost, so we can do better. The answer to canshare(i,j) depends on only all components already merged into [i] and the type of j. With carefully choosing a type to represent all components of [i] and updating it in union_stack_vars we can simply ask canshare(i,j) right before we actually want to merge j into [i]. That's the least amount of work possible as long as we want to have the same results. This change brings the 30 seconds for var expansion down to 0.8 seconds. The other change in function.c further improves the temp slot machinery. While expanding free_temp_slots is called after each statement, and if it is able to free a slot it needs to update the RTX-slot mapping (a htab_t). The freeing of slots was done incorrectly, it overwrote the slot in question with NULL, but it should have used htab_clear_slot. This meant that the hashtable kept growing and growing even with all elements being empty because the size statistics aren't correct. Additionally it breaks hash chains if there is one striding the nullified slot. And then a microoptimization: if we know there aren't any slots in use at all (happens often in normal circumstances) we can as well use htab_empty instead of traversing the hash table for the right slots. This brings the expansion time (i.e. for expanding the statements) down from 5.5 to 2.8 seconds. All changes together reduce the compile time for the reduced testcase from ~ 77 seconds to ~ 44. A variant of this regstrapped on x86_64-linux, all languages+Ada+objc++, but after some changes I'm redoing that regstrap. Okay for trunk if that passes? Ciao, Michael. --- PR middle-end/38474 * cfgexpand.c (struct stack_var): Add slot_type member. (add_stack_var): Initialize it. (add_alias_set_conflicts): Remove. (merge_stack_vars_p, more_specific_type): New functions. (nonconflicting_type): New static data. (union_stack_vars): Use more_specific_type to update slot_type. (partition_stack_vars): Call merge_stack_vars_p ... (expand_used_vars): ... instead of add_alias_set_conflicts. (fini_vars_expansion): Clear nonconflicting_type. * function.c (n_temp_slots_in_use): New static data. (make_slot_available, assign_stack_temp_for_type): Update it. (init_temp_slots): Zero it. (remove_unused_temp_slot_addresses): Use it for quicker removal. (remove_unused_temp_slot_addresses_1): Use htab_clear_slot. Index: cfgexpand.c === --- cfgexpand.c.orig2012-05-27 01:33:55.0 +0200 +++ cfgexpand.c 2012-05-27 04:37:18.0 +0200 @@ -177,6 +177,11 @@ struct stack_var /* The next stack variable in the partition, or EOC. */ size_t next; + /* The most specific type of all variables merged with the slot + if this is a representative. Initially this is simply the + TREE_TYPE for the variable. */ + tree slot_type; + /* The numbers of conflicting stack variables. */ bitmap conflicts; }; @@ -285,6 +290,7 @@ add_stack_var (tree decl) /* All variables are initially in their own partition. */ v-representative = stack_vars_num; v-next = EOC; + v-slot_type = TREE_TYPE (decl); /* All variables initially conflict with no other. */ v-conflicts = NULL; @@ -353,45 +359,40 @@ aggregate_contains_union_type (tree type return false; } -/* A subroutine of expand_used_vars. If two variables X and Y have alias - sets that do not conflict, then do add a conflict for these variables - in the interference graph. We also need to make sure to add conflicts - for union containing structures. Else RTL alias analysis comes along - and due to type based aliasing rules decides that for two overlapping - union temporaries { short s; int i; } accesses to the same mem through - different types may not alias and happily reorders stores across - life-time boundaries of the temporaries (See
Re: [cxx-conversion] New Hash Table (issue6244048)
Hi, On Fri, 25 May 2012, NightStrike wrote: This point of yours should be stressed. Using writing standards of one language to develop in another language is a fundamentally bad idea. C and C++ kind of look the same, but they are not: http://www.research.att.com/~bs/bs_faq.html#C-slash You wouldn't use the GNU C Coding conventions to write in tcl, and you shouldn't use them to write in C++. You should just create the GNU C++ Coding Standards new, and not base them off of the former. That is all nice and fine. For starting from scratch. But we aren't. We have an existing compiler written in a certain style. We have existing people actually working on it and used to that style. We don't want to have a mixture of several different styles in the compiler. I (and I expect many others) don't want anyone working around the latter by going over the whole source base and reindent everything. Hence inventing a new coding standard for GCC-in-C++ (by reusing existing ones or doing something new) that isn't mostly the same as GCC-in-C isn't going to fly. Ciao, Michael.