Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call
2011/10/31 Georg-Johann Lay <a...@gjlay.de>:

This is a fix for an optimization flaw when dividing int by 2. There is really no need for a library call.

Costs of [U]DIV/[U]MOD are adjusted to take into account the costs of CONST_INT operands that must be loaded for division by means of a libgcc call.

There are some new combiner patterns, suffixed .lt0, that do the adjustment frequently seen when division by a constant is lowered to arithmetic in order to avoid the more expensive libcall.

Moreover, there are two patterns for adding a sign-extended QI to HI. These patterns are shorter, faster and have lower register pressure than explicitly sign-extending the QI before adding it. Example code is:

    int add (int a, char b)
    {
      return a + b;
    }

    int sub (int a, char b)
    {
      return a - b;
    }

    add:
        add r24,r22   ;  13  *addhi3.sign_extend1  [length = 4]
        adc r25,__zero_reg__
        sbrc r22,7
        dec r25
        ret

    sub:
        sub r24,r22   ;  13  *subhi3.sign_extend2  [length = 4]
        sbc r25,__zero_reg__
        sbrc r22,7
        inc r25
        ret

The reg_overlap_mentioned case is just for pathological code like a + (char) a, so that the expected size is 4 instructions.

Since the beginning of time, BRANCH_COST has been set to 0, so some optimization passes happily produce code that jumps around. The patch introduces a new command-line option for that, mainly because I don't know the rationale behind setting BRANCH_COST to 0.

Regression-tested. Ok for trunk?

Johann

	* config/avr/avr.opt (-mbranch-cost=): New option.
	* config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
	* config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
	* config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
	(*addhi3_zero_extend1): Remove % in constraint of operand 1.
	(*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.

Approved.

Denis.
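The two arithmetic identities behind this patch can be checked on the host in plain C. The sketch below is a hypothetical model, not AVR code: `div2_lowered` shows the usual lowering of signed division by 2 (add 1 when the dividend is negative, presumably the kind of `x < 0` adjustment the `.lt0` patterns are meant to catch, then shift right arithmetically), and `add_sext8`/`sub_sext8` mirror the four-instruction sequences above by zero-extending the `char` and then correcting the high byte when bit 7 is set. It assumes `>>` on a negative `int` is an arithmetic shift, as GCC implements it; all function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division by 2 without a libcall: bias negative dividends by 1 so
   that the arithmetic shift rounds toward zero, as C division does.
   Assumes >> on a negative int is an arithmetic shift (true for GCC).  */
static int div2_lowered (int x)
{
  return (x + (x < 0)) >> 1;
}

/* Model of *addhi3.sign_extend1: a + (signed char) b on 16 bits.  */
static uint16_t add_sext8 (uint16_t a, uint8_t b)
{
  uint16_t r = (uint16_t) (a + b);   /* add r24,r22 / adc r25,__zero_reg__ */
  if (b & 0x80)                      /* sbrc r22,7: skip next if bit 7 clear */
    r = (uint16_t) (r - 0x100);      /* dec r25: subtract 1 from the high byte */
  return r;
}

/* Model of *subhi3.sign_extend2: a - (signed char) b on 16 bits.  */
static uint16_t sub_sext8 (uint16_t a, uint8_t b)
{
  uint16_t r = (uint16_t) (a - b);   /* sub r24,r22 / sbc r25,__zero_reg__ */
  if (b & 0x80)                      /* sbrc r22,7 */
    r = (uint16_t) (r + 0x100);      /* inc r25: add 1 to the high byte */
  return r;
}

/* Compare the models against the reference C expressions.  */
static void check_avr_identities (void)
{
  for (int x = -40000; x <= 40000; x++)
    assert (div2_lowered (x) == x / 2);

  for (uint32_t a = 0; a <= 0xFFFF; a += 257)
    for (uint32_t b = 0; b <= 0xFF; b++)
      {
        int sb = (b & 0x80) ? (int) b - 256 : (int) b;   /* sign-extended b */
        assert (add_sext8 ((uint16_t) a, (uint8_t) b) == (uint16_t) (a + sb));
        assert (sub_sext8 ((uint16_t) a, (uint8_t) b) == (uint16_t) (a - sb));
      }
}
```

The correction step works because a zero-extended byte with bit 7 set is exactly 0x100 larger than its sign-extended value, so one `dec`/`inc` of the high byte fixes the result.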
[PATCH] Add vcond/vcondu patterns to sparc backend.
I really wanted to make this work using the define_expand rtl to generate the pattern, but I ran into two problems:

1) In addition to mode GCM, we also need to iterate over P mode for the sake of the rtl of fpcmp and cmask, so we'd get duplicates in the insn output files.

2) I couldn't substitute the mode attribute gcm_name into the cmask unspec code, i.e. UNSPEC_CMASK<gcm_name> didn't work.

Anyway, at least there is one expander function shared between the signed and unsigned cases.

Committed to trunk.

gcc/
	* config/sparc/sparc.c (sparc_expand_vcond): New function.
	* config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it.
	* config/sparc/sparc.md (vcond<mode><mode>): New VIS3 expander.
	(vconduv8qiv8qi): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180733 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                   |    7 +++++++
 gcc/config/sparc/sparc-protos.h |    1 +
 gcc/config/sparc/sparc.c        |   37 +++++++++++++++++++++++++++++++++++++
 gcc/config/sparc/sparc.md       |   30 ++++++++++++++++++++++++++++++
 4 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d5f725b..d6a9c4d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2011-11-01  David S. Miller  <da...@davemloft.net>
+
+	* config/sparc/sparc.c (sparc_expand_vcond): New function.
+	* config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it.
+	* config/sparc/sparc.md (vcond<mode><mode>): New VIS3 expander.
+	(vconduv8qiv8qi): Likewise.
+
 2011-11-01  Alexandre Oliva  <aol...@redhat.com>
 
 	PR debug/50869
diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h
index 108e105..b9a094e 100644
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@@ -108,6 +108,7 @@ extern const char *output_v8plus_mult (rtx, rtx *, const char *);
 extern void sparc_expand_vector_init (rtx, rtx);
 extern void sparc_expand_vec_perm_bmask(enum machine_mode, rtx);
 extern bool sparc_expand_conditional_move (enum machine_mode, rtx *);
+extern void sparc_expand_vcond (enum machine_mode, rtx *, int, int);
 #endif /* RTX_CODE */
 
 #endif /* __SPARC_PROTOS_H__ */
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index fd1b190..6431405 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -11531,4 +11531,41 @@ sparc_expand_conditional_move (enum machine_mode mode, rtx *operands)
   return true;
 }
 
+void
+sparc_expand_vcond (enum machine_mode mode, rtx *operands, int ccode, int fcode)
+{
+  rtx mask, cop0, cop1, fcmp, cmask, bshuf, gsr;
+  enum rtx_code code = GET_CODE (operands[3]);
+
+  mask = gen_reg_rtx (Pmode);
+  cop0 = operands[4];
+  cop1 = operands[5];
+  if (code == LT || code == GE)
+    {
+      rtx t;
+
+      code = swap_condition (code);
+      t = cop0; cop0 = cop1; cop1 = t;
+    }
+
+  gsr = gen_rtx_REG (DImode, SPARC_GSR_REG);
+
+  fcmp = gen_rtx_UNSPEC (Pmode,
+                         gen_rtvec (1, gen_rtx_fmt_ee (code, mode, cop0, cop1)),
+                         fcode);
+
+  cmask = gen_rtx_UNSPEC (DImode,
+                          gen_rtvec (2, mask, gsr),
+                          ccode);
+
+  bshuf = gen_rtx_UNSPEC (mode,
+                          gen_rtvec (3, operands[1], operands[2], gsr),
+                          UNSPEC_BSHUFFLE);
+
+  emit_insn (gen_rtx_SET (VOIDmode, mask, fcmp));
+  emit_insn (gen_rtx_SET (VOIDmode, gsr, cmask));
+
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], bshuf));
+}
+
 #include "gt-sparc.h"
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index fbd1a87..5924403 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -8299,6 +8299,36 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
+(define_expand "vcond<mode><mode>"
+  [(match_operand:GCM 0 "register_operand" "")
+   (match_operand:GCM 1 "register_operand" "")
+   (match_operand:GCM 2 "register_operand" "")
+   (match_operator 3 ""
+     [(match_operand:GCM 4 "register_operand" "")
+      (match_operand:GCM 5 "register_operand" "")])]
+  "TARGET_VIS3"
+{
+  sparc_expand_vcond (<MODE>mode, operands,
+                      UNSPEC_CMASK<gcm_name>,
+                      UNSPEC_FCMP);
+  DONE;
+})
+
+(define_expand "vconduv8qiv8qi"
+  [(match_operand:V8QI 0 "register_operand" "")
+   (match_operand:V8QI 1 "register_operand" "")
+   (match_operand:V8QI 2 "register_operand" "")
+   (match_operator 3 ""
+     [(match_operand:V8QI 4 "register_operand" "")
+      (match_operand:V8QI 5 "register_operand" "")])]
+  "TARGET_VIS3"
+{
+  sparc_expand_vcond (V8QImode, operands,
+                      UNSPEC_CMASK8,
+                      UNSPEC_FUCMP);
+  DONE;
+})
+
 (define_insn "array8<P:mode>_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
         (unspec:P [(match_operand:P 1 "register_or_zero_operand" "rJ")
--
1.7.6.401.g6a319
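For readers unfamiliar with the vcond optab, the following hypothetical scalar model shows what the expander computes: each result lane is taken from operand 1 where the comparison of operands 4 and 5 holds, and from operand 2 otherwise. Like sparc_expand_vcond, it first rewrites LT and GE as GT and LE with swapped operands (presumably because the VIS compare instructions only come in GT/LE/EQ/NE forms), then builds a per-lane mask and selects lanes from it, standing in for the fcmp, cmask and bshuffle steps. The enum, the 4 x int16 vector shape and the function names are illustrative; nothing here models the real GSR or bmask encoding.

```c
#include <assert.h>
#include <stdint.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_LE, CMP_LT, CMP_GE };

/* Evaluate the comparison for one lane.  */
static int lane_true (enum cmp_code code, int16_t x, int16_t y)
{
  switch (code)
    {
    case CMP_EQ: return x == y;
    case CMP_NE: return x != y;
    case CMP_GT: return x > y;
    case CMP_LE: return x <= y;
    case CMP_LT: return x < y;
    case CMP_GE: return x >= y;
    }
  return 0;
}

/* dst[i] = (a[i] CODE b[i]) ? t[i] : f[i] for a 4 x int16 vector.  */
static void vcond_v4hi (int16_t dst[4], const int16_t t[4], const int16_t f[4],
                        enum cmp_code code, const int16_t a[4], const int16_t b[4])
{
  const int16_t *cop0 = a, *cop1 = b;

  /* Same canonicalisation as the expander: a < b is b > a, and
     a >= b is b <= a (swap_condition plus swapped operands).  */
  if (code == CMP_LT || code == CMP_GE)
    {
      const int16_t *tmp = cop0;
      code = (code == CMP_LT) ? CMP_GT : CMP_LE;
      cop0 = cop1;
      cop1 = tmp;
    }

  /* "fcmp" step: one mask bit per lane.  */
  unsigned mask = 0;
  for (int i = 0; i < 4; i++)
    if (lane_true (code, cop0[i], cop1[i]))
      mask |= 1u << i;

  /* "cmask" + "bshuffle" steps: pick each lane from t or f.  */
  for (int i = 0; i < 4; i++)
    dst[i] = ((mask >> i) & 1) ? t[i] : f[i];
}
```

Sharing one helper between the signed and unsigned expanders, as the patch does, corresponds here to passing a different comparison routine; only the compare step differs.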
RFA: Fix dse / postreload not to bypass add expanders
This patch makes emit_inc_dec_insn_before use gen_add3_insn / gen_move_insn so that the appropriate expanders are used to create the new instructions, and for dse it uses the available register liveness information to check that no live fixed hard register, like a flags register, is clobbered in the process. For postreload, there is no such information available, so we give up when we see a clobber / set that might be problematic.

Regtested for epiphany-elf with a modified rtx_cost, where it fixes three ICE-on-valid-code failures:

FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c compilation, -O1 (internal compiler error)
FAIL: gcc.c-torture/execute/builtins/memmove-chk.c compilation, -O1 (internal compiler error)
FAIL: gcc.c-torture/execute/memcpy-bi.c compilation, -O1 (internal compiler error)

Bootstrapped and regression tested on i686-pc-linux-gnu.

2011-10-31  Joern Rennecke  <joern.renne...@embecosm.com>

	* regset.h (fixed_regset): Declare.
	* dse.c: Include regset.h.
	(struct insn_info): Add member fixed_regs_live.
	(note_add_store_info): New typedef.
	(note_add_store): New function.
	(emit_inc_dec_insn_before): Expect arg to be of type insn_info_t.
	Use gen_add3_insn / gen_move_insn.
	Check new insn for unwanted clobbers before emitting it.
	(check_for_inc_dec): Rename to...
	(check_for_inc_dec_1): ... this.  Return bool.  Take insn_info
	parameter.  Changed all callers in file.
	(check_for_inc_dec, copy_fixed_regs): New functions.
	(scan_insn): Set fixed_regs_live field of insn_info.
	* rtl.h (check_for_inc_dec): Update prototype.
	* postreload.c (reload_cse_simplify): Take new signature of
	check_for_inc_dec into account.
	* reginfo.c (fixed_regset): New variable.
	(init_reg_sets_1): Initialize it.

Index: postreload.c
===================================================================
--- postreload.c	(revision 180683)
+++ postreload.c	(working copy)
@@ -112,8 +112,8 @@ reload_cse_simplify (rtx insn, rtx testr
 	  if (REG_P (value)
 	      && ! REG_FUNCTION_VALUE_P (value))
 	    value = 0;
-	  check_for_inc_dec (insn);
-	  delete_insn_and_edges (insn);
+	  if (check_for_inc_dec (insn, NULL))
+	    delete_insn_and_edges (insn);
 	  return;
 	}
 
@@ -164,8 +164,8 @@ reload_cse_simplify (rtx insn, rtx testr
 
       if (i < 0)
 	{
-	  check_for_inc_dec (insn);
-	  delete_insn_and_edges (insn);
+	  if (check_for_inc_dec (insn, NULL))
+	    delete_insn_and_edges (insn);
 	  /* We're done with this insn.  */
 	  return;
 	}
Index: regset.h
===================================================================
--- regset.h	(revision 180683)
+++ regset.h	(working copy)
@@ -1,6 +1,6 @@
 /* Define regsets.
    Copyright (C) 1987, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
-   2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+   2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -115,6 +115,9 @@ #define EXECUTE_IF_AND_IN_REG_SET(REGSET
 
 extern regset regs_invalidated_by_call_regset;
 
+/* Same information as FIXED_REG_SET but in regset form.  */
+extern regset fixed_regset;
+
 /* An obstack for regsets.  */
 extern bitmap_obstack reg_obstack;
 
Index: dse.c
===================================================================
--- dse.c	(revision 180683)
+++ dse.c	(working copy)
@@ -33,6 +33,7 @@ Software Foundation; either version 3, o
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
+#include "regset.h"
 #include "flags.h"
 #include "df.h"
 #include "cselib.h"
@@ -377,6 +378,13 @@ struct insn_info
      created.  */
   read_info_t read_rec;
 
+  /* The live fixed registers.  We assume only fixed registers can
+     cause trouble by being clobbered from an expanded pattern;
+     storing only the live fixed registers (rather than all registers)
+     means less memory needs to be allocated / copied for the individual
+     stores.  */
+  regset fixed_regs_live;
+
   /* The prev insn in the basic block.  */
   struct insn_info * prev_insn;
 
@@ -448,9 +456,9 @@ struct bb_info
   /* The following bitvector is indexed by the reg number.  It
      contains the set of regs that are live at the current instruction
      being processed.  While it contains info for all of the
-     registers, only the pseudos are actually examined.  It is used to
-     assure that shift sequences that are inserted do not accidently
-     clobber live hard regs.  */
+     registers, only the hard registers are actually examined.  It is used
+     to assure that shift and/or add sequences that are inserted do not
+     accidently clobber live hard regs.  */
   bitmap regs_live;
 };
 
@@ -827,6 +835,51 @@ free_store_info (insn_info_t insn_info)
   insn_info->store_rec = NULL;
 }
 
+typedef struct
+{
+  rtx
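The safety check this patch adds comes down to a set intersection. The sketch below is a hypothetical model using one 64-bit mask per register set (the real code works on regsets and inspects the new insn's SETs and CLOBBERs, presumably via note_stores given the new note_add_store callback): a replacement add/move sequence is accepted only if it clobbers no live fixed register. Mirroring the patch, the dse caller has per-insn fixed_regs_live information, while the postreload callers pass NULL to check_for_inc_dec, so the model then conservatively rejects any clobber of a fixed register. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit N set means hard register N is in the set.  */
typedef uint64_t hard_reg_mask;

static hard_reg_mask reg_bit (int regno)
{
  assert (regno >= 0 && regno < 64);
  return (hard_reg_mask) 1 << regno;
}

/* Decide whether a replacement add/move sequence that clobbers the hard
   registers in CLOBBERED may be emitted.  FIXED is the set of fixed
   registers (e.g. a flags register).  LIVE_FIXED points to the fixed
   registers live at the insertion point (the dse case), or is NULL when
   no liveness information is available (the postreload case).  */
static bool can_emit_add_sequence (hard_reg_mask clobbered,
                                   hard_reg_mask fixed,
                                   const hard_reg_mask *live_fixed)
{
  if (live_fixed == NULL)
    /* No liveness: any clobbered fixed register might be live, so give up.  */
    return (clobbered & fixed) == 0;

  /* Liveness known: only clobbering a *live* fixed register is harmful.  */
  return (clobbered & *live_fixed) == 0;
}
```

Storing only the live fixed registers, as the new insn_info comment explains, keeps the per-store data small; the model reflects that by never looking at non-fixed registers.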
[PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders
Hi!

Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion support for AVX, this one adds V{2,4}DFmode -> unsigned V{4,8}SImode conversion.

Ok for trunk?

2011-11-01  Jakub Jelinek  <ja...@redhat.com>

	* config/i386/sse.md (ssepackfltmode): New mode attr.
	(vec_pack_ufix_trunc_<mode>): New expander using VF2 iterator.

--- gcc/config/i386/sse.md.jj	2011-11-01 09:04:37.0 +0100
+++ gcc/config/i386/sse.md	2011-11-01 09:37:36.0 +0100
@@ -3127,6 +3127,56 @@ (define_expand "vec_pack_sfix_trunc_v2df"
   DONE;
 })
 
+(define_mode_attr ssepackfltmode
+  [(V4DF "V8SI") (V2DF "V4SI")])
+
+(define_expand "vec_pack_ufix_trunc_<mode>"
+  [(match_operand:<ssepackfltmode> 0 "register_operand" "")
+   (match_operand:VF2 1 "register_operand" "")
+   (match_operand:VF2 2 "register_operand" "")]
+  "TARGET_AVX"
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  rtx two31r, mtwo32r, tmp[8];
+  int i;
+
+  for (i = 0; i < 6; i++)
+    tmp[i] = gen_reg_rtx (<MODE>mode);
+  tmp[6] = gen_reg_rtx (<ssepackfltmode>mode);
+  tmp[7] = gen_reg_rtx (<ssepackfltmode>mode);
+  real_ldexp (&TWO31r, &dconst1, 31);
+  two31r = const_double_from_real_value (TWO31r, DFmode);
+  two31r = ix86_build_const_vector (<MODE>mode, 1, two31r);
+  two31r = force_reg (<MODE>mode, two31r);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  mtwo32r = const_double_from_real_value (MTWO32r, DFmode);
+  mtwo32r = ix86_build_const_vector (<MODE>mode, 1, mtwo32r);
+  mtwo32r = force_reg (<MODE>mode, mtwo32r);
+  emit_insn (gen_avx_cmp<mode>3 (tmp[0], operands[1], two31r, GEN_INT (29)));
+  emit_insn (gen_avx_cmp<mode>3 (tmp[1], operands[2], two31r, GEN_INT (29)));
+  emit_insn (gen_and<mode>3 (tmp[2], tmp[0], mtwo32r));
+  emit_insn (gen_and<mode>3 (tmp[3], tmp[1], mtwo32r));
+  emit_insn (gen_add<mode>3 (tmp[4], operands[1], tmp[2]));
+  emit_insn (gen_add<mode>3 (tmp[5], operands[2], tmp[3]));
+  if (<MODE>mode == V4DFmode)
+    {
+      emit_insn (gen_avx_cvttpd2dq256_2 (tmp[6], tmp[4]));
+      emit_insn (gen_avx_cvttpd2dq256_2 (tmp[7], tmp[5]));
+      emit_insn (gen_avx_vperm2f128v8si3 (operands[0], tmp[6], tmp[7],
+                                          GEN_INT (0x20)));
+    }
+  else
+    {
+      emit_insn (gen_sse2_cvttpd2dq (tmp[6], tmp[4]));
+      emit_insn (gen_sse2_cvttpd2dq (tmp[7], tmp[5]));
+      emit_insn (gen_vec_interleave_lowv2di (gen_lowpart (V2DImode,
+                                                          operands[0]),
+                                             gen_lowpart (V2DImode, tmp[6]),
+                                             gen_lowpart (V2DImode, tmp[7])));
+    }
+  DONE;
+})
+
 (define_expand "vec_pack_sfix_v4df"
   [(match_operand:V8SI 0 "register_operand" "")
    (match_operand:V4DF 1 "nonimmediate_operand" "")

	Jakub
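The expander builds an unsigned conversion out of cvttpd2dq, which only converts to signed 32-bit integers with truncation. A scalar C model of one lane (hypothetical helper name, one double instead of a vector) makes the scheme explicit: lanes at or above 2^31 get -2^32 added (the cmp GE / and / add sequence), the value, now in [-2^31, 2^31), is truncated as a signed integer, and the bit pattern is reinterpreted as unsigned. The model also exposes one property worth checking: truncation rounds toward zero, so a lane at or above 2^31 with a fractional part becomes a negative intermediate that rounds up, and the result is one too large (2147483648.5 yields 2147483649). That cannot happen in the SFmode variant, where floats at or above 2^31 have no fractional bits; subtracting 2^31 instead and then setting bit 31 of the result would avoid it for doubles.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of vec_pack_ufix_trunc as posted: unsigned
   truncation built from a signed truncating conversion.  Valid inputs
   lie in [0, 2^32).  */
static uint32_t ufix_trunc_lane (double x)
{
  const double two31 = 2147483648.0;      /* 2^31 */
  const double mtwo32 = -4294967296.0;    /* -2^32 */

  /* cmp GE + and: the adjustment is -2^32 for x >= 2^31, else 0.  */
  double adj = (x >= two31) ? mtwo32 : 0.0;

  /* add + cvttpd2dq: x + adj lies in [-2^31, 2^31), so the signed
     truncating conversion is in range.  */
  int32_t s = (int32_t) (x + adj);

  /* Reinterpret as unsigned: a negative s becomes s + 2^32.  */
  return (uint32_t) s;
}
```

For integral inputs the model agrees with a plain `(uint32_t) x`; the last test below records the fractional-input behaviour described above.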
[PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch
Use -Aleon to enable the binutils sparc-leon architecture. The leon-arch binutils GAS has umul/smul and casa enabled.

Signed-off-by: Konrad Eisele <kon...@gaisler.com>
---
 gcc/config/sparc/sparc.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 65b4527..bbadeb2 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -236,7 +236,7 @@ extern enum cmodel sparc_cmodel;
 
 #if TARGET_CPU_DEFAULT == TARGET_CPU_leon
 #define CPP_CPU32_DEFAULT_SPEC "-D__leon__ -D__sparc_v8__"
-#define ASM_CPU32_DEFAULT_SPEC ""
+#define ASM_CPU32_DEFAULT_SPEC "-Aleon"
 #endif
 
 #endif
@@ -324,7 +324,7 @@ extern enum cmodel sparc_cmodel;
 
 /* Override in target specific files.  */
 #define ASM_CPU_SPEC "\
-%{mcpu=sparclet:-Asparclet} %{mcpu=tsc701:-Asparclet} \
+%{mcpu=sparclet:-Asparclet} %{mcpu=leon:-Aleon} %{mcpu=tsc701:-Asparclet} \
 %{mcpu=sparclite:-Asparclite} \
 %{mcpu=sparclite86x:-Asparclite} \
 %{mcpu=f930:-Asparclite} %{mcpu=f934:-Asparclite} \
--
1.6.4.1
Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call
Denis Chertykov wrote:
2011/10/31 Georg-Johann Lay:

Since the beginning of time, BRANCH_COST has been set to 0, so some optimization passes happily produce code that jumps around. The patch introduces a new command-line option for that, mainly because I don't know the rationale behind setting BRANCH_COST to 0.

Johann

	* config/avr/avr.opt (-mbranch-cost=): New option.
	* config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
	* config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
	* config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
	(*addhi3_zero_extend1): Remove % in constraint of operand 1.
	(*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.

Approved.

Denis.

Do you know why the branch costs are set to 0 by default? Maybe it's better to have a default of 1 for the new avr_branch_cost?

Johann
Re: [PATCH, devirtualization] Detect the new type in type change detection
On Mon, Oct 31, 2011 at 5:58 PM, Martin Jambor <mjam...@suse.cz> wrote:
On Fri, Oct 28, 2011 at 11:21:23AM +0200, Richard Guenther wrote:
On Thu, Oct 27, 2011 at 9:54 PM, Martin Jambor <mjam...@suse.cz> wrote:
Hi,
On Thu, Oct 27, 2011 at 11:06:02AM +0200, Richard Guenther wrote:
On Thu, Oct 27, 2011 at 1:22 AM, Martin Jambor <mjam...@suse.cz> wrote:

Hi,

I've been asked by Maxim Kuvyrkov to revive the following patch, which did not make it into 4.6. Currently, when type-based devirtualization detects a potential type change, it simply gives up on gathering any information on the object in question. This patch adds an attempt to actually detect the new type after the change.

Maxim claimed this (and another patch I'll post tomorrow) noticeably improved performance of some real code. I can only offer a rather artificial example in the attachment. When the constructors are inlined but the function multiply_matrices is not, this patch makes the produced executable run for only 7 seconds instead of about 20 on my 4-year-old i686 desktop (with -Ofast).

Anyway, the patch passes bootstrap and testsuite on x86_64-linux. What do you think, is it a good idea for trunk now?

Thanks,

Martin

2011-10-21  Martin Jambor  <mjam...@suse.cz>

	* ipa-prop.c (type_change_info): New fields object, known_current_type
	and multiple_types_encountered.
	(extr_type_from_vtbl_ptr_store): New function.
	(check_stmt_for_type_change): Use it, set multiple_types_encountered if
	the result is different from the previous one.
	(detect_type_change): Renamed to detect_type_change_1.  New parameter
	comp_type.  Set up new fields in tci, build known type jump
	functions if the new type can be identified.
	(detect_type_change): New function.
	* tree.h (DECL_CONTEXT): Comment new use.

	* testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans.
	* testsuite/g++.dg/ipa/devirt-c-2.C: Likewise.
	* testsuite/g++.dg/ipa/devirt-c-7.C: New test.

Index: src/gcc/ipa-prop.c
===================================================================
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f)
 
 struct type_change_info
 {
+  /* The declaration or SSA_NAME pointer of the base that we are checking for
+     type change.  */
+  tree object;
+  /* If we actually can tell the type that the object has changed to, it is
+     stored in this field.  Otherwise it remains NULL_TREE.  */
+  tree known_current_type;
   /* Set to true if dynamic type change has been detected.  */
   bool type_maybe_changed;
+  /* Set to true if multiple types have been encountered.  known_current_type
+     must be disregarded in that case.  */
+  bool multiple_types_encountered;
 };
 
 /* Return true if STMT can modify a virtual method table pointer.
@@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
   return true;
 }
 
+/* If STMT can be proved to be an assignment to the virtual method table
+   pointer of ANALYZED_OBJ and the type associated with the new table
+   identified, return the type.  Otherwise return NULL_TREE.  */
+
+static tree
+extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj)
+{
+  tree lhs, t, obj;
+
+  if (!is_gimple_assign (stmt))

gimple_assign_single_p (stmt)

OK.

+    return NULL_TREE;
+
+  lhs = gimple_assign_lhs (stmt);
+
+  if (TREE_CODE (lhs) != COMPONENT_REF)
+    return NULL_TREE;
+  obj = lhs;
+
+  if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1)))
+    return NULL_TREE;
+
+  do
+    {
+      obj = TREE_OPERAND (obj, 0);
+    }
+  while (TREE_CODE (obj) == COMPONENT_REF);

You do not allow other components than component-refs (thus, for example an ARRAY_REF - that is for a reason?). Please add a comment why. Otherwise this whole sequence would look like it should be replaceable by get_base_address (obj).

I guess I might have been overly conservative here; ARRAY_REFs are fine. get_base_address only digs into MEM_REFs if they are based on an ADDR_EXPR, while I do so always. But I can check that either both obj and analyzed_obj are a MEM_REF of the same SSA_NAME or they are the same thing (i.e. the same decl)... which even feels a bit cleaner, so I did that.

Well, as you are looking for a must-change-type pattern I think you cannot simply ignore offsets. Consider

  T a[10];
  new (T') (a[9]);
  a[8]->foo();

where the must-type-change on a[9] is _not_ changing the type of a[8]! Similar cases might happen with

  class Compound { T a; T b; };

no? Please think about the difference must vs. may-type-change for these cases. I'm not convinced that the must-type-change code is
Re: [PR50878, PATCH] Fix for verify_dominators in -ftree-tail-merge
On Mon, Oct 31, 2011 at 9:19 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
On 10/30/2011 10:54 AM, Richard Guenther wrote:
On Sun, Oct 30, 2011 at 9:27 AM, Tom de Vries <tom_devr...@mentor.com> wrote:
On 10/30/2011 09:20 AM, Tom de Vries wrote:

Richard,

I have a fix for PR50878.

Sorry, with patch this time.

Ok for now, but see David's mail and the complexity issue with iteratively updating dominators.

I'm not sure which mail you mean.

The one I CCed you on, which complained about iterative dominator fixing taking 70% of the compile time in some GCC testsuite test. It seems to me that we know exactly what to update and how, and we should do that (well, if we need up-to-date dominators, re-computing them once in the pass would be ok).

Indeed, in this example we know exactly what to update and how. However, PR50908 popped up, and there that's not the case anymore. Consider the following cfg, where A is the direct dominator of I:

[CFG diagram, not recoverable from the archived text: blocks A to I, with A at the top and I at the bottom; E and F each branch to both G and H, which join at I]

Say E and F are duplicates, and F is removed. The cfg then looks like this:

[CFG diagram, not recoverable from the archived text: F is gone; E branches to G and H, which join at I]

E is now the new direct dominator of I.

The patch for PR50878 did not address this example, since it uses the set of bbs directly dominated by the (single) predecessor of bb1 and bb2. The new patch calculates the updated dominator info by taking the nearest common dominator (A) of bb1 (F) and bb2 (E), and getting the set of bbs immediately dominated by it. Part of this set is now directly dominated by bb2. Ideally we would have a means to determine which bbs in the set are now directly dominated by bb2, and call set_immediate_dominator for those bbs, but we don't, so instead we let iterate_fix_dominators figure it out.

Additionally, the patch makes sure it updates dominator info before updating the vuses; this fixes a latent bug.

The patch fixes both PR50908 and PR50878.

Bootstrapped and reg-tested on x86_64 and i686, and built and reg-tested on ARM and MIPS.

Ok for trunk?

Ok, but please add testcases for all the bugs you fixed. This helps adding test coverage for these cases.

Thanks,
Richard.

Thanks,
- Tom

Richard.

Thanks,
- Tom

A simplified form of the problem from the test-case of the PR is shown in this cfg. Block 12 has block 5 as its direct dominator.

       5
      / \
     /   \
    *     *
    6     7
    |     |
    |     |
    *     *
    8     9
     \   /
      \ /
       *
       12

tail_merge_optimize finds that blocks 6 and 7 are duplicates. After replacing block 7 by block 6, the cfg looks like this:

       5
       |
       |
       *
       6
      / \
     /   \
    *     *
    8     9
     \   /
      \ /
       *
       12

The new direct dominator of block 12 is block 6, but the current algorithm only recalculates dominator info for blocks 6, 8 and 9. The patch fixes this by additionally recalculating the dominator info for blocks immediately dominated by bb2 (block 6 in the example), if bb2 has a single predecessor after replacement.

Bootstrapped and reg-tested on x86_64 and i686. Built and reg-tested on MIPS and ARM.

Ok for trunk?

Thanks,
- Tom

2011-10-30  Tom de Vries  <t...@codesourcery.com>

	PR tree-optimization/50878
	* tree-ssa-tail-merge.c (replace_block_by): Recalculate dominator info
	for blocks immediately dominated by bb2, if bb2 has a single
	predecessor after replacement.

2011-10-31  Tom de Vries  <t...@codesourcery.com>

	PR tree-optimization/50908
	* tree-ssa-tail-merge.c (update_vuses): Now that edges are removed
	before update_vuses, test for 1 predecessor rather than two.
	(delete_block_update_dominator_info): New function, part of it factored
	out of ...
	(replace_block_by): Use delete_block_update_dominator_info.  Call
	update_vuses after deleting bb1 and updating dominator info, instead of
	before.
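The dominator change in the simplified example can be checked with the textbook iterative data-flow formulation: Dom(n) is {n} plus the intersection of Dom(p) over all predecessors p, and the immediate dominator is the closest strict dominator. The sketch below is a hypothetical standalone model (bitmask sets, at most 64 blocks, block index 0 as entry), not GCC's dominance.c or iterate_fix_dominators; it only confirms that replacing block 7 by block 6 moves the immediate dominator of block 12 from 5 to 6, which is why recomputing only blocks 6, 8 and 9 was not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t block_set;   /* bit i set: block index i is in the set */

/* Iterative dominator sets; block 0 is the entry.  Unreachable blocks
   keep the full set, which is harmless as long as they are nobody's
   predecessor.  */
static void compute_dom (int n, const block_set preds[], block_set dom[])
{
  block_set all = (n == 64) ? ~(block_set) 0 : (((block_set) 1 << n) - 1);
  dom[0] = 1;
  for (int i = 1; i < n; i++)
    dom[i] = all;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int i = 1; i < n; i++)
        {
          block_set d = all;
          for (int p = 0; p < n; p++)
            if ((preds[i] >> p) & 1)
              d &= dom[p];
          d |= (block_set) 1 << i;
          if (d != dom[i])
            {
              dom[i] = d;
              changed = true;
            }
        }
    }
}

static int popcount64 (block_set s)
{
  int c = 0;
  for (; s; s &= s - 1)
    c++;
  return c;
}

/* The dominators of B form a chain, so the immediate dominator is the
   strict dominator that itself has the most dominators.  */
static int idom (int n, int b, const block_set dom[])
{
  block_set strict = dom[b] & ~((block_set) 1 << b);
  int best = -1, best_depth = -1;
  for (int d = 0; d < n; d++)
    if (((strict >> d) & 1) && popcount64 (dom[d]) > best_depth)
      {
        best = d;
        best_depth = popcount64 (dom[d]);
      }
  return best;
}

/* The simplified CFG from the mail.  Indices: 0=bb5 1=bb6 2=bb7 3=bb8
   4=bb9 5=bb12.  Returns the GCC block number of idom (bb12).  */
static int example_idom_of_bb12 (bool merged)
{
  static const int bb_number[6] = { 5, 6, 7, 8, 9, 12 };
  block_set preds[6] = { 0 }, dom[6];

  preds[1] = 1u << 0;                 /* 5 -> 6 */
  preds[3] = 1u << 1;                 /* 6 -> 8 */
  preds[5] = (1u << 3) | (1u << 4);   /* 8 -> 12, 9 -> 12 */
  if (!merged)
    {
      preds[2] = 1u << 0;             /* 5 -> 7 */
      preds[4] = 1u << 2;             /* 7 -> 9 */
    }
  else
    preds[4] = 1u << 1;               /* 7 replaced by 6: 6 -> 9 */

  compute_dom (6, preds, dom);
  return bb_number[idom (6, 5, dom)];
}
```

A full recomputation like this is exactly what the review wants to avoid on large functions, which is why the patch limits the update to the blocks immediately dominated by the nearest common dominator.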
Re: AVX generic mode tuning discussion.
On Mon, Oct 31, 2011 at 9:36 PM, Jagasia, Harsha <harsha.jaga...@amd.com> wrote:

We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX.

You indicate a 3% reduction on Bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2?

We see these % differences going from SSE42 to AVX128 to AVX256 on Bulldozer with -mtune=generic -Ofast. (Positive is improvement, negative is degradation.)

Bulldozer:        AVX128/SSE42   AVX256/AVX-128
410.bwaves            -1.4%          -1.4%
416.gamess            -1.1%           0.0%
433.milc               0.5%          -2.4%
434.zeusmp             9.7%          -2.1%
435.gromacs            5.1%           0.5%
436.cactusADM          8.2%         -23.8%
437.leslie3d           8.1%           0.4%
444.namd               3.6%           0.0%
447.dealII            -1.4%          -0.4%
450.soplex            -0.4%          -0.4%
453.povray             0.0%          -1.5%
454.calculix          15.7%          -8.3%
459.GemsFDTD           4.9%           1.4%
465.tonto              1.3%          -0.6%
470.lbm                0.9%           0.3%
481.wrf                7.3%          -3.6%
482.sphinx3            5.0%          -9.8%
SPECFP                 3.8%          -3.2%

Will the next AMD generation have a usable avx256? I'm not keen on the idea of generic mode being tuned for a single processor revision that maybe shouldn't actually be using avx at all.

We see a substantial gain in several SPECFP benchmarks going from SSE42 to AVX128 on Bulldozer. IMHO, accomplishing even a 5% gain in an individual benchmark takes a hardware company several man-months. The loss with AVX256 for Bulldozer is much more significant than the gain for SandyBridge. While the general trend in the industry is a move toward AVX256, for now we would be disadvantaging Bulldozer with this choice.

We have several customers who use -mtune=generic, and it is the default unless a user explicitly overrides it with -mtune=native. They are the ones who want to experiment with the latest ISA using gcc, but want to keep their ISA selection and tuning agnostic on x86/64. IMHO, it is with these customers in mind that generic was introduced in the first place.

Since stage 1 closure is around the corner, I just wanted to ping to see if the maintainers have made up their minds on this one. AVX-128 is an improvement over SSE42 for Bulldozer, and AVX-256 wipes out pretty much all of that gain in generic mode. Until there is a convergence on AVX-256 for x86/64, we would like to propose having generic generate avx-128 by default, and having users override to avx-256 manually when it is known to benefit performance.

Did somebody spend the time analyzing why CactusADM shows so much of a difference? With the recent improvements in vectorizing for AVX, did you re-do the measurements with a recent trunk? I don't think disabling avx-256 by default is a good idea until we understand why these numbers happen and are convinced we cannot fix this by proper cost modeling.

Richard.

Thanks,
Harsha
Re: [google] Enable loop unroll/peel notes under -fopt-info
On Tue, Nov 1, 2011 at 1:46 AM, Teresa Johnson <tejohn...@google.com> wrote:

This patch is for google-main only.

Tested with bootstrap and regression tests.

Print unroll and peel factors along with the loop source position under -fopt-info.

Teresa

2011-10-31  Teresa Johnson  <tejohn...@google.com>

	* common.opt (fopt-info): Disable -fopt-info by default.
	* loop-unroll.c (report_unroll_peel): New function.
	(unroll_and_peel_loops): Call record_loop_exits for later use.
	(peel_loops_completely): Print the loop source position in dump
	info and emit note under -fopt-info.
	(decide_unroll_and_peeling): Ditto.
	(decide_peel_once_rolling): Record peel factor for use in note
	emission.
	(decide_peel_completely): Ditto.
	* cfgloop.c (get_loop_location): New function.
	* cfgloop.h (get_loop_location): Ditto.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Emit note
	under -fopt-info.

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 180437)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -52,6 +52,7 @@
 #include "flags.h"
 #include "tree-inline.h"
 #include "target.h"
+#include "diagnostic.h"
 
 /* Specifies types of loops that may be unrolled.  */
 
@@ -443,6 +444,17 @@
     fprintf (dump_file, "Unrolled loop %d completely by factor %d.\n",
              loop->num, (int) n_unroll);
 
+  if (flag_opt_info >= OPT_INFO_MIN)
+    {
+      location_t locus;
+      locus = gimple_location (cond);
+
+      inform (locus, "Completely Unroll loop by %d (execution count %d, const iterations %d)",
+              (int) n_unroll,
+              (int) loop->header->count,
+              (int) TREE_INT_CST_LOW(niter));
+    }
+

And this is exactly what I mean by code duplication. Two lines above we already have "Unrolled loop %d completely by factor %d"; not only do you duplicate some diagnostic printing about this fact, you put in useless info (complete unroll by N of a loop executing M (?! that's surely N as well) times, const iterations O (?! that's surely N as well ...)).

Richard.
Re: Go patch committed: Update Go library
On Thu, Oct 27, 2011 at 6:42 PM, Uros Bizjak <ubiz...@gmail.com> wrote:

This patch updates the Go library to the most recent weekly release. I think the only potential portability issue here is the use of the ipv6_mreq struct. I'm not entirely sure the new exp/terminal package is portable, but it might be.

There are still problems with the EpollEvent definition on Alpha; please see [1] for the analysis.

[1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00457.html

Thanks, the resulting epoll.go on Alpha reads as:

epoll.go:

    package syscall

    type EpollEvent struct {
        Events uint32
        Pad [4]byte
        Fd int32
        Pad2 [4]byte
    }

However, I am not able to finish compilation of libgo due to an unrelated problem (reported in [1]) with the TC[GS]ETS defines:

libtool: compile:  /space/uros/gcc-build-go/./gcc/gccgo -B/space/uros/gcc-build-go/./gcc/ -B/usr/local/alphaev68-unknown-linux-gnu/bin/ -B/usr/local/alphaev68-unknown-linux-gnu/lib/ -isystem /usr/local/alphaev68-unknown-linux-gnu/include -isystem /usr/local/alphaev68-unknown-linux-gnu/sys-include -O2 -g -mieee -I . -c -fgo-prefix=libgo_bytes ../../../gcc-svn/trunk/libgo/go/bytes/buffer.go ../../../gcc-svn/trunk/libgo/go/bytes/bytes.go ../../../gcc-svn/trunk/libgo/go/bytes/bytes_decl.go -o bytes/bytes.o >/dev/null 2>&1
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:31:78: error: reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:40:81: error: reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:47:81: error: reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:57:78: error: reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:66:81: error: reference to undefined identifier ‘syscall.TCGETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:72:81: error: reference to undefined identifier ‘syscall.TCSETS’
../../../gcc-svn/trunk/libgo/go/exp/terminal/terminal.go:77:68: error: reference to undefined identifier ‘syscall.TCSETS’
make[4]: *** [exp/terminal.lo] Error 1

[1] http://gcc.gnu.org/ml/gcc/2011-10/msg00488.html

Uros.
Re: [Patch,AVR]: Fix PR50910: int/2 leads to libgcc call
2011/11/1 Georg-Johann Lay <a...@gjlay.de>:
Denis Chertykov wrote:
2011/10/31 Georg-Johann Lay:

Since the beginning of time, BRANCH_COST has been set to 0, so some optimization passes happily produce code that jumps around. The patch introduces a new command-line option for that, mainly because I don't know the rationale behind setting BRANCH_COST to 0.

Johann

	* config/avr/avr.opt (-mbranch-cost=): New option.
	* config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
	* config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
	* config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
	(*addhi3_zero_extend1): Remove % in constraint of operand 1.
	(*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.

Approved.

Denis.

Do you know why the branch costs are set to 0 by default?

No.

Maybe it's better to have a default of 1 for the new avr_branch_cost?

I don't know. (I forgot.)

Denis.
Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders
On Tue, Nov 1, 2011 at 10:07 AM, Jakub Jelinek <ja...@redhat.com> wrote:

Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion support for AVX, this one adds V{2,4}DFmode -> unsigned V{4,8}SImode conversion.

Ok for trunk?

Please put the expander function into i386.c. IMO, this expander can be better written using variable mode and indirect functions.

Otherwise, it looks OK.

Thanks,
Uros.
Re: PATCH: Move f16c intrinsics into f16cintrin.h
Hello!

On Mon, Oct 31, 2011 at 05:23:58PM -0500, Quentin Neill wrote:

Interested parties should view these threads from three years ago:
http://gcc.gnu.org/ml/gcc-patches/2008-11/threads.html#00145
http://gcc.gnu.org/ml/gcc-patches/2008-12/threads.html#00174

Testing on x86_64, okay to commit if no regressions?

You aren't installing the header, so it will cause regressions. config.gcc needs to be adjusted for it.

Arggh. Thanks, my tests found that too. Reposting; okay to commit after testing on x86_64 if no regressions?

	Piledriver f16cintrin.h fix.
	* config/i386/f16cintrin.h: Contents moved from immintrin.h.
	* config/config.gcc: Add f16cintrin.h.

OK.

Thanks,
Uros.
Re: implementation of std::thread::hardware_concurrency()
Rechecked.

diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..6feda4d 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unsigned int
   thread::hardware_concurrency() noexcept
   {
-    int __n = _GLIBCXX_NPROCS;
-    if (__n < 0)
-      __n = 0;
-    return __n;
+    int count=0;
+#if defined(PTW32_VERSION) || \
+    (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+    defined(__hpux)
+    count=pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+    size_t size=sizeof(count);
+    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_SC_NPROCESSORS_ONLN)
+    count=sysconf(_SC_NPROCESSORS_ONLN);
+#elif defined(_GLIBCXX_USE_GET_NPROCS)
+    count=_GLIBCXX_NPROCS;
+#endif
+    return (count>0)?count:0;
   }
 
 _GLIBCXX_END_NAMESPACE_VERSION

2011/11/1 Paolo Carlini pcarl...@gmail.com:
> Hi,
>
>> This patch implements std::thread::hardware_concurrency().
>> Tested on pthreads-win32/winpthreads on Windows, and on Linux/FreeBSD.
>
> Please send library patches to the library mailing list too. Also, always
> patch mainline first: actually, in the latter the function is already
> implemented. Maybe something is missing for win32; please check, rediff,
> and resend.
>
> Thanks
> Paolo
[Patch, libfortran, committed] Cleanup NEWUNIT allocation
Hi,

attached patch committed to trunk as obvious after regtesting.

2011-11-01  Janne Blomqvist  j...@gcc.gnu.org

	* io/io.h (next_available_newunit): Remove prototype.
	* io/unit.h (next_available_newunit): Make variable static,
	initialize it.
	(init_units): Don't initialize next_available_newunit.
	(get_unique_unit_number): Use atomic builtin if available.

-- 
Janne Blomqvist

diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 23f07ca..3569c54 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -576,10 +576,6 @@ gfc_unit;
 extern gfc_offset max_offset;
 internal_proto(max_offset);
 
-/* Unit number to be assigned when NEWUNIT is used in an OPEN statement.  */
-extern GFC_INTEGER_4 next_available_newunit;
-internal_proto(next_available_newunit);
-
 /* Unit tree root.  */
 extern gfc_unit *unit_root;
 internal_proto(unit_root);
diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index b4d10cd..33072fe 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c
@@ -71,8 +71,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 /* Subroutines related to units */
 
-GFC_INTEGER_4 next_available_newunit;
+/* Unit number to be assigned when NEWUNIT is used in an OPEN statement.  */
 #define GFC_FIRST_NEWUNIT -10
+static GFC_INTEGER_4 next_available_newunit = GFC_FIRST_NEWUNIT;
 
 #define CACHE_SIZE 3
 static gfc_unit *unit_cache[CACHE_SIZE];
@@ -525,8 +526,6 @@ init_units (void)
   __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
 #endif
 
-  next_available_newunit = GFC_FIRST_NEWUNIT;
-
   if (options.stdin_unit >= 0)
     {				/* STDIN */
       u = insert_unit (options.stdin_unit);
@@ -808,16 +807,19 @@ get_unique_unit_number (st_parameter_open *opp)
 {
   GFC_INTEGER_4 num;
 
+#ifdef HAVE_SYNC_FETCH_AND_ADD
+  num = __sync_fetch_and_add (&next_available_newunit, -1);
+#else
   __gthread_mutex_lock (&unit_lock);
   num = next_available_newunit--;
+  __gthread_mutex_unlock (&unit_lock);
+#endif
 
   /* Do not allow NEWUNIT numbers to wrap.  */
-  if (next_available_newunit >= GFC_FIRST_NEWUNIT )
+  if (num > GFC_FIRST_NEWUNIT )
     {
-      __gthread_mutex_unlock (&unit_lock);
       generate_error (&opp->common, LIBERROR_INTERNAL, "NEWUNIT exhausted");
       return 0;
     }
-  __gthread_mutex_unlock (&unit_lock);
   return num;
 }
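The new get_unique_unit_number() hands out decreasing unit numbers with a single atomic fetch-and-add of -1. It detects exhaustion by checking whether the value it received has wrapped above GFC_FIRST_NEWUNIT.

The sketch below models that logic in isolation. It uses C11 `<stdatomic.h>` instead of the `__sync_fetch_and_add` builtin used in the patch. The function name `next_newunit` and the explicit counter argument are hypothetical; the argument is there only so the wrap-around case can be exercised.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Mirrors GFC_FIRST_NEWUNIT from the patch. */
#define FIRST_NEWUNIT (-10)

/* Hypothetical stand-in for get_unique_unit_number(): returns -10, -11,
   -12, ... on successive calls, or 0 once the numbers are exhausted. */
int next_newunit(atomic_int *counter)
{
  /* atomic_fetch_sub returns the value before the decrement.  C11 defines
     signed atomic arithmetic to wrap silently, so after INT_MIN has been
     handed out the counter becomes INT_MAX instead of overflowing. */
  int num = atomic_fetch_sub(counter, 1);

  /* Every legitimate number is <= FIRST_NEWUNIT; anything larger means
     the counter has wrapped around. */
  if (num > FIRST_NEWUNIT)
    return 0;
  return num;
}
```

Because the check is made on the value returned by the atomic operation, and not on the shared variable afterwards, no lock has to be held across the test.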
Re: implementation of std::thread::hardware_concurrency()
On 11/01/2011 12:33 PM, niXman wrote:
> Rechecked.

Stylistically, you are missing a lot of spaces around the operators, e.g.:

    return (count > 0) ? count : 0;

Also, patches are always submitted with a ChangeLog entry. Do you already have a copyright assignment in place? I'm asking in general, for your future submissions; this specific patch would probably be small enough not to require it.

Paolo.
[patch] Update gcc.dg/vect/no-scevccp-outer-6-global.c
Hi,

With the recent patches for __restrict__, the outer loop in gcc.dg/vect/no-scevccp-outer-6-global.c is now vectorizable, because it no longer requires loop versioning for alias. The comment in the test is probably obsolete, and checking for widen-mult doesn't make much sense, because there is no multiplication here at all.

Tested on powerpc64-suse-linux.
Committed.

Ira

testsuite/ChangeLog:

	* gcc.dg/vect/no-scevccp-outer-6-global.c: Expect to vectorize
	the outer loop.  Remove comment.  Don't check for widen-mult.

Index: testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
===
--- testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c	(revision 180733)
+++ testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c	(working copy)
@@ -52,7 +52,5 @@
   return 0;
 }
 
-/* Too many BBs in loop */
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
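The point about loop versioning can be illustrated at the source level. With `restrict`-qualified pointers the compiler may assume the arrays do not overlap, so it does not need to emit a runtime overlap check before using a vectorized loop.

The function below is a hypothetical nested loop, not the contents of no-scevccp-outer-6-global.c. Whether its outer loop is actually vectorized depends on the target and flags.

```c
#include <assert.h>

#define N 8
#define M 4

/* Hypothetical nested loop: out[i] = in[i][0] + ... + in[i][M-1].
   restrict promises that out and in never alias, so the vectorizer
   does not have to version the loop with a runtime alias check. */
void row_sums(int *restrict out, int (*restrict in)[M])
{
  for (int i = 0; i < N; i++)       /* outer loop */
    {
      int s = 0;
      for (int j = 0; j < M; j++)   /* inner loop */
        s += in[i][j];
      out[i] = s;
    }
}
```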
Re: building binutils from same directory as gcc
On 10/30/2011 01:51 PM, Gerald Pfeifer wrote:
> Why not just declare that building from the same directory is not
> supported and have one simple set of instructions that always works, as
> opposed to "this ought to work with snapshots but not with direct
> checkouts"?

That's right. Is there ever any advantage to building in-srcdir? I'm not aware of one.

Andrew.
Re: implementation of std::thread::hardware_concurrency()
On Tue, 1 Nov 2011, niXman wrote:

> diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
> index 09e7fc5..6feda4d 100644
> --- a/libstdc++-v3/src/thread.cc
> +++ b/libstdc++-v3/src/thread.cc
> @@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>    unsigned int
>    thread::hardware_concurrency() noexcept
>    {
> -    int __n = _GLIBCXX_NPROCS;
> -    if (__n < 0)
> -      __n = 0;
> -    return __n;
> +    int count=0;
> +#if defined(PTW32_VERSION) || \
> +    (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
> +    defined(__hpux)
> +    count=pthread_num_processors_np();
> +#elif defined(__APPLE__) || defined(__FreeBSD__)
> +    size_t size=sizeof(count);
> +    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
> +#elif defined(_SC_NPROCESSORS_ONLN)
> +    count=sysconf(_SC_NPROCESSORS_ONLN);
> +#elif defined(_GLIBCXX_USE_GET_NPROCS)
> +    count=_GLIBCXX_NPROCS;
> +#endif
> +    return (count>0)?count:0;

Er, the macro _GLIBCXX_NPROCS already handles the case sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove the macro _GLIBCXX_NPROCS completely.

-- 
Marc Glisse
[PATCH] PR target/50038 fix: redundant zero extensions removal
Hi,

Here is a patch which fixes the redundant zero extensions problem. The issue is resolved by extending the implicit_zee pass to cover zero and sign extensions of different modes.

Could someone please review it?

Bootstrapped and checked on linux-x86_64.

Thanks,
Ilya
---
2011-11-01  Enkovich Ilya  ilya.enkov...@intel.com

	PR target/50038
	* implicit-zee.c (ext_cand): New.
	(ext_cand_pool): Likewise.
	(add_ext_candidate): New.
	(zee_init): New.
	(zee_cleanup): New.
	(get_reg_di): Removed.
	(combine_set_zero_extend): Get extend candidate as new parameter.
	Now handle sign extend cases and other modes.
	(transform_ifelse): Likewise.
	(merge_def_and_ze): Likewise.
	(combine_reaching_defs): Change parameter type.
	(zero_extend_info): Changed insn_list type.
	(add_removable_zero_extend): Relaxed mode and code filter.
	(find_removable_zero_extends): Changed return type.
	(find_and_remove_ze): Var type changes.
	(rest_of_handle_zee): Init and cleanup added.
	* i386.c (ix86_option_override_internal): Set flag_zee for 32 bit
	platform.

PR50038.diff
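This makes concrete what the implicit-zee pass removes. On x86-64, an instruction that writes a 32-bit register already clears the upper 32 bits, so an explicit zero extension of its result is redundant. The pass may drop the extension only if the preceding definition already yields the zero-extended value. The 32-bit wrap-around behaviour checked below must therefore survive the optimization.

The function is a hypothetical example of source that needs such an extension. Whether a separate extension instruction actually appears depends on the target, the options and the compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: the addition wraps modulo 2^32, and the result is
   then zero-extended to 64 bits.  On x86-64 the 32-bit add already leaves
   the upper half of the register zero, so a separate zero-extension
   instruction after it would be redundant. */
uint64_t widen_sum(uint32_t a, uint32_t b)
{
  uint32_t s = a + b;   /* 32-bit arithmetic, wraps on overflow */
  return (uint64_t) s;  /* zero extension to 64 bits */
}
```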
[PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)
On Tue, Nov 01, 2011 at 11:16:07AM +0100, Uros Bizjak wrote:
> On Tue, Nov 1, 2011 at 10:07 AM, Jakub Jelinek ja...@redhat.com wrote:
>> Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion support
>> for AVX, this one adds V{2,4}DFmode -> unsigned V{4,8}SImode conversion.
>>
>> Ok for trunk?
>
> Please put expander function into i386.c. IMO, this expander can be better
> written using variable mode and indirect functions.

Like this? The advantage is that the fixuns_trunc<mode><sseintvecmodelower>2 pattern can use the helper too and shrink. The disadvantage is that the stmts in the new pattern are now in
vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd
order instead of
vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd;
(not sure why the scheduler didn't change it, but on the other hand that is the scheduler's job).

2011-11-01  Jakub Jelinek  ja...@redhat.com

	* config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New
	prototype.
	* config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New function.
	* config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use it.
	(ssepackfltmode): New mode attr.
	(vec_pack_ufix_trunc_<mode>): New expander.

--- gcc/config/i386/i386-protos.h.jj	2011-10-25 08:13:31.0 +0200
+++ gcc/config/i386/i386-protos.h	2011-11-01 14:18:59.0 +0100
@@ -109,6 +109,7 @@ extern void ix86_expand_convert_uns_sixf
 extern void ix86_expand_convert_uns_sidf_sse (rtx, rtx);
 extern void ix86_expand_convert_uns_sisf_sse (rtx, rtx);
 extern void ix86_expand_convert_sign_didf_sse (rtx, rtx);
+extern rtx ix86_expand_adjust_ufix_to_sfix_si (rtx);
 extern enum ix86_fpcmp_strategy ix86_fp_comparison_strategy (enum rtx_code);
 extern void ix86_expand_fp_absneg_operator (enum rtx_code, enum machine_mode,
 					    rtx[]);
--- gcc/config/i386/i386.c.jj	2011-10-31 20:44:13.0 +0100
+++ gcc/config/i386/i386.c	2011-11-01 14:26:31.0 +0100
@@ -17016,6 +17016,46 @@ ix86_expand_convert_uns_sisf_sse (rtx ta
   emit_move_insn (target, fp_hi);
 }
 
+/* Adjust a V*SFmode/V*DFmode value VAL so that *sfix_trunc* resp. fix_trunc*
+   pattern can be used on it instead of *ufix_trunc* resp. fixuns_trunc*.
+   This is done by subtracting 0x1p32 from VAL if VAL is greater or equal
+   (non-signalling) than 0x1p31.  */
+
+rtx
+ix86_expand_adjust_ufix_to_sfix_si (rtx val)
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  rtx two31r, mtwo32r, tmp[3];
+  enum machine_mode mode = GET_MODE (val);
+  enum machine_mode scalarmode = GET_MODE_INNER (mode);
+  rtx (*cmp) (rtx, rtx, rtx, rtx);
+  int i;
+
+  for (i = 0; i < 3; i++)
+    tmp[i] = gen_reg_rtx (mode);
+  real_ldexp (&TWO31r, &dconst1, 31);
+  two31r = const_double_from_real_value (TWO31r, scalarmode);
+  two31r = ix86_build_const_vector (mode, 1, two31r);
+  two31r = force_reg (mode, two31r);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  mtwo32r = const_double_from_real_value (MTWO32r, scalarmode);
+  mtwo32r = ix86_build_const_vector (mode, 1, mtwo32r);
+  mtwo32r = force_reg (mode, mtwo32r);
+  switch (mode)
+    {
+    case V8SFmode: cmp = gen_avx_cmpv8sf3; break;
+    case V4SFmode: cmp = gen_avx_cmpv4sf3; break;
+    case V4DFmode: cmp = gen_avx_cmpv4df3; break;
+    case V2DFmode: cmp = gen_avx_cmpv2df3; break;
+    default: gcc_unreachable ();
+    }
+  emit_insn (cmp (tmp[0], val, two31r, GEN_INT (29)));
+  tmp[1] = expand_simple_binop (mode, AND, tmp[0], mtwo32r, tmp[1],
+				0, OPTAB_DIRECT);
+  return expand_simple_binop (mode, PLUS, val, tmp[1], tmp[2],
+			      0, OPTAB_DIRECT);
+}
+
 /* A subroutine of ix86_build_signbit_mask.  If VECT is true, then replicate
    the value for all elements of the vector register.
   */
--- gcc/config/i386/sse.md.jj	2011-11-01 09:04:37.0 +0100
+++ gcc/config/i386/sse.md	2011-11-01 14:25:52.0 +0100
@@ -2323,32 +2323,13 @@ (define_insn "fix_truncv4sfv4si2"
    (set_attr "mode" "TI")])
 
 (define_expand "fixuns_trunc<mode><sseintvecmodelower>2"
-  [(set (match_dup 4)
-	(unspec:VF1
-	  [(match_operand:VF1 1 "register_operand" "")
-	   (match_dup 2)
-	   (const_int 29)] UNSPEC_PCMP))
-   (set (match_dup 5)
-	(and:VF1 (match_dup 4) (match_dup 3)))
-   (set (match_dup 6)
-	(plus:VF1 (match_dup 1) (match_dup 5)))
-   (set (match_operand:<sseintvecmode> 0 "register_operand" "")
-	(fix:<sseintvecmode> (match_dup 6)))]
+  [(match_operand:<sseintvecmode> 0 "register_operand" "")
+   (match_operand:VF1 1 "register_operand" "")]
   "TARGET_AVX"
 {
-  REAL_VALUE_TYPE MTWO32r, TWO31r;
-  int i;
-
-  real_ldexp (&TWO31r, &dconst1, 31);
-  operands[2] = const_double_from_real_value (TWO31r, SFmode);
-  operands[2] = ix86_build_const_vector (<MODE>mode, 1, operands[2]);
-  operands[2] = force_reg (<MODE>mode, operands[2]);
-  real_ldexp (&MTWO32r, &dconstm1, 32);
-  operands[3] = const_double_from_real_value (MTWO32r, SFmode);
-  operands[3] =
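The comment on ix86_expand_adjust_ufix_to_sfix_si describes the trick. Lanes that are >= 2^31 have 2^32 subtracted so that a signed truncating conversion can be used, and the 32-bit result is then reinterpreted as unsigned.

Below is a scalar C model of one vec_pack_ufix_trunc step, packing two 2-lane double "vectors" into one 4-lane unsigned vector. It assumes inputs already lie in [0, 2^32). The function names and the lane width are hypothetical. The vector code obtains the conditional -2^32 by ANDing a compare mask with the constant; the model uses a conditional expression instead.

The model is exact for integral inputs, which covers every single-precision value >= 2^31. For a non-integral double such as 3000000000.5, truncating the negative v - 2^32 toward zero rounds up, so the model yields one more than the truncated value.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the adjustment: values >= 2^31 get 2^32 subtracted so a
   signed conversion can be used on them. */
static double adjust_ufix_to_sfix(double v)
{
  const double two31 = 2147483648.0;     /* 0x1p31 */
  const double mtwo32 = -4294967296.0;   /* -0x1p32 */
  /* Vector form: mask = (v >= 2^31) ? all-ones : 0; v + (mask & -2^32). */
  return v + (v >= two31 ? mtwo32 : 0.0);
}

/* One lane: signed truncating conversion, then reinterpret as unsigned
   (int32_t -> uint32_t conversion is defined modulo 2^32). */
static uint32_t ufix_trunc_lane(double v)
{
  assert(v >= 0.0 && v < 4294967296.0);  /* precondition of this model */
  return (uint32_t) (int32_t) adjust_ufix_to_sfix(v);
}

/* Hypothetical model of vec_pack_ufix_trunc for 2-lane doubles: two
   V2DF-like inputs are packed into one V4SI-like output. */
void pack_ufix_trunc(const double a[2], const double b[2], uint32_t out[4])
{
  for (int i = 0; i < 2; i++)
    {
      out[i] = ufix_trunc_lane(a[i]);
      out[i + 2] = ufix_trunc_lane(b[i]);
    }
}
```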
Re: [google] Enable loop unroll/peel notes under -fopt-info
Hi Richard,

Once we have a uniform way to emit notes to either stderr or dump, as you and David had discussed in the earlier thread, we can merge these two messages. The advantage of the new messages, besides going to stderr, is that the source position information is emitted, since it is a note.

I agree that for complete unrolls the constant number of iterations can be omitted (but it is useful for the other types of unrolls/peels). But the execution count is something different: it includes the number of times the loop header executes based on profile information (i.e. iterations * number of times the loop is entered).

Thanks,
Teresa

On Tue, Nov 1, 2011 at 2:53 AM, Richard Guenther richard.guent...@gmail.com wrote:
> On Tue, Nov 1, 2011 at 1:46 AM, Teresa Johnson tejohn...@google.com wrote:
>> This patch is for google-main only.
>>
>> Tested with bootstrap and regression tests.
>>
>> Print unroll and peel factors along with loop source position under -fopt-info.
>>
>> Teresa
>>
>> 2011-10-31  Teresa Johnson  tejohn...@google.com
>>
>> 	* common.opt (fopt-info): Disable -fopt-info by default.
>> 	* loop-unroll.c (report_unroll_peel): New function.
>> 	(unroll_and_peel_loops): Call record_loop_exits for later use.
>> 	(peel_loops_completely): Print the loop source position in dump
>> 	info and emit note under -fopt-info.
>> 	(decide_unroll_and_peeling): Ditto.
>> 	(decide_peel_once_rolling): Record peel factor for use in note
>> 	emission.
>> 	(decide_peel_completely): Ditto.
>> 	* cfgloop.c (get_loop_location): New function.
>> 	* cfgloop.h (get_loop_location): Ditto.
>> 	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Emit note
>> 	under -fopt-info.
>>
>> Index: tree-ssa-loop-ivcanon.c
>> ===
>> --- tree-ssa-loop-ivcanon.c	(revision 180437)
>> +++ tree-ssa-loop-ivcanon.c	(working copy)
>> @@ -52,6 +52,7 @@
>>  #include "flags.h"
>>  #include "tree-inline.h"
>>  #include "target.h"
>> +#include "diagnostic.h"
>>
>>  /* Specifies types of loops that may be unrolled.  */
>>
>> @@ -443,6 +444,17 @@
>>      fprintf (dump_file, "Unrolled loop %d completely by factor %d.\n",
>>               loop->num, (int) n_unroll);
>>
>> +  if (flag_opt_info >= OPT_INFO_MIN)
>> +    {
>> +      location_t locus;
>> +      locus = gimple_location (cond);
>> +
>> +      inform (locus, "Completely Unroll loop by %d (execution count %d, const iterations %d)",
>> +              (int) n_unroll,
>> +              (int) loop->header->count,
>> +              (int) TREE_INT_CST_LOW(niter));
>> +    }
>> +
>
> And this is exactly what I mean with code-duplication.  Two lines above
> we already have "Unrolled loop %d completely by factor %d"; not only do
> you duplicate some diagnostic printing about this fact, you put in useless
> info (complete unroll by N of a loop executing M (?! that's surely N as
> well) times, const iterations O (?! that's surely N as well ...).
>
> Richard.

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
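Richard's objection rests on a simple fact. A complete unroll by N is only possible when the loop runs exactly N iterations, so the unroll factor and the constant iteration count in the proposed note are the same number.

Below is a hypothetical C illustration: a loop with a compile-time trip count of 4, next to the straight-line code a complete unroll turns it into. The execution count Teresa refers to is a separate, profile-derived quantity (iterations times the number of times the loop is entered) and has no source-level counterpart here.

```c
#include <assert.h>

/* Constant trip count of 4: a candidate for complete unrolling. */
int sum4_loop(const int *a)
{
  int s = 0;
  for (int i = 0; i < 4; i++)
    s += a[i];
  return s;
}

/* What complete unrolling by a factor of 4 amounts to: the unroll factor
   equals the iteration count, and no loop remains. */
int sum4_unrolled(const int *a)
{
  return a[0] + a[1] + a[2] + a[3];
}
```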
Re: C++ PATCH for c++/50500 (DR 1082, implicitly declared copy in class with move)
On 10/29/2011 05:07 PM, Eric Botcazou wrote:
>> DR 1082 changed the rules for implicitly declared copy constructors and
>> assignment operators in the presence of move ctor/op= such that if either
>> move operation is present, instead of being suppressed the copy operations
>> will still be declared, but as deleted.
>
> We have detected a side effect of this change by means of -fdump-ada-spec:
> implicit copy assignment operators are now generated in simple cases where
> they were not previously generated, for example:

Oops, thanks.  Fixed thus.

commit 06151eabf195163c8885da36abae67ab60cf1978
Author: Jason Merrill ja...@redhat.com
Date:   Mon Oct 31 16:57:17 2011 -0400

    	PR c++/50500
    	DR 1082
    	* search.c (lookup_fnfields_idx_nolazy): Split out from...
    	(lookup_fnfields_1): ...here.
    	(lookup_fnfields_slot_nolazy): Use it.
    	* cp-tree.h: Declare it.
    	* class.c (type_has_move_assign): Use it.
    	(type_has_user_declared_move_assign): Likewise.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index a014d25..41d182a 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4485,7 +4485,7 @@ type_has_move_assign (tree t)
       lazily_declare_fn (sfk_move_assignment, t);
     }
 
-  for (fns = lookup_fnfields_slot (t, ansi_assopname (NOP_EXPR));
+  for (fns = lookup_fnfields_slot_nolazy (t, ansi_assopname (NOP_EXPR));
        fns; fns = OVL_NEXT (fns))
     if (move_fn_p (OVL_CURRENT (fns)))
       return true;
@@ -4530,7 +4530,7 @@ type_has_user_declared_move_assign (tree t)
   if (CLASSTYPE_LAZY_MOVE_ASSIGN (t))
     return false;
 
-  for (fns = lookup_fnfields_slot (t, ansi_assopname (NOP_EXPR));
+  for (fns = lookup_fnfields_slot_nolazy (t, ansi_assopname (NOP_EXPR));
        fns; fns = OVL_NEXT (fns))
     {
       tree fn = OVL_CURRENT (fns);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7ff1491..ac42e0e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5328,6 +5328,7 @@ extern tree lookup_field_1 (tree, tree, bool);
 extern tree lookup_field (tree, tree, int, bool);
 extern int lookup_fnfields_1 (tree, tree);
 extern tree lookup_fnfields_slot (tree, tree);
+extern tree lookup_fnfields_slot_nolazy (tree, tree);
 extern int class_method_index_for_fn (tree, tree);
 extern tree lookup_fnfields (tree, tree, int);
 extern tree lookup_member (tree, tree, int, bool);
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index 97f593c..5f60eee 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -1335,10 +1335,11 @@ lookup_conversion_operator (tree class_type, tree type)
 }
 
 /* TYPE is a class type. Return the index of the fields within
-   the method vector with name NAME, or -1 if no such field exists.  */
+   the method vector with name NAME, or -1 if no such field exists.
+   Does not lazily declare implicitly-declared member functions.  */
 
-int
-lookup_fnfields_1 (tree type, tree name)
+static int
+lookup_fnfields_idx_nolazy (tree type, tree name)
 {
   VEC(tree,gc) *method_vec;
   tree fn;
@@ -1348,34 +1349,6 @@ lookup_fnfields_1 (tree type, tree name)
   if (!CLASS_TYPE_P (type))
     return -1;
 
-  if (COMPLETE_TYPE_P (type))
-    {
-      if ((name == ctor_identifier
-	   || name == base_ctor_identifier
-	   || name == complete_ctor_identifier))
-	{
-	  if (CLASSTYPE_LAZY_DEFAULT_CTOR (type))
-	    lazily_declare_fn (sfk_constructor, type);
-	  if (CLASSTYPE_LAZY_COPY_CTOR (type))
-	    lazily_declare_fn (sfk_copy_constructor, type);
-	  if (CLASSTYPE_LAZY_MOVE_CTOR (type))
-	    lazily_declare_fn (sfk_move_constructor, type);
-	}
-      else if (name == ansi_assopname (NOP_EXPR))
-	{
-	  if (CLASSTYPE_LAZY_COPY_ASSIGN (type))
-	    lazily_declare_fn (sfk_copy_assignment, type);
-	  if (CLASSTYPE_LAZY_MOVE_ASSIGN (type))
-	    lazily_declare_fn (sfk_move_assignment, type);
-	}
-      else if ((name == dtor_identifier
-		|| name == base_dtor_identifier
-		|| name == complete_dtor_identifier
-		|| name == deleting_dtor_identifier)
-	       && CLASSTYPE_LAZY_DESTRUCTOR (type))
-	lazily_declare_fn (sfk_destructor, type);
-    }
-
   method_vec = CLASSTYPE_METHOD_VEC (type);
   if (!method_vec)
     return -1;
@@ -1445,6 +1418,46 @@ lookup_fnfields_1 (tree type, tree name)
   return -1;
 }
 
+/* TYPE is a class type. Return the index of the fields within
+   the method vector with name NAME, or -1 if no such field exists.  */
+
+int
+lookup_fnfields_1 (tree type, tree name)
+{
+  if (!CLASS_TYPE_P (type))
+    return -1;
+
+  if (COMPLETE_TYPE_P (type))
+    {
+      if ((name == ctor_identifier
+	   || name == base_ctor_identifier
+	   || name == complete_ctor_identifier))
+	{
+	  if (CLASSTYPE_LAZY_DEFAULT_CTOR (type))
+	    lazily_declare_fn (sfk_constructor, type);
+	  if (CLASSTYPE_LAZY_COPY_CTOR (type))
+	    lazily_declare_fn (sfk_copy_constructor, type);
+	  if (CLASSTYPE_LAZY_MOVE_CTOR (type))
+	    lazily_declare_fn (sfk_move_constructor, type);
+	}
+      else if (name == ansi_assopname (NOP_EXPR))
+	{
+	  if (CLASSTYPE_LAZY_COPY_ASSIGN (type))
Re: [C++ Patch] PR 44277
OK. Jason
[wwwdocs] Use regular markup for java/status.html
That does not fix the fact that the status is not up-to-date, but it makes things more consistent and easier to carry along in case of future updates.

Applied.

Gerald

2011-11-01  Gerald Pfeifer  ger...@pfeifer.com

	* status.html: Use <h2> instead of fake tables.

Index: status.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/java/status.html,v
retrieving revision 1.30
diff -u -r1.30 status.html
--- status.html	27 Jul 2004 23:59:38 -	1.30
+++ status.html	1 Nov 2011 14:03:45 -
@@ -17,14 +17,7 @@
 <p>Status of GCJ as of GCC 3.2. Improvements that are only in current
 development versions are marked as in CVS.</p>
 
-<table id="features" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
-  <th align="left">
-  Core Features
-  </th>
-</tr>
-</table>
-<br />
+<h2 id="features">Core Features</h2>
 
 <ul>
 <li>Compile Java source code (ahead-of-time) to native (machine) code,</li>
@@ -43,14 +36,8 @@
 <li>An extensive class library - see below.</li>
 </ul>
 
-<table id="packages" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
-  <th align="left">
-  Implemented Packages
-  </th>
-</tr>
-</table>
+<h2 id="packages">Implemented Packages</h2>
 
 <p>You can also see
 <a href="http://www.kaffe.org/~stuart/japi/">a comparison of libgcj with the JDK</a>.
 This is updated nightly. It
@@ -118,13 +105,8 @@
 a comparison of the GUI branch with Classpath</a>.
 </p>
 
-<table id="targets" border="0" cellpadding="4" width="95%">
-<tr bgcolor="#b0d0ff">
-  <th align="left">
-  Supported Targets
-  </th>
-</tr>
-</table>
+
+<h2 id="targets">Supported Targets</h2>
 
 <dl>
 <dt class="target">GNU/Linux on the Pentium-compatible PCs
[wwwdocs] Prepare GCC 4.7 release notes for the release
...at least somewhat, and also to then serve as a better template for the following release.

Sort ARM, MIPS and picochip alphabetically, add an anchor for MIPS.  Comment out empty sections.

Applied.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html	1 Nov 2011 03:56:01 -	1.52
+++ changes.html	1 Nov 2011 15:10:07 -
@@ -11,6 +11,7 @@
 <body>
 <h1>GCC 4.7 Release Series<br />Changes, New Features, and Fixes</h1>
 
+
 <h2>Caveats</h2>
 
 <ul>
@@ -51,6 +52,7 @@
     obsoleted in GCC 4.6.</li>
 </ul>
 
+
 <h2>General Optimizer Improvements</h2>
 
 <ul>
@@ -168,6 +170,7 @@
 </li>
 </ul>
 
+
 <h2>New Languages and Language specific improvements</h2>
 
 <ul>
@@ -399,10 +402,23 @@
 </ul></li>
 </ul>
 
+<!--
 <h3>Java (GCJ)</h3>
+-->
+
 
 <h2 id="targets">New Targets and Target Specific Improvements</h2>
 
+<h3 id="arm">ARM</h3>
+  <ul>
+    <li>The default vector size in auto-vectorization for NEON is now 128 bits.
+      If vectorization fails thusly, the vectorizer tries again with
+      64-bit vectors.</li>
+    <li>A new option <code>-mvectorize-with-neon-double</code> was added to
+      allow users to change the vector size to 64 bits.</li>
+
+  </ul>
+
 <h3>C6X</h3>
 <ul>
   <li>Support has been added for the Texas Instruments C6X family of
@@ -430,6 +446,14 @@
   <li>...</li>
 </ul>
 
+<!--
+<h3 id="mips">MIPS</h3>
+-->
+
+<!--
+<h3 id="picochip">picochip</h3>
+-->
+
 <h3>PowerPC/PowerPC64</h3>
 <ul>
   <li>Vectors of type <i>vector long long</i> or <i>vector long</i> are
@@ -448,8 +472,6 @@
   </li>
 </ul>
 
-<h3>MIPS</h3>
-
 <h3>SPARC</h3>
 <ul>
   <li>The option <code>-mflat</code> has been reinstated.  When it is
@@ -490,19 +512,11 @@
     default on UltraSPARC T3 (Niagara 3) and later CPUs.</li>
 </ul>
 
-<h3 id="picochip">picochip</h3>
-
-<h3 id="arm">ARM</h3>
-<ul>
-<li>The default vector size in auto-vectorization for NEON is now 128 bits.
-    If vectorization fails thusly, the vectorizer tries again with
-    64-bit vectors.</li>
-<li>A new option <code>-mvectorize-with-neon-double</code> was added to
-    allow users to change the vector size to 64 bits.</li>
-
-  </ul>
+<!--
 <h2>Documentation improvements</h2>
+-->
+
 
 <h2>Other significant improvements</h2>
[libstdc++, patch] Refer to GNU/Linux in acinclude.m4
Applied, based on ongoing exchange with RMS.

Gerald

2011-10-31  Gerald Pfeifer  ger...@pfeifer.com

	* acinclude.m4 (GLIBCXX_CONFIGURE): Refer to GNU/Linux.
	* configure: Regenerate.

Index: acinclude.m4
===
--- acinclude.m4	(revision 180677)
+++ acinclude.m4	(working copy)
@@ -94,8 +94,8 @@
   ## (Right now, this only matters for enable_wchar_t, but nothing prevents
   ## other macros from doing the same.  This should be automated.)  -pme
 
-  # Check for C library flavor since Linux platforms use different configuration
-  # directories depending on the C library in use.
+  # Check for C library flavor since GNU/Linux platforms use different
+  # configuration directories depending on the C library in use.
   AC_EGREP_CPP([_using_uclibc], [
   #include <stdio.h>
   #if __UCLIBC__
Index: configure
===
--- configure	(revision 180677)
+++ configure	(working copy)
@@ -5219,8 +5219,8 @@
   ## (Right now, this only matters for enable_wchar_t, but nothing prevents
   ## other macros from doing the same.  This should be automated.)  -pme
 
-  # Check for C library flavor since Linux platforms use different configuration
-  # directories depending on the C library in use.
+  # Check for C library flavor since GNU/Linux platforms use different
+  # configuration directories depending on the C library in use.
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
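The AC_EGREP_CPP check preprocesses a tiny program that includes `<stdio.h>` and emits a marker only under `#if __UCLIBC__`. The same decision can be sketched directly in C.

This illustrates the preprocessor test only and is not code from libstdc++; the function name is hypothetical. uClibc has traditionally defined `__GLIBC__` as well, for compatibility, so the more specific `__UCLIBC__` test is placed first.

```c
#include <assert.h>
#include <stdio.h>   /* pulls in the C library's feature macros */

/* Hypothetical helper: report which C library flavour the preprocessor
   sees, in the same order of tests a configure check would need. */
const char *c_library_flavor(void)
{
#if defined(__UCLIBC__)
  return "uclibc";   /* tested first: uClibc also defines __GLIBC__ */
#elif defined(__GLIBC__)
  return "glibc";
#else
  return "other";
#endif
}
```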
Re: v2[PATCH] update to libtool-2.4.2 and regenerate
On Mon, 31 Oct 2011, Markus Trippelsdorf wrote:
> This is an updated version of the libtool update patch.
> It fixes the --with-sysroot clash by reverting commit 3334f7ed5851ef1 in
> libtool. I've also included Rainer's 64bit Solaris patch.

For the record, older versions of libtool have references to "Linux" (where RMS would like to see "GNU/Linux"), which this addresses, too. Doing this update is really beneficial from this side as well.

Gerald
Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)
On 11/01/2011 06:35 AM, Jakub Jelinek wrote:
> ... disadvantage is that the stmts in the new pattern are now in
> vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd
> order instead of
> vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd;
> (not sure why the scheduler didn't change it, but on the other side it is
> scheduler's job).

I wonder if the scheduling description didn't get updated properly? If the scheduler believes that each insn takes 1 cycle, and there is only one pipe for them, it won't reorder anything.

> 	* config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New
> 	prototype.
> 	* config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New function.
> 	* config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use it.
> 	(ssepackfltmode): New mode attr.
> 	(vec_pack_ufix_trunc_<mode>): New expander.

Looks good to me.

r~
Re: implementation of std::thread::hardware_concurrency()
On 1 November 2011 11:54, Marc Glisse wrote:
> On Tue, 1 Nov 2011, niXman wrote:
>
>> diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
>> index 09e7fc5..6feda4d 100644
>> --- a/libstdc++-v3/src/thread.cc
>> +++ b/libstdc++-v3/src/thread.cc
>> @@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>    unsigned int
>>    thread::hardware_concurrency() noexcept
>>    {
>> -    int __n = _GLIBCXX_NPROCS;
>> -    if (__n < 0)
>> -      __n = 0;
>> -    return __n;
>> +    int count=0;
>> +#if defined(PTW32_VERSION) || \
>> +    (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
>> +    defined(__hpux)
>> +    count=pthread_num_processors_np();
>> +#elif defined(__APPLE__) || defined(__FreeBSD__)
>> +    size_t size=sizeof(count);
>> +    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
>> +#elif defined(_SC_NPROCESSORS_ONLN)
>> +    count=sysconf(_SC_NPROCESSORS_ONLN);
>> +#elif defined(_GLIBCXX_USE_GET_NPROCS)
>> +    count=_GLIBCXX_NPROCS;
>> +#endif
>> +    return (count>0)?count:0;
>
> Er, the macro _GLIBCXX_NPROCS already handles the case
> sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove
> the macro _GLIBCXX_NPROCS completely.

Right, I already handled the case of using sysconf.

I'm going to veto this patch in its current form - please check how it works now before changing this code.

_GLIBCXX_NPROCS should be made to call pthread_num_processors_np() for mingw or HPUX.
Re: PowerPC shrink-wrap support 3 of 3
On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
> Bits left to do
> - limit size of duplicated tails

Done here. This also fixes a hole: I took no notice of targetm.cannot_copy_insn_p when duplicating tails.

One interesting result is that the tail duplication actually reduces the text size of libstdc++.so from 1074042 to 1073478 bytes on powerpc-linux. The reason is that a shrink-wrapped function that needs a prologue only on paths ending in a sibling call will lose one copy of the epilogue. That must happen often enough to more than make up for the duplicated tails.

Bootstrapped and regression tested powerpc-linux. OK to apply?
(And I won't be posting any more versions of the patch until this is reviewed. Please excuse me for spamming the list.)

	* function.c (bb_active_p): Delete.
	(dup_block_and_redirect, active_insn_between): New functions.
	(convert_jumps_to_returns, emit_return_for_exit): New functions,
	split out from..
	(thread_prologue_and_epilogue_insns): ..here.  Delete
	shadowing variables.  Don't do prologue register clobber tests
	when shrink wrapping already failed.  Delete all last_bb_active
	code.  Instead compute tail block candidates for duplicating
	exit path.  Remove these from antic set.  Duplicate tails when
	reached from both blocks needing a prologue/epilogue and
	blocks not needing such.
	* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
	HAVE_simple_return.
	* bb-reorder.c (get_uncond_jump_length): Make global.
	* bb-reorder.h (get_uncond_jump_length): Declare.
	* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
	(rtl_split_edge): Likewise.  Warning fix.
	(rtl_duplicate_bb): New function.
	(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

Index: gcc/function.c
===
--- gcc/function.c	(revision 180588)
+++ gcc/function.c	(working copy)
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.
 #include "df.h"
 #include "timevar.h"
 #include "vecprim.h"
+#include "params.h"
+#include "bb-reorder.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R
   HARD_REG_SET hardregs;
   unsigned regno;
 
-  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
-    return false;
   if (CALL_P (insn))
     return !SIBLING_CALL_P (insn);
 
@@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump)
     JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if BB has any active insns.  */
+#ifdef HAVE_simple_return
+/* Create a copy of BB instructions and insert at BEFORE.  Redirect
+   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
+static void
+dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
+			bitmap_head *need_prologue)
+{
+  edge_iterator ei;
+  edge e;
+  rtx insn = BB_END (bb);
+
+  /* We know BB has a single successor, so there is no need to copy a
+     simple jump at the end of BB.  */
+  if (simplejump_p (insn))
+    insn = PREV_INSN (insn);
+
+  start_sequence ();
+  duplicate_insn_chain (BB_HEAD (bb), insn);
+  if (dump_file)
+    {
+      unsigned count = 0;
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	if (active_insn_p (insn))
+	  ++count;
+      fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n",
+	       bb->index, copy_bb->index, count);
+    }
+  insn = get_insns ();
+  end_sequence ();
+  emit_insn_before (insn, before);
+
+  /* Redirect all the paths that need no prologue into copy_bb.  */
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+    if (!bitmap_bit_p (need_prologue, e->src->index))
+      {
+	redirect_edge_and_branch_force (e, copy_bb);
+	continue;
+      }
+    else
+      ei_next (ei);
+}
+#endif
+
+#if defined (HAVE_return) || defined (HAVE_simple_return)
+/* Return true if there are any active insns between HEAD and TAIL.  */
 static bool
-bb_active_p (basic_block bb)
+active_insn_between (rtx head, rtx tail)
 {
+  while (tail)
+    {
+      if (active_insn_p (tail))
+	return true;
+      if (tail == head)
+	return false;
+      tail = PREV_INSN (tail);
+    }
+  return false;
+}
+
+/* LAST_BB is a block that exits, and empty of active instructions.
+   Examine its predecessors for jumps that can be converted to
+   (conditional) returns.  */
+static VEC (edge, heap) *
+convert_jumps_to_returns (basic_block last_bb, bool simple_p,
+			  VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
+{
+  int i;
+  basic_block bb;
   rtx label;
+  edge_iterator ei;
+  edge e;
+  VEC(basic_block,heap) *src_bbs;
+
+  src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+  FOR_EACH_EDGE (e, ei, last_bb->preds)
+    if (e->src != ENTRY_BLOCK_PTR)
+      VEC_quick_push (basic_block,
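Shrink-wrapping, which this series implements for PowerPC, places the prologue only on the paths that actually need a stack frame. Below is a hypothetical C function of the kind that benefits. Its early-return fast path makes no call and needs no frame, while its slow path calls another function and does more work afterwards.

Whether GCC actually shrink-wraps it depends on the target and on `-fshrink-wrap`. The example only shows the control-flow shape; the names are made up, and `noinline` is a GCC/Clang attribute used so the call survives.

```c
#include <assert.h>

/* Kept out of line so the slow path really contains a call. */
__attribute__((noinline)) int slow_lookup(int key)
{
  return key * 2;   /* stand-in for an expensive computation */
}

/* Fast path: no call and no callee-saved registers, so no prologue is
   needed there.  Slow path: the call followed by more work needs a stack
   frame, so a shrink-wrapping compiler can emit the prologue on this path
   only. */
int cached_lookup(const int *cache, int key)
{
  if (key >= 0 && key < 4)
    return cache[key];
  return slow_lookup(key) + 1;
}
```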
Re: implementation of std::thread::hardware_concurrency()
What exactly is it that you don't accept in this patch? 2011/11/1 Jonathan Wakely jwakely@gmail.com: On 1 November 2011 11:54, Marc Glisse wrote: On Tue, 1 Nov 2011, niXman wrote:
diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..6feda4d 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unsigned int
   thread::hardware_concurrency() noexcept
   {
-    int __n = _GLIBCXX_NPROCS;
-    if (__n < 0)
-      __n = 0;
-    return __n;
+    int count=0;
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
+    count=pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+    size_t size=sizeof(count);
+    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_SC_NPROCESSORS_ONLN)
+    count=sysconf(_SC_NPROCESSORS_ONLN);
+#elif defined(_GLIBCXX_USE_GET_NPROCS)
+    count=_GLIBCXX_NPROCS;
+#endif
+    return (count>0)?count:0;
Er, the macro _GLIBCXX_NPROCS already handles the case sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove the macro _GLIBCXX_NPROCS completely. Right, I already handled the case of using sysconf. I'm going to veto this patch in its current form - please check how it works now before changing this code. _GLIBCXX_NPROCS should be made to call pthread_num_processors_np() for mingw or HPUX.
Re: implementation of std::thread::hardware_concurrency()
I've put gcc-patches@ back in the CC list and removed gcc@ On 1 November 2011 15:35, niXman wrote: Er, the macro _GLIBCXX_NPROCS already handles the case sysconf(_SC_NPROCESSORS_ONLN). It looks like you actually want to remove the macro _GLIBCXX_NPROCS completely. Fixed. No, this still isn't acceptable. I do not want to see preprocessor tests like +#elif defined(__APPLE__) || defined(__FreeBSD__) in the body of get_thread::hardware_concurrency(), the configure script should determine what is available on the platform and set an appropriate macro. Look at the definition of _GLIBCXX_NPROCS and adjust that to do #define _GLIBCXX_NPROCS pthread_num_processors_np() for the relevant platforms. For the platforms using sysctlbyname there could be an inline function that calls it, and _GLIBCXX_NPROCS could be defined to call that, so that thread::hardware_concurrency() can still be defined as it is today. Please read the code you're changing and understand how it works today before making changes.
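Jonathan's suggestion keeps the body of thread::hardware_concurrency() as it is and moves all platform selection into the _GLIBCXX_NPROCS macro, chosen by configure. Below is a minimal C sketch of that shape. The CFG_USE_* macros and the *_sketch names are hypothetical stand-ins for whatever a configure test would define. The sysconf branch is forced on only so the sketch builds on an ordinary POSIX system. The sysctlbyname() branch includes <sys/sysctl.h>, the header that declares it.

```c
#include <stddef.h>

/* Hypothetical configure results.  A real build would get exactly one of
   these from a generated config header; the sysconf variant is forced on
   here only so that the sketch compiles on an ordinary POSIX system. */
#if !defined(CFG_USE_SYSCTLBYNAME_HW_NCPU)
# define CFG_USE_SC_NPROCESSORS_ONLN 1
#endif

#if defined(CFG_USE_SYSCTLBYNAME_HW_NCPU)
# include <sys/types.h>
# include <sys/sysctl.h>   /* declares sysctlbyname() */
static inline int
get_nprocs_sketch (void)
{
  int count = 0;
  size_t size = sizeof (count);
  /* A { CTL_HW, HW_NCPU } MIB passed to sysctl() would work as well. */
  if (sysctlbyname ("hw.ncpu", &count, &size, NULL, 0) != 0)
    return 0;
  return count;
}
# define NPROCS_SKETCH get_nprocs_sketch ()
#elif defined(CFG_USE_SC_NPROCESSORS_ONLN)
# include <unistd.h>
# define NPROCS_SKETCH ((int) sysconf (_SC_NPROCESSORS_ONLN))
#else
# define NPROCS_SKETCH 0
#endif

/* Same body as the existing libstdc++ function: a negative (error) result
   becomes 0, the value meaning "not computable". */
static unsigned int
hardware_concurrency_sketch (void)
{
  int n = NPROCS_SKETCH;
  if (n < 0)
    n = 0;
  return (unsigned int) n;
}
```

Because the function body contains no platform tests, supporting a new platform only means adding a configure result and one more #elif branch.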
Re: implementation of std::thread::hardware_concurrency()
What header is required for pthread_num_processors_np? <pthread.h> Also, you should include <sys/sysctl.h> before calling sysctlbyname. You're right, yes. sysctlbyname() is implicitly declared by some other header files.
Re: implementation of std::thread::hardware_concurrency()
OK, I'll correct it.
[RFC][cxx-mem-model] mem_signal_fence
Any comments on the expectation, or implementation of signal-fence below? Should I make the distinction between the memory models here at all? At minimum there's another typo in the ifdef section; we really need to minimize those... r~ diff --git a/gcc/builtins.c b/gcc/builtins.c index 756070f..34922a8 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5530,16 +5530,18 @@ expand_builtin_atomic_is_lock_free (tree exp) /* This routine will either emit the mem_thread_fence pattern or issue a sync_synchronize to generate a fence for memory model MEMMODEL. */ +#ifndef HAVE_mem_thread_fence +# define HAVE_mem_thread_fence 0 +# define gen_mem_thread_fence(x) (gcc_unreachable (), NULL_RTX) +#endif + void expand_builtin_mem_thread_fence (enum memmodel model) { - if (model == MEMMODEL_RELAXED) -return; -#ifdef HAVE_mem_thread_fence - emit_insn (gen_mem_thread_fence (GEN_INT (model))); -#else - expand_builtin_sync_synchronize (); -#endif + if (HAVE_mem_thread_fence) +emit_insn (gen_mem_thread_fence (GEN_INT (model))); + else if (model != MEMMODEL_RELAXED) +expand_builtin_sync_synchronize (); } /* Expand the __atomic_thread_fence intrinsic: @@ -5558,15 +5560,38 @@ expand_builtin_atomic_thread_fence (tree exp) /* This routine will either emit the mem_signal_fence pattern or issue a sync_synchronize to generate a fence for memory model MEMMODEL. */ +#ifndef HAVE_mem_signal_fence +# define HAVE_mem_signal_fence 0 +# define gen_mem_signal_fence(x) (gcc_unreachable (), NULL_RTX) +#endif + static void expand_builtin_mem_signal_fence (enum memmodel model) { -#ifdef HAVE_mem_signal_fence - emit_insn (gen_mem_signal_fence (memmodel)); -#else - if (model != MEMMODEL_RELAXED) -expand_builtin_sync_synchronize (); -#endif + if (HAVE_mem_signal_fence) +emit_insn (gen_mem_signal_fence (GEN_INT (model))); + else +{ + rtx x; + + /* By default I expect that targets are coherent between a thread and +the signal handler running on the same thread. 
Thus this really +becomes a compiler barrier, in that stores must not be sunk past +(or raised above) a given point. */ + switch (model) + { + case MEMMODEL_RELAXED: + break; + case MEMMODEL_SEQ_CST: + gen_blockage (); + break; + default: + x = gen_rtx_SCRATCH (Pmode); + x = gen_rtx_MEM (BLKmode, x); + emit_insn (gen_rtx_USE (x)); + break; + } +} } /* Expand the __atomic_signal_fence intrinsic:
Re: implementation of std::thread::hardware_concurrency()
On 1 November 2011 15:57, niXman wrote: What header is required for pthread_num_processors_np? <pthread.h> OK. This assumes that Pthreads is the only abstraction available on __hpux (i.e. that if _GLIBCXX_HAS_GTHREADS is true then we have already included <pthread.h>):
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
Is that assumption safe? Also, you should include <sys/sysctl.h> before calling sysctlbyname. You're right, yes. sysctlbyname() is implicitly declared by some other header files. The manual page says it requires <sys/sysctl.h> so please do that, otherwise a future version of darwin or freebsd might stop implicitly including it and break the code.
Re: [libstdc++, patch] Refer to GNU/Linux in acinclude.m4
On Tue, 1 Nov 2011, Gerald Pfeifer wrote: - # Check for C library flavor since Linux platforms use different configuration - # directories depending on the C library in use. + # Check for C library flavor since GNU/Linux platforms use different + # configuration directories depending on the C library in use. I think this is a case that is definitely referring to platforms using the Linux kernel and not restricted in any way to GNU/Linux platforms (so platforms using the Linux kernel might be a better description in the comment). It's a comment on tests for uClibc and Bionic, and even if you account for some GNU code present in uClibc, Bionic is the C library for Android which is the canonical example of a Linux system which is not GNU/Linux (no GPL code in userspace) - the test is for whether a Linux system is GNU/Linux or not. -- Joseph S. Myers jos...@codesourcery.com
Re: implementation of std::thread::hardware_concurrency()
On 1 November 2011 16:01, Jonathan Wakely wrote: On 1 November 2011 15:57, niXman wrote: What header is required for pthread_num_processors_np? <pthread.h> OK. This assumes that Pthreads is the only abstraction available on __hpux (i.e. that if _GLIBCXX_HAS_GTHREADS is true then we have already included <pthread.h>):
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
Is that assumption safe? OK, gthr-dec.h includes <pthread.h> so I think it is safe. Do all supported versions of Pthreads-win32, mingw64 and HPUX define pthread_num_processors_np() in <pthread.h>? They might not, which is why there should be a configure test checking for the availability of that function, which sets a macro such as _GLIBCXX_USE_PTHREAD_NUM_PROCESSORS_NP, which is then checked in src/thread.cc
Re: [libstdc++, patch] Refer to GNU/Linux in acinclude.m4
On Tue, 1 Nov 2011, Joseph S. Myers wrote: + # Check for C library flavor since GNU/Linux platforms use different + # configuration directories depending on the C library in use. I think this is a case that is definitely referring to platforms using the Linux kernel and not restricted in any way to GNU/Linux platforms (so platforms using the Linux kernel might be a better description in the comment). It's a comment on tests for uClibc and Bionic, and even if you account for some GNU code present in uClibc, Bionic is the C library for Android which is the canonical example of a Linux system which is not GNU/Linux (no GPL code in userspace) - the test is for whether a Linux system is GNU/Linux or not. I was thinking of that, and agree it's a border case. Given that significant parts of the GNU toolchain are being used here it's not just about the Linux kernel, but also at least some parts of GNU and from that point it gets messy pretty quickly. (Luckily I have not seen GNU/Solaris being suggested yet.) If you feel this is simply a mistake, happy to change to platforms using the Linux kernel or start a conversation with you and RMS (though the latter may be overkill). Let me know. Gerald
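The comment being discussed sits above configure tests that distinguish uClibc and Bionic from glibc on Linux-kernel systems. The same distinction can be sketched at the C level using the identification macros these libraries define. This is an illustration only. It assumes __UCLIBC__, __BIONIC__ and __GLIBC__ as the markers and is not the acinclude.m4 code itself, which runs at configure time. libc_flavor is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>   /* any libc header brings in the library's identification macros */

static const char *
libc_flavor (void)
{
#if defined(__UCLIBC__)
  /* Checked first: uClibc also defines __GLIBC__ for compatibility. */
  return "uClibc";
#elif defined(__BIONIC__)
  return "Bionic";
#elif defined(__GLIBC__)
  return "glibc";
#else
  return "other";
#endif
}
```

The ordering of the checks matters for the same reason the configure comment does. On a Linux-kernel system, "which C library" is a separate question from "which kernel".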
Re: building binutils from same directory as gcc
On Nov 1, 2011, at 4:27 AM, Andrew Haley wrote: On 10/30/2011 01:51 PM, Gerald Pfeifer wrote: Why not just declare that building from the same directory is not supported and have one simple set of instructions that always works, as opposed to this ought to work with snapshots but not with direct checkouts? That's right. Is there ever any advantage to building in-srcdir? Yes. You can do configure && make && make install.
Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
that means some existing bugs get exposed. Your previous version simply skipped the target mem refs. You will need to debug the problem a little more. David On Tue, Nov 1, 2011 at 5:26 AM, dvyu...@google.com wrote: On 2011/10/31 06:08:34, davidxl wrote: http://codereview.appspot.com/5303083/diff/1/gcc/passes.c#newcode1423 gcc/passes.c:1423: NEXT_PASS (pass_tsan); Move this to the same place as asan. Otherwise TARGET_MEM_REF won't be handled. After I moved the pass it started crashing: Program received signal SIGSEGV, Segmentation fault. 0x00718f94 in is_gimple_reg_type (t=0x7771efa0) at gimple.c:2960 2960 return !AGGREGATE_TYPE_P (type); (gdb) bt #0 0x00718f94 in is_gimple_reg_type (t=0x7771efa0) at gimple.c:2960 #1 is_gimple_val (t=0x7771efa0) at gimple.c:3028 #2 0x008a5d20 in verify_types_in_gimple_reference (expr=0x776b74c0, require_lvalue=false) at tree-cfg.c:2934 #3 0x008b2d4f in verify_gimple_in_cfg (fn=0x777c67e0) at tree-cfg.c:4382 #4 0x00a061d6 in verify_ssa (check_modified_stmt=true) at tree-ssa.c:924 #5 0x007f755c in execute_function_todo (data=Unhandled dwarf expression opcode 0xf3 ) at passes.c:1727 #6 0x007f7e4d in execute_todo (flags=34854) at passes.c:1758 #7 0x007fafda in execute_one_pass (pass=0x122a900) at passes.c:2104 The code seems to be (however I am not 100% sure): D.3617_33 = MEM[(const uint64_t *)ctx_11(D)].nhkey[D.3612_26]{lb: D.3810_279 sz: 8}; http://codereview.appspot.com/5303083/
Re: implementation of std::thread::hardware_concurrency()
On Nov 1, 2011, at 8:55 AM, Jonathan Wakely wrote: Is there a reason you used hw.ncpu not the constant HW_NCPU ? I suspect on some systems, this would be a runtime value so, no fixed constant could ever work.
Re: implementation of std::thread::hardware_concurrency()
On 1 November 2011 17:06, Mike Stump wrote: On Nov 1, 2011, at 8:55 AM, Jonathan Wakely wrote: Is there a reason you used "hw.ncpu" not the constant HW_NCPU? I suspect on some systems, this would be a runtime value so, no fixed constant could ever work. It's a constant for identifying the sysctl, not a constant for the number of processors e.g. (untested)
  int mib[] = { CTL_HW, HW_NCPU };
  if (!sysctl(mib, 2, &count, &size, NULL, 0))
The Mac OS X man page says the sysctl() function runs in about a third the time as the same request made via the sysctlbyname() function. My preferred solution (which would be consistent with the existing code, and additionally support NetBSD, OpenBSD and Irix) would be to add autoconf tests for the required functionality, then:
#if defined(_GLIBCXX_USE_GET_NPROCS)
# include <sys/sysinfo.h>
# define _GLIBCXX_NPROCS get_nprocs()
#elif defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROCESSORS_ONLN)
#elif defined(_GLIBCXX_USE_SC_NPROC_ONLN)
# include <unistd.h>
# define _GLIBCXX_NPROCS sysconf(_SC_NPROC_ONLN)
#elif defined(_GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP)
# define _GLIBCXX_NPROCS pthread_num_processors_np()
#elif defined(_GLIBCXX_USE_SYSCTLBYNAME_HW_NCPU)
# include <sys/sysctl.h>
static inline int get_nprocs()
{
  int count;
  size_t size = sizeof(count);
  int mib[] = { CTL_HW, HW_NCPU };
  if (!sysctl(mib, 2, &count, &size, NULL, 0))
    return count;
  return 0;
}
# define _GLIBCXX_NPROCS get_nprocs()
#else
# define _GLIBCXX_NPROCS 0
#endif
...
unsigned int
thread::hardware_concurrency() noexcept
{
  int __n = _GLIBCXX_NPROCS;
  if (__n < 0)
    __n = 0;
  return __n;
}
Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c File gcc/tree-tsan.c (right): http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c#newcode1075 gcc/tree-tsan.c:1075: for (eidx = 0; VEC_iterate (edge, exit_bb->preds, eidx, e); eidx++) Use FOR_EACH_EDGE macro http://codereview.appspot.com/5303083/diff/3002/gcc/tree-tsan.c#newcode1082 gcc/tree-tsan.c:1082: gsi_insert_seq_before (&gsi, post_func_seq, GSI_SAME_STMT); On 2011/11/01 11:39:49, dvyukov wrote: Do I need to make a copy of POST_FUNC_SEQ here? I think that I do not need location info for this code at all, so is it OK to leave the seq w/o location and then insert it into several basic blocks? Yes, do not share gimple stmts. http://codereview.appspot.com/5303083/
Re: implementation of std::thread::hardware_concurrency()
On Nov 1, 2011, at 10:13 AM, Jonathan Wakely wrote: I suspect on some systems, this would be a runtime value so, no fixed constant could ever work. It's a constant for identifying the sysctl, not a constant for the number of processors e.g. (untested) Ah, never mind, ignore me.
[Patch, libfortran] PR 46686 Implement backtrace using libgcc functionality
Hi, the attached patch changes the backtracing functionality, which is used to print a stack trace before aborting when something goes belly-up, to use the stack unwinding functionality provided by libgcc instead of using the glibc backtrace_symbols and backtrace_symbols_fd functions, or the pstack utility which is available on some systems (Solaris?). There are some nice benefits of this: - It should work on all targets, not only those which use glibc or pstack. - It gets the correct line numbers, whereas the backtrace_symbols_fd output was usually (but not always) offset by one. This is probably related to the use of _Unwind_GetIPInfo and in some cases decrementing the IP. - Based on some googling, it's a bit unclear whether backtrace() and/or backtrace_symbols_fd() actually are async-signal-safe due to usage of dlsym/dladdr and such. It still uses addr2line if available to print out function and file names and line numbers. If addr2line is not found on the path during program startup, it resorts to printing out the addresses only. Regtested on x86_64-unknown-linux-gnu, Ok for trunk? 2011-11-01 Janne Blomqvist j...@gcc.gnu.org PR fortran/46686 * configure.ac: Don't check execinfo.h, backtrace, backtrace_symbols_fd. Check execve instead of execvp. Call GCC_CHECK_UNWIND_GETIPINFO. * runtime/backtrace.c: Don't include unused headers, include limits.h and unwind.h. (CAN_FORK): Check execve instead of execvp. (GLIBC_BACKTRACE): Remove. (bt_header): Conform to gdb backtrace format. (struct bt_state): New struct. (trace_function): New function. (show_backtrace): Use _Unwind_Backtrace from libgcc instead of glibc backtrace functions. 
-- Janne Blomqvist diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac index 74cfe44..32431c0 100644 --- a/libgfortran/configure.ac +++ b/libgfortran/configure.ac @@ -249,7 +249,7 @@ AC_HEADER_TIME AC_HAVE_HEADERS(stdio.h stdlib.h string.h unistd.h signal.h stdarg.h) AC_CHECK_HEADERS(time.h sys/time.h sys/times.h sys/resource.h) AC_CHECK_HEADERS(sys/types.h sys/stat.h sys/wait.h floatingpoint.h ieeefp.h) -AC_CHECK_HEADERS(fenv.h fptrap.h float.h execinfo.h pwd.h) +AC_CHECK_HEADERS(fenv.h fptrap.h float.h pwd.h) AC_CHECK_HEADER([complex.h],[AC_DEFINE([HAVE_COMPLEX_H], [1], [complex.h exists])]) GCC_HEADER_STDINT(gstdint.h) @@ -261,14 +261,11 @@ AC_CHECK_MEMBERS([struct stat.st_rdev]) AC_CHECK_FUNCS(getrusage times mkstemp strtof strtold snprintf ftruncate chsize) AC_CHECK_FUNCS(chdir strerror getlogin gethostname kill link symlink perror) AC_CHECK_FUNCS(sleep time ttyname signal alarm clock access fork execl) -AC_CHECK_FUNCS(wait setmode execvp pipe dup2 close fdopen strcasestr getrlimit) +AC_CHECK_FUNCS(wait setmode execve pipe dup2 close fdopen strcasestr getrlimit) AC_CHECK_FUNCS(gettimeofday stat fstat lstat getpwuid vsnprintf dup getcwd) AC_CHECK_FUNCS(localtime_r gmtime_r strerror_r getpwuid_r ttyname_r) AC_CHECK_FUNCS(clock_gettime strftime readlink) -# Check for glibc backtrace functions -AC_CHECK_FUNCS(backtrace backtrace_symbols_fd) - # Check libc for getgid, getpid, getuid AC_CHECK_LIB([c],[getgid],[AC_DEFINE([HAVE_GETGID],[1],[libc includes getgid])]) AC_CHECK_LIB([c],[getpid],[AC_DEFINE([HAVE_GETPID],[1],[libc includes getpid])]) @@ -562,6 +559,9 @@ LIBGFOR_CHECK_UNLINK_OPEN_FILE # Check whether line terminator is LF or CRLF LIBGFOR_CHECK_CRLF +# Check whether we have _Unwind_GetIPInfo for backtrace +GCC_CHECK_UNWIND_GETIPINFO + AC_CACHE_SAVE if test ${multilib} = yes; then diff --git a/libgfortran/runtime/backtrace.c b/libgfortran/runtime/backtrace.c index 7d6479f..70aae91 100644 --- a/libgfortran/runtime/backtrace.c +++ 
b/libgfortran/runtime/backtrace.c @@ -26,46 +26,38 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #include string.h -#ifdef HAVE_STDLIB_H -#include stdlib.h -#endif - -#ifdef HAVE_INTTYPES_H -#include inttypes.h -#endif - #ifdef HAVE_UNISTD_H #include unistd.h #endif -#ifdef HAVE_EXECINFO_H -#include execinfo.h -#endif - #ifdef HAVE_SYS_WAIT_H #include sys/wait.h #endif -#include ctype.h +#include limits.h + +#include unwind.h /* Macros for common sets of capabilities: can we fork and exec, can we use glibc-style backtrace functions, and can we use pipes. */ -#define CAN_FORK (defined(HAVE_FORK) defined(HAVE_EXECVP) \ +#define CAN_FORK (defined(HAVE_FORK) defined(HAVE_EXECVE) \ defined(HAVE_WAIT)) -#define GLIBC_BACKTRACE (defined(HAVE_BACKTRACE) \ - defined(HAVE_BACKTRACE_SYMBOLS_FD)) #define CAN_PIPE (CAN_FORK defined(HAVE_PIPE) \ defined(HAVE_DUP2) defined(HAVE_FDOPEN) \ defined(HAVE_CLOSE)) +#ifndef PATH_MAX +#define PATH_MAX 4096 +#endif + /* GDB style #NUM index for each stack frame. */ static void bt_header (int num) { - st_printf ( #%d , num); + st_printf (#%d , num); } @@ -106,24 +98,105 @@
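The core of the new show_backtrace() is a callback handed to libgcc's _Unwind_Backtrace(). Below is a minimal sketch of that mechanism. It assumes _Unwind_GetIPInfo() is available, which is what the GCC_CHECK_UNWIND_GETIPINFO configure test in the patch checks for. struct frame_collector and the function names are hypothetical, not the patch's struct bt_state. The sketch only collects program counters and leaves out symbolization (the addr2line step).

```c
#include <stddef.h>
#include <unwind.h>

/* Per-walk state handed to the callback (hypothetical). */
struct frame_collector
{
  size_t count;
  size_t max;
  _Unwind_Ptr *pcs;
};

static _Unwind_Reason_Code
collect_frame (struct _Unwind_Context *ctx, void *arg)
{
  struct frame_collector *fc = arg;
  int ip_before_insn = 0;
  _Unwind_Ptr ip = _Unwind_GetIPInfo (ctx, &ip_before_insn);

  /* For ordinary frames IP is the return address, just past the call;
     step back so a later addr2line lookup lands on the call itself. */
  if (!ip_before_insn && ip != 0)
    ip--;

  if (fc->count == fc->max)
    return _URC_END_OF_STACK;   /* any value but _URC_NO_REASON stops the walk */
  fc->pcs[fc->count++] = ip;
  return _URC_NO_REASON;
}

/* Fill PCS with up to MAX program counters, innermost frame first. */
static size_t
collect_backtrace (_Unwind_Ptr *pcs, size_t max)
{
  struct frame_collector fc = { 0, max, pcs };
  _Unwind_Backtrace (collect_frame, &fc);
  return fc.count;
}
```

Decrementing the IP when ip_before_insn is 0 matches the mail's remark that correct line numbers are probably related to _Unwind_GetIPInfo and "in some cases decrementing the IP".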
Re: [PATCH, rs6000] Preserve link stack for 476 cpus
On Mon, 2011-10-31 at 19:05 -0400, David Edelsohn wrote: Okay, go ahead with PPC64 support as well. Hopefully no one ever will have to use it. That implies the option should not explicitly reference ppc476. Ok, for completeness, I attached what I committed below, which includes the support for 64-bit because it makes the code cleaner and changes the option name back to -mpreserve-link-stack. Thanks. Peter * config.gcc (powerpc*-*-linux*): Add powerpc*-*-linux*ppc476* variant. * config/rs6000/476.h: New file. * config/rs6000/476.opt: Likewise. * config/rs6000/rs6000.h (TARGET_LINK_STACK): New define. (SET_TARGET_LINK_STACK): Likewise. (TARGET_ASM_CODE_END): Define. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable TARGET_LINK_STACK for -mtune=476 and -mtune=476fp. (rs6000_legitimize_tls_address): Emit the link stack preserving GOT code if TARGET_LINK_STACK. (rs6000_emit_load_toc_table): Likewise. (output_function_profiler): Likewise (macho_branch_islands): Likewise (machopic_output_stub): Likewise (get_ppc476_thunk_name): New function. (rs6000_code_end): Likewise. * config/rs6000/rs6000.md (load_toc_v4_PIC_1, load_toc_v4_PIC_1b): Convert to a define_expand. (load_toc_v4_PIC_1_normal): New define_insn. (load_toc_v4_PIC_1_476): Likewise. (load_toc_v4_PIC_1b_normal): Likewise. (load_toc_v4_PIC_1b_476): Likewise. Index: gcc/config.gcc === --- gcc/config.gcc (revision 180740) +++ gcc/config.gcc (revision 180741) @@ -2145,6 +2145,9 @@ powerpc-*-linux* | powerpc64-*-linux*) esac tmake_file=${tmake_file} t-slibgcc-libgcc case ${target} in + powerpc*-*-linux*ppc476*) + tm_file=${tm_file} rs6000/476.h + extra_options=${extra_options} rs6000/476.opt ;; powerpc*-*-linux*altivec*) tm_file=${tm_file} rs6000/linuxaltivec.h ;; powerpc*-*-linux*spe*) Index: gcc/config/rs6000/476.h === --- gcc/config/rs6000/476.h (revision 0) +++ gcc/config/rs6000/476.h (revision 180741) @@ -0,0 +1,32 @@ +/* Enable IBM PowerPC 476 support. 
+ Copyright (C) 2011 Free Software Foundation, Inc. + Contributed by Peter Bergner (berg...@vnet.ibm.com) + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. */ + +#undef TARGET_LINK_STACK +#define TARGET_LINK_STACK (rs6000_link_stack) + +#undef SET_TARGET_LINK_STACK +#define SET_TARGET_LINK_STACK(X) do { TARGET_LINK_STACK = (X); } while (0) + +#undef TARGET_ASM_CODE_END +#define TARGET_ASM_CODE_END rs6000_code_end Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 180740) +++ gcc/config/rs6000/rs6000-protos.h (revision 180741) @@ -173,6 +173,7 @@ extern void rs6000_emit_eh_reg_restore ( extern const char * output_isel (rtx *); extern void rs6000_call_indirect_aix (rtx, rtx, rtx); extern void rs6000_aix_asm_output_dwarf_table_ref (char *); +extern void get_ppc476_thunk_name (char name[32]); /* Declare functions in rs6000-c.c */ Index: gcc/config/rs6000/476.opt === --- gcc/config/rs6000/476.opt (revision 0) +++ gcc/config/rs6000/476.opt (revision 180741) @@ -0,0 +1,24 @@ +; IBM PowerPC 476 options. +; +; Copyright (C) 2011 Free Software Foundation, Inc. 
+; Contributed by Peter Bergner (berg...@vnet.ibm.com) +; +; This file is part of GCC. +; +; GCC is free software; you can redistribute it and/or modify it under +; the terms of the GNU General Public License as published by the Free +; Software Foundation; either version 3, or (at your option) any later +; version. +; +; GCC is distributed in the hope that it will be useful, but WITHOUT ANY +; WARRANTY; without even the implied warranty
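476.h follows a common GCC pattern: a general header provides defaults for TARGET_LINK_STACK and SET_TARGET_LINK_STACK, and a subtarget header #undefs and redefines them. The sketch below compresses this into a single file. The default definitions and override_options() are hypothetical, because the rs6000.h hunk is not shown here. The overriding definitions are copied from 476.h. The do { ... } while (0) wrapper is what lets "SET_TARGET_LINK_STACK (X);" sit safely in an unbraced if/else.

```c
/* Hypothetical general-header defaults: the feature is compiled out. */
#define TARGET_LINK_STACK 0
#define SET_TARGET_LINK_STACK(X) do { } while (0)

/* Stand-in for the option variable that 476.h refers to. */
static int rs6000_link_stack;

/* Subtarget override, as in the 476.h hunk above. */
#undef TARGET_LINK_STACK
#define TARGET_LINK_STACK (rs6000_link_stack)

#undef SET_TARGET_LINK_STACK
#define SET_TARGET_LINK_STACK(X) do { TARGET_LINK_STACK = (X); } while (0)

/* Hypothetical stand-in for the -mtune=476 handling in
   rs6000_option_override_internal.  Each macro call expands to a single
   statement, so the unbraced if/else below stays well-formed. */
static void
override_options (int tune_is_476)
{
  if (tune_is_476)
    SET_TARGET_LINK_STACK (1);
  else
    SET_TARGET_LINK_STACK (0);
}
```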
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote: Diego mentioned that we can move the asan pass somewhere to the very end, just before lowering to RTL. Where would be this blessed place? Does it still have TARGET_MEM_REF? Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. TARGET_MEM_REFs are converted to RTL mems during RTL expansion. Diego.
Re: [PATCH] Add vec_pack_ufix_trunc_{v4df,v2df} expanders (take 2)
On Tue, Nov 1, 2011 at 2:35 PM, Jakub Jelinek ja...@redhat.com wrote: Similarly to the V{4,8}SFmode -> unsigned V{4,8}SImode conversion support for AVX this one adds V{2,4}DFmode -> unsigned V{4,8}SImode conversion. Ok for trunk? Please put expander function into i386.c. IMO, this expander can be better written using variable mode and indirect functions. Like this? Advantage is that fixuns_trunc<mode><sseintvecmodelower>2 pattern can use the helper too and shrink, disadvantage is that the stmts in the new pattern are now in vcmppd; vandpd; vaddpd; vcmppd; vandpd; vaddpd order instead of vcmppd; vcmppd; vandpd; vandpd; vaddpd; vaddpd; (not sure why the scheduler didn't change it, but on the other side it is scheduler's job). 2011-11-01 Jakub Jelinek ja...@redhat.com * config/i386/i386-protos.h (ix86_expand_adjust_ufix_to_sfix_si): New prototype. * config/i386/i386.c (ix86_expand_adjust_ufix_to_sfix_si): New function. * config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): Use it. (ssepackfltmode): New mode attr. (vec_pack_ufix_trunc_<mode>): New expander. OK. Thanks, Uros.
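A signed double-to-int32 truncation only covers values below 2^31. The usual way to get an unsigned conversion out of it is to subtract 2^31 from the values that are too large, convert, and then set the top bit again in the integer domain. The compare/and/add sequence in the listing above is consistent with this approach. The following is a scalar C sketch of the general technique. It assumes inputs in [0, 2^32), and ufix_via_sfix is a hypothetical name. It is not a transcription of ix86_expand_adjust_ufix_to_sfix_si.

```c
#include <stdint.h>

/* Convert X, assumed to lie in [0, 2^32), to uint32_t using only a signed
   double -> int32 truncation, which is valid just below 2^31. */
static uint32_t
ufix_via_sfix (double x)
{
  const double two31 = 2147483648.0;

  /* "Compare" step: 1 if X is too large for a signed conversion. */
  uint32_t big = (x >= two31);

  /* "Mask and add" step: bias only the values that are too large. */
  double adjusted = x - (big ? two31 : 0.0);

  /* Signed truncation, now in range. */
  int32_t s = (int32_t) adjusted;

  /* Restore the bias in the integer domain by setting bit 31 again. */
  return (uint32_t) s ^ (big << 31);
}
```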
Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
Hi, sorry that I'm not using the fancy web tool but I do not want to use my google account and gmail address in particular for work-related stuff. On Tue, Nov 01, 2011 at 06:05:46PM +, davi...@google.com wrote: ... http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode638 gcc/tree-tsan.c:638: "_vptr.", sizeof ("_vptr.") - 1) == 0) This is a very hacky way of recognizing vptr field. C++ FE provides TYPE_VFIELD macro to get the vptr field, but you will need to add a new langhook for it -- which is not liked in upstream -- so the hacky way may be ok (as it is for error checking purpose). If you have a FIELD_DECL and want to check whether it is a VPTR, you can simply use DECL_VIRTUAL_P. Martin
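For reference, the name-based test under review compares only the first sizeof ("_vptr.") - 1 characters, because sizeof on a string literal counts the terminating NUL. The sketch below shows that idiom; the function and macro names are hypothetical. As Martin points out, DECL_VIRTUAL_P on the FIELD_DECL is the robust check inside GCC, and this sketch is not meant as a replacement for it.

```c
#include <stdbool.h>
#include <string.h>

#define VPTR_PREFIX "_vptr."

/* True if FIELD_NAME starts with "_vptr.".  sizeof on a string literal
   includes the terminating NUL, hence the "- 1". */
static bool
looks_like_vptr_name (const char *field_name)
{
  return field_name != NULL
         && strncmp (field_name, VPTR_PREFIX, sizeof (VPTR_PREFIX) - 1) == 0;
}
```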
Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
On 11-11-01 15:26 , Martin Jambor wrote: Hi, sorry that I'm not using the fancy web tool but I do not want to use my google account and gmail address in particular for work-related stuff. No worries. You do not need to use the web tool at all. You can simply reply to these messages. As long as you keep re...@codereview.appspotmail.com in the CC and do not remove the (issue NN) string from the subject, your message will be added to the issue log (similarly to how bugzilla works). Diego.
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Nov 1, 2011 at 12:16 PM, Diego Novillo dnovi...@google.com wrote: On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote: Diego mentioned that we can move the asan pass somewhere to the very end, just before lowering to RTL. Where would be this blessed place? Does it still have TARGET_MEM_REF? Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. Why? TARGET_MEM_REFs are converted to RTL mems during RTL expansion. What? they will still be seen by asan which can not be handled (e.g, creating address expression out of it). David Diego.
Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
On Tue, Nov 1, 2011 at 12:26 PM, Martin Jambor mjam...@suse.cz wrote: Hi, sorry that I'm not using the fancy web tool but I do not want to use my google account and gmail address in particular for work-related stuff. On Tue, Nov 01, 2011 at 06:05:46PM +, davi...@google.com wrote: ... http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode638 gcc/tree-tsan.c:638: _vptr., sizeof (_vptr.) - 1) == 0) This is a very hacky way of recognizing vptr field. C++ FE provides TYPE_VFIELD macro to get the vptr field, but you will need to add a new langhook for it -- which is not liked in upstream -- so the hacky way may be ok (as it is for error checking purpose). If you have a FIELD_DECL and want to check whether it is a VPTR, you can simply use DECL_VIRTUAL_P. ah yes, that will do. thanks, David Martin
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On 11-11-01 15:34 , Xinliang David Li wrote: Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. Why? The idea was to experiment where to best place ASAN to avoid instrumenting too much. If we schedule it really late, then we may save ourselves some unnecessary instrumentation. Though, I still think ASAN should never open code the library calls directly. Rather, it should emit straight-code gimple that can be better understood and optimized away. TARGET_MEM_REFs are converted to RTL mems during RTL expansion. What? they will still be seen by asan which can not be handled (e.g, creating address expression out of it). So, it needs to run before TMRs are introduced then. *shrug*.
Re: [patch] support for multiarch systems
On Sun, 21 Aug 2011, Matthias Klose wrote: On 08/20/2011 09:51 PM, Matthias Klose wrote: Multiarch [1] is the term being used to refer to the capability of a system to install and run applications of multiple different binary targets on the same system. The idea and name of multiarch dates back to 2004/2005 [2] (not to be confused with multiarch in glibc). attached is an updated patch which includes feedback from Jakub and Joseph. Hello, what is the status of this patch? Is it waiting for a review? Having gcc 4.7 work out of the box on 2 of the most popular linux distributions seems like an important feature... -- Marc Glisse
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Nov 1, 2011 at 12:41 PM, Diego Novillo dnovi...@google.com wrote: On 11-11-01 15:34 , Xinliang David Li wrote: Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. Why? The idea was to experiment where to best place ASAN to avoid instrumenting too much. If we schedule it really late, then we may save ourselves some unnecessary instrumentation. It needs to be balanced -- on one hand it needs to be as late as possible so that as few memory references (dynamically executed) as possible are instrumented. On the other hand, early enough so that the instrumented code can be optimized sufficiently. Though, I still think ASAN should never open code the library calls directly. Rather, it should emit straight-code gimple that can be better understood and optimized away. that depends on the library function themselves -- if they are trivial, inline sequence should be generated. TARGET_MEM_REFs are converted to RTL mems during RTL expansion. What? they will still be seen by asan which can not be handled (e.g, creating address expression out of it). So, it needs to run before TMRs are introduced then. *shrug*. yes it should be before ivopt as discussed. David
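Whether a check is emitted as a library call or as straight-line code, AddressSanitizer's per-access fast path is small, which is what makes inlining it attractive. The toy sketch below shows that inline check. It assumes ASan's usual encoding of one shadow byte per 8-byte granule: 0 means the whole granule is addressable, 1..7 means only that many leading bytes are addressable, and a negative value means poisoned. The shadow array stands in for real shadow memory at (addr >> 3) + offset. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: 64 bytes of "application memory", one shadow byte per
   8-byte granule. */
enum { APP_SIZE = 64, GRANULE = 8 };
static int8_t shadow[APP_SIZE / GRANULE];

/* Mark the first LEN bytes of the toy region addressable, the rest poisoned. */
static void
toy_poison_after (size_t len)
{
  for (size_t g = 0; g < APP_SIZE / GRANULE; g++)
    {
      size_t start = g * GRANULE;
      if (start + GRANULE <= len)
        shadow[g] = 0;
      else if (start < len)
        shadow[g] = (int8_t) (len - start);
      else
        shadow[g] = -1;
    }
}

/* Inline fast-path check for a SIZE-byte access (1, 2, 4 or 8) at OFFSET,
   assuming the access does not cross a granule boundary. */
static bool
toy_access_ok (size_t offset, size_t size)
{
  int8_t k = shadow[offset / GRANULE];
  if (k == 0)
    return true;   /* common case: one load, one compare */
  return (int8_t) ((offset % GRANULE) + size - 1) < k;
}
```

Most accesses take the k == 0 branch, so the inline sequence stays a load and a compare. The rarer partial-granule comparison is what a later optimizer could still see and simplify, which is harder when everything sits behind an opaque call.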
Re: RFC: PATCH to adjust warning flags for C++
On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrill ja...@redhat.com wrote: Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus -Wall) broke bootstrap because of narrowing warnings, so I'd like to add -Wno-narrowing to the stage 2+ warning flags. Is this the best way to do that? why do we want to include -Wc++0x-compat in -Wall?
[PATCH, i386]: Fix PR50940, ICE in extract_insn, at recog.c:2137 during bootstrap
Hello! Fix a typo.

2011-10-30  Uros Bizjak  ubiz...@gmail.com

	PR target/50940
	* config/i386/i386.md (floatsi<mode>2_vector_sse_with_temp splitter):
	Compare <ssevecmode>mode with V4SFmode, not V4SImode.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md	(revision 180741)
+++ config/i386/i386.md	(working copy)
@@ -5053,7 +5053,7 @@
 	emit_insn (gen_sse2_loadld (operands[4],
 				    CONST0_RTX (V4SImode), operands[2]));
     }
-  if (<ssevecmode>mode == V4SImode)
+  if (<ssevecmode>mode == V4SFmode)
     emit_insn (gen_floatv4siv4sf2 (operands[3], operands[4]));
   else
     emit_insn (gen_sse2_cvtdq2pd (operands[3], operands[4]));
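The two branches of that splitter emit different SSE2 conversions. The `gen_floatv4siv4sf2` path is `cvtdq2ps` and yields single-precision lanes. The `gen_sse2_cvtdq2pd` path is `cvtdq2pd` and yields double-precision lanes. The test therefore has to look at the float vector mode (V4SFmode) rather than the integer one. The sketch below shows the same two instruction choices using the standard SSE2 intrinsics. It assumes an x86 target with SSE2, and the function names are hypothetical. It illustrates the instructions involved and is not the splitter itself.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* int -> float through the vector unit: movd + cvtdq2ps (V4SI -> V4SF).  */
static float
int_to_float_sse (int x)
{
  __m128i v = _mm_cvtsi32_si128 (x); /* x in lane 0, other lanes zero */
  __m128 f = _mm_cvtepi32_ps (v);    /* cvtdq2ps */
  return _mm_cvtss_f32 (f);          /* extract lane 0 */
}

/* int -> double through the vector unit: movd + cvtdq2pd (V4SI -> V2DF).  */
static double
int_to_double_sse (int x)
{
  __m128i v = _mm_cvtsi32_si128 (x);
  __m128d d = _mm_cvtepi32_pd (v);   /* cvtdq2pd converts the low two lanes */
  return _mm_cvtsd_f64 (d);
}
```

Picking the wrong branch is observable in the results: 16777217 is exactly representable as a double but rounds to 16777216 as a float.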
Re: [PATCH] Fix errors in expand_atomic_store.
From: Richard Henderson r...@redhat.com Date: Tue, 01 Nov 2011 08:15:51 -0700 Given that I believe that essentially all Sparcs still running are actually v9 and have native CAS, I think we can ignore this problem entirely. Unfortunately, this is not true. Otherwise we could change the 32-bit default code generation to v9 from v7 under Linux.
Re: [PATCH] Add capability to run several iterations of early optimizations
Hi, On Fri, Oct 28, 2011 at 04:06:20PM -0700, Matt wrote: ... I agree (of course). Having the knob will be very useful for testing and determining the acceptance criteria for the later smartness. While terminating early would be a nice optimization, the feature is still intrinsically useful and deployable without it. In addition, when using LTO on nearly all the projects/modules I tested on, 3+ passes were always productive. To be fair, when not using LTO, beyond 2-3 passes did not often produce improvements unless individual compilation units were enormous. I'm quite surprised you get extra benefit with LTO, since early optimizations are exactly the part of the middle-end which should produce the same results, LTO or not. So the only way I can imagine this can happen is that inlining analysis somehow gets much better input and then can make much bigger use of it. If this is because of extra early inlining, we might try to catch these situations when doing IPA inlining decisions, which would work regardless of any iteration number cut-off. If it is because of something else, it's probably better to (at least try to) tweak the passes and/or inlining analysis to understand each other straight away. There was also the question of whether some of the improvements seen with multiple passes were indicative of deficiencies in early inlining, CFG, SRA, SRA, because it is not flow-sensitive in any way, unfortunately sometimes produces useless statements which then need to be cleaned up by forwprop (and possibly dse and others). We've already talked with Richi about this and agreed the early one should be dumbed down a little to produce far fewer of these. I'm afraid I won't be able to submit a patch doing that during this stage 1, though. etc.
If the knob is available, I'm happy to continue testing on the same projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl, scummvm, binutils, etc.) and write a report on what I observe as suspicious improvements that perhaps should be caught/made in a single pass. It's worth noting again that while this is a useful feature in and of itself (especially when combined with LTO), it's *extremely* useful when coupled with the de-virtualization improvements submitted in other threads. The examples submitted for inclusion in the test suite aren't academic -- they are reductions of real-world performance issues from a mature (and shipping) C++-based networking product. Any C++ codebase that employs physical separation in their designs via Factory patterns, Interface Segregation, and/or Dependency Inversion will likely see improvements. To me, these enhancements combine to form one of the biggest leaps I've seen in C++ code optimization -- code that can be clean, OO, *and* fast. Well, while I'd understand that whenever there is a new direct call graph edge, trying early inlining again might help or save some work for the full inlining, I think that we should try to enhance the current IPA infrastructure rather than grow another one out of the early optimizations, especially if we aim at LTO: iterating early optimizations will not help reduce abstraction if it is spread across a number of compilation units. Martin
Re: PING 2 : [Patch Darwin/PR49992 2/2] remove ranlib special-casing from the darwin port.
On 28/10/2011 17:41, Iain Sandoe wrote: This is unreviewed for 2 weeks. I am sure that this issue will be affecting Ada on Darwin10/11 with the latest toolchains. It's actually under discussion and is pretty subtle, so it is delicate. Thanks for your patience. Arno
Re: [PATCH] Fix errors in expand_atomic_store.
On 11/01/2011 01:20 PM, David Miller wrote: Unfortunately, this is not true. Otherwise we could change the 32-bit default code generation to v9 from v7 under Linux. For v7, pa-risc, and sh, we originally allowed the test_and_set and lock_release patterns to do non-obvious things with 0/1 constants. My proposal is to *not* carry that over to the __atomic patterns. The PA and SH targets have already switched to using kernel helpers, and no longer rely on this hack. The only one left is Sparc v7. (1) Are there really any live v7 systems still around? At least with v8 we have SWAP, with which we can implement the full __atomic_exchange pattern sans hackery. We can't do that with just LDSTUB. (2) Can we have the kernel implement some {SWAP,CAS}{4,8} primitives (possibly via a special trap) that we can export from libgcc, as we do for ARM, PA, SH? I believe that would allow all non-embedded Linux systems to support all of the c++11 atomic operations without having to resort to spinlocks. r~
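The "resort to spinlocks" alternative can be pictured as follows. If the only hardware primitive is a test-and-set, which is roughly what LDSTUB provides on v7, then a wider operation such as a 4-byte compare-and-swap has to be emulated behind a lock. The minimal C sketch below uses GCC's `__atomic_test_and_set` and `__atomic_clear` builtins as a stand-in for LDSTUB. The single global lock and the name `emulated_cas_4` are hypothetical. This is not what libgcc or the kernel helpers actually provide.

```c
#include <assert.h>
#include <stdbool.h>

/* One global lock byte.  The only primitive assumed is test-and-set.  */
static bool cas_lock;

static void
lock (void)
{
  /* Spin until the previous value was clear, i.e. we took the lock.  */
  while (__atomic_test_and_set (&cas_lock, __ATOMIC_ACQUIRE))
    ;
}

static void
unlock (void)
{
  __atomic_clear (&cas_lock, __ATOMIC_RELEASE);
}

/* Compare-and-swap on a 4-byte word, emulated with the lock.  Returns
   the previous value of *PTR; the store happened iff that value equals
   EXPECTED.  It is only atomic with respect to other callers of this
   function, which is the main weakness of the spinlock approach.  */
static int
emulated_cas_4 (int *ptr, int expected, int desired)
{
  lock ();
  int old = *ptr;
  if (old == expected)
    *ptr = desired;
  unlock ();
  return old;
}
```

A kernel-provided CAS primitive avoids that weakness, because all users of the word go through the same helper regardless of which library they were compiled against.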
Re: [PATCH, devirtualization] Detect the new type in type change detection
Hi, On Tue, Nov 01, 2011 at 10:37:10AM +0100, Richard Guenther wrote: On Mon, Oct 31, 2011 at 5:58 PM, Martin Jambor mjam...@suse.cz wrote: On Fri, Oct 28, 2011 at 11:21:23AM +0200, Richard Guenther wrote: On Thu, Oct 27, 2011 at 9:54 PM, Martin Jambor mjam...@suse.cz wrote: Hi, On Thu, Oct 27, 2011 at 11:06:02AM +0200, Richard Guenther wrote: On Thu, Oct 27, 2011 at 1:22 AM, Martin Jambor mjam...@suse.cz wrote: Hi, I've been asked by Maxim Kuvyrkov to revive the following patch which has not made it to 4.6. Currently, when type based devirtualization detects a potential type change, it simply gives up on gathering any information on the object in question. This patch adds an attempt to actually detect the new type after the change. Maxim claimed this (and another patch I'll post tomorrow) noticeably improved performance of some real code. I can only offer a rather artificial example in the attachment. When the constructors are inlined but the function multiply_matrices is not, this patch makes the produced executable run for only 7 seconds instead of about 20 on my 4 year old i686 desktop (with -Ofast). Anyway, the patch passes bootstrap and testsuite on x86_64-linux. What do you think, is it a good idea for trunk now? Thanks, Martin 2011-10-21 Martin Jambor mjam...@suse.cz * ipa-prop.c (type_change_info): New fields object, known_current_type and multiple_types_encountered. (extr_type_from_vtbl_ptr_store): New function. (check_stmt_for_type_change): Use it, set multiple_types_encountered if the result is different from the previous one. (detect_type_change): Renamed to detect_type_change_1. New parameter comp_type. Set up new fields in tci, build known type jump functions if the new type can be identified. (detect_type_change): New function. * tree.h (DECL_CONTEXT): Comment new use. * testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans. * testsuite/g++.dg/ipa/devirt-c-2.C: Likewise. * testsuite/g++.dg/ipa/devirt-c-7.C: New test. 
Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f) struct type_change_info { + /* The declaration or SSA_NAME pointer of the base that we are checking for + type change. */ + tree object; + /* If we actually can tell the type that the object has changed to, it is + stored in this field. Otherwise it remains NULL_TREE. */ + tree known_current_type; /* Set to true if dynamic type change has been detected. */ bool type_maybe_changed; + /* Set to true if multiple types have been encountered. known_current_type + must be disregarded in that case. */ + bool multiple_types_encountered; }; /* Return true if STMT can modify a virtual method table pointer. @@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt) return true; } +/* If STMT can be proved to be an assignment to the virtual method table + pointer of ANALYZED_OBJ and the type associated with the new table + identified, return the type. Otherwise return NULL_TREE. */ + +static tree +extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj) +{ + tree lhs, t, obj; + + if (!is_gimple_assign (stmt)) gimple_assign_single_p (stmt) OK. + return NULL_TREE; + + lhs = gimple_assign_lhs (stmt); + + if (TREE_CODE (lhs) != COMPONENT_REF) + return NULL_TREE; + obj = lhs; + + if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1))) + return NULL_TREE; + + do + { + obj = TREE_OPERAND (obj, 0); + } + while (TREE_CODE (obj) == COMPONENT_REF); You do not allow other components than component-refs (thus, for example an ARRAY_REF - that is for a reason?). Please add a comment why. Otherwise this whole sequence would look like it should be replaceable by get_base_address (obj). I guess I might have been overly conservative here, ARRAY_REFs are fine. get_base_address only digs into MEM_REFs if they are based on an ADDR_EXPR while I do so always. 
But I can check that either both obj and analyzed_obj are a MEM_REF of the same SSA_NAME or they are the same thing (i.e. the same decl)... which even feels a bit cleaner, so I did that. Well, as you are looking for a must-change-type pattern I think you cannot simply ignore offsets. Consider T a[10]; new (T') (a[9]); a[8]->foo(); where the must-type-change on a[9] is _not_ changing the type of a[8]!
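Richard's counterexample implies that a must-type-change test has to compare the offset of the vtable-pointer store as well as its base. A store into `a[9]` says nothing about `a[8]`, even though both share the base `a`. The following minimal sketch shows that comparison using a hypothetical `mem_ref` record (a base identifier plus a constant byte offset) instead of GCC's real tree accessors. It assumes the vtable pointer sits at offset 0 of each object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified memory reference: an identifier for the
   base object (a decl or an SSA name) and a constant byte offset.  */
struct mem_ref
{
  int base_id;
  long offset;
};

/* Return true only if a vtable-pointer store at STORE is known to
   change the dynamic type of the object at OBJ.  Equal bases are not
   enough: a[9] and a[8] share the base 'a' but differ in offset.  */
static bool
store_must_change_type_p (struct mem_ref store, struct mem_ref obj)
{
  return store.base_id == obj.base_id && store.offset == obj.offset;
}
```

If the offsets are not both known constants, the conservative answer is "may change", not "must change".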
Re: [PATCH] Fix errors in expand_atomic_store.
From: Richard Henderson r...@redhat.com Date: Tue, 01 Nov 2011 13:48:26 -0700 (2) Can we have the kernel implement some {SWAP,CAS}{4,8} primitives (possibly via a special trap) that we can export from libgcc, as we do for ARM, PA, SH? I believe that would allow all of the non-embedded linux to support all of the c++11 atomic operations without having to resort to spinlocks. Yes, I was just looking into this right now. I didn't realize that PA, SH, and ARM had added these kernel hooks to solve this problem.
Re: [Patch, fortran] [00/66] PR fortran/43829 Inline sum and product (AKA scalarization of reductions)
Dear Mikael, PS: I hereby confess my failure to not split the patch too much. :-( I hereby confess my failure to find anything about which I could gripe, let alone object! The patch can only be described as a tour de force. Not only is there a lot of it (6160 lines with context on), but it is well commented and well structured. I cannot see any whitespace out of place or even minor transgressions in respect of GNU coding style. Bah humbug! On top of all that, it even does what is promised! Also, other testers have run it through various benchmarks, as recent threads attest. The only slight worry that I have is that it is going to make Richi's middle end scalarization nearly impossible to use for gfortran. However, the enhanced capability that this patch brings makes it a worthy addition to gfortran. I bootstrapped and regtested on FC9/x86_64, just for the record. OK for trunk. Many, many thanks for the patch. Paul
Re: [PATCH] Add capability to run several iterations of early optimizations
On Sat, Oct 29, 2011 at 1:06 AM, Matt m...@use.net wrote: On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote: I like this variant a lot better than the last one; still, it lacks any analysis-based justification for iteration (see my reply to Matt on what I discussed with Honza). Yes, having a way to tell whether a function has significantly changed would be awesome. My approach here would be to make inline_parameters output feedback of how much the size/time metrics have changed for a function since the previous run. If the change is above X%, then queue the function's callers for more optimizations. Similarly, Martin's rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new direct callees and the current function for another iteration if new direct edges were resolved. Figuring out the heuristic will need decent testing on a few projects to figure out what the sweet spot is (smallest binary for time/passes spent) for that given codebase. With a few data points, a reasonable stab at the metrics you mention can be had that would not terminate the iterations before the known optimal number of passes. Without those data points, it seems like making sure the metrics allow those sweet spots to be attained will be difficult. Well, sure: the same as with inlining heuristics. Thus, I don't think we want to merge this in its current form or in this stage1. What is the benefit of pushing this to a later release? If anything, merging the support for iterative optimizations now will allow us to consider adding the wonderful smartness to it later. In the meantime, substituting that smartness with a knob is still a great alternative. The benefit? The benefit is to not have another magic knob in there that doesn't make too much sense and papers over real conceptual/algorithmic issues. Brute-force iterating is a hack, not a solution. (sorry) I agree (of course). Having the knob will be very useful for testing and determining the acceptance criteria for the later smartness.
While terminating early would be a nice optimization, the feature is still intrinsically useful and deployable without it. In addition, when using LTO on nearly all the projects/modules I tested on, 3+ passes were always productive. If that is true then I'd really like to see testcases, because I am sure you are just papering over (maybe even easy-to-fix) issues by the brute-force iterating approach. We also do not have a switch to run every pass twice in succession, just because that would be as stupid as this. To be fair, when not using LTO, beyond 2-3 passes did not often produce improvements unless individual compilation units were enormous. There was also the question of whether some of the improvements seen with multiple passes were indicative of deficiencies in early inlining, CFG, SRA, etc. If the knob is available, I'm happy to continue testing on the same projects I've filed recent LTO/graphite bugs against (glib, zlib, openssl, scummvm, binutils, etc.) and write a report on what I observe as suspicious improvements that perhaps should be caught/made in a single pass. It's worth noting again that while this is a useful feature in and of itself (especially when combined with LTO), it's *extremely* useful when coupled with the de-virtualization improvements submitted in other threads. The examples submitted for inclusion in the test suite aren't academic -- they are reductions of real-world performance issues from a mature (and shipping) C++-based networking product. Any C++ codebase that employs physical separation in their designs via Factory patterns, Interface Segregation, and/or Dependency Inversion will likely see improvements. To me, these enhancements combine to form one of the biggest leaps I've seen in C++ code optimization -- code that can be clean, OO, *and* fast. But iterating the whole early optimization pipeline is not a sensible approach to attacking these. Richard.
Richard: If there's any additional testing or information I can reasonably provide to help get this in for this stage1, let me know. Thanks! -- tangled strands of DNA explain the way that I behave. http://www.clock.org/~matt
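Maxim's suggestion is to iterate again only while the size/time estimate still moves by more than some X%. It can be sketched as a bounded fixed-point loop. Everything in this sketch is hypothetical. `pipeline_fn` stands in for one run of the early optimization pipeline and returns a new size estimate. The percentage threshold and the iteration cap are illustrative parameters, not existing GCC options.

```c
#include <assert.h>

/* Hypothetical stand-in for one run of the early optimization
   pipeline: takes the current size estimate, returns the new one.  */
typedef long (*pipeline_fn) (long size);

/* Re-run the pipeline while the estimate changes by more than
   THRESHOLD_PCT percent, but at most MAX_ITER times in total.
   Returns the number of runs performed; *SIZE holds the final estimate.  */
static int
iterate_early_passes (pipeline_fn run_early_passes, long *size,
                      int threshold_pct, int max_iter)
{
  int iter = 0;
  while (iter < max_iter)
    {
      long before = *size;
      *size = run_early_passes (before);
      iter++;
      long change = before - *size;
      if (change < 0)
        change = -change;
      /* Stop once the relative change is at or below the threshold.  */
      if (before == 0 || change * 100 <= (long) threshold_pct * before)
        break;
    }
  return iter;
}

/* Toy pipeline for demonstration: each run removes a quarter of what
   is left above a floor of 100.  */
static long
toy_pipeline (long size)
{
  return size > 100 ? size - (size - 100) / 4 : size;
}
```

The cap keeps compile time bounded even when the metric never settles. Richard's objection still applies, though: a loop like this can hide a missed optimization rather than fix it.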
[PATCH, i386]: Use reg_or_subregno in int-float splitters
Hello! We have a nice utility function that can be used in int-float splitter constraints. 2011-11-01 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (splitters for int-float conversion): Use reg_or_subregno in splitter constraints. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: i386.md === --- i386.md (revision 180742) +++ i386.md (working copy) @@ -4920,9 +4920,7 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_MIX_SSE_I387 TARGET_INTER_UNIT_CONVERSIONS reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(set (match_dup 0) (float:MODEF (match_dup 1)))]) (define_split @@ -4933,9 +4931,7 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_MIX_SSE_I387 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(set (match_dup 2) (match_dup 1)) (set (match_dup 0) (float:MODEF (match_dup 2)))]) @@ -5024,9 +5020,7 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(const_int 0)] { rtx op1 = operands[1]; @@ -5067,9 +5061,7 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -5091,9 +5083,7 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P 
(operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(const_int 0)] { rtx op1 = operands[1]; @@ -5137,9 +5127,7 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -5200,9 +5188,7 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH (TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(set (match_dup 0) (float:MODEF (match_dup 1)))]) (define_insn *floatSWI48x:modeMODEF:mode2_sse_nointerunit @@ -5235,9 +5221,7 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(set (match_dup 2) (match_dup 1)) (set (match_dup 0) (float:MODEF (match_dup 2)))]) @@ -5248,9 +5232,7 @@ (SWI48x:MODEmode != DImode || TARGET_64BIT) SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH reload_completed -(SSE_REG_P (operands[0]) - || (GET_CODE (operands[0]) == SUBREG - SSE_REG_P (operands[0]))) +SSE_REGNO_P (reg_or_subregno (operands[0])) [(set (match_dup 0) (float:MODEF (match_dup 1)))]) (define_insn *floatSWI48x:modeX87MODEF:mode2_i387_with_temp
[Patch,Fortran] Fix tree-walking issue (was: gfortran tree walking issue)
Dear all, dear Paul, (For gcc-patch@ readers: gfortran has issues with tree walking. During traversal it does not touch all tree nodes if the function called during traversal adds new nodes to the tree, as this will rebalance the tree. This causes a regression with my recently posted RFC patch for constructors.) Paul Richard Thomas wrote: Maybe we should decide a priority order? Your patch and those of Mikael could cause regressions other than in code involving OOP. I would suggest, therefore, that we should find a fix for your problem below and get these patches committed first. I will still try to get mine completed before the end of Stage 1 but it will not matter as much if I am a week or so late. I think it makes sense to have mine and Mikael's patches first. Actually, I just saw that you approved Mikael's patch. For my patch, the class_21.f03/tree-walking issue is solved by the attached patch 2. I think after that issue is solved, you can continue working on your patch. Constructor patch: I still have another rejects-valid issue related to multiple USE, ONLY for the same module, but I don't think that it makes sense that we both simultaneously try to tackle that issue. When I have solved the use-only issue, I can start cleaning up the patch, add two diagnostic checks, tweak some diagnostics/dg-error checks, write a ChangeLog, re-test the patch with real-world codes, and hopefully submit it by next weekend. * * * Regarding the tree-walking issue: I think it is a general issue which could also affect other things. I really wonder why we haven't been bitten by it before. However, it might be that we hit those problems and fixed them by re-resolving symbols at later stages. My feeling is that the issue occurs either only with vtab/vtree or at least also due to those functions. However, I might be wrong as I do not quickly see which of the tree-traversal callers can generate new trees. I made two attempts to fix the issue.
The first one fails; hence, I use the second one. In particular, I seek comments and approval for the second patch. PATCH 1 Ensuring that every tree node gets touched once. This patch works by traversing the tree until all nodes are touched at least once. That means that one has a couple of light-weight extra walks, which *include the newly added nodes*. The patch does: a) Ensure that all trees are walked b) Mark symbol nodes as already walked when finding a vtab c) Skip vtab/vtype in resolve symbol (b) and (c) do not seem to have any effect. The patch regtests*, except for gfortran.dg/class_21.f03, which still has an endless loop. (Cf. previous email.) PATCH 2 This patch uses a different approach to make sure that *newly added nodes* do *not* get visited. It does so by saving the symtree nodes in a vector and then walking the vector. Except for the additional memory requirement for the vector, this version should also be quick and avoids walking the tree multiple times. It also preserves the order in which the trees are walked. This patch builds and regtests* (gfortran + libgomp) on x86-64-linux. OK for the trunk? Tobias * Except for the known and meanwhile old failures for gfortran.dg/select_type_12.f03 (P1 regression), gfortran.fortran-torture/execute/entry_4.f90 (P1 regression) and gfortran.dg/realloc_on_assign_5.f03 (failed since committal). NOTE: The following patch does not regtest. Hence, I do not seek approval for this patch. 2011-11-01 Tobias Burnus bur...@net-b.de * symbol.c (all_marked): New file-global variable. (is_all_marked): New function which checks whether a variable is marked. (gfc_traverse_symtree): Use them to ensure that all tree nodes are touched, even if the tree changes during traversal. * class.c (gfc_find_derived_vtab): Mark vtab symbol to avoid double resolution.
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c index f64cc1b..8880d65 100644 --- a/gcc/fortran/class.c +++ b/gcc/fortran/class.c @@ -591,6 +591,10 @@ have_vtype: found_sym = vtab; + /* Avoid double evaluation. */ + if (found_sym) +found_sym->mark = 1; + cleanup: /* It is unexpected to have some symbols added at resolution or code generation time. We commit the changes in order to keep a clean state. */ diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c index 67d65cb..11c83cc 100644 --- a/gcc/fortran/symbol.c +++ b/gcc/fortran/symbol.c @@ -102,6 +102,7 @@ static gfc_symbol *changed_syms = NULL; gfc_dt_list *gfc_derived_types; +static bool all_marked; /* List of tentative typebound-procedures. */ @@ -3353,6 +3354,14 @@ traverse_ns (gfc_symtree *st, void (*func) (gfc_symbol *)) } +static void +is_all_marked (gfc_symtree *st) +{ + if (!st->n.sym->mark) +all_marked = false; +} + + /* Call a given function for all symbols in the namespace. We take care that each gfc_symbol node is called exactly once. */ @@ -3362,7 +3371,15 @@ gfc_traverse_ns
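The idea behind PATCH 2 is to walk a snapshot of the nodes, so that nodes added by the callback are never visited. It can be sketched independently of gfortran's symtree code. In this hypothetical C example, an unbalanced binary search tree stands in for the symtree, and `traverse_snapshot`, `collect` and `visit_and_add` are illustrative names. The walk first collects the existing nodes in order into an array and only then calls the callback. A callback that inserts nodes therefore cannot cause existing nodes to be skipped or new ones to be visited.

```c
#include <assert.h>
#include <stdlib.h>

struct node
{
  int key;
  struct node *left, *right;
};

static struct node *root;

static void
insert (int key)
{
  struct node **p = &root;
  while (*p)
    p = key < (*p)->key ? &(*p)->left : &(*p)->right;
  *p = calloc (1, sizeof **p);
  if (!*p)
    abort ();
  (*p)->key = key;
}

static size_t
count (const struct node *n)
{
  return n ? 1 + count (n->left) + count (n->right) : 0;
}

/* In-order collection of the nodes that exist right now.  */
static void
collect (struct node *n, struct node **vec, size_t *len)
{
  if (!n)
    return;
  collect (n->left, vec, len);
  vec[(*len)++] = n;
  collect (n->right, vec, len);
}

/* Call FUNC exactly once for every node present when the walk starts,
   in the original order; nodes FUNC adds are not visited.  */
static void
traverse_snapshot (void (*func) (struct node *))
{
  size_t len = 0, n = count (root);
  struct node **vec = malloc (n * sizeof *vec);
  if (n && !vec)
    abort ();
  collect (root, vec, &len);
  for (size_t i = 0; i < len; i++)
    func (vec[i]);
  free (vec);
}

/* Example callback that grows the tree while it is being walked,
   much as resolution may create new vtab symbols.  */
static int visited;

static void
visit_and_add (struct node *n)
{
  visited++;
  insert (n->key + 1000);
}
```

The cost is one extra array of pointers per walk, which matches the memory trade-off described for PATCH 2.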
Re: [Patch,Fortran] Fix tree-walking issue
Dear Tobias, On 2011-11-01 22:33, Tobias Burnus wrote: Regarding the tree-walking issue: I think it is a general issue which could also affect other things. I really wonder why we haven't been bitten by it before. However, it might be that we hit those problems and fixed them by re-resolving symbols at later parts. My feeling is that the issue occurs either only with vtab/vtree or at least also due to those functions. However, I might be wrong as I do not quickly see which of the tree-traversal callers can generate new trees. I don't remember all this very clearly, but I think that the gfc_symbol::tlink field is intended for something like this, even though this is not very clear (at least to me) from the explanatory comment I quoted below. Anyway, I thought I might point this out, as it might help you get things working, since the problem it addresses at least appears similar: /* Change management fields. Symbols that might be modified by the current statement have the mark member nonzero and are kept in a singly linked list through the tlink field. Of these symbols, symbols with old_symbol equal to NULL are symbols created within the current statement. Otherwise, old_symbol points to a copy of the old symbol. */ struct gfc_symbol *old_symbol, *tlink; Cheers, - Tobi
Re: [PATCH, i386]: Use reg_or_subregno in int-float splitters
On Tue, Nov 01, 2011 at 10:33:07PM +0100, Uros Bizjak wrote: We have a nice utility function that can be used in int-float splitter constraints. 2011-11-01 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (splitters for int-float conversion): Use reg_or_subregno in splitter constraints. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Unfortunately reg_or_subregno is an external non-inline function without the pure attribute, and the SSE_REGNO_P macro evaluates its argument twice, which means the function is called multiple times. Jakub
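Jakub's objection is easy to demonstrate in isolation. The sketch below defines a hypothetical range-check macro with the same shape as `SSE_REGNO_P`, meaning it mentions its argument more than once. A counting function stands in for the out-of-line `reg_or_subregno`. The register numbers are made up. For a register inside the range both comparisons run, so the function is called twice. Evaluating it once into a local avoids that.

```c
#include <assert.h>

/* Hypothetical register range, not i386's real numbering.  */
#define FIRST_SSE_REG 21
#define LAST_SSE_REG 28

/* Same shape as SSE_REGNO_P: the argument N appears twice.  */
#define SSE_REGNO_P(N) ((N) >= FIRST_SSE_REG && (N) <= LAST_SSE_REG)

static int calls;

/* Stand-in for reg_or_subregno.  The counter makes every call
   observable, so each textual use is a real call; GCC likewise has to
   keep both calls to a non-inline function that is not declared pure.  */
static unsigned int
fake_reg_or_subregno (void)
{
  calls++;
  return 23;
}

/* Evaluating the operand once into a local avoids the second call.  */
static int
sse_regno_once (void)
{
  unsigned int regno = fake_reg_or_subregno ();
  return SSE_REGNO_P (regno);
}
```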
Re: [PATCH][ARM] Big Endian and Generic tuning
On 26/10/11 10:15, Richard Earnshaw wrote: Here's an updated patch that makes no generalizations. OK? Yep Committed. Andrew
[PATCH, devirtualization] Intraprocedural devirtualization pass
Hi, the patch below is the second (and last) revived type-based devirtualization patch that did not make it to 4.6. It deals with virtual calls from the same function that also contains the object declaration: void foo() { A a; a.foo (); } Normally, the front end would do the devirtualization on its own; however, early inlining can create these situations too. Since there is nothing interprocedural going on, the current inlining and IPA-CP devirtualization bits are of no help. We do not do type-based devirtualization in OBJ_TYPE_REF folding either, because the necessary type-changing checks might make it quite expensive. Thus, this patch introduces a new pass to do that. The patch basically piggybacks on the interprocedural devirtualization mechanism, trying to construct a known-type jump function for all objects in OBJ_TYPE_REF calls and then immediately using it like we do in IPA-CP. The original patch was doing this as a part of pass_rebuild_cgraph_edges. Honza did not like this idea, so I made it a separate pass. First, I scheduled it after pass_rebuild_cgraph_edges and was only traversing indirect edges, avoiding a sweep over all of the IL. Unfortunately, this does not work in one scenario. If the newly-known destination of a virtual call is known not to throw, we may have to purge some EH CFG edges and potentially basic blocks. If these basic blocks contain calls (typically calls to object destructors), we may end up having stale call edges in the call graph... and our current approach to that problem is to call pass_rebuild_cgraph_edges. I think that I was not running into this problem when the mechanism was a part of that pass only by pure luck. Anyway, this is why I eventually opted for a sweep over the statements. My best guess is that it is probably worth it, but only because the overhead should still be fairly low.
The pass triggers quite a number of times when building libstdc++, and it can speed up an artificial testcase, which I will attach, from over 20 seconds to 7s on my older desktop. It is very similar to the one I posted with the previous patch, but this time the object constructors must not be early-inlined, whereas the function multiply_matrices must be. Currently I have problems compiling Firefox even without LTO, so I don't have any numbers from it either. IIRC, Honza did not see this patch trigger there when he tried the ancient version almost a year ago. On the other hand, Maxim claimed that the impact can be noticeable on some code base he is concerned about. I have successfully bootstrapped and tested the patch on x86_64-linux. What do you think, should we include this in trunk? Thanks, Martin

2011-10-31  Martin Jambor  mjam...@suse.cz

	* ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
	* ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and
	updated all callers.
	(intraprocedural_devirtualization): New function.
	(gate_intra_devirtualization): Likewise.
	(pass_intra_devirt): New pass.
	* ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
	* passes.c (init_optimization_passes): Schedule pass_intra_devirt.
	* tree-pass.h (pass_intra_devirt): Declare.
	* testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
	* testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.

Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
@@ -0,0 +1,62 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-devirt" } */
+
+extern "C" void abort (void);
+
+class A
+{
+public:
+  int data;
+  virtual int foo (int i);
+};
+
+
+class B : public A
+{
+public:
+  __attribute__ ((noinline)) B();
+  virtual int foo (int i);
+};
+
+int __attribute__ ((noinline)) A::foo (int i)
+{
+  return i + 1;
+}
+
+int __attribute__ ((noinline)) B::foo (int i)
+{
+  return i + 2;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+__attribute__ ((noinline)) B::B()
+{
+}
+
+static inline int middleman_1 (class A *obj, int i)
+{
+  return obj->foo (i);
+}
+
+static inline int middleman_2 (class B *obj, int i)
+{
+  return middleman_1 (obj, i);
+}
+
+int main (int argc, char *argv[])
+{
+  class B b;
+
+  if (middleman_2 (&b, get_input ()) != 3)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Immediately devirtualizing call.*into.*B::foo" "devirt" } } */
+/* { dg-final { cleanup-tree-dump "devirt" } } */
Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
@@ -0,0 +1,95 @@
+/* Verify that virtual calls are folded even when a typecast to an
+   ancestor is involved along the way.  */
+/* { dg-do run }
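The effect of the pass can be pictured with a C model of a vtable. Suppose the dynamic type of an object is known at the call site, for example because the object is a local whose constructor has just stored the vtable pointer. Then the indirect call through the vtable slot can be replaced by a direct call to the known target, which later passes can inline. The struct layout and the names below are hypothetical. They only mirror the `A`/`B` classes of the imm-devirt-1.C test case and do not reflect how G++ actually lays out objects.

```c
#include <assert.h>

struct obj;

struct vtable
{
  int (*foo) (struct obj *self, int i);
};

struct obj
{
  const struct vtable *vptr;
  int data;
};

static int
A_foo (struct obj *self, int i)
{
  (void) self;
  return i + 1;
}

static int
B_foo (struct obj *self, int i)
{
  (void) self;
  return i + 2;
}

static const struct vtable A_vtable = { A_foo };
static const struct vtable B_vtable = { B_foo };

/* Before: a virtual call, with the target loaded from the vtable.  */
static int
call_foo_virtual (struct obj *o, int i)
{
  return o->vptr->foo (o, i);
}

/* After devirtualization: the object is a local whose vptr was just
   set to B_vtable, so the call can go straight to B_foo.  */
static int
local_b_devirtualized (int i)
{
  struct obj b = { &B_vtable, 0 };
  return B_foo (&b, i);
}
```

The hard part, which this model skips, is proving that nothing between the vtable store and the call (such as placement new or a call that escapes the object) could have changed the dynamic type.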
Re: [PATCH, i386]: Use reg_or_subregno in int-float splitters
On Tue, Nov 1, 2011 at 11:00 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 01, 2011 at 10:33:07PM +0100, Uros Bizjak wrote: We have a nice utility function that can be used in int-float splitter constraints. 2011-11-01 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (splitters for int-float conversion): Use reg_or_subregno in splitter constraints. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Unfortunately reg_or_subregno is an external non-inline function, doesn't have pure attribute and SSE_REGNO_P macro evaluates its argument twice, which means the function is called multiple times. You are right. :( On a second look, we are missing SUBREG_REG on subregs; the constraint should read:

   (SSE_REG_P (operands[0])
    || (GET_CODE (operands[0]) == SUBREG
        && SSE_REG_P (SUBREG_REG (operands[0]))))

I will do a partial revert with an additional fix.

2011-11-01  Uros Bizjak  ubiz...@gmail.com

	* config/i386/i386.md (splitters for int-float conversion): Use
	SUBREG_REG on SUBREGs in splitter constraints.

Bootstrap and regression test in progress.

Thanks,
Uros.
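The corrected condition says the destination is an SSE register either directly or as the inner register of a SUBREG. It can be modelled outside GCC with a toy RTL node. The `toy_rtx` type, the `toy_*` helpers and the register numbers are hypothetical; in GCC the corresponding accessors are `GET_CODE`, `SUBREG_REG`, `REGNO` and `SSE_REG_P`. The model also shows why the earlier version was wrong. Its SUBREG arm tested the SUBREG itself again, which can never be an SSE register, so a SUBREG destination never matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_SUBREG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  unsigned int regno;          /* meaningful for TOY_REG */
  const struct toy_rtx *inner; /* meaningful for TOY_SUBREG */
};

/* Hypothetical SSE register range.  */
static bool
toy_sse_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG && x->regno >= 21 && x->regno <= 28;
}

/* Mirrors: SSE_REG_P (op)
            || (GET_CODE (op) == SUBREG && SSE_REG_P (SUBREG_REG (op)))  */
static bool
sse_dest_p (const struct toy_rtx *op)
{
  return toy_sse_reg_p (op)
         || (op->code == TOY_SUBREG && toy_sse_reg_p (op->inner));
}
```

Because each accessor here is a cheap field read, evaluating an operand twice is harmless in this form, unlike the out-of-line call discussed above.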
Index: i386.md === --- i386.md (revision 180745) +++ i386.md (working copy) @@ -4920,7 +4920,9 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_MIX_SSE_I387 TARGET_INTER_UNIT_CONVERSIONS reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(set (match_dup 0) (float:MODEF (match_dup 1)))]) (define_split @@ -4931,7 +4933,9 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_MIX_SSE_I387 !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(set (match_dup 2) (match_dup 1)) (set (match_dup 0) (float:MODEF (match_dup 2)))]) @@ -5020,7 +5024,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(const_int 0)] { rtx op1 = operands[1]; @@ -5061,7 +5067,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -5083,7 +5091,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(const_int 0)] { rtx op1 = operands[1]; @@ -5127,7 +5137,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P 
(operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -5188,7 +5200,9 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH (TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(set (match_dup 0) (float:MODEF (match_dup 1)))]) (define_insn *floatSWI48x:modeMODEF:mode2_sse_nointerunit @@ -5221,7 +5235,9 @@ SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH !(TARGET_INTER_UNIT_CONVERSIONS || optimize_function_for_size_p (cfun)) reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(set (match_dup 2) (match_dup 1)) (set (match_dup 0) (float:MODEF (match_dup 2)))]) @@ -5232,7 +5248,9 @@ (SWI48x:MODEmode != DImode || TARGET_64BIT) SSE_FLOAT_MODE_P (MODEF:MODEmode) TARGET_SSE_MATH reload_completed -SSE_REGNO_P (reg_or_subregno (operands[0])) +(SSE_REG_P (operands[0]) + || (GET_CODE (operands[0]) == SUBREG + SSE_REG_P (SUBREG_REG (operands[0] [(set (match_dup 0)
Re: [PATCH] Fix computed gotos on m68k
I've now committed the patch on 4.6 also. I did need to apply the following patch from Bernd in order to test the 4.6 branch tip successfully, since without it my build blew up in glibc with an error in final.c: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00454.html Maybe that patch should be applied to 4.6 also? Fine with me, this is a regression; please backport the testcase as well. TIA. -- Eric Botcazou
Re: [PATCH] Fix errors in expand_atomic_store.
(1) Are there really live v7 still around? At least with v8 we have SWAP, with which we can implement the full __atomic_exchange pattern sans hackery. We can't do that with just LDSTUB. I think that we can drop v7 support at this point but not v8 because of Leon. -- Eric Botcazou
Re: [patch] support for multiarch systems
On Tue, 1 Nov 2011, Marc Glisse wrote: On Sun, 21 Aug 2011, Matthias Klose wrote: On 08/20/2011 09:51 PM, Matthias Klose wrote: Multiarch [1] is the term being used to refer to the capability of a system to install and run applications of multiple different binary targets on the same system. The idea and name of multiarch dates back to 2004/2005 [2] (not to be confused with multiarch in glibc). attached is an updated patch which includes feedback from Jakub and Joseph. Hello, what is the status of this patch? Is it waiting for a review? Having gcc 4.7 work out of the box on 2 of the most popular linux distributions seems like an important feature... There were comments of mine that remained unaddressed in the version to which you replied and I don't recall a version that addressed them. So there isn't a patch ready for review. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Handle V4HI vector initialization more efficiently on VIS1.
Committed to trunk. gcc/ * config/sparc/sparc.c (vector_init_faligndata): New function. (sparc_expand_vector_init): Use it for V4HImode on VIS1. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180752 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog|3 +++ gcc/config/sparc/sparc.c | 24 2 files changed, 27 insertions(+), 0 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 9c75318..a7b1c09 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -76,6 +76,9 @@ 2011-11-01 David S. Miller da...@davemloft.net + * config/sparc/sparc.c (vector_init_faligndata): New function. + (sparc_expand_vector_init): Use it for V4HImode on VIS1. + * config/sparc/sparc.c (sparc_expand_vcond): New function. * config/sparc/sparc-protos.h (sparc_expand_vcond): Declare it. * config/sparc/sparc.md (vcondmodemode): New VIS3 expander. diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 6431405..649612e 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -11340,6 +11340,25 @@ vector_init_fpmerge (rtx target, rtx elt, enum machine_mode inner_mode) emit_insn (gen_fpmerge_vis (gen_lowpart (V8QImode, target), t1, t2)); } +static void +vector_init_faligndata (rtx target, rtx elt, enum machine_mode inner_mode) +{ + rtx t1 = gen_reg_rtx (V4HImode); + + elt = convert_modes (SImode, inner_mode, elt, true); + + emit_move_insn (gen_lowpart (SImode, t1), elt); + + emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode), + force_reg (SImode, GEN_INT (6)), + CONST0_RTX (SImode))); + + emit_insn (gen_faligndatav4hi_vis (target, t1, target)); + emit_insn (gen_faligndatav4hi_vis (target, t1, target)); + emit_insn (gen_faligndatav4hi_vis (target, t1, target)); + emit_insn (gen_faligndatav4hi_vis (target, t1, target)); +} + void sparc_expand_vector_init (rtx target, rtx vals) { @@ -11404,6 +11423,11 @@ sparc_expand_vector_init (rtx target, rtx vals) vector_init_fpmerge (target, XVECEXP (vals, 0, 0), inner_mode); return; } + if (mode == V4HImode) + { + 
vector_init_faligndata (target, XVECEXP (vals, 0, 0), inner_mode); + return; + } } mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0); -- 1.7.6.401.g6a319
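The four `faligndata` steps can be modelled in plain C++ to show why they replicate the element. This sketch rests on stated assumptions: big-endian 64-bit registers, `alignaddr (6, 0)` setting the alignment offset to 6, and `faligndata` returning bytes 6..13 of the 16-byte concatenation of its two operands. The byte-array model and the function names are illustrative, not GCC or VIS code. Because `elt` sits in the low halfword of `t1`, each step shifts `target` down by one halfword and inserts `elt` at the top, so after four steps every lane holds `elt` regardless of what `target` held before.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 64-bit register as 8 bytes in big-endian order.
using Reg64 = std::array<std::uint8_t, 8>;

// Model of faligndata: concatenate hi:lo (16 bytes) and take the 8 bytes
// starting at the alignment offset that alignaddr stored in GSR.
Reg64 faligndata(const Reg64& hi, const Reg64& lo, unsigned offset) {
  std::array<std::uint8_t, 16> cat{};
  for (unsigned i = 0; i < 8; ++i) {
    cat[i] = hi[i];
    cat[i + 8] = lo[i];
  }
  Reg64 out{};
  for (unsigned i = 0; i < 8; ++i) out[i] = cat[offset + i];
  return out;
}

// Replicate a 16-bit element into all four halfword lanes of target.
Reg64 init_v4hi(Reg64 target, std::uint16_t elt) {
  Reg64 t1{};
  // elt is zero-extended into t1's low 32 bits, so it lands in bytes 6..7.
  t1[6] = static_cast<std::uint8_t>(elt >> 8);
  t1[7] = static_cast<std::uint8_t>(elt & 0xff);
  const unsigned offset = 6;  // alignaddr (6, 0) leaves 6 in GSR.align
  for (int step = 0; step < 4; ++step) target = faligndata(t1, target, offset);
  return target;
}

// Read halfword lane i (lane 0 is the most significant).
std::uint16_t lane(const Reg64& r, unsigned i) {
  return static_cast<std::uint16_t>((r[2 * i] << 8) | r[2 * i + 1]);
}
```

A single step already shows the shift: the new top lane is `elt` and the remaining lanes are the old lanes 0 to 2.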
[v3] implement LWG 2067 and new issues with constructors in future
This patch implements http://lwg.github.com/issues/lwg-active.html#2067 which has Ready status, as well as fixing two new issues I've reported in the past few hours. The first is that packaged_task's template constructors should be restricted to prevent them from being chosen to copy a packaged_task object and the second is that promise and packaged_task should properly support uses-allocator construction so that if promise(args...) is well-formed then so is promise(allocator_arg, alloc, args...) * include/std/future (promise): Add constructors for uses-allocator construction from rvalue promise. (packaged_task): Implement LWG 2067. Add additional constructors for uses-allocator construction. * testsuite/30_threads/packaged_task/cons/3.cc: New. * testsuite/30_threads/packaged_task/cons/alloc2.cc: New. * testsuite/30_threads/promise/cons/alloc2.cc: New. Tested x86_64-linux, committed to trunk. Index: include/std/future === --- include/std/future (revision 180749) +++ include/std/future (working copy) @@ -955,6 +955,12 @@ _M_storage(__future_base::_S_allocate_result_Res(__a)) { } + templatetypename _Allocator +promise(allocator_arg_t, const _Allocator, promise __rhs) +: _M_future(std::move(__rhs._M_future)), + _M_storage(std::move(__rhs._M_storage)) +{ } + promise(const promise) = delete; ~promise() @@ -1047,6 +1053,12 @@ _M_storage(__future_base::_S_allocate_result_Res(__a)) { } + templatetypename _Allocator +promise(allocator_arg_t, const _Allocator, promise __rhs) +: _M_future(std::move(__rhs._M_future)), + _M_storage(std::move(__rhs._M_storage)) +{ } + promise(const promise) = delete; ~promise() @@ -1122,6 +1134,12 @@ _M_storage(__future_base::_S_allocate_resultvoid(__a)) { } + templatetypename _Allocator +promise(allocator_arg_t, const _Allocator, promise __rhs) +: _M_future(std::move(__rhs._M_future)), + _M_storage(std::move(__rhs._M_storage)) +{ } + promise(const promise) = delete; ~promise() @@ -1270,6 +1288,15 @@ { return std::forward_Tp(__t); } }; + 
template<typename _Task, typename _Fn, bool + = is_same<_Task, typename remove_reference<_Fn>::type>::value> +struct __is_same_pkgdtask +{ typedef void __type; }; + + template<typename _Task, typename _Fn> +struct __is_same_pkgdtask<_Task, _Fn, true> +{ }; + /// packaged_task template<typename _Res, typename... _ArgTypes> class packaged_task<_Res(_ArgTypes...)> @@ -1281,13 +1308,20 @@ // Construction and destruction packaged_task() noexcept { } - template<typename _Fn> + template<typename _Allocator> explicit +packaged_task(allocator_arg_t, const _Allocator& __a) noexcept +{ } + + template<typename _Fn, typename = typename + __is_same_pkgdtask<packaged_task, _Fn>::__type> +explicit packaged_task(_Fn&& __fn) : _M_state(std::make_shared<_State_type>(std::forward<_Fn>(__fn))) { } - template<typename _Fn, typename _Allocator> + template<typename _Fn, typename _Allocator, typename = typename + __is_same_pkgdtask<packaged_task, _Fn>::__type> explicit packaged_task(allocator_arg_t, const _Allocator& __a, _Fn&& __fn) : _M_state(std::allocate_shared<_State_type>(__a, @@ -1301,13 +1335,24 @@ } // No copy - packaged_task(packaged_task&) = delete; - packaged_task& operator=(packaged_task&) = delete; + packaged_task(const packaged_task&) = delete; + packaged_task& operator=(const packaged_task&) = delete; + template<typename _Allocator> +explicit +packaged_task(allocator_arg_t, const _Allocator&, + const packaged_task&) = delete; + // Move support packaged_task(packaged_task&& __other) noexcept { this->swap(__other); } + template<typename _Allocator> +explicit +packaged_task(allocator_arg_t, const _Allocator&, + packaged_task&& __other) noexcept +{ this->swap(__other); } + packaged_task& operator=(packaged_task&& __other) noexcept { packaged_task(std::move(__other)).swap(*this); Index: testsuite/30_threads/packaged_task/cons/alloc2.cc === --- testsuite/30_threads/packaged_task/cons/alloc2.cc (revision 0) +++ testsuite/30_threads/packaged_task/cons/alloc2.cc (revision 0) @@ -0,0 +1,40 @@ +// { dg-do compile } 
+// { dg-options "-std=gnu++0x"
} +// { dg-require-cstdint } +// { dg-require-gthreads } +// { dg-require-atomic-builtins } + +// Copyright (C) 2011 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you
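Why the allocator-extended constructors matter: generic code that performs uses-allocator construction, such as `std::tuple`'s `allocator_arg_t` constructor, selects the `(allocator_arg, alloc, args...)` form whenever `std::uses_allocator<T, Alloc>` is true. The sketch below is illustrative only. `Box` is a hypothetical move-only type standing in for `std::promise`, and, like the constructors added in the patch, its allocator-extended move constructor accepts the allocator and ignores it.

```cpp
#include <cassert>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical move-only type standing in for std::promise.
struct Box {
  int value = 0;
  bool via_alloc_ctor = false;  // true once the allocator form has been used

  explicit Box(int v) : value(v) {}
  Box(Box&& other) noexcept
      : value(other.value), via_alloc_ctor(other.via_alloc_ctor) {}
  Box(const Box&) = delete;

  // Allocator-extended move constructor: accepts and ignores the allocator,
  // mirroring promise(allocator_arg_t, const _Allocator&, promise&&).
  template <typename Alloc>
  Box(std::allocator_arg_t, const Alloc&, Box&& other) noexcept
      : value(other.value), via_alloc_ctor(true) {}
};

// Declare that Box accepts any allocator, as the library does for promise.
namespace std {
template <typename Alloc>
struct uses_allocator<Box, Alloc> : true_type {};
}  // namespace std

// A tuple built with allocator_arg forwards the allocator to its element.
std::tuple<Box> make_boxed(int v) {
  return std::tuple<Box>(std::allocator_arg, std::allocator<int>(), Box(v));
}
```

Without the allocator-extended constructor, `uses_allocator<Box, Alloc>` being true would leave the tuple with no way to construct its element, which is the gap the patch closes for `promise` and `packaged_task`.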
[patch] update config.sub
Committed under the brought in via a merge rule. 2011-11-01 DJ Delorie d...@redhat.com * config.sub: Update to version 2011-10-29 (added rl78) Imports this change: 2011-10-29 DJ Delorie d...@redhat.com * config.sub (rl78): New.
Re: RFC: PATCH to adjust warning flags for C++
On 11/01/2011 03:48 PM, Gabriel Dos Reis wrote: On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrillja...@redhat.com wrote: Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus -Wall) broke bootstrap because of narrowing warnings, so I'd like to add -Wno-narrowing to the stage 2+ warning flags. Is this the best way to do that? why do we want to include -Wc++0x-compat in -Wall? It's already included. And I think that your code won't work in C++11 is a warning that most C++ programmers will be interested in if they are asking for warnings. Jason
Re: [v3] implement LWG 2067 and new issues with constructors in future
On 2 November 2011 00:53, Jonathan Wakely wrote: The first is that packaged_task's template constructors should be restricted to prevent them from being chosen to copy a packaged_task object While submitting that issue to the LWG chair I realised the constraint should use decay<Fn> instead of remove_reference<Fn> so that it also removes cv qualifiers. I'll fix that tomorrow.
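A sketch of why `decay` matters, using hypothetical `TaskRR` and `TaskDecay` types rather than the libstdc++ code, with `std::enable_if` in place of the patch's `__is_same_pkgdtask` helper. With `remove_reference`, an argument of type `const Task&&` leaves `const Task`, which is not the same type as `Task`, so the forwarding constructor stays viable. It binds the const rvalue better than the deleted `const Task&` copy constructor does, so the object becomes constructible from a const rvalue. `decay` also strips the `const`, which removes the template and leaves the deleted copy constructor as the choice.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Enabled only when Fn, with references removed, is not Self.
template <typename Self, typename Fn>
using not_self_rr = typename std::enable_if<
    !std::is_same<Self, typename std::remove_reference<Fn>::type>::value>::type;

// Enabled only when Fn, after decay (references and cv-qualifiers), is not Self.
template <typename Self, typename Fn>
using not_self_decay = typename std::enable_if<
    !std::is_same<Self, typename std::decay<Fn>::type>::value>::type;

// Constrained with remove_reference, like the first version of the patch.
struct TaskRR {
  TaskRR() = default;
  template <typename Fn, typename = not_self_rr<TaskRR, Fn>>
  explicit TaskRR(Fn&&) {}
  TaskRR(const TaskRR&) = delete;
  TaskRR(TaskRR&&) = default;
};

// Constrained with decay, as proposed in the message above.
struct TaskDecay {
  TaskDecay() = default;
  template <typename Fn, typename = not_self_decay<TaskDecay, Fn>>
  explicit TaskDecay(Fn&&) {}
  TaskDecay(const TaskDecay&) = delete;
  TaskDecay(TaskDecay&&) = default;
};
```

Both versions still reject a plain lvalue copy. Only the const rvalue case differs.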
[PATCH, ARM] Fix stack red zone bug (PR38644)
Hi, This patch fixes PR38644 in the ARM back end. OK for trunk? For details, please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644. ChangeLog: 2011-11-2 Jiangning Liu jiangning@arm.com PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. ChangeLog of testsuite: 2011-11-2 Jiangning Liu jiangning@arm.com PR rtl-optimization/38644 * gcc.target/arm/stack-red-zone.c: New. Thanks, -Jiangning Patch: diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1ada6f..1f6fc26 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22215,6 +22215,8 @@ thumb1_expand_epilogue (void) gcc_assert (amount >= 0); if (amount) { + emit_insn (gen_blockage ()); + if (amount < 512) emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (amount))); diff --git a/gcc/testsuite/gcc.target/arm/stack-red-zone.c b/gcc/testsuite/gcc.target/arm/stack-red-zone.c new file mode 100644 index 000..b9f0f99 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/stack-red-zone.c @@ -0,0 +1,12 @@ +/* No stack red zone. PR38644. */ +/* { dg-options "-mthumb -O2" } */ +/* { dg-final { scan-assembler "ldrb\[^\n\]*\\n\[\t \]*add\[\t \]*sp" } } */ + +extern int doStreamReadBlock (int *, char *, int size, int); + +char readStream (int *s) +{ + char c = 0; + doStreamReadBlock (s, &c, 1, *s); + return c; +}
Re: [PATCH] Fix errors in expand_atomic_store.
On 11/01/2011 11:15 AM, Richard Henderson wrote: On 11/01/2011 04:56 AM, Andrew MacLeod wrote: well, the reason for it was so that __atomic_store can be used as a replacement for sync_lock_release on such targets... And what was your replacement for sync_test_and_set? If you don't have that pair, you don't have a replacement. store (m, 0) is release and t = exchange (m, 1) is test_and_set.
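Andrew's pairing can be illustrated with a small spinlock built from just those two operations. An atomic exchange of 1 acts as test-and-set, and an atomic store of 0 with release ordering acts as the release. This is a sketch that uses C++11 `std::atomic<int>` rather than the `__atomic` builtins discussed in the thread, and `TinyLock` and its members are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock built only from exchange and store.
class TinyLock {
 public:
  // test_and_set: swap in 1 and report whether the flag was already set.
  bool test_and_set() {
    return flag_.exchange(1, std::memory_order_acquire) != 0;
  }

  // Non-blocking attempt: succeeds only if the previous value was 0.
  bool try_lock() { return !test_and_set(); }

  // Spin until the exchange observes 0.
  void lock() {
    while (test_and_set()) {
    }
  }

  // release: an atomic store of 0 with release ordering.
  void unlock() { flag_.store(0, std::memory_order_release); }

 private:
  std::atomic<int> flag_{0};
};
```

Replacing test_and_set this way requires a general atomic exchange. According to the earlier message in this thread, SPARC v8's SWAP provides one, while LDSTUB alone does not.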
[pph] Merge static_decls. (issue5335042)
Add merging of static_decls in bindings. Due to the current structure, this change is currently only effective at namespace scope. Consequently, there are no changes to test status. We may need to make all bindings merged by default. Index: gcc/cp/ChangeLog.pph 2011-11-01 Lawrence Crowl cr...@google.com * pph-streamer-out.c (pph_out_binding_level_1): Remove streaming of static_decls. (pph_out_binding_level): Add streaming of static_decls. (pph_out_binding_merge_bodies): Likewise. * pph-streamer-in.c (pph_is_tree_element_of_vec): New. (pph_union_two_tree_vecs): New. (pph_union_into_tree_vec): New. (pph_in_binding_level_1): Remove streaming of static_decls. (pph_in_binding_level): Add streaming of static_decls. (pph_in_binding_merge_bodies): Add merging of static_decls from streamer into existing binding. Needs new function parameter. (pph_in_merge_key_tree): Preallocate namespace cp_binding_level. (pph_in_global_binding): Update call to pph_in_binding_merge_bodies. Index: gcc/cp/pph-streamer-in.c === --- gcc/cp/pph-streamer-in.c(revision 180705) +++ gcc/cp/pph-streamer-in.c(working copy) @@ -677,6 +677,66 @@ pph_in_tree_pair_vec (pph_stream *stream } +/* Test whether tree T is an element of vector V. */ + +static bool +pph_is_tree_element_of_vec (tree t, VEC(tree,gc) *v) +{ + unsigned i; + tree s; + FOR_EACH_VEC_ELT (tree, v, i, s) +if (s == t) + return true; + return false; +} + + +/* Return the union of two tree vecs. The argument vectors are unmodified. */ + +static VEC(tree,gc) * +pph_union_two_tree_vecs (VEC(tree,gc) *left, VEC(tree,gc) *right) +{ + /* FIXME pph: This O(left)+O(left*right) union may become a problem. + In the long run, we probably want to copy both into a hash table + and then copy the table into the result. 
*/ + unsigned i; + tree t; + VEC(tree,gc) *unioned = VEC_copy (tree, gc, left); + FOR_EACH_VEC_ELT (tree, right, i, t) +{ + if (!pph_is_tree_element_of_vec (t, left)) + VEC_safe_push (tree, gc, unioned, t); +} + return unioned; +} + + +/* Union FROM one tree vec with and INTO a tree vec. The INTO argument will + have an updated value. The FROM argument is no longer valid. */ + +static void +pph_union_into_tree_vec (VEC(tree,gc) **into, VEC(tree,gc) *from) +{ + if (!VEC_empty (tree, from)) +{ + if (*into == NULL) + *into = from; + else if (VEC_empty (tree, *into)) + { + VEC_free (tree, gc, *into); + *into = from; + } + else + { + VEC(tree,gc) *unioned = pph_union_two_tree_vecs (*into, from); + VEC_free (tree, gc, *into); + VEC_free (tree, gc, from); + *into = unioned; + } +} +} + + / chains */ @@ -967,7 +1027,6 @@ pph_in_binding_level_1 (pph_stream *stre struct bitpack_d bp; bl->this_entity = pph_in_tree (stream); - bl->static_decls = pph_in_tree_vec (stream); num = pph_in_uint (stream); bl->class_shadowed = NULL; @@ -1029,6 +1088,7 @@ pph_in_binding_level (pph_stream *stream bl->namespaces = pph_in_chain (stream); bl->usings = pph_in_chain (stream); bl->using_directives = pph_in_chain (stream); + bl->static_decls = pph_in_tree_vec (stream); pph_in_binding_level_1 (stream, bl); return bl; @@ -1051,12 +,13 @@ pph_in_binding_merge_keys (pph_stream *s /* Read all the merge bodies from STREAM into the cp_binding_level BL.
*/ static void -pph_in_binding_merge_bodies (pph_stream *stream) +pph_in_binding_merge_bodies (pph_stream *stream, cp_binding_level *bl) { pph_in_merge_body_chain (stream); pph_in_merge_body_chain (stream); pph_in_merge_body_chain (stream); pph_in_merge_body_chain (stream); + pph_union_into_tree_vec (&bl->static_decls, pph_in_tree_vec (stream)); } @@ -1951,11 +2012,11 @@ pph_in_merge_key_tree (pph_stream *strea { if (TREE_CODE (expr) == NAMESPACE_DECL) { - /* struct lang_decl *ld; */ - retrofit_lang_decl (expr); - /* ld = DECL_LANG_SPECIFIC (expr); */ - /* FIXME NOW: allocate binding. */ - pph_in_binding_merge_keys (stream, NAMESPACE_LEVEL (expr)); + cp_binding_level *bl; + retrofit_lang_decl (expr); + bl = ggc_alloc_cleared_cp_binding_level (); + NAMESPACE_LEVEL (expr) = bl; + pph_in_binding_merge_keys (stream, bl); } #if 0 /* FIXME pph: Disable type merging for the moment. */ @@ -2438,7 +2499,7 @@ pph_in_global_binding (pph_stream *strea same slot IX that the writer used, the trees read now will be bound to scope_chain->bindings. */ pph_in_binding_merge_keys (stream, bl); - pph_in_binding_merge_bodies (stream); + pph_in_binding_merge_bodies (stream, bl); /* FIXME
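The FIXME in `pph_union_two_tree_vecs` suggests replacing the quadratic membership scan with a hash table. Below is a sketch of that idea, using `std::vector` and `std::unordered_set` of pointers in place of `VEC(tree,gc)`. The `Node` type and the `union_vecs` name are hypothetical. The sketch keeps the semantics of the patch: the left elements come first, in order, followed by each right element that is not in the left vector. As in the original loop, membership is checked against the left vector only, so duplicates within the right vector are not removed. The expected cost is linear in the combined size rather than O(left*right).

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

struct Node { int id; };  // hypothetical stand-in for a tree node

// Union of two vectors of node pointers; the arguments are not modified.
// As in the original loop, membership is tested against LEFT only.
std::vector<const Node*> union_vecs(const std::vector<const Node*>& left,
                                    const std::vector<const Node*>& right) {
  std::unordered_set<const Node*> in_left(left.begin(), left.end());
  std::vector<const Node*> result(left);
  for (const Node* t : right)
    if (in_left.count(t) == 0) result.push_back(t);
  return result;
}
```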
Re: RFC: PATCH to adjust warning flags for C++
On Tue, Nov 1, 2011 at 8:11 PM, Jason Merrill ja...@redhat.com wrote: On 11/01/2011 03:48 PM, Gabriel Dos Reis wrote: On Tue, Nov 1, 2011 at 12:54 PM, Jason Merrillja...@redhat.com wrote: Paolo Carlini's patch to add -Wnarrowing to -Wc++0x-compat (and thus -Wall) broke bootstrap because of narrowing warnings, so I'd like to add -Wno-narrowing to the stage 2+ warning flags. Is this the best way to do that? why do we want to include -Wc++0x-compat in -Wall? It's already included. yes, that is why I asked. E.g. it isn't obvious that -Wc++0x-compat ought to be in -Wall at this stage or 4.6.x. And I think that your code won't work in C++11 is a warning that most C++ programmers will be interested in if they are asking for warnings. Even when -std=c++03 -Wall or -std=c++98 -Wall? I would suggest we do this: 1. Include -Wc++0x-compat in -W or -Wextra for THIS release. 2. leave -Wnarrowing in -Wc++0x-compat by default. 3. Make a release note that -Wc++0x-compat will be activated in the very major release.
[v3] tr2 missing bits
Ooops, noticed some minor bits when I was regenerating the docs. Some of the TR2 man pages needed munging, and the c++config bits for versioning TR2 needed to go in. tested x86/linux best, benjamin 2011-11-02 Benjamin Kosnik b...@redhat.com * include/bits/c++config: Add tr2 to versioned namespaces. * scripts/run_doxygen: Adjust generated man files as well. * testsuite/ext/profile/mutex_extensions_neg.cc: Adjust line numbers. diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index f77da5e..e76e742 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -148,6 +148,8 @@ namespace __detail { } } +namespace tr2 { } + namespace decimal { } namespace chrono { } @@ -197,6 +199,9 @@ namespace std namespace __detail { inline namespace __7 { } } } + namespace tr2 + { inline namespace __7 { } } + namespace decimal { inline namespace __7 { } } namespace chrono { inline namespace __7 { } } diff --git a/libstdc++-v3/scripts/run_doxygen b/libstdc++-v3/scripts/run_doxygen index 48b1724..3fef95f 100644 --- a/libstdc++-v3/scripts/run_doxygen +++ b/libstdc++-v3/scripts/run_doxygen @@ -339,6 +339,10 @@ for f in std_tr1_*; do newname=`echo $f | sed 's/^std_tr1_/std::tr1::/'` mv $f $newname done +for f in std_tr2_*; do +newname=`echo $f | sed 's/^std_tr2_/std::tr2::/'` +mv $f $newname +done for f in std_*; do newname=`echo $f | sed 's/^std_/std::/'` mv $f $newname diff --git a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc index 4e2d071..c6e6fea 100644 --- a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc +++ b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc @@ -25,4 +25,4 @@ #include <vector> -// { dg-error "multiple inlined namespaces" { target *-*-* } 258 } +// { dg-error "multiple inlined namespaces" { target *-*-* } 263 }
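The c++config change relies on inline namespaces. Names declared in an inline namespace are also found as members of the enclosing namespace, so `std::tr2::x` can resolve to a versioned `std::tr2::__7::x` without user code spelling out the version. Below is a minimal sketch with hypothetical names (`mylib`, `v7`) in place of the reserved `__7`.

```cpp
#include <cassert>

namespace mylib {
// Everything declared in the inline namespace is also visible as mylib::...
inline namespace v7 {
int version() { return 7; }
}  // namespace v7
}  // namespace mylib
```

Both spellings name the same function, so bumping the inline namespace changes the mangled symbols without changing the source that callers write.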
Re: [PATCH, rs6000] Preserve link stack for 476 cpus
On Tue, Nov 01, 2011 at 02:00:25PM -0500, Peter Bergner wrote: (get_ppc476_thunk_name): New function. (rs6000_code_end): Likewise. rs6000.c:27968:1: error: 'void rs6000_code_end()' defined but not used [-Werror=unused-function] cc1plus: all warnings being treated as errors Bootstrapped and committed as obvious, revision 180761. * config/rs6000/rs6000.c (rs6000_code_end): Declare ATTRIBUTE_UNUSED. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 180754) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1176,6 +1176,7 @@ static void rs6000_trampoline_init (rtx, static bool rs6000_cannot_force_const_mem (enum machine_mode, rtx); static bool rs6000_legitimate_constant_p (enum machine_mode, rtx); static bool rs6000_save_toc_in_prologue_p (void); +static void rs6000_code_end (void) ATTRIBUTE_UNUSED; /* Hash table stuff for keeping track of TOC entries. */ -- Alan Modra Australia Development Lab, IBM
Re: -fdump-go-spec option does not handle redefinitions
Uros Bizjak ubiz...@gmail.com writes: The problem with your proposal is that the output would be invalid Go, because it would attempt to define the name _aa twice. However, it does seem plausible that in most scenarios of this type it would be more useful for -fdump-go-spec to generate const _aa = 3 I agree. This patch implements this approach. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian 2011-11-01 Ian Lance Taylor i...@google.com * godump.c (struct macro_hash_value): Define. (macro_hash_hashval): New static function. (macro_hash_eq, macro_hash_del): New static functions. (go_define): Use macro_hash_value to store values in macro_hash. Replace an old value on a redefinition. Don't print anything to go_dump_file. (go_undef): Delete the entry from the hash table. (go_output_typedef): For an enum, use macro_hash_value, and don't print anything to go_dump_file. (go_print_macro): New static function. (go_finish): Traverse macro_hash with go_print_macro. (dump_go_spec_init): Update macro_hash creation for macro_hash_value. Index: godump.c === --- godump.c (revision 180342) +++ godump.c (working copy) @@ -62,7 +62,47 @@ static GTY(()) VEC(tree,gc) *queue; static htab_t macro_hash; -/* For the hash tables. */ +/* The type of a value in macro_hash. */ + +struct macro_hash_value +{ + /* The name stored in the hash table. */ + char *name; + /* The value of the macro. */ + char *value; +}; + +/* Calculate the hash value for an entry in the macro hash table. */ + +static hashval_t +macro_hash_hashval (const void *val) +{ + const struct macro_hash_value *mhval = (const struct macro_hash_value *) val; + return htab_hash_string (mhval->name); +} + +/* Compare values in the macro hash table for equality.
*/ + +static int +macro_hash_eq (const void *v1, const void *v2) +{ + const struct macro_hash_value *mhv1 = (const struct macro_hash_value *) v1; + const struct macro_hash_value *mhv2 = (const struct macro_hash_value *) v2; + return strcmp (mhv1->name, mhv2->name) == 0; +} + +/* Free values deleted from the macro hash table. */ + +static void +macro_hash_del (void *v) +{ + struct macro_hash_value *mhv = (struct macro_hash_value *) v; + XDELETEVEC (mhv->name); + XDELETEVEC (mhv->value); + XDELETE (mhv); +} + +/* For the string hash tables. */ static int string_hash_eq (const void *y1, const void *y2) @@ -77,10 +117,12 @@ go_define (unsigned int lineno, const ch { const char *p; const char *name_end; + size_t out_len; char *out_buffer; char *q; bool saw_operand; bool need_operand; + struct macro_hash_value *mhval; char *copy; hashval_t hashval; void **slot; @@ -105,17 +147,17 @@ go_define (unsigned int lineno, const ch memcpy (copy, buffer, name_end - buffer); copy[name_end - buffer] = '\0'; + mhval = XNEW (struct macro_hash_value); + mhval->name = copy; + mhval->value = NULL; + hashval = htab_hash_string (copy); - slot = htab_find_slot_with_hash (macro_hash, copy, hashval, NO_INSERT); - if (slot != NULL) -{ - XDELETEVEC (copy); - return; -} + slot = htab_find_slot_with_hash (macro_hash, mhval, hashval, NO_INSERT); /* For simplicity, we force all names to be hidden by adding an initial underscore, and let the user undo this as needed. */ - out_buffer = XNEWVEC (char, strlen (p) * 2 + 1); + out_len = strlen (p) * 2 + 1; + out_buffer = XNEWVEC (char, out_len); q = out_buffer; saw_operand = false; need_operand = false; @@ -141,6 +183,7 @@ go_define (unsigned int lineno, const ch don't worry about them.
*/ const char *start; char *n; + struct macro_hash_value idval; if (saw_operand) goto unknown; @@ -151,8 +194,9 @@ go_define (unsigned int lineno, const ch n = XALLOCAVEC (char, p - start + 1); memcpy (n, start, p - start); n[p - start] = '\0'; - slot = htab_find_slot (macro_hash, n, NO_INSERT); - if (slot == NULL || *slot == NULL) + idval.name = n; + idval.value = NULL; + if (htab_find (macro_hash, &idval) == NULL) { /* This is a reference to a name which was not defined as a macro. */ @@ -382,18 +426,30 @@ go_define (unsigned int lineno, const ch if (need_operand) goto unknown; + gcc_assert ((size_t) (q - out_buffer) < out_len); *q = '\0'; - slot = htab_find_slot_with_hash (macro_hash, copy, hashval, INSERT); - *slot = copy; + mhval->value = out_buffer; - fprintf (go_dump_file, "const _%s = %s\n", copy, out_buffer); + if (slot == NULL) +{ + slot = htab_find_slot_with_hash (macro_hash, mhval, hashval, INSERT); + gcc_assert (slot != NULL && *slot == NULL); +} + else +{ + if (*slot != NULL) + macro_hash_del (*slot); +} + + *slot = mhval; -
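The revised `-fdump-go-spec` logic defers output. Each `#define` replaces any earlier value, `#undef` removes the entry, and the constants are printed only at the end. Below is a sketch of that flow in C++, with `std::map` standing in for GCC's `htab_t`, so the output here is sorted by name, whereas the patch prints in hash-table traversal order. `MacroTable` and its methods are hypothetical, and the sketch skips the expression translation that `go_define` performs on macro bodies.

```cpp
#include <cassert>
#include <map>
#include <string>

class MacroTable {
 public:
  // A redefinition replaces the previous value instead of being ignored.
  void define(const std::string& name, const std::string& value) {
    values_[name] = value;
  }

  // An #undef deletes the entry, so nothing is printed for it.
  void undef(const std::string& name) { values_.erase(name); }

  // Emit one Go constant per surviving macro, hidden behind an underscore.
  std::string finish() const {
    std::string out;
    for (const auto& kv : values_)
      out += "const _" + kv.first + " = " + kv.second + "\n";
    return out;
  }

 private:
  std::map<std::string, std::string> values_;
};
```

Defining `aa` as 1 and then as 3 produces a single `const _aa = 3` line, which is the behaviour Uros asked for.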
Re: RFC: PATCH to adjust warning flags for C++
On 11/02/2011 12:05 AM, Gabriel Dos Reis wrote: And I think that your code won't work in C++11 is a warning that most C++ programmers will be interested in if they are asking for warnings. Even when -std=c++03 -Wall or -std=c++98 -Wall? Yes. -Wc++0x-compat has been part of -Wall for almost 5 years. If people don't want narrowing warnings, they can use -Wno-narrowing, which is helpfully mentioned in the warnings themselves. Jason
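For reference, here is a minimal example of the kind of code the narrowing warning is about (the function name is hypothetical). In C++98, an `int` in a brace initializer for an `unsigned char` array is converted silently. C++11 makes a non-constant narrowing conversion in that position ill-formed, which is what `-Wnarrowing` reports as part of `-Wc++0x-compat`. The explicit cast below states the truncation and is valid in both dialects.

```cpp
#include <cassert>

// In C++98 the following compiles silently, but in C++11 it is a narrowing
// error because n is not a constant known to fit in unsigned char:
//   unsigned char buf[] = { n };
// Writing the conversion explicitly is accepted by both dialects.
unsigned char first_byte(int n) {
  unsigned char buf[] = { static_cast<unsigned char>(n) };
  return buf[0];
}
```

The cast keeps the C++98 result: the value is reduced modulo 256, so 300 becomes 44.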