Re: -fdump-passes -fenable-xxx=func_name_list
This is the version of the patch that walks through pass lists. Ok with this one?

David

On Wed, Jun 1, 2011 at 12:45 PM, Xinliang David Li davi...@google.com wrote:
On Wed, Jun 1, 2011 at 12:29 PM, Richard Guenther richard.guent...@gmail.com wrote:
On Wed, Jun 1, 2011 at 6:16 PM, Xinliang David Li davi...@google.com wrote:
On Wed, Jun 1, 2011 at 1:51 AM, Richard Guenther richard.guent...@gmail.com wrote:
On Wed, Jun 1, 2011 at 1:34 AM, Xinliang David Li davi...@google.com wrote:

The following patch implements a new option that dumps the gcc PASS configuration. The sample output is attached. There is one limitation: some placeholder passes that are named with '*xxx' are not registered, thus they are not listed. They are not important as they can not be turned on/off anyway. The patch also enhances -fenable-xxx and -fdisable-xxx to allow a list of function assembler names to be specified. Ok for trunk?

Please split the patch. I'm not too happy how you dump the pass configuration. Why not simply, at a _single_ place, walk the pass tree? Instead of doing pieces of it at pass execution time when it's not already dumped - that really looks gross.

Yes, that was the original plan -- but it has problems:
1) the dumper needs to know the root pass lists -- which can change frequently -- it can be a long term maintenance burden;
2) the centralized dumper needs to be done after option processing;
3) not sure if gate functions have any side effects or have dependencies on cfun.
The proposed solution IMHO is not that intrusive -- just three hooks to do the dumping and tracking indentation.

Well, if you have a CU that is empty or optimized to nothing at some point you will not get a complete pass list. I suppose optimize attributes might also confuse output. Your solution might not be that intrusive but it is still ugly. I don't see 1) as an issue, for 2) you can just call the dumping from toplev_main before calling do_compile (), 3) gate functions shouldn't have side-effects, but as they could gate on optimize_for_speed () your option summary output will be bogus anyway. So - what is the output intended for if it isn't reliable?

This needs to be cleaned up at some point -- the gate function should behave the same for all functions and per-function decisions need to be pushed down to the executor body. I will try to rework the patch as you suggested to see if there are problems.

David

Richard.

The documentation should also link this option to the -fenable/disable options as obviously the pass names in that dump are those to be used for those flags (and not readily available anywhere else).

Ok.

I also think that it would be way more useful to note in the individual dump files the functions (at the place they would usually appear) that have the pass explicitly enabled/disabled.

Ok -- for ipa passes or tree/rtl passes where all functions are explicitly disabled.

Thanks, David

Richard.

Thanks, David

Attachments: dump-pass3.p (binary data), out (binary data)
Ping: [Patch] Make libstdc++'s abi_check more robust against readelf output format
Ping.

On 20 May 2011 17:05, Simon Baldwin sim...@google.com wrote:

Make libstdc++'s abi_check more robust against readelf output format.

libstdc++-abi/abi_check in the libstdc++-v3 testsuite relies on a fixed number of space separated fields in readelf output. However, the field count for readelf output can vary where the library contains OS or processor specific bindings, or other unknown bindings.

This patch replaces the strings that readelf outputs for such bindings with alternative strings that use underscores in place of space. It preserves the count of fields for such cases, and allows the awk statement that follows to find the desired field correctly with $n.

OK for trunk?

libstdc++-v3/ChangeLog:
2011-05-20  Simon Baldwin  sim...@google.com

    * scripts/extract_symvers.in: Handle processor/OS specific or unknown symbol binding strings from readelf.

Index: libstdc++-v3/scripts/extract_symvers.in
===================================================================
--- libstdc++-v3/scripts/extract_symvers.in (revision 173951)
+++ libstdc++-v3/scripts/extract_symvers.in (working copy)
@@ -52,6 +52,9 @@ SunOS)
   ${readelf} ${lib} |\
   sed -e 's/ \[<other>: [A-Fa-f0-9]*\] //' -e '/\.dynsym/,/^$/p;d' |\
   egrep -v ' (LOCAL|UND) ' |\
+  sed -e 's/ <processor specific>: / <processor_specific>:_/g' |\
+  sed -e 's/ <OS specific>: / <OS_specific>:_/g' |\
+  sed -e 's/ <unknown>: / <unknown>:_/g' |\
   awk '{ if ($4 == "FUNC" || $4 == "NOTYPE")
            printf "%s:%s\n", $4, $8;
          else if ($4 == "OBJECT" || $4 == "TLS")

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham Palace Road, London SW1W 9TQ | Registered in England Number: 3977902
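For readers following along, the field-count problem described in this message can be illustrated with a small sketch. It is not part of the patch: count_fields is a hypothetical helper that mimics awk's default whitespace splitting, and the sample symbol lines in the usage note are made up for illustration, not real readelf output.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): count whitespace-separated
   fields in LINE, the way awk splits $1..$n with its default separator.  */
int count_fields(const char *line)
{
    int fields = 0;
    int in_field = 0;

    for (size_t i = 0; line[i] != '\0'; i++) {
        if (isspace((unsigned char)line[i])) {
            in_field = 0;           /* a separator ends the current field */
        } else if (!in_field) {
            in_field = 1;           /* first character of a new field */
            fields++;
        }
    }
    return fields;
}
```

With an invented line such as "12: 00001000 8 FUNC GLOBAL DEFAULT 11 foo" the helper counts 8 fields, so the symbol name is field 8. If the binding column instead reads "<processor specific>: 13", the count grows to 10 and the name is no longer field 8; after the underscore substitution the binding is a single field again and the count is back to 8.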
Re: [PATCH][ARM] Add support for ADDW and SUBW instructions
Ping 2. On 20/04/11 16:27, Andrew Stubbs wrote: This patch adds basic support for the Thumb ADDW and SUBW instructions. The patch permits the compiler to use the new instructions for constants that can be loaded with a single instruction (i.e. 16-bit unshifted), but does not support use of addw with split-constants; I have a patch for that coming soon. This patch requires that my previously posted patch for MOVW is applied first. OK? Andrew
Re: [PATCH][ARM] Add support for ADDW and SUBW instructions
OK? This is largely OK modulo the following. Please remove the alternatives in the subsi3 pattern since that is just unnecessary. Please make the constraints internal only. cheers Ramana Andrew
Re: [PATCH, ARM] Thumb-2 12-bit immediates in ADD and SUB instructions
Would you include this in your patch? Or should we submit it as a separate patch?

Could you submit this as a follow-up patch that touches the costs? I would rather these changes also went in while we are looking at this area.

cheers
Ramana
Re: [patch] Improve detection of widening multiplication in the vectorizer
On 1 June 2011 15:14, Richard Guenther richard.guent...@gmail.com wrote:
On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen ira.ro...@linaro.org wrote:
On 1 June 2011 12:42, Richard Guenther richard.guent...@gmail.com wrote:

Did you think about moving pass_optimize_widening_mul before loop optimizations? Does that pass catch the cases you are teaching the pattern recognizer? I think we should try to expose these more complicated instructions to loop optimizers.

pass_optimize_widening_mul doesn't catch these cases, but I can try to teach it instead of the vectorizer.

I am now testing

Index: passes.c
===================================================================
--- passes.c (revision 174391)
+++ passes.c (working copy)
@@ -870,6 +870,7 @@
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
        {
          struct opt_pass **p = pass_tree_loop.pass.sub;
@@ -934,7 +935,6 @@
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);

to see how it affects other loop optimizations (vectorizer pattern tests obviously fail).

Looks like it needs copy_prop and dce as well:

Index: passes.c
===================================================================
--- passes.c (revision 174391)
+++ passes.c (working copy)
@@ -870,6 +870,9 @@
       NEXT_PASS (pass_split_crit_edges);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_copy_prop);
+      NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tree_loop);
        {
          struct opt_pass **p = pass_tree_loop.pass.sub;
@@ -934,7 +937,6 @@
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
-      NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_tail_calls);
       NEXT_PASS (pass_rename_ssa_copies);
       NEXT_PASS (pass_uncprop);

otherwise I get (on x86_64-suse-linux)

FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss
FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd

Ira

Thanks. I would hope that we eventually can get rid of the pattern recognizer ... at least for SSE there is also always a scalar variant instruction for each vectorized one.

Richard.
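As background for this thread, here is a minimal sketch of what a widening multiplication looks like at the source level. It is an illustration only and is not taken from the patch or its testcases; the function name widen_mult and the 16-bit to 32-bit choice are assumptions.

```c
#include <stdint.h>

/* Illustrative only: each 16-bit input is converted to 32 bits and the
   multiplication is done in the wider type, so the full product is kept.
   A loop of this shape is the kind of code a widening-multiply pattern
   would be expected to recognise.  */
void widen_mult(const int16_t *a, const int16_t *b, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}
```

For inputs 300 and 300 the product 90000 does not fit in 16 bits, which is why the result type has to be wider than the operand type.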
Re: introduce --param max-vartrack-expr-depth
On Wed, Jun 01, 2011 at 07:25:39PM -0300, Alexandre Oliva wrote:

Such as this one...

I'd appreciate if this could go in...

Index: gcc/params.def
===================================================================
--- gcc/params.def.orig 2011-05-31 18:28:05.348070586 -0300
+++ gcc/params.def 2011-06-01 17:09:41.117140944 -0300
@@ -845,7 +845,7 @@ DEFPARAM (PARAM_MAX_VARTRACK_SIZE,
 DEFPARAM (PARAM_MAX_VARTRACK_EXPR_DEPTH,
           "max-vartrack-expr-depth",
           "Max. recursion depth for expanding var tracking expressions",
-          10, 0, 0)
+          20, 0, 0)

 /* Set minimum insn uid for non-debug insns.  */
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c.orig 2011-05-31 20:06:25.604477956 -0300
+++ gcc/var-tracking.c 2011-05-31 23:56:06.578450957 -0300
@@ -5288,7 +5288,7 @@ reverse_op (rtx val, const_rtx expr)
   arg = XEXP (src, 1);
   if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)
     {
-      arg = cselib_expand_value_rtx (arg, scratch_regs, EXPR_DEPTH);
+      arg = cselib_expand_value_rtx (arg, scratch_regs, 5);
       if (arg == NULL_RTX)
         return NULL_RTX;
       if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)

Jakub
Re: [PATCH][ARM] Add support for ADDW and SUBW instructions
On 02/06/11 09:23, Ramana Radhakrishnan wrote:

Please remove the alternatives in the subsi3 pattern since that is just unnecessary. Please make the constraints internal only.

Is this better?

Andrew

2011-06-02  Andrew Stubbs  a...@codesourcery.com

    gcc/
    * config/arm/arm-protos.h (const_ok_for_op): Add prototype.
    * config/arm/arm.c (const_ok_for_op): Add support for addw/subw.
    Remove prototype. Remove static function type.
    * config/arm/arm.md (*arm_addsi3): Add addw/subw support.
    Add arch attribute.
    * config/arm/constraints.md (Pj, PJ): New constraints.

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -46,6 +46,7 @@ extern bool arm_vector_mode_supported_p (enum machine_mode);
 extern bool arm_small_register_classes_for_mode_p (enum machine_mode);
 extern int arm_hard_regno_mode_ok (unsigned int, enum machine_mode);
 extern int const_ok_for_arm (HOST_WIDE_INT);
+extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx,
                                HOST_WIDE_INT, rtx, rtx, int);
 extern RTX_CODE arm_canonicalize_comparison (RTX_CODE, rtx *, rtx *);
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -82,7 +82,6 @@ inline static int thumb1_index_register_rtx_p (rtx, int);
 static bool arm_legitimate_address_p (enum machine_mode, rtx, bool);
 static int thumb_far_jump_used_p (void);
 static bool thumb_force_lr_save (void);
-static int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 static rtx emit_sfm (int, int);
 static unsigned arm_size_return_regs (void);
 static bool arm_assemble_integer (rtx, unsigned int, int);
@@ -2149,7 +2148,7 @@ const_ok_for_arm (HOST_WIDE_INT i)
 }

 /* Return true if I is a valid constant for the operation CODE.  */
-static int
+int
 const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
 {
   if (const_ok_for_arm (i))
@@ -2165,6 +2164,13 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
       return 0;

     case PLUS:
+      /* See if we can use addw or subw.  */
+      if (TARGET_THUMB2
+          && ((i & 0xfffff000) == 0
+              || ((-i) & 0xfffff000) == 0))
+        return 1;
+      /* else fall through.  */
+
     case COMPARE:
     case EQ:
     case NE:
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -707,21 +707,24 @@
 ;;  (plus (reg rN) (reg sp)) into (reg rN).  In this case reload will
 ;; put the duplicated register first, and not try the commutative version.
 (define_insn_and_split "*arm_addsi3"
-  [(set (match_operand:SI          0 "s_register_operand" "=r, k,r,r, k,r")
-        (plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k,rk")
-                 (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,L, L,?n")))]
+  [(set (match_operand:SI          0 "s_register_operand" "=r, k,r,r, k, r, k,r, k, r")
+        (plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k, rk,k,rk,k, rk")
+                 (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,Pj,Pj,L, L,PJ,PJ,?n")))]
   "TARGET_32BIT"
   "@
    add%?\\t%0, %1, %2
    add%?\\t%0, %1, %2
    add%?\\t%0, %2, %1
+   addw%?\\t%0, %1, %2
+   addw%?\\t%0, %1, %2
    sub%?\\t%0, %1, #%n2
    sub%?\\t%0, %1, #%n2
+   subw%?\\t%0, %1, #%n2
+   subw%?\\t%0, %1, #%n2
    #"
   "TARGET_32BIT
    && GET_CODE (operands[2]) == CONST_INT
-   && !(const_ok_for_arm (INTVAL (operands[2]))
-        || const_ok_for_arm (-INTVAL (operands[2])))
+   && !const_ok_for_op (INTVAL (operands[2]), PLUS)
    && (reload_completed || !arm_eliminable_register (operands[1]))"
   [(clobber (const_int 0))]
@@ -730,8 +733,9 @@
                       operands[1], 0);
   DONE;
   "
-  [(set_attr "length" "4,4,4,4,4,16")
-   (set_attr "predicable" "yes")]
+  [(set_attr "length" "4,4,4,4,4,4,4,4,4,16")
+   (set_attr "predicable" "yes")
+   (set_attr "arch" "*,*,*,t2,t2,*,*,t2,t2,*")]
 )

 (define_insn_and_split "*thumb1_addsi3"
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -31,7 +31,7 @@
 ;; The following multi-letter normal constraints have been used:
 ;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
-;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
@@ -75,6 +75,18 @@
       (and (match_code "const_int")
            (match_test "(ival & 0xffff0000) == 0"))))

+(define_constraint "Pj"
+  "@internal A 12-bit constant suitable for an ADDW or SUBW instruction. (Thumb-2)"
+  (and (match_code "const_int")
+       (and (match_test "TARGET_THUMB2")
+            (match_test "(ival & 0xfffff000) == 0"))))
+
+(define_constraint "PJ"
+  "@internal A constant that satisfies the Pj constraint if negated."
+  (and (match_code "const_int")
+       (and (match_test "TARGET_THUMB2")
+            (match_test "((-ival) & 0xfffff000) == 0"))))
+
 (define_register_constraint "k" "STACK_REG"
  "@internal The stack register.")
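The 12-bit immediate test behind the Pj and PJ constraints can be tried out in isolation. The following is a standalone sketch, not GCC code: fits_imm12, addw_ok and subw_ok are hypothetical names, and the input is taken as a 32-bit value that is widened before negation so that negating the most negative value is well defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if VALUE fits an unshifted 12-bit
   immediate field, i.e. lies in the range 0..4095.  */
bool fits_imm12(int64_t value)
{
    return (value & ~(int64_t)0xfff) == 0;
}

/* Pj-like test: the constant itself can be added with ADDW.  */
bool addw_ok(int32_t i)
{
    return fits_imm12((int64_t)i);
}

/* PJ-like test: the negated constant fits, so adding I can be
   expressed as subtracting -I with SUBW.  */
bool subw_ok(int32_t i)
{
    return fits_imm12(-(int64_t)i);
}
```

Under these assumptions, 4095 passes the ADDW test and 4096 does not, while -4095 passes only the SUBW test.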
Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate
On Thu, 2011-05-26 at 14:35 +0100, Andrew Stubbs wrote: On 25/05/11 14:47, Joseph S. Myers wrote: The shift must be by a positive constant amount, strictly less than the precision (GET_MODE_PRECISION) of the mode (of the value being shifted). If that applies, the relevant number of bits is the precision of the mode minus the number of bits of the shift. For an extension, just take the number of bits in the inner mode. Add the two numbers of bits; if the result does not exceed the number of bits in the mode (of the operands and the multiplication) then the multiplication won't overflow. I believe the attached should implement what you describe. Is the patch OK now? Andrew OK. R.
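The bit-counting rule quoted above can be written down as a short sketch. This is only an illustration of the arithmetic, not the patch itself: the function names are hypothetical, and the shift is assumed to be a right shift by a constant.

```c
#include <stdbool.h>

/* Bits that can still be significant after shifting a PRECISION-bit
   value by SHIFT.  Returns 0 when the rule does not apply, i.e. the
   shift is not positive or not strictly less than the precision.  */
int bits_after_shift(int precision, int shift)
{
    if (shift <= 0 || shift >= precision)
        return 0;
    return precision - shift;
}

/* For an extension, the relevant width is that of the inner mode.  */
int bits_after_extension(int inner_bits)
{
    return inner_bits;
}

/* The product cannot overflow when the two operand widths together do
   not exceed the width of the mode of the multiplication.  */
bool product_fits(int bits_a, int bits_b, int mode_bits)
{
    if (bits_a <= 0 || bits_b <= 0)
        return false;   /* width unknown: make no claim */
    return bits_a + bits_b <= mode_bits;
}
```

For example, two operands extended from 16 bits need at most 32 bits, so a 64-bit multiplication of them cannot overflow; an operand shifted by 16 in a 64-bit mode keeps 48 bits, and 48 + 32 exceeds 64, so nothing can be concluded.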
Re: [patch] Improve detection of widening multiplication in the vectorizer
On Thu, Jun 2, 2011 at 10:46 AM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 15:14, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 12:42, Richard Guenther richard.guent...@gmail.com wrote: Did you think about moving pass_optimize_widening_mul before loop optimizations? Does that pass catch the cases you are teaching the pattern recognizer? I think we should try to expose these more complicated instructions to loop optimizers. pass_optimize_widening_mul doesn't catch these cases, but I can try to teach it instead of the vectorizer. I am now testing Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,7 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +935,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); to see how it affects other loop optimizations (vectorizer pattern tests obviously fail). 
Looks like it needs copy_prop and dce as well: Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,9 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_dce); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +937,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); otherwise I get (on x86_64-suse-linux) FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd Hmm. I would have put the pass next to the sincos pass, but yes, in principle a copyprop dce pass after PRE makes sense (the loop passes likely don't run because there are no loops in those testcases - both copyprop and dce should be scheduled more like TODOs, or even automatically by the pass manager via PROPs ...). Dead code can indeed confuse those matching passes that look for single-use vars. I'll think about a more elegant solution for this problem. Would you mind checking if the next-to-sincos position makes any difference? Thanks, Richard. Ira Thanks. I would hope that we eventually can get rid of the pattern recognizer ... at least for SSE there is also always a scalar variant instruction for each vectorized one. Richard.
Re: introduce --param max-vartrack-expr-depth
On 06/02/2011 10:46 AM, Jakub Jelinek wrote: On Wed, Jun 01, 2011 at 07:25:39PM -0300, Alexandre Oliva wrote: Such as this one... I'd appreciate if this could go in... Go on then. Bernd
Re: Ping: [Patch] Make libstdc++'s abi_check more robust against readelf output format
Hi, Ping. Did Ian Taylor see this patch? If he likes it, I'm also fine with it. Paolo
Re: Add missing ChangeLog entry
On 06/01/11 15:32, Ian Lance Taylor wrote:

I noticed that we have a --with-specs option in gcc/configure.ac, added in revision 155208 with this e-mail message: http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00132.html

sorry about that. How's the attached documentation?

--
Nathan Sidwell

2011-06-02  Nathan Sidwell  nat...@codesourcery.com

    * doc/install.texi (Options specification): Document --with-specs.

Index: doc/install.texi
===================================================================
--- doc/install.texi (revision 174559)
+++ doc/install.texi (working copy)
@@ -771,6 +771,12 @@
 on other configuration options, and differs between cross and native
 configurations.

+@item --with-specs=@var{specs}
+Specify additional command line driver SPECS. This can be useful if
+you to turn on a non-standard feature by default without modifying the
+compiler's source code, for instance
+@option{--with-specs=%@{!fcommon:%@{!fno-common:-fno-common@}@}}.
+
 @end table

 @item --program-prefix=@var{prefix}
Re: RFA: another patch to solve PR49154
On Tue, 31 May 2011, Richard Sandiford wrote:

Gah, seems like I'd forgotten the no subclasses bit by the time I started looking at code. Sorry for the false alarm.

Still, the extra look made me realise that I should have restricted that statement to allocatable registers. (And I really do appreciate a look from a native speaker.)

Updated patch follows, checked dvi and info output:

    * doc/tm.texi.in (Register Classes): Document rule for the narrowest register classes.
    * doc/tm.texi: Regenerate.

Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in (revision 174376)
+++ doc/tm.texi.in (working copy)
@@ -2327,6 +2327,12 @@ constraints is through machine-dependent
 You can define such letters to correspond to various classes, then use
 them in operand constraints.

+You must define the narrowest register classes for allocatable
+registers, so that each class either has no subclasses, or that for
+some mode, the move cost between registers within the class is
+cheaper than moving a register in the class to or from memory
+(@pxref{Costs}).
+
 You should define a class for the union of two classes whenever some
 instruction allows both classes. For example, if an instruction allows
 either a floating point (coprocessor) register or a general register for a

brgds, H-P
Re: [PATCH][ARM] Add support for ADDW and SUBW instructions
On 2 June 2011 10:03, Andrew Stubbs a...@codesourcery.com wrote: On 02/06/11 09:23, Ramana Radhakrishnan wrote: Please remove the alternatives in the subsi3 pattern since that is just unnecessary. Please make the constraints internal only. Is this better? OK. Ramana Andrew
Re: Add missing ChangeLog entry
On Thu, 02 Jun 2011 11:12:12 +0100 Nathan Sidwell nat...@codesourcery.com wrote: On 06/01/11 15:32, Ian Lance Taylor wrote: I noticed that we have a --with-specs option in gcc/configure.ac, added in revision 155208 with this e-mail message: http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00132.html sorry about that. How's the attachd documentation? [...] +@item --with-specs=@var{specs} +Specify additional command line driver SPECS. This can be useful if +you to turn on a non-standard feature by default without modifying the +compiler's source code, for instance +@option{--with-specs=%@{!fcommon:%@{!fno-common:-fno-common@}@}}. I am not a native English speaker, and my english is bad. But perhaps it should be this can be useful if you *want* to turn on I feel that the 'want' word is missing... Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
Remove SETJMP_VIA_SAVE_AREA support
This removes the (undocumented) support for SETJMP_VIA_SAVE_AREA from the compiler. This is a trick implemented on the SPARC exclusively to reuse the register save area present in all frames (because of the register windows) for part of the setjmp buffer. The benefits are marginal and dwarfed by the usual drawbacks of using setjmp/longjmp (to implement exceptions for example).

This exposed a couple of similar bugs in cse.c and postreload-gcse.c: the code was effectively treating a basic block with a single, abnormal incoming edge as if the edge was normal.

Bootstrapped/regtested on x86_64-suse-linux and sparc-sun-solaris2.10. I also verified that ACATS is clean with the SJLJ EH scheme. Applied on the mainline.

2011-06-02  Eric Botcazou  ebotca...@adacore.com

    * function.h (struct stack_usage): Remove dynamic_alloc_count field.
    (current_function_dynamic_alloc_count): Delete.
    * builtins.c (expand_builtin_setjmp_setup): Do not set calls_setjmp.
    (expand_builtin_nonlocal_goto): Remove obsolete comment.
    (expand_builtin_update_setjmp_buf): Remove dead code.
    * cse.c (cse_find_path): Do not follow a single abnormal incoming edge.
    * explow.c (allocate_dynamic_stack_space): Remove SETJMP_VIA_SAVE_AREA support.
    * function.c (instantiate_virtual_regs): Likewise.
    * postreload-gcse.c (bb_has_well_behaved_predecessors): Return false for a block with a single abnormal incoming edge.
    * config/sparc/sparc.h (STACK_SAVEAREA_MODE): Define.
    * config/sparc/sparc-protos.h (load_got_register): Declare.
    * config/sparc/sparc.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
    (load_got_register): Make global.
    (sparc_frame_pointer_required): Add 'static'.
    (sparc_can_eliminate): Likewise. Call sparc_frame_pointer_required.
    (sparc_builtin_setjmp_frame_value): New function.
    * config/sparc/sparc.md (UNSPECV_SETJMP): Remove.
    (save_stack_nonlocal): New expander.
    (restore_stack_nonlocal): Likewise.
    (nonlocal_goto): Remove modes, adjust predicates and reimplement.
    (nonlocal_goto_internal): New insn.
    (goto_handler_and_restore): Delete.
    (builtin_setjmp_setup): Likewise.
    (do_builtin_setjmp_setup): Likewise.
    (setjmp): Likewise.
    (builtin_setjmp_receiver): New expander.

--
Eric Botcazou

Index: function.h
===================================================================
--- function.h (revision 174559)
+++ function.h (working copy)
@@ -1,6 +1,6 @@
 /* Structure for saving state for a nested function.
    Copyright (C) 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   1999, 2000, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
    Free Software Foundation, Inc.

 This file is part of GCC.
@@ -476,9 +476,6 @@ struct GTY(()) stack_usage
      !ACCUMULATE_OUTGOING_ARGS, it contains the outgoing arguments.  */
   int pushed_stack_size;

-  /* # of dynamic allocations in the function.  */
-  unsigned int dynamic_alloc_count : 31;
-
   /* Nonzero if the amount of stack space allocated dynamically cannot
      be bounded at compile-time.  */
   unsigned int has_unbounded_dynamic_stack_size : 1;
@@ -487,7 +484,6 @@ struct GTY(()) stack_usage
 #define current_function_static_stack_size (cfun->su->static_stack_size)
 #define current_function_dynamic_stack_size (cfun->su->dynamic_stack_size)
 #define current_function_pushed_stack_size (cfun->su->pushed_stack_size)
-#define current_function_dynamic_alloc_count (cfun->su->dynamic_alloc_count)
 #define current_function_has_unbounded_dynamic_stack_size \
   (cfun->su->has_unbounded_dynamic_stack_size)
 #define current_function_allocates_dynamic_stack_space \
Index: builtins.c
===================================================================
--- builtins.c (revision 174559)
+++ builtins.c (working copy)
@@ -806,10 +806,6 @@ expand_builtin_setjmp_setup (rtx buf_add
     emit_insn (gen_builtin_setjmp_setup (buf_addr));
 #endif

-  /* Tell optimize_save_area_alloca that extra work is going to
-     need to go on during alloca.  */
-  cfun->calls_setjmp = 1;
-
   /* We have a nonlocal label.  */
   cfun->has_nonlocal_label = 1;
 }
@@ -992,8 +988,8 @@ expand_builtin_nonlocal_goto (tree exp)
   r_label = convert_memory_address (Pmode, r_label);
   r_save_area = expand_normal (t_save_area);
   r_save_area = convert_memory_address (Pmode, r_save_area);
-  /* Copy the address of the save location to a register just in case it was based
-    on the frame pointer.  */
+  /* Copy the address of the save location to a register just in case it was
+     based on the frame pointer.  */
   r_save_area = copy_to_reg (r_save_area);
   r_fp = gen_rtx_MEM (Pmode, r_save_area);
   r_sp = gen_rtx_MEM (STACK_SAVEAREA_MODE (SAVE_NONLOCAL),
@@ -1013,11 +1009,7 @@ expand_builtin_nonlocal_goto (tree exp)
   emit_clobber (gen_rtx_MEM
Re: [patch] Improve detection of widening multiplication in the vectorizer
On 2 June 2011 12:59, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 10:46 AM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 15:14, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 12:42, Richard Guenther richard.guent...@gmail.com wrote: Did you think about moving pass_optimize_widening_mul before loop optimizations? Does that pass catch the cases you are teaching the pattern recognizer? I think we should try to expose these more complicated instructions to loop optimizers. pass_optimize_widening_mul doesn't catch these cases, but I can try to teach it instead of the vectorizer. I am now testing Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,7 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +935,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); to see how it affects other loop optimizations (vectorizer pattern tests obviously fail). 
Looks like it needs copy_prop and dce as well: Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,9 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_dce); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +937,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); otherwise I get (on x86_64-suse-linux) FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd Hmm. I would have put the pass next to the sincos pass, but yes, in principle a copyprop dce pass after PRE makes sense (the loop passes likely don't run because there are no loops in those testcases - both copyprop and dce should be scheduled more like TODOs, or even automatically by the pass manager via PROPs ...). Dead code can indeed confuse those matching passes that look for single-use vars. I'll think about a more elegant solution for this problem. Would you mind checking if the next-to-sincos position makes any difference? 
Before sincos we have

  D.2747_2 = __builtin_powf (a_1(D), 2.0e+0);
  D.2746_4 = D.2747_2 + c_3(D);

which is transformed by sincos to

  powmult.8_7 = a_1(D) * a_1(D);
  D.2747_2 = powmult.8_7;
  D.2746_4 = D.2747_2 + c_3(D);

but widening_mul is confused by D.2747_2 = powmult.8_7; and it needs both copy_prop and dce to remove it:

  powmult.8_7 = a_1(D) * a_1(D);
  D.2746_4 = c_3(D) + powmult.8_7;

So moving widening_mul next to sincos doesn't help. Maybe gimple_expand_builtin_pow() can be changed to generate the last version by itself?

Ira

Thanks, Richard.

Ira

Thanks. I would hope that we eventually can get rid of the pattern recognizer ... at least for SSE there is also always a scalar variant instruction for each vectorized one.

Richard.
Re: Ping: [Patch] Make libstdc++'s abi_check more robust against readelf output format
On 2 June 2011 08:55, Simon Baldwin wrote:

Index: libstdc++-v3/scripts/extract_symvers.in
===================================================================
--- libstdc++-v3/scripts/extract_symvers.in (revision 173951)
+++ libstdc++-v3/scripts/extract_symvers.in (working copy)
@@ -52,6 +52,9 @@ SunOS)
   ${readelf} ${lib} |\
   sed -e 's/ \[<other>: [A-Fa-f0-9]*\] //' -e '/\.dynsym/,/^$/p;d' |\
   egrep -v ' (LOCAL|UND) ' |\
+  sed -e 's/ <processor specific>: / <processor_specific>:_/g' |\
+  sed -e 's/ <OS specific>: / <OS_specific>:_/g' |\
+  sed -e 's/ <unknown>: / <unknown>:_/g' |\

Is there a reason to use three sed processes instead of one? We already assume that sed -e script -e script works earlier in that pipeline.

We could even replace the egrep with a sed 'd' command and combine it all into a single sed, but that could be left for another day.
Re: Initialize INSN_COND
On 06/02/2011 01:29 PM, Alexander Monakov wrote: Bernd, The problem is INSN_COND should be reset when initializing a new deps structure, otherwise instructions may get stale conditions from other previously analyzed instructions. Presuming that sd_init_insn is the proper place for that, I'll test the following patch. 2011-06-02 Alexander Monakov amona...@ispras.ru * sched-deps.c (sd_init_insn): Initialize INSN_COND. * sel-sched.c (move_op): Use correct type for 'res'. Verify that code_motion_path_driver returned 0 or 1. Ok. Although I wonder how sel-sched can end up reusing an entry in h_d_i_d? How does it use this machinery? If it's not doing a normal forward scan as in sched_analyze, the INSN_COND mechanism may break in other ways. Bernd
Re: [google] Improve locus information during if-conversion (issue4526101)
On Wed, Jun 1, 2011 at 21:03, Sharad Singhai sing...@google.com wrote: 2011-06-01 Sharad Singhai sing...@google.com Google Ref 39994 * ifcvt.c (noce_try_cmove_arith): Use the locus information from the if-statement rather than the then path. Could you elaborate on how it improves locus information? Is there a test case you can add to the testsuite? Or an example code fragment that shows how the locus is better now? OK for google/main. Diego.
[PATCH] make __attribute__((returns_twice)) actually work (PR tree-optimization/49243)
GCC has __attribute__((returns_twice)) which is supposed to allow the safe use of alternate implementations of setjmp-like functions. In particular, a function that calls a setjmp-like function must itself not be inlined, because that would enable unsafe optimizations. This works for calls to setjmp (a few alternate spellings are allowed), but not to e.g. my_setjmp even if that function is declared with __attribute__((returns_twice)). This bug affects the entire gcc-4.x series; gcc-3.x worked. See PR49243.

A function that calls setjmp is marked non-inlinable because setjmp_call_p is applied to the function position, and it deduces via special_function_p that the callee is ECF_RETURNS_TWICE. But special_function_p only looks at the name, so setjmp_call_p fails to detect __attribute__((returns_twice)) callees.

The fix is to have setjmp_call_p also check if the returns_twice attribute is present, via DECL_IS_RETURNS_TWICE. It could call flags_from_decl_or_type instead, but that would perform quite a bit of redundant work for this case.

The test case uses -Winline to check that gcc refuses to inline a function that calls a returns_twice callee. This is sufficient to verify the fix, and avoids the machine-specific code needed in the original runtime test case.

Tested w/o regressions with gcc trunk and 4.6 on x86_64-linux. The added test case does fail without the fix and pass with it. OK for trunk, and perhaps 4.6? (I don't have svn write access.)

/Mikael

gcc/
2011-06-02 Mikael Pettersson mi...@it.uu.se

  PR tree-optimization/49243
  * calls.c (setjmp_call_p): Also check if fndecl has the returns_twice attribute.

gcc/testsuite/
2011-06-02 Mikael Pettersson mi...@it.uu.se

  PR tree-optimization/49243
  * gcc.dg/pr49243.c: New.
--- gcc-4.7-20110528/gcc/calls.c.~1~ 2011-05-25 13:00:14.0 +0200
+++ gcc-4.7-20110528/gcc/calls.c 2011-06-02 12:55:32.0 +0200
@@ -554,6 +554,8 @@ special_function_p (const_tree fndecl, i
 int
 setjmp_call_p (const_tree fndecl)
 {
+  if (DECL_IS_RETURNS_TWICE (fndecl))
+    return ECF_RETURNS_TWICE;
   return special_function_p (fndecl, 0) & ECF_RETURNS_TWICE;
 }
--- gcc-4.7-20110528/gcc/testsuite/gcc.dg/pr49243.c.~1~ 1970-01-01 01:00:00.0 +0100
+++ gcc-4.7-20110528/gcc/testsuite/gcc.dg/pr49243.c 2011-06-02 12:55:32.0 +0200
@@ -0,0 +1,25 @@
+/* PR tree-optimization/49243 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Winline" } */
+
+extern unsigned long jb[];
+extern int my_setjmp(unsigned long jb[]) __attribute__((returns_twice));
+extern int decode(const char*);
+
+static inline int wrapper(const char **s_ptr) /* { dg-warning "(inlining failed|function 'wrapper' can never be inlined because it uses setjmp)" } */
+{
+    if (my_setjmp(jb) == 0) {
+        const char *s = *s_ptr;
+        while (decode(s) != 0)
+            *s_ptr = ++s;
+        return 0;
+    } else
+        return -1;
+}
+
+void parse(const char *data)
+{
+    const char *s = data;
+    if (!(wrapper(&s) == -1 && (s - data) == 1)) /* { dg-warning "called from here" } */
+        __builtin_abort();
+}
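For readers who want the shape of the fix without the GCC internals, here is a minimal sketch, not GCC code. It models a declaration as a plain struct and contrasts name-only detection with detection that also honours the attribute. The struct fn_decl type, the helper names and the list of setjmp spellings are all assumptions made for illustration; only the idea (check the attribute first, then fall back to the name) comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a function declaration; this is not
   GCC's tree type. */
struct fn_decl {
    const char *name;
    bool returns_twice_attr;   /* models DECL_IS_RETURNS_TWICE */
};

/* Name-only detection, a simplified model of what special_function_p
   does.  The spellings listed here are illustrative, not GCC's list. */
static bool name_is_setjmp_like(const struct fn_decl *d)
{
    static const char *const names[] = { "setjmp", "_setjmp", "sigsetjmp" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(d->name, names[i]) == 0)
            return true;
    return false;
}

/* Before the fix: only the name is consulted. */
static bool setjmp_call_before(const struct fn_decl *d)
{
    return name_is_setjmp_like(d);
}

/* After the fix: the attribute is checked first, then the name. */
static bool setjmp_call_after(const struct fn_decl *d)
{
    if (d->returns_twice_attr)
        return true;
    return name_is_setjmp_like(d);
}

/* Self-check for the case the patch addresses. */
static void self_check(void)
{
    struct fn_decl my_setjmp = { "my_setjmp", true };
    assert(!setjmp_call_before(&my_setjmp));  /* missed by name matching */
    assert(setjmp_call_after(&my_setjmp));    /* caught via the attribute */
}
```

Calling self_check() from a main function exercises both variants; a declaration literally named setjmp is accepted by either one.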
Re: [google] Improve locus information during if-conversion (issue4526101)
On Thu, Jun 2, 2011 at 08:46, Steven Bosscher stevenb@gmail.com wrote: On Thu, Jun 2, 2011 at 1:46 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Jun 1, 2011 at 21:03, Sharad Singhai sing...@google.com wrote: 2011-06-01 Sharad Singhai sing...@google.com Google Ref 39994 * ifcvt.c (noce_try_cmove_arith): Use the locus information from the if-statement rather than the then path. Could you elaborate on how it improves locus information? Is there a test case you can add to the testsuite? Or an example code fragment that shows how the locus is better now? OK for google/main. Why can't this patch just go onto the trunk? Yes. Every patch submitted for google/main also applies to trunk. I generally try to avoid approving patches that are not inside my maintenance areas for trunk (unless the patch is obvious). Sharad simply forgot to request trunk approval for this patch and I forgot to remind him. Sharad, could you mark future patches? (Idem for some other google/main patches -- is there a merge plan??) The merge plan is to submit patches to trunk. They are quickly moved into google/main for scheduling reasons, but everything going to google/main is automatically assumed to apply to trunk as well. The only patches that we don't necessarily mean to apply to trunk are the ones we put in google/integration (though some patches have already been moved or approved for trunk). Diego.
Re: [PATCH, ARM] Fix ABI for double-precision helpers on single-float-only CPUs
gcc/ * config/arm/arm.c (arm_libcall_uses_aapcs_base) (arm_init_cumulative_args): Use correct ABI for double-precision helper functions in hard-float mode if only single-precision arithmetic is supported in hardware. Ok, though I'd add a bit more explanation to the comments: Technically the same is true for the single precision helpers. However all targets that support the hard-float ABI implement single-precision in hardware, so this never occurs in practice. Paul
Re: [PATCH][all-langs] Defer size_t and sizetype setting to the middle-end
On Wed, Jun 1, 2011 at 14:34, Richard Guenther rguent...@suse.de wrote: This patch defers the control over size_t and sizetype to the middle-end which in turn consults the target. This removes various inconsistencies for frontends that do not seem to care about size_t and will allow simplifying the global tree initialization. Bootstrapped on x86_64-unknown-linux-gnu for all languages, testing in progress. Ok for trunk? (the change is worthwhile from an LTO and middle-end perspective and I'll apply leeway to frontends that appear to be unmaintained - hello Java) Fortran parts are ok. -- Janne Blomqvist
[gcc patch 0/3] libiberty: New DMGL_RET_DROP
Hi,

this series introduces DMGL_RET_DROP, which suppresses the return type that would otherwise be demangled from the linkage name for the toplevel function type. DMGL_RET_POSTFIX is now in use only for DMGL_JAVA.

Besides Java, return types are present in C++ linkage names for function templates. GDB since 7.2 provides a convenience alias for function template symbols without the return type, so that for

00400523 T _Z4funcIdET_i
00400523 T double func<double>(int)

one can, since GDB 7.2, easily find template functions by name using:

(gdb) break fun<tab-completion>

instead of having to guess the return type first, as in GDB 7.1:

(gdb) break 'double fun<tab-completion>

As the demangler usage has been reintroduced for GDB 7.3 (to fix GDB PR 12506 and similar cases by using DW_AT_linkage_name again), the return type now needs to be dropped by the demangler (instead of by the GDB 7.2 custom physname code).

The return type in a function template's linkage name is a case similar to DMGL_JAVA, which uses DMGL_RET_POSTFIX:

jmain.main(java.lang.String[])void

I believe both cases should use either DMGL_RET_POSTFIX or the new DMGL_RET_DROP, the latter giving just:

jmain.main(java.lang.String[])

For C++ I (and also Tom Tromey) prefer DMGL_RET_DROP:

func<double>(int)

over DMGL_RET_POSTFIX:

func<double>(int)double

as in practice there are no two template function instances with the same name and parameter signature but a different return type; one cannot overload a function by return type in either C++ or Java. G++ rejects compilation of a CU containing two such instances; one can only link two different CUs together to get linkage names that differ only in the return type in a single file. After all, one can still reference the function by its original ELF symbol 'double func<double>(int)'.

I am proposing in a different GDB patch to use DMGL_RET_DROP even for Java symbols, but I do not have any real need for it.

This patchset had no GCC regressions for Fedora gcc-4.6.0-8.fc15 (not tested for GCC HEAD, hopefully OK).
The new testcases are based on a C++ source:

template <typename T> char outer (int (*inner) (long)) { return 0; }
int outer2_ret (long) { return 0; }
template <typename T> int (*outer2 (int (*inner) (long))) (long) { return outer2_ret; }
char outer (short (*inner) (int), long)
{
  outer2<short> (0);
  return outer<short> (0);
}

Thanks, Jan
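To make the three output shapes concrete, here is a small sketch of how a toplevel signature could be rendered under each option. It is not libiberty code: the SK_RET_POSTFIX and SK_RET_DROP bits and the render function are names invented for this illustration, and the real demangler works on a component tree rather than on ready-made strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option bits for this sketch; they are not the values
   from libiberty's demangle.h. */
enum { SK_RET_POSTFIX = 1 << 0, SK_RET_DROP = 1 << 1 };

/* Render a toplevel function signature into BUF according to OPTIONS.
   RET is the return type, NAME_AND_PARAMS the rest of the signature. */
static void render(char *buf, size_t size, const char *ret,
                   const char *name_and_params, int options)
{
    if (options & SK_RET_POSTFIX)
        snprintf(buf, size, "%s%s", name_and_params, ret);   /* type after */
    else if (options & SK_RET_DROP)
        snprintf(buf, size, "%s", name_and_params);          /* type omitted */
    else
        snprintf(buf, size, "%s %s", ret, name_and_params);  /* default */
}

/* Self-check using the func<double>(int) example from the cover letter. */
static void self_check(void)
{
    char buf[64];
    render(buf, sizeof buf, "double", "func<double>(int)", 0);
    assert(strcmp(buf, "double func<double>(int)") == 0);
    render(buf, sizeof buf, "double", "func<double>(int)", SK_RET_DROP);
    assert(strcmp(buf, "func<double>(int)") == 0);
}
```

With SK_RET_DROP the result is the form a user would type at a breakpoint prompt, which is the motivation given above.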
[gcc patch 3/3] cp-demangle.c: Fix DMGL_RET_POSTFIX for inner func types
Hi,

I do not need this patch in any way but I believe it should go in for the case anyone would want to use DMGL_RET_POSTFIX with C/C++. Without this fix the new testcase would:

FAIL at line 3979, options --format=gnu-v3 --ret-postfix:
in: _Z6outer2IsEPFilES1_
out: outer2<short>(int (*)(long))int (*(int (*)(long)))(long)
exp: outer2<short>(int (*)(long))int (*)(long)

Thanks, Jan

libiberty/
2011-05-24 Jan Kratochvil jan.kratoch...@redhat.com

  * cp-demangle.c (d_print_comp) <DEMANGLE_COMPONENT_FUNCTION_TYPE>: Suppress d_print_mod for DMGL_RET_POSTFIX.
  * testsuite/demangle-expected: New testcases for --ret-postfix.

--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -3921,7 +3921,10 @@ d_print_comp (struct d_print_info *dpi, const struct demangle_component *dc,
                       options & ~(DMGL_RET_POSTFIX | DMGL_RET_DROP));

         /* Print return type if present */
-        if (d_left (dc) != NULL && (options & DMGL_RET_DROP) == 0)
+        if (d_left (dc) != NULL && (options & DMGL_RET_POSTFIX) != 0)
+          d_print_comp (dpi, d_left (dc),
+                        options & ~(DMGL_RET_POSTFIX | DMGL_RET_DROP));
+        else if (d_left (dc) != NULL && (options & DMGL_RET_DROP) == 0)
           {
             struct d_print_mod dpm;

--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -3968,6 +3968,15 @@ outer(short (*)(int), long)
 --format=gnu-v3
 _Z6outer2IsEPFilES1_
 int (*outer2<short>(int (*)(long)))(long)
+--format=gnu-v3 --ret-postfix
+_Z5outerIsEcPFilE
+outer<short>(int (*)(long))char
+--format=gnu-v3 --ret-postfix
+_Z5outerPFsiEl
+outer(short (*)(int), long)
+--format=gnu-v3 --ret-postfix
+_Z6outer2IsEPFilES1_
+outer2<short>(int (*)(long))int (*)(long)
 --format=gnu-v3 --ret-drop
 _Z5outerIsEcPFilE
 outer<short>(int (*)(long))
[PATCH] PR fortran/49265 -- allow for double colon in module procedure statement
The attached patch allows for F2008's optional double colon in a module procedure statement. Built and regression tested on trunk. OK for trunk and 4.6? Steven G. Kargl ka...@gcc.gnu.org PR fortran/49265 * decl.c (gfc_match_modproc): Allow for a double colon in a module procedure statement. 2011-06-02 Steven G. Kargl ka...@gcc.gnu.org PR fortran/49265 * gfortran.dg/module_procedure_double_colon.f90: New test. -- Steve Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (revision 174566) +++ gcc/fortran/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-06-02 Steven G. Kargl ka...@gcc.gnu.org + + PR fortran/49265 + * decl.c (gfc_match_modproc): Allow for a double colon in a module + procedure statement. + 2011-05-31 Tobias Burnus bur...@net-b.de PR fortran/18918 Index: gcc/fortran/decl.c === --- gcc/fortran/decl.c (revision 174566) +++ gcc/fortran/decl.c (working copy) @@ -7016,6 +7016,7 @@ gfc_match_modproc (void) char name[GFC_MAX_SYMBOL_LEN + 1]; gfc_symbol *sym; match m; + locus old_locus; gfc_namespace *module_ns; gfc_interface *old_interface_head, *interface; @@ -7044,10 +7045,22 @@ gfc_match_modproc (void) end up with a syntax error and need to recover. */ old_interface_head = gfc_current_interface_head (); + /* Check if the F2008 optional double colon appears. */ + old_locus = gfc_current_locus; + if (gfc_match ( :: ) == MATCH_YES) +{ + if (gfc_notify_std (GFC_STD_F2008, Fortran 2008: double colon in + MODULE PROCEDURE statement at %L, old_locus) + == FAILURE) + return MATCH_ERROR; +} + else +gfc_current_locus = old_locus; + for (;;) { - locus old_locus = gfc_current_locus; bool last = false; + old_locus = gfc_current_locus; m = gfc_match_name (name); if (m == MATCH_NO) @@ -7059,6 +7072,7 @@ gfc_match_modproc (void) current namespace. 
*/ if (gfc_match_eos () == MATCH_YES) last = true; + if (!last gfc_match_char (',') != MATCH_YES) goto syntax; Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 174566) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-06-02 Steven G. Kargl ka...@gcc.gnu.org + + PR fortran/49265 + * gfortran.dg/module_procedure_double_colon.f90: New test. + 2011-06-02 Eric Botcazou ebotca...@adacore.com Hans-Peter Nilsson h...@axis.com Index: gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 === --- gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 (revision 0) @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-options -std=f95 } +! +! PR fortran/49265 +! Contributed by Erik Toussaint +! +module m1 + implicit none + interface foo + module procedure :: bar ! { dg-error double colon } + end interface +contains + subroutine bar + end subroutine +end module +! { dg-final { cleanup-modules m1 } }
Re: fix left-over debug insns in DCE
One of the issues was that DCE removed an insn that set a REG in a certain mode, without adjusting a debug use of that REG. This was in libstdc++, but I failed to take note of the affected file. DF later attached that debug use to another SET to the same REG in a different, incompatible mode. When that one was found to be dead by DF, we ended up ICEing as we attempted to emit the invalid SUBREGs. I reused some of the infrastructure to propagate dead DEFs into debug uses in DF to get DCE to emit debug temps and adjust debug uses as well, fixing this issue. While at that, I improved the handling of unused DEFs in DF, that previously resulted in loss of debug information, so as to retain it as much as possible. Why can't the problem be addressed purely within DF? Starting to spill the DF logic to individual RTL passes doesn't look very appealing to me. This is the patch I ended up with. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? OK for the usual debug insn bookkeeping, i.e. * dce.c (reset_unmarked_insns_debug_uses): New. (delete_unmarked_insns): Skip debug insns. (prescan_insns_for_dce): Likewise. (rest_of_handle_ud_dce): Propagate debug uses. * reg-stack.c (subst_stack_regs_in_debug_insn): Signal when no active reg can be found. (subst_all_stack_regs_in_debug_insn): New. Reset debug insn then. (convert_regs_1): Use it. The rest needs further discussing IMO. -- Eric Botcazou
Re: [patch] Improve detection of widening multiplication in the vectorizer
On Thu, Jun 2, 2011 at 1:08 PM, Ira Rosen ira.ro...@linaro.org wrote: On 2 June 2011 12:59, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Jun 2, 2011 at 10:46 AM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 15:14, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen ira.ro...@linaro.org wrote: On 1 June 2011 12:42, Richard Guenther richard.guent...@gmail.com wrote: Did you think about moving pass_optimize_widening_mul before loop optimizations? Does that pass catch the cases you are teaching the pattern recognizer? I think we should try to expose these more complicated instructions to loop optimizers. pass_optimize_widening_mul doesn't catch these cases, but I can try to teach it instead of the vectorizer. I am now testing Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,7 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +935,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); to see how it affects other loop optimizations (vectorizer pattern tests obviously fail). 
Looks like it needs copy_prop and dce as well: Index: passes.c === --- passes.c (revision 174391) +++ passes.c (working copy) @@ -870,6 +870,9 @@ NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_dce); + NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = pass_tree_loop.pass.sub; @@ -934,7 +937,6 @@ NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_fold_builtins); - NEXT_PASS (pass_optimize_widening_mul); NEXT_PASS (pass_tail_calls); NEXT_PASS (pass_rename_ssa_copies); NEXT_PASS (pass_uncprop); otherwise I get (on x86_64-suse-linux) FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd Hmm. I would have put the pass next to the sincos pass, but yes, in principle a copyprop dce pass after PRE makes sense (the loop passes likely don't run because there are no loops in those testcases - both copyprop and dce should be scheduled more like TODOs, or even automatically by the pass manager via PROPs ...). Dead code can indeed confuse those matching passes that look for single-use vars. I'll think about a more elegant solution for this problem. Would you mind checking if the next-to-sincos position makes any difference? 
Before sincos we have D.2747_2 = __builtin_powf (a_1(D), 2.0e+0); D.2746_4 = D.2747_2 + c_3(D); which is transformed by sincos to powmult.8_7 = a_1(D) * a_1(D); D.2747_2 = powmult.8_7; D.2746_4 = D.2747_2 + c_3(D); but widening_mul is confused by D.2747_2 = powmult.8_7; and it needs both copy_prop and dce to remove it: powmult.8_7 = a_1(D) * a_1(D); D.2746_4 = c_3(D) + powmult.8_7; So moving widening_mul next to sincos doesn't help. Maybe gimple_expand_builtin_pow() can be changed to generate the last version by itself? Yeah, I guess so. I'll have a look. Richard. Ira Thanks, Richard. Ira Thanks. I would hope that we eventually can get rid of the pattern recognizer ... at least for SSE there is also always a scalar variant instruction for each vectorized one. Richard.
Re: [PATCH, ARM] Fix ABI for double-precision helpers on single-float-only CPUs
On Fri, 2011-05-27 at 17:32 +0100, Julian Brown wrote: The helper functions used to implement double-precision arithmetic on ARM processors that only support single-precision arithmetic in hardware should use the soft-float ABI (i.e. passing and returning floating-point arguments in core registers), even when -mfloat-abi=hard is in effect. This patch tweaks the ABI for the affected functions so that is true. Tested with cross to ARM EABI, and by manually observing compiler output. We've also been carrying this patch in our local tree for some time without issue. OK to apply? Thanks, Julian ChangeLog gcc/ * config/arm/arm.c (arm_libcall_uses_aapcs_base) (arm_init_cumulative_args): Use correct ABI for double-precision helper functions in hard-float mode if only single-precision arithmetic is supported in hardware. I see Paul has already approved this, but I've just spotted one potential problem that might cause latent bugs sometime in the future. The code to register the libcalls is only run once, the first time we try to look up a libcall. If we ever end up allowing dynamic changing of CPU and optimization options, not registering the other libcalls will lead to subtle problems at run time. I suggest that these functions be unconditionally added along with the other libcalls. I also don't understand why all the tests are needed in arm_init_cumulative_args? Surely arm_libcall_uses_aapcs_base() will already have run that test. R.
[gc-improv] Fix all remaining C testsuite failures
This patch - Fixes PCH failures by re-initializing struct function after PCH read. - Allocates couple of global RTXes in the permanent memory. - Fixes RTL copying by taking source and destination memory areas into account. I.e. RTXes that would be normally shared, if source is in the permanent, and destination is in the function area, then are copied. Assert that this does not happen in the cases when copying is meaningless. With this, the C testsuite achieves parity on x86_64-unknown-linux-gnu, but I've lagged significantly behind with merges from trunk. My next step is to implement poisoning of function memory area and also I will look into walking GC and checking for non-GTY((skip)) pointers pointing to RTL memory areas. I will do this before I do the next merge, as hopefully this will make merges easier. This patch took me two months. At this pace, I'm not sure the branch will be ready for consideration for 4.7. 2011-06-02 Laurynas Biveinis laurynas.bivei...@gmail.com * varasm.c (make_decl_rtl): Allocate DECL_RTL in the permanent RTL memory. * rtl.c: (_obstack_allocated_p): Declare. (allocated_in_function_mem_p): New. (need_copy_p): New. (copy_rtx): Re-enable sharing of CONST_VECTOR rtxes. Use need_copy_p to decide on copying vs. sharing of rtxes. * function.c (reinit_struct_function): New. (set_cfun, prepare_function_start): Call it. * config/i386/i386.c (ix86_expand_split_stack_prologue): Allocate split_stack_fn in the permanent RTL memory. (ix86_expand_split_stack_prologue): Allocate split_stack_fn_large in the permanent RTL memory. Index: gcc/function.c === --- gcc/function.c (revision 171651) +++ gcc/function.c (working copy) @@ -151,6 +151,7 @@ static void do_clobber_return_reg (rtx, void *); static void do_use_return_reg (rtx, void *); static void set_insn_locators (rtx, int) ATTRIBUTE_UNUSED; +static void reinit_struct_function (void); /* Stack of nested functions. */ /* Keep track of the cfun stack. 
*/ @@ -4316,6 +4317,7 @@ { cfun = new_cfun; invoke_set_current_function_hook (new_cfun ? new_cfun-decl : NULL_TREE); + reinit_struct_function (); } } @@ -4417,6 +4419,16 @@ allocate_struct_function (fndecl, false); } +/* Initialize those parts of struct function that are cleared during PCH read + and write. */ + +static void +reinit_struct_function (void) +{ + if (cfun !cfun-machine init_machine_status) +cfun-machine = (*init_machine_status) (); +} + /* Reset crtl and other non-struct-function variables to defaults as appropriate for emitting rtl at the start of a function. */ @@ -4437,8 +4449,7 @@ } /* cfun-machine is NULL after PCH read. Initialize it. */ - if (!cfun-machine init_machine_status) -cfun-machine = (*init_machine_status) (); + reinit_struct_function (); cse_not_expected = ! optimize; Index: gcc/ChangeLog.gc-improv === --- gcc/ChangeLog.gc-improv (revision 172076) +++ gcc/ChangeLog.gc-improv (working copy) @@ -1,3 +1,22 @@ +2011-06-02 Laurynas Biveinis laurynas.bivei...@gmail.com + + * varasm.c (make_decl_rtl): Allocate DECL_RTL in the permanent RTL + memory. + + * rtl.c: (_obstack_allocated_p): Declare. + (allocated_in_function_mem_p): New. + (need_copy_p): New. + (copy_rtx): Re-enable sharing of CONST_VECTOR rtxes. Use + need_copy_p to decide on copying vs. sharing of rtxes. + + * function.c (reinit_struct_function): New. + (set_cfun, prepare_function_start): Call it. + + * config/i386/i386.c (ix86_expand_split_stack_prologue): Allocate + split_stack_fn in the permanent RTL memory. + (ix86_expand_split_stack_prologue): Allocate split_stack_fn_large + in the permanent RTL memory. + 2011-04-07 Laurynas Biveinis laurynas.bivei...@gmail.com * stmt.c (label_rtx): Allocate RTX in permanent RTL memory. 
Index: gcc/varasm.c === --- gcc/varasm.c (revision 171651) +++ gcc/varasm.c (working copy) @@ -1238,6 +1238,9 @@ optimization may eliminate reads and/or writes to register variables); + if (TREE_STATIC (decl)) + use_rtl_permanent_mem (); + /* If the user specified one of the eliminables registers here, e.g., FRAME_POINTER_REGNUM, we don't want to get this variable confused with that register and be eliminated. This usage is @@ -1248,6 +1251,9 @@ REG_USERVAR_P (DECL_RTL (decl)) = 1; if (TREE_STATIC (decl)) + use_rtl_function_mem (); + + if (TREE_STATIC (decl)) { /* Make this register global, so not usable for anything else. */ Index: gcc/rtl.c === --- gcc/rtl.c (revision 171651) +++ gcc/rtl.c (working copy) @@ -150,6 +150,9 @@ static
Re: [patch] Fix PR tree-optimization/49038
On 26 May 2011 10:52, Ira Rosen ira.ro...@linaro.org wrote:

Hi, The vectorizer supports strided loads with gaps, e.g., when only a[4i] and a[4i+2] are accessed, it generates a vector load a[4i:4i+3], i.e., creating an access to a[4i+3], which doesn't exist in the scalar code. This access may be invalid, as described in the PR. This patch creates an epilogue loop (with at least one iteration) for such cases. Bootstrapped and tested on powerpc64-suse-linux. Applied to trunk. I'll prepare patches for 4.5 and 4.6 next week.

Here are the patches. Bootstrapped and tested on x86_64-suse-linux (4.5) and on powerpc64-suse-linux (4.6). OK to apply?

Thanks, Ira

4.6 ChangeLog:

  PR tree-optimization/49038
  * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): Ensure at least one epilogue iteration if required by data accesses with gaps.
  * tree-vectorizer.h (struct _loop_vec_info): Add new field to mark loops that require peeling for gaps.
  * tree-vect-loop.c (new_loop_vec_info): Initialize new field. (vect_get_known_peeling_cost): Take peeling for gaps into account. (vect_transform_loop): Generate epilogue if required by data access with gaps.
  * tree-vect-data-refs.c (vect_analyze_group_access): Mark the loop as requiring an epilogue if there are gaps in the end of the strided group.

4.5 ChangeLog:

  PR tree-optimization/49038
  * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): Ensure at least one epilogue iteration if required by data accesses with gaps.
  * tree-vectorizer.h (struct _loop_vec_info): Add new field to mark loops that require peeling for gaps.
  * tree-vect-loop.c (new_loop_vec_info): Initialize new field. (vect_estimate_min_profitable_iters): Take peeling for gaps into account. (vect_transform_loop): Generate epilogue if required by data access with gaps.
  * tree-vect-data-refs.c (vect_analyze_group_access): Mark the loop as requiring an epilogue if there are gaps in the end of the strided group.
4.6 and 4.5 testsuite/ChangeLog: PR tree-optimization/49038 * gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: New test. * gcc.dg/vect/pr49038.c: New test. Index: tree-vect-loop-manip.c === --- tree-vect-loop-manip.c (revision 174565) +++ tree-vect-loop-manip.c (working copy) @@ -1516,7 +1516,7 @@ vect_generate_tmps_on_preheader (loop_ve edge pe; basic_block new_bb; gimple_seq stmts; - tree ni_name; + tree ni_name, ni_minus_gap_name; tree var; tree ratio_name; tree ratio_mult_vf_name; @@ -1533,9 +1533,39 @@ vect_generate_tmps_on_preheader (loop_ve ni_name = vect_build_loop_niters (loop_vinfo, cond_expr_stmt_list); log_vf = build_int_cst (TREE_TYPE (ni), exact_log2 (vf)); + /* If epilogue loop is required because of data accesses with gaps, we + subtract one iteration from the total number of iterations here for + correct calculation of RATIO. */ + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) +{ + ni_minus_gap_name = fold_build2 (MINUS_EXPR, TREE_TYPE (ni_name), + ni_name, + build_one_cst (TREE_TYPE (ni_name))); + if (!is_gimple_val (ni_minus_gap_name)) + { + var = create_tmp_var (TREE_TYPE (ni), ni_gap); + add_referenced_var (var); + + stmts = NULL; + ni_minus_gap_name = force_gimple_operand (ni_minus_gap_name, stmts, + true, var); + if (cond_expr_stmt_list) +gimple_seq_add_seq (cond_expr_stmt_list, stmts); + else +{ + pe = loop_preheader_edge (loop); + new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); + gcc_assert (!new_bb); +} +} +} + else +ni_minus_gap_name = ni_name; + /* Create: ratio = ni log2(vf) */ - ratio_name = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_name), ni_name, log_vf); + ratio_name = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ni_minus_gap_name), + ni_minus_gap_name, log_vf); if (!is_gimple_val (ratio_name)) { var = create_tmp_var (TREE_TYPE (ni), bnd); Index: testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c === --- testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c (revision 0) +++
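The problem and the remedy can be sketched in plain C. This is only a model under simplifying assumptions: one "vector" iteration covers exactly one group of four elements, a memcpy of four ints stands in for the vector load, and the function names are invented here. The real patch instead subtracts one iteration before computing the vector iteration count from the vectorization factor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar reference: reads only a[4*i] and a[4*i + 2]. */
static int sum_scalar(const int *a, size_t n_groups)
{
    int sum = 0;
    for (size_t i = 0; i < n_groups; i++)
        sum += a[4 * i] + a[4 * i + 2];
    return sum;
}

/* Model of the vectorized loop.  Each "vector" iteration loads the whole
   group a[4*i .. 4*i + 3], including the gap element a[4*i + 3].  The
   last group is left to a scalar epilogue so that the final a[4*i + 3],
   which the scalar code never touches, is not read. */
static int sum_peeled(const int *a, size_t n_groups)
{
    int sum = 0;
    size_t i = 0;
    for (; i + 1 < n_groups; i++) {
        int v[4];
        memcpy(v, &a[4 * i], sizeof v);  /* stands in for a 4-lane load */
        sum += v[0] + v[2];
    }
    for (; i < n_groups; i++)            /* epilogue: the peeled iteration */
        sum += a[4 * i] + a[4 * i + 2];
    return sum;
}

static void self_check(void)
{
    /* Three groups, but the buffer ends right after a[4*2 + 2]: there is
       no a[11], so a full vector load of the last group would overrun. */
    int a[11] = { 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6 };
    assert(sum_scalar(a, 3) == 21);
    assert(sum_peeled(a, 3) == 21);
}
```

In the self-check the array deliberately has 11 elements, so a version without the epilogue would read one element past the end on the last group.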
Re: [PATCH] PR fortran/49265 -- allow for double colon in module procedure statement
On Thu, Jun 02, 2011 at 05:59:29PM +0200, Thomas Koenig wrote: Hi Steve, it seems that, with your patch, interface foo module procedure::bar end interface is rejected, as is interface foo module procuedure:: bar end interface Is this the way it is supposed to be? Oh phew. Good catch. I wasn't dealing with the possible white space issues. Here's an updated patch and testcase. -- Steve Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (revision 174566) +++ gcc/fortran/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2011-06-02 Steven G. Kargl ka...@gcc.gnu.org + + PR fortran/49265 + * decl.c (gfc_match_modproc): Allow for a double colon in a module + procedure statement. + * parse.c ( decode_statement): Deal with whitespace around :: in + gfc_match_modproc. + 2011-05-31 Tobias Burnus bur...@net-b.de PR fortran/18918 Index: gcc/fortran/decl.c === --- gcc/fortran/decl.c (revision 174566) +++ gcc/fortran/decl.c (working copy) @@ -7016,6 +7016,7 @@ gfc_match_modproc (void) char name[GFC_MAX_SYMBOL_LEN + 1]; gfc_symbol *sym; match m; + locus old_locus; gfc_namespace *module_ns; gfc_interface *old_interface_head, *interface; @@ -7044,10 +7045,23 @@ gfc_match_modproc (void) end up with a syntax error and need to recover. */ old_interface_head = gfc_current_interface_head (); + /* Check if the F2008 optional double colon appears. */ + gfc_gobble_whitespace (); + old_locus = gfc_current_locus; + if (gfc_match (::) == MATCH_YES) +{ + if (gfc_notify_std (GFC_STD_F2008, Fortran 2008: double colon in + MODULE PROCEDURE statement at %L, old_locus) + == FAILURE) + return MATCH_ERROR; +} + else +gfc_current_locus = old_locus; + for (;;) { - locus old_locus = gfc_current_locus; bool last = false; + old_locus = gfc_current_locus; m = gfc_match_name (name); if (m == MATCH_NO) @@ -7059,6 +7073,7 @@ gfc_match_modproc (void) current namespace. 
*/ if (gfc_match_eos () == MATCH_YES) last = true; + if (!last gfc_match_char (',') != MATCH_YES) goto syntax; Index: gcc/fortran/parse.c === --- gcc/fortran/parse.c (revision 174566) +++ gcc/fortran/parse.c (working copy) @@ -399,7 +399,7 @@ decode_statement (void) break; case 'm': - match (module% procedure% , gfc_match_modproc, ST_MODULE_PROC); + match (module% procedure, gfc_match_modproc, ST_MODULE_PROC); match (module, gfc_match_module, ST_MODULE); break; Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 174566) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-06-02 Steven G. Kargl ka...@gcc.gnu.org + + PR fortran/49265 + * gfortran.dg/module_procedure_double_colon.f90: New test. + 2011-06-02 Eric Botcazou ebotca...@adacore.com Hans-Peter Nilsson h...@axis.com Index: gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 === --- gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/module_procedure_double_colon.f90 (revision 0) @@ -0,0 +1,24 @@ +! { dg-do compile } +! { dg-options -std=f95 } +! +! PR fortran/49265 +! Contributed by Erik Toussaint +! +module m1 + implicit none + interface foo + module procedure::bar ! { dg-error double colon } + module procedure ::bar_none ! { dg-error double colon } + module procedure:: none_bar ! { dg-error double colon } + end interface +contains + subroutine bar + end subroutine + subroutine bar_none(i) + integer i + end subroutine + subroutine none_bar(x) + real x + end subroutine +end module +! { dg-final { cleanup-modules m1 } }
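The whitespace handling discussed in this exchange follows a common "save position, try to match, restore on failure" pattern. The sketch below shows that pattern in isolation. It is not gfortran code: struct cursor and the three helpers are hypothetical names, and the standard-conformance diagnostic that the patch issues through gfc_notify_std is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cursor into a statement; not gfortran's locus type. */
struct cursor { const char *p; };

static void gobble_whitespace(struct cursor *c)
{
    while (*c->p == ' ' || *c->p == '\t')
        c->p++;
}

/* Try to consume LIT at the cursor; returns false if it is not there. */
static bool match_literal(struct cursor *c, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(c->p, lit, n) != 0)
        return false;
    c->p += n;
    return true;
}

/* Skip an optional "::" with any blanks around it.  Returns true if the
   double colon was present.  On return the cursor is at the first name. */
static bool skip_optional_double_colon(struct cursor *c)
{
    gobble_whitespace(c);
    struct cursor saved = *c;   /* remember where the "::" would start */
    if (match_literal(c, "::")) {
        gobble_whitespace(c);
        return true;
    }
    *c = saved;                 /* explicit restore, mirroring the patch */
    return false;
}

static void self_check(void)
{
    struct cursor c = { " :: bar" };
    assert(skip_optional_double_colon(&c));
    assert(strcmp(c.p, "bar") == 0);
}
```

Because blanks are skipped before the position is saved, the forms "::bar", " ::bar" and ":: bar" are all accepted, and a statement without the double colon is left positioned at the procedure name.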
Re: [PATCH] PR fortran/49265 -- allow for double colon in module procedure statement
Hi Steve, Oh phew. Good catch. I wasn't dealing with the possible white space issues. Here's an updated patch and testcase. OK for trunk. Could you also add the test case a second time, without -std=f95, to make sure it keeps passing? Thanks for the patch! Thomas
Re: [PATCH, ARM] Cortex-A5 tuning [2/2] - tweak instruction conditionalisation
On Wed, 01 Jun 2011 17:00:30 +0100 Richard Earnshaw rearn...@arm.com wrote:

On Wed, 2011-06-01 at 16:49 +0100, Julian Brown wrote:

This patch tweaks the behaviour of arm_final_prescan_insn when tuning for Cortex-A5 cores, since branches are cheaper than long sequences of conditionalised instructions on those processors. As posted in the previous patch, this provides a measurable increase in performance on a popular embedded benchmark. (I didn't use the tuning infrastructure for this one, though it could easily be changed to do so, now I come to think of it.)

I would much prefer that this was done through the tuning infrastructure. If one core likes it this way, there's a strong chance of another one coming along that has similar preferences.

How does this version look? I've left the size-optimisation case the same (max_insns_skipped=6), but added a tunable integer to the tune_params structure allowing the speed-optimisation case to be varied according to the chosen target tuning. To maintain existing semantics, this means duplicating the fastmul structure for the StrongARM (XScale also used the StrongARM setting, but already has its own tuning structure). Minimally re-tested. OK to apply?

Thanks, Julian

ChangeLog

gcc/
  * config/arm/arm-cores.def (strongarm, strongarm110, strongarm1100) (strongarm1110): Use strongarm tuning.
  * config/arm/arm-protos.h (tune_params): Add max_insns_skipped field.
  * config/arm/arm.c (arm_strongarm_tune): New. (arm_slowmul_tune, arm_fastmul_tune, arm_xscale_tune, arm_9e_tune) (arm_v6t2_tune, arm_cortex_tune, arm_cortex_a5_tune) (arm_cortex_a9_tune, arm_fa726te_tune): Add max_insns_skipped field setting, using previous defaults or 1 for Cortex-A5. (arm_option_override): Set max_insns_skipped from current tuning.

commit 2116062b95b55fc048d54321c8b41a4d83175430
Author: Julian Brown jul...@henry7.codesourcery.com
Date: Fri May 27 11:26:57 2011 -0700

Tune max_insns_skipped for conditionalization for Cortex-A5.
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 4ff2324..89697c0 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -70,10 +70,10 @@ ARM_CORE(arm7dmi, arm7dmi, 3M, FL_CO_PROC | FL_MODE26, fastmul) /* V4 Architecture Processors */ ARM_CORE(arm8, arm8, 4, FL_MODE26 | FL_LDSCHED, fastmul) ARM_CORE(arm810,arm810, 4, FL_MODE26 | FL_LDSCHED, fastmul) -ARM_CORE(strongarm, strongarm, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) -ARM_CORE(strongarm110, strongarm110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) -ARM_CORE(strongarm1100, strongarm1100, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) -ARM_CORE(strongarm1110, strongarm1110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, fastmul) +ARM_CORE(strongarm, strongarm, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) +ARM_CORE(strongarm110, strongarm110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) +ARM_CORE(strongarm1100, strongarm1100, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) +ARM_CORE(strongarm1110, strongarm1110, 4, FL_MODE26 | FL_LDSCHED | FL_STRONG, strongarm) ARM_CORE(fa526, fa526,4, FL_LDSCHED, fastmul) ARM_CORE(fa626, fa626,4, FL_LDSCHED, fastmul) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index c104d74..67aee46 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -221,6 +221,9 @@ struct tune_params bool (*rtx_costs) (rtx, RTX_CODE, RTX_CODE, int *, bool); bool (*sched_adjust_cost) (rtx, rtx, rtx, int *); int constant_limit; + /* Maximum number of instructions to conditionalise in + arm_final_prescan_insn. */ + int max_insns_skipped; int num_prefetch_slots; int l1_cache_size; int l1_cache_line_size; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cd3f104..8f01202 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -857,6 +857,7 @@ const struct tune_params arm_slowmul_tune = arm_slowmul_rtx_costs, NULL, 3, /* Constant limit. */ + 5, /* Max cond insns. 
*/ ARM_PREFETCH_NOT_BENEFICIAL, true, /* Prefer constant pool. */ arm_default_branch_cost @@ -867,6 +868,21 @@ const struct tune_params arm_fastmul_tune = arm_fastmul_rtx_costs, NULL, 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_NOT_BENEFICIAL, + true, /* Prefer constant pool. */ + arm_default_branch_cost +}; + +/* StrongARM has early execution of branches, so a sequence that is worth + skipping is shorter. Set max_insns_skipped to a lower value. */ + +const struct tune_params arm_strongarm_tune = +{ + arm_fastmul_rtx_costs, + NULL, + 1, /* Constant limit. */ +
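The selection logic described in this message (size optimisation keeps 6, speed optimisation takes the limit from the chosen tuning) can be sketched roughly as below. This is only an illustration, not the patch itself: `tune_params_sketch`, the two example structures and `pick_max_insns_skipped` are hypothetical names, and only the values 5, 1 and 6 come from the thread.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for GCC's struct tune_params; only the
   field added by the patch is modelled here.  */
struct tune_params_sketch
{
  const char *name;
  int max_insns_skipped;   /* Max insns to conditionalise for speed.  */
};

/* 5 is the previous default visible in the diff; 1 is the Cortex-A5
   setting named in the ChangeLog.  */
static const struct tune_params_sketch fastmul_tune_sketch = { "fastmul", 5 };
static const struct tune_params_sketch cortex_a5_tune_sketch = { "cortex-a5", 1 };

/* The size-optimisation case stays at 6 for every core; otherwise the
   limit comes from the selected tuning structure.  */
static int
pick_max_insns_skipped (const struct tune_params_sketch *tune,
                        bool optimize_size)
{
  if (optimize_size)
    return 6;
  return tune->max_insns_skipped;
}
```

Keeping the value in the per-core structure means a later core with similar preferences only needs a new table entry, which is the point raised in the review.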
Re: [google] Improve locus information during if-conversion (issue4526101)
This patch improves the precision of the line number information during coverage mode. Yes, I need to add an example/test case. I was planning to do that before I propose this patch for trunk as well. Thanks, Sharad On Thu, Jun 2, 2011 at 4:46 AM, Diego Novillo dnovi...@google.com wrote: On Wed, Jun 1, 2011 at 21:03, Sharad Singhai sing...@google.com wrote: 2011-06-01 Sharad Singhai sing...@google.com Google Ref 39994 * ifcvt.c (noce_try_cmove_arith): Use the locus information from the if-statement rather than the then path. Could you elaborate how it improves locus information? Is there a test case you can add to the testsuite? Or an example code fragment that shows how the locus is better now? OK for google/main. Diego.
Re: [patch] testsuite: support board_info timeouts
I never got feedback from the testsuite maintainers on this one... Date: Mon, 9 Aug 2010 23:48:31 -0400 From: DJ Delorie d...@redhat.com Mailing-List: contact gcc-patches-h...@gcc.gnu.org; run by ezmlm Is there any reason why we don't support board-level timeouts? It's really hard to specify timeouts for sid-based embedded targets with lots of multilibs (or just one, sometimes). It's certainly better than really REALLY ugly which is the only other option at that point. * lib/timeout.exp (timeout): Add board_info support. 2010-08-09 Thomas Koenig tkoe...@gcc.gnu.org Index: lib/timeout.exp === --- lib/timeout.exp (revision 163048) +++ lib/timeout.exp (working copy) @@ -43,12 +43,14 @@ proc timeout_value { args } { if [info exists individual_timeout] { set val $individual_timeout } elseif [info exists tool_timeout] { set val $tool_timeout } elseif [target_info exists gcc,timeout] { set val [target_info gcc,timeout] +} elseif [board_info target exists gcc,timeout] { + set val [board_info target gcc,timeout] } else { # This is really, REALLY ugly, but this is the default from # remote.exp deep within DejaGnu. set val 300 }
Re: [PATCH] PR fortran/49265 -- allow for double colon in module procedure statement
On Thu, Jun 02, 2011 at 06:39:18PM +0200, Thomas Koenig wrote: Hi Steve, Oh phew. Good catch. I wasn't dealing with the possible white space issues. Here's an updated patch and testcase. OK for trunk. Could you also add the test case a second time, without -std=f95, to make sure it keeps passing? Yes, I'll add a 2nd testcase. Thanks for the review. -- Steve
Re: Add missing ChangeLog entry
Nathan Sidwell nat...@codesourcery.com writes: On 06/01/11 15:32, Ian Lance Taylor wrote: I noticed that we have a --with-specs option in gcc/configure.ac, added in revision 155208 with this e-mail message: http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00132.html sorry about that. How's the attached documentation? Works for me, but please add an @xref{Spec Files} (you might need a document in there too, not sure) as a pointer to where the spec format is documented. Thanks. Ian
Re: Ping: [Patch] Make libstdc++'s abi_check more robust against readelf output format
On Thu, Jun 2, 2011 at 3:08 AM, Paolo Carlini pcarl...@gmail.com wrote: Hi, Ping. Did Ian Taylor see this patch? If he likes it, I'm also fine with it. I think this patch is fine, with or without Jonathan's suggestion. Ian
Re: [patch] testsuite: support board_info timeouts
On Jun 2, 2011, at 9:48 AM, DJ Delorie wrote: I never got feedback from the testsuite maintainers on this one... Ok.
Re: [PATCH] PR fortran/49265 -- allow for double colon in module procedure statement
On Thu, Jun 02, 2011 at 06:39:18PM +0200, Thomas Koenig wrote: Hi Steve, Oh phew. Good catch. I wasn't dealing with the possible white space issues. Here's an updated patch and testcase. OK for trunk. Could you also add the test case a second time, without -std=f95, to make sure it keeps passing? Thanks for the patch!

svn-commit.tmp: 14 lines, 440 characters.
Sending fortran/ChangeLog
Sending fortran/decl.c
Sending fortran/parse.c
Sending testsuite/ChangeLog
Adding testsuite/gfortran.dg/module_procedure_double_colon_1.f90
Adding testsuite/gfortran.dg/module_procedure_double_colon_2.f90
Transmitting file data ..
Committed revision 174569.

-- Steve
[PATCH, i386]: Introduce Y4 register constraint and merge SSE4_1 patterns
Hello! ... and some unrelated cleanups involving simplifying a couple of switch statements. 2011-06-02 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (standard_sse_constant_p) case 1: Simplify switch statement. * config/i386/i386.md (*movdf_internal_rex64) case 8,9,10: Ditto. (*movdf_internal) case 6,7,8: Ditto. * config/i386/constraints.md (Y4): New constraint. * config/i386/sse.md (vec_setmode_0): Merge with *vec_setmode_0_sse4_1 and *vec_setmode_0_sse2. (*vec_extractv2di_1): Merge from *vec_extractv2di_1_sse2 and *vec_extractv2di_1_sse. (*vec_concatv2di_rex64): Merge from *vec_concatv2di_rex64_sse4_1 and *vec_concatv2di_rex64_sse. testsuite/ChangeLog: 2011-06-02 Uros Bizjak ubiz...@gmail.com * gcc.target/i386/sse2-init-v2di-2: Update scan-assembler-times string. Bootstrapped and regression tested on x86_64-pc-linux-gnu, committed to mainline SVN. Uros. Index: testsuite/gcc.target/i386/sse2-init-v2di-2.c === --- testsuite/gcc.target/i386/sse2-init-v2di-2.c(revision 174566) +++ testsuite/gcc.target/i386/sse2-init-v2di-2.c(working copy) @@ -10,4 +10,4 @@ test (long long b) return _mm_cvtsi64_si128 (b); } -/* { dg-final { scan-assembler-times \\*vec_concatv2di_rex64_sse4_1/4 1 } } */ +/* { dg-final { scan-assembler-times \\*vec_concatv2di_rex64/4 1 } } */ Index: config/i386/i386.md === --- config/i386/i386.md (revision 174566) +++ config/i386/i386.md (working copy) @@ -2956,18 +2956,15 @@ case 10: switch (get_attr_mode (insn)) { - case MODE_V4SF: - return %vmovaps\t{%1, %0|%0, %1}; - case MODE_V2DF: - if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; - else - return %vmovapd\t{%1, %0|%0, %1}; case MODE_TI: - if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; - else + if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) return %vmovdqa\t{%1, %0|%0, %1}; + case MODE_V2DF: + if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) + return %vmovapd\t{%1, %0|%0, %1}; + case MODE_V4SF: + return %vmovaps\t{%1, %0|%0, %1}; + case 
MODE_DI: return %vmovq\t{%1, %0|%0, %1}; case MODE_DF: @@ -3102,18 +3099,15 @@ case 8: switch (get_attr_mode (insn)) { - case MODE_V4SF: - return %vmovaps\t{%1, %0|%0, %1}; - case MODE_V2DF: - if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; - else - return %vmovapd\t{%1, %0|%0, %1}; case MODE_TI: - if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; - else + if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) return %vmovdqa\t{%1, %0|%0, %1}; + case MODE_V2DF: + if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) + return %vmovapd\t{%1, %0|%0, %1}; + case MODE_V4SF: + return %vmovaps\t{%1, %0|%0, %1}; + case MODE_DI: return %vmovq\t{%1, %0|%0, %1}; case MODE_DF: Index: config/i386/constraints.md === --- config/i386/constraints.md (revision 174566) +++ config/i386/constraints.md (working copy) @@ -99,6 +99,9 @@ (define_register_constraint Y2 TARGET_SSE2 ? SSE_REGS : NO_REGS @internal Any SSE register, when SSE2 is enabled.) +(define_register_constraint Y4 TARGET_SSE4_1 ? SSE_REGS : NO_REGS + @internal Any SSE register, when SSE4_1 is enabled.) + (define_register_constraint Yi TARGET_SSE2 TARGET_INTER_UNIT_MOVES ? SSE_REGS : NO_REGS @internal Any SSE register, when SSE2 and inter-unit moves are enabled.) 
Index: config/i386/sse.md === --- config/i386/sse.md (revision 174566) +++ config/i386/sse.md (working copy) @@ -3376,79 +3376,35 @@ ;; Avoid combining registers from different units in a single alternative, ;; see comment above inline_secondary_memory_needed function in i386.c -(define_insn *vec_setmode_0_sse4_1 +(define_insn vec_setmode_0 [(set (match_operand:VI4F_128 0 nonimmediate_operand - =x,x,x ,x,x,x ,x ,m,m,m) + =Y4,Y2,Y2,x,x,x,Y4 ,x ,m,m,m) (vec_merge:VI4F_128 (vec_duplicate:VI4F_128 (match_operand:ssescalarmode 2 general_operand - x,m,*r,x,x,*rm,*rm,x,*r,fF)) + Y4,m ,*r,m,x,x,*rm,*rm,x,*r,fF)) (match_operand:VI4F_128 1 vector_move_operand - C,C,C ,0,x,0 ,x ,0,0 ,0) + C ,C ,C ,C,0,x,0 ,x ,0,0 ,0) (const_int 1)))] - TARGET_SSE4_1 + TARGET_SSE @ %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
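The switch simplification in the i386.md hunks reorders the cases so that they fall through to the shared movaps return. A small sketch, outside GCC, of why the two shapes select the same mnemonic is given below. The enum, the function names and the boolean parameter standing in for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL are all hypothetical, and the strings are shortened to the bare mnemonic.

```c
#include <stdbool.h>
#include <string.h>

enum mode_sketch { MODE_TI, MODE_V2DF, MODE_V4SF };

/* Shape of the code before the patch: each case decides on its own.  */
static const char *
mnemonic_before (enum mode_sketch mode, bool packed_single_optimal)
{
  switch (mode)
    {
    case MODE_V4SF:
      return "movaps";
    case MODE_V2DF:
      if (packed_single_optimal)
        return "movaps";
      else
        return "movapd";
    case MODE_TI:
      if (packed_single_optimal)
        return "movaps";
      else
        return "movdqa";
    }
  return "";
}

/* Shape after the patch: cases fall through to the common movaps.  */
static const char *
mnemonic_after (enum mode_sketch mode, bool packed_single_optimal)
{
  switch (mode)
    {
    case MODE_TI:
      if (!packed_single_optimal)
        return "movdqa";
      /* Fall through.  */
    case MODE_V2DF:
      if (!packed_single_optimal)
        return "movapd";
      /* Fall through.  */
    case MODE_V4SF:
      return "movaps";
    }
  return "";
}

/* Compare both variants over every mode and both flag settings.  */
static bool
variants_agree (void)
{
  for (int m = MODE_TI; m <= MODE_V4SF; m++)
    for (int opt = 0; opt <= 1; opt++)
      if (strcmp (mnemonic_before ((enum mode_sketch) m, opt != 0),
                  mnemonic_after ((enum mode_sketch) m, opt != 0)) != 0)
        return false;
  return true;
}
```

The fall-through form depends on case order, so the explicit comments are worth keeping in any real use of this pattern.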
Re: [PATCH] c-pragma: adding a data field to pragma_handler
Pierre == Pierre p.vit...@laposte.net writes:

Pierre> I have changed this handler in order to accept a second parameter
Pierre> which is a void *, allowing to give extra datas to the handler. I
Pierre> think this data field might be of general use: we can have condition
Pierre> or data at register time that we want to express in the handler. I
Pierre> guess this is a common way to pass data to an handler function.

I can't approve or reject this patch, but the idea seems reasonable enough to me.

Pierre> I would like your opinion on this patch!

Thanks! It has a number of formatting issues.

Pierre> +typedef void (*pragma_handler)(struct cpp_reader *, void * );

No space after the final *.

Pierre> +/* Internally use to keep the data of the handler. */
Pierre> +struct internal_pragma_handler_d{

Space before the {.

Pierre> + pragma_handler handler;
Pierre> + void * data;

No space. Lots of instances of this.

Pierre> /* A vector of registered pragma callbacks. */
Pierre> +/*This is never freed as we need it during the whole execution */

Coalesce the two comments. The comment formatting is wrong, see GNU standards.

Pierre> ns_name.space = space;
Pierre> ns_name.name = name;
Pierre> +
Pierre> VEC_safe_push (pragma_ns_name, heap, registered_pp_pragmas, ns_name);

Gratuitous newline addition.

Pierre> + ihandler->handler = handler;
Pierre> + ihandler->data = data;

I didn't see anything that initialized ihandler.

Pierre> + VEC_safe_push (internal_pragma_handler, heap, registered_pragmas,
Pierre> +ihandler);

I think you wanted just `internal_pragma_handler ihandler', no *, for the definition.

Pierre> +c_register_pragma (const char *space, const char *name, pragma_handler handler,
Pierre> + void * data)

There are lots of calls to this that you did not update. Do a recursive grep to see. One way to avoid a massive change is to add a new overload that passes in the data to c_register_pragma_1; and then change the legacy functions to pass NULL. I don't know if that approach is ok (it is typical in gdb...), so if not, you have to update all callers.

Tom
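The "new entry point plus legacy wrapper passing NULL" idea from the review can be sketched in a self-contained way as follows. This is not the GCC implementation: every name here (`register_pragma`, `register_pragma_with_data`, `dispatch_pragma`, the fixed-size table) is hypothetical, and a plain array stands in for the VEC used in c-pragma.c.

```c
#include <stddef.h>

struct cpp_reader;   /* Opaque here; the real type lives in libcpp.  */

/* Legacy handler: no user data.  */
typedef void (*pragma_handler_1arg) (struct cpp_reader *);
/* New-style handler: receives the data given at registration time.  */
typedef void (*pragma_handler_2arg) (struct cpp_reader *, void *);

/* Stored by value, so nothing is left uninitialised.  */
struct internal_handler_sketch
{
  pragma_handler_1arg handler_1arg;
  pragma_handler_2arg handler_2arg;
  void *data;
};

#define MAX_PRAGMAS_SKETCH 8
static struct internal_handler_sketch registered[MAX_PRAGMAS_SKETCH];
static int n_registered;

/* Common worker; returns the slot index, or -1 if the table is full.  */
static int
register_pragma_1 (struct internal_handler_sketch ih)
{
  if (n_registered == MAX_PRAGMAS_SKETCH)
    return -1;
  registered[n_registered] = ih;
  return n_registered++;
}

/* Legacy entry point: existing callers stay unchanged, data is NULL.  */
static int
register_pragma (pragma_handler_1arg handler)
{
  struct internal_handler_sketch ih = { handler, NULL, NULL };
  return register_pragma_1 (ih);
}

/* New entry point carrying user data.  */
static int
register_pragma_with_data (pragma_handler_2arg handler, void *data)
{
  struct internal_handler_sketch ih = { NULL, handler, data };
  return register_pragma_1 (ih);
}

static void
dispatch_pragma (int slot, struct cpp_reader *reader)
{
  const struct internal_handler_sketch *ih = &registered[slot];
  if (ih->handler_2arg)
    ih->handler_2arg (reader, ih->data);
  else if (ih->handler_1arg)
    ih->handler_1arg (reader);
}

/* Two demo handlers used to exercise the sketch.  */
static int legacy_calls;
static void demo_legacy (struct cpp_reader *r) { (void) r; legacy_calls++; }
static void demo_counting (struct cpp_reader *r, void *data)
{
  (void) r;
  ++*(int *) data;
}
```

With this shape, only code that wants the extra data has to call the new entry point; whether that trade-off is acceptable for GCC is the open question in the message above.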
Re: [PATCH libcpp]: S_ISREG non-zero value does not always fit in a bool
John == John Tytgat john.tyt...@aaug.net writes: John 2011-05-29 John Tytgat john.tyt...@aaug.net John * files.c (read_file_guts): Add test on non-zero value of S_ISREG. It seems reasonable enough to me. I am checking it in. Out of curiosity, do you know of a platform where this is an issue? Tom
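For readers wondering how a non-zero S_ISREG result could fail to fit in a bool, here is a small sketch of the general hazard. It assumes a host where the boolean type is a one-byte integer typedef and a file-type test that yields the masked bit itself; `narrow_bool` and `IS_REG_SKETCH` are made-up names, and the thread does not say which platform was affected.

```c
/* Hypothetical narrow boolean, standing in for a host where "bool" is a
   typedef for a one-byte integer rather than C99 _Bool.  */
typedef unsigned char narrow_bool;

/* Hypothetical file-type test in the style of S_ISREG that yields the
   masked bit (0x8000) rather than 0 or 1.  */
#define IS_REG_SKETCH(mode) ((mode) & 0x8000u)

/* Storing the raw value truncates it: 0x8000 becomes 0.  */
static narrow_bool
regular_truncating (unsigned int mode)
{
  return (narrow_bool) IS_REG_SKETCH (mode);
}

/* Testing against zero first yields exactly 0 or 1, which always fits.  */
static narrow_bool
regular_tested (unsigned int mode)
{
  return IS_REG_SKETCH (mode) != 0;
}
```

The second form corresponds to the "test on non-zero value" wording in the ChangeLog entry.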
Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)
ok for google/main. David On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote: This patch from Neil Vachharajani and Dehao Chen improves mcf by using minimum cost circulation instead of minimum cost flow to smooth profiles. It also introduces a parameter for controlling running time of the algorithm. This was what was originally presented in the academic work and handles certain cases where the function entry and exit have incorrect profile weights. For now, this is for google/main. Once I have collected performance results from SPEC I will propose this patch for trunk as well. Bootstraps and no test regressions. Ok for google/main? 2011-06-02 Neil Vachharajani nvach...@gmail.com, Dehao Chen daniel...@gmail.com * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document. * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS. (edge_type): Add SINK_SOURCE_EDGE. (dump_fixup_edge): Handle SINK_SOURCE_EDGE. (create_fixup_graph): Make problem miminum cost circulation. (cancel_negative_cycle): Update handling of infinite capacity. (compute_residual_flow): Update handling of infinite capacity. (find_max_flow): Update handling of infinite capacity. (modify_sink_source_capacity): New function. (find_minimum_cost_flow): Make problem miminum cost circulation. Use param PARAM_MIN_MCF_CANCEL_ITERS. * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 174456) +++ gcc/doc/invoke.texi (working copy) @@ -8341,6 +8341,12 @@ whether the result of a complex multipli The default is @option{-fno-cx-fortran-rules}. +@item min-mcf-cancel-iters +The minimum number of iterations of negative cycle cancellation during +MCF profile correction before early termination. This parameter is +only useful when using @option{-fprofile-correction}. 
+ + @end table The following options control optimizations that may improve Index: gcc/mcf.c === --- gcc/mcf.c (revision 174456) +++ gcc/mcf.c (working copy) @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3. #include langhooks.h #include tree.h #include gcov-io.h +#include params.h +#include diagnostic-core.h #include profile.h @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3. #define COST(k, w) ((k) / mcf_ln ((w) + 2)) /* Limit the number of iterations for cancel_negative_cycles() to ensure reasonable compile time. */ -#define MAX_ITER(n, e) 10 + (100 / ((n) * (e))) +#define MAX_ITER(n, e) (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \ + (100 / ((n) * (e + typedef enum { - INVALID_EDGE, + INVALID_EDGE = 0, VERTEX_SPLIT_EDGE, /* Edge to represent vertex with w(e) = w(v). */ REDIRECT_EDGE, /* Edge after vertex transformation. */ REVERSE_EDGE, SOURCE_CONNECT_EDGE, /* Single edge connecting to single source. */ SINK_CONNECT_EDGE, /* Single edge connecting to single sink. */ + SINK_SOURCE_EDGE, /* Single edge connecting sink to source. */ BALANCE_EDGE, /* Edge connecting with source/sink: cp(e) = 0. */ REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge. */ REVERSE_NORMALIZED_EDGE /* Normalized edge for a reverse edge. */ @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph fputs ( @SINK_CONNECT_EDGE, file); break; + case SINK_SOURCE_EDGE: + fputs ( @SINK_SOURCE_EDGE, file); + break; + case REVERSE_EDGE: fputs ( @REVERSE_EDGE, file); break; @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi double k_neg = 0; /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v). 
*/ gcov_type *diff_out_in = NULL; - gcov_type supply_value = 1, demand_value = 0; + gcov_type supply_value = 0, demand_value = 0; gcov_type fcost = 0; int new_entry_index = 0, new_exit_index = 0; int i = 0, j = 0; @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi fnum_vertices_after_transform + n_edges + n_basic_blocks + 2; /* In create_fixup_graph: Each basic block and edge can be split into 3 - edges. Number of balance edges = n_basic_blocks. So after - create_fixup_graph: - max_edges = 4 * n_basic_blocks + 3 * n_edges + edges. Number of balance edges = n_basic_blocks - 1. And there is 1 edge + connecting new_entry and new_exit, and 2 edges connecting new_entry to + entry, and exit to new_exit. So after create_fixup_graph: + max_edges = 4 * n_basic_blocks + 3 * n_edges + 2 Accounting for residual flow edges - max_edges = 2 * (4 * n_basic_blocks + 3 * n_edges) - = 8 * n_basic_blocks + 6 *
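One detail visible in the MAX_ITER hunk is that the replacement wraps the whole expansion in parentheses, while the old macro did not. The sketch below shows why that matters when the macro is used inside a larger expression. It is an illustration only: `min_mcf_cancel_iters_sketch` is a hypothetical stand-in for PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS), and the constant 100 is copied as it appears in this excerpt, where the new line is partly garbled.

```c
/* Hypothetical stand-in for PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS).  */
static int min_mcf_cancel_iters_sketch = 10;

/* Old form from the diff: no outer parentheses around the expansion.  */
#define MAX_ITER_OLD(n, e) 10 + (100 / ((n) * (e)))

/* Parenthesised form with the fixed base replaced by a tunable value.  */
#define MAX_ITER_NEW(n, e) \
  (min_mcf_cancel_iters_sketch + (100 / ((n) * (e))))

/* Expands to 2 * 10 + (100 / (n * e)): only the base is doubled.  */
static int
doubled_old (int n, int e)
{
  return 2 * MAX_ITER_OLD (n, e);
}

/* Expands to 2 * (base + 100 / (n * e)): the whole limit is doubled.  */
static int
doubled_new (int n, int e)
{
  return 2 * MAX_ITER_NEW (n, e);
}
```

With n = 2 and e = 5 the two helpers give different results, which is the kind of surprise the outer parentheses avoid.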
Re: [patch] testsuite: support board_info timeouts
Thanks! Committed.
Re: [google] Use minimum cost circulation, not minimum cost flow to smooth profiles other minor fixes. (issue4536106)
Counter overflow? David On Thu, Jun 2, 2011 at 11:12 AM, Martin Thuresson mart...@google.com wrote: On Thu, Jun 2, 2011 at 11:05 AM, Xinliang David Li davi...@google.com wrote: Smoothing works for sample FDO and profile data from multi-threaded programs. You won't see any difference in SPEC. Dehao reported some performance improvements from the algorithmic improvements he added in terms of extra fixup edges and handling of infinite capacity. Martin David On Thu, Jun 2, 2011 at 11:00 AM, Martin Thuresson mart...@google.com wrote: This patch from Neil Vachharajani and Dehao Chen improves mcf by using minimum cost circulation instead of minimum cost flow to smooth profiles. It also introduces a parameter for controlling running time of the algorithm. This was what was originally presented in the academic work and handles certain cases where the function entry and exit have incorrect profile weights. For now, this is for google/main. Once I have collected performance results from SPEC I will propose this patch for trunk as well. Bootstraps and no test regressions. Ok for google/main? 2011-06-02 Neil Vachharajani nvach...@gmail.com, Dehao Chen daniel...@gmail.com * gcc/doc/invoke.texi (min-mcf-cancel-iters): Document. * gcc/mcf.c (MAX_ITER): Use new param PARAM_MIN_MCF_CANCEL_ITERS. (edge_type): Add SINK_SOURCE_EDGE. (dump_fixup_edge): Handle SINK_SOURCE_EDGE. (create_fixup_graph): Make problem miminum cost circulation. (cancel_negative_cycle): Update handling of infinite capacity. (compute_residual_flow): Update handling of infinite capacity. (find_max_flow): Update handling of infinite capacity. (modify_sink_source_capacity): New function. (find_minimum_cost_flow): Make problem miminum cost circulation. Use param PARAM_MIN_MCF_CANCEL_ITERS. * gcc/params.def (PARAM_MIN_MCF_CANCEL_ITERS): Define. 
Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 174456) +++ gcc/doc/invoke.texi (working copy) @@ -8341,6 +8341,12 @@ whether the result of a complex multipli The default is @option{-fno-cx-fortran-rules}. +@item min-mcf-cancel-iters +The minimum number of iterations of negative cycle cancellation during +MCF profile correction before early termination. This parameter is +only useful when using @option{-fprofile-correction}. + + @end table The following options control optimizations that may improve Index: gcc/mcf.c === --- gcc/mcf.c (revision 174456) +++ gcc/mcf.c (working copy) @@ -52,6 +52,8 @@ along with GCC; see the file COPYING3. #include langhooks.h #include tree.h #include gcov-io.h +#include params.h +#include diagnostic-core.h #include profile.h @@ -64,15 +66,18 @@ along with GCC; see the file COPYING3. #define COST(k, w) ((k) / mcf_ln ((w) + 2)) /* Limit the number of iterations for cancel_negative_cycles() to ensure reasonable compile time. */ -#define MAX_ITER(n, e) 10 + (100 / ((n) * (e))) +#define MAX_ITER(n, e) (PARAM_VALUE (PARAM_MIN_MCF_CANCEL_ITERS) + \ + (100 / ((n) * (e + typedef enum { - INVALID_EDGE, + INVALID_EDGE = 0, VERTEX_SPLIT_EDGE, /* Edge to represent vertex with w(e) = w(v). */ REDIRECT_EDGE, /* Edge after vertex transformation. */ REVERSE_EDGE, SOURCE_CONNECT_EDGE, /* Single edge connecting to single source. */ SINK_CONNECT_EDGE, /* Single edge connecting to single sink. */ + SINK_SOURCE_EDGE, /* Single edge connecting sink to source. */ BALANCE_EDGE, /* Edge connecting with source/sink: cp(e) = 0. */ REDIRECT_NORMALIZED_EDGE, /* Normalized edge for a redirect edge. */ REVERSE_NORMALIZED_EDGE /* Normalized edge for a reverse edge. 
*/ @@ -250,6 +255,10 @@ dump_fixup_edge (FILE *file, fixup_graph fputs ( @SINK_CONNECT_EDGE, file); break; + case SINK_SOURCE_EDGE: + fputs ( @SINK_SOURCE_EDGE, file); + break; + case REVERSE_EDGE: fputs ( @REVERSE_EDGE, file); break; @@ -465,7 +474,7 @@ create_fixup_graph (fixup_graph_type *fi double k_neg = 0; /* Vector to hold D(v) = sum_out_edges(v) - sum_in_edges(v). */ gcov_type *diff_out_in = NULL; - gcov_type supply_value = 1, demand_value = 0; + gcov_type supply_value = 0, demand_value = 0; gcov_type fcost = 0; int new_entry_index = 0, new_exit_index = 0; int i = 0, j = 0; @@ -486,14 +495,15 @@ create_fixup_graph (fixup_graph_type *fi fnum_vertices_after_transform + n_edges + n_basic_blocks + 2; /* In create_fixup_graph: Each basic block and edge can be split
Fix for PR objc/48539 (Missing warning when messaging a forward-declared class)
This patch fixes PR objc/48539 (Missing warning when messaging a forward-declared class). The problem occurs when using @class, and then messaging class or instance objects of the class. It can happen both with class and instance methods.

-- An example with class methods is @class MyClass; [MyClass method]; In this example, the compiler has no information on 'MyClass' and on what methods it responds to. So, there is no way to determine if MyClass responds to the method +method, and what the method prototype is (in this example it doesn't matter, but if there are arguments or return values, it could matter a lot, including potentially causing a crash at runtime if the wrong method prototype is used). Clang emits a warning there, which seems very appropriate, and with this patch, GCC 4.7.0 emits a warning there too. :-)

-- Then there is the issue of what to do about instance methods, as in the following example -- @class MyClass; MyClass *x; [x method]; This is almost identical to the case above, and this patch adds a similar warning. ;-) Note that in this case, the current behaviour of the compiler is substandard; the compiler silently throws away the MyClass * type, silently casts x to id, and proceeds to accept it as a receiver of any possible method of any class, without any warning (!!). clang does the same by the way. We clearly do want to emit a warning there, instead. The fact that the programmer has explicitly declared x to be of type MyClass * instead of id means she is expecting the compiler to use that information to do the standard method lookup/check based on the class. If the @interface is missing, it is most likely an error / slip in the program, which is worthwhile for the compiler to warn about (in the same way as we warn above for class methods!).
;-) So this patch changes this behaviour and adds a warning here; if the programmer doesn't want the warning and is happy with x being treated as an id, she can simply add a cast to id to clarify her mind, and the warning will go away. Ie. if you do @class MyClass; MyClass *x; [(id)x method]; you don't get any warning as you explicitly disabled type-based checks by casting to id. But if you leave x to be of type MyClass *, then you're asking for type-based checks / method lookup, the compiler will try to do the method lookup, and if the @interface of MyClass was not found, will emit a warning because it can't do the requested type-based checks / method lookup. ;-) I tried this patch with gnustep core and it did find a number of slips in the code, which is good. No major bugs, but all cases where someone had forgotten to #include the header with the @interface of a class and so where the compiler couldn't do the proper checks, but nobody would notice because of the current silent behaviour where the missing @interface causes the variable to magically and silently become of type id. In fact, looking at the examples is very convincing that we need this warning. Without it, @class NSArray; basically makes NSArray * a typedef for id when doing method invocations. So, in your code you may have methods or functions taking (NSArray *) arguments, and then you call methods on these arguments, expecting the compiler to check that the methods are appropriate for an NSArray. Instead, the compiler is silently treating NSArray * as identical to id, and performing no checks whatsoever, and not bothering to tell you anything about the fact! 
;-)

-- Finally, there's the additional complication of deciding what to do when this is mixed with protocols, as in -- @class MyClass; @protocol MyProtocol - (void) method; @end MyClass <MyProtocol> *x; [x method]; This is a weird/rare case, and I don't expect to see it much in practice, but it needs to be sorted out, and GCC already has at least two existing testcases for it in the GCC testsuite. At the moment, the compiler does the equivalent of silently converting MyClass <MyProtocol> * to id <MyProtocol>. That's not great, but I pondered about this for a long time, tried a few variations, and then decided to make no changes. :-) The reason is that if the method being called is part of the protocol, then the compiler can find the method prototype (and do the type-checking) without needing any more information on the actual class. Complaining about the @interface not being available seems pointless nit-picking, since it's not required to do the type-based method lookup. ;-) If the method being called is not part of the protocol, the compiler already emits a warning that the method could not be found in the protocol. I thought about adding a second warning about the @interface not being found, and for a while had it in the patch, but in practice it seemed overkill and I removed it from the final version.

-- The warning message that I chose for GCC is -- method-lookup-1.m:42:3: warning: @interface of class ‘NotKnown’
Re: Fix for PR objc/48539 (Missing warning when messaging a forward-declared class)
On Jun 2, 2011, at 11:29 AM, Nicola Pero wrote: This patch fixes PR objc/48539 (Missing warning when messaging a forward-declared class). Ok to commit to trunk ? Ok.
Re: __sync_swap* with acq/rel/full memory barrier semantics
On 05/30/11 15:07, Andrew MacLeod wrote: Aldy was just too excited about working on memory model I think :-) I've been looking at this, and I propose we go this way : http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen Still overly excited, but now with a more thorough plan :). I'm going to concentrate on the non controversial parts (the __sync builtins), while the details are ironed out. The attached patch implements the exchange operation, with a parameter/enum for the type of memory model to use. I have chosen to call the builtins __sync_mem_BLAH to keep them all consistent. I am including documentation and a test, so folks can get an idea of where I'm headed with this. Once I take everyone's input, we can implement the rest of the builtins, and take it from there. I see no prior art in providing some sort of enum for a builtin parameter. I can proceed down this path if advisable, but an easier path is to just declare the __SYNC_MEM_* enum as preprocessor macros as I do in this patch. Suggestions welcome. How does this (lightly tested patch) look? * doc/extend.texi (__sync_mem_exchange): Document. * cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*. * c-family/c-common.c (BUILT_IN_MEM_EXCHANGE_N): Add case. * optabs.c (expand_sync_mem_exchange): New. * optabs.h (enum direct_optab_index): Add DOI_sync_mem* entries. (sync_mem_exchange_*_optab): Define. * genopinit.c: Add entries for sync_mem_exchange_*. * tree.h (enum memmodel): New. * builtins.c (get_memmodel): New. (expand_builtin_mem_exchange): New. (expand_builtin_synchronize): Remove static. (expand_builtin): Add cases for BUILT_IN_MEM_EXCHANGE_*. * sync-builtins.def: Add entries for BUILT_IN_MEM_EXCHANGE_*. * builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New. * expr.h (expand_sync_mem_exchange): Declare. (expand_builtin_synchronize): Same. * config/i386/i386.md (UNSPECV_MEM_XCHG): New. (sync_mem_exchange_seq_cstmode): New pattern. 
Index: doc/extend.texi === --- doc/extend.texi (revision 173831) +++ doc/extend.texi (working copy) @@ -6728,6 +6728,22 @@ This builtin is not a full barrier, but This means that all previous memory stores are globally visible, and all previous memory loads have been satisfied, but following memory reads are not prevented from being speculated to before the barrier. + +@item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...) +@findex __sync_mem_exchange +This builtin implements an atomic exchange operation within the +constraints of a memory model. It writes @var{value} into +@code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. + +The valid memory model variants for this builtin are +__SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE, +__SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL. If the variant is not +available for the given target, the compiler will fall back to the +more restrictive memory model, the sequentially consistent model (if +available). If the sequentially consistent model is not implemented +for the target, the compiler will implement the builtin with a compare +and swap loop. 
@end table @node Object Size Checking Index: cppbuiltin.c === --- cppbuiltin.c(revision 173831) +++ cppbuiltin.c(working copy) @@ -66,6 +66,12 @@ define__GNUC__ (cpp_reader *pfile) cpp_define_formatted (pfile, __GNUC_MINOR__=%d, minor); cpp_define_formatted (pfile, __GNUC_PATCHLEVEL__=%d, patchlevel); cpp_define_formatted (pfile, __VERSION__=\%s\, version_string); + cpp_define_formatted (pfile, __SYNC_MEM_RELAXED=%d, MEMMODEL_RELAXED); + cpp_define_formatted (pfile, __SYNC_MEM_SEQ_CST=%d, MEMMODEL_SEQ_CST); + cpp_define_formatted (pfile, __SYNC_MEM_ACQUIRE=%d, MEMMODEL_ACQUIRE); + cpp_define_formatted (pfile, __SYNC_MEM_RELEASE=%d, MEMMODEL_RELEASE); + cpp_define_formatted (pfile, __SYNC_MEM_ACQ_REL=%d, MEMMODEL_ACQ_REL); + cpp_define_formatted (pfile, __SYNC_MEM_CONSUME=%d, MEMMODEL_CONSUME); } Index: c-family/c-common.c === --- c-family/c-common.c (revision 173831) +++ c-family/c-common.c (working copy) @@ -9035,6 +9035,7 @@ resolve_overloaded_builtin (location_t l case BUILT_IN_VAL_COMPARE_AND_SWAP_N: case BUILT_IN_LOCK_TEST_AND_SET_N: case BUILT_IN_LOCK_RELEASE_N: +case BUILT_IN_MEM_EXCHANGE_N: { int n = sync_resolve_size (function, params); tree new_function, first_param, result; Index: optabs.c === --- optabs.c(revision 173831) +++ optabs.c(working copy) @@ -6988,6 +6988,85 @@ expand_sync_lock_test_and_set (rtx mem,
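To give an idea of what calling such an exchange builtin looks like from user code, here is a short sketch. Note the assumption: it does not use the __sync_mem_exchange spelling proposed in this patch, which may not exist in any given compiler; it uses the __atomic_exchange_n builtin and __ATOMIC_* constants that GCC documents, which take the same pointer, value and memory-model arguments. The wrapper names are made up.

```c
/* Atomically store VALUE into *SLOT and return the previous contents,
   using the sequentially consistent model.  */
static int
swap_in_seq_cst (int *slot, int value)
{
  return __atomic_exchange_n (slot, value, __ATOMIC_SEQ_CST);
}

/* Same operation with the weaker acquire model.  */
static int
swap_in_acquire (int *slot, int value)
{
  return __atomic_exchange_n (slot, value, __ATOMIC_ACQUIRE);
}
```

The memory model is an ordinary integer argument here, which matches the approach in the patch of exposing the model names as preprocessor macros rather than a builtin enum.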
Re: __sync_swap* with acq/rel/full memory barrier semantics
On Thu, Jun 02, 2011 at 02:12:38PM -0500, Aldy Hernandez wrote:

+/* This function expands a fine grained atomic exchange operation:
+   atomically store VAL in MEM and return the previous value in MEM.
+
+   MEMMODEL is the memory model variant to use.
+   TARGET is an option place to stick the return value.  */
+
+rtx
+expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
+{
+  enum machine_mode mode = GET_MODE (mem);
+  enum insn_code icode;
+  direct_optab op;
+
+  switch (model)
+    {
+    case MEMMODEL_RELAXED:
+      /* ?? Eventually we should either just emit the atomic
+         instruction without any barriers (and thus allow movements
+         and transformations), or emit a relaxed builtin.
+
+         It is still not clear whether any transformations are
+         permissible on the atomics (for example, CSE might break
+         coherence), so we might need to emit a relaxed builtin.
+
+         Until we figure this out, be conservative and fall
+         through.  */
+    case MEMMODEL_SEQ_CST:
+      op = sync_mem_exchange_seq_cst_optab;
+      break;
+    case MEMMODEL_ACQUIRE:
+      op = sync_mem_exchange_acq_optab;
+      break;
+    case MEMMODEL_RELEASE:
+      op = sync_mem_exchange_rel_optab;
+      break;
+    case MEMMODEL_ACQ_REL:
+      op = sync_mem_exchange_acq_rel_optab;
+      break;

Wouldn't it be better to pass the model (as an extra CONST_INT operand) to the expanders? Targets where atomic instructions always act as full barriers could just ignore that argument; others could decide what to do based on the value.

Jakub
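To make the suggestion above concrete, here is a minimal user-level sketch. It is not GCC internals: it uses C++11 `std::atomic` as a stand-in for the expander, and all function names are hypothetical. It shows the memory model travelling as an ordinary argument, a variant that ignores the argument on the assumption that every atomic operation is a full barrier, and the compare-and-swap loop that the quoted documentation names as the last-resort fallback.

```cpp
#include <atomic>
#include <cassert>

// Single entry point: the memory model is an ordinary argument, analogous
// to passing it as an extra operand instead of picking a per-model optab.
int exchange_with_model(std::atomic<int>& cell, int value,
                        std::memory_order model) {
  return cell.exchange(value, model);
}

// Hypothetical "target" whose atomic instructions always act as full
// barriers: it accepts the model but can safely ignore it.
int exchange_ignoring_model(std::atomic<int>& cell, int value,
                            std::memory_order /* model, ignored */) {
  return cell.exchange(value, std::memory_order_seq_cst);
}

// Fallback when no exchange primitive is assumed to exist: retry a
// compare-and-swap until it succeeds, then return the value it replaced.
int exchange_via_cas_loop(std::atomic<int>& cell, int value) {
  int expected = cell.load(std::memory_order_relaxed);
  while (!cell.compare_exchange_weak(expected, value,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
    // On failure, `expected` now holds the current contents; try again.
  }
  return expected;
}

// Small self-check: every variant returns the previous contents.
void check_exchange_variants() {
  std::atomic<int> cell{1};
  assert(exchange_with_model(cell, 2, std::memory_order_acq_rel) == 1);
  assert(exchange_ignoring_model(cell, 3, std::memory_order_relaxed) == 2);
  assert(exchange_via_cas_loop(cell, 4) == 3);
  assert(cell.load() == 4);
}
```

With one entry point, a caller that needs only acquire semantics and a caller that needs sequential consistency share the same code path, and the decision about how much barrier to emit stays in one place.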
[PATCH] [Bug c++/49118] fake template nesting for operator-> chain
This is my first frontend contribution. While it fixes the crash and produces an explanatory error message, the message isn't quite right. I don't understand the message generation system, so I might need help. Or, it looks like there's an issue with template backtraces at the moment anyway, so there might be an interaction with another known bug.

The problem occurs when operator-> drill-down behavior is infinitely chained, for example with a template

template< int n > t< n + 1 > t< n >::operator->()

There is no cycle to signal endlessness, and no template nesting, as drill-down is implemented as a deep expression, not tail-calls. The result is that the compiler hangs.

My solution is to pretend that there is template nesting, presuming the user will find this intuitive. There is the added benefit of the maximum chain length being configured by the template nesting limit.

Drill-down is implemented by build_x_arrow. If operator-> resolves, it calls it and uses the result type to look up another operator->. I'd like to re-open a template context related to operator-> after generating the call. The function push_tinst_level seems to relate only to diagnostics, with no semantic effect, so it seems a good candidate.

Optimally the re-opened context would be the preceding operator-> function itself, to create the illusion of nested calls. However, the result of build_new_op may be a target_expr or a call_expr. I'm not sure of the best way to recover the function declaration from this ambiguous tree, nor whether it would be a performance issue (i.e., too much work for the reward).

The identity of the class containing the *next* operator-> call is easy to recover, however, since it is the type of the expression from build_new_op. This introduces an off-by-one error, and gets us a class template rather than the more relevant function member. These problems shouldn't matter since this is all just for diagnostics. But perhaps the discrepancy between having a function type and a class type is interfering with message generation?

Thanks for the help and consideration!

endless_arrow.clog
Description: Binary data

endless_arrow.patch
Description: Binary data
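For readers unfamiliar with the drill-down rule discussed in this message, the following is a small sketch of a chain that does terminate. The types `Payload`, `Inner` and `Outer` are hypothetical and do not come from the patch. The compiler keeps applying operator-> to each class-type result until one of them yields a raw pointer; in the infinite case described above, no step ever yields a pointer.

```cpp
// Hypothetical illustration types; none of these names come from the patch.
struct Payload {
  int value;
};

// Innermost wrapper: operator-> returns a raw pointer, which ends the chain.
struct Inner {
  const Payload* target;
  constexpr const Payload* operator->() const { return target; }
};

// Outer wrapper: operator-> returns another class type, so the compiler
// applies operator-> again to that result (the "drill-down").
struct Outer {
  Inner inner;
  constexpr Inner operator->() const { return inner; }
};

// Reads a member through the chain Outer -> Inner -> const Payload*.
constexpr int read_through_chain(const Outer& o) {
  return o->value;  // same as o.operator->().operator->()->value
}
```

Each extra wrapper level adds one more implicit operator-> call to the single expression `o->value`, which is why an endless family of wrapper types gives the compiler an expression with no natural stopping point.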
Re: __sync_swap* with acq/rel/full memory barrier semantics
On 06/02/11 14:25, Jakub Jelinek wrote:

+    case MEMMODEL_SEQ_CST:
+      op = sync_mem_exchange_seq_cst_optab;
+      break;
+    case MEMMODEL_ACQUIRE:
+      op = sync_mem_exchange_acq_optab;
+      break;
+    case MEMMODEL_RELEASE:
+      op = sync_mem_exchange_rel_optab;
+      break;
+    case MEMMODEL_ACQ_REL:
+      op = sync_mem_exchange_acq_rel_optab;
+      break;

Wouldn't it be better to pass the model (as an extra CONST_INT operand) to the expanders? Targets where atomic instructions always act as full barriers could just ignore that argument; others could decide what to do based on the value.

*shrug* I don't care. Whatever everyone agrees on.
Re: PING^2 [PATCH] Support for AMD64 targets running GNU/kFreeBSD
Hi, 2011/5/21 Joseph S. Myers jos...@codesourcery.com: Please send a patch against *current trunk* and CC *relevant target architecture maintainers*. linux*.h headers are no longer used on non-Linux targets (since my 2011-04-28 patch - on which I CC:ed you) so this patch version is no longer appropriate. I think you'll want to make gnu-user64.h use GNU_USER_LINK_EMULATION32 and GNU_USER_LINK_EMULATION64 similarly to how gnu-user.h uses GNU_USER_LINK_EMULATION. Thanks for the tip. Here's an update to current trunk. -- Robert Millan 2011-06-02 Robert Millan r...@gnu.org * config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'. * config/kfreebsd-gnu.h (GNU_USER_DYNAMIC_LINKER): Resync with `config/linux.h'. * config/i386/kfreebsd-gnu64.h: New file. * config.gcc (x86_64-*-kfreebsd*-gnu): Replace `i386/kfreebsd-gnu.h' with `i386/kfreebsd-gnu64.h'. * config/i386/linux64.h (GNU_USER_LINK_EMULATION32) (GNU_USER_LINK_EMULATION64): New macros. * config/i386/gnu-user64.h (LINK_SPEC): Rely on `GNU_USER_LINK_EMULATION32' and `GNU_USER_LINK_EMULATION64' instead of hardcoding `elf_i386' and `elf_x86_64'. Index: gcc/config/i386/kfreebsd-gnu64.h === --- gcc/config/i386/kfreebsd-gnu64.h(revision 0) +++ gcc/config/i386/kfreebsd-gnu64.h(revision 0) @@ -0,0 +1,26 @@ +/* Definitions for AMD x86-64 running kFreeBSD-based GNU systems with ELF format + Copyright (C) 2011 + Free Software Foundation, Inc. + Contributed by Robert Millan. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. 
+ +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#define GNU_USER_LINK_EMULATION32 elf_i386_fbsd +#define GNU_USER_LINK_EMULATION64 elf_x86_64_fbsd + +#define GLIBC_DYNAMIC_LINKER32 /lib/ld.so.1 +#define GLIBC_DYNAMIC_LINKER64 /lib64/ld-kfreebsd-x86-64.so.1 Index: gcc/config/i386/kfreebsd-gnu.h === --- gcc/config/i386/kfreebsd-gnu.h (revision 174566) +++ gcc/config/i386/kfreebsd-gnu.h (working copy) @@ -1,5 +1,5 @@ /* Definitions for Intel 386 running kFreeBSD-based GNU systems with ELF format - Copyright (C) 2004, 2007, 2011 + Copyright (C) 2011 Free Software Foundation, Inc. Contributed by Robert Millan. @@ -19,11 +19,5 @@ along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ -#undef GNU_USER_LINK_EMULATION #define GNU_USER_LINK_EMULATION elf_i386_fbsd - -#undef GNU_USER_DYNAMIC_LINKER32 -#define GNU_USER_DYNAMIC_LINKER32 /lib/ld.so.1 - -#undef GNU_USER_DYNAMIC_LINKER64 -#define GNU_USER_DYNAMIC_LINKER64 /lib/ld-kfreebsd-x86-64.so.1 +#define GLIBC_DYNAMIC_LINKER /lib/ld.so.1 Index: gcc/config/i386/linux64.h === --- gcc/config/i386/linux64.h (revision 174566) +++ gcc/config/i386/linux64.h (working copy) @@ -24,6 +24,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see http://www.gnu.org/licenses/. 
*/ +#define GNU_USER_LINK_EMULATION32 elf_i386 +#define GNU_USER_LINK_EMULATION64 elf_x86_64 + #define GLIBC_DYNAMIC_LINKER32 /lib/ld-linux.so.2 #define GLIBC_DYNAMIC_LINKER64 /lib64/ld-linux-x86-64.so.2 Index: gcc/config/i386/gnu-user64.h === --- gcc/config/i386/gnu-user64.h(revision 174566) +++ gcc/config/i386/gnu-user64.h(working copy) @@ -69,7 +69,8 @@ %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}} #undef LINK_SPEC -#define LINK_SPEC %{ SPEC_64 :-m elf_x86_64} %{ SPEC_32 :-m elf_i386} \ +#define LINK_SPEC %{ SPEC_64 :-m GNU_USER_LINK_EMULATION64 } \ + %{ SPEC_32 :-m GNU_USER_LINK_EMULATION32 } \ %{shared:-shared} \ %{!shared: \ %{!static: \ Index: gcc/config/kfreebsd-gnu.h === --- gcc/config/kfreebsd-gnu.h (revision 174566) +++ gcc/config/kfreebsd-gnu.h (working copy) @@ -19,7 +19,6 @@ along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ -#undef GNU_USER_TARGET_OS_CPP_BUILTINS #define GNU_USER_TARGET_OS_CPP_BUILTINS() \ do \ { \ @@ -31,5 +30,6 @@ }
[PATCH, i386]: Introduce Y3 register constraint and merge SSE3 patterns
Hello! 2011-06-02 Uros Bizjak ubiz...@gmail.com * config/i386/constraints.md (Y3): New register constraint. * config/i386/sse.md (*vec_interleave_highv2df): Merge with *sse3_interleave_highv2df and *sse2_interleave_highv2df. (*vec_interleave_lowv2df): Merge with *sse3_interleave_lowv2df and *sse2_interleave_lowv2df. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: constraints.md === --- constraints.md (revision 174570) +++ constraints.md (working copy) @@ -99,6 +99,9 @@ (define_register_constraint Y2 TARGET_SSE2 ? SSE_REGS : NO_REGS @internal Any SSE register, when SSE2 is enabled.) +(define_register_constraint Y3 TARGET_SSE3 ? SSE_REGS : NO_REGS + @internal Any SSE register, when SSE3 is enabled.) + (define_register_constraint Y4 TARGET_SSE4_1 ? SSE_REGS : NO_REGS @internal Any SSE register, when SSE4_1 is enabled.) Index: sse.md === --- sse.md (revision 174570) +++ sse.md (working copy) @@ -3804,15 +3804,15 @@ operands[2] = force_reg (V2DFmode, operands[2]); }) -(define_insn *sse3_interleave_highv2df - [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,x,x,x,m) +(define_insn *vec_interleave_highv2df + [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,Y3,x,x,m) (vec_select:V2DF (vec_concat:V4DF - (match_operand:V2DF 1 nonimmediate_operand 0,x,o,o,o,x) - (match_operand:V2DF 2 nonimmediate_operand x,x,1,0,x,0)) + (match_operand:V2DF 1 nonimmediate_operand 0,x,o ,o,o,x) + (match_operand:V2DF 2 nonimmediate_operand x,x,1 ,0,x,0)) (parallel [(const_int 1) (const_int 3)])))] - TARGET_SSE3 ix86_vec_interleave_v2df_operator_ok (operands, 1) + TARGET_SSE2 ix86_vec_interleave_v2df_operator_ok (operands, 1) @ unpckhpd\t{%2, %0|%0, %2} vunpckhpd\t{%2, %1, %0|%0, %1, %2} @@ -3826,23 +3826,6 @@ (set_attr prefix orig,vex,maybe_vex,orig,vex,maybe_vex) (set_attr mode V2DF,V2DF,V2DF,V1DF,V1DF,V1DF)]) -(define_insn *sse2_interleave_highv2df - [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,m) - 
(vec_select:V2DF - (vec_concat:V4DF - (match_operand:V2DF 1 nonimmediate_operand 0,o,x) - (match_operand:V2DF 2 nonimmediate_operand x,0,0)) - (parallel [(const_int 1) -(const_int 3)])))] - TARGET_SSE2 ix86_vec_interleave_v2df_operator_ok (operands, 1) - @ - unpckhpd\t{%2, %0|%0, %2} - movlpd\t{%H1, %0|%0, %H1} - movhpd\t{%1, %0|%0, %1} - [(set_attr type sselog,ssemov,ssemov) - (set_attr prefix_data16 *,1,1) - (set_attr mode V2DF,V1DF,V1DF)]) - ;; Recall that the 256-bit unpck insns only shuffle within their lanes. (define_expand avx_movddup256 [(set (match_operand:V4DF 0 register_operand ) @@ -3923,15 +3906,15 @@ operands[1] = force_reg (V2DFmode, operands[1]); }) -(define_insn *sse3_interleave_lowv2df - [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,x,x,x,o) +(define_insn *vec_interleave_lowv2df + [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,Y3,x,x,o) (vec_select:V2DF (vec_concat:V4DF - (match_operand:V2DF 1 nonimmediate_operand 0,x,m,0,x,0) - (match_operand:V2DF 2 nonimmediate_operand x,x,1,m,m,x)) + (match_operand:V2DF 1 nonimmediate_operand 0,x,m ,0,x,0) + (match_operand:V2DF 2 nonimmediate_operand x,x,1 ,m,m,x)) (parallel [(const_int 0) (const_int 2)])))] - TARGET_SSE3 ix86_vec_interleave_v2df_operator_ok (operands, 0) + TARGET_SSE2 ix86_vec_interleave_v2df_operator_ok (operands, 0) @ unpcklpd\t{%2, %0|%0, %2} vunpcklpd\t{%2, %1, %0|%0, %1, %2} @@ -3945,23 +3928,6 @@ (set_attr prefix orig,vex,maybe_vex,orig,vex,maybe_vex) (set_attr mode V2DF,V2DF,V2DF,V1DF,V1DF,V1DF)]) -(define_insn *sse2_interleave_lowv2df - [(set (match_operand:V2DF 0 nonimmediate_operand =x,x,o) - (vec_select:V2DF - (vec_concat:V4DF - (match_operand:V2DF 1 nonimmediate_operand 0,0,0) - (match_operand:V2DF 2 nonimmediate_operand x,m,x)) - (parallel [(const_int 0) -(const_int 2)])))] - TARGET_SSE2 ix86_vec_interleave_v2df_operator_ok (operands, 0) - @ - unpcklpd\t{%2, %0|%0, %2} - movhpd\t{%2, %0|%0, %2} - movlpd\t{%2, %H0|%H0, %2} - [(set_attr type sselog,ssemov,ssemov) - 
(set_attr prefix_data16 *,1,1) - (set_attr mode V2DF,V1DF,V1DF)]) - (define_split [(set (match_operand:V2DF 0 memory_operand ) (vec_select:V2DF
Re: [google]Backport r174549 Fix 3 test cases incorrectly run in Thumb/Xscale (issue4524090)
OK for google/main. thanks Carrot On Thu, Jun 2, 2011 at 12:51 PM, Jing Yu jin...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00134.html Backport r174549 to fix three testcases that are specific to ARM mode and therefore should be skipped when compiling for thumb. Thanks, Jing 2011-06-01 Jing Yu jin...@google.com Backport r174549 2011-06-01 Sofiane Naci sofiane.n...@arm.com * gcc.target/arm/mmx-1.c: Skip test in -mthumb. * gcc.target/arm/g2.c: Skip test in -mthumb. Skip test unless cpu is xscale. * gcc.target/arm/scd42-2.c: Likewise. Index: gcc.target/arm/mmx-1.c === --- gcc.target/arm/mmx-1.c (revision 174299) +++ gcc.target/arm/mmx-1.c (working copy) @@ -4,6 +4,7 @@ /* { dg-skip-if Test is specific to the iWMMXt { arm*-*-* } { -mcpu=* } { -mcpu=iwmmxt } } */ /* { dg-skip-if Test is specific to the iWMMXt { arm*-*-* } { -mabi=* } { -mabi=iwmmxt } } */ /* { dg-skip-if Test is specific to the iWMMXt { arm*-*-* } { -march=* } { -march=iwmmxt } } */ +/* { dg-skip-if Test is specific to ARM mode { arm*-*-* } { -mthumb } { } } */ /* { dg-options -O -mno-apcs-frame -mcpu=iwmmxt -mabi=iwmmxt } */ /* { dg-require-effective-target arm32 } */ /* { dg-require-effective-target arm_iwmmxt_ok } */ Index: gcc.target/arm/g2.c === --- gcc.target/arm/g2.c (revision 174299) +++ gcc.target/arm/g2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-do compile } */ /* { dg-options -mcpu=xscale -O2 } */ /* { dg-skip-if Test is specific to the Xscale { arm*-*-* } { -march=* } { -march=xscale } } */ +/* { dg-skip-if Test is specific to the Xscale { arm*-*-* } { -mcpu=* } { -mcpu=xscale } } */ +/* { dg-skip-if Test is specific to ARM mode { arm*-*-* } { -mthumb } { } } */ /* { dg-require-effective-target arm32 } */ /* Brett Gaines' test case. 
*/ Index: gcc.target/arm/scd42-2.c === --- gcc.target/arm/scd42-2.c (revision 174299) +++ gcc.target/arm/scd42-2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-do compile } */ /* { dg-options -mcpu=xscale -O } */ /* { dg-skip-if Test is specific to the Xscale { arm*-*-* } { -march=* } { -march=xscale } } */ +/* { dg-skip-if Test is specific to the Xscale { arm*-*-* } { -mcpu=* } { -mcpu=xscale } } */ +/* { dg-skip-if Test is specific to ARM mode { arm*-*-* } { -mthumb } { } } */ /* { dg-require-effective-target arm32 } */ unsigned load2(void) __attribute__ ((naked)); -- This patch is available for review at http://codereview.appspot.com/4524090
Re: Remove SETJMP_VIA_SAVE_AREA support
This exposed a couple of similar bugs in cse.c and postreload-gcse.c: the code was effectively treating a basic block with a single, abnormal incoming edge as if the edge was normal. I've installed the following refined fix, after testing on i586-suse-linux and sparc-sun-solaris2.10.

Most EDGE_ABNORMAL edges can very likely be treated normally here, for example EH edges when call-saved registers are considered. The only really problematic ones are EDGE_ABNORMAL_CALL edges when there is a non-local label in the function, because even call-saved registers are not guaranteed to be preserved in this case.

2011-06-02  Eric Botcazou  ebotca...@adacore.com

	* cse.c (cse_find_path): Refine change to exclude EDGE_ABNORMAL_CALL edges only, when there is a non-local label in the function.
	* postreload-gcse.c (bb_has_well_behaved_predecessors): Likewise.

-- 
Eric Botcazou

Index: cse.c
===
--- cse.c (revision 174564)
+++ cse.c (working copy)
@@ -6193,7 +6193,7 @@ cse_find_path (basic_block first_bb, str
            e = NULL;
 
          if (e
-             && (e->flags & EDGE_ABNORMAL) == 0
+             && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
              && e->dest != EXIT_BLOCK_PTR
              && single_pred_p (e->dest)
              /* Avoid visiting basic blocks twice.  The large comment
Index: postreload-gcse.c
===
--- postreload-gcse.c (revision 174564)
+++ postreload-gcse.c (working copy)
@@ -912,12 +912,10 @@ get_avail_load_store_reg (rtx insn)
 static bool
 bb_has_well_behaved_predecessors (basic_block bb)
 {
-  unsigned int edge_count = EDGE_COUNT (bb->preds);
   edge pred;
   edge_iterator ei;
 
-  if (edge_count == 0
-      || (edge_count == 1 && (single_pred_edge (bb)->flags & EDGE_ABNORMAL)))
+  if (EDGE_COUNT (bb->preds) == 0)
     return false;
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -925,6 +923,9 @@ bb_has_well_behaved_predecessors (basic_
       if ((pred->flags & EDGE_ABNORMAL) && EDGE_CRITICAL_P (pred))
         return false;
 
+      if ((pred->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
+        return false;
+
       if (JUMP_TABLE_DATA_P (BB_END (pred->src)))
         return false;
     }
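The difference between the earlier test and the refined one can be sketched as two predicates. This is only an illustration: the flag values, the `Edge` and `Function` structs, and the function names are hypothetical stand-ins, not GCC's actual definitions.

```cpp
#include <cstdint>

// Hypothetical flag values; GCC's real EDGE_* constants differ.
constexpr std::uint32_t kEdgeAbnormal = 1u << 0;
constexpr std::uint32_t kEdgeAbnormalCall = 1u << 1;
constexpr std::uint32_t kEdgeEh = 1u << 2;

struct Edge {
  std::uint32_t flags;
};

struct Function {
  bool has_nonlocal_label;
};

// Earlier, stricter test: any abnormal edge ends the path.
constexpr bool follow_edge_strict(Edge e) {
  return (e.flags & kEdgeAbnormal) == 0;
}

// Refined test: only abnormal-call edges in a function that has a
// non-local label are rejected; other abnormal edges (for example EH
// edges) are followed.
constexpr bool follow_edge_refined(Edge e, Function fn) {
  return !((e.flags & kEdgeAbnormalCall) != 0 && fn.has_nonlocal_label);
}
```

In this sketch an EH edge is rejected by the strict predicate but accepted by the refined one, while an abnormal-call edge is rejected only when the enclosing function has a non-local label.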
Re: [patch] add -Wdelete-non-virtual-dtor
On 2 June 2011 22:27, Jonathan Wakely wrote: -Wnon-virtual-dtor isn't always what you want, defining a polymorphic object without a virtual destructor is not necessarily a mistake. You may never delete such an object so instead of warning when the class is defined it's more useful to warn only when the class is deleted, as Clang does with -Wdelete-non-virtual-dtor This patch implements the same warning for G++. That patch was the wrong one, with a typo in the new test. The correct one, as tested, is attached, but only differs in the additional } character in the testcase. Index: c-family/c.opt === --- c-family/c.opt (revision 174539) +++ c-family/c.opt (working copy) @@ -331,6 +331,10 @@ Wdeclaration-after-statement C ObjC Var(warn_declaration_after_statement) Warning Warn when a declaration is found after a statement +Wdelete-non-virtual-dtor +C++ ObjC++ Var(warn_delnonvdtor) Warning +Warn about deleting polymorphic objects with non-virtual destructors + Wdeprecated C C++ ObjC ObjC++ Var(warn_deprecated) Init(1) Warning Warn if a deprecated compiler feature, class, method, or field is used Index: c-family/c-opts.c === --- c-family/c-opts.c (revision 174539) +++ c-family/c-opts.c (working copy) @@ -405,6 +405,7 @@ c_common_handle_option (size_t scode, co warn_sign_compare = value; warn_reorder = value; warn_cxx0x_compat = value; + warn_delnonvdtor = value; } cpp_opts-warn_trigraphs = value; Index: cp/init.c === --- cp/init.c (revision 174539) +++ cp/init.c (working copy) @@ -3421,6 +3421,31 @@ build_delete (tree type, tree addr, spec } complete_p = false; } + else if (warn_delnonvdtor MAYBE_CLASS_TYPE_P (type) +!CLASSTYPE_FINAL (type) TYPE_POLYMORPHIC_P (type)) + { + tree dtor; + dtor = CLASSTYPE_DESTRUCTORS (type); + if (!dtor || !DECL_VINDEX (dtor)) + { + tree x; + bool abstract = false; + for (x = TYPE_METHODS (type); x; x = DECL_CHAIN (x)) + if (DECL_PURE_VIRTUAL_P (x)) + { + abstract = true; + break; + } + if (abstract) + warning(OPT_Wdelete_non_virtual_dtor, 
deleting object of + abstract class type %qT which has non-virtual + destructor will cause undefined behaviour, type); + else + warning(OPT_Wdelete_non_virtual_dtor, deleting object of + polymorphic class type %qT which has non-virtual + destructor may cause undefined behaviour, type); + } + } } if (VOID_TYPE_P (type) || !complete_p || !MAYBE_CLASS_TYPE_P (type)) /* Call the builtin operator delete. */ Index: doc/invoke.texi === --- doc/invoke.texi (revision 174539) +++ doc/invoke.texi (working copy) @@ -2331,6 +2331,15 @@ Warn when a class seems unusable because destructors in that class are private, and it has neither friends nor public static member functions. +@item -Wdelete-non-virtual-dtor @r{(C++ and Objective-C++ only)} +@opindex Wdelete-non-virtual-dtor +@opindex Wno-delete-non-virtual-dtor +Warn when @samp{delete} is used to destroy an instance of a class which +has virtual functions and non-virtual destructor. It is unsafe to delete +an instance of a derived class through a pointer to a base class if the +base class does not have a virtual destructor. This warning is enabled +by @option{-Wall}. 
+ @item -Wnoexcept @r{(C++ and Objective-C++ only)} @opindex Wnoexcept @opindex Wno-noexcept Index: testsuite/g++.dg/warn/delete-non-virtual-dtor.C === --- testsuite/g++.dg/warn/delete-non-virtual-dtor.C (revision 0) +++ testsuite/g++.dg/warn/delete-non-virtual-dtor.C (revision 0) @@ -0,0 +1,44 @@ +// { dg-options -std=gnu++0x -Wdelete-non-virtual-dtor } +// { dg-do compile } + +struct polyBase { virtual void f(); }; + +void f(polyBase* p, polyBase* arr) +{ + delete p; // { dg-warning non-virtual destructor may } + delete [] arr; +} + +struct polyDerived : polyBase { }; + +void f(polyDerived* p, polyDerived* arr) +{ + delete p; // { dg-warning non-virtual destructor may } + delete [] arr; +} + +struct absDerived : polyBase { virtual void g() = 0; }; + +void f(absDerived* p, absDerived* arr) +{ + delete p; // { dg-warning non-virtual destructor will } + delete [] arr; +} + +struct finalDerived
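The condition the patch checks before warning can be approximated with standard type traits. This is a sketch only: the trait names `delete_would_warn` and `delete_warns_definite` are hypothetical, the class names merely echo the testcase, and the patch's extra exemption for classes marked final is not modeled here.

```cpp
#include <type_traits>

// Approximation of the patch's condition for warning on `delete p`:
// the static type is polymorphic but its destructor is not virtual.
template <typename T>
struct delete_would_warn
    : std::integral_constant<bool,
                             std::is_polymorphic<T>::value &&
                                 !std::has_virtual_destructor<T>::value> {};

// The patch words the warning more strongly ("will cause" rather than
// "may cause") when the class is also abstract.
template <typename T>
struct delete_warns_definite
    : std::integral_constant<bool,
                             delete_would_warn<T>::value &&
                                 std::is_abstract<T>::value> {};

// Virtual function, implicit non-virtual destructor.
struct PolyBase {
  virtual void f() {}
};

struct PolyDerived : PolyBase {};

// Adds a pure virtual function, so the class is abstract.
struct AbsDerived : PolyBase {
  virtual void g() = 0;
};

// One common fix: give the base class a virtual destructor.
struct SafeBase {
  virtual ~SafeBase() {}
  virtual void f() {}
};

struct SafeDerived : SafeBase {};

// Not polymorphic at all, so the warning does not apply.
struct Plain {
  int x;
};
```

The traits only describe the static type of the pointer being deleted, which matches the design of the warning: it fires at the `delete` expression rather than at the class definition.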
[patch committed] Fix PR target/49163
Hi, The attached patch is to fix PR target/49163. The problem occurs with the unrecognizable insn like (insn 66 141 110 4 (set (reg:SI 4 r4) (sign_extend:SI (subreg:QI (mem/s/v/u/c:DI (plus:SI (reg/f:SI 7 r7 [192]) (const_int 12 [0xc])) [3 s2array[1][0].f0+0 S8 A32]) 0))) iii.c:39 164 {*extendqisi2_compact} (expr_list:REG_DEAD (reg/f:SI 7 r7 [192]) (nil))) which is an intermediate insn in reload. SH makes the memory address like (plus (reg) (const_int)) invalid for HI/QImode because SH's mov.b instruction can take only R0 as the other operand for that memory and compiler can't handle such case well. The patch makes constraints for some move insns more rigid about invalid addresses of this type so to avoid generating a problematic move insn. The patch is tested on sh4-unknown-linux-gnu with no new failures. and the new test is tested also on i686-pc-linux-gnu. Applied on trunk. Regards, kaz -- 2011-06-02 Kaz Kojima kkoj...@gcc.gnu.org PR target/49163 * config/sh/predicates.md (general_movsrc_operand): Return 0 for memory and memory subreg of which address is an invalid indexed address for QI and HImode. (general_movdst_operand): Likewise. [testsuite] PR target/49163 * gcc.c-torture/compile/pr49163.c: New. diff -uprN ORIG/trunk/gcc/config/sh/predicates.md trunk/gcc/config/sh/predicates.md --- ORIG/trunk/gcc/config/sh/predicates.md 2010-04-12 09:52:36.0 +0900 +++ trunk/gcc/config/sh/predicates.md 2011-06-02 10:17:40.0 +0900 @@ -394,6 +394,18 @@ return 0; } + if ((mode == QImode || mode == HImode) + (MEM_P (op) + || (GET_CODE (op) == SUBREG MEM_P (SUBREG_REG (op) +{ + rtx x = XEXP ((MEM_P (op) ? op : SUBREG_REG (op)), 0); + + if (GET_CODE (x) == PLUS + REG_P (XEXP (x, 0)) + CONST_INT_P (XEXP (x, 1))) + return sh_legitimate_index_p (mode, XEXP (x, 1)); +} + if (TARGET_SHMEDIA (GET_CODE (op) == PARALLEL || GET_CODE (op) == CONST_VECTOR) sh_rep_vec (op, mode)) @@ -419,6 +431,18 @@ ! 
(high_life_started || reload_completed)) return 0; + if ((mode == QImode || mode == HImode) + (MEM_P (op) + || (GET_CODE (op) == SUBREG MEM_P (SUBREG_REG (op) +{ + rtx x = XEXP ((MEM_P (op) ? op : SUBREG_REG (op)), 0); + + if (GET_CODE (x) == PLUS + REG_P (XEXP (x, 0)) + CONST_INT_P (XEXP (x, 1))) + return sh_legitimate_index_p (mode, XEXP (x, 1)); +} + return general_operand (op, mode); }) diff -uprN ORIG/trunk/gcc/testsuite/gcc.c-torture/compile/pr49163.c trunk/gcc/testsuite/gcc.c-torture/compile/pr49163.c --- ORIG/trunk/gcc/testsuite/gcc.c-torture/compile/pr49163.c1970-01-01 09:00:00.0 +0900 +++ trunk/gcc/testsuite/gcc.c-torture/compile/pr49163.c 2011-06-02 20:58:31.0 +0900 @@ -0,0 +1,35 @@ +/* PR target/49163 */ +struct S1 +{ + unsigned f0:18; + int f1; +} __attribute__ ((packed)); + +struct S2 +{ + volatile long long f0; + int f1; +}; + +struct S1 s1; +struct S2 s2; +const struct S2 s2array[2][1] = { }; + +struct S2 **sptr; + +extern int bar (char a, long long b, int * c, long long d, long long e); +extern int baz (void); + +int i; +int *ptr; + +void +foo (int *arg) +{ + for (i = 0; i 1; i = baz()) +{ + *arg = *(int *)sptr; + *ptr = bar (*arg, s2.f1, ptr, + bar (s2array[1][0].f0, *arg, ptr, s1.f1, *ptr), *arg); +} +}
Re: Initialize INSN_COND (was: C6X port 5/11: Track predication conditions more accurately)
On Thu, 2011-06-02 at 15:29 +0400, Alexander Monakov wrote:

Bernd, The problem is INSN_COND should be reset when initializing a new deps structure, otherwise instructions may get stale conditions from other previously analyzed instructions. Presuming that sd_init_insn is the proper place for that, I'll test the following patch.

2011-06-02  Alexander Monakov  amona...@ispras.ru

	* sched-deps.c (sd_init_insn): Initialize INSN_COND.
	* sel-sched.c (move_op): Use correct type for 'res'. Verify that code_motion_path_driver returned 0 or 1.

I tested this patch on my IA64 HP-UX box and did not see any regressions. It fixed the problem I was having.

Steve Ellcey
s...@cup.hp.com
libobjc: remove deprecated API (patch 1)
This patch removes a number of deprecated libobjc functions and methods, which are part of the Traditional Objective-C API that was deprecated in GCC 4.6.x and are to be removed in GCC 4.7.0. It's the first of a long sequence of patches that does this removal one bit at a time. This one removes the deprecated objc_error(), objc_verror() and objc_set_error_handler() functions, and all the deprecated Object methods whose implementation used to use these functions. Unfortunately, all of our testcases use the Traditional Objective-C API when testing the GNU runtime and they will need to be updated to use the Modern Objective-C API because the Traditional Objective-C API is simply going away. I'll update the relevant testcases with each patch. This first patch requires only a tiny update of a single testcase. Committed to trunk. Thanks Index: libobjc/sendmsg.c === --- libobjc/sendmsg.c (revision 174585) +++ libobjc/sendmsg.c (working copy) @@ -977,16 +977,8 @@ __objc_forward (id object, SEL sel, arglist_t args : instance ), object-class_pointer-name, sel_getName (sel)); -/* TODO: support for error: is surely deprecated ? */ -err_sel = sel_get_any_uid (error:); -if (__objc_responds_to (object, err_sel)) - { - imp = get_implementation (object, object-class_pointer, err_sel); - return (*imp) (object, sel_get_any_uid (error:), msg); - } - -/* The object doesn't respond to doesNotRecognize: or error:; - Therefore, a default action is taken. */ +/* The object doesn't respond to doesNotRecognize:. Therefore, a + default action is taken. 
*/ _objc_abort (%s\n, msg); return 0; Index: libobjc/Makefile.in === --- libobjc/Makefile.in (revision 174585) +++ libobjc/Makefile.in (working copy) @@ -139,7 +139,6 @@ OBJC_DEPRECATED_H = \ STR.h \ hash.h \ objc-list.h \ - objc_error.h \ objc_get_uninstalled_dtable.h \ objc_malloc.h \ objc_msg_sendv.h \ Index: libobjc/libobjc.def === --- libobjc/libobjc.def (revision 174585) +++ libobjc/libobjc.def (working copy) @@ -25,7 +25,6 @@ search_for_method_in_list objc_get_uninstalled_dtable objc_hash_is_key_in_hash hash_is_key_in_hash -objc_verror _objc_load_callback objc_malloc objc_atomic_malloc @@ -53,7 +52,6 @@ objc_thread_remove __objc_class_name_Object __objc_class_name_Protocol __objc_class_name_NXConstantString -objc_error __objc_object_alloc __objc_object_copy __objc_object_dispose Index: libobjc/error.c === --- libobjc/error.c (revision 174585) +++ libobjc/error.c (working copy) @@ -45,53 +45,3 @@ _objc_abort (const char *fmt, ...) abort (); va_end (ap); } - -/* The rest of the file is deprecated. */ -#include objc/objc-api.h /* For objc_error_handler. */ - -/* -** Error handler function -** NULL so that default is to just print to stderr -*/ -static objc_error_handler _objc_error_handler = NULL; - -/* Trigger an objc error */ -void -objc_error (id object, int code, const char *fmt, ...) 
-{ - va_list ap; - - va_start (ap, fmt); - objc_verror (object, code, fmt, ap); - va_end (ap); -} - -/* Trigger an objc error */ -void -objc_verror (id object, int code, const char *fmt, va_list ap) -{ - BOOL result = NO; - - /* Call the error handler if its there - Otherwise print to stderr */ - if (_objc_error_handler) -result = (*_objc_error_handler) (object, code, fmt, ap); - else -vfprintf (stderr, fmt, ap); - - /* Continue if the error handler says its ok - Otherwise abort the program */ - if (result) -return; - else -abort (); -} - -/* Set the error handler */ -objc_error_handler -objc_set_error_handler (objc_error_handler func) -{ - objc_error_handler temp = _objc_error_handler; - _objc_error_handler = func; - return temp; -} Index: libobjc/ChangeLog === --- libobjc/ChangeLog (revision 174585) +++ libobjc/ChangeLog (working copy) @@ -1,3 +1,19 @@ +2011-06-02 Nicola Pero nicola.p...@meta-innovation.com + + * Makefile.in (OBJC_DEPRECATED_H): Removed objc_error.h. + * objc/deprecated/objc_error.h: Removed. + * objc/objc-api.h: Do not include deprecated/objc_error.h. + * libobjc.def (objc_error, objc_verror): Removed. + * error.c (_objc_error_handler, objc_error, objc_verror, + objc_set_error_handler): Removed. + * Object.m ([-error:], [-perform:], [-perform:with:], + [-perform:with:with], [-subclassResponsibility:], + [-notImplemented:], [-shouldNotImplement:], [-doesNotRecognize:]): + Removed. + * objc/deprecated/Object.h: Removed the same methods. + * sendmsg.c (__objc_forward): Do not try to
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: fix left-over debug insns in DCE
On Jun 2, 2011, Eric Botcazou ebotca...@adacore.com wrote:

Why can't the problem be addressed purely within DF?

Hmm... Maybe it could, I'm not sure. The problem is that DCE removes insns, and then DF associates remaining uses in debug insns to earlier DEFs. Adjusting debug insns in DCE is right per the VTA design motto: decide as if debug insns weren't there, adjust them as you would adjust non-debug insns. This code borrowed from DF into DCE is the “adjust” bit.

Starting to spill the DF logic to individual RTL passes doesn't look very appealing to me.

Propagation of uses isn't DF-specific material, it just so happened that it offered an adequate interface. Other passes already have their own propagation machinery, but it didn't look quite as suitable.

This is the patch I ended up with. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install?

OK for the usual debug insn bookkeeping, i.e.

Err... These depend on the interface changes of functions defined within DF to work. Should they perhaps be moved out of DF-specific files?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [PR debug/47590] rework md option overriding to delay var-tracking
On Jun 1, 2011, Bernd Schmidt ber...@codesourcery.com wrote:

On 06/02/2011 12:47 AM, Alexandre Oliva wrote:

On Jun 1, 2011, Bernd Schmidt ber...@codesourcery.com wrote:

Looks ok, except I think you need to update tm.texi.in and tm.texi?

Oh, I didn't realize tm.texi.in needed updating; AFAICT tm.texi is generated the same regardless.

I *think* what one is supposed to do is to just add the @hook lines in tm.texi.in if the definition in target.def includes documentation.

Right you are, though it looks like leaving the @hook lines out makes no difference. Anyhow, here's the patch I'm checking in.

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

        PR debug/47590
        * target.def (delay_sched2, delay_vartrack): New.
        * doc/tm.texi.in: Update.
        * doc/tm.texi: Rebuild.
        * sched-rgn.c (gate_handle_sched2): Fail if delay_sched2.
        * var-tracking.c (gate_handle_var_tracking): Likewise.
        * config/bfin/bfin.c (bfin_flag_schedule_insns2): Drop.
        (bfin_flag_var_tracking): Drop.
        (output_file_start): Don't save and override flag_var_tracking.
        (bfin_option_override): Ditto flag_schedule_insns_after_reload.
        (bfin_reorg): Test original variables.
        (TARGET_DELAY_SCHED2, TARGET_DELAY_VARTRACK): Define.
        * config/ia64/ia64.c (ia64_flag_schedule_insns2): Drop.
        (ia64_flag_var_tracking): Drop.
        (TARGET_DELAY_SCHED2, TARGET_DELAY_VARTRACK): Define.
        (ia64_file_start): Don't save and override flag_var_tracking.
        (ia64_override_options_after_change): Ditto
        flag_schedule_insns_after_reload.
        (ia64_reorg): Test original variables.
        * config/picochip/picochip.c (picochip_flag_schedule_insns2): Drop.
        (picochip_flag_var_tracking): Drop.
        (TARGET_DELAY_SCHED2, TARGET_DELAY_VARTRACK): Define.
        (picochip_option_override): Don't save and override
        flag_schedule_insns_after_reload.
        (picochip_asm_file_start): Ditto flag_var_tracking.
        (picochip_reorg): Test original variables.
        * config/spu/spu.c (spu_flag_var_tracking): Drop.
        (TARGET_DELAY_VARTRACK): Define.
        (spu_var_tracking): New.
        (spu_machine_dependent_reorg): Call it.
        (asm_file_start): Don't save and override flag_var_tracking.

Index: gcc/target.def
===================================================================
--- gcc/target.def.orig 2011-05-30 03:53:29.0 -0300
+++ gcc/target.def 2011-05-31 17:50:09.733284971 -0300
@@ -2717,6 +2717,16 @@ DEFHOOKPOD
 in particular GDB does not use them.",
 bool, false)
 
+DEFHOOKPOD
+(delay_sched2, "True if sched2 is not to be run at its normal place.  \
+This usually means it will be run as part of machine-specific reorg.",
+bool, false)
+
+DEFHOOKPOD
+(delay_vartrack, "True if vartrack is not to be run at its normal place.  \
+This usually means it will be run as part of machine-specific reorg.",
+bool, false)
+
 /* Leave the boolean fields at the end.  */
 
 /* Close the 'struct gcc_target' definition.  */
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig 2011-06-01 19:45:28.725386885 -0300
+++ gcc/doc/tm.texi.in 2011-06-01 19:53:47.394534907 -0300
@@ -9353,6 +9353,10 @@ tables, and hence is desirable if it wor
 
 @hook TARGET_WANT_DEBUG_PUB_SECTIONS
 
+@hook TARGET_DELAY_SCHED2
+
+@hook TARGET_DELAY_VARTRACK
+
 @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2})
 A C statement to issue assembly directives that create a difference
 @var{lab1} minus @var{lab2}, using an integer of the given @var{size}.
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig 2011-05-30 03:53:29.0 -0300
+++ gcc/doc/tm.texi 2011-06-01 19:54:09.126494927 -0300
@@ -9432,6 +9432,14 @@ tables, and hence is desirable if it wor
 True if the @code{.debug_pubtypes} and @code{.debug_pubnames} sections should be emitted.  These sections are not used on most platforms, and in particular GDB does not use them.
 @end deftypevr
 
+@deftypevr {Target Hook} bool TARGET_DELAY_SCHED2
+True if sched2 is not to be run at its normal place.  This usually means it will be run as part of machine-specific reorg.
+@end deftypevr
+
+@deftypevr {Target Hook} bool TARGET_DELAY_VARTRACK
+True if vartrack is not to be run at its normal place.  This usually means it will be run as part of machine-specific reorg.
+@end deftypevr
+
 @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2})
 A C statement to issue assembly directives that create a difference
 @var{lab1} minus @var{lab2}, using an integer of the given @var{size}.
Index: gcc/sched-rgn.c
===================================================================
--- gcc/sched-rgn.c.orig 2011-04-06 00:24:12.0 -0300
+++ gcc/sched-rgn.c 2011-05-31 17:43:02.584808465 -0300
@@ -3508,7 +3508,7 @@ gate_handle_sched2 (void)
 {
 #ifdef INSN_SCHEDULING
   return optimize > 0 && flag_schedule_insns_after_reload
-    && dbg_cnt (sched2_func);
+    && !targetm.delay_sched2 && dbg_cnt (sched2_func);
 #else
   return 0;
 #endif
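
The effect of the two new hooks on pass gating can be modeled outside GCC. The following is only a sketch under stated assumptions: `struct target_hooks`, `struct options`, `gate_sched2`, `gate_var_tracking`, and `reorg_runs_sched2` are hypothetical stand-ins, not GCC's actual data structures, and the reorg condition is a guess at what "Test original variables" amounts to. It illustrates that the user-visible flags stay untouched while the generic gate consults the target's delay flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the target vector.  */
struct target_hooks
{
  bool delay_sched2;   /* Target runs sched2 from its own reorg pass.  */
  bool delay_vartrack; /* Likewise for var-tracking.  */
};

/* Hypothetical stand-ins for the option variables, which the target
   no longer saves and overrides.  */
struct options
{
  int optimize;
  int flag_schedule_insns_after_reload;
  int flag_var_tracking;
};

/* Gate for the generic sched2 pass: fail if the target delays it.  */
bool
gate_sched2 (const struct options *opts, const struct target_hooks *targ)
{
  return opts->optimize > 0
         && opts->flag_schedule_insns_after_reload
         && !targ->delay_sched2;
}

/* Gate for the generic var-tracking pass, following the same idea.  */
bool
gate_var_tracking (const struct options *opts,
                   const struct target_hooks *targ)
{
  return opts->flag_var_tracking && !targ->delay_vartrack;
}

/* Assumed shape of the check inside a delaying target's reorg: it
   tests the original, unmodified option variables.  */
bool
reorg_runs_sched2 (const struct options *opts,
                   const struct target_hooks *targ)
{
  return targ->delay_sched2
         && opts->optimize > 0
         && opts->flag_schedule_insns_after_reload;
}
```

With the same options, a target that sets `delay_sched2` makes `gate_sched2` return false and `reorg_runs_sched2` return true, so the pass runs exactly once, at the place the target chose.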
Re: [PR48866] three alternative fixes
On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

1. emit debug temps for replaceable DEFs that end up being referenced in debug insns.

We already have some code to try to deal with this, but it emits the huge expressions we'd rather avoid, and it may create unnecessary duplication. This new approach emits a placeholder instead of skipping replaceable DEFs altogether, and then, if the DEF is referenced in a debug insn (perhaps during the late debug re-expansion of some other placeholder), it is expanded. Placeholders that end up not being referenced are then thrown away.

This is my favorite option, for it's safest: it doesn't change executable code at all (or should I say it *shouldn't* change it, for I haven't verified that it doesn't), retaining any register pressure benefits from TER.

This revised and retested version records expansions in an array indexed on SSA version rather than a pointer_map, as suggested by Matz.

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

        PR debug/48866
        * cfgexpand.c (def_expansions): New.
        (def_expansions_init): New.
        (def_expansions_remove_placeholder, def_expansions_fini): New.
        (def_get_expansion_ptr): New.
        (expand_debug_expr): Create debug temps as needed.
        (expand_debug_insn): New, split out of...
        (expand_debug_locations): ... this.
        (gen_emit_debug_insn): New, split out of...
        (expand_gimple_basic_block): ... this.  Simplify expansion of
        debug stmts.  Emit placeholders for replaceable DEFs, rather than
        debug temps at last non-debug uses.
        (gimple_expand_cfg): Initialize and finalize expansions cache.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c.orig 2011-06-01 19:45:02.520428653 -0300
+++ gcc/cfgexpand.c 2011-06-01 20:20:02.014975168 -0300
@@ -2337,6 +2337,70 @@ convert_debug_memory_address (enum machi
   return x;
 }
 
+/* Mark debug insns that are placeholders for replaceable SSA_NAMEs
+   that have not been expanded yet.  */
+#define DEBUG_INSN_TOEXPAND(RTX) \
+  (RTL_FLAG_CHECK1("DEBUG_INSN_TOEXPAND", (RTX), DEBUG_INSN)->used)
+
+/* Map replaceable SSA_NAMEs versions to their RTL expansions.  */
+static rtx *def_expansions;
+
+/* Initialize the def_expansions data structure.  This is to be called
+   before expansion of a function starts.  */
+
+static void
+def_expansions_init (void)
+{
+  gcc_checking_assert (!def_expansions);
+  def_expansions = XCNEWVEC (rtx, num_ssa_names);
+}
+
+/* Remove the DEBUG_INSN INSN if it still binds an SSA_NAME.  */
+
+static bool
+def_expansions_remove_placeholder (rtx insn)
+{
+  gcc_checking_assert (insn);
+
+  if (TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == SSA_NAME)
+    {
+      gcc_assert (!DEBUG_INSN_TOEXPAND (insn));
+      remove_insn (insn);
+    }
+
+  return true;
+}
+
+/* Finalize the def_expansions data structure.  This is to be called
+   at the end of the expansion of a function.  */
+
+static void
+def_expansions_fini (void)
+{
+  int i = num_ssa_names;
+
+  gcc_checking_assert (def_expansions);
+  while (i--)
+    if (def_expansions[i])
+      def_expansions_remove_placeholder (def_expansions[i]);
+  XDELETEVEC (def_expansions);
+  def_expansions = NULL;
+}
+
+/* Return a pointer to the rtx expanded from EXP.  EXP must be a
+   replaceable SSA_NAME.  */
+
+static rtx *
+def_get_expansion_ptr (tree exp)
+{
+  gcc_checking_assert (def_expansions);
+  gcc_checking_assert (TREE_CODE (exp) == SSA_NAME);
+  gcc_checking_assert (bitmap_bit_p (SA.values, SSA_NAME_VERSION (exp)));
+  return &def_expansions[SSA_NAME_VERSION (exp)];
+}
+
+static void expand_debug_insn (rtx insn);
+
 /* Return an RTX equivalent to the value of the tree expression
    EXP.  */
 
@@ -3131,7 +3195,30 @@ expand_debug_expr (tree exp)
         gimple g = get_gimple_for_ssa_name (exp);
         if (g)
           {
-            op0 = expand_debug_expr (gimple_assign_rhs_to_tree (g));
+            rtx insn = *def_get_expansion_ptr (exp);
+            tree vexpr;
+
+            /* If this still has the original SSA_NAME, emit a debug
+               temp and compute the RTX value.  */
+            if (TREE_CODE (INSN_VAR_LOCATION_DECL (insn)) == SSA_NAME)
+              {
+                tree var = SSA_NAME_VAR (INSN_VAR_LOCATION_DECL (insn));
+
+                vexpr = make_node (DEBUG_EXPR_DECL);
+                DECL_ARTIFICIAL (vexpr) = 1;
+                TREE_TYPE (vexpr) = TREE_TYPE (var);
+                DECL_MODE (vexpr) = DECL_MODE (var);
+                INSN_VAR_LOCATION_DECL (insn) = vexpr;
+
+                gcc_checking_assert (!DEBUG_INSN_TOEXPAND (insn));
+                DEBUG_INSN_TOEXPAND (insn) = 1;
+                expand_debug_insn (insn);
+              }
+            else
+              vexpr = INSN_VAR_LOCATION_DECL (insn);
+
+            op0 = expand_debug_expr (vexpr);
+
             if (!op0)
               return NULL;
           }
@@ -3293,6 +3380,45 @@ expand_debug_expr (tree exp)
     }
 }
 
+/* Expand the LOC value of the debug insn INSN.  */
+
+static void
+expand_debug_insn (rtx insn)
+{
+  tree value = (tree)INSN_VAR_LOCATION_LOC (insn);
+  rtx val;
+  enum machine_mode mode;
+
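
The message says expansions are now recorded in an array indexed on SSA version rather than in a pointer_map. As a rough, standalone illustration of that data structure (not the patch itself), the sketch below uses a hypothetical `expansion_t` in place of `rtx`, plain `calloc`/`free` in place of GCC's allocation macros, and an explicit length that the real code takes from `num_ssa_names`. The accessor hands back a pointer to the slot, so a caller can both read a recorded expansion and store a new one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for rtx; any pointer type would do.  */
typedef const char *expansion_t;

/* One slot per SSA version; NULL means "nothing recorded yet".  */
static expansion_t *def_expansions;
static unsigned def_expansions_len;

/* Allocate a zeroed table before expanding a function.  */
void
def_expansions_init (unsigned num_names)
{
  assert (!def_expansions);
  assert (num_names > 0);
  def_expansions = calloc (num_names, sizeof *def_expansions);
  assert (def_expansions);
  def_expansions_len = num_names;
}

/* Return a pointer to the slot for VERSION, so the caller can read
   the recorded expansion or record one.  */
expansion_t *
def_get_expansion_ptr (unsigned version)
{
  assert (def_expansions);
  assert (version < def_expansions_len);
  return &def_expansions[version];
}

/* Release the table once the function has been expanded.  */
void
def_expansions_fini (void)
{
  assert (def_expansions);
  free (def_expansions);
  def_expansions = NULL;
  def_expansions_len = 0;
}
```

Compared with a hash-based map, the array costs one pointer per SSA name up front, but a lookup is a single index operation and a freshly zeroed table needs no separate "not present" state.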
Re: [PR48866] three alternative fixes
On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

3. expand dominators before dominated blocks, so that DEFs of replaceable SSA names are expanded before their uses. Expand them when they're encountered, but not requiring a REG as a result. Save the RTL expression that results from the expansion for use in debug insns and at the non-debug use.

This patch addresses some of the problems in 2, avoiding expanding code out of order within a block, and (hopefully) ensuring that, expanding dominators before dominated blocks, DEFs are expanded before USEs. There is a theoretical possibility that a USE may be expanded before a DEF, depending on internal details of out-of-ssa, but should this ever happen, we'll get a failed assertion, and then disabling TER will work around the problem.

I also posted the wrong patch upthread for this variant. The one I posted didn't work at all, because it contained a last-minute optimization that changed the expansion of replaceable stmts from EXPAND_NORMAL to EXPAND_SUM. IIRC the former always yielded a pseudo, whereas the latter enabled replacements, but it also exposed the need for better handling of non-general_operands when the use expects one.

This revised and retested version also drops the reordering of the expansion of basic blocks, that Matz pointed out was unnecessary, and switches to an array rather than a pointer_map to record the expansions.

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

        PR debug/48866
        * cfgexpand.c (def_expansions): New.
        (def_expansion_recent_tree, def_expansion_recent_rtx): New.
        (def_expansions_init, def_expansions_fini): New.
        (def_has_expansion_ptr, def_get_expansion_ptr): New.
        (expand_debug_expr): Use recorded expansion if available.
        (expand_gimple_basic_block): Prepare to record expansion of
        replaceable defs.  Change return type to void.
        (gimple_expand_cfg): Initialize and finalize expansions cache.
        Expand dominator blocks before dominated.
        * expr.c (expand_expr_real_1): Use recorded expansion of
        replaceable defs.
        * expr.h (def_has_expansion_ptr): Declare.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c.orig 2011-06-01 20:39:58.244953408 -0300
+++ gcc/cfgexpand.c 2011-06-01 21:44:38.005879125 -0300
@@ -2337,6 +2337,42 @@ convert_debug_memory_address (enum machi
   return x;
 }
 
+/* Map replaceable SSA_NAMEs to their RTL expansions.  */
+static rtx *def_expansions;
+
+/* Initialize the def_expansions data structure.  This is to be called
+   before expansion of a function starts.  */
+
+static void
+def_expansions_init (void)
+{
+  gcc_checking_assert (!def_expansions);
+  def_expansions = XCNEWVEC (rtx, num_ssa_names);
+}
+
+/* Finalize the def_expansions data structure.  This is to be called
+   at the end of the expansion of a function.  */
+
+static void
+def_expansions_fini (void)
+{
+  gcc_checking_assert (def_expansions);
+  XDELETEVEC (def_expansions);
+  def_expansions = NULL;
+}
+
+/* Return a pointer to the rtx expanded from EXP.  EXP must be a
+   replaceable SSA_NAME.  */
+
+rtx *
+def_get_expansion_ptr (tree exp)
+{
+  gcc_checking_assert (def_expansions);
+  gcc_checking_assert (TREE_CODE (exp) == SSA_NAME);
+  gcc_checking_assert (bitmap_bit_p (SA.values, SSA_NAME_VERSION (exp)));
+  return &def_expansions[SSA_NAME_VERSION (exp)];
+}
+
 /* Return an RTX equivalent to the value of the tree expression
    EXP.  */
 
@@ -3131,7 +3167,16 @@ expand_debug_expr (tree exp)
         gimple g = get_gimple_for_ssa_name (exp);
         if (g)
           {
-            op0 = expand_debug_expr (gimple_assign_rhs_to_tree (g));
+            rtx *xp = def_get_expansion_ptr (exp);
+
+            if (xp)
+              op0 = copy_rtx (*xp);
+            else
+              op0 = NULL;
+
+            if (!op0)
+              op0 = expand_debug_expr (gimple_assign_rhs_to_tree (g));
+
             if (!op0)
               return NULL;
           }
@@ -3618,20 +3663,38 @@ expand_gimple_basic_block (basic_block b
             }
           else
             {
+              rtx *xp = NULL;
               def_operand_p def_p;
               def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
 
-              if (def_p != NULL)
+              /* Ignore this stmt if it is in the list of
+                 replaceable expressions.  */
+              if (def_p != NULL
+                  && SA.values
+                  && bitmap_bit_p (SA.values,
+                                   SSA_NAME_VERSION (DEF_FROM_PTR (def_p))))
                 {
-                  /* Ignore this stmt if it is in the list of
-                     replaceable expressions.  */
-                  if (SA.values
-                      && bitmap_bit_p (SA.values,
-                                       SSA_NAME_VERSION (DEF_FROM_PTR (def_p))))
-                    continue;
+                  tree def = DEF_FROM_PTR (def_p);
+                  gimple g = get_gimple_for_ssa_name (def);
+                  rtx retval;
+
+                  last = get_last_insn ();
+
+                  retval = expand_expr (gimple_assign_rhs_to_tree (g),
+                                        NULL_RTX, VOIDmode, EXPAND_SUM);
+
+                  xp = def_get_expansion_ptr (def);
+                  gcc_checking_assert (!*xp);
+                  *xp = retval;
                 }
-              last = expand_gimple_stmt (stmt);
+              else
Re: [PR48866] three alternative fixes
Ugh, failed to refresh the patch file, resending with the correct one.

On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

On May 30, 2011, Alexandre Oliva aol...@redhat.com wrote:

2. emit placeholders for replaceable DEFs and, when the DEFs are expanded at their point of use, emit the expansion next to the placeholder, rather than at the current stream. The result of the expansion is saved and used in debug insns that reference the replaceable DEF. If the result is forced into a REG shortly thereafter, the code resulting from this is also emitted next to the placeholder, and the saved expansion is updated. If the USE is expanded before the DEF, the insn stream resulting from the expansion is saved and emitted at the point of the DEF.

IMHO this is the riskiest of the 3 patches, for shuffling expansions around isn't exactly something I'm comfortable with. There's a very real risk that moving the expansion of sub-expressions to their definition points may end up moving uses before definitions.

Upthread, I posted the wrong patch: instead of the one that tolerated expanding DEFs before or after USEs, I posted a simplifying experiment that seemed to fail, but it looks like I misinterpreted the results.

This revised and retested patch also records expansions in an array rather than a pointer_map, and it avoids re-expanding DEFs when a USE is expanded for the second time. Although replaceable DEFs can only have one USE, when the single USE appears in a call stmt, it can be expanded twice. I'm not sure whether it would be better to expand it twice and let RTL optimizations drop any redundancies, or reuse the result of the first expansion, like this patch does.

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

        PR debug/48866
        * cfgexpand.c (def_expansions): New.
        (def_expansion_recent_tree, def_expansion_recent_rtx): New.
        (def_expansions_init): New.
        (def_expansions_remove_placeholder, def_expansions_fini): New.
        (def_get_expansion_ptr): New.
        (def_expansion_recent, def_expansion_record_recent): New.
        (def_expansion_add_insns): New.
        (expand_debug_expr): Use recorded expansion if available.
        (expand_gimple_basic_block): Prepare to record expansion of
        replaceable defs.  Reset recent expansions at the end of the
        block.
        (gimple_expand_cfg): Initialize and finalize expansions cache.
        * expr.c: Include gimple-pretty-print.h.
        (store_expr): Forget recent expansions upon nontemporal moves.
        (expand_expr_real_1): Reuse or record expansion of replaceable
        defs.
        * expr.h (def_get_expansion_ptr, def_expansion_recent): Declare.
        (def_expansion_record_recent, def_expansion_add_insns): Declare.
        * explow.c (force_recent): New.
        (force_reg): Use it.  Split into...
        (force_reg_1): ... this.
        * Makefile.in (expr.o): Depend on gimple-pretty-print.h.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c.orig 2011-06-02 16:43:03.596818720 -0300
+++ gcc/cfgexpand.c 2011-06-02 17:18:10.217974612 -0300
@@ -2337,6 +2337,144 @@ convert_debug_memory_address (enum machi
   return x;
 }
 
+/* Map replaceable SSA_NAMEs to NOTE_INSN_VAR_LOCATIONs that hold
+   their RTL expansions (once available) in their NOTE_VAR_LOCATIONs
+   (without a VAR_LOCATION rtx).  The SSA_NAME DEF is expanded before
+   its single USE, so the NOTE is inserted in the insn stream, marking
+   the location where the non-replaceable portion of the expansion is
+   to be inserted.  When the single USE is expanded, it will be
+   emitted before the NOTE.  */
+static rtx *def_expansions;
+
+/* The latest expanded SSA name, and its corresponding RTL expansion.
+   These are used to enable the insertion of the insn that stores the
+   expansion in a register at the end of the sequence expanded for the
+   SSA DEF.  */
+static tree def_expansion_recent_tree;
+static rtx def_expansion_recent_rtx;
+
+/* Initialize the def_expansions data structure.  This is to be called
+   before expansion of a function starts.  */
+
+static void
+def_expansions_init (void)
+{
+  gcc_checking_assert (!def_expansions);
+  def_expansions = XCNEWVEC (rtx, num_ssa_names);
+
+  gcc_checking_assert (!def_expansion_recent_tree);
+  gcc_checking_assert (!def_expansion_recent_rtx);
+}
+
+/* Remove the NOTE that marks the insertion location of the expansion
+   of a replaceable SSA note.  */
+
+static bool
+def_expansions_remove_placeholder (rtx note)
+{
+  if (!note)
+    return true;
+
+  gcc_checking_assert (NOTE_P (note));
+  remove_insn (note);
+
+  return true;
+}
+
+/* Finalize the def_expansions data structure.  This is to be called
+   at the end of the expansion of a function.  */
+
+static void
+def_expansions_fini (void)
+{
+  int i = num_ssa_names;
+
+  gcc_checking_assert (def_expansions);
+
+  while (i--)
+    if (def_expansions[i])
+      def_expansions_remove_placeholder (def_expansions[i]);
+  XDELETEVEC (def_expansions);
+  def_expansions = NULL;
+  def_expansion_recent_tree =
Re: introduce --param max-vartrack-expr-depth
On Jun 2, 2011, Bernd Schmidt ber...@codesourcery.com wrote:

On 06/02/2011 10:46 AM, Jakub Jelinek wrote:

On Wed, Jun 01, 2011 at 07:25:39PM -0300, Alexandre Oliva wrote:

Such as this one...

I'd appreciate if this could go in...

Go on then.

Ok, here's what I've just installed.

for gcc/ChangeLog
from Alexandre Oliva aol...@redhat.com

        * params.def (PARAM_MAX_VARTRACK_EXPR_DEPTH): Bump default to 10.
        * var-tracking.c (reverse_op): Limit recursion depth to 5.

Index: gcc/params.def
===================================================================
--- gcc/params.def.orig 2011-05-31 18:28:05.348070586 -0300
+++ gcc/params.def 2011-06-01 17:09:41.117140944 -0300
@@ -845,7 +845,7 @@ DEFPARAM (PARAM_MAX_VARTRACK_SIZE,
 DEFPARAM (PARAM_MAX_VARTRACK_EXPR_DEPTH,
           "max-vartrack-expr-depth",
           "Max. recursion depth for expanding var tracking expressions",
-          10, 0, 0)
+          20, 0, 0)
 
 /* Set minimum insn uid for non-debug insns.  */
 
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c.orig 2011-05-31 20:06:25.604477956 -0300
+++ gcc/var-tracking.c 2011-05-31 23:56:06.578450957 -0300
@@ -5288,7 +5288,7 @@ reverse_op (rtx val, const_rtx expr)
       arg = XEXP (src, 1);
       if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)
         {
-          arg = cselib_expand_value_rtx (arg, scratch_regs, EXPR_DEPTH);
+          arg = cselib_expand_value_rtx (arg, scratch_regs, 5);
           if (arg == NULL_RTX)
             return NULL_RTX;
           if (!CONST_INT_P (arg) && GET_CODE (arg) != SYMBOL_REF)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
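
To make the idea of a recursion-depth limit concrete, here is a small standalone sketch. It is not how `cselib_expand_value_rtx` works internally; `struct expr` and `expand_value` are hypothetical names invented for illustration. A value either is a leaf or refers to another value that must be expanded first, and the caller passes the maximum number of references it is willing to follow, much as `reverse_op` passes a fixed 5 while other callers use the parameter's value.

```c
#include <stddef.h>

/* Hypothetical expression node: either a leaf constant or a reference
   to another expression that has to be expanded first.  */
struct expr
{
  int is_leaf;
  int value;              /* Meaningful when is_leaf is nonzero.  */
  const struct expr *ref; /* Meaningful when is_leaf is zero.  */
};

/* Expand E, following at most MAX_DEPTH references.  Return 1 and
   store the leaf value in *OUT on success; return 0 when the depth
   budget runs out before a leaf is reached.  */
int
expand_value (const struct expr *e, int max_depth, int *out)
{
  if (e->is_leaf)
    {
      *out = e->value;
      return 1;
    }
  if (max_depth <= 0)
    return 0;
  return expand_value (e->ref, max_depth - 1, out);
}
```

A larger limit lets longer chains be resolved at the cost of more work per lookup, which is the trade-off the parameter exposes; a caller that is invoked very often can pass a smaller fixed budget.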
Re: [patch] add -Wdelete-non-virtual-dtor
On 06/02/2011 06:02 PM, Jonathan Wakely wrote:

+  if (!dtor || !DECL_VINDEX (dtor))

Do we really want to warn about the case where the class has no/trivial destructor?

+      bool abstract = false;
+      for (x = TYPE_METHODS (type); x; x = DECL_CHAIN (x))
+        if (DECL_PURE_VIRTUAL_P (x))
+          {
+            abstract = true;
+            break;
+          }
+      if (abstract)

Just check CLASSTYPE_PURE_VIRTUALS.

Jason
Re: [PR 48333] avoid -fcompare-debug errors from builtins in MEM attrs
Ping? This fixes a case in which -g might change the executable code, exposed with bootstrap-debug-lean.

On Apr 2, 2011, Alexandre Oliva aol...@redhat.com wrote:

        PR debug/48333
        * calls.c (emit_call_1): Prefer the __builtin declaration of
        builtin functions.

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00114.html

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Wed, May 18, 2011 at 8:57 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Tue, Apr 26, 2011 at 6:05 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Thu, Mar 31, 2011 at 7:57 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Mar 21, 2011 at 11:40 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Mar 14, 2011 at 12:28 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Thu, Jan 27, 2011 at 2:40 AM, Richard Guenther richard.guent...@gmail.com wrote:
On Thu, Jan 27, 2011 at 12:12 AM, H.J. Lu hongjiu...@intel.com wrote:
On Tue, Dec 14, 2010 at 05:20:48PM -0800, H.J. Lu wrote:

This patch uses .init_array/.fini_array sections instead of .ctors/.dtors sections if mixing .init_array/.fini_array and .ctors/.dtors sections with init_priority works.

It removes .ctors/.dtors sections from executables and DSOs, which will remove one function call at startup time from each executable and DSO. It should reduce image size and improve system startup time.

If a platform with a working .init_array/.fini_array support needs a different .init_array/.fini_array implementation, it can set use_initfini_array to no.

Since .init_array/.fini_array is a target feature, --enable-initfini-array defaults to no unless the native run-time test is passed.

To pass the native run-time test, a linker with SORT_BY_INIT_PRIORITY support is required. The binutils patch is available at

http://sourceware.org/ml/binutils/2010-12/msg00466.html

Linker patch has been checked in.

This patch passed 32bit/64bit regression test on Linux/x86-64. Any comments?

This updated patch fixes build on Linux/ia64 and should work on others. Any comments?

Yes. This is stage1 material.

Here is the updated patch. OK for trunk?

Thanks.

-- 
H.J.

2011-03-14  H.J. Lu  hongjiu...@intel.com

        PR target/46770
        * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Removed.
        * config.gcc (use_initfini_array): New variable.
        Use initfini-array.o if supported.
        * crtstuff.c: Don't generate .ctors nor .dtors sections if
        NO_CTORS_DTORS_SECTIONS is defined.
        * configure.ac: Remove gcc_AC_INITFINI_ARRAY.  Add
        --enable-initfini-array and check if .init_array can be used with
        .ctors.
        * configure: Regenerated.
        * config/initfini-array.c: New.
        * config/initfini-array.h: Likewise.
        * config/t-initfini-array: Likewise.
        * config/arm/arm.c (arm_asm_init_sections): Call
        elf_initfini_array_init_sections if NO_CTORS_DTORS_SECTIONS
        is defined.
        * config/avr/avr.c (avr_asm_init_sections): Likewise.
        * config/ia64/ia64.c (ia64_asm_init_sections): Likewise.
        * config/mep/mep.c (mep_asm_init_sections): Likewise.
        * config/microblaze/microblaze.c (microblaze_elf_asm_init_sections):
        Likewise.
        * config/rs6000/rs6000.c (rs6000_elf_asm_init_sections): Likewise.
        * config/stormy16/stormy16.c (xstormy16_asm_init_sections):
        Likewise.
        * config/v850/v850.c (v850_asm_init_sections): Likewise.

PING: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00760.html

Any comments?

Any objections?

Here is the patch updated for the current trunk. OK for trunk?

PING.

Hi Richard,

You commented that my patch was stage 1 material:

http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01989.html

Is my patch:

http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00760.html

OK for trunk?

Thanks.

-- 
H.J.
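
For readers unfamiliar with what ends up in these sections: from user code, the usual way to get an entry is GCC's `constructor` function attribute, optionally with a priority. The sketch below is only an illustration and is not part of the patch; the names `init_order`, `init_count`, `init_early`, and `init_late` are made up. It records the order in which two prioritized initializers run, which should be the same whether the toolchain places them in .ctors or in .init_array.

```c
#include <stddef.h>

/* Records the order in which the initializers below ran.  */
int init_order[4];
size_t init_count;

/* GCC extension: constructors with lower priority numbers run
   earlier; priorities up to 100 are reserved for the implementation.
   This one is defined first but should run second.  */
__attribute__ ((constructor (102)))
void
init_late (void)
{
  init_order[init_count++] = 102;
}

/* Defined second, but its lower priority number makes it run first.  */
__attribute__ ((constructor (101)))
void
init_early (void)
{
  init_order[init_count++] = 101;
}
```

By the time `main` starts, both functions have run. On an ELF system, `readelf -S` on the resulting object or executable shows which of the two section families the toolchain actually used; which one appears depends on how the compiler and linker were configured.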