Re: RFA: patch to fix PR55116
On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov vmaka...@redhat.com wrote:
On 12-10-29 12:21 PM, Richard Sandiford wrote:
Vladimir Makarov vmaka...@redhat.com writes:

H.J. in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116 reported an interesting address:

(and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
                                       (const_int 2 [0x2]))
                            (symbol_ref:SI ("glob_vol_int_arr") var_decl 0x703c2720 glob_vol_int_arr)) 0)
        (const_int 4294967295 [0xffffffff]))

which cannot be correctly extracted. Here `and' with `subreg' behaves as an address mutation. The following patch fixes the problem. Ok to commit, Richard?

Heh, I wondered if subregs might still be used like that, and was almost tempted to add them just in case. I think this particular case is really a failed canonicalisation and that:

  (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

ought to be:

  (zero_extend:DI (foo:SI ...))

Yes, that was my thought too. But I know I've approved MIPS patches to accept (and:DI ... (const_int 0xffffffff)) as an alternative.

Index: rtlanal.c
===
--- rtlanal.c (revision 192942)
+++ rtlanal.c (working copy)
@@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
       else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
 	/* (and ... (const_int -X)) is used to align to X bytes.  */
 	loc = &XEXP (*loc, 0);
+      else if (code == SUBREG
+	       && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
+	/* (subreg (operator ...) ...) usually inside and is used for
+	   mode conversion too.  */
+	loc = &XEXP (*loc, 0);

I think the condition should be:

      else if (code == SUBREG
	       && !OBJECT_P (SUBREG_REG (*loc))
	       && subreg_lowpart_p (*loc))

OK with that change, if it works.

Yes, it works. I've submitted the following patch.

It doesn't work right. I will create a new testcase.

This patch limits SUBREG to Pmode. Tested on Linux/x86-64. OK to install?

Thanks.

-- H.J.

gcc/

2012-10-29  H.J. Lu  hongjiu...@intel.com

	PR middle-end/55116
	* rtlanal.c (strip_address_mutations): Handle SUBREG only
	for Pmode.

gcc/testsuite/

2012-10-29  H.J. Lu  hongjiu...@intel.com

	PR middle-end/55116
	* gcc.target/i386/pr55116-2.c: New file.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 43d4cb8..d076ad6 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5460,6 +5460,7 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
 	/* (and ... (const_int -X)) is used to align to X bytes.  */
 	loc = &XEXP (*loc, 0);
       else if (code == SUBREG
+	       && GET_MODE (*loc) == Pmode
 	       && !OBJECT_P (SUBREG_REG (*loc))
 	       && subreg_lowpart_p (*loc))
 	/* (subreg (operator ...) ...) inside and is used for mode

diff --git a/gcc/testsuite/gcc.target/i386/pr55116-2.c b/gcc/testsuite/gcc.target/i386/pr55116-2.c
new file mode 100644
index 000..7ef8ead
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr55116-2.c
@@ -0,0 +1,86 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=long" } */
+
+typedef struct rtx_def *rtx;
+enum rtx_code { MINUS };
+union rtunion_def {
+  rtx rt_rtx;
+};
+typedef union rtunion_def rtunion;
+struct rtx_def {
+  enum rtx_code code: 16;
+  union u {
+    rtunion fld[1];
+  }
+  u;
+};
+rtx simplify_binary_operation (enum rtx_code code, int mode,
+			       rtx op0, rtx op1);
+struct simplify_plus_minus_op_data {
+  rtx op;
+  short neg;
+};
+void simplify_plus_minus (enum rtx_code code, int mode, rtx op0, rtx op1)
+{
+  struct simplify_plus_minus_op_data ops[8];
+  rtx tem = (rtx) 0;
+  int n_ops = 2, input_ops = 2;
+  int changed, canonicalized = 0;
+  int i, j;
+  __builtin_memset (ops, 0, sizeof (ops));
+  do
+    {
+      changed = 0;
+      for (i = 0; i < n_ops; i++)
+	{
+	  rtx this_op = ops[i].op;
+	  int this_neg = ops[i].neg;
+	  enum rtx_code this_code = ((enum rtx_code) (this_op)->code);
+	  switch (this_code)
+	    {
+	    case MINUS:
+	      if (n_ops == 7)
+		return;
+	      n_ops++;
+	      input_ops++;
+	      changed = 1;
+	      canonicalized |= this_neg;
+	      break;
+	    }
+	}
+    }
+  while (changed);
+  do
+    {
+      j = n_ops - 1;
+      for (i = n_ops - 1; j >= 0; j--)
+	{
+	  rtx lhs = ops[j].op, rhs = ops[i].op;
+	  int lneg = ops[j].neg, rneg = ops[i].neg;
+	  if (lhs != 0 && rhs != 0)
+	    {
+	      enum rtx_code ncode = MINUS;
+	      if (((enum rtx_code) (lhs)->code) ==
Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed
On Mon, Oct 29, 2012 at 3:49 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 3:44 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 8:15 AM, Richard Sandiford rdsandif...@googlemail.com wrote:
H.J. Lu hjl.to...@gmail.com writes:

Hi, This patch changes get_elimination to check register number instead of RTX. Tested on Linux/x32 with -maddress-mode=long. OK to install?

FWIW, this doesn't sound right to me, at least not without more justification. The idea is that things like frame_pointer_rtx are supposed to be unique, so the original code:

   if ((ep = elimination_map[hard_regno]) != NULL)
-    return ep->from_rtx != reg ? NULL : ep;
+    return ep->from != hard_regno ? NULL : ep;

ought to be correct in itself. reload did the same thing:

  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
    if (ep->from_rtx == x && ep->can_eliminate)
      return plus_constant (Pmode, ep->to_rtx, ep->previous_offset);

It sounds on the face of it like the bug is elsewhere.

LRA has

  if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
    {
      rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

      if (! replace_p)
	{
	  offset += (ep->offset - ep->previous_offset);
	  offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
	}

      if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
	to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

Reload has

  rtx to_rtx = ep->to_rtx;
  offset += ep->offset;
  offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

  if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
    to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

(gdb) call debug_rtx (ep->to_rtx)
(reg/f:DI 7 sp)
(gdb) call debug_rtx (ep->from_rtx)
(reg/f:DI 16 argp)
(gdb)

gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp) for LRA. They are caused by

  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
      /* We should convert arg register in LRA after the elimination
	 if it is possible.  */
      && xregno == ARG_POINTER_REGNUM
      && ! lra_in_progress)
    return -1;

It doesn't work in this case.

This testcase shows that LRA can't convert the arg register after the elimination. Here is a patch to remove the lra_in_progress check for ARG_POINTER_REGNUM. Tested on Linux/x86-64. OK to install?

Thanks.

-- H.J.

gcc/

2012-10-29  H.J. Lu  hongjiu...@intel.com

	PR rtl-optimization/55093
	* rtlanal.c (simplify_subreg_regno): Remove lra_in_progress
	check for ARG_POINTER_REGNUM.

gcc/testsuite/

2012-10-29  H.J. Lu  hongjiu...@intel.com

	PR rtl-optimization/55093
	* gcc.target/i386/pr55093.c: New file.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 43d4cb8..c1a7580 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3494,10 +3494,7 @@ simplify_subreg_regno (unsigned int xregno, enum machine_mode xmode,
     return -1;

   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
-      /* We should convert arg register in LRA after the elimination
-	 if it is possible.  */
-      && xregno == ARG_POINTER_REGNUM
-      && ! lra_in_progress)
+      && xregno == ARG_POINTER_REGNUM)
     return -1;

   if (xregno == STACK_POINTER_REGNUM

diff --git a/gcc/testsuite/gcc.target/i386/pr55093.c b/gcc/testsuite/gcc.target/i386/pr55093.c
new file mode 100644
index 000..76b4042
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr55093.c
@@ -0,0 +1,80 @@
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=long" } */
+
+typedef union tree_node *tree;
+typedef const union tree_node *const_tree;
+typedef struct {
+  unsigned long long low;
+  long long high;
+} double_int;
+struct real_value {
+};
+struct real_format {
+  int has_signed_zero;
+};
+extern const struct real_format *real_format_for_mode[];
+extern int real_isnegzero (const struct real_value *);
+enum tree_code { REAL_CST, SSA_NAME };
+struct tree_base {
+  enum tree_code code : 16;
+  union {
+    unsigned int version;
+  }
+  u;
+};
+extern void tree_check_failed (const_tree, const char *, int, const char *, ...)
+  __attribute__ ((__noreturn__));
+union tree_node {
+  struct tree_base base;
+};
+inline tree tree_check (tree __t, const char *__f, int __l, const char *__g, enum tree_code __c) {
+  if (((enum tree_code) (__t)->base.code) != __c)
+    tree_check_failed (__t, __f, __l, __g, __c, 0);
+  return __t;
+}
+struct prop_value_d {
+  int lattice_val;
+  tree value;
+  double_int mask;
+};
+typedef struct prop_value_d prop_value_t;
+static prop_value_t *const_val;
+static void canonicalize_float_value (prop_value_t *);
+typedef void (*ssa_prop_visit_stmt_fn) (prop_value_t);
+typedef void (*ssa_prop_visit_phi_fn) (void);
+typedef void
RE: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Hi Jakub,

We are working on the following.

1. bdver3 enablement. Review completed. Changes to be incorporated and checked in. http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01131.html

2. btver2 basic enablement is done (http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01018.html). Scheduler descriptions are being updated. This is architecture specific and we consider it not to be stage-1 material.

Regards
Ganesh

-----Original Message-----
From: Jakub Jelinek [mailto:ja...@redhat.com]
Sent: Monday, October 29, 2012 11:27 PM
To: g...@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org
Subject: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

Status
======

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you still have patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in; further changes should be just bugfixes and documentation fixes.

Quality Data
============

Priority  #    Change from Last Report
--------  ---  -----------------------
P1        23   + 23
P2        77   +  8
P3        85   + 84
--------  ---  -----------------------
Total     185  +115

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.
Re: [patch] Unify bitmap interface.
On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl cr...@googlers.com wrote:
On 10/29/12, Diego Novillo dnovi...@google.com wrote:
On Oct 29, 2012 Diego Novillo dnovi...@google.com wrote:

Just to make sure. Testing on ppc should be fast, for example.

And useless. Your patch does not touch ppc.

I've fixed the #if 0 and the remaining suggestions will happen in another patch. I've committed this one.

===

This patch implements the unification of the *bitmap interfaces as discussed. Essentially, we rename ebitmap and sbitmap functions to use the same names as the bitmap functions. This rename works because we can now overload on the bitmap type. Some macros now become inline functions to enable that overloading.

The sbitmap non-bool returning bitwise operations have been merged with the bool versions. Sometimes this merge involved modifying the non-bool version to compute the bool value, and sometimes modifying the bool version to add additional work from the non-bool version. The redundant routines have been removed.

The allocation functions have not been renamed, because we often do not have an argument on which to overload. The cardinality functions have not been renamed, because they have different parameters, and are thus not interchangeable. The iteration functions have not been renamed, because they are functionally different.

Tested on x86_64, contrib/config-list.mk testing passed.

Index: gcc/ChangeLog

Just one question: Should we change the name of functions sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds too? It might be a little confusing that sbitmap_* is used among bitmap_*.

-- Best Regards.
Adapt one fold-const optimization for vectors
Hello, one more optimization that needed help for vectors, it crashed on (xy)0. Because of PR 55001, testcases are awkward to add (I could do a x86-only one if needed). bootstrap+testsuite. 2012-10-30 Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors. (fold_binary_loc): call it for VEC_COND_EXPR. -- Marc Glisse
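The transform `fold_binary_op_with_conditional_arg` performs can be modeled abstractly: an operation whose operand is a conditional gets distributed into both arms, `(cond ? b : c) OP d` becoming `cond ? (b OP d) : (c OP d)`; for vectors the condition is a `VEC_COND_EXPR` and the rewrite applies lanewise. A small Python model of both cases (illustration only, not the fold-const implementation):

```python
def fold_cond_binop(cond, b, c, d, op):
    """Scalar model: (cond ? b : c) OP d  ->  cond ? (b OP d) : (c OP d)."""
    return op(b, d) if cond else op(c, d)

def fold_vec_cond_binop(mask, b, c, d, op):
    """Vector model: the same rewrite applied independently per lane,
    as when the condition is a VEC_COND_EXPR."""
    return [op(bi, di) if m else op(ci, di)
            for m, bi, ci, di in zip(mask, b, c, d)]
```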
[patch,libgcc] m32r*rtems* add crtinit.o and crtfinit.o
Hi,

I would like to apply the patch below to trunk and the gcc-4.7 branch. This patch was originally submitted by Joel Sherrill back in May (http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html), but had never received any feedback. It has been part of the rtems-gcc patches since then.

Ralf

2012-05-16  Joel Sherrill  joel.sherr...@oarcorp.com

	* config.host (m32r-*-rtems*): Include crtinit.o and crtfinit.o
	as extra_parts.

diff --git a/libgcc/config.host b/libgcc/config.host
index 051d6b0..bbf21a9 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -693,6 +693,7 @@ m32r-*-elf*)
 	;;
 m32r-*-rtems*)
 	tmake_file="$tmake_file m32r/t-m32r t-fdpbit"
+	extra_parts="$extra_parts crtinit.o crtfini.o"
 	;;
 m32rle-*-elf*)
 	tmake_file=t-fdpbit
Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
On 30 October 2012 05:20, Teresa Johnson tejohn...@google.com wrote: Index: cfgrtl.c === --- cfgrtl.c(revision 192692) +++ cfgrtl.c(working copy) @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (BB_PARTITION (a) != BB_PARTITION (b)) + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) + || BB_PARTITION (a) != BB_PARTITION (b)) return false; /* Protect the loop latches. */ @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (BB_PARTITION (a) != BB_PARTITION (b)) + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) + || BB_PARTITION (a) != BB_PARTITION (b)) return false; /* Protect the loop latches. */ As this if() condition seems to be the canonical way to detect being in a different partition should it be moved out into a query function, and all of cfgrtl.c updated to use it? [Note I am not a maintainer and so can't approve/reject your patch]. Thanks, Matt -- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-d...@linaro.org
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Oct 29, 2012 at 02:07:55PM -0400, David Miller wrote:

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes.

I'd like to get the Sparc cbcond stuff in (3 revisions posted) which is waiting for Eric B. to do some Solaris specific work.

That has been posted in stage 1, so it is certainly ok to commit it even during early stage 3. And, on a case by case basis exceptions are always possible. This hasn't changed in the last few years. By the "reviewed shortly after the freeze" I just want to say that e.g. having large intrusive patches posted now, but reviewed late December is already too late.

As for postponing end of stage 1 by a few weeks because of the storm, I'm afraid if we want to keep roughly timely releases we don't have that luxury. If you look at http://gcc.gnu.org/develop.html, ending stage 1 around end of October happened already for 4.6 and 4.7, for 4.5 it was a month earlier and for 4.4 even two months earlier. The 4.7 bugfixing went IMHO smoothly, but we certainly have to expect lots of bugfixing.

I'd also like to enable LRA for at least 32-bit sparc, even if I can't find the time to work on auditing 64-bit completely. I agree with Eric that it is better to enable it for the whole target together, rather than based on some options.

Enabling LRA in early stage 3 for some targets should be ok, if it doesn't require too large and intrusive changes to the generic code that could destabilize other targets.

	Jakub
[PATCH] PR 54472
Hello,

This PR is due to the selective scheduling missing the dependencies with implicit_sets. From the sched-deps.c code it looks like implicit sets generate anti dependencies with either sets, uses or clobbers, so that's what I've done with the below patch. Vlad, as it looks you've added implicit sets, does the above conclusion look correct? I will commit the patch then after bootstrapping and testing will complete.

Yours, Andrey

2012-10-30  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/54472
	* sel-sched-ir.c (has_dependence_note_reg_set): Handle implicit sets.
	(has_dependence_note_reg_clobber, has_dependence_note_reg_use):
	Likewise.

2012-10-30  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/54472
	* gcc.dg/pr54472.c: New test.

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 2a7a170..220568a 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3185,7 +3185,7 @@ has_dependence_note_reg_set (int regno)
 	  || reg_last->clobbers != NULL)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-      if (reg_last->uses)
+      if (reg_last->uses || reg_last->implicit_sets)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
     }
 }
@@ -3205,7 +3205,7 @@ has_dependence_note_reg_clobber (int regno)
       if (reg_last->sets)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_OUTPUT;

-      if (reg_last->uses)
+      if (reg_last->uses || reg_last->implicit_sets)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;
     }
 }
@@ -3225,7 +3225,7 @@ has_dependence_note_reg_use (int regno)
       if (reg_last->sets)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_TRUE;

-      if (reg_last->clobbers)
+      if (reg_last->clobbers || reg_last->implicit_sets)
 	*dsp = (*dsp & ~SPECULATIVE) | DEP_ANTI;

       /* Merge BE_IN_SPEC bits into *DSP when the dependency producer

diff --git a/gcc/testsuite/gcc.dg/pr54472.c b/gcc/testsuite/gcc.dg/pr54472.c
new file mode 100644
index 000..9395203
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr54472.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target powerpc*-*-* ia64-*-* x86_64-*-* } } */
+/* { dg-options "-O -fschedule-insns -fselective-scheduling" } */
+
+int main ()
+{
+  int a[3][3][3];
+  __builtin_memset (a, 0, sizeof a);
+  return 0;
+}
Add myself to MAINTAINERS
Adding myself to the list of members in write after approval.

Index: ChangeLog
===
--- ChangeLog (revision 192977)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2012-10-30  Ganesh Gopalasubramanian  ganesh.gopalasubraman...@amd.com
+
+	* MAINTAINERS (Write After Approval): Add myself.
+
 2012-10-26  James Greenhalgh  james.greenha...@arm.com

 	* MAINTAINERS (Write After Approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 192977)
+++ MAINTAINERS (working copy)
@@ -372,6 +372,7 @@
 Chao-ying Fu				f...@mips.com
 Gary Funck				g...@intrepid.com
 Pompapathi V Gadad			pompapathi.v.ga...@nsc.com
+Gopalasubramanian Ganesh		ganesh.gopalasubraman...@amd.com
 Kaveh Ghazi				gh...@gcc.gnu.org
 Matthew Gingell				ging...@gnat.com
 Tristan Gingold				ging...@adacore.com

Regards
Ganesh
Re: [PATCH, libstdc++] Make empty std::string storage readonly
On 08/30/2012 11:45 AM, Jonathan Wakely wrote: On 29 August 2012 13:25, Michael Haubenwallner wrote: On 08/28/2012 08:12 PM, Jonathan Wakely wrote: On 28 August 2012 18:27, Michael Haubenwallner wrote: Does it actually produce a segfault? I suppose it might on some platforms, but not all, so I'm not sure it's worth changing. Using this patch on my x86_64 Gentoo Linux Desktop with gcc-4.7.1 does segfault as expected - when I make sure the correct libstdc++ is used at runtime, having the '_S_empty_rep_storage' symbol in the .rodata section rather than .bss. If it works reliably on x86_64 then I think the patch is worth considering. I'm on holiday for a week, so maybe one of the other maintainers will deal with it first. Any chance to get this in for 4.8? Thank you! /haubi/
Re: Adapt one fold-const optimization for vectors
On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote: Hello, one more optimization that needed help for vectors, it crashed on (xy)0. Because of PR 55001, testcases are awkward to add (I could do a x86-only one if needed). bootstrap+testsuite. 2012-10-30 Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors. (fold_binary_loc): call it for VEC_COND_EXPR. Patch missing?
[SH, committed] PR 54963
Hello,

This is the latest proposed patch from the PR. Tested on rev 192482 with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures. Pre-approved by Kaz in the PR. Committed as rev 192983.

Cheers, Oleg

gcc/ChangeLog:

	PR target/54963
	* config/sh/iterators.md (SIDI): New mode iterator.
	* config/sh/sh.md (negdi2): Use parallel around operation and
	T_REG clobber in expander.
	(*negdi2): Mark output operand as early clobbered.  Add T_REG
	clobber.  Split after reload.  Simplify split code.
	(abssi2, absdi2): Fold expanders into abs<mode>2.
	(*abssi2, *absdi2): Fold into *abs<mode>2 insn_and_split.
	Split insns before reload.
	(*negabssi2, *negabsdi2): Fold into *negabs<mode>2.  Add T_REG
	clobber.  Split insns before reload.
	(negsi_cond): Reformat.  Use emit_move_insn instead of gen_movsi.
	(negdi_cond): Reformat.  Use emit_move_insn instead of a pair of
	gen_movsi.  Split insn before reload.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md (revision 192482)
+++ gcc/config/sh/sh.md (working copy)
@@ -5177,28 +5177,25 @@
 ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be
 ;; combined.
 (define_expand "negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "")))
-   (clobber (reg:SI T_REG))]
-  "TARGET_SH1"
-  "")
+  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
+		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
+	      (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")

 (define_insn_and_split "*negdi2"
-  [(set (match_operand:DI 0 "arith_reg_dest" "=r")
-	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))]
+  [(set (match_operand:DI 0 "arith_reg_dest" "=&r")
+	(neg:DI (match_operand:DI 1 "arith_reg_operand" "r")))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& reload_completed"
   [(const_int 0)]
 {
-  rtx low_src = gen_lowpart (SImode, operands[1]);
-  rtx high_src = gen_highpart (SImode, operands[1]);
-  rtx low_dst = gen_lowpart (SImode, operands[0]);
-  rtx high_dst = gen_highpart (SImode, operands[0]);
-  emit_insn (gen_clrt ());
-  emit_insn (gen_negc (low_dst, low_src));
-  emit_insn (gen_negc (high_dst, high_src));
+  emit_insn (gen_negc (gen_lowpart (SImode, operands[0]),
+		       gen_lowpart (SImode, operands[1])));
+  emit_insn (gen_negc (gen_highpart (SImode, operands[0]),
+		       gen_highpart (SImode, operands[1])));
   DONE;
 })

@@ -5272,38 +5269,53 @@
 	  (const_int -1)))]
   "TARGET_SHMEDIA"
   "")

-(define_expand "abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "")
-	(abs:SI (match_operand:SI 1 "arith_reg_operand" "")))
+(define_expand "abs<mode>2"
+  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
+		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
+	      (clobber (reg:SI T_REG))])]
+  "TARGET_SH1")
+
+(define_insn_and_split "*abs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
    (clobber (reg:SI T_REG))]
   "TARGET_SH1"
-  "")
-
-(define_insn_and_split "*abssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-	(abs:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
-  "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-			     const1_rtx));
+  if (<MODE>mode == SImode)
+    emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+    {
+      rtx high_src = gen_highpart (SImode, operands[1]);
+      emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+    }
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+				 const1_rtx));
   DONE;
 })

-(define_insn_and_split "*negabssi2"
-  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
-	(neg:SI (abs:SI (match_operand:SI 1 "arith_reg_operand" "r"))))]
+(define_insn_and_split "*negabs<mode>2"
+  [(set (match_operand:SIDI 0 "arith_reg_dest")
+	(neg:SIDI (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand"))))
+   (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "TARGET_SH1"
+  "&& can_create_pseudo_p ()"
   [(const_int 0)]
 {
-  emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
-  emit_insn (gen_negsi_cond (operands[0], operands[1], operands[1],
-			     const0_rtx));
+  if (<MODE>mode == SImode)
+    emit_insn (gen_cmpgesi_t (operands[1], const0_rtx));
+  else
+    {
+      rtx high_src = gen_highpart (SImode, operands[1]);
+      emit_insn (gen_cmpgesi_t (high_src, const0_rtx));
+    }
+
+  emit_insn (gen_neg<mode>_cond (operands[0], operands[1], operands[1],
+				 const0_rtx));
   DONE;
 })

@@ -5316,10 +5328,10 @@
 (define_insn_and_split "negsi_cond"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
-	(if_then_else:SI (eq:SI (reg:SI T_REG)
-				(match_operand:SI 3 "const_int_operand" "M,N"))
-			 (match_operand:SI 1
Re: [PATCH, libstdc++] Make empty std::string storage readonly
On 30 October 2012 09:05, Michael Haubenwallner wrote: Any chance to get this in for 4.8? I'm looking into it today.
Re: [PATCH] Fix debug info for expr and jump stmt
On Mon, Oct 29, 2012 at 05:10:10PM +0100, Richard Biener wrote:
On Mon, Oct 29, 2012 at 4:25 PM, Dehao Chen de...@google.com wrote:
On Mon, Oct 29, 2012 at 7:17 AM, Michael Matz m...@suse.de wrote:

Hi,

On Mon, 29 Oct 2012, Richard Biener wrote:

Well, you merely moved the bogus code to gimple-low.c. It is bogus because you unconditionally overwrite TREE_BLOCK of all operands (and all

Emm, then in gimple-low.c, we should probably not unconditionally overwrite gimple_block for stmts too?

gimple stmts have no block before gimple-low. And tree expressions don't have TREE_BLOCK before gimple-low either. So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple stmts as well as all expressions in the operands. It is not overwriting anything; no frontend sets TREE_BLOCK for any expression. The way frontends associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND after gimplification, and it is gimple-low's responsibility to set it. In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK initially. Before the location_t changes, again it was gimple-low that was the first setter of TREE_BLOCK, which was valid for all IS_EXPR_CODE_CLASS.

So, IMNSHO gimple-low should merge location_t with block for all gimple stmts and all tree expressions used in its operands. It shouldn't be set on trees that can be shared, so say decls etc. should keep using just location_t's without associated block. So perhaps the right test for gimple-low setting of block is CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp).

	Jakub
Re: [Patch] Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF from mingw32-w64/os_defines.h
On 10/29/2012 21:05, JonY wrote: ChangeLog 2012-10-29 Jonathan Yong jo...@users.sourceforge.net * config/os/mingw32-w64/os_defines.h: Remove _GLIBCXX_HAVE_BROKEN_VSWPRINTF as no longer required. Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h === --- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 192802) +++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy) @@ -63,8 +63,9 @@ // See libstdc++/20806. #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1 -// See libstdc++/37522. -#define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1 +// See libstdc++/37522. mingw-w64 stdio redirect for C++ +// #define _GLIBCXX_HAVE_BROKEN_VSWPRINTF 1 +// Workaround added for mingw-w64 trunk headers r5437 // See libstdc++/43738 // On native windows targets there is no ioctl function. And the existing Hi, Can I have this in before 4.8 branches? Maintainer is away for the few weeks, but OK'ed it on IRC. signature.asc Description: OpenPGP digital signature
Re: [PATCH, libstdc++] Make empty std::string storage readonly
On 30 October 2012 09:28, Jonathan Wakely wrote: On 30 October 2012 09:05, Michael Haubenwallner wrote: Any chance to get this in for 4.8? I'm looking into it today. Consider the case where one object file containing std::string().erase() is built with an older GCC without the fix for PR 40518, then it's linked to a new libstdc++.so where the empty rep is read-only. The program will attempt to write to the empty rep, but now it's read-only and will crash. I don't think we can apply it unless we change the library ABI so that no pre-PR40518 objects can link to a libstdc++.so containing a read-only empty rep.
Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed
H.J. Lu hjl.to...@gmail.com writes:

LRA has

  if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
    {
      rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

      if (! replace_p)
	{
	  offset += (ep->offset - ep->previous_offset);
	  offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
	}

      if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
	to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

Reload has

  rtx to_rtx = ep->to_rtx;
  offset += ep->offset;
  offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

  if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
	to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

(gdb) call debug_rtx (ep->to_rtx)
(reg/f:DI 7 sp)
(gdb) call debug_rtx (ep->from_rtx)
(reg/f:DI 16 argp)
(gdb)

gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp) for LRA. They are caused by

  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
      /* We should convert arg register in LRA after the elimination
	 if it is possible.  */
      && xregno == ARG_POINTER_REGNUM
      && ! lra_in_progress)
    return -1;

It doesn't work in this case.

This testcase shows that LRA can't convert arg register after the elimination. Here is a patch to remove the lra_in_progress check for ARG_POINTER_REGNUM. Tested on Linux/x86-64. OK to install?

Thanks HJ. This looks good to me. As well as your testcase, I think it would be dangerous to reduce this kind of subreg during non-final elimination in cases where the argument pointer occupies more than one hard register (like avr IIRC). We could end up with something like ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register during the rest of LRA. It's important that we do get rid of the subreg during the final elimination stage, but I think alter_subreg already handles that case.

Since this code is outside the LRA files: patch is OK if Vlad agrees.

Richard
Re: [PATCH, libstdc++] Make empty std::string storage readonly
On 30 October 2012 10:11, Jonathan Wakely wrote: On 30 October 2012 09:28, Jonathan Wakely wrote: On 30 October 2012 09:05, Michael Haubenwallner wrote: Any chance to get this in for 4.8? I'm looking into it today. Consider the case where one object file containing std::string().erase() is built with an older GCC without the fix for PR 40518, then it's linked to a new libstdc++.so where the empty rep is read-only. The program will attempt to write to the empty rep, but now it's read-only and will crash. I don't think we can apply it unless we change the library ABI so that no pre-PR40518 objects can link to a libstdc++.so containing a read-only empty rep. If I can convince myself this can't happen then I'll commit it, but I need to look into it further this evening.
Re: RFA: patch to fix PR55116
H.J. Lu hjl.to...@gmail.com writes:
On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov vmaka...@redhat.com wrote:
On 12-10-29 12:21 PM, Richard Sandiford wrote:
Vladimir Makarov vmaka...@redhat.com writes:

H.J. in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116 reported an interesting address:

(and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ])
                                       (const_int 2 [0x2]))
                            (symbol_ref:SI ("glob_vol_int_arr") var_decl 0x703c2720 glob_vol_int_arr)) 0)
        (const_int 4294967295 [0xffffffff]))

which cannot be correctly extracted. Here `and' with `subreg' behaves as an address mutation. The following patch fixes the problem. Ok to commit, Richard?

Heh, I wondered if subregs might still be used like that, and was almost tempted to add them just in case. I think this particular case is really a failed canonicalisation and that:

  (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff))

ought to be:

  (zero_extend:DI (foo:SI ...))

Yes, that was my thought too. But I know I've approved MIPS patches to accept (and:DI ... (const_int 0xffffffff)) as an alternative.

Index: rtlanal.c
===
--- rtlanal.c (revision 192942)
+++ rtlanal.c (working copy)
@@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum
       else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
 	/* (and ... (const_int -X)) is used to align to X bytes.  */
 	loc = &XEXP (*loc, 0);
+      else if (code == SUBREG
+	       && ! REG_P (XEXP (*loc, 0)) && ! MEM_P (XEXP (*loc, 0)))
+	/* (subreg (operator ...) ...) usually inside and is used for
+	   mode conversion too.  */
+	loc = &XEXP (*loc, 0);

I think the condition should be:

      else if (code == SUBREG
	       && !OBJECT_P (SUBREG_REG (*loc))
	       && subreg_lowpart_p (*loc))

OK with that change, if it works.

Yes, it works. I've submitted the following patch.

It doesn't work right. I will create a new testcase.

This patch limits SUBREG to Pmode. Tested on Linux/x86-64. OK to install? Thanks.

The address in this case is:

(plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154])
		  (const_int 8 [0x8]))
	 (subreg:SI (plus:DI (reg/f:DI 20 frame)
			     (const_int 32 [0x20])) 0))

which after Uros's subreg simplification patch shouldn't be allowed: the subreg ought to be on the frame register rather than the plus. The attached patch seems to work for the testcase. Does it work more generally?

Richard

gcc/
	* lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg
	rather than gen_rtx_SUBREG.

Index: gcc/lra-eliminations.c
===
--- gcc/lra-eliminations.c (revision 192983)
+++ gcc/lra-eliminations.c (working copy)
@@ -550,7 +550,8 @@
 	      return x;
 	    }
 	  else
-	    return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+	    return simplify_gen_subreg (GET_MODE (x), new_rtx,
+					GET_MODE (new_rtx), SUBREG_BYTE (x));
 	}

       return x;
[RFC] Heuristics to throttle the complete unrolling
Hi, for the past week or two I have been playing with ways to throttle down the complete unrolling heuristics. I made the complete unroller use the tree-ssa-loop-niter upper bound and unroll even in non-trivial cases, and this has turned out to increase the number of completely unrolled loops by a great amount, so one can see it as considerable code size growth at the -O3 SPEC build. http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png it is the largest jump on the right hand side in both peak and base runs. There are also performance improvements, most importantly 11% on applu. The intuition is that complete unrolling is most profitable when IV tests are eliminated and a single basic block is created. When conditionals stay in the code it is not that good an idea, and functions containing calls are also less interesting for unrolling since the calls are slow and the optimization opportunities are not so great. This patch reduces unrolling on loops having many branches or calls on the hot path. I found that for the applu speedup the number of branches needs to be pretty high - about 32. The patch saves about half of the growth introduced (but on different benchmarks) and I think I can move all peeling to trees and reduce the peeling limits a bit, too. Does this sound sane? Any ideas? Honza Index: tree-ssa-loop-ivcanon.c === --- tree-ssa-loop-ivcanon.c (revision 192892) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -140,6 +140,20 @@ struct loop_size instructions after exit are not executed. */ int last_iteration; int last_iteration_eliminated_by_peeling; + + /* If some IV computation will become constant. */ + bool constant_iv; + + /* Number of call stmts that are not a builtin and are pure or const + present on the hot path. */ + int num_pure_calls_on_hot_path; + /* Number of call stmts that are not a builtin and are not pure nor const + present on the hot path. */ + int num_non_pure_calls_on_hot_path; + /* Number of statements other than calls in the loop. */ + int non_call_stmts_on_hot_path; + /* Number of branches seen on the hot path. */ + int num_branches_on_hot_path; }; /* Return true if OP in STMT will be constant after peeling LOOP. */ @@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple return true; } -/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS. +/* Computes an estimated number of insns in LOOP. + EXIT (if non-NULL) is an exit edge that will be eliminated in all but the last + iteration of the loop. + EDGE_TO_CANCEL (if non-NULL) is a non-exit edge eliminated in the last iteration + of the loop. Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT. */ static void @@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo gimple_stmt_iterator gsi; unsigned int i; bool after_exit; + VEC (basic_block, heap) *path = get_loop_hot_path (loop); size->overall = 0; size->eliminated_by_peeling = 0; size->last_iteration = 0; size->last_iteration_eliminated_by_peeling = 0; + size->num_pure_calls_on_hot_path = 0; + size->num_non_pure_calls_on_hot_path = 0; + size->non_call_stmts_on_hot_path = 0; + size->num_branches_on_hot_path = 0; + size->constant_iv = 0; if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num); @@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo gimple stmt = gsi_stmt (gsi); int num = estimate_num_insns (stmt, eni_size_weights); bool likely_eliminated = false; + bool likely_eliminated_last = false; + bool likely_eliminated_peeled = false; if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo /* Look for reasons why we might optimize this stmt away. */ /* Exit conditional. */ - if (exit && body[i] == exit->src && stmt == last_stmt (exit->src)) + if (exit && body[i] == exit->src + && stmt == last_stmt (exit->src)) { if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Exit condition will be eliminated.\n"); - likely_eliminated = true; + fprintf (dump_file, "Exit condition will be eliminated " "in peeled copies.\n"); + likely_eliminated_peeled = true; + } + else if (edge_to_cancel && body[i] == edge_to_cancel->src + && stmt == last_stmt (edge_to_cancel->src)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Exit condition will be eliminated " "in last copy.\n"); + likely_eliminated_last = true; } /* Sets of IV variables */ else if (gimple_code (stmt) == GIMPLE_ASSIGN @@ -249,19 +285,22 @@
Re: [RFC] Heuristics to throttle the complete unrolling
On Tue, 30 Oct 2012, Jan Hubicka wrote: Hi, for the past week or two I have been playing with ways to throttle down the complete unrolling heuristics. I made the complete unroller use the tree-ssa-loop-niter upper bound and unroll even in non-trivial cases, and this has turned out to increase the number of completely unrolled loops by a great amount, so one can see it as considerable code size growth at the -O3 SPEC build. http://gcc.opensuse.org/SPEC/CFP/sb-vangelis-head-64/Total-size_big.png it is the largest jump on the right hand side in both peak and base runs. There are also performance improvements, most importantly 11% on applu. The intuition is that complete unrolling is most profitable when IV tests are eliminated and a single basic block is created. When conditionals stay in the code it is not that good an idea, and functions containing calls are also less interesting for unrolling since the calls are slow and the optimization opportunities are not so great. This patch reduces unrolling on loops having many branches or calls on the hot path. I found that for the applu speedup the number of branches needs to be pretty high - about 32. The patch saves about half of the growth introduced (but on different benchmarks) and I think I can move all peeling to trees and reduce the peeling limits a bit, too. Does this sound sane? Any ideas? Yes, this sounds ok (beware of the unrelated PARAM_MAX_ONCE_PEELED_INSNS removal in the patch below). Richard. Honza Index: tree-ssa-loop-ivcanon.c === --- tree-ssa-loop-ivcanon.c (revision 192892) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -140,6 +140,20 @@ struct loop_size instructions after exit are not executed. */ int last_iteration; int last_iteration_eliminated_by_peeling; + + /* If some IV computation will become constant. */ + bool constant_iv; + + /* Number of call stmts that are not a builtin and are pure or const + present on the hot path. */ + int num_pure_calls_on_hot_path; + /* Number of call stmts that are not a builtin and are not pure nor const + present on the hot path. */ + int num_non_pure_calls_on_hot_path; + /* Number of statements other than calls in the loop. */ + int non_call_stmts_on_hot_path; + /* Number of branches seen on the hot path. */ + int num_branches_on_hot_path; }; /* Return true if OP in STMT will be constant after peeling LOOP. */ @@ -188,7 +202,11 @@ constant_after_peeling (tree op, gimple return true; } -/* Computes an estimated number of insns in LOOP, weighted by WEIGHTS. +/* Computes an estimated number of insns in LOOP. + EXIT (if non-NULL) is an exit edge that will be eliminated in all but the last + iteration of the loop. + EDGE_TO_CANCEL (if non-NULL) is a non-exit edge eliminated in the last iteration + of the loop. Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT. */ static void @@ -198,11 +216,17 @@ tree_estimate_loop_size (struct loop *lo gimple_stmt_iterator gsi; unsigned int i; bool after_exit; + VEC (basic_block, heap) *path = get_loop_hot_path (loop); size->overall = 0; size->eliminated_by_peeling = 0; size->last_iteration = 0; size->last_iteration_eliminated_by_peeling = 0; + size->num_pure_calls_on_hot_path = 0; + size->num_non_pure_calls_on_hot_path = 0; + size->non_call_stmts_on_hot_path = 0; + size->num_branches_on_hot_path = 0; + size->constant_iv = 0; if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num); @@ -221,6 +245,8 @@ tree_estimate_loop_size (struct loop *lo gimple stmt = gsi_stmt (gsi); int num = estimate_num_insns (stmt, eni_size_weights); bool likely_eliminated = false; + bool likely_eliminated_last = false; + bool likely_eliminated_peeled = false; if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -231,11 +257,21 @@ tree_estimate_loop_size (struct loop *lo /* Look for reasons why we might optimize this stmt away. */ /* Exit conditional. */ - if (exit && body[i] == exit->src && stmt == last_stmt (exit->src)) + if (exit && body[i] == exit->src + && stmt == last_stmt (exit->src)) { if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Exit condition will be eliminated.\n"); - likely_eliminated = true; + fprintf (dump_file, "Exit condition will be eliminated " "in peeled copies.\n"); + likely_eliminated_peeled = true; + } + else if (edge_to_cancel && body[i] == edge_to_cancel->src + && stmt == last_stmt (edge_to_cancel->src)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Exit condition will be eliminated
Re: RFA: patch to fix PR55116
On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford rdsandif...@googlemail.com wrote: H.J. Lu hjl.to...@gmail.com writes: On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov vmaka...@redhat.com wrote: On 12-10-29 12:21 PM, Richard Sandiford wrote: Vladimir Makarov vmaka...@redhat.com writes: H.J. in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116 reported an interesting address (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ]) (const_int 2 [0x2])) (symbol_ref:SI (glob_vol_int_arr) var_decl 0x703c2720 glob_vol_int_arr)) 0) (const_int 4294967295 [0xffffffff])) which cannot be correctly extracted. Here `and' with `subreg' behaves as an address mutation. The following patch fixes the problem. Ok to commit, Richard? Heh, I wondered if subregs might still be used like that, and was almost tempted to add them just in case. I think this particular case is really a failed canonicalisation and that: (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff)) ought to be: (zero_extend:DI (foo:SI ...)) Yes, that was my thought too. But I know I've approved MIPS patches to accept (and:DI ... (const_int 0xffffffff)) as an alternative. Index: rtlanal.c === --- rtlanal.c (revision 192942) +++ rtlanal.c (working copy) @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum else if (code == AND && CONST_INT_P (XEXP (*loc, 1))) /* (and ... (const_int -X)) is used to align to X bytes. */ loc = &XEXP (*loc, 0); + else if (code == SUBREG + && ! REG_P (XEXP (*loc, 0)) + && ! MEM_P (XEXP (*loc, 0))) + /* (subreg (operator ...) ...) usually inside and is used for + mode conversion too. */ + loc = &XEXP (*loc, 0); I think the condition should be: else if (code == SUBREG && !OBJECT_P (SUBREG_REG (*loc)) && subreg_lowpart_p (*loc)) OK with that change, if it works. Yes, it works. I've submitted the following patch. It doesn't work right. I will create a new testcase. 
This patch limits SUBREG to Pmode. Tested on Linux/x86-64. OK to install? Thanks. The address in this case is: (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154]) (const_int 8 [0x8])) (subreg:SI (plus:DI (reg/f:DI 20 frame) (const_int 32 [0x20])) 0)) which after Uros's subreg simplification patch shouldn't be allowed: the subreg ought to be on the frame register rather than the plus. The attached patch seems to work for the testcase. Does it work more generally? Richard gcc/ * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg rather than gen_rtx_SUBREG. Index: gcc/lra-eliminations.c === --- gcc/lra-eliminations.c (revision 192983) +++ gcc/lra-eliminations.c (working copy) @@ -550,7 +550,8 @@ return x; } else - return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x)); + return simplify_gen_subreg (GET_MODE (x), new_rtx, + GET_MODE (new_rtx), SUBREG_BYTE (x)); } return x; I am running the full test. Thanks. -- H.J.
Re: Fix estimation of array accesses
On Mon, 29 Oct 2012, Jan Hubicka wrote: ICK ... Why not sth as simple as return num_ssa_operands (stmt, SSA_OP_USE); ? a[1][2] and b[2] really have the same cost, variable length objects have extra SSA operands in ARRAY_REF/COMPONENT_REF for the size. Thus, stmt cost somehow should reflect the number of dependent stmts. So in estimate_num_insns I'd try int estimate_num_insns (gimple stmt, eni_weights *weights) { unsigned cost, i; enum gimple_code code = gimple_code (stmt); tree lhs; tree rhs; switch (code) { case GIMPLE_ASSIGN: /* Initialize with the number of SSA uses, one is free. */ cost = num_ssa_operands (stmt, SSA_OP_USE); if (cost > 1) --cost; Hi, this is the updated patch I am testing after today's discussion. I decided to drop the ipa-inline-analysis changes and do that incrementally. So the patch now trashes tramp3d performance by increasing the need for early-inlining-insns, but it is not unexpected. The patch also fixes accounting of addr expr that previously got confused with a load. Does this seem sane? * tree-inline.c (estimate_operator_cost): Return 1 for non-trivial ADDR_EXPR operations. (estimate_num_insns): Do not confuse general single rhs with loads; account cost of non-trivial addr_expr for ASSIGN_EXPR, GIMPLE_RETURN and GIMPLE_ASM. Hi, this patch actually does not cause that much tramp3d fuzz and no differences in the testsuite due to unrolling changes. The changes are the constants when accounting loads and stores. A typical store has two SSA uses (one for the address and one for the value to store). Of course we lose the difference between array offset and pointer dereference. A typical load/address expression has one SSA use (for the address). Bootstrapped/regtested x86_64-linux, OK? ChangeLog? 
Honza Index: tree-inline.c === --- tree-inline.c (revision 192945) +++ tree-inline.c (working copy) @@ -3447,6 +3447,19 @@ estimate_operator_cost (enum tree_code c if (TREE_CODE (op2) != INTEGER_CST) return weights->div_mod_cost; return 1; + case ADDR_EXPR: + { + tree addr_base; + HOST_WIDE_INT addr_offset; + + addr_base = get_addr_base_and_unit_offset (TREE_OPERAND (op1, 0), + &addr_offset); + /* If the offset is variable or with non-zero offset, return 2. */ + if (!addr_base || addr_offset != 0 || TREE_CODE (addr_base) != MEM_REF + || !integer_zerop (TREE_OPERAND (addr_base, 1))) + return 1; The comment doesn't match the code. Why the TREE_CODE (addr_base) != MEM_REF check? If it isn't a MEM_REF then it is a plain decl, thus no dereference. So it's not clear what you want here ...? It looks like you want to pessimize variable addresses here, like &a.a[i]? Before, all ADDR_EXPR cost was zero. I'd say you want simply if (!addr_base || addr_offset != 0) return 1; no? Or even if (!addr_base || (addr_offset != 0 && TREE_CODE (addr_base) == MEM_REF)) return 1; that keeps &decl + CST as cost 0. Or even if (!addr_base) return 1; that even keeps ptr + CST as cost 0. Both because they are likely combined with some complex addressing mode later. + } + return 0; Inside the } please. default: /* We expect a copy assignment with no operator. */ @@ -3512,14 +3525,24 @@ estimate_num_insns (gimple stmt, eni_wei lhs = gimple_assign_lhs (stmt); rhs = gimple_assign_rhs1 (stmt); - if (is_gimple_reg (lhs)) - cost = 0; - else + /* Store. */ + if (gimple_vdef (stmt)) cost = estimate_move_cost (TREE_TYPE (lhs)); + else + cost = 0; That change seems bogus. A !is_gimple_reg lhs is always a store. - if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs)) + /* Load. */ + if (gimple_vuse (stmt)) cost += estimate_move_cost (TREE_TYPE (rhs)); Likewise. If rhs1 is not a register or a constant this is a load. All stores also have a VUSE so you are always accounting a store as an aggregate copy this way. 
+ /* Stores, loads and address expressions may have variable array + references in them. Account these. */ + if (gimple_vdef (stmt)) + cost += MAX (0, (int) num_ssa_operands (stmt, SSA_OP_USE) - 2); + else if (gimple_vuse (stmt) + || gimple_assign_rhs_code (stmt) == ADDR_EXPR) + cost += MAX (0, (int) num_ssa_operands (stmt, SSA_OP_USE) - 1); + Rather than doing this here in this awkward and bogus way (see above) why not do it in estimate_operator_cost? You added ADDR_EXPR already, so simply add DECLs and handled-components. Then you can do better than estimating from
Re: [patch] Unify bitmap interface.
On Tue, Oct 30, 2012 at 2:17 AM, Bin.Cheng amker.ch...@gmail.com wrote: On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl cr...@googlers.com wrote: On 10/29/12, Diego Novillo dnovi...@google.com wrote: On Oct 29, 2012 Diego Novillo dnovi...@google.com wrote: Just to make sure. Testing on ppc should be fast, for example. And useless. Your patch does not touch ppc. I've fixed the #if 0 and the remaining suggestions will happen in another patch. I've committed this one. === This patch implements the unification of the *bitmap interfaces as discussed. Essentially, we rename ebitmap and sbitmap functions to use the same names as the bitmap functions. This rename works because we can now overload on the bitmap type. Some macros now become inline functions to enable that overloading. The sbitmap non-bool returning bitwise operations have been merged with the bool versions. Sometimes this merge involved modifying the non-bool version to compute the bool value, and sometimes modifying the bool version to add additional work from the non-bool version. The redundant routines have been removed. The allocation functions have not been renamed, because we often do not have an argument on which to overload. The cardinality functions have not been renamed, because they have different parameters, and are thus not interchangeable. The iteration functions have not been renamed, because they are functionally different. Tested on x86_64, contrib/config-list.mk testing passed. Index: gcc/ChangeLog Just one question: Should we change the name of functions sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds too? It might be a little confusing that sbitmap_* is used among bitmap_*. Yes. Lawrence is proceeding with this unification in stages. The next few patches should rename these. The only two sets of functions that will remain separate for now are the iterators and the bitmap creation routines, I think. Lawrence? Diego.
Re: [patch] Unify bitmap interface.
On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote: Sure. But the point is not to add more. We should mechanically strip all the #if 0 code from the tree, btw. No point keeping all that garbage around. Please no. A lot (if not most) of the #if 0 code serves as good documentation for why something is *not* done, other pieces are there to indicate possible enhancements, and some are useful for debugging. I never really bought into that line of reasoning. Documenting why an approach was not taken is better done in words than in code that will grow stale wrt the surrounding code. Similarly for possible enhancements. Prose is better than code in those cases. If debugging code is useful, then it can remain predicated on some debugging switch. Diego.
[PATCH] Fix PR55111
This fixes PR55111. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-10-30 Richard Biener rguent...@suse.de PR tree-optimization/55111 * tree-ssa-pre.c (eliminate_insert): Properly fold the built stmt. * gcc.dg/torture/pr55111.c: New testcase. Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 192983) --- gcc/tree-ssa-pre.c (working copy) *** eliminate_insert (gimple_stmt_iterator * *** 3996,4003 tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, pretmp); gimple tem = gimple_build_assign (res, ! build1 (TREE_CODE (expr), ! TREE_TYPE (expr), leader)); gsi_insert_before (gsi, tem, GSI_SAME_STMT); VN_INFO_GET (res)-valnum = val; --- 3996,4003 tree res = make_temp_ssa_name (TREE_TYPE (val), NULL, pretmp); gimple tem = gimple_build_assign (res, ! fold_build1 (TREE_CODE (expr), !TREE_TYPE (expr), leader)); gsi_insert_before (gsi, tem, GSI_SAME_STMT); VN_INFO_GET (res)-valnum = val; Index: gcc/testsuite/gcc.dg/torture/pr55111.c === *** gcc/testsuite/gcc.dg/torture/pr55111.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr55111.c (working copy) *** *** 0 --- 1,24 + /* { dg-do compile } */ + + int a, b, c; + long d; + unsigned long *e; + + int f(void) + { + for(;; a++) + { + if(c) + { + for(b = d = 0; b 1; b++) + e = d; + + --*e; + + if(d 0) + a = 0; + + return d; + } + } + }
Re: Adapt one fold-const optimization for vectors
On Tue, 30 Oct 2012, Marek Polacek wrote: On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote: Hello, one more optimization that needed help for vectors, it crashed on (xy)0. Because of PR 55001, testcases are awkward to add (I could do an x86-only one if needed). bootstrap+testsuite. 2012-10-30 Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors. (fold_binary_loc): Call it for VEC_COND_EXPR. Patch missing? Indeed, thanks for the notice. Here it is. -- Marc Glisse Index: fold-const.c === --- fold-const.c (revision 192976) +++ fold-const.c (working copy) @@ -5952,42 +5952,47 @@ static tree fold_binary_op_with_conditional_arg (location_t loc, enum tree_code code, tree type, tree op0, tree op1, tree cond, tree arg, int cond_first_p) { tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1); tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0); tree test, true_value, false_value; tree lhs = NULL_TREE; tree rhs = NULL_TREE; + enum tree_code cond_code = COND_EXPR; - if (TREE_CODE (cond) == COND_EXPR) + if (TREE_CODE (cond) == COND_EXPR + || TREE_CODE (cond) == VEC_COND_EXPR) { test = TREE_OPERAND (cond, 0); true_value = TREE_OPERAND (cond, 1); false_value = TREE_OPERAND (cond, 2); /* If this operand throws an expression, then it does not make sense to try to perform a logical or arithmetic operation involving it. */ if (VOID_TYPE_P (TREE_TYPE (true_value))) lhs = true_value; if (VOID_TYPE_P (TREE_TYPE (false_value))) rhs = false_value; } else { tree testtype = TREE_TYPE (cond); test = cond; true_value = constant_boolean_node (true, testtype); false_value = constant_boolean_node (false, testtype); } + if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE) + cond_code = VEC_COND_EXPR; + /* This transformation is only worthwhile if we don't have to wrap ARG in a SAVE_EXPR and the operation can be simplified on at least one of the branches once its pushed inside the COND_EXPR. */ if (!TREE_CONSTANT (arg) && (TREE_SIDE_EFFECTS (arg) || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value))) return NULL_TREE; arg = fold_convert_loc (loc, arg_type, arg); if (lhs == 0) @@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc if (cond_first_p) rhs = fold_build2_loc (loc, code, type, false_value, arg); else rhs = fold_build2_loc (loc, code, type, arg, false_value); } /* Check that we have simplified at least one of the branches. */ if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs)) return NULL_TREE; - return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs); + return fold_build3_loc (loc, cond_code, type, test, lhs, rhs); } /* Subroutine of fold() that checks for the addition of +/- 0.0. If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type TYPE, X + ADDEND is the same as X. If NEGATE, return true if X - ADDEND is the same as X. X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero @@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc, if (TREE_CODE (arg1) == COMPOUND_EXPR && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0))) { tem = fold_build2_loc (loc, code, type, op0, fold_convert_loc (loc, TREE_TYPE (op1), TREE_OPERAND (arg1, 1))); return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1, 0), tem); } - if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0)) + if (TREE_CODE (arg0) == COND_EXPR + || TREE_CODE (arg0) == VEC_COND_EXPR + || COMPARISON_CLASS_P (arg0)) { tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1, arg0, arg1, /*cond_first_p=*/1); if (tem != NULL_TREE) return tem; } - if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1)) + if (TREE_CODE (arg1) == COND_EXPR + || TREE_CODE (arg1) == VEC_COND_EXPR + || COMPARISON_CLASS_P (arg1)) { tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1, arg1, arg0, /*cond_first_p=*/0); if (tem != NULL_TREE) return tem; } } switch (code)
Re: [patch] Apply conditional down cast to cgraph.h et.al.
On 2012-10-29 15:01, Lawrence Crowl wrote: On 10/27/12, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 26 Oct 2012, Lawrence Crowl wrote: 2012-10-26 Lawrence Crowl cr...@google.com missing '' Fixed. * is-a.h: New. (is_a <T> (U*)): New. Test for is-a relationship. (as_a <T> (U*)): New. Treat as a derived type. (dyn_cast <T> (U*)): New. Conditionally cast based on is_a. I can't find this file in the patch... I forgot to svn add. Updated patch here. This patch implements generic type query and conversion functions, and applies them to the use of cgraph_node, varpool_node, and symtab_node. The functions are: bool is_a <TYPE> (pointer) Tests whether the pointer actually points to a more derived TYPE. TYPE *as_a <TYPE> (pointer) Converts pointer to a TYPE*. TYPE *dyn_cast <TYPE> (pointer) Converts pointer to TYPE* if and only if is_a <TYPE> (pointer). Otherwise, returns NULL. This function is essentially a checked down cast. These functions reduce compile time and increase type safety when treating a generic item as a more specific item. In essence, the code change is from if (symtab_function_p (node)) { struct cgraph_node *cnode = cgraph (node); } to if (cgraph_node *cnode = dyn_cast <cgraph_node> (node)) { } The necessary conditional test defines a variable that holds a known good pointer to the specific item and avoids subsequent conversion calls and the assertion checks that may come with them. When the property test is embedded within a larger condition, the variable declaration gets pulled out of the condition. (This leaves some room for using the variable inappropriately.) if (symtab_variable_p (node) && varpool (node)->finalized) varpool_analyze_node (varpool (node)); becomes varpool_node *vnode = dyn_cast <varpool_node> (node); if (vnode && vnode->finalized) varpool_analyze_node (vnode); Note that we have converted two sets of assertions in the calls to varpool into safe and efficient use of a variable. 
There are remaining calls to symtab_function_p and symtab_variable_p that do not involve a pointer to a more specific type. These have been converted to calls to the functions is_a <cgraph_node> and is_a <varpool_node>. The original predicate functions have been removed. The cgraph.h header defined both a struct and a function with the name varpool_node. This name overloading can cause some unintuitive error messages when, as is common in C++, one omits the struct keyword when using the type. I have renamed the function to varpool_node_for_decl. Tested on x86_64. Okay for trunk? Index: gcc/ChangeLog 2012-10-29 Lawrence Crowl cr...@google.com * is-a.h: New. (is_a <T> (U*)): New. Test for is-a relationship. (as_a <T> (U*)): New. Treat as a derived type. (dyn_cast <T> (U*)): New. Conditionally cast based on is_a. * cgraph.h (varpool_node): Rename to varpool_node_for_decl. Adjust callers to match. (is_a_helper <cgraph_node>::test (symtab_node_def *)): New. (is_a_helper <varpool_node>::test (symtab_node_def *)): New. (symtab_node_def::try_function): New. Change most calls to symtab_function_p with calls to dyn_cast <cgraph_node> (p). (symtab_node_def::try_variable): New. Change most calls to symtab_variable_p with calls to dyn_cast <varpool_node> (p). (symtab_function_p): Remove. Change callers to use is_a <cgraph_node> (p) instead. (symtab_variable_p): Remove. Change callers to use is_a <varpool_node> (p) instead. * cgraph.c (cgraph_node_for_asm): Remove redundant call to symtab_node_for_asm. * cgraphunit.c (symbol_finalized_and_needed): New. (symbol_finalized): New. (cgraph_analyze_functions): Split complicated conditionals out into above new functions. * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h. Thanks. I really like this cleanup. I have a few questions and comments below. Honza, the patch looks OK to me but it touches a bunch of cgraph code, could you please go over it? 
Index: gcc/is-a.h === --- gcc/is-a.h (revision 0) +++ gcc/is-a.h (revision 0) @@ -0,0 +1,118 @@ +/* Dynamic testing for abstract is-a relationships. + Copyright (C) 2012 Free Software Foundation, Inc. + Contributed by Lawrence Crowl. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have
Re: [patch] Unify bitmap interface.
On Tue, Oct 30, 2012 at 12:53 PM, Diego Novillo dnovi...@google.com wrote: On Mon, Oct 29, 2012 at 4:25 PM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Oct 29, 2012 at 7:54 PM, Diego Novillo wrote: Sure. But the point is not to add more. We should mechanically strip all the #if 0 code from the tree, btw. No point keeping all that garbage around. Please no. A lot (if not most) of the #if 0 code serves as good documentation for why something is *not* done, other pieces are there to indicate possible enhancements, and some are useful for debugging. I never really bought into that line of reasoning. Documenting why an approach was not taken is better done in words than in code that will grow stale wrt the surrounding code. Similarly for possible enhancements. Prose is better than code in those cases. If debugging code is useful, then it can remain predicated on some debugging switch. I agree with both of you - #if 0 code, if it is useful as a comment, deserves being rewritten as a comment. Richard. Diego.
Re: Adapt one fold-const optimization for vectors
On Tue, Oct 30, 2012 at 1:14 PM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 30 Oct 2012, Marek Polacek wrote: On Tue, Oct 30, 2012 at 08:05:13AM +0100, Marc Glisse wrote: Hello, one more optimization that needed help for vectors, it crashed on (xy)0. Because of PR 55001, testcases are awkward to add (I could do a x86-only one if needed). bootstrap+testsuite. 2012-10-30 Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_op_with_conditional_arg): Handle vectors. (fold_binary_loc): call it for VEC_COND_EXPR. Patch missing? Indeed, thanks for the notice. Ok. Thanks, Richard. Here it is. -- Marc Glisse Index: fold-const.c === --- fold-const.c(revision 192976) +++ fold-const.c(working copy) @@ -5952,42 +5952,47 @@ static tree fold_binary_op_with_conditional_arg (location_t loc, enum tree_code code, tree type, tree op0, tree op1, tree cond, tree arg, int cond_first_p) { tree cond_type = cond_first_p ? TREE_TYPE (op0) : TREE_TYPE (op1); tree arg_type = cond_first_p ? TREE_TYPE (op1) : TREE_TYPE (op0); tree test, true_value, false_value; tree lhs = NULL_TREE; tree rhs = NULL_TREE; + enum tree_code cond_code = COND_EXPR; - if (TREE_CODE (cond) == COND_EXPR) + if (TREE_CODE (cond) == COND_EXPR + || TREE_CODE (cond) == VEC_COND_EXPR) { test = TREE_OPERAND (cond, 0); true_value = TREE_OPERAND (cond, 1); false_value = TREE_OPERAND (cond, 2); /* If this operand throws an expression, then it does not make sense to try to perform a logical or arithmetic operation involving it. 
*/ if (VOID_TYPE_P (TREE_TYPE (true_value))) lhs = true_value; if (VOID_TYPE_P (TREE_TYPE (false_value))) rhs = false_value; } else { tree testtype = TREE_TYPE (cond); test = cond; true_value = constant_boolean_node (true, testtype); false_value = constant_boolean_node (false, testtype); } + if (TREE_CODE (TREE_TYPE (test)) == VECTOR_TYPE) + cond_code = VEC_COND_EXPR; + /* This transformation is only worthwhile if we don't have to wrap ARG in a SAVE_EXPR and the operation can be simplified on at least one of the branches once it's pushed inside the COND_EXPR. */ if (!TREE_CONSTANT (arg) && (TREE_SIDE_EFFECTS (arg) || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value))) return NULL_TREE; arg = fold_convert_loc (loc, arg_type, arg); if (lhs == 0) @@ -6004,21 +6009,21 @@ fold_binary_op_with_conditional_arg (loc if (cond_first_p) rhs = fold_build2_loc (loc, code, type, false_value, arg); else rhs = fold_build2_loc (loc, code, type, arg, false_value); } /* Check that we have simplified at least one of the branches. */ if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs)) return NULL_TREE; - return fold_build3_loc (loc, COND_EXPR, type, test, lhs, rhs); + return fold_build3_loc (loc, cond_code, type, test, lhs, rhs); } /* Subroutine of fold() that checks for the addition of +/- 0.0. If !NEGATE, return true if ADDEND is +/-0.0 and, for all X of type TYPE, X + ADDEND is the same as X. If NEGATE, return true if X - ADDEND is the same as X.
X + 0 and X - 0 both give X when X is NaN, infinite, or nonzero @@ -9864,30 +9869,34 @@ fold_binary_loc (location_t loc, if (TREE_CODE (arg1) == COMPOUND_EXPR && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0))) { tem = fold_build2_loc (loc, code, type, op0, fold_convert_loc (loc, TREE_TYPE (op1), TREE_OPERAND (arg1, 1))); return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1, 0), tem); } - if (TREE_CODE (arg0) == COND_EXPR || COMPARISON_CLASS_P (arg0)) + if (TREE_CODE (arg0) == COND_EXPR + || TREE_CODE (arg0) == VEC_COND_EXPR + || COMPARISON_CLASS_P (arg0)) { tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1, arg0, arg1, /*cond_first_p=*/1); if (tem != NULL_TREE) return tem; } - if (TREE_CODE (arg1) == COND_EXPR || COMPARISON_CLASS_P (arg1)) + if (TREE_CODE (arg1) == COND_EXPR + || TREE_CODE (arg1) == VEC_COND_EXPR + || COMPARISON_CLASS_P (arg1)) { tem = fold_binary_op_with_conditional_arg (loc, code, type, op0, op1, arg1,
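The fold being extended here is the familiar push of a binary operation into the arms of a conditional; the patch merely teaches it to build a VEC_COND_EXPR when the condition is a vector. A scalar sketch of the identity (illustration only, not the fold-const.c code itself):

```cpp
#include <cassert>

// (cond ? t : f) op arg  ==>  cond ? (t op arg) : (f op arg)
// The patch lets fold perform this rewrite when the conditional is a
// VEC_COND_EXPR, i.e. the condition is a vector selecting per element.
int original_form (bool cond, int t, int f, int arg)
{
  return (cond ? t : f) + arg;
}

int folded_form (bool cond, int t, int f, int arg)
{
  return cond ? (t + arg) : (f + arg);
}
```

The transformation is only applied when at least one branch simplifies, which is what the `TREE_CONSTANT` checks in the patch guard against.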
Re: [patch] Apply conditional down cast to cgraph.h et.al.
On Tue, Oct 30, 2012 at 1:20 PM, Diego Novillo dnovi...@google.com wrote: On 2012-10-29 15:01 , Lawrence Crowl wrote: On 10/27/12, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 26 Oct 2012, Lawrence Crowl wrote: 2012-10-26 Lawrence Crowl cr...@google.com missing '' Fixed. * is-a.h: New. (is_a <T> (U*)): New. Test for is-a relationship. (as_a <T> (U*)): New. Treat as a derived type. (dyn_cast <T> (U*)): New. Conditionally cast based on is_a. I can't find this file in the patch... I forgot to svn add. Updated patch here. This patch implements generic type query and conversion functions, and applies them to the use of cgraph_node, varpool_node, and symtab_node. The functions are: bool is_a <TYPE> (pointer) Tests whether the pointer actually points to a more derived TYPE. TYPE *as_a <TYPE> (pointer) Converts pointer to a TYPE*. TYPE *dyn_cast <TYPE> (pointer) Converts pointer to TYPE* if and only if is_a <TYPE> (pointer). Otherwise, returns NULL. This function is essentially a checked down cast. These functions reduce compile time and increase type safety when treating a generic item as a more specific item. In essence, the code change is from if (symtab_function_p (node)) { struct cgraph_node *cnode = cgraph (node); } to if (cgraph_node *cnode = dyn_cast <cgraph_node> (node)) { } The necessary conditional test defines a variable that holds a known good pointer to the specific item and avoids subsequent conversion calls and the assertion checks that may come with them. When the property test is embedded within a larger condition, the variable declaration gets pulled out of the condition. (This leaves some room for using the variable inappropriately.) if (symtab_variable_p (node) && varpool (node)->finalized) varpool_analyze_node (varpool (node)); becomes varpool_node *vnode = dyn_cast <varpool_node> (node); if (vnode && vnode->finalized) varpool_analyze_node (vnode); Note that we have converted two sets of assertions in the calls to varpool into safe and efficient use of a variable.
There are remaining calls to symtab_function_p and symtab_variable_p that do not involve a pointer to a more specific type. These have been converted to calls to the functions is_a <cgraph_node> and is_a <varpool_node>. The original predicate functions have been removed. The cgraph.h header defined both a struct and a function with the name varpool_node. This name overloading can cause some unintuitive error messages when, as is common in C++, one omits the struct keyword when using the type. I have renamed the function to varpool_node_for_decl. Tested on x86_64. Okay for trunk? Index: gcc/ChangeLog 2012-10-29 Lawrence Crowl cr...@google.com * is-a.h: New. (is_a <T> (U*)): New. Test for is-a relationship. (as_a <T> (U*)): New. Treat as a derived type. (dyn_cast <T> (U*)): New. Conditionally cast based on is_a. * cgraph.h (varpool_node): Rename to varpool_node_for_decl. Adjust callers to match. (is_a_helper <cgraph_node>::test (symtab_node_def *)): New. (is_a_helper <varpool_node>::test (symtab_node_def *)): New. (symtab_node_def::try_function): New. Change most calls to symtab_function_p with calls to dyn_cast <cgraph_node> (p). (symtab_node_def::try_variable): New. Change most calls to symtab_variable_p with calls to dyn_cast <varpool_node> (p). (symtab_function_p): Remove. Change callers to use is_a <cgraph_node> (p) instead. (symtab_variable_p): Remove. Change callers to use is_a <varpool_node> (p) instead. * cgraph.c (cgraph_node_for_asm): Remove redundant call to symtab_node_for_asm. * cgraphunit.c (symbol_finalized_and_needed): New. (symbol_finalized): New. (cgraph_analyze_functions): Split complicated conditionals out into above new functions. * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h. Thanks. I really like this cleanup. I have a few questions and comments below. Honza, the patch looks OK to me but it touches a bunch of cgraph code, could you please go over it?
Index: gcc/is-a.h === --- gcc/is-a.h (revision 0) +++ gcc/is-a.h (revision 0) @@ -0,0 +1,118 @@ +/* Dynamic testing for abstract is-a relationships. + Copyright (C) 2012 Free Software Foundation, Inc. + Contributed by Lawrence Crowl. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT
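The mechanism Lawrence describes can be sketched as a standalone mini-model. This is a hypothetical illustration, not the actual contents of is-a.h: the node types, the `kind` tag field, and its values are invented here, whereas the real code dispatches on symtab_node_def.

```cpp
#include <cassert>
#include <cstddef>

// Invented stand-ins for the symtab hierarchy, tagged by an int "kind".
struct symtab_node_base { int kind; };   // 0 = function, 1 = variable
struct cgraph_node : symtab_node_base {};
struct varpool_node : symtab_node_base {};

// One is_a_helper specialization per target type carries the test.
template <typename T>
struct is_a_helper;

template <>
struct is_a_helper <cgraph_node>
{
  static bool test (symtab_node_base *p) { return p->kind == 0; }
};

template <>
struct is_a_helper <varpool_node>
{
  static bool test (symtab_node_base *p) { return p->kind == 1; }
};

// is_a: query only.  as_a: asserted downcast.  dyn_cast: checked downcast
// that yields NULL on failure, enabling the
// "if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))" idiom.
template <typename T, typename U>
bool is_a (U *p) { return is_a_helper <T>::test (p); }

template <typename T, typename U>
T *as_a (U *p) { assert (is_a <T> (p)); return static_cast <T *> (p); }

template <typename T, typename U>
T *dyn_cast (U *p) { return is_a <T> (p) ? static_cast <T *> (p) : NULL; }
```

The design point is that the conditional test and the conversion happen once, so later uses of the narrowed pointer need no further assertion checks.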
[PATCH] Add gimple load/store predicates, use them from stmt estimates
As requested this adds predicates to check whether the lhs of an assign or call is a store and whether rhs1 of an assignment is a load. It uses this in place of the existing, slightly bogus, check in the stmt estimate code. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2012-10-30 Richard Biener rguent...@suse.de * gimple.h (gimple_store_p): New predicate. (gimple_assign_load_p): Likewise. * tree-inline.c (estimate_num_insns): Use it. Index: gcc/gimple.h === *** gcc/gimple.h(revision 192984) --- gcc/gimple.h(working copy) *** gimple_assign_single_p (gimple gs) *** 2041,2046 --- 2041,2071 gimple_assign_rhs_class (gs) == GIMPLE_SINGLE_RHS); } + /* Return true if GS performs a store to its lhs. */ + + static inline bool + gimple_store_p (gimple gs) + { + tree lhs = gimple_get_lhs (gs); + return lhs && !is_gimple_reg (lhs); + } + + /* Return true if GS is an assignment that loads from its rhs1. */ + + static inline bool + gimple_assign_load_p (gimple gs) + { + tree rhs; + if (!gimple_assign_single_p (gs)) + return false; + rhs = gimple_assign_rhs1 (gs); + if (TREE_CODE (rhs) == WITH_SIZE_EXPR) + return true; + rhs = get_base_address (rhs); + return (DECL_P (rhs) + || TREE_CODE (rhs) == MEM_REF || TREE_CODE (rhs) == TARGET_MEM_REF); + } + /* Return true if S is a type-cast assignment. */ Index: gcc/tree-inline.c === *** gcc/tree-inline.c (revision 192984) --- gcc/tree-inline.c (working copy) *** estimate_num_insns (gimple stmt, eni_wei *** 3512,3523 lhs = gimple_assign_lhs (stmt); rhs = gimple_assign_rhs1 (stmt); ! if (is_gimple_reg (lhs)) ! cost = 0; ! else ! cost = estimate_move_cost (TREE_TYPE (lhs)); ! if (!is_gimple_reg (rhs) && !is_gimple_min_invariant (rhs)) cost += estimate_move_cost (TREE_TYPE (rhs)); cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights, --- 3512,3523 lhs = gimple_assign_lhs (stmt); rhs = gimple_assign_rhs1 (stmt); ! cost = 0; ! /* Account for the cost of moving to / from memory. */ ! if (gimple_store_p (stmt)) !
cost += estimate_move_cost (TREE_TYPE (lhs)); ! if (gimple_assign_load_p (stmt)) cost += estimate_move_cost (TREE_TYPE (rhs)); cost += estimate_operator_cost (gimple_assign_rhs_code (stmt), weights,
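The effect of the adjustment on the cost computation can be modeled in isolation. This is a toy model with invented structs and a flat move cost of 1, not the GCC code: the point is simply that a move cost is charged only when the statement really stores to or loads from memory, rather than whenever the lhs is not a register.

```cpp
// Toy model of the adjusted estimate_num_insns accounting.  The struct,
// its fields, and the flat per-move cost of 1 are all invented for
// illustration; in GCC the real predicates inspect the gimple statement.
struct toy_stmt
{
  bool lhs_is_memory;   // stands in for gimple_store_p
  bool rhs_is_memory;   // stands in for gimple_assign_load_p
};

int move_cost (const toy_stmt &s)
{
  int cost = 0;
  if (s.lhs_is_memory)  // store: account a move to memory
    cost += 1;
  if (s.rhs_is_memory)  // load: account a move from memory
    cost += 1;
  return cost;          // register-to-register assignments stay free
}
```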
RE: [PATCH] [AArch64] Add vcond, vcondu support.
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Marcus Shawcroft Sent: 15 October 2012 12:37 To: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [AArch64] Add vcond, vcondu support. On 09/10/12 12:08, James Greenhalgh wrote: Hi, This patch adds support for vcond and vcondu to the AArch64 backend. Tested with no regressions on aarch64-none-elf. OK for aarch64-branch? (If so, someone will have to commit for me, as I do not have commit rights.) Thanks James Greenhalgh --- 2012-09-11 James Greenhalgh james.greenha...@arm.com Tejas Belagod tejas.bela...@arm.com * config/aarch64/aarch64-simd.md (aarch64_simd_bsl<mode>_internal): New pattern. (aarch64_simd_bsl<mode>): Likewise. (aarch64_vcond_internal<mode>): Likewise. (vcondu<mode><mode>): Likewise. (vcond<mode><mode>): Likewise. * config/aarch64/iterators.md (UNSPEC_BSL): Add to define_constants. OK /Marcus Hi, Committed as revision 192985. Thanks, James Greenhalgh
Re: [patch] Apply conditional down cast to cgraph.h et.al.
2012-10-29 Lawrence Crowl cr...@google.com * is-a.h: New. (is_a <T> (U*)): New. Test for is-a relationship. (as_a <T> (U*)): New. Treat as a derived type. (dyn_cast <T> (U*)): New. Conditionally cast based on is_a. * cgraph.h (varpool_node): Rename to varpool_node_for_decl. Adjust callers to match. (is_a_helper <cgraph_node>::test (symtab_node_def *)): New. (is_a_helper <varpool_node>::test (symtab_node_def *)): New. (symtab_node_def::try_function): New. Change most calls to symtab_function_p with calls to dyn_cast <cgraph_node> (p). (symtab_node_def::try_variable): New. Change most calls to symtab_variable_p with calls to dyn_cast <varpool_node> (p). (symtab_function_p): Remove. Change callers to use is_a <cgraph_node> (p) instead. (symtab_variable_p): Remove. Change callers to use is_a <varpool_node> (p) instead. * cgraph.c (cgraph_node_for_asm): Remove redundant call to symtab_node_for_asm. * cgraphunit.c (symbol_finalized_and_needed): New. (symbol_finalized): New. (cgraph_analyze_functions): Split complicated conditionals out into above new functions. * Makefile.in (CGRAPH_H): Add is-a.h as used by cgraph.h. the patch is OK, thanks! Honza
Re: RFA: patch to fix PR55116
On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford rdsandif...@googlemail.com wrote: H.J. Lu hjl.to...@gmail.com writes: On Mon, Oct 29, 2012 at 5:11 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 29, 2012 at 4:41 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 29, 2012 at 9:38 AM, Vladimir Makarov vmaka...@redhat.com wrote: On 12-10-29 12:21 PM, Richard Sandiford wrote: Vladimir Makarov vmaka...@redhat.com writes: H.J. in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55116 reported an interesting address (and:DI (subreg:DI (plus:SI (ashift:SI (reg:SI 96 [ glob_vol_int.22 ]) (const_int 2 [0x2])) (symbol_ref:SI (glob_vol_int_arr) var_decl 0x703c2720 glob_vol_int_arr)) 0) (const_int 4294967295 [0xffffffff])) which cannot be correctly extracted. Here `and' with `subreg' behaves as an address mutation. The following patch fixes the problem. Ok to commit, Richard? Heh, I wondered if subregs might still be used like that, and was almost tempted to add them just in case. I think this particular case is really a failed canonicalisation and that: (and:DI (subreg:DI (foo:SI ...) 0) (const_int 0xffffffff)) ought to be: (zero_extend:DI (foo:SI ...)) Yes, that was my thought too. But I know I've approved MIPS patches to accept (and:DI ... (const_int 0xffffffff)) as an alternative. Index: rtlanal.c === --- rtlanal.c (revision 192942) +++ rtlanal.c (working copy) @@ -5459,6 +5459,11 @@ strip_address_mutations (rtx *loc, enum else if (code == AND && CONST_INT_P (XEXP (*loc, 1))) /* (and ... (const_int -X)) is used to align to X bytes. */ loc = XEXP (*loc, 0); + else if (code == SUBREG + && !REG_P (XEXP (*loc, 0)) + && !MEM_P (XEXP (*loc, 0))) + /* (subreg (operator ...) ...) usually inside and is used for + mode conversion too. */ + loc = XEXP (*loc, 0); I think the condition should be: else if (code == SUBREG && !OBJECT_P (SUBREG_REG (*loc)) && subreg_lowpart_p (*loc)) OK with that change, if it works. Yes, it works.
I've submitted the following patch. It doesn't work right. I will create a new testcase. This patch limits SUBREG to Pmode. Tested on Linux/x86-64. OK to install? Thanks. The address in this case is: (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154]) (const_int 8 [0x8])) (subreg:SI (plus:DI (reg/f:DI 20 frame) (const_int 32 [0x20])) 0)) which after Uros's subreg simplification patch shouldn't be allowed: the subreg ought to be on the frame register rather than the plus. The attached patch seems to work for the testcase. Does it work more generally? Richard gcc/ * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg rather than gen_rtx_SUBREG. Index: gcc/lra-eliminations.c === --- gcc/lra-eliminations.c (revision 192983) +++ gcc/lra-eliminations.c (working copy) @@ -550,7 +550,8 @@ return x; } else - return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x)); + return simplify_gen_subreg (GET_MODE (x), new_rtx, + GET_MODE (new_rtx), SUBREG_BYTE (x)); } return x; I am running the full test. It works. Can you check in your patch? I will check in my testcase. Thanks. -- H.J.
Re: RFA: patch to fix PR55116
H.J. Lu hjl.to...@gmail.com writes: On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford rdsandif...@googlemail.com wrote: The address in this case is: (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154]) (const_int 8 [0x8])) (subreg:SI (plus:DI (reg/f:DI 20 frame) (const_int 32 [0x20])) 0)) which after Uros's subreg simplification patch shouldn't be allowed: the subreg ought to be on the frame register rather than the plus. The attached patch seems to work for the testcase. Does it work more generally? Richard gcc/ * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg rather than gen_rtx_SUBREG. Index: gcc/lra-eliminations.c === --- gcc/lra-eliminations.c (revision 192983) +++ gcc/lra-eliminations.c (working copy) @@ -550,7 +550,8 @@ return x; } else - return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x)); + return simplify_gen_subreg (GET_MODE (x), new_rtx, + GET_MODE (new_rtx), SUBREG_BYTE (x)); } return x; I am running the full test. It works. Can you check in your patch? I will check in my testcase. Thanks. Vlad, is the patch OK? Richard
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Oct 29, 2012 at 1:56 PM, Jakub Jelinek ja...@redhat.com wrote: Status == I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes. I will be committing the VEC overhaul soon. With any luck this week, but PCH and gengtype are giving me a lot of grief. Diego.
Re: [Patch] Potential fix for PR55033
On 10/26/2012 02:22 PM, Sebastian Huber wrote: Hello, here is a test case for PR55033. Is there something wrong with this test case? It compiles well with Alan's patch. -- Sebastian Huber, embedded brains GmbH Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany Phone : +49 89 18 90 80 79-6 Fax : +49 89 18 90 80 79-9 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
Re: [Patch] Potential fix for PR55033
Hello, what needs to be done to get this committed? -- Sebastian Huber, embedded brains GmbH
Re: [PATCH] Add gimple load/store predicates, use them from stmt estimates
On Tue, 30 Oct 2012, Richard Biener wrote: As requested this adds predicates to check whether the lhs of an assign or call is a store and whether rhs1 of an assignment is a load. It uses this in place of the existing, slightly bogus, check in the stmt estimate code. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Committed with the following adjustment: we would now inline foo, as we no longer account memory move costs for stmts like pInput_9 = MEM[(void *)pInput_1 + 8B]; adding noinline looks reasonable anyway (we'll fix the above to cost something again with a followup). Richard. Index: gcc/testsuite/gcc.dg/vect/slp-perm-2.c === *** gcc/testsuite/gcc.dg/vect/slp-perm-2.c (revision 192984) --- gcc/testsuite/gcc.dg/vect/slp-perm-2.c (working copy) *** *** 12,18 #define N 16 ! void foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ pOutput) { unsigned int i, a, b; --- 12,19 #define N 16 ! void __attribute__((noinline)) ! foo (unsigned int *__restrict__ pInput, unsigned int *__restrict__ pOutput) { unsigned int i, a, b;
Re: [PATCH] Fix debug info for expr and jump stmt
And tree expressions don't have TREE_BLOCK before gimple-low either. So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple stmts as well as all expressions in the operands. It is not overwriting anything; no frontend sets TREE_BLOCK for any expression. The way frontends associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND after gimplification, and it is gimple-low's responsibility to set it. In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK initially. Before the location_t changes, again it was gimple-low that was the first setter of TREE_BLOCK, which was valid for all IS_EXPR_CODE_CLASS. So, IMNSHO gimple-low should merge location_t with block for all gimple stmts and all tree expressions used in its operands. It shouldn't be set on trees that can be shared, so say decls etc. should keep using just location_t's without associated block. So perhaps the right test for gimple-low setting of block is CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp). Jakub I kind of like this idea. What do you guys think? Thanks, Dehao
Re: RFA: patch to fix PR55116
On 10/30/2012 09:39 AM, Richard Sandiford wrote: H.J. Lu hjl.to...@gmail.com writes: On Tue, Oct 30, 2012 at 4:38 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 30, 2012 at 4:09 AM, Richard Sandiford rdsandif...@googlemail.com wrote: The address in this case is: (plus:SI (mult:SI (reg/v:SI 223 [orig:154 j ] [154]) (const_int 8 [0x8])) (subreg:SI (plus:DI (reg/f:DI 20 frame) (const_int 32 [0x20])) 0)) which after Uros's subreg simplification patch shouldn't be allowed: the subreg ought to be on the frame register rather than the plus. The attached patch seems to work for the testcase. Does it work more generally? Richard gcc/ * lra-eliminations.c (lra_eliminate_regs_1): Use simplify_gen_subreg rather than gen_rtx_SUBREG. Index: gcc/lra-eliminations.c === --- gcc/lra-eliminations.c (revision 192983) +++ gcc/lra-eliminations.c (working copy) @@ -550,7 +550,8 @@ return x; } else - return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x)); + return simplify_gen_subreg (GET_MODE (x), new_rtx, + GET_MODE (new_rtx), SUBREG_BYTE (x)); } return x; I am running the full test. It works. Can you check in your patch? I will check in my testcase. Thanks. Vlad, is the patch OK? Sure. Thanks.
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote: And tree expressions don't have TREE_BLOCK before gimple-low either. So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple stmts as well as all expressions in the operands. It is not overwriting anything; no frontend sets TREE_BLOCK for any expression. The way frontends associate IL with BLOCKs is by putting them inside of BIND_EXPR/GIMPLE_BIND after gimplification, and it is gimple-low's responsibility to set it. In 4.3 before tuples, it was solely gimple-low that set TREE_BLOCK initially. Before the location_t changes, again it was gimple-low that was the first setter of TREE_BLOCK, which was valid for all IS_EXPR_CODE_CLASS. So, IMNSHO gimple-low should merge location_t with block for all gimple stmts and all tree expressions used in its operands. It shouldn't be set on trees that can be shared, so say decls etc. should keep using just location_t's without associated block. So perhaps the right test for gimple-low setting of block is CAN_HAVE_LOCATION_P (exp) && !tree_node_can_be_shared (exp). Jakub I kind of like this idea. What do you guys think? I question the need for BLOCK info on expression trees. If BLOCKs are relevant then the tree ends up referencing a declaration with a BLOCK as context, no? Thus, the case int tem, a; { int a; ... tem = a; } int b = tem + 5; where we may end up with gimple like b = a + 5; thus mixing two BLOCKs inside a stmt (and no expression trees to attach different BLOCKs) is no different from the case where we end up with expression trees. Thus my original question - why isn't a NULL BLOCK treated the same as UNKNOWN_LOCATION? Or rather, _why_ does Micha's patch not work? Did you analyze the guality fails? Thanks, Richard. Thanks, Dehao
Re: [PATCH] Replace const_vector with match_operand in sse.md
I changed the patch according to Uros' remarks. Please, have a look. Changelog: 2012-10-30 Andrey Turetskiy andrey.turets...@gmail.com * config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 to CODE_FOR_avx2_pmulhrswv16hi3. * config/i386/predicates.md (const1_operand): Extend for vectors. * config/i386/sse.md (ssse3_avx2): Extend. (ssedoublemode): Ditto. (sse2_avx2_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3, avx2_uavgv16hi3 and sse2_uavgv8hi3 into one. (*sse2_avx2_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3, *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one. (PMULHRSW): New. (ssse3_avx2_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3, ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one. (*avx2_pmulhrswv16hi3): Replace const_vector with match_operand. (*ssse3_pmulhrswv8hi3): Ditto. (*ssse3_pmulhrswv4hi3): Ditto. --- Best regards, Andrey Turetskiy On Wed, Oct 24, 2012 at 5:36 PM, Uros Bizjak ubiz...@gmail.com wrote: On Wed, Oct 24, 2012 at 3:01 PM, Andrey Turetskiy andrey.turets...@gmail.com wrote: On Tue, Oct 23, 2012 at 2:45 PM, Andrey Turetskiy andrey.turets...@gmail.com wrote: Hi, This patch replaces large const_vector constructions with match_operand in sse.md to decrease its size. Is it ok? No, you don't have to touch the generic expand machinery. In the expander, use (match_dup X), and declare operands[X] = CONST1_RTX (..some_mode..) in the preparation statement. In unnamed patterns, use the const1_operand operand predicate. You should extend the existing const1_operand in the same way as const0_operand. This approach is not compatible with named insn patterns, which duplicate its functionality as pattern matcher and as an expander. Uros. const_vector_replace.patch Description: Binary data
Re: [PATCH] Fix debug info for expr and jump stmt
Hi, On Tue, 30 Oct 2012, Richard Biener wrote: On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote: And tree expressions don't have TREE_BLOCK before gimple-low either. So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple stmts as well as all expression in the operands. That would be a step away from the ideal situation, so we rather should first analyze why the testcase fails with my patch. I expected some fallout and am actually surprised it's only one testcase :) What we should end up with in the ideal world is that we simply have no expressions in gimple (and hence no place to store any locations for them), except via gimple statements. I question the need of BLOCK info on expression trees. If BLOCKs are relevant then the tree ends up referencing a declaration with a BLOCK as context, no? Thus, the case int tem, a; { int a; ... tem = a; } int b = tem + 5; where we may end up with gimple like b = a + 5; thus mixing two BLOCKs inside a stmt (and no expression trees to attach different BLOCKs) is no different from the case where we end up with expression trees. Thus my original question - why isn't a NULL BLOCK treated the same as UNKNOWN_LOCATION? Since merging location and block a null BLOCK doesn't imply no location. It can very well have a location without a block. What we might want to imply is that a null BLOCK implies the BLOCK from the statement. But as you say, first we should find out why my patch breaks the one testcase. Or rather, _why_ does Michas patch not work? Did you analyze the guality fails? Ciao, Michael.
Re: [PR54693] loss of debug info in jump threading and loop ivopts
On Fri, Oct 26, 2012 at 8:30 AM, Alexandre Oliva aol...@redhat.com wrote: Both jump threading and loop induction variable optimizations were dropping useful debug information, and it took improvements in both for debug info about relevant variables in the enclosed testcase to survive all the way to the end. The first problem was that jump threading could bypass blocks containing debug stmts, losing the required bindings. The solution was to propagate bypassed debug insns into the target of the bypass. Even if the new confluence ends up introducing new PHI nodes, the propagated debug binds will resolve to them, just as intended. This is implemented in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch The second part of the problem was that, when performing loop ivopts, we'd often drop PHI nodes and other SSA names for unused ivs. Although we had code to propagate SSA DEFs into debug uses upon their removal, this couldn't take care of PHI nodes (no debug phis or conditional debug binds), and it was precisely at the removal of a PHI node that we dropped debug info for the loop in the provided testcase. Once Jakub figured out how to compute an unused iv out of available ivs (thanks!), it was easy to introduce debug temps with the expressions to compute them, so that debug binds would remain correct as long as the unused iv can still be computed out of the others. (If it can't, we'll still try propagation, but may end up losing at the PHI nodes). 
I had thought that replacing only the PHI nodes would suffice, but it turned out that replacing all unused iv defs with their equivalent used-IV expressions got us better coverage, so this is what the 5th patch below does: vta-ivopts-replace-dropped-phis-pr54693.patch + if (count > 1) + { + tree vexpr = make_node (DEBUG_EXPR_DECL); + DECL_ARTIFICIAL (vexpr) = 1; + TREE_TYPE (vexpr) = TREE_TYPE (comp); + if (SSA_NAME_VAR (def)) + DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def)); + else + DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr)); simply always use the TREE_TYPE path. TYPE_MODE is always valid for SSA_NAMEs. + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def) + { + if (!gimple_debug_bind_p (stmt)) + continue; + + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + SET_USE (use_p, comp); + + if (!comp) + BREAK_FROM_IMM_USE_STMT (imm_iter); how does comp magically become NULL_TREE here? Btw, what's all the fuzz with IV candidates, etc? At least for non-PHIs I don't see why the regular release_ssa_name way of doing things does not work. IVOPTs is slow enough ... Richard. Regression testing revealed -fcompare-debug regressions exposed by these patches. x86-specific code introduces pre-reload scheduling dependencies between calls and likely-spilled parameter registers, but it does so in a way that's AFAICT buggy, and fragile to the presence of debug insns at the top of a block: we won't introduce a dependency for the first insn of the block, even if we'd rather have such a dependency. This fails to achieve the intended effect, and it also causes codegen differences when the first insn in the block happens to be a debug insn, for then we will add the intended dependency. The first patch below, vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug insns, so as to remove the difference, and moves the end of the backward scan to the insn before the first actual insn, so that we don't refrain from considering it for dependencies.
This in turn required an additional test to make sure we wouldn't go past the nondebug head if first_arg happened to be the head. The introduction of debug insns at new spots also exposed a bug in loop unrolling: we'd unroll a loop a different number of times depending on whether or not its latch is an empty block. The propagation or introduction of debug insns in previously-empty latch blocks caused loops to be unrolled a different number of times depending on the presence of the debug insns, which triggered -fcompare-debug errors. The fix was to count only nondebug insns towards the decision on whether the latch block was empty. This is implemented in the second patch below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch Guality testsuite regressions given the patches above revealed that the fast DCE global dead debug substitution introduced for PR54551 was not correct: it was possible for us to visit, for the last time, a block with a REG used in debug stmts after its death before we visited the block with the debug use. As a result, we'd fail to emit a debug temp at the not-revisited block, and the debug temp bindings introduced at other blocks might be insufficient
Non-dominating loop bounds in tree-ssa-loop-niter 1/4
Hi, this is the first patch of a change to tree-ssa-loop-niter to consider bounds that are not in a block dominating the latch. This patch makes them be recorded, but they are not used yet. I plan to followup with: 1) patch to add simple shortest path walk at the end of estimate_numbers_of_iterations_loop determining bound of i.e. int a[10]; int b[10]; for (i=0;i<n;i++) if (t()) q(a[i]); else q(a[i]); 2) make complete loop unrolling kill all statements known to not be executed in the last iteration by __builtin_unreachable to silence part of -Warray-bounds warnings currently breaking bootstrap with -O3 3) make duplicate_loop_to_header_edge in peeling mode do the same, silencing the rest of the warnings 4) make branch prediction code drop the prediction on exits not dominating the latch. Bootstrapped/regtested x86_64-linux, OK? * tree-ssa-loop-niter.c (number_of_iterations_exit): New parameter EVERY_ITERATION with implicit value of true. (record_estimate): Check dominance relationship of the basic block we are estimating on instead of relying on UPPER to be false. (struct ilb_data): Drop RELIABLE. (idx_infer_loop_bounds): Update. (infer_loop_bounds_from_ref): Drop parameter RELIABLE. (infer_loop_bounds_from_array): Drop parameter RELIABLE. (infer_loop_bounds_from_undefined): Update comments and handling of RELIABLE. (estimate_numbers_of_iterations_loop): Record all bounds. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192986) +++ tree-ssa-loop-niter.c (working copy) @@ -1793,12 +1793,15 @@ loop_only_exit_p (const struct loop *loo meaning described in comments at struct tree_niter_desc declaration), false otherwise. If WARN is true and -Wunsafe-loop-optimizations was given, warn if the optimizer is going to use - potentially unsafe assumptions. */ + potentially unsafe assumptions. + When EVERY_ITERATION is true, only tests that are known to be executed + every iteration are considered (i.e. only a test that alone bounds the loop).
+ */ bool number_of_iterations_exit (struct loop *loop, edge exit, struct tree_niter_desc *niter, - bool warn) + bool warn, bool every_iteration) { gimple stmt; tree type; @@ -1806,7 +1809,8 @@ number_of_iterations_exit (struct loop * enum tree_code code; affine_iv iv0, iv1; - if (!dominated_by_p (CDI_DOMINATORS, loop-latch, exit-src)) + if (every_iteration + !dominated_by_p (CDI_DOMINATORS, loop-latch, exit-src)) return false; niter-assumptions = boolean_false_node; @@ -2568,6 +2572,11 @@ record_estimate (struct loop *loop, tree loop-bounds = elt; } + /* If statement is executed on every path to the loop latch, we can directly + infer the upper bound on the # of iterations of the loop. */ + if (!dominated_by_p (CDI_DOMINATORS, loop-latch, gimple_bb (at_stmt))) +return; + /* Update the number of iteration estimates according to the bound. If at_stmt is an exit then the loop latch is executed at most BOUND times, otherwise it can be executed BOUND + 1 times. We will lower the estimate @@ -2651,7 +2660,6 @@ struct ilb_data { struct loop *loop; gimple stmt; - bool reliable; }; static bool @@ -2660,7 +2668,7 @@ idx_infer_loop_bounds (tree base, tree * struct ilb_data *data = (struct ilb_data *) dta; tree ev, init, step; tree low, high, type, next; - bool sign, upper = data-reliable, at_end = false; + bool sign, upper = true, at_end = false; struct loop *loop = data-loop; if (TREE_CODE (base) != ARRAY_REF) @@ -2737,14 +2745,12 @@ idx_infer_loop_bounds (tree base, tree * STMT is guaranteed to be executed in every iteration of LOOP.*/ static void -infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref, - bool reliable) +infer_loop_bounds_from_ref (struct loop *loop, gimple stmt, tree ref) { struct ilb_data data; data.loop = loop; data.stmt = stmt; - data.reliable = reliable; for_each_index (ref, idx_infer_loop_bounds, data); } @@ -2753,7 +2759,7 @@ infer_loop_bounds_from_ref (struct loop executed in every iteration of LOOP. 
*/ static void -infer_loop_bounds_from_array (struct loop *loop, gimple stmt, bool reliable) +infer_loop_bounds_from_array (struct loop *loop, gimple stmt) { if (is_gimple_assign (stmt)) { @@ -2763,10 +2769,10 @@ infer_loop_bounds_from_array (struct loo /* For each memory access, analyze its access function and record a bound on the loop iteration domain. */ if (REFERENCE_CLASS_P (op0)) - infer_loop_bounds_from_ref (loop, stmt, op0, reliable); + infer_loop_bounds_from_ref (loop, stmt, op0); if (REFERENCE_CLASS_P
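The follow-up 1) example from the mail can be written out as a small compilable test case. This is a hypothetical reduction (function names `sum_guarded`, `t`-style guard replaced by a `flag` parameter are mine, not from the patch): the array accesses sit in conditional arms, so neither access's basic block dominates the loop latch, which is exactly the kind of bound the patch now records instead of discarding.

```c
/* Neither conditional arm dominates the loop latch, yet either access
   still bounds the loop to at most 10 iterations.  */
int a[10];
int b[10];

int sum_guarded (int n, int flag)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    if (flag)
      s += a[i];   /* bound from a[] does not dominate the latch */
    else
      s += b[i];   /* neither does the bound from b[] */
  return s;
}
```

A later shortest-path walk (follow-up 1 in the mail) could then conclude the loop iterates at most 10 times even though no single access dominates the latch.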
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 3:49 PM, Michael Matz m...@suse.de wrote: Hi, On Tue, 30 Oct 2012, Richard Biener wrote: On Tue, Oct 30, 2012 at 3:17 PM, Dehao Chen de...@google.com wrote: And tree expressions don't have TREE_BLOCK before gimple-low either. So IMNSHO it is gimple-low.c that should set TREE_BLOCK of all the gimple stmts as well as all expressions in the operands. That would be a step away from the ideal situation, so we rather should first analyze why the testcase fails with my patch. I expected some fallout and am actually surprised it's only one testcase :) What we should end up with in the ideal world is that we simply have no expressions in gimple (and hence no place to store any locations for them), except via gimple statements. I question the need of BLOCK info on expression trees. If BLOCKs are relevant then the tree ends up referencing a declaration with a BLOCK as context, no? Thus, the case int tem, a; { int a; ... tem = a; } int b = tem + 5; where we may end up with gimple like b = a + 5; thus mixing two BLOCKs inside a stmt (and no expression trees to attach different BLOCKs) is no different from the case where we end up with expression trees. Thus my original question - why isn't a NULL BLOCK treated the same as UNKNOWN_LOCATION? Since the merging of location and block, a null BLOCK doesn't imply no location. It can very well have a location without a block. What we might want to imply is that a null BLOCK implies the BLOCK from the statement. But as you say, first we should find out why my patch breaks the one testcase. Yes, I mean we happily leave the stmt line location the same if we have a stmt with UNKNOWN_LOCATION (thus it inherits that of the previous stmt); we should do the same with BLOCKs - if a stmt has a location with a NULL BLOCK then it should inherit the block info from the previous stmt. Richard. Or rather, _why_ does Micha's patch not work? Did you analyze the guality fails? Ciao, Michael.
Re: [PATCH] Inter-bb range test optimization (PRs tree-optimization/19105, tree-optimization/21643, tree-optimization/46309)
On Fri, Oct 26, 2012 at 7:27 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! This patch extends the optimize_range_tests optimization, so that it also handles the cases where the truth && or || has been gimplified as a series of GIMPLE_CONDs, or a mixture thereof and BIT_{AND,IOR}_EXPR stmts. Example of code it handles is e.g.:

<bb 2>:
v1_3 = a_2(D) != 3;
v2_4 = a_2(D) != 1;
v3_5 = a_2(D) != 4;
v4_6 = a_2(D) != 2;
v5_7 = a_2(D) != 7;
v6_8 = a_2(D) != 5;
v7_9 = a_2(D) != 8;
v8_10 = a_2(D) != 6;
_11 = v1_3 & v2_4;
if (_11 != 0) goto <bb 3>; else goto <bb 7>;

<bb 3>:
_13 = v3_5 & v4_6;
if (_13 != 0) goto <bb 4>; else goto <bb 7>;

<bb 4>:
_14 = v5_7 & v6_8;
if (_14 != 0) goto <bb 5>; else goto <bb 7>;

<bb 5>:
_15 = v7_9 & v8_10;
if (_15 != 0) goto <bb 6>; else goto <bb 7>;

or:

<bb 2>:
_3 = c_2(D) == 34;
_4 = c_2(D) == 32;
_5 = _3 | _4;
if (_5 != 0) goto <bb 4>; else goto <bb 3>;

<bb 3>:
_8 = c_2(D) <= 31;
_7 = (int) _8;

<bb 4>:
# _1 = PHI <_7(3), 1(2)>

It is implemented in reassociate_bb, as that is where all the infrastructure already is, but it isn't done as part of normal reassociation; rather, it runs before reassociating the first bb that represents a series of && or || operands. As post-dominator sons aren't particularly ordered, it can happen that the optimization is performed on the first, last or middle bb of the series of bbs, so it searches both forward and backward to find suitable basic blocks. The last bb in the series (last_bb) can be of the form of bb 3 in the second example, which is how certain sequences are gimplified when assigning a && or || result to a variable or returning it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok, but the code could really, really use some more comments - functions not fitting in my 80x24 terminal without seeing any comment about what happens here are a maintenance nightmare. Thanks, Richard.
2012-10-26  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/19105
PR tree-optimization/21643
PR tree-optimization/46309
* tree-ssa-reassoc.c (init_range_entry): Add STMT argument
and use it if EXP is NULL.
(update_range_test): Handle OPCODE equal to ERROR_MARK
and oe->op NULL.
(optimize_range_tests): Likewise.
(final_range_test_p, suitable_cond_bb, no_side_effect_bb, get_ops,
maybe_optimize_range_tests): New functions.
(reassociate_bb): Call maybe_optimize_range_tests if the last
stmt of a bb is a GIMPLE_COND that hasn't been visited yet.

* gcc.dg/pr19105.c: New test.
* gcc.dg/pr21643.c: New test.
* gcc.dg/pr46309-2.c: New test.
* gcc.c-torture/execute/pr46309.c: New test.

--- gcc/tree-ssa-reassoc.c.jj 2012-10-25 09:21:08.657049321 +0200
+++ gcc/tree-ssa-reassoc.c 2012-10-26 15:51:13.398025229 +0200
@@ -1,5 +1,5 @@
 /* Reassociation for trees.
-   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011
+   Copyright (C) 2005, 2007, 2008, 2009, 2010, 2011, 2012
    Free Software Foundation, Inc.
    Contributed by Daniel Berlin d...@dberlin.org
@@ -1713,10 +1713,12 @@ struct range_entry
 };

 /* This is similar to make_range in fold-const.c, but on top of
-   GIMPLE instead of trees.  */
+   GIMPLE instead of trees.  If EXP is non-NULL, it should be
+   an SSA_NAME and STMT argument is ignored, otherwise STMT
+   argument should be a GIMPLE_COND.  */

 static void
-init_range_entry (struct range_entry *r, tree exp)
+init_range_entry (struct range_entry *r, tree exp, gimple stmt)
 {
   int in_p;
   tree low, high;
@@ -1727,7 +1729,8 @@ init_range_entry (struct range_entry *r,
   r->strict_overflow_p = false;
   r->low = NULL_TREE;
   r->high = NULL_TREE;
-  if (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+  if (exp != NULL_TREE
+      && (TREE_CODE (exp) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (exp))))
     return;

   /* Start with simply saying EXP != 0 and then look at the code of EXP
@@ -1735,12 +1738,14 @@ init_range_entry (struct range_entry *r,
      happen, but it doesn't seem worth worrying about this.  We continue
      the outer loop when we've changed something; otherwise we break
      the switch, which will break the while.  */

-  low = build_int_cst (TREE_TYPE (exp), 0);
+  low = exp ? build_int_cst (TREE_TYPE (exp), 0) : boolean_false_node;
   high = low;
   in_p = 0;
   strict_overflow_p = false;
   is_bool = false;
-  if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
+  if (exp == NULL_TREE)
+    is_bool = true;
+  else if (TYPE_PRECISION (TREE_TYPE (exp)) == 1)
     {
       if (TYPE_UNSIGNED (TREE_TYPE (exp)))
	 is_bool = true;
@@ -1752,25 +1757,35 @@ init_range_entry (struct range_entry *r,
   while (1)
     {
-      gimple stmt;
       enum tree_code code;
       tree arg0, arg1, exp_type;
       tree nexp;
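The first GIMPLE dump above corresponds roughly to a chain of `!=` tests joined by `&&`. As a hypothetical sketch (function names are mine, and the exact output form is up to reassoc), the whole chain is semantically a single range test, which is the kind of form the optimization aims to produce:

```c
/* Source-level shape of the first GIMPLE example: a chain of != tests
   combined with &&, spread by gimplification across several basic
   blocks.  */
int all_outside (int a)
{
  return a != 3 && a != 1 && a != 4 && a != 2
      && a != 7 && a != 5 && a != 8 && a != 6;
}

/* Equivalent single range test: true iff a is outside [1, 8].
   The unsigned subtraction trick folds both bounds into one compare.  */
int all_outside_range (int a)
{
  return (unsigned) (a - 1) > 7u;
}
```

The two functions agree for every input, which is what lets the pass replace the multi-block chain with one comparison.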
Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_lenth attribute to support the ARC port)
On Sun, Oct 28, 2012 at 6:13 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Thus, you can allow the length to vary downwards as well as upwards across iterations with suitable definitions of the @code{length} attribute and/or @code{ADJUST_INSN_LENGTH}. Care has to be taken that this does not lead to infinite loops. I don't see that you can shrink length with just suitable lock_length and length attributes. I disagree there (for certain values of 'just'), but we can just agree to disagree on this point because... What seems to be the crucial difference is that you apply ADJUST_INSN_LENGTH after combining lock-length-max and length. But then you _decrease_ length with ADJUST_INSN_LENGTH ... Maybe what you really want is ADJUST_INSN_LENGTH_AFTER, which is applied afterwards? Thus, Well, actually, I found a number of problems with ADJUST_INSN_LENGTH: - it is not applied to the inner insns in a delay slot sequence. You can sort of work around this by stitching recursive calls together, but then you are faced with another problem: - You don't know what the length prior to ADJUST_INSN_LENGTH was. That was even worse with the non-unique uids, where you didn't know squat about the instructions in the delay slot. - As you said, I want the adjustment to happen after the maximum calculation. Well, usually. There are actually special cases where it would be useful to have some sort of maximum calculation in place, to break up alignment-driven cycles, but only applicable with a lesser priority than for the range-driven branch offset calculations. But adding yet another macro neither solves all these problems, nor would it align with our goal to move away from target macros.
I have now found an alternate way to make the ARC port terminate building its insn-attrtab.c - it involves using match_test "get_attr_length (insn) == 2" instead of eq_attr on [lock_]length with (const_int 2) - where I really want to know if the instruction was considered short in the previous iteration. So, I have made a patch to replace the ADJUST_INSN_LENGTH macro in final.c with an adjust_insn_length hook, for which a default implementation using ADJUST_INSN_LENGTH is provided in targhooks.c to provide for an easy transition. I've looked at the existing ports that use ADJUST_INSN_LENGTH, and some seem to prefer to return an adjustment to be added to the length, while others prefer to return the new length. The latter seemed to be slightly more numerous, so I went with that. The hook has two new parameters: - a flag that tells it if the insn in question is a delay sequence. The default hook implementation skips the invocation of ADJUST_INSN_LENGTH in this case for the sake of compatibility. - a pointer to int to set the number of iteration loops till the length locking feature is supposed to apply to this instruction length when using the increasing size calculations. The pointed-to value is initialized to zero, which means that length locking is always applied (assuming final.c uses the increasing length algorithm). Setting this to a higher number effectively gives the new instruction length a lower priority to be put into uid_lock_length. Note that "Care has to be taken that this does not lead to infinite loops." doesn't convince me that it is properly designed ;) With the hook mechanism, it is much harder to create an infinite loop inside shorten_branches. (It would involve something like setting iteration_threshold to MAX_INT and making the length decrease when niter is at MAX_INT, then letting integer overflow of niter take its course.
Making it impossible for a port maintainer to get things wrong is not a meaningful goal for GCC, but making it straightforward to get it right is.) This patch builds on: http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html Bootstrapped in revision 192879 on i686-pc-linux-gnu. Tested with config-list.mk on x86_64-unknown-linux-gnu. Apart from the iteration_threshold, the hookization would be straightforward. Now I cannot decipher from the patch what functional change it introduces ;) There are also only 10 users of ADJUST_INSN_LENGTH, not enough to warrant introducing a hook without removing the macro IMHO. Thanks, Richard.

2012-10-28  Joern Rennecke  joern.renne...@embecosm.com

* doc/tm.texi.in (@hook TARGET_ADJUST_INSN_LENGTH): Add.
* doc/tm.texi: Regenerate.
* final.c (get_attr_length_1): Use targetm.adjust_insn_length
instead of ADJUST_INSN_LENGTH.
(adjust_length): New function.
(shorten_branches): Use adjust_length instead of ADJUST_INSN_LENGTH.
* target.def (adjust_insn_length): New hook.
* targhooks.c (default_adjust_insn_length): New function.
* targhooks.h (default_adjust_insn_length): Declare.

diff -drup gcc-192879-haveattr/doc/tm.texi
Re: Non-dominating loop bounds in tree-ssa-loop-niter 1/4
On Tue, 30 Oct 2012, Jan Hubicka wrote: Hi, this is the first patch of a change to tree-ssa-loop-niter to consider bounds that are not in a block dominating the latch; the bounds are recorded but not yet used (full patch quoted upthread). Bootstrapped/regtested x86_64-linux, OK? Ok. Thanks, Richard.
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 03:38:11PM +0100, Richard Biener wrote: I question the need of BLOCK info on expression trees. If BLOCKs are relevant then the tree ends up referencing a declaration with a BLOCK as context, no? Thus, the case int tem, a; { int a; ... tem = a; } int b = tem + 5; where we may end up with gimple like b = a + 5; thus mixing two BLOCKs inside a stmt (and no expression trees to attach different BLOCKs) is no different from the case where we end up with expression trees. IMHO either we don't use locations at all on tree expressions (thus no source location nor block), or both. A source location always has an associated block it is present in. Of course for shared trees we can't put there any block, as it can appear anywhere. Thus my original question - why isn't a NULL BLOCK treated the same as UNKNOWN_LOCATION? Or rather, _why_ does Micha's patch not work? Did you analyze the guality fails? Micha's patch is degrading debug info quality. Whenever some expression has a different source location from the source location of the gimple stmt, then assuming that other source location is from the same block is wrong; it could very well be from a different one. On the testcase that fails with Micha's patch, we have:

[pr43479.c : 8:4] l_2 = l_1(D) + 1;
[pr43479.c : 8:4] # DEBUG l => l_2
[pr43479.c : 10:9] # DEBUG h => n_3(D)
[pr43479.c : 12:11] # DEBUG i => k_4(D)
[pr43479.c : 13:8] k_5 = k_4(D) + 1;
[pr43479.c : 13:8] # DEBUG k => k_5
[pr43479.c : 17:11] # DEBUG j => m_6(D)
[pr43479.c : 18:8] m_7 = m_6(D) + 1;
[pr43479.c : 18:8] # DEBUG m => m_7
[pr43479.c : 22:3] __asm__ __volatile__("" : : "r" k_5, "r" l_2);
[pr43479.c : 23:3] __asm__ __volatile__("" : : "r" m_7, "r" n_3(D));

where line 8 is from the outer block in the source, line 10 from the middle block, lines 12/13 from the first innermost block, and lines 17/18 from the second innermost block. But all of the l_2, k_5 and m_7 setters are TERed, so everything is emitted when expanding the two asm statements, and with Micha's patch the block used is the block of the asm statement, while previously each TERed statement got its own block. I'd say either we should do the TREE_BLOCK setting on all non-shareable trees during gimple-low and clear the block (but then likely the whole location? it doesn't make sense to say in the debugger that something has a certain source location when you can't print variables declared in that location) if copying expressions for use elsewhere, outside of the containing function. Or say during gimplification or gimple-low.c simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION to make it clear we don't use it (wonder if that could affect debug info quality, perhaps not that much), but during expansion if creating trees from TERed stmts they need to be set back, or the current location/block adjusted accordingly. Jakub
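The lexical-block situation discussed in this thread can be sketched as a small hypothetical C function (names `f`, `tem` after Richard's example): after constant/copy propagation, the statement computing the final value mixes a value computed inside an inner lexical block with variables of the outer block, so no single BLOCK describes all operands of the combined gimple statement.

```c
/* tem is assigned inside an inner lexical block; the later use sits in
   the outer block.  Once gimple propagates the inner computation into
   the outer statement, one statement spans two BLOCKs.  */
int f (int n)
{
  int tem;
  {
    int a = n + 1;   /* declared in the inner block */
    tem = a;         /* may be propagated into the use below */
  }
  int b = tem + 5;   /* may become b = n + 6, mixing two BLOCKs */
  return b;
}
```

This is why a per-statement BLOCK alone cannot always describe where each operand's source location "lives".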
Re: [PATCH] PR c++/54955 - Fail to parse alignas expr at the beginning of a declaration
OK. Jason
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 4:03 PM, Jakub Jelinek ja...@redhat.com wrote: Micha's patch is degrading debug info quality. Whenever some expression has a different source location from that of the gimple stmt, assuming the other source location is from the same block is wrong (full analysis quoted upthread). So maybe TER (well, those looking up the stmt) should pick the location from the TERed statement properly then? Richard.
Re: [PR54693] loss of debug info in jump threading and loop ivopts
On Tue, Oct 30, 2012 at 03:51:19PM +0100, Richard Biener wrote:

+  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
+    {
+      if (!gimple_debug_bind_p (stmt))
+	 continue;
+
+      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+	 SET_USE (use_p, comp);
+
+      if (!comp)
+	 BREAK_FROM_IMM_USE_STMT (imm_iter);

how does comp magically become NULL_TREE here? Looks like a pasto to me, from the first loop. BTW, I'd also think that the first loop should set count = 2 if the debug stmt already has a non-trivial expression (i.e. the rhs isn't just the SSA_NAME), to force a debug temp in that case to avoid creating too large debug stmts. Btw, what's all the fuzz with IV candidates, etc.? At least for non-PHIs I don't see why the regular release_ssa_name way of doing things does not work. IVOPTs is slow enough ... Even if it is a non-PHI, release_ssa_name will replace it with the definition, and then on another release_ssa_name again and again, and finally either be lucky enough that some SSA_NAME stays (is an IV that has been kept), but more often you'll just reach the PHI node and end up with a long list of:

DEBUG D#7 => NULL
DEBUG D#6 => D#7
DEBUG D#5 => D#6
DEBUG D#4 => D#5
DEBUG D#3 => D#4
DEBUG D#2 => D#3
DEBUG D#1 => D#2

(the NULL because of the PHI). We don't need to find an optimal IV replacement, so the code tries just a couple of them (perhaps the 64 should be a param, and perhaps could be lower by default); it just helps if the expression is smaller (smaller debug info), and if it contains as few SSA_NAMEs as possible (then it is more likely it will actually be useful). Jakub
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote: So maybe TER (well, those looking up the stmt) should pick the location from the TERed statement properly then? Perhaps, but Micha's patch doesn't do that. But in that case IMHO it still would help to set all expr locations to UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore the locations. Jakub
Re: [PATCH] pass filtering for -fopt-info
On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai sing...@google.com wrote: As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html, I have added the -fopt-info pass filtering in the attached patch. The basic idea is that there are optimization pass groups and a user can selectively enable dumps for these group(s) via command-line syntax. Currently, I define the following optimization groups: 'loop', 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups. If a pass doesn't explicitly specify an optimization group, (denoted by OPTGROUP_NONE) then a group is assigned based on the pass type. These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'. Also there is a catch-all group, called 'optall'. The options for -fopt-info dumps verbosity remain 'optimized', 'missed', 'note', and 'all'. Since these two types of options, verbosity and optimization groups are non-overlapping, I have decided to freely mix them. Something like this works as expected, i.e., dump missed vectorizer info into vec.missed. gcc ... -fopt-info-vec-missed=vec.missed which is equivalent to gcc ... -fopt-info-missed-vec=vec.missed However, the order still matters, and it can be somewhat confusing. For example, gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt will dump missed and optimized vectorizer info into vec.opt, while no vec.miss is produced. This is due to the fact that the latter group specifier, 'vec' overrides the first one. However, the 'missed' and 'optimized' are both honored as there is no conflict there. This is somewhat confusing. Hopefully, this type of usage would not be common. What I'd expect from that would be both vec.miss and vec.opt being populated ... (thus go away from the duality of dump files to primary dump file plus a set of alternate dump files). I have updated the documentation to include -fopt-info examples, and added some details about -fopt-info command line conflicts. 
I like it overall, not sure if we want to pre-populate the OPTGROUP set too much at this point. Like what is 'tree' or 'rtl' to users? nothing I think. 'ipa' yes. 'lto'? sounds redundant with 'ipa' to me. 'omp'? we don't have any optimizations here. Thus please drop TREE, RTL, LTO and OMP for now. Otherwise I'm leaving it for comments from other folks. Thanks, Richard. Thanks, Sharad
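A hypothetical demo of the group syntax being discussed (file name `demo.c` and the exact note text are assumptions, not from the patch): compiling the function below with `gcc -O2 -c -fopt-info-vec-missed=vec.missed demo.c` would, under the proposed option, collect the vectorizer's missed-optimization notes in vec.missed. The loop-carried dependence is the kind of thing that shows up there, since it prevents vectorization.

```c
/* In-place prefix sum: a[i] depends on the value just written to
   a[i-1], so the vectorizer reports a missed optimization rather
   than vectorizing the loop.  */
void prefix_sum (int *a, int n)
{
  for (int i = 1; i < n; i++)
    a[i] += a[i - 1];   /* dependence on the previous iteration */
}
```

Such a test case also makes it easy to check the ordering behavior discussed above: passing both `-fopt-info-vec-missed=...` and `-fopt-info-vec-optimized=...` on one command line exercises the conflict rules.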
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote: So maybe TER (well, those looking up the stmt) should pick the location from the TERed statement properly then? Perhaps, but Micha's patch doesn't do that. But in that case IMHO it still would help to set all expr locations to UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore the locations. Yes indeed. Richard. Jakub
Re: PATCH: PR rtl-optimization/55093: [4.8 Regression] [x32] -maddress-mode=long failed
On 10/30/2012 06:34 AM, Richard Sandiford wrote: H.J. Lu hjl.to...@gmail.com writes: LRA has

  if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
    {
      rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;

      if (! replace_p)
	 {
	   offset += (ep->offset - ep->previous_offset);
	   offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
	 }

      if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
	 to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

Reload has

      rtx to_rtx = ep->to_rtx;
      offset += ep->offset;
      offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));

      if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
	 to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);

(gdb) call debug_rtx (ep->to_rtx)
(reg/f:DI 7 sp)
(gdb) call debug_rtx (ep->from_rtx)
(reg/f:DI 16 argp)
(gdb)

gen_lowpart returns (reg/f:DI 7 sp) for reload and (reg:SI 16 argp) for LRA. They are caused by

  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
      /* We should convert arg register in LRA after the elimination
	  if it is possible.  */
      && xregno == ARG_POINTER_REGNUM
      && ! lra_in_progress)
    return -1;

It doesn't work in this case. This testcase shows that LRA can't convert the arg register after the elimination. Here is a patch to remove the lra_in_progress check for ARG_POINTER_REGNUM. Tested on Linux/x86-64. OK to install? Thanks HJ. This looks good to me. As well as your testcase, I think it would be dangerous to reduce this kind of subreg during non-final elimination in cases where the argument pointer occupies more than one hard register (like avr IIRC). We could end up with something like ARG_POINTER_REGNUM+1, which wouldn't show up as an elimination register during the rest of LRA. It's important that we do get rid of the subreg during the final elimination stage, but I think alter_subreg already handles that case. Since this code is outside the LRA files: patch is OK if Vlad agrees. I added this code for a reason, probably to solve some target problems. So I am not sure, but let us try.
It is ok for me to commit the patch if there are no regressions on x86/x86-64.
Re: [PATCH] Replace const_vector with match_operand in sse.md
On Tue, Oct 30, 2012 at 3:47 PM, Andrey Turetskiy andrey.turets...@gmail.com wrote: I changed the patch according to Uros' remarks. Please, have a look. Changelog:

2012-10-30  Andrey Turetskiy  andrey.turets...@gmail.com

* config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3
to CODE_FOR_avx2_pmulhrswv16hi3.
* config/i386/predicates.md (const1_operand): Extend for vectors.
* config/i386/sse.md (ssse3_avx2): Extend.
(ssedoublemode): Ditto.
(<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3,
avx2_uavgv16hi3 and sse2_uavgv8hi3 into one.
(*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3,
*avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one.
(PMULHRSW): New.
(<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3,
ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one.
(*avx2_pmulhrswv16hi3): Replace const_vector with match_operand.
(*ssse3_pmulhrswv8hi3): Ditto.
(*ssse3_pmulhrswv4hi3): Ditto.

+{
+  ix86_fixup_binary_operands_no_copy (PLUS, <MODE>mode, operands);
+  operands[3] = CONST1_RTX (<MODE>mode);
+})

Please put the operands[3] initialization before the call to ix86_f_b_o_n_c. We don't want to pass uninitialized operands around.

+{
+  if (which_alternative == 0
+      && (<MODE>mode == V16QImode
+	   || <MODE>mode == V8HImode))
+    return "pavg<ssemodesuffix>\t{%2, %0|%0, %2}";
+  return "vpavg<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}";
+}
+  [(set (attr "isa")
+     (if_then_else
+	(match_test "which_alternative == 0
+		      && (<MODE>mode == V16QImode || <MODE>mode == V8HImode)")
+	(const_string "noavx")
+	(const_string "avx")))
+   (set (attr "prefix_data16")
+     (if_then_else (eq_attr "isa" "noavx")
+	(const_string "1")
+	(const_string "*")))
+   (set (attr "prefix")
+     (if_then_else (eq_attr "isa" "noavx")
+	(const_string "orig")
+	(const_string "vex")))

Uh, oh. Please note that the isa attribute enables or disables the alternative through the enabled attribute.
Just change the mode attribute to <sseinsnmode> and everything will magically work: - AVX2 implies AVX, so it enables alternative 1, while disabling alternative 0 (and vice versa when AVX is disabled through the noavx isa attribute). - Correct modes are conditionally enabled via the VI12_AVX2 iterator. - The base ISA level is enabled via the insn predicate (TARGET_SSE2). You have to track three dependent conditions to calculate how the insn pattern/mode/operand predicate are enabled ;) Uros.
Re: [PATCH] pass filtering for -fopt-info
On Tue, Oct 30, 2012 at 1:21 AM, Sharad Singhai sing...@google.com wrote: As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html, I have added the -fopt-info pass filtering in the attached patch. The basic idea is that there are optimization pass groups and a user can selectively enable dumps for these group(s) via command-line syntax. Currently, I define the following optimization groups: 'loop', 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups. If a pass doesn't explicitly specify an optimization group, (denoted by OPTGROUP_NONE) then a group is assigned based on the pass type. These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'. I agree with Richard -- we don't need these implicit groups. Also there is a catch-all group, called 'optall'. Why is this needed? Isn't that the default? The options for -fopt-info dumps verbosity remain 'optimized', 'missed', 'note', and 'all'. Since these two types of options, verbosity and optimization groups are non-overlapping, I have decided to freely mix them. Something like this works as expected, i.e., dump missed vectorizer info into vec.missed. gcc ... -fopt-info-vec-missed=vec.missed which is equivalent to gcc ... -fopt-info-missed-vec=vec.missed However, the order still matters, and it can be somewhat confusing. For example, gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt will dump missed and optimized vectorizer info into vec.opt, while no vec.miss is produced. This is due to the fact that the latter group specifier, 'vec' overrides the first one. However, the 'missed' and 'optimized' are both honored as there is no conflict there. This is somewhat confusing. Hopefully, this type of usage would not be common. Please document that only one alt dump file per optimization-group is supported. I have updated the documentation to include -fopt-info examples, and added some details about -fopt-info command line conflicts. thanks, David Thanks, Sharad
Re: [PATCH] Fix debug info for expr and jump stmt
I'd say either we should do the TREE_BLOCK setting on all non-shareable trees during gimple-low and clear the block (but then likely the whole location? It doesn't make sense to say in the debugger that something has a certain source location when you can't print variables declared in that location) if copying expressions for use elsewhere, outside of the containing function. Or, say during gimplification or gimple-low.c, simply set t->exp.locus of all non-shareable expressions to UNKNOWN_LOCATION to make it clear we don't use it (I wonder if that could affect debug info quality, perhaps not that much), but during expansion, if creating trees from TERed stmts, they need to be set back, or the current location/block adjusted accordingly.

The debugger isn't the only consumer of debug info, and other tools might need a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work around a debug info size issue seems very short-sighted (to say the least). -- Eric Botcazou
[PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]
On 09/17/2012 12:54 PM, Florian Weimer wrote: On 09/17/2012 12:15 PM, Paolo Carlini wrote: Hi, On 09/17/2012 11:51 AM, Florian Weimer wrote: On 08/21/2012 12:37 PM, Florian Weimer wrote: I don't think there are any callers out there, but let's fix this for completeness. A compiler emitting code to call this function would still have to perform overflow checks for the new T[n][m] case, so this interface is not as helpful as it looks at first glance. Tested on x86_64-redhat-linux-gnu. Ping? This function is apparently used by compilers based on the EDG front end, so it's not actually dead. Being code that touches the library, the patch should go to the libstdc++ mailing list too. Likewise the testcase should be in the libstdc++ testsuite, I guess. Oh, I thought that this wouldn't apply to internal C++ support code. Sorry. That said, I didn't really follow the details of your recent work. Who did? Jason? I would gently ping the same maintainer. Indeed, Jason reviewed that. Cc:ing. Ping? Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html Thanks, Florian -- Florian Weimer / Red Hat Product Security Team
Re: [PATCH] pass filtering for -fopt-info
On Tue, Oct 30, 2012 at 8:28 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 30, 2012 at 9:21 AM, Sharad Singhai sing...@google.com wrote: As per discussion in http://gcc.gnu.org/ml/gcc/2012-10/msg00225.html, I have added the -fopt-info pass filtering in the attached patch. The basic idea is that there are optimization pass groups and a user can selectively enable dumps for these group(s) via command-line syntax. Currently, I define the following optimization groups: 'loop', 'lto', 'inline', 'vec', and 'omp'. A pass can be in multiple groups. If a pass doesn't explicitly specify an optimization group, (denoted by OPTGROUP_NONE) then a group is assigned based on the pass type. These three are the obvious implicit groups: 'tree', 'ipa', and 'rtl'. Also there is a catch-all group, called 'optall'. The options for -fopt-info dumps verbosity remain 'optimized', 'missed', 'note', and 'all'. Since these two types of options, verbosity and optimization groups are non-overlapping, I have decided to freely mix them. Something like this works as expected, i.e., dump missed vectorizer info into vec.missed. gcc ... -fopt-info-vec-missed=vec.missed which is equivalent to gcc ... -fopt-info-missed-vec=vec.missed However, the order still matters, and it can be somewhat confusing. For example, gcc -fopt-info-vec-missed=vec.miss -fopt-info-vec-optimized=vec.opt will dump missed and optimized vectorizer info into vec.opt, while no vec.miss is produced. This is due to the fact that the latter group specifier, 'vec' overrides the first one. However, the 'missed' and 'optimized' are both honored as there is no conflict there. This is somewhat confusing. Hopefully, this type of usage would not be common. What I'd expect from that would be both vec.miss and vec.opt being populated ... (thus go away from the duality of dump files to primary dump file plus a set of alternate dump files). 
I have updated the documentation to include -fopt-info examples, and added some details about -fopt-info command line conflicts.

I like it overall, not sure if we want to pre-populate the OPTGROUP set too much at this point. Like, what is 'tree' or 'rtl' to users? Nothing, I think. 'ipa' yes. 'lto'? Sounds redundant with 'ipa' to me. 'omp'? We don't have any optimizations here.

OMP is a high-level transformation, and it seems to be a good candidate group, but this part does not need to be designed now. For instance, there are a bunch of FDO-related transformations (indirect call promotion, memcpy transformation, etc.), and coverage mismatch notes, etc., which are good candidates to be filtered. thanks, David

Thus please drop TREE, RTL, LTO and OMP for now. Otherwise I'm leaving it for comments from other folks. Thanks, Richard. Thanks, Sharad
Re: RFA: hookize ADJUST_INSN_LENGTH (Was: RFA: Add lock_length attribute to support the ARC port)
Quoting Richard Biener richard.guent...@gmail.com: Apart from the iteration_threshold the hookization would be straight-forward. Now I cannot decipher from the patch what functional change it introduces ;)

The only change occurs if we reach an iteration count of MAX_INT iterations, which should already be indicative of a problem. At the MAX_INTth iteration, we will apply the length locking logic to instructions inside a delay slot sequence as well. If we disregard this exceptional case, there should be no functional changes unless someone defines TARGET_ADJUST_INSN_LENGTH.

uid_lock_length gets allocated with XCNEWVEC. So, in particular, the elements corresponding to instructions inside delay slot sequences are initialized to zero. As the default hook sets *iter_threshold to MAX_INT when inside a delay sequence, and doesn't change the length, the max operation with uid_lock_length is a no-op, and *niter < iter_threshold is true, hence a length change results in updating the length immediately, without changing uid_lock_length.

In the case that we are not inside a delay slot sequence, the default hook leaves iter_threshold as 0, and applies ADJUST_INSN_LENGTH. Thus, when niter is 0 or larger, as is the case for the ordinary looping operation, we always find *niter >= iter_threshold, and thus apply the length locking mechanism. If we are in the preliminary pass, or doing the single !increasing iteration, niter is set to -1, so *niter < iter_threshold is always true, so again we update the length immediately.

There are also only 10 users of ADJUST_INSN_LENGTH, not enough to warrant introducing a hook without removing the macro IMHO. Ok, I can prepare a patch for that, even though it makes it even a bit less obvious that there's no functional change. What about the preparatory patch http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02527.html ? Can I check that in now?
Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]
Hi, Florian Weimer fwei...@redhat.com wrote: Ping? Patch is at: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01416.html

Sorry, I don't know the code well enough to review your patch, but since I'm in CC, I still don't understand why, instead of adding a full libstdc++ testcase, you are extending a C++ testcase, in old-deja even, normally considered legacy. Paolo
[PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures
Hello, Hot/cold partitioning is apparently a hot topic all of a sudden, which is a good thing of course, because it's in need of some TLC. The attached patch adds another check to the RTL cfg checking (verify_flow_info) for the partitioning: a hot block can never be dominated by a cold block (because a block dominated by a cold block must itself be cold). This trips in PR55121. I haven't tested this with any profiling tests, but it's bound to break things. From my POV, whatever gets broken by this patch was already broken to begin with :-) If you're in CC, it's because I hope you can help test this patch. Downside of this patch is that I need dominance info. If it's not available, I compute and free it. I'm not sure if this works if dominance info status is DOM_NO_FAST_QUERY, and I don't want to recompute in this case because IMHO a verifier should be a no-op from the POV of the rest of the compiler, and updating dominators would make this patch a not-a-no-op :-) Thoughts/comments? Ciao! Steven

	* cfgrtl.c (rtl_verify_flow_info_1): Verify that blocks in the hot
	partition are not dominated by blocks in the cold partition.

Index: cfgrtl.c
===
--- cfgrtl.c	(revision 191819)
+++ cfgrtl.c	(working copy)
@@ -2033,6 +2033,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2145,6 +2146,8 @@ rtl_verify_flow_info_1 (void)
	  n_eh++;
	else if (e->flags & EDGE_ABNORMAL)
	  n_abnormal++;
+
+	have_partitions |= is_crossing;
       }
 
     if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2263,6 +2268,40 @@ rtl_verify_flow_info_1 (void)
	}
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+	VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+	calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+	{
+	  bb = VEC_pop (basic_block, bbs_in_cold_partition);
+	  if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+	    {
+	      error ("non-cold basic block %d dominated "
+		     "by a block in the cold partition", bb->index);
+	      err = 1;
+	    }
+	  for (son = first_dom_son (CDI_DOMINATORS, bb);
+	       son;
+	       son = next_dom_son (CDI_DOMINATORS, son))
+	    VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+	}
+
+      if (dom_calculated_here)
+	free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
Re: [PATCH] Fix debug info for expr and jump stmt
On Tue, Oct 30, 2012 at 8:29 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Oct 30, 2012 at 4:21 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Oct 30, 2012 at 04:15:38PM +0100, Richard Biener wrote: So maybe TER (well, those looking up the stmt) should pick the location from the TERed statement properly then? Perhaps, but Micha's patch doesn't do that. But in that case IMHO it still would help to set all expr locations to UNKNOWN_LOCATION during gimple lowering, to make it clear we ignore the locations. Yes indeed.

I agree, but this looks like too bold a move at this point. Shall we do that in 4.8? BTW, I updated the patch to ensure pr43479.c works fine. The testing is still on-going. Dehao

gcc/ChangeLog: 2012-10-25 Dehao Chen de...@google.com

	* tree-eh.c (do_return_redirection): Set location for jump statement.
	(do_goto_redirection): Likewise.
	(frob_into_branch_around): Likewise.
	(lower_try_finally_nofallthru): Likewise.
	(lower_try_finally_copy): Likewise.
	(lower_try_finally_switch): Likewise.
	* expr.c (store_expr): Use current insn location instead of expr
	location.
	(expand_expr_real): Likewise.
	(expand_expr_real_1): Likewise.

gcc/testsuite/ChangeLog: 2012-10-25 Dehao Chen de...@google.com

	* g++.dg/debug/dwarf2/block.C: New testcase.

Index: testsuite/g++.dg/debug/dwarf2/block.C
===
--- testsuite/g++.dg/debug/dwarf2/block.C	(revision 0)
+++ testsuite/g++.dg/debug/dwarf2/block.C	(revision 0)
@@ -0,0 +1,29 @@
+// Compiler should not generate too many lexical blocks for this function.
+// { dg-do compile { target { i?86-*-* x86_64-*-* } } }
+// { dg-options "-O0 -fno-exceptions -g -dA" }
+
+union UElement {
+  void* pointer;
+  int integer;
+};
+struct UColToken {
+  unsigned source;
+  unsigned char **rulesToParseHdl;
+};
+
+int uhash_hashTokens(const union UElement k)
+{
+  int hash = 0;
+  struct UColToken *key = (struct UColToken *)k.pointer;
+  if (key != 0) {
+    int len = (key->source & 0xFF000000) >> 24;
+    int inc = ((len - 32) / 32) + 1;
+    const unsigned char *p = (key->source & 0x00FFFFFF)
+      + *(key->rulesToParseHdl);
+    const unsigned char *limit = p + len;
+    hash = *p + *limit;
+  }
+  return hash;
+}
+
+// { dg-final { scan-assembler-not "LBB10" } }

Index: tree-eh.c
===
--- tree-eh.c	(revision 192809)
+++ tree-eh.c	(working copy)
@@ -739,6 +739,7 @@ do_return_redirection (struct goto_queue_node *q,
   gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
@@ -758,6 +759,7 @@ do_goto_redirection (struct goto_queue_node *q, tr
   gimple_seq_add_seq (&q->repl_stmt, mod);
 
   x = gimple_build_goto (finlab);
+  gimple_set_location (x, q->location);
   gimple_seq_add_stmt (&q->repl_stmt, x);
 }
@@ -857,6 +859,7 @@ frob_into_branch_around (gimple tp, eh_region regi
       if (!over)
	over = create_artificial_label (loc);
       x = gimple_build_goto (over);
+      gimple_set_location (x, loc);
       gimple_seq_add_stmt (&cleanup, x);
     }
   gimple_seq_add_seq (&eh_seq, cleanup);
@@ -1085,6 +1088,7 @@ lower_try_finally_nofallthru (struct leh_state *st
	  emit_post_landing_pad (&eh_seq, tf->region);
 
	  x = gimple_build_goto (lab);
+	  gimple_set_location (x, gimple_location (tf->try_finally_expr));
	  gimple_seq_add_stmt (&eh_seq, x);
	}
     }
@@ -1223,6 +1227,7 @@ lower_try_finally_copy (struct leh_state *state, s
       tmp = lower_try_finally_fallthru_label (tf);
       x = gimple_build_goto (tmp);
+      gimple_set_location (x, tf_loc);
       gimple_seq_add_stmt (&new_stmt, x);
     }
@@ -1395,6 +1400,7 @@ lower_try_finally_switch (struct leh_state *state,
       tmp = lower_try_finally_fallthru_label (tf);
       x = gimple_build_goto (tmp);
+      gimple_set_location (x, tf_loc);
       gimple_seq_add_stmt (&switch_body, x);
     }
@@ -1423,6 +1429,7 @@ lower_try_finally_switch (struct leh_state *state,
       gimple_seq_add_stmt (&eh_seq, x);
 
       x = gimple_build_goto (finally_label);
+      gimple_set_location (x, tf_loc);
       gimple_seq_add_stmt (&eh_seq, x);
 
       tmp = build_int_cst (integer_type_node, eh_index);

Index: expr.c
===
--- expr.c	(revision 192809)
+++ expr.c	(working copy)
@@ -5030,7 +5030,7 @@ store_expr (tree exp, rtx target, int call_param_p
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
-  location_t loc = EXPR_LOCATION (exp);
+  location_t loc = curr_insn_location ();
 
   if (VOID_TYPE_P (TREE_TYPE (exp)))
     {
@@ -7869,31 +7869,7 @@ expand_expr_real (tree exp, rtx target, enum machi
   return ret ? ret : const0_rtx;
 }
 
-  /* If this
Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
On Tue, Oct 30, 2012 at 8:49 AM, Matthew Gretton-Dann wrote: On 30 October 2012 05:20, Teresa Johnson wrote: Index: cfgrtl.c === --- cfgrtl.c(revision 192692) +++ cfgrtl.c(working copy) @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (BB_PARTITION (a) != BB_PARTITION (b)) + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) + || BB_PARTITION (a) != BB_PARTITION (b)) return false; /* Protect the loop latches. */ @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (BB_PARTITION (a) != BB_PARTITION (b)) + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) + || BB_PARTITION (a) != BB_PARTITION (b)) return false; /* Protect the loop latches. */ As this if() condition seems to be the canonical way to detect being in a different partition should it be moved out into a query function, and all of cfgrtl.c updated to use it? Not just in cfgrtl.c but for example also in ifcvt.c (which currently only tests for notes, that's broken). Ciao! Steven
Re: [PATCH] Fix debug info for expr and jump stmt
BTW, one thing I found confusing is that in expr.c, some code is for the frontend, while some is for RTL. Shall we separate them into two files? And we don't expect to see EXPR_LOCATION on the RTL side. Thanks, Dehao
Re: [PING^2] [C++ PATCH] Add overflow checking to __cxa_vec_new[23]
On 10/30/2012 05:17 PM, Paolo Carlini wrote: Sorry, I don't know the code well enough to review your patch, but since I'm in CC, I still don't understand why, instead of adding a full libstdc++ testcase, you are extending a C++ testcase, in old-deja even, normally considered legacy.

AFAIK, this is the only place we have such a test. I suppose I could put it into testsuite/18_support, but I would have to duplicate a bit of the machinery of the original test case because I can't just write a class and take the address of its constructor and destructor (whose addresses are passed to the tested functions). I really didn't want to do that because there are some platform dependencies (the __ARM_EABI__ #ifdef). Not sure if this makes sense, but those were my reasons. -- Florian Weimer / Red Hat Product Security Team
Re: [PATCH] Fix debug info for expr and jump stmt
The debugger isn't the only consumer of debug info, and other tools might need a finer granularity than a GIMPLE location, so clearing EXPR_LOCATION to work around a debug info size issue seems very short-sighted (to say the least).

Hi Eric, There might be some misunderstanding here. Clearing EXPR_LOCATION is not a workaround to reduce debug size. It is aimed at making the GCC implementation cleaner. And before resetting it, we need to truly make sure nothing after gimple-low depends on it. Please let me know if you have other concerns. Thanks, Dehao -- Eric Botcazou
[PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when vectorization is enabled on powerpc64-linux. A reduced test case is:

bergner@bns:~/gcc/BUGS cat foo.i
static void (*const init_array []) (void)
  __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used))
  = { 0 };
bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o bad.s
bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i -o good.s
bergner@bns:~/gcc/BUGS diff -u bad.s good.s
--- bad.s	2012-10-30 10:41:15.0 -0500
+++ good.s	2012-10-30 10:41:23.0 -0500
@@ -2,7 +2,7 @@
 	.section	.toc,"aw"
 	.section	".text"
 	.section	.init_array,"a"
-	.align 4
+	.align 3
 	.type	init_array, @object
 	.size	init_array, 8
 init_array:

The above is bad, because the extra alignment causes the linker to add some null padding to the init_array, and the loader isn't expecting that and ends up segv'ing. I'd like to backport Richard's patch below to the 4.7 branch. The patch bootstrapped and regtested on powerpc64-linux with no regressions. Is it ok for the 4.7 branch? Peter

	Backport from mainline
	2012-06-19  Richard Guenther  rguent...@suse.de

	PR tree-optimization/53708
	* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve
	user-supplied alignment and alignment of decls with the used
	attribute.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c	(revision 192988)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -4574,6 +4574,12 @@
   if (TREE_ASM_WRITTEN (decl))
     return false;
 
+  /* Do not override explicit alignment set by the user or the alignment
+     as specified by the ABI when the used attribute is set.  */
+  if (DECL_USER_ALIGN (decl)
+      || DECL_PRESERVE_P (decl))
+    return false;
+
   if (TREE_STATIC (decl))
     return (alignment <= MAX_OFILE_ALIGNMENT);
   else
Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures
On Tue, Oct 30, 2012 at 9:26 AM, Steven Bosscher stevenb@gmail.com wrote: Hello, Hot/cold partitioning is apparently a hot topic all of a sudden, which is a good thing of course, because it's in need of some TLC. The attached patch adds another check to the RTL cfg checking (verify_flow_info) for the partitioning: A hot block can never be dominated by a cold block (because the dominated block must also be cold). This trips in PR55121. I haven't tested this with any profiling tests, but it's bound to break things. From my POV, whatever gets broken by this patch was already broken to begin with :-) If you're in CC, it's because I hope you can help test this patch.

I will try testing your patch on top of mine with our fdo benchmarks. For the others on the cc list, you may need to include my patch as well for testing. Without it, -freorder-blocks-and-partition was DOA for me. For my patch, see http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html Teresa

Downside of this patch is that I need dominance info. If it's not available, I compute and free it. I'm not sure if this works if dominance info status is DOM_NO_FAST_QUERY, and I don't want to recompute in this case because IMHO a verifier should be a no-op from the POV of the rest of the compiler, and updating dominators would make this patch a not-a-no-op :-) Thoughts/comments? Ciao! Steven

-- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Fix debug info for expr and jump stmt
gcc/ChangeLog: 2012-10-25 Dehao Chen de...@google.com * tree-eh.c (do_return_redirection): Set location for jump statement. (do_goto_redirection): Likewise. (frob_into_branch_around): Likewise. (lower_try_finally_nofallthru): Likewise. (lower_try_finally_copy): Likewise. (lower_try_finally_switch): Likewise. * expr.c (store_expr): Use current insn location instead of expr location. (expand_expr_real): Likewise. (expand_expr_real_1): Likewise. gcc/testsuite/ChangeLog: 2012-10-25 Dehao Chen de...@google.com * g++.dg/debug/dwarf2/block.C: New testcase. This patch bootstrapped and passed gcc regression tests. Okay for trunk? Thanks, Dehao
Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:

Index: bb-reorder.c
===
--- bb-reorder.c	(revision 192692)
+++ bb-reorder.c	(working copy)
@@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
	first_partition = BB_PARTITION (bb);
       if (BB_PARTITION (bb) != first_partition)
	{
+	  /* There should be a barrier between text sections.  */
+	  emit_barrier_after (BB_END (bb->prev_bb));

So why isn't there one? There can't be a fall-through edge from one section to the other, so cfgrtl.c:fixup_reorder_chain should have added a barrier here already in the code under the comment:

  /* Now add jumps and labels as needed to match the blocks new
     outgoing edges.  */

Why isn't it doing that for you? BTW, something else I noted in cfgrtl.c: NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in duplicate_insn_chain, so the following is necessary for robustness:

Index: cfgrtl.c
===
--- cfgrtl.c	(revision 191819)
+++ cfgrtl.c	(working copy)
@@ -3615,7 +3615,6 @@
	      break;
 
	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
	      emit_note_copy (insn);
	      break;

There can be only one! One note to rule them all! etc.

Index: cfgrtl.c
===
--- cfgrtl.c	(revision 192692)
+++ cfgrtl.c	(working copy)
@@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (BB_PARTITION (a) != BB_PARTITION (b))
+  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
+      || BB_PARTITION (a) != BB_PARTITION (b))
     return false;

My dislike for this whole scheme just continues to grow... How can there be a REG_CROSSING_JUMP note if BB_PARTITION (a) == BB_PARTITION (b)? That is a bug. We should not need the notes here. As long as we have the CFG, BB_PARTITION (a) == BB_PARTITION (b) should be the canonical way to check whether two blocks are in the same partition, and the EDGE_CROSSING flag should be set iff an edge crosses from one section to another. The REG_CROSSING_JUMP note should only be used to see if a JUMP_INSN may jump to another section, without having to check all successor edges. Any place where we have to check BB_PARTITION or edge->flags & EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in the partitioning updating.

Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes, so slim RTL dumping breaks. I need this patchlet to make things work:

Index: sched-vis.c
===
--- sched-vis.c	(revision 191819)
+++ sched-vis.c	(working copy)
@@ -553,6 +553,11 @@
 {
   char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];
 
+  if (! x)
+    {
+      sprintf (buf, "(nil)");
+      return;
+    }
   switch (GET_CODE (x))
     {
     case SET:

Ciao! Steven
Re: [PATCH] Replace const_vector with match_operand in sse.md
Thanks for the explanation, I understand it. I fixed the issue which you marked. Changelog is unchanged. --- Best regards, Andrey Turetskiy On Tue, Oct 30, 2012 at 7:40 PM, Uros Bizjak ubiz...@gmail.com wrote:

const_vector_replace.patch Description: Binary data
Re: [patch,libgcc] m32r*rtems* add crtinit.o and crtfinit.o
On Tue, Oct 30, 2012 at 12:10 AM, Ralf Corsepius ralf.corsep...@rtems.org wrote: I would like to apply the patch below to trunk and the gcc-4.7 branch. This patch was originally submitted by Joel Sherrill back in May (http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01180.html), but had never received any feedback. It has been part of the rtems-gcc patches since then. You are an RTEMS maintainer; you don't need any additional approval to commit this patch to trunk. Ian
Re: [PATCH] Replace const_vector with match_operand in sse.md
On Tue, Oct 30, 2012 at 6:53 PM, Andrey Turetskiy andrey.turets...@gmail.com wrote: Thanks for the explanation, I understand it. I fixed the issue which you marked. The changelog is unchanged. I changed the patch according to Uros' remarks. Please, have a look. Changelog: 2012-10-30 Andrey Turetskiy andrey.turets...@gmail.com * config/i386/i386.c (bdesc_args): Rename CODE_FOR_avx2_umulhrswv16hi3 to CODE_FOR_avx2_pmulhrswv16hi3. * config/i386/predicates.md (const1_operand): Extend for vectors. * config/i386/sse.md (ssse3_avx2): Extend. (ssedoublemode): Ditto. (<sse2_avx2>_uavg<mode>3): Merge avx2_uavgv32qi3, sse2_uavgv16qi3, avx2_uavgv16hi3 and sse2_uavgv8hi3 into one. (*<sse2_avx2>_uavg<mode>3): Merge *avx2_uavgv32qi3, *sse2_uavgv16qi3, *avx2_uavgv16hi3 and *sse2_uavgv8hi3 into one. (PMULHRSW): New. (<ssse3_avx2>_pmulhrsw<mode>3): Merge avx2_umulhrswv16hi3, ssse3_pmulhrswv8hi3 and ssse3_pmulhrswv4hi3 into one. (*avx2_pmulhrswv16hi3): Replace const_vector with match_operand. Replace const_vector with const1_operand predicate. (*ssse3_pmulhrswv8hi3): Ditto. (*ssse3_pmulhrswv4hi3): Ditto. Yes, the patch is OK for mainline SVN. BTW: Probably, pmulhrsw insn patterns can be merged, too, but this can be a follow-up patch. Thanks, Uros.
Re: [patch] Unify bitmap interface.
On 10/30/12, Diego Novillo dnovi...@google.com wrote: On Oct 30, 2012 Bin.Cheng amker.ch...@gmail.com wrote: Just one question: Should we change the name of functions sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/ sbitmap_union_of_succs/sbitmap_union_of_preds too? It might be a little confusing that sbitmap_* is used among bitmap_*. Yes. Lawrence is proceeding with this unification in stages. The next few patches should rename these. Actually, I didn't know about them because they are auxiliary routines declared outside of the bitmap files. I've gone ahead and changed them as well. The only two sets of functions that will remain separate for now are the iterators and the bitmap creation routines, I think. Lawrence? The iterator functions have been unified, but not the iterator type names. -- Lawrence Crowl
Non-dominating loop bounds in tree-ssa-loop-niter 2/4
Hi, this patch implements the second part of the planned change - determining loop bounds by shortest path discovery. This allows bounding the number of iterations of loops whose bounds are in statements that do not dominate the latch. I originally planned to implement this as part of maybe_lower_iteration_bound, but I found doing it separately is more readable. While both perform a walk of the loop body, they do so for different reasons. discover_iteration_bound_by_body_walk walks from header to latch, while maybe_lower_iteration_bound walks from header to the first statement with side effects, looking whether there is an exit on the way. Both walks can be skipped for many loops, but each for different reasons. Bootstrapped/regtested x86_64-linux, OK? * tree-ssa-loop-niter.c (double_int_cmp, bound_index, discover_iteration_bound_by_body_walk): New functions. (discover_iteration_bound_by_body_walk): Use it. * gcc.dg/tree-ssa/loop-38.c: New testcase. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192989) +++ tree-ssa-loop-niter.c (working copy) @@ -2955,6 +2955,234 @@ gcov_type_to_double_int (gcov_type val) return ret; } +/* Compare double ints, callback for qsort. */ + +int +double_int_cmp (const void *p1, const void *p2) +{ + const double_int *d1 = (const double_int *)p1; + const double_int *d2 = (const double_int *)p2; + if (*d1 == *d2) +return 0; + if (d1->ult (*d2)) +return -1; + return 1; +} + +/* Return index of BOUND in BOUNDS array sorted in increasing order. + Lookup by binary search. */ + +int +bound_index (VEC (double_int, heap) *bounds, double_int bound) +{ + unsigned int end = VEC_length (double_int, bounds); + unsigned int begin = 0; + + /* Find a matching index by means of a binary search. 
*/ + while (begin != end) +{ + unsigned int middle = (begin + end) / 2; + double_int index = VEC_index (double_int, bounds, middle); + + if (index == bound) + return middle; + else if (index.ult (bound)) + begin = middle + 1; + else + end = middle; +} + gcc_unreachable (); +} + +/* Used to hold vector of queues of basic blocks below. */ +typedef VEC (basic_block, heap) *bb_queue; +DEF_VEC_P(bb_queue); +DEF_VEC_ALLOC_P(bb_queue,heap); + +/* We recorded loop bounds only for statements dominating loop latch (and thus + executed each loop iteration). If there are any bounds on statements not + dominating the loop latch we can improve the estimate by walking the loop + body and seeing if every path from loop header to loop latch contains + some bounded statement. */ + +static void +discover_iteration_bound_by_body_walk (struct loop *loop) +{ + pointer_map_t *bb_bounds; + struct nb_iter_bound *elt; + VEC (double_int, heap) *bounds = NULL; + VEC (bb_queue, heap) *queues = NULL; + bb_queue queue = NULL; + ptrdiff_t queue_index; + ptrdiff_t latch_index = 0; + pointer_map_t *visited; + + /* Discover what bounds may interest us. */ + for (elt = loop->bounds; elt; elt = elt->next) +{ + double_int bound = elt->bound; + + /* Exit terminates loop at given iteration, while non-exits produce undefined +effect on the next iteration. */ + if (!elt->is_exit) + { + bound += double_int_one; + /* Overflow, give up on this bound. */ + if (bound == double_int_zero) + continue; + } + + if (!loop->any_upper_bound + || bound.ult (loop->nb_iterations_upper_bound)) +VEC_safe_push (double_int, heap, bounds, bound); +} + + /* Exit early if there is nothing to do. */ + if (!bounds) +return; + + if (dump_file && (dump_flags & TDF_DETAILS)) +fprintf (dump_file, "Trying to walk loop body to reduce the bound.\n"); + + /* Sort the bounds in increasing order. 
*/ + qsort (VEC_address (double_int, bounds), VEC_length (double_int, bounds), +sizeof (double_int), double_int_cmp); + + /* For every basic block record the lowest bound that is guaranteed to + terminate the loop. */ + + bb_bounds = pointer_map_create (); + for (elt = loop->bounds; elt; elt = elt->next) +{ + double_int bound = elt->bound; + if (!elt->is_exit) + { + bound += double_int_one; + /* Overflow, give up on this bound. */ + if (bound == double_int_zero) + continue; + } + + if (!loop->any_upper_bound + || bound.ult (loop->nb_iterations_upper_bound)) + { + ptrdiff_t index = bound_index (bounds, bound); + void **entry = pointer_map_contains (bb_bounds, + gimple_bb (elt->stmt)); + if (!entry) + *pointer_map_insert (bb_bounds, +gimple_bb (elt->stmt)) = (void *)index; + else if ((ptrdiff_t)*entry > index) + *entry = (void *)index; + } +} +
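For readers following along, the kind of loop the body walk handles can be illustrated with a small hypothetical C testcase (this is not the new gcc.dg/tree-ssa/loop-38.c itself): neither array store dominates the latch, but every path through the loop body executes one of them, so together they bound the iteration count.

```c
#include <assert.h>

unsigned char a[128], b[128];

/* Neither store dominates the loop latch, but every path from header
   to latch goes through one of them, so each iteration is bounded by
   the array size even though no single bounded statement dominates.  */
int
walk (int n, int flag)
{
  int i, count = 0;
  for (i = 0; i < n; i++)
    {
      if (flag)
        a[i] = 1;
      else
        b[i] = 1;
      count++;
    }
  return count;
}
```

With the patch, the niter analysis should be able to derive an upper bound for such a loop from the body walk alone, since every header-to-latch path contains a bounded statement.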
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Tue, 2012-10-30 at 11:58 -0500, Peter Bergner wrote: I'm hitting the same bug as in PR53708 when compiling GLIBC's dlfcn.c when vectorization is enabled on powerpc64-linux. A reduced test case is: bergner@bns:~/gcc/BUGS cat foo.i static void (*const init_array []) (void) __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used)) = { 0 }; bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-base/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-4_7-base/gcc -S -m64 -O3 -maltivec foo.i -o bad.s bergner@bns:~/gcc/BUGS /home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-4_7-pr53708/gcc -S -m64 -O3 -maltivec foo.i -o good.s bergner@bns:~/gcc/BUGS diff -u bad.s good.s --- bad.s 2012-10-30 10:41:15.0 -0500 +++ good.s 2012-10-30 10:41:23.0 -0500 @@ -2,7 +2,7 @@ .section .toc,"aw" .section .text .section .init_array,"a" -.align 4 +.align 3 .type init_array, @object .size init_array, 8 init_array: The above is bad, because the extra alignment causes the linker to add some null padding to the init_array and the loader isn't expecting that and ends up segv'ing. I'd like to backport Richard's patch below to the 4.7 branch. The patch bootstrapped and regtested on powerpc64-linux with no regressions. Commenting on Richard's question from the bugzilla: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10 I suppose if __attribute__((__aligned__)) truly does just set a minimum alignment value (and the documentation seems to say that) and the compiler is free to arbitrarily increase it, then the GLIBC code to scan the init_array needs to be tolerant of null values in init_array. Does everyone agree with that assessment? Peter
Re: [PATCH] Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
On Tue, Oct 30, 2012 at 6:48 PM, Steven Bosscher stevenb@gmail.com wrote: On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote: Index: bb-reorder.c === --- bb-reorder.c (revision 192692) +++ bb-reorder.c (working copy) @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void) first_partition = BB_PARTITION (bb); if (BB_PARTITION (bb) != first_partition) { + /* There should be a barrier between text sections. */ + emit_barrier_after (BB_END (bb->prev_bb)); So why isn't there one? There can't be a fall-through edge from one section to the other, so cfgrtl.c:fixup_reorder_chain should have added a barrier here already in the code under the comment: /* Now add jumps and labels as needed to match the blocks new outgoing edges. */ Why isn't it doing that for you? Maybe it's because fix_up_fall_thru_edges calls force_nonfallthru, which is incorrectly inserting JUMP_INSNs and BARRIERs in cfglayout mode. I'm going to test this patch: Index: cfgrtl.c === --- cfgrtl.c (revision 192889) +++ cfgrtl.c (working copy) @@ -1511,16 +1511,17 @@ force_nonfallthru_and_redirect (edge e, #endif } set_return_jump_label (BB_END (jump_block)); + emit_barrier_after (BB_END (jump_block)); } - else + else if (current_ir_type () == IR_RTL_CFGRTL) { rtx label = block_label (target); emit_jump_insn_after_setloc (gen_jump (label), BB_END (jump_block), loc); JUMP_LABEL (BB_END (jump_block)) = label; LABEL_NUSES (label)++; + emit_barrier_after (BB_END (jump_block)); } - emit_barrier_after (BB_END (jump_block)); redirect_edge_succ_nodup (e, target); if (abnormal_edge_flags)
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote: Commenting on Richard's question from the bugzilla: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10 I suppose if __attribute__((__aligned__)) truly does just set a minimum alignment value (and the documentation seems to say that) and the compiler is free to arbitrarily increase it, then the GLIBC code to scan the init_array needs to be tolerant of null values in init_array. Does everyone agree with that assessment? I think the right test is for a user supplied section name, this approach is so common that I think we have to support it. E.g. Linux kernel, glibc, prelink all use this, gcc itself too (e.g. crtstuff.c). Jakub
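The "minimum, not exact" semantics under discussion can be seen in a small standalone sketch (hypothetical; it deliberately omits the section attribute so it can run anywhere): the attribute only guarantees a lower bound on alignment, which is why the compiler was free to emit .align 4 instead of .align 3.

```c
#include <assert.h>
#include <stdint.h>

/* aligned() requests a minimum alignment; the compiler may raise it
   further, e.g. for vectorized access, which is what introduced the
   padding GLIBC's init_array scan tripped over.  */
static void (*const fns[1]) (void)
  __attribute__ ((aligned (sizeof (void *)), used)) = { 0 };

int
pointer_aligned_p (void)
{
  /* We can only assert the guaranteed lower bound, not an exact value.  */
  return (uintptr_t) &fns % sizeof (void *) == 0;
}
```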
Re: Ping / update: RFA: replace #ifdef with if/#if for HAVE_ATTR_*
I can't approve the whole thing of course, but I like the idea. However... Joern Rennecke joern.renne...@embecosm.com writes: +@deftypevr {Target Hook} bool TARGET_HAVE_CC0 +@deftypevrx {Target Hook} {bool} TARGET_AUTO_INC_DEC +@deftypevrx {Target Hook} {bool} TARGET_STACK_REGS +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_ENABLED +@deftypevrx {Target Hook} {bool} TARGET_HAVE_ATTR_LENGTH +These flags are automatically generated; you should not override them in @file{tm.c}. +@end deftypevr Unless this goes against something already discussed, I think it'd be better to leave these out until there's a concerted attempt to use them somewhere. On its own this isn't even a partial transition. :-) + /* We make an exception here to provide stub definitions for + insn_*_length* / get_attr_enabled functions. */ + puts ("#if !HAVE_ATTR_length\n" + "extern int hook_int_rtx_0 (rtx);\n" + "#define insn_default_length hook_int_rtx_0\n" + "#define insn_min_length hook_int_rtx_0\n" + "#define insn_variable_length_p hook_int_rtx_0\n" + "#define insn_current_length hook_int_rtx_0\n" + "#include \"insn-addr.h\"\n" + "#endif\n"); I'd prefer defaults that call gcc_unreachable, rather than silently return an arbitrary value. That said, + "#if !HAVE_ATTR_enabled\n" + "extern int hook_int_rtx_1 (rtx);\n" + "#define get_attr_enabled hook_int_rtx_1\n" + "#endif\n"); I agree that 1 is a safe default here. Richard
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Tue, 2012-10-30 at 19:55 +0100, Jakub Jelinek wrote: On Tue, Oct 30, 2012 at 01:43:33PM -0500, Peter Bergner wrote: Commenting on Richard's question from the bugzilla: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53708#c10 I suppose if __attribute__((__aligned__)) truly does just set a minimum alignment value (and the documentation seems to say that) and the compiler is free to arbitrarily increase it, then the GLIBC code to scan the init_array needs to be tolerant of null values in init_array. Does everyone agree with that assessment? I think the right test is for a user supplied section name, this approach is so common that I think we have to support it. E.g. Linux kernel, glibc, prelink all use this, gcc itself too (e.g. crtstuff.c). Ok, then I'll bootstrap and regtest your suggested change while we wait for richi to comment. I'm fine with whatever you and richi decide is best. The ObjC guys should probably test it though too. I assume you think we should change the current trunk code as well? Peter
[committed] A global default for SLOW_UNALIGNED_ACCESS
This patch replaces three separate default definitions of SLOW_UNALIGNED_ACCESS with a single global one. Note that tm.texi requires SLOW_UNALIGNED_ACCESS to be true if STRICT_ALIGNMENT. Tested on x86_64-linux-gnu, powerpc64-linux-gnu and mipsisa64-elf. Applied as obvious. Richard gcc/ * defaults.h (SLOW_UNALIGNED_ACCESS): Provide default definition. * expmed.c (SLOW_UNALIGNED_ACCESS): Remove default definition. * expr.c (SLOW_UNALIGNED_ACCESS): Likewise. * lra-constraints.c (SLOW_UNALIGNED_ACCESS): Likewise. (simplify_operand_subreg): Don't check STRICT_ALIGNMENT here. Index: gcc/defaults.h === --- gcc/defaults.h 2012-08-02 21:10:06.0 +0100 +++ gcc/defaults.h 2012-10-28 10:30:47.340353996 + @@ -1218,6 +1218,10 @@ #define MINIMUM_ALIGNMENT(EXP,MODE,ALIGN #define ATTRIBUTE_ALIGNED_VALUE BIGGEST_ALIGNMENT #endif +#ifndef SLOW_UNALIGNED_ACCESS +#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT +#endif + /* For most ports anything that evaluates to a constant symbolic or integer value is acceptable as a constant address. */ #ifndef CONSTANT_ADDRESS_P Index: gcc/expmed.c === --- gcc/expmed.c 2012-10-28 10:25:12.0 + +++ gcc/expmed.c 2012-10-28 10:30:44.178354004 + @@ -69,11 +69,6 @@ static rtx expand_sdiv_pow2 (enum machin /* Test whether a value is zero of a power of two. */ #define EXACT_POWER_OF_2_OR_ZERO_P(x) (((x) & ((x) - 1)) == 0) -#ifndef SLOW_UNALIGNED_ACCESS -#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT -#endif - - /* Reduce conditional compilation elsewhere. */ #ifndef HAVE_insv #define HAVE_insv 0 Index: gcc/expr.c === --- gcc/expr.c 2012-10-25 10:08:06.0 +0100 +++ gcc/expr.c 2012-10-28 10:31:44.133353857 + @@ -189,12 +189,6 @@ #define STORE_BY_PIECES_P(SIZE, ALIGN) \ (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \ < (unsigned int) MOVE_RATIO (optimize_insn_for_speed_p ())) #endif - -/* SLOW_UNALIGNED_ACCESS is nonzero if unaligned accesses are very slow. 
*/ - -#ifndef SLOW_UNALIGNED_ACCESS -#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT -#endif /* This is run to set up which modes can be used directly in memory and to initialize the block move optab. It is run Index: gcc/lra-constraints.c === --- gcc/lra-constraints.c 2012-10-26 11:50:16.0 +0100 +++ gcc/lra-constraints.c 2012-10-28 10:32:02.499353813 + @@ -1105,10 +1105,6 @@ process_addr_reg (rtx *loc, rtx *before, return true; } -#ifndef SLOW_UNALIGNED_ACCESS -#define SLOW_UNALIGNED_ACCESS(mode, align) 0 -#endif - /* Make reloads for subreg in operand NOP with internal subreg mode REG_MODE, add new reloads for further processing. Return true if any reload was generated. */ @@ -1132,8 +1128,7 @@ simplify_operand_subreg (int nop, enum m address might violate the necessary alignment or the access might be slow. So take this into consideration. */ if ((MEM_P (reg) - && ((! STRICT_ALIGNMENT - && ! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))) + && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)) || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode))) || (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER)) {
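The invariant Richard notes can be demonstrated with a trivial sketch of the defaults.h idiom (STRICT_ALIGNMENT is hard-coded here purely for illustration): a port that defines neither macro now automatically satisfies the tm.texi requirement that SLOW_UNALIGNED_ACCESS be true whenever STRICT_ALIGNMENT is.

```c
/* Stand-in for a strict-alignment target; the real value comes from
   the target headers.  */
#define STRICT_ALIGNMENT 1

/* The defaults.h pattern added by the patch: only take effect when
   the port has not provided its own definition.  */
#ifndef SLOW_UNALIGNED_ACCESS
#define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
#endif

int
slow_unaligned_default (void)
{
  /* On a strict-alignment target the default is always true.  */
  return SLOW_UNALIGNED_ACCESS (0, 0);
}
```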
Re: User directed Function Multiversioning via Function Overloading (issue5752064)
On 10/27/2012 09:16 PM, Sriraman Tallam wrote: + /* See if there's a match. For functions that are multi-versioned, +all the versions match. */ if (same_type_p (target_fn_type, static_fn_type (fn))) - matches = tree_cons (fn, NULL_TREE, matches); + { + matches = tree_cons (fn, NULL_TREE, matches); + /* If versioned, push all possible versions into a vector. */ + if (DECL_FUNCTION_VERSIONED (fn)) + { + if (fn_ver_vec == NULL) + fn_ver_vec = VEC_alloc (tree, heap, 2); + VEC_safe_push (tree, heap, fn_ver_vec, fn); + } + } Why do we need to keep both a list and vector of the matches? +Call decls_match to make sure they are different because they are +versioned. */ + if (DECL_FUNCTION_VERSIONED (fn)) + { + for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match)) + if (decls_match (fn, TREE_PURPOSE (match))) + break; + } What if you have multiple matches that aren't all versions of the same function? Why would it be a problem to have two separate declarations of the same function? + dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec); Is the idea here that if you have some versions declared, then a call, then more versions declared, then another call, you will call two different dispatchers, where the first one will only dispatch to the versions declared before the first call? If not, why do we care about the set of declarations at this point? + /* Mark this functio to be output. */ + node->local.finalized = 1; Missing 'n' in function. @@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl) else break; } - name = DECL_ASSEMBLER_NAME (decl); + if (TREE_CODE (decl) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (decl)) + name = DECL_NAME (decl); This would mean that f in the global namespace and f in namespace foo would end up in the same comdat group. Why do we need special handling here at all? 
dump_function_name (tree t, int flags) { - tree name = DECL_NAME (t); + tree name; + /* For function versions, use the assembler name as the decl name is + the same for all versions. */ + if (TREE_CODE (t) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (t)) +name = DECL_ASSEMBLER_NAME (t); This shouldn't be necessary; we should print the target attribute when printing the function declaration. +Also, mark this function as needed if it is marked inline but +is a multi-versioned function. */ + if (((flag_keep_inline_functions + || DECL_FUNCTION_VERSIONED (fn)) This should be marked as needed by the code that builds the dispatcher. + /* For calls to a multi-versioned function, overload resolution + returns the function with the highest target priority, that is, + the version that will checked for dispatching first. If this + version is inlinable, a direct call to this version can be made + otherwise the call should go through the dispatcher. */ I'm a bit confused why people would want both dispatched calls and non-dispatched inlining; I would expect that if a function can be compiled differently enough on newer hardware to make versioning worthwhile, that would be a larger difference than the call overhead. + if (DECL_FUNCTION_VERSIONED (fn) + && !targetm.target_option.can_inline_p (current_function_decl, fn)) +{ + struct cgraph_node *dispatcher_node = NULL; + fn = get_function_version_dispatcher (fn); + if (fn == NULL) + return NULL; + dispatcher_node = cgraph_get_create_node (fn); + gcc_assert (dispatcher_node != NULL); + /* Mark this function to be output. */ + dispatcher_node->local.finalized = 1; +} Why do you need to mark this here? If you generate a call to the dispatcher, cgraph should mark it to be output automatically. + /* For candidates of a multi-versioned function, make the version with + the highest priority win. This version will be checked for dispatching + first. 
If this version can be inlined into the caller, the front-end + will simply make a direct call to this function. */ This is still too high in joust. I believe I said before that this code should come just above /* If the two function declarations represent the same function (this can happen with declarations in multiple scopes and arg-dependent lookup), arbitrarily choose one. But first make sure the default args we're using match. */ + /* For multiversioned functions, aggregate all the versions here for + generating the dispatcher body later if necessary. Check to see if + the dispatcher is already generated to avoid doing this more than + once. */ This caching
[0/8] Preparing for insv and ext(z)v optabs
I'm finishing off some patches to allow insv, extv and extzv to be defined as normal direct optabs (such as insvsi and insvdi rather than just insv). This series of patches does some groundwork to make that possible. The patches are not supposed to change the generated code. I checked that the assembly output for a set of gcc .ii files didn't change for x86_64-linux-gnu, powerpc64-linux-gnu and mips64-linux-gnu. Also regression-tested on x86_64-linux-gnu, powerpc64-linux-gnu and mips64-elf. Richard
[1/8] Remove redundant BLKmode test
This patch removes what I believe is a redundant check in store_bit_field_1 for whether the value to insert (i.e. the rhs) has BLKmode. We shouldn't see BLKmode values here, and even if we did, the only effect of the test is to fall through to store_fixed_bit_field, which can't handle BLKmode either. Specifically, store_fixed_bit_field would call: if (GET_MODE (value) != mode) value = convert_to_mode (mode, value, 1); and convert_to_mode ICEs on BLKmode values. Tested as described in the covering note. OK to install? Richard gcc/ * expmed.c (store_bit_field_1): Remove test for BLKmode values. Index: gcc/expmed.c === --- gcc/expmed.c 2012-10-28 10:40:22.533352589 + +++ gcc/expmed.c 2012-10-28 10:40:23.119352588 + @@ -670,7 +670,6 @@ store_bit_field_1 (rtx str_rtx, unsigned enum machine_mode op_mode = mode_for_extraction (EP_insv, 3); if (HAVE_insv - && GET_MODE (value) != BLKmode && bitsize > 0 && GET_MODE_BITSIZE (op_mode) >= bitsize /* Do not use insv for volatile bitfields when
[2/8] Remove redundant MAX_MACHINE_MODE tests
extract_bit_field_1 has a block beginning: /* If OP0 is a memory, try copying it to a register and seeing if a cheap register alternative is available. */ if (ext_mode != MAX_MACHINE_MODE && MEM_P (op0)) { and within it there are tests for whether ext_mode != MAX_MACHINE_MODE. This patch removes them. store_bit_field_1 has a similar block, but it uses: /* If OP0 is a memory, try copying it to a register and seeing if a cheap register alternative is available. */ if (HAVE_insv && MEM_P (op0)) { However, by definition, HAVE_insv is equivalent to op_mode != MAX_MACHINE_MODE, so this patch changes it to that form for consistency. It's then obvious that the corresponding op_mode != MAX_MACHINE_MODE tests are redundant too. Tested as described in the covering note. OK to install? Richard gcc/ * expmed.c (store_bit_field_1): Use OP_MODE to check whether an insv pattern is available. Remove redundant checks for OP_MODE being MAX_MACHINE_MODE. (extract_bit_field_1): Remove redundant checks for EXT_MODE being MAX_MACHINE_MODE. Index: gcc/expmed.c === --- gcc/expmed.c 2012-10-28 10:54:24.0 + +++ gcc/expmed.c 2012-10-28 10:54:53.715350457 + @@ -669,7 +669,7 @@ store_bit_field_1 (rtx str_rtx, unsigned in a word. */ enum machine_mode op_mode = mode_for_extraction (EP_insv, 3); - if (HAVE_insv + if (op_mode != MAX_MACHINE_MODE && bitsize > 0 && GET_MODE_BITSIZE (op_mode) >= bitsize /* Do not use insv for volatile bitfields when @@ -788,7 +788,7 @@ store_bit_field_1 (rtx str_rtx, unsigned /* If OP0 is a memory, try copying it to a register and seeing if a cheap register alternative is available. 
*/ - if (HAVE_insv && MEM_P (op0)) + if (op_mode != MAX_MACHINE_MODE && MEM_P (op0)) { enum machine_mode bestmode; unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE; @@ -803,13 +803,10 @@ store_bit_field_1 (rtx str_rtx, unsigned if (GET_MODE (op0) == BLKmode || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits - || (op_mode != MAX_MACHINE_MODE - && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode))) + || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)) bestmode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end, - MEM_ALIGN (op0), - (op_mode == MAX_MACHINE_MODE - ? VOIDmode : op_mode), + MEM_ALIGN (op0), op_mode, MEM_VOLATILE_P (op0)); else bestmode = GET_MODE (op0); @@ -1597,12 +1594,9 @@ extract_bit_field_1 (rtx str_rtx, unsign smallest mode containing the field. */ if (GET_MODE (op0) == BLKmode - || (ext_mode != MAX_MACHINE_MODE - && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode))) + || GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)) bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0), - (ext_mode == MAX_MACHINE_MODE - ? VOIDmode : ext_mode), - MEM_VOLATILE_P (op0)); + ext_mode, MEM_VOLATILE_P (op0)); else bestmode = GET_MODE (op0);
[3/8] Split out insv and ext(z)v code
This patch splits out the code to handle insv and ext(z)v from store_bit_field_1 and extract_bit_field_1 respectively. I removed x prefixes from some of the variables and tried to make the placement of the REG and SUBREG handling more consistent, but there are no behavioural changes. Tested as described in the covering note. OK to install? Richard gcc/ * expmed.c (store_bit_field_using_insv): New function, split out from... (store_bit_field_1): ...here. (extract_bit_field_using_extv): New function, split out from... (extract_bit_field_1): ...here. Index: gcc/expmed.c === --- gcc/expmed.c 2012-10-29 12:56:45.260327155 + +++ gcc/expmed.c 2012-10-29 13:11:08.244325044 + @@ -404,6 +404,120 @@ lowpart_bit_field_p (unsigned HOST_WIDE_ return bitnum % BITS_PER_WORD == 0; } +/* Try to use an insv pattern to store VALUE into a field of OP0. + OP_MODE is the mode of the insertion and BITSIZE and BITNUM are + as for store_bit_field. */ + +static bool +store_bit_field_using_insv (rtx op0, unsigned HOST_WIDE_INT bitsize, + unsigned HOST_WIDE_INT bitnum, rtx value, + enum machine_mode op_mode) +{ + struct expand_operand ops[4]; + rtx value1; + rtx xop0 = op0; + rtx last = get_last_insn (); + bool copy_back = false; + + unsigned int unit = GET_MODE_BITSIZE (op_mode); + if (bitsize == 0 || bitsize > unit) +return false; + + if (MEM_P (xop0)) +{ + /* Get a reference to the first byte of the field. */ + xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT); + bitnum %= BITS_PER_UNIT; +} + else +{ + /* Convert from counting within OP0 to counting in OP_MODE. */ + if (BYTES_BIG_ENDIAN) + bitnum += unit - GET_MODE_BITSIZE (GET_MODE (op0)); + + /* If xop0 is a register, we need it in OP_MODE +to make it acceptable to the format of insv. */ + if (GET_CODE (xop0) == SUBREG) + /* We can't just change the mode, because this might clobber op0, + and we will need the original value of op0 if insv fails. 
*/ + xop0 = gen_rtx_SUBREG (op_mode, SUBREG_REG (xop0), SUBREG_BYTE (xop0)); + if (REG_P (xop0) && GET_MODE (xop0) != op_mode) + xop0 = gen_lowpart_SUBREG (op_mode, xop0); +} + + /* If the destination is a paradoxical subreg such that we need a + truncate to the inner mode, perform the insertion on a temporary and + truncate the result to the original destination. Note that we can't + just truncate the paradoxical subreg as (truncate:N (subreg:W (reg:N + X) 0)) is (reg:N X). */ + if (GET_CODE (xop0) == SUBREG + && REG_P (SUBREG_REG (xop0)) + && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (SUBREG_REG (xop0)), +op_mode)) +{ + rtx tem = gen_reg_rtx (op_mode); + emit_move_insn (tem, xop0); + xop0 = tem; + copy_back = true; +} + + /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count + backwards from the size of the unit we are inserting into. + Otherwise, we count bits from the most significant on a + BYTES/BITS_BIG_ENDIAN machine. */ + + if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN) +bitnum = unit - bitsize - bitnum; + + /* Convert VALUE to op_mode (which insv insn wants) in VALUE1. */ + value1 = value; + if (GET_MODE (value) != op_mode) +{ + if (GET_MODE_BITSIZE (GET_MODE (value)) >= bitsize) + { + /* Optimization: Don't bother really extending VALUE +if it has all the bits we will actually use. However, +if we must narrow it, be sure we do it correctly. */ + + if (GET_MODE_SIZE (GET_MODE (value)) < GET_MODE_SIZE (op_mode)) + { + rtx tmp; + + tmp = simplify_subreg (op_mode, value1, GET_MODE (value), 0); + if (! 
tmp) + tmp = simplify_gen_subreg (op_mode, + force_reg (GET_MODE (value), + value1), + GET_MODE (value), 0); + value1 = tmp; + } + else + value1 = gen_lowpart (op_mode, value1); + } + else if (CONST_INT_P (value)) + value1 = gen_int_mode (INTVAL (value), op_mode); + else + /* Parse phase is supposed to make VALUE's data type + match that of the component reference, which is a type + at least as wide as the field; so VALUE should have + a mode that corresponds to that type. */ + gcc_assert (CONSTANT_P (value)); +} + + create_fixed_operand (&ops[0], xop0); + create_integer_operand (&ops[1], bitsize); + create_integer_operand (&ops[2], bitnum); + create_input_operand (&ops[3], value1, op_mode); + if (maybe_expand_insn (CODE_FOR_insv, 4,
[4/8] Separate reg and mem uses of insv and ext(z)v
This patch simply separates out the MEM and non-MEM insv and ext(z)v cases. On its own, it's probably a wash whether this is an improvement or not, but it makes the optabs patches much easier. Tested as described in the covering note. OK to install? Richard gcc/ * expmed.c (store_bit_field_1): Move generation of MEM insvs to the MEM_P block. (extract_bit_field_1): Likewise extvs and extzvs. Index: gcc/expmed.c === --- gcc/expmed.c 2012-10-28 10:55:22.754350385 + +++ gcc/expmed.c 2012-10-28 10:55:29.249350370 + @@ -784,16 +784,7 @@ store_bit_field_1 (rtx str_rtx, unsigned enum machine_mode op_mode = mode_for_extraction (EP_insv, 3); if (op_mode != MAX_MACHINE_MODE - /* Do not use insv for volatile bitfields when - -fstrict-volatile-bitfields is in effect. */ - && !(MEM_P (op0) && MEM_VOLATILE_P (op0) - && flag_strict_volatile_bitfields > 0) - /* Do not use insv if the bit region is restricted and -op_mode integer at offset doesn't fit into the -restricted region. */ - && !(MEM_P (op0) && bitregion_end - && bitnum - (bitnum % BITS_PER_UNIT) + GET_MODE_BITSIZE (op_mode) - > bitregion_end + 1) + && !MEM_P (op0) && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode)) return true; @@ -804,6 +795,18 @@ store_bit_field_1 (rtx str_rtx, unsigned enum machine_mode bestmode; unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE; + /* Do not use insv for volatile bitfields when + -fstrict-volatile-bitfields is in effect. */ + if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0) + /* Do not use insv if the bit region is restricted and +an op_mode integer doesn't fit into the restricted region. */ + && !(bitregion_end + && (bitnum - (bitnum % BITS_PER_UNIT) + + GET_MODE_BITSIZE (op_mode) > bitregion_end + 1)) + && store_bit_field_using_insv (op0, bitsize, bitnum, value, op_mode)) + return true; + if (bitregion_end) maxbits = bitregion_end - bitregion_start + 1; @@ -1594,11 +1597,7 @@ extract_bit_field_1 (rtx str_rtx, unsign If OP0 is a register, it too fits within a word. 
*/ ext_mode = mode_for_extraction (unsignedp ? EP_extzv : EP_extv, 0); - if (ext_mode != MAX_MACHINE_MODE - /* Do not use extv/extzv for volatile bitfields when - -fstrict-volatile-bitfields is in effect. */ - && !(MEM_P (op0) && MEM_VOLATILE_P (op0) - && flag_strict_volatile_bitfields > 0)) + if (ext_mode != MAX_MACHINE_MODE && !MEM_P (op0)) { rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum, unsignedp, target, mode, @@ -1613,6 +1612,17 @@ extract_bit_field_1 (rtx str_rtx, unsign { enum machine_mode bestmode; + /* Do not use extv/extzv for volatile bitfields when + -fstrict-volatile-bitfields is in effect. */ + if (!(MEM_VOLATILE_P (op0) && flag_strict_volatile_bitfields > 0)) + { + rtx result = extract_bit_field_using_extv (op0, bitsize, bitnum, +unsignedp, target, mode, +tmode, ext_mode); + if (result) + return result; + } + /* Get the mode to use for inserting into this field. If OP0 is BLKmode, get the smallest mode consistent with the alignment. If OP0 is a non-BLKmode object that is no
Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures
On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
> I will try testing your patch on top of mine with our fdo benchmarks.

Thanks.  But you should expect a lot of errors; hopefully you can make
something out of it for Bugzilla.

For the others on the cc list: you may need to include my patch as well for
testing.  Without it, -freorder-blocks-and-partition was DOA for me.  For my
patch, see http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html

Ah, the fate of a pass that isn't enabled by default.  From what I can tell,
looking at this code the past few hours, it's hopelessly broken at the moment:

* It doesn't work with cfglayout mode, even though it runs in between
  pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode.

* Coming out of cfglayout mode, fixup_reorder_chain adds REG_CROSSING_JUMP
  notes -- but so does force_nonfallthru_and_redirect, called just before it.
  So we end up with more than one such note, or with notes on jumps that
  shouldn't have them.

* The scheduler doesn't respect the partitioning at all.  remove_notes
  happily removes the section split note.

* etc.

This pass may need some major work to be in good order again.

Ciao!
Steven
[5/8] Add narrow_bit_field_mem
This patch splits out a fairly common operation: that of narrowing a MEM to
a particular mode and adjusting the bit number accordingly.  I've kept with
"bit_field" rather than "bitfield" for consistency with the callers, although
we do have "bitfield" in adjust_bitfield_address.

Tested as described in the covering note.  OK to install?

Richard

gcc/
	* expmed.c (narrow_bit_field_mem): New function.
	(store_bit_field_using_insv, store_bit_field_1)
	(store_fixed_bit_field, extract_bit_field_1): Use it.

Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2012-10-30 19:25:44.797368678 +0000
+++ gcc/expmed.c	2012-10-30 19:25:47.730368671 +0000
@@ -387,6 +387,23 @@ mode_for_extraction (enum extraction_pat
   return data->operand[opno].mode;
 }
 
+/* Adjust bitfield memory MEM so that it points to the first unit of
+   mode MODE that contains the bitfield at bit position BITNUM.
+   Set *NEW_BITNUM to the bit position of the field within the
+   new memory.  */
+
+static rtx
+narrow_bit_field_mem (rtx mem, enum machine_mode mode,
+		      unsigned HOST_WIDE_INT bitnum,
+		      unsigned HOST_WIDE_INT *new_bitnum)
+{
+  unsigned int unit = GET_MODE_BITSIZE (mode);
+  unsigned HOST_WIDE_INT offset = bitnum / unit * GET_MODE_SIZE (mode);
+  mem = adjust_bitfield_address (mem, mode, offset);
+  *new_bitnum = bitnum % unit;
+  return mem;
+}
+
 /* Return true if a bitfield of size BITSIZE at bit number BITNUM within
    a structure of mode STRUCT_MODE represents a lowpart subreg.  The
    subreg offset is then BITNUM / BITS_PER_UNIT.  */
@@ -424,11 +441,8 @@ store_bit_field_using_insv (rtx op0, uns
     return false;
 
   if (MEM_P (xop0))
-    {
-      /* Get a reference to the first byte of the field.  */
-      xop0 = adjust_bitfield_address (xop0, byte_mode, bitnum / BITS_PER_UNIT);
-      bitnum %= BITS_PER_UNIT;
-    }
+    /* Get a reference to the first byte of the field.  */
+    xop0 = narrow_bit_field_mem (xop0, byte_mode, bitnum, &bitnum);
   else
     {
       /* Convert from counting within OP0 to counting in OP_MODE.  */
@@ -831,18 +845,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
	       && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
	{
	  rtx last, tempreg, xop0;
-	  unsigned int unit;
-	  unsigned HOST_WIDE_INT offset, bitpos;
+	  unsigned HOST_WIDE_INT bitpos;
 
	  last = get_last_insn ();
 
	  /* Adjust address to point to the containing unit of
	     that mode.  Compute the offset as a multiple of this unit,
	     counting in bytes.  */
-	  unit = GET_MODE_BITSIZE (bestmode);
-	  offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
-	  bitpos = bitnum % unit;
-	  xop0 = adjust_bitfield_address (op0, bestmode, offset);
+	  xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, &bitpos);
 
	  /* Fetch that unit, store the bitfield in it, then store
	     the unit.  */
@@ -975,9 +985,7 @@ store_fixed_bit_field (rtx op0, unsigned
	  return;
	}
 
-      HOST_WIDE_INT bit_offset = bitnum - bitnum % GET_MODE_BITSIZE (mode);
-      op0 = adjust_bitfield_address (op0, mode, bit_offset / BITS_PER_UNIT);
-      bitnum -= bit_offset;
+      op0 = narrow_bit_field_mem (op0, mode, bitnum, &bitnum);
     }
 
   mode = GET_MODE (op0);
@@ -1246,11 +1254,8 @@ extract_bit_field_using_extv (rtx op0, u
     return NULL_RTX;
 
   if (MEM_P (op0))
-    {
-      /* Get a reference to the first byte of the field.  */
-      op0 = adjust_bitfield_address (op0, byte_mode, bitnum / BITS_PER_UNIT);
-      bitnum %= BITS_PER_UNIT;
-    }
+    /* Get a reference to the first byte of the field.  */
+    op0 = narrow_bit_field_mem (op0, byte_mode, bitnum, &bitnum);
   else
     {
       /* Convert from counting within OP0 to counting in EXT_MODE.  */
@@ -1640,23 +1645,19 @@ extract_bit_field_1 (rtx str_rtx, unsign
	  && !(SLOW_UNALIGNED_ACCESS (bestmode, MEM_ALIGN (op0))
	       && GET_MODE_BITSIZE (bestmode) > MEM_ALIGN (op0)))
	{
-	  unsigned HOST_WIDE_INT offset, bitpos;
-
-	  /* Compute the offset as a multiple of this unit,
-	     counting in bytes.  */
-	  unsigned int unit = GET_MODE_BITSIZE (bestmode);
-	  offset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
-	  bitpos = bitnum % unit;
+	  unsigned HOST_WIDE_INT bitpos;
+	  rtx xop0 = narrow_bit_field_mem (op0, bestmode, bitnum, &bitpos);
 
-	  /* Make sure the register is big enough for the whole field.  */
-	  if (bitpos + bitsize <= unit)
+	  /* Make sure the register is big enough for the whole field.
+	     (It might not be if bestmode == GET_MODE (op0) and the input
+	     code was invalid.)  */
+	  if (bitpos + bitsize <= GET_MODE_BITSIZE (bestmode))
	    {
-	      rtx last, result, xop0;
+	      rtx last, result;
[6/8] Simplify make_extraction
On a change of tack, this tackles some redundant code in combine.  It has
code to convert a variable bit position to the mode required by the bit
position operand to insv, extv or extzv:

[A]	  else if (pos_rtx != 0
		   && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
	    pos_rtx = gen_lowpart (pos_mode, pos_rtx);

However, this is protected by an earlier:

[B]  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
	  && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
	pos_mode = GET_MODE (pos_rtx);

so [B] makes [A] redundant.  That leaves this block:

  /* Adjust mode of POS_RTX, if needed.  If we want a wider mode, we
     have to zero extend.  Otherwise, we can just use a SUBREG.  */
  if (pos_rtx != 0
      && GET_MODE_SIZE (pos_mode) > GET_MODE_SIZE (GET_MODE (pos_rtx)))
    {

as the only use of pos_mode, so [B] can go too.

Similarly, the memory case has:

[C]   /* If we have to change the mode of memory and cannot, the desired mode
	 is EXTRACTION_MODE.  */
      if (inner_mode != wanted_inner_mode
	  && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
	      || MEM_VOLATILE_P (inner)
	      || pos_rtx))
	wanted_inner_mode = extraction_mode;

but those same conditions are repeated in the code that actually applies
wanted_inner_mode:

  /* If INNER has a wider mode, and this is a constant extraction, try to
     make it smaller and adjust the byte to point to the byte containing
     the value.  */
  if (wanted_inner_mode != VOIDmode
      && inner_mode != wanted_inner_mode
      && ! pos_rtx
      && GET_MODE_SIZE (wanted_inner_mode) < GET_MODE_SIZE (is_mode)
      && MEM_P (inner)
      && ! mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
      && ! MEM_VOLATILE_P (inner))
    {

so [C] too is unnecessary.

Tested as described in the covering note.  OK to install?

Richard

gcc/
	* combine.c (make_extraction): Remove dead wanted_inner_mode-
	and pos_rtx-related code.
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2012-10-27 22:22:45.000000000 +0100
+++ gcc/combine.c	2012-10-27 22:32:26.811370582 +0100
@@ -7211,10 +7211,6 @@ make_extraction (enum machine_mode mode,
       && GET_MODE_SIZE (extraction_mode) < GET_MODE_SIZE (mode))
     extraction_mode = mode;
 
-  if (pos_rtx && GET_MODE (pos_rtx) != VOIDmode
-      && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-    pos_mode = GET_MODE (pos_rtx);
-
   /* If this is not from memory, the desired mode is the preferred mode
      for an extraction pattern's first input operand, or word_mode if there
      is none.  */
@@ -7231,14 +7227,6 @@ make_extraction (enum machine_mode mode,
	    wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode);
	  gcc_assert (wanted_inner_mode != VOIDmode);
	}
-
-      /* If we have to change the mode of memory and cannot, the desired mode
-	 is EXTRACTION_MODE.  */
-      if (inner_mode != wanted_inner_mode
-	  && (mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
-	      || MEM_VOLATILE_P (inner)
-	      || pos_rtx))
-	wanted_inner_mode = extraction_mode;
     }
 
   orig_pos = pos;
@@ -7359,9 +7347,6 @@ make_extraction (enum machine_mode mode,
	}
       pos_rtx = temp;
     }
-  else if (pos_rtx != 0
-	   && GET_MODE_SIZE (pos_mode) < GET_MODE_SIZE (GET_MODE (pos_rtx)))
-    pos_rtx = gen_lowpart (pos_mode, pos_rtx);
 
   /* Make POS_RTX unless we already have it and it is correct.  If we don't
     have a POS_RTX but we do have an ORIG_POS_RTX, the latter must
Re: [PATCH][RFC] Sanity checking for -freorder-blocks-and-partition failures
On Tue, Oct 30, 2012 at 12:25 PM, Steven Bosscher stevenb@gmail.com wrote:
> On Tue, Oct 30, 2012 at 5:59 PM, Teresa Johnson wrote:
>> I will try testing your patch on top of mine with our fdo benchmarks.
>
> Thanks.  But you should expect a lot of errors; hopefully you can make
> something out of it for Bugzilla.
>
> For the others on the cc list: you may need to include my patch as well
> for testing.  Without it, -freorder-blocks-and-partition was DOA for me.
> For my patch, see http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02692.html
>
> Ah, the fate of a pass that isn't enabled by default.

I see only 5 test cases under tree-prof for function splitting.  More test
cases are needed.

David

> From what I can tell, looking at this code the past few hours, it's
> hopelessly broken at the moment:
>
> * It doesn't work with cfglayout mode, even though it runs in between
>   pass_into_cfg_layout_mode and pass_outof_cfg_layout_mode.
>
> * Coming out of cfglayout mode, fixup_reorder_chain adds REG_CROSSING_JUMP
>   notes -- but so does force_nonfallthru_and_redirect, called just before
>   it.  So we end up with more than one such note, or with notes on jumps
>   that shouldn't have them.
>
> * The scheduler doesn't respect the partitioning at all.  remove_notes
>   happily removes the section split note.
>
> * etc.
>
> This pass may need some major work to be in good order again.
>
> Ciao!
> Steven
[7/8] BITS_BIG_ENDIAN vs. (zero_extract (const_int ...) ...)
Combine tries to optimise comparisons involving:

  (zero_extract (const_int X) (const_int 1) (var Y))

and so, on BITS_BIG_ENDIAN targets, it tries gamely to work out what mode X
actually has.  At the moment it tries reading the mode from operand 1 of
extzv, but that doesn't feel right, since we never use extzv itself with
this combination of operands.  (We only use it with combinations in which
the first zero_extract operand is variable and the third is constant.)
And extzv isn't necessarily a good indicator of what a matched zero_extract
does, since targets often have matchable zero_extract insns for more than
one mode.  E.g. powerpc (a BITS_BIG_ENDIAN target) has both SImode and
DImode patterns.

In practice, all BITS_BIG_ENDIAN targets that have an extzv pattern either
have an explicit word_mode operand (e.g. m68k) or leave it void, which
mode_for_extraction treats as word_mode.  Since word_mode is also the
default assumption when no extzv pattern is defined, I think it would be
more robust to assume/require word_mode across the board (as much as
anything can be called robust in this sort of situation).

Tested as described in the covering note.  OK to install?

Richard

gcc/
	* combine.c (simplify_comparison): If BITS_BIG_ENDIAN, always
	assume that zero_extracts of const_ints are doing word-sized
	extractions.

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2012-10-29 14:14:36.371315725 +0000
+++ gcc/combine.c	2012-10-29 14:29:26.800313546 +0000
@@ -11154,17 +11154,7 @@ simplify_comparison (enum rtx_code code,
	      && (i = exact_log2 (UINTVAL (XEXP (op0, 0)))) >= 0)
	    {
	      if (BITS_BIG_ENDIAN)
-		{
-		  enum machine_mode new_mode
-		    = mode_for_extraction (EP_extzv, 1);
-		  if (new_mode == MAX_MACHINE_MODE)
-		    i = BITS_PER_WORD - 1 - i;
-		  else
-		    {
-		      mode = new_mode;
-		      i = (GET_MODE_PRECISION (mode) - 1 - i);
-		    }
-		}
+		i = BITS_PER_WORD - 1 - i;
 
	      op0 = XEXP (op0, 2);
	      op1 = GEN_INT (i);
Re: [PATCH, GCC 4.7] Backport fix for PR tree-optimization/53708
On Tue, Oct 30, 2012 at 02:03:44PM -0500, Peter Bergner wrote:
> Ok, then I'll bootstrap and regtest your suggested change while we wait
> for richi to comment.  I'm fine with whatever you and richi decide is
> best.  The ObjC guys should probably test it though too.  I assume you
> think we should change the current trunk code as well?

Well, I haven't looked at the ObjC failures (I guess they are darwin-only
anyway), so I have no idea whether those set a section name or not.  So, if
my proposed test instead of the trunk one doesn't work for darwin, perhaps
it could be DECL_PRESERVED_P (decl) || (DECL_SECTION_NAME (decl) && !...).
I think the DECL_USER_ALIGN test is undesirable for that, though; it is
just fine to increase the alignment of anything -- after all, it is still
aligned properly then.

	Jakub