[PATCH] Add support for lzd and popc instructions on sparc.
Unfortunately, only 64-bit versions of popc and lzd exist, so I have to play some shenanigans to make SImode and v8plus cases work. But it's definitely worth it. I plan to tweak this stuff and perhaps also add some explicit ffs patterns as well later. There are only two sets of VIS3 instructions still unsupported, addxc and addxccc (basically, add with carry but use 64-bit condition codes instead of the 32-bit ones), and the new instructions that allow directly moving between integer and float regs without using memory (movstosw, movstouw, movdtox, movwtos, movxtod). Then I want to seriously look into redoing how v8plus is implemented. I think we should let the compiler know that v8plus-capable integer registers can be used just like in 64-bit mode. And then we have reload patterns for moving DImode values between v8plus-capable and non-v8plus-capable registers. Then we can really get rid of these crazy v8plus patterns that emit more than one real instruction. I'd also like to make -mv8plus a dead option, or if anything the default when V9 and not-ARCH64. Really, there is no V9 capable system in the universe that does not properly preserve the V9 register state in the out/global/float registers across traps when running a 32-bit executable. I also want to look into supporting the vector infrastructure better. VIS2 and VIS3 allow a lot more named patterns and features to be supported as Richard pointed out the other day. Anyways, committed to trunk. gcc/ * config/sparc/sparc.opt (POPC): New option. * doc/invoke.texi: Document it. * config/sparc/sparc.c (sparc_option_override): Enable MASK_POPC by default on Niagara-2 and later. * config/sparc/sparc.h (CLZ_DEFINED_VALUE_AT_ZERO): Define. * config/sparc/sparc.md (SIDI): New mode iterator. (ffsdi2): Delete commented out pattern and comments. (popcountmode2, clzmode2): New expanders. (*popcountmode_sp64, popcountsi_v8plus, popcountdi_v8plus, *clzdi_sp64, clzdi_v8plus, *clzsi_sp64, clzsi_v8plus): New insns. 
gcc/testsuite/ * gcc.target/sparc/lzd.c: New test. * gcc.target/sparc/popc.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179591 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 13 gcc/config/sparc/sparc.c |6 +- gcc/config/sparc/sparc.h |5 ++ gcc/config/sparc/sparc.md | 108 gcc/config/sparc/sparc.opt|4 + gcc/doc/invoke.texi | 11 +++- gcc/testsuite/ChangeLog |5 ++ gcc/testsuite/gcc.target/sparc/lzd.c | 18 ++ gcc/testsuite/gcc.target/sparc/popc.c | 18 ++ 9 files changed, 170 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/lzd.c create mode 100644 gcc/testsuite/gcc.target/sparc/popc.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7623ff4..a3cd404 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2011-10-05 David S. Miller da...@davemloft.net + + * config/sparc/sparc.opt (POPC): New option. + * doc/invoke.texi: Document it. + * config/sparc/sparc.c (sparc_option_override): Enable MASK_POPC by + default on Niagara-2 and later. + * config/sparc/sparc.h (CLZ_DEFINED_VALUE_AT_ZERO): Define. + * config/sparc/sparc.md (SIDI): New mode iterator. + (ffsdi2): Delete commented out pattern and comments. + (popcountmode2, clzmode2): New expanders. + (*popcountmode_sp64, popcountsi_v8plus, popcountdi_v8plus, + *clzdi_sp64, clzdi_v8plus, *clzsi_sp64, clzsi_v8plus): New insns. 
+ 2011-10-06 Artjoms Sinkarovs artyom.shinkar...@gmail.com PR middle-end/50607 diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index b2cbdd2..9606f68 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -774,11 +774,11 @@ sparc_option_override (void) { MASK_ISA, MASK_V9|MASK_DEPRECATED_V8_INSNS}, /* UltraSPARC T2 */ -{ MASK_ISA, MASK_V9|MASK_VIS2}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2}, /* UltraSPARC T3 */ -{ MASK_ISA, MASK_V9|MASK_VIS2|MASK_VIS3|MASK_FMAF}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF}, /* UltraSPARC T4 */ -{ MASK_ISA, MASK_V9|MASK_VIS2|MASK_VIS3|MASK_FMAF}, +{ MASK_ISA, MASK_V9|MASK_POPC|MASK_VIS2|MASK_VIS3|MASK_FMAF}, }; const struct cpu_table *cpu; unsigned int i; diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index fa94387..0642ff2 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1608,6 +1608,11 @@ do { \ is done just by pretending it is already truncated. */ #define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1 +/* For SImode, we make sure the top 32-bits of the register are clear and + then we subtract 32
Re: [PATCH] Fix memory leak in vect_pattern_recog_1
On 5 October 2011 20:06, Jakub Jelinek ja...@redhat.com wrote: Hi! If vect_recog_func fails (or the other spot where vect_pattern_recog_1 returns early), the vector allocated in the function isn't freed, leading to memory leak. But, more importantly, doing a VEC_alloc + VEC_free num_stmts_in_loop * NUM_PATTERNS times seems to be completely unnecessary, the following patch allocates just one vector for that purpose in the caller and only performs VEC_truncate in each call to make it independent from previous uses of the vector. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Ira 2011-10-05 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Add stmts_to_replace argument, truncate it at the beginning instead of allocating there and freeing at the end. (vect_pattern_recog): Allocate stmts_to_replace here and free at end, pass its address to vect_pattern_recog_1. --- gcc/tree-vect-patterns.c.jj 2011-09-26 14:06:52.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-05 15:57:38.0 +0200 @@ -1281,7 +1281,8 @@ vect_mark_pattern_stmts (gimple orig_stm static void vect_pattern_recog_1 ( gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si) + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1291,14 +1292,14 @@ vect_pattern_recog_1 ( enum tree_code code; int i; gimple next; - VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); - VEC_quick_push (gimple, stmts_to_replace, stmt); - pattern_stmt = (* vect_recog_func) (stmts_to_replace, type_in, type_out); + VEC_truncate (gimple, *stmts_to_replace, 0); + VEC_quick_push (gimple, *stmts_to_replace, stmt); + pattern_stmt = (* vect_recog_func) (stmts_to_replace, type_in, type_out); if (!pattern_stmt) return; - stmt = VEC_last (gimple, stmts_to_replace); + stmt = VEC_last (gimple, *stmts_to_replace); stmt_info = vinfo_for_stmt (stmt); 
loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); @@ -1363,8 +1364,8 @@ vect_pattern_recog_1 ( /* It is possible that additional pattern stmts are created and inserted in STMTS_TO_REPLACE. We create a stmt_info for each of them, and mark the relevant statements. */ - for (i = 0; VEC_iterate (gimple, stmts_to_replace, i, stmt) - (unsigned) i (VEC_length (gimple, stmts_to_replace) - 1); + for (i = 0; VEC_iterate (gimple, *stmts_to_replace, i, stmt) + (unsigned) i (VEC_length (gimple, *stmts_to_replace) - 1); i++) { stmt_info = vinfo_for_stmt (stmt); @@ -1377,8 +1378,6 @@ vect_pattern_recog_1 ( vect_mark_pattern_stmts (stmt, pattern_stmt, NULL_TREE); } - - VEC_free (gimple, heap, stmts_to_replace); } @@ -1468,6 +1467,7 @@ vect_pattern_recog (loop_vec_info loop_v gimple_stmt_iterator si; unsigned int i, j; gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, === vect_pattern_recog ===); @@ -1483,8 +1483,11 @@ vect_pattern_recog (loop_vec_info loop_v for (j = 0; j NUM_PATTERNS; j++) { vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si); + vect_pattern_recog_1 (vect_recog_func_ptr, si, + stmts_to_replace); } } } + + VEC_free (gimple, heap, stmts_to_replace); } Jakub
Re: [v3] add max_size and rebind to __alloc_traits
On 6 October 2011 02:57, Paolo Carlini wrote: today I ran the whole testsuite in C++0x mode and I'm pretty sure that 23_containers/vector/modifiers/swap/3.cc, which is now failing, wasn't a couple of days ago (I ran the whole testsuite like that in order to validate my std::list changes). When you have time, could you please double check? (maybe after all we *do* want it to fail in C++0x mode, but I'd like to understand if the behavior changed inadvertently!) I think you're right it wasn't failing before, as I ran the whole testsuite in C++0x mode when I first added alloc_traits - I'll check it today and see how I broke it.
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/05/2011 10:16 PM, William J. Schmidt wrote: OK, I see. If there's a better place downstream to make a swizzle, I'm certainly fine with that. I disabled locally_poor_mem_replacement and added some dump information in should_replace_address to show the costs for the replacement I'm trying to avoid: In should_replace_address: old_rtx = (reg/f:DI 125 [ D.2036 ]) new_rtx = (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) address_cost (old_rtx) = 0 address_cost (new_rtx) = 0 set_src_cost (old_rtx) = 0 set_src_cost (new_rtx) = 4 In insn 11, replacing (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)-a S4 A32]) with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128)) [2 p_1(D)-a S4 A32]) Changed insn 11 deferring rescan insn with uid = 11. deferring rescan insn with uid = 11. And IIUC the other address is based on pseudo 125 as well, but the combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and cannot be represented on ppc. I think _this_ is the problem, so I'm afraid your patch could cause pessimizations on x86 for example. On x86, which has a cheap REG+REG+CONST addressing mode, it is much better to propagate pseudo 125 so that you can delete the set altogether. However, indeed there is no downstream pass that undoes the transformation. Perhaps we can do it in CSE, since this _is_ CSE after all. :) The attached untested (uncompiled) patch is an attempt. Paolo Index: cse.c === --- cse.c (revision 177688) +++ cse.c (working copy) @@ -3136,6 +3136,75 @@ find_comparison_args (enum rtx_code code return code; } +static rtx +lookup_addr (rtx insn, rtx *loc, enum machine_mode mode) +{ + struct table_elt *elt, *p; + int regno; + int hash; + int base_cost; + rtx addr = *loc; + rtx exp; + + /* Try to reuse existing registers for addresses, in hope of shortening + live ranges for the registers that compose the addresses. 
This happens + when you have + + (set (reg C) (plus (reg A) (reg B)) + (set (reg D) (mem (reg C))) + (set (reg E) (mem (plus (reg C) (const_int X + + In this case fwprop will try to propagate into the addresses, but + if propagation into reg E fails, the only result will have been to + uselessly lengthen the live range of A and B. */ + + if (!REG_P (addr)) +return; + + regno = REGNO (addr); + if (regno == FRAME_POINTER_REGNUM + || regno == HARD_FRAME_POINTER_REGNUM + || regno == ARG_POINTER_REGNUM) +return; + + /* If this address is not in the hash table, we can't look for equivalences + of the whole address. Also, ignore if volatile. */ + + { +int save_do_not_record = do_not_record; +int save_hash_arg_in_memory = hash_arg_in_memory; +int addr_volatile; + +do_not_record = 0; +hash = HASH (addr, Pmode); +addr_volatile = do_not_record; +do_not_record = save_do_not_record; +hash_arg_in_memory = save_hash_arg_in_memory; + +if (addr_volatile) + return; + } + + /* Try to find a REG that holds the same address. */ + + elt = lookup (addr, hash, Pmode); + if (!elt) +return; + + base_cost = address_cost (*loc, mode); + for (p = elt-first_same_value; p; p = p-next_same_value) +{ + exp = p-exp; + if (REG_P (exp) + exp_equiv_p (exp, exp, 1, false) + address_cost (exp, mode) base_cost) +break; +} + + if (p) +validate_change (insn, loc, canon_reg (copy_rtx (exp), NULL_RTX), 0)); +} + /* If X is a nontrivial arithmetic operation on an argument for which a constant value can be determined, return the result of operating on that value, as a constant. 
Otherwise, return X, possibly with @@ -3180,6 +3249,12 @@ fold_rtx (rtx x, rtx insn) switch (code) { case MEM: + if ((new_rtx = equiv_constant (x)) != NULL_RTX) +return new_rtx; + if (insn) +lookup_addr (insn, XEXP (x, 0), GET_MODE (x)); + return x; + case SUBREG: if ((new_rtx = equiv_constant (x)) != NULL_RTX) return new_rtx; Index: passes.c === --- passes.c (revision 177688) +++ passes.c (working copy) @@ -1448,9 +1448,9 @@ init_optimization_passes (void) } NEXT_PASS (pass_web); NEXT_PASS (pass_rtl_cprop); + NEXT_PASS (pass_rtl_fwprop_addr); NEXT_PASS (pass_cse2); NEXT_PASS (pass_rtl_dse1); - NEXT_PASS (pass_rtl_fwprop_addr); NEXT_PASS (pass_inc_dec); NEXT_PASS (pass_initialize_regs); NEXT_PASS (pass_ud_rtl_dce);
[PATCH] Fix PR38884
This handles the case of CSEing part of an SSA name that is stored to memory and defined with a composition like COMPLEX_EXPR or CONSTRUCTOR. This fixes the remaining pieces of PR38884 and PR38885. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de PR tree-optimization/38884 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads from aggregate SSA names. * gcc.dg/tree-ssa/ssa-fre-34.c: New testcase. * gcc.dg/tree-ssa/ssa-fre-35.c: Likewise. Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 179556) --- gcc/tree-ssa-sccvn.c(working copy) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1489,1495 } } ! /* 4) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE gimple_assign_single_p (def_stmt) --- 1489,1554 } } ! /* 4) Assignment from an SSA name which definition we may be able ! to access pieces from. */ ! else if (ref-size == maxsize ! is_gimple_reg_type (vr-type) ! gimple_assign_single_p (def_stmt) ! TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME) ! { ! tree rhs1 = gimple_assign_rhs1 (def_stmt); ! gimple def_stmt2 = SSA_NAME_DEF_STMT (rhs1); ! if (is_gimple_assign (def_stmt2) ! (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR ! || gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR) ! types_compatible_p (vr-type, TREE_TYPE (TREE_TYPE (rhs1 ! { ! tree base2; ! HOST_WIDE_INT offset2, size2, maxsize2, off; ! base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt), ! offset2, size2, maxsize2); ! off = offset - offset2; ! if (maxsize2 != -1 ! maxsize2 == size2 ! operand_equal_p (base, base2, 0) ! offset2 = offset ! offset2 + size2 = offset + maxsize) ! { ! tree val = NULL_TREE; ! HOST_WIDE_INT elsz ! = TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (rhs1; ! if (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR) ! { ! if (off == 0) ! val = gimple_assign_rhs1 (def_stmt2); ! 
else if (off == elsz) ! val = gimple_assign_rhs2 (def_stmt2); ! } ! else if (gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR ! off % elsz == 0) ! { ! tree ctor = gimple_assign_rhs1 (def_stmt2); ! unsigned i = off / elsz; ! if (i CONSTRUCTOR_NELTS (ctor)) ! { ! constructor_elt *elt = CONSTRUCTOR_ELT (ctor, i); ! if (compare_tree_int (elt-index, i) == 0) ! val = elt-value; ! } ! } ! if (val) ! { ! unsigned int value_id = get_or_alloc_constant_value_id (val); ! return vn_reference_insert_pieces ! (vuse, vr-set, vr-type, ! VEC_copy (vn_reference_op_s, heap, vr-operands), ! val, value_id); ! } ! } ! } ! } ! ! /* 5) For aggregate copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE gimple_assign_single_p (def_stmt) *** vn_reference_lookup_3 (ao_ref *ref, tree *** 1587,1593 return NULL; } ! /* 5) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE is_gimple_reg_type (vr-type) --- 1646,1652 return NULL; } ! /* 6) For memcpy copies translate the reference through them if the copy kills ref. */ else if (vn_walk_kind == VN_WALKREWRITE is_gimple_reg_type (vr-type) Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c === *** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c (revision 0) *** *** 0 --- 1,18 + /* { dg-do compile } */ + /* { dg-options -O -fdump-tree-fre1-details } */ + + #define vector __attribute__((vector_size(16) )) + + struct { + float i; + vector float global_res; + } s; + float foo(float f) + { + vector float res = (vector float){0.0f,f,0.0f,1.0f}; + s.global_res = res; + return *((float*)s.global_res + 1); + } + + /* { dg-final { scan-tree-dump Replaced BIT_FIELD_REF.*with
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 6:53 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 11:28, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 10:51, Richard Guenther richard.guent...@gmail.com wrote: Did you also mark the function with always_inline? That's a requirement as artificial only works for inlined function bodies. Yeah. It doesn't quite work as I expect it to. It steps into the function at odd places. So, I played with this some more with this, and there seems to be some inconsistency in how these attributes get handled. http://sourceware.org/bugzilla/show_bug.cgi?id=13263 static inline int foo (int) __attribute__((always_inline,artificial)); static inline int foo (int x) { int y = x - 3; return y; } int bar (int y) { return y == 0; } main () { foo (10); return bar (foo (3)); } With GCC 4.7, the stand alone call foo(10) is not ignored by 'step'. However, the embedded call bar(foo(3)) is ignored as I was expecting. Hm, nothing is ignored for me with gcc 4.6. Diego.
Re: Modify gcc for use with gdb (issue5132047)
On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Oct 5, 2011 at 14:20, Mike Stump mikest...@comcast.net wrote: On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote: I think we need to find a solution for this situation. The solution Apple found and implemented is a __nodebug__ attribute, as can be seen in Apple's gcc. We use it like so: #define __always_inline__ __always_inline__, __nodebug__ #undef __always_inline__ in headers like mmintrn.h: __STATIC_INLINE void __attribute__((__always_inline__)) /* APPLE LOCAL end radar 5618945 */ _mm_empty (void) { __builtin_ia32_emms (); } Ah, nice. Though, one of the things I am liking more and more about the blacklist solution is that it (a) does not need any modifications to the source code, and (b) it works with no-inline functions as well. This gives total control to the developer. I would blacklist a bunch of functions I never care to go into, for instance. Others may choose to blacklist a different set. And you can change that from debug session to the next. I agree with Jakub that artificial functions should be blacklisted automatically, however. Richi, Jakub, if the blacklist solution was implemented in GCC would you agree with promoting these macros into inline functions? This is orthogonal to http://sourceware.org/bugzilla/show_bug.cgi?id=13263, of course. I know you are on to that C++ thing and ending up returning a reference to make it an lvalue. Which I very much don't like (please, if you go that route add _set functions and lower the case of the macros). What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). Richard. Thanks. Diego.
Re: [patch, arm] Fix PR target/50305 (arm_legitimize_reload_address problem)
On 4 October 2011 16:13, Ulrich Weigand uweig...@de.ibm.com wrote: Ramana Radhakrishnan wrote: On 26 September 2011 15:24, Ulrich Weigand uweig...@de.ibm.com wrote: Is this sufficient, or should I test any other set of options as well? Could you run one set of tests with neon ? Sorry for the delay, but I had to switch to my IGEP board for Neon support, and that's a bit slow ... In any case, I've now completed testing the patch with Neon with no regressions. Just to clarify: in the presence of the other options that are already in dg-options, the test case now fails (with the unpatched compiler) for *any* setting of -mfloat-abi (hard, soft, or softfp). Do you still want me to add a specific setting to the test case? No the mfpu=vfpv3 is fine. OK, thanks. Instead of skipping I was wondering if we could prune the outputs to get this through all the testers we have. Well, the problem is that with certain -march options (e.g. armv7) we get: /home/uweigand/gcc-head/gcc/testsuite/gcc.target/arm/pr50305.c:1:0: error: target CPU does not support ARM mode Ah - ok. Since this is an *error*, pruning the output doesn't really help, the test isn't being run in any case. Otherwise this is OK. Given the above, is the patch now OK as-is? OK by me. Ramana
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c(revision 179592) +++ fold-const.c(working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT + !FLOAT_TYPE_P (TREE_TYPE (arg1)) + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) +{ + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR), + type, arg0, arg1); +} + return NULL_TREE; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. And I'm missing a testcase. Richard. 
+ { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; }
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries tom_devr...@mentor.com wrote: On 10/05/2011 10:46 AM, Richard Guenther wrote: On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries tom_devr...@mentor.com wrote: On 10/04/2011 03:03 PM, Richard Guenther wrote: On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries tom_devr...@mentor.com wrote: On 10/01/2011 05:46 PM, Tom de Vries wrote: On 09/30/2011 03:29 PM, Richard Guenther wrote: On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries tom_devr...@mentor.com wrote: On 09/28/2011 11:53 AM, Richard Guenther wrote: On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas to be set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that the vectorizer then will no longer see that the arrays are properly aligned. I'm not sure what the best thing to do is here, other than trying to record the alignment requirement of the VLA somewhere. Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT has the issue that it will force stack-realignment which isn't free (and the point was to make the decl cheaper than the alloca). But that might possibly be the better choice. Any other thoughts? How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla for the new array prevents stack realignment for folded vla-allocas, also for large vlas. This will not help in vectorizing large folded vla-allocas, but I think it's not reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has been the case up until we started to fold). If you want to trigger vectorization for a vla, you can still use the aligned attribute on the declaration. Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using an attribute on the decl. 
This patch exploits this by setting it at the end of the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in propagation though, because although the ptr_info of the lhs is propagated via copy_prop afterwards, it's not propagated anymore via ccp. Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and not fold during ccp3. Ugh, somehow I like this the least ;) How about lowering VLAs to p = __builtin_alloca (...); p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); and not assume anything for alloca itself if it feeds a __builtin_assume_aligned? Or rather introduce a __builtin_alloca_with_align () and for VLAs do p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); that's less awkward to use? Sorry for not having a clear plan here ;) Using assume_aligned is a more orthogonal way to represent this in gimple, but indeed harder to use. Another possibility is to add a 'tree vla_decl' field to struct gimple_statement_call, which is probably the easiest to implement. But I think __builtin_alloca_with_align might have a use beyond vlas, so I decided to try this one. Attached patch implements my first stab at this (now testing on x86_64). Is this an acceptable approach? bootstrapped and reg-tested (including ada) on x86_64. Ok for trunk? The idea is ok I think. But case BUILT_IN_ALLOCA: + case BUILT_IN_ALLOCA_WITH_ALIGN: /* If the allocation stems from the declaration of a variable-sized object, it cannot accumulate. */ target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); if (target) return target; + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) + == BUILT_IN_ALLOCA_WITH_ALIGN) + { + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), + built_in_decls[BUILT_IN_ALLOCA], + 1, CALL_EXPR_ARG (exp, 0)); + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); + exp = new_call; + } Ick. Why can't the rest of the compiler not handle BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? 
(thus, arrange things so the assembler name of alloca-with-align is alloca?) We can set the assembler name in the local_define_builtin call. But that will still create a call alloca (12, 4). How do we deal with the second argument? I don't see why you still need the special late CCP pass. For alloca_with_align, the align will minimally be the 2nd argument. This is independent of folding, and we can propagate this information in every ccp. If the alloca_with_align is not folded and will not be folded anymore (something we know at the earliest after the propagation phase of the last ccp), the alignment of BIGGEST_ALIGNMENT is guaranteed, because we
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Richard Guenther richard.guent...@gmail.com: On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz kti...@redhat.com wrote: Hello, Sorry attached non-updated change. Here with proper attached patch. This patch improves in fold_truth_andor the generation of branch-conditions for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set. If right-hand side operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if left-hand operand is a simple operand, and has no side-effects. ChangeLog 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_OR_EXPR, if suitable. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai ndex: fold-const.c === --- fold-const.c (revision 179592) +++ fold-const.c (working copy) @@ -8387,6 +8387,33 @@ if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + !TREE_SIDE_EFFECTS (arg1) + simple_operand_p (arg1) + LOGICAL_OP_NON_SHORT_CIRCUIT Why only for LOGICAL_OP_NON_SHORT_CIRCUIT? It doesn't make a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ... Well, I used this check only for not doing this transformation for targets, which have low-cost branches. This is the same thing as in fold_truthop. It does this transformation only if LOGICAL_OP_NON_SHORT_CIRCUIT is true. + !FLOAT_TYPE_P (TREE_TYPE (arg1)) ? I hope we don't have || float. This can happen. Operands of TRUTH_AND|OR(IF)_EXPR aren't necessarily of integral type. After expansion in gimplifier, we have for sure comparisons, but not in c-tree. + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. 
This check is the same as in fold_truthop. I used this check. The point here is that floats might trap. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). See inner of if condition for those checks. I moved those checks for arg1 out of the inner conditions to avoid double-checking. + { + if (TREE_CODE (arg0) == code + !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + simple_operand_p (TREE_OPERAND (arg0, 1))) Err ... so why do you recurse here (and associate)? Even with different predicates than above ... See, here is the missing check. Point is that even if arg0 has side-effects and is a (AND|OR)IF expression, we might be able to associate with right-hand argument of arg0, if for it no side-effects are existing. Otherwise we wouldn't catch this case. We have here in maximum a recursion level of one. And similar transforms seem to happen in fold_truthop - did you investigate why it didn't trigger there. This is pretty simple. The point is that only for comparisons this transformation is done. But in c-tree we don't have here necessarily for TRUTH_(AND|OR)[IF]_EXPR comparison arguments, not necessarily integral ones (see above). And I'm missing a testcase. Ok, I'll add one. Effect can be seen best after gimplification. Richard. + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem); + } + if (!TREE_SIDE_EFFECTS (arg0) + simple_operand_p (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; } Regards. Kai
Re: [PATCH] Fix PR46556 (poor address generation)
On Wed, 5 Oct 2011, William J. Schmidt wrote: This patch addresses the poor code generation in PR46556 for the following code: struct x { int a[16]; int b[16]; int c[16]; }; extern void foo (int, int, int); void f (struct x *p, unsigned int n) { foo (p->a[n], p->c[n], p->b[n]); } Prior to the fix for PR32698, gcc calculated the offset for accessing the array elements as: n*4; 64+n*4; 128+n*4. Following that fix, the offsets are calculated as: n*4; (n+16)*4; (n+32)*4. This led to poor code generation on powerpc64 targets, among others. The poor code generation was observed not to occur in loops, as the IVOPTS code does a good job of lowering these expressions to MEM_REFs. It was previously suggested that perhaps a general pass to lower memory accesses to MEM_REFs in GIMPLE would solve not only this, but other similar problems. I spent some time looking into various approaches to this, and reviewing some previous attempts to do similar things. In the end, I've concluded that this is a bad idea in practice because of the loss of useful aliasing information. In particular, early lowering of component references causes us to lose the ability to disambiguate non-overlapping references in the same structure, and there is no simple way to carry the necessary aliasing information along with the replacement MEM_REFs to avoid this. While some performance gains are available with GIMPLE lowering of memory accesses, there are also offsetting performance losses, and I suspect this would just be a continuous source of bug reports into the future. Therefore the current patch is a much simpler approach to solve the specific problem noted in the PR. There are two pieces to the patch: * The offending addressing pattern is matched in GIMPLE and transformed into a restructured MEM_REF that distributes the multiply, so that (n+32)*4 becomes 4*n+128 as before. This is done during the reassociation pass, for reasons described below.
The transformation only occurs in non-loop blocks, since IVOPTS does a good job on such things within loops. * A tweak is added to the RTL forward-propagator to avoid propagating into memory references based on a single base register with no offset, under certain circumstances. This improves sharing of base registers for accesses within the same structure and slightly lowers register pressure. It would be possible to separate these into two patches if that's preferred. I chose to combine them because together they provide the ideal code generation that the new test cases test for. I initially implemented the pattern matcher during expand, but I found that the expanded code for two accesses to the same structure was often not being CSEd well. So I moved it back into the GIMPLE phases prior to DOM to take advantage of its CSE. To avoid an additional complete pass over the IL, I chose to piggyback on the reassociation pass. This transformation is not technically a reassociation, but it is related enough to not be a complete wart. One noob question about this: It would probably be preferable to have this transformation only take place during the second reassociation pass, so the ARRAY_REFs are seen by earlier optimization phases. Is there an easy way to detect that it's the second pass without having to generate a separate pass entry point? One other general question about the pattern-match transformation: Is this an appropriate transformation for all targets, or should it be somehow gated on available addressing modes on the target processor? Bootstrapped and regression tested on powerpc64-linux-gnu. Verified no performance degradations on that target for SPEC CPU2000 and CPU2006. I'm looking for eventual approval for trunk after any comments are resolved. Thanks! People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? 
So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. Now some comments on the patch ... Bill 2011-10-05 Bill Schmidt wschm...@linux.vnet.ibm.com gcc: PR rtl-optimization/46556 * fwprop.c (fwprop_bb_aux_d): New struct. (MEM_PLUS_REGS): New macro. (record_mem_plus_reg): New function. (record_mem_plus_regs): Likewise. (single_def_use_enter_block): Record
Re: Unreviewed libgcc patches
On 10/06/2011 12:21 PM, Rainer Orth wrote: Can you post an updated patch for this one? I'll try to review the others as soon as possible. Do you see a chance to get the other patches reviewed before stage1 closes? I'd like to get them into 4.7 rather than carry them forward for several months. Yes, I'm very sorry for the delay. Paolo
Re: Commit: RX: Codegen bug fixes
Hi Richard, The SMIN pattern has the same problem. *sigh* Fixed. Cheers Nick
Re: Initial shrink-wrapping patch
On 10/06/11 05:17, Ian Lance Taylor wrote: Thinking about it I think this is the wrong approach. The -fsplit-stack code by definition has to wrap the entire function and it can not modify any callee-saved registers. We should do shrink wrapping before -fsplit-stack, not the other way around. Sorry, I'm not following what you're saying here. Can you elaborate? Bernd
[PATCH] Some TLC
Noticed when working on vector/complex folding and simplification. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de * fold-const.c (fold_ternary_loc): Also fold non-constant vector CONSTRUCTORs. Make more efficient. * tree-ssa-dom.c (cprop_operand): Don't handle virtual operands. (cprop_into_stmt): Don't propagate into virtual operands. (optimize_stmt): Really dump original statement. Index: gcc/fold-const.c === *** gcc/fold-const.c (revision 179592) --- gcc/fold-const.c (working copy) *** fold_ternary_loc (location_t loc, enum t *** 13647,13653 case BIT_FIELD_REF: if ((TREE_CODE (arg0) == VECTOR_CST ! || (TREE_CODE (arg0) == CONSTRUCTOR && TREE_CONSTANT (arg0))) && type == TREE_TYPE (TREE_TYPE (arg0))) { unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1); --- 13647,13653 case BIT_FIELD_REF: if ((TREE_CODE (arg0) == VECTOR_CST ! || TREE_CODE (arg0) == CONSTRUCTOR) && type == TREE_TYPE (TREE_TYPE (arg0))) { unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1); *** fold_ternary_loc (location_t loc, enum t *** 13659,13682 && (idx = idx / width) < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))) { - tree elements = NULL_TREE; - if (TREE_CODE (arg0) == VECTOR_CST) - elements = TREE_VECTOR_CST_ELTS (arg0); - else { ! unsigned HOST_WIDE_INT idx; ! tree value; ! ! FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg0), idx, value) ! elements = tree_cons (NULL_TREE, value, elements); } ! while (idx-- > 0 && elements) ! elements = TREE_CHAIN (elements); ! if (elements) ! return TREE_VALUE (elements); ! else ! return build_zero_cst (type); } } --- 13659,13675 && (idx = idx / width) < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))) { if (TREE_CODE (arg0) == VECTOR_CST) { ! tree elements = TREE_VECTOR_CST_ELTS (arg0); ! while (idx-- > 0 && elements) ! elements = TREE_CHAIN (elements); ! if (elements) ! return TREE_VALUE (elements); } ! else if (idx < CONSTRUCTOR_NELTS (arg0)) ! return CONSTRUCTOR_ELT (arg0, idx)->value; !
return build_zero_cst (type); } } Index: gcc/tree-ssa-dom.c === *** gcc/tree-ssa-dom.c (revision 179592) --- gcc/tree-ssa-dom.c (working copy) *** cprop_operand (gimple stmt, use_operand_ *** 1995,2011 val = SSA_NAME_VALUE (op); if (val && val != op) { - /* Do not change the base variable in the virtual operand - tables. That would make it impossible to reconstruct - the renamed virtual operand if we later modify this - statement. Also only allow the new value to be an SSA_NAME - for propagation into virtual operands. */ - if (!is_gimple_reg (op) - && (TREE_CODE (val) != SSA_NAME - || is_gimple_reg (val) - || get_virtual_var (val) != get_virtual_var (op))) - return; - /* Do not replace hard register operands in asm statements. */ if (gimple_code (stmt) == GIMPLE_ASM && !may_propagate_copy_into_asm (op)) --- 1995,2000 *** cprop_into_stmt (gimple stmt) *** 2076,2086 use_operand_p op_p; ssa_op_iter iter; ! FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_ALL_USES) ! { ! if (TREE_CODE (USE_FROM_PTR (op_p)) == SSA_NAME) ! cprop_operand (stmt, op_p); ! } } /* Optimize the statement pointed to by iterator SI. --- 2065,2072 use_operand_p op_p; ssa_op_iter iter; ! FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE) ! cprop_operand (stmt, op_p); } /* Optimize the statement pointed to by iterator SI. *** optimize_stmt (basic_block bb, gimple_st *** 2107,2124 old_stmt = stmt = gsi_stmt (si); - if (gimple_code (stmt) == GIMPLE_COND) - canonicalize_comparison (stmt); - - update_stmt_if_modified (stmt); - opt_stats.num_stmts++; - if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Optimizing statement "); print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } /* Const/copy propagate into USES, VUSES and the RHS of VDEFs. */ cprop_into_stmt (stmt); --- 2093,2110 old_stmt = stmt = gsi_stmt
Re: [Patch] Support DEC-C extensions
On Tue, Oct 4, 2011 at 5:46 AM, Pedro Alves pe...@codesourcery.com wrote: On Tuesday 04 October 2011 11:16:30, Gabriel Dos Reis wrote: Do we need to consider ABIs that have calling conventions that treat unprototyped and varargs functions differently? (is there any?) Could you elaborate on the equivalence of these declarations? I expected that with: extern void foo(); extern void bar(...); foo (1, 2, 0.3f, NULL, 5); bar (1, 2, 0.3f, NULL, 5); the compiler would emit the same for both of those calls (calling convention wise). That is, for example, on x86-64, %rax is set to 1 (the number of floating-point parameters passed to the function in SSE registers) in both cases. Except that variadics use a different kind of calling convention than the rest. But they would not be equivalent at the source level, that is: extern void foo(); extern void foo(int a); extern void bar(...); extern void bar(int a); should be a "conflicting types for 'bar'" error in C. -- Pedro Alves
Re: Vector shuffling
Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. * gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. * gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file. 
Index: lib/target-supports.exp === --- lib/target-supports.exp (revision 179599) +++ lib/target-supports.exp (working copy) @@ -1583,6 +1583,15 @@ proc check_effective_target_int16 { } { }] } +# Returns 1 if we're generating 32-bit integers with the +# default options, 0 otherwise. + +proc check_effective_target_int32 { } { +return [check_no_compiler_messages int32 object { + int dummy[sizeof (int) == 4 ? 1 : -1]; +}] +} + # Return 1 if we're generating 64-bit code using default options, 0 # otherwise. Index: gcc.c-torture/execute/vect-shuffle-1.c === --- gcc.c-torture/execute/vect-shuffle-1.c (revision 179599) +++ gcc.c-torture/execute/vect-shuffle-1.c (working copy) @@ -1,4 +1,3 @@ -#if __SIZEOF_INT__ == 4 typedef unsigned int V __attribute__((vector_size(16), may_alias)); struct S @@ -64,5 +63,3 @@ int main() return 0; } - -#endif /* SIZEOF_INT */ Index: gcc.c-torture/execute/vect-shuffle-1.x === --- gcc.c-torture/execute/vect-shuffle-1.x (revision 0) +++ gcc.c-torture/execute/vect-shuffle-1.x (revision 0) @@ -0,0 +1,7 @@ +load_lib target-supports.exp + +if { [check_effective_target_int32] } { + return 0 +} + +return 1; Index: gcc.c-torture/execute/vect-shuffle-5.c === --- gcc.c-torture/execute/vect-shuffle-5.c (revision 179599) +++ gcc.c-torture/execute/vect-shuffle-5.c (working copy) @@ -1,4 +1,3 @@ -#if __SIZEOF_INT__ == 4 typedef unsigned int V __attribute__((vector_size(16), may_alias)); struct S @@ -60,5 +59,3 @@ int main() return 0; } - -#endif /* SIZEOF_INT */ Index: gcc.c-torture/execute/vect-shuffle-5.x === --- gcc.c-torture/execute/vect-shuffle-5.x (revision 0) +++ gcc.c-torture/execute/vect-shuffle-5.x (revision 0) @@ -0,0 +1,7 @@ +load_lib target-supports.exp + +if { [check_effective_target_int32] } { + return 0 +} + +return 1;
Re: [Patch] Support DEC-C extensions
On Tue, Oct 4, 2011 at 1:24 PM, Douglas Rupp r...@gnat.com wrote: On 10/3/2011 8:35 AM, Gabriel Dos Reis wrote: unnamed variadic functions sounds as if the function itself is unnamed, so not good. -funnamed-variadic-parameter How about -fvariadic-parameters-unnamed? There's already a -fvariadic-macros, so maybe putting variadic first is more consistent? Consistent with what? Consistency would imply -fvariadic-functions. But that does not make much sense since variadic functions already exist in C. -fvariadic-parameters-unnamed sounds as if the function could have several variadic parameters, but that is not what is being proposed.
Re: Vector shuffling
Richard Guenther schrieb: On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay a...@gjlay.de wrote: Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? On a general note, if you need to add .x files, consider moving the test to gcc.dg/torture instead. So should I move all vect-shuffle-*.c files so that they are kept together? Johann Richard. Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. * gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. 
* gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file.
Re: Vector shuffling
On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay a...@gjlay.de wrote: Richard Guenther schrieb: On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay a...@gjlay.de wrote: Artem Shinkarov schrieb: Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson r...@redhat.com wrote: On 10/03/2011 05:14 AM, Artem Shinkarov wrote: Hi, can anyone commit it please? Richard? Or may be Richard? Committed. r~ Hi, Richard There is a problem with the testcases of the patch you have committed for me. The code in every test-case is doubled. Could you please, apply the following patch, otherwise it would fail all the tests from the vector-shuffle-patch would fail. Also, if it is possible, could you change my name from in the ChangeLog from Artem Shinkarov to Artjoms Sinkarovs. The last version is the way I am spelled in the passport, and the name I use in the ChangeLog. Thanks, Artem. The following test cases cause FAILs because main cannot be found by the linker because if __SIZEOF_INT__ != 4 you are trying to compile and run an empty file. Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? On a general note, if you need to add .x files, consider moving the test to gcc.dg/torture instead. So should I move all vect-shuffle-*.c files so that they are kept together? Yes. Johann Richard. Johann testsuite/ * lib/target-supports.exp (check_effective_target_int32): New function. 
* gcc.c-torture/execute/vect-shuffle-1.c: Don't use __SIZEOF_INT__. * gcc.c-torture/execute/vect-shuffle-5.c: Ditto. * gcc.c-torture/execute/vect-shuffle-1.x: New file. * gcc.c-torture/execute/vect-shuffle-5.x: New file.
Re: Vector shuffling
On Thu, Oct 06, 2011 at 12:51:54PM +0200, Georg-Johann Lay wrote: The following patch avoids __SIZEOF_INT__. Ok by some maintainer to commit? That is unnecessary. You can just add #else int main () { return 0; } before the final #endif in the files instead. Or move around the #ifdefs, so that it ifdefs out for weirdo targets just everything before main and then also main's body except for return 0; at the end. Jakub
[Committed] s390 bootstrap: last_bb_active set but not used
Hi, this fixes a bootstrap problem on s390. s390 doesn't have return nor simple_return expanders so the last_bb_active variable stays unused in thread_prologue_and_epilogue_insns. Committed to mainline as obvious. Bye, -Andreas- 2011-10-06 Andreas Krebbel andreas.kreb...@de.ibm.com * function.c (thread_prologue_and_epilogue_insns): Mark last_bb_active as possibly unused. It is unused for targets which do neither have return nor simple_return expanders. Index: gcc/function.c === *** gcc/function.c.orig --- gcc/function.c *** thread_prologue_and_epilogue_insns (void *** 5453,5459 { bool inserted; basic_block last_bb; ! bool last_bb_active; #ifdef HAVE_simple_return bool unconverted_simple_returns = false; basic_block simple_return_block_hot = NULL; --- 5453,5459 { bool inserted; basic_block last_bb; ! bool last_bb_active ATTRIBUTE_UNUSED; #ifdef HAVE_simple_return bool unconverted_simple_returns = false; basic_block simple_return_block_hot = NULL;
Re: Modify gcc for use with gdb (issue5132047)
On 11-10-06 04:58 , Richard Guenther wrote: I know you are on to that C++ thing and ending up returning a reference to make it an lvalue. Which I very much don't like (please, if you go that route add _set functions and lower the case of the macros). Not necessarily. I'm after making the debugging experience easier (among other things). Only a handful of macros were converted into functions in this patch, not all of them. We may not *need* to convert all of them either. What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). Static type checking, of course. Ability to set breakpoints, and as time goes on, more inline functions will start showing up. We already have several. The blacklist feature would solve your annoyance with tree_operand_length, too. Additionally, blacklist can deal with non-inline functions, which can be useful. Diego.
Re: [Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING
*ping* http://gcc.gnu.org/ml/fortran/2011-09/msg00150.html On 09/28/2011 04:28 PM, Tobias Burnus wrote: This patch makes the GCC extension __float128 (_Complex) available in the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX. Additionally, I have improved the diagnostic for explicitly use associating -std= versioned symbols. And I have finally added the iso*.def files to the makefile dependencies. As usual, with -std=f2008, the GNU extensions are not loaded. I have also updated the documentation. OK for the trunk? Tobias PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for C's __float128, please speak up before gfortran - and other compilers implement it. (At least one vendor is implementing __float128 support and plans to modify ISO_C_BINDING.) The proper name would be C___FLOAT128, but that looks awkward!
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Thu, Oct 6, 2011 at 2:55 PM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Oct 6, 2011 at 2:51 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Wow, it works! Thank you. New patch attached. ChangeLogs were not touched. Tests pass both on ia32/x86-64 with and without simulator. You are missing closing curly braces in dg-do compile directives. Also, please write: TYPE __attribute__((sseregparm)) test_noneg_sub_noneg_sub (TYPE a, TYPE b, TYPE c) The patch is OK with these changes. BTW, don't you also need -mfmpath=sse in dg-options? Uros.
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 09:47 +0200, Paolo Bonzini wrote: And IIUC the other address is based on pseudo 125 as well, but the combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and cannot be represented on ppc. I think _this_ is the problem, so I'm afraid your patch could cause pessimizations on x86 for example. On x86, which has a cheap REG+REG+CONST addressing mode, it is much better to propagate pseudo 125 so that you can delete the set altogether. However, indeed there is no downstream pass that undoes the transformation. Perhaps we can do it in CSE, since this _is_ CSE after all. :) The attached untested (uncompiled) patch is an attempt. Paolo Thanks, Paolo! This makes good sense. I will play with your (second :) patch and let you know how it goes. Bill
ARM: Fix PR49049
This corrects a brain fart in one of my patches last year: I added another alternative to a subsi pattern for subtraction of a constant. This is bogus because such an operation should be canonicalized to a PLUS with the negated constant. Normally that's what happens, and so testing never showed that the alternative was only half-finished and didn't work. PR49049 is a testcase where we do end up replacing a REG with a constant and produce the bad alternative, leading to a crash. Tested on arm-eabi and committed as obvious. Will do some sanity checks on 4.6 and commit there as well. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 179606) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-10-06 Bernd Schmidt ber...@codesourcery.com + + PR target/49049 + * config/arm/arm.md (arm_subsi3_insn): Lose the last alternative. + 2011-10-06 Ulrich Weigand ulrich.weig...@linaro.org PR target/50305 Index: gcc/testsuite/gcc.c-torture/compile/pr49049.c === --- gcc/testsuite/gcc.c-torture/compile/pr49049.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/pr49049.c (revision 0) @@ -0,0 +1,28 @@ +__extension__ typedef unsigned long long int uint64_t; + +static int +sub (int a, int b) +{ + return a - b; +} + +static uint64_t +add (uint64_t a, uint64_t b) +{ + return a + b; +} + +int *ptr; + +int +foo (uint64_t arg1, int *arg2) +{ + int j; + for (; j < 1; j++) +{ + *arg2 |= sub ( sub (sub (j || 1 ^ 0x1, 1), arg1 & 0x1 >= + sub (1, *ptr & j)), +(sub ( j != 1 || sub (j & j, 1) >= 0, + add (!j & arg1, 0x35DLL)))); +} +} Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 179606) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-10-06 Bernd Schmidt ber...@codesourcery.com + + PR target/49049 + * gcc.c-torture/compile/pr49049.c: New test.
+ 2011-10-06 Ulrich Weigand ulrich.weig...@linaro.org PR target/50305 Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 179606) +++ gcc/config/arm/arm.md (working copy) @@ -1213,27 +1213,24 @@ (define_insn thumb1_subsi3_insn ; ??? Check Thumb-2 split length (define_insn_and_split *arm_subsi3_insn - [(set (match_operand:SI 0 s_register_operand =r,r,rk,r,r) - (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,k,?n,r) - (match_operand:SI 2 reg_or_int_operand r,rI,r, r,?n)))] + [(set (match_operand:SI 0 s_register_operand =r,r,rk,r) + (minus:SI (match_operand:SI 1 reg_or_int_operand rI,r,k,?n) + (match_operand:SI 2 reg_or_int_operand r,rI,r, r)))] TARGET_32BIT @ rsb%?\\t%0, %2, %1 sub%?\\t%0, %1, %2 sub%?\\t%0, %1, %2 - # # - ((GET_CODE (operands[1]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[1]))) - || (GET_CODE (operands[2]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[2])))) + (GET_CODE (operands[1]) == CONST_INT && !const_ok_for_arm (INTVAL (operands[1]))) [(clobber (const_int 0))] arm_split_constant (MINUS, SImode, curr_insn, INTVAL (operands[1]), operands[0], operands[2], 0); DONE; - [(set_attr length 4,4,4,16,16) + [(set_attr length 4,4,4,16) (set_attr predicable yes)] )
Re: Builtin infrastructure change
On 10/06/2011 03:02 PM, Michael Meissner wrote: On the x86 (with Fedora 13), I built and tested the C, C++, Objective C, Java, Ada, and Go languages with no regressions On a power6 box with RHEL 6.1, I have done the same for C, C++, Objective C, Java, and Ada languages with no regressions. Any reason for not building and testing Fortran? Especially as you patch gcc/fortran/{trans*.c,f95-lang.c}? Tobias [gcc/fortran] 2011-10-05 Michael Meissnermeiss...@linux.vnet.ibm.com * trans-expr.c (gfc_conv_power_op): Delete old interface with two parallel arrays to hold standard builtin declarations, and replace it with a function based interface that can support creating builtins on the fly in the future. Change all uses, and poison the old names. Make sure 0 is not a legitimate builtin index. (fill_with_spaces): Ditto. (gfc_trans_string_copy): Ditto. (gfc_trans_zero_assign): Ditto. (gfc_build_memcpy_call): Ditto. (alloc_scalar_allocatable_for_assignment): Ditto. * trans-array.c (gfc_trans_array_constructor_value): Ditto. (duplicate_allocatable): Ditto. (gfc_alloc_allocatable_for_assignment): Ditto. * trans-openmp.c (gfc_omp_clause_copy_ctor): Ditto. (gfc_omp_clause_assign_op): Ditto. (gfc_trans_omp_atomic): Ditto. (gfc_trans_omp_do): Ditto. (gfc_trans_omp_task): Ditto. * trans-stmt.c (gfc_trans_stop): Ditto. (gfc_trans_sync): Ditto. (gfc_trans_allocate): Ditto. (gfc_trans_deallocate): Ditto. * trans.c (gfc_call_malloc): Ditto. (gfc_allocate_using_malloc): Ditto. (gfc_call_free): Ditto. (gfc_deallocate_with_status): Ditto. (gfc_deallocate_scalar_with_status): Ditto. * f95-lang.c (gfc_define_builtin): Ditto. (gfc_init_builtin_functions): Ditto. * trans-decl.c (create_main_function): Ditto. * trans-intrinsic.c (builtin_decl_for_precision): Ditto.
[build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)
As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my previous libgcc patches. While the crtstuff one will fix it, I'd like to avoid breaking the target. The following patch fixes the problem, as confirmed in the PR. Ok for mainline? Rainer 2011-10-04 Rainer Orth r...@cebitec.uni-bielefeld.de PR bootstrap/49804 * config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to extra_parts. # HG changeset patch # Parent a57e226a2b14812bfa3c37c1aa807f28fac223eb Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804) diff --git a/libgcc/config.host b/libgcc/config.host --- a/libgcc/config.host +++ b/libgcc/config.host @@ -777,7 +777,7 @@ sparc-wrs-vxworks) ;; sparc64-*-freebsd*|ultrasparc-*-freebsd*) tmake_file="$tmake_file t-crtfm" - extra_parts=crtfastmath.o + extra_parts="crtbegin.o crtbeginS.o crtend.o crtendS.o crtfastmath.o" ;; sparc64-*-linux*) # 64-bit SPARC's running GNU/Linux extra_parts="$extra_parts crtfastmath.o" -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)
On 10/06/2011 03:29 PM, Rainer Orth wrote: As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my previous libgcc patches. While the crtstuff one will fix it, I'd like to avoid breaking the target. The following patch fixes the problem, as confirmed in the PR. Ok for mainline? Rainer 2011-10-04 Rainer Orthr...@cebitec.uni-bielefeld.de PR bootstrap/49804 * config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to extra_parts. Ok. Paolo
Re: [PATCH 0/3] Fix vector shuffle problems
Hi, On Wed, 5 Oct 2011, Richard Henderson wrote: Tested on x86_64 with check-gcc//unix/{,-mssse3,-msse4} Hopefully one of the AMD guys can test on a bulldozer with -mxop? === gcc Summary for unix//-mxop === # of expected passes 160 Ciao, Michael.
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. These are all good ideas. I will think about casting this as a more general strength reduction over extended basic blocks outside of loops. First I'll put together some simple tests to see whether we're currently missing some non-address opportunities. snip + mult_op0 = TREE_OPERAND (offset, 0); + mult_op1 = TREE_OPERAND (offset, 1); + + if (TREE_CODE (mult_op0) != PLUS_EXPR + || TREE_CODE (mult_op1) != INTEGER_CST + || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST) +return NULL_TREE; + + t1 = TREE_OPERAND (base, 0); + t2 = TREE_OPERAND (mult_op0, 0); + c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1)); + c3 = TREE_INT_CST_LOW (mult_op1); Before accessing TREE_INT_CST_LOW you need to make sure the constants fit into a HWI using host_integerp () (which conveniently includes the check for INTEGER_CST). Note that you need to sign-extend the MEM_REF offset, thus use mem_ref_offset (base).low instead of TREE_INT_CST_LOW (TREE_OPERAND (base, 1)). Might be worth to add a testcase with negative offset ;) D'oh! . 
+ c4 = bitpos / BITS_PER_UNIT; + c = c1 + c2 * c3 + c4; And you don't know whether this operation overflows. Thus it's probably easiest to use double_ints instead of HOST_WIDE_INTs in all of the code. OK, thanks, will do. snip + /* Determine whether the expression can be represented with base and + offset components. */ + base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &volatilep, false); + if (!base || !offset) + return false; + + /* Look for a restructuring opportunity. */ + if ((mem_ref = restructure_base_and_offset (expr, gsi, base, + offset, bitpos)) == NULL_TREE) + return false; What I'm missing is a check whether the old address computation stmts will be dead after the transform. Hm, not quite sure what to do here. Prior to the transformation I'll have an assignment with something like: ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td) on LHS or RHS. Ta and Td will be part of the replacement. What should I be checking for? snip - if (is_gimple_assign (stmt) - && !stmt_could_throw_p (stmt)) + /* Look for restructuring opportunities within an expression + that references memory. We only do this for blocks not + contained in loops, since the ivopts machinery does a + good job on loop expressions, and we don't want to interfere + with other loop optimizations. */ + if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt)) { + tree *lhs, *rhs; + lhs = gimple_assign_lhs_ptr (stmt); + chgd_mem_ref = restructure_mem_ref (lhs, gsi) || chgd_mem_ref; + rhs = gimple_assign_rhs1_ptr (stmt); + chgd_mem_ref = restructure_mem_ref (rhs, gsi) || chgd_mem_ref; It will either be a store or a load, but never both (unless it's an aggregate copy which I think we should not handle). So ... if (gimple_vdef (stmt)) ... lhs else if (gimple_vuse (stmt)) ... rhs OK, with your suggested gating on non-BLKmode I agree.
+ } + + else if (is_gimple_assign (stmt) + && !stmt_could_throw_p (stmt)) + { tree lhs, rhs1, rhs2; enum tree_code rhs_code = gimple_assign_rhs_code (stmt); @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb) } } } + /* If memory references have been restructured, immediate uses need + to be cleaned up. */ + if (chgd_mem_ref) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + update_stmt (gsi_stmt (gsi)); ICK. Definitely a no ;) Why does an update_stmt () after the restructure_mem_ref call not work? Ah, yeah, I meant to check again on that before submitting. IIRC, at some point the update_stmt () following restructure_mem_ref was still giving me verify errors. I thought perhaps the statements created by force_gimple_operand_gsi might be giving me
Re: Initial shrink-wrapping patch
On 10/06/11 01:47, Bernd Schmidt wrote: This appears to be because the split prologue contains a jump, which means the find_many_sub_blocks call reorders the block numbers, and our indices into bb_flags are off. Testing of the patch completed - ok? Regardless of split-stack it seems like a cleanup and eliminates a potential source of errors. Bernd
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
BTW, don't you also need -mfpmath=sse in dg-options? According to doc/invoke.texi ... @itemx -mfma ... These options will enable GCC to use these extended instructions in generated code, even without @option{-mfpmath=sse}. Seems -mfpmath=sse is unnecessary. Although, if this is wrong, we probably have to update the doc as well. Thanks, K
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael.
Re: [Patch] Support DEC-C extensions
On Oct 3, 2011, at 10:23 PM, Joseph S. Myers wrote: On Mon, 3 Oct 2011, Douglas Rupp wrote: On 9/30/2011 8:19 AM, Joseph S. Myers wrote: On Fri, 30 Sep 2011, Tristan Gingold wrote: If you prefer a target hook, I'm fine with that. I will write such a patch. I don't think it must be restricted to system headers, as it is possible that the user 'imports' such a function (and define it in one of VMS favorite languages such as macro-32 or bliss). If it's not restricted to system headers, then probably the option is better than the target hook. I'm not sure I understand the reasoning here. This seems fairly VMS specific so what is the downside for a target hook and user written headers? The language accepted by the compiler in the user's source code (as opposed to in system headers) shouldn't depend on the target except for certain well-defined areas such as target attributes and built-in functions; behaving the same across different systems is an important feature of GCC. This isn't one of those areas of target-dependence; it's generic syntax rather than e.g. exploiting a particular processor feature. So the consensus is for a dedicated option. Which one do you prefer ? -funnamed-variadic-parameter -fpointless-variadic-functions -fallow-parameterless-variadic-functions I will update my patch once this is settled. Thanks, Tristan.
[PATCH, AIX] Add missing macros PR39950
The appended patch adds a few macros that XLC now defines on AIX. - David * config/rs6000/aix.h (TARGET_OS_AIX_CPP_BUILTINS): Define __powerpc__, __PPC__, __unix__. Index: aix.h === --- aix.h (revision 179610) +++ aix.h (working copy) @@ -97,6 +97,9 @@ { \ builtin_define ("_IBMR2"); \ builtin_define ("_POWER"); \ + builtin_define ("__powerpc__"); \ + builtin_define ("__PPC__"); \ + builtin_define ("__unix__"); \ builtin_define ("_AIX"); \ builtin_define ("_AIX32"); \ builtin_define ("_AIX41"); \
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 6 Oct 2011, William J. Schmidt wrote: On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. These are all good ideas. I will think about casting this as a more general strength reduction over extended basic blocks outside of loops. First I'll put together some simple tests to see whether we're currently missing some non-address opportunities. snip + mult_op0 = TREE_OPERAND (offset, 0); + mult_op1 = TREE_OPERAND (offset, 1); + + if (TREE_CODE (mult_op0) != PLUS_EXPR + || TREE_CODE (mult_op1) != INTEGER_CST + || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST) +return NULL_TREE; + + t1 = TREE_OPERAND (base, 0); + t2 = TREE_OPERAND (mult_op0, 0); + c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1)); + c3 = TREE_INT_CST_LOW (mult_op1); Before accessing TREE_INT_CST_LOW you need to make sure the constants fit into a HWI using host_integerp () (which conveniently includes the check for INTEGER_CST). Note that you need to sign-extend the MEM_REF offset, thus use mem_ref_offset (base).low instead of TREE_INT_CST_LOW (TREE_OPERAND (base, 1)). Might be worth to add a testcase with negative offset ;) D'oh! . 
+ c4 = bitpos / BITS_PER_UNIT; + c = c1 + c2 * c3 + c4; And you don't know whether this operation overflows. Thus it's probably easiest to use double_ints instead of HOST_WIDE_INTs in all of the code. OK, thanks, will do. snip + /* Determine whether the expression can be represented with base and + offset components. */ + base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &volatilep, false); + if (!base || !offset) + return false; + + /* Look for a restructuring opportunity. */ + if ((mem_ref = restructure_base_and_offset (expr, gsi, base, + offset, bitpos)) == NULL_TREE) + return false; What I'm missing is a check whether the old address computation stmts will be dead after the transform. Hm, not quite sure what to do here. Prior to the transformation I'll have an assignment with something like: ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td) on LHS or RHS. Ta and Td will be part of the replacement. What should I be checking for? Doh, I thought you were matching gimple stmts that do the address computation. But now I see you are matching the tree returned from get_inner_reference. So no need to check anything for that case. But that keeps me wondering what you'll do if the accesses were all pointer arithmetic, not arrays. Thus, extern void foo (int, int, int); void f (int *p, unsigned int n) { foo (p[n], p[n+64], p[n+128]); } wouldn't that have the same issue and you wouldn't handle it? Richard.
[PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c
Hi! If the second argument of gimple_build_assign_with_ops is an SSA_NAME, gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs which does if (lhs && TREE_CODE (lhs) == SSA_NAME) SSA_NAME_DEF_STMT (lhs) = gs; so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed. Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs don't set SSA_NAME_DEF_STMT that has been already set by gimple_build_assign_with_ops. (vect_recog_pow_pattern, vect_recog_widen_sum_pattern, vect_operation_fits_smaller_type, vect_recog_over_widening_pattern): Likewise. --- gcc/tree-vect-patterns.c.jj 2011-10-06 12:37:34.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 13:19:44.0 +0200 @@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); *oprnd = new_oprnd; @@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM); @@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL); stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base); - SSA_NAME_DEF_STMT (var) = stmt; return stmt; } @@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info
(REPORT_DETAILS)) { @@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); oprnd = new_oprnd; @@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; oprnd = new_oprnd; *new_def_stmt = new_stmt; } @@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g VEC_safe_push (gimple, heap, *stmts, prev_stmt); var = vect_recog_temp_ssa_var (new_type, NULL); - pattern_stmt = gimple_build_assign_with_ops ( - gimple_assign_rhs_code (stmt), var, op0, op1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; + pattern_stmt + = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var, + op0, op1); STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt; STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt; @@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g new_oprnd = make_ssa_name (tmp, NULL); pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, var, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt; *type_in = get_vectype_for_scalar_type (new_type); Jakub
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 6, 2011 at 3:49 PM, Michael Matz m...@suse.de wrote: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. It seems to me it should then simply be if (!TREE_SIDE_EFFECTS (arg1) && simple_operand_p (arg1)) return fold-the-not-and-variant (); Richard.
[PATCH] Don't fold always_inline not yet inlined builtins in gimple_fold_builtin
Hi! The 3 functions in builtins.c that dispatch builtin folding give up if avoid_folding_inline_builtin (fndecl) returns true, because we want to wait with those functions until they are inlined (which for -D_FORTIFY_SOURCE contains security checks). Unfortunately gimple_fold_builtin calls fold_builtin_str* etc. directly and thus bypasses this check. This didn't show up often because most of the inlines have __restrict arguments and restrict casts weren't considered useless. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by richi on IRC, will commit to trunk momentarily. 2011-10-06 Jakub Jelinek ja...@redhat.com * tree.h (avoid_folding_inline_builtin): New prototype. * builtins.c (avoid_folding_inline_builtin): No longer static. * gimple-fold.c (gimple_fold_builtin): Give up if avoid_folding_inline_builtin returns true. --- gcc/tree.h.jj 2011-10-03 14:27:50.0 +0200 +++ gcc/tree.h 2011-10-06 13:26:32.0 +0200 @@ -5352,6 +5352,7 @@ fold_build_pointer_plus_hwi_loc (locatio fold_build_pointer_plus_hwi_loc (UNKNOWN_LOCATION, p, o) /* In builtins.c */ +extern bool avoid_folding_inline_builtin (tree); extern tree fold_call_expr (location_t, tree, bool); extern tree fold_builtin_fputs (location_t, tree, tree, bool, bool, tree); extern tree fold_builtin_strcpy (location_t, tree, tree, tree, tree); --- gcc/builtins.c.jj 2011-10-05 08:13:55.0 +0200 +++ gcc/builtins.c 2011-10-06 13:25:39.0 +0200 @@ -10360,7 +10360,7 @@ fold_builtin_varargs (location_t loc, tr been inlined, otherwise e.g. -D_FORTIFY_SOURCE checking might not be performed. */ -static bool +bool avoid_folding_inline_builtin (tree fndecl) { return (DECL_DECLARED_INLINE_P (fndecl) --- gcc/gimple-fold.c.jj2011-10-06 09:14:17.0 +0200 +++ gcc/gimple-fold.c 2011-10-06 13:29:08.0 +0200 @@ -828,6 +828,11 @@ gimple_fold_builtin (gimple stmt) if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_MD) return NULL_TREE; + /* Give up for always_inline inline builtins until they are + inlined. 
*/ + if (avoid_folding_inline_builtin (callee)) + return NULL_TREE; + /* If the builtin could not be folded, and it has no argument list, we're done. */ nargs = gimple_call_num_args (stmt); Jakub
[PATCH] Improve vector lowering a bit
This makes us look up previously generated intermediate vector results when decomposing an operation. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-06 Richard Guenther rguent...@suse.de * tree-vect-generic.c (vector_element): Look at previous generated results. Index: gcc/tree-vect-generic.c === *** gcc/tree-vect-generic.c (revision 179598) --- gcc/tree-vect-generic.c (working copy) *** vector_element (gimple_stmt_iterator *gs *** 536,541 --- 536,552 idx = build_int_cst (TREE_TYPE (idx), index); } + /* When lowering a vector statement sequence do some easy + simplification by looking through intermediate vector results. */ + if (TREE_CODE (vect) == SSA_NAME) + { + gimple def_stmt = SSA_NAME_DEF_STMT (vect); + if (is_gimple_assign (def_stmt) + && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST + || gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)) + vect = gimple_assign_rhs1 (def_stmt); + } + if (TREE_CODE (vect) == VECTOR_CST) { unsigned i;
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, I modified the patch so that it always just converts two leafs of a TRUTH_(AND|OR)IF chain into a TRUTH_(AND|OR) expression, if branch costs are high and leafs are simple without side-effects. Additionally I added some testcases for it. 2011-10-06 Kai Tietz kti...@redhat.com * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR to TRUTH_(AND|OR)_EXPR, if suitable. 2011-10-06 Kai Tietz kti...@redhat.com * gcc.dg/tree-ssa/ssa-ifbranch-1.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-2.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-3.c: New test. * gcc.dg/tree-ssa/ssa-ifbranch-4.c: New test. Bootstrapped and tested for all languages (including Ada and Obj-C++) on host x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options "-O2 -fdump-tree-gimple" } */ +/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c) +{ + if (a && b && c) + return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times "if" 2 "gimple" } } */ +/* { dg-final { cleanup-tree-dump "gimple" } } */ Index: gcc-head/gcc/fold-const.c === --- gcc-head.orig/gcc/fold-const.c +++ gcc-head/gcc/fold-const.c @@ -8387,6 +8387,45 @@ fold_truth_andor (location_t loc, enum t if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0) return tem; + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + && !TREE_SIDE_EFFECTS (arg1) + && LOGICAL_OP_NON_SHORT_CIRCUIT + /* floats might trap.
*/ + && !FLOAT_TYPE_P (TREE_TYPE (arg1)) + && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR + && simple_operand_p (arg1)) + || ((TREE_CODE_CLASS (TREE_CODE (arg1)) == tcc_comparison + || TREE_CODE (arg1) == TRUTH_NOT_EXPR) + /* Float comparison might trap. */ + && !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))) + && simple_operand_p (TREE_OPERAND (arg1, 0))))) + { + /* We want to combine truth-comparison for + ((W TRUTH-ANDOR X) TRUTH-ANDORIF Y) TRUTH-ANDORIF Z, + if Y and Z are simple operands and have no side-effect to + ((W TRUTH-ANDOR X) TRUTH-IF (Y TRUTH-ANDOR Z)). */ + if (TREE_CODE (arg0) == code + && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + && simple_operand_p (TREE_OPERAND (arg0, 1))) + { + tem = build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, TREE_OPERAND (arg0, 1), arg1); + return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y + are simple operands and have no side-effects. */ + if (simple_operand_p (arg0) + && !TREE_SIDE_EFFECTS (arg0)) + return build2_loc (loc, + (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR + : TRUTH_OR_EXPR), + type, arg0, arg1); + } + return NULL_TREE; } Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { !
mips*-*-* s390*-*-* avr-*-* mn10300-*-* } } } */ +/* { dg-options "-O2 -fdump-tree-gimple" } */ +/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */ + +extern int doo1 (void); +extern int doo2 (void); + +int bar (int a, int b, int c, int d) +{ + if (a && b && c && d) + return doo1 (); + return doo2 (); +} + +/* { dg-final { scan-tree-dump-times "if" 2 "gimple" } } */ +/* { dg-final { cleanup-tree-dump "gimple" } } */ Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c === --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c @@ -0,0 +1,18 @@ +/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and + lower values in BRANCH_COST. */ +/* { dg-do compile { target { ! mips*-*-* s390*-*-* avr-*-*
Re: [PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c
On Thu, 6 Oct 2011, Jakub Jelinek wrote: Hi! If the second argument of gimple_build_assign_with_ops is an SSA_NAME, gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs which does if (lhs TREE_CODE (lhs) == SSA_NAME) SSA_NAME_DEF_STMT (lhs) = gs; so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed. Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs don't set SSA_NAME_DEF_STMT that has been already set by gimple_build_assign_with_ops. (vect_recog_pow_pattern, vect_recog_widen_sum_pattern, vect_operation_fits_smaller_type, vect_recog_over_widening_pattern): Likewise. --- gcc/tree-vect-patterns.c.jj 2011-10-06 12:37:34.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 13:19:44.0 +0200 @@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); *oprnd = new_oprnd; @@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0, oprnd1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM); @@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL); stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base); - SSA_NAME_DEF_STMT (var) = stmt; return stmt; } @@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl var = vect_recog_temp_ssa_var (type, NULL); pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var, oprnd0, oprnd1); - 
SSA_NAME_DEF_STMT (var) = pattern_stmt; if (vect_print_dump_info (REPORT_DETAILS)) { @@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt; VEC_safe_push (gimple, heap, *stmts, def_stmt); oprnd = new_oprnd; @@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple new_oprnd = make_ssa_name (tmp, NULL); new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, oprnd, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = new_stmt; oprnd = new_oprnd; *new_def_stmt = new_stmt; } @@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g VEC_safe_push (gimple, heap, *stmts, prev_stmt); var = vect_recog_temp_ssa_var (new_type, NULL); - pattern_stmt = gimple_build_assign_with_ops ( - gimple_assign_rhs_code (stmt), var, op0, op1); - SSA_NAME_DEF_STMT (var) = pattern_stmt; + pattern_stmt + = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var, + op0, op1); STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt; STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt; @@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g new_oprnd = make_ssa_name (tmp, NULL); pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, var, NULL_TREE); - SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt; STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt; *type_in = get_vectype_for_scalar_type (new_type); Jakub -- Richard Guenther rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
[PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)
Hi! CAST_RESTRICT based disambiguation unfortunately isn't reliable, e.g. to store a non-restrict pointer into a restricted field, we add a non-useless cast to restricted pointer in the gimplifier, and while we don't consider that field to have a special restrict tag because it is unsafe to do so, we unfortunately create it for the CAST_RESTRICT before that and end up with different restrict tags for the same thing. See the PR for more details. This patch turns off CAST_RESTRICT handling for now; in the future we might try to replace it by explicit CAST_RESTRICT stmts in some form, but need to solve problems with multiple inlined copies of the same function with restrict arguments or restrict variables in it and intermixed code from them (or similarly code from different non-overlapping source blocks). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 4.6 too? 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/49279 * tree-ssa-structalias.c (find_func_aliases): Don't handle CAST_RESTRICT. * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow restrict propagation. * tree-ssa.c (useless_type_conversion_p): Don't return false if TYPE_RESTRICT differs. * gcc.dg/tree-ssa/restrict-4.c: XFAIL. * gcc.c-torture/execute/pr49279.c: New test. --- gcc/tree-ssa-structalias.c.jj 2011-10-04 10:18:29.0 +0200 +++ gcc/tree-ssa-structalias.c 2011-10-05 12:43:42.0 +0200 @@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt) (!in_ipa_mode || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop))) make_escape_constraint (rhsop); - /* If this is a conversion of a non-restrict pointer to a - restrict pointer track it with a new heapvar. */ - else if (gimple_assign_cast_p (t) - && POINTER_TYPE_P (TREE_TYPE (rhsop)) - && POINTER_TYPE_P (TREE_TYPE (lhsop)) - && !TYPE_RESTRICT (TREE_TYPE (rhsop)) - && TYPE_RESTRICT (TREE_TYPE (lhsop))) - make_constraint_from_restrict (get_vi_for_tree (lhsop), - "CAST_RESTRICT"); } /* Handle escapes through return.
*/ else if (gimple_code (t) == GIMPLE_RETURN --- gcc/tree-ssa-forwprop.c.jj 2011-10-04 14:36:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2011-10-05 12:46:32.0 +0200 @@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name ((rhs_code == SSA_NAME && rhs == name) || CONVERT_EXPR_CODE_P (rhs_code))) { - /* Don't propagate restrict pointer's RHS. */ - if (TYPE_RESTRICT (TREE_TYPE (lhs)) - && !TYPE_RESTRICT (TREE_TYPE (name)) - && !is_gimple_min_invariant (def_rhs)) - return false; /* Only recurse if we don't deal with a single use or we cannot do the propagation to the current statement. In particular we can end up with a conversion needed for a non-invariant --- gcc/tree-ssa.c.jj 2011-09-15 12:18:54.0 +0200 +++ gcc/tree-ssa.c 2011-10-05 12:44:52.0 +0200 @@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty != TYPE_ADDR_SPACE (TREE_TYPE (inner_type))) return false; - /* Do not lose casts to restrict qualified pointers. */ - if ((TYPE_RESTRICT (outer_type) - != TYPE_RESTRICT (inner_type)) - && TYPE_RESTRICT (outer_type)) - return false; - /* If the outer type is (void *), the conversion is not necessary.
*/ if (VOID_TYPE_P (TREE_TYPE (outer_type))) return true; --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj 2011-10-04 14:33:08.0 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c 2011-10-05 16:22:33.232433231 +0200 @@ -22,5 +22,5 @@ bar (int *x, int y) return p1[y]; } -/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" { xfail *-*-* } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj 2011-10-05 13:32:43.087670846 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr49279.c 2011-10-05 13:32:43.087670846 +0200 @@ -0,0 +1,35 @@ +/* PR tree-optimization/49279 */ +extern void abort (void); + +struct S { int a; int *__restrict p; }; + +__attribute__((noinline, noclone)) +struct S *bar (struct S *p) +{ + struct S *r; + asm volatile ("" : "=r" (r) : "0" (p) : "memory"); + return r; +} + +__attribute__((noinline, noclone)) +int +foo (int *p, int *q) +{ + struct S s, *t; + s.a = 1; + s.p = p; + t = bar (&s); + t->p = q; + s.p[0] = 0; + t->p[0] = 1; + return s.p[0]; +} + +int +main () +{ + int a, b; + if (foo (&a, &b) != 1) + abort (); + return 0; +} Jakub
Re: Builtin infrastructure change
On Thu, Oct 06, 2011 at 03:23:07PM +0200, Tobias Burnus wrote: On 10/06/2011 03:02 PM, Michael Meissner wrote: On the x86 (with Fedora 13), I built and tested the C, C++, Objective C, Java, Ada, and Go languages with no regressions On a power6 box with RHEL 6.1, I have done the same for C, C++, Objective C, Java, and Ada languages with no regressions. Any reason for not building and testing Fortran? Especially as you patch gcc/fortran/{trans*.c,f95-lang.c}? Tobias Brain fault on my part. I tested the previous set of patches with Fortran. Since I had to explicitly add the languages to pick up Ada and Go, I seemed to have dropped Fortran. Sigh. Sorry about that. I just started the powerpc bootstrap, since that is a lot faster. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
[v3] Avoid spurious fails when running the testsuite with -std=gnu++0x
Hi, tested x86_64-linux, committed. Paolo. 2011-10-06 Paolo Carlini paolo.carl...@oracle.com * testsuite/27_io/ios_base/cons/assign_neg.cc: Tidy dg- directives, for C++0x testing too. * testsuite/27_io/ios_base/cons/copy_neg.cc: Likewise. * testsuite/ext/pb_ds/example/hash_resize_neg.cc: Likewise. * testsuite/24_iterators/istreambuf_iterator/requirements/ base_classes.cc: Adjust for C++0x testing. * testsuite/ext/codecvt/char-1.cc: Avoid warnings in C++0x mode. * testsuite/ext/codecvt/char-2.cc: Likewise. * testsuite/ext/codecvt/wchar_t.cc: Likewise. Index: testsuite/27_io/ios_base/cons/assign_neg.cc === --- testsuite/27_io/ios_base/cons/assign_neg.cc (revision 179595) +++ testsuite/27_io/ios_base/cons/assign_neg.cc (working copy) @@ -18,21 +18,18 @@ // with this library; see the file COPYING3. If not see // <http://www.gnu.org/licenses/>. - #include <ios> // Library defect report //50. Copy constructor and assignment operator of ios_base -class test_base : public std::ios_base { }; +class test_base : public std::ios_base { }; // { dg-error "within this context|deleted" } void test01() { // assign test_base io1; test_base io2; - io1 = io2; + io1 = io2; // { dg-error "synthesized|deleted" } } -// { dg-error "synthesized" { target *-*-* } 33 } -// { dg-error "within this context" { target *-*-* } 26 } -// { dg-error "is private" { target *-*-* } 791 } -// { dg-error "operator=" { target *-*-* } 0 } + +// { dg-prune-output "include" } Index: testsuite/27_io/ios_base/cons/copy_neg.cc === --- testsuite/27_io/ios_base/cons/copy_neg.cc (revision 179595) +++ testsuite/27_io/ios_base/cons/copy_neg.cc (working copy) @@ -18,21 +18,18 @@ // with this library; see the file COPYING3. If not see // <http://www.gnu.org/licenses/>. - #include <ios> // Library defect report //50.
Copy constructor and assignment operator of ios_base -struct test_base : public std::ios_base +struct test_base : public std::ios_base // { dg-error within this context|deleted } { }; void test02() { // copy ctor test_base io1; - test_base io2 = io1; + test_base io2 = io1; // { dg-error synthesized|deleted } } -// { dg-error within this context { target *-*-* } 26 } -// { dg-error synthesized { target *-*-* } 33 } -// { dg-error is private { target *-*-* } 788 } -// { dg-error copy constructor { target *-*-* } 0 } + +// { dg-prune-output include } Index: testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc === --- testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc (revision 179595) +++ testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc (working copy) @@ -1,7 +1,8 @@ // { dg-do compile } // 1999-06-28 bkoz -// Copyright (C) 1999, 2001, 2003, 2009 Free Software Foundation, Inc. +// Copyright (C) 1999, 2001, 2003, 2009, 2010, 2011 +// Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -31,8 +32,15 @@ // Check for required base class. typedef istreambuf_iteratorchar test_iterator; typedef char_traitschar::off_type off_type; - typedef iteratorinput_iterator_tag, char, off_type, char*, char base_iterator; + typedef iteratorinput_iterator_tag, char, off_type, char*, +#ifdef __GXX_EXPERIMENTAL_CXX0X__ +char +#else +char +#endif +base_iterator; + istringstream isstream(this tag); test_iterator r_it(isstream); base_iterator* base __attribute__((unused)) = r_it; Index: testsuite/ext/pb_ds/example/hash_resize_neg.cc === --- testsuite/ext/pb_ds/example/hash_resize_neg.cc (revision 179595) +++ testsuite/ext/pb_ds/example/hash_resize_neg.cc (working copy) @@ -1,7 +1,8 @@ // { dg-do compile } // -*- C++ -*- -// Copyright (C) 2005, 2006, 2007, 2009 Free Software Foundation, Inc. 
+// Copyright (C) 2005, 2006, 2007, 2009, 2010, 2011 +// Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the terms @@ -60,4 +61,4 @@ h.resize(20); // { dg-error required from } } -// { dg-error invalid { target *-*-* } 187 } +// { dg-prune-output include } Index: testsuite/ext/codecvt/char-1.cc === --- testsuite/ext/codecvt/char-1.cc (revision 179595) +++ testsuite/ext/codecvt/char-1.cc (working copy) @@ -4,6 +4,7 @@ // 2000-08-22 Benjamin Kosnik b...@cygnus.com // Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009 +// 2010, 2011
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Richard Guenther wrote: + ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison + && TREE_CODE (arg1) != TRUTH_NOT_EXPR) + || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0) ? simple_operand_p would have rejected both ! and comparisons. I miss a test for side-effects on arg0 (and probably simple_operand_p there, as well). He has it in the if() body. But why? The point of ANDIF/ORIF is to not evaluate the second argument for side-effects when the first argument is false/true already, and further to establish an order between both evaluations. The side-effect on the first arg is always evaluated. AND/OR always evaluate both arguments (in unspecified order), but as he checks the second one for being free of side effects already, that alone is already equivalent to ANDIF/ORIF. No need to check something on the first argument. Ciao, Michael. That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Regards, Kai
Re: [PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)
On Thu, 6 Oct 2011, Jakub Jelinek wrote: Hi! CAST_RESTRICT based disambiguation unfortunately isn't reliable, e.g. to store a non-restrict pointer into a restricted field, we add a non-useless cast to restricted pointer in the gimplifier, and while we don't consider that field to have a special restrict tag because it is unsafe to do so, we unfortunately create it for the CAST_RESTRICT before that and end up with different restrict tags for the same thing. See the PR for more details. This patch turns off CAST_RESTRICT handling for now, in the future we might try to replace it by explicit CAST_RESTRICT stmts in some form, but need to solve problems with multiple inlined copies of the same function with restrict arguments or restrict variables in it and intermixed code from them (or similarly code from different non-overlapping source blocks). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 4.6 too? Ok for trunk. Ok for 4.6 with the tree-ssa.c change omitted - and the stmt folding patch applied. Thanks, Richard. 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/49279 * tree-ssa-structalias.c (find_func_aliases): Don't handle CAST_RESTRICT. * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow restrict propagation. * tree-ssa.c (useless_type_conversion_p): Don't return false if TYPE_RESTRICT differs. * gcc.dg/tree-ssa/restrict-4.c: XFAIL. * gcc.c-torture/execute/pr49279.c: New test. --- gcc/tree-ssa-structalias.c.jj 2011-10-04 10:18:29.0 +0200 +++ gcc/tree-ssa-structalias.c2011-10-05 12:43:42.0 +0200 @@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt) (!in_ipa_mode || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop))) make_escape_constraint (rhsop); - /* If this is a conversion of a non-restrict pointer to a - restrict pointer track it with a new heapvar. 
*/ - else if (gimple_assign_cast_p (t) - POINTER_TYPE_P (TREE_TYPE (rhsop)) - POINTER_TYPE_P (TREE_TYPE (lhsop)) - !TYPE_RESTRICT (TREE_TYPE (rhsop)) - TYPE_RESTRICT (TREE_TYPE (lhsop))) - make_constraint_from_restrict (get_vi_for_tree (lhsop), -CAST_RESTRICT); } /* Handle escapes through return. */ else if (gimple_code (t) == GIMPLE_RETURN --- gcc/tree-ssa-forwprop.c.jj2011-10-04 14:36:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2011-10-05 12:46:32.0 +0200 @@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name ((rhs_code == SSA_NAME rhs == name) || CONVERT_EXPR_CODE_P (rhs_code))) { - /* Don't propagate restrict pointer's RHS. */ - if (TYPE_RESTRICT (TREE_TYPE (lhs)) -!TYPE_RESTRICT (TREE_TYPE (name)) -!is_gimple_min_invariant (def_rhs)) - return false; /* Only recurse if we don't deal with a single use or we cannot do the propagation to the current statement. In particular we can end up with a conversion needed for a non-invariant --- gcc/tree-ssa.c.jj 2011-09-15 12:18:54.0 +0200 +++ gcc/tree-ssa.c2011-10-05 12:44:52.0 +0200 @@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty != TYPE_ADDR_SPACE (TREE_TYPE (inner_type))) return false; - /* Do not lose casts to restrict qualified pointers. */ - if ((TYPE_RESTRICT (outer_type) -!= TYPE_RESTRICT (inner_type)) -TYPE_RESTRICT (outer_type)) - return false; - /* If the outer type is (void *), the conversion is not necessary. 
*/ if (VOID_TYPE_P (TREE_TYPE (outer_type))) return true; --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj 2011-10-04 14:33:08.0 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c2011-10-05 16:22:33.232433231 +0200 @@ -22,5 +22,5 @@ bar (int *x, int y) return p1[y]; } -/* { dg-final { scan-tree-dump-times return 1; 2 optimized } } */ +/* { dg-final { scan-tree-dump-times return 1; 2 optimized { xfail *-*-* } } } */ /* { dg-final { cleanup-tree-dump optimized } } */ --- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj 2011-10-05 13:32:43.087670846 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr49279.c 2011-10-05 13:32:43.087670846 +0200 @@ -0,0 +1,35 @@ +/* PR tree-optimization/49279 */ +extern void abort (void); + +struct S { int a; int *__restrict p; }; + +__attribute__((noinline, noclone)) +struct S *bar (struct S *p) +{ + struct S *r; + asm volatile ( : =r (r) : 0 (p) : memory); + return r; +} + +__attribute__((noinline, noclone)) +int +foo (int *p, int *q) +{ + struct S s, *t; + s.a = 1; + s.p = p; + t = bar (s); + t-p = q; + s.p[0] = 0; + t-p[0] = 1; + return s.p[0]; +} + +int +main () +{ + int a, b; + if (foo (a, b) != 1) +abort
Re: rfa: remove get_var_ann (was: Fix PR50260)
Hi, On Sat, 3 Sep 2011, Richard Guenther wrote: OTOH it's a nice invariant that can actually be checked for (that all reachable vars whatsoever have to be in referenced_vars), so I'm going to do that. Yes, until we get rid of referenced_vars (which we still should do at some point...) that's the best. Okay, like so then. Regstrapped on x86_64-linux. (Note that sometimes I use add_referenced_vars, and sometimes find_referenced_vars_in, the latter when I would have to add several add_referenced_vars for one statement). IIRC we have some verification code even, and wonder why it doesn't trigger. Nope, we don't. But with the patch we segfault in case this happens again, which is good enough checking for me. Ciao, Michael. * tree-flow.h (get_var_ann): Don't declare. * tree-flow-inline.h (get_var_ann): Remove. (set_is_used): Use var_ann, not get_var_ann. * tree-dfa.c (add_referenced_var): Inline body of get_var_ann. * tree-profile.c (gimple_gen_edge_profiler): Call find_referenced_var_in. (gimple_gen_interval_profiler): Ditto. (gimple_gen_pow2_profiler): Ditto. (gimple_gen_one_value_profiler): Ditto. (gimple_gen_average_profiler): Ditto. (gimple_gen_ior_profiler): Ditto. (gimple_gen_ic_profiler): Ditto plus call add_referenced_var. (gimple_gen_ic_func_profiler): Call add_referenced_var. * tree-mudflap.c (execute_mudflap_function_ops): Call add_referenced_var. Index: tree-flow.h === --- tree-flow.h (revision 178488) +++ tree-flow.h (working copy) @@ -278,7 +278,6 @@ typedef struct immediate_use_iterator_d typedef struct var_ann_d *var_ann_t; static inline var_ann_t var_ann (const_tree); -static inline var_ann_t get_var_ann (tree); static inline void update_stmt (gimple); static inline int get_lineno (const_gimple); Index: tree-flow-inline.h === --- tree-flow-inline.h (revision 178488) +++ tree-flow-inline.h (working copy) @@ -145,16 +145,6 @@ var_ann (const_tree t) return p ? *p : NULL; } -/* Return the variable annotation for T, which must be a _DECL node. 
- Create the variable annotation if it doesn't exist. */ -static inline var_ann_t -get_var_ann (tree var) -{ - var_ann_t *p = DECL_VAR_ANN_PTR (var); - gcc_checking_assert (p); - return *p ? *p : create_var_ann (var); -} - /* Get the number of the next statement uid to be allocated. */ static inline unsigned int gimple_stmt_max_uid (struct function *fn) @@ -568,7 +558,7 @@ phi_arg_index_from_use (use_operand_p us static inline void set_is_used (tree var) { - var_ann_t ann = get_var_ann (var); + var_ann_t ann = var_ann (var); ann-used = true; } Index: tree-dfa.c === --- tree-dfa.c (revision 178488) +++ tree-dfa.c (working copy) @@ -580,8 +580,9 @@ set_default_def (tree var, tree def) bool add_referenced_var (tree var) { - get_var_ann (var); gcc_assert (DECL_P (var)); + if (!*DECL_VAR_ANN_PTR (var)) +create_var_ann (var); /* Insert VAR into the referenced_vars hash table if it isn't present. */ if (referenced_var_check_and_insert (var)) Index: tree-profile.c === --- tree-profile.c (revision 178408) +++ tree-profile.c (working copy) @@ -224,6 +224,7 @@ gimple_gen_edge_profiler (int edgeno, ed one = build_int_cst (gcov_type_node, 1); stmt1 = gimple_build_assign (gcov_type_tmp_var, ref); gimple_assign_set_lhs (stmt1, make_ssa_name (gcov_type_tmp_var, stmt1)); + find_referenced_vars_in (stmt1); stmt2 = gimple_build_assign_with_ops (PLUS_EXPR, gcov_type_tmp_var, gimple_assign_lhs (stmt1), one); gimple_assign_set_lhs (stmt2, make_ssa_name (gcov_type_tmp_var, stmt2)); @@ -270,6 +271,7 @@ gimple_gen_interval_profiler (histogram_ val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_interval_profiler_fn, 4, ref_ptr, val, start, steps); + find_referenced_vars_in (call); gsi_insert_before (gsi, call, GSI_NEW_STMT); } @@ -290,6 +292,7 @@ gimple_gen_pow2_profiler (histogram_valu true, NULL_TREE, true, GSI_SAME_STMT); val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_pow2_profiler_fn, 2, ref_ptr, val); + find_referenced_vars_in 
(call); gsi_insert_before (gsi, call, GSI_NEW_STMT); } @@ -310,6 +313,7 @@ gimple_gen_one_value_profiler (histogram true, NULL_TREE, true, GSI_SAME_STMT); val = prepare_instrumented_value (gsi, value); call = gimple_build_call (tree_one_value_profiler_fn, 2, ref_ptr, val); + find_referenced_vars_in (call); gsi_insert_before
[PATCH][ARM] Fix broken shift patterns
This patch is a follow-up both to my patches here: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00049.html and Paul Brook's patch here: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01076.html The patch fixes both the original problem, in which negative shift constants caused an ICE (pr50193), and the problem introduced by Paul's patch, in which a*64+b is not properly optimized. However, it does not attempt to fix Richard Sandiford's observation that there may be a latent problem with the 'M' constraint which could lead reload to cause a recog ICE. I believe this patch to be nothing but an improvement over the current state, and that a fix to the constraint problem should be a separate patch. On that basis, am I OK to commit? Now, let me explain the other problem: As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit a shift to be expressed as either a shift type and amount (register or constant), or as a multiply and power-of-two constant. This is necessary because the canonical form of (plus (ashift x y) z) appears to be (plus (mult x 2^y) z), presumably for the benefit of multiply-and-accumulate optimizations. (Minus is similarly affected, but other shiftable operations are unaffected, and this only applies to left shifts, of course.) The (possible) problem is that the meanings of the constants for mult and ashift are very different, but the arm.md file has these unified into a single pattern using a single 'M' constraint that must allow both types of constant unconditionally. This is safe for the vast majority of passes because they check recog before they make a change, and anyway don't make changes without understanding the logic. But reload has a feature where it can pull a constant from a register and convert it to an immediate if the constraints allow; crucially, it doesn't check the predicates. No doubt it shouldn't need to, but the ARM port appears to be breaking the rules.
Problem scenario 1: Consider pattern (plus (mult r1 r2) r3). It so happens that reload knows that r2 contains a constant, say 20, so reload checks to see if that could be converted to an immediate. Now, 20 is not a power of two, so recog would reject it, but it is in the range 0..31 so it does match the 'M' constraint. Oops! Problem scenario 2: Consider pattern (ashiftrt r1 r2). Again, it so happens that reload knows that r2 contains a constant, in this case let's say 64, so again reload checks to see if that could be converted to an immediate. This time, 64 is not in the range 0..31, so recog would reject it, but it is a power of two, so it does match the 'M' constraint. Again, oops! I see two ways to fix this properly: 1. Duplicate all the patterns in the machine description, once for the mult case, and once for the other cases. This could probably be done with a code iterator, if preferred. 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it always uses the (presumably cheaper) shift-and-add option. However, this would require all other targets where madd really is the best option to fix it up. (I'd imagine that two instructions for shift and add would be cheaper speed wise, if properly scheduled, on most targets? That doesn't help the size optimization though.) However, it's not obvious to me that this needs fixing: * The failure mode would be an ICE, and we've not seen any. * There's a comment in arm.c:shift_op that suggests that this can't happen, somehow, at least in the mult case. - I'm not sure exactly how reload works, but it seems reasonable that it will never try to convert a register to an immediate because the pattern does not allow registers in the first place. - This logic doesn't hold in the opposite case though. Have I explained all that clearly? My conclusion after studying all this is that we don't need to do anything until somebody reports an ICE, at which point it becomes worth the effort of fixing it. 
Other opinions welcome! :) Andrew 2011-10-06 Andrew Stubbs a...@codesourcery.com gcc/ * config/arm/predicates.md (shift_amount_operand): Remove constant range check. (shift_operator): Check range of constants for all shift operators. gcc/testsuite/ * gcc.dg/pr50193-1.c: New file. * gcc.target/arm/shiftable.c: New file. --- src/gcc-mainline/gcc/config/arm/predicates.md | 15 ++- src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c | 10 + .../gcc/testsuite/gcc.target/arm/shiftable.c | 43 3 files changed, 65 insertions(+), 3 deletions(-) create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/shiftable.c diff --git a/src/gcc-mainline/gcc/config/arm/predicates.md b/src/gcc-mainline/gcc/config/arm/predicates.md index 27ba603..7307fd5 100644 --- a/src/gcc-mainline/gcc/config/arm/predicates.md +++
Re: rfa: remove get_var_ann (was: Fix PR50260)
On Thu, Oct 6, 2011 at 4:59 PM, Michael Matz m...@suse.de wrote: Hi, On Sat, 3 Sep 2011, Richard Guenther wrote: OTOH it's a nice invariant that can actually be checked for (that all reachable vars whatsoever have to be in referenced_vars), so I'm going to do that. Yes, until we get rid of referenced_vars (which we still should do at some point...) that's the best. Okay, like so then. Regstrapped on x86_64-linux. (Note that sometimes I use add_referenced_vars, and sometimes find_referenced_vars_in, the latter when I would have to add several add_referenced_vars for one statement). IIRC we have some verification code even, and wonder why it doesn't trigger. Nope, we don't. But with the patch we segfault in case this happens again, which is good enough checking for me. Ok. Thanks, Richard. Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF -> AND (when possible, i.e. no side-effects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comment states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independent of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this reassociation doesn't make sense anymore, as ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) Ciao, Michael.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: That's not the whole story. The difference between TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't. Yes, of course. That is what implements the short-circuit semantics. But as Richard already mentioned I also don't understand why you do the reassociation at that point. Why not simply rewrite ANDIF -> AND (when possible, i.e. no side-effects on arg1, and desirable, i.e. when LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other folders do reassociation? I ask because your comment states to transform: ((W AND X) ANDIF Y) ANDIF Z into (W AND X) ANDIF (Y AND Z) (under condition that Y and Z are simple operands). In fact you don't check the form of arg0,0, i.e. the W AND X here. Independent of that it doesn't make sense, because if Y and Z are easy (simple and no side-effects), then Y AND Z is too, and therefore you should transform this (if at all) into: (W AND X) AND (Y AND Z) at which point this reassociation doesn't make sense anymore, as Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and therefore it isn't transformed into an AND. ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. For example if (a1 && a2 && a3 && ... && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed much.
But you are right that, from the point of view of reassociation optimization, it could in some cases be more profitable to have packed all elements into one AND-chain. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. For example if (a1 && a2 && a3 && ... && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed much. But you are right that, from the point of view of reassociation optimization, it could in some cases be more profitable to have packed all elements into one AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other hand teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) < 4U, although it could be, when branch costs cause it to be broken up into several GIMPLE_COND stmts. Or if the user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). Jakub
Re: Initial shrink-wrapping patch
On 10/06/2011 06:37 AM, Bernd Schmidt wrote: On 10/06/11 01:47, Bernd Schmidt wrote: This appears to be because the split prologue contains a jump, which means the find_many_sub_blocks call reorders the block numbers, and our indices into bb_flags are off. Testing of the patch completed - ok? Regardless of split-stack it seems like a cleanup and eliminates a potential source of errors. Yes, patch is ok. r~
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this reassociation doesn't make sense anymore, as Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and therefore it isn't transformed into an AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point, that it might not be that good to always modify unconditionally to an AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes an RHS that is not a too-long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF -> AND at the outermost level? Ciao, Michael.
[cxx-mem-model] Add lockfree tests
This patch supplies __sync_mem_is_lock_free (size) and __sync_mem_always_lock_free (size). __sync_mem_always_lock_free requires a compile time constant, and returns true if an object of the specified size will *always* generate lock free instructions on the current architecture. Otherwise false is returned. __sync_mem_is_lock_free also returns true if instructions will always be lock free, but if the answer is not true, it resolves to an external call named '__sync_mem_is_lock_free' which will be supplied externally. Presumably by whatever library or application is providing the other external __sync_mem routines as documented in http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary New tests, documentation are provided, bootstraps on x86_64-unknown-linux-gnu and causes no new testsuite regressions. Andrew * optabs.h (DOI_sync_mem_always_lock_free): New. (DOI_sync_mem_is_lock_free): New. (sync_mem_always_lock_free_optab, sync_mem_is_lock_free_optab): New. * builtins.c (fold_builtin_sync_mem_always_lock_free): New. (expand_builtin_sync_mem_always_lock_free): New. (fold_builtin_sync_mem_is_lock_free): New. (expand_builtin_sync_mem_is_lock_free): New. (expand_builtin): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. (fold_builtin_1): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. * sync-builtins.def: Add BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE. * builtin-types.def: Add BT_FN_BOOL_SIZE type. * fortran/types.def: Add BT_SIZE and BT_FN_BOOL_SIZE. * doc/extend.texi: Add documentation. * testsuite/gcc.dg/sync-mem-invalid.c: Test for invalid param. * testsuite/gcc.dg/sync-mem-lockfree[-aux].c: New tests. 
Index: optabs.h === *** optabs.h(revision 178916) --- optabs.h(working copy) *** enum direct_optab_index *** 708,713 --- 708,715 DOI_sync_mem_nand, DOI_sync_mem_xor, DOI_sync_mem_or, + DOI_sync_mem_always_lock_free, + DOI_sync_mem_is_lock_free, DOI_sync_mem_thread_fence, DOI_sync_mem_signal_fence, *** typedef struct direct_optab_d *direct_op *** 801,806 --- 803,812 (direct_optab_table[(int) DOI_sync_mem_xor]) #define sync_mem_or_optab \ (direct_optab_table[(int) DOI_sync_mem_or]) + #define sync_mem_always_lock_free_optab \ + (direct_optab_table[(int) DOI_sync_mem_always_lock_free]) + #define sync_mem_is_lock_free_optab \ + (direct_optab_table[(int) DOI_sync_mem_is_lock_free]) #define sync_mem_thread_fence_optab \ (direct_optab_table[(int) DOI_sync_mem_thread_fence]) #define sync_mem_signal_fence_optab \ Index: builtins.c === *** builtins.c (revision 179522) --- builtins.c (working copy) *** expand_builtin_sync_mem_fetch_op (enum m *** 5386,5391 --- 5386,5472 return expand_sync_mem_fetch_op (target, mem, val, code, model, fetch_after); } + /* Return true if size ARG is always lock free on this architecture. */ + static tree + fold_builtin_sync_mem_always_lock_free (tree arg) + { + int size; + enum machine_mode mode; + enum insn_code icode; + + if (TREE_CODE (arg) != INTEGER_CST) + return NULL_TREE; + + /* Check if a compare_and_swap pattern exists for the mode which represents + the required size. The pattern is not allowed to fail, so the existence + of the pattern indicates support is present. */ + size = INTVAL (expand_normal (arg)) * BITS_PER_UNIT; + mode = mode_for_size (size, MODE_INT, 0); + icode = direct_optab_handler (sync_compare_and_swap_optab, mode); + + if (icode == CODE_FOR_nothing) + return integer_zero_node; + + return integer_one_node; + } + + /* Return true if the first argument to call EXP represents a size of +object than will always generate lock-free instructions on this target. +Otherwise return false. 
*/ + static rtx + expand_builtin_sync_mem_always_lock_free (tree exp) + { + tree size; + tree arg = CALL_EXPR_ARG (exp, 0); + + if (TREE_CODE (arg) != INTEGER_CST) + { + error (non-constant argument to __sync_mem_always_lock_free); + return const0_rtx; + } + + size = fold_builtin_sync_mem_always_lock_free (arg); + if (size == integer_one_node) + return const1_rtx; + return const0_rtx; + } + + /* Return a one or zero if it can be determined that size ARG is lock free on +this architecture. */ + static tree + fold_builtin_sync_mem_is_lock_free (tree arg) + { + tree always = fold_builtin_sync_mem_always_lock_free (arg); + + /* If it isnt always lock free, don't generate a result. */ + if (always == integer_one_node) + return always; + + return NULL_TREE; + } + + /* Return one or zero if the first argument to call EXP represents a size of +object than can generate lock-free instructions on
[testsuite] Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125)
After almost two months, two tests are still XPASSing everywhere: XPASS: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12) XPASS: gcc.dg/uninit-pr19430.c (test for warnings, line 32) XPASS: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41) I think it's time to remove the xfail's. Tested with the appropriate runtest invocation on i386-pc-solaris2.10, ok for mainline? Rainer 2011-10-06 Rainer Orth r...@cebitec.uni-bielefeld.de PR middle-end/50125 * gcc.dg/uninit-B.c (baz): Remove xfail *-*-*. * gcc.dg/uninit-pr19430.c (main): Remove xfail *-*-*. (bar3): Likewise. # HG changeset patch # Parent 60c73f26147c2e549be69d750637ed45ca48e93c Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125) diff --git a/gcc/testsuite/gcc.dg/uninit-B.c b/gcc/testsuite/gcc.dg/uninit-B.c --- a/gcc/testsuite/gcc.dg/uninit-B.c +++ b/gcc/testsuite/gcc.dg/uninit-B.c @@ -9,7 +9,7 @@ void baz (void) { int i; - if (i) /* { dg-warning is used uninitialized uninit i warning { xfail *-*-* } } */ + if (i) /* { dg-warning is used uninitialized uninit i warning } */ bar (i); foo (i); } diff --git a/gcc/testsuite/gcc.dg/uninit-pr19430.c b/gcc/testsuite/gcc.dg/uninit-pr19430.c --- a/gcc/testsuite/gcc.dg/uninit-pr19430.c +++ b/gcc/testsuite/gcc.dg/uninit-pr19430.c @@ -29,7 +29,7 @@ void frob(int *pi); int main(void) { int i; - printf(i = %d\n, i); /* { dg-warning 'i' is used uninitialized in this function { xfail *-*-* } } */ + printf(i = %d\n, i); /* { dg-warning 'i' is used uninitialized in this function } */ frob(i); return 0; @@ -38,6 +38,6 @@ int main(void) void foo3(int*); void bar3(void) { int x; - if(x) /* { dg-warning 'x' is used uninitialized in this function uninitialized { xfail *-*-* } } */ + if(x) /* { dg-warning 'x' is used uninitialized in this function uninitialized } */ foo3(x); } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Initial shrink-wrapping patch
Bernd Schmidt ber...@codesourcery.com writes: On 10/06/11 05:17, Ian Lance Taylor wrote: Thinking about it I think this is the wrong approach. The -fsplit-stack code by definition has to wrap the entire function and it can not modify any callee-saved registers. We should do shrink wrapping before -fsplit-stack, not the other way around. Sorry, I'm not following what you're saying here. Can you elaborate? Basically -fsplit-stack wraps the entire function in code that (on x86_64) looks like cmpq %fs:112, %rsp jae .L2 movl $24, %r10d movl $0, %r11d call __morestack ret .L2: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Ian
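In C terms, the split-stack prologue Ian quotes behaves roughly as below — an illustrative model only, with made-up names (the real check compares %rsp against a TCB slot and __morestack maps in a new stack segment):

```c
#include <assert.h>

/* Model of the -fsplit-stack prologue: if the stack pointer has
   already dropped below the per-thread limit, request a larger
   stack before running the function body.  */
static unsigned long stack_limit = 4096;   /* models %fs:112 */
static int morestack_calls;

static void morestack (void)               /* models __morestack */
{
  morestack_calls++;
}

static void split_stack_prologue (unsigned long sp)
{
  if (sp >= stack_limit)   /* the "jae .L2" fast path: enough stack left */
    return;
  morestack ();            /* slow path: grow the stack, then re-enter */
}
```

The fast path is a compare and a taken branch, which is why shrink-wrapping it can never win anything: it must execute on every entry regardless.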
Re: Initial shrink-wrapping patch
On 10/06/11 17:57, Ian Lance Taylor wrote: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Urgh, ok. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Sounds like we just need to always emit the split prologue on the original entry edge then. Can you test the following with Go? Bernd * function.c (thread_prologue_and_epilogue_insns): Emit split prologue on the orig_entry_edge. Don't account for it in prologue_clobbered. Index: gcc/function.c === --- gcc/function.c (revision 179619) +++ gcc/function.c (working copy) @@ -5602,10 +5602,6 @@ thread_prologue_and_epilogue_insns (void note_stores (PATTERN (p_insn), record_hard_reg_sets, prologue_clobbered); } - for (p_insn = split_prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn)) - if (NONDEBUG_INSN_P (p_insn)) - note_stores (PATTERN (p_insn), record_hard_reg_sets, - prologue_clobbered); bitmap_initialize (bb_antic_flags, bitmap_default_obstack); bitmap_initialize (bb_on_list, bitmap_default_obstack); @@ -5758,7 +5754,7 @@ thread_prologue_and_epilogue_insns (void if (split_prologue_seq != NULL_RTX) { - insert_insn_on_edge (split_prologue_seq, entry_edge); + insert_insn_on_edge (split_prologue_seq, orig_entry_edge); inserted = true; } if (prologue_seq != NULL_RTX)
[PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
Hi! Since Richard's changes recently to allow different modes in vcond patterns (so far on i?86/x86_64 only I think) we can vectorize more COND_EXPRs than before, and this patch improves it a tiny bit more - even i?86/x86_64 support vconds only if the sizes of vector element modes are the same. With this patch we can optimize even if it is wider or narrower, by vectorizing it as the COND_EXPR in integer mode matching the size of the comparison operands and then a cast. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com PR tree-optimization/50596 * tree-vectorizer.h (vect_is_simple_cond): New prototype. (NUM_PATTERNS): Change to 6. * tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern): New function. (vect_vect_recog_func_ptrs): Add vect_recog_mixed_size_cond_pattern. (vect_mark_pattern_stmts): Don't create stmt_vinfo for def_stmt if it already has one, and don't set STMT_VINFO_VECTYPE in it if it is already set. * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Handle COND_EXPR and VEC_COND_EXPR in pattern stmts. (vect_is_simple_cond): No longer static. * lib/target-supports.exp (check_effective_target_vect_cond_mixed): New. * gcc.dg/vect/vect-cond-8.c: New test. --- gcc/tree-vectorizer.h.jj 2011-09-26 14:06:52.0 +0200 +++ gcc/tree-vectorizer.h 2011-10-06 10:04:03.0 +0200 @@ -1,5 +1,5 @@ /* Vectorizer - Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 + Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
Contributed by Dorit Naishlos do...@il.ibm.com @@ -818,6 +818,7 @@ extern bool vect_transform_stmt (gimple, bool *, slp_tree, slp_instance); extern void vect_remove_stores (gimple); extern bool vect_analyze_stmt (gimple, bool *, slp_tree); +extern bool vect_is_simple_cond (tree, loop_vec_info, tree *); extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *, tree, int); extern void vect_get_load_cost (struct data_reference *, int, bool, @@ -902,7 +903,7 @@ extern void vect_slp_transform_bb (basic Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); -#define NUM_PATTERNS 5 +#define NUM_PATTERNS 6 void vect_pattern_recog (loop_vec_info); /* In tree-vectorizer.c. */ --- gcc/tree-vect-patterns.c.jj 2011-10-06 09:14:17.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-06 14:37:12.0 +0200 @@ -49,12 +49,15 @@ static gimple vect_recog_dot_prod_patter static gimple vect_recog_pow_pattern (VEC (gimple, heap) **, tree *, tree *); static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **, tree *, tree *); +static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **, + tree *, tree *); static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = { vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern, vect_recog_dot_prod_pattern, vect_recog_pow_pattern, -vect_recog_over_widening_pattern}; + vect_recog_over_widening_pattern, + vect_recog_mixed_size_cond_pattern}; /* Function widened_name_p @@ -1218,6 +1214,120 @@ vect_recog_over_widening_pattern (VEC (g } +/* Function vect_recog_mixed_size_cond_pattern + + Try to find the following pattern: + + type x_t, y_t; + TYPE a_T, b_T, c_T; + loop: + S1 a_T = x_t CMP y_t ? b_T : c_T; + + where type 'TYPE' is an integral type which has different size + from 'type'. 
b_T and c_T are constants and if 'TYPE' is wider + than 'type', the constants need to fit into an integer type + with the same width as 'type'. + + Input: + + * LAST_STMT: A stmt from which the pattern search begins. + + Output: + + * TYPE_IN: The type of the input arguments to the pattern. + + * TYPE_OUT: The type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the pattern. + Additionally a def_stmt is added. + + a_it = x_t CMP y_t ? b_it : c_it; + a_T = (TYPE) a_it; */ + +static gimple +vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **stmts, tree *type_in, + tree *type_out) +{ + gimple last_stmt = VEC_index (gimple, *stmts, 0); + tree cond_expr, then_clause, else_clause; + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt), def_stmt_info; + tree type, vectype, comp_vectype, itype, vecitype; + enum machine_mode cmpmode; + gimple pattern_stmt, def_stmt; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); + + if (!is_gimple_assign
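In scalar C terms, the pattern this new recognizer targets is a comparison on a narrow type selecting constants of a wider type; a hypothetical before/after sketch of the rewrite described in the comment above:

```c
#include <stdint.h>

/* Before: the COND_EXPR mixes a 16-bit comparison with 64-bit
   INTEGER_CST operands, so no same-width vcond applies.  */
static int64_t before (int16_t x, int16_t y)
{
  return x < y ? 17 : 42;
}

/* After: since the constants 17 and 42 fit in int16_t, the pattern
   does the selection at the comparison's width and widens afterwards,
   which a same-width vcond plus a vector conversion can implement:
     a_it = x CMP y ? b_it : c_it;
     a_T  = (TYPE) a_it;  */
static int64_t after (int16_t x, int16_t y)
{
  int16_t t = x < y ? 17 : 42;
  return (int64_t) t;
}
```

The two forms are equivalent precisely because of the fits-in-narrow-type condition the comment states; a constant like 100000 would make the rewrite invalid.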
[PATCH] Minor readability improvement in vect_pattern_recog{,_1}
Hi! tree-vectorizer.h already has typedefs for the recog functions, and using that typedef we can make these two functions slightly more readable. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. --- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 15:50:12.0 +0200 @@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm for vect_recog_pattern. */ static void -vect_pattern_recog_1 ( - gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si, - VEC (gimple, heap) **stmts_to_replace) +vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func, + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v unsigned int nbbs = loop->num_nodes; gimple_stmt_iterator si; unsigned int i, j; - gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + vect_recog_func_ptr vect_recog_func; VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) @@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v /* Scan over all generic vect_recog_xxx_pattern functions. */ for (j = 0; j < NUM_PATTERNS; j++) { - vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si, + vect_recog_func = vect_vect_recog_func_ptrs[j]; + vect_pattern_recog_1 (vect_recog_func, si, stmts_to_replace); } } Jakub
[PATCH] vshuffle: Use correct mode for mask operand.
--- gcc/ChangeLog |5 + gcc/optabs.c | 16 +++- 2 files changed, 12 insertions(+), 9 deletions(-) * optabs.c (expand_vec_shuffle_expr): Use the proper mode for the mask operand. Tidy the code. This patch is required before I rearrange the testsuite to actually test floating-point shuffle. diff --git a/gcc/optabs.c b/gcc/optabs.c index 3a52fb0..aa233d5 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6650,9 +6650,8 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target) struct expand_operand ops[4]; enum insn_code icode; enum machine_mode mode = TYPE_MODE (type); - rtx rtx_v0, rtx_mask; - gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask)); + gcc_checking_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask)); if (TREE_CODE (mask) == VECTOR_CST) { @@ -6675,24 +6674,23 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target) return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL); } -vshuffle: + vshuffle: icode = direct_optab_handler (vshuffle_optab, mode); if (icode == CODE_FOR_nothing) return 0; - rtx_mask = expand_normal (mask); - create_output_operand (ops[0], target, mode); - create_input_operand (ops[3], rtx_mask, mode); + create_input_operand (ops[3], expand_normal (mask), + TYPE_MODE (TREE_TYPE (mask))); if (operand_equal_p (v0, v1, 0)) { - rtx_v0 = expand_normal (v0); - if (!insn_operand_matches(icode, 1, rtx_v0)) + rtx rtx_v0 = expand_normal (v0); + if (!insn_operand_matches (icode, 1, rtx_v0)) rtx_v0 = force_reg (mode, rtx_v0); - gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0)); + gcc_checking_assert (insn_operand_matches (icode, 2, rtx_v0)); create_fixed_operand (ops[1], rtx_v0); create_fixed_operand (ops[2], rtx_v0); -- 1.7.6.4
[PATCH] Rework vector shuffle tests.
Test vector sizes 8, 16, and 32. Test most data types for each size. This should also solve the problem that Georg reported for AVR. Indeed, I hope that except for the DImode/DFmode tests, these actually execute on that target. r~ Cc: Georg-Johann Lay a...@gjlay.de --- gcc/testsuite/ChangeLog| 29 ++ .../gcc.c-torture/execute/vect-shuffle-1.c | 68 - .../gcc.c-torture/execute/vect-shuffle-2.c | 68 - .../gcc.c-torture/execute/vect-shuffle-3.c | 58 --- .../gcc.c-torture/execute/vect-shuffle-4.c | 51 -- .../gcc.c-torture/execute/vect-shuffle-5.c | 64 .../gcc.c-torture/execute/vect-shuffle-6.c | 64 .../gcc.c-torture/execute/vect-shuffle-7.c | 70 -- .../gcc.c-torture/execute/vect-shuffle-8.c | 55 --- gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc | 81 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc| 38 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc| 39 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc| 101 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc | 26 + gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c |5 + gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c | 21 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c | 18 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c | 19 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c | 19 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c | 15 +++ gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c | 25 + gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c | 22 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c | 23 + gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c | 23 + gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c | 30 ++ 27 files changed, 564 insertions(+), 498 deletions(-) delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c delete mode 100644 
gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c + * gcc.c-torture/execute/vect-shuffle-1.c: Remove. + * gcc.c-torture/execute/vect-shuffle-2.c: Remove. + * gcc.c-torture/execute/vect-shuffle-3.c: Remove. + * gcc.c-torture/execute/vect-shuffle-4.c: Remove. + * gcc.c-torture/execute/vect-shuffle-5.c: Remove. + * gcc.c-torture/execute/vect-shuffle-6.c: Remove. + * gcc.c-torture/execute/vect-shuffle-7.c: Remove. + * gcc.c-torture/execute/vect-shuffle-8.c: Remove. 
+ * gcc.c-torture/execute/vshuf-16.inc: New file. + * gcc.c-torture/execute/vshuf-2.inc: New file. + * gcc.c-torture/execute/vshuf-4.inc: New file. + * gcc.c-torture/execute/vshuf-8.inc: New file. + * gcc.c-torture/execute/vshuf-main.inc: New file. + * gcc.c-torture/execute/vshuf-v16qi.c: New test. + * gcc.c-torture/execute/vshuf-v2df.c: New test. + * gcc.c-torture/execute/vshuf-v2di.c: New test. + * gcc.c-torture/execute/vshuf-v2sf.c: New test. + *
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote: snip Doh, I thought you were matching gimple stmts that do the address computation. But now I see you are matching the tree returned from get_inner_reference. So no need to check anything for that case. But that keeps me wondering what you'll do if the accesses were all pointer arithmetic, not arrays. Thus, extern void foo (int, int, int); void f (int *p, unsigned int n) { foo (p[n], p[n+64], p[n+128]); } wouldn't that have the same issue and you wouldn't handle it? Richard. Good point. This indeed gets missed here, and that's more fuel for doing a generalized strength reduction along with the special cases like p->a[n] that are only exposed with get_inner_reference. (The pointer arithmetic cases were picked up in my earlier big-hammer approach using the aff-comb machinery, but that had too many problems in the end, as you know.) So for the long term I will look into a full strength reducer for non-loop code. For the short term, what do you think about keeping this single transformation in reassoc to make sure it gets into 4.7? I would plan to strip it back out and fold it into the strength reducer thereafter, which might or might not make 4.7 depending on my other responsibilities and how the 4.7 schedule goes. I haven't seen anything official, but I'm guessing we're getting towards the end of 4.7 stage 1?
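For reference, the classic loop form of strength reduction mentioned in this thread replaces a per-iteration multiply with a running addition; a sketch of the identity (not of the reassoc implementation):

```c
/* z = i * x recomputed from scratch on every iteration ...  */
static long with_multiply (int n, long x)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += (long) i * x;
  return sum;
}

/* ... becomes an accumulator bumped by x each time around: z starts
   at 0 and "z += x" replaces the multiply, so only additions remain
   in the loop body.  */
static long strength_reduced (int n, long x)
{
  long sum = 0, z = 0;
  for (int i = 0; i < n; i++)
    {
      sum += z;
      z += x;
    }
  return sum;
}
```

The straight-line analogue discussed here is the same idea applied to a run of related addresses like p[n], p[n+64], p[n+128]: compute the first address once and derive the rest by cheap additions.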
Re: [PATCH][ARM] Fix broken shift patterns
I believe this patch to be nothing but an improvement over the current state, and that a fix to the constraint problem should be a separate patch. On that basis, am I OK to commit? One minor nit: (define_special_predicate "shift_operator" ... + (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT + && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32") + (match_test "REG_P (XEXP (op, 1))" We're already enforcing the REG_P elsewhere, and it's only valid in some contexts, so I'd change this to: (match_test "GET_CODE (XEXP (op, 1)) != CONST_INT || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32") Now, let me explain the other problem: As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit a shift to be expressed as either a shift type and amount (register or constant), or as a multiply and power-of-two constant. An added complication is that only ARM mode accepts a register. Problem scenario 1: Consider pattern (plus (mult r1 r2) r3). It so happens that reload knows that r2 contains a constant, say 20, so reload checks to see if that could be converted to an immediate. Now, 20 is not a power of two, so recog would reject it, but it is in the range 0..31 so it does match the 'M' constraint. Oops! Though as you mention below, the predicate doesn't allow the second operand to be a register, so this can never happen. Reload may do unexpected things, but if it starts randomly changing valid const_int values then we have much bigger problems. Problem scenario 2: Consider pattern (ashiftrt r1 r2). Again, it so happens that reload knows that r2 contains a constant, in this case let's say 64, so again reload checks to see if that could be converted to an immediate. This time, 64 is not in the range 0..31, so recog would reject it, but it is a power of two, so it does match the 'M' constraint. Again, oops! I see two ways to fix this properly: 1. Duplicate all the patterns in the machine description, once for the mult case, and once for the other cases.
This could probably be done with a code iterator, if preferred. 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it always uses the (presumably cheaper) shift-and-add option. However, this would require all other targets where madd really is the best option to fix it up. (I'd imagine that two instructions for shift and add would be cheaper speed-wise, if properly scheduled, on most targets? That doesn't help the size optimization though.) 3. Consistently accept both power-of-two and 0..31 for shifts. Large shift counts give undefined results[1], so replace them with an arbitrary value (e.g. 0) during assembly output. Arguably not an entirely proper fix, but I think it'll keep everything happy. However, it's not obvious to me that this needs fixing: * The failure mode would be an ICE, and we've not seen any. Then again no one noticed the negative-shift ICE until recently :-/ * There's a comment in arm.c:shift_op that suggests that this can't happen, somehow, at least in the mult case. - I'm not sure exactly how reload works, but it seems reasonable that it will never try to convert a register to an immediate because the pattern does not allow registers in the first place. - This logic doesn't hold in the opposite case though. Have I explained all that clearly? I think you've covered most of it. For bonus points we should probably disallow MULT in the arm_shiftsi3 pattern, to stop it interacting with the regular mulsi3 pattern in undesirable ways. Paul [1] Or at least not any result gcc will be expecting.
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On 6 October 2011 18:17, Jakub Jelinek ja...@redhat.com wrote: Hi! Since Richard's changes recently to allow different modes in vcond patterns (so far on i?86/x86_64 only I think) we can vectorize more COND_EXPRs than before, and this patch improves it a tiny bit more - even i?86/x86_64 support vconds only if the sizes of vector element modes are the same. With this patch we can optimize even if it is wider or narrower, by vectorizing it as the COND_EXPR in integer mode matching the size of the comparison operands and then a cast. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK, but... --- gcc/tree-vect-stmts.c.jj 2011-09-29 14:25:46.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-06 12:16:43.0 +0200 @@ -652,9 +652,26 @@ vect_mark_stmts_to_be_vectorized (loop_v have to scan the RHS or function arguments instead. */ if (is_gimple_assign (stmt)) { - for (i = 1; i < gimple_num_ops (stmt); i++) + enum tree_code rhs_code = gimple_assign_rhs_code (stmt); + tree op = gimple_assign_rhs1 (stmt); + + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. + && COMPARISON_CLASS_P (op)) + { + if (!process_use (stmt, TREE_OPERAND (op, 0), loop_vinfo, + live_p, relevant, worklist) + || !process_use (stmt, TREE_OPERAND (op, 1), loop_vinfo, + live_p, relevant, worklist)) + { + VEC_free (gimple, heap, worklist); + return false; + } + i = 2; + } + for (; i < gimple_num_ops (stmt); i++) { - tree op = gimple_op (stmt, i); + op = gimple_op (stmt, i); if (!process_use (stmt, op, loop_vinfo, live_p, relevant, worklist)) { Thanks, Ira
Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}
On 6 October 2011 18:19, Jakub Jelinek ja...@redhat.com wrote: Hi! tree-vectorizer.h already has typedefs for the recog functions, and using that typedef we can make these two functions slightly more readable. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Ira 2011-10-06 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. --- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.0 +0200 +++ gcc/tree-vect-patterns.c 2011-10-06 15:50:12.0 +0200 @@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm for vect_recog_pattern. */ static void -vect_pattern_recog_1 ( - gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *), - gimple_stmt_iterator si, - VEC (gimple, heap) **stmts_to_replace) +vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func, + gimple_stmt_iterator si, + VEC (gimple, heap) **stmts_to_replace) { gimple stmt = gsi_stmt (si), pattern_stmt; stmt_vec_info stmt_info; @@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v unsigned int nbbs = loop->num_nodes; gimple_stmt_iterator si; unsigned int i, j; - gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); + vect_recog_func_ptr vect_recog_func; VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1); if (vect_print_dump_info (REPORT_DETAILS)) @@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v /* Scan over all generic vect_recog_xxx_pattern functions. */ for (j = 0; j < NUM_PATTERNS; j++) { - vect_recog_func_ptr = vect_vect_recog_func_ptrs[j]; - vect_pattern_recog_1 (vect_recog_func_ptr, si, + vect_recog_func = vect_vect_recog_func_ptrs[j]; + vect_pattern_recog_1 (vect_recog_func, si, stmts_to_replace); } } Jakub
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote: + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like COND_EXPR. I can leave that out if you want. Jakub
Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}
On 10/06/2011 09:19 AM, Jakub Jelinek wrote: * tree-vect-patterns.c (vect_pattern_recog_1): Use vect_recog_func_ptr typedef for the first argument. (vect_pattern_recog): Rename vect_recog_func_ptr variable to vect_recog_func, use vect_recog_func_ptr typedef for it. Ok. r~
Re: Initial shrink-wrapping patch
On 10/06/2011 09:01 AM, Bernd Schmidt wrote: On 10/06/11 17:57, Ian Lance Taylor wrote: There is absolutely no reason to try to shrink wrap that code. It will never help. That code always has to be first. It especially has to be first because the gold linker recognizes the prologue specially when a split-stack function calls a non-split-stack function, in order to request a larger stack. Urgh, ok. Therefore, it seems to me that we should apply shrink wrapping to the function as it exists *before* the split-stack prologue is created. The flag_split_stack bit should be moved after the flag_shrink_wrap bit. Sounds like we just need to always emit the split prologue on the original entry edge then. Can you test the following with Go? Looks reasonable. I wonder if we can have this as a generic feature? I'm thinking about things like the MIPS and Alpha load-gp stuff. That operation also needs to happen exactly at the start of the function, due to the pc-relative nature of the operation. I do see that MIPS works around this by emitting the load-gp as text in the legacy prologue. But Alpha makes some effort to emit this as rtl, so that the scheduler knows about the two pipeline reservations and the latency of any use of the gp register. Would a pre_prologue named pattern seem wrong to anyone? r~
Re: [PATCH] Fix PR38885
On Wed, Oct 5, 2011 at 6:48 AM, Richard Guenther rguent...@suse.de wrote: I'm testing a pair of patches to fix PR38885 (for constants) and PR38884 (for non-constants) stores to complex/vector memory and CSE of component accesses from SCCVN. This is the piece that handles stores from constants and partial reads of it. We can conveniently re-use fold-const native encode/interpret code for this. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-10-05 Richard Guenther rguent...@suse.de PR tree-optimization/38885 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads from constants. * gcc.dg/tree-ssa/ssa-fre-33.c: New testcase. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50634 -- H.J.
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
On 10/06/11 09:37, Jakub Jelinek wrote: On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote: None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point that it might not be that good to always modify unconditionally to AND/OR chain. For example if (a1 && a2 && a3 && a100) return 1; would be packed by this patch into 50 branches. If we would modify all of them into AND, then we would calculate for all 100 values the result, even if the first a1 is zero. This doesn't improve speed pretty well. But you are right, that from the point of reassociation optimization it could be in some cases more profitable to have packed all elements into one AND-chain. Yeah. Perhaps we should break them up after reassoc2, or on the other side teach reassoc (or some other pass) to be able to do the optimizations on a series of GIMPLE_COND with no side-effects in between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a == 4; isn't optimized into (a - 1U) < 4U, although it could, if branch costs cause it to be broken up into several GIMPLE_COND stmts. Or if user writes: if (a == 3) return 1; if (a == 1) return 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more probably using enums). I haven't followed this thread as closely as perhaps I should; what I'm seeing discussed now looks a lot like condition merging and I'm pretty sure there's some research in this area that might guide us. "multi-variable condition merging" is the term the authors used.
jeff
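The PR46309 transformation Jakub mentions can be checked in plain C: the chain of equality tests collapses to one subtraction plus one unsigned comparison (a sketch of the identity only, not of any reassoc code):

```c
/* a == 3 || a == 1 || a == 2 || a == 4 ...  */
static int chain (int a)
{
  return a == 3 || a == 1 || a == 2 || a == 4;
}

/* ... is equivalent to a single range test: the values 1..4 map to
   0..3 under a - 1U, while every other value wraps above 3 in
   unsigned arithmetic.  */
static int range_test (int a)
{
  return (a - 1U) < 4U;
}
```

The trick only applies when the tested constants form a contiguous range, which is why the pass has to collect the whole set of GIMPLE_COND stmts before deciding.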
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/6 Michael Matz m...@suse.de: Hi, On Thu, 6 Oct 2011, Kai Tietz wrote: at which point this association doesn't make sense anymore, as Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and therefore it isn't transformed into an AND. Right ... ((W AND X) AND Y) AND Z is just as fine. So, the reassociation looks fishy at best, better get rid of it? (which of the testcases breaks without it?) None. I had this implemented first. But Richard was concerned about making non-IF conditions too long. I understand that point that it might not be that good to always modify unconditionally to AND/OR chain. ... and I see that (that's why the transformation should be desirable for some definition of desirable, which probably includes an RHS not too long chain). As it stands right now your transformation seems to be a fairly ad-hoc try at avoiding this problem. That's why I wonder why to do the reassoc at all? Which testcases break _without_ the reassociation, i.e. with only rewriting ANDIF -> AND at the outermost level? I don't do here reassociation in inner. See that patch calls build2_loc, and not fold_build2_loc anymore. So it doesn't retry to associate in inner anymore (which might be something of interest for the issue Jakub mentioned). There is no test actually failing AFAICS. I just noticed size-differences by this. Nevertheless it might be better to enhance reassociation pass to break-up (and repropagate) GIMPLE_CONDs with non-side-effect, as Jakub suggested. The other chance might be here to allow deeper chains than two elements within one AND/OR element, but this would be architecture dependent. For x86 -as example- used instruction cycles for a common for branching would suggest that it might be profitable to have here 3 or 4 leafs within one AND|OR chain. But for sure on other architectures the amount of leafs varies. Regards, Kai
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another.

There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through.

jeff
Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands
On 6 October 2011 19:28, Jakub Jelinek ja...@redhat.com wrote: On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote: + i = 1; + if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR) I don't understand why we need VEC_COND_EXPR here. Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like COND_EXPR. I can leave that out if you want. But we mark stmts that we want to vectorize here. I think that expecting a vector stmt is confusing. So yes, please, leave it out. Thanks, Ira Jakub
Re: [PATCH] Add support for lzd and popc instructions on sparc.
On 10/05/2011 11:41 PM, David Miller wrote:

+(define_expand "popcount<mode>2"
+  [(set (match_operand:SIDI 0 "register_operand" "")
+        (popcount:SIDI (match_operand:SIDI 1 "register_operand" "")))]
+  "TARGET_POPC"
+{
+  if (! TARGET_ARCH64)
+    {
+      emit_insn (gen_popcount<mode>_v8plus (operands[0], operands[1]));
+      DONE;
+    }
+})
+
+(define_insn "*popcount<mode>_sp64"
+  [(set (match_operand:SIDI 0 "register_operand" "=r")
+        (popcount:SIDI (match_operand:SIDI 1 "register_operand" "r")))]
+  "TARGET_POPC && TARGET_ARCH64"
+  "popc\t%1, %0")

You've said that POPC only operates on the full 64-bit register, but I see no zero-extend of the SImode input? Similarly for the clzsi patterns.

If it weren't for the v8plus ugliness, it would be sufficient to only expose the DImode patterns, and let optabs.c do the work to extend from SImode...

r~
[Patch 0/5] ARM 64 bit sync atomic operations [V3]
Hi,

This is V3 of a series of 5 patches relating to ARM atomic operations; they incorporate most of the feedback from V2. Note the patch numbering/ordering is different from v2; the two simple patches are now first.

  1) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k as per the docs).
  2) Fix pr48126 which is a misplaced barrier in the atomic generation.
  3) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k and above.
  4) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel assist is called (as per 32bit and smaller ops).
  5) Add test cases and support for those test cases, for the operations added in (3) and (4).

This code has been tested built on x86-64 cross to ARM, run in ARM and Thumb (C, C++, Fortran). It is against git rev 68a79dfc.

Relative to v2:
  Test cases split out
  Code sharing between the test cases
  More coding style cleanup
  A handful of NULL->NULL_RTX changes

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

Note that the kernel interface has remained the same for the helper, and as such I've not changed the way the helper calling in patch 2 is structured.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave
[Patch 1/5] ARM 64 bit sync atomic operations [V3]
gcc/
	* config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 993e3a0..f6f1da7 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
[Patch 2/5] ARM 64 bit sync atomic operations [V3]
Michael K. Edwards points out in PR/48126 that the sync is in the wrong place relative to the branch target of the compare, since the load could float up beyond the ldrex.

	PR target/48126
	* config/arm/arm.c (arm_output_sync_loop): Move label before barrier.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
	}
     }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in cmp failure case we still get
+     a barrier to stop subsequent loads floating upwards past the ldrex
+     PR 48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx
[Patch 3/5] ARM 64 bit sync atomic operations [V3]
Add support for ARM 64bit sync intrinsics.

gcc/
	* arm.c (arm_output_ldrex): Support ldrexd.
	(arm_output_strex): Support strexd.
	(arm_output_it): New helper to output "it" in Thumb2 mode only.
	(arm_output_sync_loop): Support DI mode.  Change comment to not
	support const_int.
	(arm_expand_sync): Support DI mode.
	* arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.
	* iterators.md (NARROW): Move from sync.md.
	(QHSD): New iterator for all current ARM integer modes.
	(SIDI): New iterator for SI and DI modes only.
	* sync.md (sync_predtab): New mode_attr.
	(sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>.
	(sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>.
	(sync_<sync_optab>si): Fold into sync_<sync_optab><mode>.
	(sync_nandsi): Fold into sync_nand<mode>.
	(sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>.
	(sync_new_nandsi): Fold into sync_new_nand<mode>.
	(sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>.
	(sync_old_nandsi): Fold into sync_old_nand<mode>.
	(sync_compare_and_swap<mode>): Support SI & DI.
	(sync_lock_test_and_set<mode>): Likewise.
	(sync_<sync_optab><mode>): Likewise.
	(sync_nand<mode>): Likewise.
	(sync_new_<sync_optab><mode>): Likewise.
	(sync_new_nand<mode>): Likewise.
	(sync_old_<sync_optab><mode>): Likewise.
	(sync_old_nand<mode>): Likewise.
	(arm_sync_compare_and_swapsi): Turn into iterator on SI & DI.
	(arm_sync_lock_test_and_setsi): Likewise.
	(arm_sync_new_<sync_optab>si): Likewise.
	(arm_sync_new_nandsi): Likewise.
	(arm_sync_old_<sync_optab>si): Likewise.
	(arm_sync_old_nandsi): Likewise.
	(arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent.
	(arm_sync_lock_test_and_set<mode> NARROW): Likewise.
	(arm_sync_new_<sync_optab><mode> NARROW): Likewise.
	(arm_sync_new_nand<mode> NARROW): Likewise.
	(arm_sync_old_<sync_optab><mode> NARROW): Likewise.
	(arm_sync_old_nand<mode> NARROW): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e7105a..51c0f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24039,12 +24039,26 @@ arm_output_ldrex (emit_f emit,
		  rtx target, rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[1] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (target) & 1) == 0);
+      operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+    }
 }
 
 /* Emit a strex{b,h,d} instruction appropriate for the specified
@@ -24057,14 +24071,41 @@ arm_output_strex (emit_f emit,
		  rtx value, rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-		       "cc");
+  if (mode != DImode)
+    {
+      const char *suffix = arm_ldrex_suffix (mode);
+      operands[2] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+			   suffix, "cc");
+    }
+  else
+    {
+      /* The restrictions on target registers in ARM mode are that the two
+	 registers are consecutive and the first one is even; Thumb is
+	 actually more flexible, but DI should give us this anyway.
+	 Note that the 1st register always gets the lowest word in memory.  */
+      gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+      operands[3] = memory;
+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+			   "cc");
+    }
+}
+
+/* Helper to emit an "it" instruction in Thumb2 mode only; although the
+   assembler will ignore it in ARM mode, emitting it will mess up instruction
+   counts we sometimes keep; 'flags' are the extra t's and e's if it's more
+   than one instruction that is conditional.  */
+static void
[Patch 4/5] ARM 64 bit sync atomic operations [V3]
Add ARM 64bit sync helpers for use on older ARMs. Based on 32bit versions but with check for sufficiently new kernel version.

gcc/
	* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c).
	* config/arm/linux-atomic.c: Change comment to point to 64bit version.
	(SYNC_LOCK_RELEASE): Instantiate 64bit version.
	* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c.

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 0000000..6966e66
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,166 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+http://www.gnu.org/licenses/.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU had those then the compiler inlines the operation).
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+				    const long long* newval,
+				    long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *) 0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. "
+			 "(__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c
+	 write () and abort ().  */
+      __write (2 /* stderr.  */, err, sizeof (err));
+      abort ();
+    }
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+  __attribute__ ((used, section (".init_array"))) = {
+  __check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN						\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp;							\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync_<op>_and_fetch and __sync_fetch_and_<op> for
+   subword-sized quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)			\
+  long long HIDDEN
[Patch 5/5] ARM 64 bit sync atomic operations [V3]
Test support for ARM 64bit sync intrinsics. gcc/testsuite/ * gcc.dg/di-longlong64-sync-1.c: New test. * gcc.dg/di-sync-multithread.c: New test. * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test. * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test. * lib/target-supports.exp: (arm_arch_*_ok): Series of effective-target tests for v5, v6, v6k, and v7-a, and add-options helpers. (check_effective_target_arm_arm_ok): New helper. (check_effective_target_sync_longlong): New helper. diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c new file mode 100644 index 000..82a4ea2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c @@ -0,0 +1,164 @@ +/* { dg-do run } */ +/* { dg-require-effective-target sync_longlong } */ +/* { dg-options -std=gnu99 } */ +/* { dg-message note: '__sync_fetch_and_nand' changed semantics in GCC 4.4 { target *-*-* } 0 } */ +/* { dg-message note: '__sync_nand_and_fetch' changed semantics in GCC 4.4 { target *-*-* } 0 } */ + + +/* Test basic functionality of the intrinsics. The operations should + not be optimized away if no one checks the return values. */ + +/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so use long long + (an explicit 64bit type maybe a better bet) and 2) Use values that cross + the 32bit boundary and cause carries since the actual maths are done as + pairs of 32 bit instructions. */ + +/* Note: This file is #included by some of the ARM tests. */ + +__extension__ typedef __SIZE_TYPE__ size_t; + +extern void abort (void); +extern void *memcpy (void *, const void *, size_t); +extern int memcmp (const void *, const void *, size_t); + +/* Temporary space where the work actually gets done. */ +static long long AL[24]; +/* Values copied into AL before we start. 
*/ +static long long init_di[24] = { 0x10002ll, 0x20003ll, 0, 1, + +0x10002ll, 0x10002ll, +0x10002ll, 0x10002ll, + +0, 0x1000e0dell, +42 , 0xc001c0dell, + +-1ll, 0, 0xff00ffll, -1ll, + +0, 0x1000e0dell, +42 , 0xc001c0dell, + +-1ll, 0, 0xff00ffll, -1ll}; +/* This is what should be in AL at the end. */ +static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0, + +0x10002ll, 0x10002ll, +0x10002ll, 0x10002ll, + +1, 0xc001c0dell, +20, 0x1000e0dell, + +0x30007ll , 0x50009ll, +0xf100ff0001ll, ~0xa0007ll, + +1, 0xc001c0dell, +20, 0x1000e0dell, + +0x30007ll , 0x50009ll, +0xf100ff0001ll, ~0xa0007ll }; + +/* First check they work in terms of what they do to memory. */ +static void +do_noret_di (void) +{ + __sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll); + __sync_lock_test_and_set (AL+2, 1); + __sync_lock_release (AL+3); + + /* The following tests should not change the value since the + original does NOT match. */ + __sync_val_compare_and_swap (AL+4, 0x2ll, 0x1234567890ll); + __sync_val_compare_and_swap (AL+5, 0x1ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+6, 0x2ll, 0x1234567890ll); + __sync_bool_compare_and_swap (AL+7, 0x1ll, 0x1234567890ll); + + __sync_fetch_and_add (AL+8, 1); + __sync_fetch_and_add (AL+9, 0xb000e000ll); /* + to both halves carry. */ + __sync_fetch_and_sub (AL+10, 22); + __sync_fetch_and_sub (AL+11, 0xb000e000ll); + + __sync_fetch_and_and (AL+12, 0x30007ll); + __sync_fetch_and_or (AL+13, 0x50009ll); + __sync_fetch_and_xor (AL+14, 0xe0001ll); + __sync_fetch_and_nand (AL+15, 0xa0007ll); + + /* These should be the same as the fetch_and_* cases except for + return value. */ + __sync_add_and_fetch (AL+16, 1); + /* add to both halves carry. 
*/ + __sync_add_and_fetch (AL+17, 0xb000e000ll); + __sync_sub_and_fetch (AL+18, 22); + __sync_sub_and_fetch (AL+19, 0xb000e000ll); + + __sync_and_and_fetch (AL+20, 0x30007ll); + __sync_or_and_fetch (AL+21, 0x50009ll); + __sync_xor_and_fetch (AL+22, 0xe0001ll); + __sync_nand_and_fetch (AL+23, 0xa0007ll); +} + +/* Now check return values. */ +static void +do_ret_di (void) +{ + if (__sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll) != + 0x10002ll) abort (); + if
Re: [PATCH] Fix PR46556 (poor address generation)
On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote: On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through. jeff

I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this question has come up before. The former also suggested a PRE-based approach.
Re: Initial shrink-wrapping patch
HJ found some more maybe_record_trace_start failures. In one case I debugged, we have

(insn/f 31 0 32 (parallel [
	    (set (reg/f:DI 7 sp)
		 (plus:DI (reg/f:DI 7 sp)
			  (const_int 8 [0x8])))
	    (clobber (reg:CC 17 flags))
	    (clobber (mem:BLK (scratch) [0 A8]))
	]) -1
     (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp)
		(plus:DI (reg/f:DI 7 sp)
			 (const_int 8 [0x8])))
	(nil)))

The insn pattern is later changed by csa to adjust by 24, and the note is left untouched; that seems to be triggering the problem.

Richard, is there a reason to use REG_CFA_ADJUST_CFA rather than REG_CFA_DEF_CFA? If no, I'll just try to fix i386.c not to emit the former.


Bernd
Re: Vector shuffling
Richard Henderson schrieb: On 10/06/2011 04:46 AM, Georg-Johann Lay wrote: So here it is. Lightly tested on my target: All tests either PASS or are UNSUPPORTED now. Ok? Not ok, but only because I've completely restructured the tests again. Patch coming very shortly... Thanks, I hope your patch fixed the issues addressed in my patch :-) Johann r~
Re: [PATCH] Fix PR46556 (poor address generation)
On 10/06/11 12:02, William J. Schmidt wrote: On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote: On 10/06/11 04:13, Richard Guenther wrote: People have already commented on pieces, so I'm looking only at the tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs instead? The idea is to expose additional CSE opportunities, right? So it's sort-of a strength-reduction optimization on scalar code (classically strength reduction in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }). That might be worth in general, even for non-address cases. So - if you rename that thing to tree-ssa-strength-reduce.c you can get away without piggy-backing on anything ;) If you structure it to detect a strength reduction opportunity (thus, you'd need to match two/multiple of the patterns at the same time) that would be a bonus ... generalizing it a little bit would be another. There's a variety of literature that uses PRE to detect and optimize straightline code strength reduction. I poked at it at one time (RTL gcse framework) and it looked reasonably promising. Never pushed it all the way through. jeff I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this question has come up before. The former also suggested a PRE-based approach.

Yea. We've kicked it around several times over the last 15 or so years. When I briefly looked at it, I was doing so more in the context of eliminating all the optimize_related_values crap, purely as a cleanup, and ultimately couldn't justify the time. IIRC Morgan and Muchnick both had write-ups on the basic concepts. There's probably other literature as well.
jeff
Re: Initial shrink-wrapping patch
On 10/06/2011 11:03 AM, Bernd Schmidt wrote: HJ found some more maybe_record_trace_start failures. In one case I debugged, we have (insn/f 31 0 32 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) -1 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) (nil))) The insn pattern is later changed by csa to adjust by 24, and the note is left untouched; that seems to be triggering the problem. Hmm. That seems a bit odd, considering this function probably does not use alloca (no frame pointer), and uses accumulated outgoing arguments (x86_64 never uses no-accumulate-outgoing-args, afaik). Richard, is there a reason to use REG_CFA_ADJUST_CFA rather than REG_CFA_DEF_CFA? If no, I'll just try to fix i386.c not to emit the former. Not that I can think of. But if that change makes any difference at all, that's almost certainly another bug. What PR are you looking at here? r~
Re: Initial shrink-wrapping patch
On 10/06/11 20:13, Richard Henderson wrote: What PR are you looking at here? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50632 Testcase is gcc.dg/pr50132.c. Bernd
Re: [PATCH] Add support for lzd and popc instructions on sparc.
From: Richard Henderson r...@redhat.com Date: Thu, 06 Oct 2011 10:47:28 -0700 You've said that POPC only operates on the full 64-bit register, but I see no zero-extend of the SImode input? Similarly for the clzsi patterns. Thanks for catching this. I guess if I emit the zero-extend, the compiler will eliminate it if possible. This is another reason why I want to do v8plus differently. The compiler would take care to optimize away zero and sign extensions instead of how we use that sparc_check_64 () thing now. If it weren't for the v8plus ugliness, it would be sufficient to only expose the DImode patterns, and let optabs.c do the work to extend from SImode... Understood.
Re: Initial shrink-wrapping patch
On Tue, Oct 4, 2011 at 3:10 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 09/30/11 18:51, Richard Henderson wrote: Please do leave out RETURN_ADDR_REGNUM for now. If you remember why, then you could bring it back alongside the patch for the ARM backend. Changed. As for the i386 backend changes, not an objection per se, but I'm trying to understand why we need so many copies of patterns. Also changed. I don't see anything glaringly wrong in the middle end. Although the thread_prologue_and_epilogue_insns function is now gigantic. If there were an easy way to break that up and reduce the amount of conditional compilation at the same time... that'd be great, but not a requirement. I don't think there's an easy way; and it's almost certain to break stuff again, so I'd rather avoid doing it at the same time as this patch if possible. I can see one possible way of tackling it; have an analysis phase that fills up a few basic_block VECs (those which need sibcall returns, those which need plain returns, those which need simple returns) and computes other information, such as the edges on which prologue and epilogue are to be inserted, and then a worker phase (probably split across several functions) which does all the fiddling. Richard S. suggested: ...how about adding a bit to crtl to say whether shrink-wrap occured, and check that instead of flag_shrink_wrap? Good idea, also changed. New version below. Bootstrapped and tested i686-linux. It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue? -- H.J.
Re: Initial shrink-wrapping patch
On 10/06/11 20:27, H.J. Lu wrote: It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue?

In theory it should just work. It seems the x32 stuff has entertaining properties :-( Haven't quite figured out how to build it yet, but:

-	subq	$136, %rsp
-	.cfi_def_cfa_offset 144
 	movl	$0, %eax
 	movl	%esp, %ecx
 	addl	$60, %ecx
@@ -16,6 +14,8 @@ main:
 	movl	%eax, (%edx)
 	cmpl	$16, %eax
 	jne	.L2
+	subq	$136, %rsp
+	.cfi_def_cfa_offset 144

So, this looks like we have both $esp and $rsp - i.e. not using stack_pointer_rtx in all cases? Is there a way to avoid this?

BTW, one other thing that occurred to me - what about drap_reg? Does that need to be added to the set of registers whose use requires a prologue?

Bernd
Re: Modify gcc for use with gdb (issue5132047)
On Oct 6, 2011, at 1:58 AM, Richard Guenther wrote: On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). http://old.nabble.com/-incremental--Patch%3A-FYI%3A-add-accessor-macros-to-gdbinit-td17370385.html And yet, this still isn't in gcc. :-( I wonder how much programmer productivity we've lost due to it.
Re: Modify gcc for use with gdb (issue5132047)
On 10/06/11 12:46, Mike Stump wrote: On Oct 6, 2011, at 1:58 AM, Richard Guenther wrote: On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo dnovi...@google.com wrote: What's the other advantage of using inline functions? The gdb annoyance with the macros can be solved with the .gdbinit macro defines (which might be nice to commit to SVN btw). http://old.nabble.com/-incremental--Patch%3A-FYI%3A-add-accessor-macros-to-gdbinit-td17370385.html And yet, this still isn't in gcc. :-( I wonder how much programmer productivity we've lost due to it.

Presumably it hasn't been included because not all gdb's understand those bits and we typically don't build with -g3. Personally, the accessors I use are muscle-memory... Which works great until someone buries everything a level deeper :(

jeff
Re: Initial shrink-wrapping patch
On Thu, Oct 6, 2011 at 11:40 AM, Bernd Schmidt ber...@codesourcery.com wrote: On 10/06/11 20:27, H.J. Lu wrote: It also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50633 Don't you need to update ix86_expand_prologue? In theory it should just work. It seems the x32 stuff has entertaining properties :-( Haven't quite figured out how to build it yet, but: - subq $136, %rsp - .cfi_def_cfa_offset 144 movl $0, %eax movl %esp, %ecx addl $60, %ecx @@ -16,6 +14,8 @@ main: movl %eax, (%edx) cmpl $16, %eax jne .L2 + subq $136, %rsp + .cfi_def_cfa_offset 144 So, this looks like we have both $esp and $rsp - i.e. not using stack_pointer_rtx in all cases? Is there a way to avoid this? X32 has 32bit software stack pointer and 64bit hardware stack pointer. BTW, one other thing that occurred to me - what about drap_reg? Does that need to be added to the set of registers whose use requires a prologue? It should be covered by SP. -- H.J.
[Ada] Fix bad warning for divide by zero
Tested on x86_64-pc-linux-gnu, committed on trunk

If the compiler detects a floating division by zero, it was unconditionally issuing a warning and raising a constraint_error. This is wrong behavior for the case of an unconstrained floating point type. This patch corrects that behavior as shown by the following test program:

     1. procedure TESTfdz is
     2.
     3.    type REAL_T is record
     4.       VALUE : FLOAT;
     5.    end record;
     6.
     7.    function "/" (LEFT, RIGHT : in REAL_T) return REAL_T is
     8.    begin
     9.       return (VALUE => LEFT.VALUE / RIGHT.VALUE);
    10.    end "/";
    11.
    12.    X : REAL_T := (VALUE => 1.0);
    13.    Y : FLOAT := 1.0;
    14.
    15.    type CF is digits 6 range 0.0 .. 1.0;
    16.    Z : CF := 1.0;
    17.
    18. begin
    19.    X := X / (VALUE => 0.0);
    20.    Y := Y / 0.0;
                     |
        >>> warning: float division by zero, may generate '+'/'- infinity
    21.    Z := Z / 0.0;
                     |
        >>> warning: division by zero, Constraint_Error will be raised at run time
    22. end;

2011-10-06  Robert Dewar  de...@adacore.com

	* sem_res.adb (Resolve_Arithmetic_Op): Fix bad warning for
	floating divide by zero.

Index: sem_res.adb
===================================================================
--- sem_res.adb	(revision 179628)
+++ sem_res.adb	(working copy)
@@ -64,6 +64,7 @@
 with Sem_Eval; use Sem_Eval;
 with Sem_Intr; use Sem_Intr;
 with Sem_Util; use Sem_Util;
+with Targparm; use Targparm;
 with Sem_Type; use Sem_Type;
 with Sem_Warn; use Sem_Warn;
 with Sinfo;    use Sinfo;
@@ -4874,14 +4875,34 @@
               (Is_Real_Type (Etype (Rop))
                 and then Expr_Value_R (Rop) = Ureal_0))
          then
-            --  Specialize the warning message according to the operation
+            --  Specialize the warning message according to the operation.
+            --  The following warnings are for the case
 
             case Nkind (N) is
                when N_Op_Divide =>
-                  Apply_Compile_Time_Constraint_Error
-                    (N, "division by zero?", CE_Divide_By_Zero,
-                     Loc => Sloc (Right_Opnd (N)));
+                  --  For division, we have two cases, for float division
+                  --  of an unconstrained float type, on a machine where
+                  --  Machine_Overflows is false, we don't get an exception
+                  --  at run-time, but rather an infinity or Nan. The Nan
+                  --  case is pretty obscure, so just warn about infinities.
+
+                  if Is_Floating_Point_Type (Typ)
+                    and then not Is_Constrained (Typ)
+                    and then not Machine_Overflows_On_Target
+                  then
+                     Error_Msg_N
+                       ("float division by zero, "
+                        & "may generate '+'/'- infinity?", Right_Opnd (N));
+
+                  --  For all other cases, we get a Constraint_Error
+
+                  else
+                     Apply_Compile_Time_Constraint_Error
+                       (N, "division by zero?", CE_Divide_By_Zero,
+                        Loc => Sloc (Right_Opnd (N)));
+                  end if;
+
                when N_Op_Rem =>
                   Apply_Compile_Time_Constraint_Error
                     (N, "rem with zero divisor?", CE_Divide_By_Zero,
Re: Modify gcc for use with gdb (issue5132047)
On Oct 6, 2011, at 11:53 AM, Jeff Law wrote: Presumably it hasn't been included because not all gdb's understand those bits and we typically don't build with -g3. Personally, the accessors I use are muscle-memory... Which works great until someone buries everything a level deeper :( Yeah, which works great for encouraging newcomers that doing anything with gcc is hard. They, by definition, don't have the muscle memory, and the documentation that describes which memory their muscles should have is hard to find, or non-existent. I'd merely favor any approach to make their life easier, -g3, a gdb macro package, all inline functions, no macros...