[PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand
Hello!

The attached patch fixes PR63475, where postreload CSE propagates an aliased memory operand.

The core of the problem is the call to base_alias_check when VALUE RTXes are involved. Before the call, find_base_term is used to extract the bases of x_addr and mem_addr. Please note that find_base_term is able to extract the bases from VALUE RTXes. These extracted bases were passed to base_alias_check, together with the original VALUE RTXes x_addr and mem_addr.

The problem begins here. base_alias_check doesn't handle VALUE RTXes; it uses e.g. canon_rtx on VALUEs and various GET_CODE accessors to determine properties of the passed x_addr and mem_addr. One of these checks looks for AND alignment addresses in order to prevent the

  /* Differing symbols not accessed via AND never alias. */
  if (GET_CODE (x_base) != ADDRESS && GET_CODE (y_base) != ADDRESS)
    return 0;

early exit. However, when x and y are passed as VALUE RTXes (which correspond to, and hide, an address with AND), and the preceding calls to find_base_term are nevertheless able to extract the bases of x and y, this condition fires erroneously and an invalid value is returned (0 means that the addresses X and Y are known to point to different objects).

The solution is to always extract values for x_addr and mem_addr and use them in the calls to find_base_term and base_alias_check. [It can happen that get_addr is not able to match a VALUE RTX with some address, so it is not possible to simply add a bunch of GET_CODE (x) != VALUE asserts in base_alias_check. But in this case find_base_term returns an ADDRESS RTX, so we stay in sync as far as base_alias_check is concerned (see the quoted code above).]

An added benefit of the patch is that canon_rtx now works as expected; canon_rtx does NOT handle VALUE RTXes.

A small optimization is also present. If the address is already canonicalized, we pass the original address to memrefs_conflict_p, but we still have to extract the original address for the preceding functions.
Also, we use the extracted original address in the recently added check for AND-aligned addresses when checking for MEM_READONLY_P.

The patch also removes a couple of unneeded and unused calls to canon_rtx, which also shows the level of bitrot in this area ...

2014-10-14  Uros Bizjak  ubiz...@gmail.com

    PR rtl-optimization/63475
    * alias.c (true_dependence_1): Always use get_addr to extract
    true address operands from x_addr and mem_addr.  Use extracted
    address operands to check for references with alignment ANDs.
    Use extracted address operands with find_base_term and
    base_alias_check.  For noncanonicalized operands call canon_rtx
    with extracted address operand.
    (write_dependence_1): Ditto.
    (may_alias_p): Ditto.  Remove unused calls to canon_rtx.

The patch was thoroughly tested on x86_64-linux-gnu {,-m32} and alpha-linux-gnu for all default languages plus obj-c++ and go. While there were no differences on x86_64-linux-gnu (as expected), alpha-linux-gnu improved the result [1] by some hundreds of PASSes in the gfortran testsuite [2].

OK for mainline?

[1] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01151.html
[2] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg01478.html

Uros.

Index: alias.c
===================================================================
--- alias.c (revision 216149)
+++ alias.c (working copy)
@@ -2439,6 +2439,7 @@ static int
 true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
                    const_rtx x, rtx x_addr, bool mem_canonicalized)
 {
+  rtx true_mem_addr;
   rtx base;
   int ret;
 
@@ -2458,6 +2459,10 @@ true_dependence_1 (const_rtx mem, enum machine_mod
       || MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
     return 1;
 
+  if (! x_addr)
+    x_addr = XEXP (x, 0);
+  x_addr = get_addr (x_addr);
+
   if (! mem_addr)
     {
       mem_addr = XEXP (mem, 0);
@@ -2464,23 +2469,8 @@ true_dependence_1 (const_rtx mem, enum machine_mod
       if (mem_mode == VOIDmode)
         mem_mode = GET_MODE (mem);
     }
+  true_mem_addr = get_addr (mem_addr);
 
-  if (! x_addr)
-    {
-      x_addr = XEXP (x, 0);
-      if (!((GET_CODE (x_addr) == VALUE
-             && GET_CODE (mem_addr) != VALUE
-             && reg_mentioned_p (x_addr, mem_addr))
-            || (GET_CODE (x_addr) != VALUE
-                && GET_CODE (mem_addr) == VALUE
-                && reg_mentioned_p (mem_addr, x_addr))))
-        {
-          x_addr = get_addr (x_addr);
-          if (! mem_canonicalized)
-            mem_addr = get_addr (mem_addr);
-        }
-    }
-
   /* Read-only memory is by definition never modified, and therefore can't
      conflict with anything.  However, don't assume anything when AND
      addresses are involved and leave to the code below to determine
@@ -2488,7 +2478,7 @@ true_dependence_1 (const_rtx mem, enum machine_mod
      stupid user tricks can produce them, so don't die.  */
   if (MEM_READONLY_P (x)
       && GET_CODE (x_addr) != AND
-      && GET_CODE (mem_addr) != AND)
+
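The failure mode described in the message can be sketched outside of GCC. This is only a toy model under stated assumptions: toy_rtx, toy_get_addr and the two predicates are hypothetical names, not GCC's data structures or functions. It shows that a check which inspects the code of a VALUE directly misses the AND hidden behind it, while a check that resolves the VALUE first sees it.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for an RTX: a code and one operand (assumption, not GCC's rtx).
enum toy_code { TOY_SYMBOL, TOY_AND, TOY_VALUE };

struct toy_rtx
{
  toy_code code;
  const toy_rtx *inner;   // for TOY_VALUE: the address it stands for, or NULL
};

// Rough analogue of get_addr: resolve a VALUE to the address it hides.
// If nothing is known about the VALUE, it is returned unchanged.
static const toy_rtx *
toy_get_addr (const toy_rtx *x)
{
  while (x->code == TOY_VALUE && x->inner != NULL)
    x = x->inner;
  return x;
}

// Looks only at the outermost code, so an AND behind a VALUE is missed.
static bool
toy_is_and_address_naive (const toy_rtx *x)
{
  return x->code == TOY_AND;
}

// Resolves the VALUE first, then looks at the code.
static bool
toy_is_and_address (const toy_rtx *x)
{
  return toy_get_addr (x)->code == TOY_AND;
}
```

An unresolvable VALUE stays a VALUE in this model, which loosely mirrors the bracketed remark that get_addr cannot always match a VALUE with an address.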
Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.
Hello Uroš,

It seems I forgot to post the updated patch.

On 25 Sep 20:11, Uros Bizjak wrote:
> I'd rather go with the second approach, it is less confusing from the
> maintainer POV. All other patterns with masking use some consistent
> template, so I'd suggest using the same approach for everything. If it
> is indeed too many patterns, then please split the patch to smaller
> pieces.

The goal was not to decrease the size of the patch; I wanted to make the patterns look simpler by hiding the masking stuff behind `subst'.

Anyway, I've updated the patch. Here it is (bootstrapped and regtested).

Is it ok for trunk?

gcc/
    * config/i386/sse.md
    (define_mode_iterator VI_AVX2): Extend to support AVX-512BW.
    (define_mode_iterator VI124_AVX2_48_AVX512F): Remove.
    (define_expand "<plusminus_insn><mode>3"): Remove masking support.
    (define_insn "*<plusminus_insn><mode>3"): Ditto.
    (define_expand "<plusminus_insn><VI48_AVX512VL:mode>3_mask"): New.
    (define_expand "<plusminus_insn><VI12_AVX512VL:mode>3_mask"): Ditto.
    (define_insn "*<plusminus_insn><VI48_AVX512VL:mode>3_mask"): Ditto.
    (define_insn "*<plusminus_insn><VI12_AVX512VL:mode>3_mask"): Ditto.
    (define_expand "<sse2_avx2>_andnot<mode>3"): Remove masking support.
    (define_insn "*andnot<mode>3"): Ditto.
    (define_expand "<sse2_avx2>_andnot<VI48_AVX512VL:mode>3_mask"): New.
    (define_expand "<sse2_avx2>_andnot<VI12_AVX512VL:mode>3_mask"): Ditto.
    (define_insn "*andnot<VI48_AVX512VL:mode>3<mask_name>"): Ditto.
    (define_insn "*andnot<VI12_AVX512VL:mode>3<mask_name>"): Ditto.
    (define_insn "*abs<mode>2"): Remove masking support.
    (define_insn "abs<VI48_AVX512VL:mode>2_mask"): New.
    (define_insn "abs<VI12_AVX512VL:mode>2_mask"): Ditto.
    (define_expand "abs<mode>2"): Use VI_AVX2 mode iterator.
--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ffc831f..9edfebc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -268,8 +268,8 @@
    (V4DI "TARGET_AVX") V2DI])
 
 (define_mode_iterator VI_AVX2
-  [(V32QI "TARGET_AVX2") V16QI
-   (V16HI "TARGET_AVX2") V8HI
+  [(V64QI "TARGET_AVX512BW") (V32QI "TARGET_AVX2") V16QI
+   (V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI
    (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
    (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI])
 
@@ -359,12 +359,6 @@
   [(V16HI "TARGET_AVX2") V8HI
    (V8SI "TARGET_AVX2") V4SI])
 
-(define_mode_iterator VI124_AVX2_48_AVX512F
-  [(V32QI "TARGET_AVX2") V16QI
-   (V16HI "TARGET_AVX2") V8HI
-   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI
-   (V8DI "TARGET_AVX512F")])
-
 (define_mode_iterator VI124_AVX512F
   [(V32QI "TARGET_AVX2") V16QI
    (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI
@@ -9051,20 +9045,43 @@
   "TARGET_SSE2"
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
-(define_expand "<plusminus_insn><mode>3<mask_name>"
+(define_expand "<plusminus_insn><mode>3"
   [(set (match_operand:VI_AVX2 0 "register_operand")
         (plusminus:VI_AVX2
           (match_operand:VI_AVX2 1 "nonimmediate_operand")
           (match_operand:VI_AVX2 2 "nonimmediate_operand")))]
-  "TARGET_SSE2 && <mask_mode512bit_condition>"
+  "TARGET_SSE2"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3_mask"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
+        (vec_merge:VI48_AVX512VL
+          (plusminus:VI48_AVX512VL
+            (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")
+            (match_operand:VI48_AVX512VL 2 "nonimmediate_operand"))
+          (match_operand:VI48_AVX512VL 3 "vector_move_operand")
+          (match_operand:<avx512fmaskmode> 4 "register_operand")))]
+  "TARGET_AVX512F"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3_mask"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+        (vec_merge:VI12_AVX512VL
+          (plusminus:VI12_AVX512VL
+            (match_operand:VI12_AVX512VL 1 "nonimmediate_operand")
+            (match_operand:VI12_AVX512VL 2 "nonimmediate_operand"))
+          (match_operand:VI12_AVX512VL 3 "vector_move_operand")
+          (match_operand:<avx512fmaskmode> 4 "register_operand")))]
+  "TARGET_AVX512BW"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
-(define_insn "*<plusminus_insn><mode>3<mask_name>"
+(define_insn "*<plusminus_insn><mode>3"
   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
         (plusminus:VI_AVX2
           (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,v")
           (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands) && <mask_mode512bit_condition>"
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0<mask_operand3>|%0<mask_operand3>, %1, %2}"
@@ -9074,6 +9091,35 @@
    (set_attr "prefix" "<mask_prefix3>")
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_insn "*<plusminus_insn><mode>3_mask"
+  [(set
Move loop peeling from RTL to gimple
Hi,
this is an update of my 2013 update to the 2012 patch to move RTL loop peeling to the tree level. This is to expose optimization opportunities earlier. Incrementally, I think I can also improve profiling to provide a histogram of loop iterations and get more sensible peeling decisions.

Profiled-bootstrapped/regtested x86_64-linux, OK?

Honza

    * loop-unroll.c: (decide_unrolling_and_peeling): Rename to
    (decide_unrolling): ... this one.
    (peel_loops_completely): Remove.
    (decide_peel_simple): Remove.
    (decide_peel_once_rolling): Remove.
    (decide_peel_completely): Remove.
    (peel_loop_simple): Remove.
    (peel_loop_completely): Remove.
    (unroll_and_peel_loops): Rename to ...
    (unroll_loops): ... this one; handle only unrolling.
    * cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and
    LPT_PEEL_SIMPLE.
    (UAP_PEEL): Remove.
    (unroll_and_peel_loops): Remove.
    (unroll_loops): New.
    * passes.def: Replace pass_rtl_unroll_and_peel_loops
    by pass_rtl_unroll_loops.
    * loop-init.c (gate_rtl_unroll_and_peel_loops,
    rtl_unroll_and_peel_loops): Rename to ...
    (gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update.
    (pass_rtl_unroll_and_peel_loops): Rename to ...
    (pass_rtl_unroll_loops): ... this one.
    * tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove.
    (make_pass_rtl_unroll_loops): New.
    * tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size,
    try_peel_loop): New.
    (canonicalize_loop_induction_variables): Update.

    * gcc.dg/tree-prof/peel-1.c: Update.
    * gcc.dg/tree-prof/unroll-1.c: Update.
    * gcc.dg/gcc.dg/unroll_1.c: Update.
    * gcc.dg/gcc.dg/unroll_2.c: Update.
    * gcc.dg/gcc.dg/unroll_3.c: Update.
    * gcc.dg/gcc.dg/unroll_4.c: Update.
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 216145)
+++ tree-pass.h (working copy)
@@ -504,7 +504,7 @@ extern rtl_opt_pass *make_pass_outof_cfg
 extern rtl_opt_pass *make_pass_loop2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_loop_init (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_move_loop_invariants (gcc::context *ctxt);
-extern rtl_opt_pass *make_pass_rtl_unroll_and_peel_loops (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_rtl_unroll_loops (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_doloop (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_rtl_loop_done (gcc::context *ctxt);
 
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c (revision 216145)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -28,9 +28,12 @@ along with GCC; see the file COPYING3.
    variables.  In that case the created optimization possibilities are likely
    to pay up.
 
-   Additionally in case we detect that it is beneficial to unroll the
-   loop completely, we do it right here to expose the optimization
-   possibilities to the following passes.  */
+   We also perform
+     - complette unrolling (or peeling) when the loops is rolling few enough
+       times
+     - simple peeling (i.e. copying few initial iterations prior the loop)
+       when number of iteration estimate is known (typically by the profile
+       info).  */
 
 #include "config.h"
 #include "system.h"
@@ -657,11 +660,12 @@ try_unroll_loop_completely (struct loop
                             HOST_WIDE_INT maxiter,
                             location_t locus)
 {
-  unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
+  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, max_unroll, unr_insns;
   gimple cond;
   struct loop_size size;
   bool n_unroll_found = false;
   edge edge_to_cancel = NULL;
+  int report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_RTL | TDF_DETAILS;
 
   /* See if we proved number of iterations to be low constant.
@@ -821,6 +825,8 @@ try_unroll_loop_completely (struct loop
                  loop->num);
       return false;
     }
+  dump_printf_loc (report_flags, locus,
+                   "loop turned into non-loop; it never loops.\n");
 
   initialize_original_copy_tables ();
   wont_exit = sbitmap_alloc (n_unroll + 1);
@@ -902,6 +908,133 @@ try_unroll_loop_completely (struct loop
   return true;
 }
 
+/* Return number of instructions after peeling.  */
+static unsigned HOST_WIDE_INT
+estimated_peeled_sequence_size (struct loop_size *size,
+                                unsigned HOST_WIDE_INT npeel)
+{
+  return MAX (npeel * (HOST_WIDE_INT) (size->overall
+                                       - size->eliminated_by_peeling), 1);
+}
+
+/* If the loop is expected to iterate N times and is
+   small enough, duplicate the loop body N+1 times before
+   the loop itself.  This way the hot path will never
+   enter the loop.
+   Parameters are the same as for
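The size estimate used by estimated_peeled_sequence_size can be tried in isolation. The following is a minimal sketch, assuming a cut-down toy_loop_size with only the two fields the quoted formula reads; the struct, the function name and the signed long long arithmetic are assumptions for illustration (the patch itself uses unsigned HOST_WIDE_INT).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical cut-down version of struct loop_size: only the two
// fields that the formula in the patch reads.
struct toy_loop_size
{
  int overall;                 // instructions in one copy of the body
  int eliminated_by_peeling;   // instructions expected to disappear per copy
};

// Estimated number of instructions after peeling NPEEL copies:
// NPEEL times the per-copy cost, but never less than 1.
static long long
toy_estimated_peeled_size (const toy_loop_size &size, long long npeel)
{
  long long per_copy = size.overall - size.eliminated_by_peeling;
  return std::max (npeel * per_copy, 1LL);
}
```

For a body of 10 instructions of which 3 are expected to fold away, peeling 4 copies is estimated at 4 * (10 - 3) = 28 instructions; if everything folds away, the estimate is clamped to 1.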
Re: Towards GNU11
On Tue, Oct 07, 2014 at 11:07:56PM +0200, Marek Polacek wrote:
> I'd like to kick off a discussion about moving the default standard
> for C from gnu89 to gnu11.

The consensus seems to be to go forward with this change. I will commit the patch in 24 hours unless I hear objections.

Thanks,
Marek
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On Mon, 13 Oct 2014, Bernd Schmidt wrote: On 10/13/2014 12:33 PM, Ilya Verbin wrote: On 13 Oct 12:19, Jakub Jelinek wrote: But I'd like to understand why is this one needed. Why should the compilers care? Aggregates layout and alignment of integral/floating types must match between host and offload compilers, sure, but isn't that something streamed already in the LTO bytecode? Or is LTO streamer not streaming some types like long_type_node? It isn't, see the preload_common_nodes code. Something I'd like to get rid of at some point (but it's not 100% easy as backends for example compare va_list_type_node by pointer). Also, the backend needs to choose the right Pmode (and in the case of ptx, emit a directive about address sizes). Surely that will only be one problem with going the LTO way to handle the offloading ;) Richard.
[PATCH] Fix PR63512
I forgot to mark stmts as modified.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

    PR tree-optimization/63512
    * tree-ssa-pre.c (create_expression_by_pieces): Mark stmts
    modified.

    * g++.dg/torture/pr63512.C: New testcase.

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c (revision 216146)
+++ gcc/tree-ssa-pre.c (working copy)
@@ -2897,6 +2897,7 @@ create_expression_by_pieces (basic_block
                 }
 
               gimple_set_vuse (stmt, BB_LIVE_VOP_ON_EXIT (block));
+              gimple_set_modified (stmt, true);
             }
           gimple_seq_add_seq (stmts, forced_stmts);
         }
@@ -2904,6 +2905,7 @@ create_expression_by_pieces (basic_block
   name = make_temp_ssa_name (exprtype, NULL, "pretmp");
   newstmt = gimple_build_assign (name, folded);
   gimple_set_vuse (newstmt, BB_LIVE_VOP_ON_EXIT (block));
+  gimple_set_modified (newstmt, true);
   gimple_set_plf (newstmt, NECESSARY, false);
 
   gimple_seq_add_stmt (stmts, newstmt);
Index: gcc/testsuite/g++.dg/torture/pr63512.C
===================================================================
--- gcc/testsuite/g++.dg/torture/pr63512.C (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr63512.C (working copy)
@@ -0,0 +1,46 @@
+// { dg-do compile }
+
+extern "C" {
+void __assert_fail ();
+unsigned long strlen (const char *);
+}
+class A
+{
+  int Data;
+  int Length;
+
+public:
+  A (const char *p1) : Data ()
+  {
+    p1 ? void() : __assert_fail ();
+    Length = strlen (p1);
+  }
+};
+enum TokenKind
+{
+  semi
+};
+class B
+{
+public:
+  void m_fn1 ();
+};
+class C
+{
+  void m_fn2 (TokenKind, int, A);
+  struct D
+  {
+    D (int);
+    B Range;
+  };
+  int *m_fn3 (const int &, int &, int **);
+};
+int a, b;
+int *
+C::m_fn3 (const int &, int &, int **)
+{
+  D c (0);
+  if (a)
+    c.Range.m_fn1 ();
+  m_fn2 (semi, 0, b ? "" : a ? "alias declaration" : "using declaration");
+}
Re: Move loop peeling from RTL to gimple
On Tue, 14 Oct 2014, Jan Hubicka wrote:

> Hi,
> this is an update of my 2013 update to the 2012 patch to move RTL loop
> peeling to the tree level. This is to expose optimization opportunities
> earlier. Incrementally, I think I can also improve profiling to provide
> a histogram of loop iterations and get more sensible peeling decisions.
>
> Profiled-bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> Honza
>
>     * loop-unroll.c: (decide_unrolling_and_peeling): Rename to
>     (decide_unrolling): ... this one.
>     (peel_loops_completely): Remove.
>     (decide_peel_simple): Remove.
>     (decide_peel_once_rolling): Remove.
>     (decide_peel_completely): Remove.
>     (peel_loop_simple): Remove.
>     (peel_loop_completely): Remove.
>     (unroll_and_peel_loops): Rename to ...
>     (unroll_loops): ... this one; handle only unrolling.
>     * cfgloop.h (lpt_dec): Remove LPT_PEEL_COMPLETELY and
>     LPT_PEEL_SIMPLE.
>     (UAP_PEEL): Remove.
>     (unroll_and_peel_loops): Remove.
>     (unroll_loops): New.
>     * passes.def: Replace pass_rtl_unroll_and_peel_loops
>     by pass_rtl_unroll_loops.
>     * loop-init.c (gate_rtl_unroll_and_peel_loops,
>     rtl_unroll_and_peel_loops): Rename to ...
>     (gate_rtl_unroll_loops, rtl_unroll_loops): ... these; update.
>     (pass_rtl_unroll_and_peel_loops): Rename to ...
>     (pass_rtl_unroll_loops): ... this one.
>     * tree-pass.h (make_pass_rtl_unroll_and_peel_loops): Remove.
>     (make_pass_rtl_unroll_loops): New.
>     * tree-ssa-loop-ivcanon.c: (estimated_peeled_sequence_size,
>     try_peel_loop): New.
>     (canonicalize_loop_induction_variables): Update.
>
>     * gcc.dg/tree-prof/peel-1.c: Update.
>     * gcc.dg/tree-prof/unroll-1.c: Update.
>     * gcc.dg/gcc.dg/unroll_1.c: Update.
>     * gcc.dg/gcc.dg/unroll_2.c: Update.
>     * gcc.dg/gcc.dg/unroll_3.c: Update.
>     * gcc.dg/gcc.dg/unroll_4.c: Update.
[PATCH][match-and-simplify] Change back default behavior of fold_stmt
This changes the default behavior of fold_stmt back to _not_ following SSA use-def chains when trying to simplify things. I had to force that already for one caller and for the merge to trunk I'd rather not track down issues in every other existing caller.

This means that fold_stmt will not become more powerful, at least for now. I still hope to get rid of its use of fold() during the merge process.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

(yeah, I'm preparing a first batch of changes to merge from the branch)

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

    * gimple-fold.c (fold_stmt): Make old API never follow SSA edges
    when simplifying.
    (no_follow_ssa_edges): New function.
    * tree-cfg.c (no_follow_ssa_edges): Remove.
    (replace_uses_by): Use plain fold_stmt again.

Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c (revision 216146)
+++ gcc/gimple-fold.c (working copy)
@@ -3136,6 +3136,14 @@ fail:
   return changed;
 }
 
+/* Valueziation callback that ends up not following SSA edges.  */
+
+static tree
+no_follow_ssa_edges (tree)
+{
+  return NULL_TREE;
+}
+
 /* Fold the statement pointed to by GSI.  In some cases, this function may
    replace the whole statement with a new one.  Returns true iff folding
    makes any changes.
@@ -3146,7 +3154,7 @@ fail:
 bool
 fold_stmt (gimple_stmt_iterator *gsi)
 {
-  return fold_stmt_1 (gsi, false, NULL);
+  return fold_stmt_1 (gsi, false, no_follow_ssa_edges);
 }
 
 bool
@@ -3167,7 +3175,7 @@ bool
 fold_stmt_inplace (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
-  bool changed = fold_stmt_1 (gsi, true, NULL);
+  bool changed = fold_stmt_1 (gsi, true, no_follow_ssa_edges);
   gcc_assert (gsi_stmt (*gsi) == stmt);
   return changed;
 }
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c (revision 216146)
+++ gcc/tree-cfg.c (working copy)
@@ -1709,14 +1709,6 @@ gimple_can_merge_blocks_p (basic_block a
   return true;
 }
 
-/* ??? Maybe this should be a generic overload of fold_stmt.  */
-
-static tree
-no_follow_ssa_edges (tree)
-{
-  return NULL_TREE;
-}
-
 /* Replaces all uses of NAME by VAL.  */
 
 void
@@ -1773,17 +1765,7 @@ replace_uses_by (tree name, tree val)
               recompute_tree_invariant_for_addr_expr (op);
             }
 
-          /* If we have sth like
-               neighbor_29 = name + -1;
-               _33 = name + neighbor_29;
-             and substitute 1 for name then when visiting
-             _33 first then folding will simplify the stmt
-             to _33 = name; and the new immediate use will
-             be inserted before the stmt iterator marker and
-             thus we fail to visit it again, ICEing within the
-             has_zero_uses assert.
-             Avoid that by never following SSA edges.  */
-          if (fold_stmt (&gsi, no_follow_ssa_edges))
+          if (fold_stmt (&gsi))
             stmt = gsi_stmt (gsi);
 
           if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt))
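The role of the valueization callback can be illustrated with a toy folder. This is a sketch under stated assumptions, not GCC's implementation: toy_def, toy_fold_plus, follow_all and no_follow are hypothetical names, SSA names are modeled as plain integers, and the callback returns a bool where GCC's callback returns a tree (with NULL_TREE meaning "do not follow").

```cpp
#include <cassert>
#include <map>

// Toy IR: an SSA name is an int id.  Its definition is "operand + addend",
// where operand is another name id, or -1 when the value is the constant
// addend alone.
struct toy_def
{
  int operand;
  long addend;
};
typedef std::map<int, toy_def> toy_defs;

// Valueization callback: may the folder look at the definition of NAME?
typedef bool (*toy_valueize_fn) (int name);

static bool follow_all (int) { return true; }
static bool no_follow (int) { return false; }   // analogue of no_follow_ssa_edges

// Try to fold NAME + ADDEND to a constant by walking use-def chains.
// Every step asks VALUEIZE for permission; on refusal or on a missing
// definition nothing is folded and *RESULT is left untouched.
static bool
toy_fold_plus (const toy_defs &defs, int name, long addend,
               toy_valueize_fn valueize, long *result)
{
  long sum = addend;
  while (name != -1)
    {
      if (!valueize (name))
        return false;
      toy_defs::const_iterator it = defs.find (name);
      if (it == defs.end ())
        return false;
      sum += it->second.addend;
      name = it->second.operand;
    }
  *result = sum;
  return true;
}
```

With the refusing callback the folder sees each statement in isolation, which is the conservative behavior the patch restores for existing fold_stmt callers.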
[v3] Rename a few testcases
Hi,

I'm renaming a few testcases which actually are about alias declarations, not typedefs.

Thanks,
Paolo.

2014-10-14  Paolo Carlini  paolo.carl...@oracle.com

    * testsuite/20_util/add_lvalue_reference/requirements/typedefs.cc:
    Rename to alias_decl.cc.
    * testsuite/20_util/add_rvalue_reference/requirements/typedefs.cc:
    Likewise.
    * testsuite/20_util/common_type/requirements/typedefs-3.cc: Likewise.
    * testsuite/20_util/conditional/requirements/typedefs-2.cc: Likewise.
    * testsuite/20_util/decay/requirements/typedefs-2.cc: Likewise.
    * testsuite/20_util/enable_if/requirements/typedefs-2.cc: Likewise.
    * testsuite/20_util/make_signed/requirements/typedefs-3.cc: Likewise.
    * testsuite/20_util/make_unsigned/requirements/typedefs-3.cc: Likewise.
    * testsuite/20_util/remove_reference/requirements/typedefs.cc: Likewise.
    * testsuite/20_util/result_of/requirements/typedefs.cc: Likewise.
    * testsuite/20_util/underlying_type/requirements/typedefs-3.cc:
    Likewise.
Re: [PATCH 3/5] timevar.h: Add an auto_timevar class
On Mon, Oct 13, 2014 at 7:45 PM, David Malcolm dmalc...@redhat.com wrote:
> This is used in a couple of places in jit/jit-playback.c to ensure that
> we pop the timevar on every exit path from a function.
>
> I could rewrite them if need be, but it does simplify things.

Sorry to be bikeshedding, but auto_timevar sounds odd - this is just a one-element timevar stack. I don't have a really better name though :/ Maybe timevar_pushpop?

Otherwise this looks ok.

Thanks,
Richard.

> Written by Tom Tromey.
>
> gcc/ChangeLog:
>     * timevar.h (class auto_timevar): New class.
> ---
>  gcc/timevar.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/gcc/timevar.h b/gcc/timevar.h
> index 6703cc9..f018e39 100644
> --- a/gcc/timevar.h
> +++ b/gcc/timevar.h
> @@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv)
>      timevar_pop_1 (tv);
>  }
>
> +// This is a simple timevar wrapper class that pushes a timevar in its
> +// constructor and pops the timevar in its destructor.
> +class auto_timevar
> +{
> + public:
> +  auto_timevar (timevar_id_t tv)
> +    : m_tv (tv)
> +  {
> +    timevar_push (m_tv);
> +  }
> +
> +  ~auto_timevar ()
> +  {
> +    timevar_pop (m_tv);
> +  }
> +
> + private:
> +
> +  // Private to disallow copies.
> +  auto_timevar (const auto_timevar &);
> +
> +  timevar_id_t m_tv;
> +};
> +
>  extern void print_time (const char *, long);
>
>  #endif /* ! GCC_TIMEVAR_H */
> --
> 1.8.5.3
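The "pop on every exit path" property can be checked with a standalone sketch of the same idiom. Everything here is a stand-in: toy_stack, toy_push, toy_pop, scoped_timevar and classify are hypothetical names, and the timevar stack is modeled as a plain std::vector rather than GCC's timevar machinery.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for timevar_push/timevar_pop (assumptions, not GCC code).
typedef int toy_timevar_id;
static std::vector<toy_timevar_id> toy_stack;

static void
toy_push (toy_timevar_id tv)
{
  toy_stack.push_back (tv);
}

static void
toy_pop (toy_timevar_id tv)
{
  // Pops must match the most recent push.
  assert (!toy_stack.empty () && toy_stack.back () == tv);
  toy_stack.pop_back ();
}

// Same shape as the wrapper in the patch: push in the constructor,
// pop in the destructor, copying disallowed.
class scoped_timevar
{
 public:
  explicit scoped_timevar (toy_timevar_id tv) : m_tv (tv) { toy_push (m_tv); }
  ~scoped_timevar () { toy_pop (m_tv); }

 private:
  // Declared but not defined, to disallow copies.
  scoped_timevar (const scoped_timevar &);
  scoped_timevar &operator= (const scoped_timevar &);

  toy_timevar_id m_tv;
};

// A function with several exit paths; the guard pops on each of them.
static int
classify (int n)
{
  scoped_timevar guard (42);
  if (n < 0)
    return -1;
  if (n == 0)
    return 0;
  return 1;
}
```

Whichever return statement is taken, the destructor runs when guard goes out of scope, so the toy stack is empty again after every call.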
Re: [PATCH 1/2] Revert PR49721's patch
On Tue, Oct 14, 2014 at 12:35 AM, Andrew Pinski pins...@gmail.com wrote:
> On Fri, Aug 8, 2014 at 8:51 PM, Andrew Pinski apin...@cavium.com wrote:
>> OK?  When the second patch is approved?
>
> Ping?

Ok if the second patch was approved.

Richard.

>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> Revert:
>> 2011-08-19  H.J. Lu  hongjiu...@intel.com
>>
>>     PR middle-end/49721
>>     * explow.c (convert_memory_address_addr_space): Also permute the
>>     conversion and addition of constant for zero-extend.
>> ---
>>  gcc/explow.c | 19 +++++++------------
>>  1 files changed, 7 insertions(+), 12 deletions(-)
>>
>> diff --git a/gcc/explow.c b/gcc/explow.c
>> index 92c4e57..eb7dc85 100644
>> --- a/gcc/explow.c
>> +++ b/gcc/explow.c
>> @@ -376,23 +376,18 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
>>
>>      case PLUS:
>>      case MULT:
>> -      /* FIXME: For addition, we used to permute the conversion and
>> -         addition operation only if one operand is a constant and
>> -         converting the constant does not change it or if one operand
>> -         is a constant and we are using a ptr_extend instruction
>> -         (POINTERS_EXTEND_UNSIGNED < 0) even if the resulting address
>> -         may overflow/underflow.  We relax the condition to include
>> -         zero-extend (POINTERS_EXTEND_UNSIGNED > 0) since the other
>> -         parts of the compiler depend on it.  See PR 49721.
>> -
>> +      /* For addition we can safely permute the conversion and addition
>> +         operation if one operand is a constant and converting the constant
>> +         does not change it or if one operand is a constant and we are
>> +         using a ptr_extend instruction (POINTERS_EXTEND_UNSIGNED < 0).
>>           We can always safely permute them if we are making the address
>>           narrower.  */
>>        if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
>>            || (GET_CODE (x) == PLUS
>>                && CONST_INT_P (XEXP (x, 1))
>> -              && (POINTERS_EXTEND_UNSIGNED != 0
>> -                  || XEXP (x, 1) == convert_memory_address_addr_space
>> -                                      (to_mode, XEXP (x, 1), as))))
>> +              && (XEXP (x, 1) == convert_memory_address_addr_space
>> +                                   (to_mode, XEXP (x, 1), as)
>> +                 || POINTERS_EXTEND_UNSIGNED < 0)))
>>          return gen_rtx_fmt_ee (GET_CODE (x), to_mode,
>>                                 convert_memory_address_addr_space
>>                                   (to_mode, XEXP (x, 0), as),
>> --
>> 1.7.2.5
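Why the order of conversion and addition matters for zero-extension can be seen with plain integer arithmetic. This sketch only illustrates the general wrap-around effect, assuming a 32-bit narrow mode and a 64-bit wide mode; it is not a reproduction of the specific failure behind the revert, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend both operands first, then add in the wide (64-bit) mode.
static uint64_t
extend_then_add (uint32_t base, uint32_t offset)
{
  return static_cast<uint64_t> (base) + static_cast<uint64_t> (offset);
}

// Add in the narrow (32-bit) mode, where the sum wraps modulo 2^32,
// then zero-extend the result.
static uint64_t
add_then_extend (uint32_t base, uint32_t offset)
{
  return static_cast<uint64_t> (static_cast<uint32_t> (base + offset));
}
```

The two orders agree as long as the narrow addition does not wrap; for base 0xFFFFFFFF and offset 1 the first yields 0x100000000 and the second yields 0, which is the overflow case the removed FIXME comment mentions.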
Re: [PATCH 4/n] OpenMP 4.0 offloading infrastructure: lto-wrapper
On Tue, Oct 14, 2014 at 02:42:47AM +0400, Ilya Verbin wrote:
> > For that I guess lhd_begin_section would need to replace:
> >   section = get_section (name, SECTION_DEBUG, NULL);
> > with:
> >   section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
> > either just for the .gnu.offload_lto prefixed section, or all.
> > The question is what will old assemblers and/or linkers do with that,
> > and if there are any that support linker plugins, but not SHF_EXCLUDE.
>
> I've tried to set SECTION_EXCLUDE bit with as+ld version 2.20.51 and got
> a lot of warnings like:
>
> /tmp/ccg7P7iS.s:2: Warning: entity size for SHF_MERGE not specified
> /tmp/ccg7P7iS.s:2: Warning: group name for SHF_GROUP not specified
> as: /tmp/ccKFKXfc.o: warning: sh_link not set for section `.gnu.lto_main.11d9780ff2ebf166'
> /usr/bin/ld: /tmp/ccKFKXfc.o: warning: sh_link not set for section `.gnu.lto_main.11d9780ff2ebf166'
>
> I think, it can be placed under such ifdef:
>
> #if defined (HAVE_SECTION_EXCLUDE) && HAVE_SECTION_EXCLUDE == 1
>   section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL);
> #else
>   section = get_section (name, SECTION_DEBUG, NULL);
> #endif
>
> Currently there is HAVE_GAS_SECTION_EXCLUDE implemented in
> gcc/configure.ac, and HAVE_SECTION_EXCLUDE can use it + check a version
> of the linker.

My preference would be to add the | SECTION_EXCLUDE unconditionally, and instead guard the

  if (flags & SECTION_EXCLUDE)
    *f++ = 'e';

in varasm.c (default_elf_asm_named_section).

The only other user of SECTION_EXCLUDE seems to be -gsplit-dwarf right now. Cary, is such a change ok with you? If you have new gas and an old linker, I'd expect it would just ignore SHF_EXCLUDE.

    Jakub
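The preference stated above (always set the flag, guard only the emission) might look roughly like this. It is a sketch with made-up pieces: the TOY_SECTION_* values, toy_flag_chars and the gas_supports_exclude parameter are assumptions for illustration, and the real default_elf_asm_named_section handles many more flags.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag bits; the real SECTION_* values live in GCC's headers.
enum toy_section_flags
{
  TOY_SECTION_WRITE = 1 << 0,
  TOY_SECTION_EXCLUDE = 1 << 1
};

// Build the flag characters for a .section directive.  The exclude flag
// is always accepted in FLAGS, but 'e' is only emitted when the assembler
// is known to understand it, so callers need no #ifdef of their own.
static std::string
toy_flag_chars (unsigned flags, bool gas_supports_exclude)
{
  std::string chars;
  if (flags & TOY_SECTION_WRITE)
    chars += 'w';
  if ((flags & TOY_SECTION_EXCLUDE) && gas_supports_exclude)
    chars += 'e';
  return chars;
}
```

The design point is that the capability check sits in one place, next to the code that prints the flag, instead of at every get_section call site.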
[PING] [PATCH] Fix PR ipa/61190, 2nd edition
Ping...

see: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00536.html

Hi Honza,

as you know, we have a wrong-code bug when a pure or const method is called via a virtual thunk. I had some more ideas on how to fix that, but all of them had serious drawbacks, so I leave the details out...

But now I have a new insight into why the obvious fix for this serious code generation bug did not work in the first place. The reason was that if ipa-pure-const.c calls set_const_flag or set_pure_flag for a thunk, it calls the same function later for the called method, and this overwrites the flags of _all_ associated thunks and aliases. However, that should at least not be done for virtual thunks, as these need to be IPA_NEITHER even if the method itself has different attributes; that is because the assembler thunk accesses the vtable, while other thunks do not.

So I refactored set_const_flag and set_pure_flag to exclude the virtual thunks, taking care that other users of call_for_symbol_thunks_and_aliases do not get a different behavior than before this patch.

The attached patch was bootstrapped and regression-tested on x86_64-linux-gnu. Ok for trunk?

PS: As a side note, there are two identical functions named call_for_symbol_and_aliases, in class symtab_node and in class cgraph_node, which inherits from symtab_node. Neither function is declared virtual. Is that what's intended? Usually this could lead to errors, or at least some serious compiler warnings.

Thanks
Bernd.
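The concern in the PS is about name hiding: a non-virtual member function redeclared in a derived class is selected by the static type of the object expression, not by its dynamic type. A minimal sketch with hypothetical stand-in classes (base_node and derived_node are illustrations, not symtab_node and cgraph_node):

```cpp
#include <cassert>
#include <string>

// Stand-in for a base class with a non-virtual member function.
struct base_node
{
  std::string visit () { return "base"; }
};

// The derived class declares a function with the same name.  Since the
// base function is not virtual, this hides it instead of overriding it.
struct derived_node : base_node
{
  std::string visit () { return "derived"; }
};

// Calls through a base pointer always reach base_node::visit, even when
// the object is really a derived_node.
static std::string
visit_via_base (base_node *n)
{
  return n->visit ();
}
```

When both bodies are identical, as reported for call_for_symbol_and_aliases, the two calls behave the same; the risk only shows up once the copies diverge.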
[PATCH][match-and-simplify] More TLC to genmatch
This applies more comment / whitespace TLC to genmatch and does minor refactoring on-the-fly.

Bootstrap running on x86_64-unknown-linux-gnu.

Richard.

2014-10-14  Richard Biener  rguent...@suse.de

    * genmatch.c: Whitespace and comment fixes, some minor
    refactoring.

Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c (revision 216146)
+++ gcc/genmatch.c (working copy)
@@ -390,7 +390,8 @@ struct expr : public operand
   /* Whether the operation is to be applied commutatively.  This is
      later lowered to two separate patterns.  */
   bool is_commutative;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand ** = 0);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand ** = 0);
 };
 
 /* An operator that is represented by native C code.  This is always
@@ -419,7 +420,8 @@ struct c_expr : public operand
   unsigned nr_stmts;
   /* The identifier replacement vector.  */
   vec<id_tab> ids;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand **);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand **);
 };
 
 /* A wrapper around another operand that captures its value.  */
@@ -432,7 +434,8 @@ struct capture : public operand
   unsigned where;
   /* The captured value.  */
   operand *what;
-  virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand ** = 0);
+  virtual void gen_transform (FILE *f, const char *, bool, int,
+                              const char *, dt_operand ** = 0);
 };
 
 template<>
@@ -569,7 +572,8 @@ print_matches (struct simplify *s, FILE
 /* Lowering of commutative operators.  */
 
 static void
-cartesian_product (const vec< vec<operand *> >& ops_vector, vec< vec<operand *> >& result, vec<operand *>& v, unsigned n)
+cartesian_product (const vec< vec<operand *> >& ops_vector,
+                   vec< vec<operand *> >& result, vec<operand *>& v, unsigned n)
 {
   if (n == ops_vector.length ())
     {
@@ -584,14 +588,8 @@ cartesian_product (const vec< vec<operan
       cartesian_product (ops_vector, result, v, n + 1);
     }
 }
-
-static void
-cartesian_product (const vec< vec<operand *> >& ops_vector, vec< vec<operand *> >& result, unsigned n_ops)
-{
-  vec<operand *> v = vNULL;
-  v.safe_grow_cleared (n_ops);
-  cartesian_product (ops_vector, result, v, 0);
-}
+
+/* Lower OP to two operands in case it is marked as commutative.  */
 
 static vec<operand *>
 commutate (operand *op)
@@ -625,8 +623,11 @@ commutate (operand *op)
   for (unsigned i = 0; i < e->ops.length (); ++i)
     ops_vector.safe_push (commutate (e->ops[i]));
 
-  vec< vec<operand *> > result = vNULL;
-  cartesian_product (ops_vector, result, e->ops.length ());
+  auto_vec< vec<operand *> > result;
+  auto_vec<operand *> v (e->ops.length ());
+  v.quick_grow_cleared (e->ops.length ());
+  cartesian_product (ops_vector, result, v, 0);
+
 
   for (unsigned i = 0; i < result.length (); ++i)
     {
@@ -651,6 +652,9 @@ commutate (operand *op)
   return ret;
 }
 
+/* Lower operations marked as commutative in the AST of S and push
+   the resulting patterns to SIMPLIFIERS.  */
+
 static void
 lower_commutative (simplify *s, vec<simplify *>& simplifiers)
 {
@@ -664,15 +668,16 @@ lower_commutative (simplify *s, vec<simp
     }
 }
 
-/* Lowering of conditional converts.  */
+/* Strip conditional conversios using operator OPER from O and its
+   children if STRIP, else replace them with an unconditional convert.  */
 
-static operand *
-lower_opt_convert (operand *o, enum tree_code oper)
+operand *
+lower_opt_convert (operand *o, enum tree_code oper, bool strip)
 {
-  if (capture *c = dyn_cast<capture *> (o))
+  if (capture *c = dyn_cast<capture *> (o))
     {
       if (c->what)
-        return new capture (c->where, lower_opt_convert (c->what, oper));
+        return new capture (c->where, lower_opt_convert (c->what, oper, strip));
       else
         return c;
     }
@@ -683,42 +688,23 @@ lower_opt_convert (operand *o, enum tree
 
   if (*e->operation == oper)
     {
+      if (strip)
+        return lower_opt_convert (e->ops[0], oper, strip);
+
       expr *ne = new expr (get_operator ("CONVERT_EXPR"));
-      ne->append_op (lower_opt_convert (e->ops[0], oper));
+      ne->append_op (lower_opt_convert (e->ops[0], oper, strip));
       return ne;
     }
 
   expr *ne = new expr (e->operation, e->is_commutative);
   for (unsigned i = 0; i < e->ops.length (); ++i)
-    ne->append_op (lower_opt_convert (e->ops[i], oper));
+    ne->append_op (lower_opt_convert (e->ops[i], oper, strip));
 
   return ne;
 }
 
-operand *
-remove_opt_convert (operand *o, enum tree_code oper)
-{
-  if (capture *c = dyn_cast<capture *> (o))
-    {
-      if (c->what)
-        return new capture (c->where, remove_opt_convert (c->what, oper));
-      else
-        return c;
-    }
-
-  expr *e = as_a<expr *> (o);
-  if (!e)
-
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [4/n] Memory accesses instrumentation
On 13 Oct 14:52, Jeff Law wrote: On 10/08/14 13:01, Ilya Enkovich wrote: Hi, This is the main chunk of instrumentation codes. This patch introduces instrumentation pass which instruments memory accesses. Thanks, Ilya -- 2014-10-08 Ilya Enkovichilya.enkov...@intel.com * tree-chkp.c (chkp_may_complete_phi_bounds): New. (chkp_may_finish_incomplete_bounds): New. (chkp_recompute_phi_bounds): New. (chkp_find_valid_phi_bounds): New. (chkp_finish_incomplete_bounds): New. (chkp_maybe_copy_and_register_bounds): New. (chkp_build_returned_bound): New. (chkp_get_bound_for_parm): New. (chkp_compute_bounds_for_assignment): New. (chkp_get_bounds_by_definition): New. (chkp_get_bounds_for_decl_addr): New. (chkp_get_bounds_for_string_cst): New. (chkp_parse_array_and_component_ref): New. (chkp_make_addressed_object_bounds): New. (chkp_find_bounds_1): New. (chkp_find_bounds): New. (chkp_find_bounds_loaded): New. (chkp_copy_bounds_for_elem): New. (chkp_process_stmt): New. (chkp_fix_cfg): New. (chkp_instrument_function): New. (chkp_fini): New. (chkp_execute): New. (chkp_gate): New. (pass_data_chkp): New. (pass_chkp): New. (make_pass_chkp): New. @@ -491,6 +910,129 @@ chkp_get_bounds_var (tree ptr_var) return bnd_var; } + + +/* Register bounds BND for object PTR in global bounds table. + A copy of bounds may be created for abnormal ssa names. + Returns bounds to use for PTR. */ +static tree +chkp_maybe_copy_and_register_bounds (tree ptr, tree bnd) +{ + bool abnormal_ptr; + + if (!chkp_reg_bounds) +return bnd; + + /* Do nothing if bounds are incomplete_bounds + because it means bounds will be recomputed. */ + if (bnd == incomplete_bounds) +return bnd; + + abnormal_ptr = (TREE_CODE (ptr) == SSA_NAME + SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ptr) + gimple_code (SSA_NAME_DEF_STMT (ptr)) != GIMPLE_PHI); + + /* A single bounds value may be reused multiple times for + different pointer values. It may cause coalescing issues + for abnormal SSA names. 
To avoid it we create a bounds + copy in case it is copmputed for abnormal SSA name. s/copmputed/computed/ + if (!bounds) +{ + tree orig_decl = cgraph_node::get (cfun-decl)-orig_decl; + + /* For static chain param we return zero bounds + because currently we do not check dereferences + of this pointer. */ + /* ?? Is it a correct way to identify such parm? */ + if (cfun-decl DECL_STATIC_CHAIN (cfun-decl) + DECL_ARTIFICIAL (decl)) +bounds = chkp_get_zero_bounds (); Are you just looking for the parameter in which we pass the static chain? Look at get_chain_decl for how we set it up. You may actually have to peek at more fields. I don't think there's a single magic bit that says this is the static chain. Though it may always appear in the same location on the parameter list. Nested functions aren't something I'd poked with much. Richard Henderson might know more since he wrote tree-nested a while back. Looking through tree-nested.c I found there is a static_chain_decl in function structure holding created decl. @@ -1107,6 +1821,323 @@ chkp_build_bndstx (tree addr, tree ptr, tree bounds, } } +/* Compute bounds for pointer NODE which was assigned in + assignment statement ASSIGN. Return computed bounds. */ +static tree +chkp_compute_bounds_for_assignment (tree node, gimple assign) Ugh. Note how this introduces another place that anyone who might add a new RHS gimple statement needs to edit. We need a pointer back to this code so that folks will know it needs updating. The question is where to put it. Basically we want a place where anyone adding a new code that can appear on the RHS of an assignment must change already. Thoughts on a good location? I realize there's probably many other places that probably need these kinds of documentation back links, I'm not asking you to address all of them. Actually it shouldn't be so critical to meet some new RHS code in this switch. We may always say that we cannot find proper bounds and use default ones. 
I replaced gcc_unreachable with a warning about lost bounds and added a comment into tree.def. Would it be enough? +/* Compute and returne bounds for address of OBJ. */ s/returne/return + +/* Some code transformation made during instrumentation pass + may put code into inconsistent state. Here we find and fix + such flaws. */ +static void +chkp_fix_cfg () Presumably none of the code you're inserting that causes these problems is ever supposed to be executed on the non-fallthru edge? Else your creative method of hiding the abnormal nature of the edge for a period of time, then recreating it won't work. I'm a bit worried by this
[PATCH][match-and-simplify] Update texi documentation
This updates it with changed/added features. pdf-build checked and inspected, applied. Richard. 2014-10-14 Richard Biener rguent...@suse.de * doc/match-and-simplify.texi: Update. Index: gcc/doc/match-and-simplify.texi === --- gcc/doc/match-and-simplify.texi (revision 216146) +++ gcc/doc/match-and-simplify.texi (working copy) @@ -38,6 +38,8 @@ APIs are introduced. @deftypefnx {GIMPLE function} tree gimple_simplify (enum tree_code, tree, tree, tree, gimple_seq *, tree (*)(tree)) @deftypefnx {GIMPLE function} tree gimple_simplify (enum tree_code, tree, tree, tree, tree, gimple_seq *, tree (*)(tree)) @deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, gimple_seq *, tree (*)(tree)) +@deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, tree, gimple_seq *, tree (*)(tree)) +@deftypefnx {GIMPLE function} tree gimple_simplify (enum built_in_function, tree, tree, tree, gimple_seq *, tree (*)(tree)) The main GIMPLE API entry to the expression simplifications mimicing that of the GENERIC fold_@{unary,binary,ternary@} functions. @end deftypefn @@ -48,22 +50,27 @@ inserted on (if @code{NULL} then simplif are not performed) and a valueization hook that can be used to tie simplifications to a SSA lattice. -In addition to those APIs a fold_stmt-like interface is provided with +In addition to those APIs @code{fold_stmt} is overloaded with +a valueization hook: -@deftypefn bool gimple_simplify (gimple_stmt_iterator *, tree (*)(tree)); +@deftypefn bool fold_stmt (gimple_stmt_iterator *, tree (*)(tree)); @end deftypefn -which also has the additional valueization hook. 
Ontop of these a @code{fold_buildN}-like API for GIMPLE is introduced: -@deftypefn tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree (*valueize) (tree) = NULL); -@deftypefnx tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree (*valueize) (tree) = NULL); -@deftypefnx tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree, tree (*valueize) (tree) = NULL); -@deftypefnx tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefn {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum tree_code, tree, tree, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefnx {GIMPLE function} tree gimple_build (gimple_seq *, location_t, enum built_in_function, tree, tree, tree, tree (*valueize) (tree) = NULL); +@deftypefnx {GIMPLE function} tree gimple_convert (gimple_seq *, location_t, tree, tree); @end deftypefn -which is supposed to replace @code{force_gimple_operand (fold_buildN (...), ...)}. +which is supposed to replace @code{force_gimple_operand (fold_buildN (...), ...)} +and calls to @code{fold_convert}. Overloads without the @code{location_t} +argument exist. Built statements are inserted on the provided sequence +and simplification is performed using the optional valueization hook. @node The Language @@ -72,7 +79,7 @@ which is supposed to replace @code{force The language to write expression simplifications in resembles other domain-specific languages GCC uses. Thus it is lispy. 
Lets start -with an example from the match.pd file on the branch: +with an example from the match.pd file: @smallexample (simplify @@ -86,13 +93,14 @@ That contains at least two operands - an with the GIMPLE or GENERIC IL and a replacement expression that is returned if the match was successful. -Expressions have an ID, @code{bit_and} in this case. Expressions can +Expressions have an operator ID, @code{bit_and} in this case. Expressions can be lower-case tree codes with @code{_expr} stripped off or builtin function code names in all-caps, like @code{BUILT_IN_SQRT}. @code{@@n} denotes a so-called capture. It captures the operand and lets you refer to it in other places of the match-and-simplify. In the -above example it is refered to in the replacement expression. +above example it is refered to in the replacement expression. Captures +are @code{@@} followed by a number or an identifier. @smallexample (simplify @@ -103,7 +111,8 @@ above example it is refered to in the re In this example @code{@@0} is mentioned twice which constrains the matched expression to have two equal operands. This example also introduces operands written in C code. These can be used in the expression
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [6/n] Instrument calls and returns
On 13 Oct 14:49, Ilya Enkovich wrote: On 10 Oct 12:50, Jeff Law wrote: On 10/08/14 13:04, Ilya Enkovich wrote: Hi, This patch adds intrumentation of calls and returns into instrumentation pass. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_add_bounds_to_ret_stmt): New. (chkp_replace_address_check_builtin): New. (chkp_replace_extract_builtin): New. (chkp_find_bounds_for_elem): New. (chkp_add_bounds_to_call_stmt): New. (chkp_instrument_function): Instrument rets and calls. [ snip ] +/* Additionall we need to add bounds s/Additionall/Additionally/ OK with that nit fixed. jeff Here is a fixed version. Thanks, Ilya Here is a slightly modified version with no chkp_can_be_shared check before unshare_expr calls. Thanks, Ilya -- diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index c546d97..2ddd25f 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -1042,6 +1042,29 @@ chkp_get_registered_bounds (tree ptr) return slot ? *slot : NULL_TREE; } +/* Add bound retvals to return statement pointed by GSI. */ + +static void +chkp_add_bounds_to_ret_stmt (gimple_stmt_iterator *gsi) +{ + gimple ret = gsi_stmt (*gsi); + tree retval = gimple_return_retval (ret); + tree ret_decl = DECL_RESULT (cfun-decl); + tree bounds; + + if (!retval) +return; + + if (BOUNDED_P (ret_decl)) +{ + bounds = chkp_find_bounds (retval, gsi); + bounds = chkp_maybe_copy_and_register_bounds (ret_decl, bounds); + gimple_return_set_retbnd (ret, bounds); +} + + update_stmt (ret); +} + /* Force OP to be suitable for using as an argument for call. New statements (if any) go to SEQ. */ static tree @@ -1166,6 +1189,64 @@ chkp_check_mem_access (tree first, tree last, tree bounds, chkp_check_upper (last, bounds, iter, location, dirflag); } +/* Replace call to _bnd_chk_* pointed by GSI with + bndcu and bndcl calls. DIRFLAG determines whether + check is for read or write. 
*/ + +void +chkp_replace_address_check_builtin (gimple_stmt_iterator *gsi, + tree dirflag) +{ + gimple_stmt_iterator call_iter = *gsi; + gimple call = gsi_stmt (*gsi); + tree fndecl = gimple_call_fndecl (call); + tree addr = gimple_call_arg (call, 0); + tree bounds = chkp_find_bounds (addr, gsi); + + if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_LBOUNDS + || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_BOUNDS) +chkp_check_lower (addr, bounds, *gsi, gimple_location (call), dirflag); + + if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_UBOUNDS) +chkp_check_upper (addr, bounds, *gsi, gimple_location (call), dirflag); + + if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_CHECK_PTR_BOUNDS) +{ + tree size = gimple_call_arg (call, 1); + addr = fold_build_pointer_plus (addr, size); + addr = fold_build_pointer_plus_hwi (addr, -1); + chkp_check_upper (addr, bounds, *gsi, gimple_location (call), dirflag); +} + + gsi_remove (call_iter, true); +} + +/* Replace call to _bnd_get_ptr_* pointed by GSI with + corresponding bounds extract call. */ + +void +chkp_replace_extract_builtin (gimple_stmt_iterator *gsi) +{ + gimple call = gsi_stmt (*gsi); + tree fndecl = gimple_call_fndecl (call); + tree addr = gimple_call_arg (call, 0); + tree bounds = chkp_find_bounds (addr, gsi); + gimple extract; + + if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_GET_PTR_LBOUND) +fndecl = chkp_extract_lower_fndecl; + else if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CHKP_GET_PTR_UBOUND) +fndecl = chkp_extract_upper_fndecl; + else +gcc_unreachable (); + + extract = gimple_build_call (fndecl, 1, bounds); + gimple_call_set_lhs (extract, gimple_call_lhs (call)); + chkp_mark_stmt (extract); + + gsi_replace (gsi, extract, false); +} + /* Return COMPONENT_REF accessing FIELD in OBJ. 
*/ static tree chkp_build_component_ref (tree obj, tree field) @@ -1227,6 +1308,78 @@ chkp_build_array_ref (tree arr, tree etype, tree esize, return res; } +/* Helper function for chkp_add_bounds_to_call_stmt. + Fill ALL_BOUNDS output array with created bounds. + + OFFS is used for recursive calls and holds basic + offset of TYPE in outer structure in bits. + + ITER points a position where bounds are searched. + + ALL_BOUNDS[i] is filled with elem bounds if there + is a field in TYPE which has pointer type and offset + equal to i * POINTER_SIZE in bits. */ +static void +chkp_find_bounds_for_elem (tree elem, tree *all_bounds, + HOST_WIDE_INT offs, + gimple_stmt_iterator *iter) +{ + tree type = TREE_TYPE (elem); + + if (BOUNDED_TYPE_P (type)) +{ + if (!all_bounds[offs / POINTER_SIZE]) + { + tree temp = make_temp_ssa_name (type, gimple_build_nop (), ); + gimple assign = gimple_build_assign (temp, elem); +
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [3/n] Helper functions
On 14 Oct 01:13, Ilya Enkovich wrote: 2014-10-14 1:05 GMT+04:00 Jeff Law l...@redhat.com: Where does chkp_can_be_shared get used?Normally the thing to do would just be to call unshare_expr. It'll create copies as needed. If it's something that is supposed to be shared then it'll leave it alone. If you need to do something different than unshare_expr, then that needs deeper investigation as you're mucking around in the structure sharing assumptions and that's not to be done lightly. All its uses are like following: if (!chkp_can_be_shared (rhs1)) rhs1 = unshare_expr (rhs1); If unshare_expr avoids copies by itself then this check is useless and I should remove all its uses. Thanks, Ilya jeff Here is a version with no chkp_can_be_shared function. Patches having its uses were updated. Thanks, Ilya -- 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (assign_handler): New. (chkp_get_zero_bounds): New. (chkp_uintptr_type): New. (chkp_none_bounds_var): New. (entry_block): New. (zero_bounds): New. (none_bounds): New. (incomplete_bounds): New. (tmp_var): New. (size_tmp_var): New. (chkp_abnormal_copies): New. (chkp_invalid_bounds): New. (chkp_completed_bounds_set): New. (chkp_reg_bounds): New. (chkp_bound_vars): New. (chkp_reg_addr_bounds): New. (chkp_incomplete_bounds_map): New. (chkp_static_var_bounds): New. (in_chkp_pass): New. (CHKP_BOUND_TMP_NAME): New. (CHKP_SIZE_TMP_NAME): New. (CHKP_BOUNDS_OF_SYMBOL_PREFIX): New. (CHKP_STRING_BOUNDS_PREFIX): New. (CHKP_VAR_BOUNDS_PREFIX): New. (CHKP_NONE_BOUNDS_VAR_NAME): New. (chkp_get_tmp_var): New. (chkp_get_tmp_reg): New. (chkp_get_size_tmp_var): New. (chkp_register_addr_bounds): New. (chkp_get_registered_addr_bounds): New. (chkp_mark_completed_bounds): New. (chkp_completed_bounds): New. (chkp_erase_completed_bounds): New. (chkp_register_incomplete_bounds): New. (chkp_incomplete_bounds): New. (chkp_erase_incomplete_bounds): New. (chkp_mark_invalid_bounds): New. (chkp_valid_bounds): New. 
(chkp_mark_invalid_bounds_walker): New. (chkp_build_addr_expr): New. (chkp_get_entry_block): New. (chkp_get_bounds_var): New. (chkp_get_registered_bounds): New. (chkp_check_lower): New. (chkp_check_upper): New. (chkp_check_mem_access): New. (chkp_build_component_ref): New. (chkp_build_array_ref): New. (chkp_make_bounds): New. (chkp_get_none_bounds_var): New. (chkp_get_zero_bounds): New. (chkp_get_none_bounds): New. (chkp_get_invalid_op_bounds): New. (chkp_get_nonpointer_load_bounds): New. (chkp_get_next_bounds_parm): New. (chkp_build_bndldx): New. (chkp_make_static_bounds): New. (chkp_generate_extern_var_bounds): New. (chkp_intersect_bounds): New. (chkp_may_narrow_to_field): New. (chkp_narrow_bounds_for_field): New. (chkp_narrow_bounds_to_field): New. (chkp_walk_pointer_assignments): New. (chkp_init): New. * tree-chkp.h (chkp_get_none_bounds_var): New. (chkp_check_mem_access): New. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index eb7a8df..9245fa7 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -60,6 +60,10 @@ along with GCC; see the file COPYING3. If not see #include rtl.h /* For MEM_P, assign_temp. */ #include tree-dfa.h +typedef void (*assign_handler)(tree, tree, void *); + +static tree chkp_get_zero_bounds (); + #define chkp_bndldx_fndecl \ (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDLDX)) #define chkp_bndstx_fndecl \ @@ -83,11 +87,37 @@ along with GCC; see the file COPYING3. 
If not see #define chkp_extract_upper_fndecl \ (targetm.builtin_chkp_function (BUILT_IN_CHKP_EXTRACT_UPPER)) -static GTY (()) tree chkp_zero_bounds_var; +static GTY (()) tree chkp_uintptr_type; +static GTY (()) tree chkp_zero_bounds_var; +static GTY (()) tree chkp_none_bounds_var; + +static GTY (()) basic_block entry_block; +static GTY (()) tree zero_bounds; +static GTY (()) tree none_bounds; +static GTY (()) tree incomplete_bounds; +static GTY (()) tree tmp_var; +static GTY (()) tree size_tmp_var; +static GTY (()) bitmap chkp_abnormal_copies; + +struct hash_settree *chkp_invalid_bounds; +struct hash_settree *chkp_completed_bounds_set; +struct hash_maptree, tree *chkp_reg_bounds; +struct hash_maptree, tree *chkp_bound_vars; +struct hash_maptree, tree *chkp_reg_addr_bounds; +struct hash_maptree, tree *chkp_incomplete_bounds_map; struct hash_maptree, tree *chkp_bounds_map; +struct hash_maptree, tree *chkp_static_var_bounds; + +static bool in_chkp_pass; +#define CHKP_BOUND_TMP_NAME
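The point made at the top of this message, that a guard in front of unshare_expr is redundant if unsharing already leaves shareable nodes alone, can be sketched with a toy tree. This is a sketch under assumptions: Expr, unshare and guarded_unshare are hypothetical stand-ins, and the `shareable` flag only imitates the idea that some node kinds may legitimately appear in several trees.

```cpp
#include <memory>
#include <vector>

// Toy expression node; `shareable` stands in for node kinds that may
// legitimately be referenced from several trees.
struct Expr {
    bool shareable;
    std::vector<Expr*> kids;
};

// Owns every copy made below so the sketch does not leak.
static std::vector<std::unique_ptr<Expr>> arena;

// Deep-copies `e`, but returns shareable nodes unchanged.
Expr* unshare(Expr* e) {
    if (e->shareable)
        return e;
    arena.push_back(std::make_unique<Expr>(Expr{false, {}}));
    Expr* copy = arena.back().get();
    for (Expr* k : e->kids)
        copy->kids.push_back(unshare(k));
    return copy;
}

// The guarded form from the thread, in toy terms: it gives the same
// answer as calling unshare directly.
Expr* guarded_unshare(Expr* e) {
    return e->shareable ? e : unshare(e);
}
```

In this model the guard never changes the result, which is the reasoning behind dropping chkp_can_be_shared.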
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On 10/14/2014 09:25 AM, Richard Biener wrote: On Mon, 13 Oct 2014, Bernd Schmidt wrote: On 10/13/2014 12:33 PM, Ilya Verbin wrote: On 13 Oct 12:19, Jakub Jelinek wrote: But I'd like to understand why is this one needed. Why should the compilers care? Aggregates layout and alignment of integral/floating types must match between host and offload compilers, sure, but isn't that something streamed already in the LTO bytecode? Or is LTO streamer not streaming some types like long_type_node? It isn't, see the preload_common_nodes code. Something I'd like to get rid of at some point (but it's not 100% easy as backends for example compare va_list_type_node by pointer). Hmm, this is unfortunate - I was about to submit a patch not to stream that one since it can differ between host and offload target. I see one such comparison in i386.c - any others you are aware of? Should it be sufficient to just compare the TYPE_MAIN_VARIANT instead? Also, the backend needs to choose the right Pmode (and in the case of ptx, emit a directive about address sizes). Surely that will only be one problem with going the LTO way to handle the offloading ;) Another problem I mentioned above, beyond that I have a patch to use the $host-modes.def file to define machine modes - and that's essentially it. I'll be submitting these additional offloading patches for the case of different host and target once Ilya has committed the others. Bernd
[PATCH][match-and-simplify] Fix ICE
This fixes an ICE that occurs when valueization returns NULL and we are looking at a single-rhs REALPART_EXPR. We should check for is_gimple_min_invariant before valueization. Bootstrapped on x86_64-unknown-linux-gnu, applied. Richard. 2014-10-14 Richard Biener rguent...@suse.de * genmatch.c (dt_operand::gen_gimple_expr): Check for an invariant operand before valueizing it. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 216197) +++ gcc/genmatch.c (working copy) @@ -1566,8 +1566,8 @@ dt_operand::gen_gimple_expr (FILE *f) fprintf (f, tree %s = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), %i);\n, child_opname, i); fprintf (f, if ((TREE_CODE (%s) == SSA_NAME\n - (%s = do_valueize (valueize, %s)))\n || is_gimple_min_invariant (%s))\n + (%s = do_valueize (valueize, %s)))\n {\n, child_opname, child_opname, child_opname, child_opname); continue;
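A minimal sketch of why the order of the two checks matters, assuming the generated condition had the shape "(is SSA name and valueize succeeds) or is invariant" before the patch and "(is SSA name or is invariant) and valueize succeeds" after it. Operand, valueize_to_null and is_invariant are hypothetical stand-ins, not GCC's tree, do_valueize or is_gimple_min_invariant, and the toy predicate records a null argument instead of crashing.

```cpp
// Toy model: all names below are hypothetical.
struct Operand {
    bool is_ssa_name;
    bool is_invariant;
};

// A valueization hook that gives up on every operand.
Operand* valueize_to_null(Operand*) { return nullptr; }

// Stand-in for a predicate that must never be handed a null operand.
// It records the mistake rather than dereferencing the pointer.
static bool invariant_check_saw_null = false;
bool is_invariant(Operand* op) {
    if (!op) { invariant_check_saw_null = true; return false; }
    return op->is_invariant;
}

// Old order: valueize first. On failure `op` is already null when the
// invariant check runs.
bool accept_old(Operand* op) {
    return (op->is_ssa_name && (op = valueize_to_null(op))) || is_invariant(op);
}

// New order: classify the original operand, then valueize.
bool accept_new(Operand* op) {
    return (op->is_ssa_name || is_invariant(op)) && (op = valueize_to_null(op));
}

// Report whether each ordering hands a null operand to the predicate
// when valueization fails on an SSA name.
bool old_order_hands_null_to_check() {
    invariant_check_saw_null = false;
    Operand ssa{true, false};
    accept_old(&ssa);
    return invariant_check_saw_null;
}

bool new_order_hands_null_to_check() {
    invariant_check_saw_null = false;
    Operand ssa{true, false};
    accept_new(&ssa);
    return invariant_check_saw_null;
}
```

In the toy, only the old ordering reaches the predicate with a null operand, which matches the description of an ICE when valueization returns NULL.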
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 10:03:38AM -0600, Jeff Law wrote: Can you add PR markers to your changelog: PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. Unfortunately this broke bootstrap on x86_64/i686-linux, see http://gcc.gnu.org/PR63534 - pretty much everything with -m32 -fsplit-stack -fpic ICEs, -m32 -fpic -p results in wrong-code, and I see significant code quality regressions even on simple testcases. For the first two, I think (and have said before) that the current model of emitting set_got from a target hook during RA can't work, as there can be calls in the prologue, and the prologue is inserted before the set_got in that case. I really think the RA should in that case just tell the backend whether and in which register it wants to have the PIC register loaded upon start of the function, and it should be the prologue emission pass that arranges for that. As for the code quality, either some RA improvements are needed, or postreload must be able to fix it up, or hardreg propagation must (though cprop_hardreg is forward propagation rather than backwards, right?). Better before the prologue is emitted, though, because otherwise the prologue will save/restore the badly chosen hard reg too. Jakub
Re: [PATCH][match-and-simplify] Change back default behavior of fold_stmt
On Tue, 14 Oct 2014, Richard Biener wrote: This changes default behavior of fold_stmt back to _not_ following SSA use-def chains when trying to simplify things. I had to force that already for one caller and for the merge to trunk I'd rather not track down issues in every other existing caller. This means that fold_stmt will not become more powerful, at least for now. I still hope to get rid of its use of fold() during the merge process. Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu. (yeah, I'm preparing a first batch of changes to merge from the branch) Unfortunately this exposes an issue with combining our SSA propagators with pattern matching which makes us miscompile tree-vect-generic.c from VRP. Consider Visiting PHI node: i_137 = PHI 0(51), i_48(63) Argument #0 (51 - 52 executable) 0: [0, 0] Argument #1 (63 - 52 not executable) Found new range for i_137: [0, 0] ... i_48 = delta_25 + i_137; Found new range for i_48: VARYING _67 = (unsigned int) delta_25; Found new range for _67: [0, +INF] _78 = (unsigned int) i_48; Found new range for _78: [0, +INF] _257 = _78 - _67; (unsigned int) (delta_25 + i_137) - (unsigned int) delta_25 Match-and-simplified _78 - _67 to 0 Found new range for _257: [0, 0] now after i_137 is revisited and it becomes VARYING the SSA propagator stops at i_48 because its value does not change. Thus it fails to re-visit _257 where a pattern was applied that used the optimistic value of i_137 to its advantage. The following patch makes sure SSA propagators (CCP and VRP) do not get any benefit during their propagation phase from match-and-simplify by disabling the following of SSA use-def edges. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard.
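The arithmetic behind the dump can be checked in isolation. The sketch below, with a hypothetical function name, evaluates (unsigned)(delta + i) - (unsigned)delta; it performs the addition in unsigned arithmetic so that wraparound is well defined, which is an assumption of the sketch rather than something the dump states. The fold to 0 holds only while i is known to be 0; once i may be anything, the expression equals i.

```cpp
#include <cstdint>

// Evaluates (unsigned)(delta + i) - (unsigned)delta using unsigned
// arithmetic throughout, so overflow wraps instead of being undefined.
std::uint32_t folded_expr(std::int32_t delta, std::int32_t i) {
    std::uint32_t d = static_cast<std::uint32_t>(delta);
    std::uint32_t sum = d + static_cast<std::uint32_t>(i);
    return sum - d;
}
```

A propagator that recorded the result as [0, 0] while i_137 was optimistically [0, 0] therefore has to revisit it when i_137 goes to VARYING, which is the step the message says gets skipped.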
[PATCH, DWARF] re-init dw_frame_pointer_regnum between functions
Hello, ARM and Thumb modes use different hard_frame_pointer_regnum ABIs. The problem is that the dwarf2cfi.c:dw_frame_pointer_regnum cache is initialized only once per file, when creating the CIE. While testing the ARM target attribute to switch modes between functions, I got a few assertion failures with -g, because this value becomes inconsistent with the respective FDEs that have a different hard_frame_pointer_rtx. This snippet from dwarf2cfi.c illustrates the potential mismatch between hard_frame_pointer_rtx and a badly set CFA register: if (dest == hard_frame_pointer_rtx) ... cur_cfa->reg = dw_frame_pointer_regnum; ... I'm not aware of other targets that allow changing the frame_pointer_regnum ABI within a file, so the issue will only show up with the ARM target attribute. However, I'd very much like your feedback on this change before I send the remaining ARM parts. Tested manually for arm-none-eabi with gdb; unwinding and frame access seem OK when mixing modes. x86 bootstrap and regression tests are running. Many thanks, Christian 2014-09-23 Christian Bruel christian.br...@st.com * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each function. Index: dwarf2cfi.c === --- dwarf2cfi.c (revision 216146) +++ dwarf2cfi.c (working copy) @@ -2860,7 +2860,6 @@ dw_trace_info cie_trace; dw_stack_pointer_regnum = DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM); - dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM); memset (cie_trace, 0, sizeof (cie_trace)); cur_trace = cie_trace; @@ -2913,6 +2912,9 @@ static unsigned int execute_dwarf2_frame (void) { + /* Different HARD_FRAME_POINTER_REGNUM might coexist in the same file. */ + dw_frame_pointer_regnum = DWARF_FRAME_REGNUM (HARD_FRAME_POINTER_REGNUM); + /* The first time we're called, compute the incoming frame state. */ if (cie_cfi_vec == NULL) create_cie_data ();
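The stale-cache problem can be sketched with a toy model. Everything here is an assumption made for illustration: Mode, hard_frame_pointer_regnum, fp_regnum_used_for and the register numbers are hypothetical and do not reproduce dwarf2cfi.c. The sketch contrasts a cache filled once during per-file setup with one refreshed on every per-function call.

```cpp
// Toy model: names and register numbers are made up for illustration.
enum Mode { ARM_MODE, THUMB_MODE };

// Stand-in for a target value that depends on the current function's mode.
int hard_frame_pointer_regnum(Mode m) { return m == ARM_MODE ? 11 : 7; }

static int cached_fp_regnum = -1;
static bool file_state_ready = false;

// Per-function entry point. With `refresh_each_function` false, the cache
// is filled only during the one-time per-file setup. With true, it is
// refreshed on every call, which is the shape of the proposed change.
int fp_regnum_used_for(Mode m, bool refresh_each_function) {
    if (refresh_each_function)
        cached_fp_regnum = hard_frame_pointer_regnum(m);
    if (!file_state_ready) {
        if (!refresh_each_function)
            cached_fp_regnum = hard_frame_pointer_regnum(m);
        file_state_ready = true;
    }
    return cached_fp_regnum;
}

// Forget the per-file state, as if starting a new translation unit.
void reset_file_state() {
    cached_fp_regnum = -1;
    file_state_ready = false;
}
```

With the once-per-file cache, a Thumb function compiled after an ARM function still sees the ARM register number; refreshing per function removes the mismatch.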
[patch libstdc++]: Fix PR/59807
Hi, this patch fixes PR/59807, where the mutex is missing its destructor if non-function-call initialization is used. This issue was just reported for mingw-w64, as it is the only venture providing POSIX-threading-enabled toolchains (C++11). Nevertheless, the issue could happen for other native Windows toolchains too, so I adjusted the default mingw32 case as well. ChangeLog 2014-10-14 Kai Tietz kti...@redhat.com PR libstdc++/59807 * config/os/mingw32/os_defines.h (_GTHREAD_USE_MUTEX_INIT_FUNC): Define to avoid leak. * config/os/mingw32-w64/os_defines.h: Likewise. I am just testing bootstrap for it, and if successful, I will commit. Thanks, Kai Index: config/os/mingw32/os_defines.h === --- config/os/mingw32/os_defines.h(revision 216199) +++ config/os/mingw32/os_defines.h(working copy) @@ -75,4 +75,7 @@ #define _GLIBCXX_LLP64 1 #endif +// See libstdc++/59807 +#define _GTHREAD_USE_MUTEX_INIT_FUNC 1 + #endif Index: config/os/mingw32-w64/os_defines.h === --- config/os/mingw32-w64/os_defines.h(revision 216199) +++ config/os/mingw32-w64/os_defines.h(working copy) @@ -83,4 +83,7 @@ // their dtors are called #define _GLIBCXX_THREAD_ATEXIT_WIN32 1 +// See libstdc++/59807 +#define _GTHREAD_USE_MUTEX_INIT_FUNC 1 + #endif
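The leak this message describes can be illustrated with a toy resource counter. This is a sketch under assumptions, not libstdc++ or gthreads code: NativeMutex, StaticInitMutex and InitFuncMutex are hypothetical, and the sketch assumes that a statically initialized mutex allocates its underlying resource lazily on first use, so a wrapper without a destructor never releases it.

```cpp
// Toy model: these types are hypothetical and only count resources.
static int live_native_handles = 0;

struct NativeMutex { bool allocated = false; };

void native_lock(NativeMutex& m) {
    // Assumed for the sketch: a statically initialized mutex allocates
    // its real resource lazily, on first use.
    if (!m.allocated) { m.allocated = true; ++live_native_handles; }
}
void native_init(NativeMutex& m) { m.allocated = true; ++live_native_handles; }
void native_destroy(NativeMutex& m) {
    if (m.allocated) { m.allocated = false; --live_native_handles; }
}

// Static-initializer style: no init call and no destructor.
struct StaticInitMutex {
    NativeMutex m;
    void lock() { native_lock(m); }
};

// Init-function style: the constructor initializes, the destructor destroys.
struct InitFuncMutex {
    NativeMutex m;
    InitFuncMutex() { native_init(m); }
    ~InitFuncMutex() { native_destroy(m); }
    InitFuncMutex(const InitFuncMutex&) = delete;
    InitFuncMutex& operator=(const InitFuncMutex&) = delete;
    void lock() { native_lock(m); }
};

// Creates one mutex, locks it once, lets it go out of scope, and reports
// how many native handles are still alive afterwards.
template <class Mutex>
int handles_leaked_by_one_use() {
    int before = live_native_handles;
    { Mutex mu; mu.lock(); }
    return live_native_handles - before;
}
```

In the toy, only the init-function style releases the handle when the wrapper goes out of scope, which is the effect the added define is meant to select.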
Re: [PATCH][match-and-simplify] Change back default behavior of fold_stmt
On Tue, 14 Oct 2014, Richard Biener wrote: On Tue, 14 Oct 2014, Richard Biener wrote: This changes default behavior of fold_stmt back to _not_ following SSA use-def chains when trying to simplify things. I had to force that already for one caller and for the merge to trunk I'd rather not track down issues in every other existing caller. This means that fold_stmt will not become more powerful, at least for now. I still hope to get rid of its use of fold() during the merge process. Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu. (yeah, I'm preparing a first batch of changes to merge from the branch) Unfortunately this exposes an issue with combining our SSA propagators with pattern matching which makes us miscompile tree-vect-generic.c from VRP. Consider Visiting PHI node: i_137 = PHI 0(51), i_48(63) Argument #0 (51 - 52 executable) 0: [0, 0] Argument #1 (63 - 52 not executable) Found new range for i_137: [0, 0] ... i_48 = delta_25 + i_137; Found new range for i_48: VARYING _67 = (unsigned int) delta_25; Found new range for _67: [0, +INF] _78 = (unsigned int) i_48; Found new range for _78: [0, +INF] _257 = _78 - _67; (unsigned int) (delta_25 + i_137) - (unsigned int) delta_25 Match-and-simplified _78 - _67 to 0 Found new range for _257: [0, 0] now after i_137 is revisited and it becomes VARYING the SSA propagator stops at i_48 because its value does not change. Thus it fails to re-visit _257 where a pattern was applied that used the optimistic value of i_137 to its advantage. The following patch makes sure SSA propagators (CCP and VRP) do not get any benefit during their propagation phase from match-and-simplify by disabling the following of SSA use-def edges. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. And here is the patch. Richard. 2014-10-14 Richard Biener rguent...@suse.de * gimple-fold.h (no_follow_ssa_edges): Declare. 
(gimple_fold_stmt_to_constant_1): Add separate valueize hook for gimple_simplify, defaulted to no_follow_ssa_edges. * gimple-fold.c (fold_stmt): Make old API never follow SSA edges when simplifying. (no_follow_ssa_edges): New function. (gimple_fold_stmt_to_constant_1): Adjust. * tree-cfg.c (no_follow_ssa_edges): Remove. (replace_uses_by): Use plain fold_stmt again. * gimple-match-head.c (gimple_simplify): When simplifying a statement do not stop when valueizing its operands yields NULL. Index: gcc/gimple-fold.h === --- gcc/gimple-fold.h (revision 216146) +++ gcc/gimple-fold.h (working copy) @@ -32,7 +32,9 @@ extern tree maybe_fold_and_comparisons ( enum tree_code, tree, tree); extern tree maybe_fold_or_comparisons (enum tree_code, tree, tree, enum tree_code, tree, tree); -extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree)); +extern tree no_follow_ssa_edges (tree); +extern tree gimple_fold_stmt_to_constant_1 (gimple, tree (*) (tree), + tree (*) (tree) = no_follow_ssa_edges); extern tree gimple_fold_stmt_to_constant (gimple, tree (*) (tree)); extern tree fold_const_aggregate_ref_1 (tree, tree (*) (tree)); extern tree fold_const_aggregate_ref (tree); Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 216146) +++ gcc/gimple-fold.c (working copy) @@ -3136,6 +3136,14 @@ fail: return changed; } +/* Valueziation callback that ends up not following SSA edges. */ + +tree +no_follow_ssa_edges (tree) +{ + return NULL_TREE; +} + /* Fold the statement pointed to by GSI. In some cases, this function may replace the whole statement with a new one. Returns true iff folding makes any changes. 
@@ -3146,7 +3154,7 @@ fail: bool fold_stmt (gimple_stmt_iterator *gsi) { - return fold_stmt_1 (gsi, false, NULL); + return fold_stmt_1 (gsi, false, no_follow_ssa_edges); } bool @@ -3167,7 +3175,7 @@ bool fold_stmt_inplace (gimple_stmt_iterator *gsi) { gimple stmt = gsi_stmt (*gsi); - bool changed = fold_stmt_1 (gsi, true, NULL); + bool changed = fold_stmt_1 (gsi, true, no_follow_ssa_edges); gcc_assert (gsi_stmt (*gsi) == stmt); return changed; } @@ -4527,12 +4535,19 @@ gimple_fold_stmt_to_constant_2 (gimple s } } +/* ??? The SSA propagators do not correctly deal with following SSA use-def + edges if there are intermediate VARYING defs. For this reason + there are two valueization hooks here, one for the legacy code + in gimple_fold_stmt_to_constant_2 and one for gimple_simplify + which is defaulted to no_follow_ssa_edges. */ + tree -gimple_fold_stmt_to_constant_1 (gimple stmt, tree (*valueize) (tree)) +gimple_fold_stmt_to_constant_1 (gimple
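The arithmetic behind the miscompile can be checked outside the compiler. The sketch below mirrors the SSA names from the VRP dump in plain C (the function name and the input bounds are illustrative choices, not part of the patch): the expression for _257 always equals i_137 converted to unsigned, so folding it to 0 is only correct while i_137 really is 0.

```c
#include <assert.h>

/* Mirrors the dump:
     i_48 = delta_25 + i_137;
     _67  = (unsigned int) delta_25;
     _78  = (unsigned int) i_48;
     _257 = _78 - _67;
   The name and the bounds below are illustrative only. */
unsigned int value_of_257(int delta_25, int i_137)
{
    /* Keep delta_25 + i_137 free of signed overflow in this demo. */
    assert(i_137 >= 0 && i_137 < 1000);
    assert(delta_25 > -1000000 && delta_25 < 1000000);

    int i_48 = delta_25 + i_137;
    unsigned int v67 = (unsigned int) delta_25;
    unsigned int v78 = (unsigned int) i_48;
    return v78 - v67;   /* unsigned wrap-around makes this equal i_137 */
}
```

With the optimistic range [0, 0] for i_137 the result is 0, which is what match-and-simplify computed; once i_137 becomes VARYING the result is whatever i_137 holds, so the statement defining _257 has to be revisited.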
Re: [C++] Handle || ! for simd vectors
On 10/13/2014 03:45 PM, Marc Glisse wrote: Ping https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00361.html (sorry that my message looked like I had committed as obvious) Indeed. OK. :) On Sat, 4 Oct 2014, Marc Glisse wrote: On Thu, 2 Oct 2014, Jason Merrill wrote: OK. Thanks. While committing, I noticed that I restricted ! to integer vectors, whereas it seems to work just fine with scalar floats, so it would make sense to extend it to float vectors. Tested on x86_64-linux-gnu. 2014-10-04 Marc Glisse marc.gli...@inria.fr gcc/cp/ * typeck.c (cp_build_unary_op) [TRUTH_NOT_EXPR]: Accept float vectors. gcc/testsuite/ * g++.dg/ext/vector9.C: Test ! with float vectors.
Re: __intN patch 3/5: main __int128 - __intN conversion.
On 10/13/2014 04:54 PM, DJ Delorie wrote: This is what I ended up with for the test case. It was a bit tricky since it only works with msp430x (not msp430) and requires the gnu extensions. Is this OK? If so, is there anything else, or can I check the whole mess in yet? Go ahead. Jason
Re: [PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target
On 10 Oct 21:20, Ilya Enkovich wrote: 2014-10-10 20:45 GMT+04:00 Jeff Law l...@redhat.com: On 10/09/14 10:54, Uros Bizjak wrote: On Thu, Oct 9, 2014 at 4:07 PM, Ilya Enkovich enkovich@gmail.com wrote: It turned out that I changed the semantics of the BNDMK expansion when I replaced tree operations with RTL ones. Original code: + op1 = expand_normal (fold_build2 (PLUS_EXPR, TREE_TYPE (arg1), + arg1, integer_minus_one_node)); + op1 = force_reg (Pmode, op1); Modified code: + op1 = expand_normal (arg1); + + if (!register_operand (op1, Pmode)) + op1 = ix86_zero_extend_to_Pmode (op1); + + /* Builtin arg1 is size of block but instruction op1 should +be (size - 1). */ + op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, +op1, 1, OPTAB_DIRECT); The problem is that in the modified version we may change the value of the pseudo register into which arg1 is expanded, which means an incorrect value for all following uses of arg1. I didn't find it earlier because programs hit this bug surprisingly rarely. I made the following change to fix it: op1 = expand_simple_binop (Pmode, PLUS, op1, constm1_rtx, -op1, 1, OPTAB_DIRECT); +NULL, 1, OPTAB_DIRECT); A similar problem was also fixed for BNDNARROW. Does it look OK? I'm not aware of this type of limitation, and there are quite a few similar constructs in i386.c. It is hard to say without the testcase what happens; perhaps one of the RTX experts (CC'd) can advise what is recommended here. The problem is the call to expand_simple_binop. The source (op1) and the destination (op1) are obviously the same, so it's going to clobber whatever value is in there. If there are other uses of the original value of op1, then things aren't going to work. But I'm a little unclear how there'd be other later uses of that value. Perhaps Ilya could comment on that. op1 is the result of expand_normal called for an SSA name. Other uses of op1 come from expanding the uses of this SSA name in GIMPLE code. 
Regardless, unless there's a strong reason to do so, I'd generally recommend passing a NULL_RTX as the target for expansions so that you always get a new pseudo. Lots of optimizers in the RTL world work better if we don't have pseudos with multiple assignments. By passing NULL_RTX for the target we get that property more often. So a change like Ilya suggests (though I'd use NULL_RTX rather than NULL) makes sense. Will replace it with NULL_RTX. Thanks, Ilya Jeff Here is a version with NULL_RTX used instead of NULL. Thanks, Ilya -- 2014-10-14 Ilya Enkovich ilya.enkov...@intel.com * config/i386/i386-builtin-types.def (BND): New. (ULONG): New. (BND_FTYPE_PCVOID_ULONG): New. (VOID_FTYPE_BND_PCVOID): New. (VOID_FTYPE_PCVOID_PCVOID_BND): New. (BND_FTYPE_PCVOID_PCVOID): New. (BND_FTYPE_PCVOID): New. (BND_FTYPE_BND_BND): New. (PVOID_FTYPE_PVOID_PVOID_ULONG): New. (PVOID_FTYPE_PCVOID_BND_ULONG): New. (ULONG_FTYPE_VOID): New. (PVOID_FTYPE_BND): New. * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h. (ix86_builtins): Add IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. (builtin_isa): Add leaf_p and nothrow_p fields. (def_builtin): Initialize leaf_p and nothrow_p. (ix86_add_new_builtins): Handle leaf_p and nothrow_p flags. (bdesc_mpx): New. (bdesc_mpx_const): New. (ix86_init_mpx_builtins): New. (ix86_init_builtins): Call ix86_init_mpx_builtins. (ix86_emit_cmove): New. (ix86_emit_move_max): New. (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER. 
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 9161287..5421ba9 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -47,6 +47,7 @@ DEF_PRIMITIVE_TYPE (UCHAR, unsigned_char_type_node) DEF_PRIMITIVE_TYPE (QI, char_type_node) DEF_PRIMITIVE_TYPE (HI, intHI_type_node) DEF_PRIMITIVE_TYPE (SI, intSI_type_node) +DEF_PRIMITIVE_TYPE (BND, pointer_bounds_type_node) # ??? Logically this should be intDI_type_node, but that maps to long # with 64-bit, and that's not how the emmintrin.h is written. Again,
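The clobbering problem discussed in this thread can be illustrated with a toy model. This is a sketch only: toy_reg and toy_expand_add are hypothetical stand-ins for a pseudo register and for a binop expander that takes an optional target, and they are not GCC's RTL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pseudo register; not GCC's rtx. */
typedef struct { long value; } toy_reg;

static toy_reg scratch_pool[8];
static size_t next_scratch = 0;

/* Toy analogue of a binop expander: stores op + addend into TARGET,
   or into a fresh register when TARGET is NULL, and returns the
   register holding the result. */
toy_reg *toy_expand_add(toy_reg *op, long addend, toy_reg *target)
{
    if (target == NULL) {
        assert(next_scratch < 8);
        target = &scratch_pool[next_scratch++];
    }
    target->value = op->value + addend;
    return target;
}

/* What a later use of the original register sees when source and
   target are the same register. */
long later_use_after_in_place(long size)
{
    toy_reg arg1 = { size };
    toy_expand_add(&arg1, -1, &arg1);   /* arg1 is overwritten */
    return arg1.value;
}

/* What a later use sees when a fresh target is requested. */
long later_use_after_fresh_target(long size)
{
    toy_reg arg1 = { size };
    toy_reg *op1 = toy_expand_add(&arg1, -1, NULL);
    (void) op1;                         /* holds size - 1 */
    return arg1.value;
}
```

With a size of 16, the in-place variant leaves 15 in the register that later uses of arg1 still read, while the fresh-target variant leaves the original 16 intact and puts 15 in a new register.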
Re: [PATCH] support ggc hash_map and hash_set
On Tue, Sep 2, 2014 at 3:56 AM, tsaund...@mozilla.com wrote: From: Trevor Saunders tsaund...@mozilla.com Hi, There are still some issues to make this work really nicely, but this part is probably good enough that it's worth reviewing. For one thing, you can't use a ggc hash_map or hash_set in front ends with some types, or gengtype will decide to put the overloads of the marking routines it provides in a front-end file instead of the one it chose before, breaking other front ends. However, that seems to be an unrelated issue (you can trigger it without using hash_map/hash_set), so we might as well solve it separately. I had to have the entry marking functions for set delegate to the traits class because gcc 4.9.1 issues clearly bogus errors if you inline the code from the traits implementation. We may well want to make map work the same way at some point to enable some of the special GTY attributes like if_marked, but it doesn't seem to be necessary right now. Bootstrapped + regtested without regressions on x86_64-unknown-linux-gnu, ok? I have just noticed that this (ggc support for hash-table.h) makes it no longer suitable for use from generator programs (trying to merge from trunk on match-and-simplify). If you look at vec.h it has sophisticated guards to block out GGC support if GENERATOR_FILE is defined. Can you try to fix this please? Thanks, Richard. Trev gcc/ChangeLog: 2014-09-01 Trevor Saunders tsaund...@mozilla.com * alloc-pool.c: Include coretypes.h. * cgraph.h, dbxout.c, dwarf2out.c, except.c, except.h, function.c, function.h, symtab.c, tree-cfg.c, tree-eh.c: Use hash_map and hash_set instead of htab. * ggc-page.c (in_gc): New variable. (ggc_free): Do nothing if a collection is taking place. (ggc_collect): Set in_gc appropriately. * ggc.h (gt_ggc_mx(const char *)): New function. (gt_pch_nx(const char *)): Likewise. (gt_ggc_mx(int)): Likewise. (gt_pch_nx(int)): Likewise. * hash-map.h (hash_map::hash_entry::ggc_mx): Likewise. (hash_map::hash_entry::pch_nx): Likewise. 
(hash_map::hash_entry::pch_nx_helper): Likewise. (hash_map::hash_map): Adjust. (hash_map::create_ggc): New function. (gt_ggc_mx): Likewise. (gt_pch_nx): Likewise. * hash-set.h (default_hashset_traits::ggc_mx): Likewise. (default_hashset_traits::pch_nx): Likewise. (hash_set::hash_entry::ggc_mx): Likewise. (hash_set::hash_entry::pch_nx): Likewise. (hash_set::hash_entry::pch_nx_helper): Likewise. (hash_set::hash_set): Adjust. (hash_set::create_ggc): New function. (hash_set::elements): Likewise. (gt_ggc_mx): Likewise. (gt_pch_nx): Likewise. * hash-table.h (hash_table::hash_table): Adjust. (hash_table::m_ggc): New member. (hash_table::~hash_table): Adjust. (hash_table::expand): Likewise. (hash_table::empty): Likewise. (gt_ggc_mx): New function. (hashtab_entry_note_pointers): Likewise. (gt_pch_nx): Likewise. diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c index 0d31835..bfaa0e4 100644 --- a/gcc/alloc-pool.c +++ b/gcc/alloc-pool.c @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3. If not see #include config.h #include system.h +#include coretypes.h #include alloc-pool.h #include hash-table.h #include hash-map.h diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 879899c..030a1c7 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1604,7 +1604,6 @@ struct cgraph_2node_hook_list; /* Map from a symbol to initialization/finalization priorities. */ struct GTY(()) symbol_priority_map { - symtab_node *symbol; priority_type init; priority_type fini; }; @@ -1872,7 +1871,7 @@ public: htab_t GTY((param_is (symtab_node))) assembler_name_hash; /* Hash table used to hold init priorities. */ - htab_t GTY ((param_is (symbol_priority_map))) init_priority_hash; + hash_mapsymtab_node *, symbol_priority_map *init_priority_hash; FILE* GTY ((skip)) dump_file; diff --git a/gcc/dbxout.c b/gcc/dbxout.c index 946f1d1..d856bdd 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -2484,12 +2484,9 @@ dbxout_expand_expr (tree expr) /* Helper function for output_used_types. 
Queue one entry from the used types hash to be output. */ -static int -output_used_types_helper (void **slot, void *data) +bool +output_used_types_helper (tree const type, vectree *types_p) { - tree type = (tree) *slot; - vectree *types_p = (vectree *) data; - if ((TREE_CODE (type) == RECORD_TYPE || TREE_CODE (type) == UNION_TYPE || TREE_CODE (type) == QUAL_UNION_TYPE @@ -2502,7 +2499,7 @@ output_used_types_helper (void **slot, void *data) TREE_CODE (TYPE_NAME (type)) == TYPE_DECL) types_p-quick_push (TYPE_NAME (type)); - return 1; + return true; } /* This is a qsort callback which sorts types and declarations into a @@ -2544,8 +2541,9 @@ output_used_types
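The kind of GENERATOR_FILE guarding that Richard refers to can be sketched as follows. This shows only the general pattern, not the actual vec.h code: toy_table, toy_table_create and toy_gc_alloc are hypothetical names, and the real guards are more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Generator programs are built with GENERATOR_FILE defined, so the
   GC-only pieces are compiled out for them. */
#ifndef GENERATOR_FILE
#define TOY_HAVE_GC 1
/* Hypothetical stand-in for a garbage-collected allocator. */
void *toy_gc_alloc(size_t n) { return calloc(1, n); }
#else
#define TOY_HAVE_GC 0
#endif

typedef struct { int in_gc_memory; size_t n_slots; } toy_table;

/* Create a table; GC placement is only available outside generators. */
toy_table *toy_table_create(size_t n_slots, int want_gc)
{
    toy_table *t;
#if TOY_HAVE_GC
    if (want_gc) {
        t = toy_gc_alloc(sizeof *t);
        assert(t != NULL);
        t->in_gc_memory = 1;
        t->n_slots = n_slots;
        return t;
    }
#else
    assert(!want_gc);   /* generator programs must not ask for GC memory */
#endif
    t = malloc(sizeof *t);
    assert(t != NULL);
    t->in_gc_memory = 0;
    t->n_slots = n_slots;
    return t;
}
```

Compiled normally, both placements are available; compiled with -DGENERATOR_FILE, the GC allocator is not even declared, which is the property a header needs in order to be usable from generator programs.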
Re: [PATCH] Add D demangling support to libiberty
Hello Ian, libiberty/ChangeLog 2014-08-05 Iain Buclaw ibuc...@gdcproject.org * Makefile.in (CFILES): Add d-demangle.c. (REQUIRED_OFILES): Add d-demangle.o. * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case. (cplus_demangle): Likewise. * d-demangle.c: New file. * testsuite/Makefile.in (really-check): Add check-d-demangle. * testsuite/d-demangle-expected: New file. As hinted on gdb-patches, this patch causes a GDB build failure on Solaris 2.9, because it uses strtold, which is not available. According to gnulib's documentation, it should also break on the following systems: NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0, Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS. This patch attempts to fix the issue by adding a configure check for strtold and adjusting the code to use strtod if strtold does not exist. Does this look OK to you? If yes, can one of the GCC maintainers please review? libiberty/ChangeLog: * configure.ac: Add check for strtold's availability. * configure, config.in: Regenerate. * d-demangle.c [!HAVE_STRTOLD]: #define strtold to strtod. Thank you! -- Joel From 9e4d74607075ef857dfa4e118f43641494aaff90 Mon Sep 17 00:00:00 2001 From: Joel Brobecker brobec...@adacore.com Date: Tue, 14 Oct 2014 09:54:05 -0400 Subject: [PATCH] libiberty: fallback on strtod if strtold is not available. This patch fixes a build failure on Solaris 2.9, and all other systems that do not provide strtold. libiberty/ChangeLog: * configure.ac: Add check for strtold's availability. * configure, config.in: Regenerate. * d-demangle.c [!HAVE_STRTOLD]: #define strtold to strtod. 
--- libiberty/config.in| 3 +++ libiberty/configure| 2 +- libiberty/configure.ac | 2 +- libiberty/d-demangle.c | 3 +++ 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/libiberty/configure.ac b/libiberty/configure.ac index 3380819..da20a5f 100644 --- a/libiberty/configure.ac +++ b/libiberty/configure.ac @@ -401,7 +401,7 @@ if test x = y; then sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \ stpcpy stpncpy strcasecmp strchr strdup \ strerror strncasecmp strndup strnlen strrchr strsignal strstr strtod \ - strtol strtoul strverscmp sysconf sysctl sysmp \ + strtol strtold strtoul strverscmp sysconf sysctl sysmp \ table times tmpnam \ vasprintf vfprintf vprintf vsprintf \ wait3 wait4 waitpid) diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c index d31bf94..59de083 100644 --- a/libiberty/d-demangle.c +++ b/libiberty/d-demangle.c @@ -46,6 +46,9 @@ If not, see http://www.gnu.org/licenses/. */ extern long strtol (const char *nptr, char **endptr, int base); extern long double strtold (const char *nptr, char **endptr); #endif +#if !defined(HAVE_STRTOLD) +#define strtold strtod +#endif #include demangle.h #include libiberty.h diff --git a/libiberty/config.in b/libiberty/config.in index 1cf9c11..8c5f0b6 100644 --- a/libiberty/config.in +++ b/libiberty/config.in @@ -280,6 +280,9 @@ /* Define to 1 if you have the `strtol' function. */ #undef HAVE_STRTOL +/* Define to 1 if you have the `strtold' function. */ +#undef HAVE_STRTOLD + /* Define to 1 if you have the `strtoul' function. 
*/ #undef HAVE_STRTOUL diff --git a/libiberty/configure b/libiberty/configure index 96feaed..072b03b 100755 --- a/libiberty/configure +++ b/libiberty/configure @@ -5423,7 +5423,7 @@ if test x = y; then sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \ stpcpy stpncpy strcasecmp strchr strdup \ strerror strncasecmp strndup strnlen strrchr strsignal strstr strtod \ - strtol strtoul strverscmp sysconf sysctl sysmp \ + strtol strtold strtoul strverscmp sysconf sysctl sysmp \ table times tmpnam \ vasprintf vfprintf vprintf vsprintf \ wait3 wait4 waitpid -- 1.9.1
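The fallback in d-demangle.c can be tried in isolation with a small sketch. Here HAVE_STRTOLD is deliberately left undefined so that the strtod path is the one compiled (in libiberty it would come from config.h), and parse_real is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* HAVE_STRTOLD normally comes from config.h; it is left undefined
   here so that the fallback is the path being exercised. */
#if !defined(HAVE_STRTOLD)
#define strtold strtod
#endif

/* Hypothetical helper: parse a leading real number from TEXT and
   report where parsing stopped. */
long double parse_real(const char *text, const char **rest)
{
    char *end;
    long double value;

    assert(text != NULL && rest != NULL);
    value = strtold(text, &end);   /* expands to strtod in this build */
    *rest = end;
    return value;
}
```

The call site is unchanged either way; the cost of the fallback is that the parsed value only carries double precision on hosts where long double is wider.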
Re: New rematerialization sub-pass in LRA
On 10/13/2014 12:24 PM, Wilco Dijkstra wrote: Here is a new rematerialization sub-pass of LRA. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass generates smaller code on average on both architectures (although the improvement is not significant), adds 0.4% compilation time in -O2 mode of a release GCC (according to the user time of compiling a 500K-line Fortran program and the valgrind lackey instruction count for compiling combine.i) and about 0.7% in -O0 mode. As for performance, the best result I found is a 1% SPECFP2000 improvement on an ARM Exynos 5410 (973 vs 963), but on Intel Haswell the results are practically the same (Haswell has a very good, sophisticated memory sub-system). I ran SPEC2k on AArch64, and EON fails to run correctly with -fno-caller-saves -mcpu=cortex-a57 -fomit-frame-pointer -Ofast. I'm not sure whether this is AArch64 specific, but previously non-optimal register allocation choices triggered a latent bug in ree (it's unclear why GCC still allocates FP registers in high-pressure integer code, as I set the costs for int-FP moves high). On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and SPECFP is ~0.2% faster. Thanks for reporting this. It is important for me as I have no aarch64 machine for benchmarking. The perlbmk performance degradation is too big and I'll definitely look at this problem. Generally I think it is good to have a specific pass for rematerialization. However, should this not also affect the costs of instructions that can be cheaply rematerialized? Similarly for the choice whether to caller save or spill (today the caller-save code doesn't care at all about rematerialization, so it aggressively caller-saves values which could be rematerialized - see e.g. https://gcc.gnu.org/ml/gcc/2014-09/msg00071.html). I wanted to address the cost issues later, but I guess the perlbmk performance problem might be solved by this, so I'll start working on it. 
The rematerialization pass can fix the caller-saves code if we add processing of move insns too. So that could be another project to improve rematerialization. Thanks for pointing this out. Also I am confused by the claim that memory reads are not profitable to rematerialize. Surely rematerializing a memory read from const-data or the literal pool is cheaper than spilling, as you avoid a store to the stack? Most such cases are covered by cfg-insensitive rematerialization, but I guess there are cfg-sensitive cases. I should try this too. Wilco, thanks for the very informative email with three ideas to improve the rematerialization. As I wrote, the patch is an initial implementation of the rematerialization, and the infrastructure, with modifications, will be able to handle these and other improvements. Most importantly, we have the infrastructure in the right place now,
[PATCH][match-and-simplify] Remove/revert unneeded changes
This removes duplicated and unneeded code from generic-match-head.c and removes integral_op_p (if needed, such new predicates should go to tree.h). It also reverts one unnecessary Makefile.in change. Applied. Richard. 2014-10-14 Richard Biener rguent...@suse.de * Makefile.in (BUILD_RTL): Revert not needed change. * match.pd (integral_op_p): Remove predicate and use. * generic-match-head.c: Include gimple-match.h and remove all code. * gimple-match-head.c (integral_op_p): Remove. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 216146) +++ gcc/Makefile.in (working copy) @@ -1032,7 +1032,7 @@ BUILD_LIBS = $(BUILD_LIBIBERTY) BUILD_RTL = build/rtl.o build/read-rtl.o build/ggc-none.o \ build/vec.o build/min-insn-modes.o build/gensupport.o \ - build/print-rtl.o build/hash-table.o + build/print-rtl.o BUILD_MD = build/read-md.o BUILD_ERRORS = build/errors.o Index: gcc/match.pd === --- gcc/match.pd(revision 216146) +++ gcc/match.pd(working copy) @@ -24,7 +24,6 @@ along with GCC; see the file COPYING3. /* Generic tree predicates we inherit. */ (define_predicates - integral_op_p integer_onep integer_zerop integer_all_onesp real_zerop real_onep CONSTANT_CLASS_P) @@ -132,8 +131,9 @@ (define_predicates /* fold_negate_exprs convert - (~A) to A + 1. */ (simplify - (negate (bit_not integral_op_p@0)) - (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } )) + (negate (bit_not @0)) + (if (INTEGRAL_TYPE_P (type)) + (plus @0 { build_int_cst (TREE_TYPE (@0), 1); } ))) /* One ternary pattern. */ Index: gcc/generic-match-head.c === --- gcc/generic-match-head.c(revision 216146) +++ gcc/generic-match-head.c(working copy) @@ -41,37 +41,6 @@ along with GCC; see the file COPYING3. 
#include tree-phinodes.h #include ssa-iterators.h #include dumpfile.h +#include gimple-match.h -#define INTEGER_CST_P(node) (TREE_CODE(node) == INTEGER_CST) -#define integral_op_p(node) INTEGRAL_TYPE_P(TREE_TYPE(node)) -#define REAL_CST_P(node) (TREE_CODE(node) == REAL_CST) - -/* Helper to transparently allow tree codes and builtin function codes - exist in one storage entity. */ -class code_helper -{ -public: - code_helper () {} - code_helper (tree_code code) : rep ((int) code) {} - code_helper (built_in_function fn) : rep (-(int) fn) {} - operator tree_code () const { return (tree_code) rep; } - operator built_in_function () const { return (built_in_function) -rep; } - bool is_tree_code () const { return rep 0; } - bool is_fn_code () const { return rep 0; } -private: - int rep; -}; - - -/* Return whether T is a constant that we'll dispatch to fold to - evaluate fully constant expressions. */ - -static inline bool -constant_for_folding (tree t) -{ - return (CONSTANT_CLASS_P (t) - /* The following is only interesting to string builtins. */ - || (TREE_CODE (t) == ADDR_EXPR - TREE_CODE (TREE_OPERAND (t, 0)) == STRING_CST)); -} Index: gcc/gimple-match-head.c === --- gcc/gimple-match-head.c (revision 216146) +++ gcc/gimple-match-head.c (working copy) @@ -43,8 +43,6 @@ along with GCC; see the file COPYING3. #include dumpfile.h #include gimple-match.h -#define integral_op_p(node) INTEGRAL_TYPE_P(TREE_TYPE(node)) - /* Forward declarations of the private auto-generated matchers. They expect valueized operands in canonical order and do not
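The identity behind the fold_negate_exprs pattern in match.pd, -(~A) to A + 1, is easy to check for integral values. The sketch below uses unsigned int so that the wrap-around cases are well defined; the function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* Left-hand side of the pattern: (negate (bit_not @0)). */
unsigned int negate_of_bit_not(unsigned int a)
{
    return -(~a);
}

/* Right-hand side of the pattern: (plus @0 1). */
unsigned int plus_one(unsigned int a)
{
    return a + 1u;
}

/* Both sides agree for every unsigned value, including the wrapping
   case a == UINT_MAX. */
void check_identity(unsigned int a)
{
    assert(negate_of_bit_not(a) == plus_one(a));
}
```

The identity follows from two's complement arithmetic, where ~a equals -a - 1, which is also why the pattern is restricted to integral types.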
Re: [PATCH] Add D demangling support to libiberty
On Tue, Oct 14, 2014 at 7:12 AM, Joel Brobecker brobec...@adacore.com wrote: libiberty/ChangeLog 2014-08-05 Iain Buclaw ibuc...@gdcproject.org * Makefile.in (CFILES): Add d-demangle.c. (REQUIRED_OFILES): Add d-demangle.o. * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case. (cplus_demangle): Likewise. * d-demangle.c: New file. * testsuite/Makefile.in (really-check): Add check-d-demangle. * testsuite/d-demangle-expected: New file. As hinted on gdb-patches, this patch causes a GDB build failure on Solaris 2.9, because it uses strtold which is not available. According to gnulib's documentation, it should also break on the following systems: NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0, Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS. This patch attempts to fix the issue by adding a configure check for strtold and adjusts the code to use strtod if strtold does not exist. Does this look OK to you? If yes, can one of the GCC maintainers please review? It doesn't make sense to me to use strtod if strtold is required. And if strtold is not required, then it seems to me that we should always use strtod. It seems to me that the right options are either 1) use strtod unconditionally; 2) add strtold to libiberty Since option 1 is simpler, what bad things would happen if we use strtod unconditionally? Ian
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 11:49 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros. This caused a regression: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 Another bootstrap failure: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63536 -- H.J.
Re: [PATCH] AutoFDO patch for trunk
Index: gcc/cgraphclones.c === --- gcc/cgraphclones.c(revision 215826) +++ gcc/cgraphclones.c(working copy) @@ -453,6 +453,11 @@ } else count_scale = 0; + /* In AutoFDO, if edge count is larger than callee's entry block + count, we will not update the original callee because it may + mistakenly mark some hot function as cold. */ + if (flag_auto_profile gcov_count = count) +update_original = false; lets drop this from initial patch. Index: gcc/bb-reorder.c === --- gcc/bb-reorder.c (revision 215826) +++ gcc/bb-reorder.c (working copy) @@ -1569,15 +1569,14 @@ /* Mark which partition (hot/cold) each basic block belongs in. */ FOR_EACH_BB_FN (bb, cfun) { - bool cold_bb = false; + bool cold_bb = probably_never_executed_bb_p (cfun, bb); and this too (basically all the tweaks should IMO go in independently and ideally in a way that does not need flag_auto_profile test). +/* Return true if BB contains indirect call. */ + +static bool +has_indirect_call (basic_block bb) +{ + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + if (gimple_code (stmt) == GIMPLE_CALL +(gimple_call_fn (stmt) == NULL + || TREE_CODE (gimple_call_fn (stmt)) != FUNCTION_DECL)) You probably want to skip gimple_call_internal_p calls here. + +/* From AutoFDO profiles, find values inside STMT for that we want to measure + histograms for indirect-call optimization. 
*/ + +static void +afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map map, + bool transform) +{ + gimple stmt = gsi_stmt (*gsi); + tree callee; + + if (map.size() == 0 || gimple_code (stmt) != GIMPLE_CALL + || gimple_call_fndecl (stmt) != NULL_TREE) +return; + + callee = gimple_call_fn (stmt); + + histogram_value hist = gimple_alloc_histogram_value ( + cfun, HIST_TYPE_INDIR_CALL, stmt, callee); + hist->n_counters = 3; + hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); + gimple_add_histogram_value (cfun, stmt, hist); + + gcov_type total = 0; + icall_target_map::const_iterator max_iter = map.end(); + + for (icall_target_map::const_iterator iter = map.begin(); + iter != map.end(); ++iter) +{ + total += iter->second; + if (max_iter == map.end() || max_iter->second < iter->second) + max_iter = iter; +} + + hist->hvalue.counters[0] = (unsigned long long) + afdo_string_table->get_name (max_iter->first); + hist->hvalue.counters[1] = max_iter->second; + hist->hvalue.counters[2] = total; + + if (!transform) +return; + + if (gimple_ic_transform (gsi)) +{ + struct cgraph_edge *indirect_edge = + cgraph_node::get (current_function_decl)->get_edge (stmt); + struct cgraph_node *direct_call = + find_func_by_profile_id ((int)hist->hvalue.counters[0]); + if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL) + return; + struct cgraph_edge *new_edge = + indirect_edge->make_speculative (direct_call, 0, 0); + new_edge->redirect_call_stmt_to_callee (); + gimple_remove_histogram_value (cfun, stmt, hist); + inline_call (new_edge, true, NULL, NULL, false); + return; +} + return; Is it necessary to go via the histogram and gimple_ic_transform here? I would expect that all you need is to make the speculative edge and inline it (bypassing the work of producing a fake histogram value and calling gimple_ic_transform on it). Also, it seems to me that you want to set the direct_count and frequency arguments of make_speculative so the resulting function profile is not off. 
The rest of the interfaces seems quite sane now. Can you please look into using speculative edges directly instead of hooking into the vpt infrastructure, and fix the formatting issues of the new pass? I will try to make another pass over the actual streaming logic, which I find a bit difficult to read, but I quite trust you that it does the right thing ;) Honza
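The selection step in afdo_indirect_call (pick the hottest target and sum all counts) can be sketched on its own, independently of histograms or call-graph edges. The types and names below are hypothetical and use a plain array where the patch uses icall_target_map.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for one entry of the target map and for the
   values the patch stores in the three counters. */
typedef struct { unsigned id; long long count; } toy_target;
typedef struct {
    unsigned hottest_id;
    long long hottest_count;
    long long total;
} toy_summary;

/* Find the most frequent call target and the total number of calls. */
toy_summary summarize_targets(const toy_target *targets, size_t n)
{
    toy_summary s;
    size_t i;

    assert(targets != NULL && n > 0);   /* the patch also bails out on an empty map */
    s.hottest_id = targets[0].id;
    s.hottest_count = targets[0].count;
    s.total = 0;

    for (i = 0; i < n; ++i) {
        s.total += targets[i].count;
        if (targets[i].count > s.hottest_count) {
            s.hottest_id = targets[i].id;
            s.hottest_count = targets[i].count;
        }
    }
    return s;
}
```

The hottest count and the total are the natural candidates for the direct_count and frequency information that Honza suggests passing to make_speculative, although how exactly they map onto its arguments is not settled in this thread.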
[PATCH x86, pr63534] Fix go bootstrap
Hi, Bootstrap with --enable-languages=c,c++,fortran,lto,go passed. Make check is in progress. Is it OK? ChangeLog 2014-10-14 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_expand_split_stack_prologue): Make __morestack calls local. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a3ca2ed..5117572 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -11999,7 +11999,10 @@ ix86_expand_split_stack_prologue (void) REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100); if (split_stack_fn == NULL_RTX) -split_stack_fn = gen_rtx_SYMBOL_REF (Pmode, __morestack); +{ + split_stack_fn = gen_rtx_SYMBOL_REF (Pmode, __morestack); + SYMBOL_REF_FLAGS (split_stack_fn) |= SYMBOL_FLAG_LOCAL; +} fn = split_stack_fn; /* Get more stack space. We pass in the desired stack space and the @@ -12044,9 +12047,11 @@ ix86_expand_split_stack_prologue (void) gcc_assert ((args_size 0x) == args_size); if (split_stack_fn_large == NULL_RTX) - split_stack_fn_large = - gen_rtx_SYMBOL_REF (Pmode, __morestack_large_model); - + { + split_stack_fn_large = + gen_rtx_SYMBOL_REF (Pmode, __morestack_large_model); + SYMBOL_REF_FLAGS (split_stack_fn_large) |= SYMBOL_FLAG_LOCAL; + } if (ix86_cmodel == CM_LARGE_PIC) { rtx_code_label *label;
Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2))
On Mon, 2014-10-13 at 13:45 -0400, David Malcolm wrote: I'd like to merge the JIT branch into trunk: https://gcc.gnu.org/wiki/JIT This is v2 since it incorporates fixes for the various issues identified by Joseph in an earlier submission: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02056.html I've split up the current diff between trunk and the branch into 5 areas for ease of review (and to allow for early merger of the supporting work, if it's deemed ready): patch 1: exposes an entrypoint in libiberty that I need patch 2: configure and Makefile changes in gcc patch 3: timevar.h: Add an auto_timevar class patch 4: State cleanups in gcc patch 5: Add the jit code itself [this is a diff of trunk r215958 aka e012cdc775868e9922f5fef9068a764546876d93 which is from 2014-10-06, vs jit branch version 75b3ee7acdc6de55354d65bb7d619386463e50a1]. I've successfully bootstrapped and regression-tested the cumulative result of all of the patches against a control build, building them both with --enable-host-shared, and with --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto adding ,jit to the test build (both on x86_64-unknown-linux-gnu; Fedora 20). There were no regressions vs the control build, and the patched build gains a jit.sum, with 4663 passes (and no failures). OK for trunk? Patch 5 seems to have been too large, even compressed, so I'm breaking it up into separate pieces and compressing, giving 10 patches in total Patches 1-4 are as above. Patch 5: remaining JIT-related changes outside of the gcc/jit/ subdir Patch 6: the core of the JIT implementation: the gcc/jit subdir Patch 7: the testsuite: gcc/testsuite/jit.dg Patch 8: sphinx-based documentation: the gcc/jit/docs subdir Patch 9: texinfo documentation autogenerated from the sphinx sources. Patch 10: the ChangeLog.jit logs from the branch.
[PATCH 05/10] JIT-related changes outside of jit subdir
ChangeLog:
        * MAINTAINERS (Various Maintainers): Add myself as jit
        maintainer.

contrib/ChangeLog:
        * jit-coverage-report.py: New file: a script to print crude
        code-coverage information for the libgccjit API.

gcc/ChangeLog:
        * doc/install.texi (--enable-host-shared): Specify that this is
        required when building libgccjit.
        * timevar.def (TV_JIT_REPLAY): New.
        (TV_ASSEMBLE): New.
        (TV_LINK): New.
        (TV_LOAD): New.
---
 MAINTAINERS                    |  1 +
 contrib/jit-coverage-report.py | 67 ++
 gcc/doc/install.texi           |  2 +-
 gcc/timevar.def                |  6
 4 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 contrib/jit-coverage-report.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dca84e..1fa679e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -260,6 +260,7 @@ testsuite           Janis Johnson           <jani...@codesourcery.com>
 register allocation    Vladimir Makarov        <vmaka...@redhat.com>
 gdbhooks.py            David Malcolm           <dmalc...@redhat.com>
 SLSR                   Bill Schmidt            <wschm...@linux.vnet.ibm.com>
+jit                    David Malcolm           <dmalc...@redhat.com>
 
 Note that individuals who maintain parts of the compiler need approval to
 check in changes outside of the parts of the compiler they maintain.
diff --git a/contrib/jit-coverage-report.py b/contrib/jit-coverage-report.py
new file mode 100644
index 000..529336f
--- /dev/null
+++ b/contrib/jit-coverage-report.py
@@ -0,0 +1,67 @@
+#! /usr/bin/python
+#
+# Print a report on which libgccjit.so symbols are used in which test
+# cases, and which lack test coverage.  Tested with Python 2.7 and 3.2
+# To be run from the root directory of the source tree.
+#
+# Copyright (C) 2014 Free Software Foundation, Inc.
+# Written by David Malcolm <dmalc...@redhat.com>.
+#
+# This script is Free Software, and it can be copied, distributed and
+# modified as defined in the GNU General Public License.  A copy of
+# its license can be downloaded from http://www.gnu.org/copyleft/gpl.html
+
+from collections import Counter
+import glob
+import re
+import sys
+
+def parse_map_file(path):
+    """
+    Parse libgccjit.map, returning the symbols in the API as a list of str.
+    """
+    syms = []
+    with open(path) as f:
+        for line in f:
+            m = re.match('^\s+([a-z_]+);$', line)
+            if m:
+                syms.append(m.group(1))
+    return syms
+
+def parse_test_case(path):
+    """
+    Locate all symbol-like things in a C test case, yielding
+    them as a sequence of str.
+    """
+    with open(path) as f:
+        for line in f:
+            for m in re.finditer('([_A-Za-z][_A-Za-z0-9]*)', line):
+                yield m.group(1)
+
+def find_test_cases():
+    for path in glob.glob('gcc/testsuite/jit.dg/*.[ch]'):
+        yield path
+
+api_syms = parse_map_file('gcc/jit/libgccjit.map')
+
+syms_in_test_cases = {}
+for path in find_test_cases():
+    syms_in_test_cases[path] = list(parse_test_case(path))
+
+uses = Counter()
+for sym in sorted(api_syms):
+    print('symbol: %s' % sym)
+    uses[sym] = 0
+    for path in syms_in_test_cases:
+        count = syms_in_test_cases[path].count(sym)
+        uses[sym] += count
+        if count:
+            print('  uses in %s: %i' % (path, count))
+    if uses[sym] == 0:
+        print('  NEVER USED')
+    sys.stdout.write('\n')
+
+layout = '%40s  %5s  %s'
+print(layout % ('SYMBOL', 'USES', 'HISTOGRAM'))
+for sym, count in uses.most_common():
+    print(layout % (sym, count, '*' * count if count else 'UNUSED'))
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 75ac9a6..c92de28 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -954,7 +954,7 @@ Specify that the @emph{host} code should be built into position-independent
 machine code (with -fPIC), allowing it to be used within shared libraries,
 but yielding a slightly slower compiler.
 
-Currently this option is only of use to people developing GCC itself.
+This option is required when building the libgccjit.so library.
 
 Contrast with @option{--enable-shared}, which affects @emph{target}
 libraries.
diff --git a/gcc/timevar.def b/gcc/timevar.def
index a04d05c..b406c16 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -277,3 +277,9 @@ DEFTIMEVAR (TV_VERIFY_LOOP_CLOSED, "verify loop closed")
 DEFTIMEVAR (TV_VERIFY_RTL_SHARING, "verify RTL sharing")
 DEFTIMEVAR (TV_REBUILD_FREQUENCIES   , "rebuild frequencies")
 DEFTIMEVAR (TV_REPAIR_LOOPS         , "repair loop structures")
+
+/* Stuff used by libgccjit.so.  */
+DEFTIMEVAR (TV_JIT_REPLAY           , "replay of JIT client activity")
+DEFTIMEVAR (TV_ASSEMBLE             , "assemble JIT code")
+DEFTIMEVAR (TV_LINK                 , "link JIT code")
+DEFTIMEVAR (TV_LOAD                 , "load JIT result")
-- 
1.8.5.3
[PATCH 06/10] Heart of the JIT implementation (was: Re: [PATCH 0/5] Merger of jit branch (v2))
On Tue, 2014-10-14 at 11:09 -0400, David Malcolm wrote:
> [...]
> Patch 6: the core of the JIT implementation: the gcc/jit subdir
This commit adds the gcc/jit subdirectory, implementing the library,
which looks like a frontend named jit from the POV of the rest of the
gcc code.

gcc/jit/ChangeLog:
        * Make-lang.in: New.
        * TODO.rst: New.
        * config-lang.in: New.
        * dummy-frontend.c: New.
        * jit-builtins.c: New.
        * jit-builtins.h: New.
        * jit-common.h: New.
        * jit-playback.c: New.
        * jit-playback.h: New.
        * jit-recording.c: New.
        * jit-recording.h: New.
        * libgccjit++.h: New.
        * libgccjit.c: New.
        * libgccjit.h: New.
        * libgccjit.map: New.
        * libgccjit.pc.in: New.
        * notes.txt: New.

Attachment: 0006-Heart-of-the-JIT-implementation.patch.gz
Description: GNU Zip compressed data
[PATCH 07/10] Testsuite for the JIT (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))
On Tue, 2014-10-14 at 11:09 -0400, David Malcolm wrote:
> [...]
> Patch 7: the testsuite: gcc/testsuite/jit.dg

Here's patch 7, the testsuite.

Attachment: 0007-Testsuite-for-the-JIT.patch.gz
Description: GNU Zip compressed data
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [16/n] Reduce bounds lifetime
On 09 Oct 11:32, Jeff Law wrote:
> On 10/08/14 13:24, Ilya Enkovich wrote:
> > Hi,
> >
> > This patch adds a bounds lifetime reduction into checker optimization.
> >
> > Thanks,
> > Ilya
> > --
> > 2014-10-08  Ilya Enkovich  <ilya.enkov...@intel.com>
> >
> >         * tree-chkp.c (chkp_reduce_bounds_lifetime): New.
> >         (chkp_opt_execute): Run bounds lifetime reduction
> >         algorithm.
>
> Basic tests pull into a file with the other optimization work.
>
> How expensive is nearest_common_dominator?  Would it make more sense
> to use something like the concept of an anticipated expression from
> LCM?

nearest_common_dominator searches for the nearest common ancestor in a
tree, so I expect it to be no more expensive than O(h), where h is the
height of the tree.  I suppose LCM would be more efficient when many
bounds are processed and they have many uses.  But this optimization
is only for INIT bounds, NULL bounds and bounds for statically
allocated objects, so their usage is quite limited.

> > +  /* Check we do not increase other values lifetime.  */
> > +  FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
> > +    {
> > +      op = USE_FROM_PTR (use_p);
> > +
> > +      if (TREE_CODE (op) == SSA_NAME
> > +          && gimple_code (SSA_NAME_DEF_STMT (op)) != GIMPLE_NOP)
> > +        deps = true;
>
> Might as well break out of the FOR_EACH_PHI_OR_STMT_USE loop here.
> Note that some of our iterators have special mechanisms to break out
> of the loop, but my recollection is those are for the immediate use
> iterators to ensure the marker is removed.
>
> Code is probably OK if LCM/anticipated isn't reasonable and the above
> issues are dealt with.
>
> jeff

Here is an updated version with break and testcase added.

Thanks,
Ilya
--
gcc/

2014-10-14  Ilya Enkovich  <ilya.enkov...@intel.com>

        * tree-chkp-opt.c (chkp_reduce_bounds_lifetime): New.
        (chkp_opt_execute): Run bounds lifetime reduction algorithm.

gcc/testsuite/

2014-10-14  Ilya Enkovich  <ilya.enkov...@intel.com>

        * gcc.target/i386/chkp-lifetime-1.c: New.
diff --git a/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c b/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c
new file mode 100644
index 000..bcecdd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkpopt-details" } */
+/* { dg-final { scan-tree-dump "Moving creation of \[^ \]+ down to its use" "chkpopt" } } */
+
+extern int arr[];
+
+int test (int i)
+{
+  int res;
+  if (i >= 0)
+    res = arr[i];
+  else
+    res = -i;
+  return res;
+}
diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
index b3ff433..37da035 100644
--- a/gcc/tree-chkp-opt.c
+++ b/gcc/tree-chkp-opt.c
@@ -1277,6 +1277,158 @@ chkp_optimize_string_function_calls (void)
     }
 }
 
+/* Intrumentation pass inserts most of bounds creation code
+   in the header of the function.  We want to move bounds
+   creation closer to bounds usage to reduce bounds lifetime.
+   We also try to avoid bounds creation code on paths where
+   bounds are not used.  */
+static void
+chkp_reduce_bounds_lifetime (void)
+{
+  basic_block bb = FALLTHRU_EDGE (ENTRY_BLOCK_PTR_FOR_FN (cfun))->dest;
+  gimple_stmt_iterator i;
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); )
+    {
+      gimple dom_use, use_stmt, stmt = gsi_stmt (i);
+      basic_block dom_bb;
+      ssa_op_iter iter;
+      imm_use_iterator use_iter;
+      use_operand_p use_p;
+      tree op;
+      bool want_move = false;
+      bool deps = false;
+
+      if (gimple_code (stmt) == GIMPLE_CALL
+          && gimple_call_fndecl (stmt) == chkp_bndmk_fndecl)
+        want_move = true;
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+          && POINTER_BOUNDS_P (gimple_assign_lhs (stmt))
+          && gimple_assign_rhs_code (stmt) == VAR_DECL)
+        want_move = true;
+
+      if (!want_move)
+        {
+          gsi_next (&i);
+          continue;
+        }
+
+      /* Check we do not increase other values lifetime.  */
+      FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
+        {
+          op = USE_FROM_PTR (use_p);
+
+          if (TREE_CODE (op) == SSA_NAME
+              && gimple_code (SSA_NAME_DEF_STMT (op)) != GIMPLE_NOP)
+            {
+              deps = true;
+              break;
+            }
+        }
+
+      if (deps)
+        {
+          gsi_next (&i);
+          continue;
+        }
+
+      /* Check all usages of bounds.  */
+      if (gimple_code (stmt) == GIMPLE_CALL)
+        op = gimple_call_lhs (stmt);
+      else
+        {
+          gcc_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
+          op = gimple_assign_lhs (stmt);
+        }
+
+      dom_use = NULL;
+      dom_bb = NULL;
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op)
+        {
+          if (dom_bb
+              && dominated_by_p (CDI_DOMINATORS,
+                                 dom_bb, gimple_bb (use_stmt)))
+            {
+              dom_use = use_stmt;
+              dom_bb = NULL;
+            }
+          else if
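
The O(h) estimate for nearest_common_dominator given in the reply above can be illustrated with a toy dominator tree. This is only a sketch under stated assumptions: the tree is a plain array of immediate dominators, the entry block is its own immediate dominator, and the names depth_of and nearest_common_dom are hypothetical. It is not how GCC's dominance code is implemented.

```cpp
#include <vector>

// Toy dominator tree (an assumption for illustration): idom[b] is the
// immediate dominator of block b, and the entry block satisfies
// idom[entry] == entry.

// Number of tree edges between block b and the entry block.
static int depth_of(const std::vector<int> &idom, int b)
{
  int depth = 0;
  while (idom[b] != b)
    {
      b = idom[b];
      ++depth;
    }
  return depth;
}

// Nearest common ancestor of a and b in the dominator tree.  Every
// loop below only walks towards the root, so the cost is bounded by
// the height h of the tree.
static int nearest_common_dom(const std::vector<int> &idom, int a, int b)
{
  int da = depth_of(idom, a);
  int db = depth_of(idom, b);

  // Bring the deeper block up to the depth of the shallower one.
  while (da > db)
    {
      a = idom[a];
      --da;
    }
  while (db > da)
    {
      b = idom[b];
      --db;
    }

  // Walk both blocks up in lockstep until they meet.
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}
```

With idom = {0, 0, 0, 1, 1, 3} (block 0 is the entry), the nearest common dominator of blocks 5 and 4 is block 1, and that of blocks 5 and 2 is the entry block.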
[PATCH 10/10] ChangeLog files (Re: Patches 5-10 of jit merger (was: Re: [PATCH 0/5] Merger of jit branch (v2)))
On Tue, 2014-10-14 at 11:09 -0400, David Malcolm wrote:
> [...]
> Patch 10: the ChangeLog.jit logs from the branch.

Finally, patch 10, the ChangeLog files.

Attachment: 0010-ChangeLog-files.patch.gz
Description: GNU Zip compressed data
Re: [PATCH] Add zero-overhead looping for xtensa backend
PING?

Cheers,
Felix

On Tue, Oct 14, 2014 at 12:30 AM, Felix Yang <fei.yang0...@gmail.com> wrote:
> Thanks for the comments.
>
> The patch checks the usage of the trip count register, making sure
> that it is not used in the loop body other than by the doloop_end and
> does not live past the doloop_end instruction, as the following code
> snippet shows:
>
> +  /* Scan all the blocks to make sure they don't use iter_reg.  */
> +  if (loop->iter_reg_used || loop->iter_reg_used_outside)
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, ";; loop %d uses iterator\n",
> +                 loop->loop_no);
> +      return false;
> +    }
>
> For the spill issue, I think we need to handle it.  The reason is
> that currently we are not telling GCC about the existence of the
> LCOUNT register.  Instead, we keep the trip count in a general
> register, and it's possible that this register can be spilled when
> register pressure is high.
>
> It's a good idea to post another patch to describe the LCOUNT
> register in GCC in order to free this general register.  But I want
> this patch applied as a first step, OK?
>
> Cheers,
> Felix
>
> On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterl...@gmail.com
> <augustine.sterl...@gmail.com> wrote:
>> On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang <fei.yang0...@gmail.com> wrote:
>>> Hi Sterling,
>>>
>>> I made some improvement to the patch. Two changes:
>>> 1. TARGET_LOOPS is now used as a condition of the doloop related
>>> patterns, which is more elegant.
>>
>> Fine.
>>
>>> 2. As the trip count register of the zero-cost loop may potentially
>>> be spilled, we need to change the patterns in order to handle this
>>> issue.
>>
>> Actually, for xtensa you don't.  The trip count is copied into LCOUNT
>> at the execution of the loop instruction, and therefore a spill or
>> whatever doesn't matter--it won't affect the result.
>>
>> So as long as you have the trip count at the start of the loop, you
>> are fine.
>>
>> This does bring up an issue of whether or not the trip count can be
>> modified during the loop.  (Note that this is different than early
>> exit.)  If it can, you can't use a zero-overhead loop.  Does your
>> patch address this case?
>>
>>> The solution is similar to that adopted by the c6x backend.  Just
>>> turn the zero-cost loop into a regular loop when that happens, once
>>> reload is completed.
>>> Attached please find version 4 of the patch.
>>> Make check regression tested with xtensa-elf-gcc/simulator.
>>> OK for trunk?
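
The point made in the quoted reply, that the trip count is copied into LCOUNT when the loop instruction executes so a later spill of the general register does not change the iteration count, can be shown with a deliberately simplified model. This is a toy sketch, not a description of the Xtensa ISA in detail: the function name run_latched_loop and the way the register is clobbered are assumptions made for illustration.

```cpp
// Toy model only: the loop setup copies the trip count into a dedicated
// counter once.  The loop body may then clobber the general register
// that originally held the count without changing how many iterations
// are executed.
static int run_latched_loop(int &trip_count_reg, bool clobber_reg_in_body)
{
  int latched = trip_count_reg;  // copy made once, at loop entry
  int iterations = 0;

  while (latched > 0)
    {
      if (clobber_reg_in_body)
        trip_count_reg = 0;      // e.g. the register is spilled or reused
      ++iterations;
      --latched;
    }
  return iterations;
}
```

Starting with the register holding 5 and clobbering it in the body still yields 5 iterations, while the register itself ends up as 0. The model does not cover the separate question raised in the thread, namely a loop body that needs to change the trip count itself.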
Re: [PATCH x86, pr63534] Fix go bootstrap
On 10/14/2014 08:08 AM, Evgeny Stupachenko wrote:
> Hi,
>
> Bootstrapped with --enable-languages=c,c++,fortran,lto,go passed.
> Make check in progress.
>
> Is it ok?
>
> ChangeLog
>
> 2014-10-14  Evgeny Stupachenko  <evstu...@gmail.com>
>
>         * config/i386/i386.c (ix86_expand_split_stack_prologue): Make
>         __morestack calls local.

Ok.

r~
Re: [PATCH x86, pr63534] Fix go bootstrap
On Tue, Oct 14, 2014 at 08:43:39AM -0700, Richard Henderson wrote:
> On 10/14/2014 08:08 AM, Evgeny Stupachenko wrote:
> > Hi,
> >
> > Bootstrapped with --enable-languages=c,c++,fortran,lto,go passed.
> > Make check in progress.
> >
> > Is it ok?
> >
> > ChangeLog
> >
> > 2014-10-14  Evgeny Stupachenko  <evstu...@gmail.com>
> >
> >         * config/i386/i386.c (ix86_expand_split_stack_prologue): Make
> >         __morestack calls local.
>
> Ok.

Please mention PR target/63534 in the ChangeLog.

        Jakub
Re: [PATCH 3/5] timevar.h: Add an auto_timevar class
On Tue, 2014-10-14 at 11:03 +0200, Richard Biener wrote:
> On Mon, Oct 13, 2014 at 7:45 PM, David Malcolm <dmalc...@redhat.com> wrote:
> > This is used in a couple of places in jit/jit-playback.c to ensure
> > that we pop the timevar on every exit path from a function.
> >
> > I could rewrite them if need be, but it does simplify things.
>
> Sorry to be bikeshedding but auto_timevar sounds odd - this is
> just a one-element timevar stack.

Sorry that the usage examples didn't make it through in my original
email; these are in patch 06/10 in gcc/jit/jit-playback.c and look like
this:

  playback::context::
  compile ()
  {
    ... lots of code...

    {
      auto_timevar assemble_timevar (TV_ASSEMBLE);

      ... lots of code, with multiple return paths...
    }
  }

the idea being that the timevar_pop happens implicitly at the exit from
the scope (e.g. via one of the error-handling returns).

FWIW I rather like the current name: I think of it as an RAII-style way
of not having to manually call timevar_pop.  The auto_ prefix to me
evokes both such RAII types as auto_ptr and auto_vec, and the fact that
it's intended to be on the stack i.e. have auto storage class.

> Don't have a real better name though :/  Maybe timevar_pushpop ?
>
> Otherwise this looks ok.
>
> Thanks,
> Richard.
>
> > Written by Tom Tromey.
> >
> > gcc/ChangeLog:
> >         * timevar.h (class auto_timevar): New class.
> > ---
> >  gcc/timevar.h | 24
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/gcc/timevar.h b/gcc/timevar.h
> > index 6703cc9..f018e39 100644
> > --- a/gcc/timevar.h
> > +++ b/gcc/timevar.h
> > @@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv)
> >      timevar_pop_1 (tv);
> >  }
> >
> > +// This is a simple timevar wrapper class that pushes a timevar in its
> > +// constructor and pops the timevar in its destructor.
> > +class auto_timevar
> > +{
> > + public:
> > +  auto_timevar (timevar_id_t tv)
> > +    : m_tv (tv)
> > +  {
> > +    timevar_push (m_tv);
> > +  }
> > +
> > +  ~auto_timevar ()
> > +  {
> > +    timevar_pop (m_tv);
> > +  }
> > +
> > + private:
> > +
> > +  // Private to disallow copies.
> > +  auto_timevar (const auto_timevar &);
> > +
> > +  timevar_id_t m_tv;
> > +};
> > +
> >  extern void print_time (const char *, long);
> >
> >  #endif /* ! GCC_TIMEVAR_H */
> > --
> > 1.8.5.3
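
The RAII idea described in this message, where the pop happens implicitly on every exit path from the scope, can be tried outside of GCC with a small stand-alone sketch. Everything here is hypothetical scaffolding: demo_push, demo_pop, g_events, scoped_timer and compile_step are stand-ins invented for the example and only record the order of calls; they are not GCC's timevar API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for timevar_push/timevar_pop: they only log
// the calls so that the ordering can be inspected afterwards.
static std::vector<std::string> g_events;

static void demo_push(const std::string &name)
{
  g_events.push_back("push " + name);
}

static void demo_pop(const std::string &name)
{
  g_events.push_back("pop " + name);
}

// Same shape as auto_timevar: push in the constructor, pop in the
// destructor.
class scoped_timer
{
public:
  explicit scoped_timer(const std::string &name) : m_name(name)
  {
    demo_push(m_name);
  }

  ~scoped_timer()
  {
    demo_pop(m_name);
  }

  // Copying would lead to a second pop, so it is disallowed.
  scoped_timer(const scoped_timer &) = delete;
  scoped_timer &operator=(const scoped_timer &) = delete;

private:
  std::string m_name;
};

// A function with several return paths; none of them pops by hand.
static int compile_step(int input)
{
  scoped_timer timer("assemble");

  if (input < 0)
    return -1;        // error path: the destructor still pops
  if (input == 0)
    return 0;         // early exit: the destructor still pops
  return input * 2;   // normal path
}
```

Calling compile_step(-5) leaves exactly one "push assemble" followed by one "pop assemble" in g_events, even though the function returned early. The sketch uses C++11 deleted copy operations where the patch uses a private, undefined copy constructor; both serve the same purpose.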
Re: [PATCH] Implement -fsanitize=object-size
On Fri, Oct 10, 2014 at 12:26:44PM +0200, Jakub Jelinek wrote:
> 2014-10-10  Jakub Jelinek  <ja...@redhat.com>
>
>         * ubsan/Makefile.am (DEFS): Add -DPIC.
>         * ubsan/Makefile.in: Regenerated.

I've now bootstrapped/regtested this on x86_64-linux and i686-linux
and committed as obvious.

2014-10-14  Jakub Jelinek  <ja...@redhat.com>

        * ubsan/Makefile.am (DEFS): Add -DPIC.
        * ubsan/Makefile.in: Regenerated.

--- libsanitizer/ubsan/Makefile.am 2014-09-24 11:08:04.183026156 +0200
+++ libsanitizer/ubsan/Makefile.am 2014-10-10 12:15:19.124247283 +0200
@@ -3,7 +3,7 @@ AM_CPPFLAGS = -I $(top_srcdir) -I $(top_
 # May be used by toolexeclibdir.
 gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
 
-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 ACLOCAL_AMFLAGS = -I m4
--- libsanitizer/ubsan/Makefile.in 2014-09-25 15:01:25.448109866 +0200
+++ libsanitizer/ubsan/Makefile.in 2014-10-14 11:26:17.772201307 +0200
@@ -132,7 +132,7 @@ CXXCPP = @CXXCPP@
 CXXDEPMODE = @CXXDEPMODE@
 CXXFLAGS = @CXXFLAGS@
 CYGPATH_W = @CYGPATH_W@
-DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DPIC
 DEPDIR = @DEPDIR@
 DSYMUTIL = @DSYMUTIL@
 DUMPBIN = @DUMPBIN@

        Jakub
RE: New rematerialization sub-pass in LRA
Vladimir Makarov wrote:
> > On SPECINT2k performance is ~0.5% worse (5.5% regression on
> > perlbmk), and SPECFP is ~0.2% faster.
>
> Thanks for reporting this.  It is important for me as I have no
> aarch64 machine for benchmarking.
>
> Perlbmk performance degradation is too big and I'll definitely look
> at this problem.

Looking at the diffs in regexec.c, which has the hot function
regmatch(), nothing obvious stands out that could cause a serious
regression.  I did notice this around line 2300:

 .L802:
        ldr     x1, [x23, 48]
        adrp    x5, PL_savestack_ix
        ldr     w0, [x23]
        str     x5, [sp, 104]
        str     x1, [x24, #:lo12:PL_regcc]
        ldr     w27, [x1, 4]
        bl      regcppush
-       ldr     x5, [sp, 104]
        str     w0, [sp, 112]
        ldr     x0, [x23, 32]
+       adrp    x5, PL_savestack_ix
        ldr     w28, [x5, #:lo12:PL_savestack_ix]
+       str     x5, [sp, 104]
        bl      regmatch
        ldr     x5, [sp, 104]
        mov     w19, w0
        ldr     w1, [sp, 112]
        ldr     w0, [x5, #:lo12:PL_savestack_ix]

So it rematerializes one instance, but fails to rematerialize the
second use.  An extra store is inserted, and the first adrp and store
are not removed as dead.

Wilco
[PATCH] Fix optimize_range_tests_diff
Hi!

When hacking on range reassoc opt, I've noticed we can emit code with
undefined behavior even when there wasn't one originally, in particular
for:

   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U.  */

we actually don't transform it to what the comment says, but
((X - 43) & ~(75U - 43U)) <= 3U
i.e. the initial subtraction can be performed in signed type, if in
here X is e.g. INT_MIN or INT_MIN + 42, the subtraction at gimple level
would be UB (not caught by -fsanitize=undefined, because that is
handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14  Jakub Jelinek  <ja...@redhat.com>

        * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
        MINUS_EXPR in unsigned type to avoid undefined behavior.

--- gcc/tree-ssa-reassoc.c.jj 2014-10-13 17:54:33.0 +0200
+++ gcc/tree-ssa-reassoc.c 2014-10-13 17:58:07.312705218 +0200
@@ -2250,8 +2250,13 @@ optimize_range_tests_diff (enum tree_cod
   if (tree_log2 (tem1) < 0)
     return false;
 
+  type = unsigned_type_for (type);
+  tem1 = fold_convert (type, tem1);
+  tem2 = fold_convert (type, tem2);
+  lowi = fold_convert (type, lowi);
   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
-  tem1 = fold_binary (MINUS_EXPR, type, rangei->exp, lowi);
+  tem1 = fold_binary (MINUS_EXPR, type,
+                      fold_convert (type, rangei->exp), lowi);
   tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
   lowj = build_int_cst (type, 0);
   if (update_range_test (rangei, rangej, 1, opcode, ops, tem1,

        Jakub
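
The transformation discussed above can be checked at the source level with a small stand-alone sketch. It assumes 32-bit values and uses hypothetical function names (two_range_tests, merged_range_test, count_mismatches); it models the arithmetic only, not the GIMPLE folding done in tree-ssa-reassoc.c. The merged form is valid because 75 - 43 == 32 is a power of two, and the conversion to unsigned happens before the subtraction, as in the patch, so that values near INT_MIN wrap instead of overflowing.

```cpp
#include <cstdint>
#include <limits>

// Original form: two separate range tests, for [43, 46] and [75, 78].
static bool two_range_tests(std::uint32_t x)
{
  return (x - 43u) <= 3u || (x - 75u) <= 3u;
}

// Merged form.  The value is converted to unsigned first and only then
// subtracted: unsigned subtraction wraps around, whereas a signed
// x - 43 would overflow for x close to INT_MIN.
static bool merged_range_test(std::int32_t x)
{
  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  return ((ux - 43u) & ~(75u - 43u)) <= 3u;
}

// Count inputs on which the two forms disagree, over a window around
// the interesting ranges plus a few extreme values.
static unsigned count_mismatches()
{
  unsigned bad = 0;

  for (std::int32_t x = -1000; x <= 1000; ++x)
    if (two_range_tests(static_cast<std::uint32_t>(x)) != merged_range_test(x))
      ++bad;

  const std::int32_t lowest = std::numeric_limits<std::int32_t>::min();
  const std::int32_t extremes[] = { lowest, lowest + 42,
                                    std::numeric_limits<std::int32_t>::max() };
  for (std::int32_t x : extremes)
    if (two_range_tests(static_cast<std::uint32_t>(x)) != merged_range_test(x))
      ++bad;

  return bad;
}
```

Masking out bit 5 (the value 32) maps the offsets 32..35 onto 0..3, which is why a single comparison against 3 covers both ranges.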
Re: [PATCH 3/5] IPA ICF pass
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index fb41b01..2de98b4 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -172,6 +172,12 @@ public:
>    /* Dump referring in list to FILE.  */
>    void dump_referring (FILE *);
>
> +  /* Get number of references for this node.  */
> +  inline unsigned get_references_count (void)
> +  {
> +    return ref_list.references ? ref_list.references->length () : 0;
> +  }

Probably better called num_references() (like we have num_edge in
basic-block.h)

> @@ -8068,6 +8069,19 @@ it may significantly increase code size
>  (see @option{--param ipcp-unit-growth=@var{value}}).
>  This flag is enabled by default at @option{-O3}.
>
> +@item -fipa-icf
> +@opindex fipa-icf
> +Perform Identical Code Folding for functions and read-only variables.
> +The optimization reduces code size and may disturb unwind stacks by replacing
> +a function by equivalent one with a different name. The optimization works
> +more effectively with link time optimization enabled.
> +
> +Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF
> +works on different levels and thus the optimizations are not same - there are
> +equivalences that are found only by GCC and equivalences found only by Gold.
> +
> +This flag is enabled by default at @option{-O2}.

... and -Os?

> +    case ARRAY_REF:
> +    case ARRAY_RANGE_REF:
> +      {
> +        x1 = TREE_OPERAND (t1, 0);
> +        x2 = TREE_OPERAND (t2, 0);
> +        y1 = TREE_OPERAND (t1, 1);
> +        y2 = TREE_OPERAND (t2, 1);
> +
> +        if (!compare_operand (array_ref_low_bound (t1),
> +                              array_ref_low_bound (t2)))
> +          return return_false_with_msg ("");
> +        if (!compare_operand (array_ref_element_size (t1),
> +                              array_ref_element_size (t2)))
> +          return return_false_with_msg ("");
> +        if (!compare_operand (x1, x2))
> +          return return_false_with_msg ("");
> +        return compare_operand (y1, y2);
> +      }

No need for {...} if there are no local vars.

> +bool
> +func_checker::compare_function_decl (tree t1, tree t2)
> +{
> +  bool ret = false;
> +
> +  if (t1 == t2)
> +    return true;
> +
> +  symtab_node *n1 = symtab_node::get (t1);
> +  symtab_node *n2 = symtab_node::get (t2);
> +
> +  if (m_ignored_source_nodes != NULL && m_ignored_target_nodes != NULL)
> +    {
> +      ret = m_ignored_source_nodes->contains (n1)
> +            && m_ignored_target_nodes->contains (n2);
> +
> +      if (ret)
> +        return true;
> +    }
> +
> +  /* If function decl is WEAKREF, we compare targets.  */
> +  cgraph_node *f1 = cgraph_node::get (t1);
> +  cgraph_node *f2 = cgraph_node::get (t2);
> +
> +  if(f1 && f2 && f1->weakref && f2->weakref)
> +    ret = f1->alias_target == f2->alias_target;
> +
> +  return ret;

Comparing aliases is a bit more complicated than just handling
weakrefs.  I have a patch for symtab_node::equivalent_address_p
somewhere in the queue.  Let's just drop the fancy stuff for the moment
and compare f1 and f2 for equivalence.

> +  ret = compare_decl (t1, t2);

Why are functions not compared with compare_decl while variables are?

> +
> +  return return_with_debug (ret);
> +}
> +
> +void
> +func_checker::parse_labels (sem_bb *bb)
> +{
> +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb->bb); !gsi_end_p (gsi);
> +       gsi_next (&gsi))
> +    {
> +      gimple stmt = gsi_stmt (gsi);
> +
> +      if (gimple_code (stmt) == GIMPLE_LABEL)
> +        {
> +          tree t = gimple_label_label (stmt);
> +          gcc_assert (TREE_CODE (t) == LABEL_DECL);
> +
> +          m_label_bb_map.put (t, bb->bb->index);
> +        }
> +    }
> +}
> +
> +/* Basic block equivalence comparison function that returns true if
> +   basic blocks BB1 and BB2 (from functions FUNC1 and FUNC2) correspond.
> +
> +   In general, a collection of equivalence dictionaries is built for types
> +   like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure
> +   is utilized by every statement-by-stament comparison function.  */
> +
> +bool
> +func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2)
> +{
> +  unsigned i;
> +  gimple_stmt_iterator gsi1, gsi2;
> +  gimple s1, s2;
> +
> +  if (bb1->nondbg_stmt_count != bb2->nondbg_stmt_count
> +      || bb1->edge_count != bb2->edge_count)
> +    return return_false ();
> +
> +  gsi1 = gsi_start_bb (bb1->bb);
> +  gsi2 = gsi_start_bb (bb2->bb);
> +
> +  for (i = 0; i < bb1->nondbg_stmt_count; i++)
> +    {
> +      if (is_gimple_debug (gsi_stmt (gsi1)))
> +        gsi_next_nondebug (&gsi1);
> +
> +      if (is_gimple_debug (gsi_stmt (gsi2)))
> +        gsi_next_nondebug (&gsi2);
> +
> +      s1 = gsi_stmt (gsi1);
> +      s2 = gsi_stmt (gsi2);
> +
> +      int eh1 = lookup_stmt_eh_lp_fn
> +        (DECL_STRUCT_FUNCTION (m_source_func_decl), s1);
> +      int eh2 = lookup_stmt_eh_lp_fn
> +        (DECL_STRUCT_FUNCTION (m_target_func_decl), s2);
> +
> +      if (eh1 != eh2)
> +        return return_false_with_msg ("EH regions are different");
> +
> +      if (gimple_code (s1) != gimple_code (s2))
> +        return return_false_with_msg ("gimple codes are different");
> +
> +      switch (gimple_code (s1))
> +        {
> +        case GIMPLE_CALL:
> +
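
The comment quoted above mentions equivalence dictionaries built for SSA names and declarations. A minimal sketch of that idea follows, under loud assumptions: names are modelled as plain strings, and the class name_equivalence and the helper sequences_equivalent are hypothetical, not part of the ICF patch. The dictionary records which name in the source function corresponds to which name in the target function, and rejects any pairing that contradicts an earlier one in either direction.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplification: pairs up names from a "source" and a
// "target" function and keeps the pairing one-to-one.
class name_equivalence
{
public:
  // Returns true if pairing A with B is consistent with every pair
  // seen so far; a new pair is recorded on first sight.
  bool match(const std::string &a, const std::string &b)
  {
    auto fwd = m_source_to_target.find(a);
    auto bwd = m_target_to_source.find(b);

    if (fwd == m_source_to_target.end() && bwd == m_target_to_source.end())
      {
        m_source_to_target[a] = b;
        m_target_to_source[b] = a;
        return true;
      }

    // Exactly one side already paired: it is paired with something else.
    if (fwd == m_source_to_target.end() || bwd == m_target_to_source.end())
      return false;

    return fwd->second == b && bwd->second == a;
  }

private:
  std::map<std::string, std::string> m_source_to_target;
  std::map<std::string, std::string> m_target_to_source;
};

// Compare two sequences of operand names position by position.
static bool sequences_equivalent(const std::vector<std::string> &s1,
                                 const std::vector<std::string> &s2)
{
  if (s1.size() != s2.size())
    return false;

  name_equivalence eq;
  for (std::size_t i = 0; i < s1.size(); ++i)
    if (!eq.match(s1[i], s2[i]))
      return false;
  return true;
}
```

For example, the sequences {a_1, b_2, a_1} and {x_5, y_7, x_5} are considered equivalent, while {a_1, b_2, a_1} and {x_5, y_7, y_7} are not, because a_1 would have to correspond to both x_5 and y_7.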
[gomp] [3/3] OpenACC 2.0 support for libgomp - documentation
This is a version of the patch:

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02024.html

against gomp4 branch instead of mainline.

OK to apply?

Thanks,

Julian

-xx-xx  Thomas Schwinge  <tho...@codesourcery.com>
        James Norris  <jnor...@codesourcery.com>

    libgomp/
    * libgomp.texi: Outline documentation for OpenACC.

From c58006a7ade2a9556bd73bac9ef45b3bbd62ca37 Mon Sep 17 00:00:00 2001
From: Julian Brown <jul...@codesourcery.com>
Date: Wed, 17 Sep 2014 10:26:56 -0700
Subject: [PATCH 2/3] OpenACC documentation

---
 libgomp/libgomp.texi | 661 --
 1 file changed, 636 insertions(+), 25 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 254be57..9530a2b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
 @ifinfo
 @dircategory GNU Libraries
 @direntry
-* libgomp: (libgomp).GNU OpenMP runtime library
+* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library
 @end direntry
 
-This manual documents the GNU implementation of the OpenMP API for
+This manual documents the GNU implementation of the OpenACC API for
+offloading of code to accelerator devices in C/C++ and Fortran and
+the GNU implementation of the OpenMP API for
 multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
 Published by the Free Software Foundation
@@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA
 @setchapternewpage odd
 
 @titlepage
-@title The GNU OpenMP Implementation
+@title The GNU OpenACC and OpenMP Implementation
 @page
 @vskip 0pt plus 1filll
 @comment For the @value{version-GCC} Version*
@@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@*
 @top Introduction
 @cindex Introduction
 
-This manual documents the usage of libgomp, the GNU implementation of the
+This manual documents the usage of libgomp, the GNU implementation of the
+@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API)
+for offloading of code to accelerator devices in C/C++ and Fortran, and
+the GNU implementation of the
 @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
 for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 
@@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
 @comment better formatting.
 @comment
 @menu
-* Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming
-                               interface.
-* Environment Variables::      Influencing runtime behavior with environment
-                               variables.
-* The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
-* Copying::                    GNU general public license says
-                               how you can copy and share libgomp.
-* GNU Free Documentation License::
-                               How you can copy and share this manual.
-* Funding::                    How to help assure continued work for free
-                               software.
-* Library Index::              Index of this documentation.
+* Enabling OpenACC::                 How to enable OpenACC for your
+                                     applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                                     programming interface.
+* OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
+                                     environment variables.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                                     NVIDIA CUBLAS library.
+* Enabling OpenMP::                  How to enable OpenMP for your
+                                     applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                                     The OpenMP runtime application programming
+                                     interface.
+* OpenMP Environment Variables: Environment Variables.
+                                     Influencing OpenMP runtime behavior with
+                                     environment variables.
+* The libgomp ABI::                  Notes on the external libgomp ABI.
+* Reporting Bugs::                   How to report bugs.
+* Copying::                          GNU general public license says how you
+                                     can copy and share libgomp.
+* GNU Free Documentation License::   How you can copy and share this manual.
+* Funding::                          How to help assure continued work for free
+                                     software.
+* Library Index::                    Index of this documentation.
 @end menu
 
+
+@c
Re: [PATCH] Add D demangling support to libiberty
On 14 October 2014 15:28, Ian Lance Taylor i...@google.com wrote: On Tue, Oct 14, 2014 at 7:12 AM, Joel Brobecker brobec...@adacore.com wrote: libiberty/ChangeLog 2014-08-05 Iain Buclaw ibuc...@gdcproject.org * Makefile.in (CFILES): Add d-demangle.c. (REQUIRED_OFILES): Add d-demangle.o. * cplus-dem.c (libiberty_demanglers): Add dlang_demangling case. (cplus_demangle): Likewise. * d-demangle.c: New file. * testsuite/Makefile.in (really-check): Add check-d-demangle. * testsuite/d-demangle-expected: New file. As hinted on gdb-patches, this patch causes a GDB build failure on Solaris 2.9, because it uses strtold which is not available. According to gnulib's documentation, it should also break on the following systems: NetBSD 3.0, OpenBSD 3.8, Minix 3.1.8, IRIX 6.5, OSF/1 4.0, Solaris 9, Cygwin, MSVC 9, Interix 3.5, BeOS. This patch attempts to fix the issue by adding a configure check for strtold and adjusts the code to use strtod if strtold does not exist. Does this look OK to you? If yes, can one of the GCC maintainers please review? It doesn't make sense to me to use strtod if strtold is required. And if strtold is not required, then it seems to me that we should always use strtod. It seems to me that the right options are either 1) use strtod unconditionally; 2) add strtold to libiberty Since option 1 is simpler, what bad things would happen if we use strtod unconditionally? Ian I've just seen this, so I'll repeat what I've said in gdb patches too. The call to strtold is only needed to decode templates which have a floating point value encoded inside. This value may or may not have a greater than double precision. Replacing long double with double will be fine with me. I'll accept that I didn't consider legacy in hindsight, and in reality it would be rather rare to stumble upon the need for strtold. Regards Iain
RE: New rematerialization sub-pass in LRA
Wilco Dijkstra wrote:

Vladimir Makarov wrote:

On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and SPECFP is ~0.2% faster.

Thanks for reporting this. It is important for me as I have no aarch64 machine for benchmarking. Perlbmk performance degradation is too big and I'll definitely look at this problem.

Looking at the diffs in regexec.c, which has the hot function regmatch(), nothing obvious stands out that could cause a serious regression. I did notice this around line 2300:

 .L802:
        ldr     x1, [x23, 48]
        adrp    x5, PL_savestack_ix
        ldr     w0, [x23]
        str     x5, [sp, 104]
        str     x1, [x24, #:lo12:PL_regcc]
        ldr     w27, [x1, 4]
        bl      regcppush
-       ldr     x5, [sp, 104]
        str     w0, [sp, 112]
        ldr     x0, [x23, 32]
+       adrp    x5, PL_savestack_ix
        ldr     w28, [x5, #:lo12:PL_savestack_ix]
+       str     x5, [sp, 104]
        bl      regmatch
        ldr     x5, [sp, 104]
        mov     w19, w0
        ldr     w1, [sp, 112]
        ldr     w0, [x5, #:lo12:PL_savestack_ix]

So it rematerializes one instance, but fails to rematerialize the second use. An extra store is inserted, and the first adrp and store are not removed as dead.

A simple example that reproduces the issue (-mcpu=cortex-a57 -O2 -fomit-frame-pointer -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28 -ffixed-x29 -ffixed-x30). It looks like an odd interaction between -fcaller-saves and rematerialization.

void g(void);
int x;
int f3b(int y)
{
  y += x;
  g();
  y += x;
  g();
  y += x;
  return y;
}

f3b:
        adrp    x2, x                   -- DEAD
        sub     sp, sp, #16
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp]                -- DEAD
        add     w0, w0, w1
        str     w0, [sp]                -- reuse of stackslot!!!
        bl      g
        adrp    x2, x
        ldr     w0, [sp]
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp, 8]
        add     w0, w0, w1
        str     w0, [sp]                -- REMOVE
        bl      g
        ldr     x2, [sp, 8]             -- rematerialize adrp
        ldr     w0, [sp]
        add     sp, sp, 16
        ldr     w1, [x2, #:lo12:x]
        add     w0, w0, w1
        ret

Wilco
[PATCH] PR lto/61048 Define missed builtins on demand
Hi all,

Attached patch fixes PR lto/61048 - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61048

The reason for the failure was that the builtin information structure was not initialized properly at the link stage. The failed assertion was caused by a missing builtin declaration (BUILT_IN_ASAN_AFTER_DYNAMIC_INIT), which was requested from this structure.

Normally this information is initialized in function lto_define_builtins, which is called from the LTO lang hook function lto_init. But in the given testcase the initialization did not happen, since the declaration is initialized only if the following condition holds:

  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
                    | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT))

If the user compiles a file (without linking) in LTO mode with the -fsanitize=address option, and then links the executable from the *.o file without specifying -fsanitize=address, variable flag_sanitize will be 0, the sanitizer builtins info will not be initialized, and an ICE will happen.

Commands to reproduce the problem:

  g++ test.cpp -c -o test.o -fsanitize=address -flto
  g++ test.o -o test -Wl,-flto   # At this stage flag_sanitize is 0, and sanitizer builtins are not defined.

The simplest way to fix this seems to be to add initialization of sanitizer builtins using function initialize_sanitizer_builtins, and this helps to avoid the ICE:

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index bc53632..f5ca849 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3. If not see
 #include "ipa-inline.h"
 #include "params.h"
 #include "ipa-utils.h"
+#include "asan.h"
 
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver. */
@@ -1856,6 +1857,9 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
   data_in = lto_data_in_create (decl_data, (const char *) data + string_offset,
                                 header->string_size, resolutions);
 
+  /* Initialize sanitizer builtins if necessary. */
+  initialize_sanitizer_builtins();
+
   /* We do not uniquify the pre-loaded cache entries, those are middle-end
      internal types that should not be merged. */

But this approach means that asan-specific functions must be called from lto.

The suggested patch proposes another approach: add definitions of builtins during the final stage, when they are requested from the builtin_info structure. I have tried to do it by adding an lto-specific lang hook, so as to reuse the existing code for builtins initialization (currently builtins are initialized in the lto_init hook). In the attached patch such a hook is added, and it is used in streamer_get_builtin_tree.

It seems that the discussed issue can happen not only for flag -fsanitize, but also for all options that cause the definition of builtins, so the proposed patch is independent of sanitizers.

The patch was bootstrapped and regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Best regards,
Ilya Palachev

From 926a8b84a52f3120c3f71cd28e0d782c719b7791 Mon Sep 17 00:00:00 2001
From: Ilya Palachev i.palac...@samsung.com
Date: Tue, 14 Oct 2014 19:22:32 +0400
Subject: [PATCH] Define missed builtins on demand

gcc/

2014-10-14 Ilya Palachev i.palac...@samsung.com

        * langhooks.h (define_builtin_on_demand): New function.
        * langhooks-def.h (LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND): New macro.
        * lto/lto-lang.c (lto_define_builtin_on_demand): New function.
        * tree-streamer-in.c (streamer_get_builtin_tree): Use
        define_builtin_on_demand in case when the declaration of builtin is
        missing.

gcc/testsuite/

2014-10-14 Ilya Palachev i.palac...@samsung.com

        * g++.dg/lto/pr61048_0.C: New test from bugzilla.
--- gcc/langhooks-def.h | 4 +++- gcc/langhooks.h | 3 +++ gcc/lto/lto-lang.c | 16 gcc/testsuite/g++.dg/lto/pr61048_0.C | 10 ++ gcc/tree-streamer-in.c | 4 5 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/lto/pr61048_0.C diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h index e5ae3e3..2ddccbc 100644 --- a/gcc/langhooks-def.h +++ b/gcc/langhooks-def.h @@ -254,11 +254,13 @@ extern void lhd_end_section (void); #define LANG_HOOKS_BEGIN_SECTION lhd_begin_section #define LANG_HOOKS_APPEND_DATA lhd_append_data #define LANG_HOOKS_END_SECTION lhd_end_section +#define LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND 0 #define LANG_HOOKS_LTO { \ LANG_HOOKS_BEGIN_SECTION, \ LANG_HOOKS_APPEND_DATA, \ - LANG_HOOKS_END_SECTION \ + LANG_HOOKS_END_SECTION, \ + LANG_HOOKS_DEFINE_BUILTIN_ON_DEMAND \ } /* The whole thing. The structure is defined in langhooks.h. */ diff --git a/gcc/langhooks.h b/gcc/langhooks.h index 32e76f9..a0cbe5f 100644 --- a/gcc/langhooks.h +++ b/gcc/langhooks.h @@ -255,6 +255,9 @@ struct lang_hooks_for_lto /* End the previously begun LTO section.
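The "define on demand" approach described in this message can be illustrated outside GCC with a small lookup table that is filled lazily through an optional hook. This is only a sketch under stated assumptions: `BuiltinCode`, `builtin_table`, `define_on_demand_hook`, `init_builtins`, `define_missing`, and `get_builtin` are hypothetical names chosen for illustration and are not the GCC lang-hook API; strings stand in for tree declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical builtin identifiers (GCC itself uses enum built_in_function).
enum BuiltinCode { BUILTIN_MEMCPY, BUILTIN_ASAN_AFTER_DYNAMIC_INIT, BUILTIN_COUNT };

// Stand-in for the builtin info structure; an empty string means "not defined".
std::string builtin_table[BUILTIN_COUNT];

// Optional hook, left null by front ends that do not provide one.
typedef void (*define_hook_t)(BuiltinCode);
define_hook_t define_on_demand_hook = nullptr;

// Eager initialization: the sanitizer builtin is defined only if the flag is set,
// which models linking without -fsanitize=address.
void init_builtins(bool sanitize)
{
  builtin_table[BUILTIN_MEMCPY] = "memcpy";
  if (sanitize)
    builtin_table[BUILTIN_ASAN_AFTER_DYNAMIC_INIT] = "__asan_after_dynamic_init";
}

// Hypothetical hook implementation: define a single missing builtin.
void define_missing(BuiltinCode code)
{
  if (code == BUILTIN_ASAN_AFTER_DYNAMIC_INIT)
    builtin_table[code] = "__asan_after_dynamic_init";
}

// Lookup that gives the hook one chance to define a missing entry.
// Returns nullptr if the builtin is still undefined afterwards.
const std::string* get_builtin(BuiltinCode code)
{
  assert(code >= 0 && code < BUILTIN_COUNT);
  if (builtin_table[code].empty() && define_on_demand_hook)
    define_on_demand_hook(code);
  return builtin_table[code].empty() ? nullptr : &builtin_table[code];
}
```

With `init_builtins(false)` and no hook installed, looking up the sanitizer entry yields a null pointer, which corresponds to the failure mode in the PR; installing `define_missing` as the hook makes the same lookup succeed without the caller knowing anything about sanitizers.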
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/14/14 07:00, Jakub Jelinek wrote: For the first two, I think (and said it before already) that the current model of emitting set_got from a target hook during RA can't work, as there can be calls in the prologue, and the prologue is inserted before the set_got in that case. I really think the RA should in that case just tell the backend whether and in which register it wants to have the PIC register loaded upon start of the function, and it should be emit prologue pass that should arrange for that. That works for me -- I've been encouraging Intel to push emitting the PIC setup further and further back in the RTL pipeline. Their early patches had it very early in the RTL pipeline and naturally there was fallout/bleedout in various places in the optimizers. I don't see much value in emitting the PIC setup prior to allocation, all I see is problems. As for the code quality, either some RA improvements are needed, or postreload must be able to fix it up, or hardreg propagation (though, cprop_hardreg is forward propagation rather than backwards, right?). Better before prologue is emitted though, because that will save/restore the badly chosen hard reg too. RA improvements are the way to go -- however, my understanding is that overall the code is better now than it was before Intel's changes, so I don't consider the performance side as a blocker for this code. The biggest performance issue identified so far is rematerialization. The initial patch Intel sent to me was totally unacceptable as it just hacked off optimizers rather than digging into the guts of why IRA/LRA was unable to sanely rematerialize the PIC register value. jeff
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Tue, Oct 14, 2014 at 9:43 AM, Jeff Law l...@redhat.com wrote: RA improvements are the way to go -- however, my understanding is that overall the code is better now than it was before Intel's changes, so I don't consider the performance side as a blocker for this code. The new approach improves PIC code quality in functions where there is no frequent GOT access and the extra register helps. For ld.so and libc.so from the glibc build, we use 2 registers to access the GOT instead of one, which may lead to lower performance in shared libraries. -- H.J.
Re: [PATCH 2/3] libstdc++: Add put_time support.
On 13/10/14 16:28 +0100, Jonathan Wakely wrote:

On 13/10/14 13:08 +0100, Jonathan Wakely wrote:

On 15/04/14 23:20 +0200, Rüdiger Sonderfeld wrote:

Described in [ext.manip].

        * libstdc++-v3/include/std/iomanip (_Put_time): New struct.
        (put_time): New manipulator.
        (operator<<): New overloaded function.
        * libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc:
        * libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/2.cc:
        * libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/wchar_t/1.cc:
        * libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/wchar_t/2.cc: New file.

The 27_io/manipulators/extended/put_time/char/2.cc and 27_io/manipulators/extended/put_time/wchar_t/2.cc tests fail for me.

i2.exe: /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/2.cc:41: void test01(): Assertion `oss.str() == "Son 1971"' failed.
FAIL: 27_io/manipulators/extended/put_time/char/2.cc execution test

With my de_DE.utf8 locale the output is "So 1971" not "Son 1971".

$ LANG=de_DE.utf8 date +%a
Mo

So let's just test the full name and not worry about how it's abbreviated.

Tested x86_64-linux, committed to trunk.

commit 4ae8f20e4924754d7fb7809730f5491dc6a74944
Author: Jonathan Wakely jwak...@redhat.com
Date: Tue Oct 14 17:48:44 2014 +0100

    2014-10-14 Rüdiger Sonderfeld ruedi...@c-plusplus.de

        PR libstdc++/54354
        * include/std/iomanip (_Put_time): New struct.
        (put_time): New manipulator.
        (operator<<): New overloaded function.
        * testsuite/27_io/manipulators/extended/put_time/char/1.cc: New.
        * testsuite/27_io/manipulators/extended/put_time/char/2.cc: New.
        * testsuite/27_io/manipulators/extended/put_time/wchar_t/1.cc: New.
        * testsuite/27_io/manipulators/extended/put_time/wchar_t/2.cc: New.
diff --git a/libstdc++-v3/include/std/iomanip b/libstdc++-v3/include/std/iomanip
index 9625d43..fce74c9 100644
--- a/libstdc++-v3/include/std/iomanip
+++ b/libstdc++-v3/include/std/iomanip
@@ -337,6 +337,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       return __os;
     }
 
+  template<typename _CharT>
+    struct _Put_time
+    {
+      const std::tm* _M_tmb;
+      const _CharT* _M_fmt;
+    };
+
+  /**
+   *  @brief  Extended manipulator for formatting time.
+   *
+   *  This manipulator uses time_put::put to format time.
+   *  [ext.manip]
+   *
+   *  @param __tmb  struct tm time data to format.
+   *  @param __fmt  format string.
+   */
+  template<typename _CharT>
+    inline _Put_time<_CharT>
+    put_time(const std::tm* __tmb, const _CharT* __fmt)
+    { return { __tmb, __fmt }; }
+
+  template<typename _CharT, typename _Traits>
+    basic_ostream<_CharT, _Traits>&
+    operator<<(basic_ostream<_CharT, _Traits>& __os, _Put_time<_CharT> __f)
+    {
+      typename basic_ostream<_CharT, _Traits>::sentry __cerb(__os);
+      if (__cerb)
+        {
+          ios_base::iostate __err = ios_base::goodbit;
+          __try
+            {
+              typedef ostreambuf_iterator<_CharT, _Traits> _Iter;
+              typedef time_put<_CharT, _Iter> _TimePut;
+
+              const _CharT* const __fmt_end = __f._M_fmt +
+                _Traits::length(__f._M_fmt);
+
+              const _TimePut& __mp = use_facet<_TimePut>(__os.getloc());
+              if (__mp.put(_Iter(__os.rdbuf()), __os, __os.fill(),
+                           __f._M_tmb, __f._M_fmt, __fmt_end).failed())
+                __err |= ios_base::badbit;
+            }
+          __catch(__cxxabiv1::__forced_unwind&)
+            {
+              __os._M_setstate(ios_base::badbit);
+              __throw_exception_again;
+            }
+          __catch(...)
+            { __os._M_setstate(ios_base::badbit); }
+          if (__err)
+            __os.setstate(__err);
+        }
+      return __os;
+    }
+
 #if __cplusplus > 201103L
 
 #define __cpp_lib_quoted_string_io 201304
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc b/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc
new file mode 100644
index 0000000..76e64ea
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/manipulators/extended/put_time/char/1.cc
@@ -0,0 +1,44 @@
+// { dg-options "-std=gnu++11" }
+
+// 2014-04-14 Rüdiger Sonderfeld ruedi...@c-plusplus.de
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library;
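For readers unfamiliar with the manipulator added by this patch, the following is a minimal usage sketch of the standard `std::put_time` (C++11, `<iomanip>`). The helper names `format_time` and `sample_time` are made up for this illustration and are not part of the patch or its tests. The stream is imbued with the classic "C" locale so that the result does not depend on the environment, which sidesteps the locale-specific abbreviation problem discussed in this thread.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Format a broken-down time with std::put_time using the classic "C" locale,
// so day and month names are always the English ones.
std::string format_time(const std::tm& t, const char* fmt)
{
  assert(fmt != nullptr);
  std::ostringstream oss;
  oss.imbue(std::locale::classic());
  oss << std::put_time(&t, fmt);
  return oss.str();
}

// Illustrative date: Sunday, 3 January 1971.
std::tm sample_time()
{
  std::tm t = std::tm();
  t.tm_year = 71;  // years since 1900
  t.tm_mon = 0;    // January
  t.tm_mday = 3;
  t.tm_wday = 0;   // Sunday
  t.tm_yday = 2;
  return t;
}
```

Calling `format_time(sample_time(), "%A %Y")` formats the full weekday name and the year. Testing the full name (`%A`) rather than the abbreviation (`%a`) is the same choice the committed test makes, since abbreviations can differ between locale data versions.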
Re: [PATCH] Add D demangling support to libiberty
I've just seen this, so I'll repeat what I've said in gdb patches too. The call to strtold is only needed to decode templates which have a floating point value encoded inside. This value may or may not have a greater than double precision. Replacing long double with double will be fine with me. I'll accept that I didn't consider legacy in hindsight, and in reality it would be rather rare to stumble upon the need for strtold. Attached is a patch that switches it to strtod. Do you have any test that could quickly verify it? That seems to be the best approach, at least short-term. Later on, if we do want to use higher precision, we can indeed add strtold in libiberty. libiberty/ChangeLog: * d-demangle.c: Replace strtold with strtod in global comment. (strtold): Remove declaration. (strtod): New declaration. (dlang_parse_real): Declare value as double instead of long double. Replace call to strtold by call to strtod. Update format in call to snprintf. I verified that the patch allows GDB to build on both sparc-solaris and x86_64-linux. Thanks, -- Joel From 99f9794c6d2f4dabed0bbcf2cf362b1eb25ee2a7 Mon Sep 17 00:00:00 2001 From: Joel Brobecker brobec...@adacore.com Date: Tue, 14 Oct 2014 12:47:43 -0400 Subject: [PATCH] Use strtod instead of strtold in libiberty/d-demangle.c strtold is currently used to decode templates which have a floating-point value encoded inside; but this routine is not available on some systems, such as Solaris 2.9 for instance. This patch fixes the issue by replace the use of strtold by strtod. It reduces a bit the precision, but it should still remain acceptable in most cases. libiberty/ChangeLog: * d-demangle.c: Replace strtold with strtod in global comment. (strtold): Remove declaration. (strtod): New declaration. (dlang_parse_real): Declare value as double instead of long double. Replace call to strtold by call to strtod. Update format in call to snprintf. 
---
 libiberty/d-demangle.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index d31bf94..bb481c0 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -28,7 +28,7 @@ If not, see <http://www.gnu.org/licenses/>. */
 
 /* This file exports one function; dlang_demangle.
 
-   This file imports strtol and strtold for decoding mangled literals. */
+   This file imports strtol and strtod for decoding mangled literals. */
 
 #ifdef HAVE_CONFIG_H
 #include "config.h"
@@ -44,7 +44,7 @@ If not, see <http://www.gnu.org/licenses/>. */
 #include <stdlib.h>
 #else
 extern long strtol (const char *nptr, char **endptr, int base);
-extern long double strtold (const char *nptr, char **endptr);
+extern double strtod (const char *nptr, char **endptr);
 #endif
 
 #include "demangle.h"
@@ -810,7 +810,7 @@ dlang_parse_real (string *decl, const char *mangled)
 {
   char buffer[64];
   int len = 0;
-  long double value;
+  double value;
   char *endptr;
 
   /* Handle NAN and +-INF. */
@@ -877,12 +877,12 @@ dlang_parse_real (string *decl, const char *mangled)
 
   /* Convert buffer from hexadecimal to floating-point. */
   buffer[len] = '\0';
-  value = strtold (buffer, &endptr);
+  value = strtod (buffer, &endptr);
 
   if (endptr == NULL || endptr != (buffer + len))
     return NULL;
 
-  len = snprintf (buffer, sizeof(buffer), "%#Lg", value);
+  len = snprintf (buffer, sizeof(buffer), "%#g", value);
   string_appendn (decl, buffer, len);
   return mangled;
 }
-- 
1.7.9.5
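The core of the change, parsing a hexadecimal floating-point string with `strtod` and printing it with `%#g`, can be tried in isolation. The sketch below is not the libiberty code: `hex_real_to_string` is a hypothetical helper written for this illustration, and it assumes a C library whose `strtod` accepts C99 hexadecimal floating-point input.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Parse a C99-style hexadecimal floating-point literal with strtod and print
// it with "%#g", i.e. the double-precision path the patch switches to.
// Returns an empty string if the input was not consumed completely.
std::string hex_real_to_string(const char* text)
{
  assert(text != nullptr);
  char* endptr = nullptr;
  double value = std::strtod(text, &endptr);
  if (endptr == text || *endptr != '\0')
    return std::string();

  char buffer[64];
  int len = std::snprintf(buffer, sizeof buffer, "%#g", value);
  if (len < 0 || len >= static_cast<int>(sizeof buffer))
    return std::string();
  return std::string(buffer, len);
}
```

For example, `"0x1.5p+5"` denotes 1.3125 * 32 = 42, and `%#g` keeps the trailing zeros of the six significant digits. Values that need more than double precision would be rounded, which is the precision loss the commit message accepts.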
[patch] Make std::align tests depend on stdint.h
Tested x86_64-linux, committed to trunk.
Re: [PATCH] Fix optimize_range_tests_diff
On October 14, 2014 6:02:19 PM CEST, Jakub Jelinek ja...@redhat.com wrote:

Hi!

When hacking on range reassoc opt, I've noticed we can emit code with undefined behavior even when there wasn't one originally, in particular for:

   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U. */

we actually don't transform it to what the comment says, but

   ((X - 43) & ~(75U - 43U)) <= 3U

i.e. the initial subtraction can be performed in signed type, if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction at gimple level would be UB (not caught by -fsanitize=undefined, because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

2014-10-14 Jakub Jelinek ja...@redhat.com

        * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
        MINUS_EXPR in unsigned type to avoid undefined behavior.

--- gcc/tree-ssa-reassoc.c.jj 2014-10-13 17:54:33.0 +0200
+++ gcc/tree-ssa-reassoc.c 2014-10-13 17:58:07.312705218 +0200
@@ -2250,8 +2250,13 @@ optimize_range_tests_diff (enum tree_cod
   if (tree_log2 (tem1) < 0)
     return false;
 
+  type = unsigned_type_for (type);
+  tem1 = fold_convert (type, tem1);
+  tem2 = fold_convert (type, tem2);
+  lowi = fold_convert (type, lowi);
   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
-  tem1 = fold_binary (MINUS_EXPR, type, rangei->exp, lowi);
+  tem1 = fold_binary (MINUS_EXPR, type,
+                      fold_convert (type, rangei->exp), lowi);
   tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
   lowj = build_int_cst (type, 0);
   if (update_range_test (rangei, rangej, 1, opcode, ops, tem1,

        Jakub
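The transformation discussed in this message can be checked at the source level. The sketch below is not GCC code; the function names are invented for illustration. It compares the two-range test with the merged test, and shows the signed variant done the way the patch intends, with the conversion to unsigned happening before the subtraction so that inputs such as INT_MIN wrap instead of overflowing.

```cpp
#include <cassert>
#include <climits>

// Original form: two range checks, written with unsigned arithmetic.
bool in_either_range(unsigned x)
{
  return (x - 43U) <= 3U || (x - 75U) <= 3U;
}

// Merged form: valid because 75U - 43U == 32U is a power of two.
bool in_merged_range(unsigned x)
{
  return ((x - 43U) & ~(75U - 43U)) <= 3U;
}

// Same test for a signed input: convert to unsigned first, then subtract.
bool in_merged_range_signed(int x)
{
  unsigned ux = static_cast<unsigned>(x);
  return ((ux - 43U) & ~(75U - 43U)) <= 3U;
}

// Count inputs in [0, limit) where the two unsigned forms disagree.
unsigned count_mismatches(unsigned limit)
{
  assert(limit > 0);
  unsigned bad = 0;
  for (unsigned x = 0; x < limit; ++x)
    if (in_either_range(x) != in_merged_range(x))
      ++bad;
  return bad;
}
```

The merged test accepts exactly the values whose offset from 43 is in 0..3 or 32..35, i.e. 43..46 and 75..78, which is why clearing bit 5 of the difference folds the two ranges into one comparison.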
Re: [PATCH] Fix optimize_range_tests_diff
On 10/14/14 10:02, Jakub Jelinek wrote:

Hi!

When hacking on range reassoc opt, I've noticed we can emit code with undefined behavior even when there wasn't one originally, in particular for:

   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U. */

we actually don't transform it to what the comment says, but

   ((X - 43) & ~(75U - 43U)) <= 3U

i.e. the initial subtraction can be performed in signed type, if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction at gimple level would be UB (not caught by -fsanitize=undefined, because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14 Jakub Jelinek ja...@redhat.com

        * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
        MINUS_EXPR in unsigned type to avoid undefined behavior.

Any chance this fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

Jeff
[patch] Update libstdc++ status docs
Committed to trunk.

commit a94516a841a0588c6c7bf95248c2eaefd5e406f1
Author: Jonathan Wakely jwak...@redhat.com
Date: Tue Oct 14 18:21:03 2014 +0100

        * doc/xml/manual/intro.xml: Update.
        * doc/xml/manual/status_cxx2011.xml: Update.
        * doc/html/manual/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index a71a9f9..2dd833d 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -803,6 +803,13 @@ requirements of the license of GCC.
     <listitem><para>The traditional HP / SGI return type and value is blessed
        by the resolution of the DR.
     </para></listitem></varlistentry>
+
+    <varlistentry><term><link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="../ext/lwg-defects.html#1339">1339</link>:
+       <emphasis>uninitialized_fill_n should return the end of its range</emphasis>
+    </term>
+    <listitem><para>Return the end of the filled range.
+    </para></listitem></varlistentry>
+
   </variablelist>
 
  </section>
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index c4b4457..a553adf 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -539,10 +539,9 @@ particular release.
       <entry/>
     </row>
     <row>
-      <?dbhtml bgcolor="#C8B0B0" ?>
       <entry>20.6.5</entry>
       <entry>Align</entry>
-      <entry>N</entry>
+      <entry>Y</entry>
       <entry/>
     </row>
     <row>
@@ -2139,7 +2138,7 @@ particular release.
       <entry>Formatting and manipulators</entry>
       <entry>Partial</entry>
       <entry>
-        Missing <code>get_time</code> and <code>put_time</code> manipulators.
+        Missing <code>get_time</code> manipulator.
       </entry>
     </row>
     <row>
Re: [PATCH] Add D demangling support to libiberty
On Tue, Oct 14, 2014 at 10:07 AM, Joel Brobecker brobec...@adacore.com wrote: libiberty/ChangeLog: * d-demangle.c: Replace strtold with strtod in global comment. (strtold): Remove declaration. (strtod): New declaration. (dlang_parse_real): Declare value as double instead of long double. Replace call to strtold by call to strtod. Update format in call to snprintf. This is OK. Thanks. Ian
Re: [PATCH] Add D demangling support to libiberty
On 14 October 2014 18:07, Joel Brobecker brobec...@adacore.com wrote: I've just seen this, so I'll repeat what I've said in gdb patches too. The call to strtold is only needed to decode templates which have a floating point value encoded inside. This value may or may not have a greater than double precision. Replacing long double with double will be fine with me. I'll accept that I didn't consider legacy in hindsight, and in reality it would be rather rare to stumble upon the need for strtold. Attached is a patch that switches it to strtod. Do you have any test that could quickly verify it? That seems to be the best approach, at least short-term. Later on, if we do want to use higher precision, we can indeed add strtold in libiberty. See d-demangle-expected in the libiberty testsuite, in particular: _D8demangle17__T4testVde0A8P6Zv demangle.test!(42.) _D8demangle16__T4testVdeA8P2Zv demangle.test!(42.) _D8demangle18__T4testVdeN0A8P6Zv demangle.test!(-42.) _D8demangle31__T4testVde0F6E978D4FDF3B646P7Zv demangle.test!(123.456) I doubt they would need adjusting. Regards Iain
Re: [PATCH] Fix optimize_range_tests_diff
On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote:

On 10/14/14 10:02, Jakub Jelinek wrote:

When hacking on range reassoc opt, I've noticed we can emit code with undefined behavior even when there wasn't one originally, in particular for:

   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U. */

we actually don't transform it to what the comment says, but

   ((X - 43) & ~(75U - 43U)) <= 3U

i.e. the initial subtraction can be performed in signed type, if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction at gimple level would be UB (not caught by -fsanitize=undefined, because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14 Jakub Jelinek ja...@redhat.com

        * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
        MINUS_EXPR in unsigned type to avoid undefined behavior.

Any chance this fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

No. For that I have right now:

-  if (tree_log2 (lowxor) < 0)
+  if (wi::popcount (wi::to_widest (lowxor)) != 1)

in my tree, though supposedly:

  if (wi::popcount (wi::zext (wi::to_widest (lowxor), TYPE_PRECISION (TREE_TYPE (lowxor)))) != 1)

might be better, as without zext it will supposedly not say popcount is 1 for smaller precision signed minimum values. My wide-int-fu is limited, so if there is a better way to do this, I'm all ears.

        Jakub
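The concern about zero extension can be illustrated with plain fixed-width integers. This sketch does not use GCC's wide-int API; `popcount64`, `zext`, and the two `single_bit_*` helpers are stand-ins invented for this illustration, with an 8-bit signed value playing the role of a "smaller precision" constant widened to 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
int popcount64(std::uint64_t v)
{
  int n = 0;
  for (; v != 0; v &= v - 1)  // clear the lowest set bit on each iteration
    ++n;
  return n;
}

// Keep only the low `precision` bits, i.e. zero-extend from that width.
std::uint64_t zext(std::uint64_t v, unsigned precision)
{
  assert(precision >= 1 && precision <= 64);
  return precision == 64 ? v : v & ((std::uint64_t(1) << precision) - 1);
}

// Single-bit test after sign-extending an 8-bit value to 64 bits.
bool single_bit_sign_extended(std::int8_t v)
{
  std::uint64_t wide = static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
  return popcount64(wide) == 1;
}

// Same test, but truncated back to the original 8-bit precision first.
bool single_bit_zero_extended(std::int8_t v)
{
  std::uint64_t wide = static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
  return popcount64(zext(wide, 8)) == 1;
}
```

The 8-bit signed minimum (-128) has a single bit set within its own precision, but sign extension fills the upper 56 bits with ones, so only the zero-extended variant reports it as a single-bit value. Whether wide-int behaves the same way is exactly what the message above leaves open ("supposedly").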
Re: [PATCH] Fix optimize_range_tests_diff
On 10/14/14 11:40, Jakub Jelinek wrote:

On Tue, Oct 14, 2014 at 11:23:22AM -0600, Jeff Law wrote:

On 10/14/14 10:02, Jakub Jelinek wrote:

When hacking on range reassoc opt, I've noticed we can emit code with undefined behavior even when there wasn't one originally, in particular for:

   (X - 43U) <= 3U || (X - 75U) <= 3U
   and this loop can transform that into
   ((X - 43U) & ~(75U - 43U)) <= 3U. */

we actually don't transform it to what the comment says, but

   ((X - 43) & ~(75U - 43U)) <= 3U

i.e. the initial subtraction can be performed in signed type, if in here X is e.g. INT_MIN or INT_MIN + 42, the subtraction at gimple level would be UB (not caught by -fsanitize=undefined, because that is handled much earlier).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-10-14 Jakub Jelinek ja...@redhat.com

        * tree-ssa-reassoc.c (optimize_range_tests_diff): Perform
        MINUS_EXPR in unsigned type to avoid undefined behavior.

Any chance this fixes: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63302

No. For that I have right now:

-  if (tree_log2 (lowxor) < 0)
+  if (wi::popcount (wi::to_widest (lowxor)) != 1)

in my tree, though supposedly:

  if (wi::popcount (wi::zext (wi::to_widest (lowxor), TYPE_PRECISION (TREE_TYPE (lowxor)))) != 1)

might be better, as without zext it will supposedly not say popcount is 1 for smaller precision signed minimum values. My wide-int-fu is limited, so if there is a better way to do this, I'm all ears.

Ok. Thanks for checking.

jeff
Re: [PATCH, Pointer Bounds Checker 14/x] Passes [4/n] Memory accesses instrumentation
On 10/14/14 04:08, Ilya Enkovich wrote:

Are you just looking for the parameter in which we pass the static chain? Look at get_chain_decl for how we set it up. You may actually have to peek at more fields. I don't think there's a single magic bit that says this is the static chain. Though it may always appear in the same location on the parameter list. Nested functions aren't something I'd poked with much. Richard Henderson might know more since he wrote tree-nested a while back.

Looking through tree-nested.c I found there is a static_chain_decl in function structure holding created decl.

Perfect.

Ugh. Note how this introduces another place that anyone who might add a new RHS gimple statement needs to edit. We need a pointer back to this code so that folks will know it needs updating. The question is where to put it. Basically we want a place where anyone adding a new code that can appear on the RHS of an assignment must change already. Thoughts on a good location? I realize there's probably many other places that probably need these kinds of documentation back links, I'm not asking you to address all of them.

Actually it shouldn't be so critical to meet some new RHS code in this switch. We may always say that we cannot find proper bounds and use default ones. I replaced gcc_unreachable with a warning about lost bounds and added a comment into tree.def. Would it be enough?

It'd be better than hitting the gcc_unreachable :-) It's not perfect, but probably good enough.

Jeff
Re: [PATCH] Add D demangling support to libiberty
libiberty/ChangeLog: * d-demangle.c: Replace strtold with strtod in global comment. (strtold): Remove declaration. (strtod): New declaration. (dlang_parse_real): Declare value as double instead of long double. Replace call to strtold by call to strtod. Update format in call to snprintf. This is OK. Thanks, Ian. As suggested by Iain, I re-ran the libiberty testsuite on x86_64-linux before committing the patch. Thank you both! -- Joel
Re: [PATCH 2/3] libstdc++: Add put_time support.
On Tuesday 14 October 2014 18:01:59 Jonathan Wakely wrote: So let's just test the full name and not worry about how it's abbreviated. Tested x86_64-linux, committed to trunk. Sorry for causing the trouble. I had it tested on my local machine. Maybe the de_DE.utf8 locale is different. Anyway testing for the full name is probably a better idea. Thanks. Regards, Rüdiger.
Re: [PATCH, DWARF] re-init dw_frame_pointer_regnum between functions
On 10/14/2014 06:02 AM, Christian Bruel wrote: 2014-09-23 Christian Bruel christian.br...@st.com * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each function. It's tempting to make this a local variable within dwarf2out_frame_debug_expr and not try to cache it at all. But this is ok. r~
Re: [PATCH, DWARF] re-init dw_frame_pointer_regnum between functions
On 10/14/2014 11:25 AM, Richard Henderson wrote:

On 10/14/2014 06:02 AM, Christian Bruel wrote:

2014-09-23 Christian Bruel christian.br...@st.com

        * execute_dwarf2_frame (dw_frame_pointer_regnum): Reinitialize for each function.

It's tempting to make this a local variable within dwarf2out_frame_debug_expr and not try to cache it at all. But this is ok.

For the record, this also points out that the arm backend ought to be weaned away from using dwarf2out_frame_debug_expr and use the REG_CFA_* notes exclusively. That would also fix an apparent error in arm_expand_prologue:

      if (IS_INTERRUPT (func_type))
        {
          /* Interrupt functions must not corrupt any registers.
             Creating a frame pointer however, corrupts the IP
             register, so we must push it first. */
          emit_multi_reg_push (1 << IP_REGNUM, 1 << IP_REGNUM);

          /* Do not set RTX_FRAME_RELATED_P on this insn.
             The dwarf stack unwinding code only wants to see one
             stack decrement per function, and this is not it. If
             this instruction is labeled as being part of the frame
             creation sequence then dwarf2out_frame_debug_expr will
             die when it encounters the assignment of IP to FP
             later on, since the use of SP here establishes SP as
             the CFA register and not IP.

             Anyway this instruction is not really part of the stack
             frame creation although it is part of the prologue. */

Certainly dwarf2cfi can handle arbitrary REG_CFA_ADJUST_CFA notes; it's just the frame_debug_expr state machine that gets confused.

r~
Re: [PATCH, rtl-optimization] Fix PR63475, Postreload CSE propagates aliased memory operand
On 10/14/14 01:11, Uros Bizjak wrote: 2014-10-14 Uros Bizjak ubiz...@gmail.com PR rtl-optimization/63475 * alias.c (true_dependence_1): Always use get_addr to extract true address operands from x_addr and mem_addr. Use extracted address operands to check for references with alignment ANDs. Use extracted address operands with find_base_term and base_alis_check. For noncanonicalized operands call canon_rtx with extracted address operand. (write_dependence_1): Ditto. (may_alias_p): Ditto. Remove unused calls to canon_rtx.

s/alis/alias in the ChangeLog

Patch was thoroughly tested on x86_64-linux-gnu {,-m32} and alpha-linux-gnu for all default languages plus obj-c++ and go. While there were no differences on x86_64-linux-gnu (as expected), alpha-linux-gnu improved the result [1] for some hundreds of PASSes in the gfortran testsuite [2]. OK for mainline?

OK. No additional tests needed since this is covered by the existing suite. jeff
Re: [PATCH i386 AVX512] [56/n] Add plus/minus/abs/neg/andnot insn patterns.
On Tue, Oct 14, 2014 at 9:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, It seems I forgot to post the updated patch. On 25 Sep 20:11, Uros Bizjak wrote: I'd rather go with the second approach, it is less confusing from the maintainer POV. All other patterns with masking use some consistent template, so I'd suggest using the same approach for everything. If it is indeed too many patterns, then please split the patch to smaller pieces.

The goal was not to decrease the size of the patch; I wanted to make the patterns look simpler by hiding the masking stuff behind `subst'. Anyway, I've updated the patch. Here it is (bootstrapped and regtested). Is it ok for trunk?

gcc/
* config/i386/sse.md (define_mode_iterator VI_AVX2): Extend to support AVX-512BW.
(define_mode_iterator VI124_AVX2_48_AVX512F): Remove.
(define_expand <plusminus_insn><mode>3): Remove masking support.
(define_insn *<plusminus_insn><mode>3): Ditto.
(define_expand <plusminus_insn><VI48_AVX512VL:mode>3_mask): New.
(define_expand <plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
(define_insn *<plusminus_insn><VI48_AVX512VL:mode>3_mask): Ditto.
(define_insn *<plusminus_insn><VI12_AVX512VL:mode>3_mask): Ditto.
(define_expand <sse2_avx2>_andnot<mode>3): Remove masking support.
(define_insn *andnot<mode>3): Ditto.
(define_expand <sse2_avx2>_andnot<VI48_AVX512VL:mode>3_mask): New.
(define_expand <sse2_avx2>_andnot<VI12_AVX512VL:mode>3_mask): Ditto.
(define_insn *andnot<VI48_AVX512VL:mode>3<mask_name>): Ditto.
(define_insn *andnot<VI12_AVX512VL:mode>3<mask_name>): Ditto.
(define_insn *abs<mode>2): Remove masking support.
(define_insn abs<VI48_AVX512VL:mode>2_mask): New.
(define_insn abs<VI12_AVX512VL:mode>2_mask): Ditto.
(define_expand abs<mode>2): Use VI_AVX2 mode iterator.

IMO, it seems much more readable this way. OK for mainline. Thanks, Uros.
Re: [PATCH 2/2] Fix ILP32 ld.so.
On 08/08/2014 08:51 PM, Andrew Pinski wrote: ChangeLog: * explow.c (convert_memory_address_addr_space): Rename to ... (convert_memory_address_addr_space_1): This. Add in_const argument. Inside a CONST RTL, permute the conversion and addition of constant for zero and sign extended pointers. (convert_memory_address_addr_space): New function. Ok, with one nit... +((in_const POINTERS_EXTEND_UNSIGNED !=0) Missing space after != r~
[PATCH v2 03/13] Allow the static chain to be set from C
Replacing the hacky v1 with the proposed syntax relayed by PCC, and changing the name to __builtin_call_with_static_chain. Which is kinda long, but at least it's more properly descriptive. Adds documentation and an errors test case. r~ From 7e31234f2e112bad576b748b2ff6cc615194c0f7 Mon Sep 17 00:00:00 2001 From: Richard Henderson r...@redhat.com Date: Tue, 7 Oct 2014 12:17:28 -0700 Subject: [PATCH 03/13] Allow the static chain to be set from C We need to be able to set the static chain on a few calls within the Go runtime, so expose this with __builtin_call_with_static_chain. --- gcc/c-family/c-common.c | 2 ++ gcc/c-family/c-common.h | 2 +- gcc/c/c-parser.c | 40 gcc/doc/extend.texi | 13 + gcc/testsuite/gcc.dg/cwsc0.c | 18 ++ gcc/testsuite/gcc.dg/cwsc1.c | 31 +++ 6 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 23163f5..f1bf47b 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -442,6 +442,8 @@ const struct c_common_resword c_common_reswords[] = { __attribute__, RID_ATTRIBUTE, 0 }, { __auto_type, RID_AUTO_TYPE, D_CONLY }, { __bases, RID_BASES, D_CXXONLY }, + { __builtin_call_with_static_chain, +RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY }, { __builtin_choose_expr, RID_CHOOSE_EXPR, D_CONLY }, { __builtin_complex, RID_BUILTIN_COMPLEX, D_CONLY }, { __builtin_shuffle, RID_BUILTIN_SHUFFLE, 0 }, diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 1e3477f..da1c12e 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -102,7 +102,7 @@ enum rid RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL, RID_CHOOSE_EXPR, RID_TYPES_COMPATIBLE_P, RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE, RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128, - RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, + RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, RID_BUILTIN_CALL_WITH_STATIC_CHAIN, /* C11 */ RID_ALIGNAS, 
RID_GENERIC, diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 346448a..708a125 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -7372,6 +7372,46 @@ c_parser_postfix_expression (c_parser *parser) = comptypes (e1, e2) ? integer_one_node : integer_zero_node; } break; + case RID_BUILTIN_CALL_WITH_STATIC_CHAIN: + { + vecc_expr_t, va_gc *cexpr_list; + c_expr_t *e2_p; + tree chain_value; + + c_parser_consume_token (parser); + if (!c_parser_get_builtin_args (parser, + __builtin_call_with_static_chain, + cexpr_list, false)) + { + expr.value = error_mark_node; + break; + } + if (vec_safe_length (cexpr_list) != 2) + { + error_at (loc, wrong number of arguments to + %__builtin_call_with_static_chain%); + expr.value = error_mark_node; + break; + } + + expr = (*cexpr_list)[0]; + e2_p = (*cexpr_list)[1]; + *e2_p = convert_lvalue_to_rvalue (loc, *e2_p, true, true); + chain_value = e2_p-value; + mark_exp_read (chain_value); + + if (TREE_CODE (expr.value) != CALL_EXPR) + error_at (loc, first argument to + %__builtin_call_with_static_chain% + must be a call expression); + else if (TREE_CODE (TREE_TYPE (chain_value)) != POINTER_TYPE) + error_at (loc, second argument to + %__builtin_call_with_static_chain% + must be a pointer type); + else + CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value; + break; + } case RID_BUILTIN_COMPLEX: { vecc_expr_t, va_gc *cexpr_list; diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 6db142e..f092ea1 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8639,6 +8639,7 @@ in the Cilk Plus language manual which can be found at @node Other Builtins @section Other Built-in Functions Provided by GCC @cindex built-in functions +@findex __builtin_call_with_static_chain @findex __builtin_fpclassify @findex __builtin_isfinite @findex __builtin_isnormal @@ -9227,6 +9228,18 @@ depending on the arguments' types. 
For example: @end deftypefn +@deftypefn {Built-in Function} @var{type} __builtin_call_with_static_chain (@var{call_exp}, @var{pointer_exp}) + +The @var{call_exp} expression must be a function call, and the +@var{pointer_exp} expression must be a pointer. The @var{pointer_exp} +is passed to the function call in the target's static chain location. +The result of builtin is the result of the function call. + +@emph{Note:} This builtin is only available for C@. +This builtin can be used to call Go closures from C. + +@end deftypefn + @deftypefn {Built-in Function} @var{type} __builtin_choose_expr (@var{const_exp}, @var{exp1}, @var{exp2}) You can use the built-in function @code{__builtin_choose_expr} to diff --git
Re: [PATCH] Fix typo in comment for IRA
On 10/13/14 20:49, Kito Cheng wrote: Hi Marc: - -1 if it is not a cost classe. */ + -1 if it is not a cost classes. */ a cost class, no plural here. Thank you for correcting me :) Hi Jeff: Thanks, the updated patch is attached. However, I don't have commit rights yet; could you commit it for me? Thanks.

Done. Thanks. jeff
Re: RFA: fix mode confusion in caller-save.c:replace_reg_with_saved_mem
On 10/13/14 18:16, Joern Rennecke wrote: On 13 October 2014 20:43, Jeff Law l...@redhat.com wrote: ... I think you want smode in the mode_for_size call rather than mode, right (both instances)?

No, nregs is the number of hard registers of regno in mode. Hence we must use the size of mode.

OK. My bad.

To get some case where there's a difference, I was thinking of an architecture that has partial integer mode registers that can be grouped together as integral integer mode registers (e.g. one reg is HImode or PSImode, save_mode would be PSImode, two regs form SImode). In that case, you'd want something so that you can piece together mode, i.e. either GET_MODE_CLASS (mode) or MODE_INT (which happen to be again the same), but not GET_MODE_CLASS (smode), which would be MODE_PARTIAL_INT.

You're right. We definitely don't want MODE_PARTIAL_INT here. So if your patch resolves your issue, passes the usual bootstrap/regression test, then let's go with it. jeff
Re: __intN patch 3/5: main __int128 - __intN conversion.
extensions. Is this OK? If so, is there anything else, or can I check the whole mess in yet? Go ahead. Thanks! Committed.
Re: __intN patch 3/5: main __int128 - __intN conversion.
On 2014.08.25 at 23:03 -0400, DJ Delorie wrote: I'd like to see the updated version of the whole of patch 3 (tested to be actually independent of the other patches) for review, though I won't be reviewing the C++ parts. Here it is. Tested on x86_64. I include the msp430-modes.def patch for demonstration purposes although obviously msp430's __int20 won't work without the other patches. This patch breaks ppc64: ../../gcc/gcc/config/rs6000/rs6000-c.c: In function ‘cpp_hashnode* rs6000_macro_to_expand(cpp_reader*, const cpp_token*)’: ../../gcc/gcc/config/rs6000/rs6000-c.c:237:24: error: ‘RID_INT128’ was not declared in this scope make[3]: *** [rs6000-c.o] Error 1 -- Markus
Re: New rematerialization sub-pass in LRA
On Fri, 2014-10-10 at 11:02 -0400, Vladimir Makarov wrote: Here is a new rematerialization sub-pass of LRA. When Mike and I build with this patch along with the patch that enables LRA by default on powerpc64*-linux (attached below), we're seeing the following error message. I'm not sure how your patch can cause this error, but it does go away if we remove your patch and build again. Peter # Enable LRA by default Index: gcc/config/rs6000/rs6000.opt === --- gcc/config/rs6000/rs6000.opt(revision 216216) +++ gcc/config/rs6000/rs6000.opt(working copy) @@ -466,7 +466,7 @@ Target RejectNegative Joined UInteger Va -mlong-double-n Specify size of long double (64 or 128 bits) mlra -Target Report Var(rs6000_lra_flag) Init(0) Save +Target Report Var(rs6000_lra_flag) Init(1) Save Use LRA instead of reload msched-costly-dep= Error message caused by LRA Rematerialization patch: make[5]: Entering directory `/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include' mkdir -p ./powerpc64-linux/bits/stdc++.h.gch /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/./gcc/xgcc -shared-libgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/./gcc -nostdinc++ -L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/src -L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/src/.libs -L/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/libsupc++/.libs -B/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/bin/ -B/home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/lib/ -isystem /home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/include -isystem /home/bergner/gcc/install/gcc-fsf-mainline-lra-remat/powerpc64-linux/sys-include -x c++-header -nostdinc++ -g -O2 -D_GNU_SOURCE -I/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/powerpc64-linux 
-I/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include -I/home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/libsupc++ -O2 -g -std=gnu++0x /home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/include/precompiled/stdc++.h \
-o powerpc64-linux/bits/stdc++.h.gch/O2ggnu++0x.gch
In file included from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/move.h:57:0,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_pair.h:59,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_algobase.h:64,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/char_traits.h:39,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/ios:40,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/istream:38,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/sstream:38,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/complex:45,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/ccomplex:38,
                 from /home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/include/precompiled/stdc++.h:52:
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:251:12: error: redefinition of ‘struct std::__is_integral_helper<unsigned int>’
     struct __is_integral_helper<unsigned __int128>
            ^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:226:12: error: previous definition of ‘struct std::__is_integral_helper<unsigned int>’
     struct __is_integral_helper<unsigned int>
            ^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:1763:12: error: redefinition of ‘struct std::__make_signed<unsigned int>’
     struct __make_signed<unsigned __int128>
            ^
/home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/type_traits:1735:12: error: previous definition of ‘struct std::__make_signed<unsigned int>’
     struct __make_signed<unsigned int>
            ^
In file included from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/random:42:0,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/bits/stl_algo.h:66,
                 from /home/bergner/gcc/build/gcc-fsf-mainline-lra-remat/powerpc64-linux/libstdc++-v3/include/algorithm:62,
                 from /home/bergner/gcc/gcc-fsf-mainline-bootstrap/libstdc++-v3/include/precompiled/stdc++.h:64:
Fix PR ada/62019
Someone broke again weak external symbols in Ada in exactly the same way as: https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00431.html probably during the ongoing C++ reshuffling: FAIL: gnat.dg/weak2.adb (test for excess errors)

Tested on x86_64-suse-linux, applied on the mainline as obvious.

2014-10-14 Eric Botcazou ebotca...@adacore.com PR ada/62019 * tree-eh.c (tree_could_trap) <FUNCTION_DECL>: Revamp and really do not choke on null node. <VAR_DECL>: Likewise.

-- Eric Botcazou

Index: tree-eh.c
===
--- tree-eh.c (revision 216193)
+++ tree-eh.c (working copy)
@@ -2657,15 +2657,12 @@ tree_could_trap_p (tree expr)
       /* Assume that accesses to weak functions may trap, unless we know
          they are certainly defined in current TU or in some other
          LTO partition.  */
-      if (DECL_WEAK (expr) && !DECL_COMDAT (expr))
+      if (DECL_WEAK (expr) && !DECL_COMDAT (expr) && DECL_EXTERNAL (expr))
         {
-          struct cgraph_node *node;
-          if (!DECL_EXTERNAL (expr))
-            return false;
-          node = cgraph_node::get (expr)->function_symbol ();
-          if (node && node->in_other_partition)
-            return false;
-          return true;
+          cgraph_node *node = cgraph_node::get (expr);
+          if (node)
+            node = node->function_symbol ();
+          return !(node && node->in_other_partition);
         }
       return false;
@@ -2673,15 +2670,12 @@ tree_could_trap_p (tree expr)
       /* Assume that accesses to weak vars may trap, unless we know
          they are certainly defined in current TU or in some other
          LTO partition.  */
-      if (DECL_WEAK (expr) && !DECL_COMDAT (expr))
+      if (DECL_WEAK (expr) && !DECL_COMDAT (expr) && DECL_EXTERNAL (expr))
         {
-          varpool_node *node;
-          if (!DECL_EXTERNAL (expr))
-            return false;
-          node = varpool_node::get (expr)->ultimate_alias_target ();
-          if (node && node->in_other_partition)
-            return false;
-          return true;
+          varpool_node *node = varpool_node::get (expr);
+          if (node)
+            node = node->ultimate_alias_target ();
+          return !(node && node->in_other_partition);
         }
       return false;
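The shape of the fix (look the node up, resolve the alias only if the lookup succeeded, then test the partition flag) can be shown with a small self-contained sketch. `Node`, `SymbolTable`, `lookup` and `weak_external_could_trap` are hypothetical miniatures written for this illustration; they are not GCC's cgraph_node or varpool_node API.

```cpp
#include <map>
#include <string>

// Hypothetical miniature of a symbol-table node.
struct Node {
  Node *alias_target = nullptr;
  bool in_other_partition = false;

  // Follow the alias chain to its final target.
  Node *ultimate_target() {
    Node *n = this;
    while (n->alias_target)
      n = n->alias_target;
    return n;
  }
};

using SymbolTable = std::map<std::string, Node *>;

// Returns nullptr when the symbol has no node, which is the case the
// original code dereferenced unconditionally.
Node *lookup(const SymbolTable &table, const std::string &name) {
  auto it = table.find(name);
  return it == table.end() ? nullptr : it->second;
}

// A weak external symbol is assumed to trap unless it is known to be
// defined in another partition.
bool weak_external_could_trap(const SymbolTable &table,
                              const std::string &name) {
  Node *node = lookup(table, name);
  if (node)
    node = node->ultimate_target();  // only resolve aliases on a valid node
  return !(node && node->in_other_partition);
}
```

With a missing node the function conservatively answers "could trap" instead of crashing, which matches the behaviour the ChangeLog describes as not choking on a null node.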
Re: [PATCH] AutoFDO patch for trunk
On Tue, Oct 14, 2014 at 8:02 AM, Jan Hubicka hubi...@ucw.cz wrote: Index: gcc/cgraphclones.c === --- gcc/cgraphclones.c(revision 215826) +++ gcc/cgraphclones.c(working copy) @@ -453,6 +453,11 @@ } else count_scale = 0; + /* In AutoFDO, if edge count is larger than callee's entry block + count, we will not update the original callee because it may + mistakenly mark some hot function as cold. */ + if (flag_auto_profile gcov_count = count) +update_original = false; lets drop this from initial patch. Done Index: gcc/bb-reorder.c === --- gcc/bb-reorder.c (revision 215826) +++ gcc/bb-reorder.c (working copy) @@ -1569,15 +1569,14 @@ /* Mark which partition (hot/cold) each basic block belongs in. */ FOR_EACH_BB_FN (bb, cfun) { - bool cold_bb = false; + bool cold_bb = probably_never_executed_bb_p (cfun, bb); and this too (basically all the tweaks should IMO go in independently and ideally in a way that does not need flag_auto_profile test). Done. +/* Return true if BB contains indirect call. */ + +static bool +has_indirect_call (basic_block bb) +{ + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + if (gimple_code (stmt) == GIMPLE_CALL +(gimple_call_fn (stmt) == NULL + || TREE_CODE (gimple_call_fn (stmt)) != FUNCTION_DECL)) You probably want to skip gimple_call_internal_p calls here. Done + +/* From AutoFDO profiles, find values inside STMT for that we want to measure + histograms for indirect-call optimization. 
*/
+
+static void
+afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
+                    bool transform)
+{
+  gimple stmt = gsi_stmt (*gsi);
+  tree callee;
+
+  if (map.size() == 0 || gimple_code (stmt) != GIMPLE_CALL
+      || gimple_call_fndecl (stmt) != NULL_TREE)
+    return;
+
+  callee = gimple_call_fn (stmt);
+
+  histogram_value hist = gimple_alloc_histogram_value (
+      cfun, HIST_TYPE_INDIR_CALL, stmt, callee);
+  hist->n_counters = 3;
+  hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
+  gimple_add_histogram_value (cfun, stmt, hist);
+
+  gcov_type total = 0;
+  icall_target_map::const_iterator max_iter = map.end();
+
+  for (icall_target_map::const_iterator iter = map.begin();
+       iter != map.end(); ++iter)
+    {
+      total += iter->second;
+      if (max_iter == map.end() || max_iter->second < iter->second)
+        max_iter = iter;
+    }
+
+  hist->hvalue.counters[0] = (unsigned long long)
+      afdo_string_table->get_name (max_iter->first);
+  hist->hvalue.counters[1] = max_iter->second;
+  hist->hvalue.counters[2] = total;
+
+  if (!transform)
+    return;
+
+  if (gimple_ic_transform (gsi))
+    {
+      struct cgraph_edge *indirect_edge =
+          cgraph_node::get (current_function_decl)->get_edge (stmt);
+      struct cgraph_node *direct_call =
+          find_func_by_profile_id ((int)hist->hvalue.counters[0]);
+      if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
+        return;
+      struct cgraph_edge *new_edge =
+          indirect_edge->make_speculative (direct_call, 0, 0);
+      new_edge->redirect_call_stmt_to_callee ();
+      gimple_remove_histogram_value (cfun, stmt, hist);
+      inline_call (new_edge, true, NULL, NULL, false);
+      return;
+    }
+  return;

Is it necessary to go via histogram and gimple_ic_transform here? I would expect that all you need is to make the speculative edge and inline it (bypassing the work of producing a fake histogram value and calling gimple_ic_transform on it). Also it seems to me that you want to set the direct_count and frequency arguments of make_speculative so the resulting function profile is not off.
This function actually serves two purposes:
* before annotation, we need to mark the histogram, promote and inline
* after annotation, we just need to mark, and let the follow-up logic decide whether it needs to promote and inline.
And you are right, for the before-annotation case we can simply mark the edge speculative and inline. But we still need the logic to fake the histogram for the after-annotation case. As a result, I unified the two cases into one function to reuse as much code as possible. Shall I separate it into two functions instead?

The rest of interfaces seems quite sane now. Can you please look into using speculative edges directly instead of hooking into the vpt infrastructure and fixing the formatting issues of the new pass?

I'll work on the formatting issues now (need to learn the format first ;-). The attached patch is up-to-date except for formatting changes. I'll upload the patch again once the format change is in. Thanks, Dehao

I will try to make another pass over
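The counter-filling loop in afdo_indirect_call boils down to one pass that sums all target counts and remembers the largest one. A minimal sketch of that step follows; `TargetMap`, `HottestTarget` and `find_hottest_target` are names invented here, and the unsigned callee id is only a stand-in for whatever key the real icall_target_map uses.

```cpp
#include <cstdint>
#include <map>

// Hypothetical profile data: callee id -> sample count at one call site.
using TargetMap = std::map<unsigned, std::uint64_t>;

struct HottestTarget {
  bool valid;           // false when the map was empty
  unsigned callee;      // id of the most frequent target
  std::uint64_t count;  // samples attributed to that target
  std::uint64_t total;  // samples over all targets
};

// Single pass: accumulate the total and track the maximum, mirroring the
// three histogram counters (target, its count, total) filled in the patch.
HottestTarget find_hottest_target(const TargetMap &map) {
  HottestTarget result{false, 0, 0, 0};
  for (const auto &entry : map) {
    result.total += entry.second;
    if (!result.valid || result.count < entry.second) {
      result.valid = true;
      result.callee = entry.first;
      result.count = entry.second;
    }
  }
  return result;
}
```

A caller could compare `count` against `total` before deciding whether speculation is worthwhile; that policy is outside what the quoted patch shows and is left out of the sketch.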
Re: __intN patch 3/5: main __int128 - __intN conversion.
../../gcc/gcc/config/rs6000/rs6000-c.c:237:24: error: ‘RID_INT128’ was not declared in this scope

Two options:

1. If you know the RS6000 will never have any __intN other than __int128, just use RID_INT_N_0. This is a hack, but it will work as long as there *is* an __int128 for RS6000.

2. Alternately, you need to check all entries in the __intN array for the proper size, which is more correct but more complex.

Would you like me to work on the second option, or would you prefer to tackle this yourself?
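The second option amounts to a linear search over the per-target __intN table for an enabled entry of the wanted width. The sketch below only illustrates that search; `IntNEntry` and `find_int_n_index` are hypothetical, and the real table layout in GCC may differ.

```cpp
#include <cstddef>

// Hypothetical mirror of one entry of a per-target __intN table.
struct IntNEntry {
  unsigned bitsize;  // width of the __intN type in bits
  bool enabled;      // whether the type is usable on this target
};

// Returns the index of the first enabled entry with the requested width,
// or -1 if the target provides no such type.
int find_int_n_index(const IntNEntry *entries, std::size_t count,
                     unsigned bitsize) {
  for (std::size_t i = 0; i < count; ++i)
    if (entries[i].enabled && entries[i].bitsize == bitsize)
      return static_cast<int>(i);
  return -1;
}
```

Compared with hard-coding the first slot, the search keeps working if a target later gains another __intN type ahead of the 128-bit one, at the cost of handling the "not found" case.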
Re: __intN patch 5/5: msp430-specific changes
This is the MSP430-specific use of the new intN framework to enable true 20-bit pointers. Since I'm one of the MSP430 maintainers, this patch is being posted for reference, not for approval. Now that the other parts are committed, I'm checking this one in too. gcc/config/msp430 * config/msp430/msp430-modes.def (PSI): Add. * config/msp430/msp430-protos.h (msp430_hard_regno_nregs_has_padding): New. (msp430_hard_regno_nregs_with_padding): New. * config/msp430/msp430.c (msp430_scalar_mode_supported_p): New. (msp430_hard_regno_nregs_has_padding): New. (msp430_hard_regno_nregs_with_padding): New. (msp430_unwind_word_mode): Use PSImode instead of SImode. (msp430_addr_space_legitimate_address_p): New. (msp430_asm_integer): New. (msp430_init_dwarf_reg_sizes_extra): New. (msp430_print_operand): Use X suffix for PSImode even in small model. * config/msp430/msp430.h (POINTER_SIZE): Use 20 bits, not 32. (PTR_SIZE): ...but 4 bytes for EH. (SIZE_TYPE): Use __int20. (PTRDIFF_TYPE): Likewise. (INCOMING_FRAME_SP_OFFSET): Adjust. * config/msp430/msp430.md (movqi_topbyte): New. (movpsi): Use fixed suffixes. (movsipsi2): Enable for 430X, not large model. (extendhipsi2): Likewise. (zero_extendhisi2): Likewise. (zero_extendhisipsi2): Likewise. (extend_and_shift1_hipsi2): Likewise. (extendpsisi2): Likewise. (*bitbranchmode4_z): Fix suffix logic. 
Index: gcc/config/msp430/msp430-protos.h === --- gcc/config/msp430/msp430-protos.h (revision 213886) +++ gcc/config/msp430/msp430-protos.h (working copy) @@ -27,12 +27,15 @@ void msp430_expand_epilogue (int); void msp430_expand_helper (rtx *operands, const char *, bool); void msp430_expand_prologue (void); const char * msp430x_extendhisi (rtx *); void msp430_fixup_compare_operands (enum machine_mode, rtx *); int msp430_hard_regno_mode_ok (int, enum machine_mode); int msp430_hard_regno_nregs (int, enum machine_mode); +int msp430_hard_regno_nregs_has_padding (int, enum machine_mode); +int msp430_hard_regno_nregs_with_padding (int, enum machine_mode); +boolmsp430_hwmult_enabled (void); rtx msp430_incoming_return_addr_rtx (void); void msp430_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int); int msp430_initial_elimination_offset (int, int); boolmsp430_is_interrupt_func (void); const char * msp430x_logical_shift_right (rtx); const char * msp430_mcu_name (void); Index: gcc/config/msp430/msp430.md === --- gcc/config/msp430/msp430.md (revision 213886) +++ gcc/config/msp430/msp430.md (working copy) @@ -176,12 +176,19 @@ @ MOV.B\t%1, %0 MOV%X1.B\t%1, %0 ) +(define_insn movqi_topbyte + [(set (match_operand:QI 0 msp_nonimmediate_operand =r) + (subreg:QI (match_operand:PSI 1 msp_general_operand r) 2))] + msp430x + PUSHM.A\t#1,%1 { POPM.W\t#1,%0 { POPM.W\t#1,%0 +) + (define_insn movqi [(set (match_operand:QI 0 msp_nonimmediate_operand =rYs,rm) (match_operand:QI 1 msp_general_operand riYs,rmi))] @ MOV.B\t%1, %0 @@ -220,27 +227,27 @@ ;; Some MOVX.A cases can be done with MOVA, this is only a few of them. (define_insn movpsi [(set (match_operand:PSI 0 msp_nonimmediate_operand =r,Ya,rm) (match_operand:PSI 1 msp_general_operand riYa,r,rmi))] @ - MOV%Q0\t%1, %0 - MOV%Q0\t%1, %0 - MOV%X0.%Q0\t%1, %0) + MOVA\t%1, %0 + MOVA\t%1, %0 + MOVX.A\t%1, %0) ; This pattern is identical to the truncsipsi2 pattern except ; that it uses a SUBREG instead of a TRUNC. 
It is needed in ; order to prevent reload from converting (set:SI (SUBREG:PSI (SI))) ; into (SET:PSI (PSI)). ; ; Note: using POPM.A #1 is two bytes smaller than using POPX.A (define_insn movsipsi2 [(set (match_operand:PSI0 register_operand =r) (subreg:PSI (match_operand:SI 1 register_operand r) 0))] - TARGET_LARGE + msp430x PUSH.W\t%H1 { PUSH.W\t%L1 { POPM.A #1, %0 ; Move reg-pair %L1:%H1 into pointer %0 ) ;; ;; Math @@ -564,49 +571,49 @@ { return msp430x_extendhisi (operands); } ) (define_insn extendhipsi2 [(set (match_operand:PSI 0 nonimmediate_operand =r) (subreg:PSI (sign_extend:SI (match_operand:HI 1 nonimmediate_operand 0)) 0))] - TARGET_LARGE + msp430x RLAM #4, %0 { RRAM #4, %0 ) ;; Look for cases where integer/pointer conversions are suboptimal due ;; to missing patterns, despite us not having opcodes for these ;; patterns. Doing these manually allows for alternate optimization ;; paths. (define_insn zero_extendhisi2 [(set (match_operand:SI 0 nonimmediate_operand =rm)
Re: [PATCH] AutoFDO patch for trunk
The new patch is attached. I used clang-format to format auto-profile.{c|h}. Thanks, Dehao
[Bug libstdc++/63500] [4.9/5 Regression] bug in debug version of std::make_move_iterator?
Hi

Here is a proposal to fix the issue with iterators which do not expose lvalue references when dereferenced. I simply chose to detect such an issue in c++11 mode thanks to the is_lvalue_reference meta function.

2014-10-15  François Dumont  <fdum...@gcc.gnu.org>

    PR libstdc++/63500
    * include/bits/cpp_type_traits.h (__true_type): Add __value constant.
    (__false_type): Likewise.
    * include/debug/functions.h (__foreign_iterator_aux2): Do not check for
    foreign iterators if input iterators returns rvalue reference.
    * testsuite/23_containers/vector/63500.cc: New.

Tested under Linux x86_64.

François

Index: include/bits/cpp_type_traits.h
===
--- include/bits/cpp_type_traits.h (revision 216158)
+++ include/bits/cpp_type_traits.h (working copy)
@@ -79,9 +79,12 @@
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-  struct __true_type { };
-  struct __false_type { };
+  struct __true_type
+  { enum { __value = 1 }; };
+  struct __false_type
+  { enum { __value = 0 }; };
+
 
   template<bool>
     struct __truth_type
     { typedef __false_type __type; };
Index: include/debug/functions.h
===
--- include/debug/functions.h (revision 216158)
+++ include/debug/functions.h (working copy)
@@ -34,7 +34,7 @@
                                           // _Iter_base
 #include <bits/cpp_type_traits.h>         // for __is_integer
 #include <bits/move.h>                    // for __addressof and addressof
-# include <bits/stl_function.h>           // for less
+#include <bits/stl_function.h>            // for less
 #if __cplusplus >= 201103L
 # include <type_traits>                   // for is_lvalue_reference and __and_
 #endif
@@ -252,8 +252,21 @@
                             const _InputIterator& __other,
                             const _InputIterator& __other_end)
     {
+#if __cplusplus >= 201103L
+      typedef std::iterator_traits<_InputIterator> _InputIteTraits;
+      typedef typename _InputIteTraits::reference _InputIteRefType;
+#endif
       return __foreign_iterator_aux3(__it, __other, __other_end,
+#if __cplusplus < 201103L
                                      _Is_contiguous_sequence<_Sequence>());
+#else
+                                     typename std::conditional<
+                                       std::__and_<std::integral_constant<
+                                         bool, _Is_contiguous_sequence<_Sequence>::__value>,
+                                       std::is_lvalue_reference<_InputIteRefType> >::value,
+                                       std::__true_type,
+                                       std::__false_type>::type());
+#endif
     }
 
   /* Handle the case where we aren't really inserting a range after all */
Index: testsuite/23_containers/vector/63500.cc
===
--- testsuite/23_containers/vector/63500.cc (revision 0)
+++ testsuite/23_containers/vector/63500.cc (working copy)
@@ -0,0 +1,39 @@
+// -*- C++ -*-
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include <memory>
+#include <iterator>
+#include <debug/vector>
+
+class Foo
+{};
+
+void
+test01()
+{
+  __gnu_debug::vector<std::unique_ptr<Foo>> v;
+  __gnu_debug::vector<std::unique_ptr<Foo>> w;
+
+  v.insert(end(v),
+           make_move_iterator(begin(w)),
+           make_move_iterator(end(w)));
+}
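[Editorial illustration, not part of the original message.] The check the patch relies on can be reproduced outside libstdc++ with standard traits only. The sketch below assumes nothing beyond the C++11 standard library; the helper name derefs_to_lvalue is hypothetical. It shows that a plain vector iterator dereferences to an lvalue reference, while the same iterator wrapped in std::move_iterator does not, which is the case the debug-mode foreign-iterator check has to skip.

```cpp
#include <iterator>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical helper: true when dereferencing Iter yields an lvalue
// reference, i.e. when taking the address of *it is meaningful.
template <typename Iter>
constexpr bool derefs_to_lvalue ()
{
  return std::is_lvalue_reference<
    typename std::iterator_traits<Iter>::reference>::value;
}

typedef std::vector<std::unique_ptr<int>>::iterator vec_iter;

// A plain vector iterator refers to the stored element itself.
static_assert (derefs_to_lvalue<vec_iter> (),
               "vector iterators dereference to lvalues");

// A move_iterator yields an rvalue reference instead.
static_assert (!derefs_to_lvalue<std::move_iterator<vec_iter>> (),
               "move_iterator does not dereference to an lvalue");
```

Both static_asserts hold at compile time, so the block needs no runtime driver.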
Re: New rematerialization sub-pass in LRA
On 2014-10-14 4:17 PM, Peter Bergner wrote:
> On Fri, 2014-10-10 at 11:02 -0400, Vladimir Makarov wrote:
>> Here is a new rematerialization sub-pass of LRA.
>
> When Mike and I build with this patch along with the patch that enables LRA by default on powerpc64*-linux (attached below), we're seeing the following error message. I'm not sure how your patch can cause this error, but it does go away if we remove your patch and build again.

Peter, thanks for checking the patch and reporting this.

I had several wrong code generation problems with rematerialization on x86 and arm. I solved them before posting the patch, but I did not check ppc64.

As a lot of people have started to try the patch, several problems have been reported. I'll address them and make some modifications to the patch. I now think that I'll commit the patch to the trunk no earlier than next week. I'll also check ppc64 to be sure that we have no wrong code generation problems on that target.
Re: [PATCH] AutoFDO patch for trunk
Dehao Chen <de...@google.com> writes:
> +
> +@item -fauto-profile
> +@itemx -fauto-profile=@var{path}
> +@opindex fauto-profile
> +Enable sampling based feedback directed optimizations, and optimizations
> +generally profitable only with profile feedback available.
> +
> +The following options are enabled: @code{-fbranch-probabilities}, @code{-fvpt},
> +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, @code{-ftree-vectorize},
> +@code{ftree-loop-distribute-patterns}

This needs more description aimed at end users: what it is good for and why, a pointer to the needed utilities, and a short summary of the steps they need to take.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] support ggc hash_map and hash_set
On Tue, Oct 14, 2014 at 04:05:25PM +0200, Richard Biener wrote:
> On Tue, Sep 2, 2014 at 3:56 AM, <tsaund...@mozilla.com> wrote:
>> From: Trevor Saunders <tsaund...@mozilla.com>
>>
>> Hi,
>>
>> There are still some issues to make this work really nicely, but this part is probably good enough that it's worth reviewing. For one thing, you can't use a ggc hash_map or hash_set in front ends with some types, or gengtype will decide to put the overloads of the marking routines it provides in a front end file instead of the one it chose before, breaking other front ends. However, that seems to be an unrelated issue, since you can trigger it without using hash_map/set, so we might as well solve it separately.
>>
>> I had to have the entry marking functions for set delegate to the traits class because gcc 4.9.1 issues clearly bogus errors if you inline the code from the traits implementation. We may well want to make map work the same way at some point to enable some of the special GTY attributes like if_marked, but it doesn't seem to be necessary right now.
>>
>> bootstrapped + regtested without regressions on x86_64-unknown-linux-gnu, ok?
>
> I have just noticed that this (ggc support for hash-table.h) makes it no longer suitable for use from generator programs (trying to merge from trunk on match-and-simplify).
>
> If you look at vec.h it has sophisticated guards to block out GGC support if GENERATOR_FILE is defined.

yeah, it works, but it's kind of messy since some of the generator programs include ggc.h.

> Can you try to fix this please?

I expect it's doable. I can try to get it done later this week / weekend, but the next few days are busy for me.

Trev

> Thanks,
> Richard.
>
>> Trev
>>
>> gcc/ChangeLog:
>>
>> 2014-09-01  Trevor Saunders  <tsaund...@mozilla.com>
>>
>>     * alloc-pool.c: Include coretypes.h.
>>     * cgraph.h, dbxout.c, dwarf2out.c, except.c, except.h, function.c,
>>     function.h, symtab.c, tree-cfg.c, tree-eh.c: Use hash_map and
>>     hash_set instead of htab.
>>     * ggc-page.c (in_gc): New variable.
>>     (ggc_free): Do nothing if a collection is taking place.
>>     (ggc_collect): Set in_gc appropriately.
>>     * ggc.h (gt_ggc_mx(const char *)): New function.
>>     (gt_pch_nx(const char *)): Likewise.
>>     (gt_ggc_mx(int)): Likewise.
>>     (gt_pch_nx(int)): Likewise.
>>     * hash-map.h (hash_map::hash_entry::ggc_mx): Likewise.
>>     (hash_map::hash_entry::pch_nx): Likewise.
>>     (hash_map::hash_entry::pch_nx_helper): Likewise.
>>     (hash_map::hash_map): Adjust.
>>     (hash_map::create_ggc): New function.
>>     (gt_ggc_mx): Likewise.
>>     (gt_pch_nx): Likewise.
>>     * hash-set.h (default_hashset_traits::ggc_mx): Likewise.
>>     (default_hashset_traits::pch_nx): Likewise.
>>     (hash_set::hash_entry::ggc_mx): Likewise.
>>     (hash_set::hash_entry::pch_nx): Likewise.
>>     (hash_set::hash_entry::pch_nx_helper): Likewise.
>>     (hash_set::hash_set): Adjust.
>>     (hash_set::create_ggc): New function.
>>     (hash_set::elements): Likewise.
>>     (gt_ggc_mx): Likewise.
>>     (gt_pch_nx): Likewise.
>>     * hash-table.h (hash_table::hash_table): Adjust.
>>     (hash_table::m_ggc): New member.
>>     (hash_table::~hash_table): Adjust.
>>     (hash_table::expand): Likewise.
>>     (hash_table::empty): Likewise.
>>     (gt_ggc_mx): New function.
>>     (hashtab_entry_note_pointers): Likewise.
>>     (gt_pch_nx): Likewise.
>>
>> diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c
>> index 0d31835..bfaa0e4 100644
>> --- a/gcc/alloc-pool.c
>> +++ b/gcc/alloc-pool.c
>> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "config.h"
>>  #include "system.h"
>> +#include "coretypes.h"
>>  #include "alloc-pool.h"
>>  #include "hash-table.h"
>>  #include "hash-map.h"
>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index 879899c..030a1c7 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -1604,7 +1604,6 @@ struct cgraph_2node_hook_list;
>>  /* Map from a symbol to initialization/finalization priorities.  */
>>  struct GTY(()) symbol_priority_map {
>> -  symtab_node *symbol;
>>    priority_type init;
>>    priority_type fini;
>>  };
>> @@ -1872,7 +1871,7 @@ public:
>>    htab_t GTY((param_is (symtab_node))) assembler_name_hash;
>>
>>    /* Hash table used to hold init priorities.  */
>> -  htab_t GTY ((param_is (symbol_priority_map))) init_priority_hash;
>> +  hash_map<symtab_node *, symbol_priority_map> *init_priority_hash;
>>
>>    FILE* GTY ((skip)) dump_file;
>>
>> diff --git a/gcc/dbxout.c b/gcc/dbxout.c
>> index 946f1d1..d856bdd 100644
>> --- a/gcc/dbxout.c
>> +++ b/gcc/dbxout.c
>> @@ -2484,12 +2484,9 @@ dbxout_expand_expr (tree expr)
>>  /* Helper function for output_used_types.  Queue one entry from the
>>     used types hash to be output.  */
>>
>> -static int
>> -output_used_types_helper (void **slot, void *data)
>> +bool
>> +output_used_types_helper (tree const &type, vec<tree> *types_p)
>>  {
>> -  tree type = (tree) *slot;
>> -  vec<tree> *types_p = (vec<tree> *) data;
>> -
>>    if ((TREE_CODE (type) == RECORD_TYPE
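[Editorial illustration, not part of the original thread.] Richard's remark about guarding GGC support behind GENERATOR_FILE can be sketched in isolation. Everything below is hypothetical: demo_gc_alloc and demo_table are stand-ins invented for this example and are not GCC's ggc or hash_table interfaces. The only idea taken from the thread is that the garbage-collected allocation path is compiled out when GENERATOR_FILE is defined, so a generator program never references the collector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Define GENERATOR_FILE before this point to simulate a build-time
// generator program that must not depend on the garbage collector.

#ifndef GENERATOR_FILE
// Hypothetical stand-in for a GC allocator (NOT GCC's real ggc API).
inline void *demo_gc_alloc (std::size_t bytes)
{
  return std::calloc (1, bytes);
}
#endif

// Hypothetical fixed-size table of zero-initialized, trivially copyable T.
template <typename T>
class demo_table
{
public:
  demo_table (std::size_t n, bool want_gc)
    : m_size (n), m_gc (want_gc), m_entries (allocate (n, want_gc)) {}

  // The demo allocator wraps calloc, so free works in both modes; memory
  // owned by a real collector would not be released by hand like this.
  ~demo_table () { std::free (m_entries); }

  demo_table (const demo_table &) = delete;
  demo_table &operator= (const demo_table &) = delete;

  std::size_t size () const { return m_size; }
  bool uses_gc () const { return m_gc; }
  T &operator[] (std::size_t i) { assert (i < m_size); return m_entries[i]; }

private:
  static T *allocate (std::size_t n, bool gc)
  {
#ifndef GENERATOR_FILE
    if (gc)
      return static_cast<T *> (demo_gc_alloc (n * sizeof (T)));
#else
    // The GC path is compiled out, so asking for it is a usage error.
    assert (!gc);
    (void) gc;
#endif
    return static_cast<T *> (std::calloc (n, sizeof (T)));
  }

  std::size_t m_size;
  bool m_gc;
  T *m_entries;
};
```

In a normal build, demo_table<int> t (4, true) takes the stand-in GC path; with GENERATOR_FILE defined, the same header still compiles but only the malloc-style path exists.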
Re: [PATCH 1/2] xtensa: drop unimplemented floating point operations
On Mon, Oct 13, 2014 at 8:05 PM, augustine.sterl...@gmail.com <augustine.sterl...@gmail.com> wrote:
> On Sun, Oct 12, 2014 at 3:46 PM, Max Filippov <jcmvb...@gmail.com> wrote:
>> xtensa ISA never implemented FP division, reciprocal, square root and inverse square root as a single opcode. Remove patterns that can emit them.
>>
>> 2014-10-09  Max Filippov  <jcmvb...@gmail.com>
>>
>> gcc/
>>     * config/xtensa/xtensa.md (divsf3, *recipsf2, sqrtsf2, *rsqrtsf2): remove.
>
> Approved.

Applied to trunk. Thanks!

-- Max
Re: [PATCH 2/2] xtensa: use pre- and postincrement FP load/store when available
On Mon, Oct 13, 2014 at 8:04 PM, augustine.sterl...@gmail.com <augustine.sterl...@gmail.com> wrote:
> On Sun, Oct 12, 2014 at 3:46 PM, Max Filippov <jcmvb...@gmail.com> wrote:
>> 2014-10-10  Max Filippov  <jcmvb...@gmail.com>
>>
>> gcc/
>>     * config/xtensa/xtensa.h (TARGET_HARD_FLOAT_POSTINC): new macro.
>>     * config/xtensa/xtensa.md (*lsiu, *ssiu): add dependency on
>>     !TARGET_HARD_FLOAT_POSTINC.
>>     (*lsip, *ssip): new instructions.
>
> Approved. Do you have write privileges?

Applied to trunk. Thanks!

-- Max
[committed] MAINTAINERS: add myself to write-after-approval list.
2014-10-15  Max Filippov  <jcmvb...@gmail.com>

    * MAINTAINERS (write-after-approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 216231)
+++ MAINTAINERS (revision 216232)
@@ -380,6 +380,7 @@
 Chris Fairles           <cfair...@gcc.gnu.org>
 Changpeng Fang          <changpeng.f...@amd.com>
 Li Feng                 <nemoking...@gmail.com>
+Max Filippov            <jcmvb...@gmail.com>
 Thomas Fitzsimmons      <fitz...@redhat.com>
 Brian Ford              <f...@vss.fsi.com>
 John Freeman            <jfreema...@gmail.com>
[PATCH] Better tolerance of incoming profile insanities in jump threading
The below patch fixes the overflow detection when recomputing probabilities after jump threading, in case of incoming profile insanities. It detects more cases where the computation will overflow not only the max probability but the max int and possibly wrap around.

LTO profilebootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2014-10-14  Teresa Johnson  <tejohn...@google.com>

    PR bootstrap/63432
    * tree-ssa-threadupdate.c (recompute_probabilities): Better
    overflow checking.

Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 216150)
+++ tree-ssa-threadupdate.c (working copy)
@@ -871,21 +871,23 @@ recompute_probabilities (basic_block bb)
   edge_iterator ei;
   FOR_EACH_EDGE (esucc, ei, bb->succs)
     {
-      if (bb->count)
+      if (!bb->count)
+        continue;
+
+      /* Prevent overflow computation due to insane profiles.  */
+      if (esucc->count < bb->count)
         esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
                                                  bb->count);
-      if (esucc->probability > REG_BR_PROB_BASE)
-        {
-          /* Can happen with missing/guessed probabilities, since we
-             may determine that more is flowing along duplicated
-             path than joiner succ probabilities allowed.
-             Counts and freqs will be insane after jump threading,
-             at least make sure probability is sane or we will
-             get a flow verification error.
-             Not much we can do to make counts/freqs sane without
-             redoing the profile estimation.  */
-          esucc->probability = REG_BR_PROB_BASE;
-        }
+      else
+        /* Can happen with missing/guessed probabilities, since we
+           may determine that more is flowing along duplicated
+           path than joiner succ probabilities allowed.
+           Counts and freqs will be insane after jump threading,
+           at least make sure probability is sane or we will
+           get a flow verification error.
+           Not much we can do to make counts/freqs sane without
+           redoing the profile estimation.  */
+        esucc->probability = REG_BR_PROB_BASE;
     }
 }

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
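[Editorial illustration, not part of the original message.] The reordering in the patch, comparing the counts before scaling instead of clamping the scaled result afterwards, can be shown as a standalone sketch. The names prob_base and edge_probability are hypothetical; prob_base is assumed to play the role of REG_BR_PROB_BASE (taken here as 10000), and the rounding division is only an assumption about what GCOV_COMPUTE_SCALE does. Counts are assumed to be non-negative.

```cpp
#include <cstdint>

// Assumed fixed-point representation of 100% (stand-in for REG_BR_PROB_BASE).
constexpr int prob_base = 10000;

// Hypothetical helper: recompute an edge probability from profile counts.
// Returns old_probability unchanged when the block count is zero.
inline int edge_probability (std::int64_t succ_count, std::int64_t bb_count,
                             int old_probability)
{
  if (bb_count == 0)
    return old_probability;

  // Compare first: when succ_count < bb_count the scaled result is at most
  // prob_base, so narrowing it to int cannot wrap around.
  if (succ_count < bb_count)
    return static_cast<int> ((succ_count * prob_base + bb_count / 2)
                             / bb_count);

  // Insane profile (edge count >= block count): clamp without computing
  // a scaled value that might not fit in an int.
  return prob_base;
}
```

For example, edge_probability (50, 100, 0) gives 5000, while an edge count far larger than the block count, which the clamp-after-scaling order could wrap into a bogus small or negative probability, simply yields prob_base.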
Re: libffi patch RFA: Pass -Qunused-arguments for asm files
On 30/09/2014 02:12, Ian Lance Taylor wrote:
> Similar to a recent patch to libgo, this patch to the libffi configure script checks whether the compiler supports -Qunused-arguments. If it does, it passes -Qunused-arguments when invoking the compiler on .s files. This is because the clang driver complains by default when given useless arguments, such as -I options when compiling a .s file. This somewhat annoying behaviour works poorly with configure scripts. The -Qunused-arguments option disables it.
>
> Bootstrapped and ran libffi and libgo tests on x86_64-unknown-linux-gnu. OK for mainline?
>
> Ian
>
> 2014-09-29  Ian Lance Taylor  <i...@google.com>
>
>     * configure.ac: If the compiler supports -Qunused-arguments, use
>     it when running the compiler on .s files.
>     * configure: Regenerated.

Ok.

Paolo