Re: C6X port 8/11: A new FUNCTION_ARG macro
On 05/12/2011 05:40 PM, Bernd Schmidt wrote:
> +  if (targetm.calls.function_arg_round_to_arg_boundary (passed_mode, type))
> +    round_boundary = boundary;
> +  else
> +    round_boundary = PARM_BOUNDARY;

Why add an if, instead of making the new target hook function_arg_round_boundary? The default implementation can then reuse default_function_arg_boundary, and C6X will redefine it to c6x_function_arg_boundary.

Paolo
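[Editor's note: a minimal sketch of the suggested shape, mirroring the existing TARGET_FUNCTION_ARG_BOUNDARY hook. The hook name comes from Paolo's suggestion; the code itself is an assumption, not part of the patch.]

    /* Default: round argument sizes to PARM_BOUNDARY, as every target
       except C6X does today.  */
    static unsigned int
    default_function_arg_round_boundary (enum machine_mode mode ATTRIBUTE_UNUSED,
                                         const_tree type ATTRIBUTE_UNUSED)
    {
      return PARM_BOUNDARY;
    }

    /* The caller in function.c then collapses to a single line:

         round_boundary
           = targetm.calls.function_arg_round_boundary (passed_mode, type);

       and C6X defines the hook to c6x_function_arg_boundary, so rounding
       uses the same (larger) boundary as argument alignment.  */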
Re: Prefixes for libgcc symbols (C6X 9.5/11)
On 05/13/2011 03:40 PM, Bernd Schmidt wrote:

> gcc/
> 	* libgcc2.h (__NW, __NDW): Define using a __gnu_ prefix if
> 	LIBGCC2_GNU_PREFIX is defined.
> 	(__N): New macro.
> 	(__powisf2, __powidf2, __powitf2, __powixf2, __bswapsi2,
> 	__bswapdi2, __mulsc3, __muldc3, __mulxc3, __multc3, __divsc3,
> 	__divdc3, __divxc3, __divtc3, __udiv_w_sdiv, __clear_cache,
> 	__enable_execute_stack, __clz_tab): Define using __N.
> 	(__absvsi2, __negvsi2, __addvsi3, __subvsi3, __mulvsi3): Likewise
> 	if COMPAT_SIMODE_TRAPPING_ARITHMETIC.
> 	* target.def (libfunc_gnu_prefix): New hook.
> 	* doc/tm.texi.in (LIBGCC2_GNU_PREFIX): Document.
> 	(TARGET_LIBFUNC_GNU_PREFIX): Add hook.
> 	* doc/tm.texi: Regenerate.
> 	* optabs.c (gen_libfunc): Take the libfunc_gnu_prefix hook into
> 	account.
> 	(gen_interclass_conv_libfunc, gen_intraclass_conv_libfunc):
> 	Likewise.
> 	(init_optabs): Likewise for the bswap libfuncs.
> 	* tree.c (build_common_builtin_nodes): Likewise for complex
> 	multiply and divide.
> 	* config/t-slibgcc-elf-ver (SHLIB_MAPFILES): Use $$(libgcc_objdir).
> 	* config/t-slibgcc-sld (SHLIB_MAPFILES): Likewise.
> 	* libgcc-std.ver: Remove.
> 	* Makefile.in (srcdirify): Handle $$(libgcc_objdir).
> 	* config/bfin/libgcc-bfin.ver: Remove.
> 	* config/bfin/t-bfin-linux (SHLIB_MAPFILES): Remove.
> 	* config/frv/t-linux (SHLIB_MAPFILES): Use $$(libgcc_objdir) for
> 	libgcc-std.ver.
> 	* config/i386/t-linux (SHLIB_MAPFILES): Likewise.
> 	* config/mips/t-slibgcc-irix (SHLIB_MAPFILES): Likewise.
> 	* config/rs6000/t-aix43 (SHLIB_MAPFILES): Likewise.
> 	* config/rs6000/t-aix52 (SHLIB_MAPFILES): Likewise.
> 	* config/sparc/t-linux (SHLIB_MAPFILES): Likewise.
> 	* config/fixed-bit.h (FIXED_OP): Define differently depending on
> 	LIBGCC2_GNU_PREFIX.  All uses changed not to pass leading
> 	underscores.
> 	(FIXED_CONVERT_OP, FIXED_CONVERT_OP2): Likewise.
>
> libgcc/
> 	* libgcc-std.ver.in: New file.
> 	* Makefile.in (LIBGCC_VER_GNU_PREFIX, LIBGCC_VER_SYMBOLS_PREFIX):
> 	New variables.
> 	(libgcc-std.ver): New rule.
> 	* config/t-gnu-prefix: New file.
> 	* config/t-underscore-prefix: New file.

Build parts are ok.

Paolo
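[Editor's note: a minimal sketch of what such a prefixing scheme typically looks like in libgcc2.h — illustrative only, not the actual patch text.]

    /* Targets that define LIBGCC2_GNU_PREFIX get __gnu_-prefixed libgcc
       entry points; everybody else keeps the traditional leading
       double underscore.  */
    #ifdef LIBGCC2_GNU_PREFIX
    #define __N(a) __gnu_ ## a
    #else
    #define __N(a) __ ## a
    #endif

    /* So a libfunc declared via __N(mulsc3) is emitted as either
       __gnu_mulsc3 or __mulsc3, and gen_libfunc consults the new
       libfunc_gnu_prefix hook to generate the matching caller-side
       name.  */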
[PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)
Hi!

This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec} and testing flags, instead of lock xadd plus comparison. The sync_old_add<mode> predicate change makes it possible to optimize __sync_add_and_fetch with a constant second argument to the same code as __sync_fetch_and_add.

Doing it in peephole2 has a disadvantage though: the 3 instructions need to be consecutive, and e.g. the xadd insn has to be supported by the CPU. The other alternative would be to come up with a new bool builtin that would represent the whole __sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or space in its name to make it inaccessible), try to match it during some folding and expand it using a special optab.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk this way?

2011-05-16  Jakub Jelinek  <ja...@redhat.com>

	PR target/48986
	* config/i386/sync.md (sync_old_add<mode>): Relax operand 2
	predicate to allow CONST_INT.
	(*sync_old_add_cmp<mode>): New insn and peephole2 for it.

--- gcc/config/i386/sync.md.jj	2010-05-21 11:46:29.000000000 +0200
+++ gcc/config/i386/sync.md	2011-05-16 14:42:08.000000000 +0200
@@ -170,11 +170,62 @@ (define_insn "sync_old_add<mode>"
 	  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_XCHG))
    (set (match_dup 1)
 	(plus:SWI (match_dup 1)
-		  (match_operand:SWI 2 "register_operand" "0")))
+		  (match_operand:SWI 2 "nonmemory_operand" "0")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_XADD"
   "lock{%;} xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand" "")
+	(match_operand:SWI 2 "const_int_operand" ""))
+   (parallel [(set (match_dup 0)
+		   (unspec_volatile:SWI
+		     [(match_operand:SWI 1 "memory_operand" "")] UNSPECV_XCHG))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (match_dup 0)
+		     (match_operand:SWI 3 "const_int_operand" "")))]
+  "peep2_reg_dead_p (3, operands[0])
+   && (unsigned HOST_WIDE_INT) INTVAL (operands[2])
+      == -(unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+		   (compare:CCZ (unspec_volatile:SWI [(match_dup 1)]
+						     UNSPECV_XCHG)
+				(match_dup 3)))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 2)))])])
+
+(define_insn "*sync_old_add_cmp<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (unspec_volatile:SWI
+		       [(match_operand:SWI 0 "memory_operand" "+m")]
+		       UNSPECV_XCHG)
+		     (match_operand:SWI 2 "const_int_operand" "i")))
+   (set (match_dup 0)
+	(plus:SWI (match_dup 0)
+		  (match_operand:SWI 1 "const_int_operand" "i")))]
+  "(unsigned HOST_WIDE_INT) INTVAL (operands[1])
+   == -(unsigned HOST_WIDE_INT) INTVAL (operands[2])"
+{
+  if (TARGET_USE_INCDEC)
+    {
+      if (operands[1] == const1_rtx)
+	return "lock{%;} inc{<imodesuffix>}\t%0";
+      if (operands[1] == constm1_rtx)
+	return "lock{%;} dec{<imodesuffix>}\t%0";
+    }
+
+  if (x86_maybe_negate_const_int (&operands[1], <MODE>mode))
+    return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
+
+  return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
+})
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")

	Jakub
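[Editor's note: for illustration, the kind of source code this targets — my example, not from the patch. With the peephole, the function below can compile to a single lock sub whose flags feed the comparison, instead of lock xadd followed by a separate compare.]

    /* Returns nonzero iff this call decremented the counter to zero:
       the old value was 1 (N == 1, added value -N == -1).  */
    int
    dec_was_last (int *counter)
    {
      return __sync_fetch_and_add (counter, -1) == 1;
    }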
Re: [patch, ARM] Fix PR42017, LR not used in leaf functions
On 2011/5/13 04:26 PM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandif...@linaro.org> writes:
>> Chung-Lin Tang <clt...@codesourcery.com> writes:
>>> My fix here simply adds 'reload_completed' as an additional condition
>>> for EPILOGUE_USES to return true for LR_REGNUM. I think this should be
>>> valid, as correct LR save/restoring is handled by the epilogue/prologue
>>> code; it should be safe for IRA to treat it as a normal call-used
>>> register.
>>
>> FWIW, epilogue_completed might be a more accurate choice.
>
> I still stand by this, although I realise no other target does it.

Did a re-test of the patch just to be sure; as expected, the test results were also clean. Attached is the updated patch.

>> It seems a lot of other ports suffer from the same problem though.
>> I wonder which targets really do want to make a register live throughout
>> the function? If none do, perhaps we should say that this macro is only
>> meaningful once the epilogue has been generated.
>
> To answer my own question, I suppose VRSAVE is one. So I was wrong
> about the target-independent fix.
>
> Richard

To rehash what I remember we discussed at LDS, such registers like VRSAVE might be more appropriately placed in global regs. It looks like the intended use of EPILOGUE_USES could be clarified further...

To Richard Earnshaw and Ramana: is the patch okay for trunk? This should be a not-so-insignificant performance regression fix/improvement.

Thanks,
Chung-Lin

Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 173814)
+++ config/arm/arm.h	(working copy)
@@ -1627,7 +1627,7 @@
    frame.  */
 #define EXIT_IGNORE_STACK 1
 
-#define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM)
+#define EPILOGUE_USES(REGNO) (epilogue_completed && (REGNO) == LR_REGNUM)
 
 /* Determine if the epilogue should be output as RTL.
    You should override this if you define FUNCTION_EXTRA_EPILOGUE.  */
[PATCH, PR45098]
Hi Zdenek,

I have a patch set for PR45098.

01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch

I will send out the patches individually.

The patch set has been bootstrapped and reg-tested on x86_64, and reg-tested on ARM.

The effect of the patch set on examples is the removal of 1 iterator, demonstrated below for '-Os -mthumb -march=armv7-a' on example tr4.

tr4.c:
...
extern void foo2 (short*);
void tr4 (short array[], int n)
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      foo2 (&array[i]);
}
...

tr4.s diff (left without, right with patch):
...
        push    {r4, r5, r6, lr}    |         cmp     r1, #0
        subs    r6, r1, #0          |         push    {r3, r4, r5, lr}
        ble     .L1                           ble     .L1
        mov     r5, r0              |         mov     r4, r0
        movs    r4, #0              |         add     r5, r0, r1, lsl #1
.L3:                                  .L3:
        mov     r0, r5              |         mov     r0, r4
        adds    r4, r4, #1          |         adds    r4, r4, #2
        bl      foo2                          bl      foo2
        adds    r5, r5, #2          |         cmp     r4, r5
        cmp     r4, r6              <
        bne     .L3                           bne     .L3
.L1:                                  .L1:
        pop     {r4, r5, r6, pc}    |         pop     {r3, r4, r5, pc}
...

The effect of the patch set on the test cases in terms of size is listed in the following 2 tables.

--- -Os -mthumb -march=armv7-a ---
      without  with  delta
---
tr1        32    30     -2
tr2        36    36      0
tr3        32    30     -2
tr4        26    26      0
tr5        20    20      0
---

--- -Os -march=armv7-a ---
      without  with  delta
---
tr1        60    52     -8
tr2        64    60     -4
tr3        60    52     -8
tr4        48    44     -4
tr5        36    32     -4
---

The size impact on several benchmarks is shown in the following table (%, lower is better).

                 none             pic
            thumb1  thumb2  thumb1  thumb2
spec2000      99.9    99.9    99.9    99.9
eembc         99.9   100.0    99.9   100.1
dhrystone    100.0   100.0   100.0   100.0
coremark      99.3    99.9    99.3   100.0

Thanks,
- Tom
[PATCH, PR45098, 3/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries  <t...@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2862,7 +2862,9 @@ computation_cost (tree expr, bool speed)
   default_rtl_profile ();
   node->frequency = real_frequency;
 
-  cost = seq_cost (seq, speed);
+  cost = (seq != NULL_RTX
+	  ? seq_cost (seq, speed)
+	  : (unsigned) rtx_cost (rslt, SET, speed));
   if (MEM_P (rslt))
     cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
			  TYPE_ADDR_SPACE (type), speed);
[PATCH, PR45098, 4/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries  <t...@codesourcery.com>

	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
	cost_base.cost == 0.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
   base = cand->iv->base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+    cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);
[PATCH, PR45098, 7/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries  <t...@codesourcery.com>

	PR target/45098
	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
	max_iterations_p and max_iterations.
	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New
	function.
	(may_eliminate_iv): Use max_iterations_p and max_iterations.
	(tree_ssa_iv_optimize_loop): Use set_max_iterations.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 173355)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -291,6 +291,12 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether max_iterations is valid.  */
+  bool max_iterations_p;
+
+  /* Maximum number of iterations of current_loop.  */
+  double_int max_iterations;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Determine if USE contains non-wrapping arithmetic.  */
+
+static bool
+is_nonwrap_use (struct ivopts_data *data, struct iv_use *use)
+{
+  gimple stmt = use->stmt;
+  tree var, ptr, ptr_type;
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  switch (gimple_assign_rhs_code (stmt))
+    {
+    case POINTER_PLUS_EXPR:
+      ptr = gimple_assign_rhs1 (stmt);
+      ptr_type = TREE_TYPE (ptr);
+      var = gimple_assign_rhs2 (stmt);
+      if (!expr_invariant_in_loop_p (data->current_loop, ptr))
+	return false;
+      break;
+    case ARRAY_REF:
+      ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0);
+      ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+      var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1);
+      break;
+    default:
+      return false;
+    }
+
+  if (!nowrap_type_p (ptr_type))
+    return false;
+
+  if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var)))
+    return false;
+
+  return true;
+}
+
+/* Attempt to infer maximum number of loop iterations of DATA->current_loop
+   from uses in loop containing non-wrapping arithmetic.  If successful,
+   return true, and return maximum iterations in MAX_NITER.  */
+
+static bool
+max_loop_iterations (struct ivopts_data *data, double_int *max_niter)
+{
+  struct iv_use *use;
+  struct iv *iv;
+  bool found = false;
+  double_int period;
+  gimple stmt;
+  unsigned i;
+
+  for (i = 0; i < n_iv_uses (data); i++)
+    {
+      use = iv_use (data, i);
+
+      stmt = use->stmt;
+      if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt)))
+	continue;
+
+      if (!is_nonwrap_use (data, use))
+	continue;
+
+      iv = use->iv;
+      if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST)
+	continue;
+      period = tree_to_double_int (iv_period (iv));
+
+      if (found)
+	*max_niter = double_int_umin (*max_niter, period);
+      else
+	{
+	  found = true;
+	  *max_niter = period;
+	}
+    }
+
+  return found;
+}
+
+/* Initializes DATA->max_iterations and DATA->max_iterations_p.  */
+
+static void
+set_max_iterations (struct ivopts_data *data)
+{
+  double_int max_niter, max_niter2;
+  bool estimate1, estimate2;
+
+  data->max_iterations_p = false;
+  estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter);
+  estimate2 = max_loop_iterations (data, &max_niter2);
+  if (!(estimate1 || estimate2))
+    return;
+  if (estimate1 && estimate2)
+    data->max_iterations = double_int_umin (max_niter, max_niter2);
+  else if (estimate1)
+    data->max_iterations = max_niter;
+  else
+    data->max_iterations = max_niter2;
+  data->max_iterations_p = true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da
   /* See if we can take advantage of infered loop bound information.  */
   if (loop_only_exit_p (loop, exit))
     {
-      if (!estimated_loop_iterations (loop, true, &max_niter))
+      if (!data->max_iterations_p)
	return false;
       /* The loop bound is already adjusted by adding 1.  */
-      if (double_int_ucmp (max_niter, period_value) > 0)
+      if (double_int_ucmp (data->max_iterations, period_value) > 0)
	return false;
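[Editor's note: to make the idea concrete, an example of the kind of loop this helps — mine, not from the patch. The address computation &array[x] is non-wrapping pointer arithmetic executed every iteration, so the trip count can be bounded by the period of that induction variable even though n itself is unknown, which in turn lets may_eliminate_iv replace the exit test.]

    void
    clear (short *array, unsigned int n)
    {
      unsigned int x;
      for (x = 0; x < n; x++)
        array[x] = 0;   /* &array[x] must not wrap, bounding the trip count.  */
    }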
[PATCH, PR45098, 8/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries  <t...@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-3.c: New test.
	* gcc.target/arm/ivopts-4.c: New test.
	* gcc.target/arm/ivopts-5.c: New test.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.

Index: gcc/testsuite/gcc.target/arm/ivopts-3.c
===================================================================
--- /dev/null	(new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-3.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo2 (short*) __attribute__((pure));
+
+unsigned int
+tr3 (short array[], unsigned int n)
+{
+  unsigned sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo2 (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-4.c
===================================================================
--- /dev/null	(new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-4.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do assemble } */
+/* { dg-options "-mthumb -Os -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr2 (int array[], int n)
+{
+  unsigned int sum = 0;
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 36 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-5.c
===================================================================
--- /dev/null	(new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-5.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr1 (int array[], unsigned int n)
+{
+  unsigned int sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 173380)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@
 
 extern int a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can infer loop iteration from nonwrapping pointer arithmetic.
+   exit test can be replaced.  */
 void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
 {
   TYPE dstn = dst + i_width;
@@ -21,5 +22,5 @@ void foo (int i_width, TYPE dst, TYPE sr
   }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
[PATCH, PR45098, 10/10]
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
>
> I have a patch set for PR45098.
>
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
>
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom

2011-05-05  Tom de Vries  <t...@codesourcery.com>

	PR target/45098
	* gcc.target/arm/ivopts-6.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-6.c
===================================================================
--- /dev/null	(new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-6.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -marm" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 32 } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Re: [PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)
On Tue, May 17, 2011 at 9:02 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N
> and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec}
> and testing flags, instead of lock xadd plus comparison.
> The sync_old_add<mode> predicate change makes it possible to optimize
> __sync_add_and_fetch with a constant second argument to the same code as
> __sync_fetch_and_add.
>
> Doing it in peephole2 has a disadvantage though: the 3 instructions need
> to be consecutive, and e.g. the xadd insn has to be supported by the CPU.
> The other alternative would be to come up with a new bool builtin that
> would represent the whole __sync_fetch_and_add (x, -N) == N operation
> (perhaps with a dot or space in its name to make it inaccessible), try to
> match it during some folding and expand it using a special optab.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk this
> way?
>
> 2011-05-16  Jakub Jelinek  <ja...@redhat.com>
>
> 	PR target/48986
> 	* config/i386/sync.md (sync_old_add<mode>): Relax operand 2
> 	predicate to allow CONST_INT.
> 	(*sync_old_add_cmp<mode>): New insn and peephole2 for it.

OK, but please add a comment explaining why we have a matched constraint with a non-matched predicate. These operands are otherwise targets for cleanups ;) Also, a comment explaining the purpose of the added peephole would be nice.

IMO, the change to sync_old_add<mode> is also appropriate for release branches.

Thanks,
Uros.
Re: Don't let search bots look at buglist.cgi
On Mon, May 16, 2011 at 10:27:44PM -0700, Ian Lance Taylor wrote:
> On Mon, May 16, 2011 at 6:42 AM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>> httpd being in the top-10 always, fiddling with bugzilla URLs? (Note, I
>> don't have access to gcc.gnu.org, I'm relaying info from multiple
>> instances of discussion on #gcc and richi poking on it; that said, it
>> still might not be web crawlers, that's right, but I'll happily accept
>> _any_ load improvement on gcc.gnu.org, however unfounded it might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar. It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.
>
>> I for example see also
>> 66.249.71.59 - - [16/May/2011:13:37:58 +0000] "GET /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" (35%) 2060117us
>> and viewvc is certainly even worse (from an I/O perspective).
>
> I thought we blocked all bot traffic from the viewvc stuff ... This is
> only happening at top level. I committed this patch to fix this.

Probably you know it much better than me, but wouldn't it be a possibility to only allow some of Google's crawlers, if all of them try to crawl bugzilla? As I read
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=1061943
it would be possible to block the crawlers Googlebot-Mobile, Mediapartners-Google and AdsBot-Google (which seem to be independent crawlers?) while allowing the main Googlebot. (Well, I don't know how often which crawler appears on bugzilla...)

Axel
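[Editor's note: per-crawler blocking is a plain robots.txt matter; a sketch follows, illustrative only — the paths and policy are assumptions, not the actual gcc.gnu.org configuration.]

    # Block the secondary Google crawlers entirely ...
    User-agent: Googlebot-Mobile
    Disallow: /

    User-agent: Mediapartners-Google
    Disallow: /

    User-agent: AdsBot-Google
    Disallow: /

    # ... while keeping the main Googlebot (and everyone else) away
    # from the expensive CGI only.
    User-agent: *
    Disallow: /bugzilla/buglist.cgi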
Commit: RX: Add peepholes for move followed by compare
Hi Guys,

  I am applying the patch below to add a peephole optimization to the RX backend. It was suggested by Kazuhiro Inaoka at Renesas Japan, and adapted by me to use the peephole2 system. It finds a register move followed by a comparison of the moved register against zero, and replaces the two instructions with a single addition instruction. The addition does not actually add anything, since the value being added is zero, but as a side effect it moves the register and performs the comparison.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Kazuhiro Inaoka  <kazuhiro.inaoka...@renesas.com>
	    Nick Clifton  <ni...@redhat.com>

	* config/rx/rx.md: Add peepholes to match a register move followed
	by a comparison of the moved register.  Replace these with an
	addition of zero that does both actions in one instruction.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173815)
+++ gcc/config/rx/rx.md	(working copy)
@@ -904,6 +904,39 @@
    (set_attr "length" "3,4,5,6,7,6")]
 )
 
+;; Peepholes to match:
+;;
+;;   (set (reg A) (reg B))
+;;   (set (CC) (compare:CC (reg A/reg B) (const_int 0)))
+;;
+;; and replace them with the addsi3_flags pattern, using an add
+;; of zero to copy the register and set the condition code bits.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+	(compare:CC (match_dup 0)
+		    (const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+		   (plus:SI (match_dup 1) (const_int 0)))
+	      (set (reg:CC_ZSC CC_REG)
+		   (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+				   (const_int 0)))])]
+)
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+	(compare:CC (match_dup 1)
+		    (const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+		   (plus:SI (match_dup 1) (const_int 0)))
+	      (set (reg:CC_ZSC CC_REG)
+		   (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+				   (const_int 0)))])]
+)
+
 (define_expand "adddi3"
   [(set (match_operand:DI 0 "register_operand")
	(plus:DI (match_operand:DI 1 "register_operand")
Commit: RX: Add peepholes to remove redundant extensions
Hi Guys,

  I am applying the patch below to add a couple of peephole optimizations to the RX backend. It seems that GCC does not cope very well with the RX's ability to perform either sign-extending or zero-extending loads, and so it can sometimes generate an extending load followed by a register-to-register extension. The peepholes match these cases and delete the unnecessary extension where possible.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  <ni...@redhat.com>

	* config/rx/rx.md: Add peepholes to remove redundant extensions
	after loads.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173819)
+++ gcc/config/rx/rx.md	(working copy)
@@ -1701,6 +1701,35 @@
	 (extend_types:SI (match_dup 1))))]
 )
 
+;; Convert:
+;;   (set (reg1) (sign_extend (mem)))
+;;   (set (reg2) (zero_extend (reg1)))
+;; into
+;;   (set (reg2) (zero_extend (mem)))
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(sign_extend:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_operand:SI 2 "register_operand")
+	(zero_extend:SI (match_operand:small_int_modes 3 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[3])
+   && (REGNO (operands[0]) == REGNO (operands[2])
+       || peep2_regno_dead_p (2, REGNO (operands[0])))"
+  [(set (match_dup 2)
+	(zero_extend:SI (match_dup 1)))]
+)
+
+;; Remove the redundant sign extension from:
+;;   (set (reg) (extend (mem)))
+;;   (set (reg) (extend (reg)))
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+	(extend_types:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_dup 0)
+	(extend_types:SI (match_operand:small_int_modes 2 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[2])"
+  [(set (match_dup 0) (extend_types:SI (match_dup 1)))]
+)
+
 (define_insn "comparesi3_<extend_types:code><small_int_modes:mode>"
   [(set (reg:CC CC_REG)
	(compare:CC (match_operand:SI 0 "register_operand" "=r")
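[Editor's note: for illustration, C of roughly this shape is the sort of thing that can leave the redundant pair behind — my example, assuming typical code generation, not taken from a PR.]

    /* The short may be fetched with a sign-extending load and then
       zero-extended for the unsigned result; the first new peephole
       folds the two into one zero-extending load when the intermediate
       register is dead.  */
    unsigned int
    load_u16 (short *p)
    {
      return (unsigned short) *p;
    }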
Commit: RX: Fix predicates for restricted memory patterns
Hi Guys,

  I am applying the patch below to fix a minor discrepancy in the rx.md file. Several patterns can only use restricted memory addresses. They have the correct Q constraint, but they were using the more permissive memory_operand predicate. The patch fixes these patterns by replacing memory_operand with rx_restricted_mem_operand.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  <ni...@redhat.com>

	* config/rx/rx.md (bitset_in_memory): Use rx_restricted_mem_operand.
	(bitinvert_in_memory): Likewise.
	(bitclr_in_memory): Likewise.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md	(revision 173820)
+++ gcc/config/rx/rx.md	(working copy)
@@ -1831,7 +1831,7 @@
 )
 
 (define_insn "*bitset_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
	(ior:QI (ashift:QI (const_int 1)
			   (match_operand:QI 1 "nonmemory_operand" "ri"))
		(match_dup 0)))]
@@ -1852,7 +1852,7 @@
 )
 
 (define_insn "*bitinvert_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
	(xor:QI (ashift:QI (const_int 1)
			   (match_operand:QI 1 "nonmemory_operand" "ri"))
		(match_dup 0)))]
@@ -1875,7 +1875,7 @@
 )
 
 (define_insn "*bitclr_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
	(and:QI (not:QI
		  (ashift:QI
		    (const_int 1)
Commit: RX: Include cost of register moving in the cost of register loading.
Hi Guys,

  I am applying the patch below to fix a bug in the rx_memory_move_cost function. The problem was that the costs are meant to be relative to the cost of moving a value between registers, but the existing definition was making stores cheaper than moves, and loads the same cost as moves. Thus gcc was sometimes choosing to store values in memory when it was actually better to keep them in registers. The patch fixes the problem by adding the register move cost to the memory move cost. It also removes the call to memory_move_secondary_cost, since there is no secondary cost.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  <ni...@redhat.com>

	* config/rx/rx.c (rx_memory_move_cost): Include cost of register
	moves.

Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c	(revision 173815)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2638,7 +2638,7 @@
 static int
 rx_memory_move_cost (enum machine_mode mode, reg_class_t regclass, bool in)
 {
-  return (in ? 2 : 0) + memory_move_secondary_cost (mode, regclass, in);
+  return (in ? 2 : 0) + REGISTER_MOVE_COST (mode, regclass, regclass);
}
 
 /* Convert a CC_MODE to the set of flags that it represents.  */
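[Editor's note: to spell out the arithmetic, with numbers assumed for illustration: if the register-register move cost for the class is the default 2, the old hook priced a store at 0 and a load at 2, i.e. at or below a plain move, so spilling could look free to the allocator; with the fix a load costs 2 + 2 = 4 and a store 0 + 2 = 2, keeping memory accesses at least as expensive as keeping the value in a register.]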
Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)
On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Richi, thank you for the detailed review!
>
> I'll plan to move the power-series expansion into the existing IL walk
> during pass_cse_sincos. As part of this, I'll move
> tree_expand_builtin_powi and its subfunctions from builtins.c into
> tree-ssa-math-opts.c. I'll submit this as a separate patch.
>
> I will also stop attempting to make code generation match completely at
> -O0. If there are tests in the test suite that fail only at -O0 due to
> these changes, I'll modify the tests to require -O1 or higher.
>
> I understand that you'd prefer that I leave the existing canonicalization
> folds in place, and only un-canonicalize them during my new pass (now,
> during cse_sincos). Actually, that was my first approach to this issue.
> The problem that I ran into is that the various folds are not performed
> just by the front end, but can pop up later, after my pass is done. In
> particular, pass_fold_builtins will undo my changes, turning expressions
> involving roots back into expressions involving pow/powi. It wasn't
> clear to me whether the folds could kick in elsewhere as well, so I took
> the approach of shutting them down. I see now that this does lose some
> optimizations such as pow(sqrt(cbrt(x)),6.0), as you pointed out.

Yeah, it's always a delicate balance between canonicalization and optimal form for further optimization. Did you really see sqrt(cbrt(x)) being transformed back to pow()? I would doubt that, as on gimple the foldings that require two function calls to match shouldn't trigger. Or do you see sqrt(x) turned into pow(x,0.5)? I see that the vectorizer for example handles both pow(x,0.5) and pow(x,2), so indeed that might happen.

I think what we might want to do is limit what the generic gimple fold_stmt folding does to function calls, also to avoid building regular GENERIC call statements again. But that might be a bigger project and certainly should be done separately. So I'd say don't worry about this issue for the initial patch but instead deal with it separately.

We also repeatedly thought about whether canonicalizing everything to pow is a good idea or not; especially our canonicalizing of x * x to pow (x, 2) leads to interesting effects in some cases, as several passes do not handle function calls very well. So I also thought about introducing a POW_EXPR tree code that would be easier in this regard and would be a more IL-friendly canonical form of the power-related functions.

> Should I attempt to leave the folds in place, and screen out the
> particular cases that are causing trouble in pass_fold_builtins? Or is
> it too fragile to try to catch all places where folds occur? If there's
> a flag that indicates parsing is complete, I suppose I could disable
> individual folds once we're into the optimizer. I'd appreciate your
> guidance.

Indeed restricting canonicalization to earlier passes would be the way to go I think. I will think of the best way to achieve this.

Richard.

> Thanks,
> Bill
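[Editor's note: for readers, the folds under discussion are of this kind — illustrative C, not from the thread.]

    double f (double x)
    {
      return x * x;                  /* may be canonicalized to pow (x, 2.0) */
    }

    double g (double x)
    {
      return __builtin_pow (x, 0.5); /* may be folded to sqrt (x) */
    }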
[PATCH][?/n] LTO type merging cleanup
This avoids the odd cases where gimple_register_canonical_type could end up running in cycles. I was able to reproduce this issue with an intermediate tree and LTO bootstrap. While the following patch is not the real fix (that one runs into a known cache-preloading issue again ...) it certainly makes a lot of sense and avoids the issue by design.

LTO bootstrapped on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-05-17  Richard Guenther  <rguent...@suse.de>

	* gimple.c (gimple_register_canonical_type): Use the main-variant
	leader for computing the canonical type.

Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c	(revision 173825)
--- gcc/gimple.c	(working copy)
*************** gimple_register_canonical_type (tree t)
*** 4856,4874 ****
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
!   /* Always register the type itself first so that if it turns out
!      to be the canonical type it will be the one we merge to as well.  */
!   t = gimple_register_type (t);
  
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
-   /* Always register the main variant first.  This is important so we
-      pick up the non-typedef variants as canonical, otherwise we'll end
-      up taking typedef ids for structure tags during comparison.  */
-   if (TYPE_MAIN_VARIANT (t) != t)
-     gimple_register_canonical_type (TYPE_MAIN_VARIANT (t));
- 
    if (gimple_canonical_types == NULL)
      gimple_canonical_types = htab_create_ggc (16381, gimple_canonical_type_hash,
					       gimple_canonical_type_eq, 0);
--- 4856,4869 ----
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
!   /* Use the leader of our main variant for determining our canonical
!      type.  The main variant leader is a type that will always
!      prevail.  */
!   t = gimple_register_type (TYPE_MAIN_VARIANT (t));
  
    if (TYPE_CANONICAL (t))
      return TYPE_CANONICAL (t);
  
    if (gimple_canonical_types == NULL)
      gimple_canonical_types = htab_create_ggc (16381, gimple_canonical_type_hash,
					       gimple_canonical_type_eq, 0);
Re: [PATCH][?/n] Cleanup LTO type merging
On Mon, 16 May 2011, H.J. Lu wrote:
> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther <rguent...@suse.de> wrote:
>> The following patch improves hashing types by re-instantiating the patch
>> that makes us visit aggregate target types of pointers and function
>> return and argument types. This halves the collision rate on the type
>> hash table for a linux-kernel build, improves WPA compile-time from
>> 3mins to 1min and reduces memory usage by 1GB for that testcase.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>> build-tested.
>>
>> Richard.
>>
>> (patch is reversed)
>>
>> 2011-05-16  Richard Guenther  <rguent...@suse.de>
>>
>> 	* gimple.c (iterative_hash_gimple_type): Re-instantiate change to
>> 	always visit pointer target and function result and argument types.
>
> This caused:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013

I have reverted the patch for now.

Richard.
Re: Reintroduce -mflat option on SPARC
> Right, the -mflat option should only be for the 32-bit SPARC target.

OK, let's keep it that way for now.

Another question: why does the model hijack %i7 to use it as frame pointer, instead of just using %fp? AFAICS both are kept as fixed registers by the code, so the model seems to be wasting 1 register (2 without frame pointer).

-- 
Eric Botcazou
Re: Don't let search bots look at buglist.cgi
Hi,

On Mon, 16 May 2011, Ian Lance Taylor wrote:

>> httpd being in the top-10 always, fiddling with bugzilla URLs? (Note, I
>> don't have access to gcc.gnu.org, I'm relaying info from multiple
>> instances of discussion on #gcc and richi poking on it; that said, it
>> still might not be web crawlers, that's right, but I'll happily accept
>> _any_ load improvement on gcc.gnu.org, however unfounded it might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar. It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.

Btw. FWIW, I had a quick look at one of the httpd log files, and in seven hours last Saturday (from 5:30 to 12:30) there were 435203 GET requests overall, and 391319 of them (90%) came from our own MnoGoSearch engine. Granted, many are then in fact 304 (not modified) responses, but still, perhaps the eagerness of our own crawler can be turned down a bit.

Ciao,
Michael.
Re: [PATCH][?/n] Cleanup LTO type merging
> On Mon, 16 May 2011, Jan Hubicka wrote:
>>> I've seen us merge different named structs which happen to reside on
>>> the same variant list. That's bogus, not only because we are adjusting
>>> TYPE_MAIN_VARIANT during incremental type-merging and fixup, so
>>> computing a persistent hash by looking at it looks fishy as well.
>>
>> Hi,
>> as reported on IRC earlier, I get the segfault while building libxul
>> due to an infinite recursion problem. I now however also get a lot
>> more of the following ICEs:
>>
>> In function '__unguarded_insertion_sort':
>> lto1: internal compiler error: in splice_child_die, at dwarf2out.c:8274
>>
>> previously it was reported once during the Mozilla build (and I put a
>> testcase into bugzilla), now it reproduces on many libraries. I did
>> not see this problem when applying only the SCC hashing change.
>
> This change causes us to preserve more TYPE_DECLs I think, so we might
> run more often into pre-existing debuginfo issues. Previously most of
> the types were merged into their nameless variant which probably didn't
> get output into debug info.
>
> Do you by chance have small testcases for your problems? ;)

I think you might just look into one at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48354

Honza
Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)
On Tue, 2011-05-17 at 11:03 +0200, Richard Guenther wrote:
> On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
> <wschm...@linux.vnet.ibm.com> wrote:
>> Richi, thank you for the detailed review!
>>
>> I'll plan to move the power-series expansion into the existing IL walk
>> during pass_cse_sincos. As part of this, I'll move
>> tree_expand_builtin_powi and its subfunctions from builtins.c into
>> tree-ssa-math-opts.c. I'll submit this as a separate patch.
>>
>> I will also stop attempting to make code generation match completely at
>> -O0. If there are tests in the test suite that fail only at -O0 due to
>> these changes, I'll modify the tests to require -O1 or higher.
>>
>> I understand that you'd prefer that I leave the existing
>> canonicalization folds in place, and only un-canonicalize them during
>> my new pass (now, during cse_sincos). Actually, that was my first
>> approach to this issue. The problem that I ran into is that the various
>> folds are not performed just by the front end, but can pop up later,
>> after my pass is done. In particular, pass_fold_builtins will undo my
>> changes, turning expressions involving roots back into expressions
>> involving pow/powi. It wasn't clear to me whether the folds could kick
>> in elsewhere as well, so I took the approach of shutting them down. I
>> see now that this does lose some optimizations such as
>> pow(sqrt(cbrt(x)),6.0), as you pointed out.
>
> Yeah, it's always a delicate balance between canonicalization and
> optimal form for further optimization. Did you really see sqrt(cbrt(x))
> being transformed back to pow()? I would doubt that, as on gimple the
> foldings that require two function calls to match shouldn't trigger.
> Or do you see sqrt(x) turned into pow(x,0.5)? I see that the vectorizer
> for example handles both pow(x,0.5) and pow(x,2), so indeed that might
> happen.

Yes, I was seeing sqrt(x) turned back into pow(x,0.5), and even x*x turning back into pow(x,2.0). I don't specifically recall the sqrt(cbrt(x)) case; you're probably right about that one. But I had several test cases break because of this.

> I think what we might want to do is limit what the generic gimple
> fold_stmt folding does to function calls, also to avoid building
> regular GENERIC call statements again. But that might be a bigger
> project and certainly should be done separately. So I'd say don't worry
> about this issue for the initial patch but instead deal with it
> separately.

Agreed...

> We also repeatedly thought about whether canonicalizing everything to
> pow is a good idea or not; especially our canonicalizing of x * x to
> pow (x, 2) leads to interesting effects in some cases, as several
> passes do not handle function calls very well. So I also thought about
> introducing a POW_EXPR tree code that would be easier in this regard
> and would be a more IL-friendly canonical form of the power-related
> functions.
>
>> Should I attempt to leave the folds in place, and screen out the
>> particular cases that are causing trouble in pass_fold_builtins? Or is
>> it too fragile to try to catch all places where folds occur? If
>> there's a flag that indicates parsing is complete, I suppose I could
>> disable individual folds once we're into the optimizer. I'd appreciate
>> your guidance.
>
> Indeed restricting canonicalization to earlier passes would be the way
> to go I think. I will think of the best way to achieve this.

Thanks. I think we need to address this as part of this patch, unless you're willing to live with a number of broken test cases in the meanwhile. If I only do the un-canonicalization in the new pass and let some of the folds be re-done later, some will fail. I'll start experimenting and see how many.

Bill
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
> 2011-05-16  Kai Tietz
>
> 	PR middle-end/48989
> 	* gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
> 	boolean_false_node instead of integer_zero_node.
> 	(convert_with_check): Likewise.
> 	* gcc-interface/decl.c (choices_to_gnu): Likewise.

OK for this part.

> 	* gcc-interface/misc.c (gnat_init): Set precision for generated
> 	boolean_type_node and initialize boolean_false_node.

Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.

-- 
Eric Botcazou
[PATCH][?/n] LTO type merging cleanup
This fixes an oversight in the new SCC hash mixing code - we of course need to return the adjusted hash of our type, not the purely local one. There's still something weird going on; hash values somehow depend on the order we feed it types ...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-05-17  Richard Guenther  <rguent...@suse.de>

	* gimple.c (iterative_hash_gimple_type): Simplify singleton case
	some more, fix final hash value of the non-singleton case.

Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 173827)
+++ gcc/gimple.c	(working copy)
@@ -4213,25 +4213,24 @@ iterative_hash_gimple_type (tree type, h
   if (state->low == state->dfsnum)
     {
       tree x;
-      struct sccs *cstate;
       struct tree_int_map *m;
 
       /* Pop off the SCC and set its hash values.  */
       x = VEC_pop (tree, *sccstack);
-      cstate = (struct sccs *)*pointer_map_contains (sccstate, x);
-      cstate->on_sccstack = false;
       /* Optimize SCC size one.  */
       if (x == type)
	{
+	  state->on_sccstack = false;
	  m = ggc_alloc_cleared_tree_int_map ();
	  m->base.from = x;
-	  m->to = cstate->u.hash;
+	  m->to = v;
	  slot = htab_find_slot (type_hash_cache, m, INSERT);
	  gcc_assert (!*slot);
	  *slot = (void *) m;
	}
       else
	{
+	  struct sccs *cstate;
	  unsigned first, i, size, j;
	  struct type_hash_pair *pairs;
	  /* Pop off the SCC and build an array of type, hash pairs.  */
@@ -4241,6 +4240,8 @@ iterative_hash_gimple_type (tree type, h
	  size = VEC_length (tree, *sccstack) - first + 1;
	  pairs = XALLOCAVEC (struct type_hash_pair, size);
	  i = 0;
+	  cstate = (struct sccs *)*pointer_map_contains (sccstate, x);
+	  cstate->on_sccstack = false;
	  pairs[i].type = x;
	  pairs[i].hash = cstate->u.hash;
	  do
@@ -4275,6 +4276,8 @@ iterative_hash_gimple_type (tree type, h
	      for (j = 0; pairs[j].hash != pairs[i].hash; ++j)
		hash = iterative_hash_hashval_t (pairs[j].hash, hash);
	      m->to = hash;
+	      if (pairs[i].type == type)
+		v = hash;
	      slot = htab_find_slot (type_hash_cache, m, INSERT);
	      gcc_assert (!*slot);
	      *slot = (void *) m;
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
2011/5/17 Eric Botcazou <ebotca...@adacore.com>:
>> 2011-05-16  Kai Tietz
>>
>> 	PR middle-end/48989
>> 	* gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
>> 	boolean_false_node instead of integer_zero_node.
>> 	(convert_with_check): Likewise.
>> 	* gcc-interface/decl.c (choices_to_gnu): Likewise.
>
> OK for this part.
>
>> 	* gcc-interface/misc.c (gnat_init): Set precision for generated
>> 	boolean_type_node and initialize boolean_false_node.
>
> Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.
>
> --
> Eric Botcazou

Hmm, sad. A check in tree-cfg that truth expressions have a type precision of 1 would be a good thing to have. What is actually the cause for not setting the type precision here? At least in the testcases I didn't find a regression caused by this.

Regards,
Kai
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 3:29 AM, Richard Guenther <rguent...@suse.de> wrote:
> On Mon, 16 May 2011, H.J. Lu wrote:
>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther <rguent...@suse.de> wrote:
>>> The following patch improves hashing types by re-instantiating the patch
>>> that makes us visit aggregate target types of pointers and function
>>> return and argument types. This halves the collision rate on the type
>>> hash table for a linux-kernel build, improves WPA compile-time from
>>> 3mins to 1min and reduces memory usage by 1GB for that testcase.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>>> build-tested.
>>>
>>> Richard.
>>>
>>> (patch is reversed)
>>>
>>> 2011-05-16  Richard Guenther  <rguent...@suse.de>
>>>
>>> 	* gimple.c (iterative_hash_gimple_type): Re-instantiate change to
>>> 	always visit pointer target and function result and argument types.
>>
>> This caused:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>
> I have reverted the patch for now.

It doesn't solve the problem and I reopened:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013

Your followup patches may have similar issues.

-- 
H.J.
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 5:59 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther <rguent...@suse.de> wrote:
>> On Mon, 16 May 2011, H.J. Lu wrote:
>>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther <rguent...@suse.de> wrote:
>>>> The following patch improves hashing types by re-instantiating the patch
>>>> that makes us visit aggregate target types of pointers and function
>>>> return and argument types. This halves the collision rate on the type
>>>> hash table for a linux-kernel build, improves WPA compile-time from
>>>> 3mins to 1min and reduces memory usage by 1GB for that testcase.
>>>>
>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>>>> build-tested.
>>>>
>>>> Richard.
>>>>
>>>> (patch is reversed)
>>>>
>>>> 2011-05-16  Richard Guenther  <rguent...@suse.de>
>>>>
>>>> 	* gimple.c (iterative_hash_gimple_type): Re-instantiate change to
>>>> 	always visit pointer target and function result and argument types.
>>>
>>> This caused:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>
>> I have reverted the patch for now.
>
> It doesn't solve the problem and I reopened:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>
> Your followup patches may have similar issues.

I think you reverted the WRONG patch:

http://gcc.gnu.org/viewcvs?view=revision&revision=173827

-- 
H.J.
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 3:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, May 17, 2011 at 5:59 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther <rguent...@suse.de> wrote:
>>> On Mon, 16 May 2011, H.J. Lu wrote:
>>>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther <rguent...@suse.de> wrote:
>>>>> The following patch improves hashing types by re-instantiating the patch
>>>>> that makes us visit aggregate target types of pointers and function
>>>>> return and argument types. This halves the collision rate on the type
>>>>> hash table for a linux-kernel build, improves WPA compile-time from
>>>>> 3mins to 1min and reduces memory usage by 1GB for that testcase.
>>>>>
>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>>>>> build-tested.
>>>>>
>>>>> Richard.
>>>>>
>>>>> (patch is reversed)
>>>>>
>>>>> 2011-05-16  Richard Guenther  <rguent...@suse.de>
>>>>>
>>>>> 	* gimple.c (iterative_hash_gimple_type): Re-instantiate change to
>>>>> 	always visit pointer target and function result and argument types.
>>>>
>>>> This caused:
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>>
>>> I have reverted the patch for now.
>>
>> It doesn't solve the problem and I reopened:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>
>> Your followup patches may have similar issues.
>
> I think you reverted the WRONG patch:
>
> http://gcc.gnu.org/viewcvs?view=revision&revision=173827

No, that was on purpose.
Re: [PATCH][?/n] Cleanup LTO type merging
On Tue, May 17, 2011 at 6:03 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Tue, May 17, 2011 at 3:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Tue, May 17, 2011 at 5:59 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther <rguent...@suse.de> wrote:
>>>> On Mon, 16 May 2011, H.J. Lu wrote:
>>>>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther <rguent...@suse.de> wrote:
>>>>>> The following patch improves hashing types by re-instantiating the patch
>>>>>> that makes us visit aggregate target types of pointers and function
>>>>>> return and argument types. This halves the collision rate on the type
>>>>>> hash table for a linux-kernel build, improves WPA compile-time from
>>>>>> 3mins to 1min and reduces memory usage by 1GB for that testcase.
>>>>>>
>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>>>>>> build-tested.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>> (patch is reversed)
>>>>>>
>>>>>> 2011-05-16  Richard Guenther  <rguent...@suse.de>
>>>>>>
>>>>>> 	* gimple.c (iterative_hash_gimple_type): Re-instantiate change to
>>>>>> 	always visit pointer target and function result and argument types.
>>>>>
>>>>> This caused:
>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>>>
>>>> I have reverted the patch for now.
>>>
>>> It doesn't solve the problem and I reopened:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>>
>>> Your followup patches may have similar issues.
>>
>> I think you reverted the WRONG patch:
>>
>> http://gcc.gnu.org/viewcvs?view=revision&revision=173827
>
> No, that was on purpose.

But it doesn't fix the problem.

-- 
H.J.
Re: FDO patch -- make ic related vars TLS if target allows
On Wed, Apr 27, 2011 at 10:54 AM, Xinliang David Li <davi...@google.com> wrote:
> Hi, please review the trivial patch below. It reduces race conditions
> in value profiling. Another trivial change (to initialize the
> function_list struct) is also included.
>
> Bootstrapped and regression tested on x86-64/linux.
>
> Thanks,
>
> David
>
> 2011-04-27  Xinliang David Li  <davi...@google.com>
>
> 	* tree-profile.c (init_ic_make_global_vars): Set tls attribute
> 	on ic vars.
> 	* coverage.c (coverage_end_function): Initialize function_list
> 	with zero.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49014

-- 
H.J.
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
> Hmm, sad. A check in tree-cfg that truth expressions have a type
> precision of 1 would be a good thing to have. What is actually the
> cause for not setting the type precision here?

But we are setting it:

  /* In Ada, we use an unsigned 8-bit type for the default boolean type.  */
  boolean_type_node = make_unsigned_type (8);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);

See make_unsigned_type:

/* Create and return a type for unsigned integers of PRECISION bits.  */

tree
make_unsigned_type (int precision)
{
  tree type = make_node (INTEGER_TYPE);

  TYPE_PRECISION (type) = precision;

  fixup_unsigned_type (type);
  return type;
}

The other languages are changing the precision, but in Ada we need a standard scalar (precision == mode size) in order to support invalid values.

> At least in the testcases I didn't find a regression caused by this.

Right, I've just installed the attached testcase; it passes with the unmodified compiler but fails with your gcc-interface/misc.c change.

2011-05-17  Eric Botcazou  <ebotca...@adacore.com>

	* gnat.dg/invalid1.adb: New test.

-- 
Eric Botcazou

-- { dg-do run }
-- { dg-options "-gnatws -gnatVa" }

pragma Initialize_Scalars;

procedure Invalid1 is
   X : Boolean;
   A : Boolean := False;

   procedure Uninit (B : out Boolean) is
   begin
      if A then
         B := True;
         raise Program_Error;
      end if;
   end;

begin
   --  first, check that initialize_scalars is enabled
   begin
      if X then
         A := False;
      end if;
      raise Program_Error;
   exception
      when Constraint_Error => null;
   end;

   --  second, check if copyback of an invalid value raises constraint error
   begin
      Uninit (A);
      if A then
         --  we expect constraint error in the 'if' above according to gnat ug:
         --
         --  call. Note that there is no specific option to test `out'
         --  parameters, but any reference within the subprogram will be
         --  tested in the usual manner, and if an invalid value is copied
         --  back, any reference to it will be subject to validity checking.
         --  ...
         raise Program_Error;
      end if;
      raise Program_Error;
   exception
      when Constraint_Error => null;
   end;
end;
Re: [PATCH] comment precising need to use free_dominance_info
So maybe this patch, adding a comment on calculate_dominance_info, is better suited.

ChangeLog:

2011-05-17  Pierre Vittet  <pier...@pvittet.com>

	* dominance.c (calculate_dominance_info): Add comment specifying
	when to free with free_dominance_info

contributor number: 634276

Index: gcc/dominance.c
===================================================================
--- gcc/dominance.c	(revision 173830)
+++ gcc/dominance.c	(working copy)
@@ -628,8 +628,15 @@ compute_dom_fast_query (enum cdi_direction dir)
 }
 
 /* The main entry point into this module.  DIR is set depending on whether
-   we want to compute dominators or postdominators.  */
+   we want to compute dominators or postdominators.
 
+   We try to keep dominance info alive as long as possible (to avoid
+   recomputing it often).  It has to be freed with free_dominance_info when
+   a CFG transformation makes it invalid.
+
+   Post-dominance info is less often used, and should be freed after each
+   use.  */
+
 void
 calculate_dominance_info (enum cdi_direction dir)
 {
RFA: MN10300: Add TLS support
Hi Richard, Hi Jeff, Hi Alex, Here is another MN10300 patch. This ones adds support for TLS. I must confess that I did not actually write this code - DJ did - but I have been asked to submit it upstream, so here goes: OK to apply ? Cheers Nick gcc/ChangeLog 2011-05-17 DJ Delorie d...@redhat.com Nick Clifton ni...@redhat.com * config/mn10300/mn10300.c (mn10300_unspec_int_label_counter): New variable. (mn10300_option_override): Disable TLS for the MN10300. (tls_symbolic_operand_kind): New function. (get_some_local_dynamic_name_1): New function. (get_some_local_dynamic_name): New function. (mn10300_print_operand): Handle %. (mn10300_legitimize_address): Legitimize TLS addresses. (is_legitimate_tls_operand): New function. (mn10300_legitimate_pic_operand_p): TLS operands are legitimate. (mn10300_legitimate_address_p): TLS symbols do not make legitimate addresses. Allow TLS operands under some circumstances. (mn10300_legitimate_constant_p): Handle TLS UNSPECs. (mn10300_init_machine_status): New function. (mn10300_init_expanders): New function. (pic_nonpic_got_ptr): New function. (mn10300_tls_get_addr): New function. (mn10300_legitimize_tls_address): New function. (mn10300_constant_address_p): New function. (TARGET_HAVE_TLS): Define. * config/mn10300/predicates.md (tls_symbolic_operand): New. (nontls_general_operand): New. * config/mn10300/mn10300.h (enum reg_class): Add D0_REGS, A0_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (struct machine_function): New structure. (INIT_EXPANDERS): Define. (mn10300_unspec_int_label_counter): New variable. (PRINT_OPERAND_PUNCT_VALID_P): Define. (CONSTANT_ADDRESS_P): Define. * config/mn10300/constraints (B): New constraint. (C): New constraint. * config/mn10300/mn10300-protos.h: Alpha sort. (mn10300_init_expanders): Prototype. (mn10300_tls_get_addr): Prototype. (mn10300_legitimize_tls_address): Prototype. (mn10300_constant_address_p): Prototype. * config/mn10300/mn10300.md (TLS_REG): New constant. (UNSPEC_INT_LABEL): New constant. (UNSPEC_TLSGD): New constant. (UNSPEC_TLSLDM): New constant. (UNSPEC_DTPOFF): New constant. (UNSPEC_GOTNTPOFF): New constant. (UNSPEC_INDNTPOFF): New constant. (UNSPEC_TPOFF): New constant. (UNSPEC_TLS_GD): New constant. (UNSPEC_TLS_LD_BASE): New constant. (movsi): Add TLS code. (tls_global_dynamic_i): New pattern. (tls_global_dynamic): New pattern. (tls_local_dynamic_base_i): New pattern. (tls_local_dynamic_base): New pattern. (tls_initial_exec): New pattern. (tls_initial_exec_1): New pattern. (tls_initial_exec_2): New pattern. (am33_set_got): New pattern. (int_label): New pattern. (am33_loadPC_anyreg): New pattern. (add_GOT_to_any_reg): New pattern. Index: gcc/config/mn10300/mn10300.c === --- gcc/config/mn10300/mn10300.c (revision 173815) +++ gcc/config/mn10300/mn10300.c (working copy) @@ -46,7 +46,12 @@ #include df.h #include opts.h #include cfgloop.h +#include ggc.h +/* This is used by GOTaddr2picreg to uniquely identify + UNSPEC_INT_LABELs. */ +int mn10300_unspec_int_label_counter; + /* This is used in the am33_2.0-linux-gnu port, in which global symbol names are not prefixed by underscores, to tell whether to prefix a label with a plus sign or not, so that the assembler can tell @@ -124,6 +129,9 @@ target_flags = ~MASK_MULT_BUG; else { + /* We can't do TLS if we don't have the TLS register. */ + targetm.have_tls = false; + /* Disable scheduling for the MN10300 as we do not have timing information available for it. 
	     */
 	  flag_schedule_insns = 0;
@@ -162,6 +170,51 @@
     fprintf (asm_out_file, "\t.am33\n");
 }
 
+/* Returns non-zero if OP has the KIND tls model.  */
+
+static inline bool
+tls_symbolic_operand_kind (rtx op, enum tls_model kind)
+{
+  if (GET_CODE (op) != SYMBOL_REF)
+    return false;
+  return SYMBOL_REF_TLS_MODEL (op) == kind;
+}
+
+/* Locate some local-dynamic symbol still in use by this function
+   so that we can print its name in some tls_local_dynamic_base
+   pattern.  This is used by %& in print_operand().  */
+
+static int
+get_some_local_dynamic_name_1 (rtx *px, void *data ATTRIBUTE_UNUSED)
+{
+  rtx x = *px;
+
+  if (GET_CODE (x) == SYMBOL_REF
+      && tls_symbolic_operand_kind (x, TLS_MODEL_LOCAL_DYNAMIC))
+    {
+      cfun->machine->some_ld_name = XSTR (x, 0);
+      return 1;
+    }
+
+  return 0;
+}
+
+static const char *
+get_some_local_dynamic_name (void)
+{
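As background for the pattern names in the ChangeLog above: they line up with the standard TLS access models, which user code selects per variable. A rough user-level sketch follows; the variable names are invented, and which model a plain __thread variable gets by default depends on -fPIC and symbol visibility.

__thread int gd_counter;            /* with -fPIC, typically global-dynamic
                                       (cf. tls_global_dynamic)           */
static __thread int ld_cache;       /* with -fPIC, typically local-dynamic
                                       (cf. tls_local_dynamic_base)       */
__thread int ie_tick
  __attribute__ ((tls_model ("initial-exec")));   /* cf. tls_initial_exec */

int
sum_tls (void)
{
  return gd_counter + ld_cache + ie_tick;
}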
[PATCH] Fixup LTO SCC hash comparison fn
Quite obvious if you look at it for the 100th time...

Richard.

2011-05-17  Richard Guenther  rguent...@suse.de

	* gimple.c (type_hash_pair_compare): Fix comparison.

Index: gcc/gimple.c
===
--- gcc/gimple.c (revision 173830)
+++ gcc/gimple.c (working copy)
@@ -4070,9 +4070,11 @@ type_hash_pair_compare (const void *p1_,
 {
   const struct type_hash_pair *p1 = (const struct type_hash_pair *) p1_;
   const struct type_hash_pair *p2 = (const struct type_hash_pair *) p2_;
-  if (p1->hash == p2->hash)
-    return TYPE_UID (p1->type) - TYPE_UID (p2->type);
-  return p1->hash - p2->hash;
+  if (p1->hash < p2->hash)
+    return -1;
+  else if (p1->hash > p2->hash)
+    return 1;
+  return 0;
 }
 
 /* Returning a hash value for gimple type TYPE combined with VAL.
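The bug is the classic subtraction comparator applied to unsigned hash values: the difference wraps modulo 2^32, so the sign of the truncated result no longer reflects the ordering, and qsort is handed an inconsistent comparator. A standalone illustration, with values chosen to trigger the wraparound (nothing below is from the patch):

#include <cstdio>

static int
bad_cmp (unsigned a, unsigned b)
{
  /* The wrapped difference is truncated to int; its sign is meaningless
     once the two values are more than INT_MAX apart.  */
  return (int) (a - b);
}

int
main ()
{
  unsigned lo = 1, hi = 0x80000001u;   /* hi > lo */
  /* On the usual two's-complement targets both calls print INT_MIN,
     i.e. each value compares "less than" the other.  */
  printf ("%d %d\n", bad_cmp (lo, hi), bad_cmp (hi, lo));
  return 0;
}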
[PING][PATCH 13/18] move TS_EXP to be a substructure of TS_TYPED
On 05/10/2011 04:18 PM, Nathan Froyd wrote: On 03/10/2011 11:23 PM, Nathan Froyd wrote: After all that, we can finally make tree_exp inherit from typed_tree. Quite anticlimactic. Ping. http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00559.html Ping^2. -Nathan
Re: Libiberty: POSIXify psignal definition
On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote: [Please keep me CCed, I'm not subscribed to gcc-patches. Thank you] Hi, the definition of psignal in libiberty is void psignal (int, char *); The correct definition per POSIX is void psignal (int, const char *); The below patch fixes that. Thanks, Corinna * strsignal.c (psignal): Change second parameter to const char *. Fix comment accordingly. OK. R.
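Reduced to its essence, the mismatch the patch removes looks like this when one translation unit sees both prototypes (a hypothetical reduction; the exact diagnostic wording varies by compiler version):

extern "C" void psignal (int, const char *);  /* POSIX/newlib prototype  */
extern "C" void psignal (int, char *);        /* old libiberty prototype */
/* g++: error: conflicting declaration of C function
   'void psignal(int, char*)'; a C compile reports
   "conflicting types for 'psignal'" instead.  */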
Re: Libiberty: POSIXify psignal definition
* strsignal.c (psignal): Change second parameter to const char *. Fix comment accordingly. OK. I had argued against this patch: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html The newlib change broke ALL released versions of gcc, and the above patch does NOT fix the problem, but merely hides it until the next time we trip over it.
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
2011/5/17 Eric Botcazou ebotca...@adacore.com:

Hmm, sad, as a check in tree-cfg that truth expressions have a type precision of 1 would be a good thing. What is the actual cause for not setting the type precision here?

But we are setting it:

  /* In Ada, we use an unsigned 8-bit type for the default boolean type.  */
  boolean_type_node = make_unsigned_type (8);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);

See make_unsigned_type:

/* Create and return a type for unsigned integers of PRECISION bits.  */

tree
make_unsigned_type (int precision)
{
  tree type = make_node (INTEGER_TYPE);

  TYPE_PRECISION (type) = precision;

  fixup_unsigned_type (type);
  return type;
}

The other languages are changing the precision, but in Ada we need a standard scalar (precision == mode size) in order to support invalid values.

At least in the testcases I didn't find a regression caused by this.

Right, I've just installed the attached testcase; it passes with the unmodified compiler but fails with your gcc-interface/misc.c change.

2011-05-17  Eric Botcazou  ebotca...@adacore.com

	* gnat.dg/invalid1.adb: New test.

-- Eric Botcazou

Ok, thanks for explaining it. So would the patch be OK to apply without the precision setting?

Regards,
Kai
Re: Libiberty: POSIXify psignal definition
On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote: * strsignal.c (psignal): Change second parameter to const char *. Fix comment accordingly. OK. I had argued against this patch: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html The newlib change broke ALL released versions of gcc, and the above patch does NOT fix the problem, but merely hides it until the next time we trip over it. So regardless of whether the changes to newlib are a good idea or not, I think the fix to libiberty is still right. POSIX says that psignal takes a const char *, and libiberty's implementation doesn't. That's just silly. I do agree that the newlib code should be tightened up, particularly in order to support older compilers; but that doesn't mean we shouldn't fix libiberty as well. R.
Re: Libiberty: POSIXify psignal definition
On May 17 16:33, Richard Earnshaw wrote: On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote: [Please keep me CCed, I'm not subscribed to gcc-patches. Thank you] Hi, the definition of psignal in libiberty is void psignal (int, char *); The correct definition per POSIX is void psignal (int, const char *); The below patch fixes that. Thanks, Corinna * strsignal.c (psignal): Change second parameter to const char *. Fix comment accordingly. OK. R. Thanks. I just have no check-in rights to the gcc repository. I applied the change to the sourceware CVS repository but for gcc I need a proxy. Thanks, Corinna -- Corinna Vinschen Cygwin Project Co-Leader Red Hat
Re: Libiberty: POSIXify psignal definition
On May 17 17:07, Richard Earnshaw wrote: On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote: * strsignal.c (psignal): Change second parameter to const char *. Fix comment accordingly. OK. I had argued against this patch: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html The newlib change broke ALL released versions of gcc, and the above patch does NOT fix the problem, but merely hides it until the next time we trip over it. So regardless of whether the changes to newlib are a good idea or not, I think the fix to libiberty is still right. POSIX says that psignal takes a const char *, and libiberty's implementation doesn't. That's just silly. I do agree that the newlib code should be tightened up, particularly in order to support older compilers; What I don't understand is why the newlib change broke older compilers. The function has been added to newlib and the definitions in newlib are correct. If this is referring to the fact that libiberty doesn't grok automatically if a symbol has been added to newlib, then that's a problem in libiberty, not in newlib. Otherwise, if you're building an older compiler, just use an older newlib as well. Corinna -- Corinna Vinschen Cygwin Project Co-Leader Red Hat
Re: Libiberty: POSIXify psignal definition
So regardless of whether the changes to newlib are a good idea or not, I think the fix to libiberty is still right. Irrelevant. I said I'd accept that change *after* the real problem is fixed. The real problem hasn't been fixed. The real problem is that libiberty should NOT INCLUDE PSIGNAL AT ALL if newlib has it. What *should* have happened is that libiberty should have been fixed *first*, and newlib waited until a gcc/binutils release cycle happened, so that at least ONE version of those could build with newlib.
Re: Libiberty: POSIXify psignal definition
Thanks. I just have no check in rights to the gcc repository. I applied the change to the sourceware CVS repository but for gcc I need a proxy. Please, never apply libiberty patches only to src. They're likely to get deleted by the robomerge. The rule is: gcc only, or both at the same time.
Re: Libiberty: POSIXify psignal definition
What I don't understand is why the newlib change broke older compilers. Older compilers have the older libiberty. At the moment, libiberty cannot be built by *any* released gcc, because you cannot *build* any released gcc, because it cannot build its target libiberty. The function has been added to newlib and the definitions in newlib are correct. Correct is irrelevant. They don't match libiberty, so the build breaks. If this is referring to the fact that libiberty doesn't grok automatically if a symbol has been added to newlib, then that's a problem in libiberty, not in newlib. It's a problem in every released gcc at the moment, so no released gcc can be built for a newlib target, without hacking the sources. Otherwise, if you're building an older compiler, just use an older newlib as well. The only option here is to not release a newlib at all until a fixed gcc release happens, then, and require that fixed gcc for that version of newlib forward.
Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler
On 05/14/2011 09:40 PM, Janne Blomqvist wrote: Hi, the current version of showing the backtrace is not async-signal-safe as it uses backtrace_symbols() which, in turn, uses malloc(). The attached patch changes the backtrace printing functionality to instead use backtrace_symbols_fd() and pipes. Great - this would solve a problem I filed a bugzilla report for years ago (unfortunately, I do not know the number of it). I closed it WONTFIX, because neither FX nor I could come up with an alternative way *not* using malloc. [ The problem was getting a traceback after corruption of the malloc arena, which just hangs under the current implementation. ] -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
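For reference, the async-signal-safe shape of such a handler, as a minimal sketch against the glibc API rather than libgfortran's actual code: backtrace_symbols_fd writes each frame straight to a file descriptor, so no malloc happens at signal time.

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

extern "C" void
fatal_handler (int sig)
{
  void *frames[64];
  int n = backtrace (frames, 64);
  backtrace_symbols_fd (frames, n, STDERR_FILENO);
  _exit (128 + sig);  /* _exit, not exit: exit() runs non-safe cleanup */
}

int
main ()
{
  /* A real handler would also call backtrace() once at startup, since
     glibc may allocate on the first call while loading libgcc.  */
  signal (SIGSEGV, fatal_handler);
  /* ... */
  return 0;
}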
[PATCH, i386]: Trivial, use bool some more.
Hello! 2011-05-16 Uros Bizjak ubiz...@gmail.com * config/i386/i386-protos.h (output_fix_trunc): Change arg 3 to bool. (output_fp_compare): Change args 3 and 4 to bool. (ix86_expand_call): Change arg 6 to bool. (ix86_attr_length_immediate_default): Change arg 2 to bool. (ix86_attr_length_vex_default): Change arg 3 to bool. * config/i386/i386.md: Update all uses. * config/i386/i386.c: Ditto. (ix86_flags_dependent): Change return type to bool. Patch was tested on x86_64-pc-linux-gnu {,-m32}, also with --enable-build-with-cxx (additional patch is needed to bootstrap without errors ATM). Committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 173832) +++ config/i386/i386.md (working copy) @@ -414,9 +414,9 @@ (const_int 0) (eq_attr type alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1, imul,icmp,push,pop) - (symbol_ref ix86_attr_length_immediate_default(insn,1)) + (symbol_ref ix86_attr_length_immediate_default (insn, true)) (eq_attr type imov,test) - (symbol_ref ix86_attr_length_immediate_default(insn,0)) + (symbol_ref ix86_attr_length_immediate_default (insn, false)) (eq_attr type call) (if_then_else (match_operand 0 constant_call_address_operand ) (const_int 4) @@ -524,11 +524,11 @@ (if_then_else (and (eq_attr prefix_0f 1) (eq_attr prefix_extra 0)) (if_then_else (eq_attr prefix_vex_w 1) - (symbol_ref ix86_attr_length_vex_default (insn, 1, 1)) - (symbol_ref ix86_attr_length_vex_default (insn, 1, 0))) + (symbol_ref ix86_attr_length_vex_default (insn, true, true)) + (symbol_ref ix86_attr_length_vex_default (insn, true, false))) (if_then_else (eq_attr prefix_vex_w 1) - (symbol_ref ix86_attr_length_vex_default (insn, 0, 1)) - (symbol_ref ix86_attr_length_vex_default (insn, 0, 0) + (symbol_ref ix86_attr_length_vex_default (insn, false, true)) + (symbol_ref ix86_attr_length_vex_default (insn, false, false) ;; Set when modrm byte is used. 
(define_attr modrm @@ -1262,7 +1262,7 @@ UNSPEC_FNSTSW))] X87_FLOAT_MODE_P (GET_MODE (operands[1])) GET_MODE (operands[1]) == GET_MODE (operands[2]) - * return output_fp_compare (insn, operands, 0, 0); + * return output_fp_compare (insn, operands, false, false); [(set_attr type multi) (set_attr unit i387) (set (attr mode) @@ -1309,7 +1309,7 @@ (match_operand:XF 2 register_operand f))] UNSPEC_FNSTSW))] TARGET_80387 - * return output_fp_compare (insn, operands, 0, 0); + * return output_fp_compare (insn, operands, false, false); [(set_attr type multi) (set_attr unit i387) (set_attr mode XF)]) @@ -1343,7 +1343,7 @@ (match_operand:MODEF 2 nonimmediate_operand fm))] UNSPEC_FNSTSW))] TARGET_80387 - * return output_fp_compare (insn, operands, 0, 0); + * return output_fp_compare (insn, operands, false, false); [(set_attr type multi) (set_attr unit i387) (set_attr mode MODE)]) @@ -1378,7 +1378,7 @@ UNSPEC_FNSTSW))] X87_FLOAT_MODE_P (GET_MODE (operands[1])) GET_MODE (operands[1]) == GET_MODE (operands[2]) - * return output_fp_compare (insn, operands, 0, 1); + * return output_fp_compare (insn, operands, false, true); [(set_attr type multi) (set_attr unit i387) (set (attr mode) @@ -1428,7 +1428,7 @@ X87_FLOAT_MODE_P (GET_MODE (operands[1])) (TARGET_USE_MODEMODE_FIOP || optimize_function_for_size_p (cfun)) (GET_MODE (operands [3]) == GET_MODE (operands[1])) - * return output_fp_compare (insn, operands, 0, 0); + * return output_fp_compare (insn, operands, false, false); [(set_attr type multi) (set_attr unit i387) (set_attr fp_int_src true) @@ -1504,7 +1504,7 @@ TARGET_MIX_SSE_I387 SSE_FLOAT_MODE_P (GET_MODE (operands[0])) GET_MODE (operands[0]) == GET_MODE (operands[1]) - * return output_fp_compare (insn, operands, 1, 0); + * return output_fp_compare (insn, operands, true, false); [(set_attr type fcmp,ssecomi) (set_attr prefix orig,maybe_vex) (set (attr mode) @@ -1533,7 +1533,7 @@ TARGET_SSE_MATH SSE_FLOAT_MODE_P (GET_MODE (operands[0])) GET_MODE (operands[0]) == GET_MODE (operands[1]) - * return output_fp_compare (insn, operands, 1, 0); + * return output_fp_compare (insn, operands, true, false); [(set_attr type ssecomi) (set_attr prefix maybe_vex) (set (attr mode) @@ -1557,7 +1557,7 @@ TARGET_CMOVE !(SSE_FLOAT_MODE_P (GET_MODE (operands[0])) TARGET_SSE_MATH) GET_MODE (operands[0]) == GET_MODE (operands[1]) - * return output_fp_compare (insn, operands, 1, 0); + * return output_fp_compare (insn, operands, true, false); [(set_attr type fcmp) (set (attr mode) (cond
[PATCH]: Restore bootstrap with --enable-build-with-cxx
Hello!

2011-05-17  Uros Bizjak  ubiz...@gmail.com

	* ipa-inline-analysis.c (inline_node_duplication_hook): Initialize
	info->entry with 0.
	* tree-inline.c (maybe_inline_call_in_expr): Initialize
	id.transform_lang_insert_block with NULL.

Tested on x86_64-pc-linux-gnu {,-m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious.

Uros.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c (revision 173832)
+++ ipa-inline-analysis.c (working copy)
@@ -702,7 +702,7 @@ inline_node_duplication_hook (struct cgr
   bool inlined_to_p = false;
   struct cgraph_edge *edge;
 
-  info->entry = false;
+  info->entry = 0;
   VEC_safe_grow_cleared (tree, heap, known_vals, count);
   for (i = 0; i < count; i++)
     {
Index: tree-inline.c
===
--- tree-inline.c (revision 173832)
+++ tree-inline.c (working copy)
@@ -5232,7 +5232,7 @@ maybe_inline_call_in_expr (tree exp)
   id.transform_call_graph_edges = CB_CGE_DUPLICATE;
   id.transform_new_cfg = false;
   id.transform_return_to_modify = true;
-  id.transform_lang_insert_block = false;
+  id.transform_lang_insert_block = NULL;
 
   /* Make sure not to unshare trees behind the front-end's back
      since front-end specific mechanisms may rely on sharing.  */
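The C/C++ difference behind these two hunks, reduced to a few lines. The struct below is a hypothetical stand-in for the real copy_body_data, and the quoted diagnostic is approximate:

typedef void (*insert_block_fn) (void *);

struct mock_copy_body_data          /* stand-in for the real struct */
{
  insert_block_fn transform_lang_insert_block;
};

void
init (struct mock_copy_body_data *id)
{
  id->transform_lang_insert_block = NULL;    /* accepted as C and as C++ */
  /* id->transform_lang_insert_block = false;
     accepted by a C compiler (false is just 0), but g++ complains about
     converting 'false' to pointer type, which is fatal under -Werror.  */
}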
[PATCH, MELT] correcting path error in the Makefile.in
This patch corrects a bug in the current revision of MELT which was preventing MELT from running correctly: a path problem in gcc/Makefile.in (melt-modules/ and melt-module.mk were not found).

My contributor number is 634276.

changelog :
2011-05-17  Pierre Vittet  pier...@pvittet.com

	* Makefile.in : Correct path errors for melt_module_dir and for
	install-melt-mk target

Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 173832)
+++ gcc/Makefile.in (working copy)
@@ -5352,7 +5352,7 @@
 melt_default_modules_list=melt-default-modules
 melt_source_dir=$(libexecsubdir)/melt-source/
 ## this is the installation directory of melt dynamic modules (*.so)
-melt_module_dir=$(libexecsubdir)/melt-module/
+melt_module_dir=$(libexecsubdir)/melt-modules/
 ## this is the installed path of the MELT module makefile
 melt_installed_module_makefile=$(libexecsubdir)/melt-module.mk
@@ -5416,8 +5416,8 @@ install-melt-modules: melt-modules melt-all-module
 ## install the makefile for MELT modules
 install-melt-mk: melt-module.mk
-	$(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
-	$(INSTALL_DATA) $< $(DESTDIR)/$(plugin_includedir)/
+	$(mkinstalldirs) $(DESTDIR)$(libexecsubdir)
+	$(INSTALL_DATA) $< $(DESTDIR)/$(libexecsubdir)/
 ## install the default modules list
 install-melt-default-modules-list: $(melt_default_modules_list).modlis
Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler
On 05/17/2011 07:50 PM, Toon Moene wrote: On 05/14/2011 09:40 PM, Janne Blomqvist wrote: Hi, the current version of showing the backtrace is not async-signal-safe as it uses backtrace_symbols() which, in turn, uses malloc(). The attached patch changes the backtrace printing functionality to instead use backtrace_symbols_fd() and pipes. Great - this would solve a problem I filed a bugzilla report for years ago (unfortunately, I do not know the number of it). It was 33905 (2007-10-26). -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH, MELT] correcting path error in the Makefile.in
On Tue, 17 May 2011 21:30:44 +0200 Pierre Vittet pier...@pvittet.com wrote: This patch correct a bug in the current revision of MELT, which was preventing MELT to run correctly. This was a path problem in gcc/Makefile.in (melt-modules/ and melt-modules.mk) were not found. My contributor number is 634276. changelog : 2011-05-17 Pierre Vittet pier...@pvittet.com * Makefile.in : Correct path errors for melt_module_dir and for install-melt-mk target The ChangeLog.MELT entry should mention the Makefile target as changelog functions. And the colon shouldn't have any space before. So I applied the patch with the following entry: 2011-05-17 Pierre Vittet pier...@pvittet.com * Makefile.in (melt_module_dir,install-melt-mk): Correct path errors. Committed revision 173835. Thanks. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On 05/17/2011 08:32 PM, Uros Bizjak wrote: Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious. Does that mean that I can now remove the --disable-werror from my daily C++ bootstrap run ? It's great that some people understand the intricacies of the infight^H^H^H^H^H^H differences between the C and C++ type model. OK: 1/2 :-) -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Restore MIPS builds
I've applied the patch below to restore -Werror MIPS builds. Tested on mips64-linux-gnu. Richard gcc/ * config/mips/mips.c (mips_handle_option): Remove unused variable. Index: gcc/config/mips/mips.c === --- gcc/config/mips/mips.c 2011-05-15 08:37:21.0 +0100 +++ gcc/config/mips/mips.c 2011-05-15 08:37:28.0 +0100 @@ -15287,7 +15287,6 @@ mips_handle_option (struct gcc_options * location_t loc ATTRIBUTE_UNUSED) { size_t code = decoded-opt_index; - const char *arg = decoded-arg; switch (code) {
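The failure mode being fixed, in miniature; a hypothetical reduction, not the mips.c code itself:

static int
handle_option_like (int code, const char *decoded_arg)
{
  const char *arg = decoded_arg;  /* declared, initialized, never read */
  /* -Wall: warning: unused variable 'arg' [-Wunused-variable];
     with -Werror this becomes a hard build failure.  */
  return code;
}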
[PATCH] fix vfmsubaddpd/vfmaddsubpd generation
This patch fixes an obvious problem: the fma4_fmsubadd/fma4_fmaddsub instruction templates don't generate vfmsubaddpd/vfmaddsubpd because they don't use <ssemodesuffix>.

This passes bootstrap on x86_64 on trunk. Okay to commit?

BTW, I'm testing on gcc-4_6-branch. Should I post a different patch thread, or just use this one?

--
Quentin

From aa70d4f6180f1c6712888b7328723232b5da8bdc Mon Sep 17 00:00:00 2001
From: Quentin Neill quentin.ne...@amd.com
Date: Tue, 17 May 2011 10:24:17 -0500
Subject: [PATCH] 2011-05-17  Harsha Jagasia  harsha.jaga...@amd.com

	* config/i386/sse.md (fma4_fmsubadd): Use <ssemodesuffix>.
	(fma4_fmaddsub): Likewise

---
 gcc/ChangeLog          |    5 +++++
 gcc/config/i386/sse.md |    4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3625d9b..e86ea4e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-17  Harsha Jagasia  harsha.jaga...@amd.com
+
+	* config/i386/sse.md (fma4_fmsubadd): Use <ssemodesuffix>.
+	(fma4_fmaddsub): Likewise
+
 2011-05-17  Richard Guenther  rguent...@suse.de
 
 	* gimple.c (iterative_hash_gimple_type): Simplify singleton
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 291bffb..7c4e6dd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1663,7 +1663,7 @@
	   (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
	  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
@@ -1676,7 +1676,7 @@
	     (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
	  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
--
1.7.1
[v3] tuple vs noexcept
Hi, this time too, took the occasion to add the get(tuple) bits. Tested x86_64-linux, committed. Paolo. /// 2011-05-17 Paolo Carlini paolo.carl...@oracle.com * include/std/tuple: Use noexcept where appropriate. (tuple::swap): Rework implementation. (_Head_base::_M_swap_impl): Remove. (get(std::tuple)): Add. * testsuite/20_util/tuple/element_access/get2.cc: New. * testsuite/20_util/weak_ptr/comparison/cmp_neg.cc: Adjust dg-error line number. Index: include/std/tuple === --- include/std/tuple (revision 173832) +++ include/std/tuple (working copy) @@ -59,6 +59,15 @@ struct __add_ref_Tp { typedef _Tp type; }; + // Adds an rvalue reference to a non-reference type. + templatetypename _Tp +struct __add_r_ref +{ typedef _Tp type; }; + + templatetypename _Tp +struct __add_r_ref_Tp +{ typedef _Tp type; }; + templatestd::size_t _Idx, typename _Head, bool _IsEmpty struct _Head_base; @@ -78,13 +87,6 @@ _Head _M_head() { return *this; } const _Head _M_head() const { return *this; } - - void - _M_swap_impl(_Head __h) - { - using std::swap; - swap(__h, _M_head()); - } }; templatestd::size_t _Idx, typename _Head @@ -103,13 +105,6 @@ _Head _M_head() { return _M_head_impl; } const _Head _M_head() const { return _M_head_impl; } - void - _M_swap_impl(_Head __h) - { - using std::swap; - swap(__h, _M_head()); - } - _Head _M_head_impl; }; @@ -130,9 +125,11 @@ */ templatestd::size_t _Idx struct _Tuple_impl_Idx -{ +{ + templatestd::size_t, typename... friend class _Tuple_impl; + protected: - void _M_swap_impl(_Tuple_impl) { /* no-op */ } + void _M_swap(_Tuple_impl) noexcept { /* no-op */ } }; /** @@ -145,6 +142,8 @@ : public _Tuple_impl_Idx + 1, _Tail..., private _Head_base_Idx, _Head, std::is_empty_Head::value { + templatestd::size_t, typename... friend class _Tuple_impl; + typedef _Tuple_impl_Idx + 1, _Tail... _Inherited; typedef _Head_base_Idx, _Head, std::is_empty_Head::value _Base; @@ -218,10 +217,14 @@ protected: void - _M_swap_impl(_Tuple_impl __in) + _M_swap(_Tuple_impl __in) + noexcept(noexcept(swap(std::declval_Head(), +std::declval_Head())) + noexcept(__in._M_tail()._M_swap(__in._M_tail( { - _Base::_M_swap_impl(__in._M_head()); - _Inherited::_M_swap_impl(__in._M_tail()); + using std::swap; + swap(this-_M_head(), __in._M_head()); + _Inherited::_M_swap(__in._M_tail()); } }; @@ -300,14 +303,15 @@ void swap(tuple __in) - { _Inherited::_M_swap_impl(__in); } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; template class tuple { public: - void swap(tuple) { /* no-op */ } + void swap(tuple) noexcept { /* no-op */ } }; /// tuple (2-element), with construction and assignment from a pair. @@ -360,6 +364,7 @@ tuple operator=(tuple __in) + // noexcept has to wait is_nothrow_move_assignable { static_cast_Inherited(*this) = std::move(__in); return *this; @@ -392,7 +397,7 @@ templatetypename _U1, typename _U2 tuple -operator=(pair_U1, _U2 __in) +operator=(pair_U1, _U2 __in) noexcept { this-_M_head() = std::forward_U1(__in.first); this-_M_tail()._M_head() = std::forward_U2(__in.second); @@ -401,11 +406,8 @@ void swap(tuple __in) - { - using std::swap; - swap(this-_M_head(), __in._M_head()); - swap(this-_M_tail()._M_head(), __in._M_tail()._M_head()); - } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; /// tuple (1-element). @@ -473,7 +475,8 @@ void swap(tuple __in) - { _Inherited::_M_swap_impl(__in); } + noexcept(noexcept(__in._M_swap(__in))) + { _Inherited::_M_swap(__in); } }; @@ -522,22 +525,31 @@ __get_helper(const _Tuple_impl__i, _Head, _Tail... 
__t) { return __t._M_head(); } - // Return a reference (const reference) to the ith element of a tuple. - // Any const or non-const ref elements are returned with their original type. + // Return a reference (const reference, rvalue reference) to the ith element + // of a tuple. Any const or non-const ref elements are returned with their + // original type. templatestd::size_t __i, typename... _Elements inline typename __add_ref - typename tuple_element__i, tuple_Elements... ::type + typename tuple_element__i,
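As a usage note, here is what the new get overload on rvalue tuples buys in user code; a small sketch, to be compiled with -std=c++0x on a compiler of this vintage:

#include <string>
#include <tuple>
#include <utility>

int
main ()
{
  std::tuple<int, std::string> t (1, "payload");

  /* get on an rvalue tuple now yields an rvalue reference, so the
     string is moved out of the expiring tuple rather than copied.  */
  std::string s = std::get<1> (std::move (t));

  return s == "payload" ? 0 : 1;
}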
Fix PR 49026 (-mfpmath= attribute bug)
PR 49026 identified testsuite regressions when mfpmath= is set by target attributes, that for some reason appear on x86_64-darwin but not x86_64-linux. This patch fixes one place where I failed to preserve the logic of this attribute handling, and restores the code generated for the testcase to the code attached to that PR as being generated before my previous patch.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline.

2011-05-17  Joseph Myers  jos...@codesourcery.com

	* config/i386/i386.c (ix86_valid_target_attribute_tree): Use
	enum_opts_set when testing if attributes have set -mfpmath=.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c (revision 173809)
+++ gcc/config/i386/i386.c (working copy)
@@ -4692,7 +4692,7 @@ ix86_valid_target_attribute_tree (tree a
       || target_flags != def->x_target_flags
       || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
       || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
-      || ix86_fpmath != def->x_ix86_fpmath)
+      || enum_opts_set.x_ix86_fpmath)
     {
       /* If we are using the default tune= or arch=, undo the string
	  assigned, and use the default.  */

-- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On Tue, May 17, 2011 at 2:46 PM, Toon Moene t...@moene.org wrote: On 05/17/2011 08:32 PM, Uros Bizjak wrote: Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious. Does that mean that I can now remove the --disable-werror from my daily C++ bootstrap run ? It's great that some people understand the intricacies of the infight^H^H^H^H^H^H differences between the C and C++ type model. OK: 1/2 :-) I suspect this infight would vanish if we just switched, as we discussed in the past. -- Gaby
Re: [google] Parameterize function overhead estimate for inlining
You will have a followup patch to override arm defaults, right?

Ok for google/main.

Thanks,

David

On Tue, May 17, 2011 at 9:29 PM, Mark Heffernan meh...@google.com wrote:

This tiny change improves the size estimation for inlining and results in an average 1% size reduction and a small (maybe 0.25% geomean) performance increase on internal benchmarks on x86-64. I parameterized the value rather than changing it directly because previous exploration with x86 and ARM arches indicated that it varies significantly with architecture. Default value is tuned for x86-64.

Bootstrapped and tested on x86-64. Will explore relevance and effectiveness for trunk and SPEC later.

Ok for google/main?

Mark

2011-05-17  Mark Heffernan  meh...@google.com

	* ipa-inline.c (estimate_function_body_sizes): Parameterize static
	function static overhead.
	* params.def (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE): New parameter.

Index: ipa-inline.c
===
--- ipa-inline.c (revision 173845)
+++ ipa-inline.c (working copy)
@@ -1979,10 +1979,11 @@ estimate_function_body_sizes (struct cgr
   gcov_type time = 0;
   gcov_type time_inlining_benefit = 0;
   /* Estimate static overhead for function prologue/epilogue and alignment.  */
-  int size = 2;
+  int size = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
   /* Benefits are scaled by probability of elimination that is in range
      <0,2>.  */
-  int size_inlining_benefit = 2 * 2;
+  int size_inlining_benefit =
+    PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE) * 2;
   basic_block bb;
   gimple_stmt_iterator bsi;
   struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
Index: params.def
===
--- params.def (revision 173845)
+++ params.def (working copy)
@@ -110,6 +110,11 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO
	  "Inline recursively only when the probability of call being executed exceeds the parameter",
	  10, 0, 0)
 
+DEFPARAM (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE,
+	  "inline-function-overhead-size",
+	  "Size estimate of function overhead (prologue and epilogue) for inlining purposes",
+	  7, 0, 0)
+
 /* Limit of iterations of early inliner.  This basically bounds number of
    nested indirect calls early inliner can resolve.  Deeper chains are still
    handled by late inlining.  */
[google] Increase inlining limits with FDO/LIPO
This small patch greatly expands the function size limits for inlining with FDO/LIPO. With profile information, the inliner is much more selective and precise and so the limits can be increased with less worry that functions and total code size will blow up. This speeds up x86-64 internal benchmarks by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to 1.5% with FDO. Size increase is negligible (0.1% mean).

Bootstrapped and regression tested on x86-64. Trunk testing to follow.

Ok for google/main?

Mark

2011-05-17  Mark Heffernan  meh...@google.com

	* opts.c (finish_options): Increase inlining limits with profile
	generate and use.

Index: opts.c
===
--- opts.c (revision 173666)
+++ opts.c (working copy)
@@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
	  opts->x_flag_split_stack = 0;
	}
     }
+
+  if (opts->x_flag_profile_use
+      || opts->x_profile_arc_flag
+      || opts->x_flag_profile_values)
+    {
+      /* With accurate profile information, inlining is much more
+	 selective and makes better decisions, so increase the
+	 inlining function size limits.  Changes must be added to both
+	 the generate and use builds to avoid profile mismatches.  */
+      maybe_set_param_value
+	(PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
+	 opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value
+	(PARAM_MAX_INLINE_INSNS_AUTO, 1000,
+	 opts->x_param_values, opts_set->x_param_values);
+    }
 }