Remove obsolete comment
The first part is obsolete since r91570 at least; for the second part about MEM_KEEP_ALIAS_SET_P it's more recent. Tested on x86_64-suse-linux, applied on the mainline.

2013-09-26  Eric Botcazou  ebotca...@adacore.com

	* expr.c (expand_assignment): Remove obsolete comment.

-- 
Eric Botcazou

Index: expr.c
===================================================================
--- expr.c	(revision 202912)
+++ expr.c	(working copy)
@@ -4836,11 +4836,7 @@ expand_assignment (tree to, tree from, b
 	  /* If the field is at offset zero, we could have been given the
 	     DECL_RTX of the parent struct.  Don't munge it.  */
 	  to_rtx = shallow_copy_rtx (to_rtx);
 	  set_mem_attributes_minus_bitpos (to_rtx, to, 0, bitpos);
-
-	  /* Deal with volatile and readonly fields.  The former is only
-	     done for MEM.  Also set MEM_KEEP_ALIAS_SET_P if needed.  */
 	  if (volatilep)
 	    MEM_VOLATILE_P (to_rtx) = 1;
 	}
Re: [PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP
Hi,

This is the updated patch for expanding gimple stmts without zero/sign extensions when it is safe to do that. This is based on the latest changes to propagating value range information to SSA_NAMEs and addresses review comments from Eric.

Bootstrapped and regtested on x86_64-unknown-linux-gnu and arm-none-linux-gnueabi. Is this OK?

Thanks,
Kugan

ChangeLog
Description: Binary data

patch.diff
Description: Binary data
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi,

On Wed, 25 Sep 2013 16:01:02, Martin Jambor wrote:

Hi,

On Wed, Sep 25, 2013 at 11:05:21AM +0200, Richard Biener wrote:

On Tue, Sep 24, 2013 at 8:06 PM, Martin Jambor mjam...@suse.cz wrote:

Hi,

On Tue, Sep 24, 2013 at 12:02:17PM +0200, Richard Biener wrote:

On Tue, Sep 24, 2013 at 4:52 AM, Bernd Edlinger bernd.edlin...@hotmail.de wrote:

Hi,

with the attached patch the read side of the out-of-bounds accesses is fixed. There is a single new test case, pr57748-3.c, that is derived from Martin's test case. The test case only tests the read access and does not depend on part 1 of the patch.

This patch was boot-strapped and regression-tested on x86_64-unknown-linux-gnu. Additionally I generated eCos and an eCos application (on ARMv5 using packed structures) with an arm-eabi cross compiler, and looked for differences in the disassembled code with and without this patch, but there were none.

OK for trunk?

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 202778)
+++ gcc/expr.c	(working copy)
@@ -9878,7 +9878,7 @@
 			      modifier != EXPAND_STACK_PARM
 			      ? target : NULL_RTX),
 			     VOIDmode,
-			     modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
+			     EXPAND_MEMORY);

 	/* If the bitfield is volatile, we want to access it in the
 	   field's mode, not the computed mode.

The context suggests that we may arrive with EXPAND_STACK_PARM here, which is a correctness modifier (see its docs). But I'm not too familiar with the details of the various expand modifiers; Eric may be, though.

That said, I still believe that fixing the misalign path in expand_assignment would be better than trying to avoid it. For this testcase the issue is again that expand_assignment passes the wrong mode/target to the movmisalign optab.
Perhaps I'm stating the obvious, but this is supposed to address a separate bug in the expansion of the RHS (as opposed to the first bug, which was in the expansion of the LHS), and so we would have to make expand_assignment also examine potential misalignments in the RHS, which it so far does not do, despite its complexity. Having said that, I also dislike the patch, but I am quite convinced that if we allow non-BLK structures with zero-sized arrays, the fix will be ugly - but I'll be glad to be shown I've been wrong.

I don't think the issues have anything to do with zero-sized arrays or non-BLK structures. The issue is just that we are feeding movmisalign with the wrong mode (the mode of the base object) if we are expanding a base object which is unaligned and has non-BLK mode.

I disagree. Consider a slightly modified testcase:

#include <stdlib.h>

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

#if 1
typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
struct __attribute__((packed)) T { char c; P s; };
#else
typedef struct S { V a; V b[0]; } P;
struct T { char c; P s; };
#endif

void __attribute__((noinline, noclone))
good_value (V b)
{
  if (b[0] != 3 || b[1] != 4)
    __builtin_abort ();
}

void __attribute__((noinline, noclone))
check (P *p)
{
  good_value (p->b[0]);
}

int
main ()
{
  struct T *t = (struct T *) calloc (128, 1);
  V a = { 3, 4 };
  t->s.b[0] = a;
  check (&t->s);
  free (t);
  return 0;
}

The problem is the expansion of the memory load in function check. All involved modes, the one of the base structure and of the loaded component, are V2DI, so even if we somehow mixed them up, in this example it should not matter, yet the unaligned load gives wrong results. I'm still convinced that the problem is that we have a V2DImode structure which is larger than the mode tells and we want to load the data from outside of its mode. That is only happening because of zero-sized arrays.
Yes, this is another good example, and it passes vector values on the stack to a function. Again my patch produces working code, while the current trunk produces just completely _insane_ code. And again, this is not only a problem of structures with zero-sized arrays at the end. Remember my previous example code: on ARM (or anything with STRICT_ALIGNMENT) this union has the same problems:

/* PR middle-end/57748 */
/* arm-eabi-gcc -mcpu=cortex-a9 -O3 */
#include <stdlib.h>

union x
{
  short a[2];
  char x[4];
} __attribute__((packed, aligned (4)));

typedef volatile union x *s;

void __attribute__((noinline, noclone))
check (void)
{
  s xx = (s) (0x8002);
  /* although volatile, xx->x[3] reads 4 bytes here */
  if (xx->x[3] != 3)
    abort ();
}

void __attribute__((noinline, noclone))
foo (void)
{
  s xx = (s) (0x8002);
  xx->x[3] = 3;
}

int
main ()
{
  foo ();
  check ();
  return 0;
}

So what we maybe need is another expand modifier that tells us to not use movmisalign when expanding the base object. In that case we can just as well use EXPAND_MEMORY. If so, I'd do that only when there is a zero-sized array involved in order not to avoid
Re: [PATCH, LRA, AARCH64] Switching LRA on for AArch64
On 24/09/13 10:03, Yvan Roux wrote:

Hi,

The following patch switches LRA on for AArch64. The patch introduces an undocumented option -mlra to use LRA instead of reload, for testing purposes. Please notice that this patch is dependent on the one submitted in the thread below:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html

Thanks,
Yvan

2013-09-24  Yvan Roux  yvan.r...@linaro.org

	* config/aarch64/aarch64.opt (mlra): New option.
	* config/aarch64/aarch64.c (aarch64_lra_p): New function.
	(TARGET_LRA_P): Define.

+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload

Please mark this "(transitional)" in the doc string. Otherwise OK.

R.
Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
Eric,

the current way component_uses_parent_alias_set is called from get_alias_set causes the reference tree chain to be walked O(n^2) in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT reference in the tree chain. Also the comment above component_uses_parent_alias_set doesn't seem to match the behavior in get_alias_set. get_alias_set ends up using exactly the type of the parent of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference instead of "the alias set provided by the object at the heart of T".

That's what "the object at the heart of T" means, I think: given the loop in get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the parent of the outermost non-addressable component in the component chain (outermost in the type layout sense), which is what is wanted: when you go down the component chain from the base object to find the alias set, you need to stop at the first non-addressable component. That's what is achieved in the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the base object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable component, so that the alias set isn't touched any more after that. But I agree that the head comment and the interface of c_u_p_a_s are awkward, to say the least, and the quadratic aspect isn't very nice either.

I also fail to see why component_uses_parent_alias_set invokes get_alias_set (is that to make 'a.c' in struct { char c; } a; use the alias set of 'a' instead of the alias set of 'char'?

Yes, I think it's intended to stop the walk as soon as you have a type with alias set 0: if 'a' has alias set 0, then 'a.c' will have it.

Or is it for correctness reasons in case there is a ref-all indirect ref at the base? For the former I believe it doesn't work because it only checks the non-outermost type).

This code predates ref-all.
So, the following patch removes the quadraticness by returning the parent of the innermost non-addressable (or alias-set zero) reference component instead of just a flag. That changes behavior for the alias-set zero case but still doesn't match the overall function comment.

The change is fine, but let's rename the function and reword the comment (and agree on the terminology; I think it's better to use "outermost" here, as already used in get_alias_set/reference_alias_ptr_type_1):

/* Return the outermost parent of component present in the chain of
   component references handled by get_inner_reference in T with the
   following property:
     - the component is non-addressable, or
     - the parent has alias set zero,
   or NULL_TREE if no such parent exists.  In the former cases, the alias
   set of this parent is the alias set that must be used for T itself.  */

tree
component_uses_parent_alias_set_from (const_tree t)

However, I think the handling of "the parent has alias set zero" is wrong in your patch, you should test

  if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0)

The only other use besides get_alias_set is to set MEM_KEEP_ALIAS_SET_P -

It means that the alias set already on the MEM may not be changed afterwards, even if you look into its sub-components later.

whatever that is exactly for, I can see a single non-obvious use in store_constructor_field (didn't bother to look up how callers specify the alias_set argument). In

  if (MEM_P (to_rtx)
      && !MEM_KEEP_ALIAS_SET_P (to_rtx)
      && DECL_NONADDRESSABLE_P (field))
    {
      to_rtx = copy_rtx (to_rtx);
      MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
    }

we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case MEM_KEEP_ALIAS_SET_P is dropped).

Do you mean dropped in set_mem_attributes_minus_bitpos? No, I don't think we can do that.

Btw, checks like

  if (!MEM_KEEP_ALIAS_SET_P (to_rtx) && MEM_ALIAS_SET (to_rtx) != 0)
    set_mem_alias_set (to_rtx, alias_set);

seem to miss MEM_ALIAS_SET (to_rtx) being ALIAS_SET_MEMORY_BARRIER?
Or alias_set being zero / ALIAS_SET_MEMORY_BARRIER? This code predates ALIAS_SET_MEMORY_BARRIER. -- Eric Botcazou
RE: [PATCH] fortran/PR58113
On Wed, 25 Sep 2013 21:00:33, Tobias Burnus wrote:

Bernd Edlinger wrote: this test case fails very often, and the reason is not in GCC but in missing glibc rounding support for strtod. This patch fixes the test case to first determine whether the rounding support is available. This is often the case for real(16), through libquadmath. So even in cases where the test case usually fails, it still tests something with this patch. Ok for trunk?

First, for Fortran patches, it helps if you CC fortran@ as otherwise the email might be missed.

Your change doesn't really directly check whether strtod handles rounding, but whether libgfortran (invoking strtod) supports up/down rounding. Hence, after your patch, one effectively checks - given that up/down rounding works (or at least produces different values) - that nearest/zero/up/down give the correct result. As only few strtod implementations currently support rounding, it is probably the best approach. However, I think it merits a comment making clear what is now checked (and what isn't). Maybe something along my paragraph (but polished) - and removing the then-obsoleted parts of the existing comment. Except for the comment update, the patch looks fine to me.

OK, I used some of your wording to update the comment. I assume it's OK now.

Tobias

PS: I wonder whether there is a good way to do rounded strtod without relying on the system's libc to handle it. Apparently real(16), aka libquadmath, has its own implementation of strtod that handles all the rounding the right way. Probably it should be possible to locate that code and rework it for the other possible real precisions that currently rely on glibc's strtod. However, I will likely not have time for that :(

Bernd.

2013-09-25  Bernd Edlinger  bernd.edlin...@hotmail.de

	PR fortran/58113
	* gfortran.dg/round_4.f90: Check for rounding support.

patch-round4.diff
Description: Binary data
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
So I still think my patch does the right thing. The rationale is:

  expand_expr (tem,
	       (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
		&& COMPLETE_TYPE_P (TREE_TYPE (tem))
		&& (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
		    != INTEGER_CST)
		&& modifier != EXPAND_STACK_PARM
		? target : NULL_RTX),
	       VOIDmode, EXPAND_MEMORY);

returns the address of the structure in question, we can add offset and bitoffset and access the memory in the right mode, and alignment information is passed to the backend via MEM_ALIGN (op0).

But there are conceptually no reasons to require a MEM here. Look at all the code just below the block. Given how hard it is to eliminate spills to memory in RTL once they are generated, this shouldn't be taken lightly.

-- 
Eric Botcazou
Commit: MSP430: Code gen improvements
Hi Guys,

I am checking in the attached patch in order to fix some code generation problems exposed by running the GCC testsuite for the MSP430.

The patch basically adds a definition of TARGET_PRINT_OPERAND_ADDRESS, as some asm() statements need this. It also adds a sign-extending PSI to SI conversion pattern for when signed pointer values need to be stored in an SI pair of registers.

Cheers,
Nick

gcc/ChangeLog
2013-09-26  Nick Clifton  ni...@redhat.com

	* config/msp430/msp430.c (msp430_expand_epilogue): Fix compile
	time warning message.
	(msp430_print_operand_raw): Delete unused letter parameter.
	(TARGET_PRINT_OPERAND_ADDRESS): Define.
	(msp430_print_operand_address): New function.
	(msp430_print_operand): Move address printing code from here to
	new function.
	* config/msp430/msp430.md (movsipsi2): Add comment in generated
	assembler.
	(zero_extendpsisi2): Likewise.
	(extendpsisi2): New pattern.
	(andneghi3): New pattern.

msp430.patch.xz
Description: application/xz
Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
On Thu, 26 Sep 2013, Eric Botcazou wrote:

Eric, the current way component_uses_parent_alias_set is called from get_alias_set causes the reference tree chain to be walked O(n^2) in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT reference in the tree chain. Also the comment above component_uses_parent_alias_set doesn't seem to match the behavior in get_alias_set. get_alias_set ends up using exactly the type of the parent of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference instead of "the alias set provided by the object at the heart of T".

That's what "the object at the heart of T" means, I think: given the loop in get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the parent of the outermost non-addressable component in the component chain (outermost in the type layout sense), which is what is wanted: when you go down the component chain from the base object to find the alias set, you need to stop at the first non-addressable component. That's what is achieved in the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the base object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable component, so that the alias set isn't touched any more after that. But I agree that the head comment and the interface of c_u_p_a_s are awkward, to say the least, and the quadratic aspect isn't very nice either.

Ok.

I also fail to see why component_uses_parent_alias_set invokes get_alias_set (is that to make 'a.c' in struct { char c; } a; use the alias set of 'a' instead of the alias set of 'char'?

Yes, I think it's intended to stop the walk as soon as you have a type with alias set 0: if 'a' has alias set 0, then 'a.c' will have it.

That wasn't the question - I was asking: if we have a struct type with non-zero alias set, like

  struct { char c; int i; } a;

then for an access like 'a.c', does the code want to use the alias set of 'a' (non-zero) for the access, for optimization purposes?
It seems not, given your comment below...

[...]

However, I think the handling of "the parent has alias set zero" is wrong in your patch, you should test

  if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0)

but I fail to see how this can happen in practice?

It can happen for ref-all bases, but that case is handled separately. But ok, I'll leave the code as-is functionality-wise; we should change semantics as a followup, if at all.

The only other use besides get_alias_set is to set MEM_KEEP_ALIAS_SET_P -

It means that the alias set already on the MEM may not be changed afterwards, even if you look into its sub-components later.

whatever that is exactly for, I can see a single non-obvious use in store_constructor_field (didn't bother to look up how callers specify the alias_set argument). In

  if (MEM_P (to_rtx)
      && !MEM_KEEP_ALIAS_SET_P (to_rtx)
      && DECL_NONADDRESSABLE_P (field))
    {
      to_rtx = copy_rtx (to_rtx);
      MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
    }

we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case MEM_KEEP_ALIAS_SET_P is dropped).

Do you mean dropped in set_mem_attributes_minus_bitpos? No, I don't think we can do that.

Yeah, I thought of dropping it completely - we know the effective alias set of the load/store stmt we are expanding via get_alias_set of the expression. I don't see why we need to invent new alias sets in any place down the road when creating sub-accesses (either from storing constructor components or from storing bitfield pieces).

Thanks for the comments, I'll prepare a patch to only remove the quadraticness without changing anything else.

Richard.
Re: [PATCH, RTL] Prepare ARM build with LRA
Hi Yvan,

On 24 Sep 2013, at 09:29, Yvan Roux wrote:

On 11 September 2013 21:08, Yvan Roux yvan.r...@linaro.org wrote:

Here is the new patch discussed in the other thread.

Thanks,
Yvan

2013-09-11  Yvan Roux  yvan.r...@linaro.org
	    Vladimir Makarov  vmaka...@redhat.com

	* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield
	operations from the least significant bit.
	(strip_address_mutations): Add bitfield operations handling.
	(shift_code_p): New predicate for shifting operations.
	(must_be_index_p): Add shifting operations handling.
	(set_address_index): Likewise.

as discussed on irc, this part (applied as r202914) breaks bootstrap on powerpc-darwin9 (and, presumably, other BE targets) with a signed/unsigned compare error at rtlanal.c:5482

below is a trivial patch, which makes both parts of the test signed. With this, bootstrap completes on powerpc-darwin9 - however, you might want to check that it still does what you intended.

thanks,
Iain

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 24cbcd2..0349bcc 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5476,7 +5476,7 @@ lsb_bitfield_op_p (rtx x)
   if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
     {
       enum machine_mode mode = GET_MODE (XEXP (x, 0));
-      unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+      HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));

       return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
[PATCH] Fix PR58539
The vectorizer does not honor the fact that debug statements do not participate in loop-closed-SSA construction, and thus an SSA name can have outside-loop uses that are not in the loop-closed PHI node but in a debug statement.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-09-26  Richard Biener  rguent...@suse.de

	PR tree-optimization/58539
	* tree-vect-loop.c (vect_create_epilog_for_reduction): Honor
	the fact that debug statements are not taking part in loop-closed
	SSA construction.

	* gcc.dg/torture/pr58539.c: New testcase.

Index: gcc/tree-vect-loop.c
===================================================================
*** gcc/tree-vect-loop.c	(revision 202883)
--- gcc/tree-vect-loop.c	(working copy)
*** vect_finalize_reduction:
*** 4411,4417 ****
           result.  (The reduction result is expected to have two immediate uses -
           one at the latch block, and one at the loop exit).  */
        FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
!       if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
          phis.safe_push (USE_STMT (use_p));

        /* While we expect to have found an exit_phi because of loop-closed-ssa
--- 4411,4418 ----
           result.  (The reduction result is expected to have two immediate uses -
           one at the latch block, and one at the loop exit).  */
        FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
!       if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p)))
!           && !is_gimple_debug (USE_STMT (use_p)))
          phis.safe_push (USE_STMT (use_p));

        /* While we expect to have found an exit_phi because of loop-closed-ssa
*** 4541,4547 ****
        FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
          {
            if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
!             phis.safe_push (USE_STMT (use_p));
            else
              {
                if (double_reduc && gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
--- 4542,4551 ----
        FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest)
          {
            if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
!             {
!               if (!is_gimple_debug (USE_STMT (use_p)))
!                 phis.safe_push (USE_STMT (use_p));
!             }
            else
              {
                if (double_reduc && gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
*** 4551,4557 ****
                  FOR_EACH_IMM_USE_FAST (phi_use_p, phi_imm_iter, phi_res)
                    {
                      if (!flow_bb_inside_loop_p (loop,
!                                                 gimple_bb (USE_STMT (phi_use_p))))
                        phis.safe_push (USE_STMT (phi_use_p));
                    }
                }
--- 4555,4562 ----
                  FOR_EACH_IMM_USE_FAST (phi_use_p, phi_imm_iter, phi_res)
                    {
                      if (!flow_bb_inside_loop_p (loop,
!                                                 gimple_bb (USE_STMT (phi_use_p)))
!                         && !is_gimple_debug (USE_STMT (phi_use_p)))
                        phis.safe_push (USE_STMT (phi_use_p));
                    }
                }

Index: gcc/testsuite/gcc.dg/torture/pr58539.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr58539.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr58539.c	(working copy)
***************
*** 0 ****
--- 1,20 ----
+ /* { dg-do compile } */
+ /* { dg-options "-g" } */
+ 
+ int a, b;
+ 
+ extern void baz (int);
+ 
+ int foo (int p)
+ {
+   return p ? p : 1;
+ }
+ 
+ void bar ()
+ {
+   int *c = &a, *d = &a;
+   for (b = 0; b < 12; b++)
+     *d |= 1;
+   foo (*c);
+   baz (*c & 1);
+ }
Re: [PATCH] Improve out-of-SSA coalescing
On Wed, 25 Sep 2013, Steven Bosscher wrote:

On Wednesday, September 25, 2013, Richard Biener rguent...@suse.de wrote:

On Wed, 25 Sep 2013, Richard Biener wrote:

This loosens the restriction of only coalescing SSA names with the same base variable by ignoring that restriction for DECL_IGNORED_P base variables (ok, all of them can and should be anonymous SSA names now, but code obviously hasn't caught up 100%). This improves the code generated for the loop in the testcase to

fallthru
	.p2align 4,,10
	.p2align 3
.L4:
	xorps	%xmm1, %xmm1
	cvtsi2ss	%eax, %xmm1
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L4

from

	jmp	.L4
	.p2align 4,,10
	.p2align 3
.L6:
	movaps	%xmm0, %xmm1
.L4:
	xorps	%xmm0, %xmm0
	cvtsi2ss	%eax, %xmm0
	addl	$1, %eax
	cmpl	%edi, %eax
	addss	%xmm1, %xmm0
	jne	.L6

avoiding the copy on the backedge and the loop entry jump. Overall this is similar to what Jeff was after with his latest adjustment of this code.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

For some reason it miscompiles GCC itself. Hmm. Cannot spot the obvious error yet.

Try reverting the gcc_assert change. With the checking_assert there will be different code for checking enabled or disabled.

Nah, it was us coalescing PARM_DECLs and VAR_DECLs (and RESULT_DECLs). Fixed by restricting this handling to VAR_DECLs only.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-09-25  Richard Biener  rguent...@suse.de

	* tree-ssa-live.c (var_map_base_init): Handle SSA names with
	DECL_IGNORED_P base VAR_DECLs like anonymous SSA names.
	(loe_visit_block): Use gcc_checking_assert.
	* tree-ssa-coalesce.c (create_outofssa_var_map): Use
	gimple_assign_ssa_name_copy_p.
	(gimple_can_coalesce_p): Adjust according to the var_map_base_init
	change.

	* gcc.dg/tree-ssa/coalesce-2.c: New testcase.
Index: gcc/tree-ssa-live.c
===================================================================
*** gcc/tree-ssa-live.c.orig	2013-09-26 11:50:32.000000000 +0200
--- gcc/tree-ssa-live.c	2013-09-26 12:11:49.412951758 +0200
*** var_map_base_init (var_map map)
*** 104,110 ****
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var))
          m->base.from = SSA_NAME_VAR (var);
        else
          /* This restricts what anonymous SSA names we can coalesce
--- 104,112 ----
        struct tree_int_map **slot;
        unsigned baseindex;
        var = partition_to_var (map, x);
!       if (SSA_NAME_VAR (var)
!           && (!VAR_P (SSA_NAME_VAR (var))
!               || !DECL_IGNORED_P (SSA_NAME_VAR (var))))
          m->base.from = SSA_NAME_VAR (var);
        else
          /* This restricts what anonymous SSA names we can coalesce
*** loe_visit_block (tree_live_info_p live,
*** 992,1000 ****
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
-   gcc_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
    loe = live_on_entry (live, bb);
    FOR_EACH_EDGE (e, ei, bb->preds)
--- 994,1003 ----
    edge_iterator ei;
    basic_block pred_bb;
    bitmap loe;
+   gcc_checking_assert (!bitmap_bit_p (visited, bb->index));
    bitmap_set_bit (visited, bb->index);
+ 
    loe = live_on_entry (live, bb);
    FOR_EACH_EDGE (e, ei, bb->preds)

Index: gcc/tree-ssa-coalesce.c
===================================================================
*** gcc/tree-ssa-coalesce.c.orig	2013-09-26 11:50:32.000000000 +0200
--- gcc/tree-ssa-coalesce.c	2013-09-26 13:12:48.848382555 +0200
*** create_outofssa_var_map (coalesce_list_p
*** 982,991 ****
      {
        tree lhs = gimple_assign_lhs (stmt);
        tree rhs1 = gimple_assign_rhs1 (stmt);
!
!       if (gimple_assign_copy_p (stmt)
!           && TREE_CODE (lhs) == SSA_NAME
!           && TREE_CODE (rhs1) == SSA_NAME
            && gimple_can_coalesce_p (lhs, rhs1))
          {
            v1 = SSA_NAME_VERSION (lhs);
--- 982,988 ----
      {
        tree lhs = gimple_assign_lhs (stmt);
        tree rhs1 = gimple_assign_rhs1 (stmt);
!       if (gimple_assign_ssa_name_copy_p (stmt)
            && gimple_can_coalesce_p (lhs, rhs1))
          {
            v1 = SSA_NAME_VERSION (lhs);
*** gimple_can_coalesce_p (tree name1, tree
*** 1349,1355 ****
  {
    /* First check the SSA_NAME's associated DECL.  We only want to
       coalesce if they have the same DECL or both have no associated DECL.  */
!   if (SSA_NAME_VAR (name1) != SSA_NAME_VAR (name2))
      return
Re: [PATCH, RTL] Prepare ARM build with LRA
(Added Eric and Richard)

Sorry for the inconvenience, Iain. It's ok on my side.

Thanks,
Yvan

On 26 September 2013 13:18, Iain Sandoe i...@codesourcery.com wrote:

Hi Yvan,

On 24 Sep 2013, at 09:29, Yvan Roux wrote:

On 11 September 2013 21:08, Yvan Roux yvan.r...@linaro.org wrote:

Here is the new patch discussed in the other thread.

Thanks,
Yvan

2013-09-11  Yvan Roux  yvan.r...@linaro.org
	    Vladimir Makarov  vmaka...@redhat.com

	* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield
	operations from the least significant bit.
	(strip_address_mutations): Add bitfield operations handling.
	(shift_code_p): New predicate for shifting operations.
	(must_be_index_p): Add shifting operations handling.
	(set_address_index): Likewise.

as discussed on irc, this part (applied as r202914) breaks bootstrap on powerpc-darwin9 (and, presumably, other BE targets) with a signed/unsigned compare error at rtlanal.c:5482

below is a trivial patch, which makes both parts of the test signed. With this, bootstrap completes on powerpc-darwin9 - however, you might want to check that it still does what you intended.

thanks,
Iain

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 24cbcd2..0349bcc 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5476,7 +5476,7 @@ lsb_bitfield_op_p (rtx x)
   if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
     {
       enum machine_mode mode = GET_MODE (XEXP (x, 0));
-      unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+      HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));

       return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
RE: [PATCH, PR 57748] Check for out of bounds access, Part 2
Hi,

On Thu, 26 Sep 2013 11:34:02, Eric Botcazou wrote:

So I still think my patch does the right thing. The rationale is:

  expand_expr (tem,
	       (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
		&& COMPLETE_TYPE_P (TREE_TYPE (tem))
		&& (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
		    != INTEGER_CST)
		&& modifier != EXPAND_STACK_PARM
		? target : NULL_RTX),
	       VOIDmode, EXPAND_MEMORY);

returns the address of the structure in question, we can add offset and bitoffset and access the memory in the right mode, and alignment information is passed to the backend via MEM_ALIGN (op0).

But there are conceptually no reasons to require a MEM here. Look at all the code just below the block. Given how hard it is to eliminate spills to memory in RTL once they are generated, this shouldn't be taken lightly.

-- 
Eric Botcazou

Sure, but the modifier is not meant to force something into memory, especially when it is already in a register. Remember, we are only talking of structures here, and we only want to access one member. It is more the other way round: it says "you do not have to load the value into a register; if it is already in memory, I'm happy". At least that is my understanding.

Bernd.
RE: [PATCH]Construct canonical scaled address expression in IVOPT
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Tuesday, September 24, 2013 7:58 PM
To: Bin.Cheng
Cc: Bin Cheng; GCC Patches; Richard Earnshaw
Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT

On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com wrote:

On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote:

-----Original Message-----

Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg.

Apparently it's infeasible to check every possibility for each architecture; is it ok if we at least check index, then addr if index failed? By "any kind of addressing modes", I mean modes supported by function get_address_cost, i.e., in the form of [base] + [off] + [var] + (reg|reg*scale).

I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Apparently it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences?

Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing mode and account for the cost of the multiplier accordingly. For the current code, there are two scenarios here:

1) If the target supports reg*scale+reg, but not reg*scale: in this case, IVOPT considers the multiplier not allowed in any addressing mode and accounts for the multiplier with a high cost. This is the problem arm is having.

2) If the target supports reg*scale, but not some kind of addressing mode (say, reg*scale+reg): in this case, IVOPT still constructs various scaled addressing modes in get_address_cost and depends on address_cost to compute the correct cost for that addressing expression. I think this happens to work even though IVOPT doesn't know reg*scale+reg is actually not supported.
The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Also from the newer comments: Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? Yes, you can share a single RTL expression for all this and I think querying reg + reg * scale first makes sense (then fallback to reg * scale for compatibility). I updated the patch as discussed. Please review. It bootstraps and passes regression test on x86/x86_64, but fails tree-ssa/scev-4.c on arm cortex-a15. Hi Richard, I investigated the failure and found out it reveals two other problems in IVOPT we have discussed. 
For scev-4.c like: typedef struct { int x; int y; } S; int *a_p; S a[1000]; f(int k) { int i; for (i=k; i<1000; i+=k) { a_p = &a[i].y; *a_p = 100; } } The iv candidates and uses are like: use 0 generic in statement a_p.0_5 = &a[k_11].y; at position type int * base (int *) &a + ((sizetype) k_3(D) * 8 + 4) step (sizetype) k_3(D) * 8 base object (void *) &a related candidates use 1 address in statement MEM[(int *)&a][k_11].y = 100; at position MEM[(int *)&a][k_11].y type int * base MEM[(int *)&a][k_3(D)].y step (sizetype) k_3(D) * 8 base object (void *) &a related candidates candidate 4 (important) depends on 1 original biv type int base k_3(D) step k_3(D) candidate 7 depends on 1 var_before ivtmp.12 var_after ivtmp.12 incremented before exit test type unsigned int base (unsigned int) ((int *) &a + (sizetype) k_3(D) * 8) step (unsigned int) ((sizetype) k_3(D) * 8) base object (void *) &a Candidate 7 is related to use 0 Problem 1) When
RE: [PATCH]Construct canonical scaled address expression in IVOPT
Sorry for missing the patch. Thanks. bin -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Thursday, September 26, 2013 8:05 PM To: 'Richard Biener'; Bin.Cheng Cc: GCC Patches; Richard Earnshaw Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 24, 2013 7:58 PM To: Bin.Cheng Cc: Bin Cheng; GCC Patches; Richard Earnshaw Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check index, then addr if index is failed? By any kind of addressing modes, I mean modes supported by function get_address_cost, i.e., in form of [base] + [off] + [var] + (reg|reg*scale). I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Appearantly it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences? Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing modes and account cost of multiplier accordingly. For current code, there are two scenarios here 1) If target supports reg*scale+reg, but not reg*scale, in this case, IVOPT considers multiplier is not allowed in any addressing mode and account multiplier with high cost. This is the problem arm having. 
2) If target supports reg*scale, but not some kind of addressing mode (saying reg*scale+reg), in this case, IVOPT still constructs various scaled addressing mode in get_address_cost and depends on address_cost to compute correct cost for that addressing expression. I think this happens to work even IVOPT doesn't know reg*scale+reg is actually not supported. The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Also from the newer comments: Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? Yes, you can share a single RTL expression for all this and I think querying reg + reg * scale first makes sense (then fallback to reg * scale for compatibility). I updated the patch as discussed. Please review. It bootstraps and passes regression test on x86/x86_64, but fails tree-ssa/scev-4.c on arm cortex-a15. 
Hi Richard, I investigated the failure and found out it reveals two other problems in IVOPT we have discussed. For scev-4.c like: typedef struct { int x; int y; } S; int *a_p; S a[1000]; f(int k) { int i; for (i=k; i<1000; i+=k) { a_p = &a[i].y; *a_p = 100; } } The iv candidates and uses are like: use 0 generic in statement a_p.0_5 = &a[k_11].y; at position type int * base (int *) &a + ((sizetype) k_3(D) * 8 + 4) step (sizetype) k_3(D) * 8 base object (void *) &a related candidates use 1 address in statement MEM[(int *)&a][k_11].y = 100; at position MEM[(int *)&a][k_11].y type int * base
Re: [PATCH, ARM] Fix PR target/58423
On 26/09/13 09:50, Zhenqiang Chen wrote: -Original Message- From: Richard Earnshaw Sent: Monday, September 23, 2013 11:11 PM To: Zhenqiang Chen Cc: Yufeng Zhang; gcc-patches@gcc.gnu.org Subject: Re: [PATCH, ARM] Fix PR target/58423 On 23/09/13 09:11, Zhenqiang Chen wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Yufeng Zhang Sent: Monday, September 23, 2013 3:16 PM To: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, ARM] Fix PR target/58423 On 09/23/13 07:58, Zhenqiang Chen wrote: --- clean-trunk/gcc/config/arm/arm.c2013-09-17 14:29:45.632457018 +0800 +++ pr58423/gcc/config/arm/arm.c2013-09-18 14:34:24.708892318 +0800 @@ -17645,8 +17645,8 @@ mem = gen_frame_mem (DImode, stack_pointer_rtx); tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem); -RTX_FRAME_RELATED_P (tmp) = 1; tmp = emit_insn (tmp); +RTX_FRAME_RELATED_P (tmp) = 1; The indent doesn't seem right. Thanks for the comments. My gmail server changes tab to 4 spaces. Recreate the patch to remove tab and use attachment to make sure no other automatic changes. Please fix your mailer. The GNU coding standard uses hard tabs and gratuitous changes to the white space are a pain to deal with. Thanks for the comments. Patch is updated to use hard tabs. -Zhenqiang OK. R.
Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
On Thu, 26 Sep 2013, Richard Biener wrote: On Thu, 26 Sep 2013, Eric Botcazou wrote: Eric, the current way component_uses_parent_alias_set is called from get_alias_set causes the reference tree chain to be walked O(n^2) in case there is any DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT reference in the tree chain. Also the comment above component_uses_parent_alias_set doesn't seem to match behavior in get_alias_set. get_alias_set ends up using exactly the type of the parent of the DECL_NONADDRESSABLE_P / TYPE_NONALIASED_COMPONENT reference instead of the alias set provided by the object at the heart of T That's what the object at the heart of T means I think: given the loop in get_alias_set (or reference_alias_ptr_type_1 now), we end up retrieving the parent of the outermost non-addressable component in the component chain (outermost in the type layout sense), which is what is wanted: when you go down the component chain from the base object to find the alias set, you need to stop at the first non-addressable component. That's what is achieved in the RTL expander by means of MEM_KEEP_ALIAS_SET_P: you expand first the base object, then set MEM_KEEP_ALIAS_SET_P for the first non-addressable component, so that the alias set isn't touched any more after that. But I agree that the head comment and the interface of c_u_p_a_s are awkward, to say the least, and the quadratic aspect isn't very nice either. Ok. I also fail to see why component_uses_parent_alias_set invokes get_alias_set (is that to make 'a.c' in struct { char c; } a; use the alias set of 'a' instead of the alias set of 'char'? Yes, I think it's intended to stop the walk as soon as you have a type with alias set 0: if 'a' has alias set 0, then 'a.c' will have it. 
That wasn't the question - I was asking if we have a struct type with non-zero alias set, like struct { char c; int i; } a; then if we have an access like a.c does the code want to use the alias set of 'a' (non-zero) for the access for optimization purposes? It seems not, given your comment below... [...] However, I think the handling of "the parent has alias set zero" is wrong in your patch, you should test if (get_alias_set (TREE_TYPE (TREE_OPERAND (t, 0))) == 0) but I fail to see how this can happen in practice? It can happen for ref-all bases but that case is handled separately. But ok, I'll leave the code as-is functionality wise, we should change semantics as followup if at all. The only other use besides from get_alias_set is to set MEM_KEEP_ALIAS_SET_P - It means that the alias set already on the MEM may not be changed afterwards, even if you look into its sub-components later. whatever that is exactly for, I can see a single non-obvious use in store_constructor_field (didn't bother to look up how callers specify the alias_set argument). In if (MEM_P (to_rtx) && !MEM_KEEP_ALIAS_SET_P (to_rtx) && DECL_NONADDRESSABLE_P (field)) { to_rtx = copy_rtx (to_rtx); MEM_KEEP_ALIAS_SET_P (to_rtx) = 1; } we can just drop the MEM_KEEP_ALIAS_SET_P check (well, in case MEM_KEEP_ALIAS_SET_P is dropped). Do you mean dropped in set_mem_attributes_minus_bitpos? No, I don't think we can do that. Yeah, I thought of dropping it completely - we know the effective alias-set of the load/store stmt we are expanding via get_alias_set of the expression. I don't see why we need to invent new alias sets in any place down the road when creating sub-accesses (either from storing constructor components or from storing bitfield pieces). Thanks for the comments, I'll prepare a patch to only remove the quadraticness without changing anything else. Like the following. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 
2013-09-26 Richard Biener rguent...@suse.de * alias.h (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this. * alias.c (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this and return the desired parent. (reference_alias_ptr_type_1): Use the result from component_uses_parent_alias_set_from instead of stripping components one at a time. * emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust. Index: gcc/alias.c === --- gcc/alias.c (revision 202941) +++ gcc/alias.c (working copy) @@ -500,51 +500,58 @@ objects_must_conflict_p (tree t1, tree t return alias_sets_must_conflict_p (set1, set2); } -/* Return true if all nested component references handled by - get_inner_reference in T are such that we should use the alias set - provided by the object
[PATCH, committed] Update t-rs6000 for automatic dependencies
All of the rs6000-specific dependencies are included and discovered automatically. Bootstrapped on powerpc-ibm-aix7.1.0.0. David * config/rs6000/t-rs6000 (rs6000.o): Remove. (rs6000-c.o): Use COMPILE and POSTCOMPILE. Index: config/rs6000/t-rs6000 === --- config/rs6000/t-rs6000 (revision 202942) +++ config/rs6000/t-rs6000 (working copy) @@ -20,22 +20,10 @@ TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def -rs6000.o: $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ - $(RTL_H) $(REGS_H) hard-reg-set.h \ - real.h insn-config.h conditions.h insn-attr.h flags.h $(RECOG_H) \ - $(OBSTACK_H) $(TREE_H) $(EXPR_H) $(OPTABS_H) except.h function.h \ - output.h dbxout.h $(BASIC_BLOCK_H) toplev.h $(GGC_H) $(HASHTAB_H) \ - $(TM_P_H) $(TARGET_H) $(TARGET_DEF_H) langhooks.h reload.h gt-rs6000.h \ - cfgloop.h $(OPTS_H) $(COMMON_TARGET_H) dumpfile.h \ - $(srcdir)/config/rs6000/rs6000-cpus.def +rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c + $(COMPILE) $< + $(POSTCOMPILE) -rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c \ -$(srcdir)/config/rs6000/rs6000-protos.h \ -$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(CPPLIB_H) \ -$(TM_P_H) $(C_PRAGMA_H) errors.h coretypes.h $(TM_H) - $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ - $(srcdir)/config/rs6000/rs6000-c.c - $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \ $(srcdir)/config/rs6000/rs6000-cpus.def $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 \
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Thu, Sep 26, 2013 at 2:10 PM, bin.cheng bin.ch...@arm.com wrote: Sorry for missing the patch. Thanks. bin -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Thursday, September 26, 2013 8:05 PM To: 'Richard Biener'; Bin.Cheng Cc: GCC Patches; Richard Earnshaw Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 24, 2013 7:58 PM To: Bin.Cheng Cc: Bin Cheng; GCC Patches; Richard Earnshaw Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng amker.ch...@gmail.com wrote: On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- Or even [reg*scale] (not sure about that). But yes, at least reg*scale + offset and reg*scale + reg. Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check index, then addr if index is failed? By any kind of addressing modes, I mean modes supported by function get_address_cost, i.e., in form of [base] + [off] + [var] + (reg|reg*scale). I suppose so, but it of course depends on what IVOPTs uses the answer for in the end. Appearantly it doesn't distinguish between the various cases even though TARGET_MEM_REF does support all the variants in question (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + reg + const). So the better answer may be to teach the costs about the differences? Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing modes and account cost of multiplier accordingly. 
For current code, there are two scenarios here 1) If target supports reg*scale+reg, but not reg*scale, in this case, IVOPT considers multiplier is not allowed in any addressing mode and account multiplier with high cost. This is the problem arm having. 2) If target supports reg*scale, but not some kind of addressing mode (saying reg*scale+reg), in this case, IVOPT still constructs various scaled addressing mode in get_address_cost and depends on address_cost to compute correct cost for that addressing expression. I think this happens to work even IVOPT doesn't know reg*scale+reg is actually not supported. The above also builds more RTX waste which you can fix by re-using the PLUS by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. I would expect that many targets support reg1 * scale + constant-offset but not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into reg or reg+reg. Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. Ok. So no, the helper now checks sth completely different. What's the problem with arm supporting reg1 * scale? Why shouldn't it being able to handle the implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. I see. Also from the newer comments: Btw, it should be reasonably possible to compute the whole multiplier_allowed_in_address_p table for all primary and secondary archs (simply build cross-cc1) and compare the results before / after a patch candidate. Querying both reg * scale and reg + reg * scale if the first fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? 
Yes, you can share a single RTL expression for all this and I think querying reg + reg * scale first makes sense (then fallback to reg * scale for compatibility). I updated the patch as discussed. Please review. It bootstraps and passes regression test on x86/x86_64, but fails tree-ssa/scev-4.c on arm cortex-a15. Hi Richard, I investigated the failure and found out it reveals two other problems in IVOPT we have discussed. For scev-4.c like: typedef struct { int x; int y; } S; int *a_p; S a[1000]; f(int k) { int i; for (i=k; i<1000; i+=k) { a_p = &a[i].y; *a_p = 100; } } The iv candidates and uses are like: use 0 generic in statement a_p.0_5 = &a[k_11].y; at position type int * base (int *) &a + ((sizetype) k_3(D) * 8 + 4) step (sizetype) k_3(D) * 8 base object (void *) &a related candidates use 1 address in statement MEM[(int *)&a][k_11].y = 100;
[PATCH] Fix libgfortran cross compile configury w.r.t newlib
This patch: http://gcc.gnu.org/ml/fortran/2013-06/msg00038.html ... breaks libgfortran configure against newlib. The solution implemented hard wires an assumption in libgfortran/configure.ac that newlib provides strtold(). This assumption is not correct, newlib only provides an implementation of strtold on systems where sizeof(long double) == sizeof(double). This manifests as a regression when trying to build a cross aarch64 bare metal toolchain with fortran enabled. The attached patch tightens the condition introduced in the earlier patch such that we continue to call AC_CHECK_FUNCS_ONCE unless we know that link tests are not possible, in which case we fall back to the existing broken assumption. I'm in two minds about whether further sticky tape of this form is the right approach or whether the original patch should be reverted until a proper fix that does not regress the tree can be found. Thoughts? 2013-09-26 Marcus Shawcroft marcus.shawcr...@arm.com * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement dependent on gcc_no_link. Cheers /Marcusdiff --git a/libgfortran/configure.ac b/libgfortran/configure.ac index 4609eba..411ab38 100644 --- a/libgfortran/configure.ac +++ b/libgfortran/configure.ac @@ -261,7 +261,7 @@ GCC_HEADER_STDINT(gstdint.h) AC_CHECK_MEMBERS([struct stat.st_blksize, struct stat.st_blocks, struct stat.st_rdev]) # Check for library functions. -if test x${with_newlib} = xyes; then +if test x${with_newlib} = xyes -a x${gcc_no_link} = xyes ; then # We are being configured with a cross compiler. AC_REPLACE_FUNCS # may not work correctly, because the compiler may not be able to # link executables.
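The underlying portability issue is that `strtold` may simply be absent from the C library, as on newlib targets where long double is wider than double. Code that needs long double parsing typically guards the call behind a configure-detected macro; a minimal sketch (the `HAVE_STRTOLD` guard mirrors the autoconf `HAVE_*` convention, and the helper name is made up):

```c
#include <stdlib.h>

/* Hypothetical helper: parse a long double, falling back to strtod
   when configure did not find strtold (e.g. newlib targets where
   sizeof (long double) != sizeof (double)).  The fallback loses
   extended precision but keeps the code linking.  */
static long double
parse_long_double (const char *s, char **end)
{
#ifdef HAVE_STRTOLD
  return strtold (s, end);
#else
  return (long double) strtod (s, end);
#endif
}
```

This is exactly the kind of check that only works when the configure script can run link tests, hence the `gcc_no_link` condition in the patch above.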
[PATCH] Fix PR58459
After much thinking I settled on removing the restriction from forwprop that avoids propagating non-invariant addresses into loops. As forwprop is mainly seen as a way to canonicalize the IL and simplify it for further passes this seems like the correct thing to do. LIM and IVOPTs should undo any harm this causes, otherwise I'll find another way to fix the fallout. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-09-26 Richard Biener rguent...@suse.de PR tree-optimization/58459 * tree-ssa-forwprop.c (forward_propagate_addr_expr): Remove restriction not propagating into loops. * gcc.dg/tree-ssa/ssa-pre-31.c: New testcase. Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c === *** gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c (revision 0) --- gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c (working copy) *** *** 0 --- 1,47 + /* { dg-do compile } */ + /* { dg-options "-O2 -fdump-tree-pre" } */ + + typedef struct { + unsigned int key; + } S; + typedef struct s1 { + unsigned int key; + unsigned int bits; + struct s1 *left, *right; + }S1; + extern S a[1024]; + static inline int bar( S* p, S1* n ) + { + S1 *curr; + S1 *next; + + if ( n->left == n ) + return (int)(p->key == n->key); + + curr = n; + next = n->left; + + while (curr->bits < next->bits ) { + curr = next; + if (p->key & (1 << curr->bits)) + next = curr->right; + else + next = curr->left; + } + + return (int)(p->key == next->key); + + } + + int foo (S1 *root, int N) + { + volatile int r; + int i,j; + for (i=0; i<N; i++) + for (j=0;j<1024; j++) + r = bar(&a[j], root); + return 0; + } + + /* { dg-final { scan-tree-dump-times "key" 4 "pre" } } */ + /* { dg-final { cleanup-tree-dump "pre" } } */ Index: gcc/tree-ssa-forwprop.c === *** gcc/tree-ssa-forwprop.c (revision 202941) --- gcc/tree-ssa-forwprop.c (working copy) *** forward_propagate_addr_expr_1 (tree name *** 1001,1007 static bool forward_propagate_addr_expr (tree name, tree rhs) { - int stmt_loop_depth = bb_loop_depth (gimple_bb (SSA_NAME_DEF_STMT (name))); 
imm_use_iterator iter; gimple use_stmt; bool all = true; --- 1001,1006 *** forward_propagate_addr_expr (tree name, *** 1014,1050 /* If the use is not in a simple assignment statement, then there is nothing we can do. */ ! if (gimple_code (use_stmt) != GIMPLE_ASSIGN) { if (!is_gimple_debug (use_stmt)) all = false; continue; } ! /* If the use is in a deeper loop nest, then we do not want !to propagate non-invariant ADDR_EXPRs into the loop as that !is likely adding expression evaluations into the loop. */ ! if (bb_loop_depth (gimple_bb (use_stmt)) > stmt_loop_depth ! && !is_gimple_min_invariant (rhs)) { ! all = false; ! continue; } ! ! { ! gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); ! result = forward_propagate_addr_expr_1 (name, rhs, &gsi, ! single_use_p); ! /* If the use has moved to a different statement adjust ! the update machinery for the old statement too. */ ! if (use_stmt != gsi_stmt (gsi)) ! { ! update_stmt (use_stmt); ! use_stmt = gsi_stmt (gsi); ! } ! ! update_stmt (use_stmt); ! } all &= result; /* Remove intermediate now unused copy and conversion chains. */ --- 1013,1036 /* If the use is not in a simple assignment statement, then there is nothing we can do. */ ! if (!is_gimple_assign (use_stmt)) { if (!is_gimple_debug (use_stmt)) all = false; continue; } ! gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); ! result = forward_propagate_addr_expr_1 (name, rhs, &gsi, ! single_use_p); ! /* If the use has moved to a different statement adjust !the update machinery for the old statement too. */ ! if (use_stmt != gsi_stmt (gsi)) { ! update_stmt (use_stmt); ! use_stmt = gsi_stmt (gsi); } ! update_stmt (use_stmt); all &= result; /* Remove intermediate now unused copy and conversion chains. */
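At the source level, the restriction being removed concerned roughly the following shape: a non-invariant address taken outside an inner loop would not be folded into dereference sites inside it. A hedged illustration in plain C rather than GIMPLE (function and variable names invented):

```c
/* Before propagation: the ADDR_EXPR is computed once per outer
   iteration and only dereferenced in the deeper loop nest.  */
int before (int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      int *p = &a[i];        /* ADDR_EXPR assigned to a temporary */
      for (int j = 0; j < 4; j++)
        sum += *p;           /* use inside a deeper loop nest */
    }
  return sum;
}

/* After propagation the dereference carries the address expression
   directly; LIM/IVOPTs are expected to re-hoist the computation if
   that turns out to be profitable.  */
int after (int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < 4; j++)
      sum += a[i];           /* *(&a[i]) folded to a[i] */
  return sum;
}
```

Both versions are semantically identical; the question debated below is only whether the canonical form should be allowed inside loops.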
Re: [PATCH] Refactor type handling in get_alias_set, fix PR58513
On Tue, Sep 24, 2013 at 12:00:48PM +0100, Richard Biener wrote: 2013-09-24 Richard Biener rguent...@suse.de * g++.dg/vect/pr58513.cc: New testcase. Hi, This testcase fails for arm and aarch64 targets when using -fPIC. As discussed on IRC this can be fixed by making op static. After asking Richard Earnshaw to explain -fPIC and dynamic linking to me, I've committed this (now obvious) patch as r202947. Thanks, James --- gcc/testsuite/ 2013-09-26 James Greenhalgh james.greenha...@arm.com * g++.dg/vect/pr58513.cc (op): Make static.diff --git a/gcc/testsuite/g++.dg/vect/pr58513.cc b/gcc/testsuite/g++.dg/vect/pr58513.cc index 2563047..08a175c 100644 --- a/gcc/testsuite/g++.dg/vect/pr58513.cc +++ b/gcc/testsuite/g++.dg/vect/pr58513.cc @@ -1,7 +1,7 @@ // { dg-do compile } // { dg-require-effective-target vect_int } -int op (const int x, const int y) { return x + y; } +static int op (const int x, const int y) { return x + y; } void foo(int* a) {
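The underlying issue: under -fPIC, a global function with default visibility may be interposed at dynamic-link time, so the compiler cannot assume the definition it sees is the one that will be called, which blocks inlining (and here, vectorization of the inlined body). A static function cannot be interposed. A schematic example, not the testcase itself:

```c
/* With -fPIC, a call to 'replaceable' cannot safely be inlined:
   another shared object may provide the definition at run time.  */
int replaceable (int x) { return x + 1; }

/* A static function is local to this translation unit, so the
   compiler is free to inline it into its callers.  */
static int local_only (int x) { return x + 1; }

int use_both (int x) { return replaceable (x) + local_only (x); }
```

Without -fPIC (or with -fno-semantic-interposition style guarantees) both calls are equally inlinable, which is why the testcase only failed in PIC configurations.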
Re: [PATCH] Trivial cleanup
Hi, On Wed, 25 Sep 2013, Jeff Law wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced. ie foo.last()->e rather than foo.last ()->e. I'd prefer to not write in this style at all, like Jakub. If we must absolutely have it, then I agree that the space before _empty_ parentheses are ugly if followed by references. I.e. I'd like to see spaces before parens as is customary, except in one case: empty parens in the middle of expressions (which don't happen very often right now in GCC, and hence wouldn't introduce a coding style confusion): do.this (); give.that()->flag; get.list (one)->clear (); I'd prefer to not have further references to return values be applied, though (as in, the parentheses should be the end of statement), which would avoid the topic (at the expense of having to invent names for those temporaries, or to write trivial wrapper methods contracting several method calls). Ciao, Michael.
[PATCH] Make more SSA names anonymous after into-SSA
It turns out we are pretty conservative on the set of SSA names we make anonymous. In particular all named temporaries created by gimplification and lowering passes were left alone. The following patch makes all SSA names anonymous that we will not generate debug information for and transitions the name over to it. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2013-09-26 Richard Biener rguent...@suse.de * tree-into-ssa.c (rewrite_into_ssa): Make more SSA names anonymous. Index: gcc/tree-into-ssa.c === *** gcc/tree-into-ssa.c (revision 202941) --- gcc/tree-into-ssa.c (working copy) *** rewrite_into_ssa (void) *** 2366,2375 if (decl && TREE_CODE (decl) == VAR_DECL && !VAR_DECL_IS_VIRTUAL_OPERAND (decl) ! && DECL_ARTIFICIAL (decl) ! && DECL_IGNORED_P (decl) ! && !DECL_NAME (decl)) ! SET_SSA_NAME_VAR_OR_IDENTIFIER (name, NULL_TREE); } return 0; --- 2366,2373 if (decl && TREE_CODE (decl) == VAR_DECL && !VAR_DECL_IS_VIRTUAL_OPERAND (decl) ! && DECL_IGNORED_P (decl)) ! SET_SSA_NAME_VAR_OR_IDENTIFIER (name, DECL_NAME (decl)); } return 0;
Re: [PATCH] Trivial cleanup
On Thu, Sep 26, 2013 at 4:15 PM, Michael Matz m...@suse.de wrote: Hi, On Wed, 25 Sep 2013, Jeff Law wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced. ie foo.last()->e rather than foo.last ()->e. I'd prefer to not write in this style at all, like Jakub. If we must absolutely have it, then I agree that the space before _empty_ parentheses are ugly if followed by references. I.e. I'd like to see spaces before parens as is customary, except in one case: empty parens in the middle of expressions (which don't happen very often right now in GCC, and hence wouldn't introduce a coding style confusion): do.this (); give.that()->flag; get.list (one)->clear (); I'd prefer to not have further references to return values be applied, though (as in, the parentheses should be the end of statement), which would avoid the topic (at the expense of having to invent names for those temporaries, or to write trivial wrapper methods contracting several method calls). Seconded, even give.that()->flag; is ugly. Richard.
Re: cost model patch
On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com wrote: I took the liberty to pick up Richard's original fvect-cost-model patch and made some modifications. What has not changed: 1) option -ftree-vect-loop-version is removed; 2) three cost models are introduced: cheap, dynamic, and unlimited; 3) unless explicitly specified, cheap model is the default at O2 (e.g. when -ftree-loop-vectorize is used with -O2), and dynamic mode is the default for O3 and FDO 4) alignment based versioning is disabled with cheap model. What has changed: 1) peeling is also disabled with cheap model; 2) alias check condition limit is reduced with cheap model, but not completely suppressed. Runtime alias check is a pretty important enabler. 3) tree if conversion changes are not included. Does this patch look reasonable? In principle yes. Note that it changes the behavior of -O2 -ftree-vectorize as -ftree-vectorize does not imply changing the default cost model. I am fine with that, but eventually this will have some testsuite fallout. This reorg would also need documenting in changes.html to make people aware of this. With completely disabling alignment peeling and alignment versioning you cut out targets that have no way of performing unaligned accesses. From looking at vect_no_align these are mips, sparc, ia64 and some arm. A compromise for them would be to allow peeling a single iteration and some alignment checks (like up to two?). Reducing the number of allowed alias-checks is ok, but I'd reduce it more than to 6 (was that an arbitrary number or is that the result of some benchmarking?) I suppose all of the params could use some benchmarking to select a sweet spot in code size vs. runtime. I suppose the patch is ok as-is (if it actually works) if you provide a changelog and propose an entry for changes.html. We can tune the params for the cheap model as followup. Thanks for picking this up, Richard. thanks, David
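The "runtime alias check" being costed here works roughly as follows: the vectorizer versions the loop, guarding the vector-friendly copy with a test that the accessed address ranges cannot overlap, and falling back to a conservative scalar copy otherwise. A hand-written sketch of that versioning (simplified; the real checks also account for access size, step, and direction):

```c
#include <stddef.h>

/* Simplified model of loop versioning under a runtime alias check:
   take the vectorizable path only when [a, a+n) and [b, b+n) are
   provably disjoint.  Both branches compute the same result; they
   differ only in the code the compiler may generate for them.  */
void
add_arrays (int *a, const int *b, size_t n)
{
  if (a + n <= b || (const int *) a >= b + n)
    {
      /* No overlap: this version may be vectorized.  */
      for (size_t i = 0; i < n; i++)
        a[i] += b[i];
    }
  else
    {
      /* Possible overlap: conservative scalar loop.  */
      for (size_t i = 0; i < n; i++)
        a[i] += b[i];
    }
}
```

Each additional pair of data references needs its own such guard, which is why the number of allowed checks is a cost-model parameter: past some point the guards cost more than the vector loop saves.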
Re: [PATCH] Fix PR58459
On Thu, Sep 26, 2013 at 03:54:19PM +0200, Richard Biener wrote: After much thinking I settled on removing the restriction from forwprop that avoids propagating non-invariant addresses into loops. As forwprop is mainly seen as a way to canonicalize the IL and simplify it for further passes this seems like the correct thing to do. LIM and IVOPTs should undo any harm this causes, otherwise I'll find another way to fix the fallout. But, aren't some forwprop passes run after LIM and IVOPTs, so while LIM and IVOPTs undo that harm, forwprop4 pass adds it there again? I guess it is just fine to remove the restriction for forwprop{1,2,3}, but for forwprop4 I'm not that sure. You are then hoping RTL optimizations will undo the harm, but if they don't, ... Jakub
Re: [PATCH] Fix PR58459
On Thu, 26 Sep 2013, Jakub Jelinek wrote: On Thu, Sep 26, 2013 at 03:54:19PM +0200, Richard Biener wrote: After much thinking I settled on removing the restriction from forwprop that avoids propagating non-invariant addresses into loops. As forwprop is mainly seen as a way to canonicalize the IL and simplify it for further passes this seems like the correct thing to do. LIM and IVOPTs should undo any harm this causes, otherwise I'll find another way to fix the fallout. But, aren't some forwprop passes run after LIM and IVOPTs, so while LIM and IVOPTs undo that harm, forwprop4 pass adds it there again? LIM would move a memory reference, forwprop nowadays hopefully only propagates these beasts into dereferences. IVOPTs only works on memory references as well, exposing lowered address computation, thus a different kind of addresses. I guess it is just fine to remove the restriction for forwprop{1,2,3}, but for forwprop4 I'm not that sure. You are then hoping RTL optimizations will undo the harm, but if they don't, ... ... then we find a solution to address this. No, I don't have my bets on RTL optimizers fixing this but instead on no fallout actually happening. Richard.
Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote: I'm in two minds about whether further sticky tape of this form is the right approach or whether the original patch should be reverted until a proper fix that does not regress the tree can be found. Thoughts? 2013-09-26 Marcus Shawcroft marcus.shawcr...@arm.com * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement dependent on gcc_no_link. Cheers /Marcus I think this patch is a good fix. I (obviously) don't favor reverting the previous patch because that would re-break the Fortran build on MIPS bare-metal cross compilers (or any compiler where a linker script is needed). Any 'proper' fix should address libstdc++, libjava, and other libraries as well as libgfortran and I don't know what a cleaner fix would be. In fact I would say the other libraries should consider using this fix. The only reason they don't run into this problem too is that they don't depend on any long double functions or any other functions that are optionally built by newlib. I will test this patch on my targets and make sure it works for me, but I don't see why it would not. Steve Ellcey sell...@mips.com
[GOOGLE] max-lipo-group parameter for AutoFDO
This patch fixes the bug when setting max-lipo-group in AutoFDO: Bootstrapped and passed regression test. OK for google branches? Thanks, Dehao Index: gcc/auto-profile.c === --- gcc/auto-profile.c (revision 202926) +++ gcc/auto-profile.c (working copy) @@ -746,7 +746,7 @@ read_aux_modules (void) assembler statements", *iter); continue; } - if (max_group != 0 && curr_module == max_group) + if (max_group != 0 && curr_module >= max_group) { if (flag_opt_info) inform (0, "Not importing %s: maximum group size reached", *iter);
Re: [GOOGLE] max-lipo-group parameter for AutoFDO
Looks ok. David On Thu, Sep 26, 2013 at 9:00 AM, Dehao Chen de...@google.com wrote: This patch fixes the bug when setting max-lipo-group in AutoFDO: Bootstrapped and passed regression test. OK for google branches? Thanks, Dehao Index: gcc/auto-profile.c === --- gcc/auto-profile.c (revision 202926) +++ gcc/auto-profile.c (working copy) @@ -746,7 +746,7 @@ read_aux_modules (void) assembler statements", *iter); continue; } - if (max_group != 0 && curr_module == max_group) + if (max_group != 0 && curr_module >= max_group) { if (flag_opt_info) inform (0, "Not importing %s: maximum group size reached", *iter);
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
Eric Botcazou ebotca...@adacore.com writes: So in the set_* routines it isn't about whether the value is definitely a base or a definitely an index. It's just about drilling down through what we've already decided is a base or index to get the inner reg or mem, and knowing which XEXPs to look at. We could instead have used a for_each_rtx, or something like that, without any code checks. But I wanted to be precise about the types of address we allow, so that we can assert for things we don't understand. In other words, it was designed to require the kind of extension Yvan is adding here. Does this mean that the design is to require a parallel implementation in the predicates and in the set routines, i.e. each time you add a new case to the predicates, you need to add it (or do something) to the set routines as well? If so, that's a little weird, but OK, feel free to revert the de-duplication part, but add comments saying that the functions must be kept synchronized. They don't need to be kept synchronised as such. It's fine for the index to allow more than must_be_index_p. But if you're not keen on the current structure, does the following look better? Tested on x86_64-linux-gnu. Thanks, Richard gcc/ * rtlanal.c (must_be_base_p, must_be_index_p): Delete. (binary_scale_code_p, get_base_term, get_index_term): New functions. (set_address_segment, set_address_base, set_address_index) (set_address_disp): Accept the argument unconditionally. (baseness): Remove must_be_base_p and must_be_index_p checks. (decompose_normal_address): Classify as much as possible in the main loop. Index: gcc/rtlanal.c === --- gcc/rtlanal.c 2013-09-25 22:42:16.870221821 +0100 +++ gcc/rtlanal.c 2013-09-26 09:11:41.273778750 +0100 @@ -5521,26 +5521,50 @@ strip_address_mutations (rtx *loc, enum } } -/* Return true if X must be a base rather than an index. */ +/* Return true if CODE applies some kind of scale. The scaled value is + is the first operand and the scale is the second. 
*/ static bool -must_be_base_p (rtx x) +binary_scale_code_p (enum rtx_code code) { - return GET_CODE (x) == LO_SUM; + return (code == MULT + || code == ASHIFT + /* Needed by ARM targets. */ + || code == ASHIFTRT + || code == LSHIFTRT + || code == ROTATE + || code == ROTATERT); } -/* Return true if X must be an index rather than a base. */ +/* If *INNER can be interpreted as a base, return a pointer to the inner term + (see address_info). Return null otherwise. */ -static bool -must_be_index_p (rtx x) +static rtx * +get_base_term (rtx *inner) { - return (GET_CODE (x) == MULT - || GET_CODE (x) == ASHIFT - /* Needed by ARM targets. */ - || GET_CODE (x) == ASHIFTRT - || GET_CODE (x) == LSHIFTRT - || GET_CODE (x) == ROTATE - || GET_CODE (x) == ROTATERT); + if (GET_CODE (*inner) == LO_SUM) +inner = strip_address_mutations (XEXP (*inner, 0)); + if (REG_P (*inner) + || MEM_P (*inner) + || GET_CODE (*inner) == SUBREG) +return inner; + return 0; +} + +/* If *INNER can be interpreted as an index, return a pointer to the inner term + (see address_info). Return null otherwise. */ + +static rtx * +get_index_term (rtx *inner) +{ + /* At present, only constant scales are allowed. 
*/ + if (binary_scale_code_p (GET_CODE (*inner)) CONSTANT_P (XEXP (*inner, 1))) +inner = strip_address_mutations (XEXP (*inner, 0)); + if (REG_P (*inner) + || MEM_P (*inner) + || GET_CODE (*inner) == SUBREG) +return inner; + return 0; } /* Set the segment part of address INFO to LOC, given that INNER is the @@ -5549,8 +5573,6 @@ must_be_index_p (rtx x) static void set_address_segment (struct address_info *info, rtx *loc, rtx *inner) { - gcc_checking_assert (GET_CODE (*inner) == UNSPEC); - gcc_assert (!info-segment); info-segment = loc; info-segment_term = inner; @@ -5562,12 +5584,6 @@ set_address_segment (struct address_info static void set_address_base (struct address_info *info, rtx *loc, rtx *inner) { - if (must_be_base_p (*inner)) -inner = strip_address_mutations (XEXP (*inner, 0)); - gcc_checking_assert (REG_P (*inner) - || MEM_P (*inner) - || GET_CODE (*inner) == SUBREG); - gcc_assert (!info-base); info-base = loc; info-base_term = inner; @@ -5579,12 +5595,6 @@ set_address_base (struct address_info *i static void set_address_index (struct address_info *info, rtx *loc, rtx *inner) { - if (must_be_index_p (*inner) CONSTANT_P (XEXP (*inner, 1))) -inner = strip_address_mutations (XEXP (*inner, 0)); - gcc_checking_assert (REG_P (*inner) - || MEM_P (*inner) - || GET_CODE (*inner) == SUBREG); - gcc_assert (!info-index);
Re: RFA: Store the REG_BR_PROB probability directly as an int
Steve Ellcey sell...@mips.com writes: On Tue, 2013-09-24 at 21:07 +0200, Andreas Schwab wrote: Richard Sandiford rdsandif...@googlemail.com writes: Sorry for the breakage. I think we need to handle INT_LIST in the same way as INSN_LIST though, and eliminate in XEXP (x, 1). How about the attached? Testing in progress... Works for me as well. Andreas. This patch worked for me as well on MIPS. I did a complete build and test overnight. Thanks for the testing. It also passes bootstrap on x86_64-linux-gnu. OK to install? Thanks, Richard
[GOOGLE] Fix an ICE in lipo_cmp_type
This fixes an ICE when lipo_cmp_type handles NULLPTR_TYPE. Bootstrapped; regression test ongoing. OK for google branches? Thanks, Dehao Index: gcc/l-ipo.c === --- gcc/l-ipo.c (revision 202926) +++ gcc/l-ipo.c (working copy) @@ -713,6 +713,7 @@ lipo_cmp_type (tree t1, tree t2) lipo_cmp_type (TREE_TYPE (t1), TREE_TYPE (t2))); case VOID_TYPE: case BOOLEAN_TYPE: +case NULLPTR_TYPE: return 1; case TEMPLATE_TYPE_PARM: return 1;
Re: [GOOGLE] Fix an ICE in lipo_cmp_type
yes. David On Thu, Sep 26, 2013 at 9:26 AM, Dehao Chen de...@google.com wrote: This fixes an ICE when lipo_cmp_type handles NULLPTR_TYPE. Bootstrapped; regression test ongoing. OK for google branches? Thanks, Dehao Index: gcc/l-ipo.c === --- gcc/l-ipo.c (revision 202926) +++ gcc/l-ipo.c (working copy) @@ -713,6 +713,7 @@ lipo_cmp_type (tree t1, tree t2) lipo_cmp_type (TREE_TYPE (t1), TREE_TYPE (t2))); case VOID_TYPE: case BOOLEAN_TYPE: +case NULLPTR_TYPE: return 1; case TEMPLATE_TYPE_PARM: return 1;
[patch, committed] Remove walk_use_def_chains
As preapproved by Richard Biener. Bootstrapped on x86_64-debian-linux-gnu. 2013-09-26 Florian Weimer f...@deneb.enyo.de * tree-ssa.h (walk_use_def_chains_fn, walk_use_def_chains): Delete. * tree-ssa.c (walk_use_def_chains_1, walk_use_def_chains): Delete. * doc/tree-ssa.texi (Walking use-def chains): Delete. Index: gcc/tree-ssa.c === --- gcc/tree-ssa.c (revision 202947) +++ gcc/tree-ssa.c (working copy) @@ -1347,115 +1347,6 @@ } -/* Internal helper for walk_use_def_chains. VAR, FN and DATA are as - described in walk_use_def_chains. - - VISITED is a pointer set used to mark visited SSA_NAMEs to avoid - infinite loops. We used to have a bitmap for this to just mark - SSA versions we had visited. But non-sparse bitmaps are way too - expensive, while sparse bitmaps may cause quadratic behavior. - - IS_DFS is true if the caller wants to perform a depth-first search - when visiting PHI nodes. A DFS will visit each PHI argument and - call FN after each one. Otherwise, all the arguments are - visited first and then FN is called with each of the visited - arguments in a separate pass. */ - -static bool -walk_use_def_chains_1 (tree var, walk_use_def_chains_fn fn, void *data, - struct pointer_set_t *visited, bool is_dfs) -{ - gimple def_stmt; - - if (pointer_set_insert (visited, var)) -return false; - - def_stmt = SSA_NAME_DEF_STMT (var); - - if (gimple_code (def_stmt) != GIMPLE_PHI) -{ - /* If we reached the end of the use-def chain, call FN. */ - return fn (var, def_stmt, data); -} - else -{ - size_t i; - - /* When doing a breadth-first search, call FN before following the -use-def links for each argument. */ - if (!is_dfs) - for (i = 0; i gimple_phi_num_args (def_stmt); i++) - if (fn (gimple_phi_arg_def (def_stmt, i), def_stmt, data)) - return true; - - /* Follow use-def links out of each PHI argument. */ - for (i = 0; i gimple_phi_num_args (def_stmt); i++) - { - tree arg = gimple_phi_arg_def (def_stmt, i); - - /* ARG may be NULL for newly introduced PHI nodes. 
*/ - if (arg - TREE_CODE (arg) == SSA_NAME - walk_use_def_chains_1 (arg, fn, data, visited, is_dfs)) - return true; - } - - /* When doing a depth-first search, call FN after following the -use-def links for each argument. */ - if (is_dfs) - for (i = 0; i gimple_phi_num_args (def_stmt); i++) - if (fn (gimple_phi_arg_def (def_stmt, i), def_stmt, data)) - return true; -} - - return false; -} - - - -/* Walk use-def chains starting at the SSA variable VAR. Call - function FN at each reaching definition found. FN takes three - arguments: VAR, its defining statement (DEF_STMT) and a generic - pointer to whatever state information that FN may want to maintain - (DATA). FN is able to stop the walk by returning true, otherwise - in order to continue the walk, FN should return false. - - Note, that if DEF_STMT is a PHI node, the semantics are slightly - different. The first argument to FN is no longer the original - variable VAR, but the PHI argument currently being examined. If FN - wants to get at VAR, it should call PHI_RESULT (PHI). - - If IS_DFS is true, this function will: - - 1- walk the use-def chains for all the PHI arguments, and, - 2- call (*FN) (ARG, PHI, DATA) on all the PHI arguments. - - If IS_DFS is false, the two steps above are done in reverse order - (i.e., a breadth-first search). */ - -void -walk_use_def_chains (tree var, walk_use_def_chains_fn fn, void *data, - bool is_dfs) -{ - gimple def_stmt; - - gcc_assert (TREE_CODE (var) == SSA_NAME); - - def_stmt = SSA_NAME_DEF_STMT (var); - - /* We only need to recurse if the reaching definition comes from a PHI - node. */ - if (gimple_code (def_stmt) != GIMPLE_PHI) -(*fn) (var, def_stmt, data); - else -{ - struct pointer_set_t *visited = pointer_set_create (); - walk_use_def_chains_1 (var, fn, data, visited, is_dfs); - pointer_set_destroy (visited); -} -} - - /* If necessary, rewrite the base of the reference tree *TP from a MEM_REF to a plain or converted symbol. 
*/ Index: gcc/tree-ssa.h === --- gcc/tree-ssa.h (revision 202947) +++ gcc/tree-ssa.h (working copy) @@ -56,11 +56,6 @@ extern bool tree_ssa_useless_type_conversion (tree); extern tree tree_ssa_strip_useless_type_conversions (tree); -/* Call-back function for walk_use_def_chains(). At each reaching - definition, a function with this prototype is called. */ -typedef bool (*walk_use_def_chains_fn) (tree, gimple, void *); -extern void walk_use_def_chains (tree,
Re: [gomp4] Dumping gimple for offload.
On 25 Sep 15:48, Richard Biener wrote: On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote: On 24 Sep 11:02, Richard Biener wrote: On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com wrote: thus consider assigning the section name in a different place. Richard. What do you mean by different place? I can add global dumping_omp_target variable to choose correct name, depending on it's value (patch below). Is it better? More like passing down a different abstraction, like for @@ -907,9 +907,15 @@ output_symtab (void) { symtab_node node = lto_symtab_encoder_deref (encoder, i); if (cgraph_node *cnode = dyn_cast cgraph_node (node)) -lto_output_node (ob, cnode, encoder); + { + if (!dumping_omp_target || lookup_attribute (omp declare target, + DECL_ATTRIBUTES (node-symbol.decl))) + lto_output_node (ob, cnode, encoder); + } else -lto_output_varpool_node (ob, varpool (node), encoder); + if (!dumping_omp_target || lookup_attribute (omp declare target, + DECL_ATTRIBUTES (node-symbol.decl))) + lto_output_varpool_node (ob, varpool (node), encoder); } have the symtab encoder already not contain the varpool nodes you don't need. And instead of looking up attributes, mark the symtab node with a flag. Good idea! I've tried creating 2 encoders, and adding only nodes with omp declare target attribute in omp case. There is still some is_omp passing to control lto_set_symtab_encoder_in_partition behaivor, because i think it's better than global var. What do you think? 
--- gcc/cgraphunit.c| 10 +- gcc/lto-cgraph.c| 15 ++- gcc/lto-streamer.c | 3 ++- gcc/lto-streamer.h | 10 -- gcc/lto/lto-partition.c | 4 ++-- gcc/passes.c| 10 +- gcc/tree-pass.h | 2 +- 7 files changed, 37 insertions(+), 17 deletions(-) diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 1644ca9..9e0fc77 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2016,7 +2016,15 @@ ipa_passes (void) passes-all_lto_gen_passes); if (!in_lto_p) -ipa_write_summaries (); +{ + if (flag_openmp) + { + section_name_prefix = OMP_SECTION_NAME_PREFIX; + ipa_write_summaries (true); + } + section_name_prefix = LTO_SECTION_NAME_PREFIX; + ipa_write_summaries (false); +} if (flag_generate_lto) targetm.asm_out.lto_end (); diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 952588d..4a7d179 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder, void lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder, -symtab_node node) +symtab_node node, bool is_omp) { + /* Ignore non omp target nodes for omp case. */ + if (is_omp !lookup_attribute (omp declare target, + DECL_ATTRIBUTES (node-symbol.decl))) +return; + int index = lto_symtab_encoder_encode (encoder, (symtab_node)node); encoder-nodes[index].in_partition = true; } @@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder, ignored by the partitioning logic earlier. 
*/ lto_symtab_encoder_t -compute_ltrans_boundary (lto_symtab_encoder_t in_encoder) +compute_ltrans_boundary (lto_symtab_encoder_t in_encoder, bool is_omp) { struct cgraph_node *node; struct cgraph_edge *edge; @@ -779,7 +784,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder) { node = lsei_cgraph_node (lsei); add_node_to (encoder, node, true); - lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node); + lto_set_symtab_encoder_in_partition (encoder, (symtab_node)node, is_omp); add_references (encoder, node-symbol.ref_list); /* For proper debug info, we need to ship the origins, too. */ if (DECL_ABSTRACT_ORIGIN (node-symbol.decl)) @@ -794,7 +799,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder) { struct varpool_node *vnode = lsei_varpool_node (lsei); - lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode); + lto_set_symtab_encoder_in_partition (encoder, (symtab_node)vnode, is_omp); lto_set_symtab_encoder_encode_initializer (encoder, vnode); add_references (encoder, vnode-symbol.ref_list); /* For proper debug info, we need to ship the origins, too. */ @@ -802,7 +807,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder) { struct varpool_node *origin_node = varpool_get_node (DECL_ABSTRACT_ORIGIN (node-symbol.decl)); - lto_set_symtab_encoder_in_partition (encoder,
Re: [gomp4] Tweak GOMP_target{,_data,_update} arguments
On 19 Sep 11:23, Jakub Jelinek wrote: that. Another complication is dependent shared libraries. Consider liba.c: #pragma omp declare target int i; int foo (void) { return ++i; } #pragma omp end declare target main.c: #pragma omp declare target extern int i; extern int foo (void); #pragma omp end declare target int main () { int j; #pragma omp target { j = i; j += foo (); } if (j != 1) abort (); return 0; } gcc -shared -O2 -fpic -fopenmp -o liba.so -Wl,-soname,liba.so liba.c gcc -O2 -fopenmp -o main main.c -L. -la ./main Perhaps the linker plugin can extract the target shared libraries from the embedded sections of dependent shared libraries (if any), and link the main shared library against that, but GOMP_target will need to know that it can't just offload main.so, but also has to offload the dependent liba.so (and of course libgomp.so.1 from the libgomp plugin). What does ICC do in this case? Jakub Hi Jakub, Here's what ICC does. Suppose we have liba.c and main.c, both with target regions: 1. Building liba.c - liba.so. A call to offload-runtime library is inserted into _init of liba.so. Target region is compiled into liba_target.so, and placed into .rodata of liba.so. 2. Building main.c - main.exe. Similarly, a call to offload-runtime library is inserted into _init of main.exe. Target region is compiled into main_target.so, and placed into .rodata of main.exe. 3. Runtime. So, when liba.so and main.exe are loaded at host-side, the runtime library knows, that it should transfer liba_target.so and main_target.so to the target-side. Then, main.exe starts execution. At every entry point to the target region, runtime library checks whether it should perform an initialization. If target is not initialized, runtime library calls COIProcessCreateFromMemory(main_target.exe), that transfers some standard main_target.exe to the target and starts it. 
Then, runtime library calls COIProcessLoadLibraryFromMemory(liba_target.so, main_target.so), that transfers these libraries to the target and loads them into the main_target.exe. The target-side functions are called from host through COIProcessGetFunctionHandles(f_name) and COIPipelineRunFunction(handle). The addresses of target-side functions are obtained from *_target.so by dlsym(). So, the host-side knows nothing about target addresses. What do you think, how will such an approach work with other target architectures, and with current implementation of GOMP_target{,_data,_update}? Thanks, -- Ilya
Re: cost model patch
On Thu, Sep 26, 2013 at 7:37 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com wrote: I took the liberty to pick up Richard's original fvect-cost-model patch and made some modifications. What has not changed: 1) option -ftree-vect-loop-version is removed; 2) three cost models are introduced: cheap, dynamic, and unlimited; 3) unless explicitly specified, the cheap model is the default at O2 (e.g. when -ftree-loop-vectorize is used with -O2), and dynamic mode is the default for O3 and FDO; 4) alignment-based versioning is disabled with the cheap model. What has changed: 1) peeling is also disabled with the cheap model; 2) the alias check condition limit is reduced with the cheap model, but not completely suppressed. Runtime alias checking is a pretty important enabler. 3) tree if-conversion changes are not included. Does this patch look reasonable? In principle yes. Note that it changes the behavior of -O2 -ftree-vectorize as -ftree-vectorize does not imply changing the default cost model. I am fine with that, but eventually this will have some testsuite fallout. This reorg would also need documenting in changes.html to make people aware of this. Here is the proposed change: Index: htdocs/gcc-4.9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26 +++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 - @@ -37,6 +37,7 @@ <ul> <li>AddressSanitizer, a fast memory error detector, is now available on ARM. </li> +<li>GCC introduces a new cost model for the vectorizer, called the 'cheap' model. The new cost model is intended to minimize compile time, code size, and the potential negative runtime impact introduced when the vectorizer is turned on, at the expense of not getting the maximum potential runtime speedup. The 'cheap' model will be the default when the vectorizer is turned on at <code>-O2</code>.
To override this, use option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>. </ul> <h2>New Languages and Language specific improvements</h2> With completely disabling alignment peeling and alignment versioning you cut out targets that have no way of performing unaligned accesses. From looking at vect_no_align these are mips, sparc, ia64 and some arm. A compromise for them would be to allow peeling a single iteration and some alignment checks (like up to two?). Possibly. I think target owners can choose to do target-specific tunings as a follow-up. Reducing the number of allowed alias-checks is ok, but I'd reduce it more than to 6 (was that an arbitrary number or is that the result of some benchmarking?) yes -- we found that it is not uncommon to have a loop with 2 or 3 distinct source addresses and 1 or 2 target addresses. There are also tuning opportunities. For instance, in cases where the source addresses are derived from the same base, a consolidated alias check (against the whole access range instead of just checking cross one-unrolled-iteration dependence) can be done. I suppose all of the params could use some benchmarking to select a sweet spot in code size vs. runtime. Agree. I suppose the patch is ok as-is (if it actually works) if you provide a changelog and propose an entry for changes.html. We can tune the params for the cheap model as a followup. Ok. I will do more testing and check in the patch with proper ChangeLog. The changes.html change will be done separately. thanks, David Thanks for picking this up, Richard. thanks, David
[gomp4] Library side of depend clause support
Hi! This patch adds depend clause support. In GOMP_task, before queueing the task, if task has any depend clauses we look up the addresses in a hash table (in the parent task, because only sibling tasks are ordered through depend clause), and if there are any dependencies on the earlier started tasks, the new task isn't added to team, parent and taskgroup task queues, but instead just added into the earlier task's depender vectors. Each task has also an integer number of how many other tasks it depends on. When a task on which something depends on finishes, if parent exists, it's records are removed from parent's depend address hash table, and even if parent doesn't exist anymore, we decrement num_dependees of every task mentioned in the dependers vector. If any of those counters go to zero, we insert them into all the relevant task queues. Tested on x86_64-linux. Will commit tomorrow unless somebody complains, but in any case would appreciate review of the changes. 2013-09-26 Jakub Jelinek ja...@redhat.com * libgomp.h: Include stdlib.h. (struct gomp_task_depend_entry): New type. (struct gomp_task): Add dependers, depend_hash, depend_count, num_dependees and depend fields. (struct gomp_taskgroup): Add num_children field. (gomp_finish_task): Free depend_hash if non-NULL. * libgomp_g.h (GOMP_task): Add depend argument. * hashtab.h: New file. * task.c: Include hashtab.h. (hash_entry_type): New typedef. (htab_alloc, htab_free, htab_hash, htab_eq): New inlines. (gomp_init_task): Clear dependers, depend_hash and depend_count fields. (GOMP_task): Add depend argument, handle depend clauses. Increment num_children field in taskgroup. (gomp_task_run_pre): Don't increment task_running_count here, nor clear task_pending bit. (gomp_task_run_post_handle_depend_hash, gomp_task_run_post_handle_dependers, gomp_task_run_post_handle_depend): New functions. (gomp_task_run_post_remove_parent): Clear in_taskwait before signalling corresponding semaphore. 
(gomp_task_run_post_remove_taskgroup): Decrement num_children field and make the decrement to 0 MEMMODEL_RELEASE operation, rather than storing NULL to taskgroup-children. Clear in_taskgroup_wait before signalling corresponding semaphore. (gomp_barrier_handle_tasks): Move task_running_count increment and task_pending bit clearing here. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. (GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskwait_sem, enter critical section again. (GOMP_taskgroup_start): Initialize num_children field. (GOMP_taskgroup_end): Check num_children instead of children before critical section. If children is NULL, but num_children is non-zero, wait on taskgroup_sem. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskgroup_sem, enter critical section again. * testsuite/libgomp.c/depend-1.c: New test. * testsuite/libgomp.c/depend-2.c: New test. --- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200 +++ libgomp/libgomp.h 2013-09-26 17:17:28.267001263 +0200 @@ -39,6 +39,7 @@ #include pthread.h #include stdbool.h +#include stdlib.h #ifdef HAVE_ATTRIBUTE_VISIBILITY # pragma GCC visibility push(hidden) @@ -253,7 +254,19 @@ enum gomp_task_kind GOMP_TASK_TIED }; +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; /* This structure describes a task to be run by a thread. 
*/ @@ -268,6 +281,10 @@ struct gomp_task struct gomp_task *next_taskgroup; struct gomp_task *prev_taskgroup; struct gomp_taskgroup *taskgroup; + struct gomp_task **dependers; + struct htab *depend_hash; + size_t depend_count; + size_t num_dependees; struct gomp_task_icv icv; void (*fn) (void *); void *fn_data; @@ -277,6 +294,7 @@ struct gomp_task bool final_task; bool copy_ctors_done; gomp_sem_t taskwait_sem; + struct gomp_task_depend_entry depend[]; }; struct gomp_taskgroup @@ -286,6 +304,7 @@ struct gomp_taskgroup bool in_taskgroup_wait; bool cancelled; gomp_sem_t taskgroup_sem; + size_t num_children; }; /* This structure describes a team of threads. These are the threads @@ -525,6 +544,8 @@ extern void gomp_barrier_handle_tasks (g static void inline gomp_finish_task (struct gomp_task *task) { + if (__builtin_expect
Re: [patch] fix libstdc++/55861
On 19 January 2013 23:43, Jonathan Wakely wrote: PR libstdc++/55861 * include/std/future (_State_base::_S_check(const shared_ptrT)): Fix return type. (__basic_future::_M_get_result()): Const qualify. (shared_future::get()): Likewise. * testsuite/30_threads/shared_future/members/get.cc: Use const objects. Tested x86_64-linux, committed to trunk. I've backported the first part of this, fixing the return type, to the 4.7 branch. Unfortunately it makes std::future unusable with clang, see http://llvm.org/bugs/show_bug.cgi?id=17375 Tested x86_64-linux, committed to the 4.7 branch. commit e2829e7c39c153cfc0d09cd8a8be14c5467c7730 Author: Jonathan Wakely jwakely@gmail.com Date: Thu Sep 26 10:11:29 2013 +0100 Backport from mainline 2013-01-19 Jonathan Wakely jwakely@gmail.com PR libstdc++/55861 * include/std/future (_State_base::_S_check(const shared_ptrT)): Fix return type. diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future index 98c7b84..150c1af 100644 --- a/libstdc++-v3/include/std/future +++ b/libstdc++-v3/include/std/future @@ -1,6 +1,6 @@ // future -*- C++ -*- -// Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc. +// Copyright (C) 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free // software; you can redistribute it and/or modify it under the @@ -456,7 +456,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __setter(promisevoid* __prom); templatetypename _Tp -static bool +static void _S_check(const shared_ptr_Tp __p) { if (!static_castbool(__p))
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Tue, Sep 24, 2013 at 5:31 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi Honza, I am finally getting back to working on this after a few weeks of working on some other priorities. I am also trying to return to this, so good timing ;) Martin has got smaller C++ programs (Inkscape) to not touch the cold segment during startup with FDO (w/o partitioning). Firefox still does; I think the problem is lost samples due to different linker decisions even with LTO (i.e. the linker picks an object from a .a library at profile-generate time that is never passed later). I plan to look into that today. Did you mean to commit the above change? I see that it went in as part of r202258 but doesn't show up in the ChangeLog entry for that revision. Yes, I meant to check it in, but did not mean to do so w/o a Changelog. I will fix that. In other cases it was mostly loop unrolling in combination with jump threading. So I modified my script to separately report when failure happens for the test trained once and the test trained a hundred times. Thanks for the linker script. I reproduced your results. I looked at a couple of cases. The first was one that failed after 1 training run only (2910-2.c). It was due to jump threading, which you noted was a problem. For this one I think we can handle it in the partitioning, since there is an FDO insanity that we could probably treat more conservatively when splitting. We should fix the roundoff issues - when I was introducing the frequency/probability/count system I made an assumption that parts of programs with very low counts do not matter, since they are not part of the hot spot (and I cared only about the hot spot). Now we care about identifying unlikely executed spots and we need to fix this. I looked at one that failed after 100 as well (20031204-1.c).
In this case, it was due to expansion which was creating multiple branches/bbs from a logical OR and guessing incorrectly on how to assign the counts: if (octets == 4 && (*cp == ':' || *cp == '\0')) { The (*cp == ':' || *cp == '\0') part looked like the following going into RTL expansion: [20031204-1.c : 31:33] _29 = _28 == 58; [20031204-1.c : 31:33] _30 = _28 == 0; [20031204-1.c : 31:33] _31 = _29 | _30; [20031204-1.c : 31:18] if (_31 != 0) goto <bb 16>; else goto <bb 19>; where the result of the OR was always true, so bb 16 had a count of 100 and bb 19 a count of 0. When it was expanded, the expanded version of the above turned into 2 bbs with a branch in between. Both comparisons were done in the first bb, but the first bb checked whether the result of the *cp == '\0' compare was true, and if not branched to the check for whether the *cp == ':' compare was true. It gave the branch to the second check against ':' a count of 0, so that bb got a count of 0 and was split out, and put the count of 100 on the fall through assuming the compare with '\0' always evaluated to true. In reality, this OR condition was always true because *cp was ':', not '\0'. Therefore, the count of 0 on the second block with the check for ':' was incorrect, we ended up trying to execute it, and failed. Presumably we had the correct profile data for both blocks, but the accuracy was reduced when the OR was represented as a logical computation with a single branch. We could change the expansion code to do something different, e.g. treat it as a 50-50 branch. But we would still end up with integer truncation issues when there was a single training run. But that could be dealt with conservatively in the bbpart code as I suggested for the jump threading issue above. I.e. a cold block with incoming non-cold edges conservatively not marked cold for splitting.
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2422-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/2910-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20020413-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20030903-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20031204-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060420-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20060905-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120427-2.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20120808-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/20121108-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920501-6.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/920726-1.c
FAIL1 /home/jh/trunk/gcc/testsuite/gcc.c-torture/execute/981001-1.c
FAIL
OMP4/cilkplus: simd clone function mangling
Hi folks. Both OMP4 and Cilk Plus provide mechanisms for simd function cloning. In OMP4 it's #pragma omp declare simd and in Cilk Plus they are elemental functions (or simd-enabled functions in their Intel's latest nomenclature). For lack of a better term, I'll call them simd clones. We need a generic way of representing the clone information all the way through to the vectorizer, as well as provide unique mangling for the simd functions. The current patch does both. Currently nothing is done with the information, but I wanted to get this out and get feedback as this will affect not only OMP4 and Cilk Plus, but possibly OpenACC and others-- and later, the vectorizer. Intel's Vector Function ABI v0.9.5 document explains how their mangling and ABI works for Cilk Plus, so this seems like a good start. I have adapted the algorithm to use a more generic interface for either non-x86* or non-CilkPlus. I'd like to keep/commit this in the gomp4 branch, since it has a complete parsing implementation of #pragma omp declare simd. Tested on x86-64 Linux on the gomp-4_0-branch. OK for branch? diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1a12eda..ff1533c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,45 @@ +2013-09-24 Aldy Hernandez al...@redhat.com + + * Makefile.in (omp-low.o): Depend on PRETTY_PRINT_H. + * ipa-cp.c (determine_versionability): Nodes with SIMD clones are + not versionable. + * ggc.h (ggc_alloc_cleared_simd_clone_stat): New. + * cgraph.h (enum linear_stride_type): New. + (struct simd_clone_arg): New. + (struct simd_clone): New. + (struct cgraph_node): Add `simdclone' field. + Add `has_simd_clones' field. + * omp-low.c: Add new pass_omp_simd_clone support code. + (vecsize_mangle): New. + (ipa_omp_simd_clone): New. + (simd_clone_clauses_extract): New. + (simd_clone_compute_base_data_type): New. + (simd_clone_compute_isa_and_simdlen): New. + (simd_clone_create): New. + (simd_clone_mangle): New. + (simd_clone_struct_allow): New. 
+ (simd_clone_struct_copy): New. + (class argno_map): New. + (argno_map::argno_map(tree)): New. + (argno_map::~argno_map): New. + (argno_map::to_tree): New. + * tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): New. + * tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document. + * tree-pass.h (make_pass_omp_simd_clone): New. + * passes.def (pass_omp_simd_clone): New. + * target.def: Define new hook prefix TARGET_CILKPLUS_. + (default_vector_mangling_isa_code): New. + (max_vector_size_for_isa): New. + * doc/tm.texi.in: Add placeholder for + TARGET_CILKPLUS_DEFAULT_DEFAULT_VECTOR_MANGLING_ISA_CODE, + TARGET_CILKPLUS_MAX_VECTOR_SIZE_FOR_ISA. + * doc/tm.texi: Regenerate. + * config/i386/i386.c (ix86_cilkplus_default_vector_mangling_isa_code): + New. + (ix86_cilkplus_max_vector_size_for_isa): New. + (TARGET_CILKPLUS_DEFAULT_DEFAULT_VECTOR_MANGLING_ISA_CODE): Define. + (TARGET_CILKPLUS_MAX_VECTOR_SIZE_FOR_ISA): Define. + 2013-09-19 Jakub Jelinek ja...@redhat.com PR tree-optimization/58472 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index c006711..4fc7e48 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2573,6 +2573,7 @@ omp-low.o : omp-low.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ $(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h $(DIAGNOSTIC_CORE_H) \ $(TREE_SSA_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \ $(TREE_PASS_H) $(GGC_H) $(EXCEPT_H) $(SPLAY_TREE_H) $(OPTABS_H) \ + $(PRETTY_PRINT_H) \ $(CFGLOOP_H) tree-iterator.h $(TARGET_H) gt-omp-low.h tree-browser.o : tree-browser.c tree-browser.def $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(HASH_TABLE_H) $(TREE_H) $(TREE_PRETTY_PRINT_H) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 50e8743..3db29cd 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -248,6 +248,68 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; +enum linear_stride_type { + LINEAR_STRIDE_NO, + LINEAR_STRIDE_YES_CONSTANT, + LINEAR_STRIDE_YES_VARIABLE +}; + +/* Function arguments in the original function of a SIMD clone. 
+ Supplementary data for `struct simd_clone'. */ + +struct GTY(()) simd_clone_arg { + /* A SIMD clone's argument can be either linear (constant or + variable), uniform, or vector. If the argument is neither linear + or uniform, the default is vector. */ + + /* If the linear stride is a constant, `linear_stride' is + LINEAR_STRIDE_YES_CONSTANT, and `linear_stride_num' holds + the numeric stride. + + If the linear stride is variable, `linear_stride' is + LINEAR_STRIDE_YES_VARIABLE, and `linear_stride_num' contains + the function argument containing the stride (as an index into the + function arguments starting at 0). + + Otherwise, `linear_stride' is LINEAR_STRIDE_NO
Re: OMP4/cilkplus: simd clone function mangling
+ /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to "omp + declare simd". */ + bool cilk_clone = flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); + if (cilk_clone) + remove_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); Oh yeah, rth had asked me why I remove the attribute. My initial thoughts were that whether or not a function is a simd clone can be accessed through the cgraph bits (node->simdclone != NULL for the clone, and node->has_simd_clones for the parent). No sense keeping the attribute. But I can leave it if you think it's better. Aldy
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
I discovered that I was setting the wv/wu constraints incorrectly to ALTIVEC_REGS, which leads to reload failures in some cases. Is this patch ok to apply along with the previous patch assuming it bootstraps and has no regressions with make check? It builds the programs that had failures with the previous patch. 2013-09-26 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can occupy the Altivec registers. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 202846) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1209,9 +1209,9 @@ BU_VSX_1 (XVRSPIZ, xvrspiz,CONS BU_VSX_1 (XSRDPI,xsrdpi, CONST, vsx_xsrdpi) BU_VSX_1 (XSRDPIC, xsrdpic,CONST, vsx_xsrdpic) -BU_VSX_1 (XSRDPIM, xsrdpim,CONST, vsx_floordf2) -BU_VSX_1 (XSRDPIP, xsrdpip,CONST, vsx_ceildf2) -BU_VSX_1 (XSRDPIZ, xsrdpiz,CONST, vsx_btruncdf2) +BU_VSX_1 (XSRDPIM, xsrdpim,CONST, floordf2) +BU_VSX_1 (XSRDPIP, xsrdpip,CONST, ceildf2) +BU_VSX_1 (XSRDPIZ, xsrdpiz,CONST, btruncdf2) /* VSX predicate functions. */ BU_VSX_P (XVCMPEQSP_P, xvcmpeqsp_p,CONST, vector_eq_v4sf_p) Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 202874) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -2394,13 +2394,17 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS; rs6000_constraints[RS6000_CONSTRAINT_wd] = VSX_REGS; rs6000_constraints[RS6000_CONSTRAINT_wf] = VSX_REGS; - rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS; if (TARGET_VSX_TIMODE) rs6000_constraints[RS6000_CONSTRAINT_wt] = VSX_REGS; - rs6000_constraints[RS6000_CONSTRAINT_ws] - = (TARGET_UPPER_REGS_DF) ? 
VSX_REGS : FLOAT_REGS; + if (TARGET_UPPER_REGS_DF) + { + rs6000_constraints[RS6000_CONSTRAINT_ws] = VSX_REGS; + rs6000_constraints[RS6000_CONSTRAINT_wv] = ALTIVEC_REGS; + } + else + rs6000_constraints[RS6000_CONSTRAINT_ws] = FLOAT_REGS; } /* Add conditional constraints based on various options, to allow us to @@ -2420,12 +2424,16 @@ rs6000_init_hard_regno_mode_ok (bool glo if (TARGET_POWERPC64) rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS; - if (TARGET_P8_VECTOR) + if (TARGET_P8_VECTOR && TARGET_UPPER_REGS_SF) { rs6000_constraints[RS6000_CONSTRAINT_wu] = ALTIVEC_REGS; - rs6000_constraints[RS6000_CONSTRAINT_wy] - = rs6000_constraints[RS6000_CONSTRAINT_ww] - = (TARGET_UPPER_REGS_SF) ? VSX_REGS : FLOAT_REGS; + rs6000_constraints[RS6000_CONSTRAINT_wy] = VSX_REGS; + rs6000_constraints[RS6000_CONSTRAINT_ww] = VSX_REGS; +} + else if (TARGET_P8_VECTOR) +{ + rs6000_constraints[RS6000_CONSTRAINT_wy] = FLOAT_REGS; + rs6000_constraints[RS6000_CONSTRAINT_ww] = FLOAT_REGS; } else if (TARGET_VSX) rs6000_constraints[RS6000_CONSTRAINT_ww] = FLOAT_REGS; Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 202846) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -617,6 +617,25 @@ extern int rs6000_vector_align[]; || rs6000_cpu == PROCESSOR_PPC8548) +/* Whether SF/DF operations are supported on the E500. */ +#define TARGET_SF_SPE (TARGET_HARD_FLOAT && TARGET_SINGLE_FLOAT \ + && !TARGET_FPRS) + +#define TARGET_DF_SPE (TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT \ + && !TARGET_FPRS && TARGET_E500_DOUBLE) + +/* Whether SF/DF operations are supported by the normal floating point unit + (or the vector/scalar unit). */ +#define TARGET_SF_FPR (TARGET_HARD_FLOAT && TARGET_FPRS \ + && TARGET_SINGLE_FLOAT) + +#define TARGET_DF_FPR (TARGET_HARD_FLOAT && TARGET_FPRS \ + && TARGET_DOUBLE_FLOAT) + +/* Whether SF/DF operations are supported by any hardware.
*/ +#define TARGET_SF_INSN (TARGET_SF_FPR || TARGET_SF_SPE) +#define TARGET_DF_INSN (TARGET_DF_FPR || TARGET_DF_SPE) + /* Which machine supports the various reciprocal estimate instructions. */ #define TARGET_FRES (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \ && TARGET_FPRS && TARGET_SINGLE_FLOAT)
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
Just to be clear, I was only asking about the change in rs6000.c. The other two changes (rs6000-builtins.def, rs6000.h) will be part of the next patch set. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH, i386, MPX 1/X] Support of Intel MPX ISA. 2/2 New registers and instructions
On Tue, Sep 17, 2013 at 10:41 AM, Ilya Enkovich enkovich@gmail.com wrote: The x86 part looks mostly OK (I have a couple of comments below), but please first get target-independent changes reviewed and committed. Do you mean I should move bound type and mode declaration into a separate patch? Yes, the target-independent part (middle end) has to go through a separate review to check if this part is OK. The target-dependent part uses the infrastructure from the middle end, so it can go into the code base only after the target-independent parts are committed. I sent a separate patch for bound type and mode class (http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01268.html). Here is the target part of the patch with the fixes you mentioned. Does it look OK? Bootstrapped and checked on linux-x86_64. Still shows incorrect length attribute computation (described here http://gcc.gnu.org/ml/gcc/2013-07/msg00311.html). Please look at the attached patch that solves the length computation problem. The patch also implements length calculation in a generic way, as proposed earlier. The idea is to calculate total insn length via generic length attribute calculation from the length_nobnd attribute, but iff the length_nobnd attribute is non-null. This way, we are able to decorate bnd-prefixed instructions with the length_nobnd attribute, and the generic part will automatically call the ix86_bnd_prefixed_insn_p predicate with the current insn pattern. I also believe that this approach is most flexible to decorate future patterns. The patch adds the new attribute to a couple of patterns to illustrate its usage. Please test this approach. Modulo length calculations, improved by the patch in this message, I have no further comments, but please repost the complete (target part) of your patch. Uros.
Index: config/i386/i386.md === --- config/i386/i386.md (revision 202953) +++ config/i386/i386.md (working copy) @@ -562,12 +562,19 @@ ] (const_int 1))) +;; When this attribute is set, calculate total insn length from +;; length_nobnd attribute, prefixed with eventual bnd prefix byte +(define_attr length_nobnd (const_int 0)) + ;; The (bounding maximum) length of an instruction in bytes. ;; ??? fistp and frndint are in fact fldcw/{fistp,frndint}/fldcw sequences. ;; Later we may want to split them and compute proper length as for ;; other insns. (define_attr length - (cond [(eq_attr type other,multi,fistp,frndint) + (cond [(eq_attr length_nobnd !0) + (plus (symbol_ref (ix86_bnd_prefixed_insn_p (insn))) +(attr length_nobnd)) +(eq_attr type other,multi,fistp,frndint) (const_int 16) (eq_attr type fcmp) (const_int 4) @@ -10683,7 +10690,7 @@ %+j%C1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) + (set (attr length_nobnd) (if_then_else (and (ge (minus (match_dup 0) (pc)) (const_int -126)) (lt (minus (match_dup 0) (pc)) @@ -10701,7 +10708,7 @@ %+j%c1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) + (set (attr length_nobnd) (if_then_else (and (ge (minus (match_dup 0) (pc)) (const_int -126)) (lt (minus (match_dup 0) (pc)) @@ -11623,7 +11630,7 @@ [(simple_return)] reload_completed ret - [(set_attr length 1) + [(set_attr length_nobnd 1) (set_attr atom_unit jeu) (set_attr length_immediate 0) (set_attr modrm 0)])
[Patch] Regex Fixes on constants and no-backref in extended
Just follow the standard :) Booted and tested under -m32, -m64. Thank you! -- Tim Shen a.patch Description: Binary data
Re: [Patch] Regex Fixes on constants and no-backref in extended
On 26 September 2013 22:28, Tim Shen wrote: Just follow the standard :) Booted and tested under -m32, -m64. The ChangeLog entry says "stanard"; with that fixed, this is OK to commit, thanks!
Re: [Patch] Regex Fixes on constants and no-backref in extended
On Thu, Sep 26, 2013 at 5:37 PM, Jonathan Wakely jwakely@gmail.com wrote: The ChangeLog entry says stanard, with that fixed this is OK to commit, thanks! Committed. -- Tim Shen
Re: [PATCH] Trivial cleanup
On 09/26/2013 10:21 AM, Richard Biener wrote: On Thu, Sep 26, 2013 at 4:15 PM, Michael Matz m...@suse.de wrote: Hi, On Wed, 25 Sep 2013, Jeff Law wrote: I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods... Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time. Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced, i.e. foo.last()->e rather than foo.last ()->e. I'd prefer to not write in this style at all, like Jakub. If we must absolutely have it, then I agree that the space before _empty_ parentheses is ugly if followed by dereferences. I.e. I'd like to see spaces before parens as is customary, except in one case: empty parens in the middle of expressions (which don't happen very often right now in GCC, and hence wouldn't introduce a coding style confusion): do.this (); give.that()->flag; get.list (one)->clear (); I'd prefer to not have further references to return values be applied, though (as in, the parentheses should be the end of the statement), which would avoid the topic (at the expense of having to invent names for those temporaries, or to write trivial wrapper methods contracting several method calls). Seconded, even give.that()->flag; is ugly. I don't find it ugly :-) so my example would end up being int unsignedsrcp = ptrvar.type().type().type_unsigned (); ? I can live with this suggestion... it's the sequence of 2 or 3 or 4 empty parens with spaces that I really found distasteful. Andrew
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
As for COMDAT merging, i would like to see the patch. I am experimenting now with a patch to also privatize COMDATs during -fprofile-generate to avoid problems with lost profiles mentioned above. Do you mean you privatize every COMDAT function in the profile-generate? We discussed this idea internally and we thought it would not work for large applications (like in google) due to size. Yes, Martin and I plan to test this on firefox. In a way you already have all the COMDAT functions unshared in the object files, so the resulting binary should not be completely off the limits. But I do not have any quantitative data, yet, since we hit bug in constant folding and devirtualization I fixed in meantime but we did not re-run the tests yet. As for context sensitivity, one approach would be to have two sets of counters for every comdat - one merged globally and one counting local instances. We can then privatize always and at profile read in stage just clone every comdat and have two instances - one for offline copy and one for inlining. In my implementation, I also allow multiple sets of COMDAT profile co-existing in one compilation. Due to the auxiliary modules in LIPO, I actually have more than two. How does auxiliary modules work? But I'm wondering how do you determine which profile to use for each call-site -- the inline decision may not be the same for profile-generate and profile-use compilation. My suggestion was to simply use the module local profile for all inline sites within the given module and the global profile for the offline copy of the function (that one will, in the case it survives linking, be shared across all the modules anyway). I think this may work in the cases where i.e. use of hash templates in one module is very different (in average size) from other module. 
I did not really put much effort into it - I currently worry primarily about the cases where the profile is lost completely since it gets attached to a function not surviving final linking (or because we inline something we did not inline at profile time). As for context sensitivity, we may try to consider developing a more consistent solution for this. COMDAT functions are definitely not the only ones that may exhibit context-sensitive behaviour. One approach would be to always have multiple counters for each function and hash based on backtraces collected by indirect call profiling instrumentation. In a way this is the same as path profiling, but that would definitely add quite some overhead + we will need to think of a reasonable way to represent this within the compiler. How do you decide what functions you want to have multiple profiles for? Honza
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Why not just have probably_never_executed_bb_p simply return false when bb->frequency is non-zero (right now it does the opposite - We want to have frequencies guessed for functions that were not trained in the profiling run (that was the patch I posted earlier that I think did not go in, yet). Currently I return true when the frequency indicates that the BB is executed at least in 1/4th of all executions. With the cases discussed I see we may need to reduce this threshold. In general I do not much like hard tests for 0, because the meaning of 0 depends on REG_BR_FREQ_BASE that is supposed to be changeable, and we may want to make frequencies sreal, too. I suppose we may introduce a --param for this. You are also right that I should update probably_never_executed_edge_p (I intended so, but obviously the code ended up in mainline accidentally). I however saw at least one case of jump threading where this trick did not help: the jump threading update confused itself by scaling via counts rather than frequencies and ended up with dropping everything to 0. This makes it more tempting to try to go with sreals for those. Honza returns true when bb->frequency is 0)? Making this change removed a bunch of other failures. With this change as well, there are only 3 cases that still fail with 1 train run that pass with 100. Need to look at those. Will you look into the logic of do_jump or shall I try to dive in? I can take a look, but probably won't have a chance until late this week. If you don't get to it before then I will see if I can figure out why it is applying the branch probabilities this way. Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[google gcc-4_8] fix size_estimation for builtin_expect
Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relation expr which has non-zero size/time. The end result is: for w/ and w/o builtin_expect, we have different size/time estimations for early inlining. This patch fixes this problem. -Rong 2013-09-26 Rong Xu x...@google.com * ipa-inline-analysis.c (estimate_function_body_sizes): Fix the size estimation for builtin_expect. Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 202638) +++ ipa-inline-analysis.c (working copy) @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node * /* Estimate static overhead for function prologue/epilogue and alignment. */ int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); int size = overhead; + gimple fix_expect_builtin; + /* Benefits are scaled by probability of elimination that is in range 0,2. */ basic_block bb; @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node * } } + fix_expect_builtin = NULL; for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); + if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT)) +{ + tree var = gimple_call_lhs (stmt); + tree arg = gimple_call_arg (stmt, 0); + use_operand_p use_p; + gimple use_stmt; + bool match = false; + bool done = false; + gcc_assert (var && arg); + gcc_assert (TREE_CODE (var) == SSA_NAME); + + while (TREE_CODE (arg) == SSA_NAME) +{ + gimple stmt_tmp = SSA_NAME_DEF_STMT (arg); + switch (gimple_assign_rhs_code (stmt_tmp)) +{ + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: +match = true; +done = true; +break; + case NOP_EXPR: +break; + default: +done = true; +break; +} + if (done) +break; + arg = gimple_assign_rhs1 (stmt_tmp); +} + + if (match && single_imm_use (var, &use_p, &use_stmt)) +{ + if (gimple_code (use_stmt) == GIMPLE_COND) +{ + fix_expect_builtin = use_stmt; +} +} + + /* We should see one builtin_expect call in one bb. */ + break; +}
+}
+ + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); int this_size = estimate_num_insns (stmt, eni_size_weights); int this_time = estimate_num_insns (stmt, eni_time_weights); int prob; struct predicate will_be_nonconstant; + if (stmt == fix_expect_builtin) +{ + this_size--; + this_time--; +} + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, );
[google gcc-4_8] alternate hirate for builtin_expect
Hi, The current default probability for builtin_expect is 0.9996. This makes the freq of an unlikely bb very low (4), which suppresses the inlining of any calls within those bb. We used FDO data to measure the branch probability for branches annotated with builtin_expect. For google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload. This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for google benchmarks and Linux kernel. -Rong 2013-09-26 Rong Xu x...@google.com * params.def (DEFPARAM): New. * params.def: New. * predict.c (tree_predict_by_opcode): Alternate probability hirate for builtin_expect. Index: params.def === --- params.def (revision 202638) +++ params.def (working copy) @@ -483,6 +483,10 @@ DEFPARAM(TRACER_MIN_BRANCH_PROBABILITY, "tracer-min-branch-probability", "Stop forward growth if the probability of best edge is less than this threshold (in percent). Used when profile feedback is not available", 50, 0, 100) +DEFPARAM(BUILTIN_EXPECT_PROBABILITY_RELAXED, +"builtin-expect-probability-relaxed", +"Set the estimated probability for builtin expect. By default using PROB_VERY_LIKELY, while value of 1 using HIRATE(90) probability.", +0, 0, 1) /* The maximum number of incoming edges to consider for crossjumping. */ DEFPARAM(PARAM_MAX_CROSSJUMP_EDGES, Index: predict.c === --- predict.c (revision 202638) +++ predict.c (working copy) @@ -1950,10 +1950,17 @@ tree_predict_by_opcode (basic_block bb) BITMAP_FREE (visited); if (val) { + enum br_predictor predictor; + + if (PARAM_VALUE (BUILTIN_EXPECT_PROBABILITY_RELAXED) == 0) +predictor = PRED_BUILTIN_EXPECT; + else +predictor = PRED_BUILTIN_EXPECT_RELAXED; + if (integer_zerop (val)) - predict_edge_def (then_edge, PRED_BUILTIN_EXPECT, NOT_TAKEN); + predict_edge_def (then_edge, predictor, NOT_TAKEN); else - predict_edge_def (then_edge, PRED_BUILTIN_EXPECT, TAKEN); + predict_edge_def (then_edge, predictor, TAKEN); return; } /* Try pointer heuristic.
Re: [google gcc-4_8] fix size_estimation for builtin_expect
Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relation expr which has non-zero size/time. The end result is: for w/ and w/o builtin_expect, we have different size/time estimations for early inlining. This patch fixes this problem. -Rong 2013-09-26 Rong Xu x...@google.com * ipa-inline-analysis.c (estimate_function_body_sizes): Fix the size estimation for builtin_expect. This seems fine with a comment in the code explaining what it is about. I also think we want to support multiple builtin_expects in a BB, so perhaps we want to have a pointer set of statements to ignore? To avoid spaghetti code, please just move the new logic into separate functions. Honza Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 202638) +++ ipa-inline-analysis.c (working copy) @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node * /* Estimate static overhead for function prologue/epilogue and alignment. */ int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); int size = overhead; + gimple fix_expect_builtin; + /* Benefits are scaled by probability of elimination that is in range 0,2. */ basic_block bb; @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node * } } + fix_expect_builtin = NULL; for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); + if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT)) +{ + tree var = gimple_call_lhs (stmt); + tree arg = gimple_call_arg (stmt, 0); + use_operand_p use_p; + gimple use_stmt; + bool match = false; + bool done = false; + gcc_assert (var && arg); + gcc_assert (TREE_CODE (var) == SSA_NAME); + + while (TREE_CODE (arg) == SSA_NAME) +{ + gimple stmt_tmp = SSA_NAME_DEF_STMT (arg); + switch (gimple_assign_rhs_code (stmt_tmp)) +{ + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: +match = true; +done = true; +break; + case NOP_EXPR: +break; + default: +done = true; +break; +} + if (done) +break; + arg = gimple_assign_rhs1 (stmt_tmp); +} + + if (match && single_imm_use (var, &use_p, &use_stmt)) +{ + if (gimple_code (use_stmt) == GIMPLE_COND) +{ + fix_expect_builtin = use_stmt; +} +} + + /* We should see one builtin_expect call in one bb. */ + break; +}
+}
+ + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); int this_size = estimate_num_insns (stmt, eni_size_weights); int this_time = estimate_num_insns (stmt, eni_time_weights); int prob; struct predicate will_be_nonconstant; + if (stmt == fix_expect_builtin) +{ + this_size--; + this_time--; +} + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, );
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Thu, Sep 26, 2013 at 2:54 PM, Jan Hubicka hubi...@ucw.cz wrote: As for COMDAT merging, I would like to see the patch. I am experimenting now with a patch to also privatize COMDATs during -fprofile-generate to avoid the problems with lost profiles mentioned above. Do you mean you privatize every COMDAT function in the profile-generate? We discussed this idea internally and we thought it would not work for large applications (like in google) due to size. Yes, Martin and I plan to test this on firefox. In a way you already have all the COMDAT functions unshared in the object files, so the resulting binary should not be completely off the limits. But I do not have any quantitative data, yet, since we hit a bug in constant folding and devirtualization I fixed in the meantime but we did not re-run the tests yet. The linker removes a great number of duplicated copies, esp. for those template functions. We don't have quantitative numbers either, but I'll collect some soon. As for context sensitivity, one approach would be to have two sets of counters for every comdat - one merged globally and one counting local instances. We can then privatize always and at profile read-in stage just clone every comdat and have two instances - one for the offline copy and one for inlining. In my implementation, I also allow multiple sets of COMDAT profiles co-existing in one compilation. Due to the auxiliary modules in LIPO, I actually have more than two. How do auxiliary modules work? It pulls in multiple profiles from other compilations. So there might be multiple inlined profiles. But I'm wondering how do you determine which profile to use for each call-site -- the inline decision may not be the same for the profile-generate and profile-use compilations. My suggestion was to simply use the module-local profile for all inline sites within the given module and the global profile for the offline copy of the function (that one will, in the case it survives linking, be shared across all the modules anyway).
For a simple example like: callsite1 -> comdat_function_foo callsite2 -> comdat_function_foo callsite1 is inlined in profile-generate, so it has its own inlined profile counter. callsite2 is not inlined and its profile goes to the offline copy. Let's say callsite1 is cold (0 counter) and callsite2 is hot. Using the local profile (the cold one) for callsite2 will not be correct. I think this may work in the cases where i.e. use of hash templates in one module is very different (in average size) from another module. I did not really put much effort into it - I currently worry primarily about the cases where the profile is lost completely since it gets attached to a function not surviving final linking (or because we inline something we did not inline at profile time). As for context sensitivity, we may try to consider developing a more consistent solution for this. COMDAT functions are definitely not the only ones that may exhibit context-sensitive behaviour. One approach would be to always have multiple counters for each function and hash based on backtraces collected by indirect call profiling instrumentation. In a way this is the same as path profiling, but that would definitely add quite some overhead + we will need to think of a reasonable way to represent this within the compiler. How do you decide what functions you want to have multiple profiles for? I do the instrumentation after ipa-inline for comdat functions. I know if a callsite is inlined or not. In the profile-use phase, I also need to provide the context (which module this is from) to pick the right profile. Honza
Re: [google gcc-4_8] alternate hirate for builtin_expect
Hi,

The current default probability for builtin_expect is 0.9996. This makes the frequency of the unlikely bb very low (4), which suppresses the inlining of any calls within those bbs. We used FDO data to measure the branch probability for branches annotated with builtin_expect. For Google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. The Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload.

This is interesting. I was always unsure if programmers use builtin_expect more often to mark impossible paths (such as those leading to a crash) or just unlikely paths. Your data seems to suggest the second. We probably also ought to get analyze_brprob working again. It was always useful to get such data.

This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for Google benchmarks and the Linux kernel.

-Rong

2013-09-26  Rong Xu  x...@google.com

	* params.def (DEFPARAM): New.
	* params.def: New.
	* predict.c (tree_predict_by_opcode): Alternate probability hirate for builtin_expect.

This also seems reasonable for mainline. Please add a comment to the parameter explaining how the value was determined. Please also add invoke.texi documentation. For patches that seem reasonable for mainline in the FDO/IPA area, I would appreciate it if you just added me to CC, so I do not lose track of them.

Honza
Re: [google gcc-4_8] alternate hirate for builtin_expect
This patch improves Linux kernel performance with a large workload, so it is good to first submit this to trunk and backport it.

thanks,

David

On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:

Hi,

The current default probability for builtin_expect is 0.9996. This makes the frequency of the unlikely bb very low (4), which suppresses the inlining of any calls within those bbs. We used FDO data to measure the branch probability for branches annotated with builtin_expect. For Google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. The Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload.

This is interesting. I was always unsure if programmers use builtin_expect more often to mark impossible paths (such as those leading to a crash) or just unlikely paths. Your data seems to suggest the second. We probably also ought to get analyze_brprob working again. It was always useful to get such data.

This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for Google benchmarks and the Linux kernel.

-Rong

2013-09-26  Rong Xu  x...@google.com

	* params.def (DEFPARAM): New.
	* params.def: New.
	* predict.c (tree_predict_by_opcode): Alternate probability hirate for builtin_expect.

This also seems reasonable for mainline. Please add a comment to the parameter explaining how the value was determined. Please also add invoke.texi documentation. For patches that seem reasonable for mainline in the FDO/IPA area, I would appreciate it if you just added me to CC, so I do not lose track of them.

Honza
Re: [google gcc-4_8] alternate hirate for builtin_expect
It might be worth extending __builtin_expect to take more parameters:

1) an argument to specify the actual probability: __builtin_expect (x, 10, 0.6)

2) a more general way of doing this is to allow specifying multiple values, where the probability is determined by the number of occurrences: __builtin_expect (x, 10, 10, 20) -- tells the compiler x is expected to be 10 66% of the time, and 20 33% of the time.

3) a special value can be reserved to indicate whether the branch is predictable or not.

David

On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:

Hi,

The current default probability for builtin_expect is 0.9996. This makes the frequency of the unlikely bb very low (4), which suppresses the inlining of any calls within those bbs. We used FDO data to measure the branch probability for branches annotated with builtin_expect. For Google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. The Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload.

This is interesting. I was always unsure if programmers use builtin_expect more often to mark impossible paths (such as those leading to a crash) or just unlikely paths. Your data seems to suggest the second. We probably also ought to get analyze_brprob working again. It was always useful to get such data.

This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for Google benchmarks and the Linux kernel.

-Rong

2013-09-26  Rong Xu  x...@google.com

	* params.def (DEFPARAM): New.
	* params.def: New.
	* predict.c (tree_predict_by_opcode): Alternate probability hirate for builtin_expect.

This also seems reasonable for mainline. Please add a comment to the parameter explaining how the value was determined. Please also add invoke.texi documentation.
For patches that seem reasonable for mainline in the FDO/IPA area, I would appreciate it if you just added me to CC, so I do not lose track of them.

Honza
Re: [gomp4] Library side of depend clause support
On 09/26/2013 11:36 AM, Jakub Jelinek wrote:

+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};

I'm a bit confused about the combination of linked lists and reallocated arrays. When did you decide to use what?

+  if ((flags & 8) && thr->task && thr->task->depend_hash)
+    {
+      struct gomp_task *parent = thr->task;
+      struct gomp_task_depend_entry elem, *ent = NULL;
+      size_t ndepend = (uintptr_t) depend[0];
+      size_t nout = (uintptr_t) depend[1];
+      size_t i;
+      gomp_mutex_lock (&team->task_lock);
+      for (i = 0; i < ndepend; i++)
+	{
+	  elem.addr = depend[i + 2];
+	  ent = htab_find (parent->depend_hash, &elem);
+	  for (; ent; ent = ent->next)
+	    if (i >= nout && ent->is_in)
+	      continue;
+	    else
+	      break;

I wonder if we ought always defer tasks with dependencies and skip this lock and search? Unless the taskgroup is truly weird, we *are* going to have dependencies between the tasks. Dunno what exactly to do with final_tasks that have unfulfilled dependencies...

I also think it would significantly clean up the code to declare a struct with a variable tail for the depend argument. That would eliminate all of the casting to uintptr_t and give names to the first two entries.

+	  if (tsk->dependers == NULL)
+	    {
+	      tsk->dependers
+		= gomp_malloc (8 * sizeof (struct gomp_task *));
+	      tsk->dependers[0]
+		= (struct gomp_task *) (uintptr_t) 1;
+	      tsk->dependers[1]
+		= (struct gomp_task *) (uintptr_t) (8 - 2);
+	      tsk->dependers[2] = task;
+	      task->num_dependees++;
+	      continue;

Another place for which a struct with variable tail would significantly clean up the code. And here's where I wonder why you're using realloc'd arrays here as opposed to another linked list?
Perhaps what we need are smaller linked list entries like

struct gomp_task_depend_node
{
  struct gomp_task *task;
  struct gomp_task_depend_node *next;
  struct gomp_task_depend_node *prev;
};

and a different hash table entry like

struct gomp_task_depend_head
{
  void *addr;
  struct gomp_task_depend_node *outs;
  struct gomp_task_depend_node *ins;
  size_t n_ins;
};

If we scan the ndepend entries twice, we can find out how many nodes we need, and allocate them with the task like you do now. Scanning the ndepend array twice can be sped up by only looking up the hash table entries once -- allocate a local array of ndepend entries and cache the lookups.

I'd say we don't need a count of the n_outs because all of them on the list must be sequentially dependent. Thus any new task simply depends on the previous task in the outs list. Thus imo it makes sense to have ins/outs point to the tail of the list as opposed to the head.

Is it really worthwhile to detect redundant dependencies? It seems just as easy to add multiple dependencies and let them just fall out naturally.

OTOH, perhaps you should just go ahead with this patch and we can evolve it incrementally. I don't see anything technically wrong with it.

r~
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
On Thu, Sep 26, 2013 at 4:51 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:

I discovered that I was setting the wv/wu constraints incorrectly to ALTIVEC_REGS, which leads to reload failures in some cases. Is this patch OK to apply along with the previous patch, assuming it bootstraps and has no regressions with make check? It builds the programs that had failures with the previous patch.

2013-09-26  Michael Meissner  meiss...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can occupy the Altivec registers.

Okay. Can you add a testcase to catch this in the future?

Thanks, David
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #3
On Thu, Sep 26, 2013 at 06:56:37PM -0400, David Edelsohn wrote:

On Thu, Sep 26, 2013 at 4:51 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:

I discovered that I was setting the wv/wu constraints incorrectly to ALTIVEC_REGS, which leads to reload failures in some cases. Is this patch OK to apply along with the previous patch, assuming it bootstraps and has no regressions with make check? It builds the programs that had failures with the previous patch.

2013-09-26  Michael Meissner  meiss...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Don't allow wv/wu constraints to be ALTIVEC_REGISTERS unless DF/SF can occupy the Altivec registers.

Okay. Can you add a testcase to catch this in the future?

You only see it in big programs with aggressive optimizations; I did not see it during normal testing (bootstrap, etc.). The failure is reload complaining it can't find an Altivec register to spill if the move pattern has an option for only Altivec registers. It isn't as if bad code is silently generated. I will check the 5 SPEC benchmarks that failed with it to see if I can extract one module that shows it off.

-- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [gomp4] Library side of depend clause support
On Thu, Sep 26, 2013 at 03:54:09PM -0700, Richard Henderson wrote:

On 09/26/2013 11:36 AM, Jakub Jelinek wrote:

+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};

I'm a bit confused about the combination of linked lists and reallocated arrays. When did you decide to use what?

I initially wanted to use linked lists only, but while I can statically preallocate the chains for the hash table, for the depender -> dependee chains, where a task may depend on many other tasks, that would mean having to allocate small structures (or pool-allocate them, per team?).

I wonder if we ought always defer tasks with dependencies and skip this lock and search? Unless the taskgroup is truly weird, we *are* going to have dependencies between the tasks. Dunno what exactly to do with final_tasks that have unfulfilled dependencies...

I think final tasks aren't a problem: if the parent is a final task, then all the children are non-deferred, thus we never record any dependencies, and the test for that will be cheap too (because parent->depend_hash will be NULL). The problem is if (0) tasks; the spec says that they must be non-deferred unless they depend on some earlier non-finished task. But the cost in that case is primarily in taking the lock/unlock; the search will stop on the first dependency found. If there aren't any, nothing will be recorded and we don't jump to the defer label; if there are some, as soon as we discover the first we jump there.

I also think it would significantly clean up the code to declare a struct with a variable tail for the depend argument. That would eliminate all of the casting to uintptr_t and give names to the first two entries.

I agree that if we keep using realloced vectors, a flexible array would make it cleaner.
Perhaps what we need are smaller linked list entries like

struct gomp_task_depend_node
{
  struct gomp_task *task;
  struct gomp_task_depend_node *next;
  struct gomp_task_depend_node *prev;
};

The dependee -> depender vectors resp. linked lists are just pushed to first (the only thing needed during insertion is a cheap check whether the last inserted task is the current one, to avoid having the same task multiple times in the vector/linked list), and then just walked once when the dependee finishes, so no removal is needed there and it can be freed at once; thus, for a linked list, a non-doubly linked list would be enough for that. For the hash table chains, unless we want to always look up the hash table and walk the chains for removal, we need a doubly linked list.

and a different hash table entry like

struct gomp_task_depend_head
{
  void *addr;
  struct gomp_task_depend_node *outs;
  struct gomp_task_depend_node *ins;
  size_t n_ins;
};

You mean that the hash table would contain the structures themselves, or pointers to these structures? If the latter (not sure what n_ins would be for), then we'd again need to pool-allocate them.

If we scan the ndepend entries twice, we can find out how many nodes we need, and allocate them with the task like you do now. Scanning the ndepend array twice can be sped up by only looking up the hash table entries once -- allocate a local array of ndepend entries and cache the lookups. I'd say we don't need a count of the n_outs because all of them on the list must be sequentially dependent. Thus any new task simply depends on the previous task in the outs list. Thus imo it makes sense to have ins/outs point to the tail of the list as opposed to the head.

Ah, I hadn't thought about it this way; yes, you're right that for out/inout dependencies it is enough to remember in the hash table the last one, because the dependencies will form a chain on the same address and serialize the tasks.

Is it really worthwhile to detect redundant dependencies?
It seems just as easy to add multiple dependencies and let them just fall out naturally.

I just didn't want to have duplicates in the hash table chains; the redundant flag is just a sign that the entry doesn't need to be removed from the hash table chains.

OTOH, perhaps you should just go ahead with this patch and we can evolve it incrementally. I don't see anything technically wrong with it.

Perhaps. What if I do just minor cleanup (use flexible array members for the reallocated vectors, and perhaps keep only the last out/inout task in the hash table chains rather than all of them), retest, commit, and then we can discuss/incrementally improve it?

Jakub
[ping] [PATCH] Silence an unused variable warning
On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

Hi! With the VAX target, I see this warning:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include -I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../../gcc/gcc/../libbacktrace ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o

../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable ‘value_p’ [-Wunused-variable]
   bool value_p;
        ^
[...]

Ping: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01568.html
`-- http://gcc.gnu.org/ml/gcc-patches/2013-09/txtnrNwaGiD3x.txt

MfG, JBG

-- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature: Give your best. Then surpass yourself!
[Patch] Let ordinary escaping in POSIX regex be valid
POSIX ERE says that escaping an ordinary char, say "\n", is not permitted, because 'n' is not a special char. However, POSIX also says: "Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs." So let's support it, so users aren't surprised.

Booted and tested under -m32 and -m64.

Thanks!

-- Tim Shen
User-defined literals for std::complex.
Greetings,

The complex user-defined literals finally passed (N3779), with the resolution to DR 1473 allowing the suffix id to touch the quotes (can't find it, but I put it in not too long ago). (http://wiki.edg.com/twiki/pub/Wg21chicago2013/LibraryWorkingGroup/N3779-complex_literals.pdf)

Actually, I think allowing space between the quotes and the suffix ID was a mistake.

Also, it looks like they are *removing* inline from the 'namespace literals' so that 'using namespace std;' brings in the literals, but that will be a later patch for all literals at once.

This has been bootstrapped and regtested on x86_64-linux.

As a general stylistic guide for the library, I think I'll put operator""abc(...) with no spaces.

Later. OK?

2013-09-27  Ed Smith-Rowland  3dw...@verizon.net

	Implement N3779 - User-defined Literals for std::complex, part 2 of UDL for Standard Library Types
	* include/std/complex: Add complex literal operators.
	* testsuite/26_numerics/complex/literals/types.cc: New.
	* testsuite/26_numerics/complex/literals/values.cc: New.
Index: include/std/complex
===
--- include/std/complex (revision 202928)
+++ include/std/complex (working copy)
@@ -1924,6 +1924,40 @@
     conj(_Tp __x)
     { return __x; }
 
+#if __cplusplus > 201103L
+
+inline namespace literals {
+inline namespace complex_literals {
+
+  inline constexpr std::complex<float>
+  operator""if(long double __num)
+  { return std::complex<float>{0.0F, static_cast<float>(__num)}; }
+
+  inline constexpr std::complex<float>
+  operator""if(unsigned long long __num)
+  { return std::complex<float>{0.0F, static_cast<float>(__num)}; }
+
+  inline constexpr std::complex<double>
+  operator""i(long double __num)
+  { return std::complex<double>{0.0, static_cast<double>(__num)}; }
+
+  inline constexpr std::complex<double>
+  operator""i(unsigned long long __num)
+  { return std::complex<double>{0.0, static_cast<double>(__num)}; }
+
+  inline constexpr std::complex<long double>
+  operator""il(long double __num)
+  { return std::complex<long double>{0.0L, __num}; }
+
+  inline constexpr std::complex<long double>
+  operator""il(unsigned long long __num)
+  { return std::complex<long double>{0.0L, static_cast<long double>(__num)}; }
+
+} // inline namespace complex_literals
+} // inline namespace literals
+
+#endif // C++14
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace

Index: testsuite/26_numerics/complex/literals/types.cc
===
--- testsuite/26_numerics/complex/literals/types.cc (revision 0)
+++ testsuite/26_numerics/complex/literals/types.cc (working copy)
@@ -0,0 +1,46 @@
+// { dg-options "-std=c++1y" }
+// { dg-do compile }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <complex>
+#include <type_traits>
+
+void
+test02()
+{
+  using namespace std::literals::complex_literals;
+
+  static_assert(std::is_same<decltype(1.0if), std::complex<float>>::value,
+		"1.0if is std::complex<float>");
+
+  static_assert(std::is_same<decltype(1if), std::complex<float>>::value,
+		"1if is std::complex<float>");
+
+  static_assert(std::is_same<decltype(1.0i), std::complex<double>>::value,
+		"1.0i is std::complex<double>");
+
+  static_assert(std::is_same<decltype(1i), std::complex<double>>::value,
+		"1i is std::complex<double>");
+
+  static_assert(std::is_same<decltype(1.0il), std::complex<long double>>::value,
+		"1.0il is std::complex<long double>");
+
+  static_assert(std::is_same<decltype(1il), std::complex<long double>>::value,
+		"1il is std::complex<long double>");
+}

Index: testsuite/26_numerics/complex/literals/values.cc
===
--- testsuite/26_numerics/complex/literals/values.cc (revision 0)
+++ testsuite/26_numerics/complex/literals/values.cc (working copy)
@@ -0,0 +1,48 @@
+// { dg-options "-std=gnu++1y" }
+// { dg-do run }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either
Re: [PATCH] Trivial cleanup
On 09/26/2013 08:15 AM, Michael Matz wrote:

Hi,

On Wed, 25 Sep 2013, Jeff Law wrote:

I was going to bring it up at some point too. My preference is strongly to simply eliminate the space on methods...

Which wouldn't be so weird: in the libstdc++-v3 code we do it all the time.

Yea. I actually reviewed the libstdc++ guidelines to see where they differed from GNU's C guidelines. I'm strongly in favor of dropping the horizontal whitespace between the method name and its open paren when the result is then dereferenced, i.e. foo.last()->e rather than foo.last ()->e.

I'd prefer not to write in this style at all, like Jakub. If we must absolutely have it, then I agree that the space before _empty_ parentheses is ugly if followed by references. I.e. I'd like to see spaces before parens as is customary, except in one case: empty parens in the middle of expressions (which don't happen very often right now in GCC, and hence wouldn't introduce coding style confusion):

do.this ();
give.that()->flag;
get.list (one)->clear ();

I'd prefer not to have further references to return values be applied, though (as in, the parentheses should be the end of the statement), which would avoid the topic (at the expense of having to invent names for those temporaries, or to write trivial wrapper methods contracting several method calls).

Should we consider banning dereferencing the result of a method call and instead prefer to use a more functional interface such as Jakub has suggested, or have the result of the method call put into a temporary and dereference the temporary? I considered suggesting the latter. I wouldn't be a huge fan of the unnecessary temporaries, but they may be better than the horrid foo.last()->argh()->e->src or whatever. Stuffing the result into a temporary does have one advantage: it encourages us to CSE across the method calls in cases where the compiler might not be able to do so. Of course, being humans, we'll probably mess it up.

jeff
RE: [PATCH]Fix computation of offset in ivopt
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Tuesday, September 24, 2013 6:31 PM
To: Bin Cheng
Cc: GCC Patches
Subject: Re: [PATCH]Fix computation of offset in ivopt

On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote:

+  field = TREE_OPERAND (expr, 1);
+  if (DECL_FIELD_BIT_OFFSET (field)
+      && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+    boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+
+  tmp = component_ref_field_offset (expr);
+  if (top_compref
+      && cst_and_fits_in_hwi (tmp))
+    {
+      /* Strip the component reference completely.  */
+      op0 = TREE_OPERAND (expr, 0);
+      op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+      *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+      return op0;
+    }

The failure paths seem mangled; that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to

  if (!inside_addr)
    return orig_expr;

  tmp = component_ref_field_offset (expr);
  field = TREE_OPERAND (expr, 1);
  if (top_compref
      && cst_and_fits_in_hwi (tmp)
      && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
    {
      ...
    }

Will be refined.

Note that this doesn't really handle overflows correctly, as

+      *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;

may still overflow.

Since it's unsigned + signed + signed, according to implicit conversion the signed operands will be converted to unsigned, so the overflow would only happen when off0 is a huge number and tmp/boffset is a large positive number, right? Do I need to check whether off0 is larger than the overflowed result? Also there is a signed-unsigned problem here, see below.

@@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
       bitmap_clear (*depends_on);
     }

+  /* Sign-extend offset if utype has lower precision than
+     HOST_WIDE_INT.  */
+  offset = sext_hwi (offset, TYPE_PRECISION (utype));
+

offset is computed elsewhere in difference_cost, and the bug to me seems to be that it is unsigned. Sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype).

I agree. The root cause is in strip_offset_1, in which offset is computed. Every time offset is computed in this function with a signed operand (like int_cst_value (tmp) above), we need to take care of the possible negative-number problem. Taking this case as an example, we need to change

case INTEGER_CST:
  //...
  *offset = int_cst_value (expr);

to

case INTEGER_CST:
  //...
  *offset = sext_hwi (int_cst_value (expr), type);

and

case MULT_EXPR:
  //...
  *offset = sext_hwi (int_cst_value (expr), type);

to

case MULT_EXPR:
  //...
  HOST_WIDE_INT xxx = (HOST_WIDE_INT) off0 * int_cst_value (op1);
  *offset = sext_hwi (xxx, type);

Any comments? Thanks.

bin