Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
> I pondered changing the condition for swapping the insn order, but it didn't
> seem worth it.  I doubt we see many 3-2 combinations where I3 is a JUMP_INSN,
> the result turns into two simple sets and the insn swapping code you wrote
> decides it needs to swap the insns.

I didn't actually write it, just enhanced it recently; that's why I suggested doing the same here.  It's one line of code and we have an example of valid simplification at hand, so I think we ought to do it.

> It seems to me that as long as we're re-using the existing insns to contain
> the simple sets that we have to ensure that they're INSNs, not CALL_INSNs
> or JUMP_INSNs.

I disagree: nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner.

-- 
Eric Botcazou
[committed] Add pr59927 testcase
Hi!

On Tue, Feb 11, 2014 at 01:13:09AM +0000, rth at gcc dot gnu.org wrote:

--- Comment #9 from Richard Henderson rth at gcc dot gnu.org ---
Author: rth
Date: Tue Feb 11 01:12:38 2014
New Revision: 207677

URL: http://gcc.gnu.org/viewcvs?rev=207677&root=gcc&view=rev
Log:
PR target/59927
	* calls.c (expand_call): Don't double-push for reg_parm_stack_space.
	* config/i386/i386.c (init_cumulative_args): Remove sorry for 64-bit
	ms-abi vs -mno-accumulate-outgoing-args.
	(ix86_expand_prologue): Unconditionally call ix86_eax_live_at_start_p.
	* config/i386/i386.h (ACCUMULATE_OUTGOING_ARGS): Fix comment with
	respect to ms-abi.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/calls.c
	trunk/gcc/config/i386/i386.c
	trunk/gcc/config/i386/i386.h

I've committed the following testcase for a PR Richard has fixed:

2014-02-11  Jakub Jelinek  ja...@redhat.com

	PR target/59927
	* gcc.target/i386/pr59927.c: New test.

--- gcc/testsuite/gcc.target/i386/pr59927.c.jj	2014-02-11 10:04:10.430559305 +0100
+++ gcc/testsuite/gcc.target/i386/pr59927.c	2014-02-11 10:03:54.000000000 +0100
@@ -0,0 +1,17 @@
+/* PR target/59927 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+extern void baz (int) __attribute__ ((__ms_abi__));
+
+void
+foo (void (__attribute__ ((ms_abi)) *fn) (int))
+{
+  fn (0);
+}
+
+void
+bar (void)
+{
+  baz (0);
+}

	Jakub
[PATCH][ARM] Fix PR 55426
Hi all,

In this PR the 128-bit load-duplicate intrinsics in neon.exp ICE on big-endian with an unrecognisable insn error:

neon-vld1_dupQ.c:24:1: error: unrecognizable insn:
(insn 94 93 31 (set (subreg:DI (reg:V2DI 95 d16 [orig:137 D.14400 ] [137]) 0)
        (subreg:DI (reg:V2DI 95 d16 [orig:137 D.14400 ] [137]) 8))

The problem seems to be that the neon_vld1_dupv2di splitter generates subregs after reload with gen_lowpart and gen_highpart.  Since that splitter always matches after reload, we already know the hard register numbers, so we can just manipulate those directly to extract the two doubleword parts of a quadword reg.

While we're at it, we might as well use a more general move instruction when the alignment is natural, to potentially take advantage of more complex addressing modes.  We're allowed to do that because the vld1Q_dup*64 intrinsics describe a behaviour and do not guarantee that a particular instruction will be used.  Therefore the vld1Q_dup*64 tests are updated to be run-time tests instead, to test the functionality.  New *_misaligned tests are added, however, to make sure that we still generate vld1.64 when the address is explicitly unaligned, since vld1.64 is the only instruction that can handle that.

Did an armeb-none-linux-gnueabihf build.  The vld1Q_dup*64* tests now pass on big and little endian.  arm-none-linux-gnueabihf bootstrap on Chromebook successful.

This is a regression since 4.7.  I've tested this on trunk.  Will test this on the 4.8 and 4.7 branches.  Ok for those branches if no regressions?

Thanks,
Kyrill

2014-02-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/55426
	* config/arm/neon.md (neon_vld1_dupv2di): Do not generate
	low and high part subregs, use hard reg numbers.
	* config/arm/arm.c (arm_mem_aligned_p): New function.
	(arm_init_neon_builtins): Allow for memory operands in load
	operations.
	* config/arm/arm-protos.h (arm_mem_aligned_p): Declare extern.
	* config/arm/constraints.md (Uo): New constraint.
2014-02-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/55426
	* gcc.target/arm/neon/vld1Q_dupp64.c: Change to run-time test.
	* gcc.target/arm/neon/vld1Q_dups64.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupu64.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupp64_misaligned.c: New test.
	* gcc.target/arm/neon/vld1Q_dups64_misaligned.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupu64_misaligned.c: Likewise.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 13874ee..56f46e3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -95,6 +95,7 @@ extern enum reg_class coproc_secondary_reload_class (enum machine_mode, rtx,
 extern bool arm_tls_referenced_p (rtx);
 extern int arm_coproc_mem_operand (rtx, bool);
+extern bool arm_mem_aligned_p (rtx, unsigned int);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fc81bf6..33c829d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12499,6 +12499,14 @@ arm_coproc_mem_operand (rtx op, bool wb)
   return FALSE;
 }
 
+/* Return true if the MEM RTX x has the given alignment.  */
+bool
+arm_mem_aligned_p (rtx x, unsigned int alignment)
+{
+  gcc_assert (MEM_P (x));
+  return MEM_ALIGN (x) == alignment;
+}
+
 /* Return TRUE if OP is a memory operand which we can load or store a vector
    to/from.  TYPE is one of the following values:
     0 - Vector load/stor (vldr)
@@ -23644,7 +23652,9 @@ arm_init_neon_builtins (void)
       /* Neon load patterns always have the memory operand in the operand 1 position.
	 */
       gcc_assert (insn_data[d->code].operand[k].predicate
-		  == neon_struct_operand);
+		  == neon_struct_operand
+		  || insn_data[d->code].operand[k].predicate
+		     == memory_operand);
 
       switch (d->mode)
	{
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 85dd116..86947dd 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -381,6 +381,14 @@
  (and (match_code "mem")
      (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2, true)")))
 
+(define_memory_constraint "Uo"
+ "@internal
+  In ARM/Thumb-2 state a valid address for Neon element and structure
+  load/store instructions or normal load on doubleword alignment."
+ (and (match_code "mem")
+      (match_test "TARGET_32BIT && (arm_mem_aligned_p (op, DOUBLEWORD_ALIGNMENT)
+				    || neon_vector_mem_operand (op, 2, true))")))
+
 (define_memory_constraint "Us"
  "@internal
   In ARM/Thumb-2 state a valid address for non-offset loads/stores of
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2f06e42..e4490ba 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4406,19 +4406,28 @@
 (define_insn_and_split "neon_vld1_dupv2di"
   [(set (match_operand:V2DI 0 "s_register_operand"
Re: Use [warning enabled by default] for default warnings
Richard Biener richard.guent...@gmail.com writes:

On Sun, Feb 9, 2014 at 9:30 PM, Robert Dewar de...@adacore.com wrote:

On 2/9/2014 3:23 PM, Richard Sandiford wrote:

can't we just reword the one warning where there is an ambiguity to avoid the confusion, rather than creating such an earthquake, which as Arno says, really has zero advantages to Ada programmers, and clear disadvantages .. to me [enabled by default] is already awfully long!

Well, since the Ada part has been rejected I think we just need to consider this from the non-Ada perspective.  And IMO there's zero chance that each new warning will be audited for whether the [enabled by default] will be unambiguous.  The fact that this particular warning caused confusion and someone actually reported it doesn't mean that there are no other warnings like that.  E.g.:

    -fprefetch-loop-arrays is not supported with -Os [enabled by default]

could also be misunderstood, especially if working on an existing codebase with an existing makefile.  And the effect for:

    pragma simd ignored because -fcilkplus is not enabled [enabled by default]

is a bit unfortunate.  Those were just two examples -- I'm sure I could pick more.

Indeed, worrisome examples; a shorter substitute would be [default warning] ???  Or print nothing at all?  After all, [...] was supposed to tell people how to disable the warning!  If there isn't a way to do that ... maybe instead print [-w]?  hmm, all existing [...] are positive so we'd have to print -no-w, which doesn't exist.  Bah.  So there isn't a way to negate -w on the command line to only get default warnings enabled again.

OK, this version drops the [enabled by default] altogether.  Tested as before.  OK to install?

Thanks,
Richard

gcc/
	* opts.c (option_name): Remove "enabled by default" rider.

gcc/testsuite/
	* gcc.dg/gomp/simd-clones-5.c: Update comment for new warning
	message.
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	2014-02-10 20:36:32.380197329 +0000
+++ gcc/opts.c	2014-02-10 20:58:45.894502379 +0000
@@ -2216,14 +2216,10 @@ option_name (diagnostic_context *context
       return xstrdup (cl_options[option_index].opt_text);
     }
   /* A warning without option classified as an error.  */
-  else if (orig_diag_kind == DK_WARNING || orig_diag_kind == DK_PEDWARN
-	   || diag_kind == DK_WARNING)
-    {
-      if (context->warning_as_error_requested)
-	return xstrdup (cl_options[OPT_Werror].opt_text);
-      else
-	return xstrdup (_("enabled by default"));
-    }
+  else if ((orig_diag_kind == DK_WARNING || orig_diag_kind == DK_PEDWARN
+	    || diag_kind == DK_WARNING)
+	   && context->warning_as_error_requested)
+    return xstrdup (cl_options[OPT_Werror].opt_text);
   else
     return NULL;
 }
Index: gcc/testsuite/gcc.dg/gomp/simd-clones-5.c
===================================================================
--- gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	2014-02-10 20:36:32.380197329 +0000
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	2014-02-10 21:00:32.549412313 +0000
@@ -3,7 +3,7 @@
 /* ?? The -w above is to inhibit the following warning for now:
    a.c:2:6: warning: AVX vector argument without AVX enabled changes
-   the ABI [enabled by default].  */
+   the ABI.  */
 #pragma omp declare simd notinbranch simdlen(4)
 void foo (int *a)
[patch] Fix wrong code with VCE to bit-field type at -O
Hi,

this is an interesting regression from GCC 4.5 present on all active branches and triggered by recent SRA improvements.  For the attached testcase, we have an unchecked conversion of a 3-byte slice of an array of 4 bytes to a record type containing a 3-byte bit-field; an unchecked conversion in this case directly translates into a VIEW_CONVERT_EXPR.  Then SRA scalarizes the bit-field and turns the original VCE into a VCE of the 3-byte slice to the bit-field type, which is a 32-bit integer with precision 24.  But the expansion of VCE isn't prepared for this and generates a full 32-bit read, which thus reads 1 additional byte and doesn't mask it afterwards, resulting in a wrong value for the scalarized bit-field.

Proposed fix attached, tested on x86-64/Linux, OK for the mainline?

2014-02-11  Eric Botcazou  ebotca...@adacore.com

	* expr.c (REDUCE_BIT_FIELD): Extend the scope of the macro.
	(expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Deal with bit-field
	destination types.

2014-02-11  Eric Botcazou  ebotca...@adacore.com

	* gnat.dg/opt31.adb: New test.

-- 
Eric Botcazou

Index: expr.c
===================================================================
--- expr.c	(revision 207685)
+++ expr.c	(working copy)
@@ -9221,7 +9221,6 @@ expand_expr_real_2 (sepops ops, rtx targ
	return temp;
       return REDUCE_BIT_FIELD (temp);
     }
-#undef REDUCE_BIT_FIELD
 
 /* Return TRUE if expression STMT is suitable for replacement.
@@ -10444,15 +10443,21 @@ expand_expr_real_1 (tree exp, rtx target
	  op0 = target;
	}
 
-      /* At this point, OP0 is in the correct mode.  If the output type is
+      /* If it's a MEM with a different mode and we need to reduce it to a
+	 bit-field type, do an extraction.  Otherwise, we can read all bits
+	 of MODE but need to deal with the alignment.  If the output type is
	 such that the operand is known to be aligned, indicate that it is.
-	 Otherwise, we need only be concerned about alignment for non-BLKmode
-	 results.  */
+	 Otherwise, we actually need only be concerned about alignment for
+	 non-BLKmode results.
	 */
       if (MEM_P (op0))
	{
	  enum insn_code icode;
 
-	  if (TYPE_ALIGN_OK (type))
+	  if (reduce_bit_field && GET_MODE (op0) != mode)
+	    return extract_bit_field (op0, TYPE_PRECISION (type), 0,
+				      TYPE_UNSIGNED (type), NULL_RTX,
+				      mode, mode);
+	  else if (TYPE_ALIGN_OK (type))
	    {
	      /* ??? Copying the MEM without substantially changing it might
		 run afoul of the code handling volatile memory references in
@@ -10483,7 +10488,7 @@ expand_expr_real_1 (tree exp, rtx target
	      /* Nor can the insn generator.  */
	      insn = GEN_FCN (icode) (reg, op0);
	      emit_insn (insn);
-	      return reg;
+	      return REDUCE_BIT_FIELD (reg);
	    }
	  else if (STRICT_ALIGNMENT)
@@ -10513,7 +10518,7 @@ expand_expr_real_1 (tree exp, rtx target
	    op0 = adjust_address (op0, mode, 0);
	}
 
-      return op0;
+      return REDUCE_BIT_FIELD (op0);
 
     case MODIFY_EXPR:
       {
@@ -10613,6 +10618,7 @@ expand_expr_real_1 (tree exp, rtx target
	  return expand_expr_real_2 (ops, target, tmode, modifier);
	}
     }
+#undef REDUCE_BIT_FIELD
 
 /* Subroutine of above: reduce EXP to the precision of TYPE (in the
    signedness of TYPE), possibly returning the result in TARGET.  */

-- { dg-do run }
-- { dg-options "-O" }

with Interfaces; use Interfaces;
with Unchecked_Conversion;

procedure Opt31 is

   type Unsigned_24 is new Unsigned_32 range 0 .. 2**24 - 1;
   subtype Time_T is Unsigned_24 range 0 .. 24 * 60 * 60 * 128 - 1;

   type Messages_T is array (Positive range <>) of Unsigned_8;
   subtype T_3Bytes is Messages_T (1 .. 3);

   type Rec1 is record
      F : Time_T;
   end record;
   for Rec1 use record
      F at 0 range 0 .. 23;
   end record;
   for Rec1'Size use 24;

   type Rec2 is record
      I1,I2,I3,I4 : Integer;
      R1 : Rec1;
   end record;

   function Conv is new Unchecked_Conversion (T_3Bytes, Rec1);

   procedure Decode (M : Messages_T) is
      My_Rec2 : Rec2;
   begin
      My_Rec2.R1 := Conv (M (1 .. 3));
      if not My_Rec2.R1.F'Valid then
         raise Program_Error;
      end if;
   end;

   Message : Messages_T (1 .. 4) := (16#18#, 16#0C#, 16#0C#, 16#18#);

begin
   Decode (Message);
end;
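As an aside, the masking the patch restores can be illustrated outside GCC. The sketch below is purely illustrative (the helper name `read_u24` is mine, not GCC's): a 24-bit-precision field stored in the first 3 bytes of a 4-byte buffer is read with a full 32-bit load, which also picks up the unrelated fourth byte; reducing the result to the field's precision masks that byte back off, which is what REDUCE_BIT_FIELD does for the scalarized bit-field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative analogue of the bug and fix: a 3-byte (24-bit-precision)
   field occupying the low-order bytes of a 32-bit word.  A naive full-word
   read also fetches the unrelated 4th byte; masking to the field's
   precision discards it.  Assumes a little-endian host for the byte
   layout, like the x86-64 machine the patch was tested on.  */
uint32_t
read_u24 (const uint8_t *buf)
{
  uint32_t word;
  memcpy (&word, buf, sizeof word);   /* full 32-bit read, like the old expansion */
  return word & ((1u << 24) - 1);     /* reduce to 24-bit precision */
}
```

Without the final mask, the stray high byte would corrupt the value, which is exactly the wrong-code symptom the Ada testcase above exposes.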
Re: Use [warning enabled by default] for default warnings
On 02/09/2014 09:00 PM, Richard Sandiford wrote:

> +	return xstrdup (_("warning enabled by default"));

I think this is still wrong because this message really means "this warning cannot be controlled with a warning flag", but it can likely be switched off by other means.  I don't think it's helpful.  In my opinion, it is better to make this message obsolete by introducing the missing warning flags.

-- 
Florian Weimer / Red Hat Product Security Team
Re: Use [warning enabled by default] for default warnings
On 2/11/2014 4:45 AM, Richard Sandiford wrote:
> OK, this version drops the [enabled by default] altogether.  Tested as
> before.  OK to install?

Still a huge earthquake in terms of affecting test suites and baselines of many users.  Is it really worth it?  In the case of GNAT we have only recently started tagging messages in this way, so changes would not be so disruptive, and we can debate following whatever gcc does, but I think it is important to understand that any change in this area is a big one in terms of impact on users.

> Thanks,
> Richard
>
> gcc/
> 	* opts.c (option_name): Remove "enabled by default" rider.
>
> gcc/testsuite/
> 	* gcc.dg/gomp/simd-clones-5.c: Update comment for new warning
> 	message.
>
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c	2014-02-10 20:36:32.380197329 +0000
> +++ gcc/opts.c	2014-02-10 20:58:45.894502379 +0000
> @@ -2216,14 +2216,10 @@ option_name (diagnostic_context *context
>        return xstrdup (cl_options[option_index].opt_text);
>      }
>    /* A warning without option classified as an error.  */
> -  else if (orig_diag_kind == DK_WARNING || orig_diag_kind == DK_PEDWARN
> -	   || diag_kind == DK_WARNING)
> -    {
> -      if (context->warning_as_error_requested)
> -	return xstrdup (cl_options[OPT_Werror].opt_text);
> -      else
> -	return xstrdup (_("enabled by default"));
> -    }
> +  else if ((orig_diag_kind == DK_WARNING || orig_diag_kind == DK_PEDWARN
> +	    || diag_kind == DK_WARNING)
> +	   && context->warning_as_error_requested)
> +    return xstrdup (cl_options[OPT_Werror].opt_text);
>    else
>      return NULL;
>  }
> Index: gcc/testsuite/gcc.dg/gomp/simd-clones-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	2014-02-10 20:36:32.380197329 +0000
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	2014-02-10 21:00:32.549412313 +0000
> @@ -3,7 +3,7 @@
>  /* ?? The -w above is to inhibit the following warning for now:
>     a.c:2:6: warning: AVX vector argument without AVX enabled changes
> -   the ABI [enabled by default].  */
> +   the ABI.  */
>  #pragma omp declare simd notinbranch simdlen(4)
>  void foo (int *a)
[PATCH][LTO] Fix PR60060
This fixes the ICE seen in PR60060, which stems from us asking to output debug info for a function-scope global twice - once via emitting the function and its BLOCK-associated vars, and once via the debug hook through lto_write_globals - emit_debug_global_declarations.  The list of global variables does not agree with those from the frontends (that function-scope global is not in GFortran's list).  Thus the following simply avoids the mess by only ever emitting debug info for non-function-scope globals here.

As wrapup_global_declarations looks like a complete no-op for LTO (DECL_DEFER_OUTPUT is never 1, it's not even streamed ...) I chose to inline emit_debug_global_declarations.  That whole area asks for a FE-middle-end interface sanitization ...

LTO bootstrapped / tested on x86_64-unknown-linux-gnu.  Does that look ok?

Thanks,
Richard.

2014-02-11  Richard Biener  rguent...@suse.de

	PR lto/60060
	* lto-lang.c (lto_write_globals): Do not call
	wrapup_global_declarations or emit_debug_global_declarations
	but emit debug info for non-function scope variables directly.

Index: gcc/lto/lto-lang.c
===================================================================
*** gcc/lto/lto-lang.c	(revision 207655)
--- gcc/lto/lto-lang.c	(working copy)
*************** lto_write_globals (void)
*** 1082,1098 ****
    if (flag_wpa)
      return;
  
!   /* Record the global variables.  */
!   vec<tree> lto_global_var_decls = vNULL;
    varpool_node *vnode;
    FOR_EACH_DEFINED_VARIABLE (vnode)
!     lto_global_var_decls.safe_push (vnode->decl);
!
!   tree *vec = lto_global_var_decls.address ();
!   int len = lto_global_var_decls.length ();
!   wrapup_global_declarations (vec, len);
!   emit_debug_global_declarations (vec, len);
!   lto_global_var_decls.release ();
  }
--- 1082,1092 ----
    if (flag_wpa)
      return;
  
!   /* Output debug info for global variables.  */
    varpool_node *vnode;
    FOR_EACH_DEFINED_VARIABLE (vnode)
!     if (!decl_function_context (vnode->decl))
!       debug_hooks->global_decl (vnode->decl);
  }

static tree
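The shape of the fix - walk all defined variables but act only on those with no enclosing function scope - can be modeled in a few lines. This is a toy analogue, not GCC code: `fn_ctx` stands in for what decl_function_context returns, and the counter stands in for the debug_hooks->global_decl call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the fix: a variable with a non-null function context is a
   function-scope static, whose debug info is already emitted with its
   function; only top-level globals should be handled here.  */
struct var
{
  const char *name;
  const void *fn_ctx;   /* NULL for a true file-scope global */
};

int
count_toplevel_globals (const struct var *vars, int n)
{
  int emitted = 0;
  for (int i = 0; i < n; i++)
    if (vars[i].fn_ctx == NULL)   /* skip function-scope statics */
      emitted++;                  /* here GCC would emit debug info */
  return emitted;
}
```

Emitting for both kinds here is precisely the double-emission the PR reports, since the function-scope variable is visited a second time.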
Re: [RFA] [middle-end/54041] Convert modes as needed from expand_expr
On Mon, Feb 10, 2014 at 5:35 PM, Jeff Law l...@redhat.com wrote:

On 02/07/14 02:17, Richard Biener wrote:

+2014-02-05  Jeff Law  l...@redhat.com
+
+	PR middle-end/54041
+	* expr.c (expand_expr_addr_1): Handle expand_expr returning an
+	object with an undesirable mode.
+
 2014-02-05  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Change

diff --git a/gcc/expr.c b/gcc/expr.c
index 878a51b..9609c45 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7708,6 +7708,11 @@ expand_expr_addr_expr_1 (tree exp, rtx target, enum machine_mode tmode,
			 modifier == EXPAND_INITIALIZER
			  ? EXPAND_INITIALIZER : EXPAND_NORMAL);
+      /* expand_expr is allowed to return an object in a mode other
+	 than TMODE.  If it did, we need to convert.  */
+      if (tmode != GET_MODE (tmp))
+	tmp = convert_modes (tmode, GET_MODE (tmp),
+			     tmp, TYPE_UNSIGNED (TREE_TYPE (offset)));

What about CONSTANT_P tmp?  Don't you need to use TYPE_MODE (TREE_TYPE (offset)) in that case?

As I mentioned last week, we want to pass VOIDmode objects (constants) down to convert_memory_address_addr_space unchanged.  c_m_a_a_s will handle those correctly.  This patch fixes that oversight and the function name in the ChangeLog entry.  I've verified this version still fixes the original bug report and included it in an x86_64-unknown-linux-gnu bootstrap test for sanity's sake.  OK for the trunk?

Ok.

Thanks,
Richard.

Thanks,
Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2dbab72..eca3e2f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2014-02-05  Jeff Law  l...@redhat.com
+
+	PR middle-end/54041
+	* expr.c (expand_expr_addr_expr_1): Handle expand_expr returning an
+	object with an undesirable mode.
+
 2014-02-05  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Change
diff --git a/gcc/expr.c b/gcc/expr.c
index 878a51b..42a451d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7708,6 +7708,11 @@ expand_expr_addr_expr_1 (tree exp, rtx target, enum machine_mode tmode,
			 modifier == EXPAND_INITIALIZER
			  ? EXPAND_INITIALIZER : EXPAND_NORMAL);
 
+      /* expand_expr is allowed to return an object in a mode other
+	 than TMODE.  If it did, we need to convert.  */
+      if (GET_MODE (tmp) != VOIDmode && tmode != GET_MODE (tmp))
+	tmp = convert_modes (tmode, GET_MODE (tmp),
+			     tmp, TYPE_UNSIGNED (TREE_TYPE (offset)));
 
       result = convert_memory_address_addr_space (tmode, result, as);
       tmp = convert_memory_address_addr_space (tmode, tmp, as);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c81a00d..283912d 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-05  Jeff Law  l...@redhat.com
+
+	PR middle-end/54041
+	* gcc.target/m68k/pr54041.c: New test.
+
 2014-02-05  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 	* gcc.dg/vmx/sum2s.c: New.
diff --git a/gcc/testsuite/gcc.target/m68k/pr54041.c b/gcc/testsuite/gcc.target/m68k/pr54041.c
new file mode 100644
index 0000000..645cb6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr54041.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mshort" } */
+
+extern int r[];
+
+int *fn(int i)
+{
+  return &r[i];
+}
+
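The signedness argument the patch passes to convert_modes matters when widening. A small standalone sketch of the idea (the function name and parameters are mine, merely modeling what widening a 16-bit HImode offset to 32-bit SImode involves): whether the narrower value is zero- or sign-extended must follow the signedness of its source type, which is what TYPE_UNSIGNED (TREE_TYPE (offset)) supplies in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative analogue of convert_modes for integer widening: extend a
   16-bit value to 32 bits, honoring the signedness of the source type.
   Not GCC code - just the extension rule the patch relies on.  */
int32_t
widen_hi_to_si (uint16_t raw, int is_unsigned)
{
  if (is_unsigned)
    return (int32_t) raw;              /* zero-extend */
  return (int32_t) (int16_t) raw;      /* sign-extend */
}
```

Getting this wrong for a negative signed offset would produce a wildly different address, which is why the conversion cannot be a bare bit copy.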
Re: RFA: XFAIL gcc.dg/vect/pr56787.c if vect_no_align
On Mon, Feb 10, 2014 at 9:38 PM, Richard Sandiford rdsandif...@googlemail.com wrote:

Not 100% sure whether this is the preferred fix, but gcc.dg/vect/pr56787.c has lots of float * parameters that point who-knows-where and so is only vectorised if the target supports misaligned vector accesses:

void
foo (unsigned long n, const float *__restrict u0,
     const float *__restrict u1, const float *__restrict u2,
     const float *__restrict u3, const float *__restrict u4,
     const float *__restrict s0, const float *__restrict s1,
     const float *__restrict s2, float *__restrict t3,
     float *__restrict t4)
{
  unsigned long i;
  for (i = 0; i < n; i++)
    {
      float u[5], f[3][5];

      u[0] = u0[i];
      u[1] = u1[i];
      u[2] = u2[i];
      u[3] = u3[i];
      u[4] = u4[i];

      bar (u, f);

      t3[i] = s0[i] * f[0][3] + s1[i] * f[1][3] + s2[i] * f[2][3];
    }
}

MIPS paired-single doesn't have any form of vector misalignment.  I suppose it would be technically possible to use misaligned integer accesses and an FPR-GPR move, but that can be expensive.  I also don't have any hardware to benchmark it on.  So this patch adds an xfail for vect_no_align.  Tested on mipsisa64-sde-elf.  OK to install?

Ok.

Thanks,
Richard.

gcc/testsuite/
	* gcc.dg/vect/pr56787.c: Mark as xfail for vect_no_align.

Index: gcc/testsuite/gcc.dg/vect/pr56787.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr56787.c	2014-02-10 20:26:03.870867802 +0000
+++ gcc/testsuite/gcc.dg/vect/pr56787.c	2014-02-10 20:36:42.072279177 +0000
@@ -31,5 +31,5 @@ foo (unsigned long n, const float *__res
     }
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
[PING] [test case, patch] ICE in Cilk Plus structured block checker
Hi!

Ping again for test case and rather trivial patch to cure a GCC trunk ICE in the Cilk Plus structured block checker if both -fcilkplus and -fopenmp are specified.  Also, I'd still like to know whether the »Generalize diagnose_omp_blocks' structured block logic« patch is fine, too?  (If that one is acknowledged, I'll then commit the »OpenACC parallel structured blocks« patch to gomp-4_0-branch.)

On Fri, 24 Jan 2014 16:57:28 +0100, I wrote:
> Ping.  With a different subject line, this time.  Let's first concentrate
> on the ICE in the Cilk Plus structured block checker, then generalize
> diagnose_omp_blocks' structured block logic, before finally getting to the
> OpenACC extension.  Here starts the ICE patch:

On Fri, 10 Jan 2014 12:48:21 +0100, I wrote:
> Ping.

On Thu, 19 Dec 2013 17:49:09 +0100, I wrote:
> Ping.

On Tue, 10 Dec 2013 13:53:39 +0100, I wrote:
> At the end of this email you can find the patch that I'd like to apply to
> gomp-4_0-branch for OpenACC structured blocks, after the following two
> have been approved:

On Fri, 06 Dec 2013 19:33:35 +0100, I wrote:
> On Fri, 15 Nov 2013 14:44:45 -0700, Aldy Hernandez al...@redhat.com wrote:

--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10177,12 +10210,33 @@ diagnose_sb_0 (gimple_stmt_iterator *gsi_p,
	    error ("invalid entry to OpenMP structured block");
 #endif
 
+  bool cilkplus_block = false;
+  if (flag_enable_cilkplus)
+    {
+      if ((branch_ctx
+	   && gimple_code (branch_ctx) == GIMPLE_OMP_FOR
+	   && gimple_omp_for_kind (branch_ctx) == GF_OMP_FOR_KIND_CILKSIMD)
+	  || (gimple_code (label_ctx) == GIMPLE_OMP_FOR
+	      && gimple_omp_for_kind (label_ctx) == GF_OMP_FOR_KIND_CILKSIMD))
+	cilkplus_block = true;
+    }

There is one issue here: consider the following code:

    void baz()
    {
      bad1:
      #pragma omp parallel
        goto bad1;
    }

Then, if both -fcilkplus and -fopenmp are specified, that will run into a SIGSEGV/ICE because of label_ctx == NULL.  The fix is simple enough; OK for trunk and gomp-4_0-branch (after full testing)?  Testing looks good.
The testcase is basically a concatenation of gcc.dg/cilk-plus/jump.c and gcc.dg/gomp/block-1.c -- should this be done differently/better?

commit eee16f8aad4527b705d327476b00bf9f5ba6dcce
Author: Thomas Schwinge tho...@codesourcery.com
Date:   Fri Dec 6 18:55:41 2013 +0100

    Fix possible ICE (null pointer dereference) introduced in r204863.
    
    	gcc/
    	* omp-low.c (diagnose_sb_0): Make sure label_ctx is valid to
    	dereference.
    	gcc/testsuite/
    	* gcc.dg/cilk-plus/jump-openmp.c: New file.

diff --git gcc/omp-low.c gcc/omp-low.c
index e0f7d1d..91221c0 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -10865,7 +10865,8 @@ diagnose_sb_0 (gimple_stmt_iterator *gsi_p,
       if ((branch_ctx
	   && gimple_code (branch_ctx) == GIMPLE_OMP_FOR
	   && gimple_omp_for_kind (branch_ctx) == GF_OMP_FOR_KIND_CILKSIMD)
-	  || (gimple_code (label_ctx) == GIMPLE_OMP_FOR
+	  || (label_ctx
+	      && gimple_code (label_ctx) == GIMPLE_OMP_FOR
	      && gimple_omp_for_kind (label_ctx) == GF_OMP_FOR_KIND_CILKSIMD))
	cilkplus_block = true;
     }
diff --git gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c
new file mode 100644
index 0000000..95e6b2d
--- /dev/null
+++ gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -fopenmp" } */
+/* { dg-require-effective-target fopenmp } */
+
+int *a, *b, c;
+
+void foo()
+{
+#pragma simd
+  for (int i=0; i < 1000; ++i)
+    {
+      a[i] = b[i];
+      if (c == 5)
+	return; /* { dg-error "invalid branch to/from a Cilk Plus structured block" } */
+    }
+}
+
+void bar()
+{
+#pragma simd
+  for (int i=0; i < 1000; ++i)
+    {
+    lab:
+      a[i] = b[i];
+    }
+  if (c == 6)
+    goto lab; /* { dg-error "invalid entry to Cilk Plus structured block" } */
+}
+
+void baz()
+{
+ bad1:
+  #pragma omp parallel
+    goto bad1; /* { dg-error "invalid branch to/from an OpenMP structured block" } */
+
+  goto bad2; /* { dg-error "invalid entry to OpenMP structured block" } */
+  #pragma omp parallel
+    {
+      bad2: ;
+    }
+
+  #pragma omp parallel
+    {
+      int i;
+      goto ok1;
+      for (i = 0; i < 10; ++i)
+	{ ok1:
Re: Use [warning enabled by default] for default warnings
Robert Dewar de...@adacore.com writes: On 2/11/2014 4:45 AM, Richard Sandiford wrote: OK, this version drops the [enabled by default] altogether. Tested as before. OK to install? Still a huge earthquake in terms of affecting test suites and baselines of many users. is it really worth it? In the case of GNAT we have only recently started tagging messages in this way, so changes would not be so disruptive, and we can debate following whatever gcc does, but I think it is important to understand that any change in this area is a big one in terms of impact on users. The patch deliberately didn't affect Ada's diagnostic routines given your comments from the first round. Calling this a huge earthquake for other languages seems like a gross overstatement. I don't think gcc, g++, gfortran, etc, have ever made a commitment to producing textually identical warnings and errors for given inputs across different releases. It seems ridiculous to require that, especially if it stands in the way of improving the diagnostics or introducing finer-grained -W control. E.g. Florian's complaint was that we shouldn't have warnings that are not under the control of any -W options. But by your logic we couldn't change that either, because all those [enabled by default]s would become [-Wnew-option]s. Thanks, Richard
Only assume 4-byte stack alignment on Solaris 9/x86 (PR libgomp/60107)
As described in the PR, a few libgomp testcases FAIL on Solaris 9/x86 because the SSE insns assume 16-byte stack alignment, while Solaris 9, unlike Solaris 10, only guarantees the 4-byte alignment prescribed by the i386 psABI.  This can be fixed, at a slight performance penalty and code size increase, by defaulting Solaris 9/x86 to STACK_REALIGN_DEFAULT=1.  This patch does just that.

Bootstrapped without regressions on i386-pc-solaris2.9, ok'ed for 4.9.0 by Jakub in the PR, installed on mainline.

	Rainer

2014-02-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

	PR libgomp/60107
	* config/i386/sol2-9.h: New file.
	* config.gcc (i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*,
	*-*-solaris2.9*): Use it.

# HG changeset patch
# Parent 3ca092d5b3d8fd8554991136a1a48c4eaabeb941
Only assume 4-byte stack alignment on Solaris 9/x86 (PR libgomp/60107)

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1537,6 +1537,9 @@ i[34567]86-*-solaris2* | x86_64-*-solari
 	esac
 	with_tune_32=${with_tune_32:-generic}
 	case ${target} in
+	*-*-solaris2.9*)
+		tm_file="${tm_file} i386/sol2-9.h"
+		;;
 	*-*-solaris2.1[0-9]*)
 		tm_file="${tm_file} i386/x86-64.h i386/sol2-bi.h sol2-bi.h"
 		tm_defines="${tm_defines} TARGET_BI_ARCH=1"
diff --git a/gcc/config/i386/sol2-9.h b/gcc/config/i386/sol2-9.h
new file mode 100644
--- /dev/null
+++ b/gcc/config/i386/sol2-9.h
@@ -0,0 +1,23 @@
+/* Target definitions for GCC for Intel 80386 running Solaris 9
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Solaris 9 only guarantees 4-byte stack alignment as required by the i386
+   psABI, so realign it as necessary for SSE instructions.  */
+#undef STACK_REALIGN_DEFAULT
+#define STACK_REALIGN_DEFAULT 1

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
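What the realigned prologue has to do is simple pointer arithmetic. The sketch below is illustrative only (not the actual prologue code): an incoming stack pointer that is merely 4-byte aligned is rounded down to the next 16-byte boundary before any SSE spill slots are allocated; rounding down is correct because the stack grows toward lower addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Miniature model of stack realignment: clear the low 4 bits of the
   stack pointer so subsequent 16-byte (SSE) accesses are aligned.
   Rounding down is safe on a downward-growing stack.  */
uintptr_t
realign_sp (uintptr_t sp)
{
  return sp & ~(uintptr_t) 15;
}
```

Skipping this step on an ABI that only promises 4-byte alignment is exactly what made the SSE spills in those libgomp tests fault.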
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:

> Hi,
> The patch gives an expected 3 times gain for the test case in the PR52252
> (and even 6 times for AVX2).  It passes make check and bootstrap on x86.
> spec2000/spec2006 got no regressions/gains on x86.  Is this patch ok?

I've worked on generalizing the permutation support in the light of the availability of the generic shuffle support in the IL but hit some road-blocks in the way code-generation works for group loads with permutations (I don't remember if I posted all patches).  This patch seems to be to a slightly different place, but it again special-cases a specific permutation.  Why's that?  Why can't we support groups of size 7, for example?  So - can this be generalized to support arbitrary non-power-of-two load/store groups?

Other than that, the patch has to wait for stage1 to open again, of course.  And it misses a testcase.

Btw, do you have a copyright assignment on file with the FSF covering work on GCC?

Thanks,
Richard.

ChangeLog:

2014-02-11  Evgeny Stupachenko  evstu...@gmail.com

	* target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
	* tree-vect-data-refs.c (vect_grouped_store_supported): New check
	for stores group of length 3.
	(vect_permute_store_chain): New permutations for stores group of
	length 3.
	(vect_grouped_load_supported): New check for loads group of length 3.
	(vect_permute_load_chain): New permutations for loads group of
	length 3.
	* tree-vect-stmts.c (vect_model_store_cost): New cost vec_perm_shuffle
	for the new permutations.
	(vect_model_load_cost): Ditto.
	* config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
	vec_perm_shuffle cost as equivalent of vec_perm cost.
	* config/arm/arm.c: Ditto.
	* config/rs6000/rs6000.c: Ditto.
	* config/spu/spu.c: Ditto.
	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
	byte shuffle on some x86 architectures.
	* config/i386/i386.h (processor_costs): Defining pshuffb cost.
	* config/i386/i386.c (processor_costs): Adding pshuffb cost.
(ix86_builtin_vectorization_cost): Adding cost for the new permutations. Fixing cost for other permutations. (expand_vec_perm_even_odd_1): Avoid byte shuffles when they are slow (TARGET_SLOW_PHUFFB). (ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY. Adding new shuffle cost only when byte shuffle is expected. Fixing cost model for Silvermont. Thanks, Evgeny -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
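The RGB workload the thread refers to can be sketched as follows. This is an illustrative kernel only, not the actual PR52252 test case: each iteration touches three interleaved channels, producing the load group of size 3 that the patch teaches the vectorizer to handle via shuffles.

```c
#include <assert.h>

/* Illustrative sketch (not the PR52252 test): three interleaved loads
   per iteration (R, G, B) form a load group of size 3.  */
void
rgb_to_gray (unsigned char *gray, const unsigned char *rgb, int n)
{
  for (int i = 0; i < n; i++)
    gray[i] = (rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2]) / 3;
}
```

Without support for non-power-of-two groups, such a loop stays scalar; with it, the three strided streams can be deinterleaved with permute instructions.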
Re: [PING] [test case, patch] ICE in Cilk Plus structured block checker
Hi!

On Tue, 11 Feb 2014 13:43:46 +0100, I wrote: Ping again for test case and rather trivial patch to cure a GCC trunk ICE in the Cilk Plus structured block checker if both -fcilkplus and -fopenmp are specified.

Jakub asked me to »please repost just the (hopefully small) trunk patch alone«, so here we go:

Consider the following code:

    void baz()
    {
     bad1:
    #pragma omp parallel
      goto bad1;
    }

Then, if both -fcilkplus and -fopenmp are specified, that will run into a SIGSEGV/ICE because of label_ctx == NULL in omp-low.c:diagnose_sb_0. The testcase is basically a concatenation of gcc.dg/cilk-plus/jump.c and gcc.dg/gomp/block-1.c -- should this be done differently/better?

Fix potential ICE (null pointer dereference) in omp-low.c:diagnose_sb_0.

gcc/
	* omp-low.c (diagnose_sb_0): Make sure label_ctx is valid to
	dereference.
gcc/testsuite/
	* gcc.dg/cilk-plus/jump-openmp.c: New file.

diff --git gcc/omp-low.c gcc/omp-low.c
index e0f7d1d..91221c0 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -10865,7 +10865,8 @@ diagnose_sb_0 (gimple_stmt_iterator *gsi_p,
       if ((branch_ctx && gimple_code (branch_ctx) == GIMPLE_OMP_FOR
	   && gimple_omp_for_kind (branch_ctx) == GF_OMP_FOR_KIND_CILKSIMD)
-	  || (gimple_code (label_ctx) == GIMPLE_OMP_FOR
+	  || (label_ctx
+	      && gimple_code (label_ctx) == GIMPLE_OMP_FOR
	      && gimple_omp_for_kind (label_ctx) == GF_OMP_FOR_KIND_CILKSIMD))
	cilkplus_block = true;
     }
diff --git gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c
new file mode 100644
index 000..95e6b2d
--- /dev/null
+++ gcc/testsuite/gcc.dg/cilk-plus/jump-openmp.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus -fopenmp" } */
+/* { dg-require-effective-target fopenmp } */
+
+int *a, *b, c;
+
+void foo()
+{
+#pragma simd
+  for (int i=0; i < 1000; ++i)
+    {
+      a[i] = b[i];
+      if (c == 5)
+	return; /* { dg-error "invalid branch to/from a Cilk Plus structured block" } */
+    }
+}
+
+void bar()
+{
+#pragma simd
+  for (int i=0; i < 1000; ++i)
+    {
+    lab:
+      a[i] = b[i];
+    }
+  if (c == 6)
+    goto lab; /* { dg-error "invalid entry to Cilk Plus structured block" } */
+}
+
+void baz()
+{
+ bad1:
+ #pragma omp parallel
+  goto bad1; /* { dg-error "invalid branch to/from an OpenMP structured block" } */
+
+  goto bad2; /* { dg-error "invalid entry to OpenMP structured block" } */
+ #pragma omp parallel
+  {
+   bad2: ;
+  }
+
+ #pragma omp parallel
+  {
+    int i;
+    goto ok1;
+    for (i = 0; i < 10; ++i)
+      { ok1: break; }
+  }
+}

Regards,
Thomas
Re: [patch] Fix wrong code with VCE to bit-field type at -O
On Tue, Feb 11, 2014 at 11:40 AM, Eric Botcazou ebotca...@adacore.com wrote:

Hi,

this is an interesting regression from GCC 4.5 present on all active branches and triggered by recent SRA improvements. For the attached testcase, we have an unchecked conversion of a 3-byte slice of an array of 4 bytes to a record type containing a 3-byte bit-field; an unchecked conversion in this case directly translates into a VIEW_CONVERT_EXPR. Then SRA scalarizes the bit-field and turns the original VCE into a VCE of the 3-byte slice to the bit-field type, which is a 32-bit integer with precision 24. But the expansion of VCE isn't prepared for this and generates a full 32-bit read, which thus reads 1 additional byte and doesn't mask it afterwards, thus resulting in a wrong value for the scalarized bit-field.

Proposed fix attached, tested on x86-64/Linux, OK for the mainline?

Hmm. The intent was of course to only allow truly no-op converts via VIEW_CONVERT_EXPR - that is, the size of the operand type and the result type should be the same. So, isn't SRA doing it wrong when creating the VIEW_CONVERT_EXPR of a 3-byte type to a 4-byte type? The verification we do in tree-cfg.c:verify_types_in_gimple_reference hints that we _do_ have even grosser mismatches - so reality may trump desired design here.

Richard.

2014-02-11  Eric Botcazou  ebotca...@adacore.com

	* expr.c (REDUCE_BIT_FIELD): Extend the scope of the macro.
	(expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Deal with bit-field
	destination types.

2014-02-11  Eric Botcazou  ebotca...@adacore.com

	* gnat.dg/opt31.adb: New test.

--
Eric Botcazou
Re: Use [warning enabled by default] for default warnings
On Tue, Feb 11, 2014 at 1:48 PM, Richard Sandiford rdsandif...@googlemail.com wrote:

Robert Dewar de...@adacore.com writes: On 2/11/2014 4:45 AM, Richard Sandiford wrote: OK, this version drops the [enabled by default] altogether. Tested as before. OK to install?

Still a huge earthquake in terms of affecting test suites and baselines of many users. Is it really worth it? In the case of GNAT we have only recently started tagging messages in this way, so changes would not be so disruptive, and we can debate following whatever gcc does, but I think it is important to understand that any change in this area is a big one in terms of impact on users.

The patch deliberately didn't affect Ada's diagnostic routines given your comments from the first round. Calling this a huge earthquake for other languages seems like a gross overstatement. I don't think gcc, g++, gfortran, etc, have ever made a commitment to producing textually identical warnings and errors for given inputs across different releases. It seems ridiculous to require that, especially if it stands in the way of improving the diagnostics or introducing finer-grained -W control. E.g. Florian's complaint was that we shouldn't have warnings that are not under the control of any -W options. But by your logic we couldn't change that either, because all those [enabled by default]s would become [-Wnew-option]s.

Yeah, I think Robert's argument is a red herring - there are loads of diagnostic changes every release so you cannot expect those to be stable. I'm fine with dropping the [enabled by default] as in the patch, but let's hear another ok before making the change.

Thanks,
Richard.

Thanks,
Richard
Re: [patch] Fix wrong code with VCE to bit-field type at -O
On Tue, Feb 11, 2014 at 02:17:04PM +0100, Richard Biener wrote:

this is an interesting regression from GCC 4.5 present on all active branches and triggered by recent SRA improvements. For the attached testcase, we have an unchecked conversion of a 3-byte slice of an array of 4 bytes to a record type containing a 3-byte bit-field; an unchecked conversion in this case directly translates into a VIEW_CONVERT_EXPR. Then SRA scalarizes the bit-field and turns the original VCE into a VCE of the 3-byte slice to the bit-field type, which is a 32-bit integer with precision 24. But the expansion of VCE isn't prepared for this and generates a full 32-bit read, which thus reads 1 additional byte and doesn't mask it afterwards, thus resulting in a wrong value for the scalarized bit-field.

Proposed fix attached, tested on x86-64/Linux, OK for the mainline?

Hmm. The intent was of course to only allow truly no-op converts via VIEW_CONVERT_EXPR - that is, the size of the operand type and the result type should be the same. So, isn't SRA doing it wrong when creating the VIEW_CONVERT_EXPR of a 3-byte type to a 4-byte type? The verification we do in tree-cfg.c:verify_types_in_gimple_reference hints that we _do_ have even grosser mismatches - so reality may trump desired design here.

I thought we only allow VCE if the bitsize of both types is the same. If you have different bitsizes, then supposedly VCE to same bitsize integer followed by zero/sign extension or truncation followed by another VCE should be used.

Jakub
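The wrong code is easy to model outside the compiler: a 32-bit read of a 3-byte object picks up one stray byte, and only a 24-bit mask makes the result correct. A hedged C sketch of that masking, assuming little-endian byte order; `read_u24` is a made-up helper for illustration, not GCC code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the fixed expansion: the over-wide 32-bit
   load (what the buggy VCE expansion performed) followed by the
   24-bit mask that discards the unrelated fourth byte.
   Assumes little-endian byte order.  */
static uint32_t
read_u24 (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, 4);        /* full 32-bit read, pulls in a stray byte */
  return v & 0x00ffffffu;   /* mask down to the 24 bits of precision */
}
```

Without the final mask, the value of the scalarized bit-field would contain whatever happened to follow the 3-byte slice in memory, which is exactly the reported misbehavior.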
Re: Use [warning enabled by default] for default warnings
On 2/11/2014 7:48 AM, Richard Sandiford wrote:

The patch deliberately didn't affect Ada's diagnostic routines given your comments from the first round. Calling this a huge earthquake for other languages seems like a gross overstatement.

Actually it's much less of an impact for Ada for two reasons. First we only just started tagging warnings. In fact we have only just released an official version with the facility for tagging warnings. Second, this tagging of warnings is not the default (that would have been a big earthquake) but you have to turn it on explicitly. But I do indeed think it will have a significant impact for users of other languages, where this has been done for a while, and if I am not mistaken, done by default?

I don't think gcc, g++, gfortran, etc, have ever made a commitment to producing textually identical warnings and errors for given inputs across different releases. It seems ridiculous to require that, especially if it stands in the way of improving the diagnostics or introducing finer-grained -W control. E.g. Florian's complaint was that we shouldn't have warnings that are not under the control of any -W options. But by your logic we couldn't change that either, because all those [enabled by default]s would become [-Wnew-option]s.

I am not saying you can't change it, just that it is indeed a big earthquake. No of course there is no commitment not to make changes. But you have to be aware that when you make changes like this, the impact is very significant in real production environments, and gcc is as you know extensively used in such environments. What I am saying here is that this is worth some discussion on what the best approach is. Ideally indeed it would be better if all warnings were controlled by some specific warning category. I am not sure a warning switch that default-covered all otherwise uncovered cases (as suggested by one person at least) would be a worthwhile approach.
Re: [patch] Fix wrong code with VCE to bit-field type at -O
On Tue, Feb 11, 2014 at 2:22 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Feb 11, 2014 at 02:17:04PM +0100, Richard Biener wrote: this is an interesting regression from GCC 4.5 present on all active branches and triggered by recent SRA improvements. For the attached testcase, we have an unchecked conversion of a 3-byte slice of an array of 4 bytes to a record type containing a 3-byte bit-field; an unchecked conversion in this case directly translates into a VIEW_CONVERT_EXPR. Then SRA scalarizes the bit- field and turns the original VCE into a VCE of the 3-byte slice to the bit- field type, which is a 32-bit integer with precision 24. But the expansion of VCE isn't prepared for this and generates a full 32-bit read, which thus reads 1 additional byte and doesn't mask it afterwards, thus resulting in a wrong value for the scalarized bit-field. Proposed fix attached, tested on x86-64/Linux, OK for the mainline? Hmm. The intent was of course to only allow truly no-op converts via VIEW_CONVERT_EXPR - that is, the size of the operand type and the result type should be the same. So, isn't SRA doing it wrong when creating the VIEW_CONVERT_EXPR of a 3-byte type to a 4-byte type? The verification we do in tree-cfg.c:verify_types_in_gimple_reference hints at that we _do_ have even grosser mismatches - so reality may trump desired design here. I thought we only allow VCE if the bitsize of both types is the same. If you have different bitsizes, then supposedly VCE to same bitsize integer followed by zero/sign extension or truncation followed by another VCE should be used. Yeah, but the verification code is conveniently crippled: if (TREE_CODE (expr) == VIEW_CONVERT_EXPR) { /* For VIEW_CONVERT_EXPRs which are allowed here too, we only check that their operand is not an SSA name or an invariant when requiring an lvalue (this usually means there is a SRA or IPA-SRA bug). Otherwise there is nothing to verify, gross mismatches at most invoke undefined behavior. 
   */

	if (require_lvalue
	    && (TREE_CODE (op) == SSA_NAME || is_gimple_min_invariant (op)))
	  {
	    error ("conversion of an SSA_NAME on the left hand side");
	    debug_generic_stmt (expr);
	    return true;
	  }
	else if (TREE_CODE (op) == SSA_NAME
		 && TYPE_SIZE (TREE_TYPE (expr)) != TYPE_SIZE (TREE_TYPE (op)))
	  {
	    error ("conversion of register to a different size");
	    debug_generic_stmt (expr);
	    return true;
	  }

thus that only applies to register VIEW_CONVERT_EXPRs. But maybe we two are missing sth here - it's an Ada testcase after all ;)

Richard.

Jakub
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
Missed patch attached in plain-text. I have copyright assignment on file with the FSF covering work on GCC.

Load/store groups of length 3 is the most frequent non-power-of-2 case. It is used in RGB image processing (like the test case in PR52252). For sure we can extend the patch to length 5 and more. However, this potentially affects performance on some other architectures and requires larger testing. So length 3 is just the first step. The algorithm in the patch could be modified for a general case in several steps.

I understand that the patch should wait for stage 1, however since it's ready we can discuss it right now and make some changes (like the general size of the group).

Thanks,
Evgeny

On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener rguent...@suse.de wrote:

On Tue, 11 Feb 2014, Evgeny Stupachenko wrote: Hi, The patch gives an expected 3 times gain for the test case in the PR52252 (and even 6 times for AVX2). It passes make check and bootstrap on x86. spec2000/spec2006 got no regressions/gains on x86. Is this patch ok?

I've worked on generalizing the permutation support in the light of the availability of the generic shuffle support in the IL but hit some road-blocks in the way code-generation works for group loads with permutations (I don't remember if I posted all patches). This patch seems to be to a slightly different place but it again special-cases a specific permutation. Why's that? Why can't we support groups of size 7 for example? So - can this be generalized to support arbitrary non-power-of-two load/store groups?

Other than that the patch has to wait for stage1 to open again, of course. And it misses a testcase. Btw, do you have a copyright assignment on file with the FSF covering work on GCC?

Thanks,
Richard.

ChangeLog:

2014-02-11  Evgeny Stupachenko  evstu...@gmail.com

	* target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
	* tree-vect-data-refs.c (vect_grouped_store_supported): New check
	for stores group of length 3.
	(vect_permute_store_chain): New permutations for stores group of
	length 3.
	(vect_grouped_load_supported): New check for loads group of length 3.
	(vect_permute_load_chain): New permutations for loads group of
	length 3.
	* tree-vect-stmts.c (vect_model_store_cost): New cost vec_perm_shuffle
	for the new permutations.
	(vect_model_load_cost): Ditto.
	* config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
	vec_perm_shuffle cost as equivalent of vec_perm cost.
	* config/arm/arm.c: Ditto.
	* config/rs6000/rs6000.c: Ditto.
	* config/spu/spu.c: Ditto.
	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
	byte shuffle on some x86 architectures.
	* config/i386/i386.h (processor_costs): Defining pshuffb cost.
	* config/i386/i386.c (processor_costs): Adding pshuffb cost.
	(ix86_builtin_vectorization_cost): Adding cost for the new
	permutations.  Fixing cost for other permutations.
	(expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
	slow (TARGET_SLOW_PHUFFB).
	(ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
	Adding new shuffle cost only when byte shuffle is expected.
	Fixing cost model for Silvermont.

Thanks,
Evgeny

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
[patch c++]: Fix pr/58835 [4.7/4.8/4.9 Regression] ICE with __PRETTY_FUNCTION__ in broken function
Hi,

the following patch adds missing handling of error_mark_node result of fname_decl within finish_fname.

ChangeLog

2014-02-11  Kai Tietz  kti...@redhat.com

	PR c++/58835
	* semantics.c (finish_fname): Handle error_mark_node.

Regression tested for x86_64-unknown-linux-gnu, i686-w64-mingw32. Ok for apply?

Regards,
Kai

Index: semantics.c
===================================================================
--- semantics.c (Revision 207686)
+++ semantics.c (Arbeitskopie)
@@ -2630,7 +2630,8 @@ finish_fname (tree id)
   tree decl;

   decl = fname_decl (input_location, C_RID_CODE (id), id);
-  if (processing_template_decl && current_function_decl)
+  if (processing_template_decl && current_function_decl
+      && decl != error_mark_node)
     decl = DECL_NAME (decl);
   return decl;
 }
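For background on what finish_fname computes in the well-formed case: the function-name builtins simply evaluate to the enclosing function's name, and the patch only concerns the error path where fname_decl yields error_mark_node instead. A minimal illustration using standard C99 `__func__` (not GCC-internal code):

```c
#include <assert.h>
#include <string.h>

/* In valid code the function-name builtin __func__ (handled by the
   same frontend machinery as GCC's __FUNCTION__/__PRETTY_FUNCTION__)
   yields the enclosing function's name.  PR c++/58835 is about the
   case where the surrounding function is broken and fname_decl
   returns error_mark_node instead of a usable decl.  */
static const char *
who (void)
{
  return __func__;
}
```

In the broken-function case there is no name to return, which is why finish_fname must check for error_mark_node before dereferencing the decl.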
Re: [PING] [test case, patch] ICE in Cilk Plus structured block checker
On Tue, Feb 11, 2014 at 02:15:00PM +0100, Thomas Schwinge wrote: Jakub asked me to »please repost just the (hopefully small) trunk patch alone«, so here we go: Consider the following code: void baz() { bad1: #pragma omp parallel goto bad1; } Then, if both -fcilkplus and -fopenmp are specified, that will run into a SIGSEGV/ICE because of label_ctx == NULL in omp-low.c:diagnose_sb_0. The testcase is basically a concatenation of gcc.dg/cilk-plus/jump.c and gcc.dg/gomp/block-1.c -- should this be done differently/better? Fix potential ICE (null pointer dereference) in omp-low.c:diagnose_sb_0. gcc/ * omp-low.c (diagnose_sb_0): Make sure label_ctx is valid to dereference. gcc/testsuite/ * gcc.dg/cilk-plus/jump-openmp.c: New file. Ok, thanks. Jakub
Re: Use [warning enabled by default] for default warnings
Robert Dewar de...@adacore.com writes: I don't think gcc, g++, gfortran, etc, have ever made a commitment to producing textually identical warnings and errors for given inputs across different releases. It seems ridiculous to require that, especially if it stands in the way of improving the diagnostics or introducing finer-grained -W control. E.g. Florian's complaint was that we shouldn't have warnings that are not under the control of any -W options. But by your logic we couldn't change that either, because all those [enabled by default]s would become [-Wnew-option]s. I am not saying you can't change it, just that it is indeed a big earthquake. No of course there is no commitment not to make changes. But you have to be aware that when you make changes like this, the impact is very significant in real production environments, and gcc is as you know extensively used in such environments. What I am saying here is that this is worth some discussion on what the best approach is. But what's the basis for that discussion going to be? I first made this suggestion on gcc@, which is the best list we have for getting user feedback, and no user made this objection. And when I worked in an environment where I had direct contact with GCC-using customers, none of them gave any indication that they were expecting textual stability between releases. If you know of people who are using non-Ada languages this way then please describe their set-up. If you don't, how are we going to know how such hypothetical users are going to react? E.g. how many of those users will have heard of sed? I thought the trend these days was to move towards -Werror, so that for many people the expected output is to get no warnings at all. And bear in mind that the kind of warnings that are not under -W control tend to be those that are so likely to be a mistake that no-one has ever had an incentive to make them optional. 
I find it hard to believe that significant numbers of users are not fixing the sources of those warnings and are instead requiring every release of GCC to produce warnings with a particular wording. Thanks, Richard
Re: Allow passing arrays in registers on AArch64
On 6 February 2014 22:51, Michael Hudson-Doyle michael.hud...@canonical.com wrote:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16c51a8..958c667 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1187,14 +1187,10 @@ aarch64_pass_by_reference (cumulative_args_t pcum ATTRIBUTE_UNUSED,
   size = (mode == BLKmode && type)
	  ? int_size_in_bytes (type)
	  : (int) GET_MODE_SIZE (mode);

-  if (type)
+  /* Aggregates are passed by reference based on their size.  */
+  if (type && AGGREGATE_TYPE_P (type))
     {
-      /* Arrays always passed by reference.  */
-      if (TREE_CODE (type) == ARRAY_TYPE)
-	return true;
-      /* Other aggregates based on their size.  */
-      if (AGGREGATE_TYPE_P (type))
-	size = int_size_in_bytes (type);
+      size = int_size_in_bytes (type);
     }

   /* Variable sized arguments are always returned by reference.  */

This version of the patch looks fine. Since this is a bug I think it should be committed now in stage 4. This is OK if release manager agrees.

Cheers
/Marcus
Re: Allow passing arrays in registers on AArch64
On Tue, Feb 11, 2014 at 02:51:08PM +, Marcus Shawcroft wrote:

On 6 February 2014 22:51, Michael Hudson-Doyle michael.hud...@canonical.com wrote:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16c51a8..958c667 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1187,14 +1187,10 @@ aarch64_pass_by_reference (cumulative_args_t pcum ATTRIBUTE_UNUSED,
   size = (mode == BLKmode && type)
	  ? int_size_in_bytes (type)
	  : (int) GET_MODE_SIZE (mode);

-  if (type)
+  /* Aggregates are passed by reference based on their size.  */
+  if (type && AGGREGATE_TYPE_P (type))
     {
-      /* Arrays always passed by reference.  */
-      if (TREE_CODE (type) == ARRAY_TYPE)
-	return true;
-      /* Other aggregates based on their size.  */
-      if (AGGREGATE_TYPE_P (type))
-	size = int_size_in_bytes (type);
+      size = int_size_in_bytes (type);
     }

   /* Variable sized arguments are always returned by reference.  */

This version of the patch looks fine. Since this is a bug I think it should be committed now in stage 4. This is OK if release manager agrees.

Ok.

Jakub
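C itself has no bare array-typed parameters (they decay to pointers), so the removed ARRAY_TYPE special case matters for other front ends; the closest C analogue is an aggregate wrapping an array, which the size-based rule now treats like any other small aggregate. A hedged sketch, assuming the AArch64 PCS places composites of at most 16 bytes in registers:

```c
#include <assert.h>

/* Illustrative only: a 16-byte aggregate containing an array.  Under
   the patched rule it is classified purely by size (small enough for
   registers on AArch64) rather than being unconditionally forced
   into memory just because an array type is involved.  */
struct pair { long a[2]; };

static long
sum_pair (struct pair p)   /* passed by value */
{
  return p.a[0] + p.a[1];
}
```

The observable C-level behavior is unchanged either way; what the patch changes is only the calling convention chosen for such arguments.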
Re: [PATCH][ARM] Fix PR 55426
On 11/02/14 09:38, Kyrill Tkachov wrote:

Hi all,

In this PR the 128-bit load-duplicate intrinsics in neon.exp ICE on big-endian with an unrecognisable insn error:

neon-vld1_dupQ.c:24:1: error: unrecognizable insn:
(insn 94 93 31 (set (subreg:DI (reg:V2DI 95 d16 [orig:137 D.14400 ] [137]) 0)
        (subreg:DI (reg:V2DI 95 d16 [orig:137 D.14400 ] [137]) 8))

The problem seems to be that the neon_vld1_dupv2di splitter generates subregs after reload with gen_lowpart and gen_highpart. Since that splitter always matches after reload, we already know the hard register numbers, so we can just manipulate those directly to extract the two doubleword parts of a quadword reg. While we're at it, we might as well use a more general move instruction when the alignment is natural to potentially take advantage of more complex addressing modes. We're allowed to do that because the vld1Q_dup*64 intrinsics describe a behaviour and do not guarantee that a particular instruction will be used. Therefore the vld1Q_dup*64 tests are updated to be run-time tests instead to test the functionality. New *_misaligned tests are added, however, to make sure that we still generate vld1.64 when the address is explicitly unaligned, since vld1.64 is the only instruction that can handle that.

Did an armeb-none-linux-gnueabihf build. The vld1Q_dup*64* tests now pass on big and little endian. arm-none-linux-gnueabihf bootstrap on Chromebook successful. This is a regression since 4.7. I've tested this on trunk. Will test this on the 4.8 and 4.7 branches.

My apologies, I misread the bug report as if this appears on 4.7. I couldn't reproduce it on 4.7 since the offending pattern didn't exist then. I'm testing a 4.8 variant of this patch.

Kyrill

Ok for those branches if no regressions?

Thanks,
Kyrill

2014-02-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/55426
	* config/arm/neon.md (neon_vld1_dupv2di): Do not generate
	low and high part subregs, use hard reg numbers.
	* config/arm/arm.c (arm_mem_aligned_p): New function.
	(arm_init_neon_builtins): Allow for memory operands in load
	operations.
	* config/arm/arm-protos.h (arm_mem_aligned_p): Declare extern.
	* config/arm/constraints.md (Uo): New constraint.

2014-02-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	PR target/55426
	* gcc.target/arm/neon/vld1Q_dupp64.c: Change to run-time test.
	* gcc.target/arm/neon/vld1Q_dups64.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupu64.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupp64_misaligned.c: New test.
	* gcc.target/arm/neon/vld1Q_dups64_misaligned.c: Likewise.
	* gcc.target/arm/neon/vld1Q_dupu64_misaligned.c: Likewise.
Re: [PATCH][AArch64] Wire up Cortex-A57 rtx costs
On 10 February 2014 09:55, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:

On 30 January 2014 13:48, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch wires up the aarch64 backend to use the Cortex-A57 rtx costs table that is proposed at http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01954.html

OK if release manager agrees. /Marcus

Jakub, are you OK with us taking this patch in stage4? This patch builds on the ARM patch http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00403.html and like the ARM patch it is non-default tuning.

/Marcus
Re: [PATCH][ARM] Add -mcpu=native detection for Cortex-A53, A57
On Mon, Feb 10, 2014 at 11:01 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patchlet adds the part numbers for the Cortex-A53 and A57 cores so that they can be detected when parsing /proc/cpuinfo on AArch32 Linux systems. This will allow the -mcpu=native machinery to detect those cores. Tested arm-none-eabi on a model. This is a fairly innocuous change, is it ok at this stage or for next stage 1? I'm happy for this to go in provided an RM doesn't object in 24 hours. regards Ramana Thanks, Kyrill 2014-02-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/driver-arm.c (arm_cpu_table): Add entries for Cortex-A53, Cortex-A57.
Re: [PATCH][ARM]fix potential testsuite/gcc.target/arm/fixed_float_conversion.c regression
On Mon, Feb 3, 2014 at 3:56 PM, Renlin Li renlin...@arm.com wrote: Hi all, This patch will ensure testsuite/gcc.target/arm/fixed_float_conversion.c is checked only when -mfpu=vfp3 -mfloat-abi=softfp is applicable for the target. Accordingly, two procs (check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3) are added into gcc/testsuite/lib/target-supports.exp. I have also update related documentation. Okay for trunk? This is OK. Ramana Kind regards, Renlin Li gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option to the test case. * lib/target-supports.exp: check_effective_target_arm_vfp3_ok: New. add_options_for_arm_vfp3: New. gcc/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * doc/sourcebuild.texi: Document check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3
Re: [PATCH][AArch64] Wire up Cortex-A57 rtx costs
On Tue, Feb 11, 2014 at 03:03:32PM +, Marcus Shawcroft wrote: On 10 February 2014 09:55, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 30 January 2014 13:48, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch wires up the aarch64 backend to use the Cortex-A57 rtx costs table that is proposed at http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01954.html OK if release manager agrees. /Marcus Jakub, are you OK with us taking this patch in stage4 ? This patch builds on the ARM patch http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00403.html and like the ARM patch it is none default tuning. Ok. Jakub
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
On Tue, 11 Feb 2014, Evgeny Stupachenko wrote:

Missed patch attached in plain-text. I have copyright assignment on file with the FSF covering work on GCC.

Load/store groups of length 3 is the most frequent non-power-of-2 case. It is used in RGB image processing (like the test case in PR52252). For sure we can extend the patch to length 5 and more. However, this potentially affects performance on some other architectures and requires larger testing. So length 3 is just the first step. The algorithm in the patch could be modified for a general case in several steps.

I understand that the patch should wait for stage 1, however since it's ready we can discuss it right now and make some changes (like the general size of the group).

Other than that I'd like to see a vectorizer hook querying the cost of a vec_perm_const expansion instead of adding vec_perm_shuffle (thus requires the constant shuffle mask to be passed as well as the vector type). That's more useful for other uses that would require (arbitrary) shuffles.

Didn't look at the rest of the patch yet - queued in my review pipeline.

Thanks,
Richard.

Thanks,
Evgeny

On Tue, Feb 11, 2014 at 5:00 PM, Richard Biener rguent...@suse.de wrote:

On Tue, 11 Feb 2014, Evgeny Stupachenko wrote: Hi, The patch gives an expected 3 times gain for the test case in the PR52252 (and even 6 times for AVX2). It passes make check and bootstrap on x86. spec2000/spec2006 got no regressions/gains on x86. Is this patch ok?

I've worked on generalizing the permutation support in the light of the availability of the generic shuffle support in the IL but hit some road-blocks in the way code-generation works for group loads with permutations (I don't remember if I posted all patches). This patch seems to be to a slightly different place but it again special-cases a specific permutation. Why's that? Why can't we support groups of size 7 for example? So - can this be generalized to support arbitrary non-power-of-two load/store groups?

Other than that the patch has to wait for stage1 to open again, of course. And it misses a testcase. Btw, do you have a copyright assignment on file with the FSF covering work on GCC?

Thanks,
Richard.

ChangeLog:

2014-02-11  Evgeny Stupachenko  evstu...@gmail.com

	* target.h (vect_cost_for_stmt): Defining new cost vec_perm_shuffle.
	* tree-vect-data-refs.c (vect_grouped_store_supported): New check
	for stores group of length 3.
	(vect_permute_store_chain): New permutations for stores group of
	length 3.
	(vect_grouped_load_supported): New check for loads group of length 3.
	(vect_permute_load_chain): New permutations for loads group of
	length 3.
	* tree-vect-stmts.c (vect_model_store_cost): New cost vec_perm_shuffle
	for the new permutations.
	(vect_model_load_cost): Ditto.
	* config/aarch64/aarch64.c (builtin_vectorization_cost): Adding
	vec_perm_shuffle cost as equivalent of vec_perm cost.
	* config/arm/arm.c: Ditto.
	* config/rs6000/rs6000.c: Ditto.
	* config/spu/spu.c: Ditto.
	* config/i386/x86-tune.def (TARGET_SLOW_PHUFFB): Target for slow
	byte shuffle on some x86 architectures.
	* config/i386/i386.h (processor_costs): Defining pshuffb cost.
	* config/i386/i386.c (processor_costs): Adding pshuffb cost.
	(ix86_builtin_vectorization_cost): Adding cost for the new
	permutations.  Fixing cost for other permutations.
	(expand_vec_perm_even_odd_1): Avoid byte shuffles when they are
	slow (TARGET_SLOW_PHUFFB).
	(ix86_add_stmt_cost): Adding cost when STMT is WIDEN_MULTIPLY.
	Adding new shuffle cost only when byte shuffle is expected.
	Fixing cost model for Silvermont.

Thanks,
Evgeny

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: Use [warning enabled by default] for default warnings
On 2/11/2014 9:36 AM, Richard Sandiford wrote: I find it hard to believe that significant numbers of users are not fixing the sources of those warnings and are instead requiring every release of GCC to produce warnings with a particular wording. Good enough for me, I think it is OK to make the change.
Re: [PATCH][ARM]fix potential testsuite/gcc.target/arm/fixed_float_conversion.c regression
On 11/02/14 15:11, Ramana Radhakrishnan wrote: On Mon, Feb 3, 2014 at 3:56 PM, Renlin Li renlin...@arm.com wrote: Hi all, This patch will ensure testsuite/gcc.target/arm/fixed_float_conversion.c is checked only when -mfpu=vfp3 -mfloat-abi=softfp is applicable for the target. Accordingly, two procs (check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3) are added into gcc/testsuite/lib/target-supports.exp. I have also update related documentation. Okay for trunk? This is OK. Hi all, I've committed this on behalf of Renlin as r207691. Kyrill Ramana Kind regards, Renlin Li gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option to the test case. * lib/target-supports.exp: check_effective_target_arm_vfp3_ok: New. add_options_for_arm_vfp3: New. gcc/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * doc/sourcebuild.texi: Document check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3
Re: [RFC, patch] Detect lack of 32-bit devel environment on x86_64-linux targets
On Fri, Jan 31, 2014 at 03:49:40PM +0100, Richard Biener wrote: On Fri, Jan 31, 2014 at 3:45 PM, FX fxcoud...@gmail.com wrote: I've just seen that an explicit --enable-multilib is a way to do that. Yes, I was writing that as a reply when I received your email. (Also, it's written in the configure error message.) Yeah - you know, that message is quite long and somehow I didn't read it carefully. I suspect that will happen to others, too, so we'll get some extra complaints from that ;) Btw, doing the configure check exactly after all-stage1-gcc should be an early enough and a serialization point, no? There you can do the check even for when cross-compiling. Well, you've already built a whole stage, so it's not so early, is it? Well, building the stage1 compiler is probably the fastest thing nowadays (it didn't yet build the target libraries for stage1 with the stage1, unoptimized and checking-enabled compiler - which is where it would fail in the odd way which is what you want to improve). As I said, you can't properly check it at the point you are checking. Which is why I complain - you're not checking this properly! Anyway, I've fixed the bug on our side with --enable-multilib. Just hit the same thing, while I have (in mock) 32-bit devel libc installed, I don't have 32-bit libgcc_s installed (what for, it will be built by gcc). Please revert it, or at least improve it (e.g. by trying to build with -static-libgcc at least). Jakub
Re: [PATCH 4/6] [GOMP4] OpenACC 1.0+ support in fortran front-end
Hi! On Fri, 31 Jan 2014 15:16:07 +0400, Ilmir Usmanov i.usma...@samsung.com wrote: OpenACC 1.0 support -- GENERIC nodes and gimplify stubs. Thanks! This one is nearly ready for commit, and can then go in already, while the Fortran front end patches are still being discussed. :-) Please merge »OpenACC 1.0 support -- documentation« into this patch. In gcc/doc/generic.texi, please also add an entry for OACC_DECLARE, which is currently missing. Ideally, gcc/doc/generic.texi OMP_CLAUSE should also be updated for the OpenACC clauses (as well as recently added OpenMP ones), but as that one's outdated already, this is not a prerequisite. For ChangeLog files updates (on gomp-4_0-branch, use the respective ChangeLog.gomp files, by the way), should just you be listed as the author, or also your colleagues? --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -6157,6 +6166,20 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, remove = true; break; + case OMP_CLAUSE_HOST: + case [...] + case OMP_CLAUSE_VECTOR_LENGTH: case OMP_CLAUSE_NOWAIT: case OMP_CLAUSE_ORDERED: case OMP_CLAUSE_UNTIED: Indentation (one tab instead of two spaces). @@ -6476,6 +6499,23 @@ gimplify_adjust_omp_clauses (tree *list_p) } break; + case OMP_CLAUSE_HOST: + case [...] + case OMP_CLAUSE_VECTOR_LENGTH: +sorry ("Clause not supported yet"); +break; + case OMP_CLAUSE_REDUCTION: case OMP_CLAUSE_COPYIN: case OMP_CLAUSE_COPYPRIVATE: Indentation. Also, shouldn't the sorry be moved to gimplify_scan_omp_clauses, and a »remove = true« be added in there, and gimplify_adjust_omp_clauses then just contain a gcc_unreachable (or, move the new »case OMP_CLAUSE_*« next to the existing default: that already contains gcc_unreachable)?
@@ -7988,6 +8028,19 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, ret = GS_ALL_DONE; break; + case OACC_KERNELS: + case OACC_DATA: + case OACC_CACHE: + case OACC_WAIT: + case OACC_HOST_DATA: + case OACC_DECLARE: + case OACC_UPDATE: + case OACC_ENTER_DATA: + case OACC_EXIT_DATA: +sorry ("directive not yet implemented"); +ret = GS_ALL_DONE; +break; + case OMP_PARALLEL: gimplify_omp_parallel (expr_p, pre_p); ret = GS_ALL_DONE; Indentation. Further down in gimplify_expr, shouldn't these new OACC_* codes also be added to the »These expressions should already be in gimple IR form« assert? --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1491,6 +1491,18 @@ fixup_child_record_type (omp_context *ctx) TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type); } +static bool +gimple_code_is_oacc (const_gimple g) +{ + switch (gimple_code (g)) +{ +case GIMPLE_OACC_PARALLEL: + return true; +default: + return false; +} +} + Eventually, this will probably end up next to CASE_GIMPLE_OP/is_gimple_op in gimple.h (or the latter be reworked to be able to ask for is_omp vs. is_oacc vs. is_omp_or_oacc), but it's fine to do that once we actually need it in files other than just omp-low.c, and once we support more GIMPLE_OACC_* codes. @@ -1710,17 +1732,34 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) + case OMP_CLAUSE_HOST: + case OMP_CLAUSE_OACC_DEVICE: + case OMP_CLAUSE_DEVICE_RESIDENT: + case OMP_CLAUSE_USE_DEVICE: + case OMP_CLAUSE_ASYNC: + case OMP_CLAUSE_GANG: + case OMP_CLAUSE_WAIT: + case OMP_NO_CLAUSE_CACHE: + case OMP_CLAUSE_INDEPENDENT: + case OMP_CLAUSE_WORKER: + case OMP_CLAUSE_VECTOR: + case OMP_CLAUSE_NUM_GANGS: + case OMP_CLAUSE_NUM_WORKERS: + case OMP_CLAUSE_VECTOR_LENGTH: +sorry ("Clause not supported yet"); +break; + default: gcc_unreachable (); } Indentation.
@@ -1827,9 +1876,26 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE__LOOPTEMP_: case OMP_CLAUSE_TO: case OMP_CLAUSE_FROM: - gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL); + gcc_assert (!gimple_code_is_oacc (ctx->stmt)); break; + case OMP_CLAUSE_HOST: + case OMP_CLAUSE_OACC_DEVICE: + case OMP_CLAUSE_DEVICE_RESIDENT: + case OMP_CLAUSE_USE_DEVICE: + case OMP_CLAUSE_ASYNC: + case OMP_CLAUSE_GANG: + case OMP_CLAUSE_WAIT: + case OMP_NO_CLAUSE_CACHE: + case OMP_CLAUSE_INDEPENDENT: + case OMP_CLAUSE_WORKER: + case OMP_CLAUSE_VECTOR: + case OMP_CLAUSE_NUM_GANGS: + case OMP_CLAUSE_NUM_WORKERS: + case OMP_CLAUSE_VECTOR_LENGTH: +sorry ("Clause not supported yet"); +break; + default: gcc_unreachable (); } Indentation. --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -213,19 +213,19 @@ enum omp_clause_code { (c_parser_omp_variable_list). */ OMP_CLAUSE_ERROR = 0, - /* OpenMP clause: private (variable_list). */ + /* OpenMP/OpenACC clause: private (variable_list). */
Simple install hook
Added install hook: /opt/gjit/bin/gcc -g -O2 -Wall t.c -o test -I/opt/gjit/include -lgccjit -L/opt/gjit/lib Compiles the helloworld examples correctly and it runs instead of pointing to my gcc build dir. Am working on getting more involved with this and started: https://github.com/redbrain/spy It's only just starting to parse stuff, but a kind of C/Python language using GCC as a JIT might be interesting; the idea is a kind of dynamic language for C that can call C libs safely and easily. Mostly just so I can see where to help out in the jit front-end. Was also considering some kind of libopcodes work to assemble the code in memory instead of creating a .so in /tmp. Not sure if I know what I am doing enough there, but it might be more elegant for stuff using this front-end. Am so impressed how well this works. Thanks David. From c0c1c3d55ad707ef5c62199791acabd1ceea6858 Mon Sep 17 00:00:00 2001 From: Philip redbr...@gcc.gnu.org Date: Tue, 11 Feb 2014 16:46:15 + Subject: [PATCH] Add install hook --- gcc/jit/Make-lang.in | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in index 17d27ca..0a468da 100644 --- a/gcc/jit/Make-lang.in +++ b/gcc/jit/Make-lang.in @@ -85,7 +85,9 @@ jit.srcman: # # Install hooks: -jit.install-common: +jit.install-common: installdirs + $(INSTALL_PROGRAM) libgccjit.so $(DESTDIR)/$(libdir)/libgccjit.so + $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h $(DESTDIR)/$(includedir)/libgccjit.h jit.install-man: -- 1.8.3.2
Re: Use [warning enabled by default] for default warnings
On 2014-02-11 15:36, Richard Sandiford wrote: I thought the trend these days was to move towards -Werror, so that for many people the expected output is to get no warnings at all. And bear in mind that the kind of warnings that are not under -W control tend to be those that are so likely to be a mistake that no-one has ever had an incentive to make them optional. I find it hard to believe that significant numbers of users are not fixing the sources of those warnings and are instead requiring every release of GCC to produce warnings with a particular wording. Hi, actually at my site we turn more and more warnings into errors, but we do it warning by warning with more -Werror=..., so the fine-grained warning changes are really nice for us. The problem we face with [enabled by default] warnings is not that there are no options to turn these warnings off (we _want_ these warnings), but this also means there are no corresponding -Werror= options (and also no -Werror=enabled-by-default or -Werror=default-warnings). And pure -Werror turns all other warnings we want to see into errors too :(. regards, Franz
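A minimal sketch of the per-warning promotion Franz describes (the file and identifier names here are made up for illustration): any diagnostic that has its own -W name can be promoted to an error individually with -Werror=, whereas a GCC 4.x diagnostic printed as "[enabled by default]" had no such name, and therefore no -Werror= handle short of the all-or-nothing -Werror.

```c
#include <assert.h>

/* Hypothetical demo file.  Compile (assumed GCC command lines):
 *   gcc -c demo.c -Wunused-variable         -> diagnostic is a warning
 *   gcc -c demo.c -Werror=unused-variable   -> same diagnostic, now an error
 * An "[enabled by default]" warning in GCC 4.x had no -W<name> of its
 * own, so it could not be promoted (or demoted) individually.  */

int demo (void)
{
  int unused = 42;  /* never read: triggers -Wunused-variable */
  return 7;
}
```

The point of the thread is exactly that giving every default warning its own -W name makes the first, fine-grained style of -Werror= adoption possible.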
Re: [patch] Fix wrong code with VCE to bit-field type at -O
Hmm. The intent was of course to only allow truly no-op converts via VIEW_CONVERT_EXPR - that is, the size of the operand type and the result type should be the same. So, isn't SRA doing it wrong when creating the VIEW_CONVERT_EXPR of a 3-byte type to a 4-byte type? That's debatable IMO if the 4-byte type has 3-byte precision, but I don't have a strong opinion so I can try to fix it in SRA, although it will be weird to do low-level fiddling because of precision and size at the Tree level. -- Eric Botcazou
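The size mismatch being debated can be illustrated at the C level (type names here are invented, not GCC internals): a 3-byte operand "view-converted" to a 4-byte type cannot be a pure no-op reinterpretation, because one byte of the destination has no corresponding source byte.

```c
#include <assert.h>
#include <stddef.h>

/* A 3-byte aggregate: a char array needs no padding, so its size is
 * exactly 3, while int is 4 bytes on the targets discussed.  */
struct three_bytes { unsigned char b[3]; };

size_t unbacked_bytes (void)
{
  /* Assumes a 4-byte int; the difference is storage in the int view
   * that has no counterpart in the 3-byte operand.  */
  return sizeof (int) - sizeof (struct three_bytes);
}
```

This is why the intent quoted above restricts VIEW_CONVERT_EXPR to operand and result types of equal size; the 3-byte-precision case Eric mentions is the gray area.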
Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables
On Mon, Feb 10, 2014 at 11:48 PM, Wei Mi w...@google.com wrote: Here is the updated patch, which follow UD chain to determine whether iv.base is defined by __gcovx.xxx[] var. It is a lot simpler than adding a tree bit. regression test and previously failed benchmark in piii mode is ok. Other test is going on. 2014-02-10 Wei Mi w...@google.com * tree-ssa-loop-ivopts.c (defined_by_gcov_counter): New. (contains_abnormal_ssa_name_p): Add defined_by_gcov_counter check for ssa name. * testsuite/gcc.dg/profile-generate-4.c: New. Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 207019) +++ tree-ssa-loop-ivopts.c (working copy) @@ -705,6 +705,68 @@ idx_contains_abnormal_ssa_name_p (tree b return !abnormal_ssa_name_p (*index); } +/* Return true if the use is defined by a gcov counter var. + It is used to check if an iv candidate is generated for + profiling. For profile generated ssa name, we should not + use it as IV because gcov counter may have data-race for + multithread program, it could involve tricky bug to use + such ssa var in IVOPT. + Add the snippets describing how ralloc introduces a second gcov counter load and asynchronous update of it from another thread leading to bogus trip count. Also mention that setting volatile flag on gcov counter accesses may greatly degrade profile-gen performance. 
+ To limit patterns to be checked, we list the possible cases + here: + Before PRE, the ssa name used to set __gcov counter is as + follows: + for () { + PROF_edge_counter_1 = __gcov.foo[i]; + PROF_edge_counter_2 = PROF_edge_counter_1 + 1; + __gcov.foo[i] = PROF_edge_counter_2; + } + If PRE works, the loop may be transformed to: + pretmp_1 = __gcov.foo[i]; + for () { + prephitmp_1 = PHI <PROF_edge_counter_2, pretmp_1>; + PROF_edge_counter_1 = prephitmp_1; + PROF_edge_counter_2 = PROF_edge_counter_1 + 1; + __gcov.foo[i] = PROF_edge_counter_2; + } + So there are two cases: + case1: If PRE doesn't work, PROF_edge_counter_1 and PROF_edge_counter_2 + are not induction variable candidates. We don't have to worry + about this case. + case2: If PRE works, the iv candidate base of PROF_edge_counter_1 and + PROF_edge_counter_2 are pretmp_1 or pretmp_1 + 1. pretmp_1 is defined + by __gcov var. + + So this func only has to check case2. For an ssa name which is an iv + candidate, check its base USE and see if it is defined by __gcov var. + Returning true means the ssa name is generated for profiling. */ + +bool +defined_by_gcov_counter (tree use) +{ + gimple stmt; + tree rhs, decl; + const char *name; + + stmt = SSA_NAME_DEF_STMT (use); + if (!is_gimple_assign (stmt)) +return false; + + rhs = gimple_assign_rhs1 (stmt); + if (TREE_CODE (rhs) != ARRAY_REF) +return false; + + decl = TREE_OPERAND (rhs, 0); + if (TREE_CODE (decl) != VAR_DECL) +return false; Also check TREE_STATIC and DECL_ARTIFICIAL flag. David + + name = IDENTIFIER_POINTER (DECL_NAME (decl)); + if (strncmp (name, "__gcov", 6)) +return false; + + return true; +} + /* Returns true if EXPR contains a ssa name that occurs in an abnormal phi node.
*/ @@ -721,7 +783,8 @@ contains_abnormal_ssa_name_p (tree expr) codeclass = TREE_CODE_CLASS (code); if (code == SSA_NAME) -return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0; +return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0 + || defined_by_gcov_counter (expr); if (code == INTEGER_CST || is_gimple_min_invariant (expr)) Index: testsuite/gcc.dg/profile-generate-4.c === --- testsuite/gcc.dg/profile-generate-4.c (revision 0) +++ testsuite/gcc.dg/profile-generate-4.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options "-O2 -fprofile-generate -fno-tree-loop-im -fdump-tree-ivopts-details-blocks" } */ + +/* Because gcov counter related var has data race for multithread program, + compiler should prevent them from affecting program correctness. So + PROF_edge_counter variable should not be used as induction variable, or + else IVOPT may use such variable to compute loop boundary. */ + +void *ptr; +int N; + +void foo(void *t) { + int i; + for (i = 0; i < N; i++) { +t = *(void **)t; + } + ptr = t; +} + +/* { dg-final { scan-tree-dump-times "ssa name PROF_edge_counter" 0 "ivopts"} } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */
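The hazard this patch guards against can be simulated single-threaded (all names below are hypothetical stand-ins, not the real instrumentation): a loop bound derived from one load of a shared profile counter no longer matches code that re-reads the counter after another thread has incremented it in between, which is what the discussed REG_EQUIV substitution can reintroduce.

```c
#include <assert.h>

/* Stand-in for a shared profile counter such as __gcov0.foo[0].  */
static long gcov_counter;

long bound_drift (int n)
{
  long first_load = gcov_counter;   /* value folded into the loop bound */
  gcov_counter += 1;                /* simulated update by another thread */
  long second_load = gcov_counter;  /* a re-materialized second load */
  /* Non-zero drift means a trip count derived from first_load would
   * disagree with code now using second_load.  */
  return (second_load + n) - (first_load + n);
}
```

With the counter bumped once between the two loads, the drift is 1: exactly the off-by-one trip count the patch description warns about.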
[PATCH] Use VIEW_CONVERT_EXPR in SRA created debug stmts if needed (PR debug/59776)
Hi! As discussed in the PR, this patch adds VCE if types aren't compatible, in order not to create invalid debug stmts. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-11 Richard Henderson r...@redhat.com Jakub Jelinek ja...@redhat.com PR debug/59776 * tree-sra.c (load_assign_lhs_subreplacements): Add VIEW_CONVERT_EXPR around drhs if type conversion to lacc->type is not useless. * gcc.dg/guality/pr59776.c: New test. --- gcc/tree-sra.c.jj 2014-02-08 00:53:46.0 +0100 +++ gcc/tree-sra.c 2014-02-11 14:31:51.469937602 +0100 @@ -2950,6 +2950,10 @@ load_assign_lhs_subreplacements (struct lacc); else drhs = NULL_TREE; + if (drhs + && !useless_type_conversion_p (lacc->type, TREE_TYPE (drhs))) + drhs = fold_build1_loc (loc, VIEW_CONVERT_EXPR, + lacc->type, drhs); ds = gimple_build_debug_bind (get_access_replacement (lacc), drhs, gsi_stmt (*old_gsi)); gsi_insert_after (new_gsi, ds, GSI_NEW_STMT); --- gcc/testsuite/gcc.dg/guality/pr59776.c.jj 2014-02-11 14:44:04.604957250 +0100 +++ gcc/testsuite/gcc.dg/guality/pr59776.c 2014-02-11 14:52:57.0 +0100 @@ -0,0 +1,29 @@ +/* PR debug/59776 */ +/* { dg-do run } */ +/* { dg-options "-g" } */ + +#include "../nop.h" + +struct S { float f, g; }; + +__attribute__((noinline, noclone)) void +foo (struct S *p) +{ + struct S s1, s2; /* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */ + s1 = *p; /* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */ + s2 = s1; /* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */ + *(int *) &s2.f = 0; /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */ + asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */ + asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */ + s2 = s1; /* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */ + asm volatile (NOP : : : "memory"); /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */ + asm volatile (NOP : : : "memory"); +} + +int +main () +{ + struct S x = { 5.0f, 6.0f }; + foo (&x); + return 0; +}
Jakub
Re: [PATCH] Use VIEW_CONVERT_EXPR in SRA created debug stmts if needed (PR debug/59776)
On 02/11/2014 09:36 AM, Jakub Jelinek wrote: Hi! As discussed in the PR, this patch adds VCE if types aren't compatible, in order not to create invalid debug stmts. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-11 Richard Henderson r...@redhat.com Jakub Jelinek ja...@redhat.com PR debug/59776 * tree-sra.c (load_assign_lhs_subreplacements): Add VIEW_CONVERT_EXPR around drhs if type conversion to lacc-type is not useless. * gcc.dg/guality/pr59776.c: New test. Ok. Nice test case. r~
[PATCH] Fix up -Wsequence-point handling (PR c/60101)
Hi! Right now we spend several minutes in verify_tree. The problems I see are that merge_tlist does the exact opposite of what it should be doing with the copy flag (most merge_tlist calls are with copy=0, so that means mostly unnecessary copying, but for the single one for SAVE_EXPR it actually probably breaks things or can break), the middle hunk is about one spot which IMHO uses copy=1 unnecessarily (nothing uses tmp_list2 afterwards). The most important is the last hunk, the SAVE_EXPR handling was doing merge_tlist first on the whole tmp_nosp chain (with copy=0 which mistakenly copied everything, see above), and then doing that again with the whole tmp_nosp list except the first entry, then except the first two entries etc. O(n^2) complexity, but IMHO none of the calls but the first one actually would do anything, because after the first merge_tlist call all entries should be actually in tmp_list3 (except for duplicates), and so for all further entries it would be finding a matching entry already (found=1). With this patch pr60101.c compiles within less than a second. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-11 Jakub Jelinek ja...@redhat.com PR c/60101 * c-common.c (merge_tlist): If copy is true, call new_tlist, if false, add ADD itself, rather than vice versa. (verify_tree): For COND_EXPR, don't call merge_tlist with non-zero copy. For SAVE_EXPR, only call merge_tlist once. * c-c++-common/pr60101.c: New test. --- gcc/c-family/c-common.c.jj 2014-02-08 10:07:12.0 +0100 +++ gcc/c-family/c-common.c 2014-02-11 13:18:53.270521683 +0100 @@ -2976,7 +2976,7 @@ merge_tlist (struct tlist **to, struct t } if (!found) { - *end = copy ? add : new_tlist (NULL, add->expr, add->writer); + *end = copy ?
new_tlist (NULL, add->expr, add->writer) : add; end = (*end)->next; *end = 0; } @@ -3134,7 +3134,7 @@ verify_tree (tree x, struct tlist **pbef verify_tree (TREE_OPERAND (x, 0), &tmp_before, &tmp_list2, NULL_TREE); warn_for_collisions (tmp_list2); merge_tlist (pbefore_sp, tmp_before, 0); - merge_tlist (pbefore_sp, tmp_list2, 1); + merge_tlist (pbefore_sp, tmp_list2, 0); tmp_list3 = tmp_nosp = 0; verify_tree (TREE_OPERAND (x, 1), &tmp_list3, &tmp_nosp, NULL_TREE); @@ -3238,12 +3238,7 @@ verify_tree (tree x, struct tlist **pbef warn_for_collisions (tmp_nosp); tmp_list3 = 0; - while (tmp_nosp) - { - struct tlist *t = tmp_nosp; - tmp_nosp = t->next; - merge_tlist (&tmp_list3, t, 0); - } + merge_tlist (&tmp_list3, tmp_nosp, 0); t->cache_before_sp = tmp_before; t->cache_after_sp = tmp_list3; } --- gcc/testsuite/c-c++-common/pr60101.c.jj 2014-02-11 14:55:20.728288546 +0100 +++ gcc/testsuite/c-c++-common/pr60101.c 2014-02-11 13:37:28.0 +0100 @@ -0,0 +1,112 @@ +/* PR c/60101 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -Wall" } */ + +extern int *a, b, *c, *d; + +void +foo (double _Complex *x, double _Complex *y, double _Complex *z, unsigned int l, int w) +{ + unsigned int e = (unsigned int) a[3]; + double _Complex (*v)[l][4][e][l][4] = (double _Complex (*)[l][4][e][l][4]) z; + double _Complex (*f)[l][b][l] = (double _Complex (*)[l][b][l]) y; + unsigned int g = c[0] * c[1] * c[2]; + unsigned int h = d[0] + c[0] * (d[1] + c[1] * d[2]); + unsigned int i; + + for (i = 0; i < e; i++) +{ + int j = e * d[3] + i; + + unsigned int n0, n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11; + float _Complex s = 0.; + unsigned int t = 0; + + for (n0 = 0; n0 < l; n0++) + for (n1 = 0; n1 < l; n1++) + for (n2 = 0; n2 < l; n2++) + for (n3 = 0; n3 < l; n3++) + for (n4 = 0; n4 < l; n4++) + for (n5 = 0; n5 < l; n5++) + for (n6 = 0; n6 < l; n6++) + for (n7 = 0; n7 < l; n7++) + for (n8 = 0; n8 < l; n8++) + for (n9 = 0; n9 < l; n9++) + for (n10 = 0; n10 < l; n10++) + for (n11 = 0; n11 < l; n11++) + { + if (t % g == h) + s + +=
f[n0][n4][j][n8] * f[n1][n5][j][n9] * ~(f[n2][n6][w][n10]) * ~(f[n3][n7][w][n11]) + * (+0.25 * v[0][n2][0][i][n9][1] * v[0][n3][0][i][n5][1] * v[0][n10][0][i][n4][1] + * v[0][n7][1][i][n8][0] * v[0][n11][1][i][n1][0] * v[0][n6][1][i][n0][0] + + 0.25 * v[0][n2][0][i][n9][1] * v[0][n3][0][i][n5][1] * v[0][n10][0][i][n4][1] + * v[0][n11][1][i][n8][0] * v[0][n6][1][i][n1][0] *
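For context, the diagnostic whose checker this patch speeds up, -Wsequence-point, flags expressions that modify an object twice without an intervening sequence point. A minimal sketch of the offending pattern and a well-defined rewrite (the function name here is invented):

```c
#include <assert.h>

/* The classic -Wsequence-point example is:
 *
 *   i = i++ + 1;   // i modified twice, undefined behaviour, warned about
 *
 * Rather than executing the undefined form, this shows an unambiguous
 * single-modification equivalent of incrementing i once.  */

int next_value (int i)
{
  i = i + 1;  /* one well-defined modification per sequence point */
  return i;
}
```

verify_tree walks every subexpression collecting reads and writes into tlists, which is why a deeply nested expression like the pr60101.c testcase above made its O(n^2) merging so visible.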
Re: [PING]Re: [PATCH][ARM]fix potential testsuite/gcc.target/arm/fixed_float_conversion.c regression
On Mon, Feb 10, 2014 at 6:02 AM, Renlin Li renlin...@arm.com wrote: On 03/02/14 15:56, Renlin Li wrote: Hi all, This patch will ensure testsuite/gcc.target/arm/fixed_float_conversion.c is checked only when -mfpu=vfp3 -mfloat-abi=softfp is applicable for the target. Accordingly, two procs (check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3) are added into gcc/testsuite/lib/target-supports.exp. I have also update related documentation. Okay for trunk? Kind regards, Renlin Li gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option to the test case. * lib/target-supports.exp: check_effective_target_arm_vfp3_ok: New. add_options_for_arm_vfp3: New. gcc/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * doc/sourcebuild.texi: Document check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3 Hi, Anybody help me to review this patch? This breaks bootstrap: ../../src-trunk/gcc/doc/sourcebuild.texi:1963: @ref reference to nonexistent node `arm_vfp3_ok' I am checking in this as an obvious fix. -- H.J. -- diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 1ea5753..85ef819 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1552,6 +1552,7 @@ ARM target supports @code{-mfpu=vfp -mfloat-abi=softfp}. Some multilibs may be incompatible with these options. @item arm_vfp3_ok +@anchor{arm_vfp3_ok} ARM target supports @code{-mfpu=vfp3 -mfloat-abi=softfp}. Some multilibs may be incompatible with these options.
Re: [PING]Re: [PATCH][ARM]fix potential testsuite/gcc.target/arm/fixed_float_conversion.c regression
On Tue, Feb 11, 2014 at 9:46 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 10, 2014 at 6:02 AM, Renlin Li renlin...@arm.com wrote: On 03/02/14 15:56, Renlin Li wrote: Hi all, This patch will ensure testsuite/gcc.target/arm/fixed_float_conversion.c is checked only when -mfpu=vfp3 -mfloat-abi=softfp is applicable for the target. Accordingly, two procs (check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3) are added into gcc/testsuite/lib/target-supports.exp. I have also update related documentation. Okay for trunk? Kind regards, Renlin Li gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option to the test case. * lib/target-supports.exp: check_effective_target_arm_vfp3_ok: New. add_options_for_arm_vfp3: New. gcc/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * doc/sourcebuild.texi: Document check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3 Hi, Anybody help me to review this patch? This breaks bootstrap: ../../src-trunk/gcc/doc/sourcebuild.texi:1963: @ref reference to nonexistent node `arm_vfp3_ok' I am checking in this as an obvious fix. -- H.J. -- diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 1ea5753..85ef819 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1552,6 +1552,7 @@ ARM target supports @code{-mfpu=vfp -mfloat-abi=softfp}. Some multilibs may be incompatible with these options. @item arm_vfp3_ok +@anchor{arm_vfp3_ok} ARM target supports @code{-mfpu=vfp3 -mfloat-abi=softfp}. Some multilibs may be incompatible with these options. It has been fixed by Uros. -- H.J.
Re: [PING]Re: [PATCH][ARM]fix potential testsuite/gcc.target/arm/fixed_float_conversion.c regression
On 11/02/14 17:47, H.J. Lu wrote: On Tue, Feb 11, 2014 at 9:46 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 10, 2014 at 6:02 AM, Renlin Li renlin...@arm.com wrote: On 03/02/14 15:56, Renlin Li wrote: Hi all, This patch will ensure testsuite/gcc.target/arm/fixed_float_conversion.c is checked only when -mfpu=vfp3 -mfloat-abi=softfp is applicable for the target. Accordingly, two procs (check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3) are added into gcc/testsuite/lib/target-supports.exp. I have also update related documentation. Okay for trunk? Kind regards, Renlin Li gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option to the test case. * lib/target-supports.exp: check_effective_target_arm_vfp3_ok: New. add_options_for_arm_vfp3: New. gcc/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * doc/sourcebuild.texi: Document check_effective_target_arm_vfp3_ok and add_options_for_arm_vfp3 Hi, Anybody help me to review this patch? This breaks bootstrap: ../../src-trunk/gcc/doc/sourcebuild.texi:1963: @ref reference to nonexistent node `arm_vfp3_ok' I am checking in this as an obvious fix. -- H.J. -- diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 1ea5753..85ef819 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1552,6 +1552,7 @@ ARM target supports @code{-mfpu=vfp -mfloat-abi=softfp}. Some multilibs may be incompatible with these options. @item arm_vfp3_ok +@anchor{arm_vfp3_ok} ARM target supports @code{-mfpu=vfp3 -mfloat-abi=softfp}. Some multilibs may be incompatible with these options. It has been fixed by Uros. Hi Uros H.J. , Thank you for the fix! It's definitely my fault without double checking the patch. Kind regards, Renlin
Re: [PATCH][LTO] Fix PR60060
This fixes the ICE seen in PR60060 which stems from us asking to output debug-info for a function-scope global twice - once via emitting the function and its BLOCK-associated vars and once via the debug-hook through lto_write_globals - emit_debug_global_declarations. The list of global variables does not agree with those from the frontends (that function-scope global is not in GFortran's list). Yes, function scope statics are not supposed to be in the global list, only in the symbol table. if (flag_wpa) return; ! /* Output debug info for global variables. */ varpool_node *vnode; FOR_EACH_DEFINED_VARIABLE (vnode) ! if (!decl_function_context (vnode->decl)) ! debug_hooks->global_decl (vnode->decl); If it works, it looks OK to me :) Wrapup_global_decls does more than just debug output, but it is already done at compilation time, so we probably do not need to re-do it. Honza } static tree
[jit] Improvements to error-handling
Committed to branch dmalcolm/jit: gcc/jit/ * internal-api.c (gcc::jit::recording::context::add_error_va): If GCC_JIT_STR_OPTION_PROGNAME is NULL, use libgccjit.so, as per the comment in libgccjit.h (gcc::jit::recording::label::replay_into): When reporting on an unplaced label, include the name of the containing function in the error message. * libgccjit.c: Remove comment about Functions for use within the code factory, as this distinction went away in commit 96b218c9a1d5f39fb649e02c0e77586b180e8516. (RETURN_VAL_IF_FAIL_PRINTF4): New. (RETURN_NULL_IF_FAIL_PRINTF4): New. (jit_error): Invoke vfprintf with the correct format string in the NULL context case. (gcc_jit_context_new_call): Check for NULL entries within the array of arguments. --- gcc/jit/ChangeLog.jit | 18 ++ gcc/jit/internal-api.c | 10 -- gcc/jit/libgccjit.c| 29 + 3 files changed, 51 insertions(+), 6 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 68c38db..d331082 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,21 @@ +2014-02-11 David Malcolm dmalc...@redhat.com + + * internal-api.c (gcc::jit::recording::context::add_error_va): If + GCC_JIT_STR_OPTION_PROGNAME is NULL, use libgccjit.so, as per + the comment in libgccjit.h + (gcc::jit::recording::label::replay_into): When reporting on an + unplaced label, include the name of the containing function in + the error message. + * libgccjit.c: Remove comment about Functions for use within the + code factory, as this distinction went away in commit + 96b218c9a1d5f39fb649e02c0e77586b180e8516. + (RETURN_VAL_IF_FAIL_PRINTF4): New. + (RETURN_NULL_IF_FAIL_PRINTF4): New. + (jit_error): Invoke vfprintf with the correct format string in + the NULL context case. + (gcc_jit_context_new_call): Check for NULL entries within the + array of arguments. + 2014-02-10 David Malcolm dmalc...@redhat.com * libgccjit.h (gcc_jit_context_get_int_type): New. 
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 9ff0eb8..6c66d3d 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -500,8 +500,12 @@ recording::context::add_error_va (const char *fmt, va_list ap) char buf[1024]; vsnprintf (buf, sizeof (buf) - 1, fmt, ap); + const char *progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME); + if (!progname) +progname = "libgccjit.so"; + fprintf (stderr, "%s: %s\n", - get_str_option (GCC_JIT_STR_OPTION_PROGNAME), + progname, buf); if (!m_error_count) @@ -1075,7 +1079,9 @@ recording::label::replay_into (replayer *r) { if (!m_has_been_placed) { - r->add_error ("unplaced label: %s", get_debug_string ()); + r->add_error ("unplaced label within %s: %s", + m_func->get_debug_string (), + get_debug_string ()); return; } set_playback_obj (m_func->playback_function () diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 94b3271..3c2d962 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -100,6 +100,16 @@ struct gcc_jit_loop : public gcc::jit::recording::loop }\ JIT_END_STMT +#define RETURN_VAL_IF_FAIL_PRINTF4(TEST_EXPR, RETURN_EXPR, CTXT, ERR_FMT, A0, A1, A2, A3) \ + JIT_BEGIN_STMT \ +if (!(TEST_EXPR)) \ + {\ + jit_error ((CTXT), "%s: " ERR_FMT, \ + __func__, (A0), (A1), (A2), (A3)); \ + return (RETURN_EXPR); \ + }\ + JIT_END_STMT + #define RETURN_VAL_IF_FAIL_PRINTF6(TEST_EXPR, RETURN_EXPR, CTXT, ERR_FMT, A0, A1, A2, A3, A4, A5) \ JIT_BEGIN_STMT \ if (!(TEST_EXPR)) \ @@ -119,6 +129,9 @@ struct gcc_jit_loop : public gcc::jit::recording::loop #define RETURN_NULL_IF_FAIL_PRINTF3(TEST_EXPR, CTXT, ERR_FMT, A0, A1, A2) \ RETURN_VAL_IF_FAIL_PRINTF3 (TEST_EXPR, NULL, CTXT, ERR_FMT, A0, A1, A2) +#define RETURN_NULL_IF_FAIL_PRINTF4(TEST_EXPR, CTXT, ERR_FMT, A0, A1, A2, A3) \ + RETURN_VAL_IF_FAIL_PRINTF4 (TEST_EXPR, NULL, CTXT, ERR_FMT, A0, A1, A2, A3) + #define RETURN_NULL_IF_FAIL_PRINTF6(TEST_EXPR, CTXT, ERR_FMT, A0, A1, A2, A3, A4, A5) \ RETURN_VAL_IF_FAIL_PRINTF6 (TEST_EXPR, NULL, CTXT, ERR_FMT, A0, A1, A2,
A3, A4, A5) @@ -174,7 +187,8 @@ jit_error (gcc::jit::recording::context *ctxt, const char
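The validation macros added in this patch follow a common C idiom: test a precondition, report printf-style context including the callee name, and bail out with a sentinel value. A generic standalone sketch of the pattern (the macro and function names below are made up, not the libgccjit API):

```c
#include <assert.h>
#include <stdio.h>

static int error_count;  /* stands in for the context's error state */

/* Check TEST_EXPR; on failure, log "<caller>: <message>" and return
 * RETURN_EXPR from the enclosing function.  The do/while(0) wrapper
 * plays the role of JIT_BEGIN_STMT/JIT_END_STMT.  */
#define RETURN_VAL_IF_FAIL_PRINTF1(TEST_EXPR, RETURN_EXPR, FMT, A0)  \
  do {                                                               \
    if (!(TEST_EXPR))                                                \
      {                                                              \
        fprintf (stderr, "%s: " FMT "\n", __func__, (A0));           \
        error_count++;                                               \
        return (RETURN_EXPR);                                        \
      }                                                              \
  } while (0)

const char *checked_arg (const char *name, int index)
{
  RETURN_VAL_IF_FAIL_PRINTF1 (name != NULL, NULL,
                              "NULL argument at index %d", index);
  return name;
}
```

Note how `"%s: " FMT` relies on adjacent string-literal concatenation, the same trick the patch uses to prepend `__func__` to every error message.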
Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables
+/* Return true if the use is defined by a gcov counter var. + It is used to check if an iv candidate is generated for + profiling. For profile generated ssa name, we should not + use it as IV because gcov counter may have data-race for + multithread program, it could involve tricky bug to use + such ssa var in IVOPT. + Add the snippets describing how ralloc introduces a second gcov counter load and asynchronous update of it from another thread leading to bogus trip count. Also mention that setting volatile flag on gcov counter accesses may greatly degrade profile-gen performance. Comments added. + if (!is_gimple_assign (stmt)) +return false; + + rhs = gimple_assign_rhs1 (stmt); + if (TREE_CODE (rhs) != ARRAY_REF) +return false; + + decl = TREE_OPERAND (rhs, 0); + if (TREE_CODE (decl) != VAR_DECL) +return false; Also check TREE_STATIC and DECL_ARTIFICIAL flag. David Check added. Add DECL_ARTIFICIAL setting in build_var() in coverage.c. The updated patch is attached. Thanks, Wei. 2014-02-11 Wei Mi w...@google.com * tree-ssa-loop-ivopts.c (defined_by_gcov_counter): New. (contains_abnormal_ssa_name_p): Add defined_by_gcov_counter check for ssa name. * coverage.c (build_var): Set DECL_ARTIFICIAL(gcov var decl) to be 1. * testsuite/gcc.dg/profile-generate-4.c: New. Index: testsuite/gcc.dg/profile-generate-4.c === --- testsuite/gcc.dg/profile-generate-4.c (revision 0) +++ testsuite/gcc.dg/profile-generate-4.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options -O2 -fprofile-generate -fno-tree-loop-im -fdump-tree-ivopts-details-blocks } */ + +/* Because gcov counter related var has data race for multithread program, + compiler should prevent them from affecting program correctness. So + PROF_edge_counter variable should not be used as induction variable, or + else IVOPT may use such variable to compute loop boundary. 
*/ + +void *ptr; +int N; + +void foo(void *t) { + int i; + for (i = 0; i < N; i++) { +    t = *(void **)t; + } + ptr = t; +} + +/* { dg-final { scan-tree-dump-times "ssa name PROF_edge_counter" 0 "ivopts" } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: coverage.c === --- coverage.c (revision 207019) +++ coverage.c (working copy) @@ -1485,6 +1485,7 @@ build_var (tree fn_decl, tree type, int TREE_STATIC (var) = 1; TREE_ADDRESSABLE (var) = 1; DECL_ALIGN (var) = TYPE_ALIGN (type); + DECL_ARTIFICIAL (var) = 1; return var; } Index: tree-ssa-loop-ivopts.c === --- tree-ssa-loop-ivopts.c (revision 207019) +++ tree-ssa-loop-ivopts.c (working copy) @@ -705,6 +705,115 @@ idx_contains_abnormal_ssa_name_p (tree b return !abnormal_ssa_name_p (*index); } +/* Return true if the use is defined by a gcov counter var. + It is used to check if an iv candidate is generated for + profiling. For profile generated ssa name, we should not + use it as IV because gcov counter may have data-race for + multithread program, it is compiler's responsibility to + avoid connecting profile counter related vars with program + correctness. + + Without the check, the following bug could happen in + following case: + * original loop + + int i; + for (i = 0; i < N; i++) { + t = *(void **)t; + } + + * after profile-gen and IVOPT, loop condition is replaced and + pretmp_1 is involved in loop boundary computation. + + pretmp_1 = __gcov0.foo[0]; + _22 = pretmp_1 + 1; + ... + _31 = (unsigned long) pretmp_1; + _32 = _30 + _31; + _33 = _32 + 2; + label: + ivtmp.8_9 = PHI <ivtmp.8_5(5), ivtmp.8_2(3)> + PROF_edge_counter_10 = (long int) ivtmp.8_9; + __gcov0.foo[0] = PROF_edge_counter_10; + ... + ivtmp.8_5 = ivtmp.8_9 + 1; + if (ivtmp.8_5 != _33) + goto label + + * after register allocation, pretmp_1 may be marked as REG_EQUIV in IRA + with __gcov0.foo[0] and some references are replaced by __gcov0.foo in LRA. + + _22 = __gcov0.foo[0] + 1; + ... 
+ _31 = (unsigned long) __gcov0.foo[0]; + _32 = _30 + _31; + _33 = _32 + 2; + label: + + + * Bug happens when __gcov0.foo[0] is updated asynchronously by other thread + between the above __gcov0.foo[0] references statements. + + We don't choose to mark gcov counter as volatile because it may greatly + degrade profile-gen performance. + + To limit patterns to be checked, we list the possible cases + here: + Before PRE, the ssa name used to set __gcov counter is as + follows: + for () { + PROF_edge_counter_1 = __gcov.foo[i]; + PROF_edge_counter_2 = PROF_edge_counter_1 + 1; + __gcov.foo[i] = PROF_edge_counter_2; + } + If
fix typo in doc/generic.texi
* (generic.texi): Fix typo Index: generic.texi === --- generic.texi(revision 207627) +++ generic.texi(working copy) @@ -53,7 +53,7 @@ seems inelegant. @node Deficiencies @section Deficiencies -There are many places in which this document is incomplet and incorrekt. +There are many places in which this document is incomplete and incorrect. It is, as of yet, only @emph{preliminary} documentation. @c -
Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables
ok for google branch for now. Please resubmit to trunk and backport the trunk version if needed. thanks, David On Tue, Feb 11, 2014 at 10:29 AM, Wei Mi w...@google.com wrote: +/* Return true if the use is defined by a gcov counter var. + It is used to check if an iv candidate is generated for + profiling. For profile generated ssa name, we should not + use it as IV because gcov counter may have data-race for + multithread program, it could involve tricky bug to use + such ssa var in IVOPT. + Add the snippets describing how ralloc introduces a second gcov counter load and asynchronous update of it from another thread leading to bogus trip count. Also mention that setting volatile flag on gcov counter accesses may greatly degrade profile-gen performance. Comments added. + if (!is_gimple_assign (stmt)) +return false; + + rhs = gimple_assign_rhs1 (stmt); + if (TREE_CODE (rhs) != ARRAY_REF) +return false; + + decl = TREE_OPERAND (rhs, 0); + if (TREE_CODE (decl) != VAR_DECL) +return false; Also check TREE_STATIC and DECL_ARTIFICIAL flag. David Check added. Add DECL_ARTIFICIAL setting in build_var() in coverage.c. The updated patch is attached. Thanks, Wei.
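To make the hazard discussed in this thread concrete, here is a hypothetical C sketch (not code from the patch; the names `gcov_counter` and `trip_count` are invented for illustration). If the loop bound is derived from one read of a shared profiling counter while the trip count is anchored on another read, an update from a second thread between the two reads skews the result:

```c
/* Hypothetical illustration of the race discussed above -- not code from
   the patch.  The global stands in for a __gcov0.foo[0] counter slot;
   the in-between increment simulates an asynchronous update by another
   thread between two loads of the counter (the second load appears after
   register allocation rematerializes the REG_EQUIV memory reference).  */
long gcov_counter = 0;

long trip_count (int n)
{
  long start = gcov_counter;      /* first load (pretmp_1 in the dump) */
  gcov_counter += 1;              /* simulated update from another thread */
  long bound = gcov_counter + n;  /* second, rematerialized load */
  return bound - start;           /* should be n, comes out as n + 1 */
}
```

The patch avoids the situation entirely by never selecting a gcov-counter-derived SSA name as an induction variable, which is cheaper than marking the counters volatile.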
Re: fix typo in doc/generic.texi
On Wed, Feb 12, 2014 at 12:00:30AM +0530, Prathamesh Kulkarni wrote: * (generic.texi): Fix typo Index: generic.texi === --- generic.texi (revision 207627) +++ generic.texi (working copy) @@ -53,7 +53,7 @@ seems inelegant. @node Deficiencies @section Deficiencies -There are many places in which this document is incomplet and incorrekt. +There are many places in which this document is incomplete and incorrect. I believe this typo is intentional. Marek
[PATCH, testsuite]: Revert PR testsuite/58630 on mainline
Hello! Now that Richard fixed ms_abi limitation w.r.t. accumulate-outgoing-args, we can revert PR testsuite/58630 fix on mainline. On mainline, ms_abi works without problems with and without this option. 2014-02-11 Uros Bizjak ubiz...@gmail.com PR target/59927 Revert 2013-12-15 Uros Bizjak ubiz...@gmail.com PR testsuite/58630 * gcc.target/i386/pr43662.c (dg-options): Add -maccumulate-outgoing-args. * gcc.target/i386/pr43869.c (dg-options): Ditto. * gcc.target/i386/pr57003.c (dg-options): Ditto. * gcc.target/i386/avx-vzeroupper-16.c (dg-options): Remove -mtune=generic and add -maccumulate-outgoing-args instead. * gcc.target/i386/avx-vzeroupper-17.c (dg-options): Ditto. * gcc.target/i386/avx-vzeroupper-18.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/func-1.c (dg-options): Add -maccumulate-outgoing-args. * gcc.target/x86_64/abi/callabi/func-2a.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/func-2b.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/func-indirect.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/func-indirect-2a.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/func-indirect-2b.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/leaf-1.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/leaf-2.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/pr38891.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-1.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-2.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-3.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-4a.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-4b.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-5a.c (dg-options): Ditto. * gcc.target/x86_64/abi/callabi/vaarg-5b.c (dg-options): Ditto. Tested on x86_64-pc-linux-gnu, with and without --with-arch=core2 --with-cpu=core2 configured compiler. Committed to mainline SVN. Uros. 
Index: ChangeLog === --- ChangeLog (revision 207693) +++ ChangeLog (working copy) @@ -1,3 +1,36 @@ +2014-02-11 Uros Bizjak ubiz...@gmail.com + + PR target/59927 + Revert + 2013-12-15 Uros Bizjak ubiz...@gmail.com + + PR testsuite/58630 + * gcc.target/i386/pr43662.c (dg-options): + Add -maccumulate-outgoing-args. + * gcc.target/i386/pr43869.c (dg-options): Ditto. + * gcc.target/i386/pr57003.c (dg-options): Ditto. + * gcc.target/i386/avx-vzeroupper-16.c (dg-options): + Remove -mtune=generic and add -maccumulate-outgoing-args instead. + * gcc.target/i386/avx-vzeroupper-17.c (dg-options): Ditto. + * gcc.target/i386/avx-vzeroupper-18.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/func-1.c (dg-options): + Add -maccumulate-outgoing-args. + * gcc.target/x86_64/abi/callabi/func-2a.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/func-2b.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/func-indirect.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/func-indirect-2a.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/func-indirect-2b.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/leaf-1.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/leaf-2.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/pr38891.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-1.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-2.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-3.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-4a.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-4b.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-5a.c (dg-options): Ditto. + * gcc.target/x86_64/abi/callabi/vaarg-5b.c (dg-options): Ditto. + 2014-02-11 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: Add arm_vfp3 option. 
Index: gcc.target/i386/avx-vzeroupper-16.c === --- gcc.target/i386/avx-vzeroupper-16.c (revision 207693) +++ gcc.target/i386/avx-vzeroupper-16.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-O2 -mavx -mabi=ms -maccumulate-outgoing-args -dp" } */ +/* { dg-options "-O2 -mavx -mabi=ms -dp" } */ typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__)); Index: gcc.target/i386/avx-vzeroupper-17.c === --- gcc.target/i386/avx-vzeroupper-17.c (revision 207693) +++ gcc.target/i386/avx-vzeroupper-17.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-O2 -mavx -mabi=ms -maccumulate-outgoing-args
Re: Use [warning enabled by default] for default warnings
On Tue, Feb 11, 2014 at 5:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Feb 11, 2014 at 1:48 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Robert Dewar de...@adacore.com writes: On 2/11/2014 4:45 AM, Richard Sandiford wrote: OK, this version drops the [enabled by default] altogether. Tested as before. OK to install? Still a huge earthquake in terms of affecting test suites and baselines of many users. Is it really worth it? In the case of GNAT we have only recently started tagging messages in this way, so changes would not be so disruptive, and we can debate following whatever gcc does, but I think it is important to understand that any change in this area is a big one in terms of impact on users. The patch deliberately didn't affect Ada's diagnostic routines given your comments from the first round. Calling this a huge earthquake for other languages seems like a gross overstatement. I don't think gcc, g++, gfortran, etc., have ever made a commitment to producing textually identical warnings and errors for given inputs across different releases. It seems ridiculous to require that, especially if it stands in the way of improving the diagnostics or introducing finer-grained -W control. E.g. Florian's complaint was that we shouldn't have warnings that are not under the control of any -W options. But by your logic we couldn't change that either, because all those [enabled by default]s would become [-Wnew-option]s. Yeah, I think Robert's argument is a red herring - there are loads of diagnostic changes every release so you cannot expect those to be stable. I'm fine with dropping the [enabled by default] as in the patch, but let's hear another ok before making the change. I think this change is OK. It's obviously not a great situation, but enabled by default is fairly useless information, so this seems like a marginal improvement. Ian
PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
Hi, HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? Thanks. H.J. --- 2014-02-11 H.J. Lu hongjiu...@intel.com PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to assembler. * configure: Regenerated. diff --git a/gcc/configure.ac b/gcc/configure.ac index ac3d842..0aafbc9 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3868,7 +3868,7 @@ foo: nop # These two are used unconditionally by i386.[ch]; it is to be defined # to 1 if the feature is present, 0 otherwise. gcc_GAS_CHECK_FEATURE([GOTOFF in data], -gcc_cv_as_ix86_gotoff_in_data, [2,11,0],, +gcc_cv_as_ix86_gotoff_in_data, [2,11,0], --32, [ .text .L0: nop
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
Hi H.J., HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? This would break Solaris/x86 with as configurations, where this test currently passes, but would fail since as doesn't understand --32. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
On Tue, Feb 11, 2014 at 8:28 PM, H.J. Lu hongjiu...@intel.com wrote: HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? Unfortunately, .code32 didn't work as expected... So, OK. Thanks, Uros.
RFA: one more version of patch for PR59535
This is one more version of the patch to fix the PR59535 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535 Here are the results of applying the patch (code sizes, Thumb / Thumb2): reload 2626334 2400154 lra (before the patch) 2665749 2414926 lra (after the patch) 2626334 2397132 I already wrote that the change in arm.h is to prevent reloading sp as an address by LRA. Reload has no such problem as it uses legitimate address hook and LRA mostly relies on base_reg_class. Richard, I need an approval for this change. 2014-02-11 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/59535 * lra-constraints.c (process_alt_operands): Encourage alternative when unassigned pseudo class is superset of the alternative class. (inherit_reload_reg): Don't inherit when optimizing for code size. * config/arm/arm.h (MODE_BASE_REG_CLASS): Return CORE_REGS for Thumb2 and BASE_REGS for modes not less than 4 for LRA. Index: lra-constraints.c === --- lra-constraints.c (revision 207562) +++ lra-constraints.c (working copy) @@ -2112,6 +2112,21 @@ process_alt_operands (int only_alternati goto fail; } + /* If not assigned pseudo has a class which is a subset of + required reg class, it is a less costly alternative + as the pseudo still can get a hard reg of necessary + class. */ + if (! no_regs_p && REG_P (op) && hard_regno[nop] < 0 + && (cl = get_reg_class (REGNO (op))) != NO_REGS + && ira_class_subset_p[this_alternative][cl]) + { + if (lra_dump_file != NULL) + fprintf + (lra_dump_file, + "%d Super set class reg: reject-=3\n", nop); + reject -= 3; + } + this_alternative_offmemok = offmemok; if (this_costly_alternative != NO_REGS) { @@ -4391,6 +4406,9 @@ static bool inherit_reload_reg (bool def_p, int original_regno, enum reg_class cl, rtx insn, rtx next_usage_insns) { + if (optimize_function_for_size_p (cfun)) +    return false; + enum reg_class rclass = lra_get_allocno_class (original_regno); rtx original_reg = regno_reg_rtx[original_regno]; rtx new_reg, new_insns, usage_insn; Index: config/arm/arm.h === --- config/arm/arm.h (revision 207562) +++ config/arm/arm.h (working copy) @@ -1272,8 +1272,10 @@ enum reg_class when addressing quantities in QI or HI mode; if we don't know the mode, then we must be conservative. */ #define MODE_BASE_REG_CLASS(MODE) \ -(TARGET_ARM || (TARGET_THUMB2 && !optimize_size) ? CORE_REGS : \ - (((MODE) == SImode) ? BASE_REGS : LO_REGS)) +(TARGET_ARM || (TARGET_THUMB2 && (!optimize_size || arm_lra_flag)) \ + ? CORE_REGS : ((MODE) == SImode \ + || (arm_lra_flag && GET_MODE_SIZE (MODE) >= 4) \ + ? BASE_REGS : LO_REGS)) /* For Thumb we can not support SP+reg addressing, so we return LO_REGS instead of BASE_REGS. */
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
On Tue, Feb 11, 2014 at 11:40 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Hi H.J., HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? This would break Solaris/x86 with as configurations, where this test currently passes, but would fail since as doesn't understand --32. How about passing --32 to as only for Linux? OK to install? Thanks. -- H.J. --- 2014-02-11 H.J. Lu hongjiu...@intel.com PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to assembler for Linux target. * configure: Regenerated. diff --git a/gcc/configure.ac b/gcc/configure.ac index ac3d842..1b5dca2 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3867,8 +3867,17 @@ foo: nop # These two are used unconditionally by i386.[ch]; it is to be defined # to 1 if the feature is present, 0 otherwise. +case $target_os in +linux*) + as_ix86_gotoff_in_data_opt=--32 + ;; +*) + as_ix86_gotoff_in_data_opt= + ;; +esac gcc_GAS_CHECK_FEATURE([GOTOFF in data], -gcc_cv_as_ix86_gotoff_in_data, [2,11,0],, +gcc_cv_as_ix86_gotoff_in_data, [2,11,0], + $as_ix86_gotoff_in_data_opt, [ .text .L0: nop
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/11/14 02:06, Eric Botcazou wrote: I pondered changing the condition for swapping the insn order, but it didn't seem worth it. I doubt we see many 3-2 combinations where I3 is a JUMP_INSN, the result turns into two simple sets and the insn swapping code you wrote decides it needs to swap the insns. I didn't actually write it, just enhanced it recently, that's why I suggested to do the same here. It's one line of code and we have an example of valid simplification at hand so I think we ought to do it. It seems to me that as long as we're re-using the existing insns to contain the simple sets that we have to ensure that they're INSNs, not CALL_INSNs or JUMP_INSNs. I disagree, nullifying JUMP_INSNs by changing them to (set (pc) (pc)) is a standard method in the combiner. Right, but most of the time we're doing a 2-1 or 3-1 that turns the conditional jump into a nop -- at which point all we do is modify I3 (which is the conditional jump) into a nop-jump. That common case shouldn't be affected by my change. It's only when we have a 3-2 or 4-2 combination where one of the earlier insns has a side effect we need to keep and we're able to simplify the conditional into a NOP that would be affected by my change. And I suspect that is much more rare. Regardless, I'll see what I can do. Thanks, Jeff
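For readers following the exchange: "nullifying" a JUMP_INSN means combine keeps the insn in the stream but rewrites its pattern into a no-op jump to the fall-through. Schematically (hypothetical RTL made up for illustration, not taken from the PR):

```
;; I3 before combination: a conditional jump on (reg:SI 100).
(jump_insn (set (pc)
                (if_then_else (eq (reg:SI 100) (const_int 0))
                              (label_ref 23)
                              (pc))))

;; After combine proves the branch is never taken, the insn is kept as a
;; placeholder but its pattern becomes a jump to "here":
(jump_insn (set (pc) (pc)))
```

A later jump-optimization cleanup then deletes such nop-jumps, so no code is emitted for them.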
[COMMITTED] Fix target/59927
[Re-send, since I never saw the original arrive on the list.] After the first version passed Kai's testing, we chatted on IRC about various other bits that might be relevant to the winx64 abi issues. The conditional call to ix86_eax_live_at_start_p seemed to be harmless either way, but confusing as a conditional. It seems least confusing to make the call unconditional. Finally, adjust the comment for ACCUMULATE_OUTGOING_ARGS to match reality. r~ PR target/59927 * calls.c (expand_call): Don't double-push for reg_parm_stack_space. * config/i386/i386.c (init_cumulative_args): Remove sorry for 64-bit ms-abi vs -mno-accumulate-outgoing-args. (ix86_expand_prologue): Unconditionally call ix86_eax_live_at_start_p. * config/i386/i386.h (ACCUMULATE_OUTGOING_ARGS): Fix comment with respect to ms-abi. diff --git a/gcc/calls.c b/gcc/calls.c index d574a95..f392319 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -2955,6 +2955,7 @@ expand_call (tree exp, rtx target, int ignore) /* If we push args individually in reverse order, perform stack alignment before the first push (the last arg). */ if (PUSH_ARGS_REVERSED && argblock == 0 + && adjusted_args_size.constant > reg_parm_stack_space && adjusted_args_size.constant != unadjusted_args_size) { /* When the stack adjustment is pending, we get better code diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1b7bb3e..0a15e44 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -6110,10 +6110,6 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, /* Argument info to initialize */ cum->caller = caller; /* Set up the number of registers to use for passing arguments. 
*/ - - if (TARGET_64BIT && cum->call_abi == MS_ABI && !ACCUMULATE_OUTGOING_ARGS) - sorry ("ms_abi attribute requires -maccumulate-outgoing-args " - "or subtarget optimization implying it"); cum->nregs = ix86_regparm; if (TARGET_64BIT) { @@ -11032,15 +11028,14 @@ ix86_expand_prologue (void) if (TARGET_64BIT) r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0); - if (!TARGET_64BIT_MS_ABI) - eax_live = ix86_eax_live_at_start_p (); - /* Note that SEH directives need to continue tracking the stack - pointer even after the frame pointer has been set up. */ + eax_live = ix86_eax_live_at_start_p (); if (eax_live) { insn = emit_insn (gen_push (eax)); allocate -= UNITS_PER_WORD; + /* Note that SEH directives need to continue tracking the stack + pointer even after the frame pointer has been set up. */ if (sp_is_cfa_reg || TARGET_SEH) { if (sp_is_cfa_reg) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 0e757c9c..b605ae2 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1537,12 +1537,12 @@ enum reg_class mode the difference is less drastic but visible. FIXME: Unlike earlier implementations, the size of unwind info seems to - actually grouw with accumulation. Is that because accumulated args + actually grow with accumulation. Is that because accumulated args unwind info became unnecesarily bloated? - - 64-bit MS ABI seem to require 16 byte alignment everywhere except for - function prologue and epilogue. This is not possible without - ACCUMULATE_OUTGOING_ARGS. + + With the 64-bit MS ABI, we can generate correct code with or without + accumulated args, but because of OUTGOING_REG_PARM_STACK_SPACE the code + generated without accumulated args is terrible. If stack probes are required, the space used for large function arguments on the stack must also be probed, so enable
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
H.J. Lu hjl.to...@gmail.com writes: On Tue, Feb 11, 2014 at 11:40 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Hi H.J., HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? This would break Solaris/x86 with as configurations, where this test currently passes, but would fail since as doesn't understand --32. How about passing --32 to as only for Linux? OK to install? I'd rather do it for gas instead, which can be used on non-Linux systems, too. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Avoid uninit warnings with Fortran optional args (PR fortran/52370)
Hi! Optional dummy arg support vars are often undefined in various paths, while the predicated uninit analysis can sometimes avoid false positives, sometimes it can't. So IMHO we should set TREE_NO_WARNING here, it is already set on the other support vars like size.N, ubound.N, lbound.N, etc. but not on the actual data pointer. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-11 Jakub Jelinek ja...@redhat.com PR fortran/52370 * trans-decl.c (gfc_build_dummy_array_decl): Set TREE_NO_WARNING on decl if sym->attr.optional. * gfortran.dg/pr52370.f90: New test. --- gcc/fortran/trans-decl.c.jj 2014-01-03 11:40:53.0 +0100 +++ gcc/fortran/trans-decl.c 2014-02-11 18:17:47.315598566 +0100 @@ -1014,6 +1014,10 @@ gfc_build_dummy_array_decl (gfc_symbol * TREE_STATIC (decl) = 0; DECL_EXTERNAL (decl) = 0; + /* Avoid uninitialized warnings for optional dummy arguments. */ + if (sym->attr.optional) +    TREE_NO_WARNING (decl) = 1; + /* We should never get deferred shape arrays here. We used to because of frontend bugs. */ gcc_assert (sym->as->type != AS_DEFERRED); --- gcc/testsuite/gfortran.dg/pr52370.f90.jj 2014-02-11 18:20:45.704628460 +0100 +++ gcc/testsuite/gfortran.dg/pr52370.f90 2014-02-11 18:20:11.0 +0100 @@ -0,0 +1,21 @@ +! PR fortran/52370 +! { dg-do compile } +! { dg-options "-O1 -Wall" } + +module pr52370 +contains + subroutine foo(a,b) +    real, intent(out) :: a +    real, dimension(:), optional, intent(out) :: b +    a=0.5 +    if (present(b)) then +      b=1.0 +    end if + end subroutine foo +end module pr52370 + +program prg52370 + use pr52370 + real :: a + call foo(a) +end program prg52370 Jakub
[jit] Fix error message in test
Committed to branch dmalcolm/jit: gcc/testsuite/ * jit.dg/test-error-unplaced-label.c (verify_code): Update expected error message to reflect commit 6cd4f82c5237cc328aea229cdaaa428ff09d6e98. --- gcc/testsuite/ChangeLog.jit | 6 ++ gcc/testsuite/jit.dg/test-error-unplaced-label.c | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/ChangeLog.jit b/gcc/testsuite/ChangeLog.jit index e108fb7e..25aff70 100644 --- a/gcc/testsuite/ChangeLog.jit +++ b/gcc/testsuite/ChangeLog.jit @@ -1,3 +1,9 @@ +2014-02-11 David Malcolm dmalc...@redhat.com + + * jit.dg/test-error-unplaced-label.c (verify_code): Update + expected error message to reflect commit + 6cd4f82c5237cc328aea229cdaaa428ff09d6e98. + 2014-02-10 David Malcolm dmalc...@redhat.com * jit.dg/test-types.c (struct zoo): Add field m_sized_int_type, diff --git a/gcc/testsuite/jit.dg/test-error-unplaced-label.c b/gcc/testsuite/jit.dg/test-error-unplaced-label.c index 64073f47..464fdfc 100644 --- a/gcc/testsuite/jit.dg/test-error-unplaced-label.c +++ b/gcc/testsuite/jit.dg/test-error-unplaced-label.c @@ -45,6 +45,6 @@ verify_code (gcc_jit_context *ctxt, gcc_jit_result *result) /* Verify that the correct error message was emitted. */ CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt), - "unplaced label: foo"); + "unplaced label within test_fn: foo"); } -- 1.7.11.7
Re: PATCH: PR target/60151: HAVE_AS_GOTOFF_IN_DATA is mis-detected on x86-64
On Tue, Feb 11, 2014 at 12:29 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: H.J. Lu hjl.to...@gmail.com writes: On Tue, Feb 11, 2014 at 11:40 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Hi H.J., HAVE_AS_GOTOFF_IN_DATA defines a 32-bit assembler feature, we need to pass --32 to assembler. Otherwise, we get the wrong result on x86-64. We already pass --32 to assembler on x86. It should be OK to do it in configure. OK for trunk? This would break Solaris/x86 with as configurations, where this test currently passes, but would fail since as doesn't understand --32. How about passing --32 to as only for Linux? OK to install? I'd rather do it for gas instead, which can be used on non-Linux systems, too. Sure. Here is the new patch. OK to install? -- H.J. --- 2014-02-11 H.J. Lu hongjiu...@intel.com PR target/60151 * configure.ac (HAVE_AS_GOTOFF_IN_DATA): Pass --32 to GNU assembler. * configure: Regenerated. diff --git a/gcc/configure.ac b/gcc/configure.ac index ac3d842..333bca6 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3867,8 +3867,14 @@ foo: nop # These two are used unconditionally by i386.[ch]; it is to be defined # to 1 if the feature is present, 0 otherwise. +if test x$gas = xyes; then + as_ix86_gotoff_in_data_opt=--32 +else + as_ix86_gotoff_in_data_opt= +fi gcc_GAS_CHECK_FEATURE([GOTOFF in data], -gcc_cv_as_ix86_gotoff_in_data, [2,11,0],, +gcc_cv_as_ix86_gotoff_in_data, [2,11,0], + $as_ix86_gotoff_in_data_opt, [ .text .L0: nop
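As an illustration of what this configure probe boils down to, here is a hypothetical standalone sketch (not the actual gcc_GAS_CHECK_FEATURE machinery; the conftest fragment follows the directives quoted in the thread):

```shell
# Hedged sketch: assemble a GOTOFF reference placed in a data section and
# see whether the assembler accepts it when forced into 32-bit mode.
# If "as" is missing or rejects the input, the probe reports "no".
cat > conftest.s <<'EOF'
	.text
.L0:
	nop
	.data
	.long .L0@GOTOFF
EOF
if as --32 -o conftest.o conftest.s 2>/dev/null; then
  result=yes
else
  result=no
fi
rm -f conftest.s conftest.o
echo "gotoff_in_data=$result"
```

@GOTOFF is a 32-bit-only relocation, so a 64-bit-mode assembler rejects the fragment and the unmodified probe wrongly reports the feature as missing on x86-64; forcing --32 (for gas) makes the probe match how 32-bit code is actually assembled, which is the point of H.J.'s fix.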
Harden ipa-devirt for invalid programs, part 1 (and also fix a wrong code issue)
Hi, this is the first part of the fix for lto/59468, which is about ipa-devirt ICEs on invalid C++ programs (violating the ODR rule). This patch hardens it wrt various low-level VTABLE corruptions - the cases where virtual tables get too small, their initializer is missing, or the user decided to rewrite the symbol by a different kind of variable so we no longer understand its constructor. While working on it I did some testing and noticed that there is an important bug WRT the complete list. We may end up not listing a node because we can not refer it, but we forget to clear the list. This is spotted by the testcases added. To avoid bugs like this, I decided to make can_refer explicit when calling gimple_get_virt_method_for_vtable (that function may return NULL also in other cases when the caller is confused - this will not happen for ipa-devirt but may happen for the old devirt code in ipa-prop that I plan to remove next stage1). This allows a bit more sanity checking about updating the complete flag. Bootstrapped/regtested x86_64-linux, will commit it after a bit of firefox testing. As a followup I will add an ODR violation warning when virtual tables do not match. I also noticed that we may end up not shipping virtual tables that matter to LTO because they are not referenced directly by any of the units. This seems a bit more important than anticipated. Have an old patch to make this happen but it seems a bit expensive without measurable benefits (i.e. improvements in devirtualization counts) and it triggers interesting issues with free_lang_data that I would like to discuss separately. Will evaluate it a bit further. Honza PR lto/59468 * ipa-utils.h (possible_polymorphic_call_targets): Update prototype and wrapper. * ipa-devirt.c: Include demangle.h (odr_violation_reported): New static variable. (add_type_duplicate): Update odr_violations. (maybe_record_node): Add completep parameter; update it. (record_target_from_binfo): Add COMPLETEP parameter; update it as needed. (possible_polymorphic_call_targets_1): Likewise. 
(struct polymorphic_call_target_d): Add nonconstruction_targets; rename FINAL to COMPLETE. (record_targets_from_bases): Sanity check we found the binfo; fix COMPLETEP updating. (possible_polymorphic_call_targets): Add NONCONSTRUTION_TARGETSP parameter, fix computing of COMPLETEP. (dump_possible_polymorphic_call_targets): Imrove readability of dump; at LTO time do demangling. (ipa_devirt): Use nonconstruction_targets; Improve dumps. * gimple-fold.c (gimple_get_virt_method_for_vtable): Add can_refer parameter. (gimple_get_virt_method_for_binfo): Likewise. * gimple-fold.h (gimple_get_virt_method_for_binfo, gimple_get_virt_method_for_vtable): Update prototypes. * g++.dg/ipa/devirt-27.C: New testcase. * g++.dg/ipa/devirt-26.C: New testcase. Index: gimple-fold.c === *** gimple-fold.c (revision 207649) --- gimple-fold.c (working copy) *** fold_const_aggregate_ref (tree t) *** 3170,3191 } /* Lookup virtual method with index TOKEN in a virtual table V !at OFFSET. */ tree gimple_get_virt_method_for_vtable (HOST_WIDE_INT token, tree v, ! unsigned HOST_WIDE_INT offset) { tree vtable = v, init, fn; unsigned HOST_WIDE_INT size; unsigned HOST_WIDE_INT elt_size, access_index; tree domain_type; /* First of all double check we have virtual table. */ if (TREE_CODE (v) != VAR_DECL || !DECL_VIRTUAL_P (v)) ! return NULL_TREE; init = ctor_for_folding (v); --- 3170,3204 } /* Lookup virtual method with index TOKEN in a virtual table V !at OFFSET. !Set CAN_REFER if non-NULL to false if method !is not referable or if the virtual table is ill-formed (such as rewriten !by non-C++ produced symbol). Otherwise just return NULL in that calse. */ tree gimple_get_virt_method_for_vtable (HOST_WIDE_INT token, tree v, ! unsigned HOST_WIDE_INT offset, ! bool *can_refer) { tree vtable = v, init, fn; unsigned HOST_WIDE_INT size; unsigned HOST_WIDE_INT elt_size, access_index; tree domain_type; + if (can_refer) + *can_refer = true; + /* First of all double check we have virtual table. 
*/ if (TREE_CODE (v) != VAR_DECL || !DECL_VIRTUAL_P (v)) ! { ! gcc_assert (in_lto_p); ! /* Pass down that we lost track of the target. */ ! if (can_refer) ! *can_refer = false; ! return NULL_TREE; ! } init = ctor_for_folding (v); *** gimple_get_virt_method_for_vtable (HOST_ *** 3197,3202 --- 3210,3218 if
Re: [PATCH] Avoid uninit warnings with Fortran optional args (PR fortran/52370)
On Tue, Feb 11, 2014 at 09:38:20PM +0100, Jakub Jelinek wrote: Optional dummy arg support vars are often undefined in various paths, while the predicated uninit analysis can sometimes avoid false positives, sometimes it can't. So IMHO we should set TREE_NO_WARNING here, it is already set on the other support vars like size.N, ubound.N, lbound.N, etc. but not on the actual data pointer. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Yes. Although trunk is in stage 4, your patch seems to be relatively benign. -- Steve
Re: [PATCH] Fix PR target/60137, no splitters for vectors that get GPR registers
On Mon, Feb 10, 2014 at 8:14 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: This patch fixes PR target/60137 that shows up when some vector types get allocated to GPR registers, and there wasn't a splitter for the type. It shows up when targeting ISA 2.07 (power8) when VSX is disabled but Altivec (VMX) is enabled, though I suspect there are other failure cases. I bootstrapped and checked the patch, and it caused no regressions in the test suite. Is it ok to apply? [gcc] 2014-02-10 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60137 * config/rs6000/rs6000.md (128-bit GPR splitter): Add a splitter for VSX/Altivec vectors that land in GPR registers. [gcc/testsuite] 2014-02-10 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60137 * gcc.target/powerpc/pr60137.c: New file. Okay. Thanks, David
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Mon, Feb 10, 2014 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote: Hi Teresa, I have attached a patch to recognize and reorder split functions in the function reordering plugin. Please review. Thanks Sri Index: function_reordering_plugin/callgraph.c === --- function_reordering_plugin/callgraph.c(revision 207671) +++ function_reordering_plugin/callgraph.c(working copy) @@ -550,6 +550,25 @@ static void set_node_type (Node *n) s-computed_weight = n-computed_weight; s-max_count = n-max_count; + /* If s is split into a cold section, assign the split weight to the + max count of the split section. Use this also for the weight of the + split section. */ + if (s-split_section) +{ + s-split_section-max_count = s-split_section-weight = n-split_weight; + /* If split_section is comdat, update all the comdat + candidates for weight. */ + s_comdat = s-split_section-comdat_group; + while (s_comdat != NULL) +{ + s_comdat-weight = s-split_section-weight; + s_comdat-computed_weight + = s-split_section-computed_weight; Where is s-split_section-computed_weight set? + s_comdat-max_count = s-split_section-max_count; + s_comdat = s_comdat-comdat_group; +} + } + ... + /* It is split and it is not comdat. */ + else if (is_split_function) + { + Section_id *cold_section = NULL; + /* Function splitting means that the hot part is really the + relevant section and the other section is unlikely executed and + should not be part of the callgraph. */ - section-comdat_group = kept-comdat_group; - kept-comdat_group = section; + /* Store the new section in the section list. */ + section-next = first_section; + first_section = section; + /* The right section is not in the section_map if .text.unlikely is + not the new section. */ + if (!is_prefix_of (.text.unlikely, section_name)) The double-negative in the above comment is a bit confusing. Can we make this similar to the check in the earlier split comdat case. I.e. 
something like (essentially swapping the if condition and assert): /* If the existing section is cold, the newly detected split must be hot. */ if (is_prefix_of (.text.unlikely, kept-full_name)) { assert (!is_prefix_of (.text.unlikely, section_name)); ... } else { assert (is_prefix_of (.text.unlikely, section_name)); ... } + { + assert (is_prefix_of (.text.unlikely, kept-full_name)); + /* The kept section was the unlikely section. Change the section + in section_map to be the new section which is the hot one. */ + *slot = section; + /* Record the split cold section in the hot section. */ + section-split_section = kept; + /* Comdats and function splitting are already handled. */ + assert (kept-comdat_group == NULL); + cold_section = kept; + } + else + { + /* Record the split cold section in the hot section. */ + assert (!is_prefix_of (.text.unlikely, kept-full_name)); + kept-split_section = section; + cold_section = section; + } + assert (cold_section != NULL cold_section-comdat_group == NULL); + cold_section-is_split_cold_section = 1; + } ... Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
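Teresa's suggested restructuring can be fleshed out into a compilable sketch. Section_id, is_prefix_of, and link_split_halves below are simplified stand-ins modeled on the plugin, not the plugin's actual code; the point is the shape of the control flow — branch on whether the already-kept section is cold, and assert the newly seen one has the opposite hotness:

```c
#include <assert.h>
#include <string.h>

/* Pared-down stand-ins for the plugin's types; illustrative only.  */
typedef struct section_id_
{
  const char *full_name;
  struct section_id_ *split_section;
} Section_id;

static int
is_prefix_of (const char *prefix, const char *name)
{
  return strncmp (prefix, name, strlen (prefix)) == 0;
}

/* The suggested shape: decide which half is cold from the section that
   is already kept, assert the new section is the opposite half, link the
   cold half into the hot one, and return the cold half for flagging.  */
static Section_id *
link_split_halves (Section_id *kept, Section_id *section,
                   const char *section_name)
{
  Section_id *cold_section;

  if (is_prefix_of (".text.unlikely", kept->full_name))
    {
      /* Existing section is cold, so the new one must be hot.  */
      assert (!is_prefix_of (".text.unlikely", section_name));
      section->split_section = kept;
      cold_section = kept;
    }
  else
    {
      /* Existing section is hot, so the new one must be cold.  */
      assert (is_prefix_of (".text.unlikely", section_name));
      kept->split_section = section;
      cold_section = section;
    }
  return cold_section;
}
```

Writing it this way makes each branch carry exactly one positive or negative assertion, avoiding the double negative the review flagged.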
patch fixing PR49008
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49008 This typo was mostly harmless: the code worked correctly in most cases, since units usually required the presence of only one other unit. Committed as rev. 207701. 2014-02-11 Vladimir Makarov vmaka...@redhat.com PR target/49008 * genautomata.c (add_presence_absence): Fix typo with {final_}presence_list. Index: genautomata.c === --- genautomata.c (revision 207697) +++ genautomata.c (working copy) @@ -2348,7 +2348,7 @@ add_presence_absence (unit_set_el_t dest for (prev_el = (presence_p ? (final_p ? dst-unit_decl-final_presence_list - : dst-unit_decl-final_presence_list) + : dst-unit_decl-presence_list) : (final_p ? dst-unit_decl-final_absence_list : dst-unit_decl-absence_list));
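The fixed hunk is a classic copy-paste bug in a nested conditional: both arms of the inner ?: selected final_presence_list, so the presence_p-without-final_p combination silently read the wrong list. A minimal stand-in illustrating before and after — the strings substitute for genautomata's four unit-decl list fields:

```c
/* Stand-in for the add_presence_absence list selection.  The four
   string values model the four unit_decl list fields; this is not
   the genautomata code itself.  */
static const char *
pick_list (int presence_p, int final_p)
{
  /* Buggy version before r207701 (both arms of the inner conditional
     picked the final list):
       return presence_p
              ? (final_p ? "final_presence_list" : "final_presence_list")
              : (final_p ? "final_absence_list" : "absence_list");  */

  /* Fixed version, matching the committed hunk.  */
  return presence_p
         ? (final_p ? "final_presence_list" : "presence_list")
         : (final_p ? "final_absence_list" : "absence_list");
}
```

The bug only bites when presence_p is true and final_p is false, which matches the observation that most unit descriptions never exercised that combination.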
[jit] Add an explicit boolean type
Committed to branch dmalcolm/jit: gcc/jit/ * libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BOOL. * internal-api.h (gcc::jit::recording::comparison::comparison): Use GCC_JIT_TYPE_BOOL as the types of comparisons, rather than GCC_JIT_TYPE_INT. * internal-api.c (gcc::jit::recording::memento_of_get_type:: dereference): Handle GCC_JIT_TYPE_BOOL (with an error). (get_type_strings): Add GCC_JIT_TYPE_BOOL. (get_tree_node_for_type): Handle GCC_JIT_TYPE_BOOL. gcc/testsuite/ * jit.dg/test-types.c: Add test coverage for getting type GCC_JIT_TYPE_BOOL. * jit.dg/test-expressions.c (make_test_of_comparison): Convert return type from int to bool. (verify_comparisons): Likewise. --- gcc/jit/ChangeLog.jit | 13 + gcc/jit/internal-api.c | 6 ++ gcc/jit/internal-api.h | 2 +- gcc/jit/libgccjit.h | 4 gcc/testsuite/ChangeLog.jit | 8 gcc/testsuite/jit.dg/test-expressions.c | 9 ++--- gcc/testsuite/jit.dg/test-types.c | 15 +++ 7 files changed, 53 insertions(+), 4 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index d331082..90c2e3f 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,18 @@ 2014-02-11 David Malcolm dmalc...@redhat.com + * libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BOOL. + + * internal-api.h (gcc::jit::recording::comparison::comparison): Use + GCC_JIT_TYPE_BOOL as the types of comparisons, rather than + GCC_JIT_TYPE_INT. + + * internal-api.c (gcc::jit::recording::memento_of_get_type:: + dereference): Handle GCC_JIT_TYPE_BOOL (with an error). + (get_type_strings): Add GCC_JIT_TYPE_BOOL. + (get_tree_node_for_type): Handle GCC_JIT_TYPE_BOOL. 
+ +2014-02-11 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::recording::context::add_error_va): If GCC_JIT_STR_OPTION_PROGNAME is NULL, use libgccjit.so, as per the comment in libgccjit.h diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 6c66d3d..0f140fe 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -657,6 +657,7 @@ recording::memento_of_get_type::dereference () case GCC_JIT_TYPE_VOID_PTR: return m_ctxt-get_type (GCC_JIT_TYPE_VOID); +case GCC_JIT_TYPE_BOOL: case GCC_JIT_TYPE_CHAR: case GCC_JIT_TYPE_SIGNED_CHAR: case GCC_JIT_TYPE_UNSIGNED_CHAR: @@ -698,6 +699,8 @@ static const char * const get_type_strings[] = { void,/* GCC_JIT_TYPE_VOID */ void *, /* GCC_JIT_TYPE_VOID_PTR */ + bool, /* GCC_JIT_TYPE_BOOL */ + char, /* GCC_JIT_TYPE_CHAR */ signed char,/* GCC_JIT_TYPE_SIGNED_CHAR */ unsigned char, /* GCC_JIT_TYPE_UNSIGNED_CHAR */ @@ -1723,6 +1726,9 @@ get_tree_node_for_type (enum gcc_jit_types type_) case GCC_JIT_TYPE_VOID_PTR: return ptr_type_node; +case GCC_JIT_TYPE_BOOL: + return boolean_type_node; + case GCC_JIT_TYPE_CHAR: return char_type_node; case GCC_JIT_TYPE_SIGNED_CHAR: diff --git a/gcc/jit/internal-api.h b/gcc/jit/internal-api.h index 55772a6..f285039 100644 --- a/gcc/jit/internal-api.h +++ b/gcc/jit/internal-api.h @@ -961,7 +961,7 @@ public: location *loc, enum gcc_jit_comparison op, rvalue *a, rvalue *b) - : rvalue (ctxt, loc, ctxt-get_type (GCC_JIT_TYPE_INT)), /* FIXME: should be bool? */ + : rvalue (ctxt, loc, ctxt-get_type (GCC_JIT_TYPE_BOOL)), m_op (op), m_a (a), m_b (b) diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h index 976bcb2..a4ef65e 100644 --- a/gcc/jit/libgccjit.h +++ b/gcc/jit/libgccjit.h @@ -305,6 +305,10 @@ enum gcc_jit_types /* void *. */ GCC_JIT_TYPE_VOID_PTR, + /* C++'s bool type; also C99's _Bool type, aka bool if using + stdbool.h. */ + GCC_JIT_TYPE_BOOL, + /* Various integer types. 
*/ /* C's char (of some signedness) and the variants where the diff --git a/gcc/testsuite/ChangeLog.jit b/gcc/testsuite/ChangeLog.jit index 25aff70..4d8f209 100644 --- a/gcc/testsuite/ChangeLog.jit +++ b/gcc/testsuite/ChangeLog.jit @@ -1,5 +1,13 @@ 2014-02-11 David Malcolm dmalc...@redhat.com + * jit.dg/test-types.c: Add test coverage for getting type + GCC_JIT_TYPE_BOOL. + * jit.dg/test-expressions.c (make_test_of_comparison): Convert + return type from int to bool. + (verify_comparisons): Likewise. + +2014-02-11 David Malcolm dmalc...@redhat.com + * jit.dg/test-error-unplaced-label.c (verify_code): Update expected error message to reflect commit 6cd4f82c5237cc328aea229cdaaa428ff09d6e98. diff --git a/gcc/testsuite/jit.dg/test-expressions.c b/gcc/testsuite/jit.dg/test-expressions.c index 36588c6..d2fd950 100644 --- a/gcc/testsuite/jit.dg/test-expressions.c +++
Re: [PATCH i386] Enable -freorder-blocks-and-partition
On Thu, Dec 19, 2013 at 10:19 PM, Teresa Johnson tejohn...@google.com wrote: On Thu, Dec 12, 2013 at 5:13 PM, Jan Hubicka hubi...@ucw.cz wrote: On Wed, Dec 11, 2013 at 1:21 AM, Martin Liška marxin.li...@gmail.com wrote: Hello, I prepared a collection of systemtap graphs for GIMP. 1) just my profile-based function reordering: 550 pages 2) just -freorder-blocks-and-partitions: 646 pages 3) just -fno-reorder-blocks-and-partitions: 638 pages Please see attached data. Thanks for the data. A few observations/questions: With both 1) (your (time-based?) reordering) and 2) (-freorder-blocks-and-partitions) there are a fair number of accesses out of the cold section. I'm not seeing so many accesses out of the cold section in the apps I am looking at with splitting enabled. I see you already committed the patch, so perhaps Martin's measurements assume the pass is off by default? I rebuilt GCC with profiledbootstrap and with the linkerscript unmapping text.unlikely. I get ICE in: (gdb) bt #0 diagnostic_set_caret_max_width(diagnostic_context*, int) () at ../../gcc/diagnostic.c:108 #1 0x00f68457 in diagnostic_initialize (context=0x18ae000 global_diagnostic_context, n_opts=n_opts@entry=1290) at ../../gcc/diagnostic.c:135 #2 0x0100050e in general_init (argv0=optimized out) at ../../gcc/toplev.c:1110 #3 toplev_main(int, char**) () at ../../gcc/toplev.c:1922 #4 0x7774cbe5 in __libc_start_main () from /lib64/libc.so.6 #5 0x00f7898d in _start () at ../sysdeps/x86_64/start.S:122 That is relatively early in the startup process. The function seems to be inlined and it fails only on the second invocation; I did not have time to investigate further yet, while without -fprofile-use it starts... I'll see if I can reproduce this and investigate, although at this point that might have to wait until after my holiday vacation. I tried the linkerscript with cpu2006 and got quite a lot of failures (using the ref inputs to train, so the behavior should be the same in both profile-gen and profile-use). 
I investigated the one in bzip2 and found an issue that may not be easy to fix and is perhaps not worth fixing. The case was essentially the following: Function foo was called by callsites A and B, with call counts 148122615 and 18, respectively. Within function foo, there was a basic block that had a very low count (compared to the entry bb count of 148122633), and therefore a 0 frequency: ;; basic block 6, loop depth 0, count 18, freq 0 The ipa inliner decided to inline into callsite A but not B. Because the vast majority of the call count was from callsite A, when we performed execute_fixup_cfg after doing the inline transformation, the count_scale is 0 and the out-of-line copy of foo's blocks all got counts 0. However, most of the bbs still had non-zero frequencies. But bb 6 ended up with a count and frequency of 0, leading us to split it out. It turns out that at least one of the 18 counts for this block was from callsite B, and we ended up trying to execute the split bb in the out-of-line copy from that callsite. I can think of a couple of ways to prevent this from happening (e.g. have execute_fixup_cfg give the block a count or frequency of 1 instead of 0, or mark the bb somehow as not eligible for splitting due to a low confidence in the 0 count/frequency), but they seem a little hacky. I am thinking that the splitting here is something we shouldn't worry about - it is so close to 0 count that the occasional jump to the split section caused by the lack of precision in the frequency is not a big deal. Unfortunately, without fixing this I can't use the linker script without disabling inlining to avoid this problem. I reran cpu2006 with the linker script but with -fno-inline and got 6 more benchmarks to pass. So there are other issues out there. I will take a look at another one and see if it is a similar scaling/precision issue. 
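The scaling failure is easy to reproduce with the numbers from the bzip2 case. GCC computes the scale as a fixed-point ratio in units of REG_BR_PROB_BASE (10000 is assumed here); the sketch below mirrors the shape of execute_fixup_cfg's count_scale computation, not the exact code, so treat the names as illustrative:

```c
#include <stdint.h>

#define REG_BR_PROB_BASE 10000  /* GCC's fixed-point probability base.  */

/* Sketch of the scale computation: new/old expressed in units of
   REG_BR_PROB_BASE.  With old_count = 148122633 and new_count = 18
   (the out-of-line copy after inlining into the hot callsite), the
   integer division rounds the scale down to 0.  */
static int64_t
count_scale (int64_t new_count, int64_t old_count)
{
  if (old_count == 0)
    return REG_BR_PROB_BASE;
  return new_count * REG_BR_PROB_BASE / old_count;
}

/* Apply the fixed-point scale to one block count; a scale of 0 zeroes
   every block, including blocks that are still reachable from the
   remaining (cold) callsite.  */
static int64_t
apply_scale (int64_t count, int64_t scale)
{
  return count * scale / REG_BR_PROB_BASE;
}
```

This is why the out-of-line copy ends up with all-zero counts even though 18 real executions remain: 18/148122633 is below the 1/10000 resolution of the fixed-point scale.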
I'm thinking that I may just use my heatmap scripts (which leverage perf-events profiles) to see if there is any significant execution in the split cold sections, since it seems we can't realistically prevent any and all execution of the cold split sections, and that is more meaningful anyway. Teresa On our periodic testers I see off-noise improvement in crafty 2200-2300 and regression on Vortex, 2900-2800, plus code size increase. I had only run cpu2006, but not cpu2000. I'll see if I can reproduce this as well. I have been investigating a few places where I saw accesses in the cold split regions in internal benchmarks. Here are a couple, and how I have addressed them so far: 1) loop unswitching In this case, loop unswitching hoisted a branch from within the loop to outside the loop, and in doing so it was effectively speculated above several other branches. In its original location it always went to only one of the successors (biased 0/100%). But when it was hoisted it sometimes took
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Tue, Feb 11, 2014 at 1:32 PM, Teresa Johnson tejohn...@google.com wrote: On Mon, Feb 10, 2014 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote: Hi Teresa, I have attached a patch to recognize and reorder split functions in the function reordering plugin. Please review. Thanks Sri Index: function_reordering_plugin/callgraph.c === --- function_reordering_plugin/callgraph.c(revision 207671) +++ function_reordering_plugin/callgraph.c(working copy) @@ -550,6 +550,25 @@ static void set_node_type (Node *n) s-computed_weight = n-computed_weight; s-max_count = n-max_count; + /* If s is split into a cold section, assign the split weight to the + max count of the split section. Use this also for the weight of the + split section. */ + if (s-split_section) +{ + s-split_section-max_count = s-split_section-weight = n-split_weight; + /* If split_section is comdat, update all the comdat + candidates for weight. */ + s_comdat = s-split_section-comdat_group; + while (s_comdat != NULL) +{ + s_comdat-weight = s-split_section-weight; + s_comdat-computed_weight + = s-split_section-computed_weight; Where is s-split_section-computed_weight set I removed this line as it is not useful. Details: In general, the computed_weight for sections is assigned in set_node_type in line: s-computed_weight = n-computed_weight; The computed_weight is obtained by adding the weights of all incoming edges. However, for the cold part of split functions, this can never be non-zero as the split cold part is never called and so this line is not useful. + s_comdat-max_count = s-split_section-max_count; + s_comdat = s_comdat-comdat_group; +} + } + ... + /* It is split and it is not comdat. */ + else if (is_split_function) + { + Section_id *cold_section = NULL; + /* Function splitting means that the hot part is really the + relevant section and the other section is unlikely executed and + should not be part of the callgraph. 
*/ - section-comdat_group = kept-comdat_group; - kept-comdat_group = section; + /* Store the new section in the section list. */ + section-next = first_section; + first_section = section; + /* The right section is not in the section_map if .text.unlikely is + not the new section. */ + if (!is_prefix_of (.text.unlikely, section_name)) The double-negative in the above comment is a bit confusing. Can we make this similar to the check in the earlier split comdat case. I.e. something like (essentially swapping the if condition and assert): Changed. New patch attached. Thanks Sri /* If the existing section is cold, the newly detected split must be hot. */ if (is_prefix_of (.text.unlikely, kept-full_name)) { assert (!is_prefix_of (.text.unlikely, section_name)); ... } else { assert (is_prefix_of (.text.unlikely, section_name)); ... } + { + assert (is_prefix_of (.text.unlikely, kept-full_name)); + /* The kept section was the unlikely section. Change the section + in section_map to be the new section which is the hot one. */ + *slot = section; + /* Record the split cold section in the hot section. */ + section-split_section = kept; + /* Comdats and function splitting are already handled. */ + assert (kept-comdat_group == NULL); + cold_section = kept; + } + else + { + /* Record the split cold section in the hot section. */ + assert (!is_prefix_of (.text.unlikely, kept-full_name)); + kept-split_section = section; + cold_section = section; + } + assert (cold_section != NULL cold_section-comdat_group == NULL); + cold_section-is_split_cold_section = 1; + } ... Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 Index: function_reordering_plugin/callgraph.h === --- function_reordering_plugin/callgraph.h (revision 207700) +++ function_reordering_plugin/callgraph.h (working copy) @@ -236,6 +236,8 @@ typedef struct section_id_ is comdat hot and kept, pointer to the kept cold split section. 
*/ struct section_id_ *split_section; + /* If this is the cold part of a split section. */ + char is_split_cold_section; /* Check if this section has been considered for output. */ char processed; } Section_id;
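The weight-propagation loop the patch adds to set_node_type walks the NULL-terminated comdat_group chain of the cold split section. A standalone C sketch of that walk — the section type here is a pared-down stand-in whose field names mirror the plugin, not the plugin's actual code:

```c
#include <stddef.h>

/* Stand-in for the plugin's Section_id: each section points to the next
   comdat candidate in its group via comdat_group (NULL-terminated).  */
typedef struct section_
{
  unsigned long long weight;
  unsigned long long max_count;
  struct section_ *comdat_group;
} section;

/* Propagate the split weight onto the cold split section and then onto
   every comdat candidate in its group, mirroring the loop the patch
   adds to set_node_type.  computed_weight is deliberately left alone:
   as the review concluded, the cold half is never called, so its sum
   of incoming edge weights can only be zero.  */
static void
propagate_split_weight (section *split, unsigned long long split_weight)
{
  section *s_comdat;

  split->max_count = split->weight = split_weight;
  for (s_comdat = split->comdat_group; s_comdat != NULL;
       s_comdat = s_comdat->comdat_group)
    {
      s_comdat->weight = split->weight;
      s_comdat->max_count = split->max_count;
    }
}
```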
Re: [PATCH] Handle more COMDAT profiling issues
Is it better to add some logic in counts_to_freq to determine if the profile count needs to be dropped completely to force profile estimation? David On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson tejohn...@google.com wrote: This patch attempts to address the lost profile issue for COMDATs in more circumstances, exposed by function splitting. My earlier patch handled the case where the comdat had 0 counts since the linker kept the copy in a different module. In that case we prevent the guessed frequencies on 0-count functions from being dropped by counts_to_freq, and then later mark any reached via non-zero callgraph edges as guessed. Finally, when one such 0-count comdat is inlined the call count is propagated to the callee blocks using the guessed probabilities. However, in this case, there was a comdat that had a very small non-zero count, that was being inlined into a much hotter callsite. This could happen when there was a copy that was ipa-inlined in the profile gen compile, so the copy in that module gets some non-zero counts from the ipa inlined instance, while the out-of-line copy was eliminated by the linker (selected from a different module). In this case the inliner was scaling the bb counts up quite a lot when inlining. The problem is that you most likely can't trust that the 0 count bbs in such a case are really not executed by the callsite it is being inlined into, since the counts are very small and correspond to a different callsite. In some internal C++ applications I am seeing that we execute out of the split cold portion of COMDATs for this reason. This problem is more complicated to address than the 0-count instance, because we need the cgraph to determine which functions to drop the profile on, and at that point the estimated frequencies have already been overwritten by counts_to_freqs. 
To handle this broader case, I have changed the drop_profile routine to simply re-estimate the probabilities/frequencies (and translate these into counts scaled from the incoming call edge counts). This unfortunately necessitates rebuilding the cgraph, to propagate the new synthesized counts and avoid checking failures downstream. But it will only be rebuilt if we dropped any profiles. With this solution, some of the older approach can be removed (e.g. propagating counts using the guessed probabilities during inlining). Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? Thanks, Teresa 2014-02-10 Teresa Johnson tejohn...@google.com * graphite.c (graphite_finalize): Pass new parameter. * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. * predict.c (tree_estimate_probability): New parameter. (tree_estimate_probability_worker): Renamed from tree_estimate_probability_driver. (tree_reestimate_probability): New function. (tree_estimate_probability_driver): Invoke tree_estimate_probability_worker. (freqs_to_counts): Move from tree-inline.c. (drop_profile): Re-estimate profiles when dropping counts. (handle_missing_profiles): Drop for some non-zero functions as well. (counts_to_freqs): Remove code obviated by reestimation. * predict.h (handle_missing_profiles): Update declaration. (tree_estimate_probability): Ditto. * tree-inline.c (freqs_to_counts): Move to predict.c. (copy_cfg_body): Remove code obviated by reestimation. * tree-profile.c (gimple_gen_ior_profiler): (rebuild_cgraph): Code extracted from tree_profiling to rebuild cgraph. (tree_profiling): Invoke rebuild_cgraph as needed. 
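The translation from re-estimated frequencies back into counts is the standard conversion count = frequency * entry_count / BB_FREQ_MAX. A hedged sketch, with BB_FREQ_MAX assumed to be GCC's 10000 fixed-point base; this mirrors the formula freqs_to_counts applies per block, not the routine itself:

```c
#include <stdint.h>

#define BB_FREQ_MAX 10000  /* Assumed fixed-point frequency base.  */

/* Synthesize a count for one block from its estimated frequency and
   the function's incoming call-edge count.  A block whose guessed
   frequency is BB_FREQ_MAX gets the full entry count; a freq-0 block
   gets 0 regardless of how hot the function is.  */
static int64_t
freq_to_count (int freq, int64_t entry_count)
{
  return entry_count * freq / BB_FREQ_MAX;
}
```

Note the limitation this thread keeps running into: any block whose frequency rounded to 0 synthesizes a 0 count, so the splitting heuristics still need a separate confidence signal for near-zero blocks.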
Index: graphite.c === --- graphite.c (revision 207436) +++ graphite.c (working copy) @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p) cleanup_tree_cfg (); profile_status_for_fn (cfun) = PROFILE_ABSENT; release_recorded_exits (); - tree_estimate_probability (); + tree_estimate_probability (false); } cloog_state_free (cloog_state); Index: params.def === --- params.def (revision 207436) +++ params.def (working copy) @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, Maximal estimated outcome of branch considered predictable, 2, 0, 50) +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO, + min-caller-reestimate-ratio, + Minimum caller-to-callee node count ratio to force reestimated branch + probabilities in callee (where 0 means only when callee count is 0), + 10, 0, 0) + DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, inline-min-speedup, The minimal estimated speedup allowing inliner to ignore inline-insns-single and
[google gcc-4_8] Remove DW_AT_GNU_addr_base from skeleton type unit DIEs
Remove DW_AT_GNU_addr_base from skeleton type unit DIEs. GCC currently adds a DW_AT_GNU_addr_base attribute to skeleton type unit DIEs, even though it's not needed there. By removing it, we can save the 8 bytes that a DW_FORM_addr takes up, plus the 24 bytes that the corresponding relocation takes. This patch is for the google/gcc-4_8 branch. I will submit it for trunk when stage 1 is open. Tested with crosstool_validate.py. 2014-02-11 Cary Coutant ccout...@google.com * dwarf2out.c (add_top_level_skeleton_die_attrs): Don't add DW_AT_GNU_addr_base to all skeleton DIEs. (dwarf2out_finish): Add it only to the skeleton comp_unit DIE. Index: dwarf2out.c === --- dwarf2out.c (revision 207671) +++ dwarf2out.c (working copy) @@ -8918,7 +8918,6 @@ add_top_level_skeleton_die_attrs (dw_die if (comp_dir != NULL) add_skeleton_AT_string (die, DW_AT_comp_dir, comp_dir); add_AT_pubnames (die); - add_AT_lineptr (die, DW_AT_GNU_addr_base, debug_addr_section_label); } /* Return the single type-unit die for skeleton type units. */ @@ -23929,6 +23928,8 @@ dwarf2out_finish (const char *filename) skeleton die attrs are added when the skeleton type unit is created, so ensure it is created by this point. */ add_top_level_skeleton_die_attrs (main_comp_unit_die); + add_AT_lineptr (main_comp_unit_die, DW_AT_GNU_addr_base, + debug_addr_section_label); (void) get_skeleton_type_unit (); htab_traverse_noresize (debug_str_hash, index_string, index); }
Re: [google gcc-4_8] Remove DW_AT_GNU_addr_base from skeleton type unit DIEs
On Tue, Feb 11, 2014 at 3:55 PM, Cary Coutant ccout...@google.com wrote: Remove DW_AT_GNU_addr_base from skeleton type unit DIEs. GCC currently adds a DW_AT_GNU_addr_base attribute to skeleton type unit DIEs, even though it's not needed there. By removing it, we can save the 8 bytes that a DW_FORM_addr takes up, plus the 24 bytes that the corresponding relocation takes. This patch is for the google/gcc-4_8 branch. I will submit it for trunk when stage 1 is open. Tested with crosstool_validate.py. 2014-02-11 Cary Coutant ccout...@google.com * dwarf2out.c (add_top_level_skeleton_die_attrs): Don't add DW_AT_GNU_addr_base to all skeleton DIEs. (dwarf2out_finish): Add it only to the skeleton comp_unit DIE. Index: dwarf2out.c === --- dwarf2out.c (revision 207671) +++ dwarf2out.c (working copy) @@ -8918,7 +8918,6 @@ add_top_level_skeleton_die_attrs (dw_die if (comp_dir != NULL) add_skeleton_AT_string (die, DW_AT_comp_dir, comp_dir); add_AT_pubnames (die); - add_AT_lineptr (die, DW_AT_GNU_addr_base, debug_addr_section_label); } /* Return the single type-unit die for skeleton type units. */ @@ -23929,6 +23928,8 @@ dwarf2out_finish (const char *filename) skeleton die attrs are added when the skeleton type unit is created, so ensure it is created by this point. */ add_top_level_skeleton_die_attrs (main_comp_unit_die); + add_AT_lineptr (main_comp_unit_die, DW_AT_GNU_addr_base, + debug_addr_section_label); (void) get_skeleton_type_unit (); htab_traverse_noresize (debug_str_hash, index_string, index); } LGTM
Re: [PATCH, WWW] [AVX-512] Add news about AVX-512.
On Mon, 10 Feb 2014, Kirill Yukhin wrote: Index: htdocs/index.html === + dtspanIntel AVX-512 support/span + span class=date[2014-02-07]/span/dt + ddIntel AVX-512 support was added to GCC. That includes inline assembly + support, extended existing and new registers, intrinsics set (covered by + corresponding testsuite), and basic autovectorization./dd How about support for inline assembly, new registers and extending existing ones, new intrinsics, and basic autovectorization? Please keep an eye on whether I might have changed the meaning; the above is my interpretation of what I _think_ I read. (To not make this too long I'd omit the testsuite aspect unless you feel strongly about it. Then I'd ask the question why only that part is covered by the testsuite, though. ;-) No attribution to Intel or the actual developers as we have done in other cases? Index: htdocs/gcc-4.9/changes.html === + liGCC now supports Intel Advanced Vector Extensions 512 instructions + (AVX-512), including inline assembly support, extended existing and + new registers, intrinsics set (covered by corresponding testsuite) and + basic autovectorization. Can you adjust this similarly (but keep/add the testsuite piece)? AVX-512 instructions are available via + following GCC switches: AVX-512 foundation instructions + code-mavx512f/code, AVX-512 prefetch instructions code-mavx512pf/code, + AVX-512 exponential and reciprocal instructions code-mavx512er/code, + AVX-512 conflict detection instructions code-mavx512cd/code. via the following (add the) Here I'd add a colon after each instructions. Is there an option to enable all of them together? This is fine with these changes or different ones addressing the items I pointed out. Thanks, Gerald PS: This is my primary e-mail address (just in case you have put the other into your address book or so).
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Tue, Feb 11, 2014 at 3:29 PM, Teresa Johnson tejohn...@google.com wrote: On Feb 11, 2014 2:37 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Feb 11, 2014 at 1:32 PM, Teresa Johnson tejohn...@google.com wrote: On Mon, Feb 10, 2014 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote: Hi Teresa, I have attached a patch to recognize and reorder split functions in the function reordering plugin. Please review. Thanks Sri Index: function_reordering_plugin/callgraph.c === --- function_reordering_plugin/callgraph.c(revision 207671) +++ function_reordering_plugin/callgraph.c(working copy) @@ -550,6 +550,25 @@ static void set_node_type (Node *n) s-computed_weight = n-computed_weight; s-max_count = n-max_count; + /* If s is split into a cold section, assign the split weight to the + max count of the split section. Use this also for the weight of the + split section. */ + if (s-split_section) +{ + s-split_section-max_count = s-split_section-weight = n-split_weight; + /* If split_section is comdat, update all the comdat + candidates for weight. */ + s_comdat = s-split_section-comdat_group; + while (s_comdat != NULL) +{ + s_comdat-weight = s-split_section-weight; + s_comdat-computed_weight + = s-split_section-computed_weight; Where is s-split_section-computed_weight set? I removed this line as it is not useful. Details: In general, the computed_weight for sections is assigned in set_node_type in line: s-computed_weight = n-computed_weight; The computed_weight is obtained by adding the weights of all incoming edges. However, for the cold part of split functions, this can never be non-zero as the split cold part is never called and so this line is not useful. + s_comdat-max_count = s-split_section-max_count; + s_comdat = s_comdat-comdat_group; +} + } + ... + /* It is split and it is not comdat. 
*/ + else if (is_split_function) + { + Section_id *cold_section = NULL; + /* Function splitting means that the hot part is really the + relevant section and the other section is unlikely executed and + should not be part of the callgraph. */ - section-comdat_group = kept-comdat_group; - kept-comdat_group = section; + /* Store the new section in the section list. */ + section-next = first_section; + first_section = section; + /* The right section is not in the section_map if .text.unlikely is + not the new section. */ + if (!is_prefix_of (.text.unlikely, section_name)) The double-negative in the above comment is a bit confusing. Can we make this similar to the check in the earlier split comdat case. I.e. something like (essentially swapping the if condition and assert): Changed. New patch attached. The comment is fixed but the checks stayed the same and seem out of order now. Teresa Ah!, sorry. Changed and patch attached. Sri Thanks Sri /* If the existing section is cold, the newly detected split must be hot. */ if (is_prefix_of (.text.unlikely, kept-full_name)) { assert (!is_prefix_of (.text.unlikely, section_name)); ... } else { assert (is_prefix_of (.text.unlikely, section_name)); ... } + { + assert (is_prefix_of (.text.unlikely, kept-full_name)); + /* The kept section was the unlikely section. Change the section + in section_map to be the new section which is the hot one. */ + *slot = section; + /* Record the split cold section in the hot section. */ + section-split_section = kept; + /* Comdats and function splitting are already handled. */ + assert (kept-comdat_group == NULL); + cold_section = kept; + } + else + { + /* Record the split cold section in the hot section. */ + assert (!is_prefix_of (.text.unlikely, kept-full_name)); + kept-split_section = section; + cold_section = section; + } + assert (cold_section != NULL cold_section-comdat_group == NULL); + cold_section-is_split_cold_section = 1; + } ... 
Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 Index: gcc/testsuite/g++.dg/tree-prof/func_reorder_gold_plugin_split_functions_1.C === ---
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Tue, Feb 11, 2014 at 4:32 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Feb 11, 2014 at 3:29 PM, Teresa Johnson tejohn...@google.com wrote: On Feb 11, 2014 2:37 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Feb 11, 2014 at 1:32 PM, Teresa Johnson tejohn...@google.com wrote: On Mon, Feb 10, 2014 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote: Hi Teresa, I have attached a patch to recognize and reorder split functions in the function reordering plugin. Please review. Thanks Sri Index: function_reordering_plugin/callgraph.c === --- function_reordering_plugin/callgraph.c (revision 207671) +++ function_reordering_plugin/callgraph.c (working copy) @@ -550,6 +550,25 @@ static void set_node_type (Node *n) s->computed_weight = n->computed_weight; s->max_count = n->max_count; + /* If s is split into a cold section, assign the split weight to the + max count of the split section. Use this also for the weight of the + split section. */ + if (s->split_section) + { + s->split_section->max_count = s->split_section->weight = n->split_weight; + /* If split_section is comdat, update all the comdat + candidates for weight. */ + s_comdat = s->split_section->comdat_group; + while (s_comdat != NULL) + { + s_comdat->weight = s->split_section->weight; + s_comdat->computed_weight + = s->split_section->computed_weight; Where is s->split_section->computed_weight set? I removed this line as it is not useful. Details: In general, the computed_weight for sections is assigned in set_node_type in this line: s->computed_weight = n->computed_weight; The computed_weight is obtained by adding the weights of all incoming edges. However, for the cold part of split functions, this can never be non-zero as the split cold part is never called, so this line is not useful. + s_comdat->max_count = s->split_section->max_count; + s_comdat = s_comdat->comdat_group; + } + } + ... + /* It is split and it is not comdat. 
*/ + else if (is_split_function) + { + Section_id *cold_section = NULL; + /* Function splitting means that the hot part is really the + relevant section and the other section is unlikely executed and + should not be part of the callgraph. */ - section->comdat_group = kept->comdat_group; - kept->comdat_group = section; + /* Store the new section in the section list. */ + section->next = first_section; + first_section = section; + /* The right section is not in the section_map if ".text.unlikely" is + not the new section. */ + if (!is_prefix_of (".text.unlikely", section_name)) The double negative in the above comment is a bit confusing. Can we make this similar to the check in the earlier split comdat case, i.e. something like (essentially swapping the if condition and assert): Changed. New patch attached. The comment is fixed but the checks stayed the same and seem out of order now. Teresa Ah!, sorry. Changed and patch attached. The assert in the else clause is unnecessary (since you have landed there after doing that same check already). It would be good to have asserts in both the if and else clauses on the new section_name (see my suggested code in the initial response). Teresa Sri Thanks Sri /* If the existing section is cold, the newly detected split must be hot. */ if (is_prefix_of (".text.unlikely", kept->full_name)) { assert (!is_prefix_of (".text.unlikely", section_name)); ... } else { assert (is_prefix_of (".text.unlikely", section_name)); ... } + { + assert (is_prefix_of (".text.unlikely", kept->full_name)); + /* The kept section was the unlikely section. Change the section + in section_map to be the new section which is the hot one. */ + *slot = section; + /* Record the split cold section in the hot section. */ + section->split_section = kept; + /* Comdats and function splitting are already handled. */ + assert (kept->comdat_group == NULL); + cold_section = kept; + } + else + { + /* Record the split cold section in the hot section. */ + assert (!is_prefix_of (".text.unlikely", kept->full_name)); + kept->split_section = section; + cold_section = section; + } + assert (cold_section != NULL && cold_section->comdat_group == NULL); +
[GOOGLE] Emit a single unaligned load/store instruction for i386 m_GENERIC
This small patch lets GCC emit a single unaligned load/store instruction for m_GENERIC i386 CPUs. Bootstrapped and passed regression test. OK for Google branch? thanks, Cong Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 207701) +++ gcc/config/i386/i386.c (working copy) @@ -1903,10 +1903,10 @@ static unsigned int initial_ix86_tune_fe m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_AMDFAM10 | m_BDVER | m_GENERIC, /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */ - m_COREI7 | m_COREI7_AVX | m_AMDFAM10 | m_BDVER | m_BTVER, + m_COREI7 | m_COREI7_AVX | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC, /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL */ - m_COREI7 | m_COREI7_AVX | m_BDVER, + m_COREI7 | m_COREI7_AVX | m_BDVER | m_GENERIC, /* X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL */ m_BDVER ,
Re: [GOOGLE] Emit a single unaligned load/store instruction for i386 m_GENERIC
ok. David On Tue, Feb 11, 2014 at 4:50 PM, Cong Hou co...@google.com wrote: This small patch lets GCC emit a single unaligned load/store instruction for m_GENERIC i386 CPUs. Bootstrapped and passed regression test. OK for Google branch? thanks, Cong
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Tue, Feb 11, 2014 at 4:39 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Feb 11, 2014 at 4:32 PM, Sriraman Tallam tmsri...@google.com wrote: Ah!, sorry. Changed and patch attached. The assert in the else clause is unnecessary (since you have landed there after doing that same check already). It would be good to have asserts in both the if and else clauses on the new section_name (see my suggested code in the initial response). Ok, I overlooked the code you suggested in the original response, sorry about that. I have included those asserts you suggested in both places where we swap the new section with the existing section. Thanks Sri Teresa
Re: [PATCH] Fix up -Wsequence-point handling (PR c/60101)
On Tue, 11 Feb 2014, Jakub Jelinek wrote: Hi! Right now we spend several minutes in verify_tree. The problems I see are that merge_tlist does the exact opposite of what it should be doing with the copy flag (most merge_tlist calls are with copy=0, thus that means mostly unnecessary copying, but for the single one for SAVE_EXPR it actually probably breaks things or can break), and the middle hunk is about one spot which IMHO uses copy=1 unnecessarily (nothing uses tmp_list2 afterwards). The most important is the last hunk: the SAVE_EXPR handling was doing merge_tlist first on the whole tmp_nosp chain (with copy=0, which mistakenly copied everything, see above), and then doing that again with the whole tmp_nosp list except the first entry, then except the first two entries, etc. O(n^2) complexity, but IMHO none of the calls but the first one actually would do anything, because after the first merge_tlist call all entries should actually be in tmp_list3 (except for duplicates), and so for all further entries it would be finding a matching entry already (found=1). With this patch pr60101.c compiles within less than a second. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: Simple install hook
On Tue, 2014-02-11 at 16:51 +, Philip Herron wrote: [adding the j...@gcc.gnu.org ML to the CC] Added install hook: Thanks! I don't know that this is needed for a 3-line patch, but have you done the copyright assignment paperwork for GCC contribution? (I hope to merge my branch into gcc trunk at some point). [Also, I'd love to have more, larger, patches from you for the jit branch!] /opt/gjit/bin/gcc -g -O2 -Wall t.c -o test -I/opt/gjit/include -lgccjit -L/opt/gjit/lib Compiles the helloworld examples correctly, and it runs instead of pointing to my gcc build dir. Am working on getting more involved with this and started: https://github.com/redbrain/spy It's only just starting to parse stuff, but a C/Python kind of language using gcc as a JIT might be interesting: the idea is a dynamic language for C that can call C libs safely and easily. Mostly just so I can see where to help out in the jit front-end. Excellent! Looks promising - though it looks like the backend is all stubbed out at the moment. Note that the JIT API isn't frozen yet. I try to remember to add API change to the subject line when posting my commits, but I don't always remember. Let me know if you have any questions on how the JIT API works - or input on how it *should* work. FWIW I've been experimentally porting GNU Octave's LLVM-based JIT to using libgccjit, and finding and fixing various issues in the latter on the way - that's been driving a lot of the patches to the jit branch lately. Was also considering some kind of libopcodes work to assemble the code in memory instead of creating a .so in /tmp. Not sure if I know what I am doing enough there, but it might be more elegant for stuff using this front-end. My thinking here was that the core code of the GNU assembler could gain the option of being built as a shared library, having to isolate state in a context object, and we could try to hack the two projects into meeting in the middle. 
Large amount of work though (and a different mailing list), hence the crude .so hack for now. I'm so impressed how well this works. Cheers! Dave
Re: [google][gcc-4_8][patch]Handle Split functions in the Function Reordering Plugin
On Tue, Feb 11, 2014 at 4:51 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Feb 11, 2014 at 4:39 PM, Teresa Johnson tejohn...@google.com wrote: It would be good to have asserts in both the if and else clauses on the new section_name (see my suggested code in the initial response). Ok, I overlooked the code you suggested in the original response, sorry about that. I have included those asserts you suggested in both places where we swap the new section with the existing section. Ok, thanks! Looks good. Teresa Thanks Sri
Re: [PATCH] Handle more COMDAT profiling issues
On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li davi...@google.com wrote: Is it better to add some logic in counts_to_freq to determine if the profile count needs to be dropped completely to force profile estimation? This is the problem I was mentioning below where we call counts_to_freqs before we have the cgraph and can tell that we will need to drop the profile. When we were only dropping the profile for functions with all 0 counts, we simply avoided doing the counts_to_freqs when the counts were all 0, which works since the 0 counts don't leave things in an inconsistent state (counts vs estimated frequencies). Teresa David On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson tejohn...@google.com wrote: This patch attempts to address the lost profile issue for COMDATs in more circumstances, exposed by function splitting. My earlier patch handled the case where the comdat had 0 counts since the linker kept the copy in a different module. In that case we prevent the guessed frequencies on 0-count functions from being dropped by counts_to_freq, and then later mark any functions reached via non-zero callgraph edges as guessed. Finally, when one such 0-count comdat is inlined, the call count is propagated to the callee blocks using the guessed probabilities. However, in this case, there was a comdat that had a very small non-zero count that was being inlined into a much hotter callsite. This could happen when there was a copy that was ipa-inlined in the profile gen compile, so the copy in that module gets some non-zero counts from the ipa inlined instance, but the out of line copy was eliminated by the linker (selected from a different module). In this case the inliner was scaling the bb counts up quite a lot when inlining. The problem is that you most likely can't trust that the 0 count bbs in such a case are really not executed by the callsite it is being inlined into, since the counts are very small and correspond to a different callsite. 
In some internal C++ applications I am seeing that we execute out of the split cold portion of COMDATs for this reason. This problem is more complicated to address than the 0-count instance, because we need the cgraph to determine which functions to drop the profile on, and at that point the estimated frequencies have already been overwritten by counts_to_freqs. To handle this broader case, I have changed the drop_profile routine to simply re-estimate the probabilities/frequencies (and translate these into counts scaled from the incoming call edge counts). This unfortunately necessitates rebuilding the cgraph, to propagate the new synthesized counts and avoid checking failures downstream. But it will only be rebuilt if we dropped any profiles. With this solution, some of the older approach can be removed (e.g. propagating counts using the guessed probabilities during inlining). Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? Thanks, Teresa 2014-02-10 Teresa Johnson tejohn...@google.com * graphite.c (graphite_finalize): Pass new parameter. * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. * predict.c (tree_estimate_probability): New parameter. (tree_estimate_probability_worker): Renamed from tree_estimate_probability_driver. (tree_reestimate_probability): New function. (tree_estimate_probability_driver): Invoke tree_estimate_probability_worker. (freqs_to_counts): Move from tree-inline.c. (drop_profile): Re-estimate profiles when dropping counts. (handle_missing_profiles): Drop for some non-zero functions as well. (counts_to_freqs): Remove code obviated by reestimation. * predict.h (handle_missing_profiles): Update declaration. (tree_estimate_probability): Ditto. * tree-inline.c (freqs_to_counts): Move to predict.c. (copy_cfg_body): Remove code obviated by reestimation. 
* tree-profile.c (gimple_gen_ior_profiler): (rebuild_cgraph): Code extracted from tree_profiling to rebuild cgraph. (tree_profiling): Invoke rebuild_cgraph as needed. Index: graphite.c === --- graphite.c (revision 207436) +++ graphite.c (working copy) @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p) cleanup_tree_cfg (); profile_status_for_fn (cfun) = PROFILE_ABSENT; release_recorded_exits (); - tree_estimate_probability (); + tree_estimate_probability (false); } cloog_state_free (cloog_state); Index: params.def === --- params.def (revision 207436) +++ params.def (working copy) @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, Maximal estimated outcome of
Re: [PATCH] Handle more COMDAT profiling issues
Why is call graph needed to determine whether to drop the profile? If that is needed, it might be possible to leverage the ipa_profile pass as it will walk through all function nodes to do profile annotation. With this you can make decision to drop callee profile in caller's context. David On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li davi...@google.com wrote: Is it better to add some logic in counts_to_freq to determine if the profile count needs to be dropped completely to force profile estimation? This is the problem I was mentioning below where we call counts_to_freqs before we have the cgraph and can tell that we will need to drop the profile. When we were only dropping the profile for functions with all 0 counts, we simply avoided doing the counts_to_freqs when the counts were all 0, which works since the 0 counts don't leave things in an inconsistent state (counts vs estimated frequencies). Teresa
[GOOGLE] Prevent x_flag_complex_method from being set to 2 for C++.
With this patch x_flag_complex_method won't be set to 2 for C++ so that multiply/divide between std::complex objects won't be replaced by expensive builtin function calls. Bootstrapped and passed regression test. OK for Google branch? thanks, Cong Index: gcc/c-family/c-opts.c === --- gcc/c-family/c-opts.c (revision 207701) +++ gcc/c-family/c-opts.c (working copy) @@ -204,8 +204,10 @@ c_common_init_options_struct (struct gcc opts-x_warn_write_strings = c_dialect_cxx (); opts-x_flag_warn_unused_result = true; - /* By default, C99-like requirements for complex multiply and divide. */ - opts-x_flag_complex_method = 2; + /* By default, C99-like requirements for complex multiply and divide. + But for C++ this should not be required. */ + if (c_language != clk_cxx) +opts-x_flag_complex_method = 2; } /* Common initialization before calling option handlers. */
Re: [GOOGLE] Prevent x_flag_complex_method from being set to 2 for C++.
ok. David On Tue, Feb 11, 2014 at 5:30 PM, Cong Hou co...@google.com wrote: With this patch x_flag_complex_method won't be set to 2 for C++ so that multiply/divide between std::complex objects won't be replaced by expensive builtin function calls. Bootstrapped and passed regression test. OK for Google branch? thanks, Cong
Re: [PATCH] Handle more COMDAT profiling issues
On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li davi...@google.com wrote: Why is call graph needed to determine whether to drop the profile? Because we detect this situation by looking for cases where the call edge counts greatly exceed the callee node count. If that is needed, it might be possible to leverage the ipa_profile pass as it will walk through all function nodes to do profile annotation. With this you can make decision to drop callee profile in caller's context. There are 2 ipa profiling passes, which are somewhat confusingly named (to me at least. =). This is being done during the first. The first is pass_ipa_tree_profile in tree-profile.c, but is a SIMPLE_IPA_PASS and has the name profile in the dump. The second is pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the name profile_estimate in the dump. I assume you are suggesting to move this into the latter? But I'm not clear on what benefit that gives - the functions are not being traversed in order, so there is still the issue of needing to rebuild the cgraph after dropping profiles, which might be best done earlier as I have in the patch. Teresa David On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li davi...@google.com wrote: Is it better to add some logic in counts_to_freq to determine if the profile count needs to be dropped completely to force profile estimation? This is the problem I was mentioning below where we call counts_to_freqs before we have the cgraph and can tell that we will need to drop the profile. When we were only dropping the profile for functions with all 0 counts, we simply avoided doing the counts_to_freqs when the counts were all 0, which works since the 0 counts don't leave things in an inconsistent state (counts vs estimated frequencies). 
Teresa David On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson tejohn...@google.com wrote: This patch attempts to address the lost profile issue for COMDATs in more circumstances, exposed by function splitting. My earlier patch handled the case where the comdat had 0 counts since the linker kept the copy in a different module. In that case we prevent the guessed frequencies on 0-count functions from being dropped by counts_to_freq, and then later mark any reached via non-zero callgraph edges as guessed. Finally, when one such 0-count comdat is inlined the call count is propagated to the callee blocks using the guessed probabilities. However, in this case, there was a comdat that had a very small non-zero count, that was being inlined to a much hotter callsite. This could happen when there was a copy that was ipa-inlined in the profile gen compile, so the copy in that module gets some non-zero counts from the ipa inlined instance, but when the out of line copy was eliminated by the linker (selected from a different module). In this case the inliner was scaling the bb counts up quite a lot when inlining. The problem is that you most likely can't trust that the 0 count bbs in such a case are really not executed by the callsite it is being into, since the counts are very small and correspond to a different callsite. In some internal C++ applications I am seeing that we execute out of the split cold portion of COMDATs for this reason. This problem is more complicated to address than the 0-count instance, because we need the cgraph to determine which functions to drop the profile on, and at that point the estimated frequencies have already been overwritten by counts_to_freqs. To handle this broader case, I have changed the drop_profile routine to simply re-estimate the probabilities/frequencies (and translate these into counts scaled from the incoming call edge counts). 
This unfortunately necessitates rebuilding the cgraph, to propagate the new synthesized counts and avoid checking failures downstream. But it will only be rebuilt if we dropped any profiles. With this solution, some of the older approach can be removed (e.g. propagating counts using the guessed probabilities during inlining). Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? Thanks, Teresa 2014-02-10 Teresa Johnson tejohn...@google.com * graphite.c (graphite_finalize): Pass new parameter. * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. * predict.c (tree_estimate_probability): New parameter. (tree_estimate_probability_worker): Renamed from tree_estimate_probability_driver. (tree_reestimate_probability): New function. (tree_estimate_probability_driver): Invoke tree_estimate_probability_worker. (freqs_to_counts): Move from tree-inline.c. (drop_profile): Re-estimate profiles when dropping counts. (handle_missing_profiles): Drop for some
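The "translate these into counts scaled from the incoming call edge counts" step can be illustrated with a small sketch. All names here are invented for illustration (the real freqs_to_counts works on GCC's basic_block data structures): each block's synthesized count is its estimated frequency scaled by the profile count flowing in from call edges.

```c
/* Hypothetical sketch of the freqs_to_counts idea: after
   probabilities are re-estimated, synthesize a count for each basic
   block by scaling its estimated frequency by the count coming in
   from call edges.  ENTRY_FREQ stands in for the entry block's
   frequency.  */
static void
freqs_to_counts_sketch (long long call_count, int entry_freq,
                        const int *bb_freq, long long *bb_count, int n)
{
  for (int i = 0; i < n; i++)
    /* count = call_count * freq / entry_freq, truncating.  */
    bb_count[i] = entry_freq ? call_count * bb_freq[i] / entry_freq : 0;
}
```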
Re: [PATCH] Handle more COMDAT profiling issues
On Tue, Feb 11, 2014 at 5:36 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li davi...@google.com wrote: Why is call graph needed to determine whether to drop the profile? Because we detect this situation by looking for cases where the call edge counts greatly exceed the callee node count. If that is needed, it might be possible to leverage the ipa_profile pass as it will walk through all function nodes to do profile annotation. With this you can make decision to drop callee profile in caller's context. There are 2 ipa profiling passes, which are somewhat confusingly named (to me at least. =). This is being done during the first. The first is pass_ipa_tree_profile in tree-profile.c, but is a SIMPLE_IPA_PASS and has the name profile in the dump. The second is pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the name profile_estimate in the dump. I assume you are suggesting to move this into the latter? But I'm not clear on what benefit that gives - the functions are not being traversed in order, so there is still the issue of needing to rebuild the cgraph after dropping profiles, which might be best done earlier as I have in the patch. I meant the tree-profile one. I think this might work: after all the functions' profile counts are annotated, add another walk of the call graph nodes to drop bad profiles before the call graph is rebuilt (Call graph does exist at that point). David Teresa David On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li davi...@google.com wrote: Is it better to add some logic in counts_to_freq to determine if the profile count needs to be dropped completely to force profile estimation? This is the problem I was mentioning below where we call counts_to_freqs before we have the cgraph and can tell that we will need to drop the profile. 
When we were only dropping the profile for functions with all 0 counts, we simply avoided doing the counts_to_freqs when the counts were all 0, which works since the 0 counts don't leave things in an inconsistent state (counts vs estimated frequencies). Teresa David On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson tejohn...@google.com wrote: This patch attempts to address the lost profile issue for COMDATs in more circumstances, exposed by function splitting. My earlier patch handled the case where the comdat had 0 counts since the linker kept the copy in a different module. In that case we prevent the guessed frequencies on 0-count functions from being dropped by counts_to_freq, and then later mark any reached via non-zero callgraph edges as guessed. Finally, when one such 0-count comdat is inlined the call count is propagated to the callee blocks using the guessed probabilities. However, in this case, there was a comdat that had a very small non-zero count, that was being inlined into a much hotter callsite. This could happen when there was a copy that was ipa-inlined in the profile gen compile, so the copy in that module gets some non-zero counts from the ipa inlined instance, but the out of line copy was eliminated by the linker (selected from a different module). In this case the inliner was scaling the bb counts up quite a lot when inlining. The problem is that you most likely can't trust that the 0 count bbs in such a case are really not executed by the callsite it is being inlined into, since the counts are very small and correspond to a different callsite. In some internal C++ applications I am seeing that we execute out of the split cold portion of COMDATs for this reason. This problem is more complicated to address than the 0-count instance, because we need the cgraph to determine which functions to drop the profile on, and at that point the estimated frequencies have already been overwritten by counts_to_freqs. 
To handle this broader case, I have changed the drop_profile routine to simply re-estimate the probabilities/frequencies (and translate these into counts scaled from the incoming call edge counts). This unfortunately necessitates rebuilding the cgraph, to propagate the new synthesized counts and avoid checking failures downstream. But it will only be rebuilt if we dropped any profiles. With this solution, some of the older approach can be removed (e.g. propagating counts using the guessed probabilities during inlining). Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens? Thanks, Teresa 2014-02-10 Teresa Johnson tejohn...@google.com * graphite.c (graphite_finalize): Pass new parameter. * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New. * predict.c (tree_estimate_probability): New parameter. (tree_estimate_probability_worker): Renamed from
Warn about virtual table mismatches
Hi, this patch implements a warning for when we merge two virtual tables that do not match. It compares the size and also goes into the fields. I had to implement my own comparison code, since the vtables may subtly differ - in RTTI - and also I need to do so during DECL merging (since we will forget about the decl then). Here is an example of a real ODR violation in bugzilla it finds. ../../dist/include/mozilla/a11y/DocAccessible.h:40:0: warning: type ‘struct DocAccessible’ violates one definition rule [enabled by default] class DocAccessible : public HyperTextAccessibleWrap, ^ /aux/hubicka/firefox/accessible/src/generic/DocAccessible.h:40:0: note: a type with the same name but different virtual table is defined in another translation unit class DocAccessible : public HyperTextAccessibleWrap, ^ /aux/hubicka/firefox/accessible/src/generic/DocAccessible.cpp:1232:0: note: the first different method is ‘HandleAccEvent’ DocAccessible::HandleAccEvent(AccEvent* aEvent) ^ /aux/hubicka/firefox/accessible/src/atk/AccessibleWrap.cpp:956:0: note: a conflicting method is ‘HandleAccEvent’ AccessibleWrap::HandleAccEvent(AccEvent* aEvent) ^ The patch is not terribly pretty, but does the job. I will wait a few days for comments/suggestions and then plan to commit it. PR lto/59468 * lto/lto-symtab.c (lto_symtab_prevailing_decl): Check virtual tables for ODR violations. * ipa-devirt.c (compare_virtual_tables): New function. * ipa-utils.h (compare_virtual_tables): Declare. Index: lto/lto-symtab.c === --- lto/lto-symtab.c(revision 207702) +++ lto/lto-symtab.c(working copy) @@ -679,5 +679,13 @@ lto_symtab_prevailing_decl (tree decl) if (!ret) return decl; + /* Check and report ODR violations on virtual tables. */ + if (decl != ret->decl && DECL_VIRTUAL_P (decl)) +{ + compare_virtual_tables (ret->decl, decl); + /* We are done with checking and DECL will die after merging. 
*/ + DECL_VIRTUAL_P (decl) = 0; +} + return ret->decl; } Index: ipa-devirt.c === --- ipa-devirt.c(revision 207702) +++ ipa-devirt.c(working copy) @@ -1675,6 +1673,101 @@ update_type_inheritance_graph (void) } +/* Compare two virtual tables, PREVAILING and VTABLE and output ODR + violation warnings. */ + +void +compare_virtual_tables (tree prevailing, tree vtable) +{ + tree init1 = DECL_INITIAL (prevailing), init2 = DECL_INITIAL (vtable); + if (init1 == init2) +return; + if (init2 == error_mark_node) +return; + /* Be sure to keep virtual table contents even for external + vtables when they are available. */ + if (init1 == error_mark_node || !init1) +{ + DECL_INITIAL (prevailing) = DECL_INITIAL (vtable); + return; +} + if (!init2 && DECL_EXTERNAL (vtable)) +return; + if (DECL_VIRTUAL_P (prevailing) && init1 && init2 + && CONSTRUCTOR_NELTS (init1) == CONSTRUCTOR_NELTS (init2)) +{ + unsigned i; + tree field1, field2; + tree val1, val2; + bool matched = true; + + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init1), +i, field1, val1) + { + gcc_assert (!field1); + field2 = CONSTRUCTOR_ELT (init2, i)->index; + val2 = CONSTRUCTOR_ELT (init2, i)->value; + gcc_assert (!field2); + if (val2 == val1) + continue; + STRIP_NOPS (val1); + STRIP_NOPS (val2); + /* Unwind + val addr_expr type pointer_type + readonly constant + arg 0 mem_ref type pointer_type __vtbl_ptr_type + readonly + arg 0 addr_expr type pointer_type + arg 0 var_decl _ZTCSd0_Si arg 1 integer_cst 24 */ + + while ((TREE_CODE (val1) == MEM_REF + && TREE_CODE (val2) == MEM_REF + && (TREE_OPERAND (val1, 1) + == TREE_OPERAND (val2, 1))) +|| (TREE_CODE (val1) == ADDR_EXPR + && TREE_CODE (val2) == ADDR_EXPR)) + { + val1 = TREE_OPERAND (val1, 0); + val2 = TREE_OPERAND (val2, 0); + } + if (val1 == val2) + continue; + if (TREE_CODE (val2) == ADDR_EXPR) + { + tree tmp = val1; + val1 = val2; + val2 = tmp; +} + /* Allow combining RTTI and non-RTTI is OK. 
*/ + if (TREE_CODE (val1) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (val1, 0)) == VAR_DECL + && !DECL_VIRTUAL_P (TREE_OPERAND (val1, 0)) + && TREE_CODE (val2) == NOP_EXPR + && integer_zerop (TREE_OPERAND (val2, 0))) + continue; + if (TREE_CODE (val1) == NOP_EXPR + && TREE_CODE (val2) == NOP_EXPR + && integer_zerop (TREE_OPERAND (val1, 0)) + && integer_zerop
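The core of the comparison loop in the patch above can be summarized: walk both vtable constructors in lockstep, strip address-arithmetic wrappers, tolerate an RTTI slot paired with a null (the -frtti vs -fno-rtti mix), and compare the remaining decls. A heavily simplified, self-contained model of that logic, with each slot reduced to an assembler name string (all names here are invented for illustration, not GCC's tree API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each vtable slot is the assembler name of the
   referenced decl, or NULL for an RTTI slot that a -fno-rtti unit
   emitted as a null pointer.  Slots match when the names are equal,
   or when an RTTI entry is paired with a null (tolerated, like the
   patch's RTTI/non-RTTI case).  */
static bool
vtables_match (const char **a, const char **b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (a[i] == b[i])
        continue;                 /* identical slot */
      if (a[i] == NULL || b[i] == NULL)
        continue;                 /* RTTI vs -fno-rtti null: allowed */
      if (strcmp (a[i], b[i]) != 0)
        return false;             /* genuine ODR mismatch */
    }
  return true;
}
```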
Re: Warn about virtual table mismatches
On 02/11/2014 07:54 PM, Jan Hubicka wrote: + /* Allow combining RTTI and non-RTTI is OK. */ You mean combining -frtti and -fno-rtti compiles? Yes, that's fine, though you need to prefer the -frtti version in case code from that translation unit uses the RTTI info. /aux/hubicka/firefox/accessible/src/generic/DocAccessible.cpp:1232:0: note: the first different method is ‘HandleAccEvent’ I don't see where this note would come from in the patch. Jason
Re: Warn about virtual table mismatches
On 02/11/2014 07:54 PM, Jan Hubicka wrote: + /* Allow combining RTTI and non-RTTI is OK. */ You mean combining -frtti and -fno-rtti compiles? Yes, that's fine, though you need to prefer the -frtti version in case code from that translation unit uses the RTTI info. Is there some mechanism that the linker will do so? At the moment we just choose the variant that would be selected by the linker. I can make the choice, but what happens with non-LTO? /aux/hubicka/firefox/accessible/src/generic/DocAccessible.cpp:1232:0: note: the first different method is ‘HandleAccEvent’ I don't see where this note would come from in the patch. Sorry, diffed old tree Index: ipa-devirt.c === --- ipa-devirt.c(revision 207702) +++ ipa-devirt.c(working copy) @@ -1675,6 +1675,132 @@ } +/* Compare two virtual tables, PREVAILING and VTABLE and output ODR + violation warnings. */ + +void +compare_virtual_tables (tree prevailing, tree vtable) +{ + tree init1 = DECL_INITIAL (prevailing), init2 = DECL_INITIAL (vtable); + tree val1 = NULL, val2 = NULL; + if (!DECL_VIRTUAL_P (prevailing)) +{ + odr_violation_reported = true; + if (warning_at (DECL_SOURCE_LOCATION (prevailing), 0, +"declaration %D conflict with a virtual table", +prevailing)) + inform (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT (vtable))), + "a type defining the virtual table in another translation unit"); + return; +} + if (init1 == init2 || init2 == error_mark_node) +return; + /* Be sure to keep virtual table contents even for external + vtables when they are available. 
*/ + gcc_assert (init1 && init1 != error_mark_node); + if (!init2 && DECL_EXTERNAL (vtable)) +return; + if (init1 && init2 + && CONSTRUCTOR_NELTS (init1) == CONSTRUCTOR_NELTS (init2)) +{ + unsigned i; + tree field1, field2; + bool matched = true; + + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init1), +i, field1, val1) + { + gcc_assert (!field1); + field2 = CONSTRUCTOR_ELT (init2, i)->index; + val2 = CONSTRUCTOR_ELT (init2, i)->value; + gcc_assert (!field2); + if (val2 == val1) + continue; + if (TREE_CODE (val1) == NOP_EXPR) + val1 = TREE_OPERAND (val1, 0); + if (TREE_CODE (val2) == NOP_EXPR) + val2 = TREE_OPERAND (val2, 0); + /* Unwind + val addr_expr type pointer_type + readonly constant + arg 0 mem_ref type pointer_type __vtbl_ptr_type + readonly + arg 0 addr_expr type pointer_type + arg 0 var_decl _ZTCSd0_Si arg 1 integer_cst 24 */ + + while (TREE_CODE (val1) == TREE_CODE (val2) + && (((TREE_CODE (val1) == MEM_REF + || TREE_CODE (val1) == POINTER_PLUS_EXPR) + && (TREE_OPERAND (val1, 1) + == TREE_OPERAND (val2, 1))) +|| TREE_CODE (val1) == ADDR_EXPR)) + { + val1 = TREE_OPERAND (val1, 0); + val2 = TREE_OPERAND (val2, 0); + if (TREE_CODE (val1) == NOP_EXPR) + val1 = TREE_OPERAND (val1, 0); + if (TREE_CODE (val2) == NOP_EXPR) + val2 = TREE_OPERAND (val2, 0); + } + if (val1 == val2) + continue; + if (TREE_CODE (val2) == ADDR_EXPR) + { + tree tmp = val1; + val1 = val2; + val2 = tmp; +} + /* Combining RTTI and non-RTTI vtables is OK. +??? Perhaps we can also (optionally) warn here? */ + if (TREE_CODE (val1) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (val1, 0)) == VAR_DECL + && !DECL_VIRTUAL_P (TREE_OPERAND (val1, 0)) + && integer_zerop (val2)) + continue; + /* For some reason zeros get NOP_EXPR wrappers. */ + if (integer_zerop (val1) + && integer_zerop (val2)) + continue; + /* Compare assembler names; this function is run during +declaration merging. 
*/ + if (DECL_P (val1) && DECL_P (val2) + && DECL_ASSEMBLER_NAME (val1) == DECL_ASSEMBLER_NAME (val2)) + continue; + matched = false; + break; + } + if (!matched) + { + odr_violation_reported = true; + if (warning_at (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT (vtable))), 0, +"type %qD violates one definition rule", +DECL_CONTEXT (vtable))) + { + inform (DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT (prevailing))), + "a type with the same name but different virtual table is " + "defined in another translation unit"); +
[Ping] Re: [C++ Patch/RFC] PR 60047
Hi, the last version of this work is unreviewed, should be rather straightforward... Paolo.
Re: [PATCH] Documentation for dump and optinfo output
Hi! On Wed, 5 Feb 2014 16:33:19 -0800, Sharad Singhai sing...@google.com wrote: I am really sorry about the delay. No worries; not exactly a severe issue. ;-) I couldn't exactly reproduce the warning which you described Maybe the version of makeinfo is relevant? $ makeinfo --version | head -n 1 makeinfo (GNU texinfo) 5.1 but I found a place where two nodes were out of order in optinfo.texi. Could you please apply the following patch and see if the problem goes away? If it works for you, I will commit the doc fixes. Also I would appreciate the exact command which produces these warnings. Your patch does fix the problem; see the following diff of the build log, where the warnings are now gone, and which also happens to contain the makeinfo command line. @@ -4199,12 +4199,6 @@ if [ xinfo = xinfo ]; then \ makeinfo --split-size=500 --split-size=500 --no-split -I . -I ../../source/gcc/doc \ -I ../../source/gcc/doc/include -o doc/gccint.info ../../source/gcc/doc/gccint.texi; \ fi -../../source/gcc/doc/optinfo.texi:45: warning: node next `Optimization groups' in menu `Dump output verbosity' and in sectioning `Dump files and streams' differ -../../source/gcc/doc/optinfo.texi:77: warning: node next `Dump files and streams' in menu `Dump types' and in sectioning `Dump output verbosity' differ -../../source/gcc/doc/optinfo.texi:77: warning: node prev `Dump files and streams' in menu `Dump output verbosity' and in sectioning `Optimization groups' differ -../../source/gcc/doc/optinfo.texi:104: warning: node next `Dump output verbosity' in menu `Dump files and streams' and in sectioning `Dump types' differ -../../source/gcc/doc/optinfo.texi:104: warning: node prev `Dump output verbosity' in menu `Optimization groups' and in sectioning `Dump files and streams' differ -../../source/gcc/doc/optinfo.texi:137: warning: node prev `Dump types' in menu `Dump files and streams' and in sectioning `Dump output verbosity' differ if [ xinfo = xinfo ]; then \ makeinfo --split-size=500 
--split-size=500 --no-split -I ../../source/gcc/doc \ -I ../../source/gcc/doc/include -o doc/gccinstall.info ../../source/gcc/doc/install.texi; \ * doc/optinfo.texi: Fix order of nodes. Thanks, please commit. Grüße, Thomas