Re: [C,C++] integer constants in attribute arguments
Marc Glisse marc.gli...@inria.fr writes: /usr/local/gcc/gcc-20140204/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:32:6: error: alignment for 'void g()' must be at least 16 (I don't know why we error out for this, align specifies a minimal alignment, if it ends up more aligned, that's fine) Probably because there is no packed attribute for functions, so the alignment is always taken to be exact. Feel free to replace 8 with 16 in the initialization of size (you can commit it as obvious if it works for you). Are there any targets that may reject an overaligned function decl? Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [C,C++] integer constants in attribute arguments
On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote: Feel free to replace 8 with 16 in the initialization of size (you can commit it as obvious if it works for you). Are there any targets that may reject an overaligned function decl? I think that is very well possible. Why not just #ifdef that particular test line to a handful of widely used targets where it is known to work? Jakub
Re: [C,C++] integer constants in attribute arguments
The test g++.dg/cpp0x/constexpr-attribute2.C fails on darwin with FAIL: g++.dg/cpp0x/constexpr-attribute2.C (test for excess errors) Excess errors: /opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:9:40: error: 'init_priority' attribute is not supported on this platform /opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:16:46: error: 'init_priority' attribute is not supported on this platform /opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:23:45: error: 'init_priority' attribute is not supported on this platform IMO the test should check that 'init_priority' is supported. TIA Dominique
[PATCH][4.10] Move PRE:inhibit_phi_insertion to where it better applies
The following patch moves the code that avoids creating loop carried data dependences in PRE when later vectorizing to a place where it can use the optimizations PRE performs (in particular loop invariant motion). This makes sure it applies for Himeno when built with -fipa-pta which then allows the critical loop to be vectorized without any runtime checks for aliasing. Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for 4.10. Richard. 2014-02-05 Richard Biener rguent...@suse.de PR tree-optimization/60042 * tree-ssa-pre.c (inhibit_phi_insertion): Remove. (insert_into_preds_of_block): Do not prevent PHI insertion for REFERENCE exprs here ... (eliminate_dom_walker::before_dom_children): ... but prevent their use here under similar conditions when applied to the IL after PRE optimizations. Index: gcc/tree-ssa-pre.c === *** gcc/tree-ssa-pre.c (revision 207455) --- gcc/tree-ssa-pre.c (working copy) *** create_expression_by_pieces (basic_block *** 3013,3078 } - /* Returns true if we want to inhibit the insertions of PHI nodes -for the given EXPR for basic block BB (a member of a loop). -We want to do this, when we fear that the induction variable we -create might inhibit vectorization. */ - - static bool - inhibit_phi_insertion (basic_block bb, pre_expr expr) - { - vn_reference_t vr = PRE_EXPR_REFERENCE (expr); - vecvn_reference_op_s ops = vr-operands; - vn_reference_op_t op; - unsigned i; - - /* If we aren't going to vectorize we don't inhibit anything. */ - if (!flag_tree_loop_vectorize) - return false; - - /* Otherwise we inhibit the insertion when the address of the - memory reference is a simple induction variable. In other - cases the vectorizer won't do anything anyway (either it's - loop invariant or a complicated expression). */ - FOR_EACH_VEC_ELT (ops, i, op) - { - switch (op-opcode) - { - case CALL_EXPR: - /* Calls are not a problem. */ - return false; - - case ARRAY_REF: - case ARRAY_RANGE_REF: - if (TREE_CODE (op-op0) != SSA_NAME) - break; - /* Fallthru. */ - case SSA_NAME: - { - basic_block defbb = gimple_bb (SSA_NAME_DEF_STMT (op-op0)); - affine_iv iv; - /* Default defs are loop invariant. */ - if (!defbb) - break; - /* Defined outside this loop, also loop invariant. */ - if (!flow_bb_inside_loop_p (bb-loop_father, defbb)) - break; - /* If it's a simple induction variable inhibit insertion, - the vectorizer might be interested in this one. */ - if (simple_iv (bb-loop_father, bb-loop_father, - op-op0, iv, true)) - return true; - /* No simple IV, vectorizer can't do anything, hence no - reason to inhibit the transformation for this operand. */ - break; - } - default: - break; - } - } - return false; - } - /* Insert the to-be-made-available values of expression EXPRNUM for each predecessor, stored in AVAIL, into the predecessors of BLOCK, and merge the result with a phi node, given the same value number as --- 3013,3018 *** insert_into_preds_of_block (basic_block *** 3106,3113 EDGE_PRED (block, 1)-src); /* Induction variables only have one edge inside the loop. */ if ((firstinsideloop ^ secondinsideloop) ! (expr-kind != REFERENCE ! || inhibit_phi_insertion (block, expr))) { if (dump_file (dump_flags TDF_DETAILS)) fprintf (dump_file, Skipping insertion of phi for partial redundancy: Looks like an induction variable\n); --- 3046,3052 EDGE_PRED (block, 1)-src); /* Induction variables only have one edge inside the loop. */ if ((firstinsideloop ^ secondinsideloop) ! expr-kind != REFERENCE) { if (dump_file (dump_flags TDF_DETAILS)) fprintf (dump_file, Skipping insertion of phi for partial redundancy: Looks like an induction variable\n); *** eliminate_dom_walker::before_dom_childre *** 4234,4239 --- 4173,4228 gcc_assert (sprime != rhs); + /* Inhibit the use of an inserted PHI on a loop header when +the address of the memory reference is a simple induction +variable. In other cases the vectorizer won't do anything +anyway (either it's loop invariant or a complicated +expression). */ + if (flag_tree_loop_vectorize + gimple_assign_single_p (stmt) +
Re: [C,C++] integer constants in attribute arguments
On Wed, 5 Feb 2014, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote: Feel free to replace 8 with 16 in the initialization of size (you can commit it as obvious if it works for you). Are there any targets that may reject an overaligned function decl? I think that is very well possible. Why not just #ifdef that particular test line to a handful of widely used targets where it is known to work? With Dominique's message that Darwin doesn't support init_priority, would this be ok? That's the most frequently tested platform, some attributes are already tested elsewhere, etc. // { dg-do compile { target x86_64-*linux-gnu } } -- Marc Glisse
Re: [C,C++] integer constants in attribute arguments
On Wed, Feb 05, 2014 at 09:58:28AM +0100, Marc Glisse wrote: On Wed, 5 Feb 2014, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote: Feel free to replace 8 with 16 in the initialization of size (you can commit it as obvious if it works for you). Are there any targets that may reject an overaligned function decl? I think that is very well possible. Why not just #ifdef that particular test line to a handful of widely used targets where it is known to work? With Dominique's message that Darwin doesn't support init_priority, would this be ok? That's the most frequently tested platform, some attributes are already tested elsewhere, etc. // { dg-do compile { target x86_64-*linux-gnu } } I'd use i?86-*linux* x86_64-*linux* instead, but otherwise LGTM. Please give others a chance to comment on that though. Jakub
[PATCH] Improve diagnostics for PR60042
The following improves dumping / messages for vectorizer runtime alias checks. Bootstrap / regtest running on x86_64-unknown-linux-gnu. I'll consider this documentation and thus appropriate at this stage. Richard. 2014-02-05 Richard Biener rguent...@suse.de * tree-vect-loop.c (vect_analyze_loop_2): Be more informative when not vectorizing because of too many alias checks. * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Add more verboseness, avoid duplicate MSG_MISSED_OPTIMIZATION. Index: gcc/tree-vect-loop.c === *** gcc/tree-vect-loop.c(revision 207497) --- gcc/tree-vect-loop.c(working copy) *** vect_analyze_loop_2 (loop_vec_info loop_ *** 1723,1730 { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, !too long list of versioning for alias !run-time tests.\n); return false; } --- 1723,1732 { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, !number of versioning for alias !run-time tests exceeds %d !(--param vect-max-version-for-alias-checks)\n, !PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)); return false; } Index: gcc/tree-vect-data-refs.c === *** gcc/tree-vect-data-refs.c (revision 207497) --- gcc/tree-vect-data-refs.c (working copy) *** vect_prune_runtime_alias_test_list (loop *** 2901,2906 --- 2923,2946 diff - (HOST_WIDE_INT) TREE_INT_CST_LOW (dr_a1-seg_len) min_seg_len_b)) { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + merging ranges for ); + dump_generic_expr (MSG_NOTE, TDF_SLIM, +DR_REF (dr_a1-dr)); + dump_printf (MSG_NOTE, , ); + dump_generic_expr (MSG_NOTE, TDF_SLIM, +DR_REF (dr_b1-dr)); + dump_printf (MSG_NOTE, and ); + dump_generic_expr (MSG_NOTE, TDF_SLIM, +DR_REF (dr_a2-dr)); + dump_printf (MSG_NOTE, , ); + dump_generic_expr (MSG_NOTE, TDF_SLIM, +DR_REF (dr_b2-dr)); + dump_printf (MSG_NOTE, \n); + } + dr_a1-seg_len = size_binop (PLUS_EXPR, dr_a2-seg_len, size_int (diff)); comp_alias_ddrs.ordered_remove (i--); *** vect_prune_runtime_alias_test_list (loop *** 2908,2925 } } if ((int) comp_alias_ddrs.length () PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) ! { ! if (dump_enabled_p ()) ! { ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, ! disable versioning for alias - max number of ! generated checks exceeded.\n); ! } ! ! return false; ! } return true; } --- 2948,2959 } } + dump_printf_loc (MSG_NOTE, vect_location, + improved number of alias checks from %d to %d\n, + may_alias_ddrs.length (), comp_alias_ddrs.length ()); if ((int) comp_alias_ddrs.length () PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)) ! return false; return true; }
Re: [C,C++] integer constants in attribute arguments
On Wed, 5 Feb 2014, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 09:58:28AM +0100, Marc Glisse wrote: On Wed, 5 Feb 2014, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote: Feel free to replace 8 with 16 in the initialization of size (you can commit it as obvious if it works for you). Are there any targets that may reject an overaligned function decl? I think that is very well possible. Why not just #ifdef that particular test line to a handful of widely used targets where it is known to work? With Dominique's message that Darwin doesn't support init_priority, would this be ok? That's the most frequently tested platform, some attributes are already tested elsewhere, etc. // { dg-do compile { target x86_64-*linux-gnu } } I'd use i?86-*linux* x86_64-*linux* instead, but otherwise LGTM. I wasn't sure if all libc (not just glibc) used on linux could handle all the attributes, but now you've said it, good :-) Please give others a chance to comment on that though. I'll give it a day (it can easily be changed again later). -- Marc Glisse
Re: [C,C++] integer constants in attribute arguments
I'll give it a day (it can easily be changed again later). IMO it would be better (more general) to use a dg-require-effective-target with the corresponding test(s) in target-supports-dg.exp. Dominique
Re: [C,C++] integer constants in attribute arguments
On Wed, 5 Feb 2014, Dominique Dhumieres wrote: I'll give it a day (it can easily be changed again later). IMO it would be better (more general) to use a dg-require-effective-target with the corresponding test(s) in target-supports-dg.exp. But which tests? One for over/under alignment of functions, one for init_priority, maybe yet one more, all that for a file that is just checking that the parser understands the attribute argument is a constant... I agree on the principle, but unless those tests already exist, it doesn't seem worth it (well, if you are volunteering to write them, I certainly won't object). -- Marc Glisse
Re: [C,C++] integer constants in attribute arguments
On Wed, Feb 05, 2014 at 10:43:24AM +0100, Marc Glisse wrote: On Wed, 5 Feb 2014, Dominique Dhumieres wrote: I'll give it a day (it can easily be changed again later). IMO it would be better (more general) to use a dg-require-effective-target with the corresponding test(s) in target-supports-dg.exp. But which tests? One for over/under alignment of functions, one for init_priority, maybe yet one more, all that for a file that is just checking that the parser understands the attribute argument is a constant... I agree on the principle, but unless those tests already exist, it doesn't seem worth it (well, if you are volunteering to write them, I certainly won't object). Yeah, the tests really don't test anything target specific, but test it on attributes that have target specific meaning/applicability. So it is enough to catch it on some most widely tested targets, any regression in that area will be known the same day when it breaks. Jakub
Re: PR ipa/59831 (ipa-cp devirt issues)
Hi, On Wed, Feb 05, 2014 at 12:47:30AM +0100, Jan Hubicka wrote: - if (TREE_CODE (t) != TREE_BINFO) + /* Try to work out BINFO from virtual table pointer value in replacements. */ + if (!t agg_reps !ie-indirect_info-by_ref) At this point you know that !ie-indirect_info-polymorphic is set and thus ie-indirect_info-by_ref is always false because it really has no meaning (it is only meaningful when agg_contents is set which is mutually exclusive with the polymorphic flag). I was worried here about case where in future we may want to represent call of virtual methods from pointers passed by reference (i.e. in some other object). We don't do that at the moment, but for that would really need better jump functions. If you preffer, I can remove that check. I think it would be better, yes. IIRC, We want to re-organize indirect_info anyway (I vaguely remember we want to split the overloaded offset field into two but forgot the exact reason why but I have it written somewhere), I suppose we'll be turning it into a union (or class hierarchy?) and this would make it slightly awkward. If we ever support the pointer-by-reference scenario, quite a few more places will need to be adjusted anyway. Thanks, Martin
Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
Hi Marcus, + ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0 + [(set_attr length 12)]) This pattern emits an opaque sequence of instructions that cannot be scheduled, is that necessary? Can we not expand individual instructions or at least split ? Almost all the ports emits a template of assembly instructions. I m not sure why they have to be generated this way. But usage of these pattern is to clear the register that holds canary value immediately after its usage. -/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */ +/* { dg-do compile { target stack_protection } } */ /* { dg-options -O2 -fstack-protector-strong } */ Do we need a new effective target test, why is the existing fstack_protector not appropriate? stack_protector does a run time test. It failed in cross compilation environment and these are compile only tests. Also I thought richard suggested me to add a new option for this. ref: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03358.html regards, Venkat. On 4 February 2014 21:39, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: Hi Venkat, On 22 January 2014 16:57, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Marcus, After we changed the frame growing direction (downwards) in Aarch64, the back-end now generates stack smashing set and test based on generic code available in GCC. But most of the ports (i386, spu, rs6000, s390, sh, sparc, tilepro and tilegx) define machine descriptions using standard pattern names 'stack_protect_set' and 'stack_protect_test'. This is done for both TLS model as well as global variable based stack guard model. + + ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0 + [(set_attr length 12)]) This pattern emits an opaque sequence of instructions that cannot be scheduled, is that necessary? Can we not expand individual instructions or at least split ? + ldr\t%x3, %x1\;ldr\t%x0, %x2\;eor\t%x0, %x3, %x0 + [(set_attr length 12)]) Likewise. -/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */ +/* { dg-do compile { target stack_protection } } */ /* { dg-options -O2 -fstack-protector-strong } */ Do we need a new effective target test, why is the existing fstack_protector not appropriate? Cheers /Marcus
Re: [ARM Documentation] Clarify -mcpu, -mtune, -march
*ping* Thanks, James On Mon, Jan 27, 2014 at 10:01:51AM +, James Greenhalgh wrote: Hi, I've tripped myself over with these three options too many times, actually, their behaviour is very simple. This patch clarifies the language used to describe the options, and puts them in a logical order. I'm happy to reword again if this is still not clear. OK? Thanks, James --- gcc/ 2014-01-27 James Greenhalgh james.greenha...@arm.com PR target/59718 * doc/invoke.texi (-march=): Clarify documentation for ARM. (-mtune=): Likewise. (-mcpu=): Likewise. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 8c620a5..38a55a0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12221,11 +12221,35 @@ option should only be used if you require compatibility with code for big-endian ARM processors generated by versions of the compiler prior to 2.8. This option is now deprecated. -@item -mcpu=@var{name} -@opindex mcpu -This specifies the name of the target ARM processor. GCC uses this name -to determine what kind of instructions it can emit when generating -assembly code. Permissible names are: @samp{arm2}, @samp{arm250}, +@item -march=@var{name} +@opindex march +This specifies the name of the target ARM architecture. GCC uses this +name to determine what kind of instructions it can emit when generating +assembly code. This option can be used in conjunction with or instead +of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, +@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t}, +@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te}, +@samp{armv6}, @samp{armv6j}, +@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk}, @samp{armv6-m}, +@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, +@samp{armv8-a}, @samp{armv8-a+crc}, +@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}. + +@option{-march=armv8-a+crc} enables code generation for the ARMv8-A +architecture together with the optional CRC32 extensions. + +@option{-march=native} causes the compiler to auto-detect the architecture +of the build computer. At present, this feature is only supported on +Linux, and not all architectures are recognized. If the auto-detect is +unsuccessful the option has no effect. + +@item -mtune=@var{name} +@opindex mtune +This option specifies the name of the target ARM processor for +which GCC should tune the performance of the code. +For some ARM implementations better performance can be obtained by using +this option. +Permissible names are: @samp{arm2}, @samp{arm250}, @samp{arm3}, @samp{arm6}, @samp{arm60}, @samp{arm600}, @samp{arm610}, @samp{arm620}, @samp{arm7}, @samp{arm7m}, @samp{arm7d}, @samp{arm7dm}, @samp{arm7di}, @samp{arm7dmi}, @samp{arm70}, @samp{arm700}, @@ -12259,26 +12283,6 @@ Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible names are: @samp{cortex-a15.cortex-a7}, @samp{cortex-a57.cortex-a53}. -@option{-mcpu=generic-@var{arch}} is also permissible, and is -equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}. -See @option{-mtune} for more information. - -@option{-mcpu=native} causes the compiler to auto-detect the CPU -of the build computer. At present, this feature is only supported on -Linux, and not all architectures are recognized. If the auto-detect is -unsuccessful the option has no effect. - -@item -mtune=@var{name} -@opindex mtune -This option is very similar to the @option{-mcpu=} option, except that -instead of specifying the actual target processor type, and hence -restricting which instructions can be used, it specifies that GCC should -tune the performance of the code as if the target were of the type -specified in this option, but still choosing the instructions it -generates based on the CPU specified by a @option{-mcpu=} option. -For some ARM implementations better performance can be obtained by using -this option. - @option{-mtune=generic-@var{arch}} specifies that GCC should tune the performance for a blend of processors within architecture @var{arch}. The aim is to generate code that run well on the current most popular @@ -12291,24 +12295,21 @@ of the build computer. At present, this feature is only supported on Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect. -@item -march=@var{name} -@opindex march -This specifies the name of the target ARM architecture. GCC uses this -name to determine what kind of instructions it can emit when generating -assembly code. This option can be used in conjunction with or instead -of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, -@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t}, -@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te}, -@samp{armv6},
RFA: MN10300: Include saved registers in stack usage
Hi Jeff, Hi Alex, Sorry - another MN10300 patch - this time for the stack size reported with -fstack-usage. Currently the value includes the stack frame, but it does not take into account any registers that might have been pushed onto the stack. The patch below fixes this problem, but there is one thing that I am not sure about - is it OK to use __builtin_popcount() or should I be calling some other function ? Tested with no regression on an mn10300-elf toolchain. OK to apply ? Cheers Nick gcc/ChangeLog 2014-02-05 Nick Clifton ni...@redhat.com * config/mn10300/mn10300.c (mn10300_expand_prologue): Include saved registers in current function's stack size. Index: gcc/config/mn10300/mn10300.c === --- gcc/config/mn10300/mn10300.c(revision 207498) +++ gcc/config/mn10300/mn10300.c(working copy) @@ -745,13 +745,15 @@ mn10300_expand_prologue (void) { HOST_WIDE_INT size = mn10300_frame_size (); + int mask; + mask = mn10300_get_live_callee_saved_regs (NULL); + /* If we use any of the callee-saved registers, save them now. */ + mn10300_gen_multiple_store (mask); + if (flag_stack_usage_info) -current_function_static_stack_size = size; +current_function_static_stack_size = size + __builtin_popcount (mask) * 4; - /* If we use any of the callee-saved registers, save them now. */ - mn10300_gen_multiple_store (mn10300_get_live_callee_saved_regs (NULL)); - if (TARGET_AM33_2 fp_regs_to_save ()) { int num_regs_to_save = fp_regs_to_save (), i; @@ -767,6 +769,9 @@ unsigned int strategy_size = (unsigned)-1, this_strategy_size; rtx reg; + if (flag_stack_usage_info) + current_function_static_stack_size += num_regs_to_save * 4; + /* We have several different strategies to save FP registers. We can store them using SP offsets, which is beneficial if there are just a few registers to save, or we can use `a0' in
[PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060
When looking at PR60060, outputting the declaration debug info for a local static variable twice (once via BLOCK_VARS and once via calling the debug hook on all globals) I noticed that the rest_of_decl_compilation call in materialize_cgraph always works on the empty lto_global_var_decls vector (we only populate it later). That results in the opportunity to effectively remove it and in lto_write_globals walk over the defined vars in the varpool instead. Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN decls there (but I'm not sure of other side-effects of that ...). Thus, this cleanup first. LTO bootstrap / regtest running on x86_64-unknown-linux-gnu. Ok? Thanks, Richard. 2014-02-05 Richard Biener rguent...@suse.de lto/ * lto.h (lto_global_var_decls): Remove. * lto-lang.c (lto_init): Do not allocate lto_global_var_decls. (lto_write_globals): Do nothing in WPA stage, gather globals from the varpool here ... * lto.c (lto_main): ... not here. (materialize_cgraph): Do not call rest_of_decl_compilation on the empty lto_global_var_decls vector. (lto_global_var_decls): Remove. Index: gcc/lto/lto-lang.c === *** gcc/lto/lto-lang.c (revision 207497) --- gcc/lto/lto-lang.c (working copy) *** lto_getdecls (void) *** 1075,1085 static void lto_write_globals (void) { ! tree *vec = lto_global_var_decls-address (); ! int len = lto_global_var_decls-length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! vec_free (lto_global_var_decls); } static tree --- 1075,1094 static void lto_write_globals (void) { ! if (flag_wpa) ! return; ! ! /* Record the global variables. */ ! vectree lto_global_var_decls = vNULL; ! varpool_node *vnode; ! FOR_EACH_DEFINED_VARIABLE (vnode) ! lto_global_var_decls.safe_push (vnode-decl); ! ! tree *vec = lto_global_var_decls.address (); ! int len = lto_global_var_decls.length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! lto_global_var_decls.release (); } static tree *** lto_init (void) *** 1218,1224 #undef NAME_TYPE /* Initialize LTO-specific data structures. */ - vec_alloc (lto_global_var_decls, 256); in_lto_p = true; return true; --- 1227,1232 Index: gcc/lto/lto.c === *** gcc/lto/lto.c (revision 207497) --- gcc/lto/lto.c (working copy) *** along with GCC; see the file COPYING3. *** 50,57 #include context.h #include pass_manager.h - /* Vector to keep track of external variables we've seen so far. */ - vectree, va_gc *lto_global_var_decls; static GTY(()) tree first_personality_decl; --- 50,55 *** read_cgraph_and_symbols (unsigned nfiles *** 3009,3017 static void materialize_cgraph (void) { - tree decl; struct cgraph_node *node; - unsigned i; timevar_id_t lto_timer; if (!quiet_flag) --- 3007,3013 *** materialize_cgraph (void) *** 3043,3052 current_function_decl = NULL; set_cfun (NULL); - /* Inform the middle end about the global variables we have seen. */ - FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl) - rest_of_decl_compilation (decl, 1, 0); - if (!quiet_flag) fprintf (stderr, \n); --- 3039,3044 *** lto_main (void) *** 3309,3316 do_whole_program_analysis (); else { - varpool_node *vnode; - timevar_start (TV_PHASE_OPT_GEN); materialize_cgraph (); --- 3301,3306 *** lto_main (void) *** 3330,3339 this. */ if (flag_lto_report || (flag_wpa flag_lto_report_wpa)) print_lto_report_1 (); - - /* Record the global variables. */ - FOR_EACH_DEFINED_VARIABLE (vnode) - vec_safe_push (lto_global_var_decls, vnode-decl); } } --- 3320,3325 Index: gcc/lto/lto.h === *** gcc/lto/lto.h (revision 207497) --- gcc/lto/lto.h (working copy) *** extern tree lto_eh_personality (void); *** 41,49 extern void lto_main (void); extern void lto_read_all_file_options (void); - /* In lto-symtab.c */ - extern GTY(()) vectree, va_gc *lto_global_var_decls; - /* In lto-elf.c or lto-coff.c */ extern lto_file *lto_obj_file_open (const char *filename, bool writable); extern void lto_obj_file_close (lto_file *file); --- 41,46
Re: [C,C++] integer constants in attribute arguments
Marc Glisse marc.gli...@inria.fr writes: On Wed, 5 Feb 2014, Dominique Dhumieres wrote: I'll give it a day (it can easily be changed again later). IMO it would be better (more general) to use a dg-require-effective-target with the corresponding test(s) in target-supports-dg.exp. But which tests? One for over/under alignment of functions, one for init_priority, maybe yet one more, all that for a file that is just checking that the parser understands the attribute argument is a constant... Wrt. the function decl it should probably just be changed to a variable decl. Alignments on function decls are kind of exotic. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [ARM Documentation] Clarify -mcpu, -mtune, -march
On 01/27/14 10:01, James Greenhalgh wrote: Hi, I've tripped myself over with these three options too many times, actually, their behaviour is very simple. This patch clarifies the language used to describe the options, and puts them in a logical order. I'm happy to reword again if this is still not clear. OK? Thanks, James --- gcc/ 2014-01-27 James Greenhalgh james.greenha...@arm.com PR target/59718 * doc/invoke.texi (-march=): Clarify documentation for ARM. (-mtune=): Likewise. (-mcpu=): Likewise. s/=/ in Changelog :) Where this option is used in conjunction with @option{-march} or @option{-mtune}, those options override this option Also, rewording this sentence in the mcpu description like the AArch64 version would be better. i.e. Where this option is used in conjunction with @option{-march} or @option{-mtune}, those options override appropriate parts of this option. Ok with those changes. Thanks, Ramana -- Ramana Radhakrishnan Principal Engineer ARM Ltd.
New Dutch PO file for 'gcc' (version 4.9-b20140202)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Dutch team of translators. The file is available at: http://translationproject.org/latest/gcc/nl.po (This file, 'gcc-4.9-b20140202.nl.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [C,C++] integer constants in attribute arguments
On Wed, 5 Feb 2014, Andreas Schwab wrote: Marc Glisse marc.gli...@inria.fr writes: On Wed, 5 Feb 2014, Dominique Dhumieres wrote: I'll give it a day (it can easily be changed again later). IMO it would be better (more general) to use a dg-require-effective-target with the corresponding test(s) in target-supports-dg.exp. But which tests? One for over/under alignment of functions, one for init_priority, maybe yet one more, all that for a file that is just checking that the parser understands the attribute argument is a constant... Wrt. the function decl it should probably just be changed to a variable decl. Alignments on function decls are kind of exotic. Good idea. How about this patch then? -- Marc GlisseIndex: gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C === --- gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C (revision 207501) +++ gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C (working copy) @@ -1,10 +1,11 @@ +// { dg-do compile { target init_priority } } // { dg-options -std=gnu++11 } struct t { t(); }; constexpr int prio = 123; constexpr int size = 8; constexpr int pos = 1; enum A { zero = 0, one, two }; __attribute__((init_priority(prio))) t a; @@ -22,11 +23,11 @@ enum E2 { }; __attribute__((init_priority(E2_second))) t c; void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(pos,two))); void* my_realloc(void*, unsigned) __attribute__((alloc_size(two))); typedef char vec __attribute__((vector_size(size))); void f(char*) __attribute__((nonnull(pos))); -void g() __attribute__((aligned(size))); +char g __attribute__((aligned(size)));
[PATCH] Don't ICE on empty bbs with no successors in EH opt (PR middle-end/57499)
Hi! As the testcase shows, invalid use of noreturn attribute (which we diagnose) can result in in empty bb with no successors. This patch fixes cleanup_empty_eh not to ICE on these. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-04 Jakub Jelinek ja...@redhat.com PR middle-end/57499 * tree-eh.c (cleanup_empty_eh): Bail out on totally empty bb with no successors. * g++.dg/torture/pr57499.C: New test. --- gcc/tree-eh.c.jj2014-01-31 22:22:09.0 +0100 +++ gcc/tree-eh.c 2014-02-04 12:35:05.611066852 +0100 @@ -4396,8 +4396,11 @@ cleanup_empty_eh (eh_landing_pad lp) /* If the block is totally empty, look for more unsplitting cases. */ if (gsi_end_p (gsi)) { - /* For the degenerate case of an infinite loop bail out. */ - if (infinite_empty_loop_p (e_out)) + /* For the degenerate case of an infinite loop bail out. +If bb has no successors and is totally empty, which can happen e.g. +because of incorrect noreturn attribute, bail out too. */ + if (e_out == NULL + || infinite_empty_loop_p (e_out)) return ret; return ret | cleanup_empty_eh_unsplit (bb, e_out, lp); --- gcc/testsuite/g++.dg/torture/pr57499.C.jj 2014-02-04 12:52:10.241846755 +0100 +++ gcc/testsuite/g++.dg/torture/pr57499.C 2014-02-04 12:51:51.0 +0100 @@ -0,0 +1,14 @@ +// PR middle-end/57499 +// { dg-do compile } + +struct S +{ + ~S () __attribute__ ((noreturn)) {} // { dg-warning function does return } +}; + +void +foo () +{ + S s; + throw 1; +} Jakub
[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)
Hi! Using !!optimize to determine if we should switch local ABI to regparm convention isn't compatible with optimize attribute, as !!optimize is whether the current function is being optimized, but for the ABI decisions we actually need the caller and callee to agree on the calling convention. Fixed by looking at callee's optimize settings all the time. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-05 Jakub Jelinek ja...@redhat.com PR target/60062 * config/i386/i386.c (ix86_function_regparm): Use optimize from the callee instead of current function's optimize to determine if local regparm convention should be used. * gcc.c-torture/execute/pr60062.c: New test. * gcc.c-torture/execute/pr60072.c: New test. --- gcc/config/i386/i386.c.jj 2014-02-04 01:36:00.0 +0100 +++ gcc/config/i386/i386.c 2014-02-05 09:09:29.603827877 +0100 @@ -5608,11 +5608,22 @@ ix86_function_regparm (const_tree type, /* Use register calling convention for local functions when possible. */ if (decl TREE_CODE (decl) == FUNCTION_DECL - optimize !(profile_flag !flag_fentry)) { /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified. */ struct cgraph_local_info *i = cgraph_local_info (CONST_CAST_TREE (decl)); + + /* Caller and callee must agree on the calling convention, so +checking here just optimize means that with +__attribute__((optimize (...))) caller could use regparm convention +and callee not, or vice versa. Instead look at whether the callee +is optimized or not. */ + tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl); + if (opts == NULL_TREE) + opts = optimization_default_node; + if (!TREE_OPTIMIZATION (opts)-x_optimize) + i = NULL; + if (i i-local i-can_change_signature) { int local_regparm, globals = 0, regno; --- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 09:02:13.703085235 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr60062.c 2014-02-05 09:01:54.0 +0100 @@ -0,0 +1,25 @@ +/* PR target/60062 */ + +int a; + +static void +foo (const char *p1, int p2) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +static void +bar (const char *p1) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +__attribute__((optimize (0))) int +main () +{ + foo (hello, a); + bar (hello); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2013-08-25 18:20:55.717911035 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr60072.c 2014-02-05 11:33:07.501748426 +0100 @@ -0,0 +1,16 @@ +/* PR target/60072 */ + +int c = 1; + +__attribute__ ((optimize (1))) +static int *foo (int *p) +{ + return p; +} + +int +main () +{ + *foo (c) = 2; + return c - 2; +} Jakub
[PATCH] Make pr59597 test PIC-friendly
PR59597 reinstated some code to cancel unnecessary jump threading, and brought with it a testcase to check that the cancelling happened. http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01448.html With PIC enabled for arm and aarch64, the unnecessary jump threading already never took place, so there is nothing to cancel, leading the test case to fail. My suspicion is that similar issues will happen for other architectures too. This patch changes the called function to be static, so that jump threading and the resulting cancellation happen for PIC variants too. OK for stage 4 or wait for stage 1? Cheers, Ian 2014-02-05 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.dg/tree-ssa/pr59597.c: Make called function static so that expected outcome works for PIC variants too.diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c index 814d299..bc9d730 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c @@ -8,7 +8,8 @@ typedef unsigned int u32; u32 f[NNN], t[NNN]; -u16 Calc_crc8(u8 data, u16 crc ) +static u16 +Calc_crc8 (u8 data, u16 crc) { u8 i=0,x16=0,carry=0; for (i = 0; i 8; i++) @@ -31,7 +32,9 @@ u16 Calc_crc8(u8 data, u16 crc ) } return crc; } -int main (int argc, char argv[]) + +int +main (int argc, char argv[]) { int i, j; u16 crc; for (j = 0; j 1000; j++)
[PATCH] One possible fix for the compute_bb_predicate oscillation (PR ipa/60013)
Hi! Here is one possible fix for the PR60013 compute_bb_predicates oscillation, the PR contains long analysis, but to sum it up, the problem is that we only allow 8 conjuction operands for the predicates and when we reach 8 and want to add some further one, we just throw some of the disjunctions on the floor, which makes the dataflow computation not guaranteed to terminate. The following patch fixes that by just turning all predicates that would need more than 8 CNF toplevel operands into always true predicates (meaning that we conservatively consider the particular basic block to be potentially always executed independently on the function arguments). I've bootstrapped/regtested this patch on x86_64-linux and i686-linux, the (i2 == MAX_CLAUSES) condition was true in 2447 cases while inside of compute_bb_predicates (on average compute_bb_predicates call that triggered at least one such conservative bail out would see 3.14 of those) and 482 times outside of compute_bb_predicate, on 159 unique source files something triggered. But, the fact that it triggered doesn't mean at all the function would be even considered for inlining and even if it would, it often would not change the inlining decisions. As I said in the PR, the other possibility I see is just in letting compute_bb_predicates do the dataflow until it set's the predicates on all basic blocks, and then perhaps have some counter for each bb how many times it has changed in the following iterations and if it changes too many times, just use true_predicate for it or something similar. Or the following patch could be combined with some MAX_CLAUSES growth (say from 8 to 16) to make it punt less often. 2014-02-05 Jakub Jelinek ja...@redhat.com PR ipa/60013 * ipa-inline-analysis.c (add_clause): Fix comment typos. If clause couldn't be added because there are already MAX_CLAUSES clauses, turn p into a true predicate. (and_predicates, or_predicates): If add_clause turned the result into a true_predicate_p, break out of the loop nest. * gcc.dg/pr60013.c: New test. --- gcc/ipa-inline-analysis.c.jj2014-02-03 08:53:12.0 +0100 +++ gcc/ipa-inline-analysis.c 2014-02-05 08:01:57.093878954 +0100 @@ -310,7 +310,7 @@ add_clause (conditions conditions, struc if (false_predicate_p (p)) return; - /* No one should be sily enough to add false into nontrivial clauses. */ + /* No one should be silly enough to add false into nontrivial clauses. */ gcc_checking_assert (!(clause (1 predicate_false_condition))); /* Look where to insert the clause. At the same time prune out @@ -372,9 +372,13 @@ add_clause (conditions conditions, struc } - /* We run out of variants. Be conservative in positive direction. */ + /* We run out of variants. Conservatively turn clause into true + predicate. */ if (i2 == MAX_CLAUSES) -return; +{ + p-clause[0] = 0; + return; +} /* Keep clauses in decreasing order. This makes equivalence testing easy. */ p-clause[i2 + 1] = 0; if (insert_here = 0) @@ -412,6 +416,8 @@ and_predicates (conditions conditions, { gcc_checking_assert (i MAX_CLAUSES); add_clause (conditions, out, p2-clause[i]); + if (true_predicate_p (out)) + break; } return out; } @@ -459,6 +465,8 @@ or_predicates (conditions conditions, { gcc_checking_assert (i MAX_CLAUSES j MAX_CLAUSES); add_clause (conditions, out, p-clause[i] | p2-clause[j]); + if (true_predicate_p (out)) + return out; } return out; } @@ -1035,7 +1043,7 @@ inline_node_removal_hook (struct cgraph_ memset (info, 0, sizeof (inline_summary_t)); } -/* Remap predicate P of former function to be predicate of duplicated functoin. +/* Remap predicate P of former function to be predicate of duplicated function. POSSIBLE_TRUTHS is clause of possible truths in the duplicated node, INFO is inline summary of the duplicated node. */ --- gcc/testsuite/gcc.dg/pr60013.c.jj 2014-02-05 09:49:51.632142747 +0100 +++ gcc/testsuite/gcc.dg/pr60013.c 2014-02-05 09:49:14.0 +0100 @@ -0,0 +1,47 @@ +/* PR ipa/60013 */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +typedef long int jmp_buf[64]; +extern int _setjmp (jmp_buf) __attribute__ ((__nothrow__)); +struct S { int a, b, c; }; +extern struct S *baz (struct S *); +static jmp_buf j; + +static inline int +bar (int b, int d) +{ + return (b d) 0; +} + +struct S * +foo (int a, struct S *b, struct S *c, struct S *d) +{ + if (b-a == 0) +{ + switch (a) + { + case 8: + return baz (b); + case 7: + bar (b-c, c-b); + return 0; + case 6: + case 5: + case 4: + return baz (c); + case 3: + case 2: + return baz (d); + } + return 0; +} + if (b-a == 1) +{ + if (baz (c)) + return c; + else if
Re: [PATCH] Don't ICE on empty bbs with no successors in EH opt (PR middle-end/57499)
On Wed, 5 Feb 2014, Jakub Jelinek wrote: Hi! As the testcase shows, invalid use of noreturn attribute (which we diagnose) can result in in empty bb with no successors. This patch fixes cleanup_empty_eh not to ICE on these. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2014-02-04 Jakub Jelinek ja...@redhat.com PR middle-end/57499 * tree-eh.c (cleanup_empty_eh): Bail out on totally empty bb with no successors. * g++.dg/torture/pr57499.C: New test. --- gcc/tree-eh.c.jj 2014-01-31 22:22:09.0 +0100 +++ gcc/tree-eh.c 2014-02-04 12:35:05.611066852 +0100 @@ -4396,8 +4396,11 @@ cleanup_empty_eh (eh_landing_pad lp) /* If the block is totally empty, look for more unsplitting cases. */ if (gsi_end_p (gsi)) { - /* For the degenerate case of an infinite loop bail out. */ - if (infinite_empty_loop_p (e_out)) + /* For the degenerate case of an infinite loop bail out. + If bb has no successors and is totally empty, which can happen e.g. + because of incorrect noreturn attribute, bail out too. */ + if (e_out == NULL + || infinite_empty_loop_p (e_out)) return ret; return ret | cleanup_empty_eh_unsplit (bb, e_out, lp); --- gcc/testsuite/g++.dg/torture/pr57499.C.jj 2014-02-04 12:52:10.241846755 +0100 +++ gcc/testsuite/g++.dg/torture/pr57499.C2014-02-04 12:51:51.0 +0100 @@ -0,0 +1,14 @@ +// PR middle-end/57499 +// { dg-do compile } + +struct S +{ + ~S () __attribute__ ((noreturn)) {} // { dg-warning function does return } +}; + +void +foo () +{ + S s; + throw 1; +} Jakub -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)
On Wed, 5 Feb 2014, Jakub Jelinek wrote: Hi! Using !!optimize to determine if we should switch local ABI to regparm convention isn't compatible with optimize attribute, as !!optimize is whether the current function is being optimized, but for the ABI decisions we actually need the caller and callee to agree on the calling convention. Fixed by looking at callee's optimize settings all the time. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Seeing a 2nd variant of such code I'd rather have an abstraction for this ... optimize_fn ()? opt_for_fn (fn, OPT_...)? Richard. 2014-02-05 Jakub Jelinek ja...@redhat.com PR target/60062 * config/i386/i386.c (ix86_function_regparm): Use optimize from the callee instead of current function's optimize to determine if local regparm convention should be used. * gcc.c-torture/execute/pr60062.c: New test. * gcc.c-torture/execute/pr60072.c: New test. --- gcc/config/i386/i386.c.jj 2014-02-04 01:36:00.0 +0100 +++ gcc/config/i386/i386.c2014-02-05 09:09:29.603827877 +0100 @@ -5608,11 +5608,22 @@ ix86_function_regparm (const_tree type, /* Use register calling convention for local functions when possible. */ if (decl TREE_CODE (decl) == FUNCTION_DECL - optimize !(profile_flag !flag_fentry)) { /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified. */ struct cgraph_local_info *i = cgraph_local_info (CONST_CAST_TREE (decl)); + + /* Caller and callee must agree on the calling convention, so + checking here just optimize means that with + __attribute__((optimize (...))) caller could use regparm convention + and callee not, or vice versa. Instead look at whether the callee + is optimized or not. */ + tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl); + if (opts == NULL_TREE) + opts = optimization_default_node; + if (!TREE_OPTIMIZATION (opts)-x_optimize) + i = NULL; + if (i i-local i-can_change_signature) { int local_regparm, globals = 0, regno; --- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj 2014-02-05 09:02:13.703085235 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr60062.c 2014-02-05 09:01:54.0 +0100 @@ -0,0 +1,25 @@ +/* PR target/60062 */ + +int a; + +static void +foo (const char *p1, int p2) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +static void +bar (const char *p1) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +__attribute__((optimize (0))) int +main () +{ + foo (hello, a); + bar (hello); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj 2013-08-25 18:20:55.717911035 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr60072.c 2014-02-05 11:33:07.501748426 +0100 @@ -0,0 +1,16 @@ +/* PR target/60072 */ + +int c = 1; + +__attribute__ ((optimize (1))) +static int *foo (int *p) +{ + return p; +} + +int +main () +{ + *foo (c) = 2; + return c - 2; +} Jakub -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [ARM Documentation] Clarify -mcpu, -mtune, -march
On Wed, Feb 05, 2014 at 11:02:42AM +, Ramana Radhakrishnan wrote: On 01/27/14 10:01, James Greenhalgh wrote: Hi, I've tripped myself over with these three options too many times, actually, their behaviour is very simple. This patch clarifies the language used to describe the options, and puts them in a logical order. I'm happy to reword again if this is still not clear. OK? Thanks, James --- gcc/ 2014-01-27 James Greenhalgh james.greenha...@arm.com PR target/59718 * doc/invoke.texi (-march=): Clarify documentation for ARM. (-mtune=): Likewise. (-mcpu=): Likewise. snip Ok with those changes. Thanks, I've attached what I ended up committing. As far as I know the behaviour of this flag has always been this way. So is this also OK to backport to release branches? Thanks, James From b72249d400685a3c1a2e7eee5ad21db86006d34c Mon Sep 17 00:00:00 2001 From: James Greenhalgh james.greenha...@arm.com Date: Wed, 8 Jan 2014 12:26:12 + Subject: [ARM Documentation] Clarify -mcpu, -mtune, -march MIME-Version: 1.0 Content-Type: multipart/mixed; boundary=1.8.3-rc0 This is a multi-part message in MIME format. --1.8.3-rc0 Content-Type: text/plain; charset=UTF-8; format=fixed Content-Transfer-Encoding: 8bit Hi, I've tripped myself over with these three options too many times, actually, their behaviour is very simple. This patch clarifies the language used to describe the options, and puts them in a logical order. I'm happy to reword again if this is still not clear. OK? Thanks, James --- gcc/ 2014-02-05 James Greenhalgh james.greenha...@arm.com PR target/59718 * doc/invoke.texi (-march=): Clarify documentation for ARM. (-mtune=): Likewise. (-mcpu=): Likewise. --1.8.3-rc0 Content-Type: text/x-patch; name=0001-ARM-Documentation-Clarify-mcpu-mtune-march.patch Content-Transfer-Encoding: 8bit Content-Disposition: attachment; filename=0001-ARM-Documentation-Clarify-mcpu-mtune-march.patch diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 640c123..e3dc9df 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12221,11 +12221,38 @@ option should only be used if you require compatibility with code for big-endian ARM processors generated by versions of the compiler prior to 2.8. This option is now deprecated. -@item -mcpu=@var{name} -@opindex mcpu -This specifies the name of the target ARM processor. GCC uses this name -to determine what kind of instructions it can emit when generating -assembly code. Permissible names are: @samp{arm2}, @samp{arm250}, +@item -march=@var{name} +@opindex march +This specifies the name of the target ARM architecture. GCC uses this +name to determine what kind of instructions it can emit when generating +assembly code. This option can be used in conjunction with or instead +of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, +@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t}, +@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te}, +@samp{armv6}, @samp{armv6j}, +@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk}, @samp{armv6-m}, +@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7ve}, +@samp{armv8-a}, @samp{armv8-a+crc}, +@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}. + +@option{-march=armv7ve} is the armv7-a architecture with virtualization +extensions. + +@option{-march=armv8-a+crc} enables code generation for the ARMv8-A +architecture together with the optional CRC32 extensions. + +@option{-march=native} causes the compiler to auto-detect the architecture +of the build computer. At present, this feature is only supported on +Linux, and not all architectures are recognized. If the auto-detect is +unsuccessful the option has no effect. + +@item -mtune=@var{name} +@opindex mtune +This option specifies the name of the target ARM processor for +which GCC should tune the performance of the code. +For some ARM implementations better performance can be obtained by using +this option. +Permissible names are: @samp{arm2}, @samp{arm250}, @samp{arm3}, @samp{arm6}, @samp{arm60}, @samp{arm600}, @samp{arm610}, @samp{arm620}, @samp{arm7}, @samp{arm7m}, @samp{arm7d}, @samp{arm7dm}, @samp{arm7di}, @samp{arm7dmi}, @samp{arm70}, @samp{arm700}, @@ -12259,26 +12286,6 @@ Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible names are: @samp{cortex-a15.cortex-a7}, @samp{cortex-a57.cortex-a53}. -@option{-mcpu=generic-@var{arch}} is also permissible, and is -equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}. -See @option{-mtune} for more information. - -@option{-mcpu=native} causes the compiler to auto-detect the CPU -of the build computer. At present, this feature is only supported on -Linux, and not all architectures are recognized. If the auto-detect is -unsuccessful the option has no effect. - -@item
Re: [ARM Documentation] Clarify -mcpu, -mtune, -march
I've attached what I ended up committing. As far as I know the behaviour of this flag has always been this way. So is this also OK to backport to release branches? Ok by me. It's a documentation fix to make things more explicit. And s/=/ again in Changelog :) . Ramana Thanks, James -- Ramana Radhakrishnan Principal Engineer ARM Ltd.
[RS6000] SDmode and reload
This little patch fixes a number of dfp failures: Fail: c-c++-common/dfp/func-vararg-alternate-d32.c execution test FAIL: c-c++-common/dfp/func-vararg-alternate-d32.c -std=c++11 execution test FAIL: c-c++-common/dfp/func-vararg-alternate-d32.c -std=c++98 execution test FAIL: c-c++-common/dfp/func-vararg-dfp.c execution test FAIL: c-c++-common/dfp/func-vararg-dfp.c -std=c++11 execution test FAIL: c-c++-common/dfp/func-vararg-dfp.c -std=c++98 execution test FAIL: c-c++-common/dfp/func-vararg-mixed.c execution test FAIL: c-c++-common/dfp/func-vararg-mixed.c -std=c++11 execution test FAIL: c-c++-common/dfp/func-vararg-mixed.c -std=c++98 execution test FAIL: decimal/pr54036-1.cc (test for excess errors) FAIL: gcc.dg/compat/scalar-by-value-dfp c_compat_x_tst.o-c_compat_y_tst.o execute FAIL: gcc.dg/compat/scalar-return-dfp c_compat_x_tst.o-c_compat_y_tst.o execute FAIL: gcc.target/powerpc/ppc32-abi-dfp-1.c execution test FAIL: g++.dg/compat/decimal/pass-1 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/pass-2 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/pass-3 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/pass-4 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/pass-5 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/pass-6 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-1 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-2 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-3 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-4 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-5 cp_compat_x_tst.o-cp_compat_y_tst.o execute FAIL: g++.dg/compat/decimal/return-6 cp_compat_x_tst.o-cp_compat_y_tst.o execute Vlad added rs6000_secondary_memory_needed_mode for lra, and in so doing broke reload's use of cfun-machine-sdmode_stack_slot in rs6000_secondary_memory_needed_rtx (mode was no longer SDmode there). Bootstrapped and regression tested powerpc64-linux, both with and without -mlra. OK to commit? PR target/60032 * config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only change SDmode to DDmode when lra_in_progress. * gcc.target/powerpc/pr60032.c: New. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 207417) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -15581,7 +15581,7 @@ enum machine_mode rs6000_secondary_memory_needed_mode (enum machine_mode mode) { - if (mode == SDmode) + if (lra_in_progress mode == SDmode) return DDmode; return mode; } Index: gcc/testsuite/gcc.target/powerpc/pr60032.c === --- gcc/testsuite/gcc.target/powerpc/pr60032.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr60032.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target dfp } */ +/* { dg-options -O2 } */ + +void foo (void) +{ + register float __attribute__ ((mode(SD))) r31 __asm__ (r31); + register float __attribute__ ((mode(SD))) fr1 __asm__ (fr1); + + __asm__ (# : =d (fr1)); + r31 = fr1; + __asm__ (# : : r (r31)); +} -- Alan Modra Australia Development Lab, IBM
[PATCH] Fix loop preserving in loop header copying
There's a simple mistake in gimple_duplicate_sese_region that causes all loop-header copied loops to be thrown away (but soon re-discovered, hopefully without somebody inbetween messing things up). Fixed like the following, bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-02-05 Richard Biener rguent...@suse.de * tree-cfg.c (gimple_duplicate_sese_region): Fix ordering of set_loop_copy and initialize_original_copy_tables. Index: gcc/tree-cfg.c === *** gcc/tree-cfg.c (revision 207497) --- gcc/tree-cfg.c (working copy) *** gimple_duplicate_sese_region (edge entry *** 5879,5892 return false; } - set_loop_copy (loop, loop); - /* In case the function is used for loop header copying (which is the primary use), ensure that EXIT and its copy will be new latch and entry edges. */ if (loop-header == entry-dest) { copying_header = true; - set_loop_copy (loop, loop_outer (loop)); if (!dominated_by_p (CDI_DOMINATORS, loop-latch, exit-src)) return false; --- 5905,5915 *** gimple_duplicate_sese_region (edge entry *** 5897,5910 return false; } if (!region_copy) { region_copy = XNEWVEC (basic_block, n_region); free_region_copy = true; } - initialize_original_copy_tables (); - /* Record blocks outside the region that are dominated by something inside. */ if (update_dominance) --- 5920,5938 return false; } + initialize_original_copy_tables (); + + if (copying_header) + set_loop_copy (loop, loop_outer (loop)); + else + set_loop_copy (loop, loop); + if (!region_copy) { region_copy = XNEWVEC (basic_block, n_region); free_region_copy = true; } /* Record blocks outside the region that are dominated by something inside. */ if (update_dominance)
Re: [ARM][PATCH] Vectorizer generates unaligned access when -mno-unaligned-access is enabled
Ramana wrote: The error is caused by unaligned vector load via two vldr instructions: So, does this mean that we get unaligned vector loads when munaligned-access is on, using vldr instructions ? No, when -munaligned-access is on, the backend will enable movmisalign patterns that generate friendly vst1/vld1 instructions (well, at least on little-endian ARMs). Cause for this error is, even when -mno-unaligned-access is enabled, backend will inform vectorizer that it supports misaligned accesses via TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT. Ramana wrote: Attached patch fixes this. Is this OK for trunk? How has it been tested ? I was hoping Linaro people could run their magical cbuild on it... -Y
Re: [C,C++] integer constants in attribute arguments
Marc Glisse marc.gli...@inria.fr writes: On Wed, 5 Feb 2014, Andreas Schwab wrote: Wrt. the function decl it should probably just be changed to a variable decl. Alignments on function decls are kind of exotic. Good idea. How about this patch then? LGTM. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH, rs6000] Implement -maltivec=be for vec_pack, vec_unpackh, vec_unpackl Altivec builtins
On Tue, Feb 4, 2014 at 2:13 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Yet another -maltivec=be patch, this one for the vector pack/unpack builtins. For the pack operations, we need to reverse the order of the inputs for little endian without -maltivec=be. For the unpack operations, we need to replace unpackh with unpackl, and vice versa, for little endian without -maltivec=be. For both pack and unpack, there are some internal uses that should not have their semantics changed from the existing implementation. For these, _direct forms of the insns are provided, as has been done in previous patches in this series. Four new test cases are added to test these builtins for all applicable vector types, with and without -maltivec=be. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill gcc: 2014-02-04 Bill Schmidt wschm...@linux.vnet.ibm.com * altivec.md (UNSPEC_VPACK_UNS_UNS_MOD_DIRECT): New unspec. (UNSPEC_VUNPACK_HI_SIGN_DIRECT): Likewise. (UNSPEC_VUNPACK_LO_SIGN_DIRECT): Likewise. (mulv8hi3): Use gen_altivec_vpkuwum_direct instead of gen_altivec_vpkuwum. (altivec_vpkpx): Test for VECTOR_ELT_ORDER_BIG instead of for BYTES_BIG_ENDIAN. (altivec_vpksVI_charss): Likewise. (altivec_vpksVI_charus): Likewise. (altivec_vpkuVI_charus): Likewise. (altivec_vpkuVI_charum): Likewise. (altivec_vpkuVI_charum_direct): New (copy of altivec_vpkuVI_charum that still relies on BYTES_BIG_ENDIAN, for internal use). (altivec_vupkhsVU_char): Emit vupkls* instead of vupkhs* when target is little endian and -maltivec=be is not specified. (*altivec_vupkhsVU_char_direct): New (copy of altivec_vupkhsVU_char that always emits vupkhs*, for internal use). (altivec_vupklsVU_char): Emit vupkhs* instead of vupkls* when target is little endian and -maltivec=be is not specified. (*altivec_vupklsVU_char_direct): New (copy of altivec_vupklsVU_char that always emits vupkls*, for internal use). (altivec_vupkhpx): Emit vupklpx instead of vupkhpx when target is little endian and -maltivec=be is not specified. (altivec_vupklpx): Emit vupkhpx instead of vupklpx when target is little endian and -maltivec=be is not specified. gcc/testsuite: 2014-02-04 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/pack.c: New. * gcc.dg/vmx/pack-be-order.c: New. * gcc.dg/vmx/unpack.c: New. * gcc.dg/vmx/unpack-be-order.c: New. Okay. Thanks, David
Re: [ARM][PATCH] Vectorizer generates unaligned access when -mno-unaligned-access is enabled
Ramana wrote: Attached patch fixes this. Is this OK for trunk? How has it been tested ? I was hoping Linaro people could run their magical cbuild on it... Ok if no regressions. regards Ramana -Y -- Ramana Radhakrishnan Principal Engineer ARM Ltd.
[PATCH] FIx PR60076
Tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-02-05 Richard Biener rguent...@suse.de PR testsuite/60076 * gcc.dg/vect/pr60012.c: Require vect_extract_even_odd and avoid using unsigned long long. Index: gcc/testsuite/gcc.dg/vect/pr60012.c === --- gcc/testsuite/gcc.dg/vect/pr60012.c (revision 207504) +++ gcc/testsuite/gcc.dg/vect/pr60012.c (working copy) @@ -8,14 +8,14 @@ typedef struct } complex16_t; void -libvector_AccSquareNorm_ref (unsigned long long *acc, +libvector_AccSquareNorm_ref (unsigned int *acc, const complex16_t *x, unsigned len) { unsigned i; for (i = 0; i len; i++) -acc[i] += ((unsigned long long)((int)x[i].real * x[i].real)) - + ((unsigned long long)((int)x[i].imag * x[i].imag)); +acc[i] += ((unsigned int)((int)x[i].real * x[i].real)) + + ((unsigned int)((int)x[i].imag * x[i].imag)); } -/* { dg-final { scan-tree-dump LOOP VECTORIZED vect } } */ +/* { dg-final { scan-tree-dump LOOP VECTORIZED vect { target { vect_extract_even_odd } } } } */ /* { dg-final { cleanup-tree-dump vect } } */
Re: [PATCH, rs6000] Handle -maltivec=be for vec_sum2s builtin (last -maltivec=be patch)
On Tue, Feb 4, 2014 at 10:15 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, One final patch in the series, this one for vec_sum2s. This builtin requires some additional code generation for the case of little endian without -maltivec=be. Here's an example: va = {-10,1,2,3};0x 0003 0002 0001 fff6 vb = {100,101,102,-103}; 0x ff99 0066 0065 0064 vc = vec_sum2s (va, vb); 0x ff9e 005c = {0,92,0,-98}; We need to add -10 + 1 + 101 = 92 and place it in vc[1], and add 2 + 3 + -103 and place the result in vc[3], with zeroes in the other two elements. To do this, we first use vsldoi vs,vb,vb,12 to rotate 101 and -103 into big-endian elements 1 and 3, as required by the vsum2sws instruction: 0x ff99 0066 0065 0064 ff99 0066 0065 0064 vs = 0064 ff99 0066 0065 Executing vsum2sws vs,va,vs then gives vs = 0x ff9e 005c which then must be shifted into position with vsldoi vc,vs,vs,4 0x ff9e 005c ff9e 005c vc = ff9e 005c which is the desired result. In addition to this change, I noticed a redundant test from one of my previous patches and simplified it. (BYTES_BIG_ENDIAN implies VECTOR_ELT_ORDER_BIG, so we don't need to test BYTES_BIG_ENDIAN.) As usual, new test cases are added to cover the possible cases. These are simpler this time since only vector signed integer is a legal type for vec_sum2s. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill gcc: 2014-02-04 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/altivec.md (altivec_vsum2sws): Adjust code generation for -maltivec=be. (altivec_vsumsws): Simplify redundant test. gcc/testsuite: 2014-02-04 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/sum2s.c: New. * gcc.dg/vmx/sum2s-be-order.c: New. Okay. The multi-instruction sequences really should be emitted as separate instructions and the scratch only allocated for the LE case. Thanks, David
Re: [PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)
On Wed, 5 Feb 2014, Jakub Jelinek wrote: Hi! Using !!optimize to determine if we should switch local ABI to regparm convention isn't compatible with optimize attribute, as !!optimize is whether the current function is being optimized, but for the ABI decisions we actually need the caller and callee to agree on the calling convention. Fixed by looking at callee's optimize settings all the time. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Seeing a 2nd variant of such code I'd rather have an abstraction for this ... optimize_fn ()? opt_for_fn (fn, OPT_...)? Propbable opt_for_fn. I wanted to implement SSE passing conventions for local functions but didn't get to it since I was concerned about -fno-sse being passed to teh callee by target attribute. Having API to check those would be nice. Honza
Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060
When looking at PR60060, outputting the declaration debug info for a local static variable twice (once via BLOCK_VARS and once via calling the debug hook on all globals) I noticed that the rest_of_decl_compilation call in materialize_cgraph always works on the empty lto_global_var_decls vector (we only populate it later). That results in the opportunity to effectively remove it and in lto_write_globals walk over the defined vars in the varpool instead. Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN decls there (but I'm not sure of other side-effects of that ...). Thus, this cleanup first. LTO bootstrap / regtest running on x86_64-unknown-linux-gnu. Ok? Seems OK to me. I always had an impression that this machinery exists to make decls that are revmoed fro symbol table (optimized out) to land the debug info, but I think this is mostly broken by partitioning anyway, right? Do we stream them at all? Honza Thanks, Richard. 2014-02-05 Richard Biener rguent...@suse.de lto/ * lto.h (lto_global_var_decls): Remove. * lto-lang.c (lto_init): Do not allocate lto_global_var_decls. (lto_write_globals): Do nothing in WPA stage, gather globals from the varpool here ... * lto.c (lto_main): ... not here. (materialize_cgraph): Do not call rest_of_decl_compilation on the empty lto_global_var_decls vector. (lto_global_var_decls): Remove. Index: gcc/lto/lto-lang.c === *** gcc/lto/lto-lang.c(revision 207497) --- gcc/lto/lto-lang.c(working copy) *** lto_getdecls (void) *** 1075,1085 static void lto_write_globals (void) { ! tree *vec = lto_global_var_decls-address (); ! int len = lto_global_var_decls-length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! vec_free (lto_global_var_decls); } static tree --- 1075,1094 static void lto_write_globals (void) { ! if (flag_wpa) ! return; ! ! /* Record the global variables. */ ! vectree lto_global_var_decls = vNULL; ! varpool_node *vnode; ! FOR_EACH_DEFINED_VARIABLE (vnode) ! lto_global_var_decls.safe_push (vnode-decl); ! ! tree *vec = lto_global_var_decls.address (); ! int len = lto_global_var_decls.length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! lto_global_var_decls.release (); } static tree *** lto_init (void) *** 1218,1224 #undef NAME_TYPE /* Initialize LTO-specific data structures. */ - vec_alloc (lto_global_var_decls, 256); in_lto_p = true; return true; --- 1227,1232 Index: gcc/lto/lto.c === *** gcc/lto/lto.c (revision 207497) --- gcc/lto/lto.c (working copy) *** along with GCC; see the file COPYING3. *** 50,57 #include context.h #include pass_manager.h - /* Vector to keep track of external variables we've seen so far. */ - vectree, va_gc *lto_global_var_decls; static GTY(()) tree first_personality_decl; --- 50,55 *** read_cgraph_and_symbols (unsigned nfiles *** 3009,3017 static void materialize_cgraph (void) { - tree decl; struct cgraph_node *node; - unsigned i; timevar_id_t lto_timer; if (!quiet_flag) --- 3007,3013 *** materialize_cgraph (void) *** 3043,3052 current_function_decl = NULL; set_cfun (NULL); - /* Inform the middle end about the global variables we have seen. */ - FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl) - rest_of_decl_compilation (decl, 1, 0); - if (!quiet_flag) fprintf (stderr, \n); --- 3039,3044 *** lto_main (void) *** 3309,3316 do_whole_program_analysis (); else { - varpool_node *vnode; - timevar_start (TV_PHASE_OPT_GEN); materialize_cgraph (); --- 3301,3306 *** lto_main (void) *** 3330,3339 this. */ if (flag_lto_report || (flag_wpa flag_lto_report_wpa)) print_lto_report_1 (); - - /* Record the global variables. */ - FOR_EACH_DEFINED_VARIABLE (vnode) - vec_safe_push (lto_global_var_decls, vnode-decl); } } --- 3320,3325 Index: gcc/lto/lto.h === *** gcc/lto/lto.h (revision 207497) --- gcc/lto/lto.h (working copy) *** extern tree lto_eh_personality (void); *** 41,49 extern void lto_main (void); extern void lto_read_all_file_options (void); - /* In lto-symtab.c */ - extern GTY(()) vectree, va_gc *lto_global_var_decls; - /* In lto-elf.c or lto-coff.c */
Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060
On Wed, 5 Feb 2014, Jan Hubicka wrote: When looking at PR60060, outputting the declaration debug info for a local static variable twice (once via BLOCK_VARS and once via calling the debug hook on all globals) I noticed that the rest_of_decl_compilation call in materialize_cgraph always works on the empty lto_global_var_decls vector (we only populate it later). That results in the opportunity to effectively remove it and in lto_write_globals walk over the defined vars in the varpool instead. Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN decls there (but I'm not sure of other side-effects of that ...). Thus, this cleanup first. LTO bootstrap / regtest running on x86_64-unknown-linux-gnu. Ok? Seems OK to me. I always had an impression that this machinery exists to make decls that are revmoed fro symbol table (optimized out) to land the debug info, but I think this is mostly broken by partitioning anyway, right? Do we stream them at all? No, we don't stream those. We'd need to stream what the FEs think are the globals (lang_hooks.getdecls()) and somehow partition them and ship them somewhere. Richard. Honza Thanks, Richard. 2014-02-05 Richard Biener rguent...@suse.de lto/ * lto.h (lto_global_var_decls): Remove. * lto-lang.c (lto_init): Do not allocate lto_global_var_decls. (lto_write_globals): Do nothing in WPA stage, gather globals from the varpool here ... * lto.c (lto_main): ... not here. (materialize_cgraph): Do not call rest_of_decl_compilation on the empty lto_global_var_decls vector. (lto_global_var_decls): Remove. Index: gcc/lto/lto-lang.c === *** gcc/lto/lto-lang.c (revision 207497) --- gcc/lto/lto-lang.c (working copy) *** lto_getdecls (void) *** 1075,1085 static void lto_write_globals (void) { ! tree *vec = lto_global_var_decls-address (); ! int len = lto_global_var_decls-length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! vec_free (lto_global_var_decls); } static tree --- 1075,1094 static void lto_write_globals (void) { ! if (flag_wpa) ! return; ! ! /* Record the global variables. */ ! vectree lto_global_var_decls = vNULL; ! varpool_node *vnode; ! FOR_EACH_DEFINED_VARIABLE (vnode) ! lto_global_var_decls.safe_push (vnode-decl); ! ! tree *vec = lto_global_var_decls.address (); ! int len = lto_global_var_decls.length (); wrapup_global_declarations (vec, len); emit_debug_global_declarations (vec, len); ! lto_global_var_decls.release (); } static tree *** lto_init (void) *** 1218,1224 #undef NAME_TYPE /* Initialize LTO-specific data structures. */ - vec_alloc (lto_global_var_decls, 256); in_lto_p = true; return true; --- 1227,1232 Index: gcc/lto/lto.c === *** gcc/lto/lto.c (revision 207497) --- gcc/lto/lto.c (working copy) *** along with GCC; see the file COPYING3. *** 50,57 #include context.h #include pass_manager.h - /* Vector to keep track of external variables we've seen so far. */ - vectree, va_gc *lto_global_var_decls; static GTY(()) tree first_personality_decl; --- 50,55 *** read_cgraph_and_symbols (unsigned nfiles *** 3009,3017 static void materialize_cgraph (void) { - tree decl; struct cgraph_node *node; - unsigned i; timevar_id_t lto_timer; if (!quiet_flag) --- 3007,3013 *** materialize_cgraph (void) *** 3043,3052 current_function_decl = NULL; set_cfun (NULL); - /* Inform the middle end about the global variables we have seen. */ - FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl) - rest_of_decl_compilation (decl, 1, 0); - if (!quiet_flag) fprintf (stderr, \n); --- 3039,3044 *** lto_main (void) *** 3309,3316 do_whole_program_analysis (); else { - varpool_node *vnode; - timevar_start (TV_PHASE_OPT_GEN); materialize_cgraph (); --- 3301,3306 *** lto_main (void) *** 3330,3339 this. */ if (flag_lto_report || (flag_wpa flag_lto_report_wpa)) print_lto_report_1 (); - - /* Record the global variables. */ - FOR_EACH_DEFINED_VARIABLE (vnode) - vec_safe_push (lto_global_var_decls, vnode-decl); } } --- 3320,3325 Index: gcc/lto/lto.h === *** gcc/lto/lto.h (revision 207497) ---
Re: PR ipa/59831 (ipa-cp devirt issues)
I think it would be better, yes. IIRC, We want to re-organize indirect_info anyway (I vaguely remember we want to split the overloaded offset field into two but forgot the exact reason why but I have it written somewhere), I suppose we'll be turning it into a union (or class hierarchy?) and this would make it slightly awkward. If we ever support the pointer-by-reference scenario, quite a few more places will need to be adjusted anyway. OK, I will remove the check. I wanted to split polymorphic call context and its propagation from aggregate analysis. For example struct A { struct B b; struct C c; } when you call method of A.c.foo() its context is not {outer_type=a,offset=offsetof(c)}, since A is not polymorphic type at all. We should instead use {outer_type=c,offset=0,not_derived_type}. In ipa-devirt there is function get_class_context that gets you from first to second. So as briefly discussed last july, I think ipa-prop can simply do two propagations in parallel where the type one is done via functionality that willbe exported from ipa-devirt, since the logic should be shared with a local devirt pass. There are also cases where the parameter is KNOWN_TYPE but also PASS_THROUGH at the same type (as tings passed by invisible refernece, or when the function is constructor and it initialized dynamic type of THIS pointer). Those are cases where current code is losing information because one is not supperset of the other. We want to use KNOWN_TYPE information for devirtualization, but we do not want to forget about PASS_THROUGH for normal constant propagation in case constructor is only called on decl. Honza Thanks, Martin
Re: [RS6000] SDmode and reload
On Wed, Feb 5, 2014 at 8:27 AM, Alan Modra amo...@gmail.com wrote: Vlad added rs6000_secondary_memory_needed_mode for lra, and in so doing broke reload's use of cfun-machine-sdmode_stack_slot in rs6000_secondary_memory_needed_rtx (mode was no longer SDmode there). Bootstrapped and regression tested powerpc64-linux, both with and without -mlra. OK to commit? PR target/60032 * config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only change SDmode to DDmode when lra_in_progress. * gcc.target/powerpc/pr60032.c: New. So Vlad added the definition to the rs6000 port, but it only is needed by LRA and does not interact correctly with classic reload. Okay. - David
Re: var-tracking and s390's LM(G)
Jakub Jelinek ja...@redhat.com writes: On Tue, Feb 04, 2014 at 12:21:14PM +, Richard Sandiford wrote: Jakub Jelinek ja...@redhat.com writes: But then we wouldn't be able to use var-tracking when __builtin_eh_return is used, since in that case replacing the (set (reg ...) (mem ...)) with a (plus ...) would be incorrect -- the value we're loading from the stack will have had a variable adjustment applied. And I know from painful experience that being able to debug the unwind code is very useful. :-) Aren't functions using EH_RETURN typically using frame pointer? Sorry, forgot to respond to this bit. On s390, _Unwind_RaiseException and _Unwind_ForcedUnwind don't use a frame pointer, at least not on trunk. %r11 is used as a general scratch register instead. E.g.: 2ba8 _Unwind_ForcedUnwind: 2ba8: 90 6f f0 18 stm %r6,%r15,24(%r15) 2bac: 0d d0 basr%r13,%r0 2bae: 60 40 f0 50 std %f4,80(%r15) 2bb2: 60 60 f0 58 std %f6,88(%r15) 2bb6: a7 fa fd f8 ahi %r15,-520 2bba: 58 c0 d0 9e l %r12,158(%r13) 2bbe: 58 10 d0 9a l %r1,154(%r13) 2bc2: 18 b2 lr %r11,%r2 ... 2c10: 98 6f f2 20 lm %r6,%r15,544(%r15) 2c14: 07 f4 br %r4 Looking at pr54693-2.c -O2 -g -m64 -march=z196 (was that the one you meant) with your patches, I see: (insn/f:TI 122 30 31 4 (parallel [ (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 80 [0x50])) [3 S8 A8]) (reg:DI 10 %r10)) (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 88 [0x58])) [3 S8 A8]) (reg:DI 11 %r11)) (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 96 [0x60])) [3 S8 A8]) (reg:DI 12 %r12)) (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 104 [0x68])) [3 S8 A8]) (reg:DI 13 %r13)) (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 112 [0x70])) [3 S8 A8]) (reg:DI 14 %r14)) (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 120 [0x78])) [3 S8 A8]) (reg/f:DI 15 %r15)) ]) pr54693-2.c:16 94 {*store_multiple_di} (expr_list:REG_DEAD (reg:DI 14 %r14) (expr_list:REG_DEAD (reg:DI 13 %r13) (expr_list:REG_DEAD (reg:DI 12 %r12) (expr_list:REG_DEAD (reg:DI 11 %r11) (expr_list:REG_DEAD (reg:DI 10 %r10) (nil))) ... (insn/f 123 31 124 4 (set (reg/f:DI 15 %r15) (plus:DI (reg/f:DI 15 %r15) (const_int -160 [0xff60]))) pr54693-2.c:16 65 {*la_64} (expr_list:REG_FRAME_RELATED_EXPR (set (reg/f:DI 15 %r15) (plus:DI (reg/f:DI 15 %r15) (const_int -160 [0xff60]))) (nil))) ... (insn/f:TI 135 134 136 7 (parallel [ (set (reg:DI 10 %r10) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 240 [0xf0])) [3 S8 A8])) (set (reg:DI 11 %r11) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 248 [0xf8])) [3 S8 A8])) (set (reg:DI 12 %r12) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 256 [0x100])) [3 S8 A8])) (set (reg:DI 13 %r13) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 264 [0x108])) [3 S8 A8])) (set (reg:DI 14 %r14) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 272 [0x110])) [3 S8 A8])) (set (reg/f:DI 15 %r15) (mem:DI (plus:DI (reg/f:DI 15 %r15) (const_int 280 [0x118])) [3 S8 A8])) ]) pr54693-2.c:25 92 {*load_multiple_di} (expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 15 %r15) (const_int 160 [0xa0])) (expr_list:REG_CFA_RESTORE (reg/f:DI 15 %r15) (expr_list:REG_CFA_RESTORE (reg:DI 14 %r14) (expr_list:REG_CFA_RESTORE (reg:DI 13 %r13) (expr_list:REG_CFA_RESTORE (reg:DI 12 %r12) (expr_list:REG_CFA_RESTORE (reg:DI 11 %r11) (expr_list:REG_CFA_RESTORE (reg:DI 10 %r10) (nil) In a function that doesn't need frame pointer, I'd say this is a serious bloat of the unwind info, you are saving/restoring %r15 not because you have to, but just that it seems to be cheaper for the target. So, I'd say you shouldn't emit DW_CFA_offset 15, 5 on the insn 122
Re: Fix bootstrap with -mno-accumulate-outgoing-args
On 02/04/2014 09:35 AM, Jan Hubicka wrote: On 02/04/2014 08:48 AM, Jan Hubicka wrote: How things are supposed to work in this case? So perhaps we need scheduler to understand and move around the ARGS_SIZE note? I believe we do need to have the scheduler move the notes around. If we need notes on non-stack adjusting insns as you seem to show in your testcase, I guess it is the only way around. Still combine stack adjust may be smart enough to not produce redundant note in the case of go's ICE. I am not terribly familiar with the code, will you look into it or shall I give it a try? I had a brief look at find_modifiable_mems, which seems to be the only kind of schedule-time dependency breaking that we do. I started writing some new code that would handle REG_ARGS_SIZE, but then considered that wasn't terribly appropriate for stage4. So I've left off with just the dependency links between the insns. We'd need these for the schedule-time adjustments anyway, and with them in place the scheduler does not re-order the insns in a way that causes problems. Testing for x86_64 has finished; i686 is nearly done. I suppose the only question I have is if anyone disagrees that OUTPUT is incorrect as the dependency type. r~
Re: Fix bootstrap with -mno-accumulate-outgoing-args
This time with the patch... On 02/05/2014 07:40 AM, Richard Henderson wrote: On 02/04/2014 09:35 AM, Jan Hubicka wrote: On 02/04/2014 08:48 AM, Jan Hubicka wrote: How things are supposed to work in this case? So perhaps we need scheduler to understand and move around the ARGS_SIZE note? I believe we do need to have the scheduler move the notes around. If we need notes on non-stack adjusting insns as you seem to show in your testcase, I guess it is the only way around. Still combine stack adjust may be smart enough to not produce redundant note in the case of go's ICE. I am not terribly familiar with the code, will you look into it or shall I give it a try? I had a brief look at find_modifiable_mems, which seems to be the only kind of schedule-time dependency breaking that we do. I started writing some new code that would handle REG_ARGS_SIZE, but then considered that wasn't terribly appropriate for stage4. So I've left off with just the dependency links between the insns. We'd need these for the schedule-time adjustments anyway, and with them in place the scheduler does not re-order the insns in a way that causes problems. Testing for x86_64 has finished; i686 is nearly done. I suppose the only question I have is if anyone disagrees that OUTPUT is incorrect as the dependency type. r~ * combine-stack-adj.c: Revert r206943. * sched-int.h (struct deps_desc): Add last_args_size. * sched-deps.c (init_deps): Initialize it. (sched_analyze_insn): Add OUTPUT dependencies between insns that contain REG_ARGS_SIZE notes. diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c index c591c60..69fd5ea 100644 --- a/gcc/combine-stack-adj.c +++ b/gcc/combine-stack-adj.c @@ -567,7 +567,6 @@ combine_stack_adjustments_for_block (basic_block bb) try_apply_stack_adjustment (insn, reflist, 0, -last_sp_adjust)) { - rtx note; if (last2_sp_set) maybe_move_args_size_note (last2_sp_set, last_sp_set, false); else @@ -577,11 +576,6 @@ combine_stack_adjustments_for_block (basic_block bb) reflist = NULL; last_sp_set = NULL_RTX; last_sp_adjust = 0; - /* We no longer adjust stack size. Whoever adjusted it earlier -hopefully got the note right. */ - note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX); - if (note) - remove_note (insn, note); continue; } } diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 7efc937..efc4223 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -3470,6 +3470,15 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx insn) change_spec_dep_to_hard (sd_it); } } + + /* We do not yet have code to adjust REG_ARGS_SIZE, therefore we must + honor their original ordering. */ + if (find_reg_note (insn, REG_ARGS_SIZE, NULL)) +{ + if (deps-last_args_size) + add_dependence (insn, deps-last_args_size, REG_DEP_OUTPUT); + deps-last_args_size = insn; +} } /* Return TRUE if INSN might not always return normally (e.g. call exit, @@ -3876,6 +3885,7 @@ init_deps (struct deps_desc *deps, bool lazy_reg_last) deps-sched_before_next_jump = 0; deps-in_post_call_group_p = not_post_call; deps-last_debug_insn = 0; + deps-last_args_size = 0; deps-last_reg_pending_barrier = NOT_A_BARRIER; deps-readonly = 0; } diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 3b1106f..2cec624 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -539,6 +539,9 @@ struct deps_desc /* The last debug insn we've seen. */ rtx last_debug_insn; + /* The last insn bearing REG_ARGS_SIZE that we've seen. */ + rtx last_args_size; + /* The maximum register number for the following arrays. Before reload this is max_reg_num; after reload it is FIRST_PSEUDO_REGISTER. */ int max_reg;
Re: Fix bootstrap with -mno-accumulate-outgoing-args
I had a brief look at find_modifiable_mems, which seems to be the only kind of schedule-time dependency breaking that we do. I started writing some new code that would handle REG_ARGS_SIZE, but then considered that wasn't terribly appropriate for stage4. So I've left off with just the dependency links between the insns. We'd need these for the schedule-time adjustments anyway, and with them in place the scheduler does not re-order the insns in a way that causes problems. Testing for x86_64 has finished; i686 is nearly done. I suppose the only question I have is if anyone disagrees that OUTPUT is incorrect as the dependency type. From i386 perspective, this seems like resonable compromise for now. I did some re-testing of scheduler recently for generic. Call sequences is one of few places where scheduling matters (in addition to tight loops), but I do not think we will lose measurable % of perofmrance by disabling scheduling here. Thanks! Honza r~ * combine-stack-adj.c: Revert r206943. * sched-int.h (struct deps_desc): Add last_args_size. * sched-deps.c (init_deps): Initialize it. (sched_analyze_insn): Add OUTPUT dependencies between insns that contain REG_ARGS_SIZE notes. diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c index c591c60..69fd5ea 100644 --- a/gcc/combine-stack-adj.c +++ b/gcc/combine-stack-adj.c @@ -567,7 +567,6 @@ combine_stack_adjustments_for_block (basic_block bb) try_apply_stack_adjustment (insn, reflist, 0, -last_sp_adjust)) { - rtx note; if (last2_sp_set) maybe_move_args_size_note (last2_sp_set, last_sp_set, false); else @@ -577,11 +576,6 @@ combine_stack_adjustments_for_block (basic_block bb) reflist = NULL; last_sp_set = NULL_RTX; last_sp_adjust = 0; - /* We no longer adjust stack size. Whoever adjusted it earlier - hopefully got the note right. */ - note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX); - if (note) - remove_note (insn, note); continue; } } diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 7efc937..efc4223 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -3470,6 +3470,15 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx insn) change_spec_dep_to_hard (sd_it); } } + + /* We do not yet have code to adjust REG_ARGS_SIZE, therefore we must + honor their original ordering. */ + if (find_reg_note (insn, REG_ARGS_SIZE, NULL)) +{ + if (deps-last_args_size) + add_dependence (insn, deps-last_args_size, REG_DEP_OUTPUT); + deps-last_args_size = insn; +} } /* Return TRUE if INSN might not always return normally (e.g. call exit, @@ -3876,6 +3885,7 @@ init_deps (struct deps_desc *deps, bool lazy_reg_last) deps-sched_before_next_jump = 0; deps-in_post_call_group_p = not_post_call; deps-last_debug_insn = 0; + deps-last_args_size = 0; deps-last_reg_pending_barrier = NOT_A_BARRIER; deps-readonly = 0; } diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 3b1106f..2cec624 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -539,6 +539,9 @@ struct deps_desc /* The last debug insn we've seen. */ rtx last_debug_insn; + /* The last insn bearing REG_ARGS_SIZE that we've seen. */ + rtx last_args_size; + /* The maximum register number for the following arrays. Before reload this is max_reg_num; after reload it is FIRST_PSEUDO_REGISTER. */ int max_reg;
Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060
Seems OK to me. I always had an impression that this machinery exists to make decls that are revmoed fro symbol table (optimized out) to land the debug info, but I think this is mostly broken by partitioning anyway, right? Do we stream them at all? No, we don't stream those. We'd need to stream what the FEs think are the globals (lang_hooks.getdecls()) and somehow partition them and ship them somewhere. My old idea was to simply output debug for dead stuff from WPA and add it asone of object files to linking later. But indeed, this is for later time. Honza
Re: var-tracking and s390's LM(G)
On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote: In a function that doesn't need frame pointer, I'd say this is a serious bloat of the unwind info, you are saving/restoring %r15 not because you have to, but just that it seems to be cheaper for the target. So, I'd say you shouldn't emit DW_CFA_offset 15, 5 on the insn 122 in the prologue and DW_CFA_restore 15 in the epilogue (twice in this case), that just waste of .eh_frame/.debug_frame space. I'd say you should represent in this case the *store_multiple_di as REG_FRAME_RELATED_EXPR with all but the last set in the PARALLEL, and on the *load_multiple_di remove REG_CFA_RESTORE (r15) note and change REG_CFA_DEF_CFA note to REG_CFA_ADJUST_CFA note which would say that stack pointer has been incremented by 160 bytes (epilogue expansion knows that value). I think at the moment we're relying on the DW_CFA_offset 15s to provide correct %r15 values on eh_return. Every non-leaf function saves the stack pointer and the unwind routines track the %r15 save slots like they would track any call-saved register. In other words, the final eh_return %r15 comes directly from a save slot, without any CFA-specific handling. If some frames omit the CFI for the %r15 save then we'll just keep the save slot from the previous (inner) frame instead. CCing Richard Henderson into the discussion. The new unwind routines would handle 4.9+ and pre-4.9 frames, but pre-4.9 unwind routines wouldn't handle 4.9+ frames, so would that require a version bump on the unwind symbols? No need to version anything IMHO. We've never done that for any port where we've started to emit something the old unwinder wouldn't cope with, it is user's responsibility to use new enough libgcc_s. BTW, what is the reason why %r15 is saved/restored from the stack at all say for simple: void foo (void) { int a = 6; asm volatile (# %0 : +m (a)); } Is that some ABI matter where it effectively always uses requires to use some kind of frame pointer (which the saved previous stack pointer is)? It can surely make debugging without unwind info easier, ditto backtraces, on the other side it must have a runtime cost. Jakub
[testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86 with gas: FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:21:14: warning: incompatible implicit declaration of built-in function 'floorf ' [enabled by default] /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:24:14: warning: incompatible implicit declaration of built-in function 'ceilf' [enabled by default] The platform lacks C99 support, but this can easily be avoided by using the builtins instead. The following patch does just that; tested with the appropriate runtest invocation on i386-pc-solaris2.9 and x86_64-unknown-linux-gnu. Ok for mainline? Rainer 2014-02-05 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.target/i386/avx512f-vrndscaless-2.c (compute_rndscaless): Use __builtin_floorf, __builtin_ceilf. # HG changeset patch # Parent c8a18ca98263f4a2ca4e3e723f9d2b4596b67207 Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c @@ -18,10 +18,10 @@ compute_rndscaless (float *s1, float *s2 switch (rc) { case _MM_FROUND_FLOOR: - r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m); + r[0] = __builtin_floorf (s2[0] * pow (2, m)) / pow (2, m); break; case _MM_FROUND_CEIL: - r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m); + r[0] = __builtin_ceilf (s2[0] * pow (2, m)) / pow (2, m); break; default: abort (); -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
On Wed, Feb 5, 2014 at 4:50 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86 with gas: FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:21:14: warning: incompatible implicit declaration of built-in function 'floorf ' [enabled by default] /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:24:14: warning: incompatible implicit declaration of built-in function 'ceilf' [enabled by default] The platform lacks C99 support, but this can easily be avoided by using the builtins instead. The following patch does just that; tested with the appropriate runtest invocation on i386-pc-solaris2.9 and x86_64-unknown-linux-gnu. Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add extern float floorf (float); after math.h include. Does this work for you? Uros.
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
On Wed, Feb 05, 2014 at 05:09:31PM +0100, Uros Bizjak wrote: On Wed, Feb 5, 2014 at 4:50 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86 with gas: FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:21:14: warning: incompatible implicit declaration of built-in function 'floorf ' [enabled by default] /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:24:14: warning: incompatible implicit declaration of built-in function 'ceilf' [enabled by default] The platform lacks C99 support, but this can easily be avoided by using the builtins instead. The following patch does just that; tested with the appropriate runtest invocation on i386-pc-solaris2.9 and x86_64-unknown-linux-gnu. Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add extern float floorf (float); Won't that break if math.h defines floorf as a macro? You'd at least need then extern float (floorf) (float); Jakub
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
On Wed, Feb 5, 2014 at 5:11 PM, Jakub Jelinek ja...@redhat.com wrote: gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86 with gas: FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:21:14: warning: incompatible implicit declaration of built-in function 'floorf ' [enabled by default] /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2. c:24:14: warning: incompatible implicit declaration of built-in function 'ceilf' [enabled by default] The platform lacks C99 support, but this can easily be avoided by using the builtins instead. The following patch does just that; tested with the appropriate runtest invocation on i386-pc-solaris2.9 and x86_64-unknown-linux-gnu. Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add extern float floorf (float); Won't that break if math.h defines floorf as a macro? You'd at least need then extern float (floorf) (float); Looks like using builtins should be the correct way, so the original patch is OK. Rainer, can you please add the same cure to sse4_1-floor*.vec tests? Thanks, Uros.
Re: [C++ Patch/RFC] PR 60047
Hi, On 02/04/2014 12:51 PM, Paolo Carlini wrote: Hi, thus I tried to have a look to this issue, while experiencing some weird problems with the debugger, which slowed me down a lot. I have been able to figure out that we don't seem to actively set constexpr_p to true anywhere in implicitly_declare_fn (besides the unrelated case of inheriting constructors), thus I'm wondering if it would make sense to simply do the below?!? (and revert my recent tweak) I spent a little more time on this. Outside the case of inheriting constructors and the special case of expected_trivial default constructors (which are handled in a special conditional), we are normally assigning true to constexpr_p only in one place, that is close to the beginning of synthesized_method_walk: /* If that user-written default constructor would satisfy the requirements of a constexpr constructor (7.1.5), the implicitly-defined default constructor is constexpr. */ if (constexpr_p) *constexpr_p = ctor_p; then, for the testcase at issue it becomes false in the place which we already discussed for the ICE: if (vec_safe_is_empty (vbases)) /* No virtual bases to worry about. */; else if (!assign_p) { if (constexpr_p) *constexpr_p = false; Then it doesn't gets a chance to return true. Thus, given the more accurate analysis, I think that we either do what I quickly proposed yesterday, or alternately, the only option I can see, we change process_subob_fn to set it to true in some cases (but I note that it does never set to true trivial_p). Thanks, Paolo.
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
Hi Uros, Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add extern float floorf (float); after math.h include. Does this work for you? it does, but only because the floorf call is optimized away at -O2. While the Solaris 9 libm contains __floorf, there's no floorf, neither in libm nor in headers. The builtin route avoids those problems. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
Uros Bizjak ubiz...@gmail.com writes: Looks like using builtins should be the correct way, so the original patch is OK. Rainer, can you please add the same cure to sse4_1-floor*.vec tests? Sure, will do. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Uros Bizjak ubiz...@gmail.com writes: Looks like using builtins should be the correct way, so the original patch is OK. Rainer, can you please add the same cure to sse4_1-floor*.vec tests? Sure, will do. Here's what I committed after retesting on i386-pc-solaris2.9, i386-pc-solaris2.11 and x86_64-unknown-linux-gnu. Rainer 2014-02-05 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.target/i386/avx512f-vrndscaless-2.c (compute_rndscaless): Use __builtin_floorf, __builtin_ceilf. * gcc.target/i386/sse4_1-floorf-sfix-vec.c (floorf): Remove declaration. (TEST): Use __builtin_floorf. * gcc.target/i386/sse4_1-floorf-vec.c: Likewise. # HG changeset patch # Parent c8a18ca98263f4a2ca4e3e723f9d2b4596b67207 Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c @@ -18,10 +18,10 @@ compute_rndscaless (float *s1, float *s2 switch (rc) { case _MM_FROUND_FLOOR: - r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m); + r[0] = __builtin_floorf (s2[0] * pow (2, m)) / pow (2, m); break; case _MM_FROUND_CEIL: - r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m); + r[0] = __builtin_ceilf (s2[0] * pow (2, m)) / pow (2, m); break; default: abort (); diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c --- a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c +++ b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c @@ -15,8 +15,6 @@ #include math.h -extern float floorf (float); - #define NUM 64 static void @@ -53,10 +51,10 @@ TEST (void) init_src (a); for (i = 0; i NUM; i++) -r[i] = (int) floorf (a[i]); +r[i] = (int) __builtin_floorf (a[i]); /* check results: */ for (i = 0; i NUM; i++) -if (r[i] != (int) floorf (a[i])) +if (r[i] != (int) __builtin_floorf (a[i])) abort(); } diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c --- a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c +++ b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c @@ -15,8 +15,6 @@ #include math.h -extern float floorf (float); - #define NUM 64 static void @@ -53,10 +51,10 @@ TEST (void) init_src (a); for (i = 0; i NUM; i++) -r[i] = floorf (a[i]); +r[i] = __builtin_floorf (a[i]); /* check results: */ for (i = 0; i NUM; i++) -if (r[i] != floorf (a[i])) +if (r[i] != __builtin_floorf (a[i])) abort(); } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: regression test issue
Hi, On 02/05/2014 06:29 AM, Iyer, Balaji V wrote: Hello Everyone, The following two Cilk Plus tests is timing out at -O1 in my x86_64 box (-O2, -O3 and -O0 works fine). These tests were working fine till revision r207047. Can someone please look at this? It looks like a middle-end/back-end issue. WARNING: program timed out. FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O1 -fcilkplus execution test WARNING: program timed out. FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c -O1 execution test Thanks. I'm wondering if we could open an high priority Bugzilla about that and for the time being avoid running the test at -O1, it is slowing down the testsuite and using a lot of cpu. Also, maybe you could figure out which specific change caused the regression and ping the appropriate people... Thanks again, Paolo.
RE: regression test issue
-Original Message- From: Paolo Carlini [mailto:paolo.carl...@oracle.com] Sent: Wednesday, February 5, 2014 11:53 AM To: Iyer, Balaji V; gcc-patches@gcc.gnu.org Subject: Re: regression test issue Hi, On 02/05/2014 06:29 AM, Iyer, Balaji V wrote: Hello Everyone, The following two Cilk Plus tests is timing out at -O1 in my x86_64 box (-O2, -O3 and -O0 works fine). These tests were working fine till revision r207047. Can someone please look at this? It looks like a middle-end/back- end issue. WARNING: program timed out. FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O1 -fcilkplus execution test WARNING: program timed out. FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c -O1 execution test Thanks. I'm wondering if we could open an high priority Bugzilla about that and for the time being avoid running the test at -O1, it is slowing down the testsuite and using a lot of cpu. Also, maybe you could figure out which specific change caused the regression and ping the appropriate people... Hi Paolo, OK. Here is a patch that will do so: Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp === --- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(working copy) @@ -64,12 +64,12 @@ dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -O3 -ftree-vectorize -fcilkplus -g if { [check_libcilkrts_available] } { -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O1 -fcilkplus +#dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O1 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O3 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g -O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1 +#dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O1 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -O3 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] -g -O2 === Here is the ChangeLog entry: 2014-02-05 Balaji V. Iyer balaji.v.i...@intel.com * g++.dg/cilk-plus/cilk-plus.exp: Commented out -O1 Cilk Keywords test for the time-being. Note: I am purposely commenting the line out and not deleting them because I want to get them to run in -O1 some time, and deleting them might cause us (atleast me :-). ) to forget about it. I will go ahead and file a bugzilla report. Thanks, Balaji V. Iyer. Thanks again, Paolo.
Re: var-tracking and s390's LM(G)
On 02/05/2014 07:58 AM, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote: I think at the moment we're relying on the DW_CFA_offset 15s to provide correct %r15 values on eh_return. Every non-leaf function saves the stack pointer and the unwind routines track the %r15 save slots like they would track any call-saved register. In other words, the final eh_return %r15 comes directly from a save slot, without any CFA-specific handling. If some frames omit the CFI for the %r15 save then we'll just keep the save slot from the previous (inner) frame instead. CCing Richard Henderson into the discussion. If the sp is saved for a given frame, we'll use that instead of the CFA when unwinding to a previous frame. That should be unaffected by whether or not sp is saved for any particular frame. There appears to be code in uw_install_context_1 to handle a mix of sp-saving and non-sp-saving frames. Given that s390 is pretty much the only target that uses these paths, I suppose it could be broken. But if it is, it would surely be better to fix than just say it doesn't work. BTW, what is the reason why %r15 is saved/restored from the stack at all say for simple: void foo (void) { int a = 6; asm volatile (# %0 : +m (a)); } Is that some ABI matter where it effectively always uses requires to use some kind of frame pointer (which the saved previous stack pointer is)? It can surely make debugging without unwind info easier, ditto backtraces, on the other side it must have a runtime cost. As far as I recall, the return address is in r14 so the ABI also saves r15 for all non-leaf functions simply because it is practically free to do so with store-multiple. This creates the familiar ra/fp unwinding chain. Given how cheap it is to maintain this chain on s390, I see no particular call to break it. As for why r15 gets saved for this leaf example... I have no idea. Seems like a bug to me, frankly. r~
RE: [PATCH RFC] MIPS add support for MIPS SIMD ARCHITECTURE V1.07
Hi Richard, Attached is an updated patch for feedback so MSA support to MIPS backend can be added when open again for next stage1. It is unfinished in that some comments from your review of the initial patch have yet to be addressed. The diff is against svn 207500 Graham msa.svn.207500.diffs.gz Description: msa.svn.207500.diffs.gz
Re: RFA: MN10300: Include saved registers in stack usage
On Wed, 5 Feb 2014, Jeff Law wrote: However, it's pretty easy to avoid the headaches and just provide a popcount routine. I don't think this code is at all performance critical (once per function being compiled), so a simple popcount should be sufficient. No need for lookup tables IMHO. hwint.[ch] provide popcount_hwi, if HOST_WIDE_INT is a suitable type. (Personally I think there are lots of bits of standard integer manipulation functionality, such as popcount, that are common in processor instructions and compiler intrinsics, but could do with having bindings in ISO C to provide a standard way to access that functionality. Such bindings might take the form of library functions / macros that compilers / library headers can then optimize as appropriate; that seems most in the ISO C style rather than __builtin_*. Even outside of ISO C, I suspect there's quite a bit in common between target-specific builtins for different targets where a target-independent builtin would be better - and for that matter, target-specific builtins where some form of standard binding already exists but isn't supported in GCC, or where the target-specific builtin duplicates something generic in GNU C.) -- Joseph S. Myers jos...@codesourcery.com
Re: RFA: MN10300: Include saved registers in stack usage
On 02/05/14 03:44, Nick Clifton wrote: Hi Jeff, Hi Alex, Sorry - another MN10300 patch - this time for the stack size reported with -fstack-usage. Currently the value includes the stack frame, but it does not take into account any registers that might have been pushed onto the stack. The patch below fixes this problem, but there is one thing that I am not sure about - is it OK to use __builtin_popcount() or should I be calling some other function ? According to our coding conventions, the ability to build with something other than gcc is still desirable. You could argue that you're unlikely to be bootstrapping on a mn103 with something other than GCC and if you're building a cross, you could start by first building gcc native. However, it's pretty easy to avoid the headaches and just provide a popcount routine. I don't think this code is at all performance critical (once per function being compiled), so a simple popcount should be sufficient. No need for lookup tables IMHO. jeff
RE: regression test issue
Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to install? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Wednesday, February 5, 2014 12:02 PM To: 'Paolo Carlini'; gcc-patches@gcc.gnu.org Subject: RE: regression test issue -Original Message- From: Paolo Carlini [mailto:paolo.carl...@oracle.com] Sent: Wednesday, February 5, 2014 11:53 AM To: Iyer, Balaji V; gcc-patches@gcc.gnu.org Subject: Re: regression test issue Hi, On 02/05/2014 06:29 AM, Iyer, Balaji V wrote: Hello Everyone, The following two Cilk Plus tests is timing out at -O1 in my x86_64 box (-O2, -O3 and -O0 works fine). These tests were working fine till revision r207047. Can someone please look at this? It looks like a middle-end/back- end issue. WARNING: program timed out. FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O1 -fcilkplus execution test WARNING: program timed out. FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c -O1 execution test Thanks. I'm wondering if we could open an high priority Bugzilla about that and for the time being avoid running the test at -O1, it is slowing down the testsuite and using a lot of cpu. Also, maybe you could figure out which specific change caused the regression and ping the appropriate people... Hi Paolo, OK. Here is a patch that will do so: Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp == = --- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(working copy) @@ -64,12 +64,12 @@ dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]] -O3 - ftree-vectorize -fcilkplus -g if { [check_libcilkrts_available] } { -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] - O1 -fcilkplus +#dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] - O1 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -O3 -fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g - fcilkplus dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] -g - O2 -fcilkplus -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk- plus/CK/*.c]] -O1 +#dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk- plus/CK/*.c]] -O1 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk- plus/CK/*.c]] -O3 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk- plus/CK/*.c]] -g dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk- plus/CK/*.c]] -g -O2 == = Here is the ChangeLog entry: 2014-02-05 Balaji V. Iyer balaji.v.i...@intel.com * g++.dg/cilk-plus/cilk-plus.exp: Commented out -O1 Cilk Keywords test for the time-being. Note: I am purposely commenting the line out and not deleting them because I want to get them to run in -O1 some time, and deleting them might cause us (atleast me :-). ) to forget about it. I will go ahead and file a bugzilla report. Thanks, Balaji V. Iyer. Thanks again, Paolo.
Re: regression test issue
On Wed, Feb 05, 2014 at 07:08:32PM +, Iyer, Balaji V wrote: Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to install? No. If you want to skip a particular test, you should use dg-skip-if for that, see http://gcc.gnu.org/onlinedocs/gccint/Directives.html Jakub
RE: regression test issue
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, February 5, 2014 2:25 PM To: Iyer, Balaji V Cc: Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: regression test issue On Wed, Feb 05, 2014 at 07:08:32PM +, Iyer, Balaji V wrote: Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to install? No. If you want to skip a particular test, you should use dg-skip-if for that, see http://gcc.gnu.org/onlinedocs/gccint/Directives.html Ok. Here is the fixed patch: Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 207437) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-02-05 Balaji V. Iyer balaji.v.i...@intel.com + + * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1 + * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise. + 2014-02-03 Marc Glisse marc.gli...@inria.fr PR c++/53017 Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc === --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (working copy) @@ -1,6 +1,7 @@ /* { dg-options -fcilkplus } */ /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */ /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ #include assert.h #include unistd.h Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c === --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 207437) +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do run { target { i?86-*-* x86_64-*-* } } } */ /* { dg-options -fcilkplus } */ /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ OK to install? Thanks, Balaji V. Iyer. Jakub
Re: regression test issue
On Wed, Feb 05, 2014 at 07:38:11PM +, Iyer, Balaji V wrote: +2014-02-05 Balaji V. Iyer balaji.v.i...@intel.com + + * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1 + * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise. Ok. --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (working copy) @@ -1,6 +1,7 @@ /* { dg-options -fcilkplus } */ /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */ /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ #include assert.h #include unistd.h Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c === --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 207437) +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do run { target { i?86-*-* x86_64-*-* } } } */ /* { dg-options -fcilkplus } */ /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ OK to install? Jakub
[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062, take 2)
On Wed, Feb 05, 2014 at 02:20:05PM +0100, Richard Biener wrote: Using !!optimize to determine if we should switch local ABI to regparm convention isn't compatible with optimize attribute, as !!optimize is whether the current function is being optimized, but for the ABI decisions we actually need the caller and callee to agree on the calling convention. Fixed by looking at callee's optimize settings all the time. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Seeing a 2nd variant of such code I'd rather have an abstraction for this ... optimize_fn ()? opt_for_fn (fn, OPT_...)? Here is an updated version of the patch, unfortunately this one regressed in quite a lot of tests. The problem is that by using opt_for_fn there it also sets the failure reason for all functions at -O0 that don't have the optimize attribute at all. Now copy_forbidden is used by the inliner, where we have to handle always_inline functions, other -O0 functions we likely don't want to inline, be it because the cmd line options are -O0 or because the function we are considering for inlining has optimize (0), and for versioning, where I'd say we want to never version the -O0 functions. So, where do we want to do that instead? E.g. should it be e.g. in tree_versionable_function_p directly and let the inliner (if it doesn't do already) also treat optimize(0) functions that aren't always_inline as noinline? I guess even without this patch my PR60026 fix broke inlining of __attribute__((always_inline, optimize (0))) functions. 2014-02-05 Jakub Jelinek ja...@redhat.com PR target/60062 * tree.h (opts_for_fn): New inline function. (opt_for_fn): Define. * config/i386/i386.c (ix86_function_regparm): Use opt_for_fn (decl, optimize) instead of optimize. * tree-inline.c (copy_forbidden): Use opt_for_fn. * gcc.c-torture/execute/pr60062.c: New test. * gcc.c-torture/execute/pr60072.c: New test. --- gcc/tree.h.jj 2014-01-20 19:16:29.0 +0100 +++ gcc/tree.h 2014-02-05 15:54:04.681904492 +0100 @@ -4470,6 +4470,20 @@ may_be_aliased (const_tree var) || TREE_ADDRESSABLE (var))); } +/* Return pointer to optimization flags of FNDECL. */ +static inline struct cl_optimization * +opts_for_fn (const_tree fndecl) +{ + tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl); + if (fn_opts == NULL_TREE) +fn_opts = optimization_default_node; + return TREE_OPTIMIZATION (fn_opts); +} + +/* opt flag for function FNDECL, e.g. opts_for_fn (fndecl, optimize) is + the optimization level of function fndecl. */ +#define opt_for_fn(fndecl, opt) (opts_for_fn (fndecl)-x_##opt) + /* For anonymous aggregate types, we need some sort of name to hold on to. In practice, this should not appear, but it should not be harmful if it does. */ --- gcc/config/i386/i386.c.jj 2014-02-05 10:37:36.166167116 +0100 +++ gcc/config/i386/i386.c 2014-02-05 15:56:36.779146394 +0100 @@ -5608,7 +5608,12 @@ ix86_function_regparm (const_tree type, /* Use register calling convention for local functions when possible. */ if (decl TREE_CODE (decl) == FUNCTION_DECL - optimize + /* Caller and callee must agree on the calling convention, so +checking here just optimize means that with +__attribute__((optimize (...))) caller could use regparm convention +and callee not, or vice versa. Instead look at whether the callee +is optimized or not. */ + opt_for_fn (decl, optimize) !(profile_flag !flag_fentry)) { /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified. */ --- gcc/tree-inline.c.jj2014-02-04 14:03:31.0 +0100 +++ gcc/tree-inline.c 2014-02-05 15:55:34.747448968 +0100 @@ -3315,16 +3315,10 @@ copy_forbidden (struct function *fun, tr goto fail; } - tree fs_opts; - fs_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fun-decl); - if (fs_opts) + if (!opt_for_fn (fun-decl, optimize)) { - struct cl_optimization *os = TREE_OPTIMIZATION (fs_opts); - if (!os-x_optimize) - { - reason = G_(function %q+F compiled without optimizations); - goto fail; - } + reason = G_(function %q+F compiled without optimizations); + goto fail; } fail: --- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 15:55:48.639388697 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr60062.c 2014-02-05 15:55:48.639388697 +0100 @@ -0,0 +1,25 @@ +/* PR target/60062 */ + +int a; + +static void +foo (const char *p1, int p2) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +static void +bar (const char *p1) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +__attribute__((optimize (0))) int +main () +{ + foo (hello, a); + bar (hello); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2014-02-05
RE: regression test issue
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek Sent: Wednesday, February 5, 2014 2:43 PM To: Iyer, Balaji V Cc: Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: regression test issue On Wed, Feb 05, 2014 at 07:38:11PM +, Iyer, Balaji V wrote: +2014-02-05 Balaji V. Iyer balaji.v.i...@intel.com + + * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1 + * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise. Ok. Committed. -Balaji V. Iyer. --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (working copy) @@ -1,6 +1,7 @@ /* { dg-options -fcilkplus } */ /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */ /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ #include assert.h #include unistd.h Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c == = --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 207437) +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do run { target { i?86-*-* x86_64-*-* } } } */ /* { dg-options -fcilkplus } */ /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ OK to install? Jakub
Re: var-tracking and s390's LM(G)
Richard Henderson r...@redhat.com writes: On 02/05/2014 07:58 AM, Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote: I think at the moment we're relying on the DW_CFA_offset 15s to provide correct %r15 values on eh_return. Every non-leaf function saves the stack pointer and the unwind routines track the %r15 save slots like they would track any call-saved register. In other words, the final eh_return %r15 comes directly from a save slot, without any CFA-specific handling. If some frames omit the CFI for the %r15 save then we'll just keep the save slot from the previous (inner) frame instead. CCing Richard Henderson into the discussion. If the sp is saved for a given frame, we'll use that instead of the CFA when unwinding to a previous frame. That should be unaffected by whether or not sp is saved for any particular frame. There appears to be code in uw_install_context_1 to handle a mix of sp-saving and non-sp-saving frames. Given that s390 is pretty much the only target that uses these paths, I suppose it could be broken. But if it is, it would surely be better to fix than just say it doesn't work. OK. When I was looking at it earlier today, the reason it didn't work seemed to be that if %r15 wasn't saved for a particular frame, the context would inherit the %r15 save slot for the previous frame, just like when some functions use and save a particular call-saved register and some don't. So we would only get to uw_install_context_1 without a %r15 slot if none of the frames had one. In other words, the reason seemed to be that the _Unwind_SetGRPtr in: #ifdef EH_RETURN_STACKADJ_RTX /* Special handling here: Many machines do not use a frame pointer, and track the CFA only through offsets from the stack pointer from one frame to the next. In this case, the stack pointer is never stored, so it has no saved address in the context. What we do have is the CFA from the previous stack frame. In very special situations (such as unwind info for signal return), there may be location expressions that use the stack pointer as well. Do this conditionally for one frame. This allows the unwind info for one frame to save a copy of the stack pointer from the previous frame, and be able to use much easier CFA mechanisms to do it. Always zap the saved stack pointer value for the next frame; carrying the value over from one frame to another doesn't make sense. */ _Unwind_SpTmp tmp_sp; if (!_Unwind_GetGRPtr (orig_context, __builtin_dwarf_sp_column ())) _Unwind_SetSpColumn (orig_context, context-cfa, tmp_sp); _Unwind_SetGRPtr (context, __builtin_dwarf_sp_column (), NULL); #endif was conditional on EH_RETURN_STACKADJ_RTX being defined. It looked at first glance like things would work if that call was moved outside, so that we never inherit save slots for the stack pointer. Does that make sense? I can give it a go tomorrow if so. Thanks, Richard
[PATCH] Remove unreachable return stmt (PR c/53123)
This is not a regression, on the other hand it's so obvious that it could go in nevertheless. Ok? 2014-02-05 Marek Polacek pola...@redhat.com PR c/53123 c-family/ * c-omp.c (c_finish_omp_atomic): Remove unreachable return statement. diff --git gcc/c-family/c-omp.c gcc/c-family/c-omp.c index 4ce51e4..dd0a45d 100644 --- gcc/c-family/c-omp.c +++ gcc/c-family/c-omp.c @@ -183,7 +183,6 @@ c_finish_omp_atomic (location_t loc, enum tree_code code, OMP_ATOMIC_SEQ_CST (x) = seq_cst; return build_modify_expr (loc, v, NULL_TREE, NOP_EXPR, loc, x, NULL_TREE); - return x; } /* There are lots of warnings, errors, and conversions that need to happen Marek
Re: [PATCH] Remove unreachable return stmt (PR c/53123)
On Wed, Feb 05, 2014 at 09:58:32PM +0100, Marek Polacek wrote: This is not a regression, on the other hand it's so obvious that it could go in nevertheless. Ok? 2014-02-05 Marek Polacek pola...@redhat.com PR c/53123 c-family/ * c-omp.c (c_finish_omp_atomic): Remove unreachable return statement. Ok, thanks. --- gcc/c-family/c-omp.c +++ gcc/c-family/c-omp.c @@ -183,7 +183,6 @@ c_finish_omp_atomic (location_t loc, enum tree_code code, OMP_ATOMIC_SEQ_CST (x) = seq_cst; return build_modify_expr (loc, v, NULL_TREE, NOP_EXPR, loc, x, NULL_TREE); - return x; } /* There are lots of warnings, errors, and conversions that need to happen Jakub
Re: regression test issue
Iyer, Balaji V balaji.v.i...@intel.com writes: Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc === --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (working copy) @@ -1,6 +1,7 @@ /* { dg-options -fcilkplus } */ /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */ /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ #include assert.h #include unistd.h Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c === --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 207437) +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do run { target { i?86-*-* x86_64-*-* } } } */ /* { dg-options -fcilkplus } */ /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ You should put a comment (e.g. a PR reference) in the comment field to explain why you're skipping this test. Most likely, the last arg to dg-skip-if ({ }) is optional; if so, please omit it. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: var-tracking and s390's LM(G)
Richard Sandiford wrote: In other words, the reason seemed to be that the _Unwind_SetGRPtr in: #ifdef EH_RETURN_STACKADJ_RTX /* Special handling here: Many machines do not use a frame pointer, and track the CFA only through offsets from the stack pointer from one frame to the next. In this case, the stack pointer is never stored, so it has no saved address in the context. What we do have is the CFA from the previous stack frame. In very special situations (such as unwind info for signal return), there may be location expressions that use the stack pointer as well. Do this conditionally for one frame. This allows the unwind info for one frame to save a copy of the stack pointer from the previous frame, and be able to use much easier CFA mechanisms to do it. Always zap the saved stack pointer value for the next frame; carrying the value over from one frame to another doesn't make sense. */ _Unwind_SpTmp tmp_sp; if (!_Unwind_GetGRPtr (orig_context, __builtin_dwarf_sp_column ())) _Unwind_SetSpColumn (orig_context, context-cfa, tmp_sp); _Unwind_SetGRPtr (context, __builtin_dwarf_sp_column (), NULL); #endif was conditional on EH_RETURN_STACKADJ_RTX being defined. It looked at first glance like things would work if that call was moved outside, so that we never inherit save slots for the stack pointer. Does that make sense? I can give it a go tomorrow if so. I haven't looked into this in detail right now, but I recall that I had to specifically add that #ifdef a long time ago since unwinding wouldn't work correctly on s390 otherwise: http://gcc.gnu.org/ml/gcc-patches/2003-05/msg00904.html And the background of the bug here: http://gcc.gnu.org/ml/gcc/2003-05/msg00536.html Actually, now I think the problem originally described there is still valid: on s390 the CFA is *not* equal to the value at function entry, but biased by 96/160 bytes. So setting the SP to the CFA is wrong ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [C++ Patch/RFC] PR 60047
On 02/05/2014 11:19 AM, Paolo Carlini wrote: if (vec_safe_is_empty (vbases)) /* No virtual bases to worry about. */; else if (!assign_p) { if (constexpr_p) *constexpr_p = false; *constexpr_p should be false for a constructor of a class with virtual bases, according to the standard (7.1.5p4): The definition of a constexpr constructor shall satisfy the following constraints: — the class shall not have any virtual base classes; ... So the assert in implicit_declare_fn is wrong. I guess I would fix it by checking CLASSTYPE_VBASECLASSES. Jason
Re: var-tracking and s390's LM(G)
I wrote and forgot to proof-read: Actually, now I think the problem originally described there is still valid: on s390 the CFA is *not* equal to the value at function entry, ... CFA is not equal to the *SP* value at function entry ... but biased by 96/160 bytes. So setting the SP to the CFA is wrong ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
RE: regression test issue
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Rainer Orth Sent: Wednesday, February 5, 2014 4:22 PM To: Iyer, Balaji V Cc: Jakub Jelinek; Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: regression test issue Iyer, Balaji V balaji.v.i...@intel.com writes: Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc == = --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (revision 207437) +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc (working copy) @@ -1,6 +1,7 @@ /* { dg-options -fcilkplus } */ /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */ /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ #include assert.h #include unistd.h Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c == = --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 207437) +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do run { target { i?86-*-* x86_64-*-* } } } */ /* { dg-options -fcilkplus } */ /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-skip-if { *-*-* } { -O1 } { } } */ You should put a comment (e.g. a PR reference) in the comment field to explain why you're skipping this test. Most likely, the last arg to dg-skip-if ({ }) is optional; if so, please omit it. There isn't a PR reference for this. I can add a comment in the testcase. Would that be enough? I will also remove the last {} from the tests. Thanks, Balaji V. Iyer. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [i386] Replace builtins with vector extensions
Hello, I was wondering if the new #pragma target in *mmintrin.h make this approach more acceptable for 4.10? http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00374.html On Sun, 7 Apr 2013, Marc Glisse wrote: Hello, the attached patch is very incomplete (it passes bootstrap+testsuite on x86_64-linux-gnu), but it raises a number of questions that I'd like to settle before continuing. * Is there any chance of a patch in this direction being accepted? * May I remove the builtins (from i386.c and the doc) when they become unused? * Do we want to keep the casts even when they don't seem strictly necessary? For instance for _mm_add_ps, we can write: return __A + __B; or: return (__m128) ((__v4sf)__A + (__v4sf)__B); Note that for _mm_add_epi8 for instance we do need the casts. * For integer operations like _mm_add_epi16 I should probably use the unsigned typedefs to make it clear overflow is well defined? (the patch still has the signed version) * Any better name than __v4su for the unsigned version of __v4si? * Other comments? 2013-04-07 Marc Glisse marc.gli...@inria.fr * emmintrin.h (__v2du, __v4su, __v8hu): New typedefs. (_mm_add_pd, _mm_sub_pd, _mm_mul_pd, _mm_div_pd, _mm_cmpeq_pd, _mm_cmplt_pd, _mm_cmple_pd, _mm_cmpgt_pd, _mm_cmpge_pd, _mm_cmpneq_pd, _mm_add_epi8, _mm_add_epi16, _mm_add_epi32, _mm_add_epi64, _mm_slli_epi16, _mm_slli_epi32, _mm_slli_epi64, _mm_srai_epi16, _mm_srai_epi32, _mm_srli_epi16, _mm_srli_epi32, _mm_srli_epi64): Replace builtins with vector extensions. * xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps, _mm_div_ps, _mm_cmpeq_ps, _mm_cmplt_ps, _mm_cmple_ps, _mm_cmpgt_ps, _mm_cmpge_ps, _mm_cmpneq_ps): Likewise. -- Marc Glisse
Re: regression test issue
On Wed, Feb 05, 2014 at 09:48:12PM +, Iyer, Balaji V wrote: You should put a comment (e.g. a PR reference) in the comment field to explain why you're skipping this test. Most likely, the last arg to dg-skip-if ({ }) is optional; if so, please omit it. There isn't a PR reference for this. I can add a comment in the testcase. Would that be enough? I will also remove the last {} from the tests. Please keep it as is, I have an untested fix for the issue already and will be testing it soon, so I'd prefer not to have to back out too many changes. Jakub
Re: RFA: MN10300: Include saved registers in stack usage
On 02/05/14 11:26, Joseph S. Myers wrote: On Wed, 5 Feb 2014, Jeff Law wrote: However, it's pretty easy to avoid the headaches and just provide a popcount routine. I don't think this code is at all performance critical (once per function being compiled), so a simple popcount should be sufficient. No need for lookup tables IMHO. hwint.[ch] provide popcount_hwi, if HOST_WIDE_INT is a suitable type. Even better. (Personally I think there are lots of bits of standard integer manipulation functionality, such as popcount, that are common in processor instructions and compiler intrinsics, but could do with having bindings in ISO C to provide a standard way to access that functionality. Agreed. Jeff
Re: var-tracking and s390's LM(G)
On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote: Actually, now I think the problem originally described there is still valid: on s390 the CFA is *not* equal to the value at function entry, but biased by 96/160 bytes. So setting the SP to the CFA is wrong ... Such biasing happens on most of the targets, e.g. x86_64 has upon function entry CFA set to %rsp + 8, i?86 to %esp + 4. Jakub
Re: Is testing libgomp outside of the build tree supported?
Am 04.02.2014 03:14, schrieb Mike Stump: On Feb 3, 2014, at 3:52 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Mon, 3 Feb 2014, Andrew Pinski wrote: On Mon, Feb 3, 2014 at 11:14 AM, Diego Novillo dnovi...@google.com wrote: On Mon, Feb 3, 2014 at 2:11 PM, Ian Lance Taylor i...@google.com wrote: If the presence of the build tree makes writing some tests significantly simpler, I think that is OK. I would like to discourage that. Testing an already installed GCC for which no build tree exists is a very useful feature. I agree. Here at Cavium, we use the packaged up toolchain that comes from a RPM and test it so we are testing exactly what we ship out to our customers. Similarly at Mentor. And the maintainer of the test suite thinks that supporting the people that ship gcc to large numbers of people who help ensure the quality of gcc is useful. :-) It is nice to hear from people that this type of testing is useful; thanks. could somebody please shed some light on how this is done? It's nice that everybody has this kind of testing, but the only bit in the gcc sources itself seems to be a bit bit-rot and incomplete (contrib/test_installed). thanks, Matthias
Re: var-tracking and s390's LM(G)
Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote: Actually, now I think the problem originally described there is still valid: on s390 the CFA is *not* equal to the value at function entry, but biased by 96/160 bytes. So setting the SP to the CFA is wrong ... Such biasing happens on most of the targets, e.g. x86_64 has upon function entry CFA set to %rsp + 8, i?86 to %esp + 4. Ah, but that's a different bias. In those cases it is still correct to *unwind* SP to the CFA, since that's the value SP needs to have back in the caller immediately after the call site. On s390, the call/return instructions do not modify the SP so the SP at function entry is equal to the SP in the caller after return, but neither of these is equal to the CFA. Instead, SP points to 96/160 bytes below the CFA ... Therefore you cannot simply unwind SP to the CFA. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] Make pr59597 test PIC-friendly
On 02/05/14 05:32, Ian Bolton wrote: 2014-02-05 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.dg/tree-ssa/pr59597.c: Make called function static so that expected outcome works for PIC variants too. This is fine for stage4. Please install. Thanks, Jeff
Re: Is testing libgomp outside of the build tree supported?
On 02/05/14 15:10, Matthias Klose wrote: Am 04.02.2014 03:14, schrieb Mike Stump: On Feb 3, 2014, at 3:52 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Mon, 3 Feb 2014, Andrew Pinski wrote: On Mon, Feb 3, 2014 at 11:14 AM, Diego Novillo dnovi...@google.com wrote: On Mon, Feb 3, 2014 at 2:11 PM, Ian Lance Taylor i...@google.com wrote: If the presence of the build tree makes writing some tests significantly simpler, I think that is OK. I would like to discourage that. Testing an already installed GCC for which no build tree exists is a very useful feature. I agree. Here at Cavium, we use the packaged up toolchain that comes from a RPM and test it so we are testing exactly what we ship out to our customers. Similarly at Mentor. And the maintainer of the test suite thinks that supporting the people that ship gcc to large numbers of people who help ensure the quality of gcc is useful. :-) It is nice to hear from people that this type of testing is useful; thanks. could somebody please shed some light on how this is done? It's nice that everybody has this kind of testing, but the only bit in the gcc sources itself seems to be a bit bit-rot and incomplete (contrib/test_installed). I suspect most folks have a site.exp they drop somewhere and explicitly call runtest --tool gcc I know we do it internally in our QE team, but I never asked them about the current mechanics of how it's done. Jeff
[PATCH] C++ thunk section names
Hi, I would like this patch reviewed and considered for commit when Stage 1 is active again. Patch Description: A C++ thunk's section name is set to be the same as the original function's section name for which the thunk was created in order to place the two together. This is done in cp/method.c in function use_thunk. However, with function reordering turned on, the original function's section name can change to something like .text.hot.orginal or .text.unlikely.original in function default_function_section in varasm.c based on the node count of that function. The thunk function's section name is not updated to be the same as the original here and also is not always correct to do it as the original function can be hotter than the thunk. I have created a patch to not name the thunk function's section to be the same as the original function when function reordering is enabled. Thanks Sri A C++ thunk's section name is set to be the same as the original function's section name for which the thunk was created in order to place the two together. This is done in cp/method.c in function use_thunk. However, with function reordering turned on, the original function's section name can change to something like .text.hot.orginal or .text.unlikely.original in function default_function_section in varasm.c based on the node count of that function. The thunk function's section name is not updated to be the same as the original here and also is not always correct to do it as the original function can be hotter than the thunk. I have created a patch to not name the thunk function's section to be the same as the original function when function reordering is enabled. Index: testsuite/g++.dg/thunk_section_name.C === --- testsuite/g++.dg/thunk_section_name.C (revision 0) +++ testsuite/g++.dg/thunk_section_name.C (revision 0) @@ -0,0 +1,30 @@ +/* { dg-require-named-sections } */ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-reorder-blocks-and-partition -ffunction-sections } */ + +class base_class_1 +{ +public: + virtual void vfn () {} +}; + +class base_class_2 +{ +public: + virtual void vfn () {} +}; + +class need_thunk_class : public base_class_1, public base_class_2 +{ +public: + virtual void vfn () {} +}; + +int main (int argc, char *argv[]) +{ + base_class_1 *c = new need_thunk_class (); + c-vfn(); + return 0; +} + +/* { dg-final { scan-assembler \.text\._ZThn8_N16need_thunk_class3vfnEv } } */ Index: cp/method.c === --- cp/method.c (revision 207517) +++ cp/method.c (working copy) @@ -362,8 +362,10 @@ use_thunk (tree thunk_fndecl, bool emit_p) { resolve_unique_section (thunk_fndecl, 0, flag_function_sections); - /* Output the thunk into the same section as function. */ - DECL_SECTION_NAME (thunk_fndecl) = DECL_SECTION_NAME (function); + /* Output the thunk into the same section as function if function reordering +is not switched on. */ + if (!flag_reorder_functions) + DECL_SECTION_NAME (thunk_fndecl) = DECL_SECTION_NAME (function); } }
Re: [PATCH] Documentation for dump and optinfo output
I am really sorry about the delay. I couldn't exactly reproduce the warning which you described, but I found a place where two nodes were out of order in optinfo.texi. Could you please apply the following patch and see if the problem goes away? If it works for you, I will commit the doc fixes. Also I would appreciate the exact command which produces these warnings. Thanks, Sharad 2014-02-05 Sharad Singhai sing...@google.com * doc/optinfo.texi: Fix order of nodes. Index: doc/optinfo.texi === --- doc/optinfo.texi (revision 207525) +++ doc/optinfo.texi (working copy) @@ -12,8 +12,8 @@ information about various compiler transformations @menu * Dump setup:: Setup of optimization dumps. * Optimization groups::Groups made up of optimization passes. +* Dump files and streams:: Dump output file names and streams. * Dump output verbosity:: How much information to dump. -* Dump files and streams:: Dump output file names and streams. * Dump types:: Various types of dump functions. * Dump examples:: Sample usage. @end menu Thanks, Sharad On Fri, Jan 24, 2014 at 2:20 AM, Thomas Schwinge tho...@codesourcery.com wrote: Hi! On Thu, 19 Dec 2013 01:44:08 -0800, Sharad Singhai sing...@google.com wrote: I ran 'make doc' and browsed the info output for basic sanity. Okay for trunk? * Makefile.in: Add optinfo.texi. * doc/optinfo.texi: New documentation for optimization info. * doc/passes.texi: Add new node. Thanks for contributing to the documentation! During build, I'm seeing the following warnings; would be great if you could address these: ../../source/gcc/doc/optinfo.texi:45: warning: node next `Optimization groups' in menu `Dump output verbosity' and in sectioning `Dump files and streams' differ ../../source/gcc/doc/optinfo.texi:77: warning: node next `Dump files and streams' in menu `Dump types' and in sectioning `Dump output verbosity' differ ../../source/gcc/doc/optinfo.texi:77: warning: node prev `Dump files and streams' in menu `Dump output verbosity' and in sectioning `Optimization groups' differ ../../source/gcc/doc/optinfo.texi:104: warning: node next `Dump output verbosity' in menu `Dump files and streams' and in sectioning `Dump types' differ ../../source/gcc/doc/optinfo.texi:104: warning: node prev `Dump output verbosity' in menu `Optimization groups' and in sectioning `Dump files and streams' differ ../../source/gcc/doc/optinfo.texi:137: warning: node prev `Dump types' in menu `Dump files and streams' and in sectioning `Dump output verbosity' differ Grüße, Thomas
Re: Is testing libgomp outside of the build tree supported?
On Wed, 5 Feb 2014, Jeff Law wrote: I suspect most folks have a site.exp they drop somewhere and explicitly call runtest --tool gcc * Create site.exp (based on what GCC's makefiles do for build-tree testing). Note that in some cases you may need different contents for different testsuite, especially runtime libraries. * Set up PATH so the installed compilers are found, LD_LIBRARY_PATH so the installed shared libraries are found by the dynamic linker (or otherwise set up your board file appropriately for running newly compiled programs). * runtest --srcdir /some/where --tool whatever If you compare results of this for build-tree and installed testing, fixing differences is very useful. (There are some known bugs for such differences; at least, bugs 23867, 25320, 58867.) -- Joseph S. Myers jos...@codesourcery.com
Re: var-tracking and s390's LM(G)
On 02/05/2014 02:30 PM, Ulrich Weigand wrote: Jakub Jelinek wrote: On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote: Actually, now I think the problem originally described there is still valid: on s390 the CFA is *not* equal to the value at function entry, but biased by 96/160 bytes. So setting the SP to the CFA is wrong ... Such biasing happens on most of the targets, e.g. x86_64 has upon function entry CFA set to %rsp + 8, i?86 to %esp + 4. Ah, but that's a different bias. In those cases it is still correct to *unwind* SP to the CFA, since that's the value SP needs to have back in the caller immediately after the call site. On s390, the call/return instructions do not modify the SP so the SP at function entry is equal to the SP in the caller after return, but neither of these is equal to the CFA. Instead, SP points to 96/160 bytes below the CFA ... Therefore you cannot simply unwind SP to the CFA. I was about to reply the same to Jakub, Ulrich beat me to it. Basically, the CFA has been defined incorrectly for s390, but it hasn't mattered since they record the save of the SP. But the result is that you can't stop recording the save of SP without also adjusting how the CFA is defined. Which _can_ be done, in a way that's fully compatible with all of the existing unwinding. But certainly shouldn't be attempted at this stage of development. r~
New Vietnamese PO file for 'cpplib' (version 4.9-b20140202)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Vietnamese team of translators. The file is available at: http://translationproject.org/latest/cpplib/vi.po (This file, 'cpplib-4.9-b20140202.vi.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Contents of PO file 'cpplib-4.9-b20140202.vi.po'
cpplib-4.9-b20140202.vi.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
New Vietnamese PO file for 'gcc' (version 4.9-b20140202)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Vietnamese team of translators. The file is available at: http://translationproject.org/latest/gcc/vi.po (This file, 'gcc-4.9-b20140202.vi.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] One possible fix for the compute_bb_predicate oscillation (PR ipa/60013)
Hi, this is variant of patch I am testing (thanks Jakub for the hard analysis). The dataflow was supposed to be monotonous by keeping the rule that oldvalue imply new value at each step. This is broken by the approximation done by and/or operations. Instead of dropping to true, it seems easier to simply or the result before updating. This operation should be no-op until we hit the clause limit. Bootstrapping/regtesting x86_64-linux, will commit if it passes. Honza * ipa-inline-analysis.c (compute_bb_predicates): Ensure monotonicity of the dataflow. gcc.dg/pr60013.c: New testcase. Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 207514) +++ ipa-inline-analysis.c (working copy) @@ -310,7 +310,7 @@ add_clause (conditions conditions, struc if (false_predicate_p (p)) return; - /* No one should be sily enough to add false into nontrivial clauses. */ + /* No one should be silly enough to add false into nontrivial clauses. */ gcc_checking_assert (!(clause (1 predicate_false_condition))); /* Look where to insert the clause. At the same time prune out @@ -1035,7 +1035,7 @@ inline_node_removal_hook (struct cgraph_ memset (info, 0, sizeof (inline_summary_t)); } -/* Remap predicate P of former function to be predicate of duplicated functoin. +/* Remap predicate P of former function to be predicate of duplicated function. POSSIBLE_TRUTHS is clause of possible truths in the duplicated node, INFO is inline summary of the duplicated node. */ @@ -1887,8 +1887,15 @@ compute_bb_predicates (struct cgraph_nod } else if (!predicates_equal_p (p, (struct predicate *) bb-aux)) { - done = false; - *((struct predicate *) bb-aux) = p; + /* This OR operation is needed to ensure monotonous data flow +in the case we hit the limit on number of clauses and the +and/or operations above give approximate answers. */ + p = or_predicates (summary-conds, p, (struct predicate *)bb-aux); + if (!predicates_equal_p (p, (struct predicate *) bb-aux)) + { + done = false; + *((struct predicate *) bb-aux) = p; + } } } } Index: testsuite/gcc.dg/pr60013.c === --- testsuite/gcc.dg/pr60013.c (revision 0) +++ testsuite/gcc.dg/pr60013.c (revision 0) @@ -0,0 +1,47 @@ +/* PR ipa/60013 */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +typedef long int jmp_buf[64]; +extern int _setjmp (jmp_buf) __attribute__ ((__nothrow__)); +struct S { int a, b, c; }; +extern struct S *baz (struct S *); +static jmp_buf j; + +static inline int +bar (int b, int d) +{ + return (b d) 0; +} + +struct S * +foo (int a, struct S *b, struct S *c, struct S *d) +{ + if (b-a == 0) +{ + switch (a) + { + case 8: + return baz (b); + case 7: + bar (b-c, c-b); + return 0; + case 6: + case 5: + case 4: + return baz (c); + case 3: + case 2: + return baz (d); + } + return 0; +} + if (b-a == 1) +{ + if (baz (c)) + return c; + else if (_setjmp (j)) + baz (b); +} + return 0; +}
[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062, take 3)
On Wed, Feb 05, 2014 at 08:42:27PM +0100, Jakub Jelinek wrote: So, where do we want to do that instead? E.g. should it be e.g. in tree_versionable_function_p directly and let the inliner (if it doesn't do already) also treat optimize(0) functions that aren't always_inline as noinline? So, another attempt to put the opt_for_fn (fndecl, optimize) into tree_versionable_function_p also failed, because e.g. for TM (but also for SIMD clones) we need to clone -O0 functions. So, can I fix PR600{6,7}2 without touching tree-inline.c and fix PR60026 again separately somehow else as follow-up? Bootstrapped/regtested on x86_64-linux and i686-linux. 2014-02-06 Jakub Jelinek ja...@redhat.com PR target/60062 * tree.h (opts_for_fn): New inline function. (opt_for_fn): Define. * config/i386/i386.c (ix86_function_regparm): Use opt_for_fn (decl, optimize) instead of optimize. * gcc.c-torture/execute/pr60062.c: New test. * gcc.c-torture/execute/pr60072.c: New test. --- gcc/tree.h.jj 2014-01-20 19:16:29.0 +0100 +++ gcc/tree.h 2014-02-05 15:54:04.681904492 +0100 @@ -4470,6 +4470,20 @@ may_be_aliased (const_tree var) || TREE_ADDRESSABLE (var))); } +/* Return pointer to optimization flags of FNDECL. */ +static inline struct cl_optimization * +opts_for_fn (const_tree fndecl) +{ + tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl); + if (fn_opts == NULL_TREE) +fn_opts = optimization_default_node; + return TREE_OPTIMIZATION (fn_opts); +} + +/* opt flag for function FNDECL, e.g. opts_for_fn (fndecl, optimize) is + the optimization level of function fndecl. */ +#define opt_for_fn(fndecl, opt) (opts_for_fn (fndecl)-x_##opt) + /* For anonymous aggregate types, we need some sort of name to hold on to. In practice, this should not appear, but it should not be harmful if it does. */ --- gcc/config/i386/i386.c.jj 2014-02-05 10:37:36.166167116 +0100 +++ gcc/config/i386/i386.c 2014-02-05 15:56:36.779146394 +0100 @@ -5608,7 +5608,12 @@ ix86_function_regparm (const_tree type, /* Use register calling convention for local functions when possible. */ if (decl TREE_CODE (decl) == FUNCTION_DECL - optimize + /* Caller and callee must agree on the calling convention, so +checking here just optimize means that with +__attribute__((optimize (...))) caller could use regparm convention +and callee not, or vice versa. Instead look at whether the callee +is optimized or not. */ + opt_for_fn (decl, optimize) !(profile_flag !flag_fentry)) { /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified. */ --- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 15:55:48.639388697 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr60062.c 2014-02-05 15:55:48.639388697 +0100 @@ -0,0 +1,25 @@ +/* PR target/60062 */ + +int a; + +static void +foo (const char *p1, int p2) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +static void +bar (const char *p1) +{ + if (__builtin_strcmp (p1, hello) != 0) +__builtin_abort (); +} + +__attribute__((optimize (0))) int +main () +{ + foo (hello, a); + bar (hello); + return 0; +} --- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2014-02-05 15:55:48.640388641 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr60072.c 2014-02-05 15:55:48.640388641 +0100 @@ -0,0 +1,16 @@ +/* PR target/60072 */ + +int c = 1; + +__attribute__ ((optimize (1))) +static int *foo (int *p) +{ + return p; +} + +int +main () +{ + *foo (c) = 2; + return c - 2; +} Jakub
[PATCH] Fix two missed Makefile.in dependencies
Hi! I have noticed that with the .deps introduction for gcc/ we've lost these two dependencies, which autodependency creation isn't aware of, as the Makefile runs cat on those files and passes the content through -D option. Ok for trunk? 2014-02-06 Jakub Jelinek ja...@redhat.com * Makefile.in (prefix.o, cppbuiltin.o): Depend on $(BASEVER). --- gcc/Makefile.in.jj 2014-01-28 14:03:49.0 +0100 +++ gcc/Makefile.in 2014-02-05 13:09:26.871019299 +0100 @@ -1925,6 +1925,7 @@ default-c.o: config/default-c.c # Files used by all variants of C and some other languages. CFLAGS-prefix.o += -DPREFIX=\$(prefix)\ -DBASEVER=$(BASEVER_s) +prefix.o: $(BASEVER) # Language-independent files. @@ -2540,6 +2541,7 @@ PREPROCESSOR_DEFINES = \ @TARGET_SYSTEM_ROOT_DEFINE@ CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) +cppbuiltin.o: $(BASEVER) CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES) Jakub
[PATCH] Fix linemap_location_before_p with adhoc locs (PR preprocessor/56824)
Hi! linemap_location_before_p (which is implemented as a call to linemap_compare_locations) broke with the addition of IS_ADHOC_LOC, because it doesn't look through those and thus can happily compare some 0x8... number to normal locus and expect it to work. Most of the other line-map.c routines handle IS_ADHOC_LOC already, just not this function. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? The rest is just formatting cleanup, seeing opening [ at the end of lines was just too much for me. 2014-02-06 Jakub Jelinek ja...@redhat.com PR preprocessor/56824 * line-map.c (get_combined_adhoc_loc, linemap_get_expansion_line, linemap_get_expansion_filename, linemap_location_in_system_header_p, linemap_location_from_macro_expansion_p, linemap_macro_loc_to_spelling_point, linemap_macro_loc_to_def_point, linemap_macro_loc_to_exp_point, linemap_expand_location): Fix formatting. (linemap_compare_locations): Look through adhoc locations for both l0 and l1. * gcc.dg/pr56824.c: New test. --- libcpp/line-map.c.jj2014-01-23 10:53:07.0 +0100 +++ libcpp/line-map.c 2014-02-05 15:07:41.558748559 +0100 @@ -106,8 +106,8 @@ get_combined_adhoc_loc (struct line_maps linemap_assert (data); if (IS_ADHOC_LOC (locus)) -locus = - set-location_adhoc_data_map.data[locus MAX_SOURCE_LOCATION].locus; +locus + = set-location_adhoc_data_map.data[locus MAX_SOURCE_LOCATION].locus; if (locus == 0 data == NULL) return 0; lb.locus = locus; @@ -141,8 +141,8 @@ get_combined_adhoc_loc (struct line_maps } *slot = set-location_adhoc_data_map.data + set-location_adhoc_data_map.curr_loc; - set-location_adhoc_data_map.data[ - set-location_adhoc_data_map.curr_loc++] = lb; + set-location_adhoc_data_map.data[set-location_adhoc_data_map.curr_loc++] + = lb; } return ((*slot) - set-location_adhoc_data_map.data) | 0x8000; } @@ -833,8 +833,8 @@ linemap_get_expansion_line (struct line_ const struct line_map *map = NULL; if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; if (location RESERVED_LOCATION_COUNT) return 0; @@ -861,8 +861,8 @@ linemap_get_expansion_filename (struct l const struct line_map *map = NULL; if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; if (location RESERVED_LOCATION_COUNT) return NULL; @@ -899,8 +899,8 @@ linemap_location_in_system_header_p (str const struct line_map *map = NULL; if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; if (location RESERVED_LOCATION_COUNT) return false; @@ -942,8 +942,8 @@ linemap_location_from_macro_expansion_p source_location location) { if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; linemap_assert (location = MAX_SOURCE_LOCATION (set-highest_location @@ -1024,6 +1024,11 @@ linemap_compare_locations (struct line_m bool pre_virtual_p, post_virtual_p; source_location l0 = pre, l1 = post; + if (IS_ADHOC_LOC (l0)) +l0 = set-location_adhoc_data_map.data[l0 MAX_SOURCE_LOCATION].locus; + if (IS_ADHOC_LOC (l1)) +l1 = set-location_adhoc_data_map.data[l1 MAX_SOURCE_LOCATION].locus; + if (l0 == l1) return 0; @@ -1086,8 +1091,8 @@ linemap_macro_loc_to_spelling_point (str struct line_map *map; if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; linemap_assert (set location = RESERVED_LOCATION_COUNT); @@ -1124,8 +1129,8 @@ linemap_macro_loc_to_def_point (struct l struct line_map *map; if (IS_ADHOC_LOC (location)) -location = set-location_adhoc_data_map.data[ - location MAX_SOURCE_LOCATION].locus; +location = set-location_adhoc_data_map.data[location + MAX_SOURCE_LOCATION].locus; linemap_assert (set location =
[PATCH] Fix various issues in vect_analyze_data_refs (PR middle-end/59150)
Hi! This patch fixes a double free on simd-lane access if get_vectype_for_scalar_type fails (the new dr is already in datarefs[i] and if we free_data_ref it, it will be free_data_ref'ed again at the end of the failed vectorization), fixes handling of clobbers (plugs a memory leak due to missed free_data_ref and more importantly, if we goto again, dr would be kept at the old dr and thus we'd basically remove all the remaining drs in a loop) and plugs a memory leak with simd_lane_access. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? PS, just looking at the patch now again, in the light of PR59594 the reordering of datarefs before goto again looks also wrong, we IMHO have to perform a vector ordered remove in that case, ok to handle it as a follow-up? 2014-02-06 Jakub Jelinek ja...@redhat.com PR middle-end/59150 * tree-vect-data-refs.c (vect_analyze_data_refs): For clobbers, call free_data_ref on the dr first, and before goto again also set dr to the next dr. For simd_lane_access, free old datarefs[i] before overwriting it. For get_vectype_for_scalar_type failure, don't free_data_ref if simd_lane_access. --- gcc/tree-vect-data-refs.c.jj2014-02-05 15:28:10.0 +0100 +++ gcc/tree-vect-data-refs.c 2014-02-05 18:30:53.323659288 +0100 @@ -3297,12 +3297,13 @@ again: clobber stmts during vectorization. */ if (gimple_clobber_p (stmt)) { + free_data_ref (dr); if (i == datarefs.length () - 1) { datarefs.pop (); break; } - datarefs[i] = datarefs.pop (); + datarefs[i] = dr = datarefs.pop (); goto again; } @@ -3643,13 +3644,14 @@ again: if (simd_lane_access) { STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true; + free_data_ref (datarefs[i]); datarefs[i] = dr; } /* Set vectype for STMT. */ scalar_type = TREE_TYPE (DR_REF (dr)); - STMT_VINFO_VECTYPE (stmt_info) = -get_vectype_for_scalar_type (scalar_type); + STMT_VINFO_VECTYPE (stmt_info) + = get_vectype_for_scalar_type (scalar_type); if (!STMT_VINFO_VECTYPE (stmt_info)) { if (dump_enabled_p ()) @@ -3669,7 +3671,8 @@ again: if (gather || simd_lane_access) { STMT_VINFO_DATA_REF (stmt_info) = NULL; - free_data_ref (dr); + if (gather) + free_data_ref (dr); } return false; } Jakub