[PATCH] Fix loop pattern distribution ICE (PR tree-optimization/54321)
Hi! The middle-end argument of memset is signed (int), so simplify_builtin_call correctly checks host_integerp (val2, 0), but later on used tree_low_cst (val2, 1), so for negative values it would ICE. Fixed thusly, the memset is supposed to cast the int to unsigned char internally anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-08-20 Jakub Jelinek ja...@redhat.com PR tree-optimization/54321 * tree-ssa-forwprop.c (simplify_builtin_call): Pass 0 instead of 1 as second argument to tree_low_cst call on val2. * gcc.c-torture/compile/pr54321.c: New test. --- gcc/tree-ssa-forwprop.c.jj 2012-08-14 08:45:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2012-08-20 08:11:06.247936035 +0200 @@ -1554,7 +1554,7 @@ simplify_builtin_call (gimple_stmt_itera else src_buf[0] = tree_low_cst (src1, 0); memset (src_buf + tree_low_cst (diff, 1), - tree_low_cst (val2, 1), tree_low_cst (len2, 1)); + tree_low_cst (val2, 0), tree_low_cst (len2, 1)); src_buf[src_len] = '\0'; /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str handle embedded '\0's. */ --- gcc/testsuite/gcc.c-torture/compile/pr54321.c.jj2012-08-20 08:12:10.955630873 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr54321.c 2012-08-20 08:13:27.963398948 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/54321 */ +struct S { char s[0]; } *a; + +void +foo (void) +{ + char *b = a-s; + int c = 0; + b[0] = 0; + while (++c 9) +b[c] = 255; +} Jakub
Re: [PATCH][C++] Save memory and reallocations in name-lookup
On Sat, 18 Aug 2012, Dimitrios Apostolou wrote: Hi, On Fri, 17 Aug 2012, Jakub Jelinek wrote: On Fri, Aug 17, 2012 at 06:41:37AM -0500, Gabriel Dos Reis wrote: I am however concerned with: static void store_bindings (tree names, VEC(cxx_saved_binding,gc) **old_bindings) { ! static VEC(tree,heap) *bindings_need_stored = NULL; I would be more comfortable to see the cache be on per-scope (e.g. namespace scope) basis as opposed a blanket global cache stored in a global variable. It is not any kind of cache. It could be in theory an automatic variable vector pointer, it is only used during that function. The reason why it is static variable instead is just to avoid constant allocation/deallocation of the vector, this way after the first call it will be already allocated (but, upon entry to store_bindings will always be empty). Why not use stack vector of sufficient size for most cases then? I usually do something like: VEC (tree, stack) *bindings_need_stored = VEC_alloc (tree, stack, 32); ... VEC_free (tree, stack, bindings_need_stored); if I've measured that 32 (or whatever) is enough for most cases. I don't have a clue about this case though. I considered that, but I'm not sure it would be an improvement. We'd have a re-allocation and free in some cases then. Richard.
[PATCH] Fix PR54326
The following should fix PR54326 (3.4.6 at least compiles genoutput.c without warnings after this). Bootstrap running on x86_64-unknown-linux-gnu. Richard. 2012-08-20 Richard Guenther rguent...@suse.de PR bootstrap/54326 * genoutput.c (note_constraint): Properly use CONST_CAST. Index: gcc/genoutput.c === --- gcc/genoutput.c (revision 190523) +++ gcc/genoutput.c (working copy) @@ -1175,7 +1175,7 @@ note_constraint (rtx exp, int lineno) } } new_cdata = XNEWVAR (struct constraint_data, sizeof (struct constraint_data) + namelen); - strcpy ((char *)new_cdata + offsetof(struct constraint_data, name), name); + strcpy (CONST_CAST(char *, new_cdata-name), name); new_cdata-namelen = namelen; new_cdata-lineno = lineno; new_cdata-next_this_letter = *slot;
Re: alloc_pool for tree-ssa-pre.c:phi_translate_table
On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou ji...@gmx.net wrote: 2012-08-19 Dimitrios Apostolou ji...@gmx.net * gcc/tree-ssa-pre.c (phi_translate_pool): New static global alloc_pool, used for allocating struct expr_pred_trans_d for phi_translate_table. (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of malloc() and free() calls. This avoids lots of malloc/free calls and slow iterations during numerous htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as soon as a post C++ one is available. Ok, if bootstrap / testing still succeeds. Thanks, Richard. Thanks, Dimitris
Re: CXX conversion: min g++ version pre-requisite?
On Sun, Aug 19, 2012 at 8:44 PM, Gary Funck g...@intrepid.com wrote: I filed two bug reports: GCC install document does not list minimum required g++ version http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54324 GCC does not build with G++ version 3.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54326 Re: the latter bug report (54326), it might be further divided into two bug reports: one documenting the failure to build with g++ 3.4.0 and the other having to do with the use of casts in genoutput.c. Let me know if you recommend that I separate the bug reports. Does it still fail, even after fixing genoutput.c? If so, yes, please open a bug for the compile-with-3.4.6(!) issue. No, I think 3.4._0_ should not be what we should be after. Richard. Richard. - Gary
Re: [PATCH] Fix PR 52631 (VN does not use simplified expression for lookup)
On Mon, Aug 20, 2012 at 6:49 AM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: On Wed, Jul 25, 2012 at 4:39 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Jul 24, 2012 at 5:50 PM, Andrew Pinski andrew.pin...@caviumnetworks.com wrote: Hi, Before tuples was introduced, VN used to lookup the simplified expression to see if it was available already and use that instead of the non simplified one. This patch adds the support back to VN to do exactly that. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. I think this should be done for all RHS and SSA name LHS, not only for UNARY/BINARY/TERNARY - because even for SINGLE rhs we can end up simplifying (for REALPART_EXPR for example which we handle as nary, too). I think we constrain try_to_simplify enough so that + /* First try to lookup the simplified expression. */ + if (simplified valid_gimple_rhs_p (simplified)) + { + tree result = vn_nary_op_lookup (simplified, NULL); + if (result) + { + changed = set_ssa_val_to (lhs, result); + goto done; + } + changed = set_ssa_val_to (lhs, lhs); + vn_nary_op_insert (simplified, lhs); + } switch (get_gimple_rhs_class (code)) { case GIMPLE_UNARY_RHS: case GIMPLE_BINARY_RHS: ... should work. As you also insert the simplified variant I think we really (finally) want to have a valid_nary_op routine rather than relying on valid_gimple_rhs_p which is way too generic. I don't see valid_gimple_rhs_p being that generic as it checks to make sure the operands of the gimple are valid. Maybe I am missing something here though. valid_gimple_rhs_p checks what it says. But what we want to know is whether the rhs is valid for a SCCVN NARY. Those are not the same. Richard. Thanks, Andrew Pinski Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * tree-ssa-sccvn.c (visit_use): Look up the simplified expression before visting the original one. * gcc.dg/tree-ssa/ssa-fre-9.c: Update the number of eliminatations that happen.
Re: [PATCH] Fix loop pattern distribution ICE (PR tree-optimization/54321)
On Mon, Aug 20, 2012 at 8:27 AM, Jakub Jelinek ja...@redhat.com wrote: Hi! The middle-end argument of memset is signed (int), so simplify_builtin_call correctly checks host_integerp (val2, 0), but later on used tree_low_cst (val2, 1), so for negative values it would ICE. Fixed thusly, the memset is supposed to cast the int to unsigned char internally anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Yes, sorry for the mixup. Thanks, Richard. 2012-08-20 Jakub Jelinek ja...@redhat.com PR tree-optimization/54321 * tree-ssa-forwprop.c (simplify_builtin_call): Pass 0 instead of 1 as second argument to tree_low_cst call on val2. * gcc.c-torture/compile/pr54321.c: New test. --- gcc/tree-ssa-forwprop.c.jj 2012-08-14 08:45:00.0 +0200 +++ gcc/tree-ssa-forwprop.c 2012-08-20 08:11:06.247936035 +0200 @@ -1554,7 +1554,7 @@ simplify_builtin_call (gimple_stmt_itera else src_buf[0] = tree_low_cst (src1, 0); memset (src_buf + tree_low_cst (diff, 1), - tree_low_cst (val2, 1), tree_low_cst (len2, 1)); + tree_low_cst (val2, 0), tree_low_cst (len2, 1)); src_buf[src_len] = '\0'; /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str handle embedded '\0's. */ --- gcc/testsuite/gcc.c-torture/compile/pr54321.c.jj2012-08-20 08:12:10.955630873 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr54321.c 2012-08-20 08:13:27.963398948 +0200 @@ -0,0 +1,12 @@ +/* PR tree-optimization/54321 */ +struct S { char s[0]; } *a; + +void +foo (void) +{ + char *b = a-s; + int c = 0; + b[0] = 0; + while (++c 9) +b[c] = 255; +} Jakub
[PATCH] GTY chain_next annotate gimple_statement_base
This creates better marking code (even though we probably tail-recurse for the exising one). Bootstrapped and tested on x86-64-unknown-linux-gnu, applied. Richard. 2012-08-20 Richard Guenther rguent...@suse.de * gimple.h (gimple_statement_base): Annotate with GTY chain_next. Index: gcc/gimple.h === --- gcc/gimple.h(revision 190523) +++ gcc/gimple.h(working copy) @@ -151,7 +151,7 @@ typedef struct /* Data structure definitions for GIMPLE tuples. NOTE: word markers are for 64 bit hosts. */ -struct GTY(()) gimple_statement_base { +struct GTY((chain_next (%h.next))) gimple_statement_base { /* [ WORD 1 ] Main identifying code for a tuple. */ ENUM_BITFIELD(gimple_code) code : 8;
Re: Speedups/Cleanups: End of GSOC patch collection
Dimitrios Apostolou ji...@gmx.net a écrit: [...] * include/libiberty.h (XOBDELETE, XOBGROW, XOBGROWVEC, XOBSHRINK) (XOBSHRINKVEC, XOBFINISH): New type-safe macros for obstack operations. (XOBFINISH): Changed to return (T *) instead of T. All callers updated. * libcpp/include/symtab.h (obstack_chunk_alloc) (obstack_chunk_free): Define, so that obstack_init() can be used. * libcpp/internal.h (struct cset_converter): Same. * libcpp/files.c (_cpp_init_files): Changed _obstack_begin() to obstack_init(). * libcpp/identifiers.c (_cpp_init_hashtable): Same. * libcpp/symtab.c (ht_create): Same. * libcpp/init.c (cpp_create_reader): Same. [...] +++ libcpp/include/symtab.h 2012-08-18 15:07:01 + [...] +#ifndef obstack_chunk_alloc Please add a comment here, as you did bellow in hunk for libcpp/internal.h: +#ifndef obstack_chunk_alloc + /* Needed for calling obstack_init(). */ + #define obstack_chunk_alloc(void *(*) (long)) xmalloc + #define obstack_chunk_free (void (*) (void *)) free +#endif + #define obstack_chunk_alloc(void *(*) (long)) xmalloc + #define obstack_chunk_free (void (*) (void *)) free +#endif [...] With these changes, the libcpp parts look OK to me if they still boostrap post c++ conversion. I am not a maintainer so I a deferring to Tom and the other maintainers. Thanks. -- Dodji
Re: alloc_pool for tree-ssa-pre.c:phi_translate_table
On Mon, Aug 20, 2012 at 09:37:39AM +0200, Richard Guenther wrote: On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou ji...@gmx.net wrote: 2012-08-19 Dimitrios Apostolou ji...@gmx.net * gcc/tree-ssa-pre.c (phi_translate_pool): New static global alloc_pool, used for allocating struct expr_pred_trans_d for phi_translate_table. (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of malloc() and free() calls. This avoids lots of malloc/free calls and slow iterations during numerous htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as soon as a post C++ one is available. Ok, if bootstrap / testing still succeeds. I'd note for all the recently posted patches from Dimitrios, the gcc/ prefix doesn't belong to the ChangeLog entry pathnames, the filenames are relative to the corresponding ChangeLog location. Jakub
Re: alloc_pool for tree-ssa-pre.c:phi_translate_table
On Mon, 20 Aug 2012, Jakub Jelinek wrote: I'd note for all the recently posted patches from Dimitrios, the gcc/ prefix doesn't belong to the ChangeLog entry pathnames, the filenames are relative to the corresponding ChangeLog location. Ah sorry, it's what the mklog utility generates, it seems I got carried away over-using it. :-) Dimitris
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
Well, it should store the largest working set in BBs, or one that came from a much longer run. But I see the issue you are pointing out. The num_counters (the number of hot bbs) should be reasonable, as the total number of BBs is the same between all runs, and I want to show the largest or more dominant working set in terms of the number of hot bbs. But the min_bb_counter will become more and more inaccurate as the number of runs increases, since the counter values are accumulated. Yes and that is the one that we plan to use to determine hot/cold decisions on, right? Note that there is no really 1-1 corespondence in betwen BBs and counters. For each function the there should be num_edges-num_bbs+1 counters. What do you plan to use BB counts for? I typed up a detailed email below on why getting this right would be difficult, but then I realized there might be a fairly simple accurate solution, which I'll describe first: The only way I see to do this completely accurately is to take two passes through the existing gcda files that we are merging into, one to read in all the counter values and merge them into all the counter values in the arrays from the current run (after which we can do the histogramming/working set computation accurately from the merged counters), and the second to rewrite them. In libgcov this doesn't seem like it would be too difficult to do, although it is a little bit of restructuring of the main merging loop and needs some special handling for buffered functions (which could increase the memory footprint a bit if there are many of these since they will all need to be buffered across the iteration over all the gcda files). The summary merging in coverage.c confuses me a bit as it seems to be handling the case when there are multiple program summaries in a single gcda file. When would this happen? It looks like the merge handling in libgcov should always produce a single program summary per gcda file. This is something Nathan introduced years back. The idea was IMO to handle more acurately objects linked into multiple binaries. I am not sure if the code really works or worked on some point. The idea, as I recall it, was to produce overall checksum of all objects and have different profile records for each combination. This is not really useful for profile feedback as you generate single object file for all uses, but it might become useful for LTO where you know into which binary you are linking to. I am not really sure it is worth all the infrastructure needed to support this though. Why you don't simply write the histogram into gcov file and don't merge the values here (i.e. doing the cummulation loop in GCC instead of within libgcov)? That doesn't completely solve the problem, unfortunately. The reason is that you don't know which histogram entry contains a particular block each run, and therefore you don't know how to compute the new combined histogram value and index for that bb counter. For example, a particular histogram index may have 5 counters (bbs) in it for one run and 2 counters (bbs) in it for a second run, so the question is how to compute the new entry for all of those bb counters, as the 5 bbs from the first run may or may not be a superset of the 2 from the second run. You could assume that the bbs have the same relative order of hotness between runs, and combine the bb counters accordingly, but there is still some trickiness because you have to apportion the cumulative counter sum stored in the histogram entry between new entries. For example, assume the highest indexed non-zero entries (the histogram buckets containing the largest counter values) in the two incoming histograms are: histogram 1: index 100: 4 counters, cumulative value X, min counter value minx ... histogram 2: index 100: 2 counters, cumulative value Y, min counter value miny index 99: 3 counters, cumulative value W, min counter value minw ... To merge, you could assume something like 2 counters with a new cumulative value (Y + X*2/4), and new min counter value minx+miny, that go into the merged histogram with the index corresponding to counter value minx+miny. And then 2 counters have a new cumulative value (W*2/3 + X*2/4) and new min counter value minx+minw, that go into the merged histogram with index corresponding to counter value minw+minx. Etc... Not entirely accurate, although it might be a reasonable approximation, but it also requires a number of division operations during the merge in libgcov. Another possibility, that might also provide a reasonable approximation, would be to scale the min_bb_counter values in the working sets by the sum_all_merged/sum_all_orig, where sum_all_merged is the new sum_all, and sum_all_orig is the sum_all from the summary whose working_set was chosen to be propagated to the new merged summary. This also requires some divisions at libgcov merge time, unless we
Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)
Tejas Belagod wrote: Ulrich Weigand wrote: The following patch implements this idea; it passes a basic regression test on arm-linux-gnueabi. (Obviously this would need a lot more testing on various platforms before getting into mainline ...) Can you have a look whether this fixes the problem you're seeing? Sorry for the delay in replying. Thanks for the patch. I tried this patch - it doesn't seem to reach as far as cleanup_subreg_operands (), but fails an assertion in push_reload () in reload.c:1307. I'm currently investigating this and will let you know the reason soon. Hi, Sorry for the delay in replying. These new issues that I was seeing were bugs specific to my code that I've fixed. Your patch seems to work fine. Thanks a lot for the patch. Thanks, Tejas Belagod. ARM.
[PATCH] Fix PR54327
This is an issue of folding looking at SSA def stmts when SSA form is not up-to-date. That's not safe unless the SSA name we are looking at is not marked for update (see GIMPLE_COND folding in cfgcleaup for another example). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-08-20 Richard Guenther rguent...@suse.de PR tree-optimization/54327 * gimple-fold.c (get_maxval_strlen): Do not walk use-def chains if the use is registered for SSA update. * gcc.dg/torture/pr54327.c: New testcase. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 190523) --- gcc/gimple-fold.c (working copy) *** get_maxval_strlen (tree arg, tree *lengt *** 736,741 --- 736,746 return true; } + /* If ARG is registered for SSA update we cannot look at its defining + statement. */ + if (name_registered_for_update_p (arg)) + return false; + /* If we were already here, break the infinite cycle. */ if (!bitmap_set_bit (visited, SSA_NAME_VERSION (arg))) return true; Index: gcc/testsuite/gcc.dg/torture/pr54327.c === *** gcc/testsuite/gcc.dg/torture/pr54327.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr54327.c (working copy) *** *** 0 --- 1,15 + /* { dg-do compile } */ + + #include string.h + #include stdlib.h + void treathead () + { + char *a = ';' == '\0' ? : 0; + if (*a == '=') + { + while (*a == (*a == 0) || *a == '\'') + a++; + if (strlen (a) 2) + abort (); + } + }
Re: [PATCH][C++] Get rid of TREE_CHAIN use for TREE_VEC (NON_DEFAULT_TEMPLATE_ARGS_COUNT)
On Fri, 17 Aug 2012, Richard Guenther wrote: This gets rid of this field, pushing it into a short int in tree_base (hopefully 2^16 non-defaulted template args are enough ...). The existing code is a bit confusing because of the differences with respect to ENABLE_CHECKING, it has very little testcase coverage as well (a single testcase, g++.dg/template/error39.C), so I am not sure I got the idea correct (especially as ENABLE_CHECKING vs. non-ENABLE_CHECKING looks to introduce subtle differences?). Anyway, a previous iteration that failed g++.dg/template/error39.C bootstrapped and tested all languages ok, the following version now passes g++.dg/template/error39.C and hopefully bootstrap and regtest as well. We build a _lot_ of TREE_VECs from the C++ frontend, so this 8 byte saving is quite visible. TREE_VEC is the most used TREE_CODE by far on PR54146: ... ssa_name 363597 mem_ref 403966 addr_expr1203839 tree_list1205721 tree_vec 2654415 which translates to 109615896 bytes allocated in TREE_VECs (yes, that includes the actual storage, the average size is 41.3 bytes, with this patch TREE_VEC itself shrinks from 40 bytes to 32 bytes which means the average TREE_VEC size is one!). Let's hope removing TREE_TYPE from TREE_VEC is as easy ;) It turns out the C++ frontend is again the only user. It gets used for DECL_PRIMARY_TEMPLATE. Now, the C++ fronend indeed seems to want to associate information with the template argument vector itself, not the template. As the patch removing TREE_CAHIN is in some sense not an improvement (C++ specific bits remain in the generic tree structure, blocking from the VEC size being moved to spare bits in tree_base), the DECL_PRIMARY_TEMPLATE usage shows that using a TREE_VEC to represent the template argument is the issue. We cannot simply derive from TREE_VEC because it relies on storage allocated at its tail, so instead a C++ specific tree node mimicking a TREE_VEC but sporting extra fields, including ones for the number of non-defaulted args and for DECL_PRIMARY_TEMPLATE would be the way to go. That node would derive from tree_base. This should be pretty straight-forward to do, but I'll push it back for now due to lack of time. So consider this patch withdrawn. You'll instead see me trying to move the length field into tree_base. Richard. Re-bootstrap and regtest running on x86_64-unknown-linux-gnu. Ok for the C++ parts? Thanks, Richard. 2012-08-17 Richard Guenther rguent...@suse.de * tree.h (struct tree_base): Put address_space into a union, alongside a new field non_default_template_args_count used by C++ TREE_VECs. (struct tree_vec): Derive from tree_typed instead of tree_common. (TYPE_ADDR_SPACE): Adjust. * tree.c (initialize_tree_contains_struct): Adjust tree type hierarchy. cp/ * cp-tree.h (NON_DEFAULT_TEMPLATE_ARGS_COUNT): Remove. (SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT, GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Use non_default_template_args_count in tree_base instead of TREE_CHAIN. * pt.c (expand_template_argument_pack): Adjust. (template_parm_to_arg): Likewise. (current_template_args): Likewise. (coerce_template_parameter_pack): Likewise. (coerce_template_parms): Likewise. (tsubst_template_args): Likewise. (type_unification_real): Likewise. * parser.c (cp_parser_template_argument_list): Likewise. Index: gcc/tree.h === *** gcc/tree.h(revision 190469) --- gcc/tree.h(working copy) *** struct GTY(()) tree_base { *** 453,465 unsigned packed_flag : 1; unsigned user_align : 1; unsigned nameless_flag : 1; ! unsigned spare : 12; ! ! /* This field is only used with type nodes; the only reason it is present ! in tree_base instead of tree_type is to save space. The size of the ! field must be large enough to hold addr_space_t values. */ ! unsigned address_space : 8; }; struct GTY(()) tree_typed { --- 453,473 unsigned packed_flag : 1; unsigned user_align : 1; unsigned nameless_flag : 1; + unsigned spare0 : 4; ! /* The following has to be at a 16-bit alignment boundary to be properly ! packed and not make tree_base larger than 64 bits. */ ! union tree_base_spare_bits { ! /* This field is only used with type nodes; the only reason it is present !in tree_base instead of tree_type is to save space. The size of the !field must be large enough to hold addr_space_t values. */ ! unsigned char address_space; ! ! /* This field is only used with tree_vec nodes by the C++ frontend; the !only reason it is present in tree_base instead of tree_vec is to !save space. */ ! unsigned
Re: [wwwdocs] Update Fortran secrion in 4.8/changes.html
On Tue, 14 Aug 2012, Tobias Burnus wrote: I have committed the patch as obvious, however, I am happy for any comments. I went ahead and made some smaller changes, patch below. I noticed you are using q.../q, as in qcodee/code/q, which we usually don't. Why that? Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.15 diff -u -3 -p -r1.15 changes.html --- changes.html19 Aug 2012 19:48:02 - 1.15 +++ changes.html20 Aug 2012 10:32:20 - @@ -92,18 +92,20 @@ by this change./p (re)allocation in hot loops. (For arrays, replacing qcodevar=/code/q by qcodevar(:)=/code/q disables the automatic reallocation.)/li -liReading floating point numbers which use qcodeq/code/q for the -exponential (such as code4.0q0/code) is now supported as vendor +lipReading floating point numbers which use qcodeq/code/q for +the exponential (such as code4.0q0/code) is now supported as vendor extension for better compatibility with old data files. It is strongly recommended to use for I/O the equivalent but standard conforming -qcodee/code/q (such as code4.0e0/code). [For the Fortran +qcodee/code/q (such as code4.0e0/code)./p + +p(For Fortran source code, consider replacing the qcodeq/code/q in floating-point literals by a kind parameter (e.g. code4.0e0_qp/code -with a suitable codeqp/code). Note that ndash; in the Fortran +with a suitable codeqp/code). Note that ndash; in Fortran source code ndash; replacing qcodeq/code/q by a simple -qcodee/code/q is emnot/em equivalent.]/li +qcodee/code/q is emnot/em equivalent.)/p/li -liThe codeGFORTRAN_TMPDIR/code environment variable, for specifying +liThe codeGFORTRAN_TMPDIR/code environment variable for specifying a non-default directory for files opened with codeSTATUS=SCRATCH/code, is not used anymore. Instead gfortran checks the POSIX/GNU standard codeTMPDIR/code environment variable. If codeTMPDIR/code is not
Re: [SH] PR 54089 - Add support for rotcr insn
Oleg Endo oleg.e...@t-online.de wrote: This adds support for SH's rotcr insn. Tested on rev 190459 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? OK. Regards, kaz
Re: [SH] PR 51244 - Use more zero displacement branches
Oleg Endo oleg.e...@t-online.de wrote: This adds two new patterns to undo an optimization that is done by ifcvt and is not beneficial if zero displacement branches are available on SH. Tested on rev 190459 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? OK. Regards, kaz
Check for ffs declaration in gcc/configure.ac
Building a cross compiler from i686-mingw32 to arm-none-eabi fails after the move to build as C++ because config/arm/neon.md uses ffs but no MinGW header declares ffs (and unlike in C, an implicit declaration of this built-in function can't be used). This patch fixes this issue by making the gcc directory configure check for a declaration, so causing libiberty.h to provide one if needed. Tested building a cross compiler as described above (fails before the patch, passes afterwards). OK to commit? 2012-08-20 Joseph Myers jos...@codesourcery.com * configure.ac (ffs): Check for declaration. * configure, config.in: Regenerate. Index: configure === --- configure (revision 190529) +++ configure (working copy) @@ -10288,7 +10288,7 @@ for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \ strsignal strstr stpcpy strverscmp \ errno snprintf vsnprintf vasprintf malloc realloc calloc \ - free basename getopt clock getpagesize clearerr_unlocked feof_unlocked ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked putchar_unlocked putc_unlocked + free basename getopt clock getpagesize ffs clearerr_unlocked feof_unlocked ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked putchar_unlocked putc_unlocked do ac_tr_decl=`$as_echo HAVE_DECL_$ac_func | $as_tr_cpp` { $as_echo $as_me:${as_lineno-$LINENO}: checking whether $ac_func is declared 5 Index: config.in === --- config.in (revision 190529) +++ config.in (working copy) @@ -621,6 +621,12 @@ #endif +/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */ +#ifndef USED_FOR_TARGET +#undef HAVE_DECL_FFS +#endif + + /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise define to 0. */ #ifndef USED_FOR_TARGET Index: configure.ac === --- configure.ac(revision 190529) +++ configure.ac(working copy) @@ -1075,7 +1075,7 @@ gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \ strsignal strstr stpcpy strverscmp \ errno snprintf vsnprintf vasprintf malloc realloc calloc \ - free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[ + free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[ #include ansidecl.h #include system.h]) -- Joseph S. Myers jos...@codesourcery.com
Re: Check for ffs declaration in gcc/configure.ac
On Mon, Aug 20, 2012 at 1:28 PM, Joseph S. Myers jos...@codesourcery.com wrote: Building a cross compiler from i686-mingw32 to arm-none-eabi fails after the move to build as C++ because config/arm/neon.md uses ffs but no MinGW header declares ffs (and unlike in C, an implicit declaration of this built-in function can't be used). This patch fixes this issue by making the gcc directory configure check for a declaration, so causing libiberty.h to provide one if needed. Tested building a cross compiler as described above (fails before the patch, passes afterwards). OK to commit? Ok. Thanks, Richard. 2012-08-20 Joseph Myers jos...@codesourcery.com * configure.ac (ffs): Check for declaration. * configure, config.in: Regenerate. Index: configure === --- configure (revision 190529) +++ configure (working copy) @@ -10288,7 +10288,7 @@ for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \ strsignal strstr stpcpy strverscmp \ errno snprintf vsnprintf vasprintf malloc realloc calloc \ - free basename getopt clock getpagesize clearerr_unlocked feof_unlocked ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked putchar_unlocked putc_unlocked + free basename getopt clock getpagesize ffs clearerr_unlocked feof_unlocked ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked putchar_unlocked putc_unlocked do ac_tr_decl=`$as_echo HAVE_DECL_$ac_func | $as_tr_cpp` { $as_echo $as_me:${as_lineno-$LINENO}: checking whether $ac_func is declared 5 Index: config.in === --- config.in (revision 190529) +++ config.in (working copy) @@ -621,6 +621,12 @@ #endif +/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */ +#ifndef USED_FOR_TARGET +#undef HAVE_DECL_FFS +#endif + + /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise define to 0. */ #ifndef USED_FOR_TARGET Index: configure.ac === --- configure.ac(revision 190529) +++ configure.ac(working copy) @@ -1075,7 +1075,7 @@ gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \ strsignal strstr stpcpy strverscmp \ errno snprintf vsnprintf vasprintf malloc realloc calloc \ - free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[ + free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[ #include ansidecl.h #include system.h]) -- Joseph S. Myers jos...@codesourcery.com
Re: combine permutations in gimple
On Sat, Aug 18, 2012 at 11:59 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, here is a new patch (passes bootstrap+regtest), which only combines permutations if the result is the identity. I'll file a PR about more general combinations to link to this conversation and the cost hook proposition. Ok? Ok. Thanks, Richard. 2012-08-18 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_ternary_loc): Detect identity permutations. Canonicalize permutations more. * tree-ssa-forwprop.c (is_combined_permutation_identity): New function. (simplify_permutation): Likewise. (ssa_forward_propagate_and_combine): Call it. gcc/testsuite/ * gcc.dg/tree-ssa/forwprop-19.c: New testcase. * gcc.dg/fold-perm.c: Likewise. -- Marc Glisse Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 190500) +++ gcc/fold-const.c(working copy) @@ -14148,58 +14148,96 @@ fold_ternary_loc (location_t loc, enum t return fold_build2_loc (loc, PLUS_EXPR, type, const_binop (MULT_EXPR, arg0, arg1), arg2); if (integer_zerop (arg2)) return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); return fold_fma (loc, type, arg0, arg1, arg2); case VEC_PERM_EXPR: if (TREE_CODE (arg2) == VECTOR_CST) { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i; + unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask; unsigned char *sel = XALLOCAVEC (unsigned char, nelts); tree t; bool need_mask_canon = false; + bool all_in_vec0 = true; + bool all_in_vec1 = true; + bool maybe_identity = true; + bool single_arg = (op0 == op1); + bool changed = false; + mask = single_arg ? (nelts - 1) : (2 * nelts - 1); gcc_assert (nelts == VECTOR_CST_NELTS (arg2)); for (i = 0; i nelts; i++) { tree val = VECTOR_CST_ELT (arg2, i); if (TREE_CODE (val) != INTEGER_CST) return NULL_TREE; - sel[i] = TREE_INT_CST_LOW (val) (2 * nelts - 1); + sel[i] = TREE_INT_CST_LOW (val) mask; if (TREE_INT_CST_HIGH (val) || ((unsigned HOST_WIDE_INT) TREE_INT_CST_LOW (val) != sel[i])) need_mask_canon = true; + + if (sel[i] nelts) + all_in_vec1 = false; + else + all_in_vec0 = false; + + if ((sel[i] (nelts-1)) != i) + maybe_identity = false; + } + + if (maybe_identity) + { + if (all_in_vec0) + return op0; + if (all_in_vec1) + return op1; } if ((TREE_CODE (arg0) == VECTOR_CST || TREE_CODE (arg0) == CONSTRUCTOR) (TREE_CODE (arg1) == VECTOR_CST || TREE_CODE (arg1) == CONSTRUCTOR)) { t = fold_vec_perm (type, arg0, arg1, sel); if (t != NULL_TREE) return t; } + if (all_in_vec0) + op1 = op0; + else if (all_in_vec1) + { + op0 = op1; + for (i = 0; i nelts; i++) + sel[i] -= nelts; + need_mask_canon = true; + } + + if (op0 == op1 !single_arg) + changed = true; + if (need_mask_canon arg2 == op2) { tree *tsel = XALLOCAVEC (tree, nelts); tree eltype = TREE_TYPE (TREE_TYPE (arg2)); for (i = 0; i nelts; i++) tsel[i] = build_int_cst (eltype, sel[i]); - t = build_vector (TREE_TYPE (arg2), tsel); - return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, t); + op2 = build_vector (TREE_TYPE (arg2), tsel); + changed = true; } + + if (changed) + return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, op2); } return NULL_TREE; default: return NULL_TREE; } /* switch (code) */ } /* Perform constant folding and related simplification of EXPR. The related simplifications include x*1 = x, x*0 = 0, etc., Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 190500) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -2570,20 +2570,109 @@ combine_conversions (gimple_stmt_iterato gimple_assign_set_rhs_code (stmt, CONVERT_EXPR); update_stmt (stmt); return remove_prop_source_from_use (op0) ? 2 : 1; } } } return 0; } +/* Determine whether applying the 2
Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate
On Fri, Aug 17, 2012 at 7:05 PM, Richard Earnshaw rearn...@arm.com wrote: On 17/08/12 16:20, Richard Earnshaw wrote: Ok, in which case we have to give is_widening_mult_rhs_p enough smarts to not strip (s32)u32 and return u32. I'll have another think about it. Take two. This version should address your concerns about handling (u32)u16 * (u32)u16 - u64 We now look at each operand directly, but when doing so we check whether the operand is the same size as the result or not. When it is, we can strip any conversion; when it isn't the conversion must preserve signedness of the inner operand and mustn't be a narrowing conversion. * tree-ssa-math-opts.c (widening_mult_conversion_strippable_p): New function. (is_widening_mult_rhs_p): Use it. Testing underway (again) OK? Ok. Thanks, Richard. R.
[PATCH] Remove register_new_name_mapping, do less timevar push/pop
This performs timevar push/pop at the single remaining entry to incremental SSA rewrite setup. The other entry, register_new_name_mapping, is removed by making its only other user use create_new_def_for. Which needs to handle a NULL DEF for this case, as we have no access to a DEF op for an SSA name LHS (but we do for a PHI node). Something Micha may change soon. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-08-20 Richard Guenther rguent...@suse.de * tree-flow.h (register_new_name_mapping): Remove. * tree-into-ssa.c (register_new_name_mapping): Likewise. (add_new_name_mapping): Do not push/pop timevar here. (create_new_def_for): Instead do it here. Initialize update-ssa here, handle a NULL def. * tree-vrp.c (build_assert_expr_for): Use create_new_def_for. Index: gcc/tree-flow.h === *** gcc/tree-flow.h (revision 190525) --- gcc/tree-flow.h (working copy) *** void release_defs_bitset (bitmap toremov *** 516,522 /* In tree-into-ssa.c */ void update_ssa (unsigned); void delete_update_ssa (void); - void register_new_name_mapping (tree, tree); tree create_new_def_for (tree, gimple, def_operand_p); bool need_ssa_update_p (struct function *); bool name_registered_for_update_p (tree); --- 516,521 Index: gcc/tree-into-ssa.c === *** gcc/tree-into-ssa.c (revision 190525) --- gcc/tree-into-ssa.c (working copy) *** static VEC(gimple_vec, heap) *phis_to_re *** 111,125 static bitmap blocks_with_phis_to_rewrite; /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES. These sets need !to grow as the callers to register_new_name_mapping will typically !create new names on the fly. FIXME. Currently set to 1/3 to avoid !frequent reallocations but still need to find a reasonable growth !strategy. */ #define NAME_SETS_GROWTH_FACTOR (MAX (3, num_ssa_names / 3)) /* The function the SSA updating data structures have been initialized for. !NULL if they need to be initialized by register_new_name_mapping. */ static struct function *update_ssa_initialized_fn = NULL; /* Global data to attach to the main dominator walk structure. */ --- 111,125 static bitmap blocks_with_phis_to_rewrite; /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES. These sets need !to grow as the callers to create_new_def_for will create new names on !the fly. !FIXME. Currently set to 1/3 to avoid frequent reallocations but still !need to find a reasonable growth strategy. */ #define NAME_SETS_GROWTH_FACTOR (MAX (3, num_ssa_names / 3)) /* The function the SSA updating data structures have been initialized for. !NULL if they need to be initialized by create_new_def_for. */ static struct function *update_ssa_initialized_fn = NULL; /* Global data to attach to the main dominator walk structure. */ *** add_to_repl_tbl (tree new_tree, tree old *** 587,594 static void add_new_name_mapping (tree new_tree, tree old) { - timevar_push (TV_TREE_SSA_INCREMENTAL); - /* OLD and NEW_TREE must be different SSA names for the same symbol. */ gcc_assert (new_tree != old SSA_NAME_VAR (new_tree) == SSA_NAME_VAR (old)); --- 587,592 *** add_new_name_mapping (tree new_tree, tre *** 613,620 respectively. */ SET_BIT (new_ssa_names, SSA_NAME_VERSION (new_tree)); SET_BIT (old_ssa_names, SSA_NAME_VERSION (old)); - - timevar_pop (TV_TREE_SSA_INCREMENTAL); } --- 611,616 *** delete_update_ssa (void) *** 2842,2857 /* Create a new name for OLD_NAME in statement STMT and replace the !operand pointed to by DEF_P with the newly created name. Return !the new name and register the replacement mapping NEW, OLD in update_ssa's tables. */ tree create_new_def_for (tree old_name, gimple stmt, def_operand_p def) { ! tree new_name = duplicate_ssa_name (old_name, stmt); ! SET_DEF (def, new_name); if (gimple_code (stmt) == GIMPLE_PHI) { --- 2838,2865 /* Create a new name for OLD_NAME in statement STMT and replace the !operand pointed to by DEF_P with the newly created name. If DEF_P !is NULL then STMT should be a GIMPLE assignment. !Return the new name and register the replacement mapping NEW, OLD in update_ssa's tables. */ tree create_new_def_for (tree old_name, gimple stmt, def_operand_p def) { ! tree new_name; ! timevar_push (TV_TREE_SSA_INCREMENTAL); ! ! if (!update_ssa_initialized_fn) ! init_update_ssa (cfun); ! ! gcc_assert (update_ssa_initialized_fn == cfun); ! ! new_name = duplicate_ssa_name (old_name, stmt); ! if (def) ! SET_DEF (def, new_name); ! else ! gimple_assign_set_lhs (stmt,
Re: [graphds.h] Allocate graph from obstack
Il 19/08/2012 18:55, Richard Guenther ha scritto: Initially I had one obstack per struct graph, which was better than using XNEW for every edge, but still obstack_init() called from new_graph() was too frequent. So in this iteration of the patch the obstack is static global, initialised once and never freed. Please advise on whether this is acceptable, and also where I should initialise the obstack once, and avoid checking if it's NULL in every use. Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot, I'll retest soon and post update. Either an obstack per graph or the ability to specify an obstack for allocation. A global obstack with global lifetime is bad IMHO. Dimitrios's patch has a per-file obstack with per-pass lifetime, which I think is the right solution---but putting graph_obstack as a static variable in graphds.h is gly. You can just move the declaration to each file separately, and give it a better name perhaps (e.g. loop_graph_obstack). Paolo
Re: [graphds.h] Allocate graph from obstack
On Mon, Aug 20, 2012 at 2:03 PM, Paolo Bonzini bonz...@gnu.org wrote: Il 19/08/2012 18:55, Richard Guenther ha scritto: Initially I had one obstack per struct graph, which was better than using XNEW for every edge, but still obstack_init() called from new_graph() was too frequent. So in this iteration of the patch the obstack is static global, initialised once and never freed. Please advise on whether this is acceptable, and also where I should initialise the obstack once, and avoid checking if it's NULL in every use. Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot, I'll retest soon and post update. Either an obstack per graph or the ability to specify an obstack for allocation. A global obstack with global lifetime is bad IMHO. Dimitrios's patch has a per-file obstack with per-pass lifetime, which I think is the right solution---but putting graph_obstack as a static variable in graphds.h is gly. You can just move the declaration to each file separately, and give it a better name perhaps (e.g. loop_graph_obstack). Well, I see that the way it is constructed is that you can at most have a single live graph at the same time (using the same obstack). That's a big limitation IMHO. Now, if the users would manage the obstack completely and instead would obstack_init()/free() them that would be different. Of course the outcome is exactly as with the initial patch having the obstack per struct graph. The present patch has too much low-level stuff exposed and does not easily allow putting other data on the same obstack. So I'd vote for managing the obstack completely in the caller for flexibility. Some may want to allocate auxiliar data on the same obstack for example. Richard. Paolo
[PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base
This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from 80 bytes to 72 bytes on a 64bit host. Both structures suffer from the fact they need storage for an integer (length and version) which leaves unused padding. Both data structures do not require as many flag bits as we keep in tree_base though, so they can conveniently use the upper 4-bytes of the 8-bytes tree_base to store length / version. I added a union to tree_base to divide up the space between flags (possibly) used for all tree kinds and flags that are not used for those who chose to re-use the upper 4-bytes of tree_base for something else. This superseeds the patch that moved the C++ specific usage of TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC isn't any closer to be based on tree_base only). Due to re-use of flags from frontends definitive checking for flag accesses is not always possible (TREE_NOTRHOW for example). Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead, to catch mis-uses of the C++ frontend. Changed ARGUMENT_PACK_INCOMPLETE_P to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 then which it used on TREE_VECs. We are very lazy adjusting flag usage documentation :/ Bootstrap and regtest pending on x86_64-unknown-linux-gnu. Any comments? Thanks, Richard. 2012-08-20 Richard Guenther rguent...@suse.de cp/ * cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor. (ATTR_IS_DEPENDENT): Likewise. (ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 on TREE_VECs. * tree.h (struct tree_base): Add union to make it possible to re-use the upper 4 bytes for tree codes that do not need as many flags as others. Move visited and default_def_flag to common bits section in exchange for saturating_flag and unsigned_flag. Add SSA name version and tree vec length fields here. (struct tree_vec): Remove length field here. (struct tree_ssa_name): Remove version field here. Index: trunk/gcc/cp/cp-tree.h === *** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200 --- trunk/gcc/cp/cp-tree.h 2012-08-20 13:53:05.212969994 +0200 *** struct GTY((variable_size)) lang_decl { *** 2520,2530 /* In a TREE_LIST concatenating using directives, indicate indirect directives */ ! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0) /* In a TREE_LIST in an attribute list, indicates that the attribute must be applied at instantiation time. */ ! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0) extern tree decl_shadowed_for_var_lookup (tree); extern void decl_shadowed_for_var_insert (tree, tree); --- 2520,2530 /* In a TREE_LIST concatenating using directives, indicate indirect directives */ ! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE)) /* In a TREE_LIST in an attribute list, indicates that the attribute must be applied at instantiation time. */ ! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE)) extern tree decl_shadowed_for_var_lookup (tree); extern void decl_shadowed_for_var_insert (tree, tree); *** extern void decl_shadowed_for_var_insert *** 2881,2887 arguments will be placed into the beginning of the argument pack, but additional arguments might still be deduced. */ #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\ ! TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE)) /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template arguments used to fill this pack. */ --- 2881,2887 arguments will be placed into the beginning of the argument pack, but additional arguments might still be deduced. */ #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\ ! TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE)) /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template arguments used to fill this pack. */ Index: trunk/gcc/tree.h === *** trunk.orig/gcc/tree.h 2012-08-20 12:47:47.0 +0200 --- trunk/gcc/tree.h2012-08-20 13:56:12.109963537 +0200 *** struct GTY(()) tree_base { *** 427,435 unsigned addressable_flag : 1; unsigned volatile_flag : 1; unsigned readonly_flag : 1; - unsigned unsigned_flag : 1; unsigned asm_written_flag: 1; unsigned nowarning_flag : 1; unsigned used_flag : 1; unsigned nothrow_flag : 1; --- 427,435 unsigned addressable_flag : 1; unsigned volatile_flag : 1; unsigned readonly_flag : 1; unsigned asm_written_flag: 1; unsigned nowarning_flag : 1; + unsigned visited : 1; unsigned used_flag : 1; unsigned nothrow_flag : 1; *** struct GTY(()) tree_base { *** 438,465
Re: [wwwdocs] SH 4.8 changes update
On Mon, 20 Aug 2012, Oleg Endo wrote: You mean, since SH is neither a primary nor a secondary platform, there are no particular release criteria for it? What does that actually mean? From what I recall, you can make stage 1 type changes even after stage 1. However, if your port is broken, the release managers will shrug their shoulders and release even if your port is quite broken due to that. Related to your web page update, I just comitted the patch below which allows for direct linking to the SH entry. Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.16 diff -u -3 -p -r1.16 changes.html --- changes.html20 Aug 2012 11:02:28 - 1.16 +++ changes.html20 Aug 2012 12:22:52 - @@ -159,7 +159,7 @@ by this change./p options to the assembler./li /ul -h3SH/h3 +h3 id=shSH/h3 ul liThe default alignment settings have been reduced to be less aggressive. This results in more compact code for optimization levels other than
Re: [wwwdocs] SH 4.8 changes update
On Mon, Aug 20, 2012 at 1:37 AM, Oleg Endo oleg.e...@t-online.de wrote: On Sun, 2012-08-19 at 21:43 +0200, Gerald Pfeifer wrote: On Sun, 19 Aug 2012, Oleg Endo wrote: Thanks. Let's hope that I can squeeze in some more stuff while stage 1 lasts. :T You know that for backend-specific changes (especially for smaller ports) you actually have some more leeway? You mean, since SH is neither a primary nor a secondary platform, there are no particular release criteria for it? What does that actually mean? It means that release managers do not care about the state of the SH port. It means SH target maintainers have to watch how the release goes and decide when to put effort into testing and fixing bugs themselves, otherwise they may run into the situation that the SH port does not even build for the release. So kind of target maintainers for ports that are not primary or secondary are their own release managers without any control over release timing. Richard.
Re: [graphds.h] Allocate graph from obstack
Hi Paolo, On Mon, 20 Aug 2012, Paolo Bonzini wrote: Il 19/08/2012 18:55, Richard Guenther ha scritto: Initially I had one obstack per struct graph, which was better than using XNEW for every edge, but still obstack_init() called from new_graph() was too frequent. So in this iteration of the patch the obstack is static global, initialised once and never freed. Please advise on whether this is acceptable, and also where I should initialise the obstack once, and avoid checking if it's NULL in every use. Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot, I'll retest soon and post update. Either an obstack per graph or the ability to specify an obstack for allocation. A global obstack with global lifetime is bad IMHO. Dimitrios's patch has a per-file obstack with per-pass lifetime Notice that I never call XOBDELETE with NULL argument, I only free the first object, which means that the 4KB per obstack overhead is retained until program exit, I did that to save on malloc calls. Thanks, Dimitris
Re: [bootstrap] Tentative fix for PR 54281
On 2012-08-19 07:18 , Arnaud Charlet wrote: The conditionals cannot be removed for the time being because the foreign language interface of the Ada part of the compiler is hardcoded for C. Barring a massive switch to the Ada language for the GNU project, we would need to switch the foreign language interface to C++, which might introduce various maintenance issues in the short term (Arno CCed). Yes, that would be a large hearthquake for both the compiler and the run-time, which is unwanted at this stage. I guess the solution for now is to more selectively export the various symbols as C symbols and include header files as is. Thanks. One potential issue I see here is that as we start using more C++ headers (and more C++ in existing headers), we will reach a point in which including something like system.h inside an extern C {} block will fail. None of the GCC headers should be included as C code anymore. Diego.
Re: new sign/zero extension elimination pass
On 11/07/12 12:30, Tom de Vries wrote: On 13/11/10 10:50, Eric Botcazou wrote: I profiled the pass on spec2000: -mabi=32 -mabi=64 ee-pass (usr time): 0.70 1.16 total (usr time): 919.30 879.26 ee-pass(%): 0.08 0.13 The pass takes 0.13% or less of the total usr runtime. For how many hits? What are the numbers with --param ee-max-propagate=0? Is it necessary to improve the runtime of this pass? I've already given my opinion about the implementation. The other passes in the compiler try hard not to rescan everything when a single bit changes; as currently written, yours doesn't. Eric, I've done the following: - refactored the pass such that it now scans at most twice over all instructions. - updated the patch to be applicable to current trunk - updated the motivating example to a more applicable one (as discussed in this thread), and added that one as test-case. - added a part in the header comment illustrating the working of the pass on the motivating example. bootstrapped and reg-tested on x86_64 and i686. build and reg-tested on mips, mips64, and arm. OK for trunk? Eric, does the new patch meet your concerns related to rescanning? If so, OK for trunk? Thanks, - Tom Thanks, - Tom 2012-07-10 Tom de Vries t...@codesourcery.com * ee.c: New file. * tree-pass.h (pass_ee): Declare. * opts.c ( default_options_table): Set flag_ee at -O2. * timevar.def (TV_EE): New timevar. * common.opt (fextension-elimination): New option. * Makefile.in (ee.o): New rule. * passes.c (pass_ee): Add it. * gcc.dg/extend-1.c: New test. * gcc.dg/extend-2.c: Same. * gcc.dg/extend-2-64.c: Same. * gcc.dg/extend-3.c: Same. * gcc.dg/extend-4.c: Same. * gcc.dg/extend-5.c: Same. * gcc.target/mips/octeon-bbit-2.c: Make test more robust.
patch for machine dependent rtl section to hide case statements for different types of constants.
This patch started out to be a purely mechanical change to the switch statements so that the ones that are used to take apart constants can be logically grouped.This is important for the next patch that I will submit this week that frees the rtl level from only being able to represent large integer constants with two HWIs. I sent the patch to Richard Sandiford and when the comments came back from him, this patch turned into something that actually has real semantic changes. (His comments are enclosed below.) I did almost all of Richard's changes because he is generally right about such things, but it does mean that the patch has to be more carefully reviewed. Richard does not count his comments as a review. The patch has, of course, been properly tested on x86-64. Any comments? Ok for commit? Kenny Richard's comments: = The omission of CONST_FIXED from the cselib_expand_value_rtx_1, attr_copy_rtx, clear_struct_flag and combine switches looks unintentional (though only as a missed compiler-speed optimisation). Same goes for the omission of CONST_VECTOR from check_maybe_invariant. The omission of CONST_FIXED from dse.c:const_or_frame_p looks like a missed target-code optimisation. The function ought to be using CONSTANT_P instead. I did not do what is suggested in the last sentence because it changes the behavior of the rtx HIGH. I don't see any reason to prefer the hashing of CONST_VECTORs in hash_invariant_expr_1 and invariant_expr_equal_p over the default cse implementation, so here too I think we can add CONST_VECTOR to the switch. cse.c:exp_equiv_p, rtx_equal_for_memref_p, operands_match_p, rtx_equal_p_cb and rtx_equal_p are all testing the property is pointer equality sufficient for this kind of constant. rtx_renumbered_equal_p is too, so I think the omission of CONST_FIXED there is again unintentional. That leaves mark_jump_label_1. This block is really testing does this rtx have no operands that might be labels, which is again true for CONST_FIXED. TBH, this smacks of premature optimisation to me. How do we know that checking each code here is cheaper than falling through? I doubt the switch is populated enough to merit a jump table, and the number of executed branches might well be higher like this when you take other codes into account. So in terms of setting the abstraction level, I think there are two options: 1) Define: /* Match CONST_*s that can represent compile-time constant integers. */ #define CASE_CONST_SCALAR_INT \ case CONST_INT: \ case CONST_DOUBLE /* Match CONST_*s for which pointer equality corresponds to value equality. */ #define CASE_CONST_UNIQUE \ case CONST_INT: \ case CONST_DOUBLE: \ case CONST_FIXED /* Match all CONST_* rtxes. */ #define CASE_CONST_ANY \ case CONST_INT: \ case CONST_DOUBLE: \ case CONST_FIXED: \ case CONST_VECTOR and remove the mark_jump_label_1 cases. The name CASE_CONST_SCALAR_INT matches SCALAR_INT_MODE_P, CASE_CONST_UNIQUE is solely for the equality functions above. 2) As for (1), but keep the mark_jump_label_1 cases and rename CASE_CONST_UNIQUE to CASE_CONST_LEAF so that mark_jump_label_1 can use it. This implies that all leaf CONST_*s (i.e. those without rtx operands) will always be hashed to give pointer equality. I don't like that assumption, and CASE_CONST_UNIQUE feels like a better abstraction, so I prefer (1) over (2). For avoidance of doubt, this isn't an approval or anything. It definitely needs to be submitted upstream. (Yeah, yeah, sorry for nagging. :-)) Richard == diff -uprN '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c --- gccBaseline/gcc/alias.c 2012-08-17 09:35:24.794195890 -0400 +++ gccWCase/gcc/alias.c 2012-08-19 09:48:33.666509880 -0400 @@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con return XSTR (x, 0) == XSTR (y, 0); case VALUE: -case CONST_INT: -case CONST_DOUBLE: -case CONST_FIXED: +CASE_CONST_UNIQUE: /* There's no need to compare the contents of CONST_DOUBLEs or CONST_INTs because pointer equality is a good enough comparison for these nodes. */ diff -uprN '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c --- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400 +++ gccWCase/gcc/combine.c 2012-08-17 09:36:49.921199621 -0400 @@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc) switch (code) { -case CONST_INT: case CONST: case LABEL_REF: case SYMBOL_REF: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_INTEGER_OR_FLOAT: case CLOBBER: return 0; @@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x) { case LABEL_REF: case SYMBOL_REF: -case CONST_INT: case CONST: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_INTEGER_OR_FLOAT: case PC: case ADDR_VEC: case ADDR_DIFF_VEC: diff -uprN '--exclude=.svn'
Re: patch for machine dependent rtl section to hide case statements for different types of constants.
I of course meant the machine independent not dependent On 08/20/2012 09:50 AM, Kenneth Zadeck wrote: This patch started out to be a purely mechanical change to the switch statements so that the ones that are used to take apart constants can be logically grouped.This is important for the next patch that I will submit this week that frees the rtl level from only being able to represent large integer constants with two HWIs. I sent the patch to Richard Sandiford and when the comments came back from him, this patch turned into something that actually has real semantic changes. (His comments are enclosed below.) I did almost all of Richard's changes because he is generally right about such things, but it does mean that the patch has to be more carefully reviewed. Richard does not count his comments as a review. The patch has, of course, been properly tested on x86-64. Any comments? Ok for commit? Kenny
Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate
Hi Richard, your patch fails here; I get the build failure: /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’: /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable] enum tree_code rhs_code; ^ Tobias On 08/17/2012 07:05 PM, Richard Earnshaw wrote: --- tree-ssa-math-opts.c(revision 190502) +++ tree-ssa-math-opts.c(local) @@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree if (is_gimple_assign (stmt)) { rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) + if (! widening_mult_conversion_strippable_p (type, stmt)) rhs1 = rhs; else {
Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86
On Fri, Aug 17, 2012 at 2:42 AM, Chupin, Pavel V pavel.v.chu...@intel.com wrote: Submitted patch here: https://android-review.googlesource.com/#/c/41705 -- Pavel link.h has been added to AOSP. I am closing PR 54209. Thanks. -- H.J.
Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate
On 20/08/12 15:01, Tobias Burnus wrote: Hi Richard, your patch fails here; I get the build failure: /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’: /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable] enum tree_code rhs_code; ^ Tobias On 08/17/2012 07:05 PM, Richard Earnshaw wrote: --- tree-ssa-math-opts.c (revision 190502) +++ tree-ssa-math-opts.c (local) @@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree if (is_gimple_assign (stmt)) { rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) + if (! widening_mult_conversion_strippable_p (type, stmt)) rhs1 = rhs; else { Whoops! Sorry about that. Fixed thusly. Committed as obvious. PR tree-ssa/54295 * tree-ssa-math-opts.c (is_widening_mult_rhs_p): Delete rhs_code declaration and setter. R. Index: tree-ssa-math-opts.c === --- tree-ssa-math-opts.c(revision 190533) +++ tree-ssa-math-opts.c(working copy) @@ -2011,14 +2011,12 @@ is_widening_mult_rhs_p (tree type, tree { gimple stmt; tree type1, rhs1; - enum tree_code rhs_code; if (TREE_CODE (rhs) == SSA_NAME) { stmt = SSA_NAME_DEF_STMT (rhs); if (is_gimple_assign (stmt)) { - rhs_code = gimple_assign_rhs_code (stmt); if (! widening_mult_conversion_strippable_p (type, stmt)) rhs1 = rhs; else
[C++ Patch] PR 10416
Hi, in this old issue submitter points out that we emit too easily Wunused_variable warnings even when the destructor has side effects (otoh, the constructor is handled Ok). This is particularly annoying together with eg, RAII. Turns out that lately we are already careful when we handle the new Wunused_but_set_variable, thus I'm simply proposing to commonize the additional check. Tested x86_64-linux. Thanks, Paolo. / /cp 2012-08-20 Paolo Carlini paolo.carl...@oracle.com PR c++/10416 * decl.c (poplevel): Check TYPE_HAS_NONTRIVIAL_DESTRUCTOR for Wunused_variable too. /testsuite 2012-08-20 Paolo Carlini paolo.carl...@oracle.com PR c++/10416 * g++.dg/warn/Wunused-var-17.C: New. Index: testsuite/g++.dg/warn/Wunused-var-17.C === --- testsuite/g++.dg/warn/Wunused-var-17.C (revision 0) +++ testsuite/g++.dg/warn/Wunused-var-17.C (revision 0) @@ -0,0 +1,4 @@ +// PR c++/10416 +// { dg-options -Wunused } + +void f () { struct atend { ~atend () { __builtin_printf(leaving f\n); } } a; } Index: cp/decl.c === --- cp/decl.c (revision 190526) +++ cp/decl.c (working copy) @@ -621,16 +621,16 @@ poplevel (int keep, int reverse, int functionbody) if (TREE_CODE (decl) == VAR_DECL (! TREE_USED (decl) || !DECL_READ_P (decl)) ! DECL_IN_SYSTEM_HEADER (decl) - DECL_NAME (decl) ! DECL_ARTIFICIAL (decl)) + DECL_NAME (decl) ! DECL_ARTIFICIAL (decl) + TREE_TYPE (decl) != error_mark_node + (!CLASS_TYPE_P (TREE_TYPE (decl)) + || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl { if (! TREE_USED (decl)) warning (OPT_Wunused_variable, unused variable %q+D, decl); else if (DECL_CONTEXT (decl) == current_function_decl - TREE_TYPE (decl) != error_mark_node TREE_CODE (TREE_TYPE (decl)) != REFERENCE_TYPE - errorcount == unused_but_set_errorcount - (!CLASS_TYPE_P (TREE_TYPE (decl)) - || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl + errorcount == unused_but_set_errorcount) { warning (OPT_Wunused_but_set_variable, variable %q+D set but not used, decl);
[PATCH] PR53992 - openmp lower transaction code
In this PR, OMP lowering is not going into the transaction code. So if GIMPLE_TRANSACTION is found, we lower its body. (Patch also fixes a format issue.) Note that PR53992 component in Bugzilla must be change from c to libgomp (I don't have bugzilla account with admin rights, who should I ask for that?). Tested on trunk / i686. Ok for trunk? Ok to backport to 4.7 branch if no regression? Thanks. gcc/ 2012-08-17 Patrick Marlier patrick.marl...@gmail.com PR libgomp/53992 * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION. Index: omp-low.c === --- omp-low.c (revision 190488) +++ omp-low.c (working copy) @@ -6827,6 +6827,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_cont lower_omp (gimple_try_eval_ptr (stmt), ctx); lower_omp (gimple_try_cleanup_ptr (stmt), ctx); break; +case GIMPLE_TRANSACTION: + lower_omp (gimple_transaction_body_ptr (stmt), ctx); + break; case GIMPLE_BIND: lower_omp (gimple_bind_body_ptr (stmt), ctx); break; @@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool * break; case GIMPLE_COND: - { - tree lab = gimple_cond_true_label (stmt); - if (lab) - { - n = splay_tree_lookup (all_labels, - (splay_tree_key) lab); - diagnose_sb_0 (gsi_p, context, - n ? (gimple) n-value : NULL); - } - lab = gimple_cond_false_label (stmt); - if (lab) - { - n = splay_tree_lookup (all_labels, - (splay_tree_key) lab); - diagnose_sb_0 (gsi_p, context, - n ? (gimple) n-value : NULL); - } - } + { + tree lab = gimple_cond_true_label (stmt); + if (lab) + { + n = splay_tree_lookup (all_labels, + (splay_tree_key) lab); + diagnose_sb_0 (gsi_p, context, + n ? (gimple) n-value : NULL); + } + lab = gimple_cond_false_label (stmt); + if (lab) + { + n = splay_tree_lookup (all_labels, + (splay_tree_key) lab); + diagnose_sb_0 (gsi_p, context, + n ? (gimple) n-value : NULL); + } + } break; case GIMPLE_GOTO: Index: testsuite/gcc.dg/gomp/pr53992.c === --- testsuite/gcc.dg/gomp/pr53992.c (revision 0) +++ testsuite/gcc.dg/gomp/pr53992.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -fgnu-tm -fopenmp } */ +/* { dg-require-effective-target fgnu_tm } */ + +int main() { +long data[1]; +long i, min=1; +for (i=0; i1; i++) data[i] = -i; + +#pragma omp parallel for +for (i=0; i1; i++) { +__transaction_atomic +{ +if (data[i] min) +min = data[i]; +} +} + +return !(min == -); +}
Fix -ftime-report for C++ lookup
Found this while running -ftime-report on a largish C++ source file. We need to start TV_NAME_LOOKUP conditionally inside poplevel() because it may be called from another lookup routine that already has TV_NAME_LOOKUP going. Tested on x86_64. Committed to trunk. 2012-08-20 Diego Novillo dnovi...@google.com * decl.c (poplevel): Start TV_NAME_LOOKUP conditionally. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 5908996..0dad597 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -552,7 +552,7 @@ poplevel (int keep, int reverse, int functionbody) unsigned ix; cp_label_binding *label_bind; - timevar_start (TV_NAME_LOOKUP); + bool subtime = timevar_cond_start (TV_NAME_LOOKUP); restart: block = NULL_TREE; @@ -818,7 +818,7 @@ poplevel (int keep, int reverse, int functionbody) if (kind == sk_cleanup) goto restart; - timevar_stop (TV_NAME_LOOKUP); + timevar_cond_stop (TV_NAME_LOOKUP, subtime); return block; }
Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches
On 17 August 2012 16:55, Joseph S. Myers jos...@codesourcery.com wrote: On Fri, 17 Aug 2012, Simon Baldwin wrote: You could have just added case OPT_cpp_: to the switch in gen_producer_string, instead of all this. Thanks. I was under the impression, apparently mistaken, that OPT_cpp_ exists only if fortran is enabled. OPT_* for Fortran options only exist when the Fortran front-end is in the source tree (whether or not enabled). I think we try to avoid knowingly breaking use cases where people remove some front ends from the source tree, although we don't actively test them and no longer provide split-up source tarballs. Thanks for the update. Which fix should move forwards? -- Google UK Limited | Registered Office: Belgrave House, 76 Buckingham Palace Road, London SW1W 9TQ | Registered in England Number: 3977902
Re: [PATCH] PR53992 - openmp lower transaction code
On 08/20/2012 07:20 AM, Patrick Marlier wrote: 2012-08-17 Patrick Marlier patrick.marl...@gmail.com PR libgomp/53992 * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION. Ok everywhere. The Bugzilla must not change to libgomp, as that is reserved for the runtime library. r~
Re: [PATCH] PR53992 - openmp lower transaction code
On Mon, Aug 20, 2012 at 10:20:33AM -0400, Patrick Marlier wrote: Ok for trunk? Ok to backport to 4.7 branch if no regression? Ok for both, with the following nits resolved: gcc/ 2012-08-17 Patrick Marlier patrick.marl...@gmail.com PR libgomp/53992 Use PR middle-end/53992 instead, libgomp is for library issues only. * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION. @@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool * break; case GIMPLE_COND: - { - tree lab = gimple_cond_true_label (stmt); - if (lab) - { - n = splay_tree_lookup (all_labels, - (splay_tree_key) lab); - diagnose_sb_0 (gsi_p, context, - n ? (gimple) n-value : NULL); - } - lab = gimple_cond_false_label (stmt); - if (lab) - { - n = splay_tree_lookup (all_labels, - (splay_tree_key) lab); - diagnose_sb_0 (gsi_p, context, - n ? (gimple) n-value : NULL); - } - } + { + tree lab = gimple_cond_true_label (stmt); + if (lab) + { + n = splay_tree_lookup (all_labels, +(splay_tree_key) lab); + diagnose_sb_0 (gsi_p, context, +n ? (gimple) n-value : NULL); + } + lab = gimple_cond_false_label (stmt); + if (lab) + { + n = splay_tree_lookup (all_labels, +(splay_tree_key) lab); + diagnose_sb_0 (gsi_p, context, +n ? (gimple) n-value : NULL); + } + } break; case GIMPLE_GOTO: Please leave this hunk out. Formatting can be normally fixed only if you touch the code in question or at least lines around it, not in an unrelated patch that touches completely different function. --- testsuite/gcc.dg/gomp/pr53992.c (revision 0) +++ testsuite/gcc.dg/gomp/pr53992.c (working copy) @@ -0,0 +1,20 @@ Please add /* PR middle-end/53992 */ line to the beginning of the file. +/* { dg-do compile } */ +/* { dg-options -fgnu-tm -fopenmp } */ +/* { dg-require-effective-target fgnu_tm } */ + +int main() { +long data[1]; +long i, min=1; +for (i=0; i1; i++) data[i] = -i; + +#pragma omp parallel for +for (i=0; i1; i++) { +__transaction_atomic +{ +if (data[i] min) +min = data[i]; +} +} + +return !(min == -); +} Jakub
Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches
On Mon, 20 Aug 2012, Simon Baldwin wrote: OPT_* for Fortran options only exist when the Fortran front-end is in the source tree (whether or not enabled). I think we try to avoid knowingly breaking use cases where people remove some front ends from the source tree, although we don't actively test them and no longer provide split-up source tarballs. Thanks for the update. Which fix should move forwards? I think the approach using a new option flag is the way to go, though the patch needs (at least) documentation for the new flag in options.texi. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 2:48 AM, Jan Hubicka hubi...@ucw.cz wrote: Well, it should store the largest working set in BBs, or one that came from a much longer run. But I see the issue you are pointing out. The num_counters (the number of hot bbs) should be reasonable, as the total number of BBs is the same between all runs, and I want to show the largest or more dominant working set in terms of the number of hot bbs. But the min_bb_counter will become more and more inaccurate as the number of runs increases, since the counter values are accumulated. Yes and that is the one that we plan to use to determine hot/cold decisions on, right? Yes, so I agree it is important to do something to update this as profiles are merged. Note that there is no really 1-1 corespondence in betwen BBs and counters. For each function the there should be num_edges-num_bbs+1 counters. What do you plan to use BB counts for? True, although what I am using this for is to get an idea of the size of the working set to help identify large icache footprints in order to control code size increasing optimizations such as unrolling. The idea is that a huge number of hot counters indicates a huge number of hot bbs which in turn indicates a huge number of hot instructions and therefore high icache pressure. We're using it in the google branch to control unrolling successfully. I typed up a detailed email below on why getting this right would be difficult, but then I realized there might be a fairly simple accurate solution, which I'll describe first: The only way I see to do this completely accurately is to take two passes through the existing gcda files that we are merging into, one to read in all the counter values and merge them into all the counter values in the arrays from the current run (after which we can do the histogramming/working set computation accurately from the merged counters), and the second to rewrite them. In libgcov this doesn't seem like it would be too difficult to do, although it is a little bit of restructuring of the main merging loop and needs some special handling for buffered functions (which could increase the memory footprint a bit if there are many of these since they will all need to be buffered across the iteration over all the gcda files). The summary merging in coverage.c confuses me a bit as it seems to be handling the case when there are multiple program summaries in a single gcda file. When would this happen? It looks like the merge handling in libgcov should always produce a single program summary per gcda file. This is something Nathan introduced years back. The idea was IMO to handle more acurately objects linked into multiple binaries. I am not sure if the code really works or worked on some point. The idea, as I recall it, was to produce overall checksum of all objects and have different profile records for each combination. This is not really useful for profile feedback as you generate single object file for all uses, but it might become useful for LTO where you know into which binary you are linking to. I am not really sure it is worth all the infrastructure needed to support this though. Ok, so perhaps for the merging in coverage.c we can do something less accurate, and worry more about the libgcov merging. Why you don't simply write the histogram into gcov file and don't merge the values here (i.e. doing the cummulation loop in GCC instead of within libgcov)? That doesn't completely solve the problem, unfortunately. The reason is that you don't know which histogram entry contains a particular block each run, and therefore you don't know how to compute the new combined histogram value and index for that bb counter. For example, a particular histogram index may have 5 counters (bbs) in it for one run and 2 counters (bbs) in it for a second run, so the question is how to compute the new entry for all of those bb counters, as the 5 bbs from the first run may or may not be a superset of the 2 from the second run. You could assume that the bbs have the same relative order of hotness between runs, and combine the bb counters accordingly, but there is still some trickiness because you have to apportion the cumulative counter sum stored in the histogram entry between new entries. For example, assume the highest indexed non-zero entries (the histogram buckets containing the largest counter values) in the two incoming histograms are: histogram 1: index 100: 4 counters, cumulative value X, min counter value minx ... histogram 2: index 100: 2 counters, cumulative value Y, min counter value miny index 99: 3 counters, cumulative value W, min counter value minw ... To merge, you could assume something like 2 counters with a new cumulative value (Y + X*2/4), and new min counter value minx+miny, that go into the merged histogram with the index corresponding to counter value minx+miny. And then 2 counters have
[DF] RFC: obstacks in DF
Hi, while I was happy using obstacks in other parts of the compiler I thought they would provide a handy solution for the XNEWVECs/XRESIZEVECs in df-scan.c, especially df_install_refs() which is the heaviest malloc() user after the rest of my patches. In the process I realised that obstacks weren't exactly suitable (thanks matz for helping on IRC), nevertheless I have a patch which is stable, a bit faster and more memory friendly (don't know why, but RSS memory for common non-pathological compilations, was smaller). However after trying various incorrect changes I'm aware that there are leaks in there that can't be closed without dirty hacks, so I gave up. I'm posting the simplest but stable version of my changes and would really like to hear ideas. There are holes in the obstack that should have been free'd, but in the end I didn't see memory increase. I don't know what would happen for huge functions that keep the obstack alive for long, I didn't have such testcase. Finally, I think my next try will involve converting the chains to pool_alloc'ed linked list, do you think that makes sense? Thanks, Dimitris === modified file 'gcc/df-scan.c' --- gcc/df-scan.c 2012-07-16 11:32:42 + +++ gcc/df-scan.c 2012-08-20 14:01:59 + @@ -184,6 +184,17 @@ struct df_scan_problem_data bitmap_obstack insn_bitmaps; }; +/* Obstack for storing all of all of + insn_info-{defs,uses,eq_uses,mw_hardregs} and + bb_info-artificial_{defs,uses}. */ +static struct obstack df_refs_obstack; + +/* Keep the obstack initialised (only 4k overhead) and use this pointer to + actually empty it fast. */ +static void *df_first_refs_obj; +static int df_refs_obstack_is_init; + + typedef struct df_scan_bb_info *df_scan_bb_info_t; @@ -193,36 +204,13 @@ df_scan_free_internal (void) { struct df_scan_problem_data *problem_data = (struct df_scan_problem_data *) df_scan-problem_data; - unsigned int i; - basic_block bb; - /* The vectors that hold the refs are not pool allocated because - they come in many sizes. This makes them impossible to delete - all at once. */ - for (i = 0; i DF_INSN_SIZE(); i++) -{ - struct df_insn_info *insn_info = DF_INSN_UID_GET(i); - /* Skip the insns that have no insn_info or have been -deleted. */ - if (insn_info) - { - df_scan_free_ref_vec (insn_info-defs); - df_scan_free_ref_vec (insn_info-uses); - df_scan_free_ref_vec (insn_info-eq_uses); - df_scan_free_mws_vec (insn_info-mw_hardregs); - } -} - - FOR_ALL_BB (bb) -{ - unsigned int bb_index = bb-index; - struct df_scan_bb_info *bb_info = df_scan_get_bb_info (bb_index); - if (bb_info) - { - df_scan_free_ref_vec (bb_info-artificial_defs); - df_scan_free_ref_vec (bb_info-artificial_uses); - } -} + /* Delete at once all vectors that hold the refs (all of + insn_info-{defs,uses,eq_uses,mw_hardregs} and + bb_info-artificial_{defs,uses}) but keep the obstack initialised, so + that it's ready for more BBs. */ + obstack_free (df_refs_obstack, df_first_refs_obj); + df_first_refs_obj = NULL; free (df-def_info.refs); free (df-def_info.begin); @@ -364,6 +352,14 @@ df_scan_alloc (bitmap all_blocks ATTRIBU bb_info-artificial_uses = NULL; } + if (! df_refs_obstack_is_init) +{ + obstack_init (df_refs_obstack); + df_refs_obstack_is_init = 1; +} + gcc_assert (df_first_refs_obj == NULL); + df_first_refs_obj = XOBNEW (df_refs_obstack, void *); + bitmap_initialize (df-hardware_regs_used, problem_data-reg_bitmaps); bitmap_initialize (df-regular_block_artificial_uses, problem_data-reg_bitmaps); bitmap_initialize (df-eh_block_artificial_uses, problem_data-reg_bitmaps); @@ -791,9 +787,15 @@ df_install_ref_incremental (df_ref ref) } ref_rec = *ref_rec_ptr; + /* fprintf (stderr, count:%d ref_rec:%p\n, count, ref_rec); */ if (count) { - ref_rec = XRESIZEVEC (df_ref, ref_rec, count+2); + /* Impossible to actually know if ref_rec was malloc'ed or obstack'ed +thus we might have a leak here! */ + df_ref *ref_rec2 = XOBNEWVEC (df_refs_obstack, df_ref, count+2); + memcpy (ref_rec2, ref_rec, count*sizeof(df_ref)); + /* free (ref_rec); */ + ref_rec = ref_rec2; *ref_rec_ptr = ref_rec; ref_rec[count] = ref; ref_rec[count+1] = NULL; @@ -801,7 +803,7 @@ df_install_ref_incremental (df_ref ref) } else { - df_ref *ref_rec = XNEWVEC (df_ref, 2); + ref_rec = XOBNEWVEC (df_refs_obstack, df_ref, 2); ref_rec[0] = ref; ref_rec[1] = NULL; *ref_rec_ptr = ref_rec; @@ -1070,8 +1072,9 @@ df_ref_chain_delete (df_ref *ref_rec) /* If the list is empty, it has a special shared element that is not to be deleted. */ - if (*start) -free (start); + /* if (*start) */ + /* free (start); */ + /* obstack_free'd
Re: C++ PR 54197: lifetime of reference not properly extended
On Thu, Aug 16, 2012 at 2:13 PM, Gabriel Dos Reis g...@integrable-solutions.net wrote: On Wed, Aug 15, 2012 at 9:52 AM, Ollie Wild a...@google.com wrote: (Adding other C++ maintainers in case someone else wants to have a stab.) Ping? I consider Jason to be the expert on this; so let see what he says. -- Gaby Jason, any idea when you can look at this? The patch is about as short as they come, so it shouldn't take long to review. Thanks, Ollie
Re: [DF] RFC: obstacks in DF
On Mon, Aug 20, 2012 at 4:54 PM, Dimitrios Apostolou ji...@gmx.net wrote: In the process I realised that obstacks weren't exactly suitable (thanks matz for helping on IRC), Right, alloc-pool would be a better choice here. Ciao! Steven
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka hubi...@ucw.cz wrote: The summary merging in coverage.c confuses me a bit as it seems to be handling the case when there are multiple program summaries in a single gcda file. When would this happen? It looks like the merge handling in libgcov should always produce a single program summary per gcda file. AFAIU it merges existing profile data with new profile data from a second (or third, or ...) trial run. Ciao! Steven
Re: patch for machine dependent rtl section to hide case statements for different types of constants.
Kenneth Zadeck zad...@naturalbridge.com writes: The omission of CONST_FIXED from the cselib_expand_value_rtx_1, attr_copy_rtx, clear_struct_flag and combine switches looks unintentional (though only as a missed compiler-speed optimisation). Same goes for the omission of CONST_VECTOR from check_maybe_invariant. The omission of CONST_FIXED from dse.c:const_or_frame_p looks like a missed target-code optimisation. The function ought to be using CONSTANT_P instead. I did not do what is suggested in the last sentence because it changes the behavior of the rtx HIGH. As mentioned privately, that's what we want. 1) Define: /* Match CONST_*s that can represent compile-time constant integers. */ #define CASE_CONST_SCALAR_INT \ case CONST_INT: \ case CONST_DOUBLE /* Match CONST_*s for which pointer equality corresponds to value equality. */ #define CASE_CONST_UNIQUE \ case CONST_INT: \ case CONST_DOUBLE: \ case CONST_FIXED /* Match all CONST_* rtxes. */ #define CASE_CONST_ANY \ case CONST_INT: \ case CONST_DOUBLE: \ case CONST_FIXED: \ case CONST_VECTOR and remove the mark_jump_label_1 cases. I meant that these three should be the _only_ cases we need. The reason I listed all those missed cases was that, with the exception of mark_jump_label_1, the switches really seemed to be testing one of the three conditions above. Richard
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
So I definitely preffer 2 or 3 over 1. David has experience with 3. How well does it work for LIPO? This (lack of locking, races) is not a new problem. There is no synchronization in libgcov for profile update/merge at both thread and process level. Thread level data races leads to inconsistent counters, but can be mostly corrected under -fprofile-correction and smoothing. There is also problem with false indirect call targets --- gcc has mechanism to filter those out. Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. thanks, David Honza Thanks, Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
failed attempt: retain identifier length from frontend to backend
Hello, my last attempt on improving something serious was about three weeks ago, trying to keep all lengths of all strings parsed in the frontend for the whole compilation phase until the assembly output. I was hoping that would help on using faster hashes (knowing the length allows us to hash 4 or 8 bytes per iteration), quicker strcmp in various places, and using less strlen() calls, which show especially on -g3 compilations that store huge macro strings. I'll post no patch here, since what I currently have is a mess in 3 different branches and most don't even compile. I tried various approaches. First I tried adding an extra length parameter in all relevant functions, starting from the assembly generation and working my way upwards. This got too complex, and I'd really like to ask if you find any meaning in changing target specific hooks and macros to actually accept length as argument (e.g. ASM_OUTPUT_*) or return it (e.g. ASM_GENERATE_*). Changes seemed too intrusive for me to continue. But seeing that identifier length is there inside struct ht_identifier (or cpp_hashnode) and not lost, I tried the approach of having the length at str[-4] for all identifiers. To achieve this I changed ht_identifier to store str with the flexible array hack. Unfortunately I hadn't noticed that ht_identifier was a part of tree_node and also part of too many other structs, so changing all those structs to have variable size was not without side effects. In the end it compiled but I got crashes all over, and I'm sure I didn't do things right since I broke things like the static assert in libcpp/identifiers.c, that I don't even understand: /* We don't need a proxy since the hash table's identifier comes first in cpp_hashnode. However, in case this is ever changed, we have a static assertion for it. */ -extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) == 0 ? 1 : -1]; Anyway last attempt was to decouple ht_identifier completely from trees and other structs by storing pointer to it, but I was pretty worn out and quickly gave up after getting errors on gengtype-generated files that I didn't even know how to handle. Was all this project too ambitious? I'd appreciate all input. Thanks, Dimitris
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
If this approach seems like it is feasible, then we could stick with the current approach of emitting the working set array in the summary, mitigating it somewhat by doing the sum_all based scaling of the counter values, then in a follow on patch restructure the merging code to delay the working set computation as described above. +1 David Teresa Honza Thanks, Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH][LIBTOOL] AIX PIC compiler options
The GCC -fpic/-fPIC option has evolved to mean code generation for a shared library and changes the optimization behavior of the compiler. Code for AIX PowerPC always is PIC, but the optimization behavior is affecting AIX. libtool exports all global symbols on AIX while GCC binds_local_p() determines that some function calls are local, causing GCC to emit the local (non-external) form of function calls, which generates AIX linker warnings. The IBM XL compiler defaults to the equivalent of -fpic. GCC could try to match that, but it requires working around other GCC bootstrap problems and continually chasing compatibility. The other option is to assume that developers using GCC to build shared libraries either are using libtool or copying FOSS Makefiles designed for GNU/Linux or expecting GNU/Linux compatibility. The following patch starts to implement the latter by adding -fPIC to the compiler command line options when building objects for a shared library. Because this places control in the hands of the user with a command line option and matches GNU/Linux, this seems better than playing with defaults and fighting GCC semantics for building shared libraries. I also will post this patch upstream to Libtool. Comments? Thanks, David * libtool.m4 (_LT_COMPILER_PIC): Add -fPIC to GCC and GXX for AIX. * configure: Regenerate. Index: libtool.m4 === --- libtool.m4 (revision 190521) +++ libtool.m4 (working copy) @@ -3580,6 +3580,7 @@ # AIX 5 now supports IA64 processor _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' fi + _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' ;; amigaos*) @@ -3891,6 +3892,7 @@ # AIX 5 now supports IA64 processor _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' fi + _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' ;; amigaos*)
Re: [C++ Patch] PR 10416
OK. Jason
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 8:35 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka hubi...@ucw.cz wrote: The summary merging in coverage.c confuses me a bit as it seems to be handling the case when there are multiple program summaries in a single gcda file. When would this happen? It looks like the merge handling in libgcov should always produce a single program summary per gcda file. AFAIU it merges existing profile data with new profile data from a second (or third, or ...) trial run. It looks like libgcov will always merge program summaries between different runs though. As far as I can tell, it will always either rewrite the whole gcda file with a single merged program summary, or abort the write if the merge was not successful. However, the comment above gcov_exit does talk about keeping program summaries separate for object files contained in different programs, which is what Honza was describing as the motivation for the coverage.c merging. But I don't see where multiple program summaries ever get written to the same gcda file - if the checksums are different it seems to abort the write. But the code in coverage.c is dealing with a single gcda file containing multiple program summaries. Is there code somewhere that will cause this to happen? Thanks, Teresa Ciao! Steven -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 7:44 PM, Teresa Johnson tejohn...@google.com wrote: But the code in coverage.c is dealing with a single gcda file containing multiple program summaries. Is there code somewhere that will cause this to happen? Not that I know of, no. (There are enhancement requests for this, see e.g. PR47618). Ciao! Steven
Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target
Hi Tobias, do you think that -Wtarget-lifetime should be included with -Wall? I think so, because the code flagged is certainly invalid, and likely to cause random errors. Thomas
Re: [wwwdocs] Document Runtime CPU detection builtins
Ping. On Tue, Aug 14, 2012 at 11:02 AM, Sriraman Tallam tmsri...@google.com wrote: +ger...@pfiefer.com On Tue, Aug 14, 2012 at 10:51 AM, Sriraman Tallam tmsri...@google.com wrote: Hi Gerald, Is this release note alright? Thanks, -Sri. On Fri, Aug 10, 2012 at 7:20 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, I have added a release note for x86 builtins __builtin_cpu_is and __builtin_cpu_supports. They were checked in to trunk in rev. 186789. Is this ok to submit? Thanks, -Sri.
Re: [wwwdocs] Document Runtime CPU detection builtins
On Fri, Aug 10, 2012 at 10:20 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, I have added a release note for x86 builtins __builtin_cpu_is and __builtin_cpu_supports. They were checked in to trunk in rev. 186789. Is this ok to submit? I would not include such detailed documentation in changes.html. I assume that all this documentation, including caveats and limitations is documented in the manual itself? If that's the case, then simply mention the builtins, an overview description of about a paragraph and pointers to the user documentation for limitations and caveats. Diego.
Re: [Google 4.7] Generate pubnames compatible with gdb-index version 7. (issue6459099)
On Thu, Aug 16, 2012 at 5:32 PM, Cary Coutant ccout...@google.com wrote: +/* Output a single entry in the pubnames table. */ + +static void +output_pubname (dw_offset die_offset, pubname_entry *entry) For this function, I'd suggest a comment to the effect that the logic is lifted from GDB. @@ -2424,6 +2424,10 @@ gpubnames Common RejectNegative Var(debug_generate_pub_sections, 1) Generate DWARF pubnames and pubtypes sections. +ggnu-pubnames +Common RejectNegative Var(debug_generate_pub_sections, 2) +Generate DWARF pubnames and pubtypes sections. Instead of RejectNegative, I think these three options should now use Negative(...) flags (each one naming the next, circularly). Not sure about that, though. (See the treatment of -gdwarf-, -gstabs, etc.) OK for google/gcc-4_7 branch. Committed with your recommended changes, as attached. Sterling patch.diff Description: Binary data
Re: [PATCH] Fix -fcompare-debug failure in fwprop (PR rtl-optimization/54294)
Jakub Jelinek ja...@redhat.com writes: 2012-08-17 Jakub Jelinek ja...@redhat.com PR rtl-optimization/54294 * fwprop.c (all_uses_available_at): Ignore debug insns in between def_insn and target_insn when checking whether the shortcut is possible. OK, thanks. Richard
Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target
Hi Thomas, Thomas Koenig wrote: do you think that -Wtarget-lifetime should be included with -Wall? I think so, because the code flagged is certainly invalid, and likely to cause random errors. I concur. However, that's what the current version in the trunk does: -Wall implies -Wtarget-lifetime. On the other hand, I just realized that attr.result is not set if the function symbols is also its result symbol – hence, there is no warning for: function f() integer, pointer :: f integer, target :: t f = t end I think the following patch will fix that. I will package, regtest and commit it later. Tobias --- a/gcc/fortran/expr.c +++ b/gcc/fortran/expr.c @@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue) warn = lvalue-symtree-n.sym-attr.dummy || lvalue-symtree-n.sym-attr.result +|| lvalue-symtree-n.sym-attr.function || lvalue-symtree-n.sym-attr.host_assoc || lvalue-symtree-n.sym-attr.use_assoc || lvalue-symtree-n.sym-attr.in_common;
[SPARC] Define MAX_FIXED_MODE_SIZE
SPARC is now one of the last mainstream 64-bit platforms that do not define MAX_FIXED_MODE_SIZE to TImode. Doing so helps the Ada compiler in particular because it is a heavy user of structures made up of a pair of pointers. Bootstrapped/regtested/compat-regtested on SPARC64/Solaris, applied on the mainline. 2012-08-20 Eric Botcazou ebotca...@adacore.com * config/sparc/sparc.h (MAX_FIXED_MODE_SIZE): Define. -- Eric Botcazou Index: config/sparc/sparc.h === --- config/sparc/sparc.h (revision 190512) +++ config/sparc/sparc.h (working copy) @@ -475,7 +475,6 @@ extern enum cmodel sparc_cmodel; #endif /* Now define the sizes of the C data types. */ - #define SHORT_TYPE_SIZE 16 #define INT_TYPE_SIZE 32 #define LONG_TYPE_SIZE (TARGET_ARCH64 ? 64 : 32) @@ -512,7 +511,6 @@ extern enum cmodel sparc_cmodel; #define SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 TARGET_STACK_BIAS) /* ALIGN FRAMES on double word boundaries */ - #define SPARC_STACK_ALIGN(LOC) \ (TARGET_ARCH64 ? (((LOC)+15) ~15) : (((LOC)+7) ~7)) @@ -551,6 +549,10 @@ extern enum cmodel sparc_cmodel; : MAX ((COMPUTED), (SPECIFIED))) \ : MAX ((COMPUTED), (SPECIFIED))) +/* An integer expression for the size in bits of the largest integer machine + mode that should actually be used. We allow pairs of registers. */ +#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_ARCH64 ? TImode : DImode) + /* We need 2 words, so we can save the stack pointer and the return register of the function containing a non-local goto target. */ #define STACK_SAVEAREA_MODE(LEVEL) \
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
Xinliang David Li davi...@google.com writes: Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. You could just use lockf() ? -Andi
[PATCH] Fix valtrack ICE (PR debug/53923)
Hi! On the testcase from this PR on AVR (from libgcc, thus not including it into testsuite/) we ICE, because dead_debug_insert_temp is called several times on the same insn, for multi-register hard register for each regno in it (except the first which doesn't seem to be dead). In the first call dead_debug_insert_temp changes *DF_REF_REAL_LOC to DEBUG_EXPR, and in the next call for the next consecutive hard register we set reg variable to *DF_REF_REAL_LOC and rely on it to be a REG, when it is a DEBUG_EXPR already instead. No change is needed in that case. Bootstrapped/regtested on x86_64-linux and i686-linux (where this never kicks in though), tested on the testcase in cross to avr. Ok for trunk? 2012-08-20 Jakub Jelinek ja...@redhat.com PR debug/53923 * valtrack.c (dead_debug_insert_temp): Drop non-reg uses from the chain. --- gcc/valtrack.c.jj 2012-08-10 12:57:21.0 +0200 +++ gcc/valtrack.c 2012-08-20 14:59:07.609586159 +0200 @@ -336,6 +336,14 @@ dead_debug_insert_temp (struct dead_debu { if (DF_REF_REGNO (cur-use) == uregno) { + /* If this loc has been changed e.g. to debug_expr already +as part of a multi-register use, just drop it. */ + if (!REG_P (*DF_REF_REAL_LOC (cur-use))) + { + *tailp = cur-next; + XDELETE (cur); + continue; + } *usesp = cur; usesp = cur-next; *tailp = cur-next; Jakub
[Patch, Fortran, commit] -Wtarget-lifetime: Fix omission
Committed as Rev. 190542 Tobias Index: gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90 === --- gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90 (Revision 0) +++ gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90 (Arbeitskopie) @@ -0,0 +1,10 @@ +! { dg-do compile } +! { dg-options -Wtarget-lifetime } +! +! PR fortran/54301 +! +function f() + integer, pointer :: f + integer, target :: t + f = t ! { dg-warning Pointer at .1. in pointer assignment might outlive the pointer target } +end Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (Revision 190541) +++ gcc/testsuite/ChangeLog (Arbeitskopie) @@ -1,3 +1,8 @@ +2012-08-20 Tobias Burnus bur...@net-b.de + + PR fortran/54301 + * gfortran.dg/warn_target_lifetime_2.f90: New. + 2012-08-20 Paolo Carlini paolo.carl...@oracle.com PR c++/10416 Index: gcc/fortran/expr.c === --- gcc/fortran/expr.c (Revision 190541) +++ gcc/fortran/expr.c (Arbeitskopie) @@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex warn = lvalue-symtree-n.sym-attr.dummy || lvalue-symtree-n.sym-attr.result + || lvalue-symtree-n.sym-attr.function || lvalue-symtree-n.sym-attr.host_assoc || lvalue-symtree-n.sym-attr.use_assoc || lvalue-symtree-n.sym-attr.in_common; Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 190541) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,6 +1,12 @@ 2012-08-20 Tobias Burnus bur...@net-b.de PR fortran/54301 + * expr.c (gfc_check_pointer_assign): Warn when a pointer, + which is a function result, might outlive its target. + +2012-08-20 Tobias Burnus bur...@net-b.de + + PR fortran/54301 * expr.c (gfc_check_pointer_assign): Warn when the pointer might outlive its target. * gfortran.h (struct gfc_option_t): Add warn_target_lifetime.
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
I was mistaken here -- gcov_open actually uses locking via fcntl interface -- so we do need to find a way to solve the summary update synchronization problem when it is separate out of the per-file update loop. David On Mon, Aug 20, 2012 at 12:03 PM, Andi Kleen a...@firstfloor.org wrote: Xinliang David Li davi...@google.com writes: Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. You could just use lockf() ? -Andi
Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.
On Fri, Apr 27, 2012 at 08:16:04PM +0200, Mark Wielaard wrote: On Fri, 2012-04-27 at 15:43 +0200, Jakub Jelinek wrote: On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote: But even without this, I think the patch is worth it just to get rid of all the relocations necessary otherwise. IMHO we should defer applying this by a few months, given that GDB support is only being added these days and -gdwarf-4 is now the default. Applying it in August/September when there is already GDB version with the support would be better. OK, I'll try to remember and ping as soon as a new GDB release is out with the patch in. Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and elfutils 0.154 out now that support it. I rebased the patch and tested against GDB 7.5. 2012-08-20 Mark Wielaard m...@redhat.com * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc. * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc. (add_AT_low_high_pc): New function. (AT_lbl): Handle dw_val_class_high_pc. (print_die): Likewise. (attr_checksum): Likewise. (attr_checksum_ordered): Likewise. (same_dw_val_p): Likewise. (size_of_die): Likewise. (value_format): Likewise. (output_die): Likewise. (gen_subprogram_die): Use add_AT_low_high_pc. (add_high_low_attributes): Likewise. (dwarf2out_finish): Likewise. OK to commit now? Thanks, Mark diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 4bc4cc3..11d925b 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -1312,6 +1312,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b) case dw_val_class_fde_ref: return a-v.val_fde_index == b-v.val_fde_index; case dw_val_class_lbl_id: +case dw_val_class_high_pc: return strcmp (a-v.val_lbl_id, b-v.val_lbl_id) == 0; case dw_val_class_str: return a-v.val_str == b-v.val_str; @@ -3598,6 +3599,26 @@ add_AT_data8 (dw_die_ref die, enum dwarf_attribute attr_kind, add_dwarf_attr (die, attr); } +/* Add DW_AT_low_pc and DW_AT_high_pc to a DIE. */ +static inline void +add_AT_low_high_pc (dw_die_ref die, const char *lbl_low, const char *lbl_high) +{ + dw_attr_node attr; + + attr.dw_attr = DW_AT_low_pc; + attr.dw_attr_val.val_class = dw_val_class_lbl_id; + attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_low); + add_dwarf_attr (die, attr); + + attr.dw_attr = DW_AT_high_pc; + if (dwarf_version 4) +attr.dw_attr_val.val_class = dw_val_class_lbl_id; + else +attr.dw_attr_val.val_class = dw_val_class_high_pc; + attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_high); + add_dwarf_attr (die, attr); +} + /* Hash and equality functions for debug_str_hash. */ static hashval_t @@ -3981,7 +4002,8 @@ AT_lbl (dw_attr_ref a) { gcc_assert (a (AT_class (a) == dw_val_class_lbl_id || AT_class (a) == dw_val_class_lineptr - || AT_class (a) == dw_val_class_macptr)); + || AT_class (a) == dw_val_class_macptr + || AT_class (a) == dw_val_class_high_pc)); return a-dw_attr_val.v.val_lbl_id; } @@ -4877,6 +4899,7 @@ print_die (dw_die_ref die, FILE *outfile) case dw_val_class_lbl_id: case dw_val_class_lineptr: case dw_val_class_macptr: + case dw_val_class_high_pc: fprintf (outfile, label: %s, AT_lbl (a)); break; case dw_val_class_str: @@ -5033,6 +5056,7 @@ attr_checksum (dw_attr_ref at, struct md5_ctx *ctx, int *mark) case dw_val_class_lbl_id: case dw_val_class_lineptr: case dw_val_class_macptr: +case dw_val_class_high_pc: break; case dw_val_class_file: @@ -5305,6 +5329,7 @@ attr_checksum_ordered (enum dwarf_tag tag, dw_attr_ref at, case dw_val_class_lbl_id: case dw_val_class_lineptr: case dw_val_class_macptr: +case dw_val_class_high_pc: break; case dw_val_class_file: @@ -5770,6 +5795,7 @@ same_dw_val_p (const dw_val_node *v1, const dw_val_node *v2, int *mark) case dw_val_class_lbl_id: case dw_val_class_lineptr: case dw_val_class_macptr: +case dw_val_class_high_pc: return 1; case dw_val_class_file: @@ -7241,6 +7267,9 @@ size_of_die (dw_die_ref die) case dw_val_class_vms_delta: size += DWARF_OFFSET_SIZE; break; + case dw_val_class_high_pc: + size += DWARF2_ADDR_SIZE; + break; default: gcc_unreachable (); } @@ -7558,6 +7587,17 @@ value_format (dw_attr_ref a) case dw_val_class_data8: return DW_FORM_data8; +case dw_val_class_high_pc: + switch (DWARF2_ADDR_SIZE) + { + case 4: + return DW_FORM_data4; + case 8: + return DW_FORM_data8; + default: + gcc_unreachable (); + } + default: gcc_unreachable (); } @@ -7984,6 +8024,11 @@ output_die (dw_die_ref die) break; } + case
Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.
On Mon, Aug 20, 2012 at 09:59:26PM +0200, Mark Wielaard wrote: Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and elfutils 0.154 out now that support it. I rebased the patch and tested against GDB 7.5. 2012-08-20 Mark Wielaard m...@redhat.com * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc. * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc. (add_AT_low_high_pc): New function. (AT_lbl): Handle dw_val_class_high_pc. (print_die): Likewise. (attr_checksum): Likewise. (attr_checksum_ordered): Likewise. (same_dw_val_p): Likewise. (size_of_die): Likewise. (value_format): Likewise. (output_die): Likewise. (gen_subprogram_die): Use add_AT_low_high_pc. (add_high_low_attributes): Likewise. (dwarf2out_finish): Likewise. OK to commit now? Okay, thanks. Jakub
Re: Fix PR c++/19351 (operator new[] overflow)
OK. Sorry for the delay. Jason
Re: [wwwdocs] Document Runtime CPU detection builtins
Hi Sriraman, On Fri, 10 Aug 2012, Sriraman Tallam wrote: I have added a release note for x86 builtins __builtin_cpu_is and __builtin_cpu_supports. They were checked in to trunk in rev. 186789. I had hoped one of the x86 maintainers would review this from his perspective given that they have more background. For the lack of that, let me give it a try. Index: changes.html === +li New builtin functions to detect run-time CPU type and ISA:br built-in, cf. http://gcc.gnu.org/codingconventions.html; here and in the following. No br here; ul should just do that. +ul + liBuiltin code__builtin_cpu_is/code has been added to detect if + the run-time CPU is of a particular type. The builtin returns a postive + integer on a match and zero otherwise. The builtin accepts one string + literal argument, the CPU name. For example, A built-in function... positive It accepts one string (to make this shorter) + code__builtin_cpu_is(westmere)/code returns a postive integer if positive + the run-time CPU is an Intel Corei7 Westmere processor. The following I don't work for Intel, but should there be a space before i7? + are the CPU names recognized by code__builtin_cpu_is:/code How about making this The following are the CPU names recognized for now, which avoids another reference to the name of the built-in and makes it clear that this is subject to change. + liBuiltin code__builtin_cpu_supports/code has been added to detect A built-in function... + returns a postive integer on a match and zero otherwise. The builtin positive + following are the ISA features recognized by + code__builtin_cpu_supports:/code Same is above? +pCaveat: If the above builtins are called before any constructors are +invoked, like during IFUNC initialization, then the CPU detection +initialization must be explicity run using this newly provided +builtin, code__builtin_cpu_init/code. ...using the new built-in function code__builtin_cpu_init/code. What is a constructor in this context, by the way? Will this be clear to all the users? +code +static void (*some_ifunc_resolver(void))(void)br +{br +nbspnbsp __builtin_cpu_init();br +nbspnbsp if (__builtin_cpu_is(amdfam10h) ...br +nbspnbsp if (__builtin_cpu_supports(popcnt) ...br +} +/code How about using pre here? That avoids the br/s which will cause problems with the web page validator, by the way. Nice job for documenting this so well. Thanks for taking the time and your patience! The patch is fine modulo the changes I pointed out (though some of them are more suggestions and you do not need to slavishly follow those). Gerald
Re: [patch] Fix problems with -fdebug-types-section and local types
Ping. http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00398.html -cary On Mon, Aug 13, 2012 at 1:13 PM, Cary Coutant ccout...@google.com wrote: 2012-08-07 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin attribute. (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute from original DIE. (clone_tree_hash): Rename to ... (clone_tree_partial): ... this; change callers. Copy DW_TAG_subprogram DIEs as declarations. gcc/testsuite/ * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case. * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add -fdebug-types-section flag. Ping? -cary
Re: Merge C++ conversion into trunk (0/6 - Overview)
On Tue, Aug 14, 2012 at 11:59 AM, Diego Novillo dnovi...@google.com wrote: On 12-08-14 09:48 , Diego Novillo wrote: This merge touches several files, so I'm thinking that the best time is going to be on Thu 16/Aug around 2:00 GMT. So, the fixes I needed from Lawrence are already in so we can proceed with the merge. I'll commit the merge tonight at ~2:00 GMT. After the merge is in, I will send an announcement and request major branch merges to wait for another 24 hrs to allow testers the chance to pick up this merge. The C++ merge caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332 GCC memory usage is more than doubled from = 3GB to = 10GB. Is this a known issue? -- H.J.
Re: [SH] PR 54089 - Add support for rotcr insn
On 08/20/12 01:02:39, Oleg Endo wrote: Hello, This adds support for SH's rotcr insn. Tested on rev 190459 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? Cheers, Oleg ChangeLog: PR target/50489 Above: that should be: PR target/54089. * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits. (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split. * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function. * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it. testsuite/ChangeLog: PR target/50489 Here also. * gcc.target/sh/pr54089-1.c: New. This is OK. For a sec. I thought my long-standing bug report got some attention. :) - Gary
Re: [SH] PR 54089 - Add support for rotcr insn
On Mon, 2012-08-20 at 14:09 -0700, Gary Funck wrote: ChangeLog: PR target/50489 Above: that should be: PR target/54089. * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits. (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split. * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function. * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it. testsuite/ChangeLog: PR target/50489 Here also. * gcc.target/sh/pr54089-1.c: New. This is OK. Thanks and sorry for the digit twister. I've corrected it in the ChangeLog files. Is there any way to delete the PR commit comment in bugzilla? For a sec. I thought my long-standing bug report got some attention. :) It did, but probably not in a constructive way ;) Cheers, Oleg
Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]
On Sat, 2012-08-18 at 10:23 +0200, Uros Bizjak wrote: On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak ubiz...@gmail.com wrote: After discussion with Oleg, it looks that it is enough to prevent wrong registers in the output of the (multi-output) insn pattern. As far as inputs are concerned, combine already handles limited reload classes in the right way. The problem with x86 is, that reload tried to fix output operand of the multi-output ins, where scheduler already moved load of ax register before this insn. Version 2 of the patch now handles only output operands. Also, handling of empty constraints was fixed. ... but here we fail testcase from PR46843... so we HAVE to check input operands. Hm, I'm curious ... how would that work for patterns such as (define_insn *addc [(set (match_operand:SI 0 arith_reg_dest =r) (plus:SI (plus:SI (match_operand:SI 1 arith_reg_operand %0) (match_operand:SI 2 arith_reg_or_0_operand r)) (match_operand:SI 3 t_reg_operand ))) (clobber (reg:SI T_REG))] TARGET_SH1 addc %2,%0 [(set_attr type arith)]) ... where the predicate arith_reg_or_0_operand allows either const_int 0 or an arith_reg_operand, but the constraint r tells reload to load the constant into a register for this insn. Probably those kind of patterns that rely on this behavior would need to be rewritten as insn_and_split to do the constant loading 'manually'? Cheers, Oleg
[google/gcc-4_7] Fix ICEs with -gfission.
This patch is for the google/gcc-4_7 branch. When a location list or location expression is removed from a DIE, we need to remove entries in the .debug_addr table that were referenced by those location expressions. Except for one case, the existing code checked only the first descriptor in each location expression instead of looping through all the descriptors. In cases where we don't remove the .debug_addr table entries, an ICE occurs during assembly output. This patch also fixes an ICE in output_pubname with -ggnu-pubnames, where we are asserting on TAGs that GDB doesn't care about. Instead of asserting, we should just be setting the flags to 0. 2012-08-20 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (remove_loc_list_addr_table_entries): Change parameter; update all calls. (output_pubname): Don't assert on unknown TAGs. (resolve_addr): Call remove_loc_list_addr_table_entries for all location expressions. Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 190548) +++ gcc/dwarf2out.c (working copy) @@ -4858,13 +4858,9 @@ remove_addr_table_entry (unsigned int i) address_table. */ static void -remove_loc_list_addr_table_entries (dw_loc_list_ref loc) +remove_loc_list_addr_table_entries (dw_loc_descr_ref descr) { - dw_loc_descr_ref descr; - - gcc_assert (loc-replaced); - - for (descr = loc-expr; descr; descr = descr-dw_loc_next) + for (; descr; descr = descr-dw_loc_next) if (descr-dw_loc_oprnd1.val_index != -1U) remove_addr_table_entry (descr-dw_loc_oprnd1.val_index); } @@ -9468,7 +9464,8 @@ output_pubname (dw_offset die_offset, pu GDB_INDEX_SYMBOL_STATIC_SET_VALUE(flags, 1); break; default: - gcc_unreachable (); + /* For unrecognized TAGs, don't set the flags. */ + break; } dw2_asm_output_data (1, flags GDB_INDEX_CU_BITSIZE, GDB-index flags); @@ -22875,8 +22872,8 @@ resolve_addr (dw_die_ref die) gcc_assert (!next-ll_symbol); next-ll_symbol = (*curr)-ll_symbol; } - if (l-dw_loc_oprnd1.val_index != -1U) - remove_addr_table_entry (l-dw_loc_oprnd1.val_index); + if (dwarf_split_debug_info) + remove_loc_list_addr_table_entries (l); *curr = next; } else @@ -22891,7 +22888,7 @@ resolve_addr (dw_die_ref die) { loc-replaced = 1; if (dwarf_split_debug_info) - remove_loc_list_addr_table_entries (loc); + remove_loc_list_addr_table_entries (loc-expr); loc-dw_loc_next = *start; } } @@ -22916,8 +22913,8 @@ resolve_addr (dw_die_ref die) || l-dw_loc_next != NULL) !resolve_addr_in_expr (l)) { - if (l-dw_loc_oprnd1.val_index != -1U) - remove_addr_table_entry (l-dw_loc_oprnd1.val_index); + if (dwarf_split_debug_info) + remove_loc_list_addr_table_entries (l); remove_AT (die, a-dw_attr); ix--; }
Re: [google/gcc-4_7] Fix ICEs with -gfission.
On Mon, Aug 20, 2012 at 5:30 PM, Cary Coutant ccout...@google.com wrote: This patch is for the google/gcc-4_7 branch. When a location list or location expression is removed from a DIE, we need to remove entries in the .debug_addr table that were referenced by those location expressions. Except for one case, the existing code checked only the first descriptor in each location expression instead of looping through all the descriptors. In cases where we don't remove the .debug_addr table entries, an ICE occurs during assembly output. This patch also fixes an ICE in output_pubname with -ggnu-pubnames, where we are asserting on TAGs that GDB doesn't care about. Instead of asserting, we should just be setting the flags to 0. 2012-08-20 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (remove_loc_list_addr_table_entries): Change parameter; update all calls. (output_pubname): Don't assert on unknown TAGs. (resolve_addr): Call remove_loc_list_addr_table_entries for all location expressions. OK for google 4.7. Sterling
[Google 4.7 Obvious] Fix trailing enumerator comma
My last change to google 4.7 that included gdb/gdb-index.h as it exists in binutils, but it included a trailing comma in an enumerator. I have checked in as obvious the patch below which eliminates the trailing comma. Sterling 2012-08-20 Sterling Augustine saugust...@google.com * gdb/gdb-index.h: Remove comma from last enum. --- branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 18:23:19 190539 +++ branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 23:05:44 190549 @@ -68,7 +68,7 @@ Give the unused bits a value so gdb will print them sensibly. */ GDB_INDEX_SYMBOL_KIND_UNUSED5 = 5, GDB_INDEX_SYMBOL_KIND_UNUSED6 = 6, - GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7, + GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7 } gdb_index_symbol_kind; #define GDB_INDEX_SYMBOL_KIND_SHIFT 28
Re: [google/gcc-4_7] Fix ICEs with -gfission.
2012-08-20 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (remove_loc_list_addr_table_entries): Change parameter; update all calls. (output_pubname): Don't assert on unknown TAGs. (resolve_addr): Call remove_loc_list_addr_table_entries for all location expressions. OK for google 4.7. Thanks, committed at r190553. -cary
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
Xinliang David Li davi...@google.com writes: Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. You could just use lockf() ? The issue here is holding lock for all the files (that can be many) versus number of locks limits possibilities for deadlocking (mind that updating may happen in different orders on the same files for different programs built from same objects) For David: there is no thread safety code in mainline for the counters. Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was invented) http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down as too memory expensive per thread. We could optionaly do atomic updates like ICC or combination of both as discussed in the thread. So far no one implemented it since the coverage fixups seems to work well enough in pracitce for multithreaded programs where reproducibility do not seem to be _that_ important. For GCC profiled bootstrap however I would like to see the output binary to be reproducible. We realy ought to update profiles safe for multple processes. Trashing whole process run is worse than doing race in increment. There is good chance that one of runs is more important than others and it will get trashed. I do not think we do have serious update problems in the summaries at the moment. We lock individual files as we update them. The summary is simple enough to be safe. sum_all is summed, max_all is maximum over the individual runs. Even when you combine mutiple programs the summary will end up same. Everything except for max_all is ignored anyway. Solution 2 (i.e. histogram streaming) will also have the property that it is safe WRT multiple programs, just like sum_all. Honza -Andi
Re: Merge C++ conversion into trunk (0/6 - Overview)
On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote: The C++ merge caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332 GCC memory usage is more than doubled from = 3GB to = 10GB. Is this a known issue? The two memory stat reports show no differences. Are you sure you didn't splice in the wrong report? -- Lawrence Crowl
[PATCH, MIPS] fix MIPS16 jump table overflow
In config/mips/mips.h, there is presently this comment: /* ??? 16-bit offsets can overflow in large functions. */ #define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS A while ago we had a bug report where a big switch statement did, in fact, overflow the range of 16-bit offsets, causing a runtime error. GCC already has generic machinery to shorten offset tables for switch statements that does the necessary range checking, but it only works with casesi, not the lower-level tablejump expansion. So, this patch provides a casesi expander to handle this situation. This patch has been in use on our local 4.5 and 4.6 branches for about a year now. When testing it on mainline, I found it tripped over the recent change to add MIPS16 branch overflow checking in other situations, causing it to get into an infinite loop. I think telling it to ignore these new jump insns it doesn't know how to process is the right thing to do, but I'm not sure if there's a better way to restrict the condition or make mips16_split_long_branches more robust. Richard, since that's your code I assume you'll suggest an alternative if this doesn't meet with your approval. Is the rest of the patch OK to check in? -Sandra 2012-08-20 Julian Brown jul...@codesourcery.com Sandra Loosemore san...@codesourcery.com gcc/ * config/mips/mips.md (MIPS16_T_REGNUM): New constant. (tablejump): Don't use for MIPS16_SHORT_JUMP_TABLES. (casesi): New. (casesi_internal_mips16): New. * config/mips/mips.c (mips16_split_long_branches): Ignore casesi_internal_mips16 insns. * config/mips/mips.h (TARGET_MIPS16_SHORT_JUMP_TABLES): Update comment. (CASE_VECTOR_MODE): Use ptr_mode unconditionally. (CASE_VECTOR_SHORTEN_MODE): Define. (ASM_OUTPUT_ADDR_DIFF_ELT): Output word-sized addr_diff_elts when necessary for MIPS16_SHORT_JUMP_TABLES. gcc/testsuite/ * gcc.target/mips/code-readable-1.c: Generalize assembler pattern for jump table. Index: gcc/config/mips/mips.md === --- gcc/config/mips/mips.md (revision 190463) +++ gcc/config/mips/mips.md (working copy) @@ -138,6 +138,7 @@ (define_constants [(TLS_GET_TP_REGNUM 3) + (MIPS16_T_REGNUM 24) (PIC_FUNCTION_ADDR_REGNUM 25) (RETURN_ADDR_REGNUM 31) (CPRESTORE_SLOT_REGNUM 76) @@ -5904,14 +5905,9 @@ [(set (pc) (match_operand 0 register_operand)) (use (label_ref (match_operand 1 )))] - + !TARGET_MIPS16_SHORT_JUMP_TABLES { - if (TARGET_MIPS16_SHORT_JUMP_TABLES) -operands[0] = expand_binop (Pmode, add_optab, -convert_to_mode (Pmode, operands[0], false), -gen_rtx_LABEL_REF (Pmode, operands[1]), -0, 0, OPTAB_WIDEN); - else if (TARGET_GPWORD) + if (TARGET_GPWORD) operands[0] = expand_binop (Pmode, add_optab, operands[0], pic_offset_table_rtx, 0, 0, OPTAB_WIDEN); else if (TARGET_RTP_PIC) @@ -5937,6 +5933,91 @@ [(set_attr type jump) (set_attr mode none)]) +;; For MIPS16, we don't know whether a given jump table will use short or +;; word-sized offsets until late in compilation, when we are able to determine +;; the sizes of the insns which comprise the containing function. This +;; necessitates the use of the casesi rather than the tablejump pattern, since +;; the latter tries to calculate the index of the offset to jump through early +;; in compilation, i.e. at expand time, when nothing is known about the +;; eventual function layout. + +(define_expand casesi + [(match_operand:SI 0 register_operand ) ; index to jump on + (match_operand:SI 1 const_int_operand ) ; lower bound + (match_operand:SI 2 const_int_operand ) ; total range + (match_operand:SI 3 ) ; table label + (match_operand:SI 4 )] ; out of range label + TARGET_MIPS16_SHORT_JUMP_TABLES +{ + if (operands[1] != const0_rtx) +{ + rtx reg = gen_reg_rtx (SImode); + rtx offset = gen_int_mode (-INTVAL (operands[1]), SImode); + + if (!arith_operand (offset, SImode)) +offset = force_reg (SImode, offset); + + emit_insn (gen_addsi3 (reg, operands[0], offset)); + operands[0] = reg; +} + + if (!arith_operand (operands[0], SImode)) +operands[0] = force_reg (SImode, operands[0]); + + operands[2] = GEN_INT (INTVAL (operands[2]) + 1); + + emit_jump_insn (gen_casesi_internal_mips16 (operands[0], operands[2], + operands[3], operands[4])); + + DONE; +}) + +(define_insn casesi_internal_mips16 + [(set (pc) + (if_then_else + (leu (match_operand:SI 0 register_operand d) + (match_operand:SI 1 arith_operand dI)) + (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4)) + (label_ref (match_operand 2 + (label_ref (match_operand 3 + (clobber (match_scratch:SI 4 =d)) + (clobber (match_scratch:SI 5 =d)) + (clobber (reg:SI MIPS16_T_REGNUM)) + (use
Re: [PATCH] convert m32c to constraints.md
I ran the testsuite for you; no regressions. Looks OK to me, please apply. Thanks!
Re: [wwwdocs] Document Runtime CPU detection builtins
Hi Gerald / Diego, I have made all the mentioned changes. I also shortened the description like Diego mentioned by removing all the strings but kept the caveats. I have not added a reference to the documentation because i do not know what link to reference. The builtins are completely documented in extend.texi. I have attached the patch. If there are no further comments I will submit this tomorrow. Thanks, -Sri. On Mon, Aug 20, 2012 at 1:31 PM, Gerald Pfeifer ger...@pfeifer.com wrote: Hi Sriraman, On Fri, 10 Aug 2012, Sriraman Tallam wrote: I have added a release note for x86 builtins __builtin_cpu_is and __builtin_cpu_supports. They were checked in to trunk in rev. 186789. I had hoped one of the x86 maintainers would review this from his perspective given that they have more background. For the lack of that, let me give it a try. Index: changes.html === +li New builtin functions to detect run-time CPU type and ISA:br built-in, cf. http://gcc.gnu.org/codingconventions.html; here and in the following. No br here; ul should just do that. +ul + liBuiltin code__builtin_cpu_is/code has been added to detect if + the run-time CPU is of a particular type. The builtin returns a postive + integer on a match and zero otherwise. The builtin accepts one string + literal argument, the CPU name. For example, A built-in function... positive It accepts one string (to make this shorter) + code__builtin_cpu_is(westmere)/code returns a postive integer if positive + the run-time CPU is an Intel Corei7 Westmere processor. The following I don't work for Intel, but should there be a space before i7? + are the CPU names recognized by code__builtin_cpu_is:/code How about making this The following are the CPU names recognized for now, which avoids another reference to the name of the built-in and makes it clear that this is subject to change. + liBuiltin code__builtin_cpu_supports/code has been added to detect A built-in function... + returns a postive integer on a match and zero otherwise. The builtin positive + following are the ISA features recognized by + code__builtin_cpu_supports:/code Same is above? +pCaveat: If the above builtins are called before any constructors are +invoked, like during IFUNC initialization, then the CPU detection +initialization must be explicity run using this newly provided +builtin, code__builtin_cpu_init/code. ...using the new built-in function code__builtin_cpu_init/code. What is a constructor in this context, by the way? Will this be clear to all the users? +code +static void (*some_ifunc_resolver(void))(void)br +{br +nbspnbsp __builtin_cpu_init();br +nbspnbsp if (__builtin_cpu_is(amdfam10h) ...br +nbspnbsp if (__builtin_cpu_supports(popcnt) ...br +} +/code How about using pre here? That avoids the br/s which will cause problems with the web page validator, by the way. Nice job for documenting this so well. Thanks for taking the time and your patience! The patch is fine modulo the changes I pointed out (though some of them are more suggestions and you do not need to slavishly follow those). Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.10 diff -u -r1.10 changes.html --- changes.html10 Aug 2012 16:25:46 - 1.10 +++ changes.html21 Aug 2012 02:38:40 - @@ -92,6 +92,38 @@ wrong results. You must build all modules with code-mpreferred-stack-boundary=3/code, including any libraries. This includes the system libraries and startup modules./li +li New built-in functions to detect run-time CPU type and ISA: +ul + liA built-in function code__builtin_cpu_is/code has been added to + detect if the run-time CPU is of a particular type. It returns a + positive integer on a match and zero otherwise. It accepts one string + literal argument, the CPU name. For example, + code__builtin_cpu_is(westmere)/code returns a positive integer if + the run-time CPU is an Intel Core i7 Westmere processor. Please refer + to the documentation for the list of valid CPU names recognized./li + liA built-in function code__builtin_cpu_supports/code has been + added to detect if the run-time CPU supports a particular ISA feature. + It returns a positive integer on a match and zero otherwise. It accepts + one string literal argument, the ISA feature. For example, + code__builtin_cpu_supports(ssse3)/code returns a positive integer + if the run-time CPU supports SSSE3 instructions. Please refer to the + documentation for the list of valid ISA names recognized./li +/ul +pCaveat: If these built-in
Re: Merge C++ conversion into trunk (0/6 - Overview)
On Mon, Aug 20, 2012 at 6:31 PM, Lawrence Crowl cr...@google.com wrote: On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote: The C++ merge caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332 GCC memory usage is more than doubled from = 3GB to = 10GB. Is this a known issue? The two memory stat reports show no differences. Are you sure you didn't splice in the wrong report? Those are outputs from -fmem-report. I am not sure how useful they are for this bug. -- H.J.
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote: Xinliang David Li davi...@google.com writes: Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. You could just use lockf() ? The issue here is holding lock for all the files (that can be many) versus number of locks limits possibilities for deadlocking (mind that updating may happen in different orders on the same files for different programs built from same objects) For David: there is no thread safety code in mainline for the counters. Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was invented) http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down as too memory expensive per thread. We could optionaly do atomic updates like ICC or combination of both as discussed in the thread. So far no one implemented it since the coverage fixups seems to work well enough in pracitce for multithreaded programs where reproducibility do not seem to be _that_ important. For GCC profiled bootstrap however I would like to see the output binary to be reproducible. We realy ought to update profiles safe for multple processes. Trashing whole process run is worse than doing race in increment. There is good chance that one of runs is more important than others and it will get trashed. I do not think we do have serious update problems in the summaries at the moment. We lock individual files as we update them. The summary is simple enough to be safe. sum_all is summed, max_all is maximum over the individual runs. Even when you combine mutiple programs the summary will end up same. Everything except for max_all is ignored anyway. Solution 2 (i.e. histogram streaming) will also have the property that it is safe WRT multiple programs, just like sum_all. I think the sum_all based scaling of the working set entries also has this property. What is your opinion on saving the histogram in the summary and merging histograms together as best as possible compared to the alternative of saving the working set information as now and scaling it up by the ratio between the new and old sum_all when merging? Thanks, Teresa Honza -Andi -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote: Xinliang David Li davi...@google.com writes: Process level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems. You could just use lockf() ? The issue here is holding lock for all the files (that can be many) versus number of locks limits possibilities for deadlocking (mind that updating may happen in different orders on the same files for different programs built from same objects) For David: there is no thread safety code in mainline for the counters. Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was invented) http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down as too memory expensive per thread. We could optionaly do atomic updates like ICC or combination of both as discussed in the thread. So far no one implemented it since the coverage fixups seems to work well enough in pracitce for multithreaded programs where reproducibility do not seem to be _that_ important. For GCC profiled bootstrap however I would like to see the output binary to be reproducible. We realy ought to update profiles safe for multple processes. Trashing whole process run is worse than doing race in increment. There is good chance that one of runs is more important than others and it will get trashed. I do not think we do have serious update problems in the summaries at the moment. We lock individual files as we update them. The summary is simple enough to be safe. sum_all is summed, max_all is maximum over the individual runs. Even when you combine mutiple programs the summary will end up same. Everything except for max_all is ignored anyway. Solution 2 (i.e. histogram streaming) will also have the property that it is safe WRT multiple programs, just like sum_all. I think the sum_all based scaling of the working set entries also has this property. What is your opinion on saving the histogram in the I think the scaling will have at lest roundoff issues WRT different merging orders. summary and merging histograms together as best as possible compared to the alternative of saving the working set information as now and scaling it up by the ratio between the new and old sum_all when merging? So far I like this option best. But David seems to lean towards thirtd option with whole file locking. I see it may show to be more extensible in the future. At the moment I do not understand two things 1) why do we need info on the number of counter above given threshold, sinc ethe hot/cold decisions usually depends purely on the count cutoff. Removing those will solve merging issues with variant 2 and then it would be probably good solution. 2) Do we plan to add some features in near future that will anyway require global locking? I guess LIPO itself does not count since it streams its data into independent file as you mentioned earlier and locking LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use combination of gcda files and common summary? What other stuff Google plans to merge? (In general I would be curious about merging plans WRT profile stuff, so we get more synchronized and effective on getting patches in. We have about two months to get it done in stage1 and it would be nice to get as much as possible. Obviously some of the patches will need bit fo dicsussion like this one. Hope you do not find it frustrating, I actually think this is an important feature). I also realized today that the common value counters (used by switch, indirect call and div/mod value profiling) are non-stanble WRT different merging orders (i.e. parallel make in train run). I do not think there is actual solution to that except for not merging the counter section of this type in libgcov and merge them in some canonical order at profile feedback time. Perhaps we just want to live with this, since the disprepancy here is small. (i.e. these counters are quite rare and their outcome has just local effect on the final binary, unlike the global summaries/edge counters). Honza Thanks, Teresa Honza -Andi -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
libgo patch committed: Define F_GETLK and friends on i386 GNU/Linux
On i386 GNU/Linux, when compiling with -D_FILE_OFFSET_BITS=64, fcntl.h winds up doing this: # define F_GETLK F_GETLK64 /* Get record locking info. */ ... # define F_GETLK64 12 /* Get record locking info. */ Because of the ordering, -fdump-go-spec does not write F_GETLK into the file that it creates. The effect is that syscall.F_GETLK is not defined for i386 GNU/Linux. This patch fixes the problem. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu, both 32-bit and 64-bit mode. Committed to mainline and 4.7 branch. Ian diff -r a602dc132c2d libgo/mksysinfo.sh --- a/libgo/mksysinfo.sh Tue Aug 14 20:46:42 2012 -0700 +++ b/libgo/mksysinfo.sh Mon Aug 20 22:10:34 2012 -0700 @@ -211,6 +211,16 @@ echo const O_CLOEXEC = 0 ${OUT} fi +# These flags can be lost on i386 GNU/Linux when using +# -D_FILE_OFFSET_BITS=64, because we see #define F_SETLK F_SETLK64 +# before we see the definition of F_SETLK64. +for flag in F_GETLK F_SETLK F_SETLKW; do + if ! grep ^const ${flag} ${OUT} /dev/null 21 \ + grep ^const ${flag}64 ${OUT} /dev/null 21; then +echo const ${flag} = ${flag}64 ${OUT} + fi +done + # The signal numbers. grep '^const _SIG[^_]' gen-sysinfo.go | \ grep -v '^const _SIGEV_' | \