Re: [PATCH, trivial][AArch64] Fix mode iterator for *aarch64_simd_ld1r<mode> pattern
> On 13 November 2014 06:14, Yangfei (Felix) felix.y...@huawei.com wrote:
>> Hi,
>>
>> We find that the VALLDI mode iterator used in the *aarch64_simd_ld1r<mode>
>> pattern is not appropriate.  The reason is that it's impossible to get a
>> new operand of DImode by vec_duplicating an operand of the same mode.
>> So this patch just excludes DImode and uses VALL instead.
>>
>> Reg-tested for aarch64-linux-gnu with QEMU.  OK for the trunk?
>
> OK, can you back port it to 4.9?
> Thanks /Marcus

Yes, committed:
  - trunk: r217573.
  - 4.9 branch: r217574.
[PATCH] Check that 'fd' is neither -1 nor 0 before closing it
'fd' may be 0, in which case it was not obtained through an 'open'
operation, so it does not need a 'close' operation either.

Also, in c_common_read_pch(), when a failure occurs, we need to be sure
that 'fd' is not -1 before the following close operation.

It passes the testsuite under Fedora x86_64-unknown-linux-gnu.

gcc/
  * c-family/c-pch.c (c_common_read_pch): Check 'fd' neither -1 nor 0,
  before close it,

libcpp/
  * files.c (_cpp_compare_file_date, read_file, validate_pch
  open_file, _cpp_save_file_entries): Check 'fd' neither -1 nor 0,
  before close it.
---
 gcc/c-family/c-pch.c | 10 ++
 libcpp/files.c       | 20 +++-
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index 93609b6..d001965 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -355,7 +355,8 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   if (f == NULL)
     {
       cpp_errno (pfile, CPP_DL_ERROR, "calling fdopen");
-      close (fd);
+      if (fd && fd != -1)
+        close (fd);
       goto end;
     }
 
@@ -376,14 +377,15 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
 
   timevar_push (TV_PCH_CPP_RESTORE);
   if (cpp_read_state (pfile, name, f, smd) != 0)
     {
-      fclose (f);
+      if (fd)
+        fclose (f);
       timevar_pop (TV_PCH_CPP_RESTORE);
       goto end;
     }
   timevar_pop (TV_PCH_CPP_RESTORE);
-
-  fclose (f);
+  if (fd)
+    fclose (f);
 
   line_table->trace_includes = saved_trace_includes;
   linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
diff --git a/libcpp/files.c b/libcpp/files.c
index 3984821..5c845da 100644
--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -243,7 +243,8 @@ open_file (_cpp_file *file)
         errno = ENOENT;
     }
 
-  close (file->fd);
+  if (file->fd)
+    close (file->fd);
   file->fd = -1;
 }
 #if defined(_WIN32) && !defined(__CYGWIN__)
@@ -753,7 +754,8 @@ read_file (cpp_reader *pfile, _cpp_file *file)
     }
 
   file->dont_read = !read_file_guts (pfile, file);
-  close (file->fd);
+  if (file->fd)
+    close (file->fd);
   file->fd = -1;
 
   return !file->dont_read;
@@ -1435,11 +1437,9 @@ _cpp_compare_file_date (cpp_reader *pfile, const char *fname,
   if (file->err_no)
     return -1;
 
-  if (file->fd != -1)
-    {
-      close (file->fd);
-      file->fd = -1;
-    }
+  if (file->fd && file->fd != -1)
+    close (file->fd);
+  file->fd = -1;
 
   return file->st.st_mtime > pfile->buffer->file->st.st_mtime;
 }
@@ -1694,7 +1694,8 @@ validate_pch (cpp_reader *pfile, _cpp_file *file, const char *pchname)
 
   if (!valid)
     {
-      close (file->fd);
+      if (file->fd)
+        close (file->fd);
       file->fd = -1;
     }
 
@@ -1849,7 +1850,8 @@ _cpp_save_file_entries (cpp_reader *pfile, FILE *fp)
             }
           ff = fdopen (f->fd, "rb");
           md5_stream (ff, result->entries[count].sum);
-          fclose (ff);
+          if (f->fd)
+            fclose (ff);
           f->fd = oldfd;
         }
       result->entries[count].size = f->st.st_size;
-- 
1.9.3
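The rule the patch applies at each call site can be sketched in isolation. This is only a minimal illustration, assuming a hypothetical helper named close_if_owned that exists in neither GCC nor libcpp: a descriptor is closed only when it is neither -1 (not open) nor 0 (standard input, which the code did not open itself).

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper, not part of GCC or libcpp.  Close *FD only if it
   refers to a descriptor that the caller opened itself: -1 means "not
   open" and 0 is standard input, which was never opened here.  *FD is
   always left at -1.  Return 1 if close () was called, 0 otherwise.  */
int
close_if_owned (int *fd)
{
  int closed = 0;

  if (*fd != -1 && *fd != 0)
    {
      close (*fd);
      closed = 1;
    }
  *fd = -1;
  return closed;
}
```

A single helper of this kind would keep the "neither -1 nor 0" test in one place instead of repeating it before every close; whether that suits the libcpp code is a design question for the patch author.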
[Patch, Fortran] Convert gfc_fatal_error to common diagnostics
Especially since color diagnostic is now the default [1], it makes sense
to convert more gfortran diagnostics to use the common diagnostics.  For
an example, see [1].

That also brings all the nice features like placing the warning option in
brackets:
   Warning: USE statement at (1) has no ONLY qualifier [-Wuse-without-only]
Adding -Werror changes it to:
   Error: USE statement at (1) has no ONLY qualifier [-Werror=use-without-only]
-fno-diagnostics-show-caret compactifies the error into a single line
without showing the source code – and other nice features.
[Thanks Manuel!]

This patch converts all gfc_fatal_error calls to the new scheme, except for
the few that use %L.  As most calls could be converted, I renamed the old
function to _1 instead of using the _2 name for the new one.

Additionally, I changed quoted strings from '%s' to %qs and added quotes
around -farguments via %< … %>.  That also has a colouring effect (default:
black and bold).

Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

PS: Manuel is working on %L support and buffered output (gfc_error,
gfc_warning); thus, expect more colors in the next weeks.  Meanwhile, I
intend to convert gfc_intrinsic_error and the remaining
gfc_{error,warning}_now calls (all those which do not use %L).

PPS: Turning on colors comes too late for the command-line diagnostics -
they always come up in black and white.

[1] See https://gcc.gnu.org/gcc-5/changes.html under Fortran and under
C family.

2014-11-15  Tobias Burnus  <bur...@net-b.de>

gcc/fortran/
  * error.c (gfc_fatal_error_1): Renamed from gfc_fatal_error.
  (gfc_fatal_error): Add; uses common diagnostics.
  * array.c (gfc_match_array_ref, gfc_match_array_spec): Use %< %>.
  * check.c (check_co_collective, gfc_check_lcobound,
  gfc_check_image_index, gfc_check_num_images,
  gfc_check_this_image, gfc_check_ucobound): Ditto.
  * cpp.c (gfc_cpp_post_options): Ditto.
  (gfc_cpp_init_0, gfc_cpp_done): Change %s to %qs.
  * gfc-diagnostic.def (DK_FATAL): Capitalize first letter.
  * gfortran.h (gfc_fatal_error_1): Add.
  * match.c (gfc_match_name, gfc_match_critical,
  lock_unlock_statement, sync_statement): Add %< %>.
  * module.c (bad_module, gfc_dump_module, gfc_use_module): Change
  %s to %qs.
  * options.c (gfc_handle_module_path_options, gfc_handle_fpe_option,
  gfc_handle_coarray_option, gfc_handle_runtime_check_option,
  gfc_handle_option): Add %< %>.
  * simplify.c (gfc_simplify_num_images): Ditto.
  * trans-stmt.c (gfc_trans_sync): Use gfc_fatal_error_1.
  * trans-array.c (gfc_conv_array_initializer): Ditto.
  * trans-types.c (gfc_init_kinds): Use gfc_fatal_error instead
  of fatal_error; add %< %> quotations.

gcc/testsuite/
  * gfortran.dg/binding_label_tests_4.f03: Add dg-excess-errors.
  * gfortran.dg/coarray_9.f90: Ditto.
  * gfortran.dg/empty_label.f: Ditto.
  * gfortran.dg/empty_label.f90: Ditto.

diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index ef2aa69..159e626 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -209,7 +209,7 @@ coarray:
 
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %C, use -fcoarray= to enable");
+      gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
       return MATCH_ERROR;
     }
 
@@ -592,7 +592,7 @@ coarray:
 
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %C, use -fcoarray= to enable");
+      gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
       goto cleanup;
     }
 
diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 6f1fe3f..034b329 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1482,8 +1482,8 @@ check_co_collective (gfc_expr *a, gfc_expr *image_idx, gfc_expr *stat,
 
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %L, use -fcoarray= to enable",
-                       &a->where);
+      gfc_fatal_error_1 ("Coarrays disabled at %L, use -fcoarray= to enable",
+                         &a->where);
       return false;
     }
 
@@ -2569,7 +2569,7 @@ gfc_check_lcobound (gfc_expr *coarray, gfc_expr *dim, gfc_expr *kind)
 {
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %C, use -fcoarray= to enable");
+      gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
       return false;
     }
 
@@ -4847,7 +4847,7 @@ gfc_check_image_index (gfc_expr *coarray, gfc_expr *sub)
 
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %C, use -fcoarray= to enable");
+      gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
       return false;
     }
 
@@ -4885,7 +4885,7 @@ gfc_check_num_images (gfc_expr *distance, gfc_expr *failed)
 {
   if (gfc_option.coarray == GFC_FCOARRAY_NONE)
     {
-      gfc_fatal_error ("Coarrays disabled at %C, use -fcoarray= to
Re: Follow-up to PR51471
> IIRC, fill_eager and its related friends are all speculative in some way
> and aren't those precisely the ones that are causing us problems?  Also
> note we have backends working around this stuff in fairly blunt ways:

I'd say that the PA back-end went a bit too far here, especially if it
marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
special code to handle delay slots (SEQUENCEs) so it's not an
all-or-nothing game.

> Given architectural difficulties of delay slots on modern processors,
> would it be that painful to just not allow filling slots with frame insns
> and let dbr try to find something else or drop in a nop?  I wouldn't be
> all that surprised if there wasn't a measurable performance difference on
> something like a modern Sparc.

Yes, modern SPARCs have (short) branches without delay slots.  But the
other big contender is MIPS here and the story might be different for it.

-- 
Eric Botcazou
Re: Drop target_option_node reconstruction logic.
On November 14, 2014 8:13:15 PM CET, Jan Hubicka hubi...@ucw.cz wrote:
> Hi,
> this patch kills lto's code to rebuild DECL_FUNCTION_SPECIFIC_TARGET from
> target attributes.  This code was never complete and it should be a no-op
> now that we save the target nodes.
>
> It also makes free_lang_data_in_decl actually annotate all function
> bodies with a default option node.  The reason is that when LTOing units,
> one compiled with default settings and one, say, with -msse3, we want to
> keep these functions preserved.
>
> Incrementally I will proceed with similar changes for optimization nodes.
>
> Bootstrapped/regtested ppc64-linux, earlier version tested at
> x86_64-linux and firefox LTO, OK?

OK.

Thanks,
Richard.

> Honza
>
>   * lto.c (lto_read_decls): Do not rebuild
>   DECL_FUNCTION_SPECIFIC_TARGET.
>   * tree.c (free_lang_data_in_decl): Annotate all function bodies
>   with DECL_FUNCTION_SPECIFIC_TARGET.
>
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c (revision 217571)
> +++ lto/lto.c (working copy)
> @@ -1935,15 +1935,6 @@
>            if (TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t))
>              cache_integer_cst (t);
> -          /* Re-build DECL_FUNCTION_SPECIFIC_TARGET, we need that
> -             for both WPA and LTRANS stage.  */
> -          if (TREE_CODE (t) == FUNCTION_DECL)
> -            {
> -              tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (t));
> -              if (attr)
> -                targetm.target_option.valid_attribute_p
> -                  (t, NULL_TREE, TREE_VALUE (attr), 0);
> -            }
>            /* Register TYPE_DECLs with the debuginfo machinery.  */
>            if (!flag_wpa && TREE_CODE (t) == TYPE_DECL)
> Index: tree.c
> ===================================================================
> --- tree.c (revision 217571)
> +++ tree.c (working copy)
> @@ -5115,6 +5115,9 @@
>           the PARM_DECL will be used in the function's body).  */
>        for (t = DECL_ARGUMENTS (decl); t; t = TREE_CHAIN (t))
>          DECL_CONTEXT (t) = decl;
> +      if (!DECL_FUNCTION_SPECIFIC_TARGET (decl))
> +        DECL_FUNCTION_SPECIFIC_TARGET (decl)
> +          = target_option_default_node;
>      }
>
>    /* DECL_SAVED_TREE holds the GENERIC representation for DECL.
Re: [Patch, Fortran] Convert gfc_fatal_error to common diagnostics
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?

Please document, in the source, the difference between gfc_fatal_error and
gfc_fatal_error_1.  They currently have the same generic description, which
wouldn’t help people writing new front-end code to know which one to use.

Moreover, if the transition will not be complete soon (or its end date is
indeterminate), it should be added to the wiki’s list of partial
transitions.

Other than that, OK, and thanks for doing this tedious work.

FX
Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization
The close of stage 1 is getting close (very close).  Even though there is
not much new code (basically, the new code computes the separation class
option for the AST build), I am not sure that the patch qualifies for
stage 2.

Unroll-and-jam (strip mining) generates very nice code for small kernels,
for loops with both constant and non-constant bounds, and that is an
argument for the new isl-based code generator.  Otherwise, I'm afraid that
the generated code looks very similar to the CLooG-generated code: an inner
loop with min/max bounds that GCC doesn't optimize further, which prevents
the perceived advantages of strip mining (register reuse and scalar
reduction, instruction scheduling, etc.).

OK for trunk?

Thanks,
Mircea

Index: gcc/graphite-poly.h
===================================================================
--- gcc/graphite-poly.h (revision 217013)
+++ gcc/graphite-poly.h (working copy)
@@ -349,6 +349,9 @@
   poly_scattering_p _saved;
   isl_map *saved;
 
+  /* For tiling, the map for computing the separating class.  */
+  isl_map *map_sepclass;
+
   /* True when this PBB contains only a reduction statement.  */
   bool is_reduction;
 };
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c (revision 217013)
+++ gcc/graphite.c (working copy)
@@ -383,7 +383,8 @@
       || flag_loop_strip_mine
       || flag_graphite_identity
       || flag_loop_parallelize_all
-      || flag_loop_optimize_isl)
+      || flag_loop_optimize_isl
+      || flag_loop_unroll_jam)
     flag_graphite = 1;
 
   return flag_graphite != 0;
Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 217013)
+++ gcc/common.opt (working copy)
@@ -1328,6 +1328,10 @@
 Common Report Var(flag_loop_block) Optimization
 Enable Loop Blocking transformation
 
+floop-unroll-and-jam
+Common Report Var(flag_loop_unroll_jam) Optimization
+Enable Loop Unroll Jam transformation
+
 fgnu-tm
 Common Report Var(flag_tm)
 Enable support for GNU transactional memory
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c (revision 217013)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -186,7 +186,7 @@
   PartialSchedule = isl_band_get_partial_schedule (Band);
   *Dimensions = isl_band_n_member (Band);
 
-  if (DisableTiling)
+  if (DisableTiling || flag_loop_unroll_jam)
     return PartialSchedule;
 
   /* It does not make any sense to tile a band with just one dimension.  */
@@ -241,7 +241,9 @@
    constant number of iterations, if the number of loop iterations at
    DimToVectorize can be devided by VectorWidth. The default VectorWidth is
    currently constant and not yet target specific. This function does not reason
-   about parallelism.  */
+   about parallelism.
+
+  */
 static isl_map *
 getPrevectorMap (isl_ctx *ctx, int DimToVectorize,
                  int ScheduleDimensions,
@@ -305,8 +307,98 @@
   isl_constraint_set_constant_si (c, VectorWidth - 1);
   TilingMap = isl_map_add_constraint (TilingMap, c);
 
-  isl_map_dump (TilingMap);
+  return TilingMap;
+}
 
+/* Compute an auxiliary map to getPrevectorMap, for computing the separating
+   class defined by full tiles.  Used in graphite_isl_ast_to_gimple.c to set the
+   corresponding option for AST build.
+
+   The map (for VectorWidth=4):
+
+   [i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and it + 3 = i and
+                        ip >= 0
+
+   The image of this map is the separation class. The range of this map includes
+   all the i that are multiple of 4 in the domain beside the greater one.
+
+ */
+static isl_map *
+getPrevectorMap_full (isl_ctx *ctx, int DimToVectorize,
+                      int ScheduleDimensions,
+                      int VectorWidth)
+{
+  isl_space *Space;
+  isl_local_space *LocalSpace, *LocalSpaceRange;
+  isl_set *Modulo;
+  isl_map *TilingMap;
+  isl_constraint *c;
+  isl_aff *Aff;
+  int PointDimension; /* ip */
+  int TileDimension;  /* it */
+  isl_val *VectorWidthMP;
+  int i;
+
+  /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/
+
+  Space = isl_space_alloc (ctx, 0, ScheduleDimensions, ScheduleDimensions + 1);
+  TilingMap = isl_map_universe (isl_space_copy (Space));
+  LocalSpace = isl_local_space_from_space (Space);
+  PointDimension = ScheduleDimensions;
+  TileDimension = DimToVectorize;
+
+  /* Create an identity map for everything except DimToVectorize and the
+     point loop.  */
+  for (i = 0; i < ScheduleDimensions; i++)
+    {
+      if (i == DimToVectorize)
+        continue;
+
+      c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+
+      isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
+      isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
+
+      TilingMap = isl_map_add_constraint (TilingMap, c);
+    }
+
+  /* it % 'VectorWidth' = 0  */
+  LocalSpaceRange = isl_local_space_range (isl_local_space_copy (LocalSpace));
+  Aff =
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote:
> On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm dmalc...@redhat.com wrote:
>> On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
>>> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com wrote:
>>>> On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
>>>>> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
>>>>>> On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
>>>>>>> To be constructive here - the above case is from within
>>>>>>> a GIMPLE_ASSIGN case label and thus I'd have expected
>>>>>>>
>>>>>>>     case GIMPLE_ASSIGN:
>>>>>>>       {
>>>>>>>         gassign *a1 = as_a <gassign *> (s1);
>>>>>>>         gassign *a2 = as_a <gassign *> (s2);
>>>>>>>         lhs1 = gimple_assign_lhs (a1);
>>>>>>>         lhs2 = gimple_assign_lhs (a2);
>>>>>>>         if (TREE_CODE (lhs1) != SSA_NAME
>>>>>>>             && TREE_CODE (lhs2) != SSA_NAME)
>>>>>>>           return (operand_equal_p (lhs1, lhs2, 0)
>>>>>>>                   && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
>>>>>>>                                                    gimple_assign_rhs1 (a2)));
>>>>>>>         else if (TREE_CODE (lhs1) == SSA_NAME
>>>>>>>                  && TREE_CODE (lhs2) == SSA_NAME)
>>>>>>>           return vn_valueize (lhs1) == vn_valueize (lhs2);
>>>>>>>         return false;
>>>>>>>       }
>>>>>>>
>>>>>>> instead.  That's the kind of changes I have expected and have
>>>>>>> approved of.
>>>>>>
>>>>>> But even that looks like just adding extra work for all developers,
>>>>>> with no gain.  You only have to add extra code and extra temporaries,
>>>>>> in switches typically also have to add {} because of the temporaries
>>>>>> and thus extra indentation level, and it doesn't simplify anything in
>>>>>> the code.
>>>>>
>>>>> The branch attempts to use the C++ typesystem to capture information
>>>>> about the kinds of gimple statement we expect, both:
>>>>>   (A) so that the compiler can detect type errors, and
>>>>>   (B) as a comprehension aid to the human reader of the code
>>>>>
>>>>> The ideal here is when function params and struct field can be
>>>>> strengthened from gimple to a subclass ptr.  This captures the
>>>>> knowledge that every use of a function or within a struct has a given
>>>>> gimple code.
>>>>
>>>> I just don't like all the as_a/is_a stuff enforced everywhere,
>>>> it means more typing, more temporaries, more indentation.
>>>> So, as I view it, instead of the checks being done cheaply (yes, I
>>>> think the gimple checking as we have right now is very cheap) under
>>>> the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those
>>>> changes put the burden on the developers, who has to check that
>>>> manually through the as_a/is_a stuff everywhere, more typing and
>>>> uglier syntax.  I just don't see that as a step forward, instead a
>>>> huge step backwards.  But perhaps I'm alone with this.
>>>> Can you e.g. compare the size of - lines in your patchset combined,
>>>> and size of + lines in your patchset?  As in, if your changes lead to
>>>> less typing or more.
>>>
>>> I see two ways out here.  One is to add overloads to all the functions
>>> taking the special types like
>>>
>>>   tree
>>>   gimple_assign_rhs1 (gimple *);
>>>
>>> or simply add
>>>
>>>   gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }
>>>
>>> into a gimple-compat.h header which you include in places that
>>> are not converted nicely.
>>
>> Thanks for the suggestions.
>>
>> Am I missing something, or is the gimple-compat.h idea above not valid
>> C++?
>>
>> Note that gimple is still a typedef to
>>   gimple_statement_base *
>> (as noted before, the gimple -> gimple * change would break everyone
>> else's patches, so we talked about that as a followup patch for early
>> stage3).
>>
>> Given that, if I try to create an operator () outside of a class, I get
>> this error:
>>
>>   ‘gassign* operator()(gimple)’ must be a nonstatic member function
>>
>> which is emitted from cp/decl.c's grok_op_properties:
>>   /* An operator function must either be a non-static member function
>>      or have at least one parameter of a class, a reference to a class,
>>      an enumeration, or a reference to an enumeration.  13.4.0.6 */
>>
>> I tried making it a member function of gimple_statement_base, but that
>> doesn't work either: we want a conversion
>>   from a gimple_statement_base * to a gassign *, not
>>   from a gimple_statement_base   to a gassign *.
>>
>> Is there some syntactic trick here that I'm missing?  Sorry if I'm being
>> dumb (I can imagine there's a way of doing it by making gimple become
>> some kind of wrapped ptr class, but that way lies madness, surely).
>
> Hmm.
>
> struct assign;
> struct base {
>   operator assign *() const { return (assign *)this; }
> };
> struct assign : base {
> };
>
> void foo (assign *);
> void bar (base *b)
> {
>   foo (b);
> }
>
> doesn't work, but
>
> void bar (base b)
> {
>   foo (b);
> }
>
> does.  Indeed C++ doesn't seem to provide what is necessary
> for the compat trick :(
>
> So the gimple-compat.h header would need to provide
> additional overloads for the affected functions like
>
> inline
Re: [Patch, Fortran] Convert gfc_fatal_error to common diagnostics
FX wrote:
> Please document, in the source, the difference between gfc_fatal_error
> and gfc_fatal_error_1.  They currently have the same generic description,
> which wouldn’t help people writing new front-end code to know which one
> to use.
>
> Moreover, if the transition will not be complete soon (or
> indeterminate), it should be added to the wiki’s list of partial
> transitions.

Well, the diagnostics conversion is ongoing and was only delayed due to
delays in reviewing the line-map part of the last patch.  (That part was
required for %C support; the last patch added the _2 variants for
gfc_error_now/gfc_warning_now.)

In any case, the support for %L should be ready soon.  When that's in,
there won't be any need for gfc*_error*/gfc*_warning* duplication any more.

Support for buffered output (and discarding it) will take a bit longer –
but is also planned for GCC 5.  This support is required for
gfc_error/gfc_warning and, hence, for most diagnostic output.

Thus, I don't think it should be put into the wiki.  (Admittedly, I also do
not know which page you are referring to.)  In any case, there are several
PRs about issues fixed by the ongoing change to the common diagnostics.
When that's done, there are still additional tasks for diagnostic
improvements left (see PRs), all of which require the common diagnostics
to be in place.

> Other than that, OK, and thanks for doing this tedious work.

Thanks for the review!  For the diagnostic changes, you have mainly to
thank Manuel, who is the driving force behind all diagnostic work (C, C++)
and who did the lion's share of the Fortran front end work (including the
required changes in the common part).

Tobias

PS: Attached is the error.c part of the committed patch (r217600); I added
a few lines above the functions _2/_1 to make clear when to use them.  I
hope that we can soon remove the old version.

Index: gcc/fortran/error.c
===================================================================
--- gcc/fortran/error.c (revision 217599)
+++ gcc/fortran/error.c (working copy)
@@ -933,6 +933,7 @@ gfc_notify_std (int std, const char *gmsgid, ...)
 
 
 /* Immediate warning (i.e. do not buffer the warning).  */
+/* Use gfc_warning_now_2 instead, unless gmsgid contains a %L.  */
 
 void
 gfc_warning_now (const char *gmsgid, ...)
@@ -1086,6 +1087,7 @@ gfc_diagnostic_finalizer (diagnostic_context *cont
 }
 
 /* Immediate warning (i.e. do not buffer the warning).  */
+/* This function uses the common diagnostics, but does not support %L, yet.  */
 
 bool
 gfc_warning_now_2 (int opt, const char *gmsgid, ...)
@@ -1104,6 +1106,7 @@ gfc_warning_now_2 (int opt, const char *gmsgid, ..
 }
 
 /* Immediate warning (i.e. do not buffer the warning).  */
+/* This function uses the common diagnostics, but does not support %L, yet.  */
 
 bool
 gfc_warning_now_2 (const char *gmsgid, ...)
@@ -1122,6 +1125,7 @@ gfc_warning_now_2 (const char *gmsgid, ...)
 
 
 /* Immediate error (i.e. do not buffer).  */
+/* This function uses the common diagnostics, but does not support %L, yet.  */
 
 void
 gfc_error_now_2 (const char *gmsgid, ...)
@@ -1135,6 +1139,24 @@ gfc_error_now_2 (const char *gmsgid, ...)
   va_end (argp);
 }
 
+
+/* Fatal error, never returns.  */
+/* This function uses the common diagnostics, but does not support %L, yet.  */
+
+void
+gfc_fatal_error (const char *gmsgid, ...)
+{
+  va_list argp;
+  diagnostic_info diagnostic;
+
+  va_start (argp, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION, DK_FATAL);
+  report_diagnostic (&diagnostic);
+  va_end (argp);
+
+  gcc_unreachable ();
+}
+
 
 /* Clear the warning flag.  */
 
 void
@@ -1213,6 +1235,7 @@ warning:
 
 
 /* Immediate error.  */
+/* Use gfc_error_now_2 instead, unless gmsgid contains a %L.  */
 
 void
 gfc_error_now (const char *gmsgid, ...)
@@ -1243,9 +1266,10 @@ gfc_error_now (const char *gmsgid, ...)
 
 
 /* Fatal error, never returns.  */
+/* Use gfc_fatal_error instead, unless gmsgid contains a %L.  */
 
 void
-gfc_fatal_error (const char *gmsgid, ...)
+gfc_fatal_error_1 (const char *gmsgid, ...)
 {
   va_list argp;
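The variadic wrapper shape used by the new gfc_fatal_error (va_start, hand the format and arguments to a formatter, va_end) can be sketched outside GCC. This is only an illustration under stated assumptions: format_fatal is a hypothetical function, the "Fatal Error: " prefix is chosen for the example, and vsnprintf stands in for the common diagnostics machinery, so GCC-specific directives such as %qs, %<, %>, %C and %L are not understood here. A real fatal-error routine would also terminate the program instead of returning.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the formatting step of a fatal-error routine;
   it is not the GCC diagnostics machinery.  Write a fixed prefix followed
   by the message built from GMSGID and its arguments into BUF, which has
   room for SIZE bytes.  Return the length the complete message needs, or
   -1 if BUF cannot even hold the prefix or formatting fails.  */
int
format_fatal (char *buf, size_t size, const char *gmsgid, ...)
{
  static const char prefix[] = "Fatal Error: ";
  size_t plen = sizeof prefix - 1;
  va_list argp;
  int n;

  if (size <= plen)
    return -1;
  memcpy (buf, prefix, plen);

  /* Same bracket as in the patch: va_start, format, va_end.  */
  va_start (argp, gmsgid);
  n = vsnprintf (buf + plen, size - plen, gmsgid, argp);
  va_end (argp);

  return n < 0 ? -1 : (int) plen + n;
}
```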
Re: [PATCH] Add force option to find_best_rename_reg in regrename pass
> It looks at the registers that respect the constraints of all the
> instructions in the set and tries to pick one in the preferred class for
> all the instructions involved.  This is generally useful for any pass
> that wants to do register renaming.  However, it also contains some logic
> to select only a register that has also been unused for a longer time
> than the register that should be replaced.  This bit is specific to the
> register renaming pass and makes the function unusable for this new
> pass, which forces us to do a copy of the function.
>
> This patch adds an extra parameter to skip this check, so that the
> function only considers the constraints and tries to pick a register in
> the preferred class.

OK on principle but...

> 2014-11-14  Thomas Preud'homme  thomas.preudho...@arm.com
>
>   * regrename.c (find_best_rename_reg): Rename to ...
>   (find_rename_reg): This.  Also add a parameter to skip tick check.
>   * regrename.h: Likewise.
>   * config/c6x/c6x.c: Adapt to above renaming.

Missing function in config/c6x/c6x.c entry.

> @@ -408,8 +410,13 @@ find_best_rename_reg (du_head_p this_head, enum reg_class super_class,
>                && ((pass == 0
>                     && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
>                                            best_new_reg))
> -                  || tick[best_new_reg] > tick[new_reg]))
> -            best_new_reg = new_reg;
> +                  || !best_rename || tick[best_new_reg] > tick[new_reg]))
> +            {
> +              if (best_rename)
> +                best_new_reg = new_reg;
> +              else
> +                return new_reg;
> +            }
>          }
>        if (pass == 0 && best_new_reg != old_reg)
>          break;

Please write it like so:

          if (!check_new_reg_p (old_reg, new_reg, this_head, *unavailable))
            continue;

          if (!best_rename)
            return new_reg;

          /* In the first pass, we force the renaming of registers that
             don't belong to PREFERRED_CLASS to registers that do, even
             though the latters were used not very long ago.  */
          if ((pass == 0
               && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
                                      best_new_reg))
              || tick[best_new_reg] > tick[new_reg])
            best_new_reg = new_reg;

-- 
Eric Botcazou
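The control flow suggested in the review can be tried out on a toy model. The sketch below is a deliberate simplification under stated assumptions: pick_rename_reg, usable and tick are hypothetical names for this example, the two passes and the preferred register class of the real function are left out, and usable[] merely stands in for check_new_reg_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the selection loop; it is not the
   code in regrename.c.  USABLE[r] says whether register R satisfies all
   constraints and TICK[r] records when R was last used.  If BEST_RENAME
   is false, return the first usable register.  If it is true, return the
   usable register with the smallest tick when that tick is smaller than
   the one of OLD_REG, and OLD_REG otherwise.  */
int
pick_rename_reg (int old_reg, int nregs, const bool *usable,
                 const int *tick, bool best_rename)
{
  int best_new_reg = old_reg;
  int new_reg;

  for (new_reg = 0; new_reg < nregs; new_reg++)
    {
      if (new_reg == old_reg || !usable[new_reg])
        continue;

      /* Early exit requested by the caller: skip the tick check.  */
      if (!best_rename)
        return new_reg;

      if (tick[best_new_reg] > tick[new_reg])
        best_new_reg = new_reg;
    }

  return best_new_reg;
}
```

Putting the early return right after the validity check keeps the tick comparison untouched for the renaming pass, which appears to be the point of the requested rewrite.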
Re: [Patch, Fortran] Convert gfc_fatal_error to common diagnostics
> Thus, I don't think it should be put into the wiki.  (Admittedly, I also
> do not know which page you are referring to.)  In any case, there are
> several PRs about issues fixed by the on-going change to the common
> diagnostics.

OK.
(The page I was referring to is here:
https://gcc.gnu.org/wiki/Partial_Transitions)

FX
[PATCH] Fix Cilk+ ICEs with overflow builtins (PR middle-end/63884)
The problem here is that the Cilk+ code wasn't prepared to handle
internal calls that the new overflow builtins entail.  Fixed by
checking that the CALL_EXPR_FN isn't NULL.

Looking at cilk-plus.exp, I think this file will need some tweaks
now that the C default is gnu11...

Bootstrapped/regtested on powerpc64-linux, ok for trunk?

2014-11-15  Marek Polacek  <pola...@redhat.com>

  PR middle-end/63884
c-family/
  * array-notation-common.c (is_sec_implicit_index_fn): Return false
  for NULL fndecl.
  (extract_array_notation_exprs): Return for NULL node.
testsuite/
  * c-c++-common/cilk-plus/AN/pr63884.c: New test.

diff --git gcc/c-family/array-notation-common.c gcc/c-family/array-notation-common.c
index f8bce04..cb5708c 100644
--- gcc/c-family/array-notation-common.c
+++ gcc/c-family/array-notation-common.c
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.  If not see
 bool
 is_sec_implicit_index_fn (tree fndecl)
 {
+  if (!fndecl)
+    return false;
+
   if (TREE_CODE (fndecl) == ADDR_EXPR)
     fndecl = TREE_OPERAND (fndecl, 0);
 
@@ -327,6 +330,9 @@ extract_array_notation_exprs (tree node, bool ignore_builtin_fn,
                               vec<tree, va_gc> **array_list)
 {
   size_t ii = 0;
+
+  if (!node)
+    return;
   if (TREE_CODE (node) == ARRAY_NOTATION_REF)
     {
       vec_safe_push (*array_list, node);
diff --git gcc/testsuite/c-c++-common/cilk-plus/AN/pr63884.c gcc/testsuite/c-c++-common/cilk-plus/AN/pr63884.c
index e69de29..c876a8d 100644
--- gcc/testsuite/c-c++-common/cilk-plus/AN/pr63884.c
+++ gcc/testsuite/c-c++-common/cilk-plus/AN/pr63884.c
@@ -0,0 +1,10 @@
+/* PR middle-end/63884 */
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+
+int
+foo (int x, int y)
+{
+  int r;
+  return __builtin_sadd_overflow (x, y, &r);
+}

  Marek
Add log message for max-completely-peeled-times
Try_unroll_loop_completely logs a message for max-completely-peeled-insns:

  else if (unr_insns
           > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "Not unrolling loop %d: "
                 "(--param max-completely-peeled-insns limit reached).\n",
                 loop->num);
      return false;
    }

but not for max-completely-peeled-times:

  max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
  if (n_unroll > max_unroll)
    return false;

so the attached patch adds one.

Tested on x86_64-suse-linux, applied on the mainline as obvious.

2014-11-15  Eric Botcazou  <ebotca...@adacore.com>

  * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add log message
  for max-completely-peeled-insns limit.

-- 
Eric Botcazou

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c (revision 217538)
+++ tree-ssa-loop-ivcanon.c (working copy)
@@ -674,7 +674,7 @@ try_unroll_loop_completely (struct loop
                             HOST_WIDE_INT maxiter,
                             location_t locus)
 {
-  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, max_unroll, unr_insns;
+  unsigned HOST_WIDE_INT n_unroll = 0, ninsns, unr_insns;
   gimple cond;
   struct loop_size size;
   bool n_unroll_found = false;
@@ -720,9 +720,14 @@ try_unroll_loop_completely (struct loop
   if (!n_unroll_found)
     return false;
 
-  max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
-  if (n_unroll > max_unroll)
-    return false;
+  if (n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file, "Not unrolling loop %d "
+                 "(--param max-completely-peeled-times limit reached).\n",
+                 loop->num);
+      return false;
+    }
 
   if (!edge_to_cancel)
     edge_to_cancel = loop_edge_to_cancel (loop);
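The guard-and-report shape that the patch introduces can be exercised on its own. What follows is a minimal sketch, assuming a hypothetical function within_peel_limit with plain parameters in place of GCC's dump_file, dump_flags and PARAM_VALUE; only the message text is taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the check in try_unroll_loop_completely; it
   is not GCC code.  Return 1 if a loop that would be unrolled N_UNROLL
   times stays within the limit MAX_TIMES.  Otherwise return 0 and, when
   DUMP is non-null, write the reason to it.  */
int
within_peel_limit (FILE *dump, int loop_num, unsigned long n_unroll,
                   unsigned long max_times)
{
  if (n_unroll > max_times)
    {
      if (dump)
        fprintf (dump, "Not unrolling loop %d "
                 "(--param max-completely-peeled-times limit reached).\n",
                 loop_num);
      return 0;
    }
  return 1;
}
```

Passing stderr as DUMP prints the reason; passing a null pointer keeps the rejection silent, which is how the code behaved before the patch.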
openacc kernels directive -- initial support
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization
of loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
    1  Expand oacc kernels after pass_build_ealias
    2  Add pass_oacc_kernels
    3  Add pass_ch_oacc_kernels to pass_oacc_kernels
    4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
    5  Add pass_loop_im to pass_oacc_kernels
    6  Add pass_ccp to pass_oacc_kernels
    7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
    8  Do simple omp lowering for no address taken var
...

The patch series does not yet apply cleanly to trunk, since it's dependent
on the oacc middle end changes present in the gomp-4_0-branch, already
submitted by Thomas for trunk.  Furthermore, it's dependent on an assert
fix submitted for trunk ('Fix gcc_assert in expand_omp_for_static_chunk' @
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01149.html ).

The patch series is intended for trunk, but - given the dependency on the
oacc middle end changes - has been bootstrapped for x86_64 on top of
gomp-4_0-branch.

I'll post the patch series in reply to this email.

Thanks,
- Tom

[ FTR
In order to get clean libgomp and goacc test results in gomp-4_0-branch, to
have a good basis for testing, I used the following patch set:

    Don't allow flto-partition=balance for fopenacc
      Unsubmitted.  This works around a compilation problem for
      libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-2.c that I ran
      into on our internal dev branch.  I'll investigate whether I can
      reproduce with gomp-4_0-branch asap.
    Mark fopenacc as LTO option
      @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00085.html
    Only use nvidia accelerator if present
      @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00247.html
    Set default LIBGOMP_PLUGIN_PATH
      @ https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00242.html
]
Fix ICE on pragma Loop_Optimize in Ada
The attached testcase triggers an ICE because it contains pragma Loop_Optimize which survives down to RTL expansion. There are 2 bugs: first, this should not ICE but issue an "ignoring loop annotation" message (this was accidentally disabled in https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00681.html) and, second, the code must look for IFN_ANNOTATE calls in the latch as well.

Tested on x86_64-suse-linux, applied on the mainline as obvious.

2014-11-15 Eric Botcazou ebotca...@adacore.com

  * tree-cfg.c (replace_loop_annotate_in_block): New function extracted from...
  (replace_loop_annotate): ...here. Call it on the header and on the latch block, if any. Restore proper behavior of final cleanup.

2014-11-15 Eric Botcazou ebotca...@adacore.com

  * gnat.dg/opt44.ad[sb]: New test.

-- 
Eric Botcazou

Index: tree-cfg.c
===
--- tree-cfg.c (revision 217538)
+++ tree-cfg.c (working copy)
@@ -265,13 +265,56 @@ build_gimple_cfg (gimple_seq seq)
   discriminator_per_locus = NULL;
 }

+/* Look for ANNOTATE calls with loop annotation kind in BB; if found, remove
+   them and propagate the information to LOOP. We assume that the annotations
+   come immediately before the condition in BB, if any.
*/
+
+static void
+replace_loop_annotate_in_block (basic_block bb, struct loop *loop)
+{
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple stmt = gsi_stmt (gsi);
+
+  if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+    return;
+
+  for (gsi_prev_nondebug (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) != GIMPLE_CALL)
+        break;
+      if (!gimple_call_internal_p (stmt)
+          || gimple_call_internal_fn (stmt) != IFN_ANNOTATE)
+        break;
+
+      switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1)))
+        {
+        case annot_expr_ivdep_kind:
+          loop->safelen = INT_MAX;
+          break;
+        case annot_expr_no_vector_kind:
+          loop->dont_vectorize = true;
+          break;
+        case annot_expr_vector_kind:
+          loop->force_vectorize = true;
+          cfun->has_force_vectorize_loops = true;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+
+      stmt = gimple_build_assign (gimple_call_lhs (stmt),
+                                  gimple_call_arg (stmt, 0));
+      gsi_replace (&gsi, stmt, true);
+    }
+}
 /* Look for ANNOTATE calls with loop annotation kind; if found, remove them
    and propagate the information to the loop. We assume that the annotations
    come immediately before the condition of the loop.
*/
 static void
-replace_loop_annotate ()
+replace_loop_annotate (void)
 {
   struct loop *loop;
   basic_block bb;
@@ -280,37 +323,12 @@ replace_loop_annotate ()
   FOR_EACH_LOOP (loop, 0)
     {
-      gsi = gsi_last_bb (loop->header);
-      stmt = gsi_stmt (gsi);
-      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
-        continue;
-      for (gsi_prev_nondebug (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
-        {
-          stmt = gsi_stmt (gsi);
-          if (gimple_code (stmt) != GIMPLE_CALL)
-            break;
-          if (!gimple_call_internal_p (stmt)
-              || gimple_call_internal_fn (stmt) != IFN_ANNOTATE)
-            break;
-          switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1)))
-            {
-            case annot_expr_ivdep_kind:
-              loop->safelen = INT_MAX;
-              break;
-            case annot_expr_no_vector_kind:
-              loop->dont_vectorize = true;
-              break;
-            case annot_expr_vector_kind:
-              loop->force_vectorize = true;
-              cfun->has_force_vectorize_loops = true;
-              break;
-            default:
-              gcc_unreachable ();
-            }
-          stmt = gimple_build_assign (gimple_call_lhs (stmt),
-                                      gimple_call_arg (stmt, 0));
-          gsi_replace (&gsi, stmt, true);
-        }
+      /* First look into the header. */
+      replace_loop_annotate_in_block (loop->header, loop);
+
+      /* Then look into the latch, if any. */
+      if (loop->latch)
+        replace_loop_annotate_in_block (loop->latch, loop);
     }

   /* Remove IFN_ANNOTATE. Safeguard for the case loop->latch == NULL.
*/
@@ -320,10 +338,11 @@ replace_loop_annotate ()
       {
         stmt = gsi_stmt (gsi);
         if (gimple_code (stmt) != GIMPLE_CALL)
-          break;
+          continue;
         if (!gimple_call_internal_p (stmt)
             || gimple_call_internal_fn (stmt) != IFN_ANNOTATE)
-          break;
+          continue;
+
         switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1)))
           {
           case annot_expr_ivdep_kind:
@@ -333,6 +352,7 @@ replace_loop_annotate ()
           default:
             gcc_unreachable ();
           }
+
         warning_at (gimple_location (stmt), 0, "ignoring loop annotation");
         stmt = gimple_build_assign (gimple_call_lhs (stmt),
                                     gimple_call_arg (stmt, 0));

-- { dg-do compile }
-- { dg-options "-O" }

package body Opt44 is

   procedure Addsub (X, Y : Sarray; R : out Sarray; N : Integer) is
   begin
      for I in Sarray'Range loop
         pragma Loop_Optimize (Ivdep);
         pragma Loop_Optimize (Vector);
         if N 0 then R(I) :=
Re: [PATCH 2/5] combine: handle I2 a parallel of two SETs
On Fri, Nov 14, 2014 at 08:35:48PM +0100, Bernd Schmidt wrote:
On 11/14/2014 08:19 PM, Segher Boessenkool wrote:

+ /* If I2 is a PARALLEL of two SETs of REGs (and perhaps some CLOBBERs),
+    make those two SETs separate I1 and I2 insns, and make an I0 that is
+    the original I1. */
+ if (i0 == 0
+     && GET_CODE (PATTERN (i2)) == PARALLEL
+     && XVECLEN (PATTERN (i2), 0) >= 2
+     && GET_CODE (XVECEXP (PATTERN (i2), 0, 0)) == SET
+     && GET_CODE (XVECEXP (PATTERN (i2), 0, 1)) == SET
+     && REG_P (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)))
+     && REG_P (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)))
+     && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3)
+     && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3)

Don't we have other code in combine checking the reg_used_between case?

It doesn't make any sense at all. What I wanted to check is whether splitting the parallel creates a conflict, but I woefully failed at that. Will fix.

+     && (XVECLEN (PATTERN (i2), 0) == 2
+         || GET_CODE (XVECEXP (PATTERN (i2), 0, 2)) == CLOBBER))

This probably wants to test for XVECLEN == 3 for the second case. Can then drop the earlier test.

It needs to test there are exactly two SETs, any amount of clobbers, and nothing else.

I think you also need to check that !reg_overlap_mentioned_p between the two dests and the other set's sources.

Only the dest of the new I1 with the sources of the new I2, but yes.

Segher
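The shape test Segher describes (exactly two SETs, any number of CLOBBERs, nothing else) can be sketched on a simplified model. The enum and the function below are hypothetical and do not use GCC's rtx accessors; the REG_P and register-overlap conditions from the thread are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rtx codes of the elements of a PARALLEL.  */
enum demo_code { DEMO_SET, DEMO_CLOBBER, DEMO_OTHER };

/* Return true if ELTS[0..N-1] is exactly two SETs followed by any number
   of CLOBBERs and nothing else.  */
static bool
two_sets_then_clobbers_p (const enum demo_code *elts, size_t n)
{
  if (n < 2 || elts[0] != DEMO_SET || elts[1] != DEMO_SET)
    return false;

  /* A third SET, or any other kind of element, disqualifies the pattern.  */
  for (size_t i = 2; i < n; i++)
    if (elts[i] != DEMO_CLOBBER)
      return false;

  return true;
}
```

Looping over the tail rather than testing a fixed length is what allows "any amount of clobbers" without special-casing vectors of length 2 or 3.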
Re: [PATCH][AArch64] LR register not used in leaf functions
2014-11-15 0:15 GMT+00:00 Andrew Pinski pins...@gmail.com:
On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang jiong.w...@arm.com wrote:
On 27/09/14 22:20, Kugan wrote:
On 23/09/14 01:58, Jiong Wang wrote:

+ /* If we decided that we didn't need a leaf frame pointer but then used
+    LR in the function, then we'll want a frame pointer after all, so
+    prevent this elimination to ensure a frame pointer is used. */
+ if (to == STACK_POINTER_REGNUM
+     && flag_omit_leaf_frame_pointer
+     && df_regs_ever_live_p (LR_REGNUM))
+   return false;

This breaks my build on aarch64-elf (with some local modifications)

Hi Andrew, what is your local modification then? I think we need to figure out why there is an ICE after your local modification. Can you please send me your local modification and a testcase if possible?

aarch64_frame_pointer_required returns true but then we use LR, so now aarch64_can_eliminate and aarch64_frame_pointer_required are inconsistent, which is not a valid thing for LRA (and reload). This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html :

IRA calls hook frame_pointer_required and it returns false. After that LRA calls can_eliminate hook and it returns false, which means that fp cannot be used for allocation and we should spill all pseudos assigned to it.

Thanks,
Andrew Pinski
Re: patch switching on LRA remat
On Fri, Nov 14, 2014 at 12:07 PM, Vladimir Makarov vmaka...@redhat.com wrote:

The LRA rematerialization patch I submitted about a day ago broke H.J.'s 32-bit bootstrap, so I switched off the rematerialization right away. The set of options H.J. uses for bootstrapping was very useful: I've fixed several existing and potential bugs. Here is the patch fixing the bugs and switching LRA remat back on.

The patch was bootstrapped on x86-64 and i686 (using H.J.'s options).

Committed as rev. 217588.

2014-11-14 Vladimir Makarov vmaka...@redhat.com

  * lra-int.h (lra_create_live_ranges): Add parameter.
  * lra-lives.c (temp_bitmap): Move higher.
  (initiate_live_solver): Move temp_bitmap initialization into lra_live_ranges_init.
  (finish_live_solver): Move temp_bitmap clearing into live_ranges_finish.
  (process_bb_lives): Add parameter. Use it to control live info update and dead insn elimination. Pass it to mark_regno_live and mark_regno_dead.
  (lra_create_live_ranges): Add parameter. Pass it to process_bb_lives.
  (lra_live_ranges_init, lra_live_ranges_finish): See changes in initiate_live_solver and finish_live_solver.
  * lra-remat.c (do_remat): Process insn non-operand hard regs too. Use temp_bitmap to update avail_cands.
  * lra.c (lra): Pass new parameter to lra_create_live_ranges. Move check with lra_need_for_spill_p after live range pass. Switch on rematerialization pass.

Unfortunately, it failed to bootstrap ia32 GCC:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00392.html

You can bootstrap ia32 GCC on Linux/x86-64:

1. Install ia32 binutils under /foo/bar.
2. Set PATH=/foo/bar:$PATH
3. Install 32-bit libraries used by GCC, glibc, mpfr, gmp, libmpc. ...
4. Configure GCC with

CC="gcc -m32" CXX="g++ -m32" ../src-trunk/configure \
  --with-arch=core2 --with-cpu=slm --prefix=/usr/5.0.0 --enable-clocale=gnu --enable-shared --with-demangler-in-ld i686-linux --with-fpmath=sse --enable-languages=c,c++

-- 
H.J.
Re: [PATCH][AArch64] LR register not used in leaf functions
On Sat, Nov 15, 2014 at 6:08 AM, Jiong Wang wong.kwongyuan.to...@gmail.com wrote:
2014-11-15 0:15 GMT+00:00 Andrew Pinski pins...@gmail.com:
On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang jiong.w...@arm.com wrote:
On 27/09/14 22:20, Kugan wrote:
On 23/09/14 01:58, Jiong Wang wrote:

+ /* If we decided that we didn't need a leaf frame pointer but then used
+    LR in the function, then we'll want a frame pointer after all, so
+    prevent this elimination to ensure a frame pointer is used. */
+ if (to == STACK_POINTER_REGNUM
+     && flag_omit_leaf_frame_pointer
+     && df_regs_ever_live_p (LR_REGNUM))
+   return false;

This breaks my build on aarch64-elf (with some local modifications)

Hi Andrew, what is your local modification then? I think we need to figure out why there is an ICE after your local modification. Can you please send me your local modification and a testcase if possible?

My local modifications can be found in the gcc git at apinski/thunderx-cost. Note I reverted this patch so I can continue working. The testcase is compiling newlib. Let me try to get it again. I was configuring a combined build with:

--disable-fixed-point --without-ppl --without-python --disable-werror --enable-plugins --enable-checking --disable-sim --with-newlib --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 --target=aarch64-thunderx-elf --enable-languages=c,c++

Thanks,
Andrew Pinski

aarch64_frame_pointer_required returns true but then we use LR, so now aarch64_can_eliminate and aarch64_frame_pointer_required are inconsistent, which is not a valid thing for LRA (and reload). This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html :

IRA calls hook frame_pointer_required and it returns false. After that LRA calls can_eliminate hook and it returns false, which means that fp cannot be used for allocation and we should spill all pseudos assigned to it.

Thanks,
Andrew Pinski
[committed,testsuite] Fix dg-error for a darwin testcase
Committed as trivial, as the error wording changed due to more precise diagnostics: it now says ‘CFStringRef {aka const struct __CFString *}’ instead of just ‘CFStringRef’

FX

2014-10-19 Francois-Xavier Coudert fxcoud...@gcc.gnu.org

  * gcc.dg/darwin-cfstring-format-1.c: Adjust dg-error.

Index: gcc.dg/darwin-cfstring-format-1.c
===
--- gcc.dg/darwin-cfstring-format-1.c (revision 217599)
+++ gcc.dg/darwin-cfstring-format-1.c (working copy)
@@ -18,7 +18,7 @@ int s2 (int a, CFStringRef fmt, ... ) __
 int s2a (int a, CFStringRef fmt, ... ) __attribute__((format(CFString, 2, 2))) ; /* { dg-error "format string argument follows the args to be formatted" } */
 int s3 (const char *fmt, ... ) __attribute__((format(__CFString__, 1, 2))) ; /* { dg-error "format argument should be a .CFString. reference but a string was found" } */
-int s4 (CFStringRef fmt, ... ) __attribute__((format(printf, 1, 2))) ; /* { dg-error "found a .CFStringRef. but the format argument should be a string" } */
+int s4 (CFStringRef fmt, ... ) __attribute__((format(printf, 1, 2))) ; /* { dg-error "found a .CFStringRef.* but the format argument should be a string" } */
 char *s5 (char dum, char *fmt1, ... ) __attribute__((format_arg(2))) ; /* OK */
 CFStringRef s6 (CFStringRef dum, CFStringRef fmt1, ... ) __attribute__((format_arg(2))) ; /* OK */
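To see why widening `.CFStringRef.` to `.CFStringRef.*` is enough, the two patterns can be tried against both wordings of the diagnostic. The sketch below assumes POSIX extended regular expressions from <regex.h> as a stand-in for the Tcl regexps that DejaGnu actually uses, and ASCII quotes in place of the typographic ones; pattern_matches is a hypothetical helper, not part of the testsuite.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN, compiled as a POSIX extended regular expression,
   matches anywhere in TEXT.  A pattern that fails to compile counts as a
   non-match.  */
static bool
pattern_matches (const char *pattern, const char *text)
{
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  bool matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With the old pattern, the single `.` after CFStringRef can only consume the closing quote, so the added " {aka ...}" text breaks the match; `.*` absorbs it while still accepting the shorter wording.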
Re: [gofrontend-dev] [PATCH 4/4] Gccgo port to s390[x] -- part II
On Thu, Nov 13, 2014 at 2:58 AM, Dominik Vogt v...@linux.vnet.ibm.com wrote: What do you think about the attached patches? They work for me, but I'm not sure whether the patch to go-test.exp is good because I know nothing about tcl. Looks plausible to me. Ian
Re: [PATCH][AArch64] LR register not used in leaf functions
On Sat, Nov 15, 2014 at 7:21 AM, Andrew Pinski pins...@gmail.com wrote:
On Sat, Nov 15, 2014 at 6:08 AM, Jiong Wang wong.kwongyuan.to...@gmail.com wrote:
2014-11-15 0:15 GMT+00:00 Andrew Pinski pins...@gmail.com:
On Tue, Sep 30, 2014 at 8:00 AM, Jiong Wang jiong.w...@arm.com wrote:
On 27/09/14 22:20, Kugan wrote:
On 23/09/14 01:58, Jiong Wang wrote:

+ /* If we decided that we didn't need a leaf frame pointer but then used
+    LR in the function, then we'll want a frame pointer after all, so
+    prevent this elimination to ensure a frame pointer is used. */
+ if (to == STACK_POINTER_REGNUM
+     && flag_omit_leaf_frame_pointer
+     && df_regs_ever_live_p (LR_REGNUM))
+   return false;

This breaks my build on aarch64-elf (with some local modifications)

Hi Andrew, what is your local modification then? I think we need to figure out why there is an ICE after your local modification. Can you please send me your local modification and a testcase if possible?

My local modifications can be found in the gcc git at apinski/thunderx-cost. Note I reverted this patch so I can continue working. The testcase is compiling newlib. Let me try to get it again. I was configuring a combined build with:

--disable-fixed-point --without-ppl --without-python --disable-werror --enable-plugins --enable-checking --disable-sim --with-newlib --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 --target=aarch64-thunderx-elf --enable-languages=c,c++

Attached is the preprocessed source. cc1 strtol.i -mabi=ilp32 -O2 is enough to reproduce the ICE.

Thanks,
Andrew

Thanks,
Andrew Pinski

aarch64_frame_pointer_required returns true but then we use LR, so now aarch64_can_eliminate and aarch64_frame_pointer_required are inconsistent, which is not a valid thing for LRA (and reload). This was mentioned in https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00151.html :

IRA calls hook frame_pointer_required and it returns false. After that LRA calls can_eliminate hook and it returns false, which means that fp cannot be used for allocation and we should spill all pseudos assigned to it.

Thanks,
Andrew Pinski
[committed,testsuite] Fix missing includes for darwin testcases
Committed as trivial. And also, fixed wrong date on my earlier ChangeLog entry :)

FX

2014-11-15 Francois-Xavier Coudert fxcoud...@gcc.gnu.org

  * gcc.dg/pubtypes-3.c: Include string.h.
  * gcc.dg/pubtypes-4.c: Likewise.

Index: gcc.dg/pubtypes-3.c
===
--- gcc.dg/pubtypes-3.c (revision 217599)
+++ gcc.dg/pubtypes-3.c (working copy)
@@ -9,6 +9,7 @@
 #include <stdlib.h>
 #include <stdio.h>
+#include <string.h>

 struct used_struct
 {
Index: gcc.dg/pubtypes-4.c
===
--- gcc.dg/pubtypes-4.c (revision 217599)
+++ gcc.dg/pubtypes-4.c (working copy)
@@ -11,6 +11,7 @@
 #include <stdlib.h>
 #include <stdio.h>
+#include <string.h>

 struct used_struct
 {
Re: [BUILDROBOT] error: ‘cl_target_option_stream_in’ was not declared in this scope (was: LTO streaming of TARGET_OPTIMIZE_NODE)
On Fri, 2014-11-14 19:53:33 +0100, Jan Hubicka hubi...@ucw.cz wrote:

Breaks build:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -o tree-streamer-in.o -MT tree-streamer-in.o -MMD -MP -MF ./.deps/tree-streamer-in.TPo /home/jbglaw/repos/gcc/gcc/tree-streamer-in.c
/home/jbglaw/repos/gcc/gcc/tree-streamer-in.c: In function ‘void unpack_value_fields(data_in*, bitpack_d*, tree)’:
/home/jbglaw/repos/gcc/gcc/tree-streamer-in.c:527:180: error: ‘cl_target_option_stream_in’ was not declared in this scope
make[1]: *** [tree-streamer-in.o] Error 1

See eg. these builds:

http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=376049
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=376050
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=376051

I managed to do a partial commit (mistyping lto-streamer.h). It should be fixed by the followup commit a minute later. Does it work for you now?

Looks good. Thanks!

MfG, JBG

-- 
Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481
Signature of: They that give up essential liberty to obtain temporary safety,
the second  : deserve neither liberty nor safety. (Ben Franklin)
[committed,testsuite] Only run gcc.dg/tree-ssa/pr61144.c when aliases are supported
All other tests in gcc.dg/ that use __attribute__((__alias__())) are guarded by dg-require-alias. Let’s do the same for gcc.dg/tree-ssa/pr61144.c, otherwise it complains on darwin.

2014-11-15 Francois-Xavier Coudert fxcoud...@gcc.gnu.org

  * gcc.dg/tree-ssa/pr61144.c: Add dg-require-alias.

Index: gcc.dg/tree-ssa/pr61144.c
===
--- gcc.dg/tree-ssa/pr61144.c (revision 217599)
+++ gcc.dg/tree-ssa/pr61144.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-weak "" } */
+/* { dg-require-alias "" } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 static int dummy = 0;
 extern int foo __attribute__((__weak__, __alias__("dummy")));
Reorg streaming of OPTIMIZATION_NODE
Hi,

this patch implements OPTIMIZATION_NODE streaming the same way as the previous patch did for TARGET_OPTION_NODE. Since the code turned out to be completely analogous to the previous one, I will go ahead and commit it as obvious. It will help to make followup changes easier to follow.

I also tested this by forcing the default optimization node on every function with LTO. It seems to just work, modulo the inliner ignoring most of the flags and happily dragging code from one set of optimization options to another.

Bootstrapped/regtested ppc64-linux and x86_64-linux, tested with Firefox, committed.

Honza

  * lto-streamer-out.c (hash_tree): Use cl_optimization_hash.
  * lto-streamer.h (cl_optimization_stream_out, cl_optimization_stream_in): Declare.
  * optc-save-gen.awk: Generate cl_optimization LTO streaming and hashing routines.
  * opth-gen.awk: Add prototype of cl_optimization_hash.
  * tree-streamer-in.c (unpack_ts_optimization): Remove.
  (streamer_unpack_tree_bitfields): Use cl_optimization_stream_in.
  * tree-streamer-out.c (pack_ts_optimization): Remove.
  (streamer_pack_tree_bitfields): Use cl_optimization_stream_out.

Index: lto-streamer-out.c
===
--- lto-streamer-out.c (revision 217572)
+++ lto-streamer-out.c (working copy)
@@ -948,7 +948,7 @@
     hstate.add_wide_int (cl_target_option_hash (TREE_TARGET_OPTION (t)));

   if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
-    hstate.add (t, sizeof (struct cl_optimization));
+    hstate.add_wide_int (cl_optimization_hash (TREE_OPTIMIZATION (t)));

   if (CODE_CONTAINS_STRUCT (code, TS_IDENTIFIER))
     hstate.merge_hash (IDENTIFIER_HASH_VALUE (t));
Index: lto-streamer.h
===
--- lto-streamer.h (revision 217572)
+++ lto-streamer.h (working copy)
@@ -844,7 +844,11 @@
                                 struct bitpack_d *,
                                 struct cl_target_option *);

+void cl_optimization_stream_out (struct bitpack_d *, struct cl_optimization *);
+void cl_optimization_stream_in (struct bitpack_d *, struct cl_optimization *);
+
+
 /* In lto-symtab.c.
*/
 extern void lto_symtab_merge_decls (void);
 extern void lto_symtab_merge_symbols (void);
Index: optc-save-gen.awk
===
--- optc-save-gen.awk (revision 217571)
+++ optc-save-gen.awk (working copy)
@@ -551,4 +551,61 @@
 print "};";

+n_opt_val = 2;
+var_opt_val[0] = "x_optimize"
+var_opt_val_type[0] = "char "
+var_opt_val[1] = "x_optimize_size"
+var_opt_val_type[1] = "char "
+for (i = 0; i < n_opts; i++) {
+  if (flag_set_p("Optimization", flags[i])) {
+    name = var_name(flags[i])
+    if(name == "")
+      continue;
+
+    if(name in var_opt_list_seen)
+      continue;
+
+    var_opt_list_seen[name]++;
+
+    otype = var_type_struct(flags[i])
+    var_opt_val_type[n_opt_val] = otype;
+    var_opt_val[n_opt_val++] = "x_" name;
+  }
 }
+print "";
+print "/* Hash optimization options */";
+print "hashval_t";
+print "cl_optimization_hash (struct cl_optimization const *ptr ATTRIBUTE_UNUSED)";
+print "{";
+print "  inchash::hash hstate;";
+for (i = 0; i < n_opt_val; i++) {
+  name = var_opt_val[i]
+  print "  hstate.add_wide_int (ptr->" name ");";
+}
+print "  return hstate.end ();";
+print "}";
+
+print "";
+print "/* Stream out optimization options */";
+print "void";
+print "cl_optimization_stream_out (struct bitpack_d *bp,";
+print "                            struct cl_optimization *ptr)";
+print "{";
+for (i = 0; i < n_opt_val; i++) {
+  name = var_opt_val[i]
+  print "  bp_pack_value (bp, ptr->" name ", 64);";
+}
+print "}";
+
+print "";
+print "/* Stream in optimization options */";
+print "void";
+print "cl_optimization_stream_in (struct bitpack_d *bp,";
+print "                           struct cl_optimization *ptr)";
+print "{";
+for (i = 0; i < n_opt_val; i++) {
+  name = var_opt_val[i]
+  print "  ptr->" name " = (" var_opt_val_type[i] ") bp_unpack_value (bp, 64);";
+}
+print "}";
+}
Index: opth-gen.awk
===
--- opth-gen.awk (revision 217571)
+++ opth-gen.awk (working copy)
@@ -299,6 +299,9 @@
 print "/* Hash option variables from a structure. */";
 print "extern hashval_t cl_target_option_hash (const struct cl_target_option *);";
 print "";
+print "/* Hash optimization from a structure. */";
+print "extern hashval_t cl_optimization_hash (const struct cl_optimization *);";
+print "";
 print "/* Anything that includes tm.h, does not necessarily need this. */"
 print "#if !defined(GCC_TM_H)"
 print "#include \"input.h\" /* for location_t */"
Index: tree-streamer-in.c
===
--- tree-streamer-in.c (revision 217571)
+++ tree-streamer-in.c
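The routines the awk script generates follow one pattern: visit every option field in a fixed order and either fold it into a hash, pack it as a 64-bit value, or unpack it with a cast back to the field type. A minimal sketch of that pattern follows, assuming a made-up three-field struct and a toy word buffer; struct demo_opts, struct demo_pack and the multiply-by-31 hash are hypothetical and are not GCC's cl_optimization, bitpack_d or inchash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option set; the generated code covers every optimization
   option variable instead.  */
struct demo_opts
{
  char optimize;
  char optimize_size;
  int flag_unroll;
};

/* Toy replacement for a bit pack: one 64-bit word per value.  */
struct demo_pack
{
  uint64_t words[8];
  size_t pos;
};

static void
demo_pack_value (struct demo_pack *bp, uint64_t v)
{
  bp->words[bp->pos++] = v;
}

static uint64_t
demo_unpack_value (struct demo_pack *bp)
{
  return bp->words[bp->pos++];
}

/* Stream out: one fixed-width value per field, in declaration order.  */
static void
demo_stream_out (struct demo_pack *bp, const struct demo_opts *p)
{
  demo_pack_value (bp, (uint64_t) p->optimize);
  demo_pack_value (bp, (uint64_t) p->optimize_size);
  demo_pack_value (bp, (uint64_t) p->flag_unroll);
}

/* Stream in: same order, casting each value back to the field type.  */
static void
demo_stream_in (struct demo_pack *bp, struct demo_opts *p)
{
  p->optimize = (char) demo_unpack_value (bp);
  p->optimize_size = (char) demo_unpack_value (bp);
  p->flag_unroll = (int) demo_unpack_value (bp);
}

/* Hash field by field, so the result depends only on the option values.  */
static uint64_t
demo_hash (const struct demo_opts *p)
{
  uint64_t h = 0;
  h = h * 31 + (uint64_t) p->optimize;
  h = h * 31 + (uint64_t) p->optimize_size;
  h = h * 31 + (uint64_t) p->flag_unroll;
  return h;
}
```

Because the writer and the reader walk the fields in the same order, a value streamed out and back in hashes identically, which is the property the LTO tree hashing above appears to rely on.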
Re: patch switching on LRA remat
On 2014-11-15 9:58 AM, H.J. Lu wrote:
On Fri, Nov 14, 2014 at 12:07 PM, Vladimir Makarov vmaka...@redhat.com wrote:

The LRA rematerialization patch I submitted about a day ago broke H.J.'s 32-bit bootstrap, so I switched off the rematerialization right away. The set of options H.J. uses for bootstrapping was very useful: I've fixed several existing and potential bugs. Here is the patch fixing the bugs and switching LRA remat back on.

The patch was bootstrapped on x86-64 and i686 (using H.J.'s options).

Committed as rev. 217588.

2014-11-14 Vladimir Makarov vmaka...@redhat.com

  * lra-int.h (lra_create_live_ranges): Add parameter.
  * lra-lives.c (temp_bitmap): Move higher.
  (initiate_live_solver): Move temp_bitmap initialization into lra_live_ranges_init.
  (finish_live_solver): Move temp_bitmap clearing into live_ranges_finish.
  (process_bb_lives): Add parameter. Use it to control live info update and dead insn elimination. Pass it to mark_regno_live and mark_regno_dead.
  (lra_create_live_ranges): Add parameter. Pass it to process_bb_lives.
  (lra_live_ranges_init, lra_live_ranges_finish): See changes in initiate_live_solver and finish_live_solver.
  * lra-remat.c (do_remat): Process insn non-operand hard regs too. Use temp_bitmap to update avail_cands.
  * lra.c (lra): Pass new parameter to lra_create_live_ranges. Move check with lra_need_for_spill_p after live range pass. Switch on rematerialization pass.

Unfortunately, it failed to bootstrap ia32 GCC:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00392.html

You can bootstrap ia32 GCC on Linux/x86-64:

1. Install ia32 binutils under /foo/bar.
2. Set PATH=/foo/bar:$PATH
3. Install 32-bit libraries used by GCC, glibc, mpfr, gmp, libmpc. ...
4. Configure GCC with

Thanks, H.J. I see it's a different set of options than before. I switched off remat. temporarily (rev. 217609).

Index: ChangeLog
===
--- ChangeLog (revision 217608)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2014-11-15 Vladimir Makarov vmaka...@redhat.com
+
+  * lra.c (lra): Switch off rematerialization pass.
+
 2014-11-15 Marc Glisse marc.gli...@inria.fr

   * config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps,
Index: lra.c
===
--- lra.c (revision 217602)
+++ lra.c (working copy)
@@ -2354,7 +2354,7 @@
         break;
       /* Now we know what pseudos should be spilled. Try to
          rematerialize them first. */
-      if (lra_remat ())
+      if (0 && lra_remat ())
         {
           /* We need full live info -- see the comment above. */
           lra_create_live_ranges (lra_reg_spill_p, true);
[PATCH, 1/8] Expand oacc kernels after pass_build_ealias
On 15-11-14 13:14, Tom de Vries wrote:

Hi,

I'm submitting a patch series with initial support for the oacc kernels directive. The patch series uses pass_parallelize_loops to implement parallelization of loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch moves omp expansion of the oacc kernels directive to after pass_build_ealias.

The rationale is that in order to use pass_parallelize_loops for analysis and transformation of an oacc kernels region, we postpone omp expansion of that region until the earliest point in the pass list where enough information is available to run pass_parallelize_loops, in other words, after pass_build_ealias.

The patch postpones expansion in expand_omp, and ensures expansion by adding pass_expand_omp_ssa:
- after pass_build_ealias, and
- after pass_all_early_optimizations for the case we're not optimizing.

In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of lowered omp code, to handle it conservatively.

The patch contains changes in expand_omp_target to deal with ssa-code, similar to what is already present in expand_omp_taskreg.

Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be static for oacc kernels. It does this to get some references to .omp_data_sizes and .omp_data_kinds in the ssa code. Without these references, the definitions will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE kludge for this purpose ].

Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the original function of which the definition has been removed (as in moved to the split off function). TODO_remove_unused_locals takes care of some of them, but not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these dangling SSA_NAMEs and releases them.

OK for trunk?

Thanks,
- Tom

2014-11-14 Tom de Vries t...@codesourcery.com

  * function.h (struct function): Add contains_oacc_kernels field.
  * gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels.
  * omp-low.c: Include gimple-pretty-print.h.
  (release_first_vuse_in_edge_dest): New function.
  (expand_omp_target): Handle ssa-code.
  (expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa.
  (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in properties_provided field.
  (pass_expand_omp::execute): Set PROP_gimple_eomp in cfun->curr_properties only if cfun does not contain oacc kernels.
  (pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to todo_flags_finish field.
  (pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling execute_expand_omp.
  (lower_omp_target): Add static_arrays variable, init to 1. Don't use static arrays for kernels directive. Use static_arrays variable. Handle case that .omp_data_kinds is not static.
  (gimple_stmt_omp_lowering_p): New function.
  * omp-low.h (gimple_stmt_omp_lowering_p): Declare.
  * passes.def: Add pass_expand_omp_ssa after pass_build_ealias.
  * tree-ssa-ccp.c: Include omp-low.h.
  (surely_varying_stmt_p): Handle omp lowering code conservatively.
  * tree-ssa-forwprop.c: Include omp-low.h.
  (pass_forwprop::execute): Handle omp lowering code conservatively.

---
 gcc/function.h | 3 +
 gcc/gimplify.c | 1 +
 gcc/omp-low.c | 194 +---
 gcc/omp-low.h | 1 +
 gcc/passes.def | 2 +
 gcc/tree-ssa-ccp.c | 4 +
 gcc/tree-ssa-forwprop.c | 4 +-
 7 files changed, 196 insertions(+), 13 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 08ab761..a72c154 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -664,6 +664,9 @@ struct GTY(()) function {
   /* Set when the tail call has been identified. */
   unsigned int tail_call_marked : 1;
+
+  /* Set when the function contains oacc kernels directives. */
+  unsigned int contains_oacc_kernels : 1;
 };

 /* Add the decl D to the local_decls list of FUN. */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2c8c666..52d7e6d 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7281,6 +7281,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_KERNELS:
       stmt = gimple_build_oacc_kernels (body, OACC_KERNELS_CLAUSES (expr));
+      cfun->contains_oacc_kernels = 1;
       break;
     case OACC_PARALLEL:
       stmt =
[PATCH, 2/8] Add pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds a pass group pass_oacc_kernels.

The rationale is that we want a pass group to run oacc kernels region related
(optimization) passes in.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * passes.def: Add pass group pass_oacc_kernels.
    * tree-pass.h (make_pass_oacc_kernels): Declare.
    * tree-ssa-loop.c (gate_oacc_kernels): New static function.
    (pass_data_oacc_kernels): New pass_data.
    (class pass_oacc_kernels): New pass.
    (make_pass_oacc_kernels): New function.
---
 gcc/passes.def      |  5 +
 gcc/tree-pass.h     |  1 +
 gcc/tree-ssa-loop.c | 48
 3 files changed, 54 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index bce8591..1fdb70a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -72,6 +72,11 @@ along with GCC; see the file COPYING3.  If not see
       /* pass_build_ealias is a dummy pass that ensures that we
          execute TODO_rebuild_alias at this point.  */
       NEXT_PASS (pass_build_ealias);
+      /* Pass group that runs when there are oacc kernels in the
+         function.  */
+      NEXT_PASS (pass_oacc_kernels);
+      PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+      POP_INSERT_PASSES ()
       NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_fre);
       NEXT_PASS (pass_merge_phi);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index eaa69b4..0bae847 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -445,6 +445,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 758b5fc..c29aa22 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -157,6 +157,54 @@ make_pass_tree_loop (gcc::context *ctxt)
   return new pass_tree_loop (ctxt);
 }
 
+/* Gate for oacc kernels pass group.  */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+  if (!flag_openacc)
+    return false;
+
+  return fn->contains_oacc_kernels;
+}
+
+/* The oacc kernels superpass.  */
+
+namespace {
+
+const pass_data pass_data_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fn) { return gate_oacc_kernels (fn); }
+
+}; // class pass_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_oacc_kernels (ctxt);
+}
+
 /* The no-loop superpass.  */
 
 namespace {
-- 
1.9.1
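For readers unfamiliar with gated pass groups, here is a rough standalone model of the idea in patch 2/8. It is only a sketch: Fn, PassGroup and run_group are hypothetical names invented for illustration and are not GCC's pass manager API; only the gate logic follows gate_oacc_kernels from the patch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's `function`; it keeps only the flag the gate reads.
struct Fn {
  bool contains_oacc_kernels;
};

// Hypothetical model of a pass group: a gate plus an ordered list of sub-passes.
struct PassGroup {
  std::string name;
  std::function<bool(const Fn &)> gate;
  std::vector<std::string> sub_passes;
};

// Same decision as gate_oacc_kernels in the patch: OpenACC must be enabled
// and the function must contain a kernels region.
inline bool gate_oacc_kernels(bool flag_openacc, const Fn &fn) {
  if (!flag_openacc)
    return false;
  return fn.contains_oacc_kernels;
}

// Returns the names of the passes that would run for FN, in order.
inline std::vector<std::string> run_group(const PassGroup &g, const Fn &fn) {
  std::vector<std::string> ran;
  if (!g.gate(fn))
    return ran;  // gate closed: the group and all of its sub-passes are skipped
  ran.push_back(g.name);
  for (const auto &p : g.sub_passes)
    ran.push_back(p);
  return ran;
}
```

In this model the later patches in the series would only append names to sub_passes; functions without a kernels region never enter the group at all.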
[PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels.

The idea is that pass_parallelize_loops only deals with loops for which the
header has been copied, so the easiest way to meet that requirement when
running pass_parallelize_loops in group pass_oacc_kernels is to run pass_ch as
a part of pass_oacc_kernels.

We define a separate pass pass_ch_oacc_kernels, to leave all loops that aren't
part of a kernels region alone.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * omp-low.c (loop_in_oacc_kernels_region_p): New function.
    * omp-low.h (loop_in_oacc_kernels_region_p): Declare.
    * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels.
    * tree-pass.h (make_pass_ch_oacc_kernels): Declare.
    * tree-ssa-loop-ch.c: Include omp-low.h.
    (pass_ch_execute): Declare.
    (pass_ch::execute): Factor out ...
    (pass_ch_execute): ... this new function.  If handling oacc kernels,
    skip loops that are not in oacc kernels region.
    (pass_ch_oacc_kernels::execute):
    (pass_data_ch_oacc_kernels): New pass_data.
    (class pass_ch_oacc_kernels): New pass.
    (pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New
    function.
--- gcc/omp-low.c | 83 ++ gcc/omp-low.h | 2 ++ gcc/passes.def | 1 + gcc/tree-pass.h| 1 + gcc/tree-ssa-loop-ch.c | 59 +-- 5 files changed, 144 insertions(+), 2 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 6caeae9..e35fa8b 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -13909,4 +13909,87 @@ gimple_stmt_omp_lowering_p (gimple stmt) return false; } +/* Return true if LOOP is inside a kernels region. */ + +bool +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry, + basic_block *region_exit) +{ + bitmap excludes_bitmap = BITMAP_GGC_ALLOC (); + bitmap region_bitmap = BITMAP_GGC_ALLOC (); + bitmap_clear (region_bitmap); + + if (region_entry != NULL) +*region_entry = NULL; + if (region_exit != NULL) +*region_exit = NULL; + + basic_block bb; + gimple last; + FOR_EACH_BB_FN (bb, cfun) +{ + if (bitmap_bit_p (region_bitmap, bb-index)) + continue; + + last = last_stmt (bb); + if (!last) + continue; + + if (gimple_code (last) != GIMPLE_OACC_KERNELS) + continue; + + bitmap_clear (excludes_bitmap); + bitmap_set_bit (excludes_bitmap, bb-index); + + vecbasic_block dominated + = get_all_dominated_blocks (CDI_DOMINATORS, bb); + + unsigned di; + basic_block dom; + + basic_block end_region = NULL; + FOR_EACH_VEC_ELT (dominated, di, dom) + { + if (dom == bb) + continue; + + last = last_stmt (dom); + if (!last) + continue; + + if (gimple_code (last) != GIMPLE_OMP_RETURN) + continue; + + if (end_region == NULL + || dominated_by_p (CDI_DOMINATORS, end_region, dom)) + end_region = dom; + } + + vecbasic_block excludes + = get_all_dominated_blocks (CDI_DOMINATORS, end_region); + + unsigned di2; + basic_block exclude; + + FOR_EACH_VEC_ELT (excludes, di2, exclude) + if (exclude != end_region) + bitmap_set_bit (excludes_bitmap, exclude-index); + + FOR_EACH_VEC_ELT (dominated, di, dom) + if (!bitmap_bit_p (excludes_bitmap, dom-index)) + bitmap_set_bit (region_bitmap, dom-index); + + if (bitmap_bit_p (region_bitmap, loop-header-index)) + { + if 
(region_entry != NULL) + *region_entry = bb; + if (region_exit != NULL) + *region_exit = end_region; + return true; + } +} + + return false; +} + #include gt-omp-low.h diff --git a/gcc/omp-low.h b/gcc/omp-low.h index ff8a956..f1b9d77 100644 --- a/gcc/omp-low.h +++ b/gcc/omp-low.h @@ -29,6 +29,8 @@ extern tree omp_reduction_init (tree, tree); extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *); extern void omp_finish_file (void); extern bool gimple_stmt_omp_lowering_p (gimple); +extern bool loop_in_oacc_kernels_region_p (struct loop *, basic_block *, + basic_block *); extern GTY(()) vectree, va_gc *offload_funcs; extern GTY(()) vectree, va_gc *offload_vars; diff --git a/gcc/passes.def b/gcc/passes.def index 1fdb70a..5eefe73 100644 --- a/gcc/passes.def +++
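The region test in loop_in_oacc_kernels_region_p comes down to two dominance queries: a block belongs to the region if the entry block dominates it and it is not strictly dominated by the region exit. A minimal standalone sketch of that set computation follows. It assumes a hypothetical idom array in place of GCC's dominance API, and it takes the exit block as given, whereas the patch searches for it among the blocks ending in GIMPLE_OMP_RETURN.

```cpp
#include <cassert>
#include <set>
#include <vector>

// idom[b] is the immediate dominator of block b; the root has idom[root] == root.
// Returns true if block a dominates block b (every block dominates itself).
inline bool dominates(const std::vector<int> &idom, int a, int b) {
  // Walk up the dominator tree from b until we meet a or reach the root.
  for (;;) {
    if (b == a) return true;
    if (idom[b] == b) return false;
    b = idom[b];
  }
}

// Blocks of the region that starts after ENTRY and ends with EXIT.
inline std::set<int> kernels_region(const std::vector<int> &idom,
                                    int entry, int exit) {
  std::set<int> region;
  for (int b = 0; b < static_cast<int>(idom.size()); ++b) {
    if (b == entry) continue;                  // the entry block itself is excluded
    if (!dominates(idom, entry, b)) continue;  // not reached through the region entry
    if (b != exit && dominates(idom, exit, b)) continue;  // code after the region
    region.insert(b);
  }
  return region;
}
```

A loop would then count as "in the kernels region" when its header block is a member of the returned set, which is the check the patch performs with bitmap_bit_p on loop->header->index.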
[PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds pass_tree_loop_init and pass_tree_loop_done to
pass_oacc_kernels.

Pass_parallelize_loops is run between these passes in the pass group
pass_tree_loop, since it requires loop information. We do the same for
pass_oacc_kernels.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
    group pass_oacc_kernels.
    * tree-ssa-loop.c (pass_tree_loop_init::clone)
    (pass_tree_loop_done::clone): New function.
---
 gcc/passes.def      | 2 ++
 gcc/tree-ssa-loop.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 5eefe73..83f437b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -77,6 +77,8 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_oacc_kernels);
       PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
           NEXT_PASS (pass_ch_oacc_kernels);
+          NEXT_PASS (pass_tree_loop_init);
+          NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_fre);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index c29aa22..c78b013 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -269,6 +269,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -563,6 +564,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 
-- 
1.9.1
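The clone () methods are needed because pass_tree_loop_init and pass_tree_loop_done now appear twice in passes.def, and a second occurrence needs its own pass object. The sketch below models that rule in isolation. Pass, Pipeline and schedule are hypothetical names for illustration and do not reproduce GCC's opt_pass or pass manager; the instance counter is likewise an assumption about how occurrences are told apart.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-reduced model of a pass; not GCC's real opt_pass.
struct Pass {
  std::string name;
  int instance = 0;  // which occurrence in the pipeline this object is
  explicit Pass(std::string n) : name(std::move(n)) {}
  virtual ~Pass() = default;
  // Default: the pass cannot be duplicated, so it may be scheduled only once.
  virtual std::unique_ptr<Pass> clone() const { return nullptr; }
};

struct LoopInit : Pass {
  LoopInit() : Pass("loopinit") {}
  std::unique_ptr<Pass> clone() const override {
    return std::make_unique<LoopInit>();
  }
};

struct Pipeline {
  std::vector<Pass *> order;                  // execution order (non-owning)
  std::vector<std::unique_ptr<Pass>> clones;  // owns the extra instances
};

// Schedules one more occurrence of PROTO. The first occurrence uses PROTO
// itself; later ones need a clone. Returns false if no clone is available.
inline bool schedule(Pipeline &pl, Pass &proto) {
  int seen = 0;
  for (const Pass *p : pl.order)
    if (p->name == proto.name) ++seen;
  if (seen == 0) {
    proto.instance = 1;
    pl.order.push_back(&proto);
    return true;
  }
  std::unique_ptr<Pass> copy = proto.clone();
  if (!copy) return false;
  copy->instance = seen + 1;
  pl.order.push_back(copy.get());
  pl.clones.push_back(std::move(copy));
  return true;
}
```

If dump file suffixes follow the same per-name occurrence count, which is my reading of the lim1 to lim2 and ccp2 to ccp3 testsuite updates in patches 5/8 and 6/8, then adding an earlier occurrence of a pass shifts the number of every later one.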
[PATCH, 6/8] Add pass_ccp to pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds pass_ccp to pass group pass_oacc_kernels.

We need this pass to simplify the loop body, and allow pass_parloops to detect
that loop iterations are independent.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * passes.def: Add pass_ccp in pass group pass_oacc_kernels.

    * gcc.dg/pr43513.c: Update for new pass_ccp.
    * gcc.dg/tree-ssa/alias-17.c: Same.
    * gcc.dg/tree-ssa/foldconst-4.c: Same.
    * gcc.dg/tree-ssa/ssa-ccp-29.c: Same.
    * gcc.dg/tree-ssa/ssa-ccp-3.c: Same.
---
 gcc/passes.def                              | 1 +
 gcc/testsuite/gcc.dg/pr43513.c              | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/alias-17.c    | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c  | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c   | 6 +++---
 6 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index f6c16b9..cd9443c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.
If not see NEXT_PASS (pass_ch_oacc_kernels); NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); + NEXT_PASS (pass_ccp); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () NEXT_PASS (pass_expand_omp_ssa); diff --git a/gcc/testsuite/gcc.dg/pr43513.c b/gcc/testsuite/gcc.dg/pr43513.c index 78a037b..3fb0890 100644 --- a/gcc/testsuite/gcc.dg/pr43513.c +++ b/gcc/testsuite/gcc.dg/pr43513.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-ccp2 } */ +/* { dg-options -O2 -fdump-tree-ccp3 } */ void bar (int *); void foo (char *, int); @@ -15,5 +15,5 @@ foo3 () foo (%d , results[i]); } -/* { dg-final { scan-tree-dump-times alloca 0 ccp2} } */ -/* { dg-final { cleanup-tree-dump ccp2 } } */ +/* { dg-final { scan-tree-dump-times alloca 0 ccp3} } */ +/* { dg-final { cleanup-tree-dump ccp3 } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c index 48e72ff..59862f6 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-17.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O -fno-early-inlining -fdump-tree-ccp2 } */ +/* { dg-options -O -fno-early-inlining -fdump-tree-ccp3 } */ int *p; int inline bar(void) { return 0; } @@ -14,5 +14,5 @@ int foo(int x) return *q + *p; } -/* { dg-final { scan-tree-dump-not NOTE: no flow-sensitive alias info for ccp2 } } */ -/* { dg-final { cleanup-tree-dump ccp2 } } */ +/* { dg-final { scan-tree-dump-not NOTE: no flow-sensitive alias info for ccp3 } } */ +/* { dg-final { cleanup-tree-dump ccp3 } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c index 445d415..916a857 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/foldconst-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O -fdump-tree-ccp2 } */ +/* { dg-options -O -fdump-tree-ccp3 } */ struct a {int a,b;}; const static struct a a; @@ -10,5 +10,5 @@ test() { 
return a.a+b[c]; } -/* { dg-final { scan-tree-dump return 0; ccp2 } } */ -/* { dg-final { cleanup-tree-dump ccp2 } } */ +/* { dg-final { scan-tree-dump return 0; ccp3 } } */ +/* { dg-final { cleanup-tree-dump ccp3 } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c index 44d2945..1e3f41b 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-29.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O -fdump-tree-ccp2 } */ +/* { dg-options -O -fdump-tree-ccp3 } */ static double num; int foo (void) @@ -7,5 +7,5 @@ int foo (void) return *(unsigned *)num; } -/* { dg-final { scan-tree-dump return 0; ccp2 } } */ -/* { dg-final { cleanup-tree-dump ccp2 } } */ +/* { dg-final { scan-tree-dump return 0; ccp3 } } */ +/* { dg-final { cleanup-tree-dump ccp3 } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c index 86a706b..03717e1 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-3.c @@ -1,5 +1,5 @@ /* { dg-do
[PATCH, 5/8] Add pass_loop_im to pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds pass_loop_im to pass group pass_oacc_kernels.

We need this pass to simplify the loop body, and allow pass_parloops to detect
that loop iterations are independent.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * passes.def: Add pass_lim in pass group pass_oacc_kernels.

    * c-c++-common/restrict-2.c: Update for new pass_lim.
    * c-c++-common/restrict-4.c: Same.
    * g++.dg/tree-ssa/pr33615.C: Same.
    * g++.dg/tree-ssa/restrict1.C: Same.
    * gcc.dg/tm/pub-safety-1.c: Same.
    * gcc.dg/tm/reg-promotion.c: Same.
    * gcc.dg/tree-ssa/20050314-1.c: Same.
    * gcc.dg/tree-ssa/loop-32.c: Same.
    * gcc.dg/tree-ssa/loop-33.c: Same.
    * gcc.dg/tree-ssa/loop-34.c: Same.
    * gcc.dg/tree-ssa/loop-35.c: Same.
    * gcc.dg/tree-ssa/loop-7.c: Same.
    * gcc.dg/tree-ssa/pr23109.c: Same.
    * gcc.dg/tree-ssa/restrict-3.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-1.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-10.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-11.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-12.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-2.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-3.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-6.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-7.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-8.c: Same.
    * gcc.dg/tree-ssa/ssa-lim-9.c: Same.
    * gcc.dg/tree-ssa/structopt-1.c: Same.
    * gfortran.dg/pr32921.f: Same.
--- gcc/passes.def | 1 + gcc/testsuite/c-c++-common/restrict-2.c | 6 +++--- gcc/testsuite/c-c++-common/restrict-4.c | 6 +++--- gcc/testsuite/g++.dg/tree-ssa/pr33615.C | 6 +++--- gcc/testsuite/g++.dg/tree-ssa/restrict1.C | 6 +++--- gcc/testsuite/gcc.dg/tm/pub-safety-1.c | 6 +++--- gcc/testsuite/gcc.dg/tm/reg-promotion.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-32.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-33.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-34.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-35.c | 8 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/pr23109.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c | 8 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c | 6 +++--- gcc/testsuite/gfortran.dg/pr32921.f | 6 +++--- 27 files changed, 81 insertions(+), 80 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index 83f437b..f6c16b9 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3. 
If not see PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) NEXT_PASS (pass_ch_oacc_kernels); NEXT_PASS (pass_tree_loop_init); + NEXT_PASS (pass_lim); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () NEXT_PASS (pass_expand_omp_ssa); diff --git a/gcc/testsuite/c-c++-common/restrict-2.c b/gcc/testsuite/c-c++-common/restrict-2.c index 3f71b77..f0b0e15a 100644 --- a/gcc/testsuite/c-c++-common/restrict-2.c +++ b/gcc/testsuite/c-c++-common/restrict-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O -fno-strict-aliasing -fdump-tree-lim1-details } */ +/* { dg-options -O -fno-strict-aliasing -fdump-tree-lim2-details } */ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j) { @@ -10,5 +10,5 @@ void foo (float * __restrict__ a, float * __restrict__ b, int n, int j) /* We should move the RHS of the store out of the loop. */ -/* { dg-final { scan-tree-dump-times Moving statement 11 lim1 } } */ -/* { dg-final { cleanup-tree-dump lim1 } } */ +/* { dg-final { scan-tree-dump-times Moving statement 11 lim2 } } */ +/* { dg-final { cleanup-tree-dump lim2 } } */ diff --git a/gcc/testsuite/c-c++-common/restrict-4.c b/gcc/testsuite/c-c++-common/restrict-4.c
[PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch adds:
- a specialized version of pass_parallelize_loops called
  pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
- relevant test-cases.

The pass only handles loops that are in a kernels region, and skips over bits
of pass_parallelize_loops that are already done for oacc kernels.

The pass reintroduces the use of omp_expand_local; I haven't managed to make it
work yet using the external pass pass_expand_omp_ssa.

An obvious limitation of the patch is the fact that we copy over the clauses
from the kernels directive to the generated parallel directive. We'll need to
do something more intelligent here, for instance setting vector_length based on
the parallelization factor.

Another limitation is that the pass still needs -ftree-parallelize-loops to
trigger.

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  t...@codesourcery.com

    * passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
    pass_oacc_kernels.  Move pass_expand_omp_ssa into pass group
    pass_oacc_kernels.
    * tree-parloops.c (create_parallel_loop): Add function parameters
    region_entry and bool oacc_kernels_p.  Handle oacc_kernels_p.
    (gen_parallel_loop): Same.  Use omp_expand_local if oacc_kernels_p.
    Call create_parallel_loop with additional args.
    (parallelize_loops): Add function parameter oacc_kernels_p.
    Calculate dominance info.
Skip loops that are not in a kernels region. Call gen_parallel_loop with additional args. (pass_parallelize_loops::execute): Call parallelize_loops with false argument. (pass_data_parallelize_loops_oacc_kernels): New pass_data. (class pass_parallelize_loops_oacc_kernels): New pass. (pass_parallelize_loops_oacc_kernels::execute) (make_pass_parallelize_loops_oacc_kernels): New function. * tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare. * testsuite/libgomp.oacc-c/oacc-kernels-2-run.c: New test. * testsuite/libgomp.oacc-c/oacc-kernels-run.c: New test. * gcc.dg/oacc-kernels-2.c: New test. * gcc.dg/oacc-kernels.c: New test. --- gcc/passes.def | 3 +- gcc/testsuite/gcc.dg/oacc-kernels-2.c | 79 +++ gcc/testsuite/gcc.dg/oacc-kernels.c| 71 ++ gcc/tree-parloops.c| 242 - gcc/tree-pass.h| 2 + .../testsuite/libgomp.oacc-c/oacc-kernels-2-run.c | 65 ++ .../testsuite/libgomp.oacc-c/oacc-kernels-run.c| 59 + 7 files changed, 465 insertions(+), 56 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels-2.c create mode 100644 gcc/testsuite/gcc.dg/oacc-kernels.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-2-run.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/oacc-kernels-run.c diff --git a/gcc/passes.def b/gcc/passes.def index cd9443c..cc09ba9 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -80,9 +80,10 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_ccp); + NEXT_PASS (pass_parallelize_loops_oacc_kernels); + NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () - NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_fre); NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); diff --git a/gcc/testsuite/gcc.dg/oacc-kernels-2.c b/gcc/testsuite/gcc.dg/oacc-kernels-2.c new file mode 100644 index 000..1ff4bad --- /dev/null +++ b/gcc/testsuite/gcc.dg/oacc-kernels-2.c @@ -0,0 +1,79 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target fopenacc } */ +/* { dg-options -fopenacc -ftree-parallelize-loops=32 -O2 -std=c99 -fdump-tree-parloops_oacc_kernels-all -fdump-tree-copyrename } */ + +#include stdlib.h +#include stdio.h + +#define N (1024 * 512) +#define N_REF 4293394432 + +#if 1 +#define COUNTERTYPE unsigned int +#else +#define COUNTERTYPE int +#endif + +int +main (void) +{ + unsigned int i; + + unsigned int *__restrict a; + unsigned int *__restrict b; + unsigned int *__restrict c; + + a = malloc (N * sizeof (unsigned int)); + b = malloc (N * sizeof (unsigned int)); + c = malloc (N * sizeof (unsigned int)); + + +#pragma acc kernels copyout (a[0:N]) + { +for
[PATCH, 8/8] Do simple omp lowering for no address taken var
On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
1 Expand oacc kernels after pass_build_ealias
2 Add pass_oacc_kernels
3 Add pass_ch_oacc_kernels to pass_oacc_kernels
4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
5 Add pass_loop_im to pass_oacc_kernels
6 Add pass_ccp to pass_oacc_kernels
7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
8 Do simple omp lowering for no address taken var
...

This patch lowers integer variables that do not have their address taken as
local variable. We use a copy at region entry and exit to copy the value in
and out.

In the context of reduction handling in a kernels region, this allows the
parloops reduction analysis to recognize the reduction, even after oacc
lowering has been done in pass_lower_omp.

In more detail, without this patch, the omp_data_i load and stores are
generated in place (in this case, in the loop):
...
{
  .omp_data_iD.2201 = .omp_data_arr.15D.2220;
  {
    unsigned intD.9 iD.2146;

    iD.2146 = 0;
    goto <D.2207>;
    <D.2208>:
    D.2216 = .omp_data_iD.2201->cD.2203;
    c.9D.2176 = *D.2216;
    D.2177 = (long unsigned intD.10) iD.2146;
    D.2178 = D.2177 * 4;
    D.2179 = c.9D.2176 + D.2178;
    D.2180 = *D.2179;
    D.2217 = .omp_data_iD.2201->sumD.2205;
    D.2218 = *D.2217;
    D.2217 = .omp_data_iD.2201->sumD.2205;
    D.2219 = D.2180 + D.2218;
    *D.2217 = D.2219;
    iD.2146 = iD.2146 + 1;
    <D.2207>:
    if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
    <D.2209>:
  }
...

With this patch, the omp_data_i load and stores for sum are generated at entry
and exit:
...
{
  .omp_data_iD.2201 = .omp_data_arr.15D.2218;
  D.2216 = .omp_data_iD.2201->sumD.2205;
  sumD.2206 = *D.2216;
  {
    unsigned intD.9 iD.2146;

    iD.2146 = 0;
    goto <D.2207>;
    <D.2208>:
    D.2217 = .omp_data_iD.2201->cD.2203;
    c.9D.2176 = *D.2217;
    D.2177 = (long unsigned intD.10) iD.2146;
    D.2178 = D.2177 * 4;
    D.2179 = c.9D.2176 + D.2178;
    D.2180 = *D.2179;
    sumD.2206 = D.2180 + sumD.2206;
    iD.2146 = iD.2146 + 1;
    <D.2207>:
    if (iD.2146 <= 524287) goto <D.2208>; else goto <D.2209>;
    <D.2209>:
  }
  *D.2216 = sumD.2206;
  #pragma omp return
}
...

So, without the patch the reduction operation looks like this:
...
*(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
...

And with this patch the reduction operation is simply:
...
sumD.2206 = sumD.2206 + x
...

OK for trunk?

Thanks,
- Tom

2014-11-03  Tom de Vries  t...@codesourcery.com

    * gimple.c (gimple_seq_ior_addresses_taken_op)
    (gimple_seq_ior_addresses_taken): New function.
    * gimple.h (gimple_seq_ior_addresses_taken): Declare.
    * omp-low.c (addresses_taken): Declare local variable.
    (lower_oacc_offload): Lower variables that do not have their address
    taken as local variable.  Use a copy at region entry and exit to copy
    the value in and out.
    (execute_lower_omp): Calculate addresses_taken.
---
 gcc/gimple.c  | 35 +++
 gcc/gimple.h  |  1 +
 gcc/omp-low.c | 25 ++---
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index a9174e6..107eb26 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2428,6 +2428,41 @@ gimple_ior_addresses_taken (bitmap addresses_taken, gimple stmt)
                                         gimple_ior_addresses_taken_1);
 }
 
+/* Helper function for gimple_seq_ior_addresses_taken.  */
+
+static tree
+gimple_seq_ior_addresses_taken_op (tree *tp,
+                                   int *walk_subtrees ATTRIBUTE_UNUSED,
+                                   void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *)data;
+  bitmap addresses_taken = (bitmap)wi->info;
+
+  tree t = *tp;
+  if (TREE_CODE (t) != ADDR_EXPR)
+    return NULL_TREE;
+
+  tree var = TREE_OPERAND (t, 0);
+  if (!DECL_P (var))
+    return NULL_TREE;
+
+  bitmap_set_bit (addresses_taken, DECL_UID (var));
+
+  return NULL_TREE;
+}
+
+/* Find the decls in SEQ that have their address taken, and set the
+   corresponding decl_uid
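The effect described in patch 8/8 can be pictured at the source level. This is only an illustration under assumptions: OmpData is a hypothetical stand-in for the .omp_data_i record, and the two functions are hand-written analogues of the two GIMPLE dumps in the message, not output of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the .omp_data_i record: it holds pointers to the
// variables shared with the enclosing function.
struct OmpData {
  const unsigned *c;
  unsigned *sum;
};

// Without the patch: every iteration loads and stores sum through the pointer,
// so the reduction is a memory update.
inline void reduce_in_place(const OmpData &d, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    *d.sum = d.c[i] + *d.sum;
}

// With the patch: copy in at region entry, reduce on a local, copy out at exit.
// The loop body is now the plain scalar form sum = sum + x.
inline void reduce_local(const OmpData &d, std::size_t n) {
  unsigned sum = *d.sum;  // copy in
  for (std::size_t i = 0; i < n; ++i)
    sum = d.c[i] + sum;
  *d.sum = sum;           // copy out
}
```

Both functions compute the same result; the second form keeps the accumulator in a local scalar, which is the shape the message says the parloops reduction analysis can recognize.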
Re: [PATCH] Fix Cilk+ ICEs with overflow builtins (PR middle-end/63884)
On Sat, Nov 15, 2014 at 01:03:46PM +0100, Marek Polacek wrote: The problem here is that the Cilk+ code wasn't prepared to handle internal calls that the new overflow builtins entail. Fixed by checking that the CALL_EXPR_FN isn't NULL. Looking at cilk-plus.exp, I think this file will need some tweaks now that the C default is gnu11... Bootstrapped/regtested on powerpc64-linux, ok for trunk? 2014-11-15 Marek Polacek pola...@redhat.com PR middle-end/63884 c-family/ * array-notation-common.c (is_sec_implicit_index_fn): Return false for NULL fndecl. (extract_array_notation_exprs): Return for NULL node. testsuite/ * c-c++-common/cilk-plus/AN/pr63884.c: New test. Ok, thanks. Jakub
PATCH: PR bootstrap/63888: [5 Regression] bootstrap failed when configured with -with-build-config=bootstrap-asan --disable-werror
Hi,

GCC uses xstrndup/xstrdup throughout the source tree, and that memory may not
be freed explicitly before exit. LeakSanitizer isn't very useful here. This
patch suppresses LeakSanitizer in bootstrap. OK for trunk?

This patch isn't sufficient. I got

configure:3612: /export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/xgcc -B/export/build/gnu/gcc-asan/build-x86_64-linux/./gcc/ -B/usr/gcc-5.0.0/x86_64-unknown-linux-gnu/bin/ -B/usr/gcc-5.0.0/x86_64-unknown-linux-gnu/lib/ -isystem /usr/gcc-5.0.0/x86_64-unknown-linux-gnu/include -isystem /usr/gcc-5.0.0/x86_64-unknown-linux-gnu/sys-include -c -g -O2 conftest.c >&5
=================================================================
==14370==ERROR: AddressSanitizer: odr-violation (0x02b38aa0):
  [1] size=12 'CSWTCH.2819' /export/gnu/import/git/sources/gcc/gcc/tree-vrp.c:4056:7
  [2] size=12 'CSWTCH.2820' /export/gnu/import/git/sources/gcc/gcc/tree-vrp.c:4109:8
These globals were registered at these points:
  [1]:
    #0 0x68e9c6 in __asan_register_globals /export/gnu/import/git/sources/gcc/libsanitizer/asan/asan_globals.cc:217
    #1 0x28dc89c in __libc_csu_init (/export/build/gnu/gcc-asan/build-x86_64-linux/gcc/cc1+0x28dc89c)
    #2 0x309e821c34 in __libc_start_main (/lib64/libc.so.6+0x309e821c34)
    #3 0x683d3e (/export/build/gnu/gcc-asan/build-x86_64-linux/gcc/cc1+0x683d3e)

  [2]:
    #0 0x68e9c6 in __asan_register_globals /export/gnu/import/git/sources/gcc/libsanitizer/asan/asan_globals.cc:217
    #1 0x28dc89c in __libc_csu_init (/export/build/gnu/gcc-asan/build-x86_64-linux/gcc/cc1+0x28dc89c)
    #2 0x309e821c34 in __libc_start_main (/lib64/libc.so.6+0x309e821c34)
    #3 0x683d3e (/export/build/gnu/gcc-asan/build-x86_64-linux/gcc/cc1+0x683d3e)

==14370==HINT: if you don't care about these warnings you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'CSWTCH.2819' at /export/gnu/import/git/sources/gcc/gcc/tree-vrp.c:4056:7
==14370==ABORTING

H.J.
---
2014-11-15  H.J. Lu  hongjiu...@intel.com

    PR bootstrap/63888
    * bootstrap-asan.mk (ASAN_OPTIONS): Export detect_leaks=0.

diff --git a/config/bootstrap-asan.mk b/config/bootstrap-asan.mk
index fbef021..52ef30e 100644
--- a/config/bootstrap-asan.mk
+++ b/config/bootstrap-asan.mk
@@ -1,5 +1,8 @@
 # This option enables -fsanitize=address for stage2 and stage3.
 
+# Suppress LeakSanitizer in bootstrap.
+export ASAN_OPTIONS=detect_leaks=0
+
 STAGE2_CFLAGS += -fsanitize=address
 STAGE3_CFLAGS += -fsanitize=address
 POSTSTAGE1_LDFLAGS += -fsanitize=address -static-libasan \
Re: patch switching on LRA remat
On Sat, Nov 15, 2014 at 9:07 AM, Vladimir Makarov vmaka...@redhat.com wrote:
On 2014-11-15 9:58 AM, H.J. Lu wrote:
On Fri, Nov 14, 2014 at 12:07 PM, Vladimir Makarov vmaka...@redhat.com wrote:

The LRA rematerialization patch I submitted about a day ago broke H.J.'s
32-bit bootstrap. So I switched off the rematerialization right away.

The set of options H.J. used for bootstrapping was very useful. I've fixed
several existing and potential bugs.

Here is the patch fixing the bugs and switching LRA remat back on.

The patch was bootstrapped on x86-64 and i686 (using H.J.'s options).

Committed as rev. 217588.

2014-11-14  Vladimir Makarov  vmaka...@redhat.com

    * lra-int.h (lra_create_live_ranges): Add parameter.
    * lra-lives.c (temp_bitmap): Move higher.
    (initiate_live_solver): Move temp_bitmap initialization into
    lra_live_ranges_init.
    (finish_live_solver): Move temp_bitmap clearing into
    live_ranges_finish.
    (process_bb_lives): Add parameter.  Use it to control live info
    update and dead insn elimination.  Pass it to mark_regno_live and
    mark_regno_dead.
    (lra_create_live_ranges): Add parameter.  Pass it to
    process_bb_lives.
    (lra_live_ranges_init, lra_live_ranges_finish): See changes in
    initiate_live_solver and finish_live_solver.
    * lra-remat.c (do_remat): Process insn non-operand hard regs too.
    Use temp_bitmap to update avail_cands.
    * lra.c (lra): Pass new parameter to lra_create_live_ranges.  Move
    check with lra_need_for_spill_p after live range pass.  Switch on
    rematerialization pass.

Unfortunately, it failed to bootstrap ia32 GCC:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00392.html

You can bootstrap ia32 GCC on Linux/x86-64:

1. Install ia32 binutils under /foo/bar.
2. Set PATH=/foo/bar:$PATH
3. Install 32-bit libraries used by GCC, glibc, mpfr, gmp, libmpc. ...
4. Configure GCC with

Thanks, H.J.

I see it's a different set of options than before. I switched off remat.
temporarily (rev. 217609).

It also miscompiled SPEC CPU 2000 on both ia32 and x86-64:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63895

-- 
H.J.
[committed,testsuite] Not run gcc.target/i386/sibcall-1.c on PIC targets
Don’t run gcc.target/i386/sibcall-1.c on PIC targets.

2014-11-15  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

    PR target/60104
    * gcc.target/i386/sibcall-1.c: Don't run on pic targets.

Index: gcc.target/i386/sibcall-1.c
===================================================================
--- gcc.target/i386/sibcall-1.c (revision 217599)
+++ gcc.target/i386/sibcall-1.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target ia32 } } */
+/* { dg-do compile { target { ia32 && nonpic } } } */
 /* { dg-options "-O2" } */
 
 extern int (*foo)(int);
Re: [PATCH][AArch64] LR register not used in leaf functions
2014-11-15 15:49 GMT+00:00 Andrew Pinski pins...@gmail.com:

My local modifications can be found in the gcc git at apinski/thunderx-cost.
Note I reverted this patch so I can continue working. The testcase is
compiling newlib. Let me try to get it again. I was configuring a combined
build with:

--disable-fixed-point --without-ppl --without-python --disable-werror --enable-plugins --enable-checking --disable-sim --with-newlib --disable-tls --with-cpu=thunderx --with-multilib-list=lp64,ilp32 --target=aarch64-thunderx-elf --enable-languages=c,c++

Attached is the preprocessed source. cc1 strtol.i -mabi=ilp32 -O2 is enough to
reproduce the ICE.

Thanks. I can reproduce this ICE under -mabi=ilp32 on your code base.

It's strange that LR is marked as live while the final assembly doesn't use
it. The reason is that the following patterns

  define_insn "*tb<optab><mode>1"
  define_insn "*cb<optab><mode>1"

always declare a clobber of some register although they don't always use it in
code generation, so sub-optimal code is generated and some registers are
wasted.

You can see that before my patch x19 is allocated for that clobber, thus there
is an unnecessary save of x19 to the stack. After my patch, x30 is allocated
for that clobber, so aarch64_can_eliminate, when invoked after this, will think
this function requires a frame pointer according to our ABI, so there are
unnecessary frame setup instructions.

Basically, the inconsistency is not between aarch64_require_frame_p and
aarch64_can_eliminate. It is between aarch64_can_eliminate invoked before
assign_by_spills and after it.

My first impression is that the gcc_assert in lra-eliminate.c is too strong
and needs to be relaxed in some situations.

Thanks, Andrew
Re: [committed,testsuite] Not run gcc.target/i386/sibcall-1.c on PIC targets
On Sat, Nov 15, 2014 at 11:46 AM, FX fxcoud...@gmail.com wrote: Don’t run gcc.target/i386/sibcall-1.c on PIC targets. 2014-11-15 Francois-Xavier Coudert fxcoud...@gcc.gnu.org PR target/60104 * gcc.target/i386/sibcall-1.c: Don't run on pic targets. Index: gcc.target/i386/sibcall-1.c === --- gcc.target/i386/sibcall-1.c (revision 217599) +++ gcc.target/i386/sibcall-1.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-do compile { target ia32 } } */ +/* { dg-do compile { target { ia32 nonpic } } } */ /* { dg-options -O2 } */ extern int (*foo)(int); This looks wrong. This test should pass for 64-bit or ia32 nonpic. -- H.J.
Re: [committed,testsuite] Not run gcc.target/i386/sibcall-1.c on PIC targets
This looks wrong. This test should pass for 64-bit or ia32 nonpic. It was Kai’s original testcase, so I don’t want to modify it too much, other than make it skip where it clearly fails. FX
Re: [committed,testsuite] Not run gcc.target/i386/sibcall-1.c on PIC targets
On Sat, Nov 15, 2014 at 12:22 PM, FX fxcoud...@gmail.com wrote: This looks wrong. This test should pass for 64-bit or ia32 nonpic. It was Kai’s original testcase, so I don’t want to modify it too much, other than make it skip where it clearly fails.

The original bug report was filed against x86-64: The attached testcase is a greatly reduced interpreter loop, containing a simple load and indirect branch: goto *addresses[*pc++] gcc 4.8.2 (as well as older versions) with -O2 produces the following x86-64 output: movq addresses.1721(,%rax,8), %rax jmp *%rax Since the loaded value is not used after the branch, there's no need to hold it in a register, so the load could be folded into the branch. This would improve code size and instruction count.

Adding a testcase only for ia32 makes no sense at all. -- H.J.
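For readers without the PR at hand, the interpreter loop quoted from the bug report has roughly the following shape. This is only a minimal sketch, not the reduced testcase attached to the PR: the opcode names and the function run are made up for illustration, and the code relies on the GNU labels-as-values extension (&&label and goto *), so it needs GCC or a compatible compiler.

```cpp
// Sketch of a threaded interpreter loop using the GNU "labels as values"
// extension. All names here are illustrative, not taken from the PR.
enum { OP_INC, OP_DEC, OP_HALT };

int run(const unsigned char *pc)
{
  // One handler address per opcode, indexed by the opcode value.
  static void *const addresses[] = { &&do_inc, &&do_dec, &&do_halt };
  int acc = 0;

  // Load the handler address and branch to it. This load followed by an
  // indirect jump is what the bug report would like folded together.
  goto *addresses[*pc++];

do_inc:
  ++acc;
  goto *addresses[*pc++];
do_dec:
  --acc;
  goto *addresses[*pc++];
do_halt:
  return acc;
}
```

Each handler ends with the same dispatch, so the loaded address is dead right after the jump, which is the situation the report describes.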
Re: Rerog streaming of OPTIMIZATION_NODE
On Sat, 2014-11-15 17:57:20 +0100, Jan Hubicka hubi...@ucw.cz wrote: Hi, this patch implements OPTIMIZATION_NODE streaming same was as previous patch did for TARGET_OPTION_NODE. Since the code turned out to be completely analogous to the previous one I will go ahead and commit it as obvious. It will help to make followup changes easier to follow. I also tested this with forcing default optimization node on every function with LTO. It seems to just work, modulo inliner ignoring most of the flags and happily dragging code from one set of optimization options to another. Bootstrapped/regtested ppc64-linux and x86_64-linux, tested with Firefox, Comitted. Honza * lto-streamer-out.c (hash_tree): Use cl_optimization_hash. * lto-streamer.h (cl_optimization_stream_out, cl_optimization_stream_in): Declare. * optc-save-gen.awk: Generate cl_optimization LTO streaming and hashing routines. * opth-gen.awk: Add prototype of cl_optimization_hash. * tree-streamer-in.c (unpack_ts_optimization): Remove. (streamer_unpack_tree_bitfields): Use cl_optimization_stream_in. * tree-streamer-out.c (pack_ts_optimization): Remove. (streamer_pack_tree_bitfields): Use cl_optimization_stream_out. The recent work, I'm not exactly sure if it's actually /this/ commit, broke nios2-rtems, see eg. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=376303 g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
-I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o optabs.o -MT optabs.o -MMD -MP -MF ./.deps/optabs.TPo /home/jbglaw/repos/gcc/gcc/optabs.c gawk -f /home/jbglaw/repos/gcc/gcc/opt-functions.awk -f /home/jbglaw/repos/gcc/gcc/opt-read.awk \ -f /home/jbglaw/repos/gcc/gcc/optc-save-gen.awk \ -v header_name=config.h system.h coretypes.h tm.h optionlist options-save.c g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o options-save.o -MT options-save.o -MMD -MP -MF ./.deps/options-save.TPo options-save.c options-save.c: In function ‘void cl_target_option_stream_in(data_in*, bitpack_d*, cl_target_option*)’: options-save.c:1901:41: error: expected primary-expression before ‘enum’ ptr-saved_custom_code_status[256] = (enum nios2_ccs_code saved_custom_code_status[256]) bp_unpack_value (bp, 64); ^ options-save.c:1901:41: error: expected ‘)’ before ‘enum’ options-save.c:1902:40: error: expected primary-expression before ‘int’ ptr-saved_custom_code_index[256] = (int saved_custom_code_index[256]) bp_unpack_value (bp, 64); ^ options-save.c:1902:40: error: expected ‘)’ before ‘int’ options-save.c:1903:49: error: expected primary-expression before ‘int’ ptr-saved_fpu_custom_code[n2fpu_code_num] = (int 
saved_fpu_custom_code[n2fpu_code_num]) bp_unpack_value (bp, 64); ^ options-save.c:1903:49: error: expected ‘)’ before ‘int’ make[1]: *** [options-save.o] Error 1 (I just bisected it just now, it's this commit: 2014-11-14 Jan Hubicka hubi...@ucw.cz * optc-save-gen.awk: Output cl_target_option_eq, cl_target_option_hash, cl_target_option_stream_out, cl_target_option_stream_in functions. * opth-gen.awk: Output prototypes for cl_target_option_eq and cl_target_option_hash. * lto-streamer.h (cl_target_option_stream_out, cl_target_option_stream_in): Declare. * tree.c (cl_option_hash_hash): Use cl_target_option_hash. (cl_option_hash_eq): Use cl_target_option_eq. * tree-streamer-in.c (unpack_value_fields): Stream in TREE_TARGET_OPTION. * lto-streamer-out.c (DFS::DFS_write_tree_body): Follow DECL_FUNCTION_SPECIFIC_TARGET. (hash_tree): Hash TREE_TARGET_OPTION; visit
RE: Follow-up to PR51471
Eric Botcazou ebotca...@adacore.com writes: IIRC, fill_eager and its related friends are all speculative in some way and aren't those precisely the ones that are causing us problems. Also note we have backends working around this stuff in fairly blunt ways: I'd say that the PA back-end went a bit too far here, especially if it marks some insns of the epilogue as frame-related. dwarf2cfi.c has special code to handle delay slots (SEQUENCEs) so it's not an all-or- nothing game. Given architectural difficulties of delay slots on modern processors, would it be that painful to just not allow filling slots with frame insns and let dbr try to find something else or drop in a nop? I wouldn't be all that surprised if there wasn't a measurable performance difference on something like a modern Sparc. Yes, modern SPARCs have (short) branches without delay slots. But the other big contender is MIPS here and the story might be different for it. MIPSr6 introduces 'compact' branches which do not have delay slots. So the issues of filling delay slots will be less important from R6 onwards. However, delay slots remain important for now. I haven't thought about the problem much but instinctively I'd be surprised if a blanket restriction on frame-related instructions would lead to lots of NOPs in delay slots. Matthew
Re: Rerog streaming of OPTIMIZATION_NODE
The recent work, I'm not exactly sure if it's actually /this/ commit, broke nios2-rtems, see eg. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=376303 g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o optabs.o -MT optabs.o -MMD -MP -MF ./.deps/optabs.TPo /home/jbglaw/repos/gcc/gcc/optabs.c gawk -f /home/jbglaw/repos/gcc/gcc/opt-functions.awk -f /home/jbglaw/repos/gcc/gcc/opt-read.awk \ -f /home/jbglaw/repos/gcc/gcc/optc-save-gen.awk \ -v header_name=config.h system.h coretypes.h tm.h optionlist options-save.c g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
-I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o options-save.o -MT options-save.o -MMD -MP -MF ./.deps/options-save.TPo options-save.c options-save.c: In function ‘void cl_target_option_stream_in(data_in*, bitpack_d*, cl_target_option*)’: options-save.c:1901:41: error: expected primary-expression before ‘enum’ ptr-saved_custom_code_status[256] = (enum nios2_ccs_code saved_custom_code_status[256]) bp_unpack_value (bp, 64); ^ options-save.c:1901:41: error: expected ‘)’ before ‘enum’ options-save.c:1902:40: error: expected primary-expression before ‘int’ ptr-saved_custom_code_index[256] = (int saved_custom_code_index[256]) bp_unpack_value (bp, 64); ^ options-save.c:1902:40: error: expected ‘)’ before ‘int’ options-save.c:1903:49: error: expected primary-expression before ‘int’ ptr-saved_fpu_custom_code[n2fpu_code_num] = (int saved_fpu_custom_code[n2fpu_code_num]) bp_unpack_value (bp, 64); ^ options-save.c:1903:49: error: expected ‘)’ before ‘int’ make[1]: *** [options-save.o] Error 1 Yep, it is because my code does not handle streaming of arrays into the target optimization nodes. I will take a look on why that array is really needed. It seems like a overkill? Honza (I just bisected it just now, it's this commit: 2014-11-14 Jan Hubicka hubi...@ucw.cz * optc-save-gen.awk: Output cl_target_option_eq, cl_target_option_hash, cl_target_option_stream_out, cl_target_option_stream_in functions. * opth-gen.awk: Output prototypes for cl_target_option_eq and cl_target_option_hash. * lto-streamer.h (cl_target_option_stream_out, cl_target_option_stream_in): Declare. * tree.c (cl_option_hash_hash): Use cl_target_option_hash. (cl_option_hash_eq): Use cl_target_option_eq. * tree-streamer-in.c (unpack_value_fields): Stream in TREE_TARGET_OPTION. 
* lto-streamer-out.c (DFS::DFS_write_tree_body): Follow DECL_FUNCTION_SPECIFIC_TARGET. (hash_tree): Hash TREE_TARGET_OPTION; visit DECL_FUNCTION_SPECIFIC_TARGET. * tree-streamer-out.c (streamer_pack_tree_bitfields): Skip TS_TARGET_OPTION. (streamer_write_tree_body): Output TS_TARGET_OPTION. ) MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: http://perl.plover.com/Questions.html the second :
Re: [PATCH] Fix gimple_fold_stmt_to_constant regression
On Fri, Nov 14, 2014 at 4:39 AM, Richard Biener rguent...@suse.de wrote: Following up https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01233.html and fixing the regressions this caused as soon as I removed the dispatch to fold_unary (and more regressions it would have caused if I managed to finish the idea to also remove the dispatches to fold_binary and fold_ternary...) the following patch makes CCP and VRP follow selected SSA edges again when gimple_fold_stmt_to_constant_1 dispatches to gimple_simplify. The valueization for gimple_simplify of SSA propagator users may both valueize to anything (in particular constants) and it may signal to follow SSA edges if the destination will never be visited again by the propagator (thus its lattice value is stable). Esp. cutting out valueizing SSA names to constants is what caused the regressions. Note that this highlights the fact that overloading the valueization result with the signal to (not) follow SSA edges isn't the very best thing to do - for example we can't valueize to a SSA name (like for looking through SSA copies) but at the same time say that gimple_simplify shouldn't follow the edge to its definition. This shouldn't be a serious limitation for CCP and VRP which care about constants only - but it shows a defect in the gimple_simplify interface. I haven't yet concluded on a better one though - options go from adding a secondary return to the valueize hook to adding a second hook maybe with additionally adding a simple flag to turn off SSA edge following globally. Anyway - the following patch should fix the immediate regression and allows to go forward with removing GENERIC folding from both fold_stmt and gimple_fold_stmt_to_constant. Just not for this stage1 which will end too soon. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Thanks, Richard. 2014-11-14 Richard Biener rguent...@suse.de * gimple-fold.h (gimple_fold_stmt_to_constant_1): Add 2nd valueization hook defaulted to no_follow_ssa_edges. 
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Pass 2nd valueization hook to gimple_simplify. * tree-ssa-ccp.c (valueize_op_1): New function to be used for gimple_simplify called via gimple_fold_stmt_to_constant_1. (ccp_fold): Adjust. * tree-vrp.c (vrp_valueize_1): New function to be used for gimple_simplify called via gimple_fold_stmt_to_constant_1. (vrp_visit_assignment_or_call): Adjust. This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63898 -- H.J.
Re: [PING][PATCH] c11-atomic-exec-5: Avoid dead code where LDBL_MANT_DIG is 106
On Fri, 14 Nov 2014, Joseph Myers wrote: OK. Applied, thanks. Maciej
Re: [PATCH] gcc/testsuite: guality.exp: Fix `test_counts' restoration
On Fri, 14 Nov 2014, Jakub Jelinek wrote: gcc/testsuite/ * g++.dg/guality/guality.exp (check_guality): Fix `test_counts' restoration. Ok, thanks. Applied, thanks. Maciej
Small C++ constexpr PATCHes
Here are two small constexpr changes that I made while working on C++14 constexpr support that I thought should go in separately. The first clarifies the error about missing mem-initializers in constexpr constructors so that people aren't confused about why assigning to the field in the constructor body doesn't count as initialization. The second is a small optimization opportunity that I noticed. Tested x86_64-pc-linux-gnu, applying to trunk. commit 2b9db1c0982dfdee378b41039a37262d87575e25 Author: Jason Merrill ja...@redhat.com Date: Thu Nov 13 20:44:02 2014 -0500 * constexpr.c (cx_check_missing_mem_inits): Clarify error message. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index d30bf635..0d45f31 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -716,8 +716,9 @@ cx_check_missing_mem_inits (tree fun, tree body, bool complain) } if (!complain) return true; - error (uninitialized member %qD in %constexpr% constructor, - field); + error (member %qD must be initialized by mem-initializer + in %constexpr% constructor, field); + inform (DECL_SOURCE_LOCATION (field), declared here); bad = true; } if (field == NULL_TREE) diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor.C index 659e733..55beda7 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ctor.C @@ -3,5 +3,5 @@ struct A { int i; - constexpr A() { } // { dg-error uninitialized member .A::i } + constexpr A() { } // { dg-error A::i } }; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-diag4.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-diag4.C index 29f574d..13ca6fa 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-diag4.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-diag4.C @@ -21,5 +21,5 @@ struct A1 struct B1 { A1 a1; -constexpr B1() {} // { dg-error uninitialized member } +constexpr B1() {} // { dg-error B1::a1 } }; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C index 3e2685b..a589356 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ex3.C @@ -6,7 +6,7 @@ struct A { int i; - constexpr A(int _i) { i = _i; } // { dg-error empty body|uninitialized member } + constexpr A(int _i) { i = _i; } // { dg-error empty body|A::i } }; template class T diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-template2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-template2.C index a316b34..12a8d42 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-template2.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-template2.C @@ -3,7 +3,7 @@ template class T struct A { T t; - constexpr A() { } // { dg-error uninitialized } + constexpr A() { } // { dg-error ::t } }; int main() diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi3.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi3.C index 6ac414b..d2e7439 100644 --- a/gcc/testsuite/g++.dg/cpp0x/nsdmi3.C +++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi3.C @@ -15,4 +15,4 @@ struct B constexpr B b; // { dg-error B::B } -// { dg-prune-output uninitialized member } +// { dg-prune-output B::a1 } commit ac8ad66594ec8fcf2c3a1a03b71f0f3b4e6825e5 Author: Jason Merrill ja...@redhat.com Date: Sat Nov 15 00:53:13 2014 -0500 * constexpr.c (cxx_eval_builtin_function_call): Use fold_builtin_call_array. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 0d45f31..66d356f 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -995,9 +995,8 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, } if (*non_constant_p) return t; - new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t), - CALL_EXPR_FN (t), nargs, args); - new_call = fold (new_call); + new_call = fold_builtin_call_array (EXPR_LOCATION (t), TREE_TYPE (t), + CALL_EXPR_FN (t), nargs, args); VERIFY_CONSTANT (new_call); return new_call; }
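To illustrate what the reworded diagnostic in the first patch above is about, here is a small sketch. The struct and the helper value_of_a are hypothetical and only loosely modelled on the testcases touched by the patch; the rejected form is shown in a comment and applies to the C++11/C++14 rules under discussion.

```cpp
// Sketch: in a constexpr constructor the member has to be initialized by a
// mem-initializer (or a default member initializer), not assigned later.
struct A
{
  int i;

  // Accepted: 'i' is initialized by a mem-initializer.
  constexpr A(int v) : i(v) { }

  // Rejected under the C++11/C++14 rules, because an assignment in the
  // body is not an initialization of 'i':
  //   constexpr A(int v) { i = v; }
};

constexpr A a(3);
static_assert(a.i == 3, "A is usable in a constant expression");

// Hypothetical helper so the value can also be checked at run time.
int value_of_a() { return a.i; }
```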
Small C++ PATCH to cp_parser_omp_declare_reduction_exprs
With the C++14 constexpr support we were getting confused by using finish_expr_stmt on something that isn't an expression. This already should have been add_stmt, so I'm checking it in separately. Tested x86_64-pc-linux-gnu, applying to trunk. commit 046fdf7db65830bc1030d766ffa8f4ba696e0660 Author: Jason Merrill ja...@redhat.com Date: Sat Nov 15 01:31:47 2014 -0500 * parser.c (cp_parser_omp_declare_reduction_exprs): A block is not an expression. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 3ab65a9..111ec10 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -31188,7 +31188,7 @@ cp_parser_omp_declare_reduction_exprs (tree fndecl, cp_parser *parser) block = finish_omp_structured_block (block); cp_walk_tree (block, cp_remove_omp_priv_cleanup_stmt, omp_priv, NULL); - finish_expr_stmt (block); + add_stmt (block); if (ctor) add_decl_expr (omp_orig);
Re: Rerog streaming of OPTIMIZATION_NODE
Jonah, Yep, it is because my code does not handle streaming of arrays into the target optimization nodes. I will take a look at why that array is really needed. It seems like overkill? I am looking into nios2_register_custom_code and I do not quite understand what it is good for. If it tracks custom codes function-wide, then perhaps the target part of cfun would be a better place to home it. If it is unit-wide, then saving/restoring does not seem to make much sense. Can you please explain to me why this needs to be stored in the target option structure? If this is really needed, then we can always add support for streaming arrays, but I would prefer keeping that structure small and simple ;) Honza
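As a rough picture of what support for streaming arrays would involve, the sketch below writes and reads an array field one element at a time. It is only an illustration under stated assumptions: toy_stream and toy_target_option are invented stand-ins, not GCC's bitpack_d or cl_target_option, and nothing here reflects what the generated options-save.c actually contains.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a stream of packed values. This is NOT GCC's bitpack_d.
struct toy_stream
{
  std::vector<std::uint64_t> words;
  std::size_t pos = 0;

  void pack(std::uint64_t v) { words.push_back(v); }
  std::uint64_t unpack() { return words[pos++]; }
};

// Hypothetical option structure with one scalar and one array field.
struct toy_target_option
{
  int flags;
  int custom_code_index[4];
};

void stream_out(toy_stream &s, const toy_target_option &opt)
{
  s.pack(static_cast<std::uint64_t>(opt.flags));
  // An array field has to be written element by element.
  for (int v : opt.custom_code_index)
    s.pack(static_cast<std::uint64_t>(v));
}

void stream_in(toy_stream &s, toy_target_option &opt)
{
  opt.flags = static_cast<int>(s.unpack());
  // A single cast-and-assign, as one would emit for a scalar field, cannot
  // work for an array; the elements are read back in the same order.
  for (int &v : opt.custom_code_index)
    v = static_cast<int>(s.unpack());
}
```

The point of the sketch is only that a generator treating every field as a scalar would need a separate code path for arrays, which is the extra complexity being weighed against keeping the structure small.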
[BUILDROBOT] nios2: build breakage (was: Rerog streaming of OPTIMIZATION_NODE)
Hi, On Sun, 2014-11-16 00:36:27 +0100, Jan Hubicka hubi...@ucw.cz wrote: Yep, it is because my code does not handle streaming of arrays into the target optimization nodes. I will take a look at why that array is really needed. It seems like overkill? I am looking into nios2_register_custom_code and I do not quite understand what it is good for. If it tracks custom codes function-wide, then perhaps the target part of cfun would be a better place to home it. If it is unit-wide, then saving/restoring does not seem to make much sense. Can you please explain to me why this needs to be stored in the target option structure? If this is really needed, then we can always add support for streaming arrays, but I would prefer keeping that structure small and simple ;) Port maintainers Cc'ed. MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: http://www.eyrie.org/~eagle/faqs/questions.html the second :
[PATCH, committed] Sync config.{guess,sub} with upstream
Hi! I was under the impression that somebody else took over keeping an eye on syncing common files between gcc, binutils, automake, config, ... Seems I was kind of wrong with that assumption? Alas, I've started my scripts again and will continue my former syncing work, starting with some easy stuff and do it step by step, additionally verifying multiple targets being built without errors in the build robot. 2014-11-16 Jan-Benedict Glaw jbg...@lug-owl.de * config.sub: Update from upstream config repo. * config.guess: Ditto. diff --git a/config.guess b/config.guess index 1f5c50c..6c32c86 100755 --- a/config.guess +++ b/config.guess @@ -2,7 +2,7 @@ # Attempt to guess a canonical system name. # Copyright 1992-2014 Free Software Foundation, Inc. -timestamp='2014-03-23' +timestamp='2014-11-04' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -24,12 +24,12 @@ timestamp='2014-03-23' # program. This Exception is an additional permission under section 7 # of the GNU General Public License, version 3 (GPLv3). # -# Originally written by Per Bothner. +# Originally written by Per Bothner; maintained since 2000 by Ben Elliston. # # You can get the latest version of this script from: # http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD # -# Please send patches with a ChangeLog entry to config-patc...@gnu.org. +# Please send patches to config-patc...@gnu.org. me=`echo $0 | sed -e 's,.*/,,'` @@ -579,8 +579,9 @@ EOF else IBM_ARCH=powerpc fi - if [ -x /usr/bin/oslevel ] ; then - IBM_REV=`/usr/bin/oslevel` + if [ -x /usr/bin/lslpp ] ; then + IBM_REV=`/usr/bin/lslpp -Lqc bos.rte.libc | + awk -F: '{ print $3 }' | sed s/[0-9]*$/0/` else IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} fi diff --git a/config.sub b/config.sub index 88a0cb4..7cc68ba 100755 --- a/config.sub +++ b/config.sub @@ -2,7 +2,7 @@ # Configuration validation subroutine script. 
# Copyright 1992-2014 Free Software Foundation, Inc. -timestamp='2014-07-28' +timestamp='2014-09-26' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -25,7 +25,7 @@ timestamp='2014-07-28' # of the GNU General Public License, version 3 (GPLv3). -# Please send patches with a ChangeLog entry to config-patc...@gnu.org. +# Please send patches to config-patc...@gnu.org. # # Configuration subroutine to validate and canonicalize a configuration type. # Supply the specified configuration type as an argument. @@ -302,6 +302,7 @@ case $basic_machine in | pdp10 | pdp11 | pj | pjl \ | powerpc | powerpc64 | powerpc64le | powerpcle \ | pyramid \ + | riscv32 | riscv64 \ | rl78 | rx \ | score \ | sh | sh[1234] | sh[24]a | sh[24]aeb | sh[23]e | sh[34]eb | sheb | shbe | shle | sh[1234]le | sh3ele \ @@ -326,6 +327,9 @@ case $basic_machine in c6x) basic_machine=tic6x-unknown ;; + leon|leon[3-9]) + basic_machine=sparc-$basic_machine + ;; m6811 | m68hc11 | m6812 | m68hc12 | m68hcs12x | nvptx | picochip) basic_machine=$basic_machine-unknown os=-none @@ -773,6 +777,9 @@ case $basic_machine in basic_machine=m68k-isi os=-sysv ;; + leon-*|leon[3-9]-*) + basic_machine=sparc-`echo $basic_machine | sed 's/-.*//'` + ;; m68knommu) basic_machine=m68k-unknown os=-linux -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: Zensur im Internet? Nein danke! the second : signature.asc Description: Digital signature
Fix speculation in ipa-cp
Hi, this patch enables propagation of speculative contexts, which I promised to fix after Martin's merge. There were a few bugs that ended up disturbing the testsuite:

1) ipa_polymorphic_call_context::combine_with did not handle very well the case where the incoming type is in construction and its current type is known. This made us drop the ball on one of the devirt-*.C testcases.

2) ipa_get_indirect_edge_target_1 did not correctly apply the offset adjustment and type_preserved prior to using the known context. This caused an ICE while building Firefox.

3) propagate_context_accross_jump_function copy-pasted the logic where, for constants, we track whether the function may be called externally with an unknown parameter (and thus we need to clone). In the case of contexts we do not really need to set this flag if we did not use the incoming information. Also I think it would be better to handle these by speculation rather than cloning, but I hope Martin will follow up on this.

4) I noticed that find_more_scalar_values_for_callers_subset seems to contain a bug when checking whether all incoming edges contribute the same constant to a given parameter. The code seems to set the constant to NULL and then set it to non-NULL at the first occurrence. When it hits two different constants, however, it resets back to NULL, and later it may become non-NULL again.

Otherwise the patch just disables the parts where speculation is cleared; it makes equal_to work better and it also adds a meet operation that is used to compute the context of edges that have multiple callers. Bootstrapped/regtested x86_64-linux, tested with Firefox, plan to commit it shortly. Honza

* ipa-polymorphic-call.c (ipa_polymorphic_call_context::speculation_consistent_p): Constify. (ipa_polymorphic_call_context::meet_speculation_with): New function. (ipa_polymorphic_call_context::combine_with): Handle types in construction better. (ipa_polymorphic_call_context::equal_to): Do not bother about useless speculation. 
(ipa_polymorphic_call_context::meet_with): New function. * cgraph.h (class ipa_polymorphic_call_context): Add meet_width, meet_speculation_with; constify speculation_consistent_p. * ipa-cp.c (ipa_context_from_jfunc): Handle speculation; combine with incomming context. (propagate_context_accross_jump_function): Likewise; be more cureful. about set_contains_variable. (ipa_get_indirect_edge_target_1): Fix handling of dynamic type changes. (find_more_scalar_values_for_callers_subset): Fix. (find_more_contexts_for_caller_subset): Perform meet operation. Index: ipa-polymorphic-call.c === --- ipa-polymorphic-call.c (revision 217609) +++ ipa-polymorphic-call.c (working copy) @@ -1727,7 +1727,7 @@ bool ipa_polymorphic_call_context::speculation_consistent_p (tree spec_outer_type, HOST_WIDE_INT spec_offset, bool spec_maybe_derived_type, - tree otr_type) + tree otr_type) const { if (!flag_devirtualize_speculatively) return false; @@ -1873,6 +1873,102 @@ ipa_polymorphic_call_context::combine_sp return false; } +/* Make speculation less specific so + NEW_OUTER_TYPE, NEW_OFFSET, NEW_MAYBE_DERIVED_TYPE is also included. + If OTR_TYPE is set, assume the context is used with OTR_TYPE. */ + +bool +ipa_polymorphic_call_context::meet_speculation_with + (tree new_outer_type, HOST_WIDE_INT new_offset, bool new_maybe_derived_type, +tree otr_type) +{ + if (!new_outer_type speculative_outer_type) +{ + clear_speculation (); + return true; +} + + /* restrict_to_inner_class may eliminate wrong speculation making our job + easeier. 
*/ + if (otr_type) +restrict_to_inner_class (otr_type); + + if (!speculative_outer_type + || !speculation_consistent_p (speculative_outer_type, + speculative_offset, + speculative_maybe_derived_type, + otr_type)) +return false; + + if (!speculation_consistent_p (new_outer_type, new_offset, +new_maybe_derived_type, otr_type)) +{ + clear_speculation (); + return true; +} + + else if (types_must_be_same_for_odr (speculative_outer_type, + new_outer_type)) +{ + if (speculative_offset != new_offset) + { + clear_speculation (); + return true; + } + else + { + if (!speculative_maybe_derived_type new_maybe_derived_type) + { + speculative_maybe_derived_type = true; +
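The symptom described in item 4 of the message above (a "common constant" that is reset on a mismatch and then picked up again later) can be reproduced in isolation. The following is a sketch that paraphrases the description only; it is not the code of find_more_scalar_values_for_callers_subset, and both function names are invented for the example.

```cpp
#include <vector>

// Returns true and sets 'out' when every element of 'vals' is the same.
// Buggy variant: a mismatch clears the flag, but the loop keeps going, so a
// later element is mistaken for the first occurrence.
bool common_value_buggy(const std::vector<int> &vals, int &out)
{
  bool have = false;
  for (int v : vals)
    {
      if (!have)
        {
          out = v;
          have = true;
        }
      else if (v != out)
        have = false;
    }
  return have;
}

// Fixed variant: the first disagreement is final.
bool common_value_fixed(const std::vector<int> &vals, int &out)
{
  bool have = false;
  for (int v : vals)
    {
      if (!have)
        {
          out = v;
          have = true;
        }
      else if (v != out)
        return false;
    }
  return have;
}
```

With the inputs 1, 2, 3 the buggy variant ends up reporting a common value even though the callers disagree, while the fixed variant reports none.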
Re: OpenACC middle end changes
On Thursday 2014-11-13 17:59, Thomas Schwinge wrote: Here is our current set of OpenACC middle end changes. As discussed before, this is not yet all of OpenACC 2.0 -- we shall a) document what is working already, and b) continue to work on closing the gap. As David wrote in a different context, strchrnul is a GNU extension and not present at least on AIX and FreeBSD 8 (and possibly 9). Gerald PS: Sorry, this mail got stuck in my outbox.
Re: Concepts code review
I don't believe I'll be able to familiarize myself adequately with the more complex issues before the stage 1 deadline (from what I understand Andrew is/was taking care of the blocking issues?), so I will leave what I have for the more trivial issues and a few comments. On 11/11/2014 12:05 PM, Jason Merrill wrote: // Diagnose constraint failures in a variable concept. void diagnose_var (location_t loc, tree t, tree args) The name and comment seem misleading since T is actually a TEMPLATE_ID_EXPR. Variable templates (and thus concepts) are TEMPLATE_ID_EXPR. I changed the comment to explicitly state that it will be a TEMPLATE_ID_EXPR, but I'm not sure the name needs to be changed. If I recall correctly, I initially implemented as *_var_concept and Andrew told me to shorten it. Note that this also matches up with normalize_var. +// Bring the parameters of a function declaration back into +// scope without entering the function body. The declarator +// must be a function declarator. The caller is responsible +// for calling finish_scope. +void +push_function_parms (cp_declarator *declarator) I think if the caller is calling finish_scope, I'd prefer for the begin_scope call to be there as well. Even though Andrew said that this will change later for other reasons, it's a function I wrote so: I actually debated this with Andrew before. My rationale for calling begin_scope in the function was that it feels consistent with the semantics of the call. Specifically it can be seen as reopening the function parameter scope. Thus the call is balanced by calling finish_scope. Either way would work of course, but perhaps it just needed a better name and/or comment? + // Save the current template requirements. + saved_template_reqs = release (current_template_reqs); It seems like a lot of places with saved_template_reqs variables could be converted to use cp_manage_requirements. Probably. The instance you quoted isn't very trivial though. 
The requirements are saved in two different branches and need to be restored before a certain point in the function. Might just need to spend more time looking over the code. + // FIXME: This could be improved. Perhaps the type of the requires + // expression depends on the satisfaction of its constraints. That + // is, its type is bool only if its substitution into its normalized + // constraints succeeds. The requires-expression is not type-dependent, but it can be instantiation-dependent and value-dependent. This is an interesting change. The REQUIRES_EXPR is currently marked as value dependent. The ChangeLog indicates that Andrew loosened the conditions for being value dependent for some cases, but then added it as type dependent when something else failed. May require some time to pin down exactly what needs to be done here. - Braden Obrzut 2014-11-15 Braden Obrzut ad...@maniacsvault.net * gcc/cp/constraint.cc (resolve_constraint_check): Move definition check to grokfndecl. (normalize_template_id): Use expression location if available when informing about missing parentheses. (build_requires_expr): Added comment. (diagnose_var): Clarified comment. * gcc/cp/decl.c (check_concept_refinement): Remove outdated comment regarding variable concepts. (grokfndecl): Ensure that all concept declarations are definitions. (grokdeclarator): Remove outdated comment regarding variable concepts. * gcc/cp/parser.c (cp_parser_introduction_list): Use vec for temporary list instead of a TREE_LIST. (get_id_declarator): Renamed from cp_get_id_declarator. (get_unqualified_id): Renamed from cp_get_identifier. (is_constrained_parameter): Renamed from cp_is_constrained_parameter. (cp_parser_check_constrained_type_parm): Renamed from cp_check_constrained_type_parm. (cp_parser_constrained_type_template_parm): Renamed from cp_constrained_type_template_parm. (cp_parser_constrained_template_template_parm): Renamed from cp_constrained_template_template_parm. 
(constrained_non_type_template_parm): Renamed from cp_constrained_non_type_tmeplate_parm. (finish_constrained_parameter): Renamed from cp_finish_constrained_parameter. (maybe_type_parameter): Renamed from cp_maybe_type_parameter. (declares_type_parameter): Renamed from cp_declares_type_parameter. (declares_type_template_parameter): Renamed from cp_declares_type_template_parameter. (declares_template_template_parameter): Renamed from cp_declares_template_template_parameter. (cp_parser_type_parameter): Call cp_parser_default_type_template_argument and cp_parser_default_template_template_argument which were already factored out from this function. (cp_maybe_constrained_type_specifier): Use the new INTRODUCED_PARM_DECL instead of PLACEHOLDER_EXPR. (cp_parser_requires_expr_scope): Remove old comment and change destructor to use
[PATCH] driver: ignore SIGINT while waiting on subprocesses to finish
Currently the top-level driver handles SIGINT by immediately killing itself, even when the driver has subprocesses (e.g. cc1, as) running. I don't think this is a good idea because

1. if the top-level driver is waiting on a hanging subprocess, pressing ^C will kill the driver but will not necessarily kill the subprocess; an unresponsive, perhaps busy-looping subprocess may keep running in the background even though the compiler seems to have terminated successfully.

2. when debugging gcc with gcc -wrapper gdb,--args we are unable to use ^C from within the GDB subprocess, because pressing ^C immediately kills the driver and we lose our terminal. This makes debugging more inconvenient.

This patch fixes these two issues by having the driver ignore SIGINT while a subprocess is running. The driver will instead wait for the subprocess to exit before it terminates, as usual.

I tested this change by running gcc -wrapper gdb, gcc -wrapper valgrind and plain old gcc in various ways (-pipe, -flto, -c, etc.) and pressing ^C during compilation. I noticed no differences in behavior or promptness other than finally being able to use ^C inside GDB.

Does this change look OK for trunk after a successful bootstrap + regtest on x86_64-unknown-linux-gnu?

---
 gcc/gcc.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 653ca8d..0802f41 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2653,7 +2653,11 @@ add_sysrooted_prefix (struct path_prefix *pprefix, const char *prefix,
   add_prefix (pprefix, prefix, component, priority,
               require_machine_suffix, os_multilib);
 }
-
+
+/* True if the SIGINT signal should be ignored by the driver's
+   signal handler.  */
+static bool ignore_sigint_p;
+
 /* Execute the command specified by the arguments on the current line of spec.
    When using pipes, this includes several piped-together commands
    with `|' between them.
@@ -2839,6 +2843,10 @@ execute (void)
   if (pex == NULL)
     fatal_error ("pex_init failed: %m");
 
+  /* A SIGINT raised while subprocesses are running should not kill the
+     driver.  */
+  ignore_sigint_p = true;
+
   for (i = 0; i < n_commands; i++)
     {
       const char *errmsg;
@@ -2878,6 +2886,8 @@ execute (void)
   if (!pex_get_status (pex, n_commands, statuses))
     fatal_error ("failed to get exit status: %m");
 
+  ignore_sigint_p = false;
+
   if (report_times || report_times_to_file)
     {
       times = (struct pex_time *) alloca (n_commands * sizeof (struct pex_time));
@@ -2893,6 +2903,9 @@ execute (void)
 
         if (WIFSIGNALED (status))
           {
+            if (WTERMSIG (status) == SIGINT)
+              raise (SIGINT);
+            else
 #ifdef SIGPIPE
             /* SIGPIPE is a special case.  It happens in -pipe mode
                when the compiler dies before the preprocessor is done,
@@ -6710,6 +6723,12 @@ set_input (const char *filename)
 static void
 fatal_signal (int signum)
 {
+  if (signum == SIGINT && ignore_sigint_p)
+    {
+      signal (signum, fatal_signal);
+      return;
+    }
+
   signal (signum, SIG_DFL);
   delete_failure_queue ();
   delete_temp_files ();
--
2.2.0.rc1.23.gf570943
Re: [BUILDROBOT] nios2: build breakage
On 11/15/2014 04:49 PM, Jan-Benedict Glaw wrote:

Hi,

On Sun, 2014-11-16 00:36:27 +0100, Jan Hubicka hubi...@ucw.cz wrote:

Yep, it is because my code does not handle streaming of arrays into the target optimization nodes. I will take a look at why that array is really needed. It seems like overkill? I am looking into nios2_register_custom_code and I do not quite understand what it is good for. If it tracks custom codes function-wide, then perhaps the target part of cfun would be a better place to home it. If it is unit-wide, then saving/restoring does not seem to make much sense. Can you please explain to me why this needs to be stored in the target option structure? If this is really needed, then we can always add support for streaming arrays, but I would prefer keeping that structure small and simple ;)

Port maintainers Cc'ed.

I can explain why this is needed, at least. The Nios II architecture optionally allows custom instructions that are typically used to implement floating-point operations. The nios2 GCC backend knows to generate these instructions if the user tells it what opcodes implement these instructions. This is typically done by command-line options, but can also be done on a per-function basis by means of target attributes or pragmas -- see gcc/testsuite/gcc.target/nios2/custom-fp-* for examples.

The command-line options, attribute, and pragma support was a requirement from Altera, so yes, this is really needed.

-Sandra
Re: [PATCH, libgfortran] PR 60324 Unbounded stack allocations in libgfortran
On Sat, Nov 15, 2014 at 8:01 AM, Janne Blomqvist blomqvist.ja...@gmail.com wrote: On Fri, Nov 14, 2014 at 11:02 PM, Tobias Burnus bur...@net-b.de wrote: I think instead of doing a run-time check I'd prefer something like the following, keeping the compile-time assert. --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -253 +253 @@ static GFC_UINTEGER_4 kiss_default_seed[] = { -static const GFC_INTEGER_4 kiss_size = sizeof(kiss_seed)/sizeof(kiss_seed[0]); +#define KISS_SIZE ((GFC_INTEGER_4) (sizeof(kiss_seed)/sizeof(kiss_seed[0])) (plus s/kiss_size/KISS_SIZE/ changes in the code.) Janne, what do you think? I like it. With this, you can also get rid of the assert and the newly introduced KISS_MAX_SIZE macro, and just make the seed array the correct size, as was originally done (with a VLA). Consider such a patch pre-approved. I went ahead and did it myself, committed the attached patch as r217623 after regtesting. 2014-11-16 Janne Blomqvist j...@gcc.gnu.org PR libfortran/60324 * intrinsics/random.c (kiss_size): Rename to KISS_SIZE, make it a macro instead of a variable. (random_seed_i4): Make seed correct size, remove assert, KISS_SIZE related changes. (random_seed_i8): KISS_SIZE related changes. -- Janne Blomqvist diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c index 5e91929..d2510b2 100644 --- a/libgfortran/intrinsics/random.c +++ b/libgfortran/intrinsics/random.c @@ -224,7 +224,7 @@ KISS algorithm. */ z=0,c=0 and z=2^32-1,c=698769068 should be avoided. */ -/* Any modifications to the seeds that change kiss_size below need to be +/* Any modifications to the seeds that change KISS_SIZE below need to be reflected in check.c (gfc_check_random_seed) to enable correct compile-time checking of PUT size for the RANDOM_SEED intrinsic. 
*/ @@ -250,7 +250,7 @@ static GFC_UINTEGER_4 kiss_default_seed[] = { #endif }; -static const GFC_INTEGER_4 kiss_size = sizeof(kiss_seed)/sizeof(kiss_seed[0]); +#define KISS_SIZE (sizeof(kiss_seed)/sizeof(kiss_seed[0])) static GFC_UINTEGER_4 * const kiss_seed_1 = kiss_seed; static GFC_UINTEGER_4 * const kiss_seed_2 = kiss_seed + 4; @@ -665,12 +665,7 @@ unscramble_seed (unsigned char *dest, unsigned char *src, int size) void random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) { - int i; - -#define KISS_MAX_SIZE 12 - unsigned char seed[4 * KISS_MAX_SIZE]; - _Static_assert (kiss_size = KISS_MAX_SIZE, - kiss_size must = KISS_MAX_SIZE); + unsigned char seed[4 * KISS_SIZE]; __gthread_mutex_lock (random_lock); @@ -681,11 +676,11 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) /* From the standard: If no argument is present, the processor assigns a processor-dependent value to the seed. */ if (size == NULL put == NULL get == NULL) - for (i = 0; i kiss_size; i++) + for (size_t i = 0; i KISS_SIZE; i++) kiss_seed[i] = kiss_default_seed[i]; if (size != NULL) -*size = kiss_size; +*size = KISS_SIZE; if (put != NULL) { @@ -694,18 +689,18 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) runtime_error (Array rank of PUT is not 1.); /* If the array is too small, abort. */ - if (GFC_DESCRIPTOR_EXTENT(put,0) kiss_size) + if (GFC_DESCRIPTOR_EXTENT(put,0) (index_type) KISS_SIZE) runtime_error (Array size of PUT is too small.); /* We copy the seed given by the user. */ - for (i = 0; i kiss_size; i++) + for (size_t i = 0; i KISS_SIZE; i++) memcpy (seed + i * sizeof(GFC_UINTEGER_4), - (put-base_addr[(kiss_size - 1 - i) * GFC_DESCRIPTOR_STRIDE(put,0)]), + (put-base_addr[(KISS_SIZE - 1 - i) * GFC_DESCRIPTOR_STRIDE(put,0)]), sizeof(GFC_UINTEGER_4)); /* We put it after scrambling the bytes, to paper around users who provide seeds with quality only in the lower or upper part. 
*/ - scramble_seed ((unsigned char *) kiss_seed, seed, 4*kiss_size); + scramble_seed ((unsigned char *) kiss_seed, seed, 4 * KISS_SIZE); } /* Return the seed to GET data. */ @@ -716,15 +711,15 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get) runtime_error (Array rank of GET is not 1.); /* If the array is too small, abort. */ - if (GFC_DESCRIPTOR_EXTENT(get,0) kiss_size) + if (GFC_DESCRIPTOR_EXTENT(get,0) (index_type) KISS_SIZE) runtime_error (Array size of GET is too small.); /* Unscramble the seed. */ - unscramble_seed (seed, (unsigned char *) kiss_seed, 4*kiss_size); + unscramble_seed (seed, (unsigned char *) kiss_seed, 4 * KISS_SIZE); /* Then copy it back to the user variable. */ - for (i = 0; i kiss_size; i++) - memcpy ((get-base_addr[(kiss_size - 1 - i) *
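The reason the macro lets the seed buffer go back to its exact size can be sketched outside libgfortran. In C, a const-qualified variable is not an integer constant expression, so an array dimensioned by it is a VLA; a sizeof-based macro is a constant expression, so the buffer has a fixed size and can be checked at compile time. The sketch below is only an illustration under assumptions: seed_words, SEED_WORDS, extent_is_large_enough and get_seed_reversed are hypothetical names, and the seed values are arbitrary placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder seed words; arbitrary values, not libgfortran's.  */
static uint32_t seed_words[] = { 10u, 20u, 30u, 40u };

/* An integer constant expression of type size_t, in the spirit of
   KISS_SIZE.  Hypothetical name.  */
#define SEED_WORDS (sizeof (seed_words) / sizeof (seed_words[0]))

/* Nonzero if an array of EXTENT elements can hold the seed.  The macro
   has an unsigned type, so it is cast to a signed type first; without
   the cast a negative EXTENT would be converted to a huge unsigned
   value and wrongly pass the check.  */
int
extent_is_large_enough (ptrdiff_t extent)
{
  return extent >= (ptrdiff_t) SEED_WORDS;
}

/* Copy the seed into DST in reverse word order through a byte buffer.  */
void
get_seed_reversed (uint32_t *dst)
{
  /* Fixed-size array, not a VLA, because SEED_WORDS is a constant.  */
  unsigned char buf[sizeof (uint32_t) * SEED_WORDS];
  _Static_assert (sizeof buf == sizeof seed_words,
                  "buffer must match the seed size");

  memcpy (buf, seed_words, sizeof buf);
  for (size_t i = 0; i < SEED_WORDS; i++)
    memcpy (&dst[SEED_WORDS - 1 - i], buf + i * sizeof (uint32_t),
            sizeof (uint32_t));
}
```

The cast in extent_is_large_enough plays the same role as the (index_type) casts the patch adds next to GFC_DESCRIPTOR_EXTENT.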
Re: [PATCH] driver: ignore SIGINT while waiting on subprocesses to finish
On Sat, 15 Nov 2014, Patrick Palka wrote:

Currently the top-level driver handles SIGINT by immediately killing itself, even when the driver has subprocesses (e.g. cc1, as) running. I don't think this is a good idea because

1. if the top-level driver is waiting on a hanging subprocess, pressing ^C will kill the driver but will not necessarily kill the subprocess; an unresponsive, perhaps busy-looping subprocess may keep running in the background even though the compiler seems to have terminated successfully.

2. when debugging gcc with gcc -wrapper gdb,--args we are unable to use ^C from within the GDB subprocess, because pressing ^C immediately kills the driver and we lose our terminal. This makes debugging more inconvenient.

This patch fixes these two issues by having the driver ignore SIGINT while a subprocess is running. The driver will instead wait for the subprocess to exit before it terminates, as usual.

I tested this change by running gcc -wrapper gdb, gcc -wrapper valgrind and plain old gcc in various ways (-pipe, -flto, -c, etc.) and pressing ^C during compilation. I noticed no differences in behavior or promptness other than finally being able to use ^C inside GDB.

Does this change look OK for trunk after a successful bootstrap + regtest on x86_64-unknown-linux-gnu?

I forgot a ChangeLog entry:

	* gcc.c (ignore_sigint_p): New static variable.
	(execute): Use it.
	(fatal_signal): Ignore SIGINT if ignore_sigint_p is true.
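For readers who want to try the idea outside the driver, here is a minimal standalone sketch, assuming a POSIX system. It is not the gcc.c code: run_command and on_fatal_signal are hypothetical names, and a plain fork/waitpid stands in for the pex machinery the driver actually uses. The parent ignores SIGINT only while the child runs, and re-raises it if the child itself died from SIGINT.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Nonzero while a child is running; read from the signal handler.  */
static volatile sig_atomic_t ignore_sigint_p;

/* Hypothetical stand-in for the driver's fatal_signal handler.  */
void
on_fatal_signal (int signum)
{
  if (signum == SIGINT && ignore_sigint_p)
    {
      /* Re-arm the handler and leave the reaction to ^C to the child.  */
      signal (signum, on_fatal_signal);
      return;
    }

  /* Otherwise die from the signal with its default action.  */
  signal (signum, SIG_DFL);
  raise (signum);
}

/* Run CMD through /bin/sh and wait for it.  Return its exit status,
   or -1 if it could not be run or did not exit normally.  */
int
run_command (const char *cmd)
{
  int status;
  pid_t pid, r;

  signal (SIGINT, on_fatal_signal);

  /* A SIGINT raised while the child runs should not kill the parent.  */
  ignore_sigint_p = 1;

  pid = fork ();
  if (pid < 0)
    {
      ignore_sigint_p = 0;
      return -1;
    }
  if (pid == 0)
    {
      /* Child: default SIGINT handling, then run the command.  */
      signal (SIGINT, SIG_DFL);
      execl ("/bin/sh", "sh", "-c", cmd, (char *) NULL);
      _exit (127);
    }

  /* Retry if a signal interrupts the wait.  */
  do
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);

  ignore_sigint_p = 0;

  if (r < 0)
    return -1;
  if (WIFSIGNALED (status))
    {
      /* The child was interrupted, so behave as if we were too.  */
      if (WTERMSIG (status) == SIGINT)
        raise (SIGINT);
      return -1;
    }
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling run_command ("kill -INT $PPID; exit 7") sends SIGINT to the parent while the child is still alive; under the assumptions above the parent survives and reports the child's exit status.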
Re: Concepts code review
On 11/15/2014 07:58 PM, Braden Obrzut wrote:

Variable templates (and thus concepts) are TEMPLATE_ID_EXPR. I changed the comment to explicitly state that it will be a TEMPLATE_ID_EXPR, but I'm not sure the name needs to be changed. If I recall correctly, I initially implemented it as *_var_concept and Andrew told me to shorten it. Note that this also matches up with normalize_var.

OK.

I think if the caller is calling finish_scope, I'd prefer for the begin_scope call to be there as well.

Even though Andrew said that this will change later for other reasons, it's a function I wrote, so: I actually debated this with Andrew before. My rationale for calling begin_scope in the function was that it feels consistent with the semantics of the call. Specifically, it can be seen as reopening the function parameter scope. Thus the call is balanced by calling finish_scope. Either way would work of course, but perhaps it just needed a better name and/or comment?

A better name would be OK. Perhaps push_function_parms_with_scope.

Jason
Re: [Patch] PR 61692 - Fix for inline asm ICE
On 9/15/2014 2:51 PM, Jeff Law wrote: Let's go with your original inputs + outputs + labels change and punt the clobbers stuff for now. jeff I have also added the test code you requested. I have a release on file with the FSF, but don't have SVN write access. Problem: extract_insn() in recog.c will ICE if (noperands MAX_RECOG_OPERANDS). Normally this isn't a problem since expand_asm_operands() in cfgexpand.c catches and reports a proper error for this condition. However, expand_asm_operands() only checks (ninputs + noutputs) instead of (ninputs + noutputs + nlabels), so you can get the ICE when using asm goto. ChangeLog: 2014-11-15 David Wohlferd d...@limegreensocks.com PR target/61692 * cfgexpand.c (expand_asm_operands): Count all inline asm params. * testsuite/gcc.dg/pr61692.c: New test. dw Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 217623) +++ gcc/cfgexpand.c (working copy) @@ -2589,7 +2589,7 @@ } ninputs += ninout; - if (ninputs + noutputs MAX_RECOG_OPERANDS) + if (ninputs + noutputs + nlabels MAX_RECOG_OPERANDS) { error (more than %d operands in %asm%, MAX_RECOG_OPERANDS); return; Index: gcc/testsuite/gcc.dg/pr61692.c === --- gcc/testsuite/gcc.dg/pr61692.c (revision 0) +++ gcc/testsuite/gcc.dg/pr61692.c (working copy) @@ -0,0 +1,173 @@ +/* PR 61692 */ + +/* Check for ice when exceededing the max # + of parameters to inline asm. */ + +int Labels() +{ +label01: label02: label03: label04: label05: +label06: label07: label08: label09: label10: +label11: label12: label13: label14: label15: +label16: label17: label18: label19: label20: +label21: label22: label23: label24: label25: +label26: label27: label28: label29: label30: +label31: + +__asm__ goto ( /* Works. 
*/ +: /* no outputs */ +: /* no inputs */ +: /* no clobbers */ +: label01, label02, label03, label04, label05, + label06, label07, label08, label09, label10, + label11, label12, label13, label14, label15, + label16, label17, label18, label19, label20, + label21, label22, label23, label24, label25, + label26, label27, label28, label29, label30); + +__asm__ goto ( /* { dg-error more than 30 operands } */ +: /* no outputs */ +: /* no inputs */ +: /* no clobbers */ +: label01, label02, label03, label04, label05, + label06, label07, label08, label09, label10, + label11, label12, label13, label14, label15, + label16, label17, label18, label19, label20, + label21, label22, label23, label24, label25, + label26, label27, label28, label29, label30, + label31); + +return 0; +} + +int Labels_and_Inputs() +{ +int b01, b02, b03, b04, b05, b06, b07, b08, b09, b10; +int b11, b12, b13, b14, b15, b16, b17, b18, b19, b20; +int b21, b22, b23, b24, b25, b26, b27, b28, b29, b30; +int b31; + +label01: label02: label03: label04: label05: +label06: label07: label08: label09: label10: +label11: label12: label13: label14: label15: +label16: label17: label18: label19: label20: +label21: label22: label23: label24: label25: +label26: label27: label28: label29: label30: +label31: + +b01 = b02 = b03 = b04 = b05 = b06 = b07 = b08 = b09 = b10 = 0; +b11 = b12 = b13 = b14 = b15 = b16 = b17 = b18 = b19 = b20 = 0; +b21 = b22 = b23 = b24 = b25 = b26 = b27 = b28 = b29 = b30 = 0; +b31 = 0; + +__asm__ goto ( /* Works. 
*/ + : /* no outputs */ + : m (b01), m (b02), m (b03), m (b04), m (b05), +m (b06), m (b07), m (b08), m (b09), m (b10), +m (b11), m (b12), m (b13), m (b14), m (b15), +m (b16), m (b17), m (b18), m (b19), m (b20), +m (b21), m (b22), m (b23), m (b24), m (b25), +m (b26), m (b27), m (b28), m (b29) + : /* no clobbers */ + : label01); + +__asm__ goto ( /* { dg-error more than 30 operands } */ + : /* no outputs */ + : m (b01), m (b02), m (b03), m (b04), m (b05), +m (b06), m (b07), m (b08), m (b09), m (b10), +m (b11), m (b12), m (b13), m (b14), m (b15), +m (b16), m (b17), m (b18), m (b19), m (b20), +m (b21), m (b22), m (b23), m (b24), m (b25), +m (b26), m (b27), m (b28), m (b29), m (b30) + : /* no clobbers */ + : label01); + + return 0; +} + +int Outputs() +{ +int b01, b02, b03, b04, b05, b06, b07, b08, b09, b10; +int b11, b12, b13, b14, b15, b16, b17, b18, b19, b20; +int b21, b22, b23, b24, b25, b26, b27, b28, b29, b30; +int b31; + +/* Outputs. */ +__asm__ volatile ( /* Works. */ + : =m (b01), =m (b02), =m (b03), =m (b04), =m (b05), + =m (b06), =m (b07), =m (b08), =m (b09), =m (b10), + =m (b11),
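The off-by-labels problem described above can be modelled with a few lines of plain C. This is only a sketch of the counting logic, not the cfgexpand.c code: MAX_OPERANDS is a stand-in for MAX_RECOG_OPERANDS and is assumed to be 30 because that is the number the test's dg-error strings mention, and both function names are hypothetical.

```c
/* Stand-in for MAX_RECOG_OPERANDS; 30 is assumed from the test's
   "more than 30 operands" messages.  */
#define MAX_OPERANDS 30

/* The check as it was before the patch: labels are not counted, so an
   asm goto with too many labels slips through and fails later.  */
int
old_operand_count_ok (int ninputs, int noutputs, int nlabels)
{
  (void) nlabels;
  return ninputs + noutputs <= MAX_OPERANDS;
}

/* The check after the patch: every operand kind counts.  */
int
new_operand_count_ok (int ninputs, int noutputs, int nlabels)
{
  return ninputs + noutputs + nlabels <= MAX_OPERANDS;
}
```

With 31 labels and no inputs or outputs, the old check accepts the statement while the new one rejects it, which corresponds to the first dg-error case in the test.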
Re: [PATCH] Look through widening type conversions for possible edge assertions
On Wed, Nov 12, 2014 at 3:38 AM, Richard Biener richard.guent...@gmail.com wrote:

On Wed, Nov 12, 2014 at 5:17 AM, Patrick Palka patr...@parcs.ath.cx wrote:

On Tue, Nov 11, 2014 at 8:48 AM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 11, 2014 at 1:10 PM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch is a replacement for the 2nd VRP refactoring patch. It simply teaches VRP to look through widening type conversions when finding suitable edge assertions, e.g.

  bool p = x != y;
  int q = (int) p;
  if (q == 0) // new edge assert: p == 0 and therefore x == y

I think the proper fix is to forward x != y to q == 0 instead of this one. That said - the tree-ssa-forwprop.c restriction on only forwarding single-uses into conditions is clearly bogus here. I suggest relaxing it for conversions and compares. Like with

Index: tree-ssa-forwprop.c
===================================================================
--- tree-ssa-forwprop.c (revision 217349)
+++ tree-ssa-forwprop.c (working copy)
@@ -476,7 +476,7 @@ forward_propagate_into_comparison_1 (gim
        {
          rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
          tmp = combine_cond_expr_cond (stmt, code, type,
-                                       rhs0, op1, !single_use0_p);
+                                       rhs0, op1, false);
          if (tmp)
            return tmp;
        }

Thanks, Richard. That makes sense. Attached is what I have so far. I relaxed the forwprop restriction in the case of comparing an integer constant with a comparison or with a conversion from a boolean value. (If I allow all conversions, not just those from a boolean value, then a couple of -Wstrict-overflow failures trigger.) Does the change look sensible? Should the logic be duplicated for the case when TREE_CODE (op1) == SSA_NAME? Thanks for your help so far!

It looks good, though I'd have allowed all kinds of conversions, not only those from booleans. If the patch tests OK with that change, it is OK.

Sadly, changing the patch to propagate all kinds of conversions, not just those from booleans, introduces regressions that I don't know how to adequately fix.
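The snippet from the original proposal can be turned into a complete function to show what the extra edge assertion buys. This is a minimal sketch with a hypothetical name (classify); it only demonstrates the source-level fact that q == 0 implies x == y, and makes no claim about what any particular GCC version actually folds.

```c
#include <stdbool.h>

/* Return 0 when X equals Y and 1 otherwise, written so that the test
   goes through a bool widened to int.  */
int
classify (int x, int y)
{
  bool p = x != y;
  int q = (int) p;      /* widening conversion from bool to int */

  if (q == 0)
    {
      /* On this edge q == 0, so p == 0, and therefore x == y.  An
         optimizer that knows this can treat the equal operands as
         interchangeable and fold the expression below to 0.  */
      return x - x + (y - y);
    }
  return 1;
}
```

Either approach discussed in the thread, the VRP edge assertion or forwarding x != y into the q == 0 test, aims at exposing that equality to later passes.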
RFA (tree-inline): PATCH for more C++14 constexpr support
This patch implements more support for C++14 constexpr: it allows arbitrary modification of variables in a constexpr function, but does not currently handle jumping -- multiple returns, loops, switches. The approach I took for this was to just use the DECL_SAVED_TREE for a constexpr function as the basis for expansion rather than trying to massage it into a magic expression. And now the values of local variables, including parameters, are kept in the values hash map that I introduced with the aggregate NSDMI patch. But in the presence of recursive constexpr calls we can't use the same PARM_DECL as a key, so we need to remap it. Thus I've added remap_fn_body to tree-inline.c to unshare the entire function body and remap the parms and result to avoid clashes. This handles some more C++14 testcases and has no regressions on C++11 constexpr testcases. Support for jumps will follow soon. Tested x86_64-pc-linux-gnu and powerpc64-unknown-linux-gnu. Is the remap_fn_body function ok for trunk? commit decc90baa31ae1535b4b0ab80aeee185471a5ddb Author: Jason Merrill ja...@redhat.com Date: Sat Nov 15 10:46:55 2014 -0500 C++14 constexpr support (minus loops and multiple returns) gcc/ * tree-inline.c (remap_fn_body): New. * tree-inline.h: Declare it. gcc/cp/ * constexpr.c (use_new_call): New macro. (build_data_member_initialization): Ignore non-mem-inits. (check_constexpr_bind_expr_vars): Remove C++14 checks. (constexpr_fn_retval): Likewise. (check_constexpr_ctor_body): Do nothing in C++14. (massage_constexpr_body): In C++14 only collect mem-inits. (get_function_named_in_call): Handle null CALL_EXPR_FN. (cxx_bind_parameters_in_call): Build bindings in same order as parameters. Don't treat iniviref parms specially in new call mode. (cxx_eval_call_expression): If use_new_call, do constexpr expansion based on DECL_SAVED_TREE rather than the massaged constexpr body. (cxx_eval_component_reference): A null element means we're mid- initialization. 
(cxx_eval_store_expression, cxx_eval_increment_expression): New. (cxx_eval_constant_expression): Handle RESULT_DECL, DECL_EXPR, MODIFY_EXPR, STATEMENT_LIST, BIND_EXPR, USING_STMT, PREINCREMENT_EXPR, POSTINCREMENT_EXPR, PREDECREMENT_EXPR, POSTDECREMENT_EXPR. Don't look into DECL_INITIAL of variables in constexpr functions. In new-call mode find parms in the values table. (potential_constant_expression_1): Handle null CALL_EXPR_FN. Handle STATEMENT_LIST, MODIFY_EXPR, MODOP_EXPR, IF_STMT, PREINCREMENT_EXPR, POSTINCREMENT_EXPR, PREDECREMENT_EXPR, POSTDECREMENT_EXPR, BIND_EXPR, WITH_CLEANUP_EXPR, CLEANUP_POINT_EXPR, MUST_NOT_THROW_EXPR, TRY_CATCH_EXPR, EH_SPEC_BLOCK, EXPR_STMT, DECL_EXPR. (cxx_eval_array_reference): Call build_cplus_new. (cxx_eval_component_reference): Likewise. (cxx_eval_outermost_constant_expr): Pull object out of AGGR_INIT_EXPR. (maybe_constant_init): Look through INIT_EXPR. (ensure_literal_type_for_constexpr_object): Set cp_function_chain-invalid_constexpr. * cp-tree.h (struct language_function): Add invalid_constexpr bitfield. * decl.c (start_decl): Set cp_function_chain-invalid_constexpr. (check_for_uninitialized_const_var): Likewise. (maybe_save_function_definition): Check it. * parser.c (cp_parser_jump_statement): Set cp_function_chain-invalid_constexpr. (cp_parser_asm_definition): Likewise. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 66d356f..53cfb18 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3. 
If not see #include tree-iterator.h #include gimplify.h #include builtins.h +#include tree-inline.h static bool verify_constant (tree, bool, bool *, bool *); #define VERIFY_CONSTANT(X) \ @@ -93,8 +94,11 @@ ensure_literal_type_for_constexpr_object (tree decl) error (the type %qT of constexpr variable %qD is not literal, type, decl); else - error (variable %qD of non-literal type %qT in %constexpr% - function, decl, type); + { + error (variable %qD of non-literal type %qT in %constexpr% + function, decl, type); + cp_function_chain-invalid_constexpr = true; + } explain_non_literal_class (type); return NULL; } @@ -310,13 +314,20 @@ build_data_member_initialization (tree t, vecconstructor_elt, va_gc **vec) if (TREE_CODE (t) == CONVERT_EXPR) t = TREE_OPERAND (t, 0); if (TREE_CODE (t) == INIT_EXPR - || TREE_CODE (t) == MODIFY_EXPR) + /* vptr initialization shows up as a MODIFY_EXPR. In C++14 we only + use what this function builds for cx_check_missing_mem_inits, and + assignment in the ctor body doesn't count. */ + || (cxx_dialect cxx14 TREE_CODE (t) == MODIFY_EXPR)) { member = TREE_OPERAND (t, 0); init = break_out_target_exprs
a patch to fix an x86 bootstrap with LRA
The following patch fixes an x86 bootstrap failure with different sets of options (-with-arch=corei7 -with-cpu=intel, -with-arch=core2 -with-cpu=slm).

The patch was bootstrapped on x86 (with the 2 sets) and x86-64.

Committed as rev. 217624.

2014-11-15  Vladimir Makarov  vmaka...@redhat.com

	* lra-remat.c (cand_transf_func): Process regno for rematerialization too.
	* lra.c (lra): Switch on rematerialization pass.

Index: lra.c
===================================================================
--- lra.c (revision 217609)
+++ lra.c (working copy)
@@ -2354,7 +2354,7 @@ lra (FILE *f)
            break;
          /* Now we know what pseudos should be spilled.  Try to
             rematerialize them first.  */
-         if (0 && lra_remat ())
+         if (lra_remat ())
            {
              /* We need full live info -- see the comment above.  */
              lra_create_live_ranges (lra_reg_spill_p, true);
Index: lra-remat.c
===================================================================
--- lra-remat.c (revision 217602)
+++ lra-remat.c (working copy)
@@ -860,6 +860,10 @@ cand_trans_fun (int bb_index, bitmap bb_
            bitmap_set_bit (temp_bitmap, cid);
            break;
          }
+      /* Check regno for rematerialization.  */
+      if (bitmap_bit_p (bb_changed_regs, cand->regno)
+         || bitmap_bit_p (bb_dead_regs, cand->regno))
+       bitmap_set_bit (temp_bitmap, cid);
     }
   return bitmap_ior_and_compl (bb_out,
                               bb_info->gen_cands, bb_in, temp_bitmap);
RFC: PATCH to genericize C++ loops to LOOP_EXPR instead of gotos
I've had a TODO in genericize_cp_loop for a long time suggesting that we should genericize to LOOP_EXPR rather than gotos, and now that I need to interpret the function body for constexpr evaluation, making this change will also simplify that handling. This change also does away with canonicalizing the condition to the bottom of the loop, to avoid the extra goto. It seems to me that this is unnecessary nowadays, since the optimizers are very capable of making any necessary transformations, but I'm interested in feedback from other people. Tested x86_64-pc-linux-gnu. Opinions? commit 1a45860e7757ee054f6bf98bee4ebe5c661dfb90 Author: Jason Merrill ja...@redhat.com Date: Thu Nov 13 23:54:48 2014 -0500 * cp-gimplify.c (genericize_cp_loop): Use LOOP_EXPR. (genericize_for_stmt): Handle null statement-list. diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index e5436bb..81b26d2 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -208,7 +208,7 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body, void *data) { tree blab, clab; - tree entry = NULL, exit = NULL, t; + tree exit = NULL; tree stmt_list = NULL; blab = begin_bc_block (bc_break, start_locus); @@ -222,64 +222,46 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body, cp_walk_tree (incr, cp_genericize_r, data, NULL); *walk_subtrees = 0; - /* If condition is zero don't generate a loop construct. */ - if (cond integer_zerop (cond)) + if (cond TREE_CODE (cond) != INTEGER_CST) { - if (cond_is_first) - { - t = build1_loc (start_locus, GOTO_EXPR, void_type_node, - get_bc_label (bc_break)); - append_to_statement_list (t, stmt_list); - } -} - else -{ - /* Expand to gotos, just like c_finish_loop. TODO: Use LOOP_EXPR. */ - tree top = build1 (LABEL_EXPR, void_type_node, - create_artificial_label (start_locus)); - - /* If we have an exit condition, then we build an IF with gotos either - out of the loop, or to the top of it. 
If there's no exit condition, - then we just build a jump back to the top. */ - exit = build1 (GOTO_EXPR, void_type_node, LABEL_EXPR_LABEL (top)); - - if (cond !integer_nonzerop (cond)) - { - /* Canonicalize the loop condition to the end. This means - generating a branch to the loop condition. Reuse the - continue label, if possible. */ - if (cond_is_first) - { - if (incr) - { - entry = build1 (LABEL_EXPR, void_type_node, - create_artificial_label (start_locus)); - t = build1_loc (start_locus, GOTO_EXPR, void_type_node, - LABEL_EXPR_LABEL (entry)); - } - else - t = build1_loc (start_locus, GOTO_EXPR, void_type_node, -get_bc_label (bc_continue)); - append_to_statement_list (t, stmt_list); - } - - t = build1 (GOTO_EXPR, void_type_node, get_bc_label (bc_break)); - exit = fold_build3_loc (start_locus, - COND_EXPR, void_type_node, cond, exit, t); - } - - append_to_statement_list (top, stmt_list); + /* If COND is constant, don't bother building an exit. If it's false, + we won't build a loop. If it's true, any exits are in the body. 
*/ + location_t cloc = EXPR_LOC_OR_LOC (cond, start_locus); + exit = build1_loc (cloc, GOTO_EXPR, void_type_node, + get_bc_label (bc_break)); + exit = fold_build3_loc (cloc, COND_EXPR, void_type_node, cond, + build_empty_stmt (cloc), exit); } + if (exit cond_is_first) +append_to_statement_list (exit, stmt_list); append_to_statement_list (body, stmt_list); finish_bc_block (stmt_list, bc_continue, clab); append_to_statement_list (incr, stmt_list); - append_to_statement_list (entry, stmt_list); - append_to_statement_list (exit, stmt_list); + if (exit !cond_is_first) +append_to_statement_list (exit, stmt_list); + + if (!stmt_list) +stmt_list = build_empty_stmt (start_locus); + + tree loop; + if (cond integer_zerop (cond)) +{ + if (cond_is_first) + loop = fold_build3_loc (start_locus, COND_EXPR, +void_type_node, cond, stmt_list, +build_empty_stmt (start_locus)); + else + loop = stmt_list; +} + else +loop = build1_loc (start_locus, LOOP_EXPR, void_type_node, stmt_list); + + stmt_list = NULL; + append_to_statement_list (loop, stmt_list); finish_bc_block (stmt_list, bc_break, blab); - - if (stmt_list == NULL_TREE) -stmt_list = build1 (NOP_EXPR, void_type_node, integer_zero_node); + if (!stmt_list) +stmt_list = build_empty_stmt (start_locus); *stmt_p = stmt_list; } @@ -303,6 +285,8 @@ genericize_for_stmt (tree *stmt_p, int *walk_subtrees, void *data) genericize_cp_loop (loop, EXPR_LOCATION (stmt), FOR_COND (stmt), FOR_BODY (stmt), FOR_EXPR (stmt), 1, walk_subtrees, data); append_to_statement_list (loop, expr); + if (expr == NULL_TREE) +expr = loop; *stmt_p = expr; }
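The difference between the two lowerings can be pictured at the source level. The sketch below is only an analogy in plain C, not GENERIC: both functions compute the same sum for a loop of the form for (i = 0; i < n; i++), the first in the old shape (jump to a condition placed at the bottom) and the second in the new shape (an unconditional loop whose exit test sits at the top). The function names are hypothetical.

```c
/* Old shape: the condition is canonicalized to the bottom of the loop,
   which needs an extra jump into the loop to reach it first.  */
int
sum_cond_at_bottom (int n)
{
  int s = 0, i = 0;

  goto entry;
top:
  s += i;               /* body */
  i++;                  /* increment */
entry:
  if (i < n)
    goto top;
  return s;
}

/* New shape: an endless loop (the LOOP_EXPR analogue) with the exit
   test as its first statement.  */
int
sum_loop_with_exit (int n)
{
  int s = 0, i = 0;

  for (;;)
    {
      if (i < n)
        ;               /* keep going */
      else
        break;          /* jump to the break label */
      s += i;           /* body */
      i++;              /* increment */
    }
  return s;
}
```

Whether the bottom-test form still matters for code quality is exactly the open question in the message; the sketch only shows that the two shapes are equivalent in behavior.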
Re: RFA (tree-inline): PATCH for more C++14 constexpr support
On 11/15/2014 11:59 PM, Jason Merrill wrote:
This handles some more C++14 testcases and has no regressions on C++11 constexpr testcases. Support for jumps will follow soon.

Like this, though it still needs a bit of cleaning up.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 53cfb18..611d4f2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -871,7 +871,7 @@ struct constexpr_ctx {
 static GTY (()) hash_table<constexpr_call_hasher> *constexpr_call_table;
 
 static tree cxx_eval_constant_expression (const constexpr_ctx *, tree,
-                                          bool, bool, bool *, bool *);
+                                          bool, bool, bool *, bool *, tree *);
 
 /* Compute a hash value for a constexpr call representation.  */
 
@@ -993,7 +993,8 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t,
     {
       args[i] = cxx_eval_constant_expression (ctx, CALL_EXPR_ARG (t, i),
                                               allow_non_constant, addr,
-                                              non_constant_p, overflow_p);
+                                              non_constant_p, overflow_p,
+                                              NULL);
       if (allow_non_constant && *non_constant_p)
         return t;
     }
@@ -1070,7 +1071,7 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t,
         }
       arg = cxx_eval_constant_expression (ctx, x, allow_non_constant,
                                           TREE_CODE (type) == REFERENCE_TYPE,
-                                          non_constant_p, overflow_p);
+                                          non_constant_p, overflow_p, NULL);
       /* Don't VERIFY_CONSTANT here.  */
       if (*non_constant_p && allow_non_constant)
         return;
@@ -1151,7 +1152,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
       /* Might be a constexpr function pointer.  */
       fun = cxx_eval_constant_expression (ctx, fun, allow_non_constant,
                                           /*addr*/false, non_constant_p,
-                                          overflow_p);
+                                          overflow_p, NULL);
       STRIP_NOPS (fun);
       if (TREE_CODE (fun) == ADDR_EXPR)
         fun = TREE_OPERAND (fun, 0);
@@ -1187,7 +1188,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
     {
       tree arg = convert_from_reference (get_nth_callarg (t, 1));
       return cxx_eval_constant_expression (ctx, arg, allow_non_constant,
-                                           addr, non_constant_p, overflow_p);
+                                           addr, non_constant_p,
+                                           overflow_p, NULL);
     }
   else if (TREE_CODE (t) == AGGR_INIT_EXPR
            && AGGR_INIT_ZERO_FIRST (t))
@@ -1270,7 +1272,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
           result = (cxx_eval_constant_expression
                     (new_ctx, new_call.fundef->body,
                      allow_non_constant, addr,
-                     non_constant_p, overflow_p));
+                     non_constant_p, overflow_p, NULL));
         }
       else
         {
@@ -1310,8 +1312,10 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
           else
             ctx->values->put (res, NULL_TREE);
 
+          tree jump_target = NULL_TREE;
           cxx_eval_constant_expression (ctx, body, allow_non_constant,
-                                        addr, non_constant_p, overflow_p);
+                                        addr, non_constant_p, overflow_p,
+                                        &jump_target);
 
           if (VOID_TYPE_P (TREE_TYPE (res)))
             /* This can be null for a subobject constructor call, in
@@ -1320,7 +1324,16 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
                to put such a call in the hash table.  */
             result = addr ? ctx->object : ctx->ctor;
           else
-            result = *ctx->values->get (slot ? slot : res);
+            {
+              result = *ctx->values->get (slot ? slot : res);
+              if (result == NULL_TREE)
+                {
+                  if (!allow_non_constant)
+                    error ("constexpr call flows off the end "
+                           "of the function");
+                  *non_constant_p = true;
+                }
+            }
 
           /* Remove the parms/result from the values map.  Is it worth
              bothering to do this when the map itself is only live for
@@ -1433,7 +1446,8 @@ cxx_eval_unary_expression (const constexpr_ctx *ctx, tree t,
   tree r;
   tree orig_arg = TREE_OPERAND (t, 0);
   tree arg = cxx_eval_constant_expression (ctx, orig_arg, allow_non_constant,
-                                           addr, non_constant_p, overflow_p);
+                                           addr, non_constant_p, overflow_p,
+                                           NULL);
   VERIFY_CONSTANT (arg);
   if (arg == orig_arg)
     return t;
@@ -1456,11 +1470,11 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
   tree lhs, rhs;
   lhs = cxx_eval_constant_expression (ctx, orig_lhs,
                                       allow_non_constant, addr,
-                                      non_constant_p, overflow_p);
+                                      non_constant_p, overflow_p, NULL);
   VERIFY_CONSTANT (lhs);
   rhs = cxx_eval_constant_expression (ctx, orig_rhs,
                                       allow_non_constant, addr,
-                                      non_constant_p, overflow_p);
+                                      non_constant_p, overflow_p, NULL);
   VERIFY_CONSTANT (rhs);
   if (lhs == orig_lhs && rhs == orig_rhs)
     return t;
@@ -1476,20 +1490,24 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
 static tree
 cxx_eval_conditional_expression (const constexpr_ctx *ctx, tree t,
                                  bool allow_non_constant, bool addr,
-                                 bool *non_constant_p, bool *overflow_p)
+                                 bool *non_constant_p, bool *overflow_p,
+                                 tree *jump_target)
 {
   tree val = cxx_eval_constant_expression (ctx, TREE_OPERAND (t,
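For readers following the jump_target change in the patch above, here is a minimal sketch of the general idea, not GCC code: an extra out-parameter lets a statement evaluator report that a return is pending, so the enclosing block stops evaluating, and the caller treats a call that never produced a value as non-constant ("flows off the end"). All names here (Stmt, EvalState, eval_stmt, eval_call, the make_* helpers) are hypothetical and chosen only for illustration; the real evaluator works on GCC trees.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical mini-AST; none of these names come from GCC.
enum class Kind { Assign, Return, Block };

struct Stmt {
  Kind kind;
  int value;                   // payload for Assign / Return
  std::vector<Stmt> children;  // used only by Block
};

Stmt make_assign(int v) { return Stmt{Kind::Assign, v, {}}; }
Stmt make_return(int v) { return Stmt{Kind::Return, v, {}}; }
Stmt make_block(std::vector<Stmt> body) {
  return Stmt{Kind::Block, 0, std::move(body)};
}

// What kind of jump, if any, is currently pending.
enum class Jump { None, Return };

struct EvalState {
  bool has_result;  // true once a Return statement has executed
  int result;
};

// Evaluate S. If a jump becomes pending it is recorded in *jump_target
// (when the caller passed one) so enclosing blocks can stop early.
void eval_stmt(const Stmt &s, EvalState &st, Jump *jump_target) {
  switch (s.kind) {
  case Kind::Assign:
    break;  // side effects are not modeled in this sketch
  case Kind::Return:
    st.has_result = true;
    st.result = s.value;
    if (jump_target)
      *jump_target = Jump::Return;
    break;
  case Kind::Block:
    for (const Stmt &child : s.children) {
      eval_stmt(child, st, jump_target);
      if (jump_target && *jump_target != Jump::None)
        break;  // a return is pending: skip the remaining statements
    }
    break;
  }
}

// Evaluate a function body. Returns false when control flows off the end
// without a value, which this sketch treats as "not a constant".
bool eval_call(const Stmt &body, int *out) {
  assert(out != nullptr);
  EvalState st{false, 0};
  Jump jump = Jump::None;
  eval_stmt(body, st, &jump);
  if (!st.has_result)
    return false;
  *out = st.result;
  return true;
}
```

Callers that evaluate plain expressions would pass a null jump_target, which mirrors the many call sites in the patch that simply gain a trailing NULL argument.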
Avoid applying inline plan for all functions ahead of late compilation
Hi, late in GCC 4.9 development we broke the feature that ltrans stages do not read in all functions ahead of time. This is because late IPA passes do not like to see functions without IPA transformations applied. I was originally OK with that solution, based on the fact that the only late IPA pass we have is IPA-PTA, which is disabled by default and should eventually probably become part of WPA in some form. SIMD streaming was however added, and this causes us to stream in all function bodies and apply all inlining decisions at the very beginning of the optimization queue. Fixed by this patch. get_body is now responsible for applying transformations on demand; late IPA passes need to call get_body on the functions they are interested in, and are advised not to be interested in every single function in the program. The patch also hits a bug in i386's ix86_set_current_function. It is responsible for initializing the backend, and it does so lazily, remembering the previous options the backend was initialized for. Pragma parsing however clears the cache, which leads to wrong settings being used for subsequent functions. Bootstrapped/regtested x86_64-linux, will commit it tomorrow after a bit more testing.
Index: gcc/cgraphclones.c
===================================================================
--- gcc/cgraphclones.c  (revision 217612)
+++ gcc/cgraphclones.c  (working copy)
@@ -307,7 +307,7 @@ duplicate_thunk_for_node (cgraph_node *t
   node = duplicate_thunk_for_node (thunk_of, node);
 
   if (!DECL_ARGUMENTS (thunk->decl))
-    thunk->get_body ();
+    thunk->get_untransformed_body ();
 
   cgraph_edge *cs;
   for (cs = node->callers; cs; cs = cs->next_caller)
@@ -1067,7 +1067,7 @@ symbol_table::materialize_all_clones (vo
               && !gimple_has_body_p (node->decl))
             {
               if (!node->clone_of->clone_of)
-                node->clone_of->get_body ();
+                node->clone_of->get_untransformed_body ();
               if (gimple_has_body_p (node->clone_of->decl))
                 {
                   if (symtab->dump_file)
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c       (revision 217612)
+++ gcc/ipa-icf.c       (working copy)
@@ -706,7 +706,7 @@ void
 sem_function::init (void)
 {
   if (in_lto_p)
-    get_node ()->get_body ();
+    get_node ()->get_untransformed_body ();
 
   tree fndecl = node->decl;
   function *func = DECL_STRUCT_FUNCTION (fndecl);
Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 217612)
+++ gcc/passes.c        (working copy)
@@ -2214,36 +2214,6 @@ execute_one_pass (opt_pass *pass)
      executed.  */
   invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
 
-  /* SIPLE IPA passes do not handle callgraphs with IPA transforms in it.
-     Apply all trnasforms first.  */
-  if (pass->type == SIMPLE_IPA_PASS)
-    {
-      struct cgraph_node *node;
-      bool applied = false;
-      FOR_EACH_DEFINED_FUNCTION (node)
-        if (node->analyzed
-            && node->has_gimple_body_p ()
-            && (!node->clone_of || node->decl != node->clone_of->decl))
-          {
-            if (!node->global.inlined_to
-                && node->ipa_transforms_to_apply.exists ())
-              {
-                node->get_body ();
-                push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-                execute_all_ipa_transforms ();
-                cgraph_edge::rebuild_edges ();
-                free_dominance_info (CDI_DOMINATORS);
-                free_dominance_info (CDI_POST_DOMINATORS);
-                pop_cfun ();
-                applied = true;
-              }
-          }
-      if (applied)
-        symtab->remove_unreachable_nodes (false, dump_file);
-      /* Restore current_pass.  */
-      current_pass = pass;
-    }
-
   if (!quiet_flag && !cfun)
     fprintf (stderr, " <%s>", pass->name ? pass->name : "");
 
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c    (revision 217612)
+++ gcc/cgraphunit.c    (working copy)
@@ -197,7 +197,6 @@ along with GCC; see the file COPYING3.
 #include "target.h"
 #include "diagnostic.h"
 #include "params.h"
-#include "fibheap.h"
 #include "intl.h"
 #include "hash-map.h"
 #include "plugin-api.h"
@@ -1469,7 +1468,7 @@ cgraph_node::expand_thunk (bool output_a
         }
 
       if (in_lto_p)
-        get_body ();
+        get_untransformed_body ();
       a = DECL_ARGUMENTS (thunk_fndecl);
 
       current_function_decl = thunk_fndecl;
@@ -1522,7 +1521,7 @@ cgraph_node::expand_thunk (bool output_a
       gimple ret;
 
       if (in_lto_p)
-        get_body ();
+        get_untransformed_body ();
       a = DECL_ARGUMENTS (thunk_fndecl);
 
       current_function_decl = thunk_fndecl;
@@ -1744,7 +1743,7 @@ cgraph_node::expand (void)
   announce_function (decl);
   process = 0;
   gcc_assert (lowered);
-  get_body ();
+  get_untransformed_body ();
 
   /* Generate RTL for the body of DECL.  */
 
Index:
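The split between get_body and get_untransformed_body described in the message above can be illustrated with a small sketch. This is not GCC's cgraph_node: FuncNode, its fields, the "raw" placeholder body, and the loads counter are all hypothetical, and only the two method names echo the patch. The point is that one accessor merely reads the body in, while the other also applies any queued transformations, lazily and only for the node that was asked for.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a call-graph node; not GCC's cgraph_node.
struct FuncNode {
  std::string body;    // empty until the body has been read in
  bool loaded = false;
  int loads = 0;       // how many times the body was actually read
  // Transformations decided earlier (e.g. an inline plan) but not yet applied.
  std::vector<std::function<void(std::string &)>> pending_transforms;

  // Read the body in if it is not there yet; leave transforms queued.
  // Returns true if this call did the read.
  bool get_untransformed_body() {
    if (loaded)
      return false;
    body = "raw";  // placeholder for streaming the body in from disk
    loaded = true;
    ++loads;
    return true;
  }

  // Read the body in if needed, then apply the queued transforms once.
  void get_body() {
    get_untransformed_body();
    for (auto &transform : pending_transforms)
      transform(body);
    pending_transforms.clear();  // a second call must not re-apply them
  }
};
```

A late pass in this model would call get_body only on the nodes it actually inspects, so untouched nodes are never read in or transformed up front.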
Re: Avoid applying inline plan for all functions ahead of late compilation
On November 16, 2014 8:15:37 AM CET, Jan Hubicka hubi...@ucw.cz wrote:
Hi, late in GCC 4.9 development we broke the feature that ltrans stages do not read in all functions ahead of time. This is because late IPA passes do not like to see functions without IPA transformations applied. I was originally OK with that solution, based on the fact that the only late IPA pass we have is IPA-PTA, which is disabled by default and should eventually probably become part of WPA in some form. SIMD streaming was however added, and this causes us to stream in all function bodies and apply all inlining decisions at the very beginning of the optimization queue. Fixed by this patch. get_body is now responsible for applying transformations on demand; late IPA passes need to call get_body on the functions they are interested in, and are advised not to be interested in every single function in the program. The patch also hits a bug in i386's ix86_set_current_function. It is responsible for initializing the backend, and it does so lazily, remembering the previous options the backend was initialized for. Pragma parsing however clears the cache, which leads to wrong settings being used for subsequent functions. Bootstrapped/regtested x86_64-linux, will commit it tomorrow after a bit more testing.

But for IPA PTA, for example, this means we apply all IPA transforms without any garbage collection run?

Richard.
Index: gcc/cgraphclones.c
===================================================================
--- gcc/cgraphclones.c  (revision 217612)
+++ gcc/cgraphclones.c  (working copy)
@@ -307,7 +307,7 @@ duplicate_thunk_for_node (cgraph_node *t
   node = duplicate_thunk_for_node (thunk_of, node);
 
   if (!DECL_ARGUMENTS (thunk->decl))
-    thunk->get_body ();
+    thunk->get_untransformed_body ();
 
   cgraph_edge *cs;
   for (cs = node->callers; cs; cs = cs->next_caller)
@@ -1067,7 +1067,7 @@ symbol_table::materialize_all_clones (vo
               && !gimple_has_body_p (node->decl))
             {
               if (!node->clone_of->clone_of)
-                node->clone_of->get_body ();
+                node->clone_of->get_untransformed_body ();
               if (gimple_has_body_p (node->clone_of->decl))
                 {
                   if (symtab->dump_file)
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c       (revision 217612)
+++ gcc/ipa-icf.c       (working copy)
@@ -706,7 +706,7 @@ void
 sem_function::init (void)
 {
   if (in_lto_p)
-    get_node ()->get_body ();
+    get_node ()->get_untransformed_body ();
 
   tree fndecl = node->decl;
   function *func = DECL_STRUCT_FUNCTION (fndecl);
Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 217612)
+++ gcc/passes.c        (working copy)
@@ -2214,36 +2214,6 @@ execute_one_pass (opt_pass *pass)
      executed.  */
   invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
 
-  /* SIPLE IPA passes do not handle callgraphs with IPA transforms in it.
-     Apply all trnasforms first.  */
-  if (pass->type == SIMPLE_IPA_PASS)
-    {
-      struct cgraph_node *node;
-      bool applied = false;
-      FOR_EACH_DEFINED_FUNCTION (node)
-        if (node->analyzed
-            && node->has_gimple_body_p ()
-            && (!node->clone_of || node->decl != node->clone_of->decl))
-          {
-            if (!node->global.inlined_to
-                && node->ipa_transforms_to_apply.exists ())
-              {
-                node->get_body ();
-                push_cfun (DECL_STRUCT_FUNCTION (node->decl));
-                execute_all_ipa_transforms ();
-                cgraph_edge::rebuild_edges ();
-                free_dominance_info (CDI_DOMINATORS);
-                free_dominance_info (CDI_POST_DOMINATORS);
-                pop_cfun ();
-                applied = true;
-              }
-          }
-      if (applied)
-        symtab->remove_unreachable_nodes (false, dump_file);
-      /* Restore current_pass.  */
-      current_pass = pass;
-    }
-
   if (!quiet_flag && !cfun)
     fprintf (stderr, " <%s>", pass->name ? pass->name : "");
 
Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c    (revision 217612)
+++ gcc/cgraphunit.c    (working copy)
@@ -197,7 +197,6 @@ along with GCC; see the file COPYING3.
 #include "target.h"
 #include "diagnostic.h"
 #include "params.h"
-#include "fibheap.h"
 #include "intl.h"
 #include "hash-map.h"
 #include "plugin-api.h"
@@ -1469,7 +1468,7 @@ cgraph_node::expand_thunk (bool output_a
         }
 
       if (in_lto_p)
-        get_body ();
+        get_untransformed_body ();
       a = DECL_ARGUMENTS (thunk_fndecl);
 
       current_function_decl = thunk_fndecl;
@@ -1522,7 +1521,7 @@ cgraph_node::expand_thunk (bool output_a
       gimple ret;
 
       if (in_lto_p)
-        get_body ();
+        get_untransformed_body ();
       a = DECL_ARGUMENTS (thunk_fndecl);
 
       current_function_decl = thunk_fndecl;
@@ -1744,7 +1743,7 @@ cgraph_node::expand (void)
   announce_function (decl);