Re: [PATCH 2/5] completely_scalarize arrays as well as records
On August 26, 2015 6:08:55 PM GMT+02:00, Alan Lawrence alan.lawre...@arm.com wrote: Richard Biener wrote: One extra question is does the way we limit total scalarization work well for arrays? I suppose we have either sth like the maximum size of an aggregate we scalarize or the maximum number of component accesses we create? Only the former and that would be kept intact. It is in fact visible in the context of the last hunk of the patch. OK. IIRC the gimplification code also has the latter and also considers zeroing the whole aggregate before initializing non-zero fields. IMHO it makes sense to reuse some of the analysis and classification routines it has. Do you mean gimplify_init_constructor? Yes, there's quite a lot of logic there ;). That feels like a separate patch - and belonging to the constant-handling subseries of this series Yes. - as gimplify_init_constructor already deals with both record and array types, and I don't see anything there that's specifically good for total-scalarization of arrays? IOW, do you mean that to block this patch, or can it be separate (I can address Martin + Jeff's comments fairly quickly and independently)? No, but I'd like this being explored with the init subseries. We don't want two places doing total scalarization of initializers, gimplification and SRA, with different/conflicting heuristics. IMHO the gimplification total scalarization happens too early. Richard. Cheers, Alan
Go patch committed: don't crash on invalid numeric type
This patch by Chris Manghane fixes a compiler crash on an invalid program when the compiler tries to set a numeric constant to an invalid type. This fixes https://golang.org/issue/11537. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227201) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d5e6af4e6dd456075a1ec1c03d0dc41cbea5eb36 +cd5362c7bb0b207f484a8dfb8db229fd2bffef09 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc (revision 227201) +++ gcc/go/gofrontend/expressions.cc (working copy) @@ -15150,7 +15150,11 @@ Numeric_constant::set_type(Type* type, b else if (type->complex_type() != NULL) ret = this->check_complex_type(type->complex_type(), issue_error, loc); else -go_unreachable(); +{ + ret = false; + if (issue_error) +go_assert(saw_errors()); +} if (ret) this->type_ = type; return ret;
Re: [libvtv] Fix formatting errors
On 08/26/2015 07:30 AM, Rainer Orth wrote: While looking at libvtv for the Solaris port, I noticed all sorts of GNU Coding Standard violations: * ChangeLog entries attributed to the committer instead of the author and with misformatted PR references, entries only giving a vague rationale instead of what changed * overlong lines * tons of whitespace errors (though I may be wrong in some cases: C++ code might have other rules) * code formatting that seems to have been done to be visually pleasing, completely different from what Emacs does * commented code fragments (#if 0 equivalent) * configure.tgt target list in no recognizable order * the Cygwin/MinGW port is done in the worst possible way: tons of target-specific ifdefs instead of feature-specific conditionals or an interface that can wrap both Cygwin and Linux variants of the code The following patch (as yet not even compiled) fixes some of the most glaring errors. The Solaris port will fix a few of the latter ones. Do you think this is the right direction or did I get something wrong? Thanks. Rainer 2015-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de Fix formatting errors. I'm more interested in the current state of vtv as I keep getting dragged into discussions about what we can/should be doing in the compiler world to close more security stuff. Vtables are an obvious candidate given we've got vtv. Jeff
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On 08/26/2015 05:13 AM, Ilya Enkovich wrote: 2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com: On 08/21/2015 04:49 AM, Ilya Enkovich wrote: I want work with bitmasks to be expressed in a natural way using regular integer operations. Currently all mask manipulations are emulated via vector statements (mostly using a bunch of vec_cond). For complex predicates it may be nontrivial to transform it back to scalar masks and get efficient code. Also the same vector may be used as both a mask and an integer vector. Things become more complex if you additionally have broadcasts and vector pack/unpack code. It also should be transformed into scalar mask manipulations somehow. Or why not model the conversion at the gimple level using a CONVERT_EXPR? In fact, the more I think about it, that seems to make more sense to me. We pick a canonical form for the mask, whatever it may be. We use that canonical form and model conversions between it and the other form via CONVERT_EXPR. We then let DOM/PRE find/eliminate the redundant conversions. If it's not up to the task, we should really look into why and resolve. Yes, that does mean we have two forms which I'm not terribly happy about and it means some target dependencies on what the masked vector operation looks like (ie, does it accept a simple integer or vector mask), but I'm starting to wonder if, as distasteful as I find it, it's the right thing to do. If we have some special representation for masks in GIMPLE then we might not need any conversions. We could ask a target to define a MODE for this type and use it directly everywhere: directly compare into it, use it directly for masked loads and stores, AND, IOR, EQ etc. If that type is reserved for mask usage then your previous suggestion to transform masks into target specific form at the GIMPLE-RTL phase should work fine. This would allow to support only a single mask representation in GIMPLE.
Possibly, but you mentioned that you may need to use the masks in both forms depending on the exact context. If so, then I think we need to model a conversion between the two forms. Jeff
Fwd: [libvtv] Fix formatting errors
-- Forwarded message -- From: Caroline Tice cmt...@google.com Date: Wed, Aug 26, 2015 at 12:50 PM Subject: Re: [libvtv] Fix formatting errors To: Jeff Law l...@redhat.com Cc: Rainer Orth r...@cebitec.uni-bielefeld.de, GCC Patches gcc-patches@gcc.gnu.org As far as I know vtv is working just fine...is there something I don't know about? -- Caroline cmt...@google.com
[Patch, fortran] F2008 - implement pointer function assignment
Dear All, The attached patch more or less implements the assignment of expressions to the result of a pointer function. To wit: my_ptr_fcn (arg1, arg2...) = expr arg1 would usually be the target, pointed to by the function. The patch parses these statements and resolves them into: temp_ptr => my_ptr_fcn (arg1, arg2...) temp_ptr = expr I say more or less implemented because I have ducked one of the headaches here. At the end of the specification block, there is an ambiguity between statement functions and pointer function assignments. I do not even try to resolve this ambiguity and require that there be at least one other type of executable statement before these beasts. This can undoubtedly be fixed but the effort seems to me to be unwarranted at the present time. I had a stupid amount of trouble with the test fmt_tab_1.f90. I have no idea why but the gfc_warning no longer showed the offending line, although the line number in the error message was OK. Changing to gfc_warning_now fixed the problem. Also, I can see no reason why this should be dg-run and so changed to dg-compile. Finally, I set -std=legacy to stop the generic error associated with tabs. Bootstraps and regtests on x86_64/FC21 - OK for trunk? Now back to trying to get my head round parameterized derived types! Cheers Paul 2015-08-26 Paul Thomas pa...@gcc.gnu.org * decl.c (get_proc_name): Return if statement function is found. * io.c (next_char_not_space): Change tab warning to warning now to prevent locus being lost. * match.c (gfc_match_ptr_fcn_assign): New function. * match.h : Add prototype for gfc_match_ptr_fcn_assign. * parse.c : Add static flag 'in_specification_block'. (decode_statement): If in specification block match a statement function, otherwise if standard embraces F2008 try to match a pointer function assignment. (parse_interface): Set 'in_specification_block' on exiting from parse_spec. (parse_spec): Set and then reset 'in_specification_block'.
(gfc_parse_file): Set 'in_specification_block'. * resolve.c (get_temp_from_expr): Extend to include functions and array constructors as rvalues. (resolve_ptr_fcn_assign): New function. (gfc_resolve_code): Call it on finding a pointer function as an lvalue. * symbol.c (gfc_add_procedure): Add a sentence to the error to flag up the ambiguity between a statement function and pointer function assignment at the end of the specification block. 2015-08-26 Paul Thomas pa...@gcc.gnu.org * gfortran.dg/fmt_tab_1.f90: Change from run to compile and set standard as legacy. * gfortran.dg/ptr_func_assign_1.f08: New test. Index: gcc/fortran/decl.c === *** gcc/fortran/decl.c (revision 227118) --- gcc/fortran/decl.c (working copy) *** get_proc_name (const char *name, gfc_sym *** 901,906 --- 901,908 return rc; sym = *result; + if (sym->attr.proc == PROC_ST_FUNCTION) + return rc; if (sym->attr.module_procedure && sym->attr.if_source == IFSRC_IFBODY) Index: gcc/fortran/io.c === *** gcc/fortran/io.c (revision 227118) --- gcc/fortran/io.c (working copy) *** next_char_not_space (bool *error) *** 200,206 if (c == '\t') { if (gfc_option.allow_std & GFC_STD_GNU) ! gfc_warning (0, "Extension: Tab character in format at %C"); else { gfc_error ("Extension: Tab character in format at %C"); --- 200,206 if (c == '\t') { if (gfc_option.allow_std & GFC_STD_GNU) ! gfc_warning_now (0, "Extension: Tab character in format at %C"); else { gfc_error ("Extension: Tab character in format at %C"); Index: gcc/fortran/match.c === *** gcc/fortran/match.c (revision 227118) --- gcc/fortran/match.c (working copy) *** match *** 4886,4892 gfc_match_st_function (void) { gfc_error_buffer old_error; - gfc_symbol *sym; gfc_expr *expr; match m; --- 4886,4891 *** gfc_match_st_function (void) *** 4926,4931 --- 4925,5000 return MATCH_YES; undo_error: + gfc_pop_error (old_error); + return MATCH_NO; + } + + + /* Match an assignment to a pointer function (F2008). This could, in +general be ambiguous with a statement function.
In this implementation +it remains so if it is the first statement after the specification +block. */ + + match + gfc_match_ptr_fcn_assign (void) + { + gfc_error_buffer old_error; + locus old_loc; + gfc_symbol *sym; + gfc_expr *expr; + match m; + char name[GFC_MAX_SYMBOL_LEN + 1]; + + old_loc = gfc_current_locus; + m = gfc_match_name (name); + if (m != MATCH_YES) + return m; + +
Re: [PATCH 2/5] completely_scalarize arrays as well as records
On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 03:42 PM, Martin Jambor wrote: Hi, On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote: This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. thanks for working on this. I see Jeff has already approved the patch, but I have two comments nevertheless. First, I would be much happier if you added a proper comment to scalarize_elem function which you forgot completely. The name is not very descriptive and it has quite few parameters too. Right. I mentioned that I missed the lack of function comments when looking at #3 and asked Alan to go back and fix them in #1 and #2. Second, this patch should also fix PR 67283. It would be great if you could verify that and add it to the changelog when committing if that is indeed the case. Excellent. Yes, definitely mention the BZ. One extra question is does the way we limit total scalarization work well for arrays? I suppose we have either sth like the maximum size of an aggregate we scalarize or the maximum number of component accesses we create? Thanks, Richard. jeff
Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
On Tue, Aug 25, 2015 at 9:50 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 05:06 AM, Alan Lawrence wrote: When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)&a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant propagation in the ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle constant-pool loads). I tried to work around this by making dom2's hashable_expr_equal_p less conservative, but found that on platforms without AArch64's vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)&a] equivalent to a[0], etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task than this patch series. I can't see how to write a testcase for this in C though as direct assignment to an array is not possible; such assignments occur only with constant pool data, which is dealt with in the next patch. It's a general issue that if there's > 1 common way to represent an expression, then DOM will often miss discovery of the CSE opportunity because of the way it hashes expressions. Ideally we'd be moving to a canonical form, but I also realize that in the case of memory references like this, that may not be feasible. It does make me wonder how many CSEs we're really missing due to the two ways to represent array accesses. Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, aarch64-none-linux-gnu. gcc/ChangeLog: * tree-sra.c (completely_scalarize): Move some code into: (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base is aligned array.
--- gcc/tree-sra.c | 110 - 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 08fa8dc..af35fcc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) } } +static bool +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) Function comment needed. I may have missed it in the earlier patches, but can you please make sure any new functions you created have comments in those as well. Such patches are pre-approved. With the added function comment, this patch is fine. Err ... you generally _cannot_ create ARRAY_REFs out of thin air because of correctness issues with data-ref and data dependence analysis. You can of course keep ARRAY_REFs if the original access was an ARRAY_REF. But I'm not convinced this is what the pass does. We've gone to great lengths removing all the code from gimplification and folding that tried to be clever in producing array refs from accesses to sth with an ARRAY_TYPE - this all eventually led to wrong-code issues later. So I'd rather _not_ have this patch. (as always I'm too slow responding and Jeff is too fast ;)) Thanks, Richard. jeff
Re: [libgfortran,patch] Remove never-used debugging code
OK. Just checking. Thanks for the code cleanup. Thanks for the review. Committed as rev. 227208. FX
Re: [PATCH 2/5] completely_scalarize arrays as well as records
Richard Biener wrote: One extra question is does the way we limit total scalarization work well for arrays? I suppose we have either sth like the maximum size of an aggregate we scalarize or the maximum number of component accesses we create? Only the former and that would be kept intact. It is in fact visible in the context of the last hunk of the patch. OK. IIRC the gimplification code also has the latter and also considers zeroing the whole aggregate before initializing non-zero fields. IMHO it makes sense to reuse some of the analysis and classification routines it has. Do you mean gimplify_init_constructor? Yes, there's quite a lot of logic there ;). That feels like a separate patch - and belonging to the constant-handling subseries of this series - as gimplify_init_constructor already deals with both record and array types, and I don't see anything there that's specifically good for total-scalarization of arrays? IOW, do you mean that to block this patch, or can it be separate (I can address Martin + Jeff's comments fairly quickly and independently) ? Cheers, Alan
Re: [Scalar masks 2/x] Use bool masks in if-conversion
2015-08-26 17:56 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com: Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that but you have to pretend you have to get code vectorized. Huh? Bitwise ops should be readily available. Right bitwise ops are available, but there is no comparison into a vector and no masked loads and stores using vector masks (when we speak about 512-bit vectors). Also according to vector ABI integer mask should be used for mask operand in case of masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions which require integer arg for a mask in case of 512-bit vector. How do you declare those? Something like this: #pragma omp declare simd inbranch int foo(int*); The 'inbranch' is the thing that matters? And all of foo is then implicitly predicated? That's right. And a vector version of foo gets a mask as an additional arg. Well, you are missing the case of bool b = a < b; int x = (int)b; This case seems to require no changes and just be transformed into vec_cond. Ok, the example was too simple but I meant that a bool has a non-conditional use. Right. In such cases I think it's reasonable to replace it with a select similar to what we now have but without the whole bool tree transformed. Ok, so I still believe we don't want two ways to express things on GIMPLE if possible. Yes, the vectorizer already creates only vector stmts that are supported by the hardware. So it's a matter of deciding on the GIMPLE representation for the mask.
I'd rather use vector<bool> (and the target assigning an integer mode to it) than an 'int' in GIMPLE statements. Because that makes the type constraints on GIMPLE very weak and exposes those 'ints' to all kind of optimization passes. Thus if we change the result type requirement of vector comparisons from signed integer vectors to bool vectors the vectorizer can still go for promoting that bool vector to a vector of ints via a VEC_COND_EXPR and the expander can special-case that if the target has a vector comparison producing a vector mask. So, can you give that vector<bool> some thought? Yes, I want to try it. But getting rid of bool patterns would mean support for all targets currently supporting vec_cond. Would it be OK to have vector<bool> mask co-exist with bool patterns for some time? Thus first step would be to require vector<bool> for MASK_LOAD and MASK_STORE and support it for i386 (the only user of MASK_LOAD and MASK_STORE). Note that to assign sth else than a vector mode to it needs adjustments in stor-layout.c. I'm pretty sure we don't want vector BImodes. I can directly build a vector type with specified mode to avoid it. Smth. like: mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size); mask_type = make_vector_type (bool_type_node, nunits, mask_mode); Thanks, Ilya Richard.
Re: [PATCH] 2015-07-31 Benedikt Huber benedikt.hu...@theobroma-systems.com Philipp Tomsich philipp.toms...@theobroma-systems.com
ping [PATCH v4][aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02698.html On 31 Jul 2015, at 19:05, Benedikt Huber benedikt.hu...@theobroma-systems.com wrote: * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf. * config/aarch64/aarch64-opts.h: -mrecip has a default value depending on the core. * config/aarch64/aarch64-protos.h: Declare. * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and frsqrts. * config/aarch64/aarch64-tuning-flags.def: Added MRECIP_DEFAULT_ENABLED. * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code in fast math mode. * config/aarch64/aarch64.md: Added enum entries. * config/aarch64/aarch64.opt: Added options -mrecip and -mlow-precision-recip-sqrt. * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans for frsqrte and frsqrts * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt. Signed-off-by: Philipp Tomsich philipp.toms...@theobroma-systems.com --- gcc/ChangeLog | 21 gcc/config/aarch64/aarch64-builtins.c | 104 gcc/config/aarch64/aarch64-opts.h | 7 ++ gcc/config/aarch64/aarch64-protos.h| 2 + gcc/config/aarch64/aarch64-simd.md | 27 ++ gcc/config/aarch64/aarch64-tuning-flags.def| 1 + gcc/config/aarch64/aarch64.c | 106 +++- gcc/config/aarch64/aarch64.md | 3 + gcc/config/aarch64/aarch64.opt | 8 ++ gcc/doc/invoke.texi| 19 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c | 63 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 107 + 12 files changed, 463 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3432adb..3bf3098 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,24 @@ +2015-07-31 Benedikt Huber benedikt.hu...@theobroma-systems.com + Philipp Tomsich philipp.toms...@theobroma-systems.com + + * config/aarch64/aarch64-builtins.c: Builtins 
for rsqrt and + rsqrtf. + * config/aarch64/aarch64-opts.h: -mrecip has a default value + depending on the core. + * config/aarch64/aarch64-protos.h: Declare. + * config/aarch64/aarch64-simd.md: Matching expressions for + frsqrte and frsqrts. + * config/aarch64/aarch64-tuning-flags.def: Added + MRECIP_DEFAULT_ENABLED. + * config/aarch64/aarch64.c: New functions. Emit rsqrt + estimation code in fast math mode. + * config/aarch64/aarch64.md: Added enum entries. + * config/aarch64/aarch64.opt: Added options -mrecip and + -mlow-precision-recip-sqrt. + * testsuite/gcc.target/aarch64/rsqrt-asm-check.c: Assembly scans + for frsqrte and frsqrts + * testsuite/gcc.target/aarch64/rsqrt.c: Functional tests for rsqrt. + 2015-07-08 Jiong Wang jiong.w...@arm.com * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index b6c89b9..b4f443c 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -335,6 +335,11 @@ enum aarch64_builtins AARCH64_BUILTIN_GET_FPSR, AARCH64_BUILTIN_SET_FPSR, + AARCH64_BUILTIN_RSQRT_DF, + AARCH64_BUILTIN_RSQRT_SF, + AARCH64_BUILTIN_RSQRT_V2DF, + AARCH64_BUILTIN_RSQRT_V2SF, + AARCH64_BUILTIN_RSQRT_V4SF, AARCH64_SIMD_BUILTIN_BASE, AARCH64_SIMD_BUILTIN_LANE_CHECK, #include "aarch64-simd-builtins.def" @@ -824,6 +829,42 @@ aarch64_init_crc32_builtins () } void +aarch64_add_builtin_rsqrt (void) +{ + tree fndecl = NULL; + tree ftype = NULL; + + tree V2SF_type_node = build_vector_type (float_type_node, 2); + tree V2DF_type_node = build_vector_type (double_type_node, 2); + tree V4SF_type_node = build_vector_type (float_type_node, 4); + + ftype = build_function_type_list (double_type_node, double_type_node, NULL_TREE); + fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df", +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl; + + ftype =
build_function_type_list (float_type_node, float_type_node, NULL_TREE); + fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_sf", +ftype, AARCH64_BUILTIN_RSQRT_SF, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_SF] = fndecl; + + ftype = build_function_type_list (V2DF_type_node, V2DF_type_node, NULL_TREE); + fndecl
Re: [PATCH] [AVX512F] Add scatter support for vectorizer
On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener richard.guent...@gmail.com wrote: @@ -3763,32 +3776,46 @@ again: if (vf > *min_vf) *min_vf = vf; - if (gather) + if (gatherscatter != SG_NONE) { tree off; + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, true) != 0) + gatherscatter = GATHER; + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, false) + != 0) + gatherscatter = SCATTER; + else + gatherscatter = SG_NONE; as I said vect_check_gather_scatter already knows whether the DR is a read or a write and thus whether it needs to check for gather or scatter. Remove the new argument. And simply do if (!vect_check_gather_scatter (stmt)) gatherscatter = SG_NONE; - STMT_VINFO_GATHER_P (stmt_info) = true; + if (gatherscatter == GATHER) + STMT_VINFO_GATHER_P (stmt_info) = true; + else + STMT_VINFO_SCATTER_P (stmt_info) = true; } and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P using the enum so you can simply do STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter; Otherwise the patch looks ok to me. Fixed. Uros, could you please have a look at target part of patch? Thanks, Petr 2015-08-26 Andrey Turetskiy andrey.turets...@intel.com Petr Murzin petr.mur...@intel.com gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New.
(TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it for loads/stores in case of gather/scatter accordingly. (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P. (vect_check_gather_scatter): Use it instead of vect_check_gather. (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable and new checkings for it accordingly. * tree-vect-stmts.c (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather_scatter): Use it instead of vect_check_gather. (vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P. gcc/testsuite/ * gcc.target/i386/avx512f-scatter-1.c: New. * gcc.target/i386/avx512f-scatter-2.c: Ditto. * gcc.target/i386/avx512f-scatter-3.c: Ditto.
Re: [PATCH] [AVX512F] Add scatter support for vectorizer
On Wed, Aug 26, 2015 at 7:39 PM, Petr Murzin petrmurz...@gmail.com wrote: On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener richard.guent...@gmail.com wrote: @@ -3763,32 +3776,46 @@ again: if (vf > *min_vf) *min_vf = vf; - if (gather) + if (gatherscatter != SG_NONE) { tree off; + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, true) != 0) + gatherscatter = GATHER; + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, false) + != 0) + gatherscatter = SCATTER; + else + gatherscatter = SG_NONE; as I said vect_check_gather_scatter already knows whether the DR is a read or a write and thus whether it needs to check for gather or scatter. Remove the new argument. And simply do if (!vect_check_gather_scatter (stmt)) gatherscatter = SG_NONE; - STMT_VINFO_GATHER_P (stmt_info) = true; + if (gatherscatter == GATHER) + STMT_VINFO_GATHER_P (stmt_info) = true; + else + STMT_VINFO_SCATTER_P (stmt_info) = true; } and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P using the enum so you can simply do STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter; Otherwise the patch looks ok to me. Fixed. Uros, could you please have a look at target part of patch? 2015-08-26 Andrey Turetskiy andrey.turets...@intel.com Petr Murzin petr.mur...@intel.com gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI.
(ix86_vectorize_builtin_scatter): New. (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it for loads/stores in case of gather/scatter accordingly. (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P. (vect_check_gather_scatter): Use it instead of vect_check_gather. (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable and new checkings for it accordingly. * tree-vect-stmts.c (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather_scatter): Use it instead of vect_check_gather. (vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P. gcc/testsuite/ * gcc.target/i386/avx512f-scatter-1.c: New. * gcc.target/i386/avx512f-scatter-2.c: Ditto. * gcc.target/i386/avx512f-scatter-3.c: Ditto. x86 target part and testsuite are OK with the following change to the testcases: +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O3 -mavx512f -DAVX512F" } */ + +#include "avx512f-check.h" + +#define N 1024 We don't want -D in the options, please move these to the source: /* { dg-do run } */ /* { dg-require-effective-target avx512f } */ /* { dg-options "-O3 -mavx512f" } */ #define AVX512F #include "avx512f-check.h" #define N 1024 Thanks, Uros.
[PATCH 0/2] Final cleanup in move to ISL
Hi, Richi suggested at the Cauldron that it would be good to have graphite more automatic and with fewer flags. The first patch removes the -funroll-and-jam pass that does not seem very stable or useful for now. The second patch removes the other -floop-* flags that were part of the old graphite's middle-end (these were the first transforms implemented on the polyhedral representation (matrices, etc.) when we had no ISL scheduler.) The transition to ISL that removed GCC's dependence on PPL and Cloog has not removed all graphite's middle-end for loop transforms. We now can remove that code as it is replaced by ISL's scheduler. The patches pass make check and bootstrap (in progress) with -fgraphite-identity. Ok to commit? Thanks, Sebastian Sebastian Pop (2): remove -floop-unroll-and-jam remove -floop-* flags gcc/Makefile.in|2 - gcc/common.opt | 20 +- gcc/doc/invoke.texi| 108 +- gcc/graphite-blocking.c| 270 - gcc/graphite-interchange.c | 656 gcc/graphite-isl-ast-to-gimple.c | 102 +- gcc/graphite-optimize-isl.c| 193 +--- gcc/graphite-poly.c| 492 + gcc/graphite-poly.h| 1085 gcc/graphite-sese-to-poly.c| 22 +- gcc/graphite.c | 13 +- gcc/params.def | 15 - gcc/testsuite/g++.dg/graphite/graphite.exp | 10 +- gcc/testsuite/gcc.dg/graphite/block-0.c|2 +- gcc/testsuite/gcc.dg/graphite/block-1.c|2 +- gcc/testsuite/gcc.dg/graphite/block-3.c|4 +- gcc/testsuite/gcc.dg/graphite/block-4.c|4 +- gcc/testsuite/gcc.dg/graphite/block-5.c|2 +- gcc/testsuite/gcc.dg/graphite/block-6.c|2 +- gcc/testsuite/gcc.dg/graphite/block-7.c|2 +- gcc/testsuite/gcc.dg/graphite/block-8.c|2 +- gcc/testsuite/gcc.dg/graphite/block-pr47654.c |2 +- gcc/testsuite/gcc.dg/graphite/graphite.exp | 14 +- gcc/testsuite/gcc.dg/graphite/interchange-0.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-1.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +- 
gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +- gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +- gcc/testsuite/gcc.dg/graphite/interchange-3.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-4.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-5.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-6.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-7.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-8.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-9.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +- gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +- gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c |2 +- .../gcc.dg/graphite/uns-interchange-mvt.c |4 +- gcc/testsuite/gfortran.dg/graphite/graphite.exp| 10 +- gcc/toplev.c |3 +- 48 files changed, 123 insertions(+), 2973 deletions(-) delete mode 100644 gcc/graphite-blocking.c delete mode 100644 gcc/graphite-interchange.c -- 2.1.0.243.g30d45f7
[PATCH 2/2] remove -floop-* flags
--- gcc/Makefile.in|2 - gcc/common.opt | 16 +- gcc/doc/invoke.texi| 108 +- gcc/graphite-blocking.c| 270 - gcc/graphite-interchange.c | 656 gcc/graphite-optimize-isl.c| 14 +- gcc/graphite-poly.c| 489 + gcc/graphite-poly.h| 1082 gcc/graphite-sese-to-poly.c| 22 +- gcc/graphite.c | 10 +- gcc/testsuite/g++.dg/graphite/graphite.exp | 10 +- gcc/testsuite/gcc.dg/graphite/block-0.c|2 +- gcc/testsuite/gcc.dg/graphite/block-1.c|2 +- gcc/testsuite/gcc.dg/graphite/block-3.c|4 +- gcc/testsuite/gcc.dg/graphite/block-4.c|4 +- gcc/testsuite/gcc.dg/graphite/block-5.c|2 +- gcc/testsuite/gcc.dg/graphite/block-6.c|2 +- gcc/testsuite/gcc.dg/graphite/block-7.c|2 +- gcc/testsuite/gcc.dg/graphite/block-8.c|2 +- gcc/testsuite/gcc.dg/graphite/block-pr47654.c |2 +- gcc/testsuite/gcc.dg/graphite/graphite.exp | 14 +- gcc/testsuite/gcc.dg/graphite/interchange-0.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-1.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-10.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-11.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-12.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-13.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-14.c |3 +- gcc/testsuite/gcc.dg/graphite/interchange-15.c |4 +- gcc/testsuite/gcc.dg/graphite/interchange-3.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-4.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-5.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-6.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-7.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-8.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-9.c |2 +- gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|4 +- gcc/testsuite/gcc.dg/graphite/pr37485.c|5 +- gcc/testsuite/gcc.dg/graphite/uns-block-1.c|2 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |2 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |3 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |4 +- gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c |2 +- 
.../gcc.dg/graphite/uns-interchange-mvt.c |4 +- gcc/testsuite/gfortran.dg/graphite/graphite.exp| 10 +- 45 files changed, 98 insertions(+), 2686 deletions(-) delete mode 100644 gcc/graphite-blocking.c delete mode 100644 gcc/graphite-interchange.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index e298ecc..3d1c1e5 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1277,10 +1277,8 @@ OBJS = \ graph.o \ graphds.o \ graphite.o \ - graphite-blocking.o \ graphite-isl-ast-to-gimple.o \ graphite-dependences.o \ - graphite-interchange.o \ graphite-optimize-isl.o \ graphite-poly.o \ graphite-scop-detection.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 0964ae4..94d1d88 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1341,16 +1341,16 @@ Common Report Var(flag_loop_parallelize_all) Optimization Mark all loops as parallel floop-strip-mine -Common Report Var(flag_loop_strip_mine) Optimization -Enable Loop Strip Mining transformation +Common Alias(floop-nest-optimize) +Enable loop nest transforms. Same as -floop-nest-optimize floop-interchange -Common Report Var(flag_loop_interchange) Optimization -Enable Loop Interchange transformation +Common Alias(floop-nest-optimize) +Enable loop nest transforms. Same as -floop-nest-optimize floop-block -Common Report Var(flag_loop_block) Optimization -Enable Loop Blocking transformation +Common Alias(floop-nest-optimize) +Enable loop nest transforms. Same as -floop-nest-optimize floop-unroll-and-jam Common Alias(floop-nest-optimize) @@ -2315,8 +2315,8 @@ Common Report Var(flag_tree_loop_im) Init(1) Optimization Enable loop invariant motion on trees ftree-loop-linear -Common Alias(floop-interchange) -Enable loop interchange transforms. Same as -floop-interchange +Common Alias(floop-nest-optimize) +Enable loop nest transforms. 
Same as -floop-nest-optimize ftree-loop-ivcanon Common Report Var(flag_tree_loop_ivcanon) Init(1) Optimization diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index c33cc27..8710ff8 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8733,102 +8733,19 @@ Perform loop optimizations on trees. This flag is enabled by default at @option{-O} and
[PATCH 1/2] remove -floop-unroll-and-jam
--- gcc/common.opt | 4 +- gcc/doc/invoke.texi | 8 +- gcc/graphite-isl-ast-to-gimple.c | 102 +- gcc/graphite-optimize-isl.c | 179 --- gcc/graphite-poly.c | 3 +- gcc/graphite-poly.h | 3 - gcc/graphite.c | 3 +- gcc/params.def | 15 gcc/toplev.c | 3 +- 9 files changed, 29 insertions(+), 291 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 4dcd518..0964ae4 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1353,8 +1353,8 @@ Common Report Var(flag_loop_block) Optimization Enable Loop Blocking transformation floop-unroll-and-jam -Common Report Var(flag_loop_unroll_jam) Optimization -Enable Loop Unroll Jam transformation +Common Alias(floop-nest-optimize) +Enable loop nest transforms. Same as -floop-nest-optimize fgnu-tm Common Report Var(flag_tm) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 27be317..c33cc27 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8848,10 +8848,10 @@ is experimental. @item -floop-unroll-and-jam @opindex floop-unroll-and-jam -Enable unroll and jam for the ISL based loop nest optimizer. The unroll -factor can be changed using the @option{loop-unroll-jam-size} parameter. -The unrolled dimension (counting from the most inner one) can be changed -using the @option{loop-unroll-jam-depth} parameter. . +Perform loop nest transformations. Same as +@option{-floop-nest-optimize}. To use this code transformation, GCC has +to be configured with @option{--with-isl} to enable the Graphite loop +transformation infrastructure. @item -floop-parallelize-all @opindex floop-parallelize-all diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index dfb012f..5434bfd 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -968,92 +968,6 @@ extend_schedule (__isl_take isl_map *schedule, int nb_schedule_dims) return schedule; } -/* Set the separation_class option for unroll and jam. 
*/ - -static __isl_give isl_union_map * -generate_luj_sepclass_opt (scop_p scop, __isl_take isl_union_set *domain, - int dim, int cl) -{ - isl_map *map; - isl_space *space, *space_sep; - isl_ctx *ctx; - isl_union_map *mapu; - int nsched = get_max_schedule_dimensions (scop); - - ctx = scop->ctx; - space_sep = isl_space_alloc (ctx, 0, 1, 1); - space_sep = isl_space_wrap (space_sep); - space_sep = isl_space_set_tuple_name (space_sep, isl_dim_set, - "separation_class"); - space = isl_set_get_space (scop->context); - space_sep = isl_space_align_params (space_sep, isl_space_copy(space)); - space = isl_space_map_from_domain_and_range (space, space_sep); - space = isl_space_add_dims (space,isl_dim_in, nsched); - map = isl_map_universe (space); - isl_map_fix_si (map,isl_dim_out,0,dim); - isl_map_fix_si (map,isl_dim_out,1,cl); - - mapu = isl_union_map_intersect_domain (isl_union_map_from_map (map), -domain); - return (mapu); -} - -/* Compute the separation class for loop unroll and jam. */ - -static __isl_give isl_union_set * -generate_luj_sepclass (scop_p scop) -{ - int i; - poly_bb_p pbb; - isl_union_set *domain_isl; - - domain_isl = isl_union_set_empty (isl_set_get_space (scop->context)); - - FOR_EACH_VEC_ELT (SCOP_BBS (scop), i, pbb) -{ - isl_set *bb_domain; - isl_set *bb_domain_s; - - if (pbb->map_sepclass == NULL) - continue; - - if (isl_set_is_empty (pbb->domain)) - continue; - - bb_domain = isl_set_copy (pbb->domain); - bb_domain_s = isl_set_apply (bb_domain, pbb->map_sepclass); - pbb->map_sepclass = NULL; - - domain_isl = - isl_union_set_union (domain_isl, isl_union_set_from_set (bb_domain_s)); -} - - return domain_isl; -} - -/* Set the AST built options for loop unroll and jam. 
*/ - -static __isl_give isl_union_map * -generate_luj_options (scop_p scop) -{ - isl_union_set *domain_isl; - isl_union_map *options_isl_ss; - isl_union_map *options_isl = -isl_union_map_empty (isl_set_get_space (scop->context)); - int dim = get_max_schedule_dimensions (scop) - 1; - int dim1 = dim - PARAM_VALUE (PARAM_LOOP_UNROLL_JAM_DEPTH); - - if (!flag_loop_unroll_jam) -return options_isl; - - domain_isl = generate_luj_sepclass (scop); - - options_isl_ss = generate_luj_sepclass_opt (scop, domain_isl, dim1, 0); - options_isl = isl_union_map_union (options_isl, options_isl_ss); - - return options_isl; -} - /* Generates a schedule, which specifies an order used to visit elements in a domain. */ @@ -1102,13 +1016,11 @@ ast_build_before_for (__isl_keep isl_ast_build *build, void *user) } /* Set the separate option for all dimensions. - This helps to reduce control overhead. - Set the options for
Re: [libvtv] Fix formatting errors
On 08/26/2015 01:50 PM, Caroline Tice wrote: As far as I know vtv is working just fine...is there something I don't know about? I'm not aware of anything that isn't working, but I'm also not aware of vtv in widespread use, typical performance hit experienced, etc. jeff
Re: [PATCH] [AVX512F] Add scatter support for vectorizer
On Fri, Aug 21, 2015 at 2:18 PM, Petr Murzin petrmurz...@gmail.com wrote: Hello, Please have a look at the updated patch. On Tue, Aug 4, 2015 at 3:15 PM, Richard Biener rguent...@suse.de wrote: On Fri, 31 Jul 2015, Petr Murzin wrote: @@ -5586,8 +5770,6 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, prev_stmt_info = NULL; for (j = 0; j < ncopies; j++) { - gimple new_stmt; - if (j == 0) { if (slp) spurious change? I have increased the scope of this variable to use it in checking for STMT_VINFO_SCATTER_P (stmt_info). @@ -3763,32 +3776,46 @@ again: if (vf > *min_vf) *min_vf = vf; - if (gather) + if (gatherscatter != SG_NONE) { tree off; + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, true) != 0) + gatherscatter = GATHER; + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, NULL, false) + != 0) + gatherscatter = SCATTER; + else + gatherscatter = SG_NONE; as I said vect_check_gather_scatter already knows whether the DR is a read or a write and thus whether it needs to check for gather or scatter. Remove the new argument. And simply do if (!vect_check_gather_scatter (stmt)) gatherscatter = SG_NONE; - STMT_VINFO_GATHER_P (stmt_info) = true; + if (gatherscatter == GATHER) + STMT_VINFO_GATHER_P (stmt_info) = true; + else + STMT_VINFO_SCATTER_P (stmt_info) = true; } and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P using the enum so you can simply do STMT_VINFO_SCATTER_GATHER_P (stmt_info) = gatherscatter; I miss a few testcases that exercise scatter vectorization. And as Uros said, the i386 specific parts should be split out. Otherwise the patch looks ok to me. Thanks, Richard. Thanks, Petr 2015-08-21 Andrey Turetskiy andrey.turets...@intel.com Petr Murzin petr.mur...@intel.com gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. 
* config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New. (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new checks for STMT_VINFO_SCATTER_P. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this and enhance number of arguments. (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable and new checks for it accordingly. * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it for loads/stores in case of gather/scatter accordingly. (STMT_VINFO_SCATTER_P(S)): Define. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-stmts.c (vectorizable_mask_load_store): Ditto. (vectorizable_store): Add checks for STMT_VINFO_SCATTER_P. (vect_mark_stmts_to_be_vectorized): Ditto.
[PATCH] Remove reference to undefined documentation node.
This patch removes a menu entry that points to an undefined node in the documentation. The faulty entry has been introduced with git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96. It looks like the entry is a remnant of an earlier version of the documentation introduced with that change. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog * doc/extend.texi: Remove reference to undefined node. From 55b9c29f73d8da1881ce5a3f65d0c7f40623e161 Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Wed, 26 Aug 2015 10:59:29 +0100 Subject: [PATCH] Remove reference to undefined documentation node. --- gcc/doc/extend.texi | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 018b5d8..f5f90e6 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7245,7 +7245,6 @@ for a C symbol, or to place a C variable in a specific register. @menu * Basic Asm:: Inline assembler without operands. * Extended Asm:: Inline assembler with operands. -* Constraints::Constraints for @code{asm} operands * Asm Labels:: Specifying the assembler name to use for a C symbol. * Explicit Reg Vars:: Defining variables residing in specified registers. * Size of an asm:: How GCC calculates the size of an @code{asm} block. -- 2.3.0
Re: [RFC 5/5] Always completely replace constant pool entries
On Tue, Aug 25, 2015 at 9:54 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 05:06 AM, Alan Lawrence wrote: I used this as a means of better-testing the previous changes, as it exercises the constant replacement code a whole lot more. Indeed, quite a few tests are now optimized away to nothing on AArch64... Always pulling in constants is almost certainly not what we want, but we may nonetheless want something more aggressive than the usual --param, e.g. for the ssa-dom-cse-2.c test. Thoughts welcomed? I'm of the opinion that we have too many knobs already. So I'd perhaps ask whether or not this option is likely to be useful to end users? As for the patch itself, any thoughts on reasonable heuristics for when to pull in the constants? Clearly we don't want the patch as-is, but are there cases we can identify when we want to be more aggressive? Well - I still think that we need to enhance those followup passes to directly handle the constant pool entry. Expanding the assignment piecewise for arbitrarily large initializers is certainly a no-go. IIRC I enhanced FRE to do this at some point. For DOM it's much harder due to the way it is structured and I'd like to keep DOM simple. Note that we still want SRA to partly scalarize the initializer if only a few elements remain accessed (so we can optimize the initializer away). Of course that requires catching most followup optimization opportunities before the 2nd SRA run. Richard. jeff
Re: [PATCH][ARM]Tighten the conditions for arm_movw, arm_movt
I have tested that arm-none-linux-gnueabi bootstraps okay on trunk. JFTR, this is ok to backport to gcc-5 in case there are no regressions. regards Ramana Thanks, Kyrill
Re: [RFC 4/5] Handle constant-pool entries
On Tue, Aug 25, 2015 at 10:13 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 05:06 AM, Alan Lawrence wrote: This makes SRA replace loads of records/arrays from constant pool entries, with elementwise assignments of the constant values, hence, overcoming the fundamental problem in PR/63679. As a first pass, the approach I took was to look for constant-pool loads as we scanned through other accesses, and add them as candidates there; to build a constant replacement_decl for any such accesses in completely_scalarize; and to use any existing replacement_decl rather than creating a variable in create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but that does not allow addresses of labels, which can still end up in the constant pool.) Feedback as to the approach or how it might be better structured / fitted into SRA, is solicited ;). Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised! Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh. gcc/ChangeLog: * tree-sra.c (create_access): Scan for uses of constant pool and add to candidates. (subst_initial): New. (scalarize_elem): Build replacement_decl using subst_initial. (create_access_replacement): Use replacement_decl if set. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param sra-max-scalarization-size-Ospeed. 
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +--- gcc/tree-sra.c| 56 +-- 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index af35fcc..a3ff2df 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write) else ptr = false; + /* FORNOW: scan for uses of constant pool as we go along. */ I'm not sure why you have this marked as FORNOW. If I'm reading all this code correctly, you're lazily adding items from the constant pool into the candidates table when you find they're used. That seems better than walking the entire constant pool adding them all to the candidates. I don't see this as fundamentally wrong or unclean. The question I have is why this differs from the effects of patch #5. That would seem to indicate that there's things we're not getting into the candidate tables with this approach?!? @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) } } +static tree +subst_initial (tree expr, tree var) Function comment. I think this patch is fine with the function comment added and removing the FORNOW part of the comment in create_access. It may be worth noting in create_access's comment that it can add new items to the candidates tables for constant pool entries. I'm happy seeing this code in SRA as I never liked that we already decide at gimplification time which initializers to expand and which to init from a constant pool entry. So ... can we now remove gimplify_init_constructor by _always_ emitting a constant pool entry and an assignment from it (obviously only if the constructor can be put into the constant pool)? Defering the expansion decision to SRA makes it possible to better estimate whether the code is hot/cold or whether the initialized variable can be replaced by the constant pool entry completely (variable ends up readonly). Oh, and we'd no longer create the awful split code at -O0 ... 
So can you explore that a bit once this series is settled? This is probably also related to 5/5 as this makes all the target dependent decisions in SRA now and thus the initial IL from gimplification should be the same for all targets (that's always a nice thing to have IMHO). Thanks, Richard. Jeff
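The transformation being discussed in this thread can be illustrated with a small stand-alone C example (the function and values below are made up for illustration; they are not taken from the patch or from ssa-dom-cse-2.c):

```c
/* Illustration of SRA handling constant-pool entries: "a" is typically
   initialized from a constant pool entry (a block copy like a = *.LC0;
   at the GIMPLE level).  Once SRA replaces that load with elementwise
   constant assignments, follow-up passes (FRE/DOM) can constant-propagate
   through the loop and fold the whole function to a constant.  */
int
sum_of_table (void)
{
  const int a[4] = { 1, 2, 3, 4 };
  int s = 0;
  for (int i = 0; i < 4; i++)
    s += a[i];
  return s;
}
```

With the series applied, the intent is that such a function optimizes down to returning a constant, and the constant pool entry itself can then be dropped as unused.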
Re: [PING^2][PATCH, PR46193] Handle mix/max pointer reductions in parloops
On Mon, Aug 24, 2015 at 5:10 PM, Tom de Vries tom_devr...@mentor.com wrote: On 22-07-15 20:15, Tom de Vries wrote: On 13/07/15 13:02, Tom de Vries wrote: Hi, this patch fixes PR46193. It handles min and max reductions of pointer type in parloops. Bootstrapped and reg-tested on x86_64. OK for trunk? Ping^2. Original submission at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01018.html . Please don't use lower_bound_in_type with two identical types. Instead use wi::max_value and wide_int_to_tree. Ok with that change. Thanks, Richard. Thanks, - Tom 0001-Handle-mix-max-pointer-reductions-in-parloops.patch Handle mix/max pointer reductions in parloops 2015-07-13 Tom de Vries t...@codesourcery.com PR tree-optimization/46193 * omp-low.c (omp_reduction_init): Handle pointer type for min or max clause. * gcc.dg/autopar/pr46193.c: New test. * testsuite/libgomp.c/pr46193.c: New test. --- gcc/omp-low.c | 4 ++ gcc/testsuite/gcc.dg/autopar/pr46193.c | 38 +++ libgomp/testsuite/libgomp.c/pr46193.c | 67 ++ 3 files changed, 109 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46193.c create mode 100644 libgomp/testsuite/libgomp.c/pr46193.c diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 2e2070a..20d0010 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -3423,6 +3423,8 @@ omp_reduction_init (tree clause, tree type) real_maxval (&min, 1, TYPE_MODE (type)); return build_real (type, min); } + else if (POINTER_TYPE_P (type)) +return lower_bound_in_type (type, type); else { gcc_assert (INTEGRAL_TYPE_P (type)); @@ -3439,6 +3441,8 @@ omp_reduction_init (tree clause, tree type) real_maxval (&max, 0, TYPE_MODE (type)); return build_real (type, max); } + else if (POINTER_TYPE_P (type)) +return upper_bound_in_type (type, type); else { gcc_assert (INTEGRAL_TYPE_P (type)); diff --git a/gcc/testsuite/gcc.dg/autopar/pr46193.c b/gcc/testsuite/gcc.dg/autopar/pr46193.c new file mode 100644 index 000..544a5da --- /dev/null +++ b/gcc/testsuite/gcc.dg/autopar/pr46193.c @@ -0,0 +1,38 @@ +/* 
{ dg-do compile } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" } */ + +extern void abort (void); + +char * +foo (int count, char **list) +{ + char *minaddr = list[0]; + int i; + + for (i = 0; i < count; i++) +{ + char *addr = list[i]; + if (addr < minaddr) +minaddr = addr; +} + + return minaddr; +} + +char * +foo2 (int count, char **list) +{ + char *maxaddr = list[0]; + int i; + + for (i = 0; i < count; i++) +{ + char *addr = list[i]; + if (addr > maxaddr) +maxaddr = addr; +} + + return maxaddr; +} + +/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 2 "parloops" } } */ diff --git a/libgomp/testsuite/libgomp.c/pr46193.c b/libgomp/testsuite/libgomp.c/pr46193.c new file mode 100644 index 000..1e27faf --- /dev/null +++ b/libgomp/testsuite/libgomp.c/pr46193.c @@ -0,0 +1,67 @@ +/* { dg-do run } */ +/* { dg-additional-options "-ftree-parallelize-loops=2" } */ + +extern void abort (void); + +char * +foo (int count, char **list) +{ + char *minaddr = list[0]; + int i; + + for (i = 0; i < count; i++) +{ + char *addr = list[i]; + if (addr < minaddr) +minaddr = addr; +} + + return minaddr; +} + +char * +foo2 (int count, char **list) +{ + char *maxaddr = list[0]; + int i; + + for (i = 0; i < count; i++) +{ + char *addr = list[i]; + if (addr > maxaddr) +maxaddr = addr; +} + + return maxaddr; +} + +#define N 5 + +static void +init (char **list) +{ + int i; + for (i = 0; i < N; ++i) +list[i] = (char *) &list[i]; +} + +int +main (void) +{ + char *list[N]; + char * res; + + init (list); + + res = foo (N, list); + + if (res != (char *) &list[0]) +abort (); + + res = foo2 (N, list); + + if (res != (char *) &list[N-1]) +abort (); + + return 0; +} -- 1.9.1
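The substance of the omp_reduction_init change — a MIN reduction over pointers needs a neutral initializer equal to the type's maximum value (what upper_bound_in_type, or wi::max_value as Richard suggests, computes inside the compiler) — can be sketched at the source level. The helper below is illustrative only; casting UINTPTR_MAX to a pointer and ordering unrelated pointers is implementation-defined C, used here purely to mimic the neutral element the compiler materializes internally.

```c
#include <stdint.h>

/* Illustrative: a serial MIN reduction over addresses.  The neutral
   starting value must compare greater than every real element, so it
   is the maximum representable pointer value; any element from the
   list then replaces it on the first comparison.  */
static char *
min_reduce (int count, char **list)
{
  char *minaddr = (char *) UINTPTR_MAX;  /* neutral element for MIN */
  for (int i = 0; i < count; i++)
    if (list[i] < minaddr)
      minaddr = list[i];
  return minaddr;
}
```

Using the wrong neutral element (e.g. the minimum value, as for a MAX reduction) would make every comparison fail and the reduction would return the initializer itself.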
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
Moore, Catherine catherine_mo...@mentor.com writes: The recent changes to the MIPS GCC Linux sysroot (https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01014.html) have meant that the include directory is now not global and is provided only for each multi-lib configuration. This means that for any test in the MIPS GCC testsuite that requires standard library support we need to check if there is multi-lib support for the test options, otherwise it might fail to compile. This patch adds this support to the testsuite and mips.exp files. Firstly, any test that requires standard library support has the implicit option (REQUIRES_STDLIB) added to its dg-options. Secondly, in mips.exp a preprocessor check is performed to ensure that when expanding a testcase containing a #include <stdlib.h> using the current set of test options we do not get file-not-found errors. If this happens, we mark the testcase as unsupported. The patch has been tested on the mti/img elf/linux-gnu toolchains, and there have been no new regressions. The patch and ChangeLog are below. Ok to commit? Yes. This looks good. I had some comments on this that I hadn't got round to posting. The fix in this patch is not general enough, as the missing header problem comes in two (related) forms: 1) Using the new MTI and IMG sysroot layout we can end up with GCC looking for headers in a sysroot that simply does not exist. The current patch handles this. 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible for the stdlib.h header to be found but the ABI-dependent gnu-stubs header may not be installed, depending on soft/hard float and nan1985/nan2008. The test for stdlib.h therefore needs to verify that preprocessing succeeds rather than just testing for an error relating to stdlib.h. This could be done by adding a further option to mips_preprocess to indicate the preprocessor output should go to a file and that the caller wants the messages emitted by the compiler instead. 
A second issue is that you have added (REQUIRES_STDLIB) to too many tests. You only need to add it to tests that request a compiler option (via dg-options) that could potentially lead to forcing soft/hard float or nan1985/nan2008, directly or indirectly. So -mips32r6 implies nan2008, so you need it; -mips32r5 implies nan1985, so you need it. There are at least two tests which don't need the option, but you need to check them all so we don't run the check needlessly. Thanks, Matthew
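The probe Matthew describes — verify that preprocessing a file that includes stdlib.h fully succeeds under the test's flags, so a missing gnu-stubs header is caught as well as a missing sysroot — can be sketched in shell. mips_preprocess itself is Tcl code in mips.exp; the function name, arguments, and flags below are hypothetical illustrations of the idea, not the actual testsuite code.

```shell
# Hypothetical sketch of the header-availability probe: preprocess a
# one-line file that includes <stdlib.h> with the given compiler and
# flags, and report "supported" only when preprocessing fully succeeds.
# Checking the exit status (not just grepping for "stdlib.h: not found")
# also catches a missing ABI-dependent gnu-stubs header.
check_stdlib_support () {
  compiler=$1
  flags=$2
  tmp=$(mktemp)
  src="$tmp.c"
  printf '#include <stdlib.h>\n' > "$src"
  if "$compiler" $flags -E "$src" -o /dev/null 2>/dev/null; then
    echo supported
  else
    echo unsupported
  fi
  rm -f "$tmp" "$src"
}
```

In the testsuite the "unsupported" outcome would map to marking the testcase UNSUPPORTED rather than letting it fail to compile.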
Re: [PATCH 2/5] completely_scalarize arrays as well as records
On August 26, 2015 11:30:26 AM GMT+02:00, Martin Jambor mjam...@suse.cz wrote: Hi, On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote: On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 03:42 PM, Martin Jambor wrote: Hi, On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote: This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. Thanks for working on this. I see Jeff has already approved the patch, but I have two comments nevertheless. First, I would be much happier if you added a proper comment to the scalarize_elem function, which you forgot completely. The name is not very descriptive and it has quite a few parameters too. Right. I mentioned that I missed the lack of function comments when looking at #3 and asked Alan to go back and fix them in #1 and #2. Second, this patch should also fix PR 67283. It would be great if you could verify that and add it to the changelog when committing if that is indeed the case. Excellent. Yes, definitely mention the BZ. One extra question is does the way we limit total scalarization work well for arrays? I suppose we have either sth like the maximum size of an aggregate we scalarize or the maximum number of component accesses we create? Only the former, and that would be kept intact. It is in fact visible in the context of the last hunk of the patch. OK. IIRC the gimplification code also has the latter and also considers zeroing the whole aggregate before initializing non-zero fields. IMHO it makes sense to reuse some of the analysis and classification routines it has. Richard. Martin
Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
On Wed, 26 Aug 2015, Bin.Cheng wrote: On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law l...@redhat.com wrote: On 08/25/2015 05:06 AM, Alan Lawrence wrote: When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)&a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with accesses to e.g. MEM[(int[8] *)&a]. This is necessary for constant propagation in the ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle constant-pool loads). I tried to work around this by making dom2's hashable_expr_equal_p less conservative, but found that on platforms without AArch64's vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)&a] equivalent to a[0], etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task than this patch series. I can't see how to write a testcase for this in C though, as direct assignment to an array is not possible; such assignments occur only with constant pool data, which is dealt with in the next patch. It's a general issue that if there's > 1 common way to represent an expression, then DOM will often miss discovery of the CSE opportunity because of the way it hashes expressions. Ideally we'd be moving to a canonical form, but I also realize that in the case of memory references like this, that may not be feasible. IIRC, there were talks about lowering all memory references on GIMPLE? Which is the reverse approach. Since SRA is in a quite early compilation stage, I don't know if lowered memory references have an impact on other optimizers. Yeah, I'd only do the lowering after loop opts. Which also may make the DOM issue moot as the array refs would be lowered as well and thus DOM would see a consistent set of references again. The lowering should also simplify SLSR and expose address computation redundancies to DOM. 
I'd place such lowering before the late reassoc (any takers? I suppose you can pick up one of the bitfield lowering passes posted in the previous years as this should also handle bitfield accesses correctly). Thanks, Richard. Thanks, bin It does make me wonder how many CSEs we're really missing due to the two ways to represent array accesses. Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, aarch64-none-linux-gnu. gcc/ChangeLog: * tree-sra.c (completely_scalarize): Move some code into: (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base is aligned array. --- gcc/tree-sra.c | 110 - 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 08fa8dc..af35fcc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) } } +static bool +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) Function comment needed. I may have missed it in the earlier patches, but can you please make sure any new functions you created have comments in those as well. Such patches are pre-approved. With the added function comment, this patch is fine. jeff -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
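The ARRAY_REF-building change discussed above hinges on a simple divisibility check: a byte offset into an array can only be rewritten as a[idx] when it is a multiple of the element size, which is what get_elem_size feeds into. A minimal sketch of that decision (illustrative names and assumptions, not GCC's actual code):

```c
#include <stdbool.h>

/* Illustrative sketch: decide whether byte OFFSET into an array whose
   elements are ELEM_SIZE bytes wide can be expressed as an ARRAY_REF
   index, as build_ref_for_offset must decide.  Hypothetical helper,
   not GCC's implementation. */
bool
offset_to_index (unsigned long offset, unsigned long elem_size,
                 unsigned long *idx_out)
{
  if (elem_size == 0 || offset % elem_size != 0)
    return false;               /* Misaligned: keep the MEM_REF form.  */
  *idx_out = offset / elem_size;
  return true;
}
```

For the example in the thread, offset 4 with 4-byte int elements yields index 1, i.e. MEM[(int[8] *)a + 4B] becomes a[1].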
Re: [PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.
On Wed, Aug 26, 2015 at 3:29 PM, Richard Biener rguent...@suse.de wrote: On Wed, 26 Aug 2015, Bin.Cheng wrote: On Wed, Aug 26, 2015 at 3:50 AM, Jeff Law l...@redhat.com wrote: On 08/25/2015 05:06 AM, Alan Lawrence wrote: When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g. MEM[(int[8] *)a] are not hashable_expr_equal_p with accesses to e.g. MEM[(int[8] *)a]. This is necessary for constant propagation in the ssa-dom-cse-2.c testcase (after the next patch that makes SRA handle constant-pool loads). I tried to work around this by making dom2's hashable_expr_equal_p less conservative, but found that on platforms without AArch64's vectorized reductions (specifically Alpha, hppa, PowerPC, and SPARC, mentioned in ssa-dom-cse-2.c), I also needed to make MEM[(int[8] *)a] equivalent to a[0], etc.; a complete overhaul of hashable_expr_equal_p seems like a larger task than this patch series. I can't see how to write a testcase for this in C though as direct assignment to an array is not possible; such assignments occur only with constant pool data, which is dealt with in the next patch. It's a general issue that if there's > 1 common way to represent an expression, then DOM will often miss discovery of the CSE opportunity because of the way it hashes expressions. Ideally we'd be moving to a canonical form, but I also realize that in the case of memory references like this, that may not be feasible. IIRC, there were talks about lowering all memory references on GIMPLE? Which is the reverse approach. Since SRA is in quite an early compilation stage, don't know if lowered memory references have impact on other optimizers. Yeah, I'd only do the lowering after loop opts. Which also may make the DOM issue moot as the array refs would be lowered as well and thus DOM would see a consistent set of references again. 
The lowering should also simplify SLSR and expose address computation redundancies to DOM. I'd place such lowering before the late reassoc (any takers? I suppose you can pick up one of the bitfield lowering passes posted in the previous years as this should also handle bitfield accesses correctly). I ran into several issues related to lowered memory references (some of them are about slsr), and want to have a look at this. But only after finishing major issues in IVO... As for slsr, I think the problem is more about we need to prove equality of expressions by diving into definition chain of ssa_var, just like tree_to_affine_expand. I think this has already been discussed too. Anyway, lowering memory reference provides a canonical form and should benefit other optimizers. Thanks, bin Thanks, Richard. Thanks, bin It does make me wonder how many CSEs we're really missing due to the two ways to represent array accesses. Bootstrap + check-gcc on x86-none-linux-gnu, arm-none-linux-gnueabihf, aarch64-none-linux-gnu. gcc/ChangeLog: * tree-sra.c (completely_scalarize): Move some code into: (get_elem_size): New. (build_ref_for_offset): Build ARRAY_REF if base is aligned array. --- gcc/tree-sra.c | 110 - 1 file changed, 69 insertions(+), 41 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 08fa8dc..af35fcc 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -957,6 +957,20 @@ scalarizable_type_p (tree type) } } +static bool +get_elem_size (const_tree type, unsigned HOST_WIDE_INT *sz_out) Function comment needed. I may have missed it in the earlier patches, but can you please make sure any new functions you created have comments in those as well. Such patches are pre-approved. With the added function comment, this patch is fine. jeff -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][1/n] dwarf2out refactoring for early (LTO) debug
On Wed, 19 Aug 2015, Richard Biener wrote: On Tue, 18 Aug 2015, Aldy Hernandez wrote: On 08/18/2015 07:20 AM, Richard Biener wrote: This starts a series of patches (still in development) to refactor dwarf2out.c to better cope with early debug (and LTO debug). Awesome! Thanks. Aldyh, what other testing did you usually do for changes? Run the gdb testsuite against the new compiler? Anything else? gdb testsuite, and make sure you test GCC with --enable-languages=all,go,ada, though the latter is mostly useful while you iron out bugs initially. I found that ultimately, the best test was C++. I see. Pre merge I also bootstrapped the compiler and compared .debug* section sizes in object files to make sure things were within reason. + +static void +vmsdbgout_early_finish (const char *filename ATTRIBUTE_UNUSED) +{ + if (write_symbols == VMS_AND_DWARF2_DEBUG) +(*dwarf2_debug_hooks.early_finish) (filename); +} You can get rid of ATTRIBUTE_UNUSED now. Done. I've also refrained from moving gen_scheduled_generic_parms_dies (); gen_remaining_tmpl_value_param_die_attribute (); for now as that causes regressions I have to investigate. The patch below has passed bootstrap regtest on x86_64-unknown-linux-gnu as well as gdb testing. Twice unpatched, twice patched - results seem to be somewhat unstable!? I even refrained from using any -j with make check-gdb... maybe it's just contrib/test_summary not coping well with gdb? any hints? Difference between unpatched run 1 2 is for example --- results.unpatched 2015-08-19 15:08:36.152899926 +0200 +++ results.unpatched2 2015-08-19 15:29:46.902060797 +0200 @@ -209,7 +209,6 @@ WARNING: remote_expect statement without a default case?! WARNING: remote_expect statement without a default case?! WARNING: remote_expect statement without a default case?! 
-FAIL: gdb.base/varargs.exp: print find_max_float_real(4, fc1, fc2, fc3, fc4) FAIL: gdb.cp/inherit.exp: print g_vD FAIL: gdb.cp/inherit.exp: print g_vE FAIL: gdb.cp/no-dmgl-verbose.exp: setting breakpoint at 'f(std::string)' @@ -238,6 +237,7 @@ UNRESOLVED: gdb.fortran/types.exp: set print sevenbit-strings FAIL: gdb.fortran/whatis_type.exp: run to MAIN__ WARNING: remote_expect statement without a default case?! +FAIL: gdb.gdb/complaints.exp: print symfile_complaints->root->fmt WARNING: remote_expect statement without a default case?! WARNING: remote_expect statement without a default case?! WARNING: remote_expect statement without a default case?! @@ -362,12 +362,12 @@ === gdb Summary === -# of expected passes 30881 +# of expected passes 30884 # of unexpected failures 284 # of unexpected successes 2 -# of expected failures 85 +# of expected failures 83 # of unknown successes 2 -# of known failures 60 +# of known failures 59 # of unresolved testcases 6 # of untested testcases 32 # of unsupported tests 165 the same changes randomly appear/disappear in the patched case. Otherwise patched/unpatched agree. Ok? Jason, are you willing to review these refactoring patches or can I invoke my middle-end maintainer powers to remove some of this noise from the LTO parts? Thanks, Richard. Thanks, Richard. 2015-08-18 Richard Biener rguent...@suse.de * debug.h (gcc_debug_hooks::early_finish): Add filename argument. * dbxout.c (dbx_debug_hooks): Adjust. * debug.c (do_nothing_hooks): Likewise. * sdbout.c (sdb_debug_hooks): Likewise. * vmsdbgout.c (vmsdbgout_early_finish): New function dispatching to dwarf2out variant if needed. (vmsdbg_debug_hooks): Adjust. * dwarf2out.c (dwarf2_line_hooks): Adjust. (flush_limbo_die_list): New function. (dwarf2out_finish): Call flush_limbo_die_list instead of dwarf2out_early_finish. Assert there are no deferred asm-names. Move early stuff ... (dwarf2out_early_finish): ... here. 
* cgraphunit.c (symbol_table::finalize_compilation_unit): Call early_finish with main_input_filename argument. Index: gcc/cgraphunit.c === --- gcc/cgraphunit.c (revision 226966) +++ gcc/cgraphunit.c (working copy) @@ -2490,7 +2490,7 @@ symbol_table::finalize_compilation_unit /* Clean up anything that needs cleaning up after initial debug generation. */ - (*debug_hooks->early_finish) (); + (*debug_hooks->early_finish) (main_input_filename); /* Finally drive the pass manager. */ compile (); Index: gcc/dbxout.c === --- gcc/dbxout.c (revision 226966) +++ gcc/dbxout.c (working copy) @@ -354,7 +354,7 @@ const struct gcc_debug_hooks dbx_debug_h {
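The change threads a filename through the early_finish member of the debug-hooks table of function pointers, which every debug back end fills in and the driver calls through. A reduced sketch of that dispatch pattern (illustrative struct and names, not dwarf2out's real declarations):

```c
#include <stddef.h>

/* Reduced model of the hook-table dispatch the patch adjusts: each
   debug back end provides a struct of function pointers; callers go
   through the table.  GCC's real gcc_debug_hooks has many entries. */
struct debug_hooks_sketch
{
  void (*early_finish) (const char *filename);
};

static const char *last_filename;

static void
record_early_finish (const char *filename)
{
  last_filename = filename;   /* stand-in for emitting early debug */
}

static const struct debug_hooks_sketch sketch_hooks = { record_early_finish };

/* Mirrors the patched call site in finalize_compilation_unit. */
const char *
run_early_finish (const char *main_input_filename)
{
  (*sketch_hooks.early_finish) (main_input_filename);
  return last_filename;
}
```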
[PATCH][3/n] dwarf2out refactoring for early (LTO) debug
The following fixes a GC issue I ran into when doing prune_unused_types_prune early. The issue is that the DIE struct has a chain_circular marked field (die_sib) which cannot tolerate spurious extra entries from old removed entries into the circular chain. Otherwise we fail to properly mark parts of the chain. Those stray entries are kept live referenced from TYPE_SYMTAB_DIE. So the following patch makes sure to clear ->die_sib for nodes we remove. (these DIEs remaining in TYPE_SYMTAB_DIE also means we may end up re-using them which is probably not what we want ... in the original LTO experiment I had a ->removed flag in the DIE struct and removed DIEs from the cache at cache lookup time if I hit a removed DIE) Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb tested there as well. Ok for trunk? Thanks, Richard. 2015-08-26 Richard Biener rguent...@suse.de * dwarf2out.c (remove_child_with_prev): Clear child->die_sib. (replace_child): Likewise. (remove_child_TAG): Adjust. (move_marked_base_types): Likewise. (prune_unused_types_prune): Clear die_sib of removed children. Index: trunk/gcc/dwarf2out.c === --- trunk.orig/gcc/dwarf2out.c 2015-08-26 09:30:54.679185817 +0200 +++ trunk/gcc/dwarf2out.c 2015-08-25 16:54:09.150506037 +0200 @@ -4827,6 +4827,7 @@ remove_child_with_prev (dw_die_ref child prev->die_sib = child->die_sib; if (child->die_parent->die_child == child) child->die_parent->die_child = prev; + child->die_sib = NULL; } /* Replace OLD_CHILD with NEW_CHILD. PREV must have the property that @@ -4853,6 +4854,7 @@ replace_child (dw_die_ref old_child, dw_ } if (old_child->die_parent->die_child == old_child) old_child->die_parent->die_child = new_child; + old_child->die_sib = NULL; } /* Move all children from OLD_PARENT to NEW_PARENT. */ @@ -4883,9 +4885,9 @@ remove_child_TAG (dw_die_ref die, enum d remove_child_with_prev (c, prev); c->die_parent = NULL; /* Might have removed every child. 
*/ - if (c == c->die_sib) + if (die->die_child == NULL) return; - c = c->die_sib; + c = prev->die_sib; } } while (c != die->die_child); } @@ -24565,8 +24590,8 @@ prune_unused_types_prune (dw_die_ref die c = die->die_child; do { -dw_die_ref prev = c; -for (c = c->die_sib; ! c->die_mark; c = c->die_sib) +dw_die_ref prev = c, next; +for (c = c->die_sib; ! c->die_mark; c = next) if (c == die->die_child) { /* No marked children between 'prev' and the end of the list. */ @@ -24578,8 +24603,14 @@ prune_unused_types_prune (dw_die_ref die prev->die_sib = c->die_sib; die->die_child = prev; } + c->die_sib = NULL; return; } + else + { + next = c->die_sib; + c->die_sib = NULL; + } if (c != prev->die_sib) prev->die_sib = c; @@ -24824,8 +24855,8 @@ move_marked_base_types (void) remove_child_with_prev (c, prev); /* As base types got marked, there must be at least one node other than DW_TAG_base_type. */ - gcc_assert (c != c->die_sib); - c = c->die_sib; + gcc_assert (die->die_child != NULL); + c = prev->die_sib; } } while (c != die->die_child);
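The invariant being restored is easy to state outside of dwarf2out: in a circular singly-linked sibling ring, an unlinked node must not keep pointing back into the ring, or a stale external reference (here, the one kept in TYPE_SYMTAB_DIE) drags live nodes back in during marking. A reduced model of the fix (hypothetical struct, not the real DIE):

```c
#include <stddef.h>

/* Reduced model of a DIE sibling ring: die_sib links children in a
   circle.  remove_after unlinks the node following PREV and -- the
   point of the patch -- clears its die_sib so the dead node no longer
   references the live ring. */
struct node { struct node *die_sib; };

struct node *
remove_after (struct node *prev)
{
  struct node *victim = prev->die_sib;
  prev->die_sib = victim->die_sib;
  victim->die_sib = NULL;   /* Without this, victim still points in.  */
  return victim;
}
```

With the NULL assignment, anything still holding the removed node sees an obviously-dead entry instead of a working handle into the chain.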
[build] Use __cxa_atexit on Solaris 12+
Solaris 12 introduced __cxa_atexit in libc. The following patch makes use of it, and also removes the strange failures seen with gld reported in PR c++/51923. Bootstrapped without regressions on i386-pc-solaris2.1[12] and sparc-sun-solaris2.1[12], will install on mainline. Will backport to the gcc 5 branch after some soak time. Rainer 2015-02-10 Rainer Orth r...@cebitec.uni-bielefeld.de * config.gcc (*-*-solaris2*): Enable default_use_cxa_atexit on Solaris 12+. Use __cxa_atexit on Solaris 10+ diff --git a/gcc/config.gcc b/gcc/config.gcc --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -820,6 +820,12 @@ case ${target} in sol2_tm_file_head="dbxelf.h elfos.h ${cpu_type}/sysv4.h" sol2_tm_file_tail="${cpu_type}/sol2.h sol2.h" sol2_tm_file="${sol2_tm_file_head} ${sol2_tm_file_tail}" + case ${target} in +*-*-solaris2.1[2-9]*) + # __cxa_atexit was introduced in Solaris 12. + default_use_cxa_atexit=yes + ;; + esac use_gcc_stdint=wrap if test x$gnu_ld = xyes; then tm_file="usegld.h ${tm_file}" -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[gomp4] loop partition optimization
I've committed this patch, which implements a simple partitioned execution optimization. A loop over both worker and vector dimensions emits separate FORK and JOIN markers for the two dimensions -- there may be reduction pieces between them, as Cesar will shortly be committing. However, if there aren't reductions, then we end up with one partitioned region sitting neatly entirely inside another region. This is inefficient, as it causes us to add separate worker and vector partitioning startup. This optimization looks for regions of this form, and if found consumes the inner region into the outer region. Then we only emit a single setup block of code. nathan 2015-08-26 Nathan Sidwell nat...@codesourcery.com * config/nvptx/nvptx.opt (moptimize): New flag. * config/nvptx/nvptx.c (nvptx_option_override): Default nvptx_optimize. (nvptx_optimize_inner): New. (nvptx_process_pars): Call it. * doc/invoke.texi (Nvptx options): Document moptimize. Index: gcc/config/nvptx/nvptx.c === --- gcc/config/nvptx/nvptx.c (revision 227180) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -178,6 +178,9 @@ nvptx_option_override (void) write_symbols = NO_DEBUG; debug_info_level = DINFO_LEVEL_NONE; + if (nvptx_optimize < 0) +nvptx_optimize = optimize > 0; + declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17); needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17); declared_libfuncs_htab @@ -3005,6 +3008,64 @@ nvptx_skip_par (unsigned mask, parallel nvptx_single (mask, par->forked_block, pre_tail); } +/* If PAR has a single inner parallel and PAR itself only contains + empty entry and exit blocks, swallow the inner PAR. */ + +static void +nvptx_optimize_inner (parallel *par) +{ + parallel *inner = par->inner; + + /* We mustn't be the outer dummy par. */ + if (!par->mask) +return; + + /* We must have a single inner par. */ + if (!inner || inner->next) +return; + + /* We must only contain 2 blocks ourselves -- the head and tail of + the inner par. 
*/ + if (par->blocks.length () != 2) +return; + + /* We must be disjoint partitioning. As we only have vector and + worker partitioning, this is sufficient to guarantee the pars + have adjacent partitioning. */ + if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1)) +/* This indicates malformed code generation. */ +return; + + /* The outer forked insn should be the only one in its block. */ + rtx_insn *probe; + rtx_insn *forked = par->forked_insn; + for (probe = BB_END (par->forked_block); + probe != forked; probe = PREV_INSN (probe)) +if (INSN_P (probe)) + return; + + /* The outer joining insn, if any, must be in the same block as the inner + joined instruction, which must otherwise be empty of insns. */ + rtx_insn *joining = par->joining_insn; + rtx_insn *join = inner->join_insn; + for (probe = BB_END (inner->join_block); + probe != join; probe = PREV_INSN (probe)) +if (probe != joining && INSN_P (probe)) + return; + + /* Preconditions met. Swallow the inner par. */ + par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1); + + par->blocks.reserve (inner->blocks.length ()); + while (inner->blocks.length ()) +par->blocks.quick_push (inner->blocks.pop ()); + + par->inner = inner->inner; + inner->inner = NULL; + + delete inner; +} + /* Process the parallel PAR and all its contained parallels. We do everything but the neutering. Return mask of partitioned modes used within this parallel. */ @@ -3012,8 +3073,11 @@ nvptx_skip_par (unsigned mask, parallel static unsigned nvptx_process_pars (parallel *par) { - unsigned inner_mask = par->mask; + if (nvptx_optimize) +nvptx_optimize_inner (par); + unsigned inner_mask = par->mask; + /* Do the inner parallels first. */ if (par->inner) { Index: gcc/config/nvptx/nvptx.opt === --- gcc/config/nvptx/nvptx.opt (revision 227180) +++ gcc/config/nvptx/nvptx.opt (working copy) @@ -29,6 +29,10 @@ mmainkernel Target Report RejectNegative Link in code for a __main kernel. 
+moptimize +Target Report Var(nvptx_optimize) Init(-1) +Optimize partition neutering + Enum Name(ptx_isa) Type(int) Known PTX ISA versions (for use with the -misa= option): Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 227180) +++ gcc/doc/invoke.texi (working copy) @@ -18814,6 +18814,11 @@ Generate code for 32-bit or 64-bit ABI. Link in code for a __main kernel. This is for stand-alone instead of offloading execution. +@item -moptimize +@opindex moptimize +Apply partitioned execution optimizations. This is the default when any +level of optimization is selected. + @end table @node PDP-11 Options
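The disjointness precondition and the swallow step in nvptx_optimize_inner are plain bit-mask arithmetic over the launch dimensions. A standalone sketch, with stand-ins for the GOMP dimension constants (the real code uses GOMP_DIM_* and GOMP_DIM_MASK):

```c
#include <stdbool.h>

/* Stand-ins for the GOMP dimension numbering the patch operates on. */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(d) (1u << (d))

/* Two pars may be merged only if their partitioned dimensions are
   disjoint (e.g. outer = worker, inner = vector). */
bool
pars_disjoint (unsigned outer_mask, unsigned inner_mask)
{
  return ((outer_mask & inner_mask) & (DIM_MASK (DIM_MAX) - 1)) == 0;
}

/* The swallow step: the outer par takes over the inner partitioning,
   so only one setup block need be emitted. */
unsigned
swallow_mask (unsigned outer_mask, unsigned inner_mask)
{
  return outer_mask | (inner_mask & (DIM_MASK (DIM_MAX) - 1));
}
```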
[libgfortran,committed] Fix SHAPE intrinsic with KIND values 1 and 2
Attached patch fixes the SHAPE intrinsic with optional argument KIND values of 1 and 2. While we already accept and emit code for SHAPE with KIND values, the runtime versions with integer kinds 1 and 2 are missing (while values of 4, 8 and 10 are present). The patch adds the necessary generated files, and symbols into gfortran.map, as well as a testcase. I also took the opportunity to fix an error in the type of the SHAPE argument, which is a generic array (array_t) and not a specifically-typed version. This changes nothing for the generated code, because only the shape of the array descriptor is accessed. But it’s cleaner that way. Committed as revision 227210, after bootstrapping and regtesting on x86_64-apple-darwin15. FX shape.ChangeLog Description: Binary data shape.diff Description: Binary data
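All the KIND argument of SHAPE changes is the width of the result's integer elements, which is why a separate runtime entry point per kind is needed. A small C model of that per-kind store (illustrative only; libgfortran instead generates one specialized function per kind from a common template):

```c
#include <stdint.h>

/* Illustrative model: write each array extent into a result buffer
   whose element type is selected by KIND (bytes per element).  The
   kind-1 and kind-2 cases are the ones the patch adds to the runtime. */
void
shape_store (void *result, int kind, const long *extents, int rank)
{
  for (int i = 0; i < rank; i++)
    switch (kind)
      {
      case 1: ((int8_t *)  result)[i] = (int8_t)  extents[i]; break;
      case 2: ((int16_t *) result)[i] = (int16_t) extents[i]; break;
      case 4: ((int32_t *) result)[i] = (int32_t) extents[i]; break;
      }
}
```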
Re: [Scalar masks 2/x] Use bool masks in if-conversion
2015-08-26 0:26 GMT+03:00 Jeff Law l...@redhat.com: On 08/21/2015 06:17 AM, Ilya Enkovich wrote: Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that but you have to pretend you have to get code vectorized. Also according to vector ABI integer mask should be used for mask operand in case of masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions which require integer arg for a mask in case of 512-bit vector. That's what I assumed -- you can pass in a mask as an argument and it's supposed to be a simple integer, right? Depending on target ABI requires either vector mask or a simple integer value. Current implementation of masked loads, masked stores and bool patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we really call it a canonical representation for all targets? No idea - we'll revisit when another target adds a similar capability. AVX-512 is such target. Current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by current bool patterns and are hard to optimize out. I'm a bit surprised they're so prevalent and hard to optimize away. ISTM PRE ought to handle this kind of thing with relative ease. Most of vector comparisons are UNSPEC. And I doubt PRE may actually help much even if get rid of UNSPEC somehow. Is there really a redundancy in: if ((v1 cmp v2) && (v3 cmp v4)) load v1 cmp v2 -> mask1 select mask1 vec_cst_-1 vec_cst_0 -> vec_mask1 v3 cmp v4 -> mask2 select mask2 vec_mask1 vec_cst_0 -> vec_mask2 vec_mask2 NE vec_cst_0 -> mask3 load by mask3 It looks to me more like an i386 specific instruction selection problem. Ilya Fact is GCC already copes with vector masks generated by vector compares just fine everywhere and I'd rather leave it as that. Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 .. 0}, {-1 .. -1}>. 
AND and IOR on bools are also expressed via additional vec_cond. I don't think vectorizer ever generates vector comparison. And I wouldn't say it's fine 'everywhere' because there is a single target utilizing them. Masked loads and stores for AVX-512 just don't work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to 512-bit vector then we get an ugly inefficient code. The question is where to fight with this inefficiency: in RTL or in GIMPLE. I want to fight with it where it appears, i.e. in GIMPLE by preventing bool -> int conversions applied everywhere even if target doesn't need it. You should expect pushback anytime target dependencies are added to gimple, even if it's stuff in the vectorizer, which is infested with target dependencies. If we don't want to support both types of masks in GIMPLE then it's more reasonable to make bool -> int conversion in expand for targets requiring it, rather than do it for everyone and then leave it to target to transform it back and try to get rid of all those redundant transformations. I'd give vector<bool> a chance to become a canonical mask representation for that. Might be worth some experimentation. Jeff
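The two mask representations being argued over compute the same selection; only the encoding differs. A scalar C model of both forms — one all-ones/all-zeros element per lane (the vec_cond style) versus one bit per lane (the scalar-bitmask style AVX-512's k-registers use):

```c
/* Scalar model of the two mask forms.  Both select a[i] where the lane
   is "true" and b[i] otherwise; only the mask's encoding differs. */

/* Vector-mask form: mask[i] is 0 or -1 per lane. */
void
select_vecmask (const int *mask, const int *a, const int *b,
                int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = mask[i] ? a[i] : b[i];
}

/* Scalar-bitmask form: one bit per lane, bit i for lane i. */
void
select_bitmask (unsigned mask, const int *a, const int *b,
                int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = ((mask >> i) & 1) ? a[i] : b[i];
}
```

The "redundant transformations" in the thread are exactly the conversions back and forth between these two encodings of the same predicate.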
Re: [Scalar masks 2/x] Use bool masks in if-conversion
2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com: On 08/21/2015 04:49 AM, Ilya Enkovich wrote: I want a work with bitmasks to be expressed in a natural way using regular integer operations. Currently all masks manipulations are emulated via vector statements (mostly using a bunch of vec_cond). For complex predicates it may be nontrivial to transform it back to scalar masks and get an efficient code. Also the same vector may be used as both a mask and an integer vector. Things become more complex if you additionally have broadcasts and vector pack/unpack code. It also should be transformed into a scalar masks manipulations somehow. Or why not model the conversion at the gimple level using a CONVERT_EXPR? In fact, the more I think about it, that seems to make more sense to me. We pick a canonical form for the mask, whatever it may be. We use that canonical form and model conversions between it and the other form via CONVERT_EXPR. We then let DOM/PRE find/eliminate the redundant conversions. If it's not up to the task, we should really look into why and resolve. Yes, that does mean we have two forms which I'm not terribly happy about and it means some target dependencies on what the masked vector operation looks like (ie, does it accept a simple integer or vector mask), but I'm starting to wonder if, as distasteful as I find it, it's the right thing to do. If we have some special representation for masks in GIMPLE then we might not need any conversions. We could ask a target to define a MODE for this type and use it directly everywhere: directly compare into it, use it directly for masked loads and stores, AND, IOR, EQ etc. If that type is reserved for masks usage then you previous suggestion to transform masks into target specific form at GIMPLE-RTL phase should work fine. This would allow to support only a single masks representation in GIMPLE. Thanks, Ilya But I don't like changing our IL so much as to allow 'integer' masks everywhere. 
I'm warming up to that idea... jeff
Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB
On 25 August 2015 at 14:12, Andre Vieira andre.simoesdiasvie...@arm.com wrote: gcc/ChangeLog: 2015-08-07 Ramana Radhakrishnan ramana.radhakrish...@arm.com Andre Vieira andre.simoesdiasvie...@arm.com * config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB. (*cb<optab><mode>1): Likewise. (*tb<optab><mode>1): Likewise. (*cb<optab><mode>1): Likewise. * config/aarch64/iterators.md (inv_cb): New code attribute. (inv_tb): Likewise. * config/aarch64/aarch64.c (aarch64_gen_far_branch): New. * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New. gcc/testsuite/ChangeLog: 2015-08-07 Andre Vieira andre.simoesdiasvie...@arm.com * gcc.target/aarch64/long_branch_1.c: New test. OK /Marcus
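For context on the limit being worked around: AArch64 conditional branches (B.cond, CBZ/CBNZ) encode a signed 19-bit count of 4-byte words, so they can only reach ±1 MiB; beyond that the compiler must invert the condition and branch over an unconditional B, which has a ±128 MiB range. A sketch of the range check (my arithmetic from the encoding, not GCC's code):

```c
#include <stdbool.h>

/* A B.cond/CBZ offset is a signed 19-bit word count, i.e. reachable
   byte offsets are [-2^20, 2^20 - 4], 4-byte aligned. */
bool
cond_branch_in_range (long byte_offset)
{
  return byte_offset >= -(1L << 20)
         && byte_offset <= (1L << 20) - 4
         && (byte_offset & 3) == 0;
}
```

When this predicate fails, the far-branch sequence (inverted condition plus unconditional B) is the fallback the patch generates.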
[libgo] Use stat_atim.go on Solaris 12+
Solaris 12 changes the stat_[amc]tim members of struct stat from timestruc_t to timespec_t for XPG7 compatibility, thus breaking the libgo build. The following patch checks for this change and uses the common stat_atim.go if appropriate. Btw., I noticed that go/os/stat_atim.go and stat_dragonfly.go are identical; no idea why that would be useful. Bootstrapped without regressions on i386-pc-solaris2.1[12] and sparc-sun-solaris2.1[12]. I had to regenerate aclocal.m4 since for some reason it had been built with automake 1.11.1 instead of the common 1.11.6, thus inhibiting Makefile.in regeneration. Ok for mainline now and the gcc 5 branch after some soak time? Rainer 2015-02-10 Rainer Orth r...@cebitec.uni-bielefeld.de * configure.ac (have_stat_timespec): Check for timespec_t st_atim in sys/stat.h. (HAVE_STAT_TIMESPEC): New conditional. * configure: Regenerate. * Makefile.am [LIBGO_IS_SOLARIS && HAVE_STAT_TIMESPEC] (go_os_stat_file): Use go/os/stat_atim.go. * aclocal.m4: Regenerate. * Makefile.in: Regenerate. # HG changeset patch # Parent b83d7b91430fc3d2c2f34df34aaf648b178d2cad Use stat_atim.go on Solaris 12+ diff --git a/libgo/Makefile.am b/libgo/Makefile.am --- a/libgo/Makefile.am +++ b/libgo/Makefile.am @@ -880,7 +880,11 @@ endif endif if LIBGO_IS_SOLARIS +if HAVE_STAT_TIMESPEC +go_os_stat_file = go/os/stat_atim.go +else go_os_stat_file = go/os/stat_solaris.go +endif else if LIBGO_IS_LINUX go_os_stat_file = go/os/stat_atim.go diff --git a/libgo/configure.ac b/libgo/configure.ac --- a/libgo/configure.ac +++ b/libgo/configure.ac @@ -654,6 +654,12 @@ AC_CACHE_CHECK([epoll_event data.fd offs STRUCT_EPOLL_EVENT_FD_OFFSET=${libgo_cv_c_epoll_event_fd_offset} AC_SUBST(STRUCT_EPOLL_EVENT_FD_OFFSET) +dnl Check if sys/stat.h uses timespec_t for st_?tim members. Introduced +dnl in Solaris 12 for XPG7 compatibility. 
+AC_EGREP_HEADER([timespec_t.*st_atim], [sys/stat.h], + [have_stat_timespec=yes], [have_stat_timespec=no]) +AM_CONDITIONAL(HAVE_STAT_TIMESPEC, test $have_stat_timespec = yes) + dnl See if struct exception is defined in math.h. AC_CHECK_TYPE([struct exception], [libgo_has_struct_exception=yes], -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[libvtv] Update copyrights
While working on the Solaris libvtv port, I noticed that many of the libvtv copyright years hadn't been updated, were misformtted, or both. It turns out that libvtv isn't listed in contrib/update-copyright.py at all. This patch fixes this and includes the result of running update-copyright.py --this-year libvtv. I've neither added libvtv to self.default_dirs in the script nor added copyrights to the numerous files in libvtv that currently lack one. Ok for mainline once it has survived regtesting? Thanks. Rainer 2015-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de libvtv: Update copyrights. contrib: * update-copyright.py (GCCCmdLine): Add libvtv. # HG changeset patch # Parent 322129613b3dfc80c06f5f87dae9f2fa962a3496 Update copyrights diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py --- a/contrib/update-copyright.py +++ b/contrib/update-copyright.py @@ -745,6 +745,7 @@ class GCCCmdLine (CmdLine): # libsanitiser is imported from upstream. self.add_dir ('libssp') self.add_dir ('libstdc++-v3', LibStdCxxFilter()) +self.add_dir ('libvtv') self.add_dir ('lto-plugin') # zlib is imported from upstream. diff --git a/libvtv/Makefile.am b/libvtv/Makefile.am --- a/libvtv/Makefile.am +++ b/libvtv/Makefile.am @@ -1,6 +1,6 @@ ## Makefile for the VTV library. ## -## Copyright (C) 2013 Free Software Foundation, Inc. +## Copyright (C) 2013-2015 Free Software Foundation, Inc. ## ## Process this file with automake to produce Makefile.in. ## diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt --- a/libvtv/configure.tgt +++ b/libvtv/configure.tgt @@ -1,5 +1,5 @@ # -*- shell-script -*- -# Copyright (C) 2013 Free Software Foundation, Inc. +# Copyright (C) 2013-2015 Free Software Foundation, Inc. 
# This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/libvtv/testsuite/config/default.exp b/libvtv/testsuite/config/default.exp --- a/libvtv/testsuite/config/default.exp +++ b/libvtv/testsuite/config/default.exp @@ -1,4 +1,4 @@ -# Copyright (C) 2013 Free Software Foundation, Inc. +# Copyright (C) 2013-2015 Free Software Foundation, Inc. # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by diff --git a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc --- a/libvtv/testsuite/libvtv.cc/virtfunc-test.cc +++ b/libvtv/testsuite/libvtv.cc/virtfunc-test.cc @@ -2,8 +2,7 @@ /* This test script is part of GDB, the GNU debugger. - Copyright 1993, 1994, 1997, 1998, 1999, 2003, 2004, - Free Software Foundation, Inc. + Copyright (C) 1993-2015 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by diff --git a/libvtv/testsuite/other-tests/Makefile.am b/libvtv/testsuite/other-tests/Makefile.am --- a/libvtv/testsuite/other-tests/Makefile.am +++ b/libvtv/testsuite/other-tests/Makefile.am @@ -1,6 +1,6 @@ ## Makefile for the testsuite subdirectory of the VTV library. ## -## Copyright (C) 2013 Free Software Foundation, Inc. +## Copyright (C) 2013-2015 Free Software Foundation, Inc. ## ## Process this file with automake to produce Makefile.in. ## diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc --- a/libvtv/vtv_fail.cc +++ b/libvtv/vtv_fail.cc @@ -1,5 +1,4 @@ -/* Copyright (C) 2012-2013 - Free Software Foundation +/* Copyright (C) 2012-2015 Free Software Foundation, Inc. This file is part of GCC. 
diff --git a/libvtv/vtv_fail.h b/libvtv/vtv_fail.h --- a/libvtv/vtv_fail.h +++ b/libvtv/vtv_fail.h @@ -1,5 +1,4 @@ -// Copyright (C) 2012-2013 -// Free Software Foundation +// Copyright (C) 2012-2015 Free Software Foundation, Inc. // // This file is part of GCC. // diff --git a/libvtv/vtv_malloc.cc b/libvtv/vtv_malloc.cc --- a/libvtv/vtv_malloc.cc +++ b/libvtv/vtv_malloc.cc @@ -1,5 +1,4 @@ -/* Copyright (C) 2012-2013 - Free Software Foundation +/* Copyright (C) 2012-2015 Free Software Foundation, Inc. This file is part of GCC. diff --git a/libvtv/vtv_malloc.h b/libvtv/vtv_malloc.h --- a/libvtv/vtv_malloc.h +++ b/libvtv/vtv_malloc.h @@ -1,5 +1,4 @@ -// Copyright (C) 2012-2013 -// Free Software Foundation +// Copyright (C) 2012-2015 Free Software Foundation, Inc. // // This file is part of GCC. // diff --git a/libvtv/vtv_map.h b/libvtv/vtv_map.h --- a/libvtv/vtv_map.h +++ b/libvtv/vtv_map.h @@ -1,5 +1,4 @@ -/* Copyright (C) 2012-2013 - Free Software Foundation +/* Copyright (C) 2012-2015 Free Software Foundation, Inc. This file is part of GCC. diff --git a/libvtv/vtv_rts.cc b/libvtv/vtv_rts.cc --- a/libvtv/vtv_rts.cc +++ b/libvtv/vtv_rts.cc @@ -1,5 +1,4 @@ -/* Copyright (C) 2012-2013 - Free Software Foundation +/*
Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran
From: Ulrich Weigand uweig...@de.ibm.com Date: Wed, 26 Aug 2015 13:45:35 +0200 Hans-Peter Nilsson wrote: From: Ulrich Weigand uweig...@de.ibm.com Date: Tue, 25 Aug 2015 19:45:06 +0200 However, neither works for the SPU, because in both cases libtool will only do the test whether the target supports the -fPIC option. It will not test whether the target supports dynamic libraries. [ It will do that test; and default to --disable-shared on SPU. That is a no-op for libbacktrace however, since it calls LT_INIT with the disable-shared option anyway. Maybe it shouldn't? Huh? We do want libbacktrace solely as static library, that's the whole point ... I meant that as a *suggestion for a possible workaround* to stop libtool from refusing to compile with PIC, but then I take it you don't need hints to try another angle than adjusting compilation flags. When adding back the -fPIC flag due to either the pic-only LT_INIT option or the -prefer-pic libtool command line option, it does not check for that again. ] Sounds like a bug somewhere, in libtool or its current use: there *should* be a way to specify I'd prefer PIC code in these static libraries. But that's what the option *does*. Let me try again, maybe we can reduce confusion a bit :-) I don't feel very confused, but I understand you've investigated things down to a point where we can conclude that libtool can't do what SPU needs without also at least fiddling with compilation options. I guess we can always fall back to just hard-coding SPU once more; that's certainly the simplest solution right now. Maybe. brgds, H-P
[boehm-gc] Avoid unstructured procfs on Solaris
boehm-gc doesn't currently build on Solaris 12 since that release finally removed the old unstructured /proc, thus the PIOCOPENPD ioctl. This is already mentioned in the Solaris 11 EOF list:

http://www.oracle.com/technetwork/systems/end-of-notices/eonsolaris11-392732.html

Since the replacement (using /proc/<pid>/pagedata directly) has been available since Solaris 2.6 in 1997, there's no need to retain the old code, especially given that mainline only supports Solaris 10 and up.

Bootstrapped without regressions on i386-pc-solaris2.1[12] and sparc-sun-solaris2.1[12], will install on mainline. Will backport to the gcc 5 branch after some soak time.

	Rainer

2015-02-10  Rainer Orth  r...@cebitec.uni-bielefeld.de

	* os_dep.c [GC_SOLARIS_THREADS] (GC_dirty_init): Use
	/proc/<pid>/pagedata instead of PIOCOPENPD.

# HG changeset patch
# Parent 819be80e1b9c7e840fe5d232d64cf106869a933d
Avoid unstructured procfs on Solaris 12+

diff --git a/boehm-gc/os_dep.c b/boehm-gc/os_dep.c
--- a/boehm-gc/os_dep.c
+++ b/boehm-gc/os_dep.c
@@ -3184,13 +3184,11 @@ void GC_dirty_init()
 		      (GC_words_allocd + GC_words_allocd_before_gc));
 #   endif
     }
-    sprintf(buf, "/proc/%d", getpid());
-    fd = open(buf, O_RDONLY);
-    if (fd < 0) {
+    sprintf(buf, "/proc/%d/pagedata", getpid());
+    GC_proc_fd = open(buf, O_RDONLY);
+    if (GC_proc_fd < 0) {
     	ABORT("/proc open failed");
     }
-    GC_proc_fd = syscall(SYS_ioctl, fd, PIOCOPENPD, 0);
-    close(fd);
     syscall(SYS_fcntl, GC_proc_fd, F_SETFD, FD_CLOEXEC);
     if (GC_proc_fd < 0) {
     	ABORT("/proc ioctl failed");

--
- Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Remove reference to undefined documentation node.
On Wed, Aug 26, 2015 at 11:05:09AM +0100, Dominik Vogt wrote: This patch removes a menu entry that points to an undefined node in the documentation. The faulty entry has been introduced with git commit id 3aabc45f2, subversion id 138bc75d-0d04-0410-96. It looks like the entry is a remnant of an earlier version of the documentation introduced with that change. Sorry, this patch is not good. Please ignore; I'll look for a different way to fix the warning. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran
Hans-Peter Nilsson wrote: From: Ulrich Weigand uweig...@de.ibm.com Date: Tue, 25 Aug 2015 19:45:06 +0200 However, neither works for the SPU, because in both cases libtool will only do the test whether the target supports the -fPIC option. It will not test whether the target supports dynamic libraries. [ It will do that test; and default to --disable-shared on SPU. That is a no-op for libbacktrace however, since it calls LT_INIT with the disable-shared option anyway. Maybe it shouldn't? Huh? We do want libbacktrace solely as static library, that's the whole point ... When adding back the -fPIC flag due to either the pic-only LT_INIT option or the -prefer-pic libtool command line option, it does not check for that again. ] Sounds like a bug somewhere, in libtool or its current use: there *should* be a way to specify "I'd prefer PIC code" in these static libraries. But that's what the option *does*. Let me try again, maybe we can reduce confusion a bit :-)

We've been discussing three potential sets of options to use with the LT_INIT call here. Those are:

A) LT_INIT            # no options

   Build both a static and a shared library. If the target does not
   support shared libraries, build the static library only. The code
   landing in the static library is built without -fPIC; code for the
   shared library is built with -fPIC (or the appropriate target flag).

B) LT_INIT([disable-shared])

   Build *solely* a static library. Code is compiled without -fPIC.

C) LT_INIT([disable-shared,pic-only])

   Build solely a static library, but compile code with -fPIC or the
   appropriate target flag (may be none if the target does not support
   -fPIC).

[Note that in all cases, behaviour can be overridden via configure options like --enable/disable-shared and --enable/disable-static.]

As I understand it, we deliberately do not use option A. As the comment in the libbacktrace configure.ac says:

# When building as a target library, shared libraries may want to link
# this in.  We don't want to provide another shared library to
# complicate dependencies.  Instead, we just compile with -fPIC.

That's why libbacktrace currently uses option B and manually adds a -fPIC flag. Now, after the latest check-in, the behaviour is mostly equivalent to using option C (and not manually changing PIC flags). However, none of the options do exactly what would be right for the SPU, which would be: Build solely a static library, using code that is compiled so that it can be linked as part of a second library (static or shared). This is equivalent to: Build solely a static library, but compile code with -fPIC or the appropriate target flag *if the target supports shared libraries*. This again is *mostly* equivalent to option C, *except* on targets that support -fPIC but do not support shared libraries. I'm not sure if it is worthwhile to try and change libtool to support targets with that property (e.g. adding a new LT_INIT option), if this in practice only affects SPU. But, I'll have to leave solving this PIC-failing-at-linkage problem to you; I committed the current approved fix for PIC-failing-at-compilation. I guess we can always fall back to just hard-coding SPU once more; that's certainly the simplest solution right now.

Bye, Ulrich

--
Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
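Ulrich's options A-C map onto configure.ac fragments along these lines (a sketch for illustration; a real configure.ac would contain only one of the LT_INIT calls):

```autoconf
dnl A) Static + shared; objects destined for the shared library are
dnl    compiled with the target's PIC flag automatically.
LT_INIT

dnl B) Static only, no PIC -- libbacktrace's historical choice, with
dnl    -fPIC then added to the compilation flags by hand.
LT_INIT([disable-shared])

dnl C) Static only, but compiled with the target's PIC flag where one
dnl    exists; roughly what libbacktrace does after the latest check-in.
LT_INIT([disable-shared, pic-only])
```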
Re: [AArch64][TLSLE][1/3] Add the option -mtls-size for AArch64
On 25 August 2015 at 15:15, Jiong Wang jiong.w...@arm.com wrote: 2015-08-25 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.opt (mtls-size): New entry. * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. (aarch64_override_options_internal): Call initialize_aarch64_tls_size. * doc/invoke.texi (AArch64 Options): Document -mtls-size. OK Thanks /Marcus
Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION
On 19 Aug 2015, at 22:35, Jeff Law l...@redhat.com wrote: On 08/19/2015 06:29 AM, David Sherwood wrote: I asked Richard S. to give this a once-over which he did. However, he technically can't approve due to the way his maintainership position was worded. The one request would be a function comment for emit_mode_unit_size and emit_mode_unit_precision. OK with that change. Thanks. Here's a new patch with the comments added. Good to go? David.

ChangeLog: 2015-08-19 David Sherwood david.sherw...@arm.com

gcc/
	* genmodes.c (emit_mode_unit_size_inline): New function.
	(emit_mode_unit_precision_inline): New function.
	(emit_insn_modes_h): Emit new #define. Emit new functions.
	(emit_mode_unit_size): New function.
	(emit_mode_unit_precision): New function.
	(emit_mode_adjustments): Add mode_unit_size adjustments.
	(emit_insn_modes_c): Emit new arrays.
	* machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to
	use new inline methods.

Thanks, this is OK for the trunk.

It seems this broke sh-elf, at least when compiling on OSX with its native clang:

../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 'mode_unit_size' with a different type: 'const unsigned char [56]' vs 'unsigned char [56]'
extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
                                          ^
./insn-modes.h:417:24: note: previous definition is here
extern unsigned char mode_unit_size[NUM_MACHINE_MODES];
                     ^

Cheers, Oleg
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 3:35 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote: On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote: AVX-512 is such target. Current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them but one reason was there wasn't a vector type for 'bool' (but I don't see how it should be necessary to ask get me a vector type for 'bool'). That was just one of the reasons. The other reason is that even if we would choose some vector of integer type as vector of bool, the question is what type. E.g. if you use vector of chars, you almost always get terrible vectorized code, except for the AVX-512 you really want an integral type that has the size of the types you are comparing. Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always first compute the vector type for the comparison itself (which is fixed) and thus we can compute the vector type of any bitwise op on it as well. Sure, but if you then immediately vector narrow it to a V*QI vector because it is stored originally into a bool/_Bool variable, and then again when it is used in say a COND_EXPR widen it again, you get really poor code. So, what the bool pattern code does is kind of poor man's type promotion/demotion pass for bool only, at least for the common cases. Yeah, I just looked at the code but in the end everything should be fixable in the place we compute STMT_VINFO_VECTYPE. The code just looks at the LHS type plus at the narrowest type (for vectorization factor).
It should get re-structured to get the vector types from the operands (much like code-generation will eventually fall back to). PR50596 has been the primary reason to introduce the bool patterns. If there is a better type promotion/demotion pass on a copy of the loop, sure, we can get rid of it (but figure out also what to do for SLP). Yeah, of course. Basic-block SLP just asks for the vectype during SLP analysis AFAIK. I suppose we want sth like get_result_vectype (gimple) which can look at operands as well and can be used from both places. After all we do want to fix the non-single-use issue somehow and getting rid of the patterns sounds good to me anyway... Not sure if I can get to the above for GCC 6, but at least putting it on my TODO... Richard. Jakub
[PATCH][AArch64 array_mode 8/8] Add d-registers to TARGET_ARRAY_MODE_SUPPORTED_P
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing ...QREG... macro, and as a driveby fixes mode->(MODE) in the latter.

The new test now compiles (at -O3) to:

test_1:
	add	v1.2s, v1.2s, v5.2s
	add	v2.2s, v2.2s, v6.2s
	add	v3.2s, v3.2s, v7.2s
	add	v0.2s, v0.2s, v4.2s
	ret

Whereas prior to this patch we got:

test_1:
	add	v0.2s, v0.2s, v4.2s
	sub	sp, sp, #160
	add	v1.2s, v1.2s, v5.2s
	add	v2.2s, v2.2s, v6.2s
	add	v3.2s, v3.2s, v7.2s
	str	d0, [sp, 96]
	str	d1, [sp, 104]
	str	d2, [sp, 112]
	str	d3, [sp, 120]
	ldp	x2, x3, [sp, 96]
	stp	x2, x3, [sp, 128]
	ldp	x0, x1, [sp, 112]
	stp	x0, x1, [sp, 144]
	ldr	d1, [sp, 136]
	ldr	d0, [sp, 128]
	ldr	d2, [sp, 144]
	ldr	d3, [sp, 152]
	add	sp, sp, 160
	ret

I've tried to look for (the absence of) this extra code in a number of ways; all 3 scan...not's were previously failing (i.e. regex's were matching) but now pass.

Bootstrapped and check-gcc on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64.h (AARCH64_VALID_SIMD_DREG_MODE): New.
	(AARCH64_VALID_SIMD_QREG_MODE): Correct mode->MODE.
	* config/aarch64/aarch64.c (aarch64_array_mode_supported_p): Add
	AARCH64_VALID_SIMD_DREG_MODE.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-int32x2x4_1.c: New.
---
 gcc/config/aarch64/aarch64.c              |  3 ++-
 gcc/config/aarch64/aarch64.h              |  7 ++-
 .../gcc.target/aarch64/vect-int32x2x4_1.c | 22 ++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a923b55..d2ea7f6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -650,7 +650,8 @@ aarch64_array_mode_supported_p (machine_mode mode,
 				unsigned HOST_WIDE_INT nelems)
 {
   if (TARGET_SIMD
-      && AARCH64_VALID_SIMD_QREG_MODE (mode)
+      && (AARCH64_VALID_SIMD_QREG_MODE (mode)
+	  || AARCH64_VALID_SIMD_DREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
     return true;
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3851564..d1ba00b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -915,10 +915,15 @@ extern enum aarch64_code_model aarch64_cmodel;
   (aarch64_cmodel == AARCH64_CMODEL_TINY \
    || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
 
+/* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register.  */
+#define AARCH64_VALID_SIMD_DREG_MODE(MODE) \
+  ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
+   || (MODE) == V2SFmode || (MODE) == DImode || (MODE) == DFmode)
+
 /* Modes valid for AdvSIMD Q registers.  */
 #define AARCH64_VALID_SIMD_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode || mode == V2DFmode)
+   || (MODE) == V4SFmode || (MODE) == V2DImode || (MODE) == V2DFmode)
 
 #define ENDIAN_LANE_N(mode, n)  \
   (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
new file mode 100644
index 000..734cfd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-int32x2x4_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-expand" } */
+
+#include <arm_neon.h>
+
+uint32x2x4_t
+test_1 (uint32x2x4_t a, uint32x2x4_t b)
+{
+  uint32x2x4_t result;
+
+  for (unsigned index = 0; index < 4; ++index)
+    result.val[index] = a.val[index] + b.val[index];
+
+  return result;
+}
+
+/* Should not use the stack in expand.  */
+/* { dg-final { scan-rtl-dump-not "virtual-stack-vars" expand } } */
+/* Should not have to modify the stack pointer.  */
+/* { dg-final { scan-assembler-not "\t(add|sub).*sp" } } */
+/* Should not have to store or load anything.  */
+/* { dg-final { scan-assembler-not "\t(ld|st)\[rp\]" } } */
-- 
1.8.3
Re: [libvtv] Update copyrights
On Wed, 26 Aug 2015, Rainer Orth wrote: While working on the Solaris libvtv port, I noticed that many of the libvtv copyright years hadn't been updated, were misformtted, or both. It turns out that libvtv isn't listed in contrib/update-copyright.py at all. This patch fixes this and includes the result of running update-copyright.py --this-year libvtv. I've neither added libvtv to self.default_dirs in the script nor added copyrights to the numerous files in libvtv that currently lack one. Ok for mainline once it has survived regtesting? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [Scalar masks 2/x] Use bool masks in if-conversion
2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com: Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that but you have to pretend you have to get code vectorized. Huh? Bitwise ops should be readily available. Right, bitwise ops are available, but there is no comparison into a vector and no masked loads and stores using vector masks (when we speak about 512-bit vectors). Also according to vector ABI integer mask should be used for mask operand in case of masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions which require integer arg for a mask in case of 512-bit vector. How do you declare those? Something like this: #pragma omp declare simd inbranch int foo(int*); Current implementation of masked loads, masked stores and bool patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we really call it a canonical representation for all targets? No idea - we'll revisit when another target adds a similar capability. AVX-512 is such target. Current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them but one reason was there wasn't a vector type for 'bool' (but I don't see how it should be necessary to ask get me a vector type for 'bool'). Using scalar masks everywhere should probably cause the same conversion problem for SSE I listed above though.
Talking about a canonical representation, shouldn't we use some special masks representation and not mix it with integer and vector of integers then? Only in this case would a target be able to efficiently expand it into corresponding rtl. That was my idea of vector<bool> ... but I didn't explore it and see where it will cause issues. Fact is GCC already copes with vector masks generated by vector compares just fine everywhere and I'd rather leave it as that. Nope. Currently vector mask is obtained from a vec_cond <A op B, {0 .. 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via additional vec_cond. I don't think the vectorizer ever generates a vector comparison. Ok, well that's an implementation detail then. Are you sure about AND and IOR? The comment above vect_recog_bool_pattern says:

  Assuming size of TYPE is the same as size of all comparisons (otherwise
  some casts would be added where needed), the above sequence we create
  related pattern stmts:
  S1'  a_T = x1 CMP1 y1 ? 1 : 0;
  S3'  c_T = x2 CMP2 y2 ? a_T : 0;
  S4'  d_T = x3 CMP3 y3 ? 1 : 0;
  S5'  e_T = c_T | d_T;
  S6'  f_T = e_T;

thus has vector mask | I think in practice it would look like:

  S4'  d_T = x3 CMP3 y3 ? 1 : c_T;

Thus everything is usually hidden in vec_cond. But my concern is mostly about types used for that. And I wouldn't say it's fine 'everywhere' because there is a single target utilizing them. Masked loads and stores for AVX-512 just don't work now. And if we extend the existing MASK_LOAD and MASK_STORE optabs to 512-bit vectors then we get ugly, inefficient code. The question is where to fight with this inefficiency: in RTL or in GIMPLE. I want to fight with it where it appears, i.e. in GIMPLE, by preventing bool -> int conversions applied everywhere even if the target doesn't need it.
If we don't want to support both types of masks in GIMPLE then it's more reasonable to make the bool -> int conversion in expand for targets requiring it, rather than do it for everyone and then leave it to the target to transform it back and try to get rid of all those redundant transformations. I'd give vector<bool> a chance to become a canonical mask representation for that. Well, you are missing the case of bool b = a < b; int x = (int)b; This case seems to require no changes and just be transformed into vec_cond. Thanks, Ilya where the bool is used as integer (and thus an integer mask would have to be expanded). When the bool is a mask in itself the integer use is either free or a matter of a widening/shortening operation. Richard.
[v3 patch] Only set std::enable_shared_from_this member once.
This adds a check to weak_ptr::_M_assign() so that calling __enable_shared_from_this_helper twice with the same pointer won't change which shared_ptr object the weak_ptr shares ownership with. On the lib reflector Peter Dimov convinced me that the boost::enable_shared_from_this behaviour is preferable to what we do now. I'm writing a proposal to specify this in the standard, but am changing it now in our implementation. Tested powerpc64le-linux, committing to trunk. commit a1cd60820fb1af7f3396ff4b28e0e1d3449bfacb Author: Jonathan Wakely jwak...@redhat.com Date: Tue Aug 25 17:10:36 2015 +0100 Only set std::enable_shared_from_this member once. * include/bits/shared_ptr.h (__enable_shared_from_this_helper): Use nullptr. * include/bits/shared_ptr_base.h (weak_ptr::_M_assign): Don't assign if ownership is already shared with a shared_ptr object. (__enable_shared_from_this_helper): Use nullptr. * testsuite/20_util/enable_shared_from_this/members/const.cc: New. * testsuite/20_util/enable_shared_from_this/members/reinit.cc: New. * testsuite/20_util/enable_shared_from_this/requirements/ explicit_instantiation.cc: Instantiate with const and incomplete types. 
diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index f96c078..2413b1b 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -588,7 +588,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				     const enable_shared_from_this* __pe,
 				     const _Tp1* __px) noexcept
       {
-	if (__pe != 0)
+	if (__pe != nullptr)
 	  __pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
       }
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index aec10fe..820edcb 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1468,8 +1468,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       void
       _M_assign(_Tp* __ptr, const __shared_count<_Lp>& __refcount) noexcept
       {
-	_M_ptr = __ptr;
-	_M_refcount = __refcount;
+	if (use_count() == 0)
+	  {
+	    _M_ptr = __ptr;
+	    _M_refcount = __refcount;
+	  }
       }
 
       template<typename _Tp1, _Lock_policy _Lp1> friend class __shared_ptr;
@@ -1549,7 +1552,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				     const __enable_shared_from_this* __pe,
 				     const _Tp1* __px) noexcept
       {
-	if (__pe != 0)
+	if (__pe != nullptr)
 	  __pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
       }
 
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
new file mode 100644
index 000..fdf39c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/const.cc
@@ -0,0 +1,60 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+
+#include <memory>
+#include <testsuite_hooks.h>
+
+template<typename T, typename U>
+  bool
+  share_ownership(const std::shared_ptr<T>& p1, const std::shared_ptr<U>& p2)
+  {
+    return !p1.owner_before(p2) && !p2.owner_before(p1);
+  }
+
+void
+test01()
+{
+  struct X : public std::enable_shared_from_this<X> { };
+  using CX = const X;
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  auto p2 = std::const_pointer_cast<X>(p)->shared_from_this();
+  VERIFY( share_ownership(p2, p) );
+}
+
+void
+test02()
+{
+  struct X;
+  using CX = const X;
+  struct X : public std::enable_shared_from_this<CX> { };
+  std::shared_ptr<CX> p(new X);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+  p.reset(new CX);
+  VERIFY( share_ownership(p->shared_from_this(), p) );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
new file mode 100644
index 000..3740db8
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/members/reinit.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Fri, Aug 21, 2015 at 12:49 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-21 11:15 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Thu, Aug 20, 2015 at 8:46 PM, Jeff Law l...@redhat.com wrote: On 08/17/2015 10:25 AM, Ilya Enkovich wrote: Hi, This patch intoriduces a new vectorizer hook use_scalar_mask_p which affects code generated by if-conversion pass (and affects patterns in later patches). Thanks, Ilya -- 2015-08-17 Ilya Enkovich enkovich@gmail.com * doc/tm.texi (TARGET_VECTORIZE_USE_SCALAR_MASK_P): New. * doc/tm.texi.in: Regenerated. * target.def (use_scalar_mask_p): New. * tree-if-conv.c: Include target.h. (predicate_mem_writes): Don't convert boolean predicates into integer when scalar masks are used. Presumably this is how you prevent the generation of scalar masks rather than boolean masks on targets which don't have the former? I hate to ask, but how painful would it be to go from a boolean to integer masks later such as during expansion? Or vice-versa. WIthout a deep knowledge of the entire patchkit, it feels like we're introducing target stuff in a place where we don't want it and that we'd be better served with a canonical representation through gimple, then dropping into something more target specific during gimple-rtl expansion. I want a work with bitmasks to be expressed in a natural way using regular integer operations. Currently all masks manipulations are emulated via vector statements (mostly using a bunch of vec_cond). For complex predicates it may be nontrivial to transform it back to scalar masks and get an efficient code. Also the same vector may be used as both a mask and an integer vector. Things become more complex if you additionally have broadcasts and vector pack/unpack code. It also should be transformed into a scalar masks manipulations somehow. 
Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that but you have to pretend you have to get code vectorized. Huh? Bitwise ops should be readily available. Also according to vector ABI integer mask should be used for mask operand in case of masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions which require integer arg for a mask in case of 512-bit vector. How do you declare those? Current implementation of masked loads, masked stores and bool patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we really call it a canonical representation for all targets? No idea - we'll revisit when another target adds a similar capability. AVX-512 is such target. Current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them but one reason was there wasn't a vector type for 'bool' (but I don't see how it should be necessary to ask get me a vector type for 'bool'). Using scalar masks everywhere should probably cause the same conversion problem for SSE I listed above though. Talking about a canonical representation, shouldn't we use some special masks representation and not mixing it with integer and vector of integers then? Only in this case target would be able to efficiently expand it into a corresponding rtl. That was my idea of vector<bool> ... but I didn't explore it and see where it will cause issues. Fact is GCC already copes with vector masks generated by vector compares just fine everywhere and I'd rather leave it as that. Nope.
Currently vector mask is obtained from a vec_cond <A op B, {0 .. 0}, {-1 .. -1}>. AND and IOR on bools are also expressed via additional vec_cond. I don't think the vectorizer ever generates a vector comparison. Ok, well that's an implementation detail then. Are you sure about AND and IOR? The comment above vect_recog_bool_pattern says:

  Assuming size of TYPE is the same as size of all comparisons (otherwise
  some casts would be added where needed), the above sequence we create
  related pattern stmts:
  S1'  a_T = x1 CMP1 y1 ? 1 : 0;
  S3'  c_T = x2 CMP2 y2 ? a_T : 0;
  S4'  d_T = x3 CMP3 y3 ? 1 : 0;
  S5'  e_T = c_T | d_T;
  S6'  f_T = e_T;

thus has vector mask | And I wouldn't say it's fine 'everywhere' because there is a single target utilizing them. Masked loads and stores for AVX-512 just don't work now. And if we extend existing MASK_LOAD and MASK_STORE
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 1:13 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-26 0:42 GMT+03:00 Jeff Law l...@redhat.com: On 08/21/2015 04:49 AM, Ilya Enkovich wrote: I want a work with bitmasks to be expressed in a natural way using regular integer operations. Currently all masks manipulations are emulated via vector statements (mostly using a bunch of vec_cond). For complex predicates it may be nontrivial to transform it back to scalar masks and get an efficient code. Also the same vector may be used as both a mask and an integer vector. Things become more complex if you additionally have broadcasts and vector pack/unpack code. It also should be transformed into a scalar masks manipulations somehow. Or why not model the conversion at the gimple level using a CONVERT_EXPR? In fact, the more I think about it, that seems to make more sense to me. We pick a canonical form for the mask, whatever it may be. We use that canonical form and model conversions between it and the other form via CONVERT_EXPR. We then let DOM/PRE find/eliminate the redundant conversions. If it's not up to the task, we should really look into why and resolve. Yes, that does mean we have two forms which I'm not terribly happy about and it means some target dependencies on what the masked vector operation looks like (ie, does it accept a simple integer or vector mask), but I'm starting to wonder if, as distasteful as I find it, it's the right thing to do. If we have some special representation for masks in GIMPLE then we might not need any conversions. We could ask a target to define a MODE for this type and use it directly everywhere: directly compare into it, use it directly for masked loads and stores, AND, IOR, EQ etc. If that type is reserved for masks usage then your previous suggestion to transform masks into target specific form at GIMPLE->RTL phase should work fine. This would allow to support only a single masks representation in GIMPLE.
But we can already do all this with the integer vector masks we have. If you think that the vectorizer-generated mask = VEC_COND <v1 < v2 ? { -1, ... } : { 0, ... }> is ugly then we can remove that implementation detail and use mask = v1 < v2; directly. Note that the VEC_COND form was invented to avoid the need to touch RTL expansion for vector compares (IIRC). Or it pre-dated specifying what compares generate on GIMPLE. Richard. Thanks, Ilya But I don't like changing our IL so much as to allow 'integer' masks everywhere. I'm warming up to that idea... jeff
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote: AVX-512 is such a target. The current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by the current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them, but one reason was that there wasn't a vector type for 'bool' (though I don't see how it should be necessary to ask "get me a vector type for 'bool'"). That was just one of the reasons. The other reason is that even if we would choose some vector of integer type as the vector of bool, the question is what type. E.g. if you use a vector of chars, you almost always get terrible vectorized code (except for the AVX-512 case); you really want an integral type that has the size of the types you are comparing. And I'd say this is very much related to the need to do some type promotions or demotions on the scalar code meant to be vectorized (but only on the copy for vectorization), so that we have as few different scalar type sizes in the loop as possible, because widening / narrowing vector conversions aren't exactly cheap, and a single char operation in a loop otherwise full of long long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into vf=16 (or 32 or 64), increasing it a lot. Jakub
[libvtv] Fix formatting errors
While looking at libvtv for the Solaris port, I noticed all sorts of GNU Coding Standard violations: * ChangeLog entries attributed to the committer instead of the author and with misformatted PR references, entries only giving a vague rationale instead of what changed * overlong lines * tons of whitespace errors (though I may be wrong in some cases: C++ code might have other rules) * code formatting that seems to have been done to be visually pleasing, completely different from what Emacs does * commented code fragments (#if 0 equivalent) * configure.tgt target list in no recognizable order * the Cygwin/MingW port is done in the worst possible way: tons of target-specific ifdefs instead of feature-specific conditionals or an interface that can wrap both Cygwin and Linux variants of the code The following patch (as yet not even compiled) fixes some of the most glaring errors. The Solaris port will fix a few of the latter ones. Do you think this is the right direction or did I get something wrong? Thanks. Rainer 2015-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de Fix formatting errors. # HG changeset patch # Parent 6459822b8e6fa7647ad0d12ffb6f3da7bd0c5db2 Fix formatting errors diff --git a/libvtv/ChangeLog b/libvtv/ChangeLog --- a/libvtv/ChangeLog +++ b/libvtv/ChangeLog @@ -1,6 +1,6 @@ -2015-08-01 Caroline Tice cmt...@google.com +2015-08-01 Eric Gallager eg...@gwmail.gwu.edu - PR 66521 + PR bootstrap/66521 * Makefile.am: Update to match latest tree. * Makefile.in: Regenerate. * testsuite/lib/libvtv: Brought up to date. @@ -24,15 +24,13 @@ 2015-02-09 Thomas Schwinge thomas@cod * configure: Likewise. * testsuite/Makefile.in: Likewise. -2015-01-29 Caroline Tice cmt...@google.com +2015-01-29 Patrick Wollgast patrick.wollg...@rub.de - Committing VTV Cywin/Ming patch for Patrick Wollgast * libvtv/Makefile.in : Regenerate. * libvtv/configure : Regenerate.
-2015-01-28 Caroline Tice cmt...@google.com +2015-01-28 Patrick Wollgast patrick.wollg...@rub.de - Committing VTV Cywin/Ming patch for Patrick Wollgast * libvtv/Makefile.am : Add libvtv.la to toolexeclib_LTLIBRARIES, if VTV_CYGMIN is set. Define libvtv_la_LIBADD, libvtv_la_LDFLAGS, libvtv_stubs_la_LDFLAGS and libvtv_stubs_la_SOURCES if VTV_CYGMIN is diff --git a/libvtv/vtv_fail.cc b/libvtv/vtv_fail.cc --- a/libvtv/vtv_fail.cc +++ b/libvtv/vtv_fail.cc @@ -38,9 +38,7 @@ desired. This may be the case if the programmer has to deal wtih unverified third party software, for example. __vtv_really_fail is available for the programmer to call from his version of - __vtv_verify_fail, if he decides the failure is real. - -*/ + __vtv_verify_fail, if he decides the failure is real. */ #include <stdlib.h> #include <stdio.h> @@ -80,8 +78,8 @@ const unsigned long SET_HANDLE_HANDLE_BI /* Instantiate the template classes (in vtv_set.h) for our particular hash table needs. */ -typedef void * vtv_set_handle; -typedef vtv_set_handle * vtv_set_handle_handle; +typedef void *vtv_set_handle; +typedef vtv_set_handle *vtv_set_handle_handle; static int vtv_failures_log_fd = -1; @@ -121,17 +119,16 @@ log_error_message (const char *log_msg, variable. */ static inline bool -is_set_handle_handle (void * ptr) +is_set_handle_handle (void *ptr) { - return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT) - == SET_HANDLE_HANDLE_BIT; + return ((unsigned long) ptr & SET_HANDLE_HANDLE_BIT) == SET_HANDLE_HANDLE_BIT; } /* Returns the actual pointer value of a vtable map variable, PTR (see comments for is_set_handle_handle for more details). */ static inline vtv_set_handle * -ptr_from_set_handle_handle (void * ptr) +ptr_from_set_handle_handle (void *ptr) { return (vtv_set_handle *) ((unsigned long) ptr & ~SET_HANDLE_HANDLE_BIT); } @@ -141,7 +138,7 @@ ptr_from_set_handle_handle (void * ptr) variable.
*/ static inline vtv_set_handle_handle -set_handle_handle (vtv_set_handle * ptr) +set_handle_handle (vtv_set_handle *ptr) { return (vtv_set_handle_handle) ((unsigned long) ptr | SET_HANDLE_HANDLE_BIT); } @@ -151,7 +148,7 @@ set_handle_handle (vtv_set_handle * ptr) file, then calls __vtv_verify_fail. SET_HANDLE_PTR is the pointer to the set of valid vtable pointers, VTBL_PTR is the pointer that was not found in the set, and DEBUG_MSG is the message to be - written to the log file before failing. n */ + written to the log file before failing. */ void __vtv_verify_fail_debug (void **set_handle_ptr, const void *vtbl_ptr, @@ -197,9 +194,9 @@ vtv_fail (const char *msg, void **data_s *** Unable to verify vtable pointer (%p) in set (%p) *** \n; snprintf (buffer, sizeof (buffer), format_str, vtbl_ptr, -is_set_handle_handle(*data_set_ptr) ? - ptr_from_set_handle_handle (*data_set_ptr) : - *data_set_ptr); +
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 03:21:52PM +0200, Richard Biener wrote: On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote: AVX-512 is such a target. The current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by the current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them, but one reason was that there wasn't a vector type for 'bool' (though I don't see how it should be necessary to ask "get me a vector type for 'bool'"). That was just one of the reasons. The other reason is that even if we would choose some vector of integer type as the vector of bool, the question is what type. E.g. if you use a vector of chars, you almost always get terrible vectorized code (except for the AVX-512 case); you really want an integral type that has the size of the types you are comparing. Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always first compute the vector type for the comparison itself (which is fixed) and thus we can compute the vector type of any bitwise op on it as well. Sure, but if you then immediately vector-narrow it to a V*QI vector because it is stored originally into a bool/_Bool variable, and then again when it is used in say a COND_EXPR widen it again, you get really poor code. So, what the bool pattern code does is a kind of poor man's type promotion/demotion pass for bool only, at least for the common cases. PR50596 has been the primary reason to introduce the bool patterns. If there is a better type promotion/demotion pass on a copy of the loop, sure, we can get rid of it (but figure out also what to do for SLP). Jakub
[PATCH][AArch64 array_mode 3/8] Stop using EImode in aarch64-simd.md and iterators.md
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally EImode. This patch changes to BLKmode in all cases, explicitly setting the memory size (thus preserving size for the cases that were EImode, and setting size for the first time for cases that were already BLKmode). The patterns affected are only for intrinsics: the aarch64_ld3r<mode> expanders and aarch64_simd_ld3r<mode> insns, and the aarch64_vec_{load,store}_lanesci_lane<mode> insns used by the aarch64_{ld,st}3_lane<mode> expanders. bootstrapped and check-gcc on aarch64-none-linux-gnu gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_ld3r<mode>, aarch64_vec_load_lanesci_lane<mode>, aarch64_vec_store_lanesci_lane<mode>): Change operand mode from V_THREE_ELEM to BLK. (aarch64_ld3r<mode>, aarch64_ld3_lane<mode>, aarch64_st3_lane<VQ:mode>): Generate MEM rtx with BLKmode, call set_mem_size. * config/aarch64/iterators.md (V_THREE_ELEM): Remove. --- gcc/config/aarch64/aarch64-simd.md | 27 ++- gcc/config/aarch64/iterators.md | 8 2 files changed, 14 insertions(+), 21 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 7b7a1b8..156fc4f 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4001,7 +4001,7 @@ (define_insn "aarch64_simd_ld3r<mode>" [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:V_THREE_ELEM 1 "aarch64_simd_struct_operand" "Utv") + (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] UNSPEC_LD3_DUP))] "TARGET_SIMD" @@ -4011,7 +4011,7 @@ (define_insn "aarch64_vec_load_lanesci_lane<mode>" [(set (match_operand:CI 0 "register_operand" "=w") - (unspec:CI [(match_operand:V_THREE_ELEM 1 "aarch64_simd_struct_operand" "Utv") + (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") (match_operand:CI 2 "register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4052,11 +4052,11 @@ ;; RTL uses GCC
vector extension indices, so flip only for assembly. (define_insn "aarch64_vec_store_lanesci_lane<mode>" - [(set (match_operand:V_THREE_ELEM 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:V_THREE_ELEM [(match_operand:CI 1 "register_operand" "w") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) - (match_operand:SI 2 "immediate_operand" "i")] - UNSPEC_ST3_LANE))] + [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:BLK [(match_operand:CI 1 "register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) + (match_operand:SI 2 "immediate_operand" "i")] + UNSPEC_ST3_LANE))] "TARGET_SIMD" { operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2]))); @@ -4368,8 +4368,8 @@ (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_SIMD" { - machine_mode mode = <V_THREE_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3); emit_insn (gen_aarch64_simd_ld3r<mode> (operands[0], mem)); DONE; @@ -4589,8 +4589,8 @@ (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_SIMD" { - machine_mode mode = <V_THREE_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3); aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode), NULL); @@ -4874,8 +4874,9 @@ (match_operand:SI 2 "immediate_operand")] "TARGET_SIMD" { - machine_mode mode = <V_THREE_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[0]); + rtx mem = gen_rtx_MEM (BLKmode, operands[0]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 3); + operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2]))); emit_insn (gen_aarch64_vec_store_lanesci_lane<VQ:mode> (mem, diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 98b6714..ae0be0b 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -568,14
+568,6 @@ (V2SF V2SF) (V4SF V2SF) (DF V2DI) (V2DF V2DI)]) -;; Similar, for three elements. -(define_mode_attr V_THREE_ELEM [(V8QI BLK) (V16QI BLK) - (V4HI BLK) (V8HI BLK) - (V2SI BLK) (V4SI BLK) - (DI EI) (V2DI EI) - (V2SF BLK) (V4SF BLK) - (DF EI) (V2DF EI)]) - ;; Similar, for four elements.
[PATCH][AArch64 array_mode 5/8] Remove V_FOUR_ELEM, again using BLKmode + set_mem_size.
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM, again using BLKmode + set_mem_size. (This makes the four-lane expanders very similar to the three-lane expanders, and they will be combined in patch 7.) bootstrapped and check-gcc on aarch64-none-linux-gnu gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_ld4r<mode>, aarch64_vec_load_lanesxi_lane<mode>, aarch64_vec_store_lanesxi_lane<mode>): Change operand mode from V_FOUR_ELEM to BLK. (aarch64_ld4r<mode>, aarch64_ld4_lane<mode>, aarch64_st4_lane<VQ:mode>): Generate MEM rtx with BLKmode, call set_mem_size. * config/aarch64/iterators.md (V_FOUR_ELEM): Remove. --- gcc/config/aarch64/aarch64-simd.md | 25 + gcc/config/aarch64/iterators.md | 9 - 2 files changed, 13 insertions(+), 21 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 156fc4f..68182d6 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4096,7 +4096,7 @@ (define_insn "aarch64_simd_ld4r<mode>" [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:V_FOUR_ELEM 1 "aarch64_simd_struct_operand" "Utv") + (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] UNSPEC_LD4_DUP))] "TARGET_SIMD" @@ -4106,7 +4106,7 @@ (define_insn "aarch64_vec_load_lanesxi_lane<mode>" [(set (match_operand:XI 0 "register_operand" "=w") - (unspec:XI [(match_operand:V_FOUR_ELEM 1 "aarch64_simd_struct_operand" "Utv") + (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv") (match_operand:XI 2 "register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4147,10 +4147,10 @@ ;; RTL uses GCC vector extension indices, so flip only for assembly.
(define_insn "aarch64_vec_store_lanesxi_lane<mode>" - [(set (match_operand:V_FOUR_ELEM 0 "aarch64_simd_struct_operand" "=Utv") - (unspec:V_FOUR_ELEM [(match_operand:XI 1 "register_operand" "w") - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) - (match_operand:SI 2 "immediate_operand" "i")] + [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:BLK [(match_operand:XI 1 "register_operand" "w") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) + (match_operand:SI 2 "immediate_operand" "i")] UNSPEC_ST4_LANE))] "TARGET_SIMD" { @@ -4381,8 +4381,8 @@ (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_SIMD" { - machine_mode mode = <V_FOUR_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4); emit_insn (gen_aarch64_simd_ld4r<mode> (operands[0], mem)); DONE; @@ -4609,8 +4609,8 @@ (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_SIMD" { - machine_mode mode = <V_FOUR_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4); aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode), NULL); @@ -4892,8 +4892,9 @@ (match_operand:SI 2 "immediate_operand")] "TARGET_SIMD" { - machine_mode mode = <V_FOUR_ELEM>mode; - rtx mem = gen_rtx_MEM (mode, operands[0]); + rtx mem = gen_rtx_MEM (BLKmode, operands[0]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) * 4); + operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2]))); emit_insn (gen_aarch64_vec_store_lanesxi_lane<VQ:mode> (mem, diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index ae0be0b..9535b7f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -568,15 +568,6 @@ (V2SF V2SF) (V4SF V2SF) (DF V2DI) (V2DF V2DI)]) -;; Similar, for four elements.
-(define_mode_attr V_FOUR_ELEM [(V8QI SI) (V16QI SI) - (V4HI V4HI) (V8HI V4HI) - (V2SI V4SI) (V4SI V4SI) - (DI OI) (V2DI OI) - (V2SF V4SF) (V4SF V4SF) - (DF OI) (V2DF OI)]) - - ;; Mode for atomic operation suffixes (define_mode_attr atomic_sfx [(QI "b") (HI "h") (SI "") (DI "")]) -- 1.8.3
Re: [PATCH], PowerPC IEEE 128-bit patch #6
On Fri, Aug 14, 2015 at 11:47 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: This is patch #6: 2015-08-13 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/rs6000-protos.h (rs6000_expand_float128_convert): Add declaration. * config/rs6000/rs6000.c (rs6000_emit_le_vsx_store): Fix a comment. (rs6000_cannot_change_mode_class): Add support for IEEE 128-bit floating point in VSX registers. (rs6000_output_move_128bit): Always print out the set insn if we can't generate an appropriate 128-bit move. (rs6000_generate_compare): Add support for IEEE 128-bit floating point in VSX registers comparisons. (rs6000_expand_float128_convert): Likewise. * config/rs6000/rs6000.md (extenddftf2): Add support for IEEE 128-bit floating point in VSX registers. (extenddftf2_internal): Likewise. (trunctfdf2): Likewise. (trunctfdf2_internal2): Likewise. (fix_trunc_helper): Likewise. (fix_trunctfdi2): Likewise. (floatditf2): Likewise. (floatuns<mode>tf2): Likewise. (extend<FLOAT128_SFDFTF:mode><IFKF:mode>2): Likewise. (trunc<IFKF:mode><FLOAT128_SFDFTF:mode>2): Likewise. (fix_trunc<IFKF:mode><SDI:mode>2): Likewise. (fixuns_trunc<IFKF:mode><SDI:mode>2): Likewise. (float<SDI:mode><IFKF:mode>2): Likewise. (floatuns<SDI:mode><IFKF:mode>2): Likewise. This patch is okay. Thanks, David
Re: [RFC 4/5] Handle constant-pool entries
Hi, On Tue, Aug 25, 2015 at 12:06:16PM +0100, Alan Lawrence wrote: This makes SRA replace loads of records/arrays from constant pool entries, with elementwise assignments of the constant values, hence, overcoming the fundamental problem in PR/63679. As a first pass, the approach I took was to look for constant-pool loads as we scanned through other accesses, and add them as candidates there; to build a constant replacement_decl for any such accesses in completely_scalarize; and to use any existing replacement_decl rather than creating a variable in create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but that does not allow addresses of labels, which can still end up in the constant pool.) Feedback as to the approach or how it might be better structured / fitted into SRA, is solicited ;). I'm not familiar with constant pools very much, but I'll try: Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly increases the number of testcases in which this code is exercised! Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, and sh. gcc/ChangeLog: * tree-sra.c (create_access): Scan for uses of constant pool and add to candidates. (subst_initial): New. (scalarize_elem): Build replacement_decl using subst_initial. (create_access_replacement): Use replacement_decl if set. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param sra-max-scalarization-size-Ospeed. 
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +--- gcc/tree-sra.c | 56 +-- 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index 9eccdc9..b13d583 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */ +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */ int foo () @@ -17,7 +17,4 @@ foo () /* After late unrolling the above loop completely DOM should be able to optimize this to return 28. */ -/* See PR63679 and PR64159, if the target forces the initializer to memory then - DOM is not able to perform this optimization. */ - -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */ +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index af35fcc..a3ff2df 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write) else ptr = false; + /* FORNOW: scan for uses of constant pool as we go along. */ + if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base) + && !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) + { + gcc_assert (!write); + bitmap_set_bit (candidate_bitmap, DECL_UID (base)); + tree_node **slot = candidates->find_slot_with_hash (base, DECL_UID (base), + INSERT); + *slot = base; + } + I believe you only want to do this if (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA). The idea of candidates is that we gather them in find_var_candidates and then we only eliminate them; this has the benefit of not worrying about disqualifying a candidate and then erroneously re-adding it later.
So if you could find a way to structure your code this way, I'd be much happier. If it is impossible without traversing the whole function just for that purpose, we may need some mechanism to prevent us from making a disqualified decl a candidate again. Or, if we come to the conclusion that constant pool decls do not ever get disqualified, a gcc_assert making sure it actually does not happen in disqualify_candidate. And of course at find_var_candidates time we check that all candidates pass simple checks in maybe_add_sra_candidate. I suppose many of them do not make sense for constant pool decls but at least please have a look whether that is the case for all of them or not. if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) } } +static tree +subst_initial (tree expr, tree var) This needs a comment and a better name. A name that would make it clear this is for constant
Re: [PATCH 2/5] completely_scalarize arrays as well as records
Hi, On Wed, Aug 26, 2015 at 09:07:33AM +0200, Richard Biener wrote: On Tue, Aug 25, 2015 at 11:44 PM, Jeff Law l...@redhat.com wrote: On 08/25/2015 03:42 PM, Martin Jambor wrote: Hi, On Tue, Aug 25, 2015 at 12:06:14PM +0100, Alan Lawrence wrote: This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I renamed both methods so as not to mention 'record'. thanks for working on this. I see Jeff has already approved the patch, but I have two comments nevertheless. First, I would be much happier if you added a proper comment to the scalarize_elem function, which you forgot completely. The name is not very descriptive and it has quite a few parameters too. Right. I mentioned that I missed the lack of function comments when looking at #3 and asked Alan to go back and fix them in #1 and #2. Second, this patch should also fix PR 67283. It would be great if you could verify that and add it to the changelog when committing if that is indeed the case. Excellent. Yes, definitely mention the BZ. One extra question is does the way we limit total scalarization work well for arrays? I suppose we have either sth like the maximum size of an aggregate we scalarize or the maximum number of component accesses we create? Only the former and that would be kept intact. It is in fact visible in the context of the last hunk of the patch. Martin
Re: [PATCH], PowerPC IEEE 128-bit patch #5
On Tue, Aug 25, 2015 at 7:20 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: Here is the revised patch. Is it ok to install? 2015-08-25 Michael Meissner meiss...@linux.vnet.ibm.com * config/rs6000/predicates.md (int_reg_operand_not_pseudo): New predicate for only GPR hard registers. * config/rs6000/rs6000.md (FP): Add IEEE 128-bit floating point modes to iterators. Add new iterators for moving 128-bit values in scalar FPR registers and VSX registers. (FMOVE128): Likewise. (FMOVE128_FPR): Likewise. (FMOVE128_GPR): Likewise. (FMOVE128_VSX): Likewise. (FLOAT128_SFDFTF): New iterators for IEEE 128-bit floating point in VSX registers. (IFKF): Likewise. (IBM128): Likewise. (TFIFKF): Likewise. (RELOAD): Add IEEE 128-bit floating point modes. (signbittf2): Convert TF insns to add support for new IEEE 128-bit floating point in VSX registers modes. (signbit<mode>2, IBM128 iterator): Likewise. (mov<mode>_64bit_dm, FMOVE128_FPR iterator): Likewise. (mov<mode>_32bit, FMOVE128_FPR iterator): Likewise. (negtf2): Likewise. (neg<mode>2, TFIFKF iterator): Likewise. (negtf2_internal): Likewise. (abstf2): Likewise. (abs<mode>2, TFIFKF iterator): Likewise. (ieee_128bit_negative_zero): New IEEE 128-bit floating point in VSX insn support for negate, absolute value, and negative absolute value. (ieee_128bit_vsx_neg<mode>2): Likewise. (ieee_128bit_vsx_neg<mode>2_internal): Likewise. (ieee_128bit_vsx_abs<mode>2): Likewise. (ieee_128bit_vsx_abs<mode>2_internal): Likewise. (ieee_128bit_vsx_nabs<mode>2): Likewise. (ieee_128bit_vsx_nabs<mode>2_internal): Likewise. (FP128_64): Update pack/unpack 128-bit insns for IEEE 128-bit floating point in VSX registers. (unpack<mode>_dm): Likewise. (unpack<mode>_nodm): Likewise. (pack<mode>): Likewise. (unpackv1ti): Likewise. (unpack<mode>, FMOVE128_VSX iterator): Likewise. (packv1ti): Likewise. (pack<mode>, FMOVE128_VSX iterator): Likewise. The revised patch is okay. Thanks, David
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 3:16 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Aug 26, 2015 at 03:02:02PM +0200, Richard Biener wrote: AVX-512 is such a target. The current representation forces multiple scalar mask -> vector mask and back transformations which are artificially introduced by the current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them, but one reason was that there wasn't a vector type for 'bool' (though I don't see how it should be necessary to ask "get me a vector type for 'bool'"). That was just one of the reasons. The other reason is that even if we would choose some vector of integer type as the vector of bool, the question is what type. E.g. if you use a vector of chars, you almost always get terrible vectorized code (except for the AVX-512 case); you really want an integral type that has the size of the types you are comparing. Yeah, but the way STMT_VINFO_VECTYPE is computed is that we always first compute the vector type for the comparison itself (which is fixed) and thus we can compute the vector type of any bitwise op on it as well. And I'd say this is very much related to the need to do some type promotions or demotions on the scalar code meant to be vectorized (but only on the copy for vectorization), so that we have as few different scalar type sizes in the loop as possible, because widening / narrowing vector conversions aren't exactly cheap, and a single char operation in a loop otherwise full of long long operations might unnecessarily turn a vf=2 (or 4 or 8) loop into vf=16 (or 32 or 64), increasing it a lot. That's true but unrelated. With conditions this gets to optimizing where the promotion/demotion happens (which depends on how the result is used).
The current pattern approach has the issue that it doesn't work for multiple uses in the condition bitops which is bad as well. But it couldn't have been _only_ the vector type computation that made us invent the patterns, no? Do you remember anything else? Thanks, Richard. Jakub
[PATCH][AArch64 array_mode 4/8] Remove EImode
This removes EImode from the (AArch64) compiler, and all mention of or support for it. bootstrapped and check-gcc on aarch64-none-linux-gnu gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update comment. * config/aarch64/aarch64-builtins.c (ei_UP, aarch64_simd_intEI_type_node): Remove. (aarch64_simd_builtin_std_type): Remove EImode case. (aarch64_init_simd_builtin_types): Don't create/add intEI_type_node. * config/aarch64/aarch64-modes.def: Remove EImode. --- gcc/config/aarch64/aarch64-builtins.c | 8 gcc/config/aarch64/aarch64-modes.def | 5 ++--- gcc/config/aarch64/aarch64.c | 2 +- 3 files changed, 3 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 294bf9d..9c8ca3b 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -73,7 +73,6 @@ #define v2di_UP V2DImode #define v2df_UP V2DFmode #define ti_UP TImode -#define ei_UP EImode #define oi_UP OImode #define ci_UP CImode #define xi_UP XImode @@ -435,7 +434,6 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = { #undef ENTRY static tree aarch64_simd_intOI_type_node = NULL_TREE; -static tree aarch64_simd_intEI_type_node = NULL_TREE; static tree aarch64_simd_intCI_type_node = NULL_TREE; static tree aarch64_simd_intXI_type_node = NULL_TREE; @@ -509,8 +507,6 @@ aarch64_simd_builtin_std_type (enum machine_mode mode, return QUAL_TYPE (TI); case OImode: return aarch64_simd_intOI_type_node; -case EImode: - return aarch64_simd_intEI_type_node; case CImode: return aarch64_simd_intCI_type_node; case XImode: @@ -623,15 +619,11 @@ aarch64_init_simd_builtin_types (void) #define AARCH64_BUILD_SIGNED_TYPE(mode) \ make_signed_type (GET_MODE_PRECISION (mode)); aarch64_simd_intOI_type_node = AARCH64_BUILD_SIGNED_TYPE (OImode); - aarch64_simd_intEI_type_node = AARCH64_BUILD_SIGNED_TYPE (EImode); aarch64_simd_intCI_type_node = AARCH64_BUILD_SIGNED_TYPE (CImode); 
aarch64_simd_intXI_type_node = AARCH64_BUILD_SIGNED_TYPE (XImode); #undef AARCH64_BUILD_SIGNED_TYPE tdecl = add_builtin_type - ("__builtin_aarch64_simd_ei", aarch64_simd_intEI_type_node); - TYPE_NAME (aarch64_simd_intEI_type_node) = tdecl; - tdecl = add_builtin_type ("__builtin_aarch64_simd_oi", aarch64_simd_intOI_type_node); TYPE_NAME (aarch64_simd_intOI_type_node) = tdecl; tdecl = add_builtin_type diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index b17b90d..653bd00 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -46,9 +46,8 @@ VECTOR_MODE (FLOAT, DF, 1); /* V1DF. */ /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments. */ INT_MODE (OI, 32); -/* Opaque integer modes for 3, 6 or 8 Neon double registers (2 is - TImode). */ -INT_MODE (EI, 24); +/* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers + (2 d-regs = 1 q-reg = TImode). */ INT_MODE (CI, 48); INT_MODE (XI, 64); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 020f63c..a923b55 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9305,7 +9305,7 @@ aarch64_simd_attr_length_move (rtx_insn *insn) } /* Compute and return the length of aarch64_simd_reglist<mode>, where mode is - one of VSTRUCT modes: OI, CI, EI, or XI. */ + one of VSTRUCT modes: OI, CI, or XI. */ int aarch64_simd_attr_length_rglist (enum machine_mode mode) { -- 1.8.3
[PATCH][AArch64 array_mode 2/8] Remove VSTRUCT_DREG, use BLKmode for d-reg aarch64_st/ld expands
aarch64_stVSTRUCT:nregsVDC:mode and aarch64_ldVSTRUCT:nregsVDC:mode expanders back onto 12 insns aarch64_{ld,st}{2,3,4}mode_dreg (for VD and DX modes), using the VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory transferred. Instead, use BLKmode for all memory transfers, explicitly setting mem_size. Bootstrapped and check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_ld2mode_dreg VD DX, aarch64_st2mode_dreg VD DX ): Change all TImode operands to BLKmode. (aarch64_ld3mode_dreg VD DX, aarch64_st3mode_dreg VD DX): Change all EImode operands to BLKmode. (aarch64_ld4mode_dreg VD DX, aarch64_st4mode_dreg VD DX): Change all OImode operands to BLKmode. (aarch64_ldVSTRUCT:nregsVDC:mode, aarch64_stVSTRUCT:nregsVDC:mode): Generate MEM rtx with BLKmode and call set_mem_size. * config/aarch64/iterators.md (VSTRUCT_DREG): Remove. --- gcc/config/aarch64/aarch64-simd.md | 44 +++--- gcc/config/aarch64/iterators.md| 2 -- 2 files changed, 22 insertions(+), 24 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3796386..7b7a1b8 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4393,7 +4393,7 @@ (subreg:OI (vec_concat:VRL2 (vec_concat:VDBL -(unspec:VD [(match_operand:TI 1 aarch64_simd_struct_operand Utv)] +(unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD2) (vec_duplicate:VD (const_int 0))) (vec_concat:VDBL @@ -4410,7 +4410,7 @@ (subreg:OI (vec_concat:VRL2 (vec_concat:VDBL -(unspec:DX [(match_operand:TI 1 aarch64_simd_struct_operand Utv)] +(unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD2) (const_int 0)) (vec_concat:VDBL @@ -4428,7 +4428,7 @@ (vec_concat:VRL3 (vec_concat:VRL2 (vec_concat:VDBL -(unspec:VD [(match_operand:EI 1 aarch64_simd_struct_operand Utv)] +(unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD3) (vec_duplicate:VD (const_int 0))) 
(vec_concat:VDBL @@ -4450,7 +4450,7 @@ (vec_concat:VRL3 (vec_concat:VRL2 (vec_concat:VDBL -(unspec:DX [(match_operand:EI 1 aarch64_simd_struct_operand Utv)] +(unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD3) (const_int 0)) (vec_concat:VDBL @@ -4472,7 +4472,7 @@ (vec_concat:VRL4 (vec_concat:VRL2 (vec_concat:VDBL - (unspec:VD [(match_operand:OI 1 aarch64_simd_struct_operand Utv)] + (unspec:VD [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD4) (vec_duplicate:VD (const_int 0))) (vec_concat:VDBL @@ -4499,7 +4499,7 @@ (vec_concat:VRL4 (vec_concat:VRL2 (vec_concat:VDBL - (unspec:DX [(match_operand:OI 1 aarch64_simd_struct_operand Utv)] + (unspec:DX [(match_operand:BLK 1 aarch64_simd_struct_operand Utv)] UNSPEC_LD4) (const_int 0)) (vec_concat:VDBL @@ -4526,8 +4526,8 @@ (unspec:VDC [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] TARGET_SIMD { - machine_mode mode = VSTRUCT:VSTRUCT_DREGmode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, VSTRUCT:nregs * 8); emit_insn (gen_aarch64_ldVSTRUCT:nregsVDC:mode_dreg (operands[0], mem)); DONE; @@ -4765,8 +4765,8 @@ ) (define_insn aarch64_st2mode_dreg - [(set (match_operand:TI 0 aarch64_simd_struct_operand =Utv) - (unspec:TI [(match_operand:OI 1 register_operand w) + [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv) + (unspec:BLK [(match_operand:OI 1 register_operand w) (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST2))] TARGET_SIMD @@ -4775,8 +4775,8 @@ ) (define_insn aarch64_st2mode_dreg - [(set (match_operand:TI 0 aarch64_simd_struct_operand =Utv) - (unspec:TI [(match_operand:OI 1 register_operand w) + [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv) + (unspec:BLK [(match_operand:OI 1 register_operand w) (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST2))] TARGET_SIMD @@ -4785,8 +4785,8 @@ ) (define_insn aarch64_st3mode_dreg - [(set (match_operand:EI 0 aarch64_simd_struct_operand =Utv) - 
(unspec:EI [(match_operand:CI 1 register_operand w) +
[PATCH][AArch64 array_mode 6/8] Remove V_TWO_ELEM, again using BLKmode + set_mem_size.
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow the same pattern. bootstrapped and check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_ld2rmode, aarch64_vec_load_lanesoi_lanemode, aarch64_vec_store_lanesoi_lanemode): Change operand mode from V_TWO_ELEM to BLK. (aarch64_ld2rmode, aarch64_ld2_lanemode, aarch64_st2_laneVQ:mode): Generate MEM rtx with BLKmode, call set_mem_size. * config/aarch64/iterators.md (V_TWO_ELEM): Remove. --- gcc/config/aarch64/aarch64-simd.md | 21 +++-- gcc/config/aarch64/iterators.md| 9 - 2 files changed, 11 insertions(+), 19 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 68182d6..f938754 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3906,7 +3906,7 @@ (define_insn aarch64_simd_ld2rmode [(set (match_operand:OI 0 register_operand =w) - (unspec:OI [(match_operand:V_TWO_ELEM 1 aarch64_simd_struct_operand Utv) + (unspec:OI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv) (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] UNSPEC_LD2_DUP))] TARGET_SIMD @@ -3916,7 +3916,7 @@ (define_insn aarch64_vec_load_lanesoi_lanemode [(set (match_operand:OI 0 register_operand =w) - (unspec:OI [(match_operand:V_TWO_ELEM 1 aarch64_simd_struct_operand Utv) + (unspec:OI [(match_operand:BLK 1 aarch64_simd_struct_operand Utv) (match_operand:OI 2 register_operand 0) (match_operand:SI 3 immediate_operand i) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] @@ -3957,8 +3957,8 @@ ;; RTL uses GCC vector extension indices, so flip only for assembly. 
(define_insn aarch64_vec_store_lanesoi_lanemode - [(set (match_operand:V_TWO_ELEM 0 aarch64_simd_struct_operand =Utv) - (unspec:V_TWO_ELEM [(match_operand:OI 1 register_operand w) + [(set (match_operand:BLK 0 aarch64_simd_struct_operand =Utv) + (unspec:BLK [(match_operand:OI 1 register_operand w) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) (match_operand:SI 2 immediate_operand i)] UNSPEC_ST2_LANE))] @@ -4355,8 +4355,8 @@ (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] TARGET_SIMD { - machine_mode mode = V_TWO_ELEMmode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2); emit_insn (gen_aarch64_simd_ld2rmode (operands[0], mem)); DONE; @@ -4569,8 +4569,8 @@ (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] TARGET_SIMD { - machine_mode mode = V_TWO_ELEMmode; - rtx mem = gen_rtx_MEM (mode, operands[1]); + rtx mem = gen_rtx_MEM (BLKmode, operands[1]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2); aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode), NULL); @@ -4857,8 +4857,9 @@ (match_operand:SI 2 immediate_operand)] TARGET_SIMD { - machine_mode mode = V_TWO_ELEMmode; - rtx mem = gen_rtx_MEM (mode, operands[0]); + rtx mem = gen_rtx_MEM (BLKmode, operands[0]); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2); + operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2]))); emit_insn (gen_aarch64_vec_store_lanesoi_laneVQ:mode (mem, diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 9535b7f..2a99e10 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -559,15 +559,6 @@ (V4SI V16SI) (V4SF V16SF) (V2DI V8DI) (V2DF V8DF)]) -;; Mode of pair of elements for each vector mode, to define transfer -;; size for structure lane/dup loads and stores. 
-(define_mode_attr V_TWO_ELEM [(V8QI HI) (V16QI HI) - (V4HI SI) (V8HI SI) - (V2SI V2SI) (V4SI V2SI) - (DI V2DI) (V2DI V2DI) - (V2SF V2SF) (V4SF V2SF) - (DF V2DI) (V2DF V2DI)]) - ;; Mode for atomic operation suffixes (define_mode_attr atomic_sfx [(QI b) (HI h) (SI ) (DI )]) -- 1.8.3
[PATCH][AArch64 0/8] Add D-registers to TARGET_ARRAY_MODE_SUPPORTED_P
The end goal of this series of patches is to enable 64-bit vector modes for TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so causes ICEs with illegal subregs (e.g. returning the middle bits from a large int mode covering 3 vectors); the patchset avoids these by first removing EImode (192 bits = 24 bytes = 1.5 vector registers), which is currently used for 24-byte quantities transferred to/from memory by some {ld,st}3_lane intrinsics. There is no real need to use EImode here; its only real purpose is that it has size 24 bytes, so we can use BLKmode instead as long as we explicitly set the size. Patches 5-6 extend the same BLKmode treatment to {ld,st}{2,4}, allowing all the expander patterns to be combined in patch 7; these are not essential to the end goal but it seemed good to be consistent. Patch 1 is a drive-by, and stands in its own right.
[PATCH][AArch64 array_mode 1/8] Rename vec_store_lanesmode_lane to aarch64_vec_store_lanesmode_lane
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern names, paralleling aarch64_vec_load_lanesmode_lane. bootstrapped and check-gcc on aarch64-none-linux-gnu gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_store_lanesoi_lanemode): Rename to... (aarch64_vec_store_lanesoi_lanemode): ...this. (vec_store_lanesci_lanemode): Rename to... (aarch64_vec_store_lanesci_lanemode): ...this. (vec_store_lanesxi_lanemode): Rename to... (aarch64_vec_store_lanesxi_lanemode): ...this. (aarch64_st2_laneVQ:mode, aarch64_st3_laneVQ:mode, aarch64_st4_laneVQ:mode): Follow renaming. --- gcc/config/aarch64/aarch64-simd.md | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index b90f938..3796386 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3956,7 +3956,7 @@ ) ;; RTL uses GCC vector extension indices, so flip only for assembly. -(define_insn vec_store_lanesoi_lanemode +(define_insn aarch64_vec_store_lanesoi_lanemode [(set (match_operand:V_TWO_ELEM 0 aarch64_simd_struct_operand =Utv) (unspec:V_TWO_ELEM [(match_operand:OI 1 register_operand w) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) @@ -4051,7 +4051,7 @@ ) ;; RTL uses GCC vector extension indices, so flip only for assembly. -(define_insn vec_store_lanesci_lanemode +(define_insn aarch64_vec_store_lanesci_lanemode [(set (match_operand:V_THREE_ELEM 0 aarch64_simd_struct_operand =Utv) (unspec:V_THREE_ELEM [(match_operand:CI 1 register_operand w) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) @@ -4146,7 +4146,7 @@ ) ;; RTL uses GCC vector extension indices, so flip only for assembly. 
-(define_insn vec_store_lanesxi_lanemode +(define_insn aarch64_vec_store_lanesxi_lanemode [(set (match_operand:V_FOUR_ELEM 0 aarch64_simd_struct_operand =Utv) (unspec:V_FOUR_ELEM [(match_operand:XI 1 register_operand w) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) @@ -4861,9 +4861,9 @@ rtx mem = gen_rtx_MEM (mode, operands[0]); operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2]))); - emit_insn (gen_vec_store_lanesoi_laneVQ:mode (mem, - operands[1], - operands[2])); + emit_insn (gen_aarch64_vec_store_lanesoi_laneVQ:mode (mem, + operands[1], + operands[2])); DONE; }) @@ -4878,9 +4878,9 @@ rtx mem = gen_rtx_MEM (mode, operands[0]); operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2]))); - emit_insn (gen_vec_store_lanesci_laneVQ:mode (mem, - operands[1], - operands[2])); + emit_insn (gen_aarch64_vec_store_lanesci_laneVQ:mode (mem, + operands[1], + operands[2])); DONE; }) @@ -4895,9 +4895,9 @@ rtx mem = gen_rtx_MEM (mode, operands[0]); operands[2] = GEN_INT (ENDIAN_LANE_N (MODEmode, INTVAL (operands[2]))); - emit_insn (gen_vec_store_lanesxi_laneVQ:mode (mem, - operands[1], - operands[2])); + emit_insn (gen_aarch64_vec_store_lanesxi_laneVQ:mode (mem, + operands[1], + operands[2])); DONE; }) -- 1.8.3
[PATCH][AArch64 array_mode 7/8] Combine the expanders using VSTRUCT:nregs
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders all nearly identical, so we can easily parameterize across the number of lanes and combine them. For the ldVSTRUCT:nregs_lane pattern, I switched from the VCONQ attribute to just using the MODE attribute, this is identical for all the Q-register modes over which we iterate. bootstrapped and check-gcc on aarch64-none-linux-gnu gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_ld2rmode, aarch64_ld3rmode, aarch64_ld4rmode): Combine together, making... (aarch64_simd_ldVSTRUCT:nregsrVALLDIF:mode): ...this. (aarch64_ld2_lanemode, aarch64_ld3_lanemode, aarch64_ld4_lanemode): Combine together, making... (aarch64_ldVSTRUCT:nregs_laneVQ:mode): ...this. (aarch64_st2_laneVQ:mode, aarch64_st3_laneVQ:mode, aarch64_st4_laneVQ:mode): Combine together, making... (aarch64_stVSTRUCT:nregs_laneVQ:mode): ...this. --- gcc/config/aarch64/aarch64-simd.md | 144 ++--- 1 file changed, 21 insertions(+), 123 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f938754..38c4210 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4349,42 +4349,18 @@ FAIL; }) -(define_expand aarch64_ld2rmode - [(match_operand:OI 0 register_operand =w) +(define_expand aarch64_ldVSTRUCT:nregsrVALLDIF:mode + [(match_operand:VSTRUCT 0 register_operand =w) (match_operand:DI 1 register_operand w) (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] TARGET_SIMD { rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (VALLDIF:MODEmode)) +* VSTRUCT:nregs); - emit_insn (gen_aarch64_simd_ld2rmode (operands[0], mem)); - DONE; -}) - -(define_expand aarch64_ld3rmode - [(match_operand:CI 0 register_operand =w) - (match_operand:DI 1 register_operand w) - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - TARGET_SIMD -{ - rtx mem = 
gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 3); - - emit_insn (gen_aarch64_simd_ld3rmode (operands[0], mem)); - DONE; -}) - -(define_expand aarch64_ld4rmode - [(match_operand:XI 0 register_operand =w) - (match_operand:DI 1 register_operand w) - (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - TARGET_SIMD -{ - rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 4); - - emit_insn (gen_aarch64_simd_ld4rmode (operands[0],mem)); + emit_insn (gen_aarch64_simd_ldVSTRUCT:nregsrVALLDIF:mode (operands[0], + mem)); DONE; }) @@ -4561,67 +4537,25 @@ DONE; }) -(define_expand aarch64_ld2_lanemode - [(match_operand:OI 0 register_operand =w) +(define_expand aarch64_ldVSTRUCT:nregs_laneVQ:mode + [(match_operand:VSTRUCT 0 register_operand =w) (match_operand:DI 1 register_operand w) - (match_operand:OI 2 register_operand 0) + (match_operand:VSTRUCT 2 register_operand 0) (match_operand:SI 3 immediate_operand i) (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] TARGET_SIMD { rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 2); + set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (VQ:MODEmode)) +* VSTRUCT:nregs); - aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode), + aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VQ:MODEmode), NULL); - emit_insn (gen_aarch64_vec_load_lanesoi_lanemode (operands[0], - mem, - operands[2], - operands[3])); + emit_insn (gen_aarch64_vec_load_lanesVSTRUCT:mode_laneVQ:mode ( + operands[0], mem, operands[2], operands[3])); DONE; }) -(define_expand aarch64_ld3_lanemode - [(match_operand:CI 0 register_operand =w) - (match_operand:DI 1 register_operand w) - (match_operand:CI 2 register_operand 0) - (match_operand:SI 3 immediate_operand i) - (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - TARGET_SIMD -{ - rtx mem = gen_rtx_MEM (BLKmode, operands[1]); - 
set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (MODEmode)) * 3); - - aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (VCONQmode), - NULL); - emit_insn (gen_aarch64_vec_load_lanesci_lanemode (operands[0], - mem, - operands[2], - operands[3])); - DONE;
[gomp4] teach the tracer pass to ignore more blocks for OpenACC
I hit a problem in one of my reduction test cases where the GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be single-entry, single-exit regions, or some form of thread divergence may occur. When that happens, we cannot use the shfl instruction for reductions or broadcasting (if the warp is divergent), and it may cause problems with synchronization in general. Nathan ran into a similar problem in one of the ssa passes when he added support for predication in the nvptx backend. Part of his solution was to add a gimple_call_internal_unique_p function to determine if internal functions are safe to be cloned. This patch teaches the tracer to scan each basic block for internal function calls using gimple_call_internal_unique_p, and mark the blocks that contain certain OpenACC internal function calls as ignored. It is a shame that gimple_statement_iterators do not play nicely with const_basic_block. Is this patch ok for gomp-4_0-branch? Cesar 2015-08-25 Cesar Philippidis ce...@codesourcery.com gcc/ * tracer.c (ignore_bb_p): Change bb argument from const_basic_block to basic_block. Check for non-clonable calls to internal functions. diff --git a/gcc/tracer.c b/gcc/tracer.c index cad7ab1..f20c158 100644 --- a/gcc/tracer.c +++ b/gcc/tracer.c @@ -58,7 +58,7 @@ #include "fibonacci_heap.h" static int count_insns (basic_block); -static bool ignore_bb_p (const_basic_block); +static bool ignore_bb_p (basic_block); static bool better_p (const_edge, const_edge); static edge find_best_successor (basic_block); static edge find_best_predecessor (basic_block); @@ -91,8 +91,9 @@ bb_seen_p (basic_block bb) /* Return true if we should ignore the basic block for purposes of tracing. 
*/ static bool -ignore_bb_p (const_basic_block bb) +ignore_bb_p (basic_block bb) { + gimple_stmt_iterator gsi; gimple g; if (bb->index < NUM_FIXED_BLOCKS) return true; @@ -106,6 +107,16 @@ ignore_bb_p (const_basic_block bb) if (g && gimple_code (g) == GIMPLE_TRANSACTION) return true; + /* Ignore blocks containing non-clonable function calls. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + g = gsi_stmt (gsi); + + if (is_gimple_call (g) && gimple_call_internal_p (g) + && gimple_call_internal_unique_p (g)) + return true; +} + return false; }
Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack
I am working on a new patch to address some of the previous concerns and plan to post it soon after some final testing. On 08/25/2015 05:51 PM, Ian Lance Taylor wrote: On Tue, Aug 18, 2015 at 1:36 PM, Lynn A. Boger labo...@linux.vnet.ibm.com wrote: libgo/ PR target/66870 configure.ac: When gccgo for building libgo uses the gold version containing split stack support on ppc64, ppc64le, define LINKER_SUPPORTS_SPLIT_STACK. configure: Regenerate. Your version test for gold isn't robust: if the major version >= 3, then presumably split stack is supported. And since you have numbers, I suggest not trying to use a switch, but instead writing something like if expr $gold_minor == 25; then ... elif expr $gold_minor > 25; then ... fi If that is fixed, I'm fine with the libgo part of this patch. Ian
[AArch64/testsuite] Add more TLS local executable testcases
This patch covers tlsle tiny model tests; TLS size truncation tests for the tiny and small models are included as well. All testcases pass native testing. OK for trunk? 2015-08-26 Jiong Wang jiong.w...@arm.com gcc/testsuite/ * gcc.target/aarch64/tlsle12_tiny_1.c: New testcase for tiny model. * gcc.target/aarch64/tlsle24_tiny_1.c: Likewise. * gcc.target/aarch64/tlsle_sizeadj_tiny_1.c: TLS size truncation test for tiny model. * gcc.target/aarch64/tlsle_sizeadj_small_1.c: TLS size truncation test for small model. -- Regards, Jiong Index: gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c === --- gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/tlsle12_tiny_1.c (working copy) @@ -0,0 +1,8 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -O2 -fpic -ftls-model=local-exec -mtls-size=12 -mcmodel=tiny --save-temps } */ + +#include tls_1.x + +/* { dg-final { scan-assembler-times #:tprel_lo12 2 } } */ +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c === --- gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/tlsle24_tiny_1.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -O2 -fpic -ftls-model=local-exec -mtls-size=24 -mcmodel=tiny --save-temps } */ + +#include tls_1.x + +/* { dg-final { scan-assembler-times #:tprel_lo12_nc 2 } } */ +/* { dg-final { scan-assembler-times #:tprel_hi12 2 } } */ +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c === --- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_small_1.c (working copy) @@ -0,0 +1,10 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-require-effective-target aarch64_tlsle32 } */ +/* { dg-options -O2 -fpic -ftls-model=local-exec 
-mtls-size=48 --save-temps } */ + +#include tls_1.x + +/* { dg-final { scan-assembler-times #:tprel_g1 2 } } */ +/* { dg-final { scan-assembler-times #:tprel_g0_nc 2 } } */ +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c === --- gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/tlsle_sizeadj_tiny_1.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -O2 -fpic -ftls-model=local-exec -mtls-size=32 -mcmodel=tiny --save-temps } */ + +#include tls_1.x + +/* { dg-final { scan-assembler-times #:tprel_lo12_nc 2 } } */ +/* { dg-final { scan-assembler-times #:tprel_hi12 2 } } */ +/* { dg-final { cleanup-saved-temps } } */
[patch] libstdc++/64351 Ensure std::generate_canonical doesn't return 1.
Ed posted this patch to https://gcc.gnu.org/PR64351 in January, I've tested it and am committing it to trunk with a test. commit 45f154a5f9172a17f6226b99b41cb9c0bd8d15ec Author: Jonathan Wakely jwak...@redhat.com Date: Wed Aug 26 12:53:08 2015 +0100 Ensure std::generate_canonical doesn't return 1. 2015-08-26 Edward Smith-Rowland 3dw...@verizon.net Jonathan Wakely jwak...@redhat.com PR libstdc++/64351 PR libstdc++/63176 * include/bits/random.tcc (generate_canonical): Loop until we get a result less than one. * testsuite/26_numerics/random/uniform_real_distribution/operators/ 64351.cc: New. diff --git a/libstdc++-v3/include/bits/random.tcc b/libstdc++-v3/include/bits/random.tcc index 4fdbcfc..a6d966b 100644 --- a/libstdc++-v3/include/bits/random.tcc +++ b/libstdc++-v3/include/bits/random.tcc @@ -3472,15 +3472,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const long double __r = static_castlong double(__urng.max()) - static_castlong double(__urng.min()) + 1.0L; const size_t __log2r = std::log(__r) / std::log(2.0L); - size_t __k = std::maxsize_t(1UL, (__b + __log2r - 1UL) / __log2r); - _RealType __sum = _RealType(0); - _RealType __tmp = _RealType(1); - for (; __k != 0; --__k) + const size_t __m = std::maxsize_t(1UL, + (__b + __log2r - 1UL) / __log2r); + _RealType __ret; + do { - __sum += _RealType(__urng() - __urng.min()) * __tmp; - __tmp *= __r; + _RealType __sum = _RealType(0); + _RealType __tmp = _RealType(1); + for (size_t __k = __m; __k != 0; --__k) + { + __sum += _RealType(__urng() - __urng.min()) * __tmp; + __tmp *= __r; + } + __ret = __sum / __tmp; } - return __sum / __tmp; + while (__builtin_expect(__ret = _RealType(1), 0)); + return __ret; } _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc new file mode 100644 index 000..3de4412 --- /dev/null +++ 
b/libstdc++-v3/testsuite/26_numerics/random/uniform_real_distribution/operators/64351.cc @@ -0,0 +1,57 @@ +// Copyright (C) 2015 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// http://www.gnu.org/licenses/. + +// { dg-options -std=gnu++11 } +// { dg-do run { target { ! simulator } } } + +#include random +#include testsuite_hooks.h + +// libstdc++/64351 +void +test01() +{ + std::mt19937 rng(8890); + std::uniform_real_distributionfloat dist; + + rng.discard(30e6); + for (long i = 0; i 10e6; ++i) +{ + auto n = dist(rng); + VERIFY( n != 1.f ); +} +} + +// libstdc++/63176 +void +test02() +{ + std::mt19937 rng(8890); + std::seed_seq sequence{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; + rng.seed(sequence); + rng.discard(12 * 629143 + 6); + float n = +std::generate_canonicalfloat, std::numeric_limitsfloat::digits(rng); + VERIFY( n != 1.f ); +} + +int +main() +{ + test01(); + test02(); +}
Re: [PATCH][4/N] Introduce new inline functions for GET_MODE_UNIT_SIZE and GET_MODE_UNIT_PRECISION
On 26 Aug 2015, at 23:27, Oleg Endo oleg.e...@t-online.de wrote: On 19 Aug 2015, at 22:35, Jeff Law l...@redhat.com wrote: On 08/19/2015 06:29 AM, David Sherwood wrote: I asked Richard S. to give this a once-over which he did. However, he technically can't approve due to the way his maintainership position was worded. The one request would be a function comment for emit_mode_unit_size and emit_mode_unit_precision. OK with that change. Thanks. Here's a new patch with the comments added. Good to go? David. ChangeLog: 2015-08-19 David Sherwood david.sherw...@arm.com gcc/ * genmodes.c (emit_mode_unit_size_inline): New function. (emit_mode_unit_precision_inline): New function. (emit_insn_modes_h): Emit new #define. Emit new functions. (emit_mode_unit_size): New function. (emit_mode_unit_precision): New function. (emit_mode_adjustments): Add mode_unit_size adjustments. (emit_insn_modes_c): Emit new arrays. * machmode.h (GET_MODE_UNIT_SIZE, GET_MODE_UNIT_PRECISION): Update to use new inline methods. Thanks, this is OK for the trunk. It seems this broke sh-elf, at least when compiling on OSX with its native clang. ../../gcc-trunk/gcc/machmode.h:228:43: error: redefinition of 'mode_unit_size' with a different type: 'const unsigned char [56]' vs 'unsigned char [56]' extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES]; ^ ./insn-modes.h:417:24: note: previous definition is here extern unsigned char mode_unit_size[NUM_MACHINE_MODES]; ^ The following fixes the problem for me: Index: gcc/genmodes.c === --- gcc/genmodes.c (revision 227221) +++ gcc/genmodes.c (working copy) @@ -1063,7 +1063,7 @@ unsigned char\n\ mode_unit_size_inline (machine_mode mode)\n\ {\n\ - extern unsigned char mode_unit_size[NUM_MACHINE_MODES];\n\ + extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];\n\ switch (mode)\n\ {); Cheers, Oleg
[PATCH] s390: Add emit_barrier() after trap.
This patch fixes an ICE on S390 when a trap is generated because the given -mstack-size is too small. A barrier was missing after the trap, so on higher optimization levels a NULL pointer from an uninitialized basic block was used. The patch also contains a test case. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog * config/s390/s390.c (s390_emit_prologue): Add emit_barrier() after trap to fix ICE. gcc/testsuite/ChangeLog * gcc.target/s390/20150826-1.c: New test. From ec6b88cd51234d138bd559271def086156fcae07 Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Wed, 26 Aug 2015 14:37:00 +0100 Subject: [PATCH] s390: Add emit_barrier() after trap. --- gcc/config/s390/s390.c | 1 + gcc/testsuite/gcc.target/s390/20150826-1.c | 11 +++ 2 files changed, 12 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/20150826-1.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6366691..5951598 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -10491,6 +10491,7 @@ s390_emit_prologue (void) current_function_name(), cfun_frame_layout.frame_size, s390_stack_size); emit_insn (gen_trap ()); + emit_barrier (); } else { diff --git a/gcc/testsuite/gcc.target/s390/20150826-1.c b/gcc/testsuite/gcc.target/s390/20150826-1.c new file mode 100644 index 000..830772f --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/20150826-1.c @@ -0,0 +1,11 @@ +/* Check that -mstack-size=32 does not cause an ICE. */ + +/* { dg-do compile } */ +/* { dg-options "-O3 -mstack-size=32 -Wno-pointer-to-int-cast" } */ + +extern char* bar(char *); +int foo(void) +{ + char b[100]; + return (int)bar(b); +} /* { dg-warning "An unconditional trap is added" } */ -- 2.3.0
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 4:38 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-26 16:02 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Fri, Aug 21, 2015 at 2:17 PM, Ilya Enkovich enkovich@gmail.com wrote: 2015-08-21 14:00 GMT+03:00 Richard Biener richard.guent...@gmail.com: Hmm, I don't see how vector masks are more difficult to operate with. There are just no instructions for that but you have to pretend you have to get code vectorized. Huh? Bitwise ops should be readily available. Right bitwise ops are available, but there is no comparison into a vector and no masked loads and stores using vector masks (when we speak about 512-bit vectors). Also according to vector ABI integer mask should be used for mask operand in case of masked vector call. What ABI? The function signature of the intrinsics? How would that come into play here? Not intrinsics. I mean OpenMP vector functions which require integer arg for a mask in case of 512-bit vector. How do you declare those? Something like this: #pragma omp declare simd inbranch int foo(int*); The 'inbranch' is the thing that matters? And all of foo is then implicitely predicated? Current implementation of masked loads, masked stores and bool patterns in vectorizer just reflect SSE4 and AVX. Can (and should) we really call it a canonical representation for all targets? No idea - we'll revisit when another targets adds a similar capability. AVX-512 is such target. Current representation forces multiple scalar mask - vector mask and back transformations which are artificially introduced by current bool patterns and are hard to optimize out. I dislike the bool patterns anyway and we should try to remove those and make the vectorizer handle them in other ways (they have single-use issues anyway). I don't remember exactly what caused us to add them but one reason was there wasn't a vector type for 'bool' (but I don't see how it should be necessary to ask get me a vector type for 'bool'). 
Using scalar masks everywhere should probably cause the same conversion problem for SSE I listed above though. Talking about a canonical representation, shouldn't we use some special masks representation and not mixing it with integer and vector of integers then? Only in this case target would be able to efficiently expand it into a corresponding rtl. That was my idea of vectorbool ... but I didn't explore it and see where it will cause issues. Fact is GCC already copes with vector masks generated by vector compares just fine everywhere and I'd rather leave it as that. Nope. Currently vector mask is obtained from a vec_cond A op B, {0 .. 0}, {-1 .. -1}. AND and IOR on bools are also expressed via additional vec_cond. I don't think vectorizer ever generates vector comparison. Ok, well that's an implementation detail then. Are you sure about AND and IOR? The comment above vect_recog_bool_pattern says Assuming size of TYPE is the same as size of all comparisons (otherwise some casts would be added where needed), the above sequence we create related pattern stmts: S1' a_T = x1 CMP1 y1 ? 1 : 0; S3' c_T = x2 CMP2 y2 ? a_T : 0; S4' d_T = x3 CMP3 y3 ? 1 : 0; S5' e_T = c_T | d_T; S6' f_T = e_T; thus has vector mask | I think in practice it would look like: S4' d_T = x3 CMP3 y3 ? 1 : c_T; Thus everything is usually hidden in vec_cond. But my concern is mostly about types used for that. And I wouldn't say it's fine 'everywhere' because there is a single target utilizing them. Masked loads and stored for AVX-512 just don't work now. And if we extend existing MASK_LOAD and MASK_STORE optabs to 512-bit vector then we get an ugly inefficient code. The question is where to fight with this inefficiency: in RTL or in GIMPLE. I want to fight with it where it appears, i.e. in GIMPLE by preventing bool - int conversions applied everywhere even if target doesn't need it. 
If we don't want to support both types of masks in GIMPLE then it's more reasonable to make the bool -> int conversion in expand for targets requiring it, rather than do it for everyone and then leave it to the target to transform it back and try to get rid of all those redundant transformations. I'd give vector<bool> a chance to become a canonical mask representation for that. Well, you are missing the case of bool b = a < b; int x = (int)b; This case seems to require no changes and just be transformed into vec_cond. Ok, the example was too simple, but I meant that a bool has a non-conditional use. Ok, so I still believe we don't want two ways to express things on GIMPLE if possible. Yes, the vectorizer already creates only vector stmts that are supported by the hardware. So it's a matter of deciding on the GIMPLE representation for the mask. I'd rather use vector<bool> (and the target assigning an integer mode to it) than an 'int'
Re: [Scalar masks 2/x] Use bool masks in if-conversion
On Wed, Aug 26, 2015 at 04:56:23PM +0200, Richard Biener wrote: How do you declare those? Something like this: #pragma omp declare simd inbranch int foo(int*); The 'inbranch' is the thing that matters? And all of foo is then implicitly predicated? If it is #pragma omp declare simd notinbranch, then only the non-predicated version is emitted and thus it is usable only in vectorized loops inside of non-conditional contexts. If it is #pragma omp declare simd inbranch, then only the predicated version is emitted; there is an extra argument (either V*QI if I remember well, or for AVX-512 a short/int/long bitmask), and if the caller wants to use it in non-conditional contexts, it just passes an all-ones mask. For #pragma omp declare simd (neither inbranch nor notinbranch), two versions are emitted, one predicated and one non-predicated. Jakub
[PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran)
Hans-Peter Nilsson wrote: I don't feel very confused, but I understand you've investigated things down to a point where we can conclude that libtool can't do what SPU needs without also at least fiddling with compilation options. Well, looks like I was confused after all. I missed one extra feature of libtool that does indeed just make everything work automatically: if a library is set up using the noinst flag, libtool considers it a convenience library and will never create a shared library in any case; but it will create two sets of object files, one suitable for linking into a static library and one suitable for linking into a shared library, and will automatically use the correct set when linking any other library against the convenience library. This is exactly what we want to happen for libbacktrace. And in fact, it is *already* set up as a convenience library: noinst_LTLIBRARIES = libbacktrace.la This means the only thing we need to do is simply remove all the special code: no more disable-shared and no more fiddling with -fPIC (except for the --enable-host-shared case, which remains special just like it does in all other libraries). I've verified that this works on x86_64: the resulting libgfortran.so uses the -fPIC version of the libbacktrace object, while libgfortran.a uses the non-PIC versions. On SPU, libtool will now automatically only generate the non-PIC versions since the target does not support shared libraries. So everything works as expected. OK for mainline? Bye, Ulrich Index: libbacktrace/configure.ac === --- libbacktrace/configure.ac (revision 227217) +++ libbacktrace/configure.ac (working copy) @@ -79,7 +79,7 @@ case $AWK in ) AC_MSG_ERROR([can't build without awk]) ;; esac -LT_INIT([disable-shared]) +LT_INIT AM_PROG_LIBTOOL backtrace_supported=yes @@ -161,22 +161,11 @@ else fi fi -# When building as a target library, shared libraries may want to link -# this in. We don't want to provide another shared library to -# complicate dependencies.
Instead, we just compile with -fPIC, if -# the target supports compiling with that option. -PIC_FLAG= -if test -n ${with_target_subdir}; then - ac_save_CFLAGS=$CFLAGS - CFLAGS=$CFLAGS -fPIC - AC_TRY_COMPILE([], [], [PIC_FLAG=-fPIC]) - CFLAGS=$ac_save_CFLAGS -fi -# Similarly, use -fPIC with --enable-host-shared: +# Enable --enable-host-shared. AC_ARG_ENABLE(host-shared, [AS_HELP_STRING([--enable-host-shared], [build host code as shared libraries])], -[PIC_FLAG=-fPIC], []) +[PIC_FLAG=-fPIC], [PIC_FLAG=]) AC_SUBST(PIC_FLAG) # Test for __sync support. -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
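For reference, the automake idiom the whole argument rests on looks roughly like this (a sketch, not the actual libbacktrace/Makefile.am; the consumer line is illustrative):

```makefile
# Convenience library: never installed, never a .so of its own.
# Libtool compiles each source twice (a PIC and a non-PIC object) and
# links the PIC set into shared consumers and the non-PIC set into
# static libraries, with no explicit -fPIC handling needed here.
noinst_LTLIBRARIES = libbacktrace.la
libbacktrace_la_SOURCES = backtrace.c   # plus the other sources

# A consumer just lists the .la file; libtool picks the right objects.
# (In libgfortran, roughly:
#   libgfortran_la_LIBADD = ../libbacktrace/libbacktrace.la)
```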
Re: [gomp4.1] comment some stuff
On Tue, Aug 25, 2015 at 10:35:37AM -0700, Aldy Hernandez wrote: diff --git a/libgomp/env.c b/libgomp/env.c index 65a6851..0569521 100644 --- a/libgomp/env.c +++ b/libgomp/env.c @@ -69,7 +69,7 @@ struct gomp_task_icv gomp_global_icv = { unsigned long gomp_max_active_levels_var = INT_MAX; bool gomp_cancel_var = false; -int gomp_max_task_priority_var = 0; +static int gomp_max_task_priority_var = 0; #ifndef HAVE_SYNC_BUILTINS gomp_mutex_t gomp_managed_threads_lock; #endif Please remove this hunk. The variable is meant to be used in task.c, where (void) priority; is present right now (like: if (priority > gomp_max_task_priority_var) priority = gomp_max_task_priority_var; or so). @@ -110,7 +112,12 @@ static void gomp_task_maybe_wait_for_dependencies (void **depend); /* Called when encountering an explicit task directive. If IF_CLAUSE is false, then we must not delay in executing the task. If UNTIED is true, - then the task may be executed by any member of the team. */ + then the task may be executed by any member of the team. + + DEPEND is an array containing: + depend[0]: number of depend elements. + depend[1]: number of depend elements of type out. + depend[N+2]: address of [0..N]th depend element. */ Either [1..N]th, or [0..N-1]th. And depend[N+2] should better be depend[2..N+1]. Otherwise LGTM. Jakub
Re: [gomp4] teach the tracer pass to ignore more blocks for OpenACC
On 08/26/15 09:57, Cesar Philippidis wrote: I hit a problem in on one of my reduction test cases where the GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be single-entry, single-exit regions, or some form of thread divergence may occur. When that happens, we cannot use the shfl instruction for reductions or broadcasting (if the warp is divergent), and it may cause problems with synchronization in general. Nathan ran into a similar problem in one of the ssa passes when he added support for predication in the nvptx backend. Part of his solution was to add a gimple_call_internal_unique_p function to determine if internal functions are safe to be cloned. This patch teaches the tracer to scan each basic block for internal function calls using gimple_call_internal_unique_p, and mark the blocks that contain certain OpenACC internal functions calls as ignored. It is a shame that gimple_statement_iterators do not play nicely with const_basic_block. Is this patch ok for gomp-4_0-branch? ok by me. (I idly wonder if tracer should be using the routine that jump-threading has for scanning a block determining duplicability) nathan -- Nathan Sidwell - Director, Sourcery Services - Mentor Embedded
[patch] libstdc++/66902 Make _S_debug_messages static.
This patch removes a public symbol from the .so, which is generally a bad thing, but there should be no users of this anywhere (it's never declared in any public header). For targets using symbol versioning this isn't exported at all, as it isn't in the linker script, so this really just makes other targets consistent with the ones using versioned symbols. Tested powerpc64le-linux and dragonfly-4.2, committed to trunk commit d35fbf8937930554af62a7320806abecf7381175 Author: Jonathan Wakely jwak...@redhat.com Date: Fri Jul 17 10:15:03 2015 +0100 libstdc++/66902 Make _S_debug_messages static. PR libstdc++/66902 * src/c++11/debug.cc (_S_debug_messages): Give internal linkage. diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc index 997c0f3..c435de7 100644 --- a/libstdc++-v3/src/c++11/debug.cc +++ b/libstdc++-v3/src/c++11/debug.cc @@ -103,7 +103,7 @@ namespace namespace __gnu_debug { - const char* _S_debug_messages[] = + static const char* _S_debug_messages[] = { // General Checks function requires a valid iterator range [%1.name;, %2.name;),
[v3 patch] try_emplace and insert_or_assign for Debug Mode.
These new members need to be defined in Debug Mode, because the iterators passed in as hints and returned as results need to be safe iterators. No new tests, because we already have tests for these members, and they're failing in debug mode. Tested powerpc64le-linux, committed to trunk. commit ae899df9056ff8a58d658ef42125935856503f96 Author: Jonathan Wakely jwak...@redhat.com Date: Wed Aug 26 21:24:30 2015 +0100 try_emplace and insert_or_assign for Debug Mode. * include/debug/map.h (map::try_emplace, map::insert_or_assign): Define. * include/debug/unordered_map (unordered_map::try_emplace, unordered_map::insert_or_assign): Define. diff --git a/libstdc++-v3/include/debug/map.h b/libstdc++-v3/include/debug/map.h index d45cf79..914d721 100644 --- a/libstdc++-v3/include/debug/map.h +++ b/libstdc++-v3/include/debug/map.h @@ -317,6 +317,89 @@ namespace __debug _Base::insert(__first, __last); } + +#if __cplusplus > 201402L + template<typename... _Args> pair<iterator, bool> try_emplace(const key_type& __k, _Args&&... __args) { auto __res = _Base::try_emplace(__k, std::forward<_Args>(__args)...); return { iterator(__res.first, this), __res.second }; } + template<typename... _Args> pair<iterator, bool> try_emplace(key_type&& __k, _Args&&... __args) { auto __res = _Base::try_emplace(std::move(__k), std::forward<_Args>(__args)...); return { iterator(__res.first, this), __res.second }; } + template<typename... _Args> iterator try_emplace(const_iterator __hint, const key_type& __k, _Args&&... __args) { __glibcxx_check_insert(__hint); return iterator(_Base::try_emplace(__hint.base(), __k, std::forward<_Args>(__args)...), this); } + template<typename... _Args> iterator try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args) { __glibcxx_check_insert(__hint); return iterator(_Base::try_emplace(__hint.base(), std::move(__k), std::forward<_Args>(__args)...), this); } + template<typename _Obj> std::pair<iterator, bool> insert_or_assign(const key_type& __k, _Obj&& __obj) { auto __res = _Base::insert_or_assign(__k, std::forward<_Obj>(__obj)); return { iterator(__res.first, this), __res.second }; } + template<typename _Obj> std::pair<iterator, bool> insert_or_assign(key_type&& __k, _Obj&& __obj) { auto __res = _Base::insert_or_assign(std::move(__k), std::forward<_Obj>(__obj)); return { iterator(__res.first, this), __res.second }; } + template<typename _Obj> iterator insert_or_assign(const_iterator __hint, const key_type& __k, _Obj&& __obj) { __glibcxx_check_insert(__hint); return iterator(_Base::insert_or_assign(__hint.base(), __k, std::forward<_Obj>(__obj)), this); } + template<typename _Obj> iterator insert_or_assign(const_iterator __hint, key_type&& __k, _Obj&& __obj) { __glibcxx_check_insert(__hint); return iterator(_Base::insert_or_assign(__hint.base(), std::move(__k), std::forward<_Obj>(__obj)), this); } +#endif + + #if __cplusplus >= 201103L iterator erase(const_iterator __position) diff --git a/libstdc++-v3/include/debug/unordered_map b/libstdc++-v3/include/debug/unordered_map index cc3bc3f..1bbdb61 100644 --- a/libstdc++-v3/include/debug/unordered_map +++ b/libstdc++-v3/include/debug/unordered_map @@ -377,6 +377,88 @@ namespace __debug _M_check_rehashed(__bucket_count); } +#if __cplusplus > 201402L + template<typename... _Args> pair<iterator, bool> try_emplace(const key_type& __k, _Args&&... __args) { auto __res = _Base::try_emplace(__k, std::forward<_Args>(__args)...); return { iterator(__res.first, this), __res.second }; } + template<typename... _Args> pair<iterator, bool> try_emplace(key_type&& __k, _Args&&... __args) { auto __res = _Base::try_emplace(std::move(__k), std::forward<_Args>(__args)...); return { iterator(__res.first, this), __res.second }; } + template<typename... _Args> iterator try_emplace(const_iterator __hint, const key_type& __k, _Args&&... __args) { __glibcxx_check_insert(__hint); return iterator(_Base::try_emplace(__hint.base(), __k, std::forward<_Args>(__args)...), this); } + template<typename... _Args> iterator try_emplace(const_iterator __hint, key_type&& __k, _Args&&... __args) { __glibcxx_check_insert(__hint); return iterator(_Base::try_emplace(__hint.base(), std::move(__k), std::forward<_Args>(__args)...), this); } + template<typename _Obj> pair<iterator, bool> +
Re: [gomp4] lowering OpenACC reductions
On 08/21/2015 02:00 PM, Cesar Philippidis wrote: This patch teaches omplower how to utilize the new OpenACC reduction framework described in Nathan's document, which was posted here https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01248.html. Here is the infrastructure patch https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01130.html, and here's the nvptx backend changes https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01334.html. The updated reduction tests have been posted here https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01561.html. All of these patches have been committed to gomp-4_0-branch. Cesar
Go patch committed: Don't crash on invalid builtin calls
This patch by Chris Manghane fixes the Go compiler to not crash when it sees invalid builtin calls. This fixes https://golang.org/issue/11544 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227227) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -cd5362c7bb0b207f484a8dfb8db229fd2bffef09 +5ee78e7d52a4cad0b23f5bc62e5b452489243c70 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 227227) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -6588,7 +6588,11 @@ Builtin_call_expression::Builtin_call_ex recover_arg_is_set_(false) { Func_expression* fnexp = this->fn()->func_expression(); - go_assert(fnexp != NULL); + if (fnexp == NULL) +{ + this->code_ = BUILTIN_INVALID; + return; +} const std::string name(fnexp->named_object()->name()); if (name == "append") this->code_ = BUILTIN_APPEND; @@ -6661,7 +6665,7 @@ Expression* Builtin_call_expression::do_lower(Gogo* gogo, Named_object* function, Statement_inserter* inserter, int) { - if (this->classification() == EXPRESSION_ERROR) + if (this->is_error_expression()) return this; Location loc = this->location(); @@ -7500,11 +7504,13 @@ Builtin_call_expression::do_discarding_v Type* Builtin_call_expression::do_type() { + if (this->is_error_expression()) +return Type::make_error_type(); switch (this->code_) { case BUILTIN_INVALID: default: - go_unreachable(); + return Type::make_error_type(); case BUILTIN_NEW: case BUILTIN_MAKE:
[gomp4] initialize worker reduction locks
This patch teaches omplower how to emit function calls to IFN_GOACC_LOCK_INIT so that the worker mutex has a proper initial value. On nvptx targets, shared memory isn't initialized (and that's where the lock is located for OpenACC workers), so this makes it explicit. Nathan added the internal function used in the patch a couple of days ago. I've applied this patch to gomp-4_0-branch. Cesar 2015-08-26 Cesar Philippidis ce...@codesourcery.com gcc/ * omp-low.c (lower_oacc_reductions): Call GOACC_REDUCTION_INIT to initialize the gang and worker mutex. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 955a098..ee92141 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -4795,10 +4795,20 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses, if (ctx->reductions == 0) return; + dim = build_int_cst (integer_type_node, loop_dim); + + /* Call GOACC_LOCK_INIT. */ + if (ifn == IFN_GOACC_REDUCTION_SETUP) +{ + call = build_call_expr_internal_loc (UNKNOWN_LOCATION, + IFN_GOACC_LOCK_INIT, + void_type_node, 2, dim, lid); + gimplify_and_add (call, &ilist); +} + /* Call GOACC_LOCK. */ if (ifn == IFN_GOACC_REDUCTION_FINI && write_back) { - dim = build_int_cst (integer_type_node, loop_dim); call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_LOCK, void_type_node, 2, dim, lid); gimplify_and_add (call, &ilist);
RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Thursday, August 20, 2015 9:19 PM To: Ajit Kumar Agarwal; Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On 08/20/2015 09:38 AM, Ajit Kumar Agarwal wrote: Bootstrapping with i386 and Microblaze target works fine. No regression is seen in Deja GNU tests for Microblaze. There are lesser failures. Mibench/EEMBC benchmarks were run for the Microblaze target and a gain of 9.3% is seen in rgbcmy_lite of the EEMBC benchmarks. What do you mean by there are lesser failures? Are you saying there are cases where path splitting generates incorrect code, or cases where path splitting produces code that is less efficient, or something else? I meant that more DejaGNU testcases pass with the path splitting changes. Ah, in that case, that's definitely good news! Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c void dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs) { int i, level, qmul, qadd; qadd = (qscale - 1) | 1; qmul = qscale << 1; for (i = 0; i <= nCoeffs; i++) { level = block[i]; if (level < 0) level = level * qmul - qadd; else level = level * qmul + qadd; block[i] = level; } } The above loop is a candidate for path splitting, as the IF block merges at the latch of the loop, and path splitting duplicates the latch of the loop (the statement block[i] = level;) into the predecessor THEN and ELSE blocks. Due to this path splitting, the IF conversion is disabled, the above IF-THEN-ELSE is not IF-converted, and the test case fails. There were the following review comments on the above patch. +/* This function performs the feasibility tests for path splitting + to perform. Return false if the feasibility for path splitting + is not done and returns true if the feasibility for path splitting + is done.
Following feasibility tests are performed. + + 1. Return false if the join block has rhs casting for assign + gimple statements. Comments from Jeff: These seem totally arbitrary. What's the reason behind each of these restrictions? None should be a correctness requirement AFAICT. In the above patch I made the check given in point 1 on the loop latch, so path splitting is disabled, the IF-conversion happens, and the test case passes. When I incorporate the above review comments and drop the feasibility check of point 1, the above testcase becomes a candidate for path splitting; due to path splitting the if-cvt does not happen, and the test case fails (it expects the pattern "Applying if conversion" to be present). With the patch as posted for review, the feasibility check for a cast assign in the latch of the loop, as given in point 1, disables the path splitting, if-cvt happens, and the above test case passes. Please let me know whether to keep the feasibility check as given in point 1, or whether more appropriate changes are required for this test case scenario of path splitting vs. IF-conversion. Thanks Regards Ajit jeff
Re: [RFC 4/5] Handle constant-pool entries
Jeff Law wrote: The question I have is why this differs from the effects of patch #5. That would seem to indicate that there's things we're not getting into the candidate tables with this approach?!? I'll answer this first, as I think (Richard and) Martin have identified enough other issues with this patch that will take longer to address but if you look at the context to the hunk in patch 5, it is iterating through the candidates (from patch 4), and then filtering out any candidates bigger than max-scalarization-size, which filtering patch 5 removes. --Alan
[Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex
Bootstrapped and tested on x86_64-pc-linux-gnu. Thanks! -- Regards, Tim Shen commit e134e1a835ad15900686351cade36774593b91ea Author: Tim Shen tims...@google.com Date: Wed Aug 26 17:51:29 2015 -0700 PR libstdc++/67362 * include/bits/regex_scanner.tcc (_Scanner::_M_scan_normal): Always returns ordinary char token if the char isn't considered a special char. * testsuite/28_regex/regression.cc: New test file for collecting regression testcases from, typically, bugzilla. diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc b/libstdc++-v3/include/bits/regex_scanner.tcc index 3bcbd0f..1555669 100644 --- a/libstdc++-v3/include/bits/regex_scanner.tcc +++ b/libstdc++-v3/include/bits/regex_scanner.tcc @@ -99,6 +99,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto __c = *_M_current++; const char* __pos; + if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr) + { + _M_token = _S_token_ord_char; + _M_value.assign(1, __c); + return; + } if (__c == '\\') { if (_M_current == _M_end) diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc new file mode 100644 index 000..71d82d5 --- /dev/null +++ b/libstdc++-v3/testsuite/28_regex/regression.cc @@ -0,0 +1,42 @@ +// { dg-options -std=gnu++11 } + +// +// Copyright (C) 2015 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. 
If not see +// <http://www.gnu.org/licenses/>. + +#include <testsuite_hooks.h> +#include <testsuite_regex.h> + +using namespace __gnu_test; +using namespace std; + +// PR libstdc++/67362 +void +test01() +{ + bool test __attribute__((unused)) = true; + + regex re("((.", regex_constants::basic); +} + +int +main() +{ + test01(); + return 0; +} +
Re: [Patch, libstdc++/67362] Fix non-special character for POSIX basic syntax in regex
On Wed, Aug 26, 2015 at 6:41 PM, Tim Shen tims...@google.com wrote: Bootstrapped and tested on x86_64-pc-linux-gnu. Also plan to backport to 4.9 and 5. -- Regards, Tim Shen
[c++-delayed-folding] fold_simple
Why does fold_simple fold so many patterns? I thought we wanted something that would just fold conversions and negations of constant values. Jason
[gomp4, wip] remove references to ganglocal shared memory inside gcc
This patch strips out all of the references to ganglocal memory in gcc. Unfortunately, the runtime API still takes a shared memory parameter, so I haven't made any changes there yet. Perhaps we could still keep the shared memory argument to GOACC_parallel, but remove all of the support for ganglocal mappings. Then again, maybe we still need to support ganglocal mappings for legacy purposes. With the ganglocal mapping aside, I'm in favor of leaving the shared memory argument to GOACC_parallel, just in case we find another use for shared memory in the future. Nathan, what do you want to do here? Cesar 2015-08-26 Cesar Philippidis ce...@codesourcery.com gcc/ * builtins.c (expand_oacc_ganglocal_ptr): Delete. (expand_builtin): Remove stale GOACC_GET_GANGLOCAL_PTR builtin. * config/nvptx/nvptx.md (ganglocal_ptr): Delete. * gimple.h (struct gimple_statement_omp_parallel_layout): Remove ganglocal_size member. (gimple_omp_target_ganglocal_size): Delete. (gimple_omp_target_set_ganglocal_size): Delete. * omp-builtins.def (BUILT_IN_GOACC_GET_GANGLOCAL_PTR): Delete. * omp-low.c (struct omp_context): Remove ganglocal_init, ganglocal_ptr, ganglocal_size, ganglocal_size_host, worker_var, worker_count and worker_sync_elt. (alloc_var_ganglocal): Delete. (install_var_ganglocal): Delete. (new_omp_context): Don't use ganglocal memory. (expand_omp_target): Likewise. (lower_omp_taskreg): Likewise. (lower_omp_target): Likewise. * tree-parloops.c (create_parallel_loop): Likewise.
* tree-pretty-print.c (dump_omp_clause): Remove support for GOMP_MAP_FORCE_TO_GANGLOCAL. diff --git a/gcc/builtins.c b/gcc/builtins.c index 7c3ead1..f465716 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5913,25 +5913,6 @@ expand_builtin_acc_on_device (tree exp, rtx target) return target; } -static rtx -expand_oacc_ganglocal_ptr (rtx target ATTRIBUTE_UNUSED) -{ -#ifdef HAVE_ganglocal_ptr - enum insn_code icode; - icode = CODE_FOR_ganglocal_ptr; - rtx tmp = target; - if (!REG_P (tmp) || GET_MODE (tmp) != Pmode) -tmp = gen_reg_rtx (Pmode); - rtx insn = GEN_FCN (icode) (tmp); - if (insn != NULL_RTX) -{ - emit_insn (insn); - return tmp; -} -#endif - return NULL_RTX; -} - /* Expand an expression EXP that calls a built-in function, with result going to TARGET if that's convenient (and in mode MODE if that's convenient). @@ -7074,12 +7055,6 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, return target; break; -case BUILT_IN_GOACC_GET_GANGLOCAL_PTR: - target = expand_oacc_ganglocal_ptr (target); - if (target) - return target; - break; - default: /* just do library call, if unknown builtin */ break; } diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 3d734a8..d0d6564 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1485,23 +1485,6 @@ %.\\tst.shared%u1\\t%1,%0;) -(define_insn ganglocal_ptr<mode> - [(set (match_operand:P 0 nvptx_register_operand ) - (unspec:P [(const_int 0)] UNSPEC_SHARED_DATA))] - - %.\\tcvta.shared%t0\\t%0, sdata;) - -(define_expand ganglocal_ptr - [(match_operand 0 nvptx_register_operand )] - -{ - if (Pmode == DImode) -emit_insn (gen_ganglocal_ptrdi (operands[0])); - else -emit_insn (gen_ganglocal_ptrsi (operands[0])); - DONE; -}) - ;; Atomic insns.
(define_expand atomic_compare_and_swap<mode> diff --git a/gcc/gimple.h b/gcc/gimple.h index d8d8742..278b49f 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -580,10 +580,6 @@ struct GTY((tag(GSS_OMP_PARALLEL_LAYOUT))) /* [ WORD 10 ] Shared data argument. */ tree data_arg; - - /* [ WORD 11 ] - Size of the gang-local memory to allocate. */ - tree ganglocal_size; }; /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */ @@ -5232,25 +5228,6 @@ gimple_omp_target_set_data_arg (gomp_target *omp_target_stmt, } -/* Return the size of gang-local data associated with OMP_TARGET GS. */ - -static inline tree -gimple_omp_target_ganglocal_size (const gomp_target *omp_target_stmt) -{ - return omp_target_stmt->ganglocal_size; -} - - -/* Set SIZE to be the size of gang-local memory associated with OMP_TARGET - GS. */ - -static inline void -gimple_omp_target_set_ganglocal_size (gomp_target *omp_target_stmt, tree size) -{ - omp_target_stmt->ganglocal_size = size; -} - - /* Return the clauses associated with OMP_TEAMS GS. */ static inline tree diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index 0d9f386..615c4e0 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -58,8 +58,6 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_UPDATE, GOACC_update, DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, GOACC_wait, BT_FN_VOID_INT_INT_VAR, ATTR_NOTHROW_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, GOACC_get_ganglocal_ptr, - BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST) DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DEVICEPTR, GOACC_deviceptr, BT_FN_PTR_PTR,
Re: C++ delayed folding branch review
On 08/24/2015 03:15 AM, Kai Tietz wrote: 2015-08-03 17:39 GMT+02:00 Jason Merrill ja...@redhat.com: On 08/03/2015 05:42 AM, Kai Tietz wrote: 2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com: On 07/31/2015 05:54 PM, Kai Tietz wrote: The STRIP_NOPS requirement in 'reduced_constant_expression_p' I could remove, but for one case in constexpr. Without folding we don't do type-sinking/raising. Right. So binary/unary operations might contain casts, which were unexpected in the past. Why aren't the casts folded away? On such cast constructs, as for this vector sample, we can't fold away Which testcase is this? It is the g++.dg/ext/vector20.C testcase. IIRC I mentioned this testcase already earlier as a reference, but I might be wrong here. I don't see any casts in that testcase. So the compiler is introducing conversions back and forth between const and non-const, then? I suppose it doesn't so much matter where they come from, they should be folded away regardless. the cast chain. The difference here to the non-delayed-folding branch is that the cast isn't moved out of the plus-expr. What we see now is (plus ((vec) (const vector ...) { }), ...). Before we had (vec) (plus (const vector ...) { ... }). How could a PLUS_EXPR be considered a reduced constant, regardless of where the cast is? Of course it is just possible to sink out a cast from PLUS_EXPR in pretty few circumstances (eg. on constants if both types just differ in const-attribute, if the conversion is no view-convert). I don't understand how this is an answer to my question. In verify_constant we check by reduced_constant_expression_p whether a value is a constant. We don't handle there that NOP_EXPRs are something we want to look through, as it doesn't change anything about whether this is a constant or not. NOPs around constants should have been folded away by the time we get there. Not in this case, as we actually have here a switch from const to non-const.
So there is an attribute change, which we can't ignore in general. I wasn't suggesting we ignore it; we should be able to change the type of the vector_cst. Well, we can change the type of the vector_cst, but this wouldn't help AFAICS, as there is still one cast surviving within the PLUS_EXPR for the other operand. Isn't the other operand also constant? In constexpr evaluation, either we're dealing with a bunch of constants, in which case we should be folding things fully, including conversions between const and non-const, or we don't care. So the way to solve it would be to move such conversions out of the expression. For integer scalars we do this, and for some floating-points too. So it might be something we don't handle for operations with vector type. We don't need to worry about that in constexpr evaluation, since we only care about constant operands. But I agree that for constexprs we could special-case the cast from const to non-const (as required in expressions like const vec v = v + 1). Right. But really this should happen in convert.c, it shouldn't be specific to C++. Hmm, maybe. But isn't one of our goals to move such implicit code modification to match.pd instead? Folding const into a constant is hardly code modification. But perhaps it should go into fold_unary_loc:VIEW_CONVERT_EXPR rather than into convert.c. Jason
Re: [PATCH] Fix and simplify (Re: Fix libbacktrace -fPIC breakage from Use libbacktrace in libgfortran)
Ulrich Weigand uweig...@de.ibm.com writes: I've verified that this works on x86_64: the resulting libgfortran.so uses the -fPIC version of the libbacktrace object, while libgfortran.a uses the non-PIC versions. On SPU, libtool will now automatically only generate the non-PIC versions since the target does not support shared library. So everything works as expected. OK for mainline? Can you verify that libgo works as expected? Ian
[PATCH] fix --with-cpu for sh targets
A missing * in the pattern for sh targets prevents the --with-cpu configure option from being accepted for certain targets (e.g. ones with explicit endianness, like sh2eb). The latest config.sub should also be pulled from upstream since it has a fix for related issues. Rich --- gcc-5.2.0.orig/gcc/config.gcc +++ gcc-5.2.0/gcc/config.gcc @@ -4096,7 +4099,7 @@ esac ;; - sh[123456ble]-*-* | sh-*-*) + sh[123456ble]*-*-* | sh-*-*) supported_defaults=cpu case `echo $with_cpu | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ_ abcdefghijklmnopqrstuvwxyz- | sed s/sh/m/` in | m1 | m2 | m2e | m3 | m3e | m4 | m4-single | m4-single-only | m4-nofpu )
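The effect of the missing '*' can be checked with plain shell globbing; old_match/new_match are hypothetical helpers reproducing the two case patterns:

```shell
#!/bin/sh
set -e

# Corrected pattern from the patch: the '*' after the character class
# lets suffixed CPU names (endianness, multi-letter variants) match.
new_match() {
  case "$1" in
    sh[123456ble]*-*-* | sh-*-*) echo yes ;;
    *) echo no ;;
  esac
}

# Old pattern: exactly one character after "sh", then the triplet dashes.
old_match() {
  case "$1" in
    sh[123456ble]-*-* | sh-*-*) echo yes ;;
    *) echo no ;;
  esac
}

# A plain sh4 triplet matches either way ...
[ "$(old_match sh4-unknown-linux-gnu)" = yes ]
[ "$(new_match sh4-unknown-linux-gnu)" = yes ]
# ... but an explicit-endianness target like sh2eb needs the '*'.
[ "$(old_match sh2eb-unknown-linux-gnu)" = no ]
[ "$(new_match sh2eb-unknown-linux-gnu)" = yes ]
echo OK
```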