[C++, ping] Fix PR bootstrap/81926
The analysis and original patch:
  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00101.html
and the amended patch:
  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00146.html

Thanks in advance.

-- 
Eric Botcazou
C++ PATCH for c++/82053, ICE with default argument in lambda in template
When we regenerate a lambda, the resulting op() doesn't have any template
information, so we can't delay instantiating default arguments like we do
for a normal template function.  I believe this is also the direction of
the core working group for default arguments in local extern function
declarations, so I don't think we need to invent a mechanism to remember
those template arguments for later use.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 17d672e6eaf10f96174b00207c60b5467693877f
Author: Jason Merrill
Date:   Thu Aug 31 13:03:31 2017 -0400

            PR c++/82053 - ICE with default argument in lambda in template

            * pt.c (tsubst_arg_types): Substitute default arguments for
            lambdas in templates.
            (retrieve_specialization): Use lambda_fn_in_template_p.
            * cp-tree.h: Declare it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 20fa039..a0e31d3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6821,6 +6821,7 @@
 extern tree current_nonlambda_function		(void);
 extern tree nonlambda_method_basetype		(void);
 extern tree current_nonlambda_scope		(void);
 extern bool generic_lambda_fn_p			(tree);
+extern bool lambda_fn_in_template_p		(tree);
 extern void maybe_add_lambda_conv_op		(tree);
 extern bool is_lambda_ignored_entity		(tree);
 extern bool lambda_static_thunk_p		(tree);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4a65e31..ec7bbc8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1193,16 +1193,8 @@ retrieve_specialization (tree tmpl, tree args, hashval_t hash)
   /* Lambda functions in templates aren't instantiated normally,
      but through tsubst_lambda_expr.  */
-  if (LAMBDA_FUNCTION_P (tmpl))
-    {
-      bool generic = PRIMARY_TEMPLATE_P (tmpl);
-      if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl)) > generic)
-	return NULL_TREE;
-
-      /* But generic lambda functions are instantiated normally, once their
-	 containing context is fully instantiated.  */
-      gcc_assert (generic);
-    }
+  if (lambda_fn_in_template_p (tmpl))
+    return NULL_TREE;
 
   if (optimize_specialization_lookup_p (tmpl))
     {
@@ -12579,7 +12571,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
 bool
 lambda_fn_in_template_p (tree fn)
 {
-  if (!LAMBDA_FUNCTION_P (fn))
+  if (!fn || !LAMBDA_FUNCTION_P (fn))
     return false;
   tree closure = DECL_CONTEXT (fn);
   return CLASSTYPE_TEMPLATE_INFO (closure) != NULL_TREE;
@@ -13248,6 +13240,13 @@ tsubst_arg_types (tree arg_types,
	 done in build_over_call.  */
      default_arg = TREE_PURPOSE (arg_types);
 
+      /* Except that we do substitute default arguments under
+	 tsubst_lambda_expr, since the new op() won't have any associated
+	 template arguments for us to refer to later.  */
+      if (lambda_fn_in_template_p (in_decl))
+	default_arg = tsubst_copy_and_build (default_arg, args, complain,
+					     in_decl, false/*fn*/,
+					     false/*constexpr*/);
+
      if (default_arg && TREE_CODE (default_arg) == DEFAULT_ARG)
	{
	  /* We've instantiated a template before its default arguments
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C b/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C
new file mode 100644
index 000..f67dfee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-defarg7.C
@@ -0,0 +1,13 @@
+// PR c++/82053
+// { dg-do compile { target c++14 } }
+
+template <class T>
+int fn() { return 42; }
+
+template <class T>
+auto lam = [](int = fn<T>()){};
+
+int main()
+{
+  lam<int>();
+}
Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)
On Thu, 17 Aug 2017, Martin Sebor wrote:

> +/* Check LAST_DECL and NODE of the same symbol for attributes that are
> +   recorded in EXCL to be mutually exclusive with ATTRNAME, diagnose
> +   them, and return true if any have been found.  NODE can be a DECL
> +   or a TYPE.  */
> +
> +static bool
> +diag_attr_exclusions (tree last_decl, tree node, tree attrname,
> +		      const attribute_spec *spec)

EXCL is not an argument to this function, so the comment above it should
not refer to EXCL (presumably it should refer to SPEC instead).

> +	      note &= warning (OPT_Wattributes,
> +			       "ignoring attribute %qE in declaration of "
> +			       "a built-in function qD because it conflicts "
> +			       "with attribute %qs",
> +			       attrname, node, excl->name);

%qD not qD, presumably.  (Generically, warning_at would be preferred to
warning, but that may best be kept separate if you don't already have a
location available here.)

> +static const struct attribute_spec::exclusions attr_gnu_inline_exclusions[] =
> +{
> +  ATTR_EXCL ("gnu_inline", true, true, true),
> +  ATTR_EXCL ("noinline", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};

This says gnu_inline is incompatible with noinline, and is listed as the
EXCL field for the gnu_inline attribute.

> +static const struct attribute_spec::exclusions attr_inline_exclusions[] =
> +{
> +  ATTR_EXCL ("always_inline", true, true, true),
> +  ATTR_EXCL ("noinline", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};

This is listed as the EXCL field for the noinline attribute, but does
not mention gnu_inline.  Does this mean some asymmetry in when that pair
is diagnosed?  I don't see tests for that pair added by the patch.  (Of
course, gnu_inline + always_inline is OK, and attr_inline_exclusions is
also used for the always_inline attribute in this patch.)

In general, the data structures where you need to ensure manually that
if attribute A is listed in EXCL for B, then attribute B is also listed
in EXCL for A, seem concerning.
I'd expect either data structures that make such asymmetry impossible,
or a self-test that verifies that the tables in use are in fact
symmetric (unless there is some reason the symmetry is not in fact
required and symmetric diagnostics still result from asymmetric tables -
in which case the various combinations and orderings of gnu_inline and
noinline definitely need tests to show that the diagnostics work).

> +both the @code{const} and the @code{pure} attribute is diagnnosed.

s/diagnnosed/diagnosed/

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH] correct documentation of attribute ifunc (PR 81882)
This patch is OK with the spacing in the function prototype fixed as
noted to follow normal GNU standards.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
On Wed, 2017-09-06 at 16:13 -0500, Pat Haugen wrote:
> On 09/06/2017 11:24 AM, Carl Love wrote:
> > +  "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0"
> > +  [(set_attr "type" "integer")
> > +   (set_attr "length" "4")])
>
> Should be type "three" and length "12".
>
> -Pat

Pat:

Yes, that is wrong in more ways than one.  Looks like I posted the wrong
version of the patch.  This was the first version, which unfortunately
results in generating extra extsw instructions.  I withdraw this patch
from consideration.

                    Carl Love
Re: [RFA] [PATCH 4/4] Ignore reads of "dead" memory locations in DSE
Another old patch getting resurrected...

On 01/04/2017 06:50 AM, Richard Biener wrote:
> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law wrote:
>> This is the final patch in the kit to improve our DSE implementation.
>>
>> It's based on a observation by Richi.  Namely that a read from bytes
>> of memory that are dead can be ignored.  By ignoring such reads we can
>> sometimes find additional stores that allow us to either eliminate or
>> trim an earlier store more aggressively.
>>
>> This only hit (by hit I mean the ability to ignore resulted in finding
>> a full or partially dead store that we didn't otherwise find) once
>> during a bootstrap, but does hit often in the libstdc++ testsuite.
>> I've added a test derived from the conversation between myself and
>> Richi last year.
>>
>> There's nothing in the BZ database on this issue and I can't
>> reasonably call it a bugfix.  I wouldn't lose sleep if this deferred
>> to gcc-8.
>>
>> Bootstrapped and regression tested on x86-64-linux-gnu.  OK for the
>> trunk or defer to gcc-8?
>>
>> 	* tree-ssa-dse.c (live_bytes_read): New function.
>> 	(dse_classify_store): Ignore reads of dead bytes.
>>
>> 	* testsuite/gcc.dg/tree-ssa/ssa-dse-26.c: New test.
>> 	* testsuite/gcc.dg/tree-ssa/ssa-dse-26.c: Likewise.
>>
>> [ snip ]
>>
>> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
>> index a807d6d..f5b53fc 100644
>> --- a/gcc/tree-ssa-dse.c
>> +++ b/gcc/tree-ssa-dse.c
>> @@ -475,6 +475,41 @@ maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
>>      }
>>  }
>>
>> +/* Return TRUE if USE_REF reads bytes from LIVE where live is
>> +   derived from REF, a write reference.
>> +
>> +   While this routine may modify USE_REF, it's passed by value, not
>> +   location.  So callers do not see those modifications.  */
>> +
>> +static bool
>> +live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
>> +{
>> +  /* We have already verified that USE_REF and REF hit the same object.
>> +     Now verify that there's actually an overlap between USE_REF and REF.  */
>> +  if ((use_ref.offset < ref->offset
>> +       && use_ref.offset + use_ref.size > ref->offset)
>> +      || (use_ref.offset >= ref->offset
>> +	  && use_ref.offset < ref->offset + ref->size))
>
> can you use ranges_overlap_p?  (tree-ssa-alias.h)

Yes.  Didn't know about it.  Done.

>
>> +    {
>> +      normalize_ref (&use_ref, ref);
>> +
>> +      /* If USE_REF covers all of REF, then it will hit one or more
>> +	 live bytes.  This avoids useless iteration over the bitmap
>> +	 below.  */
>> +      if (use_ref.offset == ref->offset && use_ref.size == ref->size)
>> +	return true;
>> +
>> +      /* Now iterate over what's left in USE_REF and see if any of
>> +	 those bits are i LIVE.  */
>> +      for (int i = (use_ref.offset - ref->offset) / BITS_PER_UNIT;
>> +	   i < (use_ref.offset + use_ref.size) / BITS_PER_UNIT; i++)
>> +	if (bitmap_bit_p (live, i))
>
> a bitmap_bit_in_range_p () would be nice to have.  And it can be more
> efficient than this loop...

Yea.  That likely would help here.  I'm testing with a
bitmap_bit_in_range_p implementation (only for sbitmaps since that's
what we're using here).  That implementation does the reasonably
efficient things and is modeled after the sbitmap implementation of
bitmap_set_range.

>> @@ -554,6 +589,41 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>>  	  /* If the statement is a use the store is not dead.  */
>>  	  else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
>>  	    {
>> +	      /* Handle common cases where we can easily build a ao_ref
>> +		 structure for USE_STMT and in doing so we find that the
>> +		 references hit non-live bytes and thus can be ignored.  */
>> +	      if (live_bytes)
>> +		{
>> +		  if (is_gimple_assign (use_stmt))
>> +		    {
>> +		      /* Other cases were noted as non-aliasing by
>> +			 the call to ref_maybe_used_by_stmt_p.  */
>> +		      ao_ref use_ref;
>> +		      ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt));
>> +		      if (valid_ao_ref_for_dse (&use_ref)
>> +			  && use_ref.base == ref->base
>> +			  && use_ref.size == use_ref.max_size
>> +			  && !live_bytes_read (use_ref, ref, live_bytes))
>> +			{
>> +			  if (gimple_vdef (use_stmt))
>> +			    {
>> +			      /* If we have already seen a store and
>> +				 this is also a store, then we have to
>> +				 fail.  */
>> +			      if (temp)
>> +				{
>> +				  fail = true;
>> +				  BREAK_FROM_IMM_
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 10:08:01PM +0200, David Edelsohn wrote:
> This change broke bootstrap on AIX because sancov.c now references a
> macro that is defined as a function on AIX.  sancov.c needs to include
> tm_p.h to pull in the target-dependent prototypes.  The following
> patch works for me.  Is this okay?
>
> 	* sancov.c: Include tm_p.h.

Ok, thanks.  And sorry for the breakage.

> Index: sancov.c
> ===
> --- sancov.c	(revision 251817)
> +++ sancov.c	(working copy)
> @@ -28,6 +28,7 @@
>  #include "basic-block.h"
>  #include "options.h"
>  #include "flags.h"
> +#include "tm_p.h"
>  #include "stmt.h"
>  #include "gimple-iterator.h"
>  #include "gimple-builder.h"

	Jakub
Re: [PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
On 09/06/2017 11:24 AM, Carl Love wrote:
> +  "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0"
> +  [(set_attr "type" "integer")
> +   (set_attr "length" "4")])

Should be type "three" and length "12".

-Pat
RFC: Representation of runtime offsets and sizes
The next main step in the SVE submission is to add support for offsets
and sizes that are a runtime invariant rather than a compile time
constant.  This is an RFC about our approach for doing that.  It's an
update of https://gcc.gnu.org/ml/gcc/2016-11/msg00031.html (which
covered more topics than this message).

The size of an SVE register in bits can be any multiple of 128 between
128 and 2048 inclusive.  The way we chose to represent this was to have
a runtime indeterminate that counts the number of 128 bit blocks above
the minimum of 128.  If we call the indeterminate X then:

* an SVE register has 128 + 128 * X bits (16 + 16 * X bytes)
* the last int in an SVE vector is at byte offset 12 + 16 * X
* etc.

Although the maximum value of X is 15, we don't want to take advantage
of that, since there's nothing particularly magical about the value.

So we have two types of target: those for which there are no runtime
indeterminates, and those for which there is one runtime indeterminate.
We decided to generalise the interface slightly by allowing any number
of indeterminates, although the underlying implementation is still
limited to 0 and 1 for now.

The main class for working with these runtime offsets and sizes is
"poly_int".  It represents a value of the form:

  C0 + C1 * X1 + ... + Cn * Xn

where each coefficient Ci is a compile-time constant and where each
indeterminate Xi is a nonnegative runtime value.  The class takes two
template parameters, one giving the number of coefficients and one
giving the type of the coefficients.  There are then typedefs for the
common cases, with the number of coefficients being controlled by the
target.

poly_int is used for things like:

- the number of elements in a VECTOR_TYPE
- the size and number of units in a general machine_mode
- the offset of something in the stack frame
- SUBREG_BYTE
- MEM_SIZE and MEM_OFFSET
- mem_ref_offset

(only a selective list).
There are also rtx and tree representations of poly_int, although I've
left those out of this RFC.

The patch has detailed documentation -- which I've also attached as a
PDF -- but the main points are:

* there's no total ordering between poly_ints, so the best we can do
  when comparing them is to ask whether two values *might* or *must* be
  related in a particular way.  E.g. if mode A has size 2 + 2X and mode
  B has size 4, the condition:

    GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)

  is true for X<=1 and false for X>=2.  This translates to:

    may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
    must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false

  Of course, the may/must distinction already exists in things like
  alias analysis.

* some poly_int arithmetic operations (notably division) are only
  possible for certain values.  These operations therefore become
  conditional.

* target-independent code is exposed to these restrictions even if the
  current target has no indeterminates.  But:

  * we've tried to provide enough operations that poly_ints are easy to
    work with.

  * it means that developers working with non-SVE targets don't need to
    test SVE.  If the code compiles on a non-SVE target, and if it
    doesn't use any asserting operations, it's reasonable to assume
    that it will work on SVE too.

* for target-specific code, poly_int degenerates to a scalar if there
  are no runtime invariants for that target.  Only very minor changes
  are needed to non-AArch64 targets.

* poly_int operations should be (and in practice seem to be) as
  efficient as normal scalar operations on non-AArch64 targets.

The patch really needs some self-tests (which weren't supported when we
did the work originally), but otherwise it's what I'd like to submit.

Thanks,
Richard


10 Sizes and offsets as runtime invariants
******************************************

GCC allows the size of a hardware register to be a runtime invariant
rather than a compile-time constant.
This in turn means that various sizes and offsets must also be runtime
invariants rather than compile-time constants, such as:

   * the size of a general 'machine_mode' (*note Machine Modes::);
   * the size of a spill slot;
   * the offset of something within a stack frame;
   * the number of elements in a vector;
   * the size and offset of a 'mem' rtx (*note Regs and Memory::); and
   * the byte offset in a 'subreg' rtx (*note Regs and Memory::).

The motivating example is the Arm SVE ISA, whose vector registers can
be any multiple of 128 bits between 128 and 2048 inclusive.  The
compiler normally produces code that works for all SVE register sizes,
with the actual size only being known at runtime.

GCC's main representation of such runtime invariants is the 'poly_int'
class.  This chapter describes what 'poly_int' does, lists the
available operations, and gives some general usage guidelines.

* Menu:

* Overview of poly_int::
* Consequences of using poly_int::
* Comparisons involving poly_int::
* Arithmetic on poly_ints::
* A
Re: Add support to trace comparison instructions and switch statements
This change broke bootstrap on AIX because sancov.c now references a
macro that is defined as a function on AIX.  sancov.c needs to include
tm_p.h to pull in the target-dependent prototypes.  The following patch
works for me.  Is this okay?

	* sancov.c: Include tm_p.h.

Index: sancov.c
===
--- sancov.c	(revision 251817)
+++ sancov.c	(working copy)
@@ -28,6 +28,7 @@
 #include "basic-block.h"
 #include "options.h"
 #include "flags.h"
+#include "tm_p.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
 #include "gimple-builder.h"
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
Just a followup on this patch.

We did some run-time performance testing internally on this set of
changes on a sparc M8 machine with -mmisalign and -mno-misalign, based
on the latest upstream gcc, for a CPU2017 C/C++ SPEED run:

***without -O, -mmisalign slows down the run-time performance about 4%
on average

This is mainly due to the following workaround to misaligned support in
M8 (config/sparc/sparc.c):

+/* for misaligned ld/st provided by M8, the IMM field is 10-bit wide
+   other than the 13-bit for regular ld/st.
+   The best solution for this problem is to distinguish each ld/st
+   whether it's aligned or misaligned.  However, due to the current
+   design of the common routine TARGET_LEGITIMATE_ADDRESS_P, only
+   the ADDR of a ld/st is passed to the routine, the align info
+   carried by the corresponding MEM is NOT passed in.  without changing
+   the prototype of TARGET_LEGITIMATE_ADDRESS_P, we cannot use this
+   best solution.
+   as a workaround, we have to conservatively treat ALL IMM fields of
+   a ld/st insn on a MISALIGNED target as 10-bit wide.
+   the side-effect of this workaround is: there will be additional
+   REG<-IMM insns generated for regular ld/st when -mmisalign is ON.
+   However, such additional reload insns should be very easily removed
+   by a set of optimizations whenever -O is specified.
+*/
+#define RTX_OK_FOR_OFFSET_P(X, MODE)			\
+  (CONST_INT_P (X)					\
+   && ((!TARGET_MISALIGN				\
+	&& INTVAL (X) >= -0x1000			\
+	&& INTVAL (X) <= (0x1000 - GET_MODE_SIZE (MODE)))\
+       || (TARGET_MISALIGN				\
+	   && INTVAL (X) >= -0x0400			\
+	   && INTVAL (X) <= (0x0400 - GET_MODE_SIZE (MODE)))))

Since the run-time regression introduced by this workaround is not
trivial, we decided to hold on this set of changes at this time.

Thanks.
Qing

> This set of changes is to provide a way to use misaligned load/store
> insns to implement compile-time-known unaligned memory accesses;
> -mno-misalign can be used to disable such behavior very easily if our
> performance data shows that misaligned load/store insns are slower
> than the current software emulation.
>
> Qing
Re: [PATCH, rs6000] Add support for vec_xst_len_r() and vec_xl_len_r() builtins
Hi Carl,

On Wed, Sep 06, 2017 at 08:22:03AM -0700, Carl Love wrote:
> 	(define_insn "*stxvl"): add missing argument to the sldi instruction.

s/add/Add/ .  This one-liner fix is approved right now, please commit it
as a separate patch.

> +(define_insn "addi_neg16"
> +  [(set (match_operand:DI 0 "vsx_register_operand" "=r")
> +	(unspec:DI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "r")]
> +	 UNSPEC_ADDI_NEG16))]
> +  ""
> +  "addi %0,%1,-16"
> +)

You don't need a separate insn (or unspec) for this at all afaics...
Where you do

  emit_insn (gen_addi_neg16 (tmp, operands[2]));

you could just do

  emit_insn (gen_adddi3 (tmp, operands[2], GEN_INT (-16)));

> +;; Load VSX Vector with Length, right justified
> +(define_expand "lxvll"
> +  [(set (match_dup 3)
> +	(match_operand:DI 2 "register_operand"))
> +   (set (match_operand:V16QI 0 "vsx_register_operand")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand")
> +	  (match_dup 3)]
> +	 UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})

Hrm, so you make a reg 3 only because the lxvll pattern will clobber it?

> +(define_insn "*lxvll"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "b")
> +	  (match_operand:DI 2 "register_operand" "+r")]
> +	 UNSPEC_LXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +;;  "lxvll %x0,%1,%2;"
> +  "sldi %2,%2, 56\; lxvll %x0,%1,%2;"
> +  [(set_attr "length" "8")
> +   (set_attr "type" "vecload")])

It is nicer to just have a match_scratch in here then, like

(define_insn "*lxvll"
  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
	(unspec:V16QI
	 [(match_operand:DI 1 "gpc_reg_operand" "b")
	  (match_operand:DI 2 "register_operand" "r")]
	 UNSPEC_LXVLL))
   (clobber (match_scratch:DI 3 "=&r"))]
  "TARGET_P9_VECTOR && TARGET_64BIT"
  "sldi %3,%2,56\;lxvll %x0,%1,%3"
  [(set_attr "length" "8")
   (set_attr "type" "vecload")])

(Note spacing, comment, ";" stuff, and the earlyclobber).

Ideally you split the sldi off in the expand though, so that the *lxvll
pattern is really just that single insn.

> +(define_insn "altivec_lvsl_reg"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=v")
> +	(unspec:V16QI
> +	 [(match_operand:DI 1 "gpc_reg_operand" "b")]
> +	 UNSPEC_LVSL_REG))]
> +  "TARGET_ALTIVEC"
> +  "lvsl %0,0,%1"
> +  [(set_attr "type" "vecload")])

vecload isn't really the correct type for this, but I see we have the
same on the existing lvsl patterns (it's permute unit on p9; I expect
the same on p8 and older, but please check).  Please move this next to
the existing lvsl pattern.

> +;; Expand for builtin xl_len_r
> +(define_expand "xl_len_r"
> +  [(match_operand:V16QI 0 "vsx_register_operand" "=v")
> +   (match_operand:DI 1 "register_operand" "r")
> +   (match_operand:DI 2 "register_operand" "r")]
> +  "UNSPEC_XL_LEN_R"
> +{
> +  rtx shift_mask = gen_reg_rtx (V16QImode);
> +  rtx rtx_vtmp = gen_reg_rtx (V16QImode);
> +  rtx tmp = gen_reg_rtx (DImode);
> +
> +/* Setup permute vector to shift right by operands[2] bytes.
> +   Note: addi operands[2], -16 is negative so we actually need to
> +   shift left to get a right shift.  */

Indent the comment with the code, so that's 2 spaces more here.

The comment isn't clear to me...  Neither is the code though: lvsl
looks at just the low 4 bits of its arg, so the addi does nothing
useful?  Maybe I am missing something.

> +  emit_insn (gen_addi_neg16 (tmp, operands[2]));
> +  emit_insn (gen_altivec_lvsl_reg (shift_mask, tmp));
> +  emit_insn (gen_lxvll (rtx_vtmp, operands[1], operands[2]));
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (operands[0], rtx_vtmp,
> +					  rtx_vtmp, shift_mask));

> +;; Store VSX Vector with Length, right justified

_left_ justified?

> +(define_expand "stxvll"
> +  [(set (match_dup 3)
> +	(match_operand:DI 2 "register_operand"))
> +   (set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand"))
> +	(unspec:V16QI
> +	 [(match_operand:V16QI 0 "vsx_register_operand")
> +	  (match_dup 3)]
> +	 UNSPEC_STXVLL))]
> +  "TARGET_P9_VECTOR && TARGET_64BIT"
> +{
> +  operands[3] = gen_reg_rtx (DImode);
> +})

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-5-p9-runnable.c
> @@ -0,0 +1,309 @@
> +/* { dg-do run { target { powerpc64*-*-* && { p9vector_hw } } } } */

This should be powerpc*-*-* I think?  Does it need braces around
p9vector_hw?


Segher
C++ PATCH for c++/82070, error with nested lambda capture
I was expecting that references to capture proxies would be resolved in
the reconstructed lambda by normal name lookup, but that doesn't work in
decltype, and processing the nested lambda really wants to find the new
capture proxy, not the captured variable.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit f9a1fe6d129418e72c68d0d1d9d35089ba7817b2
Author: Jason Merrill
Date:   Wed Sep 6 13:41:58 2017 -0400

            PR c++/82070 - error with nested lambda capture

            * pt.c (tsubst_expr) [DECL_EXPR]: Register capture proxies with
            register_local_specialization.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index eb27f6a..4a65e31 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15985,8 +15985,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       else if (is_capture_proxy (decl)
	       && !DECL_TEMPLATE_INSTANTIATION (current_function_decl))
	{
-	  /* We're in tsubst_lambda_expr, we've already inserted new capture
-	     proxies, and uses will find them with lookup_name.  */
+	  /* We're in tsubst_lambda_expr, we've already inserted a new
+	     capture proxy, so look it up and register it.  */
+	  tree inst = lookup_name (DECL_NAME (decl));
+	  gcc_assert (inst != decl && is_capture_proxy (inst));
+	  register_local_specialization (inst, decl);
	  break;
	}
       else if (DECL_IMPLICIT_TYPEDEF_P (decl)
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C
new file mode 100644
index 000..7403315
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested7.C
@@ -0,0 +1,17 @@
+// PR c++/82070
+// { dg-do compile { target c++11 } }
+
+namespace a {
+template <class b>
+void
+c (int, int, b d)
+{
+  [d] { [d] {}; };
+}
+}
+void
+e ()
+{
+  int f;
+  a::c (f, 3, [] {});
+}
Re: [Patch, fortran] Parameterized Derived Types
Thanks for your tireless efforts on this, Paul!  I look forward to
trying this out after it hits the trunk.

Your phrase “last unimplemented F2003 feature” bolsters my suspicion
that it might be ok to switch the features listed as “Partial” on the
Fortran wiki to “Yes."  I suppose the difference depends on developer
intent.  If the developer(s) intended to leave some aspect of a feature
unimplemented (as might be evidenced by an appropriate compiler
message), then “Partial” seems best.  Otherwise, “Yes” seems appropriate
even in the presence of bugs.  I’ll send a separate email to the list
with further thoughts on this.

Best Regards,

___
Damian Rouson, Ph.D., P.E.
President, Sourcery Institute
www.sourceryinstitute.org
+1-510-600-2992 (mobile)

On September 6, 2017 at 6:04:47 AM, Paul Richard Thomas
(paul.richard.tho...@gmail.com) wrote:

> Dear All,
>
> Since my message to the list of 16 August 2017 I have put in another
> intense period of activity to develop a patch to implement PDTs in
> gfortran. I have now temporarily run out of time to develop it
> further; partly because of a backlog of other patches and PRs to deal
> with but also pressure from daytime work.
>
> The patch adds the last unimplemented F2003 feature to gfortran.
>
> As in the provisional patch, I have attached some notes on the
> implementation. This indicates some of the weaknesses, problem areas
> and TODOs.
>
> Suggest that a good read of Mark Leair's excellent PGInsider article
> on PDTs - http://www.pgroup.com/lit/articles/insider/v5n2a4.htm is a
> worthwhile exercise.
>
> To judge by the complete silence following my previous message, I will
> have a problem getting this patch reviewed. I would welcome any
> remarks or reviews but intend to commit, warts and all, on Saturday
> unless something fundamentally wrong comes out of the woodwork.
> > Note that the PDT parts in the compiler are rather well insulated from > the rest of fortran and that I do not believe that any regressions > will result. > > I hope that a month or two of testing in other hands will add to the > list of TODOs and that when I return to PDTs a greatly improved > version will result. > > Bootstrapped and regtested on FC23/x86_4 - OK for trunk? (Note above > remark about committing on Saturday in the absence of a review.) > > Best regards > > Paul > > 2017-09-05 Paul Thomas > > * decl.c : Add decl_type_param_list, type_param_spec_list as > static variables to hold PDT spec lists. > (build_sym): Copy 'type_param_spec_list' to symbol spec_list. > (build_struct): Copy the 'saved_kind_expr' to the component > 'kind_expr'. Check that KIND or LEN components appear in the > decl_type_param_list. These should appear as symbols in the > f2k_derived namespace. If the component is itself a PDT type, > copy the decl_type_param_list to the component param_list. > (gfc_match_kind_spec): If the KIND expression is parameterized > set KIND to zero and store the expression in 'saved_kind_expr'. > (insert_parameter_exprs): New function. > (gfc_insert_kind_parameter_exprs): New function. > (gfc_insert_parameter_exprs): New function. > (gfc_get_pdt_instance): New function. > (gfc_match_decl_type_spec): Match the decl_type_spec_list if it > is present. If it is, call 'gfc_get_pdt_instance' to obtain the > specific instance of the PDT. > (match_attr_spec): Match KIND and LEN attributes. Check for the > standard and for type/kind of the parameter. They are also not > allowed outside a derived type definition. > (gfc_match_data_decl): Null the decl_type_param_list and the > type_param_spec_list on entry and free them on exit. > (gfc_match_formal_arglist): If 'typeparam' is true, add the > formal symbol to the f2k_derived namespace. > (gfc_match_derived_decl): Register the decl_type_param_list > if this is a PDT. 
If this is a type extension, gather up all > the type parameters and put them in the right order. > *dump-parse-tree.c (show_attr): Signal PDT templates and the > parameter attributes. > (show_components): Output parameter atrributes and component > parameter list. > (show_symbol): Show variable parameter lists. > * expr.c (expr.c): Copy the expression parameter list. > (gfc_is_constant_expr): Pass on symbols representing PDT > parameters. > (gfc_check_init_expr): Break on PDT KIND parameters and > PDT parameter expressions. > (gfc_check_assign): Assigning to KIND or LEN components is an > error. > (derived_parameter_expr): New function. > (gfc_derived_parameter_expr): New function. > (gfc_spec_list_type): New function. > * gfortran.h : Add enum gfc_param_spec_type. Add the PDT attrs > to the structure symbol_attr. Add the 'kind_expr' and > 'param_list' field to the gfc_component structure. Comment on > the reuse of the gfc_actual_arglist structure as storage for > type parameter spec lists. Add the new field 'spec_type' to > this structure.
Re: [Patch, fortran] Parameterized Derived Types
Hi Paul, thanks for your patch! It's really great to finally see PDTs come to gfortran. You're a hero, man ;) Also: Sorry about the silence. It's certainly not due to lack of interest, but rather lack of time (day job and private life taking up all of mine at the moment). In my current situation I can not promise a complete review of this beast of a patch, but I will try to do some testing and at least skim over the diff. I will probably not get to it before the weekend, though. Cheers, Janus 2017-09-06 15:04 GMT+02:00 Paul Richard Thomas : > Dear All, > > Since my message to the list of 16 August 2017 I have put in another > intense period of activity to develop a patch to implement PDTs in > gfortran. I have now temporarily run out of time to develop it > further; partly because of a backlog of other patches and PRs to deal > with but also pressure from daytime work. > > The patch adds the last unimplemented F2003 feature to gfortran. > > As in the provisional patch, I have attached some notes on the > implementation. This indicates some of the weaknesses, problem areas > and TODOs. > > Suggest that a good read of Mark Leair's excellent PGInsider article > on PDTs - http://www.pgroup.com/lit/articles/insider/v5n2a4.htm is a > worthwhile exercise. > > To judge by the complete silence following my previous message, I will > have a problem getting this patch reviewed. I would welcome any > remarks or reviews but intend to commit, warts and all, on Saturday > unless something fundamentally wrong comes out of the woodwork. > > Note that the PDT parts in the compiler are rather well insulated from > the rest of fortran and that I do not believe that any regressions > will result. > > I hope that a month or two of testing in other hands will add to the > list of TODOs and that when I return to PDTs a greatly improved > version will result. > > Bootstrapped and regtested on FC23/x86_4 - OK for trunk? 
(Note above > remark about committing on Saturday in the absence of a review.) > > Best regards > > Paul > > 2017-09-05 Paul Thomas > > * decl.c : Add decl_type_param_list, type_param_spec_list as > static variables to hold PDT spec lists. > (build_sym): Copy 'type_param_spec_list' to symbol spec_list. > (build_struct): Copy the 'saved_kind_expr' to the component > 'kind_expr'. Check that KIND or LEN components appear in the > decl_type_param_list. These should appear as symbols in the > f2k_derived namespace. If the component is itself a PDT type, > copy the decl_type_param_list to the component param_list. > (gfc_match_kind_spec): If the KIND expression is parameterized > set KIND to zero and store the expression in 'saved_kind_expr'. > (insert_parameter_exprs): New function. > (gfc_insert_kind_parameter_exprs): New function. > (gfc_insert_parameter_exprs): New function. > (gfc_get_pdt_instance): New function. > (gfc_match_decl_type_spec): Match the decl_type_spec_list if it > is present. If it is, call 'gfc_get_pdt_instance' to obtain the > specific instance of the PDT. > (match_attr_spec): Match KIND and LEN attributes. Check for the > standard and for type/kind of the parameter. They are also not > allowed outside a derived type definition. > (gfc_match_data_decl): Null the decl_type_param_list and the > type_param_spec_list on entry and free them on exit. > (gfc_match_formal_arglist): If 'typeparam' is true, add the > formal symbol to the f2k_derived namespace. > (gfc_match_derived_decl): Register the decl_type_param_list > if this is a PDT. If this is a type extension, gather up all > the type parameters and put them in the right order. > *dump-parse-tree.c (show_attr): Signal PDT templates and the > parameter attributes. > (show_components): Output parameter atrributes and component > parameter list. > (show_symbol): Show variable parameter lists. > * expr.c (expr.c): Copy the expression parameter list. 
> (gfc_is_constant_expr): Pass on symbols representing PDT > parameters. > (gfc_check_init_expr): Break on PDT KIND parameters and > PDT parameter expressions. > (gfc_check_assign): Assigning to KIND or LEN components is an > error. > (derived_parameter_expr): New function. > (gfc_derived_parameter_expr): New function. > (gfc_spec_list_type): New function. > * gfortran.h : Add enum gfc_param_spec_type. Add the PDT attrs > to the structure symbol_attr. Add the 'kind_expr' and > 'param_list' field to the gfc_component structure. Comment on > the reuse of the gfc_actual_arglist structure as storage for > type parameter spec lists. Add the new field 'spec_type' to > this structure. Add 'param_list' fields to gfc_symbol and > gfc_expr. Add prototypes for gfc_insert_kind_parameter_exprs, > gfc_insert_parameter_exprs, gfc_add_kind, gfc_add_len, > gfc_derived_parameter_expr and gfc_spec_list_type. > *
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford writes: > Richard Sandiford writes: >> Michael Collison writes: >>> Richard, >>> >>> The problem with this approach for Aarch64 is that >>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is >>> normally 0 as it based on the TARGET_SIMD flag. >> >> Maybe I'm wrong, but that seems like a missed optimisation in itself. Sorry to follow up on myself yet again, but I'd forgotten this was because we allow the SIMD unit to do scalar shifts. So I guess we have no choice, even though it seems unfortunate. > +(define_insn_and_split "*aarch64_reg__minus3" > + [(set (match_operand:GPI 0 "register_operand" "=&r") > + (ASHIFT:GPI > + (match_operand:GPI 1 "register_operand" "r") > + (minus:QI (match_operand 2 "const_int_operand" "n") > + (match_operand:QI 3 "register_operand" "r"] > + "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)" > + "#" > + "&& true" > + [(const_int 0)] > + { > +/* Handle cases where operand 3 is a plain QI register, or > + a subreg with either a SImode or DImode register. */ > + > +rtx subreg_tmp = (REG_P (operands[3]) > + ? gen_lowpart_SUBREG (SImode, operands[3]) > + : SUBREG_REG (operands[3])); > + > +if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode) > + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp); I think this all simplifies to: rtx subreg_tmp = gen_lowpart (SImode, operands[3]); (or it would be worth having a comment that explains why not). As well as being shorter, it will properly simplify hard REGs to new hard REGs. > +rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode) > +: operands[0]); > + > +if (mode == DImode && !can_create_pseudo_p ()) > + tmp = gen_lowpart_SUBREG (SImode, operands[0]); I think this too would be simpler with gen_lowpart: rtx tmp = (can_create_pseudo_p () ? 
gen_reg_rtx (SImode) : gen_lowpart (SImode, operands[0])); > + > +emit_insn (gen_negsi2 (tmp, subreg_tmp)); > + > +rtx and_op = gen_rtx_AND (SImode, tmp, > + GEN_INT (GET_MODE_BITSIZE (mode) - 1)); > + > +rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op); > + > +emit_insn (gen_3 (operands[0], operands[1], subreg_tmp2)); > +DONE; > + } > +) The pattern should probably set the "length" attribute to 8. Looks good to me with those changes FWIW. Thanks, Richard
Re: [PATCH], Enable -mfloat128 by default on PowerPC VSX systems
On Wed, Sep 06, 2017 at 01:48:38AM -0400, Michael Meissner wrote: > Here is a respin of the patch to enable -mfloat128 on PowerPC Linux systems > now > that the libquadmath patch has been applied. I rebased the patches against > the > top of the trunk on Tuesday (subversion id 251609). > > I tweaked the documentation a bit based on your comments. > > I built the patch on the following systems. There are no regressions, and the > tests float128-type-{1,2}.c now pass (previously they had regressed due to > other float128 changes). > > * Power7, bootstrap, big endian, --with-cpu=power7 > * Power7, bootstrap, big endian, --with-cpu=power5 > * Power8, bootstrap, little endian, --with-cpu=power8 > * Power9 prototype bootstrap, little endian, --with-cpu=power9 > > Can I check these patches into the trunk? It looks fine, please commit. Thanks! Segher
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford writes: > Michael Collison writes: >> Richard, >> >> The problem with this approach for Aarch64 is that >> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is >> normally 0 as it based on the TARGET_SIMD flag. > > Maybe I'm wrong, but that seems like a missed optimisation in itself. > Like you say, the definition is: > > static unsigned HOST_WIDE_INT > aarch64_shift_truncation_mask (machine_mode mode) > { > return > (!SHIFT_COUNT_TRUNCATED >|| aarch64_vector_mode_supported_p (mode) >|| aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE >|| (mode) - 1); > } er, aarch64_shift_truncation_mask (machine_mode mode) { return (!SHIFT_COUNT_TRUNCATED || aarch64_vector_mode_supported_p (mode) || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1); } > SHIFT_COUNT_TRUNCATED is: > > #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD) > > and aarch64_vector_mode_supported_p always returns false for > !TARGET_SIMD: > > static bool > aarch64_vector_mode_supported_p (machine_mode mode) > { > if (TARGET_SIMD > && (mode == V4SImode || mode == V8HImode > || mode == V16QImode || mode == V2DImode > || mode == V2SImode || mode == V4HImode > || mode == V8QImode || mode == V2SFmode > || mode == V4SFmode || mode == V2DFmode > || mode == V4HFmode || mode == V8HFmode > || mode == V1DFmode)) > return true; > > return false; > } > > So when does the second || condition fire? > > I'm surprised the aarch64_vect_struct_mode_p part is needed, since > this hook describes the shift optabs, and AArch64 don't provide any > shift optabs for OI, CI or XI. 
> > Thanks, > Richard > >> -Original Message- >> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >> Sent: Wednesday, September 6, 2017 11:32 AM >> To: Michael Collison >> Cc: Richard Biener ; Richard Kenner >> ; GCC Patches ; nd >> ; Andrew Pinski >> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >> >> Michael Collison writes: >>> Richard Sandiford do you have any objections to the patch as it stands? >>> It doesn't appear as if anything is going to change in the mid-end >>> anytime soon. >> >> I think one of the suggestions was to do it in expand, taking >> advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would >> be like the current FMA_EXPR handling in expand_expr_real_2. >> >> I know there was talk about cleaner approaches, but at least doing the >> above seems cleaner than doing in the backend. It should also be a >> nicely-contained piece of work. >> >> Thanks, >> Richard >> >>> -Original Message- >>> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >>> Sent: Tuesday, August 22, 2017 9:11 AM >>> To: Richard Biener >>> Cc: Richard Kenner ; Michael Collison >>> ; GCC Patches ; nd >>> ; Andrew Pinski >>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >>> >>> Richard Biener writes: On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford wrote: > Richard Biener writes: >> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >> wrote: >>>Richard Biener writes: On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner wrote: >> Correct. It is truncated for integer shift, but not simd shift >> instructions. We generate a pattern in the split that only >>>generates >> the integer shift instructions. > > That's unfortunate, because it would be nice to do this in >>>simplify_rtx, > since it's machine-independent, but that has to be conditioned > on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. SHIFT_COUNT_TRUNCATED should go ... 
you should express this in the patterns, like for example with (define_insn ashlSI3 [(set (match_operand 0 "") (ashl:SI (match_operand ... ) (subreg:QI (match_operand:SI ...)))] or an explicit and:SI and combine / simplify_rtx should apply the >>>magic optimization we expect. >>> >>>The problem with the explicit AND is that you'd end up with either >>>an AND of two constants for constant shifts, or with two separate >>>patterns, one for constant shifts and one for variable shifts. >>>(And the problem in theory with two patterns is that it reduces the >>>RA's freedom, although in practice I guess we'd always want a >>>constant shift where possible for cost reasons, and so the RA would >>>never need to replace pseudos with constants itself.) >>> >>>I think all useful instances of this optimisation will be exposed >>>by the gimple optimisers, so maybe expand could to do it based on >>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Michael Collison writes: > Richard, > > The problem with this approach for Aarch64 is that > TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is > normally 0 as it based on the TARGET_SIMD flag. Maybe I'm wrong, but that seems like a missed optimisation in itself. Like you say, the definition is: static unsigned HOST_WIDE_INT aarch64_shift_truncation_mask (machine_mode mode) { return (!SHIFT_COUNT_TRUNCATED || aarch64_vector_mode_supported_p (mode) || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1); } SHIFT_COUNT_TRUNCATED is: #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD) and aarch64_vector_mode_supported_p always returns false for !TARGET_SIMD: static bool aarch64_vector_mode_supported_p (machine_mode mode) { if (TARGET_SIMD && (mode == V4SImode || mode == V8HImode || mode == V16QImode || mode == V2DImode || mode == V2SImode || mode == V4HImode || mode == V8QImode || mode == V2SFmode || mode == V4SFmode || mode == V2DFmode || mode == V4HFmode || mode == V8HFmode || mode == V1DFmode)) return true; return false; } So when does the second || condition fire? I'm surprised the aarch64_vect_struct_mode_p part is needed, since this hook describes the shift optabs, and AArch64 don't provide any shift optabs for OI, CI or XI. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Wednesday, September 6, 2017 11:32 AM > To: Michael Collison > Cc: Richard Biener ; Richard Kenner > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Michael Collison writes: >> Richard Sandiford do you have any objections to the patch as it stands? >> It doesn't appear as if anything is going to change in the mid-end >> anytime soon. > > I think one of the suggestions was to do it in expand, taking advantage of > range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current > FMA_EXPR handling in expand_expr_real_2. 
> > I know there was talk about cleaner approaches, but at least doing the above > seems cleaner than doing in the backend. It should also be a > nicely-contained piece of work. > > Thanks, > Richard > >> -Original Message- >> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] >> Sent: Tuesday, August 22, 2017 9:11 AM >> To: Richard Biener >> Cc: Richard Kenner ; Michael Collison >> ; GCC Patches ; nd >> ; Andrew Pinski >> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >> >> Richard Biener writes: >>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >>> wrote: Richard Biener writes: > On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford > wrote: >>Richard Biener writes: >>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >>> wrote: > Correct. It is truncated for integer shift, but not simd shift > instructions. We generate a pattern in the split that only >>generates > the integer shift instructions. That's unfortunate, because it would be nice to do this in >>simplify_rtx, since it's machine-independent, but that has to be conditioned on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >>> >>> SHIFT_COUNT_TRUNCATED should go ... you should express this in >>> the patterns, like for example with >>> >>> (define_insn ashlSI3 >>> [(set (match_operand 0 "") >>> (ashl:SI (match_operand ... ) >>> (subreg:QI (match_operand:SI ...)))] >>> >>> or an explicit and:SI and combine / simplify_rtx should apply the >>magic >>> optimization we expect. >> >>The problem with the explicit AND is that you'd end up with either >>an AND of two constants for constant shifts, or with two separate >>patterns, one for constant shifts and one for variable shifts. >>(And the problem in theory with two patterns is that it reduces the >>RA's freedom, although in practice I guess we'd always want a >>constant shift where possible for cost reasons, and so the RA would >>never need to replace pseudos with constants itself.) 
>> >>I think all useful instances of this optimisation will be exposed >>by the gimple optimisers, so maybe expand could to do it based on >>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >>the rtx code and it does take the mode into account. > > Sure, that could work as well and also take into account range info. > But we'd then need named expanders and the result would still have > the explicit and or need to be an unspec or a different RTL operation. Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have target-dependent rather than undefine
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
On Wed, Sep 06, 2017 at 06:26:10PM +0200, Jakub Jelinek wrote: > > Maybe this "switch to the other section" thing should be abstracted out? > > Messing with in_cold_section_p is a bit dirty. > > But it reflects the reality, and is what final.c and varasm.c also do. Yes, but those aren't target code :-) I'm suggesting adding a generic switch_from_hot_to_cold_or_the_other_way_around function (but with a better name ;-) ) that just does these same two lines, only not in target code. Seems cleaner to me, less surprising. But, okay either way. Segher
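For reference, the helper being suggested would only need to wrap the two lines that the rs6000 patch open-codes; something like the following sketch (GCC-internal code, not standalone — and the name is a placeholder, since Segher deliberately left the naming open):

```c
/* Sketch of the suggested abstraction: switch output from the hot
   partition to the cold one, or vice versa, keeping the
   in_cold_section_p flag consistent with the section actually in
   use, as final.c and varasm.c currently do by hand.  */
static void
switch_to_other_text_partition (void)
{
  in_cold_section_p = !in_cold_section_p;
  switch_to_section (current_function_section ());
}
```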
[committed][Testsuite] PR78468 - add alloca alignment test
Add an alignment test to check that aligned alloca's really do get correctly aligned. Some targets may not ensure SP is always a multiple of STACK_BOUNDARY (particularly with outgoing arguments), which means aligned alloca does not get correctly aligned. This can be fixed either by aligning the outgoing arguments or setting STACK_BOUNDARY correctly. Committed as obvious. ChangeLog: 2017-09-06 Wilco Dijkstra PR middle-end/78468 * gcc.dg/pr78468.c: Add alignment test. -- diff --git a/gcc/testsuite/gcc.dg/pr78468.c b/gcc/testsuite/gcc.dg/pr78468.c new file mode 100644 index ..68eb83a0868c16327e36055aae4eea34fc2ba35e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr78468.c @@ -0,0 +1,102 @@ +/* { dg-do run } */ +/* { dg-require-effective-target alloca } */ +/* { dg-options "-O2 -fno-inline" } */ + +/* Test that targets correctly round the size of the outgoing arguments + to a multiple of STACK_BOUNDARY. There is a serious alignment bug if + aligned alloca does not get aligned! */ + +__extension__ typedef __UINTPTR_TYPE__ uintptr_t; +extern void abort (void); + +volatile int xx; +volatile int x = 16; + +void +t1 (int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, +void *p, int align) +{ + xx = x0 + x1 + x2 + x3 + x4 + x4 + x6 + x7; + if ((int)(uintptr_t)p & (align-1)) +abort (); +} + +void +t2 (int x0, int x1, int x2, int x3, int x4, int x5, int x6, int x7, +void *p, int align, int dummy) +{ + xx = x0 + x1 + x2 + x3 + x4 + x4 + x6 + x7; + if ((int)(uintptr_t)p & (align-1)) +abort (); +} + +void +t1_a4 (int size) +{ + void *p = __builtin_alloca_with_align (size, 32); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 4); +} + +void +t2_a4 (int size) +{ + void *p = __builtin_alloca_with_align (size, 32); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 4, 0); +} + +void +t1_a8 (int size) +{ + void *p = __builtin_alloca_with_align (size, 64); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 8); +} + +void +t2_a8 (int size) +{ + void *p = __builtin_alloca_with_align (size, 64); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 8, 
0); +} + +void +t1_a16 (int size) +{ + void *p = __builtin_alloca_with_align (size, 128); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 16); +} + +void +t2_a16 (int size) +{ + void *p = __builtin_alloca_with_align (size, 128); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 16, 0); +} + +void +t1_a32 (int size) +{ + void *p = __builtin_alloca_with_align (size, 256); + t1 (0, 0, 0, 0, 0, 0, 0, 0, p, 32); +} + +void +t2_a32 (int size) +{ + void *p = __builtin_alloca_with_align (size, 256); + t2 (0, 0, 0, 0, 0, 0, 0, 0, p, 32, 0); +} + + +int +main () +{ + t1_a4 (x); + t2_a4 (x); + t1_a8 (x); + t2_a8 (x); + t1_a16 (x); + t2_a16 (x); + t1_a32 (x); + t2_a32 (x); + return 0; +}
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
On Wed, Sep 06, 2017 at 11:10:07AM -0500, Segher Boessenkool wrote: > >for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > > { > > > if (INSN_P (insn)) > > @@ -25270,10 +25273,14 @@ uses_TOC (void) > > sub = XEXP (sub, 0); > > if (GET_CODE (sub) == UNSPEC > > && XINT (sub, 1) == UNSPEC_TOC) > > - return 1; > > + return ret; > > } > > } > >} > > +else if (crtl->has_bb_partition > > +&& NOTE_P (insn) > > +&& NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > > + ret = 2; > > } Ok. > > + if (uses_toc == 2) I could repeat the crtl->has_bb_partition test here if it made things clearer, but it is redundant with the above. > > + { > > + in_cold_section_p = !in_cold_section_p; > > + switch_to_section (current_function_section ()); > > + } > >(*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno); > > > >fprintf (file, "\t.long "); > > @@ -33321,6 +4,11 @@ rs6000_elf_declare_function_name (FILE * > >ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno); > >assemble_name (file, buf); > >putc ('\n', file); > > + if (uses_toc == 2) > > + { > > + in_cold_section_p = !in_cold_section_p; > > + switch_to_section (current_function_section ()); > > + } > > } > > Hrm, does that work if not hot/cold partitioning? Oh, that cannot happen > because uses_toc==2. Tricky. > > Maybe this "switch to the other section" thing should be abstracted out? > Messing with in_cold_section_p is a bit dirty. But it reflects the reality, and is what final.c and varasm.c also do. Without changing in_cold_section_p, that flag will be incorrect while inside of the other section. There are no switch_to_* functions except to switch_to_section, and as argument that can use current_function_section which uses the in_cold_section_p flag, or unlikely_text_section which hardcodes true for in cold, or function_section which uses first_function_block_is_cold. 
Even if we introduced function_other_section that used !first_function_block_is_cold the in_cold_section_p flag would be incorrect there. Jakub
[PATCH, rs6000] Add builtins to convert from float/double to int/long using current rounding mode
GCC Maintainers: The following patch adds support for a couple of requested builtins that convert from float/double to int / long using the current rounding mode. The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE). Please let me know if the following patch is acceptable. Thanks. Carl Love --- gcc/ChangeLog: 2017-09-06 Carl Love * config/rs6000/rs6000-builtin.def (FCTID, FCTIW): Add BU_P7_MISC_1 macro expansion for builtins. * config/rs6000/rs6000.md (fctid, fctiw): Add define_insn for the fctid and fctiw instructions. gcc/testsuite/ChangeLog: 2017-09-06 Carl Love * gcc.target/powerpc/builtin-fctid-fctiw-runnable.c: New test file for the __builtin_fctid and __builtin_fctiw builtins. --- gcc/config/rs6000/rs6000-builtin.def | 2 + gcc/config/rs6000/rs6000.md| 18 +++ .../powerpc/builtin-fctid-fctiw-runnable.c | 138 + 3 files changed, 158 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 850164a..7affa30 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2231,6 +2231,8 @@ BU_DFP_MISC_2 (DSCRIQ,"dscriq", CONST, dfp_dscri_td) /* 1 argument BCD functions added in ISA 2.06. */ BU_P7_MISC_1 (CDTBCD, "cdtbcd", CONST, cdtbcd) BU_P7_MISC_1 (CBCDTD, "cbcdtd", CONST, cbcdtd) +BU_P7_MISC_1 (FCTID, "fctid",CONST, fctid) +BU_P7_MISC_1 (FCTIW, "fctiw",CONST, fctiw) /* 2 argument BCD functions added in ISA 2.06. 
*/ BU_P7_MISC_2 (ADDG6S, "addg6s", CONST, addg6s) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 20873ac..a5cbef5 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -14054,6 +14054,24 @@ [(set_attr "type" "integer") (set_attr "length" "4")]) +(define_insn "fctid" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(match_operand:DF 1 "register_operand" "f")] + UNSPEC_FCTID))] + "" + "fctid %1,%1; mfvsrd %0,%1" + [(set_attr "type" "two") + (set_attr "length" "8")]) + +(define_insn "fctiw" + [(set (match_operand:SI 0 "register_operand" "=r") + (unspec:SI [(match_operand:DF 1 "register_operand" "f")] + UNSPEC_FCTIW))] + "" + "fctiw %1,%1; mfvsrd %0,%1; extsw %0,%0" + [(set_attr "type" "integer") + (set_attr "length" "4")]) + (define_int_iterator UNSPEC_DIV_EXTEND [UNSPEC_DIVE UNSPEC_DIVEO UNSPEC_DIVEU diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c new file mode 100644 index 000..79c5341 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fctid-fctiw-runnable.c @@ -0,0 +1,138 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +#ifdef DEBUG +#include +#endif + +void abort (void); + +long +test_bi_lrint_1 (float __A) +{ + return (__builtin_fctid (__A)); +} +long +test_bi_lrint_2 (double __A) +{ + return (__builtin_fctid (__A)); +} + +int +test_bi_rint_1 (float __A) +{ + return (__builtin_fctiw (__A)); +} + +int +test_bi_rint_2 (double __A) +{ + return (__builtin_fctiw (__A)); +} + + +int main( void) +{ + signed long lx, expected_l; + double dy; + + signed int x, expected_i; + float y; + + dy = 1.45; + expected_l = 1; + lx = __builtin_fctid (dy); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: __builtin_fctid(dy= %f) = %ld, expected %ld\n", + dy, lx, expected_l); +#else +abort(); +#endif 
+ + dy = 3.51; + expected_l = 4; + lx = __builtin_fctid (dy); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: __builtin_fctid(dy= %f) = %ld, expected %ld\n", + dy, lx, expected_l); +#else +abort(); +#endif + + dy = 5.57; + expected_i = 6; + x = __builtin_fctiw (dy); + + if( x != expected_i) +#ifdef DEBUG +printf("ERROR: __builtin_fctiw(dy= %f) = %d, expected %d\n", + dy, x, expected_i); +#else +abort(); +#endif + + y = 11.47; + expected_i = 11; + x = __builtin_fctiw (y); + + if( x != expected_i) +#ifdef DEBUG +printf("ERROR: __builtin_fctiw(y = %f) = %d, expected %d\n", + y, x, expected_i); +#else +abort(); +#endif + + y = 17.77; + expected_l = 18; + lx = test_bi_lrint_1 (y); + + if( lx != expected_l) +#ifdef DEBUG +printf("ERROR: function call test_bi_lrint_1 (y = %f) = %ld, expected %ld\n", + y, lx, expected_l); +#else
Re: [PATCH] Fix rs6000 sysv4 -fPIC hot/cold partitioning handling (PR target/81979)
Hi, On Tue, Sep 05, 2017 at 11:27:25PM +0200, Jakub Jelinek wrote: > On powerpc with sysv4 -fPIC we emit something like > .LCL0: > .long .LCTOC1-.LCF0 > before we start emitting the function, and in the prologue we emit > .LCF0: > and some code. This fails to assemble if the prologue is emitted in a > different partition from the start of the function, as e.g. the following > testcase, where the start of the function is hot, i.e. in .text section, > but the shrink-wrapped prologue is cold, emitted in .text.unlikely section. > .LCL0 is still emitted in the section the function starts, thus .text, and > there is no relocation for subtraction of two symbols in other sections > (the second - operand has to be in the current section so that a PC-relative > relocation can be used). This probably never worked, but is now more > severe, as we enable hot/cold partitioning in GCC 8, where it > has been previously only enabled for -fprofile-use. I wonder if that helps performance at all, for rs6000 anyway... It's is a never-ending source of ICEs though :-( > --- gcc/config/rs6000/rs6000.c.jj 2017-09-04 09:55:28.0 +0200 > +++ gcc/config/rs6000/rs6000.c2017-09-04 16:36:49.033213325 +0200 > @@ -25248,12 +25248,15 @@ get_TOC_alias_set (void) > > /* This returns nonzero if the current function uses the TOC. This is > determined by the presence of (use (unspec ... UNSPEC_TOC)), which > - is generated by the ABI_V4 load_toc_* patterns. */ > + is generated by the ABI_V4 load_toc_* patterns. > + Return 2 instead of 1 if the load_toc_* pattern is in the function > + partition that doesn't start the function. 
*/ > #if TARGET_ELF > static int > uses_TOC (void) > { >rtx_insn *insn; > + int ret = 1; > >for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) { > if (INSN_P (insn)) > @@ -25270,10 +25273,14 @@ uses_TOC (void) > sub = XEXP (sub, 0); > if (GET_CODE (sub) == UNSPEC > && XINT (sub, 1) == UNSPEC_TOC) > - return 1; > + return ret; > } > } >} > +else if (crtl->has_bb_partition > + && NOTE_P (insn) > + && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > + ret = 2; } >return 0; > } > #endif > @@ -33304,14 +33311,20 @@ rs6000_elf_declare_function_name (FILE * >return; > } > > + int uses_toc; >if (DEFAULT_ABI == ABI_V4 >&& (TARGET_RELOCATABLE || flag_pic > 1) >&& !TARGET_SECURE_PLT >&& (!constant_pool_empty_p () || crtl->profile) > - && uses_TOC ()) > + && (uses_toc = uses_TOC ())) > { >char buf[256]; > > + if (uses_toc == 2) > + { > + in_cold_section_p = !in_cold_section_p; > + switch_to_section (current_function_section ()); > + } >(*targetm.asm_out.internal_label) (file, "LCL", rs6000_pic_labelno); > >fprintf (file, "\t.long "); > @@ -33321,6 +4,11 @@ rs6000_elf_declare_function_name (FILE * >ASM_GENERATE_INTERNAL_LABEL (buf, "LCF", rs6000_pic_labelno); >assemble_name (file, buf); >putc ('\n', file); > + if (uses_toc == 2) > + { > + in_cold_section_p = !in_cold_section_p; > + switch_to_section (current_function_section ()); > + } > } Hrm, does that work if not hot/cold partitioning? Oh, that cannot happen because uses_toc==2. Tricky. Maybe this "switch to the other section" thing should be abstracted out? Messing with in_cold_section_p is a bit dirty. Otherwise looks okay; please add the {} in the first fragment. Thanks, Segher
RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops
Patch updated with all relevant comments and suggestions. Bootstrapped and tested on arm-none-linux-gnueabihf, and aarch64-none-linux-gnu and x86_64. Ok for trunk? 2017-08-05 Kyrylo Tkachov Michael Collison * compare-elim.c: Include emit-rtl.h. (can_merge_compare_into_arith): New function. (try_validate_parallel): Likewise. (try_merge_compare): Likewise. (try_eliminate_compare): Call the above when no previous clobber is available. (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN dataflow problems. 2017-08-05 Kyrylo Tkachov Michael Collison * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test. -Original Message- From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] Sent: Saturday, September 2, 2017 12:07 AM To: Kyrill Tkachov Cc: Jeff Law ; Michael Collison ; gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops Hi! On Tue, Aug 29, 2017 at 09:39:06AM +0100, Kyrill Tkachov wrote: > On 28/08/17 19:26, Jeff Law wrote: > >On 08/10/2017 03:14 PM, Michael Collison wrote: > >>One issue that we keep encountering on aarch64 is GCC not making > >>good use of the flag-setting arithmetic instructions like ADDS, > >>SUBS, ANDS etc. that perform an arithmetic operation and compare the > >>result against zero. > >>They are represented in a fairly standard way in the backend as > >>PARALLEL > >>patterns: > >>(parallel [(set (reg x1) (op (reg x2) (reg x3))) > >>(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int > >>0)))]) That is incorrect: the compare has to come first. From md.texi: @cindex @code{compare}, canonicalization of [ ... ] @item For instructions that inherently set a condition code register, the @code{compare} operator is always written as the first RTL expression of the @code{parallel} instruction pattern. For example, [ ... ] aarch64.md seems to do this correctly, fwiw. > >>GCC isn't forming these from separate arithmetic and comparison > >>instructions as aggressively as it could. 
> >>A particular pain point is when the result of the arithmetic insn is > >>used before the comparison instruction. > >>The testcase in this patch is one such example where we have: > >>(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (plus:SI (reg:SI 0 x0 [ x ]) > >> (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 33 7 34 2 (set (reg:SI 1 x1 [77]) > >> (plus:SI (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64} > >> (nil)) > >>(insn 34 33 17 2 (set (reg:CC 66 cc) > >> (compare:CC (reg/v:SI 0 x0 [orig:73 ] [73]) > >> (const_int 0 [0]))) "comb.c":4 391 {cmpsi} > >> (nil)) > >> > >>This scares combine away as x0 is used in insn 33 as well as the > >>comparison in insn 34. > >>I think the compare-elim pass can help us here. > >Is it the multiple use or the hard register that combine doesn't > >appreciate. The latter would definitely steer us towards compare-elim. > > It's the multiple use IIRC. Multiple use, and multiple set (of x1), and more complications... 7+33 won't combine to an existing insn. 7+34 will not even be tried (insn 33 is the first use of x0, not insn 34). But it cannot work anyway, since x1 in insn 7 is clobbered in insn 33, so 7 cannot be merged into 34. 7+33+34 results in a parallel of a compare with the same invalid insn as in the 7+33 case. Combine would try to split it to two insns again, except it already has two insns (the arith and the compare). It does not see that when it splits the insn it can combine the first half with the compare. What would be needed is pulling insn 34 before insn 33 (which is fine, no conflicts there), and then we could combine 7+34 just fine. But combine tries to be linear complexity, and it really cannot change insns around anyway. Segher pr5198v2.patch Description: pr5198v2.patch
RE: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard, The problem with this approach for Aarch64 is that TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is normally 0 as it based on the TARGET_SIMD flag. -Original Message- From: Richard Sandiford [mailto:richard.sandif...@linaro.org] Sent: Wednesday, September 6, 2017 11:32 AM To: Michael Collison Cc: Richard Biener ; Richard Kenner ; GCC Patches ; nd ; Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts Michael Collison writes: > Richard Sandiford do you have any objections to the patch as it stands? > It doesn't appear as if anything is going to change in the mid-end > anytime soon. I think one of the suggestions was to do it in expand, taking advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current FMA_EXPR handling in expand_expr_real_2. I know there was talk about cleaner approaches, but at least doing the above seems cleaner than doing in the backend. It should also be a nicely-contained piece of work. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Tuesday, August 22, 2017 9:11 AM > To: Richard Biener > Cc: Richard Kenner ; Michael Collison > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Richard Biener writes: >> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford wrote: >Richard Biener writes: >> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >> wrote: Correct. It is truncated for integer shift, but not simd shift instructions. We generate a pattern in the split that only >generates the integer shift instructions. >>> >>> That's unfortunate, because it would be nice to do this in >simplify_rtx, >>> since it's machine-independent, but that has to be conditioned >>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. 
>> >> SHIFT_COUNT_TRUNCATED should go ... you should express this in >> the patterns, like for example with >> >> (define_insn ashlSI3 >> [(set (match_operand 0 "") >> (ashl:SI (match_operand ... ) >> (subreg:QI (match_operand:SI ...)))] >> >> or an explicit and:SI and combine / simplify_rtx should apply the >magic >> optimization we expect. > >The problem with the explicit AND is that you'd end up with either >an AND of two constants for constant shifts, or with two separate >patterns, one for constant shifts and one for variable shifts. >(And the problem in theory with two patterns is that it reduces the >RA's freedom, although in practice I guess we'd always want a >constant shift where possible for cost reasons, and so the RA would >never need to replace pseudos with constants itself.) > >I think all useful instances of this optimisation will be exposed >by the gimple optimisers, so maybe expand could to do it based on >TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >the rtx code and it does take the mode into account. Sure, that could work as well and also take into account range info. But we'd then need named expanders and the result would still have the explicit and or need to be an unspec or a different RTL operation. >>> >>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>> target-dependent rather than undefined behaviour, so it's OK for a >>> target to use shift codes with out-of-range values. >> >> Hmm, but that means simplify-rtx can't do anything with them because >> we need to preserve target dependent behavior. > > Yeah, it needs to punt. In practice that shouldn't matter much. > >> I think the RTL IL should be always well-defined and its semantics >> shouldn't have any target dependences (ideally, and if, then they >> should be well specified via extra target hooks/macros). 
> > That would be nice :-) I think the problem has traditionally been that >> shifts can be used in quite a few define_insn patterns besides those >> for shift instructions. So if your target defines shifts to have >> 256-bit precision (say) then you need to make sure that every >> define_insn with a shift rtx will honour that. > > It's more natural for target guarantees to apply to instructions than > to >> rtx codes. > >>> And >>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about >>> how the normal shift optabs behave, so I don't think we'd need new >>> optabs or new unspecs. >>> >>> E.g. it already works this way when expanding double-word shifts, >>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There >>> it's possible to use a shorter sequence if you know that the shift >>> optab truncates the count, so we can do that ev
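The transformation under discussion — folding a subtract that feeds a shift count into the shift itself — rests on a simple identity that holds whenever the target truncates shift counts. A portable C sketch of the idea (not the patch itself):

```c
#include <stdint.h>

/* The form GCC sees in the source: a subtract feeding a shift count.
   Assumes 1 <= n <= 31.  */
uint32_t
shift_sub (uint32_t x, unsigned n)
{
  return x >> (32 - n);
}

/* The form the optimization wants: for 1 <= n <= 31,
   (32 - n) == (-n & 31), so a target whose shift instructions
   truncate the count to 5 bits (TARGET_SHIFT_TRUNCATION_MASK == 31)
   can use the negated count directly and drop the subtract and the
   mask entirely.  */
uint32_t
shift_neg (uint32_t x, unsigned n)
{
  return x >> (-n & 31);
}
```

Doing this at expand time, as suggested above, lets range information prove the 1..31 precondition while TARGET_SHIFT_TRUNCATION_MASK vouches for the optab's behaviour, without touching SHIFT_COUNT_TRUNCATED.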
Re: [PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)
On Wed, Sep 06, 2017 at 09:29:25AM -0600, Jeff Law wrote: > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > > > 2017-09-05 Jakub Jelinek > > > > PR middle-end/82095 > > * varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars > > with > > NULL DECL_INITIAL. > > > > * gcc.dg/tls/pr82095.c: New test. > THanks. Sorry about the breakage. TLS didn't even cross my mind. > Presumably the TLS initialization sections are readonly and copied into > the actual thread specific locations. .tbss section is just in headers (it implies zeroing the corresponding thread private chunk) and .tdata is in relro part (the image; it might contain relocations and we don't support separate images for parts without and with relocations), copied to the thread private chunk. Jakub
[C++ PATCH] Merge fn and non-fn lookup interface
This patch merges function and non-function member lookup into get_class_binding_direct. lookup_field_1 becomes an internal detail. We grow a tri-valued argument to get_class_binding_direct: <0 -- caller wants functions =0 -- caller wants whatever is bound >0 -- caller wants type_decl binding. This has the nice property that lookup_field_1's want_type argument maps onto the latter two values. The default is the first, which matches the existing get_class_binding usage. The two places where lookup_field_1 was being called directly are converted and were: 1) hierarchy searching. This functionality is swallowed by get_class_binding_direct, and it passes in the want_type argument. 2) named initializers. This now passes in 0. You'll notice this case is with the type being complete, so we now might get a binary search of METHOD_VEC that we didn't before. This is going to be a short-lived performance regression. Applied to trunk. I'm going to hold off the next patch as (a) it's more invasive, but (b) it steals the punch line from my name-lookup cauldron talk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.h (lookup_field_1): Delete. (get_class_binding_direct, get_class_binding): Add type_or_fns arg. * name-lookup.c (lookup_field_1): make static (method_vec_binary_search, method_vec_linear_search): New. Broken out of ... (get_class_binding_direct): ... here. Add TYPE_OR_FNS argument. Do complete search of this level. (get_class_binding): Adjust. * decl.c (reshape_init_class): Call get_class_binding. * search.c (lookup_field_r): Move field searching into get_class_binding_direct. Index: decl.c === --- decl.c (revision 251782) +++ decl.c (working copy) @@ -5746,7 +5746,7 @@ reshape_init_class (tree type, reshape_i /* We already reshaped this. 
*/ gcc_assert (d->cur->index == field); else if (TREE_CODE (d->cur->index) == IDENTIFIER_NODE) - field = lookup_field_1 (type, d->cur->index, /*want_type=*/false); + field = get_class_binding (type, d->cur->index, false); else { if (complain & tf_error) Index: name-lookup.c === --- name-lookup.c (revision 251794) +++ name-lookup.c (working copy) @@ -1113,79 +1113,54 @@ extract_conversion_operator (tree fns, t return convs; } -/* TYPE is a class type. Return the member functions in the method - vector with name NAME. Does not lazily declare implicitly-declared - member functions. */ +/* Binary search of (ordered) METHOD_VEC for NAME. */ -tree -get_class_binding_direct (tree type, tree name) +static tree +method_vec_binary_search (vec *method_vec, tree name) { - vec *method_vec = CLASSTYPE_METHOD_VEC (type); - if (!method_vec) -return NULL_TREE; - - /* Conversion operators can only be found by the marker conversion - operator name. */ - bool conv_op = IDENTIFIER_CONV_OP_P (name); - tree lookup = conv_op ? conv_op_identifier : name; - tree val = NULL_TREE; - tree fns; - - /* If the type is complete, use binary search. */ - if (COMPLETE_TYPE_P (type)) + for (unsigned lo = 0, hi = method_vec->length (); lo < hi;) { - int lo = 0; - int hi = method_vec->length (); - while (lo < hi) - { - int i = (lo + hi) / 2; - - fns = (*method_vec)[i]; - tree fn_name = OVL_NAME (fns); - if (fn_name > lookup) - hi = i; - else if (fn_name < lookup) - lo = i + 1; - else - { - val = fns; - break; - } - } + unsigned mid = (lo + hi) / 2; + tree binding = (*method_vec)[mid]; + tree binding_name = OVL_NAME (binding); + + if (binding_name > name) + hi = mid; + else if (binding_name < name) + lo = mid + 1; + else + return binding; } - else -for (int i = 0; vec_safe_iterate (method_vec, i, &fns); ++i) - /* We can get a NULL binding during insertion of a new - method name, because the identifier_binding machinery - performs a lookup. 
If we find such a NULL slot, that's - the thing we were looking for, so we might as well bail - out immediately. */ - if (!fns) - break; - else if (OVL_NAME (fns) == lookup) - { - val = fns; - break; - } - /* Extract the conversion operators asked for, unless the general - conversion operator was requested. */ - if (val && conv_op) -{ - gcc_checking_assert (OVL_FUNCTION (val) == conv_op_marker); - val = OVL_CHAIN (val); - if (tree type = TREE_TYPE (name)) - val = extract_conversion_operator (val, type); -} + return NULL_TREE; +} - return val; +/* Linear search of (unordered) METHOD_VEC for NAME. */ + +static tree +method_vec_linear_search (vec *method_vec, tree name) +{ + for (int ix = method_vec->length (); ix--;) +/* We can get a NULL binding during insertion of a new method + name, because the identifier_binding machinery perfo
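The split between the two searches in the patch can be illustrated with a generic sketch — plain C with strcmp standing in for GCC's IDENTIFIER-pointer ordering, not the patch itself:

```c
#include <string.h>
#include <stddef.h>

struct member_binding { const char *name; const char *binding; };

/* Sketch of method_vec_binary_search: once a class is complete its
   member vector is kept sorted by name, so lookup is a plain binary
   search.  (GCC orders entries by IDENTIFIER node pointer; strcmp
   here merely stands in for that total order.)  */
static const char *
find_binding (const struct member_binding *vec, size_t len,
	      const char *name)
{
  size_t lo = 0, hi = len;
  while (lo < hi)
    {
      size_t mid = (lo + hi) / 2;
      int cmp = strcmp (vec[mid].name, name);
      if (cmp > 0)
	hi = mid;
      else if (cmp < 0)
	lo = mid + 1;
      else
	return vec[mid].binding;
    }
  /* As in GCC, absence of a binding is reported as NULL.  */
  return NULL;
}
```

For an incomplete type the vector is not yet sorted, which is why method_vec_linear_search keeps the linear walk for that case.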
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 06/09/17 14:17, Bernd Edlinger wrote: On 09/06/17 14:51, Richard Earnshaw (lists) wrote: On 06/09/17 13:44, Bernd Edlinger wrote: On 09/04/17 21:54, Bernd Edlinger wrote: Hi Kyrill, Thanks for your review! On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: On 01/13/17 19:28, Bernd Edlinger wrote: On 01/13/17 17:10, Bernd Edlinger wrote: On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: Hi, this is related to PR77308, the follow-up patch will depend on this one. When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned before reload, a mis-compilation in libgcc function __gnu_satfractdasq was discovered, see [1] for more details. The reason seems to be that when the *arm_cmpdi_insn is directly followed by a *arm_cmpdi_unsigned instruction, both are split up into this: [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1))) (parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 3) (match_dup 4))) (set (match_dup 2) (minus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3))) (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) (set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1] The problem is that the reg:CC from the *subsi3_carryin_compare is not mentioning that the reg:CC is also dependent on the reg:CC from before. Therefore the *arm_cmpsi_insn appears to be redundant and thus got removed, because the data values are identical. I think that applies to a number of similar pattern where data flow is happening through the CC reg. So this is a kind of correctness issue, and should be fixed independently from the optimization issue PR77308. Therefore I think the patterns need to specify the true value that will be in the CC reg, in order for cse to know what the instructions are really doing. Bootstrapped and reg-tested on arm-linux-gnueabihf. Is it OK for trunk? 
I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). Yes, this is still a bit awkward... The N and V bit will be the correct result for the subdi3_compare1 a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) only gets the C bit correct, the expression for N and V is a different one. It probably works, because the subsi3_carryin_compare instruction sets more CC bits than the pattern does explicitly specify the value. We know the subsi3_carryin_compare also computes the NV bits, but it is hard to write down the correct rtl expression for it. 
In theory the pattern should describe everything correctly, maybe, like: set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) set (reg:CC_NV CC_REGNUM) (compare:CC_NV (match_dup 4)) (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))) set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0) But I doubt that wil
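What these patterns compute can be seen in plain C: a 64-bit compare is done as a low-half subtract that produces a borrow, followed by a high-half subtract-with-borrow whose flags decide the result — so the second instruction's CC value genuinely depends on the first's, which is exactly the CC-register data flow the patch makes explicit. A sketch (illustration only, not the patch):

```c
#include <stdint.h>

/* 64-bit unsigned "a < b" built from 32-bit halves, mirroring the
   SUBS/SBCS split of *arm_cmpdi_insn: the low-half subtract sets the
   carry/borrow, and the high-half subtract consumes it.  The flags
   of the second step are only meaningful given the first one's
   carry-out -- losing that dependency is the mis-compilation
   described above.  */
static int
ltu_di_via_si (uint64_t a, uint64_t b)
{
  uint32_t al = (uint32_t) a, ah = (uint32_t) (a >> 32);
  uint32_t bl = (uint32_t) b, bh = (uint32_t) (b >> 32);
  unsigned borrow = al < bl;                    /* SUBS: low-half borrow   */
  uint64_t high = (uint64_t) ah - bh - borrow;  /* SBCS: borrow-in applied */
  return (high >> 32) & 1;                      /* borrow-out == unsigned < */
}
```

If CSE deletes the first compare because two CC-setting insns look textually identical, the borrow consumed by the second step is the wrong one, even though each insn in isolation looks correct.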
Re: [PATCH] Fix ICE in categorize_decl_for_section with TLS decl (PR middle-end/82095)
On 09/05/2017 03:16 PM, Jakub Jelinek wrote: > Hi! > > If a DECL_THREAD_LOCAL_P decl has NULL DECL_INITIAL and > -fzero-initialized-in-bss (the default), we ICE starting with > r251602, which changed bss_initializer_p: > + /* Do not put constants into the .bss section, they belong in a readonly > + section. */ > + return (!TREE_READONLY (decl) > + && > to: > (DECL_INITIAL (decl) == NULL > /* In LTO we have no errors in program; error_mark_node is used > to mark offlined constructors. */ > || (DECL_INITIAL (decl) == error_mark_node > && !in_lto_p) > || (flag_zero_initialized_in_bss > && initializer_zerop (DECL_INITIAL (decl > Previously because bss_initializer_p for these returned true, ret was > SECCAT_BSS and therefore we set it to SECCAT_TBSS as intended, but now ret > is not SECCAT_BSS, but as TLS has only tbss and tdata possibilities, we > still want to use tbss. DECL_INITIAL NULL for a decl means implicit zero > initialization. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? > > 2017-09-05 Jakub Jelinek > > PR middle-end/82095 > * varasm.c (categorize_decl_for_section): Use SECCAT_TBSS for TLS vars > with > NULL DECL_INITIAL. > > * gcc.dg/tls/pr82095.c: New test. THanks. Sorry about the breakage. TLS didn't even cross my mind. Presumably the TLS initialization sections are readonly and copied into the actual thread specific locations. Jeff
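The ICE triggered on the simplest possible TLS declaration — an uninitialized __thread variable, which has a NULL DECL_INITIAL and must be placed in .tbss. A minimal example along these lines (a sketch; the committed reproducer is gcc.dg/tls/pr82095.c):

```c
/* An uninitialized TLS variable has NULL DECL_INITIAL, i.e. implicit
   zero initialization, so it belongs in .tbss (the section exists
   only in the headers and implies zeroing each thread's chunk).
   An initialized one goes to .tdata, copied per thread.  */
__thread int zeroed;       /* .tbss: guaranteed zero in every thread */
__thread int valued = 42;  /* .tdata */
```

Before r251602, bss_initializer_p returned true for the first declaration, steering categorize_decl_for_section to SECCAT_TBSS; the fix restores that outcome directly for TLS variables with NULL DECL_INITIAL.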
[PATCH, config.gcc] fix case filter for powerpc-*-vxworkspe
To match on vxworks*spe so it applies to VxWorks 7 as well. Committing to mainline after verifying for e500v2-wrs-vxworks7 that we now include config/powerpcspe/vxworks.h instead of config/rs6000/vxworks.h. Olivier 2017-09-06 Olivier Hainque * config.gcc (powerpc-wrs-vxworksspe): Now match as vxworks*spe. config-vx7spe.diff
[PATCH, rs6000] Add support for vec_xst_len_r() and vec_xl_len_r() builtins
GCC Maintainers: The following patch adds support for the vec_xst_len_r() and vec_xl_len_r() Powerr 9 builtins. The patch has been run on powerpc64le-unknown-linux-gnu (Power 9 LE). No regressions were found but it does seem to "fix" a couple of existing tests. 136a137 > FAIL: TestCgoCallbackGC 139c140,141 < # of expected passes 350 --- > # of expected passes 349 > # of unexpected failures 1 141c143 < /home/carll/GCC/build/gcc-builtin-pre-commit/./gcc/gccgo version 8.0.0 20170905 (experimental) (GCC) --- > /home/carll/GCC/build/gcc-base/./gcc/gccgo version 8.0.0 20170905 > (experimental) (GCC) 163a166 > FAIL: html/template 167,168c170,172 < # of expected passes 146 < /home/carll/GCC/build/gcc-builtin-pre-commit/./gcc/gccgo version 8.0.0 20170905 (experimental) (GCC) --- > # of expected passes 145 > # of unexpected failures 1 > /home/carll/GCC/build/gcc-base/./gcc/gccgo version 8.0.0 20170905 > (experimental) (GCC) Please let me know if the following patch is acceptable. Thanks. Carl Love gcc/ChangeLog: 2017-09-06 Carl Love * config/rs6000/rs6000-c.c (P9V_BUILTIN_VEC_XL_LEN_R, P9V_BUILTIN_VEC_XST_LEN_R): Add support for builtins vector unsigned char vec_xl_len_r (unsigned char *, size_t); void vec_xst_len_r (vector unsigned char, unsigned char *, size_t); * config/rs6000/altivec.h (vec_xl_len_r, vec_xst_len_r): Add defines. * config/rs6000/rs6000-builtin.def (XL_LEN_R, XST_LEN_R): Add definitions and overloading. * config/rs6000/rs6000.c (altivec_expand_builtin): Add case statement for P9V_BUILTIN_XST_LEN_R. (altivec_init_builtins): Add def_builtin for P9V_BUILTIN_STXVLL. * config/rs6000/vsx.md (addi_neg16, lxvll, stxvll, altivec_lvsl_reg, altivec_lvsr_reg, xl_len_r, xst_len_r): Add define_expand and define_insn for the instructions and builtins. (define_insn "*stxvl"): add missing argument to the sldi instruction. * doc/extend.texi: Update the built-in documenation file for the new built-in functions. 
gcc/testsuite/ChangeLog: 2017-09-06 Carl Love * gcc.target/powerpc/builtins-5-p9-runnable.c: Add new runable test file for the new built-ins and the existing built-ins. --- gcc/config/rs6000/altivec.h| 2 + gcc/config/rs6000/rs6000-builtin.def | 4 + gcc/config/rs6000/rs6000-c.c | 8 + gcc/config/rs6000/rs6000.c | 7 +- gcc/config/rs6000/vsx.md | 133 - gcc/doc/extend.texi| 4 + .../gcc.target/powerpc/builtins-5-p9-runnable.c| 309 + 7 files changed, 465 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-5-p9-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index c8e508c..94a4db2 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -467,6 +467,8 @@ #ifdef _ARCH_PPC64 #define vec_xl_len __builtin_vec_lxvl #define vec_xst_len __builtin_vec_stxvl +#define vec_xl_len_r __builtin_vec_xl_len_r +#define vec_xst_len_r __builtin_vec_xst_len_r #endif #define vec_cmpnez __builtin_vec_vcmpnez diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 850164a..8f87cce 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2125,6 +2125,7 @@ BU_P9V_OVERLOAD_2 (VIESP, "insert_exp_sp") /* 2 argument vector functions added in ISA 3.0 (power9). */ BU_P9V_64BIT_VSX_2 (LXVL, "lxvl", CONST, lxvl) +BU_P9V_64BIT_VSX_2 (XL_LEN_R, "xl_len_r", CONST, xl_len_r) BU_P9V_AV_2 (VEXTUBLX, "vextublx", CONST, vextublx) BU_P9V_AV_2 (VEXTUBRX, "vextubrx", CONST, vextubrx) @@ -2141,6 +2142,7 @@ BU_P9V_VSX_3 (VINSERT4B_DI, "vinsert4b_di", CONST, vinsert4b_di) /* 3 argument vector functions returning void, treated as SPECIAL, added in ISA 3.0 (power9). */ BU_P9V_64BIT_AV_X (STXVL, "stxvl",MISC) +BU_P9V_64BIT_AV_X (XST_LEN_R, "xst_len_r",MISC) /* 1 argument vector functions added in ISA 3.0 (power9). 
*/ BU_P9V_AV_1 (VCLZLSBB, "vclzlsbb", CONST, vclzlsbb) @@ -2182,12 +2184,14 @@ BU_P9V_AV_P (VCMPNEZW_P,"vcmpnezw_p", CONST, vector_nez_v4si_p) /* ISA 3.0 Vector scalar overloaded 2 argument functions */ BU_P9V_OVERLOAD_2 (LXVL, "lxvl") +BU_P9V_OVERLOAD_2 (XL_LEN_R, "xl_len_r") BU_P9V_OVERLOAD_2 (VEXTULX,"vextulx") BU_P9V_OVERLOAD_2 (VEXTURX,"vexturx") BU_P9V_OVERLOAD_2 (VEXTRACT4B, "vextract4b") /* ISA 3.0 Vector scalar overloaded 3 argument functions */ BU_P9V_OVERLOAD_3 (STXVL, "stxvl") +BU_P9V_OVERLOAD_3 (XST_LEN_R, "xst_len_r") B
[arm-embedded] [PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor
Hi, We have decided to apply the following patch to the embedded-7-branch to enable Arm Cortex-R52 support. *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-14 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. (armv8-r): Set ARM Cortex-R52 as default CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52 and availability of fp.dp extension for -mcpu=cortex-r52. Best regards, Thomas --- Begin Message --- Hi, On 29/06/17 16:13, Thomas Preudhomme wrote: Please ignore this patch. I'll respin the patch on a more recent GCC. Please find an updated patch in attachment. This patch adds support for the ARM Cortex-R52 processor recently announced. [1] https://developer.arm.com/products/processors/cortex-r/cortex-r52 ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-07-14 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. (armv8-r): Set ARM Cortex-R52 as default CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52 and availability of fp.dp extension for -mcpu=cortex-r52. Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52 and building a hello world with it. Also checked that the .fpu option created by GCC for -mcpu=cortex-r52 and -mcpu=cortex-r52+nofp.dp is as expected (respectively .fpu neon-fp-armv8 and .fpu fpv5-sp-d16). Is this ok for trunk? 
Best regards, Thomas diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index e2ff297aed7514073dbb3bf5ee86964f202e5a14..d009a9e18acb093aefe0f9d8d6de49489fc2325c 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -381,7 +381,7 @@ begin arch armv8-m.main end arch armv8-m.main begin arch armv8-r - tune for cortex-r4 + tune for cortex-r52 tune flags CO_PROC base 8R profile R @@ -1315,6 +1315,16 @@ begin cpu cortex-m33 costs v7m end cpu cortex-m33 +# V8 R-profile implementations. +begin cpu cortex-r52 + cname cortexr52 + tune flags LDSCHED + architecture armv8-r+crc+simd + fpu neon-fp-armv8 + option nofp.dp remove FP_DBL ALL_SIMD + costs cortex +end cpu cortex-r52 + # FPU entries # format: # begin fpu diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 51678c2566e841894c5c0e9c613c8c0f832e9988..4e508b1555a77628ff6e7cfea39c98b87caa840a 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -357,6 +357,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23) EnumValue Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33) +EnumValue +Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52) + Enum Name(arm_arch) Type(int) Known ARM architectures (for use with the -march= option): diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index ba2c7d8ecfdbf6966ebf04b680d587a0e057b161..1b3f7a94cc78fac8abf1042ef60c81a74eaf24eb 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -57,5 +57,6 @@ cortexa73,exynosm1,xgene1, cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35, cortexa73cortexa53,cortexa55,cortexa75, - cortexa75cortexa55,cortexm23,cortexm33" + cortexa75cortexa55,cortexm23,cortexm33, + cortexr52" (const (symbol_ref "((enum attr_tune) arm_tune)"))) diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index 
16171d4e801af46ad549314d1f376e90d5bff57c..5c29b94caaba4ff6f89a191f1d8edcf10431c0b3 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -58,6 +58,7 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xc15", "armv7-r", "cortex-r5"}, {"0xc17", "armv7-r", "cortex-r7"}, {"0xc18", "armv7-r", "cortex-r8"}, +{"0xd13", "armv8-r+crc", "cortex-r52"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"}, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e60edcae53ef3c995054b9b0229b5f0fccbb8462..a093b9bcf77b1f4b40992516e853826bb7d528d4 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15538,7 +15538,7 @@ Permissible names are: @samp{arm2}, @samp{arm250}, @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55}, @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75}, @samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5}, @samp{cortex-r7}, -@samp{cortex-r8}, +@samp{cortex-r8}, @samp{cortex-r52}, @samp{cortex-m33}, @samp{cortex-m23}, @samp{cortex-m7}, @@ -15628,7 +15628,7 @@ Disables the floating-point and SIMD instructions on @item +nofp.dp
[arm-embedded] [PATCH, GCC/ARM] Rewire -mfpu=fp-armv8 as VFPv5 + D32 + DP
Hi, We have decided to apply the following patch to the embedded-7-branch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-14 Thomas Preud'homme * config/arm/arm-isa.h (isa_bit_FP_ARMv8): Delete enumerator. (ISA_FP_ARMv8): Define as ISA_FPv5 and ISA_FP_D32. * config/arm/arm-cpus.in (armv8-r): Define fp.sp as enabling FPv5. (fp-armv8): Define it as FP_ARMv8 only. config/arm/arm.h (TARGET_FPU_ARMV8): Delete. (TARGET_VFP_FP16INST): Define using TARGET_VFP5 rather than TARGET_FPU_ARMV8. config/arm/arm.c (arm_rtx_costs_internal): Replace checks against TARGET_FPU_ARMV8 by checks against TARGET_VFP5. * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Define first ARM_CHECK_BUILTIN_MODE definition using TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/arm-c.c (arm_cpu_builtins): Likewise for __ARM_FEATURE_NUMERIC_MAXMIN macro definition. * config/arm/arm.md (cmov): Condition on TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/neon.md (neon_vrint): Likewise. (neon_vcvt): Likewise. (neon_): Likewise. (3): Likewise. * config/arm/vfp.md (lsi2): Likewise. * config/arm/predicates.md (arm_cond_move_operator): Check against TARGET_VFP5 rather than TARGET_FPU_ARMV8 and fix spacing. Best regards, Thomas --- Begin Message --- Hi, fp-armv8 is currently defined as a double precision FPv5 with 32 D registers *and* a special FP_ARMv8 bit. However FP for ARMv8 should only bring 32 D registers on top of FPv5-D16 so this FP_ARMv8 bit is spurious. As a consequence, many instruction patterns which are guarded by TARGET_FPU_ARMV8 are unavailable to FPv5-D16 and FPv5-SP-D16. This patch gets rid of TARGET_FPU_ARMV8 and rewire all uses to expressions based on TARGET_VFP5, TARGET_VFPD32 and TARGET_VFP_DOUBLE. It also redefine ISA_FP_ARMv8 to include the D32 capability to distinguish it from FPv5-D16. 
At last, it sets the +fp.sp for ARMv8-R to enable FPv5-SP-D16 (ie FP for ARMv8 with single precision only and 16 D registers). ChangeLog entry is as follows: 2017-07-07 Thomas Preud'homme * config/arm/arm-isa.h (isa_bit_FP_ARMv8): Delete enumerator. (ISA_FP_ARMv8): Define as ISA_FPv5 and ISA_FP_D32. * config/arm/arm-cpus.in (armv8-r): Define fp.sp as enabling FPv5. (fp-armv8): Define it as FP_ARMv8 only. config/arm/arm.h (TARGET_FPU_ARMV8): Delete. (TARGET_VFP_FP16INST): Define using TARGET_VFP5 rather than TARGET_FPU_ARMV8. config/arm/arm.c (arm_rtx_costs_internal): Replace checks against TARGET_FPU_ARMV8 by checks against TARGET_VFP5. * config/arm/arm-builtins.c (arm_builtin_vectorized_function): Define first ARM_CHECK_BUILTIN_MODE definition using TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/arm-c.c (arm_cpu_builtins): Likewise for __ARM_FEATURE_NUMERIC_MAXMIN macro definition. * config/arm/arm.md (cmov): Condition on TARGET_VFP5 rather than TARGET_FPU_ARMV8. * config/arm/neon.md (neon_vrint): Likewise. (neon_vcvt): Likewise. (neon_): Likewise. (3): Likewise. * config/arm/vfp.md (lsi2): Likewise. * config/arm/predicates.md (arm_cond_move_operator): Check against TARGET_VFP5 rather than TARGET_FPU_ARMV8 and fix spacing. Testing: * Bootstrapped under ARMv8-A Thumb state and ran testsuite -> no regression * built Spec2000 and Spec2006 with -march=armv8-a+fp16 and compared objdump -> no code generation difference Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 63ee880822c17eda55dd58438d61cbbba333b2c6..7504ed581c63a657a0dff48442633704bd252b2e 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -3098,7 +3098,7 @@ arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in) NULL_TREE is returned if no such builtin is available. 
*/ #undef ARM_CHECK_BUILTIN_MODE #define ARM_CHECK_BUILTIN_MODE(C)\ - (TARGET_FPU_ARMV8 \ + (TARGET_VFP5 \ && flag_unsafe_math_optimizations \ && ARM_CHECK_BUILTIN_MODE_1 (C)) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index a3daa3220a2bc4220dffdb7ca08ca9419bdac425..9178937b6d9e0fe5d0948701390c4cf01f4f8c7d 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -96,7 +96,7 @@ arm_cpu_builtins (struct cpp_reader* pfile) || TARGET_ARM_ARCH_ISA_THUMB >=2)); def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", - TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_FPU_ARMV8); + TARGET_ARM_ARCH >= 8 && TARGET_NEON && TARGET_VFP5); def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in inde
[arm-embedded] [PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture
Hi, We have decided to apply the following patch to the embedded-7-branch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r): Add new entry. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * doc/invoke.texi: Mention -march=armv8-r and its extensions. *** gcc/testsuite/ChangeLog *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-06 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. --- Begin Message --- Please find an updated patch in attachment. ChangeLog entry are now as follows: *** gcc/ChangeLog *** 2017-07-06 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r): Add new entry. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * doc/invoke.texi: Mention -march=armv8-r and its extensions. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targetting ARMv8-R. Is this ok for stage1? Best regards, Thomas Best regards, Thomas On 29/06/17 16:13, Thomas Preudhomme wrote: Please ignore this patch. I'll respin the patch on a more recent GCC. 
Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, This patch adds support for the ARMv8-R architecture [1], which was recently announced. User-level instructions for ARMv8-R are the same as those in ARMv8-A AArch32 mode, so this patch defines ARMv8-R to have the same features as ARMv8-A in the ARM backend. [1] https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r, armv8-r+rcr): Add new entry. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and ARMv8-R with CRC extensions. * doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc options. Document meaning of -march=armv8-r+rcr. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Defined __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targeting ARMv8-R. Is this ok for stage1? 
Best regards, Thomas diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 946d543ebb29416da9b4928161607cccacaa78a7..f35128acb7d68c6a0592355b9d3d56ee8f826aca 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -380,6 +380,22 @@ begin arch armv8-m.main option nodsp remove bit_ARMv7em end arch armv8-m.main +begin arch armv8-r + tune for cortex-r4 + tune flags CO_PROC + base 8R + profile R + isa ARMv8r + option crc add bit_crc32 +# fp.sp => fp-armv8 (d16); simd => simd + fp-armv8 + d32 + double precision +# note: no fp option for fp-armv8 (d16) + double precision at the moment + option fp.sp add FP_ARMv8 + option simd add FP_ARMv8 NEON + option crypto add FP_ARMv8 CRYPTO + option nocrypto remove ALL_CRYPTO + option nofp remove ALL_FP +end arch armv8-r + begin arch iwmmxt tune for iwmmxt tune flags LDSCHED STRONG XSCALE diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h index c0c2ccee330f2313951e980c5d399ae5d21005d6..0d66a0400c517668db023fc66ff43e26d43add51 100644 --- a/gcc/config/arm/arm-isa.h +++ b/gcc/config/arm/arm-isa.h @@ -127,6 +127,7 @@ enum isa_feature #define IS
[arm-embedded] [PATCH 1/3, GCC/ARM, ping] Add MIDR info for ARM Cortex-R7 and Cortex-R8
Hi, We have decided to apply the following patch to the embedded-7-branch as a dependency patch to enable ARMv8-R support. ChangeLog entry is as follows: *** gcc/ChangeLog.arm *** 2017-09-04 Thomas Preud'homme Backport from mainline 2017-07-04 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Best regards, Thomas --- Begin Message --- Ping? Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, The driver is missing MIDR information for processors ARM Cortex-R7 and Cortex-R8 to support -march/-mcpu/-mtune=native on the command line. This patch adds the missing information. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xd09", "armv8-a+crc", "cortex-a73"}, {"0xc14", "armv7-r", "cortex-r4"}, {"0xc15", "armv7-r", "cortex-r5"}, +{"0xc17", "armv7-r", "cortex-r7"}, +{"0xc18", "armv7-r", "cortex-r8"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"}, --- End Message ---
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 04:37:18PM +0200, Jakub Jelinek wrote: > Ok. Please make sure those entrypoints make it into the various example > __sanitizer_cov_trace* fuzzer implementations though, so that people using > -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves. > At least it should be added to sanitizer_common (both in LLVM and GCC). Forgot to say that I've committed the patch to GCC trunk today. Jakub
Re: Add support to trace comparison instructions and switch statements
On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote: > Hi Jakub > I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall > -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address". > And run my fuzzer with pc and cmp feedbacks for hours. It works fine. > About __sanitizer_cov_trace_cmp{f,d} , yes, it isn't provided by llvm. But > once we trace integer comparisons, why not real type comparisons. > I remember Dmitry said it is not enough useful to trace real type comparisons > because it is rare to see them in programs. > But libdng_sdk really has real type comparisons. So I want to keep them and > implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary. Ok. Please make sure those entrypoints make it into the various example __sanitizer_cov_trace* fuzzer implementations though, so that people using -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves. At least it should be added to sanitizer_common (both in LLVM and GCC). BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep using some weirdo LLVM IL acronym instead of being named by what it really traces (trace-array-idx or something similar)). Any plans to implement some or all of those? Jakub
[PATCH 2/2] [arm] Improve error checking in parsecpu.awk
This patch adds a bit more error checking to parsecpu.awk to ensure that statements are not missing arguments or have excess arguments beyond those permitted. It also slightly improves the handling of errors so that we terminate properly if parsing fails and be as helpful as we can while in the parsing phase. * config/arm/parsecpu.awk (fatal): Note that we've encountered an error. Only quit immediately if parsing is complete. (BEGIN): Initialize fatal_err and parse_done. (begin fpu, end fpu): Check number of arguments. (begin arch, end arch): Likewise. (begin cpu, end cpu): Likewise. (cname, tune for, tune flags, architecture, fpu, option): Likewise. (optalias): Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251800 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 11 +++ gcc/config/arm/parsecpu.awk | 26 +- 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index cab5166..69713c1 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,16 @@ 2017-09-06 Richard Earnshaw + * config/arm/parsecpu.awk (fatal): Note that we've encountered an + error. Only quit immediately if parsing is complete. + (BEGIN): Initialize fatal_err and parse_done. + (begin fpu, end fpu): Check number of arguments. + (begin arch, end arch): Likewise. + (begin cpu, end cpu): Likewise. + (cname, tune for, tune flags, architecture, fpu, option): Likewise. + (optalias): Likewise. + +2017-09-06 Richard Earnshaw + * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. 
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk index d07d3fc..0b4fc68 100644 --- a/gcc/config/arm/parsecpu.awk +++ b/gcc/config/arm/parsecpu.awk @@ -32,7 +32,8 @@ function fatal (m) { print "error ("lineno"): " m > "/dev/stderr" -exit 1 +fatal_err = 1 +if (parse_done) exit 1 } function toplevel () { @@ -502,14 +503,18 @@ BEGIN { arch_name = "" fpu_name = "" lineno = 0 +fatal_err = 0 +parse_done = 0 if (cmd == "") fatal("Usage parsecpu.awk -v cmd=") } +# New line. Reset parse status and increment line count for error messages // { lineno++ parse_ok = 0 } +# Comments must be on a line on their own. /^#/ { parse_ok = 1 } @@ -552,12 +557,14 @@ BEGIN { } /^begin fpu / { +if (NF != 3) fatal("syntax: begin fpu ") toplevel() fpu_name = $3 parse_ok = 1 } /^end fpu / { +if (NF != 3) fatal("syntax: end fpu ") if (fpu_name != $3) fatal("mimatched end fpu") if (! (fpu_name in fpu_isa)) { fatal("fpu definition \"" fpu_name "\" lacks an \"isa\" statement") @@ -570,24 +577,28 @@ BEGIN { } /^begin arch / { +if (NF != 3) fatal("syntax: begin arch ") toplevel() arch_name = $3 parse_ok = 1 } /^[ ]*base / { +if (NF != 2) fatal("syntax: base ") if (arch_name == "") fatal("\"base\" statement outside of arch block") arch_base[arch_name] = $2 parse_ok = 1 } /^[ ]*profile / { +if (NF != 2) fatal("syntax: profile ") if (arch_name == "") fatal("\"profile\" statement outside of arch block") arch_prof[arch_name] = $2 parse_ok = 1 } /^end arch / { +if (NF != 3) fatal("syntax: end arch ") if (arch_name != $3) fatal("mimatched end arch") if (! 
arch_name in arch_tune_for) { fatal("arch definition lacks a \"tune for\" statement") @@ -603,18 +614,21 @@ BEGIN { } /^begin cpu / { +if (NF != 3) fatal("syntax: begin cpu ") toplevel() cpu_name = $3 parse_ok = 1 } /^[ ]*cname / { +if (NF != 2) fatal("syntax: cname ") if (cpu_name == "") fatal("\"cname\" outside of cpu block") cpu_cnames[cpu_name] = $2 parse_ok = 1 } /^[ ]*tune for / { +if (NF != 3) fatal("syntax: tune for ") if (cpu_name != "") { cpu_tune_for[cpu_name] = $3 } else if (arch_name != "") { @@ -624,6 +638,7 @@ BEGIN { } /^[ ]*tune flags / { +if (NF < 3) fatal("syntax: tune flags []*") flags="" flag_count = NF for (n = 3; n <= flag_count; n++) { @@ -640,18 +655,21 @@ BEGIN { } /^[ ]*architecture / { +if (NF != 2) fatal("syntax: architecture ") if (cpu_name == "") fatal("\"architecture\" outside of cpu block") cpu_arch[cpu_name] = $2 parse_ok = 1 } /^[ ]*fpu / { +if (NF != 2) fatal("syntax: fpu ") if (cpu_name == "") fatal("\"fpu\" outside of cpu block") cpu_fpu[cpu_name] = $2 parse_ok = 1 } /^[ ]*isa / { +if (NF < 2) fatal("syntax: isa []*") flags="" flag_count = NF for (n = 2; n <= flag_count; n++) { @@ -670,6 +688,7 @@ BEGIN { } /^[ ]*option / { +if (NF < 4) fatal("syntax: option add|remove +") name=$2 if ($3 == "add") { remove
[arm] auto-generate arm-isa.h from CPU descriptions
This patch autogenerates arm-isa.h from new entries in arm-cpus.in. This has the primary advantage that it makes the description file more self-contained, but it also solves the 'array dimensioning' problem that Tamar recently encountered. It adds two new constructs to arm-cpus.in: features and fgroups. Fgroups are simply a way of naming a group of feature bits so that they can be referenced together. We follow the convention that feature bits are all lower case, while fgroups are (predominantly) upper case. This is helpful as in some contexts they share the same namespace. Most of the minor changes in this patch are related to adopting this new naming convention. * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. * config/arm/arm-isa.h: Delete. Move definitions to ... * arm-cpus.in: ... here. Use new feature and fgroup values. * config/arm/arm.c (arm_option_override): Use lower case for feature bit names. * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. * config/arm/parsecpu.awk (END): Add new command 'isa'. (isa_pfx): Delete. (print_isa_bits_for): New function. (gen_isa): New function. (gen_comm_data): Use print_isa_bits_for. (define feature): New keyword. (define fgroup): New keyword. * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h (arm-isa.h): Add rule to generate file. * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower case for feature bit names. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251799 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 21 +++ gcc/common/config/arm/arm-common.c | 10 +- gcc/config.gcc | 2 +- gcc/config/arm/arm-cpus.in | 262 + gcc/config/arm/arm-isa.h | 172 gcc/config/arm/arm.c | 32 ++--- gcc/config/arm/arm.h | 8 +- gcc/config/arm/parsecpu.awk| 187 ++ gcc/config/arm/t-arm | 9 ++ 9 files changed, 418 insertions(+), 285 deletions(-) delete mode 100644 gcc/config/arm/arm-isa.h diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 5df398c..cab5166 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,24 @@ +2017-09-06 Richard Earnshaw + + * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file. + * config/arm/arm-isa.h: Delete. Move definitions to ... + * arm-cpus.in: ... here. Use new feature and fgroup values. + * config/arm/arm.c (arm_option_override): Use lower case for feature + bit names. + * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise. + (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise. + * config/arm/parsecpu.awk (END): Add new command 'isa'. + (isa_pfx): Delete. + (print_isa_bits_for): New function. + (gen_isa): New function. + (gen_comm_data): Use print_isa_bits_for. + (define feature): New keyword. + (define fgroup): New keyword. + * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h + (arm-isa.h): Add rule to generate file. + * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower + case for feature bit names. + 2017-09-06 Richard Biener * tree-ssa-pre.c (NECESSARY): Remove. diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index 38bd3a7..7cb99ec 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* The easiest and safest way to remove the default fpu capabilities is to look for a '+no..' option that removes - the base FPU bit (isa_bit_VFPv2). If that doesn't exist + the base FPU bit (isa_bit_vfpv2). 
If that doesn't exist then the best we can do is strip out all the bits that might be part of the most capable FPU we know about, which is "crypto-neon-fp-armv8". */ @@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv) ++ext) { if (ext->remove - && check_isa_bits_for (ext->isa_bits, isa_bit_VFPv2)) + && check_isa_bits_for (ext->isa_bits, isa_bit_vfpv2)) { arm_initialize_isa (fpu_isa, ext->isa_bits); bitmap_and_compl (target_isa, target_isa, fpu_isa); @@ -620,7 +620,7 @@ arm_canon_arch_option (int argc, const char **argv) { /* Clearing the VFPv2 bit is sufficient to stop any extention that builds on the FPU from matching. */ - bitmap_clear_bit (target_isa, isa_bit_VFPv2); + bitmap_clear_bit (target_isa, isa_bit_vfpv2); } /* If we don't have a selected architecture by now, something's @@ -692,8 +692,8 @@ arm_canon_arch_option (int argc, const char **argv) capable FPU variant that we do support. This is sufficient for multilib selection. */ - if (bitmap_bit_p (target_isa_unsatisfie
[C++ PATCH] method vec
This preparatory patch fixes up a couple of places where a non-function could start appearing in the METHOD_VEC. The warn_hidden change looks bigger than necessary, because of indentation change. I noticed check_classfn could check a template mismatch earlier, and avoid doing some work. applied to trunk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * class.c (warn_hidden): Don't barf on non-functions. * decl2.c (check_classfn): Likewise. Check template match earlier. Index: class.c === --- class.c (revision 251782) +++ class.c (working copy) @@ -2818,63 +2818,64 @@ check_for_override (tree decl, tree ctyp static void warn_hidden (tree t) { - vec *method_vec = CLASSTYPE_METHOD_VEC (t); - tree fns; - - /* We go through each separately named virtual function. */ - for (int i = 0; vec_safe_iterate (method_vec, i, &fns); ++i) -{ - tree name = OVL_NAME (fns); - auto_vec base_fndecls; - tree base_binfo; - tree binfo; - int j; + if (vec *method_vec = CLASSTYPE_METHOD_VEC (t)) +for (unsigned ix = method_vec->length (); ix--;) + { + tree fns = (*method_vec)[ix]; - /* Iterate through all of the base classes looking for possibly - hidden functions. */ - for (binfo = TYPE_BINFO (t), j = 0; - BINFO_BASE_ITERATE (binfo, j, base_binfo); j++) - { - tree basetype = BINFO_TYPE (base_binfo); - get_basefndecls (name, basetype, &base_fndecls); - } + if (!OVL_P (fns)) + continue; - /* If there are no functions to hide, continue. */ - if (base_fndecls.is_empty ()) - continue; + tree name = OVL_NAME (fns); + auto_vec base_fndecls; + tree base_binfo; + tree binfo; + unsigned j; + + /* Iterate through all of the base classes looking for possibly + hidden functions. */ + for (binfo = TYPE_BINFO (t), j = 0; + BINFO_BASE_ITERATE (binfo, j, base_binfo); j++) + { + tree basetype = BINFO_TYPE (base_binfo); + get_basefndecls (name, basetype, &base_fndecls); + } - /* Remove any overridden functions. 
*/ - for (ovl_iterator iter (fns); iter; ++iter) - { - tree fndecl = *iter; - if (TREE_CODE (fndecl) == FUNCTION_DECL - && DECL_VINDEX (fndecl)) - { - /* If the method from the base class has the same - signature as the method from the derived class, it - has been overridden. */ - for (size_t k = 0; k < base_fndecls.length (); k++) - if (base_fndecls[k] - && same_signature_p (fndecl, base_fndecls[k])) - base_fndecls[k] = NULL_TREE; - } - } + /* If there are no functions to hide, continue. */ + if (base_fndecls.is_empty ()) + continue; - /* Now give a warning for all base functions without overriders, - as they are hidden. */ - size_t k; - tree base_fndecl; - FOR_EACH_VEC_ELT (base_fndecls, k, base_fndecl) - if (base_fndecl) + /* Remove any overridden functions. */ + for (ovl_iterator iter (fns); iter; ++iter) { - /* Here we know it is a hider, and no overrider exists. */ - warning_at (location_of (base_fndecl), - OPT_Woverloaded_virtual, - "%qD was hidden", base_fndecl); - warning_at (location_of (fns), - OPT_Woverloaded_virtual, " by %qD", fns); + tree fndecl = *iter; + if (TREE_CODE (fndecl) == FUNCTION_DECL + && DECL_VINDEX (fndecl)) + { + /* If the method from the base class has the same + signature as the method from the derived class, it + has been overridden. */ + for (size_t k = 0; k < base_fndecls.length (); k++) + if (base_fndecls[k] + && same_signature_p (fndecl, base_fndecls[k])) + base_fndecls[k] = NULL_TREE; + } } -} + + /* Now give a warning for all base functions without overriders, + as they are hidden. */ + tree base_fndecl; + FOR_EACH_VEC_ELT (base_fndecls, j, base_fndecl) + if (base_fndecl) + { + /* Here we know it is a hider, and no overrider exists. */ + warning_at (location_of (base_fndecl), + OPT_Woverloaded_virtual, + "%qD was hidden", base_fndecl); + warning_at (location_of (fns), + OPT_Woverloaded_virtual, " by %qD", fns); + } + } } /* Recursive helper for finish_struct_anon. 
*/ @@ -6981,7 +6982,7 @@ unreverse_member_declarations (tree t) /* For the TYPE_FIELDS, only the non TYPE_DECLs are in reverse order, so we can't just use nreverse. Due to stat_hack - chicanery in finish_member_declarations. */ + chicanery in finish_member_declaration. */ prev = NULL_TREE; for (x = TYPE_FIELDS (t); x && TREE_CODE (x) != TYPE_DECL; Index: decl2.c === --- decl2.c (revision 251782) +++ decl2.c (working copy) @@ -611,6 +611,15 @@ check_classfn (tree ctype, tree function for (ovl_iterator iter (fns); !matched && iter; ++iter) { tree fndecl = *iter; + + /* A member template definition only matches a member template
[Ada] Wrong code on assignment of conditional expression to a mutable object
This patch fixes an error in an assignment statement to an entity of a mutable type (variable or in-out parameter) when the right-hand side of the assignment is a conditional expression, some of whose alternatives are aggregates. Prior to this patch, not all components of the mutable object were properly assigned the corresponding values of the aggregate. Executing: gnatmake -q bug ./bug must yield: local var 72 local var 42 in_out parameter 72 in_out parameter 42 --- with Ada.Text_IO; procedure Bug is type Yoyo (Exists : Boolean := False) is record case Exists is when False => null; when True => Value : Integer := 5; end case; end record; Var1 : Yoyo; Var2 : Yoyo; procedure Test (Condition : in Boolean; Value : in Integer; Yo: in out Yoyo) is Var3 : Yoyo; begin Yo := (if Condition then (Exists => True, Value => Value) else (Exists => False)); Var3 := (case condition is when True => (Exists => True, Value => Value), when False => (Exists => False)); if Condition and then Yo.Value /= Value then Ada.Text_IO.Put_Line ("Compiler bug exposed"); end if; if Condition then Ada.Text_IO.Put_Line ("local var " & Integer'Image (Var3.Value)); end if; end; begin Test (True, 72, Var1); Test (True, 42, Var2); Ada.Text_IO.Put_Line ("in_out parameter " & Var1.Value'Img); Ada.Text_IO.Put_Line ("in_out parameter " & Var2.Value'Img); Test (False, 1000, Var1); end Bug; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch5.adb (Analyze_Assignment): If the left-hand side is an entity of a mutable type and the right-hand side is a conditional expression, resolve the alternatives of the conditional using the base type of the target entity, because the alternatives may have distinct subtypes. This is particularly relevant if the alternatives are aggregates. 
Index: sem_ch5.adb === --- sem_ch5.adb (revision 251789) +++ sem_ch5.adb (working copy) @@ -580,8 +580,27 @@ Set_Assignment_Type (Lhs, T1); - Resolve (Rhs, T1); + -- If the target of the assignment is an entity of a mutable type + -- and the expression is a conditional expression, its alternatives + -- can be of different subtypes of the nominal type of the LHS, so + -- they must be resolved with the base type, given that their subtype + -- may differ frok that of the target mutable object. + if Is_Entity_Name (Lhs) +and then Ekind_In (Entity (Lhs), + E_Variable, + E_Out_Parameter, + E_In_Out_Parameter) +and then Is_Composite_Type (T1) +and then not Is_Constrained (Etype (Entity (Lhs))) +and then Nkind_In (Rhs, N_If_Expression, N_Case_Expression) + then + Resolve (Rhs, Base_Type (T1)); + + else + Resolve (Rhs, T1); + end if; + -- This is the point at which we check for an unset reference Check_Unset_Reference (Rhs);
[C++ PATCH] class FIELD_VEC initialization
Here's some cleanup of the SORTED_FIELDS vector initialization. Some function renaming, to be more specific. The functionality change fixes a minor bug with late enums. We only add them to the field vec if there's already a field vec. But of course, their addition could have caused the class's TYPE_FIELDS length to cross the threshold for wanting a field vector. Applied to trunk. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.c (count_fields): Rename to ... (count_class_fields): ... here. Take a class, don't count NULL-named fields. (add_fields_to_record_type): Rename to ... (field_vec_append_class_fields): ... here. Take a class, don't add NULL-named fields. (add_enum_fields_to_record_type): Rename to ... (field_vec_append_enum_values): ... here. (set_class_bindings): Adjust, assert we added expected number. (insert_late_enum_def_bindings): Reimplement. Create vector if there are now sufficient entries. Index: name-lookup.c === --- name-lookup.c (revision 251782) +++ name-lookup.c (working copy) @@ -1452,59 +1452,57 @@ sorted_fields_type_new (int n) return sft; } -/* Subroutine of insert_into_classtype_sorted_fields. Recursively - count the number of fields in TYPE, including anonymous union - members. */ +/* Recursively count the number of fields in KLASS, including anonymous + union members. */ -static int -count_fields (tree fields) +static unsigned +count_class_fields (tree klass) { - tree x; - int n_fields = 0; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are dealt with separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - n_fields += count_fields (TYPE_FIELDS (TREE_TYPE (x))); - else - n_fields += 1; -} + unsigned n_fields = 0; + + for (tree fields = TYPE_FIELDS (klass); fields; fields = DECL_CHAIN (fields)) +if (DECL_DECLARES_FUNCTION_P (fields)) + /* Functions are dealt with separately. 
*/; +else if (TREE_CODE (fields) == FIELD_DECL + && ANON_AGGR_TYPE_P (TREE_TYPE (fields))) + n_fields += count_class_fields (TREE_TYPE (fields)); +else if (DECL_NAME (fields)) + n_fields += 1; + return n_fields; } -/* Subroutine of insert_into_classtype_sorted_fields. Recursively add - all the fields in the TREE_LIST FIELDS to the SORTED_FIELDS_TYPE - elts, starting at offset IDX. */ - -static int -add_fields_to_record_type (tree fields, struct sorted_fields_type *field_vec, - int idx) +/* Append all the nonfunction members fields of KLASS to FIELD_VEC + starting at IDX. Recurse for anonymous members. The array must + have space. Returns the next available index. */ + +static unsigned +field_vec_append_class_fields (struct sorted_fields_type *field_vec, + tree klass, unsigned idx) { - tree x; - for (x = fields; x; x = DECL_CHAIN (x)) -{ - if (DECL_DECLARES_FUNCTION_P (x)) - /* Functions are handled separately. */; - else if (TREE_CODE (x) == FIELD_DECL && ANON_AGGR_TYPE_P (TREE_TYPE (x))) - idx = add_fields_to_record_type (TYPE_FIELDS (TREE_TYPE (x)), field_vec, idx); - else - field_vec->elts[idx++] = x; -} + for (tree fields = TYPE_FIELDS (klass); fields; fields = DECL_CHAIN (fields)) +if (DECL_DECLARES_FUNCTION_P (fields)) + /* Functions are handled separately. */; +else if (TREE_CODE (fields) == FIELD_DECL + && ANON_AGGR_TYPE_P (TREE_TYPE (fields))) + idx = field_vec_append_class_fields (field_vec, TREE_TYPE (fields), idx); +else if (DECL_NAME (fields)) + field_vec->elts[idx++] = fields; + return idx; } -/* Add all of the enum values of ENUMTYPE, to the FIELD_VEC elts, - starting at offset IDX. */ +/* Append all of the enum values of ENUMTYPE to FIELD_VEC starting at IDX. + FIELD_VEC must have space. 
*/ -static int -add_enum_fields_to_record_type (tree enumtype, -struct sorted_fields_type *field_vec, -int idx) +static unsigned +field_vec_append_enum_values (struct sorted_fields_type *field_vec, + tree enumtype, unsigned idx) { - tree values; - for (values = TYPE_VALUES (enumtype); values; values = TREE_CHAIN (values)) + for (tree values = TYPE_VALUES (enumtype); + values; values = TREE_CHAIN (values)) field_vec->elts[idx++] = TREE_VALUE (values); + return idx; } @@ -1518,12 +1516,12 @@ set_class_bindings (tree klass) qsort (method_vec->address (), method_vec->length (), sizeof (tree), method_name_cmp); - tree fields = TYPE_FIELDS (klass); - int n_fields = count_fields (fields); + int n_fields = count_class_fields (klass); if (n_fields >= 8) { struct sorted_fields_type *field_vec = sorted_fields_type_new (n_fields); - add_fields_to_record_type (fields, field_vec, 0); + unsigned idx = field_vec_append_class_fields (field_vec, klass, 0); + gcc_assert (idx =
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/06/17 14:51, Richard Earnshaw (lists) wrote: > On 06/09/17 13:44, Bernd Edlinger wrote: >> On 09/04/17 21:54, Bernd Edlinger wrote: >>> Hi Kyrill, >>> >>> Thanks for your review! >>> >>> >>> On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: > On 01/13/17 19:28, Bernd Edlinger wrote: >> On 01/13/17 17:10, Bernd Edlinger wrote: >>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: > Hi, > > this is related to PR77308, the follow-up patch will depend on this > one. > > When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned > before reload, a mis-compilation in libgcc function > __gnu_satfractdasq > was discovered, see [1] for more details. > > The reason seems to be that when the *arm_cmpdi_insn is directly > followed by a *arm_cmpdi_unsigned instruction, both are split > up into this: > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 0) (match_dup 1))) > (parallel [(set (reg:CC CC_REGNUM) > (compare:CC (match_dup 3) (match_dup 4))) > (set (match_dup 2) > (minus:SI (match_dup 5) >(ltu:SI (reg:CC_C CC_REGNUM) > (const_int > 0])] > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 2) (match_dup 3))) > (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) > (set (reg:CC CC_REGNUM) > (compare:CC (match_dup 0) (match_dup 1] > > The problem is that the reg:CC from the *subsi3_carryin_compare > is not mentioning that the reg:CC is also dependent on the reg:CC > from before. Therefore the *arm_cmpsi_insn appears to be > redundant and thus got removed, because the data values are > identical. > > I think that applies to a number of similar pattern where data > flow is happening through the CC reg. > > So this is a kind of correctness issue, and should be fixed > independently from the optimization issue PR77308. > > Therefore I think the patterns need to specify the true > value that will be in the CC reg, in order for cse to > know what the instructions are really doing. 
> > > Bootstrapped and reg-tested on arm-linux-gnueabihf. > Is it OK for trunk? > I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). >>> Yes, this is still a bit awkward... >>> >>> The N and V bit will be the correct result for the subdi3_compare1 >>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI
[PATCH, e500v2-vxworks] correct CPU name designation for 8548 targets on VxWorks7
Compared to prior versions of regular VxWorks (not AE/653), the VxWorks 7 header files expect the e500v2 family of CPUs to be designated in a slightly different fashion. With this on top of previously posted patches, a build for e500v2-wrs-vxworks proceeds to completion. Committing to mainline. Olivier 2017-09-06 Olivier Hainque * config/powerpcspe/vxworks.h (VXCPU_FOR_8548): Correct definition for VxWorks 7. Adjust surrounding comments. vx7-cpu-8548.diff
[PATCH] Replace PRE "DCE"
The following replaces the weird PRE "DCE" algorithm by a simple work-list based one seeded by inserted_exprs. This makes it possible to get rid of the error-prone marking of stmts necessary and allows re-ordering of elimination dead stmt removal and DCE again (I'm in the process of developing a RPO based VN and want to keep elimination common but move it out of PRE). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2017-09-06 Richard Biener * tree-ssa-pre.c (NECESSARY): Remove. (create_expression_by_pieces): Do not touch pass-local flags. (insert_into_preds_of_block): Likewise. (do_pre_regular_insertion): Likewise. (eliminate_insert): Likewise. (eliminate_dom_walker::before_dom_children): Likewise. (fini_eliminate): Do not look at inserted_exprs. (mark_operand_necessary): Remove. (remove_dead_inserted_code): Replace with simple work-list algorithm based on inserted_exprs and SSA uses. (pass_pre::execute): Re-order fini_eliminate and remove_dead_inserted_code. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 251790) +++ gcc/tree-ssa-pre.c (working copy) @@ -2753,8 +2753,6 @@ find_or_generate_expression (basic_block return NULL_TREE; } -#define NECESSARY GF_PLF_1 - /* Create an expression in pieces, so that we can handle very complex expressions that may be ANTIC, but not necessary GIMPLE. 
BLOCK is the basic block the expression will be inserted into, @@ -2972,7 +2970,6 @@ create_expression_by_pieces (basic_block } bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (forcedname)); - gimple_set_plf (stmt, NECESSARY, false); } gimple_seq_add_seq (stmts, forced_stmts); } @@ -3095,7 +3092,6 @@ insert_into_preds_of_block (basic_block temp = make_temp_ssa_name (type, NULL, "prephitmp"); phi = create_phi_node (temp, block); - gimple_set_plf (phi, NECESSARY, false); VN_INFO_GET (temp)->value_id = val; VN_INFO (temp)->valnum = sccvn_valnum_from_value_id (val); if (VN_INFO (temp)->valnum == NULL_TREE) @@ -3342,7 +3338,6 @@ do_pre_regular_insertion (basic_block bl gimple_stmt_iterator gsi = gsi_after_labels (block); gsi_insert_before (&gsi, assign, GSI_NEW_STMT); - gimple_set_plf (assign, NECESSARY, false); VN_INFO_GET (temp)->value_id = val; VN_INFO (temp)->valnum = sccvn_valnum_from_value_id (val); if (VN_INFO (temp)->valnum == NULL_TREE) @@ -4204,9 +4199,6 @@ eliminate_insert (gimple_stmt_iterator * { gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); VN_INFO_GET (res)->valnum = val; - - if (TREE_CODE (leader) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (leader), NECESSARY, true); } pre_stats.insertions++; @@ -4291,17 +4283,9 @@ eliminate_dom_walker::before_dom_childre remove_phi_node (&gsi, false); - if (inserted_exprs - && !bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (res)) - && TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), NECESSARY, true); - if (!useless_type_conversion_p (TREE_TYPE (res), TREE_TYPE (sprime))) sprime = fold_convert (TREE_TYPE (res), sprime); gimple *stmt = gimple_build_assign (res, sprime); - /* ??? It cannot yet be necessary (DOM walk). 
*/ - gimple_set_plf (stmt, NECESSARY, gimple_plf (phi, NECESSARY)); - gimple_stmt_iterator gsi2 = gsi_after_labels (b); gsi_insert_before (&gsi2, stmt, GSI_NEW_STMT); continue; @@ -4478,10 +4462,6 @@ eliminate_dom_walker::before_dom_childre print_gimple_stmt (dump_file, stmt, 0); } - if (TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), - NECESSARY, true); - pre_stats.eliminations++; gimple *orig_stmt = stmt; if (!useless_type_conversion_p (TREE_TYPE (lhs), @@ -4615,10 +4595,6 @@ eliminate_dom_walker::before_dom_childre { propagate_value (use_p, sprime); modified = true; - if (TREE_CODE (sprime) == SSA_NAME - && !is_gimple_debug (stmt)) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), - NECESSARY, true); } } @@ -4787,11 +4763,7 @@ eliminate_dom_walker::before_dom_childre continue; tree sprime = eliminate_avail (arg); if (sprime && may_propagate_copy (arg, sprime)) - { - propagate_value (use_p, sprime); - if (TREE_CODE (sprime) == SSA_NAME) - gimple_set_plf (SSA_NAME_DEF_STMT (sprime), NECESSARY, true); -
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/06/17 14:51, Richard Earnshaw (lists) wrote: > On 06/09/17 13:44, Bernd Edlinger wrote: >> On 09/04/17 21:54, Bernd Edlinger wrote: >>> Hi Kyrill, >>> >>> Thanks for your review! >>> >>> >>> On 09/04/17 15:55, Kyrill Tkachov wrote: Hi Bernd, On 18/01/17 15:36, Bernd Edlinger wrote: > On 01/13/17 19:28, Bernd Edlinger wrote: >> On 01/13/17 17:10, Bernd Edlinger wrote: >>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: On 18/12/16 12:58, Bernd Edlinger wrote: > Hi, > > this is related to PR77308, the follow-up patch will depend on this > one. > > When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned > before reload, a mis-compilation in libgcc function > __gnu_satfractdasq > was discovered, see [1] for more details. > > The reason seems to be that when the *arm_cmpdi_insn is directly > followed by a *arm_cmpdi_unsigned instruction, both are split > up into this: > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 0) (match_dup 1))) > (parallel [(set (reg:CC CC_REGNUM) > (compare:CC (match_dup 3) (match_dup 4))) > (set (match_dup 2) > (minus:SI (match_dup 5) >(ltu:SI (reg:CC_C CC_REGNUM) > (const_int > 0])] > > [(set (reg:CC CC_REGNUM) >(compare:CC (match_dup 2) (match_dup 3))) > (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) > (set (reg:CC CC_REGNUM) > (compare:CC (match_dup 0) (match_dup 1] > > The problem is that the reg:CC from the *subsi3_carryin_compare > is not mentioning that the reg:CC is also dependent on the reg:CC > from before. Therefore the *arm_cmpsi_insn appears to be > redundant and thus got removed, because the data values are > identical. > > I think that applies to a number of similar pattern where data > flow is happening through the CC reg. > > So this is a kind of correctness issue, and should be fixed > independently from the optimization issue PR77308. > > Therefore I think the patterns need to specify the true > value that will be in the CC reg, in order for cse to > know what the instructions are really doing. 
> > > Bootstrapped and reg-tested on arm-linux-gnueabihf. > Is it OK for trunk? > I agree you've found a valid problem here, but I have some issues with the patch itself. (define_insn_and_split "subdi3_compare1" [(set (reg:CC_NCV CC_REGNUM) (compare:CC_NCV (match_operand:DI 1 "register_operand" "r") (match_operand:DI 2 "register_operand" "r"))) (set (match_operand:DI 0 "register_operand" "=&r") (minus:DI (match_dup 1) (match_dup 2)))] "TARGET_32BIT" "#" "&& reload_completed" [(parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))]) (parallel [(set (reg:CC_C CC_REGNUM) (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI (zero_extend:DI (match_dup 5)) (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5)) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] This pattern is now no-longer self consistent in that before the split the overall result for the condition register is in mode CC_NCV, but afterwards it is just CC_C. I think CC_NCV is correct mode (the N, C and V bits all correctly reflect the result of the 64-bit comparison), but that then implies that the cc mode of subsi3_carryin_compare is incorrect as well and should in fact also be CC_NCV. Thinking about this pattern, I'm inclined to agree that CC_NCV is the correct mode for this operation I'm not sure if there are other consequences that will fall out from fixing this (it's possible that we might need a change to select_cc_mode as well). >>> Yes, this is still a bit awkward... >>> >>> The N and V bit will be the correct result for the subdi3_compare1 >>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI
[Ada] Spurious warning in formal package when use clause is present.
This patch removes a spurious style warning on an operator declared in a generic package when the package is used as a formal of a generic subprogram, and the subprogram body includes a use clause on that package. The following must compile quietly: gcc -c -gnatyO generic_test.adb --- with Generic_2; procedure Generic_Test is generic with package P_1 is new Generic_2 (<>); procedure S_1_G; procedure S_1_G is use P_1; begin null; end S_1_G; pragma Unreferenced (S_1_G); begin null; end Generic_Test; --- with Dummy; pragma Unreferenced (Dummy); with Generic_1; generic package Generic_2 is package P_1 is new Generic_1 (T_1 => Natural); end Generic_2; --- generic type T_1 is limited private; package Generic_1 is private type T_2 is record X : T_1; end record; function "=" (Left, Right : T_2) return Boolean is (True); end Generic_1; -- package Dummy is generic type T is range <>; package Dummy is function Foo (Of_Image : String) return T renames T'Value; end Dummy; end Dummy; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_aux.adb (Is_Generic_Formal): Handle properly formal packages. * sem_ch3.adb (Analyze_Declarations): In a generic subprogram body, do not freeze the formals of the generic unit. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251789) +++ sem_ch3.adb (working copy) @@ -2649,9 +2649,27 @@ -- in order to perform visibility checks on delayed aspects. Adjust_Decl; - Freeze_All (First_Entity (Current_Scope), Decl); - Freeze_From := Last_Entity (Current_Scope); + -- If the current scope is a generic subprogram body. skip + -- the generic formal parameters that are not frozen here. 
+ + if Is_Subprogram (Current_Scope) + and then Nkind (Unit_Declaration_Node (Current_Scope)) + = N_Generic_Subprogram_Declaration + and then Present (First_Entity (Current_Scope)) + then + while Is_Generic_Formal (Freeze_From) loop + Freeze_From := Next_Entity (Freeze_From); + end loop; + + Freeze_All (Freeze_From, Decl); + Freeze_From := Last_Entity (Current_Scope); + + else + Freeze_All (First_Entity (Current_Scope), Decl); + Freeze_From := Last_Entity (Current_Scope); + end if; + -- Current scope is a package specification elsif Scope (Current_Scope) /= Standard_Standard Index: sem_aux.adb === --- sem_aux.adb (revision 251753) +++ sem_aux.adb (working copy) @@ -1053,9 +1053,13 @@ return Nkind_In (Kind, N_Formal_Object_Declaration, - N_Formal_Package_Declaration, N_Formal_Type_Declaration) - or else Is_Formal_Subprogram (E); + or else Is_Formal_Subprogram (E) + + or else + (Ekind (E) = E_Package + and then Nkind (Original_Node (Unit_Declaration_Node (E))) = +N_Formal_Package_Declaration); end if; end Is_Generic_Formal;
[Ada] Volatile component not treated as such
This patch corrects an issue where attributes applied to records were not propagated to components within the records - causing incorrect code to be generated by the backend. Additionally, this ticket fixes another issue with pragma Volatile_Full_Access that allowed the attribute to be applied to a type with aliased components. -- Source -- -- p.ads with System; use System; package P is type Int8_t is mod 2**8; type Rec is record A,B,C,D : aliased Int8_t; end record; type VFA_Rec is new Rec with Volatile_Full_Access; -- ERROR R : Rec with Volatile_Full_Access; -- ERROR type Arr is array (1 .. 4) of aliased Int8_t; type VFA_Arr is new Arr with Volatile_Full_Access; -- ERROR A : Arr with Volatile_Full_Access; -- ERROR type Priv_VFA_Rec is private with Volatile_Full_Access; -- ERROR type Priv_Ind_Rec is private with Independent; -- ERROR type Priv_Vol_Rec is private with Volatile; -- ERROR type Priv_Atomic_Rec is private with Atomic; -- ERROR type Aliased_Rec is tagged record X : aliased Integer; end record with Volatile_Full_Access; -- OK type Atomic_And_VFA_Int is new Integer with Atomic, Volatile_Full_Access; -- ERROR type Atomic_And_VFA_Rec is record X : Integer with Atomic; end record with Volatile_Full_Access; -- ERROR type Atomic_T is tagged record X : Integer with Atomic; -- OK end record; type Atomic_And_VFA_T is new Atomic_T with record Y : Integer; end record with Volatile_Full_Access; -- ERROR type Aliased_And_VFA_T is new Aliased_Rec with record Y : Integer; end record with Volatile_Full_Access; -- ERROR Aliased_And_VFA_Obj : aliased Integer with Volatile_Full_Access; -- ERROR Atomic_And_VFA_Obj: Integer with Atomic, Volatile_Full_Access; -- ERROR Aliased_And_VFA_Obj_B : Aliased_Rec with Volatile_Full_Access; -- ERROR Atomic_And_VFA_Obj_B : Atomic_T with Volatile_Full_Access;-- ERROR private type Priv_VFA_Rec is record X : Integer; end record; type Priv_Ind_Rec is record X : Integer; end record; type Priv_Vol_Rec is record X : Integer; end record; type 
Priv_Atomic_Rec is record X : Integer; end record; end; -- p2.adb with System; procedure P2 is type Type1_T is record Field_1 : Integer; Field_2 : Integer; Field_3 : Integer; Field_4 : Short_Integer; end record; for Type1_T use record Field_1 at 0 range 0 .. 31; Field_2 at 4 range 0 .. 31; Field_3 at 8 range 0 .. 31; Field_4 at 12 range 0 .. 15; end record; for Type1_T'Size use (14) * System.Storage_Unit; pragma Volatile(Type1_T); type Type2_T is record Type1 : Type1_T; Field_1 : Integer; Field_2 : Integer; Field_3 : Integer; Field_4 : Short_Integer; end record; for Type2_T use record Type1 at 0 range 0 .. 111; Field_1 at 14 range 0 .. 31; Field_2 at 18 range 0 .. 31; Field_3 at 22 range 0 .. 31; Field_4 at 26 range 0 .. 15; end record; for Type2_T'Size use (28) * System.Storage_Unit; pragma Volatile(Type2_T); -- ERROR Type1 : Type1_T := (0,0,0,0); Type2 : Type2_T:= ((0,0,0,0),0,0,0,0); begin Type1.Field_1 := Type1.Field_1 +1; Type2.Field_1 := Type2.Field_1 +1; end; -- Compilation and output -- & gcc -c p.ads & gnatmake -q p2.adb p.ads:8:33: cannot apply Volatile_Full_Access (aliased component present) p.ads:10:17: cannot apply Volatile_Full_Access (aliased component present) p.ads:13:33: cannot apply Volatile_Full_Access (aliased component present) p.ads:15:17: cannot apply Volatile_Full_Access (aliased component present) p.ads:18:11: representation item must be after full type declaration p.ads:21:11: representation item must be after full type declaration p.ads:24:11: representation item must be after full type declaration p.ads:27:11: representation item must be after full type declaration p.ads:31:20: cannot apply Volatile_Full_Access (aliased component present) p.ads:34:19: cannot have Volatile_Full_Access and Atomic for same entity p.ads:38:20: cannot have Volatile_Full_Access and Atomic for same entity p.ads:46:20: cannot have Volatile_Full_Access and Atomic for same entity p.ads:50:20: cannot apply Volatile_Full_Access (aliased component present) 
p.ads:53:49: cannot have Volatile_Full_Access and Atomic for same entity p.ads:54:45: cannot apply Volatile_Full_Access (aliased component present) p.ads:55:42: cannot have Volatile_Full_Access and Atomic for same entity p2.adb:30:31: size of volatile field "Type1" must be at least 128 bits p2.adb:31:27: position of volatile field "Field_1" must be multiple of 32 bits p2.adb:32:27: position of volatile field "Field_2" must be multiple of 32 bits p2.adb:33:27: position of volatile field "Field_3" must be multiple of 32 bits
[PATCH] Adjust gcc.c-torture/execute/20050604-1.c
When fiddling around with vector lowering I found the following adjusted testcase helpful testing proper vector lowering of word_mode vector plus. Tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-09-06 Richard Biener * gcc.c-torture/execute/20050604-1.c: Adjust to be a better test for correctness of vector lowering. Index: gcc/testsuite/gcc.c-torture/execute/20050604-1.c === --- gcc/testsuite/gcc.c-torture/execute/20050604-1.c(revision 251790) +++ gcc/testsuite/gcc.c-torture/execute/20050604-1.c(working copy) @@ -6,7 +6,7 @@ extern void abort (void); -typedef short v4hi __attribute__ ((vector_size (8))); +typedef unsigned short v4hi __attribute__ ((vector_size (8))); typedef float v4sf __attribute__ ((vector_size (16))); union @@ -26,7 +26,7 @@ foo (void) { unsigned int i; for (i = 0; i < 2; i++) -u.v += (v4hi) { 12, 14 }; +u.v += (v4hi) { 12, 32768 }; for (i = 0; i < 2; i++) v.v += (v4sf) { 18.0, 20.0, 22 }; } @@ -35,7 +35,7 @@ int main (void) { foo (); - if (u.s[0] != 24 || u.s[1] != 28 || u.s[2] || u.s[3]) + if (u.s[0] != 24 || u.s[1] != 0 || u.s[2] || u.s[3]) abort (); if (v.f[0] != 36.0 || v.f[1] != 40.0 || v.f[2] != 44.0 || v.f[3] != 0.0) abort ();
Re: [PATCH] Fix SLSR issue
On Wed, 6 Sep 2017, Richard Biener wrote: > > This fixes a bogus check for a mode when the type matters. The > test can get fooled by vector ops with integral mode and thus we > later ICE trying to use wide-ints operating on vector constants. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Promptly overlooked some regressions, fixed as follows. Richard. 2017-09-06 Richard Biener * gimple-ssa-strength-reduction.c (find_candidates_dom_walker::before_dom_children): Also allow pointer types. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 251753) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -1742,7 +1742,8 @@ find_candidates_dom_walker::before_dom_c slsr_process_ref (gs); else if (is_gimple_assign (gs) - && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs + && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))) + || POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs) { tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 06/09/17 13:44, Bernd Edlinger wrote: > On 09/04/17 21:54, Bernd Edlinger wrote: >> Hi Kyrill, >> >> Thanks for your review! >> >> >> On 09/04/17 15:55, Kyrill Tkachov wrote: >>> Hi Bernd, >>> >>> On 18/01/17 15:36, Bernd Edlinger wrote: On 01/13/17 19:28, Bernd Edlinger wrote: > On 01/13/17 17:10, Bernd Edlinger wrote: >> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >>> On 18/12/16 12:58, Bernd Edlinger wrote: Hi, this is related to PR77308, the follow-up patch will depend on this one. When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned before reload, a mis-compilation in libgcc function __gnu_satfractdasq was discovered, see [1] for more details. The reason seems to be that when the *arm_cmpdi_insn is directly followed by a *arm_cmpdi_unsigned instruction, both are split up into this: [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1))) (parallel [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 3) (match_dup 4))) (set (match_dup 2) (minus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3))) (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) (set (reg:CC CC_REGNUM) (compare:CC (match_dup 0) (match_dup 1] The problem is that the reg:CC from the *subsi3_carryin_compare is not mentioning that the reg:CC is also dependent on the reg:CC from before. Therefore the *arm_cmpsi_insn appears to be redundant and thus got removed, because the data values are identical. I think that applies to a number of similar pattern where data flow is happening through the CC reg. So this is a kind of correctness issue, and should be fixed independently from the optimization issue PR77308. Therefore I think the patterns need to specify the true value that will be in the CC reg, in order for cse to know what the instructions are really doing. Bootstrapped and reg-tested on arm-linux-gnueabihf. Is it OK for trunk? 
>>> I agree you've found a valid problem here, but I have some issues >>> with >>> the patch itself. >>> >>> >>> (define_insn_and_split "subdi3_compare1" >>>[(set (reg:CC_NCV CC_REGNUM) >>> (compare:CC_NCV >>>(match_operand:DI 1 "register_operand" "r") >>>(match_operand:DI 2 "register_operand" "r"))) >>> (set (match_operand:DI 0 "register_operand" "=&r") >>> (minus:DI (match_dup 1) (match_dup 2)))] >>>"TARGET_32BIT" >>>"#" >>>"&& reload_completed" >>>[(parallel [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 1) (match_dup 2))) >>>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup >>> 2)))]) >>> (parallel [(set (reg:CC_C CC_REGNUM) >>> (compare:CC_C >>> (zero_extend:DI (match_dup 4)) >>> (plus:DI (zero_extend:DI (match_dup 5)) >>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) >>>(set (match_dup 3) >>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] >>> >>> >>> This pattern is now no-longer self consistent in that before the >>> split >>> the overall result for the condition register is in mode CC_NCV, but >>> afterwards it is just CC_C. >>> >>> I think CC_NCV is correct mode (the N, C and V bits all correctly >>> reflect the result of the 64-bit comparison), but that then >>> implies that >>> the cc mode of subsi3_carryin_compare is incorrect as well and >>> should in >>> fact also be CC_NCV. Thinking about this pattern, I'm inclined to >>> agree >>> that CC_NCV is the correct mode for this operation >>> >>> I'm not sure if there are other consequences that will fall out from >>> fixing this (it's possible that we might need a change to >>> select_cc_mode >>> as well). >>> >> Yes, this is still a bit awkward... >> >> The N and V bit will be the correct result for the subdi3_compare1 >> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) >> only gets the C bit correct, the expression for N and V is a different >> one. >> >> It probably works, because the subsi3_carryin_compare instruction sets >> mor
Re: [PATCH, ARM] correctly encode the CC reg data flow
On 09/04/17 21:54, Bernd Edlinger wrote: > Hi Kyrill, > > Thanks for your review! > > > On 09/04/17 15:55, Kyrill Tkachov wrote: >> Hi Bernd, >> >> On 18/01/17 15:36, Bernd Edlinger wrote: >>> On 01/13/17 19:28, Bernd Edlinger wrote: On 01/13/17 17:10, Bernd Edlinger wrote: > On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >> On 18/12/16 12:58, Bernd Edlinger wrote: >>> Hi, >>> >>> this is related to PR77308, the follow-up patch will depend on this >>> one. >>> >>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned >>> before reload, a mis-compilation in libgcc function >>> __gnu_satfractdasq >>> was discovered, see [1] for more details. >>> >>> The reason seems to be that when the *arm_cmpdi_insn is directly >>> followed by a *arm_cmpdi_unsigned instruction, both are split >>> up into this: >>> >>> [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 0) (match_dup 1))) >>> (parallel [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 3) (match_dup 4))) >>> (set (match_dup 2) >>> (minus:SI (match_dup 5) >>> (ltu:SI (reg:CC_C CC_REGNUM) >>> (const_int >>> 0])] >>> >>> [(set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 2) (match_dup 3))) >>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) >>> (set (reg:CC CC_REGNUM) >>> (compare:CC (match_dup 0) (match_dup 1] >>> >>> The problem is that the reg:CC from the *subsi3_carryin_compare >>> is not mentioning that the reg:CC is also dependent on the reg:CC >>> from before. Therefore the *arm_cmpsi_insn appears to be >>> redundant and thus got removed, because the data values are >>> identical. >>> >>> I think that applies to a number of similar pattern where data >>> flow is happening through the CC reg. >>> >>> So this is a kind of correctness issue, and should be fixed >>> independently from the optimization issue PR77308. >>> >>> Therefore I think the patterns need to specify the true >>> value that will be in the CC reg, in order for cse to >>> know what the instructions are really doing. 
>>> >>> >>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>> Is it OK for trunk? >>> >> I agree you've found a valid problem here, but I have some issues >> with >> the patch itself. >> >> >> (define_insn_and_split "subdi3_compare1" >>[(set (reg:CC_NCV CC_REGNUM) >> (compare:CC_NCV >>(match_operand:DI 1 "register_operand" "r") >>(match_operand:DI 2 "register_operand" "r"))) >> (set (match_operand:DI 0 "register_operand" "=&r") >> (minus:DI (match_dup 1) (match_dup 2)))] >>"TARGET_32BIT" >>"#" >>"&& reload_completed" >>[(parallel [(set (reg:CC CC_REGNUM) >> (compare:CC (match_dup 1) (match_dup 2))) >>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup >> 2)))]) >> (parallel [(set (reg:CC_C CC_REGNUM) >> (compare:CC_C >> (zero_extend:DI (match_dup 4)) >> (plus:DI (zero_extend:DI (match_dup 5)) >>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0) >>(set (match_dup 3) >> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])] >> >> >> This pattern is now no-longer self consistent in that before the >> split >> the overall result for the condition register is in mode CC_NCV, but >> afterwards it is just CC_C. >> >> I think CC_NCV is correct mode (the N, C and V bits all correctly >> reflect the result of the 64-bit comparison), but that then >> implies that >> the cc mode of subsi3_carryin_compare is incorrect as well and >> should in >> fact also be CC_NCV. Thinking about this pattern, I'm inclined to >> agree >> that CC_NCV is the correct mode for this operation >> >> I'm not sure if there are other consequences that will fall out from >> fixing this (it's possible that we might need a change to >> select_cc_mode >> as well). >> > Yes, this is still a bit awkward... > > The N and V bit will be the correct result for the subdi3_compare1 > a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) > only gets the C bit correct, the expression for N and V is a different > one. 
> > It probably works, because the subsi3_carryin_compare instruction sets > more CC bits than the pattern does explicitly specify the value. > We know the subsi3_carryin_compare also computes the NV bits, but > it is > hard to
replace libiberty with gnulib (was: Re: [PATCH 0/2] add unique_ptr class)
On 05/09/17 18:40, Pedro Alves wrote: On 09/05/2017 05:52 PM, Manuel López-Ibáñez wrote: Yeah, ISTR it was close, though there were a couple things that needed addressing still. The wiki seems to miss a pointer to following iterations/review of that patch (mailing list archives don't cross month boundaries...). You can find it starting here: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01208.html I think this was the latest version posted: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01554.html Thanks, I have updated the hyperlinks in the wiki. Unfortunately, Ayush left and there is no one else to finish the work. While converting individual functions from libiberty to gnulib is more or less straightforward, the build system of GCC is far too complex for any new or less-experienced contributor to finish the job. I have also updated https://gcc.gnu.org/wiki/SummerOfCode I don't believe that this was the only project accepted in 2016, but I cannot remember the others. Didn't GCC apply this year? Cheers, Manuel.
[PATCH] Fix PR82108
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk sofar. Richard. 2017-09-06 Richard Biener PR tree-optimization/82108 * tree-vect-stmts.c (vectorizable_load): Fix pointer adjustment for gap in the non-permutation SLP case. * gcc.dg/vect/pr82108.c: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 251642) +++ gcc/tree-vect-stmts.c (working copy) @@ -7203,7 +7203,6 @@ vectorizable_load (gimple *stmt, gimple_ { first_stmt = GROUP_FIRST_ELEMENT (stmt_info); group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt)); - int group_gap = GROUP_GAP (vinfo_for_stmt (first_stmt)); /* For SLP vectorization we directly vectorize a subchain without permutation. */ if (slp && ! SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()) @@ -7246,7 +7245,8 @@ vectorizable_load (gimple *stmt, gimple_ else { vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); - group_gap_adj = group_gap; + group_gap_adj + = group_size - SLP_INSTANCE_GROUP_SIZE (slp_node_instance); } } else Index: gcc/testsuite/gcc.dg/vect/pr82108.c === --- gcc/testsuite/gcc.dg/vect/pr82108.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/pr82108.c (working copy) @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_float } */ + +#include "tree-vect.h" + +void __attribute__((noinline,noclone)) +downscale_2 (const float* src, int src_n, float* dst) +{ + int i; + + for (i = 0; i < src_n; i += 2) { + const float* a = src; + const float* b = src + 4; + + dst[0] = (a[0] + b[0]) / 2; + dst[1] = (a[1] + b[1]) / 2; + dst[2] = (a[2] + b[2]) / 2; + dst[3] = (a[3] + b[3]) / 2; + + src += 2 * 4; + dst += 4; + } +} + +int main () +{ + const float in[4 * 4] = { + 1, 2, 3, 4, + 5, 6, 7, 8, + + 1, 2, 3, 4, + 5, 6, 7, 8 + }; + float out[2 * 4]; + + check_vect (); + + downscale_2 (in, 4, out); + + if (out[0] != 3 || out[1] != 4 || out[2] != 5 || out[3] != 6 + || out[4] != 3 || out[5] != 4 || out[6] != 5 || out[7] != 6) +__builtin_abort (); + + return 0; +} + +/* { dg-final { 
scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
[Ada] Issue error message on invalid representation clause for extension
This makes the compiler generate an error message also in the case where one of the specified components overlaps the parent field because its size has been explicitly set by a size clause. The compiler must issue an error on 32-bit platforms for the package: 1. package P is 2. 3. type Byte is mod 2**8; 4. for Byte'Size use 8; 5. 6. type Root is tagged record 7. Status : Byte; 8. end record; 9. for Root use record 10. Status at 4 range 0 .. 7; 11. end record; 12. for Root'Size use 64; 13. 14. type Ext is new Root with record 15. Thread_Status : Byte; 16. end record; 17. for Ext use record 18. Thread_Status at 5 range 0 .. 7; | >>> component overlaps parent field of "Ext" 19. end record; 20. 21. end P; 21 lines: 1 error Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * sem_ch13.adb (Check_Record_Representation_Clause): Give an error as soon as one of the specified components overlaps the parent field. Index: sem_ch13.adb === --- sem_ch13.adb(revision 251784) +++ sem_ch13.adb(working copy) @@ -9806,12 +9806,12 @@ -- checking for overlap, since no overlap is possible. Tagged_Parent : Entity_Id := Empty; - -- This is set in the case of a derived tagged type for which we have - -- Is_Fully_Repped_Tagged_Type True (indicating that all components are - -- positioned by record representation clauses). In this case we must - -- check for overlap between components of this tagged type, and the - -- components of its parent. Tagged_Parent will point to this parent - -- type. For all other cases Tagged_Parent is left set to Empty. + -- This is set in the case of an extension for which we have either a + -- size clause or Is_Fully_Repped_Tagged_Type True (indicating that all + -- components are positioned by record representation clauses) on the + -- parent type. In this case we check for overlap between components of + -- this tagged type and the parent component. Tagged_Parent will point + -- to this parent type. 
For all other cases, Tagged_Parent is Empty. Parent_Last_Bit : Uint; -- Relevant only if Tagged_Parent is set, Parent_Last_Bit indicates the @@ -9959,19 +9959,23 @@ if Rectype = Any_Type then return; - else - Rectype := Underlying_Type (Rectype); end if; + Rectype := Underlying_Type (Rectype); + -- See if we have a fully repped derived tagged type declare PS : constant Entity_Id := Parent_Subtype (Rectype); begin - if Present (PS) and then Is_Fully_Repped_Tagged_Type (PS) then + if Present (PS) and then Known_Static_RM_Size (PS) then Tagged_Parent := PS; +Parent_Last_Bit := RM_Size (PS) - 1; + elsif Present (PS) and then Is_Fully_Repped_Tagged_Type (PS) then +Tagged_Parent := PS; + -- Find maximum bit of any component of the parent type Parent_Last_Bit := UI_From_Int (System_Address_Size - 1); @@ -10063,7 +10067,7 @@ ("bit number out of range of specified size", Last_Bit (CC)); - -- Check for overlap with tag component + -- Check for overlap with tag or parent component else if Is_Tagged_Type (Rectype) @@ -10073,27 +10077,20 @@ ("component overlaps tag field of&", Component_Name (CC), Rectype); Overlap_Detected := True; + + elsif Present (Tagged_Parent) + and then Fbit <= Parent_Last_Bit + then + Error_Msg_NE +("component overlaps parent field of&", + Component_Name (CC), Rectype); + Overlap_Detected := True; end if; if Hbit < Lbit then Hbit := Lbit; end if; end if; - --- Check parent overlap if component might overlap parent field - -if Present (Tagged_Parent) and then Fbit <= Parent_Last_Bit then - Pcomp := First_Component_Or_Discriminant (Tagged_Parent); - while Present (Pcomp) loop - if not Is_Tag (Pcomp) -and then Chars (Pcomp) /= Name_uParent - then - Check_Component_Overlap (Comp, Pcomp); - end if; - - Next_Component_Or_Discriminant (Pcomp); - end loop; -end if; end if; Next (CC);
[Ada] Reject invalid use of Global/Depends on object declaration
GNAT failed to issue an error on a Global/Depends aspect put on an object declaration, which is only allowed for a task object. Instead it crashed. Now fixed. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Yannick Moy * sem_prag.adb (Analyze_Depends_Global): Reinforce test on object declarations to only consider valid uses of Global/Depends those on single concurrent objects. Index: sem_prag.adb === --- sem_prag.adb(revision 251778) +++ sem_prag.adb(working copy) @@ -4080,7 +4080,10 @@ -- Object declaration of a single concurrent type - elsif Nkind (Subp_Decl) = N_Object_Declaration then + elsif Nkind (Subp_Decl) = N_Object_Declaration + and then Is_Single_Concurrent_Object + (Unique_Defining_Entity (Subp_Decl)) + then null; -- Single task type
[Ada] Missing finalization of generalized indexed element
This patch modifies the finalization mechanism to recognize a heavily expanded generalized indexing where the element type requires finalization actions. -- Source -- -- types.ads with Ada.Finalization; use Ada.Finalization; package Types is type Element is new Controlled with record Id : Natural := 0; end record; procedure Adjust (Obj : in out Element); procedure Finalize (Obj : in out Element); procedure Initialize (Obj : In out Element); subtype Index is Integer range 1 .. 3; type Collection is array (Index) of Element; type Vector is new Controlled with record Id : Natural := 0; Elements : Collection; end record with Constant_Indexing => Element_At; procedure Adjust (Obj : in out Vector); procedure Finalize (Obj : in out Vector); procedure Initialize (Obj : In out Vector); function Element_At (Obj : Vector; Pos : Index) return Element'Class; function Make_Vector return Vector'Class; end Types; -- types.adb with Ada.Text_IO; use Ada.Text_IO; package body Types is Id_Gen : Natural := 10; procedure Adjust (Obj : in out Element) is Old_Id : constant Natural := Obj.Id; New_Id : constant Natural := Old_Id + 1; begin if Old_Id = 0 then Put_Line (" Element adj ERROR"); else Put_Line (" Element adj" & Old_Id'Img & " ->" & New_Id'Img); Obj.Id := New_Id; end if; end Adjust; procedure Adjust (Obj : in out Vector) is Old_Id : constant Natural := Obj.Id; New_Id : constant Natural := Old_Id + 1; begin if Old_Id = 0 then Put_Line (" Vector adj ERROR"); else Put_Line (" Vector adj" & Old_Id'Img & " ->" & New_Id'Img); Obj.Id := New_Id; end if; end Adjust; function Element_At (Obj : Vector; Pos : Index) return Element'Class is begin return Obj.Elements (Pos); end Element_At; procedure Finalize (Obj : in out Element) is begin if Obj.Id = 0 then Put_Line (" Element fin ERROR"); else Put_Line (" Element fin" & Obj.Id'Img); Obj.Id := 0; end if; end Finalize; procedure Finalize (Obj : in out Vector) is begin if Obj.Id = 0 then Put_Line (" Vector fin ERROR"); else Put_Line (" Vector 
fin" & Obj.Id'Img); Obj.Id := 0; end if; end Finalize; procedure Initialize (Obj : In out Element) is begin Obj.Id := Id_Gen; Id_Gen := Id_Gen + 10; Put_Line (" Element ini" & Obj.Id'Img); end Initialize; procedure Initialize (Obj : In out Vector) is begin Obj.Id := Id_Gen; Id_Gen := Id_Gen + 10; Put_Line (" Vector ini" & Obj.Id'Img); end Initialize; function Make_Vector return Vector'Class is Result : Vector; begin return Result; end Make_Vector; end Types; -- main.adb with Ada.Text_IO; use Ada.Text_IO; with Types; use Types; procedure Main is begin Put_Line ("Main"); declare Vec : Vector'Class := Make_Vector; Elem : Element'Class := Vec (1); begin Put_Line ("Main middle"); end; Put_Line ("Main end"); end Main; -- Compilation and output -- $ gnatmake -q main.adb $ ./main.adb Main Element ini 10 Element ini 20 Element ini 30 Vector ini 40 Element adj 10 -> 11 Element adj 20 -> 21 Element adj 30 -> 31 Vector adj 40 -> 41 Vector fin 40 Element fin 30 Element fin 20 Element fin 10 Element adj 11 -> 12 Element adj 21 -> 22 Element adj 31 -> 32 Vector adj 41 -> 42 Vector fin 41 Element fin 31 Element fin 21 Element fin 11 Element adj 12 -> 13 Element adj 13 -> 14 Element fin 13 Main middle Element fin 14 Vector fin 42 Element fin 32 Element fin 22 Element fin 12 Main end Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Hristian Kirtchev * exp_util.adb (Is_Controlled_Indexing): New routine. (Is_Displace_Call): Use routine Strip to remove indirections. (Is_Displacement_Of_Object_Or_Function_Result): Code clean up. Add a missing case of controlled generalized indexing. (Is_Source_Object): Use routine Strip to remove indirections. (Strip): New routine. Index: exp_util.adb === --- exp_util.adb(revision 251784) +++ exp_util.adb(working copy) @@ -7590,22 +7590,28 @@ (Obj_Id : Entity_Id) return Boolean is function Is_Controlled_Function_Call (N : Node_Id) return Boolean; - -- Determine if particular node denotes a controlled function call. 
The - -- call may have been heavily expanded. + -- Determine whether node N denotes a controlled function call + function Is_Controlled_Indexing (N : Node_Id) return Boolean; + -- Det
[Ada] Handling of inherited and explicit postconditions
This patch fixes the handling of overriding operations that have both an explicit postcondition and an inherited classwide one.

Executing:

   gnatmake -q -gnata post_class.adb
   post_class

must yield:

   raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : failed inherited postcondition from the_package.ads:4

---
with The_Package; use The_Package;
procedure Post_Class is
   X : D;
begin
   Proc (X);
end Post_Class;
---
package The_Package is
   type T is tagged null record;
   function F (X : T) return Boolean is (True);
   procedure Proc (X : in out T)
     with Post => True, Post'Class => F (X);
   type D is new T with null record;
   overriding function F (X : D) return Boolean is (False);
end The_Package;
---
package body The_Package is
   procedure Proc (X : in out T) is
   begin
      null;
   end Proc;
end The_Package;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Ed Schonberg

	* einfo.ads, einfo.adb (Get_Classwide_Pragma): New utility, to
	retrieve the inherited classwide precondition/postcondition of a
	subprogram.
	* freeze.adb (Freeze_Entity): Use Get_Classwide_Pragma when
	freezing a subprogram, to complete the generation of the
	corresponding checking code.
Index: einfo.adb === --- einfo.adb (revision 251783) +++ einfo.adb (working copy) @@ -7481,6 +7481,39 @@ return Empty; end Get_Pragma; + -- + -- Get_Classwide_Pragma -- + -- + + function Get_Classwide_Pragma + (E : Entity_Id; + Id : Pragma_Id) return Node_Id +is + Item : Node_Id; + Items : Node_Id; + + begin + Items := Contract (E); + if No (Items) then + return Empty; + end if; + + Item := Pre_Post_Conditions (Items); + + while Present (Item) loop + if Nkind (Item) = N_Pragma + and then Get_Pragma_Id (Pragma_Name_Unmapped (Item)) = Id + and then Class_Present (Item) + then +return Item; + else +Item := Next_Pragma (Item); + end if; + end loop; + + return Empty; + end Get_Classwide_Pragma; + -- -- Get_Record_Representation_Clause -- -- Index: einfo.ads === --- einfo.ads (revision 251783) +++ einfo.ads (working copy) @@ -8295,6 +8295,12 @@ --Test_Case --Volatile_Function + function Get_Classwide_Pragma + (E : Entity_Id; + Id : Pragma_Id) return Node_Id; + -- Examine Rep_Item chain to locate a classwide pre- or postcondition + -- of a primitive operation. Returns Empty if not present. + function Get_Record_Representation_Clause (E : Entity_Id) return Node_Id; -- Searches the Rep_Item chain for a given entity E, for a record -- representation clause, and if found, returns it. 
Returns Empty Index: freeze.adb === --- freeze.adb (revision 251781) +++ freeze.adb (working copy) @@ -1418,8 +1418,8 @@ New_Prag : Node_Id; begin - A_Pre := Get_Pragma (Par_Prim, Pragma_Precondition); - if Present (A_Pre) and then Class_Present (A_Pre) then + A_Pre := Get_Classwide_Pragma (Par_Prim, Pragma_Precondition); + if Present (A_Pre) then New_Prag := New_Copy_Tree (A_Pre); Build_Class_Wide_Expression (Prag => New_Prag, @@ -1436,9 +1436,9 @@ end if; end if; - A_Post := Get_Pragma (Par_Prim, Pragma_Postcondition); + A_Post := Get_Classwide_Pragma (Par_Prim, Pragma_Postcondition); - if Present (A_Post) and then Class_Present (A_Post) then + if Present (A_Post) then New_Prag := New_Copy_Tree (A_Post); Build_Class_Wide_Expression (Prag => New_Prag,
[Ada] Dimensional checking and generic subprograms
This patch enhances dimensionality checking to cover generic subprograms that are intended to apply to types of different dimensions, such as an integration function. Dimensionality checking is performed in each instance, and relies on a special handling of conversion operations to prevent spurious dimensional errors in the generic unit itself.

The following must compile quietly:

   gcc -c -gnatws integrate.adb

---
package Dims with SPARK_Mode is

   -- Setup Dimension System

   type Unit_Type is new Float with
     Dimension_System =>
       ((Unit_Name => Meter,    Unit_Symbol => 'm',   Dim_Symbol => 'L'),
        (Unit_Name => Kilogram, Unit_Symbol => "kg",  Dim_Symbol => 'M'),
        (Unit_Name => Second,   Unit_Symbol => 's',   Dim_Symbol => 'T'),
        (Unit_Name => Ampere,   Unit_Symbol => 'A',   Dim_Symbol => 'I'),
        (Unit_Name => Kelvin,   Unit_Symbol => 'K',   Dim_Symbol => "Theta"),
        (Unit_Name => Radian,   Unit_Symbol => "Rad", Dim_Symbol => "A")),
     Default_Value => 0.0;

   -- Base Dimensions

   subtype Length_Type is Unit_Type with
     Dimension => (Symbol => 'm', Meter => 1, others => 0);

   subtype Time_Type is Unit_Type with
     Dimension => (Symbol => 's', Second => 1, others => 0);

   subtype Linear_Velocity_Type is Unit_Type with
     Dimension => (Meter => 1, Second => -1, others => 0);

   -- Base Units

   Meter  : constant Length_Type := Length_Type (1.0);
   Second : constant Time_Type := Time_Type (1.0);

end Dims;
---
with Dims; use Dims;
procedure Integrate is
   generic
      type Op1 is new Unit_Type;
      type Op2 is new Unit_Type;
      type Res is new Unit_Type;
   function I (X : Op1; Y : Op2) return Res;

   function I (X : Op1; Y : Op2) return Res is
   begin
      return Res (Unit_Type (X) * Unit_Type (Y));
   end I;

   function Distance is new I (Time_Type, Linear_Velocity_Type, Length_Type);

   Secs    : Time_Type := 5.0;
   Speed   : Linear_Velocity_Type := 10.0;
   Covered : Length_Type;

begin
   Covered := Distance (Secs, Speed);

   declare
      subtype Area is Unit_Type with
        Dimension => (Meter => 2, others => 0);

      My_Little_Acre : Area;

      function Acres is new I (Length_Type, Length_Type,
Area); begin My_Little_Acre := Covered * Covered; My_Little_Acre := Acres (Covered, Covered); end; end Integrate; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_dim.adb (Analyze_Dimension): In an instance, a type conversion takes its dimensions from the expression, not from the context type. (Dimensions_Of_Operand): Ditto. Index: sem_dim.adb === --- sem_dim.adb (revision 251753) +++ sem_dim.adb (working copy) @@ -1161,7 +1161,6 @@ | N_Qualified_Expression | N_Selected_Component | N_Slice -| N_Type_Conversion | N_Unchecked_Type_Conversion => Analyze_Dimension_Has_Etype (N); @@ -1191,7 +1190,17 @@ when N_Subtype_Declaration => Analyze_Dimension_Subtype_Declaration (N); + when N_Type_Conversion => +if In_Instance + and then Exists (Dimensions_Of (Expression (N))) +then + Set_Dimensions (N, Dimensions_Of (Expression (N))); +else + Analyze_Dimension_Has_Etype (N); +end if; + when N_Unary_Op => + Analyze_Dimension_Unary_Op (N); when others => @@ -1378,11 +1387,24 @@ -- A type conversion may have been inserted to rewrite other -- expressions, e.g. function returns. Dimensions are those of - -- the target type. + -- the target type, unless this is a conversion in an instance, + -- in which case the proper dimensions are those of the operand, elsif Nkind (N) = N_Type_Conversion then -return Dimensions_Of (Etype (N)); +if In_Instance + and then Is_Generic_Actual_Type (Etype (Expression (N))) +then + return Dimensions_Of (Etype (Expression (N))); +elsif In_Instance + and then Exists (Dimensions_Of (Expression (N))) +then + return Dimensions_Of (Expression (N)); + +else + return Dimensions_Of (Etype (N)); +end if; + -- Otherwise return the default dimensions else
[Ada] Time_IO.Value enhanced to parse ISO-8861 UTC date and time
The function Value of package GNAT.Calendar.Time_IO has been enhanced to parse strings containing UTC date and time. After this patch the following test works fine. with Ada.Calendar; use Ada.Calendar; with Ada.Text_IO; use Ada.Text_IO; with GNAT.Calendar.Time_IO; use GNAT.Calendar.Time_IO; procedure Do_Test is Picture : Picture_String := "%Y-%m-%dT%H:%M:%S,%i"; T1 : Time; T2 : Time; T3 : Time; T4 : Time; T5 : Time; begin T1 := Value ("2017-04-14T14:47:06"); pragma Assert (Image (T1, Picture) = "2017-04-14T14:47:06,000"); T2 := Value ("2017-04-14T14:47:06Z"); pragma Assert (Image (T2, Picture) = "2017-04-14T14:47:06,000"); T3 := Value ("2017-04-14T14:47:06,999"); pragma Assert (Image (T3, Picture) = "2017-04-14T14:47:06,999"); T4 := Value ("2017-04-14T19:47:06+05"); pragma Assert (Image (T4, Picture) = "2017-04-14T14:47:06,000"); T5 := Value ("2017-04-14T09:00:06-05:47"); pragma Assert (Image (T5, Picture) = "2017-04-14T14:47:06,000"); end; Command: gnatmake -gnata do_test.adb; ./do_test Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Javier Miranda * g-catiio.ads, g-catiio.adb (Value): Extended to parse an UTC time following ISO-8861. Index: g-catiio.adb === --- g-catiio.adb(revision 251753) +++ g-catiio.adb(working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1999-2016, AdaCore -- +-- Copyright (C) 1999-2017, AdaCore -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -93,6 +93,26 @@ Length : Natural := 0) return String; -- As above with N provided in Integer format + procedure Parse_ISO_8861_UTC + (Date: String; + Time: out Ada.Calendar.Time; + Success : out Boolean); + -- Subsidiary of function Value. It parses the string Date, interpreted as + -- an ISO 8861 time representation, and returns corresponding Time value. + -- Success is set to False when the string is not a supported ISO 8861 + -- date. 
The following regular expression defines the supported format: + -- + --(mmdd | '-'mm'-'dd)'T'(hhmmss | hh':'mm':'ss) + -- [ ('Z' | ('.' | ',') s{s} | ('+'|'-')hh':'mm) ] + -- + -- Trailing characters (in particular spaces) are not allowed. + -- + -- Examples: + -- + --2017-04-14T14:47:0620170414T14:47:0620170414T144706 + --2017-04-14T14:47:06,12 20170414T14:47:06.12 + --2017-04-14T19:47:06+05 20170414T09:00:06-05:47 + --- -- Am_Pm -- --- @@ -531,7 +551,7 @@ "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"); -- Short version of the month names, used when parsing date strings - S : String := Str; + S : String := Str; begin GNAT.Case_Util.To_Upper (S); @@ -545,6 +565,390 @@ return Abbrev_Upper_Month_Names'First; end Month_Name_To_Number; + + -- Parse_ISO_8861_UTC -- + + + procedure Parse_ISO_8861_UTC + (Date: String; + Time: out Ada.Calendar.Time; + Success : out Boolean) + is + Index : Positive := Date'First; + -- The current character scan index. After a call to Advance, Index + -- points to the next character. + + End_Of_Source_Reached : exception; + -- An exception used to signal that the scan pointer has reached the + -- end of the source string. + + Wrong_Syntax : exception; + -- An exception used to signal that the scan pointer has reached an + -- unexpected character in the source string. + + procedure Advance; + pragma Inline (Advance); + -- Past the current character of Date + + procedure Advance_Digits (Num_Digits : Positive); + pragma Inline (Advance_Digits); + -- Past the given number of digit characters + + function Scan_Day return Day_Number; + pragma Inline (Scan_Day); + -- Scan the two digits of a day number and return its value + + function Scan_Hour return Hour_Number; + pragma Inline (Scan_Hour); + -- Scan the two digits of an hour number and return its value + + function Scan_Minute return Minute_Number; + pragma Inline (Scan_Minute); + -- Scan the two digits of a minute number and return its value + +
[Ada] Eliminate out-of-line body of local inlined subprograms
This improves a little the algorithm used to compute the set of externally visible entities in package bodies to make it less conservative in the presence of local inlined subprograms. The typical effect is to eliminate the out-of-line body if the subprogram is inlined at every call site: package Q3 is procedure Caller; end Q3; package body Q3 is I : Integer := 0; procedure Inner is begin I := 1; end; procedure Proc; pragma Inline (Proc); procedure Proc is begin Inner; end; procedure Caller is begin Proc; end; end Q3; The out-of-line body of Proc is now eliminated at -O1 and above. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * inline.adb (Split_Unconstrained_Function): Also set Is_Inlined on the procedure created to encapsulate the body. * sem_ch7.adb: Add with clause for GNAT.HTable. (Entity_Table_Size): New constant. (Entity_Hash): New function. (Subprogram_Table): New instantiation of GNAT.Htable.Simple_HTable. (Is_Subprogram_Ref): Rename into... (Scan_Subprogram_Ref): ...this. Record references to subprograms in the table instead of bailing out on them. Scan the value of constants if it is not known at compile time. (Contains_Subprograms_Refs): Rename into... (Scan_Subprogram_Refs): ...this. (Has_Referencer): Scan the body of all inlined subprograms. Reset the Is_Public flag on subprograms if they are not actually referenced. (Hide_Public_Entities): Beef up comment on the algorithm. Reset the table of subprograms on entry. Index: inline.adb === --- inline.adb (revision 251779) +++ inline.adb (working copy) @@ -1607,7 +1607,7 @@ -- N is an inlined function body that returns an unconstrained type and -- has a single extended return statement. Split N in two subprograms: -- a procedure P' and a function F'. 
The formals of P' duplicate the - -- formals of N plus an extra formal which is used return a value; + -- formals of N plus an extra formal which is used to return a value; -- its body is composed by the declarations and list of statements -- of the extended return statement of N. @@ -1915,6 +1915,7 @@ Pop_Scope; Build_Procedure (Proc_Id, Decl_List); Insert_Actions (N, Decl_List); +Set_Is_Inlined (Proc_Id); Push_Scope (Scope); end; Index: sem_ch7.adb === --- sem_ch7.adb (revision 251763) +++ sem_ch7.adb (working copy) @@ -70,6 +70,8 @@ with Style; with Uintp; use Uintp; +with GNAT.HTable; + package body Sem_Ch7 is --- @@ -187,6 +189,38 @@ end if; end Analyze_Package_Body; + -- + -- Analyze_Package_Body_Helper Data and Subprograms -- + -- + + Entity_Table_Size : constant := 4096; + -- Number of headers in hash table + + subtype Entity_Header_Num is Integer range 0 .. Entity_Table_Size - 1; + -- Range of headers in hash table + + function Entity_Hash (Id : Entity_Id) return Entity_Header_Num; + -- Simple hash function for Entity_Ids + + package Subprogram_Table is new GNAT.Htable.Simple_HTable + (Header_Num => Entity_Header_Num, + Element=> Boolean, + No_Element => False, + Key=> Entity_Id, + Hash => Entity_Hash, + Equal => "="); + -- Hash table to record which subprograms are referenced. It is declared + -- at library level to avoid elaborating it for every call to Analyze. + + - + -- Entity_Hash -- + - + + function Entity_Hash (Id : Entity_Id) return Entity_Header_Num is + begin + return Entity_Header_Num (Id mod Entity_Table_Size); + end Entity_Hash; + - -- Analyze_Package_Body_Helper -- - @@ -200,8 +234,8 @@ -- Attempt to hide all public entities found in declarative list Decls -- by resetting their Is_Public flag to False depending on whether the -- entities are not referenced by inlined or generic bodies. This kind - -- of processing is a conservative approximation and may still leave - -- certain entities externally visible. 
+ -- of processing is a conservative approximation and will still leave + -- entities externally visible if the package is not simple enough. procedure Install_Composite_Operations (P : Entity_Id); -- Composite types declared in the current scope may depend on types @@ -214,11 +248,6 @@ -- procedure Hide_Public_Entities (Decls : List_Id) is -
[Ada] Crash when issuing warning on uninitialized value
When issuing a warning on a read of an uninitialized variable through reading an attribute such as Loop_Entry, GNAT could crash. Now fixed. GNAT issues a warning as expected on the following code: $ gcc -c s.adb 1. package S is 2. 3.type Array_Range is range 1 .. 10; 4. 5.type IntArray is array (Array_Range) of Integer; 6. 7.procedure Move (Dest, Src : aliased out IntArray); 8. 9. end S; 1. package body S is 2. 3.procedure Move (Dest, Src : aliased out IntArray) is 4.begin 5. for Index in Dest'Range loop 6. pragma Assert (for all J in Dest'First .. Index - 1 => 7. Dest (J) = Src'Loop_Entry (J)); 1 2 >>> warning: "Dest" may be referenced before it has a value >>> warning: "Src" may be referenced before it has a value 8. 9. Dest (Index) := Src (Index); 10. Src (Index) := 0; 11. end loop; 12.end Move; 13. 14. end S; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Yannick Moy * sem_warn.adb (Check_References): Take into account possibility of attribute reference as original node. Index: sem_warn.adb === --- sem_warn.adb(revision 251773) +++ sem_warn.adb(working copy) @@ -1382,16 +1382,22 @@ -- deal with case where original unset reference has been -- rewritten during expansion. - -- In some cases, the original node may be a type conversion - -- or qualification, and in this case we want the object - -- entity inside. + -- In some cases, the original node may be a type + -- conversion, a qualification or an attribute reference and + -- in this case we want the object entity inside. Same for + -- an expression with actions. UR := Original_Node (UR); while Nkind (UR) = N_Type_Conversion or else Nkind (UR) = N_Qualified_Expression or else Nkind (UR) = N_Expression_With_Actions +or else Nkind (UR) = N_Attribute_Reference loop - UR := Expression (UR); + if Nkind (UR) = N_Attribute_Reference then +UR := Prefix (UR); + else +UR := Expression (UR); + end if; end loop; -- Don't issue warning if appearing inside Initial_Condition
Re: Add support to trace comparison instructions and switch statements
Hi Jakub I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address". And run my fuzzer with pc and cmp feedbacks for hours. It works fine. About __sanitizer_cov_trace_cmp{f,d} , yes, it isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons. I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs. But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary. And thanks again for your professional help. Wish Wu -- From:Jakub Jelinek Time:2017 Sep 6 (Wed) 05:44 To:Wish Wu Cc:Dmitry Vyukov ; gcc-patches ; Jeff Law ; wishwu007 Subject:Re: Add support to trace comparison instructions and switch statements On Tue, Sep 05, 2017 at 09:03:52PM +0800, 吴潍浠(此彼) wrote: > Attachment is my updated path. > The implementation of parse_sanitizer_options is not elegance enough. Mixing > handling flags of fsanitize is easy to make mistakes. To avoid too many further iterations, I took the liberty to tweak your patch. From https://clang.llvm.org/docs/SanitizerCoverage.html I've noticed that since 2017-08-11 clang/llvm wants to emit __sanitizer_cov_trace_const_cmpN with the first argument a constant if one of the comparison operands is a constant, so the patch implements that too. I wonder about the __sanitizer_cov_trace_cmp{f,d} entry-points, because I can't find them on that page nor in llvm sources. I've also added handling of COND_EXPRs and added some documentation. I've bootstrapped/regtested the patch on x86_64-linux and i686-linux. Can you test it on whatever you want to use the patch for? 
2017-09-05 Wish Wu Jakub Jelinek * asan.c (initialize_sanitizer_builtins): Add BT_FN_VOID_UINT8_UINT8, BT_FN_VOID_UINT16_UINT16, BT_FN_VOID_UINT32_UINT32, BT_FN_VOID_UINT64_UINT64, BT_FN_VOID_FLOAT_FLOAT, BT_FN_VOID_DOUBLE_DOUBLE and BT_FN_VOID_UINT64_PTR variables. * builtin-types.def (BT_FN_VOID_UINT8_UINT8): New fn type. (BT_FN_VOID_UINT16_UINT16): Likewise. (BT_FN_VOID_UINT32_UINT32): Likewise. (BT_FN_VOID_FLOAT_FLOAT): Likewise. (BT_FN_VOID_DOUBLE_DOUBLE): Likewise. (BT_FN_VOID_UINT64_PTR): Likewise. * common.opt (flag_sanitize_coverage): New variable. (fsanitize-coverage=trace-pc): Remove. (fsanitize-coverage=): Add. * flag-types.h (enum sanitize_coverage_code): New enum. * fold-const.c (fold_range_test): Disable non-short-circuit optimization if flag_sanitize_coverage. (fold_truth_andor): Likewise. * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. * opts.c (COVERAGE_SANITIZER_OPT): Define. (coverage_sanitizer_opts): New array. (get_closest_sanitizer_option): Add OPTS argument, handle also OPT_fsanitize_coverage_. (parse_sanitizer_options): Adjusted to also handle OPT_fsanitize_coverage_. (common_handle_option): Add OPT_fsanitize_coverage_. * sancov.c (instrument_comparison, instrument_switch): New function. (sancov_pass): Add trace-cmp support. * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1, BUILT_IN_SANITIZER_COV_TRACE_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CMP4, BUILT_IN_SANITIZER_COV_TRACE_CMP8, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4, BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8, BUILT_IN_SANITIZER_COV_TRACE_CMPF, BUILT_IN_SANITIZER_COV_TRACE_CMPD, BUILT_IN_SANITIZER_COV_TRACE_SWITCH): New builtins. * doc/invoke.texi: Document -fsanitize-coverage=trace-cmp. * gcc.dg/sancov/cmp0.c: New test. 
--- gcc/asan.c.jj 2017-09-04 09:55:26.600687479 +0200 +++ gcc/asan.c 2017-09-05 15:39:32.452612728 +0200 @@ -2709,6 +2709,29 @@ initialize_sanitizer_builtins (void) tree BT_FN_SIZE_CONST_PTR_INT = build_function_type_list (size_type_node, const_ptr_type_node, integer_type_node, NULL_TREE); + + tree BT_FN_VOID_UINT8_UINT8 += build_function_type_list (void_type_node, unsigned_char_type_node, +unsigned_char_type_node, NULL_TREE); + tree BT_FN_VOID_UINT16_UINT16 += build_function_type_list (void_type_node, uint16_type_node, +uint16_type_node, NULL_TREE); + tree BT_FN_VOID_UINT32_UINT32 += build_function_type_list (void_type_node, uint32_type_node, +uint32_type_node, NULL_TREE); + tree BT_FN_VOID_UINT64_UINT64 += build_function_type_list (void_type_node, uint64_type_node, +uint64_type_node, NULL_TREE); + tree BT_FN_VOID_FLOAT_FLOAT += build_function_type_list (void_type_node, float_type_node, +float_type_node, NULL_TREE); + tree BT_FN_VOID_DOUBLE_DOUBLE += build_function_type_list (void_type_node, double_type_node, +double_type_node, NULL_TREE); + tree BT_FN_VOID_UINT64_PTR += build_function_type_list (void_type_node, uint64_type_node, +ptr_type_node, NULL_TREE); + tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
[C++ PATCH] rename lookup_fnfields_slot
This patch renames lookup_fnfields_slot{,_nolazy} to get_class_binding{,_direct}. It also removes a few now-unneeded checks for CLASSTYPE_METHOD_VEC being non-null. You may notice that the new names mention nothing about the kind of member looked for. That's intentional. These functions will absorb the non-function member lookup functionality. nathan -- Nathan Sidwell 2017-09-06 Nathan Sidwell * name-lookup.h (lookup_fnfields_slot_nolazy, lookup_fnfields_slot): Rename to ... (get_class_binding_direct, get_class_binding): ... here. * name-lookup.c (lookup_fnfields_slot_nolazy, lookup_fnfields_slot): Rename to ... (get_class_binding_direct, get_class_binding): ... here. * cp-tree.h (CLASSTYPE_CONSTRUCTORS, CLASSTYPE_DESTRUCTOR): Adjust. * call.c (build_user_type_conversion_1): Adjust. (has_trivial_copy_assign_p): Adjust. (has_trivial_copy_p): Adjust. * class.c (get_basefndecls) Adjust. (vbase_has_user_provided_move_assign) Adjust. (classtype_has_move_assign_or_move_ctor_p): Adjust. (type_build_ctor_call, type_build_dtor_call): Adjust. * decl.c (register_dtor_fn): Adjust. * decl2.c (check_classfn): Adjust. * pt.c (retrieve_specialization): Adjust. (check_explicit_specialization): Adjust. (do_class_deduction): Adjust. * search.c (lookup_field_r): Adjust. (look_for_overrides_here, lookup_conversions_r): Adjust. * semantics.c (classtype_has_nothrow_assign_or_copy_p): Adjust. * tree.c (type_has_nontrivial_copy_init): Adjust. * method.c (lazily_declare_fn): Adjust comment. Index: call.c === --- call.c (revision 251779) +++ call.c (working copy) @@ -3738,7 +3738,7 @@ build_user_type_conversion_1 (tree totyp if (CLASS_TYPE_P (totype)) /* Use lookup_fnfields_slot instead of lookup_fnfields to avoid creating a garbage BASELINK; constructors can't be inherited. 
*/ -ctors = lookup_fnfields_slot (totype, complete_ctor_identifier); +ctors = get_class_binding (totype, complete_ctor_identifier); /* FIXME P0135 doesn't say what to do in C++17 about list-initialization from a single element. For now, let's handle constructors as before and also @@ -8243,9 +8243,7 @@ first_non_public_field (tree type) static bool has_trivial_copy_assign_p (tree type, bool access, bool *hasassign) { - tree fns = cp_assignment_operator_id (NOP_EXPR); - fns = lookup_fnfields_slot (type, fns); - + tree fns = get_class_binding (type, cp_assignment_operator_id (NOP_EXPR)); bool all_trivial = true; /* Iterate over overloads of the assignment operator, checking @@ -8294,8 +8292,7 @@ has_trivial_copy_assign_p (tree type, bo static bool has_trivial_copy_p (tree type, bool access, bool hasctor[2]) { - tree fns = lookup_fnfields_slot (type, complete_ctor_identifier); - + tree fns = get_class_binding (type, complete_ctor_identifier); bool all_trivial = true; for (ovl_iterator oi (fns); oi; ++oi) Index: class.c === --- class.c (revision 251779) +++ class.c (working copy) @@ -2745,7 +2745,7 @@ get_basefndecls (tree name, tree t, vec< bool found_decls = false; /* Find virtual functions in T with the indicated NAME. */ - for (ovl_iterator iter (lookup_fnfields_slot (t, name)); iter; ++iter) + for (ovl_iterator iter (get_class_binding (t, name)); iter; ++iter) { tree method = *iter; @@ -5034,14 +5034,12 @@ bool vbase_has_user_provided_move_assign (tree type) { /* Does the type itself have a user-provided move assignment operator? */ - for (ovl_iterator iter (lookup_fnfields_slot_nolazy - (type, cp_assignment_operator_id (NOP_EXPR))); - iter; ++iter) -{ - tree fn = *iter; - if (move_fn_p (fn) && user_provided_p (fn)) + if (!CLASSTYPE_LAZY_MOVE_ASSIGN (type)) +for (ovl_iterator iter (get_class_binding_direct + (type, cp_assignment_operator_id (NOP_EXPR))); + iter; ++iter) + if (!DECL_ARTIFICIAL (*iter) && move_fn_p (*iter)) return true; -} /* Do any of its bases? 
*/ tree binfo = TYPE_BINFO (type); @@ -5180,13 +5178,12 @@ classtype_has_move_assign_or_move_ctor_p && !CLASSTYPE_LAZY_MOVE_ASSIGN (t))); if (!CLASSTYPE_LAZY_MOVE_CTOR (t)) -for (ovl_iterator iter (lookup_fnfields_slot_nolazy (t, ctor_identifier)); - iter; ++iter) +for (ovl_iterator iter (CLASSTYPE_CONSTRUCTORS (t)); iter; ++iter) if ((!user_p || !DECL_ARTIFICIAL (*iter)) && move_fn_p (*iter)) return true; if (!CLASSTYPE_LAZY_MOVE_ASSIGN (t)) -for (ovl_iterator iter (lookup_fnfields_slot_nolazy +for (ovl_iterator iter (get_class_binding_direct (t, cp_assignment_operator_id (NOP_EXPR))); iter; ++iter) if ((!user_p || !DECL_ARTIFICIAL (*iter)) && move_fn_p (*iter)) @@ -5220,8 +5217,7 @@ type_build_ctor_call (tree t) return false; /* A user-declared constructor might be private, and a constructor might be trivial but dele
[Ada] Extension of 'Image in Ada 2020
Refactor of all 'Image attributes for better error diagnostics and clarity. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Justin Squirek * exp_imgv.adb (Expand_Image_Attribute), (Expand_Wide_Image_Attribute), (Expand_Wide_Wide_Image_Attribute): Added case to handle new-style 'Image expansion (Rewrite_Object_Image): Moved from exp_attr.adb * exp_attr.adb (Expand_N_Attribute_Reference): Modified Image attribute cases so that the relevant subprograms in exp_imgv.adb handle all expansion. (Rewrite_Object_Reference_Image): Moved to exp_imgv.adb * sem_attr.adb (Analyze_Attribute): Modified Image attribute cases to call common function Analyze_Image_Attribute. (Analyze_Image_Attribute): Created as a common path for all image attributes (Check_Object_Reference_Image): Removed * sem_util.ads, sem_util.adb (Is_Image_Applied_To_Object): Removed and refactored into Is_Object_Image (Is_Object_Image): Created as a replacement for Is_Image_Applied_To_Object Index: exp_imgv.adb === --- exp_imgv.adb(revision 251753) +++ exp_imgv.adb(working copy) @@ -36,6 +36,7 @@ with Rtsfind; use Rtsfind; with Sem_Aux; use Sem_Aux; with Sem_Res; use Sem_Res; +with Sem_Util; use Sem_Util; with Sinfo;use Sinfo; with Snames; use Snames; with Stand;use Stand; @@ -52,6 +53,17 @@ -- Ordinary_Fixed_Point_Type with a small that is a negative power of ten. -- Shouldn't this be in einfo.adb or sem_aux.adb??? + procedure Rewrite_Object_Image + (N : Node_Id; + Pref : Entity_Id; + Attr_Name : Name_Id; + Str_Typ : Entity_Id); + -- AI12-00124: Rewrite attribute 'Image when it is applied to an object + -- reference as an attribute applied to a type. N denotes the node to be + -- rewritten, Pref denotes the prefix of the 'Image attribute, and Name + -- and Str_Typ specify which specific string type and 'Image attribute to + -- apply (e.g. Name_Wide_Image and Standard_Wide_String). 
+ -- Build_Enumeration_Image_Tables -- @@ -254,10 +266,10 @@ Loc : constant Source_Ptr := Sloc (N); Exprs : constant List_Id:= Expressions (N); Pref : constant Node_Id:= Prefix (N); - Ptyp : constant Entity_Id := Entity (Pref); - Rtyp : constant Entity_Id := Root_Type (Ptyp); Expr : constant Node_Id:= Relocate_Node (First (Exprs)); Imid : RE_Id; + Ptyp : Entity_Id; + Rtyp : Entity_Id; Tent : Entity_Id; Ttyp : Entity_Id; Proc_Ent : Entity_Id; @@ -273,6 +285,14 @@ Pnn : constant Entity_Id := Make_Temporary (Loc, 'P'); begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image (N, Pref, Name_Image, Standard_String); + return; + end if; + + Ptyp := Entity (Pref); + Rtyp := Root_Type (Ptyp); + -- Build declarations of Snn and Pnn to be inserted Ins_List := New_List ( @@ -791,11 +811,19 @@ procedure Expand_Wide_Image_Attribute (N : Node_Id) is Loc : constant Source_Ptr := Sloc (N); - Rtyp : constant Entity_Id := Root_Type (Entity (Prefix (N))); - Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); - Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Pref : constant Entity_Id := Prefix (N); + Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); + Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Rtyp : Entity_Id; begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image (N, Pref, Name_Wide_Image, Standard_Wide_String); + return; + end if; + + Rtyp := Root_Type (Entity (Pref)); + Insert_Actions (N, New_List ( -- Rnn : Wide_String (1 .. 
base_typ'Width); @@ -882,12 +910,20 @@ procedure Expand_Wide_Wide_Image_Attribute (N : Node_Id) is Loc : constant Source_Ptr := Sloc (N); - Rtyp : constant Entity_Id := Root_Type (Entity (Prefix (N))); + Pref : constant Entity_Id := Prefix (N); + Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); + Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + Rtyp : Entity_Id; - Rnn : constant Entity_Id := Make_Temporary (Loc, 'S'); - Lnn : constant Entity_Id := Make_Temporary (Loc, 'P'); + begin + if Is_Object_Image (Pref) then + Rewrite_Object_Image + (N, Pref, Name_Wide_Wide_Image, Standard_Wide_Wide_String); + return; + end if; - begin + Rtyp := Root_Type (Entity (Pref)); + Insert_Actions (N, New_List ( -- Rnn : Wide_Wide_String (1 .. rt'Wide_Wide_Width); @@ -1373,4 +1409,23 @@ and then Ur
[AArch64] Merge stores of D register values of different modes
Hi all,

This patch merges loads and stores from D-registers that are of different modes. Code like this:

    typedef int __attribute__ ((vector_size (8))) vec;
    struct pair { vec v; double d; };

    void assign (struct pair *p, vec v)
    {
      p->v = v;
      p->d = 1.0;
    }

now generates a stp instruction, whereas previously it generated two `str` instructions. Likewise for loads.

I have taken the opportunity to merge some of the patterns into a single pattern. Previously, we had different patterns for DI, DF, SI, SF modes. The patch uses the new iterators to reduce these to two patterns.

This patch also merges stores of double zero values with long integer values:

    struct pair { long long l; double d; };

    void foo (struct pair *p)
    {
      p->l = 10;
      p->d = 0.0;
    }

now generates a single store-pair instruction rather than two `str` instructions.

Bootstrap and testsuite run OK. OK for trunk?

Jackson

gcc/

2017-07-21  Jackson Woodruff

        * config/aarch64/aarch64.md: New patterns to generate stp and ldp.
        * config/aarch64/aarch64-ldpstp.md: Modified peephole for different
        mode ldpstp and added peephole for merging zero stores. Likewise
        for loads.
        * config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp): Added
        size check.
        (aarch64_gen_store_pair): Rename calls to match new patterns.
        (aarch64_gen_load_pair): Rename calls to match new patterns.
        * config/aarch64/aarch64-simd.md (store_pair): Updated pattern to
        match two modes.
        (store_pair_sw, store_pair_dw): New patterns to generate stp for
        single words and double words.
        (load_pair_sw, load_pair_dw): Likewise.
        (store_pair_sf, store_pair_df, store_pair_si, store_pair_di):
        Removed.
        (load_pair_sf, load_pair_df, load_pair_si, load_pair_di): Removed.
        * config/aarch64/iterators.md: New mode iterators for types in d
        registers and duplicate DX and SX modes. New iterator for DI, DF,
        SI, SF.
        * config/aarch64/predicates.md (aarch64_reg_zero_or_fp_zero): New.

gcc/testsuite/

2017-07-21  Jackson Woodruff

        * gcc.target/aarch64/ldp_stp_6.c: New.
* gcc.target/aarch64/ldp_stp_7.c: New. * gcc.target/aarch64/ldp_stp_8.c: New. diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md index e8dda42c2dd1e30c4607c67a2156ff7813bd89ea..14e860d258e548d4118d957675f8bdbb74615337 100644 --- a/gcc/config/aarch64/aarch64-ldpstp.md +++ b/gcc/config/aarch64/aarch64-ldpstp.md @@ -99,10 +99,10 @@ }) (define_peephole2 - [(set (match_operand:VD 0 "register_operand" "") - (match_operand:VD 1 "aarch64_mem_pair_operand" "")) - (set (match_operand:VD 2 "register_operand" "") - (match_operand:VD 3 "memory_operand" ""))] + [(set (match_operand:DREG 0 "register_operand" "") + (match_operand:DREG 1 "aarch64_mem_pair_operand" "")) + (set (match_operand:DREG2 2 "register_operand" "") + (match_operand:DREG2 3 "memory_operand" ""))] "aarch64_operands_ok_for_ldpstp (operands, true, mode)" [(parallel [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (match_dup 3))])] @@ -119,11 +119,12 @@ }) (define_peephole2 - [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "") - (match_operand:VD 1 "register_operand" "")) - (set (match_operand:VD 2 "memory_operand" "") - (match_operand:VD 3 "register_operand" ""))] - "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, mode)" + [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "") + (match_operand:DREG 1 "register_operand" "")) + (set (match_operand:DREG2 2 "memory_operand" "") + (match_operand:DREG2 3 "register_operand" ""))] + "TARGET_SIMD + && aarch64_operands_ok_for_ldpstp (operands, false, mode)" [(parallel [(set (match_dup 0) (match_dup 1)) (set (match_dup 2) (match_dup 3))])] { @@ -138,7 +139,6 @@ } }) - ;; Handle sign/zero extended consecutive load/store. (define_peephole2 @@ -181,6 +181,30 @@ } }) +;; Handle storing of a floating point zero. +;; We can match modes that won't work for a stp instruction +;; as aarch64_operands_ok_for_ldpstp checks that the modes are +;; compatible. 
+(define_peephole2 + [(set (match_operand:DSX 0 "aarch64_mem_pair_operand" "") + (match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" "")) + (set (match_operand: 2 "memory_operand" "") + (match_operand: 3 "aarch64_reg_zero_or_fp_zero" ""))] + "aarch64_operands_ok_for_ldpstp (operands, false, DImode)" + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +{ + rtx base, offset_1, offset_2; + + extract_base_offset_in_addr (operands[0], &base, &offset_1); + extract_base_offset_in_addr (operands[2], &base, &offset_2); + if (INTVAL (offset_1) > INT
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Michael Collison writes: > Richard Sandiford do you have any objections to the patch as it stands? > It doesn't appear as if anything is going to change in the mid-end > anytime soon. I think one of the suggestions was to do it in expand, taking advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would be like the current FMA_EXPR handling in expand_expr_real_2. I know there was talk about cleaner approaches, but at least doing the above seems cleaner than doing in the backend. It should also be a nicely-contained piece of work. Thanks, Richard > -Original Message- > From: Richard Sandiford [mailto:richard.sandif...@linaro.org] > Sent: Tuesday, August 22, 2017 9:11 AM > To: Richard Biener > Cc: Richard Kenner ; Michael Collison > ; GCC Patches ; nd > ; Andrew Pinski > Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts > > Richard Biener writes: >> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford wrote: >Richard Biener writes: >> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >> wrote: Correct. It is truncated for integer shift, but not simd shift instructions. We generate a pattern in the split that only >generates the integer shift instructions. >>> >>> That's unfortunate, because it would be nice to do this in >simplify_rtx, >>> since it's machine-independent, but that has to be conditioned on >>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >> >> SHIFT_COUNT_TRUNCATED should go ... you should express this in the >> patterns, like for example with >> >> (define_insn ashlSI3 >> [(set (match_operand 0 "") >> (ashl:SI (match_operand ... ) >> (subreg:QI (match_operand:SI ...)))] >> >> or an explicit and:SI and combine / simplify_rtx should apply the >magic >> optimization we expect. 
> >The problem with the explicit AND is that you'd end up with either >an AND of two constants for constant shifts, or with two separate >patterns, one for constant shifts and one for variable shifts. (And >the problem in theory with two patterns is that it reduces the RA's >freedom, although in practice I guess we'd always want a constant >shift where possible for cost reasons, and so the RA would never >need to replace pseudos with constants itself.) > >I think all useful instances of this optimisation will be exposed by >the gimple optimisers, so maybe expand could to do it based on >TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >the rtx code and it does take the mode into account. Sure, that could work as well and also take into account range info. But we'd then need named expanders and the result would still have the explicit and or need to be an unspec or a different RTL operation. >>> >>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>> target-dependent rather than undefined behaviour, so it's OK for a >>> target to use shift codes with out-of-range values. >> >> Hmm, but that means simplify-rtx can't do anything with them because >> we need to preserve target dependent behavior. > > Yeah, it needs to punt. In practice that shouldn't matter much. > >> I think the RTL IL should be always well-defined and its semantics >> shouldn't have any target dependences (ideally, and if, then they >> should be well specified via extra target hooks/macros). > > That would be nice :-) I think the problem has traditionally been that >> shifts can be used in quite a few define_insn patterns besides those >> for shift instructions. So if your target defines shifts to have >> 256-bit precision (say) then you need to make sure that every >> define_insn with a shift rtx will honour that. > > It's more natural for target guarantees to apply to instructions than to >> rtx codes. 
> >>> And >>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how >>> the normal shift optabs behave, so I don't think we'd need new optabs >>> or new unspecs. >>> >>> E.g. it already works this way when expanding double-word shifts, >>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's >>> possible to use a shorter sequence if you know that the shift optab >>> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED >>> isn't defined. >> >> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK >> applies to the instructions generated by the named shift patterns but >> _not_ general shift RTXen. But the generated pattern contains shift >> RTXen and how can we figure whether they were generated by the named >> expanders or by other means? Don't define_expand also serve as >> define_insn for things like combine? > > Yeah, you can't (and aren't supposed to
[Ada] Derived iterable types with noniterable parent
This patch fixes a bug in which if a derived type has a Default_Iterator specified, and the parent type does not, then a "for ... of" loop causes the compiler to crash. No small test case available. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Bob Duff * exp_ch5.adb (Get_Default_Iterator): Replace "Assert(False)" with "return Iter", because if an iterable type is derived from a noniterable one, then we won't find an overriding or inherited default iterator. Index: exp_ch5.adb === --- exp_ch5.adb (revision 251767) +++ exp_ch5.adb (working copy) @@ -3934,9 +3934,9 @@ function Get_Default_Iterator (T : Entity_Id) return Entity_Id; --- If the container is a derived type, the aspect holds the parent --- operation. The required one is a primitive of the derived type --- and is either inherited or overridden. Also sets Container_Arg. +-- Return the default iterator for a specific type. If the type is +-- derived, we return the inherited or overridden one if +-- appropriate. -- -- Get_Default_Iterator -- @@ -3953,11 +3953,11 @@ begin Container_Arg := New_Copy_Tree (Container); - -- A previous version of GNAT allowed indexing aspects to - -- be redefined on derived container types, while the - -- default iterator was inherited from the parent type. - -- This non-standard extension is preserved temporarily for - -- use by the modelling project under debug flag d.X. + -- A previous version of GNAT allowed indexing aspects to be + -- redefined on derived container types, while the default + -- iterator was inherited from the parent type. This + -- nonstandard extension is preserved for use by the + -- modelling project under debug flag -gnatd.X. if Debug_Flag_Dot_XX then if Base_Type (Etype (Container)) /= @@ -3995,9 +3995,11 @@ Next_Elmt (Prim); end loop; - -- Default iterator must exist + -- If we didn't find it, then our parent type is not + -- iterable, so we return the Default_Iterator aspect of + -- this type. 
- pragma Assert (False); + return Iter; -- Otherwise not a derived type
[Ada] Missing finalization of cursor in "of" iterator loop
This patch modifies the finalization machinery to ensure that the cursor of an "of" iterator loop is properly finalized at the end of the loop. Previously it was incorrectly assumed that such a cursor will never need finalization actions.

-- Source --

--  leak.adb

pragma Warnings (Off);

with Ada.Unchecked_Deallocation;
with Ada.Finalization;
with Ada.Iterator_Interfaces;
with Ada.Text_IO; use Ada.Text_IO;

procedure Leak is
   type El is tagged null record;

   type Integer_Access is access all Integer;

   procedure Unchecked_Free is new Ada.Unchecked_Deallocation
     (Integer, Integer_Access);

   type Cursor is new Ada.Finalization.Controlled with record
      Count : Integer_Access := new Integer'(1);
   end record;

   overriding procedure Adjust   (C : in out Cursor);
   overriding procedure Finalize (C : in out Cursor);

   overriding procedure Adjust (C : in out Cursor) is
   begin
      C.Count.all := C.Count.all + 1;
      Put_Line ("Adjust Cursor. Count = " & C.Count.all'Img);
   end Adjust;

   overriding procedure Finalize (C : in out Cursor) is
   begin
      C.Count.all := C.Count.all - 1;
      Put_Line ("Finalize Cursor. Count = " & C.Count.all'Img);

      if C.Count.all = 0 then
         Unchecked_Free (C.Count);
      end if;
   end Finalize;

   function Has_Element (C : Cursor) return Boolean is (False);

   package Child is
      package Iterators is new Ada.Iterator_Interfaces
        (Cursor => Cursor, Has_Element => Has_Element);

      type Iterator is new Ada.Finalization.Controlled
        and Iterators.Forward_Iterator with record
         Count : Integer_Access := new Integer'(1);
      end record;

      overriding function First (I : Iterator) return Cursor is
        (Ada.Finalization.Controlled with others => <>);

      overriding function Next (I : Iterator; C : Cursor) return Cursor is
        (Ada.Finalization.Controlled with others => <>);

      overriding procedure Adjust (I : in out Iterator);
   end Child;

   package body Child is
      overriding procedure Adjust (I : in out Iterator) is
      begin
         I.Count.all := I.Count.all + 1;
         Put_Line ("Adjust Iterator. Count = " & I.Count.all'Img);
      end Adjust;

      overriding procedure Finalize (I : in out Iterator) is
      begin
         I.Count.all := I.Count.all - 1;
         Put_Line ("Finalize Iterator. Count = " & I.Count.all'Img);

         if I.Count.all = 0 then
            Unchecked_Free (I.Count);
         end if;
      end Finalize;
   end Child;

   type Iterable is tagged null record with
     Default_Iterator  => Iterate,
     Iterator_Element  => El'Class,
     Constant_Indexing => El_At;

   function Iterate (O : Iterable)
     return Child.Iterators.Forward_Iterator'Class is
       (Child.Iterator'(Ada.Finalization.Controlled with others => <>));

   function El_At (Self : Iterable; Pos : Cursor'Class) return El'Class is
     (El'(others => <>));

   Seq : Iterable;

begin
   Put_Line ("START");

   for V of Seq loop
      null;
   end loop;

   Put_Line ("END");
end Leak;

-- Compilation and output --

$ gnatmake -q leak.adb -largs -lgmem
$ ./leak
$ gnatmem ./leak > leaks.txt
$ grep -c "Number of non freed allocations" leaks.txt
START
Adjust Iterator. Count = 2
Finalize Iterator. Count = 1
Adjust Cursor. Count = 2
Finalize Cursor. Count = 1
Adjust Cursor. Count = 2
Finalize Cursor. Count = 1
Finalize Cursor. Count = 0
Finalize Iterator. Count = 0
END
0

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Hristian Kirtchev

        * einfo.adb (Status_Flag_Or_Transient_Decl): The attribute is now
        allowed on loop parameters.
        (Set_Status_Flag_Or_Transient_Decl): The attribute is now allowed
        on loop parameters.
        (Write_Field15_Name): Update the output for
        Status_Flag_Or_Transient_Decl.
        * einfo.ads: Attribute Status_Flag_Or_Transient_Decl now applies
        to loop parameters. Update the documentation of the attribute and
        the E_Loop_Parameter entity.
        * exp_ch7.adb (Process_Declarations): Remove the bogus guard which
        assumes that cursors can never be controlled.
        * exp_util.adb (Requires_Cleanup_Actions): Remove the bogus guard
        which assumes that cursors can never be controlled.
Index: exp_ch7.adb === --- exp_ch7.adb (revision 251753) +++ exp_ch7.adb (working copy) @@ -2100,15 +2100,6 @@ elsif Is_Ignored_Ghost_Entity (Obj_Id) then null; - -- The expansion of iterator loops generates an object - -- declaration where the Ekind is explicitly set to loop - -- parameter. This is to ensure that the loop parameter behaves - -- as a constant from user code point of view. Such object are
[Ada] Better warning on access to string at negative or null index
The warning issued when accessing a string at a negative or null index was misleading, suggesting to use S'First - 1 as correct index, which it is obviously not. Add a detection for negative or null index when accessing a standard string, so that an appropriate warning is issued. Also add a corresponding warning for other arrays, which is currently not triggered by this detection mechanism under -gnatww.

The following compilation shows the new warning:

$ gcc -c cstr.adb
1. procedure Cstr (X : in out String; J : Integer := -1) is
2. begin
3.    X(0 .. J) := "";
      |
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
4.    X(0) := 'c';
      |
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
5.    X(0 .. 4) := "hello";
      13
   >>> warning: string index should be positive
   >>> warning: static expression fails Constraint_Check
   >>> warning: index for "X" may assume lower bound of 1
   >>> warning: suggested replacement: "X'First + 3"
6. end Cstr;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Yannick Moy

        * sem_warn.adb (Warn_On_Suspicious_Index): Improve warning when
        the literal index used to access a string is null or negative.

Index: sem_warn.adb
===
--- sem_warn.adb (revision 251772)
+++ sem_warn.adb (working copy)
@@ -46,6 +46,7 @@
 with Snames;  use Snames;
 with Stand;   use Stand;
 with Stringt; use Stringt;
+with Tbuild;  use Tbuild;
 with Uintp;   use Uintp;
 
 package body Sem_Warn is
@@ -3878,6 +3879,13 @@
       procedure Warn1;
       --  Generate first warning line
 
+      procedure Warn_On_Index_Below_Lower_Bound;
+      --  Generate a warning on indexing the array with a literal value
+      --  below the lower bound of the index type.
+ + procedure Warn_On_Literal_Index; + -- Generate a warning on indexing the array with a literal value + -- -- Length_Reference -- -- @@ -3903,21 +3911,31 @@ ("?w?index for& may assume lower bound of^", X, Ent); end Warn1; - -- Start of processing for Test_Suspicious_Index + - + -- Warn_On_Index_Below_Lower_Bound -- + - - begin - -- Nothing to do if subscript does not come from source (we don't - -- want to give garbage warnings on compiler expanded code, e.g. the - -- loops generated for slice assignments. Such junk warnings would - -- be placed on source constructs with no subscript in sight). + procedure Warn_On_Index_Below_Lower_Bound is + begin +if Is_Standard_String_Type (Typ) then + Discard_Node + (Compile_Time_Constraint_Error + (N => X, +Msg => "?w?string index should be positive")); +else + Discard_Node + (Compile_Time_Constraint_Error + (N => X, +Msg => "?w?index out of the allowed range")); +end if; + end Warn_On_Index_Below_Lower_Bound; - if not Comes_From_Source (Original_Node (X)) then -return; - end if; + --- + -- Warn_On_Literal_Index -- + --- - -- Case where subscript is a constant integer - - if Nkind (X) = N_Integer_Literal then + procedure Warn_On_Literal_Index is + begin Warn1; -- Case where original form of subscript is an integer literal @@ -4037,7 +4055,35 @@ Error_Msg_FE -- CODEFIX ("\?w?suggested replacement: `&~`", Original_Node (X), Ent); end if; + end Warn_On_Literal_Index; + -- Start of processing for Test_Suspicious_Index + + begin + -- Nothing to do if subscript does not come from source (we don't + -- want to give garbage warnings on compiler expanded code, e.g. the + -- loops generated for slice assignments. Such junk warnings would + -- be placed on source constructs with no subscript in sight). + + if not Comes_From_Source (Original_Node (X)) then +return; + end if; + + -- Case where subscript is a constant integer + + if Nkind (X) = N_Integer_Literal then + +-- Case where subscript is lower than the lowest possible bound. 
+-- This might be the case for example when programmers try to +-- access a string at index 0, as they are used to in other +-- programming
[Ada] Improve error message when function is used in a call statement
A typical error for new users of Ada is to call functions in a call statement. Improve the error message for these users, to better indicate what the error is in that case.

The following compilation shows the new message:

$ gcc -c main.adb
1. procedure Main is
2.    function Lol return Integer is (0);
3. begin
4.    Lol;
      |
   >>> cannot use call to function "Lol" as a statement
   >>> return value of a function call cannot be ignored
5. end Main;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Yannick Moy

        * sem_res.adb (Resolve): Update message for function call as
        statement.

Index: sem_res.adb
===
--- sem_res.adb (revision 251755)
+++ sem_res.adb (working copy)
@@ -2533,8 +2533,11 @@
           and then Ekind (Entity (Name (N))) = E_Function
          then
             Error_Msg_NE
-              ("cannot use function & in a procedure call",
+              ("cannot use call to function & as a statement",
                Name (N), Entity (Name (N)));
+            Error_Msg_N
+              ("\return value of a function call cannot be ignored",
+               Name (N));
 
          -- Otherwise give general message (not clear what cases this
          -- covers, but no harm in providing for them).
[Ada] No_Return procedures in renaming declarations.
This patch implements legality rule in 6.5.1 (7/2): if a renaming as body completes a nonreturning procedure declaration, the renamed procedure must be nonreturning as well. Previously GNAT only produced a warning in such cases. Tested in ACATS test B651002. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch6.adb (Check_Returns): Clean up warnings coming from generated bodies for renamings that are completions, when renamed procedure is No_Return. * sem_ch8.adb (Analyze_Subprogram_Renaming): Implement legality rule in 6.5.1 (7/2): if a renaming is a completion of a subprogram with No_Return, the renamed entity must be No_Return as well. Index: sem_ch6.adb === --- sem_ch6.adb (revision 251762) +++ sem_ch6.adb (working copy) @@ -6693,7 +6693,11 @@ Error_Msg_N ("implied return after this statement " & "would have raised Program_Error", Last_Stm); - else + + -- In normal compilation mode, do not warn on a generated + -- call (e.g. in the body of a renaming as completion). + + elsif Comes_From_Source (Last_Stm) then Error_Msg_N ("implied return after this statement " & "will raise Program_Error??", Last_Stm); Index: sem_ch8.adb === --- sem_ch8.adb (revision 251762) +++ sem_ch8.adb (working copy) @@ -2946,6 +2946,14 @@ Check_Fully_Conformant (New_S, Rename_Spec); Set_Public_Status (New_S); + if No_Return (Rename_Spec) +and then not No_Return (Entity (Nam)) + then +Error_Msg_N ("renaming completes a No_Return procedure", N); +Error_Msg_N + ("\renamed procedure must be nonreturning (RM 6.5.1 (7/2))", N); + end if; + -- The specification does not introduce new formals, but only -- repeats the formals of the original subprogram declaration. -- For cross-reference purposes, and for refactoring tools, we
[Ada] Crash on generic subprogram with aspect No_Return.
This patch fixes a compiler abort on a generic unit to which the aspect No_Return applies. Tested in ACATS 4.1D C651002. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * freeze.adb (Freeze_Entity): Do not generate a freeze node for a generic unit, even if it includes delayed aspect specifications. Freeze nodes for generic entities must never appear in the tree that reaches the back-end of the compiler. Index: freeze.adb === --- freeze.adb (revision 251765) +++ freeze.adb (working copy) @@ -5489,6 +5489,13 @@ then Explode_Initialization_Compound_Statement (E); end if; + +-- Do not generate a freeze node for a generic unit. + +if Is_Generic_Unit (E) then + Result := No_List; + goto Leave; +end if; end if; -- Case of a type or subtype being frozen
[Ada] Pragma No_Return on generic units
This patch ensures that if a pragma No_Return applies to a generic subprogram, all its instantiations are treated as No_Return subprograms as well. Tested in ACATS 4.1D C651001.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Ed Schonberg

        * sem_ch12.adb (Analyze_Subprogram_Instantiation): Propagate
        No_Return flag to instance if pragma applies to generic unit.
        This must be done explicitly because the pragma does not appear
        directly in the generic declaration (unlike the corresponding
        aspect specification).

Index: sem_ch12.adb
===
--- sem_ch12.adb (revision 251753)
+++ sem_ch12.adb (working copy)
@@ -5382,6 +5382,15 @@
       Set_Has_Pragma_Inline (Act_Decl_Id, Has_Pragma_Inline (Gen_Unit));
       Set_Has_Pragma_Inline (Anon_Id,     Has_Pragma_Inline (Gen_Unit));
 
+      -- Propagate No_Return if pragma applied to generic unit. This must
+      -- be done explicitly because pragma does not appear in generic
+      -- declaration (unlike the aspect case).
+
+      if No_Return (Gen_Unit) then
+         Set_No_Return (Act_Decl_Id);
+         Set_No_Return (Anon_Id);
+      end if;
+
       Set_Has_Pragma_Inline_Always
         (Act_Decl_Id, Has_Pragma_Inline_Always (Gen_Unit));
       Set_Has_Pragma_Inline_Always
[Ada] Inherited aspects that may be delayed in a parent type
This patch fixes an omission in the handling of delayed aspects on derived types. The type may inherit a representation aspect from its parent, but have no explicit aspect specifications. At the point it is frozen, the parent is frozen as well and its explicit aspects have been analyzed. The inherited aspects of the derived type can then be captured properly. Tested in ACATS test C35A001. Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * freeze.adb (Freeze_Entity): For a derived type that has no explicit delayed aspects but may inherit delayed aspects from its parent type, analyze aspect at freeze point for proper capture of an inherited aspect. Index: freeze.adb === --- freeze.adb (revision 251760) +++ freeze.adb (working copy) @@ -5266,8 +5266,12 @@ -- pragma or attribute definition clause in the tree at this point. We -- also analyze the aspect specification node at the freeze point when -- the aspect doesn't correspond to pragma/attribute definition clause. + -- In addition, a derived type may have inherited aspects that were + -- delayed in the parent, so these must also be captured now. - if Has_Delayed_Aspects (E) then + if Has_Delayed_Aspects (E) + or else May_Inherit_Delayed_Rep_Aspects (E) + then Analyze_Aspects_At_Freeze_Point (E); end if;
[Ada] Restore original implementation of internal Table package
This wasn't explicitly mentioned but the previous changes also replaced the internal Table package used in the compiler by GNAT.Tables, resulting in a large performance hit for the compiler because the memory management scheme of the latter is very inefficient. This restores the original implementation, which brings about a 10% speedup in clock time on a typical compilation at -O0.

In addition, also use Table instead of GNAT.Table consistently in compiler units: most compiler units instantiate the Table package when they need a resizable array, but a few of them were instantiating GNAT.Table instead, which is less efficient and creates an additional dependency on the runtime. This changes these units to use the Table package, which is immediate since the interface is (essentially) the same. No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-09-06  Eric Botcazou

        * table.ads, table.adb: Restore original implementation.
        * namet.h (Names_Ptr): Adjust back.
        (Name_Chars_Ptr): Likewise.
        * uintp.h (Uints_Ptr): Likewise.
        (Udigits_Ptr): Likewise.
        * g-table.ads: Remove pragma Compiler_Unit_Warning.
        * par_sco.adb: Do not with GNAT.Table and use Table consistently.
        * scos.ads: Replace GNAT.Table with Table and adjust
        instantiations.
        * spark_xrefs.ads: Likewise.
        * scos.h: Undo latest changes.
        * gcc-interface/trans.c (gigi): Likewise.

Index: g-table.ads
===
--- g-table.ads (revision 251753)
+++ g-table.ads (working copy)
@@ -41,8 +41,6 @@
 --    GNAT.Table
 --    Table (the compiler unit)
 
-pragma Compiler_Unit_Warning;
-
 with GNAT.Dynamic_Tables;
 
 generic

Index: namet.h
===
--- namet.h (revision 251753)
+++ namet.h (working copy)
@@ -45,11 +45,11 @@
 };
 
 /* Pointer to names table vector.  */
-#define Names_Ptr namet__name_entries__tab__the_instance
+#define Names_Ptr namet__name_entries__table
 extern struct Name_Entry *Names_Ptr;
 
 /* Pointer to name characters table.
*/ -#define Name_Chars_Ptr namet__name_chars__tab__the_instance +#define Name_Chars_Ptr namet__name_chars__table extern char *Name_Chars_Ptr; /* This is Hostparm.Max_Line_Length. */ Index: par_sco.adb === --- par_sco.adb (revision 251753) +++ par_sco.adb (working copy) @@ -44,7 +44,6 @@ with GNAT.HTable; use GNAT.HTable; with GNAT.Heap_Sort_G; -with GNAT.Table; package body Par_SCO is @@ -76,12 +75,13 @@ -- running some steps multiple times (the second pass has to be started -- from multiple places). - package SCO_Raw_Table is new GNAT.Table + package SCO_Raw_Table is new Table.Table (Table_Component_Type => SCO_Table_Entry, Table_Index_Type => Nat, Table_Low_Bound => 1, Table_Initial=> 500, - Table_Increment => 300); + Table_Increment => 300, + Table_Name => "Raw_Table"); --- -- Unit Number Table -- Index: scos.ads === --- scos.ads(revision 251753) +++ scos.ads(working copy) @@ -6,7 +6,7 @@ -- -- -- S p e c -- -- -- --- Copyright (C) 2009-2016, Free Software Foundation, Inc. -- +-- Copyright (C) 2009-2017, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -29,10 +29,9 @@ -- is used in the ALI file. with Namet; use Namet; +with Table; with Types; use Types; -with GNAT.Table; - package SCOs is -- SCO information can exist in one of two forms. 
In the ALI file, it is @@ -383,12 +382,13 @@ -- For the SCO for a pragma/aspect, gives the pragma/apsect name end record; - package SCO_Table is new GNAT.Table ( + package SCO_Table is new Table.Table ( Table_Component_Type => SCO_Table_Entry, Table_Index_Type => Nat, Table_Low_Bound => 1, Table_Initial=> 500, - Table_Increment => 300); + Table_Increment => 300, + Table_Name => "Table"); Is_Decision : constant array (Character) of Boolean := ('E' | 'G' | 'I' | 'P' | 'a' | 'A' | 'W' | 'X' => True, @@ -530,12 +530,13 @@ end record; - package SCO_Unit_Table is new GNAT.Table ( + package SCO_Unit_Table is new Table.Table ( Table_Component_Type => SCO_Unit_Table_Entry, Table_I
[Ada] Primitive functions that require one formal and return an array
Primitive functions whose first formal is a controlling parameter, whose other formals have defaults and whose result is an array type can lead to ambiguities when the result of such a call is the prefix of an indexed component. The interpretation that analyzes Obj.F (X, Y) into F (Obj)(X, Y) is only legal if the first parameter of F is a controlling parameter. This additional guard was previously missing from the predicate, leading to malformed trees and a compiler crash. Compiling huckel.adb must yield: huckel.adb:135:27: expected type "Real" defined at huckel.ads:9 huckel.adb:135:27: found type "Ada.Numerics.Generic_Real_Arrays.Real_Matrix" from instance at huckel.ads:16 -- Huckel package -- This is a translation from Fortran II code documented in the -- book "Computing Methods for Quantum Organic Chemistry" with Ada.Numerics.Generic_Real_Arrays; package Huckel is type Real is digits 15; type Molecule (Atoms : Positive) is tagged private; function Input return Molecule; procedure Compute_Energies(Item : in out Molecule); procedure Output(Item : in Molecule); private package Matrices is new Ada.Numerics.Generic_Real_Arrays(Real); use Matrices; type Molecule (Atoms : Positive) is tagged record Orbitals: Positive; Atomic_Matrix : Real_Matrix(1..Atoms, 1..Atoms); Atomic_Diagonal : Real_Vector(1..Atoms); Unit_Matrix : Real_Matrix(1..Atoms, 1..Atoms); Bond_Orders : Real_Matrix(1..Atoms, 1..Atoms); Free_Valences : Real_vector(1..Atoms); end record; end Huckel; --- with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; with Ada.Text_IO; with Ada.Numerics.Generic_Elementary_Functions; package body Huckel is package Real_IO is new Ada.Text_IO.Float_IO(Real); use Real_Io; --- -- Input -- --- function Input return Molecule is Num_Atoms : Positive; Num_Orbs : Positive; begin Get(Item => Num_Atoms); Get(Item => Num_Orbs); declare Temp : Molecule(Atoms => Num_Atoms); begin Temp.Orbitals := Num_Orbs; -- Read the atomic matrix into the upper 
semi-matrix of Atomic_Matrix for I in 1..Num_Atoms loop for J in 1..I loop Get(Item => Temp.Atomic_Matrix(J, I)); -- Print the input matrix in lower semi-matrix format Put(Item => Temp.Atomic_Matrix(J,I), Aft => 0, Fore => 2, Exp => 0); -- Make all bonding terms negative Temp.Atomic_Matrix(I, J) := -Temp.Atomic_Matrix(I,J); end loop; New_Line; end loop; return Temp; end; end Input; -- Modify -- procedure Modify(Item : in out Molecule) is Num_Mods : natural; I, J : Positive; Modification : Real; begin Get(Item => Num_Mods); if Num_Mods > 0 then New_Line(3); Put_Line("Modifications"); for Num in 1..Num_Mods loop Get(Item => I); Get(Item => J); Get(Item => Modification); Put(Item => I, Width => 3); Put(Item => J, Width => 6); Put(Item => Modification, Aft => 3, Fore => 7, Exp => 0); New_Line; if I = J then Item.Atomic_Diagonal(J) := Modification; elsif I < J then Item.Atomic_Matrix(I, J) := Modification; else Item.Atomic_Matrix(J, I) := Modification; end if; end loop; end if; end Modify; -- -- Pahy -- -- procedure Pahy(Item : in out Molecule) is begin for J in 1..Item.Atoms loop for I in 1..J loop Item.Atomic_Matrix(I, J) := Item.Atomic_Matrix(J, I); Item.Atomic_Diagonal(J) := Item.Atomic_Matrix(J,J); end loop; end loop; end Pahy; -- Scofi1 -- procedure Scofi1(Item : in out Molecule) is package elem_funcs is new Ada.Numerics.Generic_Elementary_Functions(real); use elem_funcs; Max : Real := 0.0; J_up : Natural; Aii : Real; Ajj : Real; Aod : Real; Asq : Real; Eps : constant Real := 1.0e-16; diffr : Real; sign : Real; tden : Real; Tank : Real; C: Real; S : Real; xj : Real; begin -- initialize unit matrix Item.Unit_Matrix := (Others => (Others => 0.0)); for I in 1..Item.Atoms loop Item.Unit_Matrix(I, I) := 1.0; end loop; for I in 2..Item.Atoms loop J_Up := I - 1; for J in 1..J_Up loop Aii := Item.Atomic_Diagonal(I); Ajj := Item.Atomic_Diagonal(J); Aod := Item.Atomic_Matrix(J, I); Asq := Aod * Aod; if Asq > Max then
Re: [PATCH] Factor out division by squares and remove division around comparisons (2/2)
Hi all,

A minor improvement came to mind while updating other parts of this patch. I've updated a testcase to make it clearer, and a condition now uses a call to is_division_by rather than manually checking those conditions.

Jackson

On 08/30/2017 05:32 PM, Jackson Woodruff wrote:

Hi all, I've attached a new version of the patch in response to a few of Wilco's comments in person. The end product of the pass is still the same, but I have fixed several bugs. Now tested independently of the other patches.

On 08/15/2017 03:07 PM, Richard Biener wrote: On Thu, Aug 10, 2017 at 4:10 PM, Jackson Woodruff wrote:

Hi all, The patch implements some of the division optimizations discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 . We now reassociate (as discussed in the bug report): x / (y * y) -> x * (1 / y) * (1 / y) if it is reasonable to do so. This is done with -funsafe-math-optimizations. Bootstrapped and regtested with part (1/2). OK for trunk?

I believe your enhancement shows the inherent weakness of CSE of reciprocals in that it works from the defs. It will handle x / (y * y) but not x / (y * y * y). I think a rewrite of this mini-pass is warranted.

I suspect that there might be more to gain by handling the case of x / (y * z) rather than the case of x / (y**n), but I agree that this pass could do more. Richard.

Jackson

gcc/ 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * tree-ssa-math-opts (is_division_by_square, is_square_of, insert_square_reciprocals): New. (insert_reciprocals): Change to insert reciprocals before a division by a square. (execute_cse_reciprocals_1): Change to consider division by a square.

gcc/testsuite 2017-08-03 Jackson Woodruff PR 71026/tree-optimization * gcc.dg/associate_division_1.c: New.

Thanks, Jackson. Updated ChangeLog:

gcc/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * tree-ssa-math-opts (is_division_by_square, is_square_of): New.
(insert_reciprocals): Change to insert reciprocals before a division by a square and to insert the square of a reciprocal. (execute_cse_reciprocals_1): Change to consider division by a square. (register_division_in): Add importance parameter. gcc/testsuite 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * gcc.dg/extract_recip_3.c: New. * gcc.dg/extract_recip_4.c: New. * gfortran.dg/extract_recip_1.f: New. diff --git a/gcc/testsuite/gcc.dg/extract_recip_3.c b/gcc/testsuite/gcc.dg/extract_recip_3.c new file mode 100644 index ..ad9f2dc36f1e695ceca1f50bc78f4ac4fbb2e787 --- /dev/null +++ b/gcc/testsuite/gcc.dg/extract_recip_3.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +float +extract_square (float *a, float *b, float x, float y) +{ + *a = 3 / (y * y); + *b = 5 / (y * y); + + return x / (y * y); +} + +/* Don't expect the 'powmult' (calculation of y * y) + to be deleted until a later pass, so look for one + more multiplication than strictly necessary. */ +float +extract_recip (float *a, float *b, float x, float y, float z) +{ + *a = 7 / y; + *b = x / (y * y); + + return z / y; +} + +/* 4 For the pointers to a, b, 4 multiplications in 'extract_square', + 4 multiplications in 'extract_recip' expected. */ +/* { dg-final { scan-tree-dump-times " \\* " 12 "optimized" } } */ + +/* 1 division in 'extract_square', 1 division in 'extract_recip'. */ +/* { dg-final { scan-tree-dump-times " / " 2 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/extract_recip_4.c b/gcc/testsuite/gcc.dg/extract_recip_4.c new file mode 100644 index ..83105c60ced5c2671f3793d76482c35502712a2c --- /dev/null +++ b/gcc/testsuite/gcc.dg/extract_recip_4.c @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-optimized" } */ + +/* Don't expect any of these divisions to be extracted. 
*/ +double f (double x, int p) +{ + if (p > 0) +{ + return 1.0/(x * x); +} + + if (p > -1) +{ + return x * x * x; +} + return 1.0 /(x); +} + +/* Expect a reciprocal to be extracted here. */ +double g (double *a, double x, double y) +{ + *a = 3 / y; + double k = x / (y * y); + + if (y * y == 2.0) +return k + 1 / y; + else +return k - 1 / y; +} + +/* Expect 2 divisions in 'f' and 1 in 'g'. */ +/* { dg-final { scan-tree-dump-times " / " 3 "optimized" } } */ +/* Expect 3 multiplications in 'f' and 4 in 'g'. Also + expect one for the point to a. */ +/* { dg-final { scan-tree-dump-times " \\* " 8 "optimized" } } */ diff --git a/gcc/testsuite/gfortran.dg/extract_recip_1.f b/gcc/testsuite/gfortran.dg/extract_recip_1.f new file mode 100644 index ..ecf05189773b6c2f46222857fd88fd010bfdf348 --- /dev/null ++
Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)
On 08/30/2017 01:46 PM, Richard Biener wrote: On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff wrote: On 08/29/2017 01:13 PM, Richard Biener wrote: On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff wrote:

Hi all, Apologies again to those CC'ed, who (again) received this twice. Joseph: Yes, you are correct. I misread the original thread, now fixed. Richard: I've moved the optimizations out of fold-const.c. One has been replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) I've deleted as it only introduced a new optimization when running with the flags '-O0 -funsafe-math-optimizations'.

Hmm, how did you verify that, that it only adds sth with -O0 -funsafe-math-optimizations?

By checking with various flags, although not exhaustively. I looked for reasons for the behavior in match.pd (explained below). I have also since discovered that the combinations of '-funsafe-math-optimizations -frounding-math' (at all levels) and '-fno-reciprocal-math -funsafe-math-optimizations' mean this pattern adds something.

Is that because in GIMPLE the reassoc pass should do this transform?

It's because the pattern that changes (X / C) -> X * (1 / C) is gated with O1: (for cst (REAL_CST COMPLEX_CST VECTOR_CST) (simplify (rdiv @0 cst@1) ->(if (optimize) -> (if (flag_reciprocal_math && !real_zerop (@1)) (with { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1); } (if (tem) (mult @0 { tem; } ))) (if (cst != COMPLEX_CST) (with { tree inverse = exact_inverse (type, @1); } (if (inverse) (mult @0 { inverse; } I've flagged the two lines that are particularly relevant to this.

So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y why's that in any way preferable? I suppose this is again to enable the recip pass to detect / y (as opposed to / (C * y))? What's the reason to believe that / y is more "frequent"?

Removing this pattern, as I would expect, means that the divisions in the above optimization (and the one further down) are not removed.
So then there is the question of edge cases. This pattern is (ignoring the second case) going to fail when const_binop returns null. Looking through that function says that it will fail (for reals) when: - Either argument is null (not the case) - The operation is not in the list (PLUS_EXPR, MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR) (again not the case) - We honor Signalling NaNs and one of the operands is a sNaN. - The operation is a division, and the second argument is zero and dividing by 0.0 raises an exception. - The result is infinity and neither of the operands were infinity and flag_trapping_math is set. - The result isn't exact and flag_rounding_math is set. For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these so that the pattern is never executed if one of these would be the case. Why not transform this directly to (x * (1/C)) / y then (and only then)? That makes it obvious not two divisions prevail. Done. That said, I'm questioning this canonicalization. I can come up with a testcase where it makes things worse: tem = x / (y * C); tem2 = z / (y * C); should generate rdivtmp = 1 / (y*C); tem = x *rdivtmp; tem2= z * rdivtmp; instead of rdivtmp = 1/y; tem = x * 1/C * rdivtmp; tem2 = z * 1/C * rdivtmp; Ideally we would be able to CSE that into rdivtmp = 1/y * 1/C; tem = x * rdivtmp; tem2 = z * rdivtmp; Although we currently do not. An equally (perhaps more?) problematic case is something like: tem = x / (y * C) tem2 = y * C Which becomes: tem = x * (1 / C) / y tem2 = y * C Instead of K = y * C tem = x / K tem2 = K Which ultimately requires context awareness to avoid. This does seem to be a general problem with a large number of match.pd patterns rather than anything specific to this one. For example, a similar example can be constructed for (say) (A / B) / C -> (A / (B * C)). 
The additional cases where this isn't converted to a multiplication by the reciprocal appear to be when -freciprocal-math is used, but we don't have -fno-rounding-math or -funsafe-math-optimizations.

On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)' is enabled and then the pattern is covered by that and (x * C +- y * C -> C * (x +- y)) (which is already present in match.pd). I have also updated the testcase for those optimizations to use 'O1' to avoid that case.

On 08/24/2017 10:06 PM, Jeff Law wrote: On 08/17/2017 03:55 AM, Wilco Dijkstra wrote: Richard Biener wrote: On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra wrote: Richard Biener wrote:

We also change the association of x / (y * C) -> (x / C) / y if C is a constant. Why's that profitable?

It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example. Also 1/y is now available to the reciprocal optimization, see https://gcc.g
[Ada] Spurious errors on derived untagged types with partial constraints
This patch fixes the handling of untagged discriminated derived types that constrain some parent discriminants and rename others. The compiler failed to handle a change of representation on the derived type, and generated faulty code for the initialization procedure of such a derived type.

Executing: --- gnatmake -q p p -- must yield: -- 1234 TRUE 20 discriminant rules!!

--- with Q; use Q; with Text_IO; use Text_IO; procedure P is procedure Inner (B : Base) is begin null; -- Put_Line (B.S); Put_Line (Integer'Image (B.I)); Put_Line (Boolean'Image (B.B)); Put_Line (Integer'Image (B.D)); Put_Line (B.S); end; D1 : Derived (True); begin D1.S := "discriminant rules!!"; Inner (Base (D1)); end; --- package Q is type Base (D : Positive; B : Boolean) is record I : Integer := 1234; S : String (1 .. D); -- := (1 .. D => 'Q'); end record; type Derived (B : Boolean) is new Base (D => 20, B => B); for Derived use record I at 0 range 0 .. 31; end record; Thing : Derived (False); end Q;

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * exp_ch4.adb (Handle_Changed_Representation): For an untagged derived type with a mixture of renamed and constrained parent discriminants, the constraint for the target must obtain the discriminant values from both the operand and from the stored constraint for it, given that the constrained discriminants are not visible in the object. * exp_ch5.adb (Make_Field_Assign): The type of the right-hand side may be derived from that of the left-hand side (as in the case of an assignment with a change of representation) so the discriminant to be used in the retrieval of the value of the component must be the entity in the type of the right-hand side.

Index: exp_ch5.adb === --- exp_ch5.adb (revision 251753) +++ exp_ch5.adb (working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1992-2016, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2017, Free Software Foundation, Inc.
-- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -1448,9 +1448,21 @@ U_U : Boolean := False) return Node_Id is A: Node_Id; +Disc : Entity_Id; Expr : Node_Id; begin + +-- The discriminant entity to be used in the retrieval below must +-- be one in the corresponding type, given that the assignment +-- may be between derived and parent types. + +if Is_Derived_Type (Etype (Rhs)) then + Disc := Find_Component (R_Typ, C); +else + Disc := C; +end if; + -- In the case of an Unchecked_Union, use the discriminant -- constraint value as on the right-hand side of the assignment. @@ -1463,7 +1475,7 @@ Expr := Make_Selected_Component (Loc, Prefix=> Duplicate_Subexpr (Rhs), - Selector_Name => New_Occurrence_Of (C, Loc)); + Selector_Name => New_Occurrence_Of (Disc, Loc)); end if; A := Index: exp_ch4.adb === --- exp_ch4.adb (revision 251758) +++ exp_ch4.adb (working copy) @@ -10627,7 +10627,6 @@ Temp : Entity_Id; Decl : Node_Id; Odef : Node_Id; - Disc : Node_Id; N_Ix : Node_Id; Cons : List_Id; @@ -10657,23 +10656,70 @@ if not Is_Constrained (Target_Type) then if Has_Discriminants (Operand_Type) then - Disc := First_Discriminant (Operand_Type); - if Disc /= First_Stored_Discriminant (Operand_Type) then - Disc := First_Stored_Discriminant (Operand_Type); - end if; + -- A change of representation can only apply to untagged + -- types. We need to build the constraint that applies to + -- the target type, using the constraints of the operand. + -- The analysis is complicated if there are both inherited + -- discriminants and constrained discriminants. + -- We iterate over the discriminants of the target, and + -- find the discriminant of the same name
[Ada] Minor cleanup in support machinery for inter-unit inlining
The inter-unit inlining done by the compiler requires dedicated machinery to deal with the public status of library-level entities, since it breaks the private/public semantic barrier of the language. This is a minor cleanup to this machinery, no functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * sem_ch7.adb (Has_Referencer): Move up and expand comment explaining the test used to detect inlining. Use same test in second occurrence. (Analyze_Package_Body_Helper): Minor formatting fixes.

Index: sem_ch7.adb === --- sem_ch7.adb (revision 251762) +++ sem_ch7.adb (working copy) @@ -392,6 +392,13 @@ -- An inlined subprogram body acts as a referencer + -- Note that we test Has_Pragma_Inline here in addition + -- to Is_Inlined. We are doing this for a client, since + -- we are computing which entities should be public, and + -- it is the client who will decide if actual inlining + -- should occur, so we need to catch all cases where the + -- subprogram may be inlined by the client. + if Is_Inlined (Decl_Id) or else Has_Pragma_Inline (Decl_Id) then @@ -413,18 +420,13 @@ else Decl_Id := Defining_Entity (Decl); - -- An inlined body acts as a referencer. Note that an - -- inlined subprogram remains Is_Public as gigi requires - -- the flag to be set. + -- An inlined body acts as a referencer, see above. Note + -- that an inlined subprogram remains Is_Public as gigi + -- requires the flag to be set. - -- Note that we test Has_Pragma_Inline here rather than - -- Is_Inlined. We are compiling this for a client, and - -- it is the client who will decide if actual inlining - -- should occur, so we need to assume that the procedure - -- could be inlined for the purpose of accessing global - -- entities.
- - if Has_Pragma_Inline (Decl_Id) then + if Is_Inlined (Decl_Id) + or else Has_Pragma_Inline (Decl_Id) + then if Top_Level and then not Contains_Subprograms_Refs (Decl) then @@ -915,11 +917,11 @@ -- down the number of global symbols that do not neet public visibility -- as this has two beneficial effects: --(1) It makes the compilation process more efficient. - --(2) It gives the code generatormore freedom to optimize within each + --(2) It gives the code generator more leeway to optimize within each --unit, especially subprograms. - -- This is done only for top level library packages or child units as - -- the algorithm does a top down traversal of the package body. + -- This is done only for top-level library packages or child units as + -- the algorithm does a top-down traversal of the package body. if (Scope (Spec_Id) = Standard_Standard or else Is_Child_Unit (Spec_Id)) and then not Is_Generic_Unit (Spec_Id)
[PATCH] Factor out division by squares and remove division around comparisons (0/2)
Hi all,

This patch is split from part (1/2). It includes the patterns that have been moved out of fold-const.c. It also removes an (almost entirely) redundant pattern: (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2) which was only used in special cases, either with combinations of flags like -fno-reciprocal-math -funsafe-math-optimizations and cases where C was sNaN, or small enough to result in infinity. This pattern is covered by: (A / C1) +- (A / C2) -> (with O1 and reciprocal math) A * (1 / C1) +- A * (1 / C2) -> A * (1 / C1 +- 1 / C2) The previous pattern required -funsafe-math-optimizations. To adjust for this case, the testcase has been updated to require O1 so that the optimization is still performed. This pattern is moved verbatim into match.pd: (A / C) +- (B / C) -> (A +- B) / C. OK for trunk?

Jackson

gcc/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * match.pd: Move RDIV patterns from fold-const.c. * fold-const.c (distribute_real_division): Removed. (fold_binary_loc): Remove calls to distribute_real_division.

gcc/testsuite/ 2017-08-30 Jackson Woodruff PR 71026/tree-optimization * gcc/testsuite/gcc.dg/fold-div-1.c: Use O1.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c index de60f681514aacedb993d5c83c081354fa3b342b..9de1728fb27b7749aaca1ab318b88c4c9b237317 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -3794,47 +3794,6 @@ invert_truthvalue_loc (location_t loc, tree arg) : TRUTH_NOT_EXPR, type, arg); } - -/* Knowing that ARG0 and ARG1 are both RDIV_EXPRs, simplify a binary operation - with code CODE. This optimization is unsafe. */ -static tree -distribute_real_division (location_t loc, enum tree_code code, tree type, - tree arg0, tree arg1) -{ - bool mul0 = TREE_CODE (arg0) == MULT_EXPR; - bool mul1 = TREE_CODE (arg1) == MULT_EXPR; - - /* (A / C) +- (B / C) -> (A +- B) / C. */ - if (mul0 == mul1 - && operand_equal_p (TREE_OPERAND (arg0, 1), - TREE_OPERAND (arg1, 1), 0)) -return fold_build2_loc (loc, mul0 ?
MULT_EXPR : RDIV_EXPR, type, - fold_build2_loc (loc, code, type, -TREE_OPERAND (arg0, 0), -TREE_OPERAND (arg1, 0)), - TREE_OPERAND (arg0, 1)); - - /* (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2). */ - if (operand_equal_p (TREE_OPERAND (arg0, 0), - TREE_OPERAND (arg1, 0), 0) - && TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST - && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST) -{ - REAL_VALUE_TYPE r0, r1; - r0 = TREE_REAL_CST (TREE_OPERAND (arg0, 1)); - r1 = TREE_REAL_CST (TREE_OPERAND (arg1, 1)); - if (!mul0) - real_arithmetic (&r0, RDIV_EXPR, &dconst1, &r0); - if (!mul1) -real_arithmetic (&r1, RDIV_EXPR, &dconst1, &r1); - real_arithmetic (&r0, code, &r0, &r1); - return fold_build2_loc (loc, MULT_EXPR, type, - TREE_OPERAND (arg0, 0), - build_real (type, r0)); -} - - return NULL_TREE; -} /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER starting at BITPOS. The field is unsigned if UNSIGNEDP is nonzero @@ -9378,12 +9337,6 @@ fold_binary_loc (location_t loc, } } - if (flag_unsafe_math_optimizations - && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == MULT_EXPR) - && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == MULT_EXPR) - && (tem = distribute_real_division (loc, code, type, arg0, arg1))) - return tem; - /* Convert a + (b*c + d*e) into (a + b*c) + d*e. We associate floats only if the user has specified -fassociative-math. */ @@ -9775,13 +9728,6 @@ fold_binary_loc (location_t loc, return tem; } - if (FLOAT_TYPE_P (type) - && flag_unsafe_math_optimizations - && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == MULT_EXPR) - && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == MULT_EXPR) - && (tem = distribute_real_division (loc, code, type, arg0, arg1))) - return tem; - /* Handle (A1 * C1) - (A2 * C2) with A1, A2 or C1, C2 being the same or one. Make sure the type is not saturating and has the signedness of the stripped operands, as fold_plusminus_mult_expr will re-associate. 
diff --git a/gcc/match.pd b/gcc/match.pd index 69dd8193cd0524d99fba8be8da8183230b8d621a..ab3f133f443a02e423abfbd635947ecaa8024a74 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3517,6 +3517,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (!HONOR_SNANS (type)) @0)) + (for op (plus minus) + /* Simplify (A / C) +- (B / C) -> (A +- B) / C. */ + (simplify + (op (rdi
[Ada] Resolution of set membership operations with overloaded alternatives
This patch fixes a bug in the resolution of set membership operations when the expression and/or the alternatives on the right-hand side are overloaded. If a given overloaded alternative is resolved to a unique type by intersection with the types of previous alternatives, the type is used subsequently to resolve the expression itself. If the alternative is an enumeration literal, it must be replaced by the literal corresponding to the selected interpretation, because subsequent resolution will not replace the entity itself.

The following must compile and run quietly: gnatmake -q -gnatws c45 c45

--- with Text_IO; use Text_IO; procedure C45 is procedure Failed (Msg : String) is begin Put_Line (Msg); end; type Month is (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec); type Radix is (Bin, Oct, Dec, Hex); type Shape is (Tri, Sqr, Pnt, Hex, Oct); -- Oct is defined for all three types; Dec for all but Shape; and Hex for -- all but Month. -- Three identical functions, one for each type. These provide no -- overloading information at all. function Item return Month is begin return Aug; end Item; function Item return Radix is begin return Dec; end Item; function Item return Shape is begin return Hex; end Item; begin -- No overloading in the choices: if Item in Jan .. Mar then -- type Month Failed ("Wrong result - no choice overloading (1)"); end if; if Item in Tri | Sqr | Pnt then -- type Radix Failed ("Wrong result - no choice overloading (2)"); end if; -- A single overloaded choice: if Item not in May .. Oct then -- type Month Failed ("Wrong result - single overloaded choice (3)"); end if; if Item not in Bin | Dec then -- type Radix Failed ("Wrong result - single overloaded choice (4)"); end if; if Item not in Tri | Sqr | Hex then -- type Shape Failed ("Wrong result - single overloaded choice (5)"); end if; -- At least one choice without overloading: if Item in Jan | Oct ..
Dec then -- type Month Failed ("Wrong result - a non-overloaded choice (6)"); end if; if Item not in Oct .. Hex | Bin then -- type Radix Failed ("Wrong result - a non-overloaded choice (7)"); end if; if Item not in Oct | Sqr | Hex then -- type Shape Failed ("Wrong result - a non-overloaded choice (8)"); end if; if Item not in Oct | Sqr | Hex | Tri then -- type Shape Failed ("Wrong result - a non-overloaded choice (9)"); end if; if Item not in Dec | Hex | Oct | Bin then -- type Radix Failed ("Wrong result - a non-overloaded choice (10"); end if; -- The ultimate: everything is overloaded, but there still is only -- one possible solution. if Item not in Oct | Dec | Hex then -- type Radix Failed ("Wrong result - everything overloaded (11)"); end if; end C45; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_ch4.adb (Analyze_Set_Membership): If an alternative in a set membership is an overloaded enumeration literal, and the type of the alternative is resolved from a previous one, replace the entity of the alternative as well as the type, to prevent inconsistencies between the entity and the type. Index: sem_ch4.adb === --- sem_ch4.adb (revision 251753) +++ sem_ch4.adb (working copy) @@ -2935,11 +2935,20 @@ -- for all of them. Set_Etype (Alt, It.Typ); + + -- If the alternative is an enumeration literal, use + -- the one for this interpretation. + + if Is_Entity_Name (Alt) then + Set_Entity (Alt, It.Nam); + end if; + Get_Next_Interp (Index, It); if No (It.Typ) then Set_Is_Overloaded (Alt, False); Common_Type := Etype (Alt); + end if; Candidate_Interps := Alt;
[Ada] Enable automatic reordering of components in record types
This activates the reordering of components in record types with convention Ada that was implemented some time ago in the compiler. The idea is to get rid of blatant inefficiencies that the layout in textual order of the source code can bring about, typically when the offset of components is not fixed or not a multiple of the storage unit. The reordering is automatic and silent by default, but both aspects can be toggled: pragma No_Component_Reordering disables it either on a per-record-type or on a global basis, while -gnatw.q gives a warning for each affected component in record types. When pragma No_Component_Reordering is used as a configuration pragma to disable it, there is a requirement that the pragma be used consistently within a partition.

The typical example is a discriminated record type with an array component, which yields with -gnatw.q -gnatl:

 1. package P is
 2.
 3.    type R (D : Positive) is record
 4.       S : String (1 .. D);
          |
    >>> warning: record layout may cause performance issues
    >>> warning: component "S" whose length depends on a discriminant
    >>> warning: comes too early and was moved down
 5.       I : Integer;
 6.    end record;
 7.
 8. end P;

In this case, the compiler moves component S to the last position in the record so that every component is at a fixed offset from the start.

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Eric Botcazou * ali.ads (ALIs_Record): Add No_Component_Reordering component. (No_Component_Reordering_Specified): New switch. * ali.adb (Initialize_ALI): Set No_Component_Reordering_Specified. (Scan_ALI): Set No_Component_Reordering and deal with NC marker. * bcheck.adb (Check_Consistent_No_Component_Reordering): New check. (Check_Configuration_Consistency): Invoke it. * debug.adb (d.r): Toggle the effect of the switch. (d.v): Change to no-op. * einfo.ads (Has_Complex_Representation): Restrict to record types. (No_Reordering): New alias for Flag239. (OK_To_Reorder_Components): Delete. (No_Reordering): Declare.
(Set_No_Reordering): Likewise. (OK_To_Reorder_Components): Delete. (Set_OK_To_Reorder_Components): Likewise. * einfo.adb (Has_Complex_Representation): Expect record types. (No_Reordering): New function. (OK_To_Reorder_Components): Delete. (Set_Has_Complex_Representation): Expect base record types. (Set_No_Reordering): New procedure. (Set_OK_To_Reorder_Components): Delete. (Write_Entity_Flags): Adjust to above change. * fe.h (Debug_Flag_Dot_R): New macro and declaration. * freeze.adb (Freeze_Record_Type): Remove conditional code setting OK_To_Reorder_Components on record types with convention Ada. * lib-writ.adb (Write_ALI): Deal with NC marker. * opt.ads (No_Component_Reordering): New flag. (No_Component_Reordering_Config): Likewise. (Config_Switches_Type): Add No_Component_Reordering component. * opt.adb (Register_Opt_Config_Switches): Copy No_Component_Reordering onto No_Component_Reordering_Config. (Restore_Opt_Config_Switches): Restore No_Component_Reordering. (Save_Opt_Config_Switches): Save No_Component_Reordering. (Set_Opt_Config_Switches): Set No_Component_Reordering. * par-prag.adb (Prag): Deal with Pragma_No_Component_Reordering. * sem_ch3.adb (Analyze_Private_Extension_Declaration): Also set the No_Reordering flag from the default. (Build_Derived_Private_Type): Likewise. (Build_Derived_Record_Type): Likewise. Then inherit it for untagged types and clean up handling of similar flags. (Record_Type_Declaration): Likewise. * sem_ch13.adb (Same_Representation): Deal with No_Reordering and remove redundant test on Is_Tagged_Type. * sem_prag.adb (Analyze_Pragma): Handle No_Component_Reordering. (Sig_Flags): Likewise. * snames.ads-tmpl (Name_No_Component_Reordering): New name. (Pragma_Id): Add Pragma_No_Component_Reordering value. * warnsw.adb (Set_GNAT_Mode_Warnings): Enable -gnatw.q as well. * gcc-interface/decl.c (gnat_to_gnu_entity) : Copy the layout of the parent type only if the No_Reordering settings match. 
(components_to_record): Reorder record types with convention Ada by default unless No_Reordering is set or -gnatd.r is specified and do not warn if No_Reordering is set in GNAT mode. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251759) +++ sem_ch3.adb (working copy) @@ -5015,6 +5015,7 @@ Set_Ekind(T, E_Record_Type_With_Private); Init_Size_Align (T); Set_Default_SSO (T); + Set_No_Reordering
[Ada] Spurious error with formal incomplete types
This patch fixes a spurious error on the use of a generic unit with formal incomplete types, as a formal package in another generic unit, when the actuals for the incomplete types are themselves formal incomplete types. The treatment of incomplete subtypes that are created for such formals is now more consistent with the handling of other subtypes, given their increased use in Ada 2012. The following must compile quietly:

---
gcc -c promote_2_streams.ads

generic
   type Data_Stream_Type;
   type Data_Type;
   with function Has_Data (Stream : not null access Data_Stream_Type) return Boolean;
   with function Consume (Stream : not null access Data_Stream_Type) return Data_Type;
package Data_Streams is
end;
---
with Data_Streams;
generic
   type Data1_Type is private;
   type Data2_Type is private;
   with package DS1 is new Data_Streams (Data_Type => Data1_Type, others => <>);
   with package DS2 is new Data_Streams (Data_Type => Data2_Type, others => <>);
package Promote_2_Streams is
   type Which_Type is range 1 .. 2;
   type Data_Type (Which : Which_Type := 1) is record
      case Which is
         when 1 => Data1 : Data1_Type;
         when 2 => Data2 : Data2_Type;
      end case;
   end record;
   function Consume1 (Stream : not null access DS1.Data_Stream_Type) return Data_Type
     is ((Which => 1, Data1 => DS1.Consume (Stream)));
   function Consume2 (Stream : not null access DS2.Data_Stream_Type) return Data_Type
     is ((Which => 2, Data2 => DS2.Consume (Stream)));
   package PS1 is new Data_Streams (DS1.Data_Stream_Type, Data_Type, DS1.Has_Data, Consume1);
   package PS2 is new Data_Streams (DS2.Data_Stream_Type, Data_Type, DS2.Has_Data, Consume2);
end Promote_2_Streams;

Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * einfo.adb (Designated_Type): Use Is_Incomplete_Type to handle properly incomplete subtypes that may be created by explicit or implicit declarations. (Is_Base_Type): Take E_Incomplete_Subtype into account. (Subtype_Kind): Ditto.
* sem_ch3.adb (Build_Discriminated_Subtype): Set properly the Ekind of a subtype of a discriminated incomplete type. (Fixup_Bad_Constraint): Use Subtype_Kind in all cases, including incomplete types, to preserve error reporting. (Process_Incomplete_Dependents): Do not create a subtype declaration for an incomplete subtype that is created internally. * sem_ch7.adb (Analyze_Package_Specification): Handle properly incomplete subtypes that do not require a completion, either because they are limited views, or they are generic actuals. Index: sem_ch3.adb === --- sem_ch3.adb (revision 251753) +++ sem_ch3.adb (working copy) @@ -10094,7 +10094,11 @@ -- elaboration, because only the access type is needed in the -- initialization procedure. - Set_Ekind (Def_Id, Ekind (T)); + if Ekind (T) = E_Incomplete_Type then +Set_Ekind (Def_Id, E_Incomplete_Subtype); + else +Set_Ekind (Def_Id, Ekind (T)); + end if; if For_Access and then Within_Init_Proc then null; @@ -13629,15 +13633,9 @@ procedure Fixup_Bad_Constraint is begin - -- Set a reasonable Ekind for the entity. For an incomplete type, - -- we can't do much, but for other types, we can set the proper - -- corresponding subtype kind. + -- Set a reasonable Ekind for the entity, including incomplete types. - if Ekind (T) = E_Incomplete_Type then -Set_Ekind (Def_Id, Ekind (T)); - else -Set_Ekind (Def_Id, Subtype_Kind (Ekind (T))); - end if; + Set_Ekind (Def_Id, Subtype_Kind (Ekind (T))); -- Set Etype to the known type, to reduce chances of cascaded errors @@ -20802,7 +20800,9 @@ -- Ada 2005 (AI-412): Transform a regular incomplete subtype into a -- corresponding subtype of the full view. 
- elsif Ekind (Priv_Dep) = E_Incomplete_Subtype then + elsif Ekind (Priv_Dep) = E_Incomplete_Subtype +and then Comes_From_Source (Priv_Dep) + then Set_Subtype_Indication (Parent (Priv_Dep), New_Occurrence_Of (Full_T, Sloc (Priv_Dep))); Set_Etype (Priv_Dep, Full_T); Index: sem_ch7.adb === --- sem_ch7.adb (revision 251753) +++ sem_ch7.adb (working copy) @@ -1441,11 +1441,14 @@ -- Check on incomplete types - -- AI05-0213: A formal incomplete type has no completion + -- AI05-0213: A formal incomplete type has no completion, + -- and neither does the corresponding subtype in an instance. - if Ekind (E) = E_Incomplete_Type + if Is_Incomplete_Type (E)
[Ada] Extension of 'Image in Ada2020.
AI12-0124 adds the notation Object'Image to the language, following the semantics of GNAT-defined attribute 'Img. This patch fixes an omission in the characterization of objects, which must include function calls and thus attribute references for attributes that are functions, as well as predefined operators. The following must compile and execute quietly: gnatmake -q img img --- procedure Img is type Enum is (A, BC, ABC, A_B_C, abcd, 'd'); type New_Enum is new Enum; function Ident (X : Enum) return Enum is begin return X; end Ident; E1 : New_Enum := New_Enum (Ident (BC)); type Int is new Long_Integer; type Der is new Int; function Ident (X : Der) return Der is begin return X; end Ident; V : Der := Ident (123); begin if New_Enum'Pred (E1)'Img /= "A" then raise Program_Error; end if; if New_Enum'Pred (E1)'Image /= "A" then raise Program_Error; end if; if Der'(V - 23)'Image /= "100" then raise Program_Error; end if; end; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_util.adb (Is_Object_Reference): A function call is an object reference, and thus attribute references for attributes that are functions (such as Pred and Succ) as well as predefined operators are legal in contexts that require an object, such as the prefix of attribute Img and the Ada2020 version of 'Image. Index: sem_util.adb === --- sem_util.adb(revision 251753) +++ sem_util.adb(working copy) @@ -14153,18 +14153,21 @@ -- In Ada 95, a function call is a constant object; a procedure -- call is not. -when N_Function_Call => +-- Note that predefined operators are functions as well, and so +-- are attributes that are (can be renamed as) functions. + +when N_Function_Call | N_Binary_Op | N_Unary_Op => return Etype (N) /= Standard_Void_Type; --- Attributes 'Input, 'Loop_Entry, 'Old, and 'Result produce --- objects. +-- Attributes references 'Loop_Entry, 'Old, and 'Result yield +-- objects, even though they are not functions. 
when N_Attribute_Reference => return - Nam_In (Attribute_Name (N), Name_Input, - Name_Loop_Entry, + Nam_In (Attribute_Name (N), Name_Loop_Entry, Name_Old, - Name_Result); + Name_Result) + or else Is_Function_Attribute_Name (Attribute_Name (N)); when N_Selected_Component => return
Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On 09/05/17 23:27, Wilco Dijkstra wrote: > Bernd Edlinger wrote: >> No, the split condition does not begin with "&& TARGET_32BIT...". >> Therefore the split is enabled in TARGET_NEON after reload_completed. >> And it is invoked from adddi3_neon for all alternatives without vfp >> registers: > > Hmm that's a huge mess. I'd argue that any insn_and_split should only split > its own instruction, never other instructions (especially if they are from > different md files, which is extremely confusing). Otherwise we should use > a separate split and explicitly list which instructions it splits. > This is literally a mine-field. > So then the next question is whether the neon_adddi3 still needs the > arm_adddi3 splitter in some odd corner cases? > Yes, I think so. While the *arm_adddi3 and adddi3_neon insns are mutually exclusive, they share the splitter. I don't know with which iwmmxt pattern the *arm_adddi3 splitter might interfere, therefore I don't know if the !TARGET_IWMMXT can be removed from the splitter condition. Other patterns have an iwmmxt twin that is not mutually exclusive, for instance *anddi_notdi_di, bicdi3_neon and iwmmxt_nanddi3. The bicdi3_neon insn duplicates the alternatives while iwmmxt does not. And nobody is able to test iwmmxt. >>> Also there are more cases, a quick grep suggests *anddi_notdi_di has the >>> same issue. > >> Yes, that pattern can be cleaned up in a follow-up patch. > > And there are a lot more instructions that need the same treatment and split > early (probably best at expand time). I noticed none of the zero/sign extends > split before regalloc for example. > I did use the test cases as benchmarks to decide which way to go. It is quite easy to try different combinations of cpu and inspect the stack usage of neon, iwmmxt, and vfp etc. It may be due to interaction with other patterns, but the split at expand time did not produce the best results in every case. >> Note this splitter is invoked from bicdi3_neon as well. 
>> However I think anddi_notdi_di should be safe as long as it is enabled >> after reload_completed (which is probably a bug). > > Since we should be splitting and/bic early now I don't think you can get > anddi_notdi > anymore. So it could be removed completely assuming Neon already does the > right > thing. > > It looks like we need to do a full pass over all DI mode instructions and clean up > all the mess. > Yes, but in small steps, and using some benchmarks to make sure that each step does improve at least something. Bernd. > Wilco >
Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands
On Tue, Sep 05, 2017 at 11:21:48PM -0600, Jeff Law wrote: > --- a/gcc/tree-ssa-reassoc.c > +++ b/gcc/tree-ssa-reassoc.c > @@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb) >"Width = %d was chosen for reassociation\n", > width); > > > - /* For binary bit operations, if the last operand in > - OPS is a constant, move it to the front. This > - helps ensure that we generate (X & Y) & C rather > - than (X & C) & Y. The former will often match > - a canonical bit test when we get to RTL. */ > - if ((rhs_code == BIT_AND_EXPR > -|| rhs_code == BIT_IOR_EXPR > -|| rhs_code == BIT_XOR_EXPR) > + /* For binary bit operations, if there are at least 3 > + operands and the last last operand in OPS is a constant, > + move it to the front. This helps ensure that we generate > + (X & Y) & C rather than (X & C) & Y. The former will > + often match a canonical bit test when we get to RTL. */ > + if (ops.length () != 2 So wouldn't it be clearer to write ops.length () > 2? The if (ops.length () == 0) and else if (ops.length () == 1) cases are handled earlier, so it is the same thing, but it might help the reader. > + && (rhs_code == BIT_AND_EXPR > + || rhs_code == BIT_IOR_EXPR > + || rhs_code == BIT_XOR_EXPR) > && TREE_CODE (ops.last ()->op) == INTEGER_CST) > std::swap (*ops[0], *ops[ops_num - 1]); Don't you then want to put the constant as second operand rather than first, i.e. swap with *ops[1]? And doesn't swap_ops_for_binary_stmt undo it again? Jakub
[Ada] Compiler crash on call to eliminated protected operation.
This patch fixes an omission in the handling of pragma Eliminate when applied to a protected operation. The pragma was properly processed, but a call to an eliminated protected operation was not flagged as an error, and the code generator aborted on a call to an undefined operation. Compiling: gcc -c -gnatec=gnat.adc data.adb must yield: data.adb:12:14: cannot reference subprogram "Some_Protected_Data" eliminated at Global_Pragmas.adc:4 data.adb:20:21: cannot reference subprogram "Some_Protected_Data" eliminated at Global_Pragmas.adc:4 --- -- List of unused entities to be placed in gnat.adc. -- pragma Eliminate (Data, Some_Protected_Data, Source_Location => "data.ads:12"); --- package Data is type Data_Type_T is new Natural; function Get_Private_Data return Data_Type_T; private protected type Some_Type is function Some_Protected_Data return Data_Type_T; private Data : Data_Type_T := 0; end Some_Type; end Data; --- package body Data is protected body Some_Type is function Some_Protected_Data return Data_Type_T is begin return Data; end Some_Protected_Data; function Redundant return Data_Type_T is begin return Some_Protected_Data; end; end Some_Type; My_Data : Some_Type; function Get_Private_Data return Data_Type_T is begin return My_Data.Some_Protected_Data; end Get_Private_Data; end Data; Tested on x86_64-pc-linux-gnu, committed on trunk 2017-09-06 Ed Schonberg * sem_res.adb (Resolve_Entry_Call): Check whether a protected operation is subject to a pragma Eliminate. 
Index: sem_res.adb === --- sem_res.adb (revision 251753) +++ sem_res.adb (working copy) @@ -7519,10 +7519,15 @@ if Nkind (Entry_Name) = N_Selected_Component then - -- Simple entry call + -- Simple entry or protected operation call Nam := Entity (Selector_Name (Entry_Name)); Obj := Prefix (Entry_Name); + + if Is_Subprogram (Nam) then +Check_For_Eliminated_Subprogram (Entry_Name, Nam); + end if; + Was_Over := Is_Overloaded (Selector_Name (Entry_Name)); else pragma Assert (Nkind (Entry_Name) = N_Indexed_Component);
Re: [PR 82078] Enqueue all SRA links for write flag propagation
On Wed, 6 Sep 2017, Martin Jambor wrote: > Hi, > > PR 82078 is another fallout from lazy setting of written flag in SRA. > The problem here is that we do not enqueue assignment links going out > of accesses of candidates that were disqualified before we start the > loop with sort_and_splice_var_accesses. > > Given that the propagation is now a correctness necessity, we need to > enqueue all links for processing, so this patch does it when they > are created. There should be very little extra work done because of > this because propagate_all_subaccesses starts with checking the > candidate-status of both sides of each link and acts accordingly. > > Bootstrapped and tested on x86_64-linux without any issues. OK for > trunk? Ok. Thanks, Richard. > Thanks, > > Martin > > > > 2017-09-05 Martin Jambor > > PR tree-optimization/82078 > gcc/ > * tree-sra.c (sort_and_splice_var_accesses): Move call to > add_access_to_work_queue... > (build_accesses_from_assign): ...here. > (propagate_all_subaccesses): Make sure racc is the group > representative, if there is one. > > gcc/testsuite/ > * gcc.dg/tree-ssa/pr82078.c: New test. 
> --- > gcc/testsuite/gcc.dg/tree-ssa/pr82078.c | 27 +++ > gcc/tree-sra.c | 5 - > 2 files changed, 31 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > new file mode 100644 > index 000..3774986324b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c > @@ -0,0 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O" } */ > + > +struct S0 { > + signed f4; > + signed f9 : 5; > +} a[6][5], b = {2} > + > +; > +int c, d; > +int fn1() { > + struct S0 e[5][6]; > + struct S0 f; > + b = f = e[2][5] = a[5][0]; > + if (d) > +; > + else > +return f.f9; > + e[c][45] = a[4][4]; > +} > + > +int main() { > + fn1(); > + if (b.f4 != 0) > +__builtin_abort (); > + return 0; > +} > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 68edbce21b3..163b7a2d03b 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -1359,6 +1359,8 @@ build_accesses_from_assign (gimple *stmt) >link->lacc = lacc; >link->racc = racc; >add_link_to_rhs (racc, link); > + add_access_to_work_queue (racc); > + >/* Let's delay marking the areas as written until propagation of > accesses >across link, unless the nature of rhs tells us that its data comes >from elsewhere. 
*/ > @@ -2118,7 +2120,6 @@ sort_and_splice_var_accesses (tree var) >access->grp_total_scalarization = total_scalarization; >access->grp_partial_lhs = grp_partial_lhs; >access->grp_unscalarizable_region = unscalarizable_region; > - add_access_to_work_queue (access); > >*prev_acc_ptr = access; >prev_acc_ptr = &access->next_grp; > @@ -2712,6 +2713,8 @@ propagate_all_subaccesses (void) >struct access *racc = pop_access_from_work_queue (); >struct assign_link *link; > > + if (racc->group_representative) > + racc= racc->group_representative; >gcc_assert (racc->first_link); > >for (link = racc->first_link; link; link = link->next) > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PR 82078] Enqueue all SRA links for write flag propagation
Hi, PR 82078 is another fallout from lazy setting of written flag in SRA. The problem here is that we do not enqueue assignment links going out of accesses of candidates that were disqualified before we start the loop with sort_and_splice_var_accesses. Given that the propagation is now a correctness necessity, we need to enqueue all links for processing, so this patch does it when they are created. There should be very little extra work done because of this because propagate_all_subaccesses starts with checking the candidate-status of both sides of each link and acts accordingly. Bootstrapped and tested on x86_64-linux without any issues. OK for trunk? Thanks, Martin 2017-09-05 Martin Jambor PR tree-optimization/82078 gcc/ * tree-sra.c (sort_and_splice_var_accesses): Move call to add_access_to_work_queue... (build_accesses_from_assign): ...here. (propagate_all_subaccesses): Make sure racc is the group representative, if there is one. gcc/testsuite/ * gcc.dg/tree-ssa/pr82078.c: New test. 
--- gcc/testsuite/gcc.dg/tree-ssa/pr82078.c | 27 +++ gcc/tree-sra.c | 5 - 2 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82078.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c new file mode 100644 index 000..3774986324b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82078.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O" } */ + +struct S0 { + signed f4; + signed f9 : 5; +} a[6][5], b = {2} + +; +int c, d; +int fn1() { + struct S0 e[5][6]; + struct S0 f; + b = f = e[2][5] = a[5][0]; + if (d) +; + else +return f.f9; + e[c][45] = a[4][4]; +} + +int main() { + fn1(); + if (b.f4 != 0) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 68edbce21b3..163b7a2d03b 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1359,6 +1359,8 @@ build_accesses_from_assign (gimple *stmt) link->lacc = lacc; link->racc = racc; add_link_to_rhs (racc, link); + add_access_to_work_queue (racc); + /* Let's delay marking the areas as written until propagation of accesses across link, unless the nature of rhs tells us that its data comes from elsewhere. */ @@ -2118,7 +2120,6 @@ sort_and_splice_var_accesses (tree var) access->grp_total_scalarization = total_scalarization; access->grp_partial_lhs = grp_partial_lhs; access->grp_unscalarizable_region = unscalarizable_region; - add_access_to_work_queue (access); *prev_acc_ptr = access; prev_acc_ptr = &access->next_grp; @@ -2712,6 +2713,8 @@ propagate_all_subaccesses (void) struct access *racc = pop_access_from_work_queue (); struct assign_link *link; + if (racc->group_representative) + racc= racc->group_representative; gcc_assert (racc->first_link); for (link = racc->first_link; link; link = link->next) -- 2.14.1
RE: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford do you have any objections to the patch as it stands? It doesn't appear as if anything is going to change in the mid-end anytime soon. -Original Message- From: Richard Sandiford [mailto:richard.sandif...@linaro.org] Sent: Tuesday, August 22, 2017 9:11 AM To: Richard Biener Cc: Richard Kenner ; Michael Collison ; GCC Patches ; nd ; Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts Richard Biener writes: > On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford > wrote: >> Richard Biener writes: >>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >>> wrote: Richard Biener writes: > On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner > wrote: >>> Correct. It is truncated for integer shift, but not simd shift >>> instructions. We generate a pattern in the split that only generates >>> the integer shift instructions. >> >> That's unfortunate, because it would be nice to do this in simplify_rtx, >> since it's machine-independent, but that has to be conditioned on >> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. > > SHIFT_COUNT_TRUNCATED should go ... you should express this in the > patterns, like for example with > > (define_insn ashlSI3 > [(set (match_operand 0 "") > (ashl:SI (match_operand ... ) > (subreg:QI (match_operand:SI ...)))] > > or an explicit and:SI and combine / simplify_rtx should apply the magic > optimization we expect. The problem with the explicit AND is that you'd end up with either an AND of two constants for constant shifts, or with two separate patterns, one for constant shifts and one for variable shifts. (And the problem in theory with two patterns is that it reduces the RA's freedom, although in practice I guess we'd always want a constant shift where possible for cost reasons, and so the RA would never need to replace pseudos with constants itself.) 
I think all useful instances of this optimisation will be exposed by the gimple optimisers, so maybe expand could to do it based on TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than the rtx code and it does take the mode into account. >>> >>> Sure, that could work as well and also take into account range info. >>> But we'd then need named expanders and the result would still have >>> the explicit and or need to be an unspec or a different RTL operation. >> >> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >> target-dependent rather than undefined behaviour, so it's OK for a >> target to use shift codes with out-of-range values. > > Hmm, but that means simplify-rtx can't do anything with them because > we need to preserve target dependent behavior. Yeah, it needs to punt. In practice that shouldn't matter much. > I think the RTL IL should be always well-defined and its semantics > shouldn't have any target dependences (ideally, and if, then they > should be well specified via extra target hooks/macros). That would be nice :-) I think the problem has traditionally been that shifts can be used in quite a few define_insn patterns besides those for shift instructions. So if your target defines shifts to have 256-bit precision (say) then you need to make sure that every define_insn with a shift rtx will honour that. It's more natural for target guarantees to apply to instructions than to rtx codes. >> And >> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about how >> the normal shift optabs behave, so I don't think we'd need new optabs >> or new unspecs. >> >> E.g. it already works this way when expanding double-word shifts, >> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's >> possible to use a shorter sequence if you know that the shift optab >> truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED >> isn't defined. 
> > I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK > applies to the instructions generated by the named shift patterns but > _not_ general shift RTXen. But the generated pattern contains shift > RTXen and how can we figure whether they were generated by the named > expanders or by other means? Don't define_expand also serve as > define_insn for things like combine? Yeah, you can't (and aren't supposed to try to) reverse-engineer the expander from the generated instructions. TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding a shift optab. > That said, from looking at the docs and your description above it > seems that semantics are not fully reflected in the RTL instruction stream? Yeah, the semantics go from being well-defined in the optab interface to being target-dependent in the rtl stream. Thanks, Richard > > Richard. > >> Thanks, >> Richard >> >>> >>> Richard. >>> Thanks, Richard
Re: [AArch64, PATCH] Improve Neon store of zero
Hi all, I've attached a new patch that addresses some of the issues raised with my original patch. On 08/23/2017 03:35 PM, Wilco Dijkstra wrote: Richard Sandiford wrote: Sorry for only noticing now, but the call to aarch64_legitimate_address_p is asking whether the MEM itself is a legitimate LDP/STP address. Also, it might be better to pass false for strict_p, since this can be called before RA. So maybe: if (GET_CODE (operands[0]) == MEM && !(aarch64_simd_imm_zero (operands[1], mode) && aarch64_mem_pair_operand (operands[0], mode))) There were also some issues with the choice of mode for the call to aarch64_mem_pair_operand. For a 128-bit wide mode, we want to check `aarch64_mem_pair_operand (operands[0], DImode)` since that's what the stp will be. For a 64-bit wide mode, we don't need to do that check because a normal `str` can be issued. I've updated the condition as such. Is there any reason for doing this check at all (or at least this early during expand)? Not doing this check means that the zero is forced into a register, so we then carry around a bit more RTL and rely on combine to merge things. There is a similar issue with this part: (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, m, w, ?r, ?w, ?r, w") + "=w, Ump, m, w, ?r, ?w, ?r, w") The Ump causes the instruction to always split off the address offset. Ump cannot be used in patterns that are generated before register allocation as it also calls aarch64_legitimate_address_p with strict_p set to true. I've changed the constraint to a new constraint 'Umq', which acts the same as Ump, but calls aarch64_legitimate_address_p with strict_p set to false and uses DImode for the mode to pass. OK for trunk? Jackson Wilco ChangeLog: gcc/ 2017-08-29 Jackson Woodruff * config/aarch64/constraints.md (Umq): New constraint. * config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Change to use Umq. (mov): Update condition. 
gcc/testsuite 2017-08-29 Jackson Woodruff * gcc.target/aarch64/simd/vect_str_zero.c: Update testcase. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f3e084f8778d70c82823b92fa80ff96021ad26db..a044a1306a897b169ff3bfa06532c692aaf023c8 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -23,10 +23,11 @@ (match_operand:VALL_F16 1 "general_operand" ""))] "TARGET_SIMD" " -if (GET_CODE (operands[0]) == MEM - && !(aarch64_simd_imm_zero (operands[1], mode) -&& aarch64_legitimate_address_p (mode, operands[0], - PARALLEL, 1))) + if (GET_CODE (operands[0]) == MEM + && !(aarch64_simd_imm_zero (operands[1], mode) + && ((GET_MODE_SIZE (mode) == 16 + && aarch64_mem_pair_operand (operands[0], DImode)) + || GET_MODE_SIZE (mode) == 8))) operands[1] = force_reg (mode, operands[1]); " ) @@ -126,7 +127,7 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Ump, m, w, ?r, ?w, ?r, w") + "=w, Umq, m, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" "m, Dz, w, w, w, r, r, Dn"))] "TARGET_SIMD diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 9ce3d4efaf31a301dfb7c1772a6b685fb2cbd2ee..4b926bf80558532e87a1dc4cacc85ff008dd80aa 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -156,6 +156,14 @@ (and (match_code "mem") (match_test "REG_P (XEXP (op, 0))"))) +(define_memory_constraint "Umq" + "@internal + A memory address which uses a base register with an offset small enough for + a load/store pair operation in DI mode." + (and (match_code "mem") + (match_test "aarch64_legitimate_address_p (DImode, XEXP (op, 0), + PARALLEL, 0)"))) + (define_memory_constraint "Ump" "@internal A memory address suitable for a load/store pair operation." 
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c index 07198de109432b530745cc540790303ae0245efb..00cbf20a0b293e71ed713f0c08d89d8a525fa785 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vect_str_zero.c @@ -7,7 +7,7 @@ void f (uint32x4_t *p) { uint32x4_t x = { 0, 0, 0, 0}; - p[1] = x; + p[4] = x; /* { dg-final { scan-assembler "stp\txzr, xzr," } } */ } @@ -16,7 +16,9 @@ void g (float32x2_t *p) { float32x2_t x = {0.0, 0.0}; - p[0] = x; + p[400] = x; /* { dg-final { scan-assembler "str\txzr, " } } */ } + +/* { dg-final { scan-assembler-not "add\tx\[0-9\]\+, x0, \[0-9\]+" } } */
Re: [PATCH v2] Python testcases to check DWARF output
On 09/05/2017 09:46 PM, Mike Stump wrote: I've included the dwarf people on the cc list. Seems like they may have an opinion on the direction or the patch itself. I was fine with the patch from the larger testsuite perspective. Good idea, thank you! And thank you for your feedback. :-) -- Pierre-Marie de Rodat
Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31
Hi Thomas, On 05/09/17 10:04, Thomas Preudhomme wrote: Ping? This is ok if a bootstrap and test run on arm-none-linux-gnueabihf shows no problems. Thanks, Kyrill Best regards, Thomas On 25/08/17 12:18, Thomas Preudhomme wrote: Hi, I've now also added a couple more changes: * size to_clear_bitmap according to maxregno to be consistent with its use * use directly TARGET_HARD_FLOAT instead of clear_vfpregs Original message below (ChangeLog unchanged): Function cmse_nonsecure_entry_clear_before_return has code to deal with high VFP registers (D16-D31) while ARMv8-M Baseline and Mainline both do not support more than 16 double VFP registers (D0-D15). This makes this security-sensitive code harder to read for not much benefit since libcalls for cmse_nonsecure_call functions do not deal with those high VFP registers anyway. This commit gets rid of this code for simplicity and fixes 2 issues in the same function: - stop the first loop when reaching maxregno to avoid dealing with VFP registers if targeting Thumb-1 or using -mfloat-abi=soft - include maxregno in that loop ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. Testing: Testsuite shows no regression when run for ARMv8-M Baseline and ARMv8-M Mainline. Is this ok for trunk? Best regards, Thomas On 23/08/17 11:56, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 17/07/17 17:25, Thomas Preudhomme wrote: My bad, found an off-by-one error in the sizing of bitmaps. Please find fixed patch in attachment. 
ChangeLog entry is unchanged: *** gcc/ChangeLog *** 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. Best regards, Thomas On 17/07/17 09:52, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 12/07/17 09:59, Thomas Preudhomme wrote: Hi Richard, On 07/07/17 15:19, Richard Earnshaw (lists) wrote: Hmm, I think that's because really this is a partial conversion. It looks like doing this properly would involve moving that existing code to use sbitmaps as well. I think doing that would be better for long-term maintenance perspectives, but I'm not going to insist that you do it now. There's also the assert later but I've found a way to improve it slightly. While switching to auto_sbitmap I also changed the code slightly to allocate directly bitmaps to the right size. Since the change is probably bigger than what you had in mind I'd appreciate if you can give me an OK again. See updated patch in attachment. ChangeLog entry is unchanged: 2017-06-13 Thomas Preud'homme * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers. (cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it. Replace the remaining entry by a sbitmap and adapt code accordingly. As a result I'll let you take the call as to whether you keep this version or go back to your earlier patch. If you do decide to keep this version, then see the comment below. Given the changes I'm more happy with how the patch looks now and making it go in can be a nice incentive to change other ARMv8-M Security Extension related code later on. Best regards, Thomas
Re: [PATCH] Improve alloca alignment
Jeff Law writes:

> On 08/22/2017 08:15 AM, Wilco Dijkstra wrote:
>> Jeff Law wrote:
>>> On 07/26/2017 05:29 PM, Wilco Dijkstra wrote:
>>>> But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0
>>>> seems wrong too, given that round_push uses a different alignment to
>>>> align to.
>>> I had started to dig into the history of this code, but just didn't
>>> have the time to do so fully before needing to leave for the day.  To
>>> some degree I was hoping you knew the rationale behind the test
>>> against MAX_SUPPORTED_STACK_ALIGNMENT and I wouldn't have to do a ton
>>> of digging :-)
>>
>> I looked further into this - it appears this works correctly since it
>> is only bypassed if size_align is already maximally aligned.
>> round_push aligns to the preferred alignment, which may be lower than
>> or equal to MAX_SUPPORTED_STACK_ALIGNMENT (which is at least
>> STACK_BOUNDARY).
>>
>> Here is the updated version:
>>
>> ChangeLog:
>> 2017-08-22  Wilco Dijkstra
>>
>> 	* explow.c (get_dynamic_stack_size): Improve dynamic alignment.
>
> OK.  I wonder if this will make it easier to write stack-clash tests of
> the dynamic space for boundary conditions :-)  I was always annoyed that
> I had to fiddle around with magic adjustments to the sizes of objects to
> tickle boundary cases.

This patch brought back PR libgomp/78468, which had caused its predecessor to be backed out of gcc-7.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
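The rounding being discussed - aligning a dynamic allocation size up to a power-of-two stack boundary - amounts to the standard round-up idiom. A minimal sketch of the arithmetic (the function name is illustrative, not the actual explow.c interface):

```c
#include <stdint.h>

/* Round SIZE up to the next multiple of ALIGN, where ALIGN is a power
   of two -- the kind of rounding applied to dynamic stack allocations.
   Works because ~(align - 1) clears the low bits after the bump.  */
static uintptr_t round_up_to (uintptr_t size, uintptr_t align)
{
  return (size + align - 1) & ~(align - 1);
}
```

The boundary cases Jeff mentions are the interesting ones for stack-clash testing: sizes already at a multiple of the alignment stay put, while a size one byte past a boundary jumps a whole alignment unit.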
[PATCH] Fix SLSR issue
This fixes a bogus check for a mode where the type matters.  The test can get fooled by vector ops with an integral mode, and we then ICE later trying to use wide-ints operating on vector constants.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-09-06  Richard Biener

	* gimple-ssa-strength-reduction.c
	(find_candidates_dom_walker::before_dom_children): Use a type and
	not a mode check.

Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 251710)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -1742,8 +1742,7 @@ find_candidates_dom_walker::before_dom_children
 	slsr_process_ref (gs);

       else if (is_gimple_assign (gs)
-	       && SCALAR_INT_MODE_P
-		    (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
+	       && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (gs))))
 	{
 	  tree rhs1 = NULL_TREE, rhs2 = NULL_TREE;
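For context, the transformation this pass performs - straight-line strength reduction - replaces a family of related multiplies with cheaper additions. A hedged source-level illustration of the idea (this is what the optimizer does internally, not code from the pass itself):

```c
/* Before strength reduction: each iteration pays for a multiply.  */
int sum_mul (int c, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += c * i;
  return sum;
}

/* After strength reduction: the candidate c * i becomes a running
   add, since consecutive values differ by the constant stride c.  */
int sum_add (int c, int n)
{
  int sum = 0, t = 0;          /* t tracks c * i incrementally */
  for (int i = 0; i < n; i++)
    {
      sum += t;
      t += c;
    }
  return sum;
}
```

The fixed check matters because the replacement arithmetic is done on integer (wide-int) values; admitting a vector-typed LHS just because its mode happens to be an integer mode feeds a vector constant into that integer arithmetic.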
Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
On 5 September 2017 at 20:20, Christophe Lyon wrote:
> On 5 September 2017 at 19:53, Kyrill Tkachov wrote:
>>
>> On 05/09/17 18:48, Bernd Edlinger wrote:
>>>
>>> On 09/05/17 17:02, Wilco Dijkstra wrote:
>>>> Bernd Edlinger wrote:
>>>>> Combine creates an invalid insn out of these two insns:
>>>>
>>>> Yes, it looks like a latent bug.  We need to use
>>>> arm_general_register_operand, as arm_adddi3/subdi3 only allow integer
>>>> registers.  You don't need a new predicate s_register_operand_nv.
>>>> Also I'd prefer something like arm_general_adddi_operand.
>>>
>>> Thanks, attached is a patch following your suggestion.
>>>
>>>> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
>>>>
>>>> The split condition for adddi3 now looks more accurate indeed,
>>>> although we could remove the !TARGET_NEON from the split condition,
>>>> as this is always true given arm_adddi3 uses
>>>> "TARGET_32BIT && !TARGET_NEON".
>>>
>>> No, the split condition does not begin with "&& TARGET_32BIT...".
>>> Therefore the split is enabled in TARGET_NEON after reload_completed,
>>> and it is invoked from adddi3_neon for all alternatives without VFP
>>> registers:
>>>
>>>    switch (which_alternative)
>>>      {
>>>      case 0: /* fall through */
>>>      case 3: return "vadd.i64\t%P0, %P1, %P2";
>>>      case 1: return "#";
>>>      case 2: return "#";
>>>      case 4: return "#";
>>>      case 5: return "#";
>>>      case 6: return "#";
>>>
>>>> Also there are more cases; a quick grep suggests *anddi_notdi_di has
>>>> the same issue.
>>>
>>> Yes, that pattern can be cleaned up in a follow-up patch.  Note this
>>> splitter is invoked from bicdi3_neon as well.  However I think
>>> anddi_notdi_di should be safe as long as it is enabled after
>>> reload_completed (which is probably a bug).
>>
>> Thanks, that's what I had in mind in my other reply.
>> This is ok if testing comes back ok.
>
> I've submitted the patch for testing, I'll let you know about the results.

I can confirm the last patch does fix the regression I reported, and causes no other regression.
(The previous version of the fix worked, too.)  Thanks for the prompt fix.

Christophe

> Christophe
>
>> Kyrill
>>
>>> Bernd.
>>>> Wilco
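The pattern under discussion, adddi3 on a 32-bit ARM target, gets split after reload into two core-register adds with carry propagation (adds/adc) when it is not handled as a NEON vadd.i64. A C-level sketch of what that split computes (illustrative only, not the machine-description code):

```c
#include <stdint.h>

/* A 64-bit add performed as two 32-bit adds with explicit carry,
   mirroring the adds/adc pair the adddi3 splitter emits for ARM core
   registers.  */
static uint64_t adddi_split (uint64_t a, uint64_t b)
{
  uint32_t alo = (uint32_t) a, ahi = (uint32_t) (a >> 32);
  uint32_t blo = (uint32_t) b, bhi = (uint32_t) (b >> 32);

  uint32_t lo = alo + blo;          /* adds: low words, sets carry */
  uint32_t carry = lo < alo;        /* carry out of the low word */
  uint32_t hi = ahi + bhi + carry;  /* adc: high words plus carry in */

  return ((uint64_t) hi << 32) | lo;
}
```

This is also why the operand predicates matter: the split form only makes sense for integer (core) register pairs, which is what restricting the pattern to arm_general_register_operand enforces.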