Re: limit on emails for merge commits.
Hello, > You should include Joel on such questions as the expert on the hooks. > > I don't know whether there's something to put in the commit message to say > "allow this merge of more than 100 commits". I don't think a squashed > merge is the right workaround, supposing you do want the git ancestry > information to reflect what got merged. The simple workaround is to do > three successive merges, each of under 100 commits and each pushed > separately. Alternatively, you could check out refs/meta/config, push a > change that either increases hooks.max-commit-emails or sets > hooks.no-emails to prevent mails for this branch, then push the merge, > then push a reversion of the change to refs/meta/config. Joseph pretty much nailed it in terms of options. -- Joel
[COMMITTED] contrib/gcc_update: Insert "tformat:" for git log --pretty=tformat:%p:%t:%H
Really old git versions (like 1.6.0) require "git log --pretty=tformat:%p:%t:%H" or else we see: Updating GIT tree Current branch master is up to date. fatal: invalid --pretty format: %p:%t:%H Adjusting file timestamps Touching gcc/config.in... Touching gcc/config/arm/arm-tune.md... ...and an empty revision in LAST_UPDATED and gcc/REVISION. In its absence, for newer git versions, "tformat" is the default qualifier, documented as such default for at least git-2.11.0. Committed as obvious. diff --git a/contrib/ChangeLog b/contrib/ChangeLog index 16d0667..4e89b8d 100644 --- a/contrib/ChangeLog +++ b/contrib/ChangeLog @@ -1,3 +1,8 @@ +2020-01-17 Hans-Peter Nilsson + + * gcc_update : Use git log "--pretty=tformat:%p:%t:%H", + not "--pretty=%p:%t:%H". + 2020-01-16 Andreas Schwab * gcc-git-customization.sh: Avoid double expansion. diff --git a/contrib/gcc_update b/contrib/gcc_update index 5df3297..8c980b1 100755 --- a/contrib/gcc_update +++ b/contrib/gcc_update @@ -330,7 +330,7 @@ case $vcs_type in exit 1 fi - revision=`$GCC_GIT log -n1 --pretty=%p:%t:%H` + revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H` branch=`$GCC_GIT name-rev --name-only HEAD || :` ;; brgds, H-P
Re: [PATCH] vect: Fix ICE in vectorizable_comparison PR93292
On Fri, 17 Jan 2020, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs on powerpc64le-linux. The problem is that > get_vectype_for_scalar_type returns NULL, and while most places in > tree-vect-stmts.c handle that case, this spot doesn't and punts only > if it is non-NULL, but with different number of elts than expected. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK. Richard. > 2020-01-16 Jakub Jelinek > > PR tree-optimization/93292 > * tree-vect-stmts.c (vectorizable_comparison): Punt also if > get_vectype_for_scalar_type returns NULL. > > * g++.dg/opt/pr93292.C: New test. > > --- gcc/tree-vect-stmts.c.jj 2020-01-12 11:54:38.522381590 +0100 > +++ gcc/tree-vect-stmts.c 2020-01-16 19:42:30.60882 +0100 > @@ -10492,7 +10492,7 @@ vectorizable_comparison (stmt_vec_info s > { >vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), >slp_node); > - if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) > + if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) > return false; > } >else if (maybe_ne (nunits, TYPE_VECTOR_SUBPARTS (vectype))) > --- gcc/testsuite/g++.dg/opt/pr93292.C.jj 2020-01-16 19:48:51.110144613 > +0100 > +++ gcc/testsuite/g++.dg/opt/pr93292.C2020-01-16 19:47:57.351956177 > +0100 > @@ -0,0 +1,18 @@ > +// PR tree-optimization/93292 > +// { dg-do compile } > +// { dg-options "-O3 -w" } > + > +struct A { > + static int foo (float x) { static int b; b = x ? x + 0.5 : 0; return b; } > +}; > + > +void > +bar (int *d, float e) > +{ > + float g; > + for (int h = 0; h < 64; h++) > +{ > + d[h] += A::foo (g < 0 ? : g > 5 ? : g); > + A::foo (e); > +} > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] analyzer: fix handling of negative byte offsets (v2) (PR 93281)
On Thu, 16 Jan 2020, David Malcolm wrote: > On Thu, 2020-01-16 at 09:14 +0100, Jakub Jelinek wrote: > > On Wed, Jan 15, 2020 at 06:56:48PM -0500, David Malcolm wrote: > > > gcc/analyzer/ChangeLog: > > > PR analyzer/93281 > > > * region-model.cc > > > (region_model::convert_byte_offset_to_array_index): Convert > > > offset_cst to ssizetype before dividing by byte_size. > > > --- > > > gcc/analyzer/region-model.cc | 5 - > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region- > > > model.cc > > > index 5c39be4fd7f..b62ddf82a40 100644 > > > --- a/gcc/analyzer/region-model.cc > > > +++ b/gcc/analyzer/region-model.cc > > > @@ -6414,9 +6414,12 @@ > > > region_model::convert_byte_offset_to_array_index (tree ptr_type, > > >/* This might not be a constant. */ > > >tree byte_size = size_in_bytes (elem_type); > > > > > > + /* Ensure we're in a signed representation before doing the > > > division. */ > > > + tree signed_offset_cst = fold_convert (ssizetype, > > > offset_cst); > > > + > > >tree index > > > = fold_build2 (TRUNC_DIV_EXPR, integer_type_node, > > > -offset_cst, byte_size); > > > +signed_offset_cst, byte_size); > > > > I'd say you want to fold_convert byte_size to ssizetype too. > > Another thing is the integer_type_node, that looks wrong when you are > > dividing ssizetype by ssizetype. The fold-const.c folders are > > sensitive to > > using incorrect types and type mismatches. > > And, I think fold_build2 is wasteful, you only care if you can > > simplify it > > to a constant (just constant or INTEGER_CST only?). > > So, either use fold_binary (TRUNC_DIV_EXPR, ssizetype, offset_cst, > > fold_convert (ssizetype, byte_size)) > > or, if you have checked that both arguments are INTEGER_CSTs, perhaps > > int_const_binop or so. > > > > >if (CONSTANT_CLASS_P (index)) > > > > And this would need to be if (index && TREE_CODE (index) == > > INTEGER_CST) > > (or if you can handle other constants (index && CONSTANT_CLASS_P > > (index)). > > > > > return get_or_create_constant_svalue (index); > > > > Jakub > > Thanks. > > Here's an updated version of the patch which fold_converts both > inputs to the division, and uses fold_binary rather than fold_build2. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; > verified with -m32 and -m64. > > OK for master? OK. > gcc/analyzer/ChangeLog: > PR analyzer/93281 > * region-model.cc > (region_model::convert_byte_offset_to_array_index): Convert to > ssizetype before dividing by byte_size. Use fold_binary rather > than fold_build2 to avoid needlessly constructing a tree for the > non-const case. > --- > gcc/analyzer/region-model.cc | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc > index 5c39be4fd7f..f67572e2d45 100644 > --- a/gcc/analyzer/region-model.cc > +++ b/gcc/analyzer/region-model.cc > @@ -6414,11 +6414,13 @@ region_model::convert_byte_offset_to_array_index > (tree ptr_type, >/* This might not be a constant. */ >tree byte_size = size_in_bytes (elem_type); > > + /* Try to get a constant by dividing, ensuring that we're in a > + signed representation first. */ >tree index > - = fold_build2 (TRUNC_DIV_EXPR, integer_type_node, > -offset_cst, byte_size); > - > - if (CONSTANT_CLASS_P (index)) > + = fold_binary (TRUNC_DIV_EXPR, ssizetype, > +fold_convert (ssizetype, offset_cst), > +fold_convert (ssizetype, byte_size)); > + if (index && TREE_CODE (index) == INTEGER_CST) > return get_or_create_constant_svalue (index); > } > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: Deque rotate on current node
After the recent optimization that went in despite in stage 4 I wonder why I never got any feedback for this one. https://gcc.gnu.org/ml/libstdc++/2019-05/msg00166.html I forgot to note that this is the simplest reply to an old Paolo's request to recycle std::deque's "nodes". And this one is abi compatible ! François On 5/20/19 7:39 AM, François Dumont wrote: Hi std::deque is supposed to be like a double-queue that you can attack from front and back. But currrently its implementation makes it behave differently when starting from a fresh deque. If push_back then all goes well, it is copy/move to the current internal 'node'. If push_front then a new 'node' is created and on next pop_back the initial node will be deallocated without having ever be used. This patch changes this behavior. As long as deque is empty we can play with its state to make it push_back- or push_front-ready. It will also benefit to the usage of deque in the std::stack. More generally it could really improve scenario in which deque is used as queue of elements exchanged between different threads. As you need to make sure that consumers are faster than producers then your deque should remain empty most of the time and so this proposed patch should avoid nodes to be allocated/deallocated all the time. * include/bits/deque.tcc (deque<>::_M_push_back_aux): Rotate on current node if deque is empty. (deue<>::_M_push_front_aux): Likewise. Tested under Linux x86_64, ok to commit ? François
Re: [patch, fortran] Fix PR 44960, accepts invalid reference on function
On Thu, Jan 16, 2020 at 11:34:43PM +0100, Thomas Koenig wrote: > diff --git a/gcc/testsuite/gfortran.dg/function_reference_1.f90 > b/gcc/testsuite/gfortran.dg/function_reference_1.f90 > new file mode 100644 > index 000..78c92a6f20d > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/function_reference_1.f90 > @@ -0,0 +1,11 @@ > +! { dg-do compile } > +! PR 44960 - this was erroneusly accepted. > +! Original test case by Daniel Franke. > + > +type t > + integer :: a > +end type t > +type(t) :: foo > +print *, foo(1)%a ! { dg-error "Function call can not contain a reference" } > +end > + I do not understand this error message, and find it to be confusing. foo(1)%a looks like an invalid array reference. That is, foo is scalar and foo(1) is an array element. PS: s/can not/cannot -- Steve
[C++ PATCH] PR c++/93286 - ICE with __is_constructible and variadic template.
Here we had been recursing in tsubst_copy_and_build if type2 was a TREE_LIST because that function knew how to deal with pack expansions, and tsubst didn't. But tsubst_copy_and_build expects to be dealing with expressions, so we crash when trying to convert_from_reference a type. Tested x86_64-pc-linux-gnu, applying to trunk. * pt.c (tsubst) [TREE_LIST]: Handle pack expansion. (tsubst_copy_and_build) [TRAIT_EXPR]: Always use tsubst for type2. --- gcc/cp/pt.c | 74 ++-- gcc/testsuite/g++.dg/ext/is_constructible4.C | 18 + 2 files changed, 85 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ext/is_constructible4.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 9bb8cc13e5f..872f8ff8f52 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -15350,6 +15350,71 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl) if (t == void_list_node) return t; + if ((TREE_PURPOSE (t) && PACK_EXPANSION_P (TREE_PURPOSE (t))) + || (TREE_VALUE (t) && PACK_EXPANSION_P (TREE_VALUE (t + { + /* We have pack expansions, so expand those and + create a new list out of it. */ + + /* Expand the argument expressions. */ + tree purposevec = NULL_TREE; + if (TREE_PURPOSE (t)) + purposevec = tsubst_pack_expansion (TREE_PURPOSE (t), args, + complain, in_decl); + if (purposevec == error_mark_node) + return error_mark_node; + + tree valuevec = NULL_TREE; + if (TREE_VALUE (t)) + valuevec = tsubst_pack_expansion (TREE_VALUE (t), args, + complain, in_decl); + if (valuevec == error_mark_node) + return error_mark_node; + + /* Build the rest of the list. */ + tree chain = TREE_CHAIN (t); + if (chain && chain != void_type_node) + chain = tsubst (chain, args, complain, in_decl); + if (chain == error_mark_node) + return error_mark_node; + + /* Determine the number of arguments. */ + int len = -1; + if (purposevec && TREE_CODE (purposevec) == TREE_VEC) + { + len = TREE_VEC_LENGTH (purposevec); + gcc_assert (!valuevec || len == TREE_VEC_LENGTH (valuevec)); + } + else if (TREE_CODE (valuevec) == TREE_VEC) + len = TREE_VEC_LENGTH (valuevec); + else + { + /* Since we only performed a partial substitution into + the argument pack, we only RETURN (a single list + node. */ + if (purposevec == TREE_PURPOSE (t) + && valuevec == TREE_VALUE (t) + && chain == TREE_CHAIN (t)) + return t; + + return tree_cons (purposevec, valuevec, chain); + } + + /* Convert the argument vectors into a TREE_LIST. */ + for (int i = len; i-- > 0; ) + { + purpose = (purposevec ? TREE_VEC_ELT (purposevec, i) + : NULL_TREE); + value = (valuevec ? TREE_VEC_ELT (valuevec, i) +: NULL_TREE); + + /* Build the list (backwards). */ + chain = hash_tree_cons (purpose, value, chain); + } + + return chain; + } + purpose = TREE_PURPOSE (t); if (purpose) { @@ -20158,13 +20223,8 @@ tsubst_copy_and_build (tree t, { tree type1 = tsubst (TRAIT_EXPR_TYPE1 (t), args, complain, in_decl); - - tree type2 = TRAIT_EXPR_TYPE2 (t); - if (type2 && TREE_CODE (type2) == TREE_LIST) - type2 = RECUR (type2); - else if (type2) - type2 = tsubst (type2, args, complain, in_decl); - + tree type2 = tsubst (TRAIT_EXPR_TYPE2 (t), args, +complain, in_decl); RETURN (finish_trait_expr (TRAIT_EXPR_LOCATION (t), TRAIT_EXPR_KIND (t), type1, type2)); } diff --git a/gcc/testsuite/g++.dg/ext/is_constructible4.C b/gcc/testsuite/g++.dg/ext/is_constructible4.C new file mode 100644 index 000..6dfe3c01661 --- /dev/null +++ b/gcc/testsuite/g++.dg/ext/is_constructible4.C @@ -0,0 +1,18 @@ +// PR c++/93286 +// { dg-do compile { target c++14 } } + +struct A { static const bool value = true; }; +template using __bool_constant = A; +template +struct B : __bool_constant<__is_constructible(int, _Args...)> {}; +template using enable_if_t = int; +template bool is_constructible_v = B<_Args...>::value; +class C { + template >> + C(_Tp &&); +}; +using Effect_t = C; +void fn1(Effect_t effect) { + int i; + [](int ) {}(i); +} base-commit: 801f5b96775288e55193a66a746caab1ddd56f4a --
Re: limit on emails for merge commits.
On Fri, 17 Jan 2020, Iain Sandoe wrote: > So I know that the policy is under review (and agree, from my PoV, that > something that represents these in a similar way to the SVN single mail > containing all the changes is probably optimum). > > However, right now I’m stuck - and really want to get a short-term resolution > so that i can complete the merge of coroutines onto trunk. > > I guess I could do a squashed merge from trunk - presumably, that would only > generate 1 email? You should include Joel on such questions as the expert on the hooks. I don't know whether there's something to put in the commit message to say "allow this merge of more than 100 commits". I don't think a squashed merge is the right workaround, supposing you do want the git ancestry information to reflect what got merged. The simple workaround is to do three successive merges, each of under 100 commits and each pushed separately. Alternatively, you could check out refs/meta/config, push a change that either increases hooks.max-commit-emails or sets hooks.no-emails to prevent mails for this branch, then push the merge, then push a reversion of the change to refs/meta/config. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] analyzer: fix handling of negative byte offsets (v2) (PR 93281)
On Thu, 2020-01-16 at 09:14 +0100, Jakub Jelinek wrote: > On Wed, Jan 15, 2020 at 06:56:48PM -0500, David Malcolm wrote: > > gcc/analyzer/ChangeLog: > > PR analyzer/93281 > > * region-model.cc > > (region_model::convert_byte_offset_to_array_index): Convert > > offset_cst to ssizetype before dividing by byte_size. > > --- > > gcc/analyzer/region-model.cc | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region- > > model.cc > > index 5c39be4fd7f..b62ddf82a40 100644 > > --- a/gcc/analyzer/region-model.cc > > +++ b/gcc/analyzer/region-model.cc > > @@ -6414,9 +6414,12 @@ > > region_model::convert_byte_offset_to_array_index (tree ptr_type, > >/* This might not be a constant. */ > >tree byte_size = size_in_bytes (elem_type); > > > > + /* Ensure we're in a signed representation before doing the > > division. */ > > + tree signed_offset_cst = fold_convert (ssizetype, > > offset_cst); > > + > >tree index > > = fold_build2 (TRUNC_DIV_EXPR, integer_type_node, > > - offset_cst, byte_size); > > + signed_offset_cst, byte_size); > > I'd say you want to fold_convert byte_size to ssizetype too. > Another thing is the integer_type_node, that looks wrong when you are > dividing ssizetype by ssizetype. The fold-const.c folders are > sensitive to > using incorrect types and type mismatches. > And, I think fold_build2 is wasteful, you only care if you can > simplify it > to a constant (just constant or INTEGER_CST only?). > So, either use fold_binary (TRUNC_DIV_EXPR, ssizetype, offset_cst, > fold_convert (ssizetype, byte_size)) > or, if you have checked that both arguments are INTEGER_CSTs, perhaps > int_const_binop or so. > > >if (CONSTANT_CLASS_P (index)) > > And this would need to be if (index && TREE_CODE (index) == > INTEGER_CST) > (or if you can handle other constants (index && CONSTANT_CLASS_P > (index)). > > > return get_or_create_constant_svalue (index); > > Jakub Thanks. Here's an updated version of the patch which fold_converts both inputs to the division, and uses fold_binary rather than fold_build2. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; verified with -m32 and -m64. OK for master? gcc/analyzer/ChangeLog: PR analyzer/93281 * region-model.cc (region_model::convert_byte_offset_to_array_index): Convert to ssizetype before dividing by byte_size. Use fold_binary rather than fold_build2 to avoid needlessly constructing a tree for the non-const case. --- gcc/analyzer/region-model.cc | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 5c39be4fd7f..f67572e2d45 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -6414,11 +6414,13 @@ region_model::convert_byte_offset_to_array_index (tree ptr_type, /* This might not be a constant. */ tree byte_size = size_in_bytes (elem_type); + /* Try to get a constant by dividing, ensuring that we're in a +signed representation first. */ tree index - = fold_build2 (TRUNC_DIV_EXPR, integer_type_node, - offset_cst, byte_size); - - if (CONSTANT_CLASS_P (index)) + = fold_binary (TRUNC_DIV_EXPR, ssizetype, + fold_convert (ssizetype, offset_cst), + fold_convert (ssizetype, byte_size)); + if (index && TREE_CODE (index) == INTEGER_CST) return get_or_create_constant_svalue (index); } -- 2.21.0
limit on emails for merge commits.
I have a merge commit to devel/c++-coroutines, which is intended to refect the state at the point of commit to trunk. However, I cannot push that commit because: remote: *** This update introduces too many new commits (284), which would remote: *** trigger as many emails, exceeding the current limit (100). remote: *** Contact your repository adminstrator if you really meant remote: *** to generate this many commit emails. remote: error: hook declined to update refs/heads/devel/c++-coroutines To git+ssh://gcc.gnu.org/git/gcc.git ! [remote rejected] devel/c++-coroutines -> devel/c++-coroutines (hook declined) in fact, I usually merged trunk into the branch weekly - this one got slightly delayed because it was intended to be the final one. 284 isn’t a lot of trunk commits for an interval between merges into a devel branch, so this seems likely to fire quite often. —— So I know that the policy is under review (and agree, from my PoV, that something that represents these in a similar way to the SVN single mail containing all the changes is probably optimum). However, right now I’m stuck - and really want to get a short-term resolution so that i can complete the merge of coroutines onto trunk. I guess I could do a squashed merge from trunk - presumably, that would only generate 1 email? cheers Iain
Re: [PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)
On Thu, Jan 16, 2020 at 5:51 PM Andrew Pinski wrote: > > On Thu, Jan 16, 2020 at 5:14 AM Richard Sandiford > wrote: > > > > Wilco Dijkstra writes: > > > The separate shrinkwrapping pass may insert stores in the middle > > > of atomics loops which can cause issues on some implementations. > > > Avoid this by delaying splitting of atomic patterns until after > > > prolog/epilog generation. > > > > > > Bootstrap completed, no test regressions on AArch64. > > > > > > Andrew, can you verify this fixes the failure you were getting? > > > > > > ChangeLog: > > > 2020-01-16 Wilco Dijkstra > > > > > > PR target/92692 > > > * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) > > > Add assert to ensure prolog has been emitted. > > > (aarch64_split_atomic_op): Likewise. > > > * config/aarch64/atomics.md (aarch64_compare_and_swap) > > > Use epilogue_completed rather than reload_completed. > > > (aarch64_atomic_exchange): Likewise. > > > (aarch64_atomic_): Likewise. > > > (atomic_nand): Likewise. > > > (aarch64_atomic_fetch_): Likewise. > > > (atomic_fetch_nand): Likewise. > > > (aarch64_atomic__fetch): Likewise. > > > (atomic_nand_fetch): Likewise. > > > > OK if Andrew confirms it works, thanks. > > Yes this fixes the issue for me. Here is the new assembly showing it worked: d390: f9000bf3str x19, [sp, #16] d394: 885ffdc8ldaxr w8, [x14] d398: 6b01011fcmp w8, w1 d39c: 5461b.ned3a8 // b.any d3a0: 88137dc5stxrw19, w5, [x14] Thanks, Andrew Pinski > > Thanks, > Andrew > > > > > Richard > > > > > --- > > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > > index > > > ac89cc1f9c938455d33d8850d9ebfc0473cb73dc..cd9d813f2ac4990971f6435fdb28b0f94ae10309 > > > 100644 > > > --- a/gcc/config/aarch64/aarch64.c > > > +++ b/gcc/config/aarch64/aarch64.c > > > @@ -18375,6 +18375,9 @@ aarch64_emit_post_barrier (enum memmodel model) > > > void > > > aarch64_split_compare_and_swap (rtx operands[]) > > > { > > > + /* Split after prolog/epilog to avoid interactions with > > > shrinkwrapping. */ > > > + gcc_assert (epilogue_completed); > > > + > > >rtx rval, mem, oldval, newval, scratch, x, model_rtx; > > >machine_mode mode; > > >bool is_weak; > > > @@ -18469,6 +18472,9 @@ void > > > aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, > > > rtx mem, > > > rtx value, rtx model_rtx, rtx cond) > > > { > > > + /* Split after prolog/epilog to avoid interactions with > > > shrinkwrapping. */ > > > + gcc_assert (epilogue_completed); > > > + > > >machine_mode mode = GET_MODE (mem); > > >machine_mode wmode = (mode == DImode ? DImode : SImode); > > >const enum memmodel model = memmodel_from_int (INTVAL (model_rtx)); > > > diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md > > > index > > > c2bcabd0c3c2627b7222dcbc1af9c2e6b7ce6a76..996947799b5ef8445e9786b94e1ce62fd16e5b5c > > > 100644 > > > --- a/gcc/config/aarch64/atomics.md > > > +++ b/gcc/config/aarch64/atomics.md > > > @@ -56,7 +56,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > > > (clobber (match_scratch:SI 7 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_compare_and_swap (operands); > > > @@ -80,7 +80,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > > > (clobber (match_scratch:SI 7 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_compare_and_swap (operands); > > > @@ -104,7 +104,7 @@ (define_insn_and_split > > > "@aarch64_compare_and_swap" > > > (clobber (match_scratch:SI 7 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_compare_and_swap (operands); > > > @@ -223,7 +223,7 @@ (define_insn_and_split "aarch64_atomic_exchange" > > > (clobber (match_scratch:SI 4 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_atomic_op (SET, operands[0], NULL, operands[1], > > > @@ -344,7 +344,7 @@ (define_insn_and_split > > > "aarch64_atomic_" > > >(clobber (match_scratch:SI 4 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_atomic_op (, NULL, operands[3], operands[0], > > > @@ -400,7 +400,7 @@ (define_insn_and_split "atomic_nand" > > > (clobber (match_scratch:SI 4 "="))] > > >"" > > >"#" > > > - "&& reload_completed" > > > + "&& epilogue_completed" > > >[(const_int 0)] > > >{ > > > aarch64_split_atomic_op (NOT, NULL, operands[3], operands[0], > > > @@ -504,7 +504,7
Re: [PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)
On Thu, Jan 16, 2020 at 5:14 AM Richard Sandiford wrote: > > Wilco Dijkstra writes: > > The separate shrinkwrapping pass may insert stores in the middle > > of atomics loops which can cause issues on some implementations. > > Avoid this by delaying splitting of atomic patterns until after > > prolog/epilog generation. > > > > Bootstrap completed, no test regressions on AArch64. > > > > Andrew, can you verify this fixes the failure you were getting? > > > > ChangeLog: > > 2020-01-16 Wilco Dijkstra > > > > PR target/92692 > > * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) > > Add assert to ensure prolog has been emitted. > > (aarch64_split_atomic_op): Likewise. > > * config/aarch64/atomics.md (aarch64_compare_and_swap) > > Use epilogue_completed rather than reload_completed. > > (aarch64_atomic_exchange): Likewise. > > (aarch64_atomic_): Likewise. > > (atomic_nand): Likewise. > > (aarch64_atomic_fetch_): Likewise. > > (atomic_fetch_nand): Likewise. > > (aarch64_atomic__fetch): Likewise. > > (atomic_nand_fetch): Likewise. > > OK if Andrew confirms it works, thanks. Yes this fixes the issue for me. Thanks, Andrew > > Richard > > > --- > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index > > ac89cc1f9c938455d33d8850d9ebfc0473cb73dc..cd9d813f2ac4990971f6435fdb28b0f94ae10309 > > 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -18375,6 +18375,9 @@ aarch64_emit_post_barrier (enum memmodel model) > > void > > aarch64_split_compare_and_swap (rtx operands[]) > > { > > + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. > > */ > > + gcc_assert (epilogue_completed); > > + > >rtx rval, mem, oldval, newval, scratch, x, model_rtx; > >machine_mode mode; > >bool is_weak; > > @@ -18469,6 +18472,9 @@ void > > aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx > > mem, > > rtx value, rtx model_rtx, rtx cond) > > { > > + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. > > */ > > + gcc_assert (epilogue_completed); > > + > >machine_mode mode = GET_MODE (mem); > >machine_mode wmode = (mode == DImode ? DImode : SImode); > >const enum memmodel model = memmodel_from_int (INTVAL (model_rtx)); > > diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md > > index > > c2bcabd0c3c2627b7222dcbc1af9c2e6b7ce6a76..996947799b5ef8445e9786b94e1ce62fd16e5b5c > > 100644 > > --- a/gcc/config/aarch64/atomics.md > > +++ b/gcc/config/aarch64/atomics.md > > @@ -56,7 +56,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > > (clobber (match_scratch:SI 7 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_compare_and_swap (operands); > > @@ -80,7 +80,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > > (clobber (match_scratch:SI 7 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_compare_and_swap (operands); > > @@ -104,7 +104,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > > (clobber (match_scratch:SI 7 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_compare_and_swap (operands); > > @@ -223,7 +223,7 @@ (define_insn_and_split "aarch64_atomic_exchange" > > (clobber (match_scratch:SI 4 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_atomic_op (SET, operands[0], NULL, operands[1], > > @@ -344,7 +344,7 @@ (define_insn_and_split > > "aarch64_atomic_" > >(clobber (match_scratch:SI 4 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_atomic_op (, NULL, operands[3], operands[0], > > @@ -400,7 +400,7 @@ (define_insn_and_split "atomic_nand" > > (clobber (match_scratch:SI 4 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_atomic_op (NOT, NULL, operands[3], operands[0], > > @@ -504,7 +504,7 @@ (define_insn_and_split > > "aarch64_atomic_fetch_" > > (clobber (match_scratch:SI 5 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_atomic_op (, operands[0], operands[4], operands[1], > > @@ -551,7 +551,7 @@ (define_insn_and_split "atomic_fetch_nand" > > (clobber (match_scratch:SI 5 "="))] > >"" > >"#" > > - "&& reload_completed" > > + "&& epilogue_completed" > >[(const_int 0)] > >{ > > aarch64_split_atomic_op (NOT, operands[0], operands[4], operands[1], > > @@ -604,7 +604,7 @@ (define_insn_and_split > >
Re: drop -aux{dir,base}, revamp -dump{dir,base}
On Thu, 16 Jan 2020, Alexandre Oliva wrote: > And here's a followup that fixes a limitation (bug?) in libiberty that > was hit when I attempted a last-minute simplification in lto-wrapper. > > Regstrapped separately on x86_64-linux-gnu. Ok to install? > > > [libiberty] output empty args as a pair of quotes This libiberty / lto-wrapper patch is OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix ICE caused by swallowing a token in c_parser_consume_token
On Thu, 2020-01-16 at 23:43 +, Joseph Myers wrote: > Thanks, patch committed. Beat me to it :-) I spun it yesterday, but didn't get around to committing it. jeff
Re: Copy list of development branches from svn.html into git.html
On Fri, 17 Jan 2020, Jakub Jelinek wrote: > On Thu, Jan 16, 2020 at 11:22:15PM +, Joseph Myers wrote: > > Committed. I think the main remaining things that need work in our git > > documentation on the website are: > > Shouldn't we move some of the branches from Active Development to Inactive? Yes. I only reviewed the ones under "General Infrastructure" (and one of the ones left there should probably be considered inactive as well, being a git-mirror-only branch). If it's ended up in refs/dead/heads/ it can be presumed inactive and some vendor branches are likely inactive as well. (Quite possibly some branches in devel/ are also inactive, but there are few enough at present it doesn't seem a priority to figure out a suitably smooth process for moving such branches into refs/dead/ - at present it would be necessary to create the refs/dead/ branch and remove the devel/ one manually on the server. It may be better to leave any such moves out of the fetched-by-default branches into not-fetched-by-default refs until after the new sourceware is in use, since the new system should have enough resources for fully repacking the live repository on sourceware to be reasonable, and such a full repack is appropriate after moving anything out of the fetched-by-default refs in order to update the delta islands used in the packing.) -- Joseph S. Myers jos...@codesourcery.com
[PATCH] vect: Fix ICE in vectorizable_comparison PR93292
Hi! The following testcase ICEs on powerpc64le-linux. The problem is that get_vectype_for_scalar_type returns NULL, and while most places in tree-vect-stmts.c handle that case, this spot doesn't and punts only if it is non-NULL, but with different number of elts than expected. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2020-01-16 Jakub Jelinek PR tree-optimization/93292 * tree-vect-stmts.c (vectorizable_comparison): Punt also if get_vectype_for_scalar_type returns NULL. * g++.dg/opt/pr93292.C: New test. --- gcc/tree-vect-stmts.c.jj2020-01-12 11:54:38.522381590 +0100 +++ gcc/tree-vect-stmts.c 2020-01-16 19:42:30.60882 +0100 @@ -10492,7 +10492,7 @@ vectorizable_comparison (stmt_vec_info s { vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1), slp_node); - if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) + if (!vectype || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits)) return false; } else if (maybe_ne (nunits, TYPE_VECTOR_SUBPARTS (vectype))) --- gcc/testsuite/g++.dg/opt/pr93292.C.jj 2020-01-16 19:48:51.110144613 +0100 +++ gcc/testsuite/g++.dg/opt/pr93292.C 2020-01-16 19:47:57.351956177 +0100 @@ -0,0 +1,18 @@ +// PR tree-optimization/93292 +// { dg-do compile } +// { dg-options "-O3 -w" } + +struct A { + static int foo (float x) { static int b; b = x ? x + 0.5 : 0; return b; } +}; + +void +bar (int *d, float e) +{ + float g; + for (int h = 0; h < 64; h++) +{ + d[h] += A::foo (g < 0 ? : g > 5 ? : g); + A::foo (e); +} +} Jakub
Re: [PATCH] Fix ICE caused by swallowing a token in c_parser_consume_token
Thanks, patch committed. -- Joseph S. Myers jos...@codesourcery.com
Re: Copy list of development branches from svn.html into git.html
On Thu, Jan 16, 2020 at 11:22:15PM +, Joseph Myers wrote: > Committed. I think the main remaining things that need work in our git > documentation on the website are: Shouldn't we move some of the branches from Active Development to Inactive? I mean, some branches haven't been touched since 2013, others since 2014 or 2011 (e.g. avx{512,512vlbwdq,2}), but various others are similarly inactive. I guess I should post a patch for the redhat/ branches, because the ones listed in there are certainly all inactive, maybe it is better to use a wildcard like it was used for linaro. Jakub
Copy list of development branches from svn.html into git.html
This patch makes a start on making the branch information from svn.html available in git.html. The list is copied, some notes added on devel/, refs/dead/heads/ and refs/vendors/ naming, and inactive branches from the general section (ones not present in refs/heads/devel/, automatically moved to refs/dead/heads/ as part of the conversion, because of lack of commits since at least December 2017) are moved into the section of inactive branches that might or might not have been merged. Significant further work on the list is needed to ensure that all inactive branches are moved to inactive sections, that active branches are properly listed in active sections, and that it is clear for each branch where it appears in git (including vendor branches, and git-mirror-only branches that are now in refs/git-old/heads). Committed. I think the main remaining things that need work in our git documentation on the website are: * More details on branch/tag naming in general, and how to use not-fetched-by-default branches. * More work on the list of branches, as discussed above. * Something about the gcc-git-customization.sh script. * A realistic example commit session like the one in svnwrite.html. * Documentation of rebasing a branch that was originally created in the git-svn mirror. * Something about appropriate formatting and content of commit messages. * At an appropriate point svn.html and svnwrite.html should be redirected to git.html and gitwrite.html. Until then they should probably have warnings at the top that the SVN repository is now read-only and superseded by the Git repository. diff --git a/htdocs/git.html b/htdocs/git.html index 725bd81c..6100146e 100644 --- a/htdocs/git.html +++ b/htdocs/git.html @@ -197,5 +197,1079 @@ To create a worktree for a new project branch based on master, do git worktree add -b project ../project master +Active Development Branches + + +General +Architecture +Target +Language +Distribution +Merged +Inactive + + + +General Infrastructure + +Active development branches have names starting devel/ +in Git. + + + + ira-select + This branch is for work on a new algorithm of calculations of + pseudo register classes. The new algorithm is based on choosing an + insn alternative first and then calculation of pseudo reg class + costs knowing the alternative. The branch is maintained by Vladimir + Makarov mailto:vmaka...@redhat.com;>vmaka...@redhat.com. + + + https://gcc.gnu.org/wiki/OpenACC;>openacc-gcc-9-branch + This https://gcc.gnu.org/wiki/GitMirror;>Git-only branch is + used for collaborative development + of https://gcc.gnu.org/wiki/OpenACC;>OpenACC support and related + functionality, such + as https://gcc.gnu.org/wiki/Offloading;>offloading support. The + branch is based on gcc-9-branch. Find it + at git://gcc.gnu.org/git/gcc.git, + https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch;>https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch, + or + https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch;>https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch. + Please send patch emails with a short-hand [og9] tag in the + subject line, and use ChangeLog.openacc files. + + unified-autovect + This branch is for work on improving effectiveness and generality of GCC's + autovectorization by performing target-aware reordering instruction selection + using unified representation. This branch is maintained by Sameera Deshpande + mailto:sameera.deshpa...@imgtec.com;>sameera.deshpa...@imgtec.com. + + + +Architecture-specific + + + https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/aarch64/sve-acle-branch;>aarch64/sve-acle-branch + This https://gcc.gnu.org/wiki/GitMirror;>Git-only branch is + used for collaborative development of the AArch64 SVE ACLE implementation. + The branch is based off and merged with trunk. Please send patches to + gcc-patches with an [SVE ACLE] tag in the subject line. + There's no need to use ChangeLogs; the ChangeLogs will instead be + written when the work is ready to be merged into trunk. The branch is + maintained by Richard Sandiford. + + arc-20081210-branch + The goal of this branch is to make the port to the ARCompact + architecture available. This branch is maintained by Joern Rennecke + during spring 2009, and is expected to be unmaintained thereafter. + + avx512 + The goal of this branch is to implement Intel AVX-512 and SHA + Programming Reference. + The branch is maintained by Yukhin Kirill mailto:kirill.yuk...@intel.com;>kirill.yuk...@intel.com. + Patches should be marked with the tag [AVX512] in the subject + line. + + avx-512vlbwdq + The goal of this branch is to implement the Intel AVX-512{VL,BW,DQ} + Programming Reference + (https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf;>link). + The branch is maintained by Yukhin Kirill
Make profile estimation more precise
Hi, while analyzing code size regression in SPEC2k GCC binary I noticed that we perform some inline decisions because we think that number of executions are very high. In particular there was inline decision inlining gen_rtx_fmt_ee to find_reloads believing that it is called 4 billion times. This turned out to be cummulation of roundoff errors in propagate_freq which was bit mechanically updated from original sreals to C++ sreals and later to new probabilities. This led us to estimate that a loopback edge is reached with probability 2.3 which was capped to 1-1/1 and since this happened in nested loop it quickly escalated to large values. Originally capping to REG_BR_PROB_BASE avoided such problems but now we have much higher range. This patch avoids going from probabilites to REG_BR_PROB_BASE so precision is kept. In addition it makes the propagation to not estimate more than param-max-predicted-loop-iterations. The first change makes the cap to not be triggered on the gcc build, but it is still better to be safe than sorry. 2020-01-16 Jan Hubicka * ipa-fnsummary.c (estimate_calls_size_and_time): Fix formating of dump. * params.opt: (max-predicted-iterations): Set bounds. * predict.c (real_almost_one, real_br_prob_base, real_inv_br_prob_base, real_one_half, real_bb_freq_max): Remove. (propagate_freq): Add max_cyclic_prob parameter; cap cyclic probabilities; do not truncate to reg_br_prob_bases. (estimate_loops_at_level): Pass max_cyclic_prob. (estimate_loops): Compute max_cyclic_prob. (estimate_bb_frequencies): Do not initialize real_*; update calculation of back edge prob. * profile-count.c (profile_probability::to_sreal): New. * profile-count.h (class sreal): Move up in file. (profile_probability::to_sreal): Declare. diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index 4e1c587b101..dbd53f12a40 100644 --- a/gcc/ipa-fnsummary.c +++ b/gcc/ipa-fnsummary.c @@ -3258,7 +3258,7 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, gcc_assert (*size == old_size); if (time && (*time - old_time > 1 || *time - old_time < -1) && dump_file) - fprintf (dump_file, "Time mismatch in call summary %f!=%f", + fprintf (dump_file, "Time mismatch in call summary %f!=%f\n", old_time.to_double (), time->to_double ()); } diff --git a/gcc/params.opt b/gcc/params.opt index 31cc20031b1..f02c769d0e3 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -555,7 +555,7 @@ Common Joined UInteger Var(param_max_pow_sqrt_depth) Init(5) IntegerRange(1, 32) Maximum depth of sqrt chains to use when synthesizing exponentiation by a real constant. -param=max-predicted-iterations= -Common Joined UInteger Var(param_max_predicted_iterations) Init(100) Param Optimization +Common Joined UInteger Var(param_max_predicted_iterations) Init(100) IntegerRange(1, 65536) Param Optimization The maximum number of loop iterations we predict statically. -param=max-reload-search-insns= diff --git a/gcc/predict.c b/gcc/predict.c index 02c5fe0667d..c3aed9ed854 100644 --- a/gcc/predict.c +++ b/gcc/predict.c @@ -76,10 +76,6 @@ enum predictor_reason static const char *reason_messages[] = {"", " (ignored)", " (single edge duplicate)", " (edge pair duplicate)"}; -/* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE, - 1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. */ -static sreal real_almost_one, real_br_prob_base, -real_inv_br_prob_base, real_one_half, real_bb_freq_max; static void combine_predictions_for_insn (rtx_insn *, basic_block); static void dump_prediction (FILE *, enum br_predictor, int, basic_block, @@ -3266,7 +3262,8 @@ public: TOVISIT, starting in HEAD. */ static void -propagate_freq (basic_block head, bitmap tovisit) +propagate_freq (basic_block head, bitmap tovisit, + sreal max_cyclic_prob) { basic_block bb; basic_block last; @@ -3322,22 +3319,14 @@ propagate_freq (basic_block head, bitmap tovisit) FOR_EACH_EDGE (e, ei, bb->preds) if (EDGE_INFO (e)->back_edge) - { - cyclic_probability += EDGE_INFO (e)->back_edge_prob; - } + cyclic_probability += EDGE_INFO (e)->back_edge_prob; else if (!(e->flags & EDGE_DFS_BACK)) { - /* frequency += (e->probability - * BLOCK_INFO (e->src)->frequency / - REG_BR_PROB_BASE); */ - /* FIXME: Graphite is producing edges with no profile. Once this is fixed, drop this. */ sreal tmp = e->probability.initialized_p () ? - e->probability.to_reg_br_prob_base () : 0; - tmp *= BLOCK_INFO (e->src)->frequency; - tmp *=
[wwwdocs] Remove reference to SVN and our svn.html page from projects/.
...usually by simplifying and removing the link. (And add a blank linke between the and tags where there was none.) Pushed. Gerald --- htdocs/projects/beginner.html | 4 ++-- htdocs/projects/cfo.html | 3 +-- htdocs/projects/cli.html | 3 +-- htdocs/projects/cxx-reflection/index.html | 2 +- htdocs/projects/gomp/index.html | 5 ++--- htdocs/projects/gupc.html | 5 ++--- htdocs/projects/sched-treegion.html | 4 ++-- htdocs/projects/tree-ssa/index.html | 3 ++- 8 files changed, 13 insertions(+), 16 deletions(-) diff --git a/htdocs/projects/beginner.html b/htdocs/projects/beginner.html index c4b7214c..0a705dbf 100644 --- a/htdocs/projects/beginner.html +++ b/htdocs/projects/beginner.html @@ -24,8 +24,8 @@ individual task seems daunting; there's probably an easier one. If you have no programming skills, we can still use your help with documentation or our bug tracker. -We assume that you already know how to get the -latest sources, configure and build the compiler, and run the test +We assume that you already know how to get the +latest sources, configure and build the compiler, and run the test suite. You should also familiarize yourself with the requirements for contributions to GCC. diff --git a/htdocs/projects/cfo.html b/htdocs/projects/cfo.html index 4ec8c82b..8f4b4230 100644 --- a/htdocs/projects/cfo.html +++ b/htdocs/projects/cfo.html @@ -47,8 +47,7 @@ algorithms). The implementation currently resides on the branch. Contributing -Checkout the cfo-branch branch from -our respository. +Checkout the cfo-branch branch from our respository. When posting to the development lists, please mark messages and patches with [cfo] in the subject. The usual contribution and testing rules apply. This diff --git a/htdocs/projects/cli.html b/htdocs/projects/cli.html index fcbcbaed..f6258fc1 100644 --- a/htdocs/projects/cli.html +++ b/htdocs/projects/cli.html @@ -134,8 +134,7 @@ People currently involved in the development or that had worked on it in the pas Contributing -Check out st/cli-be branch following the instructions found in the -SVN documentation. +Check out st/cli-be branch. Being this a branch, the usual maintainer rules do not apply. The branch is being maintained by diff --git a/htdocs/projects/cxx-reflection/index.html b/htdocs/projects/cxx-reflection/index.html index 2f5dd953..276327aa 100644 --- a/htdocs/projects/cxx-reflection/index.html +++ b/htdocs/projects/cxx-reflection/index.html @@ -25,7 +25,7 @@ constrained genericity in C++. Contributing Checkout the cxx-reflection-branch branch -in our respository. +in our respository. When posting to the development lists, please mark messages and patches with [cxx-reflection] in the subject. As this is a branch, and not diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html index b96ce0d4..39c8d3ad 100644 --- a/htdocs/projects/gomp/index.html +++ b/htdocs/projects/gomp/index.html @@ -1,5 +1,6 @@ + GOMP An OpenMP implementation for GCC @@ -50,9 +51,7 @@ the generation of efficient and small code for OpenMP applications. Contributing We encourage everyone to contribute changes -and help test GOMP. GOMP has been merged into the SVN trunk. We -provide read access to our development sources for everybody with -anonymous SVN. +and help test GOMP. GOMP has been merged into mainline GCC. Reporting Bugs Please add "openmp" to the keywords field when filing a bug report. diff --git a/htdocs/projects/gupc.html b/htdocs/projects/gupc.html index 9d3aeda0..3b822ca3 100644 --- a/htdocs/projects/gupc.html +++ b/htdocs/projects/gupc.html @@ -1,5 +1,6 @@ + GUPC A UPC implementation for GCC @@ -117,9 +118,7 @@ in the body of the message. We encourage everyone to contribute changes -and help test GUPC. GUPC is currently an SVN branch. We -provide read access to our development sources for everybody with -anonymous SVN. +and help test GUPC. GUPC is currently on a development branch. Reporting Bugs diff --git a/htdocs/projects/sched-treegion.html b/htdocs/projects/sched-treegion.html index b906bcf0..d421d87b 100644 --- a/htdocs/projects/sched-treegion.html +++ b/htdocs/projects/sched-treegion.html @@ -1,5 +1,6 @@ + Treegion Scheduling @@ -151,8 +152,7 @@ speculation for instructions dominated by this basic block. Contributing -Checkout the sched-treegion branch following the instructions found in the -SVN documentation. +Checkout the sched-treegion branch. When posting to the development lists, please mark messages and patches with [sched-treegion] in the subject. The usual contribution and testing diff --git a/htdocs/projects/tree-ssa/index.html b/htdocs/projects/tree-ssa/index.html index 0b2e01d5..a15d0f32 100644 --- a/htdocs/projects/tree-ssa/index.html +++ b/htdocs/projects/tree-ssa/index.html @@ -1,5 +1,6 @@ + SSA for Trees @@ -80,7 +81,7 @@
[C++ PATCH] PR c++/93280 - ICE with aggregate assignment and DMI.
I recently added an assert to cp-gimplify to catch any TARGET_EXPR_DIRECT_INIT_P being expanded without a target object, and this testcase found one. We started out with a TARGET_EXPR around the CONSTRUCTOR, which would normally mean that the member initializer would be used to directly initialize the appropriate member of whatever object the TARGET_EXPR ends up initializing. But then gimplify_modify_expr_rhs stripped the TARGET_EXPR in order to assign directly from the elements of the CONSTRUCTOR, leaving no object for the TARGET_EXPR_DIRECT_INIT_P to initialize. I considered setting CONSTRUCTOR_PLACEHOLDER_BOUNDARY in that case, which implies TARGET_EXPR_NO_ELIDE, but decided that there's no particular reason the A initializer needs to initialize a member of a B rather than a distinct A object, so let's only set TARGET_EXPR_DIRECT_INIT_P when we're using the DMI in a constructor. Tested x86_64-pc-linux-gnu, applying to trunk. * init.c (get_nsdmi): Set TARGET_EXPR_DIRECT_INIT_P here. * typeck2.c (digest_nsdmi_init): Not here. --- gcc/cp/init.c| 4 gcc/cp/typeck2.c | 3 --- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/cp/init.c b/gcc/cp/init.c index ba80474e6ac..543d127abcd 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -655,6 +655,10 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain) if (simple_target) init = TARGET_EXPR_INITIAL (init); init = break_out_target_exprs (init, /*loc*/true); + if (in_ctor && init && TREE_CODE (init) == TARGET_EXPR) +/* This expresses the full initialization, prevent perform_member_init from + calling another constructor (58162). */ +TARGET_EXPR_DIRECT_INIT_P (init) = true; if (simple_target && TREE_CODE (init) != CONSTRUCTOR) /* Now put it back so C++17 copy elision works. */ init = get_target_expr (init); diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index f36a5649ae6..371b203c29b 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -1369,9 +1369,6 @@ digest_nsdmi_init (tree decl, tree init, tsubst_flags_t complain) && CP_AGGREGATE_TYPE_P (type)) init = reshape_init (type, init, complain); init = digest_init_flags (type, init, flags, complain); - if (TREE_CODE (init) == TARGET_EXPR) -/* This represents the whole initialization. */ -TARGET_EXPR_DIRECT_INIT_P (init) = true; return init; } base-commit: 98d56ea8900fdcff8f1987cf2bf499a5b7399857 -- 2.18.1
[patch, fortran] Fix PR 44960, accepts invalid reference on function
Hello world, here is my first patch made from the git world. It certainly took enough time to work out how to to this, but I think I have it figured out now... Anyway, the fix is rather straightforward. We cannot have a reference on a function. If this slipped through on parsing, let's issue the error during resolution. Regression-tested. OK for trunk? Regards Thomas 2020-01-16 Thomas König PR fortran/44960 * resolve.c (resolve_function): Issue error when a function call contains a reference. 2020-01-16 Thomas König PR fortran/44960 * gfortran.dg/function_reference_1.f90: New test. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 6f2a4c4d65a..1525c00ea4c 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -3129,6 +3129,13 @@ resolve_function (gfc_expr *expr) || sym->intmod_sym_id == GFC_ISYM_CAF_SEND)) return true; + if (expr->ref) +{ + gfc_error ("Function call can not contain a reference at %L", + >where); + return false; +} + if (sym && sym->attr.intrinsic && !gfc_resolve_intrinsic (sym, >where)) return false; diff --git a/gcc/testsuite/gfortran.dg/function_reference_1.f90 b/gcc/testsuite/gfortran.dg/function_reference_1.f90 new file mode 100644 index 000..78c92a6f20d --- /dev/null +++ b/gcc/testsuite/gfortran.dg/function_reference_1.f90 @@ -0,0 +1,11 @@ +! { dg-do compile } +! PR 44960 - this was erroneusly accepted. +! Original test case by Daniel Franke. + +type t + integer :: a +end type t +type(t) :: foo +print *, foo(1)%a ! { dg-error "Function call can not contain a reference" } +end + ! { dg-do compile } ! PR 44960 - this was erroneusly accepted. ! Original test case by Daniel Franke. type t integer :: a end type t type(t) :: foo print *, foo(1)%a ! { dg-error "Function call can not contain a reference" } end
[wwwdocs] Promote Git on our home page.
Pushed. Gerald --- htdocs/index.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/index.html b/htdocs/index.html index 89bcfcca..41bcfe18 100644 --- a/htdocs/index.html +++ b/htdocs/index.html @@ -32,7 +32,7 @@ of native and cross targets (including GNU/Linux), and encourage everyone to contribute changes or help testing GCC. Our sources are readily and freely available via -SVN and weekly +Git and weekly snapshots. Major decisions about GCC are made by the -- 2.24.1
Re: [PATCH] libstdc++/91223 Improve unordered containers == operator
On 1/16/20 5:01 PM, Jonathan Wakely wrote: On 16/01/20 13:25 +, Jonathan Wakely wrote: On 16/01/20 07:42 +0100, François Dumont wrote: On 1/15/20 10:52 PM, Jonathan Wakely wrote: On 15/01/20 21:48 +, Jonathan Wakely wrote: On 14/01/20 22:25 +0100, François Dumont wrote: On 1/13/20 10:53 PM, Jonathan Wakely wrote: On 13/01/20 22:41 +0100, François Dumont wrote: For the multi-keys we could still avoid redundant comparisons when _Equal is just doing == on the key type. On unordered_multiset we could just avoids the call to is_permuation and on the unordered_multimap we could check the is_permutation only on the associated value rather than on the std::pair. I don't think that's necessary, or helpful. The idea of https://gcc.gnu.org/ml/libstdc++/2020-01/msg00070.html is that you shouldn't be using _Equal at all, and therefore it doesn't matter whether it's std::equal_to or not. And it was indeed possible. Nice! PR libstdc++/91223 * include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend. * include/bits/hashtable_policy.h: Include . (_Equality_base): Remove. (_Equality<>::_M_equal): Review implementation. Use std::is_permutation. * testsuite/23_containers/unordered_multiset/operators/1.cc (Hash, Equal, test02, test03): New. * testsuite/23_containers/unordered_set/operators/1.cc (Hash, Equal, test02, test03): New. Tested under Linux x86_64. Ok to commit ? Yes, OK for trunk (we're in stage4 but your patch was posted in stage3 and fixes a pretty nasty performance bug, so is OK now). N.B. GCC has moved to Git instead of Subversion. If you don't have Git access set up let me know and I can commit the patch for you. I haven't done the move yet and won't be able to do it before the week-end. So please proceed to the commit for me, thanks. No problem, I can do that. Your patch is now committed to trunk. Thanks for the major improvement. I had a look at std::is_permutation and I think we can make some simplifications to the 4-argument overload, and we can share most of the code between the 3-arg and 4-arg overloads (once they've confirmed the lengths are the same they do exactly the same thing). See the attached patch. This should probably wait for stage1 though. I also wanted to add some comments to the _Equality::_M_equal specialiation for unordered_multimap/multiset to explain what the code was doing, and had some more ideas. See patch again. It looks like this loop can potentially visit every element of __other, instead of stopping at the end of the bucket: typename __hashtable::const_iterator __ity(__y_n); for (auto __ity_end = __ity; __ity_end != __other.end(); ++__ity_end) if (--__x_count == 0) break; Consider a case like this: unordered_multiset a{1, 2, 3, 4}; for (int i = 0; i <1; ++i) a.insert(1); unordered_multiset b{1, 2, 3, 4}; for (int i = 0; i <1; ++i) b.insert(2); When doing a == b we'll find 1 elements in a with key '1', and find one element in b with that key. But then we iterate through every element in b after that one, even though they have different keys and are probably in different buckets. Instead of just iterating from __ity to __other.end(), can we use a local iterator so we stop at the end of the bucket? This seems to make the PR91263 example *very* slightly slower, but makes the example above significantly faster. What do you think? The hashtable implementation is doing its best to provide good performances as long as the user does its part of the job. Mainly provide a good hash to distrubute elements smoothly throughout the buckets. But also avoid this kind of unordered_multiset. If you check if you don't move out of bucket you'll have to pay for the bucket computation (subject of PR 68303) or perform a redundant _Equal to check when we left the range of equivalent elements like it used to be done. Current implementation leave it to the std::is_permutation to do that which in normal situation will be better I think.
Re: [PATCH] Fix noreorder symbol partitioning reversion.
> Hi. > > The patch is fixes a regression in libgcrypt package where > we incorrectly forget to stream out a definition of a no-reorder symbol. > It's caused by LTO balanced map reversion, where we do not revert > also best_noreorder_pos. > > It's pre-approved patch by Honza and I'm going to install it. > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Thanks, > Martin > > > gcc/lto/ChangeLog: > > 2020-01-16 Martin Liska > > * lto-partition.c (lto_balanced_map): Remember > best_noreorder_pos and then restore to it > when we revert. Thanks, also please backport it to release branches once it survives some testing in mainline. Honza
[PATCH] Fix noreorder symbol partitioning reversion.
Hi. The patch is fixes a regression in libgcrypt package where we incorrectly forget to stream out a definition of a no-reorder symbol. It's caused by LTO balanced map reversion, where we do not revert also best_noreorder_pos. It's pre-approved patch by Honza and I'm going to install it. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Thanks, Martin gcc/lto/ChangeLog: 2020-01-16 Martin Liska * lto-partition.c (lto_balanced_map): Remember best_noreorder_pos and then restore to it when we revert. --- gcc/lto/lto-partition.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 3a9990903c7..8e0488ab13e 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -471,6 +471,7 @@ void lto_balanced_map (int n_lto_partitions, int max_partition_size) { int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0; + int best_noreorder_pos = 0; auto_vec order (symtab->cgraph_count); auto_vec noreorder; auto_vec varpool_order; @@ -732,6 +733,7 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size) best_i = i; best_n_nodes = lto_symtab_encoder_size (partition->encoder); best_varpool_pos = varpool_pos; + best_noreorder_pos = noreorder_pos; } if (dump_file) fprintf (dump_file, "Step %i: added %s, size %i, " @@ -752,6 +754,7 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size) i - best_i, best_i); undo_partition (partition, best_n_nodes); varpool_pos = best_varpool_pos; + noreorder_pos = best_noreorder_pos; } gcc_assert (best_size == partition->insns); i = best_i;
[PATCH] Redirect cvswrite.html to gitwrite.html instead of svnwrite.html.
I doubt there's many references to https://gcc.gnu.org/cvs.html left, but just in case - Git is the new SVN is the new CVS. :-) Gerald --- htdocs/.htaccess | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/htdocs/.htaccess b/htdocs/.htaccess index acaac093..a28af129 100644 --- a/htdocs/.htaccess +++ b/htdocs/.htaccess @@ -58,7 +58,7 @@ Redirect permanent /java/ https://gcc.gnu.org/ Redirect permanent /bugs.html https://gcc.gnu.org/bugs/ Redirect permanent /c9xstatus.html https://gcc.gnu.org/c99status.html -Redirect permanent /cvswrite.html https://gcc.gnu.org/svnwrite.html +Redirect permanent /cvswrite.html https://gcc.gnu.org/gitwrite.html Redirect permanent /gnats.html https://gcc.gnu.org/bugs/ Redirect permanent /proj-bp.html https://gcc.gnu.org/projects/bp/main.html Redirect permanent /proj-cpplib.html https://gcc.gnu.org/projects/cpplib.html -- 2.24.1
Re: [committed, amdgcn] Allow constants in vector extends and truncates
On 19/12/2019 17:39, Richard Sandiford wrote: Andrew Stubbs writes: This patch changes the operand predicates such that vector constants are permitted during compilation. This prevents ICEs caused by the compiler trying to emit such instructions without checking. That sounds like a target-independent bug though. Why didn't we apply the predicates in the normal way? Sorry, I've just got back to checking this. I've now confirmed that this is not a target independent bug. The code generating the instructions is in expanders elsewhere in this machine description. They don't attempt to fold constants, so we get an ICE when the following vregs pass attempts to recognise the instructions. By relaxing the predicates we can allow the compiler to fold the instructions in the normal way, and keep the other expanders' code straight-forward. Andrew
Re: [PATCH][Arm] Only enable fsched-pressure with Ofast
ping The current pressure scheduler doesn't appear to correctly track register pressure and avoid creating unnecessary spills when register pressure is high. As a result disabling the early scheduler improves integer performance considerably and reduces codesize as a bonus. Since scheduling floating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06 Wilco Dijkstra * gcc/common/config/arm-common.c (arm_option_optimization_table): Enable fsched_pressure with Ofast only. -- diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index 41a920f6dc96833e778faa8dbcc19beac483734c..b761d3abd670a144a593c4b410b1e7fbdcb52f56 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -38,7 +38,7 @@ static const struct default_options arm_option_optimization_table[] = { /* Enable section anchors by default at -O1 or higher. */ { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 }, -{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, +{ OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } };
Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream
> On Jan 16, 2020, at 9:24 AM, Jonathan Wakely wrote: > > Thanks to everyone for their feedback. I've pushed the patch to GCC > master now (I think it's reasonable to consider "doesn't build on > NetBSD anymore" as a regression and so fixable at this point in GCC's > development). > > Please let me know of any problems you find with the changes. Great, thanks Jonathan! -- thorpej
Re: [PATCH][AARCH64] Enable compare branch fusion
ping Enable the most basic form of compare-branch fusion since various CPUs support it. This has no measurable effect on cores which don't support branch fusion, but increases fusion opportunities on cores which do. Bootstrapped on AArch64, OK for commit? ChangeLog: 2019-12-24 Wilco Dijkstra * config/aarch64/aarch64.c (generic_tunings): Add branch fusion. (neoversen1_tunings): Likewise. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index a3b18b381e1748f8fe5e522bdec4f7c850821fe8..1c32a3543bec4031cc9b641973101829c77296b5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -726,7 +726,7 @@ static const struct tune_params generic_tunings = SVE_NOT_IMPLEMENTED, /* sve_width */ 4, /* memmov_cost */ 2, /* issue_rate */ - (AARCH64_FUSE_AES_AESMC), /* fusible_ops */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ "16:12", /* function_align. */ "4", /* jump_align. */ "8", /* loop_align. */ @@ -1130,7 +1130,7 @@ static const struct tune_params neoversen1_tunings = SVE_NOT_IMPLEMENTED, /* sve_width */ 4, /* memmov_cost */ 3, /* issue_rate */ - AARCH64_FUSE_AES_AESMC, /* fusible_ops */ + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ "32:16", /* function_align. */ "32:16", /* jump_align. */ "32:16", /* loop_align. */
Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1
ping Testing shows the setting of 32:16 for jump alignment has a significant codesize cost, however it doesn't make a difference in performance. So set jump-align to 4 to get 1.6% codesize improvement. OK for commit? ChangeLog 2019-12-24 Wilco Dijkstra * config/aarch64/aarch64.c (neoversen1_tunings): Set jump_align to 4. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1646ed1d9a3de8ee2f0abff385a1ea145e234475..209ed8ebbe81104d9d8cff0df31946ab7704fb33 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1132,7 +1132,7 @@ static const struct tune_params neoversen1_tunings = 3, /* issue_rate */ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops */ "32:16", /* function_align. */ - "32:16", /* jump_align. */ + "4", /* jump_align. */ "32:16", /* loop_align. */ 2, /* int_reassoc_width. */ 4, /* fp_reassoc_width. */
Re: [PATCH] testsuite: Unbreak compat.exp testing with alt compiler (PR testsuite/93294)
On Thu, 2020-01-16 at 18:01 +0100, Jakub Jelinek wrote: > Hi! > > The PR87488 changes include: > * lib/prune.exp (TEST_ALWAYS_FLAGS): Add -fdiagnostics- > urls=never. > Unfortunately, this broke all the compat.exp etc. compatibility > testing with > alternate compilers, because those compilers (likely) don't > understand this > new option and thus all alt compilations FAIL and everything that > needs it > is then UNRESOLVED. Sorry about that. > Fixed thusly, by handling it like -fdiagnostics-color=never and other > similar options. > > Tested on x86_64-linux, ok for trunk? Looks sane to me, by analogy with the "color" handling. Dave
Re: [PATCH] libstdcxx: Update ctype_base.h from NetBSD upstream
On 14/01/20 08:56 -0800, Jason Thorpe wrote: On Jan 14, 2020, at 8:49 AM, Kamil Rytarowski wrote: On 10.01.2020 17:11, Jonathan Wakely wrote: On 07/01/20 12:44 -0800, Jason Thorpe wrote: On Jan 7, 2020, at 7:43 AM, Jonathan Wakely wrote: For Jason and Krister's benefit, that last comment was referring to an earlier suggestion to not try to support old NetBSD releases, see https://gcc.gnu.org/ml/libstdc++/2020-01/msg00026.html I think we need the netbsd target maintainers (CC'd) to decide whether GCC should still support older releases or drop that support for GCC 10. Usually I'd say we need a period of deprecation, but if GCC doesn't currently build on NetBSD then maybe that's unnecessary. The affected NetBSD versions are NetBSD 6 and earlier, which are EOL from the NetBSD perspective, so I think this is OK. So is this patch OK then? Looks good to me. Now that we have Kamil's confirmation that it's working for him, it also gets a thumbs-up from me. Thanks! Thanks to everyone for their feedback. I've pushed the patch to GCC master now (I think it's reasonable to consider "doesn't build on NetBSD anymore" as a regression and so fixable at this point in GCC's development). Please let me know of any problems you find with the changes.
Re: [PATCH][AArch64] Enable CLI for Armv8.6-A f64mm
Matthew Malcomson writes: > This patch is necessary for sve-ld1ro intrinsic I posted in > https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00466.html . > > I had mistakenly thought this option was already enabled upstream. > > This provides the option +f64mm, that turns on the 64 bit floating point > matrix multiply extension. This extension is only available for > AArch64. Turning on this extension also turns on the SVE extension. > > This extension is optional and only available at Armv8.2-A and onward. > > We also add the ACLE defined macro for this extension. > > Tested on aarch64 cross compiler from x86. > > gcc/ChangeLog: > > 2020-01-13 Matthew Malcomson > > * config/aarch64/aarch64-c.c (_ARM_FEATURE_MATMUL_FLOAT64): > Introduce this ACLE specified predefined macro. > * config/aarch64/aarch64-option-extensions.def (f64mm): New. > (fp): Disabling this disables f64mm. > (simd): Disabling this disables f64mm. > (fp16): Disabling this disables f64mm. > (sve): Disabling this disables f64mm. > * config/aarch64/aarch64.h (AARCH64_FL_F64MM): New. > (AARCH64_ISA_F64MM): New. > (TARGET_F64MM): New. > * doc/invoke.texi (f64mm): Document new option. Sorry for the slow reply, somehow missed this originally. > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def > b/gcc/config/aarch64/aarch64-option-extensions.def > index > 5022a1b3552f35364e35b3955bf2c39a33ab0752..6057b33033a0f3a8be7d656dc1e459d4d93b2842 > 100644 > --- a/gcc/config/aarch64/aarch64-option-extensions.def > +++ b/gcc/config/aarch64/aarch64-option-extensions.def > @@ -53,26 +53,26 @@ > /* Enabling "fp" just enables "fp". > Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", > "sha3", sm3/sm4, "sve", "sve2", "sve2-aes", "sve2-sha3", "sve2-sm4", > - "sve2-bitperm", "i8mm" and "bf16". */ > + "sve2-bitperm", "i8mm", "f64mm", and "bf16". */ > AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | \ > AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | \ > AARCH64_FL_SHA2 | AARCH64_FL_SHA3 | AARCH64_FL_SM4 | \ > AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_SVE2_AES | \ > AARCH64_FL_SVE2_SHA3 | AARCH64_FL_SVE2_SM4 | \ > AARCH64_FL_SVE2_BITPERM | AARCH64_FL_I8MM | \ > - AARCH64_FL_BF16, false, "fp") > + AARCH64_FL_F64MM | AARCH64_FL_BF16, false, "fp") > > /* Enabling "simd" also enables "fp". > Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3", > "sm3/sm4", "sve", "sve2", "sve2-aes", "sve2-sha3", "sve2-sm4", > - "sve2-bitperm", and "i8mm". */ > + "sve2-bitperm", "i8mm", and "f64mm". */ > AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, \ > AARCH64_FL_CRYPTO | AARCH64_FL_DOTPROD | \ > AARCH64_FL_AES | AARCH64_FL_SHA2 | AARCH64_FL_SHA3 | \ > AARCH64_FL_SM4 | AARCH64_FL_SVE | AARCH64_FL_SVE2 | \ > AARCH64_FL_SVE2_AES | AARCH64_FL_SVE2_SHA3 | \ > AARCH64_FL_SVE2_SM4 | AARCH64_FL_SVE2_BITPERM | \ > - AARCH64_FL_I8MM, false, \ > + AARCH64_FL_I8MM | AARCH64_FL_F64MM, false, \ > "asimd") > > /* Enabling "crypto" also enables "fp", "simd", "aes" and "sha2". > @@ -92,12 +92,13 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, false, > "crc32") > AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, false, "atomics") > > /* Enabling "fp16" also enables "fp". > - Disabling "fp16" disables "fp16", "fp16fml", "sve", "sve2", "sve2-aes", > - "sve2-sha3", "sve2-sm4", and "bitperm". */ > + Disabling "fp16" disables "fp16", "fp16fml", "sve", "f64mm", "sve2", > + "sve2-aes", "sve2-sha3", "sve2-sm4", and "bitperm". */ > AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, \ > - AARCH64_FL_F16FML | AARCH64_FL_SVE | AARCH64_FL_SVE2 | \ > - AARCH64_FL_SVE2_AES | AARCH64_FL_SVE2_SHA3 | \ > - AARCH64_FL_SVE2_SM4 | AARCH64_FL_SVE2_BITPERM, false, \ > + AARCH64_FL_F16FML | AARCH64_FL_SVE | AARCH64_FL_F64MM | \ > + AARCH64_FL_SVE2 | AARCH64_FL_SVE2_AES | \ > + AARCH64_FL_SVE2_SHA3 | AARCH64_FL_SVE2_SM4 | \ > + AARCH64_FL_SVE2_BITPERM, false, \ > "fphp asimdhp") Minor (of course), but it'd probably better to keep the order of the options consistent. Above "f64mm" came at the end, after "i8mm" and the SVE options, so let's do the same here. Could you fix s/"bitperm"/"sve2-bitperm" while you're there? > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index > 53df4b1fdf9884d0b7cb73341d717887e79fef9c..47668731efc25dbf32a3b6b38817d7762bbfa93f > 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -16130,7 +16130,7 @@ and the
contrib: New remotes structure for vendor and personal refs
The initial structure for vendor and personal branches makes use of the default remote (normally origin) for the upstream repository). Unfortunately, this causes some confusion, especially for personal branches because a push will not push to the correct upstream location. This can be 'fixed' by adding a push refspec for the remote, but that has the unfortunate consequence of breaking the push.default behaviour for git push, and it becomes too easy to accidentally commit something unintended to the main parts of the repository. To work around this, this patch changes the configuration to use separate 'remotes' for these additional refs, with one remote for the personal space and another remote for each vendor's space. The personal space is called after the user's preferred branch-space prefix (default 'me'), the vendor spaces are called vendor-. As far as possible, I've made the script automatically restructure any existing fetch or push lines that earlier versions of the scripts may have created - the gcc-git-customization.sh script will convert all vendor refs that it can find, so it is not necessary to re-add any vendors you've already added. You might, however, want to run git remote prune after running to clean up any stale upstream-refs that might still be in your local repo, and then git fetch vendor- or git fetch to re-populate the remotes/ structures. Also, for any branch you already have that tracks a personal or vendor branch upstream, you might need to run git config branch..remote so that merges and pushes go to the right place (I haven't attempted to automate this last part). Please be aware that if you have multiple personal branches set up, then git push will still consider all of them for pushing. If you only want to push one branch, then either write git push HEAD or git push /branch as appropriate. And don't forget '-n' (--dry-run) to see what would be done if this were not a dry run. * gcc-git-customization.sh: New personal and vendor layout. Convert any existing fetch/push rules to new layout. * git-fetch-vendor.sh: New vendor layout. I'll wait a bit before pushing this, just in case someone spots an issue. R. diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh index 26f4389bcc8..b62397314c6 100755 --- a/contrib/gcc-git-customization.sh +++ b/contrib/gcc-git-customization.sh @@ -81,6 +81,13 @@ then upstream="origin" fi ask "Local name for upstream repository" "origin" upstream + +v=$(git config --get-all "remote.${upstream}.fetch") +if [ "x$v" = "x" ] +then +echo "Remote $upstream does not seem to exist as a remote" +exit 1 +fi git config "gcc-config.upstream" "$upstream" remote_id=$(git config --get "gcc-config.user") @@ -113,22 +120,38 @@ echo "(local branches starting / can be pushed directly to your" ask "personal area on the gcc server)" $old_pfx new_pfx git config "gcc-config.userpfx" "$new_pfx" -echo "Setting up tracking for personal namespace $remote_id in remotes/$upstream/${new_pfx}" -git config --replace-all "remote.${upstream}.fetch" "+refs/users/${remote_id}/heads/*:refs/remotes/${upstream}/${new_pfx}/*" ":refs/remotes/${upstream}/${old_pfx}/" -git config --replace-all "remote.${upstream}.fetch" "+refs/users/${remote_id}/tags/*:refs/tags/${new_pfx}/*" ":refs/tags/${old_pfx}/" +# Scan the existing settings to see if there are any we need to rewrite. +vendors=$(git config --get-all "remote.${upstream}.fetch" "refs/vendors/" | sed -r "s:.*refs/vendors/([^/]+)/.*:\1:" | sort | uniq) +url=$(git config --get "remote.${upstream}.url") +pushurl=$(git config --get "remote.${upstream}.pushurl") +for v in $vendors +do +echo "Migrating vendor $v to new remote vendor-$v" +git config --unset-all "remote.${upstream}.fetch" "refs/vendors/$v/" +git config --unset-all "remote.${upstream}.push" "refs/vendors/$v/" +git config "remote.vendor-${v}.url" "${url}" +if [ "x$pushurl" != "x" ] +then + git config "remote.vendor-${v}.pushurl" "${pushurl}" +fi +git config --add "remote.vendor-${v}.fetch" "+refs/vendors/$v/heads/*:refs/remotes/vendor-${v}/*" +git config --add "remote.vendor-${v}.fetch" "+refs/vendors/$v/tags/*:refs/tags/vendor-${v}/*" +done + +echo "Setting up tracking for personal namespace $remote_id in remotes/${new_pfx}" +git config "remote.${new_pfx}.url" "${url}" +if [ "x$pushurl" != "x" ] +then +git config "remote.${new_pfx}.pushurl" "${pushurl}" +fi +git config --replace-all "remote.${new_pfx}.fetch" "+refs/users/${remote_id}/heads/*:refs/remotes/${new_pfx}/*" ":refs/remotes/${old_pfx}/" +git config --replace-all "remote.${new_pfx}.fetch" "+refs/users/${remote_id}/tags/*:refs/tags/${new_pfx}/*" ":refs/tags/${old_pfx}/" +git config --replace-all "remote.${new_pfx}.push" "refs/heads/${new_pfx}/*:refs/users/${remote_id}/heads/*" ":refs/users/${remote_id}" -push_rule=$(git config --get
[PATCH] testsuite: Unbreak compat.exp testing with alt compiler (PR testsuite/93294)
Hi! The PR87488 changes include: * lib/prune.exp (TEST_ALWAYS_FLAGS): Add -fdiagnostics-urls=never. Unfortunately, this broke all the compat.exp etc. compatibility testing with alternate compilers, because those compilers (likely) don't understand this new option and thus all alt compilations FAIL and everything that needs it is then UNRESOLVED. Fixed thusly, by handling it like -fdiagnostics-color=never and other similar options. Tested on x86_64-linux, ok for trunk? 2020-01-16 Jakub Jelinek PR testsuite/93294 * lib/c-compat.exp (compat-use-alt-compiler): Handle -fdiagnostics-urls=never similarly to -fdiagnostics-color=never. (compat_setup_dfp): Likewise. --- gcc/testsuite/lib/c-compat.exp.jj 2020-01-12 11:54:38.457382571 +0100 +++ gcc/testsuite/lib/c-compat.exp 2020-01-16 17:48:25.688227433 +0100 @@ -36,6 +36,7 @@ load_lib target-libpath.exp proc compat-use-alt-compiler { } { global GCC_UNDER_TEST ALT_CC_UNDER_TEST global compat_same_alt compat_alt_caret compat_alt_color compat_no_line_no +global compat_alt_urls global TEST_ALWAYS_FLAGS # We don't need to do this if the alternate compiler is actually @@ -48,6 +49,9 @@ proc compat-use-alt-compiler { } { if { $compat_alt_color == 0 } then { regsub -- "-fdiagnostics-color=never" $TEST_ALWAYS_FLAGS "" TEST_ALWAYS_FLAGS } + if { $compat_alt_urls == 0 } then { + regsub -- "-fdiagnostics-urls=never" $TEST_ALWAYS_FLAGS "" TEST_ALWAYS_FLAGS + } if { $compat_no_line_no == 0 } then { regsub -- "-fno-diagnostics-show-line-numbers" $TEST_ALWAYS_FLAGS "" TEST_ALWAYS_FLAGS } @@ -80,11 +84,13 @@ proc compat_setup_dfp { } { global compat_have_dfp global compat_alt_caret global compat_alt_color +global compat_alt_urls global compat_no_line_no global TEST_ALWAYS_FLAGS compat_save_TEST_ALWAYS_FLAGS set compat_alt_caret 0 set compat_alt_color 0 +set compat_alt_urls 0 set compat_no_line_no 0 set compat_save_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS @@ -105,6 +111,10 @@ proc compat_setup_dfp { } { int dummy; } "-fdiagnostics-color=never"] != 0 } { set compat_alt_color 1 } + if { [check_no_compiler_messages_nocache compat_alt_has_urls object { + int dummy; } "-fdiagnostics-urls=never"] != 0 } { + set compat_alt_urls 1 + } if { [check_no_compiler_messages_nocache compat_alt_has_no_line_no object { int dummy; } "-fno-diagnostics-show-line-numbers"] != 0 } { set compat_no_line_no 1 Jakub
Re: [Patch, fortran][9/10 Regression] PR 93236 -fno-automatic and RECURSIVE
On 1/16/20 3:47 PM, Mark Eggleston wrote: Please find attached patch to fix this regression. OK for master and 9 branch? OK for both – thanks! As note to others: * Up to Fortran 2008, 'recursive' attribute is required if a proc is recursively used; since Fortran 2018 recursive is the default (and the non_recursive attribute exists as well). For this patch (and the near future), we mostly ignore this bit of F2018. See also PR91413. * The PR is about a vendor extension – cf. test case pr93263_1.f90. The second test case should be standard conforming and ensures that neither this patch (nor a future patch) breaks it. [Namely, static memory is indeed used as requested.] Thanks, Tobias Change logs: gcc/fortran Mark Eggleston PR fortran/93236 * resolve.c (resolve_types): Declare boolean recursive and set with the value of the recursive attribute of namespace proc_name symbol structure if it exists. Call gfc_save_all if both flag_automatic and recursive are false or ns->save_all is true. gcc/testsuite Mark Eggleston Tobias Burnus * gfortran.dg/pr93263_1.f90: New test. * gfortran.dg/pr93263_2.f90: New test.
Re: Git: accessing vendors branches
On Thu, Jan 16, 2020 at 04:14:18PM +, Mark Eggleston wrote: > Using git ls-remote, refs for vendor branches can be seen e.g.: > > 1a7bff0abccf167830c6e0443f3591380b960cb0 > refs/vendors/redhat/heads/gcc-9-branch > > Using git branch -r and there is no sign of that branch. > > What needs to be done to access vendor branches? contrib/gcc-git-customization.sh #if you haven't run it yet contrib/git-fetch-vendor.sh redhat Jakub
Git: accessing vendors branches
Using git ls-remote, refs for vendor branches can be seen e.g.: 1a7bff0abccf167830c6e0443f3591380b960cb0 refs/vendors/redhat/heads/gcc-9-branch Using git branch -r and there is no sign of that branch. What needs to be done to access vendor branches? -- https://www.codethink.co.uk/privacy.html
Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)
On 1/8/20 3:18 PM, Stam Markianos-Wright wrote: > > > On 12/10/19 5:03 PM, Kyrill Tkachov wrote: >> Hi Stam, >> >> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote: >>> Pinging with more correct maintainers this time :) >>> >>> Also would need to backport to gcc7,8,9, but need to get this approved >>> first! >>> >> >> Sorry for the delay. > > Same here now! Sorry totally forget about this in the lead up to Xmas! > > Done the changes marked below and also removed the unnecessary extra #defines > from the test. Ping :) Cheers, Stam > >> >> >>> Thank you, >>> Stam >>> >>> >>> Forwarded Message >>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional >>> branches in Thumb2 (PR91816) >>> Date: Mon, 21 Oct 2019 10:37:09 +0100 >>> From: Stam Markianos-Wright >>> To: Ramana Radhakrishnan >>> CC: gcc-patches@gcc.gnu.org , nd , >>> James Greenhalgh , Richard Earnshaw >>> >>> >>> >>> >>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote: >>> >> >>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, >>> >> however, on my native Aarch32 setup the test times out when run as part >>> >> of a big "make check-gcc" regression, but not when run individually. >>> >> >>> >> 2019-10-11 Stamatis Markianos-Wright >>> >> >>> >> * config/arm/arm.md: Update b for Thumb2 range checks. >>> >> * config/arm/arm.c: New function arm_gen_far_branch. >>> >> * config/arm/arm-protos.h: New function arm_gen_far_branch >>> >> prototype. >>> >> >>> >> gcc/testsuite/ChangeLog: >>> >> >>> >> 2019-10-11 Stamatis Markianos-Wright >>> >> >>> >> * testsuite/gcc.target/arm/pr91816.c: New test. >>> > >>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h >>> >> index f995974f9bb..1dce333d1c3 100644 >>> >> --- a/gcc/config/arm/arm-protos.h >>> >> +++ b/gcc/config/arm/arm-protos.h >>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const >>> cpu_arch_option *, >>> >> >>> >> void arm_initialize_isa (sbitmap, const enum isa_feature *); >>> >> >>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char >>> >> *); >>> >> + >>> >> + >>> > >>> > Lets get the nits out of the way. >>> > >>> > Unnecessary extra new line, need a space between int and const above. >>> > >>> > >>> >>> .Fixed! >>> >>> >> #endif /* ! GCC_ARM_PROTOS_H */ >>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c >>> >> index 39e1a1ef9a2..1a693d2ddca 100644 >>> >> --- a/gcc/config/arm/arm.c >>> >> +++ b/gcc/config/arm/arm.c >>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void) >>> >> } >>> >> } /* Namespace selftest. */ >>> >> >>> >> + >>> >> +/* Generate code to enable conditional branches in functions over 1 >>> >> MiB. */ >>> >> +const char * >>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest, >>> >> + const char * branch_format) >>> > >>> > Not sure if this is some munging from the attachment but check >>> > vertical alignment of parameters. >>> > >>> >>> .Fixed! >>> >>> >> +{ >>> >> + rtx_code_label * tmp_label = gen_label_rtx (); >>> >> + char label_buf[256]; >>> >> + char buffer[128]; >>> >> + ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \ >>> >> + CODE_LABEL_NUMBER (tmp_label)); >>> >> + const char *label_ptr = arm_strip_name_encoding (label_buf); >>> >> + rtx dest_label = operands[pos_label]; >>> >> + operands[pos_label] = tmp_label; >>> >> + >>> >> + snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr); >>> >> + output_asm_insn (buffer, operands); >>> >> + >>> >> + snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, >>> label_ptr); >>> >> + operands[pos_label] = dest_label; >>> >> + output_asm_insn (buffer, operands); >>> >> + return ""; >>> >> +} >>> >> + >>> >> + >>> > >>> > Unnecessary extra newline. >>> > >>> >>> .Fixed! >>> >>> >> #undef TARGET_RUN_TARGET_SELFTESTS >>> >> #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests >>> >> #endif /* CHECKING_P */ >>> >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >>> >> index f861c72ccfc..634fd0a59da 100644 >>> >> --- a/gcc/config/arm/arm.md >>> >> +++ b/gcc/config/arm/arm.md >>> >> @@ -6686,9 +6686,16 @@ >>> >> ;; And for backward branches we have >>> >> ;; (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + >>> >>4). >>> >> ;; >>> >> +;; In 16-bit Thumb these ranges are: >>> >> ;; For a 'b' pos_range = 2046, neg_range = -2048 giving >>> (-2040->2048). >>> >> ;; For a 'b' pos_range = 254, neg_range = -256 giving (-250 >>> >>->256). >>> >> >>> >> +;; In 32-bit Thumb these ranges are: >>> >> +;; For a 'b' +/- 16MB is not checked for. >>> >> +;; For a 'b' pos_range = 1048574, neg_range = -1048576 giving >>> >> +;; (-1048568 -> 1048576). >>> >> + >>> >> + >>> > >>> > Unnecessary extra newline. >>> > >>> >>> .Fixed! >>> >>> >> (define_expand "cbranchsi4" >>> >> [(set
[Pingx2][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: > > > On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >> >> >> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>> Hi all, >>> >>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>> operations (vector/by element) to the ARM back-end. >>> >>> These are: >>> usdot (vector), dot (by element). >>> >>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>> for ARM they remain optional as of ARMv8.6-a. >>> >>> The functions are declared in arm_neon.h, RTL patterns are defined to >>> generate assembler and tests are added to verify and perform adequate >>> checks. >>> >>> Regression testing on arm-none-eabi passed successfully. >>> >>> This patch depends on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>> >>> for ARM CLI updates, and on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>> >>> for testsuite effective_target update. >>> >>> Ok for trunk? >> >> .Ping :) >> > Ping :) > > New diff addressing review comments from Aarch64 version of the patch. > > _Change of order of operands in RTL patterns. > _Change tests to use check-function-bodies, compile with optimisation and > check > for exact registers. > _Rename tests to remove "-compile-" in filename. > Ping! Cheers, Stam >>> >>> Cheers, >>> Stam >>> >>> >>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>> >>> PS. I don't have commit rights, so if someone could commit on my behalf, >>> that would be great :) >>> >>> >>> gcc/ChangeLog: >>> >>> 2019-11-28 Stam Markianos-Wright >>> >>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>> (USTERNOP_QUALIFIERS): New define. >>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (arm_expand_builtin_args): >>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>> * config/arm/arm_neon.h (vusdot_s32): New. >>> (vusdot_lane_s32): New. >>> (vusdotq_lane_s32): New. >>> (vsudot_lane_s32): New. >>> (vsudotq_lane_s32): New. >>> * config/arm/arm_neon_builtins.def >>> (usdot,usdot_lane,sudot_lane): New. >>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>> (sup, opsuffix): Add . >>> * config/arm/neon.md (neon_usdot, dot_lane: New. >>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>> >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2019-12-12 Stam Markianos-Wright >>> >>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>> >>> >
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension
On 1/9/20 3:48 PM, Richard Sandiford wrote: > OK, thanks. > Committed as r10-6004-g8c197c851e7528baba7cb837f34c05ba2242f705 Thank you! Stam > Richard > > Stam Markianos-Wright writes: >> On 12/30/19 10:21 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: On 12/20/19 2:13 PM, Richard Sandiford wrote: > Stam Markianos-Wright writes: >> +**... >> +**ret >> +*/ >> +int32x2_t ufoo (int32x2_t r, uint8x8_t x, int8x8_t y) >> +{ >> + return vusdot_s32 (r, x, y); >> +} >> + > > If we're using check-function-bodies anyway, it might be slightly more > robust to compile at -O and check for the exact RA. E.g.: > > /* > **ufoo: > **usdotv0\.2s, (v1\.8b, v2\.8b|v2\.8b, v1\.8b) > **ret > */ > > Just a suggestion though -- either way is fine. done this too and as per our internal discussion also added one xx_untied tests for usdot and one for usdot_lane That's one xx_untied test for each of the RTL pattern types added in aarch64-simd.md. Lmk if this is ok! Also I found that the way we were using check-function-bodies wasn't actually checking the assembler correctly, so I've changed that to: +/* { dg-final { check-function-bodies "**" "" "" } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ which seems to perform more checks >>> >>> Ah, OK, hadn't realised that we were cycling through optimisation >>> options already. In that case, it might be better to leave out the >>> -O from the dg-options and instead use: >>> >>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } { "-O0" } } } */ >>> >>> (untested). >>> >>> It's unfortunate that we're skipping this for -O0 though. Ideally we'd >>> still compile the code and just skip the dg-final. Does it work if you do: >>> >>> /* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */ >>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } } } */ >>> >>> ? Make sure that we actually still run the check-function-bodies when >>> optimisation is enabled. :-) >> >> This works! >> Now we are only doing the following for O0: >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O0 (test >> for >> excess errors) >> >> whereas for other optimisation levels do all the checks: >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 (test >> for >> excess errors) >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfoo_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfoo_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfooq_lane >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies sfooq_laneq >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufoo_untied >> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c -O1 >> check-function-bodies ufooq_laneq_untied >> >>> >>> Also, I'm an idiot. The reason I'd used (...|...) in the regexps was >>> that "dot product is commutative". But of course that's not true for >>> these mixed-sign ops, so the string must be: >>> >>> usdot v0\.2s, v1\.8b, v2\.8b >>> >>> The patch copied the (...|...) regexps above to the lane tests, but those >>> wouldn't be commutative even if the operands had the same type. >> >> Ahh, makes sense now. Done :) >> >> Cheers, >> Stam >> >>> >>> Thanks, >>> Richard >>> >> >> >> diff --git a/gcc/config/aarch64/aarch64-builtins.c >> b/gcc/config/aarch64/aarch64-builtins.c >> index >> 1bd2640a1ced352de232fed1cf134b46c69b80f7..702b317d94d2fc6ebe59609727ad853f3f5cc652 >> 100644 >> --- a/gcc/config/aarch64/aarch64-builtins.c >> +++ b/gcc/config/aarch64/aarch64-builtins.c >> @@ -107,6 +107,9 @@ enum aarch64_type_qualifiers >> /* Lane indices selected in pairs. - must be in range, and flipped for >>bigendian. */ >> qualifier_lane_pair_index = 0x800, >> + /* Lane indices selected in quadtuplets. - must be in range, and flipped >> for >> + bigendian. */ >> + qualifier_lane_quadtup_index = 0x1000, >> }; >> >> typedef struct >> @@ -173,6 +176,10 @@ >>
Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension
On 1/9/20 3:54 PM, Richard Sandiford wrote: > Please update the names of the testsuite files to match the ones > in the bfloat16_t patch. (Same for the usdot/sudot patch -- sorry > for forgetting there.) > > OK with that change, thanks. > Done and committed as r10-6006-gf275d73a57f1e5a07fbd4978f4b4457a5eaa1e39 Thank you! Stam > Richard > > Stam Markianos-Wright writes: >> On 12/30/19 10:29 AM, Richard Sandiford wrote: >>> Stam Markianos-Wright writes: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index adfda96f077075ad53d4bea2919c4d3b326e49f5..7587bc46ba1c80389ea49fa83a0e6f8a489711e9 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7028,3 +7028,36 @@ "xtn\t%0., %1." [(set_attr "type" "neon_shift_imm_narrow_q")] ) + +(define_insn "aarch64_bfdot" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (plus:VDQSF +(unspec:VDQSF + [(match_operand: 2 "register_operand" "w") + (match_operand: 3 "register_operand" "w")] + UNSPEC_BFDOT) +(match_operand:VDQSF 1 "register_operand" "0")))] + "TARGET_BF16_SIMD" + "bfdot\t%0., %2., %3." + [(set_attr "type" "neon_dot")] +) + + +(define_insn "aarch64_bfdot_lane" >>> >>> Too many blank lines. >> >> Fixed, sorry I hadn't noticed! >> >>> + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (plus:VDQSF +(unspec:VDQSF + [(match_operand: 2 "register_operand" "w") + (match_operand:VBF 3 "register_operand" "w") + (match_operand:SI 4 "const_int_operand" "n")] + UNSPEC_BFDOT) +(match_operand:VDQSF 1 "register_operand" "0")))] + "TARGET_BF16_SIMD" +{ + int nunits = GET_MODE_NUNITS (mode).to_constant (); + int lane = INTVAL (operands[4]); + operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode); + return "bfdot\t%0., %2., %3.2h[%4]"; +} + [(set_attr "type" "neon_dot")] +) [...] diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c new file mode 100644 index ..c575dcd3901172a52fa9403c9179d58eea44eb72 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c @@ -0,0 +1,91 @@ +/* { dg-do assemble { target { aarch64*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-O -save-temps" } */ +/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ >>> >>> Same comment as for USDOT/SUDOT regarding the dg- markup. >> >> Done! >>> + +#include + +/* +**ufoo: +**bfdot v0.2s, (v1.4h, v2.4h|v2.4h, v1.4h) +**ret +*/ +float32x2_t ufoo(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y) +{ + return vbfdot_f32 (r, x, y); +} + +/* +**ufooq: +**bfdot v0.4s, (v1.8h, v2.8h|v2.8h, v1.8h) +**ret +*/ +float32x4_t ufooq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) +{ + return vbfdotq_f32 (r, x, y); +} >>> >>> The (...|...)s here are correct. >> Yep. >>> + +/* +**ufoo_lane: +**bfdot v0.2s, (v1.4h, v2.2h\[0\]|v2.4h, v1.2h\[0\]) +**ret +*/ +float32x2_t ufoo_lane(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y) +{ + return vbfdot_lane_f32 (r, x, y, 0); +} + +/* +**ufooq_laneq: +**bfdot v0.4s, (v1.8h, v2.2h\[2\]|v2.8h, v1.2h\[2\]) +**ret +*/ +float32x4_t ufooq_laneq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) +{ + return vbfdotq_laneq_f32 (r, x, y, 2); +} + +/* +**ufoo_laneq: +**bfdot v0.2s, (v1.4h, v2.2h\[3\]|v2.4h, v1.2h\[3\]) +**ret +*/ +float32x2_t ufoo_laneq(float32x2_t r, bfloat16x4_t x, bfloat16x8_t y) +{ + return vbfdot_laneq_f32 (r, x, y, 3); +} + +/* +**ufooq_lane: +**bfdot v0.4s, (v1.8h, v2.2h\[1\]|v2.8h, v1.2h\[1\]) +**ret +*/ +float32x4_t ufooq_lane(float32x4_t r, bfloat16x8_t x, bfloat16x4_t y) +{ + return vbfdotq_lane_f32 (r, x, y, 1); +} >>> >>> But these aren't, since the operands must be in the order given. >> Yep. >>> + +/* +**ufoo_untied: +**mov v0.8b, v1.8b +**bfdot v0.2s, (v2.4h, v3.4h|v3.4h, v2.4h) +**ret +*/ +float32x2_t ufoo_untied(float32x4_t unused, float32x2_t r, bfloat16x4_t x,
Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi Frederik! On 2019-12-20T17:46:57+0100, "Harwath, Frederik" wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c I suggest to rename this one to 'acc_get_property-nvptx.c'. > @@ -0,0 +1,68 @@ > +/* Test the `acc_get_property' and '`acc_get_property_string' library > + functions on Nvidia devices by comparing property values with > + those obtained through the CUDA API. */ > +/* { dg-additional-sources acc_get_property-aux.c } */ > +/* { dg-additional-options "-lcuda -lcudart" } */ > +/* { dg-do run { target openacc_nvidia_accel_selected } } */ > + > +#include > +#include > +#include > +#include > +#include > + > +void expect_device_properties > +(acc_device_t dev_type, int dev_num, > + int expected_total_mem, int expected_free_mem, > + const char* expected_vendor, const char* expected_name, > + const char* expected_driver); > + > +int main () > +{ > + int dev_count; > + cudaGetDeviceCount (_count); > + > + for (int dev_num = 0; dev_num < dev_count; ++dev_num) > +{ > + if (cudaSetDevice (dev_num) != cudaSuccess) > + { > + fprintf (stderr, "cudaSetDevice failed.\n"); > + abort (); > + } > + > + printf("Checking device %d\n", dev_num); > + > + const char *vendor = "Nvidia"; > + size_t free_mem; > + size_t total_mem; > + if (cudaMemGetInfo(_mem, _mem) != cudaSuccess) > + { > + fprintf (stderr, "cudaMemGetInfo failed.\n"); > + abort (); > + } > + > + struct cudaDeviceProp p; > + if (cudaGetDeviceProperties(, dev_num) != cudaSuccess) > + { > + fprintf (stderr, "cudaGetDeviceProperties failed.\n"); > + abort (); > + } > + > + int driver_version; > + if (cudaDriverGetVersion(_version) != cudaSuccess) > + { > + fprintf (stderr, "cudaDriverGetVersion failed.\n"); > + abort (); > + } > + /* The version string should contain the version of the CUDA Toolkit > + in the same MAJOR.MINOR format that is used by Nvidia. > + The format string below is the same that is used by the deviceQuery > + program, which belongs to Nvidia's CUDA samples, to print the version. > */ > + char driver[30]; > + snprintf (driver, sizeof driver, "CUDA Driver %u.%u", > + driver_version / 1000, driver_version % 1000 / 10); > + > + expect_device_properties(acc_device_nvidia, dev_num, This assumes that the 'cuda*' interfaces and OpenACC/libgomp interfaces handle/order device numbers in the same way -- which it seems they do, but just noting this in case this becomes an issue at some point. > +total_mem, free_mem, vendor, p.name, driver); > +} > +} So I just witnessed a FAIL here, because: Expected acc_property_free_memory to equal -929226752, but was -928956416. Aside from improper data types being used for storing/printing the memory information, we have to expect 'acc_property_free_memory' to change between two invocations. ;-) > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c I suggest to rename this one to 'acc_get_property-host.c'. > @@ -0,0 +1,19 @@ > +/* Test the `acc_get_property' and '`acc_get_property_string' library > + functions for the host device. */ > +/* { dg-additional-sources acc_get_property-aux.c } */ > +/* { dg-do run } */ > + > +#include > +#include > + > +void expect_device_properties > +(acc_device_t dev_type, int dev_num, > + int expected_total_mem, int expected_free_mem, > + const char* expected_vendor, const char* expected_name, > + const char* expected_driver); > + > +int main() > +{ > + printf ("Checking acc_device_host device properties\n"); > + expect_device_properties (acc_device_host, 0, 0, 0, "GNU", "GOMP", "1.0"); > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c > @@ -0,0 +1,80 @@ > +/* Auxiliary functions for acc_get_property tests */ > +/* { dg-do compile { target skip-all-targets } } */ > + > +#include > +#include > +#include > +#include > + > +void expect_device_properties > +(acc_device_t dev_type, int dev_num, > + int expected_total_mem, int expected_free_mem, > + const char* expected_vendor, const char* expected_name, > + const char* expected_driver) > +{ > + const char *vendor = acc_get_property_string (dev_num, dev_type, > + acc_property_vendor); > + if (strcmp (vendor, expected_vendor)) > +{ > + fprintf (stderr, "Expected acc_property_vendor to equal \"%s\", " > +"but was \"%s\".\n", expected_vendor, vendor); > + abort (); > +} > + > + int total_mem = acc_get_property (dev_num, dev_type, > + acc_property_memory); > + if (total_mem != expected_total_mem) > +{ > + fprintf (stderr, "Expected acc_property_memory to equal %d, " > +"but was %d.\n", expected_total_mem, total_mem); > + abort (); >
Re: [PATCH] libstdc++/91223 Improve unordered containers == operator
On 16/01/20 13:25 +, Jonathan Wakely wrote: On 16/01/20 07:42 +0100, François Dumont wrote: On 1/15/20 10:52 PM, Jonathan Wakely wrote: On 15/01/20 21:48 +, Jonathan Wakely wrote: On 14/01/20 22:25 +0100, François Dumont wrote: On 1/13/20 10:53 PM, Jonathan Wakely wrote: On 13/01/20 22:41 +0100, François Dumont wrote: For the multi-keys we could still avoid redundant comparisons when _Equal is just doing == on the key type. On unordered_multiset we could just avoids the call to is_permuation and on the unordered_multimap we could check the is_permutation only on the associated value rather than on the std::pair. I don't think that's necessary, or helpful. The idea of https://gcc.gnu.org/ml/libstdc++/2020-01/msg00070.html is that you shouldn't be using _Equal at all, and therefore it doesn't matter whether it's std::equal_to or not. And it was indeed possible. Nice! PR libstdc++/91223 * include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend. * include/bits/hashtable_policy.h: Include . (_Equality_base): Remove. (_Equality<>::_M_equal): Review implementation. Use std::is_permutation. * testsuite/23_containers/unordered_multiset/operators/1.cc (Hash, Equal, test02, test03): New. * testsuite/23_containers/unordered_set/operators/1.cc (Hash, Equal, test02, test03): New. Tested under Linux x86_64. Ok to commit ? Yes, OK for trunk (we're in stage4 but your patch was posted in stage3 and fixes a pretty nasty performance bug, so is OK now). N.B. GCC has moved to Git instead of Subversion. If you don't have Git access set up let me know and I can commit the patch for you. I haven't done the move yet and won't be able to do it before the week-end. So please proceed to the commit for me, thanks. No problem, I can do that. Your patch is now committed to trunk. Thanks for the major improvement. I had a look at std::is_permutation and I think we can make some simplifications to the 4-argument overload, and we can share most of the code between the 3-arg and 4-arg overloads (once they've confirmed the lengths are the same they do exactly the same thing). See the attached patch. This should probably wait for stage1 though. I also wanted to add some comments to the _Equality::_M_equal specialiation for unordered_multimap/multiset to explain what the code was doing, and had some more ideas. See patch again. It looks like this loop can potentially visit every element of __other, instead of stopping at the end of the bucket: typename __hashtable::const_iterator __ity(__y_n); for (auto __ity_end = __ity; __ity_end != __other.end(); ++__ity_end) if (--__x_count == 0) break; Consider a case like this: unordered_multiset a{1, 2, 3, 4}; for (int i = 0; i <1; ++i) a.insert(1); unordered_multiset b{1, 2, 3, 4}; for (int i = 0; i <1; ++i) b.insert(2); When doing a == b we'll find 1 elements in a with key '1', and find one element in b with that key. But then we iterate through every element in b after that one, even though they have different keys and are probably in different buckets. Instead of just iterating from __ity to __other.end(), can we use a local iterator so we stop at the end of the bucket? This seems to make the PR91263 example *very* slightly slower, but makes the example above significantly faster. What do you think? diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h index 6503d1518d3..6d5d3b2fced 100644 --- a/libstdc++-v3/include/bits/stl_algo.h +++ b/libstdc++-v3/include/bits/stl_algo.h @@ -3588,6 +3588,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return std::make_pair(*__p.first, *__p.second); } + // precondition: distance(__first1, __last1) == distance(__first2, __last2). + template +_GLIBCXX20_CONSTEXPR +bool +__is_permutation_impl(_FwdIter1 __first1, _FwdIter1 __last1, + _FwdIter2 __first2, _FwdIter2 __last2, + _BinaryPredicate __pred = {}) +{ + for (auto __scan = __first1; __scan != __last1; ++__scan) + { + if (__scan != std::__find_if(__first1, __scan, + __gnu_cxx::__ops::__iter_comp_iter(__pred, __scan))) + continue; // We've seen this one before. + + auto __matches + = std::__count_if(__first2, __last2, + __gnu_cxx::__ops::__iter_comp_iter(__pred, __scan)); + if (0 == __matches || + std::__count_if(__scan, __last1, + __gnu_cxx::__ops::__iter_comp_iter(__pred, __scan)) + != __matches) + return false; + } + return true; +} + template _GLIBCXX20_CONSTEXPR @@ -3604,26 +3631,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (__first1 == __last1) return true; - // Establish __last2 assuming equal ranges by iterating over the - // rest of the list. - _ForwardIterator2 __last2 = __first2; - std::advance(__last2, std::distance(__first1, __last1)); - for (_ForwardIterator1 __scan = __first1; __scan != __last1; ++__scan) - { -
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]
On 1/13/20 10:43 AM, Kyrill Tkachov wrote: > Hi Stam, > > On 1/10/20 6:47 PM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch is part 2 of Bfloat16_t enablement in the ARM back-end. >> >> This new type is constrained using target hooks TARGET_INVALID_CONVERSION, >> TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used >> through ACLE intrinsics (will be provided in later patches). >> >> Regression testing on arm-none-eabi passed successfully. >> >> Ok for trunk? > > > Ok. > > Thanks, > > Kyrill Committed as r10-6021-g3ea9140170b8a511822b1a873dea1227093f3ccf Thank you! Stam > > >> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> Details on ARM Bfloat can be found here: >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a >> >> >> >> >> >> gcc/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * config/arm/arm.c >> (arm_invalid_conversion): New function for target hook. >> (arm_invalid_unary_op): New function for target hook. >> (arm_invalid_binary_op): New function for target hook. >> >> 2020-01-10 Stam Markianos-Wright >> >> * gcc.target/arm/bfloat16_scalar_typecheck.c: New test. >> * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test. >> * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test. >> >>
Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
On 1/13/20 10:05 AM, Kyrill Tkachov wrote: > Hi Stam, > > On 1/10/20 6:45 PM, Stam Markianos-Wright wrote: >> Hi all, >> >> This is a respin of patch: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html >> >> which has now been split into two (similar to the Aarch64 version). >> >> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end. >> It also adds a new machine_mode (BFmode) for this type and accompanying >> Vector >> modes V4BFmode and V8BFmode. >> >> The second patch in this series uses existing target hooks to restrict type >> use. >> >> Regression testing on arm-none-eabi passed successfully. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for test suite effective_target update. >> >> Ok for trunk? > > This is ok, thanks. > > You can commit it once the git conversion goes through :) Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b Thank you! Stam > > Kyrill > > >> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> Details on ARM Bfloat can be found here: >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a >> >> >> >> >> >> gcc/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * config.gcc: Add arm_bf16.h. >> * config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment. >> (arm_simd_builtin_std_type): Add BFmode. >> (arm_init_simd_builtin_types): Define element types for vector types. >> (arm_init_bf16_types): New function. >> (arm_init_builtins): Add arm_init_bf16_types function call. >> * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes. >> * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF. >> * config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode. >> (arm_hard_regno_mode_ok): Add BFmode and tidy up statements. >> (arm_vector_mode_supported_p): Add V4BF, V8BF. >> (arm_mangle_type): >> * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE, >> VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node, >> arm_bf16_ptr_type_node. >> * config/arm/arm.md: New enabled_for_bfmode_scalar, >> enabled_for_bfmode_vector attributes. Add BFmode to movhf expand. >> pattern and define_split between ARM registers. >> * config/arm/arm_bf16.h: New file. >> * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types. >> * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): >> New. >> (VQXMOV): Add V8BF. >> * config/arm/neon.md: Add BF vector types to NEON move patterns. >> * config/arm/vfp.md: Add BFmode to movhf patterns. >> >> gcc/testsuite/ChangeLog: >> >> 2020-01-10 Stam Markianos-Wright >> >> * g++.dg/abi/mangle-neon.C: Add Bfloat vector types. >> * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test. >> * gcc.target/arm/bfloat16_scalar_1_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_1_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_2_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_2_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_3_1.c: New test. >> * gcc.target/arm/bfloat16_scalar_3_2.c: New test. >> * gcc.target/arm/bfloat16_scalar_4.c: New test. >> * gcc.target/arm/bfloat16_simd_1_1.c: New test. >> * gcc.target/arm/bfloat16_simd_1_2.c: New test. >> * gcc.target/arm/bfloat16_simd_2_1.c: New test. >> * gcc.target/arm/bfloat16_simd_2_2.c: New test. >> * gcc.target/arm/bfloat16_simd_3_1.c: New test. >> * gcc.target/arm/bfloat16_simd_3_2.c: New test. >> >> >>
Re: [PATCH][AArch64] PR92424: Fix -fpatchable-function-entry=N,M with BTI
Szabolcs Nagy writes: > this affects the linux kernel and technically a wrong code bug > so this fix tries to be backportable (fixing all issues with > -fpatchable-function-entry=N,M will likely require new option). Even for the backportable version, I think it would be better not to duplicate so much varasm stuff. Perhaps instead we could: (a) set a cfun->machine flag in aarch64_declare_function_name to say that we've assembled the label (b) override print_patchable_function_entry so that it prints a BTI if this flag is set and the usual other conditions are true As discussed off-list, it'd be good to avoid a second BTI after the nops if possible. print_patchable_function_entry should be able to find the first instruction using get_insns and next_real_insn, then remove it if it turns out to be a BTI. Thanks, Richard > > gcc/ChangeLog: > > 2020-01-16 Szabolcs Nagy > > PR target/92424 > * config/aarch64/aarch64.c (aarch64_declare_function_name): Emit BTI c > if the function label is followed by a patch area. > > From ac2a46bab60ecc80a453328b4749a951908c02c5 Mon Sep 17 00:00:00 2001 > From: Szabolcs Nagy > Date: Wed, 15 Jan 2020 12:23:40 + > Subject: [PATCH] [AArch64] PR92424: Fix -fpatchable-function-entry=N,M with > BTI > > This is a minimal workaround that emits another BTI after the function > label if that is followed by a patch area. > > So before this commit -fpatchable-function-entry=3,1 with bti generates > > .section __patchable_function_entries > .8byte .LPFE > .text > .LPFE: > nop > foo: > nop > nop > bti c // or paciasp > ... > > and after this commit > > .section __patchable_function_entries > .8byte .LPFE > .text > .LPFE: > nop > foo: > bti c > nop > nop > bti c // or paciasp > ... > > There is a new bti insn in the middle of the patchable area unless M=0 > (patch area is after the new bti) or M=N (patch area is before the > label, no new bti). > > Note that .cfi_startproc and the asynchronous unwind table entry label > comes after the patch area, but whatever effect that has on the newly > inserted bti c, it already affected the insns in the patch area. > > Tested on aarch64-none-linux-gnu. > > gcc/ChangeLog: > > PR target/92424 > * config/aarch64/aarch64.c (aarch64_declare_function_name): Emit BTI c > if the function label is followed by a patch area. > --- > gcc/config/aarch64/aarch64.c | 33 + > 1 file changed, 33 insertions(+) > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 66e20becaf2..0394c274330 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -18079,6 +18079,39 @@ aarch64_declare_function_name (FILE *stream, const > char* name, >/* Don't forget the type directive for ELF. */ >ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function"); >ASM_OUTPUT_LABEL (stream, name); > + > + if (!aarch64_bti_enabled () > + || cgraph_node::get (fndecl)->only_called_directly_p ()) > +return; > + > + /* Copy logic from varasm.c assemble_start_function > + to handle -fpatchable-function-entry=N,M with BTI. */ > + unsigned HOST_WIDE_INT patch_area_size = function_entry_patch_area_size; > + unsigned HOST_WIDE_INT patch_area_entry = function_entry_patch_area_start; > + > + tree patchable_function_entry_attr > += lookup_attribute ("patchable_function_entry", DECL_ATTRIBUTES > (fndecl)); > + if (patchable_function_entry_attr) > +{ > + tree pp_val = TREE_VALUE (patchable_function_entry_attr); > + tree patchable_function_entry_value1 = TREE_VALUE (pp_val); > + > + patch_area_size = tree_to_uhwi (patchable_function_entry_value1); > + patch_area_entry = 0; > + if (TREE_CHAIN (pp_val) != NULL_TREE) > + { > + tree patchable_function_entry_value2 > + = TREE_VALUE (TREE_CHAIN (pp_val)); > + patch_area_entry = tree_to_uhwi (patchable_function_entry_value2); > + } > +} > + > + if (patch_area_entry > patch_area_size) > +patch_area_entry = 0; > + > + /* Emit a BTI c if a patch area comes after the function label. */ > + if (patch_area_size > patch_area_entry) > +asm_fprintf (stream, "\thint\t34 // bti c\n"); > } > > /* Implement ASM_OUTPUT_DEF_FROM_DECLS. Output .variant_pcs for aliases. */
Re: [patch] contrib: Don't add push rules for personal and vendor spaces.
On Wed, Jan 15, 2020 at 10:48:23PM +, Joseph Myers wrote: > A reasonable replacement for those push rules might be command aliases > (e.g. "git upush " to push the user branch ). Would we have then separate aliases for pushing to one vendor branch and another vendor branch? Couldn't one alias determine from the branch remote and merge and perhaps the fetch rules figure out the right git push arguments given the setup (e.g. treat $(git config --get gcc-config.userpfx) and $(git config --get gcc-config.user) as push to user branch, devel/*, master, releases/* as normal git push and the rest as vendors (or let git-fetch-vendor.sh record list of all used vendors in some gcc-config setting)? Jakub
Re: [patch] contrib: Don't add push rules for personal and vendor spaces.
On Wed, Jan 15, 2020 at 5:56 PM Joseph Myers wrote: > A reasonable replacement for those push rules might be command aliases > (e.g. "git upush " to push the user branch ). > And/or git mpush for push origin HEAD:master Jason
Re: contrib: Check and if needed set user.name and user.email in gcc-git-customization.sh
On Jan 16 2020, Richard Earnshaw (lists) wrote: > diff --git a/contrib/gcc-git-customization.sh > b/contrib/gcc-git-customization.sh > index dae2c35bb57..1cde6fd8224 100755 > --- a/contrib/gcc-git-customization.sh > +++ b/contrib/gcc-git-customization.sh > @@ -11,9 +11,9 @@ ask () { > read answer > if [ "x$answer" = "x" ] > then > - eval $var=$default > + eval $var=\"$default\" > else > - eval $var=$answer > + eval $var=\"$answer\" This still isn't the safe way to do indirect assignment. The expansion of the rhs needs to be delayed, so that the result isn't subject to further expansions. Andreas. Subject: [PATCH] gcc-git-customization.sh: avoid double expansion --- contrib/gcc-git-customization.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh index b7e4ce308062..26f4389bcc8a 100755 --- a/contrib/gcc-git-customization.sh +++ b/contrib/gcc-git-customization.sh @@ -11,9 +11,9 @@ ask () { read answer if [ "x$answer" = "x" ] then - eval $var=\"$default\" + eval $var=\$default else - eval $var=\"$answer\" + eval $var=\$answer fi } -- 2.25.0 -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
[Patch, fortran][9/10 Regression] PR 93236 -fno-automatic and RECURSIVE
Please find attached patch to fix this regression. OK for master and 9 branch? Change logs: gcc/fortran Mark Eggleston PR fortran/93236 * resolve.c (resolve_types): Declare boolean recursive and set with the value of the recursive attribute of namespace proc_name symbol structure if it exists. Call gfc_save_all if both flag_automatic and recursive are false or ns->save_all is true. gcc/testsuite Mark Eggleston Tobias Burnus * gfortran.dg/pr93263_1.f90: New test. * gfortran.dg/pr93263_2.f90: New test. -- https://www.codethink.co.uk/privacy.html >From aefc0be25bc7ea1a216ac1950a856d5b654d5493 Mon Sep 17 00:00:00 2001 From: Mark Eggleston Date: Thu, 16 Jan 2020 14:08:11 + Subject: [PATCH] Fortran: PR93263 -fno-automatic and RECURSIVE The use of -fno-automatic should not affect the save attribute of a recursive procedure. The first test case checks unsaved variables and the second checks saved variables. --- gcc/fortran/resolve.c | 3 ++- gcc/testsuite/gfortran.dg/pr93263_1.f90 | 29 + gcc/testsuite/gfortran.dg/pr93263_2.f90 | 24 3 files changed, 55 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/pr93263_1.f90 create mode 100644 gcc/testsuite/gfortran.dg/pr93263_2.f90 diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 6f2a4c4d65a..bddab39d023 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -17079,6 +17079,7 @@ resolve_types (gfc_namespace *ns) gfc_data *d; gfc_equiv *eq; gfc_namespace* old_ns = gfc_current_ns; + bool recursive = ns->proc_name && ns->proc_name->attr.recursive; if (ns->types_resolved) return; @@ -17132,7 +17133,7 @@ resolve_types (gfc_namespace *ns) gfc_traverse_ns (ns, resolve_values); - if (ns->save_all || !flag_automatic) + if (ns->save_all || (!flag_automatic && !recursive)) gfc_save_all (ns); iter_stack = NULL; diff --git a/gcc/testsuite/gfortran.dg/pr93263_1.f90 b/gcc/testsuite/gfortran.dg/pr93263_1.f90 new file mode 100644 index 000..f96b3589411 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr93263_1.f90 @@ -0,0 +1,29 @@ +! { dg-do compile } +! { dg-options "-fno-automatic -fdump-tree-original" } +! +! Test contributed by Mark Eggleston + +program main + implicit none + call check(2) +end + +recursive subroutine check(n) + implicit none + integer n, a + a = 10 + print*,"n=",n + if (n==1) then +a=a-1 +print*,"assigning a=",a + else +a=a-2 +print*,"assigning a=",a +call check(n-1) + endif + print*,"a=",a +end + +! { dg-final { scan-tree-dump-not "static integer\\(kind=4\\) a" "original" } } +! { dg-final { scan-tree-dump-not "integer\\(kind=4\\) a" "original" } } + diff --git a/gcc/testsuite/gfortran.dg/pr93263_2.f90 b/gcc/testsuite/gfortran.dg/pr93263_2.f90 new file mode 100644 index 000..fd353c6b548 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr93263_2.f90 @@ -0,0 +1,24 @@ +! { dg-do run } +! +! Test contributed by Tobias Burnus + + integer :: cnt + cnt = 0 + call sub() + if (cnt /= 5) stop 1 +contains + recursive subroutine sub() +save +logical :: first = .true. +integer :: i +cnt = cnt + 1 +if (first) then + first = .false. + i = 1 +end if +print *, "Hello", i +i = i + 1 +if (i <= 5) call sub() + end subroutine +end + -- 2.11.0
Re: Prevent all uses of DFP when unsupported (PR c/91985)
On Thu, Jan 16, 2020 at 02:42:29PM +, Szabolcs Nagy wrote: > what is the expected way to fix this issue? > > i see hppa-linux-gnu baseline was updated to remove > the decimal rtti symbols, but other targets were not. > > is it better to update the baseline or wait for a > generic fix? My preference is a generic fix and we have a regression PR about this. Jakub
Re: Prevent all uses of DFP when unsupported (PR c/91985)
On 27/11/2019 18:00, Jakub Jelinek wrote: > On Wed, Nov 27, 2019 at 05:48:21PM +, Joseph Myers wrote: On 26/11/19 00:57 +, Joseph Myers wrote: > On Mon, 25 Nov 2019, Rainer Orth wrote: > >> and a few more, all DFP related. They used to be emitted by g++ for >> __fundamental_type_info in libsupc++/fundamental_type_info.cc and lived >> in the CXXABI_1.3.4 version. However, since Solaris *does* lack DFP >> support, that's no longer the case. I'm uncertain how best to deal with >> this, however. > > As I understand it, _GLIBCXX_USE_DECIMAL_FLOAT should already have been > undefined for this target, and so std::decimal::decimal32 etc. should not > have been usable (both the header not working without that define, and the > mode attributes in the header being rejected by the front end when DFP is > unsupported). I.e. such defines in libsupc++ would never have been usable > on this target, so I think they are something it should be safe to remove >from the ABI baseline. If it's actually impossible that any real program could have depended on those symbols, then I agree. >>> >>> this is exactly what I've got no way of telling, that's why I was asking >>> for guidance. Just removing the DFP symbols from the baselines works, >>> of course. >> >> I don't think any real program could have used those symbols; it would >> have required using __typeof (__builtin_fabsd32 (0)) or similar to access >> types that weren't normally available for those targets (and by accessing >> the types using builtins like that, you're getting a completely undefined >> function-calling ABI for them anyway). > > I think various tools we use to check ABI will be unhappy about removal > of symbols. Can't we on targets that do support aliases and don't support > decimal float e.g. alias the DFP rtti symbols to void rtti symbols? what is the expected way to fix this issue? i see hppa-linux-gnu baseline was updated to remove the decimal rtti symbols, but other targets were not. is it better to update the baseline or wait for a generic fix?
Re: contrib: Check and if needed set user.name and user.email in gcc-git-customization.sh
On 15/01/2020 16:59, Richard Earnshaw (lists) wrote: As discussed on IRC, this adds a couple more checks in the customization setup for git. If the variables user.name and user.email are not set anywhere in the git config hierarchy, we set some local values. We always ask about the values we detect and if the user gives an answer that is new, we save that in the local config: this gives the opportunity to use different values to those configured for the global space. I've also cleaned up a couple of minor niggles, such as using $() rather than `` for subshells and some quoting issues when using eval. * gcc-git-customization.sh: Use $() instead of ``, and fix variable quoting with eval. Set user.name and user.email if not found in the local or config. I'm not sure what I did wrong yesterday when I (thought I wrote) git config --get "user.name" and found it not looking across all config scopes, but clearly I did something sill as the above does 'just work'. So this is what I've checked in. R. diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh index dae2c35bb57..1cde6fd8224 100755 --- a/contrib/gcc-git-customization.sh +++ b/contrib/gcc-git-customization.sh @@ -11,9 +11,9 @@ ask () { read answer if [ "x$answer" = "x" ] then - eval $var=$default + eval $var=\"$default\" else - eval $var=$answer + eval $var=\"$answer\" fi } @@ -30,7 +30,52 @@ git config alias.gcc-undescr \!"f() { o=\$(git config --get gcc-config.upstream) # *.mddiff=md git config diff.md.xfuncname '^\(define.*$' -upstream=`git config --get "gcc-config.upstream"` +set_user=$(git config --get "user.name") +set_email=$(git config --get "user.email") + +if [ "x$set_user" = "x" ] +then +# Try to guess the user's name by looking it up in the password file +new_user=$(getent passwd $(whoami) | awk -F: '{ print $5 }') +if [ "x$new_user" = "x" ] +then + new_user="(no default)" +fi +else +new_user=$set_user +fi +ask "Your name" "${new_user}" new_user +if [ "x$new_user" = "x(no default)" ] +then +echo "Cannot continue, git needs to record your name against commits" +exit 1 +fi + +if [ "x$set_email" = "x" ] +then +new_email="(no_default)" +else +new_email=$set_email +fi + +ask "Your email address (for git commits)" "${new_email}" new_email +if [ "x$new_email" = "x(no default)" ] +then +echo "Cannot continue, git needs to record your email address against commits" +exit 1 +fi + +if [ "x$set_user" != "x$new_user" ] +then +git config "user.name" "$new_user" +fi + +if [ "x$set_email" != "x$new_email" ] +then +git config "user.email" "$new_email" +fi + +upstream=$(git config --get "gcc-config.upstream") if [ "x$upstream" = "x" ] then upstream="origin" @@ -38,27 +83,27 @@ fi ask "Local name for upstream repository" "origin" upstream git config "gcc-config.upstream" "$upstream" -remote_id=`git config --get "gcc-config.user"` +remote_id=$(git config --get "gcc-config.user") if [ "x$remote_id" = "x" ] then # See if the url specifies the remote user name. -url=`git config --get "remote.$upstream.url"` +url=$(git config --get "remote.$upstream.url") if [ "x$url" = "x" ] then # This is a pure guess, but for many people it might be OK. - remote_id=`whoami` + remote_id=$(whoami) else - remote_id=`echo $url | sed -r "s|^.*ssh://(.+)@gcc.gnu.org.*$|\1|"` + remote_id=$(echo $url | sed -r "s|^.*ssh://(.+)@gcc.gnu.org.*$|\1|") if [ x$remote_id = x$url ] then - remote_id=`whoami` + remote_id=$(whoami) fi fi fi ask "Account name on gcc.gnu.org (for your personal branches area)" $remote_id remote_id git config "gcc-config.user" "$remote_id" -old_pfx=`git config --get "gcc-config.userpfx"` +old_pfx=$(git config --get "gcc-config.userpfx") if [ "x$old_pfx" = "x" ] then old_pfx="me" @@ -72,7 +117,7 @@ echo "Setting up tracking for personal namespace $remote_id in remotes/$upstream git config --replace-all "remote.${upstream}.fetch" "+refs/users/${remote_id}/heads/*:refs/remotes/${upstream}/${new_pfx}/*" ":refs/remotes/${upstream}/${old_pfx}/" git config --replace-all "remote.${upstream}.fetch" "+refs/users/${remote_id}/tags/*:refs/tags/${new_pfx}/*" ":refs/tags/${old_pfx}/" -push_rule=`git config --get "remote.${upstream}.push"` +push_rule=$(git config --get "remote.${upstream}.push") if [ "x$push_rule" != "x" ] then echo "***"
Re: [patch, openacc] Fix ICE verifying gimple
Ping. On 22/11/2019 11:06, Andrew Stubbs wrote: This test case causes an ICE (reformatted for email): void test(int k) { unsigned int x = 1; #pragma acc parallel loop async(x) for (int i = 0; i < k; i++) { } } t.c: In function 'test': t.c:4:9: error: invalid argument to gimple call 4 | #pragma acc parallel loop async(x) | ^~~ (int) x __builtin_GOACC_parallel_keyed (-1, test._omp_fn.0, 1, &.omp_data_arr.3, &.omp_data_sizes.4, &.omp_data_kinds.5, 536936447, (int) x, 0); during GIMPLE pass: ompexp dump file: t.c.013t.ompexp t.c:4:9: internal compiler error: verify_gimple failed The problem is that "x" needs to be cast to "int" (from "unsigned int") before calling the function, and that's not valid in a gimple call. The attached patch assigns the "(int) x" to a temporary and passes that to the function instead. OK to commit?
[PATCH] Fix spacing in a dump in value-prof.c.
Hi. One obvious dump fix that I've just tested. I'm going to install it. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Thanks, Martin gcc/ChangeLog: 2020-01-15 Martin Liska * value-prof.c (dump_histogram_value): Fix obvious spacing issue. --- gcc/value-prof.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/value-prof.c b/gcc/value-prof.c index 9a2c46252c6..b7c7d7eaea5 100644 --- a/gcc/value-prof.c +++ b/gcc/value-prof.c @@ -262,10 +262,10 @@ dump_histogram_value (FILE *dump_file, histogram_value hist) { fprintf (dump_file, (hist->type == HIST_TYPE_TOPN_VALUES - ? "Top N value counter " : "Indirect call counter")); + ? "Top N value counter" : "Indirect call counter")); if (hist->hvalue.counters) { - fprintf (dump_file, "all: %" PRId64 ", values: ", + fprintf (dump_file, " all: %" PRId64 ", values: ", (int64_t) hist->hvalue.counters[0]); for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++) {
Re: [PATCH] libstdc++/91223 Improve unordered containers == operator
On 16/01/20 07:42 +0100, François Dumont wrote: On 1/15/20 10:52 PM, Jonathan Wakely wrote: On 15/01/20 21:48 +, Jonathan Wakely wrote: On 14/01/20 22:25 +0100, François Dumont wrote: On 1/13/20 10:53 PM, Jonathan Wakely wrote: On 13/01/20 22:41 +0100, François Dumont wrote: For the multi-keys we could still avoid redundant comparisons when _Equal is just doing == on the key type. On unordered_multiset we could just avoids the call to is_permuation and on the unordered_multimap we could check the is_permutation only on the associated value rather than on the std::pair. I don't think that's necessary, or helpful. The idea of https://gcc.gnu.org/ml/libstdc++/2020-01/msg00070.html is that you shouldn't be using _Equal at all, and therefore it doesn't matter whether it's std::equal_to or not. And it was indeed possible. Nice! PR libstdc++/91223 * include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend. * include/bits/hashtable_policy.h: Include . (_Equality_base): Remove. (_Equality<>::_M_equal): Review implementation. Use std::is_permutation. * testsuite/23_containers/unordered_multiset/operators/1.cc (Hash, Equal, test02, test03): New. * testsuite/23_containers/unordered_set/operators/1.cc (Hash, Equal, test02, test03): New. Tested under Linux x86_64. Ok to commit ? Yes, OK for trunk (we're in stage4 but your patch was posted in stage3 and fixes a pretty nasty performance bug, so is OK now). N.B. GCC has moved to Git instead of Subversion. If you don't have Git access set up let me know and I can commit the patch for you. I haven't done the move yet and won't be able to do it before the week-end. So please proceed to the commit for me, thanks. No problem, I can do that.
Re: [PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)
Wilco Dijkstra writes: > The separate shrinkwrapping pass may insert stores in the middle > of atomics loops which can cause issues on some implementations. > Avoid this by delaying splitting of atomic patterns until after > prolog/epilog generation. > > Bootstrap completed, no test regressions on AArch64. > > Andrew, can you verify this fixes the failure you were getting? > > ChangeLog: > 2020-01-16 Wilco Dijkstra > > PR target/92692 > * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) > Add assert to ensure prolog has been emitted. > (aarch64_split_atomic_op): Likewise. > * config/aarch64/atomics.md (aarch64_compare_and_swap) > Use epilogue_completed rather than reload_completed. > (aarch64_atomic_exchange): Likewise. > (aarch64_atomic_): Likewise. > (atomic_nand): Likewise. > (aarch64_atomic_fetch_): Likewise. > (atomic_fetch_nand): Likewise. > (aarch64_atomic__fetch): Likewise. > (atomic_nand_fetch): Likewise. OK if Andrew confirms it works, thanks. Richard > --- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > ac89cc1f9c938455d33d8850d9ebfc0473cb73dc..cd9d813f2ac4990971f6435fdb28b0f94ae10309 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -18375,6 +18375,9 @@ aarch64_emit_post_barrier (enum memmodel model) > void > aarch64_split_compare_and_swap (rtx operands[]) > { > + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. */ > + gcc_assert (epilogue_completed); > + >rtx rval, mem, oldval, newval, scratch, x, model_rtx; >machine_mode mode; >bool is_weak; > @@ -18469,6 +18472,9 @@ void > aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx > mem, > rtx value, rtx model_rtx, rtx cond) > { > + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. */ > + gcc_assert (epilogue_completed); > + >machine_mode mode = GET_MODE (mem); >machine_mode wmode = (mode == DImode ? DImode : SImode); >const enum memmodel model = memmodel_from_int (INTVAL (model_rtx)); > diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md > index > c2bcabd0c3c2627b7222dcbc1af9c2e6b7ce6a76..996947799b5ef8445e9786b94e1ce62fd16e5b5c > 100644 > --- a/gcc/config/aarch64/atomics.md > +++ b/gcc/config/aarch64/atomics.md > @@ -56,7 +56,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > (clobber (match_scratch:SI 7 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_compare_and_swap (operands); > @@ -80,7 +80,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > (clobber (match_scratch:SI 7 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_compare_and_swap (operands); > @@ -104,7 +104,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" > (clobber (match_scratch:SI 7 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_compare_and_swap (operands); > @@ -223,7 +223,7 @@ (define_insn_and_split "aarch64_atomic_exchange" > (clobber (match_scratch:SI 4 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (SET, operands[0], NULL, operands[1], > @@ -344,7 +344,7 @@ (define_insn_and_split > "aarch64_atomic_" >(clobber (match_scratch:SI 4 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (, NULL, operands[3], operands[0], > @@ -400,7 +400,7 @@ (define_insn_and_split "atomic_nand" > (clobber (match_scratch:SI 4 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (NOT, NULL, operands[3], operands[0], > @@ -504,7 +504,7 @@ (define_insn_and_split > "aarch64_atomic_fetch_" > (clobber (match_scratch:SI 5 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (, operands[0], operands[4], operands[1], > @@ -551,7 +551,7 @@ (define_insn_and_split "atomic_fetch_nand" > (clobber (match_scratch:SI 5 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (NOT, operands[0], operands[4], operands[1], > @@ -604,7 +604,7 @@ (define_insn_and_split > "aarch64_atomic__fetch" > (clobber (match_scratch:SI 4 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >[(const_int 0)] >{ > aarch64_split_atomic_op (, NULL, operands[0], operands[1], > @@ -628,7 +628,7 @@ (define_insn_and_split "atomic_nand_fetch" > (clobber (match_scratch:SI 4 "="))] >"" >"#" > - "&& reload_completed" > + "&& epilogue_completed" >
[PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)
The separate shrinkwrapping pass may insert stores in the middle of atomics loops which can cause issues on some implementations. Avoid this by delaying splitting of atomic patterns until after prolog/epilog generation. Bootstrap completed, no test regressions on AArch64. Andrew, can you verify this fixes the failure you were getting? ChangeLog: 2020-01-16 Wilco Dijkstra PR target/92692 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) Add assert to ensure prolog has been emitted. (aarch64_split_atomic_op): Likewise. * config/aarch64/atomics.md (aarch64_compare_and_swap) Use epilogue_completed rather than reload_completed. (aarch64_atomic_exchange): Likewise. (aarch64_atomic_): Likewise. (atomic_nand): Likewise. (aarch64_atomic_fetch_): Likewise. (atomic_fetch_nand): Likewise. (aarch64_atomic__fetch): Likewise. (atomic_nand_fetch): Likewise. --- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac89cc1f9c938455d33d8850d9ebfc0473cb73dc..cd9d813f2ac4990971f6435fdb28b0f94ae10309 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -18375,6 +18375,9 @@ aarch64_emit_post_barrier (enum memmodel model) void aarch64_split_compare_and_swap (rtx operands[]) { + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. */ + gcc_assert (epilogue_completed); + rtx rval, mem, oldval, newval, scratch, x, model_rtx; machine_mode mode; bool is_weak; @@ -18469,6 +18472,9 @@ void aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, rtx value, rtx model_rtx, rtx cond) { + /* Split after prolog/epilog to avoid interactions with shrinkwrapping. */ + gcc_assert (epilogue_completed); + machine_mode mode = GET_MODE (mem); machine_mode wmode = (mode == DImode ? DImode : SImode); const enum memmodel model = memmodel_from_int (INTVAL (model_rtx)); diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md index c2bcabd0c3c2627b7222dcbc1af9c2e6b7ce6a76..996947799b5ef8445e9786b94e1ce62fd16e5b5c 100644 --- a/gcc/config/aarch64/atomics.md +++ b/gcc/config/aarch64/atomics.md @@ -56,7 +56,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" (clobber (match_scratch:SI 7 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_compare_and_swap (operands); @@ -80,7 +80,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" (clobber (match_scratch:SI 7 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_compare_and_swap (operands); @@ -104,7 +104,7 @@ (define_insn_and_split "@aarch64_compare_and_swap" (clobber (match_scratch:SI 7 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_compare_and_swap (operands); @@ -223,7 +223,7 @@ (define_insn_and_split "aarch64_atomic_exchange" (clobber (match_scratch:SI 4 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (SET, operands[0], NULL, operands[1], @@ -344,7 +344,7 @@ (define_insn_and_split "aarch64_atomic_" (clobber (match_scratch:SI 4 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (, NULL, operands[3], operands[0], @@ -400,7 +400,7 @@ (define_insn_and_split "atomic_nand" (clobber (match_scratch:SI 4 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (NOT, NULL, operands[3], operands[0], @@ -504,7 +504,7 @@ (define_insn_and_split "aarch64_atomic_fetch_" (clobber (match_scratch:SI 5 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (, operands[0], operands[4], operands[1], @@ -551,7 +551,7 @@ (define_insn_and_split "atomic_fetch_nand" (clobber (match_scratch:SI 5 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (NOT, operands[0], operands[4], operands[1], @@ -604,7 +604,7 @@ (define_insn_and_split "aarch64_atomic__fetch" (clobber (match_scratch:SI 4 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (, NULL, operands[0], operands[1], @@ -628,7 +628,7 @@ (define_insn_and_split "atomic_nand_fetch" (clobber (match_scratch:SI 4 "="))] "" "#" - "&& reload_completed" + "&& epilogue_completed" [(const_int 0)] { aarch64_split_atomic_op (NOT, NULL, operands[0], operands[1],
Re: [PATCH] adjust object size computation for union accesses and PHIs (PR 92765)
On Wed, Jan 15, 2020 at 09:52:57PM +0100, Jakub Jelinek wrote: > This looks wrong. For one, this function is used for two purposes now and > you tweak it for one, but more importantly, whether he initial stmt > you see is a PHI or not can't make a difference, how is that case e.g. > different from _1 = PHI <_3, _4>; _2 = _1 + 1; and asking about _2? > For _1, you'd use (correctly) the maximum, but if called on _2, you'd ask > (wrongly) for minimum instead of maximum. And now with testcases. strlenopt-95.c shows the above. > This also looks like a hack to shut up the particular testcases instead of > really playing with what the IL provides. Instead of the unions, consider > e.g. C++ placement new, have a pointer to a buffer into which you placement > new one structure, take address of some member in it, pass it to something, > if it doesn't have a destructor do a C++ placement new into the same buffer > but with different structure, take address of a different member with the > same address as the first member, do the str*cmp on it that invokes this > stuff. SCCVN will (likely) find out that the values of those two pointers > are the same and just use the former pointer in the latter case. And strlenopt-93.C shows the latter. strlenopt-94.C is similar, just to show that it breaks equally badly with non-PODs that will be constructed by placement new and destructed later. Jakub __attribute__((noipa)) int barrier_copy (char *x, int y) { asm volatile ("" : : "g" (x), "g" (y) : "memory"); if (y == 0) __builtin_strcpy (x, "abcd"); return y; } __attribute__((noipa)) char * test_2 (int x) { char *p; if (x) p = __builtin_malloc (4); else p = __builtin_calloc (16, 1); char *q = p + 2; if (barrier_copy (q, x)) return p; if (__builtin_strcmp (q, "abcd") != 0) __builtin_abort (); return p; } int main () { __builtin_free (test_2 (0)); __builtin_free (test_2 (1)); return 0; }#include struct S1 { char a[2]; char b[2]; char c[2]; }; struct S2 { char d[6]; }; __attribute__((noipa)) void foo (char *b) { b[0] = 1; b[1] = 2; asm volatile ("" : : "g" (b) : "memory"); } __attribute__((noipa)) void bar (char *d) { __builtin_memcpy (d, "cde", 4); asm volatile ("" : : "g" (d) : "memory"); } __attribute__((noipa)) void baz (char *buf) { S1 *s1 = new (buf) S1; char *p = (char *) >b; foo (p); S2 *s2 = new (buf) S2; char *q = (char *) >d[2]; bar (q); if (__builtin_strcmp (q, "cde")) __builtin_abort (); } int main () { union U { S1 s1; S2 s2; char buf[sizeof (S1) > sizeof (S2) ? sizeof (S1) : sizeof (S2)]; } u; baz (u.buf); return 0; } #include struct S1 { char a[2]; char b[2]; char c[2]; S1 () { a[0] = 0; b[0] = 0; c[0] = 0; }; ~S1 () {} }; struct S2 { char d[6]; S2 () { d[0] = 0; d[2] = 0; } ~S2 () {} }; __attribute__((noipa)) void foo (char *b) { b[0] = 1; b[1] = 2; asm volatile ("" : : "g" (b) : "memory"); } __attribute__((noipa)) void bar (char *d) { __builtin_memcpy (d, "cde", 4); asm volatile ("" : : "g" (d) : "memory"); } __attribute__((noipa)) void baz (char *buf) { S1 *s1 = new (buf) S1 (); char *p = (char *) >b; foo (p); s1->~S1 (); S2 *s2 = new (buf) S2 (); char *q = (char *) >d[2]; bar (q); if (__builtin_strcmp (q, "cde")) __builtin_abort (); s2->~S2 (); } int main () { char buf[sizeof (S1) > sizeof (S2) ? sizeof (S1) : sizeof (S2)]; baz (buf); return 0; }
[PATCH] gimplifier: handle POLY_INT_CST-sized TARGET_EXPRs
If a TARGET_EXPR has poly-int size, the gimplifier would treat it like a VLA and use gimplify_vla_decl. gimplify_vla_decl in turn would use an alloca and expect all references to be gimplified via the DECL_VALUE_EXPR. This caused confusion later in gimplify_var_or_parm_decl_1 when we (correctly) had direct rather than indirect references. For completeness, the patch also fixes similar tests in the RETURN_EXPR handling and OpenMP depend clauses. Tested on aarch64-linux-gnu and x86_64-linux-gnu. I guess this isn't really a regression, but instead a bug in a new feature. It should be very low risk for non-SVE targets though, so OK anyway? Richard 2020-01-16 Richard Sandiford gcc/ * gimplify.c (gimplify_return_expr): Use poly_int_tree_p rather than testing directly for INTEGER_CST. (gimplify_target_expr, gimplify_omp_depend): Likewise. gcc/testsuite/ * g++.target/aarch64/sve/acle/general-c++/gimplify_1.C: New test. --- gcc/gimplify.c | 6 +++--- .../g++.target/aarch64/sve/acle/general-c++/gimplify_1.C| 4 2 files changed, 7 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gimplify_1.C diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 05d7922116b..e50be5706c7 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1632,7 +1632,7 @@ gimplify_return_expr (tree stmt, gimple_seq *pre_p) result = NULL_TREE; else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl))) { - if (TREE_CODE (DECL_SIZE (result_decl)) != INTEGER_CST) + if (!poly_int_tree_p (DECL_SIZE (result_decl))) { if (!TYPE_SIZES_GIMPLIFIED (TREE_TYPE (result_decl))) gimplify_type_sizes (TREE_TYPE (result_decl), pre_p); @@ -6714,7 +6714,7 @@ gimplify_target_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) /* TARGET_EXPR temps aren't part of the enclosing block, so add it to the temps list. Handle also variable length TARGET_EXPRs. */ - if (TREE_CODE (DECL_SIZE (temp)) != INTEGER_CST) + if (!poly_int_tree_p (DECL_SIZE (temp))) { if (!TYPE_SIZES_GIMPLIFIED (TREE_TYPE (temp))) gimplify_type_sizes (TREE_TYPE (temp), pre_p); @@ -7921,7 +7921,7 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p) tree type = build_array_type (ptr_type_node, build_index_type (totalpx)); tree array = create_tmp_var_raw (type); TREE_ADDRESSABLE (array) = 1; - if (TREE_CODE (totalpx) != INTEGER_CST) + if (!poly_int_tree_p (totalpx)) { if (!TYPE_SIZES_GIMPLIFIED (TREE_TYPE (array))) gimplify_type_sizes (TREE_TYPE (array), pre_p); diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gimplify_1.C b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gimplify_1.C new file mode 100644 index 000..06ee7eae029 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gimplify_1.C @@ -0,0 +1,4 @@ +/* { dg-additional-options "-flax-vector-conversions" } */ + +inline void foo (const __SVInt32_t ) {} +void bar (__SVUint32_t x) { foo(x); }
[committed] contrib: Verify the id to be printed is ancestor of the corresponding remote release branch (or master), otherwise print nothing.
Hi! The monotonically increasing revision ids need to be globally unique, so they should only identify commits that were committed to the upstream repo to its master or releases/gcc-N branches. The alias could print something even for private branches or vendor branches etc., but if such an identifier is then used publicly, it will refer to something else. Committed to trunk. Martin also suggested for git gcc-descr --full to also include date (commit date) in the string, so instead of r10-5998-g2588197b6c2175d7ef9f05a8636f58dfeab12b1d print r10-5998-20200116-g2588197b6c2175d7ef9f05a8636f58dfeab12b1d What do others think? The r10-5998 part in there is still needed to be the unique monotonically increasing identifier, the rest is something that can be derived from it, but e.g. in bugzilla the date could provide extra helpful information (which can be seen if one follows the link obviously). In addition to adjusting gcc-git-customization.sh, git-hooks we'd need to update the redirects and ask Frederic to update bugzilla URLization too. 2020-01-16 Jakub Jelinek * gcc-git-customization.sh: Verify the id to be printed is ancestor of the corresponding remote release branch (or master), otherwise print nothing. --- contrib/gcc-git-customization.sh.jj 2020-01-15 14:28:26.996758773 +0100 +++ contrib/gcc-git-customization.sh2020-01-16 12:27:46.734690164 +0100 @@ -22,7 +22,7 @@ git config alias.svn-rev '!f() { rev=$1; # Add git commands to convert git commit to monotonically increasing revision number # and vice versa -git config alias.gcc-descr \!"f() { if test \${1:-no} = --full; then r=\$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' \${2:-master} | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-,r,p'); expr match \${r:-no} '^r[0-9]\\+\$' >/dev/null && r=\${r}-0-g\$(git rev-parse \${2:-master}); test -n \$r && echo \${r}; else git describe --all --match 'basepoints/gcc-[0-9]*' \${1:-master} | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)-\\([0-9]\\+\\)-g[0-9a-f]*\$,r\\2-\\3,p;s,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)\$,r\\2-0,p'; fi; }; f" +git config alias.gcc-descr \!"f() { if test \${1:-no} = --full; then c=\${2:-master}; r=\$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-,r,p'); expr match \${r:-no} '^r[0-9]\\+\$' >/dev/null && r=\${r}-0-g\$(git rev-parse \${2:-master}); else c=\${1:-master}; r=\$(git describe --all --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)-\\([0-9]\\+\\)-g[0-9a-f]*\$,r\\2-\\3,p;s,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)\$,r\\2-0,p'); fi; if test -n \$r; then o=\$(git config --get gcc-config.upstream); rr=\$(echo \$r | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\\(-g[0-9a-f]\\+\\)\\?\$,\\1,p'); if git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$rr >/dev/null; then m=releases/gcc-\$rr; else m=master; fi; git merge-base --is-ancestor \$c \${o:-origin}/\$m && \echo \${r}; fi; }; f" git config alias.gcc-undescr \!"f() { o=\$(git config --get gcc-config.upstream); r=\$(echo \$1 | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\$,\\1,p'); n=\$(echo \$1 | sed -n 's,^r[0-9]\\+-\\([0-9]\\+\\)\$,\\1,p'); test -z \$r && echo Invalid id \$1 && exit 1; h=\$(git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$r); test -z \$h && h=\$(git rev-parse --verify --quiet \${o:-origin}/master); p=\$(git describe --all --match 'basepoints/gcc-'\$r \$h | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+-\\([0-9]\\+\\)-g[0-9a-f]*\$,\\2,p;s,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+\$,0,p'); git rev-parse --verify \$h~\$(expr \$p - \$n); }; f" # Make diff on MD files use "(define" as a function marker. Jakub
[PATCH][AArch64] PR92424: Fix -fpatchable-function-entry=N,M with BTI
this affects the linux kernel and technically a wrong code bug so this fix tries to be backportable (fixing all issues with -fpatchable-function-entry=N,M will likely require new option). gcc/ChangeLog: 2020-01-16 Szabolcs Nagy PR target/92424 * config/aarch64/aarch64.c (aarch64_declare_function_name): Emit BTI c if the function label is followed by a patch area. From ac2a46bab60ecc80a453328b4749a951908c02c5 Mon Sep 17 00:00:00 2001 From: Szabolcs Nagy Date: Wed, 15 Jan 2020 12:23:40 + Subject: [PATCH] [AArch64] PR92424: Fix -fpatchable-function-entry=N,M with BTI This is a minimal workaround that emits another BTI after the function label if that is followed by a patch area. So before this commit -fpatchable-function-entry=3,1 with bti generates .section __patchable_function_entries .8byte .LPFE .text .LPFE: nop foo: nop nop bti c // or paciasp ... and after this commit .section __patchable_function_entries .8byte .LPFE .text .LPFE: nop foo: bti c nop nop bti c // or paciasp ... There is a new bti insn in the middle of the patchable area unless M=0 (patch area is after the new bti) or M=N (patch area is before the label, no new bti). Note that .cfi_startproc and the asynchronous unwind table entry label comes after the patch area, but whatever effect that has on the newly inserted bti c, it already affected the insns in the patch area. Tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/92424 * config/aarch64/aarch64.c (aarch64_declare_function_name): Emit BTI c if the function label is followed by a patch area. --- gcc/config/aarch64/aarch64.c | 33 + 1 file changed, 33 insertions(+) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 66e20becaf2..0394c274330 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -18079,6 +18079,39 @@ aarch64_declare_function_name (FILE *stream, const char* name, /* Don't forget the type directive for ELF. */ ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function"); ASM_OUTPUT_LABEL (stream, name); + + if (!aarch64_bti_enabled () + || cgraph_node::get (fndecl)->only_called_directly_p ()) +return; + + /* Copy logic from varasm.c assemble_start_function + to handle -fpatchable-function-entry=N,M with BTI. */ + unsigned HOST_WIDE_INT patch_area_size = function_entry_patch_area_size; + unsigned HOST_WIDE_INT patch_area_entry = function_entry_patch_area_start; + + tree patchable_function_entry_attr += lookup_attribute ("patchable_function_entry", DECL_ATTRIBUTES (fndecl)); + if (patchable_function_entry_attr) +{ + tree pp_val = TREE_VALUE (patchable_function_entry_attr); + tree patchable_function_entry_value1 = TREE_VALUE (pp_val); + + patch_area_size = tree_to_uhwi (patchable_function_entry_value1); + patch_area_entry = 0; + if (TREE_CHAIN (pp_val) != NULL_TREE) + { + tree patchable_function_entry_value2 + = TREE_VALUE (TREE_CHAIN (pp_val)); + patch_area_entry = tree_to_uhwi (patchable_function_entry_value2); + } +} + + if (patch_area_entry > patch_area_size) +patch_area_entry = 0; + + /* Emit a BTI c if a patch area comes after the function label. */ + if (patch_area_size > patch_area_entry) +asm_fprintf (stream, "\thint\t34 // bti c\n"); } /* Implement ASM_OUTPUT_DEF_FROM_DECLS. Output .variant_pcs for aliases. */ -- 2.17.1
Re: [PATCH][gcc] libgccjit: introduce version entry points
Hi, second version of the patch here cleaning up an unnecessary change. Does not introduce regressions with make check-jit. Andrea gcc/jit/ChangeLog 2020-??-?? Andrea Corallo * docs/topics/compatibility.rst (LIBGCCJIT_ABI_13): New ABI tag plus add version paragraph. * libgccjit++.h (namespace gccjit::version): Add new namespace. * libgccjit.c (gcc_jit_version_major, gcc_jit_version_minor) (gcc_jit_version_patchlevel): New functions. * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_version): New macro. (gcc_jit_version_major, gcc_jit_version_minor) (gcc_jit_version_patchlevel): New functions. * libgccjit.map (LIBGCCJIT_ABI_13) New ABI tag. gcc/testsuite/ChangeLog 2020-??-?? Andrea Corallo * jit.dg/test-version.c: New testcase. diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst index a6faee0810e..0c0ce070d72 100644 --- a/gcc/jit/docs/topics/compatibility.rst +++ b/gcc/jit/docs/topics/compatibility.rst @@ -61,6 +61,28 @@ You can see the symbol tags provided by libgccjit.so using ``objdump``: LIBGCCJIT_ABI_0 [...snip...] +Programmatically checking version +*** + +Client code can programmatically check libgccjit version using: + +.. function:: int gcc_jit_version_major (void) + + Return libgccjit major version. This is analogous to __GNUC__ in C code. + +.. function:: int gcc_jit_version_minor (void) + + Return libgccjit minor version. This is analogous to + __GNUC_MINOR__ in C code. + +.. function:: int gcc_jit_version_patchlevel (void) + + Return libgccjit patchlevel version. This is analogous to + __GNUC_PATCHLEVEL__ in C code. + +.. note:: These entry points has been added with ``LIBGCCJIT_ABI_13`` + (see below). + ABI symbol tags *** @@ -182,3 +204,14 @@ entrypoints: ``LIBGCCJIT_ABI_12`` covers the addition of :func:`gcc_jit_context_new_bitfield` + +``LIBGCCJIT_ABI_13`` + +``LIBGCCJIT_ABI_13`` covers the addition of version functions via API +entrypoints: + + * :func:`gcc_jit_version_major` + + * :func:`gcc_jit_version_minor` + + * :func:`gcc_jit_version_patchlevel` diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h index 82a62d614c5..afb92194c28 100644 --- a/gcc/jit/libgccjit++.h +++ b/gcc/jit/libgccjit++.h @@ -49,6 +49,8 @@ namespace gccjit class timer; class auto_time; + namespace version {}; + /* Errors within the API become C++ exceptions of this class. */ class error { @@ -1913,6 +1915,26 @@ auto_time::~auto_time () m_timer.pop (m_item_name); } +namespace version +{ +inline int +major () +{ + return gcc_jit_version_major (); +} + +inline int +minor () +{ + return gcc_jit_version_minor (); +} + +inline int +patchlevel () +{ + return gcc_jit_version_patchlevel (); +} +} // namespace version } // namespace gccjit #endif /* #ifndef LIBGCCJIT_PLUS_PLUS_H */ diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h index 21a0dc09b03..1c5a12e9c01 100644 --- a/gcc/jit/libgccjit.h +++ b/gcc/jit/libgccjit.h @@ -1487,6 +1487,22 @@ gcc_jit_context_new_rvalue_from_vector (gcc_jit_context *ctxt, size_t num_elements, gcc_jit_rvalue **elements); +#define LIBGCCJIT_HAVE_gcc_jit_version + +/* Functions to retrive libgccjit version. + Analogous to __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ in C code. + + These API entrypoints were added in LIBGCCJIT_ABI_13; you can test for their + presence using + #ifdef LIBGCCJIT_HAVE_gcc_jit_version + */ +extern int +gcc_jit_version_major (void); +extern int +gcc_jit_version_minor (void); +extern int +gcc_jit_version_patchlevel (void); + #ifdef __cplusplus } #endif /* __cplusplus */ diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 83055fc297b..572c82f053c 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see #include "coretypes.h" #include "timevar.h" #include "typed-splay-tree.h" +#include "cppbuiltin.h" #include "libgccjit.h" #include "jit-recording.h" @@ -3175,3 +3176,27 @@ gcc_jit_context_new_rvalue_from_vector (gcc_jit_context *ctxt, as_vec_type, (gcc::jit::recording::rvalue **)elements); } + +extern int +gcc_jit_version_major (void) +{ + int major, minor, patchlevel; + parse_basever (, , ); + return major; +} + +extern int +gcc_jit_version_minor (void) +{ + int major, minor, patchlevel; + parse_basever (, , ); + return minor; +} + +extern int +gcc_jit_version_patchlevel (void) +{ + int major, minor, patchlevel; + parse_basever (, , ); + return patchlevel; +} diff --git a/gcc/jit/libgccjit.map b/gcc/jit/libgccjit.map index 4514bd3aa33..6137dd4b4b0 100644 --- a/gcc/jit/libgccjit.map +++ b/gcc/jit/libgccjit.map @@ -179,4 +179,11 @@ LIBGCCJIT_ABI_11 { LIBGCCJIT_ABI_12 { global: gcc_jit_context_new_bitfield; -} LIBGCCJIT_ABI_11; \ No newline at end of file
Re: [PATCH] doc: Replace references to SVN with those to Git
Hi, On Thu, Jan 16 2020, Gerald Pfeifer wrote: > On Wed, 15 Jan 2020, Martin Jambor wrote: >> when going over stuff linked from the SummerOfCode wiki page, >> I found out that doc/install.texi still refers to Subversion. > > We've got a fair number of references left in various places; > working through that slowly and appreciate your help! > >> The following patch replaces SVN to Git almost mechanically. > > I'm wondering whether we can/should see where to use a more > generic term in a few cases; let me propose some such cases. > >> OK for trunk? And then perhaps for the opened release branches too? > > Yes (considering the notes below), and yes, please! It was brought to my attention that there already is Eric's https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00719.html which addresses the same thing and better, so I don't plan to commit my patch any more. However, it would be great if the (already approved?) patch would be comitted soon. Thanks, Martin
Re: drop -aux{dir,base}, revamp -dump{dir,base}
On Jan 16, 2020, Alexandre Oliva wrote: > On Jan 9, 2020, Alexandre Oliva wrote: >> On Jan 9, 2020, Richard Biener wrote: >>> Did I miss the actual (non-documentation) patch? >> No, I didn't post it. It's kind of big, and only yesterday did I get it >> to work as expected and now extensively documented, passing all of the >> extensive testsuite I wrote for it. > Here it is, at last, regstrapped on x86_64-linux-gnu. Ok to install? And here's a followup that fixes a limitation (bug?) in libiberty that was hit when I attempted a last-minute simplification in lto-wrapper. Regstrapped separately on x86_64-linux-gnu. Ok to install? [libiberty] output empty args as a pair of quotes From: Alexandre Oliva writeargv writes out empty arguments in a way that expandargv skips them instead of preserving them. Fixed by writing out a pair of quotes for them. This enables lto-wrapper to pass down an empty string, as desired. for libiberty/ChangeLog * argv.c (writeargv): Output empty args as "". for gcc/ChangeLog * lto-wrapper.c (run_gcc): Use an empty string for -dumpdir. --- gcc/lto-wrapper.c |5 + libiberty/argv.c |8 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index ed076e3..aa71f1e 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -1482,10 +1482,7 @@ run_gcc (unsigned argc, char *argv[]) if (!incoming_dumppfx) { obstack_ptr_grow (_obstack, "-dumpdir"); - /* An empty string would do, if only writeargv would write it -out in a way that would not be skipped by expandargv and -buildargv. */ - obstack_ptr_grow (_obstack, current_dir); + obstack_ptr_grow (_obstack, ""); } obstack_ptr_grow (_obstack, "-dumpbase"); diff --git a/libiberty/argv.c b/libiberty/argv.c index 8c9794db..6a72208 100644 --- a/libiberty/argv.c +++ b/libiberty/argv.c @@ -327,6 +327,14 @@ writeargv (char * const *argv, FILE *f) arg++; } + /* Write out a pair of quotes for an empty argument. */ + if (arg == *argv) + if (EOF == fputs ("\"\"", f)) + { + status = 1; + goto done; + } + if (EOF == fputc ('\n', f)) { status = 1; -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Free Software Evangelist Stallman was right, but he's left :( GNU Toolchain EngineerFSMatrix: It was he who freed the first of us FSF & FSFLA board memberThe Savior shall return (true);
[committed] aarch64: Fix BE SVE mode punning involving floats
The patterns used by aarch64_split_sve_subreg_move only support integer modes, so if the widest mode is a float, we should get its integer equivalent. Fixes gcc.target/aarch64/sel_3.c for big-endian targets. Tested on aarch64-linux-gnu and aarch64_be-none-elf. Richard 2020-01-16 Richard Sandiford gcc/ * config/aarch64/aarch64.c (aarch64_split_sve_subreg_move): Apply aarch64_sve_int_mode to each mode. --- gcc/config/aarch64/aarch64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac89cc1f9c9..600a238c1f4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4827,8 +4827,8 @@ aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src) /* Decide which REV operation we need. The mode with wider elements determines the mode of the operands and the mode with the narrower elements determines the reverse width. */ - machine_mode mode_with_wider_elts = GET_MODE (dest); - machine_mode mode_with_narrower_elts = GET_MODE (src); + machine_mode mode_with_wider_elts = aarch64_sve_int_mode (GET_MODE (dest)); + machine_mode mode_with_narrower_elts = aarch64_sve_int_mode (GET_MODE (src)); if (GET_MODE_UNIT_SIZE (mode_with_wider_elts) < GET_MODE_UNIT_SIZE (mode_with_narrower_elts)) std::swap (mode_with_wider_elts, mode_with_narrower_elts);
[PATCH 4/4 GCC11] rs6000: P9 D-form test cases
gcc/testsuite/ChangeLog 2020-01-16 Kelvin Nilsen Kewen Lin * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. gcc/testsuite/gcc.target/powerpc/p9-dform-0.c | 43 + gcc/testsuite/gcc.target/powerpc/p9-dform-1.c | 55 ++ gcc/testsuite/gcc.target/powerpc/p9-dform-2.c | 12 + gcc/testsuite/gcc.target/powerpc/p9-dform-3.c | 15 ++ gcc/testsuite/gcc.target/powerpc/p9-dform-4.c | 12 + .../gcc.target/powerpc/p9-dform-generic.h | 34 + 6 files changed, 171 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-0.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-3.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-4.c create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c new file mode 100644 index 000..01f8b69 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-0.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +/* This test confirms that the dform instructions are selected in the + translation of this main program. */ + +extern void first_dummy (); +extern void dummy (double sacc, int n); +extern void other_dummy (); + +extern float opt_value; +extern char *opt_desc; + +#define M 128 +#define N 512 + +double x [N]; +double y [N]; + +int main (int argc, char *argv []) { + double sacc; + + first_dummy (); + for (int j = 0; j < M; j++) { + +sacc = 0.00; +for (unsigned long long int i = 0; i < N; i++) { + sacc += x[i] * y[i]; +} +dummy (sacc, N); + } + opt_value = ((float) N) * 2 * ((float) M); + opt_desc = "flops"; + other_dummy (); +} + +/* At time the dform optimization pass was merged with trunk, 12 + lxv instructions were emitted in place of the same number of lxvx + instructions. No need to require exactly this number, as it may + change when other optimization passes evolve. */ + +/* { dg-final { scan-assembler {\mlxv\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c new file mode 100644 index 000..c6f1d76 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-1.c @@ -0,0 +1,55 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +/* This test confirms that the dform instructions are selected in the + translation of this main program. */ + +extern void first_dummy (); +extern void dummy (double sacc, int n); +extern void other_dummy (); + +extern float opt_value; +extern char *opt_desc; + +#define M 128 +#define N 512 + +double x [N]; +double y [N]; +double z [N]; + +int main (int argc, char *argv []) { + double sacc; + + first_dummy (); + for (int j = 0; j < M; j++) { + +sacc = 0.00; +for (unsigned long long int i = 0; i < N; i++) { + z[i] = x[i] * y[i]; + sacc += z[i]; +} +dummy (sacc, N); + } + opt_value = ((float) N) * 2 * ((float) M); + opt_desc = "flops"; + other_dummy (); +} + + + +/* At time the dform optimization pass was merged with trunk, 12 + lxv instructions were emitted in place of the same number of lxvx + instructions. No need to require exactly this number, as it may + change when other optimization passes evolve. */ + +/* { dg-final { scan-assembler {\mlxv\M} } } */ + +/* At time the dform optimization pass was merged with trunk, 6 + stxv instructions were emitted in place of the same number of stxvx + instructions. No need to require exactly this number, as it may + change when other optimization passes evolve. */ + +/* { dg-final { scan-assembler {\mstxv\M} } } */ + diff --git a/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c b/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c new file mode 100644 index 000..8752f3d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p9-dform-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */ + +#define TYPE int +#include "p9-dform-generic.h" + +/* The precise number of lxv and stxv instructions may be impacted by + complex interactions between optimization passes, but we expect at + least one of each. */ +/* { dg-final { scan-assembler {\mlxv\M} } } */ +/* {
[PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling
gcc/ChangeLog 2020-01-16 Kewen Lin * tree-ssa-loop-ivopts.c (struct iv_group): New field dform_p. (struct iv_cand): New field dform_p. (struct ivopts_data): New field mark_dform_p. (record_group): Initialize dform_p. (mark_dform_groups): New function. (find_interesting_uses): Call mark_dform_groups. (add_candidate_1): Update dform_p if derived from dform_p group. (determine_iv_cost): Increase cost by considering unroll factor. (tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update mark_dform_p. gcc/tree-ssa-loop-ivopts.c | 84 +- 1 file changed, 83 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index ab52cbe..a0d29bb 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -429,6 +429,8 @@ struct iv_group struct iv_cand *selected; /* To indicate this is a doloop use group. */ bool doloop_p; + /* To indicate this group is D-form preferred. */ + bool dform_p; /* Uses in the group. */ vec vuses; }; @@ -470,6 +472,7 @@ struct iv_cand struct iv *orig_iv; /* The original iv if this cand is added from biv with smaller type. */ bool doloop_p; /* Whether this is a doloop candidate. */ + bool dform_p;/* Derived from one D-form preferred group. */ }; /* Hashtable entry for common candidate derived from iv uses. */ @@ -650,6 +653,10 @@ struct ivopts_data /* Whether the loop has doloop comparison use. */ bool doloop_use_p; + + /* Whether the loop is likely to unroll and need to check and mark + D-form group for better step cost modeling. */ + bool mark_dform_p; }; /* An assignment of iv candidates to uses. */ @@ -1575,6 +1582,7 @@ record_group (struct ivopts_data *data, enum use_type type) group->related_cands = BITMAP_ALLOC (NULL); group->vuses.create (1); group->doloop_p = false; + group->dform_p = false; data->vgroups.safe_push (group); return group; @@ -2724,6 +2732,59 @@ split_address_groups (struct ivopts_data *data) } } +/* Go through all address type groups, check and mark D-form preferred. */ +static void +mark_dform_groups (struct ivopts_data *data) +{ + if (!data->mark_dform_p) +return; + + class loop *loop = data->current_loop; + bool dump_details = (dump_file && (dump_flags & TDF_DETAILS)); + for (unsigned i = 0; i < data->vgroups.length (); i++) +{ + struct iv_group *group = data->vgroups[i]; + if (address_p (group->type)) + { + bool found = true; + for (unsigned j = 0; j < group->vuses.length (); j++) + { + struct iv_use *use = group->vuses[j]; + gcc_assert (use->mem_type); + /* Ensure the step fit into D-form field. */ + if (TREE_CODE (use->iv->step) != INTEGER_CST + || !tree_fits_shwi_p (use->iv->step)) + { + found = false; + if (dump_details) + fprintf (dump_file, +" Group use %u.%u doesn't" +"have constant step for D-form.\n", +i, j); + break; + } + bool is_store + = TREE_CODE (gimple_assign_lhs (use->stmt)) == SSA_NAME; + if (!targetm.stride_dform_valid_p (TYPE_MODE (use->mem_type), +tree_to_shwi (use->iv->step), +TYPE_UNSIGNED (use->mem_type), +is_store, loop->estimated_uf)) + { + found = false; + if (dump_details) + fprintf (dump_file, +" Group use %u.%u isn't" +"suitable for D-form.\n", +i, j); + break; + } + } + if (found) + group->dform_p = true; + } +} +} + /* Finds uses of the induction variables that are interesting. */ static void @@ -2755,6 +2816,8 @@ find_interesting_uses (struct ivopts_data *data) split_address_groups (data); + mark_dform_groups (data); + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "\n:\n"); @@ -3137,6 +3200,7 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, cand->important = important; cand->incremented_at = incremented_at; cand->doloop_p = doloop; + cand->dform_p = false; data->vcands.safe_push (cand); if (!poly_int_tree_p (step)) @@ -3173,7 +3237,11 @@ add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, /* Relate candidate to the group for which it is added. */ if (use) -
[PATCH 2/4 GCC11] Add target hook stride_dform_valid_p
gcc/ChangeLog 2020-01-16 Kewen Lin * config/rs6000/rs6000.c (TARGET_STRIDE_DFORM_VALID_P): New macro. (rs6000_stride_dform_valid_p): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_STRIDE_DFORM_VALID_P): New hook. * target.def (stride_dform_valid_p): New hook. gcc/config/rs6000/rs6000.c | 40 gcc/doc/tm.texi| 8 gcc/doc/tm.texi.in | 2 ++ gcc/target.def | 13 - 4 files changed, 62 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 0dabaa6..1e41fcf 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1657,6 +1657,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_PREDICT_DOLOOP_P #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p +#undef TARGET_STRIDE_DFORM_VALID_P +#define TARGET_STRIDE_DFORM_VALID_P rs6000_stride_dform_valid_p + #undef TARGET_HAVE_COUNT_REG_DECR_P #define TARGET_HAVE_COUNT_REG_DECR_P true @@ -26272,6 +26275,43 @@ rs6000_predict_doloop_p (struct loop *loop) return true; } +/* Return true if the memory access with mode MODE, signedness SIGNED_P and + store STORE_P with offset from 0 to (NUNROLL-1) * STRIDE are valid with + D-form instructions. */ + +static bool +rs6000_stride_dform_valid_p (machine_mode mode, signed HOST_WIDE_INT stride, + bool signed_p, bool store_p, unsigned nunroll) +{ + static const HOST_WIDE_INT max_bound = 0x7fff; + static const HOST_WIDE_INT min_bound = -0x8000; + + if (!IN_RANGE ((nunroll - 1) * stride, min_bound, max_bound)) +return false; + + /* Check DQ-form for vector mode or float128 mode. */ + if (VECTOR_MODE_P (mode) || FLOAT128_VECTOR_P (mode)) +{ + if (mode_supports_dq_form (mode) && !(stride & 0xF)) + return true; + else + return false; +} + + /* Simply consider non VSX instructions. */ + if (mode == QImode || mode == HImode || mode == SFmode || mode == DFmode) +return true; + + /* lwz/stw is D-form, but lwa is DS-form. */ + if (mode == SImode && (!signed_p || store_p || !(stride & 0x03))) +return true; + + if (mode == DImode && !(stride & 0x03)) +return true; + + return false; +} + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-rs6000.h" diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 86ad278..0b8bc7c 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11669,6 +11669,14 @@ function version at run-time for a given set of function versions. body must be generated. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STRIDE_DFORM_VALID_P (machine_mode @var{mode}, signed HOST_WIDE_INT @var{stride}, bool @var{signed_p}, bool @var{store_p}, unsigned @var{nunroll}) +For a given memory access, check whether it is valid to put 0, @var{stride} +, 2 * @var{stride}, ... , (@var{nunroll} - 1) to the instruction D-form +displacement, with mode @var{mode}, signedness @var{signed_p} and store +@var{store_p}. Return true if valid. +The default version of this hook returns false. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (class loop *@var{loop}) Return true if we can predict it is possible to use a low-overhead loop for a particular loop. The parameter @var{loop} is a pointer to the loop. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index fd9769e..e90d020 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7953,6 +7953,8 @@ to by @var{ce_info}. @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY +@hook TARGET_STRIDE_DFORM_VALID_P + @hook TARGET_PREDICT_DOLOOP_P @hook TARGET_HAVE_COUNT_REG_DECR_P diff --git a/gcc/target.def b/gcc/target.def index f61c831..ee19a8d 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4300,7 +4300,18 @@ DEFHOOK emits a @code{speculation_barrier} instruction if that is defined.", rtx, (machine_mode mode, rtx result, rtx val, rtx failval), default_speculation_safe_value) - + +DEFHOOK +(stride_dform_valid_p, + "For a given memory access, check whether it is valid to put 0, @var{stride}\n\ +, 2 * @var{stride}, ... , (@var{nunroll} - 1) to the instruction D-form\n\ +displacement, with mode @var{mode}, signedness @var{signed_p} and store\n\ +@var{store_p}. Return true if valid.\n\ +The default version of this hook returns false.", + bool, (machine_mode mode, signed HOST_WIDE_INT stride, bool signed_p, + bool store_p, unsigned nunroll), + NULL) + DEFHOOK (predict_doloop_p, "Return true if we can predict it is possible to use a low-overhead loop\n\ -- 2.7.4
[PATCH 1/4 GCC11] Add middle-end unroll factor estimation
gcc/ChangeLog 2020-01-16 Kewen Lin * cfgloop.h (struct loop): New field estimated_uf. * config/rs6000/rs6000.c (TARGET_LOOP_UNROLL_ADJUST_TREE): New macro. (rs6000_loop_unroll_adjust_tree): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_LOOP_UNROLL_ADJUST_TREE): New hook. * target.def (loop_unroll_adjust_tree): New hook. * tree-ssa-loop-manip.c (decide_uf_const_iter): New function. (decide_uf_runtime_iter): Likewise. (decide_uf_stupid): Likewise. (estimate_unroll_factor): Likewise. * tree-ssa-loop-manip.h (estimate_unroll_factor): New declare. * tree-ssa-loop.c (tree_average_num_loop_insns): New function. * tree-ssa-loop.h (tree_average_num_loop_insns): New declare. gcc/cfgloop.h | 3 + gcc/config/rs6000/rs6000.c | 16 ++- gcc/doc/tm.texi| 6 ++ gcc/doc/tm.texi.in | 2 + gcc/target.def | 8 ++ gcc/tree-ssa-loop-manip.c | 254 + gcc/tree-ssa-loop-manip.h | 3 +- gcc/tree-ssa-loop.c| 33 ++ gcc/tree-ssa-loop.h| 2 + 9 files changed, 324 insertions(+), 3 deletions(-) diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index e3590d7..feceed6 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -232,6 +232,9 @@ public: Other values means unroll with the given unrolling factor. */ unsigned short unroll; + /* Like unroll field above, but it's estimated in middle-end. */ + unsigned short estimated_uf; + /* If this loop was inlined the main clique of the callee which does not need remapping when copying the loop body. */ unsigned short owned_clique; diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 2995348..0dabaa6 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1431,6 +1431,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_LOOP_UNROLL_ADJUST #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust +#undef TARGET_LOOP_UNROLL_ADJUST_TREE +#define TARGET_LOOP_UNROLL_ADJUST_TREE rs6000_loop_unroll_adjust_tree + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS rs6000_init_builtins #undef TARGET_BUILTIN_DECL @@ -5090,7 +5093,8 @@ rs6000_destroy_cost_data (void *data) free (data); } -/* Implement targetm.loop_unroll_adjust. */ +/* Implement targetm.loop_unroll_adjust. Don't forget to update + loop_unroll_adjust_tree for any changes. */ static unsigned rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop) @@ -5109,6 +5113,16 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop) return nunroll; } +/* Implement targetm.loop_unroll_adjust_tree, strictly refers to + targetm.loop_unroll_adjust. */ + +static unsigned +rs6000_loop_unroll_adjust_tree (unsigned nunroll, struct loop *loop) +{ + /* For now loop_unroll_adjust is simple, just invoke directly. */ + return rs6000_loop_unroll_adjust (nunroll, loop); +} + /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a library with vectorized intrinsics. */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 2244df4..86ad278 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11875,6 +11875,12 @@ is required only when the target has special constraints like maximum number of memory accesses. @end deftypefn +@deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST_TREE (unsigned @var{nunroll}, class loop *@var{loop}) +This target hook is the same as @code{loop_unroll_adjust}, but it's for +middle-end unroll factor estimation computation. See +@code{loop_unroll_adjust} for the function description. +@end deftypefn + @defmac POWI_MAX_MULTS If defined, this macro is interpreted as a signed integer C expression that specifies the maximum number of floating point multiplications diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 52cd603..fd9769e 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8008,6 +8008,8 @@ lists. @hook TARGET_LOOP_UNROLL_ADJUST +@hook TARGET_LOOP_UNROLL_ADJUST_TREE + @defmac POWI_MAX_MULTS If defined, this macro is interpreted as a signed integer C expression that specifies the maximum number of floating point multiplications diff --git a/gcc/target.def b/gcc/target.def index e705c5d..f61c831 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2725,6 +2725,14 @@ number of memory accesses.", unsigned, (unsigned nunroll, class loop *loop), NULL) +DEFHOOK +(loop_unroll_adjust_tree, + "This target hook is the same as @code{loop_unroll_adjust}, but it's for\n\ +middle-end unroll factor estimation computation. See\n\ +@code{loop_unroll_adjust} for the function description.", + unsigned, (unsigned nunroll, class loop *loop), + NULL) + /* True if X is a legitimate MODE-mode immediate operand. */ DEFHOOK (legitimate_constant_p, diff --git a/gcc/tree-ssa-loop-manip.c
[PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling
Hi, As we discussed in the thread https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html, I'm working to teach IVOPTs to consider D-form group access during unrolling. The difference on D-form and other forms during unrolling is we can put the stride into displacement field to avoid additional step increment. eg: With X-form (uf step increment): ... LD A = baseA, X LD B = baseB, X ST C = baseC, X X = X + stride LD A = baseA, X LD B = baseB, X ST C = baseC, X X = X + stride LD A = baseA, X LD B = baseB, X ST C = baseC, X X = X + stride ... With D-form (one step increment for each base): ... LD A = baseA, OFF LD B = baseB, OFF ST C = baseC, OFF LD A = baseA, OFF+stride LD B = baseB, OFF+stride ST C = baseC, OFF+stride LD A = baseA, OFF+2*stride LD B = baseB, OFF+2*stride ST C = baseC, OFF+2*stride ... baseA += stride * uf baseB += stride * uf baseC += stride * uf Imagining that if the loop get unrolled by 8 times, then 3 step updates with D-form vs. 8 step updates with X-form. Here we only need to check stride meet D-form field requirement, since if OFF doesn't meet, we can construct baseA' with baseA + OFF. This patch set consists four parts: [PATCH 1/4 GCC11] Add middle-end unroll factor estimation Add unroll factor estimation in middle-end. It mainly refers to current RTL unroll factor determination in function decide_unrolling and its sub calls. As Richard B. suggested, we probably can force unroll factor with this and avoid duplicate unroll factor calculation, but I think it need more benchmarking work and should be handled separately. [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p Add one target hook to determine whether the current memory access with the given mode, stride and other flags have available D-form supports. [PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling Teach IVOPTs to identify address type iv group with D-form preferred, and flag dform_p of their derived iv cands. Considering unroll factor, increase iv cost with (uf - 1) * cost_step if it's not a dform iv cand. [PATCH 4/4 GCC11] rs6000: P9 D-form test cases Add some test cases, mainly copied from Kelvin's patch. Bootstrapped and regress tested on powerpc64le-linux-gnu. I'll take two weeks leave soon, please expect late responses. Thanks a lot in advance! BR, Kewen gcc/cfgloop.h | 3 + gcc/config/rs6000/rs6000.c | 56 - gcc/doc/tm.texi | 14 + gcc/doc/tm.texi.in | 4 ++ gcc/target.def | 21 ++- gcc/testsuite/gcc.target/powerpc/p9-dform-0.c | 43 + gcc/testsuite/gcc.target/powerpc/p9-dform-1.c | 55 + gcc/testsuite/gcc.target/powerpc/p9-dform-2.c | 12 gcc/testsuite/gcc.target/powerpc/p9-dform-3.c | 15 + gcc/testsuite/gcc.target/powerpc/p9-dform-4.c | 12 gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h | 34 +++ gcc/tree-ssa-loop-ivopts.c | 84 +- gcc/tree-ssa-loop-manip.c | 254 + gcc/tree-ssa-loop-manip.h | 3 +- gcc/tree-ssa-loop.c | 33 ++ gcc/tree-ssa-loop.h | 2 + 16 files changed, 640 insertions(+), 5 deletions(-)
Re: [PATCH 2/2] Uninitialized padding in struct _dep.
On January 16, 2020 6:08:49 AM GMT+01:00, apin...@marvell.com wrote: >From: Andrew Pinski > >In struct _dep, there is an implicit padding of 4bits. This >bit-field padding is uninitialized when init_dep_1 is being called. >This means we access uninitialized memory but never use it for >anything. Adding an unused bit-field field and initializing it >in init_dep_1 will improve code generation also as we initialize >the whole 32bits now rather than just part of it. > >OK? Bootstrapped and tested on x86_64-linux-gnu. OK. Richard. >Thanks, >Andrew Pinski > >ChangeLog: >* sched-int.h (_dep): Add unused bit-field field for the padding. >* sched-deps.c (init_dep_1): Init unused field. > >Change-Id: I27000323e728f8a73189426e0b9a98c5235b8c55 >--- > gcc/sched-deps.c | 1 + > gcc/sched-int.h | 2 ++ > 2 files changed, 3 insertions(+) > >diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c >index 9182aba5588..331af5ffdb3 100644 >--- a/gcc/sched-deps.c >+++ b/gcc/sched-deps.c >@@ -101,6 +101,7 @@ init_dep_1 (dep_t dep, rtx_insn *pro, rtx_insn >*con, enum reg_note type, ds_t ds > DEP_NONREG (dep) = 0; > DEP_MULTIPLE (dep) = 0; > DEP_REPLACE (dep) = NULL; >+ dep->unused = 0; > } > > /* Init DEP with the arguments. >diff --git a/gcc/sched-int.h b/gcc/sched-int.h >index 833b552a340..a847f876e65 100644 >--- a/gcc/sched-int.h >+++ b/gcc/sched-int.h >@@ -238,6 +238,8 @@ struct _dep >/* Cached cost of the dependency. Make sure to update UNKNOWN_DEP_COST > when changing the size of this field. */ > int cost:20; >+ >+ unsigned unused:4; > }; > > #define UNKNOWN_DEP_COST ((int) ((unsigned int) -1 << 19))
Re: [PATCH 1/2] Fix uninitialized field in expand_operand.
On January 16, 2020 6:08:48 AM GMT+01:00, apin...@marvell.com wrote: >From: Andrew Pinski > >Commit g:f96bf49a0 added the target field to expand_operand. >But it leaves it uninitialized when doing a full initialization >inside create_expand_operand. This fixes the problem and improves >the code generation inside create_expand_operand too. > >OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. OK. Richard. >ChangeLog: >* optabs.h (create_expand_operand): Initialize target field also. > >Change-Id: Ib653fbfbb2b0709970db87fb94de14b59758bc6c >--- > gcc/optabs.h | 1 + > 1 file changed, 1 insertion(+) > >diff --git a/gcc/optabs.h b/gcc/optabs.h >index 07bdc56586e..5bd19503a0a 100644 >--- a/gcc/optabs.h >+++ b/gcc/optabs.h >@@ -78,6 +78,7 @@ create_expand_operand (class expand_operand *op, > { > op->type = type; > op->unsigned_p = unsigned_p; >+ op->target = 0; > op->unused = 0; > op->mode = mode; > op->value = value;
Re: [PATCH] Fix value numbering dealing with reverse byte order
On January 16, 2020 9:13:46 AM GMT+01:00, apin...@marvell.com wrote: >From: Andrew Pinski > >Hi, > While working on bit-field lowering pass, I came across this bug. >The IR looks like: > VIEW_CONVERT_EXPR(var1) = _12; > _1 = BIT_FIELD_REF ; > >Where the BIT_FIELD_REF has REF_REVERSE_STORAGE_ORDER set on it >and var1's type has TYPE_REVERSE_STORAGE_ORDER set on it. >PRE/FRE would decided to prop _12 into the BFR statement >which would produce wrong code. >And yes _12 has the correct byte order already; bit-field lowering >removes the implicit byte swaps in the IR and adds the explicity >to make it easier optimize later on. > >This patch adds a check for storage_order_barrier_p on the lhs tree >which returns true in the case where we had a reverse order with a VCE. > >gcc.c-torture/execute/pr86659-1.c was the testcase which showed the >issue. > >OK? Bootstrapped and tested on x86_64-linux-gnu with no regression. OK. Richard. >Thanks, >Andrew Pinski > >ChangeLog: >* tree-ssa-sccvn.c(vn_reference_lookup_3): Check lhs for >!storage_order_barrier_p. > >Change-Id: I7810de6fc4ff01e431033fa7f7f7b3ec95f67644 >--- > gcc/tree-ssa-sccvn.c | 2 ++ > 1 file changed, 2 insertions(+) > >diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c >index 3b27c50..4d13015 100644 >--- a/gcc/tree-ssa-sccvn.c >+++ b/gcc/tree-ssa-sccvn.c >@@ -2593,6 +2593,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, >void *data_, >, , , ); > if (base2 > && !reverse >+&& !storage_order_barrier_p (lhs) > && known_eq (maxsize2, size2) > && multiple_p (size2, BITS_PER_UNIT) > && multiple_p (offset2, BITS_PER_UNIT) >@@ -2695,6 +2696,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, >void *data_, >, , , ); > tree def_rhs = gimple_assign_rhs1 (def_stmt); > if (!reverse >+&& !storage_order_barrier_p (lhs) > && known_size_p (maxsize2) > && known_eq (maxsize2, size2) > && adjust_offsets_for_equal_base_address (base, ,
Re: [PATCH] analyzer: fix handling of negative byte offsets (PR 93281)
On Wed, Jan 15, 2020 at 06:56:48PM -0500, David Malcolm wrote: > gcc/analyzer/ChangeLog: > PR analyzer/93281 > * region-model.cc > (region_model::convert_byte_offset_to_array_index): Convert > offset_cst to ssizetype before dividing by byte_size. > --- > gcc/analyzer/region-model.cc | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc > index 5c39be4fd7f..b62ddf82a40 100644 > --- a/gcc/analyzer/region-model.cc > +++ b/gcc/analyzer/region-model.cc > @@ -6414,9 +6414,12 @@ region_model::convert_byte_offset_to_array_index (tree > ptr_type, >/* This might not be a constant. */ >tree byte_size = size_in_bytes (elem_type); > > + /* Ensure we're in a signed representation before doing the division. > */ > + tree signed_offset_cst = fold_convert (ssizetype, offset_cst); > + >tree index > = fold_build2 (TRUNC_DIV_EXPR, integer_type_node, > -offset_cst, byte_size); > +signed_offset_cst, byte_size); I'd say you want to fold_convert byte_size to ssizetype too. Another thing is the integer_type_node, that looks wrong when you are dividing ssizetype by ssizetype. The fold-const.c folders are sensitive to using incorrect types and type mismatches. And, I think fold_build2 is wasteful, you only care if you can simplify it to a constant (just constant or INTEGER_CST only?). So, either use fold_binary (TRUNC_DIV_EXPR, ssizetype, offset_cst, fold_convert (ssizetype, byte_size)) or, if you have checked that both arguments are INTEGER_CSTs, perhaps int_const_binop or so. >if (CONSTANT_CLASS_P (index)) And this would need to be if (index && TREE_CODE (index) == INTEGER_CST) (or if you can handle other constants (index && CONSTANT_CLASS_P (index)). > return get_or_create_constant_svalue (index); Jakub
[PATCH] Fix value numbering dealing with reverse byte order
From: Andrew Pinski Hi, While working on bit-field lowering pass, I came across this bug. The IR looks like: VIEW_CONVERT_EXPR(var1) = _12; _1 = BIT_FIELD_REF ; Where the BIT_FIELD_REF has REF_REVERSE_STORAGE_ORDER set on it and var1's type has TYPE_REVERSE_STORAGE_ORDER set on it. PRE/FRE would decided to prop _12 into the BFR statement which would produce wrong code. And yes _12 has the correct byte order already; bit-field lowering removes the implicit byte swaps in the IR and adds the explicity to make it easier optimize later on. This patch adds a check for storage_order_barrier_p on the lhs tree which returns true in the case where we had a reverse order with a VCE. gcc.c-torture/execute/pr86659-1.c was the testcase which showed the issue. OK? Bootstrapped and tested on x86_64-linux-gnu with no regression. Thanks, Andrew Pinski ChangeLog: * tree-ssa-sccvn.c(vn_reference_lookup_3): Check lhs for !storage_order_barrier_p. Change-Id: I7810de6fc4ff01e431033fa7f7f7b3ec95f67644 --- gcc/tree-ssa-sccvn.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 3b27c50..4d13015 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -2593,6 +2593,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, , , , ); if (base2 && !reverse + && !storage_order_barrier_p (lhs) && known_eq (maxsize2, size2) && multiple_p (size2, BITS_PER_UNIT) && multiple_p (offset2, BITS_PER_UNIT) @@ -2695,6 +2696,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, , , , ); tree def_rhs = gimple_assign_rhs1 (def_stmt); if (!reverse + && !storage_order_barrier_p (lhs) && known_size_p (maxsize2) && known_eq (maxsize2, size2) && adjust_offsets_for_equal_base_address (base, , -- 1.8.3.1
Re: [C++ coroutines 4/6] Middle end expanders and transforms.
Bin.Cheng wrote: >>> + gassign *get_res = gimple_build_assign (lhs, done); >>> + gsi_replace (gsi, get_res, true); >>> + *handled_ops_p = true; >>> + } >>> + break; >>> + } >>> +} >>> + return NULL_TREE; >>> +} >>> + >>> +/* Main entry point for lowering coroutine FE builtins. */ >>> + >>> +static unsigned int >>> +execute_lower_coro_builtins (void) >>> +{ >>> + struct walk_stmt_info wi; >>> + gimple_seq body; >>> + >>> + body = gimple_body (current_function_decl); >>> + memset (, 0, sizeof (wi)); >>> + walk_gimple_seq_mod (, lower_coro_builtin, NULL, ); >>> + gimple_set_body (current_function_decl, body); >> >> it would be nice to elide the function walk for functions not >> containing any CO* stuff (you noted that below for other parts). >> We can spend a bit in struct function noting functions with >> coroutine code inside and set the bit from frontends or from >> the gimplifier for example. Since it's behind the flag_coroutines >> paywall this can be addressed as followup. > > Yes, this bit flag is necessary for following optimization passes, I > wonder which bit you would suggest? it’s implemented in the patches (that should be merged soon, just retesing after the git transition). Uses a flag bit in the function struct. see: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00435.html diff --git a/gcc/function.h b/gcc/function.h index 496d3f728c..1ee8ed3de5 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -418,6 +418,9 @@ struct GTY(()) function { /* Set when the function was compiled with generation of debug (begin stmt, inline entry, ...) markers enabled. */ unsigned int debug_nonbind_markers : 1; + + /* Set if this is a coroutine-related function. */ + unsigned int coroutine_component : 1; };