Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Hongtao Liu via Gcc-patches
On Tue, Oct 12, 2021 at 11:49 PM Martin Sebor  wrote:
>
> On 10/11/21 8:31 PM, Hongtao Liu wrote:
> > On Tue, Oct 12, 2021 at 4:08 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> On 10/11/21 11:43 AM, Segher Boessenkool wrote:
> >>> On Mon, Oct 11, 2021 at 10:23:03AM -0600, Martin Sebor wrote:
>  On 10/11/21 9:30 AM, Segher Boessenkool wrote:
> > On Mon, Oct 11, 2021 at 10:47:00AM +0800, Kewen.Lin wrote:
> >> - For generic test cases, it follows the existing suggested
> >> practice with necessary target/xfail selector.
> >
> > Not such a great choice.  Many of those tests do not make sense with
> > vectorisation enabled.  This should have been thought about, in some
> > cases resulting in not running the test with vectorisation enabled, and
> > in some cases duplicating the test, once with and once without
> > vectorisation.
> 
>  The tests detect bugs that are present both with and without
>  vectorization, so they should pass both ways.
> >>>
> >>> Then it should be tested both ways!  This is my point.
> >>
> >> Agreed.  (Most warnings are tested with just one set of options,
> >> but it's becoming apparent that the middle end ones should be
> >> exercised more extensively.)
> >>
> >>>
>  That they don't
>  tells us that the warnings need work (they were written with
>  an assumption that doesn't hold anymore).
> >>>
> >>> They were written in world A.  In world B many things behave
> >>> differently.  Transplanting the testcases from A to B without any extra
> >>> analysis will not test what the testcases wanted to test, and possibly
> >>> nothing at all anymore.
> >>
> >> Absolutely.
> >>
> >>>
>  We need to track that
>  work somehow, but simply xfailing them without making a record
>  of what underlying problem the xfails correspond to isn't the best
>  way.  In my experience, what works well is opening a bug for each
>  distinct limitation (if one doesn't already exist) and adding
>  a reference to it as a comment to the xfail.
> >>>
> >>> Probably, yes.
> >>>
> > But you are just following established practice, so :-)
> >>>
> >>> I also am okay with this.  If it was decided x86 does not have to deal
> >>> with these (generic!) problems, then why should we do other people's
> >>> work?
> >>
> >> I don't know that anything was decided.  I think those changes
> >> were made in haste, and (as you noted in your review of these
> >> updates to them), were incomplete (missing comments referencing
> >> the underlying bugs or limitations).  Now that we've noticed it
> >> we should try to fix it.  I'm not expecting you (or Kewen) to do
> >> other people's work, but it would help to let them/us know that
> >> there is work for us to do.  I only noticed the problem by luck.
> >>
> >> -  struct A1 a = { 0, { 1 } };   // { dg-warning "\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* } } }
> >> +  struct A1 a = { 0, { 1 } };   // { dg-warning "\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* powerpc*-*-* } } }
> 
>  As I mentioned in the bug, when adding xfails for regressions
>  please be sure to reference the bug that tracks the underlying
>  root cause.
> >>>
> >>> You are saying this to whoever added that x86 xfail I hope.
> >>
> >> In general it's an appeal to both authors and reviewers of such
> >> changes.  Here, it's mostly for Hongtao who apparently added all
> >> these undocumented xfails.
> >>
>  There may be multiple problems, and we need to
>  identify what it is in each instance.  As the author of
>  the tests I can help with that but not if I'm not in the loop
>  on these changes (it would seem prudent to get the author's
>  thoughts on such sweeping changes to their work).
> >>>
> >>> Yup.
> >>>
>  I discussed one of these failures with Hongtao in detail at
>  the time autovectorization was being enabled and made the same
>  request then but I didn't realize the problem was so pervasive.
> 
>  In addition, the target-specific conditionals in the xfails are
>  going to be difficult to maintain.
> >>>
> >>> It is a cop-out.  Especially because it makes no comment why it is
> >>> xfailed (which should *always* be explained!)
> >>>
>  It might be okay for one or
>  two in a single test but for so many we need a better solution
>  than that.  If autovectorization is only enabled for a subset
>  of targets then a solution might be to add a new DejaGnu test
>  for it and conditionalize the xfails on it.
> >>>
> >>> That, combined with duplicating these tests and still testing the
> >>> -fno-vectorization situation properly.  Those tests tested something.
> >>> With vectorisation enabled they might no longer test that same thing,
> >>> especially if the test fails now!
> >>
> >> Right.  The original autovectorization change was made either
> >> without a full 
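As a concrete illustration of the practice discussed in this thread, an xfail annotated with its tracking bug would look something like the fragment below.  This is only a sketch: the PR number, the target list, and the test line itself are invented for the example, not taken from any real testcase.

```
/* Hypothetical test line: the comment names the PR that tracks the
   underlying limitation, so future readers know why the xfail exists.
   PR tree-optimization/90000 is a placeholder number.  */
T (a, "0123", 4);   /* { dg-warning "\\[-Wstringop-overflow" "pr90000" { xfail { powerpc*-*-* } } } */
```

The second argument of dg-warning (here "pr90000") also shows up in the test summary, which makes it easy to grep the results for a specific tracked limitation.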

Re: [PATCH] rs6000: Remove builtin mask check from builtin_decl [PR102347]

2021-10-12 Thread Kewen.Lin via Gcc-patches
Hi Bill!

on 2021/10/13 12:36 AM, Bill Schmidt wrote:
> Hi Kewen,
> 
> On 10/11/21 1:30 AM, Kewen.Lin wrote:
>> Hi Segher,
>>
>> Thanks for the comments.
>>
>> on 2021/10/1 6:13 AM, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Thu, Sep 30, 2021 at 11:06:50AM +0800, Kewen.Lin wrote:
>>>
>>> [ huge snip ]
>>>
 Based on the understanding and testing, I think it's safe to adopt this
 patch.
 Do both you and Peter agree that rs6000_expand_builtin will catch the
 invalid built-in?  Is there some special case that might escape?
>>> The function rs6000_builtin_decl has a terribly generic name.  Where all
>>> is it called from?  Do all such places allow the change in semantics?
>>> Do any comments or other documentation need to change?  Is the function
>>> name still good?
>>
>> % grep -rE "\<builtin_decl\> \(" .
>> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
>> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
>> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
>> ./gcc/config/aarch64/aarch64.c:  return aarch64_sve::builtin_decl 
>> (subcode, initialize_p);
>> ./gcc/config/aarch64/aarch64-protos.h:  tree builtin_decl (unsigned, bool);
>> ./gcc/config/aarch64/aarch64-sve-builtins.cc:builtin_decl (unsigned int 
>> code, bool)
>> ./gcc/tree-streamer-in.c:  tree result = targetm.builtin_decl 
>> (fcode, true);
>>
>> % grep -rE "\<rs6000_builtin_decl\> \(" .
>> ./gcc/config/rs6000/rs6000-c.c:  if (rs6000_builtin_decl 
>> (instance->bifid, false) != error_mark_node
>> ./gcc/config/rs6000/rs6000-c.c:  if (rs6000_builtin_decl 
>> (instance->bifid, false) != error_mark_node
>> ./gcc/config/rs6000/rs6000-c.c:  if (rs6000_builtin_decl 
>> (instance->bifid, false) != error_mark_node
>> ./gcc/config/rs6000/rs6000-gen-builtins.c:  "extern tree 
>> rs6000_builtin_decl (unsigned, "
>> ./gcc/config/rs6000/rs6000-call.c:rs6000_builtin_decl (unsigned code, bool 
>> initialize_p ATTRIBUTE_UNUSED)
>> ./gcc/config/rs6000/rs6000-internal.h:extern tree rs6000_builtin_decl 
>> (unsigned code,
>>
>> As above, the call sites are mainly in
>>   1) function unpack_ts_function_decl_value_fields in gcc/tree-streamer-in.c
>>   2) function altivec_resolve_new_overloaded_builtin in 
>> gcc/config/rs6000/rs6000-c.c
>>
>> 2) is newly introduced by Bill's bif rewriting patch series; all of its
>> uses go along with rs6000_new_builtin_is_supported, which adopts a new
>> way to check whether a bif is supported (the old
>> rs6000_builtin_is_supported_p uses the builtin mask), so I think the
>> builtin mask checking is useless (unexpected?) for these uses.
> 
> Things are a bit confusing because we are partway through the patch series.
> rs6000_builtin_decl will be changed to redirect to rs6000_new_builtin_decl 
> when
> using the new builtin support.  That function will be:
> 
> static tree
> rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> {
>   rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;
> 
>   if (fcode >= RS6000_OVLD_MAX)
> return error_mark_node;
> 
>   if (!rs6000_new_builtin_is_supported (fcode))
> {
>   rs6000_invalid_new_builtin (fcode);
>   return error_mark_node;
> }
> 
>   return rs6000_builtin_decls_x[code];
> }
> 
> So, as you surmise, this will be using the new method of testing for builtin 
> validity.
> You can ignore the rs6000-c.c and rs6000-gen-builtins.c references of 
> rs6000_builtin_decl
> for purposes of fixing the existing way of doing things.
> 

Thanks for the explanation, it makes more sense. 

>>
>> Besides, the description for this hook:
>>
>> "tree TARGET_BUILTIN_DECL (unsigned code, bool initialize_p) [Target Hook]
>> Define this hook if you have any machine-specific built-in functions
>> that need to be defined.  It should be a function that returns the
>> builtin function declaration for the builtin function code code.  If
>> there is no such builtin and it cannot be initialized at this time if
>> initialize_p is true the function should return NULL_TREE.  If code is
>> out of range the function should return error_mark_node."
>>
>> It would only return error_mark_node when the code is out of range.  The
>> current rs6000_builtin_decl returns error_mark_node not only for "out of
>> range", which looks inconsistent; this patch also revises that.
>>
>> The hook was introduced by commit e9e4b3a892d0d19418f23bb17bdeac33f9a8bfd2,
>> which was meant to ensure the bif function_decl is valid (check whether the
>> bif code is in range and the corresponding entry in the bif table is not
>> NULL).  Maybe a better name would be check_and_get_builtin_decl?  CC Richi,
>> he may have more insights.
>>
 By the way, I tested the bif rewriting patch series V5, it couldn't make
 the original case in PR (S5) pass; I may have missed something, or the
 series I used isn't up-to-date.  Could you help to have a try?  I agree
 with Peter, if the rewriting can fix this issue, 

PING^3 [PATCH] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-10-12 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html

One related patch [1] is ready to commit, whose test cases rely on
this patch if no changes are applied to them.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html

BR,
Kewen

>> on 2021/9/1 2:55 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi!
>>>
> >>> This patch is to fix the inconsistent behaviors for non-LTO mode
> >>> and LTO mode.  As Martin pointed out, currently the function
> >>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
> >>> NULL, but that's wrong; we should use the command-line options
> >>> from target_option_default_node as the default.  It also replaces
> >>> rs6000_isa_flags with the one from target_option_default_node
> >>> when caller_tree is NULL, as rs6000_isa_flags could have
> >>> changed since initialization.
>>>
> >>> It also extends the scope of the check to the case where the callee
> >>> has explicitly set options; for test case pr102059-2.c, inlining could
> >>> previously happen unexpectedly, and it's fixed accordingly.
>>>
>>> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
> >>> can be neglected for inlining; this patch also excludes them when
> >>> the callee is attributed with always_inline.
>>>
>>> Bootstrapped and regtested on powerpc64le-linux-gnu Power9.
>>>
>>> BR,
>>> Kewen
>>> -
>>> gcc/ChangeLog:
>>>
>>> PR ipa/102059
>>> * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>>> target_option_default_node and consider always_inline_safe flags.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR ipa/102059
>>> * gcc.target/powerpc/pr102059-1.c: New test.
>>> * gcc.target/powerpc/pr102059-2.c: New test.
>>> * gcc.target/powerpc/pr102059-3.c: New test.
>>> * gcc.target/powerpc/pr102059-4.c: New test.
>>>
>>


PING^1 [PATCH v2] rs6000: Modify the way for extra penalized cost

2021-10-12 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580358.html

BR,
Kewen

on 2021/9/28 4:16 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This patch follows the discussions here[1][2], where Segher
> pointed out the existing way to guard the extra penalized
> cost for strided/elementwise loads with a magic bound does
> not scale.
> 
> The way with nunits * stmt_cost can produce a much exaggerated
> penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320; that's
> why we needed a bound.  To make it better and more readable, the
> penalized cost is simplified as:
> 
> unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
> unsigned extra_cost = nunits * adjusted_cost;
> 
> For V2DI/V2DF, it uses a penalized cost of 2 for each scalar load,
> while for the other modes it uses 1.  This is mainly concluded
> from the performance evaluations.  One possibly related point:
> the more units a vector construction has, the more instructions
> are used, and there are more chances to schedule them better
> (even run them in parallel when enough units are available at
> that time), so it seems reasonable not to penalize them more.
> 
> The SPEC2017 evaluations on Power8/Power9/Power10 at option
> sets O2-vect and Ofast-unroll show this change is neutral.
> 
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
> 
> Is it ok for trunk?
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579529.html
> 
> BR,
> Kewen
> -
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
>   the way to compute extra penalized cost.  Remove useless parameter.
>   (rs6000_add_stmt_cost): Adjust the call to function
>   rs6000_update_target_cost_per_stmt.
> 
> 
> ---
>  gcc/config/rs6000/rs6000.c | 31 ++-
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index dd42b0964f1..8200e1152c2 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5422,7 +5422,6 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data 
> *data,
>   enum vect_cost_for_stmt kind,
>   struct _stmt_vec_info *stmt_info,
>   enum vect_cost_model_location where,
> - int stmt_cost,
>   unsigned int orig_count)
>  {
> 
> @@ -5462,17 +5461,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data 
> *data,
>   {
> tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> unsigned int nunits = vect_nunits_for_cost (vectype);
> -   unsigned int extra_cost = nunits * stmt_cost;
> -   /* As function rs6000_builtin_vectorization_cost shows, we have
> -  priced much on V16QI/V8HI vector construction as their units,
> -  if we penalize them with nunits * stmt_cost, it can result in
> -  an unreliable body cost, eg: for V16QI on Power8, stmt_cost
> -  is 20 and nunits is 16, the extra cost is 320 which looks
> -  much exaggerated.  So let's use one maximum bound for the
> -  extra penalized cost for vector construction here.  */
> -   const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
> -   if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
> - extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
> +   /* Don't expect strided/elementwise loads for just 1 nunit.  */
> +   gcc_assert (nunits > 1);
> +   /* i386 port adopts nunits * stmt_cost as the penalized cost
> +  for this kind of penalization, we used to follow it but
> +  found it could result in an unreliable body cost especially
> +  for V16QI/V8HI modes.  To make it better, we choose this
> +  new heuristic: for each scalar load, we use 2 as penalized
> +  cost for the case with 2 nunits and use 1 for the other
> +  cases.  It's without much supporting theory, mainly
> +  concluded from the broad performance evaluations on Power8,
> +  Power9 and Power10.  One possibly related point is that:
> +  vector construction for more units would use more insns,
> +  it has more chances to schedule them better (even run in
> +  parallelly when enough available units at that time), so
> +  it seems reasonable not to penalize that much for them.  */
> +   unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
> +   unsigned int extra_cost = nunits * adjusted_cost;
> data->extra_ctor_cost += extra_cost;
>   }
>  }
> @@ -5510,7 +5515,7 @@ rs6000_add_stmt_cost (class vec_info *vinfo, void 
> *data, int count,
>cost_data->cost[where] += retval;
> 
>rs6000_update_target_cost_per_stmt 

PING^4 [PATCH v2] combine: Tweak the condition of last_set invalidation

2021-10-12 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572555.html

BR,
Kewen

>>> on 2021/6/11 9:16 PM, Kewen.Lin via Gcc-patches wrote:
 Hi Segher,

 Thanks for the review!

 on 2021/6/10 4:17 AM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
>> Currently we have the check:
>>
>>   if (!insn
>>|| (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>>  rsp->last_set_invalid = 1; 
>>
>> which means if we want to record some value for some reg and
>> this reg was referred to before in a valid scope,
>
> If we already know it is *set* in this same extended basic block.
> Possibly by the same instruction btw.
>
>> we invalidate the
>> set of the reg (last_set_invalid to 1).  It avoids finding the wrong
>> set for one reg reference, such as the case like:
>>
>>... op regX  // this regX could find wrong last_set below
>>regX = ...   // if we think this set is valid
>>... op regX
>
> Yup, exactly.
>
>> But because of retry's existence, the last_set_table_tick could
>> have been set by some later reference insn, yet we see it as set
>> when retrying the set insn (for that reg) again, such as:
>>
>>insn 1
>>insn 2
>>
>>regX = ... --> (a)
>>... op regX--> (b)
>>
>>insn 3
>>
>>// assume all in the same BB.
>>
>> Assuming we combine 1, 2 -> 3 successfully and replace them as two
>> (3 insns -> 2 insns),
>
> This will delete insn 1 and write the combined result to insns 2 and 3.
>
>> retrying from insn1 or insn2 again:
>
> Always 2, but your point remains valid.
>
>> it will scan insn (a) again, the below condition holds for regX:
>>
>>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
>>
>> it will mark this set as invalid set.  But actually the
>> last_set_table_tick here is set by insn (b) before retrying, so it
>> should be safe to take as a valid set.
>
> Yup.
>
>> This proposal is to check whether the last_set_table reference safely
>> happens after the current set, and keep the set valid if so.
>
>> Full SPEC2017 building shows this patch gets more successful combines
>> from 1902208 to 1902243 (trivial though).
>
> Do you have some example, or maybe even a testcase?  :-)
>

 Sorry for the late reply, it took some time to get a reduced case.

 typedef struct SA *pa_t;

 struct SC {
   int h;
   pa_t elem[];
 };

 struct SD {
   struct SC *e;
 };

 struct SA {
   struct {
 struct SD f[1];
   } g;
 };

 void foo(pa_t *k, char **m) {
   int l, i;
   pa_t a;
   l = (int)a->g.f[5].e;
   i = 0;
   for (; i < l; i++) {
 k[i] = a->g.f[5].e->elem[i];
 m[i] = "";
   }
 }

 Baseline is r12-0 and the options are "-O3 -mcpu=power9 -fno-strict-aliasing";
 with this patch, the generated assembly saves two rlwinm instructions.

>> +  /* Record the luid of the insn whose expression involving register n. 
>>  */
>> +
>> +  int   last_set_table_luid;
>
> "Record the luid of the insn for which last_set_table_tick was set",
> right?
>

 But it can be updated later to a smaller luid; how about wording like:


 +  /* Record the luid of the insn which uses register n, the insn should
 + be the first one using register n in that block of the insn which
 + last_set_table_tick was set for.  */


>> -static void update_table_tick (rtx);
>> +static void update_table_tick (rtx, int);
>
> Please remove this declaration instead, the function is not used until
> after its actual definition :-)
>

 Done.

>> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>>for (r = regno; r < endregno; r++)
>>  {
>>reg_stat_type *rsp = &reg_stat[r];
>> -  rsp->last_set_table_tick = label_tick;
>> +  if (rsp->last_set_table_tick >= label_tick_ebb_start)
>> +{
>> +  /* Later references should not have lower ticks.  */
>> +  gcc_assert (label_tick >= rsp->last_set_table_tick);
>
> This should be obvious, but checking it won't hurt, okay.
>
>> +  /* Should pick up the lowest luid if the references
>> + are in the same block.  */
>> +  if (label_tick == rsp->last_set_table_tick
>> +  && rsp->last_set_table_luid > insn_luid)
>> +rsp->last_set_table_luid = insn_luid;
>
> Why?  Is it conservative for the check you will do later?  Please spell
> this 

[PATCH] Adjust testcase for O2 vectorization[Wuninitialized]

2021-10-12 Thread liuhongt via Gcc-patches
As discussed in the PR.
It looks like it's just the location of the warning that's off;
the warning itself is still issued but it's swallowed by the
dg-prune-output directive.

Since the test was added to verify the fix for an ICE without
vectorization, I think disabling vectorization should be fine.
Ideally, we would understand why the location is wrong, so let's keep
this bug open and add a comment to the test referencing this bug.

Pushed to trunk.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wuninitialized-13.C: Add -fno-tree-vectorize.
---
 gcc/testsuite/g++.dg/warn/Wuninitialized-13.C | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C
index 210e74c3c3b..e92978f809a 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C
@@ -1,11 +1,14 @@
 /* PR c/98597 - ICE in -Wuninitialized printing a MEM_REF
{ dg-do compile }
-   { dg-options "-O2 -Wall" } */
+   { dg-options "-O2 -Wall -fno-tree-vectorize" } */
 
+/* After vectorization, it's just the location of the warning that's off;
+   the warning itself is still issued but it's swallowed by
+   the dg-prune-output directive. Refer to pr102700.  */
 struct shared_count {
   shared_count () { }
  shared_count (shared_count &r)
-: pi (r.pi) { } // { dg-warning "\\\[-Wuninitialized" "" { xfail { 
i?86-*-* x86_64-*-* } } }
+: pi (r.pi) { } // { dg-warning "\\\[-Wuninitialized" }
   int pi;
 };
 
-- 
2.18.1



Re: [PATCH] libgccjit: add some reflection functions in the jit C api

2021-10-12 Thread Antoni Boucher via Gcc-patches
David: PING

On Monday, September 27, 2021 at 20:53 -0400, Antoni Boucher wrote:
> I fixed an issue (it would show an error message when
> gcc_jit_type_dyncast_function_ptr_type was called on a type different
> than a function pointer type).
> 
> Here's the updated patch.
> 
> On Friday, June 18, 2021 at 16:37 -0400, David Malcolm wrote:
> > On Fri, 2021-06-18 at 15:41 -0400, Antoni Boucher wrote:
> > > I have write access now.
> > 
> > Great.
> > 
> > > I'm not sure how I'm supposed to send my patches:
> > > should I put it in personal branches and you'll merge them?
> > 
> > Please send them to this mailing list for review; once they're
> > approved
> > you can merge them.
> > 
> > > 
> > > And for the MAINTAINERS file, should I just push to master right
> > > away,
> > > after sending it to the mailing list?
> > 
> > I think people just push the MAINTAINERS change and then let the
> > list
> > know, since it makes a good test that write access is working
> > correctly.
> > 
> > Dave
> > 
> > > 
> > > Thanks for your help!
> > > 
> > > On Friday, June 18, 2021 at 12:09 -0400, David Malcolm wrote:
> > > > On Fri, 2021-06-18 at 11:55 -0400, Antoni Boucher wrote:
> > > > > On Friday, June 11, 2021 at 14:00 -0400, David Malcolm
> > > > > wrote:
> > > > > > On Fri, 2021-06-11 at 08:15 -0400, Antoni Boucher wrote:
> > > > > > > Thank you for your answer.
> > > > > > > I attached the updated patch.
> > > > > > 
> > > > > > BTW you (or possibly me) dropped the mailing lists; was
> > > > > > that
> > > > > > deliberate?
> > > > > 
> > > > > Oh, my bad.
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > I have signed the FSF copyright attribution.
> > > > > > 
> > > > > > I can push changes on your behalf, but I'd prefer it if you
> > > > > > did
> > > > > > it,
> > > > > > especially given that you have various other patches you
> > > > > > want
> > > > > > to
> > > > > > get
> > > > > > in.
> > > > > > 
> > > > > > Instructions on how to get push rights to the git repo are
> > > > > > here:
> > > > > >   https://gcc.gnu.org/gitwrite.html
> > > > > > 
> > > > > > I can sponsor you.
> > > > > 
> > > > > Thanks.
> > > > > I did sign up to get push rights.
> > > > > Have you accepted my request to get those?
> > > > 
> > > > I did, but I didn't see any kind of notification.  Did you get
> > > > an
> > > > email
> > > > about it?
> > > > 
> > > > 
> > > > Dave
> > > > 
> > > 
> > > 
> > 
> > 
> 




Re: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-12 Thread Xionghu Luo via Gcc-patches
Thanks David,

On 2021/10/13 06:51, David Edelsohn wrote:
> Hi, Xionghu
> 
> What's the status of the \M and \m testcase beautification requested
> by Segher?  Did you send an updated patch? Your messages ping the
> version prior to Segher's additional comments.

The pinged link already answered Segher's questions and included a patch
pasted in it.  To follow Segher's preference ;), I just posted a v2 patch
here:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581497.html

\M and \m are actually not quite necessary to the testcase
gcc.target/powerpc/builtins-1.c since it is built with
"-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp", so the testcase also counts
the generated instruction patterns.

> 
> It seems that the changes to the patterns are complete, but there are
> remaining questions about the testcase style and if the instruction
> counts are ideal. I trust that the instruction counts match the
> behavior after the patch, but it seemed that Segher wanted to confirm
> that the counts are the values desired / expected from optimal code
> generation.  The counts are the total for the file, which doesn't
> communicate if the sequences themselves are optimal.

Will rebase and retest after Segher's review of the v2 patch.

-- 
Thanks,
Xionghu


[PATCH v2] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-12 Thread Xionghu Luo via Gcc-patches
Resend this patch.  Previous discussion is:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html

vmrghb only accepts the permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} in the ISA, no matter whether BE or LE; similarly for
vmrglb.  Remove the UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT patterns and
express them as vec_select + vec_concat in normal RTL.

Tested pass on P8LE, P9LE and P8BE{m32}, ok for trunk?

gcc/ChangeLog:

* config/rs6000/altivec.md (*altivec_vmrghb_internal): Delete.
(altivec_vmrghb_direct): New.
(*altivec_vmrghh_internal): Delete.
(altivec_vmrghh_direct): New.
(*altivec_vmrghw_internal): Delete.
(altivec_vmrghw_direct_<mode>): New.
(altivec_vmrghw_direct): Delete.
(*altivec_vmrglb_internal): Delete.
(altivec_vmrglb_direct): New.
(*altivec_vmrglh_internal): Delete.
(altivec_vmrglh_direct): New.
(*altivec_vmrglw_internal): Delete.
(altivec_vmrglw_direct_<mode>): New.
(altivec_vmrglw_direct): Delete.
* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
* config/rs6000/rs6000.c (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Adjust.
(vsx_xxmrglw_<mode>): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/builtins-1.c: Update instruction counts.
---
 gcc/config/rs6000/altivec.md  | 203 +-
 gcc/config/rs6000/rs6000-p8swap.c |   2 -
 gcc/config/rs6000/rs6000.c|  75 +++
 gcc/config/rs6000/vsx.md  |  26 ++-
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |   8 +-
 5 files changed, 116 insertions(+), 198 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 208d6343225..097a127be07 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -143,8 +143,6 @@ (define_c_enum "unspec"
UNSPEC_VUPKHU_V4SF
UNSPEC_VUPKLU_V4SF
UNSPEC_VGBBD
-   UNSPEC_VMRGH_DIRECT
-   UNSPEC_VMRGL_DIRECT
UNSPEC_VSPLT_DIRECT
UNSPEC_VMRGEW_DIRECT
UNSPEC_VMRGOW_DIRECT
@@ -1291,19 +1289,17 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT 
(17),
-  GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
-  GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
-  GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
-  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
-  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
-  emit_insn (gen_rtx_SET (operands[0], x));
+  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
+   : gen_altivec_vmrglb_direct;
+  if (!BYTES_BIG_ENDIAN)
+std::swap (operands[1], operands[2]);
+  emit_insn (fun (operands[0], operands[1], operands[2]));
   DONE;
 })
 
-(define_insn "*altivec_vmrghb_internal"
+(define_insn "altivec_vmrghb_direct"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
-(vec_select:V16QI
+   (vec_select:V16QI
  (vec_concat:V32QI
(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v"))
@@ -1316,20 +1312,6 @@ (define_insn "*altivec_vmrghb_internal"
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
   "TARGET_ALTIVEC"
-{
-  if (BYTES_BIG_ENDIAN)
-return "vmrghb %0,%1,%2";
-  else
-return "vmrglb %0,%2,%1";
-}
-  [(set_attr "type" "vecperm")])
-
-(define_insn "altivec_vmrghb_direct"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
-  (match_operand:V16QI 2 "register_operand" "v")]
- UNSPEC_VMRGH_DIRECT))]
-  "TARGET_ALTIVEC"
   "vmrghb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
@@ -1339,16 +1321,15 @@ (define_expand "altivec_vmrghh"
(use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
-  GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
-  rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
-
-  x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
-  emit_insn (gen_rtx_SET (operands[0], x));
+  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
+   : gen_altivec_vmrglh_direct;
+  if (!BYTES_BIG_ENDIAN)
+std::swap (operands[1], operands[2]);
+  emit_insn (fun (operands[0], operands[1], operands[2]));
   DONE;
 })
 
-(define_insn "*altivec_vmrghh_internal"
+(define_insn "altivec_vmrghh_direct"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
  

Re: [PATCH] check to see if null pointer is dereferenceable [PR102630]

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/11/21 6:26 PM, Joseph Myers wrote:

The testcase uses the __seg_fs address space, which is x86-specific, but
it isn't in an x86-specific directory or otherwise restricted to x86
targets; thus, I'd expect it to fail for other architectures.

This is not a review of the rest of the patch.



Good point!  I thought I might make the test target-independent
(via macros) but it looks like just i386 defines the hook to
something other than false so I should probably move it under
i386.

Thanks
Martin


Re: [PATCH] Warray-bounds: Warn only for generic address spaces

2021-10-12 Thread Siddhesh Poyarekar

On 10/13/21 00:36, Martin Sebor wrote:

On 10/12/21 12:33 PM, Siddhesh Poyarekar wrote:

The warning is falsely triggered for THREAD_SELF in glibc when
accessing TCB through the segment register.


Thanks for looking into it!  The Glibc warning is being tracked
in PR 102630.  The root cause behind it is in compute_objsize_r
in pointer-query.cc (which is used by -Warray-bounds as well as
other warnings).  I just posted a patch for it the other day;
it's waiting for approval (though as Joseph noted, I need to
adjust the test and either make it target-independent or move
it under i386).


Ahh, targetm.addr_space.zero_address_valid was what I was looking for 
and didn't find.  Your fix looks good to me modulo moving the test out 
into gcc.target/i386.


Thanks,
Siddhesh


[committed] c-family: Support format checking C2X %b, %B formats

2021-10-12 Thread Joseph Myers
C2X adds a %b printf format to print integers in binary (analogous to
%x, including %#b printing a leading 0b on nonzero integers), with
recommended practice for a corresponding %B (where %#B uses 0B instead
of 0b) where that doesn't conflict with existing implementation
extensions.  See N2630 for details (accepted for C2X, not yet in the
latest working draft).  There is also a scanf %b format.
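Until library support for %b lands, the described semantics can be sketched by hand. The helper below is purely illustrative (not the glibc or c-format.c code); it mimics the recommended-practice rule that the # flag adds a 0b prefix only on nonzero values, analogous to %#x.

```cpp
#include <string>

// Illustrative counterpart of C2X printf "%b"/"%#b": format an unsigned
// value in binary; "alt" models the # flag, which prefixes "0b" on
// nonzero values only (zero gets no prefix, matching the %#x rule).
static std::string format_binary (unsigned value, bool alt)
{
  std::string digits;
  do
    {
      digits.insert (digits.begin (), char ('0' + (value & 1)));
      value >>= 1;
    }
  while (value != 0);
  if (alt && digits != "0")
    return "0b" + digits;
  return digits;
}
```

So format_binary (5, true) yields "0b101", while format_binary (0, true) stays "0".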

Add corresponding format checking support (%b accepted by -std=c2x
-Wformat -pedantic, %B considered an extension to be diagnosed with
-Wformat -pedantic).  glibc support for the printf formats has been
proposed at

(scanf support to be done in a separate patch).

Note that this does not add any support for these formats to the code
for bounding the amount of output produced by a printf function,
although that would also be useful.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c-family/
* c-format.c (print_char_table): Add %b and %B formats.
(scan_char_table): Add %b format.
* c-format.h (T2X_UI, T2X_UL, T2X_ULL, T2X_US, T2X_UC, T2X_ST)
(T2X_UPD, T2X_UIM): New macros.

gcc/testsuite/
* gcc.dg/format/c11-printf-1.c, gcc.dg/format/c11-scanf-1.c,
gcc.dg/format/c2x-printf-1.c, gcc.dg/format/c2x-scanf-1.c,
gcc.dg/format/ext-9.c, gcc.dg/format/ext-10.c: New tests.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index ca66c81f716..c27faf71676 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -712,11 +712,14 @@ static const format_char_info print_char_table[] =
   /* C99 conversion specifiers.  */
   { "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "",   
NULL },
   { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64,  TEX_D128 }, "-wp0 +#",   "",   
NULL },
+  /* C2X conversion specifiers.  */
+  { "b",   0, STD_C2X, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN }, "-wp0#", "i",  NULL 
},
   /* X/Open conversion specifiers.  */
   { "C",   0, STD_EXT, { TEX_WI,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w","",   NULL 
},
   { "S",   1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",   "R",  NULL 
},
   /* GNU conversion specifiers.  */
   { "m",   0, STD_EXT, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",   "",   NULL 
},
+  { "B",   0, STD_EXT, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN }, "-wp0#", "i",  NULL 
},
   { NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
 };
 
@@ -876,6 +879,8 @@ static const format_char_info scan_char_table[] =
   /* C99 conversion specifiers.  */
   { "F",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "*w'",  "W",   NULL },
   { "aA",   1, STD_C99, { T99_F,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD, 
 BADLEN,  BADLEN,  BADLEN,  TEX_D32,  TEX_D64,  TEX_D128 }, "*w'",  "W",   NULL 
},
+  /* C2X conversion specifiers.  */
+  { "b", 1, STD_C2X, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, 
TEX_ULL, T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN }, "*w",   "W",   
NULL },
   /* X/Open conversion specifiers.  */
   { "C", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*mw",   "W",   
NULL },
   { "S", 1, STD_EXT, { TEX_W,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "*amw",  "W",   
NULL },
diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 2f926f4c8c1..2b5012ee3a9 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -278,13 +278,17 @@ struct format_kind_info
 #define T89_S  { STD_C89, NULL, T_S }
 #define T_UI   &unsigned_type_node
 #define T89_UI { STD_C89, NULL, T_UI }
+#define T2X_UI { STD_C2X, NULL, T_UI }
 #define T_UL   &long_unsigned_type_node
 #define T89_UL { STD_C89, NULL, T_UL }
+#define T2X_UL { STD_C2X, NULL, T_UL }
 #define T_ULL  &long_long_unsigned_type_node
 #define T9L_ULL { STD_C9L, NULL, T_ULL }
+#define T2X_ULL { STD_C2X, NULL, T_ULL }
 #define TEX_ULL { STD_EXT, NULL, T_ULL }
 #define T_US   &short_unsigned_type_node
 #define T89_US { STD_C89, NULL, T_US }
+#define T2X_US { STD_C2X, NULL, T_US }
 #define T_F    &float_type_node
 #define T89_F  { STD_C89, NULL, T_F }
 #define T99_F  { STD_C99, NULL, T_F }
@@ -300,6 +304,7 @@ struct format_kind_info
 #define T99_SC { STD_C99, NULL, T_SC }
 

[RFC][patch][PR102281]Clear padding for variables that are in registers

2021-10-12 Thread Qing Zhao via Gcc-patches
Hi,

PR102281 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281)
exposed an issue in the current implementation of the padding initialization of 
-ftrivial-auto-var-init.

Currently, we add __builtin_clear_padding call _AFTER_ every explicit 
initialization of an auto variable:

var_decl = {init_constructor};
__builtin_clear_padding (&var_decl, 0B, 1);

The reason I added the call for EVERY auto variable that has explicit 
initialization is that the folding of __builtin_clear_padding automatically 
turns the call into a NOP when there is no padding in the variable, so we don't 
need to check explicitly whether the variable has padding. 

However, always adding the call to __builtin_clear_padding (&var_decl,…) might 
introduce invalid IR when VAR_DECL is not addressable. 

In order to resolve this issue, I propose the following solution:

Instead of adding the call to __builtin_clear_padding _AFTER_ the explicit 
initialization, use zero to initialize the whole variable BEFORE the explicit 
field initialization when VAR_DECL has padding, i.e.:

if (had_padding_p (var_decl))
var_decl = ZERO;
var_decl = {init_constructor};

This should resolve the invalid IR issue.  However, there might be more run 
time overhead from such padding initialization since the whole variable is set 
to zero instead of only the paddings. 

Please let me know your comments on this.

Thanks.

Qing


The complete patch is :

From cb2ef83e8f53c13694c70ac4bc1df6e09b15f1c7 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Tue, 12 Oct 2021 22:33:06 +
Subject: [PATCH] Fix pr102281

---
 gcc/gimple-fold.c | 25 ++
 gcc/gimple-fold.h |  1 +
 gcc/gimplify.c| 49 +--
 gcc/testsuite/c-c++-common/pr102281.c | 15 
 4 files changed, 72 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr102281.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 7fcfef41f72..de4feb27dbc 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -4651,6 +4651,31 @@ clear_padding_type_may_have_padding_p (tree type)
 }
 }
 
+/* Return true if TYPE contains any padding bits.  */
+
+bool
+clear_padding_type_has_padding_p (tree type)
+{
+  bool has_padding = false;
+  if (BITS_PER_UNIT == 8
+  && CHAR_BIT == 8
+  && clear_padding_type_may_have_padding_p (type))
+{
+  HOST_WIDE_INT sz = int_size_in_bytes (type), i;
+  gcc_assert (sz > 0);
+  unsigned char *buf = XALLOCAVEC (unsigned char, sz);
+  memset (buf, ~0, sz);
+  clear_type_padding_in_mask (type, buf);
+  for (i = 0; i < sz; i++)
+  if (buf[i] != (unsigned char) ~0)
+{
+  has_padding = true;
+  break;
+}
+}
+  return has_padding;
+}
+
 /* Emit a runtime loop:
for (; buf.base != end; buf.base += sz)
  __builtin_clear_padding (buf.base);  */
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 397f4aeb7cf..eb750a68eca 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -37,6 +37,7 @@ extern tree maybe_fold_and_comparisons (tree, enum tree_code, 
tree, tree,
 extern tree maybe_fold_or_comparisons (tree, enum tree_code, tree, tree,
   enum tree_code, tree, tree);
 extern bool clear_padding_type_may_have_padding_p (tree);
+extern bool clear_padding_type_has_padding_p (tree);
 extern void clear_type_padding_in_mask (tree, unsigned char *);
 extern bool optimize_atomic_compare_exchange_p (gimple *);
 extern void fold_builtin_atomic_compare_exchange (gimple_stmt_iterator *);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d8e4b139349..4cc3ca3ae4e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1955,7 +1955,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 In order to make the paddings as zeroes for pattern init, We
 should add a call to __builtin_clear_padding to clear the
 paddings to zero in compatiple with CLANG.  */
- if (flag_auto_var_init == AUTO_INIT_PATTERN)
+ if (flag_auto_var_init == AUTO_INIT_PATTERN
+ && clear_padding_type_has_padding_p (TREE_TYPE (decl)))
gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
}
 }
@@ -4994,9 +4995,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
   tree object, ctor, type;
   enum gimplify_status ret;
   vec *elts;
-  bool cleared = false;
-  bool is_empty_ctor = false;
-  bool is_init_expr = (TREE_CODE (*expr_p) == INIT_EXPR);
+  bool need_clear_padding = false;
 
   gcc_assert (TREE_CODE (TREE_OPERAND (*expr_p, 1)) == CONSTRUCTOR);
 
@@ -5015,6 +5014,13 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
   elts = CONSTRUCTOR_ELTS (ctor);
   ret = GS_ALL_DONE;
 
+  /* If the user requests to initialize automatic variables, we
+ should initialize paddings inside the variable.  */
+  if 

Re: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-12 Thread David Edelsohn via Gcc-patches
Hi, Xionghu

What's the status of the \M and \m testcase beautification requested
by Segher?  Did you send an updated patch? Your messages ping the
version prior to Segher's additional comments.

It seems that the changes to the patterns are complete, but there are
remaining questions about the testcase style and if the instruction
counts are ideal. I trust that the instruction counts match the
behavior after the patch, but it seemed that Segher wanted to confirm
that the counts are the values desired / expected from optimal code
generation.  The counts are the total for the file, which doesn't
communicate if the sequences themselves are optimal.

Thanks, David

On Sun, Sep 5, 2021 at 8:54 PM Xionghu Luo  wrote:
>
> Ping^2, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
>
> On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote:
> > Gentle ping, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
> >
> >
> > On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
> >> Hi,
> >>
> >> On 2021/6/9 07:25, Segher Boessenkool wrote:
> >>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
>  vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
>  5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for
>  vmrghlb.
> >>>
> >>> (vmrglb)
> >>>
>  +  if (BYTES_BIG_ENDIAN)
>  +emit_insn (
>  +  gen_altivec_vmrghb_direct (operands[0], operands[1],
>  operands[2]));
>  +  else
>  +emit_insn (
>  +  gen_altivec_vmrglb_direct (operands[0], operands[2],
>  operands[1]));
> >>>
> >>> Please don't indent like that, it doesn't match what we do elsewhere.
> >>> For better or for worse (for worse imo), we use deep hanging indents.
> >>> If you have to, you can do something like
> >>>
> >>>rtx insn;
> >>>if (BYTES_BIG_ENDIAN)
> >>>  insn = gen_altivec_vmrghb_direct (operands[0], operands[1],
> >>> operands[2]);
> >>>else
> >>>  insn = gen_altivec_vmrglb_direct (operands[0], operands[2],
> >>> operands[1]);
> >>>emit_insn (insn);
> >>>
> >>> (this is better even, in that it has only one emit_insn), or even
> >>>
> >>>rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> >>>: gen_altivec_vmrglb_direct;
> >>>if (!BYTES_BIG_ENDIAN)
> >>>  std::swap (operands[1], operands[2]);
> >>>emit_insn (fun (operands[0], operands[1], operands[2]));
> >>>
> >>> Well, C++ does not allow that last example like that, sigh, so
> >>>rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ?
> >>> gen_altivec_vmrghb_direct
> >>> : gen_altivec_vmrglb_direct;
> >>>
> >>> This is shorter than the other two options ;-)
> >>
> >> Changed.
> >>
> >>>
>  +(define_insn "altivec_vmrghb_direct"
>  [(set (match_operand:V16QI 0 "register_operand" "=v")
>  +(vec_select:V16QI
> >>>
> >>> This should be indented one space more.
> >>>
>  "TARGET_ALTIVEC"
>  "@
>  -   xxmrghw %x0,%x1,%x2
>  -   vmrghw %0,%1,%2"
>  +  xxmrghw %x0,%x1,%x2
>  +  vmrghw %0,%1,%2"
> >>>
> >>> The original indent was correct, please restore.
> >>>
>  -  emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
>  +  emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve,
>  vo));
> >>>
> >>> When you see a mode as part of a pattern name, chances are that it will
> >>> be a good candidate for using parameterized names with.  (But don't do
> >>> that now, just keep it in mind as a nice cleanup to do).
> >>
> >> OK.
> >>
> >>>
>  @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target,
>  rtx op0, rtx op1,
>   : CODE_FOR_altivec_vmrglh_direct),
>  {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7,
>  22, 23 } },
>    { OPTION_MASK_ALTIVEC,
>  -  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
>  -   : CODE_FOR_altivec_vmrglw_direct),
>  +  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>  +   : CODE_FOR_altivec_vmrglw_direct_v4si),
> >>>
> >>> The correct way is to align the ? and the : (or put everything on one
> >>> line of course, if that fits)
> >>>
> >>> The parens around this are not needed btw, and are a distraction.
> >>
> >> Changed.
> >>
> >>>
>  --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>  +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>  @@ -317,10 +317,10 @@ int main ()
>    /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
>    /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
>  -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
>  +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
>    /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
>  -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
>  -/* { dg-final { scan-assembler-times "xxmrglw" 

Re: [PATCH] rs6000: Fix vec_cpsgn parameter order (PR101985)

2021-10-12 Thread Bill Schmidt via Gcc-patches
Hi!

On 10/12/21 4:37 PM, Segher Boessenkool wrote:
> On Fri, Sep 24, 2021 at 10:20:46AM -0500, Bill Schmidt wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr101985-1.c
>> @@ -0,0 +1,18 @@
>> +/* PR target/101985 */
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vsx_hw } */
>> +/* { dg-options "-O2" } */
> If you need vsx_hw (or vsx_ok), you need -mvsx in the options as well.
> (Always, so in both testcases here).

Whoops.  Fixed, and adjusted ChangeLog/commit message per David.  Committed.

Thanks for the reviews!!
Bill
>
>
> Segher



Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-12 Thread Segher Boessenkool
On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> You asked for it. ;-)  Boiled down to remove macroisms and code that
> should be removed by optimization:

Thanks :-)

> static __inline __attribute__ ((__always_inline__)) void
> libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> {
>   fenv_union_t old;
>   register fenv_union_t __fr;
>   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
>   ctx->env = old.fenv = __fr.fenv; 
>   ctx->updated_status = (r != (old.l & 3));
> }

(Should use "n", not "i", only numbers are allowed, not e.g. the address
of something.  This actually can matter, in unusual cases.)

This orders the updating of RN before the store to __fr.fenv .  There is
no other ordering ensured here.

The store to __fr.env obviously has to stay in order with anything that
can alias it, if that store isn't optimised away completely later.

> static __inline __attribute__ ((__always_inline__)) void
> libc_feresetround_ppc (fenv_t *envp)
> { 
>   fenv_union_t new = { .fenv = *envp };
>   register fenv_union_t __fr;
>   __fr.l = new.l & 3;
>   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
> }

This both reads from and stores to __fr.fenv, the asm has to stay
between those two accesses (in the machine code).  If the code that
actually depends on the modified RN depends onb that __fr.fenv some way,
all will be fine.

> double
> __sin (double x)
> {
>   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
>   libc_feholdsetround_ppc_ctx (&ctx, (0));
>   /* floating point intensive code.  */
>   return retval;
> }

... but there is no such dependency.  The cleanup attribute does not
give any such ordering either afaik.

> There's not much to it, really.  "mffscrni" on the way in to save and set
> a required rounding mode, and "mffscrn" on the way out to restore it.

Yes.  But the code making use of the modified RN needs to have some
artificial dependencies with the RN setters, perhaps via __fr.fenv .
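A portable (if slower) analogue of the mffscrni/mffscrn save-set-restore shape is the standard <cfenv> interface. This is not the glibc code under discussion, just the same pattern: the library calls are opaque to the compiler, which in practice keeps FP work from being moved across them, though GCC does not yet honor #pragma STDC FENV_ACCESS, so this remains best-effort.

```cpp
#include <cfenv>
#include <cmath>

// Save-set-restore of the rounding mode via <cfenv>, mirroring the
// libc_feholdsetround / libc_feresetround pairing in spirit only.
static double round_up_then_restore (double x)
{
  int saved = std::fegetround ();    // save the old rounding mode
  std::fesetround (FE_UPWARD);       // install the required mode
  volatile double v = x;             // block compile-time folding
  double r = std::nearbyint (v);     // the "FP-intensive code"
  std::fesetround (saved);           // restore on the way out
  return r;
}
```

With FE_UPWARD installed, nearbyint (1.25) yields 2.0, and the caller's mode is back in place on return.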

> > Calling a real function (that does not even need a stack frame, just a
> > blr) is not terribly expensive, either.
> 
> Not ideal, better would be better.

Yes.  But at least it *works* :-)  I'll take a stupid, stupidly simple,
*robust* solution over some nicely faster way of doing the wrong thing.

> > > > > Would creating a __builtin_mffsce be another solution?
> > > > 
> > > > Yes.  And not a bad idea in the first place.
> > > 
> > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > difference between "asm" and builtin, how does using a builtin solve the
> > > problem?
> > 
> > You will have to make the builtin solve it.  What a builtin can do is
> > virtually unlimited.  What an asm can do is not: it just outputs some
> > assembler language, and does in/out/clobber constraints.  You can do a
> > *lot* with that, but it is much more limited than everything you can do
> > in the compiler!  :-)
> > 
> > The fact remains that there is no way in RTL (or Gimple for that matter)
> > to express things like rounding mode changes.  You will need to
> > artificially make some barriers.
> 
> I know there is __builtin_set_fpscr_rn that generates mffscrn.

Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
provide the ordering either.  It *cannot*: you need to cooperate with
whatever you are ordering against.  There is no way in GCC to say "this
is an FP insn and has to stay in order with all FP control writes and FP
status reads".

Maybe now you see why I like external functions for this :-)

> This
> is not used in the code above because I believe it first appears in
> GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> a return value, which would be handy in this case).  Does the
> implementation of that builtin meet the requirements needed here,
> to prevent reordering of FP computation across instantiations of the
> builtin?  If not, is there a model on which to base an implementation
> of __builtin_mffsce (or some preferred name)?

It depends on what you are actually ordering, unfortunately.


Segher


Re: [PATCH] rs6000: Fix vec_cpsgn parameter order (PR101985)

2021-10-12 Thread Segher Boessenkool
On Fri, Sep 24, 2021 at 10:20:46AM -0500, Bill Schmidt wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr101985-1.c
> @@ -0,0 +1,18 @@
> +/* PR target/101985 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2" } */

If you need vsx_hw (or vsx_ok), you need -mvsx in the options as well.
(Always, so in both testcases here).


Segher


Re: [PATCH][WIP] Add install-dvi Makefile targets

2021-10-12 Thread Eric Gallager via Gcc-patches
On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager  wrote:
>
> Currently the build machinery handles install-pdf and install-html
> targets, but no install-dvi target. This patch is a step towards
> fixing that. Note that I have only tested with
> --enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will
> probably also have to be added for the languages I skipped.
> Also, please note that this patch applies on top of:
> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html
>
> ChangeLog:
>
> 2016-10-06  Eric Gallager  
>
> * Makefile.def: Handle install-dvi target.
> * Makefile.tpl: Likewise.
> * Makefile.in: Regenerate.
>
> gcc/ChangeLog:
>
> 2016-10-06  Eric Gallager  
>
> * Makefile.in: Handle dvidir and install-dvi target.
> * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi
> target hooks.
> * configure.ac: Handle install-dvi target.
> * configure: Regenerate.
>
> libiberty/ChangeLog:
>
> 2016-10-06  Eric Gallager  
>
> * Makefile.in: Handle dvidir and install-dvi target.
> * functions.texi: Regenerate.

Ping. The prerequisite patch that I linked to previously has gone in now.
I'm not sure if this specific patch still applies, though.
Also note that I've opened a bug to track this issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663


Re: [PATCH] Allow `make tags` to work from top-level directory

2021-10-12 Thread Eric Gallager via Gcc-patches
On Tue, Oct 12, 2021 at 3:56 PM Eric Gallager  wrote:
>
> On Tue, Oct 12, 2021 at 9:18 AM Jeff Law  wrote:
> >
> >
> >
> > On 10/11/2021 4:05 PM, Eric Gallager via Gcc-patches wrote:
> > > On Thu, Oct 13, 2016 at 4:43 PM Eric Gallager  
> > > wrote:
> > >> On 10/13/16, Jeff Law  wrote:
> > >>> On 10/06/2016 07:21 AM, Eric Gallager wrote:
> >  The libdecnumber, libgcc, and libobjc subdirectories are missing TAGS
> >  targets in their Makefiles. The attached patch causes them to be
> >  skipped when running `make tags`.
> > 
> >  ChangeLog entry:
> > 
> >  2016-10-06  Eric Gallager  
> > 
> >    * Makefile.def: Mark libdecnumber, libgcc, and libobjc as missing
> >    TAGS target.
> >    * Makefile.in: Regenerate.
> > 
> > >>> OK.  Please install.
> > >>>
> > >>> Thanks,
> > >>> Jeff
> > >>>
> > >>
> > >> I'm still waiting to hear back from  about my request
> > >> for copyright assignment, which I'll need to get sorted out before I
> > >> can start committing stuff (like this patch).
> > >>
> > >> Thanks,
> > >> Eric
> > > Update: In the intervening years, I got my copyright assignment filed
> > > and have recently become able to commit again; is your old approval
> > > from 2016 still valid, Jeff, or do I need a re-approval?
> > > Ref: https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg00370.html
> > It's still valid.  Just re-test and commit.
> >
> > jeff
>
> While re-testing, it seems that the `etags` command on my computer
> can't be found any longer; I'm thinking gcc/Makefile.in should be
> updated to stop hardcoding etags and use a variable that can be
> overridden instead... should I do a separate patch for that, or
> combine it with this one?

Well, anyways, this is what I've ended up committing for now:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=4ca446a46bef8c127d7aaeb2d4bb4625edc7f84e




Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-12 Thread Luís Ferreira via Gcc-patches
On Tue, 2021-10-12 at 15:40 -0400, Eric Gallager wrote:
> On Tue, Oct 12, 2021 at 8:55 AM Luís Ferreira
>  wrote:
> > 
> > On Fri, 2021-10-08 at 22:11 +0200, Iain Buclaw wrote:
> > > Excerpts from Luís Ferreira's message of October 8, 2021 7:08 pm:
> > > > On Fri, 2021-10-08 at 18:52 +0200, Iain Buclaw wrote:
> > > > > Excerpts from Luís Ferreira's message of October 7, 2021 8:29
> > > > > pm:
> > > > > > On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
> > > > > > > 
> > > > > > > I can help with the autotools part if you can say how
> > > > > > > precisely
> > > > > > > you'd
> > > > > > > like to use them to add address sanitization. And as for
> > > > > > > the
> > > > > > > OSS
> > > > > > > fuzz part, I think someone tried setting up auto-fuzzing
> > > > > > > for it
> > > > > > > once,
> > > > > > > but the main bottleneck was getting the bug reports that
> > > > > > > it
> > > > > > > generated
> > > > > > > properly triaged, so if you could make sure the bug-
> > > > > > > submitting
> > > > > > > portion
> > > > > > > of the process is properly streamlined, that'd probably
> > > > > > > go a
> > > > > > > long
> > > > > > > way
> > > > > > > towards helping it be useful.
> > > > > > 
> > > > > > Bugs are normally reported by email or mailing list. Is
> > > > > > there any
> > > > > > writable mailing list to publish bugs or is it strictly
> > > > > > needed to
> > > > > > open
> > > > > > an entry on bugzilla?
> > > > > > 
> > > > > 
> > > > > Please open an issue on bugzilla, fixes towards it can then
> > > > > be
> > > > > referenced in the commit message/patch posted here.
> > > > > 
> > > > > Iain.
> > > > 
> > > > You mean for this current issue? The discussion was about
> > > > future bug
> > > > reports reported by the OSS fuzzer workers. I can also open an
> > > > issue
> > > > on
> > > > the bugzilla for this issue, please clarify it and let me know
> > > > :)
> > > > 
> > > 
> > > 1. Open one for this issue.
> > > 
> > > 2. Bugs found by the fuzzer would report to bugzilla.
> > > https://gcc.gnu.org/bugs/
> > > 
> > > Iain.
> > 
> > Cross referencing the created issue:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102702
> > 
> > --
> > Sincerely,
> > Luís Ferreira @ lsferreira.net
> > 
> 
> Right, I found the previous time someone tried to set up an
> autofuzzer
> to report bugs to GCC's Bugzilla; searching for bugs reported by
> security-...@google.com on Bugzilla should find them:
> https://gcc.gnu.org/bugzilla/buglist.cgi?email1=security-tps%40google.com_to1=1=1=1=1=substring_id=326459_format=advanced

Good! Do you know how and where this is being handled? I didn't find
anything related to GCC/libiberty on OSS fuzz repository. Existing
resources on that can be useful to increment on top instead of
designing something from scratch. I also took a look at the fuzzer
included in GCC, but it doesn't include any heuristic.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





Re: [PATCH] Allow `make tags` to work from top-level directory

2021-10-12 Thread Eric Gallager via Gcc-patches
On Tue, Oct 12, 2021 at 9:18 AM Jeff Law  wrote:
>
>
>
> On 10/11/2021 4:05 PM, Eric Gallager via Gcc-patches wrote:
> > On Thu, Oct 13, 2016 at 4:43 PM Eric Gallager  wrote:
> >> On 10/13/16, Jeff Law  wrote:
> >>> On 10/06/2016 07:21 AM, Eric Gallager wrote:
>  The libdecnumber, libgcc, and libobjc subdirectories are missing TAGS
>  targets in their Makefiles. The attached patch causes them to be
>  skipped when running `make tags`.
> 
>  ChangeLog entry:
> 
>  2016-10-06  Eric Gallager  
> 
>    * Makefile.def: Mark libdecnumber, libgcc, and libobjc as missing
>    TAGS target.
>    * Makefile.in: Regenerate.
> 
> >>> OK.  Please install.
> >>>
> >>> Thanks,
> >>> Jeff
> >>>
> >>
> >> I'm still waiting to hear back from  about my request
> >> for copyright assignment, which I'll need to get sorted out before I
> >> can start committing stuff (like this patch).
> >>
> >> Thanks,
> >> Eric
> > Update: In the intervening years, I got my copyright assignment filed
> > and have recently become able to commit again; is your old approval
> > from 2016 still valid, Jeff, or do I need a re-approval?
> > Ref: https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg00370.html
> It's still valid.  Just re-test and commit.
>
> jeff

While re-testing, it seems that the `etags` command on my computer
can't be found any longer; I'm thinking gcc/Makefile.in should be
updated to stop hardcoding etags and use a variable that can be
overridden instead... should I do a separate patch for that, or
combine it with this one?


[PATCH v2] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/12/21 12:52 AM, Richard Biener wrote:

On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor  wrote:


The attached change extends GCC's warnings for out-of-bounds
stores to cover atomic (and __sync) built-ins.

Rather than hardcoding the properties of these built-ins just
for the sake of the out-of-bounds detection, on the assumption
that it might be useful for future optimizations as well, I took
the approach of extending class attr_fnspec to express their
special property that they encode the size of the access in their
name.

I also took the liberty of making attr_fnspec assignable (something
the rest of my patch relies on), and updating some comments for
the characters the class uses to encode function properties, based
on my understanding of their purpose.

Tested on x86_64-linux.


Hmm, so you place 'A' at an odd place (where the return value is specified),
but you do not actually specify the behavior on the return value.  Shoudln't

+ 'A'specifies that the function atomically accesses a constant
+   1 << N bytes where N is indicated by character 3+2i

maybe read

 'A' specifies that the function returns the memory pointed to by
   argument one of size 1 << N bytes where N is indicated by
   character 3+2i, accessed atomically

?


I didn't think the return value would be interesting because in
general (parallel accesses) it's not related (in an observable
way) to the value of the dereferenced operand.  Not all
the built-ins also return a value (e.g., atomic_store), and
whether or not one does return the argument would need to be
encoded somehow because it cannot be determined from the return
type (__atomic_compare_exchange and __atomic_test_and_set return
bool that's not necessarily the value of the operand).  Also,
since the functions return the operand value either before or
after the update, we'd need another letter to describe that.
(This alone could be dealt with simply by using 'A' and 'a',
but that's not enough for the other cases.)

So with all these possibilities I don't think encoding
the return value at this point is worthwhile.  If/when this
enhancement turns out to be used for optimization and we think
encoding the return value would be helpful, I'd say let's
revisit it then.  The accessor APIs should make it a fairly
straightforward exercise.
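The proposed encoding is cheap to decode. The parser below is an illustrative sketch of the convention described in the patch (leading 'A' marks an atomic access of 1 << N bytes, with N as a digit at position 3 + 2*i for argument i), not the actual attr_fnspec accessors.

```cpp
#include <cstring>

// Illustrative decoder for the proposed fnspec convention: a leading
// 'A' marks an atomic built-in whose access size is 1 << N bytes, N
// being the digit at position 3 + 2*i for argument i.  Returns 0 when
// the spec does not describe an atomic access size for that argument.
static unsigned atomic_access_size (const char *spec, unsigned arg)
{
  if (spec[0] != 'A')
    return 0;
  unsigned pos = 3 + 2 * arg;
  if (std::strlen (spec) <= pos || spec[pos] < '0' || spec[pos] > '9')
    return 0;
  return 1u << (spec[pos] - '0');
}
```

For a spec like "ApR2" (as built by the BUILTIN_ACCESS_SIZE_FNSPEC macro in the patch) this yields an access size of 4 bytes for the first argument.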


I also wonder if it's necessary to constrain this to 'atomic' accesses
for the purpose of the patch and whether that detail could be omitted to
eventually make more use of it?


I pondered the same question but I couldn't think of any other
built-ins with similar semantics (read-write-modify, return
a result either pre- or post-modification), so I opted for
simplicity.  I am open to generalizing it if/when there is
a function I could test it with, although I'm not sure
the current encoding scheme has enough letters and letter
positions to describe the effects in their full generality.



Likewise

+ '0'...'9'  specifies the size of value written/read is given either
+   by the specified argument, or for atomic functions, by
+   2 ^ N where N is the constant value denoted by the character

should mention (excluding '0') for the argument position.


Sure, I'll update the comment if you think this change is worth
pursuing.



/* length of the fn spec string.  */
-  const unsigned len;
+  unsigned len;

why that?


The const member is what prevents the struct from being assignable,
which is what the rest of the patch depends on.



+  /* Return true if the function is an __atomic or __sync built-in.  */

you didn't specify that for 'A' ...

+  bool
+  atomic_p () const
+  {
+return str[0] == 'A';
+  }

+attr_fnspec
+atomic_builtin_fnspec (tree callee)
+{
+  switch (DECL_FUNCTION_CODE (callee))
+{
+#define BUILTIN_ACCESS_SIZE_FNSPEC(N, lgsz)\
+  BUILT_IN_ATOMIC_LOAD_ ## N:  \
+   return "Ap" "R" lgsz;

note that doing this for atomics makes those no longer a compiler barrier
for (aliased) loads and stores which means they are no longer a reliable
way to implement locks.  That's a reason why I never pushed a
PTA/alias patch I have to open-code this.

Thus, do we really want to do this?


That's my question to you :) If you don't think this attr_fnspec
extension would be useful for optimization I'll drop this part
of the patch and open-code it separately only for the out-of-bounds
diagnostics.  I'd started out that way but the fnspec class made
the code cleaner and if it could be used elsewhere so much
the better.  Please let me know.
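To make the concern above concrete (a sketch using C11 atomics to model it, not the GCC-internal representation): locks rely on atomic operations acting as compiler barriers for aliased loads and stores, which a too-permissive fnspec would break.

```c
#include <stdatomic.h>
#include <assert.h>

/* Why atomics must remain compiler barriers: a minimal spinlock.
   If the compiler were told the atomic ops touch no other memory,
   the access to DATA could legally be moved outside the critical
   section, breaking the lock.  */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static int data;

void
locked_increment (void)
{
  while (atomic_flag_test_and_set_explicit (&lock, memory_order_acquire))
    ;                                  /* spin until we own the lock */
  data++;                              /* must stay between the atomics */
  atomic_flag_clear_explicit (&lock, memory_order_release);
}
```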

Martin


Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-12 Thread Eric Gallager via Gcc-patches
On Tue, Oct 12, 2021 at 8:55 AM Luís Ferreira  wrote:
>
> On Fri, 2021-10-08 at 22:11 +0200, Iain Buclaw wrote:
> > Excerpts from Luís Ferreira's message of October 8, 2021 7:08 pm:
> > > On Fri, 2021-10-08 at 18:52 +0200, Iain Buclaw wrote:
> > > > Excerpts from Luís Ferreira's message of October 7, 2021 8:29 pm:
> > > > > On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
> > > > > >
> > > > > > I can help with the autotools part if you can say how precisely
> > > > > > you'd
> > > > > > like to use them to add address sanitization. And as for the
> > > > > > OSS
> > > > > > fuzz part, I think someone tried setting up auto-fuzzing for it
> > > > > > once,
> > > > > > but the main bottleneck was getting the bug reports that it
> > > > > > generated
> > > > > > properly triaged, so if you could make sure the bug-submitting
> > > > > > portion
> > > > > > of the process is properly streamlined, that'd probably go a
> > > > > > long
> > > > > > way
> > > > > > towards helping it be useful.
> > > > >
> > > > > Bugs are normally reported by email or mailing list. Is there any
> > > > > writable mailing list to publish bugs or is it strictly needed to
> > > > > open
> > > > > an entry on bugzilla?
> > > > >
> > > >
> > > > Please open an issue on bugzilla, fixes towards it can then be
> > > > referenced in the commit message/patch posted here.
> > > >
> > > > Iain.
> > >
> > > You mean for this current issue? The discussion was about future bug
> > > reports reported by the OSS fuzzer workers. I can also open an issue
> > > on
> > > the bugzilla for this issue, please clarify it and let me know :)
> > >
> >
> > 1. Open one for this issue.
> >
> > 2. Bugs found by the fuzzer would report to bugzilla.
> > https://gcc.gnu.org/bugs/
> >
> > Iain.
>
> Cross referencing the created issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102702
>
> --
> Sincerely,
> Luís Ferreira @ lsferreira.net
>

Right, I found the previous time someone tried to set up an autofuzzer
to report bugs to GCC's Bugzilla; searching for bugs reported by
security-...@google.com on Bugzilla should find them:
https://gcc.gnu.org/bugzilla/buglist.cgi?email1=security-tps%40google.com_to1=1=1=1=1=substring_id=326459_format=advanced


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-12 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 05:04:12PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > > Very similar methods are used in glibc today. Are those broken?
> > > 
> > > Maybe.
> > 
> > Ouch.
> 
> So show the code?

You asked for it. ;-)  Boiled down to remove macroisms and code that
should be removed by optimization:
--
static __inline __attribute__ ((__always_inline__)) void
libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
{
  fenv_union_t old;
  register fenv_union_t __fr;
  __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
  ctx->env = old.fenv = __fr.fenv; 
  ctx->updated_status = (r != (old.l & 3));
}
static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc (fenv_t *envp)
{ 
  fenv_union_t new = { .fenv = *envp };
  register fenv_union_t __fr;
  __fr.l = new.l & 3;
  __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
}
double
__sin (double x)
{
  struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
  libc_feholdsetround_ppc_ctx (&ctx, (0));
  /* floating point intensive code.  */
  return retval;
}
--

There's not much to it, really.  "mffscrni" on the way in to save and set
a required rounding mode, and "mffscrn" on the way out to restore it.

> > > If you get a real (i.e. not inline) function call there, that
> > > can save you often.
> > 
> > Calling a real function in order to execute a single instruction is
> > sub-optimal. ;-)
> 
> Calling a real function (that does not even need a stack frame, just a
> blr) is not terribly expensive, either.

Not ideal, better would be better.

> > > > Would creating a __builtin_mffsce be another solution?
> > > 
> > > Yes.  And not a bad idea in the first place.
> > 
> > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > difference between "asm" and builtin, how does using a builtin solve the
> > problem?
> 
> You will have to make the builtin solve it.  What a builtin can do is
> virtually unlimited.  What an asm can do is not: it just outputs some
> assembler language, and does in/out/clobber constraints.  You can do a
> *lot* with that, but it is much more limited than everything you can do
> in the compiler!  :-)
> 
> The fact remains that there is no way in RTL (or Gimple for that matter)
> to express things like rounding mode changes.  You will need to
> artificially make some barriers.

I know there is __builtin_set_fpscr_rn that generates mffscrn. This
is not used in the code above because I believe it first appears in
GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
a return value, which would be handy in this case).  Does the
implementation of that builtin meet the requirements needed here,
to prevent reordering of FP computation across instantiations of the
builtin?  If not, is there a model on which to base an implementation
of __builtin_mffsce (or some preferred name)?

PC


Re: [PATCH] Warray-bounds: Warn only for generic address spaces

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/12/21 12:33 PM, Siddhesh Poyarekar wrote:

The warning is falsely triggered for THREAD_SELF in glibc when
accessing TCB through the segment register.


Thanks for looking into it!  The Glibc warning is being tracked
in PR 102630.  The root cause behind it is in compute_objsize_r
in pointer-query.cc (which is used by -Warray-bounds as well as
other warnings).  I just posted a patch for it the other day;
it's waiting for approval (though as Joseph noted, I need to
adjust the test and either make it target-independent or move
it under i386).

Martin

PS Noticing gcc.target/i386/addr-space-2.c makes me wish
-Warray-bounds were enabled by default, like other out-of-bounds
warnings, and reminds me that it should be able to run even at
-O1 (and -O0).



gcc/ChangeLog:

* gimple-array-bounds.cc
(array_bounds_checker::check_mem_ref): Bail out for
non-generic address spaces.

gcc/testsuite/ChangeLog:

* gcc.target/i386/addr-space-3.c: New test case.

Signed-off-by: Siddhesh Poyarekar 
---
  gcc/gimple-array-bounds.cc   | 3 +++
  gcc/testsuite/gcc.target/i386/addr-space-3.c | 5 +
  2 files changed, 8 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 0517e5ddd8e..36fc1dbe3f8 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -432,6 +432,9 @@ array_bounds_checker::check_mem_ref (location_t location, 
tree ref,
if (aref.offset_in_range (axssize))
  return false;
  
+  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (axstype)))
+    return false;
+
if (TREE_CODE (aref.ref) == SSA_NAME)
  {
gimple *def = SSA_NAME_DEF_STMT (aref.ref);
diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c 
b/gcc/testsuite/gcc.target/i386/addr-space-3.c
new file mode 100644
index 000..4bd940e696a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
@@ -0,0 +1,5 @@
+/* Verify that __seg_fs/gs marked variables do not trigger an array bounds
+   warning.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+#include "addr-space-2.c"





Re: [PATCH] rs6000: Fix vec_cpsgn parameter order (PR101985)

2021-10-12 Thread David Edelsohn via Gcc-patches
On Fri, Sep 24, 2021 at 11:20 AM Bill Schmidt  wrote:
>
> Hi!
>
> This fixes a bug reported in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101985.
>
> The vec_cpsgn built-in function API differs in argument order from the
> copysign3 convention.  Currently that pattern is incorrectly used to
> implement vec_cpsgn.  Fix that while leaving the existing pattern in place
> to implement copysignf for vector modes.

It's a little confusing what "that" is.  Maybe clarify that the patch
is changing the PowerPC VSX function to invoke the GCC built-in with
the argument in the correct order.

>
> Part of the fix when using the new built-in support requires an adjustment
> to a pending patch that replaces much of altivec.h with an automatically
> generated file.  So that adjustment will be coming later...
>
> Also fix a bug in the new built-in overload infrastructure where we were
> using the VSX form of the VEC_COPYSIGN built-in when we should default to
> the VMX form.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this okay for trunk?
>
> Thanks!
> Bill
>
>
> 2021-09-24  Bill Schmidt  
>
> gcc/
> PR target/101985
> * config/rs6000/altivec.h (vec_cpsgn): Adjust.

Maybe a little more information than "Adjust"?  Swap arguments?

> * config/rs6000/rs6000-overload.def (VEC_COPYSIGN): Use SKIP to
> avoid generating an automatic #define of vec_cpsgn.  Use the
> correct built-in for V4SFmode that doesn't depend on VSX.
>
> gcc/testsuite/
> PR target/101985
> * gcc.target/powerpc/pr101985.c: New.

Okay.

Thanks, David


[PATCH] Warray-bounds: Warn only for generic address spaces

2021-10-12 Thread Siddhesh Poyarekar
The warning is falsely triggered for THREAD_SELF in glibc when
accessing TCB through the segment register.

gcc/ChangeLog:

* gimple-array-bounds.cc
(array_bounds_checker::check_mem_ref): Bail out for
non-generic address spaces.

gcc/testsuite/ChangeLog:

* gcc.target/i386/addr-space-3.c: New test case.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/gimple-array-bounds.cc   | 3 +++
 gcc/testsuite/gcc.target/i386/addr-space-3.c | 5 +
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 0517e5ddd8e..36fc1dbe3f8 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -432,6 +432,9 @@ array_bounds_checker::check_mem_ref (location_t location, 
tree ref,
   if (aref.offset_in_range (axssize))
 return false;
 
+  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (axstype)))
+    return false;
+
   if (TREE_CODE (aref.ref) == SSA_NAME)
 {
   gimple *def = SSA_NAME_DEF_STMT (aref.ref);
diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c 
b/gcc/testsuite/gcc.target/i386/addr-space-3.c
new file mode 100644
index 000..4bd940e696a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
@@ -0,0 +1,5 @@
+/* Verify that __seg_fs/gs marked variables do not trigger an array bounds
+   warning.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+#include "addr-space-2.c"
-- 
2.31.1



Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Segher Boessenkool
On Mon, Oct 11, 2021 at 02:07:49PM -0600, Martin Sebor wrote:
> On 10/11/21 11:43 AM, Segher Boessenkool wrote:
> >I also am okay with this.  If it was decided x86 does not have to deal
> >with these (generic!) problems, then why should we do other people's
> >work?
> 
> I don't know that anything was decided.

It was approved though :-)  I don't know all history behind it.

> I think those changes
> were made in haste, and (as you noted in your review of these
> updates to them), were incomplete (missing comments referencing
> the underlying bugs or limitations).

Yeah.

> Now that we've noticed it
> we should try to fix it.  I'm not expecting you (or Kwen) to do
> other people's work, but it would help to let them/us know that
> there is work for us to do.  I only noticed the problem by luck.

There is still a month of stage 1 to go, and we are getting >50 new
fails every day.  Maybe once that dies down we can report anything :-(


Segher


Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Segher Boessenkool
On Tue, Oct 12, 2021 at 11:15:51AM -0600, Martin Sebor wrote:
> On 10/12/21 10:18 AM, Segher Boessenkool wrote:
> >On Tue, Oct 12, 2021 at 09:49:19AM -0600, Martin Sebor wrote:
> >>Coming back to the xfail conditionals, do you think you'll
> >>be able to put together some target-supports magic so they
> >>don't have to enumerate all the affected targets?
> >
> >There should only be an xfail if we do not expect to be able to fix the
> >bug causing this any time soon.  There shouldn't be one here, not yet
> >anyway.
> >
> >Other than that: yes, and one you have such a selector, just dg-require
> >it (or its inverse) for this test, don't xfail the test (if this is
> >expected and correct behaviour).
> 
> My sense is that fixing all the fallout from the vectorization
> change is going to be delicate and time-consuming work.  With
> the end of stage 1 just about a month away I'm not too optimistic
> how much of it I'll be able to get it done before then.  Depending
> on how intrusive the fixes turn out to be it may or may not be
> suitable in stage 3.

Some it will be suitable for stage4, even (testsuite-only changes for
example).

> Based on pr102706 that Jeff reported for the regressions in his
> automated tester, it also sounds like the test failures are spread
> out across a multitude of targets.  In addition, it doesn't look
> like the targets are all the same in all the tests.  Enumerating
> the targets that correspond to each test failure would be like
> playing the proverbial Whac-A-Mole.
> 
> That makes me think we do need some such selector rather soon.

Yes.

> The failing test cases are a subset of all the cases exercised
> by the tests.  We don't want to conditionally enable/disable
> the whole tests just for the few failing cases (if that's what
> you were suggesting by dg-require).

I mean that the tests should not be done on targets where those tests
do not make sense.

> So we need to apply
> the selector to individual dg-warning and dg-bogus directives
> in these tests.

Some of those tests should not be run with -fvectorize at all, imo.
You *want* to limit things a lot, for detail tests.


Segher


Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Jeff Law via Gcc-patches




On 10/12/2021 11:15 AM, Martin Sebor via Gcc-patches wrote:

On 10/12/21 10:18 AM, Segher Boessenkool wrote:

Hi!

On Tue, Oct 12, 2021 at 09:49:19AM -0600, Martin Sebor wrote:

Coming back to the xfail conditionals, do you think you'll
be able to put together some target-supports magic so they
don't have to enumerate all the affected targets?


There should only be an xfail if we do not expect to be able to fix the
bug causing this any time soon.  There shouldn't be one here, not yet
anyway.

Other than that: yes, and one you have such a selector, just dg-require
it (or its inverse) for this test, don't xfail the test (if this is
expected and correct behaviour).


My sense is that fixing all the fallout from the vectorization
change is going to be delicate and time-consuming work.  With
the end of stage 1 just about a month away I'm not too optimistic
how much of it I'll be able to get it done before then.  Depending
on how intrusive the fixes turn out to be it may or may not be
suitable in stage 3.

Based on pr102706 that Jeff reported for the regressions in his
automated tester, it also sounds like the test failures are spread
out across a multitude of targets.  In addition, it doesn't look
like the targets are all the same in all the tests.  Enumerating
the targets that correspond to each test failure would be like
playing the proverbial Whac-A-Mole.

There'll be some degree of whac-a-mole.  But it likely isn't every
target.  I'm still evaluating that when I have a few minutes to look at
a given target.


jeff



Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/12/21 10:18 AM, Segher Boessenkool wrote:

Hi!

On Tue, Oct 12, 2021 at 09:49:19AM -0600, Martin Sebor wrote:

Coming back to the xfail conditionals, do you think you'll
be able to put together some target-supports magic so they
don't have to enumerate all the affected targets?


There should only be an xfail if we do not expect to be able to fix the
bug causing this any time soon.  There shouldn't be one here, not yet
anyway.

Other than that: yes, and one you have such a selector, just dg-require
it (or its inverse) for this test, don't xfail the test (if this is
expected and correct behaviour).


My sense is that fixing all the fallout from the vectorization
change is going to be delicate and time-consuming work.  With
the end of stage 1 just about a month away I'm not too optimistic
how much of it I'll be able to get it done before then.  Depending
on how intrusive the fixes turn out to be it may or may not be
suitable in stage 3.

Based on pr102706 that Jeff reported for the regressions in his
automated tester, it also sounds like the test failures are spread
out across a multitude of targets.  In addition, it doesn't look
like the targets are all the same in all the tests.  Enumerating
the targets that correspond to each test failure would be like
playing the proverbial Whac-A-Mole.

That makes me think we do need some such selector rather soon.

The failing test cases are a subset of all the cases exercised
by the tests.  We don't want to conditionally enable/disable
the whole tests just for the few failing cases (if that's what
you were suggesting by dg-require).  So we need to apply
the selector to individual dg-warning and dg-bogus directives
in these tests.

Martin


Re: [PATCH] libiberty: d-demangle: rename function symbols to be more consistent

2021-10-12 Thread Luís Ferreira
On Mon, 2021-10-04 at 09:30 +0200, Iain Buclaw wrote:
> > On 30/09/2021 02:48 Luís Ferreira  wrote:
> > 
> >  
> > There is some function names with `dlang_parse_` prefix and some
> > with only
> > `dlang_` prefix that does parsing. The same happens with
> > `dlang_decode_`.
> > 
> > To make things a bit more consistent and easier to understand, this
> > patch adds
> > the missing prefix, according to the functions signatures.
> > 
> 
> Not too keen on grand the renaming without a changelog entry to go
> with.
> 
> Iain.

I rewrote the patch to a PATCH v2 including the ChangeLog, as
requested.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





[PATCH v2] libiberty: d-demangle: rename function symbols to be more consistent

2021-10-12 Thread Luís Ferreira
There are some function names with a `dlang_parse_` prefix and some with
only a `dlang_` prefix that do parsing. The same happens with
`dlang_decode_`.

To make things a bit more consistent and easier to understand, this patch
adds the missing prefix, according to the functions' signatures.

ChangeLog:
libiberty/
* d-demangle.c (dlang_function_type): Rename function to
  dlang_parse_function_type
* d-demangle.c (dlang_function_args): Rename function to
  dlang_parse_function_args
* d-demangle.c (dlang_type): Rename function to dlang_parse_type
* d-demangle.c (dlang_value): Rename function to dlang_parse_value
* d-demangle.c (dlang_lname): Rename function to dlang_parse_lname
* d-demangle.c (dlang_number): Rename function to dlang_decode_number
* d-demangle.c (dlang_hexdigit): Rename function to
  dlang_decode_hexdigit
* d-demangle.c (dlang_decode_backref): Rename function to
  dlang_decode_backref_pos
* d-demangle.c (dlang_backref): Rename function to dlang_decode_backref
* d-demangle.c (dlang_symbol_backref): Rename function to
  dlang_parse_symbol_backref
* d-demangle.c (dlang_type_backref): Rename function to
  dlang_parse_type_backref
* d-demangle.c (dlang_call_convention): Rename function to
  dlang_parse_call_convention
* d-demangle.c (dlang_type_modifiers): Rename function to
  dlang_parse_type_modifiers
* d-demangle.c (dlang_attributes): Rename function to
  dlang_parse_attributes
* d-demangle.c (dlang_function_type_noreturn): Rename function to
  dlang_parse_function_type_noreturn
* d-demangle.c (dlang_identifier): Rename function to
  dlang_parse_identifier
* d-demangle.c (dlang_template_symbol_param): Rename function to
  dlang_parse_template_symbol_param
* d-demangle.c (dlang_template_args): Rename function to
  dlang_parse_template_args

Signed-off-by: Luís Ferreira 
---
 libiberty/d-demangle.c | 168 -
 1 file changed, 84 insertions(+), 84 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 3adf7b562d1..161bd7abd91 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -183,15 +183,15 @@ struct dlang_info
 #define TEMPLATE_LENGTH_UNKNOWN (-1UL)
 
 /* Prototypes for forward referenced functions */
-static const char *dlang_function_type (string *, const char *,
+static const char *dlang_parse_function_type (string *, const char *,
struct dlang_info *);
 
-static const char *dlang_function_args (string *, const char *,
+static const char *dlang_parse_function_args (string *, const char *,
struct dlang_info *);
 
-static const char *dlang_type (string *, const char *, struct dlang_info *);
+static const char *dlang_parse_type (string *, const char *, struct dlang_info 
*);
 
-static const char *dlang_value (string *, const char *, const char *, char,
+static const char *dlang_parse_value (string *, const char *, const char *, 
char,
struct dlang_info *);
 
 static const char *dlang_parse_qualified (string *, const char *,
@@ -206,14 +206,14 @@ static const char *dlang_parse_tuple (string *, const 
char *,
 static const char *dlang_parse_template (string *, const char *,
 struct dlang_info *, unsigned long);
 
-static const char *dlang_lname (string *, const char *, unsigned long);
+static const char *dlang_parse_lname (string *, const char *, unsigned long);
 
 
 /* Extract the number from MANGLED, and assign the result to RET.
Return the remaining string on success or NULL on failure.
A result larger than UINT_MAX is considered a failure.  */
 static const char *
-dlang_number (const char *mangled, unsigned long *ret)
+dlang_decode_number (const char *mangled, unsigned long *ret)
 {
   /* Return NULL if trying to extract something that isn't a digit.  */
   if (mangled == NULL || !ISDIGIT (*mangled))
@@ -243,7 +243,7 @@ dlang_number (const char *mangled, unsigned long *ret)
 /* Extract the hex-digit from MANGLED, and assign the result to RET.
Return the remaining string on success or NULL on failure.  */
 static const char *
-dlang_hexdigit (const char *mangled, char *ret)
+dlang_decode_hexdigit (const char *mangled, char *ret)
 {
   char c;
 

[arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches

Hi,

The way we were previously dealing with addressing modes for MVE was
preventing the use of pre, post and offset addressing modes for the
normal loads and stores, including widening and narrowing.  This patch
fixes that and adds tests to ensure we are capable of using all the
available addressing modes.
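As a hypothetical illustration (not one of the new tests), this is the kind of loop affected: with auto-increment addressing available, the vectorized loads and stores can step their pointers as part of the memory access rather than with separate address arithmetic.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical example loop: when vectorized for MVE, the loads of
   SRC and stores to DST can use post-increment addressing to advance
   through the buffers.  */
void
scale (int16_t *dst, const int16_t *src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int16_t) (src[i] * 2);
}
```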

gcc/ChangeLog:
2021-10-12  Andre Vieira  

    * config/arm/arm.c (thumb2_legitimate_address_p): Use 
VALID_MVE_MODE

    when checking mve addressing modes.
    (mve_vector_mem_operand): Fix the way we handle pre, post and 
offset

    addressing modes.
    (arm_print_operand): Fix printing of POST_ and PRE_MODIFY.
    * config/arm/mve.md: Use mve_memory_operand predicate 
everywhere where

    there is a single Ux constraint.

gcc/testsuite/ChangeLog:
2021-10-12  Andre Vieira  

    * gcc.target/arm/mve/mve.exp: Make it test main directory.
    * gcc.target/arm/mve/mve_load_memory_modes.c: New test.
    * gcc.target/arm/mve/mve_store_memory_modes.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
6c6e77fab666f4aeff023b1f949e3ca0a3545658..d921261633aeff4f92a2e1a6057b00b685dea892
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8530,8 +8530,7 @@ thumb2_legitimate_address_p (machine_mode mode, rtx x, 
int strict_p)
   bool use_ldrd;
   enum rtx_code code = GET_CODE (x);
 
-  if (TARGET_HAVE_MVE
-  && (mode == V8QImode || mode == E_V4QImode || mode == V4HImode))
+  if (TARGET_HAVE_MVE && VALID_MVE_MODE (mode))
 return mve_vector_mem_operand (mode, x, strict_p);
 
   if (arm_address_register_rtx_p (x, strict_p))
@@ -13433,53 +13432,49 @@ mve_vector_mem_operand (machine_mode mode, rtx op, 
bool strict)
   || code == PRE_INC || code == POST_DEC)
 {
   reg_no = REGNO (XEXP (op, 0));
-  return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
-  ? reg_no <= LAST_LO_REGNUM
-  :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
-}
-  else if ((code == POST_MODIFY || code == PRE_MODIFY)
-  && GET_CODE (XEXP (op, 1)) == PLUS && REG_P (XEXP (XEXP (op, 1), 1)))
+  return ((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
+ ? reg_no <= LAST_LO_REGNUM
+ :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
+   || reg_no >= FIRST_PSEUDO_REGISTER;
+}
+  else if (((code == POST_MODIFY || code == PRE_MODIFY)
+   && GET_CODE (XEXP (op, 1)) == PLUS
+   && XEXP (op, 0) == XEXP (XEXP (op, 1), 0)
+   && REG_P (XEXP (op, 0))
+   && GET_CODE (XEXP (XEXP (op, 1), 1)) == CONST_INT)
+  /* Make sure to only accept PLUS after reload_completed, otherwise
+ this will interfere with auto_inc's pattern detection.  */
+  || (reload_completed && code == PLUS && REG_P (XEXP (op, 0))
+  && GET_CODE (XEXP (op, 1)) == CONST_INT))
 {
   reg_no = REGNO (XEXP (op, 0));
-  val = INTVAL (XEXP ( XEXP (op, 1), 1));
+  if (code == PLUS)
+   val = INTVAL (XEXP (op, 1));
+  else
+   val = INTVAL (XEXP(XEXP (op, 1), 1));
+
   switch (mode)
{
  case E_V16QImode:
-   if (abs (val) <= 127)
- return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
-   return FALSE;
- case E_V8HImode:
- case E_V8HFmode:
-   if (abs (val) <= 255)
- return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
-   return FALSE;
  case E_V8QImode:
  case E_V4QImode:
if (abs (val) <= 127)
- return (reg_no <= LAST_LO_REGNUM
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+ return (reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
+   || reg_no >= FIRST_PSEUDO_REGISTER;
return FALSE;
+ case E_V8HImode:
+ case E_V8HFmode:
  case E_V4HImode:
  case E_V4HFmode:
if (val % 2 == 0 && abs (val) <= 254)
- return (reg_no <= LAST_LO_REGNUM
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
+ return reg_no <= LAST_LO_REGNUM
+   || reg_no >= FIRST_PSEUDO_REGISTER;
return FALSE;
  case E_V4SImode:
  case E_V4SFmode:
if (val % 4 == 0 && abs (val) <= 508)
- return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
- || (!strict && reg_no >= FIRST_PSEUDO_REGISTER));
-   return FALSE;
- case E_V2DImode:
- case E_V2DFmode:
- case E_TImode:
-   if (val % 4 == 0 && val >= 0 && val <= 1020)
- return ((reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM)
- || (!strict && reg_no >= 

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-12 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, October 12, 2021 5:25 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Richard Sandiford
> 
> Subject: RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2
> 
> Hi All,
> 
> This is  a new version with BE support and more tests.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?


+(define_insn "*aarch64_narrow_trunc_le"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+  (truncate:
+(match_operand:VQN 1 "register_operand" "w"))
+ (truncate:
+   (match_operand:VQN 2 "register_operand" "w"]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp1\\t%0., %1., %2."
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_narrow_trunc_be"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+ (truncate:
+   (match_operand:VQN 2 "register_operand" "w"))
+  (truncate:
+(match_operand:VQN 1 "register_operand" "w"]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp1\\t%0., %1., %2."
+  [(set_attr "type" "neon_permute")]
+)
+

Hmmm, these patterns are identical in what they match; they just have the
effect of printing operands 1 and 2 in a different order.
Perhaps it's more compact to change the output template into
BYTES_BIG_ENDIAN ? "uzp1\\t%0., %1., %2." : "uzp1\\t%0., %2., %1."
and avoid having a second pattern at all?
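For reference, a scalar model (a sketch, not the compiler code) of what both patterns compute: each wide lane is truncated to the narrow element type and the two halves are concatenated, which on a little-endian byte view is exactly what uzp1 produces.

```c
#include <stdint.h>
#include <assert.h>

/* Scalar model of vec_concat (truncate (a), truncate (b)) for
   8 x 16-bit -> 16 x 8-bit: truncate each lane, then concatenate.  */
void
narrow_concat (const int16_t a[8], const int16_t b[8], int8_t out[16])
{
  for (int i = 0; i < 8; i++)
    out[i] = (int8_t) a[i];       /* low half: truncated lanes of a */
  for (int i = 0; i < 8; i++)
    out[8 + i] = (int8_t) b[i];   /* high half: truncated lanes of b */
}
```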

Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md
> (*aarch64_narrow_trunc_le):
>   (*aarch64_narrow_trunc_be): New.
>   * config/aarch64/iterators.md (VNARROWSIMD, Vnarrowsimd):
> New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/narrow_high_combine.c: Update case.
>   * gcc.target/aarch64/xtn-combine-1.c: New test.
>   * gcc.target/aarch64/xtn-combine-2.c: New test.
>   * gcc.target/aarch64/xtn-combine-3.c: New test.
>   * gcc.target/aarch64/xtn-combine-4.c: New test.
>   * gcc.target/aarch64/xtn-combine-5.c: New test.
>   * gcc.target/aarch64/xtn-combine-6.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> 0b340b49fa06684b80d0b78cb712e49328ca92d5..8435dece660a12aa747c4a4
> 89fbbda5bc0f83a86 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1753,6 +1753,30 @@ (define_expand "aarch64_xtn2"
>}
>  )
> 
> +(define_insn "*aarch64_narrow_trunc_le"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +  (truncate:
> +(match_operand:VQN 1 "register_operand" "w"))
> +   (truncate:
> + (match_operand:VQN 2 "register_operand" "w"]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "uzp1\\t%0., %1., %2."
> +  [(set_attr "type" "neon_permute")]
> +)
> +
> +(define_insn "*aarch64_narrow_trunc_be"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (truncate:
> + (match_operand:VQN 2 "register_operand" "w"))
> +  (truncate:
> +(match_operand:VQN 1 "register_operand" "w"]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "uzp1\\t%0., %1., %2."
> +  [(set_attr "type" "neon_permute")]
> +)
> +
>  ;; Packing doubles.
> 
>  (define_expand "vec_pack_trunc_"
> diff --git a/gcc/config/aarch64/iterators.md
> b/gcc/config/aarch64/iterators.md
> index
> 8dbeed3b0d4a44cdc17dd333ed397b39a33f386a..95b385c0c9405fe95fcd072
> 62a9471ab13d5488e 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -270,6 +270,14 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI
> V4SI])
>  ;; Advanced SIMD modes for H, S and D types.
>  (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
> 
> +;; Modes for which we can narrow the element and increase the lane counts
> +;; to preserve the same register size.
> +(define_mode_attr VNARROWSIMD [(V4HI "V8QI") (V8HI "V16QI") (V4SI
> "V8HI")
> +(V2SI "V4HI") (V2DI "V4SI")])
> +
> +(define_mode_attr Vnarrowsimd [(V4HI "v8qi") (V8HI "v16qi") (V4SI "v8hi")
> +(V2SI "v4hi") (V2DI "v4si")])
> +
>  ;; Advanced SIMD and scalar integer modes for H and S.
>  (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> index
> 50ecab002a3552d37a5cc0d8921f42f6c3dba195..fa61196d3644caa48b12151e
> 12b15dfeab8c7e71 100644
> --- a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
> @@ -225,7 +225,8 @@ TEST_2_UNARY (vqmovun, uint32x4_t, int64x2_t,
> s64, u32)
>  /* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 6} }  */
>  /* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 6} }  */
>  /* 

Re: [PATCH] include/longlong.h: Remove incorrect lvalue to rvalue conversion from asm output constraints

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 12, 2021 at 09:21:21AM -0700, Fāng-ruì Sòng wrote:
> > > An output constraint takes a lvalue. While GCC happily strips the
> > > incorrect lvalue to rvalue conversion, Clang rejects the code by default:
> > >
> > > error: invalid use of a cast in a inline asm context requiring an 
> > > lvalue: remove the cast or build with -fheinous-gnu-extensions
> > >
> > > The file appears to share the same origin with gmplib longlong.h but
> > > they differ much now (gmplib version is much longer).
> > >
> > > I don't have write access to the git repo.
> > > ---
> > >  include/longlong.h | 186 ++---
> > >  1 file changed, 93 insertions(+), 93 deletions(-)
> > >
> > > diff --git a/include/longlong.h b/include/longlong.h
> > > index c3e92e54ecc..0a21a441d2d 100644
> > > --- a/include/longlong.h
> > > +++ b/include/longlong.h
> > > @@ -194,8 +194,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, 
> > > UDItype, UDItype);
> > >  #if defined (__arc__) && W_TYPE_SIZE == 32
> > >  #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
> > >__asm__ ("add.f\t%1, %4, %5\n\tadc\t%0, %2, %3" \
> > > -: "=r" ((USItype) (sh)), \
> > > -  "=r" ((USItype) (sl)) \
> > > +: "=r" (sh), \
> > > +  "=r" (sl) \
> > >  : "%r" ((USItype) (ah)), \
> > >"rICal" ((USItype) (bh)),  \
> > >"%r" ((USItype) (al)), \
> >
> > This seems to alter the meanining of existing programs if sh and sl do
> > not have the expected type.
> >
> > I think you need to add a compound expression and temporaries of type
> > USItype if you want to avoid the cast.
> 
> Add folks who may comment on the output constraint behavior when a
> lvalue to rvalue conversion like (`(USItype)`) is added.

Allowing the casts in there is intentional, the comment about this
e.g. in GCC's C FE says:
Really, this should not be here.  Users should be using a
proper lvalue, dammit.  But there's a long history of using casts
in the output operands.  In cases like longlong.h, this becomes a
primitive form of typechecking -- if the cast can be removed, then
the output operand had a type of the proper width; otherwise we'll
get an error.

If you try e.g.:

void
foo (void)
{
  int i;
  long l;
  __asm ("" : "=r" ((unsigned) i));
  __asm ("" : "=r" ((long) l));
  __asm ("" : "=r" ((long long) l));
  __asm ("" : "=r" ((int) l));
  __asm ("" : "=r" ((long) i));
}

then on e.g. x86-64 the first 3 asms are accepted by GCC, the last two
rejected, because the modes are different there.

So the above change throws away important typechecking.  As it is
used in a macro, something different should verify that if the casts are
removed.
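One way to drop the casts without losing the width check they provided is a
compound statement with word-sized temporaries plus an explicit assertion.
A minimal sketch (macro and helper names are illustrative, not from
longlong.h, and the empty asm merely stands in for a target's real
add-with-carry sequence):

```c
#include <assert.h>

typedef unsigned int USItype;

/* Hedged sketch: the asm output operands are plain lvalue temporaries,
   so no cast is needed, and the _Static_assert recovers the "operand has
   the proper width" check that the casts used to give.  */
#define add_2words(sh, sl, ah, al, bh, bl)                          \
  do {                                                              \
    _Static_assert (sizeof (sh) == sizeof (USItype)                 \
                    && sizeof (sl) == sizeof (USItype),             \
                    "output operands must be word-sized");          \
    USItype __sl_tmp = (USItype) (al) + (USItype) (bl);             \
    USItype __sh_tmp = (USItype) (ah) + (USItype) (bh)              \
                       + (__sl_tmp < (USItype) (al));               \
    /* Empty asm keeps the shape of the real macro; a target would   \
       compute the sum and carry here instead of in C above.  */    \
    __asm__ ("" : "+r" (__sh_tmp), "+r" (__sl_tmp));                \
    (sh) = __sh_tmp;                                                \
    (sl) = __sl_tmp;                                                \
  } while (0)

/* Small demonstration wrapper: add two 64-bit values held as
   high/low 32-bit word pairs.  */
static void add_u64_demo (USItype ah, USItype al, USItype bh, USItype bl,
                          USItype *rh, USItype *rl)
{
  USItype sh, sl;
  add_2words (sh, sl, ah, al, bh, bl);
  *rh = sh;
  *rl = sl;
}
```

With `sh`/`sl` of the wrong width the `_Static_assert` fires at compile
time, which is roughly the diagnostic the casts produced via mode
mismatches.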

Jakub



RE: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift of half top bits (shuffle)

2021-10-12 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, October 12, 2021 5:23 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Richard Sandiford
> 
> Subject: RE: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift
> of half top bits (shuffle)
> 
> Hi All,
> 
> This is  a new version with more tests and BE support.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md
>   (*aarch64_topbits_shuffle<mode>_le): New.
>   (*aarch64_topbits_shuffle<mode>_le): New.
>   (*aarch64_topbits_shuffle<mode>_be): New.
>   (*aarch64_topbits_shuffle<mode>_be): New.
>   * config/aarch64/predicates.md
>   (aarch64_simd_shift_imm_vec_exact_top): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/shrn-combine-10.c: New test.
>   * gcc.target/aarch64/shrn-combine-5.c: New test.
>   * gcc.target/aarch64/shrn-combine-6.c: New test.
>   * gcc.target/aarch64/shrn-combine-7.c: New test.
>   * gcc.target/aarch64/shrn-combine-8.c: New test.
>   * gcc.target/aarch64/shrn-combine-9.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> 5715db4e1e1386e724e4d4defd5e5ed9efd8a874..7f0888ee2f81ae17ac97be1f
> 8438a2e588587c2a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1852,6 +1852,66 @@ (define_insn
> "*aarch64_shrn2_vect_be"
>[(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
> 
> +(define_insn "*aarch64_topbits_shuffle<mode>_le"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
> +           (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
> +           (match_dup 2)))))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute")]
> +)
> +
> +(define_insn "*aarch64_topbits_shuffle<mode>_le"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (unspec:<VNARROWQ> [
> +         (match_operand:VQN 1 "register_operand" "w")
> +         (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
> +       ] UNSPEC_RSHRN)
> +       (unspec:<VNARROWQ> [
> +         (match_operand:VQN 3 "register_operand" "w")
> +         (match_dup 2)
> +       ] UNSPEC_RSHRN)))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute")]
> +)
> +
> +(define_insn "*aarch64_topbits_shuffle<mode>_be"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
> +           (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
> +           (match_dup 2)))))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute")]
> +)
> +
> +(define_insn "*aarch64_topbits_shuffle<mode>_be"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (unspec:<VNARROWQ> [
> +         (match_operand:VQN 3 "register_operand" "w")
> +         (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
> +       ] UNSPEC_RSHRN)
> +       (unspec:<VNARROWQ> [
> +         (match_operand:VQN 1 "register_operand" "w")
> +         (match_dup 2)
> +       ] UNSPEC_RSHRN)))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
> +  [(set_attr "type" "neon_permute")]
> +)
> +
>  (define_expand "aarch64_shrn"
>[(set (match_operand: 0 "register_operand")
>   (truncate:
> diff --git a/gcc/config/aarch64/predicates.md
> b/gcc/config/aarch64/predicates.md
> index
> 49f02ae0381359174fed80c2a2264295c75bc189..7fd4f9e7d06d3082d6f30472
> 90f0446789e1d0d2 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -545,6 +545,12 @@ (define_predicate
> "aarch64_simd_shift_imm_offset_di"
>(and (match_code "const_int")
> (match_test "IN_RANGE (INTVAL (op), 1, 64)")))
> 
> +(define_predicate "aarch64_simd_shift_imm_vec_exact_top"
> +  (and (match_code "const_vector")
> +   (match_test "aarch64_const_vec_all_same_in_range_p (op,
> + GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2,
> + GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2)")))
> +
>  (define_predicate "aarch64_simd_shift_imm_vec_qi"
>(and (match_code "const_vector")
> (match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 8)")))
> diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
> 

Re: [PATCH] rs6000: Remove builtin mask check from builtin_decl [PR102347]

2021-10-12 Thread Bill Schmidt via Gcc-patches
Hi Kewen,

On 10/11/21 1:30 AM, Kewen.Lin wrote:
> Hi Segher,
>
> Thanks for the comments.
>
> on 2021/10/1 上午6:13, Segher Boessenkool wrote:
>> Hi!
>>
>> On Thu, Sep 30, 2021 at 11:06:50AM +0800, Kewen.Lin wrote:
>>
>> [ huge snip ]
>>
>>> Based on the understanding and testing, I think it's safe to adopt this 
>>> patch.
>>> Do both Peter and you agree the rs6000_expand_builtin will catch the 
>>> invalid built-in?
>>> Is there some special case which probably escapes out?
>> The function rs6000_builtin_decl has a terribly generic name.  Where all
>> is it called from?  Do all such places allow the change in semantics?
>> Do any comments or other documentation need to change?  Is the function
>> name still good?
>
> % grep -rE "\<builtin_decl\> \(" .
> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
> ./gcc/config/avr/avr-c.c:  fold = targetm.builtin_decl (id, true);
> ./gcc/config/aarch64/aarch64.c:  return aarch64_sve::builtin_decl 
> (subcode, initialize_p);
> ./gcc/config/aarch64/aarch64-protos.h:  tree builtin_decl (unsigned, bool);
> ./gcc/config/aarch64/aarch64-sve-builtins.cc:builtin_decl (unsigned int code, 
> bool)
> ./gcc/tree-streamer-in.c:  tree result = targetm.builtin_decl (fcode, 
> true);
>
> % grep -rE "\<rs6000_builtin_decl\> \(" .
> ./gcc/config/rs6000/rs6000-c.c:   if (rs6000_builtin_decl 
> (instance->bifid, false) != error_mark_node
> ./gcc/config/rs6000/rs6000-c.c:   if (rs6000_builtin_decl 
> (instance->bifid, false) != error_mark_node
> ./gcc/config/rs6000/rs6000-c.c:   if (rs6000_builtin_decl 
> (instance->bifid, false) != error_mark_node
> ./gcc/config/rs6000/rs6000-gen-builtins.c:   "extern tree 
> rs6000_builtin_decl (unsigned, "
> ./gcc/config/rs6000/rs6000-call.c:rs6000_builtin_decl (unsigned code, bool 
> initialize_p ATTRIBUTE_UNUSED)
> ./gcc/config/rs6000/rs6000-internal.h:extern tree rs6000_builtin_decl 
> (unsigned code,
>
> As above, the call sites are mainly in
>   1) function unpack_ts_function_decl_value_fields in gcc/tree-streamer-in.c
>   2) function altivec_resolve_new_overloaded_builtin in 
> gcc/config/rs6000/rs6000-c.c
>
> 2) is newly introduced by Bill's bif rewriting patch series, all uses in it 
> are
> along with rs6000_new_builtin_is_supported which adopts a new way to check bif
> supported or not (the old rs6000_builtin_is_supported_p uses builtin mask), so
> I think the builtin mask checking is useless (unexpected?) for these uses.

Things are a bit confused because we are part way through the patch series.
rs6000_builtin_decl will be changed to redirect to rs6000_new_builtin_decl when
using the new builtin support.  That function will be:

static tree
rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
{
  rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;

  if (fcode >= RS6000_OVLD_MAX)
return error_mark_node;

  if (!rs6000_new_builtin_is_supported (fcode))
{
  rs6000_invalid_new_builtin (fcode);
  return error_mark_node;
}

  return rs6000_builtin_decls_x[code];
}

So, as you surmise, this will be using the new method of testing for builtin 
validity.
You can ignore the rs6000-c.c and rs6000-gen-builtins.c references of 
rs6000_builtin_decl
for purposes of fixing the existing way of doing things.

>
> Besides, the description for this hook:
>
> "tree TARGET_BUILTIN_DECL (unsigned code, bool initialize_p) [Target Hook]
> Define this hook if you have any machine-specific built-in functions that 
> need to be
> defined. It should be a function that returns the builtin function 
> declaration for the
> builtin function code code. If there is no such builtin and it cannot be 
> initialized at
> this time if initialize p is true the function should return NULL_TREE. If 
> code is out
> of range the function should return error_mark_node."
>
> It would only return error_mark_node when the code is out of range.  The 
> current
> rs6000_builtin_decl returns error_mark_node not only for "out of range", it 
> looks
> inconsistent and this patch also revise it.
>
> The hook was introduced by commit e9e4b3a892d0d19418f23bb17bdeac33f9a8bfd2,
> it meant to ensure the bif function_decl is valid (check if bif code in the
> range and the corresponding entry in bif table is not NULL).  May be better
> with name check_and_get_builtin_decl?  CC Richi, he may have more insights.
>
>>> By the way, I tested the bif rewriting patch series V5, it couldn't make 
>>> the original
>>> case in PR (S5) pass, I may miss something or the used series isn't 
>>> up-to-date.  Could
>>> you help to have a try?  I agree with Peter, if the rewriting can fix this 
>>> issue, then
>>> we don't need this patch for trunk any more, I'm happy to abandon this.  :)
>> (Mail lines are 70 or so chars max, so that they can be quoted a few
>> levels).
>>
> ah, OK, thanks.  :)
>
>> If we do need a band-aid for 10 and 11 (and we do as far as I can 

RE: [PATCH 1/7]AArch64 Add combine patterns for right shift and narrow

2021-10-12 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, October 12, 2021 5:18 PM
> To: Richard Sandiford ; Kyrylo Tkachov
> 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: RE: [PATCH 1/7]AArch64 Add combine patterns for right shift and
> narrow
> 
> Hi All,
> 
> Here's a new version with big-endian support and more tests
> 
> > >
> > > I think this needs to be guarded on !BYTES_BIG_ENDIAN and a similar
> > pattern added for BYTES_BIG_ENDIAN with the vec_concat operands
> > swapped around.
> > > This is similar to the aarch64_xtn2_insn_be pattern, for example.
> >
> > Yeah.  I think that applies to 2/7 and 4/7 too.
> >
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md
> (*aarch64_<srn_op>shrn<mode>_vect,
>   *aarch64_<srn_op>shrn<mode>2_vect_le,
>   *aarch64_<srn_op>shrn<mode>2_vect_be): New.
>   * config/aarch64/iterators.md (srn_op): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/shrn-combine-1.c: New test.
>   * gcc.target/aarch64/shrn-combine-2.c: New test.
>   * gcc.target/aarch64/shrn-combine-3.c: New test.
>   * gcc.target/aarch64/shrn-combine-4.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> 48eddf64e05afe3788abfa05141f6544a9323ea1..5715db4e1e1386e724e4d4d
> efd5e5ed9efd8a874 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1818,6 +1818,40 @@ (define_insn "aarch64_shrn_insn_be"
>[(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
> 
> +(define_insn "*aarch64_<srn_op>shrn<mode>_vect"
> +  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
> +    (truncate:<VNARROWQ>
> +      (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
> +        (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_<vn_mode>"))))]
> +  "TARGET_SIMD"
> +  "shrn\\t%0.<Vntype>, %1.<Vtype>, %2"
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "*aarch64_<srn_op>shrn<mode>2_vect_le"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (match_operand:<VNARROWQ> 1 "register_operand" "0")
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 2 "register_operand" "w")
> +           (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_<vn_mode>")))))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "shrn2\\t%0.<V2ntype>, %2.<Vtype>, %3"
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "*aarch64_<srn_op>shrn<mode>2_vect_be"
> +  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
> +     (vec_concat:<VNARROWQ2>
> +       (truncate:<VNARROWQ>
> +         (SHIFTRT:VQN (match_operand:VQN 2 "register_operand" "w")
> +           (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_<vn_mode>")))
> +       (match_operand:<VNARROWQ> 1 "register_operand" "0")))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "shrn2\\t%0.<V2ntype>, %2.<Vtype>, %3"
> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
> +)
> +
>  (define_expand "aarch64_shrn"
>[(set (match_operand: 0 "register_operand")
>   (truncate:
> diff --git a/gcc/config/aarch64/iterators.md
> b/gcc/config/aarch64/iterators.md
> index
> caa42f8f169fbf2cf46a90cf73dee05619acc300..8dbeed3b0d4a44cdc17dd333e
> d397b39a33f386a 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -2003,6 +2003,9 @@ (define_code_attr shift [(ashift "lsl") (ashiftrt "asr")
>  ;; Op prefix for shift right and accumulate.
>  (define_code_attr sra_op [(ashiftrt "s") (lshiftrt "u")])
> 
> +;; op prefix for shift right and narrow.
> +(define_code_attr srn_op [(ashiftrt "r") (lshiftrt "")])
> +
>  ;; Map shift operators onto underlying bit-field instructions
>  (define_code_attr bfshift [(ashift "ubfiz") (ashiftrt "sbfx")
>  (lshiftrt "ubfx") (rotatert "extr")])
> diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
> b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
> new file mode 100644
> index
> ..a28524662edca8eb149e34c
> 2242091b51a167b71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
> +
> +#define TYPE char
> +
> +void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
> +{
> +for( int i = 0; i < n; i++ )
> +  d[i] = (a[i] * a[i]) >> 2;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tshrn\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tshrn2\t} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
> b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
> new file mode 100644
> index
> ..012135b424f98abadc480e7
> ef13fcab080d99c28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O3 

RE: [PATCH 4/7]AArch64 Add pattern xtn+xtn2 to uzp2

2021-10-12 Thread Tamar Christina via Gcc-patches
Hi All,

This is  a new version with BE support and more tests.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_narrow_trunc<mode>_le,
*aarch64_narrow_trunc<mode>_be): New.
* config/aarch64/iterators.md (VNARROWSIMD, Vnarrowsimd): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_high_combine.c: Update case.
* gcc.target/aarch64/xtn-combine-1.c: New test.
* gcc.target/aarch64/xtn-combine-2.c: New test.
* gcc.target/aarch64/xtn-combine-3.c: New test.
* gcc.target/aarch64/xtn-combine-4.c: New test.
* gcc.target/aarch64/xtn-combine-5.c: New test.
* gcc.target/aarch64/xtn-combine-6.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
0b340b49fa06684b80d0b78cb712e49328ca92d5..8435dece660a12aa747c4a489fbbda5bc0f83a86
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1753,6 +1753,30 @@ (define_expand "aarch64_xtn2"
   }
 )
 
+(define_insn "*aarch64_narrow_trunc<mode>_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (truncate:<VNARROWQ>
+        (match_operand:VQN 1 "register_operand" "w"))
+      (truncate:<VNARROWQ>
+        (match_operand:VQN 2 "register_operand" "w"))))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp1\\t%0.<V2ntype>, %1.<V2ntype>, %2.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_narrow_trunc<mode>_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (truncate:<VNARROWQ>
+        (match_operand:VQN 2 "register_operand" "w"))
+      (truncate:<VNARROWQ>
+        (match_operand:VQN 1 "register_operand" "w"))))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp1\\t%0.<V2ntype>, %1.<V2ntype>, %2.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
 ;; Packing doubles.
 
 (define_expand "vec_pack_trunc_"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
8dbeed3b0d4a44cdc17dd333ed397b39a33f386a..95b385c0c9405fe95fcd07262a9471ab13d5488e
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -270,6 +270,14 @@ (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+;; Modes for which we can narrow the element and increase the lane counts
+;; to preserve the same register size.
+(define_mode_attr VNARROWSIMD [(V4HI "V8QI") (V8HI "V16QI") (V4SI "V8HI")
+  (V2SI "V4HI") (V2DI "V4SI")])
+
+(define_mode_attr Vnarrowsimd [(V4HI "v8qi") (V8HI "v16qi") (V4SI "v8hi")
+  (V2SI "v4hi") (V2DI "v4si")])
+
 ;; Advanced SIMD and scalar integer modes for H and S.
 (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c 
b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
index 
50ecab002a3552d37a5cc0d8921f42f6c3dba195..fa61196d3644caa48b12151e12b15dfeab8c7e71
 100644
--- a/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
+++ b/gcc/testsuite/gcc.target/aarch64/narrow_high_combine.c
@@ -225,7 +225,8 @@ TEST_2_UNARY (vqmovun, uint32x4_t, int64x2_t, s64, u32)
 /* { dg-final { scan-assembler-times "\\tuqshrn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqrshrn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tuqrshrn2\\tv" 6} }  */
-/* { dg-final { scan-assembler-times "\\txtn2\\tv" 12} }  */
+/* { dg-final { scan-assembler-times "\\txtn2\\tv" 6} }  */
+/* { dg-final { scan-assembler-times "\\tuzp1\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tuqxtn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqxtn2\\tv" 6} }  */
 /* { dg-final { scan-assembler-times "\\tsqxtun2\\tv" 6} }  */
diff --git a/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c 
b/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c
new file mode 100644
index 
..14e0414cd1478f1cb7b17766aa8d4451c5659977
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/xtn-combine-1.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define SIGN signed
+#define TYPE1 char
+#define TYPE2 short
+
+void d2 (SIGN TYPE1 * restrict a, SIGN TYPE2 *b, int n)
+{
+for (int i = 0; i < n; i++)
+  a[i] = b[i];
+}
+
+/* { dg-final { scan-assembler-times {\tuzp1\t} 1 } } */
+/* { dg-final { scan-assembler-not {\txtn\t} } } */
+/* { dg-final { scan-assembler-not {\txtn2\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c 
b/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c
new file mode 100644
index 
..c259010442bca4ba008706e47b3ffcc50a910b52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/xtn-combine-2.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble } */

[PATCH] i386: Improve workaround for PR82524 LRA limitation [PR85730]

2021-10-12 Thread Uros Bizjak via Gcc-patches
As explained in PR82524, LRA is not able to reload strict_low_part inout
operand with matched input operand. The patch introduces a workaround,
where we allow LRA to generate an instruction with non-matched input operand
which is split post reload to an instruction that inserts non-matched input
operand to an inout operand and the instruction that uses matched operand.

The generated code improves from:

movsbl  %dil, %edx
movl%edi, %eax
sall$3, %edx
movb%dl, %al

to:

movl%edi, %eax
movb%dil, %al
salb$3, %al

which is still not optimal, but the code is one instruction shorter and
does not use a temporary register.
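For reference, the kind of source that exercises this is an in-place update
of only the low byte of a wider value, which the i386 backend expresses as a
strict_low_part set.  A hedged sketch (not the PR's actual testcase; the
union trick assumes a little-endian target, where b[0] is the low byte):

```c
#include <assert.h>
#include <stdint.h>

/* Shift the low byte of x left by 3 in place, leaving the upper bytes
   untouched -- the shape of RTL this patch improves code generation for.  */
static uint32_t shift_low_byte (uint32_t x)
{
  union { uint32_t w; uint8_t b[4]; } u = { .w = x };
  u.b[0] = (uint8_t) (u.b[0] << 3);   /* low byte on little-endian */
  return u.w;
}
```

The partial write to `u.b[0]` is what becomes a `(set (strict_low_part ...))`
insn, which LRA previously could not reload without the matched-operand
restriction described above.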

2021-10-12  Uroš Bizjak  

gcc/
PR target/85730
PR target/82524
* config/i386/i386.md (*add<mode>_1_slp): Rewrite as
define_insn_and_split pattern.  Add alternative 1 and split it
post reload to insert operand 1 into the low part of operand 0.
(*sub<mode>_1_slp): Ditto.
(*and<mode>_1_slp): Ditto.
(*<code><mode>_1_slp): Ditto.
(*ashl<mode>3_1_slp): Ditto.
(*<insn><mode>3_1_slp): Ditto.
(*<insn><mode>3_1_slp): Ditto.
(*neg<mode>_1_slp): New insn_and_split pattern.
(*one_cmpl<mode>_1_slp): Ditto.

gcc/testsuite/
PR target/85730
PR target/82524
* gcc.target/i386/pr85730.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c7ae4ac5fbc..e733a40fc90 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5730,16 +5730,17 @@
  (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
   (symbol_ref "true")))])
 
-(define_insn "*add<mode>_1_slp"
-  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
-   (plus:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "%0")
-   (match_operand:SWI12 2 "general_operand" "<r>mn")))
+;; Alternative 1 is needed to work around LRA limitation, see PR82524.
+(define_insn_and_split "*add<mode>_1_slp"
+  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>,<r>"))
+   (plus:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "%0,!<r>")
+   (match_operand:SWI12 2 "general_operand" "<r>mn,<r>mn")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
-   && (rtx_equal_p (operands[0], operands[1])
-   || rtx_equal_p (operands[0], operands[2]))"
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
 {
+  if (which_alternative)
+return "#";
+
   switch (get_attr_type (insn))
 {
 case TYPE_INCDEC:
@@ -5758,6 +5759,13 @@
   return "add{<imodesuffix>}\t{%2, %0|%0, %2}";
 }
 }
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0)) (match_dup 1))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (plus:SWI12 (match_dup 0) (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
   [(set (attr "type")
  (if_then_else (match_operand:QI 2 "incdec_operand")
(const_string "incdec")
@@ -6676,15 +6684,23 @@
   [(set_attr "type" "alu")
(set_attr "mode" "SI")])
 
-(define_insn "*sub<mode>_1_slp"
-  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
-   (minus:SWI12 (match_operand:SWI12 1 "register_operand" "0")
-(match_operand:SWI12 2 "general_operand" "<r>mn")))
+;; Alternative 1 is needed to work around LRA limitation, see PR82524.
+(define_insn_and_split "*sub<mode>_1_slp"
+  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>,<r>"))
+   (minus:SWI12 (match_operand:SWI12 1 "register_operand" "0,!<r>")
+(match_operand:SWI12 2 "general_operand" "<r>mn,<r>mn")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
-   && rtx_equal_p (operands[0], operands[1])"
-  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "@
+   sub{<imodesuffix>}\t{%2, %0|%0, %2}
+   #"
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0)) (match_dup 1))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (minus:SWI12 (match_dup 0) (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
   [(set_attr "type" "alu")
 (set_attr "mode" "<MODE>")])
 
@@ -9606,16 +9622,23 @@
  (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
   (symbol_ref "true")))])
 
-(define_insn "*and<mode>_1_slp"
-  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>"))
-   (and:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "%0")
-  (match_operand:SWI12 2 "general_operand" "<r>mn")))
+;; Alternative 1 is needed to work around LRA limitation, see PR82524.
+(define_insn_and_split "*and<mode>_1_slp"
+  [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>,<r>"))
+   (and:SWI12 (match_operand:SWI12 1 

RE: [PATCH 2/7]AArch64 Add combine patterns for narrowing shift of half top bits (shuffle)

2021-10-12 Thread Tamar Christina via Gcc-patches
Hi All,

This is  a new version with more tests and BE support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_topbits_shuffle<mode>_le): New.
(*aarch64_topbits_shuffle<mode>_le): New.
(*aarch64_topbits_shuffle<mode>_be): New.
(*aarch64_topbits_shuffle<mode>_be): New.
* config/aarch64/predicates.md
(aarch64_simd_shift_imm_vec_exact_top): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrn-combine-10.c: New test.
* gcc.target/aarch64/shrn-combine-5.c: New test.
* gcc.target/aarch64/shrn-combine-6.c: New test.
* gcc.target/aarch64/shrn-combine-7.c: New test.
* gcc.target/aarch64/shrn-combine-8.c: New test.
* gcc.target/aarch64/shrn-combine-9.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
5715db4e1e1386e724e4d4defd5e5ed9efd8a874..7f0888ee2f81ae17ac97be1f8438a2e588587c2a
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1852,6 +1852,66 @@ (define_insn "*aarch64_shrn2_vect_be"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
+(define_insn "*aarch64_topbits_shuffle<mode>_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (truncate:<VNARROWQ>
+        (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+          (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
+      (truncate:<VNARROWQ>
+        (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
+          (match_dup 2)))))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>_le"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (unspec:<VNARROWQ> [
+        (match_operand:VQN 1 "register_operand" "w")
+        (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
+      ] UNSPEC_RSHRN)
+      (unspec:<VNARROWQ> [
+        (match_operand:VQN 3 "register_operand" "w")
+        (match_dup 2)
+      ] UNSPEC_RSHRN)))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (truncate:<VNARROWQ>
+        (SHIFTRT:VQN (match_operand:VQN 3 "register_operand" "w")
+          (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")))
+      (truncate:<VNARROWQ>
+        (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+          (match_dup 2)))))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_topbits_shuffle<mode>_be"
+  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "=w")
+    (vec_concat:<VNARROWQ2>
+      (unspec:<VNARROWQ> [
+        (match_operand:VQN 3 "register_operand" "w")
+        (match_operand:VQN 2 "aarch64_simd_shift_imm_vec_exact_top")
+      ] UNSPEC_RSHRN)
+      (unspec:<VNARROWQ> [
+        (match_operand:VQN 1 "register_operand" "w")
+        (match_dup 2)
+      ] UNSPEC_RSHRN)))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "uzp2\\t%0.<V2ntype>, %1.<V2ntype>, %3.<V2ntype>"
+  [(set_attr "type" "neon_permute")]
+)
+
 (define_expand "aarch64_shrn"
   [(set (match_operand: 0 "register_operand")
(truncate:
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 
49f02ae0381359174fed80c2a2264295c75bc189..7fd4f9e7d06d3082d6f3047290f0446789e1d0d2
 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -545,6 +545,12 @@ (define_predicate "aarch64_simd_shift_imm_offset_di"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, 64)")))
 
+(define_predicate "aarch64_simd_shift_imm_vec_exact_top"
+  (and (match_code "const_vector")
+   (match_test "aarch64_const_vec_all_same_in_range_p (op,
+   GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2,
+   GET_MODE_UNIT_BITSIZE (GET_MODE (op)) / 2)")))
+
 (define_predicate "aarch64_simd_shift_imm_vec_qi"
   (and (match_code "const_vector")
(match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 8)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
new file mode 100644
index 
..3a1cfce93e9065e8d5b43a770b0ef24a17586411
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-10.c
@@ -0,0 +1,14 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+
+#include <arm_neon.h>
+
+uint32x4_t foo (uint64x2_t a, uint64x2_t b)
+{
+  return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
+}
+
+/* { dg-final { scan-assembler-times {\tuzp2\t} 1 } } */
+/* { dg-final { 

Re: [PATCH] include/longlong.h: Remove incorrect lvalue to rvalue conversion from asm output constraints

2021-10-12 Thread Fāng-ruì Sòng via Gcc-patches
On Sun, Oct 10, 2021 at 10:03 PM Florian Weimer  wrote:
>
> * Fangrui Song:
>
> > An output constraint takes an lvalue. While GCC happily strips the
> > incorrect lvalue-to-rvalue conversion, Clang rejects the code by default:
> >
> > error: invalid use of a cast in a inline asm context requiring an lvalue: remove the cast or build with -fheinous-gnu-extensions
> >
> > The file appears to share the same origin as gmplib's longlong.h, but
> > they have diverged considerably by now (the gmplib version is much longer).
> >
> > I don't have write access to the git repo.
> > ---
> >  include/longlong.h | 186 ++---
> >  1 file changed, 93 insertions(+), 93 deletions(-)
> >
> > diff --git a/include/longlong.h b/include/longlong.h
> > index c3e92e54ecc..0a21a441d2d 100644
> > --- a/include/longlong.h
> > +++ b/include/longlong.h
> > @@ -194,8 +194,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype);
> >  #if defined (__arc__) && W_TYPE_SIZE == 32
> >  #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
> >__asm__ ("add.f\t%1, %4, %5\n\tadc\t%0, %2, %3" \
> > -: "=r" ((USItype) (sh)), \
> > -  "=&r" ((USItype) (sl)) \
> > +: "=r" (sh), \
> > +  "=&r" (sl) \
> >  : "%r" ((USItype) (ah)), \
> >"rICal" ((USItype) (bh)),  \
> >"%r" ((USItype) (al)), \
>
> This seems to alter the meaning of existing programs if sh and sl do
> not have the expected type.
>
> I think you need to add a compound expression and temporaries of type
> USItype if you want to avoid the cast.

Adding folks who may comment on the output constraint behavior when an
lvalue-to-rvalue conversion (like the `(USItype)` cast) is added.


Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Segher Boessenkool
Hi!

On Tue, Oct 12, 2021 at 09:49:19AM -0600, Martin Sebor wrote:
> Coming back to the xfail conditionals, do you think you'll
> be able to put together some target-supports magic so they
> don't have to enumerate all the affected targets?

There should only be an xfail if we do not expect to be able to fix the
bug causing this any time soon.  There shouldn't be one here, not yet
anyway.

Other than that: yes, and once you have such a selector, just dg-require
it (or its inverse) for this test, don't xfail the test (if this is
expected and correct behaviour).


Segher


RE: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt

2021-10-12 Thread Tamar Christina via Gcc-patches
Thanks,

Just archiving a version with more tests as requested.

I will assume the OK still stands.

Regards,
Tamar

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Tuesday, October 12, 2021 1:19 PM
> To: Andrew Pinski 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> apin...@marvell.com; Richard Earnshaw ; nd
> ; Marcus Shawcroft ; Richard
> Sandiford 
> Subject: RE: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> 
> 
> 
> > -Original Message-
> > From: Andrew Pinski 
> > Sent: Monday, October 11, 2021 8:56 PM
> > To: Kyrylo Tkachov 
> > Cc: Tamar Christina ;
> > gcc-patches@gcc.gnu.org; apin...@marvell.com; Richard Earnshaw
> > ; nd ; Marcus Shawcroft
> > ; Richard Sandiford
> > 
> > Subject: Re: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> >
> > On Thu, Sep 30, 2021 at 2:28 AM Kyrylo Tkachov via Gcc-patches
> >  wrote:
> > > > -Original Message-
> > > > From: Tamar Christina 
> > > > Sent: Wednesday, September 29, 2021 5:20 PM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: nd ; Richard Earnshaw
> > ;
> > > > Marcus Shawcroft ; Kyrylo Tkachov
> > > > ; Richard Sandiford
> > > > 
> > > > Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> > > >
> > > > Hi All,
> > > >
> > > > This optimizes signed right shift by BITSIZE-1 into a cmlt
> > > > operation which
> > is
> > > > more optimal because generally compares have a higher throughput
> > > > than shifts.
> > > >
> > > > On AArch64 the result of the shift would have been either -1 or 0
> > > > which is
> > the
> > > > results of the compare.
> > > >
> > > > i.e.
> > > >
> > > > void e (int * restrict a, int *b, int n) {
> > > > for (int i = 0; i < n; i++)
> > > >   b[i] = a[i] >> 31;
> > > > }
> > > >
> > > > now generates:
> > > >
> > > > .L4:
> > > > ldr q0, [x0, x3]
> > > > cmltv0.4s, v0.4s, #0
> > > > str q0, [x1, x3]
> > > > add x3, x3, 16
> > > > cmp x4, x3
> > > > bne .L4
> > > >
> > > > instead of:
> > > >
> > > > .L4:
> > > > ldr q0, [x0, x3]
> > > > sshrv0.4s, v0.4s, 31
> > > > str q0, [x1, x3]
> > > > add x3, x3, 16
> > > > cmp x4, x3
> > > > bne .L4
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > This should be okay (either a win or neutral) for Arm Cortex and
> > > Neoverse
> > cores so I'm tempted to not ask for a CPU-specific tunable to guard it
> > to keep the code clean.
> > > Andrew, would this change be okay from a Thunder X line perspective?
> >
> > I don't know about ThunderX2 but here are the details for ThunderX1
> > (and OcteonX1) and OcteonX2:
> > The sshr and cmlt are handled the same in the pipeline as far as I can tell.
> >
> 
> Thanks for the info.
> This patch is ok.
> Kyrill
> 
> > Thanks,
> > Andrew
> >
> >
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * config/aarch64/aarch64-simd.md (aarch64_simd_ashr):
> > > > Add case cmp
> > > >   case.
> > > >   * config/aarch64/constraints.md (D1): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.target/aarch64/shl-combine-2.c: New test.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > > > b/gcc/config/aarch64/aarch64-simd.md
> > > > index
> > > >
> > 300bf001b59ca7fa197c580b10adb7f70f20d1e0..19b2d0ad4dab4d574269829
> > > > 7ded861228ee22007 100644
> > > > --- a/gcc/config/aarch64/aarch64-simd.md
> > > > +++ b/gcc/config/aarch64/aarch64-simd.md
> > > > @@ -1127,12 +1127,14 @@ (define_insn "aarch64_simd_lshr"
> > > >  )
> > > >
> > > >  (define_insn "aarch64_simd_ashr"
> > > > - [(set (match_operand:VDQ_I 0 "register_operand" "=w")
> > > > -   (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
> > > > -  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm"
> > > > "Dr")))]
> > > > + [(set (match_operand:VDQ_I 0 "register_operand" "=w,w")
> > > > +   (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand"
> "w,w")
> > > > +  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm"
> > > > "D1,Dr")))]
> > > >   "TARGET_SIMD"
> > > > - "sshr\t%0., %1., %2"
> > > > -  [(set_attr "type" "neon_shift_imm")]
> > > > + "@
> > > > +  cmlt\t%0., %1., #0
> > > > +  sshr\t%0., %1., %2"
> > > > +  [(set_attr "type" "neon_compare,neon_shift_imm")]
> > > >  )
> > > >
> > > >  (define_insn "*aarch64_simd_sra"
> > > > diff --git a/gcc/config/aarch64/constraints.md
> > > > b/gcc/config/aarch64/constraints.md
> > > > index
> > > >
> > 3b49b452119c49320020fa9183314d9a25b92491..18630815ffc13f2168300a89
> > > > 9db69fd428dfb0d6 100644
> > > > --- a/gcc/config/aarch64/constraints.md
> > > > +++ b/gcc/config/aarch64/constraints.md
> > > > @@ -437,6 +437,14 @@ (define_constraint "Dl"
> > > >(match_test 

RE: [PATCH 1/7]AArch64 Add combine patterns for right shift and narrow

2021-10-12 Thread Tamar Christina via Gcc-patches
Hi All,

Here's a new version with big-endian support and more tests

> >
> > I think this needs to be guarded on !BYTES_BIG_ENDIAN and a similar
> pattern added for BYTES_BIG_ENDIAN with the vec_concat operands
> swapped around.
> > This is similar to the aarch64_xtn2_insn_be pattern, for example.
> 
> Yeah.  I think that applies to 2/7 and 4/7 too.
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_shrn_vect,
*aarch64_shrn2_vect_le,
*aarch64_shrn2_vect_be): New.
* config/aarch64/iterators.md (srn_op): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shrn-combine-1.c: New test.
* gcc.target/aarch64/shrn-combine-2.c: New test.
* gcc.target/aarch64/shrn-combine-3.c: New test.
* gcc.target/aarch64/shrn-combine-4.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
48eddf64e05afe3788abfa05141f6544a9323ea1..5715db4e1e1386e724e4d4defd5e5ed9efd8a874
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1818,6 +1818,40 @@ (define_insn "aarch64_shrn_insn_be"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
 
+(define_insn "*aarch64_shrn_vect"
+  [(set (match_operand: 0 "register_operand" "=w")
+(truncate:
+  (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
+(match_operand:VQN 2 "aarch64_simd_shift_imm_vec_"))))]
+  "TARGET_SIMD"
+  "shrn\\t%0., %1., %2"
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
+
+(define_insn "*aarch64_shrn2_vect_le"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+ (match_operand: 1 "register_operand" "0")
+ (truncate:
+   (SHIFTRT:VQN (match_operand:VQN 2 "register_operand" "w")
+ (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_")))))]
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "shrn2\\t%0., %2., %3"
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
+
+(define_insn "*aarch64_shrn2_vect_be"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (vec_concat:
+ (truncate:
+   (SHIFTRT:VQN (match_operand:VQN 2 "register_operand" "w")
+ (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_")))
+ (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "shrn2\\t%0., %2., %3"
+  [(set_attr "type" "neon_shift_imm_narrow_q")]
+)
+
 (define_expand "aarch64_shrn"
   [(set (match_operand: 0 "register_operand")
(truncate:
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
caa42f8f169fbf2cf46a90cf73dee05619acc300..8dbeed3b0d4a44cdc17dd333ed397b39a33f386a
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2003,6 +2003,9 @@ (define_code_attr shift [(ashift "lsl") (ashiftrt "asr")
 ;; Op prefix for shift right and accumulate.
 (define_code_attr sra_op [(ashiftrt "s") (lshiftrt "u")])
 
+;; op prefix for shift right and narrow.
+(define_code_attr srn_op [(ashiftrt "r") (lshiftrt "")])
+
 ;; Map shift operators onto underlying bit-field instructions
 (define_code_attr bfshift [(ashift "ubfiz") (ashiftrt "sbfx")
   (lshiftrt "ubfx") (rotatert "extr")])
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
new file mode 100644
index 
..a28524662edca8eb149e34c2242091b51a167b71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-1.c
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define TYPE char
+
+void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
+{
+for( int i = 0; i < n; i++ )
+  d[i] = (a[i] * a[i]) >> 2;
+}
+
+/* { dg-final { scan-assembler-times {\tshrn\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tshrn2\t} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
new file mode 100644
index 
..012135b424f98abadc480e7ef13fcab080d99c28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrn-combine-2.c
@@ -0,0 +1,13 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 --save-temps --param=vect-epilogues-nomask=0" } */
+
+#define TYPE short
+
+void foo (unsigned TYPE * restrict a, TYPE * restrict d, int n)
+{
+for( int i = 0; i < n; i++ )
+  d[i] = (a[i] * a[i]) >> 2;
+}
+
+/* { dg-final { scan-assembler-times {\tshrn\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tshrn2\t} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c 
b/gcc/testsuite/gcc.target/aarch64/shrn-combine-3.c
new file mode 100644
index 
..8b5b360de623b0ada0da1531795ba6b428c7f9e1
--- 

[committed] libstdc++: Fix test that fails for C++20

2021-10-12 Thread Jonathan Wakely via Gcc-patches
Also restore the test for 'a < a' that was removed by r12-2537 because
it is ill-formed. We still want to test operator< for tuple, we just
need to not use std::nullptr_t in that tuple type.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Restore test for operator<.
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Adjust expected errors for C++20.

Tested powerpc64le-linux. Committed to trunk.

commit 727137d6ca6d3d401a0c1b4df6b9aae8b97dacd5
Author: Jonathan Wakely 
Date:   Tue Oct 12 15:39:18 2021

libstdc++: Fix test that fails for C++20

Also restore the test for 'a < a' that was removed by r12-2537 because
it is ill-formed. We still want to test operator< for tuple, we just
need to not use std::nullptr_t in that tuple type.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Restore test for operator<.
* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
Adjust expected errors for C++20.

diff --git 
a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc 
b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc
index ef90b6b5b73..a9bc2c7dfb5 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded.cc
@@ -48,3 +48,9 @@ TwistedLogic operator<(const Compares&, const Compares&) { return {false}; }
 
 auto a = std::make_tuple(nullptr, Compares{}, 2, 'U');
 auto b = a == a;
+
+#if ! __cpp_lib_three_way_comparison
+// Not valid in C++20, because TwistedLogic doesn't model boolean-testable.
+auto c = std::make_tuple("", Compares{}, 2, 'U');
+auto d = c < c;
+#endif
diff --git 
a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc 
b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
index a66a9315902..bac16ffd521 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
@@ -49,5 +49,7 @@ TwistedLogic operator<(const Compares&, const Compares&) { return {false}; }
 auto a = std::make_tuple(nullptr, Compares{}, 2, 'U');
 auto b = a < a;
 
-// { dg-error "ordered comparison" "" { target *-*-* } 0 }
+// { dg-error "no match for 'operator<'" "" { target c++20 } 0 }
+// { dg-error "no match for .*_Synth3way|in requirements" "" { target c++20 } 0 }
+// { dg-error "ordered comparison" "" { target c++17_down } 0 }
 // { dg-error "not a return-statement" "" { target c++11_only } 0 }


[committed] libstdc++: Fix move construction of std::tuple with array elements [PR101960]

2021-10-12 Thread Jonathan Wakely via Gcc-patches
The r12-3022 commit only fixed the case where an array is the last
element of the tuple. This fixes the other cases too. We can just define
the move constructor as defaulted, which does the right thing. Changing
the move constructor to be trivial would be an ABI break, but since the
last base class still has a non-trivial move constructor, defining the
derived ones as defaulted doesn't change anything.

libstdc++-v3/ChangeLog:

PR libstdc++/101960
* include/std/tuple (_Tuple_impl(_Tuple_impl&&)): Define as
defauled.
* testsuite/20_util/tuple/cons/101960.cc: Check tuples with
array elements before the last element.

Tested powerpc64le-linux. Committed to trunk.

commit 7481021364e75ba583972e15ed421a53988368ea
Author: Jonathan Wakely 
Date:   Tue Oct 12 15:09:50 2021

libstdc++: Fix move construction of std::tuple with array elements [PR101960]

The r12-3022 commit only fixed the case where an array is the last
element of the tuple. This fixes the other cases too. We can just define
the move constructor as defaulted, which does the right thing. Changing
the move constructor to be trivial would be an ABI break, but since the
last base class still has a non-trivial move constructor, defining the
derived ones as defaulted doesn't change anything.

libstdc++-v3/ChangeLog:

PR libstdc++/101960
* include/std/tuple (_Tuple_impl(_Tuple_impl&&)): Define as
defauled.
* testsuite/20_util/tuple/cons/101960.cc: Check tuples with
array elements before the last element.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 94a4f0afd31..aaee0b8826a 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -298,13 +298,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 2729. Missing SFINAE on std::pair::operator=
   _Tuple_impl& operator=(const _Tuple_impl&) = delete;
 
-  constexpr
-  _Tuple_impl(_Tuple_impl&& __in)
-  noexcept(__and_<is_nothrow_move_constructible<_Head>,
- is_nothrow_move_constructible<_Inherited>>::value)
-  : _Inherited(std::move(_M_tail(__in))),
-   _Base(std::forward<_Head>(_M_head(__in)))
-  { }
+  _Tuple_impl(_Tuple_impl&&) = default;
 
   template
constexpr
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc
index f14604cdc69..42d17b182ed 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc
@@ -1,4 +1,13 @@
 // { dg-do compile { target c++11 } }
 #include <tuple>
+
+// PR libstdc++/101960
+
 std::tuple t;
-auto tt = std::move(t); // PR libstdc++/101960
+auto tt = std::move(t);
+
+std::tuple t2;
+auto tt2 = std::move(t2);
+
+std::tuple t3;
+auto tt3 = std::move(t3);


[committed] libstdc++: Improve diagnostics for misuses of output iterators

2021-10-12 Thread Jonathan Wakely via Gcc-patches
This adds deleted overloads so that the errors for invalid uses of
std::advance and std::distance are easier to understand (see for example
PR 102181).

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator_base_funcs.h (__advance): Add
deleted overload to improve diagnostics.
(__distance): Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit d9dfd7ad3e0196f60a3fc6df6d65a40fb905409f
Author: Jonathan Wakely 
Date:   Wed Sep 29 21:19:36 2021

libstdc++: Improve diagnostics for misuses of output iterators

This adds deleted overloads so that the errors for invalid uses of
std::advance and std::distance are easier to understand (see for example
PR 102181).

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator_base_funcs.h (__advance): Add
deleted overload to improve diagnostics.
(__distance): Likewise.

diff --git a/libstdc++-v3/include/bits/stl_iterator_base_funcs.h 
b/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
index e5afab7f4fd..fc6e9880de3 100644
--- a/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
+++ b/libstdc++-v3/include/bits/stl_iterator_base_funcs.h
@@ -119,6 +119,13 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   input_iterator_tag);
 #endif
 
+#if __cplusplus >= 201103L
+  // Give better error if std::distance called with a non-Cpp17InputIterator.
+  template<typename _OutputIterator>
+void
+__distance(_OutputIterator, _OutputIterator, output_iterator_tag) = delete;
+#endif
+
   /**
*  @brief A generalization of pointer arithmetic.
*  @param  __first  An input iterator.
@@ -186,6 +193,13 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
__i += __n;
 }
 
+#if __cplusplus >= 201103L
+  // Give better error if std::advance called with a non-Cpp17InputIterator.
+  template<typename _OutputIterator, typename _Distance>
+void
+__advance(_OutputIterator&, _Distance, output_iterator_tag) = delete;
+#endif
+
   /**
*  @brief A generalization of pointer arithmetic.
*  @param  __i  An input iterator.


Re: [PATCH] rs6000/test: Adjust some cases due to O2 vect [PR102658]

2021-10-12 Thread Martin Sebor via Gcc-patches

On 10/11/21 8:31 PM, Hongtao Liu wrote:

On Tue, Oct 12, 2021 at 4:08 AM Martin Sebor via Gcc-patches
 wrote:


On 10/11/21 11:43 AM, Segher Boessenkool wrote:

On Mon, Oct 11, 2021 at 10:23:03AM -0600, Martin Sebor wrote:

On 10/11/21 9:30 AM, Segher Boessenkool wrote:

On Mon, Oct 11, 2021 at 10:47:00AM +0800, Kewen.Lin wrote:

- For generic test cases, it follows the existing suggested
practice with necessary target/xfail selector.


Not such a great choice.  Many of those tests do not make sense with
vectorisation enabled.  This should have been thought about, in some
cases resulting in not running the test with vectorisation enabled, and
in some cases duplicating the test, once with and once without
vectorisation.


The tests detect bugs that are present both with and without
vectorization, so they should pass both ways.


Then it should be tested both ways!  This is my point.


Agreed.  (Most warnings are tested with just one set of options,
but it's becoming apparent that the middle end ones should be
exercised more extensively.)




That they don't
tells us that that the warnings need work (they were written with
an assumption that doesn't hold anymore).


They were written in world A.  In world B many things behave
differently.  Transplanting the testcases from A to B without any extra
analysis will not test what the testcases wanted to test, and possibly
nothing at all anymore.


Absolutely.




We need to track that
work somehow, but simply xfailing them without making a record
of what underlying problem the xfails correspond to isn't the best
way.  In my experience, what works well is opening a bug for each
distinct limitation (if one doesn't already exist) and adding
a reference to it as a comment to the xfail.


Probably, yes.


But you are just following established practice, so :-)


I also am okay with this.  If it was decided x86 does not have to deal
with these (generic!) problems, then why should we do other people's
work?


I don't know that anything was decided.  I think those changes
were made in haste, and (as you noted in your review of these
updates to them), were incomplete (missing comments referencing
the underlying bugs or limitations).  Now that we've noticed it
we should try to fix it.  I'm not expecting you (or Kwen) to do
other people's work, but it would help to let them/us know that
there is work for us to do.  I only noticed the problem by luck.


-  struct A1 a = { 0, { 1 } };   // { dg-warning "\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* } } }
+  struct A1 a = { 0, { 1 } };   // { dg-warning "\\\[-Wstringop-overflow" "" { target { i?86-*-* x86_64-*-* powerpc*-*-* } } }


As I mentioned in the bug, when adding xfails for regressions
please be sure to reference the bug that tracks the underlying
root cause.


You are saying this to whoever added that x86 xfail I hope.


In general it's an appeal to both authors and reviewers of such
changes.  Here, it's mostly for Hongtao who apparently added all
these undocumented xfails.


There may be multiple problems, and we need to
identify what it is in each instance.  As the author of
the tests I can help with that but not if I'm not in the loop
on these changes (it would seem prudent to get the author's
thoughts on such sweeping changes to their work).


Yup.


I discussed one of these failures with Hongtao in detail at
the time autovectorization was being enabled and made the same
request then but I didn't realize the problem was so pervasive.

In addition, the target-specific conditionals in the xfails are
going to be difficult to maintain.


It is a cop-out.  Especially because it makes no comment why it is
xfailed (which should *always* be explained!)


It might be okay for one or
two in a single test but for so many we need a better solution
than that.  If autovectorization is only enabled for a subset
of targets then a solution might be to add a new DejaGnu test
for it and conditionalize the xfails on it.


That, combined with duplicating these tests and still testing the
-fno-vectorization situation properly.  Those tests tested something.
With vectorisation enabled they might no longer test that same thing,
especially if the test fails now!


Right.  The original autovectorization change was made either
without a full analysis of its impact on the affected warnings,
or its impact wasn't adequately captured (either in the xfails
comments or by opening bugs for them).  Now that we know about
this we should try to fix it.  The first step toward that is
to review the xfailed test cases and for each add a comment with
the bug that captures its root cause.

Hongtao, please let me know if you are going to work on that.

I will make a copy of the tests to test the -fno-tree-vectorize
scenario (the original test).
For the xfails, they were analyzed and recorded in PR102462/PR102697;
sorry for not adding comments to them.


Thanks for raising pr102697!  It captures the essence of the bug
that's masked by the 

Re: [PATCH 2/8] tree-dynamic-object-size: New pass

2021-10-12 Thread Siddhesh Poyarekar

On 10/12/21 19:28, Jakub Jelinek wrote:

On Fri, Oct 08, 2021 at 03:44:26AM +0530, Siddhesh Poyarekar wrote:

A new pass is added to execute just before the tree-object-size pass
to recognize and simplify __builtin_dynamic_object_size.  Some key
ideas (such as multipass object size collection to detect reference
loops) have been taken from tree-object-size, but the new pass is
distinct from it to ensure minimal impact on existing code.

At the moment, the pass only recognizes allocators and passthrough
functions to attempt to derive object size expressions, and replaces
the call site with those expressions.  On failure, it replaces the
__builtin_dynamic_object_size with __builtin_object_size as a
fallback.


Not full review, just nits for now:
I don't really like using separate passes for this, whether
it should be done in the same source or different source file
is something less clear, I guess it depends on how much from
tree-object-size.[ch] can be reused, how much could be e.g. done
by pretending the 0-3 operands of __builtin_dynamic_object_size
are actually 4-7 and have [8] arrays in tree-object-size.[ch]
to track that and how much is separate.
I think the common case is either that a function won't
contain any of the builtin calls (most likely on non-glibc targets,
but even on glibc targets for most of the code that doesn't use any
of the fortified APIs), or it uses just one of them and not both
(e.g. glibc -D_FORTIFY_SOURCE={1,2} vs. -D_FORTIFY_SOURCE=3),
so either one or the other pass would just uselessly walk the whole
IL.


Thanks, that makes sense, I'll make it into a single pass.  My main 
motivation was to keep the code separate to have minimal impact but I 
can do that in the same pass too.  The secondary motivation (or more 
like ambition), i.e. deprecating the __bos pass can be done even if it 
were all one pass, maybe even simpler.



The objsz passes are walk over the whole IL, if they see
__bos calls, do something and call something in the end
and your pass is similar, so hooking it into the current pass
is trivial, just
if (!gimple_call_builtin_p (call, BUILT_IN_OBJECT_SIZE))
before doing continue; check for the other builtin and call
a function to handle it, either from the same or different file,
and then at the end destruct what is needed too.
Especially on huge functions, each IL traversal may need to page into
caches (and out of them) lots of statements...


Agreed, I'll work the entry points into tree-object-size.




* Makefile.in (OBJS): Add tree-dynamic-object-size.o.
(PLUGIN_HEADERS): Add tree-dynamic-object-size.h.
* tree-dynamic-object-size.c: New file.
* tree-dynamic-object-size.h: New file.
* builtins.c: Use it.
(fold_builtin_dyn_object_size): Call
compute_builtin_dyn_object_size for
__builtin_dynamic_object_size builtin.
(passes.def): Add pass_dynamic_object_sizes.
* tree-pass.h: Add ake_pass_dynamic_object_sizes.


Missing m in ake


+  if (TREE_CODE (ptr) == SSA_NAME
+  && compute_builtin_dyn_object_size (ptr, object_size_type, ))


Please don't abbreviate.


+  object_sizes[osi->object_size_type][SSA_NAME_VERSION (dest)] =
+object_sizes[osi->object_size_type][SSA_NAME_VERSION (orig)];


GCC coding convention don't want to see the = at the end of line, it should
be at the start of the next line after indentation.


+/* Initialize data structures for the object size computation.  */
+
+void
+init_dynamic_object_sizes (void)
+{
+  int object_size_type;
+
+  if (computed[0])
+return;
+
+  for (object_size_type = 0; object_size_type <= 3; object_size_type++)
+{
+  object_sizes[object_size_type].safe_grow (num_ssa_names, true);
+  computed[object_size_type] = BITMAP_ALLOC (NULL);
+}
+}
+
+
+unsigned int
+dynamic_object_sizes_execute (function *fun, bool lower_to_bos)
+{
+  basic_block bb;
+
+  init_dynamic_object_sizes ();
+


I'd prefer if the initialization could be done only lazily if
it sees at least one such call like the objsz pass does.  That is why
there is the if (computed[0]) return; at the start...


OK, will fix all these nits too.

Thanks,
Siddhesh


Re: [PATCH v2] libiberty: d-demangle: remove parenthesis where it is not needed

2021-10-12 Thread Luís Ferreira
On Tue, 2021-10-12 at 08:42 -0600, Jeff Law wrote:
>  
>  
> On 10/12/2021 8:06 AM, Luís Ferreira wrote:
>  
> > Those parentheses don't increase readability at all, and this
> > patch makes the source code a bit more consistent with the rest
> > of the dereferencing assignments.
> > 
> > ChangeLog:
> > libiberty/
> > * d-demangle.c (dlang_parse_qualified): Remove redundant
> > parentheses around lhs and rhs of assignments.
>  This patch adds libiberty/Makefile and libiberty/testsuite/Makefile,
> which is wrong.  Please check that the patches you send only change
> the
> files you're intending to change.  While it may seem like it's
> trivial
> for someone else to strip out the extraneous junk, it's a waste of
> other people's time and it's easy to get things wrong when culling 
> away undesirable changes.
>  
>  I removed the extraneous junk and line-wrapped the ChangeLog this
> time
> and pushed this patch to the trunk.
>  
>  Thanks,
>  Jeff
>  

Sorry for the unintended junk :( . This got added by accident when
amending. I will double check next time.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





[PATCH] [og11] nvptx: Revert "[nvptx] Expand OpenACC child function arguments to use CUDA params space"

2021-10-12 Thread Julian Brown
This reverts commit 31e53aef12f574a8534f8aea219b5466edb75b32.

Re-measuring the effect of this patch across a set of benchmarks
shows that it appears to have an overall slightly negative effect on
performance.  Thus, we are reverting it.
---
 gcc/ChangeLog.omp |  11 ++
 gcc/config/nvptx/nvptx.c  | 229 --
 libgomp/ChangeLog.omp |  13 ++
 libgomp/plugin/plugin-nvptx.c | 163 +---
 4 files changed, 173 insertions(+), 243 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 88754626121..b97c1acaf49 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,14 @@
+2021-10-12  Julian Brown  
+
+   Revert:
+
+   2019-09-10  Chung-Lin Tang  
+
+   * config/nvptx/nvptx.c (nvptx_expand_to_rtl_hook): New function
+   implementing CUDA .params space transformation.
+   (TARGET_EXPAND_TO_RTL_HOOK): implement hook with
+   nvptx_expand_to_rtl_hook.
+
 2021-10-12  Tobias Burnus  
 
Backported from master:
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 143be974fe8..e23c3902306 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -68,10 +68,6 @@
 #include "attribs.h"
 #include "tree-vrp.h"
 #include "tree-ssa-operands.h"
-#include "tree-pretty-print.h"
-#include "gimple-pretty-print.h"
-#include "tree-cfg.h"
-#include "gimple-ssa.h"
 #include "tree-ssanames.h"
 #include "gimplify.h"
 #include "tree-phinodes.h"
@@ -6339,228 +6335,6 @@ nvptx_libc_has_function (enum function_class fn_class, tree type)
   return default_libc_has_function (fn_class, type);
 }
 
-static void
-nvptx_expand_to_rtl_hook (void)
-{
-  /* For utilizing CUDA .param kernel arguments, we detect and modify
- the gimple of offloaded child functions, here before RTL expansion,
- starting with standard OMP form:
-  foo._omp_fn.0 (const struct .omp_data_t.8 & restrict .omp_data_i) { ... }
-   
- and transform it into a style where the OMP data record fields are
- "exploded" into individual scalar arguments:
-  foo._omp_fn.0 (int * a, int * b, int * c) { ... }
-
- Note that there are implicit assumptions of how OMP lowering (and/or other
- intervening passes) behaves contained in this transformation code;
- if those passes change in their output, this code may possibly need
- updating.  */
-
-  if (lookup_attribute ("omp target entrypoint",
-   DECL_ATTRIBUTES (current_function_decl))
-  /* The rather indirect manner in which OpenMP target functions are
-launched makes this transformation only valid for OpenACC currently.
-TODO: e.g. write_omp_entry(), nvptx_declare_function_name(), etc.
-needs changes for this to work with OpenMP.  */
-  && lookup_attribute ("oacc function",
-  DECL_ATTRIBUTES (current_function_decl))
-  && VOID_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl
-{
-  tree omp_data_arg = DECL_ARGUMENTS (current_function_decl);
-  tree argtype = TREE_TYPE (omp_data_arg);
-
-  /* Ensure this function is of the form of a single reference argument
-to the OMP data record, or a single void* argument (when no values
-passed)  */
-  if (! (DECL_CHAIN (omp_data_arg) == NULL_TREE
-&& ((TREE_CODE (argtype) == REFERENCE_TYPE
- && TREE_CODE (TREE_TYPE (argtype)) == RECORD_TYPE)
-|| (TREE_CODE (argtype) == POINTER_TYPE
-&& TREE_TYPE (argtype) == void_type_node))))
-   return;
-
-  if (dump_file)
-   {
- fprintf (dump_file, "Detected offloaded child function %s, "
-  "starting parameter conversion\n",
-  print_generic_expr_to_str (current_function_decl));
- fprintf (dump_file, "OMP data record argument: %s (tree type: %s)\n",
-  print_generic_expr_to_str (omp_data_arg),
-  print_generic_expr_to_str (argtype));
- fprintf (dump_file, "Data record fields:\n");
-   }
-  
-  hash_map<tree, tree> fld_to_args;
-  tree fld, rectype = TREE_TYPE (argtype);
-  tree arglist = NULL_TREE, argtypelist = NULL_TREE;
-
-  if (TREE_CODE (rectype) == RECORD_TYPE)
-   {
- /* For each field in the OMP data record type, create a corresponding
-PARM_DECL, and map field -> parm using the fld_to_args hash_map.
-Also create the tree chains for creating function type and
-DECL_ARGUMENTS below.  */
- for (fld = TYPE_FIELDS (rectype); fld; fld = DECL_CHAIN (fld))
-   {
- tree narg = build_decl (DECL_SOURCE_LOCATION (fld), PARM_DECL,
- DECL_NAME (fld), TREE_TYPE (fld));
- DECL_ARTIFICIAL (narg) = 1;
- DECL_ARG_TYPE (narg) = TREE_TYPE (fld);
- DECL_CONTEXT (narg) = current_function_decl;
- TREE_USED (narg) = 1;
- 

[PATCH v2] libiberty: d-demangle: use appendc for single chars append

2021-10-12 Thread Luís Ferreira
A smart modern compiler's inliner may optimize this away, but since strlen
can come from an external source, the current code can produce slightly less
optimized output.
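
As a rough illustration of the point, independent of the demangler itself (buf
and its helpers below are hypothetical stand-ins, not libiberty's API): the
append-a-string path must take strlen of its argument, which for a
one-character literal recomputes a known length at run time whenever strlen is
an out-of-line libc call, while the append-a-char path needs no length at all.

```c
#include <assert.h>
#include <string.h>

/* Illustrative growable-buffer model; not libiberty's string type.  */
struct buf
{
  char data[64];
  char *p;
};

static void
buf_append (struct buf *b, const char *s)
{
  size_t n = strlen (s);	/* may be an external libc call */
  memcpy (b->p, s, n);
  b->p += n;
}

static void
buf_appendc (struct buf *b, char c)
{
  *b->p = c;			/* no length computation needed */
  b->p++;
}
```

Both produce the same bytes; only the work done per call differs.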

ChangeLog:
libiberty/
* d-demangle.c (string_appendc): Add function to append single chars.
* d-demangle.c: Rewrite usage of string_append and string_appendn for
single chars to use string_appendc function.

Signed-off-by: Luís Ferreira 
---
 libiberty/d-demangle.c | 60 --
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 3adf7b562d1..9b12c8158bb 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -144,6 +144,14 @@ string_appendn (string *p, const char *s, size_t n)
 }
 }
 
+static void
+string_appendc (string *p, char c)
+{
+  string_need (p, 1);
+  *p->p = c;
+  p->p++;
+}
+
 static void
 string_prependn (string *p, const char *s, size_t n)
 {
@@ -664,7 +672,7 @@ dlang_function_type (string *decl, const char *mangled, struct dlang_info *info)
   /* Append to decl in order. */
   string_appendn (decl, type.b, string_length (&type));
   string_appendn (decl, args.b, string_length (&args));
-  string_append (decl, " ");
+  string_appendc (decl, ' ');
   string_appendn (decl, attr.b, string_length (&attr));
 
   string_delete (&attr);
@@ -816,9 +824,9 @@ dlang_type (string *decl, const char *mangled, struct dlang_info *info)
  mangled++;
}
   mangled = dlang_type (decl, mangled, info);
-  string_append (decl, "[");
+  string_appendc (decl, '[');
   string_appendn (decl, numptr, num);
-  string_append (decl, "]");
+  string_appendc (decl, ']');
   return mangled;
 }
 case 'H': /* associative array (T[T]) */
@@ -832,9 +840,9 @@ dlang_type (string *decl, const char *mangled, struct dlang_info *info)
   sztype = string_length (&type);
 
   mangled = dlang_type (decl, mangled, info);
-  string_append (decl, "[");
+  string_appendc (decl, '[');
   string_appendn (decl, type.b, sztype);
-  string_append (decl, "]");
+  string_appendc (decl, ']');
 
   string_delete (&type);
   return mangled;
@@ -844,7 +852,7 @@ dlang_type (string *decl, const char *mangled, struct dlang_info *info)
   if (!dlang_call_convention_p (mangled))
{
  mangled = dlang_type (decl, mangled, info);
- string_append (decl, "*");
+ string_appendc (decl, '*');
  return mangled;
}
   /* Fall through */
@@ -1181,7 +1189,7 @@ dlang_parse_integer (string *decl, const char *mangled, char type)
{
  /* Represent as a character literal.  */
  char c = (char) val;
- string_appendn (decl, &c, 1);
+ string_appendc (decl, c);
}
   else
{
@@ -1297,7 +1305,7 @@ dlang_parse_real (string *decl, const char *mangled)
   /* Hexadecimal prefix and leading bit.  */
   if (*mangled == 'N')
 {
-  string_append (decl, "-");
+  string_appendc (decl, '-');
   mangled++;
 }
 
@@ -1305,14 +1313,14 @@ dlang_parse_real (string *decl, const char *mangled)
 return NULL;
 
   string_append (decl, "0x");
-  string_appendn (decl, mangled, 1);
-  string_append (decl, ".");
+  string_appendc (decl, *mangled);
+  string_appendc (decl, '.');
   mangled++;
 
   /* Significand.  */
   while (ISXDIGIT (*mangled))
 {
-  string_appendn (decl, mangled, 1);
+  string_appendc (decl, *mangled);
   mangled++;
 }
 
@@ -1325,7 +1333,7 @@ dlang_parse_real (string *decl, const char *mangled)
 
   if (*mangled == 'N')
 {
-  string_append (decl, "-");
+  string_appendc (decl, '-');
   mangled++;
 }
 
@@ -1352,7 +1360,7 @@ dlang_parse_string (string *decl, const char *mangled)
 return NULL;
 
   mangled++;
-  string_append (decl, "\"");
+  string_appendc (decl, '\"');
   while (len--)
 {
   char val;
@@ -1365,7 +1373,7 @@ dlang_parse_string (string *decl, const char *mangled)
   switch (val)
{
case ' ':
- string_append (decl, " ");
+ string_appendc (decl, ' ');
  break;
case '\t':
  string_append (decl, "\\t");
@@ -1415,7 +1423,7 @@ dlang_parse_arrayliteral (string *decl, const char *mangled,
   if (mangled == NULL)
 return NULL;
 
-  string_append (decl, "[");
+  string_appendc (decl, '[');
   while (elements--)
 {
   mangled = dlang_value (decl, mangled, NULL, '\0', info);
@@ -1426,7 +1434,7 @@ dlang_parse_arrayliteral (string *decl, const char *mangled,
string_append (decl, ", ");
 }
 
-  string_append (decl, "]");
+  string_appendc (decl, ']');
   return mangled;
 }
 
@@ -1442,14 +1450,14 @@ dlang_parse_assocarray (string *decl, const char *mangled,
   if (mangled == NULL)
 return NULL;
 
-  string_append (decl, "[");
+  string_appendc (decl, '[');
   while (elements--)
 {
   mangled = dlang_value (decl, mangled, NULL, '\0', info);

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-12 Thread Martin Liška

On 10/11/21 15:45, Richard Biener wrote:

Btw, I'd be more comfortable when the move of the code would be
independent of the adjustment to not rely on AUTODETECT_VALUE.
Can we do the latter change first (IIRC the former one failed already)?


All right, so I'm doing the first step by eliminating AUTODETECT_VALUE.
Note we can't easily use EnabledBy; the option logic is more complicated (like 
optimize >= 1).
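
As a minimal sketch of the two schemes (flag_sentinel, flag_tracked and friends
are illustrative stand-ins, not GCC's actual option machinery): the Init(2)
sentinel reads "still 2" as "not set by the user", which silently clobbers an
explicit value that happens to equal the sentinel, while a separate
was-it-set bit, as OPTION_SET_P provides, has no such collision.

```c
#include <assert.h>
#include <stdbool.h>

#define AUTODETECT_VALUE 2

/* Scheme 1: a magic Init(2) sentinel marks "not set on the command line".  */
static int flag_sentinel = AUTODETECT_VALUE;

/* Scheme 2: the flag keeps a real default, and a separate bit records
   whether it was set explicitly (the role of OPTION_SET_P).  */
static int flag_tracked = 1;
static bool flag_tracked_set = false;

static int
autodetect (int optimize)
{
  return optimize >= 1;
}

static void
finish_sentinel (int optimize)
{
  if (flag_sentinel == AUTODETECT_VALUE)
    flag_sentinel = autodetect (optimize);
}

static void
finish_tracked (int optimize)
{
  if (!flag_tracked_set)
    flag_tracked = autodetect (optimize);
}
```

The sentinel scheme cannot distinguish "user passed the value 2" from
"auto-detect", which is one reason to track set-ness explicitly.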

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
From 55af725e87379695faa4b11321e5b416f2981c74 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 12 Oct 2021 14:31:50 +0200
Subject: [PATCH] Eliminate AUTODETECT_VALUE usage in options.

gcc/ChangeLog:

	* common.opt: Stop using AUTODETECT_VALUE as Init value.
	* toplev.c (AUTODETECT_VALUE): Remove it.
	(process_options): Do not compare option values to
	AUTODETECT_VALUE, but rather use the OPTION_SET_P macro.
	For flag_var_tracking, do not reset it if it is already set
	based on flag_var_tracking_uninit.
---
 gcc/common.opt | 24 ++--
 gcc/toplev.c   | 32 +---
 2 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4099effcc80..2b401abdc77 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3003,19 +3003,16 @@ Common Undocumented Var(flag_use_linker_plugin)
 
 ; Positive if we should track variables, negative if we should run
 ; the var-tracking pass only to discard debug annotations, zero if
-; we're not to run it.  When flag_var_tracking == 2 (AUTODETECT_VALUE) it
-; will be set according to optimize, debug_info_level and debug_hooks
-; in process_options ().
+; we're not to run it.
 fvar-tracking
-Common Var(flag_var_tracking) Init(2) PerFunction
+Common Var(flag_var_tracking) Init(1) PerFunction
 Perform variable tracking.
 
 ; Positive if we should track variables at assignments, negative if
 ; we should run the var-tracking pass only to discard debug
-; annotations.  When flag_var_tracking_assignments ==
-; AUTODETECT_VALUE it will be set according to flag_var_tracking.
+; annotations.
 fvar-tracking-assignments
-Common Var(flag_var_tracking_assignments) Init(2) PerFunction
+Common Var(flag_var_tracking_assignments) PerFunction
 Perform variable tracking by annotating assignments.
 
 ; Nonzero if we should toggle flag_var_tracking_assignments after
@@ -3026,8 +3023,7 @@ Toggle -fvar-tracking-assignments.
 
 ; Positive if we should track uninitialized variables, negative if
 ; we should run the var-tracking pass only to discard debug
-; annotations.  When flag_var_tracking_uninit == AUTODETECT_VALUE it
-; will be set according to flag_var_tracking.
+; annotations.
 fvar-tracking-uninit
 Common Var(flag_var_tracking_uninit) PerFunction
 Perform variable tracking and also tag variables that are uninitialized.
@@ -3190,11 +3186,11 @@ Common Driver RejectNegative JoinedOrMissing
 Generate debug information in default format.
 
 gas-loc-support
-Common Driver Var(dwarf2out_as_loc_support) Init(2)
+Common Driver Var(dwarf2out_as_loc_support)
 Assume assembler support for (DWARF2+) .loc directives.
 
 gas-locview-support
-Common Driver Var(dwarf2out_as_locview_support) Init(2)
+Common Driver Var(dwarf2out_as_locview_support)
 Assume assembler support for view in (DWARF2+) .loc directives.
 
 gcoff
@@ -3248,7 +3244,7 @@ Common Driver JoinedOrMissing
 Generate debug information in default extended format.
 
 ginline-points
-Common Driver Var(debug_inline_points) Init(2)
+Common Driver Var(debug_inline_points)
 Generate extended entry point information for inlined functions.
 
 ginternal-reset-location-views
@@ -3288,7 +3284,7 @@ Common Driver JoinedOrMissing Negative(gvms)
 Generate debug information in extended STABS format.
 
 gstatement-frontiers
-Common Driver Var(debug_nonbind_markers_p) Init(2)
+Common Driver Var(debug_nonbind_markers_p)
 Emit progressive recommended breakpoint locations.
 
 gstrict-dwarf
@@ -3304,7 +3300,7 @@ Common Driver Var(flag_gtoggle)
 Toggle debug information generation.
 
 gvariable-location-views
-Common Driver Var(debug_variable_location_views, 1) Init(2)
+Common Driver Var(debug_variable_location_views, 1)
 Augment variable location lists with progressive views.
 
 gvariable-location-views=incompat5
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 167feac2583..5e0f548f1ea 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -119,10 +119,6 @@ unsigned int save_decoded_options_count;
 /* Vector of saved Optimization decoded command line options.  */
 vec *save_opt_decoded_options;
 
-/* Used to enable -fvar-tracking, -fweb and -frename-registers according
-   to optimize in process_options ().  */
-#define AUTODETECT_VALUE 2
-
 /* Debug hooks - dependent upon command line options.  */
 
 const struct gcc_debug_hooks *debug_hooks;
@@ -1490,8 +1486,9 @@ process_options (bool no_backend)
   || !dwarf_debuginfo_p ()
   || debug_hooks->var_location == 

Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-12 Thread Martin Liška

On 10/12/21 15:37, Richard Biener wrote:

by adding EnabledBy(funroll-loops) to the respective options instead
(and funroll-loops EnabledBy(funroll-all-loops))


All right, so the suggested approach works correctly.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From c42efec30d7cce36c92d9369791826c9120dd3d1 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 12 Oct 2021 16:05:49 +0200
Subject: [PATCH] Fix handling of flag_rename_registers by a target.

gcc/ChangeLog:

	* common.opt: Use EnabledBy instead of detection in
	finish_options and process_options.
	* opts.c (finish_options): Remove handling of
	x_flag_unroll_all_loops.
	* toplev.c (process_options): Likewise for flag_web and
	flag_rename_registers.

diff --git a/gcc/common.opt b/gcc/common.opt
index 4099effcc80..bcbf95aada3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2399,7 +2399,7 @@ Common Var(flag_live_range_shrinkage) Init(0) Optimization
 Relief of register pressure through live range shrinkage.
 
 frename-registers
-Common Var(flag_rename_registers) Optimization
+Common Var(flag_rename_registers) Optimization EnabledBy(funroll-loops)
 Perform a register renaming optimization pass.
 
 fschedule-fusion
@@ -2939,7 +2939,7 @@ Common Var(flag_unroll_loops) Optimization
 Perform loop unrolling when iteration count is known.
 
 funroll-all-loops
-Common Var(flag_unroll_all_loops) Optimization
+Common Var(flag_unroll_all_loops) Optimization EnabledBy(funroll-all-loops)
 Perform loop unrolling for all loops.
 
 funroll-completely-grow-size
@@ -3158,7 +3158,7 @@ Common Var(flag_value_profile_transformations) Optimization
 Use expression value profiles in optimizations.
 
 fweb
-Common Var(flag_web) Optimization
+Common Var(flag_web) Optimization EnabledBy(funroll-loops)
 Construct webs and split unrelated uses of single variable.
 
 ftree-builtin-call-dce
diff --git a/gcc/opts.c b/gcc/opts.c
index 2116c2991dd..fc71b6e4242 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1321,11 +1321,6 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
    opts->x_flag_live_patching,
    loc);
 
-  /* Unrolling all loops implies that standard loop unrolling must also
- be done.  */
-  if (opts->x_flag_unroll_all_loops)
-opts->x_flag_unroll_loops = 1;
-
   /* Allow cunroll to grow size accordingly.  */
   if (!opts_set->x_flag_cunroll_grow_size)
 opts->x_flag_cunroll_grow_size
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 167feac2583..81546b19e91 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1331,13 +1331,6 @@ process_options (bool no_backend)
   flag_abi_version = 2;
 }
 
-  /* web and rename-registers help when run after loop unrolling.  */
-  if (!OPTION_SET_P (flag_web))
-flag_web = flag_unroll_loops;
-
-  if (!OPTION_SET_P (flag_rename_registers))
-flag_rename_registers = flag_unroll_loops;
-
   if (flag_non_call_exceptions)
 flag_asynchronous_unwind_tables = 1;
   if (flag_asynchronous_unwind_tables)
-- 
2.33.0



Re: [PATCH] doc: Fix typos in alloc_size documentation

2021-10-12 Thread Jeff Law via Gcc-patches




On 10/9/2021 3:59 AM, Daniel Le Duc Khoi Nguyen via Gcc-patches wrote:

2021-10-09  Daniel Le  

 * doc/extend.texi (Common Variable Attributes): Fix typos in
 alloc_size documentation.

Thanks.  I've installed this on the trunk.
jeff



Re: [PATCH v2] libiberty: d-demangle: remove parenthesis where it is not needed

2021-10-12 Thread Jeff Law via Gcc-patches




On 10/12/2021 8:06 AM, Luís Ferreira wrote:

Those parentheses don't increase readability at all, and this patch makes the
source code a bit more consistent with the rest of the dereferencing
assignments.

ChangeLog:
libiberty/
* d-demangle.c (dlang_parse_qualified): Remove redundant parentheses 
around lhs and rhs of assignments.
This patch adds libiberty/Makefile and libiberty/testsuite/Makefile, 
which is wrong.  Please check that the patches you send only change the 
files you're intending to change.  While it may seem like it's trivial 
for someone else to strip out the extraneous junk, it's a waste of other 
people's time and it's easy to get things wrong when culling  away 
undesirable changes.


I removed the extraneous junk and line-wrapped the ChangeLog this time 
and pushed this patch to the trunk.


Thanks,
Jeff


Re: [PATCH 1/8] __builtin_dynamic_object_size: Recognize builtin name

2021-10-12 Thread Siddhesh Poyarekar

On 10/12/21 19:12, Jakub Jelinek wrote:

On Fri, Oct 08, 2021 at 03:44:25AM +0530, Siddhesh Poyarekar wrote:

--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -180,6 +180,7 @@ static rtx expand_builtin_memory_chk (tree, rtx, machine_mode,
  static void maybe_emit_chk_warning (tree, enum built_in_function);
  static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function);
  static tree fold_builtin_object_size (tree, tree);
+static tree fold_builtin_dyn_object_size (tree, tree);
  
  unsigned HOST_WIDE_INT target_newline;

  unsigned HOST_WIDE_INT target_percent;
@@ -7910,6 +7911,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
return const0_rtx;
  
  case BUILT_IN_OBJECT_SIZE:

+case BUILT_IN_DYN_OBJECT_SIZE:
return expand_builtin_object_size (exp);


I'd strongly prefer BUILT_IN_DYNAMIC_OBJECT_SIZE, we have even longer
builtin enums and the abbreviation will only lead to confusion.


+/* Fold a call to __builtin_dynamic_object_size with arguments PTR and OST,
+   if possible.  */
+
+static tree
+fold_builtin_dyn_object_size (tree ptr, tree ost)


Also please don't abbreviate.


Got it, will fix.


+{
+  int object_size_type;
+
+  if (!valid_object_size_args (ptr, ost, &object_size_type))
+return NULL_TREE;
+
+  /* __builtin_dynamic_object_size doesn't evaluate side-effects in its
+ arguments; if there are any side-effects, it returns (size_t) -1 for types
+ 0 and 1 and (size_t) 0 for types 2 and 3.  */
+  if (TREE_SIDE_EFFECTS (ptr))
+return build_int_cst_type (size_type_node, object_size_type < 2 ? -1 : 0);


If we want to commit this patch separately, then the more natural stub
implementation would be fold it into a __builtin_object_size call
(or call fold_builtin_object_size and only if it returns NULL_TREE fold
it into the builtin call).  But I assume we do not want to do that and
want to commit the whole series at once, therefore even this is good enough.
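
For reference, the type 0-3 semantics and the "unknown" fallback values quoted
above can be exercised with plain __builtin_object_size (a GCC/Clang-only
sketch; the exact answer depends on how much the compiler can see at the call
site): types 0/1 give a maximum remaining size, falling back to (size_t) -1,
and types 2/3 give a minimum, falling back to 0.

```c
#include <assert.h>
#include <stddef.h>

static char buf[16];

/* Maximum bytes remaining from &buf[4]: 12 when the compiler can see it.  */
static size_t
max_remaining (void)
{
  return __builtin_object_size (&buf[4], 0);
}

/* Minimum bytes remaining: also 12 here when visible, 0 when unknown.  */
static size_t
min_remaining (void)
{
  return __builtin_object_size (&buf[4], 2);
}
```

The assertions below only pin the documented fallback values, since whether
the size folds can depend on optimization level.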


Ideally, it would be great to have the whole series go in at once but if 
we're not able to build consensus soon enough, I'll post this one patch 
for inclusion.


Thanks,
Siddhesh


Re: [PATCH] libiberty: d-demangle: remove parenthesis where it is not needed

2021-10-12 Thread Luís Ferreira
On Mon, 2021-10-04 at 09:33 +0200, ibuc...@gdcproject.org wrote:
> > On 29/09/2021 18:26 Luís Ferreira  wrote:
> > 
> >  
> > Those parentheses don't increase readability at all and this
> > patch makes the
> > source code a bit more consistent with the rest of the
> > dereferencing
> > assignments.
> > 
> 
> OK, but can you write up a changelog entry for it?
> 
> Thanks,
> Iain.

Added on a PATCH v2.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





Re: [PATCH] libiberty: d-demangle: remove parenthesis where it is not needed

2021-10-12 Thread Luís Ferreira
On Mon, 2021-10-04 at 10:40 +0200, Andreas Schwab wrote:
> On Sep 29 2021, Luís Ferreira wrote:
> 
> > diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
> > index 3adf7b562d1..a05e72d8efe 100644
> > --- a/libiberty/d-demangle.c
> > +++ b/libiberty/d-demangle.c
> > @@ -253,15 +253,15 @@ dlang_hexdigit (const char *mangled, char
> > *ret)
> >  
> >    c = mangled[0];
> >    if (!ISDIGIT (c))
> > -    (*ret) = (c - (ISUPPER (c) ? 'A' : 'a') + 10);
> > +    *ret = (c - (ISUPPER (c) ? 'A' : 'a') + 10);
> 
> The outer pair of parens around rhs is also redundant.
> 
> Andreas.
> 

Resent the patch with the requested change.

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-12 Thread Martin Liška

On 10/12/21 15:37, Richard Biener wrote:

On Tue, Oct 12, 2021 at 2:18 PM Martin Liška  wrote:


Hello.

The option is disabled in rs6000 target with:

  { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

Thus, we have to do an auto-detection only if it's really unset and also
equal to the Init value.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
And the problematic test-case works on ppc64le.

Ready to be installed?


Hmm, I can see how it fixes the reported problem but I think the
thing is fragile.


You are fully right, it's quite fragile.


I wonder if we can express things like

+  if (!OPTION_SET_P (flag_web))
  flag_web = flag_unroll_loops;

or

+  if (!OPTION_SET_P (flag_rename_registers))
  flag_rename_registers = flag_unroll_loops;

by adding EnabledBy(funroll-loops) to the respective options instead
(and funroll-loops EnabledBy(funroll-all-loops))
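
A minimal conceptual model of what EnabledBy buys here, with illustrative
field names rather than GCC's generated option code: an option that was not
given explicitly inherits the state of the option that enables it, while an
explicit setting always wins.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of "fweb ... EnabledBy(funroll-loops)".  The *_set fields
   play the role of opts_set; none of this is GCC's real machinery.  */
struct opts
{
  bool unroll_loops, web;
  bool unroll_loops_set, web_set;
};

static void
apply_enabled_by (struct opts *o)
{
  /* The inherited value does not mark -fweb as explicitly set, so a
     later explicit -fweb/-fno-web still takes precedence.  */
  if (!o->web_set && o->unroll_loops_set)
    o->web = o->unroll_loops;
}
```

This is also why the approach is less fragile than SET_OPTION_IF_UNSET calls
scattered through process_options: the dependency lives with the option
definition itself.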


Testing that approach, I like it.

Note that my fix:

if (!OPTION_SET_P (flag_rename_registers) && flag_rename_registers)

won't work if one target sets flag_rename_registers = 1 and another sets
flag_rename_registers = 0.
Then one can't use Init to set a proper default value.



All SET_OPTION_IF_UNSET are fragile with respect to target overrides
(-fprofile-use does a lot of those for example).

I suppose opts_set could also record whether the target overrode
something with its option_optimization_table.


I can experiment with a patch where SET_OPTION_IF_UNSET modified opts_set.

Thanks for clever feedback.
Martin



Richard.


Thanks,
Martin

 PR target/102688

gcc/ChangeLog:

 * common.opt: Enable flag_rename_registers by default.
 * toplev.c (process_options): Auto-detect flag_rename_registers
 only if it is not turned off in a target.
---
   gcc/common.opt | 2 +-
   gcc/toplev.c   | 3 ++-
   2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4099effcc80..2c6be1bdd36 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2399,7 +2399,7 @@ Common Var(flag_live_range_shrinkage) Init(0) Optimization
   Relief of register pressure through live range shrinkage.

   frename-registers
-Common Var(flag_rename_registers) Optimization
+Common Var(flag_rename_registers) Init(1) Optimization
   Perform a register renaming optimization pass.

   fschedule-fusion
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 167feac2583..ee7d8854f90 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1335,7 +1335,8 @@ process_options (bool no_backend)
 if (!OPTION_SET_P (flag_web))
   flag_web = flag_unroll_loops;

-  if (!OPTION_SET_P (flag_rename_registers))
+  /* The option can be turned off in a target.  */
+  if (!OPTION_SET_P (flag_rename_registers) && flag_rename_registers)
   flag_rename_registers = flag_unroll_loops;

 if (flag_non_call_exceptions)
--
2.33.0





Re: [PATCH 2/8] tree-dynamic-object-size: New pass

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 08, 2021 at 03:44:26AM +0530, Siddhesh Poyarekar wrote:
> A new pass is added to execute just before the tree-object-size pass
> to recognize and simplify __builtin_dynamic_object_size.  Some key
> ideas (such as multipass object size collection to detect reference
> loops) have been taken from tree-object-size but is distinct from it
> to ensure minimal impact on existing code.
> 
> At the moment, the pass only recognizes allocators and passthrough
> functions to attempt to derive object size expressions, and replaces
> the call site with those expressions.  On failure, it replaces the
> __builtin_dynamic_object_size with __builtin_object_size as a
> fallback.

Not full review, just nits for now:
I don't really like using separate passes for this, whether
it should be done in the same source or different source file
is something less clear, I guess it depends on how much from
tree-object-size.[ch] can be reused, how much could be e.g. done
by pretending the 0-3 operands of __builtin_dynamic_object_size
are actually 4-7 and have [8] arrays in tree-object-size.[ch]
to track that and how much is separate.
I think the common case is either that a function won't
contain any of the builtin calls (most likely on non-glibc targets,
but even on glibc targets for most of the code that doesn't use any
of the fortified APIs), or it uses just one of them and not both
(e.g. glibc -D_FORTIFY_SOURCE={1,2} vs. -D_FORTIFY_SOURCE=3),
so either one or the other pass would just uselessly walk the whole
IL.
The objsz passes each walk over the whole IL: if they see
__bos calls, they do something and call something in the end,
and your pass is similar, so hooking it into the current pass
is trivial.  Just add
if (!gimple_call_builtin_p (call, BUILT_IN_OBJECT_SIZE))
before doing continue; check for the other builtin and call
a function to handle it, either from the same or different file,
and then at the end destruct what is needed too.
Especially on huge functions, each IL traversal may need to page into
caches (and out of them) lots of statements...

>   * Makefile.in (OBJS): Add tree-dynamic-object-size.o.
>   (PLUGIN_HEADERS): Add tree-dynamic-object-size.h.
>   * tree-dynamic-object-size.c: New file.
>   * tree-dynamic-object-size.h: New file.
>   * builtins.c: Use it.
>   (fold_builtin_dyn_object_size): Call
>   compute_builtin_dyn_object_size for
>   __builtin_dynamic_object_size builtin.
>   (passes.def): Add pass_dynamic_object_sizes.
>   * tree-pass.h: Add ake_pass_dynamic_object_sizes.

Missing m in ake

> +  if (TREE_CODE (ptr) == SSA_NAME
> +  && compute_builtin_dyn_object_size (ptr, object_size_type, ))

Please don't abbreviate.

> +  object_sizes[osi->object_size_type][SSA_NAME_VERSION (dest)] =
> +object_sizes[osi->object_size_type][SSA_NAME_VERSION (orig)];

GCC coding convention don't want to see the = at the end of line, it should
be at the start of the next line after indentation.

> +/* Initialize data structures for the object size computation.  */
> +
> +void
> +init_dynamic_object_sizes (void)
> +{
> +  int object_size_type;
> +
> +  if (computed[0])
> +return;
> +
> +  for (object_size_type = 0; object_size_type <= 3; object_size_type++)
> +{
> +  object_sizes[object_size_type].safe_grow (num_ssa_names, true);
> +  computed[object_size_type] = BITMAP_ALLOC (NULL);
> +}
> +}
> +
> +
> +unsigned int
> +dynamic_object_sizes_execute (function *fun, bool lower_to_bos)
> +{
> +  basic_block bb;
> +
> +  init_dynamic_object_sizes ();
> +

I'd prefer if the initialization could be done only lazily if
it sees at least one such call like the objsz pass does.  That is why
there is the if (computed[0]) return; at the start...

Jakub



Re: Ping^1 [PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-10-12 Thread Bill Schmidt via Gcc-patches
Hi Hao Chen,

On 10/11/21 12:32 AM, HAO CHEN GUI wrote:
> Hi,
>
>  Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579038.html
>
> Thanks
>
> On 8/9/2021 下午 2:42, HAO CHEN GUI wrote:
>> Hi,
>>
>>   The patch optimized for vec_reve builtin on rs6000. For V2DI and V2DF, it 
>> is implemented by xxswapd on all targets. For V16QI, V8HI, V4SI and V4SF, it 
>> is implemented by quadword byte reverse plus halfword/word byte reverse when 
>> p9_vector is defined.
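
The decomposition described above can be checked portably, without any VSX
intrinsics (the xxbrq/xxbrh mapping below is only an analogy for the byte
movement those instructions perform, not their actual encoding):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reversing the 8 halfword elements of a 16-byte vector equals
   byte-reversing the whole quadword (the xxbrq step) and then
   byte-reversing each halfword (the xxbrh step).  */
static void
byte_reverse (uint8_t *b, int n)
{
  for (int i = 0; i < n / 2; i++)
    {
      uint8_t t = b[i];
      b[i] = b[n - 1 - i];
      b[n - 1 - i] = t;
    }
}

static void
check_v8hi_identity (void)
{
  uint16_t in[8] = { 0x0102, 0x0304, 0x0506, 0x0708,
		     0x090a, 0x0b0c, 0x0d0e, 0x0f10 };
  uint16_t want[8], got[8];

  for (int i = 0; i < 8; i++)	/* direct element reversal */
    want[i] = in[7 - i];

  memcpy (got, in, sizeof in);
  byte_reverse ((uint8_t *) got, 16);	/* quadword byte reverse */
  for (int i = 0; i < 8; i++)		/* halfword byte reverse */
    byte_reverse ((uint8_t *) &got[i], 2);

  assert (memcmp (got, want, sizeof want) == 0);
}
```

The same identity holds for V4SI/V4SF with 4-byte lanes, which is why a single
xxbrq followed by the lane-width byte reverse suffices.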
>>
>>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
>> okay for trunk? Any recommendations? Thanks a lot.
>>
>>
>> ChangeLog
>>
>> 2021-09-08 Haochen Gui 
>>
>> gcc/
>>     * config/rs6000/altivec.md (altivec_vreve<mode>2 for VEC_K):
>>     Use xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw
>>     for v4si or v4sf when p9_vector is defined.
>>     (altivec_vreve<mode>2 for VEC_64): Defined. Implemented by
>>     xxswapd.
>>
>> gcc/testsuite/
>>     * gcc.target/powerpc/vec_reve_1.c: New test.
>>     * gcc.target/powerpc/vec_reve_2.c: Likewise.
>>
>>
>> patch.diff
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 1351dafbc41..a1698ce85c0 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -4049,13 +4049,43 @@ (define_expand "altivec_negv4sf2"
>>    DONE;
>>  })
>>
>> -;; Vector reverse elements
>> +;; Vector reverse elements for V16QI V8HI V4SI V4SF
>>  (define_expand "altivec_vreve<mode>2"
>> -  [(set (match_operand:VEC_A 0 "register_operand" "=v")
>> -   (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
>> +  [(set (match_operand:VEC_K 0 "register_operand" "=v")
>> +   (unspec:VEC_K [(match_operand:VEC_K 1 "register_operand" "v")]
>>   UNSPEC_VREVEV))]
>>    "TARGET_ALTIVEC"
>>  {
>> +  if (TARGET_P9_VECTOR)
>> +    {
>> +  if (<MODE>mode == V16QImode)
>> +   emit_insn (gen_p9_xxbrq_v16qi (operands[0], operands[1]));
>> +  else if (<MODE>mode == V8HImode)
>> +   {
>> + rtx subreg1 = simplify_gen_subreg (V1TImode, operands[1],
>> +    <MODE>mode, 0);
>> + rtx temp = gen_reg_rtx (V1TImode);
>> + emit_insn (gen_p9_xxbrq_v1ti (temp, subreg1));
>> + rtx subreg2 = simplify_gen_subreg (<MODE>mode, temp,
>> +    V1TImode, 0);
>> + emit_insn (gen_p9_xxbrh_v8hi (operands[0], subreg2));
>> +   }
>> +  else /* V4SI and V4SF.  */
>> +   {
>> + rtx subreg1 = simplify_gen_subreg (V1TImode, operands[1],
>> +    <MODE>mode, 0);
>> + rtx temp = gen_reg_rtx (V1TImode);
>> + emit_insn (gen_p9_xxbrq_v1ti (temp, subreg1));
>> + rtx subreg2 = simplify_gen_subreg (<MODE>mode, temp,
>> +    V1TImode, 0);
>> + if (<MODE>mode == V4SImode)
>> +   emit_insn (gen_p9_xxbrw_v4si (operands[0], subreg2));
>> + else
>> +   emit_insn (gen_p9_xxbrw_v4sf (operands[0], subreg2));
>> +   }
>> +  DONE;
>> +    }
>> +
>>    int i, j, size, num_elements;
>>    rtvec v = rtvec_alloc (16);
>>    rtx mask = gen_reg_rtx (V16QImode);
>> @@ -4074,6 +4104,17 @@ (define_expand "altivec_vreve<mode>2"
>>    DONE;
>>  })
>>
>> +;; Vector reverse elements for V2DI V2DF
>> +(define_expand "altivec_vreve<mode>2"
>> +  [(set (match_operand:VEC_64 0 "register_operand" "=v")
>> +   (unspec:VEC_64 [(match_operand:VEC_64 1 "register_operand" "v")]
>> + UNSPEC_VREVEV))]
>> +  "TARGET_ALTIVEC"
>> +{
>> +  emit_insn (gen_xxswapd_<mode> (operands[0], operands[1]));
>> +  DONE;
>> +})
>> +
>>  ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
>>  ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
>>  (define_insn "altivec_lvlx"
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c 
>> b/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c
>> new file mode 100644
>> index 000..83a9206758b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-O2 -maltivec" } */
>> +
>> +#include 
>> +
>> +vector double foo1 (vector double a)
>> +{
>> +   return vec_reve (a);
>> +}
>> +
>> +vector long long foo2 (vector long long a)
>> +{
>> +   return vec_reve (a);
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c 
>> b/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c
>> new file mode 100644
>> index 000..b6dd33d6d79
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c
>> @@ -0,0 +1,28 @@
>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power9 -O2 -maltivec" } */

One nit here -- you don't need -maltivec as it's redundant with 
-mdejagnu-cpu=power9.

Looks fine to me with or without that change.

Re: [PATCH, rs6000] Optimization for vec_xl_sext

2021-10-12 Thread Bill Schmidt via Gcc-patches
Hi Hao Chen,

On 9/15/21 2:35 AM, HAO CHEN GUI wrote:
> Bill,
>
>     Yes, I built the gcc with p10 binutils. Then power10_ok tests can pass. 
> Thanks again for your kindly explanation.
>
>     I finally realized that the line wrap settings on my thunderbird didn't 
> take any effect. I have to set a very large line size,  just for a workaround.
>
> ChangeLog
>
> 2021-09-15 Haochen Gui 
>
> gcc/
>     * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>     Modify the expansion for sign extension. All extentions are done
>     within VSX resgisters.

Two typos here:  extentions => extensions, resgisters => registers.

The patch itself looks good to me.  I recommend the maintainers approve with 
the ChangeLog fixed.

Thanks!
Bill
>     * gcc/config/rs6000/vsx.md (vsx_sign_extend_si_v2di): Define.
>
> gcc/testsuite/
>     * gcc.target/powerpc/p10_vec_xl_sext.c: New test.
>
> patch.diff
>
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index b4e13af4dc6..587e9fa2a2a 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9779,7 +9779,7 @@ altivec_expand_lxvr_builtin (enum insn_code icode, tree 
> exp, rtx target, bool bl
>
>    if (sign_extend)
>  {
> -  rtx discratch = gen_reg_rtx (DImode);
> +  rtx discratch = gen_reg_rtx (V2DImode);
>    rtx tiscratch = gen_reg_rtx (TImode);
>
>    /* Emit the lxvr*x insn.  */
> @@ -9788,20 +9788,31 @@ altivec_expand_lxvr_builtin (enum insn_code icode, 
> tree exp, rtx target, bool bl
>     return 0;
>    emit_insn (pat);
>
> -  /* Emit a sign extension from QI,HI,WI to double (DI).  */
> -  rtx scratch = gen_lowpart (smode, tiscratch);
> +  /* Emit a sign extension from V16QI,V8HI,V4SI to V2DI.  */
> +  rtx temp1, temp2;
>    if (icode == CODE_FOR_vsx_lxvrbx)
> -   emit_insn (gen_extendqidi2 (discratch, scratch));
> +   {
> + temp1  = simplify_gen_subreg (V16QImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_qi_v2di (discratch, temp1));
> +   }
>    else if (icode == CODE_FOR_vsx_lxvrhx)
> -   emit_insn (gen_extendhidi2 (discratch, scratch));
> +   {
> + temp1  = simplify_gen_subreg (V8HImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_hi_v2di (discratch, temp1));
> +   }
>    else if (icode == CODE_FOR_vsx_lxvrwx)
> -   emit_insn (gen_extendsidi2 (discratch, scratch));
> -  /*  Assign discratch directly if scratch is already DI.  */
> -  if (icode == CODE_FOR_vsx_lxvrdx)
> -   discratch = scratch;
> +   {
> + temp1  = simplify_gen_subreg (V4SImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_si_v2di (discratch, temp1));
> +   }
> +  else if (icode == CODE_FOR_vsx_lxvrdx)
> +   discratch = simplify_gen_subreg (V2DImode, tiscratch, TImode, 0);
> +  else
> +   gcc_unreachable ();
>
> -  /* Emit the sign extension from DI (double) to TI (quad). */
> -  emit_insn (gen_extendditi2 (target, discratch));
> +  /* Emit the sign extension from V2DI (double) to TI (quad).  */
> +  temp2 = simplify_gen_subreg (TImode, discratch, V2DImode, 0);
> +  emit_insn (gen_extendditi2_vector (target, temp2));
>
>    return target;
>  }
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bcb92be2f5c..987f21bbc22 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4830,7 +4830,7 @@ (define_insn "vsx_sign_extend_hi_"
>    "vextsh2 %0,%1"
>    [(set_attr "type" "vecexts")])
>
> -(define_insn "*vsx_sign_extend_si_v2di"
> +(define_insn "vsx_sign_extend_si_v2di"
>    [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
>     (unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
>  UNSPEC_VSX_SIGN_EXTEND))]
> diff --git a/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c 
> b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
> new file mode 100644
> index 000..78e72ac5425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <altivec.h>
> +
> +vector signed __int128
> +foo1 (signed long a, signed char *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo2 (signed long a, signed short *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo3 (signed long a, signed int *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo4 (signed long a, signed long *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times 

Re: [PATCH 1/8] __builtin_dynamic_object_size: Recognize builtin name

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 08, 2021 at 03:44:25AM +0530, Siddhesh Poyarekar wrote:
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -180,6 +180,7 @@ static rtx expand_builtin_memory_chk (tree, rtx, 
> machine_mode,
>  static void maybe_emit_chk_warning (tree, enum built_in_function);
>  static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function);
>  static tree fold_builtin_object_size (tree, tree);
> +static tree fold_builtin_dyn_object_size (tree, tree);
>  
>  unsigned HOST_WIDE_INT target_newline;
>  unsigned HOST_WIDE_INT target_percent;
> @@ -7910,6 +7911,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
>return const0_rtx;
>  
>  case BUILT_IN_OBJECT_SIZE:
> +case BUILT_IN_DYN_OBJECT_SIZE:
>return expand_builtin_object_size (exp);

I'd strongly prefer BUILT_IN_DYNAMIC_OBJECT_SIZE, we have even longer
builtin enums and the abbreviation will only lead to confusion.

> +/* Fold a call to __builtin_dynamic_object_size with arguments PTR and OST,
> +   if possible.  */
> +
> +static tree
> +fold_builtin_dyn_object_size (tree ptr, tree ost)

Also please don't abbreviate.

> +{
> +  int object_size_type;
> +
> +  if (!valid_object_size_args (ptr, ost, &object_size_type))
> +return NULL_TREE;
> +
> +  /* __builtin_dynamic_object_size doesn't evaluate side-effects in its
> + arguments; if there are any side-effects, it returns (size_t) -1 for 
> types
> + 0 and 1 and (size_t) 0 for types 2 and 3.  */
> +  if (TREE_SIDE_EFFECTS (ptr))
> +return build_int_cst_type (size_type_node, object_size_type < 2 ? -1 : 
> 0);

If we want to commit this patch separately, then the more natural stub
implementation would be fold it into a __builtin_object_size call
(or call fold_builtin_object_size and only if it returns NULL_TREE fold
it into the builtin call).  But I assume we do not want to do that and
want to commit the whole series at once, therefore even this is good enough.

Jakub



Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-12 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 2:18 PM Martin Liška  wrote:
>
> Hello.
>
> The option is disabled in rs6000 target with:
>
>  { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
>
> Thus, we have to do an auto-detection only if it's really unset and also
> equal to the Init value.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> And the problematic test-case works on ppc64le.
>
> Ready to be installed?

Hmm, I can see how it fixes the reported problem but I think the
thing is fragile.  I wonder if we can express things like

+  if (!OPTION_SET_P (flag_web))
 flag_web = flag_unroll_loops;

or

+  if (!OPTION_SET_P (flag_rename_registers))
 flag_rename_registers = flag_unroll_loops;

by adding EnabledBy(funroll-loops) to the respective options instead
(and funroll-loops EnabledBy(funroll-all-loops))

All SET_OPTION_IF_UNSET are fragile with respect to target overrides
(-fprofile-use does a lot of those for example).

I suppose opts_set could also record whether the target overrided
sth with its option_optimization_table.

Richard.

> Thanks,
> Martin
>
> PR target/102688
>
> gcc/ChangeLog:
>
> * common.opt: Enable flag_rename_registers by default.
> * toplev.c (process_options): Auto-detect flag_rename_registers
> only if it is not turned off in a target.
> ---
>   gcc/common.opt | 2 +-
>   gcc/toplev.c   | 3 ++-
>   2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4099effcc80..2c6be1bdd36 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2399,7 +2399,7 @@ Common Var(flag_live_range_shrinkage) Init(0) 
> Optimization
>   Relief of register pressure through live range shrinkage.
>
>   frename-registers
> -Common Var(flag_rename_registers) Optimization
> +Common Var(flag_rename_registers) Init(1) Optimization
>   Perform a register renaming optimization pass.
>
>   fschedule-fusion
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 167feac2583..ee7d8854f90 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1335,7 +1335,8 @@ process_options (bool no_backend)
> if (!OPTION_SET_P (flag_web))
>   flag_web = flag_unroll_loops;
>
> -  if (!OPTION_SET_P (flag_rename_registers))
> +  /* The option can be turned off in a target.  */
> +  if (!OPTION_SET_P (flag_rename_registers) && flag_rename_registers)
>   flag_rename_registers = flag_unroll_loops;
>
> if (flag_non_call_exceptions)
> --
> 2.33.0
>


Re: [PATCH] Allow `make tags` to work from top-level directory

2021-10-12 Thread Jeff Law via Gcc-patches




On 10/11/2021 4:05 PM, Eric Gallager via Gcc-patches wrote:

On Thu, Oct 13, 2016 at 4:43 PM Eric Gallager  wrote:

On 10/13/16, Jeff Law  wrote:

On 10/06/2016 07:21 AM, Eric Gallager wrote:

The libdecnumber, libgcc, and libobjc subdirectories are missing TAGS
targets in their Makefiles. The attached patch causes them to be
skipped when running `make tags`.

ChangeLog entry:

2016-10-06  Eric Gallager  

  * Makefile.def: Mark libdecnumber, libgcc, and libobjc as missing
  TAGS target.
  * Makefile.in: Regenerate.


OK.  Please install.

Thanks,
Jeff



I'm still waiting to hear back from  about my request
for copyright assignment, which I'll need to get sorted out before I
can start committing stuff (like this patch).

Thanks,
Eric

Update: In the intervening years, I got my copyright assignment filed
and have recently become able to commit again; is your old approval
from 2016 still valid, Jeff, or do I need a re-approval?
Ref: https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg00370.html

It's still valid.  Just re-test and commit.

jeff


Re: [PATCH] libiberty: prevent buffer overflow when decoding user input

2021-10-12 Thread Luís Ferreira
On Fri, 2021-10-08 at 22:11 +0200, Iain Buclaw wrote:
> Excerpts from Luís Ferreira's message of October 8, 2021 7:08 pm:
> > On Fri, 2021-10-08 at 18:52 +0200, Iain Buclaw wrote:
> > > Excerpts from Luís Ferreira's message of October 7, 2021 8:29 pm:
> > > > On Tue, 2021-10-05 at 21:49 -0400, Eric Gallager wrote:
> > > > > 
> > > > > I can help with the autotools part if you can say how precisely
> > > > > you'd
> > > > > like to use them to add address sanitization. And as for the
> > > > > OSS
> > > > > fuzz part, I think someone tried setting up auto-fuzzing for it
> > > > > once,
> > > > > but the main bottleneck was getting the bug reports that it
> > > > > generated
> > > > > properly triaged, so if you could make sure the bug-submitting
> > > > > portion
> > > > > of the process is properly streamlined, that'd probably go a
> > > > > long
> > > > > way
> > > > > towards helping it be useful.
> > > > 
> > > > Bugs are normally reported by email or mailing list. Is there any
> > > > writable mailing list to publish bugs or is it strictly needed to
> > > > open
> > > > an entry on bugzilla?
> > > > 
> > > 
> > > Please open an issue on bugzilla, fixes towards it can then be
> > > referenced in the commit message/patch posted here.
> > > 
> > > Iain.
> > 
> > You mean for this current issue? The discussion was about future bug
> > reports reported by the OSS fuzzer workers. I can also open an issue
> > on
> > the bugzilla for this issue, please clarify it and let me know :)
> > 
> 
> 1. Open one for this issue.
> 
> 2. Bugs found by the fuzzer would report to bugzilla.
> https://gcc.gnu.org/bugs/
> 
> Iain.

Cross referencing the created issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102702

-- 
Sincerely,
Luís Ferreira @ lsferreira.net





Re: [PATCH] hardened conditionals

2021-10-12 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 8:35 AM Alexandre Oliva  wrote:
>
> On Oct  9, 2021, Richard Biener  wrote:
>
> > Why two passes (and two IL traverses?)
>
> Different traversals, no reason to force them into a single pass.  One
> only looks at the last stmt of each block, where cond stmts may be,
> while the other has to look at every stmt.
>
> > How do you prevent RTL optimizers (jump threading) from removing the
> > redundant tests?
>
> The trick I'm using to copy of a value without the compiler's knowing
> it's still the same value is 'asm ("" : "=g" (alt) : "0" (src));'
>
> I've pondered introducing __builtin_hidden_copy or somesuch, but it
> didn't seem worth it.

I see.  I remember Marc using sth similar initially when trying to
solve the FENV access problem.  Maybe we indeed want to have
some kind of more generic "dataflow (optimization) barrier" ...

Are there any issues with respect to debugging when using such
asm()s?

> > I'd have expected such hardening to occur very late in the RTL
> > pipeline.
>
> Yeah, that would be another way to do it, but then it would have to be a
> lot trickier, given all the different ways in which compare-and-branch
> can be expressed in RTL.

Agreed, though it would be less disturbing to the early RTL pipeline
and RTL expansion.

> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


RE: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt

2021-10-12 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andrew Pinski 
> Sent: Monday, October 11, 2021 8:56 PM
> To: Kyrylo Tkachov 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> apin...@marvell.com; Richard Earnshaw ; nd
> ; Marcus Shawcroft ; Richard
> Sandiford 
> Subject: Re: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> 
> On Thu, Sep 30, 2021 at 2:28 AM Kyrylo Tkachov via Gcc-patches
>  wrote:
> > > -Original Message-
> > > From: Tamar Christina 
> > > Sent: Wednesday, September 29, 2021 5:20 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd ; Richard Earnshaw
> ;
> > > Marcus Shawcroft ; Kyrylo Tkachov
> > > ; Richard Sandiford
> > > 
> > > Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt
> > >
> > > Hi All,
> > >
> > > This optimizes signed right shift by BITSIZE-1 into a cmlt operation which
> is
> > > more optimal because generally compares have a higher throughput than
> > > shifts.
> > >
> > > On AArch64 the result of the shift would have been either -1 or 0 which is
> the
> > > results of the compare.
> > >
> > > i.e.
> > >
> > > void e (int * restrict a, int *b, int n)
> > > {
> > > for (int i = 0; i < n; i++)
> > >   b[i] = a[i] >> 31;
> > > }
> > >
> > > now generates:
> > >
> > > .L4:
> > > ldr q0, [x0, x3]
> > > cmltv0.4s, v0.4s, #0
> > > str q0, [x1, x3]
> > > add x3, x3, 16
> > > cmp x4, x3
> > > bne .L4
> > >
> > > instead of:
> > >
> > > .L4:
> > > ldr q0, [x0, x3]
> > > sshrv0.4s, v0.4s, 31
> > > str q0, [x1, x3]
> > > add x3, x3, 16
> > > cmp x4, x3
> > > bne .L4
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > This should be okay (either a win or neutral) for Arm Cortex and Neoverse
> cores so I'm tempted to not ask for a CPU-specific tunable to guard it to keep
> the code clean.
> > Andrew, would this change be okay from a Thunder X line perspective?
> 
> I don't know about ThunderX2 but here are the details for ThunderX1
> (and OcteonX1) and OcteonX2:
> The sshr and cmlt are handled the same in the pipeline as far as I can tell.
> 

Thanks for the info.
This patch is ok.
Kyrill

> Thanks,
> Andrew
> 
> 
> 
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-simd.md (aarch64_simd_ashr):
> > > Add case cmp
> > >   case.
> > >   * config/aarch64/constraints.md (D1): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/shl-combine-2.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > > b/gcc/config/aarch64/aarch64-simd.md
> > > index
> > >
> 300bf001b59ca7fa197c580b10adb7f70f20d1e0..19b2d0ad4dab4d574269829
> > > 7ded861228ee22007 100644
> > > --- a/gcc/config/aarch64/aarch64-simd.md
> > > +++ b/gcc/config/aarch64/aarch64-simd.md
> > > @@ -1127,12 +1127,14 @@ (define_insn "aarch64_simd_lshr"
> > >  )
> > >
> > >  (define_insn "aarch64_simd_ashr"
> > > - [(set (match_operand:VDQ_I 0 "register_operand" "=w")
> > > -   (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
> > > -  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm"
> > > "Dr")))]
> > > + [(set (match_operand:VDQ_I 0 "register_operand" "=w,w")
> > > +   (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w,w")
> > > +  (match_operand:VDQ_I  2 "aarch64_simd_rshift_imm"
> > > "D1,Dr")))]
> > >   "TARGET_SIMD"
> > > - "sshr\t%0., %1., %2"
> > > -  [(set_attr "type" "neon_shift_imm")]
> > > + "@
> > > +  cmlt\t%0., %1., #0
> > > +  sshr\t%0., %1., %2"
> > > +  [(set_attr "type" "neon_compare,neon_shift_imm")]
> > >  )
> > >
> > >  (define_insn "*aarch64_simd_sra"
> > > diff --git a/gcc/config/aarch64/constraints.md
> > > b/gcc/config/aarch64/constraints.md
> > > index
> > >
> 3b49b452119c49320020fa9183314d9a25b92491..18630815ffc13f2168300a89
> > > 9db69fd428dfb0d6 100644
> > > --- a/gcc/config/aarch64/constraints.md
> > > +++ b/gcc/config/aarch64/constraints.md
> > > @@ -437,6 +437,14 @@ (define_constraint "Dl"
> > >(match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
> > >true)")))
> > >
> > > +(define_constraint "D1"
> > > +  "@internal
> > > + A constraint that matches vector of immediates that is bits(mode)-1."
> > > + (and (match_code "const,const_vector")
> > > +  (match_test "aarch64_const_vec_all_same_in_range_p (op,
> > > + GET_MODE_UNIT_BITSIZE (mode) - 1,
> > > + GET_MODE_UNIT_BITSIZE (mode) - 1)")))
> > > +
> > >  (define_constraint "Dr"
> > >"@internal
> > >   A constraint that matches vector of immediates for right shifts."
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
> > > b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c
> > > new file 

[PATCH] Fix handling of flag_rename_registers.

2021-10-12 Thread Martin Liška

Hello.

The option is disabled in rs6000 target with:

{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

Thus, we have to do an auto-detection only if it's really unset and also
equal to the Init value.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
And the problematic test-case works on ppc64le.

Ready to be installed?
Thanks,
Martin

PR target/102688

gcc/ChangeLog:

* common.opt: Enable flag_rename_registers by default.
* toplev.c (process_options): Auto-detect flag_rename_registers
only if it is not turned off in a target.
---
 gcc/common.opt | 2 +-
 gcc/toplev.c   | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4099effcc80..2c6be1bdd36 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2399,7 +2399,7 @@ Common Var(flag_live_range_shrinkage) Init(0) Optimization
 Relief of register pressure through live range shrinkage.
 
 frename-registers
-Common Var(flag_rename_registers) Optimization
+Common Var(flag_rename_registers) Init(1) Optimization
 Perform a register renaming optimization pass.
 
 fschedule-fusion

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 167feac2583..ee7d8854f90 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1335,7 +1335,8 @@ process_options (bool no_backend)
   if (!OPTION_SET_P (flag_web))
 flag_web = flag_unroll_loops;
 
-  if (!OPTION_SET_P (flag_rename_registers))
+  /* The option can be turned off in a target.  */
+  if (!OPTION_SET_P (flag_rename_registers) && flag_rename_registers)
 flag_rename_registers = flag_unroll_loops;
 
   if (flag_non_call_exceptions)

--
2.33.0



Re: [SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-12 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 11 Oct 2021 at 20:42, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Fri, 8 Oct 2021 at 21:19, Richard Sandiford
> >  wrote:
> >>
> >> Thanks for looking at this.
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi,
> >> > As mentioned in PR, for the following test-case:
> >> >
> >> > typedef unsigned char uint8_t;
> >> >
> >> > static inline uint8_t
> >> > x264_clip_uint8(uint8_t x)
> >> > {
> >> >   uint8_t t = -x;
> >> >   uint8_t t1 = x & ~63;
> >> >   return (t1 != 0) ? t : x;
> >> > }
> >> >
> >> > void
> >> > mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
> >> > {
> >> >   for (int x = 0; x < n*16; x++)
> >> > dst[x] = x264_clip_uint8(src[x]);
> >> > }
> >> >
> >> > -O3 -mcpu=generic+sve generates following code for the inner loop:
> >> >
> >> > .L3:
> >> > ld1bz0.b, p0/z, [x1, x2]
> >> > movprfx z2, z0
> >> > and z2.b, z2.b, #0xc0
> >> > movprfx z1, z0
> >> > neg z1.b, p1/m, z0.b
> >> > cmpeq   p2.b, p1/z, z2.b, #0
> >> > sel z0.b, p2, z0.b, z1.b
> >> > st1bz0.b, p0, [x0, x2]
> >> > add x2, x2, x4
> >> > whilelo p0.b, w2, w3
> >> > b.any   .L3
> >> >
> >> > The sel is redundant since we could conditionally negate z0 based on
> >> > the predicate
> >> > comparing z2 with 0.
> >> >
> >> > As suggested in the PR, the attached patch, introduces a new
> >> > conditional internal function .COND_NEG, and in gimple-isel replaces
> >> > the following sequence:
> >> >op2 = -op1
> >> >op0 = A cmp B
> >> >lhs = op0 ? op1 : op2
> >> >
> >> > with:
> >> >op0 = A inverted_cmp B
> >> >lhs = .COND_NEG (op0, op1, op1).
> >> >
> >> > lhs = .COND_NEG (op0, op1, op1)
> >> > implies
> >> > lhs = neg (op1) if cond is true OR fall back to op1 if cond is false.
> >> >
> >> > With patch, it generates the following code-gen:
> >> > .L3:
> >> > ld1bz0.b, p0/z, [x1, x2]
> >> > movprfx z1, z0
> >> > and z1.b, z1.b, #0xc0
> >> > cmpne   p1.b, p2/z, z1.b, #0
> >> > neg z0.b, p1/m, z0.b
> >> > st1bz0.b, p0, [x0, x2]
> >> > add x2, x2, x4
> >> > whilelo p0.b, w2, w3
> >> > b.any   .L3
> >> >
> >> > While it seems to work for this test-case, I am not entirely sure if
> >> > the patch is correct. Does it look in the right direction ?
> >>
> >> For binary ops we use match.pd rather than isel:
> >>
> >> (for uncond_op (UNCOND_BINARY)
> >>  cond_op (COND_BINARY)
> >>  (simplify
> >>   (vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
> >>   (with { tree op_type = TREE_TYPE (@4); }
> >>(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
> >> op_type)
> >> && is_truth_type_for (op_type, TREE_TYPE (@0)))
> >> (view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))
> >>  (simplify
> >>   (vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
> >>   (with { tree op_type = TREE_TYPE (@4); }
> >>(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), 
> >> op_type)
> >> && is_truth_type_for (op_type, TREE_TYPE (@0)))
> >> (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type 
> >> @1)))
> >>
> >> I think it'd be good to do the same here, using new (UN)COND_UNARY
> >> iterators.  (The iterators will only have one value to start with,
> >> but other unary ops could get the same treatment in future.)
> > Thanks for the suggestions.
> > The attached patch adds a pattern to match.pd to replace:
> > cond = a cmp b
> > r = cond ? x : -x
> > with:
> > cond = a inverted_cmp b
> > r = cond ? -x : x
> >
> > Code-gen with patch for inner loop:
> > .L3:
> > ld1bz0.b, p0/z, [x1, x2]
> > movprfx z1, z0
> > and z1.b, z1.b, #0xc0
> > cmpne   p1.b, p2/z, z1.b, #0
> > neg z0.b, p1/m, z0.b
> > st1bz0.b, p0, [x0, x2]
> > add x2, x2, x4
> > whilelo p0.b, w2, w3
> > b.any   .L3
> >
> > Does it look OK ?
> > I didn't add it under (UN)COND_UNARY since it inverts the comparison,
> > which we might not want to do for other unary ops ?
>
> I think we should follow the structure of the current binary and
> ternary patterns: cope with unary operations in either arm of the
> vec_cond and use bit_not for the case in which the unary operation
> is in the “false” arm of the vec_cond.
>
> The bit_not will be folded away if the comparison can be inverted,
> but it will be left in-place if the comparison can't be inverted
> (as for some FP comparisons).
Ah indeed, done in the attached patch.
Does it look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
>
> >
> > Also, I am not sure, how to test if target supports conditional
> > internal function ?
> > I tried to use:
> > (for cmp (tcc_comparison)
> >  icmp (inverted_tcc_comparison)
> >  (simplify
> >   (vec_cond (cmp@2 @0 @1) @3 (negate @3))
> >(with { auto op_type = TREE_TYPE (@2); }
> 

[PATCH] tree-optimization/102572 - fix gathers with invariant mask

2021-10-12 Thread Richard Biener via Gcc-patches
This fixes the vector def gathering for invariant masks which
failed to pass in the desired vector type resulting in a non-mask
type to be generate.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2021-10-12  Richard Biener  

PR tree-optimization/102572
* tree-vect-stmts.c (vect_build_gather_load_calls): When
gathering the vectorized defs for the mask pass in the
desired mask vector type so invariants will be handled
correctly.

* g++.dg/vect/pr102572.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr102572.cc | 14 ++
 gcc/tree-vect-stmts.c |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr102572.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr102572.cc b/gcc/testsuite/g++.dg/vect/pr102572.cc
new file mode 100644
index 000..0a713081537
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr102572.cc
@@ -0,0 +1,14 @@
+// { dg-do compile }
+// { dg-additional-options "-O3" }
+// { dg-additional-options "-march=skylake-avx512" { target x86_64-*-* i?86-*-* } }
+
+int a, b, c, f;
+void g(bool h, int d[][5])
+{
+  for (short i = f; i; i += 1)
+{
+  a = h && d[0][i];
+  for (int j = 0; j < 4; j += c)
+   b = 0;
+}
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index a9c9e3d7c37..f5e1941f8ad 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2791,7 +2791,7 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (mask)
 vect_get_vec_defs_for_operand (vinfo, stmt_info,
   modifier == NARROW ? ncopies / 2 : ncopies,
-  mask, &vec_masks);
+  mask, &vec_masks, masktype);
   for (int j = 0; j < ncopies; ++j)
 {
   tree op, var;
-- 
2.31.1


[PATCH] tree-optimization/102696 - fix SLP discovery for failed BIT_FIELD_REF

2021-10-12 Thread Richard Biener via Gcc-patches
This fixes a forgotten adjustment of matches[] when we fail SLP
discovery.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2021-10-12  Richard Biener  

PR tree-optimization/102696
* tree-vect-slp.c (vect_build_slp_tree_2): Properly mark
the tree fatally failed when we reject a BIT_FIELD_REF.

* g++.dg/vect/pr102696.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr102696.cc | 16 
 gcc/tree-vect-slp.c   |  1 +
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr102696.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr102696.cc 
b/gcc/testsuite/g++.dg/vect/pr102696.cc
new file mode 100644
index 000..5560354304a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr102696.cc
@@ -0,0 +1,16 @@
+// { dg-do compile }
+// { dg-additional-options "-O3" }
+// { dg-additional-options "-march=skylake-avx512" { target x86_64-*-* i?86-*-* } }
+
+int a;
+extern bool b[][14];
+char h;
+void f(short g[][14])
+{
+  for (short d = h; d < 21; d += 1)
+for (unsigned char e = 0; e < 14; e += 1)
+  {
+   a = 0;
+   b[d][e] = g[d][e];
+  }
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c70d06e5f20..709bcb63686 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1761,6 +1761,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
   bit_field_size (bfref), &lane))
{
  lperm.release ();
+ matches[0] = false;
  return NULL;
}
  lperm.safe_push (std::make_pair (0, (unsigned)lane));
-- 
2.31.1


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches

Hi Richi,

I think this is what you meant: I now hide all the unrolling cost
calculations in the existing target cost hooks. I did need to adjust
'finish_cost' to take the loop_vinfo so the targets' implementations are
able to set the newly renamed 'suggested_unroll_factor'.


Also added the checks for the epilogue's VF.

Is this more like what you had in mind?


gcc/ChangeLog:

    * config/aarch64/aarch64.c (aarch64_finish_cost): Add class
    vec_info parameter.
    * config/i386/i386.c (ix86_finish_cost): Likewise.
    * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.
    * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.
    * target.def: Add class vec_info parameter to finish_cost.
    * targhooks.c (default_finish_cost): Likewise.
    * targhooks.h (default_finish_cost): Likewise.
    * tree-vect-loop.c (vect_determine_vectorization_factor): Use
    suggested_unroll_factor to increase vectorization_factor if possible.
    (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor member.
    (vect_compute_single_scalar_iteration_cost): Adjust call to
    finish_cost.
    (vect_determine_partial_vectors_and_peeling): Ensure unrolled loop
    is not predicated.
    (vect_determine_unroll_factor): New.
    (vect_try_unrolling): New.
    (vect_reanalyze_as_main_loop): Also try to unroll when reanalyzing
    as main loop.
    (vect_analyze_loop): Add call to vect_try_unrolling and check to
    ensure epilogue is either a smaller VF than main loop or uses
    partial vectors and might be of equal VF.
    (vect_estimate_min_profitable_iters): Adjust call to finish_cost.
    (vectorizable_reduction): Make sure to not use single_defuse_cycle
    when unrolling.
    * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call
    to finish_cost.
    * tree-vectorizer.h (finish_cost): Change to pass new class vec_info
    parameter.


On 01/10/2021 09:19, Richard Biener wrote:

On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:


Hi,



That just forces trying the vector modes we've tried before. Though I might
need to revisit this now I think about it. I'm afraid it might be possible
for
this to generate an epilogue with a vf that is not lower than that of the
main
loop, but I'd need to think about this again.

Either way I don't think this changes the vector modes used for the
epilogue.
But maybe I'm just missing your point here.

Yes, I was refering to the above which suggests that when we vectorize
the main loop with V4SF but unroll then we try vectorizing the
epilogue with V4SF as well (but not unrolled).  I think that's
premature (not sure if you try V8SF if the main loop was V4SF but
unrolled 4 times).

My main motivation for this was that I had an SVE loop that vectorized
with both VNx8HI and V8HI, where V8HI beat VNx8HI on cost; it then decided
to unroll V8HI by two and skipped using VNx8HI as a predicated epilogue,
which would have been the best choice.

I see, yes - for fully predicated epilogues it makes sense to consider
the same vector mode as for the main loop anyway (independent of
whether we're unrolling or not).  One could argue that with an
unrolled V4SImode main loop a predicated V8SImode epilogue would also
be a good match (but then somehow costing favored the unrolled V4SI
over the V8SI for the main loop...).


So that is why I decided to just 'reset' the vector_mode selection. In a
scenario where you only have the traditional vector modes it might make less
sense.

Just realized I still didn't add any check to make sure the epilogue has a
lower VF than the previous loop, though I'm still not sure that could happen.
I'll go look at where to add that if you agree with this.

As said above, it only needs a lower VF in case the epilogue is not
fully masked - otherwise the same VF would be OK.


I can move it there; it would indeed remove the need for the change to
vect_update_vf_for_slp.  The change to
vect_determine_partial_vectors_and_peeling would still be required, I
think - it is meant to disable using partial vectors in an unrolled
loop.

Why would we disable the use of partial vectors in an unrolled loop?

The motivation behind that is that the overhead caused by generating
predicates for each iteration will likely be too much for unrolling to be
profitable.  On top of that, when dealing with low iteration count loops,
if executing one predicated iteration would be enough, we now still need to
execute all the other unrolled predicated iterations, whereas if the
unrolled loop is not predicated we can skip it entirely.

OK, I guess we're not factoring in costs when deciding on predication
but go for it if it's generally enabled and possible.

With the proposed scheme we'd then cost the predicated not unrolled
loop against a not predicated unrolled loop which might be a bit
apples vs. oranges also because the target made the unroll decision
based on the data it collected for the predicated loop.



[PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-12 Thread Richard Biener via Gcc-patches


This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
which we can use to wrap a reference tree, recording the ao_ref
associated with it.  That comes in handy when trying to optimize
the constant factor involved with alias stmt walking (or alias
queries in general), where there are parts that are linear in the
reference expression complexity, namely get_ref_base_and_extent,
which usually shows up high on profiles.

The idea was to make a wrapping TREE_AOREFWRAP mostly transparent
to users by making gimple_assign_{lhs,rhs1} strip it and provide
special accessors that get at the embedded ao_ref as well as
doing the wrapping when it's not already there.

The following patch is minimal, so as to make tree-ssa.exp=ssa-fre-*
not ICE and to make the testcases from PR28071 and PR39326 compile
successfully at -O1 (both testcases show a moderately high
load on alias stmt walking, around 25% resp. 34%).  With the
patch, which makes use of the cache only from stmt_may_clobber_ref_p
for now, compile time improves by 7% resp. 19%, which means
overall the idea might be worth pursuing.

I did run into more "issues" with the extra TREE_AOREFWRAP appearing
than anticipated (well, no, not really...).  So given it is a
kind of hack already, I'm now thinking of making it even more
transparent: instead of wrapping the refs with another tree,
reallocate the outermost handled_component_p (and only those),
placing the ao_ref after the tree and indicating its presence
by TREE_ASM_WRITTEN (or any other available bit).

As with TREE_AOREFWRAP the key is to invalidate the ao_ref
(strip TREE_AOREFWRAP or unset TREE_ASM_WRITTEN) when its
information becomes stale (for example by unsharing) or invalid
(by mangling the reference into sth semantically different) or
no longer precise (by propagating constants into the reference).

I did consider simply using an (optional) on-the-side hash-map
and allocpool to be initialized by passes and queried by the
oracle.  The downside is that we're optimizing a "constant" factor
of the oracle query but a hash-map lookup isn't exactly
cache-friendly.  Likewise caching across pass boundaries sounded
maybe important (OK, we have at most N ao_refs with N linear in
the program size - but as said, we're optimizing a constant factor).
Existing on-the-side caching includes the SCEV cache for example.

So take this as a proof-of-concept showing the possible gain,
I do think going either the TREE_ASM_WRITTEN or on-the-side table
solution is going to have less issues all around the compiler.

Comments?

Thanks,
Richard.

2021-10-12  Richard Biener  

* treestruct.def (TS_AOREFWRAP): New.
* tree.def (TREE_AOREFWRAP): Likewise.
* gimple.h (gimple_assign_lhs): Look through TREE_AOREFWRAP.
(gimple_assign_rhs1): Likewise.
(gimple_assign_lhs_with_ao_ref): New.
(gimple_assign_rhs1_with_ao_ref): Likewise.
* tree-ssa-alias.c (stmt_may_clobber_ref_p_1): Use the
ao_ref embedded in the stmts LHS if possible.
<... and more hacks ...>
---
 gcc/gimple.c   | 62 +-
 gcc/gimple.h   | 12 ++--
 gcc/tree-core.h|  8 +
 gcc/tree-inline.c  |  4 +++
 gcc/tree-ssa-alias.c   | 11 +--
 gcc/tree-ssa-loop-im.c |  5 ++-
 gcc/tree-ssa-loop-ivopts.c |  3 ++
 gcc/tree-ssa-loop.c|  1 +
 gcc/tree-ssa-operands.c|  1 +
 gcc/tree-streamer.c|  1 +
 gcc/tree.c | 15 +++--
 gcc/tree.def   |  4 +++
 gcc/treestruct.def |  1 +
 13 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index cc7a88e822b..070304b64bf 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1758,6 +1758,60 @@ gimple_set_bb (gimple *stmt, basic_block bb)
 }
 
 
+/* Return rhs1 of a STMT with GIMPLE_SINGLE_RHS and initialize *REF
+   to the wrapping TREE_AOREFWRAP ao_ref or if rhs1 is not wrapped,
+   wrap it.  When wrapping is not profitable initialize *REF to NULL.  */
+
+tree
+gimple_assign_rhs1_with_ao_ref (gassign *stmt, ao_ref **ref)
+{
+  tree rhs1 = stmt->op[1];
+  if (!REFERENCE_CLASS_P (rhs1))
+{
+  *ref = NULL;
+  return rhs1;
+}
+  if (TREE_CODE (rhs1) != TREE_AOREFWRAP)
+{
+  rhs1 = build1 (TREE_AOREFWRAP, TREE_TYPE (rhs1), rhs1);
+  static_assert (sizeof (ao_ref) == sizeof (rhs1->aorefwrap.ref_storage),
+"wrong aorefwrap storage size");
+  *ref = new (&rhs1->aorefwrap.ref_storage) ao_ref;
+  ao_ref_init (*ref, TREE_OPERAND (rhs1, 0));
+  stmt->op[1] = rhs1;
+  return TREE_OPERAND (rhs1, 0);
+}
+  *ref = reinterpret_cast <ao_ref *> (&rhs1->aorefwrap.ref_storage);
+  return TREE_OPERAND (rhs1, 0);
+}
+
+/* Return the lhs of a STMT with GIMPLE_SINGLE_RHS and initialize *REF
+   to the wrapping TREE_AOREFWRAP ao_ref or, if the lhs is not wrapped,
+   wrap it.  When wrapping is not profitable initialize *REF to NULL.  */
+
+tree
+gimple_assign_lhs_with_ao_ref (gassign 

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-12 Thread Richard Biener via Gcc-patches
On Tue, Oct 12, 2021 at 10:59 AM HAO CHEN GUI via Gcc-patches
 wrote:
>
> Hi,
>
> This patch disables gimple folding for float or double vec_min/max when 
> fast-math is not set. It makes vec_min/max conform with the guide.
>
> Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
> I re-send the patch as previous one is messed up in email thread. Sorry 
> for that.

If the VSX/altivec min/max instructions conform to IEEE behavior then
you could instead fold
to .F{MIN,MAX} internal functions and define the f{min,max} optabs.

Otherwise the patch looks correct to me - MIN_EXPR and MAX_EXPR are
not IEEE conforming.
Note a better check would be to use HONOR_NANS/HONOR_SIGNED_ZEROS on
the argument type
(that also works for the integer types with the obvious answer).

Richard.

> ChangeLog
>
> 2021-08-25 Haochen Gui 
>
> gcc/
>  * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
>  Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
>  VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.
>
> gcc/testsuite/
>  * gcc.target/powerpc/vec-minmax-1.c: New test.
>  * gcc.target/powerpc/vec-minmax-2.c: Likewise.
>
>
> patch.diff
>
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index b4e13af4dc6..90527734ceb 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> return true;
>   /* flavors of vec_min.  */
>   case VSX_BUILTIN_XVMINDP:
> +case ALTIVEC_BUILTIN_VMINFP:
> +  if (!flag_finite_math_only || flag_signed_zeros)
> +   return false;
> +  /* Fall through to MIN_EXPR.  */
> +  gcc_fallthrough ();
>   case P8V_BUILTIN_VMINSD:
>   case P8V_BUILTIN_VMINUD:
>   case ALTIVEC_BUILTIN_VMINSB:
> @@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>   case ALTIVEC_BUILTIN_VMINUB:
>   case ALTIVEC_BUILTIN_VMINUH:
>   case ALTIVEC_BUILTIN_VMINUW:
> -case ALTIVEC_BUILTIN_VMINFP:
> arg0 = gimple_call_arg (stmt, 0);
> arg1 = gimple_call_arg (stmt, 1);
> lhs = gimple_call_lhs (stmt);
> @@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> return true;
>   /* flavors of vec_max.  */
>   case VSX_BUILTIN_XVMAXDP:
> +case ALTIVEC_BUILTIN_VMAXFP:
> +  if (!flag_finite_math_only || flag_signed_zeros)
> +   return false;
> +  /* Fall through to MAX_EXPR.  */
> +  gcc_fallthrough ();
>   case P8V_BUILTIN_VMAXSD:
>   case P8V_BUILTIN_VMAXUD:
>   case ALTIVEC_BUILTIN_VMAXSB:
> @@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>   case ALTIVEC_BUILTIN_VMAXUB:
>   case ALTIVEC_BUILTIN_VMAXUH:
>   case ALTIVEC_BUILTIN_VMAXUW:
> -case ALTIVEC_BUILTIN_VMAXFP:
> arg0 = gimple_call_arg (stmt, 0);
> arg1 = gimple_call_arg (stmt, 1);
> lhs = gimple_call_lhs (stmt);
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> new file mode 100644
> index 000..547798fd65c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
> +
> +/* This test verifies that float or double vec_min/max are bound to
> +   xv[min|max][d|s]p instructions when fast-math is not set.  */
> +
> +
> +#include 
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_D = 0;
> +#else
> +   const int PREF_D = 1;
> +#endif
> +
> +double vmaxd (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_max (va, vb), PREF_D);
> +}
> +
> +double vmind (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_min (va, vb), PREF_D);
> +}
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_F = 0;
> +#else
> +   const int PREF_F = 3;
> +#endif
> +
> +float vmaxf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_max (va, vb), PREF_F);
> +}
> +
> +float vminf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_min (va, vb), PREF_F);
> +}
> diff --git 

Re: [PATCH 02/11] Remove base_ind/base_ref handling from extract_base_bit_offset

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 01, 2021 at 10:07:49AM -0700, Julian Brown wrote:
> In preparation for follow-up patches extending struct dereference
> handling for OpenMP, this patch removes base_ind/base_ref handling from
> gimplify.c:extract_base_bit_offset. This arguably simplifies some of the
> code around the callers of the function also, though subsequent patches
> modify those parts further.
> 
> OK for mainline?
> 
> Thanks,
> 
> Julian
> 
> 2021-09-29  Julian Brown  
> 
> gcc/
>   * gimplify.c (extract_base_bit_offset): Remove BASE_IND, BASE_REF and
>   OPENMP parameters.
>   (strip_indirections): New function.
>   (build_struct_group): Update calls to extract_base_bit_offset.
>   Rearrange indirect/reference handling accordingly.  Use extracted base
>   instead of passed-in decl when grouping component accesses together.

This is ok for trunk once the whole series is approved.

Jakub



[SPARC] Fix PR target/102588

2021-10-12 Thread Eric Botcazou via Gcc-patches
We need a 32-byte wide integer mode (OImode) in order to handle structure 
returns in the 64-bit ABI.

Bootstrapped/regtested on SPARC/Solaris and SPARC64/Linux, applied on the 
mainline, 11 and 10 branches.


2021-10-12  Eric Botcazou  

PR target/102588
* config/sparc/sparc-modes.def (OI): New integer mode.

-- 
Eric Botcazou

diff --git a/gcc/config/sparc/sparc-modes.def b/gcc/config/sparc/sparc-modes.def
index 5cc4743f199..057c09345a9 100644
--- a/gcc/config/sparc/sparc-modes.def
+++ b/gcc/config/sparc/sparc-modes.def
@@ -23,6 +23,9 @@ along with GCC; see the file COPYING3.  If not see
 /* 128-bit floating point */
 FLOAT_MODE (TF, 16, ieee_quad_format);
 
+/* We need a 32-byte mode to return structures in the 64-bit ABI.  */
+INT_MODE (OI, 32);
+
 /* Add any extra modes needed to represent the condition code.
 
We have a CCNZ mode which is used for implicit comparisons with zero when


Re: [PATCH 01/11] libgomp: Release device lock on cbuf error path

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 01, 2021 at 10:07:48AM -0700, Julian Brown wrote:
> This patch releases the device lock on a sanity-checking error path in
> transfer combining (cbuf) handling in libgomp:target.c.  This shouldn't
> happen when handling well-formed mapping clauses, but erroneous clauses
> can currently cause a hang if the condition triggers.
> 
> Tested with offloading to NVPTX. OK?
> 
> 2021-09-29  Julian Brown  
> 
> libgomp/
>   * target.c (gomp_copy_host2dev): Release device lock on cbuf
>   error path.

Ok, thanks.  This doesn't seem to depend on anything else, so
can be committed separately right away.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index 65bb40100e5..84c6fdf2c47 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -385,7 +385,10 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
> else if (cbuf->chunks[middle].start <= doff)
>   {
> if (doff + sz > cbuf->chunks[middle].end)
> - gomp_fatal ("internal libgomp cbuf error");
> + {
> +   gomp_mutex_unlock (&devicep->lock);
> +   gomp_fatal ("internal libgomp cbuf error");
> + }
> memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
> h, sz);
> return;
> -- 
> 2.29.2

Jakub



Re: [RFC] Port git gcc-descr to Python

2021-10-12 Thread Martin Liška

Hello.

There's a complete patch that implements both git gcc-descr and gcc-undesrc
and sets corresponding git aliases to use them.

Ready to be installed?
Thanks,
Martin

From bf46024d03d00edf09d804449acbc5ff17690127 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 11 Oct 2021 14:36:19 +0200
Subject: [PATCH] Port git gcc-{,un}descr to Python.

contrib/ChangeLog:

	* gcc-git-customization.sh: Use the new python implementation.
	* describe_common.py: New file.
	* git-describe.py: New file.
	* git-undescribe.py: New file.
---
 contrib/describe_common.py   | 27 +++
 contrib/gcc-git-customization.sh |  5 ++-
 contrib/git-describe.py  | 56 
 contrib/git-undescribe.py| 39 ++
 4 files changed, 124 insertions(+), 3 deletions(-)
 create mode 100644 contrib/describe_common.py
 create mode 100755 contrib/git-describe.py
 create mode 100755 contrib/git-undescribe.py

diff --git a/contrib/describe_common.py b/contrib/describe_common.py
new file mode 100644
index 000..ff48bccc71c
--- /dev/null
+++ b/contrib/describe_common.py
@@ -0,0 +1,27 @@
+#!/usr/bin/env python3
+
+import subprocess
+
+BASE_PREFIX = 'basepoints/gcc-'
+
+
+def run_git(cmd):
+return subprocess.run(cmd, shell=True, encoding='utf8',
+  stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+
+
+def get_upstream():
+r = run_git('git config --get gcc-config.upstream')
+upstream = r.stdout.strip() if r.returncode == 0 else 'origin'
+return upstream
+
+
+def get_branch_for_version(version):
+r = run_git('git rev-parse --quiet --verify '
+f'origin/releases/gcc-{version}')
+return f'releases/gcc-{version}' if r.returncode == 0 else 'master'
+
+
+def get_description(revision, options=''):
+return run_git(f'git describe --all --match {BASE_PREFIX}[0-9]* '
+   f'{revision} ' + options)
diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh
index aca61b781ff..1cc0a5abc34 100755
--- a/contrib/gcc-git-customization.sh
+++ b/contrib/gcc-git-customization.sh
@@ -22,9 +22,8 @@ git config alias.svn-rev '!f() { rev=$1; shift; git log --all --grep="^From-SVN:
 
 # Add git commands to convert git commit to monotonically increasing revision number
 # and vice versa
-git config alias.gcc-descr \!"f() { if test \${1:-no} = --full; then c=\${2:-master}; r=\$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-,r,p'); expr match \${r:-no} '^r[0-9]\\+\$' >/dev/null && r=\${r}-0-g\$(git rev-parse \${2:-master}); else c=\${1:-master}; r=\$(git describe --all --match 'basepoints/gcc-[0-9]*' \$c | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)-\\([0-9]\\+\\)-g[0-9a-f]*\$,r\\2-\\3,p;s,^\\(tags/\\)\\?basepoints/gcc-\\([0-9]\\+\\)\$,r\\2-0,p'); fi; if test -n \$r; then o=\$(git config --get gcc-config.upstream); rr=\$(echo \$r | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\\(-g[0-9a-f]\\+\\)\\?\$,\\1,p'); if git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$rr >/dev/null; then m=releases/gcc-\$rr; else m=master; fi; git merge-base --is-ancestor \$c \${o:-origin}/\$m && \echo \${r}; fi; }; f"
-git config alias.gcc-undescr \!"f() { o=\$(git config --get gcc-config.upstream); r=\$(echo \$1 | sed -n 's,^r\\([0-9]\\+\\)-[0-9]\\+\$,\\1,p'); n=\$(echo \$1 | sed -n 's,^r[0-9]\\+-\\([0-9]\\+\\)\$,\\1,p'); test -z \$r && echo Invalid id \$1 && exit 1; h=\$(git rev-parse --verify --quiet \${o:-origin}/releases/gcc-\$r); test -z \$h && h=\$(git rev-parse --verify --quiet \${o:-origin}/master); p=\$(git describe --all --match 'basepoints/gcc-'\$r \$h | sed -n 's,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+-\\([0-9]\\+\\)-g[0-9a-f]*\$,\\2,p;s,^\\(tags/\\)\\?basepoints/gcc-[0-9]\\+\$,0,p'); git rev-parse --verify \$h~\$(expr \$p - \$n); }; f"
-
+git config alias.gcc-descr '!f() { "`git rev-parse --show-toplevel`/contrib/git-describe.py" $@; } ; f'
+git config alias.gcc-undescr '!f() { "`git rev-parse --show-toplevel`/contrib/git-undescribe.py" $@; } ; f'
 git config alias.gcc-verify '!f() { "`git rev-parse --show-toplevel`/contrib/gcc-changelog/git_check_commit.py" $@; } ; f'
 git config alias.gcc-backport '!f() { "`git rev-parse --show-toplevel`/contrib/git-backport.py" $@; } ; f'
 git config alias.gcc-mklog '!f() { "`git rev-parse --show-toplevel`/contrib/mklog.py" $@; } ; f'
diff --git a/contrib/git-describe.py b/contrib/git-describe.py
new file mode 100755
index 000..db2d229a31d
--- /dev/null
+++ b/contrib/git-describe.py
@@ -0,0 +1,56 @@
+#!/usr/bin/env python3
+
+import argparse
+import sys
+
+from describe_common import BASE_PREFIX, get_branch_for_version
+from describe_common import get_description, get_upstream, run_git
+
+DEFAULT_REV = 'master'
+hash_length = 14
+
+parser = argparse.ArgumentParser(description='Describe a GCC git commit.')
+parser.add_argument('revision', nargs='?', default=DEFAULT_REV,
+

[PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-12 Thread HAO CHEN GUI via Gcc-patches

Hi,

   This patch disables gimple folding for float or double vec_min/max when 
fast-math is not set. It makes vec_min/max conform with the guide.

Bootstrapped and tested on powerpc64le-linux with no regressions. Is this okay 
for trunk? Any recommendations? Thanks a lot.

   I re-send the patch as previous one is messed up in email thread. Sorry for 
that.

ChangeLog

2021-08-25 Haochen Gui 

gcc/
    * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
    Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
    VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

gcc/testsuite/
    * gcc.target/powerpc/vec-minmax-1.c: New test.
    * gcc.target/powerpc/vec-minmax-2.c: Likewise.


patch.diff

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..90527734ceb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12159,6 +12159,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return true;
 /* flavors of vec_min.  */
 case VSX_BUILTIN_XVMINDP:
+    case ALTIVEC_BUILTIN_VMINFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+   return false;
+  /* Fall through to MIN_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMINSD:
 case P8V_BUILTIN_VMINUD:
 case ALTIVEC_BUILTIN_VMINSB:
@@ -12167,7 +12172,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case ALTIVEC_BUILTIN_VMINUB:
 case ALTIVEC_BUILTIN_VMINUH:
 case ALTIVEC_BUILTIN_VMINUW:
-    case ALTIVEC_BUILTIN_VMINFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
@@ -12177,6 +12181,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return true;
 /* flavors of vec_max.  */
 case VSX_BUILTIN_XVMAXDP:
+    case ALTIVEC_BUILTIN_VMAXFP:
+  if (!flag_finite_math_only || flag_signed_zeros)
+   return false;
+  /* Fall through to MAX_EXPR.  */
+  gcc_fallthrough ();
 case P8V_BUILTIN_VMAXSD:
 case P8V_BUILTIN_VMAXUD:
 case ALTIVEC_BUILTIN_VMAXSB:
@@ -12185,7 +12194,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case ALTIVEC_BUILTIN_VMAXUB:
 case ALTIVEC_BUILTIN_VMAXUH:
 case ALTIVEC_BUILTIN_VMAXUW:
-    case ALTIVEC_BUILTIN_VMAXFP:
   arg0 = gimple_call_arg (stmt, 0);
   arg1 = gimple_call_arg (stmt, 1);
   lhs = gimple_call_lhs (stmt);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
new file mode 100644
index 000..547798fd65c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
+
+/* This test verifies that float or double vec_min/max are bound to
+   xv[min|max][d|s]p instructions when fast-math is not set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_max (va, vb), PREF_D);
+}
+
+double vmind (double a, double b)
+{
+  vector double va = vec_promote (a, PREF_D);
+  vector double vb = vec_promote (b, PREF_D);
+  return vec_extract (vec_min (va, vb), PREF_D);
+}
+
+#ifdef _BIG_ENDIAN
+   const int PREF_F = 0;
+#else
+   const int PREF_F = 3;
+#endif
+
+float vmaxf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_max (va, vb), PREF_F);
+}
+
+float vminf (float a, float b)
+{
+  vector float va = vec_promote (a, PREF_F);
+  vector float vb = vec_promote (b, PREF_F);
+  return vec_extract (vec_min (va, vb), PREF_F);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
new file mode 100644
index 000..4c6f4365830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
+/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
+
+/* This test verifies that float or double vec_min/max can be converted
+   to scalar comparison when fast-math is set.  */
+
+
+#include 
+
+#ifdef _BIG_ENDIAN
+   const int PREF_D = 0;
+#else
+   const int PREF_D = 1;
+#endif
+
+double vmaxd (double 

Re: [Patch] Fortran version of libgomp.c-c++-common/icv-{3,4}.c (was: [committed] openmp: Add testsuite coverage for omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit)

2021-10-12 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 12, 2021 at 10:41:28AM +0200, Tobias Burnus wrote:
> Hi,
> 
> On 12.10.21 09:42, Jakub Jelinek wrote:
> > This adds (C/C++ only) testsuite coverage for these new OpenMP 5.1 APIs.
> 
> And attached is the Fortranified version of those testcases.
> 
> OK?
> 
> Tobias
> -
> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
> Munich; limited liability company; Managing Directors: Thomas Heurung,
> Frank Thürauf; registered office: Munich; commercial register: Munich,
> HRB 106955

> Fortran version of libgomp.c-c++-common/icv-{3,4}.c
> 
> This adds the Fortran testsuite coverage of
> omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit
> 
> libgomp/
>   * testsuite/libgomp.fortran/icv-3.f90: New.
>   * testsuite/libgomp.fortran/icv-4.f90: New.

Ok, thanks.

Jakub



[Patch] Fortran version of libgomp.c-c++-common/icv-{3,4}.c (was: [committed] openmp: Add testsuite coverage for omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit)

2021-10-12 Thread Tobias Burnus

Hi,

On 12.10.21 09:42, Jakub Jelinek wrote:

This adds (C/C++ only) testsuite coverage for these new OpenMP 5.1 APIs.


And attached is the Fortranified version of those testcases.

OK?

Tobias
Fortran version of libgomp.c-c++-common/icv-{3,4}.c

This adds the Fortran testsuite coverage of
omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit

libgomp/
	* testsuite/libgomp.fortran/icv-3.f90: New.
	* testsuite/libgomp.fortran/icv-4.f90: New.

 libgomp/testsuite/libgomp.fortran/icv-3.f90 | 60 +
 libgomp/testsuite/libgomp.fortran/icv-4.f90 | 45 ++
 2 files changed, 105 insertions(+)

diff --git a/libgomp/testsuite/libgomp.fortran/icv-3.f90 b/libgomp/testsuite/libgomp.fortran/icv-3.f90
new file mode 100644
index 000..b2ccd776223
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-3.f90
@@ -0,0 +1,60 @@
+use omp_lib
+implicit none (type, external)
+  if (.not. env_exists ("OMP_NUM_TEAMS") &
+  .and. omp_get_max_teams () /= 0) &
+error stop 1
+  call omp_set_num_teams (7)
+  if (omp_get_max_teams () /= 7) &
+error stop 2
+  if (.not. env_exists ("OMP_TEAMS_THREAD_LIMIT") &
+  .and. omp_get_teams_thread_limit () /= 0) &
+error stop 3
+  call omp_set_teams_thread_limit (15)
+  if (omp_get_teams_thread_limit () /= 15) &
+error stop 4
+  !$omp teams
+if (omp_get_max_teams () /= 7 &
+.or. omp_get_teams_thread_limit () /= 15 &
+.or. omp_get_num_teams () < 1 &
+.or. omp_get_num_teams () > 7 &
+.or. omp_get_team_num () < 0 &
+.or. omp_get_team_num () >= omp_get_num_teams () &
+.or. omp_get_thread_limit () < 1 &
+.or. omp_get_thread_limit () > 15) &
+  error stop 5
+  !$omp end teams
+  !$omp teams num_teams(5) thread_limit (13)
+if (omp_get_max_teams () /= 7 &
+.or. omp_get_teams_thread_limit () /= 15 &
+.or. omp_get_num_teams () /= 5 &
+.or. omp_get_team_num () < 0 &
+.or. omp_get_team_num () >= omp_get_num_teams () &
+.or. omp_get_thread_limit () < 1 &
+.or. omp_get_thread_limit () > 13) &
+  error stop 6
+  !$omp end teams
+  !$omp teams num_teams(8) thread_limit (16)
+if (omp_get_max_teams () /= 7 &
+.or. omp_get_teams_thread_limit () /= 15 &
+.or. omp_get_num_teams () /= 8 &
+.or. omp_get_team_num () < 0 &
+.or. omp_get_team_num () >= omp_get_num_teams () &
+.or. omp_get_thread_limit () < 1 &
+.or. omp_get_thread_limit () > 16) &
+  error stop 7
+  !$omp end teams
+contains
+  logical function env_exists (name)
+character(len=*) :: name
+character(len=40) :: val
+integer :: stat
+call get_environment_variable (name, val, status=stat)
+if (stat == 0) then
+  env_exists = .true.
+else if (stat == 1) then
+  env_exists = .false.
+else
+  error stop 10
+endif
+  end
+end
diff --git a/libgomp/testsuite/libgomp.fortran/icv-4.f90 b/libgomp/testsuite/libgomp.fortran/icv-4.f90
new file mode 100644
index 000..f76c96d7d0d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/icv-4.f90
@@ -0,0 +1,45 @@
+! { dg-set-target-env-var OMP_NUM_TEAMS "6" }
+! { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "12" }
+
+use omp_lib
+implicit none (type, external)
+  if (env_is_set ("OMP_NUM_TEAMS", "6")) then
+if (omp_get_max_teams () /= 6) &
+  error stop 1
+  else
+call omp_set_num_teams (6)
+  end if
+  if (env_is_set ("OMP_TEAMS_THREAD_LIMIT", "12")) then
+if (omp_get_teams_thread_limit () /= 12) &
+  error stop 2
+  else
+call omp_set_teams_thread_limit (12)
+  end if
+  !$omp teams
+if (omp_get_max_teams () /= 6 &
+.or. omp_get_teams_thread_limit () /= 12 &
+.or. omp_get_num_teams () < 1 &
+.or. omp_get_num_teams () > 6 &
+.or. omp_get_team_num () < 0 &
+.or. omp_get_team_num () >= omp_get_num_teams () &
+.or. omp_get_thread_limit () < 1 &
+.or. omp_get_thread_limit () > 12) &
+  error stop 3
+  !$omp end teams
+contains
+  logical function env_is_set (name, val)
+character(len=*) :: name, val
+character(len=40) :: val2
+integer :: stat
+call get_environment_variable (name, val2, status=stat)
+if (stat == 0) then
+  if (val == val2) then
+env_is_set = .true.
+return
+  end if
+else if (stat /= 1) then
+  error stop 10
+endif
+env_is_set = .false.
+  end
+end


[PATCH] tree-optimization/102659 - avoid undefined overflow after if-conversion

2021-10-12 Thread Richard Biener via Gcc-patches
The following makes sure to rewrite arithmetic with undefined behavior
on overflow to a well-defined variant when moving them to be always
executed as part of doing if-conversion for loop vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2021-10-11  Richard Biener  

PR tree-optimization/102659
* tree-if-conv.c (need_to_rewrite_undefined): New flag.
(if_convertible_gimple_assign_stmt_p): Mark the loop for
rewrite when stmts with undefined behavior on integer
overflow appear.
(combine_blocks): Predicate also when we need to rewrite stmts.
(predicate_statements): Rewrite affected stmts to something
with well-defined behavior on overflow.
(tree_if_conversion): Initialize need_to_rewrite_undefined.

* gcc.dg/torture/pr69760.c: Adjust the testcase.
* gcc.target/i386/avx2-vect-mask-store-move1.c: Expect to move
the conversions to unsigned as well.
---
 gcc/testsuite/gcc.dg/torture/pr69760.c|  3 +-
 .../i386/avx2-vect-mask-store-move1.c |  2 +-
 gcc/tree-if-conv.c| 28 ++-
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr69760.c b/gcc/testsuite/gcc.dg/torture/pr69760.c
index 53733c7c6a4..47e01ae59bd 100644
--- a/gcc/testsuite/gcc.dg/torture/pr69760.c
+++ b/gcc/testsuite/gcc.dg/torture/pr69760.c
@@ -1,11 +1,10 @@
 /* PR tree-optimization/69760 */
 /* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } } } */
-/* { dg-options "-O2" } */
 
 #include 
 #include 
 
-__attribute__((noinline, noclone)) void
+__attribute__((noinline, noclone)) static void
 test_func (double *a, int L, int m, int n, int N)
 {
   int i, k;
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
index 989ba402e0e..6a47a09c835 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -78,4 +78,4 @@ avx2_test (void)
   abort ();
 }
 
-/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 6 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Move stmt to created bb" 10 "vect" } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index d7b7b309309..6a67acfeaae 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -132,6 +132,11 @@ along with GCC; see the file COPYING3.  If not see
predicate_statements for the kinds of predication we support.  */
 static bool need_to_predicate;
 
+/* True if we have to rewrite stmts that may invoke undefined behavior
+   when the controlling condition is false, so that they do not once
+   they become always executed.  See predicate_statements for the kinds
+   of predication we support.  */
+static bool need_to_rewrite_undefined;
+
 /* Indicate if there are any complicated PHIs that need to be handled in
if-conversion.  Complicated PHI has more than two arguments and can't
be degenerated to two arguments PHI.  See more information in comment
@@ -1042,6 +1047,12 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
fprintf (dump_file, "tree could trap...\n");
   return false;
 }
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
+  && arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
+/* We have to rewrite stmts with undefined overflow.  */
+need_to_rewrite_undefined = true;
 
   /* When if-converting stores force versioning, likewise if we
  ended up generating store data races.  */
@@ -2563,6 +2574,20 @@ predicate_statements (loop_p loop)
 
 gsi_replace (&gsi, new_stmt, true);
}
+ else if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+  && TYPE_OVERFLOW_UNDEFINED
+   (TREE_TYPE (gimple_assign_lhs (stmt)))
+  && arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
+   {
+ gsi_remove (&gsi, true);
+ gsi_insert_seq_before (&gsi, rewrite_to_defined_overflow (stmt),
+GSI_SAME_STMT);
+ if (gsi_end_p (gsi))
+   gsi = gsi_last_bb (gimple_bb (stmt));
+ else
+   gsi_prev (&gsi);
+   }
  else if (gimple_vdef (stmt))
{
  tree lhs = gimple_assign_lhs (stmt);
@@ -2647,7 +2672,7 @@ combine_blocks (class loop *loop)
   insert_gimplified_predicates (loop);
   predicate_all_scalar_phis (loop);
 
-  if (need_to_predicate)
+  if (need_to_predicate || need_to_rewrite_undefined)
 predicate_statements (loop);
 
   /* Merge basic blocks.  */
@@ -3148,6 +3173,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   rloop = NULL;
   ifc_bbs = NULL;
   

Re: [PATCH] aix: handle 64bit inodes for include directories

2021-10-12 Thread CHIGOT, CLEMENT via Gcc-patches
Hi Jeff,

Any update on this patch ?
As it's dealing with configure files, I would like to have it merged
asap before any conflicts appear.

Thanks,
Clément


[committed] openmp: Avoid calling clear_type_padding_in_mask in the common case where there can't be any padding

2021-10-12 Thread Jakub Jelinek via Gcc-patches
Hi!

We can use the clear_padding_type_may_have_padding_p function, which
is conservative for e.g. RECORD_TYPE/UNION_TYPE, but for the floating and
complex floating types is accurate.  clear_type_padding_in_mask is
more expensive because we need to allocate memory, fill it, call the function
which itself is more expensive and then analyze the memory, so for the
common case of float/double atomics or even long double on most targets
we can avoid that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-12  Jakub Jelinek  

gcc/
* gimple-fold.h (clear_padding_type_may_have_padding_p): Declare.
* gimple-fold.c (clear_padding_type_may_have_padding_p): No longer
static.
gcc/c-family/
* c-omp.c (c_finish_omp_atomic): Use
clear_padding_type_may_have_padding_p.

--- gcc/gimple-fold.h.jj2021-04-26 21:49:44.358846211 +0200
+++ gcc/gimple-fold.h   2021-10-11 22:31:03.505270575 +0200
@@ -36,6 +36,7 @@ extern tree maybe_fold_and_comparisons (
enum tree_code, tree, tree);
 extern tree maybe_fold_or_comparisons (tree, enum tree_code, tree, tree,
   enum tree_code, tree, tree);
+extern bool clear_padding_type_may_have_padding_p (tree);
 extern void clear_type_padding_in_mask (tree, unsigned char *);
 extern bool optimize_atomic_compare_exchange_p (gimple *);
 extern void fold_builtin_atomic_compare_exchange (gimple_stmt_iterator *);
--- gcc/gimple-fold.c.jj2021-09-27 23:34:56.519266040 +0200
+++ gcc/gimple-fold.c   2021-10-11 22:30:40.734595135 +0200
@@ -4632,7 +4632,7 @@ clear_padding_real_needs_padding_p (tree
 
 /* Return true if TYPE might contain any padding bits.  */
 
-static bool
+bool
 clear_padding_type_may_have_padding_p (tree type)
 {
   switch (TREE_CODE (type))
--- gcc/c-family/c-omp.c.jj 2021-10-07 23:03:44.070935299 +0200
+++ gcc/c-family/c-omp.c    2021-10-11 22:32:06.586371445 +0200
@@ -381,7 +381,9 @@ c_finish_omp_atomic (location_t loc, enu
  bool clear_padding = false;
  HOST_WIDE_INT non_padding_start = 0;
  HOST_WIDE_INT non_padding_end = 0;
- if (BITS_PER_UNIT == 8 && CHAR_BIT == 8)
+ if (BITS_PER_UNIT == 8
+ && CHAR_BIT == 8
+ && clear_padding_type_may_have_padding_p (cmptype))
{
  HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i;
  gcc_assert (sz > 0);

Jakub



[committed] openmp: Add documentation for omp_{get_max, set_num}_threads and omp_{s, g}et_teams_thread_limit

2021-10-12 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds documentation for these new OpenMP 5.1 APIs as well as
two new environment variables - OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT.
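As a sketch, the two variables are simply set in the environment before launching a libgomp program (`./a.out` below stands for any OpenMP program):

```shell
# Seed the OpenMP 5.1 ICVs that omp_get_max_teams () and
# omp_get_teams_thread_limit () report before any API call overrides them.
export OMP_NUM_TEAMS=6
export OMP_TEAMS_THREAD_LIMIT=12
echo "$OMP_NUM_TEAMS $OMP_TEAMS_THREAD_LIMIT"
# ./a.out   # would now start with nteams-var=6, teams-thread-limit-var=12
```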

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

We still have a lot of APIs undocumented though:
for i in `sed -n 's,^\t\(omp_.*[^_]\);.*$,\1,p' libgomp.map | sort -u`; do grep -q $i:: libgomp.texi || echo $i; done
omp_aligned_alloc
omp_aligned_calloc
omp_alloc
omp_calloc
omp_capture_affinity
omp_destroy_allocator
omp_display_affinity
omp_display_env
omp_free
omp_get_affinity_format
omp_get_default_allocator
omp_get_num_places
omp_get_partition_num_places
omp_get_partition_place_nums
omp_get_place_num
omp_get_place_num_procs
omp_get_place_proc_ids
omp_init_allocator
omp_pause_resource
omp_pause_resource_all
omp_realloc
omp_set_affinity_format
omp_set_default_allocator
omp_target_alloc
omp_target_associate_ptr
omp_target_disassociate_ptr
omp_target_free
omp_target_is_present
omp_target_memcpy
omp_target_memcpy_rect

2021-10-12  Jakub Jelinek  

* libgomp.texi (omp_get_max_teams, omp_get_teams_thread_limit,
omp_set_num_teams, omp_set_teams_thread_limit, OMP_NUM_TEAMS,
OMP_TEAMS_THREAD_LIMIT): Document.

--- libgomp/libgomp.texi.jj 2021-10-11 17:40:13.339012427 +0200
+++ libgomp/libgomp.texi2021-10-11 19:51:08.012941074 +0200
@@ -369,6 +369,7 @@ linkage, and do not throw exceptions.
 * omp_get_level::   Number of parallel regions
 * omp_get_max_active_levels::   Current maximum number of active regions
 * omp_get_max_task_priority::   Maximum task priority value that can be set
+* omp_get_max_teams::   Maximum number of teams for teams region
 * omp_get_max_threads:: Maximum number of threads of parallel region
 * omp_get_nested::  Nested parallel regions
 * omp_get_num_devices:: Number of target devices
@@ -380,6 +381,7 @@ linkage, and do not throw exceptions.
 * omp_get_supported_active_levels:: Maximum number of active regions supported
 * omp_get_team_num::Get team number
 * omp_get_team_size::   Number of threads in a team
+* omp_get_teams_thread_limit::  Maximum number of threads imposed by teams
 * omp_get_thread_limit::Maximum number of threads
 * omp_get_thread_num::  Current thread ID
 * omp_in_parallel:: Whether a parallel region is active
@@ -389,8 +391,10 @@ linkage, and do not throw exceptions.
 * omp_set_dynamic:: Enable/disable dynamic teams
 * omp_set_max_active_levels::   Limits the number of active parallel regions
 * omp_set_nested::  Enable/disable nested parallel regions
+* omp_set_num_teams::   Set upper teams limit for teams region
 * omp_set_num_threads:: Set upper team size limit
 * omp_set_schedule::Set the runtime scheduling method
+* omp_set_teams_thread_limit::  Set upper thread limit for teams construct
 
 Initialize, set, test, unset and destroy simple and nested locks.
 
@@ -684,6 +688,32 @@ This function obtains the maximum allowe
 @end table
 
 
+@node omp_get_max_teams
+@section @code{omp_get_max_teams} -- Maximum number of teams of teams region
+@table @asis
+@item @emph{Description}:
+Return the maximum number of teams used for the teams region
+that does not use the clause @code{num_teams}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_set_num_teams}, @ref{omp_get_num_teams}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
+@end table
+
+
+
 @node omp_get_max_threads
 @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
 @table @asis
@@ -988,6 +1018,32 @@ to @code{omp_get_num_threads}.
 
 
 
+@node omp_get_teams_thread_limit
+@section @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
+@table @asis
+@item @emph{Description}:
+Return the maximum number of threads that will be able to participate in
+each team created by a teams construct.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
+@end table
+
+
+
 @node omp_get_thread_limit
 @section @code{omp_get_thread_limit} -- Maximum number of threads
 @table @asis
@@ -1232,6 +1288,34 @@ regions will set the maximum number 

[committed] openmp: Fix up warnings on libgomp.info build

2021-10-12 Thread Jakub Jelinek via Gcc-patches
Hi!

When building libgomp documentation, I see
makeinfo --split-size=500  -I ../../../libgomp/../gcc/doc/include -I 
../../../libgomp -o libgomp.info ../../../libgomp/libgomp.texi
../../../libgomp/libgomp.texi:503: warning: node next `omp_get_default_device' 
in menu `omp_get_device_num' and in sectioning `omp_get_dynamic' differ
../../../libgomp/libgomp.texi:528: warning: node prev `omp_get_dynamic' in menu 
`omp_get_device_num' and in sectioning `omp_get_default_device' differ
../../../libgomp/libgomp.texi:560: warning: node next `omp_get_initial_device' 
in menu `omp_get_level' and in sectioning `omp_get_device_num' differ
../../../libgomp/libgomp.texi:587: warning: node next `omp_get_device_num' in 
menu `omp_get_dynamic' and in sectioning `omp_get_level' differ
../../../libgomp/libgomp.texi:587: warning: node prev `omp_get_device_num' in 
menu `omp_get_default_device' and in sectioning `omp_get_initial_device' differ
../../../libgomp/libgomp.texi:615: warning: node prev `omp_get_level' in menu 
`omp_get_initial_device' and in sectioning `omp_get_device_num' differ
warnings.  This patch fixes those.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-12  Jakub Jelinek  

* libgomp.texi (omp_get_device_num): Move @node before omp_get_dynamic
to avoid makeinfo warnings.

--- libgomp/libgomp.texi.jj 2021-10-10 22:53:06.860738408 +0200
+++ libgomp/libgomp.texi    2021-10-11 17:40:13.339012427 +0200
@@ -525,6 +525,34 @@ Get the default device for target region
 
 
 
+@node omp_get_device_num
+@section @code{omp_get_device_num} -- Return device number of current device
+@table @asis
+@item @emph{Description}:
+This function returns a device number that represents the device that the
+current thread is executing on. For OpenMP 5.0, this must be equal to the
+value returned by the @code{omp_get_initial_device} function when called
+from the host.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_initial_device}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
+@end table
+
+
+
 @node omp_get_dynamic
 @section @code{omp_get_dynamic} -- Dynamic teams setting
 @table @asis
@@ -583,34 +611,6 @@ For OpenMP 5.1, this must be equal to th
 @end table
 
 
-
-@node omp_get_device_num
-@section @code{omp_get_device_num} -- Return device number of current device
-@table @asis
-@item @emph{Description}:
-This function returns a device number that represents the device that the
-current thread is executing on. For OpenMP 5.0, this must be equal to the
-value returned by the @code{omp_get_initial_device} function when called
-from the host.
-
-@item @emph{C/C++}
-@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
-@end multitable
-
-@item @emph{Fortran}:
-@multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
-@end multitable
-
-@item @emph{See also}:
-@ref{omp_get_initial_device}
-
-@item @emph{Reference}:
-@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
-@end table
-
-
 
 @node omp_get_level
 @section @code{omp_get_level} -- Obtain the current nesting level

Jakub



[committed] openmp: Add testsuite coverage for omp_{get_max,set_num}_threads and omp_{s,g}et_teams_thread_limit

2021-10-12 Thread Jakub Jelinek via Gcc-patches
Hi!

This adds (C/C++ only) testsuite coverage for these new OpenMP 5.1 APIs.

Regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-12  Jakub Jelinek  

* testsuite/libgomp.c-c++-common/icv-3.c: New test.
* testsuite/libgomp.c-c++-common/icv-4.c: New test.

--- libgomp/testsuite/libgomp.c-c++-common/icv-3.c.jj   2021-10-11 17:16:35.455296745 +0200
+++ libgomp/testsuite/libgomp.c-c++-common/icv-3.c  2021-10-11 17:19:14.358023413 +0200
@@ -0,0 +1,54 @@
+#include <stdlib.h>
+#include <omp.h>
+
+int
+main ()
+{
+  if (getenv ("OMP_NUM_TEAMS") == NULL
+  && omp_get_max_teams () != 0)
+abort ();
+  omp_set_num_teams (7);
+  if (omp_get_max_teams () != 7)
+abort ();
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") == NULL
+  && omp_get_teams_thread_limit () != 0)
+abort ();
+  omp_set_teams_thread_limit (15);
+  if (omp_get_teams_thread_limit () != 15)
+abort ();
+  #pragma omp teams
+  {
+if (omp_get_max_teams () != 7
+   || omp_get_teams_thread_limit () != 15
+   || omp_get_num_teams () < 1
+   || omp_get_num_teams () > 7
+   || omp_get_team_num () < 0
+   || omp_get_team_num () >= omp_get_num_teams ()
+   || omp_get_thread_limit () < 1
+   || omp_get_thread_limit () > 15)
+  abort ();
+  }
+  #pragma omp teams num_teams(5) thread_limit (13)
+  {
+if (omp_get_max_teams () != 7
+   || omp_get_teams_thread_limit () != 15
+   || omp_get_num_teams () != 5
+   || omp_get_team_num () < 0
+   || omp_get_team_num () >= omp_get_num_teams ()
+   || omp_get_thread_limit () < 1
+   || omp_get_thread_limit () > 13)
+  abort ();
+  }
+  #pragma omp teams num_teams(8) thread_limit (16)
+  {
+if (omp_get_max_teams () != 7
+   || omp_get_teams_thread_limit () != 15
+   || omp_get_num_teams () != 8
+   || omp_get_team_num () < 0
+   || omp_get_team_num () >= omp_get_num_teams ()
+   || omp_get_thread_limit () < 1
+   || omp_get_thread_limit () > 16)
+  abort ();
+  }
+  return 0;
+}
--- libgomp/testsuite/libgomp.c-c++-common/icv-4.c.jj   2021-10-11 17:17:22.963617081 +0200
+++ libgomp/testsuite/libgomp.c-c++-common/icv-4.c  2021-10-11 17:22:35.797141533 +0200
@@ -0,0 +1,40 @@
+/* { dg-set-target-env-var OMP_NUM_TEAMS "6" } */
+/* { dg-set-target-env-var OMP_TEAMS_THREAD_LIMIT "12" } */
+
+#include <stdlib.h>
+#include <string.h>
+#include <omp.h>
+
+int
+main ()
+{
+  if (getenv ("OMP_NUM_TEAMS") != NULL
+  && strcmp (getenv ("OMP_NUM_TEAMS"), "6") == 0)
+{
+  if (omp_get_max_teams () != 6)
+   abort ();
+}
+  else
+omp_set_num_teams (6);
+  if (getenv ("OMP_TEAMS_THREAD_LIMIT") != NULL
+  && strcmp (getenv ("OMP_TEAMS_THREAD_LIMIT"), "12") == 0)
+{
+  if (omp_get_teams_thread_limit () != 12)
+   abort ();
+}
+  else
+omp_set_teams_thread_limit (12);
+  #pragma omp teams
+  {
+if (omp_get_max_teams () != 6
+   || omp_get_teams_thread_limit () != 12
+   || omp_get_num_teams () < 1
+   || omp_get_num_teams () > 6
+   || omp_get_team_num () < 0
+   || omp_get_team_num () >= omp_get_num_teams ()
+   || omp_get_thread_limit () < 1
+   || omp_get_thread_limit () > 12)
+  abort ();
+  }
+  return 0;
+}

Jakub



Re: [PATCH] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-12 Thread Richard Biener via Gcc-patches
On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor  wrote:
>
> The attached change extends GCC's warnings for out-of-bounds
> stores to cover atomic (and __sync) built-ins.
>
> Rather than hardcoding the properties of these built-ins just
> for the sake of the out-of-bounds detection, on the assumption
> that it might be useful for future optimizations as well, I took
> the approach of extending class attr_fnspec to express their
> special property that they encode the size of the access in their
> name.
>
> I also took the liberty of making attr_fnspec assignable (something
> the rest of my patch relies on), and updating some comments for
> the characters the class uses to encode function properties, based
> on my understanding of their purpose.
>
> Tested on x86_64-linux.

Hmm, so you place 'A' at an odd place (where the return value is specified),
but you do not actually specify the behavior on the return value.  Shouldn't

+ 'A'  specifies that the function atomically accesses a constant
+   1 << N bytes where N is indicated by character 3+2i

maybe read

    'A'  specifies that the function returns the memory pointed to
         by argument one, of size 1 << N bytes where N is indicated
         by character 3+2i, accessed atomically

?  I also wonder if it's necessary to constrain this to 'atomic' accesses
for the purpose of the patch and whether that detail could be omitted to
eventually make more use of it?

Likewise

+ '0'...'9'  specifies the size of value written/read is given either
+   by the specified argument, or for atomic functions, by
+   2 ^ N where N is the constant value denoted by the character

should mention (excluding '0') for the argument position.

   /* length of the fn spec string.  */
-  const unsigned len;
+  unsigned len;

why that?

+  /* Return true if the function is an __atomic or __sync built-in.  */

you didn't specify that for 'A' ...

+  bool
+  atomic_p () const
+  {
+return str[0] == 'A';
+  }

+attr_fnspec
+atomic_builtin_fnspec (tree callee)
+{
+  switch (DECL_FUNCTION_CODE (callee))
+{
+#define BUILTIN_ACCESS_SIZE_FNSPEC(N, lgsz)\
+  BUILT_IN_ATOMIC_LOAD_ ## N:  \
+   return "Ap" "R" lgsz;

note that doing this for atomics makes those no longer a compiler barrier
for (aliased) loads and stores which means they are no longer a reliable
way to implement locks.  That's a reason why I never pushed a
PTA/alias patch I have to open-code this.

Thus, do we really want to do this?

Richard.

>
> Martin


Re: [PATCH] hardened conditionals

2021-10-12 Thread Alexandre Oliva via Gcc-patches
On Oct  9, 2021, Richard Biener  wrote:

> Why two passes (and two IL traverses?) 

Different traversals, no reason to force them into a single pass.  One
only looks at the last stmt of each block, where cond stmts may be,
while the other has to look at every stmt.

> How do you prevent RTL optimizers (jump threading) from removing the
> redundant tests?

The trick I'm using to copy a value without the compiler's knowing
it's still the same value is 'asm ("" : "=g" (alt) : "0" (src));'

I've pondered introducing __builtin_hidden_copy or somesuch, but it
didn't seem worth it.

> I'd have expected such hardening to occur very late in the RTL
> pipeline.

Yeah, that would be another way to do it, but then it would have to be a
lot trickier, given all the different ways in which compare-and-branch
can be expressed in RTL.

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about